Wednesday, August 6, 2008

Diss Data TagCrowd

[Cross-posted at Wind Farm]

I've been playing around with TagCrowd quite a bit as I review and analyze my data. It's a fantastic tool that's helping me "hover above the data" (a [sage] recommendation of Eli's). Notice the happily coincidental part of the hovering advice and the tag cloud: "hover", "cloud." The idea is to step back from the data, depersonalize it. And so I employ the cloud (via TagCrowd). I've been playing with it all along, but now it feels like a legitimized research tool. Beautiful. I love it!

The tag cloud below is a visualization of all of the data that I currently have in Word document format. I am still missing about 200 or so pages of interview transcripts. Also, the photos and other cultural documents don't really transfer into TagCrowd.

There are a few potential, um, sticking points about employing TagCrowd as an analytic tool. First, I'm not entirely certain how it determines word frequency. For example, are "guy" and "guys" considered the same word? Either way, it affects the frequency count. I assume that they are counted separately, but I don't know for sure. Second, I'm struggling to determine which words to exclude from the visualization (there is a feature that allows you to make a list of words to exclude - nice!). So, for example, do I exclude "really" from the list? If so, why? Potentially, "really" or "pretty" signify something about the discourse of the guys as well as my fieldnoting. I'm not doing a discourse analysis, but such decisions still matter.
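To make those two sticking points concrete, here is a minimal Python sketch of a word counter. It is not how TagCrowd works (its actual counting method isn't known here). The function name `word_frequencies`, the `merge_plurals` flag, the naive "strip a trailing s" rule, and the sample sentence are all assumptions made up for illustration.

```python
import re
from collections import Counter

def word_frequencies(text, exclude=(), merge_plurals=False):
    """Count words in text (hypothetical helper, not TagCrowd's method)."""
    words = re.findall(r"[a-z']+", text.lower())
    present = set(words)
    counts = Counter()
    for word in words:
        # Naive plural folding (an assumption): drop a trailing "s" only
        # when the shorter form also appears somewhere in the text.
        if merge_plurals and word.endswith("s") and word[:-1] in present:
            word = word[:-1]
        # Skip anything on the exclusion list.
        if word not in exclude:
            counts[word] += 1
    return counts

# Made-up sample text, not real field data.
sample = "The guys really like Coach. One guy said Coach is pretty tough, really."

separate = word_frequencies(sample)
merged = word_frequencies(sample, exclude={"really", "pretty"}, merge_plurals=True)

print(separate["guy"], separate["guys"])  # "guy" and "guys" counted as different words
print(merged["guy"], merged["really"])    # plural folded into "guy"; "really" excluded
```

Both choices change the picture: folding "guys" into "guy" doubles that word's count in this sample, and excluding "really" removes it from the cloud altogether.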

There are some major positives to using TagCrowd. First, it is a really cool way of presenting data in an alternative format. It's definitely non-traditional. Second, it offers a level of transparency to the analysis. It shows the word frequency and provides insight into the raw data. E.g., if I'm arguing that "Coach" played a major role in the literacy practices of the student-athletes, you can look and see that "Coach" was one of the most frequently used terms (it doesn't come through in the cloud below b/c of some edits I made to the word list, but "Coach" was actually the most frequently appearing term throughout all of the data).

There are other positives and negatives. The positives, however, far outweigh the negatives. As a result, a version of the tag cloud below will appear in my dissertation.


1 comment:

Cro-Code said...

If you are still interested in correct frequency calculation, where both problems you mentioned as "sticky" are resolved, try the Textanz tool. Besides words, Textanz allows you to count word forms and phrases, so you can easily get a combined frequency for "guy" and "guys". You also have full control over the list of words to ignore.
If interested, just google "Textanz" and go to the product page on the Cro-Code software website.

Kind regards,
Alexander Potyomkin