Build datasets (a little) faster

For those who are not computer geniuses but still want to try an electronic approach to content analysis, the Firefox extension downTHEMall may be worth a look. It lets you selectively, automatically download all the links on a web page to a local folder.

Say, for instance, that you wanted to perform a content analysis of Google News reporting. Open your start page, then right-click and launch the extension. You then have the option of filtering out certain URLs (e.g., those containing the strings "google" or ".net") and automatically saving the rest.

It's not the perfect solution, but it might be easier (and quicker) for some of you than learning Perl or Java.
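For the curious, here's a rough sketch of the same idea in a few lines of Python: collect every link on a page, drop the ones whose URL contains an excluded string, and keep the rest for downloading. The exclusion list mirrors the "google" / ".net" example above; everything else (function names, the sample HTML) is made up for illustration.

```python
from html.parser import HTMLParser

# Strings to exclude, as in the downTHEMall example above.
EXCLUDE = ("google", ".net")

class LinkCollector(HTMLParser):
    """Gathers the href of every <a> tag fed to it."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def keep(url, exclude=EXCLUDE):
    """True if the URL contains none of the excluded strings."""
    return not any(s in url.lower() for s in exclude)

def filtered_links(html):
    """Return all links in the HTML that pass the exclusion filter."""
    parser = LinkCollector()
    parser.feed(html)
    return [u for u in parser.links if keep(u)]
```

From there, a loop over `urllib.request.urlretrieve` would save each surviving link into a local folder, which is roughly what the extension automates for you.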

Kudos to Shanna for the tip.


Jess Elliott said...

I tried Firefox's downTHEMall, and it has been nothing but hassle from day one for me. Half the time I can't even get it to work correctly. I am not computer illiterate by a long shot, but I would only recommend DTA to those who are very computer savvy. My dh, the techno junkie, likes it. It's not for your everyday computer user.

Ken said...

Thanks Jess. Yeah, we've had hiccups with it as well. My biggest gripe is that it only filters in one direction: excluding links is a simple matter, but there's currently no way to limit downloads to just the links that share a given feature.

We initially considered dTA as a quick-and-nasty way to build a local corpus of all editorials on a given topic during a given time period (n > 4,000 texts). We've mostly been able to work around the filter limitations by adjusting our original search strategies, but it's still not perfect.

Just somewhat less tedious than downloading and naming each link individually.
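Worth noting that the one-direction limitation Ken describes disappears once you script the filter yourself: an include filter is the same one-liner as an exclude filter. A hypothetical sketch, assuming the target editorials share a URL pattern (the "/editorial/" path and the sample links here are invented for illustration):

```python
import re

# Hypothetical pattern shared only by the targeted links:
# URLs with an /editorial/ path segment ending in .html.
INCLUDE = re.compile(r"/editorial/.*\.html$")

def keep(url):
    """True only for links matching the include pattern."""
    return bool(INCLUDE.search(url))

links = [
    "http://example.org/editorial/2007-03-01.html",
    "http://example.org/sports/game.html",
]
kept = [u for u in links if keep(u)]
# kept == ["http://example.org/editorial/2007-03-01.html"]
```

Flipping the `bool(...)` to `not INCLUDE.search(url)` gives you the exclude behavior, so both directions cost the same line of code.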