Generating lists of words from reading

Sometimes, I’ll do intensive reading on the computer, so as to make it a bit quicker – I have quick access to my software dictionaries, grammar websites, and the like. When doing this in the past, I’ve generally copied and pasted any words and expressions I didn’t know into a text document, so I could look them up after reading the article. The only downside to this was the constant flipping back and forth between what I was reading and the text document – lots of alt-tabbing, in other words. I figured out a geeky way to avoid doing this, which I thought might be of interest to some people.

It’s a bit like highlighting words on a page, and then magically pulling all of those words off the page and into a list. From there, you can do whatever you want with them: add definitions to the list, look them up and add them to Anki, etc.

Here’s how to do it (and please note, unless someone knows a workaround, you’ll need Microsoft Word near the end):

1. Take whatever you want to read, copy it, and paste it into a word processing app. Google Docs will work fine for the initial steps.

2. Select all of the text (Ctrl+A in Google Docs), and remove all formatting (Ctrl+\). You can also find Clear Formatting under the Format menu.

3. Go through the article, select any word you want to grab, and underline it (Ctrl+U). You can also bold it, or change it to a particular color. You can use whatever you want as your word marker as long as you’re consistent.

4. Unfortunately, I’ve found only one program so far that does the next step easily: Microsoft Word. I’m using Microsoft Word 2007; I’m not sure if older versions of the program have the option needed. For this step, copy your marked-up text and paste it into Word. Then select one of your underlined words, and use Select Text with Similar Formatting.

This will select all of the words you marked up.

5. Copy the selected text in Word, and then paste into your word processor of choice (I usually head back to Google Docs at this point).

Rather than the words showing up run together in one block, they’re automatically formatted as a list, one word / expression per line.

Hopefully, this is a bit easier and quicker than highlighting a word, copying it, going to your text editor, pasting it, and then going back to your article.
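Once you have the list, the “add them to Anki” part can be scripted as well. Here is a minimal Python sketch, not a finished tool: it takes the one-word-per-line list and turns it into tab-separated text, which Anki’s text import can read as front / back fields. The function name `words_to_anki_tsv` and the optional `glossary` dictionary are made up for illustration; any word without an entry in the glossary gets an empty back field for you to fill in later.

```python
import csv
import io


def words_to_anki_tsv(word_list_text, glossary=None):
    """Turn a one-entry-per-line word list into tab-separated front/back rows.

    `glossary` is a hypothetical dict mapping lowercase words to definitions;
    entries that are missing from it get an empty definition.
    """
    glossary = glossary or {}
    seen = set()
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for line in word_list_text.splitlines():
        word = line.strip()
        key = word.lower()
        # Skip blank lines and words already listed (ignoring case).
        if not word or key in seen:
            continue
        seen.add(key)
        writer.writerow([word, glossary.get(key, "")])
    return out.getvalue()
```

Save the returned text to a `.txt` file and import it into Anki with the tab character as the field separator.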

(If anyone knows of an open source application that has an option like Word’s “Select Text with Similar Formatting,” please let me know; I checked out Open Office’s Find and Replace options, and didn’t see anything that worked.)
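One possible workaround that skips Word entirely, offered as a rough sketch rather than a tested tool: download the marked-up document as a .docx file, and pull out the underlined text with a short Python script. A .docx file is a zip archive whose main text lives in `word/document.xml`, where underlined runs carry a `w:u` element. The function names below are made up for illustration, and the sketch assumes the underlining was applied directly to the words (as in step 3); underlining that comes from a style, such as hyperlinks, is ignored.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by WordprocessingML, the XML format inside .docx files.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def underlined_phrases(document_xml):
    """Return underlined words / expressions from the XML of a document.

    Adjacent underlined runs in the same paragraph are joined, so a
    multi-word expression comes back as a single entry.
    """
    root = ET.fromstring(document_xml)
    phrases = []
    for para in root.iter(f"{{{W}}}p"):
        current = []
        for run in para.iter(f"{{{W}}}r"):
            text = "".join(t.text or "" for t in run.findall("w:t", NS))
            u = run.find("w:rPr/w:u", NS)
            is_underlined = u is not None and u.get(f"{{{W}}}val") != "none"
            if is_underlined:
                current.append(text)
            elif text and current:
                # Ordinary text ends the current underlined stretch.
                phrases.append("".join(current).strip())
                current = []
        if current:
            phrases.append("".join(current).strip())
    return [p for p in phrases if p]


def underlined_phrases_from_docx(path):
    """Read word/document.xml out of a .docx file and extract the phrases."""
    with zipfile.ZipFile(path) as docx:
        return underlined_phrases(docx.read("word/document.xml"))
```

Calling `print("\n".join(underlined_phrases_from_docx("article.docx")))`, where `article.docx` is a placeholder file name, would print one word / expression per line, much like the pasted result in step 5.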

9 thoughts on “Generating lists of words from reading”

  1. You could also consider using a clipboard manager that remembers all your ctrl+C actions. Requires extra software but might be easier.

  2. You may use LingQ: import your text, go through it and create vocab items (“LingQs”) with the dictionaries, and use the built-in flashcards or export to your favourite program. Much easier to use.

    1. I agree that it would be easier, but it would also cost me $10 a month; I just looked, and you can’t export vocabulary as a Free member, nor can you mass delete LingQs to make room for new words. I gave LingQ a try (with a paid membership), and while I like the idea, there were enough quirks / bad design decisions that I chose to not use it.

    1. Hi Alexander,

      Yep, I have tried it; in fact, it is one of the things that led me to figuring out how to quickly pull marked words out of a text (the other one was LingQ). I like how it lets you add marked words to your lists, but I didn’t care for how there was no way to export the words.

  3. As a student of Spanish and a computer programmer at the same time, I created a website to help with problems like the one you are trying to solve. It is a mix of google-reader and google-translate. It allows you to subscribe to RSS feeds and mark words you want to learn. It memorizes the words you know and highlights the remaining ones. You can export marked words later to Anki. You can also upload documents for processing. To help me with learning, I implemented one more feature. It is completely unscientific and linguists would probably not like it. When I read English blogs (like yours), the software replaces some of the English words with Spanish ones, mixing the 2 languages. There are obvious problems with double-meaning words taken out of context, but it still helps me memorize them. The website is free to use; I only ask for a donation to activate features I have to pay for (like storage).

  4. Thanks! Nice hack.
    Now, if there were just a way to do the lookups automatically too, we could be supremely lazy. 🙂

