The top 2000 words in Russian

I came across a good resource for learners of Russian just now: this page, which has the top 2000 words used in modern Russian. The words provided are based on The frequency dictionary for Russian.

According to the frequency dictionary, the 2000 most used words in Russian account for 72% of the word forms used in texts, so if you learn these, you’ll be well on your way to being able to (slowly) work your way through many Russian texts. The site provides lists of the words along with their usage frequency, their parts of speech, and, of course, their translations. Quizzes are also available for all of the words.

While the frequency dictionary page doesn’t offer any definitions, it does offer lists of Russian words beyond the top 2000. One list has “32,000 words with frequency greater than 1 ipm (one instance per million).” A second list has the 5,000 most often used words in Russian. I’d say the latter would be more useful for learners of Russian.

There’s a bit of “interesting” data on the frequency dictionary page which I enjoyed reading:

  • The average word length is 5.28 characters.
  • The average sentence length is 10.38 words.
  • 1000 most frequent lemmas cover 64.0708% of word forms in texts.
  • 2000 most frequent lemmas cover 71.9521% of word forms in texts.
  • 3000 most frequent lemmas cover 76.6824% of word forms in texts.
  • 5000 most frequent lemmas cover 82.0604% of word forms in texts.

I think it’s interesting to note that the first 2000 words get you to 72%, and yet learning another three thousand words will only gain you another 10 percentage points. Diminishing returns, indeed. :)
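To put rough numbers on that, here is a small Python sketch that works out how much coverage each extra block of lemmas buys. The percentages are the ones quoted above; the names `coverage` and `marginal_gains` are just my own labels for illustration, and the "per 1000 lemmas" figure assumes the gain is spread evenly within each step, which is only an approximation.

```python
# Coverage figures quoted above: number of lemmas -> % of word forms covered.
coverage = {1000: 64.0708, 2000: 71.9521, 3000: 76.6824, 5000: 82.0604}


def marginal_gains(cov):
    """Return (start, end, gain_in_points, gain_per_1000_lemmas) for each step."""
    steps = []
    prev_n, prev_pct = 0, 0.0
    for n in sorted(cov):
        gain = cov[n] - prev_pct                     # extra percentage points
        per_thousand = gain / ((n - prev_n) / 1000)  # averaged over the step
        steps.append((prev_n, n, round(gain, 4), round(per_thousand, 4)))
        prev_n, prev_pct = n, cov[n]
    return steps


for start, end, gain, per_k in marginal_gains(coverage):
    print(f"{start:>4}-{end:<4} lemmas: +{gain:.2f} points "
          f"({per_k:.2f} per 1000 lemmas)")
```

On these figures, the first thousand lemmas are worth about 64 points, the second thousand about 8, and by the 3000-5000 range each extra thousand adds fewer than 3.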

This entry was posted in Resources, Russian, Vocabulary.

5 Responses to The top 2000 words in Russian

  1. Jon says:

    Hi Josh, excellent blog mate! Just found it a few nights ago and have been reading through.

    I’ve often thought about frequency lists and have tried to gauge how many words are necessary to have a strong foundation in a language. Originally I thought 70% (about 2000 words) might be adequate, but I know people from my Russian classes who know many more and still have great difficulty following natural conversations, movies, radio etc.

    For example (and this is just a really rough example!), the following sentence has 20% of the words removed and subsequently loses its meaning:

    Olya doesn’t like ????, she prefers ???? because it is easier to ???? and doesn’t make her feel ????.

    The original might have gone something like:

    Olya doesn’t like beer, she prefers vodka because it is easier to swallow and doesn’t make her feel too full.

    or:

    Olya doesn’t like vodka, she prefers mineral water because it is easier to drink and doesn’t make her feel dizzy.

    Of course these examples are quite artificial, but you get my drift.

    So in reading around the net, there seem to be some opinions that about 10,000 words make a good base. But that’s a heck of a lot of words and a big chunk of neural real estate (well, at least in my old brain!). Not to mention the time investment involved – I can do about 2 new words per day, on a consistent basis, so to reach the goal of 10,000 would take me approximately 14 years!!! :)

    But on your advice I’ve recently installed Anki (great word learning software), so hopefully I can reduce that down to 10 years or so :)

    All the best,

    Jon.

  2. Josh says:

    Jon: Welcome to the blog, and thanks for commenting! I too thought (I read it somewhere, actually) that knowing 2000 core words was enough to follow basic conversations and whatnot. After learning German for so long now, I don’t agree with the 2000 thing. I’m not sure how many words I know in German, but I’d bet my left arm I know more than 2000 words, and I still struggle through many news articles.

    10,000 words is indeed a large number, but I have to admit that after my experiences with German, I’d say 10,000 is closer to the mark than 2,000. And, agreed – the idea of needing to know 10,000 words as a “good base” in a language is daunting. But, really, it’s not that surprising. I’ve read that the average (whatever that means, heh) person knows 10,000-20,000 words. This site says that one fellow figured that the average college graduate might have an active vocabulary of 60,000 words, and a passive one of 75,000. I don’t know if that’s too high or not, but I do know that 2,000 words just isn’t going to cut it.

    Glad you’re liking Anki. It’s a welcome change from SuperMemo.

  3. Petr says:

    Hi, the frequency list for Russian that I think you’re referring to is now located here (the site that you link gives me 404):
    http://www.comp.leeds.ac.uk/ssharoff/frqlist/frqlist-en.html
    (and it might as well be the original location for that data)

    Oh yeah, and Anki is great. I also recommend the SWAC plugin for Anki; it will make Anki talk to you. You select words that you want to know, and it generates recognition and recall audio cards for you. (If the given word is in the database. Not everything is.)

    And I’ve also done that ten-year-math that you mention, and it somehow took my enthusiasm away. Russian is nowhere near… Now I’m trying to keep my Anki unseen cards buffer always full, and get my 50 cards a day, which should map to something like 5-10 words per day (verbs easily take 7 cards each, plus 2 SWAC cards, etc.). Let’s see if I can keep that pace for more than a couple weeks.

    Meh, languages are hard. And I bet that Russian is eating away my English!

  4. xxd says:

    I reckon the 10,000 words is correct. I don’t see, however, that this is insurmountable.
    With about an hour a day of effort you can get 100-200 words entered in, using a dictionary and the Wiktionary word frequency list in English if you don’t have a native language frequency list. You can get your 10,000 words entered into Anki in less than six months and complete the memorisation in another six months for about an hour’s worth of effort per day. In the second year you’ll have much less work to do in Anki and can use the time to watch movies or read books or practise with native speakers at a language swap. I reckon 600-1000 hours of effort is all you need to reach acceptable levels of fluency.

    An interesting side note: I’m currently learning French in Anki and I’m not attempting to learn written French. I’m doing purely mp3 words culled from a frequency list and listening to Anki. No grammar at all. Interestingly, I’m learning much faster than when I learned Spanish (in which I’m now 100% fluent) using SuperMemo.

    I think the brain is hard wired for learning spoken languages and you set yourself up for failure trying to force written language on top before you’re ready.

    Anecdotal evidence, I know, but I reckon I’m already 80% of the way there with about six months of effort in so far. I reckon I’ll be functionally fluent by March 2011.

  5. Robert Dupuy says:

    I suppose my base of words in Russian is well over 5000. I tried to count a couple of years ago, and it was over 3,000 then.

    But I agree with the notion that you only know 80%, and even then the 80% doesn’t feel like 80%. Yes, you know the common words; take a word like “and”: it is in every text, but you almost don’t even want to count such a word.

    As an earlier commenter said, the words that you need to know, to understand the meaning – it’s devastating when you don’t have them.

    So, I’ll just agree with everyone else: 10,000 is about right. The author of the 10,000 words in frequency order book said he thought 8,000 would be sufficient, so 8k to 10k, somewhere about there. I have to look up words almost every single day, to read any type of book or newspaper, or to chat in a chat room. With 5,000 words! But then again, if you need 8,000, I guess I’m still missing several thousand, and quite a chunk percentage-wise, so I shouldn’t be surprised.

    I’m not convinced that word order lists are necessary, but they might be…. I’ve been learning for 8 years; admittedly, my progress is slow because it’s a hobby for me. But I recently met a fellow who’s been learning for only 3 and he’s got the full 10,000 words in his head already….he swears by it, so maybe there is something to it.

    But I also think if you are reading, say, 20 to 30 books a year, you are probably getting the most frequent words anyway – just because you looked them up so many times. Hey, I’m slow, but by the time I’ve looked up a word 300 times or so – even I’m going to remember it.


The top 2000 words in Russian | Language Geek
