The top 2000 words in Russian

I came across a good resource for learners of Russian just now: this page, which has the top 2000 words used in modern Russian. The words provided are based on The frequency dictionary for Russian.

According to the frequency dictionary, the 2000 most used words in Russian account for 72% of the word forms used in texts, so if you learn these, you’ll be well on your way to being able to (slowly) work your way through many Russian texts. The site provides lists of the words along with their usage frequency, their parts of speech, and, of course, their translations. Quizzes are also available for all of the words.

While the frequency dictionary page doesn’t offer any definitions, it does offer lists of Russian words beyond the top 2000. One is a list of “32,000 words with frequency greater than 1 ipm (one instance per million).” The other is a list of the 5,000 most frequently used words in Russian. I’d say the latter would be more useful for learners of Russian.

There’s a bit of “interesting” data on the frequency dictionary page which I enjoyed reading:

  • The average word length is 5.28 characters.
  • The average sentence length is 10.38 words.
  • 1000 most frequent lemmas cover 64.0708% of word forms in texts.
  • 2000 most frequent lemmas cover 71.9521% of word forms in texts.
  • 3000 most frequent lemmas cover 76.6824% of word forms in texts.
  • 5000 most frequent lemmas cover 82.0604% of word forms in texts.

I think it’s interesting to note that the first 2000 words get you to 72%, and yet learning another three thousand words will only gain you another 10%. Diminishing returns, indeed. 🙂
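
To put a rough number on those diminishing returns, here is a minimal Python sketch that uses only the four coverage figures listed above. The function name `marginal_gains` is just an illustrative choice, and the "points per 1000 lemmas" measure is my own way of slicing the data, not something taken from the frequency dictionary page.

```python
# Coverage figures quoted above: number of most frequent lemmas -> % of word forms covered.
coverage = {1000: 64.0708, 2000: 71.9521, 3000: 76.6824, 5000: 82.0604}


def marginal_gains(coverage):
    """Return (start, end, percentage points gained per 1000 lemmas) for each step."""
    rows = []
    prev_size, prev_cov = 0, 0.0
    for size in sorted(coverage):
        gain = coverage[size] - prev_cov
        # Normalise the gain to a step of 1000 lemmas so the steps are comparable.
        per_thousand = gain / ((size - prev_size) / 1000)
        rows.append((prev_size, size, round(per_thousand, 2)))
        prev_size, prev_cov = size, coverage[size]
    return rows


for start, end, gain in marginal_gains(coverage):
    print(f"{start:>4}-{end:<4}: {gain:.2f} points per 1000 lemmas")
```

By this measure the first thousand lemmas are worth about 64 points of coverage, the second thousand about 8, the third about 5, and each thousand between 3000 and 5000 less than 3.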

15 thoughts on “The top 2000 words in Russian”

  1. Hi Josh, excellent blog mate! Just found it a few nights ago and have been reading through.

    I’ve often thought about frequency lists and have tried to gauge how many words are necessary to have a strong foundation in a language. Originally I thought 70% (about 2000 words) might be adequate, but I know people from my Russian classes who know many more and still have great difficulty following natural conversations, movies, radio etc.

    For example (and this is just a really rough example!), the following sentence has 20% of the words removed and subsequently loses its meaning:

    Olya doesn’t like ????, she prefers ???? because it is easier to ???? and doesn’t make her feel ????.

    The original might have gone something like:

    Olya doesn’t like beer, she prefers vodka because it is easier to swallow and doesn’t make her feel too full.

    or:

    Olya doesn’t like vodka, she prefers mineral water because it is easier to drink and doesn’t make her feel dizzy.

    Of course these examples are quite artificial, but you get my drift.

    So in reading around the net there seem to be some opinions that about 10,000 words make a good base. But that’s a heck of a lot of words and a big chunk of neural real estate (well, at least in my old brain!). Not to mention the time investment involved – I can do about 2 new words per day, on a consistent basis, so to reach the goal of 10,000 would take me approximately 14 years!!! 🙂

    But on your advice I’ve recently installed Anki (great word learning software), so hopefully I can reduce that down to 10 years or so 🙂

    All the best,

    Jon.

    1. Hi,

      Can someone give me some info on how to get these top 10,000 used Russian words and how to implement them in Anki? Also, is there going to be much of a difference between written words and spoken words frequency lists?

      I would greatly appreciate the help.

      Alex

      1. Hi Alex,

        I imagine the top 10,000 words thing is referring to the Russian Learner’s Dictionary, by Nicholas Brown. It consists of 10,000 words sorted by frequency. You can see it here:

        http://www.amazon.com/Russian-Learners-Dictionary-Words-Frequency/dp/0415137926?tag=langgeek0d-20

        As for getting them into Anki, I don’t have a deck of these words, but it looks like someone else has already done all of the legwork in getting the words into a digital format. See this page for links to a Google spreadsheet with the words:

        http://www.reddit.com/r/russian/comments/289wba/10000_most_common_russian_words_in_spreadsheet/

        Additionally, the same user has set up 2 Memrise courses (parts 1 and 2) for these 10,000 words. You can see them here and here.

        As for your question about differences between written and spoken – yes, there will be some, and it largely comes down to what corpus is used to compile the word lists for the frequency dictionary. However, learning even the first few thousand is still a fine approach, as most of the vocabulary will overlap. Whether written or spoken, most of the basic words are going to be the same. If you come across a clearly odd word in the list that you can’t imagine ever needing in everyday usage, just skip it. 🙂

  2. Jon: Welcome to the blog, and thanks for commenting! I too thought (I read it somewhere, actually) that knowing 2000 core words was enough to follow basic conversations and what not. After learning German for so long now, I don’t agree with the 2000 thing. I’m not sure how many words I know in German, but I’d bet my left arm I know more than 2000 words, and I still struggle through many news articles.

    10,000 words is indeed a large number, but I have to admit that after my experiences with German, I’d say 10,000 is closer to the mark than 2,000. And, agreed – the idea of needing to know 10,000 words as a “good base” in a language is daunting. But, really, it’s not that surprising. I’ve read that the average (whatever that means, heh) person knows 10,000-20,000 words. This site says that one fellow figured that the average college graduate might have an active vocabulary of 60,000 words, and a passive one of 75,000. I don’t know if that’s too high or not, but I do know that 2,000 words just isn’t going to cut it.

    Glad you’re liking Anki. It’s a welcome change from SuperMemo.

  3. Hi, the frequency list for Russian that I think you’re referring to is now located here (the site that you link gives me 404):
    http://www.comp.leeds.ac.uk/ssharoff/frqlist/frqlist-en.html
    (and it might as well be the original location for that data)

    Oh yeah, and Anki is great. I also recommend the SWAC plugin for Anki; it will make Anki talk to you. You select words that you want to know, and it generates recognition and recall audio cards for you. (If a given word is in the database. Not everything is.)

    And I’ve also done that ten-year-math that you mention, and it somehow took my enthusiasm away. Russian is nowhere near… Now I’m trying to keep my Anki unseen cards buffer always full, and get my 50 cards a day, which should map to something like 5-10 words per day (verbs easily take 7 cards each, plus 2 SWAC cards, etc.). Let’s see if I can keep that pace for more than a couple weeks.

    Meh, languages are hard. And I bet that Russian is eating away my English!

  4. I reckon the 10,000 words is correct. I don’t see, however, that this is insurmountable.
    With about an hour a day of effort you can get 100-200 words entered in, using a dictionary and the Wiktionary word frequency list in English if you don’t have a native language frequency list. You can get your 10,000 words entered into Anki in less than 6 months and complete the memorisation in another six months for about an hour’s worth of effort per day. In the second year you’ll have much less work to do in Anki and can use the time to watch movies or read books or practise with native speakers at a language swap. I reckon 600-1000 hours of effort is all you need to reach acceptable levels of fluency.

    An interesting side note: I’m currently learning French in Anki and I’m not attempting to learn written French. I’m doing purely mp3 words culled from a frequency list and listening to Anki. No grammar at all. Interestingly, unlike when I learned Spanish using SuperMemo (which I’m now 100% fluent at), I’m learning much faster.

    I think the brain is hard wired for learning spoken languages and you set yourself up for failure trying to force written language on top before you’re ready.

    Anecdotal evidence, I know, but I reckon I’m already 80% of the way there with about six months of effort in so far. I reckon I’ll be functionally fluent by March 2011.

  5. I suppose my base of words in Russian is well over 5000. I’ve tried to count a couple years ago, and it was over 3,000 then.

    But I agree with the notion that you only know 80%, and even then the 80% doesn’t feel like 80%. Yes, you know the common words. Take a word like “and”: it is in every text, but you almost don’t even want to count such a word.

    As an earlier commenter said, the words that you need to know, to understand the meaning – it’s devastating when you don’t have them.

    So, I’ll just agree with everyone else, 10,000 is about right. The author of the 10,000 words in frequency order book, said he thought 8,000 would be sufficient, so 8k to 10k, somewhere about there. I have to look up words almost every single day, to read any type of book, newspaper, or chat in a chat room. With 5,000 words! But then again, if you need 8,000 I guess I’m still missing several thousand and quite a chunk percentage wise, so I shouldn’t be surprised.

    I’m not convinced that word order lists are necessary, but they might be…. I’ve been learning for 8 years; admittedly, my progress is slow because it’s a hobby for me. But I recently met a fellow who’s been learning for only 3 and he’s got the full 10,000 words in his head already….he swears by it, so maybe there is something to it.

    But I also think if you are reading, say 20 to 30 books a year, you probably are getting the most frequent words anyway – just because you looked them up so many times. Hey I’m slow, but by the time I’ve looked up a word 300 times or so – even I’m going to remember it.

  6. Hey, great blog!
    Google sent me your way (“Russian” and “anki”) and I’m forced to wonder – do you (or anyone you know) share your Russian deck??
    Thanks and all the best!
    Laura

  7. Great blog. A couple of things: Some languages are easier to learn without reading, others not so much. Russian, for me, is easier with the reading; it helps to “see” the endings so I internalize the grammar. Word frequency lists or vocabulary flash cards are only a small part of language learning. You will almost have to relearn the high frequency words when you start to understand things like grammar, time, situation, appropriateness, etc. Using multiple methods seems to be best for me. What is awesome is when I learn words organically, as in unintentionally. That is harder to plan than word lists, however, and word lists do give you that confidence boost. But diminishing returns is a wake-up call to people who parse out language acquisition in equally incremental chunks.

  8. I would go with a 2,000 word vocabulary. I have functioned with a much smaller language vocabulary – 2 words in Japanese when I went to visit there. Under 100 for my first visit to Ukraine (Russian), and probably 5,000 when I go to Spanish countries.

    If you want a larger vocabulary, I would suggest you focus on getting the first 2,000 words into active vocabulary. Then worry about whether you need to learn how to say, “liver, kidney, big toe, hair follicle, lug bolt, pilot, nuclear bomb, etc.”

    Russian is just plain hard. Simplifying how you learn will help, but it will not make it easy.

    Thank you for your blog, I was just ordering his CDROM. No need to order the CDROM if it is online and free. Save me $100 ….

    Thanks

    Wayne
    luvsiesous.com

  9. It all depends on what you want to do with Russian (or any other foreign language). If you want to read Pushkin in Russian or to read academic texts, you’ll need loads of words. If you want to have a chat with a few Russian friends about football, you’ll need far fewer words. So the whole 10,000 or 2,000 word thing is a bit of a red herring. You simply need the words that allow YOU to do what YOU want with the language. And the things you want to do with a foreign language are probably the same things you like to do with your mother tongue.

    I’d say that beyond 2000 words, your time is going to be more usefully used doing other things than learning words. Even the top 1000 words in Russian includes things like ‘colonel’ and ‘citizen’, words I very rarely use in English never mind Russian.

    I think Wayne and Bibes’ advice above is spot on.

    By the way, Memrise.com is another great flashcard type website.

    1. Matt,

      Thanks for your comment. Anki and Memrise are very different animals. Anki revolves around the idea of spaced repetition, and you make (or import) cards that you like, whether they’re direct word translations, sentence items, or something else entirely. Memrise focuses on memorization through mnemonics – humorous examples, images, etc. I’ve used both, and tend to prefer Anki, simply because at a certain point, I find the act of coming up with a mnemonic for every word I want to learn to be rather cumbersome and time-consuming, and not all that helpful. For a lot of words, I don’t need anything special to remember them. Lots of people like the site, though, so give it a try if you never have.

      Thanks, also, for the PDF! That is indeed quite handy. That will definitely be printed out and added to a binder!
