Beyond the Average: How Many Kanji and Vocabulary Do You Really Need to Understand Japanese?
Axogo Research Team
For anyone learning Japanese, the question is inevitable: How much do I need to learn to finally understand things?
The standard answer usually involves statistics based on word frequency. You'll often hear statements like: "If you learn the 2,000 most frequent kanji, you'll understand 90% of any average Japanese text."
While technically true, this approach is fundamentally misleading. Understanding 90% of the words in a sentence often means not understanding the full meaning of the sentence. And when you're trying to read a manga, watch an anime, or comprehend a news article, those missing words are the ones that make or break your comprehension.
This is the central problem of relying on frequency averages: the words that make up the "other 10%" are often the crucial subject, the complex verb, or the key modifier.
Frequency is calculated over every running word in a corpus, so common particles and basic verbs dominate the count. Knowing 90% of the words in a text therefore doesn't mean you can actually read it. Consider this sentence:
明日の試験に合格できるかどうか分からない。
(Ashita no shiken ni gōkaku dekiru ka dō ka wakaranai.)
(I don’t know if I’ll pass tomorrow’s exam.)
If you don't know the word 合格 (gōkaku, passing an exam), the sentence collapses. You might have understood 90% of the tokens (words and particles) in the sentence, but that wasn't enough to grasp the main idea. When a text is truly challenging, it's not the repeated, common particles or basic verbs that trip you up; it's the rarer, content-bearing vocabulary.
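The gap between word-level and sentence-level understanding can be checked on the example sentence. This is a minimal sketch: the segmentation into ten tokens is done by hand and is an assumption (a real tokenizer may split the sentence differently), and the `known` set is a hypothetical learner vocabulary that lacks only 合格.

```python
# Hand-segmented tokens for the example sentence (assumed segmentation).
tokens = ["明日", "の", "試験", "に", "合格", "できる", "か", "どう", "か", "分からない"]

# Hypothetical learner vocabulary: every token in the sentence except 合格.
known = {"明日", "の", "試験", "に", "できる", "か", "どう", "分からない"}


def token_coverage(tokens, known):
    """Fraction of running tokens that the learner knows."""
    return sum(t in known for t in tokens) / len(tokens)


def fully_known(tokens, known):
    """True only if every token in the sentence is known."""
    return all(t in known for t in tokens)


coverage = token_coverage(tokens, known)
print(f"token coverage: {coverage:.0%}")
print(f"sentence fully known: {fully_known(tokens, known)}")
```

Under this segmentation, nine of the ten tokens are known, yet the sentence-level check fails because the one missing token carries the main idea.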
To find a more accurate answer, we built a massive database and changed the evaluation metric from "words understood" to "sentences understood."
We compiled a database of over 120 million unique Japanese sentences spanning a broad range of domains, including:
Entertainment: Anime, movies, and manga scripts.
Academic/News: Wikipedia articles, news reports, and educational texts.
Literature: Books and general articles.
Using this corpus, our goal wasn't to see what percentage of words we could cover, but to determine the minimum kanji and vocabulary needed to fully understand a given percentage of the sentences in the database.
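The selection procedure behind these numbers is not described in this article. As an illustration only, one simple way to approximate "the smallest vocabulary that makes a target share of sentences fully known" is to learn words in descending frequency order and stop once the target is reached. The function name `min_vocab_for_target` and the toy corpus are hypothetical, and this frequency-order heuristic does not guarantee a true minimum.

```python
from collections import Counter


def min_vocab_for_target(sentences, target):
    """Learn words by descending frequency until `target` of sentences are fully known.

    `sentences` is a list of lists of base-form words; `target` is a fraction
    between 0 and 1. Returns the learned words in the order they were added.
    """
    # Count in how many sentences each word appears.
    freq = Counter(w for s in sentences for w in set(s))
    # Most frequent first; ties broken by the word itself for a stable order.
    order = [w for w, _ in sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))]

    known, learned = set(), []
    for word in order:
        covered = sum(all(w in known for w in s) for s in sentences)
        if covered / len(sentences) >= target:
            break
        known.add(word)
        learned.append(word)
    return learned


# Toy corpus of four pre-segmented sentences (hypothetical data).
corpus = [
    ["私", "は", "学生", "です"],
    ["私", "は", "行く"],
    ["試験", "に", "合格", "する"],
    ["私", "は", "試験", "に", "行く"],
]

half = min_vocab_for_target(corpus, 0.5)
everything = min_vocab_for_target(corpus, 1.0)
print(len(half), "words for 50% of sentences;", len(everything), "words for 100%")
```

Even on this toy corpus, covering the second half of the sentences takes almost as many new words as the first half, because each remaining sentence depends on words that appear only once.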
It's important to note a critical feature of our word counts: the vocabulary numbers provided represent base forms (or "lemmas"). They do not account for conjugations or derived words. For instance:
Verbs: The base verb する (suru, to do) is counted as one word, regardless of whether it appears as します (shimasu), した (shita), or され (sare).
Adjectives/Adverbs: The adjective 速い (hayai, fast) and the corresponding adverb 速く (hayaku, quickly) are counted as a single entry.
Derived Nouns: A noun like 祭り (matsuri, festival) derived from the verb 祭る is counted as a single word entry.
This means the number of words in this research is a pure count of the unique concepts and roots you need to master, not the total number of inflected forms you will encounter.
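The base-form counting described above can be sketched as follows. The `LEMMAS` table is a hypothetical hand-written lookup built from the examples in the text; a real pipeline would obtain base forms from a morphological analyzer instead.

```python
# Hypothetical surface-form -> base-form table, using examples from the text.
LEMMAS = {
    "します": "する",
    "した": "する",
    "され": "する",
    "速く": "速い",
}


def count_base_forms(surface_forms):
    """Count unique base forms; unmapped forms are treated as their own base form."""
    return len({LEMMAS.get(form, form) for form in surface_forms})


seen = ["する", "します", "した", "され", "速い", "速く"]
print(len(set(seen)), "surface forms ->", count_base_forms(seen), "base forms")
```

Here six distinct surface forms collapse to two entries, which is why the word counts in this research are smaller than the number of inflected forms a learner will meet.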
Our methodology was deliberately strict to reflect true understanding:
Full Comprehension (Readability): A sentence is considered fully readable only if every single kanji and vocabulary word within it is known.
Near-Comprehension (Guessability): A sentence is considered guessable if it contains exactly one unknown word, and that word is composed entirely of kanji that the learner already knows (allowing for a highly educated guess based on the kanji's meaning). Any sentence with more than one unknown word was immediately marked as incomprehensible.
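The two criteria above can be expressed as a small classifier. This is a sketch under stated assumptions, not the code used for the study: sentences arrive already segmented into base-form words, "kanji" means characters in the main CJK Unified Ideographs block, unknown words are counted once even if repeated, and a sentence with a single unknown word that does not meet the kanji condition is treated as incomprehensible. The names `is_kanji` and `classify` are hypothetical.

```python
def is_kanji(ch):
    """True for characters in the main CJK Unified Ideographs block (assumed definition)."""
    return "\u4e00" <= ch <= "\u9fff"


def classify(words, known_words, known_kanji):
    """Label a segmented sentence 'readable', 'guessable', or 'incomprehensible'."""
    kanji_in_sentence = {ch for w in words for ch in w if is_kanji(ch)}
    all_kanji_known = kanji_in_sentence <= known_kanji
    # Distinct unknown words, in order of first appearance.
    unknown = [w for w in dict.fromkeys(words) if w not in known_words]

    if not unknown and all_kanji_known:
        return "readable"
    # Exactly one unknown word, written only in kanji, all of them known.
    if len(unknown) == 1 and all_kanji_known and all(is_kanji(c) for c in unknown[0]):
        return "guessable"
    return "incomprehensible"


sentence = ["明日", "の", "試験", "に", "合格", "できる", "か", "どう", "か", "分からない"]
vocab = set(sentence) - {"合格"}
kanji = {"明", "日", "試", "験", "合", "格", "分"}

print(classify(sentence, vocab | {"合格"}, kanji))  # every word and kanji known
print(classify(sentence, vocab, kanji))             # 合格 unknown, its kanji known
print(classify(sentence, vocab, kanji - {"格"}))    # 合格 unknown, 格 unknown too
```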
By setting these high standards, we created a requirement that forces the vocabulary and kanji sets to cover the unique and diverse words that actually convey meaning, rather than just the most common fillers.
The results show what is required for robust, domain-spanning Japanese comprehension, and they indicate that the required vocabulary count is significantly higher than most frequency-based estimates suggest.
With about 1,500 kanji and 4,000 words, you can expect to understand about three out of every four (75%) unique sentences you encounter across a wide range of media. At this level, you can follow most conversations, get the main plot points of a show, and handle simple news.
Reaching 85% comprehension requires a substantial jump to over 6,200 words and nearly 2,000 kanji. This leap primarily covers the vocabulary needed for specific topics—the terminology for politics, the jargon for a fantasy setting, or the complex emotional language in a novel. This is where you become a functional Japanese user, rarely getting completely lost in a text.
At 95% of the unique sentences in our database, the true challenge becomes clear: you need to know over 2,500 kanji and 13,157 base vocabulary words. This vocabulary is what separates advanced learners from native-level readers. It encompasses highly specific, academic, and technical terms, including a large number of proper nouns that appear infrequently but are vital for context.
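The three milestones above also imply a rising cost per percentage point. The short calculation below uses the figures as stated in the text; since several of them are rounded ("about", "over", "nearly"), the results should be read as rough estimates.

```python
# (sentence comprehension %, base words, kanji), using the figures quoted in the text.
milestones = [(75, 4000, 1500), (85, 6200, 2000), (95, 13157, 2500)]


def cost_per_point(milestones):
    """Extra words and kanji per percentage point between consecutive milestones."""
    steps = []
    for (p0, w0, k0), (p1, w1, k1) in zip(milestones, milestones[1:]):
        span = p1 - p0
        steps.append((p0, p1, (w1 - w0) / span, (k1 - k0) / span))
    return steps


for p0, p1, words, kanji in cost_per_point(milestones):
    print(f"{p0}% -> {p1}%: ~{words:.0f} words and ~{kanji:.0f} kanji per point")
```

On these figures, the vocabulary cost per point roughly triples between the two intervals (about 220 words per point, then about 700), while the kanji cost stays near 50 per point.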
Our data confirms a crucial paradigm shift for Japanese learners:
Forget the 90% trap. Knowing 90% of the words in theory isn’t enough if you can’t get through the sentence. It's the last word that matters, not the average.
Vocabulary breadth matters most. Beyond kanji recognition, what really drives comprehension is the size of your usable, base-form vocabulary.
Expect steep costs at higher levels. After ≈ 85% comprehension, each extra percent requires a large, nonlinear jump in knowledge because you are chasing rare, highly specific words.
Your domain changes everything. Anime vocabulary is different from news vocabulary. To read across all genres, you need far broader exposure.
So instead of asking:
“How many kanji do I need to know?”
A better question is:
“With my current kanji and vocabulary, what percentage of Japanese sentences can I fully understand?”
That shift matters because reading Japanese is not about the coverage of characters in a text; it's about whether the sentences themselves make sense. With a database of over 120 million real sentences, we now have numbers that reflect just that.