The 25 most commonly used English words make up about one-third of the words in all printed material in English:
- the
- of
- and
- a
- to
- in
- is
- you
- that
- it
- he
- was
- for
- on
- are
- as
- with
- his
- they
- I
- at
- be
- this
- have
- from
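The one-third figure can be checked against any text you have on hand. The following is a minimal sketch, not a rigorous corpus study: the function name `top_k_coverage` and the simple lowercase tokenizer are assumptions made for illustration, and a different tokenizer or a small sample will give a somewhat different fraction.

```python
import re
from collections import Counter


def top_k_coverage(text: str, k: int = 25) -> float:
    """Return the fraction of word tokens covered by the k most frequent words."""
    # Crude tokenizer (an assumption): lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    counts = Counter(words)
    # Sum the occurrence counts of the k most frequent distinct words.
    covered = sum(count for _, count in counts.most_common(k))
    return covered / len(words)
```

Calling `top_k_coverage(some_long_text)` on a large sample of ordinary English prose should give a value in the neighborhood of the one-third quoted above, though the exact number depends on the text and on how words are split.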
Word frequencies follow a power law (like Pareto’s famous 80-20 rule). In this case it is Zipf’s Law, named after the linguist George Kingsley Zipf, who popularized it:
Zipf’s law states that given some corpus of natural language utterances, the frequency of any word is inversely proportional to its rank in the frequency table. Thus the most frequent word will occur approximately twice as often as the second most frequent word, which occurs twice as often as the fourth most frequent word, etc. For example, in the Brown Corpus “the” is the most frequently occurring word, and by itself accounts for nearly 7% of all word occurrences (69971 out of slightly over 1 million). True to Zipf’s Law, the second-place word “of” accounts for slightly over 3.5% of words (36411 occurrences), followed by “and” (28852). Only 135 vocabulary items are needed to account for half the Brown Corpus.
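To see how closely the Brown Corpus counts quoted above track the law, one can compare each observed count with the count an idealized Zipf curve would predict. This sketch assumes the simplest form of the law, frequency proportional to 1/rank with an exponent of exactly 1, anchored to the count of the top word. The helper name `zipf_expected` is hypothetical, and only the three counts given in the passage are used.

```python
def zipf_expected(top_count: int, rank: int) -> float:
    """Count predicted for a given rank under an idealized 1/rank law (assumed exponent 1)."""
    return top_count / rank


# Brown Corpus counts as quoted in the passage above, in rank order.
brown_counts = {"the": 69971, "of": 36411, "and": 28852}
top_count = brown_counts["the"]

rows = []
for rank, (word, observed) in enumerate(brown_counts.items(), start=1):
    expected = zipf_expected(top_count, rank)
    ratio = observed / expected  # 1.0 would be a perfect fit
    rows.append((rank, word, observed, expected, ratio))
    print(f"{rank}  {word:<4} observed={observed:>6}  expected={expected:>9.1f}  ratio={ratio:.2f}")
```

The second-place word lands within a few percent of the prediction, while the third-place word is noticeably more frequent than a pure 1/rank curve would suggest, which is why the law is usually stated as an approximation.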