The Great Noun List

Created on Friday, January 25, 2013. Last modified on Sunday, February 10, 2019.
Filed under Software, Productivity.
 

A list of frequently-used common nouns in the English language, delivered in a plain .txt format.

 

What we have here is a list of the most frequently-used common nouns (i.e. not proper nouns) in English, the largest plain list of its kind freely available on this great internet (currently storing 6,775 nouns). It covers a wide range of categories, including clothing, raw materials, professions, transportation, abstract concepts, matter, food, education, and many sundry objects.

There are many ways to collect tens of thousands of nouns from the internet (text corpora, dictionary APIs, text-mining Wikipedia), but my noun list is different because it contains only very frequently-used ones, and it has been checked line-by-line with actual human eyes to make sure that the words in it are ones that I’ve seen before. This makes the list more practical for use in software that will process normal written English. The list is an alphabetised text file with each word on a new line.
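Because the format is just one word per line, reading the list into a program takes only a few lines. The following is a minimal Python sketch, not code that ships with the list. The function names `load_nouns` and `is_alphabetised` and the file name `nouns.txt` are assumptions made for illustration.

```python
from pathlib import Path


def load_nouns(path):
    """Read a one-word-per-line text file into a list of nouns."""
    text = Path(path).read_text(encoding="utf-8")
    # Trim surrounding whitespace and skip any blank lines.
    return [line.strip() for line in text.splitlines() if line.strip()]


def is_alphabetised(words):
    """Return True if the words are in case-insensitive alphabetical order."""
    keys = [w.casefold() for w in words]
    return keys == sorted(keys)


# Example (assumes you saved the list as "nouns.txt", a hypothetical name):
# nouns = load_nouns("nouns.txt")
# print(len(nouns), is_alphabetised(nouns))
```

Keeping the words in a plain list preserves the file's order. Wrapping the result in `set(...)` would be a reasonable choice if you only need fast membership checks.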

Guidelines for word selection

  • The nouns are singular (leg versus legs, baby versus babies) except where the noun is commonly used collectively (pants versus pant, barracks versus barrack).
  • There are some abstract collective nouns like earnings or statistics. I keep these if they are commonly used.
  • There are no proper nouns such as the names of countries or people, nor are there names of racial, national, or religious groups (e.g. Albanian or Christian).
  • Some words are both nouns and verbs (swimming or climbing or gliding). I kept these words if their noun form is regularly used by people in everyday speech (“I went swimming yesterday”).

What are the usage and licensing restrictions?

There are no usage restrictions; I dedicate this list to the public domain. You don’t need to credit me or link to this page, although it would be nice if you did so that others could use the list if they wanted.

Can I distribute the list?

Yes, you can distribute the list as part of your program or project. If you want people to download the plain list for themselves, it would be best to send them to this page since I add new words all the time.

What can I use it for?

Whatevs, really! Use it to compile flashcards to teach English, fashion some sort of board game with it, or use it in software you’re programming, as I did for some auto-linking wiki software and for my random noun generator.
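As one illustration of the software use case, here is a rough Python sketch of a random noun picker. It is not the code behind my random noun generator. The function name `random_nouns` and its `count` and `seed` parameters are assumptions chosen for this example, and the short word list stands in for the full file.

```python
import random


def random_nouns(nouns, count=1, seed=None):
    """Pick `count` distinct nouns at random from a list of nouns."""
    # A private Random instance keeps the picks repeatable when a seed is given
    # without touching the global random state.
    rng = random.Random(seed)
    # sample() raises ValueError if count is larger than the list.
    return rng.sample(nouns, count)


# A tiny stand-in for the real list, using words from the guidelines above.
sample_list = ["baby", "barracks", "earnings", "leg", "pants", "statistics"]
print(random_nouns(sample_list, count=3))
```

Passing the same `seed` twice returns the same picks, which is handy for tests or for flashcard decks that need to be reproducible.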

Where did the nouns come from?

The first 4,609 words of the list came from all over the web:

  • Word lists for students and language learners
  • Lists of animals, household objects, foods, and so on
  • The Simple English word list
  • Words that I added myself over the next 6 years

In February 2019, I upped my game by harvesting nouns from the Manually Annotated Sub-Corpus and verifying my nouns with the Oxford Dictionaries API. I removed 241 non-nouns and added 2,091 new ones.

I documented the R analysis pipelines for harvesting MASC and accessing the Oxford Dictionaries API here:

  1. Getting common nouns from MASC 3.0.0 (PDF, 254 KB)
  2. Using the Oxford Dictionaries API to eliminate non-nouns (PDF, 220 KB)
  3. Merging new and verified nouns for the final list (PDF, 222 KB)
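The overall shape of that update (drop the words flagged as non-nouns, add the newly harvested nouns, and re-alphabetise) can be sketched with set operations. This is a simplified Python illustration, not a translation of the R pipelines in the PDFs. The function name `merge_noun_lists` and the toy inputs are hypothetical.

```python
def merge_noun_lists(existing, non_nouns, new_nouns):
    """Drop rejected words, add new ones, and return an alphabetised list."""
    # Remove every word that verification flagged as a non-noun.
    kept = set(existing) - set(non_nouns)
    # Add the new nouns; the set union discards duplicates automatically.
    merged = kept | set(new_nouns)
    # Sort case-insensitively so the output stays alphabetised.
    return sorted(merged, key=str.casefold)


# Toy data only; the real lists are thousands of words long.
print(merge_noun_lists(["leg", "quickly", "baby"], ["quickly"], ["pants", "baby"]))
```

In this sketch a word that appears in both `non_nouns` and `new_nouns` would be kept, so it assumes the new nouns have already been verified.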

I continue to come back and add new nouns as I think of them.

Is this a complete list of common nouns?

Oh goodness no.

“The Second Edition of the 20-volume Oxford English Dictionary, published in 1989, contains full entries for 171,476 words in current use, and 47,156 obsolete words. To this may be added around 9,500 derivative words included as subentries. Over half of these words are nouns, about a quarter adjectives, and about a seventh verbs; the rest is made up of exclamations, conjunctions, prepositions, suffixes, etc. And these figures don’t take account of entries with senses for different word classes (such as noun and adjective).”

Source: “How many words are there in the English language?”, Oxford Dictionaries

What about verbs?

Leland R. Beaumont made The Verbinator by retrieving verbs from the Open American National Corpus (a very big and detailed word list) and then reducing that massive dataset using my list of commonly-used nouns. Leland provides a PDF file full of verbs that you can use.

That's all there is, there isn't any more.
© Desi Quintans, 2002 – 2018.