What Is Named Entity Recognition (NER) and How Does It Work?

Written by Jessica Schulze • Updated on

The NER technique is used in many industries, from entertainment to health care. Learn why it’s popular and how it works in this article.


Named entity recognition (NER) is a method used in natural language processing (NLP), a subfield of artificial intelligence (AI) that relies heavily on machine learning (ML). Although it isn’t exactly a household name, named entity recognition supports much of the technology we use every day. It helps search engines return the results we seek and helps chatbots work out what our questions are about. In the following article, you can learn more about how this technique works, who uses it, and why.

What is named entity recognition?

Named entity recognition, or NER, is a process that extracts information from text. It’s also referred to as entity chunking, entity extraction, or entity identification. The goal is to locate mentions of named entities in text and sort them into predefined categories. Breaking this term down into two parts can help us better understand it:

Named Entity: A named entity is any object that can be referenced by name in text.

Recognition: NER systems are trained to recognize these objects and sort them into helpful classifications called entity types.
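
To make the idea of entity types concrete, here is a minimal sketch of how recognized entities are often represented: the matched text, its character offsets, and a label. The Entity class, the make_entity helper, and the labels "company" and "person" are illustrative assumptions, and the annotations are written by hand rather than produced by a real NER system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    text: str   # surface form as it appears in the sentence
    start: int  # character offset where the entity begins
    end: int    # character offset just past the entity's last character
    label: str  # entity type, e.g. "company" or "person"


def make_entity(source: str, surface: str, label: str) -> Entity:
    """Build an Entity by locating the first occurrence of `surface` in `source`."""
    start = source.index(surface)
    return Entity(surface, start, start + len(surface), label)


sentence = "A24 released a movie starring Mia Goth"

# Hand-labeled for illustration; a real system would predict these.
entities = [
    make_entity(sentence, "A24", "company"),
    make_entity(sentence, "Mia Goth", "person"),
]

for ent in entities:
    print(f"{ent.text!r} [{ent.start}:{ent.end}] -> {ent.label}")
```

Storing offsets alongside the label lets downstream code recover each mention from the original text by slicing, for example sentence[ent.start:ent.end].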

4 types of NER systems

  1. Dictionary-based. Dictionary-based NER systems reference terms listed in dictionaries to identify their presence in text. Dictionaries can be any collection of words related to a specific field or domain. You can create one yourself or use public sources such as databases. 

  2. Rule-based. Rule-based NER systems rely on a set of instructions for extracting named entities from text. You create these rules yourself, and they fall into two types: pattern-based rules, which relate to word forms and structure, and context-based rules like “if an honorific such as Mr. or Ms. precedes a capitalized word, then that word is likely a person’s name.” These rules can also be combined with dictionaries.

  3. Machine learning-based. Machine learning-based NER systems use statistical models to identify entity names. To develop an ML-based NER system, you must train the model on annotated documents, in which each entity mention has been labeled with its entity type. The model learns from these labeled examples to predict entities in text it has not seen before.

  4. Hybrid systems. Hybrid NER systems combine more than one of the approaches listed above. 
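
As a rough illustration of a hybrid approach, the sketch below combines a dictionary lookup with one context-based rule keyed on honorifics. The DICTIONARY contents, the HONORIFIC_RULE pattern, and the find_entities function are hypothetical names chosen for this example. It uses only Python's standard re module and is far simpler than a production system.

```python
import re

# Hypothetical mini-dictionary mapping known terms to entity types.
DICTIONARY = {
    "A24": "company",
    "Netflix": "company",
    "Spotify": "company",
}

# Context-based rule: an honorific followed by a capitalized word
# suggests that the capitalized word is a person's name.
HONORIFIC_RULE = re.compile(r"\b(?:Mr|Ms|Mrs|Dr)\.\s+([A-Z][a-z]+)")


def find_entities(text):
    """Return (offset, text, label) tuples sorted by position in the text."""
    found = []
    # Dictionary-based pass: match each known term as a whole word.
    for term, label in DICTIONARY.items():
        for match in re.finditer(r"\b" + re.escape(term) + r"\b", text):
            found.append((match.start(), match.group(0), label))
    # Rule-based pass: capture the word that follows an honorific.
    for match in HONORIFIC_RULE.finditer(text):
        found.append((match.start(1), match.group(1), "person"))
    return sorted(found)


for offset, name, label in find_entities("Ms. Goth signed a deal with A24 and Netflix."):
    print(offset, name, label)
```

The dictionary pass only finds terms someone has already listed, while the rule pass can catch unseen names but only in the narrow context it describes. That trade-off is one reason the two approaches are often combined.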

What is named entity recognition used for?

NER is especially useful for analyzing unstructured text. In the context of data sets, “unstructured” refers to the absence of a predefined organization or database format. For example, the body of an email or a news article is unstructured, whereas the same details entered into labeled database fields, such as name, date, and location, are structured. NER systems reduce the need for time- and resource-intensive human analysis, making them well suited to situations that involve large quantities of text.

Industry applications of NER

  • Customer service. NER models are used in customer service to support chatbots and organize data related to customer care. For example, a chatbot can identify the relevant entities in a user’s query to help determine context. A customer support system can route users to the appropriate departments by categorizing their complaints and matching them to resolutions.

  • Health care. Medical professionals use NER models to analyze large amounts of documentation regarding diseases, drugs, and patients. Being able to quickly identify and extract the most pertinent information from lengthy, unstructured text helps reduce research time. 

  • Finance. In the financial field, NER can be used to monitor trends and inform risk analyses. Aside from financial information such as loans and earnings reports, NER models can analyze company names and other relevant mentions on social media to monitor developments that may affect stock prices. 

  • Entertainment. Recommendation systems such as the ones you see on Netflix, Spotify, and Amazon may draw on NER models that analyze your search history and content you’ve recently interacted with.

Why we use named entity recognition in NLP

Named entity recognition systems can be used to enhance other natural language processing tasks, such as parsing. For example, NER can improve part-of-speech tagging, which is the categorization of words by their part of speech depending on context.


How does named entity recognition work?

The named entity recognition process can be broken down into five steps:

  1. Tokenization. Text must first be split into smaller pieces, called tokens, that the NER system can process. Tokens can be as small as single words or as large as whole sentences. For example, “A24 released a movie starring Mia Goth” may be split into the following word tokens: A24, released, a, movie, starring, Mia, Goth.

  2. Identification. This step is where statistical methods or semantic rules come into play. The NER system can identify entities by format or capitalization. For example, the capitalization in “Mia” and the subsequent word “Goth” indicates a proper noun.  

  3. Classification. Now that candidate entities have been identified, each one can be sorted into predefined categories. Examples of these categories may include “company,” “person,” or “location.”

  4. Contextual analysis. To improve output accuracy, NER systems use context clues. Using the previous example, “Goth” will be recognized as a last name rather than a subculture since the identification process determined it to be a proper noun and the classification process placed it under the category of “person.” 

  5. Post-processing. The post-processing phase is used to refine the NER system’s results. You might use a knowledge base to enrich the data set it’s working with or fine-tune categorization rules to resolve ambiguous or incorrect labels.
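
The five steps above can be sketched as a toy pipeline in plain Python. This is an illustration under simplifying assumptions, not how any particular NER library works: the lookup sets KNOWN_COMPANIES and KNOWN_FIRST_NAMES stand in for a trained model or curated rules, and all function names are hypothetical.

```python
import re

# Hypothetical lookup tables standing in for a trained model or curated rules.
KNOWN_COMPANIES = {"A24"}
KNOWN_FIRST_NAMES = {"Mia"}


def tokenize(text):
    """Step 1: split the text into word tokens."""
    return re.findall(r"\w+", text)


def identify(tokens):
    """Step 2: group consecutive capitalized tokens into candidate entities."""
    candidates, current = [], []
    for token in tokens:
        if token[0].isupper():
            current.append(token)
        elif current:
            candidates.append(current)
            current = []
    if current:
        candidates.append(current)
    return candidates


def classify(candidate):
    """Steps 3 and 4: label a candidate by its first token, so that a
    following token such as "Goth" inherits "person" from "Mia"."""
    if candidate[0] in KNOWN_COMPANIES:
        return "company"
    if candidate[0] in KNOWN_FIRST_NAMES:
        return "person"
    return "unknown"


def post_process(entities):
    """Step 5: drop candidates that could not be categorized."""
    return [(name, label) for name, label in entities if label != "unknown"]


def recognize(text):
    """Run all steps and return (entity text, label) pairs."""
    tokens = tokenize(text)
    entities = [(" ".join(c), classify(c)) for c in identify(tokens)]
    return post_process(entities)


print(recognize("A24 released a movie starring Mia Goth"))
# prints [('A24', 'company'), ('Mia Goth', 'person')]
```

Because capitalization alone also flags sentence-initial words such as “The,” the post-processing step here discards any candidate the classifier cannot place. A real system would rely on much richer context than a first-token lookup.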

Pros and cons of using named entity recognition systems

Advantages

  • Automates information extraction in large volumes of text.

  • Applicable in nearly every industry.

  • Helps eliminate human errors, such as oversights, during text analysis.

Disadvantages

  • Defining rules and providing NER models with vocabulary can be time-consuming.

  • Human language evolves constantly, requiring NER systems to be updated to avoid false-positive identifications.

  • The NER process does not evaluate text for truthfulness.

  • Can struggle with spelling variations and spoken word that’s been converted to text.

  • Machine learning-based NER outputs can be challenging to explain.

Learn more about named entity recognition with Coursera 

You can strengthen your knowledge of natural language processing with expert-level guidance on Coursera. In the Natural Language Processing Specialization offered by DeepLearning.AI, you’ll gain expertise surrounding named entity recognition, language models, and machine translation. By the end, you’ll have learned to build a chatbot and earned a shareable certificate for your resume. 

