Wednesday, February 5, 2025

What is natural language processing (NLP)?

Natural language processing (NLP) is a field of artificial intelligence concerned with the interaction between computers and humans through natural language.

It involves understanding the structure and meaning of human language, as well as the ways in which humans use language to communicate. 


NLP is used in a wide range of applications, including language translation, text summarization, chatbots, and sentiment analysis. It is an interdisciplinary field that draws on linguistics, computer science, and psychology, among other fields.

  • Sentiment analysis
  • Topic classification
  • Article/content recommendation
  • Search and retrieval
  • Information extraction
  • Intent detection

NLP tasks

Sentiment analysis 

  • Sentiment analysis is a natural language processing (NLP) task that involves classifying the sentiment of a given text as positive, negative, or neutral. It is often used to identify the overall opinion of a piece of writing or to identify the sentiment of a speaker or writer with respect to a particular topic or product.
  • Here is an example of sentiment analysis in action: Imagine that you are a data scientist working for a company that sells a new type of smartphone. You want to find out what people are saying about the phone online, so you scrape a large number of reviews from various websites and social media platforms.
  • To perform sentiment analysis on this data, you could use a pre-trained machine learning model that has been trained on a large dataset of labeled text data. The model might take as input a review written by a customer, and output a prediction of whether the sentiment of the review is positive, negative, or neutral.

Examples: 
"I absolutely love my new smartphone! The camera is amazing and the battery life is fantastic. I would highly recommend this phone to anyone." The model might output a prediction of "positive" sentiment, because the review is expressing overall satisfaction with the product.
"I am extremely disappointed with my new smartphone. The battery life is terrible and the camera quality is subpar." The model might output a prediction of "negative" sentiment, because the review is expressing overall dissatisfaction with the product.

Topic classification

  • Topic detection, also known as topic identification or topic classification, is an NLP task that involves automatically identifying the main topics or themes present in a text. It is often used to organize or summarize large amounts of text data by grouping similar documents together, or to provide a high-level overview of the content of a document.
  • One example of topic detection in action is a news website that uses NLP to automatically categorize articles into different topics such as politics, sports, entertainment, and business. When a user visits the website, they can filter the articles by topic to only see the ones that are relevant to their interests.
  • Another example is a social media platform that uses topic detection to identify the main themes of a user's posts and suggests related content or advertisements. For example, if a user frequently posts about their workouts and healthy eating habits, the platform may suggest fitness products or healthy recipe ideas.
  • Overall, topic detection is a useful NLP tool for organizing and understanding large amounts of text data, and it has a wide range of applications in various industries.
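The news-website example above can be sketched with a very simple keyword-overlap classifier. This is only an illustration: the topic keyword lists and the function names are assumptions made up for this sketch, and a production system would normally learn topics from labeled articles rather than hand-written lists.

```python
import re

# Hypothetical keyword lists per topic (illustrative only, not from a real dataset).
TOPIC_KEYWORDS = {
    "politics": {"election", "parliament", "minister", "vote", "policy"},
    "sports": {"match", "team", "goal", "tournament", "coach"},
    "entertainment": {"movie", "film", "actor", "music", "album"},
    "business": {"market", "shares", "profit", "company", "investors"},
}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def classify_topic(text):
    """Return the topic whose keywords occur most often, or 'unknown' if none occur."""
    tokens = tokenize(text)
    scores = {
        topic: sum(1 for token in tokens if token in keywords)
        for topic, keywords in TOPIC_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


print(classify_topic("The team scored a late goal to win the match"))
```

Filtering articles by topic then reduces to grouping them by the label that classify_topic returns.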

Search and retrieval 


Search and retrieval is an NLP application that involves finding and retrieving relevant information from a large collection of documents or data sources. For example, a search engine is a common application of search and retrieval in NLP. When you enter a query into a search engine, the search engine uses NLP techniques to understand the meaning of your query and then searches its index of web pages to find pages that are relevant to the query.

Types:

  • Keyword search: In this process, the search engine looks for documents that contain the keywords specified in the query. For example, if you enter the query "best restaurants in Gurgaon Sector 45," the search engine will retrieve a list of documents that contain the keywords "best," "restaurants," "Gurgaon," "Sector," and "45."
  • Semantic search: This technique takes into account the meaning of words and the relationships between them, rather than exact keyword matches, to provide more relevant results. It is often paired with natural language querying, which lets users enter queries in a free-form, natural language style.
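Keyword search as described above can be sketched as counting how many query keywords each document contains. The document collection and function names below are hypothetical, chosen only to mirror the Gurgaon example; real search engines use an inverted index and more refined ranking.

```python
import re

# Hypothetical mini-collection of documents (illustrative only).
DOCUMENTS = {
    "doc1": "The best restaurants in Gurgaon Sector 45 for family dinners",
    "doc2": "Top cafes near Gurgaon Sector 46",
    "doc3": "Best hiking trails around Bangalore",
}


def tokenize(text):
    """Return the set of lowercase alphanumeric tokens in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_search(query, documents):
    """Rank documents by the number of query keywords they contain."""
    query_terms = tokenize(query)
    results = []
    for doc_id, text in documents.items():
        overlap = len(query_terms & tokenize(text))
        if overlap:
            results.append((doc_id, overlap))
    # Most matching keywords first; ties broken by document id.
    results.sort(key=lambda item: (-item[1], item[0]))
    return results


print(keyword_search("best restaurants in Gurgaon Sector 45", DOCUMENTS))
```

Note that doc2 matches only on "Gurgaon" and "Sector": a pure keyword search has no way of knowing that cafes are related to restaurants.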

For the query "best restaurants in Gurgaon Sector 45," a semantic search will also return results for related terms:

  • Dining establishments
  • Cafes
  • Bistros
  • Grills

It may also return results for:

  • Sectors near Sector 45, e.g., 44 and 46 (with somewhat less relevance)
  • Places in and around Gurgaon (with less relevance)
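One way to picture the "less relevance" idea is query expansion with weights: related terms still match, but contribute a smaller score than the exact term. The related-term table and its weights below are assumptions invented for this sketch; real semantic search systems typically derive such relationships from learned word or sentence embeddings rather than a hand-written table.

```python
import re

# Hypothetical related-term table: each query term maps to related terms
# with a relevance weight (1.0 = exact match). Values are illustrative.
RELATED_TERMS = {
    "restaurants": {"restaurants": 1.0, "cafes": 0.8, "bistros": 0.8, "grills": 0.8},
    "45": {"45": 1.0, "44": 0.6, "46": 0.6},
}


def expand_query(query):
    """Map every query term (and its related terms) to a relevance weight."""
    weights = {}
    for term in re.findall(r"[a-z0-9]+", query.lower()):
        for related, weight in RELATED_TERMS.get(term, {term: 1.0}).items():
            weights[related] = max(weight, weights.get(related, 0.0))
    return weights


def semantic_score(query, document):
    """Sum the weights of all expanded query terms found in the document."""
    weights = expand_query(query)
    doc_terms = set(re.findall(r"[a-z0-9]+", document.lower()))
    return sum(weight for term, weight in weights.items() if term in doc_terms)


exact = semantic_score("restaurants sector 45", "restaurants in sector 45")
nearby = semantic_score("restaurants sector 45", "cafes in sector 46")
print(exact, nearby)
```

The document about cafes in Sector 46 still receives a score, only a lower one than the exact match, which is the behavior described in the list above.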

Information extraction 

  • Information extraction is a task in NLP that involves automatically extracting structured information from unstructured text data. This is often used to extract specific pieces of information such as names, dates, and locations from a document or text.
  • For example, consider the following text:
  • "John Smith was born on January 1, 1980 in New York City and currently lives in San Francisco. He works as a software engineer at Google."
  • Using information extraction, it would be possible to automatically extract the following structured information:

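Name: John Smith
Date of birth: January 1, 1980
Place of birth: New York City
Current residence: San Francisco
Occupation: Software engineer
Employer: Google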

  • Information extraction systems typically use a combination of rule-based approaches and machine learning techniques to identify and extract relevant information from text. The extracted information can then be stored in a structured format such as a database or spreadsheet for further analysis or processing.
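The rule-based side of information extraction can be sketched with regular expressions applied to the John Smith sentence. The patterns and field names below are assumptions written for this one example; they would not generalize to arbitrary text, which is why practical systems combine rules with machine learning.

```python
import re

TEXT = (
    "John Smith was born on January 1, 1980 in New York City and currently "
    "lives in San Francisco. He works as a software engineer at Google."
)

MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)

# Hand-written patterns tailored to the example sentence (illustrative only).
PATTERNS = {
    "name": r"^([A-Z][a-z]+ [A-Z][a-z]+) was born",
    "date_of_birth": rf"born on ((?:{MONTHS}) \d{{1,2}}, \d{{4}})",
    "place_of_birth": r"\d{4} in ([A-Z][a-z]+(?: [A-Z][a-z]+)*)",
    "residence": r"lives in ([A-Z][a-z]+(?: [A-Z][a-z]+)*)",
    "occupation": r"works as an? ([a-z ]+?) at ",
    "employer": r" at ([A-Z][A-Za-z]+)",
}


def extract(text):
    """Return a dict of field -> extracted value (None when a pattern does not match)."""
    record = {}
    for field, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        record[field] = match.group(1) if match else None
    return record


for field, value in extract(TEXT).items():
    print(f"{field}: {value}")
```

The resulting dictionary is already in a structured form that could be written to a database row or a spreadsheet.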

 Content recommendation 

  • In NLP, a content-based recommendation can be implemented by first extracting features from the text description of an item, and then using those features to compute the similarity between items. For example, if you were building a content-based recommendation system for books, you could extract features such as the genre, author, and keywords from the book's description, and then use those features to recommend other books that are similar.
  • To give a concrete example, let's say you have a book called "The Great Gatsby" by F. Scott Fitzgerald. Some features you might extract from the book's description could include:
  1. Genre: fiction, classic
  2. Author: F. Scott Fitzgerald
  3. Keywords: love, obsession, wealth, social class
  • Based on these features, you could recommend:
  1. Other books by F. Scott Fitzgerald, or
  2. Other classic fiction books that deal with similar themes such as love, obsession, and social class.
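A minimal sketch of this idea represents each book as a set of features and ranks other books by Jaccard similarity (shared features divided by all features). The features for "The Great Gatsby" come from the example above; the other catalog entries and their features are assumptions added purely for illustration, and Jaccard similarity is just one possible similarity measure.

```python
# Feature sets per book. Only the Gatsby features come from the text above;
# the remaining entries are hypothetical catalog data for this sketch.
BOOKS = {
    "The Great Gatsby": {
        "fiction", "classic", "f. scott fitzgerald",
        "love", "obsession", "wealth", "social class",
    },
    "Tender Is the Night": {
        "fiction", "classic", "f. scott fitzgerald", "love", "wealth",
    },
    "Pride and Prejudice": {
        "fiction", "classic", "jane austen", "love", "social class",
    },
    "A Brief History of Time": {
        "non-fiction", "stephen hawking", "physics", "cosmology",
    },
}


def jaccard(a, b):
    """Jaccard similarity: size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(title, catalog, k=2):
    """Return up to k titles most similar to the given one, skipping zero-similarity items."""
    target = catalog[title]
    scored = [
        (jaccard(target, features), other)
        for other, features in catalog.items()
        if other != title
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for score, other in scored[:k] if score > 0]


print(recommend("The Great Gatsby", BOOKS))
```

With this toy catalog, the book by the same author ranks first and the classic with overlapping themes ranks second, matching the two kinds of recommendation listed above.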

Intent detection

  • Intent detection is a natural language processing (NLP) task that involves identifying the intention or goal behind a user's spoken or written words. It is an important aspect of building conversational agents, such as chatbots and virtual assistants, as it helps the system understand what the user wants and how to respond appropriately.
  • Example 1
  1. User: "I wanna travel from New York to Los Angeles soon."
  2. Chatbot: "Sure, I can help you with that. When do you want to travel?"
  3. User: "I want to leave on Jan 15th and return on Jan 20th."
  4. Chatbot: "Got it. How many passengers will be traveling with you?"
  5. User: "There will be two of us."
  6. Chatbot: "OK. I have found a few options for you. There is a direct flight from New York to Los Angeles on January 15th for $300 per person, and a flight with a layover in Chicago for $250 per person. Which one would you like to book?"
  • In this example, the chatbot uses intent detection to understand that the user's intention is to book a flight from New York to Los Angeles. It then gathers additional information about the dates of travel and the number of passengers to provide the user with appropriate flight options.

  • Example 2
  1. User: "Hi, one of my checked-in bags is missing."
  2. Chatbot: "I'm sorry to hear that. Can you please provide your confirmation number and the flight details so that I can look up your reservation and assist you?"
  3. User: "My confirmation number is XYZ123 and I was on the flight from New York to Los Angeles on June 15th."
  4. Chatbot: "Thank you for the information. I have located your reservation and it appears that your luggage was checked in. I will report it as missing and begin the process for locating it. In the meantime, please make sure to fill out a property irregularity report at the airport. Is there anything else I can help with?"
  • In this example, the chatbot uses intent detection to understand that the user's intention is to report missing luggage and that they need assistance with the process. It then gathers the necessary information and tells the user how to proceed.
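The first step in both conversations, deciding what the user wants, can be sketched as a keyword-based intent detector. The intent names and keyword sets below are assumptions for this illustration; real conversational agents usually train a classifier on labeled utterances instead of matching fixed keywords.

```python
import re

# Hypothetical intents and trigger keywords (illustrative only).
INTENT_KEYWORDS = {
    "book_flight": {"travel", "fly", "flight", "book", "ticket"},
    "report_missing_baggage": {"baggage", "luggage", "bag", "missing", "lost"},
}


def detect_intent(utterance):
    """Return the intent with the most keyword matches, or 'unknown' if nothing matches."""
    tokens = set(re.findall(r"[a-z]+", utterance.lower()))
    scores = {
        intent: len(tokens & keywords)
        for intent, keywords in INTENT_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


print(detect_intent("I wanna travel from New York to Los Angeles soon."))
print(detect_intent("Hi, one of my checked in baggage is missing."))
```

Once the intent is known, the chatbot can ask for the details that intent requires, such as travel dates for a booking or a confirmation number for a baggage report.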

Why do we need NLP – some examples

1. Language translation: Many tech companies, such as Google and Microsoft, have developed NLP-based language translation tools that can automatically translate text and speech from one language to another. These tools are used in a variety of applications, including the translation of websites and documents, as well as the real-time translation of conversations.

2. Text summarization: NLP algorithms can be used to automatically summarize long texts by extracting the most important points and presenting them in a shorter form. This can be useful for tasks such as creating summaries of news articles or long documents.

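i) Extractive: selects the most important sentences directly from the source text.

ii) Abstractive: generates new sentences that paraphrase the key ideas of the source text.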

3. Chatbots: Many tech companies have developed chatbots that use NLP to understand and respond to user requests and questions. These chatbots can be used in customer service, for example, to answer common questions or help users navigate a website.

4. Sentiment analysis: NLP algorithms can be used to analyze the sentiment of text, which can be useful for tasks such as analyzing customer feedback or understanding the sentiment of social media posts about a particular topic.

5. Information extraction: NLP algorithms can be used to automatically extract important information from text documents, such as names, dates, and locations. This can be useful for tasks such as building databases of information or extracting data for analysis.

Let's take an example: sentiment analysis (SA)

I love the movie, it was amazing!! – pos

The music was terrible, could not bear it! – neg

Topics: movies, films, entertainment

The screen is great, sound quality is good too. – pos

The battery drains so fast, not a flagship phone!! – neg

Topics: product reviews, mobiles, laptops, electronics

Lists of sentiment words:

  • Positive sentiment words: 1000 (positive lexicon)
  • Negative sentiment words: 1000 (negative lexicon)

Sentiment analysis can be approached in several ways:

1. Lexicon-based methods: These methods rely on dictionaries of words or phrases that have been annotated with sentiment labels (e.g., positive, negative, neutral). The sentiment of a given text can be determined by counting the number of positive and negative words it contains, or by looking for specific phrases that are known to convey positive or negative sentiment.

2. Machine learning-based methods: These methods involve training a machine learning model on a labeled dataset of texts and their corresponding sentiments. The model can then be used to predict the sentiment of new, unseen texts.

  • Gather training data as (sample, label) pairs
  • Preprocessing
  • Vectorization (this is the most crucial step)
  • Build an ML model
  • Predict using the above ML model
  • Report metrics/business insights

3. Rule-based methods: These methods involve defining a set of rules or heuristics that can be used to identify the sentiment of a given text. For example, a rule might be "if the text contains the word 'excellent' and no negative words, the sentiment is positive."

4. Hybrid methods: These methods combine multiple approaches, such as lexicon-based and machine learning-based methods, to improve the accuracy of sentiment analysis.

Note: While these classical NLP techniques can be effective for sentiment analysis, they have been largely superseded by more advanced techniques based on deep learning and neural networks, which can often achieve higher accuracy.
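For comparison with the classical approaches, the machine learning steps listed earlier (gather labeled pairs, preprocess, vectorize, build a model, predict, report metrics) can be sketched end to end in plain Python. The choice of a bag-of-words representation with a multinomial Naive Bayes classifier is an assumption made for this sketch, the class and function names are hypothetical, and four training reviews are far too few for a real model.

```python
import math
import re
from collections import Counter, defaultdict

# Step 1: gather (sample, label) pairs. These reuse the example reviews above.
TRAIN = [
    ("I love the movie, it was amazing!!", "pos"),
    ("The screen is great, sound quality is good too.", "pos"),
    ("The music was terrible, could not bear it!", "neg"),
    ("The battery drains so fast, not a flagship phone!!", "neg"),
]


def preprocess(text):
    """Step 2: lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def vectorize(tokens):
    """Step 3: bag-of-words vector (word -> count)."""
    return Counter(tokens)


class NaiveBayes:
    """Step 4: multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, samples):
        self.class_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in samples:
            bow = vectorize(preprocess(text))
            self.class_counts[label] += 1
            self.word_counts[label].update(bow)
            self.vocab.update(bow)
        return self

    def predict(self, text):
        """Step 5: return the label with the highest log-probability."""
        bow = vectorize(preprocess(text))
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)
            for word, count in bow.items():
                if word not in self.vocab:
                    continue  # ignore words never seen during training
                prob = (self.word_counts[label][word] + 1) / (total_words + len(self.vocab))
                score += count * math.log(prob)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, samples):
    """Step 6: fraction of samples whose predicted label matches the true label."""
    correct = sum(model.predict(text) == label for text, label in samples)
    return correct / len(samples)


model = NaiveBayes().fit(TRAIN)
print(model.predict("The movie was amazing"))
print(model.predict("The battery was terrible"))
```

In practice the metric in the last step would be computed on held-out data that the model did not see during training.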

Expanding contractions and removing special characters

Contraction expansion

I didn’t like the movie -> I did not like the movie

Special character removal

Examples of text containing special characters (punctuation, emoji) to be stripped:

I loved the music!!!

It’s pathetic!

I loved the music 🙂
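Both preprocessing steps can be sketched with regular expressions. The contraction table below is a small hypothetical sample rather than a complete list, and the function names are assumptions for this illustration. Contractions are expanded first because stripping special characters would otherwise delete the apostrophe and turn "didn't" into "didnt".

```python
import re

# Small illustrative contraction table; a real one would be much longer.
# Note: "it's" can also mean "it has"; this sketch always expands it to "it is".
CONTRACTIONS = {
    "didn't": "did not",
    "don't": "do not",
    "can't": "cannot",
    "won't": "will not",
    "it's": "it is",
    "i'm": "i am",
}

CONTRACTION_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(key) for key in CONTRACTIONS) + r")\b",
    flags=re.IGNORECASE,
)


def expand_contractions(text):
    """Replace known contractions with their expanded form, keeping initial capitals."""
    text = text.replace("\u2019", "'")  # normalize curly apostrophes to straight ones

    def replace(match):
        word = match.group(0)
        expanded = CONTRACTIONS[word.lower()]
        if word[0].isupper():
            expanded = expanded[0].upper() + expanded[1:]
        return expanded

    return CONTRACTION_PATTERN.sub(replace, text)


def remove_special_characters(text):
    """Drop everything except ASCII letters, digits, and whitespace, then tidy spaces."""
    cleaned = re.sub(r"[^A-Za-z0-9\s]", "", text)
    return re.sub(r"\s+", " ", cleaned).strip()


def clean(text):
    """Expand contractions first, then strip special characters."""
    return remove_special_characters(expand_contractions(text))


print(clean("I didn\u2019t like the movie"))
print(clean("I loved the music!!!"))
print(clean("I loved the music \U0001F642"))
```

This character filter also removes accented and non-Latin letters, so it is only suitable as written for plain English text.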
