Unveiling Natural Language Processing: Beyond the Surface of AI

By Zhang Chenxi

To many people, the layman's term "artificial intelligence" first evokes images of automation performing menial tasks, or of Deep Blue beating Kasparov at chess. Fewer people are familiar with the various branches of artificial intelligence: computer vision, machine learning, and what I shall be focusing on for the rest of this article, Natural Language Processing (NLP). As its name suggests, NLP refers to a class of problems concerned with manipulating and comprehending human language. Tasks within the realm of NLP include understanding the syntax and rules of a language, capturing similarities in connotation between words, and classifying malicious texts and fake news. Recently, deep learning models have seen huge success across a variety of NLP tasks.

Understanding NLP Fundamentals

When approaching NLP-related problems, the preliminary step is often to understand how the inputs are fed into the computer and turned into usable information. Put it this way: what gives a sentence meaning if it is just a block of words? The ordering of the words clearly matters, and grammatical rules constrain how words ought to be arranged to make logical and semantic sense. In NLP, linguistic theories such as distributional semantics, which holds that the meaning of a word is largely given by the words appearing around it, form the basis of most techniques.

When we feed a string of words into the computer, the syntax and the meaning have to be retained, so the process can be understood as a transformation from words into vectors the computer can work with. The text is first tokenised into smaller units to better assign meaning, then lemmatised to reduce each word to a meaningful base form. Stop words, which are common words that carry little meaning on their own, are also removed from the corpus. The remaining tokens are then encoded as vectors such that words with similar meanings receive similar vectors, and hence sit closer together in the vector space when visualised. A common technique is Word2vec, which slides a fixed-length window over the text and learns to predict the probability of the context words surrounding a given centre word. The cosine distance between two word vectors then reflects how semantically similar the words are.
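To make this concrete, here is a minimal sketch of that pipeline, assuming the gensim library is installed; the toy corpus, the stop word list, and the training parameters are arbitrary placeholders, and lemmatisation is left out for brevity.

    # A minimal sketch of the preprocessing + embedding pipeline described above,
    # assuming gensim is installed. Corpus and parameters are toy values.
    from gensim.models import Word2Vec

    corpus = [
        "the cat sat on the mat",
        "the dog sat on the rug",
        "cats and dogs are pets",
    ]

    stop_words = {"the", "on", "and", "are"}

    # Tokenise each sentence and remove stop words (lemmatisation is omitted here).
    sentences = [
        [w for w in line.lower().split() if w not in stop_words]
        for line in corpus
    ]

    # Skip-gram Word2vec: predict the context words within a fixed window around a centre word.
    model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1, epochs=200)

    # Cosine similarity between two word vectors approximates their semantic similarity.
    print(model.wv.similarity("cat", "dog"))

On a real corpus, related words such as "cat" and "dog" would end up noticeably closer to each other in the vector space than to unrelated words.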

Enhancing Syntactic Understanding

The next part of the discussion requires another idea from linguistics: the postulate that words depend on other words for their meaning. Part-of-speech (POS) tagging labels each word with a tag indicating its grammatical role, such as noun or adverb, and is carried out by POS classifiers. Dependency parsing is then used to make better syntactic sense of a sentence by tracing, for each word, the head word it depends on, rather similar in spirit to the union-find disjoint set (UFDS) structure. The parser takes the word embeddings described above, along with the part-of-speech tags encoded as vectors, and uses these values to compute the most likely dependency for each word. Being able to model the contextual meaning of the words in a sentence leads to another integral task in NLP: language modeling, the task of predicting the word that comes next.
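As an illustration, a library such as spaCy exposes both part-of-speech tags and dependency heads for every token. The sketch below assumes spaCy and its small English model en_core_web_sm are installed; the sentence is just a placeholder.

    # A small sketch of POS tagging and dependency parsing, assuming spaCy and its
    # small English model (en_core_web_sm) are installed.
    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("The quick brown fox jumps over the lazy dog")

    # Each token carries its part-of-speech tag, its dependency relation,
    # and the head word it depends on.
    for token in doc:
        print(f"{token.text:>6}  pos={token.pos_:<6} dep={token.dep_:<6} head={token.head.text}")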

Evolving Language Models with Transformers

Indeed, language modeling is deployed heavily in transformers, which generate the probable words that follow a given string of words using the self-attention mechanism. This is the idea behind chatbots such as the well-known ChatGPT. Prior to the 'Attention Is All You Need' paper in 2017, which revolutionised language models, the task of word prediction had primarily been assigned to recurrent neural networks (RNNs). These models take a sequence of words as input and generate a probability distribution over the next word at every time step. However, this approach has been criticised as computationally expensive and slow, because its recurrent computation is inherently sequential.
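To make the idea concrete, here is a minimal sketch of a recurrent language model in PyTorch (assumed installed); the vocabulary size, dimensions, and token indices are arbitrary toy values.

    # A minimal sketch of a recurrent language model, assuming PyTorch is installed.
    # Vocabulary size, dimensions and token indices are arbitrary toy values.
    import torch
    import torch.nn as nn

    vocab_size, embed_dim, hidden_dim = 1000, 64, 128

    embedding = nn.Embedding(vocab_size, embed_dim)
    rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)
    to_vocab = nn.Linear(hidden_dim, vocab_size)

    # A batch containing one toy sequence of word indices.
    tokens = torch.tensor([[5, 42, 7, 99]])

    hidden_states, _ = rnn(embedding(tokens))        # one hidden state per time step
    logits = to_vocab(hidden_states)                 # scores over the vocabulary
    next_word_probs = torch.softmax(logits, dim=-1)  # distribution over the next word
    print(next_word_probs.shape)                     # (1, 4, 1000): batch, time steps, vocabulary

Note that each hidden state depends on the previous one, which is exactly why the computation cannot be parallelised across time steps.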

Another problem is that, because information about the entire prefix must be squeezed into a fixed-length hidden state and passed along step by step, information from words far back in the sequence tends to be lost. This can lead to mistakes where the model generates words that violate grammatical rules established earlier in the sentence. To curb these errors, different variants of recurrent neural networks have been invented, notably the Long Short-Term Memory (LSTM) network. Its cell state allows information to be added or removed between time steps, so that relevant information can be carried over longer distances. Yet it is the invention of the attention mechanism that heralded many of the revolutions in NLP, as it allows the model to focus on particular parts of the input rather than compressing the entire string of words into a single state. Instead of consuming the input one word at a time, the transformer combines positional encodings with the attention mechanism to focus on what is most important in the input and to process the whole sequence in parallel.
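The attention computation at the heart of this is remarkably compact. The sketch below implements scaled dot-product self-attention over a toy sequence with NumPy, with random matrices standing in for the learned query, key, and value projections.

    # A sketch of scaled dot-product self-attention using NumPy. The query, key and
    # value matrices are random stand-ins for learned projections of the input tokens.
    import numpy as np

    rng = np.random.default_rng(0)
    seq_len, d_k = 5, 16                  # 5 tokens, 16-dimensional projections

    Q = rng.normal(size=(seq_len, d_k))   # queries
    K = rng.normal(size=(seq_len, d_k))   # keys
    V = rng.normal(size=(seq_len, d_k))   # values

    scores = Q @ K.T / np.sqrt(d_k)       # how strongly each token attends to every other token
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over each row

    output = weights @ V                  # weighted mix of values, computed for all tokens at once
    print(output.shape)                   # (5, 16)

Because every token attends to every other token in one matrix multiplication, there is no information bottleneck over long distances and no step-by-step recurrence.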

Crafting New Text: Natural Language Generation

While the discussion in the previous sections has largely centred on the prediction of words, another essential task in NLP is natural language generation (NLG): the ability to generate new text. During training, the decoder is fed the desired output at each step regardless of its own prediction, a technique called teacher forcing. At generation time, a decoding algorithm is then used to turn the model's output distributions into actual text. Well-known NLG tasks include summarising a document, storytelling, and poetry generation. While natural language generation techniques have improved enormously over the past decade, one limitation is the lack of a concrete metric for evaluating their output; the most prevalent method remains human ratings. The evaluation of generated text is therefore still predicated on subjective human judgement, which constrains the progress of research in this area.
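As a simple illustration of the decoding step, the sketch below performs greedy decoding: at each step it asks a model for a distribution over the next token and appends the most likely one until an end-of-sequence token appears. The function next_token_probs here is a hypothetical placeholder for a real model's output, and the token ids are arbitrary.

    # A sketch of greedy decoding from a language model. `next_token_probs` is a
    # hypothetical stand-in for whatever model produces the next-token distribution.
    import numpy as np

    EOS = 0  # assumed end-of-sequence token id

    def next_token_probs(prefix):
        # Placeholder: a real model would condition on the prefix.
        rng = np.random.default_rng(len(prefix))
        probs = rng.random(50)
        return probs / probs.sum()

    def greedy_decode(prefix, max_len=20):
        tokens = list(prefix)
        for _ in range(max_len):
            probs = next_token_probs(tokens)
            best = int(np.argmax(probs))   # greedily pick the most likely token
            tokens.append(best)
            if best == EOS:
                break
        return tokens

    print(greedy_decode([3, 17, 8]))

In practice, alternatives such as beam search or sampling-based decoding are often preferred, since greedy choices can lead to repetitive or bland text.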

NLP’s Vast Landscape and Future

I have talked about some of the most basic tasks in NLP, though this is barely the tip of the iceberg in a vast and rapidly evolving field that is certainly not restricted to LLMs. That said, the bulk of NLP research in the coming years is likely to focus heavily on LLMs, addressing their hallucination issues and reducing their computational time and cost. Beyond that, there are still open problems in NLP with the potential to change the field drastically once solved. One thing is for sure: we can be optimistic about the progress of NLP and the benefits it can bring to our lives.

References: 

  1. CS224n: Natural Language Processing with Deep Learning [Extremely Useful Resource]
  2. Foundations of Statistical Natural Language Processing, Christopher D. Manning, Hinrich Schütze
  3. Machine Learning in Automated Text Categorization, Fabrizio Sebastiani
  4. Natural Language Processing (Almost) from Scratch, Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, Pavel Kuksa
  5. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Daniel Jurafsky, James H. Martin
  6. Understanding LSTM Networks, Colah’s blog: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
  7. Long Short-Term Memory, Neural Computation, Sepp Hochreiter, Jürgen Schmidhuber
  8. Emerging Trends in NLP Research, Roberto Iriondo, https://txt.cohere.com/top-nlp-papers-april-2023/