Vectors, ontologies, embeddings and sentiment analysis: how relationships between words are the foundation of NLP
Human language follows unexpected patterns. Varied forms of syntax and metaphor, together with context and repetition, make classification challenging for both humans and machines. Manual classification of unstructured linguistic data by human curators can be valuable, but without technological assistance it is time-consuming and error-prone. Left unprocessed, such “natural language” data is of limited use. Enterprises face the challenge of transforming huge amounts of this raw data into actionable intelligence.
To meet this challenge, enterprises are turning to forms of artificial intelligence (AI) to organize information, but AI brings challenges of its own. Machine learning systems let computers generalize from a set of pre-selected examples, but they struggle to interact with humans and the content humans create. This is where natural language processing (NLP), the field concerned with building computer systems that understand human language, comes into play. NLP is being used to address information overload: computers that parse language effectively can help employees find the information they need.
Gartner lists “conversational systems” as one of its Top Ten Strategic Technology Trends for 2017. One NLP technique, word embedding, is particularly useful in sentiment analysis: the process of classifying the opinions expressed in a text, usually according to the writer’s attitude toward the subject. Sentiment analysis can be used, for example, to classify a movie review as positive or negative according to the words and sentences used to describe the film.
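To make the idea concrete, here is a minimal sketch of embedding-based sentiment classification. It assumes a tiny, hand-made vocabulary of hypothetical two-dimensional word vectors (real embeddings are learned from large corpora and have hundreds of dimensions). A “sentiment axis” is taken as the average of a few positive seed words minus the average of a few negative ones, and a review is scored by projecting its average word vector onto that axis. The names `EMBEDDINGS`, `SENTIMENT_AXIS` and `classify_review` are illustrative, not part of any library; production systems usually learn the classifier weights from labeled reviews instead of using seed words.

```python
import re

# Hypothetical 2-D word vectors, hand-made for illustration only.
# The first dimension loosely tracks sentiment, the second topicality.
EMBEDDINGS = {
    "great":     [0.9, 0.1],
    "wonderful": [0.8, 0.2],
    "loved":     [0.85, 0.15],
    "terrible":  [-0.9, 0.1],
    "boring":    [-0.7, 0.3],
    "hated":     [-0.8, 0.2],
    "movie":     [0.0, 0.9],
    "plot":      [0.05, 0.8],
    "acting":    [0.0, 0.7],
    "the":       [0.0, 0.1],
}

def mean_vector(vectors):
    """Component-wise average of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(components) / n for components in zip(*vectors)]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

# Hypothetical seed words defining the sentiment direction.
POSITIVE_SEEDS = ["great", "wonderful"]
NEGATIVE_SEEDS = ["terrible", "boring"]
SENTIMENT_AXIS = [
    p - n
    for p, n in zip(
        mean_vector([EMBEDDINGS[w] for w in POSITIVE_SEEDS]),
        mean_vector([EMBEDDINGS[w] for w in NEGATIVE_SEEDS]),
    )
]

def classify_review(text):
    """Label a review by projecting its average word vector onto the sentiment axis."""
    tokens = re.findall(r"[a-z']+", text.lower())
    known = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not known:
        return "unknown"  # no vocabulary words to judge by
    score = dot(mean_vector(known), SENTIMENT_AXIS)
    return "positive" if score > 0 else "negative"

print(classify_review("I loved the movie, the acting was wonderful"))  # positive
print(classify_review("The plot was boring and the acting terrible"))  # negative
```

Note that “loved” is not a seed word, yet it still pushes the first review toward “positive” because its vector lies close to the positive seeds; that generalization from word relationships is what embeddings contribute.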
Oren Etzioni, computer science professor and CEO of the Allen Institute for Artificial Intelligence, once quipped, “When AI can’t determine what ‘it’ refers to in a sentence, it’s hard to believe that it will take over the world.” Word embeddings are mathematical representations of words, typically vectors of numbers, that capture meaning from many examples of usage. They are based on the relationships between words rather than on the meanings of the words themselves, whose nuances are difficult to classify accurately. Paradoxical as it might seem, word relationships are sometimes easier to establish than word meanings.
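The idea that usage alone can capture meaning can be illustrated with a count-based co-occurrence model, a simpler relative of learned embeddings such as word2vec. In the sketch below, each word’s vector simply counts the words that appear next to it in a tiny hypothetical corpus (the corpus and the window size of one word are assumptions chosen for illustration). Words used in similar contexts, such as “cat” and “dog”, end up with similar vectors, measured by cosine similarity, even though the program knows nothing about what either word means.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus; real models train on millions of sentences.
CORPUS = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the cat",
    "stocks rose on the market",
    "stocks fell on the market",
]

def cooccurrence_vectors(sentences, window=1):
    """Map each word to a Counter of words appearing within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity of two sparse count vectors stored as Counters."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

vectors = cooccurrence_vectors(CORPUS)
print(round(cosine(vectors["cat"], vectors["dog"]), 3))     # 0.985
print(round(cosine(vectors["cat"], vectors["stocks"]), 3))  # 0.0
```

Learned embeddings compress this kind of context information into short, dense vectors, but the underlying principle, that a word is characterized by the company it keeps, is the same.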
An example of the NLP word relationship model is given by journalist Katherine Bailey: “king is to man as queen is to woman.” The vector formula would be written as king − man + woman ≈ queen: subtracting the vector for “man” from the vector for “king” and adding the vector for “woman” lands closest to the vector for “queen.” In this instance it is unnecessary to have established the meaning of either term, “king” or “queen.” Google’s translation system uses vectorized representations of entire sentences to improve its accuracy. IBM also makes use of NLP through Watson Analytics, a predictive data discovery tool for businesses launched in 2014. Such AI need not recognize word meaning directly; instead, it infers meaning from word relationships.
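The analogy arithmetic can be sketched in a few lines. The three-dimensional vectors below are hypothetical and hand-made so that their dimensions loosely mean “royalty,” “maleness” and “femaleness”; trained embeddings learn such directions implicitly. The function `solve_analogy` (an illustrative name, not a library API) computes king − man + woman and returns the vocabulary word whose vector has the highest cosine similarity to the result, skipping the three input words, which is a common convention in analogy evaluation.

```python
import math

# Hypothetical vectors: dimensions loosely mean [royalty, maleness, femaleness].
VECTORS = {
    "king":   [0.9, 0.8, 0.1],
    "queen":  [0.9, 0.1, 0.8],
    "man":    [0.1, 0.9, 0.1],
    "woman":  [0.1, 0.1, 0.9],
    "prince": [0.8, 0.9, 0.1],
    "apple":  [0.0, 0.2, 0.2],
}

def cosine(a, b):
    """Cosine similarity of two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def solve_analogy(start, subtract, add, vectors):
    """Return the word closest to vectors[start] - vectors[subtract] + vectors[add]."""
    target = [
        s - m + a
        for s, m, a in zip(vectors[start], vectors[subtract], vectors[add])
    ]
    # Exclude the input words so the answer is a genuinely new word.
    candidates = (w for w in vectors if w not in {start, subtract, add})
    return max(candidates, key=lambda w: cosine(vectors[w], target))

print(solve_analogy("king", "man", "woman", VECTORS))  # queen
```

Here the target vector is roughly [0.9, 0.0, 0.9]: royal and female. “Queen” scores about 0.99, while “prince” and “apple” score around 0.5, so the relationship is recovered without the program ever being told what a king or a queen is.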
For natural language processing to work properly, it must be built on ontologies and taxonomies: models of a world in which hierarchies between concepts are established. Humans acquire this kind of information through many years of interaction and association, and take it for granted. Language is fluid, which makes building stable ontological models difficult. The challenge for NLP is twofold: to accurately classify natural language text, and to communicate the results of those classifications to humans in a conversational format.
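A taxonomy of the kind described above can be represented, at its simplest, as a map from each concept to its broader parent (its hypernym). The sketch below uses a small hypothetical animal taxonomy; real resources such as WordNet encode far larger hierarchies with multiple parents per concept, which this single-parent dictionary does not handle. The helper names `ancestors`, `is_a` and `lowest_common_ancestor` are illustrative.

```python
# Hypothetical is-a taxonomy: each concept maps to its immediate parent.
TAXONOMY = {
    "poodle": "dog",
    "dog": "mammal",
    "cat": "mammal",
    "mammal": "animal",
    "sparrow": "bird",
    "bird": "animal",
}

def ancestors(concept):
    """Return the chain of broader concepts above `concept`, nearest first."""
    chain = []
    while concept in TAXONOMY:
        concept = TAXONOMY[concept]
        chain.append(concept)
    return chain

def is_a(concept, category):
    """True if `concept` equals `category` or falls under it in the hierarchy."""
    return concept == category or category in ancestors(concept)

def lowest_common_ancestor(a, b):
    """Most specific concept that both `a` and `b` fall under, or None."""
    a_chain = set([a] + ancestors(a))
    for candidate in [b] + ancestors(b):
        if candidate in a_chain:
            return candidate
    return None

print(ancestors("poodle"))                        # ['dog', 'mammal', 'animal']
print(is_a("poodle", "animal"))                   # True
print(lowest_common_ancestor("poodle", "cat"))    # mammal
```

Even this toy hierarchy lets a program infer that a poodle is an animal without that fact being stated anywhere, which is the kind of background knowledge humans take for granted. Keeping such a structure stable as language shifts is the hard part.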
Although NLP has made promising inroads into these challenges, humans remain essential to it, even as they rely on NLP more with each passing day. The meaning of words (and of humans) remains elusive, but it is the relationships between words that continue to define NLP.
Tame the Ever-Increasing Flow of Information
InfoDesk has created the world’s smartest platform for managing and sharing information. With our comprehensive solutions, you can bring all your information together, filter and select relevant content, and deliver the right intelligence to the right people. InfoDesk has been providing actionable intelligence to multinational corporations, government agencies and other organizations since 1999. InfoDesk is based in New York with offices in London, Washington, DC and India.