How does Semantic Search Differ from Traditional Keyword Searching?

This comparison focuses on search capabilities for retrieval of biopharmaceutical data on federated search platforms intended to provide simultaneous access to multiple information resources.

Differences in Query-Building Capabilities

A keyword search retrieves only those documents containing the terms entered by the user. Results are likely to be incomplete unless the query includes the variant, synonymous terms that published documents may use for the same concept. For example, a thorough search for articles citing the medical condition commonly known as “GERD” could not rely solely on the abbreviation to retrieve all relevant publications. Adding alternative terms to the query, such as “gastroesophageal reflux” OR “gastro-oesophageal reflux” OR “esophageal reflux” OR “acid reflux,” would find many more articles. Effective search strategies that depend on keywords alone therefore require that the user have:

  • Prior knowledge of the subject under investigation
  • The ability to anticipate variant language likely to be used in relevant documents

On the other hand, Semantic Search can simplify query building, because it is supported by automated natural language processing programs that draw upon a knowledgebase of predefined vocabularies. These built-in “dictionaries” (also known as “ontologies”) enable a Semantic Search platform to recognize and interpret terminology typically found in user queries. Behind the scenes, the keywords entered are checked against the ontologies, and all synonyms associated with the concept specified in the user’s query are automatically added as alternative terms to enhance retrieval.
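The synonym expansion described above can be sketched in a few lines of Python. This is a minimal illustration, not any platform's actual implementation: the `ONTOLOGY` dictionary and the `expand_query` and `matches` helpers are hypothetical names, and the only vocabulary included is the set of GERD variants already listed in this article.

```python
# Hypothetical mini-ontology: each concept maps to its known synonyms.
# The entries are the GERD variants listed in the article; a real
# knowledgebase would be far larger.
ONTOLOGY = {
    "gastroesophageal reflux disease": [
        "GERD",
        "gastroesophageal reflux",
        "gastro-oesophageal reflux",
        "esophageal reflux",
        "acid reflux",
    ],
}


def expand_query(keyword: str) -> list[str]:
    """Return the concept name plus all synonyms for a recognized keyword."""
    needle = keyword.casefold()
    for concept, synonyms in ONTOLOGY.items():
        names = [concept, *synonyms]
        if needle in (name.casefold() for name in names):
            return names
    # Unknown term: fall back to plain keyword behavior.
    return [keyword]


def matches(document: str, terms: list[str]) -> bool:
    """True if any term occurs in the document (case-insensitive)."""
    text = document.casefold()
    return any(term.casefold() in text for term in terms)


doc = "Patients with acid reflux were treated with proton pump inhibitors."
print(matches(doc, ["GERD"]))              # keyword-only search: False
print(matches(doc, expand_query("GERD")))  # expanded search: True
```

The sample document never mentions “GERD,” so the keyword-only check misses it, while the expanded query finds it through the synonym “acid reflux.”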

Extracts from the results of a Semantic Search for “GERD” show that a query consisting of only one keyword can find documents containing variant terms, not just “GERD.” Synonyms provided in the underlying ontologies support this semantic amplification of queries, so the platform, rather than the user, supplies the variant terms.

Potential Differences in Precision of Search Results

Semantic Search can also increase the precision of search results, in comparison with output based solely on keyword occurrence. Queries that include abbreviations or acronyms often retrieve “false positives,” because these terms are potentially ambiguous. When results of a keyword-only search for “GERD” (unsupported by semantic enrichment) were compared with output retrieved using Semantic Search, several false positives were detected in the keyword-only results. For example, “GERD” was frequently found in author names, which led to retrieval of documents totally unrelated to the intended concept, “gastroesophageal reflux disease.”

Inevitably, keyword ambiguity can also cause false positive results on Semantic Search platforms. However, intelligent “autosuggestions” are likely to reduce the risk of irrelevant document retrieval. Suggestions provided after a user enters a keyword or phrase will immediately identify terms present in more than one subject-oriented ontology. For example, autosuggestions generated when “icos” is entered will alert the user that the term appears in drug names, organization names, and mechanism-of-action vocabulary. By selecting the specific concept category in which the keyword carries the intended meaning, the user can avoid retrieving the numerous false positives typically found when “icos” is searched on a keyword-only platform.
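A rough sketch of how autosuggestions might flag an ambiguous term follows. It assumes, purely for illustration, that each ontology is a flat list of terms filed under a concept category. The `ONTOLOGIES` table, its placeholder entries, and the `suggest` function are all hypothetical and are not drawn from any real vocabulary.

```python
# Hypothetical ontologies keyed by concept category. Every entry is an
# invented placeholder that only mirrors the article's "icos" example.
ONTOLOGIES = {
    "drug": ["icos-drug (placeholder)"],
    "organization": ["Icos organization (placeholder)"],
    "mechanism of action": ["ICOS mechanism (placeholder)"],
    "disease": ["gastroesophageal reflux disease"],
}


def suggest(typed: str) -> dict[str, list[str]]:
    """Group ontology terms containing the typed text by category."""
    needle = typed.casefold()
    hits: dict[str, list[str]] = {}
    for category, terms in ONTOLOGIES.items():
        found = [term for term in terms if needle in term.casefold()]
        if found:
            hits[category] = found
    return hits


suggestions = suggest("icos")
for category, terms in suggestions.items():
    print(f"{category}: {terms}")

# More than one category means the keyword is ambiguous, so the user
# should be asked to pick the intended meaning before searching.
print("ambiguous:", len(suggestions) > 1)
```

Restricting the search to the category the user selects is what removes matches from the other categories.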

Differences in Ease of Use When Searching Broad Topics

When literature research requires finding documents related to broad topics, keyword searching can be very labor-intensive. For example, a query for a topic such as “viral hemorrhagic fevers” would need to include not only the general term (as well as its alternative British spelling), but also specific disease names for each infection recognized as a member of this overall class of fevers. Moreover, a thorough keyword search would require inclusion of synonyms for all of these disorders. Lists of relevant diseases and their alternative names are readily available on the Web, but the time involved in preliminary identification of vocabulary, plus entering all the terms correctly in the appropriate format (e.g., enclosing phrases in double quotes and inserting Boolean OR between each keyword or phrase), could take several minutes.
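To show the formatting work involved, here is a small sketch that assembles such a query automatically. It assumes a generic Boolean syntax (double quotes around phrases, OR between terms) rather than the rules of any particular platform. The `build_or_query` helper is hypothetical, and the two disease names are well-known viral hemorrhagic fevers added only as sample input.

```python
def build_or_query(terms: list[str]) -> str:
    """Join terms with Boolean OR, quoting multi-word phrases."""
    parts = []
    seen = set()
    for term in terms:
        cleaned = " ".join(term.split())  # collapse stray whitespace
        key = cleaned.casefold()
        if not cleaned or key in seen:
            continue  # skip blanks and case-insensitive duplicates
        seen.add(key)
        # Phrases need double quotes; single words do not.
        parts.append(f'"{cleaned}"' if " " in cleaned else cleaned)
    return " OR ".join(parts)


terms = [
    "viral hemorrhagic fevers",
    "viral haemorrhagic fevers",  # British spelling
    "Ebola",
    "Lassa fever",
]
print(build_or_query(terms))
```

Even with the formatting automated, the user still has to compile the list of terms, which is the step that ontologies take over.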

In contrast, a Semantic Search platform is likely to be faster and easier to use when constructing a query intended to retrieve documents related to broad concepts. Built-in ontologies designed to support natural language processing have already assembled lists of pertinent terms and created hierarchies to express broader and narrower conceptual relationships among them. Thus, a query need only include the broad term, and the subtypes indented under it will automatically be added to the search strategy. This capability is comparable to “exploding” a term in the MeSH tree on PubMed. For example, results of a Semantic Search query for the phrase “viral hemorrhagic fevers” will include documents where specific disease names are cited, even when the user has not entered those names and the broad class term does not appear in the text.
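Hierarchical expansion of this kind can be sketched as a walk down a tree of narrower terms. The `NARROWER` table below is a hypothetical, heavily simplified hierarchy written for this example, not an extract from MeSH or any other ontology, and `explode` is an illustrative helper name.

```python
# Hypothetical hierarchy: each term maps to its narrower terms.
# Assumed to be a tree (no cycles), which keeps the recursion finite.
NARROWER = {
    "viral hemorrhagic fevers": ["filovirus infections", "Lassa fever"],
    "filovirus infections": ["Ebola virus disease", "Marburg virus disease"],
}


def explode(term: str) -> list[str]:
    """Return the term followed by all of its descendants, depth-first."""
    expanded = [term]
    for child in NARROWER.get(term, []):
        expanded.extend(explode(child))
    return expanded


def matches(document: str, terms: list[str]) -> bool:
    """True if any term occurs in the document (case-insensitive)."""
    text = document.casefold()
    return any(term.casefold() in text for term in terms)


query_terms = explode("viral hemorrhagic fevers")
print(query_terms)

# The broad class term never appears in this sentence, yet it matches
# because a narrower term was added to the search automatically.
doc = "An outbreak of Marburg virus disease was reported."
print(matches(doc, query_terms))  # True
```

The user types one broad term, and every term beneath it in the hierarchy joins the search.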

Differences in Text Annotation for Discovery of Key Concepts

In results of traditional keyword searching, only those terms included in the query are highlighted in the text of documents retrieved. In contrast, ontologies supporting Semantic Search capabilities extend the scope of text mark-up to include any terms recognized in the knowledgebase of predefined vocabularies. Automatically generated summaries of subject-oriented metadata found in each document also include concepts co-occurring with the term originally entered in the Semantic Search query. This enhanced text annotation, and the detailed metadata summaries that it produces, enable users to discover potentially relevant data more quickly and assess its value in the context of other topics cited in the document.
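A minimal sketch of ontology-driven annotation follows, using Python's standard `re` module. The `VOCABULARY` table and the `annotate` and `summarize` functions are hypothetical, and the three vocabulary entries are chosen only to make the example readable.

```python
import re

# Hypothetical vocabulary: recognized term -> concept category.
VOCABULARY = {
    "GERD": "disease",
    "acid reflux": "disease",
    "omeprazole": "drug",
}


def annotate(text: str) -> list[tuple[int, str, str]]:
    """Find every vocabulary term in the text, not just the query term.

    Returns (start offset, matched text, category) sorted by offset.
    """
    found = []
    for term, category in VOCABULARY.items():
        # Word boundaries stop "GERD" from matching inside longer words.
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            found.append((match.start(), match.group(0), category))
    return sorted(found)


def summarize(annotations: list[tuple[int, str, str]]) -> dict[str, set[str]]:
    """Collect the distinct matched terms under each concept category."""
    summary: dict[str, set[str]] = {}
    for _, matched, category in annotations:
        summary.setdefault(category, set()).add(matched.casefold())
    return summary


text = "Omeprazole relieved acid reflux in patients with GERD."
for start, matched, category in annotate(text):
    print(start, matched, category)
print(summarize(annotate(text)))
```

A query for “GERD” alone would highlight only that word in a keyword search. Here the drug name is marked up as well and appears in the per-category summary as a co-occurring concept.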

Conclusion

This comparison of search capabilities emphasized differences that affect the quantity and quality of search results. Other factors considered were ease of use, as well as relative time and effort required when constructing search strategies to achieve good results.

If you’d like to learn more about the concepts behind these capabilities, you may also be interested in listening to InfoDesk’s latest webinar, which explored the power of ontologies in biopharmaceutical research.