AI in R&D — Explore the building blocks of effective AI projects

Why knowledge graphs can be an integral part of AI success at your organization.
A knowledge graph (KG) is a way of organizing and connecting data to show relationships between elements. It links facts, concepts and entities to make it easier to explore complex information and uncover new insights. Because of how KGs structure data, they support further adoption of advanced technology including AI/ML, natural language queries, automation and scenario modelling.
The term knowledge graph was first introduced by Google in 2012; the box users see on the right-hand side of Google search results pages is populated using its underlying KG. Meta’s social networking platforms are powered by KGs, as is Amazon’s product graph.
Industries outside of technology are increasingly exploring how KGs can be applied to their work. Recent years have seen significant interest, with a 582% increase in mentions of KGs in Scopus between 2013 and 2023.
1,271 mentions of "knowledge graphs" in 2013; 8,672 in 2023 (last full year indexed)
In R&D organizations, KGs serve as a powerful way to represent scientific data. By mimicking the interconnectedness of concepts and ideas in the human mind, KGs offer a 3D visual representation of complex relationships between entities. This capability makes them particularly effective for organizations where researchers are navigating large, disparate and multifaceted datasets.
The upsurge in KG exploration parallels the widespread adoption of artificial intelligence (AI) and machine learning (ML). Structured as a semantic graph, a KG integrates information into an ontology – a formalized system for organizing and categorizing data. When combined with ontologies, KGs provide a framework of human understanding that can inform and validate AI models.
Learn more about knowledge graphs with SciBite.
Ontologies are machine-readable frameworks that define the concepts, categories, and relationships within a domain. By providing a standardized vocabulary, they ensure shared understanding, enable data annotation, and help AI models grasp the meaning and connections behind technical terms.
Read: Using ontologies to unlock the full potential of your scientific data.
Figure 1: A knowledge graph brings together scattered data to allow users to infer new knowledge and relationships that were previously unknown. (Source: ScienceDirect)
Knowledge graphs underpin an approach to search that is particularly beneficial in R&D, where data volumes and sources are growing rapidly. KGs are a purpose-built solution that handles domain-specific terminology and delivers multilayered insight into nuanced questions. That makes them well suited to R&D, where research questions are complex – from drug repurposing projects to developing new materials with specific chemical properties.
Researchers need the ability to visualize and analyze multifaceted relationships between entities in datasets to uncover novel relationships and insights. This requires a data and search strategy that goes beyond a 2D “flat” search via a traditional relational database to one that enables deep, multidirectional insight and can handle domain-specific terminology. KGs are the “3D” search upgrade that powers this strategy.
Here are five core benefits of knowledge graphs for researchers:
Relevant search results with data provenance: Data organized in a knowledge graph produce more accurate and relevant answers to user queries. Users can also click on links in the KG to explore where every relationship has come from, enabling evidence-based decision making.
Dynamically updated data with easier integration: Data pipelines that feed KGs are updated automatically and in real-time, unlike traditional static databases. Moreover, data connected from multiple sources, both internal and external, are integrated in a single destination via a unified view in a KG.
Enables discovery: The ability to infer relationships, dependencies and connections can reveal new insights that weren’t visible before. While a flat search of a relational database only shows the relationships A–B and B–C, a KG’s underlying logic also infers the hidden relationship A–C.
Exposes gaps or errors and can be easily navigated: The visual nature of a KG makes it easier to identify gaps or data errors than in a traditional linear database. KGs are also easier to traverse visually, letting users quickly identify links – a simple but useful benefit for new users, researchers and non-domain experts.
Enhances AI projects and works alongside LLMs: LLMs aid in the generation of a KG and further lower the barrier to entry when it comes to interrogation of graphs, enabling users of all experience levels to benefit from KGs.
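The inference benefit above can be sketched in a few lines of Python. This is a minimal stdlib-only toy – the entity names are illustrative placeholders, not from a real dataset – showing how graph traversal surfaces an A–C link that a flat lookup of direct edges would miss.

```python
# Toy knowledge graph as an adjacency map; entity names are illustrative.
# A flat lookup only returns the direct edges A->B and B->C;
# traversing the graph also surfaces the implied A->C link.
edges = {
    "Drug A": {"Protein B"},
    "Protein B": {"Disease C"},
    "Disease C": set(),
}

def reachable(graph, start):
    """All entities reachable from `start` via any number of hops."""
    seen, stack = set(), [start]
    while stack:
        for nxt in graph[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

# Hidden relationships = reachable but not directly connected.
inferred = reachable(edges, "Drug A") - edges["Drug A"]
print(inferred)  # {'Disease C'}
```

Production KGs run this kind of traversal in a graph database rather than in application code, but the underlying idea is the same.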
Discover more of the building blocks for effective AI in R&D.
Figure 2: Sankey diagram produced for a drug repurposing project from data within Elsevier’s Biology Knowledge Graph, showing relationships between the disease endometrial cancer and associated entities, including drugs and proteins. (Source: Elsevier.com)
Knowledge graphs are versatile and highly customizable, which makes them a good fit for many R&D questions. KGs can be enterprise level or built for a specific use case or project. The enterprise-wide model is useful for finding and collating all existing data an organization holds on a particular topic. Complex and specialist questions can be probed and answered in detail with a use-case based knowledge graph.
With a clear use case in mind, a purpose-built KG can accelerate the path from data to discovery and help researchers to make new connections. Moreover, these smaller, fine-tuned knowledge graphs can eventually be stitched together using a large language model (LLM) to convert natural language to graph-based query syntaxes.
Three use cases include:
Drug repurposing projects: Finding new uses for existing drugs reduces the time and cost of developing new therapies. To identify new indications for an existing therapy, researchers need to understand the complex and interconnected relationship between three key elements: drug, disease, target. A KG is an intuitive and explainable way of visualizing this trio.
Predicting properties of new materials: Competition is fierce to discover novel materials that are tougher, more sustainable or have specific properties, such as semiconductors. Using a KG helps analyze the relationships between existing materials to predict the properties of new ones – for example, newly created plastics or alloys made by mixing two metals. Data sources for this KG include patents, proprietary data, published journals, public databases and industry bodies.
Powering chatbots and virtual assistants: Natural language interfaces like chatbots and virtual assistants are increasingly used in many industries, both for customer-facing services and for answering internal questions. Because a KG can manage multiple large datasets from many varied sources, KGs provide a solid foundation for virtual chatbots and assistants that can be queried easily and in real-time.
Exploring which genes are most strongly associated with a disease in the literature, and finding supporting evidence from other sources (e.g., GWAS, OMIM, Orphanet).
Identifying all drugs which are known to inhibit the target of interest and listing out the key adverse events.
Identifying which drug target to prioritize in the disease area based on known mechanisms, safety profiles and competitor activity.
Undertaking drug repurposing projects by visualizing the complex drug, disease, target relationships.
Investigating the corrosive effect of a chemical or set of conditions on a given material, with the KG modelling the relationships between the different entities involved in the corrosion process.
Optimizing existing materials or finding new applications for them.
Reducing the number of iterations in designing a new material or chemical compound.
Predicting the properties of new materials – for example mixing two metals or creating new plastics.
Identifying new catalysts for chemical reactions.
Powering virtual assistants and chatbots by delivering the underlying knowledge base to answer queries.
Providing a centralized “dashboard” view of connected and smart devices for simpler analysis and data querying.
Enabling internal search engines and rapid information retrieval, and supporting new search channels like voice-powered search.
Visualizing network information for trends analysis of cybersecurity risks for future attack prediction.
Knowledge graphs map between equivalent concepts from different sources of data. This can include connecting data from multiple heterogeneous data silos, both external and internal (provided entities are harmonized to common identifiers, as explained in the next section).
In a KG, entities or ‘things’ are represented as nodes, or vertices, with associations between these nodes captured as edges, or relationships. Additionally, nodes and edges may hold attributes that describe their characteristics.
A KG is semantically enriched and aligned to ontologies, so meaning is attached to the graph’s entities. For example, a node named NASH is relatively meaningless in isolation. To a pharma researcher, it may be clear the node refers to a disease (nonalcoholic steatohepatitis), but how would a computer assign a type to this node – is it a gene, a drug, a person? Or understand which other nodes NASH may interact with, and via what type of edge?
KGs get around this by labelling the NASH node as a disease. Once this node is aligned to a disease ontology, a computer can understand that entity in the context of other node types in the KG. For deeper understanding – for example, to see which genes are associated with NASH – genes can be included in the KG and edges added between diseases and genes (see Fig 3).
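The typed-node idea can be sketched as follows. This is a hedged, stdlib-only illustration: the gene symbols and ontology IDs are placeholders, not verified ontology records. The point is that once each node carries a type drawn from an ontology, a program can answer "which genes are associated with NASH?" without any domain knowledge of its own.

```python
# Minimal sketch of a typed knowledge graph.
# Node types and IDs are illustrative placeholders, not real ontology records.
nodes = {
    "NASH":   {"type": "Disease", "ontology_id": "DISEASE:0001"},
    "PNPLA3": {"type": "Gene",    "ontology_id": "GENE:0007"},
    "SREBF1": {"type": "Gene",    "ontology_id": "GENE:0008"},
}
edges = [
    ("PNPLA3", "associated_with", "NASH"),
    ("SREBF1", "associated_with", "NASH"),
]

# With types attached, the question "which genes are associated with NASH?"
# becomes a mechanical filter over edges and node types.
genes_for_nash = sorted(
    src for src, rel, dst in edges
    if dst == "NASH" and nodes[src]["type"] == "Gene"
)
print(genes_for_nash)  # ['PNPLA3', 'SREBF1']
```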
Figure 3: Visualization of a knowledge graph. Nodes are represented as circles and edges as arrows, with attributes allowed on either. Entities are captured in ontologies, with green nodes representing genes and blue nodes representing indications.
First define the question you want to answer to identify the datasets required. Data can come from public, private or partner sources—or a combination of all three. By integrating multiple datasets, richer knowledge graphs are created, breaking down information silos and enabling researchers to approach specific questions with a comprehensive and diverse set of data.
A semantically enriched knowledge graph aligns entities to one or more ontologies to create a standardized data framework. This second step is crucial because data often resides in silos, in various formats and is filled with jargon, abbreviations, and technical terms. Internal data management systems frequently differ from external sources, making integration challenging. By applying ontologies, companies standardize and structure datasets, using a common model of knowledge associated with their domain. This process not only unifies disparate data but also captures valuable metadata automatically, enhancing the graph’s utility and depth.
Data must be harmonized through semantic enrichment, annotation and the addition of metadata. This involves scanning vast quantities of scientific publications and other datasets and normalizing scientific concepts to unique entity IDs as part of the process. By aligning these entities to IDs captured in ontologies, structured data can be seamlessly integrated into a KG, whether it originates from internal or external sources.
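The normalization step above can be sketched as a synonym lookup. This is a simplified illustration – real annotators use ontology-backed dictionaries with millions of synonyms plus disambiguation logic, and the IDs below are invented placeholders – but it shows how different surface forms in text resolve to one canonical entity ID, and therefore one node in the KG.

```python
# Illustrative synonym table mapping text mentions to canonical entity IDs.
# The IDs are placeholders; a real pipeline would use ontology identifiers.
SYNONYMS = {
    "nash": "DISEASE:0001",
    "nonalcoholic steatohepatitis": "DISEASE:0001",
    "non-alcoholic steatohepatitis": "DISEASE:0001",
    "aspirin": "DRUG:0042",
    "acetylsalicylic acid": "DRUG:0042",
}

def normalize(mention):
    """Return the canonical entity ID for a text mention, or None if unknown."""
    return SYNONYMS.get(mention.strip().lower())

# Two different surface forms resolve to the same node in the KG.
print(normalize("NASH"))                          # DISEASE:0001
print(normalize("Nonalcoholic steatohepatitis"))  # DISEASE:0001
```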
The penultimate step is extracting relationships between entities. The goal is to determine when a meaningful association exists, rather than merely co-occurrence in the same document. Some relationships are ambiguous, making context critical – for example, distinguishing between causal and coincidental connections in drug adverse events. Text mining tools and machine learning algorithms can generate additional attributes that describe these relationships to further enrich the KG.
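A baseline version of this extraction step is co-occurrence counting with a support threshold. The sketch below assumes documents have already been normalized to entity IDs (the documents and threshold are illustrative); as the text notes, real pipelines then add context checks – such as distinguishing causal from coincidental links – via text mining or ML.

```python
from collections import Counter
from itertools import combinations

# Illustrative documents, each a list of already-normalized entity IDs.
documents = [
    ["DRUG:0042", "DISEASE:0001", "GENE:0007"],
    ["DRUG:0042", "DISEASE:0001"],
    ["DRUG:0042", "GENE:0099"],
]

# Count how often each pair of entities co-occurs in the same document.
pair_counts = Counter()
for doc in documents:
    for pair in combinations(sorted(set(doc)), 2):
        pair_counts[pair] += 1

# Keep only pairs with enough supporting documents as candidate KG edges;
# single co-occurrences are treated as noise rather than relationships.
MIN_SUPPORT = 2
candidate_edges = [p for p, n in pair_counts.items() if n >= MIN_SUPPORT]
print(candidate_edges)  # [('DISEASE:0001', 'DRUG:0042')]
```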
The final stage is generating a schema – a meta-graph that defines the relevant entities and their relationships. The schema can then be ingested into a graph database using a "bridging ontology" that provides a simplified representation of entities and connections. This forms the foundation of a KG pipeline, integrating ontology creation and data harmonization. Such a pipeline can be semi-automated or fully automated to streamline the process.
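The schema stage can be illustrated as a set of allowed (source type, relation, target type) triples against which candidate edges are validated before ingestion. The type and relation names below are illustrative assumptions, not a prescribed vocabulary.

```python
# Illustrative schema (meta-graph): which relationships may connect
# which node types. Names are placeholders, not a standard vocabulary.
SCHEMA = {
    ("Drug", "inhibits", "Protein"),
    ("Protein", "implicated_in", "Disease"),
    ("Gene", "associated_with", "Disease"),
}

def valid_edge(src_type, relation, dst_type):
    """Check a candidate edge against the schema before ingestion."""
    return (src_type, relation, dst_type) in SCHEMA

print(valid_edge("Drug", "inhibits", "Protein"))  # True
print(valid_edge("Drug", "inhibits", "Disease"))  # False
```

Gating ingestion on a schema like this keeps the graph queryable: downstream tools can rely on every edge conforming to a known pattern.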
Deploy your AI strategy on a solid foundation of comprehensive, structured and enriched data. Elsevier Datasets deliver the highest level of integrity and accuracy to help you discover, innovate and develop with confidence. Transform data into discoveries with Elsevier.
Sign up to receive a quarterly email from Elsevier on AI in R&D, best practices, building blocks and more. Plus, stay up to date on upcoming webinars and events.