
Leveraging AI to Illuminate Scientific "Dark Data": University of Utah Researchers Advance Discovery Through the National Data Platform 

What if some of the world's most valuable scientific data already exists, but researchers cannot find it? 

That challenge is driving innovative work led by Dr. Manish Parashar, the University of Utah's Chief AI Officer, who also serves as Executive Director of the Scientific Computing and Imaging Institute and Presidential Professor in the Kahlert School of Computing.

Parashar and collaborators are leveraging artificial intelligence to help researchers discover and better understand scientific datasets through the National Data Platform, a National Science Foundation-funded initiative that supports data discovery, access, and use across distributed scientific resources. 

The work addresses a growing challenge facing modern research: the problem of "dark data."

As scientific data volumes continue to grow, many valuable datasets remain hidden because they are poorly indexed, inconsistently cited, or isolated within disciplinary silos. Researchers may never know that relevant data exists, limiting opportunities for collaboration, discovery, and scientific advancement. 

"AI can help us find patterns and generate insights from data, but first we need to be able to find and understand the data itself," said Parashar. "Our goal is to make scientific data more visible, more accessible, and more useful for discovery." 

To help address this challenge, the team developed Contextual Data Insights, an AI-powered capability within the National Data Platform. Using large language models, the system analyzes millions of scientific publications to identify datasets and determine how researchers actually use them.

The process goes beyond traditional metadata catalogs. Rather than simply identifying datasets, the platform generates data-usage descriptors that provide valuable context about how data is being used in scientific research. The AI can identify dataset references in publications, distinguish between a simple mention and substantive use, and uncover additional related datasets that may not have been previously indexed. 

The resulting insights help researchers answer important questions: 

  • Who is using a particular dataset?
  • Which institutions are working with it?
  • What publications cite it?
  • What software tools or AI models are associated with it? 

By making this information visible, the platform helps researchers discover new data resources and evaluate whether a dataset is relevant and trustworthy for their work. 
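
For technically minded readers, the questions above can be pictured as simple aggregations over extracted usage records. The following is a minimal sketch only, not the National Data Platform's implementation: the `UsageRecord` fields, the `summarize` function, and the two usage labels are hypothetical names chosen for illustration, and the sample values are placeholders rather than real data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    """One hypothetical data-usage descriptor extracted from a publication."""
    dataset: str
    publication: str
    institution: str
    usage: str  # assumed labels: "mention" or "substantive_use"
    tools: tuple = ()


def summarize(records, dataset):
    """Gather the publications, institutions, and tools tied to one dataset."""
    publications, institutions, tools = set(), set(), set()
    for record in records:
        if record.dataset != dataset:
            continue
        # Every reference, even a passing mention, counts as a citing publication.
        publications.add(record.publication)
        # Only substantive use counts toward who is actually working with the data.
        if record.usage == "substantive_use":
            institutions.add(record.institution)
            tools.update(record.tools)
    return {
        "publications": sorted(publications),
        "institutions": sorted(institutions),
        "tools": sorted(tools),
    }


# Placeholder records, invented purely to exercise the sketch.
records = [
    UsageRecord("dataset-A", "paper-1", "Institution X", "substantive_use", ("tool-1",)),
    UsageRecord("dataset-A", "paper-2", "Institution Y", "mention"),
    UsageRecord("dataset-B", "paper-3", "Institution X", "substantive_use", ("tool-2",)),
]
summary = summarize(records, "dataset-A")
```

Separating a passing mention from substantive use, as the sketch does, is what lets a summary like this distinguish the groups actively working with a dataset from those that merely cite it.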

The effort reflects the interdisciplinary expertise that defines the University of Utah's approach to AI innovation. 

Parashar's collaborators include Jess Tate, PhD, Research Computer Scientist; Saleem Alharir, Senior Software Developer; and Software Development Engineers Rafael Ladislau and Pratik Kharade. Together, the team brings expertise in artificial intelligence, scientific computing, cyberinfrastructure, and data science to help address one of the most significant challenges facing modern research.

As AI continues to transform science, initiatives such as Contextual Data Insights demonstrate how the University of Utah is helping build the infrastructure needed for more transparent, inclusive, and impactful discovery. 

By bringing scientific dark data into the light, Parashar and his collaborators are expanding opportunities for researchers and helping unlock the full potential of AI-enabled science. 

Last Updated: 6/17/26