Categories: FAANG

Datasets at your fingertips in Google Search

googleresearch

Posted by Natasha Noy, Research Scientist, and Omar Benjelloun, Software Engineer, Google Research

Access to datasets is critical to many of today’s endeavors across verticals and industries, whether scientific research, business analysis, or public policy. In the scientific community and throughout various levels of the public sector, reproducibility and transparency are essential for progress, so sharing data is vital. For one example, in the United States a recent new policy requires free and equitable access to outcomes of all federally funded research, including data and statistical information along with publications.

To facilitate discovery of content with this level of statistical detail and better distill this information from across the web, Google now makes it easier to search for datasets. You can click on any of the top three results (see below) to get to the dataset page or you can explore further by clicking “More datasets.” Here is an example:

When users search for datasets in Google search, they find a dedicated section highlighting pages with dataset descriptions. They can explore many more datasets by clicking on “More datasets” and going to Dataset Search.

Powered by Dataset Search

Dataset Search, a dedicated search engine for datasets, powers this feature and indexes more than 45 million datasets from more than 13,000 websites. Datasets cover many disciplines and topics, including government, scientific, and commercial datasets. Dataset Search shows users essential metadata about datasets and previews of the data where available. Users can then follow the links to the data repositories that host the datasets.

Dataset Search primarily indexes dataset pages on the Web that contain schema.org structured data. The schema.org metadata allows Web page authors to describe the semantics of the page: the entities on the pages and their properties. For dataset pages, schema.org metadata describes key elements of the datasets, such as their description, license, temporal and spatial coverage, and available download formats. In addition to aggregating this metadata and providing easy access to it, Dataset Search normalizes and reconciles the metadata that comes directly from the Web pages.

If you are a dataset author or provider and want others to find your datasets in Search, make sure that you publish your dataset in a way that makes it discoverable and specifies how others can reuse the data. Specifically, ensure that the Web page that describes the dataset has machine-readable metadata. The easiest way to ensure this is to publish your dataset in an established dataset repository. Some repositories cater to specific research communities, while others are “generalists” (figshare.com, zenodo.org, datadryad.org, kaggle.com, etc.). These repositories automatically include metadata in dataset pages for every dataset, which makes it easy for search engines to discover and include them in specialized result sections, as in the figure above.

As data sharing continues to grow and evolve, we will continue to make datasets as easy to find, access, and use as any other type of information on the web.

Acknowledgments

We are extremely grateful to the numerous Googlers who contributed to developing and launching this feature, including: Rachel Zax, Damian Biollo, Shiyu Chen, Jonathan Drake, Sunil Vemuri, Stephen Tseou, Amit Bapat, Will Leszczuk, Marc Najork, Sergei Vassilvitskii, Bruno Possas, and Corinna Cortes.

How scientists can leverage AI agents using Gemini Enterprise, Gemini Code Assist, and Gemini CLI

Scientific inquiry has always been a journey of curiosity, meticulous effort, and groundbreaking discoveries. Today, that journey is being redefined, fueled by the incredible capabilities of AI. It’s moving beyond simply processing data to actively participating in every stage of discovery, and Google Cloud is at the forefront of this…

November 4, 2025

In "FAANG"

Advancing cancer research with public imaging datasets from the National Cancer Institute Imaging Data Commons

February 3, 2023

In "FAANG"

What enterprises should know about The White House’s new AI ‘Manhattan Project’ the Genesis Mission

President Donald Trump’s new “Genesis Mission” unveiled Monday, November 24, 2025, is billed as a generational leap in how the United States does science akin to the Manhattan Project that created the atomic bomb during World War II. The executive order directs the Department of Energy (DOE) to build a…

November 26, 2025

In "AI/ML News"

AI Generated Robotic Content