Google recently launched a new feature named “Dataset Search”. It lets searchers find datasets on many topics across various disciplines, including government data and data provided by news organizations such as ProPublica, Google said. The feature will be mainly useful to scientists, data journalists, and anyone curious about the data behind a specific topic who wants to find that data quickly.
Traditionally, researchers relied on individual sources such as the World Bank, NASA, ProPublica, and Kaggle. Dataset Search should now make researchers’ work much easier.
The purpose of this tool is also to improve discovery of datasets from fields such as social sciences, machine learning, life sciences, civic and government data, and many more.
Here are a few examples of what can qualify as a dataset:
- An organized collection of tables
- A file in a proprietary format that contains data
- A collection of files that constitute some meaningful dataset
- Images capturing data
- Files related to machine learning, such as trained parameters, or neural network structure definitions
How does Google Dataset Search work?
Dataset Search works in multiple languages, with support for additional languages coming soon. Just enter what you are looking for, and it will guide you to the published dataset on the repository provider’s site.
Dataset Search is similar to Google Scholar. It lets you find datasets wherever they are hosted, whether that is a publisher’s site, a digital library, or a personal webpage. Google has published guidelines for dataset providers on describing their data in a way that Google, or any other search engine, can better understand the content of their pages. Institutions that publish their data online will need to include metadata tags in their web pages that describe the data: who created it, when it was published, how it was collected, what the terms for using it are, and so on. This information will then be indexed by Dataset Search and combined with input from Google’s Knowledge Graph.
In simple words, Google collects and links this information, analyzes where different versions of the same dataset might be, and finds publications that may be describing the dataset. The approach is based on an open standard for describing this information (schema.org). Schema.org provides a shared vocabulary for describing structured data on the internet. It was developed by Google, Yahoo, and Microsoft, and one of the reasons for choosing it is that it is already used by over 10 million sites. Dataset providers should adopt this common standard so that all datasets become part of this robust ecosystem and more people can locate their datasets through this search feature.
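To make the metadata idea concrete, here is a minimal Python sketch of how a provider might generate a schema.org description for a dataset page. It assumes the JSON-LD format inside a script tag, which is one common way to embed schema.org markup. The function names, the example organization, and all example values are hypothetical, and the chosen properties are only a small subset of what schema.org defines for a Dataset.

```python
import json


def build_dataset_jsonld(name, description, creator, date_published,
                         license_url, url, measurement_technique=None):
    """Return a schema.org Dataset description as a JSON-LD string.

    The properties mirror the questions from the article: who created the
    data, when it was published, how it was collected, and the terms of use.
    """
    record = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "creator": {"@type": "Organization", "name": creator},
        "datePublished": date_published,
        "license": license_url,
    }
    # "How it was collected" is optional in this sketch.
    if measurement_technique:
        record["measurementTechnique"] = measurement_technique
    return json.dumps(record, indent=2)


def wrap_in_script_tag(jsonld):
    """Wrap the JSON-LD so it can be pasted into a page's HTML."""
    return '<script type="application/ld+json">\n' + jsonld + "\n</script>"


if __name__ == "__main__":
    # All values below are made up for illustration.
    markup = build_dataset_jsonld(
        name="Example city air quality readings",
        description="Hourly air quality readings from a hypothetical sensor network.",
        creator="Example Environmental Agency",
        date_published="2018-09-05",
        license_url="https://creativecommons.org/licenses/by/4.0/",
        url="https://example.org/datasets/air-quality",
        measurement_technique="Fixed roadside sensors",
    )
    print(wrap_in_script_tag(markup))
```

The printed snippet would go into the HTML of the dataset's landing page. Which properties a search engine actually requires or recommends is defined by its own guidelines, so treat the field list here as a starting point.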
Publishers and data providers can mark up the web pages for their published data so that search engines can surface that data within the vertical search feature, making it easier for people to find.
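As a rough illustration of the reading side, the Python sketch below pulls schema.org Dataset descriptions out of a page's HTML using only the standard library. It is not how Google's crawler works. It assumes the markup is embedded as JSON-LD in a script tag, and the class name, function name, and sample page are all hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of every JSON-LD script tag in a page."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.records = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.records.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # Skip malformed markup instead of failing the whole page.


def find_datasets(html):
    """Return the JSON-LD records in the page whose @type is Dataset."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return [r for r in parser.records
            if isinstance(r, dict) and r.get("@type") == "Dataset"]


# A made-up landing page with one embedded dataset description.
sample_page = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org/", "@type": "Dataset",
 "name": "Example rainfall measurements"}
</script>
</head><body><p>Landing page</p></body></html>"""

if __name__ == "__main__":
    for dataset in find_datasets(sample_page):
        print(dataset["name"])
```

A page without this kind of markup yields an empty list here, which is the practical reason the markup matters: a dataset that is not described cannot be surfaced.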
A search tool like Dataset Search is only as good as the metadata that data publishers are ready to provide.
Google wrote, “As more data repositories use the schema.org standard to describe their datasets, the variety and coverage of datasets that users will find in dataset search will continue to grow.”
If you are looking for a dataset, try searching for it now.