Open Source Data Sets for Training Machine Learning Models

Whenever you hear the term AI, you must think about the data behind it.

In this post, I am sharing a collection of open source data sets that you can use to train machine learning models for a variety of tasks.

A data set is a collection of data. In ML projects, we need a training data set: the examples the model learns from.
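To make the idea of a training data set concrete, here is a minimal sketch of holding back part of a data set for testing and training on the rest. The `train_test_split` helper, the 20% test fraction, and the toy `samples` list are assumptions made for illustration; they do not come from any of the data sets listed in this post.

```python
import random


def train_test_split(records, test_fraction=0.2, seed=42):
    """Shuffle a copy of records and split it into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(records)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Toy stand-in for a real data set: (text, label) pairs.
samples = [(f"message {i}", i % 2) for i in range(10)]
train, test = train_test_split(samples)
print(len(train), len(test))  # 8 2
```

With a real data set you would replace `samples` with the records you downloaded. Libraries such as scikit-learn ship a more complete version of this split.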

1. X-ray Images

  • https://ceb.nlm.nih.gov/repositories/tuberculosis-chest-x-ray-image-data-sets/
  • https://www.kaggle.com/nih-chest-xrays
  • http://academictorrents.com/details/557481faacd824c83fbf57dcf7b6da9383b3235a
  • https://nihcc.app.box.com/v/ChestXray-NIHCC

2. US Government

  • Data.gov
  • NOAA – ncdc.noaa.gov/cdo-web (weather, climate, environmental data)
  • US Census Data – census.gov/data.html (demographics)
  • BLS – bls.gov/data (employment/unemployment, product categories)

3. UK Government and International Sources

  • UK Dataservice – www.ukdataservice.ac.uk (census data)
  • WorldBank – datacatalog.worldbank.org (census, demographics, geographic, health, income, GDP)
  • IMF – imf.org/en/Data (economic, currency, finance, commodities)
  • OpenData.go.ke
  • Data.world

Find ideas for fun applications using these data sets:

  • Kaggle.com/datasets (variety)
  • snap.stanford.edu/data/web-Amazon.html (35 Million product reviews)
  • GroupLens – grouplens.org/datasets/movielens (20M movie ratings)
  • Yelp – yelp.com/dataset
  • IMDB – ai.stanford.edu/~amaas/data/sentiment/ (50K movie reviews)
  • Twitter Sentiments – help.sentiment140.com/for-students (1.6M tweets)
  • Airbnb – insideairbnb.com/get-the-data.html
  • UCI ML Datasets – mlr.cs.umass.edu/ml
  • Enron email data set – cs.cmu.edu/~enron/ (500K emails)
  • SpamBase – archive.ics.uci.edu/ml/datasets/Spambase (emails)
  • Jeopardy questions – reddit.com/r/datasets/comments/1uyd0t/200000_jeopardy_questions_in_a_json_file/ (200K questions and answers)
  • Project Gutenberg ebooks – Gutenberg.org/wiki/Gutenberg:Offline_Catalogs (large collection of ebooks)

Image data sets for training computer vision models:

  1. ImageNet – image-net.org (14M images)
  2. Google Open Images – ai.googleblog.com/2016/09/introducing-open-images-dataset.html (9M image URLs with labels)
  3. Microsoft COCO – cocodataset.org (330K images, mostly labelled)
  4. Stanford Dogs – vision.stanford.edu/aditya86/ImageNetDogs (120 dog breeds, 20K images)

Please comment below if you are predicting something with any of these data sets.
