Open Source Data Sets for Machine Learning Training Model

Whenever you hear the term AI, you must think about the data behind it.

In this post, I am sharing a collection of open source data sets that you can use to train Machine Learning models for a variety of tasks.

A data set is a collection of data. In ML projects, we need a training data set: the examples the model learns from.
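Here is a minimal sketch of how a data set is usually divided, so that part of it trains the model and the rest is held back to check it. The function name `train_test_split`, the 20% test fraction, and the toy data are my own assumptions for illustration, not part of any data set listed below.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle rows and split them into (train, test) lists.

    Hypothetical helper for illustration; the default fraction and
    seed are arbitrary choices.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)  # copy so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)  # seeded for repeatable splits
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Toy data set: (text, label) pairs standing in for real records.
data = [(f"example {i}", i % 2) for i in range(10)]
train, test = train_test_split(data)
print(len(train), len(test))
```

With real data you would load the records from one of the sources below in place of the toy list. Fixing the seed keeps the split the same between runs, which makes results easier to compare.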

1. X-ray Images

  • https://ceb.nlm.nih.gov/repositories/tuberculosis-chest-x-ray-image-data-sets/
  • https://www.kaggle.com/nih-chest-xrays
  • http://academictorrents.com/details/557481faacd824c83fbf57dcf7b6da9383b3235a
  • https://nihcc.app.box.com/v/ChestXray-NIHCC

2. US Government

  • Data.gov
  • NOAA – ncdc.noaa.gov/cdo-web (weather, climate, environmental data)
  • US Census Data – census.gov/data.html (demographics)
  • Bureau of Labor Statistics – bls.gov/data (employment and unemployment, product categories)

3. UK Government and International Sources

  • UK Data Service – www.ukdataservice.ac.uk (census data)
  • World Bank – datacatalog.worldbank.org (census, demographics, geographic, health, income, GDP)
  • IMF – imf.org/en/Data (economic, currency, finance, commodities)
  • OpenData.go.ke
  • Data.world

Find ideas for fun applications using these data sets:

  • Kaggle.com/datasets (variety)
  • snap.stanford.edu/data/web-Amazon.html (35 Million product reviews)
  • MovieLens – grouplens.org/datasets/movielens (20M movie ratings)
  • Yelp – yelp.com/dataset
  • IMDB – ai.stanford.edu/~amaas/data/sentiment/ (50K movie reviews: 25K for training, 25K for testing)
  • Twitter Sentiment – help.sentiment140.com/for-students (1.6M tweets)
  • Airbnb – insideairbnb.com/get-the-data.html
  • UCI ML Datasets – mlr.cs.umass.edu/ml
  • Enron email data set – cs.cmu.edu/~enron/ (500K emails)
  • SpamBase – archive.ics.uci.edu/ml/datasets/Spambase (emails)
  • Jeopardy – reddit.com/r/datasets/comments/1uyd0t/200000_jeopardy_questions_in_a_json_file/ (200K questions and answers)
  • Project Gutenberg ebooks – Gutenberg.org/wiki/Gutenberg:Offline_Catalogs (large collection of ebooks)

Image data sets for training computer vision models:

  1. ImageNet – image-net.org (14M images)
  2. Google Open Images – ai.googleblog.com/2016/09/introducing-open-images-dataset.html (9M image URLs with labels)
  3. Microsoft COCO – cocodataset.org (330K images, mostly labelled)
  4. Stanford Dogs – vision.stanford.edu/aditya86/ImageNetDogs (120 dog breeds, 20K images)

Please comment below if you are predicting something with any of these data sets.
