Open Source Data Sets for Machine Learning Training Model - Redcrix Technologies (P) Ltd.


Open Source Data Sets for Machine Learning Training Model

Whenever you hear the term AI, you must think about the data behind it.

In this post, I am sharing a collection of open source data sets that you can use to train machine learning models for a variety of tasks.

A data set is a collection of data. In ML projects, we need a training data set: the examples the model learns from.
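To make the idea of a training data set concrete, here is a minimal sketch of holding back part of a data set for testing. The function name, the 20% test fraction, and the toy rows are assumptions chosen for illustration; libraries such as scikit-learn provide their own split helpers.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle rows reproducibly and split them into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical toy data: (feature, label) pairs.
examples = [(i, i % 2) for i in range(10)]
train_rows, test_rows = train_test_split(examples)
print(len(train_rows), "training rows,", len(test_rows), "test rows")
```

Fixing the seed keeps the split the same across runs, which makes results easier to compare.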

1. X-ray Images

https://ceb.nlm.nih.gov/repositories/tuberculosis-chest-x-ray-image-data-sets/
https://www.kaggle.com/nih-chest-xrays
http://academictorrents.com/details/557481faacd824c83fbf57dcf7b6da9383b3235a
https://nihcc.app.box.com/v/ChestXray-NIHCC
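Chest X-ray images can carry more than one finding each, so labels usually have to become a multi-hot vector before training. The sketch below assumes the labels arrive as a pipe-separated string with "No Finding" for normal images, which is how I understand the NIH metadata file to be laid out; check this against the file you download. The three findings and the sample strings are illustrative, not taken from the data.

```python
# Illustrative subset of finding names; the real data set has more.
FINDINGS = ["Cardiomegaly", "Effusion", "Infiltration"]

def parse_findings(field):
    """Split a pipe-separated label string, dropping the 'No Finding' marker."""
    labels = [part.strip() for part in field.split("|") if part.strip()]
    return [label for label in labels if label != "No Finding"]

def multi_hot(labels, vocabulary):
    """Return a 0/1 vector with one position per name in vocabulary."""
    present = set(labels)
    return [1 if name in present else 0 for name in vocabulary]

# Hypothetical label strings, made up for this example.
sample_fields = ["Cardiomegaly|Effusion", "No Finding", "Infiltration"]
encoded = [multi_hot(parse_findings(field), FINDINGS) for field in sample_fields]
for field, vector in zip(sample_fields, encoded):
    print(field, "->", vector)
```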

2. US Government

Data.gov
NOAA – ncdc.noaa.gov/cdo-web (climate, weather, environmental data)
US Census Data – census.gov/data.html (demographics)
BLS – bls.gov/data (employment, unemployment, product categories)

3. UK Government and International Sources

UK Data Service – ukdataservice.ac.uk (census data)
World Bank – datacatalog.worldbank.org (census, demographics, geographic, health, income, GDP)
IMF – imf.org/en/Data (economic, currency, finance, commodities)
Kenya Open Data – opendata.go.ke
Data.world

Find ideas for fun applications using these data sets:

Kaggle – kaggle.com/datasets (variety)
Amazon Reviews – snap.stanford.edu/data/web-Amazon.html (35 million product reviews)
MovieLens – grouplens.org/datasets/movielens (20M movie ratings)
Yelp – yelp.com/dataset
IMDB – ai.stanford.edu/~amaas/data/sentiment/ (50K movie reviews: 25K for training, 25K for testing)
Twitter Sentiments – help.sentiment140.com/for-students (1.6M tweets)
Airbnb – insideairbnb.com/get-the-data.html
UCI ML Datasets – archive.ics.uci.edu/ml
Enron Email Dataset – cs.cmu.edu/~enron/ (500K emails)
SpamBase – archive.ics.uci.edu/ml/datasets/Spambase (emails)
Jeopardy Questions – reddit.com/r/datasets/comments/1uyd0t/200000_jeopardy_questions_in_a_json_file/ (200K questions and answers)
Project Gutenberg – gutenberg.org/wiki/Gutenberg:Offline_Catalogs (large collection of ebooks)
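Most of these text data sets need a small loading step before they can be used for training. As one example, here is a sketch of turning Sentiment140-style rows into (text, label) pairs. It assumes the CSV has no header and six columns (polarity, id, date, query, user, text), with polarity 0 for negative, 2 for neutral, and 4 for positive; verify this against the file you download. The two sample rows are made up.

```python
import csv
import io

# Assumed polarity codes for the Sentiment140 CSV format.
POLARITY_NAMES = {"0": "negative", "2": "neutral", "4": "positive"}

def load_labelled_tweets(handle):
    """Read six-column rows and return a list of (text, label) pairs."""
    pairs = []
    for record in csv.reader(handle):
        if len(record) != 6:
            continue  # skip malformed rows instead of failing
        polarity, _tweet_id, _date, _query, _user, text = record
        label = POLARITY_NAMES.get(polarity)
        if label is not None:
            pairs.append((text, label))
    return pairs

# Hypothetical rows standing in for the real file.
SAMPLE_CSV = (
    '"0","1","Mon Apr 06 22:19:45 PDT 2009","NO_QUERY","user_a","my phone died again"\n'
    '"4","2","Mon Apr 06 22:20:00 PDT 2009","NO_QUERY","user_b","loving the sunshine today"\n'
)
tweets = load_labelled_tweets(io.StringIO(SAMPLE_CSV))
for text, label in tweets:
    print(label, "|", text)
```

For the real file, replace the `io.StringIO` object with an opened file handle.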

Image data sets for training computer vision models:

ImageNet – image-net.org (14M images)
Google Open Images – ai.googleblog.com/2016/09/introducing-open-images-dataset.html (9M image URLs with labels)
Microsoft COCO – cocodataset.org (330K images, mostly labelled)
Stanford Dogs – vision.stanford.edu/aditya86/ImageNetDogs (120 dog breeds, 20K images)
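Before training on an image data set, it helps to check how many labelled instances each class has. The sketch below counts instances per category in a COCO-style annotation dictionary. It assumes the usual layout with "categories" and "annotations" lists, linked by "category_id"; the tiny dictionary here is made up for illustration and is not real COCO data.

```python
from collections import Counter

def count_instances_per_category(coco):
    """Map each category name to its number of annotated instances."""
    id_to_name = {cat["id"]: cat["name"] for cat in coco["categories"]}
    counts = Counter(id_to_name[ann["category_id"]] for ann in coco["annotations"])
    return dict(counts)

# Hypothetical annotation data in a COCO-like shape.
tiny_coco = {
    "categories": [{"id": 1, "name": "person"}, {"id": 18, "name": "dog"}],
    "annotations": [
        {"image_id": 10, "category_id": 18},
        {"image_id": 10, "category_id": 1},
        {"image_id": 11, "category_id": 18},
    ],
}
instance_counts = count_instances_per_category(tiny_coco)
print(instance_counts)
```

With a real annotation file, load it with `json.load` and pass the resulting dictionary to the same function.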

Please comment below if you are predicting something with any of these data sets.




Copyright © 2022 Redcrix Technologies Pvt Ltd, Chandigarh, India. All Rights Reserved.