Whenever you hear the term AI, you must think about the data behind it.
In this post, I am sharing a collection of openly available data sets that you can use to train machine learning models for a wide range of tasks.
A data set is a collection of data. In ML projects, we need a training data set: the examples the model learns from.
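To make the idea of a training data set concrete, here is a minimal Python sketch (my own illustration, not tied to any data set listed below) that shuffles a list of examples and holds part of it back as a test set for checking the trained model. The function name `train_test_split` and its default 20% test fraction are assumptions for this example; scikit-learn ships its own, more complete function of the same name.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split(examples: Sequence[T], test_fraction: float = 0.2,
                     seed: int = 42) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the examples and split them into (train, test)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(examples)              # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded RNG makes the split reproducible
    n_test = int(len(shuffled) * test_fraction)
    # The first n_test shuffled items become the test set, the rest the training set.
    return shuffled[n_test:], shuffled[:n_test]


train, test = train_test_split(list(range(10)), test_fraction=0.2)
print(len(train), len(test))  # 8 2
```

Fixing the seed means every run produces the same split, which makes results easier to compare while you experiment.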
2. US Government
- NOAA Climate Data Online – ncdc.noaa.gov/cdo-web (weather, climate, environmental data)
- US Census Data – census.gov/data.html (demographics)
- Bureau of Labor Statistics – bls.gov/data (employment/unemployment, product categories)
3. UK Government
- UK Data Service – www.ukdataservice.ac.uk (census data)
4. International Organizations
- World Bank – datacatalog.worldbank.org (census, demographics, geographic, health, income, GDP)
- IMF – imf.org/en/Data (economic, currency, finance, commodities)
Find ideas for fun applications using these data sets:
- Kaggle.com/datasets (variety)
- Amazon reviews – snap.stanford.edu/data/web-Amazon.html (35M product reviews)
- MovieLens – grouplens.org/datasets/movielens (20M movie ratings)
- IMDB reviews – ai.stanford.edu/~amaas/data/sentiment/ (50K movie reviews labelled for sentiment: 25K for training, 25K for testing)
- Twitter sentiment (Sentiment140) – help.sentiment140.com/for-students (1.6M tweets)
- Inside Airbnb – insideairbnb.com/get-the-data.html
- UCI ML Datasets – mlr.cs.umass.edu/ml
- Enron email dataset – cs.cmu.edu/~enron/ (500K emails)
- Spambase – archive.ics.uci.edu/ml/datasets/Spambase (spam and non-spam emails)
- Jeopardy questions – reddit.com/r/datasets/comments/1uyd0t/200000_jeopardy_questions_in_a_json_file/ (200K questions and answers)
- Project Gutenberg e-books – gutenberg.org/wiki/Gutenberg:Offline_Catalogs (large collection of e-books)
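As a starting point for a fun application, the sketch below ranks movies by average rating. It assumes the MovieLens `ratings.csv` layout with a `userId,movieId,rating,timestamp` header (check the README that ships with the version you download). The helper `top_rated_movies` is hypothetical, and the three-row sample is made up for illustration.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, TextIO, Tuple


def top_rated_movies(ratings_csv: TextIO, min_ratings: int = 1,
                     top_n: int = 10) -> List[Tuple[int, float]]:
    """Return (movieId, mean rating) pairs, highest mean first.

    Assumes a header of userId,movieId,rating,timestamp (MovieLens-style).
    """
    totals: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for row in csv.DictReader(ratings_csv):
        movie_id = int(row["movieId"])
        totals[movie_id] += float(row["rating"])
        counts[movie_id] += 1
    # Skip movies with too few ratings, since a single 5-star vote is not meaningful.
    means = [(m, totals[m] / counts[m]) for m in counts if counts[m] >= min_ratings]
    # Highest mean first; ties broken by movieId so the order is deterministic.
    means.sort(key=lambda pair: (-pair[1], pair[0]))
    return means[:top_n]


# Made-up sample in the assumed format.
sample = io.StringIO(
    "userId,movieId,rating,timestamp\n"
    "1,10,4.0,0\n"
    "2,10,5.0,0\n"
    "1,20,3.0,0\n"
)
print(top_rated_movies(sample))  # [(10, 4.5), (20, 3.0)]
```

On the real data you would pass a file opened with `open("ml-20m/ratings.csv", newline="")` (the path is an assumption about where you unzip the archive) and raise `min_ratings` so that rarely rated movies do not dominate the list.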
Image data sets for computer vision:
- ImageNet – image-net.org (14M images)
- Google Open Images – ai.googleblog.com/2016/09/introducing-open-images-dataset.html (9M image URLs with labels)
- Microsoft COCO – cocodataset.org (330K images, mostly labelled)
- Stanford Dogs – vision.stanford.edu/aditya86/ImageNetDogs (120 dog breeds, 20K images)
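Image data sets are often distributed with one folder per class. Assuming that layout (my understanding is that the Stanford Dogs archive unpacks into one sub-folder per breed, but verify after extracting), this sketch walks the folders and pairs each image path with an integer label, the same convention torchvision's `ImageFolder` follows. The function `index_image_folders` and the accepted file extensions are hypothetical choices for illustration.

```python
from pathlib import Path
from typing import List, Tuple

# Hypothetical set of extensions treated as images.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def index_image_folders(root: Path) -> Tuple[List[Tuple[Path, int]], List[str]]:
    """Pair every image under root/<class_name>/ with an integer class label.

    Assumes one sub-folder per class; files directly under root are ignored.
    """
    # Sorting the folder names makes the label numbering reproducible.
    class_names = sorted(d.name for d in root.iterdir() if d.is_dir())
    label_of = {name: i for i, name in enumerate(class_names)}
    samples: List[Tuple[Path, int]] = []
    for name in class_names:
        for path in sorted((root / name).iterdir()):
            # Keep only files whose extension looks like an image.
            if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
                samples.append((path, label_of[name]))
    return samples, class_names
```

Calling `index_image_folders(Path("Images"))` (the folder name is an assumption) would return the (path, label) pairs plus the list of class names, which can then be fed to a train/test split before training an image classifier.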
Please comment below if you are predicting something with these data sets.