Most of us hear this name often, particularly those who work in artificial intelligence, especially in machine learning, NLP, and computer vision.
Kaggle, a Google subsidiary, is an online community of machine learning engineers and data scientists.
Users can find datasets on Kaggle that they can use to create AI models, collaborate with other machine learning engineers and data scientists, and participate in competitions to find solutions to data science problems.
Kaggle launched in 2010 with machine learning and data science competitions, and has since added a public data platform, a cloud-based workbench for data science, and AI education.
Its key personnel were Anthony Goldbloom and Jeremy Howard.
In September 2022, Kaggle conducted its sixth annual industry-wide survey, “State of Data Science and Machine Learning 2022”, in an effort to provide a genuinely complete picture of the state of machine learning and data science.
The survey was aimed at people who work in or are interested in data science and software, and asked about their degrees, nationalities, tools, incomes, and years of experience.
These are some of the results of the survey:
– Jupyter Notebook is the most used IDE. (8736 participants)
– Colab Notebooks is the most used hosted notebook product. (5764 participants)
– Matplotlib is the most used visualization library. (9121 participants)
– Scikit-learn is the most used machine learning framework. (7140 participants)
– Linear or logistic regression is the most used machine learning algorithm. (6970 participants)
– The most used computer vision methods are image classification and other general-purpose networks (VGG, Inception, ResNet, ResNeXt, NASNet, EfficientNet, etc.). (2327 participants)
– The most used NLP methods are Transformer language models (GPT-3, BERT, XLNet, etc.). (1331 participants)
– Kaggle datasets are the most used ML model hub/repository. (1046 participants)
– The most common role is data scientist. (998 participants)
– The most common employer industry is Computers/technology.
– The largest group of respondents works in large companies of 10,000 or more employees.
– The largest group of respondents (2063) said that analyzing and understanding data to influence product or business decisions is an important part of their role at work.
– The most common yearly compensation bracket (531 respondents) is $0-999, which probably reflects the fact that many respondents are entry level.
– The largest group of respondents (1216) said that they didn’t spend any money on machine learning and/or cloud computing services at home or at work in the past 5 years.
– The largest group of respondents said that AWS offers the best developer experience.
– The most used cloud computing platform is Amazon Web Services.
– The most used data storage product is Amazon Simple Storage Service.
– The most used data product is MySQL.
– The most used business intelligence tool is Tableau.
– The largest group of respondents (7208) said that YouTube is their favorite media source for data science topics.
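
Counts like the ones above come from tallying how many respondents picked each option of a question. The sketch below shows one way to do that in Python. It is only an illustration: the sample data, the `ide_` column names, and the `tally_multiselect` helper are hypothetical, and it assumes a layout where each answer option of a multi-select question has its own column that holds the option text when selected and is empty otherwise.

```python
import csv
import io
from collections import Counter

# Hypothetical miniature of a multi-select survey export (not real survey data):
# one column per answer option, holding the option text when the respondent
# selected it and an empty cell otherwise.
SAMPLE_CSV = """ide_1,ide_2,ide_3
Jupyter Notebook,,VSCode
Jupyter Notebook,PyCharm,
,PyCharm,VSCode
Jupyter Notebook,,
"""


def tally_multiselect(csv_text, column_prefix):
    """Count how many respondents selected each option of one question."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column, value in row.items():
            # Skip malformed cells and options the respondent left empty.
            if not column or not value:
                continue
            if column.startswith(column_prefix) and value.strip():
                counts[value.strip()] += 1
    return counts


ide_counts = tally_multiselect(SAMPLE_CSV, "ide_")
top_option, top_count = ide_counts.most_common(1)[0]
print(f"{top_option}: {top_count} participants")
```

Because respondents can select several options for such questions, the per-option counts do not add up to the number of respondents, which is why "most used" here means the largest count rather than a strict majority.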
These are the most comprehensive datasets available on the state of ML and data science from 2019 to 2022:
- 2022 Kaggle Machine Learning & Data Science Survey
https://www.kaggle.com/competitions/kaggle-survey-2022
- 2021 Kaggle Machine Learning & Data Science Survey
https://www.kaggle.com/competitions/kaggle-survey-2021
- 2020 Kaggle Machine Learning & Data Science Survey
https://www.kaggle.com/competitions/kaggle-survey-2020
- 2019 Kaggle Machine Learning & Data Science Survey