Kaggle Data Science Survey 2017

Kaggle is a fantastic company. Its site is a fun place to get experience with realistic data science problems. You compete with others to build the best model. Sometimes it’s “just” for bragging rights (which is really all that matters anyway) and sometimes there are prizes involved. Check it out if you haven’t already.

The latest cool thing Kaggle has done is to survey everyone who has an account on their website. Sixteen thousand people responded. No, I was not one of them. In addition to collecting and using the data themselves, they’ve published an anonymized version of the data and launched an interesting website that lets you easily cut the data in different ways.

https://www.kaggle.com/surveys/2017

Here are my three favorite takeaways:

  1. A non-trivial number of people on Kaggle report being zero years old. Give that kid a job. (a.k.a., Surveys have crazy data sometimes.)
  2. 63.5% of people do logistic regression at work. That’s the most common method among those who answered this question. It holds the top spot in every company size bucket.
  3. The battle between Python and R is intense. Python takes the top seat overall and in many industry sub-categories. Non-profits seem to favor R more.
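
If you would rather cut the anonymized data yourself than click through the website, here is a rough sketch of the first two checks above. The column names (`Age`, `CompanySize`, `Methods`), the comma-separated format for methods, and the age bounds are all my assumptions for illustration, not the survey's actual schema, so check the published file before relying on any of it. The toy rows are made up and only show the call shape.

```python
from collections import defaultdict


def implausible_ages(rows, low=16, high=90):
    """Return rows whose (hypothetical) 'Age' field falls outside [low, high]."""
    flagged = []
    for row in rows:
        raw = (row.get("Age") or "").strip()
        if not raw:
            continue  # skip blank answers rather than guess
        try:
            age = float(raw)
        except ValueError:
            continue  # skip non-numeric answers
        if age < low or age > high:
            flagged.append(row)
    return flagged


def method_share_by_group(rows, method, group_col="CompanySize", methods_col="Methods"):
    """Share of respondents per group who list `method`, among those who answered."""
    answered = defaultdict(int)
    used = defaultdict(int)
    for row in rows:
        raw = (row.get(methods_col) or "").strip()
        if not raw:
            continue  # only count people who answered the question
        group = row.get(group_col) or "Unknown"
        answered[group] += 1
        selected = {m.strip() for m in raw.split(",")}
        if method in selected:
            used[group] += 1
    return {g: used[g] / answered[g] for g in answered}


# Made-up toy rows (hypothetical columns); these are NOT survey results.
toy_rows = [
    {"Age": "0", "CompanySize": "small", "Methods": "Logistic Regression,Decision Trees"},
    {"Age": "34", "CompanySize": "small", "Methods": "Neural Networks"},
    {"Age": "41", "CompanySize": "large", "Methods": "Logistic Regression"},
    {"Age": "29", "CompanySize": "large", "Methods": ""},
]

print(len(implausible_ages(toy_rows)))
print(method_share_by_group(toy_rows, "Logistic Regression"))
```

Both functions take a list of dictionaries, which is what `csv.DictReader` produces, so the same code should work on a real file once the column names are swapped for the real ones. Note that the share is computed only over people who answered the methods question, which mirrors how the 63.5% figure is described above.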

You can bet this website is going to make an appearance in my final lecture of Supply Chain Analytics next week.