How does Kaggle get its data?

The master of data science competitions

Yasmin Billeter

Kaggle, the online community for data analysis, offers exciting competitions and prizes of up to seven figures, addiction factor included. Kaggle "Master" and data science expert Tobias Mérinat shows where there is still great potential for businesses.

In his spare time, Tobias Mérinat used to play clairvoyant. He tried to predict who would buy insurance at a given rate, how restaurant sales would develop, and whether a mobile ad would be clicked. Mérinat has no psychic abilities: he is a data scientist, and he uses algorithms for his forecasts. He found the raw material, the data, on the Kaggle website, where research institutions and companies post problems they cannot solve themselves. Experts from all over the world then try to find an answer. They compete for the best algorithm, and the winner usually receives a cash prize. Some competitions advertise prizes of five, six, or even seven figures.

There are various competitions on Kaggle in which the community develops machine learning models.

Today, Tobias Mérinat is a father of two and no longer has spare time for machine learning competitions. “They are extremely complex and addictive,” he says with a smile. The 43-year-old prefers playing Lego with his children. Family is important to him: he also takes one "daddy day" a week. Working from home suits him. Analyzing data in a targeted way is now his job at the Algorithmic Business Research Lab.

Machine learning competitions paved the way for me from theory to practice.

He fondly remembers his beginnings with Kaggle competitions: "They paved the way for me from theory to practice: here I was able to apply data science and machine learning in greater depth." Mérinat took part in dozens of competitions: at first in almost all of them, later selectively, according to his interests and where he could learn the most.

He was active in the community and, thanks to his successes, can call himself a Kaggle Competitions Master. Along the way, he earned virtual medals and competed for recognition among data scientists on the live leaderboards.

Kaggle is not the real machine learning world.

He avoided competitions where it was foreseeable that participants would be fighting over the tiniest fractions of a score, “like in ski racing”. That has little to do with reality, and this is where he sees the problem: “Kaggle is not the real machine learning world. It's only a slice of it. Sophisticated solutions emerge that are extremely precise but cannot be put into operation.”

Learn from the best

In the real world, a company must first define a business problem, obtain clean data, and decide how success will be measured. According to Mérinat, this is not always easy and is often the larger part of the work. Here he sees an opportunity to learn from Kaggle: “If a company goes to the effort and accepts costs of 80,000 to 200,000 US dollars to host a competition, the problem must be central to its business and promise a significant benefit.”

Extremely sophisticated solutions are created.

The logical consequence: “By analyzing Kaggle's machine learning competitions, we get a good indicator of which data problems exist in which sectors and branches of industry. We also find out which industries use Kaggle or even know about it.” So Mérinat did what he is good at: organizing data and making it usable. Here is a glimpse of the results:

Findings from the Kaggle study by Tobias Mérinat:

Which industries rely on machine learning competitions?

Which topics do the machine learning competitions cover?

The study shows that machine learning is already used in most industries. The industries with the broadest usage are research, internet advertising, medicine, finance, consulting, and insurance.

Kaggle competitions are an indicator of the central and difficult data problems that the big players in the various industrial sectors are dealing with.

The topics where machine learning is most widely used are disease diagnostics, object recognition and natural sciences. Cross-industry topics such as forecasting sales prices or volumes, recommender systems and conversion are also well represented. Object recognition is an established method that is already used across industries.

Tobias Mérinat emphasizes that the competitions do not give a comprehensive picture of where machine learning can be used in general. However, they are a good indicator of the central and difficult data problems that the big players in the various industrial sectors are dealing with. The study shows, for example, that banks have relatively broad interests and can afford to pursue them through competitions. In industrial production, on the other hand, there were almost no competitions. This surprises Mérinat: "There is a lot of potential here, too, because a lot of data is recorded during production processes."

Tips for the data science journey

Tobias Mérinat highly recommends Kaggle to anyone looking to enter the world of data science. “I learned an incredible amount there, even tricks that I haven't found in any book or heard in any lecture.” He would not recommend the platform to companies without reservation: “If a company is willing to invest that much money in a competition, it would be better off knocking on our door. We develop applicable and efficient solutions.”

Our data expert: As a machine learning engineer at the Algorithmic Business Research Lab (ABIZ), Tobias Mérinat is passionate about implementing data-based industrial projects. He has achieved the title of Competitions Master in Kaggle's international ranking.

The online community for data analysis: Kaggle offers services related to big data, machine learning and data mining. Its main purpose is to host data science competitions, which usually attract thousands of submissions. Both individuals and entire teams compete for lucrative prizes. Kaggle is now part of Google.

Do you have exciting data sets but lack the algorithms to organize the data and make it usable? The ABIZ research team supports industrial and cooperation partners in the development of business models and services based on complex algorithms (Algorithmic Business) in the context of digital transformation. In addition to research and development, the team offers the following services: consulting on digital business, on-site training, audits and coaching in the areas of artificial intelligence, machine learning, image processing and data analysis.

Continuing education: Understand AI with the CAS Artificial Intelligence (AI/KI), apply it innovatively and look behind the hype!

Artificial intelligence at the Lucerne University of Applied Sciences: AI is a focus at the Lucerne University of Applied Sciences and Arts. The Artificial Intelligence & Machine Learning bachelor's degree has been running since spring 2020.

Do you like our IT blog? Here you can get tips and read about trends from the world of computer science. We offer insights into our department and portraits of IT pioneers, visionaries and exciting people: Subscribe to our blog now!
