Impressions from WHAT Datathon 2017

As part of WHAT Conference, we organized the first edition of WHAT Datathon on April 22-23, 2017. We decided to host this event to show how behavioral data can be turned from raw data into valuable insights, how those insights can be visualized, and how the data can be used to predict the future behavior of consumers. Since behavioral data holds so much unexplored potential, we invited knowledgeable and inquisitive people and challenged them to put that data to work.

The participants could compete in three tracks:

The GfK Insights track asked participants to reveal meaningful insights that generate business value from the current data (e.g. path-to-purchase, segmentation, categorization).

The DAN DNA Visualization track challenged the participants to impress the jury with an outstanding visualization of information and insights that the jury had not seen before (e.g. spheres, flow charts, activity visualization).

Finally, for the IceMobile Predictive track, the participants had to develop machine learning algorithms to predict the gender of a user based on their navigation in terms of sessions and URLs (200k sessions/14 million URLs to train, and 50k sessions/3.4 million URLs to test).

We started the day with a small breakfast, giving the participants some time to network, meet each other and form teams.


We also handed out special WHAT Datathon goodie bags, with a little message of support on the front for the tasks ahead.

The event was hosted in TQ's main event space, generously sponsored by TQ, which gave our participants a great open space (with a fantastic view!) to work in.


After breakfast, the participants were given their briefing and access to the data.


At 11am, the clock started on the 24 hours of competition, after which the teams would have to present their findings from the behavioral data set to the jury.



During the 24 hours, we of course provided the participants with plenty of food, snacks, drinks and entertainment to keep them going through the long, sleepless night!




Many of the teams stayed through the night to make sure they had the best possible chance of winning one of the tracks. The teams worked hard during the night, as this graph tracking the activity on the machines provided by AWS shows. All times are UTC, which means there was a spike of activity at around 5am local time.


When the competition ended at 11am on Sunday, it was time for the presentations. All eight teams presented their findings to our jury of behavioral data experts from DAN DNA, IceMobile, GfK and Wakoopa.


There were some great insights, visualizations and predictive problem solutions presented by our teams. Some notable mentions include an analysis of the behavior of 'adult content' watchers, and some creative hand-drawn graphs.

After all the presentations were done, the participants enjoyed a catered lunch while the jury deliberated and decided on their winners for each track.

First up were the winners of the DAN DNA Visualization track, Team Cortex (Jesse Paquette, Sergio Bellon Alcarazo, and Marjolein Smits). Incredibly, Team Cortex was formed at the Datathon, and its members hadn't met before the event. They used a combination of Tableau and Jesse's own software to visualize the user data.

Team Cortex used Tableau to visualize information about the users, such as their education, preferred bank, and age group.
Additionally, they used Jesse's software to create characterizations of user clusters.


The winners of the IceMobile Predictive track were Team TeaClub (Jedda Boyle, Madli Uutma, Taavi Kivisik, and Sebastian Mehldau), who had the highest AUC ('area under the curve') score. The teams were given a mostly complete data set, with roughly 20% of it missing the genders of the users. The challenge was to predict whether those users were male or female based on their online behavior. By simply identifying some key domains that showed up predominantly for one gender, Team TeaClub managed to score the highest and win the track.
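To give a feel for what such an approach could look like, here is a minimal sketch in Python. It is not Team TeaClub's actual code, which we don't have; it only illustrates the general idea of scoring sessions by domains that lean towards one gender and then measuring the result with AUC. The function names, the 0/1 label encoding, the smoothing value, and the toy domains are all assumptions made up for this example.

```python
import math
from collections import Counter


def domain_log_odds(sessions, labels, smoothing=1.0):
    """Smoothed log-odds per domain: positive leans towards label 1, negative towards label 0.

    sessions: list of lists of domain strings (hypothetical format).
    labels:   list of 0/1 values, one per session (hypothetical encoding of gender).
    """
    counts = {0: Counter(), 1: Counter()}
    for domains, label in zip(sessions, labels):
        counts[label].update(set(domains))  # count each domain once per session
    n1 = sum(1 for label in labels if label == 1)
    n0 = len(labels) - n1
    vocab = set(counts[0]) | set(counts[1])
    return {
        d: math.log((counts[1][d] + smoothing) / (n1 + 2 * smoothing))
        - math.log((counts[0][d] + smoothing) / (n0 + 2 * smoothing))
        for d in vocab
    }


def score_session(domains, log_odds):
    """Sum the log-odds of the known domains in a session; unseen domains add nothing."""
    return sum(log_odds.get(d, 0.0) for d in set(domains))


def auc(labels, scores):
    """AUC as the chance that a random label-1 item outscores a random label-0 item (ties count half)."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one item of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented purely for illustration.
train_sessions = [
    ["sports.example", "news.example"],
    ["fashion.example", "news.example"],
    ["sports.example"],
    ["fashion.example"],
]
train_labels = [1, 0, 1, 0]

test_sessions = [["sports.example"], ["fashion.example", "news.example"], ["news.example"]]
test_labels = [1, 0, 1]

log_odds = domain_log_odds(train_sessions, train_labels)
test_scores = [score_session(s, log_odds) for s in test_sessions]
test_auc = auc(test_labels, test_scores)
```

An AUC of 0.5 corresponds to random guessing and 1.0 to a perfect ranking, which is why it works well as a leaderboard metric: only the ordering of the predictions matters, not their absolute values.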

In the final scores for the predictive track, Team TeaClub edged out the second-place team by just a few points. The leaderboard was projected in the room all night, and teams were given the opportunity throughout the 24 hours to submit new predictions and see their score in real time.


Finally, the GfK Insights track was won by Team CodersCo (Mariann Lesko, Gosia Wrzesinska, Jan-Mark Wams, and Artem Duplinskiy), who used their own dashboard to visualize and cluster online fashion shoppers' behavior based on the data set.

Using their dashboard, CodersCo visualized shopping behavior in order to provide insights for businesses about how their consumers are behaving online.


Even after staying awake all night and working hard on their presentations, some of the participants still had the energy to stay for some celebratory drinks.

WHAT Datathon was a huge success. We received a lot of great feedback, and 100% of the respondents to our post-event survey said they would recommend the event to friends or colleagues. And of course, we at Wakoopa take away a lot of inspiration from the event. We are looking forward to the next Datathon.

We want to give a huge thank you to everybody who came down to participate in the first-ever WHAT Datathon!

Of course, the Datathon would not have been possible without the help of our fantastic sponsors. Thank you to DAN DNA, IceMobile, GfK, Blendle, Elastic, Catawiki, StartupAmsterdam, Neuro Flash, Fruitful Office, and Tony's Chocolonely for all the support!


For more images from WHAT Datathon, head to the Facebook page.