The basics of data analysis on behavioral data

Behavioral data is one of the New Kids on the Block in the market research world. Since it is very different from the data that comes in from traditional research, data analysts often ask us for guidance on how to analyze it.

This blog post gives an introduction to behavioral data and shows how you can start working with it.

We will do this by showing which metrics you can consider when looking at Wakoopa's different data files, and by walking through small use-case examples that you can use as inspiration.

Why is behavioral data so different from traditional survey data?

When talking about online data collection for market research purposes, we are usually interested in two types of data:

  • Opinions (stated), which refers to subjective data like emotions, intentions, moods or preferences; that is, all kinds of information that is inside our brains.
  • Behavior (observed), which is a kind of record of our physical actions, like the products we purchase in webshops.

Both data sources are needed to fully understand consumers, but they are different in nature and each needs a completely different approach.

People change their minds quite often, for example about their preference for a political party, a brand, or a product. But facts are facts: if I purchased a smartphone last week, nothing will change that, even if I am not fully satisfied with it.

                              Opinion data      Behavioral data
Nature of data                Subjective        Objective
Structure                     Many variables    Limited variables
Relevance of data             High              Saturated, uncertain
Level of influence on data    Direct, high      Low

Differences in approach to analysis

When we look at how the two sources of data need to be analyzed to produce valuable output, there are big differences between them.

The most visible difference between the traditional survey data and behavioral data lies in the initial output.

Surveys are aimed at getting a specific question answered, which is done by sending a survey with targeted questions to a group of participants that may be pre-defined. The outcome of the survey is a set of answers that are often fairly linear: on question 1 you can answer A to E, and so on for a number of questions, with some dependencies between them. In the end, the format of the output is predictable.

Defining the questions and drawing up the questionnaire is the critical phase here. When the goal of the research is taken into account while defining the questions, they can be formulated in such a way that they lead to answers and data that are useful for the project and easy to process.

With behavioral data, the outcome is the online data of N people over X days. At no point in this process is it known whether this data will be useful, which parts of it are useful, or which parts should be looked at.

This means that starting your analysis by simply looking through huge sets of behavioral data can be a bit overwhelming.

If you look at one day of data from a regular internet user, you are looking at hundreds of pageviews on desktop, and the same again on mobile if the participant has a cross-device installation.

To give an example of how quickly data sets grow: in our own data, a panel of 200 participants generates about 1 million rows of desktop data per month, while a Microsoft Excel worksheet cannot hold more than 1,048,576 rows. (source)

So this type of data runs into large numbers very quickly, and that calls for a different approach.
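As a rough sanity check on these figures, here is the arithmetic as a small Python sketch. The 30-day month is an assumption, and a row in an export is not necessarily the same thing as a pageview.

```python
# Figures from the text: about 1 million desktop rows per month for a
# 200-person panel, and Excel's limit of 1,048,576 rows per worksheet.
PANELISTS = 200
ROWS_PER_MONTH = 1_000_000
EXCEL_MAX_ROWS = 1_048_576
DAYS_PER_MONTH = 30  # assumption: a 30-day month


def rows_per_panelist_per_day(rows: int, panelists: int, days: int) -> float:
    """Average number of data rows one panelist produces per day."""
    return rows / (panelists * days)


def months_that_fit(rows_per_month: int, limit: int) -> int:
    """How many full months of data fit in a single worksheet."""
    return limit // rows_per_month


per_day = rows_per_panelist_per_day(ROWS_PER_MONTH, PANELISTS, DAYS_PER_MONTH)
print(f"about {per_day:.0f} desktop rows per panelist per day")
print(f"full months per worksheet: {months_that_fit(ROWS_PER_MONTH, EXCEL_MAX_ROWS)}")
```

On these numbers, one month of desktop data from a 200-person panel already fills most of a worksheet, and a second month would not fit.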

How to approach behavioral data from a market research perspective

In traditional research, the initial filtering of data and participants is done when the survey questions are created. With behavioral data, this filtering becomes one of the most important parts of the analysis itself, at the data level.

It is important to define your points of interest, segments, and categories before looking at the data itself. Deciding what you want to see in the end, which information is of interest to you, and what will add value takes the same kind of creativity and logic as defining survey questions.

So: are there specific groups in my sample I am interested in? Do I already know of certain demographics that are particularly interesting? Do I want to create segments of people based on certain behaviors? Am I working with pre-defined communities or groups? Am I interested in a specific industry or domain? And so on.

Answering these questions before even opening the data files makes it easy to filter or subset the data, which lets you work faster and with smaller files.
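A minimal Python sketch of subsetting before the analysis starts. The Panelist and Visit record types, their fields, and the sample rows are hypothetical placeholders, not Wakoopa's actual export format; the point is only that the segment is defined first and the rows are filtered against it.

```python
from dataclasses import dataclass


# Hypothetical record types; field names are illustrative only.
@dataclass(frozen=True)
class Panelist:
    panelist_id: str
    age: int
    gender: str


@dataclass(frozen=True)
class Visit:
    panelist_id: str
    domain: str


def segment_ids(panelists, predicate):
    """IDs of the panelists that match a segment definition."""
    return {p.panelist_id for p in panelists if predicate(p)}


def subset_visits(visits, ids):
    """Keep only the rows produced by panelists in the segment."""
    return [v for v in visits if v.panelist_id in ids]


# Made-up sample data.
panelists = [Panelist("p1", 22, "male"), Panelist("p2", 45, "female"), Panelist("p3", 27, "female")]
visits = [
    Visit("p1", "news.example"),
    Visit("p2", "shop.example"),
    Visit("p3", "news.example"),
    Visit("p2", "news.example"),
]

# Segment decided before opening the data: panelists aged 20 to 30.
young = segment_ids(panelists, lambda p: 20 <= p.age <= 30)
smaller = subset_visits(visits, young)
print(f"kept {len(smaller)} of {len(visits)} rows")
```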

The next step is to define your research questions further. If you have decided that mobile data is interesting for you, you need to decide whether you want to see mobile data produced in apps or on websites. If it is apps, do you want all apps, a specific category, or just a few apps?
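A sketch of how those choices translate into filters, assuming (hypothetically) that each row records the device, whether it came from an app or a website, the app name or domain, and a category. None of these field names come from an actual Wakoopa export.

```python
from dataclasses import dataclass


# Hypothetical row layout: "source" says whether a row came from an app or a website.
@dataclass(frozen=True)
class Row:
    panelist_id: str
    device: str    # "desktop" or "mobile"
    source: str    # "app" or "web"
    name: str      # app name for app rows, domain for web rows
    category: str


def mobile_rows(rows, source, categories=None, names=None):
    """Mobile rows of one source, optionally narrowed to some categories or names."""
    return [
        r for r in rows
        if r.device == "mobile"
        and r.source == source
        and (categories is None or r.category in categories)
        and (names is None or r.name in names)
    ]


# Made-up sample data.
rows = [
    Row("p1", "mobile", "app", "ChatApp", "Communication"),
    Row("p1", "mobile", "web", "news.example", "News"),
    Row("p2", "mobile", "app", "MapApp", "Navigation"),
    Row("p2", "desktop", "web", "shop.example", "Shopping"),
    Row("p3", "mobile", "app", "TalkApp", "Communication"),
]

all_apps = mobile_rows(rows, "app")                                     # every app
one_category = mobile_rows(rows, "app", categories={"Communication"})   # one category
a_few_apps = mobile_rows(rows, "app", names={"MapApp", "TalkApp"})      # selected apps
mobile_web = mobile_rows(rows, "web")                                   # mobile websites
```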

Defining these initial research questions in as much detail as possible will save you a lot of time when executing the analysis.

A simple example of a question you might want to answer with behavioral data is:

“Which website categories are more popular with my target audience than for the overall population?”

The words in this question that act as variables are website, categories, more popular, and target audience.
Even with an example this basic, you can already define many different metrics and more precise questions that help you filter and subset the data, making it easier to handle.

Website: We know we are looking at websites, so app data is not relevant here and can be removed from the data.
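In code, this step is a single filter. The tuple layout (panelist ID, source, name) is a hypothetical stand-in for whatever your export uses to mark app and web rows.

```python
# Hypothetical rows: (panelist_id, source, name), where source is "web" or "app".
rows = [
    ("p1", "web", "news.example"),
    ("p1", "app", "ChatApp"),
    ("p2", "web", "shop.example"),
    ("p2", "app", "MapApp"),
]


def websites_only(rows):
    """Drop app rows, since the question is about websites only."""
    return [row for row in rows if row[1] == "web"]


web_rows = websites_only(rows)
print(f"removed {len(rows) - len(web_rows)} app rows, kept {len(web_rows)} web rows")
```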

Categories: The main objects of interest are website categories, so we aggregate our data set by category. There is no need to run the analysis over exports listing every visited domain, let alone full-length URLs.
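A minimal sketch of the aggregation, assuming hypothetical URL-level rows that already carry a category label and treating each row as one visit:

```python
from collections import Counter

# Hypothetical URL-level web rows: (panelist_id, url, category).
rows = [
    ("p1", "https://news.example/politics/article-1", "News"),
    ("p1", "https://news.example/sport/article-2", "News"),
    ("p2", "https://shop.example/cart", "Shopping"),
    ("p3", "https://other-news.example/", "News"),
]


def visits_per_category(rows):
    """Collapse URL-level rows into a single visit count per category."""
    return Counter(category for _, _, category in rows)


by_category = visits_per_category(rows)
print(by_category.most_common())
```

Four URL-level rows become two category-level numbers, which is the size reduction this step is meant to give you.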

More popular: Defining what “more popular” means gives you the main metrics of this question. Is a category popular when 1) it is visited most often, 2) it is visited by the most unique visitors, 3) the most total time is spent on it, or 4) the most time is spent per visit on average? If all metrics are interesting, create a separate research question for each.
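A minimal Python sketch computing all four candidate metrics per category from hypothetical visit rows (panelist ID, category, duration in seconds). The layout and numbers are made up; “average” here means average duration per visit. Note how the ranking changes with the metric.

```python
from collections import defaultdict

# Hypothetical visit rows: (panelist_id, category, duration in seconds).
visits = [
    ("p1", "News", 120),
    ("p1", "News", 60),
    ("p2", "News", 30),
    ("p2", "Shopping", 600),
    ("p3", "Shopping", 300),
]


def popularity_metrics(visits):
    """Per category: visits, unique visitors, total time, average time per visit."""
    counts = defaultdict(int)
    visitors = defaultdict(set)
    seconds = defaultdict(int)
    for panelist, category, duration in visits:
        counts[category] += 1
        visitors[category].add(panelist)
        seconds[category] += duration
    return {
        category: {
            "visits": counts[category],
            "unique_visitors": len(visitors[category]),
            "total_seconds": seconds[category],
            "avg_seconds": seconds[category] / counts[category],
        }
        for category in counts
    }


def rank(metrics, name):
    """Categories ordered from most to least popular on one metric."""
    return sorted(metrics, key=lambda category: metrics[category][name], reverse=True)


metrics = popularity_metrics(visits)
for name in ("visits", "unique_visitors", "total_seconds", "avg_seconds"):
    print(name, rank(metrics, name))
```

With this sample, News wins on number of visits while Shopping wins on total and average time, which is why each metric deserves its own research question.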

Target audience: Here we identify who our target audience is. If you say you are interested in the ‘young male population’, then define a young male: is it someone aged 18-25, 16-30, or something else? If your audience is ‘people who have visited website XX’, you can define it based on the behavioral data itself. Once you have identified the audience, you can filter the data down to only these individuals.
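Both kinds of audience definition can be sketched as simple filters. The profile and visit layouts are hypothetical, the age bounds are treated as inclusive, and xx.example is a placeholder for ‘website XX’.

```python
# Hypothetical panel profiles: panelist_id -> (age, gender).
profiles = {"p1": (19, "male"), "p2": (24, "male"), "p3": (29, "female"), "p4": (28, "male")}
# Hypothetical web rows: (panelist_id, domain); "xx.example" stands in for website XX.
visits = [
    ("p1", "xx.example"),
    ("p2", "news.example"),
    ("p3", "xx.example"),
    ("p4", "shop.example"),
]


def young_males(profiles, min_age, max_age):
    """Demographic audience; the age bounds are an explicit, inclusive choice."""
    return {
        pid for pid, (age, gender) in profiles.items()
        if gender == "male" and min_age <= age <= max_age
    }


def visitors_of(visits, domain):
    """Behavioral audience: everyone who visited the domain at least once."""
    return {pid for pid, d in visits if d == domain}


def audience_rows(visits, audience):
    """Filter the data down to the rows of the audience members."""
    return [row for row in visits if row[0] in audience]


narrow = young_males(profiles, 18, 25)            # contains p1 and p2
broad = young_males(profiles, 16, 30)             # contains p1, p2 and p4
xx_visitors = visitors_of(visits, "xx.example")   # contains p1 and p3
```

With this sample, the 18-25 and 16-30 definitions produce different audiences (p4, aged 28, is only in the second), which is exactly why the definition has to be pinned down before the analysis.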

Running the actual analysis

Continuing with the same example, “Which website categories are more popular with my target audience than for the overall population?”, many research questions can be formulated:

  • Which website categories have the most visits from panelists between 20-30 years old?
  • Which website categories have the most visits from all panelists?
  • Which website categories have the most unique visitors from panelists between 20-30 years old?
  • Which website categories have the most unique visitors from all panelists?
  • Which website categories have the most time spent by panelists between 20-30 years old?
  • Which website categories have the most time spent by all panelists?
  • Which website categories have the longest visits on average by panelists between 20-30 years old?
  • Which website categories have the longest visits on average by all panelists?

So even though the research question seems very simple, we already have 8 different, more precise questions based on it. A more complex research objective will produce even more research questions. But when you break it down into as many separate questions as you can, each of them will be easy to answer.
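The breakdown above is simply every combination of a metric and an audience: 4 metrics times 2 audiences gives 8 questions. A small Python sketch that generates the same list:

```python
from itertools import product

# The four metrics and two audiences from the list of questions above.
metrics = [
    "the most visits from",
    "the most unique visitors from",
    "the most time spent by",
    "the longest visits on average by",
]
audiences = ["panelists between 20-30 years old", "all panelists"]

questions = [
    f"Which website categories have {metric} {audience}?"
    for metric, audience in product(metrics, audiences)
]

for q in questions:
    print(q)
```

Each added metric or audience multiplies the count: a third audience would already give 12 questions.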

Example of answering one of the questions

Once the detailed questions are formulated, answering them is easy.
For “Which website categories have the most visits from panelists between 20-30 years old?”, answering requires the following steps (regardless of the analysis tools used):

  • Subset data on target audience.
  • Count the total number of visits per website category and order the categories from high to low.
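The two steps above can be sketched in Python as follows. The ages, the (panelist ID, category) visit rows, and the inclusive 20 to 30 bounds are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical inputs: panelist ages, and web visits as (panelist_id, category).
ages = {"p1": 22, "p2": 41, "p3": 28, "p4": 35}
visits = [
    ("p1", "News"), ("p1", "Social"), ("p1", "Social"),
    ("p2", "Shopping"), ("p2", "Shopping"), ("p2", "News"),
    ("p3", "Social"), ("p3", "Shopping"),
    ("p4", "News"),
]


def top_categories_by_visits(visits, ages, min_age=20, max_age=30):
    """Visits per category for the target audience, ordered high to low."""
    # Step 1: subset the data on the target audience (ages taken as inclusive bounds).
    audience = {pid for pid, age in ages.items() if min_age <= age <= max_age}
    audience_categories = [category for pid, category in visits if pid in audience]
    # Step 2: count visits per category and order them from high to low.
    return Counter(audience_categories).most_common()


print(top_categories_by_visits(visits, ages))
```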

If you then want to zoom in on one category, you can remove all data that does not belong to that category and run the same analysis, this time counting visits per domain instead of per category.
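Zooming in is the same count with a filter in front and the domain as the grouping key. A minimal sketch over hypothetical (category, domain) rows for the target audience:

```python
from collections import Counter

# Hypothetical target-audience rows: (category, domain).
rows = [
    ("Social", "social-a.example"),
    ("Social", "social-b.example"),
    ("Social", "social-a.example"),
    ("News", "news.example"),
]


def top_domains_in_category(rows, category):
    """Keep one category's rows, then count visits per domain instead of per category."""
    return Counter(domain for cat, domain in rows if cat == category).most_common()


print(top_domains_in_category(rows, "Social"))
```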


As with every task, preparation is key in behavioral data analysis. Answering clearly defined questions is not difficult; the biggest trick lies in defining those questions and knowing what is of interest in the data, even before looking at it.

This way of thinking is important throughout the entire research design, from sales to execution. When a prospect approaches you with a question such as ‘I want to know what people do online’, the immediate response should be to define this question further, so that you can identify which research methods are necessary and which people should be invited, and help the research team come up with research questions that are as detailed as necessary.

If you need guidance in setting up a research project, or would like to discuss how to approach a project, just contact Wakoopa. We are happy to help!