I’m teasing of course. I don’t expect to raise anybody’s salary with a discussion of quantitative variables (or anything else).
But I do hope to explain one of the more important differences between web analytics and most of the traditional BI marketing analytics that has been done in the last twenty years. In addition, I’m going to make the case for why a new tool from WebTrends is a lot more important than you may be inclined to think. Along the way, you just might get a little richer – in understanding if not in take-home pay!
What actually got me to write about quantitative variables is a product from WebTrends that I first saw in late July and re-visited at the Engage Conference in Vegas. The product is called Score and it’s a product direction that I believe to be significant. One that echoes back to a great deal of work we did in the first six or seven years at Semphonic.
To explain why I think Score is significant, I’m going to have to delve into a bit of analytic theory. Don’t groan – I’ll do my best not to be too digressive!
For many years, multi-dimensional analysis has been the staple of business intelligence systems. Products from companies like Business Objects, Cognos and MicroStrategy have provided rich multi-dimensional reporting and analysis for probably a good decade. Web Analytics tools like Discover 2.0, Visual Site and WebTrends VI are just beginning to bring similar levels of capability to web analytics.
What is multi-dimensional analysis? It’s simple stuff really and this is ground I’ve gone over before. In basic statistics, you typically start an analysis with a Frequency table. A Frequency table gives you the counts for all the values of a single variable. Here’s an example:
Gender    Count   Percent
Male       1549       37%
Female     2238       53%
Unk         413       10%
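For readers who prefer code, a frequency table is just a count per distinct value. Here is a minimal sketch in Python; the `frequency_table` helper and the rebuilt list of visitor records are assumptions made for illustration, not part of any tool discussed here.

```python
from collections import Counter

# Hypothetical visitor-level data rebuilt from the counts in the table above.
genders = ["Male"] * 1549 + ["Female"] * 2238 + ["Unk"] * 413

def frequency_table(values):
    """Return {value: (count, percent)} for a single categorical variable."""
    counts = Counter(values)
    total = sum(counts.values())
    return {v: (n, round(100 * n / total)) for v, n in counts.items()}

freq = frequency_table(genders)
for value, (count, percent) in freq.items():
    print(f"{value:<8}{count:>6}{percent:>5}%")
```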
A frequency is a one-dimensional analysis – it looks at a single variable. The next step up in complexity is called a cross-tabulation, which typically begins with two variables. Here’s a classic 2-way cross-tabulation:
          Gender
Age       Male   Female   Unk
16-25      540      400    60
26-40      525      900    75
41-65      325      780   195
65+        159      158    83
Cross-Tabulation is two-dimensional analysis – and the basic method can be extended into three, four, five and potentially even more dimensions. In three-way analysis, we might add a variable like income and be able to see the count of all High-Income, Age 16-25, Males versus the count for all Low-Income, Gender Unknown, Age 65+.
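The same idea in code: a cross-tabulation counts records per combination of values, and each extra dimension simply lengthens the key. This is a rough sketch in Python; the `crosstab` helper and the field names `age` and `gender` are illustrative assumptions.

```python
from collections import Counter

def crosstab(records, dims):
    """Count records per cell; a cell is one value for each listed dimension."""
    return Counter(tuple(r[d] for d in dims) for r in records)

# Cell counts copied from the age-by-gender table above.
CELLS = {
    "16-25": {"Male": 540, "Female": 400, "Unk": 60},
    "26-40": {"Male": 525, "Female": 900, "Unk": 75},
    "41-65": {"Male": 325, "Female": 780, "Unk": 195},
    "65+":   {"Male": 159, "Female": 158, "Unk": 83},
}

# Expand the cells back into one record per visitor.
records = [
    {"age": age, "gender": gender}
    for age, row in CELLS.items()
    for gender, n in row.items()
    for _ in range(n)
]

two_way = crosstab(records, ["age", "gender"])  # one cell per (age, gender) pair
one_way = crosstab(records, ["gender"])         # collapses back to the frequency table
print(two_way[("26-40", "Female")], one_way[("Female",)])
```

A three-way table would be `crosstab(records, ["age", "gender", "income"])`, assuming each record also carried an `income` field.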
What’s happening when you do multi-dimensional analysis is, implicitly, visitor segmentation. Each cell in the n-dimensional table can be reasonably considered a specific visitor segment. And by adding metrics around success or usage, you can map these descriptive variables to real-world differences in performance.
N-Dimensional analysis is powerful. But it also has some fundamental limitations that are poorly appreciated both in the BI world where it has been the dominant paradigm and in the web analytics world where it has looked like the holy grail.
When you use multi-dimensional analysis, you are segmenting visitors (or visits) into finer and finer units. Eventually, you might have a success count for an extremely small population defined by six or seven different factors. But as powerful as this is, there are some things it just can’t do.
First, multi-dimensional analysis is like a series of implicit AND filters. A visitor must be 18-35 AND male AND High-Income AND located in California. But suppose you want to add an OR filter. Suppose, for example, that you want 18-35 AND male AND (High-Income OR Medium-Income) AND located in California. You can do this (in a way), by adding up cell counts. But the multi-dimensionality works against you now, because the OR may add 500 (50 states x 5 age categories x 2 gender categories) cells to keep track of. That isn’t very practical. So here’s a key capability to look at when you evaluate a multi-dimensional reporting system – can you easily collapse some values in a dimension? Some systems let you do this – but many don’t. It’s a subtle point but it makes a big difference in the real world.
Even more significant, however, is the difficulty that ANY multi-dimensional analysis system has with quantitative (continuous) variables. Classic multi-dimensional analysis evolved in the CPG world where demographic variables were almost always the dimensions. Demographics aren’t usually quantitative. You are either Male or Female, 18-35 or 60+. It’s true that variables like age and income COULD be treated as quantitative values, but they are almost always used as Category variables. These variables aren’t treated as numeric variables where the value difference is significant. From a marketer’s perspective, 25-40 is just a category, and 26 is the same as 40 but different from 24.
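To make the "collapse some values in a dimension" idea concrete, here is a small sketch in Python. The cell counts, the `collapse` helper and the merged label `High/Medium` are all invented for illustration; real reporting tools expose this capability (when they have it) in their own ways.

```python
from collections import Counter

# Hypothetical cell counts keyed by (age, gender, income, state).
cells = Counter({
    ("18-35", "Male", "High", "CA"): 120,
    ("18-35", "Male", "Medium", "CA"): 310,
    ("18-35", "Male", "Low", "CA"): 95,
    ("18-35", "Female", "High", "CA"): 140,
})

def collapse(cells, dim_index, mapping):
    """Merge values of one dimension, summing the counts of the merged cells."""
    merged = Counter()
    for key, n in cells.items():
        key = list(key)
        key[dim_index] = mapping.get(key[dim_index], key[dim_index])
        merged[tuple(key)] += n
    return merged

# Treat High and Medium income as one value, turning the OR into a plain AND.
collapsed = collapse(cells, 2, {"High": "High/Medium", "Medium": "High/Medium"})
target = collapsed[("18-35", "Male", "High/Medium", "CA")]
print(target)
```

Once the two income values are merged, the OR filter becomes an ordinary single-cell lookup rather than a sum the analyst has to maintain by hand across every other dimension.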
This is a key fact about Multi-Dimensional analysis – it isn’t particularly useful for anything except the analysis of variables as CATEGORY variables. Multi-dimensional analysis doesn’t treat values as numerically significant.
In web analytics, however, the key dimensions are behavioral. And virtually every behavioral variable IS quantitative. We are commonly interested in HOW MUCH a visitor did: how many page views of Product Material, how many petitions they signed, how long they spent on site, how often they visited, how much product they purchased. In all of these cases, good analysis of the data requires numerical comparison of the variable values.
There are behavioral variables that aren’t quantitative (Is Customer, Is Registered) but they are much less common than quantitative variables. In web analytics, by far the most important behavior is page view. And page views are quantitative in every sense. They are numerically comparative and the number is ALWAYS significant.
Multi-dimensional analysis doesn’t handle these quantitative variables. You HAVE to bucket variables before you can analyze them – so you’re reducing quantitative data to category data at the very outset. And if the variable isn’t bucketed, it isn’t available as a dimension. So suppose you want to analyze the effect of viewing Product X feature pages or spending time in the Product X feature area. In multi-dimensional analysis, you can’t do it unless you can bucket Product X feature page views or Product X feature time.
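A short sketch of what bucketing does to a quantitative variable, in Python. The bucket boundaries and labels are assumptions picked for the example, not values from any particular product.

```python
from bisect import bisect_right

# Hypothetical bucket boundaries for Product X feature page views.
EDGES = [1, 3, 6]                     # lower bounds of buckets after the first
LABELS = ["0", "1-2", "3-5", "6+"]

def bucket(page_views):
    """Map a page-view count onto its category label."""
    return LABELS[bisect_right(EDGES, page_views)]

for views in (0, 2, 5, 6, 40):
    print(views, "->", bucket(views))
```

After bucketing, a visitor with 6 page views is indistinguishable from one with 40, while 5 and 6 land in different categories. The numeric distance between values is gone before the analysis even starts.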
In many, many systems, you cannot bucket these variables. That means you can’t do multi-dimensional analysis on them. And even when you can bucket them, you are limited to the ranges in the buckets. You aren’t ever analyzing them numerically. And here’s where our AND issue strikes again, because if I want to evaluate visitors with High Pages OR High Content Time, I’m back to adding up cells. And the methods of multi-dimensional analysis don’t give me any way to take advantage of the fact that the actual value of a quantitative variable IS significant and can be usefully compared to other actual values.
Without the ability to flexibly bucket and collapse range variables, doing multi-dimensional analysis on quantitative variables is impossible. With that ability, it is still extremely clumsy and not very useful. No matter how powerful the multi-dimensional model is, it’s really the wrong tool for the job.
So what’s the right tool? Well, that brings us back to my opening. Because the answer is something like Score.
Score lets you assign values to actions – and those values are additive. So I can produce a visitor score based on the number of product content views. Or the total time in a content area. Or on ANY combination of those two values that exceeds a threshold I set. What is virtually impossible to do with even the most flexible and powerful multi-dimensional analysis tool is trivial with a scoring system.
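Here is roughly what an additive scoring rule looks like in code. This is a generic Python sketch, not Score's actual rule format; the weights, field names and threshold are all assumptions chosen for the example.

```python
# Hypothetical point values per unit of behavior.
WEIGHTS = {"product_page_views": 2.0, "content_minutes": 0.5}
THRESHOLD = 10

def visitor_score(behavior, weights=WEIGHTS):
    """Weighted sum of the quantitative behaviors; missing behaviors count as 0."""
    return sum(w * behavior.get(name, 0) for name, w in weights.items())

visitors = {
    "a": {"product_page_views": 6, "content_minutes": 1},   # 12.0 + 0.5 = 12.5
    "b": {"product_page_views": 1, "content_minutes": 18},  # 2.0 + 9.0 = 11.0
    "c": {"product_page_views": 2, "content_minutes": 4},   # 4.0 + 2.0 = 6.0
}

engaged = sorted(v for v, b in visitors.items() if visitor_score(b) >= THRESHOLD)
print(engaged)
```

Visitor "a" qualifies mostly on page views and visitor "b" mostly on time, yet one threshold on one number captures both. That is the "ANY combination" case that forces cell-by-cell addition in a multi-dimensional tool.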
Before we began using off-the-shelf software, scoring was the primary analytic technique we used at Semphonic. We didn’t do it by hand (the way Score has you do it) – we used neural networks to score visitors across dozens of different dimensions. But Score’s user-driven approach is still vastly more powerful for a host of important web analytics tasks than multi-dimensional analysis.
What tasks are these? One of the obvious ones is measuring Engagement – which is why Score was discussed so prominently at Engage. You can’t measure Engagement with a single measure. And nearly every significant component of Engagement is quantitative – how much you do of something is CRITICAL. Which makes analysis of Engagement in multi-dimensional tools problematic.
But the utility of a product like Score is hardly limited to Engagement. For obvious reasons, scoring methodologies allow for significantly richer visitor segmentation than rules based around dimensional filtering.
CRM integration is a third, and particularly significant, application for scoring methods. When I first saw Score, its potential uses for driving customer contact programs seemed obvious. Many of our clients do both regular and event-driven email messaging based on site behavior. Using multi-dimensional filtering, these cuts are quite limited. Scoring makes this process both much simpler and much more powerful.
A travel site, for instance, could easily establish threshold values for receiving an email alert on a particular destination. That threshold might include ANY combination of actual trips to the destination, planned trips to the destination and actual trips to similar destinations. Trying to accomplish a similar filter with multi-dimensional cells is either impossible or tortuous.
Here’s an even more difficult problem – suppose I send my customers a once-monthly newsletter and I want to target a dynamic offer to the destination they are MOST interested in. This is flatly impossible with multi-dimensional filtering. It is possible with scoring systems (though not – sadly – with Score which doesn’t currently support this).
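Both travel-site cases (the alert threshold and the "most interested" destination) reduce to keeping one score per destination and then comparing the scores. Here is a hedged Python sketch of a generic scoring system; the action names, point values, threshold and sample events are invented for illustration, and as noted above this comparison is not something Score itself supports.

```python
from collections import defaultdict

# Hypothetical point values per action type.
POINTS = {"booked_trip": 5, "planned_trip": 3, "viewed_page": 1}
ALERT_THRESHOLD = 5

def destination_scores(events):
    """Sum points per destination from (destination, action) events."""
    scores = defaultdict(int)
    for destination, action in events:
        scores[destination] += POINTS.get(action, 0)
    return dict(scores)

events = [
    ("Maui", "viewed_page"), ("Maui", "viewed_page"), ("Maui", "planned_trip"),
    ("Paris", "booked_trip"), ("Paris", "viewed_page"),
    ("Reno", "viewed_page"),
]

scores = destination_scores(events)   # Maui: 5, Paris: 6, Reno: 1
alerts = sorted(d for d, s in scores.items() if s >= ALERT_THRESHOLD)
top_destination = max(scores, key=scores.get)
print(alerts, top_destination)
```

The alert list comes from a threshold on each score; the newsletter offer comes from comparing scores against one another, which only makes sense because the values are numeric.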
Effectively, what scoring methods let the analyst accomplish is the analysis and combination of quantitative variables. In web analytics, this turns out to be an immensely useful capability because all of the core variables ARE quantitative.
Score is a ways from being perfect – it’s Version 1.0 after all. You can’t use all the variables you should be able to. You can’t compare scores. You can’t use negative scores. There are limitations on the number of scores you can build. The rule building process isn’t terribly flexible when it comes to integrating with large amounts of content. There is no data driven scoring.
But despite these V1.0 weaknesses, Score is a very, very significant upgrade in capability compared to multi-dimensional analysis within web analytics. It’s already a great tool for a number of key web analytics tasks. With continuous improvement, it has the potential to become one of the most important tools in web analytics. Once analysts start using tools like Score, they are going to realize something that should have been fairly obvious all along. Web Analytics is all about quantitative variables.