Being data literate is arguably the most important fundamental skill for a digital marketing professional. Data literacy is a separate discipline from raw mathematical proficiency: being good at math is one thing, but being fluent in data is another. The two can sound like the same thing, so let me try to make the distinction clearer in this article.
If I took one thing with me from my high-school psychology classes, it’s the concepts of ‘reliability’ and ‘validity’, both of which are key to being data literate.
Reliability: How trustworthy are the sources?
Validity: Does the test successfully measure what it’s supposed to?
Mathematics doesn’t teach you to question the sources of an equation’s values. They’re just there; you use them to complete the equation, and your answer is either right or wrong. Mathematics is binary and unforgiving in this respect: you might have nearly got the answer right, but you’re still wrong. Mathematics is objective.
Data literacy is subjective and generously forgiving by comparison. You read the data, question the sources, draw conclusions and decide whether they’re useful enough to base changes upon. But nothing tells you directly whether you’re right, and no calculator will do it for you. Data literacy is hazier than math.
Let’s talk about reliability.
Reliability: Sample Sizes
“Office workers at double cancer risk”
How often do you see headlines like this? They can seem scary at first, but once we question the reliability of the source of this conclusion (plus the fact it’s designed to sell a paper), we can see this statement start to unravel.
If the paper only took a sample of 1,000 office workers and found two with cancer, and compared it to 1,000 non-office workers and found one with cancer, should we be able to fairly conclude that the headline is correct? What if the sample size was only two groups of 100? When we start questioning the sources of the data, the reliability shrinks rapidly.
“1 in 15 Europeans are illiterate”
Oh, come on.
I can apply this mode of thought to digital marketing, particularly to the way percentages and averages are tossed around and panicked over. “This keyword’s conversion rate is 100%, wow!” Yes, but how many clicks does it have? If the answer is one, that’s not very impressive. We can solve this problem with more data. How much? That’s up to you; there are some fancy equations you can use to achieve good statistical confidence, but the conclusion is still subjective, just more reliable. Test 10,000 office workers against 10,000 non-office workers, and likewise for Europeans and non-Europeans. Good sample sizes are the foundation of reliable conclusions.
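One of those “fancy equations” is the Wilson score interval, which puts a confidence range around a conversion rate. Here is a minimal Python sketch (the click and conversion numbers are invented for illustration) showing why one click at “100%” tells you almost nothing:

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Approximate 95% Wilson score interval for a proportion (z=1.96)."""
    if trials == 0:
        return (0.0, 1.0)
    p = successes / trials
    denom = 1 + z**2 / trials
    centre = (p + z**2 / (2 * trials)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2))
    return (max(0.0, centre - half), min(1.0, centre + half))

# One conversion from one click: a "100% conversion rate"...
print(wilson_interval(1, 1))       # the interval spans roughly 21% to 100%
# 500 conversions from 10,000 clicks: a 5% rate we can actually trust
print(wilson_interval(500, 10000))
```

The tiny sample’s interval is so wide it is useless, while the large sample pins the rate down to within about half a percentage point either way.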
Validity: Flawed Tests
What if the sample size is big enough for the data to be reliable but the test itself is poorly constructed?
In 1917, with America’s entrance into the First World War, psychologist Robert Yerkes devised an IQ test for army recruits. This was the prototype for the first modern IQ test and also formed part of the foundations of the pseudoscience of eugenics, which would cast a horrific shadow two decades later. The test consisted of written, pictorial and verbal sections and concluded that the average American serviceman had a mental age of 13, whilst Russians and Italians had mental ages of 11, and Poles 10, on average. Yerkes claimed the test was free of ‘ethnocentric bias’, meaning that anyone of any nationality would be able to complete it fairly. Read the sample questions below and see if you agree:
- Crisco is a: patent medicine, disinfectant, toothpaste, food product
- Washington is to Adams as first is to . . .
- Christy Mathewson is famous as a: writer, artist, baseball player, comedian.
This is clearly unfair for non-Americans, no wonder the immigrants scored lower; it’s about as ethnocentric as a test can get. Give the same Americans a Russian-centric test and watch them flounder instead! Lenin, Marx and Rasputin questions, please. Also, why was cultural awareness used as a measure of intelligence? If a similar test existed today it would probably ask questions about Obama, the Super Bowl and the Kardashians, and I would be graded firmly as an Idiot.
Pictorial ‘fill in the blanks’ Yerkes test included for giggles. Notice the undeniable Western cultural bias.
Believe it or not, these used to be scientific terms.
So, how in the heck does all this century-old stuff relate to modern digital marketing? Directly, not much; it’s an interesting snapshot of the mindset of early 20th-century psychology and is infamous as possibly the worst test ever devised. Where it does relate is that testing is a core task for a digital marketer, and thinking critically about a test’s validity (does it actually measure what it’s supposed to?) is necessary to avoid poor decision-making.
Let’s say you’ve set up an A/B test of the same landing page, with each variation having a different coloured button. The objective of your experiment is to see whether a red button increases signups over a blue button, but you’ve also sent only mobile traffic to Variation A and only desktop traffic to Variation B. Now the test is unfair: you can’t say whether it’s the mobile traffic or the red button that is causing Variation A to win, or vice versa. The test isn’t flawed in the same way as the Yerkes IQ test, but it’s still invalid; squeezing too many variables into each variation makes it too foggy to judge what’s working.
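The fix is to randomize assignment so device type (and everything else) is spread evenly across both variations, leaving button colour as the only difference. Here is a minimal sketch of one common approach, hashing a user id for a stable 50/50 split; `assign_variation` is a hypothetical helper, not any real testing tool’s API:

```python
import hashlib

def assign_variation(user_id: str) -> str:
    """Deterministically assign a user to variation A or B.

    Hashing the user id gives each visitor a stable, effectively random
    bucket, so mobile and desktop traffic land in both variations and
    the button colour is the only systematic difference between them.
    """
    digest = hashlib.sha256(user_id.encode()).digest()
    return "A" if digest[0] % 2 == 0 else "B"

counts = {"A": 0, "B": 0}
for i in range(10_000):
    counts[assign_variation(f"user-{i}")] += 1
print(counts)  # close to a 50/50 split
```

Hashing rather than flipping a coin on each visit also means a returning visitor always sees the same variation, which keeps the experience consistent during the test.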
Averages are a pitfall.
Basing decisions on aggregate averages is sloppy. What’s an aggregate average? An example would be an AdWords campaign-level clickthrough rate, which is notoriously unreliable because it represents the sum of so many parts. Keywords, ads, bids, extensions and the campaign settings all impact a campaign’s clickthrough rate.
It’s far better to comb through each component and inspect it for damage rather than blowing off a whole campaign because of its aggregate clickthrough rate. One bad keyword could be dragging the whole campaign down, like a faulty screw breaking a complex machine.
Would you throw out your washing machine without checking it for broken parts first?
Simpson’s paradox is a statistical phenomenon in which a trend observed in aggregate data inverts when the data is separated into its components. Basically, aggregate averages often tell a different story from their parts. Look at the example below:
By judging the ads’ performance from the overview only, you might decide that Ad 2 is the winner and pause Ad 1. If we look at the individual performance of the ad groups, however, we can see that Ad 1 is actually superior. This is Simpson’s paradox, and it usually occurs when comparing aggregate percentages across vastly different data volumes that are themselves made up of multiple data sets. The key takeaway: look at the pieces, not just the whole!
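The reversal is easy to reproduce. In this sketch (the click and impression numbers are invented for illustration) Ad 1 has a higher clickthrough rate in every ad group, yet Ad 2 wins on the aggregate, purely because Ad 2’s impressions are concentrated in the high-CTR ad group:

```python
# Invented numbers: (clicks, impressions) for each ad in each ad group.
data = {
    "Ad 1": {"Group X": (20, 2000), "Group Y": (19, 200)},
    "Ad 2": {"Group X": (1, 200),   "Group Y": (180, 2000)},
}

def ctr(clicks, impressions):
    return clicks / impressions

for ad, groups in data.items():
    total_clicks = sum(c for c, _ in groups.values())
    total_impressions = sum(i for _, i in groups.values())
    per_group = {g: f"{ctr(*ci):.1%}" for g, ci in groups.items()}
    aggregate = f"{ctr(total_clicks, total_impressions):.2%}"
    print(ad, per_group, "aggregate:", aggregate)

# Ad 1 wins inside every ad group, yet Ad 2 wins the aggregate.
```

Ad 1’s traffic mostly sits in the low-CTR Group X while Ad 2’s sits in the high-CTR Group Y, so the aggregate compares mixtures, not ads.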
Have a look at the Simpson’s Paradox Wikipedia article. It’s good brain meltdown material for when you need some light bedtime reading.
Statistics in a Vacuum
What happens when you don’t look at the bigger picture.
Don’t get tunnel-vision when looking at stats; examine them in context and in a meaningful timeline.
Oh no, the clickthrough rate has plummeted!
Now we can see that it’s because we’re gaining a ton more impressions. Flooding your campaign with impressions will typically decrease your clickthrough rates.
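Since clickthrough rate is simply clicks divided by impressions, a flood of low-intent impressions dilutes it even while clicks are rising. A quick sketch with made-up numbers:

```python
# Before: tightly targeted campaign (numbers invented for illustration).
clicks, impressions = 200, 4_000
print(f"CTR: {clicks / impressions:.1%}")  # 5.0%

# After broadening targeting: impressions triple, clicks barely move.
clicks, impressions = 230, 12_000
print(f"CTR: {clicks / impressions:.1%}")  # 1.9%
```

The campaign gained 30 clicks, so the “plummeting” CTR is not necessarily bad news; it only looks alarming out of context.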
- Exercise skepticism.
- Work from evidence.
- Think in context, not in a vacuum.
- Don’t focus on averages.
- Drill down when possible.
- Question your sample.
- Question everything about your data!
Now enjoy some stupid graphs from Brass Eye.