This blog post is a bit of fun that might give you some idea of the sorts of things that cannot be tracked by your web analytics system…
1. Odysseus Off-Site
This intrepid traveller ranges far and wide across the internet but has so far resisted the siren call of your site. People like this are very interesting to online marketers, especially if they represent your target demographic, but you will not find out anything about them from your on-site analytics. Services like Hitwise or Google Ad Planner can give you information about where on the internet Odysseus is hanging out.
2. Henry In-a-Hurry
Henry decides very quickly if he wants to continue looking at a web page. If your analytics code hasn’t loaded by the time he clicks a link or presses the back button then his pageview will not be recorded. The new Google Analytics Asynchronous code will improve tracking of Henry In-a-Hurry.
3. Nadia No-Script
Nadia browses the web without JavaScript. This could be for many reasons: paranoia, browser security settings, browser default settings or browser age. Nadia may be a competing SEO trying to view the web as Googlebot sees it, or a geek browsing using only the command line.
4. Christopher Cookie-Eater
Who stole the cookies from the cookie jar? Christopher just can’t bear to leave all those cookies sitting on his hard drive, so he eats/deletes them at the end of every browsing session. Chris will be tracked, but he will be seen as a new visitor even if he visits your site every day.
5. Deborah Disabled-Cookies
Unlike Christopher, Deborah doesn’t eat cookies at all. In fact, she won’t allow them anywhere near her browser. Some JavaScript-based analytics systems will track Deb’s pageviews, but there will be no persistence, so every pageview will be seen as the start of a new visit. Google Analytics will not track Deborah at all (probably to prevent your own site appearing in your referrer stats).
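To make the "every pageview is a new visit" effect concrete, here is a minimal sketch of how a cookie-based tracker might group hits into visits. The function name, the `(visitor_id, timestamp)` hit format and the 30-minute inactivity timeout are all assumptions made for illustration, not the logic of any particular analytics product.

```python
from datetime import datetime, timedelta

# Assumed inactivity timeout; 30 minutes is a common choice, not a rule.
SESSION_TIMEOUT = timedelta(minutes=30)


def count_visits(hits):
    """Count visits from (visitor_id, timestamp) pairs.

    visitor_id is None when the browser refused the cookie, so the hit
    cannot be linked to any earlier hit and is counted as its own visit.
    """
    visits = 0
    last_seen = {}  # visitor_id -> timestamp of that visitor's previous hit
    for visitor_id, ts in sorted(hits, key=lambda hit: hit[1]):
        if visitor_id is None:
            visits += 1  # no cookie: every pageview looks like a new visit
            continue
        previous = last_seen.get(visitor_id)
        if previous is None or ts - previous > SESSION_TIMEOUT:
            visits += 1  # first hit, or the gap was too long: new visit
        last_seen[visitor_id] = ts
    return visits
```

Three pageviews five minutes apart count as one visit when they share a cookie ID, and as three visits when the ID is missing.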
6. Thomas Tin-Foil-Hat
Tom does not like the idea of you having any information about what he does online. He probably has JavaScript and cookies disabled, but what distinguishes him from Nadia and Deborah is that he has explicitly told his computer never to communicate with Google Analytics. This can be done by editing the hosts file. Thomas may also use a variety of (secure) proxies or Tor, so his visits cannot even be tracked in your server logs.

Do you have any idea what percentage these personas represent on a typical site? I would love to know if it’s minor or something to be more concerned about…also, recommendations on tracking these visitors via non-Google Analytics methods?
Thanks!
Sharon Mostyn
@sharonmostyn
Hi Sharon,
The percentages will vary quite a lot depending on the demographics of the site. W3Schools estimates that 5% of users don’t use JavaScript, and a random WebmasterWorld thread says that just under 10% of users block cookies.
No JavaScript tracking system will be able to accurately monitor these visitors. You could try using a server log analyser to see if a lot of visitors are not being tracked by Google Analytics.
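As a rough illustration of the server-log approach, the sketch below counts likely human pageviews in an access log so the total can be compared with the figure your JavaScript tracker reports. It assumes the log is in the common "combined" format; the bot markers and asset suffixes are short, hypothetical lists, and real log analysers use far more thorough filters.

```python
import re

# Combined log format:
# ip ident user [time] "METHOD path PROTO" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical, deliberately short filter lists for the sketch.
BOT_MARKERS = ("bot", "crawler", "spider", "wget", "curl")
ASSET_SUFFIXES = (".css", ".js", ".png", ".jpg", ".gif", ".ico")


def count_pageviews(lines):
    """Count successful GET requests for pages made by non-bot user agents."""
    count = 0
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # not a combined-format line
        if match["method"] != "GET" or match["status"] != "200":
            continue
        path = match["path"].split("?", 1)[0].lower()
        if path.endswith(ASSET_SUFFIXES):
            continue  # images, stylesheets and scripts are not pageviews
        agent = match["agent"].lower()
        if any(marker in agent for marker in BOT_MARKERS):
            continue  # crude bot filter
        count += 1
    return count
```

If the log total is much higher than the tracker's total for the same period, the gap is a rough upper bound on the visitors the tracker is missing, because some bots will slip past a filter this simple.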
I run Piwik, Google Analytics and raw server-log analytics on several of my clients’ sites (and my own).
The numbers vary greatly. Not only demographics but also volume makes a difference.
My general (not scientific) rules are:
* The more bulk you are serving, the *higher* the percentage of untracked visitors. This probably only holds up to a certain volume, which I have not seen yet. I assume this is because higher volume == higher visibility, and that will attract more fake hits (wgets, crawlers, accelerators, caching proxies etc.).
* Generally Google is the lowest of the three. Raw server logs report between 1% and 13% (that I have seen) more visitors than Google. Most probably because of the above-mentioned reasons, but also because Google Analytics is the most-used and hence the easiest to block.
* Piwik (a self-hosted, open-source analytics tool) uses cookies + JavaScript. It generally reports more hits than Google, but still far fewer than the raw logs.
I think that a very interesting feature for either Google Analytics or a tool such as Piwik would be to pull in raw server logs and analyse these alongside the JavaScript-generated reports.
On a mobile site (twa.sh) I see a difference of over 47% between my Piwik logs (JavaScript) and the raw logs. That is because most mobile devices (Symbian) have hardly any JavaScript support and will therefore not be tracked.
I’ve seen some noscript sections which include 1px by 1px img refs where the src points to a script which tracks the hit via querystrings in the URL of the SRC. This requires some advanced server-side scripting, but the effect is a GET request (via the SRC) to a script which can indeed track what Google misses.
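A minimal sketch of the server side of such a pixel, written as a small WSGI app. It assumes the page carries markup along the lines of `<noscript><img src="/pixel.gif?page=/pricing" width="1" height="1"></noscript>`; the URL, the `page` and `ref` parameter names and the in-memory `HITS` list are all hypothetical choices for illustration.

```python
from urllib.parse import parse_qs

# A tiny 1x1 GIF to send back so the browser has an image to render.
PIXEL_GIF = (
    b"GIF89a\x01\x00\x01\x00\x80\x00\x00\x00\x00\x00\xff\xff\xff"
    b"!\xf9\x04\x01\x00\x00\x00\x00"
    b",\x00\x00\x00\x00\x01\x00\x01\x00\x00\x02\x02D\x01\x00;"
)

# In-memory store for the sketch; a real script would write to a log or database.
HITS = []


def parse_pixel_hit(query_string, user_agent=""):
    """Turn the pixel's query string into a hit record (a plain dict)."""
    params = parse_qs(query_string)
    return {
        "page": params.get("page", ["(unknown)"])[0],
        "referrer": params.get("ref", [""])[0],
        "user_agent": user_agent,
    }


def pixel_app(environ, start_response):
    """WSGI app: record the hit, then answer with the 1x1 GIF."""
    HITS.append(
        parse_pixel_hit(
            environ.get("QUERY_STRING", ""),
            environ.get("HTTP_USER_AGENT", ""),
        )
    )
    start_response(
        "200 OK",
        [("Content-Type", "image/gif"), ("Content-Length", str(len(PIXEL_GIF)))],
    )
    return [PIXEL_GIF]
```

The app can be served locally with the standard library's `wsgiref.simple_server.make_server` for experimentation. Because the browser only fetches the image when scripts are off, the recorded hits cover the visitors a JavaScript tracker never sees.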
A very cool way to explain some very geeky stuff. Great!
@Bèr Kessels
Some very good points you make there. One reason for the difference between Piwik and Google Analytics may be that they use different definitions for “visit” and “session”.
Agreed about mobile browsers. I should have mentioned that in the article.
@SEO Blogger
I’m not a server side expert, but I think your method might run into problems with browser caching. Maybe it would be best to run it alongside a javascript system.
To echo what SEO Blogger said, almost every analytics package I’ve implemented/come across has used some kind of type fallback for the nonscript users. Setting the right combination of response headers can get around most of the caching issues.
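One plausible combination of response headers for a tracking image is sketched below. The helper name is hypothetical and the header values are a common belt-and-braces set, not a guarantee: some browsers and intermediate proxies may still serve a cached copy.

```python
# Headers that ask browsers and proxies not to reuse a stored response.
NO_CACHE_HEADERS = [
    ("Cache-Control", "no-store, no-cache, must-revalidate, max-age=0"),
    ("Pragma", "no-cache"),  # for old HTTP/1.0 caches
    ("Expires", "0"),        # an invalid date is treated as already expired
]


def with_no_cache(headers):
    """Return headers with any existing caching headers replaced."""
    blocked = {name.lower() for name, _ in NO_CACHE_HEADERS}
    kept = [(name, value) for name, value in headers if name.lower() not in blocked]
    return kept + NO_CACHE_HEADERS
```

Applied to the response that serves the fallback image, this makes each pageview more likely to trigger a fresh request that the server can count.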
I’d suggest that in much the same way that it is impossible to create a site which will look perfect in every combination of browser/OS, there’s no single silver bullet to capture all page views in a single analytics package. The more important point is that surely when applied and interpreted properly, there shouldn’t be any need for such a thing. Any set of usage statistics will only ever be indicative and should only form one aspect of a wider test/review strategy, and understanding this will be far more effective than blindly following the hit counters.
A review of the logs of one site with five-figure monthly page views revealed that all the script-disabled views were either developers testing or spiders.
It’s best not to get bogged down in the details of the few users your analytics software can’t track.
Analytics is useful in viewing trends in your traffic, keywords sources, browsers etc. The major trends can lead you to different actions you can take to optimize the site for conversions or search engines.
While you might not be getting a complete data set, you will be able to monitor general trends with any of the analytics software mentioned in the other comments.
Great post Richard, love the stereotypes!
As Bekka rightly points out, it’s often better to avoid getting bogged down and worrying about the ‘untrackable 5%’, as the remaining 95% will generally provide a significant level of insight to make informed decisions.
That said, I’m interested to know how big a percentage the ‘Henry-In-A-Hurry’ visitors might make up, as I imagine it could be quite large, especially in areas with slow internet speeds (such as Australia) where pages often don’t fully load before the next one is requested.
Cheers,
Alan
@Alan,
You make a good point that Henry might not be in a hurry, it might be everything else (or specifically the tracking code load time) that is slow.
@Bekka, @Alan
You are, of course, correct in saying that it is often not worth worrying about the untrackable percentage of visitors and that useful information can be deduced from the remaining ninety-something percent. I think it is also important to realise that you are not getting the complete picture, particularly if you are segmenting the data into thin slices.
Don’t forget users using AdBlock (e.g. the “EasyPrivacy” list) or other addons like Ghostery. The latter might fall under the “tin foil hat” grouping, but the former are definitely another distinct group.
@Ian
You make a good point.
Technically (I think, please correct me) AdBlock works in a similar way to the Tin Foil Hat host blocker method but it is used by a lot more people.
The title says “Personas that Google Analytics can’t Track” like it applies only to Google Analytics, but is there any reason to believe Coremetrics or Omniture would track these personas any better?
@James, my expertise is with Google Analytics rather than Coremetrics or Omniture. The personas listed will not be tracked by any JavaScript-based tracking system. If Coremetrics or Omniture can integrate some sort of server-side solution or log file parsing, then they will be able to get some information on the visitors that GA can’t track.