In my previous blog post I covered how to identify fake referral traffic in Google Analytics data. Now it’s time to clean it up!
To clean up the data that had already been collected, I created a segment that shows the data with the spam traffic filtered out. Setting up a filter will stop spam from affecting future data (which I will cover next time).
Creating a new Google Analytics Segment
To begin, click + Add Segment near the top of the Google Analytics screen. Give the segment a name and then open the Conditions tab in the left column. Two filters are required: one to remove the Ghost Referrals and one to remove the Bot Crawler traffic.
Remove Ghost Referrals
As I discovered earlier, ghost referrals (as well as fake direct and search traffic) all share a common element: they all have a Hostname value that is not that of my website. The most effective way to remove them is therefore to include only sessions whose Hostname value contains PaulJardine.co.uk.
Note that if you use your tracking code on other sites (such as an external PayPal checkout if you have a shop), you will need to list those hostnames as well (in regex format).
Applying this filter removed over 80% of the overall traffic. That’s just how much Ghost Referral traffic I was getting on this particular site!
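If it helps to check a hostname regex before pasting it into the segment, here is a minimal Python sketch. It assumes a second, hypothetical hostname (checkout.example.com) standing in for an external checkout, and it assumes partial, case-insensitive matching; it only tests the pattern locally and does not talk to Google Analytics. Escaping the dots is my own addition so that each dot matches a literal dot.

```python
import re

# Hostnames that legitimately carry the tracking code.
# "checkout.example.com" is a hypothetical placeholder; use your own hosts.
VALID_HOSTNAMES = ["pauljardine.co.uk", "checkout.example.com"]

# Escape each dot so it matches a literal ".", then join the hosts with "|".
hostname_pattern = "|".join(h.replace(".", r"\.") for h in VALID_HOSTNAMES)


def is_valid_hostname(hostname: str) -> bool:
    """Return True if the hostname contains one of the valid hostnames."""
    return re.search(hostname_pattern, hostname, re.IGNORECASE) is not None


print(hostname_pattern)

# Made-up sample values, purely for illustration.
for host in ["www.PaulJardine.co.uk", "checkout.example.com", "(not set)"]:
    print(host, is_valid_hostname(host))
```

The printed pattern is what would go into the Hostname condition; the sample hostnames are invented to show one kept and one dropped case.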
Remove Bot Crawlers
Having applied the Hostname filter, I next wanted to remove the remaining bot crawlers. With the Hostname filter applied to my new segment, I returned to the Referrals data in the Acquisition tab to review what was left.
I copied and pasted the spam results into a text document like so…
buttons-for-your-website.com
buttons-for-website.com
semalt.com
I then needed to convert this list into regex format by placing a | (vertical line) symbol between each domain (there shouldn’t be a vertical line after the last one). The finished regex code looks like this…
buttons-for-your-website.com|buttons-for-website.com|semalt.com
I then created a second filter to exclude sessions where the source matches the regex of the list.
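The two steps above (joining the list with |, then excluding matching sources) can be sketched in Python as a quick local check. The session sources below are invented for illustration, and partial matching is an assumption on my part. Escaping the dots is also an addition: an unescaped dot matches any character, so the unescaped pattern catches the same domains, just less strictly.

```python
import re

# Spam referrers copied from the Referrals report.
pasted = """
buttons-for-your-website.com
buttons-for-website.com
semalt.com
"""

# Split on whitespace so it works whether the list is on one line or several.
domains = pasted.split()

# Join with "|" (join never adds a trailing "|") and escape the dots.
spam_pattern = "|".join(d.replace(".", r"\.") for d in domains)
print(spam_pattern)

# Hypothetical session sources, made up to show the exclusion.
sources = ["google", "semalt.com", "buttons-for-website.com", "example.org"]

# Keep only the sources that do NOT match the spam regex.
kept = [s for s in sources if not re.search(spam_pattern, s)]
print(kept)
```

When a new bot crawler turns up, adding its domain to the pasted list and re-running the script regenerates the pattern.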
This filter may need to be updated from time to time if/when new bot crawlers start visiting the website.
Applying the filtered segment revealed that the spam referrals to this particular website accounted for approximately 70% of the overall data collected! With the data cleaned up and the accurate results restored, the Analytics stats are useful again and can once more help inform design decisions on future updates to the website.
Next time, I’ll explain how to create a filter to stop future spam data from being collected in the first place!