This is the second post in a series that explores ways you can use Search Console (SC) and Google Data Studio (DS) together. This time we will look at some traffic that is often hidden from Google Analytics: direct PDF downloads.

In the first post, we looked at ‘Question Queries’ and used a new dimension to group and isolate particular queries. Now we will extend that example and look at what I call ‘hidden traffic’. Again, I will provide a basic workbench to help you explore and analyze it further! I’ll also share some often-overlooked opportunities.

First, here is the Data Studio workbench we will be building. Just like the last examples, this one uses the data source selector feature so you can try it right away on your own Search Console properties. You can also use it as a template to build your own reports however you like!

PDF Analysis Ex2 V1.0 – Helpfullee

Next, we’ll look at the ‘Case Statement’ that identifies direct downloads in the URL field of the Search Console URL data. After that, I’ll show you how to use the workbench for some SEO analysis. We’ll finish with a discussion of why this is called ‘hidden’ traffic and what to do with your findings. Here we go!

How to track direct-from-search PDF traffic with Search Console and Data Studio

We are going to use our new friend, the case statement, again in this example. I’m going to skip over the basics of setting it up; please see Part 1 for videos and details.

In the last example, we used the Search Console connector’s Site Impression table to look at queries. This time we use its URL Impression table to capture and segment out the PDF URLs.

  1. Start a new Data Studio report, or create a new data source from the main Data Studio page at datastudio.google.com. If you start a new report, you can create the data source from the connection screen if you don’t have one already. If you already have one, skip ahead to step 3!
  2. Select the Search Console connector. In the connection screen, select your property, then select “URL Impression” in the Table column. Then click Connect.
  3. From the edit data source screen, create a new field.

A Data Studio Case Statement for a Direct Download Dimension.

While we are searching for PDF files, we might as well search for other kinds of files too. We will just use some simple statements here. You could put all the regex matches on a single line, but the effect is the same. I like simple!

Field Name: Direct Download
Formula:
CASE
  WHEN REGEXP_MATCH(Landing Page, ".*[.](pdf|PDF)$") THEN "Yes"
  WHEN REGEXP_MATCH(Landing Page, ".*[.](xls|xlsm|xlsx)$") THEN "Yes"
  WHEN REGEXP_MATCH(Landing Page, ".*[.](doc|ppt|txt)$") THEN "Yes"
  ELSE "No"
END

We could do a lot more with this, such as searching for many more file types and returning the actual type instead of “Yes” or “No”. But you get the idea. These regex statements look for the file extension at the very end of the Landing Page URL.
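
If you want to sanity-check the CASE logic outside Data Studio, or try returning the actual file type as suggested above, here is a minimal Python sketch that applies the same end-of-URL extension test to exported landing pages. DOWNLOAD_TYPES and download_type are hypothetical names for illustration. One deliberate difference from the Data Studio version: it checks only the URL path, so a trailing query string such as ?v=2 doesn't hide the extension.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical mapping from extension to a readable file type; it covers the
# same extensions as the CASE statement and is easy to extend.
DOWNLOAD_TYPES = {
    "pdf": "PDF",
    "xls": "Excel", "xlsm": "Excel", "xlsx": "Excel",
    "doc": "Word",
    "ppt": "PowerPoint",
    "txt": "Text",
}

# A literal dot, then the extension, anchored to the end of the path.
EXTENSION_RE = re.compile(r"\.([A-Za-z0-9]+)$")


def download_type(landing_page: str) -> Optional[str]:
    """Return the file type for a direct-download URL, or None for a regular page."""
    path = urlsplit(landing_page).path  # drop any ?query or #fragment first
    match = EXTENSION_RE.search(path)
    if match is None:
        return None
    return DOWNLOAD_TYPES.get(match.group(1).lower())
```

Unknown extensions (like .html) fall through to None, so they count as regular pages, just like the ELSE "No" branch.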

How to use the example workbench

The analysis of helpfullee.com is pretty quick! I have no PDF downloads, so there’s nothing to see here, folks, move along! Unfortunately, I don’t have permission to share a juicy live example here. But I can show you snippets from a medium-sized, product-based site with the details blurred.

Screenshot: the PDF workbench in action on a typical site.

Here’s how we use the different sections to analyze this example and get some insights.

  1. Data Selector – You must be logged into a Google account to see this (data source controls also don’t show in embedded reports, sorry!). It lets you select your own Search Console properties directly without having to copy the report. I selected the target site, and the data updated.
  2. Summary table – This shows details about direct-download URLs and all the other Landing Page URLs. The table has filtering turned on, so if you click a row, say Direct Download “Yes”, all the other tables are filtered to show only direct-download data.

    Insights:
      It’s easy to see there is a sizeable number of hidden downloads! And they have about double the click-through rate of normal pages! There must be some juicy keywords in those files! Just click any of the charts to isolate one type or another.
  3. Hidden Download graphic summary area – These graphics give you a feel for the data and a visual sense of the relative amounts of each type of URL. We use both pie and area charts based on our new Direct Download dimension. These are also interactive filters: you can select segments of the pie charts and “brush” (left-click and drag) on the area charts to isolate a time period.

    Insights: 
    13% of impressions and 21% of clicks are hidden. The area charts confirm this is not a fluke; it’s a pattern that persists over time.
  4. Filter Controls – These serve double duty: they allow detailed selection and deselection of Landing Pages and Queries, and they show the impressions for each value. Again, using these filters affects all the other charts.

    Insights:  
    In this case, I filtered out queries containing a major brand term. This lowered the values quite a bit, but the ratio between hidden and normal pages stayed about the same. I could also select individual landing pages to see details about each one.
  5. Query Detail Table – This is a slightly special implementation designed to add extra value: it blends the data source with itself! We use the Query field as the join key. The base (left) side of the join is filtered to queries whose landing page is a direct download. The right side is filtered to show only URLs that are not direct downloads. Since the metric names are the same on both sides, we rename the metrics on the direct-download side for clarity. Below is a screenshot of the blend setup.

    Insights:  
    Blending this way lets us see where queries overlap (in SEO terms, cannibalization) between hidden downloads and regular pages. If you sort the table by the Url Clicks column of the main chart, it clearly shows which queries rank for both kinds of URL! Just right-click to export this to a spreadsheet for further action.
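
The percentages in the insights above are simple ratios, and it can help to see exactly what the summary table computes. Here is a minimal Python sketch, assuming rows exported from the Search Console URL data as (landing page, impressions, clicks); summarize and DOWNLOAD_RE are hypothetical names, and the regex uses the same extensions as the Direct Download CASE statement (matched case-insensitively here). CTR is recomputed from summed clicks and impressions rather than averaged across rows, which is the correct way to aggregate it.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Same extensions as the Direct Download CASE statement, matched case-insensitively.
DOWNLOAD_RE = re.compile(r"\.(pdf|xls|xlsm|xlsx|doc|ppt|txt)$", re.IGNORECASE)

Row = Tuple[str, int, int]  # (landing_page, impressions, clicks)


def summarize(rows: Iterable[Row]) -> Dict[str, Dict[str, float]]:
    """Group rows into Direct Download "Yes"/"No" and compute totals, CTR and shares."""
    totals: Dict[str, Dict[str, int]] = defaultdict(lambda: {"impressions": 0, "clicks": 0})
    for page, impressions, clicks in rows:
        group = "Yes" if DOWNLOAD_RE.search(page) else "No"
        totals[group]["impressions"] += impressions
        totals[group]["clicks"] += clicks

    all_impressions = sum(t["impressions"] for t in totals.values())
    all_clicks = sum(t["clicks"] for t in totals.values())
    summary: Dict[str, Dict[str, float]] = {}
    for group, t in totals.items():
        summary[group] = {
            "impressions": t["impressions"],
            "clicks": t["clicks"],
            # Aggregate CTR = summed clicks / summed impressions, not an average of row CTRs.
            "ctr": t["clicks"] / t["impressions"] if t["impressions"] else 0.0,
            "impression_share": t["impressions"] / all_impressions if all_impressions else 0.0,
            "click_share": t["clicks"] / all_clicks if all_clicks else 0.0,
        }
    return summary
```

With two made-up rows, a PDF with 100 impressions and 10 clicks and a page with 900 impressions and 40 clicks, the "Yes" group gets a 10% CTR against roughly 4.4% for pages, plus 10% of impressions but 20% of clicks: a similar pattern to the example site.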

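The self-blend behind the Query Detail Table is essentially a join on Query between two filtered copies of the same data. Here is a minimal Python sketch of that idea, assuming rows exported as (query, landing page, clicks); overlapping_queries and DOWNLOAD_RE are hypothetical names. Unlike the workbench table, this sketch keeps only the queries that appear on both sides (an inner join), which is exactly the overlap list you'd want to review.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Same extensions as the Direct Download CASE statement, matched case-insensitively.
DOWNLOAD_RE = re.compile(r"\.(pdf|xls|xlsm|xlsx|doc|ppt|txt)$", re.IGNORECASE)

Row = Tuple[str, str, int]  # (query, landing_page, clicks)


def overlapping_queries(rows: Iterable[Row]) -> List[Tuple[str, int, int]]:
    """Return (query, download_clicks, page_clicks) for queries that reach both kinds of URL."""
    downloads: Dict[str, int] = defaultdict(int)  # left side: direct downloads only
    pages: Dict[str, int] = defaultdict(int)      # right side: everything else
    for query, page, clicks in rows:
        side = downloads if DOWNLOAD_RE.search(page) else pages
        side[query] += clicks
    # Inner join on query: keep only queries present on both sides.
    joined = [(q, downloads[q], pages[q]) for q in downloads if q in pages]
    # Most download clicks first, like sorting the workbench table.
    return sorted(joined, key=lambda row: row[1], reverse=True)
```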

PDF downloads can be a major source of hidden traffic and a major SEO opportunity.

So, why should you care about these ‘Hidden Downloads’, and are they really hidden? Here’s a rather funny example that illustrates some points. I was trying to find out how much of the SERPs is actually PDF-based. I Googled ‘PDF seo 2018’, a suggested search, and here is what I got!

Screenshot: SERP results showing a PDF in the first position!

The irony of this is almost unbearable and I don’t know whether to laugh or cry! Here’s why …

  • A PDF file is the #1 result for this search. It ranks above a Search Engine Journal article on the subject! PDF for the win!
  • The result actually has nothing to do with PDF SEO! Its ranking is a happy accident, probably based mostly on the file name and the fact that it has a better backlink profile than #2. It is a nice, rich piece of content.
  • It is not gated, and while we can’t know for sure, it probably isn’t registering in their analytics because no site page is ever hit. Oh, it will show in the server logs, but almost certainly not in analytics!
  • This PDF is not optimized. It has some nice links back to the website (a great SEO practice), but these links have no UTM parameters. I’ll get to that later…
  • The second result is a pretty good article about PDF optimization for SEO. I say pretty good because it doesn’t suggest putting UTM campaign links in the PDF. It mentions that PDF downloads are often a micro-conversion, but fails to mention that if they are downloaded directly, you won’t know it!

So, go take a look at the top posts about tracking PDF files. They suggest tracking downloads through various plugins or with a tag manager. But none of the ‘ultimate’ guides I found even mention that these tracking methods only work for downloads from your site, not for downloads that come directly from search results!

Direct PDF downloads are almost never measured in analytics. 

So, you may be thinking this was a lucky result. I can tell you from experience that it is not unusual. I have not run across a site that actually records direct downloads in its Google Analytics (except the ones I have worked on, and even there we lost the ability after a migration).

In cases where I have measured this, using custom back-end code to trigger an analytics hit, direct downloads accounted for nearly 40% of the actual web traffic to the site! And that was not counting robot hits. How would you feel about your analytics if it missed 40% of visits?

Why is direct download tracking so uncommon? I think it is because no amount of JavaScript or GTM wizardry can make it happen on the front end: a file opened straight from the search results never loads one of your pages, so no tracking script ever runs. You have to send the hit to the Google Analytics Measurement Protocol directly from your back-end system, and that requires back-end coding! I’m sure someone has made a clever WordPress plugin for this, but if they have, I couldn’t find it.
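
Universal Analytics, which was current when this post was written, stopped processing data in July 2023, so here is a minimal sketch of the back-end idea against the GA4 Measurement Protocol instead. It assumes your server (or a job that reads the server logs) can run Python whenever a file is requested. MEASUREMENT_ID, API_SECRET, build_download_hit, send_hit, and the download_referrer parameter are placeholders or hypothetical names, and this is an illustration, not the custom back-end code described above.

```python
import json
import urllib.parse
import urllib.request
import uuid

# Hypothetical placeholders: use your own GA4 measurement ID and a
# Measurement Protocol API secret created for that web data stream.
MEASUREMENT_ID = "G-XXXXXXXXXX"
API_SECRET = "replace-with-your-api-secret"
MP_ENDPOINT = "https://www.google-analytics.com/mp/collect"


def build_download_hit(file_url: str, referrer: str = "", client_id: str = "") -> urllib.request.Request:
    """Build (but do not send) a GA4 Measurement Protocol request for one file download."""
    # No tracking tag runs on a direct download, so the client ID must come from the
    # server side (for example a parsed _ga cookie, if the browser sent one).
    # This sketch falls back to a random ID, so each such hit looks like a new user.
    client_id = client_id or str(uuid.uuid4())
    path = urllib.parse.urlsplit(file_url).path
    file_name = path.rsplit("/", 1)[-1]
    extension = file_name.rsplit(".", 1)[-1].lower() if "." in file_name else ""
    payload = {
        "client_id": client_id,
        "events": [{
            "name": "file_download",  # event name borrowed from GA4 enhanced measurement
            "params": {
                "file_name": file_name,
                "file_extension": extension,
                "link_url": file_url,
                # Hypothetical custom parameter; register it as a custom dimension to report on it.
                "download_referrer": referrer,
            },
        }],
    }
    query = urllib.parse.urlencode({"measurement_id": MEASUREMENT_ID, "api_secret": API_SECRET})
    return urllib.request.Request(
        f"{MP_ENDPOINT}?{query}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_hit(request: urllib.request.Request) -> int:
    """Send the hit and return the HTTP status code."""
    # /mp/collect answers 2xx even for malformed events; validate payloads against
    # the /debug/mp/collect endpoint while testing.
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status
```

In practice you would call build_download_hit from whatever serves your files, skip obvious bot user agents first, and only then pass the request to send_hit.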

As I mentioned before, this can be a very big deal. Virtually all manufacturing and product-based websites have product pages with multiple PDFs for tech specs, instructions, brochures, and more. On some sites, the downloadable files outnumber the pages and have much richer content!

Action from Analysis – What should you do about ‘Hidden Download’ traffic?

I think the most important thing is to recognize that direct downloads are happening and are probably not in your analytics. What you do next depends on your priorities in the context of the site’s goals. I’m not going to give you a one-size-fits-all prescription here, but let me offer some suggestions for a few different scenarios.

Scenario 1.  My download content is gated, but it is showing up in the results. 

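You, my friend, have a problem. Here is a good post that tells you how to remove PDF files from the Google index and how to keep them from getting indexed using the robots.txt file. One caution, though: a robots.txt disallow only stops crawling, and a blocked URL can still show up in results if other pages link to it. For PDFs and other non-HTML files, an X-Robots-Tag: noindex HTTP response header is the more reliable way to keep them out of the index.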
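
Here is a minimal sketch of the noindex-header approach, assuming a Python WSGI application serves the files; on many sites you would set the same header in the web server configuration instead. NoIndexDownloads and NOINDEX_EXTENSIONS are hypothetical names. Google has to be able to crawl a file to see the header, so don't also block these URLs in robots.txt. And noindex only keeps files out of search results; it doesn't gate anything.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical list: file types that should never appear in search results.
NOINDEX_EXTENSIONS = (".pdf", ".doc", ".ppt", ".txt", ".xls", ".xlsm", ".xlsx")


class NoIndexDownloads:
    """WSGI middleware that adds an X-Robots-Tag: noindex header to file downloads."""

    def __init__(self, app: Callable) -> None:
        self.app = app

    def __call__(self, environ: dict, start_response: Callable) -> Iterable[bytes]:
        path = environ.get("PATH_INFO", "").lower()

        def add_header(status: str, headers: List[Tuple[str, str]], exc_info=None):
            # Only touch responses for the listed file types.
            if path.endswith(NOINDEX_EXTENSIONS):
                headers = list(headers) + [("X-Robots-Tag", "noindex")]
            return start_response(status, headers, exc_info)

        return self.app(environ, add_header)
```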

Scenario 2.  I want the PDF traffic to go to my pages, not the PDF files.  

First, I would check the queries going to the PDF files and try to figure out what content is attracting the clicks. Then do some on-page SEO and move that content up to the target web page. After that, you could set up redirects so the PDF URLs go to that page instead of the file.

Exercise caution and planning here: if you have a lot of direct downloads to switch, it is very much like a site migration. You risk losing this hidden traffic if it’s done wrong. Also, if you are doing a site migration, you had better account for the download URL changes too!
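
For Scenario 2, here is a minimal sketch of the redirect step, again assuming a Python WSGI app; PdfRedirects, find_chains, and the example paths are hypothetical. It answers each mapped PDF URL with a permanent 301 redirect to the page that now carries the content. find_chains flags mappings whose target is itself redirected, a chain worth flattening before launch. Building the mapping from the landing pages in the workbench helps make sure no PDF that earns clicks gets left out.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical mapping from old download paths to the pages that now carry their content.
REDIRECTS: Dict[str, str] = {
    "/docs/widget-spec-sheet.pdf": "/products/widget/specifications",
    "/docs/widget-manual.pdf": "/products/widget/manual",
}


class PdfRedirects:
    """WSGI middleware that 301-redirects mapped download URLs and passes everything else through."""

    def __init__(self, app: Callable, redirects: Dict[str, str]) -> None:
        self.app = app
        self.redirects = redirects

    def __call__(self, environ: dict, start_response: Callable) -> Iterable[bytes]:
        target = self.redirects.get(environ.get("PATH_INFO", ""))
        if target is None:
            return self.app(environ, start_response)
        # Permanent redirect: the usual choice when content has moved for good.
        start_response("301 Moved Permanently", [("Location", target), ("Content-Length", "0")])
        return [b""]


def find_chains(redirects: Dict[str, str]) -> List[str]:
    """Return sources whose target is itself redirected (a chain to flatten before launch)."""
    return sorted(src for src, dst in redirects.items() if dst in redirects)
```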

This could be an easy win if you want to create some new content and aren’t sure where to start. You already rank for the queries!

Scenario 3. I have a LOT of downloads!  I want to optimize that traffic, but … 

Optimizing a lot of files requires substantial effort. This can be a bit overwhelming, but you can turn it to your advantage. It may help to get some results before you try to get approval. Here’s what I suggest you try…

Find the top 20 PDF files by volume (clicks or impressions) and SEO the heck out of those! After that, do the top 20%, and keep working on it in chunks from the most popular on down.
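
One way to turn that plan into work batches is this minimal Python sketch, assuming you have exported clicks per PDF landing page from the workbench into a dict; prioritize and its parameters are hypothetical names. It ranks files by clicks, takes the top 20, then the rest of the top 20%, then keeps going down the list in chunks of 20.

```python
from typing import Dict, List


def prioritize(clicks_by_pdf: Dict[str, int], first_batch: int = 20, top_percent: int = 20) -> List[List[str]]:
    """Split PDF URLs into optimization batches, most-clicked first."""
    ranked = sorted(clicks_by_pdf, key=lambda url: clicks_by_pdf[url], reverse=True)
    # Ceiling of top_percent% of the file count, using integer math to avoid float rounding.
    top_count = max(first_batch, (len(ranked) * top_percent + 99) // 100)
    batches = [ranked[:first_batch], ranked[first_batch:top_count]]
    # After the top 20%, keep working down the list in chunks the size of the first batch.
    rest = ranked[top_count:]
    batches += [rest[i:i + first_batch] for i in range(0, len(rest), first_batch)]
    return [batch for batch in batches if batch]  # drop empty batches on small sites
```

On a site with 150 PDFs, that gives a first batch of 20, a second batch of 10 (rounding out the top 20%, or 30 files), and then six more batches of 20.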

PDF SEO is a bit different from optimizing normal pages, but the tasks are similar. You’ll need a PDF editor to update the document title and description properties (a PDF has no HTML meta tags) and the links back to the website. Did someone mention links?

SEO Ultra-Pro Tip – Put the damn UTM tracking links in your freakin PDF files! 

I’m not kidding around here. All the “ultimate” guides for PDF optimization I looked at tell you to put links in your PDFs and other files that go back to your site. All well and good, but if you don’t use UTM tracking links, what does analytics think the source and medium are? These visits end up in the ‘Direct’ bucket! If you already have links in your PDFs, go back and update them with tracking links.

It’s rare that you get a chance to move things out of the direct traffic bucket in analytics. I set up my links with a whole new medium so I can easily track this new source of traffic. I link logos, all URLs, product mentions, and more. Here’s how I set up my links; I’m sure other people have different ideas.

https://www.sitename.com/?utm_medium=pdf&utm_source=pdf-title&utm_content=footer&utm_campaign=round1
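
If you have many PDFs to tag, generating the links avoids typos. Here is a minimal Python sketch, assuming the same medium/source/content/campaign scheme as the example link; pdf_link is a hypothetical helper. Note that the campaign value belongs in utm_campaign: a bare campaign parameter is not one of the UTM parameters analytics reads. The helper keeps any query parameters the link already has.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def pdf_link(url: str, pdf_title: str, placement: str, campaign: str) -> str:
    """Append UTM tracking parameters to a link that will be placed inside a PDF."""
    parts = urlsplit(url)
    # Keep existing parameters (including blank ones) before adding the UTM tags.
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query.update({
        "utm_medium": "pdf",        # a dedicated medium makes PDF traffic easy to isolate
        "utm_source": pdf_title,    # which document the click came from
        "utm_content": placement,   # where in the document the link sits (logo, footer, ...)
        "utm_campaign": campaign,
    })
    return urlunsplit((parts.scheme, parts.netloc, parts.path or "/", urlencode(query), parts.fragment))
```

For example, pdf_link("https://www.sitename.com", "pdf-title", "footer", "round1") rebuilds the example link, and spaces in a title such as "Widget Spec Sheet" are encoded as plus signs.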

Will this really make a difference? Where I have done this, we focused on tagging only the top 20% of pages on that site, and hits coming back from the correctly linked PDF files outperformed all social media sources combined! Think about it: PDFs are like the OG of viral content! They get passed around all over the place, usually to important people who make buying decisions. There are a lot of opportunities here!

So, I would love to hear whether you find some new “hidden” traffic on your sites! I’d also like to hear how you modify the workbench to make it work better for you. Suggestions? Comments?