The In Search SEO Podcast
How are you taking advantage of logfiles to improve your SEO?
That’s what we’re going to be talking about today with a man with over 20 years of experience in the SEO industry working at brands and agencies, including the BBC, Just Eat, and Rise at Seven. A warm welcome to the In Search SEO podcast, Gerry White.
In this episode, Gerry shares five ways to use logfiles for SEO, including:
- Seeing how Google looks at your site
- Are there subdomains consuming your crawl budget
- Response codes
Gerry: Hey, glad to be here.
D: Good to have you on. You can find Gerry by searching Gerry White on LinkedIn. So Gerry, should every SEO be using logfiles?
G: No. I know that sounds controversial when I say that, because logfiles give us huge amounts of information. But honestly, a lot of the time it’s diminishing returns. And often you can find a lot of information before you go into logfiles. What I mean by that is, if you take a look in Google Search Console, there are huge amounts of information there. When I’ve looked into logfiles, it’s been after I’ve exhausted a lot of other places first. I always recommend crawling a site using something like Screaming Frog or whichever desktop crawler you’ve got, and then looking at Google Search Console before you start to look at logfiles.
The reason I say that, and the reason that I sound almost anti-logfiles when I’m going to be talking about how useful they are, is the fact that they’re actually quite challenging to work with initially. And it does take a little bit of skill, knowledge, and experience to really get your hands on them, and even to get access to them. But one great thing about today is the fact that we actually have more access to logfiles than almost ever before. Initially, when I started out, we didn’t have Google Analytics or any analytics software like we have today. Logfile analysis was how we looked at how people visited websites. Now, we rarely look at logfiles for how people look at websites, unless we’re doing something with InfoSec or we’re trying to diagnose something really weird and wonderful.
But actually, a lot of the time, we have much better analytics software. This might change because, actually, one weird thing is the fact that a lot of websites can’t track how many people go to a 404 page, because a lot of the time, you never click to accept cookies on a 404 page. Suddenly, logfiles are coming back again to answer some very strange questions like that.
But the main reason that I’m talking about logfiles today is for SEO purposes. So yes, if you’ve got problems with large sites, if you’ve got a large e-commerce website, if you’ve got an international, multilingual, huge site with faceted navigation, then logfiles are something that definitely should be taken into account and definitely should be looked at as soon as possible.
D: So today, you’re sharing five ways that SEOs should be using logfiles. Starting off with number one, seeing how Google looks at your site.
1. Seeing How Google Looks at Your Site
G: Yeah, Google is fairly unpredictable, almost like an unruly child. It’s strange because although I say we can look at sites and we can use crawling tools to have a look at how Google should be looking at the site, we’re often surprised to find out that Google got obsessed with one set of pages or went down some strange route somewhere. More recently, I’ve been working for the last year for a supermarket called Oda, and one of the things we found was that Googlebot has been looking very much at the analytics configuration and creating artificial links from it. Google’s finding broken links. And for a long time, I was trying to figure out why it was finding tens of thousands of 404s that were not on the page at all. But it turns out it’s been looking at the analytics configuration and creating links from that. So we’re looking at how much of an impact that’s had. The fact that Google is finding all of these 404s might not be a massive problem in itself. But what we want to know now is how much time it is spending on those 404s, and if we fix this one tiny problem, will it mean that the crawling of the rest of the site will increase by 20-30%? What’s the opportunity if we fix it there? It’s all about looking at why Google’s looking at the site like that, and what it’s finding that it really shouldn’t be finding.
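To make the "how much time is Googlebot spending on those 404s" question concrete, here is a minimal Python sketch. It assumes the server writes the common "combined" access-log format and that matching the string "Googlebot" in the user agent is good enough for a first pass. Neither detail comes from the episode, and the sample lines are invented for illustration.

```python
import re
from collections import Counter

# Assumption: logs use the widely used "combined" access-log layout.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_status_share(lines):
    """Return the share of Googlebot requests per HTTP status code."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        # Naive user-agent check; spoofed bots would be counted too.
        if match and "Googlebot" in match.group("agent"):
            counts[match.group("status")] += 1
    total = sum(counts.values())
    return {status: n / total for status, n in counts.items()} if total else {}


GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

# Invented sample lines: four Googlebot hits and one ordinary browser hit.
SAMPLE_LOG = [
    f'66.249.66.1 - - [10/Oct/2022:13:55:36 +0000] "GET /products/apples HTTP/1.1" 200 5120 "-" "{GOOGLEBOT_UA}"',
    f'66.249.66.1 - - [10/Oct/2022:13:55:37 +0000] "GET /undefined/track HTTP/1.1" 404 310 "-" "{GOOGLEBOT_UA}"',
    f'66.249.66.1 - - [10/Oct/2022:13:55:38 +0000] "GET /undefined/event HTTP/1.1" 404 310 "-" "{GOOGLEBOT_UA}"',
    f'66.249.66.1 - - [10/Oct/2022:13:55:39 +0000] "GET /logo.png HTTP/1.1" 304 0 "-" "{GOOGLEBOT_UA}"',
    '203.0.113.7 - - [10/Oct/2022:13:55:40 +0000] "GET / HTTP/1.1" 200 9000 "-" "Mozilla/5.0 (Windows NT 10.0)"',
]

share = googlebot_status_share(SAMPLE_LOG)
for status, fraction in sorted(share.items()):
    print(f"{status}: {fraction:.0%} of Googlebot requests")
```

On a real site you would feed the function the lines of the actual log file and watch how the 404 share moves after a fix.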
The other thing that we often look at is parameters. I don’t know if you know, but SEO folks always want to link through to the canonical version of the page. What I mean is, there are often multiple versions of a page that sometimes have some kind of internal tracking or external tracking. There are so many ways in which we can link through to a page, and often a product, for instance, can sit in multiple places in a site. A good example of this is a site I worked on, which was Magento. Every product seemed to sit under every single category, so it was amazing when we found out that there were about 20 versions of every product, and every version was crawlable. So from there, we knew that Google was also spending a huge amount of time crawling through the site. And what’s interesting is, if you remove a product, Google will kind of go “Oh, but I’ve got 19 other versions of this product,” so it’ll take a while for the actual page to disappear if you’ve used a 404 or something like that, because of the way in which Google works. Google will see that this is a canonical version of this page. But if you remove the canonical version, then it will start to use different ones. And this is the kind of information that logfiles give us: the ability to look at the site the way Google does.
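As a rough illustration of spotting parameterized duplicates in crawl data, the sketch below groups requested URLs by their path with the query string removed and reports any path that was crawled in more than one variant. The URLs and parameter names are invented. This only covers the tracking-parameter case; the same product sitting under several category paths would need a different grouping key, such as the product slug.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def variants_by_path(urls):
    """Map each path to its crawled URL variants, keeping only paths with more than one."""
    groups = defaultdict(set)
    for url in urls:
        groups[urlsplit(url).path].add(url)
    return {path: sorted(found) for path, found in groups.items() if len(found) > 1}


# Hypothetical URLs pulled from Googlebot requests in a logfile.
CRAWLED = [
    "/shoes/red-trainer",
    "/shoes/red-trainer?utm_source=newsletter",
    "/shoes/red-trainer?ref=homepage",
    "/about",
]

duplicates = variants_by_path(CRAWLED)
for path, found in duplicates.items():
    print(f"{path}: {len(found)} crawled variants")
```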
And it also allows us to look at things like status codes. A great example of this is there is a status code that says I have not been modified. And for the life of me right now, I can’t think what it is, I should have written this down before this podcast. But basically, the “I’ve not been modified” response massively improves the crawling rate of a website. When I found out that this was something Google was respecting, what I could do was apply it to all of the images, all of the products, and all of these bits and pieces that don’t get modified very regularly. If we can use a not modified response, we can improve the speed at which Google’s crawling, improve the effectiveness, and reduce the load on the server, and then significantly improve the way in which Google is finding all of the different products.
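For reference, the status code being described is HTTP 304 Not Modified, which a server returns to a conditional request when the resource has not changed. The sketch below shows the basic decision using an ETag and the `If-None-Match` request header. The `respond` function and its arguments are hypothetical and not tied to any particular web framework; real servers also handle `If-Modified-Since` and multiple ETags.

```python
from http import HTTPStatus


def respond(request_headers, current_etag, body):
    """Return (status, headers, body) for a GET, honouring If-None-Match."""
    if request_headers.get("If-None-Match") == current_etag:
        # The client's cached copy is still valid: send headers only, no body.
        return HTTPStatus.NOT_MODIFIED, {"ETag": current_etag}, b""
    return HTTPStatus.OK, {"ETag": current_etag}, body


IMAGE_BYTES = b"...image data..."
ETAG = '"v1-logo"'  # hypothetical version tag for an image that rarely changes

first_visit = respond({}, ETAG, IMAGE_BYTES)
revisit = respond({"If-None-Match": ETAG}, ETAG, IMAGE_BYTES)

print(int(first_visit[0]), len(first_visit[2]))
print(int(revisit[0]), len(revisit[2]))
```

The second response carries no body, which is where the saving in server load and crawl time comes from.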
The way in which Google looks at stuff matters because what we want, what server admins want, and what everybody wants is for the server to be as fast and as efficient as possible. Again, going back to the logfiles side of it, for many years we couldn’t use logfiles effectively at all. Because with CDNs, you’d often find that there’d be multiple places in which a page would be hit. And the CDN often didn’t have a log file itself. So we’d be looking at all these different places to see how much load there was on this server and how much load there was on that server. And we’d try to piece everything together, and the logfiles would be in different formats. Now with CDNs, we can actually start to understand the effectiveness of a CDN. Suddenly, things like PageSpeed are massively impacted and improved by the fact that, if we use logfiles, we can start to understand things like the canonicalization of images. So if there’s one image being used across multiple pages, as long as the URL is consistent, the CDN works and Google crawls it better. Yeah, there are so many different ways in which logfiles help improve PageSpeed, caching, and serving users and search engines much more efficiently.
D: I’m reviewing the five points that you were going to share. And there are different elements of them that you’ve shared already. You remind me of someone that I can just ask one question to and they give me a 15-minute podcast episode without me asking any further questions. There’s one person that can probably do that even more than you, and that’s probably Duane Forrester. Duane and I have joked about him doing that, me just asking him one question, walking off, and leaving him to share the content for the rest of the episode. But you talked about parameters a little bit. I don’t know if you touched upon point number three, which is discovering if there are subdomains that are consuming crawl budget when they shouldn’t be.
3. Are there subdomains consuming your crawl budget?
G: This actually goes back to Just Eat. At one point, we discovered that the website was replicated on multiple different subdomains, and all of these were crawlable. Now, interestingly, these had no visibility according to tools like Sistrix. And the reason that they didn’t was because it was all canonicalized. But we found out that, although these duplicates were canonicalized, Google was spending somewhere around 60 to 70% of its budget crawling these subdomains. And because these weren’t cached in the same way, because of the CDNs and other technology, this was actually creating a lot of server load. So it was something which was fascinating for us, because we had just been ignoring this as a problem that needed to be fixed sometime in the distant future. We knew about the problem. We knew there was a kind of issue, and I’d spoken about it. But I’d deprioritized it until we started looking at the logfiles.
We saw that Google was spending a lot of energy, time, and resources here. How much server load was it creating? How much of an impact was it having? We couldn’t understand how much server load it was because the server was not able to distinguish the different sources. So it was fascinating that when we got the logfiles, we could improve the reliability of the website by a considerable amount. We knew about the subdomains, we just didn’t know how much of a problem it was until we started looking into the logfiles. And then suddenly, we saw that this needed to be fixed ASAP. It was one of those things where we knew how to fix it, it was just prioritization. It was at the bottom of the queue and it was bumped up to number two.
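A simple way to put a number on how much of Googlebot’s crawl goes to each subdomain is to count Googlebot requests per hostname. The sketch below assumes each log record has already been parsed into a `(host, user_agent)` pair, which in practice depends on the server or CDN logging the requested hostname. The hostnames are placeholders, not Just Eat’s.

```python
from collections import Counter


def crawl_share_by_host(records):
    """Return each hostname's share of Googlebot requests, largest first."""
    hits = Counter(host for host, agent in records if "Googlebot" in agent)
    total = sum(hits.values())
    if not total:
        return {}
    return {host: n / total for host, n in hits.most_common()}


# Hypothetical parsed records: (requested hostname, user agent).
RECORDS = [
    ("www.example.com", "Googlebot/2.1"),
    ("mirror-a.example.com", "Googlebot/2.1"),
    ("mirror-a.example.com", "Googlebot/2.1"),
    ("mirror-b.example.com", "Googlebot/2.1"),
    ("www.example.com", "Mozilla/5.0 (Windows NT 10.0)"),
]

host_share = crawl_share_by_host(RECORDS)
for host, fraction in host_share.items():
    print(f"{host}: {fraction:.0%} of Googlebot requests")
```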
5. Response codes
D: Also ensuring that response codes are being delivered in a manner that you would want. An example of that is through TOS sometimes being seen or not being seen by Google that should or shouldn’t be. So why would that happen?
G: Again, we always visit web pages using the same browser, the same technology, the same experience and everything. I try to make sure that I use tools other than what I usually use, as everybody does a Screaming Frog audit, so I try to use all sorts of bits and pieces. But we always pretend that we’re kind of like a computer. We never pretend we’re Googlebot, we never pretend that we’re all of these different things. So if you look at how Googlebot is accessing a particular file from a different IP address… With a lot of technology like Cloudflare, if you pretend you’re Googlebot and you’re trying to access the site using Screaming Frog, it knows you’re not Googlebot, you’re actually this. And so it treats you differently to how it would treat Googlebot. And often, servers are configured to pre-render stuff and do all sorts of bits and pieces. So it’s just making sure that everybody gets the right response code from the server at that point.
And it seems quite simple, but when you’re scaling up across international… When you’ve got geo redirects, if a user or search engine can’t access a particular page because somebody’s put in a geo redirect to say that if you visit this website from Spain, then go and load this subdirectory up… It therefore can’t look at the root versions or the alternative versions. That’s why things like response codes being correct are absolutely critical. And it is surprising how often you go through these things and you assume everything is correctly set up. Because time and time again, we know how it should be set up. We give this to somebody, somebody interprets it, another person implements it, and somebody else goes through it. And then somebody else clicks a button on the CDN, which says, “Oh, we can geolocate somebody at this particular place.” It’s not so much that any one person has done something wrong, it’s more that there’s something down the chain which has effectively broken it slightly.
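The reason a CDN can tell a crawler that merely claims to be Googlebot from the real one is that the request’s IP address can be checked. One commonly described check is a reverse DNS lookup on the IP, followed by a forward lookup on the returned hostname to confirm it maps back to the same IP. The Python sketch below is one possible version of that idea, not a description of how Cloudflare does it. The accepted hostname suffixes are an assumption to verify against Google’s own documentation, and the lookups are injectable so the logic can be tried without network access.

```python
import socket

# Assumption: genuine Googlebot hosts resolve under these domains.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Check an IP with reverse DNS, then confirm with a forward lookup."""
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or socket.gethostbyname
    try:
        host = reverse_lookup(ip)
        if not host.endswith(GOOGLE_SUFFIXES):
            return False
        # The hostname must resolve back to the IP that made the request.
        return forward_lookup(host) == ip
    except OSError:  # covers socket.herror and socket.gaierror
        return False


# Fake resolvers with invented data, so the example runs offline.
FAKE_REVERSE = {
    "66.249.66.1": "crawl-66-249-66-1.googlebot.com",
    "203.0.113.9": "desktop.example.net",
}
FAKE_FORWARD = {"crawl-66-249-66-1.googlebot.com": "66.249.66.1"}

genuine = is_verified_googlebot("66.249.66.1", FAKE_REVERSE.__getitem__, FAKE_FORWARD.__getitem__)
spoofed = is_verified_googlebot("203.0.113.9", FAKE_REVERSE.__getitem__, FAKE_FORWARD.__getitem__)
print(genuine, spoofed)
```

Filtering logfile entries with a check like this before analysing response codes keeps spoofed crawlers out of the numbers.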
The Pareto Pickle – Low-Hanging Fruit
D: Let’s finish off with the Pareto Pickle. Pareto says that you can get 80% of your results from 20% of your efforts. What’s one SEO activity that you would recommend that provides incredible results for modest levels of effort?
G: My favorite thing at the moment is I have a very basic Google Data Studio dashboard, which allows me to take a look at what I call the low-hanging fruit. Now, everybody hates buzzword bingo. But this is my thing where I look at things that are not quite ranking as well as they should. I look at all of the keywords that a particular set of pages, or recipes, or products, or something is ranking for. A good example is, at the moment, I’m working across tens of thousands of products. I look at all the pages which have got high impressions but may be at position six, and which I can work up to position three. And nine times out of ten you can do this by just making sure the title tags are improved and the internal linking is improved. It’s very simple stuff to find out which of the keywords with high search volume can be bumped up just a little bit more to increase the click-through rate.
D: I’ve been your host, David Bain. You can find Gerry by searching Gerry White on LinkedIn. Gerry, thanks so much for being on the In Search SEO podcast.
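The same low-hanging-fruit filter can be sketched in a few lines of Python over exported search performance rows. The field names, the impression threshold, and the position band of 4 to 10 are all assumptions chosen for illustration, not the settings of the dashboard described above.

```python
def low_hanging_fruit(rows, min_impressions=1000, best=4.0, worst=10.0):
    """Keep high-impression rows ranking just below the top spots, biggest first."""
    picks = [
        row for row in rows
        if row["impressions"] >= min_impressions and best <= row["position"] <= worst
    ]
    return sorted(picks, key=lambda row: row["impressions"], reverse=True)


# Hypothetical rows, shaped like a search performance export.
ROWS = [
    {"page": "/recipes/lasagne", "impressions": 12000, "position": 6.2},
    {"page": "/products/oat-milk", "impressions": 40000, "position": 1.3},
    {"page": "/products/olive-oil", "impressions": 25000, "position": 5.1},
    {"page": "/products/rare-spice", "impressions": 40, "position": 7.0},
]

candidates = low_hanging_fruit(ROWS)
for row in candidates:
    print(row["page"], row["impressions"], row["position"])
```

The pages that come out are the ones where a better title tag or stronger internal linking is most likely to pay off.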
G: My pleasure. Thank you for your time.
D: And thank you for listening. Check out all the previous episodes and sign up for a free trial of the Rank Ranger platform.
About The Author
In Search is a weekly SEO podcast featuring some of the biggest names in the search marketing industry.
Tune in to hear pure SEO insights with a ton of personality!
New episodes are released each Tuesday!