If you really, really want to get a clear picture of your site’s visibility, you need to start by looking at your site from the search engines’ point of view. That means using some type of crawler to spider your site, the same way Google, Bing and Yahoo! do.
Warning: The rest of this training goes far, far better if you learn to use Xenu! So learn. to. use. it. Read this section carefully and follow along.
There are dozens (hundreds? thousands?) of different crawling tools out there. For this training, I’m going to stick with five, and focus mostly on Xenu:
- Xenu™ Link Sleuth. Free, and more features than a Swiss Army Knife. Only runs on Windows. Which is why I still own a Windows computer.
- Integrity for Mac. Free (donation recommended). Not bad, but it’s no Xenu.
- LinkAider (web-based). You can crawl up to 500 pages of a single domain each month for free. Beyond that, plans start at $24/month.
- 80Legs (web-based). Super-versatile, but it takes some serious expertise to be really useful. If you’re a geek and want a truly advanced crawler, this is the one for you.
- SEOMOZ’s Crawl Test. Part of SEOMOZ’s toolset. Not bad, but it only crawls 50 pages and requires a membership.
Of all of these, I have the most love for Xenu, which continues to be the easiest, speediest tool out there.
Using Xenu to crawl your site
I’ll assume you’ve managed to install Xenu. If you had any problems, let me know in the comments below.
- Start up Xenu.
- Before you start your first crawl, set Xenu’s options. Click Options // Preferences:
- I check everything, including ‘Treat redirections as errors’, so that I can detect non-search-friendly redirects such as 302 (temporary) redirects.
- Set the number of parallel threads as desired. Note that if you set it too high and your server can’t handle the load, you’ll crash your site. Makes a good story over beers, but it’s not much fun when it happens.
- Click File // New // Check URL.
- For our purposes, a basic crawl should work. Enter your home page URL in the first field.
- Click OK. Xenu will toddle off and start crawling your site. This can take a while on a larger site. On my blog, which has about 3800 pages, it takes about five minutes.
- When Xenu’s done, you’ll get a report showing broken links, all URLs on your site, and some other fun stuff. Be sure to open the report and browse through it.
- In Xenu, save the result. Click File // Save and give your file a name you’ll remember. That way, you can open the crawl results later to compare them with newer crawls, or to run a report.
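If you’re curious what ‘Treat redirections as errors’ and the parallel-threads setting boil down to, here’s a minimal Python sketch. It is not how Xenu works internally. It assumes a hypothetical `fetch_status` function that you supply (in real life it would request a URL without following redirects and return the HTTP status code), and it uses a thread pool capped at `max_threads` so you don’t hammer your own server.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Tuple

# Redirect status codes: 301 and 308 are permanent; 302, 303 and 307 are
# temporary, which is the kind of redirect this setting helps you spot.
PERMANENT_REDIRECTS = {301, 308}
TEMPORARY_REDIRECTS = {302, 303, 307}


def classify(status: int) -> str:
    """Label an HTTP status code the way an SEO audit might."""
    if status in PERMANENT_REDIRECTS:
        return "permanent redirect"
    if status in TEMPORARY_REDIRECTS:
        return "temporary redirect (check this)"
    if 200 <= status < 300:
        return "ok"
    if 400 <= status < 600:
        return "error"
    return "other"


def check_urls(
    urls: Iterable[str],
    fetch_status: Callable[[str], int],
    max_threads: int = 4,
) -> Dict[str, Tuple[int, str]]:
    """Check every URL with at most `max_threads` requests in flight.

    `fetch_status` is a hypothetical callable you provide. Keeping
    `max_threads` low is the code equivalent of not cranking up Xenu's
    parallel threads on a fragile server.
    """
    url_list = list(urls)
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        # pool.map returns results in the same order as url_list.
        statuses = list(pool.map(fetch_status, url_list))
    return {url: (status, classify(status)) for url, status in zip(url_list, statuses)}


if __name__ == "__main__":
    # Fake status codes so the example runs without touching the network.
    fake = {
        "https://example.com/": 200,
        "https://example.com/old-page": 301,
        "https://example.com/promo": 302,
        "https://example.com/missing": 404,
    }
    for url, (status, label) in check_urls(fake, fake.get, max_threads=2).items():
        print(status, label, url)
```

The 302 on `/promo` is exactly the sort of thing you’d want flagged as an error in your crawl report.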
Note that you can also:
- Export an XML sitemap. Click File // Create Google Sitemap File. You’ll need this in the next module, so you may as well save the sitemap now.
- Export to Graphviz format. Graphviz is open-source graph visualization software. Export this file and you can generate an instant visual sitemap. I don’t always have great luck with this method, but when it works it can save a great deal of time.
What you’ll really want, though, is to dump the crawl results into Microsoft Excel. On to the next section.
Importing Xenu results into Excel
Exporting is easy. Click File // Export to TAB separated file.
Then open Excel. Click File // Open and select the tab-delimited file you just exported from Xenu. Excel will prompt you to specify the file type. Make sure you pick ‘Delimited’, with Tab as the delimiter.
Save the result.
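If you’d rather poke at the export with a script than in Excel, Python’s built-in `csv` module reads tab-separated files directly. This is a minimal sketch: the column names `Address` and `Status-Code` are assumptions, so open your own export and adjust them to match its header row. You may also need to pass an `encoding` that matches the file when you open it.

```python
import csv
import io
from collections import Counter
from typing import Dict, List, TextIO

# ASSUMPTION: hypothetical header names; check the first row of your export.
URL_COLUMN = "Address"
STATUS_COLUMN = "Status-Code"


def load_export(handle: TextIO) -> List[Dict[str, str]]:
    """Read a tab-separated crawl export into a list of row dictionaries."""
    return list(csv.DictReader(handle, delimiter="\t"))


def _status(row: Dict[str, str]) -> str:
    # Short rows come back with None for missing fields, so guard with `or`.
    return (row.get(STATUS_COLUMN) or "").strip()


def status_summary(rows: List[Dict[str, str]]) -> Counter:
    """Count how many URLs returned each status code."""
    return Counter(_status(row) for row in rows)


def urls_with_status(rows: List[Dict[str, str]], status: str) -> List[str]:
    """List every URL whose status code matches `status` (e.g. "404")."""
    return [row[URL_COLUMN] for row in rows if _status(row) == status]


if __name__ == "__main__":
    # A tiny in-memory stand-in for a real export file.
    sample = (
        "Address\tStatus-Code\tTitle\n"
        "https://example.com/\t200\tHome\n"
        "https://example.com/old\t404\t\n"
    )
    rows = load_export(io.StringIO(sample))
    print(status_summary(rows))
    print(urls_with_status(rows, "404"))
```

For a real file, swap the `io.StringIO` for `open("your-export.txt", newline="")`; the `newline=""` argument is what the `csv` docs recommend.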
There’s nothing else to do with this right now. The central resource you’ll use for much of your on-site optimization is set up and good to go.
I’ve given you the basics – just what you’ll need to use Xenu for a solid SEO campaign.
If you want to learn even more about Xenu, read Tom Critchlow’s excellent post about the software over on the SEOMOZ blog.
I can’t use Xenu!
If, for some reason, you can’t get your hands on an old copy of Windows XP, or if you simply refuse to touch anything from Microsoft, you can use one of the other tools I recommended above. Of all of them, LinkAider is the easiest.
Or (shameless plug here) you can talk to me about using Portent’s custom crawler. If you’re a paying training member, chances are I’ll let you do a crawl or two for free.
Roll your own
You can, of course, write your own crawler. We did it at Portent years ago, and we still use the tool today. But that’s some hardcore geekery.
Unless you think spending hours poring over scripts is the most fun you’ve ever had, I don’t recommend it.
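If you want a feel for what’s involved anyway, here’s a minimal breadth-first crawler in Python. It is not Portent’s tool, just a sketch under a few assumptions: `fetch` is a hypothetical function you supply that returns a page’s HTML (or `None` if the page can’t be loaded), the crawl stays on the starting host, and it stops after `max_pages` pages. It records every page’s outgoing links, which is the raw material for the broken-link and sitemap reports above.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(
    start_url: str,
    fetch: Callable[[str], Optional[str]],
    max_pages: int = 500,
) -> Dict[str, List[str]]:
    """Breadth-first crawl of one host.

    Returns {page_url: [same-host links on that page]}, keeping one entry
    per <a> tag. `fetch` is a hypothetical callable you provide.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    graph: Dict[str, List[str]] = {}
    while queue and len(graph) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            graph[url] = []  # record the dead page; nothing to follow
            continue
        parser = LinkExtractor()
        parser.feed(html)
        out: List[str] = []
        for href in parser.links:
            # Resolve relative links and drop #fragments.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc != host:
                continue  # off-site (or mailto:, javascript:) links are skipped
            out.append(absolute)
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
        graph[url] = out
    return graph


if __name__ == "__main__":
    # A fake two-page site, so the example runs offline.
    site = {
        "https://example.com/": '<a href="/about">About</a> <a href="https://other.com/">x</a>',
        "https://example.com/about": '<a href="/">Home</a> <a href="/missing">Oops</a>',
    }
    for page, links in crawl("https://example.com/", site.get).items():
        print(page, "->", links)
```

Pages that come back as `None` (like `/missing` here) end up in the results with no links, which is how this sketch surfaces broken pages. A real `fetch` might wrap `urllib.request.urlopen`, and a polite crawler would also honor robots.txt (`urllib.robotparser` can help) and add delays between requests. That’s where the hours of poring over scripts start.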