You’re a developer. Someone asks you if you ‘know SEO’. You answer ‘yes’. Of course you do! SEO is where you put in META tags and stuff!
Alas, it’s a bit more complicated than that. Use this checklist to keep yourself out of trouble and still look good when your site starts ranking.
General Development Requirements for Any Site
No matter what your site is built on (a CMS, static HTML, etc.), these are musts:
Canonicalization
- One page, one URL. There are specific items about this below. But as a rule, make absolutely sure that there is a single URL for each page of content. The rel=canonical tag is not a solution. Bing doesn’t consistently support it; Google keeps changing their support for it; plus, you’ll need rel=canonical as an emergency fix later on. Don’t build dependence on it into your site.
- Link back to the home page at http://www.yoursite.com/ . Don’t add ‘index.aspx’ or ‘index.html’ or similar.
- Set up a 301 redirect from http://yoursite.com to http://www.yoursite.com.
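
If it helps to see the 'one page, one URL' rule as code, here's a minimal sketch in Python. It's not tied to any particular server or framework. The host names and the list of index filenames are assumptions for illustration, and canonical_url and redirect_for are hypothetical helper names, so swap in your own.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed values for illustration; substitute your own host and index names.
CANONICAL_HOST = "www.yoursite.com"
BARE_HOST = "yoursite.com"
INDEX_FILES = {"index.html", "index.htm", "index.aspx", "index.php"}


def canonical_url(url: str) -> str:
    """Return the single preferred URL for a page."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host == BARE_HOST:
        host = CANONICAL_HOST  # non-www goes to www
    path = parts.path or "/"
    head, _, tail = path.rpartition("/")
    if tail.lower() in INDEX_FILES:
        path = head + "/"  # drop 'index.aspx' and friends
    return urlunsplit((parts.scheme, host, path, parts.query, ""))


def redirect_for(url: str):
    """Return (301, target) if the request needs redirecting, else None."""
    target = canonical_url(url)
    return (301, target) if target != url else None
```

Call redirect_for() early in request handling. If it returns a target, send the 301 and stop. The same canonical_url() helper can be used when generating internal links, so the site never links to a non-canonical address in the first place.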
URL Structures
- Do not use query strings for click tracking of any kind! If you need to track which button someone clicks, or whether they click on an image, use javascript tracking like the event tools built into Google Analytics, instead. So-called search-friendly URLs do not fix this problem.
- Don’t let URLs contain superfluous folders. For example, if all pages of the site are in the ‘site’ folder, don’t include that in URLs.
- Do use plain language URLs. While keywords in the URL don’t help with rankings, and while search-friendly URLs don’t really help with search, they do increase clickthru. People like to see URLs that are descriptive and understandable.
- Don’t add a session ID of any kind to a URL, unless someone is already in a checkout funnel or logged into a secure area that search engines can’t access.
- Use only alphanumeric, hyphen (‘-’) and forward slash (‘/’) characters in URLs. Don’t use underscores, as search engines don’t recognize these as word separators.
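
Here's one way to turn page titles into URLs that follow the character rules above. Treat it as a sketch: slugify and build_path are hypothetical helper names, and dropping accents via Unicode normalization is my assumption about how you'd want non-ASCII titles handled.

```python
import re
import unicodedata


def slugify(text: str) -> str:
    """Lowercase the text and reduce it to letters, digits and hyphens."""
    # Assumption: accented letters are reduced to their ASCII base letter.
    ascii_text = (
        unicodedata.normalize("NFKD", text)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Underscores, spaces and punctuation all collapse into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower())
    return slug.strip("-")


def build_path(*segments: str) -> str:
    """Join slugified segments with forward slashes, skipping empty ones."""
    slugs = [slugify(segment) for segment in segments]
    return "/" + "/".join(slug for slug in slugs if slug)
```

For example, build_path("Products", "Blue Widgets") gives "/products/blue-widgets".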
Metadata
- The title tag on every page must be editable, independent of the page content heading or headline.
- The title tag should default, though, to the page content heading or headline.
- An ideal title tag formula would be [page title] – [category] – [brand] or similar.
- The description META tag must be editable. While the description tag won’t directly affect rankings, it can affect site quality and clickthru.
- The description META tag should default, though, to the first 250 characters of the page content.
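
The 'editable, but with a sensible default' pattern for titles and descriptions might look something like this. The function names, the override parameters and the separator constant are all hypothetical. Your CMS will have its own field names.

```python
TITLE_SEPARATOR = " – "  # matches the [page title] – [category] – [brand] formula
DESCRIPTION_LIMIT = 250


def build_title(headline, category="", brand="", override=None):
    """Use the editor's title if one was entered, else fall back to the formula."""
    if override:
        return override
    parts = [part for part in (headline, category, brand) if part]
    return TITLE_SEPARATOR.join(parts)


def build_description(body_text, override=None, limit=DESCRIPTION_LIMIT):
    """Use the editor's description, else the first `limit` characters of content."""
    if override:
        return override
    collapsed = " ".join(body_text.split())  # collapse whitespace and newlines
    return collapsed[:limit].rstrip()
```

The important part is the override argument: content managers can always replace the default, but a page never ships with an empty title or description.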
Image Files
- Put all images in an ‘images’ folder. You can split images into separate folders if they share a topic. For example, if you have lots of images of past projects, or of certain products, you can put those in a ‘project-images’ folder. There’s some evidence that search engines may be kinder to images that are in organized folders.
- Give all images descriptive filenames. ‘soccer-game-photo.jpg’ is good. ‘f2341asdf.jpg’ is not.
- Every image should have a relevant ALT attribute.
- Image ALT attributes should be editable independent of the filename.
- If you’re using descriptive image filenames, though, use the filename, with hyphens replaced by spaces, as the default ALT attribute.
- Require height and width attributes on all images.
- If you’re going to be using thumbnails as well as full-sized images, try to automate the resizing process for your content managers. It’ll save you a lot of time reminding folks to resize using a photo editor. Most writers will resize images using the height and width attributes if you don’t tell them otherwise.
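
Here's a rough sketch of the ALT default described above: take the filename, swap hyphens for spaces, and let an editor override it. default_alt and img_tag are hypothetical helpers, not part of any CMS, and the markup they produce is just one reasonable shape.

```python
from html import escape
from pathlib import PurePosixPath


def default_alt(image_path, override=None):
    """Editor-supplied ALT text wins; otherwise derive it from the filename."""
    if override:
        return override
    stem = PurePosixPath(image_path).stem  # filename without folder or extension
    return stem.replace("-", " ")


def img_tag(src, width, height, alt=None):
    """Build an img element that always carries alt, width and height."""
    alt_text = default_alt(src, alt)
    return (
        f'<img src="{escape(src)}" alt="{escape(alt_text)}" '
        f'width="{int(width)}" height="{int(height)}">'
    )
```

Because width and height are required arguments, a template can't emit an image without them.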
Semantic Structure
- All pages should have a single H1 element. The H1 element should be text, and should hold the headline for that page. Don’t use an H1 element for the company logo, or for text that’s repeated on multiple pages.
- After that, use H2 elements for subheadings, H3 for sub-subheadings, and so on.
- Use list elements (LI) to hold navigation. Don’t use H elements or other HTML elements.
- All paragraphs should reside in P elements. This will make future editing and optimization much easier.
- Use tables only for tabular data. I know there’s a lot of controversy about this. Ignore it. What matters is page ‘weight’ and content-to-code ratio. Tables add code, so minimize their use.
- All javascript and CSS blocks should be in separate .js and .css include files. Your site will run far faster, and search engines will see a better content-to-code ratio.
- Wherever possible, navigation links should be text, not image-based text.
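
A few of these rules are easy to check automatically before launch. The sketch below uses Python's built-in HTML parser to flag pages with the wrong number of H1 elements, an image inside the H1, or inline script and style blocks. PageAudit and audit are names I made up, and the checks are deliberately simple. They won't catch everything on the list.

```python
from html.parser import HTMLParser


class PageAudit(HTMLParser):
    """Counts the structural items this checklist cares about."""

    def __init__(self):
        super().__init__()
        self.h1_count = 0
        self.images_in_h1 = 0
        self.inline_blocks = 0
        self._in_h1 = False

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.h1_count += 1
            self._in_h1 = True
        elif tag == "img" and self._in_h1:
            self.images_in_h1 += 1
        elif tag == "style":
            self.inline_blocks += 1
        elif tag == "script" and "src" not in dict(attrs):
            self.inline_blocks += 1  # a script with no src is an inline block

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False


def audit(html):
    """Return a list of human-readable problems found in the page."""
    parser = PageAudit()
    parser.feed(html)
    parser.close()
    problems = []
    if parser.h1_count != 1:
        problems.append(f"expected exactly one H1, found {parser.h1_count}")
    if parser.images_in_h1:
        problems.append("H1 contains an image; use text instead")
    if parser.inline_blocks:
        problems.append(f"{parser.inline_blocks} inline script/style block(s)")
    return problems
```

Run audit() over rendered pages in a test suite and fail the build when the list isn't empty.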
Interactivity
- For any javascript- or css-enabled navigation, make sure it degrades gracefully. If you turn off javascript and css, all navigation should be revealed. That way, search engines can crawl the navigation.
- The same is true for any javascript-driven slideshows. Make sure these use list items or divs, and that, if you turn off javascript and css, you can see all of these slides.
- Don’t hide large blocks of content using javascript or CSS. Search engines may interpret this as spam.
- Don’t load any content via an AJAX call. Search engines will completely ignore it. AJAX is great for interactive applications, but not so great for web publishing where SEO is a factor.
Server Configuration
- At a minimum, enable GZIP compression for HTML, CSS and script filetypes. Common image formats like JPEG, PNG and GIF are already compressed, so GZIP gains little there.
- Set far-future expire dates for images and elements (like backgrounds) that you know won’t change in the foreseeable future.
- Install a robots.txt file.
- Verify the site with Google Webmaster Tools.
- Verify the site with Bing Webmaster Tools.
- Set up the latest version of Google Analytics or a similar, javascript ‘bug’-driven tool.
- But also make sure you’re logging all default data plus referrer, referring keyword, platform, browser and cookie in your server logs. They’re a critical backup, plus they let us troubleshoot SEO and other issues.
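
A robots.txt file is easy to get wrong, so it's worth testing before it goes live. Here's a small sketch using Python's standard urllib.robotparser. The rules and paths in ROBOTS_TXT are made-up examples (a checkout funnel and an account area), not a recommendation for your site, and is_crawlable is a hypothetical helper.

```python
from urllib.robotparser import RobotFileParser

# Example rules only; the paths and sitemap URL are assumptions.
ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/
Disallow: /account/
Sitemap: http://www.yoursite.com/sitemap.xml
"""


def is_crawlable(robots_txt, url, agent="*"):
    """Check whether the given rules let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

A quick check like is_crawlable(ROBOTS_TXT, "http://www.yoursite.com/products/") before deploying can catch the classic mistake of blocking the whole site.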
Video
- Use YouTube.
- Then upload to other services. You can use a service like TubeMogul to do it. This isn’t developer stuff, strictly speaking, except that you can often use the service APIs to make multiple-service uploads easier.
Special CMS Issues
These issues are directly relevant to the database-driven side of your site.
- Use memory caching of pages. Don’t hit the database for every page request. This is probably totally obvious, but I’ve had times when I regretted not saying it, so here it is.
- Enable daily, dynamic publishing of a standard XML sitemap. http://sitemaps.org
- Enable daily, dynamic publishing of an XML image sitemap. http://bit.ly/9W504p
- And yes, enable daily, dynamic publishing of an XML video sitemap. http://bit.ly/9xVdBm
- Make sure that any ‘page not found’ errors deliver a 404 error code. They must not redirect – redirection can create huge duplicate content issues.
- Map old addresses on the old site to new ones via 301 redirects. This will minimize any disruptions caused by re-launch.
- Don’t have any content that’s accessible only via a search form or any other kind of form. Ironically, search engines can’t use search engines. So all content must be accessible via clicks on links.
- If you want content to be crawled and indexed, it can’t be hidden behind a login form.
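
The 404 and 301 rules above boil down to a simple decision for every request. This is a bare-bones sketch with hypothetical data: REDIRECTS and LIVE_PAGES stand in for whatever lookup your CMS really uses, and resolve is a made-up name.

```python
# Hypothetical mapping from old-site addresses to their new homes.
REDIRECTS = {
    "/old-products.aspx": "/products/",
    "/aboutus.html": "/about/",
}

# Hypothetical set of pages that exist on the new site.
LIVE_PAGES = {"/", "/products/", "/about/"}


def resolve(path):
    """Return (status, location) for a request path."""
    if path in LIVE_PAGES:
        return 200, None
    if path in REDIRECTS:
        return 301, REDIRECTS[path]  # permanent move from an old address
    return 404, None  # a real 404, never a redirect to the home page
```

The last line is the one people get wrong: an unknown address returns a 404 status with no Location header.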
Note: Even though Google now has a unified sitemap format, I still recommend doing separate sitemaps for each media type. That’s because the other search engines don’t support the unified format – in fact, the unified format isn’t even accepted as a standard.
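
For the standard XML sitemap, a daily job can be as small as the sketch below. It follows the urlset/url/loc/lastmod layout from sitemaps.org. build_sitemap is a hypothetical helper, and where the page list comes from (your CMS database, presumably) is left to you.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from an iterable of (url, last_modified_date) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod.isoformat()
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Example usage with made-up pages:
example = build_sitemap([
    ("http://www.yoursite.com/", date(2010, 6, 1)),
    ("http://www.yoursite.com/products/", date(2010, 6, 2)),
])
```

Write the result to /sitemap.xml on a schedule. The image and video sitemaps add their own tags on top of this layout, so check the format documentation linked above before extending it.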