insilico

Helping Google find your website – technical guidelines

So far in this blog series ‘Helping Google find your website’ we have talked about simple changes you can make to your website’s text and images to make it easier for Google to find, index and rank your website.

In this post, we are going to look at some more ways of improving your website’s visibility to search engines. We are going to get a little more technical, but don’t worry, we’ll try to make it as easy to understand as possible…

1. Search engines have to examine your website to figure out what is in it and therefore how they should index it. They use computer programs called ‘spiders’ to ‘crawl’ over your website and retrieve the information they need. Spiders mainly read text, so have a look at your website in a text-only browser such as Lynx. If fancy features like JavaScript stop you from seeing all of your content and links there, spiders may have trouble crawling your site too.

2. Let spiders crawl your site without session IDs or other tracking parameters that record their path through the site. These are useful for tracking what individual visitors do on your site, but spiders move through a site very differently. Leaving them switched on for spiders can result in incomplete indexing of your site, because a spider may not be able to tell that two different-looking links actually point to the same web page.

3. Check that your web server supports the ‘If-Modified-Since’ HTTP header. When a spider comes back to a page, it can send this header with the date of its last visit, and your web server can tell it whether the content has changed since then. If nothing has changed, the server sends a short ‘304 Not Modified’ reply instead of the whole page, which saves bandwidth on both sides. If there is new content, the spider downloads the updated page and uses it to help index and rank your website.

4. Search engine spiders aren’t like vampires: they don’t need to be invited in. Unless you tell them otherwise, they will try to crawl every page they can find. To control this, a text file called robots.txt is saved in the top-level (root) directory of your web server. This file tells spiders which directories in your website they can and cannot crawl. Keep it up to date so that spiders aren’t accidentally blocked from the parts of your website you want indexed. You can check out some frequently asked questions about robots.txt or test your robots.txt file in the Google robots.txt analysis tool to make sure it’s working properly.

5. Try to ensure that advertisements don’t affect your search engine ranking. Spiders read the text in ads in the same way as any other text on your website and treat it as part of your site. Some ads, such as Google AdSense ads and DoubleClick links, are already blocked from being crawled. If your ads are not blocked, make sure they are relevant to your website’s content so that the spiders don’t get confused and index your page incorrectly.

6. If your website runs on a content management system (CMS) that you can edit yourself, make sure the system allows search engine spiders to crawl any extra pages or links you create.

7. Use your robots.txt file to stop spiders from crawling pages that wouldn’t be of much value to visitors arriving from a search engine. For example, if your website has a search feature, you don’t need spiders to crawl its search results pages, because visitors are unlikely to want one of those pages as their first stop on your website.

8. Different web browsers display websites slightly differently. Sometimes menus, images and text can look completely different in one browser, as if somebody made a mistake building them. You can test your website simply by looking at it in a few different browsers yourself or by submitting it to a site like Browser Shots.

9. Monitor your website’s performance and make sure it loads as quickly as possible. Google aims to give its users a great experience, and site speed is one of the signals it uses when ranking pages, so a faster-loading website can have an edge over a slow one. There are a few tools that can help with monitoring your website’s speed, such as Webpagetest.
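
If you’re comfortable with a little Python, you can get a rough idea of tip 1’s text-only view without installing Lynx. The sketch below uses Python’s built-in html.parser module to pull out a page’s readable text while skipping anything inside script and style tags. It is only an approximation, not how Google’s spiders actually work, and the example page and its buildMenu() script are made up for illustration.

```python
from html.parser import HTMLParser

# A made-up page whose menu is only created by JavaScript (buildMenu is hypothetical).
PAGE = """<html><body>
<h1>Handmade Shoes</h1>
<p>Leather shoes made in Cardiff.</p>
<script>buildMenu();</script>
</body></html>"""


class TextOnlyView(HTMLParser):
    """Collects the readable text of a page, skipping <script> and <style> contents.

    A rough approximation of a text-only view, not a real spider.
    """

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self.skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.chunks.append(text)


def text_only(html):
    """Return the page's readable text as one line of space-separated chunks."""
    parser = TextOnlyView()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


if __name__ == "__main__":
    print(text_only(PAGE))  # Handmade Shoes Leather shoes made in Cardiff.
```

Notice that the menu never appears: it only exists once the script runs, so a spider that doesn’t run JavaScript may never see those links.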
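
Tip 2 is easier to picture with an example. The Python sketch below strips session ID parameters from a link, so that two addresses that look different turn out to be the same page, which is exactly what a spider struggles to work out on its own. The parameter names in SESSION_PARAMS and the strip_session_ids function are assumptions for illustration; check which parameters your own site actually uses. The better fix is still not to add session IDs to the links spiders see in the first place.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical session parameter names (compared in lower case).
SESSION_PARAMS = {"sessionid", "sid", "phpsessid"}


def strip_session_ids(url):
    """Return `url` with any session ID query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    # Rebuild the address with only the parameters that describe the page itself.
    return urlunsplit(parts._replace(query=urlencode(kept)))


if __name__ == "__main__":
    a = "https://www.example.com/shoes?colour=brown&sessionid=A1B2"
    b = "https://www.example.com/shoes?sessionid=Z9Y8&colour=brown"
    print(strip_session_ids(a))  # https://www.example.com/shoes?colour=brown
    print(strip_session_ids(a) == strip_session_ids(b))  # True
```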
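
For tip 3, here is a minimal Python sketch of the decision a web server makes when a spider sends an If-Modified-Since header. It assumes you know when each page last changed (the last_modified value), and the conditional_status function is made up for illustration. Many web servers already handle this automatically for static files, so check your server’s documentation before writing anything yourself.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def conditional_status(if_modified_since, last_modified):
    """Return the HTTP status to send: 304 if unchanged since the spider's date, else 200.

    `if_modified_since` is the raw header value (or None if the spider didn't send it);
    `last_modified` is a timezone-aware datetime for when the page last changed.
    """
    if if_modified_since is None:
        return 200  # no header: send the full page
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # a date we can't read: ignore the header and send the page
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)  # treat dates without a zone as UTC
    # HTTP dates only have one-second precision, so drop microseconds before comparing.
    if last_modified.replace(microsecond=0) <= since:
        return 304  # Not Modified: the spider's copy is still current
    return 200  # the page has changed: send the new version


if __name__ == "__main__":
    last_modified = datetime(2013, 3, 1, 9, 30, tzinfo=timezone.utc)
    print(conditional_status("Fri, 01 Mar 2013 09:30:00 GMT", last_modified))  # 304
    print(conditional_status("Thu, 28 Feb 2013 12:00:00 GMT", last_modified))  # 200
```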
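
To go with tip 4, Python’s standard library includes urllib.robotparser, which can read a robots.txt file and tell you whether a given address may be crawled. The sketch below checks a list of pages you care about and reports any that your rules would block. The robots.txt contents, the example.com addresses and the blocked_pages helper are all placeholders. This parser follows the original robots.txt rules and doesn’t understand every extension Google supports, such as * wildcards inside paths, so Google’s own robots.txt tool remains the better test.

```python
from urllib.robotparser import RobotFileParser

# Placeholder robots.txt: paste in the contents of your own file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /search
"""

# Hypothetical pages you definitely want search engines to reach.
IMPORTANT_PAGES = [
    "https://www.example.com/",
    "https://www.example.com/products/",
    "https://www.example.com/contact",
]


def blocked_pages(robots_txt, urls, agent="Googlebot"):
    """Return the URLs from `urls` that the robots.txt rules stop `agent` from crawling."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    print(blocked_pages(ROBOTS_TXT, IMPORTANT_PAGES))  # [] means nothing important is blocked
```

A single stray line such as Disallow: / would block every page on the list, which is exactly the kind of accident to watch out for when you update the file.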
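
Tip 6 mostly comes down to links: spiders find new pages largely by following links from pages they already know about, so a page your CMS publishes but never links to can be hard for them to find. The Python sketch below assumes you can export a map of which pages link where (how you get that depends entirely on your CMS) and lists any published page that can’t be reached from the home page. The SITE_LINKS data and the unreachable_pages function are hypothetical.

```python
from collections import deque

# Hypothetical link map: each page your CMS publishes, and the pages it links to.
# Every published page should appear as a key, even if it has no outgoing links.
SITE_LINKS = {
    "/": ["/about", "/products"],
    "/about": ["/"],
    "/products": ["/products/boots"],
    "/products/boots": ["/products"],
    "/news/new-post": ["/"],  # published, but nothing links to it
}


def unreachable_pages(links, start="/"):
    """Return published pages that can't be reached by following links from `start`."""
    seen = {start}
    queue = deque([start])
    while queue:  # breadth-first walk, a bit like a spider following links
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return sorted(set(links) - seen)


if __name__ == "__main__":
    print(unreachable_pages(SITE_LINKS))  # ['/news/new-post']
```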
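
For tip 7, it’s worth checking what a rule like Disallow: /search really covers, because robots.txt rules match the start of a page’s address. The Python sketch below uses the standard library’s urllib.robotparser to test a few addresses against that rule. It assumes your internal search results live at addresses starting with /search; the rules, the example.com addresses and the crawl_report helper are placeholders for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: this assumes your site's internal search results
# live at addresses starting with /search (check your own site's URLs).
ROBOTS_LINES = [
    "User-agent: *",
    "Disallow: /search",
]


def crawl_report(robots_lines, paths, site="https://www.example.com", agent="Googlebot"):
    """Map each path to True if the rules let `agent` crawl it, else False."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return {path: parser.can_fetch(agent, site + path) for path in paths}


if __name__ == "__main__":
    paths = ["/search?q=boots", "/search/boots", "/search-tips", "/products/boots"]
    for path, allowed in crawl_report(ROBOTS_LINES, paths).items():
        print(path, "allowed" if allowed else "blocked")
```

Note that /search-tips is blocked as well, because it starts with /search. If you have an ordinary page whose address begins the same way, a more specific rule (for example Disallow: /search/, if your results live in that folder) avoids hiding it by accident.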

Next time – Google’s quality guidelines…
