One of the consequences of our server fiasco was that people couldn’t click through to our articles. They would click a link on our home page or from another source, only to get a “page not found” error. We don’t expect this type of problem to happen with our new hosting provider. But if it does, there’s a workaround you can use with many sites when pages or servers aren’t available. The trick is to use a cached page.
A goal most websites have is to get their content indexed by the search engines. This makes it easier for readers and customers to find the answer to their question when using a search engine.
When a search engine spider indexes a web page, it keeps a copy of the scanned page. This is called the cached copy. You might think of this page as a historical skeleton. The reason I say skeleton is that the spider doesn’t save everything. Each time the spider comes back, a new cached page overwrites the previous one.
A nice benefit is that cached pages highlight your search terms (Label 3). I often use cached pages to quickly find my terms on a page. All the major search engines offer this highlighting feature. They also provide links to jump to the current page.
How to Find Cached Pages
It’s easy to overlook this resource when using search engines. We’re so trained to click the title that we often don’t see the cached link. The major search engines such as Google, Yahoo!, MSN and Ask provide a link after the page description if a cached page is available.
If you look at your search query results, you can see these links, but not always. Not all websites want their cached pages to appear. Some examples include sites where membership or payment is needed. A webmaster can add code to their pages to tell the search engines not to cache the results. In that case, a user can only jump to the current page.
Most of our pages have a cached link if they’ve been crawled. There are some exceptions, such as our tutorial pages, which haven’t been recrawled since we removed the membership restriction. The results also vary by search engine, as each has its own list of pages it wants indexed.
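The code a webmaster adds is commonly a robots meta tag whose content includes the noarchive directive. As a rough sketch, the Python below checks a page’s HTML for that tag. The names RobotsMetaParser and allows_cached_copy are made up for this illustration, and the check looks only at meta tags, not at HTTP headers or anything else a site might use.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from robots-style meta tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attr_map = {name: (value or "") for name, value in attrs}
        # "robots" addresses all crawlers; "googlebot" addresses one crawler.
        if attr_map.get("name", "").lower() not in {"robots", "googlebot"}:
            return
        for part in attr_map.get("content", "").split(","):
            directive = part.strip().lower()
            if directive:
                self.directives.add(directive)


def allows_cached_copy(html):
    """Return False if the page asks search engines not to keep a cached copy."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noarchive" not in parser.directives
```

Calling allows_cached_copy on the HTML of a page that contains a meta tag named robots with the content noarchive returns False, which matches the case where no cached link appears in the results.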
What’s in a Cached Page
While some people expect the cached page to be an exact representation, it’s not. The content for a cached page is usually served from multiple computers. If you look at the screen snap below, you see the URL (Label 1) belongs to a Yahoo! server, not ours.
When the page is crawled, the spider grabs the first 100K or so of text. When a user requests the cached page, it goes back to the source to fill in what it needs to finish the page. This is an important point, as some people believe they can surf anonymously by viewing cached pages, yet the original server can still receive requests for those missing pieces.
The image is missing and the formatting is off because we changed our template after Yahoo! crawled this page. We no longer use that CSS file and image, so we never copied these files to our new server. You’ll note that our thumbnail images display since they still exist on our new server.
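To see why a cached page isn’t anonymous, it can help to list the servers a browser would still contact for images, scripts and style sheets. The sketch below is one way to do that in Python. The names ResourceCollector and hosts_contacted are invented for this example, and it assumes relative links resolve against the original page’s address, which you pass in yourself.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ResourceCollector(HTMLParser):
    """Gathers URLs of files the browser must fetch to finish the page."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url  # address of the original page (assumed known)
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag in ("img", "script"):
            url = attr_map.get("src")
        elif tag == "link":  # style sheets and similar linked files
            url = attr_map.get("href")
        else:
            url = None
        if url:
            # Turn relative paths such as /css/site.css into full URLs.
            self.resources.append(urljoin(self.page_url, url))


def hosts_contacted(cached_html, page_url):
    """Return the sorted host names that serve the page's extra files."""
    collector = ResourceCollector(page_url)
    collector.feed(cached_html)
    return sorted({urlparse(url).netloc for url in collector.resources})
```

If the list includes the original site’s host name, that server sees a request from your browser even though the text came from the search engine’s copy.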
Not All Cache Pages Are the Same
You might think that the cached pages from the search engines would be the same. In some instances they are, but for many sites the cached pages differ. As an example, look at Google’s cached page for the same article.
Google’s page looks complete since they indexed the page more recently. You can tell the last time it was crawled was around 7/14 based on the date stamp in the NEWS ALERT box.
With the exception of Ask.com and Google, the search engines don’t show the date of the cached page. There are some search engine tools that can assist. For example, there is a tool from We Build Pages that provides more specific cache times for Google.
What you see for a cached page can also differ by browser since Internet Explorer and Firefox can render the same page differently. In fact, we recently spotted a problem in our cached pages with Firefox. It turns out we had a coding error that allowed just the top half of our content to display in Google. Now, we just have to wait for the spiders to revisit and overwrite our mistake. Some days you just want a “do over” button.
So the next time you run into a situation where a web page no longer shows, try looking for a cached page. The page may not display perfectly, but there may be enough there to answer your question.
Last Updated (Sunday, 30 September 2012 14:13)