302 search engine exploit

I just read about the Page Hijack Exploit: 302, redirects and Google, which describes how the temporary redirect HTTP status code (302) can lead search engines to index web pages under the wrong URIs. In this post, I suggest an alternative solution to this problem.

An excerpt of the description of the problem:

  1. When both pages arrive at the reception of the "index" they are spotted by the "duplicate filter" as it is discovered that they are identical.
  2. The "duplicate filter" doesn't know that one of these pages is not a page but just a link. It has two URLs and identical content, so this is a piece of cake: Let the best page win. The other disappears.

As a solution, Claus suggests that search engines should treat temporary redirects across domains differently. This would probably work well enough, but a different and arguably cleaner solution is apparent from point 14 of the original description (the second point quoted above):

The problem lies in the step where duplicates are encountered: the search engine doesn't look at where the URIs came from before keeping one and throwing away the rest. If the search engine knows that one URI was fetched directly and another was a redirect (of any form), then the redirect URI should be thrown away, not the direct one. So the solution is to make some kind of provenance information available to the "duplicate filter", so that it can make a better-informed decision about which URIs to throw away.
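
To make the idea concrete, here is a minimal sketch in Python of a duplicate filter that takes provenance into account. It is only an illustration of the suggestion, not a description of how any real search engine works: the names IndexedPage, Provenance, pick_canonical and deduplicate, the content_hash field, and the score used for "let the best page win" are all assumptions made up for this example.

```python
from dataclasses import dataclass
from enum import Enum


class Provenance(Enum):
    DIRECT = "direct"      # the URI served the content itself
    REDIRECT = "redirect"  # the URI answered with a redirect of any form


@dataclass(frozen=True)
class IndexedPage:
    uri: str
    content_hash: str        # hypothetical stand-in for the duplicate test
    provenance: Provenance   # how the crawler reached the content
    score: float             # hypothetical "best page" ranking signal


def pick_canonical(duplicates):
    """Choose the one URI to keep from pages with identical content."""
    direct = [p for p in duplicates if p.provenance is Provenance.DIRECT]
    # Prefer directly fetched URIs. Fall back to the whole group only
    # when every URI in it was reached through a redirect.
    candidates = direct or list(duplicates)
    return max(candidates, key=lambda p: p.score)


def deduplicate(pages):
    """Group pages by content and keep one URI per group."""
    groups = {}
    for page in pages:
        groups.setdefault(page.content_hash, []).append(page)
    return [pick_canonical(group) for group in groups.values()]


pages = [
    IndexedPage("http://hijacker.example/go?id=1", "abc", Provenance.REDIRECT, 9.0),
    IndexedPage("http://victim.example/", "abc", Provenance.DIRECT, 1.0),
    IndexedPage("http://other.example/", "def", Provenance.DIRECT, 2.0),
]
kept = deduplicate(pages)
```

In this example the redirecting URI has the higher score, so a filter that only looked at the score would keep it and drop the real page. Because the direct URI is preferred within each group of duplicates, the real page survives instead.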

Posted at 1454 on Tue, Mar 15, 2005 in category Ideas
Comments

http://302-redirect-highjacking.netfirms.com/ You must read this page to learn more of the sinister details of this devastating Google bug.

Posted by: Carla at June 14, 2005 3:18 AM