Tuesday, October 11, 2005

Q and A: Does the dupe domain indexing dilemma really exist?

Dear Kalena...

I'm a new reader of your excellent blog and Search Light. Thank you for both.

The leadoff item in the current Search Light issue seems crazy to me. The idea that Google treats the same site with and without www in the URL as separate duplicate sites strikes me as absurd.

Think about it. Would Google allow such an obvious and widespread problem in its processes to exist without quickly correcting it? If they couldn't correct the problem they would at least publicly acknowledge it and publish a recommended solution for webmasters. Google's not going to allow a bug like that to adversely affect every web site that works with and without the www. Google's not going to force sites to compete with themselves for rank. I've seen several exclamatory posts about this alleged problem in forums and blogs in the past month, and I'm skeptical.

Could it be that some folks are misinterpreting their data, developing a theory about Google's inner workings, and passing it off as fact? I can't believe the problem exists as it has been portrayed.

I decided to see if my site's data fits with the allegations. My site works with and without the www. Since I had no idea how my host makes both addresses point to the same site, I used the "get-validated" tool listed at the top of "www is deprecated" (http://no-www.org/) to check for redirects. It says both my addresses are valid, neither address redirects to the other, and the HTML at the two addresses matches.

I performed the Google search using the "site:" prefix for each address as suggested in the Search Light article. The search results show that Google has 450 pages indexed for the non-www address and only 438 pages for the www address. According to the article, that means I have "Dupe Domain Indexing Dilemma" and I should add a 301 redirect to my .htaccess file.

However, both addresses have a Google page rank of 4. I have always used the www in my link submissions. Yahoo shows 151 back links to the www page and only 1 back link to the non-www page. Google shows zero back links to both addresses (but Google admittedly omits back links for many reasons).

Could a simple search show if Google has really indexed both my sites as separate sites? I performed a search for a unique phrase only found on one of my pages. There was only one result, and the address uses www. Doesn't that test show that Google is not indexing the non-www site as a separate site? I guess it could be that Google doesn't index one of the sites, but then it wouldn't show over 400 indexed pages for each site. It looks to me like Google knows they are the same site.

I'm a layman when it comes to these matters and I may be completely wrong, but common sense and a little analysis make me very skeptical regarding this problem as it's been presented.

Best Regards,

Jon


Kalena's Answer:

Dear Jon

It is not a bug in Google, merely a logical misinterpretation by Googlebot that two different sites exist.

It all depends on the site server configuration - if the server presents both www and non-www versions of a domain as stand-alone sites, there is more of a chance that Googlebot will index both. In some cases, Google will correctly determine that the sites are one and the same and choose to index one only. But as Dan pointed out in his blog post, unless you correctly set up your server configuration, you may not have a choice of which one Google chooses to index.

You said you had no idea how your host makes both versions of your site point to the same domain. I've checked, and it looks like your host has already set up your server correctly, as Google is only indexing and caching pages on the www version of your domain. If you have another look at Google search results for your site using the "site:" prefix without the www, you'll see that all pages listed actually begin with www.

This means that Googlebot has correctly interpreted your domain as including www and is not indexing a non-www version of your site. So relax, you don't suffer from DDID after all.
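
For readers whose server does present both versions as stand-alone sites, the 301 redirect might look roughly like the following in an .htaccess file. This is only a minimal sketch: it assumes an Apache server with mod_rewrite available and .htaccess overrides allowed, and example.com is a placeholder domain rather than anything taken from Jon's setup.

```apache
# Hypothetical example: example.com is a placeholder domain.
# Assumes Apache with mod_rewrite available and .htaccess overrides allowed.
RewriteEngine On

# Match requests whose Host header is the bare (non-www) domain.
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]

# Permanently (301) redirect to the same path on the www host.
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

With a rule like this in place, a request for http://example.com/privacy.htm should be answered with a 301 pointing to http://www.example.com/privacy.htm. On sites that use FrontPage extensions, editing .htaccess can interfere with the extensions, so it is worth testing the change carefully and keeping a backup of the original file.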

6 Comments:

At 1:47 PM , Anonymous said...

Ok - I just Googled one of my sites by searching "site:www.example.com" and "site:example.com" (without the quotes).

The first search yielded only two results, the www.example.com/ home page and the www.example.com/privacy.htm page for the privacy statement.

The second search produced seven results: the two listed above plus five others without the www. These show different page versions, even though I terminated and recreated the account and they should no longer exist on any server. The results were as follows:

example.com/termsofuse.htm
www.example.com/
example.com/exampleterms.htm
example.com/privacy.htm
www.example.com/privacy.htm
example.com/
example.com/exampleprivacy.htm

It appears, then, that I may have the problem written about here, right?

This website used to be on page 1 of Google for a long time. It also held the #1 spot for a considerable time for the most important key phrase.

It has floundered more or less “off the map” for many months now, except for a very, very brief period in which it appeared on page 1 again, and now has a page rank of 1.

The site was made with FrontPage XP. When I tried to modify the .htaccess file, the FrontPage extensions no longer worked, which I most definitely want to avoid.

So, the question is, what short snippet of text can I insert into the .htaccess file which will both eliminate the problem and allow the FP extensions to keep working properly? Why do multiple page versions still appear in Google even though it’s been almost four months since I deleted and recreated the account and loaded only one version after that? Thanks to anyone who has the real, true, and easy to implement answer(s).

 
At 1:57 PM , Kalena said...

To fix the FP probs, try this code:

# -FrontPage-

RewriteEngine on
Options +FollowSymlinks
Options +ExecCGI

 
At 10:26 AM , Anonymous said...

Thanks, Kalena. I tried the code, but only a long error message appears, including a statement to the effect that the FP extensions may not be installed on the server. When I remove the lines after the "# -FrontPage-" line, the FP extensions are reactivated.

 
At 12:26 PM , Anonymous said...

This evening I did some more research and discovered a suggestion that the code to remedy this needs to be placed in numerous .htaccess files and not just one. After placing the code in every .htaccess file I could find in File Manager, the FP extensions have been successfully restored. This is excellent; however, it is also very time consuming. If someone would write an application to automate this, that would be worth paying for.

 
At 12:30 PM , Anonymous said...

P.S. The additional research I did was based on a referral to another website that Kalena had given me elsewhere in the past about this matter, so thank you again, Kalena. Above all, thank God for this blog.

 
At 1:30 PM , Kalena said...

You're very welcome! Glad to hear you got it sorted.

 
