Preventing Duplicate Content With .htaccess File: Beginner’s Guide
Friday SEO tip is provided by Fabio Ricotta who works for the Brazilian company MestreSEO. Follow Fabio on Twitter.
1. You can access your domain with and without www. For example, mysite.com and www.mysite.com return the same content. This is a common problem, and you can solve it by using this code in your .htaccess file:
Options +FollowSymlinks
RewriteEngine On
RewriteCond %{HTTP_HOST} ^mysite\.com$ [NC]
RewriteRule ^(.*)$ http://www.mysite.com/$1 [R=301,L]
Replace mysite.com and www.mysite.com with your domain name.
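If you would rather keep the bare domain as the canonical one, the same idea can be turned around. The following is only a sketch under that assumption: it reuses the placeholder mysite.com from the example above and has not been tested against any particular hosting setup.

```apache
Options +FollowSymlinks
RewriteEngine On
# Only requests whose Host header is www.mysite.com are redirected
RewriteCond %{HTTP_HOST} ^www\.mysite\.com$ [NC]
# 301 to the bare domain, keeping the originally requested path
RewriteRule ^(.*)$ http://mysite.com/$1 [R=301,L]
```

Use one direction or the other, not both, otherwise the two rules would redirect to each other indefinitely.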
2. Both www.mysite.com/index.php and www.mysite.com return the same page. You can solve this by using this code in your .htaccess file:
Options +FollowSymlinks
RewriteEngine On
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(([^/]+/)*)index\.(html?|php[45]?|[aj]spx?)\ HTTP/
RewriteRule index\.(html?|php[45]?|[aj]spx?)$ http://www.mysite.com/%1 [R=301,L]
Replace mysite.com and www.mysite.com with your domain name.
3. When you need to change your domain name, you should use a 301 redirect to preserve your PageRank (and rankings). You can do it by using this code in the .htaccess file of your old domain:
Options +FollowSymlinks
RewriteEngine On
RewriteRule (.*) http://www.newdomain.com/$1 [R=301,L]
Replace newdomain.com with your new domain name.
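One caveat: if the old and the new domain are served from the same folder (and therefore share one .htaccess file), an unconditional rule would also match requests for the new domain and loop. A possible safeguard, sketched here with the hypothetical name olddomain.com standing in for the domain you are leaving, is to restrict the rule to the old host name:

```apache
Options +FollowSymlinks
RewriteEngine On
# Redirect only when the request arrived on the old domain (with or without www)
RewriteCond %{HTTP_HOST} ^(www\.)?olddomain\.com$ [NC]
# Send the visitor to the same path on the new domain
RewriteRule (.*) http://www.newdomain.com/$1 [R=301,L]
```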
4. The last one is the most well-known. If you changed a page URL, and want to 301 it to a new URL, you can do it using this code in your .htaccess file:
Options +FollowSymlinks
RewriteEngine On
RewriteRule ^file\.php$ http://www.mysite.com/newfile.php [R=301,L]
Replace www.mysite.com with your domain name, and file.php and newfile.php with your file names.
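For a single moved page, Apache's mod_alias module offers a shorter alternative to the rewrite rule. This is a different technique from the one used in the rest of this post, shown only as a sketch with the same placeholder names:

```apache
# mod_alias alternative to the RewriteRule above; no RewriteEngine line is needed
Redirect 301 /file.php http://www.mysite.com/newfile.php
```

mod_alias and mod_rewrite are processed independently of each other, so it is probably simplest to stick to one of them within a given .htaccess file.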
IMPORTANT: All the methods mentioned above work ONLY on Apache servers (typically Linux hosting) that have the mod_rewrite module enabled. I hope you find this useful.
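If you are not sure whether mod_rewrite is enabled on your host, one common precaution is to wrap the rules in an IfModule block. The sketch below reuses the rule from tip 1 with the placeholder mysite.com. Without the wrapper, a server lacking mod_rewrite would normally answer every request with a 500 error. With the wrapper, the rules are skipped silently, which also means the redirect simply does not happen.

```apache
<IfModule mod_rewrite.c>
    RewriteEngine On
    # Same www redirect as in tip 1, applied only if mod_rewrite is loaded
    RewriteCond %{HTTP_HOST} ^mysite\.com$ [NC]
    RewriteRule ^(.*)$ http://www.mysite.com/$1 [R=301,L]
</IfModule>
```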
Anything to add? Please share your thoughts!
32 Responses to “Preventing Duplicate Content With .htaccess File: Beginner’s Guide”
Reading: “Preventing Duplicate Content With .htaccess File: Beginner’s Guide” ( http://tinyurl.com/conpkq )
Today’s Daily SEO tip: http://tinyurl.com/conpkq
RT: @fabioricotta: My article is out on DailySEOtip: http://tinyurl.com/conpkq thanks to @seosmarty // very good, bro! congrats!
My article is out on DailySEOtip: http://tinyurl.com/conpkq thanks to @seosmarty
Preventing Duplicate Content With .htaccess File: Beginner’s Guide http://is.gd/mJ7J
Another good solution for people who do not have experience with .htaccess is to use the canonical tag.
Heron’s last incredible blog post..Leandro Riolino – Search Engine Optimizer
seobag Reply:
March 6th, 2009 at 6:32 pm
The canonical tag has one shortcoming: it is not supported by all search bots. I think this must be important for Brazil.
But maybe that is the bots’ problem.
This is particularly important since you may be unaware of the leakage in search ranking that results from duplicate URLs. Using Google Webmaster Tools and verifying your website with them gives the best possible alert on whether Google has problems or not.
Barry Welford’s last incredible blog post..The Best (and Worst) In Customer Service
You’re playing fast and loose with the Mod_Rewrite syntax, especially capitalisation of reserved words. Be careful with that, to avoid future incompatibilities.
Additionally, ^(.*)$ simplifies to (.*) here.
Option 2 cannot possibly work as presented. RewriteRule cannot ‘see’ the domain name, only the URL path part. That code has a number of other problems. It will create an infinite loop, when DirectoryIndex internally rewrites / to /index.php. The loop occurs because the rewritten pointer now matches the pattern for redirecting again. That can be solved by only invoking the redirect for direct client requests. Further, the target URL is missing a trailing / so there will be a double redirect here to add it back on. Additionally, the code does not work in folders, only in the root. There’s a much better way to code this so that it works for all index names at any level of folder depth, preserves the folder path, and is only invoked for direct client requests.
Every rule here should always have [L] on the end, otherwise you risk ‘strange behaviour’. Omitting it can cause you hours of messing about chasing strange bugs.
Fabio Ricotta Reply:
March 6th, 2009 at 4:24 pm
Hi g1smd,
The 2nd rule worked for me. I had the same concern as you (the loop problem), but it worked on my web server.
Thanks for improving my understanding. I really recommend using [L] as you specified.
g1smd Reply:
March 6th, 2009 at 6:53 pm
There is no way that
rewriterule ^mysite.com/index.php$ http://www.mysite.com [r=301,nc]
could ever work.
.
What if it did work as coded there?
It would redirect non-www index URLs to www root, but what would it do for www index files? Nothing.
.
RewriteRule cannot ‘see’ domain names. The rule does nothing.
Ann Smarty Reply:
March 6th, 2009 at 6:33 pm
Thanks a lot for your comment, G1smd. I’d appreciate it if you add the full file content where you believe the author missed anything. I will add that to the post then (to ensure our readers see the correct version).
Great tips Anne. One thing I’d like to add is creating a 301 redirect for dynamic pages – this can be a real pain to figure out. (since I like this site so much, here is an example of how to 301 redirect a dynamic page to a static url)
Ex. old – domain.com/old-domain.php?page=1
NEW – domain.com/fresh-seo-url
RewriteCond %{QUERY_STRING} ^page\=1$
RewriteRule page.php /fresh-seo-url? [L,NC,R=301]
Ann Smarty Reply:
March 6th, 2009 at 6:40 pm
Not my tip, but anyway, thanks a lot
g1smd Reply:
March 6th, 2009 at 6:56 pm
That redirect will cause a Redirection Chain for a non-www request where the canonical domain is www (and vice versa).
.
You should ALWAYS include the domain name in the target URL when you code up a redirect.
.
Additionally, if there are other parameters after the wanted parameter, your use of $ on the end of the pattern will ensure that your pattern does not match and the rule will fail.
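Putting these three points together, the dynamic-page redirect might look something like the sketch below. It assumes that www.domain.com is the canonical host name and that the old script is page.php, as in the rule above. Both are placeholders.

```apache
RewriteEngine On
# Match page=1 as a complete parameter, whether or not other parameters surround it
RewriteCond %{QUERY_STRING} (^|&)page=1(&|$)
# Full canonical URL in the target avoids a second redirect; the trailing ? drops the old query string
RewriteRule ^page\.php$ http://www.domain.com/fresh-seo-url? [R=301,L]
```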
Replace NC with L in every rule (that’s the rule not the condition).
Option 2 is completely broken. There is no way it can work. RewriteRule cannot ‘see’ the domain name.
This should work for index files:
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(([^/]+/)*)index\.(html?|php[45]?|[aj]spx?)\ HTTP/
RewriteRule index\.(html?|php[45]?|[aj]spx?)$ http://www.example.com/%1 [R=301,L]
Add a question mark after the %1 if you also need to strip parameters.
rasamassen Reply:
July 26th, 2010 at 9:18 pm
This works great, except it doesn’t redirect http://www.example.com/index.php?a=1 to http://www.example.com/?a=1 How would you make it also do that?
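The likely reason is that the condition expects a space directly after the file extension, so a request line such as GET /index.php?a=1 HTTP/1.1 never matches it. One possible adjustment, offered as a sketch rather than a tested fix, is to allow an optional query string before the space:

```apache
RewriteEngine On
# (\?[^\ ]*)? allows an optional query string between the index file name and " HTTP/"
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(([^/]+/)*)index\.(html?|php[45]?|[aj]spx?)(\?[^\ ]*)?\ HTTP/
# The target has no query string of its own, so mod_rewrite passes the original one through
RewriteRule index\.(html?|php[45]?|[aj]spx?)$ http://www.example.com/%1 [R=301,L]
```

With this pattern, adding a question mark after %1 would strip the parameters instead of keeping them.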
It’s surprising how many people don’t realize that they may have 4 versions of their homepage available at all times.
www.example.com
example.com
www.example.com/index.html
example.com/index.html
Kind of insane. Great tip.
Josh Millrod’s last incredible blog post..SEO Headscratcher: Hidden Elements on Page
And if you use wordpress, also attempt to make categories etc have unique titles and meta descriptions: http://www.malcolmcoles.co.uk/blog/avoid-duplicate-meta-descriptions-in-pages-2-and-higher-of-the-wordpress-loop/
malcolm coles’s last incredible blog post..Can you use rel = canonical to fix duplicate comment problems caused by comment pagination in wordpress?
Ann,
I have my site set to redirect www.sem-group.net to http://sem-group.net and I have my preferred domain also set accordingly in Google webmaster tools. Is there any reason one would prefer www. as the domain rather than http: other than aesthetics?
Gerald Weber’s last incredible blog post..Facebook’s BK Sacrifice App and 6 Other Examples of Brilliant Viral Campaigns
g1smd Reply:
March 11th, 2009 at 10:02 am
Yes, using [site:www.example.com] will show you all your wanted pages, and [site:example.com] will show you all pages (from all sub-domains) listed, and [site:example.com -inurl:www] will show you all the duplicates on non-www root and non-www sub-domains that have been inadvertently picked up.
.
Doing it the other way round (with your site on example.com) you just cannot do the same types of search, and clearly see the results.
.
My logic is this. You register example.com and then put up services like www.example.com, feeds.example.com, ftp.example.com, mail.example.com and so on. I also 301 redirect example.com to www.example.com for HTTP requests, and that redirect is site-wide and preserves the originally requested filepath.
Great Post Ann. Really useful for beginners.
Most people neglect to realize that not only can these issues cause duplicate content problems – which Google claims don’t exist – but, just as importantly, can slaughter the link equity of a page. If a page has 100 links, but those 100 links come in via 4 different URLs you’re really shooting yourself in the foot!
I wrote a post about the issue not long ago: http://www.seoaly.com/canonical-url-issues-and-link-equity/
Alysson’s last incredible blog post..SEOAly Makes the TopRank “BigList”
It would be nice if you could publish the same code and tips for an IIS server…
Just a thought, and a request.
Tom
Very useful.
When I first set up my website, I was horrified when I read about the “canonical” issue and could have used this article then.
Fortunately I did find info on how to write an .htaccess file and have used it since to redirect old pages to new pages.
Nice tips. I find them useful.
For bandwidth theft & image hotlinking I wrote http://inforids.com/using-htaccess-to-prevent-images-hotlinking-saving-your-web-hosting-bandwidth/, that can be very beneficial.
It worked for me Thanks!
I have a very different problem. My webmaster tools report duplicate titles because ?ftr=vidpgurl is being added to the URLs of my website. Please tell me the solution. Thanks!
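One way to approach a stray parameter like this, sketched under the assumptions that ftr=vidpgurl is the only parameter on the affected URLs and that www.example.com stands in for the real domain, is to redirect those URLs to the same path without the query string:

```apache
RewriteEngine On
# Fires only when ftr=vidpgurl is the entire query string
RewriteCond %{QUERY_STRING} ^ftr=vidpgurl$
# The trailing ? on the target drops the query string from the redirected URL
RewriteRule (.*) http://www.example.com/$1? [R=301,L]
```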
I liked the tip Josh Millrod.
Thanks.
I like these tips. They are very helpful for SEOs and beginners.
I loved your tips. I’ll put them into practice.
Congratulations!
Thanks.
I wish there were an .htaccess-style option for Windows shared hosting as well.