March 6, 2009

Preventing Duplicate Content With .htaccess File: Beginner’s Guide

This Friday SEO tip is provided by Fabio Ricotta, who works for the Brazilian company MestreSEO. Follow Fabio on Twitter.

1. Your domain is accessible both with and without www. E.g., mysite.com and www.mysite.com return the same content. This is a common problem, and you can solve it with a 301 redirect by adding this code to your .htaccess file:
Options +FollowSymlinks
RewriteEngine on
RewriteCond %{HTTP_HOST} ^mysite\.com$ [NC]
RewriteRule ^(.*)$ http://www.mysite.com/$1 [R=301,L]

Replace mysite.com and www.mysite.com with your domain name.
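
The condition above only catches requests for the bare mysite.com. A broader variant that is often used instead redirects every hostname other than the canonical one (for example an IP address, or an alias domain pointing at the same site). This is a sketch with www.mysite.com as a placeholder; be aware that it would also redirect any real sub-domain that shares this document root.

```apache
Options +FollowSymlinks
RewriteEngine on
# Any Host header that is not www.mysite.com (case-insensitive) gets redirected
RewriteCond %{HTTP_HOST} !^www\.mysite\.com$ [NC]
# Skip old HTTP/1.0 requests that send no Host header at all, which would otherwise loop
RewriteCond %{HTTP_HOST} !^$
# Keep the requested path when moving to the canonical host
RewriteRule (.*) http://www.mysite.com/$1 [R=301,L]
```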

2. Both www.mysite.com/index.php and www.mysite.com return the same content. You can solve this by adding this code to your .htaccess file:
Options +FollowSymlinks
RewriteEngine on
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(([^/]+/)*)index\.(html?|php[45]?|[aj]spx?)\ HTTP/
RewriteRule index\.(html?|php[45]?|[aj]spx?)$ http://www.mysite.com/%1 [R=301,L]

Replace mysite.com and www.mysite.com with your domain name.
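
If you use fixes 1 and 2 together, the order of the rules matters. In the sketch below (same placeholders), the index rule comes first, so a request for mysite.com/index.php goes straight to http://www.mysite.com/ in a single 301. With the host rule first, the visitor would be redirected twice: bare domain to www, then index file to folder.

```apache
Options +FollowSymlinks
RewriteEngine on

# 1) Direct client requests for index files go to the canonical www folder URL,
#    whatever hostname they arrived on. %1 is the folder path captured here.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(([^/]+/)*)index\.(html?|php[45]?|[aj]spx?)\ HTTP/
RewriteRule index\.(html?|php[45]?|[aj]spx?)$ http://www.mysite.com/%1 [R=301,L]

# 2) Everything else on the bare domain moves to www, keeping the path.
RewriteCond %{HTTP_HOST} ^mysite\.com$ [NC]
RewriteRule (.*) http://www.mysite.com/$1 [R=301,L]
```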

3. When you change your domain name, you should use a 301 redirect to preserve your PageRank (and rankings). You can do it by adding this code to the .htaccess file on the old domain:
Options +FollowSymlinks
RewriteEngine on
RewriteRule (.*) http://www.newdomain.com/$1 [R=301,L]

Replace newdomain.com with your new domain name.
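
The rule above redirects every request that reaches this .htaccess file. If the old and new domains point at the same hosting account (the same document root), it would also fire for requests to the new domain and redirect them to themselves forever. A sketch that limits the redirect to the old hostname, with olddomain.com as a placeholder:

```apache
Options +FollowSymlinks
RewriteEngine on
# Only act on the old hostname, with or without www (olddomain.com is a placeholder)
RewriteCond %{HTTP_HOST} ^(www\.)?olddomain\.com$ [NC]
# Send the same path to the new domain
RewriteRule (.*) http://www.newdomain.com/$1 [R=301,L]
```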

4. The last one is the best known. If you have changed a page's URL and want to 301 redirect it to the new URL, you can do it by adding this code to your .htaccess file:
Options +FollowSymlinks
RewriteEngine on
RewriteRule ^file\.php$ http://www.mysite.com/newfile.php [R=301,L]

Replace mysite.com and www.mysite.com with your domain name, and file.php and newfile.php with your own file names.
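
For a one-off page move like this, Apache's mod_alias module (usually available in a default Apache build) offers a shorter alternative that does not need mod_rewrite at all. A sketch with the same placeholder names; note that here the old path is written with its leading slash. It is commonly advised not to mix mod_alias redirects with mod_rewrite rules in the same file, since the two modules are processed at different stages and can interact in surprising ways.

```apache
# mod_alias: Redirect [status] old-URL-path new-URL
Redirect 301 /file.php http://www.mysite.com/newfile.php
```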

IMPORTANT: All the methods above work ONLY on Apache servers that have the mod_rewrite module enabled and allow .htaccess overrides (the usual setup on Linux hosting). I hope you find them useful.
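
If you are not sure mod_rewrite is enabled, one option is to wrap the rules in an IfModule block. This is a sketch using rule 1 as the example: on a server without mod_rewrite the block is skipped instead of every request failing with a 500 Internal Server Error, but the trade-off is that the redirects then silently do nothing, so check that they actually fire.

```apache
# Only read these directives if mod_rewrite is loaded
<IfModule mod_rewrite.c>
Options +FollowSymlinks
RewriteEngine on
RewriteCond %{HTTP_HOST} ^mysite\.com$ [NC]
RewriteRule ^(.*)$ http://www.mysite.com/$1 [R=301,L]
</IfModule>
```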

Anything to add? Please share your thoughts!

27 comments already

  1. eporrell (Eddie Porrello) on 12.31.1969 at 11:59 pm
    Reading: “Preventing Duplicate Content With .htaccess File: Beginner’s Guide” ( http://tinyurl.com/conpkq )

  2. juliansambles (Julian Sambles) on 12.31.1969 at 11:59 pm
    Today’s Daily SEO tip: http://tinyurl.com/conpkq

  3. brunonassar (brunonassar) on 12.31.1969 at 11:59 pm
    RT: @fabioricotta: My article is out on DailySEOtip: http://tinyurl.com/conpkq thanks to @seosmarty // Really good, bro! Congrats!

  4. fabioricotta (fabioricotta) on 12.31.1969 at 11:59 pm
    My article is out on DailySEOtip: http://tinyurl.com/conpkq thanks to @seosmarty

  5. the_gman (Gerald Weber) on 12.31.1969 at 11:59 pm
    Preventing Duplicate Content With .htaccess File: Beginner’s Guide http://is.gd/mJ7J

  6. Heron on 03.06.2009 at 2:59 pm
    Another good solution for people who don't have experience with .htaccess is the canonical tag.

    Heron’s last incredible blog post..Leandro Riolino - Search Engine Optimizer

    seobag Reply:

    The canonical tag has one shortcoming: it is not supported by all search bots. I think this must be important for Brazil.
    But maybe that's the bots' problem. :)

  7. Barry Welford on 03.06.2009 at 3:03 pm
    This is particularly important because you may be unaware of the ranking leakage that duplicate URLs cause. Verifying your website with Google Webmaster Tools gives the best possible alert on whether Google is having problems with it.

    Barry Welford’s last incredible blog post..The Best (and Worst) In Customer Service

  8. g1smd on 03.06.2009 at 4:01 pm
    You’re playing fast and loose with the mod_rewrite syntax, especially the capitalisation of reserved words. Be careful with that, to avoid future incompatibilities.

    Additionally, ^(.*)$ simplifies to (.*) here.

    Option 2 cannot possibly work as presented. RewriteRule cannot ‘see’ the domain name, only the URL path part. That code has a number of other problems. It will create an infinite loop when DirectoryIndex internally rewrites / to /index.php. The loop occurs because the rewritten pointer now matches the pattern for redirecting again. That can be solved by only invoking the redirect for direct client requests. Further, the target URL is missing a trailing /, so there will be a double redirect here to add it back on. Additionally, the code does not work in folders, only in the root. There’s a much better way to code this so that it works for all index names at any level of folder depth, preserves the folder path, and is only invoked for direct client requests.

    Every rule here should always have [L] on the end, otherwise you risk ‘strange behaviour’. Omitting it can cause you hours of messing about chasing strange bugs.

    Fabio Ricotta Reply:

    Hi g1smd,

    The 2nd rule worked for me. I thought the same as you (the loop problem), but it worked on my web server.

    Thanks for updating my views. I really recommend using [L] as you specified.

    g1smd Reply:

    There is no way that

    rewriterule ^mysite.com/index.php$ http://www.mysite.com [r=301,nc]

    could ever work.

    And what if it did work as coded there?
    It would redirect non-www index URLs to the www root, but what would it do for www index files? Nothing.

    RewriteRule cannot ‘see’ domain names. The rule does nothing.

    Ann Smarty Reply:

    Thanks a lot for your comment, g1smd. I’d appreciate it if you could post the full file contents where you believe the author missed something. I will then add that to the post (to ensure our readers see the correct version).

  9. Jonathan Georger on 03.06.2009 at 5:50 pm
    Great tips, Ann. One thing I’d like to add is creating a 301 redirect for dynamic pages, which can be a real pain to figure out. (Since I like this site so much, here is an example of how to 301 redirect a dynamic page to a static URL.)

    Ex. OLD - domain.com/old-domain.php?page=1
    NEW - domain.com/fresh-seo-url

    RewriteCond %{QUERY_STRING} ^page\=1$
    RewriteRule page.php /fresh-seo-url? [L,NC,R=301]

    Ann Smarty Reply:

    Not my tip, but thanks a lot anyway ;)

    g1smd Reply:

    That redirect will cause a redirection chain for a non-www request where the canonical domain is www (and vice versa).

    You should ALWAYS include the domain name in the target URL when you code up a redirect.

    Additionally, if there are other parameters after the wanted parameter, your use of $ at the end of the pattern ensures that the pattern does not match, and the rule will fail.
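
Applying both of g1smd's points to Jonathan's example gives something like the sketch below. It assumes the old script is old-domain.php, as in his example URL (his rule matches page.php instead), and that www.domain.com is the canonical hostname. The (^|&) and (&|$) groups match page=1 as a whole parameter anywhere in the query string, so extra parameters no longer stop the match, while page=10 still does not match. The trailing ? drops the whole old query string, including any other parameters.

```apache
Options +FollowSymlinks
RewriteEngine on
# page=1 as a complete parameter, at any position in the query string
RewriteCond %{QUERY_STRING} (^|&)page=1(&|$)
# Anchored, escaped file name; full canonical hostname in the target;
# the trailing ? stops the old query string being appended
RewriteRule ^old-domain\.php$ http://www.domain.com/fresh-seo-url? [R=301,L]
```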

  10. g1smd on 03.06.2009 at 6:45 pm
    Replace NC with L in every rule (that’s the rule, not the condition).

    Option 2 is completely broken. There is no way it can work. RewriteRule cannot ‘see’ the domain name.

    This should work for index files:

    RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(([^/]+/)*)index\.(html?|php[45]?|[aj]spx?)\ HTTP/

    RewriteRule index\.(html?|php[45]?|[aj]spx?)$ http://www.example.com/%1 [R=301,L]

    Add a question mark after the %1 if you also need to strip parameters.
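
As written, that condition only matches index requests that carry no query string, because the file extension must be followed directly by a space and HTTP/. A sketch of a variant that also matches requests with parameters and strips them, using www.mysite.com as a placeholder hostname:

```apache
Options +FollowSymlinks
RewriteEngine on
# The optional (\?[^\ ]*) group lets index requests with a query string match too;
# THE_REQUEST is the original request line, so internal DirectoryIndex rewrites cannot loop
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(([^/]+/)*)index\.(html?|php[45]?|[aj]spx?)(\?[^\ ]*)?\ HTTP/
# %1 is the captured folder path; the trailing ? drops the query string
RewriteRule index\.(html?|php[45]?|[aj]spx?)$ http://www.mysite.com/%1? [R=301,L]
```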

  11. Josh Millrod on 03.06.2009 at 6:56 pm
    It’s surprising how many people don’t realize that they may have 4 versions of their homepage available at all times:

    http://www.example.com/
    http://example.com/
    http://www.example.com/index.html
    http://example.com/index.html

    Kind of insane. Great tip.

    Josh Millrod’s last incredible blog post..SEO Headscratcher: Hidden Elements on Page

  12. malcolm coles on 03.06.2009 at 11:07 pm
    And if you use WordPress, also try to give categories etc. unique titles and meta descriptions: http://www.malcolmcoles.co.uk/blog/avoid-duplicate-meta-descriptions-in-pages-2-and-higher-of-the-wordpress-loop/

    malcolm coles’s last incredible blog post..Can you use rel = canonical to fix duplicate comment problems caused by comment pagination in wordpress?

  13. Gerald Weber on 03.07.2009 at 9:29 pm
    Ann,

    I have my site set to redirect http://www.sem-group.net to http://sem-group.net, and I have my preferred domain set accordingly in Google Webmaster Tools. Is there any reason, other than aesthetics, to prefer http://www. as the domain rather than plain http://?

    Gerald Weber’s last incredible blog post..Facebook’s BK Sacrifice App and 6 Other Examples of Brilliant Viral Campaigns

    g1smd Reply:

    Yes. Using [site:www.example.com] will show you all your wanted pages, [site:example.com] will show you all pages listed (from all sub-domains), and [site:example.com -inurl:www] will show you all the duplicates on the non-www root and non-www sub-domains that have been inadvertently picked up.

    Doing it the other way round (with your site on example.com), you just cannot do the same types of search and clearly see the results.

    My logic is this: you register example.com and then put up services like www.example.com, feeds.example.com, ftp.example.com, mail.example.com and so on. I also 301 redirect example.com to www.example.com for HTTP requests, and that redirect is site-wide and preserves the originally requested file path.
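
For readers who, like Gerald, prefer the bare domain, the mirror image of rule 1 in the article would look something like this sketch (example.com is a placeholder):

```apache
Options +FollowSymlinks
RewriteEngine on
# Requests for the www host are moved to the bare domain
RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
# Same path, bare-domain host, permanent redirect
RewriteRule (.*) http://example.com/$1 [R=301,L]
```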

  14. Bala Abirami on 03.09.2009 at 1:36 pm
    Great post, Ann. Really useful for beginners.

  15. Alysson on 03.11.2009 at 5:34 pm
    Most people don't realize that these issues can not only cause duplicate content problems (which Google claims don’t exist) but, just as importantly, can slaughter the link equity of a page. If a page has 100 links, but those 100 links come in via 4 different URLs, you’re really shooting yourself in the foot!

    I wrote a post about the issue not long ago: http://www.seoaly.com/canonical-url-issues-and-link-equity/

    Alysson’s last incredible blog post..SEOAly Makes the TopRank “BigList”

  16. Monsieur Hotels on 04.03.2009 at 12:09 pm
    It would be nice if you could publish the same code and tips, but for an IIS server…

    Just a thought... and a request ;)

    Tom

  17. Zafar Majid on 05.30.2009 at 3:54 pm
    Very useful.
    When I first set up my website, I was horrified when I read about the “canonical” issue, and I could have used this article then.

    Fortunately, I did find info on how to write an .htaccess file and have used it since to redirect old pages to new pages.

  18. techprism on 10.25.2009 at 8:57 am
    Nice tips. I find them useful.

    On bandwidth theft and image hotlinking, I wrote http://inforids.com/using-htaccess-to-prevent-images-hotlinking-saving-your-web-hosting-bandwidth/, which can be very beneficial.

  19. Mia on 02.18.2010 at 4:50 am
    It worked for me. Thanks!

  20. Price India on 03.17.2010 at 4:01 pm
    I have a very different problem. Google Webmaster Tools shows duplicate titles, with ?ftr=vidpgurl appended to my website's URLs. Please tell me the solution. Thanks!
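
One possible approach, offered as an untested sketch: if your pages never need the ftr=vidpgurl parameter, a 301 redirect to the same path without the query string consolidates those URLs. It assumes ftr=vidpgurl is the entire query string (so URLs carrying other parameters are left alone) and uses www.mysite.com as a placeholder hostname.

```apache
Options +FollowSymlinks
RewriteEngine on
# Only when the query string is exactly ftr=vidpgurl
RewriteCond %{QUERY_STRING} ^ftr=vidpgurl$
# Same path on the canonical host; the trailing ? removes the parameter
RewriteRule (.*) http://www.mysite.com/$1? [R=301,L]
```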
