Mar 10 2009

Crawl Your Site for Broken Links, Errors and Duplicate Content

One often overlooked part of the SEO mix is making sure that your site does not have broken outbound (or internal) links, which either lead to error pages or do not work at all. If your site delivers error pages or links to non-existent pages or files on your server, search engines like Google may treat it as being “under construction” and therefore not useful or relevant to the human user.

Your site can have all of the optimized content, titles and headers in the world, but if it is not functioning correctly, it will not rank to the best of its ability. Working on client SEO accounts, I have seen numerous cases where fixing broken internal and outbound links and making sure all of the files on the server load correctly boosted a site’s ranking from page 3 or 4 to page one.

My favorite tool for checking this information is Xenu’s Link Sleuth.

Xenu is usually the first step I take in on-site SEO research and identifying issues. I like to get these issues tackled from the beginning, and when working with an IT team on a client project, handing them a 10-page error report usually puts them in their place and shows them that you’re serious about your technical SEO. :)

Xenu’s Link Sleuth checks websites for broken links. Link verification is done on “normal” links, images, frames, plug-ins, backgrounds, local image maps, style sheets, scripts and Java applets. It displays a continuously updated list of URLs which you can sort by different criteria.

Xenu spiders a website in much the same way a search engine does and delivers a report which looks at:

  • Broken links on the site which send the spider to error messages
  • Duplicate content issues such as similar title tags and URL structure
  • Broken files such as images and multimedia content which are not loading correctly
  • Images which do not include alt attributes (which can be helpful to SEO)
  • Files and images which may affect page load time
  • Links to server-redirected pages (such as 301 redirects), which you can update on your site to point to the real page instead of the redirect
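
To make the first and last items of that list concrete, here is a minimal Python sketch of the general idea. It is not how Xenu works internally; it only assumes that a link checker collects the URLs on a page, requests each one without following redirects, and sorts the answers into working, redirected and broken. The names `LinkCollector`, `fetch_status`, `classify` and `link_report` are made up for this illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit
import http.client


class LinkCollector(HTMLParser):
    """Collects URLs from href/src attributes and notes images with no alt attribute."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.images_missing_alt = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        for name in ("href", "src"):
            value = attrs.get(name)
            # Skip in-page anchors and non-HTTP pseudo links.
            if value and not value.startswith(("#", "mailto:", "javascript:")):
                self.links.append(urljoin(self.base_url, value))
        if tag == "img" and "alt" not in attrs:
            self.images_missing_alt.append(urljoin(self.base_url, attrs.get("src") or ""))


def fetch_status(url, timeout=10):
    """Send a HEAD request without following redirects.

    Returns the HTTP status code, or None if the server could not be reached.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn_cls = http.client.HTTPSConnection
    else:
        conn_cls = http.client.HTTPConnection
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn = conn_cls(parts.netloc, timeout=timeout)
        conn.request("HEAD", path)
        status = conn.getresponse().status
        conn.close()
        return status
    except (OSError, http.client.HTTPException):
        return None


def classify(status):
    """Map a status code to the buckets a link report usually cares about."""
    if status is None:
        return "unreachable"
    if 200 <= status < 300:
        return "ok"
    if 300 <= status < 400:
        return "redirect"  # candidates for linking to the final page directly
    return "broken"        # 4xx and 5xx responses


def link_report(html, base_url, fetch=fetch_status):
    """Return ({url: bucket}, [images missing alt]) for one page of HTML."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    unique_links = dict.fromkeys(collector.links)  # de-duplicate, keep order
    report = {url: classify(fetch(url)) for url in unique_links}
    return report, collector.images_missing_alt
```

Calling `link_report(page_html, "http://example.com/")` on a page you have already downloaded returns one bucket per URL plus the list of images without an alt attribute. The `fetch` parameter is there so the network call can be swapped for a stub when trying the logic offline. Some servers answer HEAD requests differently from GET, so treat the result as a first pass rather than a verdict.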

In addition to checking links with Xenu’s Link Sleuth, I also recommend doing a basic duplicate content and header error diagnosis. Sure, if you are using Google Webmaster Tools, this can be done easily, but we don’t always have access to Google Webmaster Tools, especially when working on third-party sites or performing competitive research.

One free tool which can be used to check for common duplicate content issues is the Virante Duplicate Content tool.

Virante’s tool diagnoses the following:

  • The common www vs. non-www duplicate content issue, by checking the headers returned by both versions of the URL, the current cache in Google, and possible PageRank (PR) dispersion
  • The common default page error, where both / and /index.html (or another default page) return 200/OK headers
  • Incorrect 404 pages which deliver a 200/OK header, and
  • Supplemental pages in the Google index
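
The three header-based checks in that list can be approximated with a few HEAD requests. The Python sketch below is only an illustration of those checks, not Virante’s implementation, and it leaves out the Google cache, PR and supplemental index parts entirely. It assumes the site is reachable over plain http:// and that the default page is called index.html; `head_status` and `duplicate_content_checks` are hypothetical names.

```python
import http.client
import uuid
from urllib.parse import urlsplit


def head_status(url, timeout=10):
    """HEAD request over plain HTTP without following redirects.

    Returns the status code, or None if the request failed.
    """
    parts = urlsplit(url)
    try:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
        conn.request("HEAD", parts.path or "/")
        status = conn.getresponse().status
        conn.close()
        return status
    except (OSError, http.client.HTTPException):
        return None


def duplicate_content_checks(domain, fetch=head_status, default_page="index.html"):
    """Run three header checks for a bare host such as 'example.com'.

    Each value in the returned dict is True when the problem is present.
    """
    bare = f"http://{domain}/"
    www = f"http://www.{domain}/"
    # A random path that should not exist on any real site.
    missing = f"{bare}{uuid.uuid4().hex}"

    urls = (bare, www, bare + default_page, www + default_page, missing)
    statuses = {url: fetch(url) for url in urls}

    def ok(url):
        return statuses[url] == 200

    return {
        # Both hostnames serve the page instead of one redirecting to the other.
        "www_and_non_www_both_200": ok(bare) and ok(www),
        # The same page answers at / and at /index.html on at least one host.
        "root_and_default_page_both_200": any(
            ok(host) and ok(host + default_page) for host in (bare, www)
        ),
        # A page that does not exist still answers 200 instead of 404.
        "missing_page_returns_200": ok(missing),
    }
```

A site that redirects the bare domain and /index.html to one canonical URL, and answers 404 for unknown paths, should come back with all three values False. Any True value points at one of the header problems listed above.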

By using tools like these to identify errors on your website, you can improve your SEO and rankings, sometimes substantially, especially if critical errors are keeping your site from ranking properly in the search engines.

What are some of your favorite tools for checking broken links, files and duplicate content issues? Please feel free to share them in the comments (and get a do-follow link back to your site!)

88 Responses to “Crawl Your Site for Broken Links, Errors and Duplicate Content”

  1. tim viec says:

    nice tool, thanks so much for sharing goog job.

  2. nazcar says:

    thanks.. Xenu is a great tool

  3. Facebook Connect Developers says:

    Hey its really helpful for webmasters. Nice info. Thanks

  4. canlı maç izle says:

    it’s really good information but i want to know about the effects.

  5. sxe says:

    Thanks man

    it’s really good information but i want to know about the effects.

  6. ps2 ?????? ? ??????? says:

    Hallo everybody.

    Very creative ways to use Twitter.
    I´ll do it also.
    Greetings

  7. justin tv says:

    it’s really good information but i want to know about the effects.

  8. Karl Kelman says:

    Xenu is the best! I love the ability to export the data into Excel where I can manipulate it any way that I want.

  9. canli maç izleme says:

    thanks for nice informations. i liked your site too much

  10. spor says:

    it’s too useful

  11. Cs 1.5 Sxe Hack says:

    Great Thanks !!

  12. Gol Videolar says:

    YEs very good text.
    Forever site :)

  13. Gol Videolar? says:

    Hmm… I didn’t understand, but let’s try it..

  14. Wally says:

    This a great tool just what I need. I noticed the error 404 on my stats fro crawls but never knew what to do about it. This is great, now I can check for broken links. Thanks for this great post.

  15. scrape box forum says:

    yeha nice article dude

  16. iddaa tahminleri says:

    it’s really good information but i want to know about the effects.

  17. lideriz says:

    Multicast Wireless is a mission-based, cutting edge, progressive multimedia organization located in Huntsville, Alabama

  18. Misyon5ice says:

    This is great article. Thank you very much for this post.

  19. Michael Roberts says:

    Thanks for letting us know about Xenu. It’s already been a big help!

  20. Zaria says:

    Being a newborn in the world of SEO, and I find the info in this article easy to digest and beneficial. Thanks Hugo!

  21. Zaria says:

    Being a newborn in the world of SEO, and I find the info in this article easy to digest and beneficial. Thanks!

  22. Cathal says:

    This is a great tool. Sometimes I worry about my competitors finding this excellent site!

  23. Youtube.video says:

    All Indian T.v Serials Available And Watch Live World Cup 2011 Only On

    online desi videos,indian tv serials,dramas Watch Daily Episode

  24. Dekorasyon says:

    it’s too useful

  25. Sara Technologies pvt Ltd. says:

    Sara Technologies Pvt. Ltd is a global provider of high quality and cost-effective Information Technology Services. We offers a range of expertise aimed at helping customers re-engineer and re-invent their businesses to compete successfully in an ever-changing marketplace.

  26. Sara Technologies pvt Ltd. says:

    Sara Technologies Pvt. Ltd is a global provider of high quality and cost-effective Information Technology Services.

  27. Kevin Roberts says:

    Being a newborn in the world of SEO, I find the info in this article easy to digest and beneficial, thanks.

  28. justin tv says:

    I agree “Sara Technologies Pvt. Ltd is a global provider of high quality and cost-effective Information Technology Services. We offers a range of expertise aimed at helping customers re-engineer and re-invent their businesses to compete successfully in an ever-changing marketplace.”
