Mar 10 2009

Crawl Your Site for Broken Links, Errors and Duplicate Content

One often-overlooked part of the SEO mix is making sure that your site has no broken outbound or internal links, whether they lead to error pages or do not work at all. If your site delivers error pages or links to non-existent pages or files on your server, search engines like Google may treat your site as being “under construction” and therefore not useful or relevant to the human user.

Your site can have all of the optimized content, titles and headers in the world, but if it is not functioning correctly, it will not rank to the best of its ability. Working on client SEO accounts, I have run into numerous cases where fixing broken internal links and outbound links, and making sure all of the files on the server are working, boosted a site’s ranking from page 3 or 4 to page one.

My favorite tool for checking this information is Xenu’s Link Sleuth.

Xenu is usually the first step I take in on-site SEO research and identifying issues. I like to get these issues tackled from the beginning, and when working with an IT team on a client project, handing them a 10-page error report usually puts them in their place and shows them that you’re serious about your technical SEO. :)

Xenu’s Link Sleuth checks Web sites for broken links. Link verification is done on “normal” links, images, frames, plug-ins, backgrounds, local image maps, style sheets, scripts and Java applets. It displays a continuously updated list of URLs which you can sort by different criteria.

Xenu spiders a website in much the same way a search engine does and delivers a report which covers:

  • Broken links on the site which send the spider to error messages
  • Duplicate content issues such as similar title tags and URL structure
  • Broken files such as images and multimedia content which are not loading correctly
  • Images which do not include alt attributes (which can be helpful to SEO)
  • Files and images which may affect page load time
  • Links to server-redirected pages (such as 301 redirects), which you can change on your site to point to the real page instead of the redirect
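
To make the checks in this list concrete, here is a minimal Python sketch of a single-page link check. It is not how Xenu itself works; it is only an illustration using the Python standard library. The names LinkExtractor, fetch_status, classify and report are made up for this example, and it only looks at href and src attributes on one page rather than spidering a whole site.

```python
from html.parser import HTMLParser
from urllib.error import HTTPError, URLError
from urllib.parse import urldefrag, urljoin
from urllib.request import HTTPRedirectHandler, Request, build_opener


class LinkExtractor(HTMLParser):
    """Collects http(s) targets of href/src attributes and notes images without alt text."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.images_missing_alt = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        target = attrs.get("href") or attrs.get("src")
        if not target:
            return
        # Resolve relative URLs and drop the #fragment part.
        absolute = urldefrag(urljoin(self.base_url, target)).url
        if not absolute.startswith(("http://", "https://")):
            return  # skip mailto:, javascript:, etc.
        self.links.append(absolute)
        if tag == "img" and not attrs.get("alt"):
            self.images_missing_alt.append(absolute)


class NoRedirect(HTTPRedirectHandler):
    """Stops urllib from following redirects so that 301/302 responses are reported."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def fetch_status(url, timeout=10):
    """Returns the HTTP status code for a HEAD request, or None if the host is unreachable."""
    opener = build_opener(NoRedirect)
    try:
        with opener.open(Request(url, method="HEAD"), timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        return err.code  # 3xx, 4xx and 5xx responses end up here
    except (URLError, OSError):
        return None


def classify(status):
    """Maps a status code to one of the report categories used in this sketch."""
    if status is None:
        return "unreachable"
    if 200 <= status < 300:
        return "ok"
    if 300 <= status < 400:
        return "redirect"
    return "broken"


def report(html, base_url, fetch=fetch_status):
    """Checks every distinct link found in html; returns (results, images_missing_alt)."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    results = {url: classify(fetch(url)) for url in dict.fromkeys(parser.links)}
    return results, parser.images_missing_alt
```

The fetch argument is there so the check can be tried against canned status codes before pointing it at a live site. Some servers answer HEAD requests differently from GET requests, so treat the results as a first pass rather than a final verdict.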

In addition to checking links with Xenu’s Link Sleuth, I also recommend doing a basic duplicate content and header error diagnosis. Sure, if you are using Google Webmaster Tools, this can be done easily, but we don’t always have access to Google Webmaster Tools, especially when working on third-party sites or performing competitive research.

One free tool which can be used to check for common duplicate content issues is the Virante Duplicate Content tool.

Virante’s tool diagnoses the following:

  • The common www vs. non-www duplicate content issue, by checking the headers returned by both versions of the URL, the current cache in Google, and possible PR dispersion
  • The common default page error, where both / and /index.html (or another default page) return 200/OK headers
  • Incorrect 404 pages which deliver a 200/OK header
  • Supplemental pages in the Google index
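
The three header-based checks in this list can be approximated with a few requests. The following is a rough Python sketch, not Virante’s implementation; the function names (www_variants, head_status, diagnose), the default page name and the made-up path used to provoke a 404 are all assumptions for illustration. It does not cover the Google cache, PR dispersion or supplemental index checks, which need data from Google.

```python
import http.client
from urllib.parse import urlsplit, urlunsplit


def www_variants(url):
    """Returns the (non-www, www) versions of a URL."""
    parts = urlsplit(url)
    host = parts.netloc
    bare = host[4:] if host.startswith("www.") else host
    return (
        urlunsplit(parts._replace(netloc=bare)),
        urlunsplit(parts._replace(netloc="www." + bare)),
    )


def head_status(url, timeout=10):
    """Sends one HEAD request and returns the raw status code.

    http.client does not follow redirects, so a 301 is reported as 301.
    Connection errors are not caught here and will raise.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        conn.request("HEAD", parts.path or "/")
        return conn.getresponse().status
    finally:
        conn.close()


def diagnose(root_url, fetch=head_status, default_page="index.html",
             bogus_path="no-such-page-404-check"):
    """Returns a list of likely duplicate content and header problems for a site root."""
    issues = []

    # www vs. non-www: ideally one version redirects to the other.
    bare, www = www_variants(root_url)
    if fetch(bare) == 200 and fetch(www) == 200:
        issues.append("www and non-www both return 200")

    # Default page: / and /index.html should not both answer 200.
    root = root_url.rstrip("/") + "/"
    if fetch(root) == 200 and fetch(root + default_page) == 200:
        issues.append(f"/ and /{default_page} both return 200")

    # Incorrect 404: a page that does not exist should not answer 200.
    if fetch(root + bogus_path) == 200:
        issues.append("missing page returns 200 instead of 404")

    return issues
```

Calling diagnose("http://www.example.com/") against a real site performs live HEAD requests; passing your own fetch function makes it possible to try the logic against canned status codes first.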

By using tools like these to identify errors on your website, you can enhance your SEO and rankings substantially, especially if you have any critical errors which are keeping your site from ranking properly in the search engines.

What are some of your favorite tools for checking broken links, files and duplicate content issues? Please feel free to share them in the comments (and get a do-follow link back to your site!)

88 Responses to “Crawl Your Site for Broken Links, Errors and Duplicate Content”

  1. Tools for broken links on your site and more (with complete URL this time): http://is.gd/nfqJ

    Avila Reply:

    Giving good results. Sometimes giving weird results.

  2. Crawl Your Site for Broken Links, Errors and Duplicate Content http://is.gd/nfqJ

  3. RT @webaddict @Hakicoma: #Crawl Your Site for Broken #Links, Errors and #Duplicate #Content http://tinyurl.com/bg83jp #webmaster #blogger

  4. RT @webaddict RT @Hakicoma: #Crawl Your Site for Broken #Links, Errors and #Duplicate #Content http://tinyurl.com/bg83jp #webmaster #blogger

  5. RT @Hakicoma: #Crawl Your Site for Broken #Links, Errors and #Duplicate #Content http://tinyurl.com/bg83jp #webmaster #blogger

  6. Crawl Your Site for Broken Links, Errors and Duplicate Content | Daily SEO Tip http://tinyurl.com/bg83jp

  7. RT @RandomReTweet: RT @Hakicoma Crawl Your Site for Broken Links, Errors and Duplicate Content http://tinyurl.com/bg83jp

  8. RT @Hakicoma Crawl Your Site for Broken Links, Errors and Duplicate Content http://tinyurl.com/bg83jp

  9. Crawl Your Site for Broken Links, Errors and Duplicate Content http://tinyurl.com/bg83jp

  10. RT @TyDowning: When is the last time you checked your site for broken links? http://tinyurl.com/bg83jp

  11. RT @TyDowning: When is the last time you checked your site for broken links? http://tinyurl.com/bg83jp

  12. When is the last time you checked your site for broken links? http://tinyurl.com/bg83jp

  13. Checking your site for broken links: http://tinyurl.com/bg83jp Xenu has always been my personal favorite

  14. Crawl Your Site for Broken Links, Errors and Duplicate Content http://is.gd/nfqJ

  15. Barry Welford says:

    Yes, Xenu does the trick. Excellent tool.

    To know that you need it, do a regular review of your website via Google Webmaster Tools. It will warn you if you have some of these broken links, incorrect redirects, etc.

    Barry Welford’s last incredible blog post..Since Google Is In Mountain View, CA, Is It A Guru?

    Loren Baker Reply:

    True Barry, I like using various resources to get a diverse report on the errors and issues with a site, sometimes one will find many more issues than another.

  16. Steen Ohman says:

    XENU is the best tool I have found for broken links. But must say Webmaster tools are more user friendly for the average webmaster.

    Still it’s a very overlooked topic … keeping your site clean from broken links and duplicate content.

    Normally I review the indexed pages in Google once a month to check for duplicate content – and to check the quality of the snippets/meta description.

    Steen Ohman’s last incredible blog post..Danish Site Search!

  17. Kevin Boss says:

    I use XENU on a daily basis. Hands down one of the greatest tools in my arsenal.

    Duane Brown Reply:

    Isn’t checking it on a daily basis overkill?

    Duane Brown’s last incredible blog post..Skittles + Social Media = New Power Couple

  18. Duane Brown says:

    Thanks for the tip and free tool. I’m going to use it this week, I hope, to check my own sites first and then bookmark it for future client work one day.

    Duane Brown’s last incredible blog post..Skittles Scandal: Tracking the Skittles Experiment [Friday March 6th at 11am]

  19. Josh Millrod says:

    SEOmoz also has a great crawl tool. It’s one of the only free ones they offer, but it’s super solid.

    http://www.seomoz.org/crawl-test

    Josh Millrod’s last incredible blog post..The 4 Rules of SEO for Online Journalism

  20. Junaid Ahmed says:

    Thank you so much for this great tool, I’ll see if I can use it on my mac if not then will have to use the windows based tool.

    Junaid Ahmed’s last incredible blog post..Twitter Power

  21. mihai says:

    Incredible tool. But what can I use to check my position in Google? I’ve tried Advanced Web Ranking but it’s too expensive (about $600). What can I use? Please, I invite you to email me and give me a good solution. On another note, great blog, interesting posts.

  22. Josh Millrod says:

    SEOmoz also has a pretty great rank check tool. I swear I don’t work for them (I only dream about it). I just dig the tools.

    Josh Millrod’s last incredible blog post..The 4 Rules of SEO for Online Journalism

    Loren Baker Reply:

    LOL, you can plug SEOmoz all you want, we love them!

  23. Steen Öhman says:

    For ranking checks I use SEOmoz and IBP – International Business Promoter (don’t know if a link is OK here – so find it on Google).

    SEOmoz has some good tools, but they are not very strong if you want to dig down into the different European markets.

    But … as always, have more than one tool in the box, and use the best one for the job.

    Steen Öhman’s last incredible blog post..Media status marts 2009

  24. Dana Lookadoo says:

    A vote for Xenu, Google Webmaster Central and SEOmoz tools.

    Use Google Analytics to track 404 error pages, which can result from broken links. Info for setting up tracking:
    http://www.google.com/support/analytics/bin/answer.py?hl=en&answer=86927

    @mihai, @Steen Ranking varies based on demographics, keywords, and history of your Gmail account if searching while logged in. Searching Google is free, and you can use a proxy server to check ranking from various IPs. (Wouldn’t spend too much time on ranking reports, IMHO.)

    @Loren, many thanks for Virante tip for duplicate content!

    Dana Lookadoo’s last incredible blog post..Twitter Engages Professional Cycling Fans

    Steen Öhman Reply:

    True .. forgot to mention … you have to log off from Google before checking rankings. It can make quite a difference.

    A good tool will automatically do this … so your ranking results are not biased by your profile.

    But remember … your client could be logged in to a Google account .. and will then .. maybe get another ranking!

    Steen Öhman’s last incredible blog post..Media status marts 2009

    Dana Lookadoo Reply:

    Correction: “demographics” should have been “geographics”

    Dana Lookadoo’s last incredible blog post..Twitter Engages Professional Cycling Fans

    Loren Baker Reply:

    No problem Dana! Thanks for your feedback and additions :)

  25. Robert Publer says:

    Good review, I’m going to use it this week, I hope, to check my own sites first and then bookmark it for future client work one day.

  26. BG Mahesh says:

    Linkscan from elsop.com is a great tool but it is not free.

    The other tool we have developed works on the Google Webmaster Tools error reports, which I believe you will find very useful. Please see http://www.greynium.com/tools/gwt/

    BG Mahesh
    http://www.greynium.com

    BG Mahesh’s last incredible blog post..Is the naming industry in India non-existent?

    Loren Baker Reply:

    Thanks BG, here’s the direct link to Elsop : http://www.elsop.com/

  27. Mahesh says:

    Thanx for the tool.. Trying it now!

  28. iCan't Internet says:

    Excellent tools that do the job!
    These are links that every webmaster should have bookmarked…

    iCan’t Internet’s last incredible blog post..Advertising on Facebook

  29. Matt Evans says:

    Xenu is a great tool I’ve used for years. Even when I worked for a top SEM Agency it was a tool we used everyday.

    Matt Evans’s last incredible blog post..Optimizing Local Business Listings Online

  30. Lisa says:

    XENU for broken links, definitely. For duplicate content I like GSiteCrawler.

  31. Lohith says:

    Thanks for this new duplicate content tool.

  32. komikya says:

    i like it.))

    Lig tv izle

    komikya’s last incredible blog post..Lig tv izle

  33. Nate at Plasticprinters says:

    I love Xenu! It’s a great tool, can’t say enough about it.

  34. AndyW says:

    I use SEOmoz Crawl Test although it is part of their paid package

    AndyW’s last incredible blog post..Review of ezBusinessNeeds DoFollow Searcher Tool

  35. Online Internet Faxing says:

    Great post. I so love Xenu both as an SEO tool and for web development, so that I can check that all the original pages from a website have been included in the new one. Cheers.

  36. Bill Cook says:

    copyscape.com is a great tool for checking dup content. I also find it useful when reviewing copy that has been outsourced.

    Bill Cook’s last incredible blog post..The American Sugar Alliance Sweetens its Presence on the Web

  37. Glenn Crocker says:

    I use Xenu and Web CEO for this, but recently started using a neat tool from Microsys Tools called A1 Website Analyzer. It will spider sites and do much of the same tasks as the others, but will also export title/meta-description (handy for cleaning problems up and sending to IT folks to implement). But the best thing it has is an on-site PageRank simulator that helps identify problems with link structure. Neat stuff!

    Here’s a recent article I wrote about using it for PageRank Sculpting.

    Glenn Crocker’s last incredible blog post..PageRank Sculpting: Why Your Home Page has Low PR

  38. Nick Stamoulis says:

    You should always take the time to go over your website and make sure all your links are working. It is always the worst when a client brings it to your attention that you have a broken link.

    Nick Stamoulis’s last incredible blog post..Interest-Targeted Content Network Advertising

  39. Matt says:

    here you go folks .. try our broken link checker.

    enjoy

  40. Anish K.S says:

    Xenu is good; I’m using the W3C Link Checker.

  41. nette says:

    SEO Experts, here’s something very interesting and helpful
    http://www.thesiteflingmafia.com/

  42. ligtv izle says:

    thank you mübarek

    ligtv izle’s last incredible blog post..Ankaraspor Fenerbahçe Maçı Saat 20-00’da Canlı Yayınlanacaktır..

  43. Oliver says:

    This looks like a really good tool and something I will have to try out soon to check the sites that I put live. Thanks for some great information, this will be handy.

  44. Matt says:

    nice post – you may also want to try our broken link check tool, it’s free!

    thanks
    Matt

    Matt’s last incredible blog post..Link Building – The Fundamentals

  45. Submitter says:

    Indeed, using a link scanner is a good idea to scan for broken links. Personally I prefer the more passive approach.

    I start an HTTP sniffer and then go through the web pages of my sites. I click through links and when I am done, I pause the sniffer and check the log. I will immediately see the 404 response headers. I know that this method is less user-friendly and could seem more complicated than anything out there, but this is how security experts do it.

    Submitter’s last incredible blog post..GET OUT OF DEBT FAST AND FREE

  46. tolga says:

    Thank You …

    tolga’s last incredible blog post..Öyle bir geçer zaman ki

  47. sxe says:

    Thank you admin..

    sxe’s last incredible blog post..Lolipoplar

  48. aşk şiirleri says:

    thank you.

  49. news says:

    i can enhance your SEO and rankings substantially, especially if you have any critical errors which are keeping your site from ranking properly in the search engines.

  50. Curious George says:

    What about 200 redirects?
    Are they ‘search engine friendly’ with SEO in mind?

    cheers
    G

    Ann Smarty Reply:

    200 is the header status that sends “OK” response. It is required for any working URL.

    Curious George Reply:

    Many thanks Anne,
    That’s sorted me out, cheers.