
  When You Don't Want to Be Indexed

    Sometimes you have things you don’t want indexed by Google. What are those things, and how do you keep them from being indexed?

    The top offender when it comes to content being indexed that shouldn’t be is duplicate content. Search engines can get confused and dismissive when presented with a lot of highly similar content. When they find the same content at more than one location, or marked similarity between the content on two different URLs, they may show only one of those pages in the results, and you have no guarantee it will be the one you would rather have displayed.
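Search engines don’t publish how they measure “marked similarity,” but a rough check of your own pages can still tell you where to look. Here is a minimal Python sketch using the standard library’s difflib.SequenceMatcher. The 0.8 cutoff and the sample product text are assumptions for illustration, not a threshold any search engine has confirmed.

```python
from difflib import SequenceMatcher

# Hypothetical cutoff chosen for illustration; no search engine publishes one.
NEAR_DUPLICATE_THRESHOLD = 0.8


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences don't hide copies."""
    return " ".join(text.lower().split())


def similarity(text_a: str, text_b: str) -> float:
    """Return a ratio from 0.0 (nothing shared) to 1.0 (identical after normalizing)."""
    return SequenceMatcher(None, normalize(text_a), normalize(text_b)).ratio()


def looks_duplicated(text_a: str, text_b: str) -> bool:
    """Flag a pair of pages whose text is similar enough to review by hand."""
    return similarity(text_a, text_b) >= NEAR_DUPLICATE_THRESHOLD


if __name__ == "__main__":
    # Hypothetical product pages that differ only by color.
    blue = "Blue widget, 10 cm, made of recycled steel. Ships in 2 days."
    red = "Red widget, 10 cm, made of recycled steel. Ships in 2 days."
    about = "Our company history began in a small garage in 1998."
    print(f"blue vs red:   {similarity(blue, red):.2f}")
    print(f"blue vs about: {similarity(blue, about):.2f}")
    print("flag blue/red?", looks_duplicated(blue, red))
```

Comparing every pair of pages grows quadratically with the number of pages, so on a large site you would probably run this only on pages you already suspect, such as product variants.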

    Of course, this only happens to you, not to the gazillion results you see on Google that are virtually the same darn thing. But trust me, it can and does happen, usually when you least expect it. It can hurt your results by knocking a main page on your website, one with a terrific snippet, out of the running in favor of a PDF file that a lot of people won’t bother to click on.

    Worst-case scenario: the search engine finds so much duplication as it crawls a site that it gives up and moves on, leaving the rest of your site unindexed. It just doesn’t make sense for search engines to waste resources crawling content that is already in their system, and Google says it doesn’t want search engine results pages (SERPs) that list the same result again and again, even though it does happen.

    You’ve probably seen it in your own searches: you get a few results followed by a notice that more results were left out because they are very similar to the ones already shown. Several different things can cause a crawler to assume it is looking at duplicate content:

    1. Product descriptions, titles, meta descriptions, headings, navigation, and other text shared globally across different products or pages on your site can flip a duplicate-content switch. You need a content management system that lets you write a different meta description for each page.

    2. Printer-friendly versions. Disallow or noindex these! This is an example of something that is easily avoided with just a little care; all you have to do is ask yourself before you add a page: “Do I want the search engine to see this?”

    3. HTML RSS feeds added with server-side includes. Replace these with a client-side include, such as JavaScript, so the feed content isn’t picked up on page after page and mistaken for duplicate content.

    4. Different URLs pointing at the same page. Sometimes a crawler won’t realize it is crawling the same page because it can reach it through different URLs, especially if you serve the crawler a session ID, which changes the URL slightly on each visit. Follow Google’s Webmaster Guidelines and let bots crawl your site without session IDs or tracking parameters! Each page should have one “canonical URL”: the best URL for the page to be indexed under.

    5. Copyright infringement. If you have free articles on your site that you let others republish in return for a link, or if scrapers come in and lift your content, their copy could end up outranking your original. Regularly search Google for paragraphs from your site to see whether your content is being filched.

    6. Mirrored sites, secondary domains, and subdomains with similar content may not be recognized by the bots as copies of the same site, so their pages can be treated as duplicates. Be careful! Build with brand-new content wherever possible.
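The session-ID problem in item 4 is the most mechanical one to fix. Here is a minimal Python sketch, using only the standard library, that collapses URL variants of one page into a single canonical form and builds the standard `<link rel="canonical">` element, which Google and other major search engines accept as a hint for the preferred URL. The parameter names in IGNORED_PARAMS and IGNORED_PREFIXES are hypothetical; substitute whatever your own platform appends.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical session/tracking parameters; replace with the ones your site uses.
IGNORED_PARAMS = {"sid", "sessionid", "phpsessid", "jsessionid"}
IGNORED_PREFIXES = ("utm_",)


def canonical_url(url: str) -> str:
    """Collapse URL variants of one page into a single canonical form."""
    parts = urlsplit(url)
    # Drop session and tracking parameters but keep ones that change the content.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
        and not key.lower().startswith(IGNORED_PREFIXES)
    ]
    kept.sort()  # parameter order alone should not produce a "new" URL
    path = parts.path or "/"
    # Lowercase scheme and host, and drop the #fragment, which never reaches the server.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, urlencode(kept), ""))


def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> element for the page's <head>."""
    return f'<link rel="canonical" href="{escape(canonical_url(url), quote=True)}">'


if __name__ == "__main__":
    print(canonical_url("HTTP://Example.com/widgets?sid=abc123&color=blue#reviews"))
    # → http://example.com/widgets?color=blue
    print(canonical_url("http://example.com/widgets?utm_source=news&color=blue"))
    # → http://example.com/widgets?color=blue
```

The tag goes in the `<head>` of every variant of the page, so all of them point search engines at the same URL.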

    Many duplicate content issues can be avoided simply by using the robots.txt file. The best plan is to consider these issues as you build your site and avoid anything that could leave your site unindexed due to perceived duplicate content.
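Before deploying robots.txt rules, it helps to confirm they block what you intended and nothing else. Python’s standard urllib.robotparser module can evaluate rules locally. In this sketch the /print/ and /search paths are hypothetical stand-ins for printer-friendly copies and internal search results. One caveat: Disallow stops compliant crawlers from fetching a URL, but Google can still index a blocked URL if other pages link to it; a noindex robots meta tag on a crawlable page is what keeps the page itself out of the index.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: /print/ holds printer-friendly copies, /search is site search.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
Disallow: /search
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Load robots.txt rules from a string instead of fetching them over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_crawlable(parser: RobotFileParser, url: str, agent: str = "Googlebot") -> bool:
    """Return True if the rules let the given user agent fetch the URL."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    for url in (
        "https://example.com/print/widget-guide",
        "https://example.com/articles/widget-guide",
    ):
        print(url, "crawlable" if is_crawlable(rules, url) else "blocked")
```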

    The other thing you want search engines to ignore is paid links. If you pay for a link, please, please nofollow it! You can be heavily penalized for paid links if the spiders crawl them and conclude you are trying to artificially increase your PageRank.
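If paid links appear on pages you control, a quick audit can catch any that slipped out without nofollow. This is a minimal Python sketch built on the standard library’s html.parser; the PAID_DOMAINS set is hypothetical and stands in for your own list of advertisers, and the check looks only for the nofollow value this post recommends.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Hypothetical list of advertisers whose links were paid placements.
PAID_DOMAINS = {"advertiser.example", "sponsor.example"}


def is_paid_host(host, paid_domains):
    """Match a paid domain itself or any of its subdomains (e.g. www.)."""
    return any(host == d or host.endswith("." + d) for d in paid_domains)


class PaidLinkAuditor(HTMLParser):
    """Collect hrefs of links to paid domains that lack rel="nofollow"."""

    def __init__(self, paid_domains):
        super().__init__()
        self.paid_domains = paid_domains
        self.missing_nofollow = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href") or ""
        host = (urlsplit(href).hostname or "").lower()
        rel_values = (attributes.get("rel") or "").lower().split()
        if is_paid_host(host, self.paid_domains) and "nofollow" not in rel_values:
            self.missing_nofollow.append(href)


def find_followed_paid_links(html, paid_domains=PAID_DOMAINS):
    """Return paid links in document order that would still pass ranking credit."""
    auditor = PaidLinkAuditor(paid_domains)
    auditor.feed(html)
    auditor.close()
    return auditor.missing_nofollow
```

Run it over your rendered pages whenever ad placements change; an empty list means every paid link it found carries nofollow.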

     
