
Thread: Broken links with dynamic sites

Last post 11-09-2009 1:38 PM by CarlosAg. 5 replies.


Page 1 of 1 (6 items)


  • 11-02-2009, 11:30 AM

    • ganseki
    • Not Ranked
    • Joined on 05-18-2006, 9:59 PM
    • England
    • Posts 3

    Broken links with dynamic sites

    Can anyone help me configure SEOTK to identify broken links?

    In my scenario page content is generated from the database and the entire site requires authentication.

    It turns out that one of the content authors has been adding hyperlinks without http://, so pages in the site content generate URLs such as http://www.mysite.com/www.anothersite.com/, which do not exist. I am trying to use SEOTK to identify the pages containing broken links like these, but so far without success. I am getting one or two errors for "The page contains broken hyperlinks" and "The URL for the hyperlink is broken", but I know there are more errors than are being detected this way.
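    For illustration, here is a minimal sketch of why a link without http:// ends up under the site's own host. It uses Python's standard URL resolution, which follows the same relative-reference rules a browser or crawler applies; the function name resolve_link is made up for this example, and the host names are the placeholders from the post.

```python
from urllib.parse import urljoin

def resolve_link(page_url, href):
    """Resolve an href the way a browser or crawler would, relative to the page."""
    return urljoin(page_url, href)

# No scheme: the href is treated as a relative path on the current site.
broken = resolve_link("http://www.mysite.com/", "www.anothersite.com/")

# With a scheme: the href is an absolute URL and the page URL is ignored.
intended = resolve_link("http://www.mysite.com/", "http://www.anothersite.com/")

print(broken)
print(intended)
```

    The first call yields the kind of URL described above, because nothing in "www.anothersite.com/" tells the resolver that it is a host name rather than a folder.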

    My settings
    I create a new Analysis and give it a name (test); the path is already set to the root of my application. Under Advanced Settings I specify Host: www.mysite.com for "Consider as internal link if coming from", and under Authentication I select Basic mode and supply the username and password for an account.

    Simon
    Sharing Knowledge Saves Valuable Time!
  • 11-02-2009, 1:12 PM In reply to

    Re: Broken links with dynamic sites

    Is it not working with the setup above? If everything runs correctly you should be able to see the broken links both as a Violation (The page contains broken hyperlinks) and in the "Content" category in the "Pages with broken links" report.

    Does your application require more advanced interaction than simple GET requests (such as entering information, clicking buttons, interacting with JavaScript, or issuing POST requests)?

    In general the SEO Toolkit behaves as a search engine would, which means it will only issue GET requests. It does support authentication (Basic and Windows).

    If your site is configured not to return 404 (NOT FOUND) when it gets a request for a page that does not exist, then the reports above will not work. However, you should still be able to find some of the broken links by going to the "Content" category and choosing the "Directory Summary" report. In there you should be able to find funny directories like:

    /www.anothersite.com/

    which will show you which URLs were wrong even if they did not report a 404.
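    The same check can be sketched outside the toolkit. The snippet below is only an illustration: it assumes you already have a plain list of crawled URLs (how you obtain or export that list is not covered here), and it uses a simple "starts with www." heuristic that will miss host names without that prefix. The function name hostlike_directories is hypothetical.

```python
from urllib.parse import urlsplit

def hostlike_directories(urls):
    """Return top-level directories that look like host names (heuristic: 'www.' prefix)."""
    found = set()
    for url in urls:
        path = urlsplit(url).path
        # First path segment, e.g. "www.anothersite.com" in "/www.anothersite.com/page"
        first = path.lstrip("/").split("/", 1)[0]
        if first.lower().startswith("www."):
            found.add("/" + first + "/")
    return sorted(found)

# Placeholder URLs in the style of the ones discussed in this thread.
crawled = [
    "http://www.mysite.com/www.anothersite.com/",
    "http://www.mysite.com/about/team.aspx",
    "http://www.mysite.com/www.anothersite.com/page",
]
suspects = hostlike_directories(crawled)
print(suspects)
```

    Any directory this prints is a candidate for a hyperlink that was entered without http://.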

     

     

     

  • 11-03-2009, 8:17 AM In reply to

    • ganseki
    • Not Ranked
    • Joined on 05-18-2006, 9:59 PM
    • England
    • Posts 3

    Re: Broken links with dynamic sites

     Hi Carlos,

     Thanks for the info. 

    At the moment I'm not seeing any broken links, either as a Violation or in the Pages with broken links report.

    The site is pretty simple and most (if not all) of it should be accessible with simple GET requests. Because so little of the site is being accessed, I am a little worried that almost none of the site will be indexed.

    The Content / Directory Summary report is totally empty, but then all pages are dynamically created from the database and there are no physical folders or hierarchies (I am using Sitefinity). I was hoping to see some results in the Content / External links report, but that only has 2 URLs and they are both internal?

    The Content / Pages with Broken Links report is also empty. 

    It sounds like I'm doing things right but maybe SEO toolkit doesn't play well with Sitefinity?

     

    Simon
    Sharing Knowledge Saves Valuable Time!
  • 11-07-2009, 7:22 PM In reply to

    Re: Broken links with dynamic sites

    If I understand correctly it seems like IIS SEO is not able to crawl anything from your site, right?

    When you are in the report and you use the menu "Report->New Query" and press Execute, do you only get two URLs? Could you add the column "Is External" and see if they are treated as external?

    It could be that the Start URL you are entering does not match the criteria for internal links set in the "New Analysis" advanced settings. If that is the case, then when you start a new analysis, could you expand the Advanced Settings and make sure that the "Consider as internal link..." setting matches the URLs of the site you want crawled?

    Also, could you see if in the Violations->Summary there is something about using Noindex/nofollow?

    Could you also look at the content of the URLs that were crawled and see whether we found links in them? (Double-click a URL and look at the "Content" tab and the "Links" tab.) Is the Content Type set correctly to HTML?

    I am not familiar with Sitefinity, but if you send me an email at  microsoft . com (same user as in this reply) and you can send me either the content of the HTML of the start url or a link to your site I can try to look at the problem.

     

     

     

  • 11-09-2009, 5:56 AM In reply to

    • ganseki
    • Not Ranked
    • Joined on 05-18-2006, 9:59 PM
    • England
    • Posts 3

    Re: Broken links with dynamic sites

    I hadn't seen the New Query option, but in there I get 51 URLs. The thing is that they are JavaScript / axd or image files and not pages from my site. There are some that are treated as external, e.g. a link to jQuery on Google.

    The start URL I am using is the root of the site and I set Consider as internal link to:
    Host: tidev.ifslearning.ac.uk

    I didn't select the nofollow/noindex options in Advanced - but also don't see any mention of it in the Violations Summary page.

    OK, looking at the content of the page I can see the problem. The site uses ASP.NET Master and content pages, and the content for the homepage shows only the Master content and none of the content that would normally be added at runtime. So there are no links to follow.

    This is a step forward, but very puzzling. Why is there no content in the page? Does this mean that a crawler would not see any links either? When I visit the pages in a browser they appear with the content I would expect.

    Simon
    Sharing Knowledge Saves Valuable Time!
  • 11-09-2009, 1:38 PM In reply to

    Re: Broken links with dynamic sites

    Just to update the thread with some information about the problem: the Web site was using Forms Authentication, which is not supported by the SEO Toolkit. This meant that the crawler was only able to crawl the login page and a few other resources, and was never able to enter the site successfully.

    In v1.0 the only authentication schemes supported are Basic and Windows authentication.

    And just as most search engine crawlers would, the iisbot will not send cookies, a Referer header, or similar information, which means this scenario is not supported.
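    As a rough way to spot this situation on another site, the sketch below classifies a response as a probable Forms Authentication redirect. It rests on an assumption not shown in this thread: that the site uses the ASP.NET convention of redirecting unauthenticated requests to a login page with a ReturnUrl query parameter. The function name and the sample Location values are hypothetical.

```python
from urllib.parse import urlsplit, parse_qs

REDIRECT_CODES = (301, 302, 303, 307)

def looks_like_forms_login_redirect(status, location):
    """Guess whether a response is a redirect to a forms login page.

    Assumption: the login redirect carries a ReturnUrl query parameter,
    as ASP.NET Forms Authentication does by default.
    """
    if status not in REDIRECT_CODES or not location:
        return False
    query = parse_qs(urlsplit(location).query, keep_blank_values=True)
    return any(key.lower() == "returnurl" for key in query)

# A cookie-less GET to a protected page would typically see something like this:
print(looks_like_forms_login_redirect(302, "/login.aspx?ReturnUrl=%2f"))
# A normal page, or an ordinary redirect, should not be flagged:
print(looks_like_forms_login_redirect(200, None))
print(looks_like_forms_login_redirect(302, "/moved/"))
```

    If every protected URL answers a cookie-less GET this way, a crawler that does not send cookies will only ever reach the login page, which matches what was seen here.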
