
IIS Lock-up - Cannot Find The Cause!!! Help!!!

28 replies

Last post Sep 15, 2011 02:03 AM by laurin1

  • IIS Lock-up - Cannot Find The Cause!!! Help!!!

    Jul 25, 2011 10:08 AM|laurin1|LINK

    We have suddenly developed a serious problem with IIS. Environment:

    • IIS 7.5
    • Server 2008 R2 Web Edition
    • Intel Xeon Quad-core
    • PHP 5.3.6
    • MySQL 5.5

    We built and deployed this server only about 2 months ago. It ran fine for about 6 weeks, and then about 3 weeks ago IIS started locking up once or twice a week. When attempting to retrieve a web page, the browser will just sit there (and if it ever times out, we haven't waited long enough.) However, enough refreshes will eventually give us a 500 error. On the server, all we know is that Max Non-Anonymous users rises quickly from its normal 5-10 range to over 110, and Current Non-Anonymous users does something similar, going from about 10 to about 150 (we are an Intranet environment and everyone must authenticate.) Also, we go from the normal 4-5 php-cgi.exe processes to over....well, to whatever that Current Non-Anonymous users number is (a lot.)

    One thing we haven't tested is non-PHP (.htm) files, which we will do the next time it fails. This is a severe issue though. Our Intranet is an integral part of our process here (patient tracking, time sheets, financial tracking and reporting, everything we do for the most part.)

    I know this could be a PHP issue, but the only thing that fixes this when it dies is an IISRESET. We've tried everything else we can think of. There is almost nothing in the Event Logs, except that on the last two occasions, right before this occurred, we got this Event:

    Log Name:      System
    Source:        Service Control Manager
    Date:          7/25/2011 8:37:50 AM
    Event ID:      7036
    Task Category: None
    Level:         Information
    Keywords:      Classic
    User:          N/A
    Computer:      APP01.pridedallas.com
    Description:
    The Application Experience service entered the running state.

    We are close to the point of rebuilding the server, because it's too critical and we don't know how to fix it (though, if it's a code issue on our end, that won't fix it...but we have no way of knowing.)

  • Re: IIS Lock-up - Cannot Find The Cause!!! Help!!!

    Jul 25, 2011 10:22 AM|dustinmoorman|LINK

    Good day,

    I work with Laurin and hope to add some details to the situation.

    It seems that the normal number of Current NonAnonymous users stays at 1-5, but never above 5. During these outages the number of Current NonAnonymous users rises to 110, and seems to stay there. You will notice that the php-cgi.exe processes shown in the task manager match the number of Current NonAnonymous users at the time.

    You can end the php-cgi.exe processes in Task Manager and watch the Current NonAnonymous users count in Performance Monitor drop with them, but shortly thereafter the number rises again, and the only way to restore service to the users is to reset IIS.
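
    To catch the spike described above before users notice, the php-cgi.exe count could be polled from a script. This is only a minimal sketch: it assumes Python is available on the server, it relies on the standard Windows `tasklist` command, and the `is_spike` threshold (four times the normal maximum) is a made-up value for illustration, not something measured on this server.

    ```python
    import csv
    import io
    import subprocess


    def count_processes(tasklist_csv: str, image_name: str = "php-cgi.exe") -> int:
        """Count rows of `tasklist /FO CSV /NH` output whose image name matches."""
        rows = csv.reader(io.StringIO(tasklist_csv))
        return sum(1 for row in rows if row and row[0].lower() == image_name.lower())


    def current_php_cgi_count() -> int:
        """Run tasklist (Windows only) and count the php-cgi.exe processes."""
        out = subprocess.run(
            ["tasklist", "/FI", "IMAGENAME eq php-cgi.exe", "/FO", "CSV", "/NH"],
            capture_output=True, text=True, check=True,
        ).stdout
        return count_processes(out)


    def is_spike(count: int, normal_max: int = 5, factor: int = 4) -> bool:
        """Hypothetical rule: flag anything above normal_max * factor processes."""
        return count > normal_max * factor
    ```

    Run on a schedule, `is_spike(current_php_cgi_count())` would give a timestamped record of when the process count starts climbing, which can then be lined up against the Event Logs.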

    The server itself is 64 bit. We are and have been using the 32 bit php-cgi for some time now without this issue.

    Hope this helps.
  • Re: IIS Lock-up - Cannot Find The Cause!!! Help!!!

    Jul 25, 2011 10:30 AM|HCamper|LINK

    Hello,

    I suggest you check "Black Vipers" Service wiki http://www.blackviper.com/wiki/Application_Experience .

    Additionally, check the Windows Task Library for compatibility-related tasks, which may have names like setup and some folder.

    Was a program installed recently that showed a dialog asking whether the program installed correctly, or offering to re-install with recommended settings?

    If a program was installed and then re-installed with recommended settings, a new task usually appears in the Task Library.

    The new task will launch automatically and may be checking online for solutions.

     

    Martin

     

     

     

    Windows and Linux work Together IT-Pros
    Community Member Award 2011
  • Re: IIS Lock-up - Cannot Find The Cause!!! Help!!!

    Jul 25, 2011 11:12 AM|laurin1|LINK

    After further review, I don't think that Event has anything to do with the problem. The Event occurs about 34 minutes before the last failure. Now, I did find this Event at about 15 minutes before the failure (both events occur all the time, so why it would be the cause of the failure this time, I have no idea):

     Log Name:      Microsoft-Windows-NlaSvc/Operational
    Source:        Microsoft-Windows-NlaSvc
    Date:          7/25/2011 8:21:41 AM
    Event ID:      4343
    Task Category: Ldap Authentication
    Level:         Error
    Keywords:      (4),(2)
    User:          NETWORK SERVICE
    Computer:      APP01.pridedallas.com
    Description:
    LDAP authentication on interface {BD5F6C7D-C824-4827-80D9-638022301573} (192.168.200.1) failed with error 0x51

    That interface it's referring to is a direct connection cable to the failover server (crossover cable), with a route just for that (on a different subnet.)

    We've had that same configuration on the previous two servers (Server 2003) for years, with no issues.

    Also, another strange thing is during the failure, the log files still show requests that are going through (no errors), but that's not what the browser sees.

  • Re: IIS Lock-up - Cannot Find The Cause!!! Help!!!

    Jul 25, 2011 12:03 PM|HCamper|LINK

    Hello,

     

    That interface it's referring to is a direct connection cable to the failover server (crossover cable), with a route just for that (on a different subnet.)

    We've had that same configuration on the previous two servers (Server 2003) for years, with no issues.

    What has changed for the Network above ?

    Has a certificate changed ?

    Some of the search results for "Microsoft-Windows-NlaSvc" lead to problems verifying anti-virus updates, disk access, and system DLL integrity.

    I suggest you check the logs for network-related events.

    You might also check the network cables and NIC cards for problems.

    Martin

     


     

     

  • Re: IIS Lock-up - Cannot Find The Cause!!! Help!!!

    Jul 25, 2011 12:06 PM|laurin1|LINK

    Nothing. What I am saying is that an LDAP call (or any call other than what is available on that server) will fail. That is a crossover cable to the failover server, ONLY. It should not even be trying, as the default route (0.0.0.0) is pointing to 192.168.0.17.

    I'm going to install Failed Request Tracing tonight and see if that tells me anything.

  • Re: IIS Lock-up - Cannot Find The Cause!!! Help!!!

    Jul 25, 2011 12:19 PM|HCamper|LINK

    Ok,

    The Failed Request Tracing can help.

    Have you checked that the OpenSSL libraries are OK in the PHP install location?

    Are you using OpenSSL for Windows?

    Martin

     

  • Re: IIS Lock-up - Cannot Find The Cause!!! Help!!!

    Jul 25, 2011 12:38 PM|laurin1|LINK

    No, we are not using SSL (it's all local network traffic.)

  • Re: IIS Lock-up - Cannot Find The Cause!!! Help!!!

    Jul 26, 2011 10:01 AM|laurin1|LINK

    Ok, installed Failed Request Tracing. We had a failure this morning.

    Initially the max number of logs was set to 50. We are getting about 50 a minute!!! So, by the time I looked at it, we were long past the failure time. Now, to the logs: I think this is, or is related to, the problem. Like the other authentication error I saw in the Event Logs, every single one of these is:

     STATUSCODE:"401.2"
    SITE.ID:"1"
    SITE.NAME:"wwwroot"
    WP.NAME:"3292"
    APPPOOL.NAME:"wwwroot"
    verb:"GET"
    authenticationType:"NOT_AVAILABLE"

    OR:

    TRACE "wwwroot/fr004396.xml" (url:http://intranet.pridedallas.com:80/reports/saved/msu/1241.htm,statuscode:401.2,wp:3292)

    401.2, Access Denied due to that authenticationType:"NOT_AVAILABLE". I found some similar log entries in the IIS logs, like this:

     2011-07-26 12:42:45 192.168.0.17 GET /javascript/menu_tpl.js - 80 PRIDEDALLAS\jaclynbailey 192.168.0.154 Mozilla/4.0+(compatible;+MSIE+8.0;+Windows+NT+5.1;+Trident/4.0;+.NET+CLR+2.0.50727;+.NET+CLR+3.0.4506.2152;+.NET+CLR+3.5.30729;+.NET4.0C;+.NET4.0E) 304 0 0 46

    Not matching the same times, but I believe they are the same problem. We require all users to authenticate (mostly using Windows Authentication with IE or Chrome.)

    Those IIS log entries are missing the username, so I know they are the same problem.

    So the question is why are we inconsistently having authentication problems?

  • Re: IIS Lock-up - Cannot Find The Cause!!! Help!!!

    Jul 26, 2011 10:36 AM|HCamper|LINK

    Hi,

    You might check if the files and folders listed have had permissions changed.

    For IE "Defining Document Compatibility" has helped in many cases.

     http://msdn.microsoft.com/en-us/library/cc288325(v=vs.85).aspx

    Recent security updates for IE8 browsers downstreamed from IE9 for "mixed content"

    IE Blog    http://blogs.msdn.com/b/ie/archive/2011/06/23/internet-explorer-9-security-part-4-protecting-consumers-from-malicious-mixed-content.aspx .

    Have there been changes to authentication user or provider ?

    Martin

     

     

     

  • Re: IIS Lock-up - Cannot Find The Cause!!! Help!!!

    Jul 26, 2011 10:48 AM|laurin1|LINK

    What I mean, is that same request happens to the exact same file by the exact same person seconds later (or sometimes in the same second) and authenticates just fine. One fails, one succeeds. On top of that, neither myself, nor anyone else has actually seen this cause an error while using the site, except when it completely fails.

    No changes have been made to the security or the provider.

  • Re: IIS Lock-up - Cannot Find The Cause!!! Help!!!

    Jul 26, 2011 10:57 AM|HCamper|LINK

    Hello,

    Then for "javascript/menu_tpl.js", check the permissions on the file itself, or copy a backup over the file.

    How are the sessions stored and cached, and how are the times set?

    It is always a problem to reproduce the problem when running as developer or admin.

    The other possibility is that you may have a "hacker" at work?

    Martin

     

  • Re: IIS Lock-up - Cannot Find The Cause!!! Help!!!

    Jul 26, 2011 11:17 AM|laurin1|LINK

    BTW, these fail for me as much as anyone.

    After further review, I think we may be barking up the wrong tree. I did a test with another site on the same server and when I first access the site, I get failures (depending on the page, one or multiple) and then it succeeds every single time, no matter how many times I refresh or change pages.

    I think this is just the failed attempt at Anonymous access, and then it finds one that succeeds and goes forward. If that is the case, then Failed Request Tracing is useless in my environment. We do nothing but authenticated requests.
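
    One way to check that theory against the IIS logs is to pair each 401 with a later authenticated response for the same client and URL. The sketch below is an illustration under assumptions: it takes the W3C field order to be the one in the sample log line earlier in this thread (14 fields, username in the 8th, status in the 11th), and the 5-second pairing window is arbitrary. Any 401 it reports as unresolved would be worth a closer look; an empty result would be consistent with these being routine authentication handshakes.

    ```python
    from datetime import datetime, timedelta
    from typing import List, NamedTuple, Optional


    class Entry(NamedTuple):
        when: datetime
        uri: str
        username: str
        client_ip: str
        status: int
        substatus: int


    def parse_line(line: str) -> Optional[Entry]:
        """Parse one log line; the field order is assumed from the sample in this thread."""
        if not line.strip() or line.startswith("#"):
            return None  # skip blank lines and W3C header directives
        f = line.split()
        if len(f) != 14:
            return None  # different field set; this sketch does not handle it
        when = datetime.strptime(f[0] + " " + f[1], "%Y-%m-%d %H:%M:%S")
        return Entry(when, f[4], f[7], f[8], int(f[10]), int(f[11]))


    def unresolved_401s(entries: List[Entry],
                        window: timedelta = timedelta(seconds=5)) -> List[Entry]:
        """Return 401 entries with no authenticated non-401 follow-up
        from the same client for the same URI within `window`."""
        ordered = sorted(entries, key=lambda e: e.when)
        unresolved = []
        for i, e in enumerate(ordered):
            if e.status != 401:
                continue
            followed = any(
                later.client_ip == e.client_ip
                and later.uri == e.uri
                and later.status != 401
                and later.username != "-"
                and later.when - e.when <= window
                for later in ordered[i + 1:]
            )
            if not followed:
                unresolved.append(e)
        return unresolved
    ```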

  • Re: IIS Lock-up - Cannot Find The Cause!!! Help!!!

    Jul 26, 2011 01:07 PM|HCamper|LINK

    Hello,

    Since you cannot find information with Failed Request Tracing, try Fiddler or Wireshark.

    I agree that "We do nothing but authenticated requests," but it is possible that someone else is not, which is something to consider on intranets.

    Recap of the problem areas:

    A) The server Max Non-Anonymous users rises quickly from its normal 5-10 range to over 110.

    B) Current Non-Anonymous users does something similar and goes from about 10 to about 150.

    C) Using Intranet environment and everyone must authenticate.

    D) We go from the normal 4-5 php-cgi.exe processes to well over.

    E) Current Non-Anonymous users number is (a lot.)

    This guide http://confluence.atlassian.com/display/DOC/Capturing+HTTP+traffic+using+Wireshark+or+Fiddler has a discussion of Wireshark and Fiddler.

    You might create a Wireshark filter for "Authentication and Anonymous" packets and look at the IIS server status codes portion of the transfer from request to response.

    You should be able to get more information with packet sniffing.

    Martin

     

  • Re: IIS Lock-up - Cannot Find The Cause!!! Help!!!

    Jul 30, 2011 10:07 AM|laurin1|LINK

    Ok, we did more tests. We use hardly any static .htm pages, so we didn't even think about that. I can still load .htm files or images, I just can't load .php files. So it's PHP getting stuck (or the way that IIS handles it.)
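
    That static-versus-PHP comparison could be scripted so it runs the moment a failure is suspected. A rough sketch follows, with the caveats stated up front: the probe URLs would be placeholders of our own choosing, and because this site requires Windows Authentication, a plain unauthenticated request would get a 401 before PHP ever runs, so the `opener` argument is assumed to be something that can authenticate (or the probe page is assumed to allow anonymous access).

    ```python
    import socket
    import urllib.error
    import urllib.request


    def responds(url: str, timeout: float = 10.0, opener=urllib.request.urlopen) -> bool:
        """Return True if the server sends any HTTP response within `timeout` seconds."""
        try:
            with opener(url, timeout=timeout):
                return True
        except urllib.error.HTTPError:
            # An error status is still a response, not a hang.
            # Caveat: a 401 here means PHP was never reached.
            return True
        except (urllib.error.URLError, socket.timeout, TimeoutError, ConnectionError):
            return False


    def diagnose(static_ok: bool, php_ok: bool) -> str:
        """Classify the outcome of probing one static file and one .php file."""
        if static_ok and php_ok:
            return "healthy"
        if static_ok:
            return "php/fastcgi hung"  # static content served, PHP did not answer
        return "site unreachable"
    ```

    Calling `diagnose(responds(static_url), responds(php_url))` with one .htm and one .php URL from the same site would reproduce the manual test described above.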
  • Re: IIS Lock-up - Cannot Find The Cause!!! Help!!!

    Jul 30, 2011 10:12 AM|laurin1|LINK

    Also, we can point a second site towards the same file that the first one is using on port 81 and that works fine during this failure.
  • Re: IIS Lock-up - Cannot Find The Cause!!! Help!!!

    Jul 30, 2011 10:51 AM|laurin1|LINK

    I believe that I have found the source of the problem: URL Rewrite. While failing, I disabled the rules and it started working. The thing is, the rules work. Seems that either the module itself has a bug or our rules (which are very simple, one rewrite and one redirect), are getting stuck.
  • Re: IIS Lock-up - Cannot Find The Cause!!! Help!!!

    Jul 30, 2011 12:19 PM|HCamper|LINK

    Hi,

    At least you have narrowed it down more.

    I know you say FRT does not work for you, but you might still look at this article on the use of FRT for Rewrite: http://learn.iis.net/page.aspx/467/using-failed-request-tracing-to-trace-rewrite-rules/

    Martin

     

  • Re: IIS Lock-up - Cannot Find The Cause!!! Help!!!

    Jul 30, 2011 07:18 PM|laurin1|LINK

    The problem is that FRT requires the request to "fail", and it never does. It just sits there...forever. I was wrong about it timing out; it never will. 19 people will have screens just sitting there, and I think it would last indefinitely, until they stop the browser or we restart IIS (or stop the URL rewrite, now that that appears to be the problem.) Which is very strange...why don't any of the timeouts kick in? It should fail at some point, but never does.

    I've seen at least one other person say they have this problem and same setup (IIS, PHP and Wincache), but no solution was posted (http://forums.iis.net/t/1169634.aspx.)

    I wonder about the part of the FRT that allows seconds elapsed to be specified instead of a failure....that might work.

  • Re: IIS Lock-up - Cannot Find The Cause!!! Help!!!

    Jul 30, 2011 07:35 PM|HCamper|LINK

    Hi,

    I looked at the thread you posted.

    Look at this http://forums.iis.net/p/1168533/1988302.aspx#1988302 thread and http://forums.iis.net/p/1180059/1989515.aspx#1989515

     I don't think you have the same mixture of configurations ?

    Yes, I think changing the FRT time intervals would work.

    You may have read this before, http://ruslany.net/2008/10/debug-and-troubleshoot-rewrite-rules-easily/ , but it may help to get the style sheet.

    If you can, create a known "bad test" on your developer box and use that to start with.

    You might consider getting the php-test-archive.zip from PHP net and archives to run a PHP battery of tests to speed up the "run to fail" testing.

    Martin

     

     

  • Re: IIS Lock-up - Cannot Find The Cause!!! Help!!!

    Aug 01, 2011 04:15 PM|laurin1|LINK

    Except for the 16 cores, yes we do. We can't generate a "known bad test", it seems to fail at random times and loads. Not sure what this one has to do with us: http://forums.iis.net/p/1168533/1988302.aspx#1988302.

  • Re: IIS Lock-up - Cannot Find The Cause!!! Help!!!

    Aug 01, 2011 04:16 PM|laurin1|LINK

    I cannot find this: php-test-archive.zip from PHP net.

  • Re: IIS Lock-up - Cannot Find The Cause!!! Help!!!

    Aug 01, 2011 04:33 PM|HCamper|LINK

    Hi,

    I agree the mixture of settings is not likely part of your problems.

    Check this link: http://windows.php.net/downloads/releases/archives/ . The archives site contains the tests and deps for PHP.

    Martin

     

     

     

  • Re: IIS Lock-up - Cannot Find The Cause!!! Help!!!

    Aug 09, 2011 02:29 PM|laurin1|LINK

    Well, we thought we were on the right track, but apparently not. The fix of disabling and enabling the URL Rewrite rule worked a few times, then stopped. I was able to get it unstuck by restarting FastCGI, but that apparently does the same thing as an IISRESET, or close to it......

    That does appear to be the issue. Using FRT, catching anything over 30 seconds, all requests once it fails are stopped here:

     

    49. i FASTCGI_ASSIGN_PROCESS
        CommandLine="C:\PHP\php-cgi.exe", IsNewProcess="true", ProcessId="3992", RequestNumber="1"
        18:15:20.929
    50. i FASTCGI_START
        18:15:20.929
    51. FASTCGI_WAITING_FOR_RESPONSE
        18:15:20.929

     

  • Re: IIS Lock-up - Cannot Find The Cause!!! Help!!!

    Aug 17, 2011 11:13 AM|laurin1|LINK

    I figured out why it seemed like the Rewrite rule was causing the problem.

    It wasn't the disabling and re-enabling the URL Rewrite rule that was fixing the problem, it was disabling and enabling the URL Redirect rule. Since I had that rule disabled, I never figured that was the problem. Well, this time, after trying the Rewrite to no avail, I enabled the Redirect rule, and then disabled it again....and it fixed the problem....

    Still don't know why or how, but since we are not using that, I'm deleting it to see if the problem goes away.

  • Re: IIS Lock-up - Cannot Find The Cause!!! Help!!!

    Sep 14, 2011 05:32 PM|laurin1|LINK

    Still no solution. Happens about once or twice a week (sometimes we go over a week without it happening at all.) Not any closer to a solution.
  • Re: IIS Lock-up - Cannot Find The Cause!!! Help!!!

    Sep 14, 2011 06:27 PM|HCamper|LINK

    Hi @ Laurin1,

    So you're still tracking down the problem with this.

    Since you have narrowed it to urlRewrite and Rule(s) being the cause:

    A) It wasn't the disabling and re-enabling the URL Rewrite rule that was fixing the problem.

    B) It was disabling and enabling the URL Redirect rule.

    C) Since I had that rule disabled. I never figured that was the problem.

    D) This time after trying the Rewrite to no avail,

    E) I enabled the Redirect rule, and then disabled it again....and it fixed the problem....

    How about creating a thread / post in the urlRewrite Forum to let Lloyd, Kris, and others check and test the rules?

    At least you might have a different view of the urlRewrite and problems.

    Martin

     

     

     

  • Re: IIS Lock-up - Cannot Find The Cause!!! Help!!!

    Sep 15, 2011 01:38 AM|laurin1|LINK

    I'm not 100% sure it is the URL Rewrite. I permanently disabled the Redirect. We really don't need it anyway. Now, sometimes disabling and enabling the URL Rewrite fixes it and sometimes it doesn't. We only have one rule. It's very simple. Also, during this failure we have confirmed that it's not all of IIS that is locked up; it is either just PHP or at least just FastCGI (we only use FastCGI with PHP, so it's hard to tell), because we can still access .htm files or images via this site. Also, while it is locked up, I can use or create another site pointing to the same files and that site works fine. I've tried just creating a new site, but that doesn't solve the problem.

    Also, we use Wincache....and I'm beginning to suspect that might have something to do with it as well.

    Still, we have to figure this out, so I'll do as you say and post in the URL Rewrite forum.

     

  • Re: IIS Lock-up - Cannot Find The Cause!!! Help!!!

    Sep 15, 2011 02:03 AM|laurin1|LINK