Intermittent Server 500 Errors / FastCGI debugging /tracing [Answered]

4 replies

Last post Mar 12, 2010 11:54 AM by sethmay

  • Intermittent Server 500 Errors / FastCGI debugging /tracing

    Mar 04, 2010 03:27 PM|sethmay

    I'm really hoping that folks might have some ideas for how I can gather more detailed information about these problems.

    We've been experiencing a number of problems in our IIS 7 environment. We see two different things happening:

    • Users see Server 500 errors intermittently, on random pages.
    • All the users get 500 errors and no requests will complete. User requests back up (queue) until something recycles (my guess) or I restart the app pool.

    I've been working on these issues for several weeks, and I've done due diligence in trying to fix the problem through most of the resources at my disposal. I already tried opening an incident with Microsoft. They dropped me like a hot potato when they saw I was using PHP.

    Environment:

    • x64 Windows 2008, SP1. Dual quad-core processors with 8 GB of memory
    • IIS 7 running FastCGI. Each application runs in a dedicated app pool. The application uses an SSL certificate.
    • PHP 5.2.12 NTS running under FastCGI, configured using best practices as described here: http://learn.iis.net/page.aspx/246/using-fastcgi-to-host-php-applications-on-iis-70/
      • I've also used ZendServer 5 with similar settings with the same problems. Event watch and Code trace don't turn up anything of use.
    • FastCGI
      • InstanceMaxRequest: 10000
      • PHP_FCGI_MAX_REQUESTS: 10005
      • MaxInstances: 64 (8 * # Cores as advised by Kanwaljeet Singla, Fastcgi/Wincache developer, and also extensively tested by me through load tests)
      • ActivityTimeout: 60
    • PHP has these relevant modules installed:
      • SQLSRV 1.1 driver (v1.1.428, latest)
      • WinCache (v1.0.1325.0, latest)
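    A quick sanity check of the FastCGI numbers listed above can be sketched in Python. The class and function names are hypothetical. The rule that PHP_FCGI_MAX_REQUESTS should be at least InstanceMaxRequest (so php-cgi.exe does not exit on its own before IIS recycles it) is treated here as an assumption based on the commonly cited FastCGI/PHP guidance, not something this thread confirms.

```python
from dataclasses import dataclass


@dataclass
class FastCgiSettings:
    """Hypothetical container for the values listed in the post."""
    instance_max_requests: int   # IIS FastCGI InstanceMaxRequest
    php_fcgi_max_requests: int   # PHP_FCGI_MAX_REQUESTS environment variable
    max_instances: int           # IIS FastCGI MaxInstances
    cores: int                   # logical cores on the server


def check_settings(s: FastCgiSettings) -> list:
    """Return a list of human-readable warnings (empty if none)."""
    problems = []
    # Assumption: PHP should not recycle itself before IIS does.
    if s.php_fcgi_max_requests < s.instance_max_requests:
        problems.append(
            "PHP_FCGI_MAX_REQUESTS is below InstanceMaxRequest; "
            "php-cgi.exe may exit before IIS recycles it"
        )
    return problems


def instances_per_core(s: FastCgiSettings) -> float:
    """Ratio used in the post's '8 * # Cores' rule of thumb."""
    return s.max_instances / s.cores


# Values taken from the environment described above.
current = FastCgiSettings(10000, 10005, 64, 8)
print(check_settings(current))
print(instances_per_core(current))
```

    With the values from the post, the check reports no warnings and the ratio works out to 8 instances per core.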

    We ran a previous version of our web app on this same machine from Jan, 2009 to Jan, 2010. We released our new version Feb 1, 2010 on this exact same machine. The only differences are that MaxInstances was set to 10, PHP did not have the SQLSRV and WinCache extensions, and we were using PHP 5.2.8. The previous version of the application ran flawlessly in this environment.

    I've run our current version of the application in PHP 5.2.8 without WinCache (it cannot run without the SQLSRV driver) with the same results. I hope this rules out the WinCache extension or some change in PHP between 5.2.8 and 5.2.12.

    I've set up similar environments on other servers with the same results. To me, this is strong evidence that hardware is not the issue (network cards, memory, etc.). In my isolated environments, I tested without the SSL certificate and still had the issues, making me think the SSL layer is not the problem.

    Windows system log is clean (no related errors or warnings).

    PHP's log file has nothing related (timestamps have no relation to the 500 errors as seen via traces and IIS log files). Logging verbosely from the SQLSRV driver doesn't seem to turn up anything interesting.

    IIS's Failed Request Tracing Rules and our IIS log files are the only places where I can see any evidence of the Server 500 errors the users are experiencing. These logs show a couple of things:

    • The requested pages that receive Server 500 errors are across the board.
    • Most of the errors are not related to timeouts (trace files will often show between 0ms and 7000ms execution times, even in Request Details or Compact View timestamps). Our timeouts are set to 60 seconds across the board, in both PHP and FastCGI.
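    One way to check both observations against the IIS logs is to tally the 500 responses by URL and by Win32 status. The sketch below assumes W3C extended log format with a `#Fields:` header line; the function name is hypothetical and the sample rows are made up for illustration, not taken from the real logs.

```python
from collections import Counter


def tally_500s(log_lines):
    """Count HTTP 500 responses by URL and by sc-win32-status.

    Assumes W3C extended logs where a '#Fields:' line names the columns.
    """
    fields = []
    by_url = Counter()
    by_win32 = Counter()
    for line in log_lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            fields = line.split()[1:]      # column names for following rows
            continue
        if not line or line.startswith("#") or not fields:
            continue                       # skip other directives and blanks
        row = dict(zip(fields, line.split()))
        if row.get("sc-status") != "500":
            continue
        by_url[row.get("cs-uri-stem", "?")] += 1
        by_win32[row.get("sc-win32-status", "?")] += 1
    return by_url, by_win32


# Made-up sample rows, for illustration only.
SAMPLE = """#Fields: date time cs-uri-stem sc-status sc-substatus sc-win32-status time-taken
2010-03-04 15:27:01 /index.php 200 0 0 120
2010-03-04 15:27:02 /report.php 500 0 64 15
2010-03-04 15:27:03 /login.php 500 0 64 0
2010-03-04 15:27:04 /report.php 500 0 995 7000
""".splitlines()

urls, win32 = tally_500s(SAMPLE)
print(urls.most_common())
print(win32.most_common())
```

    If the errors really are spread across the board, no single URL should dominate the first counter; the second counter shows which underlying error is most frequent.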

    Our most common trace file warnings are (the first being far more common, and thus more concerning, than the second):

    • MODULE_SET_RESPONSE_ERROR_STATUS: Warning
      • ModuleName: FastCgiModule
      • Notification: 128
      • HttpStatus: 500
      • HttpReason: Internal Server Error
      • HttpSubStatus: 0
      • ErrorCode: 2147942464
      • ConfigExceptionInfo:
      • Notification: EXECUTE_REQUEST_HANDLER
      • ErrorCode: The specified network name is no longer available. (0x80070040)
    • MODULE_SET_RESPONSE_ERROR_STATUS: Warning
      • ModuleName: FastCgiModule
      • Notification: 128
      • HttpStatus: 500
      • HttpReason: Internal Server Error
      • HttpSubStatus: 0
      • ErrorCode: 2147943395
      • ConfigExceptionInfo:
      • Notification: EXECUTE_REQUEST_HANDLER
      • ErrorCode: The I/O operation has been aborted because of either a thread exit or an application request. (0x800703e3)
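    The decimal ErrorCode values in these traces are the same numbers as the hex codes shown next to the messages. A minimal Python sketch of the decoding, assuming the standard HRESULT bit layout (the function name and the small lookup table are illustrative, not part of any IIS tooling):

```python
# Win32 error names for the two codes seen in the traces.
WIN32_NAMES = {
    64: "ERROR_NETNAME_DELETED",
    995: "ERROR_OPERATION_ABORTED",
}


def decode_hresult(value: int) -> dict:
    """Split a 32-bit HRESULT into its main fields."""
    value &= 0xFFFFFFFF
    code = value & 0xFFFF                  # low 16 bits: error code
    return {
        "hex": f"0x{value:08x}",
        "failure": bool(value >> 31),      # top bit set means failure
        "facility": (value >> 16) & 0x7FF, # 11-bit facility field (7 = Win32)
        "code": code,
        "name": WIN32_NAMES.get(code, "unknown"),
    }


for decimal in (2147942464, 2147943395):
    print(decimal, decode_hresult(decimal))
```

    Both values decode to facility 7 (Win32), with codes 64 and 995 respectively, which matches the 0x80070040 and 0x800703e3 shown in the trace messages.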

    I'm happy to pass full trace files on to anyone who wants a closer look. I get several thousand a day. I can find no useful information anywhere concerning the error (especially the first one).

    I really need to find a way to pinpoint the issue. Is there a way to get more information out of IIS or FastCGI? Does anyone have any ideas of routes I have not taken? Should this go in a different forum?

    Thanks
    Seth

     

    PHP FastCGI IIS7 error 500 - Internal server error

  • Re: Intermittent Server 500 Errors / FastCGI debugging /tracing

    Mar 04, 2010 05:19 PM|ksingla

    Hi Seth.

    The error "network name is no longer available" is thrown when a request that is about to be processed by FastCGI has already been disconnected. FastCGI aborts the request with a 500. This was done to protect against F5 attacks, so this error is not of concern. We need to get to the bottom of the second error you are seeing. I think something is causing php-cgi.exe processes to crash in rapid succession, which is causing FastCGI to go into a failure protection mode. Once in failure protection mode, FastCGI will return 500 for a minute before attempting to execute php-cgi again. Do you know if php-cgi.exe processes are crashing on the server? Would you happen to have a full memory dump of the crash?
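    If failure protection mode is involved, the 500s should cluster into roughly minute-long bursts rather than being evenly spread. A rough Python sketch for spotting such bursts from a list of error timestamps follows; the function name and the threshold are hypothetical, and the sample timestamps are invented.

```python
from collections import Counter
from datetime import datetime


def busy_minutes(timestamps, threshold=3):
    """Return the minutes (sorted) that contain at least `threshold` errors."""
    per_minute = Counter(
        ts.replace(second=0, microsecond=0) for ts in timestamps
    )
    return sorted(m for m, n in per_minute.items() if n >= threshold)


# Invented timestamps of 500 responses, for illustration only.
errors = [
    datetime(2010, 3, 4, 15, 27, 1),
    datetime(2010, 3, 4, 15, 27, 20),
    datetime(2010, 3, 4, 15, 27, 45),
    datetime(2010, 3, 4, 15, 40, 5),
]

for minute in busy_minutes(errors):
    print(minute.isoformat())
```

    Minutes flagged this way are candidates to compare against the times at which php-cgi.exe crashes were recorded.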

    Thanks,
    Kanwal

  • Re: Intermittent Server 500 Errors / FastCGI debugging /tracing

    Mar 05, 2010 12:36 PM|sethmay

    You've confirmed my suspicions concerning the "network name is no longer available". My load tests gave hints that these messages were related to severed connections.


    I'm currently working to get a memory dump of the crashes. This is not something I've done before, so it is taking me a bit of time. I'm trying to use windbg to attach to the w3wp.exe process assigned to the relevant app pool. So far, I seem to lock up the process (even in non-invasive mode). I'll consult with the other developers here to see if any of them have experience with this process.

    I hope to have more data/info within a few hours.

    Thanks
    Seth

  • Re: Intermittent Server 500 Errors / FastCGI debugging /tracing

    Mar 05, 2010 08:30 PM|ksingla

    http://support.microsoft.com/kb/286350 might be helpful in creating the crash dump.

  • Re: Intermittent Server 500 Errors / FastCGI debugging /tracing

    Mar 12, 2010 11:54 AM|sethmay

    Thanks for pointing me to that article. I set adplus to monitor php-cgi.exe in crash mode, as well as the w3wp process assigned to the relevant app pool.

    It's generated a number of dump files from php-cgi.exe, one of which is linked to the relevant "The I/O operation has been aborted because of either a thread exit or an application request. (0x800703e3)" error. Truthfully, I've looked through the log and used Debug Diagnostic on the dump file, but I have no idea how to effectively interpret this information.

    Any ideas?

    Thanks

    Seth