Hi,
I hope this is the correct forum to post in; I wasn't sure whether this belonged in Troubleshooting or IIS7 FastCgi.
I do hope someone can help, as I'm going completely crazy trying to troubleshoot this one. The server is running Windows Server 2008 x64 with 24 GB of RAM, and IIS7 runs PHP 5.2.9 (NTS) via FastCGI.
In a nutshell, we have quite a few sites on the server (we are a small web hosting company in the middle of migrating from older hardware to this new box), and I am seeing repeated HTTP 500 errors on a random but frequent basis. The PHP pages that produce the 500 errors run absolutely fine 95% of the time, but the problems, when they do occur, appear to be load-related. I have switched on Failed Request Tracing (FRT) for the site in question but, very worryingly, I am getting a multitude of different error codes, very few of which seem to have been discussed in detail on these boards, and with no apparent correlation between them.
Some of the FRT errors and recent timings are as follows:
63. r FASTCGI_UNEXPECTED_EXIT
Error 15:35:57.741
64. r SET_RESPONSE_ERROR_DESCRIPTION
Warning ErrorDescription="D:\PHP\5299nts\php-cgi.exe - The FastCGI process exited unexpectedly" 15:35:57.741
65. r MODULE_SET_RESPONSE_ERROR_STATUS
Warning ModuleName="FastCgiModule", Notification="EXECUTE_REQUEST_HANDLER", HttpStatus="500", HttpReason="Internal Server Error", HttpSubStatus="0", ErrorCode="The operation completed successfully. (0x0)", ConfigExceptionInfo=""

56. r MODULE_SET_RESPONSE_ERROR_STATUS
Warning ModuleName="FastCgiModule", Notification="EXECUTE_REQUEST_HANDLER", HttpStatus="500", HttpReason="Internal Server Error", HttpSubStatus="0", ErrorCode="The specified network name is no longer available. (0x80070040)", ConfigExceptionInfo="" 15:21:33.486

61. r FASTCGI_UNEXPECTED_EXIT
Error 15:38:04.352
62. r SET_RESPONSE_ERROR_DESCRIPTION
Warning ErrorDescription="D:\PHP\5299nts\php-cgi.exe - The FastCGI process exited unexpectedly" 15:38:04.352
63. r MODULE_SET_RESPONSE_ERROR_STATUS
Warning ModuleName="FastCgiModule", Notification="EXECUTE_REQUEST_HANDLER", HttpStatus="500", HttpReason="Internal Server Error", HttpSubStatus="0", ErrorCode="The operation completed successfully. (0x0)", ConfigExceptionInfo="" 15:38:04.352
As you can see, the error codes vary from one failure to the next, and the problem can affect just about every page on the website, so no single PHP script is to blame.
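To test whether the codes really are uncorrelated, a rough Python sketch along these lines could tally the event opcodes and time taken across every FRT log for the site. It rests on an assumption I have not verified against any schema: that each fr*.xml file has a root element carrying url, statusCode and timeTaken attributes, with an Opcode element somewhere inside each event. The folder to scan is passed on the command line.

```python
"""Tally event opcodes across IIS Failed Request Tracing (FRT) logs (sketch).

Assumption: each fr*.xml file has a root element with url, statusCode and
timeTaken (milliseconds) attributes, and every event contains an Opcode
element somewhere beneath it. Namespaces are ignored when matching.
"""
import sys
from collections import Counter
from pathlib import Path
from typing import Union
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def summarise_frt(xml_data: Union[str, bytes]) -> dict:
    """Return url, status, time taken and the opcodes found in one FRT log."""
    root = ET.fromstring(xml_data)
    opcodes = [el.text.strip() for el in root.iter()
               if local_name(el.tag) == "Opcode" and el.text and el.text.strip()]
    return {
        "url": root.get("url"),
        "status": root.get("statusCode"),
        "time_taken_ms": int(root.get("timeTaken", "0")),
        "opcodes": opcodes,
    }


def tally(folder: Path) -> Counter:
    """Print one line per log file and count opcodes across all of them."""
    counts: Counter = Counter()
    for path in sorted(folder.glob("fr*.xml")):
        # Parse bytes so the parser honours the file's declared encoding.
        info = summarise_frt(path.read_bytes())
        counts.update(info["opcodes"])
        print(f"{path.name}: status={info['status']} "
              f"{info['time_taken_ms']} ms {info['url']}")
    return counts


if __name__ == "__main__":
    # Pass the site's FRT folder (e.g. ...\FailedReqLogFiles\W3SVC<n>) as argv[1].
    folder = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    for opcode, n in tally(folder).most_common():
        print(f"{n:6d}  {opcode}")
```

Sorting the per-file lines by time taken would also show whether the FASTCGI_UNEXPECTED_EXIT failures are the slow ones.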
In terms of frequency, once the errors start they can happen multiple times per minute or as infrequently as every half hour. The PHP_FCGI_MAX_REQUESTS environment variable for the FastCGI application is set to 10000, and so is InstanceMaxRequests, as documented. Reducing these values so that FastCGI processes are recycled more frequently makes no difference; as far as I can tell, the errors are just as frequent.
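Since the guidance, as I understand it, is that PHP_FCGI_MAX_REQUESTS should be equal to or greater than instanceMaxRequests, here is a minimal Python sketch that reads applicationHost.config and flags any FastCGI application where that does not hold. It assumes the standard fastCgi/application layout and the default config location under %windir%\System32\inetsrv\config; a different path can be passed as the first argument.

```python
"""Check FastCGI recycling settings in IIS7's applicationHost.config (sketch).

Assumes the standard layout: <fastCgi><application fullPath=...
instanceMaxRequests=...> with optional
<environmentVariables><environmentVariable name=... value=...> children.
"""
import os
import sys
import xml.etree.ElementTree as ET
from pathlib import Path

DEFAULT_CONFIG = (Path(os.environ.get("windir", r"C:\Windows"))
                  / "System32" / "inetsrv" / "config" / "applicationHost.config")


def fastcgi_settings(xml_data):
    """Yield (fullPath, instanceMaxRequests, PHP_FCGI_MAX_REQUESTS) per FastCGI app."""
    root = ET.fromstring(xml_data)
    # Only look under <fastCgi>; <sites> also contains <application> elements.
    for fastcgi in root.iter("fastCgi"):
        for app in fastcgi.findall("application"):
            env = {var.get("name"): var.get("value")
                   for var in app.iter("environmentVariable")}
            yield (app.get("fullPath"),
                   app.get("instanceMaxRequests"),
                   env.get("PHP_FCGI_MAX_REQUESTS"))


def check(instance_max, php_max):
    """Return a warning string, or None if the two values look consistent."""
    if instance_max is None or php_max is None:
        return "not set explicitly (an IIS or PHP default applies)"
    if int(php_max) < int(instance_max):
        return "PHP_FCGI_MAX_REQUESTS is lower than instanceMaxRequests"
    return None


if __name__ == "__main__":
    config = Path(sys.argv[1]) if len(sys.argv) > 1 else DEFAULT_CONFIG
    for full_path, inst, php in fastcgi_settings(config.read_bytes()):
        problem = check(inst, php)
        note = f"  <-- {problem}" if problem else ""
        print(f"{full_path}: instanceMaxRequests={inst} "
              f"PHP_FCGI_MAX_REQUESTS={php}{note}")
```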
I have installed and enabled Debug Diag with a crash rule for all instances of php-cgi.exe, as well as a hang rule configured to poll a page on the problem site every second, with an action to dump all processes named php-cgi.exe if the page does not load successfully. Although FRT continues to log all of the errors above, Debug Diag has not kicked in once, which makes me think that the issue is not down to PHP crashes.
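As a cross-check on the Debug Diag hang rule, an independent poller could record every failed or slow response with millisecond timestamps so they can be lined up against the FRT entries. This is a rough Python sketch: the URL and the 5-second "slow" threshold are hypothetical placeholders, and the 130-second timeout is chosen only so that the 8-120 second stalls get captured rather than cut off.

```python
"""Poll one URL every second and log failures or slow responses (sketch).

URL and SLOW_SECONDS are hypothetical placeholders; adjust for the real site.
"""
import http.client
import time
import urllib.error
import urllib.request
from datetime import datetime

URL = "http://example.com/problem-page.php"  # hypothetical target page
SLOW_SECONDS = 5.0       # hypothetical threshold for a "slow" response
TIMEOUT_SECONDS = 130.0  # long enough to see the 8-120 s stalls finish


def probe(url, timeout=TIMEOUT_SECONDS):
    """Fetch url once; return (status, elapsed_seconds). status is None on network error."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
            status = resp.status
    except urllib.error.HTTPError as err:  # 4xx/5xx responses land here
        status = err.code
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        status = None  # timeouts, resets, refused or truncated connections
    return status, time.monotonic() - start


def classify(status, elapsed, slow=SLOW_SECONDS):
    """Label a probe result so only the interesting ones get printed."""
    if status is None:
        return "NETWORK-ERROR"
    if status >= 500:
        return "SERVER-ERROR"
    if elapsed >= slow:
        return "SLOW"
    return "OK"


def main():
    while True:
        status, elapsed = probe(URL)
        label = classify(status, elapsed)
        if label != "OK":
            # Millisecond timestamps make it easy to match FRT entries.
            stamp = datetime.now().strftime("%H:%M:%S.%f")[:-3]
            print(f"{stamp} {label} status={status} elapsed={elapsed:.3f}s", flush=True)
        time.sleep(max(0.0, 1.0 - elapsed))


if __name__ == "__main__":
    main()
```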
PHP logging is on, but nothing at all is logged for this site, so that is no help whatsoever.
Our MySQL server sits on the same box and so is accessed via localhost. Server CPU utilisation stays very low, no more than 5-10%, even when our sites are quite busy.
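To rule MySQL connectivity in or out, a minimal Python sketch like this times a raw TCP connection to each address that localhost resolves to. It assumes MySQL listens on its default port, 3306, and it does not speak the MySQL protocol or log in, so it only shows whether establishing the connection itself is slow or failing.

```python
"""Time raw TCP connections to MySQL on localhost (sketch).

Assumes MySQL listens on its default port, 3306. Only the TCP connect is
measured; no MySQL handshake or authentication takes place.
"""
import socket
import time

HOST = "localhost"
PORT = 3306     # MySQL default; change if the server uses another port
TIMEOUT = 10.0  # seconds; hypothetical upper bound for one attempt


def resolve(host, port):
    """Return the distinct (family, sockaddr) pairs host resolves to for TCP."""
    seen = []
    for family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM):
        if (family, sockaddr) not in seen:
            seen.append((family, sockaddr))
    return seen


def time_connect(family, sockaddr, timeout=TIMEOUT):
    """Connect once; return (ok, elapsed_seconds, error_text)."""
    start = time.monotonic()
    with socket.socket(family, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.connect(sockaddr)
            return True, time.monotonic() - start, ""
        except OSError as err:
            return False, time.monotonic() - start, str(err)


if __name__ == "__main__":
    for family, sockaddr in resolve(HOST, PORT):
        ok, elapsed, err = time_connect(family, sockaddr)
        print(f"{family.name:8s} {sockaddr[0]:<15s} ok={ok} "
              f"{elapsed * 1000:.1f} ms {err}")
```

Running it in a loop while the site is under load would show whether connect times climb at the same moments FRT logs the failures.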
One other very interesting fact: we currently run a pool of 16 FastCGI processes (instances of php-cgi.exe), and reducing this in testing to 8 or even 4 results in a *massive* slowdown of the problematic website. Interestingly, though, the CPU utilisation of each php-cgi.exe process remains low at only 1-2%, which doesn't seem to correlate with the slowdown of the site. Also, MySQL's processor usage jumps from its usual 2-3% to 6-7%, which makes even less sense.
I'm extremely concerned that the issue may lie elsewhere in the web stack, perhaps with MySQL or PHP itself, since some of the failed requests seem to take between 8 and 120 seconds. This is bizarre, since the server is extremely fast and, when pages do load, they load instantly. Without more information from PHP, though, I can't tell where the issue lies, or whether requests are being held up by some random fault with MySQL under load. We have other customer sites on the same box using MySQL at the same time, and they are not logging anything with FRT switched on; in other words, our MySQL server still seems to be responding to them quite happily.
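To reproduce the load-related behaviour deliberately, a rough Python sketch such as this fires concurrent requests at one page and reports the latency spread. The URL, the concurrency of 24 and the 200-request total are hypothetical; the idea is to compare runs with concurrency below and above the 16-process pool to see whether the long stalls only appear once every php-cgi.exe instance is busy. It should be pointed at a test copy of the page rather than a live customer site.

```python
"""Fire concurrent requests at one page and summarise latency (sketch).

URL, CONCURRENCY and REQUESTS are hypothetical; vary CONCURRENCY around the
FastCGI pool size to see whether latency jumps once the pool is saturated.
"""
import concurrent.futures
import math
import time
import urllib.error
import urllib.request

URL = "http://example.com/problem-page.php"  # hypothetical test page
CONCURRENCY = 24  # hypothetical; above the 16-process pool
REQUESTS = 200    # hypothetical total number of requests


def fetch(url, timeout=130.0):
    """Fetch url once; return (status, elapsed_seconds). status is None on failure."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
            status = resp.status
    except urllib.error.HTTPError as err:
        status = err.code
    except Exception:  # timeouts, resets, protocol errors
        status = None
    return status, time.monotonic() - start


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct in 0-100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def run(url=URL, concurrency=CONCURRENCY, total=REQUESTS):
    """Send total requests with up to concurrency in flight and print a summary."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(lambda _: fetch(url), range(total)))
    times = [elapsed for _, elapsed in results]
    errors = sum(1 for status, _ in results if status is None or status >= 500)
    print(f"{total} requests, {concurrency} concurrent, {errors} errors/timeouts")
    for pct in (50, 90, 99, 100):
        print(f"  p{pct}: {percentile(times, pct):.3f} s")


if __name__ == "__main__":
    run()
```

If p50 stays low while p99 and p100 jump into tens of seconds only at higher concurrency, that would point towards requests queuing for a free php-cgi.exe rather than any one request being CPU-bound.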
Does anyone have any ideas at all? To say that this is a disaster would be an understatement, as I am in the middle of trying to complete a migration to this new hardware. Everything appeared to be running stably until these errors began cropping up. I am now under immense pressure to get this resolved ASAP, and it seems the harder I dig for information, the less I find.
Please let me know if there is any further information I can post to aid in diagnostics.
Very many thanks in advance.
James