IIS 7 and Above
FastCGI with PHP failing all over the place under load
Last post May 24, 2011 03:05 AM by kctt
Jul 07, 2009 04:07 PM | JamesRLM
I hope this is the correct forum to post in; I wasn't sure whether this belonged in Troubleshooting or IIS7 FastCGI.
I do hope someone can help me, as I'm going completely crazy trying to troubleshoot this one. The server is running Windows Server 2008 x64 with 24 GB of RAM, and IIS 7 is running PHP 5.2.9 NTS using FastCGI.
In a nutshell, we have quite a few sites on the server (we are a small web hosting company in the middle of a migration from older hardware to this new box), and I am seeing repeated HTTP 500 errors on a random but frequent basis. The PHP pages causing the 500 errors run absolutely fine 95% of the time, but when the problems do occur they appear to be load-related. I have switched on FRT (Failed Request Tracing) for the site in question but, very worryingly, I am getting a multitude of different error codes, very few of which seem to have been discussed in detail on these boards, and with seemingly no correlation between them.
Some of the FRT errors and recent timings are as follows:
63. FASTCGI_UNEXPECTED_EXIT
64. SET_RESPONSE_ERROR_DESCRIPTION
Warning ErrorDescription="D:\PHP\5299nts\php-cgi.exe - The FastCGI process exited unexpectedly" 15:35:57.741
65. MODULE_SET_RESPONSE_ERROR_STATUS
Warning ModuleName="FastCgiModule", Notification="EXECUTE_REQUEST_HANDLER", HttpStatus="500", HttpReason="Internal Server Error", HttpSubStatus="0", ErrorCode="The operation completed successfully.
56. MODULE_SET_RESPONSE_ERROR_STATUS
Warning ModuleName="FastCgiModule", Notification="EXECUTE_REQUEST_HANDLER", HttpStatus="500", HttpReason="Internal Server Error", HttpSubStatus="0", ErrorCode="The specified network name is no longer available. (0x80070040)", ConfigExceptionInfo="" 15:21:33.486
61. FASTCGI_UNEXPECTED_EXIT
62. SET_RESPONSE_ERROR_DESCRIPTION
Warning ErrorDescription="D:\PHP\5299nts\php-cgi.exe - The FastCGI process exited unexpectedly" 15:38:04.352
63. MODULE_SET_RESPONSE_ERROR_STATUS
(0x0)", ConfigExceptionInfo="" 15:38:04.352
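To compare trace entries like these side by side, it can help to pull the fields out of each Warning line. A minimal Python sketch, assuming each entry has been copied as a single line of text in the `Key="value"` form shown above followed by a timestamp; `parse_frt_warning` is a hypothetical helper, not part of any IIS tool.

```python
import re

# Key="value" pairs as they appear in FRT Warning lines (values contain no quotes).
FIELD_RE = re.compile(r'(\w+)="([^"]*)"')
# HRESULT in parentheses at the end of an ErrorCode value, e.g. "(0x80070040)".
HRESULT_RE = re.compile(r'\((0x[0-9A-Fa-f]+)\)')
# Trailing hh:mm:ss.mmm timestamp.
TIME_RE = re.compile(r'(\d{2}:\d{2}:\d{2}\.\d{3})\s*$')


def parse_frt_warning(line: str) -> dict:
    """Split one FRT Warning line into module, status, HRESULT and time (hypothetical helper)."""
    fields = dict(FIELD_RE.findall(line))
    hr = HRESULT_RE.search(fields.get("ErrorCode", ""))
    ts = TIME_RE.search(line)
    return {
        "module": fields.get("ModuleName"),
        "status": int(fields["HttpStatus"]) if "HttpStatus" in fields else None,
        "hresult": int(hr.group(1), 16) if hr else None,
        "time": ts.group(1) if ts else None,
    }


if __name__ == "__main__":
    sample = ('Warning ModuleName="FastCgiModule", Notification="EXECUTE_REQUEST_HANDLER", '
              'HttpStatus="500", HttpReason="Internal Server Error", HttpSubStatus="0", '
              'ErrorCode="The specified network name is no longer available. (0x80070040)", '
              'ConfigExceptionInfo="" 15:21:33.486')
    print(parse_frt_warning(sample))
```

Grouping the parsed entries by `hresult` makes it easier to see whether the "multitude" of errors is really a few distinct failure types.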
As you can see, there are many different error codes, and the problem appears capable of affecting just about every page on the website, so no single PHP script is to blame.
In terms of frequency, once the errors start happening, they can occur multiple times per minute or as infrequently as every half hour. The PHP_FCGI_MAX_REQUESTS environment variable for the FastCGI application is set to 10000, and so is InstanceMaxRequests, as documented. Reducing these values so that FastCGI processes are recycled more frequently makes no difference; as far as I can tell, the errors are just as frequent.
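The two recycling settings live in the `fastCgi` section of applicationHost.config. As a rough sketch, the Python check below reads such a fragment and flags applications where PHP_FCGI_MAX_REQUESTS is lower than instanceMaxRequests; the usual guidance is to keep PHP's limit equal to or above the IIS one, so that php-cgi.exe does not exit on its own before IIS recycles it. The XML fragment and `check_recycling` are illustrative assumptions, not copied from this server.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Illustrative fastCgi fragment (assumed), mirroring the values described in the post.
SAMPLE = """
<fastCgi>
  <application fullPath="D:\\PHP\\5299nts\\php-cgi.exe"
               maxInstances="16" instanceMaxRequests="10000">
    <environmentVariables>
      <environmentVariable name="PHP_FCGI_MAX_REQUESTS" value="10000" />
    </environmentVariables>
  </application>
</fastCgi>
"""


def check_recycling(xml_text: str) -> list[tuple[str, int, int | None, bool]]:
    """Return (fullPath, instanceMaxRequests, PHP_FCGI_MAX_REQUESTS, ok) per FastCGI app.

    Assumes instanceMaxRequests is set explicitly on each <application> element.
    """
    root = ET.fromstring(xml_text.strip())
    # Accept either a bare <fastCgi> fragment or a larger document containing one.
    section = root if root.tag == "fastCgi" else root.find(".//fastCgi")
    if section is None:
        return []
    results = []
    for app in section.findall("application"):
        iis_max = int(app.get("instanceMaxRequests"))
        php_max = None
        for var in app.iter("environmentVariable"):
            if var.get("name") == "PHP_FCGI_MAX_REQUESTS":
                php_max = int(var.get("value"))
        # ok only if PHP's limit is set and not lower than IIS's recycling limit
        ok = php_max is not None and php_max >= iis_max
        results.append((app.get("fullPath"), iis_max, php_max, ok))
    return results


if __name__ == "__main__":
    for row in check_recycling(SAMPLE):
        print(row)
```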
I have installed and enabled DebugDiag with a crash rule for all instances of php-cgi.exe, as well as a hang rule configured to poll a page on the problem site every second, with an action to dump all process targets named php-cgi.exe if the page does not load successfully. Despite FRT continuing to log all the errors above, DebugDiag has not kicked in once, which makes me think that the issue is not down to PHP crashes.
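DebugDiag's hang rule is essentially an HTTP poll. To watch the same page independently (for example from another machine), a rough standard-library Python equivalent might look like this. The one-second interval mirrors the rule described above; the timeout, the slow-response threshold and the function names are assumptions for illustration.

```python
from __future__ import annotations

import time
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 30.0) -> tuple[bool, int | None, float]:
    """Fetch url once and return (ok, http_status, seconds_taken)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
            status = resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses, such as the 500s logged by FRT
        status = err.code
    except OSError:
        # connection refused/reset or timed out: no HTTP status at all
        status = None
    return status == 200, status, time.monotonic() - start


def watch(url: str, interval: float = 1.0, count: int = 60, slow_s: float = 5.0) -> None:
    """Poll url every `interval` seconds; print failures and slow responses."""
    for _ in range(count):
        ok, status, took = probe(url)
        if not ok or took > slow_s:
            stamp = time.strftime("%H:%M:%S")
            print(f"{stamp} status={status} took={took:.2f}s")
        time.sleep(interval)
```

For example, `watch("http://localhost/problem-page.php", count=3600)` would poll for about an hour; the page path is a placeholder. Logging elapsed time as well as status means slow requests show up even when they eventually succeed.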
PHP logging is on, but nothing at all is logged for this site, so that is no help whatsoever.
Our MySQL server sits on the same box and so is accessed via localhost. Server CPU utilisation is very low, no more than 5-10% even when our sites are quite busy.
One other very interesting fact: we currently run a pool of 16 FastCGI processes (instances of php-cgi.exe), and reducing this in testing to 4 or even 8 results in a *massive* slowdown of the problematic website. Interestingly, though, the CPU utilisation of each php-cgi.exe process remains low at only 1-2%, which doesn't seem to correlate with the slowdown of the site. Also, MySQL's processor usage jumps from its usual 2-3% up to 6-7%, which makes even less sense.
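One possible reading of the slowdown with a smaller pool despite low CPU: each php-cgi.exe instance serves one request at a time, so if requests spend most of their time waiting (on MySQL, disk or the network) rather than computing, the pool runs out of free processes long before the CPU is busy. By Little's law, the average number of busy processes is roughly the request rate times how long each request holds a process. The Python sketch below works through that arithmetic; the load figures and the 70% target are made-up placeholders, not measurements from this server.

```python
import math


def busy_processes(requests_per_sec: float, avg_hold_s: float) -> float:
    """Little's law: average requests in service = arrival rate x time each holds a process."""
    return requests_per_sec * avg_hold_s


def utilisation(requests_per_sec: float, avg_hold_s: float, pool_size: int) -> float:
    """Average fraction of the pool in use; values near or above 1.0 mean requests wait."""
    return busy_processes(requests_per_sec, avg_hold_s) / pool_size


def min_pool_size(requests_per_sec: float, avg_hold_s: float, target: float = 0.7) -> int:
    """Smallest pool that keeps average utilisation at or below `target`."""
    return math.ceil(busy_processes(requests_per_sec, avg_hold_s) / target)


if __name__ == "__main__":
    # Hypothetical load: 20 requests/s, each holding a process for 0.3 s,
    # mostly spent waiting rather than using CPU.
    rate, hold = 20.0, 0.3
    for pool in (4, 8, 16):
        print(f"pool={pool:2d} utilisation={utilisation(rate, hold, pool):.0%}")
    print("suggested minimum pool:", min_pool_size(rate, hold))
```

Under these assumed numbers a pool of 4 is oversubscribed while 16 is comfortable, even though no process ever uses much CPU.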
I'm extremely concerned that the issue may lie elsewhere in the web stack, perhaps with MySQL or PHP itself, since some of the failed requests seem to take between 8 and 120 seconds. This is bizarre, since the server is extremely fast and when pages do load, they do so instantly. Without more information from PHP, though, I can't tell where the issue lies, or whether we are being held up by some random fault with MySQL under load. We have other customer sites on the same box using MySQL at the same time, and they are not logging anything with FRT switched on; in other words, our MySQL server is still responding to them seemingly quite happily.
Does anyone have any ideas at all? To say that this is a disaster would be an understatement, as I am in the middle of trying to complete a migration to this new hardware. Everything appeared to be running stably until these errors began cropping up. I am now under immense pressure to get this resolved ASAP, and it seems the harder I dig for information, the less I find.
Please let me know if there is any further information I can post to aid in diagnostics.
Very many thanks in advance.
Jul 07, 2009 07:48 PM | JamesRLM
Just a further follow-up to say that, having upgraded PHP to the latest 5.2.10 build, things seem (very slightly) more stable.
I'm still seeing the following in FRT every half hour or so:
57. Warning - MODULE_SET_RESPONSE_ERROR_STATUS
HttpReason Internal Server Error
ErrorCode The I/O operation has been aborted because of either a thread exit or an application request. (0x800703e3)
I still have absolutely no clue as to what could be causing one request in thousands on a busy website to fail like this. The request above was timed as having taken 129793 msec on a page which normally executes in fractions of a second.
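The hex codes in these traces are HRESULTs wrapping Win32 error numbers (facility 7, FACILITY_WIN32), so 0x800703e3 is Win32 error 995 (ERROR_OPERATION_ABORTED) and the 0x80070040 from the first post is error 64 (ERROR_NETNAME_DELETED). A small Python sketch of the decoding, using the same bit layout as the HRESULT_FACILITY and HRESULT_CODE macros in winerror.h; the name table holds only the codes seen in this thread.

```python
# Only the Win32 codes that appear in this thread; extend as needed.
WIN32_NAMES = {
    0: "ERROR_SUCCESS",
    64: "ERROR_NETNAME_DELETED",
    995: "ERROR_OPERATION_ABORTED",
}
FACILITY_WIN32 = 7


def decode_hresult(hr: int) -> tuple[bool, int, int]:
    """Return (failed, facility, code) using the winerror.h bit layout."""
    failed = bool(hr & 0x80000000)  # severity bit
    facility = (hr >> 16) & 0x1FFF  # HRESULT_FACILITY
    code = hr & 0xFFFF              # HRESULT_CODE
    return failed, facility, code


def describe(hr: int) -> str:
    """Human-readable summary of an HRESULT taken from an FRT ErrorCode."""
    failed, facility, code = decode_hresult(hr)
    if facility == FACILITY_WIN32 or hr == 0:
        return f"0x{hr:08X}: Win32 error {code} ({WIN32_NAMES.get(code, 'unknown')})"
    return f"0x{hr:08X}: facility {facility}, code {code}, failed={failed}"


if __name__ == "__main__":
    for hr in (0x800703E3, 0x80070040, 0x0):
        print(describe(hr))
```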
Any help *very* gratefully accepted - thanks so much :)
Jul 07, 2009 08:31 PM | anilr
That just looks like the server timing out the client, and the time-taken also corresponds to the default connection timeout of 120 seconds. Is that entry in Failed Request Tracing preceded by either a READ_ENTITY or FLUSH event that fails with the same error code?
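This observation can be turned into a quick check: compare each failed request's time-taken with the site's connection timeout (120 seconds by default in IIS 7, set by the connectionTimeout attribute of the site's limits element). A minimal Python sketch; `classify_aborted` is a hypothetical helper, and the "aborted early" label is only a guess at the cause (for instance, a client that disconnected).

```python
DEFAULT_CONNECTION_TIMEOUT_S = 120  # IIS 7 default connectionTimeout (00:02:00)


def classify_aborted(time_taken_ms: int, timeout_s: int = DEFAULT_CONNECTION_TIMEOUT_S) -> str:
    """Rough label for a request that failed with 0x800703e3 (ERROR_OPERATION_ABORTED)."""
    if time_taken_ms >= timeout_s * 1000:
        # the request ran at least as long as the connection timeout
        return "connection timeout"
    # ended well before the timeout, e.g. possibly a client disconnect
    return "aborted early"


if __name__ == "__main__":
    # 129793 ms is the time-taken reported in the follow-up post above.
    print(classify_aborted(129793))
```

If the site uses a non-default connectionTimeout, pass that value as `timeout_s` instead of relying on the default.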
Jul 08, 2009 01:05 AM | JamesRLM
Gosh, you're right; I can't believe I didn't see that before now. I think it's a case of looking at an issue for so long that you end up not being able to see the wood for the trees!
So, excuse the ignorant question, but I assume these are quite naturally caused by some sort of communication error with a user's web browser or the general vagaries of the internet, for example? In other words, is this normal behaviour?
If the answer to the above is 'yes', are the original error codes I posted at the top of the thread also symptomatic of the same behaviour?
Certainly the newer PHP version does seem to have stabilised things, in that I am now only getting the 0x800703e3 error codes where I was getting all sorts before.
Any other background information you can provide would be great, as I just want to fully understand the causes and make sure our PHP environment is stable before moving more customer sites to this new server.
One last question: do you recommend that an app pool serving only PHP via FastCGI be run in Classic or Integrated mode?
Thanks so much again for your time and expertise in this :)
Jul 08, 2009 05:36 PM | anilr
Originally, you had errors indicating both PHP crashes and user timeouts. I don't know why DebugDiag did not catch any crashes; maybe the process was terminating silently. It seems you have got rid of the PHP crashes by upgrading PHP, and you are left with client timeouts, which may or may not be a problem: it may just be that the clients are on a really slow connection and it is not your problem at all.
Jul 08, 2009 05:57 PM | JamesRLM
Thanks Anil, that's really helpful and potentially a great relief. The only remaining worry I have is that some of the FRT logs indicate 0x800703e3 errors with much lower request times. I have one here marked as taking only 31 msec:
The I/O operation has been aborted because of either a thread exit or an application request. (0x800703e3)
Internal Server Error
Sorry to seem paranoid, but given the above, and the fact that we haven't hit anything like the connection timeout, do you think this could still be the fault of the clients? I have no categorical proof that our visitors are actually seeing 500 errors anymore; it could simply be poorly behaved search spiders, for example.
Thanks a million again for all your help :)
Jul 15, 2009 11:08 PM | anilr
I wouldn't depend on the time shown in that view; that part of the XSL has known bugs. Click on the compact view to be sure you see the correct time. Assuming the time is shown correctly, this could just be an impatient client sending the request and disconnecting immediately, or disconnecting by the time the request processing was done.
May 09, 2011 07:11 PM | icm76
My server seems to show similar behavior, but I do not think the error occurs because of a timeout: sending a file of ~9 MB seems to take ~2 s, but the error still appears. Here is a link to my tracing file: . Can you help me?
PHP5.3.5 NTS, Win2008 R2 Standard x64, FastCGI.
Also, my app (Moodle) died completely every now and then; that's why I activated tracing.
Here is my post on Moodle forums: http://moodle.org/mod/forum/discuss.php?d=174802#p766660
May 19, 2011 07:20 PM | HCamper
This is a very old thread and is not getting updates or replies.
If you're having timeout errors, check the FastCGI forum.
Also look at the configuration options and values described in the FastCGI Extension guide:
http://learn.iis.net/page.aspx/248/configuring-the-fastcgi-extension-for-iis-60/
If the guides do not resolve the issues, it is best to open a new thread with information about the IIS server and Windows system.
May 24, 2011 03:05 AM | kctt
What's the error when your app fails? Internal server error 500 or something else?