IIS 5 & IIS 6
unexplained 5-second delay added to many requests
Last post Apr 21, 2008 07:46 AM by Rovastar
Apr 16, 2008 08:54 AM|Chris J.|LINK
We have been experiencing serious problems with IIS6 in the past week whereby most or all requests to some of our web sites are taking exactly 5 seconds longer than normal (or some multiple of 5 seconds). These values are read from the time-taken value
in the IIS logs and are also measurable from the end-user perspective.
The 5-second delay will randomly begin during a high-traffic time over a period of 45 minutes or so and then last continuously for 12 hours or more before stopping spontaneously. Sometimes no events in the event log are associated with the start and stop.
The problem is more frequent with our brand new, faster servers. Recycling app pools has no effect on the problem.
We have been spending an enormous amount of time and effort over the past few months in trying to track this and related problems down. We are starting to suspect ISAPI_Rewrite 3 -- which we are using very lightly (but in a crucial way nevertheless).
Can anyone think of anything that would cause exactly 5 seconds to be added to the handling of requests? Below is an excerpt of a histogram of time-takens from our IIS logs:
iis 6.0 time-taken 5-second delay isapi rewrite
Apr 16, 2008 10:51 AM|Rovastar|LINK
OK some troubleshooting questions to get a better understanding.
Do *all* the pages take a long time in this problem time period, certain types of pages aspx, etc, complex pages?
Do you ever get long timetaken at other times or is it always ion these 12 hour (or so) bursts?
any win32 error code status ?
Remember Timetaken only logs how long the user takes to receive all the file. It could just be a slow/bad connection.
So from the users click on the page at IIS log timestamp - timetaken.
Is anything occuring at that stage?
Any other tiers invloved (e.g. going to an app server or writing to a database)? [Explain your infrastructure]
I have troubleshooted this type of problem before and the COM layer would be slow or the com layer would be quick and the writing to the database would be slow etc.
If you have any logging of those applications you can see if there are network issues in teh process. I have seen when a pages take 30 seconds but the IIS layer is quick and when the info get to the app layer it is subsecond and the database layer is subsecond. It
is a network issue between the layers.
You need to make sure what elements of these pages are slow.
Even if your infrastucture is not tiered like this you can use the technique. Write timestamps in a problem page to the database at key points. It gives a better picture of what is happening.
The only really way to prove it is internal is to replace the problem on your internal internet network.
It could be the ISAPI filters? Is there really no way of turing them off.
Anything in httperr log?
"The 5-second delay will randomly begin during a high-traffic time over a period of 45 minutes or so and then last continuously for 12 hours or more before stopping spontaneously. Sometimes no events in the event log are associated with the start and stop."
What is stop and starting? I presumed you meant the behaviour starts and then stops why would this be recorded in the event log. What events are recorded in the event viewers at the time. (time in logs - timetaken remember)
Run performance monitor and monitor what the connections, traffic, etc. Look for different behaviour from your benchmarks you have for normal/ok usage.
Still having problems then use a packet sniffer like netmon.
Apr 16, 2008 12:41 PM|Chris J.|LINK
When it happens, it affects all or nearly all requests to most web apps on the machine. This includes gifs, jpegs, js, etc -- and even 404's to images that don't exist. We have two web sites on each machine that are near duplicates of one another. They
only differ in the list of server/host name bindings. The one with the catch-all binding seems to be affected more while the one with an explicit list of server bindings is not affected. (The one with the catch-all sees many more requests come in.)
Twice I have observed the following application event log message around the time that the slow-downs began:
"A process serving application pool '<app pool name>' suffered a fatal communication error with the World Wide Web Publishing Service. The process id was '5208'."
Apr 21, 2008 07:46 AM|Rovastar|LINK
Well I would if possible, Swap the host bindings around. That way you could determine if it is a problem with the bnding issue you have noticed. I supect that could be a red herring though.
I am curious about what is being sent out and what delays you get.
You need to establish if when a request comes in IIS is slow at reponding to the request. A packet sniffer shoudl help this.
It is also important to not if the same errors come from internal as well as externally. There could be a routing/traffic issue as timetaken only shows when the client receives it.
It coudl be a Disk I/O bottleneck (or network bottle or anything else) so check the performance monitor for this.