4 load generators set to 25 hits per second
Each load generator hits a custom png that is 7KB in size via http://127.0.0.1/welcome2.png (this removes networking and DNS as performance factors)
Record request times into CSV
Results:
Around 200,000 requests
Avg and Median response time are effectively 1ms (LoadUI is only timing to the millisecond)
300+ requests with response time between 2 and 9 ms (inclusive)
30+ requests with response time 10ms or over (some as high as 60ms)
SamRowe
7 Posts
Spikes in latency
Jun 14, 2012 04:50 PM
Hi All,
I am running a web farm of IIS7 on Server 2008 Web edition. We max out at 150 hits/sec at peak times load balanced across 4 servers.
One of our customers is complaining they are getting time outs and slow responses to one of our web services so I have been investigating.
I started by looking through the logs and found that most requests are serviced in sub 120ms but some hang on for seconds!
I followed up by running my own load tests from an external linux VPS using httping. This confirmed the behavior our client and the logs were showing.
To track down the issue I used PowerShell to write a quick test script* so I could test behind the firewall; all of the following figures are based on the script looping through 10,000 requests. I got the same behavior but less pronounced (some requests taking 1000+ms, with an average of 18ms).
I thought maybe it was the load balancer, so I ran the script on the web servers (one by one); same behavior, less pronounced again (long requests taking 25+ms, average 1ms).
I reduced complexity and ran it against localhost with the default website; same behavior, same times as above.
I took it to the test lab, a clean(ish) install of 2k8 with the default website in IIS7; same results.
I am now perplexed. I have looked online for ideas on why I am getting seemingly random high latency spikes. I am sure that other things in my network, such as load balancers etc., are amplifying the issue externally, but to begin with I aim to get my maximum request times in my lab to the same order of magnitude as my average.
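*SCRIPT WARNING
# Time 10,000 sequential downloads and report any that take longer than 10 ms.
$client = New-Object System.Net.WebClient
$slow = 0
for ($i = 1; $i -le 10000; $i++)
{
    $time = (Measure-Command { $client.DownloadFile("http://localhost/", "C:\data\index.tmp") }).TotalMilliseconds
    if ($time -gt 10)
    {
        $slow++
        Write-Output "Time over 10 ms: $time"
    }
}
Write-Output "Slow requests: $slow of 10000"
/SCRIPT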
Obviously if you Tee-Object -FilePath the output of Measure-Command to a text file, you can put it in a spreadsheet and analyse it.
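As an alternative to the spreadsheet step, the same kind of breakdown can be sketched directly in PowerShell. This is only a minimal sketch: the function name `Get-LatencySummary`, its parameters and the sample values are hypothetical, and the 99th percentile uses a simple nearest-rank rule.

```powershell
# Hypothetical helper: summarise a list of request times given in milliseconds.
function Get-LatencySummary {
    param(
        [double[]]$Times,           # one entry per request, in ms
        [double]$ThresholdMs = 10   # same 10 ms cut-off the test script uses
    )
    $sorted = @($Times | Sort-Object)
    $n = $sorted.Count
    if ($n -eq 0) { return }

    # Median: middle value, or the mean of the two middle values.
    $mid = [int][math]::Floor($n / 2)
    if ($n % 2) { $median = $sorted[$mid] }
    else        { $median = ($sorted[$mid - 1] + $sorted[$mid]) / 2 }

    # 99th percentile by nearest rank.
    $p99Index = [int][math]::Ceiling(($n * 99) / 100) - 1

    $stats = $sorted | Measure-Object -Average -Maximum
    New-Object PSObject -Property @{
        Count         = $n
        AverageMs     = $stats.Average
        MedianMs      = $median
        P99Ms         = $sorted[$p99Index]
        MaxMs         = $stats.Maximum
        OverThreshold = @($sorted | Where-Object { $_ -gt $ThresholdMs }).Count
    }
}

# Made-up sample values, purely to show the call.
Get-LatencySummary -Times 1, 1, 1, 2, 25
```

Comparing the median and the maximum side by side makes the gap between typical and worst-case requests visible without leaving the shell.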
I've been getting a quick break down out of Excel that looks like this.
Any ideas on things I could monitor, tune, etc.?
Regards
*PS I vote for an upgrade of the WYSIWYG editor so that it works in Opera.
owjeff
680 Posts
Re: Spikes in latency
Jun 15, 2012 02:29 PM
Given the dataset it sounds like it could be something between the load balancers and the backend servers (if the servers max response is 25ms but the load balancer test maxes at 1000ms, that's a big jump). Are you using ARR for load balancing? If so, try turning on Failed Request Tracing. You'll probably also want to run network traces between the web server and load balancer to see if it's something lower level (like a ton of retries).
SamRowe
7 Posts
Re: Spikes in latency
Mar 04, 2013 10:53 AM
I have come back to this issue with a new set of tools and a new method that has reproduced similar results.
Setup:
One server (Quad core i5 with 8GB RAM and SSD) running a clean install of Windows Server 2008 std
Method:
LoadUI project:
Run this LoadUI project for 30 minutes (I can provide an export of my project if anyone wants to test)
While running the test the machine has the network cable removed, to reduce the impact of network IO causing interrupts
All windows minimised, to reduce the CPU load of rendering LoadUI
System locked (Winkey + L) to ensure no one tampers with the system while running the test.
Results:
This shows 0.165% of requests being double or greater than the avg response time.
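The 0.165% figure can be reproduced from the counts reported for this run (around 200,000 requests, 300+ between 2 and 9 ms, 30+ at 10 ms or over). A minimal sketch, assuming the "300+" and "30+" counts are taken as exactly 300 and 30 and the total as exactly 200,000:

```powershell
# Counts from the results; the "+" figures are treated as lower bounds.
$totalRequests = 200000
$between2And9  = 300   # requests taking 2 to 9 ms
$tenOrOver     = 30    # requests taking 10 ms or more

# Share of requests at double the 1 ms average or worse, as a percentage.
$percentSlow = ($between2And9 + $tenOrOver) / $totalRequests * 100
[math]::Round($percentSlow, 3)
```

Because the counts are lower bounds, the true share is at least this value.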
Request to the community:
Can anyone suggest how to tune IIS 7 so that we consistently get 1-2ms requests on localhost with static content? We could then apply these techniques to our dynamic web service and attempt to remove these time-outs, which I am sure are exaggerations of the underlying slow responses.
Rovastar
3321 Posts
MVP
Moderator
Re: Spikes in latency
Mar 04, 2013 11:24 AM
I expect there is something else going on rather than in IIS.
Maybe writing things to log files slows things slightly, etc.
It could be anything; a few extra milliseconds is nothing to be too concerned about. Surely that is within the margin of error for an app. The app itself might not respond instantly 100% of the time.
SamRowe
7 Posts
Re: Spikes in latency
Mar 04, 2013 12:11 PM
Thank you for the response however....
Originally I thought it might be another process that was slowing down our live service, which is why I have pared down my test bench to a fresh install of Server 2008 std with only the IIS role, Java and the testing software installed.
To test the theory that it may be CPU bound, I set an affinity for the whole of IIS to CPU 0 and CPU 1 while limiting Java to CPU 2 and CPU 3; however, the same pattern of results emerged.
I have disabled IIS logging and I am using an SSD to mitigate the time taken to log the output data from LoadUI. I did a run without any logging at all in IIS or LoadUI and it also showed signs of latency spikes; however, I couldn't tell how frequently because I was only graphing the data in real time, not saving the raw data.
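One way to keep the raw data without relying on LoadUI or IIS logging is to record each request with a timestamp and write the lot out at the end. The following is a rough sketch, not a tested load tool: `Measure-RequestTimes` and its parameters are hypothetical names, and the URL and CSV path in the commented usage are placeholders.

```powershell
# Hypothetical helper: run an action repeatedly and emit one timing record per run.
function Measure-RequestTimes {
    param(
        [scriptblock]$Action,   # the request (or any operation) to time
        [int]$Count = 10000
    )
    $watch = New-Object System.Diagnostics.Stopwatch
    for ($i = 1; $i -le $Count; $i++) {
        $started = Get-Date
        $watch.Reset()
        $watch.Start()
        & $Action | Out-Null    # discard the response body
        $watch.Stop()
        New-Object PSObject -Property @{
            Request   = $i
            Started   = $started.ToString('o')          # round-trip timestamp
            ElapsedMs = $watch.Elapsed.TotalMilliseconds
        }
    }
}

# Usage (placeholders; needs a web server listening on localhost):
# $client = New-Object System.Net.WebClient
# Measure-RequestTimes -Action { $client.DownloadData('http://localhost/') } |
#     Export-Csv -Path 'C:\data\timings.csv' -NoTypeInformation
```

Holding the records in the pipeline and exporting once at the end keeps disk writes out of the timed section, and the timestamps make it possible to count how often the spikes occur afterwards.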
I may try running remote load testers and putting the server back onto the network, to remove the possibility that Java or LoadUI is interfering with the results; however, this will reinstate the possibility that it is network latency.
It concerns me because this is only testing static content over localhost, so I would expect very little variance in response time.
Our live environment is time sensitive and much more complicated. It serves dynamic web services throughout Europe with time-outs of 1 second. Most requests are well within this time, being sub 200ms outside the UK and sub 100ms within the UK. However, we have been investigating because no matter how much we improve the network, load balancers, application code and firewalls, we still have a proportion that are slower. We monitor at several stages inside our infrastructure and can see that a response from IIS that has a 30ms latency between it and the balancers has a customer response time as indicated above; when a response from IIS to the balancers takes 300ms, this is exaggerated to the customer to above our 1 second time-out. These responses, at 10 times the normal compute time, happen at roughly the same frequency as our experimentation with static content would suggest.
Rovastar
3321 Posts
MVP
Moderator
Re: Spikes in latency
Mar 04, 2013 05:47 PM
If you really want to look into this more, I would first look at Failed Request Tracing to see if it thinks there are bottlenecks.
But my real point is that even when you think a computer is doing nothing, it is doing something. It can be one of these things.
Run Process Monitor if you *really* want to know what is occurring. But you will be in for a heavy ride.
SamRowe
7 Posts
Re: Spikes in latency
Mar 05, 2013 01:24 PM
I really want to get to the bottom of this.
Today I have run Failed Request Tracing and all the delay seems to be in
GENERAL_RESPONSE_ENTITY_BUFFER
A quick Bing of this shows others are having similar issues, but without any further comment on how to proceed: http://forums.iis.net/t/1179627.aspx/1
Full process explorer (Sysinternals) shows 34 processes not including procexp.exe.
I am going to set the affinity of all processes such that IIS and its workers are the only processes on one core, and give IIS a higher priority to see if that makes a difference. I'll post my findings tomorrow.
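For anyone wanting to script this kind of experiment rather than click through Process Explorer, here is a rough sketch. It assumes the .NET `Process.ProcessorAffinity` and `Process.PriorityClass` properties, an elevated prompt, and a four-core machine; the function names and the choice of core 3 for w3wp are illustrative, and some system processes will refuse the change.

```powershell
# Hypothetical helper: turn zero-based core numbers into an affinity bitmask.
function Get-AffinityMask {
    param([int[]]$Cores)
    $mask = 0
    foreach ($core in $Cores) {
        # 2^core sets the bit for that core (fine for core numbers up to 30).
        $mask = $mask -bor [int][math]::Pow(2, $core)
    }
    $mask
}

# Hypothetical helper: pin w3wp to one set of cores and everything else to another.
function Set-WorkerIsolation {
    param([int[]]$WorkerCores, [int[]]$OtherCores)
    $workerMask = Get-AffinityMask -Cores $WorkerCores
    $otherMask  = Get-AffinityMask -Cores $OtherCores
    foreach ($p in Get-Process) {
        try {
            if ($p.ProcessName -eq 'w3wp') {
                $p.ProcessorAffinity = $workerMask
                $p.PriorityClass = 'High'
            } else {
                $p.ProcessorAffinity = $otherMask
            }
        } catch {
            # Protected processes (System, Idle and similar) cannot be changed.
            Write-Warning "Skipped $($p.ProcessName): $($_.Exception.Message)"
        }
    }
}

# Usage (changes live process settings, so it is left commented out):
# Set-WorkerIsolation -WorkerCores 3 -OtherCores 0, 1, 2
```

With these arguments the worker mask is 8 (core 3 only) and the mask for everything else is 7 (cores 0, 1 and 2). Affinity set this way lasts only for the lifetime of each process, so a recycled w3wp would need it applied again.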
SamRowe
7 Posts
Re: Spikes in latency
Mar 06, 2013 10:10 AM
I have tried the processor affinity experiment and a curious result has occurred.
I set the affinity such that W3WP was (to the extent that you can with procexp) the only process with affinity to cpu3; everything else was given affinity to cpu0+1+2.
While watching the latency in real time against the CPU graph, I can see that at the moment the latency spikes, the CPU spikes too, and half of that CPU spike is red, i.e. System (kernel) time. Furthermore, at 100 requests per second there is a definite periodicity to these spikes of around 45 seconds. When I change the requests per second the period changes: with more requests per second the period is shorter, with fewer it extends.
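A period that shrinks as the request rate rises is at least consistent with something being triggered every fixed number of requests, although that is only one possible reading and is not confirmed anywhere in this thread. Under that assumption, 45 seconds at 100 requests per second works out to roughly 4,500 requests per spike. A small sketch for checking this against recorded spike times (the function name and the sample timestamps are illustrative, not measured data):

```powershell
# Hypothetical helper: mean gap between spikes, and the request count that gap implies.
function Get-SpikePeriod {
    param(
        [double[]]$SpikeTimesSec,   # seconds since test start at which a spike was seen
        [double]$RequestsPerSec     # steady request rate during the test
    )
    if ($SpikeTimesSec.Count -lt 2) { return }
    $sorted = @($SpikeTimesSec | Sort-Object)

    # Gaps between consecutive spikes.
    $gaps = for ($i = 1; $i -lt $sorted.Count; $i++) { $sorted[$i] - $sorted[$i - 1] }
    $meanGap = ($gaps | Measure-Object -Average).Average

    New-Object PSObject -Property @{
        MeanPeriodSec    = $meanGap
        RequestsPerSpike = $meanGap * $RequestsPerSec
    }
}

# Illustrative timestamps only: a spike every 45 s at 100 requests per second.
Get-SpikePeriod -SpikeTimesSec 45, 90, 135, 180 -RequestsPerSec 100
```

If `RequestsPerSpike` stays roughly constant across runs at different request rates, that would point towards a count-based trigger; if it varies widely, a time-based cause becomes more likely.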