ARR Performance with File Uploads and Downloads
Jun 23, 2017 04:34 PM | tolian49
We have 2 ARR servers in an NLB cluster which route through server farms to 2 back-end servers. All servers are Windows Server 2016 / IIS 10.
We're facing very significant performance problems when sending file data to the ARR server(s) and having them push it to the back-end for storage/manipulation of that data. These are web requests which send data using the file field element of a standard HTML form.
If we go straight to the back-end servers to make the request, which are local to us right now, we see network activity between 500 and 750 Mbps. Being on a gigabit network this makes sense. If we send the same request to the ARR servers via the virtual IP from the NLB node, then the inbound network activity in Resource Monitor looks to be maxing out at about 2 Mbps on the NIC, and the outbound to the back-end server appears to be between 1 and 2 Mbps.
The ARR and back-end servers are on the same network within a VMware environment, spread between 2 hosts currently.
As you can imagine this is making the site pretty tough to use.
We see this same behavior with both file uploads over https and file downloads. Going straight to the back-end server is fast in both cases.
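To put hard numbers on what Resource Monitor shows, per-NIC throughput can also be logged from the command line with the built-in performance counters (the sample interval, count, and output path here are just placeholders):

    rem log NIC send/receive rates once a second for two minutes
    typeperf "\Network Interface(*)\Bytes Received/sec" "\Network Interface(*)\Bytes Sent/sec" -si 1 -sc 120 -f CSV -o C:\temp\nic-throughput.csv

(The Bytes/sec values need multiplying by 8 to compare against the Mbps figures above.)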
We've tried disabling IPv6, we've double-checked MTU = 1500, and we've set the proxy response buffer threshold to 0 to test as well. I've also been in applicationHost.config to change the <webLimits /> declaration to include a limit of 0, but that didn't make an impact. I've tried editing that within the IIS config editor too, but it won't let me change the default maxGlobalBandwidth from 4294967295 to 0 because it's out of range, even though 0 is listed as a minimum in the declaration. I then changed it directly in the text file to make the change stick. Still no improvement.
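For reference, the same settings can be checked and set from the command line instead of hand-editing applicationHost.config; this is just a sketch assuming the default inetsrv path (the ARR proxy attribute names come from the ARR schema, so it's worth dumping them with the list command first):

    rem show the current server-wide bandwidth limit (4294967295 = effectively unlimited)
    %windir%\system32\inetsrv\appcmd.exe list config -section:system.applicationHost/webLimits
    rem set it back to the default explicitly
    %windir%\system32\inetsrv\appcmd.exe set config -section:system.applicationHost/webLimits /maxGlobalBandwidth:4294967295 /commit:apphost
    rem dump the server-level ARR proxy settings (buffer thresholds etc.) to see what is actually in effect
    %windir%\system32\inetsrv\appcmd.exe list config -section:system.webServer/proxy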
The requests are for large files of >100 MB, but everything is configured for large files, and these files will generally finish with success. It's just taking a tremendous amount of time to get them to move. Smaller files of 5 MB or so will definitely transfer after a couple of minutes and everything works as expected. It just should be super fast and it's not.
I've not had any real luck gathering useful data from Failed Request Tracing. The requests do finish ultimately, so I'm not sure where to look for anything useful about what the bottleneck is.
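For what it's worth, Failed Request Tracing can also flag slow-but-successful requests if the failure definition is based on time taken rather than status code; a rough sketch (the 30-second threshold and the trace areas are just assumptions to adjust):

    <system.webServer>
      <tracing>
        <traceFailedRequests>
          <add path="*">
            <traceAreas>
              <!-- RequestRouting is the trace area ARR adds to the WWW Server provider -->
              <add provider="WWW Server" areas="Rewrite,RequestRouting" verbosity="Verbose" />
            </traceAreas>
            <!-- trace any request, including 2xx responses, that takes longer than 30 seconds -->
            <failureDefinitions timeTaken="00:00:30" statusCodes="200-299" />
          </add>
        </traceFailedRequests>
      </tracing>
    </system.webServer>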
Not sure what we can do next. Any ideas?
Jun 25, 2017 12:28 PM | Rovastar
Jun 26, 2017 02:06 PM | tolian49
All tests are going through the same network, yes. The ARR servers in NLB are on the same subnet as the back-end servers, and we're doing all tests on an internal gigabit network. Amazing to see 1.6 Mbps on a gig network.
Interestingly, we also see severe performance degradation if we send data to the ARR servers NOT over https. If I send an https request for a simple page it responds right away. If I send an http request for the same page it may take 15 seconds or more and then respond. If I send http or https to the back-end then it responds perfectly. No clue what's causing that one either... it may be related, I suppose. Something not routing quite right.
Jun 26, 2017 05:10 PM | Rovastar
That might be more telling.
I would be looking at what the network traffic is doing. Spin up Wireshark/NetMon and compare the traffic through ARR and without (it might be easier with the ...
Can the ARR server ping the back-end servers OK? (Leave it running for a while.)
I've experienced weirdness with VMs under Hyper-V periodically spiking pings, and that was not related to the IIS/ARR setup.
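If installing Wireshark on those boxes is a pain, a built-in alternative is an ETW packet capture plus a sustained ping; the file paths and back-end host name here are placeholders:

    rem capture packets on the ARR server while reproducing a slow transfer
    netsh trace start capture=yes tracefile=C:\temp\arr-slow-upload.etl maxsize=512
    rem ... reproduce the upload/download, then stop the capture
    netsh trace stop

    rem sustained ping from the ARR server to a back-end server to watch for spikes or drops
    ping -t backend01 > C:\temp\ping-backend01.log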
I would also try ARR with and without the NLB.
Stuff like this is not easy to solve and your best bet is to eliminate as many options as possible.
I am not using Windows 2016 much myself yet, so I suppose it could be platform specific. But there is no reason I can think of that would limit traffic like this from a vanilla configuration.
Hope that is enough to go on for now.
Jul 05, 2017 02:22 PM | tolian49
To provide an update: we've been doing a lot of testing and we've narrowed it down to a problem with NLB using multicast on Cisco Nexus 9k switches over a VMware setup. Essentially we're losing a lot of packets. The way NLB broadcasts traffic, versus other ways of keeping two servers synced up behind a virtual IP, seems to be a real problem in the virtual environment. There are fixes, but they entail broadcasting the traffic heavily to all the physical NICs that the hosts use. We'd be sending a lot of useless traffic constantly that way, so it looks like we'll be removing NLB and relying only on ARR for the time being, and then having to explore options for doing load balancing and application routing through other tech.
Jul 05, 2017 02:54 PM | Rovastar
Yeah, NLB over Nexus kit seems to be an issue (from the default config at least).
As soon as you mentioned Cisco I thought about NLB issues. (I would even look for NLB and Catalyst switches or any other Cisco kit; there is a load of info out there, and hopefully one config can work for you.)
It is not really my area anymore to know whether configs on the network kit can get it working or not. But I seem to remember that many years ago unicast was used instead of multicast when NLB had issues.
Hope that helps.
Jul 05, 2017 08:25 PM | tolian49
Yeah, unicast is an option; unfortunately it means flooding those ports, which do a lot more than just web services since they're our VMware hosts... Plus it causes trouble with vMotion if we hard-assign to the ports and not at the VLAN level...
That said I'm still seeing some performance degradation I've got to solve.
Transfer of a 1.7 GB file to different configs:
1. To a single ARR front-end, no NLB, which does the SSL offload and sends data to the back-end: 340 Mbps
2. Directly to the back-end web server, no SSL: 780-800 Mbps consistently
3. To a stand-alone web server, with SSL: 560 Mbps
All web servers have the same software and versions handling the uploaded file.
Does it make sense that SSL offloading takes on the SSL overhead and then slows the transfer down even further? 340 Mbps is still reasonable, just not great.
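For a rough sense of scale, and assuming the link is the only bottleneck: 1.7 GB is about 13,600 megabits, so roughly 40 seconds at 340 Mbps, around 24 seconds at 560 Mbps, and about 17-18 seconds at 780-800 Mbps.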
Thanks for the advice!
Jul 05, 2017 08:55 PM | tolian49
Actually, even more interesting is doing the test in IE instead of Chrome. All tests are much closer to the same speed: close to 700 Mbps in IE for both the ARR and the back-end, and closer to 800 Mbps when it's just a single server doing all the work. Versus the variations in Chrome that fall to as much as half of that. How could the web browser be seeing a difference based on the reverse proxy? Shouldn't that be seamless to the browser?
Jul 06, 2017 02:28 PM | Rovastar
I haven't done any work with huge files on a regular basis, and the stuff you are looking at is a niche area. TBH it will require a large amount of investigation to understand all the nuances of it (Windows registry settings, network kit, traffic analysis and, as you see, browser specifics).
I would expect that there would be some slowdown as you are going through another piece of kit. I mean, it cannot be quicker than, or exactly the same speed as, hitting it directly. SSL encoding would come into play too.
Make sure you are offloading the SSL to the ARR.
And I would look into other things too, like HTTP/2 (and the different ciphers used for SSL), that might affect the difference you are getting with the browsers.
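If you want to rule HTTP/2 in or out when comparing browsers, it can be switched off machine-wide on Server 2016 through the documented HTTP.sys registry values (set them back to 1 to re-enable, and reboot or restart the HTTP service afterwards):

    rem disable HTTP/2 in HTTP.sys so every browser falls back to HTTP/1.1
    reg add "HKLM\SYSTEM\CurrentControlSet\Services\HTTP\Parameters" /v EnableHttp2Tls /t REG_DWORD /d 0 /f
    reg add "HKLM\SYSTEM\CurrentControlSet\Services\HTTP\Parameters" /v EnableHttp2Cleartext /t REG_DWORD /d 0 /f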