WAS Error ID 5010 - ARR 3.0 + Websocket + Win2012R2
Last post Jun 28, 2017 08:42 PM by rpf_br
Jun 27, 2017 12:06 AM|rpf_br
I have an ARR 3.0 proxy server (latest version), a new installation running on Windows 2012 R2 (IIS 8.5), publishing a new application that uses WebSockets.
From time to time (every 40 minutes to 1 hour) the Event Viewer logs a WAS Error 5010: "A process serving application pool 'xxxxxx' failed to respond to a ping. The process id was xxxxx". The current connections (the websockets) drop to 0, and the ARR status page "resets" its status information.
Could it be something to do with the number of open sockets? Or sockets not being closed correctly?
The application has a built-in chat, so each member opens a websocket to use the chat features. It seems these connections cause ARR to crash.
I have never worked with websockets, but I have managed an ARR cluster for 4 years and never had this kind of problem (all requests "freeze" from time to time, until the application pool is recycled and starts working again).
How can a site behind a reverse proxy crash everything?
Any suggestions? Thanks
Jun 27, 2017 01:34 AM|Rovastar
What does the worker process health look like? Do you have a memory-leaking app pool? (Maybe it is running out of resources. I sometimes see ARR taking up way more memory than I think it should for just rewrite rules and routing traffic.)
For more general 5010 troubleshooting, which sadly can get a little heavy, have a look here.
I am not using websockets on ARR so I cannot help much there, and I only used them a few times in a non-ARR setup many moons ago.
Jun 27, 2017 04:39 PM|rpf_br
Thanks for the response. I have an ARR cluster that hosts something like 200 webfarms, with 6 different sites/application pools based on the type of site or group of sites. I have 2 servers with 2 CPUs and 8GB RAM each, in active-active mode with NLB, with roughly 6000 current connections on this cluster and 200 requests/sec. I have never had any problem related to resources. Even if I stop one node on NLB, a single ARR node can handle all this traffic without problems.
When I added this specific site (which uses WebSockets), all 200 sites hosted on this ARR cluster (and the specific site itself) started to have connectivity problems, like "random freezes", and a lot of 5010 errors.
And I can confirm that it's directly related, as I never had a 5010 error before this websocket site (I searched the Event Viewer).
After a week of trying to troubleshoot everything, my last move was to create a new ARR cluster, with the same hardware config, and migrate only this specific site. Conclusion:
- "old" cluster with 200 webfarms: no 5010 errors
- "new" cluster with only this site: 20 +/- 5010 errors/day
So it seems to be something related to the websocket support... My guess is that ARR handles the websocket connections (shown under "monitoring and management" as "current requests") and, as that number increases, somewhere between 50-6 current requests the application pool shuts down.
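To compare the two clusters on numbers rather than impressions, the 5010 errors per day could be tallied from an export of the event log. This is only a minimal sketch: the record shape (a timestamp string plus an event ID) and the sample rows are assumptions made up for illustration, not the real Event Viewer export format.

```python
from collections import Counter
from datetime import datetime


def count_5010_per_day(records):
    """Count WAS 5010 events per calendar day.

    `records` is an iterable of (timestamp_string, event_id) pairs.
    This shape is hypothetical; adapt the parsing to whatever your
    Event Viewer export actually contains.
    """
    per_day = Counter()
    for stamp, event_id in records:
        if event_id == 5010:
            day = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").date()
            per_day[day.isoformat()] += 1
    return dict(per_day)


# Made-up sample rows, for illustration only.
sample = [
    ("2017-06-26 10:15:00", 5010),
    ("2017-06-26 11:02:30", 5010),
    ("2017-06-26 11:05:00", 5013),  # a different event ID, ignored
    ("2017-06-27 09:40:10", 5010),
]

print(count_5010_per_day(sample))
```

Running the same count against each cluster's log would show whether the errors really follow the websocket site.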
Jun 28, 2017 05:27 PM|Rovastar
Not sure what to suggest; you have done most of the things I would think of.
But I would remove NLB from the equation to see if that is affecting it, e.g. have the traffic hit an ARR box directly rather than going through NLB to one ARR.
And maybe try some failed request tracing on the websocket stuff.
It sounds like you might have a bug there, but it would be helpful to look at the memory usage (Debug Diag) and the actual traffic to try to establish a pattern. You may need Microsoft support assistance to get anywhere meaningful with it.
Jun 28, 2017 08:42 PM|rpf_br
Thanks for the answer. I've done this: I stopped one of the ARR nodes so that all traffic goes to just one node.
It seems some connections get stuck: for example, 10 websockets opened, 9 closed... and these stuck websockets accumulate until they cause the pool to restart (the PID changes when the crash occurs, so the app pool recycles and gets a new PID).
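One way to check the "10 opened, 9 closed" pattern would be to match open and close events per connection and see what is left over. The sketch below assumes a hypothetical list of ("open" or "close", connection ID) pairs, for example pulled from application-side logging; ARR itself does not produce this format as far as I know.

```python
def stuck_connections(events):
    """Return the IDs of connections that were opened but never closed.

    `events` is an iterable of (action, connection_id) pairs where
    action is "open" or "close". This event shape is an assumption
    for illustration, not something ARR logs in this form.
    """
    open_ids = set()
    for action, conn_id in events:
        if action == "open":
            open_ids.add(conn_id)
        elif action == "close":
            # discard() does nothing if the ID was never seen as open
            open_ids.discard(conn_id)
    return open_ids


# Made-up example: 10 websockets opened, 9 closed.
events = [("open", i) for i in range(10)]
events += [("close", i) for i in range(9)]

leftover = stuck_connections(events)
print(len(leftover), "connection(s) still open:", sorted(leftover))
```

If the leftover set keeps growing between pool restarts, that would support the theory that unclosed websockets pile up until the worker process stops answering the ping.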
I have never worked with websockets, so I don't have experience managing these connections. The only thing I know is that ARR received a hotfix for websocket problems on Windows 2012/2012R2. Maybe the problem wasn't solved at all.
I can say it doesn't seem to be an installation error, as I created a brand new ARR server with the latest tools and updates, only to reverse proxy this system, and the error persists.