« Previous Next »

Thread: Content server doesn't receive all requests

Last post 09-23-2009 2:22 PM by Mark S. Rasmussen. 3 replies.

Average Rating Rate It (5)

RSS

Page 1 of 1 (4 items)

Sort Posts:

  • 09-23-2009, 10:28 AM

    Content server doesn't receive all requests

    Hi,

     I've added a third content server to our current ARR cluster consisting of a dedicated ARR machine as well as two dedicated content servers behind it.

    When I enable the new server, rather quickly the number of current requests grows up to the point where it doesn't receive any traffic anymore due to using lowest-current-requests load balancing.

     If I run "appcmd list request" on the ARR machine, there's a number of hints of the outstanding requests. A lot of the requests are basically for static files (we have some code running on them as well, but it's very basic and usually runs in no time), but some are for normal app files as well. Usually we have an avg. response time of about 30ms, so the time values are certainly not usual.

    REQUEST "190000408000f700" (url:GET /Modules/FlipMag.swf, time:1019061 msec, client:85.200.96.170, stage:ExecuteRequestHandler, module:ApplicationRequestRouting)
    REQUEST "a00000c08000c6cb" (url:GET /Modules/FlipMag.swf, time:1016924 msec, client:80.63.228.160, stage:ExecuteRequestHandler, module:ApplicationRequestRouting)
    REQUEST "8800004180003a2c" (url:GET /Modules/ClassicShopInterface.swf, time:1015270 msec, client:93.161.100.127, stage:ExecuteRequestHandler, module:ApplicationRequestRouting)
    REQUEST "580000c080010d78" (url:GET /Modules/FlipMag.swf, time:983602 msec, client:93.178.137.134, stage:ExecuteRequestHandler, module:ApplicationRequestRouting)

    I've enabled FREB on the ARR machine and sure enough I get a number of 502.3 errors. I've looked at one in particular:

    <failedRequest url="http://xxx.yyy:80/ViewFile.ashx?FileID=10599"
                   siteId="1"
                   appPoolId="DefaultAppPool"
                   processId="1680"
                   verb="GET"
                   remoteUserName=""
                   userName=""
                   tokenUserName="NT AUTHORITY\IUSR"
                   authenticationType="anonymous"
                   activityId="{00000000-0000-0000-8DDA-008080010097}"
                   failureReason="STATUS_CODE"
                   statusCode="502.3"
                   triggerStatusCode="502.3"
                   timeTaken="32870"
                   xmlns:freb="http://schemas.microsoft.com/win/2006/06/iis/freb"
                   >

    Further into the log I can validate that the request is being routed to the newly added machine called CAESARS:

     <EventData>
      <Data Name="ContextId">{00000000-0000-0000-8DDA-008080010097}</Data>
      <Data Name="RoutingReason">2</Data>
      <Data Name="Server">CAESARS</Data>
      <Data Name="State">0</Data>
      <Data Name="TotalRequests">7889</Data>
      <Data Name="CurrentRequests">17</Data>
      <Data Name="BytesSent">5901518</Data>
      <Data Name="BytesReceived">298776461</Data>
      <Data Name="ResponseTime">135</Data>
     </EventData>

    However, if I review the current requests on CAESARS, there are no outstanding ones, and all is perfect in the logs. Also, CAESARS does not have any data on the above request in it's server wide IIS log.

    It would seem that the ARR machine is routing the request to the content server but the content server doesn't receive it for some reason. Currently I've got 86 failed requests out of 15k served (it's running at a weight of 5% to avoid too much disturbance), so it's far from all requets that fail, but certainly too many - especially as they interfere with the load balancing algorithm.

    I'm eager to hear any suggestions on what I can do to debug the issue. I've got FREB files available as well as IIS logs for the content servers. With a FREB file, I don't suppose an IIS log of the ARR machine makes sense.

    EDIT: CAESARS is running Server 2008 R2 Web x64 while the others are Server 2008 Web x64's, but besides IIS7 vs 7.5 I don't think it should matter.

    Thanks!

  • 09-23-2009, 11:12 AM In reply to

    • wonyoo
    • Top 50 Contributor
    • Joined on 12-15-2007, 6:46 PM
    • Posts 125

    Re: Content server doesn't receive all requests

    I think there are a couple of questions in there.

    1) ARR caches contents (in memory in ARRv1 and in memory/disk in ARRv2).  So when there is a cache hit at ARR, these reqeusts will not be forwarded to your content servers since they have already been responded with the cached content by ARR.

    2) As for the 502 error, what is your proxy timeout setting?  I am guessing 30 seconds (since the timetaken above is suspiciously close to 30sec (32870).  You may want to look at other logs and see if you see similar patterns.  If so, increase the timeout value under proxy settings in ARR.

    3) Also, could you take a look at your event log and see if any of the logs are written by ARR?  I am curious to see what kind of settings you have for health test and see (via the event log) if some of the servers are being marked as healthy/unhealthy.

    Hope this helps.

  • 09-23-2009, 12:59 PM In reply to

    • anilr
    • Top 10 Contributor
    • Joined on 05-23-2006, 10:13 PM
    • Redmond, WA
    • Posts 2,343

    Re: Content server doesn't receive all requests

    ARR adds a header with a unique guid to each request it sends to the content server - so if you have IIS logs from the ARR server (or a freb log for one of the failing requests) and you have all the freb logs from the content server, we should be able to find the request on the content server corresponding to the one timing out on ARR - you can send me the logs (zipped) at anil (dot) ruia (at) microsoft (dot) com if you want.

    Anil Ruia
    Senior Software Design Engineer
    IIS Core Server
  • 09-23-2009, 2:22 PM In reply to

    Re: Content server doesn't receive all requests

    @wonyoo
    Thanks for the clarifying questions.

    1)
    Memory cache is set to 5 seconds, disk cache is disabled. Query string support is set to disabled. If the request was cached I'd also guess it wouldn't be an outstanding request timing out.
    <cache enabled="false" queryStringHandling="NoCaching" />

    2)
    The proxy timeout is set to 3600 seconds as we have some long running file downloads:
    <protocol timeout="01:00:00" includePortInXForwardedFor="false">

    3)
    Health testing is disabled and all servers are marked as healthy. The only events in the event log are from when I've taken the CAESARS server manually online/offline while debugging the missing requests.

     

    @anilr
    I haven't tried comparing the GUIDs in the ARR and content server logs. I'll try and gather the corresponding logs and see if I can match them up. Unless I discover something I'll send them to your mail address. Thanks!

Page 1 of 1 (4 items)
Microsoft Communities