I need some help figuring out why we are sporadically receiving an error against a web service we have running in IIS 7. I think it is security related.
By sporadically I mean that users who make dozens or hundreds of web service requests a day may receive this error 3-5 times. Some days they won't see it at all; on other days they may make only a dozen requests in total but encounter the error twice. This happens across different methods defined in the service.
Background: internal intranet .NET 2.0/3.5 web service using integrated security, running on a Windows Server 2008 web farm that consists of two servers (the error has happened on both servers in the farm). The application was recently installed on IIS 7 and previously ran on IIS 6 with no errors of this sort (in fact, we still have it running on IIS 6 in our development and staging environments with no issues).
Okay, the actual error message received in our .NET client application:
The underlying connection was closed: A connection that was expected to be kept alive was closed by the server. Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host.
Attempting the same web service call again always works. The error never occurs on the 'first' call to the web service, and after the successful retry many more calls go through fine before it shows up again.
Looking at the IIS logs, I see that the initial 'challenge/response' entry (the 401 you get on applications with integrated security) is timing out after about 15-20 seconds. The Windows status code logged is 121:
The semaphore timeout period has elapsed.
Here are some log entries that show the problem. The first two entries show a successful call. Less than two minutes later a subsequent call fails (third entry; note the Windows status code of 121 and the time taken of 21342 ms), followed by a retry that succeeds:
2009-04-01 14:22:34 10.67.8.94 POST /Specifications.asmx - 80 - 10.105.3.85 Mozilla/4.0+(compatible;+MSIE+6.0;+MS+Web+Services+Client+Protocol+2.0.50727.3082) 401 2 5 15
2009-04-01 14:22:36 10.67.8.94 POST /Specifications.asmx - 80 BEMIS\113584 10.105.3.85 Mozilla/4.0+(compatible;+MSIE+6.0;+MS+Web+Services+Client+Protocol+2.0.50727.3082) 200 0 0 1468
2009-04-01 14:24:02 10.67.8.94 POST /Specifications.asmx - 80 - 10.105.3.85 Mozilla/4.0+(compatible;+MSIE+6.0;+MS+Web+Services+Client+Protocol+2.0.50727.3082) 401 2 121 21342
2009-04-01 14:24:51 10.67.8.94 POST /Specifications.asmx - 80 - 10.105.3.85 Mozilla/4.0+(compatible;+MSIE+6.0;+MS+Web+Services+Client+Protocol+2.0.50727.3082) 401 2 5 15
2009-04-01 14:24:51 10.67.8.94 POST /Specifications.asmx - 80 BEMIS\113584 10.105.3.85 Mozilla/4.0+(compatible;+MSIE+6.0;+MS+Web+Services+Client+Protocol+2.0.50727.3082) 200 0 0 328
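In case it helps anyone look for the same pattern, here is a rough sketch of how entries like the third one could be pulled out of a log file. It is only an illustration: the field order in FIELDS is an assumption inferred from the sample entries above (the real order is whatever the "#Fields:" header in the log says), and the names Entry, parse_line and find_timeouts are made up for this example.

```python
from typing import Iterable, List, NamedTuple, Optional

# ASSUMPTION: field order inferred from the sample entries in this post.
# Check the "#Fields:" header line of the actual log before relying on it.
FIELDS = (
    "date", "time", "s_ip", "cs_method", "cs_uri_stem", "cs_uri_query",
    "s_port", "cs_username", "c_ip", "cs_user_agent",
    "sc_status", "sc_substatus", "sc_win32_status", "time_taken",
)

SEMAPHORE_TIMEOUT = 121  # "The semaphore timeout period has elapsed."


class Entry(NamedTuple):
    timestamp: str
    username: str
    sc_status: int
    sc_win32_status: int
    time_taken_ms: int


def parse_line(line: str) -> Optional[Entry]:
    """Parse one space-delimited log line; return None if it does not fit."""
    if line.startswith("#"):  # header/directive lines
        return None
    parts = line.split()
    if len(parts) != len(FIELDS):
        return None
    rec = dict(zip(FIELDS, parts))
    try:
        return Entry(
            timestamp=rec["date"] + " " + rec["time"],
            username=rec["cs_username"],
            sc_status=int(rec["sc_status"]),
            sc_win32_status=int(rec["sc_win32_status"]),
            time_taken_ms=int(rec["time_taken"]),
        )
    except ValueError:  # non-numeric status or time field
        return None


def find_timeouts(lines: Iterable[str]) -> List[Entry]:
    """Return the entries whose Windows status code is 121."""
    entries = (parse_line(line) for line in lines)
    return [e for e in entries
            if e is not None and e.sc_win32_status == SEMAPHORE_TIMEOUT]


# Two of the entries from this post, copied verbatim.
SAMPLE = [
    "2009-04-01 14:22:34 10.67.8.94 POST /Specifications.asmx - 80 - 10.105.3.85 Mozilla/4.0+(compatible;+MSIE+6.0;+MS+Web+Services+Client+Protocol+2.0.50727.3082) 401 2 5 15",
    "2009-04-01 14:24:02 10.67.8.94 POST /Specifications.asmx - 80 - 10.105.3.85 Mozilla/4.0+(compatible;+MSIE+6.0;+MS+Web+Services+Client+Protocol+2.0.50727.3082) 401 2 121 21342",
]

if __name__ == "__main__":
    for entry in find_timeouts(SAMPLE):
        print(entry.timestamp, entry.sc_status, entry.time_taken_ms)
```

Running the same function over the full log from each server in the farm (by passing an open file object instead of SAMPLE) should show whether the 121 entries cluster at particular times or after a particular idle period.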
The .NET client code making the web service call always creates the service object and sets its credentials to the default credentials before executing a method:
service.Credentials = System.Net.CredentialCache.DefaultCredentials
(Hmm, I wonder if there is something on the client side related to recreating the service object from scratch for every method call. But again, this doesn't happen against IIS 6.)
Another slight wrinkle in our setup is that we have a secondary site with the same physical path that uses Basic authentication, as explained in this post: http://forums.iis.net/t/1155161.aspx
I wouldn't think that has anything to do with it, since we have the same setup in IIS 6 and it works okay (although the configuration is set up differently between IIS 6 and 7). None of the web service calls against the Basic authentication site have had any errors.
Is this an IIS 7 configuration issue, an application client issue, or a network/infrastructure issue? If anyone has any ideas, they would be sincerely appreciated. Is the next step to turn detailed tracing on in IIS or add different logging fields?
Thanks,
-David