Looking at the log files, I can see you are waiting several minutes between postbacks. That is also a way to avoid the issue. You cannot wait a couple of minutes between posts, as that seems to clear the issue as well (and is part of the reason why I feel it is related to FireFox's handling of HTTP connections).
Rovastar
3321 Posts
MVP
Moderator
Re: Pages appear to stop responding
May 07, 2009 12:57 PM
Ummh, I thought the longer you left it, the more of an issue it was. So there is a peak problem window from, say, 1 minute up to what, 5 minutes? And once one of the (many) timeouts occurs, it is all OK again?
I haven't used Wireshark much. Do you have a Netmon output?
pbreitz
8 Posts
Re: Pages appear to stop responding
May 07, 2009 07:12 PM
Alrighty, I've got a ticket open with MSDN support and they're looking into the issue. I was able to reproduce the issue, and provided MS with the following:
Wireshark dump from the client
Wireshark dump from the server
IIS log
failed request log
You can see a whole bunch of black (TCP Retransmissions) on both sides until the request finally comes back. IIS logs show nothing until all the TCP business gets sorted out, at which point the response is normal. As I suspected, nothing of note in the failed request logs.
I'll keep the group posted on how this gets resolved.
-paul
ianderson
22 Posts
Re: Pages appear to stop responding
May 07, 2009 07:51 PM
Fantastic! Great work Paul!
I actually opened up a case in the FireFox bugtracker to attack the problem from that angle as well:
https://bugzilla.mozilla.org/show_bug.cgi?id=491541
Hopefully one of these parties will figure out what the issue is...
VorlonShadow
79 Posts
Re: Pages appear to stop responding
May 07, 2009 08:01 PM
Excellent. I didn't know how to gather some of those logs or how to read them. I'm glad that someone who knows how to handle that is handling it. I hope that you'll post the results of anything you find to this forum.
Thanks!
Jesse
ianderson
22 Posts
Re: Pages appear to stop responding
May 08, 2009 11:43 AM
The Mozilla team seems to think that IIS7 is not properly closing connections. FireFox apparently holds connections for 5 minutes and attempts to reuse them during this time period. This may be longer than other browsers and may explain why FireFox is the only one that displays this issue. See the updates in the bug report for more details:
https://bugzilla.mozilla.org/show_bug.cgi?id=491541
It might be worth asking the MSDN support people to look at how IIS7 (or Windows Server 2008) is closing TCP connections...
Rovastar
3321 Posts
MVP
Moderator
Re: Pages appear to stop responding
May 08, 2009 12:31 PM
OK, so it seems that in some situations a connection has been left open.
The Firefox people seem to think that they do send out the packet.
However, we still don't know whether the server is receiving the packet, whether the packet is correct (not corrupted), whether it has been processed in the TCP/IP stack, or whether it is getting to IIS.
What if it is a dodgy network card/server hardware/etc. setup? *shrug* Did you set the server up, or did a hosting company?
If it is a connection issue, you could try increasing your HTTP keep-alive timeout from 120 seconds to something more. Hopefully that will improve things.
What timeout settings are you using for your server (HTTP, ASP.NET, sessions, etc.)?
That still doesn't explain why I am not getting any issues at all at the moment. And that to me is even more confusing, as it goes against all the investigation that you have done so far: it shouldn't work.
I just waited 3 minutes on your test page and clicked 'next', and it is all OK.
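The timings quoted in the thread so far (FireFox holding connections for 5 minutes, a server keep-alive timeout of 120 seconds) can be put into a small model. This is only a sketch of the reasoning, not anything FireFox or IIS actually runs; the function name and the outcome strings are made up for illustration.

```python
def reuse_outcome(idle_seconds, client_keepalive_seconds, server_timeout_seconds):
    """Classify what a client does with an idle keep-alive connection.

    All three arguments are in seconds. The classification is a simplified
    model of the discussion in this thread, not real browser or server logic.
    """
    if idle_seconds >= client_keepalive_seconds:
        # The client has given up on the old connection and opens a new one.
        return "new connection"
    if idle_seconds < server_timeout_seconds:
        # Neither side has timed out yet, so reuse is safe.
        return "safe reuse"
    # The client still trusts a connection the server has already timed out.
    return "stale reuse"


# Timings taken from the thread: 300 s client hold, 120 s server timeout.
for idle in (30, 130, 180, 310):
    print(idle, reuse_outcome(idle, 300, 120))
```

Under these numbers a 3-minute wait lands in the "stale reuse" window, so a clean result after 3 minutes suggests the problem is intermittent rather than absent.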
VorlonShadow
79 Posts
Re: Pages appear to stop responding
May 08, 2009 12:45 PM
First of all, this will most likely not be a "dodgy network card". This problem has been duplicated on several different servers. The commonalities are ALWAYS IIS7 and FireFox. This has been duplicated by MANY different people. You must wait a specific amount of time before this occurs. My understanding is that if you bring the page up, wait a little more than 130 seconds (a little over 2 minutes), and then try to proceed, the problem will most likely occur. As was already pointed out too, this does not ALWAYS happen, but it does frequently happen. But if you change either your browser or your version of IIS (to IIS6, for instance), this problem goes away.
Jesse
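The reproduction steps above (load the page, wait a little over 130 seconds, then proceed) can be approximated with a script. The sketch below uses Python's standard `http.client`, which is not FireFox and may not behave the same way on a stale connection, so treat it as a rough probe only. The function names are made up for illustration, and the host you pass in is up to you.

```python
import http.client
import time


def timed_get(conn, path):
    """Send one GET on an existing connection.

    Returns (outcome, seconds), where outcome is the HTTP status code or,
    on failure, the name of the exception that was raised.
    """
    start = time.monotonic()
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        response.read()  # drain the body so the connection can be reused
        outcome = response.status
    except (http.client.HTTPException, OSError) as exc:
        # Covers resets, remote disconnects, and socket timeouts (a hang).
        outcome = type(exc).__name__
    return outcome, time.monotonic() - start


def probe_idle_reuse(host, port, path="/", idle_seconds=130, timeout=60):
    """Request a page, stay idle, then request again on the same connection."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        first = timed_get(conn, path)
        time.sleep(idle_seconds)  # the wait described in the post above
        second = timed_get(conn, path)
    finally:
        conn.close()
    return first, second
```

Calling `probe_idle_reuse("your.server.example", 80)` (a placeholder host) returns the outcome and duration of both requests. A second request that takes many seconds, or that ends in a timeout or reset, would be consistent with the behaviour reported here.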
ianderson
22 Posts
Re: Pages appear to stop responding
May 08, 2009 12:47 PM
Please explain to me how it could be a dodgy network card/server that only affects FireFox? You have suggested this numerous times, and each time I have reiterated that the problem is specific to a single browser version. One would expect a dodgy network card (or any hardware issue for that matter) to have problems across the board.
I believe the reason it is intermittent (i.e. can't be reproduced 100% of the time) has to do with server load. Perhaps IIS/Windows leaves the connection open for a certain period of time until the connection resource is disposed of on the server? Perhaps it does this more often as the server gets more load? That would explain why you have a tough time reproducing it, as you are hitting our site at a very slow time of day (European daytime).
pbreitz
8 Posts
Re: Pages appear to stop responding
May 11, 2009 04:11 PM
Here is Microsoft's response to the ticket I opened:
Firefox is waiting for more than the standard 2 minutes before trying to re-use the connection.
Firefox never sends a "FIN" (FIN, or Finish, is used during a graceful session close to show that the sender has no more data to send) to the server, so it cannot re-open the connection.
IIS times the request out as expected, due to the default 2-minute ConnectionTimeout setting of HTTP.sys.
The IIS server, however, should not be waiting for 9 seconds to send a reset. So we suspect that there could be some issues with the NIC or NIC drivers which cause this wait.
So, part of the problem here is Firefox trying to reuse an old connection.
The other problem seems to be with TCP on the server not issuing a timely RST (RST, or Reset, is an instantaneous abort in both directions, i.e. an abnormal session disconnection).
Recommendations:
Let's disable TCP Chimney and/or update the NIC drivers on the server.
Let's run the following command to disable TCP Chimney:
Netsh int ip set chimney DISABLED
Unfortunately that command didn't work for me... I kept getting "command not found". So they had me add the following registry entries:
Let's add the following registry entries and set those values to zero.
EnableTCPChimney
Type: REG_DWORD
Values: 1 (enabled), 0 (disabled)
EnableRSS
Type: REG_DWORD
Values: 1 (enabled), 0 (disabled)
EnableTCPA
Type: REG_DWORD
Values: 1 (enabled), 0 (disabled)
I need to reboot and try to reproduce again, but I figured I'd share what I've got so far in case anyone else has better luck.
-paul
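For anyone who would rather import the three values than type them in by hand, here is a minimal Python sketch that builds the text of a .reg file setting them to zero. The registry key path is an assumption: the post above does not say where the values live, so confirm the location with support before importing anything. The function and constant names are made up for illustration.

```python
# ASSUMPTION: this key path is not given in the thread; verify it before use.
TCPIP_PARAMETERS_KEY = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"
)

# The three REG_DWORD values named in the post above.
OFFLOAD_VALUES = ("EnableTCPChimney", "EnableRSS", "EnableTCPA")


def build_reg_file(key=TCPIP_PARAMETERS_KEY, names=OFFLOAD_VALUES, value=0):
    """Return .reg file text that sets each named REG_DWORD to `value`."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key}]"]
    for name in names:
        # .reg files store DWORDs as eight hexadecimal digits.
        lines.append(f'"{name}"=dword:{value:08x}')
    return "\r\n".join(lines) + "\r\n"
```

`print(build_reg_file())` shows the text so it can be reviewed before it is saved and imported. As the post says, a reboot is still needed afterwards.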
ianderson
22 Posts
Re: Pages appear to stop responding
May 11, 2009 04:25 PM
The 5-minute timeout is probably why the issue is specific to FireFox browsers. MSIE defaults to a 60-sec timeout, which is a shorter window than IIS, so that explains why MSIE appears to be unaffected. I have been testing to try to see if IIS is properly sending a RST to close/cancel the connection after the server timeout is reached, but I don't see IIS responding at all. Seems like either IIS has a bug and it isn't sending the proper RST response, or there is something in the stack that is preventing this command from reaching the browser. Seems like disabling TCPChimney is an attempt by MS to see if the problem lies in the TCP data processing... When it is enabled it appears to pass this off to the network card for handling. Hope disabling it fixes the problem!
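To make the FIN versus RST distinction from Microsoft's reply concrete, here is a small loopback sketch in Python. It does not involve IIS or FireFox at all; it only shows what the other end of a TCP connection observes when a socket is closed gracefully (FIN) versus aborted (RST). The function name is made up, and the `struct` layout used for `SO_LINGER` assumes Linux or macOS (Windows uses two 16-bit fields instead).

```python
import socket
import struct


def close_and_observe(abortive):
    """Close the server end of a loopback connection; report what the client sees."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()
    try:
        if abortive:
            # SO_LINGER enabled with a zero timeout makes close() send an RST.
            # "ii" matches struct linger on Linux/macOS (assumption noted above).
            server_side.setsockopt(
                socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0)
            )
        server_side.close()
        client.settimeout(5)
        try:
            data = client.recv(1024)
        except ConnectionResetError:
            return "RST"
        # An empty read means the peer closed gracefully with a FIN.
        return "FIN" if data == b"" else "data"
    finally:
        client.close()
        listener.close()
```

In both cases the client finds out promptly that the connection is gone. The hang described in this thread is what you would expect when neither signal reaches the browser in time.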