IIS 6 hangs, requires a reboot
Last post Apr 26, 2010 07:11 PM by raja.gregory
Jun 09, 2008 09:00 AM|d0z3r|LINK
Apologies if this issue has already been answered; it's rather lengthy and hard to google!
Anyway, I have a public site getting moderate traffic: 1.5 million page views / month. Almost weekly, it hangs on Saturday AM and Sunday AM, meaning that externally you get a page not found error. Interestingly, it is still accessible internally (servers are dual-homed, with 1 public and 1 private IP). Running ASP and ASP.NET. Requires a reboot to fix.
I have recently put the site on a totally new hardware platform with the latest 2003/R2 code and patches and the same behavior has continued.
Unfortunately I am NOT a programmer and don't have a full-time coder on staff. I have tried DebugDiag, but every time I get an error about an incomplete or corrupted dump (both running it manually and on-demand).
Anything I can look at?
Jun 09, 2008 11:54 AM|HostingASPNet|LINK
Tell us more: what is your IIS configuration and ASP.NET version? Check your log files for errors. Also, when the site hangs, run "telnet yoursite.com 80" from a command prompt to see what happens.
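The telnet check only tests whether a TCP connection to port 80 can be opened. Here is a minimal Python sketch of the same test, for machines where telnet is not available. The function name `port_is_open` is made up for illustration, and "yoursite.com" stays a placeholder for the public host name.

```python
import socket

def port_is_open(host, port=80, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures alike.
        return False
```

Running `port_is_open("yoursite.com")` from both an internal and an external machine during a hang would show whether the connection fails at the TCP level or only later, at the HTTP level.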
Jun 09, 2008 12:36 PM|rfwilliams777|LINK
OK, if I understand you correctly, you have Windows 2003. Question: do you have all the latest Windows Updates, including the recommended ones? I would start there.
Second, although server admins and other "experts" don't say this, I reboot my server at least once a week. It helps clear the cache. I don't recommend doing it during peak times. If you have a fairly decent server, you should only be down for a minute or a minute and a half. I think that is little enough "down time" to avoid the phone ringing later about other problems.
I would do a few other things and/or check on a few things. For one, set the page file size to be 2.5 times the size of your RAM (measured in MB). This can help Windows itself run better even if your resources are a bit strapped. There are about 5 lines in the page file/memory graph in Task Manager. You typically want to keep your page file usage around the first line. Past the halfway point is way too much load.
I would also see if you have enough RAM to really run things. I would also occasionally do some other diagnostics on your machine routinely. There are a few things that can also be adjusted in IIS.
If all that above is a bit too much to work on, you're welcome to hire out an IT company who can do it. I can make a suggestion. :)
Jun 09, 2008 12:56 PM|d0z3r|LINK
Thanks for the quick responses. I'm running ASP.NET 2.0 on a dual-core Opteron 2.0 GHz with 4 GB RAM. The server has no other duties, but some dynamic pages do make ODBC connections to half a dozen Access databases (yeah, I know, that's a separate battle!).
There are no application or system errors logged when the site goes down (there are some SMTP and other errors that are expected after the site hangs). Web logs almost always show crawler activity right before the hang. This past weekend, a baidu.com spider immediately preceded 2 crashes.
I have done the weekly reboot thing to clear caches / whatever, but 2 crashes in 2 days last weekend indicate that method would not have worked.
Next hang I'll try the telnet 80, however telnet is only enabled internally and from internal machines the site still works. So I expect to get a response running it internally. Externally telnet traffic is dropped at the edge.
Jun 09, 2008 01:14 PM|rfwilliams777|LINK
I would recommend trying some of the suggestions I gave to see if they work. It does seem a tad odd that a site crawler basically crashed your server, so to speak. Let us know if you need any assistance.
Jun 09, 2008 06:33 PM|ma_khan|LINK
Why Saturday and Sunday only?
Check whether you have any scheduled task or DB job running at that time. These jobs can sometimes put a lot of pressure on IIS.
Jun 10, 2008 05:43 AM|Rovastar|LINK
So it is working internally; how do you know IIS is hanging?
If it is all accessible internally going to the same site, then I suspect a network/firewall issue.
I would be looking at the network/link layer. Have you tried changing your network drivers, a new card, etc., or even swapping over the IPs, cables, etc.? Maybe a reboot is refreshing this.
What you are describing, so far, doesn't make sense. Either IIS works and serves pages or it doesn't. If it does serve pages internally then IIS is working. Having all the memory used up, etc. would affect all requests, internal and external, so I would not look down that route.
Why do you think IIS hangs? Do you have anything in the event log, etc.?
Do the requests reach the IIS layer and appear in the IIS logs? If they do, what do the full logs say for the problem requests?
Do *all* the internal requests get through? Do *none* of the external requests get through?
Do the requests reach the HTTP layer (the httperr logs)?
rfwilliams777 wrote: "Second, although server admins and other "experts" don't say this, I reboot my server at least once a week. It helps clear the cache. I don't recommend doing it during peak times. If you have a fairly decent server, you should only be down for a minute or a minute and a half. I think that is enough "down time" to avoid the phone ringing later of other problems."
Eek! Why don't you just clear the caches and tweak your settings?
I have never had a need to reboot a server 'once a week'. If you are having problems, you should be looking for the root cause. I could never justify that downtime, but I guess it all depends on the environment you work in.
Jun 10, 2008 09:43 AM|d0z3r|LINK
Again, thanks all. Trying to cover all the questions:
Usually weekends only, and the scheduled tasks are NIGHTLY: antivirus DAT downloads and server backups (Veritas).
A member of the public or an OPS officer will be the one who notices the site is down. An OPS officer will then check it on the internal network, where it is usually working. This internal network is privately IP'd and goes through a proxy server. What may be happening is that the proxy is caching the site and it only "looks" available. I have turned off ISA caching and will check this at the next crash.
As far as a firewall / router issue goes, I have thought of that, but a WEB SERVER reboot ALWAYS fixes the issue. As far as a NIC issue goes, this is the second hardware platform where this behavior is occurring (i.e. different NIC, slightly different OS code: 2003 R2 versus 2003 Gold). That also means different cables / different switch ports / etc.
All patches are current to this month.
There are no clues in the event logs. In the Web logs, crash occurs almost always during a crawler scan. My robots.txt now turns away all reputable bots, but I did notice this past weekend that the baiduspider was crawling immediately before both crashes.
Since I'm running IIS I can't try .htaccess to block baidu.com.
Jun 10, 2008 10:25 AM|Rovastar|LINK
OK maybe it is being cached. IMHO you need to clear this up and confirm what is happening.
The best way to test this is to:
a) open a page directly on the server.
b) access a page that no one else would have accessed (an obscure page, or one not linked to anything and never accessed) and that is not in any cache. A dynamic page works well: if you have a search box on your site, search for some random characters like jkfhjkflhflhl, or search a database, etc.
c) set up an additional host header and additional domain like test2.mysite.com pointing to the same site. That way you can access the
That way you can test what exactly is occurring and avoid the caching issue. Get your ops guys to check this. It should not be too hard to get them to follow some simple instructions.
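One way to be sure a test request cannot be answered from a proxy cache is to make every URL unique. This is a minimal Python sketch, assuming the page tolerates an extra, unused query parameter. The parameter name `nocache` and the function name are made up for illustration.

```python
import uuid
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def cache_busting_url(url, param="nocache"):
    """Append a random query parameter so no cache has seen this exact URL."""
    parts = urlsplit(url)
    # Keep any existing query parameters and add one random value at the end.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((param, uuid.uuid4().hex))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Each call returns a different URL for the same page, so a response that comes back during a hang has to have come from the web server itself.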
It is misleading for troubleshooting if you don't know whether this is occurring.
Is it a standalone box, or part of a multi-tier setup or a farm, etc.?
Do all types of pages get affected (ASP, HTML, etc.)?
Logically, if the traffic is getting through (i.e. no firewall/network/routing issues) then you will (should) hit http.sys. Anything in the httperr logs?
Long shot, but do you do your own routing on the web server box?
Does anything occur in the IIS logs when the site is 'down'? What occurs just before and just after? Turn on full logging for IIS (well, you can leave off the lengthy cookies one); this might give you a better indication of what happened just prior to the 'hang/crash'.
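To look at what was hitting the site just before a hang, the IIS log can be filtered by time. This is a rough Python sketch, assuming the W3C extended format with a `#Fields:` header and with the `date`, `time` and `cs(User-Agent)` fields enabled. The function name and the 10-minute window are arbitrary choices for illustration. IIS normally writes these logs in UTC, so the hang time has to be given in the same time zone.

```python
from collections import Counter
from datetime import datetime, timedelta

def agents_before(lines, hang_time, minutes=10):
    """Count requests per user agent in the `minutes` before `hang_time`."""
    fields = []
    counts = Counter()
    start = hang_time - timedelta(minutes=minutes)
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The header names the columns used by the rows that follow.
            fields = line.split()[1:]
            continue
        if not line or line.startswith("#") or not fields:
            continue
        row = dict(zip(fields, line.split()))
        try:
            stamp = datetime.strptime(row["date"] + " " + row["time"],
                                      "%Y-%m-%d %H:%M:%S")
        except (KeyError, ValueError):
            continue  # row lacks date/time columns or is malformed
        if start <= stamp <= hang_time:
            counts[row.get("cs(User-Agent)", "-")] += 1
    return counts
```

Passing in the lines of the log file and the approximate time of the hang returns a counter whose `most_common()` output shows which clients were most active in that window.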
Jun 10, 2008 02:55 PM|d0z3r|LINK
I'll check the cache next crash.
It's a standalone box and I own the routing fabric up to the T1 connection.
All pages are affected (nothing served).
Thanks for the reminder about the httperr logs. In one crash last weekend I had a Timer_MinBytesPerSecond immediately before the crash. In the other I had a 503 error on one of my AppPools (but that may have been after the hang). Before that, just a bunch of Timer_ConnectionIdle messages.
I have full logging on in IIS. Immediately before BOTH crashes last weekend I was getting crawled by the baidu.com spider. I get crawled by them a couple of times a day though (they ignore robots.txt).
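Tallying the reason phrases in the httperr log makes it easier to compare a crash period with a normal one. This is a minimal Python sketch, assuming the log has a `#Fields:` header that includes `s-reason` and that no field contains spaces. The function name is made up for illustration.

```python
from collections import Counter

def tally_reasons(lines):
    """Count httperr entries per s-reason value (e.g. Timer_ConnectionIdle)."""
    fields = []
    counts = Counter()
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # Column names come from the header, not from fixed positions.
            fields = line.split()[1:]
            continue
        if not line or line.startswith("#") or "s-reason" not in fields:
            continue
        row = dict(zip(fields, line.split()))
        counts[row.get("s-reason", "-")] += 1
    return counts
```

Running this over the log for a crash morning and over a quiet weekday would show whether Timer_MinBytesPerSecond or the 503 entries are actually unusual.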
Jun 10, 2008 05:52 PM|ma_khan|LINK
One question: when you say "a reboot fixes the issue", do you mean an IIS reset or a server reboot?
Rather than an IIS reset, I would try an AppPool recycle first. Recycle the target AppPool and see if that works.
Again, are you sure that at exactly the time your site is out externally, it's working internally? I have seen such scenarios but never had to reset IIS to solve them, because they were usually due to bad programming or network glitches.
Try blocking the bots with meta tags on your website's home page. Not sure if that's going to make a difference, but it's worth a shot :)
Jun 10, 2008 06:10 PM|rfwilliams777|LINK
I do server reboots; however, I have been testing a way to automate a restart of IIS. My main concern is for it to kick off without a user being logged in (like a service) and guarantee that it'll come up.
Jun 10, 2008 06:20 PM|ma_khan|LINK
How about a simple batch file, run as a scheduled task?
Hope this helps...
Jun 11, 2008 09:15 AM|d0z3r|LINK
I just had to do another reboot, but first I tested the caching issue by creating a new page not linked anywhere in the site. I WAS able to access the site and that page internally, but nothing externally. I have no doubt it could be a programming error, but since I don't code, I can't catch it myself.
I did not try an AppPool recycle, but an IIS service reset does not solve the issue, whereas a server reboot does.
Again, right before this crash I was getting crawled: msnbot honored the robots.txt and went away; something called dibot was crawling some small PDF files at the time of the hang.
I'll try the meta tag idea.
Thanks for your great help!
Jun 25, 2008 01:36 PM|d0z3r|LINK
I've continued troubleshooting and I believe your suggestion that it's bad programming or network glitches is correct.
I've found that rebooting my DMZ switch (Cisco 3550) fixes the external access problem. So I replaced the switch, and again I am seeing the site hang. I have a fair number of underruns on the firewall interface on the 3550. I have to set the port at 10mb half because that's dictated by the PIX, but I don't think that's the issue since the PIX doesn't get mem or cache full errors. Plus my ISP connection is 1.5mb. The 3550 throughput is 6.5mb.
Could it just be too much for the infrastructure during periods of high web connections (web crawlers, etc.)? Would migrating to an ASA firewall and a 3560E with 65mb throughput solve the issue? Or could there still be a programmatic element too?
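A rough back-of-the-envelope check on the capacity question, using only the two figures given in this thread: 1.5 million page views per month and a 1.5 Mb/s ISP link. It is a sketch, not a measurement. The 30-day month is an assumption, and a monthly average says nothing about bursts while a crawler is active.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60   # assumes a 30-day month

def average_load(page_views_per_month, link_bits_per_second):
    """Return (page views per second, link bytes available per page view)."""
    views_per_second = page_views_per_month / SECONDS_PER_MONTH
    bytes_per_second = link_bits_per_second / 8
    return views_per_second, bytes_per_second / views_per_second

# Figures from the thread: 1.5 million page views/month, 1.5 Mb/s link.
views, budget = average_load(1_500_000, 1_500_000)
print(f"{views:.2f} page views/s on average, "
      f"{budget / 1024:.0f} KiB of link capacity per page view")
```

On average the site sees well under one page view per second, so if the link is the limit it would have to be during short bursts rather than under steady load.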
Apr 26, 2010 07:11 PM|raja.gregory|LINK