ARR Disk Cache Invalidation Process
Last post Nov 01, 2011 06:50 AM by richtea
Aug 25, 2009 09:42 AM|JAltrichter
I am working with ARR v2 Beta 2 on a simple disk caching test and would like to understand whether the behaviour I am seeing is normal.
Here's the test scenario. I have a single ARR server routing all requests to a single-node server farm. I create a simple HTML page with text and an image, then load the page on another computer in IE8. I confirm the page and image are cached in my ARR cache by opening the cache directory in Explorer and by browsing the cache content through IIS Manager.
At this point, I go back to the source of the test page and update the text. Then I go to the IE8 computer and refresh the page, and it always takes two page reloads to get the new version of the page.
I suspect that the first request is necessary to somehow invalidate the current cache object, but I have not seen this functionality explained anywhere. I would like to understand how the disk cache works in this scenario.
Also, I know this is a caching issue because any attempt to load the page through a different path/host name will return the most current page version (because it is not cached).
Ultimately, I would like to set up a Cache Control Rule to exclude .HTM files from the cache, but have not been able to get the following rule to work:
Apply rule: Always
Do Not Cache
Host Name: (target IP)
Thanks in advance.
Aug 25, 2009 10:28 AM|wonyoo
1) The first behavior that you are describing is working as designed. Basically, it's a design decision based on perf vs. accuracy. One approach is to optimize for accuracy, meaning we check the freshness of the content before serving it. The other approach is to optimize for perf, meaning we serve the content first and then check its accuracy (if it is still fresh, no harm done; if it is not fresh, we update the cache).
We opted for the second because we designed ARR for large-volume traffic environments and thought that having one user get stale content in exchange for better overall perf (reduced latency) is an acceptable trade-off.
2) For the rule that you mention, you probably want to use the real host name and not the target IP. ARR uses this info, along with URL Rewrite, to inspect incoming requests, so you will want to provide a value that ARR/URL Rewrite can match against the incoming HTTP headers.
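To make the serve-first behaviour in point 1 concrete, here is a minimal Python sketch of a toy cache that returns its stored copy and only then checks the origin. It is an illustration under simplifying assumptions, not ARR's implementation: the class name `ServeFirstCache`, the `origin` callable, and the direct content comparison are all hypothetical stand-ins (a real proxy would revalidate with a conditional request).

```python
class ServeFirstCache:
    """Toy cache: serve the stored copy immediately, then revalidate it."""

    def __init__(self, origin):
        self.origin = origin  # hypothetical callable: url -> current origin content
        self.store = {}       # url -> cached content

    def get(self, url):
        if url not in self.store:
            # Cache miss: fetch from the origin, store the result, and serve it.
            self.store[url] = self.origin(url)
            return self.store[url]
        served = self.store[url]      # serve whatever is cached first
        fresh = self.origin(url)      # then check the origin (simplified)
        if fresh != served:
            self.store[url] = fresh   # refresh the cache for the next request
        return served


pages = {"/test.htm": "version 1"}
cache = ServeFirstCache(lambda url: pages[url])

first = cache.get("/test.htm")    # miss: fetched from the origin
pages["/test.htm"] = "version 2"  # the page is edited at the origin
second = cache.get("/test.htm")   # stale copy served, cache updated afterwards
third = cache.get("/test.htm")    # updated copy served
```

In this sketch the first reload after the edit still returns the old text and the second reload returns the new text, which matches the two-reload pattern described in the original question.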
Aug 25, 2009 10:41 AM|JAltrichter
Thanks for the speedy reply. I think I solved my mystery regarding the Cache Control Rules, and now I have another question.
First, I did not mention that I happened to be testing with the default page, Default.htm. I was typing http://(target)/ as my URL. When I did this, the page was cached.
Then I tried typing http://(target)/Default.htm as my URL and noticed the page was no longer cached, which was the expected behaviour.
So I infer that the cache control rules are based on the original URL, not the translated path. Is that correct, or is there something more interesting to this story?
Aug 25, 2009 10:58 AM|wonyoo
You are right again, but here is a lengthier explanation.
The fact that http://target/ shows default.htm is due to how the default page configuration is set up on the origin server. However, the URL that goes through ARR is still http://target/ (and not http://target/default.htm). ARR, like most other cache proxies, uses the URL as the "key" to store the cached content. So in this case, as far as ARR is concerned, it is looking at http://target/ and it is unaware of what the content server may do with this request behind the scenes (in this case, serving the default page based on the configuration).
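The URL-as-key point can be sketched in a few lines of Python. This is only an illustration of the idea, not how ARR builds its keys: `cache_key`, `matches_no_cache_rule`, and the `*.htm` pattern are hypothetical names and values chosen for the example.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


def cache_key(url):
    """Build a toy cache key from the host and the path exactly as requested."""
    parts = urlsplit(url)
    return (parts.netloc.lower(), parts.path or "/")


def matches_no_cache_rule(url, pattern="*.htm"):
    """Return True if the requested path matches a hypothetical 'do not cache' pattern."""
    _, path = cache_key(url)
    return fnmatchcase(path.lower(), pattern)


# The proxy only sees the requested URL, so these are two different keys,
# even if the origin serves the same default document for both.
key_root = cache_key("http://target/")
key_page = cache_key("http://target/Default.htm")

rule_hits_root = matches_no_cache_rule("http://target/")             # path is "/"
rule_hits_page = matches_no_cache_rule("http://target/Default.htm")  # path ends in .htm
```

Under these assumptions the pattern matches the explicit Default.htm request but not the bare http://target/ request, which is consistent with the root URL being cached while the explicit page was not.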
Aug 25, 2009 11:13 AM|JAltrichter
I assume the prescribed solution for this condition might be to create a global URL Rewrite rule on the ARR server that explicitly redirects requests for a blank URL to the default page.
If that rule is processed before the ARR Cache Control rule, then I would think you would get the same behaviour for the default page as for all other HTML pages.
Aug 25, 2009 12:50 PM|anilr
Unless you are pretty sure that the default document will not change on your origin server, creating such a rule on the ARR server sounds risky. Maybe you can create a rule for all URLs with a trailing / to cover all directories.
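As a rough illustration of the trailing-slash idea, the matching logic could look like the Python sketch below. The actual rule would be configured in ARR/URL Rewrite rather than in code; the function name `should_bypass_cache` and the exact pattern are assumptions made for this example.

```python
import re

# Hypothetical pattern: paths that end in "/" (directories, which the origin
# answers with its default document) or in .htm/.html.
NO_CACHE_PATH = re.compile(r"(?:/|\.html?)$", re.IGNORECASE)


def should_bypass_cache(path):
    """Return True if the request path should be excluded from caching."""
    return NO_CACHE_PATH.search(path) is not None
```

With this pattern, "/" and "/docs/" are excluded along with "/Default.htm", while a path such as "/img/logo.png" is still cacheable, and the rule does not need to know which default document the origin is configured to serve.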
Mar 02, 2011 08:58 PM|dmeagor
>We opted for the second because we designed ARR for large-volume traffic environments and thought that having one user get stale content in exchange for better overall perf (reduced latency) is an acceptable trade-off.
Can anyone confirm whether this behavior has since been changed?
As far as I can tell, ARR does not re-request anything from the origin server until the page expires (the max-age value is exceeded).
I've not managed to find any combination of cache-control or expiry settings that causes ARR to recheck the data. The proxy-revalidate directive is completely ignored, and other settings like must-revalidate cause the page not to be cached in the first place.
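For anyone repeating this test, a small Python sketch for reading the Cache-Control header of an origin response may help when comparing what was sent with what the proxy did. It reflects the general HTTP caching rules (must-revalidate and proxy-revalidate ask a shared cache to revalidate a response once it is stale) and says nothing about what ARR actually honours; the function names are hypothetical.

```python
def parse_cache_control(header):
    """Split a Cache-Control header into a {directive: value-or-None} dict.

    Simplified: does not handle quoted values that contain commas.
    """
    directives = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, value = part.partition("=")
        directives[name.strip().lower()] = value.strip().strip('"') or None
    return directives


def asks_shared_cache_to_revalidate(directives):
    """True if the response asks a shared cache to revalidate once it is stale."""
    return "must-revalidate" in directives or "proxy-revalidate" in directives


example = parse_cache_control("public, max-age=60, proxy-revalidate")
```

Here `example` contains `max-age` with the value "60" and `proxy-revalidate` with no value, so a cache following the HTTP rules could serve the response for 60 seconds and would then be expected to revalidate it.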
Nov 01, 2011 06:50 AM|richtea
I'm having the same problem: there appears to be no method to invalidate cached content as there is in other systems.
We have had to fall back on deleting the file from the disk cache, but this is very unreliable, as IIS keeps the file locked for long periods and in busy situations will be serving from memory, which is not affected by this approach.
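A minimal Python sketch of the delete-from-disk workaround, with retries for the locked-file case, might look like this. The function name, the retry counts, and the idea of passing in the cached file's path are assumptions for illustration; as noted above, this cannot touch a copy that is being served from memory.

```python
import os
import time


def delete_cached_file(path, attempts=5, delay=0.5):
    """Try to remove a cached file, retrying while another process holds it open.

    Returns True if the file is gone afterwards, False if it stayed locked.
    """
    for attempt in range(attempts):
        try:
            os.remove(path)
            return True
        except FileNotFoundError:
            return True  # already removed, nothing left to do
        except PermissionError:
            # On Windows, a file held open by another process typically ends up here.
            if attempt < attempts - 1:
                time.sleep(delay)
    return False
```

A False return value means the file was still locked after all attempts, so the caller can log it and try again later instead of assuming the content was invalidated.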