With memory (and in conjunction with disk), it works slightly differently. Let me see if I can clarify.
Let's suppose that the memory cache interval is set to 5 seconds (the default value).
The flow would be something like:
1) Request is received by Windows server (http.sys, actually).
2) Cache miss at the kernel
3) Request forwarded to IIS/ARR
4) Cache miss at the disk
5) Request forwarded to origin and content received by IIS/ARR
6) Content cached to disk
7) Content cached to memory (for 5 seconds)
(Note that this is a two-tiered caching system. Leaving aside extreme corner cases, what's cached in memory will almost always be a subset of what's cached on disk.)
At T + 2 seconds
8) Same request received by Windows
9) Cache hit at memory (by http.sys)
10) Response sent to the client
At T + 6 seconds (greater than the configured 5 second value)
11) Same request received by Windows
12) Cache miss at memory. (With memory caching, an object is unloaded once it is no longer needed. This is one of the core differences between memory caching and disk caching: memory is scarcer than disk, so objects are unloaded after 5 seconds, or whatever value is configured. They may be unloaded more aggressively depending on memory consumption, etc., but that's all managed by the kernel.)
13) Request forwarded to IIS/ARR
14) Disk cache hit (the freshness of the cached content is also checked)
15) Content cached once again in memory (for 5 seconds - or configured value).
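If it helps, the flow above can be sketched as a toy model in Python. This is only an illustration of the lookup order, not how http.sys or ARR is implemented: the class name TwoTierCache, the fetch_origin callback, and the explicit "now" argument are all made up for the example, and the disk freshness check and memory-pressure eviction are left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class TwoTierCache:
    """Toy model: a short-lived memory tier in front of a disk tier."""
    fetch_origin: Callable[[str], bytes]  # hypothetical origin fetcher
    memory_ttl: float = 5.0               # memory cache interval, in seconds
    # url -> (content, time at which the memory entry expires)
    memory: Dict[str, Tuple[bytes, float]] = field(default_factory=dict)
    # url -> content (freshness checks are omitted in this sketch)
    disk: Dict[str, bytes] = field(default_factory=dict)

    def get(self, url: str, now: float) -> Tuple[bytes, str]:
        """Return (content, name of the tier that served it) at time `now`."""
        entry = self.memory.get(url)
        if entry is not None and now < entry[1]:
            return entry[0], "memory"       # steps 8-10: memory hit
        self.memory.pop(url, None)          # expired or never cached

        if url in self.disk:
            body, tier = self.disk[url], "disk"            # step 14: disk hit
        else:
            body, tier = self.fetch_origin(url), "origin"  # step 5
            self.disk[url] = body                          # step 6

        # Steps 7 and 15: (re)load into memory for the configured interval.
        self.memory[url] = (body, now + self.memory_ttl)
        return body, tier


cache = TwoTierCache(fetch_origin=lambda url: b"content for " + url.encode())
print(cache.get("/video.ism", now=0.0)[1])  # origin
print(cache.get("/video.ism", now=2.0)[1])  # memory
print(cache.get("/video.ism", now=6.0)[1])  # disk
```

The three calls correspond to T, T + 2 seconds, and T + 6 seconds in the steps above. Passing the time in explicitly just keeps the example deterministic.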
Hope that explains it.