Hi,

Sending this to the ia64 list, because that is so far the only platform I have tested on, and because the patch may be more likely to have real applications on ia64 systems.

I have been looking at different implementations of unmapping and page table freeing recently. As a consequence, I came to notice that the vast majority of L2 cache misses on ia64 (and probably all architectures) in an unmapping workload come from the line:

	pte_t ptent = *pte;

in zap_pte_range, i.e. walking the bottom-level page table pages. I should qualify that: this is the case when the page tables aren't in cache, so it does not apply to a simple lmbench fork/exit test, for example.

Anyway, I tried prefetching a line ahead of the one we're currently working in, and put the prefetching into zap_pte_range and copy_pte_range (which does a similar pte walk to set up page tables on fork()). Microbenchmark results are pretty good, but I wonder if anyone might have a real-world use for it?

After applying Hugh's recent freepgt patchset (on lkml), the time to fork+exit a process mapping 64GB of address space (32MB of page tables) is 0.471s. With the prefetch patch, this drops to 0.357s.