I think we can do some tuning of the accessed/dirty bit handlers.

E.g. in my patch (based on Christoph's patch entitled "Fix race in the accessed/dirty bit handlers"), I think we gain a bit by:

- using the "nta" hint so as not to "pollute" the L1D / L3 caches
- using the "bias" hint to obtain the "E" cache state from the start (this eliminates the additional snoop bus cycle for the "S" => "E" state transition)
- not testing the result of "cmpxchg" (we re-read the PTE and compare it anyway)

Thanks,

Zoltan
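
P.S. To illustrate the point about not testing the result of "cmpxchg", here is a rough sketch in portable C11 rather than the actual ia64 assembly. The function name, the bit positions and the return convention are all made up for the example; the "nta" and "bias" hints are instruction-level hints with no portable C equivalent, so they are left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit layout, chosen only for this illustration. */
#define PTE_PRESENT  (UINT64_C(1) << 0)
#define PTE_ACCESSED (UINT64_C(1) << 5)

/*
 * Try to set the accessed bit in *pte, given the value `seen` that the
 * handler read earlier.
 *
 * Returns 1 if the PTE now holds exactly `seen | PTE_ACCESSED`,
 * 0 if it holds anything else (the caller would then back out and let
 * the fault be retried).
 */
static int set_accessed_bit(_Atomic uint64_t *pte, uint64_t seen)
{
    uint64_t expected = seen;
    uint64_t wanted = seen | PTE_ACCESSED;

    assert(pte != NULL);

    /* The success/failure result is deliberately ignored ... */
    (void)atomic_compare_exchange_strong(pte, &expected, wanted);

    /* ... because the PTE is re-read and compared here anyway. */
    return atomic_load(pte) == wanted;
}
```

Note that if another CPU has already set the same bit, the compare-and-exchange fails but the re-read still matches, so in this sketch the final compare alone decides the outcome.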