From: Zachary Amsden
Subject: Re: pte_offset_map + lazy mmu
Date: Fri, 09 Mar 2007 22:54:20 -0800
Message-ID: <45F2561C.8010604@vmware.com>
In-Reply-To: <45F2521E.1000807@goop.org>
To: Jeremy Fitzhardinge
Cc: Chris Wright, Virtualization Mailing List
List-Id: virtualization@lists.linuxfoundation.org

Jeremy Fitzhardinge wrote:
> Is pte_offset_map allowed to happen within lazy mmu? I presume not,
> because you definitely don't want the mapping pte update to be deferred.
>
> Or, more specifically, is kunmap_atomic ever allowed within lazy mmu?
> I'm looking at kpte_clear_flush; I've already got a patch which turns
> this into a pv_op, along with a Xen implementation.  But I think it's
> probably an excess pv_op for a relatively minor corner case.  It seems
> to me that it would be better to define kpte_clear_flush as:
>
> #define kpte_clear_flush(ptep, vaddr)			\
> do {							\
> 	arch_enter_lazy_mmu_mode();			\
> 	pte_clear(&init_mm, vaddr, ptep);		\
> 	__flush_tlb_one(vaddr);				\
> 	arch_leave_lazy_mmu_mode();			\
> } while (0)
>
> and take advantage of mmu batching to make this operation efficient.
> But I'm not sure if this is safe.
>
> (Also, kmap_atomic could use set_pte_at rather than set_pte.)
>
> What do you think?

We (artificially) don't allow nesting of lazy modes; they are intended
mostly to be isolated to make the guest interface simpler. We could fix
that rather easily, but I think a better solution is possible.

This could also be genericized in a different way. PTE updates should
be limited in scope to just a couple of operations. We don't want to have
pv-ops for every possible combination of weirdness that can be
efficiently combined, because then pv-ops explodes.

I propose adding a hint or flags field to the PV-op. In fact, the
address argument vaddr can be compressed down to a VPN, which gives 12
more bits for flags to use. Now you can pass hints:

PV_LAZY_UPDATE
PV_DEAD_TABLE
PV_VADDR_VALID

which the backend can use. In the kpte_clear_flush example, you can now
pass PV_LAZY_UPDATE to combine the pte write with the flush. And in
address space destruction, you can pass PV_DEAD_TABLE, which allows you
to optimize away pte writes which would otherwise trap to the hypervisor
but are not needed: in the Xen case, you switch off the current process
page tables during exit() (or should, anyway, to avoid spurious traps to
Xen before the page tables are freed), and in our case, it gets rid of
the pte clears that don't need to be reflected in the shadows because
the shadow is about to die.

And for updates in the current address space, you can pass
PV_VADDR_VALID to indicate the virtual address field is actually valid
(note that vaddr == 0 is not a sufficient test, since updates to VPN 0
mappings do occur). This allows for various flush optimizations as well.

This also gets rid of all the update_pte_defer junk in the asm-i386
includes.
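To make the VPN-plus-flags packing concrete, here is a minimal sketch of
one possible encoding. It is not from this thread: the flag values, the
pv_pack_vaddr()/pv_unpack_*() helper names, and the choice to reuse the
page-offset bits for the hints are illustrative assumptions; only the
names PV_LAZY_UPDATE, PV_DEAD_TABLE, and PV_VADDR_VALID come from the
proposal above.

#include <asm/page.h>	/* PAGE_MASK */

/*
 * Sketch only: pack the page-aligned vaddr and hint flags into one
 * unsigned long pv-op argument.  The 12 page-offset bits of vaddr
 * carry no information for a pte update, so they can hold the hints.
 */
#define PV_FLAG_MASK	(~PAGE_MASK)

#define PV_LAZY_UPDATE	0x001UL	/* batch the pte write with a following flush */
#define PV_DEAD_TABLE	0x002UL	/* table is dying; shadow/hypervisor updates can be skipped */
#define PV_VADDR_VALID	0x004UL	/* the encoded virtual address is meaningful */

static inline unsigned long pv_pack_vaddr(unsigned long vaddr,
					  unsigned long flags)
{
	return (vaddr & PAGE_MASK) | (flags & PV_FLAG_MASK);
}

static inline unsigned long pv_unpack_vaddr(unsigned long arg)
{
	return arg & PAGE_MASK;
}

static inline unsigned long pv_unpack_flags(unsigned long arg)
{
	return arg & PV_FLAG_MASK;
}

With something like this, kpte_clear_flush could pass
pv_pack_vaddr(vaddr, PV_VADDR_VALID | PV_LAZY_UPDATE) to the pte-update
hook, while the exit()-time teardown path could pass PV_DEAD_TABLE so
the backend may drop the updates entirely.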
As long as we cooperate on the flags definition, and native is not
adversely affected by shifting the vaddr down (P4 shifts are slow; our
metrics with VMI showed no measurable high-level disadvantage here for
native, but the design has changed, and we should re-verify), then this
solution is workable. It just requires us to come up with a common
flags definition.

Zach