From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1KIT9L-0004AM-HN
	for qemu-devel@nongnu.org; Mon, 14 Jul 2008 14:51:51 -0400
Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1KIT9J-0004A2-SU
	for qemu-devel@nongnu.org; Mon, 14 Jul 2008 14:51:51 -0400
Received: from [199.232.76.173] (port=56100 helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43)
	id 1KIT9J-00049z-Oo
	for qemu-devel@nongnu.org; Mon, 14 Jul 2008 14:51:49 -0400
Received: from mail2.shareable.org ([80.68.89.115]:38079)
	by monty-python.gnu.org with esmtps (TLS-1.0:RSA_AES_256_CBC_SHA1:32)
	(Exim 4.60) (envelope-from )
	id 1KIT9J-00010G-LH
	for qemu-devel@nongnu.org; Mon, 14 Jul 2008 14:51:49 -0400
Date: Mon, 14 Jul 2008 19:51:47 +0100
From: Jamie Lokier
Subject: Re: [Qemu-devel] [RFC][PATCH] x86: Optional segment type and limit checks - v2
Message-ID: <20080714185147.GA12436@shareable.org>
References: <4874AB47.9090208@siemens.com>
	<487B2BC8.9050804@siemens.com>
	<20080714105531.GB2381@shareable.org>
	<200807141211.49825.paul@codesourcery.com>
	<20080714140238.GA5496@shareable.org>
	<20080714175027.GA6719@morn.localdomain>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20080714175027.GA6719@morn.localdomain>
Reply-To: qemu-devel@nongnu.org
List-Id: qemu-devel.nongnu.org
To: Kevin O'Connor
Cc: qemu-devel@nongnu.org

Kevin O'Connor wrote:
> >   - All segment bases are zero, and all limits are LIMIT
> >     (3GiB for old Linux in user mode).
> >   - When filling the MMU TLB, if it's for an address >= LIMIT,
> >     treat as MMU exception.
>
> That's interesting.
>
> If I understand correctly, you're suggesting using segment checks in
> translated code if any segment has a non-zero base or a different size
> limit.
> Thus limiting the optimization to those (common) cases where
> the limits and bases are all the same.  Presumably this would also
> require LIMIT to be page aligned.

Yes.  I hadn't thought of non-zero bases, but that would work too.
Base and limit have to be page aligned.  The only code where that
isn't true is likely 16 bit MS-DOS and Windows code, which, being
designed for much slower processors, should be fine with fully inlined
segment checks anyway.

> One could probably allow ES, FS, and GS to have different bases/limits
> as long as their ranges all fit within the primary range (0..LIMIT of
> CS, DS, and SS).

You could also translate just the base offset adding, in cases where
the limit covers the whole 4GiB.  But that adds more translation key
bits.

It's probably worth including ES as a "primary" segment like DS.

> It should be okay to always emit segment checks for
> accesses that use these segments as they should be pretty rare.

In Windows, and modern Linux, %fs or %gs are used in userspace code
for thread-specific variables.  In older Linux kernels, %fs is used to
copy data to/from userspace memory.  Maybe the translated checks for
these are not enough overhead to matter?

> Would this still work in 32bit flat mode?

It will if QEMU's MMU TLB is always used, and uses an identity
mapping.  I'm not familiar with that part of QEMU, but I had the
impression it's always used as it also traps memory-mapped I/O.

In which case, it should work in 16 bit flat mode too :)

> >   - Flush MMU TLB on any interesting segment change (limit gets
> >     smaller, etc.).
> >   - Count rate of interesting segment changes.  When it's high,
> >     switch to including segment checks in translated code (same as
> >     non-zero bases) and not flushing TLB.  When it's low, don't put
> >     segment checks into translated code, and use TLB flushes on
> >     segment changes.
> >   - Keep separate count for ring 0 and ring 3, or for
> >     "code which uses segment prefixes" vs "code which doesn't".
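
To make the counting heuristic in the quoted list above concrete, here
is a rough sketch in Python rather than QEMU's C.  Everything in it is
an assumption for illustration: the class name, the idea of measuring
the rate per window of executed blocks, and the threshold values.  None
of these names exist in QEMU.

```python
# Hypothetical model of the rate heuristic; not QEMU code.
INLINE_CHECKS = "inline-checks"  # segment checks emitted in translated code
MMU_CHECKS = "mmu-checks"        # limit enforced at TLB fill, flush on change


class SegmentChangeHeuristic:
    def __init__(self, window=1000, high=50, low=5):
        # window: executed blocks per measurement period (assumed unit).
        # high/low: two thresholds, so the mode does not flap near one value.
        self.window = window
        self.high = high
        self.low = low
        self.mode = MMU_CHECKS
        self.blocks = 0
        self.changes = 0

    def on_segment_change(self):
        """Record an interesting change; return True if a TLB flush is needed."""
        self.changes += 1
        return self.mode == MMU_CHECKS

    def on_block_executed(self):
        """Advance the window and re-evaluate the mode when it ends."""
        self.blocks += 1
        if self.blocks < self.window:
            return self.mode
        if self.changes >= self.high:
            self.mode = INLINE_CHECKS
        elif self.changes <= self.low:
            self.mode = MMU_CHECKS
        self.blocks = 0
        self.changes = 0
        return self.mode
```

Separate counts for ring 0 and ring 3 would then just be one instance
per ring, e.g. a dict keyed by ring number.
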
>
> Why are the heuristics needed?  I wonder if the tlb flush could just
> be optimized.

Even if TLB flush itself is fast, you need to refill the TLB entries
on subsequent memory accesses.  It's good to avoid TLB flushes for
that reason.

I'm thinking of code like this from Linux which does

    movl %fs:(%eax),%ebx
    movl %ebx,(%ecx)

I.e. rapidly switching between segments with different limits, and the
%ds accesses are to addresses forbidden by %fs.

If you're inlining %fs segment checks, though, then no TLB flush will
be needed.

> One would only need to flush the tlb when transitioning from "segment
> checks in translated code" mode to "segment checks in mmu" mode, or
> when directly going to a new LIMIT.  In these cases one could just
> flush NEWLIMIT..OLDLIMIT.

That's true, you could optimise the flush in other ways too, such as
when changing protection ring, just flush certain types of TLB entry.
Or even keep multiple TLBs on the go, hashed on mostly-constant values
like the translation cache, and the TLB choice being a translation
cache key so it can be inlined into translated code.

I didn't want to overcomplicate the suggestion, but you seem to like
funky optimisations :-)

-- Jamie
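
A rough illustration of the "limit check at TLB fill" idea together
with the NEWLIMIT..OLDLIMIT partial flush discussed above, as a toy
Python model.  It assumes an identity mapping, 4KiB pages and a single
flat limit; the class, method and exception names are made up for the
sketch and are not QEMU's.

```python
PAGE_SIZE = 4096  # assumed page size (x86 small pages)


class SegmentLimitFault(Exception):
    """Stands in for the MMU exception raised on a limit violation."""


class LimitCheckedTLB:
    """Toy identity-mapped TLB that enforces one flat segment limit."""

    def __init__(self, limit):
        if limit % PAGE_SIZE:
            raise ValueError("limit must be page aligned")
        self.limit = limit
        self.entries = {}  # virtual page number -> physical page number

    def fill(self, addr):
        # Never cache a translation at or above LIMIT, so such accesses
        # always reach this slow path and fault here.
        if addr >= self.limit:
            raise SegmentLimitFault(hex(addr))
        page = addr // PAGE_SIZE
        self.entries[page] = page  # identity mapping, illustration only
        return page

    def set_limit(self, new_limit):
        if new_limit % PAGE_SIZE:
            raise ValueError("limit must be page aligned")
        old_limit = self.limit
        self.limit = new_limit
        if new_limit < old_limit:
            # Only pages in NEWLIMIT..OLDLIMIT can hold stale entries.
            first = new_limit // PAGE_SIZE
            last = old_limit // PAGE_SIZE
            for page in [p for p in self.entries if first <= p < last]:
                del self.entries[page]
```

Raising the limit needs no flush in this model, since pages above the
old limit were never cached in the first place.
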