From: Paul Brook
Subject: Re: [Qemu-devel] sh : performance problem
Date: Wed, 4 Mar 2009 02:59:18 +0000
References: <49A6C317.1080202@juno.dti.ne.jp> <761ea48b0903031125n5d97462eu15caa552764789d9@mail.gmail.com> <1236119312.4005.13.camel@coalu.atr>
In-Reply-To: <1236119312.4005.13.camel@coalu.atr>
Message-Id: <200903040259.18791.paul@codesourcery.com>
Reply-To: qemu-devel@nongnu.org
To: qemu-devel@nongnu.org
Cc: Lionel Landwerlin

> Great :) But we're still far from arm :(
> By the way, does someone know why there is some kind of "tlb management
> code" in exec.c ??
>
> Does the SH4 architecture have special features that can't be handled in
> a generic code ? Or are we just rewriting some code that is already
> there ... ?

I think you're missing the most important difference: SH uses a software managed TLB, whereas ARM uses a hardware managed TLB.

The main consequence of this is that we don't have to model the actual ARM TLB at all, because it is never directly visible. We effectively implement an infinitely large TLB.

For SH the TLB is programmed directly, so we end up having to maintain two TLBs: the qemu TLB and the architectural SH TLB. For correct operation, pages must be removed from the qemu TLB when they are evicted/replaced in the SH TLB. The SH TLB is quite small, and flushing qemu TLB entries is quite expensive, so this results in fairly poor performance.

MIPS has a similar problem. However, in that case the most common TLB operations do not directly expose the TLB state. In particular, when setting a new TLB entry it is unspecified which TLB entry is replaced. At that point the OS can't know which entry was evicted, so we can lie, and not evict pages until the guest does something that allows it to determine the exact TLB state. In practice this is sufficient to make mips-linux work reasonably well.

I'm not sure if the same is possible for SH. It probably depends on whether URC is visible to/used by the guest.

Large pages add even more complications. The qemu TLB can only handle a single page size. In practice this means that when large pages are used, invalidating a single page entry requires the whole qemu TLB to be flushed. I'm pretty sure x86 gets this wrong and works mainly by chance (nothing actually uses large pages enough to notice it's broken). ARM takes the hit of a full TLB flush (linux breaks if you only flush a 1k region of a 4k entry), but single page flushes are rare, so in practice this doesn't hurt too much.

Paul
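
To make the two-TLB arrangement above concrete, here is a toy Python model. It is only a sketch under stated assumptions: the class `ShadowedTLB` and its methods are hypothetical names invented for illustration and do not correspond to anything in exec.c or the SH target code. It keeps a small architectural TLB that the guest programs slot by slot, plus an emulator-side cache, and counts how often replacing a guest entry forces the matching cached page to be dropped.

```python
# Toy model only: every name here is hypothetical, not QEMU's real API.

class ShadowedTLB:
    """A small guest-programmed TLB shadowed by an emulator-side cache."""

    def __init__(self, guest_slots):
        self.guest = [None] * guest_slots  # architectural TLB: slot -> virtual page
        self.fast = set()                  # emulator cache of already-translated pages
        self.flushes = 0                   # number of forced cache invalidations

    def guest_write(self, slot, vpn):
        """The guest OS programs one TLB slot directly (software-managed TLB)."""
        old = self.guest[slot]
        if old is not None and old != vpn and old in self.fast:
            # The replaced page must leave the emulator cache as well,
            # otherwise a stale translation would keep working.
            self.fast.discard(old)
            self.flushes += 1
        self.guest[slot] = vpn

    def access(self, vpn):
        """Translate one page access and report which path it took."""
        if vpn in self.fast:
            return "fast-hit"
        if vpn in self.guest:
            self.fast.add(vpn)             # refill the cache from the guest TLB
            return "refill"
        return "guest-tlb-miss"            # the guest's miss handler would run


if __name__ == "__main__":
    tlb = ShadowedTLB(guest_slots=2)
    tlb.guest_write(0, 0x10)
    print(tlb.access(0x10), tlb.access(0x10))
    tlb.guest_write(0, 0x20)               # replaces page 0x10 in slot 0
    print(tlb.access(0x10), tlb.flushes)
```

With only a handful of guest slots, a workload that touches more pages than there are slots hits the invalidation path on almost every replacement, which is the cost described above. A hardware-managed-TLB target corresponds to never calling `guest_write` with a replacement at all, so the cache simply grows.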