From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:42331)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1TLxVX-00007g-4c for qemu-devel@nongnu.org;
	Wed, 10 Oct 2012 10:43:42 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from ) id 1TLxVV-0000MP-VC for qemu-devel@nongnu.org;
	Wed, 10 Oct 2012 10:43:35 -0400
Received: from mail-vc0-f173.google.com ([209.85.220.173]:34812)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1TLxVV-0000MG-QW for qemu-devel@nongnu.org;
	Wed, 10 Oct 2012 10:43:33 -0400
Received: by mail-vc0-f173.google.com with SMTP id fl15so763716vcb.4
	for ; Wed, 10 Oct 2012 07:43:33 -0700 (PDT)
MIME-Version: 1.0
In-Reply-To: <20121010142823.GM9643@ohm.aurel32.net>
References: <1349814652-22325-1-git-send-email-aurelien@aurel32.net>
	<1349814652-22325-6-git-send-email-aurelien@aurel32.net>
	<20121010142823.GM9643@ohm.aurel32.net>
Date: Wed, 10 Oct 2012 16:43:32 +0200
From: Laurent Desnogues
Content-Type: text/plain; charset=ISO-8859-1
Subject: Re: [Qemu-devel] [PATCH 5/5] tcg/arm: improve direct jump
To: Aurelien Jarno
Cc: Peter Maydell , qemu-devel@nongnu.org

On Wed, Oct 10, 2012 at 4:28 PM, Aurelien Jarno wrote:
> On Wed, Oct 10, 2012 at 03:21:48PM +0200, Laurent Desnogues wrote:
>> On Tue, Oct 9, 2012 at 10:30 PM, Aurelien Jarno wrote:
>> > Use ldr pc, [pc, #-4] kind of branch for direct jump. This removes the
>> > need to flush the icache on TB linking, and allows removing the limit
>> > on the code generation buffer.
>>
>> I'm not sure I like it. In general having data in the middle
>> of code will increase I/D cache and I/D TLB pressure.
>
> Agreed. On the other hand, this patch removes the synchronization of
> the instruction cache for TB linking/unlinking.

TB linking/unlinking should happen less often than code execution.
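To make the trade-off concrete, here is a minimal sketch of the ldr pc scheme as I understand it from the patch description. It is not the code from the patch: the helper names emit_ldr_pc_jump and relink_jump are hypothetical, and host addresses are assumed to fit in 32 bits. On ARM the PC reads as the instruction address plus 8, so [pc, #-4] is the word right after the instruction. Linking a TB then only rewrites that data word, which is why no icache flush is needed, and also why a literal ends up in the middle of the code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A32 encoding of "ldr pc, [pc, #-4]" with condition AL. */
#define ARM_LDR_PC_PC_M4 0xe51ff004u

/* Hypothetical helper: emit the jump at code[0] and its target literal at
 * code[1]. Returns a pointer to the literal slot so it can be relinked. */
static uint32_t *emit_ldr_pc_jump(uint32_t *code, uint32_t target)
{
    assert(code != NULL);
    code[0] = ARM_LDR_PC_PC_M4; /* instruction word, never modified again */
    code[1] = target;           /* data word read by the ldr */
    return &code[1];
}

/* Hypothetical helper: link or unlink a TB by rewriting only the literal.
 * The instruction stream itself is left untouched. */
static void relink_jump(uint32_t *slot, uint32_t new_target)
{
    assert(slot != NULL);
    *slot = new_target;
}
```

Each direct jump costs two words instead of one, and the second word is fetched through the data side, which is the I/D cache and TLB pressure I am worried about.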
>> > This improves the boot-up speed of a MIPS guest by 11%.
>>
>> Boot speed is very specific. Did you test some other code?
>> Also what was your host?
>
> I tested it on a Cortex-A8 machine. I have only tested MIPS, but I can
> do more tests, like running the openssl testsuite in the emulated guest.

Yes, please.

[...]

> This doesn't really surprise me. The goal of the patch is to remove the
> limit of 16MB for the generated code. I really doubt you reach such a
> limit in user mode unless you use some complex code.
>
> On the other hand in system mode, this can be already reached once the
> whole guest kernel is translated, so cached code is dropped and has to
> be re-translated regularly. Re-translating guest code is clearly more
> expensive than the increase of I/D cache and I/D TLB pressure.

Ha yes, that's a real problem. What about having some define
and/or runtime flag so that QEMU keeps both the cache sync path
and your ldr pc change?

> The other way to allow more than 16MB of generated code would be to
> disable direct jump on ARM. It adds one 32-bit constant loading + one
> memory load, but then you don't have the I/D cache and TLB issue.

The performance hit would be even worse :-)


Laurent
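P.S. A rough sketch of what I mean by a flag, under my own assumptions: the enum jump_mode and the helpers below are hypothetical names, not existing QEMU code, and host addresses are assumed to be 32-bit. The A32 B instruction carries a signed 24-bit word offset relative to the instruction address plus 8, so it reaches roughly +/-32MB, which is presumably where the bound on the code generation buffer comes from. The ldr pc form has no such range limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical selector between the two direct-jump schemes. */
enum jump_mode { JUMP_MODE_B, JUMP_MODE_LDR_PC };

/* True if target is reachable by a B placed at insn_addr. */
static bool arm_b_in_range(uint32_t insn_addr, uint32_t target)
{
    int64_t disp = (int64_t)target - ((int64_t)insn_addr + 8);
    return (disp % 4) == 0 &&
           disp >= -(INT64_C(1) << 25) && disp < (INT64_C(1) << 25);
}

/* Encode "b target" (condition AL) for an instruction at insn_addr. */
static uint32_t arm_encode_b(uint32_t insn_addr, uint32_t target)
{
    assert(arm_b_in_range(insn_addr, target));
    int64_t disp = (int64_t)target - ((int64_t)insn_addr + 8);
    return 0xea000000u | ((uint32_t)(disp / 4) & 0x00ffffffu);
}

/* Emit a direct jump in the selected mode; returns the words written.
 * In JUMP_MODE_B the caller must still sync the icache for code[0]
 * whenever the jump is patched; in JUMP_MODE_LDR_PC only code[1],
 * a data word, changes on relinking. */
static int emit_direct_jump(enum jump_mode mode, uint32_t *code,
                            uint32_t insn_addr, uint32_t target)
{
    if (mode == JUMP_MODE_B) {
        code[0] = arm_encode_b(insn_addr, target);
        return 1;
    }
    code[0] = 0xe51ff004u; /* ldr pc, [pc, #-4] */
    code[1] = target;      /* literal read by the ldr */
    return 2;
}
```

With something like this, the B form could stay the default while the buffer fits in branch range, and the ldr pc form could be selected when a larger buffer is requested.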