From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:39478)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1XALJE-0000TN-JJ
	for qemu-devel@nongnu.org; Thu, 24 Jul 2014 11:52:02 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from ) id 1XALJ8-0008JE-F6
	for qemu-devel@nongnu.org; Thu, 24 Jul 2014 11:51:56 -0400
Received: from static.88-198-71-155.clients.your-server.de
	([88.198.71.155]:60873 helo=socrates.bennee.com)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1XALJ8-0008GF-7q
	for qemu-devel@nongnu.org; Thu, 24 Jul 2014 11:51:50 -0400
From: =?UTF-8?q?Alex=20Benn=C3=A9e?=
Date: Thu, 24 Jul 2014 16:52:52 +0100
Message-Id: <1406217175-30267-1-git-send-email-alex.bennee@linaro.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Subject: [Qemu-devel] [RFC PATCH 0/3] target-arm: Some fixes to page and TLB handling
To: qemu-devel@nongnu.org
Cc: peter.maydell@linaro.org, =?UTF-8?q?Alex=20Benn=C3=A9e?= , rth@twiddle.net

Hi,

While doing some performance analysis on aarch64 system emulation I
noticed a fairly high utilisation of cpu_arm_exec and the related
find-next-TB machinery. Peter pointed out this is probably not helped
by the fact that TARGET_PAGE_BITS was set to 10 (1k pages), which would
imply less chaining of TBs than we should be able to get. However,
enabling TARGET_PAGE_BITS 12 managed to shake out a bunch of bugs in
the TLB handling.

With TARGET_PAGE_BITS finally set to 12 I saw a drop in the % time
taken by cpu_arm_exec from 21.68% to 17.01% in my simple hand-driven
Android benchmark. I think if we are ever going to improve on this
further we need to consider alternative strategies for collecting,
invalidating and chaining together Translation Blocks.
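To make the chaining point concrete, here is a minimal sketch of why the
page size matters. It assumes, as I understand the translator, that a
direct jump between TBs is only chained when the source and destination
addresses fall in the same guest page. The helper names page_mask() and
same_page() are made up for illustration and are not QEMU functions;
page_bits stands in for TARGET_PAGE_BITS.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative only: these helpers are hypothetical and do not exist in
 * QEMU. page_bits plays the role of TARGET_PAGE_BITS.
 */

/* Mask that keeps the page number and clears the offset within the page. */
static inline uint64_t page_mask(unsigned page_bits)
{
    return ~((UINT64_C(1) << page_bits) - 1);
}

/*
 * True when pc and dest lie in the same guest page, i.e. when a direct
 * jump from one to the other would be a candidate for chaining under the
 * same-page assumption stated above.
 */
static inline bool same_page(uint64_t pc, uint64_t dest, unsigned page_bits)
{
    return ((pc ^ dest) & page_mask(page_bits)) == 0;
}
```

For example, a branch from 0x40001000 to 0x40001400 crosses a page
boundary with 1k pages (page_bits = 10) but stays within one page with
4k pages (page_bits = 12), so only the latter would qualify under this
assumption.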

I don't think this patch set is mergeable as-is because we still
include a bunch of 32 bit ARM boards in the aarch64-softmmu build which
could be using an old enough ARM that has support for 1k page tables
(and may even use them?). However, review comments are welcome, as is
any wider discussion on reducing the time spent jumping between TBs.

Regards,

Alex Bennée (3):
  target-arm: don't hardcode mask values in arm_cpu_handle_mmu_fault
  target-arm: A64: fix TLB flush instructions
  target-arm: A64: fix use 12 bit page tables for aarch64

 target-arm/cpu.h    | 13 ++++++++++---
 target-arm/helper.c | 16 ++++++++++++----
 2 files changed, 22 insertions(+), 7 deletions(-)

-- 
2.0.2