From mboxrd@z Thu Jan 1 00:00:00 1970
References: <1406733627-24255-1-git-send-email-alex.bennee@linaro.org> <53DBEC17.9040009@redhat.com> <87a97klf7p.fsf@linaro.org>
From: Alex Bennée
Date: Mon, 04 Aug 2014 12:34:22 +0100
In-reply-to: <87a97klf7p.fsf@linaro.org>
Message-ID: <878un4lc5d.fsf@linaro.org>
Subject: Re: [Qemu-devel] [PATCH v2 0/5] AArch64 TLB performance improvements
To: Paolo Bonzini
Cc: peter.maydell@linaro.org, Xin Tong, qemu-devel@nongnu.org

Alex Bennée writes:

> Paolo Bonzini writes:
>
>> On 30/07/2014 17:20, Alex Bennée wrote:
>>> Hi,
>>>
>
>>> The most important thing is I've measured a 25-30% improvement in
>>> kernel and android boot time.
>>>
>
>> Hi Alex, have you seen this patch? Perhaps you're interested in
>> reviving it.
>>
>> http://article.gmane.org/gmane.comp.emulators.qemu/253864
>
> I saw it when it first came out but I didn't quite follow what it was
> doing as I hadn't looked at the TLB code. I'll have another look and see
> what difference it can make.

A quick and dirty benchmark:

**** Comparing 10bit/12bit tables with and without [[http://article.gmane.org/gmane.comp.emulators.qemu/253864][victim cache]]

#+BEGIN_NOTES
Time in seconds, smaller is better
Percentage is the average time relative to the column to its left
#+END_NOTES

| Code  |   10 bit | 10 bit + victim |    12 bit | 12 bit + victim |
|-------+----------+-----------------+-----------+-----------------|
|       |   12.783 |          11.664 |    10.348 |           9.527 |
| Runs  |   13.046 |          11.971 |    10.123 |           9.326 |
|       |   12.929 |          11.673 |    11.130 |           9.858 |
|       |   12.981 |          11.941 |    10.223 |           9.673 |
|-------+----------+-----------------+-----------+-----------------|
| Avgs  | 12.93475 |        11.81225 |    10.456 |           9.596 |
|-------+----------+-----------------+-----------+-----------------|
| %prev |     100% |       91.321827 | 88.518276 |       91.775057 |
#+TBLFM: $2=vmean(@I..II)::$3=(@II$3/@II$2)*100::$4=vmean(@I..II)::$5=vmean(@I..II)

Which, as you would expect, shows that the TLB table size gives the
greater improvement (going from 10 to 12 bits cuts the average run time
by about 19%), but the victim cache also improves the run time on top of
this (by a further 8-9% at either table size). I say as you would expect
because any time you need to exit translated code there is a bunch of
overhead in doing so.

-- 
Alex Bennée
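
For anyone who wants to re-derive the Avgs and %prev rows outside of org-mode, here is a minimal Python sketch. The run times are copied from the table; the variable names and the extra comparison against the plain 10 bit baseline are illustrative additions, not part of the original measurements.

```python
from statistics import mean

# Run times in seconds, copied from the benchmark table.
runs = {
    "10 bit": [12.783, 13.046, 12.929, 12.981],
    "10 bit + victim": [11.664, 11.971, 11.673, 11.941],
    "12 bit": [10.348, 10.123, 11.130, 10.223],
    "12 bit + victim": [9.527, 9.326, 9.858, 9.673],
}

# Mean per configuration (the "Avgs" row).
avgs = {name: mean(times) for name, times in runs.items()}

# Each average as a percentage of the column to its left (the "%prev" row).
names = list(runs)
pct_prev = {names[0]: 100.0}
for left, cur in zip(names, names[1:]):
    pct_prev[cur] = avgs[cur] / avgs[left] * 100

# Extra comparison, not in the table: every configuration against the
# plain 10 bit baseline.
pct_base = {name: avgs[name] / avgs["10 bit"] * 100 for name in names}

for name in names:
    print(f"{name:16s} avg={avgs[name]:.5f}s "
          f"%prev={pct_prev[name]:.2f} %base={pct_base[name]:.2f}")
```

Comparing everything against the same baseline makes the two effects easier to separate than the chained %prev row, where the 12 bit column is measured against the 10 bit + victim run rather than against plain 10 bit.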