From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:51828)
  by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
  id 1bPREo-0006Jn-Bu for qemu-devel@nongnu.org;
  Tue, 19 Jul 2016 05:22:54 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
  (envelope-from ) id 1bPREk-0004eT-M1 for qemu-devel@nongnu.org;
  Tue, 19 Jul 2016 05:22:50 -0400
References: <578CEB7B.7010801@samsung.com>
From: Maxim Ostapenko
Message-id: <578DF161.9040205@samsung.com>
Date: Tue, 19 Jul 2016 12:22:41 +0300
MIME-version: 1.0
Content-type: text/plain; charset=utf-8; format=flowed
Content-transfer-encoding: 7bit
Subject: Re: [Qemu-devel] [Qemu-discuss] ASan'ed binaries start up very slow under qemu-aarch64.
To: Peter Maydell
Cc: qemu-discuss, QEMU Developers

On 18/07/16 18:51, Peter Maydell wrote:
> (CCing qemu-devel, which is more likely to get developer attention)

Peter, thank you for your answer.

>
> On 18 July 2016 at 15:45, Maxim Ostapenko wrote:
>> 1) AddressSanitizer mmaps quite large regions of memory for redzones and
>> shadow gap. In particular, for 39-bit AS it mmapes:
>>
>> || `[0x1400000000, 0x1fffffffff]` || HighShadow || - 48 Gb
>> || `[0x1200000000, 0x13ffffffff]` || ShadowGap || - 8 Gb
>> || `[0x1000000000, 0x11ffffffff]` || LowShadow || - 4 Gb
>>
>> 2) In QEMU, page_set_flags is called for these ranges. It cuts given range
>> to individual pages and sets flags for them. Given the page size is 4 Kb,
>> for 8 Gb range we have 2097152 iterations and for 48 Gb 12582912 iterations
>> in inner loop. This is obviously a performance bottleneck.
> Mmm, the algorithm here is pretty simple and basically assumes the
> guest isn't going to be doing enormous allocations like that.
> (If the host process doesn't happen to have a suitable big lump of its
> VA space free then the mmap will fail anyway.)
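As a sanity check on the iteration counts quoted above, here is a small Python sketch (not QEMU code) that derives the per-page loop counts from the listed address ranges. The `pages_in_range` and `summarize` helpers are illustrative names of my own, and the 4 KiB page size is taken from the figures in the thread:

```python
PAGE_SIZE = 4 * 1024  # assumed 4 KiB target page size, as stated in the thread
GIB = 1 << 30

# Inclusive address ranges as listed for the 39-bit address space.
REGIONS = {
    "HighShadow": (0x1400000000, 0x1FFFFFFFFF),
    "ShadowGap": (0x1200000000, 0x13FFFFFFFF),
    "LowShadow": (0x1000000000, 0x11FFFFFFFF),
}


def pages_in_range(start, last, page_size=PAGE_SIZE):
    """Pages a per-page loop would visit for the inclusive range [start, last]."""
    return (last - start + 1) // page_size


def summarize(regions=REGIONS):
    """Return {name: (size_in_gib, page_count)} for each region."""
    return {
        name: ((last - start + 1) // GIB, pages_in_range(start, last))
        for name, (start, last) in regions.items()
    }


if __name__ == "__main__":
    for name, (gib, pages) in summarize().items():
        print(f"{name:10s} {gib:3d} GiB  {pages:>10d} pages")
```

For the 8 Gb and 48 Gb ranges this gives 2097152 and 12582912 pages, matching the numbers above. Going by the listed bounds, LowShadow also comes out at 8 GiB rather than 4.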
Hm, it seems that ASan is really special here. Actually, I think that this
slowdown is not critical for individual runs, but it is certainly critical for
people who rely on QEMU in their builds (e.g. in an AArch64 chroot). Not sure
it's a common case, though.

>
>> 3) Same issue may happen when ASan tries to read /proc/self/map later in
>> page_check_range function, after it already mmaped HighShadow, ShadowGap and
>> LowShadow regions.
>>
>> Could someone help me, how can I mitigate this performance issue? Do we
>> really need to set flags to each page on entire (quite big) memory region?
> Well, we do need to do some things:
>  * we're populating the PageDesc data structure which we later use
>    to cache generated code
>  * if we're marking the range as writeable and it wasn't previously
>    writeable, we need to check whether there's already generated code
>    anywhere in this memory range and invalidate those translations
>
> This could probably be done in a way that doesn't iterate naively
> through every page, though.

Oh, I see. Perhaps we can restrict QEMU to using some well-defined pages for
generated code?

Thanks,
-Maxim

>
> thanks
> -- PMM
>
>
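To illustrate what "not iterating through every page" could look like, here is a rough Python sketch that stores flags per address range instead of per page. This is not how QEMU's page_set_flags works and it is not a proposed patch: the `RangeFlags` class, its methods, and the flag constants are all hypothetical, and the sketch ignores the PageDesc population and translation invalidation mentioned above.

```python
import bisect

PAGE_READ, PAGE_WRITE, PAGE_EXEC = 1, 2, 4  # hypothetical flag bits


class RangeFlags:
    """Flags stored per half-open range [start, end), not per page."""

    def __init__(self):
        self._ranges = []  # sorted, non-overlapping (start, end, flags)
        self._starts = []  # parallel list of range starts for bisect

    def set_flags(self, start, end, flags):
        """Assign flags to [start, end).

        Cost is linear in the number of stored ranges, independent of
        how many pages the range covers.
        """
        if start >= end:
            return
        kept = []
        for s, e, f in self._ranges:
            if e <= start or s >= end:
                kept.append((s, e, f))  # no overlap, keep unchanged
                continue
            if s < start:
                kept.append((s, start, f))  # left remainder
            if e > end:
                kept.append((end, e, f))  # right remainder
        kept.append((start, end, flags))
        kept.sort()
        # Merge adjacent ranges that carry identical flags.
        merged = []
        for s, e, f in kept:
            if merged and merged[-1][1] == s and merged[-1][2] == f:
                merged[-1] = (merged[-1][0], e, f)
            else:
                merged.append((s, e, f))
        self._ranges = merged
        self._starts = [s for s, _, _ in merged]

    def get_flags(self, addr):
        """Return the flags covering addr, or 0 if it is unmapped."""
        i = bisect.bisect_right(self._starts, addr) - 1
        if i >= 0:
            _, e, f = self._ranges[i]
            if addr < e:
                return f
        return 0

    def __len__(self):
        return len(self._ranges)
```

With this layout, recording the 48 Gb HighShadow mapping is a single entry, e.g. `RangeFlags().set_flags(0x1400000000, 0x2000000000, PAGE_READ | PAGE_WRITE)`, rather than millions of per-page updates. Whether something like this can coexist with the per-page bookkeeping needed for generated code is exactly the open question.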