From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:53635)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1eJKYj-0002EH-0m for qemu-devel@nongnu.org;
	Mon, 27 Nov 2017 09:38:57 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from ) id 1eJKYd-0003oT-Ch for qemu-devel@nongnu.org;
	Mon, 27 Nov 2017 09:38:57 -0500
Received: from mail-wr0-f173.google.com ([209.85.128.173]:42767)
	by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16)
	(Exim 4.71) (envelope-from ) id 1eJKYd-0003oB-5N
	for qemu-devel@nongnu.org; Mon, 27 Nov 2017 09:38:51 -0500
Received: by mail-wr0-f173.google.com with SMTP id o14so26637012wrf.9
	for ; Mon, 27 Nov 2017 06:38:51 -0800 (PST)
References:
From: Paolo Bonzini
Message-ID:
Date: Mon, 27 Nov 2017 15:38:47 +0100
MIME-Version: 1.0
In-Reply-To:
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] javac crash in user-mode emulation: races on page_unprotect()
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: Peter Maydell , QEMU Developers
Cc: =?UTF-8?Q?Alex_Benn=c3=a9e?= , Richard Henderson

On 24/11/2017 18:18, Peter Maydell wrote:
> * threads A & B both try to do a write to a page with code in it at
>   the same time (ie which we've made non-writeable, so SEGV)
> * they race into the signal handler with this faulting address
> * thread A happens to get to page_unprotect() first and takes the
>   mmap lock, so thread B sits waiting for it to be done
> * A then finds the page, marks it PAGE_WRITE and mprotect()s it writable
> * A can then continue OK (returns from signal handler to retry the
>   memory access)
> * ...but when B gets the mmap lock it finds that the page is already
>   PAGE_WRITE, and so it exits page_unprotect() via the "not due to
>   protected translation" code path, and wrongly delivers the signal
>   to the guest rather than just retrying the access
>
> I'm not sure how best to fix this. We could make page_unprotect()
> say "if PAGE_WRITE is set, assume this call raced with another one
> and say 'this was caused by protected translation' without doing
> anything".

Yes, I think this is the only solution since SIGSEGV is raised
asynchronously.  Even using a trylock would only narrow the race window
but not fix it.

> But I have a feeling that will mean we could end up looping
> endlessly if we get a SEGV for a write to a writeable page (not
> sure when this could happen, but maybe alignment issues?).

Those would have to be detected via si_code (for the specific case of
invalid address alignment, that would be a SIGBUS with
si_code==BUS_ADRALN, not a SIGSEGV).  In general, I think that only
SIGSEGV/SEGV_ACCERR needs to go down the page_unprotect path.

Paolo
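
To make the two ideas above concrete, here is a rough, untested-against-QEMU
sketch of how they could fit together. It is a toy model, not the real
page_unprotect(): every sk_* name, the flag values and the fixed-size page
table are hypothetical, and the real code would also call mprotect() and
invalidate translations where the comment says so. The only real names are
the signal constants from <signal.h>.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the per-page flags; not QEMU's definitions. */
#define SK_PAGE_WRITE      0x1u  /* page is currently writable */
#define SK_PAGE_WRITE_ORG  0x2u  /* guest originally mapped it writable */
#define SK_NPAGES          16u

static pthread_mutex_t sk_mmap_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned sk_page_flags[SK_NPAGES];

/*
 * Only a permission fault on a mapped page can be caused by our own
 * write protection.  SIGBUS/BUS_ADRALN, SEGV_MAPERR etc. are filtered
 * out here, so they never reach the unprotect path.
 */
static bool sk_fault_may_be_protection(int sig, int si_code)
{
    return sig == SIGSEGV && si_code == SEGV_ACCERR;
}

/*
 * Returns true if the faulting access should simply be retried,
 * false if the signal should be delivered to the guest.
 */
static bool sk_page_unprotect(unsigned page)
{
    bool retry = false;

    assert(page < SK_NPAGES);  /* caller passes a valid toy page index */

    pthread_mutex_lock(&sk_mmap_lock);
    unsigned flags = sk_page_flags[page];
    if (flags & SK_PAGE_WRITE) {
        /* Already writable: assume another thread won the race. */
        retry = true;
    } else if (flags & SK_PAGE_WRITE_ORG) {
        /* Real code would mprotect() and invalidate translations here. */
        sk_page_flags[page] = flags | SK_PAGE_WRITE;
        retry = true;
    }
    pthread_mutex_unlock(&sk_mmap_lock);
    return retry;
}

/* Entry point a signal handler would call in this toy model. */
static bool sk_handle_fault(int sig, int si_code, unsigned page)
{
    if (!sk_fault_may_be_protection(sig, si_code)) {
        return false;
    }
    return sk_page_unprotect(page);
}
```

In this model thread B, arriving after thread A has set SK_PAGE_WRITE, gets
"retry" instead of a guest-visible signal, while an alignment fault is
rejected by the si_code check before the page flags are even consulted.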