From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([140.186.70.92]:41885) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1RzsR1-0006V3-0m for qemu-devel@nongnu.org; Tue, 21 Feb 2012 11:19:33 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1RzsQv-0001t8-7e for qemu-devel@nongnu.org; Tue, 21 Feb 2012 11:19:22 -0500 Received: from cantor2.suse.de ([195.135.220.15]:49406 helo=mx2.suse.de) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1RzsQu-0001sn-Ul for qemu-devel@nongnu.org; Tue, 21 Feb 2012 11:19:17 -0500 Message-ID: <4F43C403.1000000@suse.de> Date: Tue, 21 Feb 2012 17:19:15 +0100 From: Alexander Graf MIME-Version: 1.0 References: <1326469565-7736-1-git-send-email-agraf@suse.de> <393B307E-09C8-4067-AFF1-98C5607C7F97@suse.de> In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH] linux-user: fix segfault deadlock List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Peter Maydell Cc: Fabio Erculiani , Riku Voipio , "qemu-devel@nongnu.org Developers" On 02/21/2012 05:11 PM, Peter Maydell wrote: > On 13 January 2012 16:21, Alexander Graf wrote: >> On 13.01.2012, at 17:16, Peter Maydell wrote: >>> On 13 January 2012 15:46, Alexander Graf wrote: >>>> This patch forces unlocking of that lock in the segv handler. I'm not sure >>>> this is the right approach though. Maybe we should rather make sure we don't >>>> segfault in the code? I would greatly appreciate someone more intelligible >>>> than me to look at this :). >>> A segfault while we're walking the TB chains in QEMU C code? >>> That's just a bug (and we know we have one there) -- we should >>> fix it rather than papering over it like this. >> Well, we're segfaulting in this exact special case which calls setrlimit() before an mmap which fails: > So what's actually happening here is not a problem with the failing > mmap requested by the guest, but with one we do ourselves slightly > later. The guest attempts to do a call to some code that hasn't been > translated yet (as it happens it's an attempt to call the tls function > in the ARM commpage at 0xffff0fe0, but that's not particularly > important). tb_gen_code() generates the code, and then calls > tb_link_page() to add it to our data structures. tb_link_page calls > tb_alloc_page() which calls page_find_alloc() which then needs > to allocate a level 2 page table entry in the l1_map[]. Unfortunately > (a) the mmap() it uses to do this fails because of the rlimit > (b) we don't check the mmap() return value, so we proceed on > with the bogus -1 and quickly segfault. > > So "unlock and carry on regardless" is definitely wrong. We could > do the same as the system-mode (which is using g_malloc0) and > abort when the mmap() fails, although that won't help your test > program much I suspect. > > For a proper fix we probably need to handle set/getrlimit for > RLIMIT_AS specially so we can apply them ourselves to the guest's > mmap/brk usage and don't get spurious allocation failures of > our private data structures... Or we preallocate structures we need in those critical paths. Or we manage an allocation failure by not linking pages. Alex