From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([140.186.70.92]:41885)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <agraf@suse.de>) id 1RzsR1-0006V3-0m
	for qemu-devel@nongnu.org; Tue, 21 Feb 2012 11:19:33 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <agraf@suse.de>) id 1RzsQv-0001t8-7e
	for qemu-devel@nongnu.org; Tue, 21 Feb 2012 11:19:22 -0500
Received: from cantor2.suse.de ([195.135.220.15]:49406 helo=mx2.suse.de)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <agraf@suse.de>) id 1RzsQu-0001sn-Ul
	for qemu-devel@nongnu.org; Tue, 21 Feb 2012 11:19:17 -0500
Message-ID: <4F43C403.1000000@suse.de>
Date: Tue, 21 Feb 2012 17:19:15 +0100
From: Alexander Graf <agraf@suse.de>
MIME-Version: 1.0
References: <1326469565-7736-1-git-send-email-agraf@suse.de>	<CAFEAcA9VWGSqQmr=S-BkPCSggNHULWqTbwksUiDfr2RxR5T2fg@mail.gmail.com>	<393B307E-09C8-4067-AFF1-98C5607C7F97@suse.de>
	<CAFEAcA8yzpm5p17Hae5+kqKD3kB12XD+gkeys9wBGWEU-g+-0w@mail.gmail.com>
In-Reply-To: <CAFEAcA8yzpm5p17Hae5+kqKD3kB12XD+gkeys9wBGWEU-g+-0w@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH] linux-user: fix segfault deadlock
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Peter Maydell <peter.maydell@linaro.org>
Cc: Fabio Erculiani <lxnay@sabayon.org>, Riku Voipio <riku.voipio@iki.fi>, "qemu-devel@nongnu.org Developers" <qemu-devel@nongnu.org>

On 02/21/2012 05:11 PM, Peter Maydell wrote:
> On 13 January 2012 16:21, Alexander Graf<agraf@suse.de>  wrote:
>> On 13.01.2012, at 17:16, Peter Maydell wrote:
>>> On 13 January 2012 15:46, Alexander Graf<agraf@suse.de>  wrote:
>>>> This patch forces unlocking of that lock in the segv handler. I'm not sure
>>>> this is the right approach though. Maybe we should rather make sure we don't
>>>> segfault in the code? I would greatly appreciate someone more intelligible
>>>> than me to look at this :).
>>> A segfault while we're walking the TB chains in QEMU C code?
>>> That's just a bug (and we know we have one there) -- we should
>>> fix it rather than papering over it like this.
>> Well, we're segfaulting in this exact special case which calls setrlimit() before an mmap which fails:
> So what's actually happening here is not a problem with the failing
> mmap requested by the guest, but with one we do ourselves slightly
> later. The guest attempts to do a call to some code that hasn't been
> translated yet (as it happens it's an attempt to call the tls function
> in the ARM commpage at 0xffff0fe0, but that's not particularly
> important). tb_gen_code() generates the code, and then calls
> tb_link_page() to add it to our data structures. tb_link_page calls
> tb_alloc_page() which calls page_find_alloc() which then needs
> to allocate a level 2 page table entry in the l1_map[]. Unfortunately
> (a) the mmap() it uses to do this fails because of the rlimit
> (b) we don't check the mmap() return value, so we proceed on
> with the bogus -1 and quickly segfault.
>
> So "unlock and carry on regardless" is definitely wrong. We could
> do the same as the system-mode (which is using g_malloc0) and
> abort when the mmap() fails, although that won't help your test
> program much I suspect.
>
> For a proper fix we probably need to handle set/getrlimit for
> RLIMIT_AS specially so we can apply them ourselves to the guest's
> mmap/brk usage and don't get spurious allocation failures of
> our private data structures...

Or we preallocate structures we need in those critical paths. Or we 
manage an allocation failure by not linking pages.

Alex