From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([140.186.70.92]:43085)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1Qr4f6-0002qQ-S3 for qemu-devel@nongnu.org;
	Wed, 10 Aug 2011 05:01:22 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from ) id 1Qr4f1-0003fD-3w for qemu-devel@nongnu.org;
	Wed, 10 Aug 2011 05:01:16 -0400
Received: from mx1.redhat.com ([209.132.183.28]:37131)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1Qr4f0-0003f5-Su for qemu-devel@nongnu.org;
	Wed, 10 Aug 2011 05:01:11 -0400
Message-ID: <4E4248D0.6070209@redhat.com>
Date: Wed, 10 Aug 2011 12:01:04 +0300
From: Avi Kivity
MIME-Version: 1.0
References: <1312516970-26606-1-git-send-email-david@gibson.dropbear.id.au>
	<4E3B8ACA.7080104@web.de>
	<20110805153053.GA15083@amt.cnet>
	<20110808060328.GB20120@yookeroo.fritz.box>
	<4E3F9D29.2000708@redhat.com>
	<20110810051002.GM23511@yookeroo.fritz.box>
In-Reply-To: <20110810051002.GM23511@yookeroo.fritz.box>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH] Permit -mem-path without sync mmu
To: Marcelo Tosatti, Jan Kiszka, qemu-devel@nongnu.org, agraf@suse.de,
	kvm, Paul Mackerras

On 08/10/2011 08:10 AM, David Gibson wrote:
> On Mon, Aug 08, 2011 at 11:24:09AM +0300, Avi Kivity wrote:
> > On 08/08/2011 09:03 AM, David Gibson wrote:
> > >Second, if userspace qemu passing hugepages to kvm can cause (host)
> > >kernel memory corruption, that is clearly a host kernel bug. So am I
> > >correct in thinking this is basically just a safety feature if qemu is
> > >run on a buggy kernel.
> >
> > Seems so, yes. 2.6.2[456] are exploitable. We only found out after
> > these were all released.
> >
> > >Presumably this bug was corrected at some
> > >point? Is the presence of the SYNC_MMU feature just being used as a
> > >proxy for "is this kernel recent enough to have the corruption bug
> > >fixed"?
> >
> > SYNC_MMU actually fixes the bug.
>
> Ah, so SYNC_MMU fixed the bug on x86, and all the other archs without
> SYNC_MMU were left with a serious memory corruption bug, under a
> userspace bandaid. Thanks for that.

Unfortunately it's all too easy to ignore non-x86.

It may be considered that not implementing SYNC_MMU is a bug in itself,
as it allows userspace to pin arbitrary amounts of user memory. At
least on x86 we had shrinkers that kill off shadow page tables under
memory pressure, unpinning memory, but I don't see it on ppc.

> As I understand the bug that causes the problem, it's because removing
> all the hugepage VMAs from userspace will cause the inode (and
> therefore address_space) for the hugepage file to be freed, but not
> the pages (because another ref is held by kvm). Then when kvm
> releases the pages, the address_space will be touched after free from
> free_huge_page().
>
> This would seem to be a genuine bug in the hugepage code, which has
> just been hidden by SYNC_MMU. It should be quite easy to fix - the
> mapping is only stored in the struct page to get to the hugetlbfs
> superblock, so we could just store a direct superblock pointer
> instead, and bump it's refcount when we put that in the page private
> pointer.
>
> But then I'm not sure how qemu would detect that it's on a kernel
> where the bug is fixed and allow -mem-path to be used again. Any
> ideas?

If it's just a kernel bug, the fix belongs in the kernel, not in qemu.
We used to have KVM_CAPs to declare this sort of thing
(KVM_CAP_HUGETLBFS_WORKS_EVEN_WITHOUT_SYNC_MMU) but I don't think it was
a good idea.

-- 
error compiling committee.c: too many arguments to function
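
For illustration only, here is a toy userspace model of the lifetime fix
David describes: the page holds a direct, counted reference to the
superblock, so the superblock is still alive when the last page holder
lets go. Every name here (toy_sb, toy_page, toy_page_attach and so on)
is hypothetical and made up for this sketch; it is not the hugetlbfs
code and only models the reference counting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the hugetlbfs superblock: a refcounted object. */
struct toy_sb {
    int refcount;
    long free_pages;  /* accounting touched when a page is released */
    bool destroyed;   /* set instead of calling free(), so callers can inspect it */
};

/* Hypothetical stand-in for a huge page and its private pointer. */
struct toy_page {
    struct toy_sb *sb; /* direct, counted reference (the proposed fix) */
};

void toy_sb_get(struct toy_sb *sb)
{
    sb->refcount++;
}

void toy_sb_put(struct toy_sb *sb)
{
    assert(sb->refcount > 0);
    if (--sb->refcount == 0)
        sb->destroyed = true;
}

/* Storing the pointer in the page takes a reference on the superblock. */
void toy_page_attach(struct toy_page *page, struct toy_sb *sb)
{
    toy_sb_get(sb);
    page->sb = sb;
}

/* Releasing the page touches the superblock, then drops the reference. */
void toy_page_free(struct toy_page *page)
{
    assert(!page->sb->destroyed); /* would fail in the use-after-free case */
    page->sb->free_pages++;
    toy_sb_put(page->sb);
    page->sb = NULL;
}

/*
 * Ordering from the thread: the mappings go away first, and the other
 * page holder (kvm in the discussion) releases the page afterwards.
 * Returns true if the superblock stayed alive until the page was freed.
 */
bool toy_scenario(void)
{
    struct toy_sb sb = { .refcount = 1, .free_pages = 0, .destroyed = false };
    struct toy_page page;
    bool alive_after_unmap;

    toy_page_attach(&page, &sb);  /* refcount: 2 */
    toy_sb_put(&sb);              /* last mapping removed, refcount: 1 */
    alive_after_unmap = !sb.destroyed;
    toy_page_free(&page);         /* refcount: 0, superblock torn down */

    return alive_after_unmap && sb.destroyed && sb.free_pages == 1;
}
```

Without the toy_sb_get() in toy_page_attach(), the toy_sb_put() that
models the unmap would tear the superblock down while the page still
points at it, which is the ordering problem described above.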