* [Qemu-devel] [PATCH] Permit -mem-path without sync mmu
@ 2011-08-05 4:02 David Gibson
2011-08-05 6:16 ` Jan Kiszka
0 siblings, 1 reply; 8+ messages in thread
From: David Gibson @ 2011-08-05 4:02 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-trivial, agraf
At present, an explicit test disallows use of -mem-path when kvm is enabled
but KVM_CAP_SYNC_MMU is not set. In particular, this prevents the user
from using hugetlbfs to back the guest memory.
I can see no reason for this check, and when I asked about it previously,
the only theory offered was that this was a limitation of the very early
days of kvm which only happened to match the SYNC_MMU flag by accident.
This patch, therefore, removes the check. This is of particular use to
us on POWER, where we haven't yet implement SYNC_MMU, but where backing
the guest with hugepages is possible, and in fact mandatory (for now).
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
exec.c | 5 -----
1 files changed, 0 insertions(+), 5 deletions(-)
diff --git a/exec.c b/exec.c
index 476b507..041637c 100644
--- a/exec.c
+++ b/exec.c
@@ -2818,11 +2818,6 @@ static void *file_ram_alloc(RAMBlock *block,
return NULL;
}
- if (kvm_enabled() && !kvm_has_sync_mmu()) {
- fprintf(stderr, "host lacks kvm mmu notifiers, -mem-path unsupported\n");
- return NULL;
- }
-
if (asprintf(&filename, "%s/qemu_back_mem.XXXXXX", path) == -1) {
return NULL;
}
--
1.7.5.4
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [Qemu-devel] [PATCH] Permit -mem-path without sync mmu
2011-08-05 4:02 [Qemu-devel] [PATCH] Permit -mem-path without sync mmu David Gibson
@ 2011-08-05 6:16 ` Jan Kiszka
2011-08-05 15:30 ` Marcelo Tosatti
0 siblings, 1 reply; 8+ messages in thread
From: Jan Kiszka @ 2011-08-05 6:16 UTC (permalink / raw)
To: David Gibson; +Cc: Avi Kivity, Marcelo Tosatti, qemu-devel, kvm, agraf
[-- Attachment #1: Type: text/plain, Size: 1562 bytes --]
On 2011-08-05 06:02, David Gibson wrote:
> At present, an explicit test disallows use of -mem-path when kvm is enabled
> but KVM_CAP_SYNC_MMU is not set. In particular, this prevents the user
> from using hugetlbfs to back the guest memory.
>
> I can see no reason for this check, and when I asked about it previously,
> the only theory offered was that this was a limitation of the very early
> days of kvm which only happened to match the SYNC_MMU flag by accident.
>
> This patch, therefore, removes the check. This is of particular use to
> us on POWER, where we haven't yet implement SYNC_MMU, but where backing
> the guest with hugepages is possible, and in fact mandatory (for now).
>
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
> exec.c | 5 -----
> 1 files changed, 0 insertions(+), 5 deletions(-)
>
> diff --git a/exec.c b/exec.c
> index 476b507..041637c 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -2818,11 +2818,6 @@ static void *file_ram_alloc(RAMBlock *block,
> return NULL;
> }
>
> - if (kvm_enabled() && !kvm_has_sync_mmu()) {
> - fprintf(stderr, "host lacks kvm mmu notifiers, -mem-path unsupported\n");
> - return NULL;
> - }
> -
> if (asprintf(&filename, "%s/qemu_back_mem.XXXXXX", path) == -1) {
> return NULL;
> }
This is nothing trivial, see ce9a92411d in qemu-kvm or
http://thread.gmane.org/gmane.comp.emulators.kvm.devel/27380. And it
should rather target uq/master. CCing Avi, Marcelo, and the kvm list.
Jan
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 262 bytes --]
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Qemu-devel] [PATCH] Permit -mem-path without sync mmu
2011-08-05 6:16 ` Jan Kiszka
@ 2011-08-05 15:30 ` Marcelo Tosatti
2011-08-08 6:03 ` David Gibson
0 siblings, 1 reply; 8+ messages in thread
From: Marcelo Tosatti @ 2011-08-05 15:30 UTC (permalink / raw)
To: Jan Kiszka; +Cc: agraf, Avi Kivity, qemu-devel, kvm, David Gibson
On Fri, Aug 05, 2011 at 08:16:42AM +0200, Jan Kiszka wrote:
> On 2011-08-05 06:02, David Gibson wrote:
> > At present, an explicit test disallows use of -mem-path when kvm is enabled
> > but KVM_CAP_SYNC_MMU is not set. In particular, this prevents the user
> > from using hugetlbfs to back the guest memory.
> >
> > I can see no reason for this check, and when I asked about it previously,
> > the only theory offered was that this was a limitation of the very early
> > days of kvm which only happened to match the SYNC_MMU flag by accident.
> >
> > This patch, therefore, removes the check. This is of particular use to
> > us on POWER, where we haven't yet implement SYNC_MMU, but where backing
> > the guest with hugepages is possible, and in fact mandatory (for now).
> >
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> > exec.c | 5 -----
> > 1 files changed, 0 insertions(+), 5 deletions(-)
> >
> > diff --git a/exec.c b/exec.c
> > index 476b507..041637c 100644
> > --- a/exec.c
> > +++ b/exec.c
> > @@ -2818,11 +2818,6 @@ static void *file_ram_alloc(RAMBlock *block,
> > return NULL;
> > }
> >
> > - if (kvm_enabled() && !kvm_has_sync_mmu()) {
> > - fprintf(stderr, "host lacks kvm mmu notifiers, -mem-path unsupported\n");
> > - return NULL;
> > - }
> > -
> > if (asprintf(&filename, "%s/qemu_back_mem.XXXXXX", path) == -1) {
> > return NULL;
> > }
>
> This is nothing trivial, see ce9a92411d in qemu-kvm or
> http://thread.gmane.org/gmane.comp.emulators.kvm.devel/27380. And it
> should rather target uq/master. CCing Avi, Marcelo, and the kvm list.
>
> Jan
Yes, the check cannot be removed because there is the possibility of
corruption using hugepages without mmu notifiers (described in the
archived message above).
Why are mmu notifiers not implemented for PPC again?
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Qemu-devel] [PATCH] Permit -mem-path without sync mmu
2011-08-05 15:30 ` Marcelo Tosatti
@ 2011-08-08 6:03 ` David Gibson
2011-08-08 8:24 ` Avi Kivity
0 siblings, 1 reply; 8+ messages in thread
From: David Gibson @ 2011-08-08 6:03 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: Avi Kivity, Jan Kiszka, qemu-devel, kvm, agraf
On Fri, Aug 05, 2011 at 12:30:53PM -0300, Marcelo Tosatti wrote:
> On Fri, Aug 05, 2011 at 08:16:42AM +0200, Jan Kiszka wrote:
> > On 2011-08-05 06:02, David Gibson wrote:
> > > At present, an explicit test disallows use of -mem-path when kvm is enabled
> > > but KVM_CAP_SYNC_MMU is not set. In particular, this prevents the user
> > > from using hugetlbfs to back the guest memory.
> > >
> > > I can see no reason for this check, and when I asked about it previously,
> > > the only theory offered was that this was a limitation of the very early
> > > days of kvm which only happened to match the SYNC_MMU flag by accident.
> > >
> > > This patch, therefore, removes the check. This is of particular use to
> > > us on POWER, where we haven't yet implement SYNC_MMU, but where backing
> > > the guest with hugepages is possible, and in fact mandatory (for now).
> > >
> > > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > > ---
> > > exec.c | 5 -----
> > > 1 files changed, 0 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/exec.c b/exec.c
> > > index 476b507..041637c 100644
> > > --- a/exec.c
> > > +++ b/exec.c
> > > @@ -2818,11 +2818,6 @@ static void *file_ram_alloc(RAMBlock *block,
> > > return NULL;
> > > }
> > >
> > > - if (kvm_enabled() && !kvm_has_sync_mmu()) {
> > > - fprintf(stderr, "host lacks kvm mmu notifiers, -mem-path unsupported\n");
> > > - return NULL;
> > > - }
> > > -
> > > if (asprintf(&filename, "%s/qemu_back_mem.XXXXXX", path) == -1) {
> > > return NULL;
> > > }
> >
> > This is nothing trivial, see ce9a92411d in qemu-kvm or
> > http://thread.gmane.org/gmane.comp.emulators.kvm.devel/27380. And it
> > should rather target uq/master. CCing Avi, Marcelo, and the kvm list.
> >
> > Jan
Well, sending the patch flushed out the real reason for that check, at
least, as I thought it might.
> Yes, the check cannot be removed because there is the possibility of
> corruption using hugepages without mmu notifiers (described in the
> archived message above).
Ok, so. If I understand the archived message correctly. First, this
check *is* all about hugepages - which is not obvious from the test
itself.
Second, if userspace qemu passing hugepages to kvm can cause (host)
kernel memory corruption, that is clearly a host kernel bug. So am I
correct in thinking this is basically just a safety feature if qemu is
run on a buggy kernel. Presumably this bug was corrected at some
point? Is the presence of the SYNC_MMU feature just being used as a
proxy for "is this kernel recent enough to have the corruption bug
fixed"?
In any case this test sure as hell needs a big comment next to it
explaining this context.
> Why are mmu notifiers not implemented for PPC again?
It's just not done yet; we're working on it. (That is, mmu notifiers
are certainly present on PPC, it's just they're not wired up to kvm,
yet).
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Qemu-devel] [PATCH] Permit -mem-path without sync mmu
2011-08-08 6:03 ` David Gibson
@ 2011-08-08 8:24 ` Avi Kivity
2011-08-10 5:10 ` David Gibson
0 siblings, 1 reply; 8+ messages in thread
From: Avi Kivity @ 2011-08-08 8:24 UTC (permalink / raw)
To: Marcelo Tosatti, Jan Kiszka, qemu-devel, agraf, kvm
On 08/08/2011 09:03 AM, David Gibson wrote:
> Second, if userspace qemu passing hugepages to kvm can cause (host)
> kernel memory corruption, that is clearly a host kernel bug. So am I
> correct in thinking this is basically just a safety feature if qemu is
> run on a buggy kernel.
Seems so, yes. 2.6.2[456] are exploitable. We only found out after
these were all released.
> Presumably this bug was corrected at some
> point? Is the presence of the SYNC_MMU feature just being used as a
> proxy for "is this kernel recent enough to have the corruption bug
> fixed"?
SYNC_MMU actually fixes the bug.
> In any case this test sure as hell needs a big comment next to it
> explaining this context.
Yes.
>
> > Why are mmu notifiers not implemented for PPC again?
>
> It's just not done yet; we're working on it. (That is, mmu notifiers
> are certainly present on PPC, it's just they're not wired up to kvm,
> yet).
>
If ppc doesn't have this issue even without SYNC_MMU, we can make the
check x86 specific.
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Qemu-devel] [PATCH] Permit -mem-path without sync mmu
2011-08-08 8:24 ` Avi Kivity
@ 2011-08-10 5:10 ` David Gibson
2011-08-10 9:01 ` Avi Kivity
0 siblings, 1 reply; 8+ messages in thread
From: David Gibson @ 2011-08-10 5:10 UTC (permalink / raw)
To: Avi Kivity
Cc: kvm, Marcelo Tosatti, qemu-devel, agraf, Paul Mackerras,
Jan Kiszka
On Mon, Aug 08, 2011 at 11:24:09AM +0300, Avi Kivity wrote:
> On 08/08/2011 09:03 AM, David Gibson wrote:
> >Second, if userspace qemu passing hugepages to kvm can cause (host)
> >kernel memory corruption, that is clearly a host kernel bug. So am I
> >correct in thinking this is basically just a safety feature if qemu is
> >run on a buggy kernel.
>
> Seems so, yes. 2.6.2[456] are exploitable. We only found out after
> these were all released.
>
> >Presumably this bug was corrected at some
> >point? Is the presence of the SYNC_MMU feature just being used as a
> >proxy for "is this kernel recent enough to have the corruption bug
> >fixed"?
>
> SYNC_MMU actually fixes the bug.
Ah, so SYNC_MMU fixed the bug on x86, and all the other archs without
SYNC_MMU were left with a serious memory corruption bug, under a
userspace bandaid. Thanks for that.
As I understand the bug that causes the problem, it's because removing
all the hugepage VMAs from userspace will cause the inode (and
therefore address_space) for the hugepage file to be freed, but not
the pages (because another ref is held by kvm). Then when kvm
releases the pages, the address_space will be touched after free from
free_huge_page().
This would seem to be a genuine bug in the hugepage code, which has
just been hidden by SYNC_MMU. It should be quite easy to fix - the
mapping is only stored in the struct page to get to the hugetlbfs
superblock, so we could just store a direct superblock pointer
instead, and bump it's refcount when we put that in the page private
pointer.
But then I'm not sure how qemu would detect that it's on a kernel
where the bug is fixed and allow -mem-path to be used again. Any
ideas?
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Qemu-devel] [PATCH] Permit -mem-path without sync mmu
2011-08-10 5:10 ` David Gibson
@ 2011-08-10 9:01 ` Avi Kivity
2011-08-11 6:09 ` David Gibson
0 siblings, 1 reply; 8+ messages in thread
From: Avi Kivity @ 2011-08-10 9:01 UTC (permalink / raw)
To: Marcelo Tosatti, Jan Kiszka, qemu-devel, agraf, kvm,
Paul Mackerras
On 08/10/2011 08:10 AM, David Gibson wrote:
> On Mon, Aug 08, 2011 at 11:24:09AM +0300, Avi Kivity wrote:
> > On 08/08/2011 09:03 AM, David Gibson wrote:
> > >Second, if userspace qemu passing hugepages to kvm can cause (host)
> > >kernel memory corruption, that is clearly a host kernel bug. So am I
> > >correct in thinking this is basically just a safety feature if qemu is
> > >run on a buggy kernel.
> >
> > Seems so, yes. 2.6.2[456] are exploitable. We only found out after
> > these were all released.
> >
> > >Presumably this bug was corrected at some
> > >point? Is the presence of the SYNC_MMU feature just being used as a
> > >proxy for "is this kernel recent enough to have the corruption bug
> > >fixed"?
> >
> > SYNC_MMU actually fixes the bug.
>
> Ah, so SYNC_MMU fixed the bug on x86, and all the other archs without
> SYNC_MMU were left with a serious memory corruption bug, under a
> userspace bandaid. Thanks for that.
Unfortunately it's all too easy to ignore non-x86.
It may be considered that not implementing SYNC_MMU is a bug in itself,
as it allows userspace to pin arbitrary amounts of user memory. At
least on x86 we had shrinkers that kill off shadow page tables under
memory pressure, unpinning memory, but I don't see it on ppc.
> As I understand the bug that causes the problem, it's because removing
> all the hugepage VMAs from userspace will cause the inode (and
> therefore address_space) for the hugepage file to be freed, but not
> the pages (because another ref is held by kvm). Then when kvm
> releases the pages, the address_space will be touched after free from
> free_huge_page().
>
> This would seem to be a genuine bug in the hugepage code, which has
> just been hidden by SYNC_MMU. It should be quite easy to fix - the
> mapping is only stored in the struct page to get to the hugetlbfs
> superblock, so we could just store a direct superblock pointer
> instead, and bump it's refcount when we put that in the page private
> pointer.
>
> But then I'm not sure how qemu would detect that it's on a kernel
> where the bug is fixed and allow -mem-path to be used again. Any
> ideas?
If it's just a kernel bug, the fix belongs in the kernel, not in qemu.
We used to have KVM_CAPs to declare this sort of thing
(KVM_CAP_HUGETLBFS_WORKS_EVEN_WITHOUT_SYNC_MMU) but I don't think it was
a good idea.
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Qemu-devel] [PATCH] Permit -mem-path without sync mmu
2011-08-10 9:01 ` Avi Kivity
@ 2011-08-11 6:09 ` David Gibson
0 siblings, 0 replies; 8+ messages in thread
From: David Gibson @ 2011-08-11 6:09 UTC (permalink / raw)
To: Avi Kivity
Cc: kvm, Marcelo Tosatti, qemu-devel, agraf, Paul Mackerras,
Jan Kiszka
On Wed, Aug 10, 2011 at 12:01:04PM +0300, Avi Kivity wrote:
> On 08/10/2011 08:10 AM, David Gibson wrote:
> >On Mon, Aug 08, 2011 at 11:24:09AM +0300, Avi Kivity wrote:
> >> On 08/08/2011 09:03 AM, David Gibson wrote:
[snip]
> >This would seem to be a genuine bug in the hugepage code, which has
> >just been hidden by SYNC_MMU. It should be quite easy to fix - the
> >mapping is only stored in the struct page to get to the hugetlbfs
> >superblock, so we could just store a direct superblock pointer
> >instead, and bump it's refcount when we put that in the page private
> >pointer.
> >
> >But then I'm not sure how qemu would detect that it's on a kernel
> >where the bug is fixed and allow -mem-path to be used again. Any
> >ideas?
>
> If it's just a kernel bug, the fix belongs in the kernel, not in qemu.
Obviously.
> We used to have KVM_CAPs to declare this sort of thing
> (KVM_CAP_HUGETLBFS_WORKS_EVEN_WITHOUT_SYNC_MMU) but I don't think it
> was a good idea.
I tend to agree - especially since there's nothing actually kvm
specific about this bug. AFAICT a driver which did gup on hugepages
could trigger the bug equally well.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2011-08-11 6:14 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-08-05 4:02 [Qemu-devel] [PATCH] Permit -mem-path without sync mmu David Gibson
2011-08-05 6:16 ` Jan Kiszka
2011-08-05 15:30 ` Marcelo Tosatti
2011-08-08 6:03 ` David Gibson
2011-08-08 8:24 ` Avi Kivity
2011-08-10 5:10 ` David Gibson
2011-08-10 9:01 ` Avi Kivity
2011-08-11 6:09 ` David Gibson
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).