* [PATCH 0/2] Fix failures caused by fork() interaction with internal slots
From: Avi Kivity @ 2010-06-21 8:18 UTC
To: kvm, Marcelo Tosatti

fork() has a WONTFIX bug where a page with an elevated reference count will be
COWed such that the page address changes even in the process which has taken
the reference.  This interacts badly with internal memory slots that install
pages in vmcs registers, such as the APIC access page.

This patchset disables fork() for these slots.

Avi Kivity (2):
  KVM: Keep slot ID in memory slot structure
  KVM: Prevent internal slots from being COWed

 arch/x86/kvm/x86.c       |    5 +++++
 include/linux/kvm_host.h |    1 +
 virt/kvm/kvm_main.c      |    1 +
 3 files changed, 7 insertions(+), 0 deletions(-)
* [PATCH 1/2] KVM: Keep slot ID in memory slot structure
From: Avi Kivity @ 2010-06-21 8:18 UTC
To: kvm, Marcelo Tosatti

May be used for distinguishing between internal and user slots, or for sorting
slots in size order.

Signed-off-by: Avi Kivity <avi@redhat.com>
---
 include/linux/kvm_host.h |    1 +
 virt/kvm/kvm_main.c      |    1 +
 2 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 2d96555..d84bf40 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -124,6 +124,7 @@ struct kvm_memory_slot {
 	} *lpage_info[KVM_NR_PAGE_SIZES - 1];
 	unsigned long userspace_addr;
 	int user_alloc;
+	int id;
 };
 
 static inline unsigned long kvm_dirty_bitmap_bytes(struct kvm_memory_slot *memslot)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 84a0906..add43a3 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -570,6 +570,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
 
 	new = old = *memslot;
 
+	new.id = mem->slot;
 	new.base_gfn = base_gfn;
 	new.npages = npages;
 	new.flags = mem->flags;
-- 
1.7.1
* [PATCH 2/2] KVM: Prevent internal slots from being COWed
From: Avi Kivity @ 2010-06-21 8:18 UTC
To: kvm, Marcelo Tosatti

If a process with a memory slot is COWed, the page will change its address
(despite having an elevated reference count).  This breaks internal memory
slots which have their physical addresses loaded into vmcs registers (see
the APIC access memory slot).

Signed-off-by: Avi Kivity <avi@redhat.com>
---
 arch/x86/kvm/x86.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 33156a3..d9a33e6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5633,6 +5633,11 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 				int user_alloc)
 {
 	int npages = memslot->npages;
+	int map_flags = MAP_PRIVATE | MAP_ANONYMOUS;
+
+	/* Prevent internal slot pages from being moved by fork()/COW. */
+	if (memslot->id >= KVM_MEMORY_SLOTS)
+		map_flags = MAP_SHARED | MAP_ANONYMOUS;
 
 	/*To keep backward compatibility with older userspace,
 	 *x86 needs to hanlde !user_alloc case.
-- 
1.7.1
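[Editor's illustration, not part of the thread] The patch above relies on the difference between MAP_PRIVATE and MAP_SHARED anonymous mappings across fork(): private pages become copy-on-write, shared pages do not. The following is a minimal userspace sketch of that difference only, assuming a Linux/glibc environment. It is not the kernel code and does not reproduce the reference-count problem itself; the helper name child_write_visible is hypothetical.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS on strict-ISO builds */
#endif
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if a write made by a forked child is
 * visible to the parent through a mapping created with map_flags,
 * 0 if it is not, and -1 on error.
 */
int child_write_visible(int map_flags)
{
	volatile int *p;
	pid_t pid;
	int status;
	int seen;

	p = mmap(NULL, sizeof(*p), PROT_READ | PROT_WRITE, map_flags, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	*p = 1;			/* fault the page in before forking */

	pid = fork();
	if (pid < 0) {
		munmap((void *)p, sizeof(*p));
		return -1;
	}
	if (pid == 0) {
		/* Child: with MAP_PRIVATE this write lands in a private copy. */
		*p = 2;
		_exit(0);
	}

	if (waitpid(pid, &status, 0) < 0) {
		munmap((void *)p, sizeof(*p));
		return -1;
	}

	/* Parent: sees 2 only if the page was shared rather than COWed. */
	seen = (*p == 2);
	munmap((void *)p, sizeof(*p));
	return seen;
}
```

Calling child_write_visible(MAP_SHARED | MAP_ANONYMOUS) and child_write_visible(MAP_PRIVATE | MAP_ANONYMOUS) from a small main() contrasts the two flag combinations that the patch selects between.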
* Re: [PATCH 2/2] KVM: Prevent internal slots from being COWed
From: Marcelo Tosatti @ 2010-06-21 20:23 UTC
To: Avi Kivity; +Cc: kvm

On Mon, Jun 21, 2010 at 11:18:13AM +0300, Avi Kivity wrote:
> If a process with a memory slot is COWed, the page will change its address
> (despite having an elevated reference count).  This breaks internal memory
> slots which have their physical addresses loaded into vmcs registers (see
> the APIC access memory slot).
> 
> Signed-off-by: Avi Kivity <avi@redhat.com>
> ---
>  arch/x86/kvm/x86.c |    5 +++++
>  1 files changed, 5 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 33156a3..d9a33e6 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5633,6 +5633,11 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
>  				int user_alloc)
>  {
>  	int npages = memslot->npages;
> +	int map_flags = MAP_PRIVATE | MAP_ANONYMOUS;
> +
> +	/* Prevent internal slot pages from being moved by fork()/COW. */
> +	if (memslot->id >= KVM_MEMORY_SLOTS)
> +		map_flags = MAP_SHARED | MAP_ANONYMOUS;
>  
>  	/*To keep backward compatibility with older userspace,
>  	 *x86 needs to hanlde !user_alloc case.

Forgot to use map_flags below.
* Re: [PATCH 2/2] KVM: Prevent internal slots from being COWed
From: Avi Kivity @ 2010-06-22 11:17 UTC
To: Marcelo Tosatti; +Cc: kvm

On 06/21/2010 11:23 PM, Marcelo Tosatti wrote:
> On Mon, Jun 21, 2010 at 11:18:13AM +0300, Avi Kivity wrote:
>
>> If a process with a memory slot is COWed, the page will change its address
>> (despite having an elevated reference count).  This breaks internal memory
>> slots which have their physical addresses loaded into vmcs registers (see
>> the APIC access memory slot).
>>
>> Signed-off-by: Avi Kivity<avi@redhat.com>
>> ---
>>   arch/x86/kvm/x86.c |    5 +++++
>>   1 files changed, 5 insertions(+), 0 deletions(-)
>>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 33156a3..d9a33e6 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -5633,6 +5633,11 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
>>   				int user_alloc)
>>   {
>>   	int npages = memslot->npages;
>> +	int map_flags = MAP_PRIVATE | MAP_ANONYMOUS;
>> +
>> +	/* Prevent internal slot pages from being moved by fork()/COW. */
>> +	if (memslot->id>= KVM_MEMORY_SLOTS)
>> +		map_flags = MAP_SHARED | MAP_ANONYMOUS;
>>
>>   	/*To keep backward compatibility with older userspace,
>>   	 *x86 needs to hanlde !user_alloc case.
>>
> Forgot to use map_flags below.
>
>

Ouch, corrected and applied.

-- 
error compiling committee.c: too many arguments to function
* Re: [PATCH 2/2] KVM: Prevent internal slots from being COWed
From: Andrea Arcangeli @ 2010-07-06 14:45 UTC
To: Avi Kivity; +Cc: Marcelo Tosatti, kvm, Glauber Costa

On Tue, Jun 22, 2010 at 02:17:44PM +0300, Avi Kivity wrote:
> On 06/21/2010 11:23 PM, Marcelo Tosatti wrote:
> > On Mon, Jun 21, 2010 at 11:18:13AM +0300, Avi Kivity wrote:
> >
> >> If a process with a memory slot is COWed, the page will change its address
> >> (despite having an elevated reference count).  This breaks internal memory
> >> slots which have their physical addresses loaded into vmcs registers (see
> >> the APIC access memory slot).
> >>
> >> Signed-off-by: Avi Kivity<avi@redhat.com>
> >> ---
> >>   arch/x86/kvm/x86.c |    5 +++++
> >>   1 files changed, 5 insertions(+), 0 deletions(-)
> >>
> >> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> >> index 33156a3..d9a33e6 100644
> >> --- a/arch/x86/kvm/x86.c
> >> +++ b/arch/x86/kvm/x86.c
> >> @@ -5633,6 +5633,11 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
> >>   				int user_alloc)
> >>   {
> >>   	int npages = memslot->npages;
> >> +	int map_flags = MAP_PRIVATE | MAP_ANONYMOUS;
> >> +
> >> +	/* Prevent internal slot pages from being moved by fork()/COW. */
> >> +	if (memslot->id>= KVM_MEMORY_SLOTS)
> >> +		map_flags = MAP_SHARED | MAP_ANONYMOUS;
> >>
> >>   	/*To keep backward compatibility with older userspace,
> >>   	 *x86 needs to hanlde !user_alloc case.
> >>
> > Forgot to use map_flags below.
> >
> >
> 
> Ouch, corrected and applied.

I think I tracked down the corruption during swapping with THP enabled
to this bug. The real bug is that the mmu notifier fires (it's not
like fork isn't covered by the mmu notifier) but KVM ignores it and
keeps writing to the old location. Shared pages can also be swapped
out and if the dirty bit on the spte isn't set faster than the time it
takes to write the page, the page can be relocated. Basically if
do_swap_page decides to make a copy of the page (like in ksm-swapin
case, erratically triggered now even for non-ksm pages in current
upstream by a bug in the new anon-vma code which I fixed already in
aa.git) and the dirty bit on the spte is ignored because of lumpy
reclaim (which also I removed now and that makes the bug stop
triggering too), eventually what happens is that the page is unmapped
and during swapin it is relocated to a different page.

The bug really is in KVM that ignores the mmu_notifier_invalidate_page
and keeps using the old page.

It should have rang a bell that fork was breaking anything... fork
must not break anything since KVM is mmu notifier
capable. MADV_DONTFORK must only be a performance optimization
now. And the above change should be unnecessary (and I doubt the above
really fixes the swapping case as tmpfs can also be swapped out, at
least unless the page is pinned).

The way I'd like to fix it is to allocate those magic pages by hand
and not add them to lru and have page->mapping null. Then they will
remain pinned in the pte, and all problems will go away.

The other way would be to have a lookup hashtable that when mmu
notifier invalidate fires, we lookup the hash and we call a method to
have kvm stop using the page. And then something is needed during the
page fault, if the gfn in the hash is paged-in another method is
called to set the magic host user address to point the new pfn.

I think pinning the pages and allocating them by hand is simpler,
hopefully we can do it in a way that munmap will collect them
automatically like now.
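[Editor's illustration, not part of the thread] For readers unfamiliar with the MADV_DONTFORK advice mentioned above: it asks the kernel not to copy a mapping into a child at fork(), so the child never holds a copy-on-write reference to those pages. The following is a minimal userspace sketch, assuming Linux and glibc. It is not taken from qemu or KVM, and the helper name child_inherits_mapping is hypothetical. The child probes the region by writing to it, so a missing mapping shows up as SIGSEGV in the child.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS and MADV_DONTFORK */
#endif
#include <signal.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if a forked child still has the region
 * mapped, 0 if the child faults when touching it, and -1 on error.
 * When dontfork is non-zero the region is marked MADV_DONTFORK first.
 */
int child_inherits_mapping(int dontfork)
{
	long page = sysconf(_SC_PAGESIZE);
	volatile unsigned char *p;
	pid_t pid;
	int status;

	if (page <= 0)
		return -1;

	p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;
	p[0] = 1;		/* fault the page in */

	if (dontfork && madvise((void *)p, (size_t)page, MADV_DONTFORK) != 0) {
		munmap((void *)p, (size_t)page);
		return -1;
	}

	pid = fork();
	if (pid < 0) {
		munmap((void *)p, (size_t)page);
		return -1;
	}
	if (pid == 0) {
		p[0] = 2;	/* SIGSEGV here if the range was not inherited */
		_exit(0);
	}

	if (waitpid(pid, &status, 0) < 0) {
		munmap((void *)p, (size_t)page);
		return -1;
	}
	munmap((void *)p, (size_t)page);

	if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
		return 1;	/* child could write: mapping was inherited */
	if (WIFSIGNALED(status) && WTERMSIG(status) == SIGSEGV)
		return 0;	/* child faulted: mapping was not inherited */
	return -1;
}
```

The sketch only shows that a MADV_DONTFORK range is absent from the child. Whether that should be needed for correctness, or only for performance, is the question the message above raises.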
* Re: [PATCH 2/2] KVM: Prevent internal slots from being COWed
From: Avi Kivity @ 2010-07-06 14:53 UTC
To: Andrea Arcangeli; +Cc: Marcelo Tosatti, kvm, Glauber Costa

On 07/06/2010 05:45 PM, Andrea Arcangeli wrote:
>
>> Ouch, corrected and applied.
>>
> I think I tracked down the corruption during swapping with THP enabled
> to this bug. The real bug is that the mmu notifier fires (it's not
> like fork isn't covered by the mmu notifier) but KVM ignores it and
> keeps writing to the old location. Shared pages can also be swapped
> out and if the dirty bit on the spte isn't set faster than the time it
> takes to write the page, the page can be relocated. Basically if
> do_swap_page decides to make a copy of the page (like in ksm-swapin
> case, erratically triggered now even for non-ksm pages in current
> upstream by a bug in the new anon-vma code which I fixed already in
> aa.git) and the dirty bit on the spte is ignored because of lumpy
> reclaim (which also I removed now and that makes the bug stop
> triggering too), eventually what happens is that the page is unmapped
> and during swapin it is relocated to a different page.
>
> The bug really is in KVM that ignores the mmu_notifier_invalidate_page
> and keeps using the old page.
>

Right.  It's the same problem as O_DIRECT + fork() - kvm did a
get_user_pages() long ago on this page, it got forked, and on the next
write the mm duplicated the page and assigned qemu the new page, which
kvm ignored.

> It should have rang a bell that fork was breaking anything... fork
> must not break anything since KVM is mmu notifier
> capable. MADV_DONTFORK must only be a performance optimization
> now. And the above change should be unnecessary (and I doubt the above
> really fixes the swapping case as tmpfs can also be swapped out, at
> least unless the page is pinned).
>

That particular page is pinned.

> The way I'd like to fix it is to allocate those magic pages by hand
> and not add them to lru and have page->mapping null. Then they will
> remain pinned in the pte, and all problems will go away.
>

Yes, that's the correct solution.  It shouldn't be a user page in the
first place.  Problem is that this is a very intrusive change.

> The other way would be to have a lookup hashtable that when mmu
> notifier invalidate fires, we lookup the hash and we call a method to
> have kvm stop using the page. And then something is needed during the
> page fault, if the gfn in the hash is paged-in another method is
> called to set the magic host user address to point the new pfn.
>
> I think pinning the pages and allocating them by hand is simpler,
> hopefully we can do it in a way that munmap will collect them
> automatically like now.
>

I'd like to remove it completely from the memslot mechanism,
unfortunately it may affect many paths.

-- 
error compiling committee.c: too many arguments to function
Thread overview: 7+ messages (newest: 2010-07-06 14:53 UTC)
2010-06-21  8:18 [PATCH 0/2] Fix failures caused by fork() interaction with internal slots Avi Kivity
2010-06-21  8:18 ` [PATCH 1/2] KVM: Keep slot ID in memory slot structure Avi Kivity
2010-06-21  8:18 ` [PATCH 2/2] KVM: Prevent internal slots from being COWed Avi Kivity
2010-06-21 20:23   ` Marcelo Tosatti
2010-06-22 11:17     ` Avi Kivity
2010-07-06 14:45       ` Andrea Arcangeli
2010-07-06 14:53         ` Avi Kivity