public inbox for kvm@vger.kernel.org
* [RFC] KVM MMU: improve large munmap efficiency
@ 2012-01-26 23:24 Eric Northup
  2012-01-27  0:59 ` Takuya Yoshikawa
  2012-01-29 11:01 ` Avi Kivity
  0 siblings, 2 replies; 7+ messages in thread
From: Eric Northup @ 2012-01-26 23:24 UTC (permalink / raw)
  To: KVM

Flush the shadow MMU instead of iterating over each host VA when doing
a large invalidate range callback.

The previous code is O(N) in the number of virtual pages being
invalidated, while holding both the MMU spinlock and the mmap_sem.
Large unmaps can cause significant delay, during which the process is
unkillable.  Worse, all page allocation could be delayed if there's
enough memory pressure that mmu_shrink gets called.

Signed-off-by: Eric Northup <digitaleric@google.com>

---

We have seen delays of over 30 seconds doing a large (128GB) unmap.

It'd be nicer to check if the amount of work to be done by the entire
flush is less than the work to be done iterating over each HVA page,
but that information isn't currently available to the arch-
independent part of KVM.

Better ideas would be most welcome ;-)


Tested by attaching a debugger to a running qemu w/kvm and running
"call munmap(0, 1UL << 46)".

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 7287bf5..9fe303a 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -61,6 +61,8 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/kvm.h>

+#define MMU_NOTIFIER_FLUSH_THRESHOLD_BYTES	(1024u*1024u*1024u)
+
 MODULE_AUTHOR("Qumranet");
 MODULE_LICENSE("GPL");

@@ -332,8 +334,12 @@ static void kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
 	 * count is also read inside the mmu_lock critical section.
 	 */
 	kvm->mmu_notifier_count++;
-	for (; start < end; start += PAGE_SIZE)
-		need_tlb_flush |= kvm_unmap_hva(kvm, start);
+	if (end - start < MMU_NOTIFIER_FLUSH_THRESHOLD_BYTES)
+		for (; start < end; start += PAGE_SIZE)
+			need_tlb_flush |= kvm_unmap_hva(kvm, start);
+	else
+		kvm_arch_flush_shadow(kvm);
+
 	need_tlb_flush |= kvm->tlbs_dirty;
 	spin_unlock(&kvm->mmu_lock);
 	srcu_read_unlock(&kvm->srcu, idx);

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [RFC] KVM MMU: improve large munmap efficiency
  2012-01-26 23:24 [RFC] KVM MMU: improve large munmap efficiency Eric Northup
@ 2012-01-27  0:59 ` Takuya Yoshikawa
  2012-01-27  1:13   ` Takuya Yoshikawa
  2012-01-29 11:01 ` Avi Kivity
  1 sibling, 1 reply; 7+ messages in thread
From: Takuya Yoshikawa @ 2012-01-27  0:59 UTC (permalink / raw)
  To: Eric Northup; +Cc: KVM

Hi,

(2012/01/27 8:24), Eric Northup wrote:
> Flush the shadow MMU instead of iterating over each host VA when doing
> a large invalidate range callback.
>
> The previous code is O(N) in the number of virtual pages being
> invalidated, while holding both the MMU spinlock and the mmap_sem.
> Large unmaps can cause significant delay, during which the process is
> unkillable.  Worse, all page allocation could be delayed if there's
> enough memory pressure that mmu_shrink gets called.
>
> Signed-off-by: Eric Northup <digitaleric@google.com>
>
> ---
>
> We have seen delays of over 30 seconds doing a large (128GB) unmap.
>
> It'd be nicer to check if the amount of work to be done by the entire
> flush is less than the work to be done iterating over each HVA page,
> but that information isn't currently available to the arch-
> independent part of KVM.

Using the number of (active) shadow pages may be one way.

See kvm->arch.n_used_mmu_pages.
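
A decision helper built on that counter might look like the following
minimal sketch; the function name and the simple cost model are invented
here, and n_used_mmu_pages is an x86-specific field, not arch-independent:

```c
#include <assert.h>

/*
 * Illustrative-only heuristic (invented name and cost model): flush
 * the whole shadow MMU when tearing down every shadow page is expected
 * to be cheaper than walking each HVA page in the invalidated range.
 * kvm->arch.n_used_mmu_pages is an x86-specific counter.
 */
static int should_flush_entire_shadow(unsigned long range_pages,
				      unsigned long n_used_mmu_pages)
{
	/* A full flush costs roughly O(n_used_mmu_pages); the per-HVA
	 * loop costs O(range_pages).  Pick whichever looks cheaper. */
	return n_used_mmu_pages < range_pages;
}
```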


>
> Better ideas would be most welcome ;-)


I will soon, this weekend if possible, send a patch series which may
speed up the kvm_unmap_hva() loop.

Though my work was aimed at optimizing a different thing, dirty
logging, I think this loop will also benefit.

	I have confirmed that dirty logging improved significantly,
	so I hope your case will too.

So, in addition to your patch, please check, if possible, to what
extent my patch series helps your case.

Thanks,
	Takuya


* Re: [RFC] KVM MMU: improve large munmap efficiency
  2012-01-27  0:59 ` Takuya Yoshikawa
@ 2012-01-27  1:13   ` Takuya Yoshikawa
  0 siblings, 0 replies; 7+ messages in thread
From: Takuya Yoshikawa @ 2012-01-27  1:13 UTC (permalink / raw)
  To: Eric Northup; +Cc: KVM

(2012/01/27 9:59), Takuya Yoshikawa wrote:

>> We have seen delays of over 30 seconds doing a large (128GB) unmap.
>>
>> It'd be nicer to check if the amount of work to be done by the entire
>> flush is less than the work to be done iterating over each HVA page,
>> but that information isn't currently available to the arch-
>> independent part of KVM.
>
> Using the number of (active) shadow pages may be one way.
>
> See kvm->arch.n_used_mmu_pages.

Ah, sorry, you are looking for arch-independent information.

>
>
>>
>> Better ideas would be most welcome ;-)
>
>
> I will soon, this weekend if possible, send a patch series which may
> speed up the kvm_unmap_hva() loop.

... and I also need to check whether my work can be implemented
naturally in an arch-independent manner.

	Takuya

>
> Though my work was aimed at optimizing a different thing, dirty
> logging, I think this loop will also benefit.
>
> I have confirmed that dirty logging improved significantly,
> so I hope your case will too.
>
> So, in addition to your patch, please check, if possible, to what
> extent my patch series helps your case.
>
> Thanks,
> Takuya


* Re: [RFC] KVM MMU: improve large munmap efficiency
  2012-01-26 23:24 [RFC] KVM MMU: improve large munmap efficiency Eric Northup
  2012-01-27  0:59 ` Takuya Yoshikawa
@ 2012-01-29 11:01 ` Avi Kivity
  2012-01-29 13:22   ` Takuya Yoshikawa
  1 sibling, 1 reply; 7+ messages in thread
From: Avi Kivity @ 2012-01-29 11:01 UTC (permalink / raw)
  To: Eric Northup; +Cc: KVM

On 01/27/2012 01:24 AM, Eric Northup wrote:
> Flush the shadow MMU instead of iterating over each host VA when doing
> a large invalidate range callback.
>
> The previous code is O(N) in the number of virtual pages being
> invalidated, while holding both the MMU spinlock and the mmap_sem.
> Large unmaps can cause significant delay, during which the process is
> unkillable.  Worse, all page allocation could be delayed if there's
> enough memory pressure that mmu_shrink gets called.
>
> Signed-off-by: Eric Northup <digitaleric@google.com>
>
> ---
>
> We have seen delays of over 30 seconds doing a large (128GB) unmap.
>
> It'd be nicer to check if the amount of work to be done by the entire
> flush is less than the work to be done iterating over each HVA page,
> but that information isn't currently available to the arch-
> independent part of KVM.
>
> Better ideas would be most welcome ;-)
>
>
> Tested by attaching a debugger to a running qemu w/kvm and running
> "call munmap(0, 1UL << 46)".
>

How about computing the intersection of (start, end) with the hva ranges
in kvm->memslots?

If there is no intersection, you exit immediately.

It's still possible for the work to drop just the intersection to be
larger than the work to drop the entire shadow, but it's unlikely.
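
A minimal sketch of that early-out check, with an invented helper name
and a simplified stand-in for the memslot bookkeeping (the real struct
kvm_memory_slot differs):

```c
#include <stddef.h>

#define PAGE_SHIFT 12

/* Simplified stand-in for a memslot's HVA bookkeeping -- the field
 * names loosely mirror struct kvm_memory_slot but are illustrative. */
struct hva_range {
	unsigned long userspace_addr;	/* start HVA of the slot */
	unsigned long npages;		/* slot length in pages */
};

/* Return nonzero if [start, end) overlaps any slot's HVA range, so a
 * caller could exit immediately when there is no intersection. */
static int range_hits_memslots(const struct hva_range *slots, size_t n,
			       unsigned long start, unsigned long end)
{
	size_t i;

	for (i = 0; i < n; i++) {
		unsigned long s = slots[i].userspace_addr;
		unsigned long e = s + (slots[i].npages << PAGE_SHIFT);

		if (start < e && end > s)
			return 1;	/* non-empty intersection */
	}
	return 0;
}
```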

-- 
error compiling committee.c: too many arguments to function



* Re: [RFC] KVM MMU: improve large munmap efficiency
  2012-01-29 11:01 ` Avi Kivity
@ 2012-01-29 13:22   ` Takuya Yoshikawa
  2012-01-29 13:25     ` Avi Kivity
  0 siblings, 1 reply; 7+ messages in thread
From: Takuya Yoshikawa @ 2012-01-29 13:22 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Eric Northup, KVM

On Sun, 29 Jan 2012 13:01:18 +0200
Avi Kivity <avi@redhat.com> wrote:

> > Tested by attaching a debugger to a running qemu w/kvm and running
> > "call munmap(0, 1UL << 46)".
> >
> 
> How about computing the intersection of (start, end) with the hva ranges
> in kvm->memslots?
> 
> If there is no intersection, you exit immediately.

I think introducing kvm_handle_hva_range() is the right thing if we really
care about unmapping a large area at once.

Current iteration:
for each page
	for each slot
		for each level

My suggestion:
for each slot
	for each level
		for each page

This way the compiler can optimize the task into a simple iteration over
the rmap array.
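
A sketch of that reordering, with invented names and a simplified slot
structure; a real kvm_handle_hva_range() would invoke a per-page rmap
handler rather than counting pages:

```c
#include <stddef.h>

#define PAGE_SHIFT 12

/* Illustrative memslot -- not the real struct kvm_memory_slot. */
struct slot_sketch {
	unsigned long hva_start;	/* first HVA mapped by the slot */
	unsigned long npages;		/* slot length in pages */
};

/*
 * Slot-outer, page-inner ordering: clamp the requested [start, end)
 * to each slot, then walk the overlapping pages contiguously.  Returns
 * how many per-page handler invocations the walk would make.
 */
static unsigned long handle_hva_range(const struct slot_sketch *slots,
				      size_t nslots,
				      unsigned long start, unsigned long end)
{
	unsigned long handled = 0;
	size_t i;

	for (i = 0; i < nslots; i++) {
		unsigned long s = slots[i].hva_start;
		unsigned long e = s + (slots[i].npages << PAGE_SHIFT);
		unsigned long lo = start > s ? start : s;
		unsigned long hi = end < e ? end : e;

		if (lo >= hi)
			continue;	/* no overlap: skip the slot in O(1) */

		/* The inner walk is a dense scan of this slot's rmap range. */
		handled += (hi - lo) >> PAGE_SHIFT;
	}
	return handled;
}
```

Note how a slot with no overlap is skipped in O(1), so the intersection
computation falls out of the loop structure for free.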

But I could not imagine why unmapping 128GB would be needed.
mmu_notifier will not do such a crazy thing.

	Takuya

> 
> It's still possible for the work to drop just the intersection to be
> larger than the work to drop the entire shadow, but it's unlikely.


* Re: [RFC] KVM MMU: improve large munmap efficiency
  2012-01-29 13:22   ` Takuya Yoshikawa
@ 2012-01-29 13:25     ` Avi Kivity
  2012-01-29 13:50       ` Takuya Yoshikawa
  0 siblings, 1 reply; 7+ messages in thread
From: Avi Kivity @ 2012-01-29 13:25 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: Eric Northup, KVM

On 01/29/2012 03:22 PM, Takuya Yoshikawa wrote:
> On Sun, 29 Jan 2012 13:01:18 +0200
> Avi Kivity <avi@redhat.com> wrote:
>
> > > Tested by attaching a debugger to a running qemu w/kvm and running
> > > "call munmap(0, 1UL << 46)".
> > >
> > 
> > How about computing the intersection of (start, end) with the hva ranges
> > in kvm->memslots?
> > 
> > If there is no intersection, you exit immediately.
>
> I think introducing kvm_handle_hva_range() is the right thing if we really
> care about unmapping a large area at once.
>
> Current iteration:
> for each page
> 	for each slot
> 		for each level
>
> My suggestion:
> for each slot
> 	for each level
> 		for each page
>
> This way the compiler can optimize the task into a simple iteration over
> the rmap array.

Yes.  This automatically includes the intersection calculation, since
you have to do it for the 'for each page in slot' loop.

-- 
error compiling committee.c: too many arguments to function



* Re: [RFC] KVM MMU: improve large munmap efficiency
  2012-01-29 13:25     ` Avi Kivity
@ 2012-01-29 13:50       ` Takuya Yoshikawa
  0 siblings, 0 replies; 7+ messages in thread
From: Takuya Yoshikawa @ 2012-01-29 13:50 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Eric Northup, KVM

On Sun, 29 Jan 2012 15:25:59 +0200
Avi Kivity <avi@redhat.com> wrote:

> > I think introducing kvm_handle_hva_range() is the right thing if we really
> > care about unmapping a large area at once.
> >
> > Current iteration:
> > for each page
> > 	for each slot
> > 		for each level
> >
> > My suggestion:
> > for each slot
> > 	for each level
> > 		for each page
> >
> > This way the compiler can optimize the task into a simple iteration over
> > the rmap array.
> 
> Yes.  This automatically includes the intersection calculation, since
> you have to do it for the 'for each page in slot' loop.
> 

I personally implemented this before but set it aside because I wanted
to do the rmap and rmap_pde refactoring first.

	Takuya

