* [PATCH] KVM: MMU: Fix mmu_shrink() so that it can free mmu pages as intended
@ 2012-07-05 10:56 Takuya Yoshikawa
2012-07-05 11:50 ` Gleb Natapov
0 siblings, 1 reply; 7+ messages in thread
From: Takuya Yoshikawa @ 2012-07-05 10:56 UTC (permalink / raw)
To: avi, mtosatti; +Cc: kvm, gleb
The following commit changed mmu_shrink() so that it would skip VMs
whose n_used_mmu_pages is not zero and try to free pages from others:
commit 1952639665e92481c34c34c3e2a71bf3e66ba362
KVM: MMU: do not iterate over all VMs in mmu_shrink()
This patch fixes the function so that it can free mmu pages as before.
Note that "if (!nr_to_scan--)" check is removed since we do not try to
free mmu pages from more than one VM.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Cc: Gleb Natapov <gleb@redhat.com>
---
arch/x86/kvm/mmu.c | 5 +----
1 files changed, 1 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 3b53d9e..5fd268a 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3957,11 +3957,8 @@ static int mmu_shrink(struct shrinker *shrink, struct shrink_control *sc)
* want to shrink a VM that only started to populate its MMU
* anyway.
*/
- if (kvm->arch.n_used_mmu_pages > 0) {
- if (!nr_to_scan--)
- break;
+ if (!kvm->arch.n_used_mmu_pages)
continue;
- }
idx = srcu_read_lock(&kvm->srcu);
spin_lock(&kvm->mmu_lock);
--
1.7.5.4
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] KVM: MMU: Fix mmu_shrink() so that it can free mmu pages as intended
From: Gleb Natapov @ 2012-07-05 11:50 UTC (permalink / raw)
To: Takuya Yoshikawa; +Cc: avi, mtosatti, kvm
On Thu, Jul 05, 2012 at 07:56:07PM +0900, Takuya Yoshikawa wrote:
> The following commit changed mmu_shrink() so that it would skip VMs
> whose n_used_mmu_pages is not zero and try to free pages from others:
>
Oops,
> commit 1952639665e92481c34c34c3e2a71bf3e66ba362
> KVM: MMU: do not iterate over all VMs in mmu_shrink()
>
> This patch fixes the function so that it can free mmu pages as before.
>
> Note that "if (!nr_to_scan--)" check is removed since we do not try to
> free mmu pages from more than one VM.
>
IIRC this was proposed in the past that we should iterate over vm list
until freeing something eventually, but Avi was against it. I think the
probability of a VM with kvm->arch.n_used_mmu_pages == 0 is low, so
it looks OK to drop nr_to_scan to me.
> Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
> Cc: Gleb Natapov <gleb@redhat.com>
> ---
> arch/x86/kvm/mmu.c | 5 +----
> 1 files changed, 1 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 3b53d9e..5fd268a 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -3957,11 +3957,8 @@ static int mmu_shrink(struct shrinker *shrink, struct shrink_control *sc)
> * want to shrink a VM that only started to populate its MMU
> * anyway.
> */
> - if (kvm->arch.n_used_mmu_pages > 0) {
> - if (!nr_to_scan--)
> - break;
> + if (!kvm->arch.n_used_mmu_pages)
> continue;
> - }
>
> idx = srcu_read_lock(&kvm->srcu);
> spin_lock(&kvm->mmu_lock);
> --
> 1.7.5.4
--
Gleb.
* Re: [PATCH] KVM: MMU: Fix mmu_shrink() so that it can free mmu pages as intended
From: Takuya Yoshikawa @ 2012-07-05 14:05 UTC (permalink / raw)
To: Gleb Natapov; +Cc: Takuya Yoshikawa, avi, mtosatti, kvm
On Thu, 5 Jul 2012 14:50:00 +0300
Gleb Natapov <gleb@redhat.com> wrote:
> > Note that "if (!nr_to_scan--)" check is removed since we do not try to
> > free mmu pages from more than one VM.
> >
> IIRC this was proposed in the past that we should iterate over vm list
> until freeing something eventually, but Avi was against it. I think the
> probability of a VM with kvm->arch.n_used_mmu_pages == 0 is low, so
> it looks OK to drop nr_to_scan to me.
Since our batch size is 128, the minimum positive @nr_to_scan, it's almost
impossible to see the effect of the check.
Thanks,
Takuya
* Re: [PATCH] KVM: MMU: Fix mmu_shrink() so that it can free mmu pages as intended
From: Takuya Yoshikawa @ 2012-07-12 9:35 UTC (permalink / raw)
To: Takuya Yoshikawa; +Cc: Gleb Natapov, avi, mtosatti, kvm
On Thu, 5 Jul 2012 23:05:46 +0900
Takuya Yoshikawa <takuya.yoshikawa@gmail.com> wrote:
> On Thu, 5 Jul 2012 14:50:00 +0300
> Gleb Natapov <gleb@redhat.com> wrote:
>
> > > Note that "if (!nr_to_scan--)" check is removed since we do not try to
> > > free mmu pages from more than one VM.
> > >
> > IIRC this was proposed in the past that we should iterate over vm list
> > until freeing something eventually, but Avi was against it. I think the
> > probability of a VM with kvm->arch.n_used_mmu_pages == 0 is low, so
> > it looks OK to drop nr_to_scan to me.
>
> Since our batch size is 128, the minimum positive @nr_to_scan, it's almost
> impossible to see the effect of the check.
Thinking more about this:
I think freeing mmu pages by shrink_slab() is problematic.
For example, if we do
# echo 2 > /proc/sys/vm/drop_caches
on the host, some mmu pages will be freed.
This is not what most people expect, probably.
Although this patch is needed to handle shadow paging's extreme
mmu page usage, we should do something better in the future.
What I think is reasonable is not treating all mmu pages as freeable:
- determine some base number of mmu pages: base_mmu_pages
- return (total_mmu_pages - base_mmu_pages) to the caller
* We may use n_max_mmu_pages for calculating this base number.
By doing so, we can avoid freeing mmu pages especially when EPT/NPT ON.
Thanks,
Takuya
* Re: [PATCH] KVM: MMU: Fix mmu_shrink() so that it can free mmu pages as intended
From: Marcelo Tosatti @ 2012-07-18 20:52 UTC (permalink / raw)
To: Takuya Yoshikawa; +Cc: Takuya Yoshikawa, Gleb Natapov, avi, kvm
On Thu, Jul 12, 2012 at 06:35:09PM +0900, Takuya Yoshikawa wrote:
> On Thu, 5 Jul 2012 23:05:46 +0900
> Takuya Yoshikawa <takuya.yoshikawa@gmail.com> wrote:
>
> > On Thu, 5 Jul 2012 14:50:00 +0300
> > Gleb Natapov <gleb@redhat.com> wrote:
> >
> > > > Note that "if (!nr_to_scan--)" check is removed since we do not try to
> > > > free mmu pages from more than one VM.
> > > >
> > > IIRC this was proposed in the past that we should iterate over vm list
> > > until freeing something eventually, but Avi was against it. I think the
> > > probability of a VM with kvm->arch.n_used_mmu_pages == 0 is low, so
> > > it looks OK to drop nr_to_scan to me.
> >
> > Since our batch size is 128, the minimum positive @nr_to_scan, it's almost
> > impossible to see the effect of the check.
>
> Thinking more about this:
>
> I think freeing mmu pages by shrink_slab() is problematic.
>
> For example, if we do
> # echo 2 > /proc/sys/vm/drop_caches
> on the host, some mmu pages will be freed.
>
> This is not what most people expect, probably.
>
>
> Although this patch is needed to handle shadow paging's extreme
> mmu page usage, we should do something better in the future.
>
> What I think is reasonable is not treating all mmu pages as freeable:
> - determine some base number of mmu pages: base_mmu_pages
> - return (total_mmu_pages - base_mmu_pages) to the caller
>
> * We may use n_max_mmu_pages for calculating this base number.
>
> By doing so, we can avoid freeing mmu pages especially when EPT/NPT ON.
Takuya,
Can't understand, can you please expand more clearly?
TIA
* Re: [PATCH] KVM: MMU: Fix mmu_shrink() so that it can free mmu pages as intended
From: Takuya Yoshikawa @ 2012-07-20 1:04 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: Takuya Yoshikawa, Gleb Natapov, avi, kvm
On Wed, 18 Jul 2012 17:52:46 -0300
Marcelo Tosatti <mtosatti@redhat.com> wrote:
> Can't understand, can you please expand more clearly?
I think mmu pages are not worth freeing under usual memory pressure,
especially when we have EPT/NPT on.
What's happening:
shrink_slab() vainly calls mmu_shrink() with the default batch size of 128,
and mmu_shrink() takes a long time to zap far fewer mmu pages than the
requested number, usually freeing just one. Sadly, KVM may recreate the
page soon after that.
Since we set seeks to 10 times the default, total_scan is very small
and shrink_slab() just wastes time freeing such a small amount of
may-be-reallocated-soon memory: I want it to spend that time scanning
other objects instead.
Actually the total amount of memory used for mmu pages is not huge
when EPT/NPT is on: maybe smaller than that of rmap?
So, it's clear that no one wants mmu pages to be freed like other objects.
Sure, our seeks value usually prevents shrink_slab() from calling
mmu_shrink(). But what if administrators want to drop clean caches on
the host?
Documentation/sysctl/vm.txt says:
Writing to this will cause the kernel to drop clean caches, dentries and
inodes from memory, causing that memory to become free.
To free pagecache:
echo 1 > /proc/sys/vm/drop_caches
To free dentries and inodes:
echo 2 > /proc/sys/vm/drop_caches
To free pagecache, dentries and inodes:
echo 3 > /proc/sys/vm/drop_caches
I don't want mmu pages to be freed in such cases.
So, how about no longer reporting the total number of used mmu pages
to shrink_slab()?
If we do so, it will think that there are not enough objects to get
memory back from KVM.
In the case of shadow paging, guests can behave badly and allocate an
enormous number of mmu pages, so we should report such excess pages to
shrink_slab() as freeable objects, not the total.
|--- needed ---|--- freeable under memory pressure ---|
We may be able to use n_max_mmu_pages for this: the shrinker tries
to free mmu pages unless the number reaches the goal.
Thanks,
Takuya
* Re: [PATCH] KVM: MMU: Fix mmu_shrink() so that it can free mmu pages as intended
From: Marcelo Tosatti @ 2012-07-20 14:42 UTC (permalink / raw)
To: Takuya Yoshikawa; +Cc: Takuya Yoshikawa, Gleb Natapov, avi, kvm
On Fri, Jul 20, 2012 at 10:04:34AM +0900, Takuya Yoshikawa wrote:
> On Wed, 18 Jul 2012 17:52:46 -0300
> Marcelo Tosatti <mtosatti@redhat.com> wrote:
>
> > Can't understand, can you please expand more clearly?
>
> I think mmu pages are not worth freeing under usual memory pressure,
> especially when we have EPT/NPT on.
>
> What's happening:
> shrink_slab() vainly calls mmu_shrink() with the default batch size of 128,
> and mmu_shrink() takes a long time to zap far fewer mmu pages than the
> requested number, usually freeing just one. Sadly, KVM may recreate the
> page soon after that.
>
> Since we set seeks to 10 times the default, total_scan is very small
> and shrink_slab() just wastes time freeing such a small amount of
> may-be-reallocated-soon memory: I want it to spend that time scanning
> other objects instead.
>
> Actually the total amount of memory used for mmu pages is not huge
> when EPT/NPT is on: maybe smaller than that of rmap?
rmap size is a function of mmu pages, so mmu_shrink indirectly
releases rmap also.
> So, it's clear that no one wants mmu pages to be freed like other objects.
> Sure, our seeks value usually prevents shrink_slab() from calling
> mmu_shrink(). But what if administrators want to drop clean caches on
> the host?
>
> Documentation/sysctl/vm.txt says:
> Writing to this will cause the kernel to drop clean caches, dentries and
> inodes from memory, causing that memory to become free.
>
> To free pagecache:
> echo 1 > /proc/sys/vm/drop_caches
> To free dentries and inodes:
> echo 2 > /proc/sys/vm/drop_caches
> To free pagecache, dentries and inodes:
> echo 3 > /proc/sys/vm/drop_caches
>
> I don't want mmu pages to be freed in such cases.
drop_caches should only be used on special occasions. I would not worry
about it.
> So, how about no longer reporting the total number of used mmu pages
> to shrink_slab()?
>
> If we do so, it will think that there are not enough objects to get
> memory back from KVM.
No, its important to be able to release memory quickly in low memory
conditions.
I bet the reasoning behind current seeks value (10*default) is close to
arbitrary.
mmu_shrink can be smarter, by freeing pages which are less likely to
be used. IIRC Avi had some nice ideas for LRU-like schemes (search the
archives).
You can also consider the fact that freeing a higher level pagetable
frees all of its children (that is quite dumb actually, sequential
shrink passes should free only pages with no children).
> In the case of shadow paging, guests can behave badly and allocate an
> enormous number of mmu pages, so we should report such excess pages to
> shrink_slab() as freeable objects, not the total.
A guest idle for 2 months should not have its mmu pages in memory.
> |--- needed ---|--- freeable under memory pressure ---|
>
> We may be able to use n_max_mmu_pages for this: the shrinker tries
> to free mmu pages unless the number reaches the goal.
>
> Thanks,
> Takuya