Subject: [PATCH] KVM: MMU: Fix mmu_shrink() so that it can free mmu pages as intended
From: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Date: Thu, 5 Jul 2012 19:56:07 +0900
To: avi, mtosatti
Cc: kvm, gleb

The following commit changed mmu_shrink() so that it would skip VMs
whose n_used_mmu_pages is not zero and try to free pages from others:

  commit 1952639665e92481c34c34c3e2a71bf3e66ba362
  KVM: MMU: do not iterate over all VMs in mmu_shrink()

This patch fixes the function so that it can free mmu pages as before.

Note that the "if (!nr_to_scan--)" check is removed since we do not try
to free mmu pages from more than one VM.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Cc: Gleb Natapov <gleb@redhat.com>
---
 arch/x86/kvm/mmu.c |    5 +----
 1 files changed, 1 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 3b53d9e..5fd268a 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3957,11 +3957,8 @@ static int mmu_shrink(struct shrinker *shrink, struct shrink_control *sc)
 		 * want to shrink a VM that only started to populate its MMU
 		 * anyway.
 		 */
-		if (kvm->arch.n_used_mmu_pages > 0) {
-			if (!nr_to_scan--)
-				break;
+		if (!kvm->arch.n_used_mmu_pages)
 			continue;
-		}
 
 		idx = srcu_read_lock(&kvm->srcu);
 		spin_lock(&kvm->mmu_lock);
-- 
1.7.5.4
Subject: Re: [PATCH] KVM: MMU: Fix mmu_shrink() so that it can free mmu pages as intended
From: Gleb Natapov <gleb@redhat.com>
Date: Thu, 5 Jul 2012 14:50:00 +0300
To: Takuya Yoshikawa
Cc: avi, mtosatti, kvm

On Thu, Jul 05, 2012 at 07:56:07PM +0900, Takuya Yoshikawa wrote:
> The following commit changed mmu_shrink() so that it would skip VMs
> whose n_used_mmu_pages is not zero and try to free pages from others:
>
Oops,

>   commit 1952639665e92481c34c34c3e2a71bf3e66ba362
>   KVM: MMU: do not iterate over all VMs in mmu_shrink()
>
> This patch fixes the function so that it can free mmu pages as before.
>
> Note that the "if (!nr_to_scan--)" check is removed since we do not try
> to free mmu pages from more than one VM.
>
IIRC it was proposed in the past that we should iterate over the vm list
until we eventually free something, but Avi was against it. I think the
probability of a VM with kvm->arch.n_used_mmu_pages == 0 is low, so
dropping nr_to_scan looks OK to me.

> Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
> Cc: Gleb Natapov <gleb@redhat.com>
> ---
>  arch/x86/kvm/mmu.c |    5 +----
>  1 files changed, 1 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 3b53d9e..5fd268a 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -3957,11 +3957,8 @@ static int mmu_shrink(struct shrinker *shrink, struct shrink_control *sc)
>  		 * want to shrink a VM that only started to populate its MMU
>  		 * anyway.
>  		 */
> -		if (kvm->arch.n_used_mmu_pages > 0) {
> -			if (!nr_to_scan--)
> -				break;
> +		if (!kvm->arch.n_used_mmu_pages)
>  			continue;
> -		}
>  
>  		idx = srcu_read_lock(&kvm->srcu);
>  		spin_lock(&kvm->mmu_lock);
> -- 
> 1.7.5.4

--
			Gleb.
Subject: Re: [PATCH] KVM: MMU: Fix mmu_shrink() so that it can free mmu pages as intended
From: Takuya Yoshikawa <takuya.yoshikawa@gmail.com>
Date: Thu, 5 Jul 2012 23:05:46 +0900
To: Gleb Natapov
Cc: Takuya Yoshikawa, avi, mtosatti, kvm

On Thu, 5 Jul 2012 14:50:00 +0300
Gleb Natapov <gleb@redhat.com> wrote:

> > Note that the "if (!nr_to_scan--)" check is removed since we do not try
> > to free mmu pages from more than one VM.
> >
> IIRC it was proposed in the past that we should iterate over the vm list
> until we eventually free something, but Avi was against it. I think the
> probability of a VM with kvm->arch.n_used_mmu_pages == 0 is low, so
> dropping nr_to_scan looks OK to me.

Since our batch size is 128 (the minimum positive @nr_to_scan), it's
almost impossible to see the effect of the check.

Thanks,
	Takuya
Subject: Re: [PATCH] KVM: MMU: Fix mmu_shrink() so that it can free mmu pages as intended
From: Takuya Yoshikawa <takuya.yoshikawa@gmail.com>
Date: Thu, 12 Jul 2012 18:35:09 +0900
To: Takuya Yoshikawa
Cc: Gleb Natapov, avi, mtosatti, kvm

On Thu, 5 Jul 2012 23:05:46 +0900
Takuya Yoshikawa <takuya.yoshikawa@gmail.com> wrote:

> Since our batch size is 128 (the minimum positive @nr_to_scan), it's
> almost impossible to see the effect of the check.

Thinking more about this:

I think freeing mmu pages by shrink_slab() is problematic.

For example, if we do

  # echo 2 > /proc/sys/vm/drop_caches

on the host, some mmu pages will be freed. This is probably not what
most people expect.

Although this patch is needed to care about shadow paging's extreme mmu
page usage, we should do something better in the future.

What I think reasonable is not treating all mmu pages as freeable:
 - determine some base number of mmu pages: base_mmu_pages
 - return (total_mmu_pages - base_mmu_pages) to the caller

 * We may use n_max_mmu_pages for calculating this base number.

By doing so, we can avoid freeing mmu pages, especially when EPT/NPT
is on.

Thanks,
	Takuya
Subject: Re: [PATCH] KVM: MMU: Fix mmu_shrink() so that it can free mmu pages as intended
From: Marcelo Tosatti <mtosatti@redhat.com>
Date: Wed, 18 Jul 2012 17:52:46 -0300
To: Takuya Yoshikawa
Cc: Takuya Yoshikawa, Gleb Natapov, avi, kvm

On Thu, Jul 12, 2012 at 06:35:09PM +0900, Takuya Yoshikawa wrote:
> Thinking more about this:
>
> I think freeing mmu pages by shrink_slab() is problematic.
>
> For example, if we do
>
>   # echo 2 > /proc/sys/vm/drop_caches
>
> on the host, some mmu pages will be freed. This is probably not what
> most people expect.
>
> Although this patch is needed to care about shadow paging's extreme mmu
> page usage, we should do something better in the future.
>
> What I think reasonable is not treating all mmu pages as freeable:
>  - determine some base number of mmu pages: base_mmu_pages
>  - return (total_mmu_pages - base_mmu_pages) to the caller
>
>  * We may use n_max_mmu_pages for calculating this base number.
>
> By doing so, we can avoid freeing mmu pages, especially when EPT/NPT
> is on.

Takuya,

Can't understand, can you please expand more clearly? TIA
Subject: Re: [PATCH] KVM: MMU: Fix mmu_shrink() so that it can free mmu pages as intended
From: Takuya Yoshikawa <takuya.yoshikawa@gmail.com>
Date: Fri, 20 Jul 2012 10:04:34 +0900
To: Marcelo Tosatti
Cc: Takuya Yoshikawa, Gleb Natapov, avi, kvm

On Wed, 18 Jul 2012 17:52:46 -0300
Marcelo Tosatti <mtosatti@redhat.com> wrote:

> Can't understand, can you please expand more clearly?

I think mmu pages are not worth freeing under usual memory pressure,
especially when we have EPT/NPT on.

What's happening:
shrink_slab() vainly calls mmu_shrink() with the default batch size of
128, and mmu_shrink() takes a long time to zap far fewer mmu pages than
the requested number, usually freeing just one. Sadly, KVM may recreate
the page soon after that.

Since we set the seeks value 10 times greater than the default,
total_scan is very small and shrink_slab() just wastes time freeing such
a small amount of may-be-reallocated-soon memory: I want it to spend
that time scanning other objects instead.

Actually, the total amount of memory used for mmu pages is not huge in
the case of EPT/NPT on: maybe smaller than that of rmap?

So it's clear that no one wants mmu pages to be freed like other
objects. Sure, our seeks value usually prevents shrink_slab() from
calling mmu_shrink(). But what if administrators want to drop clean
caches on the host?

Documentation/sysctl/vm.txt says:

  Writing to this will cause the kernel to drop clean caches, dentries
  and inodes from memory, causing that memory to become free.

  To free pagecache:
	echo 1 > /proc/sys/vm/drop_caches
  To free dentries and inodes:
	echo 2 > /proc/sys/vm/drop_caches
  To free pagecache, dentries and inodes:
	echo 3 > /proc/sys/vm/drop_caches

I don't want mmu pages to be freed in such cases.

So, how about not reporting the total number of used mmu pages to
shrink_slab()?

If we do so, it will think that there are not enough objects to get
memory back from KVM.

In the case of shadow paging, guests can do bad things to allocate
enormous numbers of mmu pages, so we should report such excess numbers
to shrink_slab() as freeable objects, not the total:

  |--- needed ---|--- freeable under memory pressure ---|

We may be able to use n_max_mmu_pages for this: the shrinker tries to
free mmu pages until the number reaches the goal.

Thanks,
	Takuya
Subject: Re: [PATCH] KVM: MMU: Fix mmu_shrink() so that it can free mmu pages as intended
From: Marcelo Tosatti <mtosatti@redhat.com>
Date: Fri, 20 Jul 2012 14:42 UTC
To: Takuya Yoshikawa
Cc: Takuya Yoshikawa, Gleb Natapov, avi, kvm

On Fri, Jul 20, 2012 at 10:04:34AM +0900, Takuya Yoshikawa wrote:
> I think mmu pages are not worth freeing under usual memory pressure,
> especially when we have EPT/NPT on.
>
> What's happening:
> shrink_slab() vainly calls mmu_shrink() with the default batch size of
> 128, and mmu_shrink() takes a long time to zap far fewer mmu pages than
> the requested number, usually freeing just one. Sadly, KVM may recreate
> the page soon after that.
>
> Since we set the seeks value 10 times greater than the default,
> total_scan is very small and shrink_slab() just wastes time freeing such
> a small amount of may-be-reallocated-soon memory: I want it to spend
> that time scanning other objects instead.
>
> Actually, the total amount of memory used for mmu pages is not huge in
> the case of EPT/NPT on: maybe smaller than that of rmap?

rmap size is a function of mmu pages, so mmu_shrink indirectly releases
rmap also.

> So it's clear that no one wants mmu pages to be freed like other
> objects. Sure, our seeks value usually prevents shrink_slab() from
> calling mmu_shrink(). But what if administrators want to drop clean
> caches on the host?
>
> I don't want mmu pages to be freed in such cases.

drop_caches should be used on special occasions. I would not worry
about it.

> So, how about not reporting the total number of used mmu pages to
> shrink_slab()?
>
> If we do so, it will think that there are not enough objects to get
> memory back from KVM.

No, it's important to be able to release memory quickly in low memory
conditions. I bet the reasoning behind the current seeks value
(10*default) is close to arbitrary.

mmu_shrink can be smarter, freeing pages which are less likely to be
used. IIRC Avi had some nice ideas for LRU-like schemes (search the
archives). You can also consider the fact that freeing a higher-level
pagetable frees all of its children (which is quite dumb, actually;
sequential shrink passes should free only pages with no children).

> In the case of shadow paging, guests can do bad things to allocate
> enormous numbers of mmu pages, so we should report such excess numbers
> to shrink_slab() as freeable objects, not the total.

A guest idle for 2 months should not have its mmu pages in memory.

> |--- needed ---|--- freeable under memory pressure ---|
>
> We may be able to use n_max_mmu_pages for this: the shrinker tries to
> free mmu pages until the number reaches the goal.
>
> Thanks,
> 	Takuya