* [PATCH] KVM: Avoid wasting pages for small lpage_info arrays
@ 2012-05-10 14:33 Takuya Yoshikawa
  2012-05-13 10:20 ` Avi Kivity
  0 siblings, 1 reply; 6+ messages in thread
From: Takuya Yoshikawa @ 2012-05-10 14:33 UTC
  To: avi, mtosatti; +Cc: kvm, yoshikawa.takuya

From: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>

lpage_info is created for each large level even when the memory slot is
not for RAM.  This means that when we add one slot for a PCI device, we
end up allocating at least KVM_NR_PAGE_SIZES - 1 pages by vmalloc():
this problem will become more severe if we support more guests with more
devices in the future.

Although it is not easy to differentiate RAM slots from others, we can
avoid wasting pages by making KVM_NR_PAGE_SIZES - 1 lpage_info arrays
coalesce into one and using kmalloc() when the result is small enough.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
---
 arch/x86/kvm/x86.c |   56 ++++++++++++++++++++++++++++++---------------------
 1 files changed, 33 insertions(+), 23 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4de705c..716d543 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6300,35 +6300,52 @@ void kvm_arch_free_memslot(struct kvm_memory_slot *free,
 {
 	int i;
 
-	for (i = 0; i < KVM_NR_PAGE_SIZES - 1; ++i) {
-		if (!dont || free->arch.lpage_info[i] != dont->arch.lpage_info[i]) {
-			vfree(free->arch.lpage_info[i]);
-			free->arch.lpage_info[i] = NULL;
-		}
-	}
+	if (dont && free->arch.lpage_info[0] == dont->arch.lpage_info[0])
+		return;
+
+	if (is_vmalloc_addr(free->arch.lpage_info[0]))
+		vfree(free->arch.lpage_info[0]);
+	else
+		kfree(free->arch.lpage_info[0]);
+
+	for (i = 0; i < KVM_NR_PAGE_SIZES - 1; ++i)
+		free->arch.lpage_info[i] = NULL;
 }
 
 int kvm_arch_create_memslot(struct kvm_memory_slot *slot, unsigned long npages)
 {
 	int i;
+	int level;
+	int total_size = 0;
+	int lpages[KVM_NR_PAGE_SIZES - 1];
 
 	for (i = 0; i < KVM_NR_PAGE_SIZES - 1; ++i) {
-		unsigned long ugfn;
-		int lpages;
-		int level = i + 2;
+		level = i + 2;
+		lpages[i] = gfn_to_index(slot->base_gfn + npages - 1,
+					 slot->base_gfn, level) + 1;
+		total_size += lpages[i] * sizeof(*slot->arch.lpage_info[0]);
+	}
 
-		lpages = gfn_to_index(slot->base_gfn + npages - 1,
-				      slot->base_gfn, level) + 1;
+	if (total_size > PAGE_SIZE)
+		slot->arch.lpage_info[0] = vzalloc(total_size);
+	else
+		slot->arch.lpage_info[0] = kzalloc(total_size, GFP_KERNEL);
 
+	if (!slot->arch.lpage_info[0])
+		return -ENOMEM;
+
+	for (i = 1; i < KVM_NR_PAGE_SIZES - 1; ++i)
 		slot->arch.lpage_info[i] =
-			vzalloc(lpages * sizeof(*slot->arch.lpage_info[i]));
-		if (!slot->arch.lpage_info[i])
-			goto out_free;
+			slot->arch.lpage_info[i - 1] + lpages[i - 1];
+
+	for (i = 0; i < KVM_NR_PAGE_SIZES - 1; ++i) {
+		unsigned long ugfn;
 
+		level = i + 2;
 		if (slot->base_gfn & (KVM_PAGES_PER_HPAGE(level) - 1))
 			slot->arch.lpage_info[i][0].write_count = 1;
 		if ((slot->base_gfn + npages) & (KVM_PAGES_PER_HPAGE(level) - 1))
-			slot->arch.lpage_info[i][lpages - 1].write_count = 1;
+			slot->arch.lpage_info[i][lpages[i] - 1].write_count = 1;
 		ugfn = slot->userspace_addr >> PAGE_SHIFT;
 		/*
 		 * If the gfn and userspace address are not aligned wrt each
@@ -6339,19 +6356,12 @@ int kvm_arch_create_memslot(struct kvm_memory_slot *slot, unsigned long npages)
 		    !kvm_largepages_enabled()) {
 			unsigned long j;
 
-			for (j = 0; j < lpages; ++j)
+			for (j = 0; j < lpages[i]; ++j)
 				slot->arch.lpage_info[i][j].write_count = 1;
 		}
 	}
 
 	return 0;
-
-out_free:
-	for (i = 0; i < KVM_NR_PAGE_SIZES - 1; ++i) {
-		vfree(slot->arch.lpage_info[i]);
-		slot->arch.lpage_info[i] = NULL;
-	}
-	return -ENOMEM;
 }
 
 int kvm_arch_prepare_memory_region(struct kvm *kvm,
-- 
1.7.5.4
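
To illustrate the coalescing idea outside the kernel, here is a minimal
userspace sketch. It is not the patch itself: struct slot_sketch, struct
lpage_info, NR_LARGE_LEVELS and LEVEL_SHIFT() are simplified stand-ins
chosen for this example (two large levels and 512 entries per level are
assumed, as on x86), and calloc() replaces the kernel allocators. The
per-level array lengths are computed first, one zeroed block is allocated,
and each level's pointer is placed right after the previous level's array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NR_LARGE_LEVELS 2	/* assumption: stands in for KVM_NR_PAGE_SIZES - 1 */
#define LEVEL_SHIFT(level) (((level) - 1) * 9)	/* assumption: 512 entries per level */

struct lpage_info {		/* simplified stand-in for the kernel structure */
	int write_count;
};

struct slot_sketch {		/* hypothetical, reduced memslot */
	unsigned long base_gfn;
	unsigned long npages;
	struct lpage_info *lpage_info[NR_LARGE_LEVELS];
	size_t lpages[NR_LARGE_LEVELS];
};

/* Number of large pages of the given level that the slot touches. */
static size_t lpages_for_level(unsigned long base_gfn, unsigned long npages,
			       int level)
{
	unsigned long last_gfn = base_gfn + npages - 1;

	return (last_gfn >> LEVEL_SHIFT(level)) -
	       (base_gfn >> LEVEL_SHIFT(level)) + 1;
}

/* Allocate one zeroed block and carve it into per-level arrays. */
static int slot_alloc_lpage_info(struct slot_sketch *slot)
{
	size_t total = 0;
	int i;

	assert(slot->npages > 0);

	for (i = 0; i < NR_LARGE_LEVELS; i++) {
		slot->lpages[i] = lpages_for_level(slot->base_gfn,
						   slot->npages, i + 2);
		total += slot->lpages[i];
	}

	slot->lpage_info[0] = calloc(total, sizeof(struct lpage_info));
	if (!slot->lpage_info[0])
		return -1;

	/* Level i starts where level i - 1 ends. */
	for (i = 1; i < NR_LARGE_LEVELS; i++)
		slot->lpage_info[i] = slot->lpage_info[i - 1] +
				      slot->lpages[i - 1];
	return 0;
}

/* Only the first pointer owns memory; the others are offsets into it. */
static void slot_free_lpage_info(struct slot_sketch *slot)
{
	int i;

	free(slot->lpage_info[0]);
	for (i = 0; i < NR_LARGE_LEVELS; i++)
		slot->lpage_info[i] = NULL;
}
```

Because every pointer after the first is an offset into the same block,
only lpage_info[0] may be passed to the free routine, which is why the
patch frees index 0 and then clears all entries.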


* Re: [PATCH] KVM: Avoid wasting pages for small lpage_info arrays
  2012-05-10 14:33 [PATCH] KVM: Avoid wasting pages for small lpage_info arrays Takuya Yoshikawa
@ 2012-05-13 10:20 ` Avi Kivity
  2012-05-14 13:29   ` Takuya Yoshikawa
  0 siblings, 1 reply; 6+ messages in thread
From: Avi Kivity @ 2012-05-13 10:20 UTC
  To: Takuya Yoshikawa; +Cc: mtosatti, kvm, yoshikawa.takuya

On 05/10/2012 05:33 PM, Takuya Yoshikawa wrote:
> From: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
>
> lpage_info is created for each large level even when the memory slot is
> not for RAM.  This means that when we add one slot for a PCI device, we
> end up allocating at least KVM_NR_PAGE_SIZES - 1 pages by vmalloc():
> this problem will become more severe if we support more guests with more
> devices in the future.
>
> Although it is not easy to differentiate RAM slots from others, we can
> avoid wasting pages by making KVM_NR_PAGE_SIZES - 1 lpage_info arrays
> coalesce into one and using kmalloc() when the result is small enough.
>
> Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
> ---
>  arch/x86/kvm/x86.c |   56 ++++++++++++++++++++++++++++++---------------------
>  1 files changed, 33 insertions(+), 23 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 4de705c..716d543 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -6300,35 +6300,52 @@ void kvm_arch_free_memslot(struct kvm_memory_slot *free,
>  {
>  	int i;
>  
> -	for (i = 0; i < KVM_NR_PAGE_SIZES - 1; ++i) {
> -		if (!dont || free->arch.lpage_info[i] != dont->arch.lpage_info[i]) {
> -			vfree(free->arch.lpage_info[i]);
> -			free->arch.lpage_info[i] = NULL;
> -		}
> -	}
> +	if (dont && free->arch.lpage_info[0] == dont->arch.lpage_info[0])
> +		return;
> +
> +	if (is_vmalloc_addr(free->arch.lpage_info[0]))
> +		vfree(free->arch.lpage_info[0]);
> +	else
> +		kfree(free->arch.lpage_info[0]);
> +
> +	for (i = 0; i < KVM_NR_PAGE_SIZES - 1; ++i)
> +		free->arch.lpage_info[i] = NULL;
>  }
>

I don't feel that the savings is worth the extra complication.  We save
two pages per memslot here.

What about using kvmalloc() instead of vmalloc()?  It's in
security/apparmor now, but can be made generic.

-- 
error compiling committee.c: too many arguments to function


* Re: [PATCH] KVM: Avoid wasting pages for small lpage_info arrays
  2012-05-13 10:20 ` Avi Kivity
@ 2012-05-14 13:29   ` Takuya Yoshikawa
  2012-05-15  8:02     ` Avi Kivity
  0 siblings, 1 reply; 6+ messages in thread
From: Takuya Yoshikawa @ 2012-05-14 13:29 UTC
  To: Avi Kivity; +Cc: mtosatti, kvm, yoshikawa.takuya

On Sun, 13 May 2012 13:20:46 +0300
Avi Kivity <avi@redhat.com> wrote:

> I don't feel that the savings is worth the extra complication.  We save
> two pages per memslot here.

Using a 4KB vmalloced page for a 16B array is ...

Actually, I used to feel the same way and did not do this, but recently there
was talk about creating hundreds of memslots.
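
The arithmetic behind the two remarks above can be sketched as follows.
This is an illustration only: it assumes 4KB pages, that each page-granular
allocation is rounded up to a whole page, and two large-page levels; the
helper names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4KB pages, as in the thread */

/* A page-granular allocator backs every request with whole pages. */
static size_t page_rounded(size_t bytes)
{
	return (bytes + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE *
	       SKETCH_PAGE_SIZE;
}

/* Memory backing nr_arrays separate page-granular allocations. */
static size_t separate_footprint(size_t array_bytes, int nr_arrays)
{
	assert(nr_arrays > 0);
	return page_rounded(array_bytes) * (size_t)nr_arrays;
}

/* Payload size when the same arrays are packed into one allocation. */
static size_t coalesced_payload(size_t array_bytes, int nr_arrays)
{
	assert(nr_arrays > 0);
	return array_bytes * (size_t)nr_arrays;
}
```

With a 16B array per level and two levels, separate_footprint(16, 2) is
8192 bytes (the two pages per memslot), while coalesced_payload(16, 2) is
32 bytes, small enough for a slab allocation.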

> What about using kvmalloc() instead of vmalloc()?  It's in
> security/apparmor now, but can be made generic.

Andrew once, maybe several times, rejected making such an API generic, saying
that there should not be a generic criterion by which we can decide which
function - vmalloc() or kmalloc() - to use.

So each caller should decide by its own criteria.

In this case, we need to implement a KVM-specific kvmalloc().
BTW, we are already doing this for dirty_bitmap.

Thanks,
	Takuya
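
A caller-local helper of the kind discussed here might look like the
userspace sketch below. It is an assumption-laden illustration, not a
kernel API: sketch_zalloc(), sketch_free(), struct sketch_buf and the
BACKING_* names are invented for this example, calloc() stands in for both
kernel allocators, and the "larger than one page" threshold simply mirrors
the criterion used in the patch.

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4KB pages */

enum backing {
	BACKING_SMALL,	/* stands in for the kmalloc() path */
	BACKING_PAGES	/* stands in for the vmalloc() path */
};

struct sketch_buf {	/* hypothetical handle that remembers the path taken */
	void *ptr;
	enum backing backing;
};

/* The caller's own criterion: anything above one page goes page-granular. */
static enum backing choose_backing(size_t size)
{
	return size > SKETCH_PAGE_SIZE ? BACKING_PAGES : BACKING_SMALL;
}

/* Zeroed allocation; the page-granular path rounds up to whole pages. */
static struct sketch_buf sketch_zalloc(size_t size)
{
	struct sketch_buf buf = { NULL, choose_backing(size) };
	size_t bytes = size;

	if (buf.backing == BACKING_PAGES)
		bytes = (size + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE *
			SKETCH_PAGE_SIZE;

	buf.ptr = calloc(1, bytes ? bytes : 1);
	return buf;
}

/*
 * In the kernel the release routine must match the allocation path; the
 * patch distinguishes the two with is_vmalloc_addr(). Here both paths use
 * the C heap, so a single free() is enough.
 */
static void sketch_free(struct sketch_buf *buf)
{
	free(buf->ptr);
	buf->ptr = NULL;
}
```

Keeping the threshold inside one local helper lets the caller pick its own
criterion without open-coding the branch at every allocation site.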

* Re: [PATCH] KVM: Avoid wasting pages for small lpage_info arrays
  2012-05-14 13:29   ` Takuya Yoshikawa
@ 2012-05-15  8:02     ` Avi Kivity
  2012-05-15 20:25       ` Andrew Morton
  0 siblings, 1 reply; 6+ messages in thread
From: Avi Kivity @ 2012-05-15  8:02 UTC
  To: Takuya Yoshikawa; +Cc: mtosatti, kvm, yoshikawa.takuya, Andrew Morton

On 05/14/2012 04:29 PM, Takuya Yoshikawa wrote:
> On Sun, 13 May 2012 13:20:46 +0300
> Avi Kivity <avi@redhat.com> wrote:
>
> > I don't feel that the savings is worth the extra complication.  We save
> > two pages per memslot here.
>
> Using a 4KB vmalloced page for a 16B array is ...
>
> Actually, I used to feel the same way and did not do this, but recently there
> was talk about creating hundreds of memslots.
>
> > What about using kvmalloc() instead of vmalloc()?  It's in
> > security/apparmor now, but can be made generic.
>
> Andrew once, maybe several times, rejected making such an API generic, saying
> that there should not be a generic criterion by which we can decide which
> function - vmalloc() or kmalloc() - to use.
>
> So each caller should decide by its own criteria.
>
> In this case, we need to implement a KVM-specific kvmalloc().
> BTW, we are already doing this for dirty_bitmap.

Okay, a local kvmalloc() is better than open-coding the logic.

Andrew, prepare yourself for some code duplication.

-- 
error compiling committee.c: too many arguments to function


* Re: [PATCH] KVM: Avoid wasting pages for small lpage_info arrays
  2012-05-15  8:02     ` Avi Kivity
@ 2012-05-15 20:25       ` Andrew Morton
  2012-05-16  9:26         ` Avi Kivity
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Morton @ 2012-05-15 20:25 UTC
  To: Avi Kivity; +Cc: Takuya Yoshikawa, mtosatti, kvm, yoshikawa.takuya

On Tue, 15 May 2012 11:02:17 +0300
Avi Kivity <avi@redhat.com> wrote:

> On 05/14/2012 04:29 PM, Takuya Yoshikawa wrote:
> > On Sun, 13 May 2012 13:20:46 +0300
> > Avi Kivity <avi@redhat.com> wrote:
> >
> > > I don't feel that the savings is worth the extra complication.  We save
> > > two pages per memslot here.
> >
> > Using a 4KB vmalloced page for a 16B array is ...
> >
> > Actually, I used to feel the same way and did not do this, but recently there
> > was talk about creating hundreds of memslots.
> >
> > > What about using kvmalloc() instead of vmalloc()?  It's in
> > > security/apparmor now, but can be made generic.
> >
> > Andrew once, maybe several times, rejected making such an API generic, saying
> > that there should not be a generic criterion by which we can decide which
> > function - vmalloc() or kmalloc() - to use.
> >
> > So each caller should decide by its own criteria.
> >
> > In this case, we need to implement a KVM-specific kvmalloc().
> > BTW, we are already doing this for dirty_bitmap.
> 
> Okay, a local kvmalloc() is better than open-coding the logic.
> 
> Andrew, prepare yourself for some code duplication.

There are reasons for avoiding vmalloc().

The kernel does not run in a virtual memory environment.  It is a
harsh, low-level environment and kernel code should be robust. 
Assuming that you can allocate vast amounts of contiguous memory is not
robust.  Robust code will implement data structures which avoid this
weakness.

* Re: [PATCH] KVM: Avoid wasting pages for small lpage_info arrays
  2012-05-15 20:25       ` Andrew Morton
@ 2012-05-16  9:26         ` Avi Kivity
  0 siblings, 0 replies; 6+ messages in thread
From: Avi Kivity @ 2012-05-16  9:26 UTC
  To: Andrew Morton; +Cc: Takuya Yoshikawa, mtosatti, kvm, yoshikawa.takuya

On 05/15/2012 11:25 PM, Andrew Morton wrote:
> On Tue, 15 May 2012 11:02:17 +0300
> Avi Kivity <avi@redhat.com> wrote:
>
> > On 05/14/2012 04:29 PM, Takuya Yoshikawa wrote:
> > > On Sun, 13 May 2012 13:20:46 +0300
> > > Avi Kivity <avi@redhat.com> wrote:
> > >
> > > > I don't feel that the savings is worth the extra complication.  We save
> > > > two pages per memslot here.
> > >
> > > Using a 4KB vmalloced page for a 16B array is ...
> > >
> > > Actually, I used to feel the same way and did not do this, but recently there
> > > was talk about creating hundreds of memslots.
> > >
> > > > What about using kvmalloc() instead of vmalloc()?  It's in
> > > > security/apparmor now, but can be made generic.
> > >
> > > Andrew once, maybe several times, rejected making such an API generic, saying
> > > that there should not be a generic criterion by which we can decide which
> > > function - vmalloc() or kmalloc() - to use.
> > >
> > > So each caller should decide by its own criteria.
> > >
> > > In this case, we need to implement a KVM-specific kvmalloc().
> > > BTW, we are already doing this for dirty_bitmap.
> > 
> > Okay, a local kvmalloc() is better than open-coding the logic.
> > 
> > Andrew, prepare yourself for some code duplication.
>
> There are reasons for avoiding vmalloc().
>
> The kernel does not run in a virtual memory environment.  It is a
> harsh, low-level environment and kernel code should be robust. 

This is about downgrading an existing vmalloc() to kmalloc(), when the
sizes permit, to reduce wastage.  Not about upgrading a kmalloc() to
vmalloc().

> Assuming that you can allocate vast amounts of contiguous memory is not
> robust.  Robust code will implement data structures which avoid this
> weakness.

This is true on some architectures.  On others vast amounts of
contiguous memory _are_ available, and implementing software radix trees
to replace the hardware radix trees is not going to improve things.

-- 
error compiling committee.c: too many arguments to function

