* [RFC 0/2] kvm: Transcendent Memory (tmem) on KVM
@ 2012-03-08 16:29 Akshay Karle
2012-03-15 16:42 ` Konrad Rzeszutek Wilk
` (2 more replies)
0 siblings, 3 replies; 11+ messages in thread
From: Akshay Karle @ 2012-03-08 16:29 UTC (permalink / raw)
To: linux-kernel
Cc: Dan Magenheimer, konrad.wilk, kvm, ashu tripathi, nishant gulhane,
amarmore2006, Shreyas Mahure, mahesh mohan
Hi,
We are undergraduate engineering students at Maharashtra Academy of
Engineering, Pune, India, working on a project entitled
'Transcendent Memory on KVM' as part of our academic curriculum.
The project members are:
1. Ashutosh Tripathi
2. Shreyas Mahure
3. Nishant Gulhane
4. Akshay Karle
---
Project Description:
What is Transcendent Memory (tmem for short)?
Transcendent Memory is a memory optimization technique for virtualized
environments. It collects the underutilized memory of the guests and the
unassigned (fallow) memory of the host into a central tmem pool, and then
provides the guests indirect access to this pool.
For further information on tmem, please refer to the LWN article by
Dan Magenheimer:
http://lwn.net/Articles/454795/
Since KVM is one of the most popular hypervisors available,
we decided to implement this technique for KVM.
---
kvm-tmem Patch details:
This patch adds shims in the guest that invoke the KVM
hypercalls; on the host, zcache pools implement the required
functions.
To enable tmem on the 'kvm host' add the boot parameter:
"kvmtmem"
And to enable tmem in the 'kvm guests' add the boot parameter:
"tmem"
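For example, one way to set such a parameter persistently on a GRUB2-based distro; the file layout and regeneration commands below are illustrative assumptions about the reader's system, not part of the patch, so the sketch edits a temporary copy of the config rather than the real file:

```shell
# Hypothetical sketch (not from the patch): appending the "kvmtmem" host boot
# parameter to a GRUB2-style config line. Works on a temporary copy here.
cfg=$(mktemp)
echo 'GRUB_CMDLINE_LINUX="quiet"' > "$cfg"
# Append "kvmtmem" to the kernel command line (use "tmem" inside a guest).
sed -i 's/^GRUB_CMDLINE_LINUX="\([^"]*\)"/GRUB_CMDLINE_LINUX="\1 kvmtmem"/' "$cfg"
grep '^GRUB_CMDLINE_LINUX' "$cfg"
# On a real system you would edit /etc/default/grub instead, regenerate the
# config (e.g. update-grub, or grub2-mkconfig -o /boot/grub2/grub.cfg), and
# reboot; afterwards the parameter should show up in /proc/cmdline.
```

After a reboot, `grep -o kvmtmem /proc/cmdline` would confirm the parameter took effect.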
The diffstat details for this patch are given below:
 arch/x86/include/asm/kvm_host.h      |    1
 arch/x86/kvm/x86.c                   |    4
 drivers/staging/zcache/Makefile      |    2
 drivers/staging/zcache/kvm-tmem.c    |  356 +++++++++++++++++++++++++++++++++++
 drivers/staging/zcache/kvm-tmem.h    |   55 +++++
 drivers/staging/zcache/zcache-main.c |   98 ++++++++-
 include/linux/kvm_para.h             |    1
 7 files changed, 508 insertions(+), 9 deletions(-)
We have already uploaded our work, along with the 'Frontswap' patches
submitted by Dan, at the following link:
https://github.com/akshaykarle/kvm-tmem
Any comments/feedback would be appreciated and will help us a lot with our work.
Regards,
Akshay
^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC 0/2] kvm: Transcendent Memory (tmem) on KVM
From: Konrad Rzeszutek Wilk @ 2012-03-15 16:42 UTC (permalink / raw)
To: Akshay Karle
Cc: linux-kernel, Dan Magenheimer, kvm, ashu tripathi, nishant gulhane,
    amarmore2006, Shreyas Mahure, mahesh mohan

On Thu, Mar 08, 2012 at 09:59:41PM +0530, Akshay Karle wrote:
> [...]
> kvm-tmem Patch details:
> This patch adds appropriate shims at the guest that invokes the kvm
> hypercalls, and the host uses zcache pools to implement the required
> functions.

Great!

> [...]
> We have already uploaded our work alongwith the 'Frontswap' submitted by Dan,
> on the following link:
> https://github.com/akshaykarle/kvm-tmem
>
> Any comments/feedback would be appreciated and will help us a lot with our work.

Great. Will do.

> Regards,
> Akshay

* Re: [RFC 0/2] kvm: Transcendent Memory (tmem) on KVM
From: Konrad Rzeszutek Wilk @ 2012-03-15 16:48 UTC (permalink / raw)
To: Akshay Karle
Cc: linux-kernel, Dan Magenheimer, kvm, ashu tripathi, nishant gulhane,
    amarmore2006, Shreyas Mahure, mahesh mohan

> [...]
> We have already uploaded our work alongwith the 'Frontswap' submitted by Dan,
> on the following link:
> https://github.com/akshaykarle/kvm-tmem

Is there a way for these patches to be posted on LKML? It is rather
difficult to copy-and-paste patches from emails and send them. Or, if
you want to, you can email them directly to me. To do that, use
'git format-patch' to prep the git commits into patches and
'git send-email' to send them.

Also, the title says 'RFC 0/2' but I am not seeing patches 1 or 2?

* Re: [RFC 0/2] kvm: Transcendent Memory (tmem) on KVM
From: Avi Kivity @ 2012-03-15 16:58 UTC (permalink / raw)
To: Akshay Karle
Cc: linux-kernel, Dan Magenheimer, konrad.wilk, kvm, ashu tripathi,
    nishant gulhane, amarmore2006, Shreyas Mahure, mahesh mohan

On 03/08/2012 06:29 PM, Akshay Karle wrote:
> [...]
> Since kvm is one of the most popular hypervisors available,
> we decided to implement this technique for kvm.
>
> Any comments/feedback would be appreciated and will help us a lot with our work.

One of the potential problems with tmem is a reduction in performance
when the cache hit rate is low, for example when streaming.

Can you test this by creating a large file, for example with

  dd < /dev/urandom > file bs=1M count=100000

and then measuring the time to stream it, using

  time dd < file > /dev/null

with and without the patch?

This should be done on a cleancache-enabled guest filesystem backed by
a virtio disk with cache=none.

It would be interesting to compare kvm_stat during the streaming, with
and without the patch.

-- 
error compiling committee.c: too many arguments to function
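The suggested test is easy to script; a scaled-down sketch follows (100 MB rather than the ~100 GB above, so it finishes quickly — the principle is the same). The cache-drop step needs root and is best-effort here; run it once on a kernel with the kvm-tmem patch and once without, and compare the times and the host's kvm_stat output:

```shell
# Scaled-down version of the streaming benchmark suggested above: create a
# file of random data, drop the page cache, then time a sequential read.
file=$(mktemp)
dd if=/dev/urandom of="$file" bs=1M count=100 iflag=fullblock 2>/dev/null
size=$(stat -c %s "$file")   # should be exactly 100 MiB
sync
# Dropping caches needs root; ignore the failure when run unprivileged.
{ echo 3 > /proc/sys/vm/drop_caches; } 2>/dev/null || true
time dd if="$file" of=/dev/null bs=1M 2>/dev/null
```

Without the drop_caches step the second dd mostly measures the guest's own page cache rather than the cleancache path, so run it as root for a meaningful comparison.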

* RE: [RFC 0/2] kvm: Transcendent Memory (tmem) on KVM
From: Dan Magenheimer @ 2012-03-15 17:49 UTC (permalink / raw)
To: Avi Kivity, Akshay Karle
Cc: linux-kernel, Konrad Wilk, kvm, ashu tripathi, nishant gulhane,
    amarmore2006, Shreyas Mahure, mahesh mohan

> From: Avi Kivity [mailto:avi@redhat.com]
> Subject: Re: [RFC 0/2] kvm: Transcendent Memory (tmem) on KVM
>
> One of the potential problems with tmem is reduction in performance when
> the cache hit rate is low, for example when streaming.
> [...]
> It would be interesting to compare kvm_stat during the streaming, with
> and without the patch.

Hi Avi --

The "WasActive" patch (https://lkml.org/lkml/2012/1/25/300)
is intended to avoid the streaming situation you are creating here.
It increases the "quality" of cached pages placed into zcache
and should probably also be used in the guest-side stubs (and/or maybe
the host-side zcache... I don't know KVM well enough to determine
whether that would work).

As Dave Hansen pointed out, the WasActive patch is not yet correct
and, as akpm points out, pageflag bits are scarce on 32-bit systems,
so it remains to be seen whether the WasActive patch can be upstreamed.
Or maybe there is a different way to achieve the same goal.
But I wanted to let you know that the streaming issue is understood
and needs to be resolved for some cleancache backends, just as it was
resolved in the core mm code.

The measurement you suggest would still be interesting even
without the WasActive patch, as it measures a "worst case".

Dan

* Re: [RFC 0/2] kvm: Transcendent Memory (tmem) on KVM
From: Avi Kivity @ 2012-03-15 18:01 UTC (permalink / raw)
To: Dan Magenheimer
Cc: Akshay Karle, linux-kernel, Konrad Wilk, kvm, ashu tripathi,
    nishant gulhane, amarmore2006, Shreyas Mahure, mahesh mohan

On 03/15/2012 07:49 PM, Dan Magenheimer wrote:
> [...]
> But I wanted to let you know that the streaming issue is understood
> and needs to be resolved for some cleancache backends just as it was
> resolved in the core mm code.

Nice. This takes care of the tail end of the streaming (the more
important one - since it always involves a cold copy).

What about the other side? Won't the read code invoke
cleancache_get_page() for every page? (This one is just a null
hypercall, so it's cheaper, but still expensive.)

> The measurement you suggest would still be interesting even
> without the WasActive patch as it measures a "worst case".

It can provide the justification for that patch, yes.

-- 
error compiling committee.c: too many arguments to function

* Re: [RFC 0/2] kvm: Transcendent Memory (tmem) on KVM
From: Konrad Rzeszutek Wilk @ 2012-03-15 18:02 UTC (permalink / raw)
To: Avi Kivity
Cc: Dan Magenheimer, Akshay Karle, linux-kernel, kvm, ashu tripathi,
    nishant gulhane, amarmore2006, Shreyas Mahure, mahesh mohan

On Thu, Mar 15, 2012 at 08:01:52PM +0200, Avi Kivity wrote:
> [...]
> Nice. This takes care of the tail-end of the streaming (the more
> important one - since it always involves a cold copy). What about the
> other side? Won't the read code invoke cleancache_get_page() for every
> page? (this one is just a null hypercall, so it's cheaper, but still
> expensive).

That is something we should fix - I think the need for batching was
mentioned in the frontswap email thread, and it certainly seems
required, as those hypercalls aren't that cheap.

* Re: [RFC 0/2] kvm: Transcendent Memory (tmem) on KVM
From: Avi Kivity @ 2012-03-15 18:10 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk
Cc: Dan Magenheimer, Akshay Karle, linux-kernel, kvm, ashu tripathi,
    nishant gulhane, amarmore2006, Shreyas Mahure, mahesh mohan

On 03/15/2012 08:02 PM, Konrad Rzeszutek Wilk wrote:
> That is something we should fix - I think it was mentioned in the frontswap
> email thread the need for batching and it certainly seems required as those
> hypercalls aren't that cheap.

In fact, when tmem was first proposed I asked for two changes - make it
batchable, and make it asynchronous (so we can offload copies to a DMA
engine, etc.). Of course, that would have made tmem significantly more
complicated.

-- 
error compiling committee.c: too many arguments to function

* RE: [RFC 0/2] kvm: Transcendent Memory (tmem) on KVM
From: Dan Magenheimer @ 2012-03-15 19:36 UTC (permalink / raw)
To: Avi Kivity, Konrad Wilk
Cc: Akshay Karle, linux-kernel, kvm, ashu tripathi, nishant gulhane,
    amarmore2006, Shreyas Mahure, mahesh mohan

> From: Avi Kivity [mailto:avi@redhat.com]
> [...]
> In fact when tmem was first proposed I asked for two changes - make it
> batchable, and make it asynchronous (so we can offload copies to a dma
> engine, etc). Of course that would have made tmem significantly more
> complicated.

(Sorry, I'm not typing fast enough to keep up with the thread...)

Hi Avi --

In case it wasn't clear from my last reply, RAMster shows
that tmem CAN be used asynchronously... by making it more
complicated, but without making the core kernel changes more
complicated.

In RAMster, pages are locally cached (compressed using zcache)
and then, depending on policy, a separate thread sends the pages
to a remote machine. So the first part (compress and store locally)
still must be synchronous, but the second part (transmit to
another -- remote or possibly host? -- system) can be done
asynchronously. The RAMster code has to handle all the race
conditions, which is a pain but seems to work.

This is all working today in RAMster (which is in linux-next).
Batching is still not implemented by any tmem backend, but RAMster
demonstrates how the backend implementation COULD do batching without
any additional core kernel changes, i.e. no changes necessary
to frontswap or cleancache.

So, you see, I *was* listening. I just wasn't willing to fight
the uphill battle of much more complexity in the core kernel
for a capability that could be implemented differently.

That said, I still think it remains to be proven that
reducing the number of hypercalls by 2x or 3x (or whatever
batching factor you choose) will make a noticeable performance
difference. But if it does, batching can be done... and
completely hidden in the backend.

(I hope Andrea is listening ;-)

Dan

* Re: [RFC 0/2] kvm: Transcendent Memory (tmem) on KVM
From: Konrad Rzeszutek Wilk @ 2012-03-15 19:46 UTC (permalink / raw)
To: Dan Magenheimer
Cc: Avi Kivity, Akshay Karle, linux-kernel, kvm, ashu tripathi,
    nishant gulhane, amarmore2006, Shreyas Mahure, mahesh mohan

On Thu, Mar 15, 2012 at 12:36:48PM -0700, Dan Magenheimer wrote:
> [...]
> So, you see, I *was* listening. I just wasn't willing to fight
> the uphill battle of much more complexity in the core kernel
> for a capability that could be implemented differently.

Dan, please stop this. The frontswap work is going through me, and my
goal is to provide the batching and asynchronous option. It might take
longer than anticipated b/c it might require redoing some of the code -
that is OK. We can do this in steps too - first do the synchronous
version (as in the implementation right now) and then add on the
batching and asynchronous work. This means breaking the ABI/API, and I
believe Avi would like the ABI to be as baked as possible so that he
does not have to provide a v2 (or v3) of the tmem support in KVM.

I appreciate you having done that in RAMster, but the "transmit" part
is what we need to batch. Think of scatter-gather DMA.

> That said, I still think it remains to be proven that
> reducing the number of hypercalls by 2x or 3x (or whatever
> the batching factor you choose) will make a noticeable

I was thinking 32 - about the same number that we do in Xen with PV MMU
upcalls. We also batch them there with multicalls.

> performance difference. But if it does, batching can
> be done... and completely hidden in the backend.

* RE: [RFC 0/2] kvm: Transcendent Memory (tmem) on KVM
From: Dan Magenheimer @ 2012-03-15 19:16 UTC (permalink / raw)
To: Konrad Wilk, Avi Kivity
Cc: Akshay Karle, linux-kernel, kvm, ashu tripathi, nishant gulhane,
    amarmore2006, Shreyas Mahure, mahesh mohan

> From: Konrad Rzeszutek Wilk
> [...]
> That is something we should fix - I think it was mentioned in the frontswap
> email thread the need for batching and it certainly seems required as those
> hypercalls aren't that cheap.

And exactly how expensive ARE hypercalls these days? On the first
VT/SVM systems they were tens of thousands of cycles... now they are
closer to sub-thousand, are they not? (I remember seeing a graph of
hypercall overhead dropping across generations of CPUs... anybody have
a pointer to a public graph of this?)

One of my favorite papers these days is "When Poll is Better than
Interrupt"
(http://static.usenix.org/events/fast12/tech/full_papers/Yang.pdf),
which argues that wasting some CPU cycles doing a busy-wait is often
more efficient than slogging through the block I/O subsystem to set up
and respond to an interrupt, if the device is fast enough. I wonder if
the same might be true comparing hypercall overhead for tmem vs. the
path for KVM to get a page from the host via its normal path?

Ignoring that for now, if excessive hypercalls are a problem, a better
solution than batching may be to modify the Maharashtra approach to be
more like RAMster: put zcache on the guest side and treat the host like
a "remote" system.

But let's wait for the Maharashtra team to do some measurements first
before we make any assumptions or change any designs...