public inbox for kvm@vger.kernel.org
* Elvis upstreaming plan
@ 2013-11-24  9:22 Razya Ladelsky
  2013-11-24 10:26 ` Michael S. Tsirkin
                   ` (4 more replies)
  0 siblings, 5 replies; 39+ messages in thread
From: Razya Ladelsky @ 2013-11-24  9:22 UTC (permalink / raw)
  To: kvm
  Cc: anthony, mst, gleb, pbonzini, asias, jasowang, digitaleric,
	abel.gordon, Abel Gordon, Eran Raichstein, Joel Nider

Hi all,

I am Razya Ladelsky; I work in the IBM Haifa virtualization team, which 
developed Elvis, presented by Abel Gordon at the last KVM Forum: 
ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs 
ELVIS slides: https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE 


According to the discussions that took place at the forum, upstreaming 
some of the Elvis approaches seems to be a good idea, which we would like 
to pursue.

Our plan for the first patches is the following: 

1. Shared vhost thread between multiple devices 
This patch creates a worker thread and worker queue shared across multiple 
virtio devices. 
We would like to modify the patch posted in
https://github.com/abelg/virtual_io_acceleration/commit/3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766 
to limit a vhost thread to serve multiple devices only if they belong to 
the same VM, as Paolo suggested, to avoid isolation or cgroups concerns.

Another modification is related to the creation and removal of vhost 
threads, which will be discussed next.
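To make the same-VM restriction concrete, here is a rough userspace C model of the attach-time check. All names (worker_attach, owner_vm, MAX_DEVS) are illustrative assumptions, not the actual patch code:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEVS 8

/* Illustrative model: a vhost worker may serve several virtio devices,
 * but only if every attached device belongs to the same VM. */
struct vhost_dev { int owner_vm; };

struct vhost_worker {
    int owner_vm;                     /* VM this worker serves; -1 if none */
    int ndevs;
    struct vhost_dev *devs[MAX_DEVS];
};

/* Attach a device to a worker; refuse cross-VM sharing. */
static int worker_attach(struct vhost_worker *w, struct vhost_dev *d)
{
    if (w->ndevs == MAX_DEVS)
        return -1;                    /* worker is full */
    if (w->ndevs > 0 && w->owner_vm != d->owner_vm)
        return -1;                    /* device from another VM refused */
    w->owner_vm = d->owner_vm;
    w->devs[w->ndevs++] = d;
    return 0;
}
```

With this shape, two devices of the same VM share one worker, while a device of a different VM is rejected and would get a worker of its own.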

2. Sysfs mechanism to add and remove vhost threads 
This patch allows us to add and remove vhost threads dynamically.

A simpler way to control the creation of vhost threads is to statically 
determine the maximum number of virtio devices per worker via a kernel 
module parameter (which is the way the previously mentioned patch is 
currently implemented).

I'd like to ask for advice here about which way is preferable:
Although having the sysfs mechanism provides more flexibility, it may be a 
good idea to start with a simple static parameter, and keep the first 
patches as simple as possible. What do you think?
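The static variant really is small: one load-time parameter bounds how many devices each worker takes, and device placement follows from it. A hypothetical userspace sketch of that placement policy (in the kernel the limit would come from module_param() instead of a plain variable):

```c
#include <assert.h>

/* Hypothetical stand-in for a kernel module parameter such as
 * "devs_per_worker"; in the kernel it would be set once at module load
 * time and never changed afterwards. */
static int devs_per_worker = 4;

/* With a static limit, the worker for the n-th created device (0-based)
 * is fully determined: a new worker is implicitly spawned every
 * devs_per_worker devices. */
static int worker_index_for(int nth_device)
{
    return nth_device / devs_per_worker;
}
```

The sysfs alternative would replace this fixed mapping with runtime add/remove operations, at the cost of more code in the first submission.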

3. Add virtqueue polling mode to vhost 
Have the vhost thread poll the virtqueues with a high I/O rate for new 
buffers, and avoid asking the guest to kick us.
https://github.com/abelg/virtual_io_acceleration/commit/26616133fafb7855cc80fac070b0572fd1aaf5d0
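The idea can be modeled in userspace: instead of sleeping until the guest kicks, the vhost thread keeps comparing its last-seen index against the ring's available index and consumes new buffers up to a budget. A simplified, hypothetical model (field and function names are assumptions, not the patch's):

```c
#include <assert.h>

/* Minimal model of a virtqueue's producer/consumer indices. */
struct vq {
    unsigned int avail;        /* advanced by the guest (producer) */
    unsigned int last_avail;   /* consumer's position */
    int kick_enabled;          /* whether the guest must notify us */
};

/* Poll for new buffers instead of waiting for a kick; process at most
 * 'budget' buffers so one busy queue cannot starve the others. */
static int vq_poll(struct vq *vq, int budget)
{
    int done = 0;

    vq->kick_enabled = 0;             /* suppress guest notifications */
    while (done < budget && vq->last_avail != vq->avail) {
        vq->last_avail++;             /* "handle" one buffer */
        done++;
    }
    if (vq->last_avail == vq->avail)
        vq->kick_enabled = 1;         /* idle: fall back to kicks */
    return done;
}
```

The point of the budget is fairness; the point of re-enabling kicks on an empty ring is to avoid burning CPU polling an idle queue.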

4. vhost statistics
This patch introduces a set of statistics to monitor different performance 
metrics of vhost and our polling and I/O scheduling mechanisms. The 
statistics are exposed using debugfs and can be easily displayed with a 
Python script (vhost_stat, based on the old kvm_stats).
https://github.com/abelg/virtual_io_acceleration/commit/ac14206ea56939ecc3608dc5f978b86fa322e7b0
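In the kernel the counters are plain integers exposed as debugfs files; the shape can be modeled in userspace as a struct plus a dump function that renders one "name value" pair per line, the way a vhost_stat-like script would read them. The counter names here are hypothetical, not the ones the patch exports:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical vhost counters; in the patch each would back a
 * read-only debugfs file (e.g. created with debugfs_create_u64). */
struct vhost_stats {
    unsigned long long polls;        /* polling rounds performed */
    unsigned long long poll_empty;   /* rounds that found no work */
    unsigned long long handled;      /* buffers processed */
};

/* Render the counters as "name value" lines, like cat-ing the
 * corresponding debugfs files. Returns the number of bytes written. */
static int stats_dump(const struct vhost_stats *s, char *buf, size_t len)
{
    return snprintf(buf, len,
                    "polls %llu\npoll_empty %llu\nhandled %llu\n",
                    s->polls, s->poll_empty, s->handled);
}
```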


5. Add heuristics to improve I/O scheduling 
This patch enhances the round-robin mechanism with a set of heuristics to 
decide when to leave a virtqueue and proceed to the next.
https://github.com/abelg/virtual_io_acceleration/commit/f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d

This patch improves the handling of requests by the vhost thread, but 
could perhaps be deferred and not submitted as one of the first Elvis 
patches. 
I'd love to hear some comments about whether this patch needs to be part 
of the first submission.
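One way to picture such a heuristic: plain round-robin leaves a queue only when it is empty, while a budgeted variant also leaves after a fixed amount of work so a busy queue cannot monopolize the thread. A hypothetical model of just the "leave or stay" decision (the patch's actual heuristics are richer than this):

```c
#include <assert.h>

/* Decide whether the vhost thread should leave the current virtqueue
 * and proceed to the next one. Hypothetical inputs: how much work was
 * just done on this queue, whether it still has pending buffers, and a
 * per-queue work budget. */
static int should_leave_vq(int work_done, int pending, int budget)
{
    if (!pending)
        return 1;              /* queue drained: move on */
    if (work_done >= budget)
        return 1;              /* budget spent: be fair to other queues */
    return 0;                  /* stay: work and budget both remain */
}
```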

Any other feedback on this plan would be appreciated.
Thank you,
Razya


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Elvis upstreaming plan
  2013-11-24  9:22 Elvis upstreaming plan Razya Ladelsky
@ 2013-11-24 10:26 ` Michael S. Tsirkin
  2013-11-25 11:06   ` Razya Ladelsky
  2013-11-26 15:50 ` Stefan Hajnoczi
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 39+ messages in thread
From: Michael S. Tsirkin @ 2013-11-24 10:26 UTC (permalink / raw)
  To: Razya Ladelsky
  Cc: kvm, anthony, gleb, pbonzini, asias, jasowang, digitaleric,
	abel.gordon, Abel Gordon, Eran Raichstein, Joel Nider

On Sun, Nov 24, 2013 at 11:22:17AM +0200, Razya Ladelsky wrote:
> Hi all,
> 
> I am Razya Ladelsky, I work at IBM Haifa virtualization team, which 
> developed Elvis, presented by Abel Gordon at the last KVM forum: 
> ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs 
> ELVIS slides: https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE 
> 
> 
> According to the discussions that took place at the forum, upstreaming 
> some of the Elvis approaches seems to be a good idea, which we would like 
> to pursue.
> 
> Our plan for the first patches is the following: 
> 
> 1. Shared vhost thread between multiple devices 
> This patch creates a worker thread and worker queue shared across multiple 
> virtio devices. 
> We would like to modify the patch posted in
> https://github.com/abelg/virtual_io_acceleration/commit/3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766 
> to limit a vhost thread to serve multiple devices only if they belong to 
> the same VM as Paolo suggested to avoid isolation or cgroups concerns.
> 
> Another modification is related to the creation and removal of vhost 
> threads, which will be discussed next.
>
> 2. Sysfs mechanism to add and remove vhost threads 
> This patch allows us to add and remove vhost threads dynamically.
> 
> A simpler way to control the creation of vhost threads is statically 
> determining the maximum number of virtio devices per worker via a kernel 
> module parameter (which is the way the previously mentioned patch is 
> currently implemented)
> 
> I'd like to ask for advice here about the more preferable way to go:
> Although having the sysfs mechanism provides more flexibility, it may be a 
> good idea to start with a simple static parameter, and have the first 
> patches as simple as possible. What do you think?
> 
> 3. Add virtqueue polling mode to vhost 
> Have the vhost thread poll the virtqueues with a high I/O rate for new 
> buffers, and avoid asking the guest to kick us.
> https://github.com/abelg/virtual_io_acceleration/commit/26616133fafb7855cc80fac070b0572fd1aaf5d0
> 
> 4. vhost statistics
> This patch introduces a set of statistics to monitor different performance 
> metrics of vhost and our polling and I/O scheduling mechanisms. The 
> statistics are exposed using debugfs and can be easily displayed with a 
> Python script (vhost_stat, based on the old kvm_stats)
> https://github.com/abelg/virtual_io_acceleration/commit/ac14206ea56939ecc3608dc5f978b86fa322e7b0
> 
> 
> 5. Add heuristics to improve I/O scheduling 
> This patch enhances the round-robin mechanism with a set of heuristics to 
> decide when to leave a virtqueue and proceed to the next.
> https://github.com/abelg/virtual_io_acceleration/commit/f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d
> 
> This patch improves the handling of requests by the vhost thread, but 
> could perhaps be deferred and not submitted as one of the first Elvis 
> patches.
> I'd love to hear some comments about whether this patch needs to be part 
> of the first submission.
> 
> Any other feedback on this plan will be appreciated,
> Thank you,
> Razya


How about we start with the stats patch?
This will make it easier to evaluate the other patches.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Elvis upstreaming plan
  2013-11-24 10:26 ` Michael S. Tsirkin
@ 2013-11-25 11:06   ` Razya Ladelsky
  0 siblings, 0 replies; 39+ messages in thread
From: Razya Ladelsky @ 2013-11-25 11:06 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Abel Gordon, abel.gordon, anthony, asias, digitaleric,
	Eran Raichstein, gleb, jasowang, Joel Nider, kvm, pbonzini

"Michael S. Tsirkin" <mst@redhat.com> wrote on 24/11/2013 12:26:15 PM:

> From: "Michael S. Tsirkin" <mst@redhat.com>
> To: Razya Ladelsky/Haifa/IBM@IBMIL, 
> Cc: kvm@vger.kernel.org, anthony@codemonkey.ws, gleb@redhat.com, 
> pbonzini@redhat.com, asias@redhat.com, jasowang@redhat.com, 
> digitaleric@google.com, abel.gordon@gmail.com, Abel Gordon/Haifa/
> IBM@IBMIL, Eran Raichstein/Haifa/IBM@IBMIL, Joel Nider/Haifa/IBM@IBMIL
> Date: 24/11/2013 12:22 PM
> Subject: Re: Elvis upstreaming plan
> 
> On Sun, Nov 24, 2013 at 11:22:17AM +0200, Razya Ladelsky wrote:
> > Hi all,
> > 
> > I am Razya Ladelsky, I work at IBM Haifa virtualization team, which 
> > developed Elvis, presented by Abel Gordon at the last KVM forum: 
> > ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs 
> > ELVIS slides: https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE 
> > 
> > 
> > According to the discussions that took place at the forum, upstreaming 
> > some of the Elvis approaches seems to be a good idea, which we would like 
> > to pursue.
> > 
> > Our plan for the first patches is the following: 
> > 
> > 1. Shared vhost thread between multiple devices 
> > This patch creates a worker thread and worker queue shared across multiple 
> > virtio devices. 
> > We would like to modify the patch posted in
> > https://github.com/abelg/virtual_io_acceleration/commit/3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766 
> > to limit a vhost thread to serve multiple devices only if they belong to 
> > the same VM, as Paolo suggested, to avoid isolation or cgroups concerns.
> > 
> > Another modification is related to the creation and removal of vhost 
> > threads, which will be discussed next.
> >
> > 2. Sysfs mechanism to add and remove vhost threads 
> > This patch allows us to add and remove vhost threads dynamically.
> > 
> > A simpler way to control the creation of vhost threads is to statically 
> > determine the maximum number of virtio devices per worker via a kernel 
> > module parameter (which is the way the previously mentioned patch is 
> > currently implemented).
> > 
> > I'd like to ask for advice here about which way is preferable:
> > Although having the sysfs mechanism provides more flexibility, it may be a 
> > good idea to start with a simple static parameter, and have the first 
> > patches as simple as possible. What do you think?
> > 
> > 3. Add virtqueue polling mode to vhost 
> > Have the vhost thread poll the virtqueues with a high I/O rate for new 
> > buffers, and avoid asking the guest to kick us.
> > https://github.com/abelg/virtual_io_acceleration/commit/26616133fafb7855cc80fac070b0572fd1aaf5d0
> > 
> > 4. vhost statistics
> > This patch introduces a set of statistics to monitor different performance 
> > metrics of vhost and our polling and I/O scheduling mechanisms. The 
> > statistics are exposed using debugfs and can be easily displayed with a 
> > Python script (vhost_stat, based on the old kvm_stats)
> > https://github.com/abelg/virtual_io_acceleration/commit/ac14206ea56939ecc3608dc5f978b86fa322e7b0
> > 
> > 
> > 5. Add heuristics to improve I/O scheduling 
> > This patch enhances the round-robin mechanism with a set of heuristics to 
> > decide when to leave a virtqueue and proceed to the next.
> > https://github.com/abelg/virtual_io_acceleration/commit/f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d
> > 
> > This patch improves the handling of requests by the vhost thread, but 
> > could perhaps be deferred and not submitted as one of the first Elvis 
> > patches. 
> > I'd love to hear some comments about whether this patch needs to be part 
> > of the first submission.
> > 
> > Any other feedback on this plan will be appreciated,
> > Thank you,
> > Razya
> 
> 
> How about we start with the stats patch?
> This will make it easier to evaluate the other patches.
> 

Hi Michael,
Thank you for your quick reply.
Our plan was to send all of these patches, which contain the Elvis code.
We can start with the stats patch; however, many of the statistics there 
are related to the features that the other patches provide...
By the way, if you get a chance to look at the rest of the patches,
I'd really appreciate your comments.
Thank you very much,
Razya



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Elvis upstreaming plan
  2013-11-24  9:22 Elvis upstreaming plan Razya Ladelsky
  2013-11-24 10:26 ` Michael S. Tsirkin
@ 2013-11-26 15:50 ` Stefan Hajnoczi
  2013-11-26 18:05 ` Anthony Liguori
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 39+ messages in thread
From: Stefan Hajnoczi @ 2013-11-26 15:50 UTC (permalink / raw)
  To: Razya Ladelsky
  Cc: kvm, anthony, mst, gleb, pbonzini, asias, jasowang, digitaleric,
	abel.gordon, Abel Gordon, Eran Raichstein, Joel Nider

On Sun, Nov 24, 2013 at 11:22:17AM +0200, Razya Ladelsky wrote:
> 5. Add heuristics to improve I/O scheduling 
> This patch enhances the round-robin mechanism with a set of heuristics to 
> decide when to leave a virtqueue and proceed to the next.
> https://github.com/abelg/virtual_io_acceleration/commit/f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d

This patch should probably do something portable instead of relying on
x86-only rdtscll().

Stefan

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Elvis upstreaming plan
  2013-11-24  9:22 Elvis upstreaming plan Razya Ladelsky
  2013-11-24 10:26 ` Michael S. Tsirkin
  2013-11-26 15:50 ` Stefan Hajnoczi
@ 2013-11-26 18:05 ` Anthony Liguori
  2013-11-26 18:53   ` Abel Gordon
  2013-11-26 22:27 ` Bandan Das
  2013-11-27  2:49 ` Jason Wang
  4 siblings, 1 reply; 39+ messages in thread
From: Anthony Liguori @ 2013-11-26 18:05 UTC (permalink / raw)
  To: Razya Ladelsky, kvm
  Cc: mst, gleb, pbonzini, asias, jasowang, digitaleric, abel.gordon,
	Abel Gordon, Eran Raichstein, Joel Nider

Razya Ladelsky <RAZYA@il.ibm.com> writes:

> Hi all,
>
> I am Razya Ladelsky, I work at IBM Haifa virtualization team, which 
> developed Elvis, presented by Abel Gordon at the last KVM forum: 
> ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs 
> ELVIS slides: https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE 
>
>
> According to the discussions that took place at the forum, upstreaming 
> some of the Elvis approaches seems to be a good idea, which we would like 
> to pursue.
>
> Our plan for the first patches is the following: 
>
> 1. Shared vhost thread between multiple devices 
> This patch creates a worker thread and worker queue shared across multiple 
> virtio devices. 
> We would like to modify the patch posted in
> https://github.com/abelg/virtual_io_acceleration/commit/3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766 
> to limit a vhost thread to serve multiple devices only if they belong to 
> the same VM as Paolo suggested to avoid isolation or cgroups concerns.
>
> Another modification is related to the creation and removal of vhost 
> threads, which will be discussed next.

I think this is an exceptionally bad idea.

We shouldn't throw away isolation without exhausting every other
possibility.

We've seen very positive results from adding threads.  We should also
look at scheduling.

Once you are scheduling multiple guests in a single vhost device, you
now create a whole new class of DoS attacks in the best case scenario.

> 2. Sysfs mechanism to add and remove vhost threads 
> This patch allows us to add and remove vhost threads dynamically.
>
> A simpler way to control the creation of vhost threads is statically 
> determining the maximum number of virtio devices per worker via a kernel 
> module parameter (which is the way the previously mentioned patch is 
> currently implemented)
>
> I'd like to ask for advice here about the more preferable way to go:
> Although having the sysfs mechanism provides more flexibility, it may be a 
> good idea to start with a simple static parameter, and have the first 
> patches as simple as possible. What do you think?
>
> 3. Add virtqueue polling mode to vhost 
> Have the vhost thread poll the virtqueues with a high I/O rate for new 
> buffers, and avoid asking the guest to kick us.
> https://github.com/abelg/virtual_io_acceleration/commit/26616133fafb7855cc80fac070b0572fd1aaf5d0

Ack on this.

Regards,

Anthony Liguori

> 4. vhost statistics
> This patch introduces a set of statistics to monitor different performance 
> metrics of vhost and our polling and I/O scheduling mechanisms. The 
> statistics are exposed using debugfs and can be easily displayed with a 
> Python script (vhost_stat, based on the old kvm_stats)
> https://github.com/abelg/virtual_io_acceleration/commit/ac14206ea56939ecc3608dc5f978b86fa322e7b0
>
>
> 5. Add heuristics to improve I/O scheduling 
> This patch enhances the round-robin mechanism with a set of heuristics to 
> decide when to leave a virtqueue and proceed to the next.
> https://github.com/abelg/virtual_io_acceleration/commit/f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d
>
> This patch improves the handling of requests by the vhost thread, but 
> could perhaps be deferred and not submitted as one of the first Elvis 
> patches.
> I'd love to hear some comments about whether this patch needs to be part 
> of the first submission.
>
> Any other feedback on this plan will be appreciated,
> Thank you,
> Razya

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Elvis upstreaming plan
  2013-11-26 18:05 ` Anthony Liguori
@ 2013-11-26 18:53   ` Abel Gordon
  2013-11-26 21:11     ` Michael S. Tsirkin
  0 siblings, 1 reply; 39+ messages in thread
From: Abel Gordon @ 2013-11-26 18:53 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: abel.gordon, asias, digitaleric, Eran Raichstein, gleb, jasowang,
	Joel Nider, kvm, mst, pbonzini, Razya Ladelsky



Anthony Liguori <anthony@codemonkey.ws> wrote on 26/11/2013 08:05:00 PM:

>
> Razya Ladelsky <RAZYA@il.ibm.com> writes:
>
> > Hi all,
> >
> > I am Razya Ladelsky, I work at IBM Haifa virtualization team, which
> > developed Elvis, presented by Abel Gordon at the last KVM forum:
> > ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs
> > ELVIS slides: https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE
> >
> >
> > According to the discussions that took place at the forum, upstreaming
> > some of the Elvis approaches seems to be a good idea, which we would like
> > to pursue.
> >
> > Our plan for the first patches is the following:
> >
> > 1. Shared vhost thread between multiple devices
> > This patch creates a worker thread and worker queue shared across multiple
> > virtio devices.
> > We would like to modify the patch posted in
> > https://github.com/abelg/virtual_io_acceleration/commit/3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766
> > to limit a vhost thread to serve multiple devices only if they belong to
> > the same VM, as Paolo suggested, to avoid isolation or cgroups concerns.
> >
> > Another modification is related to the creation and removal of vhost
> > threads, which will be discussed next.
>
> I think this is an exceptionally bad idea.
>
> We shouldn't throw away isolation without exhausting every other
> possibility.

It seems you have missed some important details here.
Anthony, we are aware that you are concerned about isolation
and believe we should not share a single vhost thread across
multiple VMs.  That's why Razya proposed to change the patch
so we will serve multiple virtio devices using a single vhost thread
"only if the devices belong to the same VM". This series of patches
will not allow two different VMs to share the same vhost thread.
So I don't see why this would be throwing away isolation, or why
it could be an "exceptionally bad idea".

By the way, I remember that during the KVM forum a similar
approach of having a single data plane thread for many devices
was discussed....

> We've seen very positive results from adding threads.  We should also
> look at scheduling.

...and we have also seen exceptionally negative results from
adding threads, both for vhost and data-plane. If you have a lot of idle
time/cores, then it makes sense to run multiple threads. But IMHO, in many
scenarios you don't have a lot of idle time/cores, and if you do, you would
probably prefer to run more VMs/VCPUs. Hosting a single SMP VM when you
have enough physical cores to run all the VCPU threads and the I/O threads
is not a realistic scenario.

That's why we are proposing to implement a mechanism that will enable
the management stack to configure 1 thread per I/O device (as it is today)
or 1 thread for many I/O devices (belonging to the same VM).

> Once you are scheduling multiple guests in a single vhost device, you
> now create a whole new class of DoS attacks in the best case scenario.

Again, we are NOT proposing to schedule multiple guests in a single
vhost thread. We are proposing to schedule multiple devices belonging
to the same guest in a single (or multiple) vhost thread/s.

>
> > 2. Sysfs mechanism to add and remove vhost threads
> > This patch allows us to add and remove vhost threads dynamically.
> >
> > A simpler way to control the creation of vhost threads is to statically
> > determine the maximum number of virtio devices per worker via a kernel
> > module parameter (which is the way the previously mentioned patch is
> > currently implemented).
> >
> > I'd like to ask for advice here about which way is preferable:
> > Although having the sysfs mechanism provides more flexibility, it may be a
> > good idea to start with a simple static parameter, and have the first
> > patches as simple as possible. What do you think?
> >
> > 3. Add virtqueue polling mode to vhost
> > Have the vhost thread poll the virtqueues with a high I/O rate for new
> > buffers, and avoid asking the guest to kick us.
> > https://github.com/abelg/virtual_io_acceleration/commit/26616133fafb7855cc80fac070b0572fd1aaf5d0
>
> Ack on this.

:)

Regards,
Abel.

>
> Regards,
>
> Anthony Liguori
>
> > 4. vhost statistics
> > This patch introduces a set of statistics to monitor different performance
> > metrics of vhost and our polling and I/O scheduling mechanisms. The
> > statistics are exposed using debugfs and can be easily displayed with a
> > Python script (vhost_stat, based on the old kvm_stats)
> > https://github.com/abelg/virtual_io_acceleration/commit/ac14206ea56939ecc3608dc5f978b86fa322e7b0
> >
> >
> > 5. Add heuristics to improve I/O scheduling
> > This patch enhances the round-robin mechanism with a set of heuristics to
> > decide when to leave a virtqueue and proceed to the next.
> > https://github.com/abelg/virtual_io_acceleration/commit/f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d
> >
> > This patch improves the handling of requests by the vhost thread, but
> > could perhaps be deferred and not submitted as one of the first Elvis
> > patches.
> > I'd love to hear some comments about whether this patch needs to be part
> > of the first submission.
> > Any other feedback on this plan will be appreciated,
> > Thank you,
> > Razya
>


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Elvis upstreaming plan
  2013-11-26 18:53   ` Abel Gordon
@ 2013-11-26 21:11     ` Michael S. Tsirkin
  2013-11-27  7:43       ` Joel Nider
  2013-11-27  9:03       ` Abel Gordon
  0 siblings, 2 replies; 39+ messages in thread
From: Michael S. Tsirkin @ 2013-11-26 21:11 UTC (permalink / raw)
  To: Abel Gordon
  Cc: Anthony Liguori, abel.gordon, asias, digitaleric, Eran Raichstein,
	gleb, jasowang, Joel Nider, kvm, pbonzini, Razya Ladelsky

On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:
> 
> 
> Anthony Liguori <anthony@codemonkey.ws> wrote on 26/11/2013 08:05:00 PM:
> 
> >
> > Razya Ladelsky <RAZYA@il.ibm.com> writes:
> >
> > > Hi all,
> > >
> > > I am Razya Ladelsky, I work at IBM Haifa virtualization team, which
> > > developed Elvis, presented by Abel Gordon at the last KVM forum:
> > > ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs
> > > ELVIS slides: https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE
> > >
> > >
> > > According to the discussions that took place at the forum, upstreaming
> > > some of the Elvis approaches seems to be a good idea, which we would like
> > > to pursue.
> > >
> > > Our plan for the first patches is the following:
> > >
> > > 1. Shared vhost thread between multiple devices
> > > This patch creates a worker thread and worker queue shared across multiple
> > > virtio devices.
> > > We would like to modify the patch posted in
> > > https://github.com/abelg/virtual_io_acceleration/commit/3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766
> > > to limit a vhost thread to serve multiple devices only if they belong to
> > > the same VM, as Paolo suggested, to avoid isolation or cgroups concerns.
> > >
> > > Another modification is related to the creation and removal of vhost
> > > threads, which will be discussed next.
> >
> > I think this is an exceptionally bad idea.
> >
> > We shouldn't throw away isolation without exhausting every other
> > possibility.
> 
> It seems you have missed some important details here.
> Anthony, we are aware that you are concerned about isolation
> and believe we should not share a single vhost thread across
> multiple VMs.  That's why Razya proposed to change the patch
> so we will serve multiple virtio devices using a single vhost thread
> "only if the devices belong to the same VM". This series of patches
> will not allow two different VMs to share the same vhost thread.
> So I don't see why this would be throwing away isolation, or why
> it could be an "exceptionally bad idea".
> 
> By the way, I remember that during the KVM forum a similar
> approach of having a single data plane thread for many devices
> was discussed....
> > We've seen very positive results from adding threads.  We should also
> > look at scheduling.
> 
> ...and we have also seen exceptionally negative results from
> adding threads, both for vhost and data-plane. If you have a lot of idle
> time/cores, then it makes sense to run multiple threads. But IMHO, in many
> scenarios you don't have a lot of idle time/cores, and if you do, you would
> probably prefer to run more VMs/VCPUs. Hosting a single SMP VM when you
> have enough physical cores to run all the VCPU threads and the I/O threads
> is not a realistic scenario.
> 
> That's why we are proposing to implement a mechanism that will enable
> the management stack to configure 1 thread per I/O device (as it is today)
> or 1 thread for many I/O devices (belonging to the same VM).
> 
> > Once you are scheduling multiple guests in a single vhost device, you
> > now create a whole new class of DoS attacks in the best case scenario.
> 
> Again, we are NOT proposing to schedule multiple guests in a single
> vhost thread. We are proposing to schedule multiple devices belonging
> to the same guest in a single (or multiple) vhost thread/s.
> 

I guess a question then becomes: why have multiple devices?


> >
> > > 2. Sysfs mechanism to add and remove vhost threads
> > > This patch allows us to add and remove vhost threads dynamically.
> > >
> > > A simpler way to control the creation of vhost threads is to statically
> > > determine the maximum number of virtio devices per worker via a kernel
> > > module parameter (which is the way the previously mentioned patch is
> > > currently implemented).
> > >
> > > I'd like to ask for advice here about which way is preferable:
> > > Although having the sysfs mechanism provides more flexibility, it may be a
> > > good idea to start with a simple static parameter, and have the first
> > > patches as simple as possible. What do you think?
> > >
> > > 3. Add virtqueue polling mode to vhost
> > > Have the vhost thread poll the virtqueues with a high I/O rate for new
> > > buffers, and avoid asking the guest to kick us.
> > > https://github.com/abelg/virtual_io_acceleration/commit/26616133fafb7855cc80fac070b0572fd1aaf5d0
> >
> > Ack on this.
> 
> :)
> 
> Regards,
> Abel.
> 
> >
> > Regards,
> >
> > Anthony Liguori
> >
> > > 4. vhost statistics
> > > This patch introduces a set of statistics to monitor different performance
> > > metrics of vhost and our polling and I/O scheduling mechanisms. The
> > > statistics are exposed using debugfs and can be easily displayed with a
> > > Python script (vhost_stat, based on the old kvm_stats)
> > > https://github.com/abelg/virtual_io_acceleration/commit/ac14206ea56939ecc3608dc5f978b86fa322e7b0
> > >
> > >
> > > 5. Add heuristics to improve I/O scheduling
> > > This patch enhances the round-robin mechanism with a set of heuristics to
> > > decide when to leave a virtqueue and proceed to the next.
> > > https://github.com/abelg/virtual_io_acceleration/commit/f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d
> > >
> > > This patch improves the handling of requests by the vhost thread, but
> > > could perhaps be deferred and not submitted as one of the first Elvis
> > > patches.
> > > I'd love to hear some comments about whether this patch needs to be part
> > > of the first submission.
> > >
> > > Any other feedback on this plan will be appreciated,
> > > Thank you,
> > > Razya
> >

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Elvis upstreaming plan
  2013-11-24  9:22 Elvis upstreaming plan Razya Ladelsky
                   ` (2 preceding siblings ...)
  2013-11-26 18:05 ` Anthony Liguori
@ 2013-11-26 22:27 ` Bandan Das
  2013-11-27  2:49 ` Jason Wang
  4 siblings, 0 replies; 39+ messages in thread
From: Bandan Das @ 2013-11-26 22:27 UTC (permalink / raw)
  To: Razya Ladelsky
  Cc: kvm, anthony, mst, gleb, pbonzini, asias, jasowang, digitaleric,
	abel.gordon, Abel Gordon, Eran Raichstein, Joel Nider

Razya Ladelsky <RAZYA@il.ibm.com> writes:

> Hi all,
>
> I am Razya Ladelsky, I work at IBM Haifa virtualization team, which 
> developed Elvis, presented by Abel Gordon at the last KVM forum: 
> ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs 
> ELVIS slides: https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE 
>
>
> According to the discussions that took place at the forum, upstreaming 
> some of the Elvis approaches seems to be a good idea, which we would like 
> to pursue.
>
> Our plan for the first patches is the following: 
>
> 1. Shared vhost thread between multiple devices 
> This patch creates a worker thread and worker queue shared across multiple 
> virtio devices. 
> We would like to modify the patch posted in
> https://github.com/abelg/virtual_io_acceleration/commit/3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766 
> to limit a vhost thread to serve multiple devices only if they belong to 
> the same VM as Paolo suggested to avoid isolation or cgroups concerns.
>
> Another modification is related to the creation and removal of vhost 
> threads, which will be discussed next.
>
> 2. Sysfs mechanism to add and remove vhost threads 
> This patch allows us to add and remove vhost threads dynamically.
>
> A simpler way to control the creation of vhost threads is statically 
> determining the maximum number of virtio devices per worker via a kernel 
> module parameter (which is the way the previously mentioned patch is 
> currently implemented)

Does the sysfs interface aim to let the _user_ control the maximum number of 
devices per vhost thread, and/or let the user create and destroy 
worker threads at will?

Setting a limit on the number of devices makes sense, but I am not sure 
there is any reason to actually expose an interface to create or destroy 
workers. Also, it might be worthwhile to consider whether it's better to let 
the worker thread stay around (hoping it might be used again in 
the future) rather than destroying it.

> I'd like to ask for advice here about the more preferable way to go:
> Although having the sysfs mechanism provides more flexibility, it may be a 
> good idea to start with a simple static parameter, and have the first 
> patches as simple as possible. What do you think?

I am actually inclined more towards a static limit. I think that in a 
typical setup, the user will set this for his/her environment just once 
at load time and forget about it.

Bandan

> 3. Add virtqueue polling mode to vhost 
> Have the vhost thread poll the virtqueues with a high I/O rate for new 
> buffers, and avoid asking the guest to kick us.
> https://github.com/abelg/virtual_io_acceleration/commit/26616133fafb7855cc80fac070b0572fd1aaf5d0
>
> 4. vhost statistics
> This patch introduces a set of statistics to monitor different performance 
> metrics of vhost and our polling and I/O scheduling mechanisms. The 
> statistics are exposed using debugfs and can be easily displayed with a 
> Python script (vhost_stat, based on the old kvm_stats)
> https://github.com/abelg/virtual_io_acceleration/commit/ac14206ea56939ecc3608dc5f978b86fa322e7b0
>
>
> 5. Add heuristics to improve I/O scheduling 
> This patch enhances the round-robin mechanism with a set of heuristics to 
> decide when to leave a virtqueue and proceed to the next.
> https://github.com/abelg/virtual_io_acceleration/commit/f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d
>
> This patch improves the handling of the requests by the vhost thread, but 
> could perhaps be delayed to a later time, and not submitted as one of the 
> first Elvis patches.
> I'd love to hear some comments about whether this patch needs to be part 
> of the first submission.
>
> Any other feedback on this plan will be appreciated,
> Thank you,
> Razya
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Elvis upstreaming plan
  2013-11-24  9:22 Elvis upstreaming plan Razya Ladelsky
                   ` (3 preceding siblings ...)
  2013-11-26 22:27 ` Bandan Das
@ 2013-11-27  2:49 ` Jason Wang
  2013-11-27  7:35   ` Gleb Natapov
  2013-11-27 10:18   ` Abel Gordon
  4 siblings, 2 replies; 39+ messages in thread
From: Jason Wang @ 2013-11-27  2:49 UTC (permalink / raw)
  To: Razya Ladelsky, kvm
  Cc: anthony, Michael S. Tsirkin, gleb, pbonzini, asias, digitaleric,
	abel.gordon, Abel Gordon, Eran Raichstein, Joel Nider, bsd

On 11/24/2013 05:22 PM, Razya Ladelsky wrote:
> Hi all,
>
> I am Razya Ladelsky, I work at IBM Haifa virtualization team, which 
> developed Elvis, presented by Abel Gordon at the last KVM forum: 
> ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs 
> ELVIS slides: https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE 
>
>
> According to the discussions that took place at the forum, upstreaming 
> some of the Elvis approaches seems to be a good idea, which we would like 
> to pursue.
>
> Our plan for the first patches is the following: 
>
> 1. Shared vhost thread between multiple devices 
> This patch creates a worker thread and worker queue shared across multiple 
> virtio devices 
> We would like to modify the patch posted in
> https://github.com/abelg/virtual_io_acceleration/commit/3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766 
> to limit a vhost thread to serve multiple devices only if they belong to 
> the same VM as Paolo suggested to avoid isolation or cgroups concerns.
>
> Another modification is related to the creation and removal of vhost 
> threads, which will be discussed next.
>
> 2. Sysfs mechanism to add and remove vhost threads 
> This patch allows us to add and remove vhost threads dynamically.
>
> A simpler way to control the creation of vhost threads is statically 
> determining the maximum number of virtio devices per worker via a kernel 
> module parameter (which is the way the previously mentioned patch is 
> currently implemented)

Any chance we can re-use cmwq (the concurrency-managed workqueue) instead of
inventing another mechanism? There looks to be a lot of function duplication
here. Bandan has an RFC to do this.
>
> I'd like to ask for advice here about the more preferable way to go:
> Although having the sysfs mechanism provides more flexibility, it may be a 
> good idea to start with a simple static parameter, and have the first 
> patches as simple as possible. What do you think?
>
> 3. Add virtqueue polling mode to vhost 
> Have the vhost thread poll the virtqueues with high I/O rate for new 
> buffers, and avoid asking the guest to kick us.
> https://github.com/abelg/virtual_io_acceleration/commit/26616133fafb7855cc80fac070b0572fd1aaf5d0

Maybe we can make poll_stop_idle adaptive, which may help the light-load
case. Considering that the guest is often slower than vhost, if we just
have one or two VMs, polling too much may waste CPU in this case.
> 4. vhost statistics
> This patch introduces a set of statistics to monitor different performance 
> metrics of vhost and our polling and I/O scheduling mechanisms. The 
> statistics are exposed using debugfs and can be easily displayed with a 
> Python script (vhost_stat, based on the old kvm_stats)
> https://github.com/abelg/virtual_io_acceleration/commit/ac14206ea56939ecc3608dc5f978b86fa322e7b0

How about using tracepoints instead? Besides statistics, they can also
help more with debugging.
>
> 5. Add heuristics to improve I/O scheduling 
> This patch enhances the round-robin mechanism with a set of heuristics to 
> decide when to leave a virtqueue and proceed to the next.
> https://github.com/abelg/virtual_io_acceleration/commit/f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d
>
> This patch improves the handling of the requests by the vhost thread, but 
> could perhaps be delayed to a later time, and not submitted as one of the 
> first Elvis patches.
> I'd love to hear some comments about whether this patch needs to be part 
> of the first submission.
>
> Any other feedback on this plan will be appreciated,
> Thank you,
> Razya
>


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Elvis upstreaming plan
  2013-11-27  2:49 ` Jason Wang
@ 2013-11-27  7:35   ` Gleb Natapov
  2013-11-27  7:45     ` Joel Nider
  2013-11-27  9:18     ` Abel Gordon
  2013-11-27 10:18   ` Abel Gordon
  1 sibling, 2 replies; 39+ messages in thread
From: Gleb Natapov @ 2013-11-27  7:35 UTC (permalink / raw)
  To: Jason Wang
  Cc: Razya Ladelsky, kvm, anthony, Michael S. Tsirkin, pbonzini, asias,
	digitaleric, abel.gordon, Abel Gordon, Eran Raichstein,
	Joel Nider, bsd

On Wed, Nov 27, 2013 at 10:49:20AM +0800, Jason Wang wrote:
> > 4. vhost statistics
> > This patch introduces a set of statistics to monitor different performance 
> > metrics of vhost and our polling and I/O scheduling mechanisms. The 
> > statistics are exposed using debugfs and can be easily displayed with a 
> > Python script (vhost_stat, based on the old kvm_stats)
> > https://github.com/abelg/virtual_io_acceleration/commit/ac14206ea56939ecc3608dc5f978b86fa322e7b0
> 
> How about using trace points instead? Besides statistics, it can also
> help more in debugging.
Definitely. kvm_stat moved to ftrace a long time ago.

--
			Gleb.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Elvis upstreaming plan
  2013-11-26 21:11     ` Michael S. Tsirkin
@ 2013-11-27  7:43       ` Joel Nider
  2013-11-27 10:27         ` Michael S. Tsirkin
  2013-11-27 15:00         ` Stefan Hajnoczi
  2013-11-27  9:03       ` Abel Gordon
  1 sibling, 2 replies; 39+ messages in thread
From: Joel Nider @ 2013-11-27  7:43 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Abel Gordon, abel.gordon, Anthony Liguori, asias, digitaleric,
	Eran Raichstein, gleb, jasowang, kvm, pbonzini, Razya Ladelsky


Hi,

Razya is out for a few days, so I will try to answer the questions as well
as I can:

"Michael S. Tsirkin" <mst@redhat.com> wrote on 26/11/2013 11:11:57 PM:

> From: "Michael S. Tsirkin" <mst@redhat.com>
> To: Abel Gordon/Haifa/IBM@IBMIL,
> Cc: Anthony Liguori <anthony@codemonkey.ws>, abel.gordon@gmail.com,
> asias@redhat.com, digitaleric@google.com, Eran Raichstein/Haifa/
> IBM@IBMIL, gleb@redhat.com, jasowang@redhat.com, Joel Nider/Haifa/
> IBM@IBMIL, kvm@vger.kernel.org, pbonzini@redhat.com, Razya Ladelsky/
> Haifa/IBM@IBMIL
> Date: 27/11/2013 01:08 AM
> Subject: Re: Elvis upstreaming plan
>
> On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:
> >
> >
> > Anthony Liguori <anthony@codemonkey.ws> wrote on 26/11/2013 08:05:00
PM:
> >
> > >
> > > Razya Ladelsky <RAZYA@il.ibm.com> writes:
> > >
<edit>
> >
> > That's why we are proposing to implement a mechanism that will enable
> > the management stack to configure 1 thread per I/O device (as it is
today)
> > or 1 thread for many I/O devices (belonging to the same VM).
> >
> > > Once you are scheduling multiple guests in a single vhost device, you
> > > now create a whole new class of DoS attacks in the best case
scenario.
> >
> > Again, we are NOT proposing to schedule multiple guests in a single
> > vhost thread. We are proposing to schedule multiple devices belonging
> > to the same guest in a single (or multiple) vhost thread/s.
> >
>
> I guess a question then becomes why have multiple devices?

If you mean "why serve multiple devices from a single thread", the answer is
that the Linux scheduler has no knowledge of I/O queues and so cannot do a
decent job of scheduling I/O.  The idea is to take over the I/O scheduling
responsibilities from the kernel's thread scheduler with a more efficient
I/O scheduler inside each vhost thread.  By combining all of the I/O devices
of the same guest (disks, network cards, etc.) in a single I/O thread, we
can provide better scheduling because we have more knowledge of the nature
of the work.  Instead of relying on the Linux scheduler to context-switch
between multiple vhost threads, we have a single thread context in which we
can do the I/O scheduling more efficiently.  We can closely monitor the
performance needs of each queue of each device inside the vhost thread,
which gives us much more information than the kernel's thread scheduler has.
This does not expose any additional opportunities for attacks (DoS or
other) than are already available, since all of the I/O traffic belongs to a
single guest.
You can argue that with low I/O loads this mechanism may not make much
difference.  However, when you try to maximize the utilization of your
hardware (such as in a commercial scenario), this technique can gain you a
large benefit.

Regards,

Joel Nider
Virtualization Research
IBM Research and Development
Haifa Research Lab
                                                                                        
                                                                                        
                                                                                        
 Phone: 972-4-829-6326 | Mobile: 972-54-3155635
 E-mail: JOELN@il.ibm.com



> > > > Hi all,
> > > >
> > > > I am Razya Ladelsky, I work at IBM Haifa virtualization team, which
> > > > developed Elvis, presented by Abel Gordon at the last KVM forum:
> > > > ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs
> > > > ELVIS slides:
> > https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE
> > > >
> > > >
> > > > According to the discussions that took place at the forum,
upstreaming
> > > > some of the Elvis approaches seems to be a good idea, which we
would
> > like
> > > > to pursue.
> > > >
> > > > Our plan for the first patches is the following:
> > > >
> > > > 1.Shared vhost thread between mutiple devices
> > > > This patch creates a worker thread and worker queue shared across
> > multiple
> > > > virtio devices
> > > > We would like to modify the patch posted in
> > > > https://github.com/abelg/virtual_io_acceleration/commit/
> > > 3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766
> > > > to limit a vhost thread to serve multiple devices only if they
belong
> > to
> > > > the same VM as Paolo suggested to avoid isolation or cgroups
concerns.
> > > >
> > > > Another modification is related to the creation and removal of
vhost
> > > > threads, which will be discussed next.
> > >
> > > I think this is an exceptionally bad idea.
> > >
> > > We shouldn't throw away isolation without exhausting every other
> > > possibility.
> >
> > Seems you have missed the important details here.
> > Anthony, we are aware you are concerned about isolation
> > and you believe we should not share a single vhost thread across
> > multiple VMs.  That's why Razya proposed to change the patch
> > so we will serve multiple virtio devices using a single vhost thread
> > "only if the devices belong to the same VM". This series of patches
> > will not allow two different VMs to share the same vhost thread.
> > So, I don't see why this will be throwing away isolation and why
> > this could be a "exceptionally bad idea".
> >
> > By the way, I remember that during the KVM forum a similar
> > approach of having a single data plane thread for many devices
> > was discussed....
> > > We've seen very positive results from adding threads.  We should also
> > > look at scheduling.
> >
> > ...and we have also seen exceptionally negative results from
> > adding threads, both for vhost and data-plane. If you have lot of idle
> > time/cores
> > then it makes sense to run multiple threads. But IMHO in many scenarios
you
> > don't have lot of idle time/cores.. and if you have them you would
probably
> > prefer to run more VMs/VCPUs....hosting a single SMP VM when you have
> > enough physical cores to run all the VCPU threads and the I/O threads
is
> > not a
> > realistic scenario.

>
> > >
> > > > 2. Sysfs mechanism to add and remove vhost threads
> > > > This patch allows us to add and remove vhost threads dynamically.
> > > >
> > > > A simpler way to control the creation of vhost threads is
statically
> > > > determining the maximum number of virtio devices per worker via a
> > kernel
> > > > module parameter (which is the way the previously mentioned patch
is
> > > > currently implemented)
> > > >
> > > > I'd like to ask for advice here about the more preferable way to
go:
> > > > Although having the sysfs mechanism provides more flexibility, it
may
> > be a
> > > > good idea to start with a simple static parameter, and have the
first
> > > > patches as simple as possible. What do you think?
> > > >
> > > > 3.Add virtqueue polling mode to vhost
> > > > Have the vhost thread poll the virtqueues with high I/O rate for
new
> > > > buffers , and avoid asking the guest to kick us.
> > > > https://github.com/abelg/virtual_io_acceleration/commit/
> > > 26616133fafb7855cc80fac070b0572fd1aaf5d0
> > >
> > > Ack on this.
> >
> > :)
> >
> > Regards,
> > Abel.
> >
> > >
> > > Regards,
> > >
> > > Anthony Liguori
> > >
> > > > 4. vhost statistics
> > > > This patch introduces a set of statistics to monitor different
> > performance
> > > > metrics of vhost and our polling and I/O scheduling mechanisms. The
> > > > statistics are exposed using debugfs and can be easily displayed
with a
> >
> > > > Python script (vhost_stat, based on the old kvm_stats)
> > > > https://github.com/abelg/virtual_io_acceleration/commit/
> > > ac14206ea56939ecc3608dc5f978b86fa322e7b0
> > > >
> > > >
> > > > 5. Add heuristics to improve I/O scheduling
> > > > This patch enhances the round-robin mechanism with a set of
heuristics
> > to
> > > > decide when to leave a virtqueue and proceed to the next.
> > > > https://github.com/abelg/virtual_io_acceleration/commit/
> > > f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d
> > > >
> > > > This patch improves the handling of the requests by the vhost
thread,
> > but
> > > > could perhaps be delayed to a
> > > > later time , and not submitted as one of the first Elvis patches.
> > > > I'd love to hear some comments about whether this patch needs to be
> > part
> > > > of the first submission.
> > > >
> > > > Any other feedback on this plan will be appreciated,
> > > > Thank you,
> > > > Razya
> > >
>


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Elvis upstreaming plan
  2013-11-27  7:35   ` Gleb Natapov
@ 2013-11-27  7:45     ` Joel Nider
  2013-11-27  9:18     ` Abel Gordon
  1 sibling, 0 replies; 39+ messages in thread
From: Joel Nider @ 2013-11-27  7:45 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Abel Gordon, abel.gordon, anthony, asias, bsd, digitaleric,
	Eran Raichstein, Jason Wang, kvm, Michael S. Tsirkin, pbonzini,
	Razya Ladelsky




Gleb Natapov <gleb@redhat.com> wrote on 27/11/2013 09:35:01 AM:

> From: Gleb Natapov <gleb@redhat.com>
> To: Jason Wang <jasowang@redhat.com>,
> Cc: Razya Ladelsky/Haifa/IBM@IBMIL, kvm@vger.kernel.org,
> anthony@codemonkey.ws, "Michael S. Tsirkin" <mst@redhat.com>,
> pbonzini@redhat.com, asias@redhat.com, digitaleric@google.com,
> abel.gordon@gmail.com, Abel Gordon/Haifa/IBM@IBMIL, Eran Raichstein/
> Haifa/IBM@IBMIL, Joel Nider/Haifa/IBM@IBMIL, bsd@redhat.com
> Date: 27/11/2013 11:35 AM
> Subject: Re: Elvis upstreaming plan
>
> On Wed, Nov 27, 2013 at 10:49:20AM +0800, Jason Wang wrote:
> > > 4. vhost statistics
> > > This patch introduces a set of statistics to monitor different
> performance
> > > metrics of vhost and our polling and I/O scheduling mechanisms. The
> > > statistics are exposed using debugfs and can be easily displayed with
a
> > > Python script (vhost_stat, based on the old kvm_stats)
> > > https://github.com/abelg/virtual_io_acceleration/commit/
> ac14206ea56939ecc3608dc5f978b86fa322e7b0
> >
> > How about using trace points instead? Besides statistics, it can also
> > help more in debugging.
> Definitely. kvm_stats has moved to ftrace long time ago.
>
> --
>          Gleb.
>

Ok - we will look at this newer mechanism.

Joel Nider
Virtualization Research
IBM Research and Development
Haifa Research Lab
                                                                                        
                                                                                        
                                                                                        
 Phone: 972-4-829-6326 | Mobile: 972-54-3155635
 E-mail: JOELN@il.ibm.com


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Elvis upstreaming plan
  2013-11-26 21:11     ` Michael S. Tsirkin
  2013-11-27  7:43       ` Joel Nider
@ 2013-11-27  9:03       ` Abel Gordon
  2013-11-27  9:21         ` Michael S. Tsirkin
  1 sibling, 1 reply; 39+ messages in thread
From: Abel Gordon @ 2013-11-27  9:03 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: abel.gordon, Anthony Liguori, asias, digitaleric, Eran Raichstein,
	gleb, jasowang, Joel Nider, kvm, pbonzini, Razya Ladelsky



"Michael S. Tsirkin" <mst@redhat.com> wrote on 26/11/2013 11:11:57 PM:

> On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:
> >
> >
> > Anthony Liguori <anthony@codemonkey.ws> wrote on 26/11/2013 08:05:00
PM:
> >
> > >
> > > Razya Ladelsky <RAZYA@il.ibm.com> writes:
> > >
> > > > Hi all,
> > > >
> > > > I am Razya Ladelsky, I work at IBM Haifa virtualization team, which
> > > > developed Elvis, presented by Abel Gordon at the last KVM forum:
> > > > ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs
> > > > ELVIS slides:
> > https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE
> > > >
> > > >
> > > > According to the discussions that took place at the forum,
upstreaming
> > > > some of the Elvis approaches seems to be a good idea, which we
would
> > like
> > > > to pursue.
> > > >
> > > > Our plan for the first patches is the following:
> > > >
> > > > 1.Shared vhost thread between mutiple devices
> > > > This patch creates a worker thread and worker queue shared across
> > multiple
> > > > virtio devices
> > > > We would like to modify the patch posted in
> > > > https://github.com/abelg/virtual_io_acceleration/commit/
> > > 3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766
> > > > to limit a vhost thread to serve multiple devices only if they
belong
> > to
> > > > the same VM as Paolo suggested to avoid isolation or cgroups
concerns.
> > > >
> > > > Another modification is related to the creation and removal of
vhost
> > > > threads, which will be discussed next.
> > >
> > > I think this is an exceptionally bad idea.
> > >
> > > We shouldn't throw away isolation without exhausting every other
> > > possibility.
> >
> > Seems you have missed the important details here.
> > Anthony, we are aware you are concerned about isolation
> > and you believe we should not share a single vhost thread across
> > multiple VMs.  That's why Razya proposed to change the patch
> > so we will serve multiple virtio devices using a single vhost thread
> > "only if the devices belong to the same VM". This series of patches
> > will not allow two different VMs to share the same vhost thread.
> > So, I don't see why this will be throwing away isolation and why
> > this could be a "exceptionally bad idea".
> >
> > By the way, I remember that during the KVM forum a similar
> > approach of having a single data plane thread for many devices
> > was discussed....
> > > We've seen very positive results from adding threads.  We should also
> > > look at scheduling.
> >
> > ...and we have also seen exceptionally negative results from
> > adding threads, both for vhost and data-plane. If you have lot of idle
> > time/cores
> > then it makes sense to run multiple threads. But IMHO in many scenarios
you
> > don't have lot of idle time/cores.. and if you have them you would
probably
> > prefer to run more VMs/VCPUs....hosting a single SMP VM when you have
> > enough physical cores to run all the VCPU threads and the I/O threads
is
> > not a
> > realistic scenario.
> >
> > That's why we are proposing to implement a mechanism that will enable
> > the management stack to configure 1 thread per I/O device (as it is
today)
> > or 1 thread for many I/O devices (belonging to the same VM).
> >
> > > Once you are scheduling multiple guests in a single vhost device, you
> > > now create a whole new class of DoS attacks in the best case
scenario.
> >
> > Again, we are NOT proposing to schedule multiple guests in a single
> > vhost thread. We are proposing to schedule multiple devices belonging
> > to the same guest in a single (or multiple) vhost thread/s.
> >
>
> I guess a question then becomes why have multiple devices?

I assume that there are guests that have multiple vhost devices
(net or scsi/tcm). We can also extend the approach to consider
multiqueue devices, so we can create 1 vhost thread shared by all the
queues, 1 vhost thread for each queue, or a few threads for multiple
queues. We could also share a thread across multiple queues even if they
do not belong to the same device.

Remember the experiments Shirley Ma did with the split
tx/rx? If we have a control interface we could support both
approaches: different threads or a single thread.

>
>
> > >
> > > > 2. Sysfs mechanism to add and remove vhost threads
> > > > This patch allows us to add and remove vhost threads dynamically.
> > > >
> > > > A simpler way to control the creation of vhost threads is
statically
> > > > determining the maximum number of virtio devices per worker via a
> > kernel
> > > > module parameter (which is the way the previously mentioned patch
is
> > > > currently implemented)
> > > >
> > > > I'd like to ask for advice here about the more preferable way to
go:
> > > > Although having the sysfs mechanism provides more flexibility, it
may
> > be a
> > > > good idea to start with a simple static parameter, and have the
first
> > > > patches as simple as possible. What do you think?
> > > >
> > > > 3.Add virtqueue polling mode to vhost
> > > > Have the vhost thread poll the virtqueues with high I/O rate for
new
> > > > buffers , and avoid asking the guest to kick us.
> > > > https://github.com/abelg/virtual_io_acceleration/commit/
> > > 26616133fafb7855cc80fac070b0572fd1aaf5d0
> > >
> > > Ack on this.
> >
> > :)
> >
> > Regards,
> > Abel.
> >
> > >
> > > Regards,
> > >
> > > Anthony Liguori
> > >
> > > > 4. vhost statistics
> > > > This patch introduces a set of statistics to monitor different
> > performance
> > > > metrics of vhost and our polling and I/O scheduling mechanisms. The
> > > > statistics are exposed using debugfs and can be easily displayed
with a
> >
> > > > Python script (vhost_stat, based on the old kvm_stats)
> > > > https://github.com/abelg/virtual_io_acceleration/commit/
> > > ac14206ea56939ecc3608dc5f978b86fa322e7b0
> > > >
> > > >
> > > > 5. Add heuristics to improve I/O scheduling
> > > > This patch enhances the round-robin mechanism with a set of
heuristics
> > to
> > > > decide when to leave a virtqueue and proceed to the next.
> > > > https://github.com/abelg/virtual_io_acceleration/commit/
> > > f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d
> > > >
> > > > This patch improves the handling of the requests by the vhost
thread,
> > but
> > > > could perhaps be delayed to a
> > > > later time , and not submitted as one of the first Elvis patches.
> > > > I'd love to hear some comments about whether this patch needs to be
> > part
> > > > of the first submission.
> > > >
> > > > Any other feedback on this plan will be appreciated,
> > > > Thank you,
> > > > Razya
> > >
>


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Elvis upstreaming plan
  2013-11-27  7:35   ` Gleb Natapov
  2013-11-27  7:45     ` Joel Nider
@ 2013-11-27  9:18     ` Abel Gordon
  2013-11-27  9:21       ` Gleb Natapov
  1 sibling, 1 reply; 39+ messages in thread
From: Abel Gordon @ 2013-11-27  9:18 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: abel.gordon, anthony, asias, bsd, digitaleric, Eran Raichstein,
	Jason Wang, Joel Nider, kvm, Michael S. Tsirkin, pbonzini,
	Razya Ladelsky



Gleb Natapov <gleb@redhat.com> wrote on 27/11/2013 09:35:01 AM:

> On Wed, Nov 27, 2013 at 10:49:20AM +0800, Jason Wang wrote:
> > > 4. vhost statistics
> > > This patch introduces a set of statistics to monitor different
> performance
> > > metrics of vhost and our polling and I/O scheduling mechanisms. The
> > > statistics are exposed using debugfs and can be easily displayed with
a
> > > Python script (vhost_stat, based on the old kvm_stats)
> > > https://github.com/abelg/virtual_io_acceleration/commit/
> ac14206ea56939ecc3608dc5f978b86fa322e7b0
> >
> > How about using trace points instead? Besides statistics, it can also
> > help more in debugging.
> Definitely. kvm_stats has moved to ftrace long time ago.
>

We should use tracepoints for debugging information, but IMHO we should
have a dedicated (and different) mechanism to expose data that can be
easily consumed by a user-space (policy) application to control how many
vhost threads we need, or any other vhost feature we may introduce
(e.g. polling). That's why we proposed something like vhost_stat,
based on sysfs.

This is not like kvm_stat, which could be replaced with tracepoints. Here
we would like to expose data to "control" the system. So I would say we
are trying to do something that resembles the ksm interface
implemented under /sys/kernel/mm/ksm/


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Elvis upstreaming plan
  2013-11-27  9:03       ` Abel Gordon
@ 2013-11-27  9:21         ` Michael S. Tsirkin
  2013-11-27  9:49           ` Abel Gordon
  0 siblings, 1 reply; 39+ messages in thread
From: Michael S. Tsirkin @ 2013-11-27  9:21 UTC (permalink / raw)
  To: Abel Gordon
  Cc: abel.gordon, Anthony Liguori, asias, digitaleric, Eran Raichstein,
	gleb, jasowang, Joel Nider, kvm, pbonzini, Razya Ladelsky

On Wed, Nov 27, 2013 at 11:03:57AM +0200, Abel Gordon wrote:
> 
> 
> "Michael S. Tsirkin" <mst@redhat.com> wrote on 26/11/2013 11:11:57 PM:
> 
> > On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:
> > >
> > >
> > > Anthony Liguori <anthony@codemonkey.ws> wrote on 26/11/2013 08:05:00
> PM:
> > >
> > > >
> > > > Razya Ladelsky <RAZYA@il.ibm.com> writes:
> > > >
> > > > > Hi all,
> > > > >
> > > > > I am Razya Ladelsky, I work at IBM Haifa virtualization team, which
> > > > > developed Elvis, presented by Abel Gordon at the last KVM forum:
> > > > > ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs
> > > > > ELVIS slides:
> > > https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE
> > > > >
> > > > >
> > > > > According to the discussions that took place at the forum,
> upstreaming
> > > > > some of the Elvis approaches seems to be a good idea, which we
> would
> > > like
> > > > > to pursue.
> > > > >
> > > > > Our plan for the first patches is the following:
> > > > >
> > > > > 1.Shared vhost thread between mutiple devices
> > > > > This patch creates a worker thread and worker queue shared across
> > > multiple
> > > > > virtio devices
> > > > > We would like to modify the patch posted in
> > > > > https://github.com/abelg/virtual_io_acceleration/commit/
> > > > 3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766
> > > > > to limit a vhost thread to serve multiple devices only if they
> belong
> > > to
> > > > > the same VM as Paolo suggested to avoid isolation or cgroups
> concerns.
> > > > >
> > > > > Another modification is related to the creation and removal of
> vhost
> > > > > threads, which will be discussed next.
> > > >
> > > > I think this is an exceptionally bad idea.
> > > >
> > > > We shouldn't throw away isolation without exhausting every other
> > > > possibility.
> > >
> > > Seems you have missed the important details here.
> > > Anthony, we are aware you are concerned about isolation
> > > and you believe we should not share a single vhost thread across
> > > multiple VMs.  That's why Razya proposed to change the patch
> > > so we will serve multiple virtio devices using a single vhost thread
> > > "only if the devices belong to the same VM". This series of patches
> > > will not allow two different VMs to share the same vhost thread.
> > > So, I don't see why this will be throwing away isolation and why
> > > this could be a "exceptionally bad idea".
> > >
> > > By the way, I remember that during the KVM forum a similar
> > > approach of having a single data plane thread for many devices
> > > was discussed....
> > > > We've seen very positive results from adding threads.  We should also
> > > > look at scheduling.
> > >
> > > ...and we have also seen exceptionally negative results from
> > > adding threads, both for vhost and data-plane. If you have lot of idle
> > > time/cores
> > > then it makes sense to run multiple threads. But IMHO in many scenarios
> you
> > > don't have lot of idle time/cores.. and if you have them you would
> probably
> > > prefer to run more VMs/VCPUs....hosting a single SMP VM when you have
> > > enough physical cores to run all the VCPU threads and the I/O threads
> is
> > > not a
> > > realistic scenario.
> > >
> > > That's why we are proposing to implement a mechanism that will enable
> > > the management stack to configure 1 thread per I/O device (as it is
> today)
> > > or 1 thread for many I/O devices (belonging to the same VM).
> > >
> > > > Once you are scheduling multiple guests in a single vhost device, you
> > > > now create a whole new class of DoS attacks in the best case
> scenario.
> > >
> > > Again, we are NOT proposing to schedule multiple guests in a single
> > > vhost thread. We are proposing to schedule multiple devices belonging
> > > to the same guest in a single (or multiple) vhost thread/s.
> > >
> >
> > I guess a question then becomes why have multiple devices?
> 
> I assume that there are guests that have multiple vhost devices
> (net or scsi/tcm).

These are kind of uncommon though.  In fact a kernel thread is not a
unit of isolation - cgroups supply isolation.
If we had use_cgroups kind of like use_mm, we could conceivably
do work for multiple VMs on the same thread.


> We can also extend the approach to consider
> multiqueue devices, so we can create 1 vhost thread shared for all the
> queues,
> 1 vhost thread for each queue or a few threads for multiple queues. We
> could also share a thread across multiple queues even if they do not belong
> to the same device.
> 
> Remember the experiments Shirley Ma did with the split
> tx/rx ? If we have a control interface we could support both
> approaches: different threads or a single thread.


I'm a bit concerned about an interface for managing specific
threads being so low level.
What exactly is it that management knows that makes it
efficient to group threads together?
That the host is over-committed, so we should use less CPU?
I'd like the interface to express that knowledge.
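To make the question concrete, the knowledge could be expressed declaratively rather than per-thread. A toy sketch (the function name, threshold, and cap are invented for illustration; this is not an existing vhost interface):

```python
def workers_for_vm(num_devices, host_idle_fraction, max_workers=4):
    """Hypothetical policy: choose how many vhost workers a VM gets.

    On an over-committed host (little idle CPU) share one worker
    between all of the VM's devices; on an idle host give each device
    its own worker, capped at max_workers."""
    if host_idle_fraction < 0.1:   # over-committed: minimize I/O threads
        return 1
    return min(num_devices, max_workers)
```

Management would then state "the host is over-committed" once, and the policy, not the admin, would decide which threads to merge.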


> >
> >
> > > >
> > > > > 2. Sysfs mechanism to add and remove vhost threads
> > > > > This patch allows us to add and remove vhost threads dynamically.
> > > > >
> > > > > A simpler way to control the creation of vhost threads is
> statically
> > > > > determining the maximum number of virtio devices per worker via a
> > > kernel
> > > > > module parameter (which is the way the previously mentioned patch
> is
> > > > > currently implemented)
> > > > >
> > > > > I'd like to ask for advice here about the more preferable way to
> go:
> > > > > Although having the sysfs mechanism provides more flexibility, it
> may
> > > be a
> > > > > good idea to start with a simple static parameter, and have the
> first
> > > > > patches as simple as possible. What do you think?
> > > > >
> > > > > 3.Add virtqueue polling mode to vhost
> > > > > Have the vhost thread poll the virtqueues with high I/O rate for
> new
> > > > > buffers , and avoid asking the guest to kick us.
> > > > > https://github.com/abelg/virtual_io_acceleration/commit/
> > > > 26616133fafb7855cc80fac070b0572fd1aaf5d0
> > > >
> > > > Ack on this.
> > >
> > > :)
> > >
> > > Regards,
> > > Abel.
> > >
> > > >
> > > > Regards,
> > > >
> > > > Anthony Liguori
> > > >
> > > > > 4. vhost statistics
> > > > > This patch introduces a set of statistics to monitor different
> > > performance
> > > > > metrics of vhost and our polling and I/O scheduling mechanisms. The
> > > > > statistics are exposed using debugfs and can be easily displayed
> with a
> > >
> > > > > Python script (vhost_stat, based on the old kvm_stats)
> > > > > https://github.com/abelg/virtual_io_acceleration/commit/
> > > > ac14206ea56939ecc3608dc5f978b86fa322e7b0
> > > > >
> > > > >
> > > > > 5. Add heuristics to improve I/O scheduling
> > > > > This patch enhances the round-robin mechanism with a set of
> heuristics
> > > to
> > > > > decide when to leave a virtqueue and proceed to the next.
> > > > > https://github.com/abelg/virtual_io_acceleration/commit/
> > > > f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d
> > > > >
> > > > > This patch improves the handling of the requests by the vhost
> thread,
> > > but
> > > > > could perhaps be delayed to a
> > > > > later time , and not submitted as one of the first Elvis patches.
> > > > > I'd love to hear some comments about whether this patch needs to be
> > > part
> > > > > of the first submission.
> > > > >
> > > > > Any other feedback on this plan will be appreciated,
> > > > > Thank you,
> > > > > Razya
> > > >
> >

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Elvis upstreaming plan
  2013-11-27  9:18     ` Abel Gordon
@ 2013-11-27  9:21       ` Gleb Natapov
  2013-11-27  9:33         ` Abel Gordon
  0 siblings, 1 reply; 39+ messages in thread
From: Gleb Natapov @ 2013-11-27  9:21 UTC (permalink / raw)
  To: Abel Gordon
  Cc: abel.gordon, anthony, asias, bsd, digitaleric, Eran Raichstein,
	Jason Wang, Joel Nider, kvm, Michael S. Tsirkin, pbonzini,
	Razya Ladelsky

On Wed, Nov 27, 2013 at 11:18:26AM +0200, Abel Gordon wrote:
> 
> 
> Gleb Natapov <gleb@redhat.com> wrote on 27/11/2013 09:35:01 AM:
> 
> > On Wed, Nov 27, 2013 at 10:49:20AM +0800, Jason Wang wrote:
> > > > 4. vhost statistics
> > > > This patch introduces a set of statistics to monitor different
> > performance
> > > > metrics of vhost and our polling and I/O scheduling mechanisms. The
> > > > statistics are exposed using debugfs and can be easily displayed with
> a
> > > > Python script (vhost_stat, based on the old kvm_stats)
> > > > https://github.com/abelg/virtual_io_acceleration/commit/
> > ac14206ea56939ecc3608dc5f978b86fa322e7b0
> > >
> > > How about using trace points instead? Besides statistics, it can also
> > > help more in debugging.
> > Definitely. kvm_stats has moved to ftrace long time ago.
> >
> 
> We should use trace points for debugging information  but IMHO we should
> have a dedicated (and different) mechanism to expose data that can be
> easily consumed by a user-space (policy) application to control how many
> vhost threads we need or any other vhost feature we may introduce
> (e.g. polling). That's why we proposed something like vhost_stat
> based on sysfs.
> 
> This is not like kvm_stat that can be replaced with tracepoints. Here
> we would like to expose data to "control" the system. So I would
> say that what we are trying to do resembles the ksm interface
> implemented under /sys/kernel/mm/ksm/
There are control operations and there are performance/statistics
gathering operations: use /sys for the former and ftrace for the
latter. The fact that you need a /sys interface for other things does
not mean you can abuse it for statistics too.

--
			Gleb.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Elvis upstreaming plan
  2013-11-27  9:21       ` Gleb Natapov
@ 2013-11-27  9:33         ` Abel Gordon
  2013-11-27  9:48           ` Gleb Natapov
  0 siblings, 1 reply; 39+ messages in thread
From: Abel Gordon @ 2013-11-27  9:33 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: abel.gordon, anthony, asias, bsd, digitaleric, Eran Raichstein,
	Jason Wang, Joel Nider, kvm, Michael S. Tsirkin, pbonzini,
	Razya Ladelsky



Gleb Natapov <gleb@redhat.com> wrote on 27/11/2013 11:21:59 AM:


> On Wed, Nov 27, 2013 at 11:18:26AM +0200, Abel Gordon wrote:
> >
> >
> > Gleb Natapov <gleb@redhat.com> wrote on 27/11/2013 09:35:01 AM:
> >
> > > On Wed, Nov 27, 2013 at 10:49:20AM +0800, Jason Wang wrote:
> > > > > 4. vhost statistics
> > > > > This patch introduces a set of statistics to monitor different
> > > performance
> > > > > metrics of vhost and our polling and I/O scheduling mechanisms.
The
> > > > > statistics are exposed using debugfs and can be easily displayed
with
> > a
> > > > > Python script (vhost_stat, based on the old kvm_stats)
> > > > > https://github.com/abelg/virtual_io_acceleration/commit/
> > > ac14206ea56939ecc3608dc5f978b86fa322e7b0
> > > >
> > > > How about using trace points instead? Besides statistics, it can
also
> > > > help more in debugging.
> > > Definitely. kvm_stats has moved to ftrace long time ago.
> > >
> >
> > We should use trace points for debugging information  but IMHO we
should
> > have a dedicated (and different) mechanism to expose data that can be
> > easily consumed by a user-space (policy) application to control how
many
> > vhost threads we need or any other vhost feature we may introduce
> > (e.g. polling). That's why we proposed something like vhost_stat
> > based on sysfs.
> >
> > This is not like kvm_stat that can be replaced with tracepoints. Here
> > we would like to expose data to "control" the system. So I would
> > say that what we are trying to do resembles the ksm interface
> > implemented under /sys/kernel/mm/ksm/
> There are control operations and there are performance/statistics
> gathering operations: use /sys for the former and ftrace for the
> latter. The fact that you need a /sys interface for other things does
> not mean you can abuse it for statistics too.

Agree. Any statistics that we add for debugging purposes should be
implemented using tracepoints. But control and related data interfaces
(those that are not for debugging purposes) should be in sysfs.
Look for example at
 /sys/kernel/mm/ksm/full_scans
 /sys/kernel/mm/ksm/pages_shared
 /sys/kernel/mm/ksm/pages_sharing
 /sys/kernel/mm/ksm/pages_to_scan
 /sys/kernel/mm/ksm/pages_unshared
 /sys/kernel/mm/ksm/pages_volatile



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Elvis upstreaming plan
  2013-11-27  9:33         ` Abel Gordon
@ 2013-11-27  9:48           ` Gleb Natapov
  0 siblings, 0 replies; 39+ messages in thread
From: Gleb Natapov @ 2013-11-27  9:48 UTC (permalink / raw)
  To: Abel Gordon
  Cc: abel.gordon, anthony, asias, bsd, digitaleric, Eran Raichstein,
	Jason Wang, Joel Nider, kvm, Michael S. Tsirkin, pbonzini,
	Razya Ladelsky

On Wed, Nov 27, 2013 at 11:33:19AM +0200, Abel Gordon wrote:
> 
> 
> Gleb Natapov <gleb@redhat.com> wrote on 27/11/2013 11:21:59 AM:
> 
> 
> > On Wed, Nov 27, 2013 at 11:18:26AM +0200, Abel Gordon wrote:
> > >
> > >
> > > Gleb Natapov <gleb@redhat.com> wrote on 27/11/2013 09:35:01 AM:
> > >
> > > > On Wed, Nov 27, 2013 at 10:49:20AM +0800, Jason Wang wrote:
> > > > > > 4. vhost statistics
> > > > > > This patch introduces a set of statistics to monitor different
> > > > performance
> > > > > > metrics of vhost and our polling and I/O scheduling mechanisms.
> The
> > > > > > statistics are exposed using debugfs and can be easily displayed
> with
> > > a
> > > > > > Python script (vhost_stat, based on the old kvm_stats)
> > > > > > https://github.com/abelg/virtual_io_acceleration/commit/
> > > > ac14206ea56939ecc3608dc5f978b86fa322e7b0
> > > > >
> > > > > How about using trace points instead? Besides statistics, it can
> also
> > > > > help more in debugging.
> > > > Definitely. kvm_stats has moved to ftrace long time ago.
> > > >
> > >
> > > We should use trace points for debugging information  but IMHO we
> should
> > > have a dedicated (and different) mechanism to expose data that can be
> > > easily consumed by a user-space (policy) application to control how
> many
> > > vhost threads we need or any other vhost feature we may introduce
> > > (e.g. polling). That's why we proposed something like vhost_stat
> > > based on sysfs.
> > >
> > > This is not like kvm_stat that can be replaced with tracepoints. Here
> > > we would like to expose data to "control" the system. So I would
> > > say that what we are trying to do resembles the ksm interface
> > > implemented under /sys/kernel/mm/ksm/
> > There are control operations and there are performance/statistics
> > gathering operations: use /sys for the former and ftrace for the
> > latter. The fact that you need a /sys interface for other things does
> > not mean you can abuse it for statistics too.
> 
> Agree. Any statistics that we add for debugging purposes should be
> implemented
> using tracepoints. But control and related data interfaces (that are not
> for
> debugging purposes) should be in sysfs. Look for example at
Yes, things that are not statistics only, but part of a control interface
that management will use, should not use ftrace (I do not think adding
more knobs is a good idea, but this is for the vhost maintainer to
decide). Note, though, that ksm predates ftrace, so some of the things
below could have been implemented as ftrace points.

>  /sys/kernel/mm/ksm/full_scans
>  /sys/kernel/mm/ksm/pages_shared
>  /sys/kernel/mm/ksm/pages_sharing
>  /sys/kernel/mm/ksm/pages_to_scan
>  /sys/kernel/mm/ksm/pages_unshared
>  /sys/kernel/mm/ksm/pages_volatile
> 

--
			Gleb.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Elvis upstreaming plan
  2013-11-27  9:21         ` Michael S. Tsirkin
@ 2013-11-27  9:49           ` Abel Gordon
  2013-11-27 10:29             ` Michael S. Tsirkin
  0 siblings, 1 reply; 39+ messages in thread
From: Abel Gordon @ 2013-11-27  9:49 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: abel.gordon, Anthony Liguori, asias, digitaleric, Eran Raichstein,
	gleb, jasowang, Joel Nider, kvm, pbonzini, Razya Ladelsky



"Michael S. Tsirkin" <mst@redhat.com> wrote on 27/11/2013 11:21:00 AM:

>
> On Wed, Nov 27, 2013 at 11:03:57AM +0200, Abel Gordon wrote:
> >
> >
> > "Michael S. Tsirkin" <mst@redhat.com> wrote on 26/11/2013 11:11:57 PM:
> >
> > > On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:
> > > >
> > > >
> > > > Anthony Liguori <anthony@codemonkey.ws> wrote on 26/11/2013
08:05:00
> > PM:
> > > >
> > > > >
> > > > > Razya Ladelsky <RAZYA@il.ibm.com> writes:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > I am Razya Ladelsky, I work at IBM Haifa virtualization team,
which
> > > > > > developed Elvis, presented by Abel Gordon at the last KVM
forum:
> > > > > > ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs
> > > > > > ELVIS slides:
> > > > https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE
> > > > > >
> > > > > >
> > > > > > According to the discussions that took place at the forum,
> > upstreaming
> > > > > > some of the Elvis approaches seems to be a good idea, which we
> > would
> > > > like
> > > > > > to pursue.
> > > > > >
> > > > > > Our plan for the first patches is the following:
> > > > > >
> > > > > > 1.Shared vhost thread between multiple devices
> > > > > > This patch creates a worker thread and worker queue shared
across
> > > > multiple
> > > > > > virtio devices
> > > > > > We would like to modify the patch posted in
> > > > > > https://github.com/abelg/virtual_io_acceleration/commit/
> > > > > 3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766
> > > > > > to limit a vhost thread to serve multiple devices only if they
> > belong
> > > > to
> > > > > > the same VM as Paolo suggested to avoid isolation or cgroups
> > concerns.
> > > > > >
> > > > > > Another modification is related to the creation and removal of
> > vhost
> > > > > > threads, which will be discussed next.
> > > > >
> > > > > I think this is an exceptionally bad idea.
> > > > >
> > > > > We shouldn't throw away isolation without exhausting every other
> > > > > possibility.
> > > >
> > > > Seems you have missed the important details here.
> > > > Anthony, we are aware you are concerned about isolation
> > > > and you believe we should not share a single vhost thread across
> > > > multiple VMs.  That's why Razya proposed to change the patch
> > > > so we will serve multiple virtio devices using a single vhost
thread
> > > > "only if the devices belong to the same VM". This series of patches
> > > > will not allow two different VMs to share the same vhost thread.
> > > > So, I don't see why this will be throwing away isolation and why
> > > > this could be a "exceptionally bad idea".
> > > >
> > > > By the way, I remember that during the KVM forum a similar
> > > > approach of having a single data plane thread for many devices
> > > > was discussed....
> > > > > We've seen very positive results from adding threads.  We should
also
> > > > > look at scheduling.
> > > >
> > > > ...and we have also seen exceptionally negative results from
> > > > adding threads, both for vhost and data-plane. If you have lot of
idle
> > > > time/cores
> > > > then it makes sense to run multiple threads. But IMHO in many
scenarios
> > you
> > > > don't have lot of idle time/cores.. and if you have them you would
> > probably
> > > > prefer to run more VMs/VCPUs....hosting a single SMP VM when you
have
> > > > enough physical cores to run all the VCPU threads and the I/O
threads
> > is
> > > > not a
> > > > realistic scenario.
> > > >
> > > > That's why we are proposing to implement a mechanism that will
enable
> > > > the management stack to configure 1 thread per I/O device (as it is
> > today)
> > > > or 1 thread for many I/O devices (belonging to the same VM).
> > > >
> > > > > Once you are scheduling multiple guests in a single vhost device,
you
> > > > > now create a whole new class of DoS attacks in the best case
> > scenario.
> > > >
> > > > Again, we are NOT proposing to schedule multiple guests in a single
> > > > vhost thread. We are proposing to schedule multiple devices
belonging
> > > > to the same guest in a single (or multiple) vhost thread/s.
> > > >
> > >
> > > I guess a question then becomes why have multiple devices?
> >
> > I assume that there are guests that have multiple vhost devices
> > (net or scsi/tcm).
>
> These are kind of uncommon though.  In fact a kernel thread is not a
> unit of isolation - cgroups supply isolation.
> If we had use_cgroups kind of like use_mm, we could conceivably
> do work for multiple VMs on the same thread.
>
>
> > We can also extend the approach to consider
> > multiqueue devices, so we can create 1 vhost thread shared for all the
> > queues,
> > 1 vhost thread for each queue or a few threads for multiple queues. We
> > could also share a thread across multiple queues even if they do not
belong
> > to the same device.
> >
> > Remember the experiments Shirley Ma did with the split
> > tx/rx ? If we have a control interface we could support both
> > approaches: different threads or a single thread.
>
>
> I'm a bit concerned about an interface for managing specific
> threads being so low level.
> What exactly is it that management knows that makes it
> efficient to group threads together?
> That the host is over-committed, so we should use less CPU?
> I'd like the interface to express that knowledge.
>

We can expose information such as the amount of I/O being
handled for each queue, the number of CPU cycles consumed for
processing the I/O, latency, and more.
If we start with a simple mechanism that just enables the
feature, we can later expose more information to implement a policy
framework that will be responsible for taking the decisions
(the orchestration part).
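As a rough sketch of that policy side (the function, field names, and units are invented for illustration; the real exported interface is still to be defined), the framework could size the thread pool from the exposed per-queue cycle counts:

```python
def threads_needed(queue_cycles, cycles_per_thread):
    """Hypothetical policy: size the vhost thread pool from per-queue load.

    queue_cycles maps a queue id to the CPU cycles per second consumed
    handling that queue's I/O (as exposed by the statistics interface);
    cycles_per_thread is the cycle budget of one worker thread."""
    total = sum(queue_cycles.values())
    # Round up: even a lightly loaded VM needs at least one worker.
    return max(1, -(-total // cycles_per_thread))
```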


> > >
> > >
> > > > >
> > > > > > 2. Sysfs mechanism to add and remove vhost threads
> > > > > > This patch allows us to add and remove vhost threads
dynamically.
> > > > > >
> > > > > > A simpler way to control the creation of vhost threads is
> > statically
> > > > > > determining the maximum number of virtio devices per worker via
a
> > > > kernel
> > > > > > module parameter (which is the way the previously mentioned
patch
> > is
> > > > > > currently implemented)
> > > > > >
> > > > > > I'd like to ask for advice here about the more preferable way
to
> > go:
> > > > > > Although having the sysfs mechanism provides more flexibility,
it
> > may
> > > > be a
> > > > > > good idea to start with a simple static parameter, and have the
> > first
> > > > > > patches as simple as possible. What do you think?
> > > > > >
> > > > > > 3.Add virtqueue polling mode to vhost
> > > > > > Have the vhost thread poll the virtqueues with high I/O rate
for
> > new
> > > > > > buffers , and avoid asking the guest to kick us.
> > > > > > https://github.com/abelg/virtual_io_acceleration/commit/
> > > > > 26616133fafb7855cc80fac070b0572fd1aaf5d0
> > > > >
> > > > > Ack on this.
> > > >
> > > > :)
> > > >
> > > > Regards,
> > > > Abel.
> > > >
> > > > >
> > > > > Regards,
> > > > >
> > > > > Anthony Liguori
> > > > >
> > > > > > 4. vhost statistics
> > > > > > This patch introduces a set of statistics to monitor different
> > > > performance
> > > > > > metrics of vhost and our polling and I/O scheduling mechanisms.
The
> > > > > > statistics are exposed using debugfs and can be easily
displayed
> > with a
> > > >
> > > > > > Python script (vhost_stat, based on the old kvm_stats)
> > > > > > https://github.com/abelg/virtual_io_acceleration/commit/
> > > > > ac14206ea56939ecc3608dc5f978b86fa322e7b0
> > > > > >
> > > > > >
> > > > > > 5. Add heuristics to improve I/O scheduling
> > > > > > This patch enhances the round-robin mechanism with a set of
> > heuristics
> > > > to
> > > > > > decide when to leave a virtqueue and proceed to the next.
> > > > > > https://github.com/abelg/virtual_io_acceleration/commit/
> > > > > f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d
> > > > > >
> > > > > > This patch improves the handling of the requests by the vhost
> > thread,
> > > > but
> > > > > > could perhaps be delayed to a
> > > > > > later time , and not submitted as one of the first Elvis
patches.
> > > > > > I'd love to hear some comments about whether this patch needs
to be
> > > > part
> > > > > > of the first submission.
> > > > > >
> > > > > > Any other feedback on this plan will be appreciated,
> > > > > > Thank you,
> > > > > > Razya
> > > > >
> > >
>


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Elvis upstreaming plan
  2013-11-27  2:49 ` Jason Wang
  2013-11-27  7:35   ` Gleb Natapov
@ 2013-11-27 10:18   ` Abel Gordon
  2013-11-27 10:37     ` Michael S. Tsirkin
  1 sibling, 1 reply; 39+ messages in thread
From: Abel Gordon @ 2013-11-27 10:18 UTC (permalink / raw)
  To: Jason Wang
  Cc: abel.gordon, anthony, asias, bsd, digitaleric, Eran Raichstein,
	gleb, Joel Nider, kvm, Michael S. Tsirkin, pbonzini,
	Razya Ladelsky



Jason Wang <jasowang@redhat.com> wrote on 27/11/2013 04:49:20 AM:

>
> On 11/24/2013 05:22 PM, Razya Ladelsky wrote:
> > Hi all,
> >
> > I am Razya Ladelsky, I work at IBM Haifa virtualization team, which
> > developed Elvis, presented by Abel Gordon at the last KVM forum:
> > ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs
> > ELVIS slides:
https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE
> >
> >
> > According to the discussions that took place at the forum, upstreaming
> > some of the Elvis approaches seems to be a good idea, which we would
like
> > to pursue.
> >
> > Our plan for the first patches is the following:
> >
> > 1.Shared vhost thread between multiple devices
> > This patch creates a worker thread and worker queue shared across
multiple
> > virtio devices
> > We would like to modify the patch posted in
> > https://github.com/abelg/virtual_io_acceleration/commit/
> 3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766
> > to limit a vhost thread to serve multiple devices only if they belong
to
> > the same VM as Paolo suggested to avoid isolation or cgroups concerns.
> >
> > Another modification is related to the creation and removal of vhost
> > threads, which will be discussed next.
> >
> > 2. Sysfs mechanism to add and remove vhost threads
> > This patch allows us to add and remove vhost threads dynamically.
> >
> > A simpler way to control the creation of vhost threads is statically
> > determining the maximum number of virtio devices per worker via a
kernel
> > module parameter (which is the way the previously mentioned patch is
> > currently implemented)
>
> Any chance we can re-use cmwq (the concurrency-managed workqueue
> infrastructure) instead of inventing another mechanism? Looks like
> there's a lot of function duplication here. Bandan has an RFC to do this.

Thanks for the suggestion. We should certainly take a look at Bandan's
patches which I guess are:

http://www.mail-archive.com/kvm@vger.kernel.org/msg96603.html

My only concern here is that we may not be able to easily implement
our polling mechanism and heuristics with cmwq.

> >
> > I'd like to ask for advice here about the more preferable way to go:
> > Although having the sysfs mechanism provides more flexibility, it may
be a
> > good idea to start with a simple static parameter, and have the first
> > patches as simple as possible. What do you think?
> >
> > 3.Add virtqueue polling mode to vhost
> > Have the vhost thread poll the virtqueues with high I/O rate for new
> > buffers , and avoid asking the guest to kick us.
> > https://github.com/abelg/virtual_io_acceleration/commit/
> 26616133fafb7855cc80fac070b0572fd1aaf5d0
>
> Maybe we can make poll_stop_idle adaptive which may help the light load
> case. Consider guest is often slow than vhost, if we just have one or
> two vms, polling too much may waste cpu in this case.

Yes, making polling adaptive based on the number of wasted cycles (cycles
we spent polling but didn't find new work) and the I/O rate is a very good
idea. Note we already measure and expose these values, but we do not use
them to adapt the polling mechanism.

Having said that, note that adaptive polling may be a bit tricky.
Remember that the cycles we spend polling in the vhost thread actually
improve the performance of the vcpu threads, because the guest is no
longer required to kick (pio == exit) the host while vhost is polling.
So even if we waste cycles in the vhost thread, we are saving cycles in
the vcpu thread and improving performance.
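One possible shape for such an adaptation, shown only as an illustrative sketch (the names, thresholds, and bounds are invented; this is not the actual poll_stop_idle code):

```python
def adapt_poll_cycles(poll_cycles, wasted_fraction,
                      min_cycles=1_000, max_cycles=1_000_000):
    """Shrink the polling window when most polling cycles find no work,
    grow it when polling is paying off (saving guest kicks/exits).

    wasted_fraction: share of recent polling cycles that found no new
    buffers, as could be derived from the vhost statistics."""
    if wasted_fraction > 0.9:        # mostly idle: poll less
        poll_cycles //= 2
    elif wasted_fraction < 0.5:      # polling is productive: poll more
        poll_cycles *= 2
    # Clamp so polling never disappears entirely or monopolizes a core.
    return max(min_cycles, min(poll_cycles, max_cycles))
```

The tricky part noted above remains: "wasted" vhost cycles may still be a net win system-wide, so the thresholds would need tuning against vcpu-side exit counts, not vhost-side waste alone.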

> > 4. vhost statistics
> > This patch introduces a set of statistics to monitor different
performance
> > metrics of vhost and our polling and I/O scheduling mechanisms. The
> > statistics are exposed using debugfs and can be easily displayed with a

> > Python script (vhost_stat, based on the old kvm_stats)
> > https://github.com/abelg/virtual_io_acceleration/commit/
> ac14206ea56939ecc3608dc5f978b86fa322e7b0
>
> How about using trace points instead? Besides statistics, it can also
> help more in debugging.

Yep, we just had a discussion with Gleb about this :)

> >
> > 5. Add heuristics to improve I/O scheduling
> > This patch enhances the round-robin mechanism with a set of heuristics
to
> > decide when to leave a virtqueue and proceed to the next.
> > https://github.com/abelg/virtual_io_acceleration/commit/
> f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d
> >
> > This patch improves the handling of the requests by the vhost thread,
but
> > could perhaps be delayed to a
> > later time , and not submitted as one of the first Elvis patches.
> > I'd love to hear some comments about whether this patch needs to be
part
> > of the first submission.
> >
> > Any other feedback on this plan will be appreciated,
> > Thank you,
> > Razya
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe kvm" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
>


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Elvis upstreaming plan
  2013-11-27  7:43       ` Joel Nider
@ 2013-11-27 10:27         ` Michael S. Tsirkin
  2013-11-27 10:41           ` Abel Gordon
  2013-11-27 15:00         ` Stefan Hajnoczi
  1 sibling, 1 reply; 39+ messages in thread
From: Michael S. Tsirkin @ 2013-11-27 10:27 UTC (permalink / raw)
  To: Joel Nider
  Cc: Abel Gordon, abel.gordon, Anthony Liguori, asias, digitaleric,
	Eran Raichstein, gleb, jasowang, kvm, pbonzini, Razya Ladelsky

On Wed, Nov 27, 2013 at 09:43:33AM +0200, Joel Nider wrote:
> Hi,
> 
> Razya is out for a few days, so I will try to answer the questions as well
> as I can:
> 
> "Michael S. Tsirkin" <mst@redhat.com> wrote on 26/11/2013 11:11:57 PM:
> 
> > From: "Michael S. Tsirkin" <mst@redhat.com>
> > To: Abel Gordon/Haifa/IBM@IBMIL,
> > Cc: Anthony Liguori <anthony@codemonkey.ws>, abel.gordon@gmail.com,
> > asias@redhat.com, digitaleric@google.com, Eran Raichstein/Haifa/
> > IBM@IBMIL, gleb@redhat.com, jasowang@redhat.com, Joel Nider/Haifa/
> > IBM@IBMIL, kvm@vger.kernel.org, pbonzini@redhat.com, Razya Ladelsky/
> > Haifa/IBM@IBMIL
> > Date: 27/11/2013 01:08 AM
> > Subject: Re: Elvis upstreaming plan
> >
> > On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:
> > >
> > >
> > > Anthony Liguori <anthony@codemonkey.ws> wrote on 26/11/2013 08:05:00
> PM:
> > >
> > > >
> > > > Razya Ladelsky <RAZYA@il.ibm.com> writes:
> > > >
> <edit>
> > >
> > > That's why we are proposing to implement a mechanism that will enable
> > > the management stack to configure 1 thread per I/O device (as it is
> today)
> > > or 1 thread for many I/O devices (belonging to the same VM).
> > >
> > > > Once you are scheduling multiple guests in a single vhost device, you
> > > > now create a whole new class of DoS attacks in the best case
> scenario.
> > >
> > > Again, we are NOT proposing to schedule multiple guests in a single
> > > vhost thread. We are proposing to schedule multiple devices belonging
> > > to the same guest in a single (or multiple) vhost thread/s.
> > >
> >
> > I guess a question then becomes why have multiple devices?
> 
> If you mean "why serve multiple devices from a single thread" the answer is
> that we cannot rely on the Linux scheduler which has no knowledge of I/O
> queues to do a decent job of scheduling I/O.  The idea is to take over the
> I/O scheduling responsibilities from the kernel's thread scheduler with a
> more efficient I/O scheduler inside each vhost thread.  So by combining all
> of the I/O devices from the same guest (disks, network cards, etc) in a
> single I/O thread, it allows us to provide better scheduling by giving us
> more knowledge of the nature of the work.  So now instead of relying on the
> linux scheduler to perform context switches between multiple vhost threads,
> we have a single thread context in which we can do the I/O scheduling more
> efficiently.  We can closely monitor the performance needs of each queue of
> each device inside the vhost thread which gives us much more information
> than relying on the kernel's thread scheduler.
> This does not expose any additional opportunities for attacks (DoS or
> other) than are already available since all of the I/O traffic belongs to a
> single guest.
> You can make the argument that with low I/O loads this mechanism may not
> make much difference.  However when you try to maximize the utilization of
> your hardware (such as in a commercial scenario) this technique can gain
> you a large benefit.
> 
> Regards,
> 
> Joel Nider
> Virtualization Research
> IBM Research and Development
> Haifa Research Lab

So all this would sound more convincing if we had sharing between VMs.
When it's only a single VM it's somehow less convincing, isn't it?
Of course, if we bypass the scheduler like this it becomes harder to
enforce cgroup limits.
But it might be easier to give the scheduler the info it needs to do
what we need.  Would an API that basically says "run this kthread right
now" do the trick?
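Joel's in-thread I/O scheduling argument can be modeled in a few lines; this is a toy illustration of the round-robin-with-budget idea (not vhost code), where the per-queue budget stands in for the "when to leave a virtqueue" heuristic:

```python
from collections import deque

def serve_round_robin(queues, budget_per_queue):
    """Toy model of one vhost thread serving many virtqueues.

    Round-robin over the queues, handling at most budget_per_queue
    requests from each before moving on; returns the order in which
    requests were served."""
    order = []
    ring = deque(q for q in queues if queues[q])
    while ring:
        q = ring.popleft()
        for _ in range(min(budget_per_queue, len(queues[q]))):
            order.append(queues[q].pop(0))
        if queues[q]:                 # still has work: requeue it
            ring.append(q)
    return order
```

Because the single thread sees every queue's backlog, a heuristic here can favor hot queues directly, instead of hoping the kernel's thread scheduler context-switches between per-device threads at the right moments.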


>  Phone: 972-4-829-6326 | Mobile: 972-54-3155635
>  E-mail: JOELN@il.ibm.com
> 
> > > > > Hi all,
> > > > >
> > > > > I am Razya Ladelsky, I work at IBM Haifa virtualization team, which
> > > > > developed Elvis, presented by Abel Gordon at the last KVM forum:
> > > > > ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs
> > > > > ELVIS slides:
> > > https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE
> > > > >
> > > > >
> > > > > According to the discussions that took place at the forum,
> upstreaming
> > > > > some of the Elvis approaches seems to be a good idea, which we
> would
> > > like
> > > > > to pursue.
> > > > >
> > > > > Our plan for the first patches is the following:
> > > > >
> > > > > 1.Shared vhost thread between multiple devices
> > > > > This patch creates a worker thread and worker queue shared across
> > > multiple
> > > > > virtio devices
> > > > > We would like to modify the patch posted in
> > > > > https://github.com/abelg/virtual_io_acceleration/commit/
> > > > 3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766
> > > > > to limit a vhost thread to serve multiple devices only if they
> belong
> > > to
> > > > > the same VM as Paolo suggested to avoid isolation or cgroups
> concerns.
> > > > >
> > > > > Another modification is related to the creation and removal of
> vhost
> > > > > threads, which will be discussed next.
> > > >
> > > > I think this is an exceptionally bad idea.
> > > >
> > > > We shouldn't throw away isolation without exhausting every other
> > > > possibility.
> > >
> > > Seems you have missed the important details here.
> > > Anthony, we are aware you are concerned about isolation
> > > and you believe we should not share a single vhost thread across
> > > multiple VMs.  That's why Razya proposed to change the patch
> > > so we will serve multiple virtio devices using a single vhost thread
> > > "only if the devices belong to the same VM". This series of patches
> > > will not allow two different VMs to share the same vhost thread.
> > > So, I don't see why this will be throwing away isolation and why
> > > this could be a "exceptionally bad idea".
> > >
> > > By the way, I remember that during the KVM forum a similar
> > > approach of having a single data plane thread for many devices
> > > was discussed....
> > > > We've seen very positive results from adding threads.  We should also
> > > > look at scheduling.
> > >
> > > ...and we have also seen exceptionally negative results from
> > > adding threads, both for vhost and data-plane. If you have lot of idle
> > > time/cores
> > > then it makes sense to run multiple threads. But IMHO in many scenarios
> you
> > > don't have lot of idle time/cores.. and if you have them you would
> probably
> > > prefer to run more VMs/VCPUs....hosting a single SMP VM when you have
> > > enough physical cores to run all the VCPU threads and the I/O threads
> is
> > > not a
> > > realistic scenario.
> 
> >
> > > >
> > > > > 2. Sysfs mechanism to add and remove vhost threads
> > > > > This patch allows us to add and remove vhost threads dynamically.
> > > > >
> > > > > A simpler way to control the creation of vhost threads is
> statically
> > > > > determining the maximum number of virtio devices per worker via a
> > > kernel
> > > > > module parameter (which is the way the previously mentioned patch
> is
> > > > > currently implemented)
> > > > >
> > > > > I'd like to ask for advice here about the more preferable way to
> go:
> > > > > Although having the sysfs mechanism provides more flexibility, it
> may
> > > be a
> > > > > good idea to start with a simple static parameter, and have the
> first
> > > > > patches as simple as possible. What do you think?
> > > > >
> > > > > 3.Add virtqueue polling mode to vhost
> > > > > Have the vhost thread poll the virtqueues with high I/O rate for
> new
> > > > > buffers , and avoid asking the guest to kick us.
> > > > > https://github.com/abelg/virtual_io_acceleration/commit/
> > > > 26616133fafb7855cc80fac070b0572fd1aaf5d0
> > > >
> > > > Ack on this.
> > >
> > > :)
> > >
> > > Regards,
> > > Abel.
> > >
> > > >
> > > > Regards,
> > > >
> > > > Anthony Liguori
> > > >
> > > > > 4. vhost statistics
> > > > > This patch introduces a set of statistics to monitor different
> > > performance
> > > > > metrics of vhost and our polling and I/O scheduling mechanisms. The
> > > > > statistics are exposed using debugfs and can be easily displayed
> with a
> > >
> > > > > Python script (vhost_stat, based on the old kvm_stats)
> > > > > https://github.com/abelg/virtual_io_acceleration/commit/
> > > > ac14206ea56939ecc3608dc5f978b86fa322e7b0
> > > > >
> > > > >
> > > > > 5. Add heuristics to improve I/O scheduling
> > > > > This patch enhances the round-robin mechanism with a set of
> heuristics
> > > to
> > > > > decide when to leave a virtqueue and proceed to the next.
> > > > > https://github.com/abelg/virtual_io_acceleration/commit/
> > > > f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d
> > > > >
> > > > > This patch improves the handling of the requests by the vhost
> thread,
> > > but
> > > > > could perhaps be delayed to a
> > > > > later time , and not submitted as one of the first Elvis patches.
> > > > > I'd love to hear some comments about whether this patch needs to be
> > > part
> > > > > of the first submission.
> > > > >
> > > > > Any other feedback on this plan will be appreciated,
> > > > > Thank you,
> > > > > Razya
> > > >
> >



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Elvis upstreaming plan
  2013-11-27  9:49           ` Abel Gordon
@ 2013-11-27 10:29             ` Michael S. Tsirkin
  2013-11-27 10:55               ` Abel Gordon
  0 siblings, 1 reply; 39+ messages in thread
From: Michael S. Tsirkin @ 2013-11-27 10:29 UTC (permalink / raw)
  To: Abel Gordon
  Cc: abel.gordon, Anthony Liguori, asias, digitaleric, Eran Raichstein,
	gleb, jasowang, Joel Nider, kvm, pbonzini, Razya Ladelsky

On Wed, Nov 27, 2013 at 11:49:03AM +0200, Abel Gordon wrote:
> 
> 
> "Michael S. Tsirkin" <mst@redhat.com> wrote on 27/11/2013 11:21:00 AM:
> 
> >
> > On Wed, Nov 27, 2013 at 11:03:57AM +0200, Abel Gordon wrote:
> > >
> > >
> > > "Michael S. Tsirkin" <mst@redhat.com> wrote on 26/11/2013 11:11:57 PM:
> > >
> > > > On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:
> > > > >
> > > > >
> > > > > Anthony Liguori <anthony@codemonkey.ws> wrote on 26/11/2013
> 08:05:00
> > > PM:
> > > > >
> > > > > >
> > > > > > Razya Ladelsky <RAZYA@il.ibm.com> writes:
> > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > I am Razya Ladelsky, I work at IBM Haifa virtualization team,
> which
> > > > > > > developed Elvis, presented by Abel Gordon at the last KVM
> forum:
> > > > > > > ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs
> > > > > > > ELVIS slides:
> > > > > https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE
> > > > > > >
> > > > > > >
> > > > > > > According to the discussions that took place at the forum,
> > > upstreaming
> > > > > > > some of the Elvis approaches seems to be a good idea, which we
> > > would
> > > > > like
> > > > > > > to pursue.
> > > > > > >
> > > > > > > Our plan for the first patches is the following:
> > > > > > >
> > > > > > > 1.Shared vhost thread between mutiple devices
> > > > > > > This patch creates a worker thread and worker queue shared
> across
> > > > > multiple
> > > > > > > virtio devices
> > > > > > > We would like to modify the patch posted in
> > > > > > > https://github.com/abelg/virtual_io_acceleration/commit/
> > > > > > 3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766
> > > > > > > to limit a vhost thread to serve multiple devices only if they
> > > belong
> > > > > to
> > > > > > > the same VM as Paolo suggested to avoid isolation or cgroups
> > > concerns.
> > > > > > >
> > > > > > > Another modification is related to the creation and removal of
> > > vhost
> > > > > > > threads, which will be discussed next.
> > > > > >
> > > > > > I think this is an exceptionally bad idea.
> > > > > >
> > > > > > We shouldn't throw away isolation without exhausting every other
> > > > > > possibility.
> > > > >
> > > > > Seems you have missed the important details here.
> > > > > Anthony, we are aware you are concerned about isolation
> > > > > and you believe we should not share a single vhost thread across
> > > > > multiple VMs.  That's why Razya proposed to change the patch
> > > > > so we will serve multiple virtio devices using a single vhost
> thread
> > > > > "only if the devices belong to the same VM". This series of patches
> > > > > will not allow two different VMs to share the same vhost thread.
> > > > > So, I don't see why this will be throwing away isolation and why
> > > > > this could be a "exceptionally bad idea".
> > > > >
> > > > > By the way, I remember that during the KVM forum a similar
> > > > > approach of having a single data plane thread for many devices
> > > > > was discussed....
> > > > > > We've seen very positive results from adding threads.  We should
> also
> > > > > > look at scheduling.
> > > > >
> > > > > ...and we have also seen exceptionally negative results from
> > > > > adding threads, both for vhost and data-plane. If you have lot of
> idle
> > > > > time/cores
> > > > > then it makes sense to run multiple threads. But IMHO in many
> scenarios
> > > you
> > > > > don't have lot of idle time/cores.. and if you have them you would
> > > probably
> > > > > prefer to run more VMs/VCPUs....hosting a single SMP VM when you
> have
> > > > > enough physical cores to run all the VCPU threads and the I/O
> threads
> > > is
> > > > > not a
> > > > > realistic scenario.
> > > > >
> > > > > That's why we are proposing to implement a mechanism that will
> enable
> > > > > the management stack to configure 1 thread per I/O device (as it is
> > > today)
> > > > > or 1 thread for many I/O devices (belonging to the same VM).
> > > > >
> > > > > > Once you are scheduling multiple guests in a single vhost device,
> you
> > > > > > now create a whole new class of DoS attacks in the best case
> > > scenario.
> > > > >
> > > > > Again, we are NOT proposing to schedule multiple guests in a single
> > > > > vhost thread. We are proposing to schedule multiple devices
> belonging
> > > > > to the same guest in a single (or multiple) vhost thread/s.
> > > > >
> > > >
> > > > I guess a question then becomes why have multiple devices?
> > >
> > > I assume that there are guests that have multiple vhost devices
> > > (net or scsi/tcm).
> >
> > These are kind of uncommon though.  In fact a kernel thread is not a
> > unit of isolation - cgroups supply isolation.
> > If we had use_cgroups kind of like use_mm, we could thinkably
> > do work for multiple VMs on the same thread.
> >
> >
> > > We can also extend the approach to consider
> > > multiqueue devices, so we can create 1 vhost thread shared for all the
> > > queues,
> > > 1 vhost thread for each queue or a few threads for multiple queues. We
> > > could also share a thread across multiple queues even if they do not
> belong
> > > to the same device.
> > >
> > > Remember the experiments Shirley Ma did with the split
> > > tx/rx ? If we have a control interface we could support both
> > > approaches: different threads or a single thread.
> >
> >
> > I'm a bit concerned about interface managing specific
> > threads being so low level.
> > What exactly is it that management knows that makes it
> > efficient to group threads together?
> > That host is over-committed so we should use less CPU?
> > I'd like the interface to express that knowledge.
> >
> 
> We can expose information such as the amount of I/O being
> handled for each queue, the amount of CPU cycles consumed for
> processing the I/O, latency and more.
> If we start with a simple mechanism that just enables the
> feature we can later expose more information to implement a policy
> framework that will be responsible for taking the decisions
> (the orchestration part).

What kinds of policies do you envision?
If we just react to load by balancing the work done,
and, when over-committed anyway, localize work so
we get fewer IPIs, then this is not policy, this is the mechanism.


> 
> > > >
> > > >
> > > > > >
> > > > > > > 2. Sysfs mechanism to add and remove vhost threads
> > > > > > > This patch allows us to add and remove vhost threads
> dynamically.
> > > > > > >
> > > > > > > A simpler way to control the creation of vhost threads is
> > > statically
> > > > > > > determining the maximum number of virtio devices per worker via
> a
> > > > > kernel
> > > > > > > module parameter (which is the way the previously mentioned
> patch
> > > is
> > > > > > > currently implemented)
> > > > > > >
> > > > > > > I'd like to ask for advice here about the more preferable way
> to
> > > go:
> > > > > > > Although having the sysfs mechanism provides more flexibility,
> it
> > > may
> > > > > be a
> > > > > > > good idea to start with a simple static parameter, and have the
> > > first
> > > > > > > patches as simple as possible. What do you think?
> > > > > > >
> > > > > > > 3.Add virtqueue polling mode to vhost
> > > > > > > Have the vhost thread poll the virtqueues with high I/O rate
> for
> > > new
> > > > > > > buffers , and avoid asking the guest to kick us.
> > > > > > > https://github.com/abelg/virtual_io_acceleration/commit/
> > > > > > 26616133fafb7855cc80fac070b0572fd1aaf5d0
> > > > > >
> > > > > > Ack on this.
> > > > >
> > > > > :)
> > > > >
> > > > > Regards,
> > > > > Abel.
> > > > >
> > > > > >
> > > > > > Regards,
> > > > > >
> > > > > > Anthony Liguori
> > > > > >
> > > > > > > 4. vhost statistics
> > > > > > > This patch introduces a set of statistics to monitor different
> > > > > performance
> > > > > > > metrics of vhost and our polling and I/O scheduling mechanisms.
> The
> > > > > > > statistics are exposed using debugfs and can be easily
> displayed
> > > with a
> > > > >
> > > > > > > Python script (vhost_stat, based on the old kvm_stats)
> > > > > > > https://github.com/abelg/virtual_io_acceleration/commit/
> > > > > > ac14206ea56939ecc3608dc5f978b86fa322e7b0
> > > > > > >
> > > > > > >
> > > > > > > 5. Add heuristics to improve I/O scheduling
> > > > > > > This patch enhances the round-robin mechanism with a set of
> > > heuristics
> > > > > to
> > > > > > > decide when to leave a virtqueue and proceed to the next.
> > > > > > > https://github.com/abelg/virtual_io_acceleration/commit/
> > > > > > f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d
> > > > > > >
> > > > > > > This patch improves the handling of the requests by the vhost
> > > thread,
> > > > > but
> > > > > > > could perhaps be delayed to a
> > > > > > > later time , and not submitted as one of the first Elvis
> patches.
> > > > > > > I'd love to hear some comments about whether this patch needs
> to be
> > > > > part
> > > > > > > of the first submission.
> > > > > > >
> > > > > > > Any other feedback on this plan will be appreciated,
> > > > > > > Thank you,
> > > > > > > Razya
> > > > > >
> > > >
> >

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Elvis upstreaming plan
  2013-11-27 10:18   ` Abel Gordon
@ 2013-11-27 10:37     ` Michael S. Tsirkin
  0 siblings, 0 replies; 39+ messages in thread
From: Michael S. Tsirkin @ 2013-11-27 10:37 UTC (permalink / raw)
  To: Abel Gordon
  Cc: Jason Wang, abel.gordon, anthony, asias, bsd, digitaleric,
	Eran Raichstein, gleb, Joel Nider, kvm, pbonzini, Razya Ladelsky

On Wed, Nov 27, 2013 at 12:18:51PM +0200, Abel Gordon wrote:
> 
> 
> Jason Wang <jasowang@redhat.com> wrote on 27/11/2013 04:49:20 AM:
> 
> >
> > On 11/24/2013 05:22 PM, Razya Ladelsky wrote:
> > > Hi all,
> > >
> > > I am Razya Ladelsky, I work at IBM Haifa virtualization team, which
> > > developed Elvis, presented by Abel Gordon at the last KVM forum:
> > > ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs
> > > ELVIS slides:
> https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE
> > >
> > >
> > > According to the discussions that took place at the forum, upstreaming
> > > some of the Elvis approaches seems to be a good idea, which we would
> like
> > > to pursue.
> > >
> > > Our plan for the first patches is the following:
> > >
> > > 1.Shared vhost thread between mutiple devices
> > > This patch creates a worker thread and worker queue shared across
> multiple
> > > virtio devices
> > > We would like to modify the patch posted in
> > > https://github.com/abelg/virtual_io_acceleration/commit/
> > 3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766
> > > to limit a vhost thread to serve multiple devices only if they belong
> to
> > > the same VM as Paolo suggested to avoid isolation or cgroups concerns.
> > >
> > > Another modification is related to the creation and removal of vhost
> > > threads, which will be discussed next.
> > >
> > > 2. Sysfs mechanism to add and remove vhost threads
> > > This patch allows us to add and remove vhost threads dynamically.
> > >
> > > A simpler way to control the creation of vhost threads is statically
> > > determining the maximum number of virtio devices per worker via a
> kernel
> > > module parameter (which is the way the previously mentioned patch is
> > > currently implemented)
> >
> > Any chance we can re-use the cwmq instead of inventing another
> > mechanism? Looks like there're lots of function duplication here. Bandan
> > has an RFC to do this.
> 
> Thanks for the suggestion. We should certainly take a look at Bandan's
> patches which I guess are:
> 
> http://www.mail-archive.com/kvm@vger.kernel.org/msg96603.html
> 
> My only concern here is that we may not be able to easily implement
> our polling mechanism and heuristics with cwmq.

It's not so hard: to poll, you just requeue the work to make sure it's
re-invoked.

> > >
> > > I'd like to ask for advice here about the more preferable way to go:
> > > Although having the sysfs mechanism provides more flexibility, it may
> be a
> > > good idea to start with a simple static parameter, and have the first
> > > patches as simple as possible. What do you think?
> > >
> > > 3.Add virtqueue polling mode to vhost
> > > Have the vhost thread poll the virtqueues with high I/O rate for new
> > > buffers , and avoid asking the guest to kick us.
> > > https://github.com/abelg/virtual_io_acceleration/commit/
> > 26616133fafb7855cc80fac070b0572fd1aaf5d0
> >
> > Maybe we can make poll_stop_idle adaptive which may help the light load
> > case. Consider guest is often slow than vhost, if we just have one or
> > two vms, polling too much may waste cpu in this case.
> 
> Yes, make polling adaptive based on the amount of wasted cycles (cycles
> we did polling but didn't find new work) and I/O rate is a very good idea.
> Note we already measure and expose these values but we do not use them
> to adapt the polling mechanism.
> 
> Having said that, note that adaptive polling may be a bit tricky.
> Remember that the cycles we waste polling in the vhost thread actually
> improves the performance of the vcpu threads because the guest is no longer
> 
> require to kick (pio==exit) the host when vhost does polling. So even if
> we waste cycles in the vhost thread, we are saving cycles in the
> vcpu thread and improving performance.


So my suggestion would be:

- the guest performs some kicks
- it measures how long each kick took, e.g. kick = T cycles
- it sends this info to the host

The host then polls for at most fraction * T cycles.


> > > 4. vhost statistics
> > > This patch introduces a set of statistics to monitor different
> performance
> > > metrics of vhost and our polling and I/O scheduling mechanisms. The
> > > statistics are exposed using debugfs and can be easily displayed with a
> 
> > > Python script (vhost_stat, based on the old kvm_stats)
> > > https://github.com/abelg/virtual_io_acceleration/commit/
> > ac14206ea56939ecc3608dc5f978b86fa322e7b0
> >
> > How about using trace points instead? Besides statistics, it can also
> > help more in debugging.
> 
> Yep, we just had a discussion with Gleb about this :)
> 
> > >
> > > 5. Add heuristics to improve I/O scheduling
> > > This patch enhances the round-robin mechanism with a set of heuristics
> to
> > > decide when to leave a virtqueue and proceed to the next.
> > > https://github.com/abelg/virtual_io_acceleration/commit/
> > f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d
> > >
> > > This patch improves the handling of the requests by the vhost thread,
> but
> > > could perhaps be delayed to a
> > > later time , and not submitted as one of the first Elvis patches.
> > > I'd love to hear some comments about whether this patch needs to be
> part
> > > of the first submission.
> > >
> > > Any other feedback on this plan will be appreciated,
> > > Thank you,
> > > Razya
> > >
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe kvm" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Elvis upstreaming plan
  2013-11-27 10:27         ` Michael S. Tsirkin
@ 2013-11-27 10:41           ` Abel Gordon
  2013-11-27 10:59             ` Michael S. Tsirkin
  2013-11-27 22:33             ` Anthony Liguori
  0 siblings, 2 replies; 39+ messages in thread
From: Abel Gordon @ 2013-11-27 10:41 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: abel.gordon, Anthony Liguori, asias, digitaleric, Eran Raichstein,
	gleb, jasowang, Joel Nider, kvm, pbonzini, Razya Ladelsky



"Michael S. Tsirkin" <mst@redhat.com> wrote on 27/11/2013 12:27:19 PM:

>
> On Wed, Nov 27, 2013 at 09:43:33AM +0200, Joel Nider wrote:
> > Hi,
> >
> > Razya is out for a few days, so I will try to answer the questions as
well
> > as I can:
> >
> > "Michael S. Tsirkin" <mst@redhat.com> wrote on 26/11/2013 11:11:57 PM:
> >
> > > From: "Michael S. Tsirkin" <mst@redhat.com>
> > > To: Abel Gordon/Haifa/IBM@IBMIL,
> > > Cc: Anthony Liguori <anthony@codemonkey.ws>, abel.gordon@gmail.com,
> > > asias@redhat.com, digitaleric@google.com, Eran Raichstein/Haifa/
> > > IBM@IBMIL, gleb@redhat.com, jasowang@redhat.com, Joel Nider/Haifa/
> > > IBM@IBMIL, kvm@vger.kernel.org, pbonzini@redhat.com, Razya Ladelsky/
> > > Haifa/IBM@IBMIL
> > > Date: 27/11/2013 01:08 AM
> > > Subject: Re: Elvis upstreaming plan
> > >
> > > On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:
> > > >
> > > >
> > > > Anthony Liguori <anthony@codemonkey.ws> wrote on 26/11/2013
08:05:00
> > PM:
> > > >
> > > > >
> > > > > Razya Ladelsky <RAZYA@il.ibm.com> writes:
> > > > >
> > <edit>
> > > >
> > > > That's why we are proposing to implement a mechanism that will
enable
> > > > the management stack to configure 1 thread per I/O device (as it is
> > today)
> > > > or 1 thread for many I/O devices (belonging to the same VM).
> > > >
> > > > > Once you are scheduling multiple guests in a single vhost device,
you
> > > > > now create a whole new class of DoS attacks in the best case
> > scenario.
> > > >
> > > > Again, we are NOT proposing to schedule multiple guests in a single
> > > > vhost thread. We are proposing to schedule multiple devices
belonging
> > > > to the same guest in a single (or multiple) vhost thread/s.
> > > >
> > >
> > > I guess a question then becomes why have multiple devices?
> >
> > If you mean "why serve multiple devices from a single thread" the
answer is
> > that we cannot rely on the Linux scheduler which has no knowledge of
I/O
> > queues to do a decent job of scheduling I/O.  The idea is to take over
the
> > I/O scheduling responsibilities from the kernel's thread scheduler with
a
> > more efficient I/O scheduler inside each vhost thread.  So by combining
all
> > of the I/O devices from the same guest (disks, network cards, etc) in a
> > single I/O thread, it allows us to provide better scheduling by giving
us
> > more knowledge of the nature of the work.  So now instead of relying on
the
> > linux scheduler to perform context switches between multiple vhost
threads,
> > we have a single thread context in which we can do the I/O scheduling
more
> > efficiently.  We can closely monitor the performance needs of each
queue of
> > each device inside the vhost thread which gives us much more
information
> > than relying on the kernel's thread scheduler.
> > This does not expose any additional opportunities for attacks (DoS or
> > other) than are already available since all of the I/O traffic belongs
to a
> > single guest.
> > You can make the argument that with low I/O loads this mechanism may
not
> > make much difference.  However when you try to maximize the utilization
of
> > your hardware (such as in a commercial scenario) this technique can
gain
> > you a large benefit.
> >
> > Regards,
> >
> > Joel Nider
> > Virtualization Research
> > IBM Research and Development
> > Haifa Research Lab
>
> So all this would sound more convincing if we had sharing between VMs.
> When it's only a single VM it's somehow less convincing, isn't it?
> Of course if we would bypass a scheduler like this it becomes harder to
> enforce cgroup limits.

True, but then the issue becomes isolation/cgroups. We can start by showing
the value for VMs that have multiple devices/queues, and then we could
re-consider extending the mechanism to multiple VMs (at least as an
experimental feature).

> But it might be easier to give scheduler the info it needs to do what we
> need.  Would an API that basically says "run this kthread right now"
> do the trick?

...do you really believe it would be possible to push this kind of change
into the Linux scheduler? In addition, we need more than
"run this kthread right now": you need to monitor the virtio
ring activity to decide "when" you would like to run a "specific kthread"
and for "how long".

>
> >

> >

> >

> >

> >

> >
> >
> >
> >
> > > > > > Hi all,
> > > > > >
> > > > > > I am Razya Ladelsky, I work at IBM Haifa virtualization team,
which
> > > > > > developed Elvis, presented by Abel Gordon at the last KVM
forum:
> > > > > > ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs
> > > > > > ELVIS slides:
> > > > https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE
> > > > > >
> > > > > >
> > > > > > According to the discussions that took place at the forum,
> > upstreaming
> > > > > > some of the Elvis approaches seems to be a good idea, which we
> > would
> > > > like
> > > > > > to pursue.
> > > > > >
> > > > > > Our plan for the first patches is the following:
> > > > > >
> > > > > > 1.Shared vhost thread between mutiple devices
> > > > > > This patch creates a worker thread and worker queue shared
across
> > > > multiple
> > > > > > virtio devices
> > > > > > We would like to modify the patch posted in
> > > > > > https://github.com/abelg/virtual_io_acceleration/commit/
> > > > > 3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766
> > > > > > to limit a vhost thread to serve multiple devices only if they
> > belong
> > > > to
> > > > > > the same VM as Paolo suggested to avoid isolation or cgroups
> > concerns.
> > > > > >
> > > > > > Another modification is related to the creation and removal of
> > vhost
> > > > > > threads, which will be discussed next.
> > > > >
> > > > > I think this is an exceptionally bad idea.
> > > > >
> > > > > We shouldn't throw away isolation without exhausting every other
> > > > > possibility.
> > > >
> > > > Seems you have missed the important details here.
> > > > Anthony, we are aware you are concerned about isolation
> > > > and you believe we should not share a single vhost thread across
> > > > multiple VMs.  That's why Razya proposed to change the patch
> > > > so we will serve multiple virtio devices using a single vhost
thread
> > > > "only if the devices belong to the same VM". This series of patches
> > > > will not allow two different VMs to share the same vhost thread.
> > > > So, I don't see why this will be throwing away isolation and why
> > > > this could be a "exceptionally bad idea".
> > > >
> > > > By the way, I remember that during the KVM forum a similar
> > > > approach of having a single data plane thread for many devices
> > > > was discussed....
> > > > > We've seen very positive results from adding threads.  We should
also
> > > > > look at scheduling.
> > > >
> > > > ...and we have also seen exceptionally negative results from
> > > > adding threads, both for vhost and data-plane. If you have lot of
idle
> > > > time/cores
> > > > then it makes sense to run multiple threads. But IMHO in many
scenarios
> > you
> > > > don't have lot of idle time/cores.. and if you have them you would
> > probably
> > > > prefer to run more VMs/VCPUs....hosting a single SMP VM when you
have
> > > > enough physical cores to run all the VCPU threads and the I/O
threads
> > is
> > > > not a
> > > > realistic scenario.
> >
> > >
> > > > >
> > > > > > 2. Sysfs mechanism to add and remove vhost threads
> > > > > > This patch allows us to add and remove vhost threads
dynamically.
> > > > > >
> > > > > > A simpler way to control the creation of vhost threads is
> > statically
> > > > > > determining the maximum number of virtio devices per worker via
a
> > > > kernel
> > > > > > module parameter (which is the way the previously mentioned
patch
> > is
> > > > > > currently implemented)
> > > > > >
> > > > > > I'd like to ask for advice here about the more preferable way
to
> > go:
> > > > > > Although having the sysfs mechanism provides more flexibility,
it
> > may
> > > > be a
> > > > > > good idea to start with a simple static parameter, and have the
> > first
> > > > > > patches as simple as possible. What do you think?
> > > > > >
> > > > > > 3.Add virtqueue polling mode to vhost
> > > > > > Have the vhost thread poll the virtqueues with high I/O rate
for
> > new
> > > > > > buffers , and avoid asking the guest to kick us.
> > > > > > https://github.com/abelg/virtual_io_acceleration/commit/
> > > > > 26616133fafb7855cc80fac070b0572fd1aaf5d0
> > > > >
> > > > > Ack on this.
> > > >
> > > > :)
> > > >
> > > > Regards,
> > > > Abel.
> > > >
> > > > >
> > > > > Regards,
> > > > >
> > > > > Anthony Liguori
> > > > >
> > > > > > 4. vhost statistics
> > > > > > This patch introduces a set of statistics to monitor different
> > > > performance
> > > > > > metrics of vhost and our polling and I/O scheduling mechanisms.
The
> > > > > > statistics are exposed using debugfs and can be easily
displayed
> > with a
> > > >
> > > > > > Python script (vhost_stat, based on the old kvm_stats)
> > > > > > https://github.com/abelg/virtual_io_acceleration/commit/
> > > > > ac14206ea56939ecc3608dc5f978b86fa322e7b0
> > > > > >
> > > > > >
> > > > > > 5. Add heuristics to improve I/O scheduling
> > > > > > This patch enhances the round-robin mechanism with a set of
> > heuristics
> > > > to
> > > > > > decide when to leave a virtqueue and proceed to the next.
> > > > > > https://github.com/abelg/virtual_io_acceleration/commit/
> > > > > f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d
> > > > > >
> > > > > > This patch improves the handling of the requests by the vhost
> > thread,
> > > > but
> > > > > > could perhaps be delayed to a
> > > > > > later time , and not submitted as one of the first Elvis
patches.
> > > > > > I'd love to hear some comments about whether this patch needs
to be
> > > > part
> > > > > > of the first submission.
> > > > > >
> > > > > > Any other feedback on this plan will be appreciated,
> > > > > > Thank you,
> > > > > > Razya
> > > > >
> > >
>
>


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Elvis upstreaming plan
  2013-11-27 10:29             ` Michael S. Tsirkin
@ 2013-11-27 10:55               ` Abel Gordon
  2013-11-27 11:03                 ` Michael S. Tsirkin
  0 siblings, 1 reply; 39+ messages in thread
From: Abel Gordon @ 2013-11-27 10:55 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: abel.gordon, Anthony Liguori, asias, digitaleric, Eran Raichstein,
	gleb, jasowang, Joel Nider, kvm, pbonzini, Razya Ladelsky,
	Eyal Moscovici, Yossi Kuperman1



"Michael S. Tsirkin" <mst@redhat.com> wrote on 27/11/2013 12:29:43 PM:

>
> On Wed, Nov 27, 2013 at 11:49:03AM +0200, Abel Gordon wrote:
> >
> >
> > "Michael S. Tsirkin" <mst@redhat.com> wrote on 27/11/2013 11:21:00 AM:
> >
> > >
> > > On Wed, Nov 27, 2013 at 11:03:57AM +0200, Abel Gordon wrote:
> > > >
> > > >
> > > > "Michael S. Tsirkin" <mst@redhat.com> wrote on 26/11/2013 11:11:57
PM:
> > > >
> > > > > On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:
> > > > > >
> > > > > >
> > > > > > Anthony Liguori <anthony@codemonkey.ws> wrote on 26/11/2013
> > 08:05:00
> > > > PM:
> > > > > >
> > > > > > >
> > > > > > > Razya Ladelsky <RAZYA@il.ibm.com> writes:
> > > > > > >
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > I am Razya Ladelsky, I work at IBM Haifa virtualization
team,
> > which
> > > > > > > > developed Elvis, presented by Abel Gordon at the last KVM
> > forum:
> > > > > > > > ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs
> > > > > > > > ELVIS slides:
> > > > > > https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE
> > > > > > > >
> > > > > > > >
> > > > > > > > According to the discussions that took place at the forum,
> > > > upstreaming
> > > > > > > > some of the Elvis approaches seems to be a good idea, which
we
> > > > would
> > > > > > like
> > > > > > > > to pursue.
> > > > > > > >
> > > > > > > > Our plan for the first patches is the following:
> > > > > > > >
> > > > > > > > 1.Shared vhost thread between mutiple devices
> > > > > > > > This patch creates a worker thread and worker queue shared
> > across
> > > > > > multiple
> > > > > > > > virtio devices
> > > > > > > > We would like to modify the patch posted in
> > > > > > > > https://github.com/abelg/virtual_io_acceleration/commit/
> > > > > > > 3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766
> > > > > > > > to limit a vhost thread to serve multiple devices only if
they
> > > > belong
> > > > > > to
> > > > > > > > the same VM as Paolo suggested to avoid isolation or
cgroups
> > > > concerns.
> > > > > > > >
> > > > > > > > Another modification is related to the creation and removal
of
> > > > vhost
> > > > > > > > threads, which will be discussed next.
> > > > > > >
> > > > > > > I think this is an exceptionally bad idea.
> > > > > > >
> > > > > > > We shouldn't throw away isolation without exhausting every
other
> > > > > > > possibility.
> > > > > >
> > > > > > Seems you have missed the important details here.
> > > > > > Anthony, we are aware you are concerned about isolation
> > > > > > and you believe we should not share a single vhost thread
across
> > > > > > multiple VMs.  That's why Razya proposed to change the patch
> > > > > > so we will serve multiple virtio devices using a single vhost
> > thread
> > > > > > "only if the devices belong to the same VM". This series of
patches
> > > > > > will not allow two different VMs to share the same vhost
thread.
> > > > > > So, I don't see why this will be throwing away isolation and
why
> > > > > > this could be a "exceptionally bad idea".
> > > > > >
> > > > > > By the way, I remember that during the KVM forum a similar
> > > > > > approach of having a single data plane thread for many devices
> > > > > > was discussed....
> > > > > > > We've seen very positive results from adding threads.  We
should
> > also
> > > > > > > look at scheduling.
> > > > > >
> > > > > > ...and we have also seen exceptionally negative results from
> > > > > > adding threads, both for vhost and data-plane. If you have lot
of
> > idle
> > > > > > time/cores
> > > > > > then it makes sense to run multiple threads. But IMHO in many
> > scenarios
> > > > you
> > > > > > don't have lot of idle time/cores.. and if you have them you
would
> > > > probably
> > > > > > prefer to run more VMs/VCPUs....hosting a single SMP VM when
you
> > have
> > > > > > enough physical cores to run all the VCPU threads and the I/O
> > threads
> > > > is
> > > > > > not a
> > > > > > realistic scenario.
> > > > > >
> > > > > > That's why we are proposing to implement a mechanism that will
> > enable
> > > > > > the management stack to configure 1 thread per I/O device (as
it is
> > > > today)
> > > > > > or 1 thread for many I/O devices (belonging to the same VM).
> > > > > >
> > > > > > > Once you are scheduling multiple guests in a single vhost
device,
> > you
> > > > > > > now create a whole new class of DoS attacks in the best case
> > > > scenario.
> > > > > >
> > > > > > Again, we are NOT proposing to schedule multiple guests in a
single
> > > > > > vhost thread. We are proposing to schedule multiple devices
> > belonging
> > > > > > to the same guest in a single (or multiple) vhost thread/s.
> > > > > >
> > > > >
> > > > > I guess a question then becomes why have multiple devices?
> > > >
> > > > I assume that there are guests that have multiple vhost devices
> > > > (net or scsi/tcm).
> > >
> > > These are kind of uncommon though.  In fact a kernel thread is not a
> > > unit of isolation - cgroups supply isolation.
> > > If we had use_cgroups kind of like use_mm, we could thinkably
> > > do work for multiple VMs on the same thread.
> > >
> > >
> > > > We can also extend the approach to consider
> > > > multiqueue devices, so we can create 1 vhost thread shared for all
the
> > > > queues,
> > > > 1 vhost thread for each queue or a few threads for multiple queues.
We
> > > > could also share a thread across multiple queues even if they do
not
> > belong
> > > > to the same device.
> > > >
> > > > Remember the experiments Shirley Ma did with the split
> > > > tx/rx ? If we have a control interface we could support both
> > > > approaches: different threads or a single thread.
> > >
> > >
> > > I'm a bit concerned about interface managing specific
> > > threads being so low level.
> > > What exactly is it that management knows that makes it
> > > efficient to group threads together?
> > > That host is over-committed so we should use less CPU?
> > > I'd like the interface to express that knowledge.
> > >
> >
> > We can expose information such as the amount of I/O being
> > handled for each queue, the amount of CPU cycles consumed for
> > processing the I/O, latency and more.
> > If we start with a simple mechanism that just enables the
> > feature we can later expose more information to implement a policy
> > framework that will be responsible for taking the decisions
> > (the orchestration part).
>
> What kind of possible policies do you envision?
> If we just react to load by balancing the work done,
> and when over-committed anyway, localize work so
> we get less IPIs, then this is not policy, this is the mechanism.

(CCing Eyal Moscovici who is actually prototyping with multiple
policies and may want to join this thread)

Starting with basic policies: we can use a single vhost thread
and create new vhost threads if it becomes saturated and there
are enough CPU cycles available in the system, or if the latency
(how long the requests in the virtio queues wait until they are
handled) is too high.
We can merge threads if the latency is already low or if the threads
are not saturated.

There is a hidden trade-off here: when you run more vhost threads you
may actually be stealing CPU cycles from the VCPU threads and also
increasing context switches. So, from the vhost perspective it may
improve performance, but from the VCPU threads' perspective it may
degrade performance.

>
>
> >
> > > > >
> > > > >
> > > > > > >
> > > > > > > > 2. Sysfs mechanism to add and remove vhost threads
> > > > > > > > This patch allows us to add and remove vhost threads
> > dynamically.
> > > > > > > >
> > > > > > > > A simpler way to control the creation of vhost threads is
> > > > statically
> > > > > > > > determining the maximum number of virtio devices per worker
via
> > a
> > > > > > kernel
> > > > > > > > module parameter (which is the way the previously mentioned
> > patch
> > > > is
> > > > > > > > currently implemented)
> > > > > > > >
> > > > > > > > I'd like to ask for advice here about the more preferable
way
> > to
> > > > go:
> > > > > > > > Although having the sysfs mechanism provides more
flexibility,
> > it
> > > > may
> > > > > > be a
> > > > > > > > good idea to start with a simple static parameter, and have
the
> > > > first
> > > > > > > > patches as simple as possible. What do you think?
> > > > > > > >
> > > > > > > > 3.Add virtqueue polling mode to vhost
> > > > > > > > Have the vhost thread poll the virtqueues with high I/O
rate
> > for
> > > > new
> > > > > > > > buffers , and avoid asking the guest to kick us.
> > > > > > > > https://github.com/abelg/virtual_io_acceleration/commit/
> > > > > > > 26616133fafb7855cc80fac070b0572fd1aaf5d0
> > > > > > >
> > > > > > > Ack on this.
> > > > > >
> > > > > > :)
> > > > > >
> > > > > > Regards,
> > > > > > Abel.
> > > > > >
> > > > > > >
> > > > > > > Regards,
> > > > > > >
> > > > > > > Anthony Liguori
> > > > > > >
> > > > > > > > 4. vhost statistics
> > > > > > > > This patch introduces a set of statistics to monitor
different
> > > > > > performance
> > > > > > > > metrics of vhost and our polling and I/O scheduling
mechanisms.
> > The
> > > > > > > > statistics are exposed using debugfs and can be easily
> > displayed
> > > > with a
> > > > > >
> > > > > > > > Python script (vhost_stat, based on the old kvm_stats)
> > > > > > > > https://github.com/abelg/virtual_io_acceleration/commit/
> > > > > > > ac14206ea56939ecc3608dc5f978b86fa322e7b0
> > > > > > > >
> > > > > > > >
> > > > > > > > 5. Add heuristics to improve I/O scheduling
> > > > > > > > This patch enhances the round-robin mechanism with a set of
> > > > heuristics
> > > > > > to
> > > > > > > > decide when to leave a virtqueue and proceed to the next.
> > > > > > > > https://github.com/abelg/virtual_io_acceleration/commit/
> > > > > > > f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d
> > > > > > > >
> > > > > > > > This patch improves the handling of the requests by the
vhost
> > > > thread,
> > > > > > but
> > > > > > > > could perhaps be delayed to a
> > > > > > > > later time , and not submitted as one of the first Elvis
> > patches.
> > > > > > > > I'd love to hear some comments about whether this patch
needs
> > to be
> > > > > > part
> > > > > > > > of the first submission.
> > > > > > > >
> > > > > > > > Any other feedback on this plan will be appreciated,
> > > > > > > > Thank you,
> > > > > > > > Razya
> > > > > > >
> > > > >
> > >
>



* Re: Elvis upstreaming plan
  2013-11-27 10:41           ` Abel Gordon
@ 2013-11-27 10:59             ` Michael S. Tsirkin
  2013-11-27 11:02               ` Abel Gordon
  2013-11-27 22:33             ` Anthony Liguori
  1 sibling, 1 reply; 39+ messages in thread
From: Michael S. Tsirkin @ 2013-11-27 10:59 UTC (permalink / raw)
  To: Abel Gordon
  Cc: abel.gordon, Anthony Liguori, asias, digitaleric, Eran Raichstein,
	gleb, jasowang, Joel Nider, kvm, pbonzini, Razya Ladelsky

On Wed, Nov 27, 2013 at 12:41:31PM +0200, Abel Gordon wrote:
> 
> 
> "Michael S. Tsirkin" <mst@redhat.com> wrote on 27/11/2013 12:27:19 PM:
> 
> >
> > On Wed, Nov 27, 2013 at 09:43:33AM +0200, Joel Nider wrote:
> > > Hi,
> > >
> > > Razya is out for a few days, so I will try to answer the questions as
> well
> > > as I can:
> > >
> > > "Michael S. Tsirkin" <mst@redhat.com> wrote on 26/11/2013 11:11:57 PM:
> > >
> > > > From: "Michael S. Tsirkin" <mst@redhat.com>
> > > > To: Abel Gordon/Haifa/IBM@IBMIL,
> > > > Cc: Anthony Liguori <anthony@codemonkey.ws>, abel.gordon@gmail.com,
> > > > asias@redhat.com, digitaleric@google.com, Eran Raichstein/Haifa/
> > > > IBM@IBMIL, gleb@redhat.com, jasowang@redhat.com, Joel Nider/Haifa/
> > > > IBM@IBMIL, kvm@vger.kernel.org, pbonzini@redhat.com, Razya Ladelsky/
> > > > Haifa/IBM@IBMIL
> > > > Date: 27/11/2013 01:08 AM
> > > > Subject: Re: Elvis upstreaming plan
> > > >
> > > > On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:
> > > > >
> > > > >
> > > > > Anthony Liguori <anthony@codemonkey.ws> wrote on 26/11/2013
> 08:05:00
> > > PM:
> > > > >
> > > > > >
> > > > > > Razya Ladelsky <RAZYA@il.ibm.com> writes:
> > > > > >
> > > <edit>
> > > > >
> > > > > That's why we are proposing to implement a mechanism that will
> enable
> > > > > the management stack to configure 1 thread per I/O device (as it is
> > > today)
> > > > > or 1 thread for many I/O devices (belonging to the same VM).
> > > > >
> > > > > > Once you are scheduling multiple guests in a single vhost device,
> you
> > > > > > now create a whole new class of DoS attacks in the best case
> > > scenario.
> > > > >
> > > > > Again, we are NOT proposing to schedule multiple guests in a single
> > > > > vhost thread. We are proposing to schedule multiple devices
> belonging
> > > > > to the same guest in a single (or multiple) vhost thread/s.
> > > > >
> > > >
> > > > I guess a question then becomes why have multiple devices?
> > >
> > > If you mean "why serve multiple devices from a single thread" the
> answer is
> > > that we cannot rely on the Linux scheduler which has no knowledge of
> I/O
> > > queues to do a decent job of scheduling I/O.  The idea is to take over
> the
> > > I/O scheduling responsibilities from the kernel's thread scheduler with
> a
> > > more efficient I/O scheduler inside each vhost thread.  So by combining
> all
> > > of the I/O devices from the same guest (disks, network cards, etc) in a
> > > single I/O thread, it allows us to provide better scheduling by giving
> us
> > > more knowledge of the nature of the work.  So now instead of relying on
> the
> > > linux scheduler to perform context switches between multiple vhost
> threads,
> > > we have a single thread context in which we can do the I/O scheduling
> more
> > > efficiently.  We can closely monitor the performance needs of each
> queue of
> > > each device inside the vhost thread which gives us much more
> information
> > > than relying on the kernel's thread scheduler.
> > > This does not expose any additional opportunities for attacks (DoS or
> > > other) than are already available since all of the I/O traffic belongs
> to a
> > > single guest.
> > > You can make the argument that with low I/O loads this mechanism may
> not
> > > make much difference.  However when you try to maximize the utilization
> of
> > > your hardware (such as in a commercial scenario) this technique can
> gain
> > > you a large benefit.
> > >
> > > Regards,
> > >
> > > Joel Nider
> > > Virtualization Research
> > > IBM Research and Development
> > > Haifa Research Lab
> >
> > So all this would sound more convincing if we had sharing between VMs.
> > When it's only a single VM it's somehow less convincing, isn't it?
> > Of course if we would bypass a scheduler like this it becomes harder to
> > enforce cgroup limits.
> 
> True, but here the issue becomes isolation/cgroups. We can start to show
> the value for VMs that have multiple devices / queues and then we could
> re-consider extending the mechanism for multiple VMs (at least as a
> experimental feature).

Sorry, if it's unsafe we can't merge it, even if it's experimental.

> > But it might be easier to give scheduler the info it needs to do what we
> > need.  Would an API that basically says "run this kthread right now"
> > do the trick?
> 
> ...do you really believe it would be possible to push this kind of change
> to the Linux scheduler ? In addition, we need more than
> "run this kthread right now" because you need to monitor the virtio
> ring activity to specify "when" you will like to run a "specific kthread"
> and for "how long".

"How long" is easy - just call schedule(). "When" sounds like specifying
a deadline, which seems a reasonable fit for how the scheduler works now.
Certainly adding an in-kernel API sounds like a better approach than
a bunch of user-visible ones.
So I'm not at all saying we need to change the scheduler - it's more
about adding APIs to existing functionality.

> >
> > >
> 
> > >
> 
> > >
> 
> > >  Phone: 972-4-829-6326 | Mobile: 972-54-3155635
> > >  E-mail: JOELN@il.ibm.com
> > >
> 
> > >
> 
> > >
> > >
> > >
> > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > I am Razya Ladelsky, I work at IBM Haifa virtualization team,
> which
> > > > > > > developed Elvis, presented by Abel Gordon at the last KVM
> forum:
> > > > > > > ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs
> > > > > > > ELVIS slides:
> > > > > https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE
> > > > > > >
> > > > > > >
> > > > > > > According to the discussions that took place at the forum,
> > > upstreaming
> > > > > > > some of the Elvis approaches seems to be a good idea, which we
> > > would
> > > > > like
> > > > > > > to pursue.
> > > > > > >
> > > > > > > Our plan for the first patches is the following:
> > > > > > >
> > > > > > > 1.Shared vhost thread between mutiple devices
> > > > > > > This patch creates a worker thread and worker queue shared
> across
> > > > > multiple
> > > > > > > virtio devices
> > > > > > > We would like to modify the patch posted in
> > > > > > > https://github.com/abelg/virtual_io_acceleration/commit/
> > > > > > 3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766
> > > > > > > to limit a vhost thread to serve multiple devices only if they
> > > belong
> > > > > to
> > > > > > > the same VM as Paolo suggested to avoid isolation or cgroups
> > > concerns.
> > > > > > >
> > > > > > > Another modification is related to the creation and removal of
> > > vhost
> > > > > > > threads, which will be discussed next.
> > > > > >
> > > > > > I think this is an exceptionally bad idea.
> > > > > >
> > > > > > We shouldn't throw away isolation without exhausting every other
> > > > > > possibility.
> > > > >
> > > > > Seems you have missed the important details here.
> > > > > Anthony, we are aware you are concerned about isolation
> > > > > and you believe we should not share a single vhost thread across
> > > > > multiple VMs.  That's why Razya proposed to change the patch
> > > > > so we will serve multiple virtio devices using a single vhost
> thread
> > > > > "only if the devices belong to the same VM". This series of patches
> > > > > will not allow two different VMs to share the same vhost thread.
> > > > > So, I don't see why this will be throwing away isolation and why
> > > > > this could be a "exceptionally bad idea".
> > > > >
> > > > > By the way, I remember that during the KVM forum a similar
> > > > > approach of having a single data plane thread for many devices
> > > > > was discussed....
> > > > > > We've seen very positive results from adding threads.  We should
> also
> > > > > > look at scheduling.
> > > > >
> > > > > ...and we have also seen exceptionally negative results from
> > > > > adding threads, both for vhost and data-plane. If you have lot of
> idle
> > > > > time/cores
> > > > > then it makes sense to run multiple threads. But IMHO in many
> scenarios
> > > you
> > > > > don't have lot of idle time/cores.. and if you have them you would
> > > probably
> > > > > prefer to run more VMs/VCPUs....hosting a single SMP VM when you
> have
> > > > > enough physical cores to run all the VCPU threads and the I/O
> threads
> > > is
> > > > > not a
> > > > > realistic scenario.
> > >
> > > >
> > > > > >
> > > > > > > 2. Sysfs mechanism to add and remove vhost threads
> > > > > > > This patch allows us to add and remove vhost threads
> dynamically.
> > > > > > >
> > > > > > > A simpler way to control the creation of vhost threads is
> > > statically
> > > > > > > determining the maximum number of virtio devices per worker via
> a
> > > > > kernel
> > > > > > > module parameter (which is the way the previously mentioned
> patch
> > > is
> > > > > > > currently implemented)
> > > > > > >
> > > > > > > I'd like to ask for advice here about the more preferable way
> to
> > > go:
> > > > > > > Although having the sysfs mechanism provides more flexibility,
> it
> > > may
> > > > > be a
> > > > > > > good idea to start with a simple static parameter, and have the
> > > first
> > > > > > > patches as simple as possible. What do you think?
> > > > > > >
> > > > > > > 3.Add virtqueue polling mode to vhost
> > > > > > > Have the vhost thread poll the virtqueues with high I/O rate
> for
> > > new
> > > > > > > buffers , and avoid asking the guest to kick us.
> > > > > > > https://github.com/abelg/virtual_io_acceleration/commit/
> > > > > > 26616133fafb7855cc80fac070b0572fd1aaf5d0
> > > > > >
> > > > > > Ack on this.
> > > > >
> > > > > :)
> > > > >
> > > > > Regards,
> > > > > Abel.
> > > > >
> > > > > >
> > > > > > Regards,
> > > > > >
> > > > > > Anthony Liguori
> > > > > >
> > > > > > > 4. vhost statistics
> > > > > > > This patch introduces a set of statistics to monitor different
> > > > > performance
> > > > > > > metrics of vhost and our polling and I/O scheduling mechanisms.
> The
> > > > > > > statistics are exposed using debugfs and can be easily
> displayed
> > > with a
> > > > >
> > > > > > > Python script (vhost_stat, based on the old kvm_stats)
> > > > > > > https://github.com/abelg/virtual_io_acceleration/commit/
> > > > > > ac14206ea56939ecc3608dc5f978b86fa322e7b0
> > > > > > >
> > > > > > >
> > > > > > > 5. Add heuristics to improve I/O scheduling
> > > > > > > This patch enhances the round-robin mechanism with a set of
> > > heuristics
> > > > > to
> > > > > > > decide when to leave a virtqueue and proceed to the next.
> > > > > > > https://github.com/abelg/virtual_io_acceleration/commit/
> > > > > > f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d
> > > > > > >
> > > > > > > This patch improves the handling of the requests by the vhost
> > > thread,
> > > > > but
> > > > > > > could perhaps be delayed to a
> > > > > > > later time , and not submitted as one of the first Elvis
> patches.
> > > > > > > I'd love to hear some comments about whether this patch needs
> to be
> > > > > part
> > > > > > > of the first submission.
> > > > > > >
> > > > > > > Any other feedback on this plan will be appreciated,
> > > > > > > Thank you,
> > > > > > > Razya
> > > > > >
> > > >
> >
> >


* Re: Elvis upstreaming plan
  2013-11-27 10:59             ` Michael S. Tsirkin
@ 2013-11-27 11:02               ` Abel Gordon
  2013-11-27 11:36                 ` Michael S. Tsirkin
  0 siblings, 1 reply; 39+ messages in thread
From: Abel Gordon @ 2013-11-27 11:02 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: abel.gordon, Anthony Liguori, asias, digitaleric, Eran Raichstein,
	gleb, jasowang, Joel Nider, kvm, pbonzini, Razya Ladelsky



"Michael S. Tsirkin" <mst@redhat.com> wrote on 27/11/2013 12:59:38 PM:


> On Wed, Nov 27, 2013 at 12:41:31PM +0200, Abel Gordon wrote:
> >
> >
> > "Michael S. Tsirkin" <mst@redhat.com> wrote on 27/11/2013 12:27:19 PM:
> >
> > >
> > > On Wed, Nov 27, 2013 at 09:43:33AM +0200, Joel Nider wrote:
> > > > Hi,
> > > >
> > > > Razya is out for a few days, so I will try to answer the questions
as
> > well
> > > > as I can:
> > > >
> > > > "Michael S. Tsirkin" <mst@redhat.com> wrote on 26/11/2013 11:11:57
PM:
> > > >
> > > > > From: "Michael S. Tsirkin" <mst@redhat.com>
> > > > > To: Abel Gordon/Haifa/IBM@IBMIL,
> > > > > Cc: Anthony Liguori <anthony@codemonkey.ws>,
abel.gordon@gmail.com,
> > > > > asias@redhat.com, digitaleric@google.com, Eran Raichstein/Haifa/
> > > > > IBM@IBMIL, gleb@redhat.com, jasowang@redhat.com, Joel
Nider/Haifa/
> > > > > IBM@IBMIL, kvm@vger.kernel.org, pbonzini@redhat.com, Razya
Ladelsky/
> > > > > Haifa/IBM@IBMIL
> > > > > Date: 27/11/2013 01:08 AM
> > > > > Subject: Re: Elvis upstreaming plan
> > > > >
> > > > > On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:
> > > > > >
> > > > > >
> > > > > > Anthony Liguori <anthony@codemonkey.ws> wrote on 26/11/2013
> > 08:05:00
> > > > PM:
> > > > > >
> > > > > > >
> > > > > > > Razya Ladelsky <RAZYA@il.ibm.com> writes:
> > > > > > >
> > > > <edit>
> > > > > >
> > > > > > That's why we are proposing to implement a mechanism that will
> > enable
> > > > > > the management stack to configure 1 thread per I/O device (as
it is
> > > > today)
> > > > > > or 1 thread for many I/O devices (belonging to the same VM).
> > > > > >
> > > > > > > Once you are scheduling multiple guests in a single vhost
device,
> > you
> > > > > > > now create a whole new class of DoS attacks in the best case
> > > > scenario.
> > > > > >
> > > > > > Again, we are NOT proposing to schedule multiple guests in a
single
> > > > > > vhost thread. We are proposing to schedule multiple devices
> > belonging
> > > > > > to the same guest in a single (or multiple) vhost thread/s.
> > > > > >
> > > > >
> > > > > I guess a question then becomes why have multiple devices?
> > > >
> > > > If you mean "why serve multiple devices from a single thread" the
> > answer is
> > > > that we cannot rely on the Linux scheduler which has no knowledge
of
> > I/O
> > > > queues to do a decent job of scheduling I/O.  The idea is to take
over
> > the
> > > > I/O scheduling responsibilities from the kernel's thread scheduler
with
> > a
> > > > more efficient I/O scheduler inside each vhost thread.  So by
combining
> > all
> > > > of the I/O devices from the same guest (disks, network cards, etc)
in a
> > > > single I/O thread, it allows us to provide better scheduling by
giving
> > us
> > > > more knowledge of the nature of the work.  So now instead of
relying on
> > the
> > > > linux scheduler to perform context switches between multiple vhost
> > threads,
> > > > we have a single thread context in which we can do the I/O
scheduling
> > more
> > > > efficiently.  We can closely monitor the performance needs of each
> > queue of
> > > > each device inside the vhost thread which gives us much more
> > information
> > > > than relying on the kernel's thread scheduler.
> > > > This does not expose any additional opportunities for attacks (DoS
or
> > > > other) than are already available since all of the I/O traffic
belongs
> > to a
> > > > single guest.
> > > > You can make the argument that with low I/O loads this mechanism
may
> > not
> > > > make much difference.  However when you try to maximize the
utilization
> > of
> > > > your hardware (such as in a commercial scenario) this technique can
> > gain
> > > > you a large benefit.
> > > >
> > > > Regards,
> > > >
> > > > Joel Nider
> > > > Virtualization Research
> > > > IBM Research and Development
> > > > Haifa Research Lab
> > >
> > > So all this would sound more convincing if we had sharing between
VMs.
> > > When it's only a single VM it's somehow less convincing, isn't it?
> > > Of course if we would bypass a scheduler like this it becomes harder
to
> > > enforce cgroup limits.
> >
> > True, but here the issue becomes isolation/cgroups. We can start to
show
> > the value for VMs that have multiple devices / queues and then we could
> > re-consider extending the mechanism for multiple VMs (at least as a
> > experimental feature).
>
> Sorry, if it's unsafe we can't merge it, even if it's experimental.
>
> > > But it might be easier to give scheduler the info it needs to do what
we
> > > need.  Would an API that basically says "run this kthread right now"
> > > do the trick?
> >
> > ...do you really believe it would be possible to push this kind of
change
> > to the Linux scheduler ? In addition, we need more than
> > "run this kthread right now" because you need to monitor the virtio
> > ring activity to specify "when" you will like to run a "specific
kthread"
> > and for "how long".
>
> How long is easy - just call schedule. When sounds like specifying a
> deadline which sounds like a reasonable fit to how scheduler works now.

... but "when" you should call schedule actually depends on the I/O
activity of the queues. The patches we shared constantly monitor the
virtio rings (pending items and how long they have been pending)
to decide whether we should continue processing the same queue or switch
to another queue.

> Certainly adding an in-kernel API sounds like a better approach than
> a bunch of user-visible ones.
> So I'm not at all saying we need to change the scheduler - it's more
> adding APIs to existing functionality.

Yep, but this may also be difficult to push...

>
> > >
> > > >
> >
> > > >
> >
> > > >
> >
> > > >  Phone: 972-4-829-6326 | Mobile: 972-54-3155635
> > > >  E-mail: JOELN@il.ibm.com
> > > >
> >
> > > >
> >
> > > >
> > > >
> > > >
> > > >
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > I am Razya Ladelsky, I work at IBM Haifa virtualization
team,
> > which
> > > > > > > > developed Elvis, presented by Abel Gordon at the last KVM
> > forum:
> > > > > > > > ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs
> > > > > > > > ELVIS slides:
> > > > > > https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE
> > > > > > > >
> > > > > > > >
> > > > > > > > According to the discussions that took place at the forum,
> > > > upstreaming
> > > > > > > > some of the Elvis approaches seems to be a good idea, which
we
> > > > would
> > > > > > like
> > > > > > > > to pursue.
> > > > > > > >
> > > > > > > > Our plan for the first patches is the following:
> > > > > > > >
> > > > > > > > 1.Shared vhost thread between mutiple devices
> > > > > > > > This patch creates a worker thread and worker queue shared
> > across
> > > > > > multiple
> > > > > > > > virtio devices
> > > > > > > > We would like to modify the patch posted in
> > > > > > > > https://github.com/abelg/virtual_io_acceleration/commit/
> > > > > > > 3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766
> > > > > > > > to limit a vhost thread to serve multiple devices only if
they
> > > > belong
> > > > > > to
> > > > > > > > the same VM as Paolo suggested to avoid isolation or
cgroups
> > > > concerns.
> > > > > > > >
> > > > > > > > Another modification is related to the creation and removal
of
> > > > vhost
> > > > > > > > threads, which will be discussed next.
> > > > > > >
> > > > > > > I think this is an exceptionally bad idea.
> > > > > > >
> > > > > > > We shouldn't throw away isolation without exhausting every
other
> > > > > > > possibility.
> > > > > >
> > > > > > Seems you have missed the important details here.
> > > > > > Anthony, we are aware you are concerned about isolation
> > > > > > and you believe we should not share a single vhost thread
across
> > > > > > multiple VMs.  That's why Razya proposed to change the patch
> > > > > > so we will serve multiple virtio devices using a single vhost
> > thread
> > > > > > "only if the devices belong to the same VM". This series of
patches
> > > > > > will not allow two different VMs to share the same vhost
thread.
> > > > > > So, I don't see why this will be throwing away isolation and
why
> > > > > > this could be a "exceptionally bad idea".
> > > > > >
> > > > > > By the way, I remember that during the KVM forum a similar
> > > > > > approach of having a single data plane thread for many devices
> > > > > > was discussed....
> > > > > > > We've seen very positive results from adding threads.  We
should
> > also
> > > > > > > look at scheduling.
> > > > > >
> > > > > > ...and we have also seen exceptionally negative results from
> > > > > > adding threads, both for vhost and data-plane. If you have a lot
> > > > > > of idle time/cores then it makes sense to run multiple threads.
> > > > > > But IMHO in many scenarios you don't have a lot of idle
> > > > > > time/cores... and if you have them you would probably prefer to
> > > > > > run more VMs/VCPUs... hosting a single SMP VM when you have
> > > > > > enough physical cores to run all the VCPU threads and the I/O
> > > > > > threads is not a realistic scenario.
> > > >
> > > > >
> > > > > > >
> > > > > > > > 2. Sysfs mechanism to add and remove vhost threads
> > > > > > > > This patch allows us to add and remove vhost threads
> > > > > > > > dynamically.
> > > > > > > >
> > > > > > > > A simpler way to control the creation of vhost threads is
> > > > > > > > statically determining the maximum number of virtio devices
> > > > > > > > per worker via a kernel module parameter (which is the way
> > > > > > > > the previously mentioned patch is currently implemented).
> > > > > > > >
> > > > > > > > I'd like to ask for advice here about the preferable way to
> > > > > > > > go: although the sysfs mechanism provides more flexibility,
> > > > > > > > it may be a good idea to start with a simple static
> > > > > > > > parameter and keep the first patches as simple as possible.
> > > > > > > > What do you think?
> > > > > > > >
> > > > > > > > 3. Add virtqueue polling mode to vhost
> > > > > > > > Have the vhost thread poll the virtqueues with a high I/O
> > > > > > > > rate for new buffers, and avoid asking the guest to kick us.
> > > > > > > > https://github.com/abelg/virtual_io_acceleration/commit/26616133fafb7855cc80fac070b0572fd1aaf5d0
> > > > > > >
> > > > > > > Ack on this.
> > > > > >
> > > > > > :)
> > > > > >
> > > > > > Regards,
> > > > > > Abel.
> > > > > >
> > > > > > >
> > > > > > > Regards,
> > > > > > >
> > > > > > > Anthony Liguori
> > > > > > >
> > > > > > > > 4. vhost statistics
> > > > > > > > This patch introduces a set of statistics to monitor
> > > > > > > > different performance metrics of vhost and our polling and
> > > > > > > > I/O scheduling mechanisms. The statistics are exposed using
> > > > > > > > debugfs and can be easily displayed with a Python script
> > > > > > > > (vhost_stat, based on the old kvm_stats).
> > > > > > > > https://github.com/abelg/virtual_io_acceleration/commit/ac14206ea56939ecc3608dc5f978b86fa322e7b0
> > > > > > > >
> > > > > > > > 5. Add heuristics to improve I/O scheduling
> > > > > > > > This patch enhances the round-robin mechanism with a set of
> > > > > > > > heuristics to decide when to leave a virtqueue and proceed
> > > > > > > > to the next.
> > > > > > > > https://github.com/abelg/virtual_io_acceleration/commit/f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d
> > > > > > > >
> > > > > > > > This patch improves the handling of requests by the vhost
> > > > > > > > thread, but could perhaps be delayed and not submitted as
> > > > > > > > one of the first Elvis patches. I'd love to hear some
> > > > > > > > comments about whether this patch needs to be part of the
> > > > > > > > first submission.
> > > > > > > >
> > > > > > > > Any other feedback on this plan will be appreciated,
> > > > > > > > Thank you,
> > > > > > > > Razya
> > > > > > >
> > > > >
> > >
> > >
>


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Elvis upstreaming plan
  2013-11-27 10:55               ` Abel Gordon
@ 2013-11-27 11:03                 ` Michael S. Tsirkin
  2013-11-27 11:05                   ` Abel Gordon
  0 siblings, 1 reply; 39+ messages in thread
From: Michael S. Tsirkin @ 2013-11-27 11:03 UTC (permalink / raw)
  To: Abel Gordon
  Cc: abel.gordon, Anthony Liguori, asias, digitaleric, Eran Raichstein,
	gleb, jasowang, Joel Nider, kvm, pbonzini, Razya Ladelsky,
	Eyal Moscovici, Yossi Kuperman1

On Wed, Nov 27, 2013 at 12:55:07PM +0200, Abel Gordon wrote:
> 
> 
> "Michael S. Tsirkin" <mst@redhat.com> wrote on 27/11/2013 12:29:43 PM:
> 
> >
> > On Wed, Nov 27, 2013 at 11:49:03AM +0200, Abel Gordon wrote:
> > >
> > >
> > > "Michael S. Tsirkin" <mst@redhat.com> wrote on 27/11/2013 11:21:00 AM:
> > >
> > > >
> > > > On Wed, Nov 27, 2013 at 11:03:57AM +0200, Abel Gordon wrote:
> > > > >
> > > > >
> > > > > "Michael S. Tsirkin" <mst@redhat.com> wrote on 26/11/2013 11:11:57 PM:
> > > > >
> > > > > > On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:
> > > > > > >
> > > > > > >
> > > > > > > Anthony Liguori <anthony@codemonkey.ws> wrote on 26/11/2013 08:05:00 PM:
> > > > > > >
> > > > > > > >
> > > > > > > > Razya Ladelsky <RAZYA@il.ibm.com> writes:
> > > > > > > >
> > > > > > > > > [...]
> > > > > > > > > 1. Shared vhost thread between multiple devices
> > > > > > > > > This patch creates a worker thread and worker queue shared
> > > > > > > > > across multiple virtio devices.
> > > > > > > > > We would like to modify the patch posted in
> > > > > > > > > https://github.com/abelg/virtual_io_acceleration/commit/3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766
> > > > > > > > > to limit a vhost thread to serve multiple devices only if
> > > > > > > > > they belong to the same VM, as Paolo suggested, to avoid
> > > > > > > > > isolation or cgroups concerns.
> > > > > > > > >
> > > > > > > > > Another modification is related to the creation and removal
> > > > > > > > > of vhost threads, which will be discussed next.
> > > > > > > >
> > > > > > > > I think this is an exceptionally bad idea.
> > > > > > > >
> > > > > > > > We shouldn't throw away isolation without exhausting every
> > > > > > > > other possibility.
> > > > > > >
> > > > > > > It seems you have missed the important details here.
> > > > > > > Anthony, we are aware you are concerned about isolation
> > > > > > > and you believe we should not share a single vhost thread
> > > > > > > across multiple VMs. That's why Razya proposed to change the
> > > > > > > patch so we will serve multiple virtio devices using a single
> > > > > > > vhost thread only if the devices belong to the same VM. This
> > > > > > > series of patches will not allow two different VMs to share
> > > > > > > the same vhost thread. So I don't see why this would be
> > > > > > > throwing away isolation or why it could be an "exceptionally
> > > > > > > bad idea".
> > > > > > >
> > > > > > > By the way, I remember that during the KVM forum a similar
> > > > > > > approach of having a single data-plane thread for many devices
> > > > > > > was discussed...
> > > > > > > > We've seen very positive results from adding threads.  We
> > > > > > > > should also look at scheduling.
> > > > > > >
> > > > > > > ...and we have also seen exceptionally negative results from
> > > > > > > adding threads, both for vhost and data-plane. If you have a
> > > > > > > lot of idle time/cores then it makes sense to run multiple
> > > > > > > threads. But IMHO in many scenarios you don't have a lot of
> > > > > > > idle time/cores... and if you have them you would probably
> > > > > > > prefer to run more VMs/VCPUs... hosting a single SMP VM when
> > > > > > > you have enough physical cores to run all the VCPU threads
> > > > > > > and the I/O threads is not a realistic scenario.
> > > > > > >
> > > > > > > That's why we are proposing to implement a mechanism that
> > > > > > > will enable the management stack to configure one thread per
> > > > > > > I/O device (as it is today) or one thread for many I/O
> > > > > > > devices (belonging to the same VM).
> > > > > > >
> > > > > > > > Once you are scheduling multiple guests in a single vhost
> > > > > > > > device, you now create a whole new class of DoS attacks in
> > > > > > > > the best-case scenario.
> > > > > > >
> > > > > > > Again, we are NOT proposing to schedule multiple guests in a
> > > > > > > single vhost thread. We are proposing to schedule multiple
> > > > > > > devices belonging to the same guest in a single (or multiple)
> > > > > > > vhost thread(s).
> > > > > > >
> > > > > >
> > > > > > I guess a question then becomes why have multiple devices?
> > > > >
> > > > > I assume that there are guests that have multiple vhost devices
> > > > > (net or scsi/tcm).
> > > >
> > > > These are kind of uncommon though.  In fact a kernel thread is not
> > > > a unit of isolation - cgroups supply isolation.
> > > > If we had use_cgroups kind of like use_mm, we could thinkably
> > > > do work for multiple VMs on the same thread.
> > > >
> > > >
> > > > > We can also extend the approach to consider multiqueue devices,
> > > > > so we can create one vhost thread shared by all the queues, one
> > > > > vhost thread for each queue, or a few threads for multiple
> > > > > queues. We could also share a thread across multiple queues even
> > > > > if they do not belong to the same device.
> > > > >
> > > > > Remember the experiments Shirley Ma did with the split tx/rx?
> > > > > If we have a control interface we could support both approaches:
> > > > > different threads or a single thread.
> > > >
> > > >
> > > > I'm a bit concerned about an interface managing specific
> > > > threads being so low level.
> > > > What exactly is it that management knows that makes it
> > > > efficient to group threads together?
> > > > That the host is over-committed so we should use less CPU?
> > > > I'd like the interface to express that knowledge.
> > > >
> > >
> > > We can expose information such as the amount of I/O being
> > > handled for each queue, the amount of CPU cycles consumed for
> > > processing the I/O, latency and more.
> > > If we start with a simple mechanism that just enables the
> > > feature we can later expose more information to implement a policy
> > > framework that will be responsible for taking the decisions
> > > (the orchestration part).
> >
> > What kind of possible policies do you envision?
> > If we just react to load by balancing the work done,
> > and when over-committed anyway, localize work so
> > we get fewer IPIs, then this is not policy, this is the mechanism.
> 
> (CCing Eyal Moscovici who is actually prototyping with multiple
> policies and may want to join this thread)
> 
> Starting with basic policies: we can use a single vhost thread
> and create new vhost threads if it becomes saturated and there
> are enough CPU cycles available in the system,
> or if the latency (how long the requests in the virtio queues wait
> until they are handled) is too high.
> We can merge threads if the latency is already low or if the threads
> are not saturated.
> 
> There is a hidden trade-off here: when you run more vhost threads you
> may actually be stealing CPU cycles from the vcpu threads and also
> increasing context switches. So, from the vhost perspective it may
> improve performance, but from the vcpu threads' perspective it may
> degrade performance.

So this is a very interesting problem to solve but what does
management know that suggests it can solve it better?


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Elvis upstreaming plan
  2013-11-27 11:03                 ` Michael S. Tsirkin
@ 2013-11-27 11:05                   ` Abel Gordon
  2013-11-27 11:40                     ` Michael S. Tsirkin
  0 siblings, 1 reply; 39+ messages in thread
From: Abel Gordon @ 2013-11-27 11:05 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: abel.gordon, Anthony Liguori, asias, digitaleric, Eran Raichstein,
	Eyal Moscovici, gleb, jasowang, Joel Nider, kvm, pbonzini,
	Razya Ladelsky, Yossi Kuperman1



"Michael S. Tsirkin" <mst@redhat.com> wrote on 27/11/2013 01:03:25 PM:

>
> On Wed, Nov 27, 2013 at 12:55:07PM +0200, Abel Gordon wrote:
> > [...]
> > (CCing Eyal Moscovici who is actually prototyping with multiple
> > policies and may want to join this thread)
> >
> > Starting with basic policies: we can use a single vhost thread
> > and create new vhost threads if it becomes saturated and there
> > are enough cpu cycles available in the system
> > or if the latency (how long the requests in the virtio queues wait
> > until they are handled) is too high.
> > We can merge threads if the latency is already low or if the threads
> > are not saturated.
> >
> > There is a hidden trade-off here: when you run more vhost threads you
> > may actually be stealing cpu cycles from the vcpu threads and also
> > increasing context switches. So, from the vhost perspective it may
> > improve performance but from the vcpu threads perspective it may
> > degrade performance.
>
> So this is a very interesting problem to solve but what does
> management know that suggests it can solve it better?

Yep, and Eyal is currently working on this.
What does the management know? That depends on who the management is :)
It could be just I/O activity (black-box: I/O request rate, I/O
handling rate, latency) or application performance (white-box).



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Elvis upstreaming plan
  2013-11-27 11:02               ` Abel Gordon
@ 2013-11-27 11:36                 ` Michael S. Tsirkin
  0 siblings, 0 replies; 39+ messages in thread
From: Michael S. Tsirkin @ 2013-11-27 11:36 UTC (permalink / raw)
  To: Abel Gordon
  Cc: abel.gordon, Anthony Liguori, asias, digitaleric, Eran Raichstein,
	gleb, jasowang, Joel Nider, kvm, pbonzini, Razya Ladelsky

On Wed, Nov 27, 2013 at 01:02:37PM +0200, Abel Gordon wrote:
> 
> 
> "Michael S. Tsirkin" <mst@redhat.com> wrote on 27/11/2013 12:59:38 PM:
> 
> 
> > On Wed, Nov 27, 2013 at 12:41:31PM +0200, Abel Gordon wrote:
> > >
> > >
> > > "Michael S. Tsirkin" <mst@redhat.com> wrote on 27/11/2013 12:27:19 PM:
> > >
> > > >
> > > > On Wed, Nov 27, 2013 at 09:43:33AM +0200, Joel Nider wrote:
> > > > > Hi,
> > > > >
> > > > > Razya is out for a few days, so I will try to answer the
> > > > > questions as well as I can:
> > > > >
> > > > > "Michael S. Tsirkin" <mst@redhat.com> wrote on 26/11/2013 11:11:57 PM:
> > > > >
> > > > > > From: "Michael S. Tsirkin" <mst@redhat.com>
> > > > > > To: Abel Gordon/Haifa/IBM@IBMIL
> > > > > > Cc: Anthony Liguori <anthony@codemonkey.ws>, abel.gordon@gmail.com,
> > > > > > asias@redhat.com, digitaleric@google.com, Eran Raichstein/Haifa/IBM@IBMIL,
> > > > > > gleb@redhat.com, jasowang@redhat.com, Joel Nider/Haifa/IBM@IBMIL,
> > > > > > kvm@vger.kernel.org, pbonzini@redhat.com, Razya Ladelsky/Haifa/IBM@IBMIL
> > > > > > Date: 27/11/2013 01:08 AM
> > > > > > Subject: Re: Elvis upstreaming plan
> > > > > >
> > > > > > On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:
> > > > > > >
> > > > > > >
> > > > > > > Anthony Liguori <anthony@codemonkey.ws> wrote on 26/11/2013 08:05:00 PM:
> > > > > > >
> > > > > > > >
> > > > > > > > Razya Ladelsky <RAZYA@il.ibm.com> writes:
> > > > > > > >
> > > > > <edit>
> > > > > > >
> > > > > > > That's why we are proposing to implement a mechanism that
> > > > > > > will enable the management stack to configure one thread per
> > > > > > > I/O device (as it is today) or one thread for many I/O
> > > > > > > devices (belonging to the same VM).
> > > > > > >
> > > > > > > > Once you are scheduling multiple guests in a single vhost
> > > > > > > > device, you now create a whole new class of DoS attacks in
> > > > > > > > the best-case scenario.
> > > > > > >
> > > > > > > Again, we are NOT proposing to schedule multiple guests in a
> > > > > > > single vhost thread. We are proposing to schedule multiple
> > > > > > > devices belonging to the same guest in a single (or multiple)
> > > > > > > vhost thread(s).
> > > > > > >
> > > > > >
> > > > > > I guess a question then becomes why have multiple devices?
> > > > >
> > > > > If you mean "why serve multiple devices from a single thread",
> > > > > the answer is that we cannot rely on the Linux scheduler, which
> > > > > has no knowledge of I/O queues, to do a decent job of scheduling
> > > > > I/O.  The idea is to take over the I/O scheduling
> > > > > responsibilities from the kernel's thread scheduler with a more
> > > > > efficient I/O scheduler inside each vhost thread.  Combining all
> > > > > of the I/O devices from the same guest (disks, network cards,
> > > > > etc.) in a single I/O thread allows us to provide better
> > > > > scheduling by giving us more knowledge of the nature of the
> > > > > work.  So now, instead of relying on the Linux scheduler to
> > > > > perform context switches between multiple vhost threads, we have
> > > > > a single thread context in which we can do the I/O scheduling
> > > > > more efficiently.  We can closely monitor the performance needs
> > > > > of each queue of each device inside the vhost thread, which
> > > > > gives us much more information than relying on the kernel's
> > > > > thread scheduler.
> > > > > This does not expose any additional opportunities for attacks
> > > > > (DoS or other) than are already available, since all of the I/O
> > > > > traffic belongs to a single guest.
> > > > > You can make the argument that with low I/O loads this mechanism
> > > > > may not make much difference.  However, when you try to maximize
> > > > > the utilization of your hardware (such as in a commercial
> > > > > scenario) this technique can gain you a large benefit.
> > > > >
> > > > > Regards,
> > > > >
> > > > > Joel Nider
> > > > > Virtualization Research
> > > > > IBM Research and Development
> > > > > Haifa Research Lab
> > > >
> > > > So all this would sound more convincing if we had sharing between VMs.
> > > > When it's only a single VM it's somehow less convincing, isn't it?
> > > > Of course if we would bypass a scheduler like this it becomes harder to
> > > > enforce cgroup limits.
> > >
> > > True, but here the issue becomes isolation/cgroups. We can start to show
> > > the value for VMs that have multiple devices / queues and then we could
> > > re-consider extending the mechanism for multiple VMs (at least as an
> > > experimental feature).
> >
> > Sorry, If it's unsafe we can't merge it even if it's experimental.
> >
> > > > But it might be easier to give the scheduler the info it needs to do
> > > > what we need.  Would an API that basically says "run this kthread
> > > > right now" do the trick?
> > >
> > > ...do you really believe it would be possible to push this kind of
> > > change to the Linux scheduler? In addition, we need more than
> > > "run this kthread right now", because you need to monitor the virtio
> > > ring activity to specify "when" you would like to run a "specific
> > > kthread" and for "how long".
> >
> > How long is easy - just call schedule. "When" sounds like specifying a
> > deadline, which sounds like a reasonable fit to how the scheduler works now.
> 
> ... but "when" you should call schedule actually depends on the I/O
> activity of the queues. The patches we shared constantly monitor the
> virtio rings (pending items and how long they have been pending there)
> to decide if we should continue processing the same queue or switch to
> another queue.

Confused. I thought you want to give up CPU to other tasks like VCPUs
and run vhost at some later time.

If it's just between vhost threads, why isn't "run this right now" what
we want?
We just process one queue as long as we want to stay there; when we want to
switch to another one, do "run that other thread right now".
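[Editor's note: the "monitor the rings, then decide to stay or switch" logic being debated above could be sketched roughly as follows. This is an illustrative sketch only; the structure names, fields, and thresholds are hypothetical and are not taken from the posted patches.]

```c
#include <stdint.h>

/* Hypothetical per-virtqueue state, as seen by the vhost worker.
 * Models the decision Abel describes: keep servicing the current
 * virtqueue, or switch, based on pending work and how long the
 * oldest request has been waiting. */
struct vq_state {
    unsigned pending;     /* items sitting in the virtio ring */
    uint64_t oldest_ns;   /* age of the oldest pending item */
};

enum vq_action { VQ_CONTINUE, VQ_SWITCH };

#define MAX_PENDING_AGE_NS  50000ULL  /* hypothetical latency bound */
#define MIN_BATCH           4         /* hypothetical batching floor */

static enum vq_action vq_next_action(const struct vq_state *cur,
                                     const struct vq_state *other)
{
    /* Another queue's requests are going stale: switch to it. */
    if (other->pending && other->oldest_ns > MAX_PENDING_AGE_NS)
        return VQ_SWITCH;
    /* Current queue still has a worthwhile batch: keep going. */
    if (cur->pending >= MIN_BATCH)
        return VQ_CONTINUE;
    /* Otherwise serve whichever queue has more work. */
    return other->pending > cur->pending ? VQ_SWITCH : VQ_CONTINUE;
}
```

The point of contention in the thread is exactly where such a decision should live: inside the vhost worker (as above) or expressed to the kernel scheduler through an API like "run this kthread right now".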


> > Certainly adding an in-kernel API sounds like a better approach than
> > a bunch of user-visible ones.
> > So I'm not at all saying we need to change the scheduler - it's more
> > adding APIs to existing functionality.
> 
> Yep, but this may be also difficult to push...

If it's a reasonable thing and actually helps customers
it's not difficult to push, in my experience.




> >
> > > >
> > > > >
> > >
> > > > >
> > >
> > > > >
> > >
> > > > >  Phone: 972-4-829-6326 | Mobile: 972-54-3155635
> > > > >  E-mail: JOELN@il.ibm.com
> > > > >
> > >
> > > > >
> > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > > > > > Hi all,
> > > > > > > > >
> > > > > > > > > I am Razya Ladelsky, I work at IBM Haifa virtualization
> team,
> > > which
> > > > > > > > > developed Elvis, presented by Abel Gordon at the last KVM
> > > forum:
> > > > > > > > > ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs
> > > > > > > > > ELVIS slides:
> > > > > > > https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > According to the discussions that took place at the forum,
> > > > > upstreaming
> > > > > > > > > some of the Elvis approaches seems to be a good idea, which
> we
> > > > > would
> > > > > > > like
> > > > > > > > > to pursue.
> > > > > > > > >
> > > > > > > > > Our plan for the first patches is the following:
> > > > > > > > >
> > > > > > > > > 1. Shared vhost thread between multiple devices
> > > > > > > > > This patch creates a worker thread and worker queue shared
> > > across
> > > > > > > multiple
> > > > > > > > > virtio devices
> > > > > > > > > We would like to modify the patch posted in
> > > > > > > > > https://github.com/abelg/virtual_io_acceleration/commit/
> > > > > > > > 3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766
> > > > > > > > > to limit a vhost thread to serve multiple devices only if
> they
> > > > > belong
> > > > > > > to
> > > > > > > > > the same VM as Paolo suggested to avoid isolation or
> cgroups
> > > > > concerns.
> > > > > > > > >
> > > > > > > > > Another modification is related to the creation and removal
> of
> > > > > vhost
> > > > > > > > > threads, which will be discussed next.
> > > > > > > >
> > > > > > > > I think this is an exceptionally bad idea.
> > > > > > > >
> > > > > > > > We shouldn't throw away isolation without exhausting every
> other
> > > > > > > > possibility.
> > > > > > >
> > > > > > > Seems you have missed the important details here.
> > > > > > > Anthony, we are aware you are concerned about isolation
> > > > > > > and you believe we should not share a single vhost thread
> across
> > > > > > > multiple VMs.  That's why Razya proposed to change the patch
> > > > > > > so we will serve multiple virtio devices using a single vhost
> > > thread
> > > > > > > "only if the devices belong to the same VM". This series of
> patches
> > > > > > > will not allow two different VMs to share the same vhost
> thread.
> > > > > > > So, I don't see why this will be throwing away isolation and
> why
> > > > > > > this could be an "exceptionally bad idea".
> > > > > > >
> > > > > > > By the way, I remember that during the KVM forum a similar
> > > > > > > approach of having a single data plane thread for many devices
> > > > > > > was discussed....
> > > > > > > > We've seen very positive results from adding threads.  We
> should
> > > also
> > > > > > > > look at scheduling.
> > > > > > >
> > > > > > > ...and we have also seen exceptionally negative results from
> > > > > > > adding threads, both for vhost and data-plane. If you have lot
> of
> > > idle
> > > > > > > time/cores
> > > > > > > then it makes sense to run multiple threads. But IMHO in many
> > > scenarios
> > > > > you
> > > > > > > don't have lot of idle time/cores.. and if you have them you
> would
> > > > > probably
> > > > > > > prefer to run more VMs/VCPUs....hosting a single SMP VM when
> you
> > > have
> > > > > > > enough physical cores to run all the VCPU threads and the I/O
> > > threads
> > > > > is
> > > > > > > not a
> > > > > > > realistic scenario.
> > > > >
> > > > > >
> > > > > > > >
> > > > > > > > > 2. Sysfs mechanism to add and remove vhost threads
> > > > > > > > > This patch allows us to add and remove vhost threads
> > > dynamically.
> > > > > > > > >
> > > > > > > > > A simpler way to control the creation of vhost threads is
> > > > > statically
> > > > > > > > > determining the maximum number of virtio devices per worker
> via
> > > a
> > > > > > > kernel
> > > > > > > > > module parameter (which is the way the previously mentioned
> > > patch
> > > > > is
> > > > > > > > > currently implemented)
> > > > > > > > >
> > > > > > > > > I'd like to ask for advice here about the more preferable
> way
> > > to
> > > > > go:
> > > > > > > > > Although having the sysfs mechanism provides more
> flexibility,
> > > it
> > > > > may
> > > > > > > be a
> > > > > > > > > good idea to start with a simple static parameter, and have
> the
> > > > > first
> > > > > > > > > patches as simple as possible. What do you think?
> > > > > > > > >
> > > > > > > > > 3.Add virtqueue polling mode to vhost
> > > > > > > > > Have the vhost thread poll the virtqueues with high I/O
> rate
> > > for
> > > > > new
> > > > > > > > > buffers, and avoid asking the guest to kick us.
> > > > > > > > > https://github.com/abelg/virtual_io_acceleration/commit/
> > > > > > > > 26616133fafb7855cc80fac070b0572fd1aaf5d0
> > > > > > > >
> > > > > > > > Ack on this.
> > > > > > >
> > > > > > > :)
> > > > > > >
> > > > > > > Regards,
> > > > > > > Abel.
> > > > > > >
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > >
> > > > > > > > Anthony Liguori
> > > > > > > >
> > > > > > > > > 4. vhost statistics
> > > > > > > > > This patch introduces a set of statistics to monitor
> different
> > > > > > > performance
> > > > > > > > > metrics of vhost and our polling and I/O scheduling
> mechanisms.
> > > The
> > > > > > > > > statistics are exposed using debugfs and can be easily
> > > displayed
> > > > > with a
> > > > > > >
> > > > > > > > > Python script (vhost_stat, based on the old kvm_stats)
> > > > > > > > > https://github.com/abelg/virtual_io_acceleration/commit/
> > > > > > > > ac14206ea56939ecc3608dc5f978b86fa322e7b0
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > 5. Add heuristics to improve I/O scheduling
> > > > > > > > > This patch enhances the round-robin mechanism with a set of
> > > > > heuristics
> > > > > > > to
> > > > > > > > > decide when to leave a virtqueue and proceed to the next.
> > > > > > > > > https://github.com/abelg/virtual_io_acceleration/commit/
> > > > > > > > f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d
> > > > > > > > >
> > > > > > > > > This patch improves the handling of the requests by the
> vhost
> > > > > thread,
> > > > > > > but
> > > > > > > > > could perhaps be delayed to a
> > > > > > > > > later time, and not submitted as one of the first Elvis
> > > patches.
> > > > > > > > > I'd love to hear some comments about whether this patch
> needs
> > > to be
> > > > > > > part
> > > > > > > > > of the first submission.
> > > > > > > > >
> > > > > > > > > Any other feedback on this plan will be appreciated,
> > > > > > > > > Thank you,
> > > > > > > > > Razya
> > > > > > > >
> > > > > >
> > > >
> > > >
> >

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Elvis upstreaming plan
  2013-11-27 11:05                   ` Abel Gordon
@ 2013-11-27 11:40                     ` Michael S. Tsirkin
  0 siblings, 0 replies; 39+ messages in thread
From: Michael S. Tsirkin @ 2013-11-27 11:40 UTC (permalink / raw)
  To: Abel Gordon
  Cc: abel.gordon, Anthony Liguori, digitaleric, Eran Raichstein,
	Eyal Moscovici, gleb, jasowang, Joel Nider, kvm, pbonzini,
	Razya Ladelsky, Yossi Kuperman1

On Wed, Nov 27, 2013 at 01:05:40PM +0200, Abel Gordon wrote:
> > > (CCing Eyal Moscovici who is actually prototyping with multiple
> > > policies and may want to join this thread)
> > >
> > > Starting with basic policies: we can use a single vhost thread
> > > and create new vhost threads if it becomes saturated and there
> > > are enough cpu cycles available in the system
> > > or if the latency (how long the requests in the virtio queues wait
> > > until they are handled) is too high.
> > > We can merge threads if the latency is already low or if the threads
> > > are not saturated.
> > >
> > > There is a hidden trade-off here: when you run more vhost threads you
> > > may actually be stealing cpu cycles from the vcpu threads and also
> > > increasing context switches. So, from the vhost perspective it may
> > > improve performance but from the vcpu threads perspective it may
> > > degrade performance.
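[Editor's note: the split/merge policy Abel outlines above could be sketched roughly like this. Purely illustrative; the structures, thresholds, and names are hypothetical placeholders, not code from the prototype Eyal is working on.]

```c
#include <stdbool.h>

/* Hypothetical per-worker statistics a policy might consult. */
struct worker_stats {
    double utilization;       /* fraction of time the worker was busy */
    double queue_latency_us;  /* avg wait of requests in virtio queues */
};

#define SATURATED      0.90   /* hypothetical saturation threshold */
#define LATENCY_HI_US  100.0  /* hypothetical "too high" latency */
#define LATENCY_LO_US  20.0   /* hypothetical "already low" latency */

/* Spawn an extra vhost worker if this one is saturated and there are
 * spare cpu cycles, or if queue latency is too high. */
static bool should_split(const struct worker_stats *w, bool idle_cpus)
{
    return (w->utilization > SATURATED && idle_cpus) ||
           w->queue_latency_us > LATENCY_HI_US;
}

/* Merge two workers back into one if latency is already low or the
 * combined load would not saturate a single worker. */
static bool should_merge(const struct worker_stats *a,
                         const struct worker_stats *b)
{
    return (a->queue_latency_us < LATENCY_LO_US &&
            b->queue_latency_us < LATENCY_LO_US) ||
           (a->utilization + b->utilization < SATURATED);
}
```

The hidden trade-off described above shows up in how these thresholds are tuned: splitting too eagerly steals cycles from the vcpu threads and adds context switches.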
> >
> > So this is a very interesting problem to solve but what does
> > management know that suggests it can solve it better?
> 
> Yep, and Eyal is currently working on this.
> What does the management know? Depends on who the management is :)
> Could be just I/O activity (black-box: I/O request rate, I/O
> handling rate, latency)

We know much more about this than management, don't we?

> or application performance (white-box).

This would have to come with a proposal for getting
this white-box info out of guest somehow.

-- 
MST

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Elvis upstreaming plan
  2013-11-27  7:43       ` Joel Nider
  2013-11-27 10:27         ` Michael S. Tsirkin
@ 2013-11-27 15:00         ` Stefan Hajnoczi
  2013-11-27 15:30           ` Michael S. Tsirkin
                             ` (2 more replies)
  1 sibling, 3 replies; 39+ messages in thread
From: Stefan Hajnoczi @ 2013-11-27 15:00 UTC (permalink / raw)
  To: Joel Nider
  Cc: Michael S. Tsirkin, Abel Gordon, abel.gordon, Anthony Liguori,
	asias, digitaleric, Eran Raichstein, gleb, jasowang, kvm,
	pbonzini, Razya Ladelsky

On Wed, Nov 27, 2013 at 09:43:33AM +0200, Joel Nider wrote:
> Hi,
> 
> Razya is out for a few days, so I will try to answer the questions as well
> as I can:
> 
> "Michael S. Tsirkin" <mst@redhat.com> wrote on 26/11/2013 11:11:57 PM:
> 
> > From: "Michael S. Tsirkin" <mst@redhat.com>
> > To: Abel Gordon/Haifa/IBM@IBMIL,
> > Cc: Anthony Liguori <anthony@codemonkey.ws>, abel.gordon@gmail.com,
> > asias@redhat.com, digitaleric@google.com, Eran Raichstein/Haifa/
> > IBM@IBMIL, gleb@redhat.com, jasowang@redhat.com, Joel Nider/Haifa/
> > IBM@IBMIL, kvm@vger.kernel.org, pbonzini@redhat.com, Razya Ladelsky/
> > Haifa/IBM@IBMIL
> > Date: 27/11/2013 01:08 AM
> > Subject: Re: Elvis upstreaming plan
> >
> > On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:
> > >
> > >
> > > Anthony Liguori <anthony@codemonkey.ws> wrote on 26/11/2013 08:05:00
> PM:
> > >
> > > >
> > > > Razya Ladelsky <RAZYA@il.ibm.com> writes:
> > > >
> <edit>
> > >
> > > That's why we are proposing to implement a mechanism that will enable
> > > the management stack to configure 1 thread per I/O device (as it is
> today)
> > > or 1 thread for many I/O devices (belonging to the same VM).
> > >
> > > > Once you are scheduling multiple guests in a single vhost device, you
> > > > now create a whole new class of DoS attacks in the best case
> scenario.
> > >
> > > Again, we are NOT proposing to schedule multiple guests in a single
> > > vhost thread. We are proposing to schedule multiple devices belonging
> > > to the same guest in a single (or multiple) vhost thread/s.
> > >
> >
> > I guess a question then becomes why have multiple devices?
> 
> If you mean "why serve multiple devices from a single thread" the answer is
> that we cannot rely on the Linux scheduler which has no knowledge of I/O
> queues to do a decent job of scheduling I/O.  The idea is to take over the
> I/O scheduling responsibilities from the kernel's thread scheduler with a
> more efficient I/O scheduler inside each vhost thread.  So by combining all
> of the I/O devices from the same guest (disks, network cards, etc) in a
> single I/O thread, it allows us to provide better scheduling by giving us
> more knowledge of the nature of the work.  So now instead of relying on the
> Linux scheduler to perform context switches between multiple vhost threads,
> we have a single thread context in which we can do the I/O scheduling more
> efficiently.  We can closely monitor the performance needs of each queue of
> each device inside the vhost thread which gives us much more information
> than relying on the kernel's thread scheduler.
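[Editor's note: Joel's "single I/O thread per guest with its own internal scheduler" idea could be sketched roughly as below. Illustrative only; the structures and the plain round-robin pass are hypothetical simplifications, not the vhost data structures.]

```c
#include <stddef.h>

#define MAX_QUEUES 8

/* Hypothetical per-device I/O queue as seen by the shared worker. */
struct io_queue {
    int pending;    /* requests waiting in this queue */
    int serviced;   /* requests handled so far */
};

/* One worker services the queues of all devices of a single guest,
 * doing its own round-robin instead of letting the kernel
 * context-switch between per-device threads. */
struct shared_worker {
    struct io_queue queues[MAX_QUEUES];  /* all devices of one VM */
    size_t nqueues;
    size_t cursor;                       /* round-robin position */
};

/* One scheduling pass: service up to `budget` requests from the next
 * non-empty queue, then leave the cursor past it. */
static void worker_pass(struct shared_worker *w, int budget)
{
    for (size_t scanned = 0; scanned < w->nqueues; scanned++) {
        struct io_queue *q = &w->queues[w->cursor];
        w->cursor = (w->cursor + 1) % w->nqueues;
        if (q->pending > 0) {
            int n = q->pending < budget ? q->pending : budget;
            q->pending -= n;    /* "handle" n requests */
            q->serviced += n;
            return;
        }
    }
}
```

The heuristics patch (item 5 in Razya's plan) is essentially about replacing this naive round-robin decision of when to leave a queue with something smarter.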

And now there are 2 performance-critical pieces that need to be
optimized/tuned instead of just 1:

1. Kernel infrastructure that QEMU and vhost use today but you decided
to bypass.
2. The new ELVIS code which only affects vhost devices in the same VM.

If you split the code paths it results in more effort in the long run
and the benefit seems quite limited once you acknowledge that isolation
is important.

Isn't the sane thing to do to take the lessons from ELVIS and improve the
existing pieces instead of bypassing them?  That way both the single VM and
host-wide performance improves.  And as a bonus non-virtualization use
cases may also benefit.

Stefan

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Elvis upstreaming plan
  2013-11-27 15:00         ` Stefan Hajnoczi
@ 2013-11-27 15:30           ` Michael S. Tsirkin
  2013-11-28  7:24           ` Joel Nider
  2013-11-28  7:31           ` Abel Gordon
  2 siblings, 0 replies; 39+ messages in thread
From: Michael S. Tsirkin @ 2013-11-27 15:30 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Joel Nider, Abel Gordon, abel.gordon, Anthony Liguori, asias,
	digitaleric, Eran Raichstein, gleb, jasowang, kvm, pbonzini,
	Razya Ladelsky

On Wed, Nov 27, 2013 at 04:00:53PM +0100, Stefan Hajnoczi wrote:
> On Wed, Nov 27, 2013 at 09:43:33AM +0200, Joel Nider wrote:
> > Hi,
> > 
> > Razya is out for a few days, so I will try to answer the questions as well
> > as I can:
> > 
> > "Michael S. Tsirkin" <mst@redhat.com> wrote on 26/11/2013 11:11:57 PM:
> > 
> > > From: "Michael S. Tsirkin" <mst@redhat.com>
> > > To: Abel Gordon/Haifa/IBM@IBMIL,
> > > Cc: Anthony Liguori <anthony@codemonkey.ws>, abel.gordon@gmail.com,
> > > asias@redhat.com, digitaleric@google.com, Eran Raichstein/Haifa/
> > > IBM@IBMIL, gleb@redhat.com, jasowang@redhat.com, Joel Nider/Haifa/
> > > IBM@IBMIL, kvm@vger.kernel.org, pbonzini@redhat.com, Razya Ladelsky/
> > > Haifa/IBM@IBMIL
> > > Date: 27/11/2013 01:08 AM
> > > Subject: Re: Elvis upstreaming plan
> > >
> > > On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:
> > > >
> > > >
> > > > Anthony Liguori <anthony@codemonkey.ws> wrote on 26/11/2013 08:05:00
> > PM:
> > > >
> > > > >
> > > > > Razya Ladelsky <RAZYA@il.ibm.com> writes:
> > > > >
> > <edit>
> > > >
> > > > That's why we are proposing to implement a mechanism that will enable
> > > > the management stack to configure 1 thread per I/O device (as it is
> > today)
> > > > or 1 thread for many I/O devices (belonging to the same VM).
> > > >
> > > > > Once you are scheduling multiple guests in a single vhost device, you
> > > > > now create a whole new class of DoS attacks in the best case
> > scenario.
> > > >
> > > > Again, we are NOT proposing to schedule multiple guests in a single
> > > > vhost thread. We are proposing to schedule multiple devices belonging
> > > > to the same guest in a single (or multiple) vhost thread/s.
> > > >
> > >
> > > I guess a question then becomes why have multiple devices?
> > 
> > If you mean "why serve multiple devices from a single thread" the answer is
> > that we cannot rely on the Linux scheduler which has no knowledge of I/O
> > queues to do a decent job of scheduling I/O.  The idea is to take over the
> > I/O scheduling responsibilities from the kernel's thread scheduler with a
> > more efficient I/O scheduler inside each vhost thread.  So by combining all
> > of the I/O devices from the same guest (disks, network cards, etc) in a
> > single I/O thread, it allows us to provide better scheduling by giving us
> > more knowledge of the nature of the work.  So now instead of relying on the
> > linux scheduler to perform context switches between multiple vhost threads,
> > we have a single thread context in which we can do the I/O scheduling more
> > efficiently.  We can closely monitor the performance needs of each queue of
> > each device inside the vhost thread which gives us much more information
> > than relying on the kernel's thread scheduler.
> 
> And now there are 2 performance-critical pieces that need to be
> optimized/tuned instead of just 1:
> 
> 1. Kernel infrastructure that QEMU and vhost use today but you decided
> to bypass.
> 2. The new ELVIS code which only affects vhost devices in the same VM.
> 
> If you split the code paths it results in more effort in the long run
> and the benefit seems quite limited once you acknowledge that isolation
> is important.
>
> Isn't the sane thing to do taking lessons from ELVIS improving existing
> pieces instead of bypassing them?  That way both the single VM and
> host-wide performance improves.  And as a bonus non-virtualization use
> cases may also benefit.
> 
> Stefan

I'm not sure about that. ELVIS is all about specific behaviour
patterns that are virtualization-specific, and general claims
that we can improve the scheduler for all workloads seem somewhat
optimistic.

-- 
MST

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Elvis upstreaming plan
  2013-11-27 10:41           ` Abel Gordon
  2013-11-27 10:59             ` Michael S. Tsirkin
@ 2013-11-27 22:33             ` Anthony Liguori
  2013-11-28  8:25               ` Abel Gordon
  1 sibling, 1 reply; 39+ messages in thread
From: Anthony Liguori @ 2013-11-27 22:33 UTC (permalink / raw)
  To: Abel Gordon, Michael S. Tsirkin
  Cc: abel.gordon, asias, digitaleric, Eran Raichstein, gleb, jasowang,
	Joel Nider, kvm, pbonzini, Razya Ladelsky

Abel Gordon <ABELG@il.ibm.com> writes:

> "Michael S. Tsirkin" <mst@redhat.com> wrote on 27/11/2013 12:27:19 PM:
>
>>
>> On Wed, Nov 27, 2013 at 09:43:33AM +0200, Joel Nider wrote:
>> > Hi,
>> >
>> > Razya is out for a few days, so I will try to answer the questions as
> well
>> > as I can:
>> >
>> > "Michael S. Tsirkin" <mst@redhat.com> wrote on 26/11/2013 11:11:57 PM:
>> >
>> > > From: "Michael S. Tsirkin" <mst@redhat.com>
>> > > To: Abel Gordon/Haifa/IBM@IBMIL,
>> > > Cc: Anthony Liguori <anthony@codemonkey.ws>, abel.gordon@gmail.com,
>> > > asias@redhat.com, digitaleric@google.com, Eran Raichstein/Haifa/
>> > > IBM@IBMIL, gleb@redhat.com, jasowang@redhat.com, Joel Nider/Haifa/
>> > > IBM@IBMIL, kvm@vger.kernel.org, pbonzini@redhat.com, Razya Ladelsky/
>> > > Haifa/IBM@IBMIL
>> > > Date: 27/11/2013 01:08 AM
>> > > Subject: Re: Elvis upstreaming plan
>> > >
>> > > On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:
>> > > >
>> > > >
>> > > > Anthony Liguori <anthony@codemonkey.ws> wrote on 26/11/2013
> 08:05:00
>> > PM:
>> > > >
>> > > > >
>> > > > > Razya Ladelsky <RAZYA@il.ibm.com> writes:
>> > > > >
>> > <edit>
>> > > >
>> > > > That's why we are proposing to implement a mechanism that will
> enable
>> > > > the management stack to configure 1 thread per I/O device (as it is
>> > today)
>> > > > or 1 thread for many I/O devices (belonging to the same VM).
>> > > >
>> > > > > Once you are scheduling multiple guests in a single vhost device,
> you
>> > > > > now create a whole new class of DoS attacks in the best case
>> > scenario.
>> > > >
>> > > > Again, we are NOT proposing to schedule multiple guests in a single
>> > > > vhost thread. We are proposing to schedule multiple devices
> belonging
>> > > > to the same guest in a single (or multiple) vhost thread/s.
>> > > >
>> > >
>> > > I guess a question then becomes why have multiple devices?
>> >
>> > If you mean "why serve multiple devices from a single thread" the
> answer is
>> > that we cannot rely on the Linux scheduler which has no knowledge of
> I/O
>> > queues to do a decent job of scheduling I/O.  The idea is to take over
> the
>> > I/O scheduling responsibilities from the kernel's thread scheduler with
> a
>> > more efficient I/O scheduler inside each vhost thread.  So by combining
> all
>> > of the I/O devices from the same guest (disks, network cards, etc) in a
>> > single I/O thread, it allows us to provide better scheduling by giving
> us
>> > more knowledge of the nature of the work.  So now instead of relying on
> the
>> > linux scheduler to perform context switches between multiple vhost
> threads,
>> > we have a single thread context in which we can do the I/O scheduling
> more
>> > efficiently.  We can closely monitor the performance needs of each
> queue of
>> > each device inside the vhost thread which gives us much more
> information
>> > than relying on the kernel's thread scheduler.
>> > This does not expose any additional opportunities for attacks (DoS or
>> > other) than are already available since all of the I/O traffic belongs
> to a
>> > single guest.
>> > You can make the argument that with low I/O loads this mechanism may
> not
>> > make much difference.  However when you try to maximize the utilization
> of
>> > your hardware (such as in a commercial scenario) this technique can
> gain
>> > you a large benefit.
>> >
>> > Regards,
>> >
>> > Joel Nider
>> > Virtualization Research
>> > IBM Research and Development
>> > Haifa Research Lab
>>
>> So all this would sound more convincing if we had sharing between VMs.
>> When it's only a single VM it's somehow less convincing, isn't it?
>> Of course if we would bypass a scheduler like this it becomes harder to
>> enforce cgroup limits.
>
> True, but here the issue becomes isolation/cgroups. We can start to show
> the value for VMs that have multiple devices / queues and then we could
> re-consider extending the mechanism for multiple VMs (at least as a
> experimental feature).
>
>> But it might be easier to give scheduler the info it needs to do what we
>> need.  Would an API that basically says "run this kthread right now"
>> do the trick?
>
> ...do you really believe it would be possible to push this kind of change
> to the Linux scheduler ? In addition, we need more than
> "run this kthread right now" because you need to monitor the virtio
> ring activity to specify "when" you will like to run a "specific kthread"
> and for "how long".

Paul Turner has a proposal for exactly this:

http://www.linuxplumbersconf.org/2013/ocw/sessions/1653

The video is up on Youtube I think. It definitely is a general problem
that is not at all virtual I/O specific.

Regards,

Anthony Liguori

>
>>
>> >
>
>> >
>
>> >
>
>> >  Phone: 972-4-829-6326 | Mobile: 972-54-3155635
>> >  E-mail: JOELN@il.ibm.com
>> >
>
>> >
>
>> >
>> >
>> >
>> >
>> > > > > > Hi all,
>> > > > > >
>> > > > > > I am Razya Ladelsky, I work at IBM Haifa virtualization team,
> which
>> > > > > > developed Elvis, presented by Abel Gordon at the last KVM
> forum:
>> > > > > > ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs
>> > > > > > ELVIS slides:
>> > > > https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE
>> > > > > >
>> > > > > >
>> > > > > > According to the discussions that took place at the forum,
>> > upstreaming
>> > > > > > some of the Elvis approaches seems to be a good idea, which we
>> > would
>> > > > like
>> > > > > > to pursue.
>> > > > > >
>> > > > > > Our plan for the first patches is the following:
>> > > > > >
>> > > > > > 1. Shared vhost thread between multiple devices
>> > > > > > This patch creates a worker thread and worker queue shared
> across
>> > > > multiple
>> > > > > > virtio devices
>> > > > > > We would like to modify the patch posted in
>> > > > > > https://github.com/abelg/virtual_io_acceleration/commit/
>> > > > > 3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766
>> > > > > > to limit a vhost thread to serve multiple devices only if they
>> > belong
>> > > > to
>> > > > > > the same VM as Paolo suggested to avoid isolation or cgroups
>> > concerns.
>> > > > > >
>> > > > > > Another modification is related to the creation and removal of
>> > vhost
>> > > > > > threads, which will be discussed next.
>> > > > >
>> > > > > I think this is an exceptionally bad idea.
>> > > > >
>> > > > > We shouldn't throw away isolation without exhausting every other
>> > > > > possibility.
>> > > >
>> > > > Seems you have missed the important details here.
>> > > > Anthony, we are aware you are concerned about isolation
>> > > > and you believe we should not share a single vhost thread across
>> > > > multiple VMs.  That's why Razya proposed to change the patch
>> > > > so we will serve multiple virtio devices using a single vhost
> thread
>> > > > "only if the devices belong to the same VM". This series of patches
>> > > > will not allow two different VMs to share the same vhost thread.
>> > > > So, I don't see why this will be throwing away isolation and why
>> > > > this could be a "exceptionally bad idea".
>> > > >
>> > > > By the way, I remember that during the KVM forum a similar
>> > > > approach of having a single data plane thread for many devices
>> > > > was discussed....
>> > > > > We've seen very positive results from adding threads.  We should
> also
>> > > > > look at scheduling.
>> > > >
>> > > > ...and we have also seen exceptionally negative results from
>> > > > adding threads, both for vhost and data-plane. If you have lot of
> idle
>> > > > time/cores
>> > > > then it makes sense to run multiple threads. But IMHO in many
> scenarios
>> > you
>> > > > don't have lot of idle time/cores.. and if you have them you would
>> > probably
>> > > > prefer to run more VMs/VCPUs....hosting a single SMP VM when you
> have
>> > > > enough physical cores to run all the VCPU threads and the I/O
> threads
>> > is
>> > > > not a
>> > > > realistic scenario.
>> >
>> > >
>> > > > >
>> > > > > > 2. Sysfs mechanism to add and remove vhost threads
>> > > > > > This patch allows us to add and remove vhost threads
> dynamically.
>> > > > > >
>> > > > > > A simpler way to control the creation of vhost threads is
>> > statically
>> > > > > > determining the maximum number of virtio devices per worker via
> a
>> > > > kernel
>> > > > > > module parameter (which is the way the previously mentioned
> patch
>> > is
>> > > > > > currently implemented)
>> > > > > >
>> > > > > > I'd like to ask for advice here about the more preferable way
> to
>> > go:
>> > > > > > Although having the sysfs mechanism provides more flexibility,
> it
>> > may
>> > > > be a
>> > > > > > good idea to start with a simple static parameter, and have the
>> > first
>> > > > > > patches as simple as possible. What do you think?
>> > > > > >
>> > > > > > 3.Add virtqueue polling mode to vhost
>> > > > > > Have the vhost thread poll the virtqueues with high I/O rate
> for
>> > new
>> > > > > > buffers , and avoid asking the guest to kick us.
>> > > > > > https://github.com/abelg/virtual_io_acceleration/commit/
>> > > > > 26616133fafb7855cc80fac070b0572fd1aaf5d0
>> > > > >
>> > > > > Ack on this.
>> > > >
>> > > > :)
>> > > >
>> > > > Regards,
>> > > > Abel.
>> > > >
>> > > > >
>> > > > > Regards,
>> > > > >
>> > > > > Anthony Liguori
>> > > > >
>> > > > > > 4. vhost statistics
>> > > > > > This patch introduces a set of statistics to monitor different
>> > > > performance
>> > > > > > metrics of vhost and our polling and I/O scheduling mechanisms.
> The
>> > > > > > statistics are exposed using debugfs and can be easily
> displayed
>> > with a
>> > > >
>> > > > > > Python script (vhost_stat, based on the old kvm_stats)
>> > > > > > https://github.com/abelg/virtual_io_acceleration/commit/
>> > > > > ac14206ea56939ecc3608dc5f978b86fa322e7b0
>> > > > > >
>> > > > > >
>> > > > > > 5. Add heuristics to improve I/O scheduling
>> > > > > > This patch enhances the round-robin mechanism with a set of
>> > heuristics
>> > > > to
>> > > > > > decide when to leave a virtqueue and proceed to the next.
>> > > > > > https://github.com/abelg/virtual_io_acceleration/commit/
>> > > > > f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d
>> > > > > >
>> > > > > > This patch improves the handling of the requests by the vhost
>> > thread,
>> > > > but
>> > > > > > could perhaps be delayed to a
>> > > > > > later time , and not submitted as one of the first Elvis
> patches.
>> > > > > > I'd love to hear some comments about whether this patch needs
> to be
>> > > > part
>> > > > > > of the first submission.
>> > > > > >
>> > > > > > Any other feedback on this plan will be appreciated,
>> > > > > > Thank you,
>> > > > > > Razya
>> > > > >
>> > >
>>
>>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Elvis upstreaming plan
  2013-11-27 15:00         ` Stefan Hajnoczi
  2013-11-27 15:30           ` Michael S. Tsirkin
@ 2013-11-28  7:24           ` Joel Nider
  2013-11-28  7:31           ` Abel Gordon
  2 siblings, 0 replies; 39+ messages in thread
From: Joel Nider @ 2013-11-28  7:24 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Abel Gordon, abel.gordon, Anthony Liguori, asias, digitaleric,
	Eran Raichstein, gleb, jasowang, kvm, Michael S. Tsirkin,
	pbonzini, Razya Ladelsky


Stefan Hajnoczi <stefanha@gmail.com> wrote on 27/11/2013 05:00:53 PM:

> From: Stefan Hajnoczi <stefanha@gmail.com>
> To: Joel Nider/Haifa/IBM@IBMIL,
> Cc: "Michael S. Tsirkin" <mst@redhat.com>, Abel Gordon/Haifa/
> IBM@IBMIL, abel.gordon@gmail.com, Anthony Liguori
> <anthony@codemonkey.ws>, asias@redhat.com, digitaleric@google.com,
> Eran Raichstein/Haifa/IBM@IBMIL, gleb@redhat.com,
> jasowang@redhat.com, kvm@vger.kernel.org, pbonzini@redhat.com, Razya
> Ladelsky/Haifa/IBM@IBMIL
> Date: 27/11/2013 05:00 PM
> Subject: Re: Elvis upstreaming plan
>
> On Wed, Nov 27, 2013 at 09:43:33AM +0200, Joel Nider wrote:
> > Hi,
> >
> > Razya is out for a few days, so I will try to answer the questions as
well
> > as I can:
> >
> > "Michael S. Tsirkin" <mst@redhat.com> wrote on 26/11/2013 11:11:57 PM:
> >
> > > From: "Michael S. Tsirkin" <mst@redhat.com>
> > > To: Abel Gordon/Haifa/IBM@IBMIL,
> > > Cc: Anthony Liguori <anthony@codemonkey.ws>, abel.gordon@gmail.com,
> > > asias@redhat.com, digitaleric@google.com, Eran Raichstein/Haifa/
> > > IBM@IBMIL, gleb@redhat.com, jasowang@redhat.com, Joel Nider/Haifa/
> > > IBM@IBMIL, kvm@vger.kernel.org, pbonzini@redhat.com, Razya Ladelsky/
> > > Haifa/IBM@IBMIL
> > > Date: 27/11/2013 01:08 AM
> > > Subject: Re: Elvis upstreaming plan
> > >
> > > On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:
> > > >
> > > >
> > > > Anthony Liguori <anthony@codemonkey.ws> wrote on 26/11/2013
08:05:00
> > PM:
> > > >
> > > > >
> > > > > Razya Ladelsky <RAZYA@il.ibm.com> writes:
> > > > >
> > <edit>
> > > >
> > > > That's why we are proposing to implement a mechanism that will
enable
> > > > the management stack to configure 1 thread per I/O device (as it is
> > today)
> > > > or 1 thread for many I/O devices (belonging to the same VM).
> > > >
> > > > > Once you are scheduling multiple guests in a single vhost device,
you
> > > > > now create a whole new class of DoS attacks in the best case
> > scenario.
> > > >
> > > > Again, we are NOT proposing to schedule multiple guests in a single
> > > > vhost thread. We are proposing to schedule multiple devices
belonging
> > > > to the same guest in a single (or multiple) vhost thread/s.
> > > >
> > >
> > > I guess a question then becomes why have multiple devices?
> >
> > If you mean "why serve multiple devices from a single thread" the
answer is
> > that we cannot rely on the Linux scheduler which has no knowledge of
I/O
> > queues to do a decent job of scheduling I/O.  The idea is to take over
the
> > I/O scheduling responsibilities from the kernel's thread scheduler with
a
> > more efficient I/O scheduler inside each vhost thread.  So by combining
all
> > of the I/O devices from the same guest (disks, network cards, etc) in a
> > single I/O thread, it allows us to provide better scheduling by giving
us
> > more knowledge of the nature of the work.  So now instead of relying on
the
> > linux scheduler to perform context switches between multiple vhost
threads,
> > we have a single thread context in which we can do the I/O scheduling
more
> > efficiently.  We can closely monitor the performance needs of each
queue of
> > each device inside the vhost thread which gives us much more
information
> > than relying on the kernel's thread scheduler.
>
> And now there are 2 performance-critical pieces that need to be
> optimized/tuned instead of just 1:
>
> 1. Kernel infrastructure that QEMU and vhost use today but you decided
> to bypass.
> 2. The new ELVIS code which only affects vhost devices in the same VM.
>
> If you split the code paths it results in more effort in the long run
> and the benefit seems quite limited once you acknowledge that isolation
> is important.

Yes you are correct that there are now 2 performance-critical pieces of
code.  However what we are proposing is just proper module decoupling.  I
believe you will be hard pressed to make a good case that all of this logic
could be integrated into the Linux thread scheduler more efficiently.
Think of this as an I/O scheduler for virtualized guests.  I don't believe
anyone would try to integrate the Linux I/O schedulers with the Linux
thread scheduler, even though they are both performance-critical modules.
Even if we were to take the route of using these principles to improve the
existing scheduler, I have to ask: which scheduler?  If we spend this
effort on CFS (completely fair scheduler) but then someone switches their
thread scheduler to O(1) or some other scheduler, all of our advantage
would be lost.  We would then have to reimplement for every possible thread
scheduler.

I don't agree that we are losing isolation, even if you go with the "full
ELVIS" which was originally proposed.  But that is a discussion for another
day.  For now, let's agree that in this "reduced ELVIS" solution, no
isolation is lost, since each vhost thread is only dealing with I/O from
the same guest.

As for more effort - for whom do you mean?  Development time? Maintenance
effort? CPU time?  I would say all of those are actually less effort in the
long run. Dividing responsibility between modules with well-defined
interfaces reduces both development and maintenance effort. If we were to
modify the thread scheduler, there would be many corner cases and
interactions introduced which may take some time to work out.  By
separating the responsibility into a different module, we can avoid having
to modify a very critical, central piece of code to add functionality for a
special case.  This also reduces CPU time since there are fewer threads to
be scheduled, and the scheduling algorithm itself doesn't become more
complicated with more information about I/O queue lengths, waiting times,
priorities, etc.  In the optimal case, the vhost threads would be run on
dedicated cores with little or no contention, rather than being
interspersed with VCPU threads or other Linux process threads.
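As a toy sketch of the model described above (illustrative Python, not the actual vhost code; the `worker_pass` name and the per-queue budget are invented for this example): one thread visits every queue of the guest in round-robin order, bounding how much work it takes from each queue per pass so a busy queue cannot starve the others.

```python
# Hypothetical simulation of a single vhost worker serving all I/O queues
# of one guest. Names and the budget value are illustrative only.

def worker_pass(queues, budget):
    """One scheduling pass: serve at most `budget` requests per queue."""
    served = []
    for q in queues:
        take = min(len(q), budget)   # bound work per queue per pass
        served.extend(q[:take])
        del q[:take]
    return served

# Three devices of the same guest, with uneven backlogs.
disk = ["d1", "d2", "d3", "d4", "d5"]
net  = ["n1"]
ctrl = ["c1", "c2"]

first = worker_pass([disk, net, ctrl], budget=2)
# Each pass interleaves work from every queue, so the busy disk queue
# cannot monopolize the thread.
```

The point of the sketch is only the interleaving: because the worker itself decides when to leave one queue for the next, it can use queue lengths and waiting times that the kernel's thread scheduler never sees.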

Joel

> Isn't the sane thing to do taking lessons from ELVIS improving existing
> pieces instead of bypassing them?  That way both the single VM and
> host-wide performance improves.  And as a bonus non-virtualization use
> cases may also benefit.
>
> Stefan
>


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Elvis upstreaming plan
  2013-11-27 15:00         ` Stefan Hajnoczi
  2013-11-27 15:30           ` Michael S. Tsirkin
  2013-11-28  7:24           ` Joel Nider
@ 2013-11-28  7:31           ` Abel Gordon
  2013-11-28 11:01             ` Michael S. Tsirkin
  2013-12-02 15:11             ` Stefan Hajnoczi
  2 siblings, 2 replies; 39+ messages in thread
From: Abel Gordon @ 2013-11-28  7:31 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: abel.gordon, Anthony Liguori, asias, digitaleric, Eran Raichstein,
	gleb, jasowang, Joel Nider, kvm, Michael S. Tsirkin, pbonzini,
	Razya Ladelsky



Stefan Hajnoczi <stefanha@gmail.com> wrote on 27/11/2013 05:00:53 PM:

> On Wed, Nov 27, 2013 at 09:43:33AM +0200, Joel Nider wrote:
> > Hi,
> >
> > Razya is out for a few days, so I will try to answer the questions as
well
> > as I can:
> >
> > "Michael S. Tsirkin" <mst@redhat.com> wrote on 26/11/2013 11:11:57 PM:
> >
> > > From: "Michael S. Tsirkin" <mst@redhat.com>
> > > To: Abel Gordon/Haifa/IBM@IBMIL,
> > > Cc: Anthony Liguori <anthony@codemonkey.ws>, abel.gordon@gmail.com,
> > > asias@redhat.com, digitaleric@google.com, Eran Raichstein/Haifa/
> > > IBM@IBMIL, gleb@redhat.com, jasowang@redhat.com, Joel Nider/Haifa/
> > > IBM@IBMIL, kvm@vger.kernel.org, pbonzini@redhat.com, Razya Ladelsky/
> > > Haifa/IBM@IBMIL
> > > Date: 27/11/2013 01:08 AM
> > > Subject: Re: Elvis upstreaming plan
> > >
> > > On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:
> > > >
> > > >
> > > > Anthony Liguori <anthony@codemonkey.ws> wrote on 26/11/2013
08:05:00
> > PM:
> > > >
> > > > >
> > > > > Razya Ladelsky <RAZYA@il.ibm.com> writes:
> > > > >
> > <edit>
> > > >
> > > > That's why we are proposing to implement a mechanism that will
enable
> > > > the management stack to configure 1 thread per I/O device (as it is
> > today)
> > > > or 1 thread for many I/O devices (belonging to the same VM).
> > > >
> > > > > Once you are scheduling multiple guests in a single vhost device,
you
> > > > > now create a whole new class of DoS attacks in the best case
> > scenario.
> > > >
> > > > Again, we are NOT proposing to schedule multiple guests in a single
> > > > vhost thread. We are proposing to schedule multiple devices
belonging
> > > > to the same guest in a single (or multiple) vhost thread/s.
> > > >
> > >
> > > I guess a question then becomes why have multiple devices?
> >
> > If you mean "why serve multiple devices from a single thread" the
answer is
> > that we cannot rely on the Linux scheduler which has no knowledge of
I/O
> > queues to do a decent job of scheduling I/O.  The idea is to take over
the
> > I/O scheduling responsibilities from the kernel's thread scheduler with
a
> > more efficient I/O scheduler inside each vhost thread.  So by combining
all
> > of the I/O devices from the same guest (disks, network cards, etc) in a
> > single I/O thread, it allows us to provide better scheduling by giving
us
> > more knowledge of the nature of the work.  So now instead of relying on
the
> > linux scheduler to perform context switches between multiple vhost
threads,
> > we have a single thread context in which we can do the I/O scheduling
more
> > efficiently.  We can closely monitor the performance needs of each
queue of
> > each device inside the vhost thread which gives us much more
information
> > than relying on the kernel's thread scheduler.
>
> And now there are 2 performance-critical pieces that need to be
> optimized/tuned instead of just 1:
>
> 1. Kernel infrastructure that QEMU and vhost use today but you decided
> to bypass.

We are NOT bypassing existing components. We are just changing the
threading model: instead of having one vhost thread per virtio device, we
propose to use 1 vhost thread to serve devices belonging to the same VM.
In addition, we propose to add new features such as polling.
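The polling feature mentioned here can be sketched as follows (hedged Python illustration, not the vhost implementation; the rate threshold and the function names are assumptions for the example): a virtqueue with a high recent I/O rate is checked by the vhost thread itself, while idle queues fall back to guest notifications ("kicks").

```python
# Illustrative sketch of virtqueue polling. The threshold value and the
# rate bookkeeping are invented for this example.

POLL_RATE_THRESHOLD = 100  # requests/sec above which we poll (assumed)

def should_poll(recent_rate):
    """Poll busy queues; idle queues rely on guest kicks."""
    return recent_rate >= POLL_RATE_THRESHOLD

def serve(queue, recent_rate, kick_pending):
    # A polled queue is checked unconditionally; otherwise the ring is
    # only examined when the guest has kicked us.
    if should_poll(recent_rate) or kick_pending:
        return len(queue)   # number of buffers we would process
    return 0
```

The trade-off the sketch captures: polling burns CPU on busy queues to avoid notification exits, while idle queues keep the cheaper kick-driven path.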

> 2. The new ELVIS code which only affects vhost devices in the same VM.

Existing vhost code (or any other user-space back-end) should also be
optimized/tuned if you care about performance.

>
> If you split the code paths it results in more effort in the long run
> and the benefit seems quite limited once you acknowledge that isolation
> is important.

Isolation is important, but the question is what isolation means.
I personally don't believe that 2 kernel threads provide more
isolation than 1 kernel thread that changes the mm (use_mm) and
avoids queue starvation.
Anyway, we propose to start with the simple approach (not sharing
threads across VMs), but once we show the value for this case we
can discuss whether it makes sense to extend the approach and share
threads between different VMs.
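The "simple approach" described here can be sketched as a small assignment rule (illustrative Python; the data structures and names are made up, not the vhost data model): devices may share a worker only when they belong to the same VM, so no cross-VM sharing ever occurs.

```python
# Hypothetical sketch of per-VM worker assignment: one shared worker per
# VM serves all of that VM's virtio devices; VMs never share a worker.

workers = {}   # VM owner -> list of devices served by that VM's worker

def attach_device(owner, dev):
    """Attach `dev` to the shared worker of `owner`'s VM, creating the
    worker on first use. Devices of different VMs never share a worker."""
    workers.setdefault(owner, []).append(dev)
    return owner  # key of the worker the device was assigned to

w1 = attach_device("vm-A", "virtio-net0")
w2 = attach_device("vm-A", "virtio-blk0")
w3 = attach_device("vm-B", "virtio-net0")
```

Because the worker key is the VM itself, the isolation properties Michael lists (cgroup accounting, one VM's stall not affecting another) are preserved by construction in this reduced model.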


> Isn't the sane thing to do taking lessons from ELVIS improving existing
> pieces instead of bypassing them?  That way both the single VM and
> host-wide performance improves.  And as a bonus non-virtualization use
> cases may also benefit.

The model we are proposing is specific to I/O virtualization... not sure
if it is applicable to bare-metal.

>
> Stefan
>


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Elvis upstreaming plan
  2013-11-27 22:33             ` Anthony Liguori
@ 2013-11-28  8:25               ` Abel Gordon
  0 siblings, 0 replies; 39+ messages in thread
From: Abel Gordon @ 2013-11-28  8:25 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: abel.gordon, asias, digitaleric, Eran Raichstein, gleb, jasowang,
	Joel Nider, kvm, Michael S. Tsirkin, pbonzini, Razya Ladelsky,
	Eyal Moscovici, Yossi Kuperman1



Anthony Liguori <anthony@codemonkey.ws> wrote on 28/11/2013 12:33:36 AM:

> From: Anthony Liguori <anthony@codemonkey.ws>
> To: Abel Gordon/Haifa/IBM@IBMIL, "Michael S. Tsirkin" <mst@redhat.com>,
> Cc: abel.gordon@gmail.com, asias@redhat.com, digitaleric@google.com,
> Eran Raichstein/Haifa/IBM@IBMIL, gleb@redhat.com,
> jasowang@redhat.com, Joel Nider/Haifa/IBM@IBMIL,
> kvm@vger.kernel.org, pbonzini@redhat.com, Razya Ladelsky/Haifa/IBM@IBMIL
> Date: 28/11/2013 12:33 AM
> Subject: Re: Elvis upstreaming plan
>
> Abel Gordon <ABELG@il.ibm.com> writes:
>
> > "Michael S. Tsirkin" <mst@redhat.com> wrote on 27/11/2013 12:27:19 PM:
> >
> >>
> >> On Wed, Nov 27, 2013 at 09:43:33AM +0200, Joel Nider wrote:
> >> > Hi,
> >> >
> >> > Razya is out for a few days, so I will try to answer the questions
as
> > well
> >> > as I can:
> >> >
> >> > "Michael S. Tsirkin" <mst@redhat.com> wrote on 26/11/2013 11:11:57
PM:
> >> >
> >> > > From: "Michael S. Tsirkin" <mst@redhat.com>
> >> > > To: Abel Gordon/Haifa/IBM@IBMIL,
> >> > > Cc: Anthony Liguori <anthony@codemonkey.ws>,
abel.gordon@gmail.com,
> >> > > asias@redhat.com, digitaleric@google.com, Eran Raichstein/Haifa/
> >> > > IBM@IBMIL, gleb@redhat.com, jasowang@redhat.com, Joel Nider/Haifa/
> >> > > IBM@IBMIL, kvm@vger.kernel.org, pbonzini@redhat.com, Razya
Ladelsky/
> >> > > Haifa/IBM@IBMIL
> >> > > Date: 27/11/2013 01:08 AM
> >> > > Subject: Re: Elvis upstreaming plan
> >> > >
> >> > > On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:
> >> > > >
> >> > > >
> >> > > > Anthony Liguori <anthony@codemonkey.ws> wrote on 26/11/2013
> > 08:05:00
> >> > PM:
> >> > > >
> >> > > > >
> >> > > > > Razya Ladelsky <RAZYA@il.ibm.com> writes:
> >> > > > >
> >> > <edit>
> >> > > >
> >> > > > That's why we are proposing to implement a mechanism that will
> > enable
> >> > > > the management stack to configure 1 thread per I/O device (as it
is
> >> > today)
> >> > > > or 1 thread for many I/O devices (belonging to the same VM).
> >> > > >
> >> > > > > Once you are scheduling multiple guests in a single vhost
device,
> > you
> >> > > > > now create a whole new class of DoS attacks in the best case
> >> > scenario.
> >> > > >
> >> > > > Again, we are NOT proposing to schedule multiple guests in a
single
> >> > > > vhost thread. We are proposing to schedule multiple devices
> > belonging
> >> > > > to the same guest in a single (or multiple) vhost thread/s.
> >> > > >
> >> > >
> >> > > I guess a question then becomes why have multiple devices?
> >> >
> >> > If you mean "why serve multiple devices from a single thread" the
> > answer is
> >> > that we cannot rely on the Linux scheduler which has no knowledge of
> > I/O
> >> > queues to do a decent job of scheduling I/O.  The idea is to take
over
> > the
> >> > I/O scheduling responsibilities from the kernel's thread scheduler
with
> > a
> >> > more efficient I/O scheduler inside each vhost thread.  So by
combining
> > all
> >> > of the I/O devices from the same guest (disks, network cards, etc)
in a
> >> > single I/O thread, it allows us to provide better scheduling by
giving
> > us
> >> > more knowledge of the nature of the work.  So now instead of relying
on
> > the
> >> > linux scheduler to perform context switches between multiple vhost
> > threads,
> >> > we have a single thread context in which we can do the I/O
scheduling
> > more
> >> > efficiently.  We can closely monitor the performance needs of each
> > queue of
> >> > each device inside the vhost thread which gives us much more
> > information
> >> > than relying on the kernel's thread scheduler.
> >> > This does not expose any additional opportunities for attacks (DoS
or
> >> > other) than are already available since all of the I/O traffic
belongs
> > to a
> >> > single guest.
> >> > You can make the argument that with low I/O loads this mechanism may
> > not
> >> > make much difference.  However when you try to maximize the
utilization
> > of
> >> > your hardware (such as in a commercial scenario) this technique can
> > gain
> >> > you a large benefit.
> >> >
> >> > Regards,
> >> >
> >> > Joel Nider
> >> > Virtualization Research
> >> > IBM Research and Development
> >> > Haifa Research Lab
> >>
> >> So all this would sound more convincing if we had sharing between VMs.
> >> When it's only a single VM it's somehow less convincing, isn't it?
> >> Of course if we would bypass a scheduler like this it becomes harder
to
> >> enforce cgroup limits.
> >
> > True, but here the issue becomes isolation/cgroups. We can start to
show
> > the value for VMs that have multiple devices / queues and then we could
> > re-consider extending the mechanism for multiple VMs (at least as a
> > experimental feature).
> >
> >> But it might be easier to give scheduler the info it needs to do what
we
> >> need.  Would an API that basically says "run this kthread right now"
> >> do the trick?
> >
> > ...do you really believe it would be possible to push this kind of
change
> > to the Linux scheduler ? In addition, we need more than
> > "run this kthread right now" because you need to monitor the virtio
> > ring activity to specify "when" you will like to run a "specific
kthread"
> > and for "how long".
>
> Paul Turner has a proposal for exactly this:
>
> http://www.linuxplumbersconf.org/2013/ocw/sessions/1653
>
> The video is up on Youtube I think. It definitely is a general problem
> that is not at all virtual I/O specific.

Interesting, thanks for sharing. If you have a link to concrete patches
or the YouTube video, please share. It's difficult to tell from the
abstract/slides alone whether the proposal considers all the requirements.

By the way, do you know what the feedback from the community was?

>
> Regards,
>
> Anthony Liguori
>
> >
> >>
> >> >
> >
> >> >
> >
> >> >
> >
> >> >  Phone: 972-4-829-6326 | Mobile: 972-54-3155635
> >> >  E-mail: JOELN@il.ibm.com
> >> >
> >
> >> >
> >
> >> >
> >> >
> >> >
> >> >
> >> > > > > > Hi all,
> >> > > > > >
> >> > > > > > I am Razya Ladelsky, I work at IBM Haifa virtualization
team,
> > which
> >> > > > > > developed Elvis, presented by Abel Gordon at the last KVM
> > forum:
> >> > > > > > ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs
> >> > > > > > ELVIS slides:
> >> > > > https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE
> >> > > > > >
> >> > > > > >
> >> > > > > > According to the discussions that took place at the forum,
> >> > upstreaming
> >> > > > > > some of the Elvis approaches seems to be a good idea, which
we
> >> > would
> >> > > > like
> >> > > > > > to pursue.
> >> > > > > >
> >> > > > > > Our plan for the first patches is the following:
> >> > > > > >
> >> > > > > > 1.Shared vhost thread between mutiple devices
> >> > > > > > This patch creates a worker thread and worker queue shared
> > across
> >> > > > multiple
> >> > > > > > virtio devices
> >> > > > > > We would like to modify the patch posted in
> >> > > > > > https://github.com/abelg/virtual_io_acceleration/commit/
> >> > > > > 3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766
> >> > > > > > to limit a vhost thread to serve multiple devices only if
they
> >> > belong
> >> > > > to
> >> > > > > > the same VM as Paolo suggested to avoid isolation or cgroups
> >> > concerns.
> >> > > > > >
> >> > > > > > Another modification is related to the creation and removal
of
> >> > vhost
> >> > > > > > threads, which will be discussed next.
> >> > > > >
> >> > > > > I think this is an exceptionally bad idea.
> >> > > > >
> >> > > > > We shouldn't throw away isolation without exhausting every
other
> >> > > > > possibility.
> >> > > >
> >> > > > Seems you have missed the important details here.
> >> > > > Anthony, we are aware you are concerned about isolation
> >> > > > and you believe we should not share a single vhost thread across
> >> > > > multiple VMs.  That's why Razya proposed to change the patch
> >> > > > so we will serve multiple virtio devices using a single vhost
> > thread
> >> > > > "only if the devices belong to the same VM". This series of
patches
> >> > > > will not allow two different VMs to share the same vhost thread.
> >> > > > So, I don't see why this will be throwing away isolation and why
> >> > > > this could be a "exceptionally bad idea".
> >> > > >
> >> > > > By the way, I remember that during the KVM forum a similar
> >> > > > approach of having a single data plane thread for many devices
> >> > > > was discussed....
> >> > > > > We've seen very positive results from adding threads.  We
should
> > also
> >> > > > > look at scheduling.
> >> > > >
> >> > > > ...and we have also seen exceptionally negative results from
> >> > > > adding threads, both for vhost and data-plane. If you have lot
of
> > idle
> >> > > > time/cores
> >> > > > then it makes sense to run multiple threads. But IMHO in many
> > scenarios
> >> > you
> >> > > > don't have lot of idle time/cores.. and if you have them you
would
> >> > probably
> >> > > > prefer to run more VMs/VCPUs....hosting a single SMP VM when you
> > have
> >> > > > enough physical cores to run all the VCPU threads and the I/O
> > threads
> >> > is
> >> > > > not a
> >> > > > realistic scenario.
> >> >
> >> > >
> >> > > > >
> >> > > > > > 2. Sysfs mechanism to add and remove vhost threads
> >> > > > > > This patch allows us to add and remove vhost threads
> > dynamically.
> >> > > > > >
> >> > > > > > A simpler way to control the creation of vhost threads is
> >> > statically
> >> > > > > > determining the maximum number of virtio devices per worker
via
> > a
> >> > > > kernel
> >> > > > > > module parameter (which is the way the previously mentioned
> > patch
> >> > is
> >> > > > > > currently implemented)
> >> > > > > >
> >> > > > > > I'd like to ask for advice here about the more preferable
way
> > to
> >> > go:
> >> > > > > > Although having the sysfs mechanism provides more
flexibility,
> > it
> >> > may
> >> > > > be a
> >> > > > > > good idea to start with a simple static parameter, and have
the
> >> > first
> >> > > > > > patches as simple as possible. What do you think?
> >> > > > > >
> >> > > > > > 3.Add virtqueue polling mode to vhost
> >> > > > > > Have the vhost thread poll the virtqueues with high I/O rate
> > for
> >> > new
> >> > > > > > buffers , and avoid asking the guest to kick us.
> >> > > > > > https://github.com/abelg/virtual_io_acceleration/commit/
> >> > > > > 26616133fafb7855cc80fac070b0572fd1aaf5d0
> >> > > > >
> >> > > > > Ack on this.
> >> > > >
> >> > > > :)
> >> > > >
> >> > > > Regards,
> >> > > > Abel.
> >> > > >
> >> > > > >
> >> > > > > Regards,
> >> > > > >
> >> > > > > Anthony Liguori
> >> > > > >
> >> > > > > > 4. vhost statistics
> >> > > > > > This patch introduces a set of statistics to monitor
different
> >> > > > performance
> >> > > > > > metrics of vhost and our polling and I/O scheduling
mechanisms.
> > The
> >> > > > > > statistics are exposed using debugfs and can be easily
> > displayed
> >> > with a
> >> > > >
> >> > > > > > Python script (vhost_stat, based on the old kvm_stats)
> >> > > > > > https://github.com/abelg/virtual_io_acceleration/commit/
> >> > > > > ac14206ea56939ecc3608dc5f978b86fa322e7b0
> >> > > > > >
> >> > > > > >
> >> > > > > > 5. Add heuristics to improve I/O scheduling
> >> > > > > > This patch enhances the round-robin mechanism with a set of
> >> > heuristics
> >> > > > to
> >> > > > > > decide when to leave a virtqueue and proceed to the next.
> >> > > > > > https://github.com/abelg/virtual_io_acceleration/commit/
> >> > > > > f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d
> >> > > > > >
> >> > > > > > This patch improves the handling of the requests by the
vhost
> >> > thread,
> >> > > > but
> >> > > > > > could perhaps be delayed to a
> >> > > > > > later time , and not submitted as one of the first Elvis
> > patches.
> >> > > > > > I'd love to hear some comments about whether this patch
needs
> > to be
> >> > > > part
> >> > > > > > of the first submission.
> >> > > > > >
> >> > > > > > Any other feedback on this plan will be appreciated,
> >> > > > > > Thank you,
> >> > > > > > Razya
> >> > > > >
> >> > >
> >>
> >>
>


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Elvis upstreaming plan
  2013-11-28  7:31           ` Abel Gordon
@ 2013-11-28 11:01             ` Michael S. Tsirkin
  2013-12-02 15:11             ` Stefan Hajnoczi
  1 sibling, 0 replies; 39+ messages in thread
From: Michael S. Tsirkin @ 2013-11-28 11:01 UTC (permalink / raw)
  To: Abel Gordon
  Cc: Stefan Hajnoczi, abel.gordon, Anthony Liguori, asias, digitaleric,
	Eran Raichstein, gleb, jasowang, Joel Nider, kvm, pbonzini,
	Razya Ladelsky

On Thu, Nov 28, 2013 at 09:31:50AM +0200, Abel Gordon wrote:
> Isolation is important but the question is what isolation means ?

Mostly two things:
- Count resource usage against the correct cgroups,
  and limit it as appropriate
- If one user does something silly and is blocked,
  another user isn't affected


-- 
MST

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Elvis upstreaming plan
  2013-11-28  7:31           ` Abel Gordon
  2013-11-28 11:01             ` Michael S. Tsirkin
@ 2013-12-02 15:11             ` Stefan Hajnoczi
  1 sibling, 0 replies; 39+ messages in thread
From: Stefan Hajnoczi @ 2013-12-02 15:11 UTC (permalink / raw)
  To: Abel Gordon
  Cc: abel.gordon, Anthony Liguori, asias, digitaleric, Eran Raichstein,
	gleb, jasowang, Joel Nider, kvm, Michael S. Tsirkin, pbonzini,
	Razya Ladelsky

On Thu, Nov 28, 2013 at 09:31:50AM +0200, Abel Gordon wrote:
> 
> 
> Stefan Hajnoczi <stefanha@gmail.com> wrote on 27/11/2013 05:00:53 PM:
> 
> > On Wed, Nov 27, 2013 at 09:43:33AM +0200, Joel Nider wrote:
> > > Hi,
> > >
> > > Razya is out for a few days, so I will try to answer the questions as
> well
> > > as I can:
> > >
> > > "Michael S. Tsirkin" <mst@redhat.com> wrote on 26/11/2013 11:11:57 PM:
> > >
> > > > From: "Michael S. Tsirkin" <mst@redhat.com>
> > > > To: Abel Gordon/Haifa/IBM@IBMIL,
> > > > Cc: Anthony Liguori <anthony@codemonkey.ws>, abel.gordon@gmail.com,
> > > > asias@redhat.com, digitaleric@google.com, Eran Raichstein/Haifa/
> > > > IBM@IBMIL, gleb@redhat.com, jasowang@redhat.com, Joel Nider/Haifa/
> > > > IBM@IBMIL, kvm@vger.kernel.org, pbonzini@redhat.com, Razya Ladelsky/
> > > > Haifa/IBM@IBMIL
> > > > Date: 27/11/2013 01:08 AM
> > > > Subject: Re: Elvis upstreaming plan
> > > >
> > > > On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:
> > > > >
> > > > >
> > > > > Anthony Liguori <anthony@codemonkey.ws> wrote on 26/11/2013
> 08:05:00
> > > PM:
> > > > >
> > > > > >
> > > > > > Razya Ladelsky <RAZYA@il.ibm.com> writes:
> > > > > >
> > > <edit>
> > > > >
> > > > > That's why we are proposing to implement a mechanism that will
> enable
> > > > > the management stack to configure 1 thread per I/O device (as it is
> > > today)
> > > > > or 1 thread for many I/O devices (belonging to the same VM).
> > > > >
> > > > > > Once you are scheduling multiple guests in a single vhost device,
> you
> > > > > > now create a whole new class of DoS attacks in the best case
> > > scenario.
> > > > >
> > > > > Again, we are NOT proposing to schedule multiple guests in a single
> > > > > vhost thread. We are proposing to schedule multiple devices
> belonging
> > > > > to the same guest in a single (or multiple) vhost thread/s.
> > > > >
> > > >
> > > > I guess a question then becomes why have multiple devices?
> > >
> > > If you mean "why serve multiple devices from a single thread" the
> answer is
> > > that we cannot rely on the Linux scheduler which has no knowledge of
> I/O
> > > queues to do a decent job of scheduling I/O.  The idea is to take over
> the
> > > I/O scheduling responsibilities from the kernel's thread scheduler with
> a
> > > more efficient I/O scheduler inside each vhost thread.  So by combining
> all
> > > of the I/O devices from the same guest (disks, network cards, etc) in a
> > > single I/O thread, it allows us to provide better scheduling by giving
> us
> > > more knowledge of the nature of the work.  So now instead of relying on
> the
> > > linux scheduler to perform context switches between multiple vhost
> threads,
> > > we have a single thread context in which we can do the I/O scheduling
> more
> > > efficiently.  We can closely monitor the performance needs of each
> queue of
> > > each device inside the vhost thread which gives us much more
> information
> > > than relying on the kernel's thread scheduler.
> >
> > And now there are 2 performance-critical pieces that need to be
> > optimized/tuned instead of just 1:
> >
> > 1. Kernel infrastructure that QEMU and vhost use today but you decided
> > to bypass.
> 
> We are NOT bypassing existing components. We are just changing the
> threading
> model: instead of having one vhost-thread per virtio device, we propose to
> use
> 1 vhost thread to server devices belonging to the same VM. In addition, we
> propose to add new features such as polling.

What I meant with "bypassing" is that reducing scope to single VMs
leaves multi-VM performance unchanged.  I know the original aim was to
improve multi-VM performance too and I hope that will be possible by
extending the current approach.

Stefan

^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2013-12-02 15:11 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-11-24  9:22 Elvis upstreaming plan Razya Ladelsky
2013-11-24 10:26 ` Michael S. Tsirkin
2013-11-25 11:06   ` Razya Ladelsky
2013-11-26 15:50 ` Stefan Hajnoczi
2013-11-26 18:05 ` Anthony Liguori
2013-11-26 18:53   ` Abel Gordon
2013-11-26 21:11     ` Michael S. Tsirkin
2013-11-27  7:43       ` Joel Nider
2013-11-27 10:27         ` Michael S. Tsirkin
2013-11-27 10:41           ` Abel Gordon
2013-11-27 10:59             ` Michael S. Tsirkin
2013-11-27 11:02               ` Abel Gordon
2013-11-27 11:36                 ` Michael S. Tsirkin
2013-11-27 22:33             ` Anthony Liguori
2013-11-28  8:25               ` Abel Gordon
2013-11-27 15:00         ` Stefan Hajnoczi
2013-11-27 15:30           ` Michael S. Tsirkin
2013-11-28  7:24           ` Joel Nider
2013-11-28  7:31           ` Abel Gordon
2013-11-28 11:01             ` Michael S. Tsirkin
2013-12-02 15:11             ` Stefan Hajnoczi
2013-11-27  9:03       ` Abel Gordon
2013-11-27  9:21         ` Michael S. Tsirkin
2013-11-27  9:49           ` Abel Gordon
2013-11-27 10:29             ` Michael S. Tsirkin
2013-11-27 10:55               ` Abel Gordon
2013-11-27 11:03                 ` Michael S. Tsirkin
2013-11-27 11:05                   ` Abel Gordon
2013-11-27 11:40                     ` Michael S. Tsirkin
2013-11-26 22:27 ` Bandan Das
2013-11-27  2:49 ` Jason Wang
2013-11-27  7:35   ` Gleb Natapov
2013-11-27  7:45     ` Joel Nider
2013-11-27  9:18     ` Abel Gordon
2013-11-27  9:21       ` Gleb Natapov
2013-11-27  9:33         ` Abel Gordon
2013-11-27  9:48           ` Gleb Natapov
2013-11-27 10:18   ` Abel Gordon
2013-11-27 10:37     ` Michael S. Tsirkin
