Re: Elvis upstreaming plan - Michael S. Tsirkin

From: "Michael S. Tsirkin" <mst@redhat.com>
To: Joel Nider <JOELN@il.ibm.com>
Cc: Abel Gordon <ABELG@il.ibm.com>,
	abel.gordon@gmail.com, Anthony Liguori <anthony@codemonkey.ws>,
	asias@redhat.com, digitaleric@google.com,
	Eran Raichstein <ERANRA@il.ibm.com>,
	gleb@redhat.com, jasowang@redhat.com, kvm@vger.kernel.org,
	pbonzini@redhat.com, Razya Ladelsky <RAZYA@il.ibm.com>
Subject: Re: Elvis upstreaming plan
Date: Wed, 27 Nov 2013 12:27:19 +0200
Message-ID: <20131127102719.GC29446@redhat.com>
In-Reply-To: <OF0078B532.190A1E2C-ON00257C30.00336E0F-C2257C30.002A74D2@il.ibm.com>

On Wed, Nov 27, 2013 at 09:43:33AM +0200, Joel Nider wrote:
> Hi,
> 
> Razya is out for a few days, so I will try to answer the questions as well
> as I can:
> 
> "Michael S. Tsirkin" <mst@redhat.com> wrote on 26/11/2013 11:11:57 PM:
> 
> > From: "Michael S. Tsirkin" <mst@redhat.com>
> > To: Abel Gordon/Haifa/IBM@IBMIL,
> > Cc: Anthony Liguori <anthony@codemonkey.ws>, abel.gordon@gmail.com,
> > asias@redhat.com, digitaleric@google.com, Eran Raichstein/Haifa/
> > IBM@IBMIL, gleb@redhat.com, jasowang@redhat.com, Joel Nider/Haifa/
> > IBM@IBMIL, kvm@vger.kernel.org, pbonzini@redhat.com, Razya Ladelsky/
> > Haifa/IBM@IBMIL
> > Date: 27/11/2013 01:08 AM
> > Subject: Re: Elvis upstreaming plan
> >
> > On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:
> > >
> > >
> > > Anthony Liguori <anthony@codemonkey.ws> wrote on 26/11/2013 08:05:00 PM:
> > >
> > > >
> > > > Razya Ladelsky <RAZYA@il.ibm.com> writes:
> > > >
> <edit>
> > >
> > > That's why we are proposing to implement a mechanism that will enable
> > > the management stack to configure 1 thread per I/O device (as it is today)
> > > or 1 thread for many I/O devices (belonging to the same VM).
> > >
> > > > Once you are scheduling multiple guests in a single vhost device, you
> > > > now create a whole new class of DoS attacks in the best case scenario.
> > >
> > > Again, we are NOT proposing to schedule multiple guests in a single
> > > vhost thread. We are proposing to schedule multiple devices belonging
> > > to the same guest in a single (or multiple) vhost thread/s.
> > >
> >
> > I guess a question then becomes why have multiple devices?
> 
> If you mean "why serve multiple devices from a single thread" the answer is
> that we cannot rely on the Linux scheduler which has no knowledge of I/O
> queues to do a decent job of scheduling I/O.  The idea is to take over the
> I/O scheduling responsibilities from the kernel's thread scheduler with a
> more efficient I/O scheduler inside each vhost thread.  So by combining all
> of the I/O devices from the same guest (disks, network cards, etc) in a
> single I/O thread, it allows us to provide better scheduling by giving us
> more knowledge of the nature of the work.  So now instead of relying on the
> Linux scheduler to perform context switches between multiple vhost threads,
> we have a single thread context in which we can do the I/O scheduling more
> efficiently.  We can closely monitor the performance needs of each queue of
> each device inside the vhost thread which gives us much more information
> than relying on the kernel's thread scheduler.
> This does not expose any additional opportunities for attacks (DoS or
> other) than are already available since all of the I/O traffic belongs to a
> single guest.
> You can make the argument that with low I/O loads this mechanism may not
> make much difference.  However when you try to maximize the utilization of
> your hardware (such as in a commercial scenario) this technique can gain
> you a large benefit.
> 
> Regards,
> 
> Joel Nider
> Virtualization Research
> IBM Research and Development
> Haifa Research Lab

So all this would sound more convincing if we had sharing between VMs.
When it's only a single VM it's somehow less convincing, isn't it?
Of course, if we bypass the scheduler like this, it becomes harder to
enforce cgroup limits.
But it might be easier to give the scheduler the info it needs to do what we
need.  Would an API that basically says "run this kthread right now"
do the trick?
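
For reference, the scheme under discussion (one worker serving all the
virtqueues of a single guest, visited round-robin) can be modeled with a
small user-space sketch. This is not the Elvis patch and not vhost code:
VirtQueue, SharedWorker, attach and the per-visit budget are hypothetical
names chosen for illustration, and the fixed budget merely stands in for
the "when to leave a virtqueue" heuristics mentioned in the plan.

```python
from collections import deque


class VirtQueue:
    """Toy stand-in for a virtqueue: a FIFO of pending requests (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.pending = deque()


class SharedWorker:
    """One worker serving every queue of a single guest (illustrative model)."""

    def __init__(self, vm_id, budget=2):
        self.vm_id = vm_id      # the only guest this worker may serve
        self.budget = budget    # assumed cap on requests handled per visit
        self.queues = []

    def attach(self, vm_id, queue):
        # Refuse queues from other guests, so sharing never crosses a VM boundary.
        if vm_id != self.vm_id:
            raise ValueError("worker is bound to a different VM")
        self.queues.append(queue)

    def run(self):
        """Visit queues round-robin until all are drained; return service order."""
        served = []
        while any(q.pending for q in self.queues):
            for q in self.queues:
                for _ in range(self.budget):
                    if not q.pending:
                        break
                    served.append((q.name, q.pending.popleft()))
        return served


worker = SharedWorker(vm_id="vm0", budget=2)
net, blk = VirtQueue("net0"), VirtQueue("blk0")
worker.attach("vm0", net)
worker.attach("vm0", blk)
net.pending.extend(["n1", "n2", "n3"])
blk.pending.extend(["b1"])
order = worker.run()
print(order)
```

With a budget of 2 the busy net0 queue cannot starve blk0: the worker
handles two net requests, then the block request, then returns for the
remaining net request, all without a context switch between threads.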


> > > > > Hi all,
> > > > >
> > > > > I am Razya Ladelsky, I work at IBM Haifa virtualization team, which
> > > > > developed Elvis, presented by Abel Gordon at the last KVM forum:
> > > > > ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs
> > > > > > ELVIS slides: https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE
> > > > >
> > > > >
> > > > > > According to the discussions that took place at the forum, upstreaming
> > > > > > some of the Elvis approaches seems to be a good idea, which we would like
> > > > > > to pursue.
> > > > >
> > > > > Our plan for the first patches is the following:
> > > > >
> > > > > > 1. Shared vhost thread between multiple devices
> > > > > > This patch creates a worker thread and worker queue shared across multiple
> > > > > > virtio devices.
> > > > > > We would like to modify the patch posted in
> > > > > > https://github.com/abelg/virtual_io_acceleration/commit/3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766
> > > > > > to limit a vhost thread to serve multiple devices only if they belong to
> > > > > > the same VM, as Paolo suggested, to avoid isolation or cgroups concerns.
> > > > > >
> > > > > > Another modification is related to the creation and removal of vhost
> > > > > > threads, which will be discussed next.
> > > >
> > > > I think this is an exceptionally bad idea.
> > > >
> > > > We shouldn't throw away isolation without exhausting every other
> > > > possibility.
> > >
> > > Seems you have missed the important details here.
> > > Anthony, we are aware you are concerned about isolation
> > > and you believe we should not share a single vhost thread across
> > > multiple VMs.  That's why Razya proposed to change the patch
> > > so we will serve multiple virtio devices using a single vhost thread
> > > "only if the devices belong to the same VM". This series of patches
> > > will not allow two different VMs to share the same vhost thread.
> > > So, I don't see why this will be throwing away isolation and why
> > > this could be an "exceptionally bad idea".
> > >
> > > By the way, I remember that during the KVM forum a similar
> > > approach of having a single data plane thread for many devices
> > > was discussed....
> > > > We've seen very positive results from adding threads.  We should also
> > > > look at scheduling.
> > >
> > > ...and we have also seen exceptionally negative results from
> > > adding threads, both for vhost and data-plane. If you have a lot of idle
> > > time/cores then it makes sense to run multiple threads. But IMHO in many
> > > scenarios you don't have a lot of idle time/cores, and if you have them you
> > > would probably prefer to run more VMs/VCPUs. Hosting a single SMP VM when you
> > > have enough physical cores to run all the VCPU threads and the I/O threads is
> > > not a realistic scenario.
> 
> >
> > > >
> > > > > > 2. Sysfs mechanism to add and remove vhost threads
> > > > > > This patch allows us to add and remove vhost threads dynamically.
> > > > > >
> > > > > > A simpler way to control the creation of vhost threads is statically
> > > > > > determining the maximum number of virtio devices per worker via a kernel
> > > > > > module parameter (which is the way the previously mentioned patch is
> > > > > > currently implemented).
> > > > > >
> > > > > > I'd like to ask for advice here about the preferable way to go:
> > > > > > Although having the sysfs mechanism provides more flexibility, it may be a
> > > > > > good idea to start with a simple static parameter, and have the first
> > > > > > patches as simple as possible. What do you think?
> > > > > >
> > > > > > 3. Add virtqueue polling mode to vhost
> > > > > > Have the vhost thread poll the virtqueues with high I/O rate for new
> > > > > > buffers, and avoid asking the guest to kick us.
> > > > > > https://github.com/abelg/virtual_io_acceleration/commit/26616133fafb7855cc80fac070b0572fd1aaf5d0
> > > >
> > > > Ack on this.
> > >
> > > :)
> > >
> > > Regards,
> > > Abel.
> > >
> > > >
> > > > Regards,
> > > >
> > > > Anthony Liguori
> > > >
> > > > > > 4. vhost statistics
> > > > > > This patch introduces a set of statistics to monitor different performance
> > > > > > metrics of vhost and our polling and I/O scheduling mechanisms. The
> > > > > > statistics are exposed using debugfs and can be easily displayed with a
> > > > > > Python script (vhost_stat, based on the old kvm_stats)
> > > > > > https://github.com/abelg/virtual_io_acceleration/commit/ac14206ea56939ecc3608dc5f978b86fa322e7b0
> > > > > >
> > > > > >
> > > > > > 5. Add heuristics to improve I/O scheduling
> > > > > > This patch enhances the round-robin mechanism with a set of heuristics to
> > > > > > decide when to leave a virtqueue and proceed to the next.
> > > > > > https://github.com/abelg/virtual_io_acceleration/commit/f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d
> > > > > >
> > > > > > This patch improves the handling of the requests by the vhost thread, but
> > > > > > could perhaps be delayed to a later time, and not submitted as one of the
> > > > > > first Elvis patches.
> > > > > > I'd love to hear some comments about whether this patch needs to be part
> > > > > > of the first submission.
> > > > >
> > > > > Any other feedback on this plan will be appreciated,
> > > > > Thank you,
> > > > > Razya
> > > >
> >
