From: "Michael S. Tsirkin" <mst@redhat.com>
To: Abel Gordon <ABELG@il.ibm.com>
Cc: abel.gordon@gmail.com, Anthony Liguori <anthony@codemonkey.ws>,
	asias@redhat.com, digitaleric@google.com,
	Eran Raichstein <ERANRA@il.ibm.com>,
	gleb@redhat.com, jasowang@redhat.com,
	Joel Nider <JOELN@il.ibm.com>,
	kvm@vger.kernel.org, pbonzini@redhat.com,
	Razya Ladelsky <RAZYA@il.ibm.com>,
	Eyal Moscovici <EYALMO@il.ibm.com>,
	Yossi Kuperman1 <YOSSIKU@il.ibm.com>
Subject: Re: Elvis upstreaming plan
Date: Wed, 27 Nov 2013 13:03:25 +0200
Message-ID: <20131127110325.GE29702@redhat.com>
In-Reply-To: <OF9E1F0B4F.4188F1F7-ONC2257C30.003AF3C9-C2257C30.003BFA60@il.ibm.com>

On Wed, Nov 27, 2013 at 12:55:07PM +0200, Abel Gordon wrote:
> 
> 
> "Michael S. Tsirkin" <mst@redhat.com> wrote on 27/11/2013 12:29:43 PM:
> 
> >
> > On Wed, Nov 27, 2013 at 11:49:03AM +0200, Abel Gordon wrote:
> > >
> > >
> > > "Michael S. Tsirkin" <mst@redhat.com> wrote on 27/11/2013 11:21:00 AM:
> > >
> > > >
> > > > On Wed, Nov 27, 2013 at 11:03:57AM +0200, Abel Gordon wrote:
> > > > >
> > > > >
> > > > > "Michael S. Tsirkin" <mst@redhat.com> wrote on 26/11/2013 11:11:57
> PM:
> > > > >
> > > > > > On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:
> > > > > > >
> > > > > > >
> > > > > > > > Anthony Liguori <anthony@codemonkey.ws> wrote on 26/11/2013 08:05:00 PM:
> > > > > > >
> > > > > > > >
> > > > > > > > Razya Ladelsky <RAZYA@il.ibm.com> writes:
> > > > > > > >
> > > > > > > > > Hi all,
> > > > > > > > >
> > > > > > > > > > I am Razya Ladelsky. I work in the IBM Haifa virtualization team, which
> > > > > > > > > > developed Elvis, presented by Abel Gordon at the last KVM forum:
> > > > > > > > > > ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs
> > > > > > > > > > ELVIS slides: https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > According to the discussions that took place at the forum, upstreaming
> > > > > > > > > > some of the Elvis approaches seems to be a good idea, which we would like
> > > > > > > > > > to pursue.
> > > > > > > > >
> > > > > > > > > Our plan for the first patches is the following:
> > > > > > > > >
> > > > > > > > > > 1. Shared vhost thread between multiple devices
> > > > > > > > > > This patch creates a worker thread and worker queue shared across multiple
> > > > > > > > > > virtio devices.
> > > > > > > > > > We would like to modify the patch posted in
> > > > > > > > > > https://github.com/abelg/virtual_io_acceleration/commit/3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766
> > > > > > > > > > to limit a vhost thread to serve multiple devices only if they belong to
> > > > > > > > > > the same VM, as Paolo suggested, to avoid isolation or cgroups concerns.
> > > > > > > > >
> > > > > > > > > > Another modification is related to the creation and removal of vhost
> > > > > > > > > > threads, which will be discussed next.
> > > > > > > >
> > > > > > > > I think this is an exceptionally bad idea.
> > > > > > > >
> > > > > > > > > We shouldn't throw away isolation without exhausting every other
> > > > > > > > > possibility.
> > > > > > >
> > > > > > > > Seems you have missed the important details here.
> > > > > > > > Anthony, we are aware you are concerned about isolation
> > > > > > > > and you believe we should not share a single vhost thread across
> > > > > > > > multiple VMs.  That's why Razya proposed to change the patch
> > > > > > > > so we will serve multiple virtio devices using a single vhost thread
> > > > > > > > "only if the devices belong to the same VM". This series of patches
> > > > > > > > will not allow two different VMs to share the same vhost thread.
> > > > > > > > So, I don't see why this would be throwing away isolation and why
> > > > > > > > this could be an "exceptionally bad idea".
> > > > > > >
> > > > > > > By the way, I remember that during the KVM forum a similar
> > > > > > > approach of having a single data plane thread for many devices
> > > > > > > was discussed....
> > > > > > > > > We've seen very positive results from adding threads.  We should also
> > > > > > > > > look at scheduling.
> > > > > > >
> > > > > > > > ...and we have also seen exceptionally negative results from
> > > > > > > > adding threads, both for vhost and data-plane. If you have a lot of idle
> > > > > > > > time/cores then it makes sense to run multiple threads. But IMHO in many
> > > > > > > > scenarios you don't have a lot of idle time/cores, and if you have them you
> > > > > > > > would probably prefer to run more VMs/VCPUs. Hosting a single SMP VM when you
> > > > > > > > have enough physical cores to run all the VCPU threads and the I/O threads is
> > > > > > > > not a realistic scenario.
> > > > > > >
> > > > > > > > That's why we are proposing to implement a mechanism that will enable
> > > > > > > > the management stack to configure 1 thread per I/O device (as it is today)
> > > > > > > > or 1 thread for many I/O devices (belonging to the same VM).
> > > > > > >
> > > > > > > > > Once you are scheduling multiple guests in a single vhost device, you
> > > > > > > > > now create a whole new class of DoS attacks in the best case scenario.
> > > > > > >
> > > > > > > > Again, we are NOT proposing to schedule multiple guests in a single
> > > > > > > > vhost thread. We are proposing to schedule multiple devices belonging
> > > > > > > > to the same guest in a single (or multiple) vhost thread/s.
> > > > > > >
> > > > > >
> > > > > > I guess a question then becomes why have multiple devices?
> > > > >
> > > > > I assume that there are guests that have multiple vhost devices
> > > > > (net or scsi/tcm).
> > > >
> > > > These are kind of uncommon though.  In fact a kernel thread is not a
> > > > unit of isolation - cgroups supply isolation.
> > > > > If we had use_cgroups kind of like use_mm, we could conceivably
> > > > > do work for multiple VMs on the same thread.
> > > >
> > > >
> > > > > > We can also extend the approach to consider
> > > > > > multiqueue devices, so we can create 1 vhost thread shared for all the
> > > > > > queues, 1 vhost thread for each queue or a few threads for multiple queues.
> > > > > > We could also share a thread across multiple queues even if they do not
> > > > > > belong to the same device.
> > > > >
> > > > > Remember the experiments Shirley Ma did with the split
> > > > > > tx/rx? If we have a control interface we could support both
> > > > > approaches: different threads or a single thread.
> > > >
> > > >
> > > > I'm a bit concerned about interface managing specific
> > > > threads being so low level.
> > > > What exactly is it that management knows that makes it
> > > > efficient to group threads together?
> > > > That host is over-committed so we should use less CPU?
> > > > I'd like the interface to express that knowledge.
> > > >
> > >
> > > We can expose information such as the amount of I/O being
> > > handled for each queue, the amount of CPU cycles consumed for
> > > processing the I/O, latency and more.
> > > If we start with a simple mechanism that just enables the
> > > feature we can later expose more information to implement a policy
> > > framework that will be responsible for taking the decisions
> > > (the orchestration part).
> >
> > What kind of possible policies do you envision?
> > If we just react to load by balancing the work done,
> > and when over-committed anyway, localize work so
> > we get less IPIs, then this is not policy, this is the mechanism.
> 
> (CCing Eyal Moscovici who is actually prototyping with multiple
> policies and may want to join this thread)
> 
> Starting with basic policies: we can use a single vhost thread
> and create new vhost threads if it becomes saturated and there
> are enough cpu cycles available in the system
> or if the latency (how long the requests in the virtio queues wait
> until they are handled) is too high.
> We can merge threads if the latency is already low or if the threads
> are not saturated.
> 
> There is a hidden trade-off here: when you run more vhost threads you
> may actually be stealing cpu cycles from the vcpu threads and also
> increasing context switches. So, from the vhost perspective it may
> improve performance but from the vcpu threads perspective it may
> degrade performance.

So this is a very interesting problem to solve but what does
management know that suggests it can solve it better?
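
As a rough illustration of the basic policy quoted above (start with one
vhost thread, add threads on saturation or high latency, merge them back
otherwise), here is a minimal Python sketch. The thresholds, the
WorkerStats fields and the decide() helper are all assumptions made for
illustration; nothing in this thread specifies them, and the sketch says
nothing about who (kernel or management) should run such a policy.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WorkerStats:
    utilization: float       # fraction of time the thread was busy (0.0 to 1.0)
    queue_latency_us: float  # mean wait of requests in its virtqueues


# Hypothetical thresholds; none of these values come from the thread.
SATURATED = 0.90
MIN_IDLE_CPU = 0.25
HIGH_LATENCY_US = 200.0
LOW_LATENCY_US = 50.0


def decide(workers: List[WorkerStats], idle_cpu: float) -> str:
    """Return "split", "merge" or "keep" for one VM's vhost threads."""
    saturated = any(w.utilization >= SATURATED for w in workers)
    worst_latency = max(w.queue_latency_us for w in workers)

    # Add a thread when one is saturated and the host has spare cycles,
    # or when requests wait too long in the queues.
    if (saturated and idle_cpu >= MIN_IDLE_CPU) or worst_latency > HIGH_LATENCY_US:
        return "split"

    # Merge when latency is already low or no thread is saturated.
    if len(workers) > 1 and (worst_latency <= LOW_LATENCY_US or not saturated):
        return "merge"

    return "keep"
```

The split check runs first, so a saturated thread on a host with no spare
cycles stays as it is rather than taking cycles from the vcpu threads,
which is the trade-off described in the quoted text.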

> >
> >
> > >
> > > > > >
> > > > > >
> > > > > > > >
> > > > > > > > > > 2. Sysfs mechanism to add and remove vhost threads
> > > > > > > > > > This patch allows us to add and remove vhost threads dynamically.
> > > > > > > > >
> > > > > > > > > > A simpler way to control the creation of vhost threads is statically
> > > > > > > > > > determining the maximum number of virtio devices per worker via a kernel
> > > > > > > > > > module parameter (which is the way the previously mentioned patch is
> > > > > > > > > > currently implemented).
> > > > > > > > >
> > > > > > > > > > I'd like to ask for advice here about the preferable way to go:
> > > > > > > > > > Although having the sysfs mechanism provides more flexibility, it may be a
> > > > > > > > > > good idea to start with a simple static parameter, and keep the first
> > > > > > > > > > patches as simple as possible. What do you think?
> > > > > > > > >
> > > > > > > > > > 3. Add virtqueue polling mode to vhost
> > > > > > > > > > Have the vhost thread poll the virtqueues with high I/O rate for new
> > > > > > > > > > buffers, and avoid asking the guest to kick us.
> > > > > > > > > > https://github.com/abelg/virtual_io_acceleration/commit/26616133fafb7855cc80fac070b0572fd1aaf5d0
> > > > > > > >
> > > > > > > > Ack on this.
> > > > > > >
> > > > > > > :)
> > > > > > >
> > > > > > > Regards,
> > > > > > > Abel.
> > > > > > >
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > >
> > > > > > > > Anthony Liguori
> > > > > > > >
> > > > > > > > > > 4. vhost statistics
> > > > > > > > > > This patch introduces a set of statistics to monitor different performance
> > > > > > > > > > metrics of vhost and our polling and I/O scheduling mechanisms. The
> > > > > > > > > > statistics are exposed using debugfs and can be easily displayed with a
> > > > > > > > > > Python script (vhost_stat, based on the old kvm_stats)
> > > > > > > > > > https://github.com/abelg/virtual_io_acceleration/commit/ac14206ea56939ecc3608dc5f978b86fa322e7b0
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > 5. Add heuristics to improve I/O scheduling
> > > > > > > > > > This patch enhances the round-robin mechanism with a set of heuristics to
> > > > > > > > > > decide when to leave a virtqueue and proceed to the next.
> > > > > > > > > > https://github.com/abelg/virtual_io_acceleration/commit/f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d
> > > > > > > > >
> > > > > > > > > > This patch improves the handling of the requests by the vhost thread, but
> > > > > > > > > > could perhaps be delayed to a later time, and not submitted as one of the
> > > > > > > > > > first Elvis patches.
> > > > > > > > > > I'd love to hear some comments about whether this patch needs to be part
> > > > > > > > > > of the first submission.
> > > > > > > > >
> > > > > > > > > Any other feedback on this plan will be appreciated,
> > > > > > > > > Thank you,
> > > > > > > > > Razya
> > > > > > > >
> > > > > >
> > > >
> >
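
Item 5 of the quoted plan describes a shared worker that visits
virtqueues round-robin and uses heuristics to decide when to leave one
queue for the next. The actual Elvis heuristics are not described in this
thread, so the sketch below substitutes the simplest possible rule: a
fixed per-visit budget. The service_round_robin() helper, the budget
parameter and the queue names are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List


def service_round_robin(queues: Dict[str, Deque[int]], budget: int) -> List[str]:
    """Drain all queues, handling at most `budget` requests per visit.

    Returns the order in which requests were handled, as "name:request".
    """
    order: List[str] = []
    ring = deque(queues)  # queue names in visiting order
    while ring:
        name = ring.popleft()
        q = queues[name]
        handled = 0
        # Stay on this queue until it is empty or the budget is spent.
        while q and handled < budget:
            order.append(f"{name}:{q.popleft()}")
            handled += 1
        if q:
            ring.append(name)  # work left over: revisit after the others
    return order


pending = {"net": deque([1, 2, 3]), "blk": deque([1])}
print(service_round_robin(pending, budget=2))
# prints ['net:1', 'net:2', 'blk:1', 'net:3']
```

With a budget of 2, the busy "net" queue cannot starve "blk": the worker
leaves it after two requests and comes back once the other queue has had
its turn.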
