From: "Michael S. Tsirkin" <mst@redhat.com>
To: Abel Gordon <abel.gordon@gmail.com>
Cc: Razya Ladelsky <RAZYA@il.ibm.com>,
Anthony Liguori <anthony@codemonkey.ws>,
asias@redhat.com, digitaleric@google.com,
Eran Raichstein <ERANRA@il.ibm.com>,
gleb@redhat.com, jasowang@redhat.com,
Joel Nider <JOELN@il.ibm.com>,
kvm@vger.kernel.org, kvm-owner@vger.kernel.org,
pbonzini@redhat.com, Stefan Hajnoczi <stefanha@gmail.com>,
Yossi Kuperman1 <YOSSIKU@il.ibm.com>,
Eyal Moscovici <EYALMO@il.ibm.com>,
bsd@redhat.com
Subject: Re: Updated Elvis Upstreaming Roadmap
Date: Thu, 19 Dec 2013 12:13:28 +0200 [thread overview]
Message-ID: <20131219101328.GA1853@redhat.com> (raw)
In-Reply-To: <CA+OY2tvjU_GR17MSor2JrKN-eWuumfQwaDfJV-mxeYo8cxOpPQ@mail.gmail.com>
On Thu, Dec 19, 2013 at 08:40:44AM +0200, Abel Gordon wrote:
> On Wed, Dec 18, 2013 at 12:43 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Tue, Dec 17, 2013 at 12:04:42PM +0200, Razya Ladelsky wrote:
> >> Hi,
> >>
> >> Thank you all for your comments.
> >> I'm sorry for taking this long to reply, I was away on vacation..
> >>
> >> It was a good, long discussion, many issues were raised, which we'd like
> >> to address with the following proposed roadmap for Elvis patches.
> >> In general, we believe it would be best to start with patches that are
> >> as simple as possible, providing the basic Elvis functionality,
> >> and attend to the more complicated issues in subsequent patches.
> >>
> >> Here's the road map for Elvis patches:
> >
> > Thanks for the follow up. Some suggestions below.
> > Please note the suggestions below merely represent
> > thoughts on merging upstream.
> > If as the first step you are content with keeping this
> > work as out of tree patches, in order to have
> > the freedom to experiment with interfaces and
> > performance, please feel free to ignore them.
> >
> >> 1. Shared vhost thread for multiple devices.
> >>
> >> The way to go here, we believe, is to start with a patch having a shared
> >> vhost thread for multiple devices of the SAME vm.
> >> The next step/patch may be handling vms belonging to the same cgroup.
> >>
> >> Finally, we need to extend the functionality so that the shared vhost
> >> thread serves multiple vms (not necessarily belonging to the same cgroup).
> >>
> >> There was a lot of discussion about the way to address the enforcement
> >> of cgroup policies, and we will consider the various solutions with a
> >> future patch.
> >
> > With respect to the upstream kernel,
> > I'm not sure a bunch of changes just for the sake of guests with
> > multiple virtual NIC cards makes sense.
> > And I wonder how this step, in isolation, will affect e.g.
> > multiqueue workloads.
> > But I guess if the numbers are convincing, this can be mergeable.
>
> Even if you have a single multiqueue device, this change allows you to
> create one vhost thread for all the queues, one vhost thread per queue,
> or any other combination. I guess that depending on the workload and on
> the system utilization (free cycles/cores, density) you would prefer to
> use one or more vhost threads.
That is already controllable from the guest though, which likely has a better
idea about the workload.
> >
> >>
> >> 2. Creation of vhost threads
> >>
> >> We suggested two ways of controlling the creation and removal of vhost
> >> threads:
> >> - statically determining the maximum number of virtio devices per worker
> >> via a kernel module parameter
> >> - dynamically: Sysfs mechanism to add and remove vhost threads
> >>
> >> It seems that it would be simplest to take the static approach as
> >> a first stage. At a second stage (next patch), we'll advance to
> >> dynamically changing the number of vhost threads, using the static
> >> module parameter only as a default value.
> >
> > I'm not sure how independent this is from 1.
> > With respect to the upstream kernel,
> > Introducing interfaces (which we'll have to maintain
> > forever) just for the sake of guests with
> > multiple virtual NIC cards does not look like a good tradeoff.
> >
> > So I'm unlikely to merge this upstream without making it useful cross-VM,
> > and yes this means isolation and accounting with cgroups need to
> > work properly.
>
> Agree, but even if you use a single multiqueue device, having the
> ability to use one thread or multiple threads to serve all the queues
> looks like a useful feature.
Could be. At the moment, multiqueue is off by default because it causes
regressions for some workloads as compared to a single queue.
If we have heuristics in vhost that fix this by auto-tuning threading, that
would be nice. But if you need to tune it manually anyway,
then from upstream perspective it does not seem to be worth it - you can just
turn multiqueue on/off in the guest.
> >
> >> Regarding cwmq, it is an interesting mechanism, which we need to explore
> >> further.
> >> At the moment we prefer not to change the vhost model to use cwmq, as some
> >> of the issues that were discussed, such as cgroups, are not supported by
> >> cwmq, and this is adding more complexity.
> >> However, we'll look further into it, and consider it at a later stage.
> >
> > Hmm that's still assuming some smart management tool configuring
> > this correctly. Can't this be determined automatically depending
> > on the workload?
> > This is what the cwmq suggestion was really about: detect
> > that we need more threads and spawn them.
> > It's less about sharing the implementation with workqueues -
> > would be very nice but not a must.
>
> But how can cwmq consider cgroup accounting?
I think cwmq is just a replacement for our own thread pool.
It doesn't make cgroup accounting easier or harder.
> In any case, IMHO, the kernel should first provide the "mechanism" so
> later on a user-space management application (the "policy") can
> orchestrate it.
I think policy would be something coarse-grained, like setting priority.
Making detailed scheduling decisions in userspace seems wrong somehow:
what does a management application know that the kernel doesn't?
> >
> >
> >
> >> 3. Adding polling mode to vhost
> >>
> >> Making polling adaptive based on various factors, such as the I/O
> >> rate, the guest kick overhead (which is what polling trades off), or
> >> the amount of wasted cycles (cycles we kept polling but no new work
> >> was added), is a good idea.
> >> However, as an initial polling patch, we would prefer having a naive
> >> polling approach, which could be tuned with later patches.
> >>
> >
> > While any polling approach would still need a lot of testing to prove we
> > don't for example steal CPU from a guest which could be doing other useful
> > work, given that an exit costs at least 1.5K cycles, in theory it
> > seems like something that can improve performance. I'm not sure how
> > naive we can be without introducing regressions for some workloads.
> > For example, if we are on the same host CPU, there's no
> > chance busy waiting will help us make progress.
> > How about detecting that the VCPU thread that kicked us
> > is currently running on another CPU, and only polling in
> > this case?
> >
> >> 4. vhost statistics
> >>
> >> The issue that was raised for the vhost statistics was using ftrace
> >> instead of the debugfs mechanism.
> >> However, looking further into the kvm stat mechanism, we learned that
> >> ftrace didn't replace the plain debugfs mechanism, but was used in
> >> addition to it.
> >>
> >> We propose to continue using debugfs for statistics, in a manner similar
> >> to kvm,
> >> and at some point in the future ftrace can be added to vhost as well.
> >
> > IMHO, while kvm stat is a useful script, the best tool
> > for perf stats is still perf. So I would try to integrate with that.
> > How it works internally is IMHO less important.
> >
> >> Does this plan look o.k.?
> >> If there are no further comments, I'll start preparing the patches
> >> according to what we've agreed on thus far.
> >> Thank you,
> >> Razya
> >
> > I think a good place to try to start merging upstream would be 3 and 4.
> > So if you want to make it easier to merge things upstream, try to keep 3
> > and 4 independent from 1 and 2.
>
> Note -1- and -3- are strongly related. If you have a thread that
> serves multiple queues (whether they belong to the same device/vm or
> not) then this thread will be polling multiple queues at the same
> time. This increases the chances you will find pending work to do in
> some queue. In other words, you reduce the cycles wasted on polling.
> On the other hand, if you run multiple threads and these threads do
> polling simultaneously, then the threads may starve each other and
> reduce performance (if they are scheduled to run on the same core). In
> addition, a shared thread can decide when it should stop processing a
> given queue and switch to another queue, because by polling the thread
> knows when new requests were added to a queue (this is what we called
> fine-grained I/O scheduling heuristics).
>
> So, seems like polling makes more sense when you serve multiple queues
> with the same thread.
>
> Abel.
A combination might bring gains in more workloads, but it should work on its
own too. It's quite possible that only a single VM is active while the others
are idle. So either polling should handle that well or be smart enough to turn
itself off in this case.
> >
> > Thanks again,
> >
> > --
> > MST