From: "Michael S. Tsirkin" <mst@redhat.com>
To: Abel Gordon <abel.gordon@gmail.com>
Cc: Razya Ladelsky <RAZYA@il.ibm.com>,
Anthony Liguori <anthony@codemonkey.ws>,
asias@redhat.com, digitaleric@google.com,
Eran Raichstein <ERANRA@il.ibm.com>,
gleb@redhat.com, jasowang@redhat.com,
Joel Nider <JOELN@il.ibm.com>,
kvm@vger.kernel.org, kvm-owner@vger.kernel.org,
pbonzini@redhat.com, Stefan Hajnoczi <stefanha@gmail.com>,
Yossi Kuperman1 <YOSSIKU@il.ibm.com>,
Eyal Moscovici <EYALMO@il.ibm.com>,
bsd@redhat.com
Subject: Re: Updated Elvis Upstreaming Roadmap
Date: Thu, 19 Dec 2013 12:13:28 +0200 [thread overview]
Message-ID: <20131219101328.GA1853@redhat.com> (raw)
In-Reply-To: <CA+OY2tvjU_GR17MSor2JrKN-eWuumfQwaDfJV-mxeYo8cxOpPQ@mail.gmail.com>
On Thu, Dec 19, 2013 at 08:40:44AM +0200, Abel Gordon wrote:
> On Wed, Dec 18, 2013 at 12:43 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Tue, Dec 17, 2013 at 12:04:42PM +0200, Razya Ladelsky wrote:
> >> Hi,
> >>
> >> Thank you all for your comments.
> >> I'm sorry for taking this long to reply, I was away on vacation..
> >>
> >> It was a good, long discussion, many issues were raised, which we'd like
> >> to address with the following proposed roadmap for Elvis patches.
> >> In general, we believe it would be best to start with patches that are
> >> as simple as possible, providing the basic Elvis functionality,
> >> and attend to the more complicated issues in subsequent patches.
> >>
> >> Here's the road map for Elvis patches:
> >
> > Thanks for the follow up. Some suggestions below.
> > Please note the suggestions below merely represent
> > thoughts on merging upstream.
> > If as the first step you are content with keeping this
> > work as out of tree patches, in order to have
> > the freedom to experiment with interfaces and
> > performance, please feel free to ignore them.
> >
> >> 1. Shared vhost thread for multiple devices.
> >>
> >> The way to go here, we believe, is to start with a patch having a shared
> >> vhost thread for multiple devices of the SAME vm.
> >> The next step/patch may be handling vms belonging to the same cgroup.
> >>
> >> Finally, we need to extend the functionality so that the shared vhost
> >> thread serves multiple vms (not necessarily belonging to the same cgroup).
> >>
> >> There was a lot of discussion about the way to address the enforcement
> >> of cgroup policies, and we will consider the various solutions with a
> >> future patch.
> >
> > With respect to the upstream kernel,
> > I'm not sure a bunch of changes just for the sake of guests with
> > multiple virtual NIC cards makes sense.
> > And I wonder how this step, in isolation, will affect e.g.
> > multiqueue workloads.
> > But I guess if the numbers are convincing, this can be mergeable.
>
> Even if you have a single multiqueue device, this change allows you to
> create one vhost thread for all the queues, one vhost thread per queue,
> or any other combination. I guess that depending on the workload and on
> the system utilization (free cycles/cores, density) you would prefer to
> use one or more vhost threads.
That is already controllable from the guest though, which likely has a better
idea about the workload.
> >
> >>
> >> 2. Creation of vhost threads
> >>
> >> We suggested two ways of controlling the creation and removal of vhost
> >> threads:
> >> - statically determining the maximum number of virtio devices per worker
> >> via a kernel module parameter
> >> - dynamically: Sysfs mechanism to add and remove vhost threads
> >>
> >> It seems that it would be simplest to take the static approach as
> >> a first stage. At a second stage (next patch), we'll advance to
> >> dynamically changing the number of vhost threads, using the static
> >> module parameter only as a default value.
> >
> > I'm not sure how independent this is from 1.
> > With respect to the upstream kernel,
> > Introducing interfaces (which we'll have to maintain
> > forever) just for the sake of guests with
> > multiple virtual NIC cards does not look like a good tradeoff.
> >
> > So I'm unlikely to merge this upstream without making it useful cross-VM,
> > and yes this means isolation and accounting with cgroups need to
> > work properly.
>
> Agree, but even if you use a single multiqueue device, having the
> ability to use one thread or multiple threads to serve all the queues
> looks like a useful feature.
Could be. At the moment, multiqueue is off by default because it causes
regressions for some workloads as compared to a single queue.
If we have heuristics in vhost that fix this by auto-tuning threading, that
would be nice. But if you need to tune it manually anyway,
then from upstream perspective it does not seem to be worth it - you can just
turn multiqueue on/off in the guest.
> >
> >> Regarding cwmq, it is an interesting mechanism, which we need to explore
> >> further.
> >> At the moment we prefer not to change the vhost model to use cwmq, as some
> >> of the issues that were discussed, such as cgroups, are not supported by
> >> cwmq, and this is adding more complexity.
> >> However, we'll look further into it, and consider it at a later stage.
> >
> > Hmm that's still assuming some smart management tool configuring
> > this correctly. Can't this be determined automatically depending
> > on the workload?
> > This is what the cwmq suggestion was really about: detect
> > that we need more threads and spawn them.
> > It's less about sharing the implementation with workqueues -
> > would be very nice but not a must.
>
> But how can cwmq consider cgroup accounting?
I think cwmq is just a replacement for our own thread pool.
It doesn't make cgroup accounting easier or harder.
> In any case, IMHO, the kernel should first provide the "mechanism" so
> later on a user-space management application (the "policy") can
> orchestrate it.
I think policy would be something coarse-grained, like setting priority.
Making detailed scheduling decisions in userspace seems wrong somehow:
what does a management application know that the kernel doesn't?
> >
> >
> >
> >> 3. Adding polling mode to vhost
> >>
> >> Making polling adaptive based on various factors, such as the I/O
> >> rate, the guest kick overhead (which is what polling trades off), or
> >> the amount of wasted cycles (cycles we kept polling but no new work
> >> was added), is a good idea.
> >> However, as an initial polling patch, we would prefer having a naive
> >> polling approach, which could be tuned with later patches.
> >>
> >
> > While any polling approach would still need a lot of testing to prove we
> > don't for example steal CPU from a guest which could be doing other useful
> > work, given that an exit costs at least 1.5K cycles, in theory it
> > seems like something that can improve performance. I'm not sure how
> > naive we can be without introducing regressions for some workloads.
> > For example, if we are on the same host CPU, there's no
> > chance busy waiting will help us make progress.
> > How about detecting that the VCPU thread that kicked us
> > is currently running on another CPU, and only polling in
> > this case?
> >
> >> 4. vhost statistics
> >>
> >> The issue that was raised for the vhost statistics was using ftrace
> >> instead of the debugfs mechanism.
> >> However, looking further into the kvm stat mechanism, we learned that
> >> ftrace didn't replace the plain debugfs mechanism, but was used in
> >> addition to it.
> >>
> >> We propose to continue using debugfs for statistics, in a manner similar
> >> to kvm,
> >> and at some point in the future ftrace can be added to vhost as well.
> >
> > IMHO, while kvm stat is a useful script, the best tool
> > for perf stats is still perf. So I would try to integrate with that.
> > How it works internally is IMHO less important.
> >
> >> Does this plan look o.k.?
> >> If there are no further comments, I'll start preparing the patches
> >> according to what we've agreed on thus far.
> >> Thank you,
> >> Razya
> >
> > I think a good place to try to start merging upstream would be 3 and 4.
> > So if you want to make it easier to merge things upstream, try to keep 3
> > and 4 independent from 1 and 2.
>
> Note -1- and -3- are strongly related. If you have a thread that
> serves multiple queues (whether they belong to the same device/vm or
> not) then this thread will be polling multiple queues at the same
> time. This increases the chances you will find pending work to do in
> some queue. In other words, you reduce the cycles wasted on polling.
> On the other hand, if you run multiple threads and these threads do
> polling simultaneously, then the threads may starve each other and
> reduce performance (if they are scheduled to run on the same core). In
> addition, a shared thread can decide when it should stop processing a
> given queue and switch to another queue, because by polling the thread
> knows when new requests were added to a queue (this is what we called
> fine-grained I/O scheduling heuristics).
>
> So, seems like polling makes more sense when you serve multiple queues
> with the same thread.
>
> Abel.
A combination might bring gains in more workloads, but it should work on its
own too. It's quite possible that only a single VM is active while the others
are idle. So either polling should handle that well or be smart enough to turn
itself off in this case.
> >
> > Thanks again,
> >
> > --
> > MST