From: Chris Wright <chrisw@sous-sol.org>
To: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: vatsa@linux.vnet.ibm.com, Avi Kivity <avi@redhat.com>,
	Anthony Liguori <aliguori@linux.vnet.ibm.com>,
	qemu-devel@nongnu.org, kvm@vger.kernel.org,
	Chris Wright <chrisw@sous-sol.org>, Ingo Molnar <mingo@elte.hu>,
	Mike Galbraith <efault@gmx.de>,
	riel@redhat.com
Subject: Re: [PATCH] qemu-kvm: response to SIGUSR1 to start/stop a VCPU (v2)
Date: Wed, 1 Dec 2010 09:17:58 -0800
Message-ID: <20101201171758.GA8514@sequoia.sous-sol.org>
In-Reply-To: <1291220718.32004.1696.camel@laptop>

* Peter Zijlstra (a.p.zijlstra@chello.nl) wrote:
> On Wed, 2010-12-01 at 21:42 +0530, Srivatsa Vaddagiri wrote:
> 
> > Not if yield() remembers what timeslice was given up and adds that back when
> > thread is finally ready to run. Figure below illustrates this idea:
> > 
> > 
> >       A0/4    C0/4 D0/4 A0/4  C0/4 D0/4 A0/4  C0/4 D0/4 A0/4 
> > p0   |----|-L|----|----|----|L|----|----|----|L|----|----|----|--------------|
> >             \                \               \                  \
> > 	   B0/2[2]	    B0/0[6]         B0/0[10]            B0/14[0]
> > 
> >  
> > where,
> > 	p0	-> physical cpu0
> > 	L	-> denotes period of lock contention
> > 	A0/4    -> means vcpu A0 (of guest A) ran for 4 ms
> > 	B0/2[6] -> means vcpu B0 (of guest B) ran for 2 ms (and has given up
> > 		   6ms worth of its timeslice so far). In reality, we should
> > 	     	   not see too much of "given up" timeslice for a vcpu.
> 
> /me fails to parse
> 
> > > >Regarding directed yield, do we have any reliable mechanism to find target of
> > > >directed yield in this (unmodified/non-paravirtualized guest) case? IOW how do
> > > >we determine the vcpu thread to which cycles need to be yielded upon contention?
> > > 
> > > My idea was to yield to a random starved vcpu of the same guest.
> > > There are several cases to consider:
> > > 
> > > - we hit the right vcpu; lock is released, party.
> > > - we hit some vcpu that is doing unrelated work.  yielding thread
> > > doesn't make progress, but we're not wasting cpu time.
> > > - we hit another waiter for the same lock.  it will also PLE exit
> > > and trigger a directed yield.  this increases the cost of directed
> > > yield by a factor of count_of_runnable_but_not_running_vcpus, which
> > > > could be large, but not disastrously so (i.e. don't run a 64-vcpu
> > > guest on a uniprocessor host with this)
> > > 
> > > >>  So if you were to test something similar running with a 20% vcpu
> > > >>  cap, I'm sure you'd run into similar issues.  It may show with fewer
> > > >>  vcpus (I've only tested 64).
> > > >>
> > > >>  >Are you assuming the existence of a directed yield and the
> > > >>  >specific concern is what happens when a directed yield happens
> > > >>  >after a PLE and the target of the yield has been capped?
> > > >>
> > > >>  Yes.  My concern is that we will see the same kind of problems
> > > >>  directed yield was designed to fix, but without allowing directed
> > > >>  yield to fix them.  Directed yield was designed to fix lock holder
> > > >>  preemption under contention,
> > > >
> > > >For modified guests, something like [2] seems to be the best approach to fix
> > > >lock-holder preemption (LHP) problem, which does not require any sort of
> > > >directed yield support. Essentially upon contention, a vcpu registers its lock
> > > >of interest and goes to sleep (via hypercall) waiting for lock-owner to wake it
> > > >up (again via another hypercall).
> > > 
> > > Right.
> > 
> > We don't have these hypercalls for KVM atm, which I am working on now.
> > 
> > > >For unmodified guests, IMHO a plain yield (or slightly enhanced yield [1])
> > > >should fix the LHP problem.
> > > 
> > > A plain yield (ignoring no-opiness on Linux) will penalize the
> > > running guest wrt other guests.  We need to maintain fairness.
> > 
> > Agreed on the need to maintain fairness.
> 
> Directed yield and fairness don't mix well either. You can end up
> feeding the other tasks more time than you'll ever get back.

If the directed yield is always to another task in your cgroup then
inter-guest scheduling fairness should be maintained.
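
To make that concrete, here is a rough toy model, not kernel code. It
assumes each guest maps to one cgroup and each vcpu has some remaining
timeslice in the current period. The names Guest, Vcpu and directed_yield
are made up for illustration. A directed yield only moves time between
two vcpus of the same guest, so the guest's total stays the same and
other guests are unaffected.

```python
from dataclasses import dataclass, field


@dataclass
class Vcpu:
    name: str
    slice_ms: float  # remaining timeslice in the current period (toy value)


@dataclass
class Guest:
    """Hypothetical stand-in for one guest, i.e. one cgroup of vcpu threads."""
    name: str
    vcpus: dict = field(default_factory=dict)

    def add(self, name, slice_ms):
        self.vcpus[name] = Vcpu(name, slice_ms)

    def total_ms(self):
        # Time the whole guest is still entitled to in this period.
        return sum(v.slice_ms for v in self.vcpus.values())

    def directed_yield(self, src, dst):
        """Donate the rest of src's slice to dst; both must be in this guest."""
        if src not in self.vcpus or dst not in self.vcpus:
            raise ValueError("directed yield must stay inside the guest")
        donor, target = self.vcpus[src], self.vcpus[dst]
        target.slice_ms += donor.slice_ms
        donor.slice_ms = 0.0
```

The model says nothing about fairness between the vcpus of one guest,
which is the separate concern raised elsewhere in the thread.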

> > > >Fyi, Xen folks also seem to be avoiding a directed yield for some of the same
> > > >reasons [3].
> > > 
> > > I think that fails for unmodified guests, where you don't know when
> > > the lock is released and so you don't have a wake_up notification.
> > > You lost a large timeslice and you can't gain it back, whereas with
> > > pv the wakeup means you only lose as much time as the lock was held.
> > > 
> > > >Given this line of thinking, hard-limiting guests (either in user-space or
> > > >kernel-space, latter being what I prefer) should not have adverse interactions
> > > >with LHP-related solutions.
> > > 
> > > If you hard-limit a vcpu that holds a lock, any waiting vcpus are
> > > also halted.
> > 
> > This can happen in normal case when lock-holders are preempted as well. So
> > not a new problem that hard-limits is introducing!
> 
> No, but hard limits make it _much_ worse.
> 
> > >  With directed yield you can let the lock holder make
> > > some progress at the expense of another vcpu.  A regular yield()
> > > will simply stall the waiter.
> > 
> > Agreed. Do you see any problems with slightly enhanced version of yield
> > described above (rather than directed yield)? It has some advantage over 
> > directed yield in that it preserves not only fairness between VMs but also 
> > fairness between VCPUs of a VM. Also it avoids the need for a guessing game 
> > mentioned above and bad interactions with hard-limits.
> > 
> > CCing other scheduler experts for their opinion of proposed yield() extensions.
> 
> sys_yield() usage for anything other but two FIFO threads of the same
> priority goes to /dev/null.
> 
> The Xen paravirt spinlock solution is relatively sane, use that.
> Unmodified guests suck anyway, there's really nothing much sane you can
> do there as you don't know who owns what lock.
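
For anyone else who had trouble parsing the figure quoted at the top,
here is one possible reading of the proposed yield() bookkeeping, as a
small sketch. It assumes a fixed 4 ms timeslice, as in the figure. The
class and method names are invented for illustration and do not
correspond to any existing scheduler interface.

```python
SLICE_MS = 4  # per-turn timeslice assumed from the figure (A0/4, C0/4, ...)


class YieldCredit:
    """Tracks the timeslice a vcpu gave up by yielding (hypothetical model)."""

    def __init__(self):
        self.given_up_ms = 0

    def yield_after(self, ran_ms):
        # The vcpu ran for ran_ms, hit contention and yielded the rest
        # of its slice; remember how much it gave up.
        self.given_up_ms += SLICE_MS - ran_ms
        return ran_ms

    def run_full(self):
        # The vcpu can finally make progress: it gets its normal slice
        # plus everything it gave up earlier, and the credit is cleared.
        ran_ms = SLICE_MS + self.given_up_ms
        self.given_up_ms = 0
        return ran_ms


b0 = YieldCredit()
trace = []
for ran in (2, 0, 0):
    b0.yield_after(ran)
    trace.append((ran, b0.given_up_ms))
trace.append((b0.run_full(), b0.given_up_ms))
print(trace)  # [(2, 2), (0, 6), (0, 10), (14, 0)]
```

The four entries correspond to B0/2[2], B0/0[6], B0/0[10] and B0/14[0]
in the figure.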

