From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>,
Peter Zijlstra <peterz@infradead.org>, Tejun Heo <tj@kernel.org>,
Ingo Molnar <mingo@redhat.com>,
"linux-kernel@vger.kernel.org >> Linux Kernel Mailing List"
<linux-kernel@vger.kernel.org>, KVM list <kvm@vger.kernel.org>,
Oleg Nesterov <oleg@redhat.com>
Subject: Re: [4.2] commit d59cfc09c32 (sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem) causes regression for libvirt/kvm
Date: Tue, 15 Sep 2015 10:38:36 -0700
Message-ID: <20150915173836.GO4029@linux.vnet.ibm.com>
In-Reply-To: <55F84A6B.1010207@redhat.com>
On Tue, Sep 15, 2015 at 06:42:19PM +0200, Paolo Bonzini wrote:
>
>
> On 15/09/2015 15:36, Christian Borntraeger wrote:
> > I am wondering why the old code behaved in such fatal ways. Is there
> > some interaction between waiting for a reschedule in the
> > synchronize_sched writer and some fork code actually waiting for the
> > read side to get the lock together with some rescheduling going on
> > waiting for a lock that fork holds? lockdep does not give me an hints
> > so I have no clue :-(
>
> It may just be consuming too much CPU usage. kernel/rcu/tree.c warns
> about it:
>
> * if you are using synchronize_sched_expedited() in a loop, please
> * restructure your code to batch your updates, and then use a single
> * synchronize_sched() instead.
>
> and you may remember that in KVM we switched from RCU to SRCU exactly to
> avoid userspace-controlled synchronize_rcu_expedited().
>
> In fact, I would say that any userspace-controlled call to *_expedited()
> is a bug waiting to happen and a bad idea---because userspace can, with
> little effort, end up calling it in a loop.

Excellent points!

Other options in such situations include the following:

o	Rework so that the code uses call_rcu*() instead of *_expedited().

o	Maintain a per-task or per-CPU counter so that one out of every
	so many *_expedited() invocations instead uses the non-expedited
	counterpart.  (For example, synchronize_rcu() instead of
	synchronize_rcu_expedited().)
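
A rough sketch of the counter-based option follows, written as a userspace
model so that it can be built outside the kernel.  The two synchronize_*()
functions here are stubs that only count calls, and EXPEDITE_LIMIT,
struct gp_throttle, and synchronize_rcu_throttled() are hypothetical names
invented for illustration, not existing kernel APIs.  In real code the
counter would presumably live in per-task or per-CPU storage.

```c
/* Hypothetical knob: every EXPEDITE_LIMIT-th request is not expedited. */
#define EXPEDITE_LIMIT 8

/* Stubs standing in for the kernel primitives; they only count calls. */
static unsigned long nr_expedited, nr_normal;
static void synchronize_rcu_expedited(void) { nr_expedited++; }
static void synchronize_rcu(void) { nr_normal++; }

/* One instance per task or per CPU (assumption for this sketch). */
struct gp_throttle {
	unsigned int count;
};

/*
 * Use the expedited primitive most of the time, but fall back to the
 * non-expedited one on every EXPEDITE_LIMIT-th call, so that a caller
 * invoking this in a tight loop is periodically slowed down.
 */
static void synchronize_rcu_throttled(struct gp_throttle *t)
{
	if (++t->count >= EXPEDITE_LIMIT) {
		t->count = 0;
		synchronize_rcu();
	} else {
		synchronize_rcu_expedited();
	}
}
```

The fallback is what bounds the damage: a userspace-driven loop still makes
progress, but it can no longer issue an unbounded run of expedited requests.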

Note that synchronize_srcu_expedited() is less troublesome than are the
other *_expedited() functions, because synchronize_srcu_expedited() does
not inflict OS jitter on other CPUs.  This situation is being improved,
so that the other *_expedited() functions inflict less OS jitter and
(mostly) avoid inflicting OS jitter on nohz_full CPUs and idle CPUs (the
latter being important for battery-powered systems).  In addition, the
*_expedited() functions avoid hammering CPUs with N-squared OS jitter
in response to concurrent invocation from all CPUs because multiple
concurrent *_expedited() calls will be satisfied by a single expedited
grace-period operation.  Nevertheless, as Paolo points out, it is still
necessary to exercise caution when exposing synchronous grace periods
to userspace control.

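
One way that several concurrent requests can share a single expensive
operation is a sequence counter that each caller snapshots before waiting.
The following is a simplified userspace model of that general idea, not
the kernel's actual implementation.  All names (exp_seq, exp_seq_snap(),
exp_seq_done(), wait_expedited(), do_expensive_grace_period()) are
assumptions made for this sketch, and the "grace period" is a stub that
only counts how often it ran.

```c
#include <limits.h>
#include <pthread.h>
#include <stdatomic.h>

/* Even: no expedited operation in flight.  Odd: one is in progress. */
static atomic_ulong exp_seq;
static pthread_mutex_t exp_mutex = PTHREAD_MUTEX_INITIALIZER;
static unsigned long nr_expensive_ops;	/* Updated only under exp_mutex. */

/* Hypothetical stand-in for the work that disturbs the other CPUs. */
static void do_expensive_grace_period(void)
{
	nr_expensive_ops++;
}

/*
 * Return the counter value by which a full operation, started after
 * this call, is guaranteed to have completed.
 */
static unsigned long exp_seq_snap(void)
{
	return (atomic_load(&exp_seq) + 3) & ~1UL;
}

/* Wraparound-safe check for whether the counter has reached s. */
static int exp_seq_done(unsigned long s)
{
	return ULONG_MAX / 2 >= atomic_load(&exp_seq) - s;
}

/* Wait for snapshot s, doing the expensive work only if nobody else did. */
static void wait_expedited(unsigned long s)
{
	if (exp_seq_done(s))
		return;
	pthread_mutex_lock(&exp_mutex);
	if (!exp_seq_done(s)) {
		atomic_fetch_add(&exp_seq, 1);	/* Now odd: in progress. */
		do_expensive_grace_period();
		atomic_fetch_add(&exp_seq, 1);	/* Even again: complete. */
	}
	pthread_mutex_unlock(&exp_mutex);
}
```

Callers that took their snapshot before the current holder of the mutex
started its operation find their snapshot already satisfied once they
acquire the mutex, so N simultaneous requests cost far fewer than N
expensive operations.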
Thanx, Paul