From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>,
Peter Zijlstra <peterz@infradead.org>, Tejun Heo <tj@kernel.org>,
Ingo Molnar <mingo@redhat.com>,
"linux-kernel@vger.kernel.org >> Linux Kernel Mailing List"
<linux-kernel@vger.kernel.org>, KVM list <kvm@vger.kernel.org>,
Oleg Nesterov <oleg@redhat.com>
Subject: Re: [4.2] commit d59cfc09c32 (sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem) causes regression for libvirt/kvm
Date: Tue, 15 Sep 2015 10:38:36 -0700
Message-ID: <20150915173836.GO4029@linux.vnet.ibm.com>
In-Reply-To: <55F84A6B.1010207@redhat.com>
On Tue, Sep 15, 2015 at 06:42:19PM +0200, Paolo Bonzini wrote:
>
>
> On 15/09/2015 15:36, Christian Borntraeger wrote:
> > I am wondering why the old code behaved in such fatal ways. Is there
> > some interaction between waiting for a reschedule in the
> > synchronize_sched writer and some fork code actually waiting for the
> > read side to get the lock together with some rescheduling going on
> > waiting for a lock that fork holds? lockdep does not give me any hints
> > so I have no clue :-(
>
> It may just be consuming too much CPU usage. kernel/rcu/tree.c warns
> about it:
>
> * if you are using synchronize_sched_expedited() in a loop, please
> * restructure your code to batch your updates, and then use a single
> * synchronize_sched() instead.
>
> and you may remember that in KVM we switched from RCU to SRCU exactly to
> avoid userspace-controlled synchronize_rcu_expedited().
>
> In fact, I would say that any userspace-controlled call to *_expedited()
> is a bug waiting to happen and a bad idea---because userspace can, with
> little effort, end up calling it in a loop.

Excellent points!

Other options in such situations include the following:

o	Rework so that the code uses call_rcu*() instead of *_expedited().

o	Maintain a per-task or per-CPU counter so that every Nth
	*_expedited() invocation instead uses the non-expedited
	counterpart.  (For example, synchronize_rcu() instead of
	synchronize_rcu_expedited().)

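The counter option might look roughly like the following userspace sketch. It is not kernel code: the two synchronize_*() functions are stand-ins that only count calls, and struct gp_throttle, its fields, and synchronize_rcu_throttled() are hypothetical names made up for illustration.

```c
#include <assert.h>

/* Stand-ins for the kernel primitives: here they only count calls. */
static unsigned long nr_expedited;
static unsigned long nr_normal;

static void synchronize_rcu_expedited(void) { nr_expedited++; }
static void synchronize_rcu(void) { nr_normal++; }

/* Hypothetical per-task (or per-CPU) state, not a real kernel structure. */
struct gp_throttle {
	unsigned int count;     /* requests seen since the last slow-path call */
	unsigned int every_nth; /* every Nth request takes the slow path */
};

/*
 * Wait for a grace period, expedited most of the time, but fall back
 * to the non-expedited primitive on every Nth request.
 */
static void synchronize_rcu_throttled(struct gp_throttle *t)
{
	assert(t->every_nth > 0);
	if (++t->count >= t->every_nth) {
		t->count = 0;
		synchronize_rcu();
	} else {
		synchronize_rcu_expedited();
	}
}
```

With every_nth set to 4, a userspace loop that triggers twelve grace-period waits would get nine expedited ones and three normal ones, so the caller is periodically slowed to the normal grace-period rate. The right value of N is workload-dependent and is not something the sketch tries to answer.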
Note that synchronize_srcu_expedited() is less troublesome than are the
other *_expedited() functions, because synchronize_srcu_expedited() does
not inflict OS jitter on other CPUs. This situation is being improved,
so that the other *_expedited() functions inflict less OS jitter and
(mostly) avoid inflicting OS jitter on nohz_full CPUs and idle CPUs (the
latter being important for battery-powered systems). In addition, the
*_expedited() functions avoid hammering CPUs with N-squared OS jitter
in response to concurrent invocation from all CPUs because multiple
concurrent *_expedited() calls will be satisfied by a single expedited
grace-period operation. Nevertheless, as Paolo points out, it is still
necessary to exercise caution when exposing synchronous grace periods
to userspace control.
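
One way to picture how several concurrent requests can be satisfied by a single expedited grace period is a sequence counter, as in the single-threaded userspace sketch below. This is an illustration under assumptions, not the kernel's implementation: all names are hypothetical, there is no locking or memory ordering, and counter wrap is ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Even: no expedited grace period in flight.  Odd: one is in progress. */
static unsigned long gp_seq;
static unsigned long nr_gp_run; /* grace periods that actually ran */

/*
 * Value that gp_seq must reach before a full grace period has elapsed
 * after this snapshot was taken.
 */
static unsigned long gp_seq_snap(void)
{
	return (gp_seq + 3) & ~1UL;
}

static bool gp_seq_done(unsigned long snap)
{
	return gp_seq >= snap; /* ignores counter wrap for brevity */
}

/* Stand-in for the expensive part (disturbing other CPUs). */
static void run_expedited_gp(void)
{
	gp_seq++; /* odd: grace period in progress */
	nr_gp_run++;
	gp_seq++; /* even: grace period complete */
}

/* Each requester snapshots first, then waits on its snapshot. */
static void wait_expedited(unsigned long snap)
{
	if (gp_seq_done(snap))
		return; /* someone else's grace period already covers us */
	run_expedited_gp();
}
```

If three requesters take their snapshots before any grace period starts, the first one to call wait_expedited() does the work and the other two return immediately, so the other CPUs are disturbed once rather than three times.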

							Thanx, Paul