Re: [4.2] commit d59cfc09c32 (sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem) causes regression for libvirt/kvm

From: Christian Borntraeger <borntraeger@de.ibm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>, Ingo Molnar <mingo@redhat.com>,
	"linux-kernel@vger.kernel.org >> Linux Kernel Mailing List" 
	<linux-kernel@vger.kernel.org>, KVM list <kvm@vger.kernel.org>,
	Oleg Nesterov <oleg@redhat.com>,
	Paul McKenney <paulmck@linux.vnet.ibm.com>
Subject: Re: [4.2] commit d59cfc09c32 (sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem) causes regression for libvirt/kvm
Date: Tue, 15 Sep 2015 15:36:34 +0200
Message-ID: <55F81EE2.4090708@de.ibm.com>
In-Reply-To: <20150915130550.GC16853@twins.programming.kicks-ass.net>

On 15.09.2015 at 15:05, Peter Zijlstra wrote:
> On Tue, Sep 15, 2015 at 02:05:14PM +0200, Christian Borntraeger wrote:
>> Tejun,
>>
>>
>> commit d59cfc09c32a2ae31f1c3bc2983a0cd79afb3f14 (sched, cgroup: replace 
>> signal_struct->group_rwsem with a global percpu_rwsem) causes some noticeable
>> hiccups when starting several kvm guests (which libvirt will move into cgroups
>> - each vcpu thread and each i/o thread).
>> When you now start lots of guests in parallel on a bigger system (32 CPUs with
>> 2-way SMT in my case) the system is so busy that systemd runs into several timeouts
>> like "Did not receive a reply. Possible causes include: the remote application did
>> not send a reply, the message bus security policy blocked the reply, the reply
>> timeout expired, or the network connection was broken."
>>
>> The problem seems to be that the newly used percpu_rwsem calls
>> synchronize_sched_expedited() on every write-side down/up.
> 
> Can you try:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git dev.2015.09.11ab

Yes, dev.2015.09.11a seems to help, thanks. Getting rid of the expedited hammer was
really helpful, I guess.

> 
> those include Oleg's rework of the percpu rwsem which should hopefully
> improve things somewhat.
> 
> But yes, pounding a global lock on a big machine will always suck.

By hacking out the fast path I actually degraded percpu rwsem to a real global lock, but
things were still a lot faster.
I am wondering why the old code behaved in such a fatal way. Is there some interaction
between the writer waiting for a reschedule in synchronize_sched, some fork code
waiting to take the lock on the read side, and some rescheduling that in turn
waits for a lock that fork holds? lockdep does not give me any hints, so I have no clue :-(
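
For readers following along, here is a toy user-space sketch of the general
idea behind a per-CPU reader/writer semaphore. It is not the kernel's
percpu_rwsem code. The names (toy_percpu_rwsem, TOY_NSLOTS, the toy_*
functions) are made up for illustration, and a fixed array of slots stands in
for per-CPU counters. Readers normally touch only their own counter; a writer
sets a flag, takes one global mutex and waits for all counters to drain.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NSLOTS 4 /* hypothetical stand-in for per-CPU counters */

struct toy_percpu_rwsem {
	atomic_int readers[TOY_NSLOTS]; /* readers currently inside, per slot */
	atomic_bool writer_pending;     /* true while a writer holds the lock */
	pthread_mutex_t slow;           /* the one global lock writers take */
};

static void toy_init(struct toy_percpu_rwsem *s)
{
	for (int i = 0; i < TOY_NSLOTS; i++)
		atomic_init(&s->readers[i], 0);
	atomic_init(&s->writer_pending, false);
	pthread_mutex_init(&s->slow, NULL);
}

static void toy_down_read(struct toy_percpu_rwsem *s, int slot)
{
	for (;;) {
		/* Fast path: announce ourselves on our own slot only. */
		atomic_fetch_add(&s->readers[slot], 1);
		if (!atomic_load(&s->writer_pending))
			return;
		/* Slow path: back out, wait for the writer, then retry. */
		atomic_fetch_sub(&s->readers[slot], 1);
		pthread_mutex_lock(&s->slow);
		pthread_mutex_unlock(&s->slow);
	}
}

static void toy_up_read(struct toy_percpu_rwsem *s, int slot)
{
	int prev = atomic_fetch_sub(&s->readers[slot], 1);

	assert(prev > 0); /* must pair with toy_down_read() on the same slot */
}

static void toy_down_write(struct toy_percpu_rwsem *s)
{
	pthread_mutex_lock(&s->slow);           /* serialize writers */
	atomic_store(&s->writer_pending, true); /* divert new readers */
	/* Wait until every reader that got in before the flag has left. */
	for (int i = 0; i < TOY_NSLOTS; i++)
		while (atomic_load(&s->readers[i]) != 0)
			sched_yield();
}

static void toy_up_write(struct toy_percpu_rwsem *s)
{
	atomic_store(&s->writer_pending, false);
	pthread_mutex_unlock(&s->slow);
}
```

The sketch relies on sequentially consistent atomics so that a reader and a
writer cannot both miss each other. As far as I understand it, the kernel
instead keeps the read side cheap and makes the writer wait for a grace period,
which is where synchronize_sched comes into the write path. A reader must pass
the same slot to toy_up_read() that it used in toy_down_read().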


Christian

Thread overview: 31+ messages
2015-09-15 12:05 [4.2] commit d59cfc09c32 (sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem) causes regression for libvirt/kvm Christian Borntraeger
2015-09-15 13:05 ` Peter Zijlstra
2015-09-15 13:36   ` Christian Borntraeger [this message]
2015-09-15 13:53     ` Tejun Heo
2015-09-15 16:42     ` Paolo Bonzini
2015-09-15 17:38       ` Paul E. McKenney
2015-09-16  8:32         ` Paolo Bonzini
2015-09-16  8:57           ` Christian Borntraeger
2015-09-16  9:12             ` Paolo Bonzini
2015-09-16 12:22               ` Oleg Nesterov
2015-09-16 12:35                 ` Paolo Bonzini
2015-09-16 12:43                   ` Oleg Nesterov
2015-09-16 12:56                 ` Christian Borntraeger
2015-09-16 14:16                 ` Tejun Heo
2015-09-16 14:19                   ` Paolo Bonzini
2015-09-15 21:11       ` Christian Borntraeger
2015-09-15 21:26         ` Tejun Heo
2015-09-15 21:38           ` Paul E. McKenney
2015-09-15 22:28             ` Tejun Heo
2015-09-15 23:38               ` Paul E. McKenney
2015-09-16  1:24                 ` Tejun Heo
2015-09-16  4:35                   ` Paul E. McKenney
2015-09-16 11:06                     ` Tejun Heo
2015-09-16  7:44                   ` Christian Borntraeger
2015-09-16 10:58                     ` Christian Borntraeger
2015-09-16 11:03                       ` Tejun Heo
2015-09-16 11:50                         ` Christian Borntraeger
2015-09-16 15:55 ` [PATCH cgroup/for-4.3-fixes 1/2] Revert "cgroup: simplify threadgroup locking" Tejun Heo
2015-09-16 15:56   ` [PATCH cgroup/for-4.3-fixes 2/2] Revert "sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem" Tejun Heo
2015-09-16 17:00   ` [PATCH cgroup/for-4.3-fixes 1/2] Revert "cgroup: simplify threadgroup locking" Oleg Nesterov
2015-09-16 18:45   ` Christian Borntraeger
