linux-kernel.vger.kernel.org archive mirror
From: Oleg Nesterov <oleg@redhat.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>,
	paulmck@linux.vnet.ibm.com, Peter Zijlstra <peterz@infradead.org>,
	Tejun Heo <tj@kernel.org>, Ingo Molnar <mingo@redhat.com>,
	"linux-kernel@vger.kernel.org >> Linux Kernel Mailing List" 
	<linux-kernel@vger.kernel.org>, KVM list <kvm@vger.kernel.org>
Subject: Re: [4.2] commit d59cfc09c32 (sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem) causes regression for libvirt/kvm
Date: Wed, 16 Sep 2015 14:22:49 +0200
Message-ID: <20150916122249.GA28821@redhat.com>
In-Reply-To: <55F9326A.9070509@redhat.com>

On 09/16, Paolo Bonzini wrote:
>
>
> On 16/09/2015 10:57, Christian Borntraeger wrote:
> > On 16.09.2015 at 10:32, Paolo Bonzini wrote:
> >>
> >>
> >> On 15/09/2015 19:38, Paul E. McKenney wrote:
> >>> Excellent points!
> >>>
> >>> Other options in such situations include the following:
> >>>
> >>> o	Rework so that the code uses call_rcu*() instead of *_expedited().
> >>>
> >>> o	Maintain a per-task or per-CPU counter so that every so many
> >>> 	*_expedited() invocations instead uses the non-expedited
> >>> 	counterpart.  (For example, synchronize_rcu instead of
> >>> 	synchronize_rcu_expedited().)
> >>
> >> Or just use ratelimit (untested):
> >
> > One of my tests was to always replace synchronize_sched_expedited with
> > synchronize_sched and things turned out to be even worse. Not sure if
> > it makes sense to test your in-the-middle approach?
>
> I don't think it applies here, since down_write/up_write is a
> synchronous API.
>
> If the revert isn't easy, I think backporting rcu_sync is the best bet.

I leave this to Paul and Tejun... at least I think this is not v4.2 material.

>  The issue is that rcu_sync doesn't eliminate synchronize_sched,

Yes, but it eliminates _expedited(). This is good, but otoh this means
that (say) individual __cgroup_procs_write() can take much more time.
However, it won't block the readers and/or disturb the whole system.
And percpu_up_write() doesn't do synchronize_sched() at all.

> it only
> makes it more rare.

Yes, so we can hope that multiple __cgroup_procs_write()'s can "share"
a single synchronize_sched().
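
To illustrate what "share" means here, below is a toy, single-threaded
user-space sketch. It is not the kernel's rcu_sync code: every name in it
(toy_sync, toy_sync_enter() and so on) is hypothetical, and the grace period
is modelled as a plain counter. The assumption it encodes is that a writer
only pays for a grace period when the state is idle, and that the switch
back to idle happens later, from a deferred callback, rather than on exit.

```c
#include <assert.h>

/* Toy, single-threaded model of grace-period sharing. All names are
 * hypothetical; this is not the kernel's rcu_sync implementation. */
enum toy_gp_state { TOY_GP_IDLE, TOY_GP_PASSED };

struct toy_sync {
	enum toy_gp_state state;
	int writers;        /* writers currently between enter and exit */
	int grace_periods;  /* simulated synchronize_sched() calls so far */
};

/* Stand-in for the expensive wait; here it only counts invocations. */
static void toy_synchronize_sched(struct toy_sync *s)
{
	s->grace_periods++;
}

/* Writer entry: wait for a grace period only if the state is idle,
 * i.e. readers may still be using their fast path. */
static void toy_sync_enter(struct toy_sync *s)
{
	if (s->state == TOY_GP_IDLE) {
		toy_synchronize_sched(s);
		s->state = TOY_GP_PASSED;
	}
	s->writers++;
}

/* Writer exit: never waits and does not switch back to idle itself. */
static void toy_sync_exit(struct toy_sync *s)
{
	assert(s->writers > 0);
	s->writers--;
}

/* Models the deferred callback that runs some time after the last exit. */
static void toy_sync_quiesce(struct toy_sync *s)
{
	if (s->writers == 0)
		s->state = TOY_GP_IDLE;
}

/* Runs n back-to-back writers; returns the grace periods they cost. */
static int toy_run_writers(struct toy_sync *s, int n)
{
	int before = s->grace_periods;

	for (int i = 0; i < n; i++) {
		toy_sync_enter(s);
		toy_sync_exit(s);
	}
	return s->grace_periods - before;
}
```

In this model a burst of writers costs one simulated grace period in total,
and the next one is only paid after toy_sync_quiesce() has put the state
back to idle. The first writer of a burst still pays the full non-expedited
wait, which is the "can take much more time" part above.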

> So it's possible that it isn't eliminating the root
> cause of the problem.

We will see... Just in case, the current usage of percpu_down_write()
is suboptimal. We do not need to do ->sync() under cgroup_mutex. But
this needs some WIP changes in rcu_sync. Plus we can make more improvements,
but this is off-topic right now.

Oleg.


