All of lore.kernel.org
 help / color / mirror / Atom feed
From: Oleg Nesterov <oleg@redhat.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>,
	paulmck@linux.vnet.ibm.com, Peter Zijlstra <peterz@infradead.org>,
	Tejun Heo <tj@kernel.org>, Ingo Molnar <mingo@redhat.com>,
	"linux-kernel@vger.kernel.org >> Linux Kernel Mailing List"
	<linux-kernel@vger.kernel.org>, KVM list <kvm@vger.kernel.org>
Subject: Re: [4.2] commit d59cfc09c32 (sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem) causes regression for libvirt/kvm
Date: Wed, 16 Sep 2015 14:22:49 +0200	[thread overview]
Message-ID: <20150916122249.GA28821@redhat.com> (raw)
In-Reply-To: <55F9326A.9070509@redhat.com>

On 09/16, Paolo Bonzini wrote:
>
>
> On 16/09/2015 10:57, Christian Borntraeger wrote:
> > Am 16.09.2015 um 10:32 schrieb Paolo Bonzini:
> >>
> >>
> >> On 15/09/2015 19:38, Paul E. McKenney wrote:
> >>> Excellent points!
> >>>
> >>> Other options in such situations include the following:
> >>>
> >>> o	Rework so that the code uses call_rcu*() instead of *_expedited().
> >>>
> >>> o	Maintain a per-task or per-CPU counter so that every so many
> >>> 	*_expedited() invocations instead uses the non-expedited
> >>> 	counterpart.  (For example, synchronize_rcu instead of
> >>> 	synchronize_rcu_expedited().)
> >>
> >> Or just use ratelimit (untested):
> >
> > One of my tests was to always replace synchronize_sched_expedited with
> > synchronize_sched and things turned out to be even worse. Not sure if
> > it makes sense to test yopur in-the-middle approach?
>
> I don't think it applies here, since down_write/up_write is a
> synchronous API.
>
> If the revert isn't easy, I think backporting rcu_sync is the best bet.

I leave this to Paul and Tejun... at least I think this is not v4.2 material.

>  The issue is that rcu_sync doesn't eliminate synchronize_sched,

Yes, but it eliminates _expedited(). This is good, but otoh this means
that (say) individual __cgroup_procs_write() can take much more time.
However, it won't block the readers and/or disturb the whole system.
And percpu_up_write() doesn't do synchronize_sched() at all.

> it only
> makes it more rare.

Yes, so we can hope that multiple __cgroup_procs_write()'s can "share"
a single synchronize_sched().

> So it's possible that it isn't eliminating the root
> cause of the problem.

We will see... Just in case, currently the usage of percpu_down_write()
is suboptimal. We do not need to do ->sync() under cgroup_mutex. But
this needs some WIP changes in rcu_sync. Plus we can do more improvements,
but this is off-topic right now.

Oleg.

  reply	other threads:[~2015-09-16 12:22 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-09-15 12:05 [4.2] commit d59cfc09c32 (sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem) causes regression for libvirt/kvm Christian Borntraeger
2015-09-15 13:05 ` Peter Zijlstra
2015-09-15 13:36   ` Christian Borntraeger
2015-09-15 13:53     ` Tejun Heo
2015-09-15 16:42     ` Paolo Bonzini
2015-09-15 17:38       ` Paul E. McKenney
2015-09-16  8:32         ` Paolo Bonzini
2015-09-16  8:57           ` Christian Borntraeger
2015-09-16  9:12             ` Paolo Bonzini
2015-09-16 12:22               ` Oleg Nesterov [this message]
2015-09-16 12:35                 ` Paolo Bonzini
2015-09-16 12:43                   ` Oleg Nesterov
2015-09-16 12:56                 ` Christian Borntraeger
2015-09-16 14:16                 ` Tejun Heo
2015-09-16 14:19                   ` Paolo Bonzini
2015-09-15 21:11       ` Christian Borntraeger
2015-09-15 21:26         ` Tejun Heo
2015-09-15 21:38           ` Paul E. McKenney
2015-09-15 22:28             ` Tejun Heo
2015-09-15 23:38               ` Paul E. McKenney
2015-09-16  1:24                 ` Tejun Heo
2015-09-16  4:35                   ` Paul E. McKenney
2015-09-16 11:06                     ` Tejun Heo
2015-09-16  7:44                   ` Christian Borntraeger
2015-09-16 10:58                     ` Christian Borntraeger
2015-09-16 11:03                       ` Tejun Heo
2015-09-16 11:50                         ` Christian Borntraeger
2015-09-16 15:55 ` [PATCH cgroup/for-4.3-fixes 1/2] Revert "cgroup: simplify threadgroup locking" Tejun Heo
2015-09-16 15:56   ` [PATCH cgroup/for-4.3-fixes 2/2] Revert "sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem" Tejun Heo
2015-09-16 17:00   ` [PATCH cgroup/for-4.3-fixes 1/2] Revert "cgroup: simplify threadgroup locking" Oleg Nesterov
2015-09-16 18:45   ` Christian Borntraeger

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150916122249.GA28821@redhat.com \
    --to=oleg@redhat.com \
    --cc=borntraeger@de.ibm.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.