From: Paolo Bonzini <pbonzini@redhat.com>
To: paulmck@linux.vnet.ibm.com
Cc: Christian Borntraeger <borntraeger@de.ibm.com>,
Peter Zijlstra <peterz@infradead.org>, Tejun Heo <tj@kernel.org>,
Ingo Molnar <mingo@redhat.com>,
"linux-kernel@vger.kernel.org >> Linux Kernel Mailing List"
<linux-kernel@vger.kernel.org>, KVM list <kvm@vger.kernel.org>,
Oleg Nesterov <oleg@redhat.com>
Subject: Re: [4.2] commit d59cfc09c32 (sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem) causes regression for libvirt/kvm
Date: Wed, 16 Sep 2015 10:32:04 +0200 [thread overview]
Message-ID: <55F92904.4090206@redhat.com> (raw)
In-Reply-To: <20150915173836.GO4029@linux.vnet.ibm.com>
On 15/09/2015 19:38, Paul E. McKenney wrote:
> Excellent points!
>
> Other options in such situations include the following:
>
> o Rework so that the code uses call_rcu*() instead of *_expedited().
>
> o Maintain a per-task or per-CPU counter so that every so many
> *_expedited() invocations instead uses the non-expedited
> counterpart. (For example, synchronize_rcu instead of
> synchronize_rcu_expedited().)
Or just use ratelimit (untested):
diff --git a/include/linux/percpu-rwsem.h b/include/linux/percpu-rwsem.h
index 834c4e52cb2d..8fb66b2aeed9 100644
--- a/include/linux/percpu-rwsem.h
+++ b/include/linux/percpu-rwsem.h
@@ -6,6 +6,7 @@
#include <linux/percpu.h>
#include <linux/wait.h>
#include <linux/lockdep.h>
+#include <linux/ratelimit.h>
struct percpu_rw_semaphore {
unsigned int __percpu *fast_read_ctr;
@@ -13,6 +14,7 @@ struct percpu_rw_semaphore {
struct rw_semaphore rw_sem;
atomic_t slow_read_ctr;
wait_queue_head_t write_waitq;
+ struct ratelimit_state expedited_ratelimit;
};
extern void percpu_down_read(struct percpu_rw_semaphore *);
diff --git a/kernel/locking/percpu-rwsem.c b/kernel/locking/percpu-rwsem.c
index f32567254867..c33f8bc89384 100644
--- a/kernel/locking/percpu-rwsem.c
+++ b/kernel/locking/percpu-rwsem.c
@@ -20,6 +20,8 @@ int __percpu_init_rwsem(struct percpu_rw_semaphore *brw,
atomic_set(&brw->write_ctr, 0);
atomic_set(&brw->slow_read_ctr, 0);
init_waitqueue_head(&brw->write_waitq);
+ /* Expedite one down_write and one up_write per second. */
+ ratelimit_state_init(&brw->expedited_ratelimit, HZ, 2);
return 0;
}
@@ -152,7 +156,10 @@ void percpu_down_write(struct percpu_rw_semaphore *brw)
* fast-path, it executes a full memory barrier before we return.
* See R_W case in the comment above update_fast_ctr().
*/
- synchronize_sched_expedited();
+ if (__ratelimit(&brw->expedited_ratelimit))
+ synchronize_sched_expedited();
+ else
+ synchronize_sched();
/* exclude other writers, and block the new readers completely */
down_write(&brw->rw_sem);
@@ -172,7 +179,10 @@ void percpu_up_write(struct percpu_rw_semaphore *brw)
* Insert the barrier before the next fast-path in down_read,
* see W_R case in the comment above update_fast_ctr().
*/
- synchronize_sched_expedited();
+ if (__ratelimit(&brw->expedited_ratelimit))
+ synchronize_sched_expedited();
+ else
+ synchronize_sched();
/* the last writer unblocks update_fast_ctr() */
atomic_dec(&brw->write_ctr);
}
> Note that synchronize_srcu_expedited() is less troublesome than are the
> other *_expedited() functions, because synchronize_srcu_expedited() does
> not inflict OS jitter on other CPUs.
Yup, synchronize_srcu_expedited() is just a busy wait and it can
complete extremely fast if you use SRCU as a "local RCU" rather
than a "sleepable RCU". However it doesn't apply here since you
want to avoid SRCU's 2 memory barriers per lock/unlock.
Paolo
next prev parent reply other threads:[~2015-09-16 8:32 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-09-15 12:05 [4.2] commit d59cfc09c32 (sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem) causes regression for libvirt/kvm Christian Borntraeger
2015-09-15 13:05 ` Peter Zijlstra
2015-09-15 13:36 ` Christian Borntraeger
2015-09-15 13:53 ` Tejun Heo
2015-09-15 16:42 ` Paolo Bonzini
2015-09-15 17:38 ` Paul E. McKenney
2015-09-16 8:32 ` Paolo Bonzini [this message]
2015-09-16 8:57 ` Christian Borntraeger
2015-09-16 9:12 ` Paolo Bonzini
2015-09-16 12:22 ` Oleg Nesterov
2015-09-16 12:35 ` Paolo Bonzini
2015-09-16 12:43 ` Oleg Nesterov
2015-09-16 12:56 ` Christian Borntraeger
2015-09-16 14:16 ` Tejun Heo
2015-09-16 14:19 ` Paolo Bonzini
2015-09-15 21:11 ` Christian Borntraeger
2015-09-15 21:26 ` Tejun Heo
2015-09-15 21:38 ` Paul E. McKenney
2015-09-15 22:28 ` Tejun Heo
2015-09-15 23:38 ` Paul E. McKenney
2015-09-16 1:24 ` Tejun Heo
2015-09-16 4:35 ` Paul E. McKenney
2015-09-16 11:06 ` Tejun Heo
2015-09-16 7:44 ` Christian Borntraeger
2015-09-16 10:58 ` Christian Borntraeger
2015-09-16 11:03 ` Tejun Heo
2015-09-16 11:50 ` Christian Borntraeger
2015-09-16 15:55 ` [PATCH cgroup/for-4.3-fixes 1/2] Revert "cgroup: simplify threadgroup locking" Tejun Heo
2015-09-16 15:56 ` [PATCH cgroup/for-4.3-fixes 2/2] Revert "sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem" Tejun Heo
2015-09-16 17:00 ` [PATCH cgroup/for-4.3-fixes 1/2] Revert "cgroup: simplify threadgroup locking" Oleg Nesterov
2015-09-16 18:45 ` Christian Borntraeger
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=55F92904.4090206@redhat.com \
--to=pbonzini@redhat.com \
--cc=borntraeger@de.ibm.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=oleg@redhat.com \
--cc=paulmck@linux.vnet.ibm.com \
--cc=peterz@infradead.org \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.