From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753136AbbIPIcN (ORCPT );
	Wed, 16 Sep 2015 04:32:13 -0400
Received: from mx1.redhat.com ([209.132.183.28]:50520 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751179AbbIPIcK (ORCPT );
	Wed, 16 Sep 2015 04:32:10 -0400
Subject: Re: [4.2] commit d59cfc09c32 (sched, cgroup: replace
	signal_struct->group_rwsem with a global percpu_rwsem) causes
	regression for libvirt/kvm
To: paulmck@linux.vnet.ibm.com
References: <55F8097A.7000206@de.ibm.com>
	<20150915130550.GC16853@twins.programming.kicks-ass.net>
	<55F81EE2.4090708@de.ibm.com> <55F84A6B.1010207@redhat.com>
	<20150915173836.GO4029@linux.vnet.ibm.com>
Cc: Christian Borntraeger, Peter Zijlstra, Tejun Heo, Ingo Molnar,
	"linux-kernel@vger.kernel.org >> Linux Kernel Mailing List",
	KVM list, Oleg Nesterov
From: Paolo Bonzini
X-Enigmail-Draft-Status: N1110
Message-ID: <55F92904.4090206@redhat.com>
Date: Wed, 16 Sep 2015 10:32:04 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101
	Thunderbird/38.2.0
MIME-Version: 1.0
In-Reply-To: <20150915173836.GO4029@linux.vnet.ibm.com>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 15/09/2015 19:38, Paul E. McKenney wrote:
> Excellent points!
> 
> Other options in such situations include the following:
> 
> o	Rework so that the code uses call_rcu*() instead of *_expedited().
> 
> o	Maintain a per-task or per-CPU counter so that every so many
> 	*_expedited() invocations instead uses the non-expedited
> 	counterpart.  (For example, synchronize_rcu instead of
> 	synchronize_rcu_expedited().)

Or just use ratelimit (untested):

diff --git a/include/linux/percpu-rwsem.h b/include/linux/percpu-rwsem.h
index 834c4e52cb2d..8fb66b2aeed9 100644
--- a/include/linux/percpu-rwsem.h
+++ b/include/linux/percpu-rwsem.h
@@ -6,6 +6,7 @@
 #include <linux/percpu.h>
 #include <linux/wait.h>
 #include <linux/lockdep.h>
+#include <linux/ratelimit.h>
 
 struct percpu_rw_semaphore {
 	unsigned int __percpu	*fast_read_ctr;
@@ -13,6 +14,7 @@ struct percpu_rw_semaphore {
 	struct rw_semaphore	rw_sem;
 	atomic_t		slow_read_ctr;
 	wait_queue_head_t	write_waitq;
+	struct ratelimit_state	expedited_ratelimit;
 };
 
 extern void percpu_down_read(struct percpu_rw_semaphore *);
diff --git a/kernel/locking/percpu-rwsem.c b/kernel/locking/percpu-rwsem.c
index f32567254867..c33f8bc89384 100644
--- a/kernel/locking/percpu-rwsem.c
+++ b/kernel/locking/percpu-rwsem.c
@@ -20,6 +20,8 @@ int __percpu_init_rwsem(struct percpu_rw_semaphore *brw,
 	atomic_set(&brw->write_ctr, 0);
 	atomic_set(&brw->slow_read_ctr, 0);
 	init_waitqueue_head(&brw->write_waitq);
+	/* Expedite one down_write and one up_write per second. */
+	ratelimit_state_init(&brw->expedited_ratelimit, HZ, 2);
 	return 0;
 }
 
@@ -152,7 +156,10 @@ void percpu_down_write(struct percpu_rw_semaphore *brw)
 	 * fast-path, it executes a full memory barrier before we return.
 	 * See R_W case in the comment above update_fast_ctr().
 	 */
-	synchronize_sched_expedited();
+	if (__ratelimit(&brw->expedited_ratelimit))
+		synchronize_sched_expedited();
+	else
+		synchronize_sched();
 
 	/* exclude other writers, and block the new readers completely */
 	down_write(&brw->rw_sem);
@@ -172,7 +179,10 @@ void percpu_up_write(struct percpu_rw_semaphore *brw)
 	 * Insert the barrier before the next fast-path in down_read,
 	 * see W_R case in the comment above update_fast_ctr().
 	 */
-	synchronize_sched_expedited();
+	if (__ratelimit(&brw->expedited_ratelimit))
+		synchronize_sched_expedited();
+	else
+		synchronize_sched();
 	/* the last writer unblocks update_fast_ctr() */
 	atomic_dec(&brw->write_ctr);
 }

> Note that synchronize_srcu_expedited() is less troublesome than are the
> other *_expedited() functions, because synchronize_srcu_expedited() does
> not inflict OS jitter on other CPUs.

Yup, synchronize_srcu_expedited() is just a busy wait, and it can
complete extremely fast if you use SRCU as a "local RCU" rather than a
"sleepable RCU".  However, it doesn't apply here, since you want to
avoid SRCU's two memory barriers per lock/unlock.

Paolo
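
P.S. For comparison, the per-CPU counter idea from Paul's list above
could look something like the sketch below.  Untested, and the names
EXPEDITED_EVERY and sync_sched_throttled() are made up for
illustration; the counter is only a heuristic:

	#include <linux/percpu.h>
	#include <linux/rcupdate.h>

	/* Fall back to a normal grace period once every N requests. */
	#define EXPEDITED_EVERY	16

	static DEFINE_PER_CPU(unsigned int, expedited_count);

	static void sync_sched_throttled(void)
	{
		/*
		 * this_cpu_inc_return() is preemption-safe; since the
		 * counter is only a heuristic, occasional imprecision
		 * across CPUs is harmless.
		 */
		if (this_cpu_inc_return(expedited_count) % EXPEDITED_EVERY)
			synchronize_sched_expedited();
		else
			synchronize_sched();
	}

Unlike the ratelimit version, this bounds the *fraction* of expedited
grace periods rather than their absolute rate, so a long burst of
writers still gets mostly expedited behavior.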