Date: Wed, 15 Jul 2015 11:27:13 -0700
From: "Paul E. McKenney"
To: Oleg Nesterov
Cc: Linus Torvalds, Peter Zijlstra, Daniel Wagner, Davidlohr Bueso,
    Ingo Molnar, Tejun Heo, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/7] Add rcu_sync infrastructure to avoid _expedited() in percpu-rwsem
Message-ID: <20150715182713.GL3717@linux.vnet.ibm.com>
Reply-To: paulmck@linux.vnet.ibm.com
References: <20150711233535.GA829@redhat.com>
In-Reply-To: <20150711233535.GA829@redhat.com>

On Sun, Jul 12, 2015 at 01:35:35AM +0200, Oleg Nesterov wrote:
> Hello,
>
> Let me make another attempt to push rcu_sync and add a _simple_
> improvement into percpu-rwsem. It already has another user (cgroups)
> and I think it can have more. Peter has some use-cases. sb->s_writers
> (which afaics is buggy btw) can be turned into percpu-rwsem too I think.
>
> Linus, I am mostly trying to convince you. Nobody else objected so far.
> Could you please comment?
>
> Peter, if you agree with 5-7, can I add your Signed-off-by's ?
>
> To me, the most annoying problem with percpu_rw_semaphore is
> synchronize_sched_expedited() which is called twice by every
> down_write/up_write. I think it would be really nice to avoid it.
>
> Let's start with the simple test-case,
>
>     #!/bin/bash
>
>     perf probe -x /lib/libc.so.6 syscall
>
>     for i in {1..1000}; do
>         echo 1 >| /sys/kernel/debug/tracing/events/probe_libc/syscall/enable
>         echo 0 >| /sys/kernel/debug/tracing/events/probe_libc/syscall/enable
>     done
>
> It needs ~ 13.5 seconds (2 CPUs, KVM). If we simply replace
> synchronize_sched_expedited() with synchronize_sched() it takes
> ~ 67.5 seconds. This is not good.

Yep, even if you avoided the write-release grace period, you would
still be looking at something like 40 seconds, which is 3x. Some might
consider that to be a performance regression. ;-)

> With these patches it takes around 13.3 seconds again (a little
> bit faster), and it doesn't use _expedited. synchronize_sched()
> is called 1-2 (max 3) times on average. And now it does not
> disturb the whole system.
>
> And just in case, I also measured
>
>     for (i = 0; i < 1000000; ++i) {
>         percpu_down_write(&dup_mmap_sem);
>         percpu_up_write(&dup_mmap_sem);
>     }
>
> and it runs more than 1.5 times faster (to remind, only 2 CPUs),
> but this is not that interesting, I agree.

Your trick of avoiding the grace periods during a writer-to-writer
handoff is cute, and it is helping a lot here. Concurrent readers
would have a tough time of it with this workload, though. They would
all be serialized.

> And note that the actual change in percpu-rwsem is really simple,
> and imo it even makes the code simpler. (the last patch is off-topic
> cleanup).
>
> So the only complication is rcu_sync itself. But, rightly or not (I
> am obviously biased), I believe this new rcu infrastructure is natural
> and useful, and I think it can have more users too.

I don't have an objection to it, even in its current form (I did
review it long ago), but it does need to have a user!

> And. We can do more improvements in rcu_sync and percpu-rwsem, and
> I don't only mean other optimizations from Peter. In particular, we
> can extract the "wait for gp pass" from rcu_sync_enter() into another
> helper, we can teach percpu_down_write() to allow multiple writers,
> and more.

As in a percpu_down_write() that allows up to (say) five concurrent
write-holders? (Which can be useful, don't get me wrong.) Or do
you mean as an internal optimization of some sort?

							Thanx, Paul

> Oleg.
>
>  include/linux/percpu-rwsem.h  |    3 +-
>  include/linux/rcusync.h       |   57 +++++++++++++++
>  kernel/locking/percpu-rwsem.c |   78 ++++++---------------
>  kernel/rcu/Makefile           |    2 +-
>  kernel/rcu/sync.c             |  152 +++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 235 insertions(+), 57 deletions(-)
>
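
For readers who want to check the "something like 40 seconds, which is 3x" figure above, here is a rough back-of-the-envelope sketch in C. It is not taken from the patch series. It assumes that the extra time measured with synchronize_sched() splits evenly between the write-acquire and write-release grace periods, so that dropping the release-side grace period removes half of the extra. The helper names estimate_ms() and slowdown_tenths() are made up for illustration.

```c
#include <assert.h>

/* Timings quoted in the thread for the 1000-iteration test, in milliseconds. */
#define T_EXPEDITED_MS 13500LL  /* with synchronize_sched_expedited() */
#define T_NORMAL_MS    67500LL  /* with plain synchronize_sched() */

/*
 * Hypothetical helper: estimated run time when only `kept` of the `total`
 * grace periods per down_write/up_write pair are still non-expedited.
 * ASSUMPTION: every grace period contributes an equal share of the extra time.
 */
long long estimate_ms(long long expedited_ms, long long normal_ms,
                      int kept, int total)
{
    assert(total > 0 && kept >= 0 && kept <= total);

    long long extra_ms = normal_ms - expedited_ms;
    return expedited_ms + extra_ms * kept / total;
}

/* Slowdown relative to the expedited run, in tenths (30 means 3.0x). */
long long slowdown_tenths(long long expedited_ms, long long estimated_ms)
{
    assert(expedited_ms > 0);
    return estimated_ms * 10 / expedited_ms;
}
```

Under that assumption, estimate_ms(T_EXPEDITED_MS, T_NORMAL_MS, 1, 2) gives 40500 ms, and slowdown_tenths(T_EXPEDITED_MS, 40500) gives 30, that is, roughly 40 seconds and 3x, in line with the estimate in the reply.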