Date: Fri, 11 Sep 2015 10:05:26 -0700
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Oleg Nesterov
Cc: Davidlohr Bueso, Peter Zijlstra, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/1] rcu_sync: Cleanup the CONFIG_PROVE_RCU checks
Message-ID: <20150911170526.GU4029@linux.vnet.ibm.com>
In-Reply-To: <20150911155901.GA27246@redhat.com>

On Fri, Sep 11, 2015 at 05:59:01PM +0200, Oleg Nesterov wrote:
> On 09/10, Paul E. McKenney wrote:
> >
> > On Thu, Sep 10, 2015 at 03:59:42PM +0200, Oleg Nesterov wrote:
> > > On 09/09, Paul E. McKenney wrote:
> > > >
> > > > This is obsolete, but its replacement is the same patch.
> > >
> > > fbe3b97183f84155d81e506b1aa7d2ce986f7a36 in linux-rcu.git#experimental
> > > I guess?
> > >
> > > > Oleg, Davidlohr, am I missing something on how percpu_rwsem or
> > > > locktorture work?
> > >
> > > No, I think the patch is fine. Thanks for doing this! I was going to
> > > send something like this change too.
> > > And in fact I am still thinking
> > > about another test which plays with rcu_sync only, but probably we
> > > need some cleanups first (and we need them anyway). I'll try to do
> > > this a bit later.
> >
> > I would welcome an rcu_sync-specific torture patch!
>
> I want it much more than you ;) I have already warned you, I'll send
> more rcu_sync patches. The current code is actually a very early draft
> which was written during the discussion with Peter long ago. I sent
> it unchanged because a) it was already reviewed and b) I tested it a
> bit in the past.
>
> We can greatly simplify this code and at the same time make it more
> useful. Actually I already have the patches. The first one removes
> rcu_sync->cb_state and gp_ops->sync(). This makes the state machine
> almost self-evident and allows other improvements. See the resulting
> (pseudo) code at the end.
>
> But again, I'll try very much to write the test before I send the patch.

That sounds very good! ;-)

> Until then, let me send this trivial cleanup. The CONFIG_PROVE_RCU
> code looks trivial but is IMO really annoying. And it is not complete,
> so let's document this at least. Plus rcu_lockdep_assert() looks more
> consistent.
>
> > > > +void torture_percpu_rwsem_init(void)
> > > > +{
> > > > +	BUG_ON(percpu_init_rwsem(&pcpu_rwsem));
> > > > +}
> > > > +
> > >
> > > Aha, we don't really need this... I mean we can use the static initialiser
> > > which can also be used by uprobes and cgroups. I'll try to send the patch
> > > tomorrow.
> >
> > Very good, please do!
>
> Hmm. I am a liar. I won't send this patch at least today.
>
> The change I had in mind is very simple,
>
>	#define DECLARE_PERCPU_RWSEM(sem)				\
>		static DEFINE_PER_CPU(unsigned int, sem##_counters);	\
>		struct percpu_rw_semaphore sem = {			\
>			.fast_read_ctr = &sem##_counters,		\
>			...						\
>		}
>
> and yes, uprobes and cgroups can use it.
>
> But somehow I missed that we can't use it to define a _static_ sem;
>
>	static DECLARE_PERCPU_RWSEM(sem);
>
> obviously won't work. And damn, I am ashamed to admit that I spent several
> hours trying to invent something but failed. Perhaps we can add two helpers,
> DECLARE_PERCPU_RWSEM_GLOBAL() and DECLARE_PERCPU_RWSEM_STATIC().

That is indeed what we do for SRCU for the same reason: DEFINE_SRCU()
and DEFINE_STATIC_SRCU(), but with a common __DEFINE_SRCU() doing the
actual work.

							Thanx, Paul

> Oleg.
>
> -------------------------------------------------------------------------------
> static const struct {
>	void (*call)(struct rcu_head *, void (*)(struct rcu_head *));
>	void (*wait)(void);	// TODO: remove this
> #ifdef CONFIG_PROVE_RCU
>	int (*held)(void);
> #endif
> } gp_ops[] = {
>	...
> };
>
> // COMMENT to explain these states
> enum { GP_IDLE = 0, GP_ENTER, GP_PASSED, GP_EXIT, GP_REPLAY };
>
> #define rss_lock	gp_wait.lock
>
> // !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
> // XXX code must be removed when we split rcu_sync_enter() into start + wait
> // !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>
> static void rcu_sync_func(struct rcu_head *rcu)
> {
>	struct rcu_sync *rsp = container_of(rcu, struct rcu_sync, cb_head);
>	unsigned long flags;
>
>	BUG_ON(rsp->gp_state == GP_IDLE);
>	BUG_ON(rsp->gp_state == GP_PASSED);
>
>	spin_lock_irqsave(&rsp->rss_lock, flags);
>	if (rsp->gp_count) {
>		/*
>		 * COMMENT.
>		 */
>		rsp->gp_state = GP_PASSED;
>		wake_up_locked(&rsp->gp_wait);
>	} else if (rsp->gp_state == GP_REPLAY) {
>		/*
>		 * A new rcu_sync_exit() has happened; requeue the callback
>		 * to catch a later GP.
>		 */
>		rsp->gp_state = GP_EXIT;
>		gp_ops[rsp->gp_type].call(&rsp->cb_head, rcu_sync_func);
>	} else {
>		/*
>		 * We're at least a GP after rcu_sync_exit(); everybody will now
>		 * have observed the write side critical section. Let 'em rip!
>		 */
>		BUG_ON(rsp->gp_state == GP_ENTER);	// XXX
>		rsp->gp_state = GP_IDLE;
>	}
>	spin_unlock_irqrestore(&rsp->rss_lock, flags);
> }
>
> static void rcu_sync_call(struct rcu_sync *rsp)
> {
>	// TODO:
>	// This is called by might_sleep() code outside of ->rss_lock;
>	// we can avoid ->call() in some cases (say, rcu_blocking_is_gp())
>	gp_ops[rsp->gp_type].call(&rsp->cb_head, rcu_sync_func);
> }
>
> void rcu_sync_enter(struct rcu_sync *rsp)
> {
>	int gp_count, gp_state;
>
>	spin_lock_irq(&rsp->rss_lock);
>	gp_count = rsp->gp_count++;
>	gp_state = rsp->gp_state;
>	if (gp_state == GP_IDLE)
>		rsp->gp_state = GP_ENTER;
>	spin_unlock_irq(&rsp->rss_lock);
>
>	BUG_ON(gp_count != 0 && gp_state == GP_IDLE);
>	BUG_ON(gp_count == 0 && gp_state == GP_PASSED);
>	BUG_ON(gp_count == 0 && gp_state == GP_ENTER);	// XXX
>
>	if (gp_state == GP_IDLE)
>		rcu_sync_call(rsp);
>
>	wait_event(rsp->gp_wait, rsp->gp_state != GP_ENTER);
>	BUG_ON(rsp->gp_state < GP_PASSED);
> }
>
> void rcu_sync_exit(struct rcu_sync *rsp)
> {
>	bool need_call = false;
>
>	BUG_ON(rsp->gp_state == GP_IDLE);
>	BUG_ON(rsp->gp_state == GP_ENTER);	// XXX
>
>	spin_lock_irq(&rsp->rss_lock);
>	if (!--rsp->gp_count) {
>		if (rsp->gp_state == GP_PASSED) {
>			need_call = true;
>			rsp->gp_state = GP_EXIT;
>		} else if (rsp->gp_state == GP_EXIT) {
>			rsp->gp_state = GP_REPLAY;
>		}
>	}
>	spin_unlock_irq(&rsp->rss_lock);
>
>	// Comment to explain why we do not care if another enter()
>	// and perhaps even exit() comes after spin_unlock().
>	if (need_call)
>		rcu_sync_call(rsp);
> }
>
> void rcu_sync_dtor(struct rcu_sync *rsp)
> {
>	int gp_state;
>
>	BUG_ON(rsp->gp_count);
>	BUG_ON(rsp->gp_state == GP_ENTER);	// XXX
>	BUG_ON(rsp->gp_state == GP_PASSED);
>
>	spin_lock_irq(&rsp->rss_lock);
>	if (rsp->gp_state == GP_REPLAY)
>		rsp->gp_state = GP_EXIT;
>	gp_state = rsp->gp_state;
>	spin_unlock_irq(&rsp->rss_lock);
>
>	// TODO: add another wake_up_locked() into rcu_sync_func(),
>	// use wait_event + spin_lock_wait, remove gp_ops->wait().
>
>	if (gp_state != GP_IDLE) {
>		gp_ops[rsp->gp_type].wait();
>		BUG_ON(rsp->gp_state != GP_IDLE);
>	}
> }
>