From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753564AbbIKQBv (ORCPT ); Fri, 11 Sep 2015 12:01:51 -0400
Received: from mx1.redhat.com ([209.132.183.28]:48667 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1753178AbbIKQBt (ORCPT ); Fri, 11 Sep 2015 12:01:49 -0400
Date: Fri, 11 Sep 2015 17:59:01 +0200
From: Oleg Nesterov
To: "Paul E. McKenney"
Cc: Davidlohr Bueso, Peter Zijlstra, linux-kernel@vger.kernel.org
Subject: [PATCH 0/1] rcu_sync: Cleanup the CONFIG_PROVE_RCU checks
Message-ID: <20150911155901.GA27246@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.5.18 (2008-05-17)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: X-Mailing-List: linux-kernel@vger.kernel.org

On 09/10, Paul E. McKenney wrote:
>
> On Thu, Sep 10, 2015 at 03:59:42PM +0200, Oleg Nesterov wrote:
> > On 09/09, Paul E. McKenney wrote:
> > >
> > > This is obsolete, but its replacement is the same patch.
> >
> > fbe3b97183f84155d81e506b1aa7d2ce986f7a36 in linux-rcu.git#experimental
> > I guess?
> >
> > > Oleg, Davidlohr, am I missing something on how percpu_rwsem or
> > > locktorture work?
> >
> > No, I think the patch is fine. Thanks for doing this! I was going to
> > send something like this change too. And in fact I am still thinking
> > about another test which plays with rcu_sync only, but probably we
> > need some cleanups first (and we need them anyway). I'll try to do
> > this a bit later.
>
> I would welcome an rcu_sync-specific torture patch!

I want it much more than you ;) I have already warned you, I'll send
more rcu_sync patches. The current code is actually a very early draft
which was written during the discussion with Peter long ago. I sent it
unchanged because a) it was already reviewed and b) I tested it a bit
in the past.
We can greatly simplify this code and at the same time make it more
useful. Actually I already have the patches. The first one removes
rcu_sync->cb_state and gp_ops->sync(). This makes the state machine
almost self-obvious and allows other improvements. See the resulting
(pseudo) code at the end.

But again, I'll try very hard to write the test before I send the
patch. Until then, let me send this trivial cleanup. The
CONFIG_PROVE_RCU code looks trivial but is imo really annoying. And it
is not complete, so let's document this at least. Plus
rcu_lockdep_assert() looks more consistent.

> > > +void torture_percpu_rwsem_init(void)
> > > +{
> > > +	BUG_ON(percpu_init_rwsem(&pcpu_rwsem));
> > > +}
> > > +
> >
> > Aha, we don't really need this... I mean we can use the static
> > initialiser which can also be used by uprobes and cgroups. I'll try
> > to send the patch tomorrow.
>
> Very good, please do!

Hmm. I am a liar. I won't send this patch, at least not today. The
change I had in mind is very simple,

	#define DECLARE_PERCPU_RWSEM(sem)				\
		static DEFINE_PER_CPU(unsigned int, sem##_counters);	\
		struct percpu_rw_semaphore sem = {			\
			.fast_read_ctr	= &sem##_counters,		\
			...						\
		}

and yes, uprobes and cgroups can use it. But somehow I missed that we
can't use it to define a _static_ sem,

	static DECLARE_PERCPU_RWSEM(sem);

obviously won't work. And damn, I am embarrassed to admit that I spent
several hours trying to invent something but failed. Perhaps we can add
2 helpers, DECLARE_PERCPU_RWSEM_GLOBAL() and DECLARE_PERCPU_RWSEM_STATIC().

Oleg.

-------------------------------------------------------------------------------

static const struct {
	void (*call)(struct rcu_head *, void (*)(struct rcu_head *));
	void (*wait)(void);	// TODO: remove this
#ifdef CONFIG_PROVE_RCU
	int  (*held)(void);
#endif
} gp_ops[] = {
	...
};

// COMMENT to explain these states
enum { GP_IDLE = 0, GP_ENTER, GP_PASSED, GP_EXIT, GP_REPLAY };

#define	rss_lock	gp_wait.lock

// !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
// XXX code must be removed when we split rcu_sync_enter() into start + wait
// !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

static void rcu_sync_func(struct rcu_head *rcu)
{
	struct rcu_sync *rsp = container_of(rcu, struct rcu_sync, cb_head);
	unsigned long flags;

	BUG_ON(rsp->gp_state == GP_IDLE);
	BUG_ON(rsp->gp_state == GP_PASSED);

	spin_lock_irqsave(&rsp->rss_lock, flags);
	if (rsp->gp_count) {
		/*
		 * COMMENT.
		 */
		rsp->gp_state = GP_PASSED;
		wake_up_locked(&rsp->gp_wait);
	} else if (rsp->gp_state == GP_REPLAY) {
		/*
		 * A new rcu_sync_exit() has happened; requeue the callback
		 * to catch a later GP.
		 */
		rsp->gp_state = GP_EXIT;
		gp_ops[rsp->gp_type].call(&rsp->cb_head, rcu_sync_func);
	} else {
		/*
		 * We're at least a GP after rcu_sync_exit(); everybody will
		 * now have observed the write side critical section. Let
		 * 'em rip!
		 */
		BUG_ON(rsp->gp_state == GP_ENTER);	// XXX
		rsp->gp_state = GP_IDLE;
	}
	spin_unlock_irqrestore(&rsp->rss_lock, flags);
}

static void rcu_sync_call(struct rcu_sync *rsp)
{
	// TODO:
	// This is called by might_sleep() code outside of ->rss_lock,
	// we can avoid ->call() in some cases (say rcu_blocking_is_gp())
	gp_ops[rsp->gp_type].call(&rsp->cb_head, rcu_sync_func);
}

void rcu_sync_enter(struct rcu_sync *rsp)
{
	int gp_count, gp_state;

	spin_lock_irq(&rsp->rss_lock);
	gp_count = rsp->gp_count++;
	gp_state = rsp->gp_state;
	if (gp_state == GP_IDLE)
		rsp->gp_state = GP_ENTER;
	spin_unlock_irq(&rsp->rss_lock);

	BUG_ON(gp_count != 0 && gp_state == GP_IDLE);
	BUG_ON(gp_count == 0 && gp_state == GP_PASSED);
	BUG_ON(gp_count == 0 && gp_state == GP_ENTER);	// XXX

	if (gp_state == GP_IDLE)
		rcu_sync_call(rsp);

	wait_event(rsp->gp_wait, rsp->gp_state != GP_ENTER);
	BUG_ON(rsp->gp_state < GP_PASSED);
}

void rcu_sync_exit(struct rcu_sync *rsp)
{
	bool need_call = false;

	BUG_ON(rsp->gp_state == GP_IDLE);
	BUG_ON(rsp->gp_state == GP_ENTER);	// XXX

	spin_lock_irq(&rsp->rss_lock);
	if (!--rsp->gp_count) {
		if (rsp->gp_state == GP_PASSED) {
			need_call = true;
			rsp->gp_state = GP_EXIT;
		} else if (rsp->gp_state == GP_EXIT) {
			rsp->gp_state = GP_REPLAY;
		}
	}
	spin_unlock_irq(&rsp->rss_lock);

	// Comment to explain why we do not care if another enter()
	// and perhaps even exit() comes after spin_unlock().
	if (need_call)
		rcu_sync_call(rsp);
}

void rcu_sync_dtor(struct rcu_sync *rsp)
{
	int gp_state;

	BUG_ON(rsp->gp_count);
	BUG_ON(rsp->gp_state == GP_ENTER);	// XXX
	BUG_ON(rsp->gp_state == GP_PASSED);

	spin_lock_irq(&rsp->rss_lock);
	if (rsp->gp_state == GP_REPLAY)
		rsp->gp_state = GP_EXIT;
	gp_state = rsp->gp_state;
	spin_unlock_irq(&rsp->rss_lock);

	// TODO: add another wake_up_locked() into rcu_sync_func(),
	// use wait_event + spin_lock_wait, remove gp_ops->wait().
	if (gp_state != GP_IDLE) {
		gp_ops[rsp->gp_type].wait();
		BUG_ON(rsp->gp_state != GP_IDLE);
	}
}