From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1752185AbdJKWch (ORCPT <rfc822;w@1wt.eu>);
        Wed, 11 Oct 2017 18:32:37 -0400
Received: from mx0a-001b2d01.pphosted.com ([148.163.156.1]:60980 "EHLO
        mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-OK)
        by vger.kernel.org with ESMTP id S1751827AbdJKWcf (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Wed, 11 Oct 2017 18:32:35 -0400
Date: Wed, 11 Oct 2017 15:32:30 -0700
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: stern@rowland.harvard.edu, parri.andrea@gmail.com, will.deacon@arm.com,
        peterz@infradead.org, boqun.feng@gmail.com, npiggin@gmail.com,
        dhowells@redhat.com, j.alglave@ucl.ac.uk, luc.maranget@inria.fr
Cc: linux-kernel@vger.kernel.org
Subject: Linux-kernel examples for LKMM recipes
Reply-To: paulmck@linux.vnet.ibm.com
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.5.21 (2010-09-15)
Message-Id: <20171011223229.GA31650@linux.vnet.ibm.com>
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Hello!

At Linux Plumbers Conference, we got requests for a recipes document,
and a further request to point to actual code in the Linux kernel.
I have pulled together some examples for various litmus-test families,
as shown below.  The decoder ring for the abbreviations (ISA2, LB, SB,
MP, ...) is here:

	https://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test6.pdf

This document is also checked into the memory-models git archive:

	https://github.com/aparri/memory-model.git

I would be especially interested in simpler examples in general, and
of course any example at all for the cases where I was unable to find
any.  Thoughts?

							Thanx, Paul

------------------------------------------------------------------------

This document lists the litmus-test patterns that we have been discussing,
along with examples from the Linux kernel.  This is intended to feed into
the recipes document.  All examples are from v4.13.

0.	Single-variable SC.

	a.	Within a single CPU, the use of the ->dynticks_nmi_nesting
		counter by rcu_nmi_enter() and rcu_nmi_exit() qualifies
		(see kernel/rcu/tree.c).  The counter is accessed by
		interrupts and NMIs as well as by process-level code.
		This counter can be accessed by other CPUs, but only
		for debug output.

	b.	Between CPUs, I would put forward the ->dflags
		updates, but this is anything but simple.  But maybe
		OK for an illustration?
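
	For the between-CPUs case, here is a minimal userspace sketch of
	what single-variable SC promises.  It is not kernel code: it assumes
	C11 relaxed atomics and pthreads as rough stand-ins for WRITE_ONCE(),
	READ_ONCE(), and CPUs, and the names (counter, writer, and so on)
	are made up for illustration.  One thread stores increasing values
	to a single variable, and the other checks that its successive
	loads never go backwards.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define NR_UPDATES 100000

static atomic_int counter;	/* Hypothetical shared variable. */

/* Writer: the only thread storing to counter, with increasing values. */
static void *writer(void *arg)
{
	(void)arg;
	for (int i = 1; i <= NR_UPDATES; i++)
		atomic_store_explicit(&counter, i, memory_order_relaxed);
	return NULL;
}

/* Reader: returns 1 if successive loads never went backwards. */
static int reader_saw_monotonic_values(void)
{
	int prev = 0;

	for (int i = 0; i < NR_UPDATES; i++) {
		int cur = atomic_load_explicit(&counter, memory_order_relaxed);

		if (cur < prev)
			return 0;	/* Would violate single-variable SC. */
		prev = cur;
	}
	return 1;
}

/* Run the writer concurrently with the reader; -1 means thread setup failed. */
int run_single_variable_sc(void)
{
	pthread_t tid;
	int ok;

	atomic_store(&counter, 0);
	if (pthread_create(&tid, NULL, writer, NULL))
		return -1;
	ok = reader_saw_monotonic_values();
	pthread_join(tid, NULL);
	return ok;
}
```

	No barriers are needed here because only one variable is involved.
	Something like "cc -std=c11 -pthread" should build it.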

1.	MP (see test6.pdf for nickname translation)

	a.	smp_store_release() / smp_load_acquire()

		init_stack_slab() in lib/stackdepot.c uses release-acquire
		to handle initialization of a slab of the stack.  Working
		out the mutual-exclusion design is left as an exercise for
		the reader.

	b.	rcu_assign_pointer() / rcu_dereference()

		expand_to_next_prime() does the rcu_assign_pointer(),
		and next_prime_number() does the rcu_dereference().
		This mediates access to a bit vector that is expanded
		as additional primes are needed.  These two functions
		are in lib/prime_numbers.c.

	c.	smp_wmb() / smp_rmb()

		xlog_state_switch_iclogs() contains the following:

			log->l_curr_block -= log->l_logBBsize;
			ASSERT(log->l_curr_block >= 0);
			smp_wmb();
			log->l_curr_cycle++;

		And xlog_valid_lsn() contains the following:

			cur_cycle = ACCESS_ONCE(log->l_curr_cycle);
			smp_rmb();
			cur_block = ACCESS_ONCE(log->l_curr_block);

	d.	Replacing either of the above with smp_mb()

		Holding off on this one for the moment...
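
	For reference, here is a minimal userspace sketch of the MP pattern
	itself.  It is an approximation rather than kernel code: it assumes
	C11 atomics and pthreads, with release/acquire operations standing in
	for smp_store_release() and smp_load_acquire(), and with C11 release
	and acquire fences standing in (only roughly, since they are somewhat
	stronger) for smp_wmb() and smp_rmb().  The names payload, ready,
	and use_fences are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static int payload;		/* Plain data, published via the flag. */
static atomic_int ready;	/* Hypothetical "data is initialized" flag. */
static int use_fences;		/* Set before the threads are created. */

/* Producer: initialize the data, then publish it. */
static void *producer(void *arg)
{
	(void)arg;
	payload = 42;
	if (use_fences) {
		/* Rough analogue of smp_wmb() followed by WRITE_ONCE(). */
		atomic_thread_fence(memory_order_release);
		atomic_store_explicit(&ready, 1, memory_order_relaxed);
	} else {
		/* Analogue of smp_store_release(). */
		atomic_store_explicit(&ready, 1, memory_order_release);
	}
	return NULL;
}

/* Consumer: wait for the flag, then read the data. */
static void *consumer(void *arg)
{
	int *seen = arg;

	if (use_fences) {
		/* Rough analogue of READ_ONCE() followed by smp_rmb(). */
		while (!atomic_load_explicit(&ready, memory_order_relaxed))
			;
		atomic_thread_fence(memory_order_acquire);
	} else {
		/* Analogue of smp_load_acquire(). */
		while (!atomic_load_explicit(&ready, memory_order_acquire))
			;
	}
	*seen = payload;
	return NULL;
}

/* Returns the payload value seen by the consumer, or -1 on setup failure. */
int run_mp(int fences)
{
	pthread_t p, c;
	int seen = 0;

	payload = 0;
	atomic_store(&ready, 0);
	use_fences = fences;
	if (pthread_create(&p, NULL, producer, NULL))
		return -1;
	if (pthread_create(&c, NULL, consumer, &seen)) {
		pthread_join(p, NULL);
		return -1;
	}
	pthread_join(p, NULL);
	pthread_join(c, NULL);
	return seen;
}
```

	In both variants, a consumer that sees the flag set must also see
	the initialized payload, which is the MP guarantee.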

2.	Release-acquire chains, AKA ISA2, Z6.2, LB, and 3.LB

	There is lots of variety here.  In some cases, you can substitute:

	a.	READ_ONCE() for smp_load_acquire()
	b.	WRITE_ONCE() for smp_store_release()
	c.	Dependencies for both smp_load_acquire() and
		smp_store_release().
	d.	smp_wmb() for smp_store_release() in first thread
		of ISA2 and Z6.2.
	e.	smp_rmb() for smp_load_acquire() in last thread of ISA2.

	The canonical illustration of LB involves the various memory
	allocators, where you don't want a load from about-to-be-freed
	memory to see a store initializing a later incarnation of that
	same memory area.  But the per-CPU caches make this a very
	long and complicated example.
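
	Stripped of the per-CPU caches, the allocator scenario might be
	sketched in userspace as follows.  This is only an illustration
	under stated assumptions: C11 atomics and pthreads stand in for the
	kernel primitives, and the names slot, freed, old_owner, and
	new_owner are hypothetical.  The old owner loads from the memory
	and then frees it with a release store, while the new owner checks
	for the free with an acquire load and then initializes the memory.
	The forbidden LB outcome is the new owner seeing the free while the
	old owner sees the new owner's initialization.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static atomic_int slot;		/* Hypothetical memory being recycled. */
static atomic_int freed;	/* Set when the old owner is done with slot. */
static int r_old, r_new;	/* Per-thread results, read after join. */

/* Old owner: last load from the memory, then free it. */
static void *old_owner(void *arg)
{
	(void)arg;
	r_old = atomic_load_explicit(&slot, memory_order_relaxed);
	atomic_store_explicit(&freed, 1, memory_order_release);
	return NULL;
}

/* New owner: check for the free, then initialize the memory. */
static void *new_owner(void *arg)
{
	(void)arg;
	r_new = atomic_load_explicit(&freed, memory_order_acquire);
	atomic_store_explicit(&slot, 1, memory_order_relaxed);
	return NULL;
}

/* Returns how often the forbidden outcome showed up, or -1 on failure. */
int count_forbidden_lb_outcomes(int iterations)
{
	int forbidden = 0;

	for (int i = 0; i < iterations; i++) {
		pthread_t t0, t1;

		atomic_store(&slot, 0);
		atomic_store(&freed, 0);
		if (pthread_create(&t0, NULL, old_owner, NULL))
			return -1;
		if (pthread_create(&t1, NULL, new_owner, NULL)) {
			pthread_join(t0, NULL);
			return -1;
		}
		pthread_join(t0, NULL);
		pthread_join(t1, NULL);
		if (r_old == 1 && r_new == 1)
			forbidden++;
	}
	return forbidden;
}
```

	Counting zero forbidden outcomes is of course not a proof, given
	that a test run only samples the possible executions.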

	I am not aware of any three-CPU release-acquire chains in the
	Linux kernel.  There are three-CPU lock-based chains in RCU,
	but these are not at all simple, either.

	Thoughts?
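
	In the meantime, here is a minimal userspace sketch of the shape
	of a three-CPU release-acquire chain (ISA2).  It is not taken from
	the kernel: it assumes C11 atomics and pthreads as stand-ins for
	smp_store_release() and smp_load_acquire(), and the thread and
	variable names are hypothetical.  The first thread writes x and
	releases y, the second acquires y and releases z, and the third
	acquires z and must then see the first thread's write to x.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static int x;			/* Plain data written by the first thread. */
static atomic_int y, z;		/* Hypothetical links in the chain. */

/* First thread: write x, then release-store y. */
static void *t0_writer(void *arg)
{
	(void)arg;
	x = 1;
	atomic_store_explicit(&y, 1, memory_order_release);
	return NULL;
}

/* Second thread: acquire y, then release-store z. */
static void *t1_relay(void *arg)
{
	(void)arg;
	while (!atomic_load_explicit(&y, memory_order_acquire))
		;
	atomic_store_explicit(&z, 1, memory_order_release);
	return NULL;
}

/* Third thread: acquire z, then read x. */
static void *t2_reader(void *arg)
{
	int *seen = arg;

	while (!atomic_load_explicit(&z, memory_order_acquire))
		;
	*seen = x;
	return NULL;
}

/* Returns the value of x seen by the third thread, or -1 on failure. */
int run_isa2(void)
{
	void *(*fn[3])(void *) = { t2_reader, t1_relay, t0_writer };
	pthread_t tid[3];
	int seen = -1;
	int i, n;

	x = 0;
	atomic_store(&y, 0);
	atomic_store(&z, 0);
	for (n = 0; n < 3; n++)
		if (pthread_create(&tid[n], NULL, fn[n], &seen))
			break;
	if (n < 3) {
		/* Setup failed: unblock any spinning threads. */
		atomic_store(&y, 1);
		atomic_store(&z, 1);
	}
	for (i = 0; i < n; i++)
		pthread_join(tid[i], NULL);
	return n == 3 ? seen : -1;
}
```

	The third thread never communicates directly with the first, so
	the guarantee comes entirely from the chain through the second.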

3.	SB

	a.	smp_mb(), as in lockless wait-wakeup coordination.
		And as in sys_membarrier()-scheduler coordination,
		for that matter.

		Examples seem to be lacking.  Most cases use locking.
		Here is one rather strange one from RCU:

		void call_rcu_tasks(struct rcu_head *rhp, rcu_callback_t func)
		{
			unsigned long flags;
			bool needwake;
			bool havetask = READ_ONCE(rcu_tasks_kthread_ptr);

			rhp->next = NULL;
			rhp->func = func;
			raw_spin_lock_irqsave(&rcu_tasks_cbs_lock, flags);
			needwake = !rcu_tasks_cbs_head;
			*rcu_tasks_cbs_tail = rhp;
			rcu_tasks_cbs_tail = &rhp->next;
			raw_spin_unlock_irqrestore(&rcu_tasks_cbs_lock, flags);
			/* We can't create the thread unless interrupts are enabled. */
			if ((needwake && havetask) ||
			    (!havetask && !irqs_disabled_flags(flags))) {
				rcu_spawn_tasks_kthread();
				wake_up(&rcu_tasks_cbs_wq);
			}
		}

		And here is the wait side, which uses synchronize_sched()
		to supply the barrier for both ends.  The preemption
		disabling due to raw_spin_lock_irqsave() serves as the
		read-side critical section:

		if (!list) {
			wait_event_interruptible(rcu_tasks_cbs_wq,
						 rcu_tasks_cbs_head);
			if (!rcu_tasks_cbs_head) {
				WARN_ON(signal_pending(current));
				schedule_timeout_interruptible(HZ/10);
			}
			continue;
		}
		synchronize_sched();

		-----------------

		Here is another one, from the omap3isp driver, that uses
		atomic_cmpxchg() as a full memory barrier.  First the
		wait side, then the wakeup side:

		if (!wait_event_timeout(*wait, !atomic_read(stopping),
					msecs_to_jiffies(1000))) {
			atomic_set(stopping, 0);
			smp_mb();
			return -ETIMEDOUT;
		}

		int omap3isp_module_sync_is_stopping(wait_queue_head_t *wait,
						     atomic_t *stopping)
		{
			if (atomic_cmpxchg(stopping, 1, 0)) {
				wake_up(wait);
				return 1;
			}

			return 0;
		}
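
	For comparison, the bare SB pattern behind lockless wait-wakeup
	coordination might be sketched in userspace as follows.  This is an
	illustration rather than kernel code: it assumes C11 atomics and
	pthreads, with a seq_cst fence as a rough analogue of smp_mb(),
	and the names waiting and condition are hypothetical.  Each side
	stores to its own variable, executes a full fence, and then loads
	the other side's variable.  The forbidden outcome is both loads
	returning zero, which would correspond to a lost wakeup.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static atomic_int waiting;	/* Hypothetical: waiter has announced itself. */
static atomic_int condition;	/* Hypothetical: waker has posted the event. */
static int waiter_saw_condition, waker_saw_waiter;	/* Read after join. */

/* Waiter: announce, full fence, then check the condition. */
static void *waiter(void *arg)
{
	(void)arg;
	atomic_store_explicit(&waiting, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* Stand-in for smp_mb(). */
	waiter_saw_condition =
		atomic_load_explicit(&condition, memory_order_relaxed);
	return NULL;
}

/* Waker: post the condition, full fence, then check for a waiter. */
static void *waker(void *arg)
{
	(void)arg;
	atomic_store_explicit(&condition, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* Stand-in for smp_mb(). */
	waker_saw_waiter =
		atomic_load_explicit(&waiting, memory_order_relaxed);
	return NULL;
}

/* Returns how many runs lost a wakeup, or -1 on thread setup failure. */
int count_lost_wakeups(int iterations)
{
	int lost = 0;

	for (int i = 0; i < iterations; i++) {
		pthread_t t0, t1;

		atomic_store(&waiting, 0);
		atomic_store(&condition, 0);
		if (pthread_create(&t0, NULL, waiter, NULL))
			return -1;
		if (pthread_create(&t1, NULL, waker, NULL)) {
			pthread_join(t0, NULL);
			return -1;
		}
		pthread_join(t0, NULL);
		pthread_join(t1, NULL);
		if (!waiter_saw_condition && !waker_saw_waiter)
			lost++;
	}
	return lost;
}
```

	Without the two fences, both loads could return zero, in which case
	the waiter would sleep and the waker would skip the wakeup.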