Re: [PATCH] Convert struct pid count to refcount_t

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Paul E. McKenney" <paulmck@linux.ibm.com>
To: Oleg Nesterov <oleg@redhat.com>
Cc: Jann Horn <jannh@google.com>,
	Joel Fernandes <joel@joelfernandes.org>,
	Kees Cook <keescook@chromium.org>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Android Kernel Team <kernel-team@android.com>,
	Kernel Hardening <kernel-hardening@lists.openwall.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Matthew Wilcox <willy@infradead.org>,
	Michal Hocko <mhocko@suse.com>,
	"Reshetova, Elena" <elena.reshetova@intel.com>,
	Alan Stern <stern@rowland.harvard.edu>
Subject: Re: [PATCH] Convert struct pid count to refcount_t
Date: Thu, 28 Mar 2019 10:37:07 -0700	[thread overview]
Message-ID: <20190328173707.GP4102@linux.ibm.com> (raw)
In-Reply-To: <20190328162641.GC19441@redhat.com>

On Thu, Mar 28, 2019 at 05:26:42PM +0100, Oleg Nesterov wrote:
> On 03/28, Jann Horn wrote:
> >
> > Since we're just talking about RCU stuff now, adding Paul McKenney to
> > the thread.
> 
> Since you added Paul let me add more confusion to this thread ;)

Woo-hoo!!!  More confusion!  Bring it on!!!  ;-)

> There were some concerns about the lack of barriers in put_pid(), but I can't
> find that old discussion and I forgot the result of that discussion...
> 
> Paul, could you confirm that this code
> 
> 	CPU_0		CPU_1
> 
> 	X = 1;		if (READ_ONCE(Y))
> 	mb();			X = 2;
> 	Y = 1;		BUG_ON(X != 2);
> 
> 
> is correct? I think it is, control dependency pairs with mb(), right?

The BUG_ON() is supposed to happen at the end of time, correct?
As written, there is (in the strict sense) a data race between the load
of X in the BUG_ON() and CPU_0's store to X.  In a less strict sense,
you could of course argue that this data race is harmless, especially
if X is a single byte.  But the more I talk to compiler writers, the
less comfortable I become with data races in general.  :-/

So I would also feel better if the "Y = 1" was WRITE_ONCE().

On the other hand, this is a great opportunity to try out Alan Stern's
prototype plain-accesses patch to the Linux Kernel Memory Model (LKMM)!

https://lkml.kernel.org/r/Pine.LNX.4.44L0.1903191459270.1593-200000@iolanthe.rowland.org

Also adding Alan on CC.

Here is what I believe is the litmus test that your are interested in:

------------------------------------------------------------------------
C OlegNesterov-put_pid

{}

P0(int *x, int *y)
{
	*x = 1;
	smp_mb();
	*y = 1;
}

P1(int *x, int *y)
{
	int r1;

	r1 = READ_ONCE(*y);
	if (r1)
		*x = 2;
}

exists (1:r1=1 /\ ~x=2)
------------------------------------------------------------------------

Running this through herd with Alan's patch detects the data race
and says that the undesired outcome is allowed:

	$ herd7  -conf linux-kernel.cfg /tmp/OlegNesterov-put_pid.litmus 
	Test OlegNesterov-put_pid Allowed
	States 3
	1:r1=0; x=1;
	1:r1=1; x=1;
	1:r1=1; x=2;
	Ok
	Witnesses
	Positive: 1 Negative: 2
	Flag data-race
	Condition exists (1:r1=1 /\ not (x=2))
	Observation OlegNesterov-put_pid Sometimes 1 2
	Time OlegNesterov-put_pid 0.00
	Hash=a3e0043ad753effa860fea37eeba0a76

Using WRITE_ONCE() for P0()'s store to y still allows this outcome,
although it does remove the "Flag data-race".

Using WRITE_ONCE() for both P0()'s store to y and P1()'s store to x
gets rid of both the "Flag data-race" and the undesired outcome:

	$ herd7  -conf linux-kernel.cfg /tmp/OlegNesterov-put_pid-WO-WO.litmus 
	Test OlegNesterov-put_pid-WO-WO Allowed
	States 2
	1:r1=0; x=1;
	1:r1=1; x=2;
	No
	Witnesses
	Positive: 0 Negative: 2
	Condition exists (1:r1=1 /\ not (x=2))
	Observation OlegNesterov-put_pid-WO-WO Never 0 2
	Time OlegNesterov-put_pid-WO-WO 0.01
	Hash=6e1643e3c5e4739b590bde0a8e8a918e

Here is the corresponding litmus test, in case I messed something up:

------------------------------------------------------------------------
C OlegNesterov-put_pid-WO-WO

{}

P0(int *x, int *y)
{
	*x = 1;
	smp_mb();
	WRITE_ONCE(*y, 1);
}

P1(int *x, int *y)
{
	int r1;

	r1 = READ_ONCE(*y);
	if (r1)
		WRITE_ONCE(*x, 2);
}

exists (1:r1=1 /\ ~x=2)
------------------------------------------------------------------------

> If not, then put_pid() needs atomic_read_acquire() as it was proposed in that
> discussion.

Good point, let's try with smp_load_acquire() in P1():

	$ herd7  -conf linux-kernel.cfg /tmp/OlegNesterov-put_pid-WO-sla.litmus 
	Test OlegNesterov-put_pid-WO-sla Allowed
	States 2
	1:r1=0; x=1;
	1:r1=1; x=2;
	No
	Witnesses
	Positive: 0 Negative: 2
	Condition exists (1:r1=1 /\ not (x=2))
	Observation OlegNesterov-put_pid-WO-sla Never 0 2
	Time OlegNesterov-put_pid-WO-sla 0.01
	Hash=4fb0276eabf924793dec1970199db3a6

This also works.  Here is the litmus test:

------------------------------------------------------------------------
C OlegNesterov-put_pid-WO-sla

{}

P0(int *x, int *y)
{
	*x = 1;
	smp_mb();
	WRITE_ONCE(*y, 1);
}

P1(int *x, int *y)
{
	int r1;

	r1 = smp_load_acquire(y);
	if (r1)
		*x = 2;
}

exists (1:r1=1 /\ ~x=2)
------------------------------------------------------------------------

Demoting P0()'s WRITE_ONCE() to a plain write while leaving P1()'s
smp_load_acquire() gets us a data race and allows the undesired
outcome:

	$ herd7  -conf linux-kernel.cfg /tmp/OlegNesterov-put_pid-sla.litmus 
	Test OlegNesterov-put_pid-sla Allowed
	States 3
	1:r1=0; x=1;
	1:r1=1; x=1;
	1:r1=1; x=2;
	Ok
	Witnesses
	Positive: 1 Negative: 2
	Flag data-race
	Condition exists (1:r1=1 /\ not (x=2))
	Observation OlegNesterov-put_pid-sla Sometimes 1 2
	Time OlegNesterov-put_pid-sla 0.01
	Hash=ec6f71f3d9f7cd6e45a874c872e3d946

But what if you are certain that the compiler cannot mess up your use
of plain C-language loads and stores?  Then simply tell LKMM that they
are READ_ONCE() and WRITE_ONCE(), respectively.  LKMM is admittedly
somewhat paranoid, but real C compilers really do tear stores of certain
constants on systems (like x86) that have store-immediate instructions,
so a bit of paranoia is not misplaced here.  ;-)

Plus please note that this patch to LKMM is prototype and thus subject
to change.

							Thanx, Paul

next prev parent reply	other threads:[~2019-03-28 17:37 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-03-27 14:53 [PATCH] Convert struct pid count to refcount_t Joel Fernandes (Google)
2019-03-28  0:06 ` Kees Cook
2019-03-28  0:59   ` Jann Horn
2019-03-28  2:34     ` Joel Fernandes
2019-03-28  2:57       ` Jann Horn
2019-03-28 14:37         ` Joel Fernandes
2019-03-28 15:17           ` Jann Horn
2019-03-28 16:26             ` Oleg Nesterov
2019-03-28 17:37               ` Paul E. McKenney [this message]
2019-03-29 17:32                 ` Oleg Nesterov
2019-03-29 19:45                   ` Alan Stern
2019-04-01 15:28                     ` David Laight
2019-03-30  2:36                 ` Joel Fernandes
2019-03-30 15:16                   ` Alan Stern
2019-03-31 21:57                     ` Paul E. McKenney
2019-03-31 21:55                   ` Paul E. McKenney
2019-04-01 21:11                     ` Joel Fernandes
2019-04-04 15:23                       ` Paul E. McKenney
2019-04-04 16:01                         ` Alan Stern
2019-04-04 18:08                           ` Joel Fernandes
2019-04-04 18:19                             ` Paul E. McKenney
2019-04-04 20:31                               ` Joel Fernandes
2019-04-04 19:09                             ` Alan Stern
2019-03-28 20:00             ` Joel Fernandes
2019-03-29  2:24               ` Joel Fernandes
2019-03-28 16:52           ` Kees Cook
2019-03-28 14:26       ` Oleg Nesterov
2019-03-28 14:39         ` Joel Fernandes
2019-03-29  2:34           ` Joel Fernandes
2019-03-29 17:37             ` Oleg Nesterov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190328173707.GP4102@linux.ibm.com \
    --to=paulmck@linux.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=ebiederm@xmission.com \
    --cc=elena.reshetova@intel.com \
    --cc=jannh@google.com \
    --cc=joel@joelfernandes.org \
    --cc=keescook@chromium.org \
    --cc=kernel-hardening@lists.openwall.com \
    --cc=kernel-team@android.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mhocko@suse.com \
    --cc=oleg@redhat.com \
    --cc=stern@rowland.harvard.edu \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.