From: "Paul E. McKenney" <paulmck@linux.ibm.com>
To: Oleg Nesterov <oleg@redhat.com>
Cc: Jann Horn <jannh@google.com>,
Joel Fernandes <joel@joelfernandes.org>,
Kees Cook <keescook@chromium.org>,
"Eric W. Biederman" <ebiederm@xmission.com>,
LKML <linux-kernel@vger.kernel.org>,
Android Kernel Team <kernel-team@android.com>,
Kernel Hardening <kernel-hardening@lists.openwall.com>,
Andrew Morton <akpm@linux-foundation.org>,
Matthew Wilcox <willy@infradead.org>,
Michal Hocko <mhocko@suse.com>,
"Reshetova, Elena" <elena.reshetova@intel.com>,
Alan Stern <stern@rowland.harvard.edu>
Subject: Re: [PATCH] Convert struct pid count to refcount_t
Date: Thu, 28 Mar 2019 10:37:07 -0700
Message-ID: <20190328173707.GP4102@linux.ibm.com>
In-Reply-To: <20190328162641.GC19441@redhat.com>
On Thu, Mar 28, 2019 at 05:26:42PM +0100, Oleg Nesterov wrote:
> On 03/28, Jann Horn wrote:
> >
> > Since we're just talking about RCU stuff now, adding Paul McKenney to
> > the thread.
>
> Since you added Paul let me add more confusion to this thread ;)
Woo-hoo!!! More confusion! Bring it on!!! ;-)
> There were some concerns about the lack of barriers in put_pid(), but I can't
> find that old discussion and I forgot the result of that discussion...
>
> Paul, could you confirm that this code
>
>       CPU_0                   CPU_1
>
>       X = 1;                  if (READ_ONCE(Y))
>       mb();                           X = 2;
>       Y = 1;                  BUG_ON(X != 2);
>
>
> is correct? I think it is, control dependency pairs with mb(), right?
The BUG_ON() is supposed to happen at the end of time, correct?
As written, there is (in the strict sense) a data race between the load
of X in the BUG_ON() and CPU_0's store to X. In a less strict sense,
you could of course argue that this data race is harmless, especially
if X is a single byte. But the more I talk to compiler writers, the
less comfortable I become with data races in general. :-/
So I would also feel better if the "Y = 1" was WRITE_ONCE().
On the other hand, this is a great opportunity to try out Alan Stern's
prototype plain-accesses patch to the Linux Kernel Memory Model (LKMM)!
https://lkml.kernel.org/r/Pine.LNX.4.44L0.1903191459270.1593-200000@iolanthe.rowland.org
Also adding Alan on CC.
Here is what I believe is the litmus test that you are interested in:
------------------------------------------------------------------------
C OlegNesterov-put_pid

{}

P0(int *x, int *y)
{
	*x = 1;
	smp_mb();
	*y = 1;
}

P1(int *x, int *y)
{
	int r1;

	r1 = READ_ONCE(*y);
	if (r1)
		*x = 2;
}

exists (1:r1=1 /\ ~x=2)
------------------------------------------------------------------------
Running this through herd with Alan's patch detects the data race
and says that the undesired outcome is allowed:
$ herd7 -conf linux-kernel.cfg /tmp/OlegNesterov-put_pid.litmus
Test OlegNesterov-put_pid Allowed
States 3
1:r1=0; x=1;
1:r1=1; x=1;
1:r1=1; x=2;
Ok
Witnesses
Positive: 1 Negative: 2
Flag data-race
Condition exists (1:r1=1 /\ not (x=2))
Observation OlegNesterov-put_pid Sometimes 1 2
Time OlegNesterov-put_pid 0.00
Hash=a3e0043ad753effa860fea37eeba0a76
Using WRITE_ONCE() for P0()'s store to y still allows this outcome,
although it does remove the "Flag data-race".
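
For reference, a sketch of what that intermediate test would presumably
look like. The test name "OlegNesterov-put_pid-WO" is made up here for
illustration, and no herd7 output is shown for it because only the
result described above is known:

```
C OlegNesterov-put_pid-WO

{}

P0(int *x, int *y)
{
	*x = 1;
	smp_mb();
	WRITE_ONCE(*y, 1);	(* only change from the first test *)
}

P1(int *x, int *y)
{
	int r1;

	r1 = READ_ONCE(*y);
	if (r1)
		*x = 2;		(* still a plain store *)
}

exists (1:r1=1 /\ ~x=2)
```

The only difference from the first test is the marked store to y;
P1()'s store to x is still a plain access.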
Using WRITE_ONCE() for both P0()'s store to y and P1()'s store to x
gets rid of both the "Flag data-race" and the undesired outcome:
$ herd7 -conf linux-kernel.cfg /tmp/OlegNesterov-put_pid-WO-WO.litmus
Test OlegNesterov-put_pid-WO-WO Allowed
States 2
1:r1=0; x=1;
1:r1=1; x=2;
No
Witnesses
Positive: 0 Negative: 2
Condition exists (1:r1=1 /\ not (x=2))
Observation OlegNesterov-put_pid-WO-WO Never 0 2
Time OlegNesterov-put_pid-WO-WO 0.01
Hash=6e1643e3c5e4739b590bde0a8e8a918e
Here is the corresponding litmus test, in case I messed something up:
------------------------------------------------------------------------
C OlegNesterov-put_pid-WO-WO

{}

P0(int *x, int *y)
{
	*x = 1;
	smp_mb();
	WRITE_ONCE(*y, 1);
}

P1(int *x, int *y)
{
	int r1;

	r1 = READ_ONCE(*y);
	if (r1)
		WRITE_ONCE(*x, 2);
}

exists (1:r1=1 /\ ~x=2)
------------------------------------------------------------------------
> If not, then put_pid() needs atomic_read_acquire() as it was proposed in that
> discussion.
Good point, let's try with smp_load_acquire() in P1():
$ herd7 -conf linux-kernel.cfg /tmp/OlegNesterov-put_pid-WO-sla.litmus
Test OlegNesterov-put_pid-WO-sla Allowed
States 2
1:r1=0; x=1;
1:r1=1; x=2;
No
Witnesses
Positive: 0 Negative: 2
Condition exists (1:r1=1 /\ not (x=2))
Observation OlegNesterov-put_pid-WO-sla Never 0 2
Time OlegNesterov-put_pid-WO-sla 0.01
Hash=4fb0276eabf924793dec1970199db3a6
This also works. Here is the litmus test:
------------------------------------------------------------------------
C OlegNesterov-put_pid-WO-sla

{}

P0(int *x, int *y)
{
	*x = 1;
	smp_mb();
	WRITE_ONCE(*y, 1);
}

P1(int *x, int *y)
{
	int r1;

	r1 = smp_load_acquire(y);
	if (r1)
		*x = 2;
}

exists (1:r1=1 /\ ~x=2)
------------------------------------------------------------------------
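
For anyone who wants to poke at this shape outside of herd, here is a
rough userspace analog using C11 atomics and pthreads. It is only a
sketch under assumptions: atomic_thread_fence(memory_order_seq_cst)
stands in for smp_mb(), a relaxed atomic store stands in for
WRITE_ONCE(), and an acquire load stands in for smp_load_acquire().
The C11 memory model is not LKMM, and the function names
(run_once, count_bad_outcomes) are invented for this example. A
stress run like this can only fail to find a problem; it cannot prove
the absence of one the way herd's exhaustive search can.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static int x;		/* plain variable, as in the litmus test */
static atomic_int y;
static int r1;

static void *cpu0(void *arg)
{
	(void)arg;
	x = 1;
	/* Rough stand-in for smp_mb(). */
	atomic_thread_fence(memory_order_seq_cst);
	/* Rough stand-in for WRITE_ONCE(*y, 1). */
	atomic_store_explicit(&y, 1, memory_order_relaxed);
	return NULL;
}

static void *cpu1(void *arg)
{
	(void)arg;
	/* Rough stand-in for smp_load_acquire(y). */
	r1 = atomic_load_explicit(&y, memory_order_acquire);
	if (r1)
		x = 2;	/* ordered after cpu0's "x = 1" when r1 == 1 */
	return NULL;
}

/* Returns 1 if r1 == 1 but x != 2, 0 if not, -1 on thread failure. */
int run_once(void)
{
	pthread_t t0, t1;

	x = 0;
	r1 = 0;
	atomic_store(&y, 0);
	if (pthread_create(&t0, NULL, cpu0, NULL))
		return -1;
	if (pthread_create(&t1, NULL, cpu1, NULL)) {
		pthread_join(t0, NULL);
		return -1;
	}
	pthread_join(t0, NULL);
	pthread_join(t1, NULL);
	/* Both threads are done: this is the "end of time" check. */
	return r1 == 1 && x != 2;
}

/* Runs the test repeatedly; returns the number of bad outcomes seen. */
int count_bad_outcomes(int iterations)
{
	int bad = 0;

	for (int i = 0; i < iterations; i++) {
		int ret = run_once();

		if (ret < 0)
			return -1;
		bad += ret;
	}
	return bad;
}
```

Calling count_bad_outcomes() with a few hundred iterations should
report zero. In C11 terms, the fence followed by the store to y
synchronizes with the acquire load that reads it, so "x = 1" happens
before "x = 2" whenever r1 is 1.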
Demoting P0()'s WRITE_ONCE() to a plain write while leaving P1()'s
smp_load_acquire() gets us a data race and allows the undesired
outcome:
$ herd7 -conf linux-kernel.cfg /tmp/OlegNesterov-put_pid-sla.litmus
Test OlegNesterov-put_pid-sla Allowed
States 3
1:r1=0; x=1;
1:r1=1; x=1;
1:r1=1; x=2;
Ok
Witnesses
Positive: 1 Negative: 2
Flag data-race
Condition exists (1:r1=1 /\ not (x=2))
Observation OlegNesterov-put_pid-sla Sometimes 1 2
Time OlegNesterov-put_pid-sla 0.01
Hash=ec6f71f3d9f7cd6e45a874c872e3d946
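
The litmus test for that run is not reproduced in this message.
Reconstructed from the description above (plain store to y in P0(),
smp_load_acquire() kept in P1()), it would presumably be:

```
C OlegNesterov-put_pid-sla

{}

P0(int *x, int *y)
{
	*x = 1;
	smp_mb();
	*y = 1;		(* demoted to a plain store *)
}

P1(int *x, int *y)
{
	int r1;

	r1 = smp_load_acquire(y);
	if (r1)
		*x = 2;
}

exists (1:r1=1 /\ ~x=2)
```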
But what if you are certain that the compiler cannot mess up your use
of plain C-language loads and stores? Then simply tell LKMM that they
are READ_ONCE() and WRITE_ONCE(), respectively. LKMM is admittedly
somewhat paranoid, but real C compilers really do tear stores of certain
constants on systems (like x86) that have store-immediate instructions,
so a bit of paranoia is not misplaced here. ;-)
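
To make the store-tearing concern concrete, here is a small userspace
sketch. The macros WRITE_ONCE_SKETCH() and READ_ONCE_SKETCH() are
simplified stand-ins invented for this example, not the kernel's actual
definitions, and they rely on the GNU __typeof__ extension. The
assumption is that a compiler will emit a single access for an aligned,
machine-word-sized volatile access, whereas the plain store in
publish_plain() may legally be split into several smaller
store-immediate instructions.

```c
#include <stdint.h>

/* Simplified stand-ins for the kernel macros (assumption, see above). */
#define WRITE_ONCE_SKETCH(var, val) \
	(*(volatile __typeof__(var) *)&(var) = (val))
#define READ_ONCE_SKETCH(var) \
	(*(volatile __typeof__(var) *)&(var))

static uint32_t shared_word;

/*
 * Plain store of a constant: the compiler is free to implement this
 * as, for example, two 16-bit store-immediate instructions, so a
 * concurrent reader could observe a half-updated value.
 */
void publish_plain(void)
{
	shared_word = 0x00010002;
}

/* Marked store: the volatile access asks for one full-width store. */
void publish_once(void)
{
	WRITE_ONCE_SKETCH(shared_word, 0x00010002);
}

/* Marked load: one full-width load, not refetched or fused. */
uint32_t observe_once(void)
{
	return READ_ONCE_SKETCH(shared_word);
}
```

Single-threaded, both publish functions leave the same value behind;
the difference only matters when another CPU can read shared_word
while the store is in flight.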
Plus please note that this patch to LKMM is a prototype and thus subject
to change.
Thanx, Paul