From: Peter Zijlstra <peterz@infradead.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Douglas Hatch <doug.hatch@hp.com>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@kernel.org>, Waiman Long <Waiman.Long@hp.com>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
"Norton, Scott J" <scott.norton@hp.com>,
Peter Anvin <hpa@zytor.com>,
"linux-tip-commits@vger.kernel.org"
<linux-tip-commits@vger.kernel.org>
Subject: Re: [tip:locking/core] locking/pvqspinlock: Replace xchg() by the more descriptive set_mb()
Date: Tue, 12 May 2015 10:45:29 +0200
Message-ID: <20150512084529.GC21418@twins.programming.kicks-ass.net>
In-Reply-To: <CA+55aFxre4JPFoPUC1UqsCvG8=nSka=AJMvFgqKzG8XiNWoD=A@mail.gmail.com>

On Mon, May 11, 2015 at 10:50:42AM -0700, Linus Torvalds wrote:
> On Mon, May 11, 2015 at 7:54 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > Hmm, so I looked at the set_mb() definitions and I figure we want to do
> > something like the below, right?
>
> I don't think you need to do this for the non-smp cases.

Well, it's the store tearing thing again; we use WRITE_ONCE() in
smp_store_release() for the same reason. We want it to be a single
store.

> The whole
> thing is about smp memory ordering, so on UP you don't even need the
> WRITE_ONCE(), much less a barrier.

No, we actually need both still on UP.

Imagine the following sequence:

	for (;;) {
		set_current_state(TASK_KILLABLE);
		if (cond)
			break;
		schedule();
	}
	__set_current_state(TASK_RUNNING);

vs

	<IRQ>
	  wake_up_process(p);

As we know, set_current_state() is set_mb(), and thus will look like:

	current->state = TASK_KILLABLE;
	smp_mb();
	if (cond)
		break;

So without the WRITE_ONCE() we can get store tearing. Suppose our
compiler is insane and translates the store into four single-byte
stores:

	current->state[0] = TASK_UNINTERRUPTIBLE;
	current->state[1] = TASK_WAKEKILL >> 8;
	current->state[2] = 0;
	current->state[3] = 0;

The obvious fail here is to get the wakeup interrupt between [0] and
[1].

	current->state[0] = TASK_UNINTERRUPTIBLE;

	<IRQ>
	  wake_up_process(p);
	    p->state = TASK_RUNNING;

	current->state[1] = TASK_WAKEKILL >> 8;
	current->state[2] = 0;
	current->state[3] = 0;

With the end result that ->state == TASK_WAKEKILL, from which we'll not
wake up unless killed.
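
The interleaving above can be replayed in plain userspace C. This is
only a sketch under stated assumptions: the SKETCH_* values are
illustrative (picked so that the WAKEKILL bit lands in byte 1, as the
example assumes) and are not copied from any kernel header, and
torn_store_killable() is a hypothetical helper that models the
interrupt as a full-word store of the running state after a chosen
byte.

```c
#include <stdint.h>

/* Illustrative values only, not taken from a kernel header. */
#define SKETCH_TASK_RUNNING		0x0000u
#define SKETCH_TASK_UNINTERRUPTIBLE	0x0002u
#define SKETCH_TASK_WAKEKILL		0x0100u
#define SKETCH_TASK_KILLABLE \
	(SKETCH_TASK_WAKEKILL | SKETCH_TASK_UNINTERRUPTIBLE)

/* Model ->state as four separately written bytes, byte 0 least significant. */
struct torn_state {
	uint8_t b[4];
};

static uint32_t state_read(const struct torn_state *s)
{
	return (uint32_t)s->b[0] |
	       ((uint32_t)s->b[1] << 8) |
	       ((uint32_t)s->b[2] << 16) |
	       ((uint32_t)s->b[3] << 24);
}

/* A store that cannot be interrupted part-way: all four bytes at once. */
static void state_write_whole(struct torn_state *s, uint32_t v)
{
	for (int i = 0; i < 4; i++)
		s->b[i] = (uint8_t)(v >> (8 * i));
}

/*
 * Hypothetical helper: store SKETCH_TASK_KILLABLE one byte at a time and
 * let the "wakeup interrupt" (a whole store of SKETCH_TASK_RUNNING) fire
 * right after byte irq_after_byte. Pass -1 for no interrupt.
 * Returns the final value of the modelled ->state.
 */
uint32_t torn_store_killable(int irq_after_byte)
{
	struct torn_state s;

	state_write_whole(&s, SKETCH_TASK_RUNNING);

	for (int i = 0; i < 4; i++) {
		s.b[i] = (uint8_t)(SKETCH_TASK_KILLABLE >> (8 * i));
		if (i == irq_after_byte)
			state_write_whole(&s, SKETCH_TASK_RUNNING);
	}

	return state_read(&s);
}
```

With no interrupt the result is SKETCH_TASK_KILLABLE; with the
interrupt after byte 0 the low byte is wiped and the result is
SKETCH_TASK_WAKEKILL alone, which is the lost-wakeup state described
above.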

Similarly, without the barrier(), our friendly compiler is allowed to
do:

	if (cond)
		break;
	current->state = TASK_KILLABLE;
	schedule();

Which we all know to be broken.

So no, set_mb() (or smp_store_mb()) very much does need the WRITE_ONCE()
and a barrier() on UP.
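
For illustration, a UP-only store of that shape could be sketched in
userspace C as below. This is not the kernel's definition: every
sketch_* / SKETCH_* name is hypothetical, the state value is
illustrative, and the macros assume a GCC or Clang style compiler
(__typeof__ and an empty asm with a "memory" clobber). The volatile
access is what, in practice, keeps the compiler from splitting an
aligned word-sized store; the empty asm is what keeps it from moving
the later load of the condition above the store.

```c
/* Illustrative values only, not taken from a kernel header. */
#define SKETCH_TASK_RUNNING	0x0000L
#define SKETCH_TASK_KILLABLE	0x0102L

/* Compiler-only barrier: emits no instruction, forbids reordering across it. */
#define sketch_barrier()	__asm__ __volatile__("" ::: "memory")

/* Store through a volatile lvalue so the compiler emits one access. */
#define SKETCH_WRITE_ONCE(x, val) \
	(*(volatile __typeof__(x) *)&(x) = (val))

/* UP flavour: single store followed by a compiler barrier. */
#define sketch_store_mb_up(var, value)			\
	do {						\
		SKETCH_WRITE_ONCE(var, value);		\
		sketch_barrier();			\
	} while (0)

/* Stand-ins for current->state and the wait condition. */
long sketch_state = SKETCH_TASK_RUNNING;
int sketch_cond;

/*
 * One iteration of the wait loop up to the test of the condition:
 * publish the sleeping state first, then (and only then) read the
 * condition. Returns nonzero when the caller should break out of the
 * loop instead of calling schedule().
 */
int sketch_wait_step(void)
{
	sketch_store_mb_up(sketch_state, SKETCH_TASK_KILLABLE);
	return sketch_cond;
}
```

An interrupt handler that sets the condition and then writes the
running state can fire at any point here; because the state store is
a single access and precedes the load of the condition, the handler
either sees the sleeping state and overwrites it, or runs before the
store and the loop then observes the condition.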