Re: [PATCH 1/4] sched/wakeup: Strengthen current_save_and_set_rtlock_wait_state()

From: Boqun Feng <boqun.feng@gmail.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>,
	tglx@linutronix.de, linux-kernel@vger.kernel.org,
	Ingo Molnar <mingo@kernel.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Davidlohr Bueso <dave@stgolabs.net>,
	Waiman Long <longman@redhat.com>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	Mike Galbraith <efault@gmx.de>,
	Daniel Bristot de Oliveira <bristot@redhat.com>
Subject: Re: [PATCH 1/4] sched/wakeup: Strengthen current_save_and_set_rtlock_wait_state()
Date: Sun, 12 Sep 2021 11:57:22 +0800
Message-ID: <YT16ognizWI6xROs@boqun-archlinux>
In-Reply-To: <YToZ4h/nfsrD3JfY@hirez.programming.kicks-ass.net>

On Thu, Sep 09, 2021 at 04:27:46PM +0200, Peter Zijlstra wrote:
> On Thu, Sep 09, 2021 at 02:45:24PM +0100, Will Deacon wrote:
> > On Thu, Sep 09, 2021 at 12:59:16PM +0200, Peter Zijlstra wrote:
> > > While looking at current_save_and_set_rtlock_wait_state() I'm thinking
> > > it really ought to use smp_store_mb(), because something like:
> > > 
> > > 	current_save_and_set_rtlock_wait_state();
> > > 	for (;;) {
> > > 		if (try_lock())
> > > 			break;
> > > 
> > > 		raw_spin_unlock_irq(&lock->wait_lock);
> > > 		schedule();
> > > 		raw_spin_lock_irq(&lock->wait_lock);
> > > 
> > > 		set_current_state(TASK_RTLOCK_WAIT);
> > > 	}
> > > 	current_restore_rtlock_saved_state();
> > > 
> > > which is the advertised usage in the comment, is actually broken,
> > > since trylock() will only need a load-acquire in general and that
> > > could be re-ordered against the state store, which could lead to a
> > > missed wakeup -> BAD (tm).
> > 
> > Why doesn't the UNLOCK of pi_lock in current_save_and_set_rtlock_wait_state()
> > order the state change before the successful try_lock? I'm just struggling
> > to envisage how this actually goes wrong.
> 
> Moo yes, so the earlier changelog I wrote was something like:
> 
> 	current_save_and_set_rtlock_wait_state();
> 	for (;;) {
> 		if (try_lock())
> 			break;
> 
> 		raw_spin_unlock_irq(&lock->wait_lock);
> 		if (!cond)
> 			schedule();
> 		raw_spin_lock_irq(&lock->wait_lock);
> 
> 		set_current_state(TASK_RTLOCK_WAIT);
> 	}
> 	current_restore_rtlock_saved_state();
> 
> which is more what the code looks like before these patches, and in that
> case the @cond load can be lifted before __state.
> 
> It all sorta works in the current application because most things are
> serialized by ->wait_lock, but given the 'normal' wait pattern I got
> highly suspicious of there not being a full barrier around.

Hmm.. I think ->pi_lock actually protects us here. IIUC, a missed
wake-up would happen if try_to_wake_up() failed to observe the __state
change made by the about-to-wait task, and the about-to-wait task didn't
observe the condition set by the waker task, for example:

	TASK 0				TASK 1
	======				======
					cond = 1;
					...
					try_to_wake_up(t0, TASK_RTLOCK_WAIT, ..):
					  ttwu_state_match(...)
					    if (t0->__state & TASK_RTLOCK_WAIT) // false
					      ..
					    return false; // don't wake up
	...
	current->__state = TASK_RTLOCK_WAIT
	...
	if (!cond) // !cond is true because of memory reordering
	  schedule(); // sleep, and may never be woken up again.

But let's add ->pi_lock critical sections into the example:

	TASK 0				TASK 1
	======				======
					cond = 1;
					...
					try_to_wake_up(t0, TASK_RTLOCK_WAIT, ..):
					  raw_spin_lock_irqsave(->pi_lock,...);
					  ttwu_state_match(...)
					    if (t0->__state & TASK_RTLOCK_WAIT) // false
					      ..
					    return false; // don't wake up
					  raw_spin_unlock_irqrestore(->pi_lock,...); // A
	...
	raw_spin_lock_irqsave(->pi_lock, ...); // B
	current->__state = TASK_RTLOCK_WAIT
	raw_spin_unlock_irqrestore(->pi_lock, ...);
	if (!cond)
	  schedule();

Now the read of cond on TASK 0 must observe the store to cond on TASK 1.
Accesses to __state are serialized by ->pi_lock, so if TASK 1's read of
__state didn't observe TASK 0's write to __state, then the lock B must
read from the unlock A (or from another unlock co-after A). That gives
us a release-acquire pair, which guarantees that the read of cond on
TASK 0 sees the write to cond on TASK 1. This can be simplified into the
litmus test below:

	C unlock-lock
	{
	}

	P0(spinlock_t *s, int *cond, int *state)
	{
		int r1;

		spin_lock(s);
		WRITE_ONCE(*state, 1);
		spin_unlock(s);
		r1 = READ_ONCE(*cond);
	}

	P1(spinlock_t *s, int *cond, int *state)
	{
		int r1;

		WRITE_ONCE(*cond, 1);
		spin_lock(s);
		r1 = READ_ONCE(*state);
		spin_unlock(s);
	}

	exists (0:r1=0 /\ 1:r1=0)

and the result is:

	Test unlock-lock Allowed
	States 3
	0:r1=0; 1:r1=1;
	0:r1=1; 1:r1=0;
	0:r1=1; 1:r1=1;
	No
	Witnesses
	Positive: 0 Negative: 3
	Condition exists (0:r1=0 /\ 1:r1=0)
	Observation unlock-lock Never 0 3
	Time unlock-lock 0.01
	Hash=e1f914505f07e380405f65d3b0fb6940
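
For anyone who wants to poke at the same shape outside the litmus
tooling, here is a rough userspace analogue. It is only a sketch under
assumptions: a pthread mutex stands in for ->pi_lock, relaxed C11 atomics
stand in for READ_ONCE()/WRITE_ONCE(), and the names waiter(), waker()
and count_missed_wakeups() are made up for illustration. It runs under
the C11 memory model rather than the LKMM, so it says nothing definitive
about the kernel; it just counts how often both threads read 0.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical userspace stand-ins: a mutex for ->pi_lock, relaxed
 * atomics for the kernel's READ_ONCE()/WRITE_ONCE() accesses. */
static pthread_mutex_t pi_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_int state, cond;
static int r0, r1;	/* written by the threads, read after join */

/* Mirrors P0: publish the state under the lock, then check cond. */
static void *waiter(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&pi_lock);
	atomic_store_explicit(&state, 1, memory_order_relaxed);
	pthread_mutex_unlock(&pi_lock);
	r0 = atomic_load_explicit(&cond, memory_order_relaxed);
	return NULL;
}

/* Mirrors P1: set cond, then inspect the state under the lock. */
static void *waker(void *arg)
{
	(void)arg;
	atomic_store_explicit(&cond, 1, memory_order_relaxed);
	pthread_mutex_lock(&pi_lock);
	r1 = atomic_load_explicit(&state, memory_order_relaxed);
	pthread_mutex_unlock(&pi_lock);
	return NULL;
}

/* Race the two threads 'trials' times. Returns how often both read 0
 * (the missed-wakeup outcome), or -1 if a thread could not be created. */
int count_missed_wakeups(int trials)
{
	int missed = 0;

	for (int i = 0; i < trials; i++) {
		pthread_t t0, t1;

		atomic_store(&state, 0);
		atomic_store(&cond, 0);

		if (pthread_create(&t0, NULL, waiter, NULL))
			return -1;
		if (pthread_create(&t1, NULL, waker, NULL)) {
			pthread_join(t0, NULL);
			return -1;
		}
		pthread_join(t0, NULL);
		pthread_join(t1, NULL);

		if (r0 == 0 && r1 == 0)
			missed++;
	}
	return missed;
}
```

Calling count_missed_wakeups(10000) from a small main() (built with
-pthread) should return 0: whichever critical section runs second either
sees state == 1 directly, or is ordered after the store to cond by the
unlock/lock pair. A stress run like this can only fail to find the
outcome, it cannot show that the outcome is forbidden; that is what the
litmus test is for.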

In short, since we write to __state with ->pi_lock held, I don't
think we need smp_store_mb() for __state. But maybe I'm missing
something subtle here ;-)

Regards,
Boqun
