From: Oleg Nesterov <oleg@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>, Al Viro <viro@ZenIV.linux.org.uk>,
	Bart Van Assche <bvanassche@acm.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Neil Brown <neilb@suse.de>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/2] sched/wait: avoid abort_exclusive_wait() in __wait_on_bit_lock()
Date: Fri, 2 Sep 2016 14:06:02 +0200
Message-ID: <20160902120601.GA26495@redhat.com>
In-Reply-To: <20160901190141.GJ10138@twins.programming.kicks-ass.net>

On 09/01, Peter Zijlstra wrote:
>
> On Fri, Aug 26, 2016 at 02:45:52PM +0200, Oleg Nesterov wrote:
>
> > We do not need anything tricky to avoid the race,
>
> The race being:
>
> CPU0			CPU1			CPU2
>
> 			__wait_on_bit_lock()
> 			  bit_wait_io()
> 			    io_schedule()
>
> clear_bit_unlock()
> __wake_up_common(.nr_exclusive=1)
>   list_for_each_entry()
>     if (curr->func() && !--nr_exclusive)
>       break
>
> 						signal()
>
> 			    if (signal_pending_state()) == TRUE
> 			      return -EINTR
>
> And no progress because CPU1 exits without acquiring the lock and CPU0
> thinks it's done because it woke someone.

Yes,
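
To make the waker side concrete, here is a minimal userspace sketch of
the walk CPU0 does. It is a single-threaded model, not kernel code: the
model_* names and struct model_waiter are hypothetical, and the wake
function only approximates autoremove_wake_function(). The point is that
the walk stops after the first exclusive waiter is woken, and nothing
checks whether that waiter goes on to take the bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a wait queue entry; not the kernel's layout. */
struct model_waiter {
	bool queued;     /* still on the wait queue */
	bool exclusive;  /* queued with the exclusive flag */
	bool asleep;     /* task is sleeping and can be woken */
};

/* Models autoremove_wake_function(): wake the task, dequeue on success. */
bool model_autoremove_wake(struct model_waiter *w)
{
	if (!w->asleep)
		return false;
	w->asleep = false;
	w->queued = false;
	return true;
}

/*
 * Models the walk in __wake_up_common(): stop once nr_exclusive
 * exclusive waiters have been woken. Returns the number woken.
 */
size_t model_wake_up_common(struct model_waiter *q, size_t n, int nr_exclusive)
{
	size_t woken = 0;

	assert(q != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!q[i].queued)
			continue;
		if (model_autoremove_wake(&q[i])) {
			woken++;
			if (q[i].exclusive && !--nr_exclusive)
				break;
		}
	}
	return woken;
}
```

With two exclusive waiters queued and nr_exclusive = 1, only the first
one is woken. If it then returns -EINTR without taking the bit, the
second stays queued and asleep although the bit is free.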

> > we can just call finish_wait() if action() fails.
>
> That would be bit_wait*() returning -EINTR because sigpending.

Hmm. Not sure I understand... Let me reply just in case, even if
I am sure you get it right.

Yes, in the likely case we are going to fail with -EINTR, but only
if test-and-set after that fails.

> Sure, you can always call that, first thing through the loop does
> prepare again, so no harm. That however does not connect to your
> condition,.. /me puzzled

If ->action() fails we will abort the loop in any case, prepare
won't be called. So in this case finish_wait() does the right thing.

> > test_and_set_bit() implies mb() so
> > the lockless list_empty_careful() case is fine, we can not miss the
> > condition if we race with unlock_page().
>
> You're talking about this ordering?:
>
> 	finish_wait()			clear_bit_unlock();
> 	  list_empty_careful()
>
> 	/* MB implied */		smp_mb__after_atomic();
> 	test_and_set_bit()		wake_up_page()
> 					  ...
> 					    autoremove_wake_function()
> 					      list_del_init();
>
>
> That could do with spelling out I feel.. :-)

Yes, yes.
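
For readers following along outside the kernel tree, the two bit
operations can be approximated with C11 atomics. This is only a sketch
under assumptions: the model_* helpers are hypothetical, a seq_cst
read-modify-write is the closest C11 analogue of the fully ordered
test_and_set_bit() but is not formally identical to it, and the kernel
does not implement its bitops this way.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Fully ordered RMW standing in for test_and_set_bit(); returns the old bit. */
bool model_test_and_set_bit(unsigned int nr, atomic_ulong *word)
{
	unsigned long mask;

	assert(nr < 8 * sizeof(unsigned long));
	mask = 1UL << nr;
	return (atomic_fetch_or_explicit(word, mask, memory_order_seq_cst) & mask) != 0;
}

/* Release-ordered clear standing in for clear_bit_unlock(). */
void model_clear_bit_unlock(unsigned int nr, atomic_ulong *word)
{
	unsigned long mask;

	assert(nr < 8 * sizeof(unsigned long));
	mask = 1UL << nr;
	atomic_fetch_and_explicit(word, ~mask, memory_order_release);
}

/* The trylock step of the wait loop: true if we took the bit. */
bool model_trylock_bit(unsigned int nr, atomic_ulong *word)
{
	return !model_test_and_set_bit(nr, word);
}
```

The lockless list check in finish_wait() comes before the fully ordered
trylock on the waiter side, while the unlocker clears the bit before it
walks the wait queue.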

> >  __wait_on_bit_lock(wait_queue_head_t *wq, struct wait_bit_queue *q,
> >  			wait_bit_action_f *action, unsigned mode)
> >  {
> > +	int ret = 0;
> >
> > +	for (;;) {
> >  		prepare_to_wait_exclusive(wq, &q->wait, mode);
> > +		if (test_bit(q->key.bit_nr, q->key.flags)) {
> > +			ret = action(&q->key, mode);
> > +			/*
> > +			 * Ensure that clear_bit() + wake_up() right after
> > +			 * test_and_set_bit() below can't see us; it should
> > +			 * wake up another exclusive waiter if we fail.
> > +			 */
> > +			if (ret)
> > +				finish_wait(wq, &q->wait);
> > +		}
> > +		if (!test_and_set_bit(q->key.bit_nr, q->key.flags)) {
>
> So this is the actual difference, instead of failing the lock and
> aborting on signal, we acquire the lock if possible. If it's not
> possible, someone else has it, which guarantees that someone else will
> do an unlock which implies another wakeup and life goes on.

Yes. This way we eliminate the need for the additional wake_up.
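
A small single-threaded model of the waiter side, replaying the race
against the new loop. Everything here is a sketch under assumptions:
struct model_bit_lock and the model_* functions are hypothetical, the
queue is reduced to a counter, and -EAGAIN is used only inside the model
to mean "go around the loop again".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model; not kernel code. */
struct model_bit_lock {
	bool locked;     /* the bit in q->key.flags */
	int nr_waiters;  /* exclusive waiters still queued and asleep */
};

/*
 * clear_bit_unlock() plus a wakeup with nr_exclusive = 1.
 * Returns 1 if a waiter was woken (and dequeued), 0 otherwise.
 */
int model_unlock(struct model_bit_lock *l)
{
	assert(l->locked);
	l->locked = false;
	if (l->nr_waiters == 0)
		return 0;
	l->nr_waiters--;
	return 1;
}

/*
 * One pass of the loop for a woken waiter whose action() returned ret.
 * The waiter is off the queue at this point: autoremove on wakeup, or
 * finish_wait() when ret != 0. Returns 0 if the bit was taken, ret if
 * the bit is busy and ret != 0, -EAGAIN (model only) to loop again.
 */
int model_patched_pass(struct model_bit_lock *l, int ret)
{
	if (!l->locked) {          /* !test_and_set_bit() */
		l->locked = true;
		return 0;
	}
	return ret ? ret : -EAGAIN;
}

/*
 * Two waiters; the unlock wakes the first, which has a signal pending.
 * Returns the number of waiters left asleep while the bit is free.
 */
int model_replay(void)
{
	struct model_bit_lock l = { .locked = true, .nr_waiters = 2 };

	model_unlock(&l);                         /* CPU0 wakes waiter 1 */
	if (model_patched_pass(&l, -EINTR) == 0)  /* CPU1: signal pending, bit free */
		model_unlock(&l);                 /* waiter 1 unlocks later, waking waiter 2 */
	return l.locked ? 0 : l.nr_waiters;
}
```

In the model the signalled waiter takes the free bit and its own unlock
wakes the next waiter. If the bit is busy it returns -EINTR and the
remaining waiter is left for the current owner's unlock.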

Oleg.
