From: Oleg Nesterov <oleg@redhat.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Manfred Spraul <manfred@colorfullife.com>,
	Christian Brauner <brauner@kernel.org>,
	David Howells <dhowells@redhat.com>,
	WangYuli <wangyuli@uniontech.com>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: wakeup_pipe_readers/writers() && pipe_poll()
Date: Mon, 6 Jan 2025 17:30:39 +0100
Message-ID: <20250106163038.GE7233@redhat.com>
In-Reply-To: <CAHk-=wj9Hr4PBobc13ZEv3HvFfpiZYrWX2-t5F62TXmMJoL5ZA@mail.gmail.com>

On 01/04, Linus Torvalds wrote:
>
> On Thu, 2 Jan 2025 at 08:33, Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > I was going to send a one-liner patch which adds mb() into pipe_poll()
> > but then I decided to make even more spam and ask some questions first.
>
> poll functions are not *supposed* to need memory barriers.
...
> But no, this is most definitely not a pipe-only thing.

Agreed; that is why I didn't send the patch that adds mb() to pipe_poll().

> They are supposed to do "poll_wait()" and then not need any more
> serialization after that, because we either
>
>  (a) have a NULL wait-address, in which case we're not going to sleep
> and this is just a "check state"

To be honest, I don't understand the wait_address check in poll_wait();
it seems that wait_address is never NULL.

But ->_qproc can be NULL if do_poll() etc. does another iteration after
poll_schedule_timeout(); in that case we can sleep again, and that
case is fine.
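
For what it's worth, the check in question can be modelled in userspace
roughly as below. This is only a sketch: the struct layouts,
model_pollwait() and model_poll_wait() are made-up stand-ins rather than
the kernel definitions, and the filp argument is dropped for brevity.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel types; the layouts are assumptions. */
struct wait_queue_head { int nr_waiters; };
struct poll_table;
typedef void (*poll_queue_proc)(struct wait_queue_head *, struct poll_table *);
struct poll_table { poll_queue_proc _qproc; };

/* Hypothetical queueing callback standing in for __pollwait(). */
static void model_pollwait(struct wait_queue_head *wq, struct poll_table *p)
{
	(void)p;
	wq->nr_waiters++;
}

/*
 * Same shape as the poll_wait() check: queue only when there is both a
 * callback and a wait address. Returns 1 if the callback was invoked.
 */
static int model_poll_wait(struct wait_queue_head *wait_address,
			   struct poll_table *p)
{
	if (p && p->_qproc && wait_address) {
		p->_qproc(wait_address, p);
		return 1;
	}
	return 0;
}
```

With ->_qproc cleared, as on a later do_poll() iteration, the call
degrades to a pure state check and nothing is queued a second time.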

>  (b) the waiting function is supposed to do add_wait_queue() (usually
> by way of __pollwait) and that should be a sufficient barrier to
> anybody who does a wakeup

Yes.

> And this makes me think that the whole comment above
> waitqueue_active() is just fundamentally wrong. The smp_mb() is *not*
> sufficient in the sequence
>
>     smp_mb();
>     if (waitqueue_active(wq_head))
>         wake_up(wq_head);
>
> because while it happens to work wrt prepare_to_wait() sequences, it
> does *not* work against other users of add_wait_queue().

Well, this comment doesn't look wrong to me, but perhaps it could be
clearer. It should probably explain that this pseudocode is only
correct because the waiter has a barrier before "if (@cond)" which
pairs with the smp_mb() above waitqueue_active(). It even says

	prepare_to_wait(&wq_head, &wait, state);
	// smp_mb() from set_current_state()

but perhaps this is not 100% clear.
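
To spell out the pairing, here is a rough userspace model using C11
atomics. The names cond, queued, waiter_would_sleep() and
waker_would_wake() are invented for illustration, and the seq_cst fences
merely stand in for smp_mb(); this is a sketch of the ordering argument,
not kernel code.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool cond;	/* the condition the waiter sleeps on */
static atomic_bool queued;	/* models a non-empty wait queue */

static void reset_model(void)
{
	atomic_store(&cond, false);
	atomic_store(&queued, false);
}

/*
 * Waiter: models prepare_to_wait() followed by "if (@cond)".
 * Returns true if the waiter would go to sleep.
 */
static bool waiter_would_sleep(void)
{
	atomic_store_explicit(&queued, true, memory_order_relaxed);
	/* stands in for the smp_mb() from set_current_state() */
	atomic_thread_fence(memory_order_seq_cst);
	return !atomic_load_explicit(&cond, memory_order_relaxed);
}

/*
 * Waker: sets the condition, then checks for waiters.
 * Returns true if it would call wake_up().
 */
static bool waker_would_wake(void)
{
	atomic_store_explicit(&cond, true, memory_order_relaxed);
	/* stands in for the smp_mb() above waitqueue_active() */
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&queued, memory_order_relaxed);
}
```

With both fences in place, the outcome "waiter sleeps and waker skips
the wakeup" is ruled out; remove the fence on either side and that
outcome is allowed again. A waiter that only does add_wait_queue() and
then checks the condition is missing its half of the pair.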

> But I think this poll() thing is very much an example of this *not*
> being valid, and I don't think it's in any way pipe-specific.

Yes.

> So maybe we really do need to add the memory barrier to
> __add_wait_queue(). That's going to be painful, particularly with lots
> of users not needing it because they have the barrier in all the other
> places.

Yes, that is why I don't really like the idea of adding mb() to
__add_wait_queue().

> End result: maybe adding it just to __pollwait() is the thing to do,
> in the hopes that non-poll users all use the proper sequences.

That is what I tried to propose. Would you agree with this change?
We could even use smp_store_mb(), for example:

	@@ -224,11 +224,12 @@ static void __pollwait(struct file *filp, wait_queue_head_t *wait_address,
		if (!entry)
			return;
		entry->filp = get_file(filp);
	-	entry->wait_address = wait_address;
		entry->key = p->_key;
		init_waitqueue_func_entry(&entry->wait, pollwake);
		entry->wait.private = pwq;
		add_wait_queue(wait_address, &entry->wait);
	+	// COMMENT
	+	smp_store_mb(entry->wait_address, wait_address);
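
The intended effect can be modelled in userspace as below. Everything
here is an assumption made for illustration: model_store_mb() is a
store followed by a full fence (the generic shape of smp_store_mb()),
and wq_active, pipe_has_data and model_poll() are invented names.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the poll table entry. */
struct model_entry { _Atomic(void *) wait_address; };

static atomic_int wq_active;	 /* models "the wait queue is non-empty" */
static atomic_int pipe_has_data; /* models the state the poll function reads */

/* Store followed by a full barrier, modelling smp_store_mb(). */
#define model_store_mb(var, value)					\
	do {								\
		atomic_store_explicit(&(var), (value),			\
				      memory_order_relaxed);		\
		atomic_thread_fence(memory_order_seq_cst);		\
	} while (0)

/*
 * Models __pollwait() followed by the state check in the poll function.
 * Returns nonzero if the data is already visible.
 */
static int model_poll(struct model_entry *e, void *wait_address)
{
	/* add_wait_queue(): the entry becomes visible to wakers */
	atomic_store_explicit(&wq_active, 1, memory_order_relaxed);
	/* full barrier before the poll function reads its state */
	model_store_mb(e->wait_address, wait_address);
	return atomic_load_explicit(&pipe_has_data, memory_order_relaxed);
}
```

The point of the fence is that the later load of the state cannot be
reordered before the queueing store, which is what a waker doing
smp_mb() + waitqueue_active() relies on.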

Oleg.

