linux-kernel.vger.kernel.org archive mirror
From: Oleg Nesterov <oleg@redhat.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Manfred Spraul <manfred@colorfullife.com>,
	Christian Brauner <brauner@kernel.org>,
	David Howells <dhowells@redhat.com>,
	WangYuli <wangyuli@uniontech.com>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: wakeup_pipe_readers/writers() && pipe_poll()
Date: Mon, 6 Jan 2025 17:30:39 +0100
Message-ID: <20250106163038.GE7233@redhat.com>
In-Reply-To: <CAHk-=wj9Hr4PBobc13ZEv3HvFfpiZYrWX2-t5F62TXmMJoL5ZA@mail.gmail.com>

On 01/04, Linus Torvalds wrote:
>
> On Thu, 2 Jan 2025 at 08:33, Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > I was going to send a one-liner patch which adds mb() into pipe_poll()
> > but then I decided to make even more spam and ask some questions first.
>
> poll functions are not *supposed* to need memory barriers.
...
> But no, this is most definitely not a pipe-only thing.

Agreed, that is why I didn't send the patch which adds mb() into pipe_poll().

> They are supposed to do "poll_wait()" and then not need any more
> serialization after that, because we either
>
>  (a) have a NULL wait-address, in which case we're not going to sleep
> and this is just a "check state"

To be honest, I don't understand the wait_address check in poll_wait();
it seems that wait_address is never NULL.

But ->_qproc can be NULL if do_poll() etc. does another iteration after
poll_schedule_timeout(); in this case we can sleep again. But this
case is fine.
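
To make the ->_qproc case concrete, here is a minimal userspace sketch
of the check in poll_wait(). All the *_model names and the simplified
types are hypothetical stand-ins, not kernel definitions; the only
thing taken from the kernel is the shape of the "p && p->_qproc &&
wait_address" test.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for wait_queue_head_t and poll_table. */
struct wq_head_model { int nr_entries; };
struct poll_table_model;
typedef void (*qproc_model_fn)(struct wq_head_model *wait_address,
			       struct poll_table_model *p);
struct poll_table_model { qproc_model_fn qproc; };

/* Models __pollwait(): queue one entry on the wait queue. */
static void pollwait_model(struct wq_head_model *wait_address,
			   struct poll_table_model *p)
{
	(void)p;
	assert(wait_address != NULL);
	wait_address->nr_entries++;
}

/* Models poll_wait(): register only while a callback is installed. */
static void poll_wait_model(struct wq_head_model *wait_address,
			    struct poll_table_model *p)
{
	if (p && p->qproc && wait_address)
		p->qproc(wait_address, p);
}
```

In this model the first pass (qproc set) queues an entry; once the
caller clears qproc, later passes only re-check state, and the entry
queued on the first pass is still there.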

>  (b) the waiting function is supposed to do add_wait_queue() (usually
> by way of __pollwait) and that should be a sufficient barrier to
> anybody who does a wakeup

Yes.

> And this makes me think that the whole comment above
> waitqueue_active() is just fundamentally wrong. The smp_mb() is *not*
> sufficient in the sequence
>
>     smp_mb();
>     if (waitqueue_active(wq_head))
>         wake_up(wq_head);
>
> because while it happens to work wrt prepare_to_wait() sequences, it
> is *not* sufficient against other users of add_wait_queue().

Well, this comment doesn't look wrong to me, but perhaps it could be
clearer. It should probably explain that this pseudocode is only
correct because the waiter has a barrier before "if (@cond)" which
pairs with the smp_mb() above waitqueue_active(). It even says

	prepare_to_wait(&wq_head, &wait, state);
	// smp_mb() from set_current_state()

but perhaps this is not 100% clear.
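
The pairing described above can be sketched in userspace with C11
atomics. This is only a model under stated assumptions: cond,
queue_active and the function names are hypothetical, and
atomic_thread_fence(memory_order_seq_cst) stands in for smp_mb().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
static atomic_bool cond;          /* the @cond the waiter tests */
static atomic_bool queue_active;  /* models waitqueue_active(wq_head) */

static void reset_state(void)
{
	atomic_store(&cond, false);
	atomic_store(&queue_active, false);
}

/* Waiter: enqueue, full barrier, then test @cond. True means "would sleep". */
static bool waiter_would_sleep(void)
{
	atomic_store_explicit(&queue_active, true, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst); /* barrier before "if (@cond)" */
	return !atomic_load_explicit(&cond, memory_order_relaxed);
}

/* Waker: set @cond, full barrier, then look for waiters. True means "would wake". */
static bool waker_would_wake(void)
{
	atomic_store_explicit(&cond, true, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst); /* barrier above waitqueue_active() */
	return atomic_load_explicit(&queue_active, memory_order_relaxed);
}

/* A lost wakeup is "waiter sleeps, but waker skips the wakeup". */
static void assert_no_lost_wakeup(bool sleeps, bool wakes)
{
	assert(!sleeps || wakes);
}
```

With both fences in place, the C11 model forbids the outcome where
each side reads the other's old value. Drop the fence on the waiter
side and that guarantee is gone, which is the add_wait_queue()-only
case discussed in this thread.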

> But I think this poll() thing is very much an example of this *not*
> being valid, and I don't think it's in any way pipe-specific.

Yes.

> So maybe we really do need to add the memory barrier to
> __add_wait_queue(). That's going to be painful, particularly with lots
> of users not needing it because they have the barrier in all the other
> places.

Yes, that is why I don't really like the idea of adding mb() to
__add_wait_queue().

> End result: maybe adding it just to __pollwait() is the thing to do,
> in the hopes that non-poll users all use the proper sequences.

That is what I tried to propose. Would you agree with this change?
We could even use smp_store_mb(), say

	@@ -224,11 +224,12 @@ static void __pollwait(struct file *filp, wait_queue_head_t *wait_address,
		if (!entry)
			return;
		entry->filp = get_file(filp);
	-	entry->wait_address = wait_address;
		entry->key = p->_key;
		init_waitqueue_func_entry(&entry->wait, pollwake);
		entry->wait.private = pwq;
		add_wait_queue(wait_address, &entry->wait);
	+	// COMMENT
	+	smp_store_mb(entry->wait_address, wait_address);
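
As a rough userspace illustration of what the smp_store_mb() line is
meant to buy: the store to wait_address is followed by a full barrier,
so the state check done by the ->poll() method afterwards cannot be
satisfied before the registration is visible. The names below
(entry_model, store_mb_model, pipe_state_model, register_then_check)
are hypothetical, and the C11 fence is only an assumed analogue of
smp_mb().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct poll_table_entry; not kernel code. */
struct entry_model {
	_Atomic(void *) wait_address;
};

/* Hypothetical stand-in for the state that a ->poll() method reads. */
static atomic_int pipe_state_model;

/* Models smp_store_mb(var, value): a store followed by a full barrier. */
static void store_mb_model(_Atomic(void *) *var, void *value)
{
	atomic_store_explicit(var, value, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
}

/* Models the tail of __pollwait() followed by the ->poll() state check. */
static int register_then_check(struct entry_model *e, void *wq)
{
	assert(e != NULL);
	/* add_wait_queue(wq, ...) would have queued the entry before this point */
	store_mb_model(&e->wait_address, wq);
	/* This load is ordered after everything stored above. */
	return atomic_load_explicit(&pipe_state_model, memory_order_relaxed);
}
```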

Oleg.


