From: Oleg Nesterov <oleg@redhat.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Manfred Spraul <manfred@colorfullife.com>,
Christian Brauner <brauner@kernel.org>,
David Howells <dhowells@redhat.com>,
WangYuli <wangyuli@uniontech.com>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: wakeup_pipe_readers/writers() && pipe_poll()
Date: Mon, 6 Jan 2025 17:30:39 +0100
Message-ID: <20250106163038.GE7233@redhat.com>
In-Reply-To: <CAHk-=wj9Hr4PBobc13ZEv3HvFfpiZYrWX2-t5F62TXmMJoL5ZA@mail.gmail.com>
On 01/04, Linus Torvalds wrote:
>
> On Thu, 2 Jan 2025 at 08:33, Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > I was going to send a one-liner patch which adds mb() into pipe_poll()
> > but then I decided to make even more spam and ask some questions first.
>
> poll functions are not *supposed* to need memory barriers.
...
> But no, this is most definitely not a pipe-only thing.
Agreed, which is why I didn't send the patch that adds mb() to pipe_poll().
> They are supposed to do "poll_wait()" and then not need any more
> serialization after that, because we either
>
> (a) have a NULL wait-address, in which case we're not going to sleep
> and this is just a "check state"
To be honest, I don't understand the wait_address check in poll_wait();
it seems that wait_address is never NULL.

But ->_qproc can be NULL if do_poll() etc. does another iteration after
poll_schedule_timeout(), and in that case we can sleep again. But this
case is fine.
> (b) the waiting function is supposed to do add_wait_queue() (usually
> by way of __pollwait) and that should be a sufficient barrier to
> anybody who does a wakeup
Yes.
> And this makes me think that the whole comment above
> waitqueue_active() is just fundamentally wrong. The smp_mb() is *not*
> sufficient in the sequence
>
> 	smp_mb();
> 	if (waitqueue_active(wq_head))
> 		wake_up(wq_head);
>
> because while it happens to work wrt prepare_to_wait() sequences, it
> does *not* work against other users of add_wait_queue().
Well, this comment doesn't look wrong to me, but perhaps it could be
clearer. It should probably explain that this pseudo code is only
correct because the waiter has a barrier before "if (@cond)" which
pairs with the smp_mb() above waitqueue_active(). It even says

	prepare_to_wait(&wq_head, &wait, state);
	// smp_mb() from set_current_state()

but perhaps this is not 100% clear.
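
To make the pairing concrete, here is a rough userspace sketch using C11
atomics and pthreads. It is not kernel code. The names cond, queued and
count_lost_wakeups() are made up for illustration, and the seq_cst
fences only stand in for the smp_mb() on each side. With a full barrier
between the store and the load in both threads, at least one side has to
observe the other's store, so a lost wakeup (waiter misses @cond and
waker misses the waiter) should never be counted.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-ins, not kernel objects. */
static atomic_int cond;     /* models @cond, the state the waiter checks */
static atomic_int queued;   /* models "the wait queue is non-empty" */
static int waiter_saw_cond;
static int waker_saw_waiter;

static void *waiter(void *unused)
{
    (void)unused;
    /* Models prepare_to_wait(): enqueue, then the barrier that
     * set_current_state() provides, then the "if (@cond)" check. */
    atomic_store_explicit(&queued, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    waiter_saw_cond = atomic_load_explicit(&cond, memory_order_relaxed);
    return NULL;
}

static void *waker(void *unused)
{
    (void)unused;
    /* Models: @cond = true; smp_mb(); if (waitqueue_active(...)) */
    atomic_store_explicit(&cond, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    waker_saw_waiter = atomic_load_explicit(&queued, memory_order_relaxed);
    return NULL;
}

/* Runs the two-thread race `rounds` times and returns how often both
 * sides missed each other, or -1 if a thread could not be created. */
int count_lost_wakeups(int rounds)
{
    int lost = 0;

    for (int i = 0; i < rounds; i++) {
        pthread_t a, b;

        atomic_store(&cond, 0);
        atomic_store(&queued, 0);
        if (pthread_create(&a, NULL, waiter, NULL) != 0)
            return -1;
        if (pthread_create(&b, NULL, waker, NULL) != 0) {
            pthread_join(a, NULL);
            return -1;
        }
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        if (!waiter_saw_cond && !waker_saw_waiter)
            lost++;
    }
    return lost;
}
```

Dropping either fence turns this into the classic store-buffering
pattern, where both loads may return 0. That corresponds to the case
under discussion, where a waiter enqueues itself without a following
barrier.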
> But I think this poll() thing is very much an example of this *not*
> being valid, and I don't think it's in any way pipe-specific.
Yes.
> So maybe we really do need to add the memory barrier to
> __add_wait_queue(). That's going to be painful, particularly with lots
> of users not needing it because they have the barrier in all the other
> places.
Yes, that is why I don't really like the idea of adding mb() to
__add_wait_queue().
> End result: maybe adding it just to __pollwait() is the thing to do,
> in the hopes that non-poll users all use the proper sequences.
That is what I tried to propose. Would you agree with this change?
We could even use smp_store_mb(), say
@@ -224,11 +224,12 @@ static void __pollwait(struct file *filp, wait_queue_head_t *wait_address,
 	if (!entry)
 		return;
 	entry->filp = get_file(filp);
-	entry->wait_address = wait_address;
 	entry->key = p->_key;
 	init_waitqueue_func_entry(&entry->wait, pollwake);
 	entry->wait.private = pwq;
 	add_wait_queue(wait_address, &entry->wait);
+	// COMMENT
+	smp_store_mb(entry->wait_address, wait_address);
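
For readers unfamiliar with smp_store_mb(): the generic kernel
definition is a WRITE_ONCE() followed by smp_mb(). A minimal userspace
analogue, assuming C11 atomics, could look like the sketch below. The
names struct entry_model, store_mb() and register_then_check() are
hypothetical, and the fence only approximates the kernel barrier.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a poll table entry; only the one field. */
struct entry_model {
    _Atomic(void *) wait_address;
};

/* Userspace analogue of smp_store_mb(): a store, then a full barrier. */
static inline void store_mb(_Atomic(void *) *slot, void *value)
{
    atomic_store_explicit(slot, value, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
}

/* Models the tail of __pollwait() from the diff, followed by the state
 * check a ->poll() method performs once poll_wait() has returned. */
int register_then_check(struct entry_model *e, void *wq, atomic_int *state)
{
    /* add_wait_queue(wq, ...) would already have run at this point. */
    store_mb(&e->wait_address, wq);
    /* The barrier above keeps this load from being ordered before the
     * stores that precede it. */
    return atomic_load_explicit(state, memory_order_relaxed);
}
```

Writing wait_address last lets a store the function needs anyway carry
the barrier, instead of adding a separate smp_mb() after
add_wait_queue().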
Oleg.