From: Oleg Nesterov <oleg@redhat.com>
To: Manfred Spraul <manfred@colorfullife.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
oliver.sang@intel.com, ebiederm@xmission.com,
colin.king@canonical.com, josh@joshtriplett.org,
penberg@cs.helsinki.fi, mingo@elte.hu, jes@sgi.com, hch@lst.de,
aia21@cantab.net, arjan@infradead.org, jgarzik@pobox.com,
neukum@fachschaft.cup.uni-muenchen.de, oliver@neukum.name,
dada1@cosmosbay.com, axboe@kernel.dk, axboe@suse.de,
nickpiggin@yahoo.com.au, dhowells@redhat.com, nathans@sgi.com,
rolandd@cisco.com, tytso@mit.edu, bunk@stusta.de,
pbadari@us.ibm.com, ak@linux.intel.com, ak@suse.de,
davem@davemloft.net, jsipek@cs.sunysb.edu, jens.axboe@oracle.com,
ramsdell@mitre.org, hch@infradead.org, akpm@linux-foundation.org,
randy.dunlap@oracle.com, efault@gmx.de, rdunlap@infradead.org,
haveblue@us.ibm.com, drepper@redhat.com, dm.n9107@gmail.com,
jblunck@suse.de, davidel@xmailserver.org,
mtk.manpages@googlemail.com, linux-arch@vger.kernel.org,
vda.linux@googlemail.com, jmorris@namei.org, serue@us.ibm.com,
hca@linux.ibm.com, rth@twiddle.net, lethal@linux-sh.org,
tony.luck@intel.com, heiko.carstens@de.ibm.com,
andi@firstfloor.org, corbet@lwn.net, crquan@gmail.com,
mszeredi@suse.cz, miklos@szeredi.hu, peterz@infradead.org,
a.p.zijlstra@chello.nl, earl_chew@agilent.com, npiggin@gmail.com,
npiggin@suse.de, julia@diku.dk, jaxboe@fusionio.com,
nikai@nikai.net, dchinner@redhat.com, davej@redhat.com,
npiggin@kernel.dk, eric.dumazet@gmail.com,
tim.c.chen@linux.intel.com, xemul@parallels.com, tj@kernel.org,
serge.hallyn@canonical.com, gorcunov@openvz.org, bcrl@kvack.org,
alan@lxorguk.ukuu.org.uk, will.deacon@arm.com, will@kernel.org,
zab@redhat.com, balbi@ti.com, gregkh@linuxfoundation.org,
rusty@rustcorp.com.au, socketpair@gmail.com,
penguin-kernel@i-love.sakura.ne.jp, mhocko@kernel.org,
axboe@fb.com, tglx@linutronix.de, mcgrof@kernel.org,
linux@dominikbrodowski.net, willy@infradead.org,
paulmck@kernel.org, kernel@tuxforce.de,
linux-morello@op-lists.linaro.org
Subject: Re: [RESEND PATCH] fs/pipe: Introduce a check to skip sleeping processes during pipe read/write
Date: Sat, 28 Dec 2024 15:32:49 +0100 [thread overview]
Message-ID: <20241228143248.GB5302@redhat.com> (raw)
In-Reply-To: <1df49d97-df0e-4471-9e40-a850b758d981@colorfullife.com>
On 12/27, Manfred Spraul wrote:
>
> >I _think_ that
> >
> > wait_event_whatever(WQ, CONDITION);
> >vs
> >
> > CONDITION = 1;
> > if (wq_has_sleeper(WQ))
> > wake_up_xxx(WQ, ...);
> >
> >is fine.
>
> This pattern is documented in wait.h:
>
> https://elixir.bootlin.com/linux/v6.12.6/source/include/linux/wait.h#L96
>
> Thus if there an issue, then the documentation should be updated.
Agreed, basically the same pattern, prepare_to_wait_event() is similar
to prepare_to_wait().
> But I do not understand this comment (from 2.6.0)
>
> https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/kernel/fork.c?h=v2.6.0&id=e220fdf7a39b54a758f4102bdd9d0d5706aa32a7
>
> >/* * Note: we use "set_current_state()" _after_ the wait-queue add, *
> >because we need a memory barrier there on SMP, so that any * wake-function
> >that tests for the wait-queue being active * will be guaranteed to see
> >waitqueue addition _or_ subsequent * tests in this thread will see the
> >wakeup having taken place. * * The spin_unlock() itself is semi-permeable
> >and only protects * one way (it only protects stuff inside the critical
> >region and * stops them from bleeding out - it would still allow
> >subsequent * loads to move into the the critical region). */
...
> set_current_state() now uses smp_store_mb(), which is a memory barrier
> _after_ the store.
And afaics this is what we actually need.
> Thus I do not see what enforces that the store happens
> before the store for the __add_wait_queue().
IIUC this is fine, no need to serialize list_add() and STORE(tsk->__state),
they can be reordered.
But we need mb() between __add_wait_queue + __set_current_state (in any
order) and the subsequent "if (CONDITION)" check.
> --- a/kernel/sched/wait.c
> +++ b/kernel/sched/wait.c
> @@ -124,6 +124,23 @@ static int __wake_up_common_lock(struct wait_queue_head *wq_head, unsigned int m
> int __wake_up(struct wait_queue_head *wq_head, unsigned int mode,
> int nr_exclusive, void *key)
> {
> + if (list_empty(&wq_head->head)) {
> + struct list_head *pn;
> +
> + /*
> + * pairs with spin_unlock_irqrestore(&wq_head->lock);
> + * We actually do not need to acquire wq_head->lock, we just
> + * need to be sure that there is no prepare_to_wait() that
> + * completed on any CPU before __wake_up was called.
> + * Thus instead of load_acquiring the spinlock and dropping
> + * it again, we load_acquire the next list entry and check
> + * that the list is not empty.
> + */
> + pn = smp_load_acquire(&wq_head->head.next);
> +
> + if(pn == &wq_head->head)
> + return 0;
> + }
Too subtle for me ;)
I have some concerns, but I need to think a bit more to (try to) actually
understand this change.
Oleg.
next prev parent reply other threads:[~2024-12-28 14:34 UTC|newest]
Thread overview: 37+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-12-25 9:42 [RESEND PATCH] fs/pipe: Introduce a check to skip sleeping processes during pipe read/write WangYuli
2024-12-25 13:30 ` Andy Shevchenko
2024-12-25 13:53 ` Kent Overstreet
2024-12-25 16:04 ` Mateusz Guzik
2024-12-25 16:32 ` Kent Overstreet
2024-12-25 17:22 ` Mateusz Guzik
2024-12-25 17:41 ` Kent Overstreet
2024-12-25 15:42 ` WangYuli
2024-12-25 16:00 ` Willy Tarreau
2024-12-25 16:32 ` WangYuli
2024-12-25 16:56 ` Willy Tarreau
2024-12-26 16:00 ` Oleg Nesterov
2024-12-26 19:02 ` Linus Torvalds
2024-12-26 20:11 ` Oleg Nesterov
2024-12-26 20:29 ` Linus Torvalds
2024-12-26 20:57 ` Oleg Nesterov
2024-12-27 15:54 ` Oleg Nesterov
2024-12-27 16:43 ` Oleg Nesterov
2024-12-27 18:39 ` Manfred Spraul
2024-12-28 14:32 ` Oleg Nesterov [this message]
2024-12-28 15:22 ` Oleg Nesterov
2024-12-28 16:32 ` Oleg Nesterov
2024-12-28 18:53 ` Manfred Spraul
2024-12-29 11:54 ` Oleg Nesterov
2024-12-28 16:45 ` Manfred Spraul
2024-12-29 11:57 ` Oleg Nesterov
2024-12-29 12:41 ` Manfred Spraul
2024-12-29 13:05 ` Oleg Nesterov
2024-12-29 13:13 ` Oleg Nesterov
2024-12-29 19:54 ` Manfred Spraul
2024-12-30 15:38 ` Oleg Nesterov
2024-12-31 11:14 ` Manfred Spraul
2024-12-31 19:38 ` Linus Torvalds
2024-12-31 20:24 ` Oleg Nesterov
2024-12-31 22:31 ` Linus Torvalds
2025-01-02 13:57 ` Oleg Nesterov
2025-01-04 21:15 ` RFC: Checkpatch: Introduce list of functions that need memory barriers Manfred Spraul
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20241228143248.GB5302@redhat.com \
--to=oleg@redhat.com \
--cc=a.p.zijlstra@chello.nl \
--cc=aia21@cantab.net \
--cc=ak@linux.intel.com \
--cc=ak@suse.de \
--cc=akpm@linux-foundation.org \
--cc=alan@lxorguk.ukuu.org.uk \
--cc=andi@firstfloor.org \
--cc=arjan@infradead.org \
--cc=axboe@fb.com \
--cc=axboe@kernel.dk \
--cc=axboe@suse.de \
--cc=balbi@ti.com \
--cc=bcrl@kvack.org \
--cc=brauner@kernel.org \
--cc=bunk@stusta.de \
--cc=colin.king@canonical.com \
--cc=corbet@lwn.net \
--cc=crquan@gmail.com \
--cc=dada1@cosmosbay.com \
--cc=davej@redhat.com \
--cc=davem@davemloft.net \
--cc=davidel@xmailserver.org \
--cc=dchinner@redhat.com \
--cc=dhowells@redhat.com \
--cc=dm.n9107@gmail.com \
--cc=drepper@redhat.com \
--cc=earl_chew@agilent.com \
--cc=ebiederm@xmission.com \
--cc=efault@gmx.de \
--cc=eric.dumazet@gmail.com \
--cc=gorcunov@openvz.org \
--cc=gregkh@linuxfoundation.org \
--cc=haveblue@us.ibm.com \
--cc=hca@linux.ibm.com \
--cc=hch@infradead.org \
--cc=hch@lst.de \
--cc=heiko.carstens@de.ibm.com \
--cc=jack@suse.cz \
--cc=jaxboe@fusionio.com \
--cc=jblunck@suse.de \
--cc=jens.axboe@oracle.com \
--cc=jes@sgi.com \
--cc=jgarzik@pobox.com \
--cc=jmorris@namei.org \
--cc=josh@joshtriplett.org \
--cc=jsipek@cs.sunysb.edu \
--cc=julia@diku.dk \
--cc=kernel@tuxforce.de \
--cc=lethal@linux-sh.org \
--cc=linux-arch@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-morello@op-lists.linaro.org \
--cc=linux@dominikbrodowski.net \
--cc=manfred@colorfullife.com \
--cc=mcgrof@kernel.org \
--cc=mhocko@kernel.org \
--cc=miklos@szeredi.hu \
--cc=mingo@elte.hu \
--cc=mszeredi@suse.cz \
--cc=mtk.manpages@googlemail.com \
--cc=nathans@sgi.com \
--cc=neukum@fachschaft.cup.uni-muenchen.de \
--cc=nickpiggin@yahoo.com.au \
--cc=nikai@nikai.net \
--cc=npiggin@gmail.com \
--cc=npiggin@kernel.dk \
--cc=npiggin@suse.de \
--cc=oliver.sang@intel.com \
--cc=oliver@neukum.name \
--cc=paulmck@kernel.org \
--cc=pbadari@us.ibm.com \
--cc=penberg@cs.helsinki.fi \
--cc=penguin-kernel@i-love.sakura.ne.jp \
--cc=peterz@infradead.org \
--cc=ramsdell@mitre.org \
--cc=randy.dunlap@oracle.com \
--cc=rdunlap@infradead.org \
--cc=rolandd@cisco.com \
--cc=rth@twiddle.net \
--cc=rusty@rustcorp.com.au \
--cc=serge.hallyn@canonical.com \
--cc=serue@us.ibm.com \
--cc=socketpair@gmail.com \
--cc=tglx@linutronix.de \
--cc=tim.c.chen@linux.intel.com \
--cc=tj@kernel.org \
--cc=tony.luck@intel.com \
--cc=torvalds@linux-foundation.org \
--cc=tytso@mit.edu \
--cc=vda.linux@googlemail.com \
--cc=viro@zeniv.linux.org.uk \
--cc=will.deacon@arm.com \
--cc=will@kernel.org \
--cc=willy@infradead.org \
--cc=xemul@parallels.com \
--cc=zab@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.