From: David Laight <david.laight.linux@gmail.com>
To: Christian Brauner <brauner@kernel.org>
Cc: Nam Cao <namcao@linutronix.de>,
Soheil Hassas Yeganeh <soheil@google.com>,
Alexander Viro <viro@zeniv.linux.org.uk>, Jan Kara <jack@suse.cz>,
Shuah Khan <shuah@kernel.org>,
Davidlohr Bueso <dave@stgolabs.net>,
Khazhismel Kumykov <khazhy@google.com>,
Willem de Bruijn <willemb@google.com>,
Eric Dumazet <edumazet@google.com>, Jens Axboe <axboe@kernel.dk>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-kselftest@vger.kernel.org, stable@vger.kernel.org
Subject: Re: [PATCH 2/2] eventpoll: Fix epoll_wait() report false negative
Date: Mon, 4 May 2026 13:00:27 +0100
Message-ID: <20260504130027.50040ce6@pumpkin>
In-Reply-To: <20260429-november-speisen-3084d769d316@brauner>

On Wed, 29 Apr 2026 08:54:06 +0200
Christian Brauner <brauner@kernel.org> wrote:
> On Fri, Jul 18, 2025 at 10:59:48AM +0200, Nam Cao wrote:
> > On Fri, Jul 18, 2025 at 09:38:27AM +0100, Soheil Hassas Yeganeh wrote:
> > > On Fri, Jul 18, 2025 at 8:52 AM Nam Cao <namcao@linutronix.de> wrote:
> > > >
> > > > ep_events_available() checks for available events by looking at ep->rdllist
> > > > and ep->ovflist. However, this is done without a lock, therefore the
> > > > returned value is not reliable. It is possible that both checks on
> > > > ep->rdllist and ep->ovflist are false while ep_start_scan() or
> > > > ep_done_scan() is being executed on other CPUs, even though events are
> > > > available.
> > > >
> > > > This bug can be observed by:
> > > >
> > > > 1. Create an eventpoll with at least one ready level-triggered event
> > > >
> > > > 2. Create multiple threads that call epoll_wait() with zero timeout. The
> > > > threads do not consume the events, therefore every epoll_wait() call should
> > > > return at least one event.
> > > >
> > > > If one thread is executing ep_events_available() while another thread is
> > > > executing ep_start_scan() or ep_done_scan(), epoll_wait() may wrongly
> > > > return no event for the former thread.
> > >
> > > That is the whole point of epoll_wait with a zero timeout. We would want to
> > > opportunistically poll without much overhead, which will have more
> > > false positives.
> > > A caller that calls with a zero timeout should retry later, and will
> > > at some point observe the event.
> >
> > Is this a documented behavior that users expect? I do not see this in the
> > man page.
>
> The selftests rely on this behavior that timeout=0 sees events from a
> concurrently running producer. They would fail at a much higher rate
> after this change - believe me, I had a similar patch that changed
> something in this area. I would explore the seqcount that Mateusz
> suggested tbh.
>

Does this scenario really affect any real programs?
It doesn't make sense to have multiple threads looking for level-triggered
events on a single epoll fd.
When epoll returns an event you usually need to read() from the
associated file descriptor before calling epoll_wait() again; otherwise
a level-triggered event is simply reported again.

To split the epoll processing between multiple threads you need many
epoll fds, with the underlying fds distributed between them, and have
the threads take the epoll fds in turn (e.g. by putting the epoll fds in
an array and using an atomic increment of a global array index to get
the next epoll fd to process).
-- David

Thread overview: 18+ messages
[not found] <cover.1752824628.git.namcao@linutronix.de>
2025-07-18 7:52 ` [PATCH 2/2] eventpoll: Fix epoll_wait() report false negative Nam Cao
2025-07-18 8:38 ` Soheil Hassas Yeganeh
2025-07-18 8:59 ` Nam Cao
2026-04-29 6:54 ` Christian Brauner
2026-04-29 7:27 ` Nam Cao
2026-04-29 15:34 ` Mateusz Guzik
2026-05-03 13:24 ` Nam Cao
2026-05-04 12:00 ` David Laight [this message]
2025-09-17 12:49 ` Mateusz Guzik
2025-09-17 13:41 ` Nam Cao
2025-09-17 16:05 ` Mateusz Guzik
2025-09-17 16:08 ` Mateusz Guzik
2025-09-17 18:03 ` Khazhy Kumykov
2025-09-17 22:28 ` Khazhy Kumykov
2025-09-17 22:38 ` Mateusz Guzik
2025-09-22 6:26 ` Nam Cao
2025-09-20 14:42 ` David Laight
2025-09-20 14:45 ` Mateusz Guzik