From: Nam Cao <namcao@linutronix.de>
To: Mateusz Guzik <mjguzik@gmail.com>
Cc: Christian Brauner <brauner@kernel.org>,
Soheil Hassas Yeganeh <soheil@google.com>,
Alexander Viro <viro@zeniv.linux.org.uk>, Jan Kara <jack@suse.cz>,
Shuah Khan <shuah@kernel.org>,
Davidlohr Bueso <dave@stgolabs.net>,
Khazhismel Kumykov <khazhy@google.com>,
Willem de Bruijn <willemb@google.com>,
Eric Dumazet <edumazet@google.com>, Jens Axboe <axboe@kernel.dk>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-kselftest@vger.kernel.org, stable@vger.kernel.org
Subject: Re: [PATCH 2/2] eventpoll: Fix epoll_wait() report false negative
Date: Sun, 03 May 2026 15:24:14 +0200
Message-ID: <87cxzc62yp.fsf@yellow.woof>
In-Reply-To: <xbotidrmois5ygxtqtwqzczkt76wcc7uw5cz5lptda53coaavj@pzvxcpe534cu>

Mateusz Guzik <mjguzik@gmail.com> writes:
> Strictly speaking more error prone than the seq approach, but should be
> faster on weaker-ordered archs thanks to avoided fences.
>
> I'm definitely not going to protest the seqc route.

Linus probably wouldn't be thrilled if I break epoll again, so let's
stay with the simpler seqcount route.

Nam

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index a3090b446af1..22c3f0186476 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -38,6 +38,7 @@
 #include <linux/compat.h>
 #include <linux/rculist.h>
 #include <linux/capability.h>
+#include <linux/seqlock.h>
 #include <net/busy_poll.h>

 /*
@@ -190,6 +191,9 @@ struct eventpoll {
 	/* Lock which protects rdllist and ovflist */
 	spinlock_t lock;

+	/* Protect switching between rdllist and ovflist */
+	seqcount_spinlock_t seq;
+
 	/* RB tree root used to store monitored fd structs */
 	struct rb_root_cached rbr;

@@ -382,8 +386,17 @@ static inline struct epitem *ep_item_from_wait(wait_queue_entry_t *p)
  */
 static inline int ep_events_available(struct eventpoll *ep)
 {
-	return !list_empty_careful(&ep->rdllist) ||
-		READ_ONCE(ep->ovflist) != EP_UNACTIVE_PTR;
+	bool events_available;
+	unsigned int seq;
+
+	do {
+		seq = read_seqcount_begin(&ep->seq);
+
+		events_available = !list_empty_careful(&ep->rdllist) ||
+			READ_ONCE(ep->ovflist) != EP_UNACTIVE_PTR;
+	} while (read_seqcount_retry(&ep->seq, seq));
+
+	return events_available;
 }

 #ifdef CONFIG_NET_RX_BUSY_POLL
@@ -735,8 +748,12 @@ static void ep_start_scan(struct eventpoll *ep, struct list_head *txlist)
 	 */
 	lockdep_assert_irqs_enabled();
 	spin_lock_irq(&ep->lock);
+	write_seqcount_begin(&ep->seq);
+
 	list_splice_init(&ep->rdllist, txlist);
 	WRITE_ONCE(ep->ovflist, NULL);
+
+	write_seqcount_end(&ep->seq);
 	spin_unlock_irq(&ep->lock);
 }

@@ -768,6 +785,9 @@ static void ep_done_scan(struct eventpoll *ep,
 			ep_pm_stay_awake(epi);
 		}
 	}
+
+	write_seqcount_begin(&ep->seq);
+
 	/*
 	 * We need to set back ep->ovflist to EP_UNACTIVE_PTR, so that after
 	 * releasing the lock, events will be queued in the normal way inside
@@ -779,6 +799,9 @@ static void ep_done_scan(struct eventpoll *ep,
 	 * Quickly re-inject items left on "txlist".
 	 */
 	list_splice(txlist, &ep->rdllist);
+
+	write_seqcount_end(&ep->seq);
+
 	__pm_relax(ep->ws);

 	if (!list_empty(&ep->rdllist)) {
@@ -1155,6 +1178,7 @@ static int ep_alloc(struct eventpoll **pep)

 	mutex_init(&ep->mtx);
 	spin_lock_init(&ep->lock);
+	seqcount_spinlock_init(&ep->seq, &ep->lock);
 	init_waitqueue_head(&ep->wq);
 	init_waitqueue_head(&ep->poll_wait);
 	INIT_LIST_HEAD(&ep->rdllist);
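
For readers following the patch: the retry loop added to
ep_events_available() follows the usual sequence-counter pattern. The
writer bumps a counter before and after moving events between the two
lists, and the lockless reader repeats its check if the counter was odd
or changed underneath it. Below is a minimal userspace sketch of that
idea, assuming C11 atomics in place of the kernel's seqcount_spinlock_t
and plain counters/flags in place of rdllist and ovflist. All toy_*
names are hypothetical and do not exist in fs/eventpoll.c; the sketch
also assumes a single writer at a time, which the patch gets from
ep->lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct eventpoll (not the kernel layout). */
struct toy_ep {
	atomic_uint seq;        /* even: stable, odd: writer in progress */
	atomic_int rdl_count;   /* stands in for "rdllist is non-empty" */
	atomic_bool ovf_active; /* stands in for ovflist != EP_UNACTIVE_PTR */
};

/* Writer side: make the counter odd while the two fields are in flux. */
static void toy_write_begin(struct toy_ep *ep)
{
	unsigned int old = atomic_fetch_add(&ep->seq, 1);

	assert((old & 1) == 0); /* assumption: writers are serialized */
}

static void toy_write_end(struct toy_ep *ep)
{
	unsigned int old = atomic_fetch_add(&ep->seq, 1);

	assert(old & 1);
}

/* Loosely mirrors ep_start_scan(): empty the ready list, arm the overflow list. */
static int toy_start_scan(struct toy_ep *ep)
{
	int taken;

	toy_write_begin(ep);
	taken = atomic_exchange(&ep->rdl_count, 0);
	atomic_store(&ep->ovf_active, true);
	toy_write_end(ep);
	return taken;
}

/* Loosely mirrors ep_done_scan(): disarm the overflow list, re-inject leftovers. */
static void toy_done_scan(struct toy_ep *ep, int leftover)
{
	toy_write_begin(ep);
	atomic_store(&ep->ovf_active, false);
	atomic_fetch_add(&ep->rdl_count, leftover);
	toy_write_end(ep);
}

/* Lockless reader: retry until both fields were read under one even count. */
static bool toy_events_available(struct toy_ep *ep)
{
	unsigned int seq;
	bool avail;

	do {
		/* Wait out a writer that is mid-update (odd count). */
		while ((seq = atomic_load(&ep->seq)) & 1)
			;
		avail = atomic_load(&ep->rdl_count) > 0 ||
			atomic_load(&ep->ovf_active);
	} while (atomic_load(&ep->seq) != seq);

	return avail;
}
```

The sketch uses sequentially consistent atomics throughout to keep the
reasoning simple; the kernel primitives use weaker, cheaper barriers.
Without the counter, a reader could see the ready list as empty and the
overflow list as inactive in the middle of either scan function, even
though events are pending, which is the false negative named in the
subject line.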