From: Christian Brauner <brauner@kernel.org>
To: linux-fsdevel@vger.kernel.org
Cc: Alexander Viro <viro@zeniv.linux.org.uk>, Jan Kara <jack@suse.cz>,
Linus Torvalds <torvalds@linux-foundation.org>,
Jens Axboe <axboe@kernel.dk>,
"Christian Brauner (Amutable)" <brauner@kernel.org>
Subject: [PATCH 11/17] eventpoll: split ep_clear_and_put() into drain helpers
Date: Fri, 24 Apr 2026 15:46:42 +0200
Message-ID: <20260424-work-epoll-rework-v1-11-249ed00a20f3@kernel.org>
In-Reply-To: <20260424-work-epoll-rework-v1-0-249ed00a20f3@kernel.org>

The two-pass walk in ep_clear_and_put() is the main way closing an
epoll file tears down its state, and the ordering between the passes
is load-bearing (see the docblock added by "eventpoll: document
ep_clear_and_put() two-pass pattern"). Give each pass its own
function so the ordering is enforced by the call sequence in
ep_clear_and_put() rather than by convention inside one body.

ep_drain_pollwaits() carries out Pass 1: walk the rbtree and call
ep_unregister_pollwait() on each epi. The function-level comment
names it as Pass 1 and spells out the synchronization contract with
ep_poll_callback().

ep_drain_tree() carries out Pass 2: walk the rbtree and call
ep_remove() on each epi, capturing rb_next() before each erase. The
comment names it as Pass 2 and documents the hand-off to a concurrent
eventpoll_release_file() (removal path C).

ep_clear_and_put() keeps the poll-on-ep wakeup, the ep->mtx
bracketing, and the ep_put() + conditional ep_free(). Its docblock
shrinks to the high-level summary; the per-pass detail moves into the
helpers.

No functional change.

Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
---
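For readers outside fs/eventpoll.c, the shape of the split can be
sketched in plain userspace C. This is only an analogy under stated
assumptions, not the kernel code: struct item, struct registry,
drain_hooks(), drain_items(), registry_add() and registry_clear() are
hypothetical names, a singly linked list stands in for the rbtree, and
a bool checked with assert() stands in for ep->mtx and
lockdep_assert_held().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for an epitem: a node something may still call into. */
struct item {
	struct item *next;
	bool hooked;	/* true while a callback could still reference this item */
};

/* Hypothetical stand-in for the eventpoll; 'locked' plays the role of ep->mtx. */
struct registry {
	struct item *head;
	bool locked;
};

/* Add one hooked item at the head of the list; returns false on allocation failure. */
static bool registry_add(struct registry *r)
{
	struct item *it = malloc(sizeof(*it));

	if (!it)
		return false;
	it->hooked = true;
	it->next = r->head;
	r->head = it;
	return true;
}

/* Pass 1: unhook every item. Nothing is freed in this pass. */
static void drain_hooks(struct registry *r)
{
	assert(r->locked);	/* caller must hold the lock */
	for (struct item *it = r->head; it; it = it->next)
		it->hooked = false;
}

/* Pass 2: free every item; returns how many were freed. */
static int drain_items(struct registry *r)
{
	struct item *it, *next;
	int freed = 0;

	assert(r->locked);	/* caller must hold the lock */
	for (it = r->head; it; it = next) {
		next = it->next;	/* capture the successor before freeing */
		assert(!it->hooked);	/* pass 1 must already have run */
		free(it);
		freed++;
	}
	r->head = NULL;
	return freed;
}

/* Teardown: the pass ordering is fixed by the call sequence. */
static int registry_clear(struct registry *r)
{
	int freed;

	r->locked = true;
	drain_hooks(r);
	freed = drain_items(r);
	r->locked = false;
	return freed;
}
```

Calling registry_add() a few times and then registry_clear() frees
every item. Swapping the two drain calls trips the assert in
drain_items(), which is the kind of mistake the split is meant to make
hard to write.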
fs/eventpoll.c | 87 ++++++++++++++++++++++++++++++++++------------------------
1 file changed, 51 insertions(+), 36 deletions(-)
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index e4a4e92d329f..eeddd05ba529 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1105,62 +1105,77 @@ static void ep_remove(struct eventpoll *ep, struct epitem *epi)
 }

 /*
- * Removal path B (see "Removal paths" in the top-of-file banner):
- * close of the epoll fd itself, reached via ep_eventpoll_release().
- *
- * Under ep->mtx we walk the rbtree twice:
- *
- * Pass 1 drains each epi's pwqlist via ep_unregister_pollwait().
- * This takes each watched waitqueue head's lock and so
- * synchronizes with any in-flight ep_poll_callback(), so
- * after the pass ends no callback can still be holding or
- * about to dereference any epi on this ep.
- *
- * Pass 2 runs ep_remove() on each epi. The per-epi pwqlist is
- * already empty, but the rest of ep_remove() still runs
- * (epi_fget() pin, f_ep clear under f_lock, rbtree erase,
- * rdllist unlink, kfree_rcu).
- *
- * Pass 1 must strictly precede Pass 2: fusing them would let a
- * callback queued on epi_i still fire after epi_{i+k} was freed.
- *
- * A concurrent eventpoll_release_file() (path C) serializes against
- * us on ep->mtx; in Pass 2, ep_remove() transparently hands off any
- * epi whose watched file is in __fput() by bailing when epi_fget()
- * returns NULL, and C will clean that epi up on its side.
- *
- * ep->refcount is held > 0 throughout by the ep file's own share;
- * we drop that share after the walk and free the eventpoll if we
- * were last.
+ * Pass 1 of ep_clear_and_put(): drain every epi's pwqlist.
+ * ep_unregister_pollwait() takes each watched wait-queue head's lock,
+ * which synchronizes with any in-flight ep_poll_callback(); after
+ * this returns no callback can still be about to dereference an epi
+ * on this ep. Must strictly precede ep_drain_tree() -- fusing the
+ * two walks would let a callback queued on epi_i still fire after
+ * epi_{i+k} had already been freed.
  */
-static void ep_clear_and_put(struct eventpoll *ep)
+static void ep_drain_pollwaits(struct eventpoll *ep)
 {
-	struct rb_node *rbp, *next;
+	struct rb_node *rbp;
 	struct epitem *epi;

-	/* Release any threads blocked in poll-on-ep. */
-	if (waitqueue_active(&ep->poll_wait))
-		ep_poll_safewake(ep, NULL, 0);
-
-	mutex_lock(&ep->mtx);
+	lockdep_assert_held(&ep->mtx);

-	/* Pass 1: drain pwqlists; synchronizes with in-flight callbacks. */
 	for (rbp = rb_first_cached(&ep->rbr); rbp; rbp = rb_next(rbp)) {
 		epi = rb_entry(rbp, struct epitem, rbn);
 		ep_unregister_pollwait(ep, epi);
 		cond_resched();
 	}
+}
+
+/*
+ * Pass 2 of ep_clear_and_put(): ep_remove() every epi. The per-epi
+ * pwqlist is already empty (ep_drain_pollwaits ran), but the rest of
+ * ep_remove() still runs: epi_fget() pin, f_ep clear under f_lock,
+ * rbtree erase, rdllist unlink, kfree_rcu(epi). rb_next() is captured
+ * before each erase so the iteration is stable.
+ *
+ * A concurrent eventpoll_release_file() (removal path C) on a watched
+ * file serializes with us via ep->mtx; ep_remove() transparently
+ * hands off any epi whose file is in __fput() by bailing when
+ * epi_fget() returns NULL, and path C will clean that epi up.
+ */
+static void ep_drain_tree(struct eventpoll *ep)
+{
+	struct rb_node *rbp, *next;
+	struct epitem *epi;
+
+	lockdep_assert_held(&ep->mtx);

-	/* Pass 2: remove each epi. rb_next() is captured before erase. */
 	for (rbp = rb_first_cached(&ep->rbr); rbp; rbp = next) {
 		next = rb_next(rbp);
 		epi = rb_entry(rbp, struct epitem, rbn);
 		ep_remove(ep, epi);
 		cond_resched();
 	}
+}
+
+/*
+ * Removal path B (see "Removal paths" in the top-of-file banner):
+ * close of the epoll fd itself, reached via ep_eventpoll_release().
+ *
+ * Two passes under ep->mtx: first ep_drain_pollwaits() quiesces
+ * in-flight callbacks, then ep_drain_tree() frees the epis. The
+ * ep->refcount is kept > 0 across the walk by the ep file's own
+ * share, which we drop below; ep_free() runs iff we were the last
+ * holder after the tree drained.
+ */
+static void ep_clear_and_put(struct eventpoll *ep)
+{
+	/* Release any threads blocked in poll-on-ep. */
+	if (waitqueue_active(&ep->poll_wait))
+		ep_poll_safewake(ep, NULL, 0);

+	mutex_lock(&ep->mtx);
+	ep_drain_pollwaits(ep);
+	ep_drain_tree(ep);
 	mutex_unlock(&ep->mtx);
+
 	if (ep_put(ep))
 		ep_free(ep);
 }
--
2.47.3
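
The "ep_put() + conditional ep_free()" tail that ep_clear_and_put()
keeps follows the usual drop-a-share, free-if-last idiom. A minimal
userspace sketch of that idiom with C11 atomics, assuming hypothetical
names (struct obj, obj_new(), obj_get(), obj_put(), obj_release());
the kernel's own refcount helpers are not shown and may differ in
detail:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical refcounted object standing in for struct eventpoll. */
struct obj {
	atomic_int refcount;
};

/* Allocate an object holding one initial share (the creator's). */
static struct obj *obj_new(void)
{
	struct obj *o = malloc(sizeof(*o));

	if (o)
		atomic_init(&o->refcount, 1);
	return o;
}

/* Take an additional share. */
static void obj_get(struct obj *o)
{
	atomic_fetch_add(&o->refcount, 1);
}

/* Drop one share; returns true iff this call dropped the last one. */
static bool obj_put(struct obj *o)
{
	/* atomic_fetch_sub() returns the value held before the subtraction. */
	return atomic_fetch_sub(&o->refcount, 1) == 1;
}

/* Drop our share and free the object if we were the last holder. */
static bool obj_release(struct obj *o)
{
	if (obj_put(o)) {
		free(o);
		return true;
	}
	return false;
}
```

With two shares outstanding, the first obj_release() only decrements
and the second one frees, mirroring "ep_free() runs iff we were the
last holder".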