From: Christian Brauner <brauner@kernel.org>
To: linux-fsdevel@vger.kernel.org
Cc: Alexander Viro <viro@zeniv.linux.org.uk>, Jan Kara <jack@suse.cz>,
	 Linus Torvalds <torvalds@linux-foundation.org>,
	 Jens Axboe <axboe@kernel.dk>,
	 "Christian Brauner (Amutable)" <brauner@kernel.org>
Subject: [PATCH 05/17] eventpoll: document ep_clear_and_put() two-pass pattern
Date: Fri, 24 Apr 2026 15:46:36 +0200
Message-ID: <20260424-work-epoll-rework-v1-5-249ed00a20f3@kernel.org>
In-Reply-To: <20260424-work-epoll-rework-v1-0-249ed00a20f3@kernel.org>

ep_clear_and_put() walks the rbtree twice: once to drain each epi's
pwqlist, then again to ep_remove() each entry. The split is
load-bearing: fusing the passes into one loop would let a poll
callback still queued on some epi_i fire after epi_{i+k} has already
been freed. The previous comments, however, described each pass in
isolation and explained neither the ordering invariant nor the
cooperation with removal path C (eventpoll_release_file).
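
The ordering argument above can be illustrated with a small user-space
toy model. This is only a sketch under stated assumptions: struct item,
struct item_set, fire_callback() and both teardown functions are
hypothetical names, not kernel code, and the toy callback is assumed to
be able to look at every item in the set purely so that the hazard
becomes countable. The real mechanism by which a callback reaches
another epi is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NITEMS 4

/* Hypothetical stand-in for an epitem; not the kernel structure. */
struct item {
	bool armed;	/* a callback is still registered on this item */
	bool freed;	/* the item has been torn down */
};

struct item_set {
	struct item items[NITEMS];
	int stale_hits;	/* times a callback observed a freed item */
};

static void set_init(struct item_set *s)
{
	for (size_t i = 0; i < NITEMS; i++) {
		s->items[i].armed = true;
		s->items[i].freed = false;
	}
	s->stale_hits = 0;
}

/*
 * Toy callback for item i. ASSUMPTION: it may look at every item in
 * the set; this exists only to make the hazard countable.
 */
static void fire_callback(struct item_set *s, size_t i)
{
	assert(i < NITEMS);
	if (!s->items[i].armed)
		return;	/* drained: this callback can no longer run */
	for (size_t j = 0; j < NITEMS; j++)
		if (s->items[j].freed)
			s->stale_hits++;
}

/* Simulate "any still-armed callback may fire right now". */
static void fire_all(struct item_set *s)
{
	for (size_t i = 0; i < NITEMS; i++)
		fire_callback(s, i);
}

/* Two passes: disarm everything first, then free everything. */
static int teardown_two_pass(void)
{
	struct item_set s;

	set_init(&s);
	for (size_t i = 0; i < NITEMS; i++) {
		s.items[i].armed = false;
		fire_all(&s);
	}
	for (size_t i = 0; i < NITEMS; i++) {
		s.items[i].freed = true;
		fire_all(&s);
	}
	return s.stale_hits;
}

/* Fused loop: disarm and free one item at a time. */
static int teardown_fused(void)
{
	struct item_set s;

	set_init(&s);
	for (size_t i = 0; i < NITEMS; i++) {
		s.items[i].armed = false;
		s.items[i].freed = true;
		fire_all(&s);
	}
	return s.stale_hits;
}
```

In this model teardown_two_pass() never lets a callback see a freed
item, while teardown_fused() does, because later items are still armed
when earlier ones are freed.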

Add a function-level docblock that labels this as path B from the
top-of-file "Removal paths" section, names the two passes and the
ordering invariant, explains the pwqlist drain as synchronization
with in-flight ep_poll_callback() via whead->lock, describes the
C-path hand-off when epi_fget() returns NULL, and states the
ep->refcount invariant that keeps ep_remove()'s WARN_ON_ONCE safe
across the loop.
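
The hand-off described above (bail out when the pin attempt returns
NULL and let the other path clean up) can be sketched in user space
with an increment-if-not-zero reference count. This is a hedged
illustration only: struct obj, obj_try_get(), obj_put(), struct watch
and watch_remove() are hypothetical names, and this patch does not
state that epi_fget() is implemented this way.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical refcounted object standing in for a watched file. */
struct obj {
	atomic_long refs;	/* 0 means final teardown has begun */
};

/* Take a reference only if the count is still non-zero. */
static bool obj_try_get(struct obj *o)
{
	long old = atomic_load(&o->refs);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&o->refs, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference previously taken with obj_try_get(). */
static void obj_put(struct obj *o)
{
	long prev = atomic_fetch_sub(&o->refs, 1);

	assert(prev > 0);
	(void)prev;
}

/* Hypothetical stand-in for an epitem watching one object. */
struct watch {
	struct obj *target;
	bool removed;
};

/* Returns true if we removed the watch, false if we handed it off. */
static bool watch_remove(struct watch *w)
{
	if (!obj_try_get(w->target))
		return false;	/* target is dying: its own path cleans up */
	w->removed = true;
	obj_put(w->target);
	return true;
}
```

A caller looping over many watches would simply skip those for which
watch_remove() returns false, which mirrors the "C will clean that epi
up on its side" behavior in spirit.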

Also tighten the per-pass comments to one line each and reword the
ungrammatical poll_wait release comment ("waiting for these file" ->
"blocked in poll-on-ep").

No functional change.

Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 fs/eventpoll.c | 44 ++++++++++++++++++++++++++++++++------------
 1 file changed, 32 insertions(+), 12 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 1039d9737ce9..b6a14c69c482 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1103,20 +1103,47 @@ static void ep_remove(struct eventpoll *ep, struct epitem *epi)
 	WARN_ON_ONCE(ep_refcount_dec_and_test(ep));
 }
 
+/*
+ * Removal path B (see "Removal paths" in the top-of-file banner):
+ * close of the epoll fd itself, reached via ep_eventpoll_release().
+ *
+ * Under ep->mtx we walk the rbtree twice:
+ *
+ *   Pass 1 drains each epi's pwqlist via ep_unregister_pollwait().
+ *          This takes each watched waitqueue head's lock and thereby
+ *          synchronizes with any in-flight ep_poll_callback(): once
+ *          the pass ends, no callback can still be holding or about
+ *          to dereference any epi on this ep.
+ *
+ *   Pass 2 runs ep_remove() on each epi. The per-epi pwqlist is
+ *          already empty, but the rest of ep_remove() still runs
+ *          (epi_fget() pin, f_ep clear under f_lock, rbtree erase,
+ *          rdllist unlink, kfree_rcu).
+ *
+ * Pass 1 must strictly precede Pass 2: fusing them would let a
+ * callback queued on epi_i still fire after epi_{i+k} was freed.
+ *
+ * A concurrent eventpoll_release_file() (path C) serializes against
+ * us on ep->mtx; in Pass 2, ep_remove() transparently hands off any
+ * epi whose watched file is in __fput() by bailing when epi_fget()
+ * returns NULL, and C will clean that epi up on its side.
+ *
+ * ep->refcount is held > 0 throughout by the ep file's own share;
+ * we drop that share after the walk and free the eventpoll if we
+ * were last.
+ */
 static void ep_clear_and_put(struct eventpoll *ep)
 {
 	struct rb_node *rbp, *next;
 	struct epitem *epi;
 
-	/* We need to release all tasks waiting for these file */
+	/* Release any threads blocked in poll-on-ep. */
 	if (waitqueue_active(&ep->poll_wait))
 		ep_poll_safewake(ep, NULL, 0);
 
 	mutex_lock(&ep->mtx);
 
-	/*
-	 * Walks through the whole tree by unregistering poll callbacks.
-	 */
+	/* Pass 1: drain pwqlists; synchronizes with in-flight callbacks. */
 	for (rbp = rb_first_cached(&ep->rbr); rbp; rbp = rb_next(rbp)) {
 		epi = rb_entry(rbp, struct epitem, rbn);
 
@@ -1124,14 +1151,7 @@ static void ep_clear_and_put(struct eventpoll *ep)
 		cond_resched();
 	}
 
-	/*
-	 * Walks through the whole tree and try to free each "struct epitem".
-	 * Note that ep_remove() will not remove the epitem in case of a
-	 * racing eventpoll_release_file(); the latter will do the removal.
-	 * At this point we are sure no poll callbacks will be lingering around.
-	 * Since we still own a reference to the eventpoll struct, the loop can't
-	 * dispose it.
-	 */
+	/* Pass 2: remove each epi. rb_next() is captured before erase. */
 	for (rbp = rb_first_cached(&ep->rbr); rbp; rbp = next) {
 		next = rb_next(rbp);
 		epi = rb_entry(rbp, struct epitem, rbn);

-- 
2.47.3
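
The Pass 2 loop in the diff captures rb_next() before the current node
is erased. The same idiom can be shown on a plain singly linked list in
user space. This is a minimal sketch, not kernel code: struct node,
list_build() and list_free_all() are hypothetical names chosen for the
illustration.

```c
#include <assert.h>
#include <stdlib.h>

struct node {
	struct node *next;
	int value;
};

/* Build a list holding the values 0..n-1; returns NULL when n == 0. */
static struct node *list_build(int n)
{
	struct node *head = NULL;

	assert(n >= 0);
	for (int i = n - 1; i >= 0; i--) {
		struct node *nd = malloc(sizeof(*nd));

		if (!nd)
			abort();
		nd->value = i;
		nd->next = head;
		head = nd;
	}
	return head;
}

/* Free every node and return how many were freed. */
static int list_free_all(struct node *head)
{
	struct node *n, *next;
	int freed = 0;

	for (n = head; n; n = next) {
		next = n->next;	/* capture before n becomes invalid */
		free(n);
		freed++;
	}
	return freed;
}
```

Reading n->next in the loop's increment expression instead would
dereference memory that has already been freed.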


Thread overview: 20+ messages
2026-04-24 13:46 [PATCH 00/17] eventpoll: clarity refactor Christian Brauner
2026-04-24 13:46 ` [PATCH 01/17] eventpoll: expand top-of-file overview / locking doc Christian Brauner
2026-04-24 13:46 ` [PATCH 02/17] eventpoll: document loop-check / path-check globals Christian Brauner
2026-04-24 13:46 ` [PATCH 03/17] eventpoll: clarify POLLFREE handshake comments Christian Brauner
2026-04-24 13:46 ` [PATCH 04/17] eventpoll: refresh epi_fget() / ep_remove_file() comments Christian Brauner
2026-04-24 13:46 ` Christian Brauner [this message]
2026-04-24 13:46 ` [PATCH 06/17] eventpoll: rename ep_refcount_dec_and_test() to ep_put() Christian Brauner
2026-04-24 13:46 ` [PATCH 07/17] eventpoll: drop unused depth argument from epoll_mutex_lock() Christian Brauner
2026-04-24 13:46 ` [PATCH 08/17] eventpoll: rename attach_epitem() to ep_attach_file() Christian Brauner
2026-04-24 13:46 ` [PATCH 09/17] eventpoll: relocate KCMP helpers near compat syscalls Christian Brauner
2026-04-24 13:46 ` [PATCH 10/17] eventpoll: split ep_insert() into alloc + register stages Christian Brauner
2026-04-24 13:46 ` [PATCH 11/17] eventpoll: split ep_clear_and_put() into drain helpers Christian Brauner
2026-04-24 13:46 ` [PATCH 12/17] eventpoll: extract ep_deliver_event() from ep_send_events() Christian Brauner
2026-04-24 13:46 ` [PATCH 13/17] eventpoll: extract lock dance from do_epoll_ctl() into ep_ctl_lock() Christian Brauner
2026-04-24 13:46 ` [PATCH 14/17] eventpoll: wrap EP_UNACTIVE_PTR in typed sentinel helpers Christian Brauner
2026-04-24 13:46 ` [PATCH 15/17] eventpoll: rename epi->next and txlist for clarity Christian Brauner
2026-04-24 16:06   ` Linus Torvalds
2026-04-24 13:46 ` [PATCH 16/17] eventpoll: use bool for predicate helpers Christian Brauner
2026-04-24 13:46 ` [PATCH 17/17] eventpoll: hoist CTL_ADD scratch state into struct ep_ctl_ctx Christian Brauner
2026-04-24 15:33 ` [PATCH 00/17] eventpoll: clarity refactor Linus Torvalds
