From: Christian Brauner <brauner@kernel.org>
To: linux-fsdevel@vger.kernel.org
Cc: Alexander Viro <viro@zeniv.linux.org.uk>, Jan Kara <jack@suse.cz>,
Linus Torvalds <torvalds@linux-foundation.org>,
Jens Axboe <axboe@kernel.dk>,
"Christian Brauner (Amutable)" <brauner@kernel.org>
Subject: [PATCH 10/17] eventpoll: split ep_insert() into alloc + register stages
Date: Fri, 24 Apr 2026 15:46:41 +0200 [thread overview]
Message-ID: <20260424-work-epoll-rework-v1-10-249ed00a20f3@kernel.org> (raw)
In-Reply-To: <20260424-work-epoll-rework-v1-0-249ed00a20f3@kernel.org>
ep_insert() was 130 lines and mixed four concerns in one body: user
quota charge and epitem allocation, attach-into-file-hlist plus
rbtree insert plus target-ep locking, reverse-path + EPOLLWAKEUP +
poll-queue install with rollback, and ready-list publication.
Factor the first two concerns into named helpers so the body reduces
to orchestration.
ep_alloc_epitem() charges the user's epoll_watches quota, allocates
a fresh epitem, and initializes its fields. On failure it returns
ERR_PTR(-ENOSPC) or ERR_PTR(-ENOMEM); on success the epi is not yet
linked into anything.
ep_register_epitem() installs @epi into @tfile's f_ep hlist and
@ep's rbtree, optionally chains @tfile onto tfile_check_list for the
path check, takes the tep->mtx nested lock for the epoll-watches-
epoll case, and finally takes the ep_get() reference that pairs
with ep_remove()'s ep_put() in ep_insert()'s error paths. On failure
it frees the epi and decrements epoll_watches to match
ep_alloc_epitem().
ep_insert()'s remaining body is the rollback-via-ep_remove() chain
(reverse_path_check, EPOLLWAKEUP source creation, ep_ptable_queue_proc
allocation) and the ready-list / wake publication. Remove a few
stale comments that duplicated function-level documentation or
described obvious code.
No functional change; rollback boundaries unchanged -- every error
path after ep_register_epitem() still calls ep_remove(), preserving
the ep->refcount invariant that keeps ep_remove()'s WARN_ON_ONCE safe.
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
---
fs/eventpoll.c | 111 ++++++++++++++++++++++++++++++++++++++-------------------
1 file changed, 74 insertions(+), 37 deletions(-)
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index fde2396342b6..e4a4e92d329f 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1726,68 +1726,112 @@ static int ep_attach_file(struct file *file, struct epitem *epi)
}
/*
- * Must be called with "mtx" held.
+ * Charge the user's epoll_watches quota, allocate a fresh epitem for
+ * @tfile/@fd, and initialize its fields. The returned item is not yet
+ * linked into any data structure; the caller must install it via
+ * ep_register_epitem() (which takes over on success) or kmem_cache_free()
+ * it and decrement epoll_watches on its own.
+ *
+ * Returns ERR_PTR(-ENOSPC) if the quota is exceeded, ERR_PTR(-ENOMEM)
+ * if the slab allocation fails.
*/
-static int ep_insert(struct eventpoll *ep, const struct epoll_event *event,
- struct file *tfile, int fd, int full_check)
+static struct epitem *ep_alloc_epitem(struct eventpoll *ep,
+ const struct epoll_event *event,
+ struct file *tfile, int fd)
{
- int error, pwake = 0;
- __poll_t revents;
struct epitem *epi;
- struct ep_pqueue epq;
- struct eventpoll *tep = NULL;
-
- if (is_file_epoll(tfile))
- tep = tfile->private_data;
-
- lockdep_assert_irqs_enabled();
if (unlikely(percpu_counter_compare(&ep->user->epoll_watches,
max_user_watches) >= 0))
- return -ENOSPC;
+ return ERR_PTR(-ENOSPC);
percpu_counter_inc(&ep->user->epoll_watches);
- if (!(epi = kmem_cache_zalloc(epi_cache, GFP_KERNEL))) {
+ epi = kmem_cache_zalloc(epi_cache, GFP_KERNEL);
+ if (unlikely(!epi)) {
percpu_counter_dec(&ep->user->epoll_watches);
- return -ENOMEM;
+ return ERR_PTR(-ENOMEM);
}
- /* Item initialization follow here ... */
INIT_LIST_HEAD(&epi->rdllink);
epi->ep = ep;
ep_set_ffd(&epi->ffd, tfile, fd);
epi->event = *event;
epi->next = EP_UNACTIVE_PTR;
+ return epi;
+}
+
+/*
+ * Install @epi into its target file's f_ep hlist and into @ep's rbtree,
+ * taking one additional reference on @ep for the lifetime of the item.
+ *
+ * If @tep is non-NULL, the target file is itself an eventpoll; we hold
+ * tep->mtx at subclass 1 across the attach + rbtree insert to serialize
+ * with the target side. RB tree ops are protected by @ep->mtx, which
+ * the caller already holds.
+ *
+ * On failure the epi is freed and the epoll_watches counter decremented,
+ * matching ep_alloc_epitem()'s allocation. After this returns
+ * successfully, ep_insert()'s later error paths use ep_remove() for
+ * unwind; that cannot drop @ep's refcount to zero because the ep file
+ * itself still holds the original reference.
+ */
+static int ep_register_epitem(struct eventpoll *ep, struct epitem *epi,
+ struct eventpoll *tep, int full_check)
+{
+ struct file *tfile = epi->ffd.file;
+ int error;
+
if (tep)
mutex_lock_nested(&tep->mtx, 1);
- /* Add the current item to the list of active epoll hook for this file */
- if (unlikely(ep_attach_file(tfile, epi) < 0)) {
+
+ error = ep_attach_file(tfile, epi);
+ if (unlikely(error)) {
if (tep)
mutex_unlock(&tep->mtx);
kmem_cache_free(epi_cache, epi);
percpu_counter_dec(&ep->user->epoll_watches);
- return -ENOMEM;
+ return error;
}
if (full_check && !tep)
list_file(tfile);
- /*
- * Add the current item to the RB tree. All RB tree operations are
- * protected by "mtx", and ep_insert() is called with "mtx" held.
- */
ep_rbtree_insert(ep, epi);
+
if (tep)
mutex_unlock(&tep->mtx);
- /*
- * ep_remove() calls in the later error paths can't lead to
- * ep_free() as the ep file itself still holds an ep reference.
- */
ep_get(ep);
+ return 0;
+}
+
+/*
+ * Must be called with "mtx" held.
+ */
+static int ep_insert(struct eventpoll *ep, const struct epoll_event *event,
+ struct file *tfile, int fd, int full_check)
+{
+ int error, pwake = 0;
+ __poll_t revents;
+ struct epitem *epi;
+ struct ep_pqueue epq;
+ struct eventpoll *tep = NULL;
+
+ if (is_file_epoll(tfile))
+ tep = tfile->private_data;
+
+ lockdep_assert_irqs_enabled();
- /* now check if we've created too many backpaths */
+ epi = ep_alloc_epitem(ep, event, tfile, fd);
+ if (IS_ERR(epi))
+ return PTR_ERR(epi);
+
+ error = ep_register_epitem(ep, epi, tep, full_check);
+ if (error)
+ return error;
+
+ /* Reject the insert if the new link would create too many back-paths. */
if (unlikely(full_check && reverse_path_check())) {
ep_remove(ep, epi);
return -EINVAL;
@@ -1814,28 +1858,21 @@ static int ep_insert(struct eventpoll *ep, const struct epoll_event *event,
*/
revents = ep_item_poll(epi, &epq.pt, 1);
- /*
- * We have to check if something went wrong during the poll wait queue
- * install process. Namely an allocation for a wait queue failed due
- * high memory pressure.
- */
+ /* ep_ptable_queue_proc() signals allocation failure by clearing epq.epi. */
if (unlikely(!epq.epi)) {
ep_remove(ep, epi);
return -ENOMEM;
}
- /* We have to drop the new item inside our item list to keep track of it */
+ /* Drop the new item onto the ready list if it is already ready. */
spin_lock_irq(&ep->lock);
- /* record NAPI ID of new item if present */
ep_set_busy_poll_napi_id(epi);
- /* If the file is already "ready" we drop it inside the ready list */
if (revents && !ep_is_linked(epi)) {
list_add_tail(&epi->rdllink, &ep->rdllist);
ep_pm_stay_awake(epi);
- /* Notify waiting tasks that events are available */
if (waitqueue_active(&ep->wq))
wake_up(&ep->wq);
if (waitqueue_active(&ep->poll_wait))
--
2.47.3
next prev parent reply other threads:[~2026-04-24 13:47 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-24 13:46 [PATCH 00/17] eventpoll: clarity refactor Christian Brauner
2026-04-24 13:46 ` [PATCH 01/17] eventpoll: expand top-of-file overview / locking doc Christian Brauner
2026-04-24 13:46 ` [PATCH 02/17] eventpoll: document loop-check / path-check globals Christian Brauner
2026-04-24 13:46 ` [PATCH 03/17] eventpoll: clarify POLLFREE handshake comments Christian Brauner
2026-04-24 13:46 ` [PATCH 04/17] eventpoll: refresh epi_fget() / ep_remove_file() comments Christian Brauner
2026-04-24 13:46 ` [PATCH 05/17] eventpoll: document ep_clear_and_put() two-pass pattern Christian Brauner
2026-04-24 13:46 ` [PATCH 06/17] eventpoll: rename ep_refcount_dec_and_test() to ep_put() Christian Brauner
2026-04-24 13:46 ` [PATCH 07/17] eventpoll: drop unused depth argument from epoll_mutex_lock() Christian Brauner
2026-04-24 13:46 ` [PATCH 08/17] eventpoll: rename attach_epitem() to ep_attach_file() Christian Brauner
2026-04-24 13:46 ` [PATCH 09/17] eventpoll: relocate KCMP helpers near compat syscalls Christian Brauner
2026-04-24 13:46 ` Christian Brauner [this message]
2026-04-24 13:46 ` [PATCH 11/17] eventpoll: split ep_clear_and_put() into drain helpers Christian Brauner
2026-04-24 13:46 ` [PATCH 12/17] eventpoll: extract ep_deliver_event() from ep_send_events() Christian Brauner
2026-04-24 13:46 ` [PATCH 13/17] eventpoll: extract lock dance from do_epoll_ctl() into ep_ctl_lock() Christian Brauner
2026-04-24 13:46 ` [PATCH 14/17] eventpoll: wrap EP_UNACTIVE_PTR in typed sentinel helpers Christian Brauner
2026-04-24 13:46 ` [PATCH 15/17] eventpoll: rename epi->next and txlist for clarity Christian Brauner
2026-04-24 16:06 ` Linus Torvalds
2026-04-24 13:46 ` [PATCH 16/17] eventpoll: use bool for predicate helpers Christian Brauner
2026-04-24 13:46 ` [PATCH 17/17] eventpoll: hoist CTL_ADD scratch state into struct ep_ctl_ctx Christian Brauner
2026-04-24 15:33 ` [PATCH 00/17] eventpoll: clarity refactor Linus Torvalds
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260424-work-epoll-rework-v1-10-249ed00a20f3@kernel.org \
--to=brauner@kernel.org \
--cc=axboe@kernel.dk \
--cc=jack@suse.cz \
--cc=linux-fsdevel@vger.kernel.org \
--cc=torvalds@linux-foundation.org \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox