From: Christian Brauner <brauner@kernel.org>
To: linux-fsdevel@vger.kernel.org
Cc: Alexander Viro <viro@zeniv.linux.org.uk>, Jan Kara <jack@suse.cz>,
Linus Torvalds <torvalds@linux-foundation.org>,
Jens Axboe <axboe@kernel.dk>,
"Christian Brauner (Amutable)" <brauner@kernel.org>
Subject: [PATCH 12/17] eventpoll: extract ep_deliver_event() from ep_send_events()
Date: Fri, 24 Apr 2026 15:46:43 +0200 [thread overview]
Message-ID: <20260424-work-epoll-rework-v1-12-249ed00a20f3@kernel.org> (raw)
In-Reply-To: <20260424-work-epoll-rework-v1-0-249ed00a20f3@kernel.org>
ep_send_events()'s body mixed two concerns: per-item work (PM
wakeup-source bookkeeping, re-poll, copy_to_user, level-triggered
re-queue, EPOLLONESHOT mask clear) and the scan-level accumulator
(maxevents cap, EFAULT preservation, txlist/rdllist splice).
Extract the per-item work as ep_deliver_event(), which returns a
tri-state int:
1 one event was delivered; caller advances the counter,
0 re-poll produced no caller-requested events (item drops
out of the ready list; a future callback will re-queue),
-EFAULT copy_to_user() faulted; item is already re-inserted at
the head of the txlist so ep_done_scan() splices it back
to rdllist.
The per-item comments (PM ordering, the "sole writer to rdllist"
invariant for the LT re-queue, the EFAULT semantics) move into
ep_deliver_event(). ep_send_events() reduces to the fatal-signal
short-circuit, scan bracket, and a short txlist walk that accumulates
the deliveries and preserves the "first error wins" EFAULT contract
(res is set to -EFAULT only when no event has been delivered yet;
otherwise the success count is returned and the fault surfaces on
the next call).
No functional change.
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
---
fs/eventpoll.c | 138 +++++++++++++++++++++++++++++++++++----------------------
1 file changed, 84 insertions(+), 54 deletions(-)
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index eeddd05ba529..6d4167a347ab 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1979,6 +1979,82 @@ static int ep_modify(struct eventpoll *ep, struct epitem *epi,
return 0;
}
+/*
+ * Attempt to deliver one event for @epi into @*uevents.
+ *
+ * Returns 1 if an event was delivered (with *uevents advanced to the
+ * next slot), 0 if the re-poll reported no caller-requested events
+ * (@epi drops out of the ready list; a future callback will re-add
+ * it), or -EFAULT if copy_to_user() faulted (in which case @epi is
+ * re-inserted at the head of @txlist so ep_done_scan() merges it
+ * back to rdllist for the next attempt).
+ *
+ * PM bookkeeping and level-triggered re-queue are handled here.
+ * Caller holds ep->mtx and the scan is active.
+ */
+static int ep_deliver_event(struct eventpoll *ep, struct epitem *epi,
+ poll_table *pt,
+ struct epoll_event __user **uevents,
+ struct list_head *txlist)
+{
+ struct epoll_event __user *next;
+ struct wakeup_source *ws;
+ __poll_t revents;
+
+ /*
+ * Activate ep->ws before deactivating epi->ws to prevent
+ * triggering auto-suspend here (in case we reactivate epi->ws
+ * below). Rearranging to delay the deactivation would let
+ * epi->ws drift out of sync with ep_is_linked().
+ */
+ ws = ep_wakeup_source(epi);
+ if (ws) {
+ if (ws->active)
+ __pm_stay_awake(ep->ws);
+ __pm_relax(ws);
+ }
+
+ list_del_init(&epi->rdllink);
+
+ /*
+ * Re-poll under ep->mtx so userspace cannot change the item
+ * out from under us. If no caller-requested events remain,
+ * @epi stays off the ready list; the poll callback will
+ * re-queue it when events next appear.
+ */
+ revents = ep_item_poll(epi, pt, 1);
+ if (!revents)
+ return 0;
+
+ next = epoll_put_uevent(revents, epi->event.data, *uevents);
+ if (!next) {
+ /*
+ * copy_to_user() faulted: put the item back so
+ * ep_done_scan() splices it onto rdllist for the next
+ * attempt.
+ */
+ list_add(&epi->rdllink, txlist);
+ ep_pm_stay_awake(epi);
+ return -EFAULT;
+ }
+ *uevents = next;
+
+ if (epi->event.events & EPOLLONESHOT) {
+ epi->event.events &= EP_PRIVATE_BITS;
+ } else if (!(epi->event.events & EPOLLET)) {
+ /*
+ * Level-triggered: re-queue so the next epoll_wait()
+ * rechecks availability. We are the sole writer to
+ * rdllist here -- epoll_ctl() callers are locked out
+ * by ep->mtx, and the poll callback queues to ovflist
+ * during scans.
+ */
+ list_add_tail(&epi->rdllink, &ep->rdllist);
+ ep_pm_stay_awake(epi);
+ }
+ return 1;
+}
+
static int ep_send_events(struct eventpoll *ep,
struct epoll_event __user *events, int maxevents)
{
@@ -2001,70 +2077,24 @@ static int ep_send_events(struct eventpoll *ep,
ep_start_scan(ep, &txlist);
/*
- * We can loop without lock because we are passed a task private list.
- * Items cannot vanish during the loop we are holding ep->mtx.
+ * We can loop without lock because we are passed a task-private
+ * txlist; items cannot vanish while we hold ep->mtx.
*/
list_for_each_entry_safe(epi, tmp, &txlist, rdllink) {
- struct wakeup_source *ws;
- __poll_t revents;
+ int delivered;
if (res >= maxevents)
break;
- /*
- * Activate ep->ws before deactivating epi->ws to prevent
- * triggering auto-suspend here (in case we reactive epi->ws
- * below).
- *
- * This could be rearranged to delay the deactivation of epi->ws
- * instead, but then epi->ws would temporarily be out of sync
- * with ep_is_linked().
- */
- ws = ep_wakeup_source(epi);
- if (ws) {
- if (ws->active)
- __pm_stay_awake(ep->ws);
- __pm_relax(ws);
- }
-
- list_del_init(&epi->rdllink);
-
- /*
- * If the event mask intersect the caller-requested one,
- * deliver the event to userspace. Again, we are holding ep->mtx,
- * so no operations coming from userspace can change the item.
- */
- revents = ep_item_poll(epi, &pt, 1);
- if (!revents)
- continue;
-
- events = epoll_put_uevent(revents, epi->event.data, events);
- if (!events) {
- list_add(&epi->rdllink, &txlist);
- ep_pm_stay_awake(epi);
+ delivered = ep_deliver_event(ep, epi, &pt, &events, &txlist);
+ if (delivered < 0) {
if (!res)
- res = -EFAULT;
+ res = delivered;
break;
}
- res++;
- if (epi->event.events & EPOLLONESHOT)
- epi->event.events &= EP_PRIVATE_BITS;
- else if (!(epi->event.events & EPOLLET)) {
- /*
- * If this file has been added with Level
- * Trigger mode, we need to insert back inside
- * the ready list, so that the next call to
- * epoll_wait() will check again the events
- * availability. At this point, no one can insert
- * into ep->rdllist besides us. The epoll_ctl()
- * callers are locked out by
- * ep_send_events() holding "mtx" and the
- * poll callback will queue them in ep->ovflist.
- */
- list_add_tail(&epi->rdllink, &ep->rdllist);
- ep_pm_stay_awake(epi);
- }
+ res += delivered;
}
+
ep_done_scan(ep, &txlist);
mutex_unlock(&ep->mtx);
--
2.47.3
Thread overview: 20+ messages
2026-04-24 13:46 [PATCH 00/17] eventpoll: clarity refactor Christian Brauner
2026-04-24 13:46 ` [PATCH 01/17] eventpoll: expand top-of-file overview / locking doc Christian Brauner
2026-04-24 13:46 ` [PATCH 02/17] eventpoll: document loop-check / path-check globals Christian Brauner
2026-04-24 13:46 ` [PATCH 03/17] eventpoll: clarify POLLFREE handshake comments Christian Brauner
2026-04-24 13:46 ` [PATCH 04/17] eventpoll: refresh epi_fget() / ep_remove_file() comments Christian Brauner
2026-04-24 13:46 ` [PATCH 05/17] eventpoll: document ep_clear_and_put() two-pass pattern Christian Brauner
2026-04-24 13:46 ` [PATCH 06/17] eventpoll: rename ep_refcount_dec_and_test() to ep_put() Christian Brauner
2026-04-24 13:46 ` [PATCH 07/17] eventpoll: drop unused depth argument from epoll_mutex_lock() Christian Brauner
2026-04-24 13:46 ` [PATCH 08/17] eventpoll: rename attach_epitem() to ep_attach_file() Christian Brauner
2026-04-24 13:46 ` [PATCH 09/17] eventpoll: relocate KCMP helpers near compat syscalls Christian Brauner
2026-04-24 13:46 ` [PATCH 10/17] eventpoll: split ep_insert() into alloc + register stages Christian Brauner
2026-04-24 13:46 ` [PATCH 11/17] eventpoll: split ep_clear_and_put() into drain helpers Christian Brauner
2026-04-24 13:46 ` Christian Brauner [this message]
2026-04-24 13:46 ` [PATCH 13/17] eventpoll: extract lock dance from do_epoll_ctl() into ep_ctl_lock() Christian Brauner
2026-04-24 13:46 ` [PATCH 14/17] eventpoll: wrap EP_UNACTIVE_PTR in typed sentinel helpers Christian Brauner
2026-04-24 13:46 ` [PATCH 15/17] eventpoll: rename epi->next and txlist for clarity Christian Brauner
2026-04-24 16:06 ` Linus Torvalds
2026-04-24 13:46 ` [PATCH 16/17] eventpoll: use bool for predicate helpers Christian Brauner
2026-04-24 13:46 ` [PATCH 17/17] eventpoll: hoist CTL_ADD scratch state into struct ep_ctl_ctx Christian Brauner
2026-04-24 15:33 ` [PATCH 00/17] eventpoll: clarity refactor Linus Torvalds