From: Frederic Weisbecker <frederic@kernel.org>
To: linux-rt-users@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, Ingo Molnar <mingo@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Thomas Gleixner <tglx@linutronix.de>,
Steven Rostedt <rostedt@goodmis.org>,
Mike Galbraith <efault@gmx.de>,
Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Subject: [RFC PATCH -RT] epoll: Fix eventpoll read-lock not writer-fair in PREEMPT_RT
Date: Wed, 25 Aug 2021 15:27:54 +0200
Message-ID: <20210825132754.GA895675@lothringen>

Hi,

OK, the patch is gross, but at least it lets me start a discussion
about the issue.

---
From d9d66d650b3dac8947a34464dd2e0b546a8c6b63 Mon Sep 17 00:00:00 2001
From: Frederic Weisbecker <frederic@kernel.org>
Date: Wed, 25 Aug 2021 14:24:54 +0200
Subject: [RFC PATCH -RT] epoll: Fix eventpoll read-lock not writer-fair in PREEMPT_RT

The eventpoll lock was converted to an rwlock some time ago with:

	a218cc491420 ("epoll: use rwlock in order to reduce
	              ep_poll_callback() contention")

Unfortunately, this can result in scenarios where a high-priority caller
of epoll_wait() needs to wait for the completion of lower-priority wakers.
The typical scenario is:

1) epoll_wait() waits and sleeps for new events in the ep_poll() loop.

2) New events arrive in ep_poll_callback(); the waiter is woken up while
   ep->lock is read-acquired.

3) The high-priority waiter preempts the waker, but it can't acquire the
   write lock in epoll_wait(), so it blocks waiting for the low-priority
   waker without priority inheritance.

I guess making the read lock writer-fair is still not the plan, so all I
can propose is to make that rwlock build-conditional: ep_poll_callback()
takes it for writing when CONFIG_PREEMPT_RT is enabled and keeps taking
it for reading otherwise.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
fs/eventpoll.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 1e596e1d0bba..c1fb4b01ea4f 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1133,7 +1133,10 @@ static int ep_poll_callback(wait_queue_entry_t *wait, unsigned mode, int sync, v
 	unsigned long flags;
 	int ewake = 0;
 
-	read_lock_irqsave(&ep->lock, flags);
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+		read_lock_irqsave(&ep->lock, flags);
+	else
+		write_lock_irqsave(&ep->lock, flags);
 
 	ep_set_busy_poll_napi_id(epi);
 
@@ -1197,7 +1200,10 @@ static int ep_poll_callback(wait_queue_entry_t *wait, unsigned mode, int sync, v
 		pwake++;
 
 out_unlock:
-	read_unlock_irqrestore(&ep->lock, flags);
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+		read_unlock_irqrestore(&ep->lock, flags);
+	else
+		write_unlock_irqrestore(&ep->lock, flags);
 
 	/* We have to call this outside the lock */
 	if (pwake)
--
2.25.1