From: Frederic Weisbecker <frederic@kernel.org>
To: linux-rt-users@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, Ingo Molnar <mingo@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Steven Rostedt <rostedt@goodmis.org>,
	Mike Galbraith <efault@gmx.de>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Subject: [RFC PATCH -RT] epoll: Fix eventpoll read-lock not writer-fair in PREEMPT_RT
Date: Wed, 25 Aug 2021 15:27:54 +0200
Message-ID: <20210825132754.GA895675@lothringen>
Hi,
Ok the patch is gross but at least this lets me start a discussion
about the issue.
---
From d9d66d650b3dac8947a34464dd2e0b546a8c6b63 Mon Sep 17 00:00:00 2001
From: Frederic Weisbecker <frederic@kernel.org>
Date: Wed, 25 Aug 2021 14:24:54 +0200
Subject: [RFC PATCH -RT] epoll: Fix eventpoll read-lock not writer-fair in PREEMPT_RT
The eventpoll lock was converted to an rwlock some time ago by:
a218cc491420 ("epoll: use rwlock in order to reduce
ep_poll_callback() contention")
Unfortunately this can result in scenarios where a high priority caller
of epoll_wait() needs to wait for the completion of lower priority wakers.
The typical scenario is:
1) epoll_wait() waits and sleeps for new events in the ep_poll() loop.
2) New events arrive in ep_poll_callback(), and the waiter is woken up
while ep->lock is read-acquired.
3) The high priority waiter preempts the waker but it can't acquire the
write lock in epoll_wait() so it blocks waiting for the low prio waker
without priority inheritance.
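For illustration only, here is a minimal userspace sketch of that sequence.
It is not part of the patch and not a verified reproducer: the names
(struct demo, waiter_fn(), run_epoll_wakeup_demo()), the eventfd used as
event source, the SCHED_FIFO priority 50 and the timeouts are all
assumptions made for this example. It only sets up the waiter/waker pair;
the contention on ep->lock in step 3 happens inside the kernel and is not
observable from this code.

```c
#include <poll.h>
#include <pthread.h>
#include <sched.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

struct demo {
	int epfd;	/* epoll instance the waiter sleeps on */
	int nready;	/* epoll_wait() return value seen by the waiter */
};

/* Step 1: the (ideally high priority) waiter sleeps in epoll_wait(). */
static void *waiter_fn(void *arg)
{
	struct demo *d = arg;
	struct sched_param sp = { .sched_priority = 50 };	/* assumed value */
	struct epoll_event ev;

	/* Best effort: needs CAP_SYS_NICE, failure is ignored on purpose. */
	(void)pthread_setschedparam(pthread_self(), SCHED_FIFO, &sp);

	d->nready = epoll_wait(d->epfd, &ev, 1, 5000);
	return NULL;
}

/*
 * Returns what epoll_wait() returned in the waiter (1 when the wakeup
 * was delivered), or -1 if the setup or the write failed.
 */
static int run_epoll_wakeup_demo(void)
{
	struct demo d = { .epfd = -1, .nready = -1 };
	struct epoll_event ev = { .events = EPOLLIN };
	uint64_t one = 1;
	pthread_t waiter;
	ssize_t n;
	int efd, ret = -1;

	d.epfd = epoll_create1(0);
	efd = eventfd(0, EFD_NONBLOCK);
	if (d.epfd < 0 || efd < 0)
		goto out;

	ev.data.fd = efd;
	if (epoll_ctl(d.epfd, EPOLL_CTL_ADD, efd, &ev) < 0)
		goto out;
	if (pthread_create(&waiter, NULL, waiter_fn, &d) != 0)
		goto out;

	/* Sleep ~10 ms so the waiter has a chance to block first. */
	poll(NULL, 0, 10);

	/*
	 * Step 2: the low priority waker. The write wakes the eventfd
	 * waitqueue, which runs ep_poll_callback() in this task's context.
	 * Step 3 (the woken waiter contending on ep->lock) is kernel
	 * internal and not visible here.
	 */
	n = write(efd, &one, sizeof(one));

	pthread_join(waiter, NULL);
	ret = (n == (ssize_t)sizeof(one)) ? d.nready : -1;
out:
	if (efd >= 0)
		close(efd);
	if (d.epfd >= 0)
		close(d.epfd);
	return ret;
}
```

Built with -pthread and called from main(), run_epoll_wakeup_demo() is
expected to return 1 once the wakeup is delivered. Observing the inversion
itself would presumably also require a PREEMPT_RT kernel, both threads on
the same CPU, and latency tracing.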
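I guess making the read lock writer-fair is still not the plan, so all I
can propose is to make the lock mode build-conditional: ep_poll_callback()
keeps taking ep->lock for reading on !PREEMPT_RT and takes it for writing
on PREEMPT_RT.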
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
fs/eventpoll.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 1e596e1d0bba..c1fb4b01ea4f 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1133,7 +1133,10 @@ static int ep_poll_callback(wait_queue_entry_t *wait, unsigned mode, int sync, v
 	unsigned long flags;
 	int ewake = 0;

-	read_lock_irqsave(&ep->lock, flags);
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+		read_lock_irqsave(&ep->lock, flags);
+	else
+		write_lock_irqsave(&ep->lock, flags);

 	ep_set_busy_poll_napi_id(epi);

@@ -1197,7 +1200,10 @@ static int ep_poll_callback(wait_queue_entry_t *wait, unsigned mode, int sync, v
 		pwake++;

 out_unlock:
-	read_unlock_irqrestore(&ep->lock, flags);
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+		read_unlock_irqrestore(&ep->lock, flags);
+	else
+		write_unlock_irqrestore(&ep->lock, flags);

 	/* We have to call this outside the lock */
 	if (pwake)
--
2.25.1