From: Eric Wong <e@80x24.org>
To: Frederic Weisbecker <frederic@kernel.org>
Cc: linux-rt-users@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-fsdevel@vger.kernel.org, Ingo Molnar <mingo@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Thomas Gleixner <tglx@linutronix.de>,
Steven Rostedt <rostedt@goodmis.org>,
Mike Galbraith <efault@gmx.de>,
Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
John Ogness <john.ogness@linutronix.de>,
Roman Penyaev <rpenyaev@suse.de>,
Davidlohr Bueso <dbueso@suse.de>, Jason Baron <jbaron@akamai.com>,
Al Viro <viro@zeniv.linux.org.uk>,
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Subject: Re: [RFC] How to fix eventpoll rwlock based priority inversion on PREEMPT_RT?
Date: Wed, 17 Nov 2021 20:11:32 +0000
Message-ID: <20211117201132.M259904@dcvr>
In-Reply-To: <20211116140252.GA348770@lothringen>
Frederic Weisbecker <frederic@kernel.org> wrote:
> Hi,
>
> I'm iterating again on this topic, this time with the author of
> the patch Cc'ed.
>
> The following commit:
>
> a218cc491420 ("epoll: use rwlock in order to reduce
> ep_poll_callback() contention")
>
> has changed the ep->lock into an rwlock. This can cause priority inversion
> on PREEMPT_RT. Here is an example:
>
>
> 1) High priority task A waits for events on epoll_wait(), nothing shows up so
> it goes to sleep for new events in the ep_poll() loop.
>
> 2) Lower prio task B brings new events in ep_poll_callback(), waking up A
> while still holding read_lock(ep->lock)
>
> 3) Task A wakes up immediately, tries to grab write_lock(ep->lock) but it has
> to wait for task B to release read_lock(ep->lock). Unfortunately there is
> no priority inheritance when write_lock() is called on an rwlock that is
> already read_lock'ed. So back to task B that may even be preempted by
> yet another task before releasing read_lock(ep->lock).
>
>
> Now how to solve this? Several possibilities:
>
> == Delay the wake up after releasing the read_lock()? ==
>
> That solves part of the problem only. If another event comes up
> concurrently we are back to the original issue.
>
> == Make rwlock more fair ? ==
>
> Currently read_lock() only acquires the rtmutex if the lock is already
> write-held (or write_lock() is waiting to acquire). So if read_lock() happens
> after write_lock(), fairness is observed but if write_lock() happens after
> read_lock(), priority inheritance doesn't happen.
>
> I think there have been attempts to solve this in the past but some issues
> arose (I don't know the exact details; the comments in rwbase_rt.c give
> some clues).
>
> == Convert the rwlock to RCU ? ==
>
> Traditionally, we try to convert problematic rwlocks to RCU. I'm not sure
> the situation fits here because the rwlock is used the other way around:
> the epoll consumer does the write_lock() and the producers do read_lock().
> Concurrent producers then use an ad-hoc concurrent list add (see
> list_add_tail_lockless) to handle racy modifications.
>
> There are also list modifications on both sides: entries are added by the
> producers, and read and deleted (sometimes even re-added) on the consumer
> side.
>
> Perhaps RCU could be used while keeping locking on the consumer side...
+CC linux-fsdevel and Mathieu Desnoyers
I proposed using wfcqueue many years ago, but ran out of
time/hardware/funding to work on it:
https://yhbt.net/lore/lkml/20130401183118.GA9968@dcvr.yhbt.net/
wfcqueue is used internally by Userspace-RCU, but wfcqueue
itself doesn't rely on RCU. I'm not sure if wfcqueue helps
PREEMPT_RT, but Mathieu + Paul might.
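For readers unfamiliar with the idea, here is a minimal userspace
sketch of a wfcqueue-style queue using C11 atomics. It is not the
Userspace-RCU code and not the 2013 patch; the names (wfq_*) are made
up for illustration, and it assumes a single consumer (or consumers
serialized by some lock). Enqueue is one xchg plus one store with no
retry loop, so producers never wait on each other; only the consumer
may briefly spin while a producer is between those two steps.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical names; a sketch of the technique, not a real API. */
struct wfq_node {
	_Atomic(struct wfq_node *) next;
};

struct wfq {
	struct wfq_node head;			/* dummy node */
	_Atomic(struct wfq_node *) tail;	/* last node, or &head */
};

static void wfq_init(struct wfq *q)
{
	atomic_init(&q->head.next, NULL);
	atomic_init(&q->tail, &q->head);
}

/* Producer side: wait-free, one exchange and one store. */
static void wfq_enqueue(struct wfq *q, struct wfq_node *n)
{
	struct wfq_node *prev;

	atomic_store_explicit(&n->next, NULL, memory_order_relaxed);
	prev = atomic_exchange_explicit(&q->tail, n, memory_order_acq_rel);
	/* Until this store lands, the chain is momentarily broken. */
	atomic_store_explicit(&prev->next, n, memory_order_release);
}

/*
 * Consumer side: detach everything queued so far.  Returns the first
 * node (or NULL if empty) and stores the last node in *last.
 */
static struct wfq_node *wfq_splice_all(struct wfq *q, struct wfq_node **last)
{
	struct wfq_node *first;

	if (atomic_load(&q->tail) == &q->head)
		return NULL;
	/* A producer may have swapped tail but not linked yet: spin. */
	while ((first = atomic_exchange(&q->head.next, NULL)) == NULL)
		;
	*last = atomic_exchange(&q->tail, &q->head);
	return first;
}

/* Walk the detached chain; spins if a producer has not linked yet. */
static struct wfq_node *wfq_next(struct wfq_node *n, struct wfq_node *last)
{
	struct wfq_node *next;

	if (n == last)
		return NULL;
	while ((next = atomic_load_explicit(&n->next,
					    memory_order_acquire)) == NULL)
		;
	return next;
}
```

The consumer-side spin is the part I can't judge for PREEMPT_RT: it
is short, but it does depend on a possibly lower-priority producer
making progress.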
> == Convert to llist ? ==
>
> It's a possibility but some operations like single element deletion may be
> costly because only llist_add() and llist_del_all() are atomic on llist.
> !CONFIG_PREEMPT_RT might not be happy about it.
>
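To make the cost concrete, here is a userspace sketch of an
llist-like structure in C11. It is not the kernel's llist code; the
names (ll_*) are invented for illustration. Adding is a
compare-and-swap loop and detaching everything is a single exchange,
but removing one element means detaching the whole list, walking it,
and pushing the survivors back.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names; a sketch of the technique, not the kernel API. */
struct ll_node {
	struct ll_node *next;
};

struct ll_head {
	_Atomic(struct ll_node *) first;
};

/* Lock-free push; returns true if the list was empty beforehand. */
static bool ll_add(struct ll_node *n, struct ll_head *h)
{
	struct ll_node *first;

	first = atomic_load_explicit(&h->first, memory_order_relaxed);
	do {
		n->next = first;
	} while (!atomic_compare_exchange_weak_explicit(&h->first, &first, n,
							memory_order_release,
							memory_order_relaxed));
	return first == NULL;
}

/* Detach the whole list with one exchange; newest entry comes first. */
static struct ll_node *ll_del_all(struct ll_head *h)
{
	return atomic_exchange_explicit(&h->first, NULL,
					memory_order_acquire);
}

/*
 * Single-element removal: O(n), and the list looks empty to everyone
 * else while the survivors are being pushed back (in reversed order).
 */
static bool ll_del_one(struct ll_head *h, struct ll_node *victim)
{
	struct ll_node *n = ll_del_all(h), *next;
	bool found = false;

	for (; n; n = next) {
		next = n->next;
		if (n == victim)
			found = true;
		else
			ll_add(n, h);
	}
	return found;
}
```

The window in ll_del_one() where other tasks see an empty list, plus
the reordering, is roughly why single-element deletion is awkward on
this kind of structure.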
> == Consider epoll not PREEMPT_RT friendly? ==
>
> A last resort is to simply consider epoll not RT-friendly and suggest
> using simpler alternatives like poll()...
>
> Any thoughts?