From: Eric Wong <e@80x24.org>
To: Frederic Weisbecker <frederic@kernel.org>
Cc: linux-rt-users@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-fsdevel@vger.kernel.org, Ingo Molnar <mingo@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Thomas Gleixner <tglx@linutronix.de>,
Steven Rostedt <rostedt@goodmis.org>,
Mike Galbraith <efault@gmx.de>,
Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
John Ogness <john.ogness@linutronix.de>,
Roman Penyaev <rpenyaev@suse.de>,
Davidlohr Bueso <dbueso@suse.de>, Jason Baron <jbaron@akamai.com>,
Al Viro <viro@zeniv.linux.org.uk>,
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Subject: Re: [RFC] How to fix eventpoll rwlock based priority inversion on PREEMPT_RT?
Date: Wed, 17 Nov 2021 20:11:32 +0000
Message-ID: <20211117201132.M259904@dcvr>
In-Reply-To: <20211116140252.GA348770@lothringen>
Frederic Weisbecker <frederic@kernel.org> wrote:
> Hi,
>
> I'm iterating again on this topic, this time with the author of
> the patch Cc'ed.
>
> The following commit:
>
> a218cc491420 (epoll: use rwlock in order to reduce ep_poll_callback()
> contention)
>
> has changed the ep->lock into an rwlock. This can cause priority inversion
> on PREEMPT_RT. Here is an example:
>
>
> 1) High-priority task A waits for events in epoll_wait(). Nothing shows up,
> so it goes to sleep waiting for new events in the ep_poll() loop.
>
> 2) Lower-priority task B brings new events in ep_poll_callback() and wakes
> up A while still holding read_lock(ep->lock).
>
> 3) Task A wakes up immediately and tries to grab write_lock(ep->lock), but it
> has to wait for task B to release read_lock(ep->lock). Unfortunately, there is
> no priority inheritance when write_lock() is called on an rwlock that is
> already read-locked. So A is left waiting on the lower-priority task B, which
> may even be preempted by yet another task before it releases
> read_lock(ep->lock).
>
>
> Now how to solve this? Several possibilities:
>
> == Delay the wakeup until after releasing the read_lock()? ==
>
> That only solves part of the problem. If another event comes in
> concurrently, another producer may hold read_lock(ep->lock) when A tries
> write_lock(), and we are back to the original issue.
>
> == Make rwlock fairer? ==
>
> Currently, read_lock() only acquires the rtmutex if the lock is already
> write-held (or a write_lock() is waiting to acquire it). So if read_lock()
> happens after write_lock(), fairness is observed, but if write_lock() happens
> after read_lock(), priority inheritance doesn't happen.
>
> I think there have been attempts to solve this in the past, but some issues
> arose (I don't know the exact details; the comments in rwbase_rt.c give some clues).
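To make the fast-path asymmetry concrete, here is a toy userspace model in C11 atomics. It is a simplified sketch loosely modeled on the reader-bias counter idea in kernel/locking/rwbase_rt.c, not that code: model_rwlock, MODEL_READER_BIAS and the model_* functions are hypothetical names. What it illustrates is that a fast-path reader is only an increment in a counter, so a writer that arrives afterwards has no owner task it could boost.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Toy model (hypothetical, not the kernel's rwbase_rt.c): while no writer
 * is pending, 'readers' is biased negative and readers just bump it.
 */
#define MODEL_READER_BIAS (1 << 30)

struct model_rwlock {
	atomic_int readers;
};

static void model_init(struct model_rwlock *l)
{
	atomic_init(&l->readers, -MODEL_READER_BIAS);
}

/*
 * Reader fast path. On success the reader is just +1 in a counter: no
 * owner task is recorded anywhere, so nobody can priority-boost it.
 */
static bool model_read_trylock_fast(struct model_rwlock *l)
{
	int r = atomic_load(&l->readers);

	/* On CAS failure, r is reloaded with the current value. */
	while (r < 0) {
		if (atomic_compare_exchange_weak(&l->readers, &r, r + 1))
			return true;
	}
	return false;	/* writer pending/active: slow path (rtmutex in the real lock) */
}

static void model_read_unlock(struct model_rwlock *l)
{
	atomic_fetch_sub(&l->readers, 1);
}

/*
 * Writer removes the bias, diverting later readers to the slow path, and
 * learns how many anonymous readers it must wait for, without boosting them.
 */
static int model_write_announce(struct model_rwlock *l)
{
	return atomic_fetch_add(&l->readers, MODEL_READER_BIAS) + MODEL_READER_BIAS;
}

/* Writer done (readers drained): restore the bias, re-enabling the fast path. */
static void model_write_unlock(struct model_rwlock *l)
{
	atomic_fetch_sub(&l->readers, MODEL_READER_BIAS);
}
```

Two readers that got in first are reported by model_write_announce() as a bare count of 2; the writer can wait for that count to reach zero but cannot tell which tasks to boost.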
>
> == Convert the rwlock to RCU? ==
>
> Traditionally, we try to convert problematic rwlocks to RCU. I'm not sure
> that fits here, because the rwlock is used the other way around: the epoll
> consumer does the write_lock() and the producers do the read_lock().
> Concurrent producers then use an ad-hoc lockless list add (see
> list_add_tail_lockless()) to handle racy modifications.
>
> There are also list modifications on both sides: elements are added by the
> producers, and read and deleted (sometimes even re-added) on the consumer side.
>
> Perhaps RCU could be used while keeping locking on the consumer side...

+CC linux-fsdevel and Mathieu Desnoyers

I proposed using wfcqueue many years ago, but ran out of
time/hardware/funding to work on it:

https://yhbt.net/lore/lkml/20130401183118.GA9968@dcvr.yhbt.net/

wfcqueue is used internally by Userspace-RCU, but wfcqueue
itself doesn't rely on RCU. I'm not sure whether wfcqueue helps
PREEMPT_RT, but Mathieu and Paul might know.

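For readers unfamiliar with the structure, a simplified C11 sketch of the wait-free concurrent queue idea: producers enqueue with one xchg on the tail plus one store, never looping or locking, while a single consumer (which in epoll would be serialized by a consumer-side lock) dequeues and may briefly wait for a producer caught between its xchg and its link store. The dequeue path follows the structure of Userspace-RCU's wfcqueue, but this is not its API; wq, wq_enqueue() and wq_dequeue() are hypothetical names.

```c
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

struct wq_node { _Atomic(struct wq_node *) next; };

struct wq {
	struct wq_node head;			/* dummy node, consumer-owned */
	_Atomic(struct wq_node *) tail;
};

static void wq_init(struct wq *q)
{
	atomic_init(&q->head.next, NULL);
	atomic_init(&q->tail, &q->head);
}

/* Wait-free for producers: no lock, no retry loop. */
static void wq_enqueue(struct wq *q, struct wq_node *n)
{
	struct wq_node *old_tail;

	atomic_store(&n->next, NULL);
	old_tail = atomic_exchange(&q->tail, n);
	atomic_store(&old_tail->next, n);	/* link published last */
}

/* Wait out a producer that swapped the tail but has not linked yet. */
static struct wq_node *wq_sync_next(struct wq_node *n)
{
	struct wq_node *next;

	while (!(next = atomic_load(&n->next)))
		sched_yield();
	return next;
}

/* Single consumer only. Returns NULL if the queue is empty. */
static struct wq_node *wq_dequeue(struct wq *q)
{
	struct wq_node *node, *next;

	if (atomic_load(&q->head.next) == NULL &&
	    atomic_load(&q->tail) == &q->head)
		return NULL;
	node = wq_sync_next(&q->head);
	next = atomic_load(&node->next);
	if (next == NULL) {
		/* node looks like the last element: try to reset tail to dummy */
		struct wq_node *expected = node;

		atomic_store(&q->head.next, NULL);
		if (atomic_compare_exchange_strong(&q->tail, &expected, &q->head))
			return node;
		/* a producer appended after node: wait for its link */
		next = wq_sync_next(node);
	}
	atomic_store(&q->head.next, next);
	return node;
}
```

Unlike the rwlock scheme, producers here never hold anything the consumer has to wait for beyond the short window between the xchg and the link store.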
> == Convert to llist? ==
>
> It's a possibility, but some operations, like single-element deletion, may be
> costly because only llist_add() and llist_del_all() are atomic on llist.
> !CONFIG_PREEMPT_RT kernels might not be happy about it.
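The cost of single-element deletion can be seen in a userspace analogue of llist. This C11 sketch assumes a singly linked lock-free stack: ll_add() and ll_del_all() mirror the semantics of the kernel's llist_add() (returns true if the list was empty) and llist_del_all(), while ll_del_one() is a hypothetical helper showing what deleting one entry degenerates to without a lock.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct ll_node { struct ll_node *next; };
struct ll_head { _Atomic(struct ll_node *) first; };

/* Lock-free push; returns true if the list was empty before. */
static bool ll_add(struct ll_node *n, struct ll_head *h)
{
	struct ll_node *first = atomic_load(&h->first);

	do {
		n->next = first;	/* 'first' is refreshed on CAS failure */
	} while (!atomic_compare_exchange_weak(&h->first, &first, n));
	return first == NULL;
}

/* Take the whole list with a single atomic exchange. */
static struct ll_node *ll_del_all(struct ll_head *h)
{
	return atomic_exchange(&h->first, NULL);
}

/*
 * No atomic single-element delete: grab everything, drop the victim and
 * push the rest back. O(n), reverses the survivors' order, and concurrent
 * ll_add() calls interleave with the re-added nodes.
 */
static bool ll_del_one(struct ll_head *h, struct ll_node *victim)
{
	struct ll_node *n = ll_del_all(h), *next;
	bool found = false;

	for (; n; n = next) {
		next = n->next;
		if (n == victim)
			found = true;
		else
			ll_add(n, h);
	}
	return found;
}
```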
>
> == Consider epoll not PREEMPT_RT-friendly? ==
>
> A last resort is to simply consider epoll not RT-friendly and suggest
> simpler alternatives like poll()...
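For comparison, poll() keeps no persistent ready list and no rwlock: the caller passes its interest set on every call and the kernel rescans those descriptors, trading epoll's O(ready) cost for O(n) per call. A minimal POSIX sketch; wait_readable() is a hypothetical helper.

```c
#include <poll.h>
#include <unistd.h>

/* Returns 1 if fd becomes readable within timeout_ms, 0 on timeout, -1 on error. */
static int wait_readable(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int ret = poll(&pfd, 1, timeout_ms);

	if (ret < 0)
		return -1;
	return ret > 0 && (pfd.revents & POLLIN);
}
```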
>
> Any thoughts?
Thread overview: 2+ messages
2021-11-16 14:02 [RFC] How to fix eventpoll rwlock based priority inversion on PREEMPT_RT? Frederic Weisbecker
2021-11-17 20:11 Eric Wong [this message]