linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [RFC] How to fix eventpoll rwlock based priority inversion on PREEMPT_RT?
       [not found] <20211116140252.GA348770@lothringen>
@ 2021-11-17 20:11 ` Eric Wong
  0 siblings, 0 replies; only message in thread
From: Eric Wong @ 2021-11-17 20:11 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linux-rt-users, linux-kernel, linux-fsdevel, Ingo Molnar,
	Peter Zijlstra, Thomas Gleixner, Steven Rostedt, Mike Galbraith,
	Sebastian Andrzej Siewior, John Ogness, Roman Penyaev,
	Davidlohr Bueso, Jason Baron, Al Viro, Paul E. McKenney,
	Mathieu Desnoyers

Frederic Weisbecker <frederic@kernel.org> wrote:
> Hi,
> 
> I'm iterating again on this topic, this time with the author of
> the patch Cc'ed.
> 
> The following commit:
> 
>     a218cc491420 (epoll: use rwlock in order to reduce ep_poll
>                   callback() contention)
> 
> has changed the ep->lock into an rwlock. This can cause priority inversion
> on PREEMPT_RT. Here is an example:
> 
> 
> 1) High priority task A waits for events on epoll_wait(), nothing shows up so
>    it goes to sleep for new events in the ep_poll() loop.
> 
> 2) Lower prio task B brings new events in ep_poll_callback(), waking up A
>    while still holding read_lock(ep->lock)
> 
> 3) Task A wakes up immediately, tries to grab write_lock(ep->lock) but it has
>    to wait for task B to release read_lock(ep->lock). Unfortunately there is
>    no priority inheritance when write_lock() is called on an rwlock that is
>    already read_lock'ed. So back to task B that may even be preempted by
>    yet another task before releasing read_lock(ep->lock).
> 
> 
> Now how to solve this? Several possibilities:
> 
> == Delay the wake up after releasing the read_lock()? ==
> 
> That solves part of the problem only. If another event comes up
> concurrently we are back to the original issue.
> 
> == Make rwlock more fair ? ==
> 
> Currently read_lock() only acquires the rtmutex if the lock is already
> write-held (or write_lock() is waiting to acquire). So if read_lock() happens
> after write_lock(), fairness is observed but if write_lock() happens after
> read_lock(), priority inheritance doesn't happen.
> 
> I think there has been attempts to solve this by the past but some issues
> arised (don't know the exact details, comments on rwbase_rt.c bring some clues).
> 
> == Convert the rwlock to RCU ? ==
> 
> Traditionally, we try to convert rwlocks bringing issues to RCU. I'm not sure the
> situation fits here because the rwlock is used the other way around:
> the epoll consumer does the write_lock() and the producers do read_lock(). Then
> concurrent producers use ad-hoc concurrent list add (see list_add_tail_lockless)
> to handle racy modifications.
> 
> There are also list modifications on both side. There are added from the
> producers and read and deleted (even re-added sometimes) on the consumer side.
> 
> Perhaps RCU could be used with keeping locking on the consumer side...

+CC linux-fsdevel and Mathieu Desnoyers

I proposed using wfcqueue many years ago, but ran out of
time/hardware/funding to work on it:

  https://yhbt.net/lore/lkml/20130401183118.GA9968@dcvr.yhbt.net/

wfcqueue is used internally by Userspace-RCU, but wfcqueue
itself doesn't rely on RCU.  I'm not sure if wfcqueue helps
PREEMPT_RT, but Mathieu + Paul might.

> == Convert to llist ? ==
> 
> It's a possibility but some operations like single element deletion may be
> costly because only llist_add() and llist_del_all() are atomic on llist.
> !CONFIG_PREEMPT_RT might not be happy about it.
> 
> == Consider epoll not PREEMPT_RT friendly? ==
> 
> A last resort is to simply consider epoll is not RT-friendly and suggest
> using more simple alternatives like poll()....
> 
> Any thoughts?

^ permalink raw reply	[flat|nested] only message in thread

only message in thread, other threads:[~2021-11-17 20:11 UTC | newest]

Thread overview: (only message) (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20211116140252.GA348770@lothringen>
2021-11-17 20:11 ` [RFC] How to fix eventpoll rwlock based priority inversion on PREEMPT_RT? Eric Wong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).