public inbox for linux-rt-users@vger.kernel.org
 help / color / mirror / Atom feed
* [RFC] How to fix eventpoll rwlock based priority inversion on PREEMPT_RT?
@ 2021-11-16 14:02 Frederic Weisbecker
  2021-11-17 20:11 ` Eric Wong
  0 siblings, 1 reply; 2+ messages in thread
From: Frederic Weisbecker @ 2021-11-16 14:02 UTC (permalink / raw)
  To: linux-rt-users, linux-kernel
  Cc: Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Steven Rostedt,
	Mike Galbraith, Sebastian Andrzej Siewior, John Ogness,
	Roman Penyaev, Davidlohr Bueso, Jason Baron, Al Viro,
	Paul E. McKenney

Hi,

I'm iterating again on this topic, this time with the author of
the patch Cc'ed.

The following commit:

    a218cc491420 (epoll: use rwlock in order to reduce ep_poll
                  callback() contention)

has changed the ep->lock into an rwlock. This can cause priority inversion
on PREEMPT_RT. Here is an example:


1) High priority task A waits for events on epoll_wait(), nothing shows up so
   it goes to sleep for new events in the ep_poll() loop.

2) Lower prio task B brings new events in ep_poll_callback(), waking up A
   while still holding read_lock(ep->lock)

3) Task A wakes up immediately, tries to grab write_lock(ep->lock) but it has
   to wait for task B to release read_lock(ep->lock). Unfortunately there is
   no priority inheritance when write_lock() is called on an rwlock that is
   already read_lock'ed. So back to task B that may even be preempted by
   yet another task before releasing read_lock(ep->lock).


Now how to solve this? Several possibilities:


== Delay the wake up after releasing the read_lock()? ==

That solves part of the problem only. If another event comes up
concurrently we are back to the original issue.

== Make rwlock more fair ? ==

Currently read_lock() only acquires the rtmutex if the lock is already
write-held (or write_lock() is waiting to acquire). So if read_lock() happens
after write_lock(), fairness is observed but if write_lock() happens after
read_lock(), priority inheritance doesn't happen.

I think there has been attempts to solve this by the past but some issues
arised (don't know the exact details, comments on rwbase_rt.c bring some clues).

== Convert the rwlock to RCU ? ==

Traditionally, we try to convert rwlocks bringing issues to RCU. I'm not sure the
situation fits here because the rwlock is used the other way around:
the epoll consumer does the write_lock() and the producers do read_lock(). Then
concurrent producers use ad-hoc concurrent list add (see list_add_tail_lockless)
to handle racy modifications.

There are also list modifications on both side. There are added from the
producers and read and deleted (even re-added sometimes) on the consumer side.

Perhaps RCU could be used with keeping locking on the consumer side...

== Convert to llist ? ==

It's a possibility but some operations like single element deletion may be
costly because only llist_add() and llist_del_all() are atomic on llist.
!CONFIG_PREEMPT_RT might not be happy about it.

== Consider epoll not PREEMPT_RT friendly? ==

A last resort is to simply consider epoll is not RT-friendly and suggest
using more simple alternatives like poll()....

Any thoughts?



^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: [RFC] How to fix eventpoll rwlock based priority inversion on PREEMPT_RT?
  2021-11-16 14:02 [RFC] How to fix eventpoll rwlock based priority inversion on PREEMPT_RT? Frederic Weisbecker
@ 2021-11-17 20:11 ` Eric Wong
  0 siblings, 0 replies; 2+ messages in thread
From: Eric Wong @ 2021-11-17 20:11 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linux-rt-users, linux-kernel, linux-fsdevel, Ingo Molnar,
	Peter Zijlstra, Thomas Gleixner, Steven Rostedt, Mike Galbraith,
	Sebastian Andrzej Siewior, John Ogness, Roman Penyaev,
	Davidlohr Bueso, Jason Baron, Al Viro, Paul E. McKenney,
	Mathieu Desnoyers

Frederic Weisbecker <frederic@kernel.org> wrote:
> Hi,
> 
> I'm iterating again on this topic, this time with the author of
> the patch Cc'ed.
> 
> The following commit:
> 
>     a218cc491420 (epoll: use rwlock in order to reduce ep_poll
>                   callback() contention)
> 
> has changed the ep->lock into an rwlock. This can cause priority inversion
> on PREEMPT_RT. Here is an example:
> 
> 
> 1) High priority task A waits for events on epoll_wait(), nothing shows up so
>    it goes to sleep for new events in the ep_poll() loop.
> 
> 2) Lower prio task B brings new events in ep_poll_callback(), waking up A
>    while still holding read_lock(ep->lock)
> 
> 3) Task A wakes up immediately, tries to grab write_lock(ep->lock) but it has
>    to wait for task B to release read_lock(ep->lock). Unfortunately there is
>    no priority inheritance when write_lock() is called on an rwlock that is
>    already read_lock'ed. So back to task B that may even be preempted by
>    yet another task before releasing read_lock(ep->lock).
> 
> 
> Now how to solve this? Several possibilities:
> 
> == Delay the wake up after releasing the read_lock()? ==
> 
> That solves part of the problem only. If another event comes up
> concurrently we are back to the original issue.
> 
> == Make rwlock more fair ? ==
> 
> Currently read_lock() only acquires the rtmutex if the lock is already
> write-held (or write_lock() is waiting to acquire). So if read_lock() happens
> after write_lock(), fairness is observed but if write_lock() happens after
> read_lock(), priority inheritance doesn't happen.
> 
> I think there has been attempts to solve this by the past but some issues
> arised (don't know the exact details, comments on rwbase_rt.c bring some clues).
> 
> == Convert the rwlock to RCU ? ==
> 
> Traditionally, we try to convert rwlocks bringing issues to RCU. I'm not sure the
> situation fits here because the rwlock is used the other way around:
> the epoll consumer does the write_lock() and the producers do read_lock(). Then
> concurrent producers use ad-hoc concurrent list add (see list_add_tail_lockless)
> to handle racy modifications.
> 
> There are also list modifications on both side. There are added from the
> producers and read and deleted (even re-added sometimes) on the consumer side.
> 
> Perhaps RCU could be used with keeping locking on the consumer side...

+CC linux-fsdevel and Mathieu Desnoyers

I proposed using wfcqueue many years ago, but ran out of
time/hardware/funding to work on it:

  https://yhbt.net/lore/lkml/20130401183118.GA9968@dcvr.yhbt.net/

wfcqueue is used internally by Userspace-RCU, but wfcqueue
itself doesn't rely on RCU.  I'm not sure if wfcqueue helps
PREEMPT_RT, but Mathieu + Paul might.

> == Convert to llist ? ==
> 
> It's a possibility but some operations like single element deletion may be
> costly because only llist_add() and llist_del_all() are atomic on llist.
> !CONFIG_PREEMPT_RT might not be happy about it.
> 
> == Consider epoll not PREEMPT_RT friendly? ==
> 
> A last resort is to simply consider epoll is not RT-friendly and suggest
> using more simple alternatives like poll()....
> 
> Any thoughts?

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2021-11-17 20:18 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-11-16 14:02 [RFC] How to fix eventpoll rwlock based priority inversion on PREEMPT_RT? Frederic Weisbecker
2021-11-17 20:11 ` Eric Wong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox