public inbox for linux-rt-users@vger.kernel.org
From: Frederic Weisbecker <frederic@kernel.org>
To: linux-rt-users@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Ingo Molnar <mingo@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Steven Rostedt <rostedt@goodmis.org>,
	Mike Galbraith <efault@gmx.de>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	John Ogness <john.ogness@linutronix.de>,
	Roman Penyaev <rpenyaev@suse.de>,
	Davidlohr Bueso <dbueso@suse.de>, Jason Baron <jbaron@akamai.com>,
	Al Viro <viro@zeniv.linux.org.uk>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Subject: [RFC] How to fix eventpoll rwlock based priority inversion on PREEMPT_RT?
Date: Tue, 16 Nov 2021 15:02:52 +0100
Message-ID: <20211116140252.GA348770@lothringen>

Hi,

I'm bringing this topic up again, this time with the author of the patch
Cc'ed.

The following commit:

    a218cc491420 (epoll: use rwlock in order to reduce ep_poll_callback()
                  contention)

changed ep->lock into an rwlock, which can cause priority inversion on
PREEMPT_RT. Here is an example:


1) High-priority task A waits for events in epoll_wait(). Nothing shows up, so
   it goes to sleep waiting for new events in the ep_poll() loop.

2) Lower-priority task B delivers new events through ep_poll_callback() and
   wakes up A while still holding read_lock(ep->lock).

3) Task A wakes up immediately and tries to grab write_lock(ep->lock), but it
   has to wait for task B to release read_lock(ep->lock). Unfortunately, there
   is no priority inheritance when write_lock() is called on an rwlock that is
   already read-locked. So A is left waiting on task B, which may even be
   preempted by yet another task (e.g. one whose priority lies between B's and
   A's) before releasing read_lock(ep->lock).
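To make steps 2 and 3 concrete, here is a rough user-space analogy using POSIX
rwlocks. It is only a sketch, not kernel code: task_a(), task_b(),
run_scenario() and the event log are hypothetical names, and glibc rwlocks do
no priority inheritance at all. It only reproduces the ordering (A is woken
while B still read-holds the lock, so A's write lock cannot succeed before B
unlocks); showing the priority effect itself would need SCHED_FIFO threads on
a PREEMPT_RT kernel.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <string.h>
#include <time.h>

/* Hypothetical stand-ins: ep_lock plays the role of ep->lock. */
static pthread_rwlock_t ep_lock = PTHREAD_RWLOCK_INITIALIZER;
static pthread_mutex_t log_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t wake_cond = PTHREAD_COND_INITIALIZER;
static int woken;            /* set by B to "wake up" A */
static char event_log[8];    /* one char per event, in order */
static size_t log_len;

static void log_event(char c)
{
	pthread_mutex_lock(&log_mtx);
	if (log_len < sizeof(event_log) - 1)
		event_log[log_len++] = c;
	pthread_mutex_unlock(&log_mtx);
}

/* Task A: the consumer, sleeping until B wakes it, then write-locking. */
static void *task_a(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&log_mtx);
	while (!woken)
		pthread_cond_wait(&wake_cond, &log_mtx);
	pthread_mutex_unlock(&log_mtx);

	pthread_rwlock_wrlock(&ep_lock);   /* blocks while B read-holds it */
	log_event('w');
	pthread_rwlock_unlock(&ep_lock);
	return NULL;
}

/* Task B: the producer, waking A while still holding the read lock. */
static void *task_b(void *arg)
{
	struct timespec delay = { 0, 10 * 1000 * 1000 };   /* 10 ms */

	(void)arg;
	pthread_rwlock_rdlock(&ep_lock);
	log_event('r');
	pthread_mutex_lock(&log_mtx);
	woken = 1;
	pthread_cond_signal(&wake_cond);
	pthread_mutex_unlock(&log_mtx);
	nanosleep(&delay, NULL);           /* A is runnable but stuck behind B */
	log_event('u');
	pthread_rwlock_unlock(&ep_lock);
	return NULL;
}

/* Runs the scenario once and returns the event log. */
const char *run_scenario(void)
{
	pthread_t a, b;
	int rc;

	memset(event_log, 0, sizeof(event_log));
	log_len = 0;
	woken = 0;
	rc = pthread_create(&a, NULL, task_a, NULL);
	assert(rc == 0);
	rc = pthread_create(&b, NULL, task_b, NULL);
	assert(rc == 0);
	(void)rc;
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return event_log;
}
```

The log always reads "ruw": B read-locks, B unlocks, and only then can A
take the write lock, however early A was woken.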


Now how to solve this? Several possibilities:


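== Delay the wakeup until after releasing the read_lock()? ==

That only solves part of the problem. If another producer delivers an event
concurrently, it may still hold read_lock(ep->lock) when A wakes up and calls
write_lock(), and we are back to the original issue.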
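A user-space sketch of the deferred wakeup, again only a model under stated
assumptions: struct ep_model and the ep_model_*() helpers are hypothetical,
the ready list is reduced to a counter, and the trywrlock probe in
ep_model_wake_up() exists purely to record whether anybody still held the lock
when the consumer would have been woken. With a single producer the probe
always finds the lock free; with several producers another one may have taken
the read lock in the meantime, which is the remaining window described above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical user-space model of an epoll instance (not kernel code). */
struct ep_model {
	pthread_rwlock_t lock;          /* stands in for ep->lock */
	atomic_int nr_ready;            /* stands in for the ready list */
	atomic_int wakeups;             /* wakeups issued so far */
	atomic_int wakeups_lock_free;   /* ...of which found no lock holder */
};

void ep_model_init(struct ep_model *ep)
{
	int rc = pthread_rwlock_init(&ep->lock, NULL);

	assert(rc == 0);
	(void)rc;
	atomic_init(&ep->nr_ready, 0);
	atomic_init(&ep->wakeups, 0);
	atomic_init(&ep->wakeups_lock_free, 0);
}

/* Stand-in for the wakeup: records whether the lock had no holder. */
static void ep_model_wake_up(struct ep_model *ep)
{
	atomic_fetch_add(&ep->wakeups, 1);
	if (pthread_rwlock_trywrlock(&ep->lock) == 0) {
		atomic_fetch_add(&ep->wakeups_lock_free, 1);
		pthread_rwlock_unlock(&ep->lock);
	}
}

/* Producer path with the wakeup deferred until after the read unlock. */
void ep_model_poll_callback(struct ep_model *ep)
{
	pthread_rwlock_rdlock(&ep->lock);
	atomic_fetch_add(&ep->nr_ready, 1);   /* "queue" the event */
	pthread_rwlock_unlock(&ep->lock);

	/*
	 * Our own read hold is gone here, but another producer may already
	 * have taken the read lock: the window shrinks, it does not close.
	 */
	ep_model_wake_up(ep);
}
```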

== Make rwlock fairer? ==

Currently, read_lock() only acquires the rtmutex if the lock is already
write-held (or a writer is waiting to acquire it). So if read_lock() happens
after write_lock(), fairness is observed, but if write_lock() happens after
read_lock(), priority inheritance doesn't happen: readers that took the fast
path are only accounted for as a count, so there is no owner to boost.
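The asymmetry can be illustrated with a deliberately simplified C11 model of a
reader fast path. This is NOT the rwbase_rt.c implementation: struct
rw_model, WRITER_PENDING and the rw_model_*() functions are hypothetical. It
only shows the point above: a reader that arrives after a writer falls back to
the (PI-aware) slow path, while a writer that arrives after readers only learns
how many readers there are, not which tasks they are.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Deliberately simplified model, NOT the rwbase_rt.c implementation. */
#define WRITER_PENDING (1 << 30)

struct rw_model {
	atomic_int state;   /* active readers, | WRITER_PENDING once a writer arrives */
};

void rw_model_init(struct rw_model *rw)
{
	atomic_init(&rw->state, 0);
}

/* Reader fast path: a bare increment while no writer is pending.
 * Nothing records which task holds the read side. */
bool rw_model_read_trylock(struct rw_model *rw)
{
	int old = atomic_load(&rw->state);

	while (!(old & WRITER_PENDING)) {
		/* On failure, old is reloaded and the writer check repeats. */
		if (atomic_compare_exchange_weak(&rw->state, &old, old + 1))
			return true;
	}
	/* A real rwlock would now block on the rtmutex (PI-aware path). */
	return false;
}

void rw_model_read_unlock(struct rw_model *rw)
{
	int old = atomic_fetch_sub(&rw->state, 1);

	assert((old & ~WRITER_PENDING) > 0);
	(void)old;
}

/* Writer marks itself pending and learns how many readers it must wait
 * for: only a count, with no owner task it could boost. */
int rw_model_write_announce(struct rw_model *rw)
{
	return atomic_fetch_or(&rw->state, WRITER_PENDING) & ~WRITER_PENDING;
}

bool rw_model_write_can_proceed(struct rw_model *rw)
{
	return atomic_load(&rw->state) == WRITER_PENDING;
}
```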

I think there have been attempts to solve this in the past, but some issues
arose (I don't know the exact details; the comments in rwbase_rt.c give some
clues).

== Convert the rwlock to RCU? ==

Traditionally, we try to convert problematic rwlocks to RCU. I'm not sure that
fits here, because the rwlock is used the other way around: the epoll consumer
does the write_lock() and the producers do the read_lock(). Concurrent
producers then use an ad-hoc concurrent list add (see list_add_tail_lockless())
to handle racy modifications.
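For readers unfamiliar with that pattern, here is a simplified singly-linked
analogue in C11 atomics. It is NOT the kernel's list_add_tail_lockless(); the
struct lq, struct lq_node and lq_*() names are hypothetical. It uses the
well-known "exchange the tail, then link the old tail" enqueue: each producer
claims the tail with one atomic exchange and only then links the previous tail
to its node. Between those two steps a traversal would see a truncated list,
which is one reason the consumer side still wants exclusion (the write_lock()).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical singly-linked analogue, not list_add_tail_lockless(). */
struct lq_node {
	_Atomic(struct lq_node *) next;
	int val;
};

struct lq {
	struct lq_node stub;               /* dummy head node */
	_Atomic(struct lq_node *) tail;    /* last node; producers exchange it */
};

void lq_init(struct lq *q)
{
	atomic_init(&q->stub.next, NULL);
	atomic_init(&q->tail, &q->stub);
}

/* Safe for many concurrent producers (think: holders of read_lock()). */
void lq_add_tail(struct lq *q, struct lq_node *n)
{
	struct lq_node *prev;

	atomic_store(&n->next, NULL);
	prev = atomic_exchange(&q->tail, n);   /* 1: claim the tail slot */
	atomic_store(&prev->next, n);          /* 2: link the old tail to us */
}
```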

There are also list modifications on both sides: items are added by the
producers, and read and deleted (sometimes even re-added) on the consumer side.

Perhaps RCU could be used while keeping locking on the consumer side...

== Convert to llist? ==

It's a possibility, but some operations, like single-element deletion, may be
costly because only llist_add() and llist_del_all() are atomic on llist.
!CONFIG_PREEMPT_RT kernels might not be happy about that extra cost.
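To show where that cost comes from, here is a minimal C11 model of llist-like
semantics. The names (struct lhead, lmodel_add(), lmodel_del_all(),
lmodel_del_one()) are hypothetical and this is not the <linux/llist.h> code.
Push and detach-all are single atomic operations, but removing one element has
no atomic primitive: the sketch detaches the whole list, skips the victim and
pushes the survivors back, which is O(n) and reverses their order.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical model of llist-like semantics, not the kernel code. */
struct lnode {
	struct lnode *next;
	int val;
};

struct lhead {
	_Atomic(struct lnode *) first;
};

/* Like llist_add(): lock-free push. Returns 1 if the list was empty. */
int lmodel_add(struct lnode *n, struct lhead *h)
{
	struct lnode *first = atomic_load(&h->first);

	do {
		n->next = first;
	} while (!atomic_compare_exchange_weak(&h->first, &first, n));
	return first == NULL;
}

/* Like llist_del_all(): atomically detach the whole list. */
struct lnode *lmodel_del_all(struct lhead *h)
{
	return atomic_exchange(&h->first, NULL);
}

/* Single-element removal built from the two atomic primitives:
 * O(n), and the survivors come back in reversed order. */
void lmodel_del_one(struct lhead *h, struct lnode *victim)
{
	struct lnode *n = lmodel_del_all(h);

	while (n) {
		struct lnode *next = n->next;

		if (n != victim)
			lmodel_add(n, h);
		n = next;
	}
}
```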

== Consider epoll not PREEMPT_RT friendly? ==

A last resort is to simply consider epoll not RT-friendly and suggest simpler
alternatives like poll().
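For comparison, a minimal poll() based wait. wait_readable() is a hypothetical
helper; the point is only that poll() takes the descriptor set on every call
and does not go through an epoll instance, its shared ready list, or ep->lock.
The usual trade-off is that poll() scans the whole set on each call, so it
scales worse than epoll with many descriptors.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Waits up to timeout_ms for fd to become readable using plain poll().
 * Returns 1 if readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int ret = poll(&pfd, 1, timeout_ms);

	if (ret < 0)
		return -1;
	return (ret > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}
```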

Any thoughts?



Thread overview: 2+ messages
2021-11-16 14:02 Frederic Weisbecker [this message]
2021-11-17 20:11 ` [RFC] How to fix eventpoll rwlock based priority inversion on PREEMPT_RT? Eric Wong