From: Frederic Weisbecker <frederic@kernel.org>
To: linux-rt-users@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Ingo Molnar <mingo@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Steven Rostedt <rostedt@goodmis.org>,
	Mike Galbraith <efault@gmx.de>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	John Ogness <john.ogness@linutronix.de>,
	Roman Penyaev <rpenyaev@suse.de>,
	Davidlohr Bueso <dbueso@suse.de>, Jason Baron <jbaron@akamai.com>,
	Al Viro <viro@zeniv.linux.org.uk>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Subject: [RFC] How to fix eventpoll rwlock based priority inversion on PREEMPT_RT?
Date: Tue, 16 Nov 2021 15:02:52 +0100
Message-ID: <20211116140252.GA348770@lothringen>

Hi,

I'm iterating again on this topic, this time with the author of
the patch Cc'ed.

The following commit:

    a218cc491420 (epoll: use rwlock in order to reduce
                  ep_poll_callback() contention)

has changed the ep->lock into an rwlock. This can cause priority inversion
on PREEMPT_RT. Here is an example:


1) High priority task A waits for events in epoll_wait(). Nothing shows up, so
   it goes to sleep in the ep_poll() loop, waiting for new events.

2) Lower prio task B brings new events in ep_poll_callback(), waking up A
   while still holding read_lock(ep->lock)

3) Task A wakes up immediately and tries to grab write_lock(ep->lock), but it
   has to wait for task B to release read_lock(ep->lock). Unfortunately there
   is no priority inheritance when write_lock() is called on an rwlock that is
   already read_lock'ed. So we are back to task B, which may even be preempted
   by yet another task before releasing read_lock(ep->lock).
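
To make the three steps above concrete, here is a toy userspace model (not
kernel code). All names in it (struct task, pi_mutex, rw_model, ...) are
made up for illustration, and it assumes that a read-held rwlock only counts
its readers without recording who they are, which is why there is nobody to
boost:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model. A higher number means a higher priority. */
struct task {
	const char *name;
	int prio;	/* base priority */
	int eff_prio;	/* effective priority after any boosting */
};

/* A lock that records its owner can boost it (rtmutex-like). */
struct pi_mutex {
	struct task *owner;
};

/* Assumption: a read-held rwlock only counts readers, it has no owner. */
struct rw_model {
	int nr_readers;
};

/* Blocking on an owned PI mutex: the owner inherits the waiter's priority. */
static void pi_mutex_block(struct pi_mutex *m, struct task *waiter)
{
	assert(m->owner != NULL);
	if (waiter->eff_prio > m->owner->eff_prio)
		m->owner->eff_prio = waiter->eff_prio;
}

/* Blocking a writer on a read-held rwlock: no owner, so nothing is boosted. */
static void rw_model_write_block(struct rw_model *l, struct task *waiter)
{
	assert(l->nr_readers > 0);
	(void)waiter;	/* the waiter's priority is not propagated anywhere */
}

/* True if 'other' may preempt 'holder' while 'holder' still has the lock. */
static bool can_preempt(const struct task *other, const struct task *holder)
{
	return other->eff_prio > holder->eff_prio;
}
```

With A at priority 90, B at 10 and a third task C at 50, calling
rw_model_write_block() for A leaves B at 10, so can_preempt(C, B) is true
and A is delayed by C. With pi_mutex_block() instead, B runs at 90 and C
cannot get in the way.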


Now how to solve this? Several possibilities:


== Delay the wakeup until after releasing the read_lock()? ==

That only solves part of the problem. If another event comes in
concurrently, we are back to the original issue.

== Make rwlock more fair? ==

Currently read_lock() only acquires the rtmutex if the lock is already
write-held (or a write_lock() is waiting to acquire it). So if read_lock()
happens after write_lock(), fairness is observed, but if write_lock() happens
after read_lock(), priority inheritance doesn't happen.
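
The asymmetry described above can be sketched as a small state model. This
is a simplification under stated assumptions, not the rwbase_rt.c code: the
type and function names are hypothetical, there is a single writer, and
"blocked with PI" stands for a task blocking on the rtmutex owned by the
writer:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes of a lock attempt in this model. */
enum outcome {
	ACQUIRED,
	BLOCKED_WITH_PI,	/* blocks on the rtmutex, owner gets boosted */
	BLOCKED_WITHOUT_PI,	/* waits for readers, nobody gets boosted */
};

struct rt_rwlock_model {
	int nr_readers;
	bool writer_present;	/* write-held, or a writer is waiting */
};

/* Readers take the rtmutex only when a writer holds or waits for the lock. */
static enum outcome model_read_lock(struct rt_rwlock_model *l)
{
	if (l->writer_present)
		return BLOCKED_WITH_PI;
	l->nr_readers++;	/* fast path: just count the reader */
	return ACQUIRED;
}

/* Model assumption: only one writer ever shows up. */
static enum outcome model_write_lock(struct rt_rwlock_model *l)
{
	assert(!l->writer_present);
	l->writer_present = true;	/* later readers take the slow path */
	if (l->nr_readers > 0)
		return BLOCKED_WITHOUT_PI;
	return ACQUIRED;
}
```

In this model, read-then-write yields BLOCKED_WITHOUT_PI for the writer,
while write-then-read yields BLOCKED_WITH_PI for the reader.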

I think there have been attempts to solve this in the past but some issues
arose (I don't know the exact details; the comments in rwbase_rt.c give some
clues).

== Convert the rwlock to RCU? ==

Traditionally, we try to convert problematic rwlocks to RCU. I'm not sure the
situation fits here because the rwlock is used the other way around:
the epoll consumer does the write_lock() and the producers do read_lock().
Concurrent producers then use an ad-hoc lockless list add (see
list_add_tail_lockless()) to handle racy modifications.

There are also list modifications on both sides: entries are added by the
producers, and read and deleted (sometimes even re-added) on the consumer side.

Perhaps RCU could be used while keeping locking on the consumer side...

== Convert to llist? ==

It's a possibility but some operations like single element deletion may be
costly because only llist_add() and llist_del_all() are atomic on llist.
!CONFIG_PREEMPT_RT might not be happy about it.
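
To illustrate why single element deletion is the costly part, here is a
userspace analog written with C11 atomics. It is only a sketch: the lstack_*
names are hypothetical and this is not the kernel llist API. Adding one node
and taking the whole list are each a single atomic operation, whereas
removing one node means taking everything and re-adding the rest:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct node {
	struct node *next;
	int value;
};

struct lstack {
	_Atomic(struct node *) first;
};

/* Lock-free push: retry the compare-and-swap until the head is ours. */
static void lstack_add(struct lstack *s, struct node *n)
{
	struct node *old;

	assert(n != NULL);
	old = atomic_load_explicit(&s->first, memory_order_relaxed);
	do {
		n->next = old;
	} while (!atomic_compare_exchange_weak_explicit(&s->first, &old, n,
							memory_order_release,
							memory_order_relaxed));
}

/* Detach the whole list with one atomic exchange. */
static struct node *lstack_del_all(struct lstack *s)
{
	return atomic_exchange_explicit(&s->first, (struct node *)NULL,
					memory_order_acquire);
}

/*
 * Remove one node: O(n), and the surviving nodes are re-added in reverse
 * order. Concurrent adders may interleave with the re-adds.
 * Returns 1 if 'victim' was found, 0 otherwise.
 */
static int lstack_del_one(struct lstack *s, struct node *victim)
{
	struct node *n = lstack_del_all(s), *next;
	int found = 0;

	for (; n; n = next) {
		next = n->next;
		if (n == victim) {
			found = 1;
			continue;
		}
		lstack_add(s, n);
	}
	return found;
}
```

A list with a handful of entries hides the cost, but on a long ready list
every single removal walks and republishes all remaining entries.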

== Consider epoll not PREEMPT_RT friendly? ==

A last resort is to simply consider that epoll is not RT-friendly and suggest
using simpler alternatives like poll().
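
For reference, the poll() alternative on the userspace side could look like
the minimal sketch below. wait_readable() is a hypothetical helper name, and
whether poll() actually behaves better for a given RT workload is not
something this sketch demonstrates:

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error or on any other
 * condition reported without POLLIN (e.g. POLLERR alone).
 */
static int wait_readable(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int ret;

	assert(fd >= 0);
	ret = poll(&pfd, 1, timeout_ms);
	if (ret <= 0)
		return ret;	/* 0: timeout, -1: poll() failed */
	return (pfd.revents & POLLIN) ? 1 : -1;
}
```

A caller would typically loop on wait_readable() and read() from the
descriptor once it returns 1.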

Any thoughts?


