public inbox for linux-rt-users@vger.kernel.org
From: Waiman Long <llong@redhat.com>
To: Juri Lelli <juri.lelli@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>
Cc: "Peter Zijlstra" <peterz@infradead.org>,
	"Darren Hart" <dvhart@infradead.org>,
	"Davidlohr Bueso" <dave@stgolabs.net>,
	"André Almeida" <andrealmeid@igalia.com>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-rt-users <linux-rt-users@vger.kernel.org>,
	"Valentin Schneider" <vschneid@redhat.com>,
	"Sebastian Andrzej Siewior" <bigeasy@linutronix.de>
Subject: Re: Futex hash_bucket lock can break isolation and cause priority inversion on RT
Date: Tue, 8 Oct 2024 14:30:31 -0400
Message-ID: <b77b8a52-7b53-46c5-bece-621345fdd4ba@redhat.com>
In-Reply-To: <ZwVOMgBMxrw7BU9A@jlelli-thinkpadt14gen4.remote.csb>

On 10/8/24 11:22 AM, Juri Lelli wrote:
> Hello,
>
> A report concerning latency sensitive applications using futexes on a
> PREEMPT_RT kernel brought me to (try to!) refresh my understanding of
> how futexes are implemented. The following is an attempt to make sense
> of what I am seeing from traces, validate that it indeed might make
> sense and possibly collect ideas on how to address the issue at hand.
>
> Simplifying what is actually a quite complicated setup composed of
> non-realtime (i.e., background load mostly related to a containers
> orchestrator) and realtime tasks, we can consider the following
> situation:
>
>   - Multiprocessor system running a PREEMPT_RT kernel
>   - Housekeeping CPUs (usually 2) running background tasks + “isolated”
>     CPUs running latency sensitive tasks (possibly need to run also
>     non-realtime activities at times)
>   - CPUs are isolated dynamically by using nohz_full/rcu_nocbs options
>     and affinity, no static scheduler isolation is used (i.e., no
>     isolcpus=domain)
>   - Threaded IRQs, RCU related kthreads, timers, etc. are configured with
>     the highest priorities on the system (FIFO)
>   - Latency sensitive application threads run at FIFO priority below the
>     set of tasks from the former point
>   - Latency sensitive application uses futexes, but they protect data
>     only shared among tasks running on the isolated set of CPUs
>   - Tasks running on housekeeping CPUs also use futexes
>   - Futexes belonging to the above two sets of non interacting tasks are
>     distinct
>
> Under these conditions the actual issue presents itself when:
>
>   - A background task on a housekeeping CPU enters the sys_futex syscall
>     and locks an hb->lock (PI enabled mutex on RT)
>   - That background task gets preempted by a higher priority task (e.g.
>     NIC irq thread)
>   - A low latency application task on an isolated CPU also enters
>     sys_futex, hash collision towards the background task hb, tries to
>     grab hb->lock and, even if it boosts the background task, it still
>     needs to wait for the higher priority task (NIC irq) to finish
>     executing on the housekeeping CPU and eventually misses its deadline
>
> Now, of course by making the latency sensitive application tasks use a
> higher priority than anything on housekeeping CPUs we could avoid the
> issue, but the fact that an implicit in-kernel link between otherwise
> unrelated tasks might cause priority inversion is probably not ideal?
> Thus this email.
>
> Does this report make any sense? If it does, has this issue ever been
> reported and possibly discussed? I guess it’s kind of a corner case, but
> I wonder if anybody has suggestions already on how to possibly try to
> tackle it from a kernel perspective.

Just a question. Is the low latency application using PI futexes or the
normal wait-wake futexes? We could use a separate set of hash buckets
for each of these distinct futex types.
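
As a rough illustration of the separate-hash-buckets idea, here is a
minimal userspace sketch. It is not kernel code: the names futex_kind,
BUCKETS_PER_KIND, struct bucket, mix64() and bucket_for() are all
hypothetical, and mix64() is only a stand-in for whatever hash the
kernel really applies to a futex key. The only point is that the futex
type selects the table first, so a PI futex and a wait-wake futex can
never land on the same bucket (and hence the same bucket lock).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: one bucket table per futex type. */
enum futex_kind {
    FUTEX_KIND_WAIT_WAKE = 0,
    FUTEX_KIND_PI        = 1,
    FUTEX_KIND_COUNT
};

#define BUCKETS_PER_KIND 256u   /* assumed size; must be a power of two */

struct bucket {
    unsigned int waiters;       /* stand-in for hb->lock plus waiter list */
};

static struct bucket tables[FUTEX_KIND_COUNT][BUCKETS_PER_KIND];

/* Generic 64-bit mixer, used here only as a placeholder hash. */
static uint64_t mix64(uint64_t x)
{
    x ^= x >> 30;
    x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27;
    x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

/* Pick the table by futex type first, then the slot by hashed key. */
static struct bucket *bucket_for(uint64_t key, enum futex_kind kind)
{
    assert(kind < FUTEX_KIND_COUNT);
    size_t idx = (size_t)(mix64(key) & (BUCKETS_PER_KIND - 1));
    return &tables[kind][idx];
}
```

Note that in this sketch two futexes of the same type can still
collide, so the split would only help if the latency sensitive tasks
and the background tasks use different futex types.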

Cheers,
Longman

