public inbox for stable@vger.kernel.org
From: "Ionut Nechita (Wind River)" <ionut.nechita@windriver.com>
To: namcao@linutronix.de, brauner@kernel.org
Cc: linux-fsdevel@vger.kernel.org, linux-rt-users@vger.kernel.org,
	stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	frederic@kernel.org, vschneid@redhat.com,
	gregkh@linuxfoundation.org, chris.friesen@windriver.com,
	viorel-catalin.rapiteanu@windriver.com,
	iulian.mocanu@windriver.com
Subject: [REGRESSION] osnoise: "eventpoll: Replace rwlock with spinlock" causes ~50µs noise spikes on isolated PREEMPT_RT cores
Date: Thu, 26 Mar 2026 16:00:57 +0200	[thread overview]
Message-ID: <20260326140058.272854-1-ionut.nechita@windriver.com> (raw)

Hi,

I'm reporting a regression introduced by commit 0c43094f8cc9
("eventpoll: Replace rwlock with spinlock"), backported to stable 6.12.y.

On a PREEMPT_RT system with nohz_full isolated cores, this commit causes
significant osnoise degradation on the isolated CPUs.

Setup:
  - Kernel: 6.12.78 with PREEMPT_RT
  - Hardware: x86_64, dual-socket (CPUs 0-63)
  - Boot params: nohz_full=1-16,33-48 isolcpus=nohz,domain,managed_irq,1-16,33-48
    rcu_nocbs=1-31,33-63 kthread_cpus=0,32 irqaffinity=17-31,49-63
  - Tool: osnoise tracer (./osnoise -c 1-16,33-48)
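
For reference, a roughly equivalent run can be configured directly through
the osnoise tracer's tracefs interface (file names per
Documentation/trace/osnoise-tracer.rst; the period value below is my
assumption, chosen so the 950000us runtime matches the RUNTIME column in
the tables that follow):

```shell
# Configure and start the osnoise tracer (run as root).
cd /sys/kernel/tracing
echo osnoise > current_tracer
echo 1-16,33-48 > osnoise/cpus       # sample only the isolated CPUs
echo 950000 > osnoise/runtime_us     # 950 ms of sampling ...
echo 1000000 > osnoise/period_us     # ... per 1 s period (assumed)
echo 1 > tracing_on
cat trace
```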

With the commit applied (spinlock, kernel 6.12.78-vanilla-0):

  CPU    RUNTIME   MAX_NOISE   AVAIL%      NOISE  NMI   IRQ   SIRQ  Thread
  [001]   950000       50163   94.719%        14    0   6864     0    5922
  [004]   950000       50294   94.705%        14    0   6864     0    5920
  [007]   950000       49782   94.759%        14    0   6864     1    5921
  [033]   950000       49528   94.786%        15    0   6864     2    5922
  [016]   950000       48551   94.889%        20    0   6863    19    5942
  [008]   950000       44343   95.332%        14    0   6864     0    5925

With the commit reverted (rwlock restored, kernel 6.12.78-vanilla-1):

  CPU    RUNTIME   MAX_NOISE   AVAIL%      NOISE  NMI   IRQ   SIRQ  Thread
  [001]   950000           0   100.000%       0    0      6     0       0
  [004]   950000           0   100.000%       0    0      4     0       0
  [007]   950000           0   100.000%       0    0      4     0       0
  [033]   950000           0   100.000%       0    0      4     0       0
  [016]   950000           0   100.000%       0    0      5     0       0
  [008]   950000           7    99.999%       7    0      5     0       0

Summary across all isolated cores (32 CPUs):

                          With spinlock       With rwlock (reverted)
  MAX noise (ns):         44,343 - 51,869     0 - 10
  IRQ count/sample:       ~6,650 - 6,870      3 - 7
  Thread noise/sample:    ~5,700 - 5,940      0 - 1
  CPU availability:       94.5% - 95.3%       ~100%

The regression is three to four orders of magnitude in noise on the
isolated cores. The test was run over many consecutive samples and the
pattern is consistent: with the spinlock, every isolated core sees
thousands of IRQs and ~50µs of max noise per 950ms sample window; with
the rwlock, the cores are essentially silent.

Note that CPU 016 occasionally shows SIRQ (softirq) noise with both
kernels; that is a separate, known issue with the tick on the first
nohz_full CPU. The eventpoll regression is the dominant noise source.

My understanding of the root cause: the original rwlock allowed
ep_poll_callback() (the producer side, which can run from IRQ context on
any CPU) to take the lock in read mode, so concurrent wakeups on
housekeeping CPUs did not contend with one another, and an isolated core
with no local epoll activity was left alone. With the spinlock
conversion, spinlock_t becomes an rt_mutex on PREEMPT_RT. This means
that even when the isolated core is not involved in any epoll activity,
cacheline bouncing on the lock and potential PI-boosted wakeups from
housekeeping CPUs can inject noise into the isolated cores via IPIs or
cache-invalidation traffic.

The commit message acknowledges the throughput regression but argues
real workloads won't notice. However, for RT/latency-sensitive
deployments with CPU isolation, the impact is severe and measurable
even with zero local epoll usage.

I believe this needs either:
  a) A revert of the backport for stable RT trees, or
  b) A fix that avoids the spinlock contention path for isolated CPUs

I can provide the full osnoise trace data if needed.

Tested on:
  Linux system-0 6.12.78-vanilla-{0,1} SMP PREEMPT_RT x86_64
  Linux system-0 6.12.57-vanilla-{0,1} SMP PREEMPT_RT x86_64

Thanks,
Ionut.

Thread overview: 3+ messages
2026-03-26 14:00 Ionut Nechita (Wind River) [this message]
2026-03-26 14:31 ` [REGRESSION] osnoise: "eventpoll: Replace rwlock with spinlock" causes ~50µs noise spikes on isolated PREEMPT_RT cores Sebastian Andrzej Siewior
2026-03-26 14:52 ` Greg KH
