From: Catalin Marinas <catalin.marinas@arm.com>
To: "Bouska, Zdenek" <zdenek.bouska@siemens.com>
Cc: Thomas Gleixner <tglx@linutronix.de>,
Will Deacon <will@kernel.org>,
"linux-arm-kernel@lists.infradead.org"
<linux-arm-kernel@lists.infradead.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"Kiszka, Jan" <jan.kiszka@siemens.com>,
"linux-rt-users@vger.kernel.org" <linux-rt-users@vger.kernel.org>,
Nishanth Menon <nm@ti.com>, Puranjay Mohan <p-mohan@ti.com>
Subject: Re: Unfair qspinlocks on ARM64 without LSE atomics => 3ms delay in interrupt handling
Date: Fri, 24 Mar 2023 17:01:28 +0000 [thread overview]
Message-ID: <ZB3XaNtVqGtYHHBw@arm.com> (raw)
In-Reply-To: <AS1PR10MB56750EFD7BEA779D033A68CBEB849@AS1PR10MB5675.EURPRD10.PROD.OUTLOOK.COM>
On Fri, Mar 24, 2023 at 08:43:38AM +0000, Bouska, Zdenek wrote:
> I have seen ~3 ms delay in interrupt handling on ARM64.
>
> I have traced it down to raw_spin_lock() call in handle_irq_event() in
> kernel/irq/handle.c:
>
> irqreturn_t handle_irq_event(struct irq_desc *desc)
> {
>         irqreturn_t ret;
>
>         desc->istate &= ~IRQS_PENDING;
>         irqd_set(&desc->irq_data, IRQD_IRQ_INPROGRESS);
>         raw_spin_unlock(&desc->lock);
>
>         ret = handle_irq_event_percpu(desc);
>
> -->     raw_spin_lock(&desc->lock);
>         irqd_clear(&desc->irq_data, IRQD_IRQ_INPROGRESS);
>         return ret;
> }
>
> It took ~3 ms for this raw_spin_lock() to lock.
That's quite a long delay indeed.
> During this time irq_finalize_oneshot() from kernel/irq/manage.c locks and
> unlocks the same raw spin lock more than 1000 times:
>
> static void irq_finalize_oneshot(struct irq_desc *desc,
>                                  struct irqaction *action)
> {
>         if (!(desc->istate & IRQS_ONESHOT) ||
>             action->handler == irq_forced_secondary_handler)
>                 return;
> again:
>         chip_bus_lock(desc);
> -->     raw_spin_lock_irq(&desc->lock);
>
>         /*
>          * Implausible though it may be we need to protect us against
>          * the following scenario:
>          *
>          * The thread is faster done than the hard interrupt handler
>          * on the other CPU. If we unmask the irq line then the
>          * interrupt can come in again and masks the line, leaves due
>          * to IRQS_INPROGRESS and the irq line is masked forever.
>          *
>          * This also serializes the state of shared oneshot handlers
>          * versus "desc->threads_oneshot |= action->thread_mask;" in
>          * irq_wake_thread(). See the comment there which explains the
>          * serialization.
>          */
>         if (unlikely(irqd_irq_inprogress(&desc->irq_data))) {
> -->             raw_spin_unlock_irq(&desc->lock);
>                 chip_bus_sync_unlock(desc);
>                 cpu_relax();
>                 goto again;
>         }
So this path is hammering the desc->lock location and another CPU cannot
change it. As you found, the problem is not the spinlock algorithm but
the atomic primitives: the LDXR/STXR constructs on arm64 are known to
have this issue, with STXR failing indefinitely under contention.
raw_spin_unlock() simply does an STLR, and this clears the exclusive
monitor that the other CPU may have set with LDXR but not yet confirmed
with its STXR. The queued spinlock only provides fairness if the CPU
manages to get into the queue in the first place.
> So I confirmed that atomic operations from
> arch/arm64/include/asm/atomic_ll_sc.h can be quite slow when they are
> contested from second CPU.
>
> Do you think that it is possible to create fair qspinlock implementation
> on top of atomic instructions supported by ARM64 version 8 (no LSE atomic
> instructions) without compromising performance in the uncontested case?
> For example ARM64 could have custom queued_fetch_set_pending_acquire
> implementation same as x86 has in arch/x86/include/asm/qspinlock.h. Is the
> retry loop in irq_finalize_oneshot() ok together with the current ARM64
> cpu_relax() implementation for processor with no LSE atomic instructions?
So is it queued_fetch_set_pending_acquire() where it gets stuck, or the
earlier atomic_try_cmpxchg_acquire() before entering the slow path? I
guess both can fail in a similar way.
A longer cpu_relax() here would improve things (on arm64 this function
is a no-op) but maybe Thomas or Will have a better idea.
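For illustration, a longer cpu_relax() in that retry loop amounts to
exponential backoff. The sketch below is hypothetical user-space C, not
kernel code: in_progress stands in for the IRQD_IRQ_INPROGRESS check,
and relax_for() stands in for a cpu_relax() that actually consumes time
(on arm64 it could spin on YIELD/ISB):

```c
#include <stdatomic.h>

/* Hypothetical sketch of the irq_finalize_oneshot() retry loop with
 * exponential backoff instead of a bare cpu_relax(). All names here
 * are placeholders, not kernel APIs. */

static _Atomic int in_progress = 1;   /* stand-in for IRQD_IRQ_INPROGRESS */

static void relax_for(unsigned iters)
{
        /* Stand-in for a cpu_relax() that consumes time; a plain busy
         * loop keeps the sketch portable. */
        for (volatile unsigned i = 0; i < iters; i++)
                ;
}

static int wait_until_done(unsigned max_tries)
{
        unsigned backoff = 1;

        for (unsigned try = 0; try < max_tries; try++) {
                if (!atomic_load_explicit(&in_progress,
                                          memory_order_acquire))
                        return 0;   /* hard handler finished */
                relax_for(backoff);
                if (backoff < 1024)
                        backoff <<= 1;  /* capped exponential backoff */
        }
        return -1;                  /* gave up */
}
```

The backoff gives the other CPU progressively longer windows in which
its LDXR/STXR pair can complete undisturbed, at the cost of extra
latency for the thread doing the waiting.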
--
Catalin
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
2023-03-24 8:43 Unfair qspinlocks on ARM64 without LSE atomics => 3ms delay in interrupt handling Bouska, Zdenek
2023-03-24 17:01 ` Catalin Marinas [this message]
2023-03-24 18:09 ` Will Deacon
2023-03-28 9:39 ` Bouska, Zdenek
2023-03-27 5:44 ` Bouska, Zdenek
[not found] <AS1PR10MB567534190B05A4493674173BEB659@AS1PR10MB5675.EURPRD10.PROD.OUTLOOK.COM>
2023-04-26 21:29 ` Thomas Gleixner
2023-04-27 9:38 ` Bouska, Zdenek
2023-04-27 10:06 ` Will Deacon
2023-04-27 13:14 ` Jan Kiszka
2023-04-27 13:45 ` Kurt Kanzenbach
2023-04-28 7:30 ` Sebastian Andrzej Siewior
2023-04-28 7:37 ` Kurt Kanzenbach