From: Marc Zyngier <maz@kernel.org>
To: Koichiro Den <den@valinux.co.jp>
Cc: linux-arm-kernel@lists.infradead.org, tglx@linutronix.de,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH] irqchip/gic-v3-its: Fix invalid wait context lockdep report
Date: Thu, 28 Aug 2025 08:56:01 +0100
Message-ID: <86cy8fev72.wl-maz@kernel.org>
In-Reply-To: <pkfekcmetqyoj7rwvr77kisu7ok7bc6srq5maoydisnsk4bnyy@wimnw744lp5t>

On Thu, 28 Aug 2025 04:09:00 +0100,
Koichiro Den <den@valinux.co.jp> wrote:
>
> On Wed, Aug 27, 2025 at 01:48:33PM +0100, Marc Zyngier wrote:
> > On Wed, 27 Aug 2025 08:38:48 +0100,
> > Koichiro Den <den@valinux.co.jp> wrote:
> > >
> > > its_irq_set_vcpu_affinity() always runs under a raw_spin_lock wait
> > > context, so calling kcalloc there is not permitted and RT-unsafe since
> > > ___slab_alloc() may acquire a local lock. The below is the actual
> > > lockdep report observed:
> > >
> > > =============================
> > > [ BUG: Invalid wait context ]
> > > 6.16.0-rc3-irqchip-next-7e28bba92c5c+ #1 Tainted: G S
> > > -----------------------------
> > > qemu-system-aar/2129 is trying to lock:
> > > ffff0085b74f2178 (batched_entropy_u32.lock){..-.}-{3:3}, at: get_random_u32+0x9c/0x708
> > > other info that might help us debug this:
> > > context-{5:5}
> > > 6 locks held by qemu-system-aar/2129:
> > > #0: ffff0000b84a0738 (&vdev->igate){+.+.}-{4:4}, at: vfio_pci_core_ioctl+0x40c/0x748 [vfio_pci_core]
> > > #1: ffff8000883cef68 (lock#6){+.+.}-{4:4}, at: irq_bypass_register_producer+0x64/0x2f0
> > > #2: ffff0000ac0df960 (&its->its_lock){+.+.}-{4:4}, at: kvm_vgic_v4_set_forwarding+0x224/0x6f0
> > > #3: ffff000086dc4718 (&irq->irq_lock#3){....}-{2:2}, at: kvm_vgic_v4_set_forwarding+0x288/0x6f0
> > > #4: ffff0001356200c8 (&irq_desc_lock_class){-.-.}-{2:2}, at: __irq_get_desc_lock+0xc8/0x158
> > > #5: ffff00009eae4850 (&dev->event_map.vlpi_lock){....}-{2:2}, at: its_irq_set_vcpu_affinity+0x8c/0x528
> > > ...
> > > Call trace:
> > > show_stack+0x30/0x98 (C)
> > > dump_stack_lvl+0x9c/0xd0
> > > dump_stack+0x1c/0x34
> > > __lock_acquire+0x814/0xb40
> > > lock_acquire.part.0+0x16c/0x2a8
> > > lock_acquire+0x8c/0x178
> > > get_random_u32+0xd4/0x708
> > > __get_random_u32_below+0x20/0x80
> > > shuffle_freelist+0x5c/0x1b0
> > > allocate_slab+0x15c/0x348
> > > new_slab+0x48/0x80
> > > ___slab_alloc+0x590/0x8b8
> > > __slab_alloc.isra.0+0x3c/0x80
> > > __kmalloc_noprof+0x174/0x520
> > > its_vlpi_map+0x834/0xce0
> > > its_irq_set_vcpu_affinity+0x21c/0x528
> > > irq_set_vcpu_affinity+0x160/0x1b0
> > > its_map_vlpi+0x90/0x100
> > > kvm_vgic_v4_set_forwarding+0x3c4/0x6f0
> > > kvm_arch_irq_bypass_add_producer+0xac/0x108
> > > __connect+0x138/0x1b0
> > > irq_bypass_register_producer+0x16c/0x2f0
> > > vfio_msi_set_vector_signal+0x2c0/0x5a8 [vfio_pci_core]
> > > vfio_msi_set_block+0x8c/0x120 [vfio_pci_core]
> > > vfio_pci_set_msi_trigger+0x120/0x3d8 [vfio_pci_core]
> >
> > Huh. I guess this is due to RT not being completely compatible with
> > GFP_ATOMIC... Why you'd want RT and KVM at the same time is beyond
> > me, but hey.
>
> For the record, I didn't run KVM on RT, though I still believe it's better
> to conform to the wait context rule and avoid triggering the lockdep
> splat.

Then I don't understand how you get this, because I have not seen it
so far.

>
> I don't know if there are any plans which make kmalloc with GFP_ATOMIC
> workable under a stricter wait context (getting rid of the local lock
> in some way?), but I think it would be nicer.

GFP_ATOMIC is documented as being compatible with raw spinlocks in the
absence of RT, making the above trace pretty odd.

>
> >
> > > ...
> > >
> > > To avoid this, simply pre-allocate vlpi_maps when creating an ITS v4
> > > device with LPIs allocation. The trade-off is some wasted memory
> > > depending on nr_lpis, if none of those LPIs are ever upgraded to VLPIs.
> > >
> > > An alternative would be to move the vlpi_maps allocation out of
> > > its_map_vlpi() and introduce a two-stage prepare/commit flow, allowing a
> > > caller (KVM in the lockdep splat shown above) to do the allocation
> > > outside irq_set_vcpu_affinity(). However, this would unnecessarily add
> > > complexity.
> >
> > That's debatable. It is probably fine for now, but if this was to
> > grow, we'd need to revisit this.
>
> Just curious but do you have any plans to replace the current
> irq_set_vcpu_affinity() approach with something else?

Who knows. This is the Linux kernel, everything changes all the time
without the need for a good reason. More significantly, the amount of
*data* being associated with a VLPI could become much higher in the
future, and add more unnecessary allocation.

M.
--
Without deviation from the norm, progress is not possible.
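
For illustration, the two options discussed in this thread (allocating
vlpi_maps lazily while vlpi_lock is held, versus pre-allocating them when
the device is created) can be modeled in a small userspace C sketch. This
is not the ITS driver code. Every name below (its_dev_sketch,
vlpi_map_sketch, checked_calloc, dev_create, dev_map_vlpi, dev_destroy) is
hypothetical, lock_held is a plain flag standing in for the raw spinlock,
and calloc() stands in for kcalloc(). The sketch only counts allocations
attempted "under the lock", which is the situation the lockdep report
complains about.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-event VLPI mapping entry. */
struct vlpi_map_sketch {
	int vintid;
	bool in_use;
};

/* Hypothetical stand-in for an ITS device; not the kernel structure. */
struct its_dev_sketch {
	bool lock_held;                /* models holding vlpi_lock */
	size_t nr_lpis;
	struct vlpi_map_sketch *maps;  /* models the vlpi_maps array */
	unsigned int alloc_violations; /* allocations made under the lock */
};

/* Allocate, recording a violation if the modeled lock is held. */
static void *checked_calloc(struct its_dev_sketch *dev, size_t n, size_t size)
{
	if (dev->lock_held)
		dev->alloc_violations++;
	return calloc(n, size);
}

/* Device creation. With prealloc set, the array is allocated here,
 * outside the lock, at the cost of memory that may never be used. */
static int dev_create(struct its_dev_sketch *dev, size_t nr_lpis, bool prealloc)
{
	dev->lock_held = false;
	dev->nr_lpis = nr_lpis;
	dev->alloc_violations = 0;
	dev->maps = NULL;
	if (prealloc) {
		dev->maps = checked_calloc(dev, nr_lpis, sizeof(*dev->maps));
		if (!dev->maps)
			return -1;
	}
	return 0;
}

/* Map one event. Falls back to lazy allocation under the lock when the
 * array was not pre-allocated, which is the problematic pattern. */
static int dev_map_vlpi(struct its_dev_sketch *dev, size_t event, int vintid)
{
	int ret = 0;

	assert(!dev->lock_held);
	if (event >= dev->nr_lpis)
		return -1;

	dev->lock_held = true;         /* models taking the raw spinlock */
	if (!dev->maps) {
		dev->maps = checked_calloc(dev, dev->nr_lpis, sizeof(*dev->maps));
		if (!dev->maps) {
			ret = -1;
			goto out;
		}
	}
	dev->maps[event].vintid = vintid;
	dev->maps[event].in_use = true;
out:
	dev->lock_held = false;        /* models releasing the raw spinlock */
	return ret;
}

static void dev_destroy(struct its_dev_sketch *dev)
{
	free(dev->maps);
	dev->maps = NULL;
}
```

Calling dev_create() with prealloc set to false and then dev_map_vlpi()
leaves alloc_violations at 1, while the pre-allocating variant keeps it at
0. A prepare/commit flow would reach the same count of 0 by having the
caller allocate before taking the lock, at the price of a wider interface.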