All of lore.kernel.org
 help / color / mirror / Atom feed
From: Marc Zyngier <maz@kernel.org>
To: Koichiro Den <den@valinux.co.jp>
Cc: linux-arm-kernel@lists.infradead.org, tglx@linutronix.de,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] irqchip/gic-v3-its: Fix invalid wait context lockdep report
Date: Wed, 27 Aug 2025 13:48:33 +0100	[thread overview]
Message-ID: <86h5xtdj6m.wl-maz@kernel.org> (raw)
In-Reply-To: <20250827073848.1410315-1-den@valinux.co.jp>

On Wed, 27 Aug 2025 08:38:48 +0100,
Koichiro Den <den@valinux.co.jp> wrote:
> 
> its_irq_set_vcpu_affinity() always runs under a raw_spin_lock wait
> context, so calling kcalloc there is not permitted and RT-unsafe since
> ___slab_alloc() may acquire a local lock. The below is the actual
> lockdep report observed:
> 
>   =============================
>   [ BUG: Invalid wait context ]
>   6.16.0-rc3-irqchip-next-7e28bba92c5c+ #1 Tainted: G S
>   -----------------------------
>   qemu-system-aar/2129 is trying to lock:
>   ffff0085b74f2178 (batched_entropy_u32.lock){..-.}-{3:3}, at: get_random_u32+0x9c/0x708
>   other info that might help us debug this:
>   context-{5:5}
>   6 locks held by qemu-system-aar/2129:
>    #0: ffff0000b84a0738 (&vdev->igate){+.+.}-{4:4}, at: vfio_pci_core_ioctl+0x40c/0x748 [vfio_pci_core]
>    #1: ffff8000883cef68 (lock#6){+.+.}-{4:4}, at: irq_bypass_register_producer+0x64/0x2f0
>    #2: ffff0000ac0df960 (&its->its_lock){+.+.}-{4:4}, at: kvm_vgic_v4_set_forwarding+0x224/0x6f0
>    #3: ffff000086dc4718 (&irq->irq_lock#3){....}-{2:2}, at: kvm_vgic_v4_set_forwarding+0x288/0x6f0
>    #4: ffff0001356200c8 (&irq_desc_lock_class){-.-.}-{2:2}, at: __irq_get_desc_lock+0xc8/0x158
>    #5: ffff00009eae4850 (&dev->event_map.vlpi_lock){....}-{2:2}, at: its_irq_set_vcpu_affinity+0x8c/0x528
>   ...
>   Call trace:
>    show_stack+0x30/0x98 (C)
>    dump_stack_lvl+0x9c/0xd0
>    dump_stack+0x1c/0x34
>    __lock_acquire+0x814/0xb40
>    lock_acquire.part.0+0x16c/0x2a8
>    lock_acquire+0x8c/0x178
>    get_random_u32+0xd4/0x708
>    __get_random_u32_below+0x20/0x80
>    shuffle_freelist+0x5c/0x1b0
>    allocate_slab+0x15c/0x348
>    new_slab+0x48/0x80
>    ___slab_alloc+0x590/0x8b8
>    __slab_alloc.isra.0+0x3c/0x80
>    __kmalloc_noprof+0x174/0x520
>    its_vlpi_map+0x834/0xce0
>    its_irq_set_vcpu_affinity+0x21c/0x528
>    irq_set_vcpu_affinity+0x160/0x1b0
>    its_map_vlpi+0x90/0x100
>    kvm_vgic_v4_set_forwarding+0x3c4/0x6f0
>    kvm_arch_irq_bypass_add_producer+0xac/0x108
>    __connect+0x138/0x1b0
>    irq_bypass_register_producer+0x16c/0x2f0
>    vfio_msi_set_vector_signal+0x2c0/0x5a8 [vfio_pci_core]
>    vfio_msi_set_block+0x8c/0x120 [vfio_pci_core]
>    vfio_pci_set_msi_trigger+0x120/0x3d8 [vfio_pci_core]

Huh. I guess this is due to RT not being completely compatible with
GFP_ATOMIC...  Why you'd want RT and KVM at the same time is beyond
me, but hey.

>   ...
> 
> To avoid this, simply pre-allocate vlpi_maps when creating an ITS v4
> device with LPIs allcation. The trade-off is some wasted memory
> depending on nr_lpis, if none of those LPIs are never upgraded to VLPIs.
>
> An alternative would be to move the vlpi_maps allocation out of
> its_map_vlpi() and introduce a two-stage prepare/commit flow, allowing a
> caller (KVM in the lockdep splat shown above) to do the allocation
> outside irq_set_vcpu_affinity(). However, this would unnecessarily add
> complexity.

That's debatable. It is probably fine for now, but if this was to
grow, we'd need to revisit this.

> Fixes: d011e4e654d7 ("irqchip/gic-v3-its: Add VLPI map/unmap operations")

No. This code predates RT being merged, and this problem cannot occur
before RT.

> Signed-off-by: Koichiro Den <den@valinux.co.jp>
> ---
>  drivers/irqchip/irq-gic-v3-its.c | 36 ++++++++++++++++++--------------
>  1 file changed, 20 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
> index 467cb78435a9..b933be8ddc51 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -1923,19 +1923,10 @@ static int its_vlpi_map(struct irq_data *d, struct its_cmd_info *info)
>  	if (!info->map)
>  		return -EINVAL;
>  
> -	if (!its_dev->event_map.vm) {
> -		struct its_vlpi_map *maps;
> -
> -		maps = kcalloc(its_dev->event_map.nr_lpis, sizeof(*maps),
> -			       GFP_ATOMIC);
> -		if (!maps)
> -			return -ENOMEM;
> -
> +	if (!its_dev->event_map.vm)
>  		its_dev->event_map.vm = info->map->vm;
> -		its_dev->event_map.vlpi_maps = maps;
> -	} else if (its_dev->event_map.vm != info->map->vm) {
> +	else if (its_dev->event_map.vm != info->map->vm)
>  		return -EINVAL;
> -	}
>  
>  	/* Get our private copy of the mapping information */
>  	its_dev->event_map.vlpi_maps[event] = *info->map;
> @@ -2010,10 +2001,8 @@ static int its_vlpi_unmap(struct irq_data *d)
>  	 * Drop the refcount and make the device available again if
>  	 * this was the last VLPI.
>  	 */
> -	if (!--its_dev->event_map.nr_vlpis) {
> +	if (!--its_dev->event_map.nr_vlpis)
>  		its_dev->event_map.vm = NULL;
> -		kfree(its_dev->event_map.vlpi_maps);
> -	}
>  
>  	return 0;
>  }
> @@ -3469,6 +3458,7 @@ static struct its_device *its_create_device(struct its_node *its, u32 dev_id,
>  {
>  	struct its_device *dev;
>  	unsigned long *lpi_map = NULL;
> +	struct its_vlpi_map *vlpi_maps;
>  	unsigned long flags;
>  	u16 *col_map = NULL;
>  	void *itt;
> @@ -3497,16 +3487,28 @@ static struct its_device *its_create_device(struct its_node *its, u32 dev_id,
>  
>  	if (alloc_lpis) {
>  		lpi_map = its_lpi_alloc(nvecs, &lpi_base, &nr_lpis);
> -		if (lpi_map)
> +		if (lpi_map) {
>  			col_map = kcalloc(nr_lpis, sizeof(*col_map),
>  					  GFP_KERNEL);
> +
> +			/*
> +			 * Pre-allocate vlpi_maps to avoid slab allocation
> +			 * under the strict raw spinlock wait context of
> +			 * irq_set_vcpu_affinity. This could waste memory
> +			 * if no vlpi map is ever created.
> +			 */
> +			if (is_v4(its) && nr_lpis > 0)
> +				vlpi_maps = kcalloc(nr_lpis, sizeof(*vlpi_maps),
> +						    GFP_KERNEL);
> +		}
>  	} else {
>  		col_map = kcalloc(nr_ites, sizeof(*col_map), GFP_KERNEL);
>  		nr_lpis = 0;
>  		lpi_base = 0;
>  	}
>  
> -	if (!dev || !itt || !col_map || (!lpi_map && alloc_lpis)) {
> +	if (!dev || !itt || !col_map ||
> +	    (alloc_lpis && (!lpi_map || (is_v4(its) && !vlpi_maps)))) {

This needs to be collapsed into a single boolean evaluated with the
pointer being NULL.

>  		kfree(dev);
>  		itt_free_pool(itt, sz);
>  		bitmap_free(lpi_map);

Where are you freeing vlpi_maps if on the failure path??

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.


  reply	other threads:[~2025-08-27 16:53 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-27  7:38 [PATCH] irqchip/gic-v3-its: Fix invalid wait context lockdep report Koichiro Den
2025-08-27 12:48 ` Marc Zyngier [this message]
2025-08-28  3:09   ` Koichiro Den
2025-08-28  7:56     ` Marc Zyngier
2025-08-28  8:45       ` Koichiro Den

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=86h5xtdj6m.wl-maz@kernel.org \
    --to=maz@kernel.org \
    --cc=den@valinux.co.jp \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.