* MSIs not freed in GICv3 ITS driver
@ 2024-07-08 15:39 Manivannan Sadhasivam
  2024-07-08 17:31 ` Marc Zyngier
  0 siblings, 1 reply; 11+ messages in thread
From: Manivannan Sadhasivam @ 2024-07-08 15:39 UTC
  To: maz, tglx; +Cc: linux-arm-kernel, linux-kernel

Hi Marc, Thomas,

I'm seeing odd behavior with the GICv3 ITS driver while allocating MSIs for
PCIe devices. When the PCIe driver (I'm using virtio_pci_common.c) tries to
allocate a non-power-of-2 number of MSIs (like 3), the GICv3 MSI driver always
rounds the MSI count up to a power of 2 to find the order. In this case, the
order becomes 2 in its_alloc_device_irq().

So 4 entries are allocated by bitmap_find_free_region().

But since the PCIe driver has only requested 3 MSIs, its_irq_domain_alloc()
will only allocate 3 MSIs, leaving one bitmap entry unused.

When the driver frees the MSIs using pci_free_irq_vectors(), only the 3
allocated MSIs are freed and only their bitmap entries are released. The
additional bitmap entry is never released. Because of this, its_free_device()
is never called either, so the ITS device is not freed.

So when the PCIe driver requests the MSIs again (PCIe device being removed and
inserted back), the MSIs are requested for the same ITS device, because it was
not freed previously. Due to the stale bitmap entry, the ITS driver refuses to
allocate 4 MSIs, as only 3 bitmap entries are available. This forces the PCIe
driver to reduce the MSI count, which is suboptimal.

This behavior might apply to other irqchip drivers handling MSIs as well.
Is this behavior already known for MSI and irqchip drivers?

To fix this, the PCIe drivers could always request a power-of-2 number of MSIs
and use a dummy MSI handler for the extra MSIs allocated. This could also be
done in the generic MSI driver itself to avoid changes in the PCIe drivers.
But I wouldn't say it is the best possible fix.

Is there any other way to address this issue? Or am I missing something
completely?

- Mani

-- 
மணிவண்ணன் சதாசிவம்
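The accounting described in the report can be modelled in userspace. What
follows is a minimal sketch, not the driver code: count_order(),
find_free_region() and release_region() are hypothetical stand-ins for
get_count_order(), bitmap_find_free_region() and bitmap_release_region(),
they operate on a single 32-bit word rather than the kernel bitmap type, and
the sketch assumes that vectors are freed one per call, which is what would
leave the rounding-up entry behind.

```c
/*
 * Userspace model of the bitmap accounting only. All helpers here are
 * hypothetical stand-ins for the kernel functions named in the report.
 * The pool (nr_lpis) must not exceed 32 entries in this model.
 */

/* Smallest order such that (1 << order) >= n. */
int count_order(unsigned int n)
{
	int order = 0;

	while ((1u << order) < n)
		order++;
	return order;
}

/* Mask covering (1 << order) bits starting at bit pos. */
unsigned int region_mask(unsigned int pos, int order)
{
	unsigned int len = 1u << order;

	return (unsigned int)(((1ull << len) - 1) << pos);
}

/* Claim a naturally aligned, free run of (1 << order) bits; -1 if none. */
int find_free_region(unsigned int *map, unsigned int nr_lpis, int order)
{
	unsigned int len = 1u << order;
	unsigned int pos;

	for (pos = 0; pos + len <= nr_lpis; pos += len) {
		unsigned int mask = region_mask(pos, order);

		if (!(*map & mask)) {
			*map |= mask;
			return (int)pos;
		}
	}
	return -1;
}

/* Clear (1 << order) bits starting at bit pos. */
void release_region(unsigned int *map, unsigned int pos, int order)
{
	*map &= ~region_mask(pos, order);
}

/*
 * Allocate nvecs in one call, rounded up to a power of two, then free
 * them one vector per call (an assumption made to match the report).
 * Returns the number of bits still set in the map, or -1 if the
 * allocation itself failed.
 */
int leaked_bits_after_cycle(unsigned int *map, unsigned int nr_lpis,
			    unsigned int nvecs)
{
	int base = find_free_region(map, nr_lpis, count_order(nvecs));
	unsigned int i, left = 0;

	if (base < 0)
		return -1;

	for (i = 0; i < nvecs; i++)
		release_region(map, (unsigned int)base + i, count_order(1));

	for (i = 0; i < 32; i++)
		left += (*map >> i) & 1u;
	return (int)left;
}
```

Under these assumptions, with an empty 4-entry pool,
leaked_bits_after_cycle(&map, 4, 3) leaves one bit set, and a following
find_free_region(&map, 4, count_order(3)) fails because no aligned run of 4
free bits remains. A request for exactly 4 vectors leaves nothing behind.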
* Re: MSIs not freed in GICv3 ITS driver
  2024-07-08 15:39 MSIs not freed in GICv3 ITS driver Manivannan Sadhasivam
@ 2024-07-08 17:31 ` Marc Zyngier
  2024-07-09 17:37   ` Manivannan Sadhasivam
  0 siblings, 1 reply; 11+ messages in thread
From: Marc Zyngier @ 2024-07-08 17:31 UTC
  To: Manivannan Sadhasivam; +Cc: tglx, linux-arm-kernel, linux-kernel

Mani,

On Mon, 08 Jul 2024 16:39:33 +0100,
Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> wrote:
>
> Hi Marc, Thomas,
>
> I'm seeing a weird behavior with GICv3 ITS driver while allocating MSIs from
> PCIe devices. When the PCIe driver (I'm using virtio_pci_common.c) tries to
> allocate non power of 2 MSIs (like 3), then the GICv3 MSI driver always rounds
> the MSI count to power of 2 to find the order. In this case, the order becomes 2
> in its_alloc_device_irq().

That's because we can only allocate EventIDs as a number of ID
bits. So you can't have *1* MSI, nor 3. You can have 2, 4, 8, or
2^24. This is a power-of-two architecture.

> So 4 entries are allocated by bitmap_find_free_region().

Assuming you're talking about its_alloc_device_irq(), it looks like a
bug. Or rather, some laziness on my part. The thing is, this bitmap is
only dealing with sub-allocation in the pool that has been given to
the endpoint. So the power-of-two crap doesn't really matter unless
you are dealing with Multi-MSI, which has actual alignment
requirements.

>
> But since the PCIe driver has only requested 3 MSIs, its_irq_domain_alloc()
> will only allocate 3 MSIs, leaving one bitmap entry unused.
>
> And when the driver frees the MSIs using pci_free_irq_vectors(), only 3
> allocated MSIs were freed and their bitmap entries were also released. But the
> entry for the additional bitmap was never released. Due to this,
> its_free_device() was also never called, resulting in the ITS device not getting
> freed.
>
> So when the PCIe driver tries to request the MSIs again (PCIe device being
> removed and inserted back), because the ITS device was not freed previously,
> MSIs were again requested for the same ITS device. And due to the stale bitmap
> entry, the ITS driver refuses to allocate 4 MSIs as only 3 bitmap entries were
> available. This forces the PCIe driver to reduce the MSI count, which is sub
> optimal.
>
> This behavior might be applicable to other irqchip drivers handling MSI as well.
> I want to know if this behavior is already known with MSI and irqchip drivers?
>
> For fixing this issue, the PCIe drivers could always request MSIs of power of 2,
> and use a dummy MSI handler for the extra number of MSIs allocated. This could
> also be done in the generic MSI driver itself to avoid changes in the PCIe
> drivers. But I wouldn't say it is the best possible fix.

No, that's terrible. This is just papering over a design mistake, and
I refuse to go down that road.

>
> Is there any other way to address this issue? Or am I missing something
> completely?

Well, since each endpoint handled by an ITS has its allocation tracked
by a bitmap, it makes more sense to precisely track the allocation.

Here's a quick hack that managed to survive a VM boot. It may even
work. The only problem with it is that it probably breaks a Multi-MSI
device sitting behind a non-transparent bridge that would get its MSIs
allocated after another device. In this case, we wouldn't honor the
required alignment and things would break.

So take this as a proof of concept. If that works, I'll think of how
to deal with this crap in a more suitable way...

	M.

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 3c755d5dad6e6..43479c9e7f8d2 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -3475,15 +3475,16 @@ static void its_free_device(struct its_device *its_dev)
 
 static int its_alloc_device_irq(struct its_device *dev, int nvecs, irq_hw_number_t *hwirq)
 {
-	int idx;
+	unsigned long idx;
 
 	/* Find a free LPI region in lpi_map and allocate them. */
-	idx = bitmap_find_free_region(dev->event_map.lpi_map,
-				      dev->event_map.nr_lpis,
-				      get_count_order(nvecs));
-	if (idx < 0)
+	idx = bitmap_find_next_zero_area(dev->event_map.lpi_map,
+					 dev->event_map.nr_lpis, 0, nvecs, 0);
+	if (idx >= dev->event_map.nr_lpis)
 		return -ENOSPC;
 
+	bitmap_set(dev->event_map.lpi_map, idx, nvecs);
+
 	*hwirq = dev->event_map.lpi_base + idx;
 
 	return 0;
@@ -3653,9 +3654,9 @@ static void its_irq_domain_free(struct irq_domain *domain, unsigned int virq,
 	struct its_node *its = its_dev->its;
 	int i;
 
-	bitmap_release_region(its_dev->event_map.lpi_map,
-			      its_get_event_id(irq_domain_get_irq_data(domain, virq)),
-			      get_count_order(nr_irqs));
+	bitmap_clear(its_dev->event_map.lpi_map,
+		     its_get_event_id(irq_domain_get_irq_data(domain, virq)),
+		     nr_irqs);
 
 	for (i = 0; i < nr_irqs; i++) {
 		struct irq_data *data = irq_domain_get_irq_data(domain,

-- 
Without deviation from the norm, progress is not possible.
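The alignment caveat attached to this proof of concept can be illustrated
with a small userspace model. This is a sketch under stated assumptions, not
the kernel implementation and not the eventual fix: find_next_zero_area() and
set_bits() are hypothetical single-word stand-ins for
bitmap_find_next_zero_area() and bitmap_set(), and the non-zero align_mask in
the last case is only one conceivable way of keeping a Multi-MSI block
aligned. With Multi-MSI the device derives each vector from the low-order
bits of the message data, so the block it is given has to start on a multiple
of its power-of-two size.

```c
/*
 * Simplified single-word model (size <= 32) of a "find a run of nr
 * clear bits" search. Hypothetical stand-in, for illustration only.
 *
 * Returns the first index >= start, rounded up according to align_mask,
 * at which nr consecutive bits are clear. Returns size when nothing
 * fits (the kernel helper likewise reports failure with an
 * out-of-range index).
 */
unsigned int find_next_zero_area(unsigned int map, unsigned int size,
				 unsigned int start, unsigned int nr,
				 unsigned int align_mask)
{
	unsigned int idx = (start + align_mask) & ~align_mask;

	while (idx + nr <= size) {
		unsigned int run = (unsigned int)(((1ull << nr) - 1) << idx);

		if (!(map & run))
			return idx;
		/* Try the next candidate that satisfies the alignment. */
		idx = (idx + 1 + align_mask) & ~align_mask;
	}
	return size;
}

/* Mark nr bits starting at idx as used and return the updated map. */
unsigned int set_bits(unsigned int map, unsigned int idx, unsigned int nr)
{
	return map | (unsigned int)(((1ull << nr) - 1) << idx);
}
```

In this model, once a first device has taken 3 entries from an 8-entry pool
(map 0x7), a 4-vector request with align_mask 0 is placed at index 3, which
is not a multiple of 4. The same request with align_mask 3 is placed at
index 4.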
* Re: MSIs not freed in GICv3 ITS driver
  2024-07-08 17:31 ` Marc Zyngier
@ 2024-07-09 17:37   ` Manivannan Sadhasivam
  2024-07-09 19:24     ` Marc Zyngier
  0 siblings, 1 reply; 11+ messages in thread
From: Manivannan Sadhasivam @ 2024-07-09 17:37 UTC
  To: Marc Zyngier; +Cc: tglx, linux-arm-kernel, linux-kernel

On Mon, Jul 08, 2024 at 06:31:57PM +0100, Marc Zyngier wrote:
> Mani,
>
> On Mon, 08 Jul 2024 16:39:33 +0100,
> Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> wrote:
> > [...]
> > allocate non power of 2 MSIs (like 3), then the GICv3 MSI driver always rounds
> > the MSI count to power of 2 to find the order. In this case, the order becomes 2
> > in its_alloc_device_irq().
>
> That's because we can only allocate EventIDs as a number of ID
> bits. So you can't have *1* MSI, nor 3. You can have 2, 4, 8, or
> 2^24. This is a power-of-two architecture.
>

Ah okay.

> > So 4 entries are allocated by bitmap_find_free_region().
>
> Assuming you're talking about its_alloc_device_irq(), it looks like a
> bug. Or rather, some laziness on my part. The thing is, this bitmap is
> only dealing with sub-allocation in the pool that has been given to
> the endpoint. So the power-of-two crap doesn't really matter unless
> you are dealing with Multi-MSI, which has actual alignment
> requirements.
>

Okay.

> > [...]
> > For fixing this issue, the PCIe drivers could always request MSIs of power of 2,
> > and use a dummy MSI handler for the extra number of MSIs allocated. This could
> > also be done in the generic MSI driver itself to avoid changes in the PCIe
> > drivers. But I wouldn't say it is the best possible fix.
>
> No, that's terrible. This is just papering over a design mistake, and
> I refuse to go down that road.
>

Agree. But what about other MSI drivers? And because of the MSI design, they
also round the requested MSI count to power of 2, leading to unused vectors and
those also wouldn't get freed. I think this power of 2 limitation should be
imposed at the API level or in the MSI driver instead of silently keeping unused
vectors in irqchip drivers.

> > Is there any other way to address this issue? Or am I missing something
> > completely?
>
> Well, since each endpoint handled by an ITS has its allocation tracked
> by a bitmap, it makes more sense to precisely track the allocation.
>
> Here's a quick hack that managed to survive a VM boot. It may even
> work. The only problem with it is that it probably breaks a Multi-MSI
> device sitting behind a non-transparent bridge that would get its MSIs
> allocated after another device. In this case, we wouldn't honor the
> required alignment and things would break.
>
> So take this as a proof of concept. If that works, I'll think of how
> to deal with this crap in a more suitable way...
>

This works fine. Now the ITS driver allocates requested number of MSIs, thanks!

- Mani

> [...]

-- 
மணிவண்ணன் சதாசிவம்
* Re: MSIs not freed in GICv3 ITS driver
  2024-07-09 17:37   ` Manivannan Sadhasivam
@ 2024-07-09 19:24     ` Marc Zyngier
  2024-07-21  8:50       ` Manivannan Sadhasivam
  0 siblings, 1 reply; 11+ messages in thread
From: Marc Zyngier @ 2024-07-09 19:24 UTC
  To: Manivannan Sadhasivam; +Cc: tglx, linux-arm-kernel, linux-kernel

On Tue, 09 Jul 2024 18:37:08 +0100,
Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> wrote:
>
> On Mon, Jul 08, 2024 at 06:31:57PM +0100, Marc Zyngier wrote:
> [...]
> > No, that's terrible. This is just papering over a design mistake, and
> > I refuse to go down that road.
> >
>
> Agree. But what about other MSI drivers? And because of the MSI design, they
> also round the requested MSI count to power of 2, leading to unused vectors and
> those also wouldn't get freed.

This has absolutely nothing to do with the "design" of MSIs. It has
everything to do with not special-casing Multi-MSI.

> I think this power of 2 limitation should be
> imposed at the API level or in the MSI driver instead of silently keeping unused
> vectors in irqchip drivers.

You really have the wrong end of the stick. The MSI API has *zero*
control over the payload allocation. How could it? The whole point of
having an MSI driver is to insulate the core code from such stuff.

> [...]
> > So take this as a proof of concept. If that works, I'll think of how
> > to deal with this crap in a more suitable way...
> >
>
> This works fine. Now the ITS driver allocates requested number of MSIs, thanks!

Well, as I said, this breaks tons of other things so I'm not going to
merge this any time soon. Certainly not before Thomas gets his MSI
rework upstream. And then I need to work out how to deal with
Multi-MSI in the correct way.

So don't hold your breath.

	M.

-- 
Without deviation from the norm, progress is not possible.
* Re: MSIs not freed in GICv3 ITS driver
  2024-07-09 19:24     ` Marc Zyngier
@ 2024-07-21  8:50       ` Manivannan Sadhasivam
  2026-01-16 15:03         ` Manivannan Sadhasivam
  0 siblings, 1 reply; 11+ messages in thread
From: Manivannan Sadhasivam @ 2024-07-21 8:50 UTC
  To: Marc Zyngier; +Cc: tglx, linux-arm-kernel, linux-kernel

On Tue, Jul 09, 2024 at 08:24:37PM +0100, Marc Zyngier wrote:
> On Tue, 09 Jul 2024 18:37:08 +0100,
> Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> wrote:
> [...]
> > I think this power of 2 limitation should be
> > imposed at the API level or in the MSI driver instead of silently keeping unused
> > vectors in irqchip drivers.
>
> You really have the wrong end of the stick. The MSI API has *zero*
> control over the payload allocation. How could it? The whole point of
> having an MSI driver is to insulate the core code from such stuff.
>

Right, but because of the way most of the MSI drivers (not all?) use bitmap to
allocate MSIs, this issue is also present in all of them. So I think the fix is
not just applicable for the gic-its-v3 driver alone.

> [...]
> Well, as I said, this breaks tons of other things so I'm not going to
> merge this any time soon. Certainly not before Thomas gets his MSI
> rework upstream. And then I need to work out how to deal with
> Multi-MSI in the correct way.
>
> So don't hold your breath.
>

Sure thing. Thanks for getting it around quickly. I'll wait for a proper fix.

- Mani

-- 
மணிவண்ணன் சதாசிவம்
* Re: MSIs not freed in GICv3 ITS driver
  2024-07-21  8:50       ` Manivannan Sadhasivam
@ 2026-01-16 15:03         ` Manivannan Sadhasivam
  2026-02-19 16:54           ` Marc Zyngier
  0 siblings, 1 reply; 11+ messages in thread
From: Manivannan Sadhasivam @ 2026-01-16 15:03 UTC
  To: Marc Zyngier; +Cc: tglx, linux-kernel, Qiang Yu

On Sun, Jul 21, 2024 at 02:20:32PM +0530, Manivannan Sadhasivam wrote:
> On Tue, Jul 09, 2024 at 08:24:37PM +0100, Marc Zyngier wrote:
> [...]
> > Well, as I said, this breaks tons of other things so I'm not going to
> > merge this any time soon. Certainly not before Thomas gets his MSI
> > rework upstream. And then I need to work out how to deal with
> > Multi-MSI in the correct way.
> >
> > So don't hold your breath.
> >
>
> Sure thing. Thanks for getting it around quickly. I'll wait for a proper fix.
>

Hi Marc,

Looks like this has fallen through the cracks. My colleague internally
reported a warning during the removal of a PCI driver, and it seems to be
related to the issue we were discussing in this thread:

[   54.727284] WARNING: drivers/irqchip/irq-gic-v3-its.c:3639 at its_msi_teardown+0x11c/0x13c, CPU#4: kworker/u73:1/115
[   54.738366] Modules linked in: mhi_pci_generic mhi nvme_core usb_f_fs libcomposite sm3_ce nvmem_qcom_spmi_sdam qcom_pon rtc_pm8xxx qcom_spmi_temp_alarm qcom_stats dispcc_glymur gpi llcc_qcom phy_qcom_qmp_pcie qcom_cpucp_mbox qcom_wdt socinfo
[   54.760588] CPU: 4 UID: 0 PID: 115 Comm: kworker/u73:1 Tainted: G W 6.18.0-next-20251210-14099-gc20082c23661-dirty #2 PREEMPT
[   54.774067] Tainted: [W]=WARN
[   54.777412] Hardware name: Qualcomm MTP/Qualcomm Test Device, BIOS 7.0.251121.BOOT.OSSUEFI.3.1-00008-GLYMUR-1 11/21/2025
[   54.788849] Workqueue: async async_run_entry_fn
[   54.793791] pstate: 21400009 (nzCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[   54.801230] pc : its_msi_teardown+0x11c/0x13c
[   54.805997] lr : its_msi_teardown+0x54/0x13c
[   54.810675] sp : ffff8000837cb710
[   54.814373] x29: ffff8000837cb710 x28: ffff00080190e410 x27: ffff0008085ba390
[   54.821985] x26: ffff000808629bf0 x25: 0000000000000000 x24: 0000000000000066
[   54.829602] x23: 0000000000000007 x22: 0000000000000020 x21: ffff000800059608
[   54.837209] x20: ffff000800059607 x19: ffff000800a4a300 x18: 00000000ffffffff
[   54.844819] x17: ffff00080ec65400 x16: ffff00080ec65200 x15: ffff00080ec65000
[   54.852429] x14: 0000000000000004 x13: ffff0008000b8810 x12: 0000000000000000
[   54.860046] x11: ffff0008007798e8 x10: 0000000000000002 x9 : 0000000000000001
[   54.867661] x8 : ffff0008007796f8 x7 : 000000000000001f x6 : ffff8000837cb640
[   54.875277] x5 : ffff000801918f40 x4 : 0000000000000007 x3 : 0000000000000000
[   54.882891] x2 : ffff000800a037c0 x1 : 0000000000000020 x0 : 0000000000000007
[   54.890509] Call trace:
[   54.893320]  its_msi_teardown+0x11c/0x13c (P)
[   54.898082]  its_msi_teardown+0x34/0x44
[   54.902316]  msi_remove_device_irq_domain+0x70/0x114
[   54.907701]  msi_device_data_release+0x20/0x64
[   54.912551]  devres_release_all+0xa4/0x104

- Mani

-- 
மணிவண்ணன் சதாசிவம்
* Re: MSIs not freed in GICv3 ITS driver
  2026-01-16 15:03         ` Manivannan Sadhasivam
@ 2026-02-19 16:54           ` Marc Zyngier
  2026-02-25  9:34             ` Qiang Yu
  0 siblings, 1 reply; 11+ messages in thread
From: Marc Zyngier @ 2026-02-19 16:54 UTC
  To: Manivannan Sadhasivam; +Cc: tglx, linux-kernel, Qiang Yu

On Fri, 16 Jan 2026 15:03:33 +0000,
Manivannan Sadhasivam <mani@kernel.org> wrote:
>
> Hi Marc,
>
> Looks like this has fallen through the cracks and my colleague internally
> reported a warning during the removal of a PCI driver and it seems to be related
> to the issue we were discussing in this thread:
>
> [   54.727284] WARNING: drivers/irqchip/irq-gic-v3-its.c:3639 at its_msi_teardown+0x11c/0x13c, CPU#4: kworker/u73:1/115
> [...]
> [   54.890509] Call trace:
> [   54.893320]  its_msi_teardown+0x11c/0x13c (P)
> [   54.898082]  its_msi_teardown+0x34/0x44
> [   54.902316]  msi_remove_device_irq_domain+0x70/0x114
> [   54.907701]  msi_device_data_release+0x20/0x64
> [   54.912551]  devres_release_all+0xa4/0x104

That's nowhere near enough information for me to do anything about it.

Unless you describe exactly what device this is, its allocation
requirements, the topology of the system and finally reproduce it on a
vanilla kernel and not something that I have no access to, I can't do
much for you.

	M.

-- 
Without deviation from the norm, progress is not possible.
* Re: MSIs not freed in GICv3 ITS driver
  2026-02-19 16:54 ` Marc Zyngier
@ 2026-02-25 9:34 ` Qiang Yu
  2026-02-26 13:39 ` Marc Zyngier
  0 siblings, 1 reply; 11+ messages in thread
From: Qiang Yu @ 2026-02-25 9:34 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: Manivannan Sadhasivam, tglx, linux-kernel

On Thu, Feb 19, 2026 at 04:54:29PM +0000, Marc Zyngier wrote:
> On Fri, 16 Jan 2026 15:03:33 +0000,
> Manivannan Sadhasivam <mani@kernel.org> wrote:
> >
> > Hi Marc,
> >
> > Looks like this has fallen through the cracks and my colleague internally
> > reported a warning during the removal of a PCI driver and it seems to be related
> > to the issue we were discussing in this thread:
> >
> > [ 54.727284] WARNING: drivers/irqchip/irq-gic-v3-its.c:3639 at its_msi_teardown+0x11c/0x13c, CPU#4: kworker/u73:1/115
> > [...]
>
> That's nowhere near enough information for me to do anything about it.
>
> Unless you describe exactly what device this is, its allocation
> requirements, the topology of the system and finally reproduce it on a
> vanilla kernel and not something that I have no access to, I can't do
> much for you.

Hi Marc,

Thanks for the feedback. I can reproduce this issue with the latest linux-next
tag next-20260224.

The host is Glymur (Qualcomm compute platform) with an SDX75 modem
connected via PCIe. The SDX75 driver requests 7 MSI IRQs, and the warning
triggers during driver removal.

I think this is actually a common problem with how we handle
MSI allocation vs freeing. Here's what I'm seeing:

When allocating, irq_domain_alloc_irqs_hierarchy() makes one call to
domain->ops->alloc() with nr_irqs=7. The MSI controller (ITS in this case
but DWC-MSI has similar behavior) finds a power-of-2 number of bits in its
bitmap region, so it allocates 8 contiguous bits to satisfy the 7 IRQ request.

But when freeing, irq_domain_free_irqs_hierarchy() loops and calls
domain->ops->free() seven times, each with nr_irqs=1. So we end up freeing
7 individual bits instead of the original 8 bits that were allocated.

This allocation/free mismatch seems to corrupt the bitmap tracking, which
is what triggers the warning in its_msi_teardown().

I suspect this would happen with any PCIe device that requests a
non-power-of-2 number of MSI IRQs on systems using ITS or DWC-MSI.

- Qiang Yu

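The allocation/free mismatch described in the message above can be modelled in user space. The following is a minimal sketch, not the ITS driver's code: `struct lpi_map`, `find_free_region()`, `clear_bit_at()` and `stale_bits_after_teardown()` are hypothetical names, and the 32-bit map size is an assumption made for brevity. It claims one aligned power-of-2 region for the request (in the spirit of bitmap_find_free_region()), then releases one bit per requested vector, as the per-IRQ free path in the report does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a per-device event bitmap, limited to 32 events. */
struct lpi_map {
    uint32_t bits;      /* one bit per event, 1 = in use */
    unsigned int nbits; /* number of valid bits in 'bits' */
};

/* Smallest order such that (1 << order) >= n. */
static unsigned int count_order(unsigned int n)
{
    unsigned int order = 0;

    while ((1u << order) < n)
        order++;
    return order;
}

/* Claim a free, naturally aligned region of 2^order bits.
 * Returns the first bit of the region, or -1 if none is free. */
static int find_free_region(struct lpi_map *m, unsigned int order)
{
    unsigned int len = 1u << order;
    uint32_t mask;
    unsigned int pos;

    assert(len <= 32);
    mask = (len == 32) ? UINT32_MAX : ((1u << len) - 1);
    for (pos = 0; pos + len <= m->nbits; pos += len) {
        if (!(m->bits & (mask << pos))) {
            m->bits |= mask << pos;
            return (int)pos;
        }
    }
    return -1;
}

/* Release a single bit, as a free() call with nr_irqs=1 would. */
static void clear_bit_at(struct lpi_map *m, unsigned int pos)
{
    m->bits &= ~(1u << pos);
}

static unsigned int count_set_bits(uint32_t v)
{
    unsigned int n = 0;

    while (v) {
        v &= v - 1; /* drop the lowest set bit */
        n++;
    }
    return n;
}

/* One rounded-up allocation for nvecs, then nvecs single-bit frees.
 * Returns how many bits are still marked in use afterwards. */
static unsigned int stale_bits_after_teardown(unsigned int nvecs)
{
    struct lpi_map m = { .bits = 0, .nbits = 32 };
    int base = find_free_region(&m, count_order(nvecs));
    unsigned int i;

    assert(base >= 0);
    for (i = 0; i < nvecs; i++)
        clear_bit_at(&m, (unsigned int)base + i);
    return count_set_bits(m.bits);
}
```

In this model, stale_bits_after_teardown(7) leaves one bit set and stale_bits_after_teardown(8) leaves none, which matches the reported pattern: only non-power-of-2 requests leave the map non-empty at teardown.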
* Re: MSIs not freed in GICv3 ITS driver
  2026-02-25 9:34 ` Qiang Yu
@ 2026-02-26 13:39 ` Marc Zyngier
  2026-03-03 5:22 ` Qiang Yu
  2026-03-03 9:26 ` Manivannan Sadhasivam
  0 siblings, 2 replies; 11+ messages in thread
From: Marc Zyngier @ 2026-02-26 13:39 UTC (permalink / raw)
  To: Qiang Yu; +Cc: Manivannan Sadhasivam, tglx, linux-kernel

On Wed, 25 Feb 2026 09:34:41 +0000,
Qiang Yu <qiang.yu@oss.qualcomm.com> wrote:
>
> On Thu, Feb 19, 2026 at 04:54:29PM +0000, Marc Zyngier wrote:
> > On Fri, 16 Jan 2026 15:03:33 +0000,
> > Manivannan Sadhasivam <mani@kernel.org> wrote:
> > >
> > > Hi Marc,
> > >
> > > Looks like this has fallen through the cracks and my colleague internally
> > > reported a warning during the removal of a PCI driver and it seems to be related
> > > to the issue we were discussing in this thread:
> > >
> > > [ 54.727284] WARNING: drivers/irqchip/irq-gic-v3-its.c:3639 at its_msi_teardown+0x11c/0x13c, CPU#4: kworker/u73:1/115
> > > [...]
> >
> > That's nowhere near enough information for me to do anything about it.
> >
> > Unless you describe exactly what device this is, its allocation
> > requirements, the topology of the system and finally reproduce it on a
> > vanilla kernel and not something that I have no access to, I can't do
> > much for you.
>
> Hi Marc,
>
> Thanks for the feedback. I can reproduce this issue with the latest linux-next
> tag next-20260224.

Please don't test on -next. Pick the latest tag from Linus. As far as
I am concerned, -next bears no relevance whatsoever.

>
> The host is Glymur (Qualcomm compute platform) with an SDX75 modem
> connected via PCIe. The SDX75 driver requests 7 MSI IRQs, and the warning
> triggers during driver removal.
>
> I think this is actually a common problem with how we handle
> MSI allocation vs freeing. Here's what I'm seeing:
>
> When allocating, irq_domain_alloc_irqs_hierarchy() makes one call to
> domain->ops->alloc() with nr_irqs=7. The MSI controller (ITS in this case
> but DWC-MSI has similar behavior) finds a power-of-2 number of bits in its
> bitmap region, so it allocates 8 contiguous bits to satisfy the 7 IRQ request.

Well, it's not like the ITS has a choice. Given that the ITT size is
expressed in a number of bits, you get the choice between a power of
two or absolutely nothing.

I'm not going to comment on the DWC stuff, as it has been bitrotting
for the best part of two decades.

>
> But when freeing, irq_domain_free_irqs_hierarchy() loops and calls
> domain->ops->free() seven times, each with nr_irqs=1. So we end up freeing
> 7 individual bits instead of the original 8 bits that were allocated.
>
> This allocation/free mismatch seems to corrupt the bitmap tracking, which
> is what triggers the warning in its_msi_teardown().
>
> I suspect this would happen with any PCIe device that requests a
> non-power-of-2 number of MSI IRQs on systems using ITS or DWC-MSI.

Is this device doing Multi-MSI or MSI-X? Please post an 'lspci -vv' so
that we know what we are up against.

Thanks,

        M.

-- 
Without deviation from the norm, progress is not possible.

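The point about the ITT size being expressed in a number of bits can be illustrated with a little arithmetic. This is a sketch under assumptions: `event_id_bits()` and `itt_entries()` are hypothetical helpers, not functions from irq-gic-v3-its.c, and any minimum table size the driver or architecture imposes is ignored here.

```c
#include <assert.h>

/* Number of event-ID bits needed to address nvecs distinct events. */
static unsigned int event_id_bits(unsigned int nvecs)
{
    unsigned int bits = 0;

    assert(nvecs >= 1);
    while ((1ul << bits) < nvecs)
        bits++;
    return bits;
}

/* Number of entries a table sized by that many bits covers:
 * always a power of two, never exactly 7 or 3. */
static unsigned long itt_entries(unsigned int nvecs)
{
    return 1ul << event_id_bits(nvecs);
}
```

Under these assumptions a request for 7 vectors needs 3 event-ID bits and therefore covers 8 entries, and a request for 3 vectors covers 4. The unused entry is a consequence of the size encoding rather than of an allocator policy.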
* Re: MSIs not freed in GICv3 ITS driver
  2026-02-26 13:39 ` Marc Zyngier
@ 2026-03-03 5:22 ` Qiang Yu
  2026-03-03 9:26 ` Manivannan Sadhasivam
  1 sibling, 0 replies; 11+ messages in thread
From: Qiang Yu @ 2026-03-03 5:22 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: Manivannan Sadhasivam, tglx, linux-kernel

On Thu, Feb 26, 2026 at 01:39:35PM +0000, Marc Zyngier wrote:
> On Wed, 25 Feb 2026 09:34:41 +0000,
> Qiang Yu <qiang.yu@oss.qualcomm.com> wrote:
> >
> > On Thu, Feb 19, 2026 at 04:54:29PM +0000, Marc Zyngier wrote:
> > > On Fri, 16 Jan 2026 15:03:33 +0000,
> > > Manivannan Sadhasivam <mani@kernel.org> wrote:
> > > >
> > > > Hi Marc,
> > > >
> > > > Looks like this has fallen through the cracks and my colleague internally
> > > > reported a warning during the removal of a PCI driver and it seems to be related
> > > > to the issue we were discussing in this thread:
> > > >
> > > > [ 54.727284] WARNING: drivers/irqchip/irq-gic-v3-its.c:3639 at its_msi_teardown+0x11c/0x13c, CPU#4: kworker/u73:1/115
> > > > [...]
> > >
> > > That's nowhere near enough information for me to do anything about it.
> > >
> > > Unless you describe exactly what device this is, its allocation
> > > requirements, the topology of the system and finally reproduce it on a
> > > vanilla kernel and not something that I have no access to, I can't do
> > > much for you.
> >
> > Hi Marc,
> >
> > Thanks for the feedback. I can reproduce this issue with the latest linux-next
> > tag next-20260224.
>
> Please don't test on -next. Pick the latest tag from Linus. As far as
> I am concerned, -next bears no relevance whatsoever.

Reproduced the same issue on the latest tag from Linus.

[ 922.657743] WARNING: drivers/irqchip/irq-gic-v3-its.c:3642 at its_msi_teardown+0x11c/0x13c, CPU#0: rmmod/490
[ 922.815187] CPU: 0 UID: 0 PID: 490 Comm: rmmod Tainted: G S 7.0.0-rc2-00005-gaf4e9ef3d784 #1 PREEMPT
[ 922.826202] Tainted: [S]=CPU_OUT_OF_SPEC
[ 922.830254] Hardware name: Qualcomm Technologies, Inc. SM8550 HDK (DT)
[ 922.836980] pstate: 23400005 (nzCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
[ 922.844158] pc : its_msi_teardown+0x11c/0x13c
[ 922.848664] lr : its_msi_teardown+0x54/0x13c
[ 922.853075] sp : ffff800080a9bb40
[ 922.856497] x29: ffff800080a9bb40 x28: ffff00081055dd00 x27: 0000000000000000
[ 922.863855] x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000
[ 922.871215] x23: 0000000000000007 x22: 0000000000000020 x21: ffff000800060c08
[ 922.878575] x20: ffff000800060c07 x19: ffff000800968280 x18: 00000000ffffffff
[ 922.885931] x17: 00000000000000e3 x16: 00000000000000e2 x15: 00000000000000e1
[ 922.893289] x14: 0000000000000004 x13: ffff000800288210 x12: 0000000000000000
[ 922.900646] x11: ffff0008013c7ce0 x10: 0000000000000002 x9 : 0000000000000001
[ 922.908006] x8 : ffff0008013c7b78 x7 : 000000000000001f x6 : ffff800080a9ba70
[ 922.915364] x5 : 000000000000003c x4 : 0000000000000007 x3 : 0000000000000000
[ 922.922722] x2 : ffff00081072e840 x1 : 0000000000000020 x0 : 0000000000000007
[ 922.930082] Call trace:
[ 922.932624] its_msi_teardown+0x11c/0x13c (P)
[ 922.937133] its_msi_teardown+0x34/0x44
[ 922.941099] msi_remove_device_irq_domain+0x70/0x114
[ 922.946226] msi_device_data_release+0x20/0x64
[ 922.950824] devres_release_all+0xa4/0x104
[ 922.955070] device_unbind_cleanup+0x18/0x84
[ 922.959484] device_release_driver_internal+0x1f4/0x230
[ 922.964878] driver_detach+0x50/0x98
[ 922.968578] bus_remove_driver+0x6c/0xbc
[ 922.972641] driver_unregister+0x30/0x60
[ 922.976697] pci_unregister_driver+0x24/0x9c
[ 922.981110] mhi_pci_driver_exit+0x18/0xe0c [mhi_pci_generic]
[ 922.987051] __arm64_sys_delete_module+0x1b8/0x2a4
[ 922.992007] invoke_syscall+0x48/0x110
[ 922.995896] el0_svc_common.constprop.0+0x40/0xe0
[ 923.000755] do_el0_svc+0x1c/0x28
[ 923.004197] el0_svc+0x34/0x10c
[ 923.007455] el0t_64_sync_handler+0xa0/0xe4
[ 923.011773] el0t_64_sync+0x198/0x19c
[ 923.015570] ---[ end trace 0000000000000000 ]---

> >
> > The host is Glymur (Qualcomm compute platform) with an SDX75 modem
> > connected via PCIe. The SDX75 driver requests 7 MSI IRQs, and the warning
> > triggers during driver removal.
> >
> > I think this is actually a common problem with how we handle
> > MSI allocation vs freeing. Here's what I'm seeing:
> >
> > When allocating, irq_domain_alloc_irqs_hierarchy() makes one call to
> > domain->ops->alloc() with nr_irqs=7. The MSI controller (ITS in this case
> > but DWC-MSI has similar behavior) finds a power-of-2 number of bits in its
> > bitmap region, so it allocates 8 contiguous bits to satisfy the 7 IRQ request.
>
> Well, it's not like the ITS has a choice. Given that the ITT size is
> expressed in a number of bits, you get the choice between a power of
> two or absolutely nothing.
>
> I'm not going to comment on the DWC stuff, as it has been bitrotting
> for the best part of two decades.

Okay, so does each PCI driver have to request a power-of-2 number of MSI
IRQs?

> >
> > But when freeing, irq_domain_free_irqs_hierarchy() loops and calls
> > domain->ops->free() seven times, each with nr_irqs=1. So we end up freeing
> > 7 individual bits instead of the original 8 bits that were allocated.
> >
> > This allocation/free mismatch seems to corrupt the bitmap tracking, which
> > is what triggers the warning in its_msi_teardown().
> >
> > I suspect this would happen with any PCIe device that requests a
> > non-power-of-2 number of MSI IRQs on systems using ITS or DWC-MSI.
>
> Is this device doing Multi-MSI or MSI-X? Please post an 'lspci -vv' so
> that we know what we are up against.
>

This device only supports the MSI capability (not MSI-X). Below is the
relevant lspci output:

0001:01:00.0 Unassigned class [ff00]: Qualcomm Device 0309
        Capabilities: [50] MSI: Enable+ Count=8/32 Maskable+ 64bit+
                Address: 00000000fffff040 Data: 0000
                Masking: ffffff80 Pending: 00000000

- Qiang Yu

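The `Count=8/32` field in the lspci output above is derived from two log2-encoded fields of the MSI Message Control register, which is why the enabled count is a power of two even though the driver asked for 7 vectors. The sketch below decodes those fields; the mask values follow the PCI MSI capability layout, but the macro and function names are illustrative, and the raw register value used in the note is reconstructed from the lspci flags rather than read from the device.

```c
#include <stdint.h>

/* MSI Message Control register fields (bit positions per the PCI spec). */
#define MSI_FLAGS_ENABLE  0x0001u /* MSI Enable */
#define MSI_FLAGS_QMASK   0x000eu /* Multiple Message Capable, log2 */
#define MSI_FLAGS_QSIZE   0x0070u /* Multiple Message Enable, log2 */
#define MSI_FLAGS_64BIT   0x0080u /* 64-bit address capable */
#define MSI_FLAGS_MASKBIT 0x0100u /* per-vector masking capable */

/* Vectors the device advertises ("32" in Count=8/32). */
static unsigned int msi_vectors_capable(uint16_t flags)
{
    return 1u << ((flags & MSI_FLAGS_QMASK) >> 1);
}

/* Vectors software enabled ("8" in Count=8/32). */
static unsigned int msi_vectors_enabled(uint16_t flags)
{
    return 1u << ((flags & MSI_FLAGS_QSIZE) >> 4);
}
```

A Message Control value of 0x01bb (Enable+, capable field 5, enable field 3, 64bit+, Maskable+) decodes to 8 enabled out of 32 capable vectors, consistent with the output above. The `Masking: ffffff80` value in that output has only its low 7 bits clear.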
* Re: MSIs not freed in GICv3 ITS driver
  2026-02-26 13:39 ` Marc Zyngier
  2026-03-03 5:22 ` Qiang Yu
@ 2026-03-03 9:26 ` Manivannan Sadhasivam
  1 sibling, 0 replies; 11+ messages in thread
From: Manivannan Sadhasivam @ 2026-03-03 9:26 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: Qiang Yu, tglx, linux-kernel

On Thu, Feb 26, 2026 at 01:39:35PM +0000, Marc Zyngier wrote:
> On Wed, 25 Feb 2026 09:34:41 +0000,
> Qiang Yu <qiang.yu@oss.qualcomm.com> wrote:
> >
> > On Thu, Feb 19, 2026 at 04:54:29PM +0000, Marc Zyngier wrote:
> > > On Fri, 16 Jan 2026 15:03:33 +0000,
> > > Manivannan Sadhasivam <mani@kernel.org> wrote:
> > > >
> > > > Hi Marc,
> > > >
> > > > Looks like this has fallen through the cracks and my colleague internally
> > > > reported a warning during the removal of a PCI driver and it seems to be related
> > > > to the issue we were discussing in this thread:
> > > >
> > > > [ 54.727284] WARNING: drivers/irqchip/irq-gic-v3-its.c:3639 at its_msi_teardown+0x11c/0x13c, CPU#4: kworker/u73:1/115
> > > > [...]
> > >
> > > That's nowhere near enough information for me to do anything about it.
> > >
> > > Unless you describe exactly what device this is, its allocation
> > > requirements, the topology of the system and finally reproduce it on a
> > > vanilla kernel and not something that I have no access to, I can't do
> > > much for you.
> >
> > Hi Marc,
> >
> > Thanks for the feedback. I can reproduce this issue with the latest linux-next
> > tag next-20260224.
>
> Please don't test on -next. Pick the latest tag from Linus. As far as
> I am concerned, -next bears no relevance whatsoever.
>
> >
> > The host is Glymur (Qualcomm compute platform) with an SDX75 modem
> > connected via PCIe. The SDX75 driver requests 7 MSI IRQs, and the warning
> > triggers during driver removal.
> >
> > I think this is actually a common problem with how we handle
> > MSI allocation vs freeing. Here's what I'm seeing:
> >
> > When allocating, irq_domain_alloc_irqs_hierarchy() makes one call to
> > domain->ops->alloc() with nr_irqs=7. The MSI controller (ITS in this case
> > but DWC-MSI has similar behavior) finds a power-of-2 number of bits in its
> > bitmap region, so it allocates 8 contiguous bits to satisfy the 7 IRQ request.
>
> Well, it's not like the ITS has a choice. Given that the ITT size is
> expressed in a number of bits, you get the choice between a power of
> two or absolutely nothing.
>

But the underlying issue is that the ITS driver (and maybe other MSI
controller drivers) is not freeing *all* of its requested IRQs. I tried
reproducing this issue with QEMU:

1. Modified the EDU device in QEMU to support 8 MSIs:

```
diff --git a/hw/misc/edu.c b/hw/misc/edu.c
index cece633e11..95b658ef33 100644
--- a/hw/misc/edu.c
+++ b/hw/misc/edu.c
@@ -373,7 +373,7 @@ static void pci_edu_realize(PCIDevice *pdev, Error **errp)
 
     pci_config_set_interrupt_pin(pci_conf, 1);
 
-    if (msi_init(pdev, 0, 1, true, false, errp)) {
+    if (msi_init(pdev, 0, 8, true, false, errp)) {
         return;
     }
```

2. Then I wrote a simple driver to request 3 IRQs using
pci_alloc_irq_vectors() and loaded it:

00:04.0 Unclassified device [00ff]: Device 1234:11e8 (rev 10)
        Subsystem: Red Hat, Inc. Device 1100
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 25
        Region 0: Memory at 10200000 (32-bit, non-prefetchable) [size=1M]
        Capabilities: [40] MSI: Enable+ Count=4/8 Maskable- 64bit+
                Address: 0000000008090040 Data: 0000
        Kernel driver in use: edu

3. Removing the driver with rmmod triggers the warning below (the same one
Qiang reported on the Qcom platform):

[ 138.082682] ------------[ cut here ]------------
[ 138.082797] WARNING: drivers/irqchip/irq-gic-v3-its.c:3639 at its_msi_teardown+0x150/0x190, CPU#0: rmmod/739
[ 138.083617] Modules linked in: edu(OE-) virtio_net aes_ce_blk aes_ce_cipher ghash_ce sm4 gpio_keys xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 virtio_pci xfrm_user xfrm_algo virtio_pci_legacy_dev rtc_pl031 virtio_pci_modern_dev xt_addrtype nft_compat x_tables nf_tables br_netfilter bridge stp llc vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vmw_vmci vsock overlay qrtr binfmt_misc efi_pstore sch_fq_codel libcomposite nfnetlink qemu_fw_cfg autofs4
[ 138.085222] CPU: 0 UID: 0 PID: 739 Comm: rmmod Tainted: G OE 6.19.0-rc1+ #26 PREEMPT(voluntary)
[ 138.085414] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[ 138.085522] Hardware name: linux,dummy-virt (DT)
[ 138.085712] pstate: 20400005 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 138.085872] pc : its_msi_teardown+0x150/0x190
[ 138.085974] lr : its_msi_teardown+0x6c/0x190
[ 138.086073] sp : ffff800080903a90
[ 138.086156] x29: ffff800080903ac0 x28: ffff000016e8c200 x27: 0000000000000000
[ 138.086352] x26: 0000000000000000 x25: 0000000000000000 x24: ffffb1373e977ba0
[ 138.086509] x23: ffffb1373eebeeb0 x22: 0000000000000008 x21: ffff000002e21a07
[ 138.086671] x20: ffff000002e21a08 x19: ffff000004895e00 x18: ffff800080573098
[ 138.086828] x17: 0000000000000000 x16: 0000000000000000 x15: ffff000004f5b400
[ 138.086984] x14: ffff000004f0a200 x13: 0000000000000000 x12: 0000000000000000
[ 138.087142] x11: 0000000000000000 x10: 0000000000000000 x9 : ffffb1373e430234
[ 138.087297] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
[ 138.087452] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
[ 138.087595] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000003
[ 138.087791] Call trace:
[ 138.087954] its_msi_teardown+0x150/0x190 (P)
[ 138.088105] its_msi_teardown+0x40/0x70
[ 138.088196] msi_remove_device_irq_domain+0x84/0x128
[ 138.088289] msi_device_data_release+0x2c/0xa0
[ 138.088370] release_nodes+0x70/0x138
[ 138.088443] devres_release_all+0xa0/0x120
[ 138.088522] device_unbind_cleanup+0x24/0x98
[ 138.088614] device_release_driver_internal+0x238/0x2f8
[ 138.088710] driver_detach+0x58/0xc0
[ 138.088782] bus_remove_driver+0x80/0x140
[ 138.088859] driver_unregister+0x3c/0xa0
[ 138.088935] pci_unregister_driver+0x30/0xc0
[ 138.089021] edu_exit+0x28/0xa8 [edu]
[ 138.089280] __arm64_sys_delete_module+0x1e4/0x398
[ 138.089388] invoke_syscall.constprop.0+0x68/0x108
[ 138.089497] el0_svc_common.constprop.0+0x44/0x140
[ 138.089591] do_el0_svc+0x28/0x58
[ 138.089661] el0_svc+0x44/0x230
[ 138.089731] el0t_64_sync_handler+0xc0/0x110
[ 138.089815] el0t_64_sync+0x1b8/0x1c0
[ 138.089975] ---[ end trace 0000000000000000 ]---

> I'm not going to comment on the DWC stuff, as it has been bitrotting
> for the best part of two decades.
>

The above issue should be applicable to other MSI controller drivers as well,
not just DWC.

- Mani

-- 
மணிவண்ணன் சதாசிவம்