public inbox for linux-kernel@vger.kernel.org
* [PATCH] irqdomain: Fix driver re-inserting failures when IRQs not being freed completely
@ 2023-07-20 12:24 Jie Zhan
  2023-08-25 18:00 ` Thomas Gleixner
  0 siblings, 1 reply; 3+ messages in thread
From: Jie Zhan @ 2023-07-20 12:24 UTC
  To: maz, tglx
  Cc: linux-kernel, linuxarm, zhanjie9, prime.zeng, liyihang6,
	chenxiang66, shenyang39, qianweili

Since commit 4615fbc3788d ("genirq/irqdomain: Don't try to free an
interrupt that has no mapping"), we have found failures when
re-inserting some specific drivers:

[root@localhost ~]# rmmod hisi_sas_v3_hw
[root@localhost ~]# modprobe hisi_sas_v3_hw
[ 1295.622525] hisi_sas_v3_hw: probe of 0000:30:04.0 failed with error -2

This comes from the case where some IRQs allocated from a low-level domain,
e.g. GIC ITS, are not freed completely, leaving some leaked. Thus, the next
driver insertion fails to get the same number of IRQs because some IRQs are
still occupied.

Free a contiguous group of IRQs in one go to fix this issue.

A previous discussion can be found at:
https://lore.kernel.org/lkml/3d3d0155e66429968cb4f6b4feeae4b3@kernel.org/
This solution was originally proposed by Marc Zyngier in that discussion, but
no code from the thread ended up upstream. Hopefully, this patch brings the
issue back to attention.
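
For illustration only, the span-finding idea described above can be
sketched in user space. This is a minimal sketch under stated assumptions,
not kernel code: struct free_call, calls[], mock_free() and free_in_spans()
are hypothetical names, mapped[i] stands in for irq_domain_get_irq_data()
returning non-NULL for irq_base + i, and mock_free() stands in for
domain->ops->free() by merely recording its arguments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CALLS 16

/* Hypothetical record of one call to the mock free callback. */
struct free_call {
	unsigned int base;
	unsigned int count;
};

static struct free_call calls[MAX_CALLS];
static size_t ncalls;

/* Mock stand-in for domain->ops->free(): only records its arguments. */
static void mock_free(unsigned int base, unsigned int count)
{
	assert(ncalls < MAX_CALLS);
	calls[ncalls].base = base;
	calls[ncalls].count = count;
	ncalls++;
}

/*
 * Free every run of consecutive mapped IRQs with a single call.
 * mapped[i] plays the role of irq_domain_get_irq_data() returning
 * non-NULL for irq_base + i (an assumption of this sketch).
 */
static void free_in_spans(const bool *mapped, unsigned int irq_base,
			  unsigned int nr_irqs)
{
	unsigned int i = 0;

	while (i < nr_irqs) {
		unsigned int n = 0;

		/* Skip an IRQ that has no mapping. */
		if (!mapped[i]) {
			i++;
			continue;
		}

		/* Count the run of mapped IRQs starting at index i. */
		while (i + n < nr_irqs && mapped[i + n])
			n++;

		mock_free(irq_base + i, n);
		i += n;
	}
}
```

With mapped = { 1, 1, 0, 1, 1, 1, 0 } and irq_base = 100, the sketch
records two calls, (100, 2) and (103, 3), where a one-by-one loop would
have made five calls with a count of 1 each.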

Fixes: 4615fbc3788d ("genirq/irqdomain: Don't try to free an interrupt that has no mapping")
Signed-off-by: Jie Zhan <zhanjie9@hisilicon.com>
Reviewed-by: Liao Chang <liaochang1@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
---
 kernel/irq/irqdomain.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index f34760a1e222..f059e00dc827 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -1445,13 +1445,24 @@ static void irq_domain_free_irqs_hierarchy(struct irq_domain *domain,
 					   unsigned int nr_irqs)
 {
 	unsigned int i;
+	int n;
 
 	if (!domain->ops->free)
 		return;
 
 	for (i = 0; i < nr_irqs; i++) {
-		if (irq_domain_get_irq_data(domain, irq_base + i))
-			domain->ops->free(domain, irq_base + i, 1);
+		/* Find the largest possible span of IRQs to free in one go */
+		for (n = 0;
+			((i + n) < nr_irqs) &&
+			 (irq_domain_get_irq_data(domain, irq_base + i + n));
+			n++)
+			;
+
+		if (!n)
+			continue;
+
+		domain->ops->free(domain, irq_base + i, n);
+		i += n;
 	}
 }
 
-- 
2.30.0



* Re: [PATCH] irqdomain: Fix driver re-inserting failures when IRQs not being freed completely
  2023-07-20 12:24 [PATCH] irqdomain: Fix driver re-inserting failures when IRQs not being freed completely Jie Zhan
@ 2023-08-25 18:00 ` Thomas Gleixner
  2023-08-29  9:05   ` Jie Zhan
  0 siblings, 1 reply; 3+ messages in thread
From: Thomas Gleixner @ 2023-08-25 18:00 UTC
  To: Jie Zhan, maz
  Cc: linux-kernel, linuxarm, zhanjie9, prime.zeng, liyihang6,
	chenxiang66, shenyang39, qianweili

On Thu, Jul 20 2023 at 20:24, Jie Zhan wrote:
> Since commit 4615fbc3788d ("genirq/irqdomain: Don't try to free an
> interrupt that has no mapping"), we have found failures when
> re-inserting some specific drivers:
>
> [root@localhost ~]# rmmod hisi_sas_v3_hw
> [root@localhost ~]# modprobe hisi_sas_v3_hw
> [ 1295.622525] hisi_sas_v3_hw: probe of 0000:30:04.0 failed with error -2
>
> This comes from the case where some IRQs allocated from a low-level domain,
> e.g. GIC ITS, are not freed completely, leaving some leaked. Thus, the next
> driver insertion fails to get the same number of IRQs because some IRQs are
> still occupied.

Why?

> Free a contiguous group of IRQs in one go to fix this issue.

Again why?

> @@ -1445,13 +1445,24 @@ static void irq_domain_free_irqs_hierarchy(struct irq_domain *domain,
>  					   unsigned int nr_irqs)
>  {
>  	unsigned int i;
> +	int n;
>  
>  	if (!domain->ops->free)
>  		return;
>  
>  	for (i = 0; i < nr_irqs; i++) {
> -		if (irq_domain_get_irq_data(domain, irq_base + i))
> -			domain->ops->free(domain, irq_base + i, 1);
> +		/* Find the largest possible span of IRQs to free in one go */
> +		for (n = 0;
> +			((i + n) < nr_irqs) &&
> +			 (irq_domain_get_irq_data(domain, irq_base + i + n));
> +			n++)
> +			;

For one this is unreadable gunk. But what's worse it still does not
explain what this is solving.

It's completely sensible to expect that freeing interrupts in a range
one by one just works.

So why do we need to work around an obvious low level failure in the
core code?

Thanks,

        tglx


* Re: [PATCH] irqdomain: Fix driver re-inserting failures when IRQs not being freed completely
  2023-08-25 18:00 ` Thomas Gleixner
@ 2023-08-29  9:05   ` Jie Zhan
  0 siblings, 0 replies; 3+ messages in thread
From: Jie Zhan @ 2023-08-29  9:05 UTC
  To: Thomas Gleixner, maz
  Cc: linux-kernel, linuxarm, prime.zeng, liyihang6, chenxiang66,
	shenyang39, qianweili



On 26/08/2023 02:00, Thomas Gleixner wrote:
> On Thu, Jul 20 2023 at 20:24, Jie Zhan wrote:
>> Since commit 4615fbc3788d ("genirq/irqdomain: Don't try to free an
>> interrupt that has no mapping"), we have found failures when
>> re-inserting some specific drivers:
>>
>> [root@localhost ~]# rmmod hisi_sas_v3_hw
>> [root@localhost ~]# modprobe hisi_sas_v3_hw
>> [ 1295.622525] hisi_sas_v3_hw: probe of 0000:30:04.0 failed with error -2
>>
>> This comes from the case where some IRQs allocated from a low-level domain,
>> e.g. GIC ITS, are not freed completely, leaving some leaked. Thus, the next
>> driver insertion fails to get the same number of IRQs because some IRQs are
>> still occupied.
> Why?
>
>> Free a contiguous group of IRQs in one go to fix this issue.
> Again why?
>
>> @@ -1445,13 +1445,24 @@ static void irq_domain_free_irqs_hierarchy(struct irq_domain *domain,
>>   					   unsigned int nr_irqs)
>>   {
>>   	unsigned int i;
>> +	int n;
>>   
>>   	if (!domain->ops->free)
>>   		return;
>>   
>>   	for (i = 0; i < nr_irqs; i++) {
>> -		if (irq_domain_get_irq_data(domain, irq_base + i))
>> -			domain->ops->free(domain, irq_base + i, 1);
>> +		/* Find the largest possible span of IRQs to free in one go */
>> +		for (n = 0;
>> +			((i + n) < nr_irqs) &&
>> +			 (irq_domain_get_irq_data(domain, irq_base + i + n));
>> +			n++)
>> +			;
> For one this is unreadable gunk. But what's worse it still does not
> explain what this is solving.
>
> It's completely sensible to expect that freeing interrupts in a range
> one by one just works.
>
> So why do we need to work around an obvious low level failure in the
> core code?
>
> Thanks,
>
>          tglx

Hi Thomas,

Many thanks for taking a look.

I believe this patch should be completely reworked, as it has raised many
questions from the start; it does not explain itself well. Please ignore
this one for now.

The story behind the problem is a bit long and complicated. The previous
discussion can be found at the link attached.

Jie

