public inbox for stable@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4.14, v4.19, v5.4, v5.10, v5.15] igb: free up irq resources in device shutdown path.
@ 2024-03-12 15:07 Imran Khan
  2024-03-29 13:11 ` Greg KH
  0 siblings, 1 reply; 6+ messages in thread
From: Imran Khan @ 2024-03-12 15:07 UTC (permalink / raw)
  To: stable

[ Upstream commit 9fb9eb4b59acc607e978288c96ac7efa917153d4 ]

systems, using igb driver, crash while executing poweroff command
as per following call stack:

crash> bt -a
PID: 62583    TASK: ffff97ebbf28dc40  CPU: 0    COMMAND: "poweroff"
 #0 [ffffa7adcd64f8a0] machine_kexec at ffffffffa606c7c1
 #1 [ffffa7adcd64f900] __crash_kexec at ffffffffa613bb52
 #2 [ffffa7adcd64f9d0] panic at ffffffffa6099c45
 #3 [ffffa7adcd64fa50] oops_end at ffffffffa603359a
 #4 [ffffa7adcd64fa78] die at ffffffffa6033c32
 #5 [ffffa7adcd64faa8] do_trap at ffffffffa60309a0
 #6 [ffffa7adcd64faf8] do_error_trap at ffffffffa60311e7
 #7 [ffffa7adcd64fbc0] do_invalid_op at ffffffffa6031320
 #8 [ffffa7adcd64fbd0] invalid_op at ffffffffa6a01f2a
    [exception RIP: free_msi_irqs+408]
    RIP: ffffffffa645d248  RSP: ffffa7adcd64fc88  RFLAGS: 00010286
    RAX: ffff97eb1396fe00  RBX: 0000000000000000  RCX: ffff97eb1396fe00
    RDX: ffff97eb1396fe00  RSI: 0000000000000000  RDI: 0000000000000000
    RBP: ffffa7adcd64fcb0   R8: 0000000000000002   R9: 000000000000fbff
    R10: 0000000000000000  R11: 0000000000000000  R12: ffff98c047af4720
    R13: ffff97eb87cd32a0  R14: ffff97eb87cd3000  R15: ffffa7adcd64fd57
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #9 [ffffa7adcd64fc80] free_msi_irqs at ffffffffa645d0fc
 #10 [ffffa7adcd64fcb8] pci_disable_msix at ffffffffa645d896
 #11 [ffffa7adcd64fce0] igb_reset_interrupt_capability at ffffffffc024f335 [igb]
 #12 [ffffa7adcd64fd08] __igb_shutdown at ffffffffc0258ed7 [igb]
 #13 [ffffa7adcd64fd48] igb_shutdown at ffffffffc025908b [igb]
 #14 [ffffa7adcd64fd70] pci_device_shutdown at ffffffffa6441e3a
 #15 [ffffa7adcd64fd98] device_shutdown at ffffffffa6570260
 #16 [ffffa7adcd64fdc8] kernel_power_off at ffffffffa60c0725
 #17 [ffffa7adcd64fdd8] SYSC_reboot at ffffffffa60c08f1
 #18 [ffffa7adcd64ff18] sys_reboot at ffffffffa60c09ee
 #19 [ffffa7adcd64ff28] do_syscall_64 at ffffffffa6003ca9
 #20 [ffffa7adcd64ff50] entry_SYSCALL_64_after_hwframe at ffffffffa6a001b1

This happens because igb_shutdown has not yet freed up allocated irqs and
free_msi_irqs finds irq_has_action true for involved msi irqs here and this
condition triggers BUG_ON.

Freeing irqs before proceeding further in igb_clear_interrupt_scheme,
fixes this problem.

Signed-off-by: Imran Khan <imran.f.khan@oracle.com>
---

This issue does not happen in v5.17 or later kernel versions because
'commit 9fb9eb4b59ac ("PCI/MSI: Let core code free MSI descriptors")',
explicitly frees up MSI based irqs and hence indirectly fixes this issue
as well. Also this is why I have mentioned this commit as equivalent
upstream commit. But this upstream change itself is dependent on a bunch
of changes starting from 'commit 288c81ce4be7 ("PCI/MSI: Move code into a 
separate directory")', which refactored msi driver into multiple parts.
So another way of fixing this issue would be to backport these patches and
get this issue implictly fixed.
Kindly let me know if my current patch is not acceptable and in that case
will it be fine if I backport the above mentioned msi driver refactoring
patches to LST.
 

 drivers/net/ethernet/intel/igb/igb_main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 7c42a99be5065..5b59fb65231ba 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -107,6 +107,7 @@ static int igb_setup_all_rx_resources(struct igb_adapter *);
 static void igb_free_all_tx_resources(struct igb_adapter *);
 static void igb_free_all_rx_resources(struct igb_adapter *);
 static void igb_setup_mrqc(struct igb_adapter *);
+static void igb_free_irq(struct igb_adapter *adapter);
 static int igb_probe(struct pci_dev *, const struct pci_device_id *);
 static void igb_remove(struct pci_dev *pdev);
 static int igb_sw_init(struct igb_adapter *);
@@ -1080,6 +1081,7 @@ static void igb_free_q_vectors(struct igb_adapter *adapter)
  */
 static void igb_clear_interrupt_scheme(struct igb_adapter *adapter)
 {
+	igb_free_irq(adapter);
 	igb_free_q_vectors(adapter);
 	igb_reset_interrupt_capability(adapter);
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v4.14, v4.19, v5.4, v5.10, v5.15] igb: free up irq resources in device shutdown path.
  2024-03-12 15:07 [PATCH v4.14, v4.19, v5.4, v5.10, v5.15] igb: free up irq resources in device shutdown path Imran Khan
@ 2024-03-29 13:11 ` Greg KH
  2024-04-22 22:32   ` imran.f.khan
  0 siblings, 1 reply; 6+ messages in thread
From: Greg KH @ 2024-03-29 13:11 UTC (permalink / raw)
  To: Imran Khan; +Cc: stable

On Wed, Mar 13, 2024 at 02:07:13AM +1100, Imran Khan wrote:
> [ Upstream commit 9fb9eb4b59acc607e978288c96ac7efa917153d4 ]

No it is not.

> 
> systems, using igb driver, crash while executing poweroff command
> as per following call stack:
> 
> crash> bt -a
> PID: 62583    TASK: ffff97ebbf28dc40  CPU: 0    COMMAND: "poweroff"
>  #0 [ffffa7adcd64f8a0] machine_kexec at ffffffffa606c7c1
>  #1 [ffffa7adcd64f900] __crash_kexec at ffffffffa613bb52
>  #2 [ffffa7adcd64f9d0] panic at ffffffffa6099c45
>  #3 [ffffa7adcd64fa50] oops_end at ffffffffa603359a
>  #4 [ffffa7adcd64fa78] die at ffffffffa6033c32
>  #5 [ffffa7adcd64faa8] do_trap at ffffffffa60309a0
>  #6 [ffffa7adcd64faf8] do_error_trap at ffffffffa60311e7
>  #7 [ffffa7adcd64fbc0] do_invalid_op at ffffffffa6031320
>  #8 [ffffa7adcd64fbd0] invalid_op at ffffffffa6a01f2a
>     [exception RIP: free_msi_irqs+408]
>     RIP: ffffffffa645d248  RSP: ffffa7adcd64fc88  RFLAGS: 00010286
>     RAX: ffff97eb1396fe00  RBX: 0000000000000000  RCX: ffff97eb1396fe00
>     RDX: ffff97eb1396fe00  RSI: 0000000000000000  RDI: 0000000000000000
>     RBP: ffffa7adcd64fcb0   R8: 0000000000000002   R9: 000000000000fbff
>     R10: 0000000000000000  R11: 0000000000000000  R12: ffff98c047af4720
>     R13: ffff97eb87cd32a0  R14: ffff97eb87cd3000  R15: ffffa7adcd64fd57
>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>  #9 [ffffa7adcd64fc80] free_msi_irqs at ffffffffa645d0fc
>  #10 [ffffa7adcd64fcb8] pci_disable_msix at ffffffffa645d896
>  #11 [ffffa7adcd64fce0] igb_reset_interrupt_capability at ffffffffc024f335 [igb]
>  #12 [ffffa7adcd64fd08] __igb_shutdown at ffffffffc0258ed7 [igb]
>  #13 [ffffa7adcd64fd48] igb_shutdown at ffffffffc025908b [igb]
>  #14 [ffffa7adcd64fd70] pci_device_shutdown at ffffffffa6441e3a
>  #15 [ffffa7adcd64fd98] device_shutdown at ffffffffa6570260
>  #16 [ffffa7adcd64fdc8] kernel_power_off at ffffffffa60c0725
>  #17 [ffffa7adcd64fdd8] SYSC_reboot at ffffffffa60c08f1
>  #18 [ffffa7adcd64ff18] sys_reboot at ffffffffa60c09ee
>  #19 [ffffa7adcd64ff28] do_syscall_64 at ffffffffa6003ca9
>  #20 [ffffa7adcd64ff50] entry_SYSCALL_64_after_hwframe at ffffffffa6a001b1
> 
> This happens because igb_shutdown has not yet freed up allocated irqs and
> free_msi_irqs finds irq_has_action true for involved msi irqs here and this
> condition triggers BUG_ON.
> 
> Freeing irqs before proceeding further in igb_clear_interrupt_scheme,
> fixes this problem.
> 
> Signed-off-by: Imran Khan <imran.f.khan@oracle.com>
> ---
> 
> This issue does not happen in v5.17 or later kernel versions because
> 'commit 9fb9eb4b59ac ("PCI/MSI: Let core code free MSI descriptors")',
> explicitly frees up MSI based irqs and hence indirectly fixes this issue
> as well. Also this is why I have mentioned this commit as equivalent
> upstream commit. But this upstream change itself is dependent on a bunch
> of changes starting from 'commit 288c81ce4be7 ("PCI/MSI: Move code into a 
> separate directory")', which refactored msi driver into multiple parts.
> So another way of fixing this issue would be to backport these patches and
> get this issue implictly fixed.
> Kindly let me know if my current patch is not acceptable and in that case
> will it be fine if I backport the above mentioned msi driver refactoring
> patches to LST.

What would the real patch series look like?  How bad is the backports?
Try that out first please.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v4.14, v4.19, v5.4, v5.10, v5.15] igb: free up irq resources in device shutdown path.
  2024-03-29 13:11 ` Greg KH
@ 2024-04-22 22:32   ` imran.f.khan
  2024-04-22 23:13     ` Greg KH
  0 siblings, 1 reply; 6+ messages in thread
From: imran.f.khan @ 2024-04-22 22:32 UTC (permalink / raw)
  To: Greg KH; +Cc: stable

Hello Greg,


On 30/3/2024 12:11 am, Greg KH wrote:

> On Wed, Mar 13, 2024 at 02:07:13AM +1100, Imran Khan wrote:
>> [ Upstream commit 9fb9eb4b59acc607e978288c96ac7efa917153d4 ]
> No it is not.
>
>> systems, using igb driver, crash while executing poweroff command
>> as per following call stack:
>>
>> crash> bt -a
>> PID: 62583    TASK: ffff97ebbf28dc40  CPU: 0    COMMAND: "poweroff"
>>   #0 [ffffa7adcd64f8a0] machine_kexec at ffffffffa606c7c1
>>   #1 [ffffa7adcd64f900] __crash_kexec at ffffffffa613bb52
>>   #2 [ffffa7adcd64f9d0] panic at ffffffffa6099c45
>>   #3 [ffffa7adcd64fa50] oops_end at ffffffffa603359a
>>   #4 [ffffa7adcd64fa78] die at ffffffffa6033c32
>>   #5 [ffffa7adcd64faa8] do_trap at ffffffffa60309a0
>>   #6 [ffffa7adcd64faf8] do_error_trap at ffffffffa60311e7
>>   #7 [ffffa7adcd64fbc0] do_invalid_op at ffffffffa6031320
>>   #8 [ffffa7adcd64fbd0] invalid_op at ffffffffa6a01f2a
>>      [exception RIP: free_msi_irqs+408]
>>      RIP: ffffffffa645d248  RSP: ffffa7adcd64fc88  RFLAGS: 00010286
>>      RAX: ffff97eb1396fe00  RBX: 0000000000000000  RCX: ffff97eb1396fe00
>>      RDX: ffff97eb1396fe00  RSI: 0000000000000000  RDI: 0000000000000000
>>      RBP: ffffa7adcd64fcb0   R8: 0000000000000002   R9: 000000000000fbff
>>      R10: 0000000000000000  R11: 0000000000000000  R12: ffff98c047af4720
>>      R13: ffff97eb87cd32a0  R14: ffff97eb87cd3000  R15: ffffa7adcd64fd57
>>      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>>   #9 [ffffa7adcd64fc80] free_msi_irqs at ffffffffa645d0fc
>>   #10 [ffffa7adcd64fcb8] pci_disable_msix at ffffffffa645d896
>>   #11 [ffffa7adcd64fce0] igb_reset_interrupt_capability at ffffffffc024f335 [igb]
>>   #12 [ffffa7adcd64fd08] __igb_shutdown at ffffffffc0258ed7 [igb]
>>   #13 [ffffa7adcd64fd48] igb_shutdown at ffffffffc025908b [igb]
>>   #14 [ffffa7adcd64fd70] pci_device_shutdown at ffffffffa6441e3a
>>   #15 [ffffa7adcd64fd98] device_shutdown at ffffffffa6570260
>>   #16 [ffffa7adcd64fdc8] kernel_power_off at ffffffffa60c0725
>>   #17 [ffffa7adcd64fdd8] SYSC_reboot at ffffffffa60c08f1
>>   #18 [ffffa7adcd64ff18] sys_reboot at ffffffffa60c09ee
>>   #19 [ffffa7adcd64ff28] do_syscall_64 at ffffffffa6003ca9
>>   #20 [ffffa7adcd64ff50] entry_SYSCALL_64_after_hwframe at ffffffffa6a001b1
>>
>> This happens because igb_shutdown has not yet freed up allocated irqs and
>> free_msi_irqs finds irq_has_action true for involved msi irqs here and this
>> condition triggers BUG_ON.
>>
>> Freeing irqs before proceeding further in igb_clear_interrupt_scheme,
>> fixes this problem.
>>
>> Signed-off-by: Imran Khan <imran.f.khan@oracle.com>
>> ---
>>
>> This issue does not happen in v5.17 or later kernel versions because
>> 'commit 9fb9eb4b59ac ("PCI/MSI: Let core code free MSI descriptors")',
>> explicitly frees up MSI based irqs and hence indirectly fixes this issue
>> as well. Also this is why I have mentioned this commit as equivalent
>> upstream commit. But this upstream change itself is dependent on a bunch
>> of changes starting from 'commit 288c81ce4be7 ("PCI/MSI: Move code into a
>> separate directory")', which refactored msi driver into multiple parts.
>> So another way of fixing this issue would be to backport these patches and
>> get this issue implictly fixed.
>> Kindly let me know if my current patch is not acceptable and in that case
>> will it be fine if I backport the above mentioned msi driver refactoring
>> patches to LST.
> What would the real patch series look like?  How bad is the backports?
> Try that out first please.

Sorry for replying late on this.
I gave backport a try to get a clearer idea of all dependencies.

Some changes like'commit 288c81ce4be7 ("PCI/MSI: Move code into a
separate directory")' are refactoring the code into multiple files and
needed fix is based on this new code structure. This can be got around
without having to refactor the code in stable tree.
But so far the main problem that I see is that fix is utilizing the framework
for runtime extension of MSI-X irqs and some of the needed commits (like
'commit 654eb939a0dd ("PCI/MSI: Move msi_lock to struct pci_dev") are changing
struct device and struct pci_device and this will break kABI.

Could you kindly confirm if a backport that fixes an issue but breaks kernel ABI
is allowed in stable tree ?

thanks,
imran

>
> thanks,
>
> greg k-h

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v4.14, v4.19, v5.4, v5.10, v5.15] igb: free up irq resources in device shutdown path.
  2024-04-22 22:32   ` imran.f.khan
@ 2024-04-22 23:13     ` Greg KH
  2024-04-22 23:49       ` imran.f.khan
  0 siblings, 1 reply; 6+ messages in thread
From: Greg KH @ 2024-04-22 23:13 UTC (permalink / raw)
  To: imran.f.khan; +Cc: stable

On Tue, Apr 23, 2024 at 08:32:09AM +1000, imran.f.khan@oracle.com wrote:
> Could you kindly confirm if a backport that fixes an issue but breaks kernel ABI
> is allowed in stable tree ?

There is no such thing as a stable in-kernel api, so I don't understand
what you are asking here, sorry.

The only api that we ever care about is the user/kernel api, that can
not break.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v4.14, v4.19, v5.4, v5.10, v5.15] igb: free up irq resources in device shutdown path.
  2024-04-22 23:13     ` Greg KH
@ 2024-04-22 23:49       ` imran.f.khan
  2024-04-23 11:14         ` Greg KH
  0 siblings, 1 reply; 6+ messages in thread
From: imran.f.khan @ 2024-04-22 23:49 UTC (permalink / raw)
  To: Greg KH; +Cc: stable

Hello Greg,
Thanks for confirming and I am sorry for the confusion. I was referring to
kABI (Kernel Application binary Interface) which is a set of in-kernel symbols
used by drivers and other modules.
We (Oracle Linux) try to keep it unchanged so that external third party
kernel modules or drivers work without needing recompilation.

I understand now, that there is no such requirements in upstream.

thanks,
imran

On 23/4/2024 9:13 am, Greg KH wrote:

> On Tue, Apr 23, 2024 at 08:32:09AM +1000, imran.f.khan@oracle.com wrote:
>> Could you kindly confirm if a backport that fixes an issue but breaks kernel ABI
>> is allowed in stable tree ?
> There is no such thing as a stable in-kernel api, so I don't understand
> what you are asking here, sorry.
>
> The only api that we ever care about is the user/kernel api, that can
> not break.
>
> thanks,
>
> greg k-h

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v4.14, v4.19, v5.4, v5.10, v5.15] igb: free up irq resources in device shutdown path.
  2024-04-22 23:49       ` imran.f.khan
@ 2024-04-23 11:14         ` Greg KH
  0 siblings, 0 replies; 6+ messages in thread
From: Greg KH @ 2024-04-23 11:14 UTC (permalink / raw)
  To: imran.f.khan; +Cc: stable

On Tue, Apr 23, 2024 at 09:49:00AM +1000, imran.f.khan@oracle.com wrote:
> Hello Greg,
> Thanks for confirming and I am sorry for the confusion. I was referring to
> kABI (Kernel Application binary Interface) which is a set of in-kernel symbols
> used by drivers and other modules.
> We (Oracle Linux) try to keep it unchanged so that external third party
> kernel modules or drivers work without needing recompilation.

That is your business decision to do so, the community has no such crazy
restriction at all.

But the comunity does ask that you not top-post email replies, so
perhaps fix up your email client? :)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2024-04-23 11:15 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-03-12 15:07 [PATCH v4.14, v4.19, v5.4, v5.10, v5.15] igb: free up irq resources in device shutdown path Imran Khan
2024-03-29 13:11 ` Greg KH
2024-04-22 22:32   ` imran.f.khan
2024-04-22 23:13     ` Greg KH
2024-04-22 23:49       ` imran.f.khan
2024-04-23 11:14         ` Greg KH

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox