* Kernel panic with niu module
From: Dullfire @ 2024-11-04 11:34 UTC
To: davem, sparclinux, netdev, linux-pci
Hello,
I am working on a set of patches that address a panic on bind in the niu
module. However, none of the approaches I see integrate well with the kernel's
frameworks, so any feedback you could provide would be appreciated.
On sparcv9 systems (and possibly others), when the niu driver sets up the
MSIX IRQ vectors, a fatal trap[0] is encountered. I have done a number of
tests[1]. From these tests I believe that any read from a specific MSIX
table entry must come after a write to its PCI_MSIX_ENTRY_DATA field,
otherwise it will cause a fatal trap.
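The hypothesis above can be sketched as a small user-space C model. This is only an illustration, not kernel code: the struct and the `model_writel`/`model_readl` helpers are hypothetical names, and the "armed by a DATA write" behaviour is the assumption being tested, not documented hardware behaviour. The field offsets follow the standard 16-byte MSI-X table entry layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Byte offsets of the fields within one 16-byte MSI-X table entry. */
enum {
	MSIX_LOWER_ADDR  = 0x0,
	MSIX_UPPER_ADDR  = 0x4,
	MSIX_DATA        = 0x8,
	MSIX_VECTOR_CTRL = 0xc,
};

/* Hypothetical model of one table entry on the affected device. */
struct msix_entry_model {
	uint32_t regs[4];
	bool data_written;	/* assumption: set by any write to MSIX_DATA */
};

/* Model a 32-bit write; only a write to MSIX_DATA "arms" the entry. */
static void model_writel(struct msix_entry_model *e, unsigned int off,
			 uint32_t val)
{
	e->regs[off / 4] = val;
	if (off == MSIX_DATA)
		e->data_written = true;
}

/* Model a 32-bit read; returns false where the real device would trap. */
static bool model_readl(const struct msix_entry_model *e, unsigned int off,
			uint32_t *val)
{
	if (!e->data_written)
		return false;
	*val = e->regs[off / 4];
	return true;
}
```

In this model a write to any field other than MSIX_DATA leaves the entry unreadable, which is the behaviour the test list in [1] suggests.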
I see three types of approaches:
1) Add writes to the ENTRY_DATA field to niu before it calls into the
msi(x) code.
2) Adjust the MSIX code to either skip the read, or write to ENTRY_DATA first
3) Add a PCI quirk for this device to "initialize" the MSIX vector table.
Approach 1 encounters issues in needing to write to the MSIX table. The
functions needed to do this are internal to msi.c (or drivers/pci/msi/msi.h),
so they would have to either be reproduced in niu, or exposed in a public
header. Neither of those seems like a good approach to me.
Approach 2 can be done in a small amount of code, but it would either require
the addition of a struct pci_dev flag of some sort, or it would be invasive
to lots of other devices.
While approach 3 seems to be the most correct location, it suffers many of
the same issues as approach 1.
I have also bisected the kernel, and determined that upstream commit
7d5ec3d3612396dc6d4b76366d20ab9fc06f399f revealed this issue. This commit
adds a read of the mask status before any write to PCI_MSIX_ENTRY_DATA, thus
provoking the issue.
If you have any suggestions, please let me know.
Regards,
Jonathan Currier
[0] The trap looks like this:
-----------------------------------------------------------------------------
[ 25.166817] niu: niu.c:v1.1 (Apr 22, 2010)
[ 25.166952] niu 0001:04:00.0: enabling device (0144 -> 0146)
[ 25.174100] niu: niu0: Found PHY 002063b0 type MII at phy_port 26
[ 25.174559] niu: niu0: Found PHY 002063b0 type MII at phy_port 27
[ 25.175004] niu: niu0: Found PHY 002063b0 type MII at phy_port 28
[ 25.175449] niu: niu0: Found PHY 002063b0 type MII at phy_port 29
[ 25.176298] niu: niu0: Port 0 [4 RX chans] [6 TX chans]
[ 25.176405] niu: niu0: Port 1 [4 RX chans] [6 TX chans]
[ 25.176507] niu: niu0: Port 2 [4 RX chans] [6 TX chans]
[ 25.176548] niu: niu0: Port 3 [4 RX chans] [6 TX chans]
[ 25.176590] niu: niu0: Port 0 RDC tbl(0) [ 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 ]
[ 25.176757] niu: niu0: Port 0 RDC tbl(1) [ 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 ]
[ 25.176890] niu: niu0: Port 1 RDC tbl(2) [ 4 5 6 7 4 5 6 7 4 5 6 7 4 5 6 7 ]
[ 25.177053] niu: niu0: Port 1 RDC tbl(3) [ 4 5 6 7 4 5 6 7 4 5 6 7 4 5 6 7 ]
[ 25.177185] niu: niu0: Port 2 RDC tbl(4) [ 8 9 10 11 8 9 10 11 8 9 10 11 8 9 10 11 ]
[ 25.177349] niu: niu0: Port 2 RDC tbl(5) [ 8 9 10 11 8 9 10 11 8 9 10 11 8 9 10 11 ]
[ 25.177483] niu: niu0: Port 3 RDC tbl(6) [ 12 13 14 15 12 13 14 15 12 13 14 15 12 13 14 15 ]
[ 25.177649] niu: niu0: Port 3 RDC tbl(7) [ 12 13 14 15 12 13 14 15 12 13 14 15 12 13 14 15 ]
[ 25.245863] NON-RESUMABLE ERROR: Reporting on cpu 64
[ 25.245973] NON-RESUMABLE ERROR: TPC [0x00000000005f6900] <msix_prepare_msi_desc+0x90/0xa0>
[ 25.246106] NON-RESUMABLE ERROR: RAW [4010000000000016:00000e37f93e32ff:0000000202000080:ffffffffffffffff
[ 25.246215] NON-RESUMABLE ERROR: 0000000800000000:0000000000000000:0000000000000000:0000000000000000]
[ 25.246291] NON-RESUMABLE ERROR: handle [0x4010000000000016] stick [0x00000e37f93e32ff]
[ 25.246335] NON-RESUMABLE ERROR: type [precise nonresumable]
[ 25.246373] NON-RESUMABLE ERROR: attrs [0x02000080] < ASI sp-faulted priv >
[ 25.246435] NON-RESUMABLE ERROR: raddr [0xffffffffffffffff]
[ 25.246476] NON-RESUMABLE ERROR: insn effective address [0x000000c50020000c]
[ 25.246517] NON-RESUMABLE ERROR: size [0x8]
[ 25.246544] NON-RESUMABLE ERROR: asi [0x00]
[ 25.246573] CPU: 64 UID: 0 PID: 745 Comm: kworker/64:1 Not tainted 6.11.5 #63
[ 25.246625] Workqueue: events work_for_cpu_fn
[ 25.246671] TSTATE: 0000000011001602 TPC: 00000000005f6900 TNPC: 00000000005f6904 Y: 00000000 Not tainted
[ 25.246729] TPC: <msix_prepare_msi_desc+0x90/0xa0>
[ 25.246771] g0: 00000000000002e9 g1: 000000000000000c g2: 000000c50020000c g3: 0000000000000100
[ 25.246815] g4: ffff8000470307c0 g5: ffff800fec5be000 g6: ffff800047a08000 g7: 0000000000000000
[ 25.246861] o0: ffff800014feb000 o1: ffff800047a0b620 o2: 0000000000000011 o3: ffff800047a0b620
[ 25.246906] o4: 0000000000000080 o5: 0000000000000011 sp: ffff800047a0ad51 ret_pc: 00000000005f7128
[ 25.246951] RPC: <__pci_enable_msix_range+0x3cc/0x460>
[ 25.247004] l0: 000000000000000d l1: 000000000000c01f l2: ffff800014feb0a8 l3: 0000000000000020
[ 25.247049] l4: 000000000000c000 l5: 0000000000000001 l6: 0000000020000000 l7: ffff800047a0b734
[ 25.247094] i0: ffff800014feb000 i1: ffff800047a0b730 i2: 0000000000000001 i3: 000000000000000d
[ 25.247138] i4: 0000000000000000 i5: 0000000000000000 i6: ffff800047a0ae81 i7: 00000000101888b0
[ 25.247182] I7: <niu_try_msix.constprop.0+0xc0/0x130 [niu]>
[ 25.247321] Call Trace:
[ 25.247346] [<00000000101888b0>] niu_try_msix.constprop.0+0xc0/0x130 [niu]
[ 25.247442] [<000000001018f840>] niu_get_invariants+0x183c/0x207c [niu]
[ 25.247536] [<00000000101902fc>] niu_pci_init_one+0x27c/0x2fc [niu]
[ 25.247630] [<00000000005ef3e4>] local_pci_probe+0x28/0x74
[ 25.247677] [<0000000000469240>] work_for_cpu_fn+0x8/0x1c
[ 25.247726] [<000000000046b008>] process_scheduled_works+0x144/0x210
[ 25.247782] [<000000000046b518>] worker_thread+0x13c/0x1c0
[ 25.247833] [<00000000004710e0>] kthread+0xb8/0xc8
[ 25.247874] [<00000000004060c8>] ret_from_fork+0x1c/0x2c
[ 25.247931] [<0000000000000000>] 0x0
[ 25.247961] Kernel panic - not syncing: Non-resumable error.
-----------------------------------------------------------------------------
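The faulting address in the log can be related to the MSI-X table layout. The sketch below is a user-space illustration with made-up helper names. It maps the low bits of the "insn effective address" onto a field of a 16-byte table entry, and it assumes the table base is 16-byte aligned, which the log itself does not confirm.

```c
#include <stdint.h>

#define MSIX_ENTRY_SIZE 16u	/* bytes per MSI-X table entry */

/* Offset of an address within its table entry (assumes an aligned base). */
static unsigned int msix_field_offset(uint64_t addr)
{
	return (unsigned int)(addr % MSIX_ENTRY_SIZE);
}

/* Human-readable name of the field at that offset. */
static const char *msix_field_name(uint64_t addr)
{
	switch (msix_field_offset(addr)) {
	case 0x0: return "LOWER_ADDR";
	case 0x4: return "UPPER_ADDR";
	case 0x8: return "DATA";
	case 0xc: return "VECTOR_CTRL";
	default:  return "unaligned";
	}
}
```

For the address 0x000000c50020000c from the trap, the offset is 0xc, that is, the VECTOR_CTRL field, which is consistent with the trap being reported in msix_prepare_msi_desc().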
[1] Tests I have done (and their results)
All tests done on a T5240 (UltraSPARC T2).
In my test cases: niu driver tries to use up to 13 vectors (table size of 32 entries).
"SUCCESS" - Test case booted with functional networking
"FAILED" - Test case experienced a fatal trap after loading the niu module
- writing 0 to all of the MSIX table entries' ENTRY_VECTOR_CTRL: FAILED
- writing 0 to all of the MSIX table entries' ENTRY_LOWER_ADDR: FAILED
- writing 0 to all of the MSIX table entries' ENTRY_UPPER_ADDR: FAILED
- writing 0 to all of the MSIX table entries' ENTRY_DATA: SUCCESS
- writing 0 to only one MSIX table entry's ENTRY_DATA: FAILED
- writing 0 to only the first 1/2 of the MSIX table entries' ENTRY_DATA: SUCCESS
- writing ~0 to only the first 1/2 of the MSIX table entries' ENTRY_DATA: SUCCESS
- writing 0 to only the first 12 of the MSIX table entries' ENTRY_DATA: FAILED
- writing 0 to only the first 13 of the MSIX table entries' ENTRY_DATA: SUCCESS
- reading ENTRY_DATA before writing it: FAILED
- reading ENTRY_LOWER_ADDR before writing it: FAILED
- reading ENTRY_UPPER_ADDR before writing it: FAILED
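The ENTRY_DATA results above follow one simple rule, sketched here in user-space C. The function name and its parameters are hypothetical. It assumes the MSI-X setup code reads one table entry per requested vector, starting at entry 0, and that a read traps unless that entry's ENTRY_DATA was written earlier.

```c
#include <stdbool.h>

#define NIU_TABLE_SIZE 32u	/* table size from the report */
#define NIU_VECTORS    13u	/* vectors the niu driver requests */

/*
 * initialized: number of leading table entries whose ENTRY_DATA was
 * written before the driver bound. Returns true if the probe would trap.
 */
static bool probe_traps(unsigned int initialized, unsigned int nvec)
{
	for (unsigned int i = 0; i < nvec; i++) {
		if (i >= initialized)
			return true;	/* read of an untouched entry */
	}
	return false;
}
```

With 13 vectors, initializing 12 entries still traps while 13 does not, matching the boundary seen in the tests.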
* Re: Kernel panic with niu module
From: Bjorn Helgaas @ 2024-11-04 23:44 UTC
To: Dullfire; +Cc: davem, sparclinux, netdev, linux-pci, Thomas Gleixner
[+cc Thomas, author of 7d5ec3d36123 ("PCI/MSI: Mask all unused MSI-X
entries")]
On Mon, Nov 04, 2024 at 05:34:42AM -0600, Dullfire wrote:
> Hello,
>
> I am working on a set of patches that address a panic on bind in the niu
> module. However, none of the approaches I see integrate well with the kernel's
> frameworks, so any feedback you could provide would be appreciated.
>
> On sparcv9 systems (and possibly others), when the niu driver sets up the
> MSIX IRQ vectors, a fatal trap[0] is encountered. I have done a number of
> tests[1]. From these tests I believe that any read from a specific MSIX
> table entry must come after a write to its PCI_MSIX_ENTRY_DATA field,
> otherwise it will cause a fatal trap.
>
> I see three types of approaches:
> 1) Add writes to the ENTRY_DATA field to niu before it calls into the
> msi(x) code.
> 2) Adjust the MSIX code to either skip the read, or write to ENTRY_DATA first
> 3) Add a PCI quirk for this device to "initialize" the MSIX vector table.
>
> Approach 1 encounters issues in needing to write to the MSIX table. The
> functions needed to do this are internal to msi.c (or drivers/pci/msi/msi.h),
> so they would have to either be reproduced in niu, or exposed in a public
> header. Neither of those seems like a good approach to me.
>
> Approach 2 can be done in a small amount of code, but it would either require
> the addition of a struct pci_dev flag of some sort, or it would be invasive
> to lots of other devices.
>
> While approach 3 seems to be the most correct location, it suffers many of
> the same issues as approach 1.
>
> I have also bisected the kernel, and determined that upstream commit
> 7d5ec3d3612396dc6d4b76366d20ab9fc06f399f revealed this issue. This commit
> adds a read of the mask status before any write to PCI_MSIX_ENTRY_DATA, thus
> provoking the issue.
7d5ec3d36123 ("PCI/MSI: Mask all unused MSI-X entries") appeared in
v5.14 in 2021. Surely other drivers use MSI-X and would have been
tested on sparcv9 since then? Just based on the age of 7d5ec3d36123,
I would guess some kind of niu issue. But Thomas will know much more.
> If you have any suggestions, please let me know.
>
> Regards,
> Jonathan Currier
* Re: Kernel panic with niu module
From: Dullfire @ 2024-11-05 11:24 UTC
To: Bjorn Helgaas; +Cc: davem, sparclinux, netdev, linux-pci, Thomas Gleixner
On 11/4/24 17:44, Bjorn Helgaas wrote:
> [+cc Thomas, author of 7d5ec3d36123 ("PCI/MSI: Mask all unused MSI-X
> entries")]
>
> On Mon, Nov 04, 2024 at 05:34:42AM -0600, Dullfire wrote:
>> I have also bisected the kernel, and determined that upstream commit
>> 7d5ec3d3612396dc6d4b76366d20ab9fc06f399f revealed this issue. This commit
>> adds a read of the mask status before any write to PCI_MSIX_ENTRY_DATA, thus
>> provoking the issue.
>
> 7d5ec3d36123 ("PCI/MSI: Mask all unused MSI-X entries") appeared in
> v5.14 in 2021. Surely other drivers use MSI-X and would have been
> tested on sparcv9 since then? Just based on the age of 7d5ec3d36123,
> I would guess some kind of niu issue. But Thomas will know much more.
Yeah, I wasn't very clear: I believe this problem is specific to the niu
module. My suspicion is a hardware erratum and/or an issue in the built-in
hypervisor.
My T5240 has several other PCIe devices, none of which exhibit this issue.
I will have to check later if any use MSIX.
Speaking of test cases: it is worth pointing out that any write to ENTRY_DATA
appears to be sufficient to allow subsequent reads of that MSIX table entry
to work. Notably, booting into a pre-7d5ec3d36123 kernel and then rebooting
into a newer kernel will succeed, because the registers were written to under
the old kernel. Once a kernel had successfully initialized the device, I had
to power off the unit to reproduce the issue.
Regards,
Jonathan Currier
* Re: Kernel panic with niu module
From: Thomas Gleixner @ 2024-11-06 15:36 UTC
To: Bjorn Helgaas, Dullfire; +Cc: davem, sparclinux, netdev, linux-pci
On Mon, Nov 04 2024 at 17:44, Bjorn Helgaas wrote:
> On Mon, Nov 04, 2024 at 05:34:42AM -0600, Dullfire wrote:
>> I have also bisected the kernel, and determined that upstream commit
>> 7d5ec3d3612396dc6d4b76366d20ab9fc06f399f revealed this issue. This commit
>> adds a read of the mask status before any write to PCI_MSIX_ENTRY_DATA, thus
>> provoking the issue.
7d5ec3d36123 had the mask_all() invocation _before_ setting up the
entries and reading back the descriptors. So that commit cannot break
the niu device if your problem analysis is correct.
83dbf898a2d4 moved the mask_all() invocation after setting up MSI-X into
the success path to handle a bonkers Marvell NVMe device. That then
matches your problem description as the read precedes the write.
I've never heard of a similar problem, so I'm pretty sure that's truly
niu specific.
Thanks,
tglx
* Re: Kernel panic with niu module
From: Dullfire @ 2024-11-06 16:04 UTC
To: Thomas Gleixner, Bjorn Helgaas; +Cc: davem, sparclinux, netdev, linux-pci
> 7d5ec3d36123 had the mask_all() invocation _before_ setting up the
> entries and reading back the descriptors. So that commit cannot break
> the niu device if your problem analysis is correct.
In 7d5ec3d36123 (and later) msix_mask_all() only writes to
PCI_MSIX_ENTRY_VECTOR_CTRL. I have tried all the MSIX registers, and only
writes to PCI_MSIX_ENTRY_DATA were able to prevent a fatal trap on a read.
However the only write to PCI_MSIX_ENTRY_DATA I see is in
__pci_write_msi_msg() in 7d5ec3d36123, or pci_write_msg_msix() in 6.11.5.
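The point about msix_mask_all() can be illustrated with a user-space C model of a mask-all pass over an in-memory copy of the table. This is only a sketch: `model_mask_all` is a hypothetical name, the table is a plain array rather than device MMIO, and the mask bit value 0x1 is the standard MSI-X per-vector mask bit.

```c
#include <stddef.h>
#include <stdint.h>

#define MSIX_ENTRY_WORDS      4u	/* 16 bytes per entry, as 32-bit words */
#define MSIX_WORD_DATA        2u	/* byte offset 0x8 */
#define MSIX_WORD_VECTOR_CTRL 3u	/* byte offset 0xc */
#define MSIX_CTRL_MASKBIT     0x1u

/*
 * Model of masking every vector: only the VECTOR_CTRL word of each
 * entry is written; the address and DATA words are left untouched.
 */
static void model_mask_all(uint32_t *table, size_t nr_entries)
{
	for (size_t i = 0; i < nr_entries; i++)
		table[i * MSIX_ENTRY_WORDS + MSIX_WORD_VECTOR_CTRL] =
			MSIX_CTRL_MASKBIT;
}
```

Because no DATA word is written in such a pass, it would not satisfy the "write ENTRY_DATA first" condition regardless of whether it runs before or after the read.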
> 83dbf898a2d4 moved the mask_all() invocation after setting up MSI-X into
> the success path to handle a bonkers Marvell NVMe device. That then
> matches your problem description as the read precedes the write.
>
> I've never heard of a similar problem, so I'm pretty sure that's truly
> niu specific.
>
> Thanks,
>
> tglx
Regards,
Jonathan Currier
* Re: Kernel panic with niu module
From: Thomas Gleixner @ 2024-11-06 17:32 UTC
To: Dullfire, Bjorn Helgaas; +Cc: davem, sparclinux, netdev, linux-pci
On Wed, Nov 06 2024 at 10:04, dullfire@yahoo.com wrote:
>> 7d5ec3d36123 had the mask_all() invocation _before_ setting up the
>> entries and reading back the descriptors. So that commit cannot break
>> the niu device if your problem analysis is correct.
>
> In 7d5ec3d36123 (and later) msix_mask_all() only writes to
> PCI_MSIX_ENTRY_VECTOR_CTRL. I have tried all the MSIX registers, and only
> writes to PCI_MSIX_ENTRY_DATA were able to prevent a fatal trap on a read.
> However the only write to PCI_MSIX_ENTRY_DATA I see is in
> __pci_write_msi_msg() for 7d5ec3d36123, or pci_write_msg_msix(), in 6.11.5.
Yuck. They really went to great lengths to make this hard to handle.
Something like the obviously uncompiled below should work.
Thanks,
tglx
---
--- a/drivers/pci/msi/msi.c
+++ b/drivers/pci/msi/msi.c
@@ -611,6 +611,8 @@ void msix_prepare_msi_desc(struct pci_de
 	if (desc->pci.msi_attrib.can_mask) {
 		void __iomem *addr = pci_msix_desc_addr(desc);
 
+		if (dev->dev_flags & PCI_MSIX_TOUCH_ENTRY_DATA_FIRST)
+			writel(0x0, addr + PCI_MSIX_ENTRY_DATA);
 		desc->pci.msix_ctrl = readl(addr + PCI_MSIX_ENTRY_VECTOR_CTRL);
 	}
 }
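The effect of the flag-gated write can be tried out in a user-space C model. This is a sketch under stated assumptions, not the kernel implementation: `model_dev`, `model_entry`, `model_prepare_desc` and `MODEL_FLAG_TOUCH_ENTRY_DATA_FIRST` are hypothetical stand-ins for struct pci_dev, a table entry, msix_prepare_msi_desc() and the proposed PCI_MSIX_TOUCH_ENTRY_DATA_FIRST flag, and the trap is modelled as a `false` return.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the flag proposed in the patch. */
#define MODEL_FLAG_TOUCH_ENTRY_DATA_FIRST 0x1u

struct model_dev {
	unsigned int dev_flags;
};

struct model_entry {
	uint32_t data;
	uint32_t vector_ctrl;
	bool data_written;	/* assumed device state: DATA was written */
};

/*
 * Model of the descriptor preparation step: optionally write DATA first,
 * then read VECTOR_CTRL. Returns false where the device would trap.
 */
static bool model_prepare_desc(const struct model_dev *dev,
			       struct model_entry *e, uint32_t *msix_ctrl)
{
	if (dev->dev_flags & MODEL_FLAG_TOUCH_ENTRY_DATA_FIRST) {
		e->data = 0;
		e->data_written = true;
	}
	if (!e->data_written)
		return false;	/* read before any DATA write */
	*msix_ctrl = e->vector_ctrl;
	return true;
}
```

Gating the write on a per-device flag keeps the extra MMIO write away from devices that do not need it, which addresses the invasiveness concern raised for approach 2.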
* Re: Kernel panic with niu module
From: Dullfire @ 2024-11-06 22:12 UTC
To: Thomas Gleixner, Bjorn Helgaas; +Cc: davem, sparclinux, netdev, linux-pci
On 11/6/24 11:32, Thomas Gleixner wrote:
> On Wed, Nov 06 2024 at 10:04, dullfire@yahoo.com wrote:
>>> 7d5ec3d36123 had the mask_all() invocation _before_ setting up the
>>> entries and reading back the descriptors. So that commit cannot break
>>> the niu device if your problem analysis is correct.
>>
>> In 7d5ec3d36123 (and later) msix_mask_all() only writes to
>> PCI_MSIX_ENTRY_VECTOR_CTRL. I have tried all the MSIX registers, and only
>> writes to PCI_MSIX_ENTRY_DATA were able to prevent a fatal trap on a read.
>> However the only write to PCI_MSIX_ENTRY_DATA I see is in
>> __pci_write_msi_msg() for 7d5ec3d36123, or pci_write_msg_msix(), in 6.11.5.
>
> Yuck. They really went to great lengths to make this hard to handle.
>
> Something like the obviously uncompiled below should work.
>
> Thanks,
>
> tglx
> ---
> --- a/drivers/pci/msi/msi.c
> +++ b/drivers/pci/msi/msi.c
> @@ -611,6 +611,8 @@ void msix_prepare_msi_desc(struct pci_de
> if (desc->pci.msi_attrib.can_mask) {
> void __iomem *addr = pci_msix_desc_addr(desc);
>
> + if (dev->dev_flags & PCI_MSIX_TOUCH_ENTRY_DATA_FIRST)
> + writel(0x0, addr + PCI_MSIX_ENTRY_DATA);
> desc->pci.msix_ctrl = readl(addr + PCI_MSIX_ENTRY_VECTOR_CTRL);
> }
> }
>
Great. Thanks for the recommendation. That is similar to my first patch
approach. I had seen struct pci_dev's bit field members, but missed the
dev_flags member. I'll probably have a patch set out in the next few days,
mostly pending my schedule, and reviewing the patch submission process.
Regards,
Jonathan Currier