* Fwd: Kernel 6.5.2 Causes Marvell Technology Group 88SE9128 PCIe SATA to Constantly Reset
@ 2023-09-13 11:25 Bagas Sanjaya
2023-09-13 14:23 ` Bjorn Helgaas
2023-09-13 15:12 ` Niklas Cassel
0 siblings, 2 replies; 12+ messages in thread
From: Bagas Sanjaya @ 2023-09-13 11:25 UTC (permalink / raw)
To: Damien Le Moal, Bjorn Helgaas, patenteng
Cc: Linux Kernel Mailing List, Linux Regressions,
Linux IDE and libata, Linux PCI
Hi,
I notice a regression report on Bugzilla [1]. Quoting from it:
> After upgrading to 6.5.2 from 6.4.12 I keep getting the following kernel messages around three times per second:
>
> [ 9683.269830] ata16: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> [ 9683.270399] ata16.00: configured for UDMA/66
>
> So I've tracked the offending device:
>
> ll /sys/class/ata_port/ata16
> lrwxrwxrwx 1 root root 0 Sep 10 21:51 /sys/class/ata_port/ata16 -> ../../devices/pci0000:00/0000:00:1c.7/0000:0a:00.0/ata16/ata_port/ata16
>
> cat /sys/bus/pci/devices/0000:0a:00.0/uevent
> DRIVER=ahci
> PCI_CLASS=10601
> PCI_ID=1B4B:9130
> PCI_SUBSYS_ID=1043:8438
> PCI_SLOT_NAME=0000:0a:00.0
> MODALIAS=pci:v00001B4Bd00009130sv00001043sd00008438bc01sc06i01
>
> lspci | grep 0a:00.0
> 0a:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9128 PCIe SATA 6 Gb/s RAID controller with HyperDuo (rev 11)
>
> I am not using the 88SE9128, so I have no way of knowing whether it works or not. It may simply be getting reset a couple of times per second or it may not function at all.
See Bugzilla for the full thread.
patenteng: I have asked you to bisect this regression. Any conclusion?
Anyway, I'm adding this regression to regzbot:
#regzbot: introduced: v6.4..v6.5 https://bugzilla.kernel.org/show_bug.cgi?id=217902
Thanks.
[1]: https://bugzilla.kernel.org/show_bug.cgi?id=217902
--
An old man doll... just what I always wanted! - Clara
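The report identifies the controller by following the ata16 sysfs symlink and reading the device's uevent file. As a rough illustration of that lookup, the sketch below pulls the owning PCI function out of the symlink target and splits the MODALIAS string into its ID fields. The two input strings are copied from the report; the helper names (pci_function_for_port, parse_modalias) are made up for this example and are not part of any kernel or library API.

```python
import re

# Both strings are copied verbatim from the quoted Bugzilla report.
SYSFS_TARGET = "../../devices/pci0000:00/0000:00:1c.7/0000:0a:00.0/ata16/ata_port/ata16"
MODALIAS = "pci:v00001B4Bd00009130sv00001043sd00008438bc01sc06i01"

# A PCI address has the form domain:bus:device.function, e.g. 0000:0a:00.0.
PCI_ADDR = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")

MODALIAS_RE = re.compile(
    r"pci:v([0-9A-F]{8})d([0-9A-F]{8})sv([0-9A-F]{8})sd([0-9A-F]{8})"
    r"bc([0-9A-F]{2})sc([0-9A-F]{2})i([0-9A-F]{2})"
)


def pci_function_for_port(target):
    """Return the last PCI address in the path: the function that owns the port."""
    addrs = [part for part in target.split("/") if PCI_ADDR.match(part)]
    if not addrs:
        raise ValueError("no PCI address found in %r" % target)
    return addrs[-1]


def parse_modalias(modalias):
    """Split a PCI modalias string into integer ID fields."""
    m = MODALIAS_RE.fullmatch(modalias)
    if m is None:
        raise ValueError("not a PCI modalias: %r" % modalias)
    keys = ("vendor", "device", "subvendor", "subdevice",
            "class", "subclass", "prog_if")
    return {key: int(value, 16) for key, value in zip(keys, m.groups())}


owner = pci_function_for_port(SYSFS_TARGET)
ids = parse_modalias(MODALIAS)
print(owner, "%04x:%04x" % (ids["vendor"], ids["device"]))
```

The class, subclass and prog-if fields (01, 06, 01) line up with the PCI_CLASS=10601 value in the same uevent output, which is the class code for an AHCI SATA controller.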
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Fwd: Kernel 6.5.2 Causes Marvell Technology Group 88SE9128 PCIe SATA to Constantly Reset
2023-09-13 11:25 Fwd: Kernel 6.5.2 Causes Marvell Technology Group 88SE9128 PCIe SATA to Constantly Reset Bagas Sanjaya
@ 2023-09-13 14:23 ` Bjorn Helgaas
2023-09-13 15:12 ` Niklas Cassel
1 sibling, 0 replies; 12+ messages in thread
From: Bjorn Helgaas @ 2023-09-13 14:23 UTC (permalink / raw)
To: Bagas Sanjaya
Cc: Damien Le Moal, Bjorn Helgaas, patenteng,
Linux Kernel Mailing List, Linux Regressions,
Linux IDE and libata, Linux PCI
On Wed, Sep 13, 2023 at 06:25:31PM +0700, Bagas Sanjaya wrote:
> I notice a regression report on Bugzilla [1]. Quoting from it:
>
> > After upgrading to 6.5.2 from 6.4.12 I keep getting the following kernel messages around three times per second:
> >
> > [ 9683.269830] ata16: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> > [ 9683.270399] ata16.00: configured for UDMA/66
> >
> > So I've tracked the offending device:
> >
> > ll /sys/class/ata_port/ata16
> > lrwxrwxrwx 1 root root 0 Sep 10 21:51 /sys/class/ata_port/ata16 -> ../../devices/pci0000:00/0000:00:1c.7/0000:0a:00.0/ata16/ata_port/ata16
> >
> > cat /sys/bus/pci/devices/0000:0a:00.0/uevent
> > DRIVER=ahci
> > PCI_CLASS=10601
> > PCI_ID=1B4B:9130
> > PCI_SUBSYS_ID=1043:8438
> > PCI_SLOT_NAME=0000:0a:00.0
> > MODALIAS=pci:v00001B4Bd00009130sv00001043sd00008438bc01sc06i01
> >
> > lspci | grep 0a:00.0
> > 0a:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9128 PCIe SATA 6 Gb/s RAID controller with HyperDuo (rev 11)
> >
> > I am not using the 88SE9128, so I have no way of knowing whether it works or not. It may simply be getting reset a couple of times per second or it may not function at all.
>
> See Bugzilla for the full thread.
>
> patenteng: I have asked you to bisect this regression. Any conclusion?
> ...
> [1]: https://bugzilla.kernel.org/show_bug.cgi?id=217902
Thanks for the heads-up.
I can't tell whether PCI is involved here. The bugzilla only mentions the SATA link, which is on the downstream side of the PCI SATA device.
If PCI is involved, e.g., if the PCI core reset the SATA device because of an error, there may be hints in the dmesg log. Can you attach the complete dmesg log and "sudo lspci -vv" output to the bugzilla?
Bjorn
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Fwd: Kernel 6.5.2 Causes Marvell Technology Group 88SE9128 PCIe SATA to Constantly Reset
2023-09-13 11:25 Fwd: Kernel 6.5.2 Causes Marvell Technology Group 88SE9128 PCIe SATA to Constantly Reset Bagas Sanjaya
2023-09-13 14:23 ` Bjorn Helgaas
@ 2023-09-13 15:12 ` Niklas Cassel
2023-09-15 3:22 ` David Gow
1 sibling, 1 reply; 12+ messages in thread
From: Niklas Cassel @ 2023-09-13 15:12 UTC (permalink / raw)
To: Bagas Sanjaya
Cc: Damien Le Moal, Bjorn Helgaas, patenteng,
Linux Kernel Mailing List, Linux Regressions,
Linux IDE and libata, Linux PCI
On Wed, Sep 13, 2023 at 06:25:31PM +0700, Bagas Sanjaya wrote:
> Hi,
>
> I notice a regression report on Bugzilla [1]. Quoting from it:
>
> > After upgrading to 6.5.2 from 6.4.12 I keep getting the following kernel messages around three times per second:
> >
> > [ 9683.269830] ata16: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> > [ 9683.270399] ata16.00: configured for UDMA/66
> >
> > So I've tracked the offending device:
> >
> > ll /sys/class/ata_port/ata16
> > lrwxrwxrwx 1 root root 0 Sep 10 21:51 /sys/class/ata_port/ata16 -> ../../devices/pci0000:00/0000:00:1c.7/0000:0a:00.0/ata16/ata_port/ata16
> >
> > cat /sys/bus/pci/devices/0000:0a:00.0/uevent
> > DRIVER=ahci
> > PCI_CLASS=10601
> > PCI_ID=1B4B:9130
> > PCI_SUBSYS_ID=1043:8438
> > PCI_SLOT_NAME=0000:0a:00.0
> > MODALIAS=pci:v00001B4Bd00009130sv00001043sd00008438bc01sc06i01
> >
> > lspci | grep 0a:00.0
> > 0a:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9128 PCIe SATA 6 Gb/s RAID controller with HyperDuo (rev 11)
> >
> > I am not using the 88SE9128, so I have no way of knowing whether it works or not. It may simply be getting reset a couple of times per second or it may not function at all.
>
> See Bugzilla for the full thread.
>
> patenteng: I have asked you to bisect this regression. Any conclusion?
>
> Anyway, I'm adding this regression to regzbot:
>
> #regzbot: introduced: v6.4..v6.5 https://bugzilla.kernel.org/show_bug.cgi?id=217902
Hello Bagas, patenteng,
FYI, the prints:
[ 9683.269830] ata16: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 9683.270399] ata16.00: configured for UDMA/66
Just show that ATA error handler has been invoked.
There was no reset performed.
If there was a reset, you would have seen something like:
[ 1.441326] ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 1.541250] ata8.00: configured for UDMA/133
[ 1.541411] ata8: hard resetting link
Could you please try this patch and see if it improves things for you:
https://lore.kernel.org/linux-ide/20230913150443.1200790-1-nks@flawful.org/T/#u
Kind regards,
Niklas
^ permalink raw reply [flat|nested] 12+ messages in thread
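The "SStatus 113" and "SStatus 123" values in the log lines quoted above are printed in hexadecimal and can be read field by field. The sketch below assumes the usual SATA SStatus layout (DET in bits 3:0, SPD in bits 7:4, IPM in bits 11:8); the lookup tables list only the common encodings and the function name decode_sstatus is invented for this example.

```python
# Field meanings follow the SATA SStatus register layout as commonly documented;
# only the frequently seen encodings are listed, so treat the tables as partial.
DET = {
    0x0: "no device detected",
    0x1: "device detected, PHY communication not established",
    0x3: "device detected, PHY communication established",
    0x4: "PHY offline",
}
SPD = {0x0: "no speed negotiated", 0x1: "1.5 Gbps", 0x2: "3.0 Gbps", 0x3: "6.0 Gbps"}
IPM = {0x0: "not present", 0x1: "active", 0x2: "partial", 0x6: "slumber"}


def decode_sstatus(hex_text):
    """Decode the hexadecimal SStatus value exactly as it appears in dmesg."""
    value = int(hex_text, 16)
    det = value & 0xF
    spd = (value >> 4) & 0xF
    ipm = (value >> 8) & 0xF
    return {
        "det": DET.get(det, "reserved (%#x)" % det),
        "spd": SPD.get(spd, "reserved (%#x)" % spd),
        "ipm": IPM.get(ipm, "reserved (%#x)" % ipm),
    }


for text in ("113", "123"):
    print(text, decode_sstatus(text))
```

Under that reading, 113 describes an active link negotiated at 1.5 Gbps and 123 an active link at 3.0 Gbps, which agrees with the speeds the kernel prints next to each value.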
* Re: Fwd: Kernel 6.5.2 Causes Marvell Technology Group 88SE9128 PCIe SATA to Constantly Reset
2023-09-13 15:12 ` Niklas Cassel
@ 2023-09-15 3:22 ` David Gow
2023-09-15 5:41 ` Damien Le Moal
0 siblings, 1 reply; 12+ messages in thread
From: David Gow @ 2023-09-15 3:22 UTC (permalink / raw)
To: Niklas Cassel, Bagas Sanjaya
Cc: Damien Le Moal, Bjorn Helgaas, patenteng,
Linux Kernel Mailing List, Linux Regressions,
Linux IDE and libata, Linux PCI
On 2023/09/13 at 23:12, Niklas Cassel wrote:
> Hello Bagas, patenteng,
>
> FYI, the prints:
> [ 9683.269830] ata16: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> [ 9683.270399] ata16.00: configured for UDMA/66
>
> Just show that ATA error handler has been invoked.
> There was no reset performed.
>
> If there was a reset, you would have seen something like:
> [ 1.441326] ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> [ 1.541250] ata8.00: configured for UDMA/133
> [ 1.541411] ata8: hard resetting link
>
> Could you please try this patch and see if it improves things for you:
> https://lore.kernel.org/linux-ide/20230913150443.1200790-1-nks@flawful.org/T/#u
>
FWIW, I'm seeing a very similar issue both in 6.5.2 and in git master [aed8aee11130 ("Merge tag 'pmdomain-v6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm")] with that patch applied.
The log is similar (the last two lines repeat several times a second):
[ 0.369632] ata14: SATA max UDMA/133 abar m2048@0xf7c10000 port 0xf7c10480 irq 33
[ 0.683693] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 1.031662] ata14.00: ATAPI: MARVELL VIRTUALL, 1.09, max UDMA/66
[ 1.031852] ata14.00: configured for UDMA/66
[ 1.414145] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 1.414505] ata14.00: configured for UDMA/66
[ 1.744094] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 1.744368] ata14.00: configured for UDMA/66
[ 2.073916] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 2.074276] ata14.00: configured for UDMA/66
lspci shows:
09:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9230 PCIe 2.0 x2 4-port SATA 6 Gb/s RAID Controller (rev 10) (prog-if 01 [AHCI 1.0])
        Subsystem: Gigabyte Technology Co., Ltd Device b000
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 33
        Region 0: I/O ports at b050 [size=8]
        Region 1: I/O ports at b040 [size=4]
        Region 2: I/O ports at b030 [size=8]
        Region 3: I/O ports at b020 [size=4]
        Region 4: I/O ports at b000 [size=32]
        Region 5: Memory at f7c10000 (32-bit, non-prefetchable) [size=2K]
        Expansion ROM at f7c00000 [disabled] [size=64K]
        Capabilities: <access denied>
        Kernel driver in use: ahci
The controller in question lives on a Gigabyte Z87X-UD5H-CF motherboard. I'm using the controller for several drives, and it's working, it's just spammy. (At worst, there's some performance hitching, but that might just be journald rotating logs as they fill up with the message).
I haven't had a chance to bisect yet (this is a slightly awkward machine for me to install test kernels on), but can also confirm it worked with 6.4.12.
Hopefully that's useful. I'll get back to you if I manage to bisect it.
Cheers,
-- David
^ permalink raw reply [flat|nested] 12+ messages in thread
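To put a number on "several times a second", the timestamps in the ata14 log above can be differenced. The sketch below is only an illustration: the log text is copied from the message, while the regular expression and the helper names (link_up_times, intervals) are assumptions made for this example.

```python
import re

# "SATA link up" lines for ata14, copied from the log in the message above.
LOG = """\
[ 0.683693] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 1.031852] ata14.00: configured for UDMA/66
[ 1.414145] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 1.414505] ata14.00: configured for UDMA/66
[ 1.744094] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 1.744368] ata14.00: configured for UDMA/66
[ 2.073916] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 2.074276] ata14.00: configured for UDMA/66
"""

LINK_UP = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(ata\d+): SATA link up")


def link_up_times(log, port):
    """Collect the timestamps of every 'SATA link up' line for one port."""
    times = []
    for line in log.splitlines():
        m = LINK_UP.match(line.strip())
        if m and m.group(2) == port:
            times.append(float(m.group(1)))
    return times


def intervals(times):
    """Return the gaps in seconds between consecutive timestamps."""
    return [later - earlier for earlier, later in zip(times, times[1:])]


times = link_up_times(LOG, "ata14")
gaps = intervals(times)
for gap in gaps:
    print("%.6f s" % gap)
```

The first gap (about 0.73 s) includes the initial probe of the device; the two that follow are both about 0.33 s, i.e. roughly three repeats per second, in line with the rate given in the original Bugzilla report.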
* Re: Fwd: Kernel 6.5.2 Causes Marvell Technology Group 88SE9128 PCIe SATA to Constantly Reset
2023-09-15 3:22 ` David Gow
@ 2023-09-15 5:41 ` Damien Le Moal
2023-09-15 6:54 ` David Gow
0 siblings, 1 reply; 12+ messages in thread
From: Damien Le Moal @ 2023-09-15 5:41 UTC (permalink / raw)
To: David Gow, Niklas Cassel, Bagas Sanjaya
Cc: Bjorn Helgaas, patenteng, Linux Kernel Mailing List,
Linux Regressions, Linux IDE and libata, Linux PCI
On 9/15/23 12:22, David Gow wrote:
> FWIW, I'm seeing a very similar issue both in 6.5.2 and in git master
> [aed8aee11130 ("Merge tag 'pmdomain-v6.6-rc1' of
> git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm") with that
> patch applied.
>
> The log is similar (the last two lines repeat several times a second):
> [ 0.369632] ata14: SATA max UDMA/133 abar m2048@0xf7c10000 port 0xf7c10480 irq 33
> [ 0.683693] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> [ 1.031662] ata14.00: ATAPI: MARVELL VIRTUALL, 1.09, max UDMA/66
> [ 1.031852] ata14.00: configured for UDMA/66
> [ 1.414145] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> [ 1.414505] ata14.00: configured for UDMA/66
> [ 1.744094] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> [ 1.744368] ata14.00: configured for UDMA/66
> [ 2.073916] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> [ 2.074276] ata14.00: configured for UDMA/66
>
> lspci shows:
> 09:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9230 PCIe 2.0 x2 4-port SATA 6 Gb/s RAID Controller (rev 10) (prog-if 01 [AHCI 1.0])
>         Subsystem: Gigabyte Technology Co., Ltd Device b000
>         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Latency: 0, Cache Line Size: 64 bytes
>         Interrupt: pin A routed to IRQ 33
>         Region 0: I/O ports at b050 [size=8]
>         Region 1: I/O ports at b040 [size=4]
>         Region 2: I/O ports at b030 [size=8]
>         Region 3: I/O ports at b020 [size=4]
>         Region 4: I/O ports at b000 [size=32]
>         Region 5: Memory at f7c10000 (32-bit, non-prefetchable) [size=2K]
>         Expansion ROM at f7c00000 [disabled] [size=64K]
>         Capabilities: <access denied>
>         Kernel driver in use: ahci
>
> The controller in question lives on a Gigabyte Z87X-UD5H-CF motherboard.
> I'm using the controller for several drives, and it's working, it's just
> spammy. (At worst, there's some performance hitching, but that might
> just be journald rotating logs as they fill up with the message).
>
> I haven't had a chance to bisect yet (this is a slightly awkward machine
> for me to install test kernels on), but can also confirm it worked with
> 6.4.12.
>
> Hopefully that's useful. I'll get back to you if I manage to bisect it.
Bisect will definitely be welcome. But first, please try adding the patch that Niklas mentioned above:
https://lore.kernel.org/linux-ide/20230913150443.1200790-1-nks@flawful.org/T/#u
If that fixes the issue, we know the culprit :)
>
> Cheers,
> -- David
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Fwd: Kernel 6.5.2 Causes Marvell Technology Group 88SE9128 PCIe SATA to Constantly Reset
2023-09-15 5:41 ` Damien Le Moal
@ 2023-09-15 6:54 ` David Gow
2023-09-15 7:00 ` Damien Le Moal
2023-09-15 8:50 ` Niklas Cassel
0 siblings, 2 replies; 12+ messages in thread
From: David Gow @ 2023-09-15 6:54 UTC (permalink / raw)
To: Damien Le Moal, Niklas Cassel, Bagas Sanjaya
Cc: Bjorn Helgaas, patenteng, Linux Kernel Mailing List,
Linux Regressions, Linux IDE and libata, Linux PCI
On 2023/09/15 at 13:41, Damien Le Moal wrote:
> Bisect will definitely be welcome. But first, please try adding the patch that
> Niklas mentioned above:
>
> https://lore.kernel.org/linux-ide/20230913150443.1200790-1-nks@flawful.org/T/#u
>
> If that fixes the issue, we know the culprit :)
>
Sorry: I wasn't clear. I did try with that patch (applied on top of torvalds/master), and the issue remained.
I've started bisecting, but fear it'll take a while.
Thanks,
-- David
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Fwd: Kernel 6.5.2 Causes Marvell Technology Group 88SE9128 PCIe SATA to Constantly Reset
2023-09-15 6:54 ` David Gow
@ 2023-09-15 7:00 ` Damien Le Moal
2023-09-15 8:50 ` Niklas Cassel
1 sibling, 0 replies; 12+ messages in thread
From: Damien Le Moal @ 2023-09-15 7:00 UTC (permalink / raw)
To: David Gow, Niklas Cassel, Bagas Sanjaya
Cc: Bjorn Helgaas, patenteng, Linux Kernel Mailing List,
Linux Regressions, Linux IDE and libata, Linux PCI
On 9/15/23 15:54, David Gow wrote:
> Sorry: I wasn't clear. I did try with that patch (applied on top of
> torvalds/master), and the issue remained.
>
> I've started bisecting, but fear it'll take a while.
OK. Thanks.
>
> Thanks,
> -- David
>
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Fwd: Kernel 6.5.2 Causes Marvell Technology Group 88SE9128 PCIe SATA to Constantly Reset
2023-09-15 6:54 ` David Gow
2023-09-15 7:00 ` Damien Le Moal
@ 2023-09-15 8:50 ` Niklas Cassel
2023-09-15 12:26 ` David Gow
1 sibling, 1 reply; 12+ messages in thread
From: Niklas Cassel @ 2023-09-15 8:50 UTC (permalink / raw)
To: David Gow
Cc: Damien Le Moal, Bagas Sanjaya, Bjorn Helgaas, patenteng,
Linux Kernel Mailing List, Linux Regressions,
Linux IDE and libata, Linux PCI

On Fri, Sep 15, 2023 at 02:54:19PM +0800, David Gow wrote:
> On 2023/09/15 at 13:41, Damien Le Moal wrote:
> > On 9/15/23 12:22, David Gow wrote:
> > > On 2023/09/13 at 23:12, Niklas Cassel wrote:
> > > > On Wed, Sep 13, 2023 at 06:25:31PM +0700, Bagas Sanjaya wrote:
> > > > > Hi,
> > > > >
> > > > > I notice a regression report on Bugzilla [1]. Quoting from it:
> > > > >
> > > > > > After upgrading to 6.5.2 from 6.4.12 I keep getting the following kernel messages around three times per second:
> > > > > >
> > > > > > [ 9683.269830] ata16: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> > > > > > [ 9683.270399] ata16.00: configured for UDMA/66
> > > > > >
> > > > > > So I've tracked the offending device:
> > > > > >
> > > > > > ll /sys/class/ata_port/ata16
> > > > > > lrwxrwxrwx 1 root root 0 Sep 10 21:51 /sys/class/ata_port/ata16 -> ../../devices/pci0000:00/0000:00:1c.7/0000:0a:00.0/ata16/ata_port/ata16
> > > > > >
> > > > > > cat /sys/bus/pci/devices/0000:0a:00.0/uevent
> > > > > > DRIVER=ahci
> > > > > > PCI_CLASS=10601
> > > > > > PCI_ID=1B4B:9130
> > > > > > PCI_SUBSYS_ID=1043:8438
> > > > > > PCI_SLOT_NAME=0000:0a:00.0
> > > > > > MODALIAS=pci:v00001B4Bd00009130sv00001043sd00008438bc01sc06i01
> > > > > >
> > > > > > lspci | grep 0a:00.0
> > > > > > 0a:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9128 PCIe SATA 6 Gb/s RAID controller with HyperDuo (rev 11)
> > > > > >
> > > > > > I am not using the 88SE9128, so I have no way of knowing whether it works or not. It may simply be getting reset a couple of times per second or it may not function at all.
> > > > >
> > > > > See Bugzilla for the full thread.
> > > > >
> > > > > patenteng: I have asked you to bisect this regression. Any conclusion?
> > > > >
> > > > > Anyway, I'm adding this regression to regzbot:
> > > > >
> > > > > #regzbot: introduced: v6.4..v6.5 https://bugzilla.kernel.org/show_bug.cgi?id=217902
> > > >
> > > > Hello Bagas, patenteng,
> > > >
> > > >
> > > > FYI, the prints:
> > > > [ 9683.269830] ata16: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> > > > [ 9683.270399] ata16.00: configured for UDMA/66
> > > >
> > > > Just show that ATA error handler has been invoked.
> > > > There was no reset performed.
> > > >
> > > > If there was a reset, you would have seen something like:
> > > > [ 1.441326] ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> > > > [ 1.541250] ata8.00: configured for UDMA/133
> > > > [ 1.541411] ata8: hard resetting link
> > > >
> > > >
> > > > Could you please try this patch and see if it improves things for you:
> > > > https://lore.kernel.org/linux-ide/20230913150443.1200790-1-nks@flawful.org/T/#u
> > > >
> > >
> > > FWIW, I'm seeing a very similar issue both in 6.5.2 and in git master
> > > [aed8aee11130 ("Merge tag 'pmdomain-v6.6-rc1' of
> > > git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm") with that
> > > patch applied.
> > >
> > >
> > > The log is similar (the last two lines repeat several times a second):
> > > [ 0.369632] ata14: SATA max UDMA/133 abar m2048@0xf7c10000 port
> > > 0xf7c10480 irq 33
> > > [ 0.683693] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> > > [ 1.031662] ata14.00: ATAPI: MARVELL VIRTUALL, 1.09, max UDMA/66
> > > [ 1.031852] ata14.00: configured for UDMA/66
> > > [ 1.414145] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> > > [ 1.414505] ata14.00: configured for UDMA/66
> > > [ 1.744094] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> > > [ 1.744368] ata14.00: configured for UDMA/66
> > > [ 2.073916] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> > > [ 2.074276] ata14.00: configured for UDMA/66
> > >
> > >
> > > lspci shows:
> > > 09:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9230 PCIe 2.0
> > > x2 4-port SATA 6 Gb/s RAID Controller (rev 10) (prog-if 01 [AHCI 1.0])
> > > Subsystem: Gigabyte Technology Co., Ltd Device b000
> > > Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> > > ParErr- Stepping- SERR- FastB2B- DisINTx+
> > > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> > > <TAbort- <MAbort- >SERR- <PERR- INTx-
> > > Latency: 0, Cache Line Size: 64 bytes
> > > Interrupt: pin A routed to IRQ 33
> > > Region 0: I/O ports at b050 [size=8]
> > > Region 1: I/O ports at b040 [size=4]
> > > Region 2: I/O ports at b030 [size=8]
> > > Region 3: I/O ports at b020 [size=4]
> > > Region 4: I/O ports at b000 [size=32]
> > > Region 5: Memory at f7c10000 (32-bit, non-prefetchable) [size=2K]
> > > Expansion ROM at f7c00000 [disabled] [size=64K]
> > > Capabilities: <access denied>
> > > Kernel driver in use: ahci
> > >
> > > The controller in question lives on a Gigabyte Z87X-UD5H-CF motherboard.
> > > I'm using the controller for several drives, and it's working, it's just
> > > spammy. (At worst, there's some performance hitching, but that might
> > > just be journald rotating logs as they fill up with the message).
> > >
> > > I haven't had a chance to bisect yet (this is a slightly awkward machine
> > > for me to install test kernels on), but can also confirm it worked with
> > > 6.4.12.
> > >
> > > Hopefully that's useful. I'll get back to you if I manage to bisect it.
> >
> > Bisect will definitely be welcome. But first, please try adding the patch that
> > Niklas mentioned above:
> >
> > https://lore.kernel.org/linux-ide/20230913150443.1200790-1-nks@flawful.org/T/#u
> >
> > If that fixes the issue, we know the culprit :)
> >
>
>
> Sorry: I wasn't clear. I did try with that patch (applied on top of
> torvalds/master), and the issue remained.
>
> I've started bisecting, but fear it'll take a while.

I can recommend using QEMU and PCI passthrough to bisect, as it is much
faster to boot a kernel using QEMU with KVM than to do a real reboot.

It takes a while to set up the first time, but you know what they say:
"give a man a fish and you feed him for a day;
teach a man to fish and you feed him for a lifetime".

There are many ways to do it, but here is an example guide:
https://github.com/floatious/qemu-bisect-doc

Kind regards,
Niklas

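[Editorial sketch, not part of the original message.] A bisect like the one discussed above goes faster when the good/bad verdict is automated. The snippet below is a minimal sketch of that idea, assuming the symptom is exactly the repeated "SATA link up" line quoted in this thread. The function names and the threshold of 3 are hypothetical choices for illustration and come from nobody in the thread.

```python
import re
from collections import Counter

# Matches libata lines such as:
#   [    1.414145] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
LINK_UP = re.compile(r"(ata\d+): SATA link up")


def link_up_counts(log_text):
    """Count 'SATA link up' messages per ATA port in a dmesg-style log."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LINK_UP.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def is_bad_kernel(log_text, threshold=3):
    """Hypothetical verdict: bad if any port reports link up more than
    `threshold` times. The threshold is an assumption; a healthy boot
    normally logs the message once per port."""
    return any(n > threshold for n in link_up_counts(log_text).values())
```

A wrapper script could feed this the captured kernel log and exit with status 1 for a bad kernel and 0 for a good one, which is the convention `git bisect run` expects.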
* Re: Fwd: Kernel 6.5.2 Causes Marvell Technology Group 88SE9128 PCIe SATA to Constantly Reset
2023-09-15 8:50 ` Niklas Cassel
@ 2023-09-15 12:26 ` David Gow
2023-09-15 16:20 ` Niklas Cassel
0 siblings, 1 reply; 12+ messages in thread
From: David Gow @ 2023-09-15 12:26 UTC (permalink / raw)
To: Niklas Cassel, Damien Le Moal
Cc: Bagas Sanjaya, Bjorn Helgaas, patenteng,
Linux Kernel Mailing List, Linux Regressions,
Linux IDE and libata, Linux PCI

On 2023/09/15 at 16:50, Niklas Cassel wrote:
> On Fri, Sep 15, 2023 at 02:54:19PM +0800, David Gow wrote:
>> On 2023/09/15 at 13:41, Damien Le Moal wrote:
>>> On 9/15/23 12:22, David Gow wrote:
>>>> On 2023/09/13 at 23:12, Niklas Cassel wrote:
>>>>> On Wed, Sep 13, 2023 at 06:25:31PM +0700, Bagas Sanjaya wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I notice a regression report on Bugzilla [1]. Quoting from it:
>>>>>>
>>>>>>> After upgrading to 6.5.2 from 6.4.12 I keep getting the following kernel messages around three times per second:
>>>>>>>
>>>>>>> [ 9683.269830] ata16: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>>>>>>> [ 9683.270399] ata16.00: configured for UDMA/66
>>>>>>>
>>>>>>> So I've tracked the offending device:
>>>>>>>
>>>>>>> ll /sys/class/ata_port/ata16
>>>>>>> lrwxrwxrwx 1 root root 0 Sep 10 21:51 /sys/class/ata_port/ata16 -> ../../devices/pci0000:00/0000:00:1c.7/0000:0a:00.0/ata16/ata_port/ata16
>>>>>>>
>>>>>>> cat /sys/bus/pci/devices/0000:0a:00.0/uevent
>>>>>>> DRIVER=ahci
>>>>>>> PCI_CLASS=10601
>>>>>>> PCI_ID=1B4B:9130
>>>>>>> PCI_SUBSYS_ID=1043:8438
>>>>>>> PCI_SLOT_NAME=0000:0a:00.0
>>>>>>> MODALIAS=pci:v00001B4Bd00009130sv00001043sd00008438bc01sc06i01
>>>>>>>
>>>>>>> lspci | grep 0a:00.0
>>>>>>> 0a:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9128 PCIe SATA 6 Gb/s RAID controller with HyperDuo (rev 11)
>>>>>>>
>>>>>>> I am not using the 88SE9128, so I have no way of knowing whether it works or not. It may simply be getting reset a couple of times per second or it may not function at all.
>>>>>>
>>>>>> See Bugzilla for the full thread.
>>>>>>
>>>>>> patenteng: I have asked you to bisect this regression. Any conclusion?
>>>>>>
>>>>>> Anyway, I'm adding this regression to regzbot:
>>>>>>
>>>>>> #regzbot: introduced: v6.4..v6.5 https://bugzilla.kernel.org/show_bug.cgi?id=217902
>>>>>
>>>>> Hello Bagas, patenteng,
>>>>>
>>>>>
>>>>> FYI, the prints:
>>>>> [ 9683.269830] ata16: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>>>>> [ 9683.270399] ata16.00: configured for UDMA/66
>>>>>
>>>>> Just show that ATA error handler has been invoked.
>>>>> There was no reset performed.
>>>>>
>>>>> If there was a reset, you would have seen something like:
>>>>> [ 1.441326] ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>>>>> [ 1.541250] ata8.00: configured for UDMA/133
>>>>> [ 1.541411] ata8: hard resetting link
>>>>>
>>>>>
>>>>> Could you please try this patch and see if it improves things for you:
>>>>> https://lore.kernel.org/linux-ide/20230913150443.1200790-1-nks@flawful.org/T/#u
>>>>>
>>>>
>>>> FWIW, I'm seeing a very similar issue both in 6.5.2 and in git master
>>>> [aed8aee11130 ("Merge tag 'pmdomain-v6.6-rc1' of
>>>> git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm") with that
>>>> patch applied.
>>>>
>>>>
>>>> The log is similar (the last two lines repeat several times a second):
>>>> [ 0.369632] ata14: SATA max UDMA/133 abar m2048@0xf7c10000 port
>>>> 0xf7c10480 irq 33
>>>> [ 0.683693] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>>>> [ 1.031662] ata14.00: ATAPI: MARVELL VIRTUALL, 1.09, max UDMA/66
>>>> [ 1.031852] ata14.00: configured for UDMA/66
>>>> [ 1.414145] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>>>> [ 1.414505] ata14.00: configured for UDMA/66
>>>> [ 1.744094] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>>>> [ 1.744368] ata14.00: configured for UDMA/66
>>>> [ 2.073916] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>>>> [ 2.074276] ata14.00: configured for UDMA/66
>>>>
>>>>
>>>> lspci shows:
>>>> 09:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9230 PCIe 2.0
>>>> x2 4-port SATA 6 Gb/s RAID Controller (rev 10) (prog-if 01 [AHCI 1.0])
>>>> Subsystem: Gigabyte Technology Co., Ltd Device b000
>>>> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
>>>> ParErr- Stepping- SERR- FastB2B- DisINTx+
>>>> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
>>>> <TAbort- <MAbort- >SERR- <PERR- INTx-
>>>> Latency: 0, Cache Line Size: 64 bytes
>>>> Interrupt: pin A routed to IRQ 33
>>>> Region 0: I/O ports at b050 [size=8]
>>>> Region 1: I/O ports at b040 [size=4]
>>>> Region 2: I/O ports at b030 [size=8]
>>>> Region 3: I/O ports at b020 [size=4]
>>>> Region 4: I/O ports at b000 [size=32]
>>>> Region 5: Memory at f7c10000 (32-bit, non-prefetchable) [size=2K]
>>>> Expansion ROM at f7c00000 [disabled] [size=64K]
>>>> Capabilities: <access denied>
>>>> Kernel driver in use: ahci
>>>>
>>>> The controller in question lives on a Gigabyte Z87X-UD5H-CF motherboard.
>>>> I'm using the controller for several drives, and it's working, it's just
>>>> spammy. (At worst, there's some performance hitching, but that might
>>>> just be journald rotating logs as they fill up with the message).
>>>>
>>>> I haven't had a chance to bisect yet (this is a slightly awkward machine
>>>> for me to install test kernels on), but can also confirm it worked with
>>>> 6.4.12.
>>>>
>>>> Hopefully that's useful. I'll get back to you if I manage to bisect it.
>>>
>>> Bisect will definitely be welcome. But first, please try adding the patch that
>>> Niklas mentioned above:
>>>
>>> https://lore.kernel.org/linux-ide/20230913150443.1200790-1-nks@flawful.org/T/#u
>>>
>>> If that fixes the issue, we know the culprit :)
>>>
>>
>>
>> Sorry: I wasn't clear. I did try with that patch (applied on top of
>> torvalds/master), and the issue remained.
>>
>> I've started bisecting, but fear it'll take a while.
>
> I can recommend using QEMU and PCI passthrough to bisect, as it is much
> faster to boot a kernel using QEMU with KVM than to do a real reboot.
>
> It takes a while to set up the first time, but you know what they say:
> "give a man a fish and you feed him for a day;
> teach a man to fish and you feed him for a lifetime".
>
> There are many ways to do it, but here is an example guide:
> https://github.com/floatious/qemu-bisect-doc
>

Thanks. Alas, this machine doesn't have an IOMMU, which makes that
difficult. I've definitely saved the link for the future, though.

In any case, the bisect is done:

624885209f31eb9985bf51abe204ecbffe2fdeea is the first bad commit
commit 624885209f31eb9985bf51abe204ecbffe2fdeea
Author: Damien Le Moal <dlemoal@kernel.org>
Date: Thu May 11 03:13:41 2023 +0200

scsi: core: Detect support for command duration limits

Introduce the function scsi_cdl_check() to detect if a device supports
command duration limits (CDL). Support for the READ 16, WRITE 16, READ 32
and WRITE 32 commands are checked using the function scsi_report_opcode()
to probe the rwcdlp and cdlp bits as they indicate the mode page defining
the command duration limits descriptors that apply to the command being
tested.

If any of these commands support CDL, the field cdl_supported of struct
scsi_device is set to 1 to indicate that the device supports CDL.

Support for CDL for a device is advertizes through sysfs using the new
cdl_supported device attribute. This attribute value is 1 for a device
supporting CDL and 0 otherwise.

Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Co-developed-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20230511011356.227789-9-nks@flawful.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

Documentation/ABI/testing/sysfs-block-device | 9 ++++
drivers/scsi/scsi.c | 81 ++++++++++++++++++++++++++++
drivers/scsi/scsi_scan.c | 3 ++
drivers/scsi/scsi_sysfs.c | 2 +
include/scsi/scsi_device.h | 3 ++
5 files changed, 98 insertions(+)


This seems to match what was found on the Arch Linux forums, too:
https://bbs.archlinux.org/viewtopic.php?id=288723&p=3

I haven't tried it yet, but according to that forum thread, removing the
calls to scsi_cdl_check() seems to resolve the issue. This is all well
beyond my SCSI knowledge, but maybe a quirk to disable these CDL checks for
these older marvell controllers is required? Though it seems odd that the
device would be rescanned and/or scsi_add_lun called multiple times a second
-- is that normal?

In any case, this seems to be the cause.

Thanks!
-- David

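[Editorial sketch, not part of the original message.] The commit message quoted above says CDL support is advertised through a new `cdl_supported` sysfs attribute whose value is 1 or 0. The snippet below is a rough sketch of how one might read it. The location `/sys/block/<disk>/device/cdl_supported` is an assumption inferred from the `sysfs-block-device` ABI file in the diffstat, and `read_cdl_supported` is a hypothetical helper name.

```python
from pathlib import Path


def read_cdl_supported(disk, sysfs_root="/sys/block"):
    """Return True or False for the cdl_supported attribute of `disk`,
    or None when the attribute cannot be read (older kernel, no such disk).

    The path layout is an assumption based on the ABI file named in the
    commit's diffstat; verify it against your kernel's documentation.
    """
    attr = Path(sysfs_root) / disk / "device" / "cdl_supported"
    try:
        return attr.read_text().strip() == "1"
    except OSError:
        return None
```

For example, `read_cdl_supported("sda")` would return None on a kernel that predates the attribute, which keeps the caller from confusing "attribute missing" with "CDL not supported".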
* Re: Fwd: Kernel 6.5.2 Causes Marvell Technology Group 88SE9128 PCIe SATA to Constantly Reset
2023-09-15 12:26 ` David Gow
@ 2023-09-15 16:20 ` Niklas Cassel
2023-09-16 2:21 ` David Gow
0 siblings, 1 reply; 12+ messages in thread
From: Niklas Cassel @ 2023-09-15 16:20 UTC (permalink / raw)
To: David Gow
Cc: Damien Le Moal, Bagas Sanjaya, Bjorn Helgaas, patenteng,
Linux Kernel Mailing List, Linux Regressions,
Linux IDE and libata, Linux PCI

On Fri, Sep 15, 2023 at 08:26:58PM +0800, David Gow wrote:
> In any case, the bisect is done:
>
> 624885209f31eb9985bf51abe204ecbffe2fdeea is the first bad commit
> commit 624885209f31eb9985bf51abe204ecbffe2fdeea
> Author: Damien Le Moal <dlemoal@kernel.org>
> Date: Thu May 11 03:13:41 2023 +0200
>
> scsi: core: Detect support for command duration limits
>
> Introduce the function scsi_cdl_check() to detect if a device supports
> command duration limits (CDL). Support for the READ 16, WRITE 16, READ 32
> and WRITE 32 commands are checked using the function scsi_report_opcode()
> to probe the rwcdlp and cdlp bits as they indicate the mode page defining
> the command duration limits descriptors that apply to the command being
> tested.
>
> If any of these commands support CDL, the field cdl_supported of struct
> scsi_device is set to 1 to indicate that the device supports CDL.
>
> Support for CDL for a device is advertizes through sysfs using the new
> cdl_supported device attribute. This attribute value is 1 for a device
> supporting CDL and 0 otherwise.
>
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
> Reviewed-by: Hannes Reinecke <hare@suse.de>
> Co-developed-by: Niklas Cassel <niklas.cassel@wdc.com>
> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
> Link: https://lore.kernel.org/r/20230511011356.227789-9-nks@flawful.org
> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
>
> Documentation/ABI/testing/sysfs-block-device | 9 ++++
> drivers/scsi/scsi.c | 81 ++++++++++++++++++++++++++++
> drivers/scsi/scsi_scan.c | 3 ++
> drivers/scsi/scsi_sysfs.c | 2 +
> include/scsi/scsi_device.h | 3 ++
> 5 files changed, 98 insertions(+)
>
>
> This seems to match what was found on the Arch Linux forums, too:
> https://bbs.archlinux.org/viewtopic.php?id=288723&p=3
>
> I haven't tried it yet, but according to that forum thread, removing the
> calls to scsi_cdl_check() seems to resolve the issue. This is all well
> beyond my SCSI knowledge, but maybe a quirk to disable these CDL checks for
> these older marvell controllers is required? Though it seems odd that the
> device would be rescanned and/or scsi_add_lun called multiple times a second
> -- is that normal?
>
> In any case, this seems to be the cause.

Hello David,

Thank you very much for your effort of bisecting this.

Could you please try this patch and see if it improves things for you:
https://lore.kernel.org/linux-scsi/20230915022034.678121-1-dlemoal@kernel.org/

Kind regards,
Niklas

* Re: Fwd: Kernel 6.5.2 Causes Marvell Technology Group 88SE9128 PCIe SATA to Constantly Reset
2023-09-15 16:20 ` Niklas Cassel
@ 2023-09-16 2:21 ` David Gow
2023-09-19 11:48 ` Linux regression tracking #update (Thorsten Leemhuis)
0 siblings, 1 reply; 12+ messages in thread
From: David Gow @ 2023-09-16 2:21 UTC (permalink / raw)
To: Niklas Cassel
Cc: Damien Le Moal, Bagas Sanjaya, Bjorn Helgaas, patenteng,
Linux Kernel Mailing List, Linux Regressions,
Linux IDE and libata, Linux PCI

On 2023/09/16 at 0:20, Niklas Cassel wrote:
> On Fri, Sep 15, 2023 at 08:26:58PM +0800, David Gow wrote:
>> In any case, the bisect is done:
>>
>> 624885209f31eb9985bf51abe204ecbffe2fdeea is the first bad commit
>> commit 624885209f31eb9985bf51abe204ecbffe2fdeea
>> Author: Damien Le Moal <dlemoal@kernel.org>
>> Date: Thu May 11 03:13:41 2023 +0200
>>
>> scsi: core: Detect support for command duration limits
>>
>> Introduce the function scsi_cdl_check() to detect if a device supports
>> command duration limits (CDL). Support for the READ 16, WRITE 16, READ 32
>> and WRITE 32 commands are checked using the function scsi_report_opcode()
>> to probe the rwcdlp and cdlp bits as they indicate the mode page defining
>> the command duration limits descriptors that apply to the command being
>> tested.
>>
>> If any of these commands support CDL, the field cdl_supported of struct
>> scsi_device is set to 1 to indicate that the device supports CDL.
>>
>> Support for CDL for a device is advertizes through sysfs using the new
>> cdl_supported device attribute. This attribute value is 1 for a device
>> supporting CDL and 0 otherwise.
>>
>> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
>> Reviewed-by: Hannes Reinecke <hare@suse.de>
>> Co-developed-by: Niklas Cassel <niklas.cassel@wdc.com>
>> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
>> Link: https://lore.kernel.org/r/20230511011356.227789-9-nks@flawful.org
>> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
>>
>> Documentation/ABI/testing/sysfs-block-device | 9 ++++
>> drivers/scsi/scsi.c | 81 ++++++++++++++++++++++++++++
>> drivers/scsi/scsi_scan.c | 3 ++
>> drivers/scsi/scsi_sysfs.c | 2 +
>> include/scsi/scsi_device.h | 3 ++
>> 5 files changed, 98 insertions(+)
>>
>>
>> This seems to match what was found on the Arch Linux forums, too:
>> https://bbs.archlinux.org/viewtopic.php?id=288723&p=3
>>
>> I haven't tried it yet, but according to that forum thread, removing the
>> calls to scsi_cdl_check() seems to resolve the issue. This is all well
>> beyond my SCSI knowledge, but maybe a quirk to disable these CDL checks for
>> these older marvell controllers is required? Though it seems odd that the
>> device would be rescanned and/or scsi_add_lun called multiple times a second
>> -- is that normal?
>>
>> In any case, this seems to be the cause.
>
> Hello David,
>
> Thank you very much for your effort of bisecting this.
>
> Could you please try this patch and see if it improves things for you:
> https://lore.kernel.org/linux-scsi/20230915022034.678121-1-dlemoal@kernel.org/
>

Thanks very much: this seems to fix it here (on top of torvalds/master).

Cheers,
-- David

* Re: Fwd: Kernel 6.5.2 Causes Marvell Technology Group 88SE9128 PCIe SATA to Constantly Reset
2023-09-16 2:21 ` David Gow
@ 2023-09-19 11:48 ` Linux regression tracking #update (Thorsten Leemhuis)
0 siblings, 0 replies; 12+ messages in thread
From: Linux regression tracking #update (Thorsten Leemhuis) @ 2023-09-19 11:48 UTC (permalink / raw)
To: Linux Kernel Mailing List
Cc: Linux Regressions, Linux IDE and libata, Linux PCI

[TLDR: This mail is primarily relevant for Linux regression tracking. A
change or fix related to the regression discussed in this thread was
posted or applied, but it did not use a Closes: tag to point to the
report, as Linus and the documentation call for. Things happen, no
worries -- but now the regression tracking bot needs to be told manually
about the fix. See link in footer if these mails annoy you.]

On 16.09.23 04:21, David Gow wrote:
> On 2023/09/16 at 0:20, Niklas Cassel wrote:
>> On Fri, Sep 15, 2023 at 08:26:58PM +0800, David Gow wrote:
>>> In any case, the bisect is done:
>>>
>>> 624885209f31eb9985bf51abe204ecbffe2fdeea is the first bad commit
>>> commit 624885209f31eb9985bf51abe204ecbffe2fdeea
>>> Author: Damien Le Moal <dlemoal@kernel.org>
>>> Date: Thu May 11 03:13:41 2023 +0200
> [...]
>> Thank you very much for your effort of bisecting this.
>>
>> Could you please try this patch and see if it improves things for you:
>> https://lore.kernel.org/linux-scsi/20230915022034.678121-1-dlemoal@kernel.org/
>
> Thanks very much: this seems to fix it here (on top of torvalds/master).

#regzbot: introduced: 624885209f31eb9
#regzbot fix: scsi: Do no try to probe for CDL on old drives
#regzbot monitor: https://lore.kernel.org/linux-scsi/20230915022034.678121-1-dlemoal@kernel.org/
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.
