* Possible nvme regression in 6.4.11
@ 2023-08-16 20:39 Genes Lists
2023-08-16 21:04 ` Keith Busch
2023-08-17 3:00 ` Bagas Sanjaya
0 siblings, 2 replies; 19+ messages in thread
From: Genes Lists @ 2023-08-16 20:39 UTC
To: linux-kernel; +Cc: kbusch, axboe, sagi, linux-nvme, hch
Also reported to bugzilla [1]
Failure happens on 1 laptop with samsung ssd.
Boot log manually transcribed:
kernel: nvme nvme0: controller is down; will reset: CSTS:0xffffffff, PCI_STATUS=0xffff
kernel: nvme nvme0: Does your device have a faulty power saving mode enabled?
kernel: nvme nvme0: try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" and report a bug
kernel: nvme 0000:04:00.0: Unable to change power state from D3cold to D0, device inaccessible
kernel: nvme nvme0: Disabling device after reset failure: -19
mount[353]: mount /sysroot: can't read superblock on /dev/nvme0n1p5.
mount[353]: dmesg(1) may have more information after failed mount system call.
kernel: nvme0n1: detected capacity change from 2000409264 to 0
kernel: EXT4-fs (nvme0n1p5): unable to read superblock
systemd[1]: sysroot.mount: Mount process exited, code=exited, status=32/n/a
...
All kernels are upstream, untainted and compiled on Arch using:
gcc version 13.2.1
Kernels Tested:
- 6.4.10 - works fine
- 6.4.11 - fails
- 6.5-rc6 - fails
- 6.4.11 + nvme_core.default_ps_max_latency_us=0 pcie_aspm=off - fails
- 6.4.11 with 1 revert below - fails
Revert "nvme-pci: add NVME_QUIRK_BOGUS_NID for Samsung PM9B1 256G and 512G"
This reverts commit 061fbf64825fb47367bbb6e0a528611f08119473.
Hardware:
model name : Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
stepping : 9
microcode : 0xf4
nvme:
04:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe
SSD Controller SM961/PM961/SM963
Subsystem: Samsung Electronics Co Ltd SM963 2.5" NVMe PCIe SSD
Flags: bus master, fast devsel, latency 0, IRQ 16, NUMA node 0
Memory at edb00000 (64-bit, non-prefetchable) [size=16K]
Capabilities: [40] Power Management version 3
Capabilities: [50] MSI: Enable- Count=1/32 Maskable- 64bit+
Capabilities: [70] Express Endpoint, MSI 00
Capabilities: [b0] MSI-X: Enable+ Count=33 Masked-
Kernel driver in use: nvme
Gene
[1] https://bugzilla.kernel.org/show_bug.cgi?id=217802
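Editorial note: the CSTS and PCI_STATUS values in the boot log above are all-ones reads. On PCIe that is typically what a read returns when the device has stopped responding on the bus, which fits the later "device inaccessible" message. A minimal sketch of such a check in standalone C follows; the helper names are hypothetical and this is not the nvme driver's own code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers: an all-ones value usually means the read never
 * reached a working device. */
static bool reg32_is_all_ones(uint32_t v) { return v == UINT32_MAX; }
static bool reg16_is_all_ones(uint16_t v) { return v == UINT16_MAX; }

/* True when both register values look like reads from a device that has
 * dropped off the bus (assumption: both registers read back all ones). */
static bool device_looks_inaccessible(uint32_t csts, uint16_t pci_status)
{
	return reg32_is_all_ones(csts) && reg16_is_all_ones(pci_status);
}
```

With the values from the log, device_looks_inaccessible(0xffffffff, 0xffff) is true, so the message reflects a controller that is unreachable rather than one reporting a real status.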
* Re: Possible nvme regression in 6.4.11
2023-08-16 20:39 Possible nvme regression in 6.4.11 Genes Lists
@ 2023-08-16 21:04 ` Keith Busch
2023-08-17 1:30 ` Genes Lists
2023-08-17 3:00 ` Bagas Sanjaya
1 sibling, 1 reply; 19+ messages in thread
From: Keith Busch @ 2023-08-16 21:04 UTC
To: Genes Lists; +Cc: linux-kernel, axboe, sagi, linux-nvme, hch
On Wed, Aug 16, 2023 at 04:39:34PM -0400, Genes Lists wrote:
> ...
> Kernels Tested:
> - 6.4.10 - works fine
> - 6.4.11 - fails
> - 6.5-rc6 - fails
> - 6.4.11 + nvme_core.default_ps_max_latency_us=0 pcie_aspm=off - fails
> - 6.4.11 with 1 revert below - fails
>
> Revert "nvme-pci: add NVME_QUIRK_BOGUS_NID for Samsung PM9B1 256G and
> 512G"
> This reverts commit 061fbf64825fb47367bbb6e0a528611f08119473.

It sounds like you can recreate this. Since .10 worked and .11 doesn't,
could you bisect the git commits? It looks like it will take 7 steps
between those two versions.

I don't think there are any nvme specific patches that could contribute
to what you're seeing, it's more likely some lower level platform patch
if a kernel change really did cause the regression. None of the recent
commits really stood out to me, so bisect is what I'd recommend.
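Editorial note: a bisect is a binary search over the commits between the good and the bad version, so the number of kernels to build and boot grows with the base-2 logarithm of the candidate count. An estimate of 7 steps is consistent with up to 128 candidates in an idealized linear history. The sketch below models that search in standalone C; the function name is hypothetical, and any commit count passed to it is illustrative, since the thread does not state how many commits separate 6.4.10 and 6.4.11.

```c
#include <stddef.h>

/*
 * Hypothetical model of a bisect over a linear history: commits 1..n follow
 * the last known-good commit, commit n is known bad, and every commit from
 * first_bad onward is bad. Returns the first bad commit found and stores the
 * number of build-and-boot tests in *steps.
 */
static size_t bisect_first_bad(size_t n, size_t first_bad, unsigned *steps)
{
	size_t lo = 1;	/* lowest commit that may still be the first bad one */
	size_t hi = n;	/* commit n is already known to be bad */

	*steps = 0;
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		(*steps)++;		/* one kernel built and booted */
		if (mid >= first_bad)	/* corresponds to "git bisect bad" */
			hi = mid;
		else			/* corresponds to "git bisect good" */
			lo = mid + 1;
	}
	return lo;
}
```

For any n up to 128 the loop runs at most 7 times, which is why a point release with on the order of a hundred patches can be bisected in an evening.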
* Re: Possible nvme regression in 6.4.11
2023-08-16 21:04 ` Keith Busch
@ 2023-08-17 1:30 ` Genes Lists
2023-08-17 9:16 ` Genes Lists
0 siblings, 1 reply; 19+ messages in thread
From: Genes Lists @ 2023-08-17 1:30 UTC
To: Keith Busch; +Cc: linux-kernel, axboe, sagi, linux-nvme, hch
On 8/16/23 17:04, Keith Busch wrote:
...
> It sounds like you can recreate this. Since .10 worked and .11 doesn't,
> could you bisect the git commits? It looks like it will take 7 steps
> between those two versions.
>
> I don't think there are any nvme specific patches that could contribute
> to what you're seeing, it's more likely some lower level platform patch
> if a kernel change really did cause the regression. None of the recent
> commits really stood out to me, so bisect is what I'd recommend.

Thank you

Bisect done - This is result:

----------------------------------------------------------------
69304c8d285b77c9a56d68f5ddb2558f27abf406 is the first bad commit
commit 69304c8d285b77c9a56d68f5ddb2558f27abf406
Author: Ricky WU <ricky_wu@realtek.com>
Date:   Tue Jul 25 09:10:54 2023 +0000

    misc: rtsx: judge ASPM Mode to set PETXCFG Reg

    commit 101bd907b4244a726980ee67f95ed9cafab6ff7a upstream.

    ASPM Mode is ASPM_MODE_CFG need to judge the value of clkreq_0
    to set HIGH or LOW, if the ASPM Mode is ASPM_MODE_REG
    always set to HIGH during the initialization.

    Cc: stable@vger.kernel.org
    Signed-off-by: Ricky Wu <ricky_wu@realtek.com>
    Link: https://lore.kernel.org/r/52906c6836374c8cb068225954c5543a@realtek.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

 drivers/misc/cardreader/rts5227.c  |  2 +-
 drivers/misc/cardreader/rts5228.c  | 18 ------------------
 drivers/misc/cardreader/rts5249.c  |  3 +--
 drivers/misc/cardreader/rts5260.c  | 18 ------------------
 drivers/misc/cardreader/rts5261.c  | 18 ------------------
 drivers/misc/cardreader/rtsx_pcr.c |  5 ++++-
 6 files changed, 6 insertions(+), 58 deletions(-)
------------------------------------------------------

And the machine does have this hardware:

03:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS525A PCI Express Card Reader (rev 01)
Subsystem: Dell RTS525A PCI Express Card Reader
Physical Slot: 1
Flags: bus master, fast devsel, latency 0, IRQ 141
Memory at ed100000 (32-bit, non-prefetchable) [size=4K]
Capabilities: [80] Power Management version 3
Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities: [b0] Express Endpoint, MSI 00
Kernel driver in use: rtsx_pci
Kernel modules: rtsx_pci
* Re: Possible nvme regression in 6.4.11
2023-08-17 1:30 ` Genes Lists
@ 2023-08-17 9:16 ` Genes Lists
2023-08-17 17:28 ` Keith Busch
2023-08-23 17:41 ` Keith Busch
0 siblings, 2 replies; 19+ messages in thread
From: Genes Lists @ 2023-08-17 9:16 UTC
To: Keith Busch
Cc: linux-kernel, axboe, sagi, linux-nvme, hch, arnd, ricky_wu, gregkh
On 8/16/23 21:30, Genes Lists wrote:
> ...
> Bisect done - This is result:
>
> ----------------------------------------------------------------
> 69304c8d285b77c9a56d68f5ddb2558f27abf406 is the first bad commit
> commit 69304c8d285b77c9a56d68f5ddb2558f27abf406
> Author: Ricky WU <ricky_wu@realtek.com>
> Date: Tue Jul 25 09:10:54 2023 +0000
>
> misc: rtsx: judge ASPM Mode to set PETXCFG Reg
> ...

Adding to CC list since bisect landed on
drivers/misc/cardreader/rtsx_pcr.c

Thread starts here: https://lkml.org/lkml/2023/8/16/1154

Thank you,
gene
* Re: Possible nvme regression in 6.4.11
2023-08-17 9:16 ` Genes Lists
@ 2023-08-17 17:28 ` Keith Busch
2023-08-17 17:43 ` Genes Lists
2023-08-23 17:41 ` Keith Busch
1 sibling, 1 reply; 19+ messages in thread
From: Keith Busch @ 2023-08-17 17:28 UTC
To: Genes Lists
Cc: linux-kernel, axboe, sagi, linux-nvme, hch, arnd, ricky_wu, gregkh
On Thu, Aug 17, 2023 at 05:16:01AM -0400, Genes Lists wrote:
> On 8/16/23 21:30, Genes Lists wrote:
> > ...
> > Bisect done - This is result:

Sounds like the driver's ASPM suspicion was justified, however the
recommended work-around doesn't appear to apply to this hardware.
Thanks for running the bisect!

> > ----------------------------------------------------------------
> > 69304c8d285b77c9a56d68f5ddb2558f27abf406 is the first bad commit
> > ...
* Re: Possible nvme regression in 6.4.11
2023-08-17 17:28 ` Keith Busch
@ 2023-08-17 17:43 ` Genes Lists
0 siblings, 0 replies; 19+ messages in thread
From: Genes Lists @ 2023-08-17 17:43 UTC
To: Keith Busch
Cc: linux-kernel, axboe, sagi, linux-nvme, hch, arnd, ricky_wu, gregkh
On 8/17/23 13:28, Keith Busch wrote:
...
>>> Bisect done - This is result:
>
> Sounds like the driver's ASPM suspicion was justified, however the
> recommended work-around doesn't appear to apply to this hardware.
> Thanks for running the bisect!

Happy to help :)

gene
* Re: Possible nvme regression in 6.4.11
2023-08-17 9:16 ` Genes Lists
2023-08-17 17:28 ` Keith Busch
@ 2023-08-23 17:41 ` Keith Busch
2023-08-23 20:25 ` Genes Lists
2023-08-24 11:29 ` Genes Lists
1 sibling, 2 replies; 19+ messages in thread
From: Keith Busch @ 2023-08-23 17:41 UTC
To: Genes Lists
Cc: linux-kernel, axboe, sagi, linux-nvme, hch, arnd, ricky_wu, gregkh
On Thu, Aug 17, 2023 at 05:16:01AM -0400, Genes Lists wrote:
> > ----------------------------------------------------------------
> > 69304c8d285b77c9a56d68f5ddb2558f27abf406 is the first bad commit
> > commit 69304c8d285b77c9a56d68f5ddb2558f27abf406
> > Author: Ricky WU <ricky_wu@realtek.com>
> > Date: Tue Jul 25 09:10:54 2023 +0000
> >
> > misc: rtsx: judge ASPM Mode to set PETXCFG Reg
> > ...
>
> Adding to CC list since bisect landed on
> drivers/misc/cardreader/rtsx_pcr.c
>
> Thread starts here: https://lkml.org/lkml/2023/8/16/1154

I realize you can work around this by blacklisting the rtsx_pci, but
that's not a pleasant solution. With only a few days left in 6.5, should
the commit just be reverted?
* Re: Possible nvme regression in 6.4.11
2023-08-23 17:41 ` Keith Busch
@ 2023-08-23 20:25 ` Genes Lists
[not found] ` <180a2bbd2c314ede8f6c4c16cc4603bf@realtek.com>
2023-08-24 11:29 ` Genes Lists
1 sibling, 1 reply; 19+ messages in thread
From: Genes Lists @ 2023-08-23 20:25 UTC
To: Keith Busch
Cc: linux-kernel, axboe, sagi, linux-nvme, hch, arnd, ricky_wu, gregkh
On 8/23/23 13:41, Keith Busch wrote:
> On Thu, Aug 17, 2023 at 05:16:01AM -0400, Genes Lists wrote:
>>> 69304c8d285b77c9a56d68f5ddb2558f27abf406 is the first bad commit
>>> ...
>> Adding to CC list since bisect landed on
>> drivers/misc/cardreader/rtsx_pcr.c
>>
>> Thread starts here: https://lkml.org/lkml/2023/8/16/1154
>
> I realize you can work around this by blacklisting the rtsx_pci, but
> that's not a pleasant solution. With only a few days left in 6.5, should
> the commit just be reverted?

Keith - thanks for reminder.

The card reader device itself is non-critical and very low priority.
What perhaps is a little more worrisome is the change in rtsx somehow
prevented nvme from functioning normally and the machine then not
booting (at least for some combination(s) of hardware).

If there is a simple fix to prevent nvme from being impacted by the rtsx
driver that would be more than sufficient?

On the other hand 6.4.11 is out, and I'm guessing there isn't a lot of
noise on this either. From what I've seen, 1 other user with same
problem [1] and 1 with same card reader not having a problem [2]. And
no 'me-too's in the kernel bugzilla [3] either.

Gene

[1] https://bbs.archlinux.org/viewtopic.php?id=288095
[2] https://bugs.archlinux.org/task/79439
[3] https://bugzilla.kernel.org/show_bug.cgi?id=217802
[parent not found: <180a2bbd2c314ede8f6c4c16cc4603bf@realtek.com>]
* Re: Possible nvme regression in 6.4.11
[not found] ` <180a2bbd2c314ede8f6c4c16cc4603bf@realtek.com>
@ 2023-08-24 9:48 ` Genes Lists
2023-08-24 10:22 ` Genes Lists
0 siblings, 1 reply; 19+ messages in thread
From: Genes Lists @ 2023-08-24 9:48 UTC
To: Ricky WU, Keith Busch
Cc: linux-kernel@vger.kernel.org, axboe@kernel.dk, sagi@grimberg.me, linux-nvme@lists.infradead.org, hch@lst.de, arnd@arndb.de, gregkh@linuxfoundation.org
On 8/23/23 22:44, Ricky WU wrote:
> Hi Gene,
>
> I can't reproduce this issue on my side...
>
> So if you only revert this patch (69304c8d285b77c9a56d68f5ddb2558f27abf406) can work fine?
> This patch only do is pull our clock request to HIGH if HOST need also can pull to LOW, and this only do on our device
> I don’t think this will affect other ports...
>
> BR,
> Ricky

Thanks Ricky - I will test reverting just that commit and report back. I
won't be able to get to it till later today (sometime after 2pm EDT) but
I will do it today.

FYI, I see one more report of someone experiencing same problem [1]

gene

[1] https://bugs.archlinux.org/task/79439
* Re: Possible nvme regression in 6.4.11
2023-08-24 9:48 ` Genes Lists
@ 2023-08-24 10:22 ` Genes Lists
[not found] ` <fa82d9dcbe83403abc644c20922b47f9@realtek.com>
0 siblings, 1 reply; 19+ messages in thread
From: Genes Lists @ 2023-08-24 10:22 UTC
To: Ricky WU, Keith Busch
Cc: linux-kernel@vger.kernel.org, axboe@kernel.dk, sagi@grimberg.me, linux-nvme@lists.infradead.org, hch@lst.de, arnd@arndb.de, gregkh@linuxfoundation.org
On 8/24/23 05:48, Genes Lists wrote:
> On 8/23/23 22:44, Ricky WU wrote:
>> ...
>> So if you only revert this patch
>> (69304c8d285b77c9a56d68f5ddb2558f27abf406) can work fine?
>> ...
>
> Thanks Ricky - I will test revering just that commit and report back.
> ...

That commit was what was reverted in the last step of the git bisect -
and indeed reverting that commit makes the problem go away and machine
then boots fine.

thanks

gene
[parent not found: <fa82d9dcbe83403abc644c20922b47f9@realtek.com>]
* Re: Possible nvme regression in 6.4.11
[not found] ` <fa82d9dcbe83403abc644c20922b47f9@realtek.com>
@ 2023-08-30 21:09 ` Genes Lists
2023-09-11 8:02 ` Linux regression tracking (Thorsten Leemhuis)
0 siblings, 1 reply; 19+ messages in thread
From: Genes Lists @ 2023-08-30 21:09 UTC
To: Ricky WU, Keith Busch
Cc: linux-kernel@vger.kernel.org, axboe@kernel.dk, sagi@grimberg.me, linux-nvme@lists.infradead.org, hch@lst.de, arnd@arndb.de, gregkh@linuxfoundation.org
...
> I think maybe it is a system power saving issue....
> In the past if the BIOS(config space) not set L1-substate, our driver will keep drive low CLKREQ# when HOST want to enter power saving state that make whole system not enter the power saving state.
> But this patch we release the CLKREQ# to HOST, make whole system can enter power saving state success when the HOST want to enter the power saving state, but I don't know why your system can not wake out success from power saving stat on the platform
>
> Ricky
>

Hi

Thanks for continuing to look into this. Can you share your thoughts on
best way to proceed going forward - do you plan to revert or something
else?

thanks

gene
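Editorial note: the behaviour Ricky describes can be modelled in a few lines of standalone C. The sketch below is an illustration, not the driver source. The flag and mode names are taken from the patch and commit message in this thread, the bit values and the enum are assumptions made for the example, and the "with commit" condition is modelled on the rts5227 hunk of the revert.

```c
#include <stdbool.h>
#include <stdint.h>

/* Names follow the patch in this thread; the bit values are illustrative. */
#define ASPM_L1_1_EN	(1u << 0)
#define ASPM_L1_2_EN	(1u << 1)
#define PM_L1_1_EN	(1u << 2)
#define PM_L1_2_EN	(1u << 3)

enum aspm_mode { ASPM_MODE_CFG, ASPM_MODE_REG };	/* values illustrative */
enum clkreq_level { CLKREQ_LOW, CLKREQ_HIGH };		/* hypothetical type */

/* force_clkreq_0 is set only when no L1 substate is enabled. */
static bool force_clkreq_0_from_flags(uint32_t dev_flags)
{
	uint32_t l1ss = ASPM_L1_1_EN | ASPM_L1_2_EN | PM_L1_1_EN | PM_L1_2_EN;

	return (dev_flags & l1ss) == 0;
}

/* With the bisected commit: CLKREQ# is driven low only in ASPM_MODE_CFG. */
static enum clkreq_level clkreq_with_commit(bool force_clkreq_0,
					    enum aspm_mode mode)
{
	return (force_clkreq_0 && mode == ASPM_MODE_CFG) ? CLKREQ_LOW
							  : CLKREQ_HIGH;
}

/* With the revert: CLKREQ# is driven low whenever force_clkreq_0 is set. */
static enum clkreq_level clkreq_with_revert(bool force_clkreq_0)
{
	return force_clkreq_0 ? CLKREQ_LOW : CLKREQ_HIGH;
}
```

Under these assumptions the two versions differ only when no L1 substate is enabled and the mode is ASPM_MODE_REG: the commit releases CLKREQ#, while the reverted code keeps it driven low. The thread does not say which mode the affected laptops use.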
* Re: Possible nvme regression in 6.4.11
2023-08-30 21:09 ` Genes Lists
@ 2023-09-11 8:02 ` Linux regression tracking (Thorsten Leemhuis)
2023-09-11 11:38 ` Thorsten Leemhuis
2023-09-11 15:41 ` Possible nvme regression in 6.4.11 Augusto Zanellato
0 siblings, 2 replies; 19+ messages in thread
From: Linux regression tracking (Thorsten Leemhuis) @ 2023-09-11 8:02 UTC
To: Genes Lists, Ricky WU, Keith Busch
Cc: linux-kernel@vger.kernel.org, axboe@kernel.dk, sagi@grimberg.me, linux-nvme@lists.infradead.org, hch@lst.de, arnd@arndb.de, gregkh@linuxfoundation.org, Linux kernel regressions list
Hi, Thorsten here, the Linux kernel's regression tracker.

On 30.08.23 23:09, Genes Lists wrote:
> ...
>> I think maybe it is a system power saving issue....
>> In the past if the BIOS(config space) not set L1-substate, our driver
>> will keep drive low CLKREQ# when HOST want to enter power saving state
>> that make whole system not enter the power saving state.
>> But this patch we release the CLKREQ# to HOST, make whole system can
>> enter power saving state success when the HOST want to enter the power
>> saving state, but I don't know why your system can not wake out
>> success from power saving stat on the platform
>
> Thanks for continuing to look into this. Can you share your thoughts
> on best way to proceed going forward - do you plan to revert or
> something else?

Hmmm. This looks like it fell through the cracks. Or am I missing something?

Anyway, 6.4.y will likely be EOL in a week or two. Which bears the
question: are 6.5.y and 6.6-rc1 working better for you? From the
bugzilla ticket (https://bugzilla.kernel.org/show_bug.cgi?id=217802) and
comments from others that are affected it sounds like that's not the
case. If that's how it is I guess it's overdue that the 101bd907b4244a
("misc: rtsx: judge ASPM Mode to set PETXCFG Reg") is reverted. Or am I
missing something?

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke
* Re: Possible nvme regression in 6.4.11
2023-09-11 8:02 ` Linux regression tracking (Thorsten Leemhuis)
@ 2023-09-11 11:38 ` Thorsten Leemhuis
2023-09-18 17:07 ` [Revert] " Jade Lovelace
2023-09-18 17:07 ` [PATCH] Revert "misc: rtsx: judge ASPM Mode to set PETXCFG Reg" Jade Lovelace
2023-09-11 15:41 ` Possible nvme regression in 6.4.11 Augusto Zanellato
1 sibling, 2 replies; 19+ messages in thread
From: Thorsten Leemhuis @ 2023-09-11 11:38 UTC
To: Genes Lists, Ricky WU, Keith Busch
Cc: linux-kernel@vger.kernel.org, axboe@kernel.dk, sagi@grimberg.me, linux-nvme@lists.infradead.org, hch@lst.de, arnd@arndb.de, gregkh@linuxfoundation.org, Linux kernel regressions list
On 11.09.23 10:02, Linux regression tracking (Thorsten Leemhuis) wrote:
> ...
> Anyway, 6.4.y will likely be EOL in a week or two. Which bears the
> question: are 6.5.y and 6.6-rc1 working better for you? From the
> bugzilla ticket (https://bugzilla.kernel.org/show_bug.cgi?id=217802) and
> comments from others that are affected it sounds like that's not the
> case. If that's how it is I guess it's overdue that the 101bd907b4244a
> ("misc: rtsx: judge ASPM Mode to set PETXCFG Reg") is reverted. Or am I
> missing something?

According to feedback in bugzilla.kernel.org 6.5.y is affected as well.
And openSUSE apparently reverted the culprit about a week ago due to the
problems it causes:
https://bugzilla.suse.com/show_bug.cgi?id=1214428

Guess that means we should do the same for mainline with a CC: stable@...
tag. Ricky WU, or do you have a better idea? Yes, from earlier in the
thread the root of the problem might not be in the patch you
contributed, but it exposes the problem, hence it should be reverted
unless a better solution can be found quickly. And that hasn't happened
in the past two weeks, hence it's afaics time for a revert.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.
* [Revert] Re: Possible nvme regression in 6.4.11
2023-09-11 11:38 ` Thorsten Leemhuis
@ 2023-09-18 17:07 ` Jade Lovelace
2023-09-18 17:07 ` [PATCH] Revert "misc: rtsx: judge ASPM Mode to set PETXCFG Reg" Jade Lovelace
1 sibling, 0 replies; 19+ messages in thread
From: Jade Lovelace @ 2023-09-18 17:07 UTC
To: Gene, Ricky WU, Keith Busch, Thorsten Leemhuis, linux-kernel
Cc: regressions, Alyssa Ross, Michal Suchanek, axboe@kernel.dk, sagi@grimberg.me, linux-nvme@lists.infradead.org, hch@lst.de, arnd@arndb.de, gregkh@linuxfoundation.org, stable
This regression affects all copies of the Dell XPS 15 9560 and Dell
Precision 5520 with any SSD including aftermarket ones.

Per the bugzilla discussion here:
https://bugzilla.kernel.org/show_bug.cgi?id=217802
this regression has been confirmed to also affect 6.5.2 and 6.6-rc1, and
affects several distros. Known affected branches: 6.1, 6.4, 6.5, 6.6.

It has already been reverted by OpenSUSE and is soon to be reverted in
NixOS.

A patch follows, hopefully with the right metadata tags. I have compiled
a kernel 6.1 with the revert and confirmed it now boots again.

p.s. this is my first time pointing git-send-email at this particular
list, so I'm sorry if I got anything wrong.

Jade
* [PATCH] Revert "misc: rtsx: judge ASPM Mode to set PETXCFG Reg"
2023-09-11 11:38 ` Thorsten Leemhuis
2023-09-18 17:07 ` [Revert] " Jade Lovelace
@ 2023-09-18 17:07 ` Jade Lovelace
1 sibling, 0 replies; 19+ messages in thread
From: Jade Lovelace @ 2023-09-18 17:07 UTC (permalink / raw)
To: Gene, Ricky WU, Keith Busch, Thorsten Leemhuis, linux-kernel
Cc: regressions, Alyssa Ross, Michal Suchanek, axboe @ kernel . dk, sagi @ grimberg . me, linux-nvme @ lists . infradead . org, hch@lst.de, arnd@arndb.de, gregkh@linuxfoundation.org, stable, Gene

This reverts commit 101bd907b4244a726980ee67f95ed9cafab6ff7a.

This commit causes the NVMe controller to not work on the Dell XPS 15
9560, and similar laptop models. It appears to happen with any SSD model.

This commit is broken on 6.1, 6.4, 6.5, and 6.6-rc1. OpenSUSE has already
reverted, and I have submitted a revert to NixOS. As far as I can tell,
this regression has fallen through the cracks.

Symptom:

kernel: nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0xffff
kernel: nvme nvme0: Does your device have a faulty power saving mode enabled?
kernel: nvme nvme0: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" and report a bug
kernel: nvme 0000:04:00.0: Unable to change power state from D3cold to D0, device inaccessible
kernel: nvme nvme0: Disabling device after reset failure: -19
systemd-cryptsetup[169]: Device /dev/disk/by-uuid/b80aedf8-ddd4-46fa-8d09-5215d5f286b9 READ lock released.
systemd-cryptsetup[169]: IO error while decrypting keyslot.
systemd-cryptsetup[169]: Keyslot 0 (luks2) open failed with -5.
systemd-cryptsetup[169]: Keyslot open failed.
systemd-cryptsetup[169]: Failed to activate with specified passphrase: Input/output error

There are several downstream bugs, these are the ones I know of:
- https://bugzilla.suse.com/show_bug.cgi?id=1214428
- https://github.com/NixOS/nixpkgs/issues/253418
- https://bugs.archlinux.org/task/79439#comment221866

Upstream revert links:
- https://github.com/openSUSE/kernel-source/commit/1b02b1528a26f4e9b577e215c114d8c5e773ee10
- https://github.com/NixOS/nixpkgs/pull/255824

Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217802
Reported-and-bisected-by: Gene <geneslists@sapience.com>
Link: https://lore.kernel.org/lkml/30b69186-5a6e-4f53-b24c-2221926fc3b4@sapience.com/
Signed-off-by: Jade Lovelace <lists@jade.fyi>
---
 drivers/misc/cardreader/rts5227.c  |  2 +-
 drivers/misc/cardreader/rts5228.c  | 18 ++++++++++++++++++
 drivers/misc/cardreader/rts5249.c  |  3 ++-
 drivers/misc/cardreader/rts5260.c  | 18 ++++++++++++++++++
 drivers/misc/cardreader/rts5261.c  | 18 ++++++++++++++++++
 drivers/misc/cardreader/rtsx_pcr.c |  5 +----
 6 files changed, 58 insertions(+), 6 deletions(-)

diff --git a/drivers/misc/cardreader/rts5227.c b/drivers/misc/cardreader/rts5227.c
index 3dae5e3a1697..d676cf63a966 100644
--- a/drivers/misc/cardreader/rts5227.c
+++ b/drivers/misc/cardreader/rts5227.c
@@ -195,7 +195,7 @@ static int rts5227_extra_init_hw(struct rtsx_pcr *pcr)
 		}
 	}

-	if (option->force_clkreq_0 && pcr->aspm_mode == ASPM_MODE_CFG)
+	if (option->force_clkreq_0)
 		rtsx_pci_add_cmd(pcr, WRITE_REG_CMD, PETXCFG,
 				FORCE_CLKREQ_DELINK_MASK, FORCE_CLKREQ_LOW);
 	else
diff --git a/drivers/misc/cardreader/rts5228.c b/drivers/misc/cardreader/rts5228.c
index f4ab09439da7..cfebad51d1d8 100644
--- a/drivers/misc/cardreader/rts5228.c
+++ b/drivers/misc/cardreader/rts5228.c
@@ -435,10 +435,17 @@ static void rts5228_init_from_cfg(struct rtsx_pcr *pcr)
 			option->ltr_enabled = false;
 		}
 	}
+
+	if (rtsx_check_dev_flag(pcr, ASPM_L1_1_EN | ASPM_L1_2_EN
+				| PM_L1_1_EN | PM_L1_2_EN))
+		option->force_clkreq_0 = false;
+	else
+		option->force_clkreq_0 = true;
 }

 static int rts5228_extra_init_hw(struct rtsx_pcr *pcr)
 {
+	struct rtsx_cr_option *option = &pcr->option;

 	rtsx_pci_write_register(pcr, RTS5228_AUTOLOAD_CFG1,
 			CD_RESUME_EN_MASK, CD_RESUME_EN_MASK);
@@ -469,6 +476,17 @@ static int rts5228_extra_init_hw(struct rtsx_pcr *pcr)
 	else
 		rtsx_pci_write_register(pcr, PETXCFG, 0x30, 0x00);

+	/*
+	 * If u_force_clkreq_0 is enabled, CLKREQ# PIN will be forced
+	 * to drive low, and we forcibly request clock.
+	 */
+	if (option->force_clkreq_0)
+		rtsx_pci_write_register(pcr, PETXCFG,
+				 FORCE_CLKREQ_DELINK_MASK, FORCE_CLKREQ_LOW);
+	else
+		rtsx_pci_write_register(pcr, PETXCFG,
+				 FORCE_CLKREQ_DELINK_MASK, FORCE_CLKREQ_HIGH);
+
 	rtsx_pci_write_register(pcr, PWD_SUSPEND_EN, 0xFF, 0xFB);

 	if (pcr->rtd3_en) {
diff --git a/drivers/misc/cardreader/rts5249.c b/drivers/misc/cardreader/rts5249.c
index 47ab72a43256..91d240dd68fa 100644
--- a/drivers/misc/cardreader/rts5249.c
+++ b/drivers/misc/cardreader/rts5249.c
@@ -327,11 +327,12 @@ static int rts5249_extra_init_hw(struct rtsx_pcr *pcr)
 		}
 	}

+
 	/*
 	 * If u_force_clkreq_0 is enabled, CLKREQ# PIN will be forced
 	 * to drive low, and we forcibly request clock.
 	 */
-	if (option->force_clkreq_0 && pcr->aspm_mode == ASPM_MODE_CFG)
+	if (option->force_clkreq_0)
 		rtsx_pci_write_register(pcr, PETXCFG,
 			FORCE_CLKREQ_DELINK_MASK, FORCE_CLKREQ_LOW);
 	else
diff --git a/drivers/misc/cardreader/rts5260.c b/drivers/misc/cardreader/rts5260.c
index 79b18f6f73a8..9b42b20a3e5a 100644
--- a/drivers/misc/cardreader/rts5260.c
+++ b/drivers/misc/cardreader/rts5260.c
@@ -517,10 +517,17 @@ static void rts5260_init_from_cfg(struct rtsx_pcr *pcr)
 			option->ltr_enabled = false;
 		}
 	}
+
+	if (rtsx_check_dev_flag(pcr, ASPM_L1_1_EN | ASPM_L1_2_EN
+				| PM_L1_1_EN | PM_L1_2_EN))
+		option->force_clkreq_0 = false;
+	else
+		option->force_clkreq_0 = true;
 }

 static int rts5260_extra_init_hw(struct rtsx_pcr *pcr)
 {
+	struct rtsx_cr_option *option = &pcr->option;

 	/* Set mcu_cnt to 7 to ensure data can be sampled properly */
 	rtsx_pci_write_register(pcr, 0xFC03, 0x7F, 0x07);
@@ -539,6 +546,17 @@ static int rts5260_extra_init_hw(struct rtsx_pcr *pcr)

 	rts5260_init_hw(pcr);

+	/*
+	 * If u_force_clkreq_0 is enabled, CLKREQ# PIN will be forced
+	 * to drive low, and we forcibly request clock.
+	 */
+	if (option->force_clkreq_0)
+		rtsx_pci_write_register(pcr, PETXCFG,
+				 FORCE_CLKREQ_DELINK_MASK, FORCE_CLKREQ_LOW);
+	else
+		rtsx_pci_write_register(pcr, PETXCFG,
+				 FORCE_CLKREQ_DELINK_MASK, FORCE_CLKREQ_HIGH);
+
 	rtsx_pci_write_register(pcr, pcr->reg_pm_ctrl3, 0x10, 0x00);

 	return 0;
diff --git a/drivers/misc/cardreader/rts5261.c b/drivers/misc/cardreader/rts5261.c
index 94af6bf8a25a..b1e76030cafd 100644
--- a/drivers/misc/cardreader/rts5261.c
+++ b/drivers/misc/cardreader/rts5261.c
@@ -498,10 +498,17 @@ static void rts5261_init_from_cfg(struct rtsx_pcr *pcr)
 			option->ltr_enabled = false;
 		}
 	}
+
+	if (rtsx_check_dev_flag(pcr, ASPM_L1_1_EN | ASPM_L1_2_EN
+				| PM_L1_1_EN | PM_L1_2_EN))
+		option->force_clkreq_0 = false;
+	else
+		option->force_clkreq_0 = true;
 }

 static int rts5261_extra_init_hw(struct rtsx_pcr *pcr)
 {
+	struct rtsx_cr_option *option = &pcr->option;
 	u32 val;

 	rtsx_pci_write_register(pcr, RTS5261_AUTOLOAD_CFG1,
@@ -547,6 +554,17 @@ static int rts5261_extra_init_hw(struct rtsx_pcr *pcr)
 	else
 		rtsx_pci_write_register(pcr, PETXCFG, 0x30, 0x00);

+	/*
+	 * If u_force_clkreq_0 is enabled, CLKREQ# PIN will be forced
+	 * to drive low, and we forcibly request clock.
+	 */
+	if (option->force_clkreq_0)
+		rtsx_pci_write_register(pcr, PETXCFG,
+				 FORCE_CLKREQ_DELINK_MASK, FORCE_CLKREQ_LOW);
+	else
+		rtsx_pci_write_register(pcr, PETXCFG,
+				 FORCE_CLKREQ_DELINK_MASK, FORCE_CLKREQ_HIGH);
+
 	rtsx_pci_write_register(pcr, PWD_SUSPEND_EN, 0xFF, 0xFB);

 	if (pcr->rtd3_en) {
diff --git a/drivers/misc/cardreader/rtsx_pcr.c b/drivers/misc/cardreader/rtsx_pcr.c
index a3f4b52bb159..32b7783e9d4f 100644
--- a/drivers/misc/cardreader/rtsx_pcr.c
+++ b/drivers/misc/cardreader/rtsx_pcr.c
@@ -1326,11 +1326,8 @@ static int rtsx_pci_init_hw(struct rtsx_pcr *pcr)
 		return err;
 	}

-	if (pcr->aspm_mode == ASPM_MODE_REG) {
+	if (pcr->aspm_mode == ASPM_MODE_REG)
 		rtsx_pci_write_register(pcr, ASPM_FORCE_CTL, 0x30, 0x30);
-		rtsx_pci_write_register(pcr, PETXCFG,
-				 FORCE_CLKREQ_DELINK_MASK, FORCE_CLKREQ_HIGH);
-	}

 	/* No CD interrupt if probing driver with card inserted.
 	 * So we need to initialize pcr->card_exist here.
--
2.42.0
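For readers skimming the diff, the decision it changes can be sketched as a small standalone C program. This is not kernel code: the enum, the helper names, and the numeric values chosen for the flag bits and the FORCE_CLKREQ_* constants are stand-ins assumed for illustration. Only the branch conditions are taken from the rts5227.c/rts5249.c hunks and from the restored init_from_cfg hunks above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in definitions (assumed values; the real ones live in the rtsx headers). */
enum aspm_mode { ASPM_MODE_CFG, ASPM_MODE_REG };

#define ASPM_L1_1_EN (1u << 0)
#define ASPM_L1_2_EN (1u << 1)
#define PM_L1_1_EN   (1u << 2)
#define PM_L1_2_EN   (1u << 3)

#define FORCE_CLKREQ_LOW  0x80u
#define FORCE_CLKREQ_HIGH 0x00u

/*
 * Restored in rts5228/rts5260/rts5261 init_from_cfg: CLKREQ# is forced
 * only when no L1 substate flag is set on the device.
 */
static bool force_clkreq_0_from_flags(uint32_t dev_flags)
{
	uint32_t l1ss = ASPM_L1_1_EN | ASPM_L1_2_EN | PM_L1_1_EN | PM_L1_2_EN;

	return (dev_flags & l1ss) == 0;
}

/* Condition in rts5227.c/rts5249.c while commit 101bd907b424 is applied. */
static uint8_t petxcfg_with_commit(bool force_clkreq_0, enum aspm_mode mode)
{
	if (force_clkreq_0 && mode == ASPM_MODE_CFG)
		return FORCE_CLKREQ_LOW;
	return FORCE_CLKREQ_HIGH;
}

/* Condition after the revert: the ASPM mode no longer takes part. */
static uint8_t petxcfg_after_revert(bool force_clkreq_0)
{
	return force_clkreq_0 ? FORCE_CLKREQ_LOW : FORCE_CLKREQ_HIGH;
}
```

Under these assumptions the two versions differ only for a reader in ASPM_MODE_REG with force_clkreq_0 set: the reverted commit selects FORCE_CLKREQ_HIGH there, while the restored code selects FORCE_CLKREQ_LOW.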
* Re: Possible nvme regression in 6.4.11
2023-09-11 8:02 ` Linux regression tracking (Thorsten Leemhuis)
2023-09-11 11:38 ` Thorsten Leemhuis
@ 2023-09-11 15:41 ` Augusto Zanellato
1 sibling, 0 replies; 19+ messages in thread
From: Augusto Zanellato @ 2023-09-11 15:41 UTC (permalink / raw)
To: Linux regressions mailing list, Genes Lists, Ricky WU, Keith Busch
Cc: linux-kernel@vger.kernel.org, axboe@kernel.dk, sagi@grimberg.me, linux-nvme@lists.infradead.org, hch@lst.de, arnd@arndb.de, gregkh@linuxfoundation.org

Hi, I'm also experiencing the issue described in this thread, just wanted
to chime in with regards to

> Anyway, 6.4.y will likely be EOL in a week or two. Which bears the
> question: are 6.5.y and 6.6-rc1 working better for you?

I can confirm that the issue is still happening and preventing correct
boot on both 6.5.2 (Arch Linux package version 6.5.2.arch1-1) and on
6.6-rc1 (Arch Linux package linux-mainline built from AUR).

For reference: the affected machine is a Dell XPS 15 9560 with the latest
firmware revision (1.31.0) as of time of writing this email, the NVMe
drive is a Sabrent Rocket4 1TB drive, but the issue also happens with the
OEM provided Toshiba XG4.

Thanks,
Augusto

PS: it's my first time here in LKML, feel free to tell me if I did
anything wrong :)
* Re: Possible nvme regression in 6.4.11
2023-08-23 17:41 ` Keith Busch
2023-08-23 20:25 ` Genes Lists
@ 2023-08-24 11:29 ` Genes Lists
1 sibling, 0 replies; 19+ messages in thread
From: Genes Lists @ 2023-08-24 11:29 UTC (permalink / raw)
To: Keith Busch
Cc: linux-kernel, axboe, sagi, linux-nvme, hch, arnd, ricky_wu, gregkh

On 8/23/23 13:41, Keith Busch wrote:
> On Thu, Aug 17, 2023 at 05:16:01AM -0400, Genes Lists wrote:
>>> ----------------------------------------------------------------
>>> 69304c8d285b77c9a56d68f5ddb2558f27abf406 is the first bad commit
>>> commit 69304c8d285b77c9a56d68f5ddb2558f27abf406
>>> Author: Ricky WU <ricky_wu@realtek.com>
>>> Date: Tue Jul 25 09:10:54 2023 +0000
>>>
>>> misc: rtsx: judge ASPM Mode to set PETXCFG Reg
>>>
>>> commit 101bd907b4244a726980ee67f95ed9cafab6ff7a upstream.
>>>
>>> ASPM Mode is ASPM_MODE_CFG need to judge the value of clkreq_0
>>> to set HIGH or LOW, if the ASPM Mode is ASPM_MODE_REG
>>> always set to HIGH during the initialization.
>>>
>>> Cc: stable@vger.kernel.org
>>> Signed-off-by: Ricky Wu <ricky_wu@realtek.com>
>>> Link:
>>> https://lore.kernel.org/r/52906c6836374c8cb068225954c5543a@realtek.com
>>> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>>
>>> drivers/misc/cardreader/rts5227.c | 2 +-
>>> drivers/misc/cardreader/rts5228.c | 18 ------------------
>>> drivers/misc/cardreader/rts5249.c | 3 +--
>>> drivers/misc/cardreader/rts5260.c | 18 ------------------
>>> drivers/misc/cardreader/rts5261.c | 18 ------------------
>>> drivers/misc/cardreader/rtsx_pcr.c | 5 ++++-
>>> 6 files changed, 6 insertions(+), 58 deletions(-)
>>>
>>> ------------------------------------------------------
>>>
>>> And the machine does have this hardware:
>>>
>>> 03:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS525A
>>> PCI Express Card Reader (rev 01)
>>> Subsystem: Dell RTS525A PCI Express Card Reader
>>> Physical Slot: 1
>>> Flags: bus master, fast devsel, latency 0, IRQ 141
>>> Memory at ed100000 (32-bit, non-prefetchable) [size=4K]
>>> Capabilities: [80] Power Management version 3
>>> Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit+
>>> Capabilities: [b0] Express Endpoint, MSI 00
>>> Kernel driver in use: rtsx_pci
>>> Kernel modules: rtsx_pci
>>>
>>>
>>>
>>
>>
>> Adding to CC list since bisect landed on
>>
>> drivers/misc/cardreader/rtsx_pcr.c
>>
>> Thread starts here: https://lkml.org/lkml/2023/8/16/1154
>
> I realize you can work around this by blacklisting the rtsx_pci, but
> that's not a pleasant solution. With only a few days left in 6.5, should
> the commit just be reverted?

Looks like there are more people having the same problem than I was aware
of earlier [1]. My recommendation now is to revert this.

thanks

gene

[1] https://bugs.archlinux.org/task/79439#comment221262
* Re: Possible nvme regression in 6.4.11
2023-08-16 20:39 Possible nvme regression in 6.4.11 Genes Lists
2023-08-16 21:04 ` Keith Busch
@ 2023-08-17 3:00 ` Bagas Sanjaya
2023-09-29 11:49 ` Linux regression tracking #update (Thorsten Leemhuis)
1 sibling, 1 reply; 19+ messages in thread
From: Bagas Sanjaya @ 2023-08-17 3:00 UTC (permalink / raw)
To: Genes Lists, linux-kernel, Ricky WU, Arnd Bergmann
Cc: kbusch, axboe, sagi, linux-nvme, hch, Linux Regressions

On Wed, Aug 16, 2023 at 04:39:34PM -0400, Genes Lists wrote:
>
> Also reported to bugzilla [1]
>
> Failure happens on 1 laptop with samsung ssd.
>
> Boot log manually transcribed:
>
> kernel: nvme nvme0: controller is down; will reset: CSTS:0xffffffff,
> PCI_STATUS=0xffff
> kernel: nvme nvme0: Does your device have a faulty power saving mode
> enabled?
> kernel: nvme nvme0: try "nvme_core.default_ps_max_latency_us=0
> pcie_aspm=off" and report a bug
> kernel: nvme 0000:04:00.0: Unable to change power state from D3cold to D0,
> device inaccessible
> kernel: nvme nvme0: Disabling device after reset failure: -19
> mount[353]: mount /sysroot: can't read suprtblock on /dev/nvme0n1p5.
> mount[353]: dmesg(1) may have more information after failed moutn
> system call.
> kernel: nvme0m1: detected capacity change from 2000409264 to 0
> kernel: EXT4-fs (nvme0n1p5): unable to read superblock
> systemd([1]: sysroot.mount: Mount process exited, code=exited, status=32/n/a
> ...
>
> All kernels are upstream, untainted and compiled on Arch using:
>
> gcc version 13.2.1
>
> Kernels Tested:
> - 6.4.10 - works fine
> - 6.4.11 - fails
> - 6.5-rc6 - fails
> - 6.4.11 + nvme_core.default_ps_max_latency_us=0 pcie_aspm=off - fails
> - 6.4.11 with 1 revert below - fails
>
> Revert "nvme-pci: add NVME_QUIRK_BOGUS_NID for Samsung PM9B1 256G and
> 512G"
> This reverts commit 061fbf64825fb47367bbb6e0a528611f08119473.
>
> Hardware:
> model name : Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
> stepping : 9
> microcode : 0xf4
>
> nvme:
> 04:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD
> Controller SM961/PM961/SM963
> Subsystem: Samsung Electronics Co Ltd SM963 2.5" NVMe PCIe SSD
> Flags: bus master, fast devsel, latency 0, IRQ 16, NUMA node 0
> Memory at edb00000 (64-bit, non-prefetchable) [size=16K]
> Capabilities: [40] Power Management version 3
> Capabilities: [50] MSI: Enable- Count=1/32 Maskable- 64bit+
> Capabilities: [70] Express Endpoint, MSI 00
> Capabilities: [b0] MSI-X: Enable+ Count=33 Masked-
> Kernel driver in use: nvme
>

Thanks for the regression report. I'm adding it to regzbot:

#regzbot ^introduced: 101bd907b4244a
#regzbot title: can't change Samsung SSD power state due to ASPM mode checking
#regzbot monitor: https://bugzilla.kernel.org/show_bug.cgi?id=217802

--
An old man doll... just what I always wanted! - Clara
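The CSTS and PCI_STATUS values in the quoted log (0xffffffff and 0xffff) are both all-ones patterns. Reads from a PCI device that has stopped responding commonly come back with every bit set, which fits the "device inaccessible" message that follows. A minimal sketch of such a check is shown here; the helper names are hypothetical and are not taken from the nvme driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true when a 32-bit register read has every bit set. */
static bool reads_all_ones32(uint32_t value)
{
	return value == UINT32_MAX;
}

/* Hypothetical helper: true when a 16-bit register read has every bit set. */
static bool reads_all_ones16(uint16_t value)
{
	return value == UINT16_MAX;
}

/*
 * Hypothetical heuristic: treat the device as unreachable when both the
 * controller status and the PCI status read back as all ones.
 */
static bool device_looks_unreachable(uint32_t csts, uint16_t pci_status)
{
	return reads_all_ones32(csts) && reads_all_ones16(pci_status);
}
```

With the values from the log, device_looks_unreachable(0xffffffff, 0xffff) evaluates to true, which is consistent with a device that dropped off the bus rather than a controller reporting a specific fault.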
* Re: Possible nvme regression in 6.4.11
2023-08-17 3:00 ` Bagas Sanjaya
@ 2023-09-29 11:49 ` Linux regression tracking #update (Thorsten Leemhuis)
0 siblings, 0 replies; 19+ messages in thread
From: Linux regression tracking #update (Thorsten Leemhuis) @ 2023-09-29 11:49 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-nvme, Linux Regressions

On 17.08.23 05:00, Bagas Sanjaya wrote:
> On Wed, Aug 16, 2023 at 04:39:34PM -0400, Genes Lists wrote:

> Thanks for the regression report. I'm adding it to regzbot:
>
> #regzbot ^introduced: 101bd907b4244a
> #regzbot title: can't change Samsung SSD power state due to ASPM mode checking
> #regzbot monitor: https://bugzilla.kernel.org/show_bug.cgi?id=217802

Fix is in Greg's tree and hopefully soon in mainline:

#regzbot fix: 0e4cac557531a4
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.
Thread overview: 19+ messages
2023-08-16 20:39 Possible nvme regression in 6.4.11 Genes Lists
2023-08-16 21:04 ` Keith Busch
2023-08-17 1:30 ` Genes Lists
2023-08-17 9:16 ` Genes Lists
2023-08-17 17:28 ` Keith Busch
2023-08-17 17:43 ` Genes Lists
2023-08-23 17:41 ` Keith Busch
2023-08-23 20:25 ` Genes Lists
[not found] ` <180a2bbd2c314ede8f6c4c16cc4603bf@realtek.com>
2023-08-24 9:48 ` Genes Lists
2023-08-24 10:22 ` Genes Lists
[not found] ` <fa82d9dcbe83403abc644c20922b47f9@realtek.com>
2023-08-30 21:09 ` Genes Lists
2023-09-11 8:02 ` Linux regression tracking (Thorsten Leemhuis)
2023-09-11 11:38 ` Thorsten Leemhuis
2023-09-18 17:07 ` [Revert] " Jade Lovelace
2023-09-18 17:07 ` [PATCH] Revert "misc: rtsx: judge ASPM Mode to set PETXCFG Reg" Jade Lovelace
2023-09-11 15:41 ` Possible nvme regression in 6.4.11 Augusto Zanellato
2023-08-24 11:29 ` Genes Lists
2023-08-17 3:00 ` Bagas Sanjaya
2023-09-29 11:49 ` Linux regression tracking #update (Thorsten Leemhuis)