From: Ricky WU <ricky_wu@realtek.com>
To: Lukas Wunner <lukas@wunner.de>
Cc: "scott@spiteful.org" <scott@spiteful.org>,
"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
"kbusch@kernel.org" <kbusch@kernel.org>,
"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>,
"Mika Westerberg" <mika.westerberg@linux.intel.com>
Subject: RE: [bug report] nvme not re-recognize when changed M.2 SSD in S3 mode
Date: Wed, 29 May 2024 07:47:13 +0000 [thread overview]
Message-ID: <94918684e6a84122a6373ef45b3c117c@realtek.com> (raw)
In-Reply-To: <ZlR0grWLqO9nch8Q@wunner.de>
Hi Lukas,
We use SDEX card replace M.2 nvme card because we don't have enough M.2 nvme card
We tried this patch as below situation:
1.keep the SDEX card enter S3 then resume ->PASS
the video can continue playing, not see the msg "device replace during system sleep"
2. on S3 mode insert the SDEX card then resume -> PASS
Can identify the card and can see the msg "device replace during system sleep"
3.on S3 mode remove the SDEX card then resume -> PASS
No card show on system and can see the msg "device replace during system sleep"
4.replace card on S3 mode (different brand) ->PASS
Can identify the second card and can see the msg "device replace during system sleep"
5.replace card on S3 mode (same brand and same capacity) ->FAIL
Can not see the msg "device replace during system sleep"
Against 5 we found a issue, most card not declare capability "PCI_EXT_CAP_ID_DSN",
even if there is the capability the config space value are 0, cause pci_get_dsn() return 0 normally.
However these cards can show the SN on CrystalDiskInfo.
below is the card pci config space log and lsblk disk info.
Sandisk Corp Device 5007 (prog-if 02 [NVM Express]) this card can see the "PCI_EXT_CAP_ID_DSN" capability
But value is 0
--------------------------------------------------------------------------------------------------------------------
cr@cr-desktop:~$ sudo lspci -s 02:00.0 -vvv
02:00.0 Non-Volatile memory controller: Sandisk Corp Device 5007 (prog-if 02 [NVM Express])
Subsystem: Sandisk Corp Device 5007
Physical Slot: 8
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 16
Region 0: Memory at a3b00000 (64-bit, non-prefetchable) [size=16K]
Region 4: Memory at a3b04000 (64-bit, non-prefetchable) [size=256]
Capabilities: [80] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [90] MSI: Enable- Count=1/32 Maskable- 64bit+
Address: 0000000000000000 Data: 0000
Capabilities: [b0] MSI-X: Enable+ Count=17 Masked-
Vector table: BAR=0 offset=00002000
PBA: BAR=4 offset=00000000
Capabilities: [c0] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <1us, L1 unlimited
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
MaxPayload 256 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 8GT/s, Width x1, ASPM L1, Exit Latency L1 <8us
ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 8GT/s (ok), Width x1 (ok)
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range B, TimeoutDis+, NROPrPrP-, LTR+
10BitTagComp-, 10BitTagReq-, OBFF Not Supported, ExtFmt+, EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS-, TPHComp-, ExtTPHComp-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled
AtomicOpsCtl: ReqEn-
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+
EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
Capabilities: [100 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
Capabilities: [150 v1] Device Serial Number 00-00-00-00-00-00-00-00
Capabilities: [1b8 v1] Latency Tolerance Reporting
Max snoop latency: 3145728ns
Max no snoop latency: 3145728ns
Capabilities: [300 v1] Secondary PCI Express
LnkCtl3: LnkEquIntrruptEn-, PerformEqu-
LaneErrStat: 0
Capabilities: [900 v1] L1 PM Substates
L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- L1_PM_Substates+
PortCommonModeRestoreTime=32us PortTPowerOnTime=10us
L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
T_CommonMode=0us LTR1.2_Threshold=90112ns
L1SubCtl2: T_PwrOn=44us
Kernel driver in use: nvme
Kernel modules: nvme
--------------------------------------------------------------------------------------------------------------------
Here can see the Serial Number is "03969d74164000000000"
--------------------------------------------------------------------------------------------------------------------
cr@cr-desktop:~$ lsblk --nodeps -o name,size,type,mountpoint,serial
NAME SIZE TYPE MOUNTPOINT SERIAL
loop0 4K loop /snap/bare/5
loop1 55.7M loop /snap/core18/2785
loop2 55.7M loop /snap/core18/2790
loop3 63.5M loop /snap/core20/1974
loop4 91.7M loop /snap/gtk-common-themes/1535
loop5 53.3M loop /snap/snapd/19457
loop6 485.5M loop /snap/gnome-42-2204/120
loop7 65.1M loop /snap/gtk-common-themes/1515
loop8 73.9M loop /snap/core22/858
loop9 12.3M loop /snap/snap-store/959
loop10 485.5M loop /snap/gnome-42-2204/126
loop11 219M loop /snap/gnome-3-34-1804/72
loop12 51M loop /snap/snap-store/547
loop13 218.4M loop /snap/gnome-3-34-1804/93
loop14 63.9M loop /snap/sublime-text/122
loop15 321.1M loop /snap/vlc/3721
loop16 65.2M loop /snap/sublime-text/118
sda 232.9G disk 210350450505
nvme0n1 236.1G disk 03969d74164000000000
--------------------------------------------------------------------------------------------------------------------
we tried the 4 SDEX cards and 2 M.2 Nvme,
either it is not declared or the value is 0 all the cards I have.
RIcky
> Below please find a reworked patch which checks the Device Serial Number in
> addition to various other device identity information in config space.
>
> I've moved the check for a replaced device to the ->resume_noirq phase, i.e.
> before the device driver's first access to the device.
> I'm also marking the device removed to prevent the driver from accessing it.
>
> Does this fix the issue for you?
>
> If it does reliably detect a replaced device, could you also double-check the
> opposite case, i.e. if the device is *not* replaced during system sleep?
>
> I think this is about as much as we can do at the PCI layer to detect a replaced
> device. Drivers may have additional ways to identify a device (such as
> reading a WWID from some register) and we could consider providing a library
> function which drivers could call if they detect a replaced device on resume.
>
> If you set CONFIG_DYNAMIC_DEBUG=y and add the following to the command
> line...
>
> pciehp.pciehp_debug=1 dyndbg="file pciehp* +p" log_buf_len=10M
> ignore_loglevel
>
> ...you should see "device replaced during system sleep" messages on resume if
> a replaced device is detected.
>
> -- >8 --
>
> From: Lukas Wunner <lukas@wunner.de>
> Subject: [PATCH] PCI: pciehp: Detect device replacement during system sleep
>
> Ricky reports that replacing a device in a hotplug slot during ACPI sleep state
> S3 does not cause re-enumeration on resume, as one would expect. Instead,
> the new device is treated as if it was the old one.
>
> There is no bulletproof way to detect device replacement, but as a heuristic,
> check whether the device identity in config space matches cached data in
> struct pci_dev (Vendor ID, Device ID, Class Code, Revision ID, Subsystem
> Vendor ID, Subsystem ID). Additionally, cache and compare the Device Serial
> Number (PCIe r6.2 sec 7.9.3). If a mismatch is detected, mark the old device
> disconnected (to prevent its driver from accessing the new device) and
> synthesize a Presence Detect Changed event.
>
> Reported-by: Ricky Wu <ricky_wu@realtek.com>
> Closes:
> https://lore.kernel.org/r/a608b5930d0a48f092f717c0e137454b@realtek.com
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> ---
> drivers/pci/hotplug/pciehp.h | 4 ++++
> drivers/pci/hotplug/pciehp_core.c | 42
> ++++++++++++++++++++++++++++++++++++++-
> drivers/pci/hotplug/pciehp_hpc.c | 5 +++++
> drivers/pci/hotplug/pciehp_pci.c | 4 ++++
> 4 files changed, 54 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h index
> e0a614a..273dd8c 100644
> --- a/drivers/pci/hotplug/pciehp.h
> +++ b/drivers/pci/hotplug/pciehp.h
> @@ -46,6 +46,9 @@
> /**
> * struct controller - PCIe hotplug controller
> * @pcie: pointer to the controller's PCIe port service device
> + * @dsn: cached copy of Device Serial Number of Function 0 in the hotplug
> slot
> + * (PCIe r6.2 sec 7.9.3); used to determine whether a hotplugged device
> + * was replaced with a different one during system sleep
> * @slot_cap: cached copy of the Slot Capabilities register
> * @inband_presence_disabled: In-Band Presence Detect Disable supported
> by
> * controller and disabled per spec recommendation (PCIe r5.0,
> appendix I
> @@ -87,6 +90,7 @@
> */
> struct controller {
> struct pcie_device *pcie;
> + u64 dsn;
>
> u32 slot_cap; /* capabilities and
> quirks */
> unsigned int inband_presence_disabled:1; diff --git
> a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
> index ddd55ad..ff458e6 100644
> --- a/drivers/pci/hotplug/pciehp_core.c
> +++ b/drivers/pci/hotplug/pciehp_core.c
> @@ -284,6 +284,32 @@ static int pciehp_suspend(struct pcie_device *dev)
> return 0;
> }
>
> +static bool pciehp_device_replaced(struct controller *ctrl) {
> + struct pci_dev *pdev __free(pci_dev_put);
> + u32 reg;
> +
> + pdev = pci_get_slot(ctrl->pcie->port->subordinate, PCI_DEVFN(0, 0));
> + if (!pdev)
> + return true;
> +
> + if (pci_read_config_dword(pdev, PCI_VENDOR_ID, ®) ||
> + reg != (pdev->vendor | (pdev->device << 16)) ||
> + pci_read_config_dword(pdev, PCI_CLASS_REVISION, ®) ||
> + reg != (pdev->revision | (pdev->class << 8)))
> + return true;
> +
> + if (pdev->hdr_type == PCI_HEADER_TYPE_NORMAL &&
> + (pci_read_config_dword(pdev, PCI_SUBSYSTEM_VENDOR_ID,
> ®) ||
> + reg != (pdev->subsystem_vendor | (pdev->subsystem_device
> << 16))))
> + return true;
> +
> + if (pci_get_dsn(pdev) != ctrl->dsn)
> + return true;
> +
> + return false;
> +}
> +
> static int pciehp_resume_noirq(struct pcie_device *dev) {
> struct controller *ctrl = get_service_data(dev); @@ -293,9 +319,23
> @@ static int pciehp_resume_noirq(struct pcie_device *dev)
> ctrl->cmd_busy = true;
>
> /* clear spurious events from rediscovery of inserted card */
> - if (ctrl->state == ON_STATE || ctrl->state == BLINKINGOFF_STATE)
> + if (ctrl->state == ON_STATE || ctrl->state == BLINKINGOFF_STATE)
> + {
> pcie_clear_hotplug_events(ctrl);
>
> + /*
> + * If hotplugged device was replaced with a different one
> + * during system sleep, mark the old device disconnected
> + * (to prevent its driver from accessing the new device)
> + * and synthesize a Presence Detect Changed event.
> + */
> + if (pciehp_device_replaced(ctrl)) {
> + ctrl_dbg(ctrl, "device replaced during system
> sleep\n");
> + pci_walk_bus(ctrl->pcie->port->subordinate,
> + pci_dev_set_disconnected,
> NULL);
> + pciehp_request(ctrl, PCI_EXP_SLTSTA_PDC);
> + }
> + }
> +
> return 0;
> }
> #endif
> diff --git a/drivers/pci/hotplug/pciehp_hpc.c
> b/drivers/pci/hotplug/pciehp_hpc.c
> index b1d0a1b3..061f01f 100644
> --- a/drivers/pci/hotplug/pciehp_hpc.c
> +++ b/drivers/pci/hotplug/pciehp_hpc.c
> @@ -1055,6 +1055,11 @@ struct controller *pcie_init(struct pcie_device
> *dev)
> }
> }
>
> + pdev = pci_get_slot(subordinate, PCI_DEVFN(0, 0));
> + if (pdev)
> + ctrl->dsn = pci_get_dsn(pdev);
> + pci_dev_put(pdev);
> +
> return ctrl;
> }
>
> diff --git a/drivers/pci/hotplug/pciehp_pci.c b/drivers/pci/hotplug/pciehp_pci.c
> index ad12515..65e50be 100644
> --- a/drivers/pci/hotplug/pciehp_pci.c
> +++ b/drivers/pci/hotplug/pciehp_pci.c
> @@ -72,6 +72,10 @@ int pciehp_configure_device(struct controller *ctrl)
> pci_bus_add_devices(parent);
> down_read_nested(&ctrl->reset_lock, ctrl->depth);
>
> + dev = pci_get_slot(parent, PCI_DEVFN(0, 0));
> + ctrl->dsn = pci_get_dsn(dev);
> + pci_dev_put(dev);
> +
> out:
> pci_unlock_rescan_remove();
> return ret;
> --
> 2.43.0
next prev parent reply other threads:[~2024-05-29 7:48 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-04-09 5:44 [bug report] nvme not re-recognize when changed M.2 SSD in S3 mode Ricky WU
2024-04-09 13:16 ` Bjorn Helgaas
2024-04-10 10:07 ` Lukas Wunner
2024-04-10 14:58 ` Lukas Wunner
2024-04-25 6:02 ` Ricky WU
2024-05-27 11:54 ` Lukas Wunner
2024-05-29 7:47 ` Ricky WU [this message]
2024-05-29 14:43 ` Lukas Wunner
2024-05-15 7:31 ` Ricky WU
2024-05-15 8:46 ` Lukas Wunner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=94918684e6a84122a6373ef45b3c117c@realtek.com \
--to=ricky_wu@realtek.com \
--cc=kbusch@kernel.org \
--cc=linux-nvme@lists.infradead.org \
--cc=linux-pci@vger.kernel.org \
--cc=lukas@wunner.de \
--cc=mika.westerberg@linux.intel.com \
--cc=scott@spiteful.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox