* `nvme_disable_ctrl()` takes 411 ms on a Dell XPS 13 with SK hynix PC300 NVMEe
@ 2024-05-01 4:29 Paul Menzel
2024-05-01 4:51 ` Christoph Hellwig
0 siblings, 1 reply; 10+ messages in thread
From: Paul Menzel @ 2024-05-01 4:29 UTC (permalink / raw)
To: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg; +Cc: linux-nvme
[-- Attachment #1: Type: text/plain, Size: 1895 bytes --]
Dear Linux folks,
On the Dell XPS 13 9360 with a PC300 NVMe SK hynix 512GB
$ sudo dmesg | grep -e "DMI:" -e "Linux version" -e microcode
[ 0.000000] Linux version 6.9.0-rc6-00046-g18daea77cca6
(build@bohemianrhapsody.molgen.mpg.de) (gcc (Debian 13.2.0-23) 13.2.0,
GNU ld (GNU Binutils for Debian) 2.42) #127 SMP PREEMPT_DYNAMIC Tue Apr
30 21:55:07 CEST 2024
[ 0.000000] DMI: Dell Inc. XPS 13 9360/0596KF, BIOS 2.21.0
06/02/2022
[ 0.370665] microcode: Current revision: 0x000000f4
[ 0.370667] microcode: Updated early from: 0x000000f0
[ 794.810669] Calling microcode_bsp_resume+0x0/0xd0
[ 989.533146] Calling microcode_bsp_resume+0x0/0xd0
$ lsblk -N
NAME TYPE MODEL SERIAL REV
TRAN RQ-SIZE MQ
nvme0n1 disk PC300 NVMe SK hynix 512GB FS6BN01071010B54P 20004A00
nvme 1023 4
$ sudo nvme list
Node Generic SN
Model Namespace Usage
Format FW Rev
--------------------- --------------------- --------------------
---------------------------------------- ----------
-------------------------- ---------------- --------
/dev/nvme0n1 /dev/ng0n1 FS6BN01071010B54P
PC300 NVMe SK hynix 512GB 0x1 512.11 GB / 512.11
GB 512 B + 0 B 20004A00
benchmarking suspend and resume with `sudo ./analyze_suspend.py -f` [1],
`nvme_disable_ctrl()` takes 400 ms:
988.751855 | 2) kworker-4882 | |
nvme_disable_ctrl [nvme_core]() {
[…]
989.162790 | 2) kworker-4882 | 410934.5 us | } /*
nvme_disable_ctrl [nvme_core] */
Could a warning be logged, when this takes more than 50 ms with a hint
that the disk vendor should improve their firmware?
[1]: https://github.com/intel/pm-graph
[-- Attachment #2: abreu_mem_dmesg.txt --]
[-- Type: text/plain, Size: 5566 bytes --]
# suspend-050124-050530 abreu mem 6.9.0-rc6-00046-g18daea77cca6
# sysinfo | man:Dell Inc. | plat:XPS 13 9360 | cpu:Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz | bios:2.21.0 | biosdate:06/02/2022 | numcpu:4 | memsz:15934724 | memfr:6457120 | os:Debian GNU/Linux trixie/sid
# command | analyze_suspend.py -f
# fwsuspend 0 fwresume 1069300
[ 988.543689] PM: suspend entry (deep)
[ 988.550787] Filesystems sync: 0.007 seconds
[ 988.556348] Freezing user space processes
[ 988.596014] Freezing user space processes completed (elapsed 0.039 seconds)
[ 988.596052] OOM killer disabled.
[ 988.596080] Freezing remaining freezable tasks
[ 988.601766] Freezing remaining freezable tasks completed (elapsed 0.005 seconds)
[ 988.601900] printk: Suspending console(s) (use no_console_suspend to debug)
[ 988.621780] wlp58s0: deauthenticating from 88:71:b1:81:93:1b by local choice (Reason: 3=DEAUTH_LEAVING)
[ 989.169191] PM: suspend of devices complete after 548.040 msecs
[ 989.169222] PM: start suspend of devices complete after 567.175 msecs
[ 989.216983] PM: late suspend of devices complete after 47.748 msecs
[ 989.228745] ACPI: EC: interrupt blocked
[ 989.306218] PM: noirq suspend of devices complete after 88.059 msecs
[ 989.306260] ACPI: PM: Preparing to enter system sleep state S3
[ 989.503508] ACPI: EC: event blocked
[ 989.503526] ACPI: EC: EC stopped
[ 989.503543] ACPI: PM: Saving platform NVS memory
[ 989.505256] Disabling non-boot CPUs ...
[ 989.507688] smpboot: CPU 1 is now offline
[ 989.515131] smpboot: CPU 2 is now offline
[ 989.519983] smpboot: CPU 3 is now offline
[ 989.522691] Checking wakeup interrupts
[ 989.522731] Calling kvm_suspend+0x0/0x40 [kvm]
[ 989.522833] Calling intel_epb_save+0x0/0x30
[ 989.522861] Calling mce_syscore_suspend+0x0/0x20
[ 989.522886] Calling ledtrig_cpu_syscore_suspend+0x0/0x20
[ 989.522915] Calling timekeeping_suspend+0x0/0x2f0
[ 989.523082] Calling save_ioapic_entries+0x0/0xd0
[ 989.524373] Calling i8259A_suspend+0x0/0x30
[ 989.524405] Calling fw_suspend+0x0/0x20
[ 989.524429] Calling acpi_save_bm_rld+0x0/0x30
[ 989.524463] Calling lapic_suspend+0x0/0x160
[ 989.529976] ACPI: PM: Low-level resume complete
[ 989.530034] ACPI: EC: EC started
[ 989.530038] ACPI: PM: Restoring platform NVS memory
[ 989.531228] Calling init_counter_refs+0x0/0x40
[ 989.531252] Calling lapic_resume+0x0/0x220
[ 989.531319] Calling acpi_restore_bm_rld+0x0/0x70
[ 989.531346] Calling irqrouter_resume+0x0/0x50
[ 989.531367] Calling i8259A_resume+0x0/0x40
[ 989.531557] Calling ioapic_resume+0x0/0xc0
[ 989.531819] Calling irq_pm_syscore_resume+0x0/0x20
[ 989.532855] Calling timekeeping_resume+0x0/0x1e0
[ 989.533008] Timekeeping suspended for 14.174 seconds
[ 989.533058] Calling ledtrig_cpu_syscore_resume+0x0/0x20
[ 989.533082] Calling mce_syscore_resume+0x0/0x30
[ 989.533125] Calling intel_epb_restore+0x0/0x90
[ 989.533146] Calling microcode_bsp_resume+0x0/0xd0
[ 989.533168] Calling kvm_resume+0x0/0x60 [kvm]
[ 989.533255] PM: Triggering wakeup from IRQ 51
[ 989.533299] Enabling non-boot CPUs ...
[ 989.533554] smpboot: Booting Node 0 Processor 1 APIC 0x2
[ 989.535107] CPU1 is up
[ 989.535355] smpboot: Booting Node 0 Processor 2 APIC 0x1
[ 989.537199] CPU2 is up
[ 989.537443] smpboot: Booting Node 0 Processor 3 APIC 0x3
[ 989.539014] CPU3 is up
[ 989.545084] ACPI: PM: Waking up from system sleep state S3
[ 990.591691] ACPI: EC: interrupt unblocked
[ 990.623101] PM: noirq resume of devices complete after 31.933 msecs
[ 990.624873] PM: early resume of devices complete after 1.517 msecs
[ 990.625382] ACPI: EC: event unblocked
[ 990.641114] nvme nvme0: 4/0/0 default/read/poll queues
[ 990.658018] i915 0000:00:02.0: [drm] [ENCODER:94:DDI A/PHY A] is disabled/in DSI mode with an ungated DDI clock, gate it
[ 990.669995] i915 0000:00:02.0: [drm] [ENCODER:102:DDI B/PHY B] is disabled/in DSI mode with an ungated DDI clock, gate it
[ 990.773737] i915 0000:00:02.0: [drm] [ENCODER:113:DDI C/PHY C] is disabled/in DSI mode with an ungated DDI clock, gate it
[ 990.902881] usb 1-4: reset full-speed USB device number 3 using xhci_hcd
[ 991.034511] ACPI Debug: "iGfx Supported Functions Bitmap "
[ 991.190678] usb 1-5: reset high-speed USB device number 4 using xhci_hcd
[ 991.467237] usb 1-3: reset full-speed USB device number 2 using xhci_hcd
[ 991.617916] PM: resume of devices complete after 993.226 msecs
[ 991.650442] OOM killer enabled.
[ 991.650470] Restarting tasks ... done.
[ 991.694503] random: crng reseeded on system resumption
[ 991.846758] mei_hdcp 0000:00:16.0-b638ab7e-94e2-4ea2-a552-d1c54b627f04: bound 0000:00:02.0 (ops i915_hdcp_ops [i915])
[ 991.925783] wlp58s0: authenticate with 88:71:b1:81:93:1a (local address=9c:b6:d0:d1:6a:b1)
[ 991.925800] wlp58s0: send auth to 88:71:b1:81:93:1a (try 1/3)
[ 991.930114] wlp58s0: authenticated
[ 991.933812] wlp58s0: associate with 88:71:b1:81:93:1a (try 1/3)
[ 991.970406] wlp58s0: RX AssocResp from 88:71:b1:81:93:1a (capab=0x1431 status=0 aid=1)
[ 991.979423] wlp58s0: associated
[ 991.979652] wlp58s0: Limiting TX power to 20 (20 - 0) dBm as advertised by 88:71:b1:81:93:1a
[ 993.200241] PM: suspend exit
[ 993.200929] Bluetooth: hci0: using rampatch file: qca/rampatch_usb_00000302.bin
[ 993.200940] Bluetooth: hci0: QCA: patch rome 0x302 build 0x3e8, firmware rome 0x302 build 0x111
[ 993.564671] Bluetooth: hci0: using NVM file: qca/nvm_usb_00000302.bin
[ 993.591246] Bluetooth: hci0: HCI Enhanced Setup Synchronous Connection command is advertised, but not supported.
[-- Attachment #3: abreu_mem_ftrace.txt.7z --]
[-- Type: application/x-7z-compressed, Size: 433371 bytes --]
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: `nvme_disable_ctrl()` takes 411 ms on a Dell XPS 13 with SK hynix PC300 NVMEe
2024-05-01 4:29 `nvme_disable_ctrl()` takes 411 ms on a Dell XPS 13 with SK hynix PC300 NVMEe Paul Menzel
@ 2024-05-01 4:51 ` Christoph Hellwig
2024-05-01 7:58 ` Keith Busch
0 siblings, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2024-05-01 4:51 UTC (permalink / raw)
To: Paul Menzel
Cc: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
linux-nvme
On Wed, May 01, 2024 at 06:29:12AM +0200, Paul Menzel wrote:
> Could a warning be logged, when this takes more than 50 ms with a hint that
> the disk vendor should improve their firmware?
Not sure why we'd warn about it. A clean shutdown can definitively take
some time and there's nothing tha forbids it. I don't think a linux
message is going to have any effect on firmware engineering..
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: `nvme_disable_ctrl()` takes 411 ms on a Dell XPS 13 with SK hynix PC300 NVMEe
2024-05-01 4:51 ` Christoph Hellwig
@ 2024-05-01 7:58 ` Keith Busch
2024-05-01 20:58 ` Paul Menzel
0 siblings, 1 reply; 10+ messages in thread
From: Keith Busch @ 2024-05-01 7:58 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Paul Menzel, Jens Axboe, Sagi Grimberg, linux-nvme
On Wed, May 01, 2024 at 06:51:45AM +0200, Christoph Hellwig wrote:
> On Wed, May 01, 2024 at 06:29:12AM +0200, Paul Menzel wrote:
> > Could a warning be logged, when this takes more than 50 ms with a hint that
> > the disk vendor should improve their firmware?
>
> Not sure why we'd warn about it. A clean shutdown can definitively take
> some time and there's nothing tha forbids it. I don't think a linux
> message is going to have any effect on firmware engineering..
Exactly. Unless the device reports a lower D3 entry latency, then this
sounds like everything is working-as-designed.
You can check your device's advertised shutdown time (assuming your
device is nvme0):
nvme id-ctrl /dev/nvme0 | grep rtd3e
The value is reported in microseconds. If it shows 0, then the device
doesn't report an expected shutdown time.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: `nvme_disable_ctrl()` takes 411 ms on a Dell XPS 13 with SK hynix PC300 NVMEe
2024-05-01 7:58 ` Keith Busch
@ 2024-05-01 20:58 ` Paul Menzel
2024-05-01 22:03 ` Keith Busch
0 siblings, 1 reply; 10+ messages in thread
From: Paul Menzel @ 2024-05-01 20:58 UTC (permalink / raw)
To: Keith Busch, Christoph Hellwig; +Cc: Jens Axboe, Sagi Grimberg, linux-nvme
Dear Keith, dear Christoph,
Am 01.05.24 um 09:58 schrieb Keith Busch:
> On Wed, May 01, 2024 at 06:51:45AM +0200, Christoph Hellwig wrote:
>> On Wed, May 01, 2024 at 06:29:12AM +0200, Paul Menzel wrote:
>>> Could a warning be logged, when this takes more than 50 ms with a hint that
>>> the disk vendor should improve their firmware?
>>
>> Not sure why we'd warn about it. A clean shutdown can definitively take
>> some time and there's nothing tha forbids it. I don't think a linux
>> message is going to have any effect on firmware engineering..
>
> Exactly. Unless the device reports a lower D3 entry latency, then this
> sounds like everything is working-as-designed.
Maybe according to the spec, but I have a hard time to believe, that
disks should take longer to shut down than coreboot to initialize a
mainboard.
In the end, in my opinion, users cannot make an informed decision, if
these things are hidden. If it would be visible somehow in the logs –
maybe not warning but info level – then even not so technical users
could inform themselves and factor this in their buying decision.
> You can check your device's advertised shutdown time (assuming your
> device is nvme0):
>
> nvme id-ctrl /dev/nvme0 | grep rtd3e
>
> The value is reported in microseconds. If it shows 0, then the device
> doesn't report an expected shutdown time.
Thank you for sharing. It’s 60 ms:
$ sudo nvme id-ctrl /dev/nvme0 | grep rtd3e
rtd3e : 0xea60
6 * 16 + 10 * 256 + 14 * 16 * 256 = 60000
No idea, if ftrace skews something.
Kind regards,
Paul
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: `nvme_disable_ctrl()` takes 411 ms on a Dell XPS 13 with SK hynix PC300 NVMEe
2024-05-01 20:58 ` Paul Menzel
@ 2024-05-01 22:03 ` Keith Busch
2024-05-02 6:04 ` Paul Menzel
0 siblings, 1 reply; 10+ messages in thread
From: Keith Busch @ 2024-05-01 22:03 UTC (permalink / raw)
To: Paul Menzel; +Cc: Christoph Hellwig, Jens Axboe, Sagi Grimberg, linux-nvme
On Wed, May 01, 2024 at 10:58:05PM +0200, Paul Menzel wrote:
> Am 01.05.24 um 09:58 schrieb Keith Busch:
> > Exactly. Unless the device reports a lower D3 entry latency, then this
> > sounds like everything is working-as-designed.
>
> Maybe according to the spec, but I have a hard time to believe, that disks
> should take longer to shut down than coreboot to initialize a mainboard.
That doesn't seem too hard to believe to me. A safe shutdown can often
take a while time for an SSD. I've seen other implementations orders of
magnitude worse than what you're showing. You could do an unsafe
shutdown instead, but the device will just take even more time to enable
on its next power-on.
> In the end, in my opinion, users cannot make an informed decision, if these
> things are hidden. If it would be visible somehow in the logs - maybe not
> warning but info level - then even not so technical users could inform
> themselves and factor this in their buying decision.
What good is it to advertise a shutdown time when vendors are clearly
unreliable at reporting an accurate value? If you need to see the driver
report it from emperical testing, then you've already bought the device,
right?
> > You can check your device's advertised shutdown time (assuming your
> > device is nvme0):
> >
> > nvme id-ctrl /dev/nvme0 | grep rtd3e
> >
> > The value is reported in microseconds. If it shows 0, then the device
> > doesn't report an expected shutdown time.
>
> Thank you for sharing. It´s 60 ms:
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: `nvme_disable_ctrl()` takes 411 ms on a Dell XPS 13 with SK hynix PC300 NVMEe
2024-05-01 22:03 ` Keith Busch
@ 2024-05-02 6:04 ` Paul Menzel
2024-05-02 6:12 ` Paul Menzel
0 siblings, 1 reply; 10+ messages in thread
From: Paul Menzel @ 2024-05-02 6:04 UTC (permalink / raw)
To: Keith Busch; +Cc: Christoph Hellwig, Jens Axboe, Sagi Grimberg, linux-nvme
Dear Keith,
Thank you for your quick reply.
Am 02.05.24 um 00:03 schrieb Keith Busch:
> On Wed, May 01, 2024 at 10:58:05PM +0200, Paul Menzel wrote:
>> Am 01.05.24 um 09:58 schrieb Keith Busch:
>>> Exactly. Unless the device reports a lower D3 entry latency, then this
>>> sounds like everything is working-as-designed.
>>
>> Maybe according to the spec, but I have a hard time to believe, that disks
>> should take longer to shut down than coreboot to initialize a mainboard.
>
> That doesn't seem too hard to believe to me. A safe shutdown can often
> take a while time for an SSD. I've seen other implementations orders of
> magnitude worse than what you're showing.
But why? Due to physics or due to “slow” firmware?
> You could do an unsafe shutdown instead, but the device will just
> take even more time to enable on its next power-on.
Indeed, that is does not sound good.
>> In the end, in my opinion, users cannot make an informed decision, if these
>> things are hidden. If it would be visible somehow in the logs - maybe not
>> warning but info level - then even not so technical users could inform
>> themselves and factor this in their buying decision.
>
> What good is it to advertise a shutdown time when vendors are clearly
> unreliable at reporting an accurate value? If you need to see the driver
> report it from emperical testing, then you've already bought the device,
> right?
>
>>> You can check your device's advertised shutdown time (assuming your
>>> device is nvme0):
>>>
>>> nvme id-ctrl /dev/nvme0 | grep rtd3e
>>>
>>> The value is reported in microseconds. If it shows 0, then the device
>>> doesn't report an expected shutdown time.
>>
>> Thank you for sharing. It´s 60 ms:
I added some debug prints:
```
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 27281a9a8951..76857d5e8a1a 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2335,6 +2335,10 @@ int nvme_disable_ctrl(struct nvme_ctrl *ctrl,
bool shutdown)
{
int ret;
+ dev_err(ctrl->device,
+ "XXX Enter nvme_disable_ctrl with shutdown %s\n",
+ shutdown ? "yes" : "no");
+
ctrl->ctrl_config &= ~NVME_CC_SHN_MASK;
if (shutdown)
ctrl->ctrl_config |= NVME_CC_SHN_NORMAL;
@@ -2346,14 +2350,24 @@ int nvme_disable_ctrl(struct nvme_ctrl *ctrl,
bool shutdown)
return ret;
if (shutdown) {
- return nvme_wait_ready(ctrl, NVME_CSTS_SHST_MASK,
+ dev_err(ctrl->device,
+ "XXX before nvme_wait_ready,
shutdown_timeout %u\n", ctrl->shutdown_timeout);
+ ret = nvme_wait_ready(ctrl, NVME_CSTS_SHST_MASK,
NVME_CSTS_SHST_CMPLT,
ctrl->shutdown_timeout, "shutdown");
+ dev_err(ctrl->device,
+ "XXX after nvme_wait_ready\n");
+ return ret;
}
if (ctrl->quirks & NVME_QUIRK_DELAY_BEFORE_CHK_RDY)
msleep(NVME_QUIRK_DELAY_AMOUNT);
- return nvme_wait_ready(ctrl, NVME_CSTS_RDY, 0,
+ dev_err(ctrl->device,
+ "XXX before nvme_wait_ready, shutdown_timeout
%u\n", (NVME_CAP_TIMEOUT(ctrl->cap) + 1) / 2);
+ ret = nvme_wait_ready(ctrl, NVME_CSTS_RDY, 0,
(NVME_CAP_TIMEOUT(ctrl->cap) + 1) / 2,
"reset");
+ dev_err(ctrl->device,
+ "XXX after nvme_wait_ready\n");
+ return ret;
}
EXPORT_SYMBOL_GPL(nvme_disable_ctrl);
```
and got for “cold boot” and from ACPI S3 suspend and resume:
```
[ 0.955837] nvme nvme0: XXX Enter nvme_disable_ctrl with shutdown no
[ 0.955898] nvme nvme0: XXX before nvme_wait_ready, shutdown_timeout 128
[ 0.961861] nvme nvme0: XXX after nvme_wait_ready
[ 1.637251] device-mapper: core: CONFIG_IMA_DISABLE_HTABLE is
disabled. Duplicate IMA measurements will not be recorded in the IMA log.
[ 27.785752] wmi_bus wmi_bus-PNP0C14:01: [Firmware Bug]: WQBC data
block query control method not found
[ 29.811581] Bluetooth: hci0: HCI Enhanced Setup Synchronous
Connection command is advertised, but not supported.
[ 62.697526] nvme nvme0: XXX Enter nvme_disable_ctrl with shutdown yes
[ 62.697538] nvme nvme0: XXX before nvme_wait_ready, shutdown_timeout 5
[ 63.110008] nvme nvme0: XXX after nvme_wait_ready
[ 63.308385] nvme nvme0: XXX Enter nvme_disable_ctrl with shutdown no
[ 63.308394] nvme nvme0: XXX before nvme_wait_ready, shutdown_timeout 128
[ 63.308399] nvme nvme0: XXX after nvme_wait_ready
```
So this confirms the ftrace findings. Excuse my ignorance, so the
time-out is in seconds? And how does this relate to the rtd3e value (410
ms ≠ 60 ms ≠ 5 s(?)?
Kind regards,
Paul
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: `nvme_disable_ctrl()` takes 411 ms on a Dell XPS 13 with SK hynix PC300 NVMEe
2024-05-02 6:04 ` Paul Menzel
@ 2024-05-02 6:12 ` Paul Menzel
2024-05-02 8:43 ` Keith Busch
0 siblings, 1 reply; 10+ messages in thread
From: Paul Menzel @ 2024-05-02 6:12 UTC (permalink / raw)
To: Keith Busch; +Cc: Christoph Hellwig, Jens Axboe, Sagi Grimberg, linux-nvme
[add forgotten reply to questions in second paragraph]
Am 02.05.24 um 08:04 schrieb Paul Menzel:
> Dear Keith,
>
>
> Thank you for your quick reply.
>
> Am 02.05.24 um 00:03 schrieb Keith Busch:
>> On Wed, May 01, 2024 at 10:58:05PM +0200, Paul Menzel wrote:
>>> Am 01.05.24 um 09:58 schrieb Keith Busch:
>>>> Exactly. Unless the device reports a lower D3 entry latency, then this
>>>> sounds like everything is working-as-designed.
>>>
>>> Maybe according to the spec, but I have a hard time to believe, that
>>> disks
>>> should take longer to shut down than coreboot to initialize a mainboard.
>>
>> That doesn't seem too hard to believe to me. A safe shutdown can often
>> take a while time for an SSD. I've seen other implementations orders of
>> magnitude worse than what you're showing.
>
> But why? Due to physics or due to “slow” firmware?
>
>> You could do an unsafe shutdown instead, but the device will just
>> take even more time to enable on its next power-on.
>
> Indeed, that is does not sound good.
>
>>> In the end, in my opinion, users cannot make an informed decision, if these
>>> things are hidden. If it would be visible somehow in the logs - maybe not
>>> warning but info level - then even not so technical users could inform
>>> themselves and factor this in their buying decision.
>>
>> What good is it to advertise a shutdown time when vendors are clearly
>> unreliable at reporting an accurate value?
Linux time stamps would be reliable, and showing if vendors lie is a
good thing, because then people can support vendors not lying.
>> If you need to see the driver report it from emperical testing,
>> then you've already bought the device, right?
Not necessarily, as logs from other systems can be accessed in databases
like the Hardware for Linux database [1] and the uploaded log messages [2].
>>>> You can check your device's advertised shutdown time (assuming your
>>>> device is nvme0):
>>>>
>>>> nvme id-ctrl /dev/nvme0 | grep rtd3e
>>>>
>>>> The value is reported in microseconds. If it shows 0, then the device
>>>> doesn't report an expected shutdown time.
>>>
>>> Thank you for sharing. It´s 60 ms:
>
> I added some debug prints:
>
> ```
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 27281a9a8951..76857d5e8a1a 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -2335,6 +2335,10 @@ int nvme_disable_ctrl(struct nvme_ctrl *ctrl,
> bool shutdown)
> {
> int ret;
>
> + dev_err(ctrl->device,
> + "XXX Enter nvme_disable_ctrl with shutdown %s\n",
> + shutdown ? "yes" : "no");
> +
> ctrl->ctrl_config &= ~NVME_CC_SHN_MASK;
> if (shutdown)
> ctrl->ctrl_config |= NVME_CC_SHN_NORMAL;
> @@ -2346,14 +2350,24 @@ int nvme_disable_ctrl(struct nvme_ctrl *ctrl, bool shutdown)
> return ret;
>
> if (shutdown) {
> - return nvme_wait_ready(ctrl, NVME_CSTS_SHST_MASK,
> + dev_err(ctrl->device,
> + "XXX before nvme_wait_ready, shutdown_timeout %u\n", ctrl->shutdown_timeout);
> + ret = nvme_wait_ready(ctrl, NVME_CSTS_SHST_MASK,
> NVME_CSTS_SHST_CMPLT,
> ctrl->shutdown_timeout, "shutdown");
> + dev_err(ctrl->device,
> + "XXX after nvme_wait_ready\n");
> + return ret;
> }
> if (ctrl->quirks & NVME_QUIRK_DELAY_BEFORE_CHK_RDY)
> msleep(NVME_QUIRK_DELAY_AMOUNT);
> - return nvme_wait_ready(ctrl, NVME_CSTS_RDY, 0,
> + dev_err(ctrl->device,
> + "XXX before nvme_wait_ready, shutdown_timeout %u\n", (NVME_CAP_TIMEOUT(ctrl->cap) + 1) / 2);
> + ret = nvme_wait_ready(ctrl, NVME_CSTS_RDY, 0,
> (NVME_CAP_TIMEOUT(ctrl->cap) + 1) / 2,
> "reset");
> + dev_err(ctrl->device,
> + "XXX after nvme_wait_ready\n");
> + return ret;
> }
> EXPORT_SYMBOL_GPL(nvme_disable_ctrl);
> ```
>
> and got for “cold boot” and from ACPI S3 suspend and resume:
>
> ```
> [ 0.955837] nvme nvme0: XXX Enter nvme_disable_ctrl with shutdown no
> [ 0.955898] nvme nvme0: XXX before nvme_wait_ready, shutdown_timeout 128
> [ 0.961861] nvme nvme0: XXX after nvme_wait_ready
> [ 1.637251] device-mapper: core: CONFIG_IMA_DISABLE_HTABLE is disabled. Duplicate IMA measurements will not be recorded in the IMA log.
> [ 27.785752] wmi_bus wmi_bus-PNP0C14:01: [Firmware Bug]: WQBC data block query control method not found
> [ 29.811581] Bluetooth: hci0: HCI Enhanced Setup Synchronous Connection command is advertised, but not supported.
> [ 62.697526] nvme nvme0: XXX Enter nvme_disable_ctrl with shutdown yes
> [ 62.697538] nvme nvme0: XXX before nvme_wait_ready, shutdown_timeout 5
> [ 63.110008] nvme nvme0: XXX after nvme_wait_ready
> [ 63.308385] nvme nvme0: XXX Enter nvme_disable_ctrl with shutdown no
> [ 63.308394] nvme nvme0: XXX before nvme_wait_ready, shutdown_timeout 128
> [ 63.308399] nvme nvme0: XXX after nvme_wait_ready
> ```
>
> So this confirms the ftrace findings. Excuse my ignorance, so the
> time-out is in seconds? And how does this relate to the rtd3e value (410
> ms ≠ 60 ms ≠ 5 s(?)?
>
>
> Kind regards,
>
> Paul
[1]: https://linux-hardware.org/
[2]: https://github.com/linuxhw/Dmesg
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: `nvme_disable_ctrl()` takes 411 ms on a Dell XPS 13 with SK hynix PC300 NVMEe
2024-05-02 6:12 ` Paul Menzel
@ 2024-05-02 8:43 ` Keith Busch
2024-05-02 15:57 ` Jeremy Allison
2024-05-03 5:52 ` Paul Menzel
0 siblings, 2 replies; 10+ messages in thread
From: Keith Busch @ 2024-05-02 8:43 UTC (permalink / raw)
To: Paul Menzel; +Cc: Christoph Hellwig, Jens Axboe, Sagi Grimberg, linux-nvme
On Thu, May 02, 2024 at 08:12:39AM +0200, Paul Menzel wrote:
> > > That doesn't seem too hard to believe to me. A safe shutdown can often
> > > take a while time for an SSD. I've seen other implementations orders of
> > > magnitude worse than what you're showing.
> >
> > But why? Due to physics or due to "slow" firmware?
Maybe both? The run time metadata doesn't necessarily match the on-disk
format, and constructing that can take a moment. These device's CPUs are
usually the cheapest the vendor could get that satisifies a run-time
performance target, so may be under powered for computational tasks.
And it may also have to flush pending user data from its internal
memory, which could be a few GB.
Lower end devices don't even have memory, so may have to make many round
trips to host memory to retreive its metadata then manipulate that to
its on-disk format.
Maybe this could be better optimized, but vendors may not consider
shutdown time to be a high priority.
This gets worse as you add more nvme devices to your system because
shutdown is serialized. Some of us have proposed patches parallelizing
this process. I wish I could spend more time on helping see that to
completion, but other priorities get in the way. :(
> > So this confirms the ftrace findings. Excuse my ignorance, so the
> > time-out is in seconds? And how does this relate to the rtd3e value (410
> > ms /= 60 ms /= 5 s(?)?
The driver provides a user tunable parameter to specify the minimum
timeout value, and it defaults to 5 seconds.
nvme_core.shutdown_timeout=<time_in_seconds>
The driver selects this or the advertised rtd3e, whichever is greater.
We can't trust device's to report this correctly (and NVMe 1.0 didn't
even provide a way for a device to report an expected shutdown time), so
this exists to prevent unsafe shutdowns. Devices are supposed to survive
an unsafe shutdown, but it's best to avoid that path.
The parameter is in granularity of seconds because the NVMe 1.0 spec
said to "wait at least one second" for a shutdown to complete. Not the
most clear wording for a spec, but that's where we started.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: `nvme_disable_ctrl()` takes 411 ms on a Dell XPS 13 with SK hynix PC300 NVMEe
2024-05-02 8:43 ` Keith Busch
@ 2024-05-02 15:57 ` Jeremy Allison
2024-05-03 5:52 ` Paul Menzel
1 sibling, 0 replies; 10+ messages in thread
From: Jeremy Allison @ 2024-05-02 15:57 UTC (permalink / raw)
To: Keith Busch
Cc: Paul Menzel, Christoph Hellwig, Jens Axboe, Sagi Grimberg,
linux-nvme
On Thu, May 02, 2024 at 09:43:36AM +0100, Keith Busch wrote:
>
>This gets worse as you add more nvme devices to your system because
>shutdown is serialized. Some of us have proposed patches parallelizing
>this process. I wish I could spend more time on helping see that to
>completion, but other priorities get in the way. :(
FYI. The various asynchronous shutdown patches have been
merged and are continuing development and testing. I expect
Stuart to re-propose these once all the obvious bugs have been
found. His approach is currently the best option IMHO.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: `nvme_disable_ctrl()` takes 411 ms on a Dell XPS 13 with SK hynix PC300 NVMEe
2024-05-02 8:43 ` Keith Busch
2024-05-02 15:57 ` Jeremy Allison
@ 2024-05-03 5:52 ` Paul Menzel
1 sibling, 0 replies; 10+ messages in thread
From: Paul Menzel @ 2024-05-03 5:52 UTC (permalink / raw)
To: Keith Busch; +Cc: Christoph Hellwig, Jens Axboe, Sagi Grimberg, linux-nvme
Dear Keith,
Thank you for your reply with a lot of background. This is much appreciated.
Am 02.05.24 um 10:43 schrieb Keith Busch:
> On Thu, May 02, 2024 at 08:12:39AM +0200, Paul Menzel wrote:
>>>> That doesn't seem too hard to believe to me. A safe shutdown can often
>>>> take a while time for an SSD. I've seen other implementations orders of
>>>> magnitude worse than what you're showing.
>>>
>>> But why? Due to physics or due to "slow" firmware?
>
> Maybe both? The run time metadata doesn't necessarily match the on-disk
> format, and constructing that can take a moment. These device's CPUs are
> usually the cheapest the vendor could get that satisfies a run-time
> performance target, so may be under powered for computational tasks.
>
> And it may also have to flush pending user data from its internal
> memory, which could be a few GB.
>
> Lower end devices don't even have memory, so may have to make many round
> trips to host memory to retrieve its metadata then manipulate that to
> its on-disk format.
Thank you for the details. Indeed “slow” firmware can be caused by
low-performant chips.
> Maybe this could be better optimized, but vendors may not consider
> shutdown time to be a high priority.
As this is all a black box, it’s hard to know. If this is more visible,
vendors might make it a higher priority.
> This gets worse as you add more nvme devices to your system because
> shutdown is serialized. Some of us have proposed patches parallelizing
> this process. I wish I could spend more time on helping see that to
> completion, but other priorities get in the way. :(
I didn’t know. I only have systems with one NVMe device. Also in other
parts like initializing CPU cores and applying microcode updates they
try to parallelize the initialization to decrease boot time.
>>> So this confirms the ftrace findings. Excuse my ignorance, so the
>>> time-out is in seconds? And how does this relate to the rtd3e value (410
>>> ms /= 60 ms /= 5 s(?)?
>
> The driver provides a user tunable parameter to specify the minimum
> timeout value, and it defaults to 5 seconds.
>
> nvme_core.shutdown_timeout=<time_in_seconds>
>
> The driver selects this or the advertised rtd3e, whichever is greater.
Reading the code, that is in `nvme_init_identify()`:
if (id->rtd3e) {
/* us -> s */
u32 transition_time = le32_to_cpu(id->rtd3e) / USEC_PER_SEC;
ctrl->shutdown_timeout = clamp_t(unsigned int, transition_time,
shutdown_timeout, 60);
if (ctrl->shutdown_timeout != shutdown_timeout)
dev_info(ctrl->device,
"D3 entry latency set to %u seconds\n",
ctrl->shutdown_timeout);
} else
ctrl->shutdown_timeout = shutdown_timeout;
> We can't trust device's to report this correctly (and NVMe 1.0 didn't
> even provide a way for a device to report an expected shutdown time), so
> this exists to prevent unsafe shutdowns. Devices are supposed to survive
> an unsafe shutdown, but it's best to avoid that path.
So it’s like this in my case, as the SK hynix reports 60 ms, but
actually takes 411 ms?
> The parameter is in granularity of seconds because the NVMe 1.0 spec
> said to "wait at least one second" for a shutdown to complete. Not the
> most clear wording for a spec, but that's where we started.
Thank you for the details. I am not a spec writer, but my gut feeling
says, there should always be a polling(?) solution and only upper
boundaries, that means “no longer than”, should be used.
Kind regards,
Paul
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2024-05-03 5:53 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-05-01 4:29 `nvme_disable_ctrl()` takes 411 ms on a Dell XPS 13 with SK hynix PC300 NVMEe Paul Menzel
2024-05-01 4:51 ` Christoph Hellwig
2024-05-01 7:58 ` Keith Busch
2024-05-01 20:58 ` Paul Menzel
2024-05-01 22:03 ` Keith Busch
2024-05-02 6:04 ` Paul Menzel
2024-05-02 6:12 ` Paul Menzel
2024-05-02 8:43 ` Keith Busch
2024-05-02 15:57 ` Jeremy Allison
2024-05-03 5:52 ` Paul Menzel
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.