From mboxrd@z Thu Jan 1 00:00:00 1970
From: keith.busch@intel.com (Keith Busch)
Date: Mon, 22 Jan 2018 14:30:25 -0700
Subject: Report long suspend times of NVMe devices (mostly firmware/device issues)
In-Reply-To: <9e398971-762d-bb1a-f798-bf0b18cb5b6b@molgen.mpg.de>
References: <9e398971-762d-bb1a-f798-bf0b18cb5b6b@molgen.mpg.de>
Message-ID: <20180122213024.GR12043@localhost.localdomain>

On Mon, Jan 22, 2018 at 10:02:12PM +0100, Paul Menzel wrote:
> Dear Linux folks,
>
>
> Benchmarking the ACPI S3 suspend and resume times with `sleepgraph.py
> -config config/suspend-callgraph.cfg` [1], shows that the NVMe disk SAMSUNG
> MZVKW512HMJP-00000 in the TUXEDO Book BU1406 takes between 0.3 and 1.4
> seconds, holding up the suspend cycle.
>
> The time is spent in `nvme_shutdown_ctrl()`.
>
> ### Linux 4.14.1-041401-generic
>
> > nvme @ 0000:04:00.0 {nvme} async_device (Total Suspend: 1439.299 ms Total Resume: 19.865 ms)
>
> ### Linux 4.15-rc9
>
> > nvme @ 0000:04:00.0 {nvme} async_device (Total Suspend: 362.239 ms Total Resume: 19.897 m
> It’d be useful, if the Linux kernel logged such issues visibly to the user,
> so that the hardware manufacturer can be contacted to fix the device
> (probably the firmware).
>
> In my opinion anything longer than 200 ms should be reported similar to [2],
> and maybe worded like below.
>
> > NVMe took more than 200 ms to do suspend routine
>
> What do you think?

The nvme spec guides toward longer times than that. I don't see the point of
warning users about things operating within spec.