All of lore.kernel.org
 help / color / mirror / Atom feed
From: kbusch@kernel.org (Keith Busch)
Subject: [PATCH] nvme/pci: Use host managed power state for suspend
Date: Tue, 14 May 2019 16:16:09 -0600	[thread overview]
Message-ID: <20190514221609.GC19977@localhost.localdomain> (raw)
In-Reply-To: <CAJZ5v0g3cCYK3rAQn09pCr7LMrRr=zQy_ceaEB5AKhVx604YgA@mail.gmail.com>

On Tue, May 14, 2019@10:04:22AM +0200, Rafael J. Wysocki wrote:
> On Mon, May 13, 2019@5:10 PM Keith Busch <kbusch@kernel.org> wrote:
> >
> > On Mon, May 13, 2019@03:05:42PM +0000, Mario.Limonciello@dell.com wrote:
> > > This system power state - suspend to idle is going to freeze threads.
> > > But we're talking a multi threaded kernel.  Can't there be a timing problem going
> > > on then too?  With a disk flush being active in one task and the other task trying
> > > to put the disk into the deepest power state.  If you don't freeze the queues how
> > > can you guarantee that didn't happen?
> >
> > But if an active data flush task is running, then we're not idle and
> > shouldn't go to low power.
> 
> To be entirely precise, system suspend prevents user space from
> running while it is in progress.  It doesn't do that to kernel
> threads, at least not by default, though, so if there is a kernel
> thread flushing the data, it needs to be stopped or suspended somehow
> directly in the system suspend path.  [And yes, system suspend (or
> hibernation) may take place at any time so long as all user space can
> be prevented from running then (by means of the tasks freezer).]
> 
> However, freezing the queues from a driver ->suspend callback doesn't
> help in general and the reason why is hibernation.  Roughly speaking,
> hibernation works in two steps, the first of which creates a snapshot
> image of system memory and the second one writes that image to
> persistent storage.  Devices are resumed between the two steps in
> order to make it possible to do the write, but that would unfreeze the
> queues and let the data flusher run.  If it runs, it may cause the
> memory snapshot image that has just been created to become outdated
> and restoring the system memory contents from that image going forward
> may cause corruption to occur.
> 
> Thus freezing the queues from a driver ->suspend callback should not
> be relied on for correctness if the same callback is used for system
> suspend and hibernation, which is the case here.  If doing that
> prevents the system from crashing, it is critical to find out why IMO,
> as that may very well indicate a broader issue, not necessarily in the
> driver itself.
> 
> But note that even if the device turns out to behave oddly, it still
> needs to be handled, unless it may be prevented from shipping to users
> in that shape.  If it ships, users will face the odd behavior anyway.

Thanks for all the information. I'll take another shot at this, should
have it posted tomorrow.

It's mostly not a problem to ensure enqueued and dispatched requests are
completed before returning from our suspend callback. I originally had
that behavior and backed it out when I thought it wasn't necessary. So
I'll reintroduce that. I'm not sure yet how we may handle kernel tasks
that are about to read/write pages, but haven't yet enqueued their
requests.

IO timeouts, shold they occur, have some problems here, so I'll need to
send prep patches to address a few issues with that.

WARNING: multiple messages have this Message-ID (diff)
From: Keith Busch <kbusch@kernel.org>
To: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Mario Limonciello <Mario.Limonciello@dell.com>,
	Christoph Hellwig <hch@lst.de>,
	Keith Busch <keith.busch@intel.com>,
	Sagi Grimberg <sagi@grimberg.me>,
	linux-nvme <linux-nvme@lists.infradead.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Linux PM <linux-pm@vger.kernel.org>,
	Kai-Heng Feng <kai.heng.feng@canonical.com>
Subject: Re: [PATCH] nvme/pci: Use host managed power state for suspend
Date: Tue, 14 May 2019 16:16:09 -0600	[thread overview]
Message-ID: <20190514221609.GC19977@localhost.localdomain> (raw)
In-Reply-To: <CAJZ5v0g3cCYK3rAQn09pCr7LMrRr=zQy_ceaEB5AKhVx604YgA@mail.gmail.com>

On Tue, May 14, 2019 at 10:04:22AM +0200, Rafael J. Wysocki wrote:
> On Mon, May 13, 2019 at 5:10 PM Keith Busch <kbusch@kernel.org> wrote:
> >
> > On Mon, May 13, 2019 at 03:05:42PM +0000, Mario.Limonciello@dell.com wrote:
> > > This system power state - suspend to idle is going to freeze threads.
> > > But we're talking a multi threaded kernel.  Can't there be a timing problem going
> > > on then too?  With a disk flush being active in one task and the other task trying
> > > to put the disk into the deepest power state.  If you don't freeze the queues how
> > > can you guarantee that didn't happen?
> >
> > But if an active data flush task is running, then we're not idle and
> > shouldn't go to low power.
> 
> To be entirely precise, system suspend prevents user space from
> running while it is in progress.  It doesn't do that to kernel
> threads, at least not by default, though, so if there is a kernel
> thread flushing the data, it needs to be stopped or suspended somehow
> directly in the system suspend path.  [And yes, system suspend (or
> hibernation) may take place at any time so long as all user space can
> be prevented from running then (by means of the tasks freezer).]
> 
> However, freezing the queues from a driver ->suspend callback doesn't
> help in general and the reason why is hibernation.  Roughly speaking,
> hibernation works in two steps, the first of which creates a snapshot
> image of system memory and the second one writes that image to
> persistent storage.  Devices are resumed between the two steps in
> order to make it possible to do the write, but that would unfreeze the
> queues and let the data flusher run.  If it runs, it may cause the
> memory snapshot image that has just been created to become outdated
> and restoring the system memory contents from that image going forward
> may cause corruption to occur.
> 
> Thus freezing the queues from a driver ->suspend callback should not
> be relied on for correctness if the same callback is used for system
> suspend and hibernation, which is the case here.  If doing that
> prevents the system from crashing, it is critical to find out why IMO,
> as that may very well indicate a broader issue, not necessarily in the
> driver itself.
> 
> But note that even if the device turns out to behave oddly, it still
> needs to be handled, unless it may be prevented from shipping to users
> in that shape.  If it ships, users will face the odd behavior anyway.

Thanks for all the information. I'll take another shot at this, should
have it posted tomorrow.

It's mostly not a problem to ensure enqueued and dispatched requests are
completed before returning from our suspend callback. I originally had
that behavior and backed it out when I thought it wasn't necessary. So
I'll reintroduce that. I'm not sure yet how we may handle kernel tasks
that are about to read/write pages, but haven't yet enqueued their
requests.

IO timeouts, shold they occur, have some problems here, so I'll need to
send prep patches to address a few issues with that.

  reply	other threads:[~2019-05-14 22:16 UTC|newest]

Thread overview: 62+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-10 21:29 [PATCH] nvme/pci: Use host managed power state for suspend Keith Busch
2019-05-10 21:29 ` Keith Busch
2019-05-11  0:52 ` Mario.Limonciello
2019-05-11  0:52   ` Mario.Limonciello
2019-05-11  0:52   ` Mario.Limonciello
2019-05-13 14:24   ` Mario.Limonciello
2019-05-13 14:24     ` Mario.Limonciello
2019-05-13 14:24     ` Mario.Limonciello
2019-05-13 14:37     ` Christoph Hellwig
2019-05-13 14:37       ` Christoph Hellwig
2019-05-13 14:43       ` Mario.Limonciello
2019-05-13 14:43         ` Mario.Limonciello
2019-05-13 14:43         ` Mario.Limonciello
2019-05-13 14:54         ` Christoph Hellwig
2019-05-13 14:54           ` Christoph Hellwig
2019-05-13 14:55         ` Keith Busch
2019-05-13 14:55           ` Keith Busch
2019-05-13 15:05           ` Mario.Limonciello
2019-05-13 15:05             ` Mario.Limonciello
2019-05-13 15:05             ` Mario.Limonciello
2019-05-13 15:04             ` Keith Busch
2019-05-13 15:04               ` Keith Busch
2019-05-14  8:04               ` Rafael J. Wysocki
2019-05-14  8:04                 ` Rafael J. Wysocki
2019-05-14 22:16                 ` Keith Busch [this message]
2019-05-14 22:16                   ` Keith Busch
2019-05-15  9:00                   ` Rafael J. Wysocki
2019-05-15  9:00                     ` Rafael J. Wysocki
2019-05-13 14:37     ` Keith Busch
2019-05-13 14:37       ` Keith Busch
2019-05-13 14:54       ` Mario.Limonciello
2019-05-13 14:54         ` Mario.Limonciello
2019-05-13 14:54         ` Mario.Limonciello
2019-05-13 14:57         ` Christoph Hellwig
2019-05-13 14:57           ` Christoph Hellwig
2019-05-13 15:16           ` Keith Busch
2019-05-13 15:16             ` Keith Busch
2019-05-13 17:16             ` Kai-Heng Feng
2019-05-13 17:16               ` Kai-Heng Feng
2019-05-13 18:16               ` Keith Busch
2019-05-13 18:16                 ` Keith Busch
2019-05-13 18:01           ` Mario.Limonciello
2019-05-13 18:01             ` Mario.Limonciello
2019-05-13 18:01             ` Mario.Limonciello
2019-05-14  6:11             ` Christoph Hellwig
2019-05-14  6:11               ` Christoph Hellwig
2019-05-11  1:33 ` Edmund Nadolski
2019-05-11  1:33   ` Edmund Nadolski
2019-05-11  7:22 ` Christoph Hellwig
2019-05-11  7:22   ` Christoph Hellwig
2019-05-12 14:34   ` Sagi Grimberg
2019-05-12 14:34     ` Sagi Grimberg
2019-05-13 13:36   ` Keith Busch
2019-05-13 13:36     ` Keith Busch
2019-05-12  6:06 ` Chaitanya Kulkarni
2019-05-12  6:06   ` Chaitanya Kulkarni
2019-05-12 14:30   ` Minwoo Im
2019-05-12 14:30     ` Minwoo Im
2019-05-12 15:20     ` Chaitanya Kulkarni
2019-05-12 15:20       ` Chaitanya Kulkarni
2019-05-13 13:47   ` Keith Busch
2019-05-13 13:47     ` Keith Busch

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190514221609.GC19977@localhost.localdomain \
    --to=kbusch@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.