From: james_p_freyensee@linux.intel.com (J Freyensee)
Subject: [PATCH 3/3] nvme: Enable autonomous power state transitions
Date: Mon, 29 Aug 2016 08:07:46 -0700 [thread overview]
Message-ID: <1472483266.2816.14.camel@linux.intel.com> (raw)
In-Reply-To: <88cc7972617eec58a81877d933604e0f0e342e43.1472462539.git.luto@kernel.org>
On Mon, 2016-08-29@02:25 -0700, Andy Lutomirski wrote:
> NVME devices can advertise multiple power states.??These states can
> be either "operational" (the device is fully functional but possibly
> slow) or "non-operational" (the device is asleep until woken up).
> Some devices can automatically enter a non-operational state when
> idle for a specified amount of time and then automatically wake back
> up when needed.
>
> The hardware configuration is a table.??For each state, an entry in
> the table indicates the next deeper non-operational state, if any,
> to autonomously transition to and the idle time required before
> transitioning.
>
> This patch teaches the driver to program APST so that each
> successive non-operational state will be entered after an idle time
> equal to 100% of the total latency (entry plus exit) associated with
> that state.??A sysfs attribute 'apst_max_latency_ns' gives the
> maximum acceptable latency in ns; non-operational states with total
> latency greater than this value will not be used.??As a special
> case, apst_max_latency_ns=0 will disable APST entirely.
>
> On hardware without APST support, apst_max_latency_ns will not be
> exposed in sysfs.
>
> In theory, the device can expose "default" APST table, but this
> doesn't seem to function correctly on my device (Samsung 950), nor
> does it seem particularly useful.??There is also an optional
> mechanism by which a configuration can be "saved" so it will be
> automatically loaded on reset.??This can be configured from
> userspace, but it doesn't seem useful to support in the driver.
>
> On my laptop, enabling APST seems to save nearly 1W.
>
> The hardware tables can be decoded in userspace with nvme-cli.
> 'nvme id-ctrl /dev/nvmeN' will show the power state table and
> 'nvme get-feature -f 0x0c -H /dev/nvme0' will show the current APST
> configuration.
>
> Signed-off-by: Andy Lutomirski <luto at kernel.org>
> ---
> ?drivers/nvme/host/core.c | 167
> +++++++++++++++++++++++++++++++++++++++++++++++
> ?drivers/nvme/host/nvme.h |???6 ++
> ?include/linux/nvme.h?????|???6 ++
> ?3 files changed, 179 insertions(+)
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 3f7561ab54dc..042137ad2437 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -1223,6 +1223,98 @@ static void nvme_set_queue_limits(struct
> nvme_ctrl *ctrl,
> ? blk_queue_write_cache(q, vwc, vwc);
> ?}
> ?
> +static void nvme_configure_apst(struct nvme_ctrl *ctrl)
> +{
> + /*
> + ?* APST (Autonomous Power State Transition) lets us program
> a
> + ?* table of power state transitions that the controller will
> + ?* perform automatically.??We configure it with a simple
> + ?* heuristic: we are willing to spend at most 2% of the time
> + ?* transitioning between power states.??Therefore, when
> running
> + ?* in any given state, we will enter the next lower-power
> + ?* non-operational state after waiting 100 * (enlat + exlat)
> + ?* microseconds, as long as that state's total latency is
> under
> + ?* the requested maximum latency.
> + ?*
> + ?* We will not autonomously enter any non-operational state
> for
> + ?* which the total latency exceeds
> apst_max_latency_ns.??Users
> + ?* can set apst_max_latency_ns to zero to turn off APST.
> + ?*/
> +
> + unsigned apste;
> + struct nvme_feat_auto_pst *table;
> + int ret;
> +
> + if (!ctrl->apsta)
> + return; /* APST isn't supported. */
> +
> + if (ctrl->npss > 31) {
> + dev_warn(ctrl->device, "NPSS is invalid; disabling
> APST\n");
Quick question. ?A little bit below in a later if() block, apste is set
to 0 to turn off APST, which is to be used later in a
nvme_set_features() call to actually turn it off. ?You wouldn't want to
also set apste to zero too and call a nvme_set_features() to "disable
APST"?
I guess I'm a little confused on the error statement, "disabling APST",
when it doesn't seem like anything is being done to actually disable
APST, it's just more of an invalid state retrieved from the HW.
?
> + return;
> + }
> +
> + table = kzalloc(sizeof(*table), GFP_KERNEL);
> + if (!table)
> + return;
> +
> + if (ctrl->apst_max_latency_ns == 0) {
> + /* Turn off APST. */
> + apste = 0;
> + } else {
> + __le64 target = cpu_to_le64(0);
> + int state;
> +
> + /*
> + ?* Walk through all states from lowest- to highest-
> power.
> + ?* According to the spec, lower-numbered states use
> more
> + ?* power.??NPSS, despite the name, is the index of
> the
> + ?* lowest-power state, not the number of states.
> + ?*/
> + for (state = (int)ctrl->npss; state >= 0; state--) {
> + u64 total_latency_us, transition_ms;
> +
> + if (target)
> + table->entries[state] = target;
> +
> + /*
> + ?* Is this state a useful non-operational
> state for
> + ?* higher-power states to autonomously
> transition to?
> + ?*/
> + if (!(ctrl->psd[state].flags & 2))
> + continue;??/* It's an operational
> state. */
> +
> + total_latency_us =
> + (u64)cpu_to_le32(ctrl-
> >psd[state].entry_lat) +
> + + cpu_to_le32(ctrl-
> >psd[state].exit_lat);
> + if (total_latency_us * 1000 > ctrl-
> >apst_max_latency_ns)
> + continue;
> +
> + /*
> + ?* This state is good.??Use it as the APST
> idle
> + ?* target for higher power states.
> + ?*/
> + transition_ms = total_latency_us + 19;
> + do_div(transition_ms, 20);
> + if (transition_ms >= (1 << 24))
> + transition_ms = (1 << 24);
Is it possible to use a macro for this bit shift as its used more than
once?
> +
> + target = cpu_to_le64((state << 3) |
> + ?????(transition_ms << 8));
> + }
>
snip...
.
.
.
> + /*
> + ?* By default, allow up to 25ms of APST-induced
> latency.??This will
> + ?* have no effect on non-APST supporting controllers (i.e.
> any
> + ?* controller with APSTA == 0).
> + ?*/
> + ctrl->apst_max_latency_ns = 25000000;
Is it possible to make that a #define please?
Nice stuff!
>?
next prev parent reply other threads:[~2016-08-29 15:07 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-08-29 9:25 [PATCH 0/3] nvme power saving Andy Lutomirski
2016-08-29 9:25 ` [PATCH 1/3] nvme/scsi: Remove power management support Andy Lutomirski
2016-08-29 9:25 ` [PATCH 2/3] nvme: Pass pointers, not dma addresses, to nvme_get/set_features() Andy Lutomirski
2016-08-29 16:27 ` Keith Busch
2016-08-29 23:20 ` Andy Lutomirski
2016-08-30 6:36 ` Christoph Hellwig
2016-08-30 16:00 ` Andy Lutomirski
2016-08-29 9:25 ` [PATCH 3/3] nvme: Enable autonomous power state transitions Andy Lutomirski
2016-08-29 15:07 ` J Freyensee [this message]
2016-08-29 23:16 ` Andy Lutomirski
2016-08-30 20:21 ` Andy Lutomirski
2016-09-02 18:11 ` J Freyensee
2016-09-02 18:50 ` Andy Lutomirski
2016-08-29 16:45 ` Keith Busch
2016-08-29 23:16 ` Andy Lutomirski
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1472483266.2816.14.camel@linux.intel.com \
--to=james_p_freyensee@linux.intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).