public inbox for linux-block@vger.kernel.org
From: Christian Loehle <christian.loehle@arm.com>
To: Colin Ian King <colin.king@intel.com>,
	Jens Axboe <axboe@kernel.dk>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	Daniel Lezcano <daniel.lezcano@linaro.org>,
	linux-block@vger.kernel.org, linux-pm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Subject: Re: [PATCH] cpuidle: psd: add power sleep demotion prevention for fast I/O devices
Date: Mon, 3 Mar 2025 22:24:36 +0000
Message-ID: <f18607ca-30dc-43de-be77-fec69968aeec@arm.com>
In-Reply-To: <33882f284ac6e6d1ec766ca4bb2f3b88@intel.com>

On 3/3/25 16:43, Colin Ian King wrote:
> Modern processors can drop into deep sleep states relatively quickly
> to save power. However, coming out of deep sleep states takes a small
> amount of time and this is detrimental to performance for I/O devices
> such as fast PCIe NVME drives when servicing completed I/O
> transactions.
> 
> Testing with fio with read/write RAID0 PCIe NVME devices on various
> modern SMP based systems (such as 96 thread Granite Rapids Xeon 6741P)
> has shown that 85-90% of read/write transactions issued on a CPU
> are completed by the same CPU, so it makes some sense to prevent the
> CPU from dropping into a deep sleep state to help reduce I/O handling
> latency.

For the platform you tested on that may be true, but even if we constrain
ourselves to pci-nvme there's a variety of queue/irq mappings where
this doesn't hold, I'm afraid.

> 
> This commit introduces a simple, lightweight and fast power sleep
> demotion mechanism that provides the block layer a way to inform the
> menu governor to prevent a CPU from going into a deep sleep when an
> I/O operation is requested. While it is true that some I/Os may not

s/requested/completed is the full truth, isn't it?

> be serviced on the same CPU that issued the I/O request, and hence the
> mechanism is not 100% perfect, it does work well for the vast majority
> of I/O operations, and the sleep demotion prevention adds very little
> overhead.
> 
> Test results on a 96 thread Xeon 6741P with a 6 way RAID0 PCIe NVME md
> array using fio 3.35 performing random read and read-write tests on a
> 512GB file with 8 concurrent I/O jobs. Tested with the NHM_C1_AUTO_DEMOTE
> bit in MSR_PKG_CST_CONFIG_CONTROL set in the BIOS.
> 
> Test case: random reads, results based on geometric mean of results from
> 5 test runs:
>            Bandwidth         IO-ops   Latency   Bandwidth
>            read (bytes/sec)  per sec    (ns)    % Std.Deviation
> Baseline:  21365755610       20377     390105   1.86%
> Patched:   25950107558       24748     322905   0.16%

What is the baseline?
Do you mind trying with Rafael's recently posted series?
Given the IOPS I'd expect good results from that alone already.
https://lore.kernel.org/lkml/1916668.tdWV9SEqCh@rjwysocki.net/

(Happy to see teo as a comparison too, since you don't modify it.)

> 
> Read rate improvement of ~21%.
> 
> Test case: random read+writes, results based on geometric mean of results
> from 5 test runs:
> 
>            Bandwidth         IO-ops   Latency   Bandwidth
>            read (bytes/sec)  per sec    (ns)    % Std.Deviation
> Baseline:   9937848224        9477     550094   1.04%
> Patched:   10502592508       10016     509315   1.85%
> 
> Read rate improvement of ~5.7%
> 
>            Bandwidth         IO-ops   Latency   Bandwidth
>            write (bytes/sec) per sec    (ns)    % Std.Deviation
> Baseline:   9945197656        9484     288933   1.02%
> Patched:   10517268400       10030     287026   1.85%
> 
> Write rate improvement of ~5.7%
> 
> For kernel builds, where all CPUs are fully loaded, no performance
> improvements or regressions were observed based on the results of
> 5 kernel build test runs.
> 
> By default, CPU power sleep demotion blocking is set to run
> for 3 ms on I/O requests, but this can be modified using the
> new sysfs interface:
> 
>   /sys/devices/system/cpu/cpuidle/psd_cpu_lat_timeout_ms

Rounding up to a jiffy sure is a heavy price to pay then.

> 
> Setting this to zero will disable the mechanism.
> 
> Signed-off-by: Colin Ian King <colin.king@intel.com>
> ---
>  block/blk-mq.c                   |   2 +
>  drivers/cpuidle/Kconfig          |  10 +++
>  drivers/cpuidle/Makefile         |   1 +
>  drivers/cpuidle/governors/menu.c |   4 +
>  drivers/cpuidle/psd.c            | 123 +++++++++++++++++++++++++++++++
>  include/linux/cpuidle_psd.h      |  32 ++++++++
>  6 files changed, 172 insertions(+)
>  create mode 100644 drivers/cpuidle/psd.c
>  create mode 100644 include/linux/cpuidle_psd.h
> 


Thread overview: 12+ messages
     [not found] <33882f284ac6e6d1ec766ca4bb2f3b88@intel.com>
2025-03-03 22:24 ` Christian Loehle [this message]
2025-03-17 10:03   ` [PATCH] cpuidle: psd: add power sleep demotion prevention for fast I/O devices King, Colin
2025-03-23  9:18     ` Christian Loehle
2025-03-26 15:14       ` King, Colin
2025-03-23 12:35     ` Bart Van Assche
2025-03-26 15:04       ` King, Colin
2025-03-26 15:14         ` Rafael J. Wysocki
2025-03-26 16:26         ` Christian Loehle
2025-03-26 17:46           ` Rafael J. Wysocki
2025-04-01 15:03           ` King, Colin
2025-04-01 15:15             ` Christian Loehle
2025-04-01 16:41               ` Rafael J. Wysocki
