public inbox for linux-kernel@vger.kernel.org
* [PATCH 0/1] nvme-pci: Add CPU latency pm-qos handling
@ 2024-10-04 10:09 Tero Kristo
  2024-10-04 10:09 ` [PATCH 1/1] " Tero Kristo
  0 siblings, 1 reply; 9+ messages in thread
From: Tero Kristo @ 2024-10-04 10:09 UTC (permalink / raw)
  Cc: linux-kernel, axboe, hch, linux-nvme, sagi, kbusch

Hello,

Re-posting this now that 6.12-rc1 is out, as the previous RFC didn't
receive any feedback. The patch itself is unchanged, but I have included
the cover letter for details.

The patch adds a mechanism for tackling NVMe latency with random
workloads. A new sysfs knob (cpu_latency_us) is added under NVMe devices,
which can be used to fine-tune the PM QoS CPU latency limit while the
NVMe device is operational.
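
As a rough illustration of how the knob might be driven from userspace,
here is a minimal Python sketch. It assumes the knob is an ordinary
read/write sysfs text attribute; the helper names and the idea of passing
the attribute path in as a parameter are hypothetical, since the exact
sysfs location is not spelled out in this cover letter. The values 10
(enabled) and -1 (disabled) are the ones used in the measurements in this
letter; rejecting anything below -1 is an assumption of the sketch, not
something the patch is known to enforce.

```python
from pathlib import Path


def set_cpu_latency_limit(knob: Path, limit_us: int) -> None:
    """Write a CPU latency limit in microseconds to the given knob file.

    `knob` is a hypothetical path to a cpu_latency_us attribute.
    -1 is treated as "disabled", matching the measurements in this letter.
    """
    if limit_us < -1:
        # Assumption of this sketch: only -1 or non-negative values make sense.
        raise ValueError("limit_us must be -1 (disabled) or non-negative")
    knob.write_text(f"{limit_us}\n")


def get_cpu_latency_limit(knob: Path) -> int:
    """Read the currently configured limit back as an integer."""
    return int(knob.read_text().strip())
```

The path is kept as an argument so the same helpers work against whichever
attribute file a given device exposes; writing to real sysfs files
normally requires root.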

Below is a post-processed measurement run on an Icelake Xeon platform,
measuring latencies with the 'fio' tool using random-read and read
profiles. Five random-read and five bulk-read runs are done with the
latency limit enabled and disabled, and the maximum 'slat' (submission
latency), 'clat' (completion latency) and 'lat' (total latency) values
are shown for each run; values are in microseconds. The bandwidth is
measured with the 'read' payload of fio, and min-avg-max values are
shown in MiB/s. c6% is the percentage of time the CPU running 'fio'
spent in the C6 state during the test.

==
Setting cpu_latency_us limit to 10 (enabled)
  slat: 31, clat: 99, lat: 113, bw: 1156-1332-1359, c6%: 2.8
  slat: 49, clat: 135, lat: 143, bw: 1156-1332-1361, c6%: 1.0
  slat: 67, clat: 148, lat: 156, bw: 1159-1331-1361, c6%: 0.9
  slat: 51, clat: 99, lat: 107, bw: 1160-1330-1356, c6%: 1.0
  slat: 82, clat: 114, lat: 122, bw: 1156-1333-1359, c6%: 1.0
Setting cpu_latency_us limit to -1 (disabled)
  slat: 112, clat: 275, lat: 364, bw: 1153-1334-1364, c6%: 80.0
  slat: 110, clat: 270, lat: 324, bw: 1164-1338-1369, c6%: 80.1
  slat: 106, clat: 260, lat: 320, bw: 1159-1330-1362, c6%: 79.7
  slat: 110, clat: 255, lat: 300, bw: 1156-1332-1363, c6%: 80.2
  slat: 107, clat: 248, lat: 322, bw: 1152-1331-1362, c6%: 79.9
==

In summary, the C6-induced latencies are eliminated from the random-read
tests (maximum 'clat' drops from roughly 250+ us to 100-150 us), and in
the maximum throughput testing the bandwidth is not negatively impacted
(the bandwidth values are practically identical), so the overhead
introduced is minimal.
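
To make the comparison easy to re-check, the following Python sketch
parses the two result blocks above and averages each column per setting.
The function names and the dictionary keys are illustrative only; the
numbers are copied verbatim from the table.

```python
import re
from statistics import mean

# Result lines copied from the measurement table (limit 10 us vs. -1).
ENABLED = """
slat: 31, clat: 99, lat: 113, bw: 1156-1332-1359, c6%: 2.8
slat: 49, clat: 135, lat: 143, bw: 1156-1332-1361, c6%: 1.0
slat: 67, clat: 148, lat: 156, bw: 1159-1331-1361, c6%: 0.9
slat: 51, clat: 99, lat: 107, bw: 1160-1330-1356, c6%: 1.0
slat: 82, clat: 114, lat: 122, bw: 1156-1333-1359, c6%: 1.0
"""
DISABLED = """
slat: 112, clat: 275, lat: 364, bw: 1153-1334-1364, c6%: 80.0
slat: 110, clat: 270, lat: 324, bw: 1164-1338-1369, c6%: 80.1
slat: 106, clat: 260, lat: 320, bw: 1159-1330-1362, c6%: 79.7
slat: 110, clat: 255, lat: 300, bw: 1156-1332-1363, c6%: 80.2
slat: 107, clat: 248, lat: 322, bw: 1152-1331-1362, c6%: 79.9
"""

LINE_RE = re.compile(
    r"slat: (\d+), clat: (\d+), lat: (\d+), "
    r"bw: (\d+)-(\d+)-(\d+), c6%: ([\d.]+)"
)


def parse(block: str) -> list[dict]:
    """Turn each result line into a dict of numeric fields."""
    rows = []
    for line in block.strip().splitlines():
        m = LINE_RE.search(line)
        if m is None:
            raise ValueError(f"unrecognised line: {line!r}")
        slat, clat, lat, bw_min, bw_avg, bw_max = map(int, m.groups()[:6])
        rows.append({
            "slat": slat, "clat": clat, "lat": lat,
            "bw_min": bw_min, "bw_avg": bw_avg, "bw_max": bw_max,
            "c6": float(m.group(7)),
        })
    return rows


def summarize(rows: list[dict]) -> dict:
    """Average the per-run values of the columns discussed in the summary."""
    keys = ("slat", "clat", "lat", "bw_avg", "c6")
    return {k: mean(r[k] for r in rows) for k in keys}


if __name__ == "__main__":
    for name, block in (("enabled", ENABLED), ("disabled", DISABLED)):
        print(name, summarize(parse(block)))
```

With these inputs the mean of the maximum 'clat' values works out to
119 us with the limit enabled against 261.6 us with it disabled, while
the mean of the average bandwidth is 1331.6 MiB/s against 1333.0 MiB/s.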

-Tero




Thread overview: 9+ messages
2024-10-04 10:09 [PATCH 0/1] nvme-pci: Add CPU latency pm-qos handling Tero Kristo
2024-10-04 10:09 ` [PATCH 1/1] " Tero Kristo
2024-10-07  6:19   ` Christoph Hellwig
2024-10-09  6:45     ` Tero Kristo
2024-10-09  8:00       ` Christoph Hellwig
2024-10-09  8:24         ` Tero Kristo
2024-10-15  9:25           ` Tero Kristo
2024-10-15 13:29             ` Christoph Hellwig
2024-10-18  7:58               ` Tero Kristo
