Linux block layer
From: Diangang Li <diangangli@gmail.com>
To: axboe@kernel.dk
Cc: linux-kernel@vger.kernel.org, linux-block@vger.kernel.org,
	Diangang Li <lidiangang@bytedance.com>
Subject: [RFC 0/1] block: export I/O latency histograms
Date: Thu,  2 Jul 2026 21:27:11 +0800
Message-ID: <20260702132712.2255703-1-diangangli@gmail.com>

From: Diangang Li <lidiangang@bytedance.com>

Hi,

The existing block I/O statistics count completed I/Os and accumulate the
time spent in each operation group. That works for average latency, but
not for the tail. Once the time is folded into a single total, userspace
cannot tell whether a device saw a steady stream of moderate I/Os or a
small number of very slow ones.

This RFC adds cumulative latency histograms for block devices and
partitions. The new accounting is in the same completion paths as the
existing I/O statistics and uses the same operation groups: read, write,
discard, and flush.

Two proc files are added:

  /proc/disk_lat_buckets
        bucket upper bounds, in microseconds

  /proc/disk_lat_hists
        cumulative histogram counters

/proc/disk_lat_hists follows the shape of /proc/diskstats. Each
reported device or partition has four consecutive lines, in read,
write, discard, flush order. Each line starts with the major number,
minor number, and device name, followed by the bucket counters.
Userspace can sample the file twice and compute interval histograms and
percentiles from the deltas.
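
As a rough illustration of that sampling flow, here is a minimal userspace
sketch. It assumes whitespace-separated fields, that each counter holds
the count for its own bucket (not a running sum across buckets), and that
the last counter is the overflow bucket. The helper names parse_hists,
delta and percentile_bound are hypothetical and not part of the patch.

```python
from typing import Dict, List, Optional, Tuple

# Line order per device, as described in the cover letter.
OPS = ("read", "write", "discard", "flush")


def parse_hists(text: str) -> Dict[Tuple[str, str], List[int]]:
    """Map (device name, op) to its list of bucket counters."""
    hists: Dict[Tuple[str, str], List[int]] = {}
    seen: Dict[str, int] = {}  # device name -> lines consumed so far
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # need major, minor, name and at least one counter
        name = fields[2]
        idx = seen.get(name, 0)
        seen[name] = idx + 1
        if idx >= len(OPS):
            continue  # more lines than expected for this device
        hists[(name, OPS[idx])] = [int(f) for f in fields[3:]]
    return hists


def delta(before, after):
    """Per-bucket difference between two samples (interval histogram)."""
    out = {}
    for key, new in after.items():
        old = before.get(key)
        if old is None or len(old) != len(new):
            continue  # device appeared in between, or layout changed
        out[key] = [n - o for n, o in zip(new, old)]
    return out


def percentile_bound(counts: List[int], bounds_us: List[int],
                     pct: float) -> Optional[float]:
    """Upper bound (us) of the bucket holding the pct-th percentile.

    Returns None when the interval saw no I/O, and infinity when the
    percentile falls in the overflow bucket past the last bound.
    """
    total = sum(counts)
    if total == 0:
        return None
    target = total * pct / 100.0
    running = 0
    for i, count in enumerate(counts):
        running += count
        if running >= target:
            return bounds_us[i] if i < len(bounds_us) else float("inf")
    return float("inf")
```

A collector would read /proc/disk_lat_hists twice, pass both texts through
parse_hists(), and feed delta() into percentile_bound() together with the
bounds read from /proc/disk_lat_buckets. The result is only as precise as
the bucket width, since it reports a bucket's upper bound rather than an
interpolated value.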

eBPF is useful for targeted debugging, but it is not a good match for
this interface. These counters are block accounting data, tied to the
same accounting points as diskstats and readable without a resident
userspace collector.

The histogram storage is per block_device and optional. If allocation
fails, bd_lat_hist remains NULL and regular I/O statistics keep working.
The record side uses per-cpu counters.

The current bucket table has 24 upper bounds, from 10 us to 8 seconds,
which gives 25 counters: one per bound, plus one for completions slower
than the last bound. That covers both fast NVMe devices and slow disks
without making the per-device state too large.
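
To make the bounds-to-counters relationship concrete, the sketch below
maps a latency to a bucket index. The four bounds are placeholders for
illustration only, not the table from the patch, and treating each upper
bound as inclusive is an assumption. The patch itself does this in C on
the completion path; Python is used here only to show the arithmetic.

```python
from bisect import bisect_left

# Placeholder upper bounds in microseconds (NOT the patch's 24-entry table).
BOUNDS_US = [10, 100, 1000, 10000]


def bucket_index(latency_us, bounds_us=BOUNDS_US):
    """Index of the first bucket whose upper bound is >= latency_us.

    Returns len(bounds_us) for latencies above every bound, which is why
    N bounds need N + 1 counters.
    """
    return bisect_left(bounds_us, latency_us)


def record(counters, latency_us, bounds_us=BOUNDS_US):
    """Account one completion in the matching bucket."""
    counters[bucket_index(latency_us, bounds_us)] += 1


counters = [0] * (len(BOUNDS_US) + 1)
for lat in (5, 10, 11, 20000):
    record(counters, lat)
```

With 24 bounds the same lookup yields indices 0 through 24, matching the
25 counters per line.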

Fio tests on NVMe and HDD devices did not show a consistent performance
regression, and confirmed that histogram deltas match the corresponding
diskstats completion counters.

Diangang Li (1):
  block: export I/O latency histograms

 Documentation/ABI/testing/procfs-diskstats |  25 ++++
 block/Makefile                             |   2 +-
 block/bdev.c                               |   2 +
 block/blk-core.c                           |   4 +-
 block/blk-flush.c                          |   5 +-
 block/blk-mq.c                             |   4 +-
 block/blk.h                                |   7 +
 block/disk-lat-hist.c                      | 158 +++++++++++++++++++++
 block/genhd.c                              |  10 ++
 include/linux/blk_types.h                  |   1 +
 10 files changed, 213 insertions(+), 5 deletions(-)
 create mode 100644 block/disk-lat-hist.c

-- 
2.39.5

