From: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
To: linux-block@vger.kernel.org, Jens Axboe <axboe@kernel.dk>,
linux-kernel@vger.kernel.org
Cc: Mikulas Patocka <mpatocka@redhat.com>,
Mike Snitzer <snitzer@redhat.com>, Ming Lei <ming.lei@redhat.com>
Subject: [PATCH v3 1/3] block/diskstats: more accurate approximation of io_ticks for slow disks
Date: Tue, 24 Mar 2020 09:39:40 +0300 [thread overview]
Message-ID: <158503198072.1955.16227279292140721351.stgit@buzz> (raw)
In-Reply-To: <158503038812.1955.7827988255138056389.stgit@buzz>
Currently io_ticks is approximated by adding one at each start and end of
requests if jiffies counter has changed. This works perfectly for requests
shorter than a jiffy or if one of requests starts/ends at each jiffy.
If disk executes just one request at a time and they are longer than two
jiffies then only first and last jiffies will be accounted.
Fix is simple: at the end of request add up into io_ticks jiffies passed
since last update rather than just one jiffy.
Example: common HDD executes random read 4k requests around 12ms.
fio --name=test --filename=/dev/sdb --rw=randread --direct=1 --runtime=30 &
iostat -x 10 sdb
Note changes of iostat's "%util" 8,43% -> 99,99% before/after patch:
Before:
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdb 0,00 0,00 82,60 0,00 330,40 0,00 8,00 0,96 12,09 12,09 0,00 1,02 8,43
After:
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdb 0,00 0,00 82,50 0,00 330,00 0,00 8,00 1,00 12,10 12,10 0,00 12,12 99,99
For load estimation "%util" is not as useful as average queue length,
but it clearly shows how often disk queue is completely empty.
Fixes: 5b18b5a73760 ("block: delete part_round_stats and switch to less precise counting")
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
---
Documentation/admin-guide/iostats.rst | 5 ++++-
block/bio.c | 8 ++++----
block/blk-core.c | 4 ++--
include/linux/genhd.h | 2 +-
4 files changed, 11 insertions(+), 8 deletions(-)
diff --git a/Documentation/admin-guide/iostats.rst b/Documentation/admin-guide/iostats.rst
index df5b8345c41d..9b14b0c2c9c4 100644
--- a/Documentation/admin-guide/iostats.rst
+++ b/Documentation/admin-guide/iostats.rst
@@ -100,7 +100,7 @@ Field 10 -- # of milliseconds spent doing I/Os (unsigned int)
Since 5.0 this field counts jiffies when at least one request was
started or completed. If request runs more than 2 jiffies then some
- I/O time will not be accounted unless there are other requests.
+ I/O time might be not accounted in case of concurrent requests.
Field 11 -- weighted # of milliseconds spent doing I/Os (unsigned int)
This field is incremented at each I/O start, I/O completion, I/O
@@ -143,6 +143,9 @@ are summed (possibly overflowing the unsigned long variable they are
summed to) and the result given to the user. There is no convenient
user interface for accessing the per-CPU counters themselves.
+Since 4.19 request times are measured with nanoseconds precision and
+truncated to milliseconds before showing in this interface.
+
Disks vs Partitions
-------------------
diff --git a/block/bio.c b/block/bio.c
index 0985f3422556..b1053eb7af37 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1762,14 +1762,14 @@ void bio_check_pages_dirty(struct bio *bio)
schedule_work(&bio_dirty_work);
}
-void update_io_ticks(struct hd_struct *part, unsigned long now)
+void update_io_ticks(struct hd_struct *part, unsigned long now, bool end)
{
unsigned long stamp;
again:
stamp = READ_ONCE(part->stamp);
if (unlikely(stamp != now)) {
if (likely(cmpxchg(&part->stamp, stamp, now) == stamp)) {
- __part_stat_add(part, io_ticks, 1);
+ __part_stat_add(part, io_ticks, end ? now - stamp : 1);
}
}
if (part->partno) {
@@ -1785,7 +1785,7 @@ void generic_start_io_acct(struct request_queue *q, int op,
part_stat_lock();
- update_io_ticks(part, jiffies);
+ update_io_ticks(part, jiffies, false);
part_stat_inc(part, ios[sgrp]);
part_stat_add(part, sectors[sgrp], sectors);
part_inc_in_flight(q, part, op_is_write(op));
@@ -1803,7 +1803,7 @@ void generic_end_io_acct(struct request_queue *q, int req_op,
part_stat_lock();
- update_io_ticks(part, now);
+ update_io_ticks(part, now, true);
part_stat_add(part, nsecs[sgrp], jiffies_to_nsecs(duration));
part_stat_add(part, time_in_queue, duration);
part_dec_in_flight(q, part, op_is_write(req_op));
diff --git a/block/blk-core.c b/block/blk-core.c
index abfdcf81a228..4401b30a1751 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1337,7 +1337,7 @@ void blk_account_io_done(struct request *req, u64 now)
part_stat_lock();
part = req->part;
- update_io_ticks(part, jiffies);
+ update_io_ticks(part, jiffies, true);
part_stat_inc(part, ios[sgrp]);
part_stat_add(part, nsecs[sgrp], now - req->start_time_ns);
part_stat_add(part, time_in_queue, nsecs_to_jiffies64(now - req->start_time_ns));
@@ -1379,7 +1379,7 @@ void blk_account_io_start(struct request *rq, bool new_io)
rq->part = part;
}
- update_io_ticks(part, jiffies);
+ update_io_ticks(part, jiffies, false);
part_stat_unlock();
}
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index d5c75df64bba..f1066f10b062 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -467,7 +467,7 @@ static inline void free_part_info(struct hd_struct *part)
kfree(part->info);
}
-void update_io_ticks(struct hd_struct *part, unsigned long now);
+void update_io_ticks(struct hd_struct *part, unsigned long now, bool end);
/* block/genhd.c */
extern void device_add_disk(struct device *parent, struct gendisk *disk,
next prev parent reply other threads:[~2020-03-24 6:39 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-03-24 6:39 [PATCH v3 0/3] block/diskstats: more accurate io_ticks and optimization Konstantin Khlebnikov
2020-03-24 6:39 ` Konstantin Khlebnikov [this message]
2020-03-24 14:06 ` [PATCH v3 1/3] block/diskstats: more accurate approximation of io_ticks for slow disks Ming Lei
2020-03-25 3:40 ` Ming Lei
2020-03-25 6:28 ` Konstantin Khlebnikov
2020-03-25 8:02 ` Konstantin Khlebnikov
2020-03-25 8:54 ` Ming Lei
2020-03-25 13:02 ` Konstantin Khlebnikov
2020-03-26 7:53 ` Ming Lei
2020-03-24 6:39 ` [PATCH v3 2/3] block/diskstats: accumulate all per-cpu counters in one pass Konstantin Khlebnikov
2020-03-24 6:39 ` [PATCH v3 3/3] block/diskstats: replace time_in_queue with sum of request times Konstantin Khlebnikov
2020-03-24 16:06 ` [PATCH v3 0/3] block/diskstats: more accurate io_ticks and optimization Jens Axboe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=158503198072.1955.16227279292140721351.stgit@buzz \
--to=khlebnikov@yandex-team.ru \
--cc=axboe@kernel.dk \
--cc=linux-block@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=ming.lei@redhat.com \
--cc=mpatocka@redhat.com \
--cc=snitzer@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox