From: Chengming Zhou <chengming.zhou@linux.dev>
To: Tejun Heo <tj@kernel.org>
Cc: axboe@kernel.dk, linux-block@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	Chengming Zhou <zhouchengming@bytedance.com>
Subject: Re: [PATCH] blk-mq: fix incorrect rq start_time_ns and alloc_time_ns after throttled
Date: Tue, 6 Jun 2023 18:22:28 +0800
Message-ID: <c82d6bab-36d0-0403-9304-4415f6ffd972@linux.dev>
In-Reply-To: <ZH4p8tqFc57_OYoH@slm.duckdns.org>

On 2023/6/6 02:31, Tejun Heo wrote:
> Hello,
> 
> On Thu, Jun 01, 2023 at 01:39:19PM +0800, chengming.zhou@linux.dev wrote:
>> From: Chengming Zhou <zhouchengming@bytedance.com>
>>
>> iocost relies on rq start_time_ns and alloc_time_ns to tell the saturation
>> state of the block device.
>>
>> If any qos ->throttle() ends up blocking, the cached rq start_time_ns and
>> alloc_time_ns will include the throttled time, which can confuse its users.
> 
> I don't follow. rq_qos_throttle() happens before a request is allocated, so
> whether ->throttle() blocks or not doesn't affect alloc_time_ns or
> start_time_ns.

Yes, most of the time the request is allocated after rq_qos_throttle(),
so its alloc_time_ns and start_time_ns are obviously not affected.

But with the plug batched allocation introduced by commit 47c122e35d7e
("block: pre-allocate requests if plug is started and is a batch"),
rq_qos_throttle() can be called after the request has already been
allocated. This is what blk_mq_get_cached_request() does.

In this case, the cached request's alloc_time_ns and start_time_ns end
up far in the past if the task blocks in any qos ->throttle().
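
For reference, this is roughly what that path looks like, based on my
reading of blk_mq_get_cached_request() in block/blk-mq.c (a simplified
sketch, not the exact kernel code; the real function also checks hctx
type, flush flags and bio merging):

static struct request *cached_rq_sketch(struct request_queue *q,
					struct blk_plug *plug, struct bio *bio)
{
	struct request *rq = rq_list_peek(&plug->cached_rq);

	if (!rq || rq->q != q)
		return NULL;	/* fall back to the normal allocation path */

	/*
	 * alloc_time_ns and start_time_ns were recorded when the batch
	 * was pre-allocated.  Only now does qos throttling run, so any
	 * time the task sleeps in ->throttle() is silently folded into
	 * the cached request's timestamps.
	 */
	plug->cached_rq = rq_list_next(rq);
	rq_qos_throttle(q, bio);

	return rq;
}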

> 
>> This patch adds an nr_flush counter to blk_plug, so we can tell if the task
>> has been throttled in any qos ->throttle(), in which case we need to correct
>> the rq start_time_ns and alloc_time_ns.
>>
>> Another solution may be to make rq_qos_throttle() return bool to indicate
>> whether it throttled in any qos ->throttle(). But that needs more changes.
>>
>> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
> 
> Depending on the flush behavior and adjusting alloc_time_ns seems fragile to
> me and will likely confuse other users of alloc_time_ns too.

I agree with you that this code is not good. My basic idea is to adjust
the cached request's alloc_time_ns and start_time_ns when it has been
throttled.
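
Roughly, the adjustment I have in mind looks like this (a minimal
sketch, not the actual patch; it assumes the new plug->nr_flush counter
is incremented whenever the plug is flushed, which is what happens when
the task blocks in ->throttle() and schedules out):

	unsigned short nr_flush = plug->nr_flush;

	rq_qos_throttle(q, bio);

	/*
	 * If the task blocked in ->throttle(), the plug was flushed and
	 * plug->nr_flush changed, so the cached request's timestamps
	 * are stale and need to be refreshed.
	 */
	if (nr_flush != plug->nr_flush) {
		u64 now = ktime_get_ns();

		rq->start_time_ns = now;
		if (blk_queue_rq_alloc_time(q))
			rq->alloc_time_ns = now;
	}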

> 
> Maybe I'm misunderstanding the problem you're describing. Can you give a
> concrete example of how the current code would misbehave?
> 

I tried using fio to reproduce it:

1. set the iocost qos (a rather strict qos setting, to make throttling easy to reproduce)

echo "259:0 enable=1 rpct=5 rlat=500 wpct=5 wlat=500" > io.cost.qos

2. run fio with the io_uring ioengine (for now only io_uring uses batched allocation)

fio --name global --runtime 30 --time_based --size 10G --ioengine io_uring \
--iodepth 256 --buffered 0 --sqthread_poll \
--name job1 --rw read --cgroup job1 --numjobs 10 \
--name job2 --rw write --cgroup job2 --numjobs 10

3. run bpftrace to check request start_time_ns

bpftrace -e 'kprobe:__rq_qos_track {
	$rq = (struct request *)arg1;
	if ($rq->start_time_ns) {
		@delta = hist((nsecs - $rq->start_time_ns)/1000);
	}
}'


If we go through blk_mq_get_cached_request() -> ->throttle() and get
throttled for some time, the returned cached request's start_time_ns
will be far in the past.

Like below (delta values are in us):

@delta:
[0]               170090 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[1]                  898 |                                                    |
[2, 4)               418 |                                                    |
[4, 8)               284 |                                                    |
[8, 16)               54 |                                                    |
[16, 32)             198 |                                                    |
[32, 64)            5416 |@                                                   |
[64, 128)           5082 |@                                                   |
[128, 256)          1296 |                                                    |
[256, 512)            23 |                                                    |
[512, 1K)           2632 |                                                    |
[1K, 2K)           21143 |@@@@@@                                              |
[2K, 4K)           26349 |@@@@@@@@                                            |
[4K, 8K)            4559 |@                                                   |
[8K, 16K)           4273 |@                                                   |
[16K, 32K)            14 |                                                    |

