From: Jens Axboe <axboe@kernel.dk>
To: Yu Kuai <yukuai1@huaweicloud.com>, Ming Lei <ming.lei@redhat.com>
Cc: nvdimm@lists.linux.dev, linux-kernel@vger.kernel.org,
song@kernel.org, dm-devel@redhat.com, ira.weiny@intel.com,
agk@redhat.com, drbd-dev@lists.linbit.com, dave.jiang@intel.com,
christoph.boehmwalder@linbit.com, vishal.l.verma@intel.com,
konrad.wilk@oracle.com, "yukuai (C)" <yukuai3@huawei.com>,
kent.overstreet@gmail.com, ngupta@vflare.org, kch@nvidia.com,
senozhatsky@chromium.org,
Gulam Mohamed <gulam.mohamed@oracle.com>,
snitzer@kernel.org, colyli@suse.de, linux-block@vger.kernel.org,
linux-bcache@vger.kernel.org, dan.j.williams@intel.com,
linux-raid@vger.kernel.org, martin.petersen@oracle.com,
philipp.reisner@linbit.com, junxiao.bi@oracle.com,
minchan@kernel.org, lars.ellenberg@linbit.com
Subject: Re: [dm-devel] [RFC] block: Change the granularity of io ticks from ms to ns
Date: Wed, 7 Dec 2022 10:22:09 -0700
Message-ID: <b8deb6fa-8a09-c1af-278f-24e66afe367d@kernel.dk>
In-Reply-To: <aadfc6d2-ad04-279c-a1d6-7f634d0b2c99@huaweicloud.com>

On 12/7/22 6:09 AM, Yu Kuai wrote:
> Hi,
>
> On 2022/12/07 11:15, Ming Lei wrote:
>> On Wed, Dec 07, 2022 at 10:19:08AM +0800, Yu Kuai wrote:
>>> Hi,
>>>
>>> On 2022/12/07 2:15, Gulam Mohamed wrote:
>>>> Use ktime to change the granularity of IO accounting in the block layer
>>>> from milliseconds to nanoseconds to get proper latency values for
>>>> devices whose latency is in microseconds. After changing the granularity
>>>> to nanoseconds, the iostat command, which was showing incorrect values
>>>> for %util, now shows correct values.
>>>
>>> This patch doesn't correct the counting of io_ticks; it just changes
>>> the granularity of the accounting error from jiffies (ms) to ns. The
>>> problem that util can be smaller or larger still exists.
>>
>> Agree.
>>
>>>
>>> However, I think this change makes sense, considering that the error
>>> margin is much smaller, and the performance overhead should be minimal.
>>>
>>> Hi Ming, what do you think?
>>
>> I remember that ktime_get() has non-negligible overhead. Is there any
>> test data (iops/cpu utilization) from running fio or t/io_uring on
>> null_blk with this patch?
>
> Yes, testing with null_blk is necessary, we don't want any performance
> regression.

null_blk is fine as a substitute, but I'd much rather run this on my
test bench with actual IO and devices.

> BTW, I thought it's fine because it's already used for tracking io
> latency.

Reading a nsec timestamp is a LOT more expensive than reading jiffies,
which is essentially free. If you look at the amount of work that's
gone into minimizing ktime_get() for the fast path in the IO stack,
then that's a testament to that.

So that's a very bad assumption, and definitely wrong.

--
Jens Axboe
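
The cost difference described above can be seen, very roughly, from user
space. The sketch below is only an analogy and is not kernel code or part of
the patch: it assumes that clock_gettime(CLOCK_MONOTONIC) is a fair stand-in
for a nanosecond clock read such as ktime_get(), and that a plain load of a
periodically updated counter is a fair stand-in for reading jiffies. The
names fake_jiffies, ns_between, clock_read_cost_ns and counter_read_cost_ns
are made up for illustration.

```c
/* User-space analogy only; none of these names exist in the kernel. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for jiffies: a counter updated elsewhere. */
static volatile uint64_t fake_jiffies;

/* Nanoseconds elapsed from a to b. */
static int64_t ns_between(struct timespec a, struct timespec b)
{
    return (int64_t)(b.tv_sec - a.tv_sec) * 1000000000LL
           + (int64_t)(b.tv_nsec - a.tv_nsec);
}

/* Average cost, in ns, of one nanosecond-resolution clock read. */
static double clock_read_cost_ns(long iters)
{
    struct timespec start, end, tmp;
    volatile long sink = 0;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < iters; i++) {
        clock_gettime(CLOCK_MONOTONIC, &tmp);
        sink += tmp.tv_nsec;      /* keep the call from being optimized out */
    }
    clock_gettime(CLOCK_MONOTONIC, &end);
    (void)sink;
    return (double)ns_between(start, end) / (double)iters;
}

/* Average cost, in ns, of one plain read of the cached counter. */
static double counter_read_cost_ns(long iters)
{
    struct timespec start, end;
    volatile uint64_t sink = 0;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < iters; i++)
        sink += fake_jiffies;     /* one memory load per iteration */
    clock_gettime(CLOCK_MONOTONIC, &end);
    (void)sink;
    return (double)ns_between(start, end) / (double)iters;
}
```

Calling both helpers with the same iteration count from a small main() and
printing the two averages gives a per-read comparison. The absolute numbers
depend on the machine, compiler flags and clocksource, and they say nothing
about real block-layer behaviour, so this does not replace the fio or
t/io_uring testing asked for in the thread.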