public inbox for linux-block@vger.kernel.org
 help / color / mirror / Atom feed
From: Dennis Zhou <dennis@kernel.org>
To: Mike Snitzer <snitzer@kernel.org>
Cc: tj@kernel.org, axboe@kernel.dk, linux-block@vger.kernel.org,
	dm-devel@redhat.com
Subject: Re: can we reduce bio_set_dev overhead due to bio_associate_blkg?
Date: Wed, 30 Mar 2022 08:28:28 -0400	[thread overview]
Message-ID: <YkRM7Iyp8m6A1BCl@fedora> (raw)
In-Reply-To: <YkSK6mU1fja2OykG@redhat.com>

Hi Mike,

On Wed, Mar 30, 2022 at 12:52:58PM -0400, Mike Snitzer wrote:
> Hey Tejun and Dennis,
> 
> I recently found that due to bio_set_dev()'s call to
> bio_associate_blkg(), bio_set_dev() needs much more cpu than ideal;
> especially when doing 4K IOs via io_uring's HIPRI bio-polling.
> 
> I'm very naive about blk-cgroups.. so I'm hopeful you or others can
> help me cut through this to understand what the ideal outcome should
> be for DM's bio clone + remap heavy use-case as it relates to
> bio_associate_blkg.
> 
> If I hack dm-linear with a local __bio_set_dev that simply removes
> the call to bio_associate_blkg() my IOPS go from ~980K to 995K.
> 
> Looking at what is happening a bit, relative to this DM bio cloning
> usecase, it seems __bio_clone() calls bio_clone_blkg_association() to
> clone the blkg from DM device, then dm-linear.c:linear_map's call
> to bio_set_dev() will cause bio_associate_blkg(bio) to reuse the css
> but then it triggers an update because the bdev is being remapped in
> the bio (due to linear_map sending the IO to the real underlying
> device). End result _seems_ like collective wasteful effort to get the
> blk-cgroup resources setup properly in the face of a simple remap.
> 
> Seems the current DM pattern is causing repeat blkg work for _every_
> remapped bio?  Do you see a way to speed up repeat calls to
> bio_associate_blkg()?
> 

I must admit I wrote this with limited knowledge of bio cloning at the
time. I can fill in the thought process here.

The idea was every bio should have a blkg associated with it for io
accounting and things like blk-iolatency and blk-iocost. The device
abstraction I believe means we can set limits here as well on submission
rate to the md device.

I think cloning is a special case that I might have gotten wrong. If
there is a bio_set_dev() call after each clone(), then the
bio_clone_blkg_association() is excess work. We'd need to audit how
bio_alloc_clone() is being used to be safe. Alternatively, we could opt
for a bio_alloc_clone_noblkg(), but that's a little bit uglier.

1. bio_set_dev() above md <- needed so we can do throttling on the md.
2. bio_alloc_clone() <- doesn't need to clone the blkg() info.
3. bio_set_dev() in md <- sets the right underlying device association.

Thanks,
Dennis

> Test kernel is my latest dm-5.19 branch (though latest Linus 5.18-rc0
> kernel should be fine too):
> https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-5.19
> 
> I'm using dm-linear ontop on a 16G blk-mq null_blk device:
> 
> modprobe null_blk queue_mode=2 poll_queues=2 bs=4096 gb=16
> SIZE=`blockdev --getsz /dev/nullb0`
> echo "0 $SIZE linear /dev/nullb0 0" | dmsetup create linear
> 
> And running the workload with fio using this wrapper script:
> io_uring.sh 20 1 /dev/mapper/linear 4096
> 
> #!/bin/bash
> 
> RTIME=$1
> JOBS=$2
> DEV=$3
> BS=$4
> 
> QD=64
> BATCH=16
> HI=1
> 
> fio --bs=$BS --ioengine=io_uring --fixedbufs --registerfiles --hipri=$HI \
>         --iodepth=$QD \
>         --iodepth_batch_submit=$BATCH \
>         --iodepth_batch_complete_min=$BATCH \
>         --filename=$DEV \
>         --direct=1 --runtime=$RTIME --numjobs=$JOBS --rw=randread \
>         --name=test --group_reporting

  reply	other threads:[~2022-03-30 19:23 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-30 16:52 can we reduce bio_set_dev overhead due to bio_associate_blkg? Mike Snitzer
2022-03-30 12:28 ` Dennis Zhou [this message]
2022-03-31  4:39   ` Christoph Hellwig
2022-03-31  5:52     ` Dennis Zhou
2022-03-31  9:15       ` Christoph Hellwig
2022-04-08 15:42         ` Mike Snitzer
2022-04-09  5:15           ` Christoph Hellwig
2022-04-11 16:58             ` Mike Snitzer
2022-04-11 17:16               ` Mike Snitzer
2022-04-11 17:33                 ` [PATCH] block: remove redundant blk-cgroup init from __bio_clone Mike Snitzer
2022-04-12  5:27                   ` Christoph Hellwig
2022-04-12  7:52                     ` Dennis Zhou
2022-04-23 16:55                   ` Christoph Hellwig
2022-04-26 17:30                     ` Mike Snitzer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YkRM7Iyp8m6A1BCl@fedora \
    --to=dennis@kernel.org \
    --cc=axboe@kernel.dk \
    --cc=dm-devel@redhat.com \
    --cc=linux-block@vger.kernel.org \
    --cc=snitzer@kernel.org \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox