All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ming Lei <ming.lei@redhat.com>
To: Nitesh Shetty <nj.shetty@samsung.com>
Cc: Christoph Hellwig <hch@infradead.org>,
	Jens Axboe <axboe@kernel.dk>,
	linux-block@vger.kernel.org, io-uring@vger.kernel.org,
	Caleb Sander Mateos <csander@purestorage.com>,
	huang-jl <huang-jl@deepseek.com>
Subject: Re: [PATCH 1/3] block: fix bio_may_need_split() by using bvec iterator way
Date: Fri, 19 Dec 2025 00:08:51 +0800	[thread overview]
Message-ID: <aUQnE0b46XFGrLOd@fedora> (raw)
In-Reply-To: <20251218151522.m4dgyf4nkrfsgkrv@green245.gost>

On Thu, Dec 18, 2025 at 08:46:47PM +0530, Nitesh Shetty wrote:
> On 18/12/25 05:45PM, Ming Lei wrote:
> > On Thu, Dec 18, 2025 at 01:37:37AM -0800, Christoph Hellwig wrote:
> > > On Thu, Dec 18, 2025 at 05:31:42PM +0800, Ming Lei wrote:
> > > > ->bi_vcnt doesn't make sense for cloned bio, which is perfectly fine
> > > > passed to bio_may_need_split().
> > > >
> > > > So fix bio_may_need_split() by not taking ->bi_vcnt directly, instead
> > > > checking with help from bio size and bvec->len.
> > > >
> > > > Meantime retrieving the 1st bvec via __bvec_iter_bvec().
> > > 
> > > That totally misses the point.  The ->bi_vcnt is a fast and lose
> > > check to see if we need the fairly expensive iterators to do the
> > > real check.
> > 
> > It is just __bvec_iter_bvec(), whatever it should be in cache sooner or
> > later.
> > 
> > 
> Functionality wise overall patch looks fine to me, but as Christoph
> stated there is slight performance(IOPS) penalty.
> Here is my benchmarking numbers[1], I suspect Jens setup might show
> more regression.
> 
> Regards,
> Nitesh
> 
> 
> [1]
> ===============================
> a. two optane nvme device setup:
> ----------
> base case:
> ----------
> sudo taskset -c 0,1 /home/nitesh/src/private/fio/t/io_uring -b512 \
> -d128 -c32 -s32 -p1 -F1 -B1 -n2 -r4 /dev/nvme0n1 /dev/nvme1n1
> submitter=0, tid=206586, file=/dev/nvme0n1, nfiles=1, node=-1
> submitter=1, tid=206587, file=/dev/nvme1n1, nfiles=1, node=-1
> polled=1, fixedbufs=1, register_files=1, buffered=0, QD=128
> Engine=io_uring, sq_ring=128, cq_ring=128
> polled=1, fixedbufs=1, register_files=1, buffered=0, QD=128
> Engine=io_uring, sq_ring=128, cq_ring=128
> IOPS=6.45M, BW=3.15GiB/s, IOS/call=32/31
> IOPS=6.47M, BW=3.16GiB/s, IOS/call=32/32
> IOPS=6.47M, BW=3.16GiB/s, IOS/call=32/32
> Exiting on timeout
> Maximum IOPS=6.47M
> 
> ----------------
> with this patch:
> ----------------
> sudo taskset -c 0,1 /home/nitesh/src/private/fio/t/io_uring -b512 \
> -d128 -c32 -s32 -p1 -F1 -B1 -n2 -r4 /dev/nvme0n1 /dev/nvme1n1
> submitter=0, tid=6352, file=/dev/nvme0n1, nfiles=1, node=-1
> polled=1, fixedbufs=1, register_files=1, buffered=0, QD=128
> Engine=io_uring, sq_ring=128, cq_ring=128
> submitter=1, tid=6353, file=/dev/nvme1n1, nfiles=1, node=-1
> IOPS=6.30M, BW=3.08GiB/s, IOS/call=32/31
> IOPS=6.35M, BW=3.10GiB/s, IOS/call=32/31
> IOPS=6.37M, BW=3.11GiB/s, IOS/call=32/32
> Exiting on timeout
> Maximum IOPS=6.37M
> 
> =============================
> b. two null-blk device setup:
> ------------------
> null device setup:
> ------------------
> sudo modprobe null_blk queue_mode=2 gb=10 bs=512 nr_devices=2 irqmode=2 \
> completion_nsec=1000000 hw_queue_depth=256 memory_backed=0 discard=0 \
> use_per_node_hctx=1
> 
> ----------
> base case:
> ----------
> sudo taskset -c 0,1 /home/nitesh/src/private/fio/t/io_uring -b512 \
> -d128 -c32 -s32 -p1 -F1 -B1 -n2 -r4 /dev/nullb0 /dev/nullb1
> submitter=0, tid=6743, file=/dev/nullb0, nfiles=1, node=-1
> submitter=1, tid=6744, file=/dev/nullb1, nfiles=1, node=-1
> polled=1, fixedbufs=1, register_files=1, buffered=0, QD=128
> Engine=io_uring, sq_ring=128, cq_ring=128
> IOPS=7.89M, BW=3.85GiB/s, IOS/call=32/31
> IOPS=7.96M, BW=3.89GiB/s, IOS/call=32/32
> IOPS=7.99M, BW=3.90GiB/s, IOS/call=32/32
> Exiting on timeout
> Maximum IOPS=7.99M
> 
> -------------------
> with this patchset:
> -------------------
> sudo taskset -c 0,1 /home/nitesh/src/private/fio/t/io_uring -b512 \
> -d128 -c32 -s32 -p1 -F1 -B1 -n2 -r4 /dev/nullb0 /dev/nullb1
> submitter=0, tid=35633, file=/dev/nullb0, nfiles=1, node=-1
> submitter=1, tid=35634, file=/dev/nullb1, nfiles=1, node=-1
> polled=1, fixedbufs=1, register_files=1, buffered=0, QD=128
> Engine=io_uring, sq_ring=128, cq_ring=128
> polled=1, fixedbufs=1, register_files=1, buffered=0, QD=128
> Engine=io_uring, sq_ring=128, cq_ring=128
> IOPS=7.79M, BW=3.80GiB/s, IOS/call=32/31
> IOPS=7.86M, BW=3.84GiB/s, IOS/call=32/32
> IOPS=7.89M, BW=3.85GiB/s, IOS/call=32/32
> Exiting on timeout
> Maximum IOPS=7.89M

Thanks for the perf test!

This patch only adds bio->bi_iter memory footprint, which is supposed
to hit from L1, maybe because `bi_io_vec` is in the 2nd cacheline, can
you see any difference with the following change?


diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 5dc061d318a4..1c4570b37436 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -240,6 +240,7 @@ struct bio {
                /* for plugged zoned writes only: */
                unsigned int            __bi_nr_segments;
        };
+       struct bio_vec          *bi_io_vec;     /* the actual vec list */
        bio_end_io_t            *bi_end_io;
        void                    *bi_private;
 #ifdef CONFIG_BLK_CGROUP
@@ -275,8 +276,6 @@ struct bio {

        atomic_t                __bi_cnt;       /* pin count */

-       struct bio_vec          *bi_io_vec;     /* the actual vec list */
-
        struct bio_set          *bi_pool;
 };



Thanks,
Ming


  reply	other threads:[~2025-12-18 16:09 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-12-18  9:31 [PATCH 0/3] block: fix bi_vcnt misuse for cloned bio Ming Lei
2025-12-18  9:31 ` [PATCH 1/3] block: fix bio_may_need_split() by using bvec iterator way Ming Lei
2025-12-18  9:37   ` Christoph Hellwig
2025-12-18  9:45     ` Ming Lei
2025-12-18 15:16       ` Nitesh Shetty
2025-12-18 16:08         ` Ming Lei [this message]
2025-12-20  8:16           ` Nitesh Shetty
2025-12-20  8:19   ` Nitesh Shetty
2025-12-18  9:31 ` [PATCH 2/3] block: don't initialize bi_vcnt for cloned bio in bio_iov_bvec_set() Ming Lei
2025-12-18  9:38   ` Christoph Hellwig
2025-12-18  9:48     ` Ming Lei
2025-12-18  9:31 ` [PATCH 3/3] io_uring: don't re-calculate iov_iter nr_segs in io_import_kbuf() Ming Lei

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aUQnE0b46XFGrLOd@fedora \
    --to=ming.lei@redhat.com \
    --cc=axboe@kernel.dk \
    --cc=csander@purestorage.com \
    --cc=hch@infradead.org \
    --cc=huang-jl@deepseek.com \
    --cc=io-uring@vger.kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=nj.shetty@samsung.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.