From: Christoph Hellwig <hch@infradead.org>
To: Sean Anderson <seanga2@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>,
linux-block@vger.kernel.org,
Miquel Raynal <miquel.raynal@bootlin.com>,
Richard Weinberger <richard@nod.at>,
Vignesh Raghavendra <vigneshr@ti.com>,
linux-mtd@lists.infradead.org,
Zhihao Cheng <chengzhihao1@huawei.com>
Subject: Re: bio segment constraints
Date: Mon, 7 Apr 2025 00:07:11 -0700
Message-ID: <Z_N5nxLDOBb5NDAM@infradead.org>
In-Reply-To: <8dfd97ac-59e7-ae69-238a-85b7a2dae4f1@gmail.com>
On Sun, Apr 06, 2025 at 03:40:04PM -0400, Sean Anderson wrote:
> Hi all,
>
> I'm not really sure what guarantees the block layer makes regarding the
> segments in a bio as part of a request submitted to a block driver. As
> far as I can tell this is not documented anywhere. In particular,
First you need to define what segment you mean. We have at least two and
a half historical uses of the name. One is for each bio_vec attached to
the bio, either directly as submitted into ->submit_bio for bio based
drivers (case 1a), or generated by bio_split_to_limits (case 1b), which
is called for every blk-mq driver before calling into ->queue_rq(s) or
explicitly called by a few bio based drivers.
The other is the bio-vec synthesized by bio_for_each_segment (case 2).
> - Is bv_len aligned to SECTOR_SIZE?
Yes.
> - To logical_sector_size?
Yes.
> - What if logical_sector_size > PAGE_SIZE?
Still always aligned to logical_sector_size.
> - What about bv_offset?
bv_offset is a memory offset and must only be aligned to the
dma_alignment limit.
> - Is it possible to have a bio where the total length is a multiple of
> logical_sector_size, but the data is split across several segments
> where each segment is a multiple of SECTOR_SIZE?
Yes.
> - Is it possible to have segments not even aligned to SECTOR_SIZE?
No.
> - Can I somehow request to only get segments with bv_len aligned to
> logical_sector_size?
For drivers that use bio_split_to_limits implicitly or explicitly you can
do that by setting the right seg_boundary_mask.
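To make the distinction in the questions above concrete, here is a small
userspace sketch (not kernel code). It only does the arithmetic: a set of
segment lengths can each be a multiple of SECTOR_SIZE and add up to a
multiple of a larger logical block size without each one being a multiple
of that size. The helper names and the 4096-byte logical block size are
assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

#define SECTOR_SIZE 512u /* same value as the kernel's SECTOR_SIZE */

/* Hypothetical helper: true if every length in lens[] is a multiple of align. */
static bool all_lens_aligned(const unsigned int *lens, size_t n,
			     unsigned int align)
{
	for (size_t i = 0; i < n; i++)
		if (lens[i] % align)
			return false;
	return true;
}

/* Hypothetical helper: total number of bytes covered by lens[]. */
static unsigned int total_len(const unsigned int *lens, size_t n)
{
	unsigned int sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += lens[i];
	return sum;
}
```

With lengths of 512 and 3584 bytes, every length passes the SECTOR_SIZE
check and the total is one 4096-byte block, but the per-length check
against 4096 fails. A driver that needs the stricter property has to ask
for it through its queue limits rather than assume it.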
> make some big assumptions (which might be bugs?) For example, in
> drivers/mtd/mtd_blkdevs.c, do_blktrans_request looks like:
> - There is only one bio in a request. This one is a bit of a soft
> assumption since we should only flush the pages in the bio and not the
> whole request otherwise.
It always operates on the first bio in the request and then uses
blk_update_request to move the context past that. It is an old
and somewhat arcane way to write drivers, but should work. The
rq_for_each_segment loops that call flush_dcache_page look horribly
wrong for this model, though.
> - The data is in lowmem OR bv_offset + bv_len <= PAGE_SIZE. kmap() only
> maps a single page, so if we go past one page we end up in adjacent
> kmapped pages.
Yes, this looks broken.
> Am I missing something here? Handling highmem seems like a persistent
> issue. E.g. drivers/mtd/ubi/block.c doesn't even bother doing a kmap.
> Should both of these have BLK_FEAT_BOUNCE_HIGH?
BLK_FEAT_BOUNCE_HIGH needs to go away sooner rather than later.
In the short run the best fix would be to synthesize a
bio_for_each_segment-like bio_vec that stays inside a single page
(using bio_iter_iovec) at the top of do_blktrans_request and use
that for all references to the data.
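As a rough illustration of what "stays inside a single page" means, the
following userspace model (not kernel code) clamps a possibly multi-page
vector to the piece that lies within one page, which is the kind of
clamping bio_iter_iovec performs. struct model_bvec, single_page_piece and
the fixed 4096-byte PAGE_SIZE are assumptions for this sketch; a page index
stands in for struct page *.

```c
#include <stddef.h>

#define PAGE_SIZE 4096u /* assumed page size for this model */

/* Simplified stand-in for struct bio_vec. */
struct model_bvec {
	size_t page;         /* index of the first page */
	unsigned int len;    /* length in bytes, may span several pages */
	unsigned int offset; /* byte offset from the start of the first page */
};

/*
 * Return the piece of bv that starts 'done' bytes in and ends no later
 * than the end of the page it starts in. The caller must pass
 * done < bv->len.
 */
static struct model_bvec single_page_piece(const struct model_bvec *bv,
					   unsigned int done)
{
	unsigned int pos = bv->offset + done;            /* bytes from first page start */
	unsigned int remaining = bv->len - done;         /* bytes left in the vector */
	unsigned int room = PAGE_SIZE - pos % PAGE_SIZE; /* bytes left in this page */
	struct model_bvec seg = {
		.page = bv->page + pos / PAGE_SIZE,
		.len = remaining < room ? remaining : room,
		.offset = pos % PAGE_SIZE,
	};

	return seg;
}
```

Each piece returned this way satisfies offset + len <= PAGE_SIZE, so
mapping a single page is enough to reach all of its data.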