From: Sean Anderson <seanga2@gmail.com>
To: Hannes Reinecke <hare@suse.de>, Jens Axboe <axboe@kernel.dk>,
linux-block@vger.kernel.org
Cc: Miquel Raynal <miquel.raynal@bootlin.com>,
Richard Weinberger <richard@nod.at>,
Vignesh Raghavendra <vigneshr@ti.com>,
linux-mtd@lists.infradead.org,
Zhihao Cheng <chengzhihao1@huawei.com>
Subject: Re: bio segment constraints
Date: Mon, 7 Apr 2025 10:14:28 -0400
Message-ID: <a0ffa9b9-8649-1b63-3d56-3fc45fdfda83@gmail.com>
In-Reply-To: <8a232716-74f8-4bba-a514-d0f766492344@suse.de>

On 4/7/25 03:10, Hannes Reinecke wrote:
> On 4/6/25 21:40, Sean Anderson wrote:
>> Hi all,
>>
>> I'm not really sure what guarantees the block layer makes regarding the
>> segments in a bio as part of a request submitted to a block driver. As
>> far as I can tell this is not documented anywhere. In particular,
>>
>> - Is bv_len aligned to SECTOR_SIZE?
>
> The block layer always uses a 512 byte sector size, so yes.
>
>> - To logical_sector_size?
>
> Not necessarily. Bvecs are a consecutive list of byte ranges which
> make up the data portion of a bio.
> The logical sector size is a property of the request queue, which is
> applied when a request is formed from one or several bios.
> For the request the overall length needs to be a multiple of the logical
> sector size, but not necessarily the individual bios.

Oh, so this is worse than I thought. So if you care about e.g. only submitting
I/O in units of logical_block_size, you have to combine segments across the
entire request.

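To make the "combine segments across the entire request" idea concrete, here
is a minimal userspace sketch of what I have in mind. It is only a model, not
kernel code: struct seg stands in for an already-mapped bio_vec, and
for_each_block(), block_fn, struct sink and collect_block() are all
hypothetical names made up for illustration. Blocks that sit entirely inside
one segment are passed through directly; blocks that straddle a segment
boundary are assembled in a bounce buffer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a mapped bio_vec: a pointer plus a length. */
struct seg {
	const unsigned char *data;
	size_t len;
};

/* Hypothetical callback; a real driver would hand the block to its backend. */
typedef void (*block_fn)(const unsigned char *block, size_t blksize, void *ctx);

/*
 * Walk segs[] and call emit() once per blksize bytes. Blocks that lie
 * entirely inside one segment are passed through without copying; blocks
 * that straddle a segment boundary are assembled in bounce[], which must
 * hold at least blksize bytes. Returns the number of blocks emitted, or
 * -1 if the total length is not a multiple of blksize.
 */
static long for_each_block(const struct seg *segs, size_t nsegs,
			   size_t blksize, unsigned char *bounce,
			   block_fn emit, void *ctx)
{
	size_t fill = 0;	/* bytes currently held in bounce[] */
	long blocks = 0;

	for (size_t i = 0; i < nsegs; i++) {
		const unsigned char *p = segs[i].data;
		size_t left = segs[i].len;

		while (left) {
			if (fill == 0 && left >= blksize) {
				/* Fast path: a whole block inside this segment. */
				emit(p, blksize, ctx);
				p += blksize;
				left -= blksize;
				blocks++;
				continue;
			}

			/* Slow path: top up the bounce buffer. */
			size_t n = blksize - fill;
			if (n > left)
				n = left;
			memcpy(bounce + fill, p, n);
			fill += n;
			p += n;
			left -= n;

			if (fill == blksize) {
				emit(bounce, blksize, ctx);
				fill = 0;
				blocks++;
			}
		}
	}

	return fill ? -1 : blocks;
}

/* Example sink that appends every emitted block to a flat buffer. */
struct sink {
	unsigned char buf[8192];
	size_t len;
};

static void collect_block(const unsigned char *block, size_t blksize, void *ctx)
{
	struct sink *s = ctx;

	if (s->len + blksize <= sizeof(s->buf)) {
		memcpy(s->buf + s->len, block, blksize);
		s->len += blksize;
	}
}
```

With segments of 512, 1024 and 512 bytes and a block size of 1024, both blocks
go through the bounce buffer because neither starts on a segment boundary. In a
real driver the segments would come from something like rq_for_each_segment()
and would still need to be mapped first.
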
>> - What if logical_sector_size > PAGE_SIZE?
>
> See above.
>
>> - What about bv_offset?
>
> Same story. The eventual request needs to observe that the offset
> and the length are aligned to the logical block size, but the individual
> bios might not.
>
>> - Is it possible to have a bio where the total length is a multiple of
>> logical_sector_size, but the data is split across several segments
>> where each segment is a multiple of SECTOR_SIZE?
>
> Sure.
>
>> - Is it possible to have segments not even aligned to SECTOR_SIZE?
>
> Nope.
>
>> - Can I somehow request to only get segments with bv_len aligned to
>> logical_sector_size? Or do I need to do my own coalescing and bounce
>> buffering for that?
>>
>
> The driver surely can. You should be able to set 'max_segment_size' to
> the logical block size, and that should give you what you want.

But couldn't I get segments smaller than that? max_segment_size seems like
it would only restrict the maximum size, leaving the possibility open for
smaller segments.

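A toy model of the concern, in userspace C. This is not how block/ actually
splits anything; cap_segments() and count_unaligned() are hypothetical helpers
that only illustrate that an upper bound on segment length implies no lower
bound and no alignment.

```c
#include <stddef.h>

/*
 * Toy model (not kernel code): split each length in in[] so that no
 * output length exceeds max_seg. Returns the number of lengths written
 * to out[], or 0 if out[] is too small or max_seg is 0.
 */
static size_t cap_segments(const size_t *in, size_t n_in, size_t max_seg,
			   size_t *out, size_t out_cap)
{
	size_t n_out = 0;

	if (max_seg == 0)
		return 0;

	for (size_t i = 0; i < n_in; i++) {
		size_t left = in[i];

		while (left) {
			size_t n = left < max_seg ? left : max_seg;

			if (n_out == out_cap)
				return 0;
			out[n_out++] = n;
			left -= n;
		}
	}

	return n_out;
}

/* Count how many lengths are not a multiple of blksize. */
static size_t count_unaligned(const size_t *len, size_t n, size_t blksize)
{
	size_t bad = 0;

	for (size_t i = 0; i < n; i++)
		if (len[i] % blksize)
			bad++;

	return bad;
}
```

Feeding in lengths of 512, 6144 and 1536 bytes (8192 in total, so two 4096-byte
logical blocks) with a cap of 4096 gives 512, 4096, 2048 and 1536. Every
segment respects the cap, yet three of the four are not a multiple of 4096.
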
>> I've been reading some drivers (as well as stuff in block/) to try and
>> figure things out, but it's hard to figure out all the places where
>> constraints are enforced. In particular, I've read several drivers that
>> make some big assumptions (which might be bugs?) For example, in
>> drivers/mtd/mtd_blkdevs.c, do_blktrans_request looks like:
>>
> In general, the block layer has two major data items, bios and requests.
> 'struct bio' is the central structure for any 'upper' layers to submit
> data (via the 'submit_bio()' function), and 'struct request' is the
> central structure for drivers to fetch data for submission to the
> hardware (via the 'queue_rq()' request_queue callback).
> And the task of the block layer is to convert 'struct bio' into
> 'struct request'.
>
> [ .. ]
>
>> For context, tr->blksize is either 512 or 4096 (so tr->blkshift is 9 or 12), depending on the
>> backend. From what I can tell, this code assumes the following:
>>
> mtd is probably not a good example, as MTD has its own set of
> limitations which might result in certain shortcuts being taken.

Well, I want to write a block driver on top of MTD, so it's a pretty good
example for my purposes :P

>> - There is only one bio in a request. This one is a bit of a soft
>> assumption since we should only flush the pages in the bio and not the
>> whole request otherwise.
>> - There is only one segment in a bio. This one could be reasonable if
>> max_segments was set to 1, but it's not as far as I can tell. So I
>> guess we just go off the end of the bio if there's a second segment?
>> - The data is in lowmem OR bv_offset + bv_len <= PAGE_SIZE. kmap() only
>> maps a single page, so if we go past one page we end up in adjacent
>> kmapped pages.
>>
> Well, that code _does_ look suspicious. It really should be converted
> to using the iov iterators.

I had a look at this, but the API isn't documented so I wasn't sure what
I would get out of it. I'll have a closer look.

> But then again, it _might_ be okay if there are underlying MTD
> restrictions which would devolve into MTD only having a single bvec.

The underlying restriction is that the MTD API expects a buffer that has
contiguous kernel virtual addresses. The driver will do bounce-buffering
if it wants to do DMA and virt_addr_valid is false. The mtd_blkdevs driver
promises to submit buffers of size tr->blksize to the underlying blktrans
driver. This whole thing is not very efficient if the MTD driver can do
scatter-gather DMA, but that's not the API...

Maybe I should just vmap the entire request?

--Sean