public inbox for linux-mtd@lists.infradead.org
From: Sean Anderson <seanga2@gmail.com>
To: Hannes Reinecke <hare@suse.de>, Jens Axboe <axboe@kernel.dk>,
	linux-block@vger.kernel.org
Cc: Miquel Raynal <miquel.raynal@bootlin.com>,
	Richard Weinberger <richard@nod.at>,
	Vignesh Raghavendra <vigneshr@ti.com>,
	linux-mtd@lists.infradead.org,
	Zhihao Cheng <chengzhihao1@huawei.com>
Subject: Re: bio segment constraints
Date: Mon, 7 Apr 2025 10:14:28 -0400
Message-ID: <a0ffa9b9-8649-1b63-3d56-3fc45fdfda83@gmail.com>
In-Reply-To: <8a232716-74f8-4bba-a514-d0f766492344@suse.de>

On 4/7/25 03:10, Hannes Reinecke wrote:
> On 4/6/25 21:40, Sean Anderson wrote:
>> Hi all,
>>
>> I'm not really sure what guarantees the block layer makes regarding the
>> segments in a bio as part of a request submitted to a block driver. As
>> far as I can tell this is not documented anywhere. In particular,
>>
>> - Is bv_len aligned to SECTOR_SIZE?
> 
> The block layer always uses a 512 byte sector size, so yes.
> 
>> - To logical_sector_size?
> 
> Not necessarily. Bvecs are a consecutive list of byte ranges which
> make up the data portion of a bio.
> The logical sector size is a property of the request queue, which is
> applied when a request is formed from one or several bios.
> For the request the overall length needs to be a multiple of the logical
> sector size, but not necessarily the individual bios.

Oh, so this is worse than I thought. So if you care about e.g. only submitting
I/O in units of logical_block_size, you have to combine segments across the
entire request.
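
To make "combine segments across the entire request" concrete, here is a
userspace model, not kernel code. `struct seg`, `emit_blocks()`, `struct sink`
and `collect()` are hypothetical names invented for this sketch; a segment is
modelled as a flat byte range rather than a page/offset/length triple. A block
that lies inside one segment is passed through without copying, and a block
that straddles segments is assembled in a bounce buffer of one logical block.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a bio_vec: a flat byte range, no pages. */
struct seg {
	const unsigned char *data;
	size_t len;
};

/* Hypothetical callback type: always receives exactly one logical block. */
typedef void (*block_fn)(const unsigned char *blk, size_t len, void *ctx);

/*
 * Walk every segment of a modelled request and hand out data in units of
 * lbs bytes. Returns the number of blocks emitted, or -1 if the total
 * length is not a multiple of lbs. bounce must hold at least lbs bytes.
 */
static long emit_blocks(const struct seg *segs, size_t nsegs, size_t lbs,
			unsigned char *bounce, block_fn fn, void *ctx)
{
	size_t fill = 0;	/* bytes currently held in the bounce buffer */
	long nblocks = 0;

	assert(lbs > 0);
	for (size_t i = 0; i < nsegs; i++) {
		const unsigned char *p = segs[i].data;
		size_t left = segs[i].len;

		while (left > 0) {
			if (fill == 0 && left >= lbs) {
				/* Whole block inside this segment: no copy. */
				fn(p, lbs, ctx);
				p += lbs;
				left -= lbs;
				nblocks++;
				continue;
			}
			/* Partial block: accumulate in the bounce buffer. */
			size_t n = lbs - fill;
			if (n > left)
				n = left;
			memcpy(bounce + fill, p, n);
			fill += n;
			p += n;
			left -= n;
			if (fill == lbs) {
				fn(bounce, lbs, ctx);
				fill = 0;
				nblocks++;
			}
		}
	}
	return fill == 0 ? nblocks : -1;
}

/* Example sink: appends every block it is given to an output buffer. */
struct sink {
	unsigned char *out;
	size_t used;
};

static void collect(const unsigned char *blk, size_t len, void *ctx)
{
	struct sink *s = ctx;

	memcpy(s->out + s->used, blk, len);
	s->used += len;
}
```

With a block size of 8 standing in for 4096, segments of 2, 4 and 10 bytes
produce two blocks: the first is assembled in the bounce buffer from all three
segments, the second is passed straight from the tail of the third.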

>> - What if logical_sector_size > PAGE_SIZE?
> 
> See above.
> 
>> - What about bv_offset?
> 
> Same story. The eventual request needs to observe that the offset
> and the length are aligned to the logical block size, but the individual
> bios might not.
> 
>> - Is it possible to have a bio where the total length is a multiple of
>>    logical_sector_size, but the data is split across several segments
>>    where each segment is a multiple of SECTOR_SIZE?
> 
> Sure.
> 
>> - Is it possible to have segments not even aligned to SECTOR_SIZE?
> 
> Nope.
> 
>> - Can I somehow request to only get segments with bv_len aligned to
>>    logical_sector_size? Or do I need to do my own coalescing and bounce
>>    buffering for that?
>>
> 
> The driver surely can. You should be able to set 'max_segment_size' to
> the logical block size, and that should give you what you want.

But couldn't I get segments smaller than that? max_segment_size seems like
it would only restrict the maximum size, leaving the possibility open for
smaller segments.
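
The difference between an upper bound and an alignment guarantee is plain
arithmetic, shown here as a userspace sketch. `within_max()` and
`all_block_aligned()` are hypothetical helpers, and the segment lengths are
made-up inputs; nothing here claims what the block layer actually produces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if no segment length exceeds max_seg (an upper bound only). */
static bool within_max(const size_t *lens, size_t n, size_t max_seg)
{
	for (size_t i = 0; i < n; i++)
		if (lens[i] > max_seg)
			return false;
	return true;
}

/* True if every segment length is a nonzero multiple of lbs. */
static bool all_block_aligned(const size_t *lens, size_t n, size_t lbs)
{
	assert(lbs > 0);
	for (size_t i = 0; i < n; i++)
		if (lens[i] == 0 || lens[i] % lbs != 0)
			return false;
	return true;
}
```

Segment lengths of 512, 3584 and 4096 bytes all respect a 4096-byte maximum
and total two 4096-byte blocks, yet only the last one is itself a multiple of
4096.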

>> I've been reading some drivers (as well as stuff in block/) to try and
>> figure things out, but it's hard to figure out all the places where
>> constraints are enforced. In particular, I've read several drivers that
>> make some big assumptions (which might be bugs?) For example, in
>> drivers/mtd/mtd_blkdevs.c, do_blktrans_request looks like:
>>
> In general, the block layer has two major data items, bios and requests.
> 'struct bio' is the central structure for any 'upper' layers to submit
> data (via the 'submit_bio()' function), and 'struct request' is the
> central structure for drivers to fetch data for submission to the
> hardware (via the 'queue_rq()' request_queue callback).
> And the task of the block layer is to convert 'struct bio' into
> 'struct request'.
> 
> [ .. ]
> 
>> For context, tr->blksize is either 512 or 4096, depending on the
>> backend. From what I can tell, this code assumes the following:
>>
> MTD is probably not a good example, as MTD has its own set of limitations
> which might result in certain shortcuts being taken.

Well, I want to write a block driver on top of MTD, so it's a pretty good
example for my purposes :P

>> - There is only one bio in a request. This one is a bit of a soft
>>    assumption since we should only flush the pages in the bio and not the
>>    whole request otherwise.
>> - There is only one segment in a bio. This one could be reasonable if
>>    max_segments was set to 1, but it's not as far as I can tell. So I
>>    guess we just go off the end of the bio if there's a second segment?
>> - The data is in lowmem OR bv_offset + bv_len <= PAGE_SIZE. kmap() only
>>    maps a single page, so if we go past one page we end up in adjacent
>>    kmapped pages.
>>
> Well, that code _does_ look suspicious. It really should be converted
> to using the iov iterators.

I had a look at this, but the API isn't documented so I wasn't sure what
I would get out of it. I'll have a closer look.
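
The single-page concern from the list above (bv_offset + bv_len <= PAGE_SIZE)
comes down to splitting a segment at page boundaries before mapping each
piece. Below is a userspace model of that arithmetic only. `MODEL_PAGE_SIZE`,
`struct seg_range`, `struct page_piece` and `split_by_page()` are hypothetical
names, a 4 KiB page is assumed, and no real pages are mapped. Whether a given
in-kernel iterator already performs this split should be checked against the
tree being targeted.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for the (bv_offset, bv_len) pair of a bio_vec. */
struct seg_range {
	size_t offset;	/* byte offset relative to the first page */
	size_t len;	/* total bytes; may span several pages */
};

/* One piece that fits entirely inside a single page. */
struct page_piece {
	size_t page_idx;	/* index relative to the segment's first page */
	size_t offset;		/* offset within that page */
	size_t len;		/* offset + len <= MODEL_PAGE_SIZE */
};

/*
 * Split a segment into single-page pieces. Writes at most max entries to
 * out and returns the number of pieces needed, which may exceed max.
 */
static size_t split_by_page(struct seg_range r, struct page_piece *out,
			    size_t max)
{
	size_t page = r.offset / MODEL_PAGE_SIZE;
	size_t off = r.offset % MODEL_PAGE_SIZE;
	size_t left = r.len;
	size_t n = 0;

	assert(out != NULL || max == 0);
	while (left > 0) {
		size_t chunk = MODEL_PAGE_SIZE - off;

		if (chunk > left)
			chunk = left;
		if (n < max)
			out[n] = (struct page_piece){ page, off, chunk };
		n++;
		page++;
		off = 0;	/* every later piece starts at a page boundary */
		left -= chunk;
	}
	return n;
}
```

A 4096-byte segment starting at offset 3584 splits into a 512-byte piece at
the end of the first page and a 3584-byte piece at the start of the next, so
mapping only the first page covers just 512 of its 4096 bytes.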

> But then again, it _might_ be okay if there are underlying MTD
> restrictions which would devolve into MTD only having a single bvec.

The underlying restriction is that the MTD API expects a buffer that has
contiguous kernel virtual addresses. The driver will do bounce-buffering
if it wants to do DMA and virt_addr_valid is false. The mtd_blkdevs driver
promises to submit buffers of size tr->blksize to the underlying blktrans
driver. This whole thing is not very efficient if the MTD driver can do
scatter-gather DMA, but that's not the API...

Maybe I should just vmap the entire request?

--Sean
