From: Luis Chamberlain <mcgrof@kernel.org>
To: Ming Lei <ming.lei@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>,
linux-block@vger.kernel.org, Daniel Gomez <da.gomez@kernel.org>,
Paul Bunyan <pbunyan@redhat.com>, Yi Zhang <yi.zhang@redhat.com>,
John Garry <john.g.garry@oracle.com>,
Keith Busch <kbusch@kernel.org>,
Bart Van Assche <bvanassche@acm.org>
Subject: Re: [PATCH V4] block: make segment size limit workable for > 4K PAGE_SIZE
Date: Fri, 21 Feb 2025 12:12:42 -0800 [thread overview]
Message-ID: <Z7jeOpW882Old2Eh@bombadil.infradead.org> (raw)
In-Reply-To: <20250219024409.901186-1-ming.lei@redhat.com>
On Wed, Feb 19, 2025 at 10:44:09AM +0800, Ming Lei wrote:
> PAGE_SIZE is applied in validating block device queue limits, this way is
> very fragile and is wrong:
>
> - queue limits are read from hardware, which is often one readonly hardware
> property
>
> - PAGE_SIZE is one config option which can be changed during build time.
>
> In RH lab, it has been found that max segment size of some mmc card is
> less than 64K, then this kind of card can't be probed successfully when
> same kernel is re-built with 64K PAGE_SIZE.
>
> Fix this issue by adding BLK_MIN_SEGMENT_SIZE and lim->min_segment_size:
>
> - validate segment limits by BLK_MIN_SEGMENT_SIZE which is 4K(minimized PAGE_SIZE)
>
> - checking if one bvec can be one segment quickly by lim->min_segment_size
>
> commit 6aeb4f836480 ("block: remove bio_add_pc_page")
> commit 02ee5d69e3ba ("block: remove blk_rq_bio_prep")
> commit b7175e24d6ac ("block: add a dma mapping iterator")
Let me try to help with this commit log message a bit, how about:
Using PAGE_SIZE as a minimum expected DMA segment size in consideration
of devices which have a max DMA segment size of 4k when used on 64k
PAGE_SIZE systems leads to devices not being able to probe such as
eMMC and Exynos UFS controller [0] [1] you can end up with a probe failure
as follows:
WARNING: CPU: 2 PID: 397 at block/blk-settings.c:339 blk_validate_limits+0x364/0x3c0
Modules linked in: mmc_block(+) rpmb_core crct10dif_ce ghash_ce sha2_ce dw_mmc_bluefield sha256_arm64 dw_mmc_pltfm sha1_ce dw_mmc mmc_core nfit i2c_mlxbf sbsa_gwdt gpio_mlxbf2
f_tmfifo dm_mirror dm_region_hash dm_log dm_mod
CPU: 2 UID: 0 PID: 397 Comm: (udev-worker) Not tainted 6.12.0-39.el10.aarch64+64k #1
Hardware name: https://www.mellanox.com BlueField SoC/BlueField SoC, BIOS BlueField:3.5.1-1-g4078432 Jan 28 2021
ng pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : blk_validate_limits+0x364/0x3c0
p.service
lr : blk_set_default_limits+0x20/0x40
Setup...
sp : ffff80008688f2d0
x29: ffff80008688f2d0 x28: ffff000082acb600 x27: ffff80007bef02a8
x26: ffff80007bef0000 x25: ffff80008688f58e x24: ffff80008688f450
x23: ffff80008301b000 x22: 00000000ffffffff x21: ffff800082c39950
x20: 0000000000000000 x19: ffff0000930169e0 x18: 0000000000000014
x17: 00000000767472b1 x16: 0000000005a697e6 x15: 0000000002f42ca4
x11: 00000000de7f0111 x10: 000000005285b53a x9 : ffff800080752908
x8 : 0000000000000001 x7 : 0000000000000000 x6 : 0000000000000200
x5 : 0000000000000000 x4 : 000000000000ffff x3 : 0000000000004000
x2 : 0000000000000200 x1 : 0000000000001000 x0 : ffff80008688f450
Call trace:
blk_validate_limits+0x364/0x3c0
blk_set_default_limits+0x20/0x40
blk_alloc_queue+0x84/0x240
blk_mq_alloc_queue+0x80/0x118
__blk_mq_alloc_disk+0x28/0x198
mmc_alloc_disk+0xe0/0x260 [mmc_block]
...
mmcblk mmc0:0001: probe with driver mmcblk failed with error -22
To fix this provide a block sanity check to ensure we use the min of the
block device's segment size and PAGE_SIZE.
Link: https://lore.kernel.org/linux-block/20230612203314.17820-1-bvanassche@acm.org/ # [0]
Link: https://lore.kernel.org/linux-block/1d55e942-5150-de4c-3a02-c3d066f87028@acm.org/ # [1]
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index 15cd231d560c..4fe2dfabfc9d 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -329,7 +329,7 @@ int bio_split_rw_at(struct bio *bio, const struct queue_limits *lim,
>
> if (nsegs < lim->max_segments &&
> bytes + bv.bv_len <= max_bytes &&
> - bv.bv_offset + bv.bv_len <= PAGE_SIZE) {
> + bv.bv_offset + bv.bv_len <= lim->min_segment_size) {
> nsegs++;
> bytes += bv.bv_len;
> } else {
Now that's certainly more in line with what I expected.
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 248416ecd01c..1f7d492975c1 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -367,6 +367,7 @@ struct queue_limits {
> unsigned int max_sectors;
> unsigned int max_user_sectors;
> unsigned int max_segment_size;
> + unsigned int min_segment_size;
> unsigned int physical_block_size;
> unsigned int logical_block_size;
> unsigned int alignment_offset;
> @@ -1163,6 +1164,8 @@ static inline bool bdev_is_partition(struct block_device *bdev)
> enum blk_default_limits {
> BLK_MAX_SEGMENTS = 128,
> BLK_SAFE_MAX_SECTORS = 255,
> + /* use minimized PAGE_SIZE as min segment size hint */
> + BLK_MIN_SEGMENT_SIZE = 4096,
> BLK_MAX_SEGMENT_SIZE = 65536,
> BLK_SEG_BOUNDARY_MASK = 0xFFFFFFFFUL,
I think Bart provided a more sensible comment:
Use 4 KiB as the smallest default supported DMA segment size limit instead of
PAGE_SIZE. This is important if the page size is larger than 4 KiB since
the maximum DMA segment size for some storage controllers is only 4 KiB.
With that:
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Luis
next prev parent reply other threads:[~2025-02-21 20:12 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-02-19 2:44 [PATCH V4] block: make segment size limit workable for > 4K PAGE_SIZE Ming Lei
2025-02-20 6:13 ` Christoph Hellwig
2025-02-20 11:38 ` Ming Lei
2025-02-21 20:12 ` Luis Chamberlain [this message]
2025-02-22 21:43 ` Daniel Gomez
2025-02-24 0:55 ` Luis Chamberlain
2025-02-24 1:48 ` Ming Lei
2025-02-22 21:52 ` Daniel Gomez
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z7jeOpW882Old2Eh@bombadil.infradead.org \
--to=mcgrof@kernel.org \
--cc=axboe@kernel.dk \
--cc=bvanassche@acm.org \
--cc=da.gomez@kernel.org \
--cc=john.g.garry@oracle.com \
--cc=kbusch@kernel.org \
--cc=linux-block@vger.kernel.org \
--cc=ming.lei@redhat.com \
--cc=pbunyan@redhat.com \
--cc=yi.zhang@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox