From: "Darrick J. Wong" <djwong@kernel.org>
To: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>,
	Christian Brauner <brauner@kernel.org>,
	Carlos Maiolino <cem@kernel.org>, Qu Wenruo <wqu@suse.com>,
	Al Viro <viro@zeniv.linux.org.uk>,
	linux-block@vger.kernel.org, linux-xfs@vger.kernel.org,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 05/14] block: add helpers to bounce buffer an iov_iter into bios
Date: Thu, 22 Jan 2026 09:25:56 -0800
Message-ID: <20260122172556.GV5945@frogsfrogsfrogs>
In-Reply-To: <20260119074425.4005867-6-hch@lst.de>

On Mon, Jan 19, 2026 at 08:44:12AM +0100, Christoph Hellwig wrote:
> Add helpers to implement bounce buffering of data into a bio to implement
> direct I/O for cases where direct user access is not possible because
> stable in-flight data is required. These are intended to be used as
> easily as bio_iov_iter_get_pages for the zero-copy path.
>
> The write side is trivial and just copies data into the bounce buffer.
> The read side is a lot more complex because it needs to perform the copy
> from the completion context, and without preserving the iov_iter through
> the call chain. It steals a trick from the integrity data user interface
> and uses the first vector in the bio for the bounce buffer data that is
> fed to the block I/O stack, and uses the others to record the user
> buffer fragments.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> block/bio.c | 178 ++++++++++++++++++++++++++++++++++++++++++++
> include/linux/bio.h | 26 +++++++
> 2 files changed, 204 insertions(+)
>
> diff --git a/block/bio.c b/block/bio.c
> index c51b4e2470e2..da795b1df52a 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -1266,6 +1266,184 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter,
> return bio_iov_iter_align_down(bio, iter, len_align_mask);
> }
>
> +static struct folio *folio_alloc_greedy(gfp_t gfp, size_t *size)
> +{
> + struct folio *folio;
> +
> + while (*size > PAGE_SIZE) {
> + folio = folio_alloc(gfp | __GFP_NORETRY, get_order(*size));
> + if (folio)
> + return folio;
> + *size = rounddown_pow_of_two(*size - 1);
> + }
> +
> + return folio_alloc(gfp, get_order(*size));
> +}

Hrm. Should we combine this with the slightly different version that is
in xfs_healthmon?

/* Allocate as much memory as we can get for verification buffer. */
static struct folio *
xfs_verify_alloc_folio(
	const unsigned int	iosize)
{
	unsigned int		order = get_order(iosize);

	while (order > 0) {
		struct folio	*folio =
			folio_alloc(GFP_KERNEL | __GFP_NORETRY, order);

		if (folio)
			return folio;
		order--;
	}

	return folio_alloc(GFP_KERNEL, 0);
}

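For what it's worth, the two loops appear to walk the same sequence of
allocation orders; the visible differences are that folio_alloc_greedy
takes a gfp argument and reports the reduced size back through *size.
A rough userspace model of that claim is below. It is only a sketch, not
kernel code: it assumes 4k pages, the sketch_* helpers are stand-ins for
get_order() and rounddown_pow_of_two(), and orders_size_walk() /
orders_order_walk() are hypothetical names that record the orders each
loop would try if every __GFP_NORETRY attempt failed.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE	4096UL	/* assumption: 4k pages */

/* Stand-in for get_order(): smallest order with (PAGE_SIZE << order) >= size. */
static unsigned int sketch_get_order(size_t size)
{
	unsigned int order = 0;

	assert(size > 0);
	while ((SKETCH_PAGE_SIZE << order) < size)
		order++;
	return order;
}

/* Stand-in for rounddown_pow_of_two(): largest power of two <= n. */
static size_t sketch_rounddown_pow_of_two(size_t n)
{
	size_t p = 1;

	assert(n > 0);
	while (p <= n / 2)
		p *= 2;
	return p;
}

/* Orders tried by the size-halving loop (folio_alloc_greedy style). */
static unsigned int orders_size_walk(size_t size, unsigned int *orders,
				     unsigned int max)
{
	unsigned int n = 0;

	while (size > SKETCH_PAGE_SIZE && n < max) {
		orders[n++] = sketch_get_order(size);
		size = sketch_rounddown_pow_of_two(size - 1);
	}
	if (n < max)
		orders[n++] = sketch_get_order(size);	/* final attempt */
	return n;
}

/* Orders tried by the order-decrementing loop (xfs_verify_alloc_folio style). */
static unsigned int orders_order_walk(size_t size, unsigned int *orders,
				      unsigned int max)
{
	unsigned int order = sketch_get_order(size);
	unsigned int n = 0;

	while (order > 0 && n < max) {
		orders[n++] = order;
		order--;
	}
	if (n < max)
		orders[n++] = 0;			/* final attempt */
	return n;
}
```

If that model holds, a shared helper would mostly need to decide whether
callers want the reduced size handed back or are happy to ask the folio
for its size afterwards.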
> +static void bio_free_folios(struct bio *bio)
> +{
> + struct bio_vec *bv;
> + int i;
> +
> + bio_for_each_bvec_all(bv, bio, i) {
> + struct folio *folio = page_folio(bv->bv_page);
> +
> + if (!is_zero_folio(folio))
> + folio_put(page_folio(bv->bv_page));
Isn't folio_put's argument just @folio again?
> + }
> +}
> +
> +static int bio_iov_iter_bounce_write(struct bio *bio, struct iov_iter *iter)
> +{
> + size_t total_len = iov_iter_count(iter);
> +
> + if (WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED)))
> + return -EINVAL;
> + if (WARN_ON_ONCE(bio->bi_iter.bi_size))
> + return -EINVAL;
> + if (WARN_ON_ONCE(bio->bi_vcnt >= bio->bi_max_vecs))
> + return -EINVAL;
> +
> + do {
> + size_t this_len = min(total_len, SZ_1M);
> + struct folio *folio;
> +
> + if (this_len > PAGE_SIZE * 2)
> + this_len = rounddown_pow_of_two(this_len);
> +
> + if (bio->bi_iter.bi_size > UINT_MAX - this_len)

Now that I've seen UINT_MAX appear twice in terms of limiting bio size,
I wonder if that ought to be encoded as a constant somewhere?

#define BIO_ITER_MAX_SIZE	(UINT_MAX)

(apologies if I'm digging up some horrible old flamewar from the 1830s)

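To make the idea a little more concrete, here is a userspace sketch of
how such a constant might be wrapped in a helper. Everything here is
hypothetical: BIO_ITER_MAX_SIZE is only the name floated above, and
bio_size_would_overflow() and bio_size_add() do not exist in the kernel;
they just model a bi_size-like unsigned int byte counter.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical constant from the suggestion above; not an existing macro. */
#define BIO_ITER_MAX_SIZE	(UINT_MAX)

/*
 * Hypothetical helper: would adding @len bytes push an unsigned int
 * byte counter currently at @cur_size past BIO_ITER_MAX_SIZE?
 */
static bool bio_size_would_overflow(unsigned int cur_size, size_t len)
{
	/* Reject oversized @len first so the subtraction cannot wrap. */
	if (len > BIO_ITER_MAX_SIZE)
		return true;
	return cur_size > BIO_ITER_MAX_SIZE - len;
}

/* Hypothetical helper: checked addition on the same counter. */
static unsigned int bio_size_add(unsigned int cur_size, size_t len)
{
	assert(!bio_size_would_overflow(cur_size, len));
	return cur_size + (unsigned int)len;
}
```

The open-coded check in the patch is equivalent for its inputs, since
this_len is capped at SZ_1M there; the helper form would only buy a
single place to document the limit.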
> + break;
> +
> + folio = folio_alloc_greedy(GFP_KERNEL, &this_len);
> + if (!folio)
> + break;
> + bio_add_folio_nofail(bio, folio, this_len, 0);
> +
> + if (copy_from_iter(folio_address(folio), this_len, iter) !=
> + this_len) {
> + bio_free_folios(bio);
> + return -EFAULT;
> + }
> +
> + total_len -= this_len;
> + } while (total_len && bio->bi_vcnt < bio->bi_max_vecs);
> +
> + if (!bio->bi_iter.bi_size)
> + return -ENOMEM;
> + return 0;
> +}
> +
> +static int bio_iov_iter_bounce_read(struct bio *bio, struct iov_iter *iter)
> +{
> + size_t len = min(iov_iter_count(iter), SZ_1M);
> + struct folio *folio;
> +
> + folio = folio_alloc_greedy(GFP_KERNEL, &len);
> + if (!folio)
> + return -ENOMEM;
> +
> + do {
> + ssize_t ret;
> +
> + ret = iov_iter_extract_bvecs(iter, bio->bi_io_vec + 1, len,
> + &bio->bi_vcnt, bio->bi_max_vecs - 1, 0);
> + if (ret <= 0) {
> + if (!bio->bi_vcnt)
> + return ret;
> + break;
> + }
> + len -= ret;
> + bio->bi_iter.bi_size += ret;
> + } while (len && bio->bi_vcnt < bio->bi_max_vecs - 1);
> +
> + /*
> + * Set the folio directly here. The above loop has already calculated
> + * the correct bi_size, and we use bi_vcnt for the user buffers. That
> + * is safe as bi_vcnt is only for user by the submitter and not looked
"...for use by the submitter..." ?
> + * at by the actual I/O path.
> + */
> + bvec_set_folio(&bio->bi_io_vec[0], folio, bio->bi_iter.bi_size, 0);
> + if (iov_iter_extract_will_pin(iter))
> + bio_set_flag(bio, BIO_PAGE_PINNED);
> + return 0;
> +}
> +
> +/**
> + * bio_iov_iter_bounce - bounce buffer data from an iter into a bio
> + * @bio: bio to send
> + * @iter: iter to read from / write into
> + *
> + * Helper for direct I/O implementations that need to bounce buffer because
> + * we need to checksum the data or perform other operations that require
> + * consistency. Allocates folios to back the bounce buffer, and for writes
> + * copies the data into it. Needs to be paired with bio_iov_iter_unbounce()
> + * called on completion.
> + */
> +int bio_iov_iter_bounce(struct bio *bio, struct iov_iter *iter)
> +{
> + if (op_is_write(bio_op(bio)))
> + return bio_iov_iter_bounce_write(bio, iter);
> + return bio_iov_iter_bounce_read(bio, iter);
> +}
> +
> +static void bvec_unpin(struct bio_vec *bv, bool mark_dirty)
> +{
> + struct folio *folio = page_folio(bv->bv_page);
> + size_t nr_pages = (bv->bv_offset + bv->bv_len - 1) / PAGE_SIZE -
> + bv->bv_offset / PAGE_SIZE + 1;
> +
> + if (mark_dirty)
> + folio_mark_dirty_lock(folio);
> + unpin_user_folio(folio, nr_pages);
> +}
> +
> +static void bio_iov_iter_unbounce_read(struct bio *bio, bool is_error,
> + bool mark_dirty)
> +{
> + unsigned int len = bio->bi_io_vec[0].bv_len;
> +
> + if (likely(!is_error)) {
> + void *buf = bvec_virt(&bio->bi_io_vec[0]);
> + struct iov_iter to;
> +
> + iov_iter_bvec(&to, ITER_DEST, bio->bi_io_vec + 1, bio->bi_vcnt,
> + len);
> + WARN_ON_ONCE(copy_to_iter(buf, len, &to) != len);

I wonder, under what circumstances would the copy_to_iter come up short?
Something evil like $program initiates a directio read from a PI disk, a
BPF guy starts screaming in a datacenter to wobble the disk, and that
gives a compromised systemd enough time to attach to $program with
ptrace to unmap a page in the middle of the read buffer before
bio_iov_iter_unbounce_read gets called?

--D

> + } else {
> + /* No need to mark folios dirty if never copied to them */
> + mark_dirty = false;
> + }
> +
> + if (bio_flagged(bio, BIO_PAGE_PINNED)) {
> + int i;
> +
> + for (i = 0; i < bio->bi_vcnt; i++)
> + bvec_unpin(&bio->bi_io_vec[1 + i], mark_dirty);
> + }
> +
> + folio_put(page_folio(bio->bi_io_vec[0].bv_page));
> +}
> +
> +/**
> + * bio_iov_iter_unbounce - finish a bounce buffer operation
> + * @bio: completed bio
> + * @is_error: %true if an I/O error occurred and data should not be copied
> + * @mark_dirty: If %true, folios will be marked dirty.
> + *
> + * Helper for direct I/O implementations that need to bounce buffer because
> + * we need to checksum the data or perform other operations that require
> + * consistency. Called to complete a bio set up by bio_iov_iter_bounce().
> + * Copies data back for reads, and marks the original folios dirty if
> + * requested and then frees the bounce buffer.
> + */
> +void bio_iov_iter_unbounce(struct bio *bio, bool is_error, bool mark_dirty)
> +{
> + if (op_is_write(bio_op(bio)))
> + bio_free_folios(bio);
> + else
> + bio_iov_iter_unbounce_read(bio, is_error, mark_dirty);
> +}
> +
> static void submit_bio_wait_endio(struct bio *bio)
> {
> complete(bio->bi_private);
> diff --git a/include/linux/bio.h b/include/linux/bio.h
> index c75a9b3672aa..95cfc79b88b8 100644
> --- a/include/linux/bio.h
> +++ b/include/linux/bio.h
> @@ -403,6 +403,29 @@ static inline int bio_iov_vecs_to_alloc(struct iov_iter *iter, int max_segs)
> return iov_iter_npages(iter, max_segs);
> }
>
> +/**
> + * bio_iov_bounce_nr_vecs - calculate number of bvecs for a bounce bio
> + * @iter: iter to bounce from
> + * @op: REQ_OP_* for the bio
> + *
> + * Calculates how many bvecs are needed for the next bio to bounce from/to
> + * @iter.
> + */
> +static inline unsigned short
> +bio_iov_bounce_nr_vecs(struct iov_iter *iter, blk_opf_t op)
> +{
> + /*
> + * We still need to bounce bvec iters, so don't special case them
> + * here unlike in bio_iov_vecs_to_alloc.
> + *
> + * For reads we need to use a vector for the bounce buffer, account
> + * for that here.
> + */
> + if (op_is_write(op))
> + return iov_iter_npages(iter, BIO_MAX_VECS);
> + return iov_iter_npages(iter, BIO_MAX_VECS - 1) + 1;
> +}
> +
> struct request_queue;
>
> void bio_init(struct bio *bio, struct block_device *bdev, struct bio_vec *table,
> @@ -456,6 +479,9 @@ void __bio_release_pages(struct bio *bio, bool mark_dirty);
> extern void bio_set_pages_dirty(struct bio *bio);
> extern void bio_check_pages_dirty(struct bio *bio);
>
> +int bio_iov_iter_bounce(struct bio *bio, struct iov_iter *iter);
> +void bio_iov_iter_unbounce(struct bio *bio, bool is_error, bool mark_dirty);
> +
> extern void bio_copy_data_iter(struct bio *dst, struct bvec_iter *dst_iter,
> struct bio *src, struct bvec_iter *src_iter);
> extern void bio_copy_data(struct bio *dst, struct bio *src);
> --
> 2.47.3
>
>