public inbox for linux-fsdevel@vger.kernel.org
* bounce buffer direct I/O when stable pages are required
@ 2026-01-14  7:40 Christoph Hellwig
From: Christoph Hellwig @ 2026-01-14  7:40 UTC (permalink / raw)
  To: Jens Axboe, Christian Brauner
  Cc: Darrick J. Wong, Carlos Maiolino, Qu Wenruo, Al Viro, linux-block,
	linux-xfs, linux-fsdevel

Hi all,

this series tries to address the problem that pages under I/O can be
modified during direct I/O, even when the device or file system requires
stable pages during I/O to calculate checksums, parity, or other data
operations.  It does so by adding block layer helpers to bounce buffer
an iov_iter into a bio, and then wires that up in iomap and ultimately
XFS.
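
To illustrate why a bounce buffer gives the checksum stable data, here
is a minimal userspace sketch.  It is not the code from this series:
struct toy_io, toy_csum() and the other toy_* helpers are hypothetical
names made up for the example, and an Adler-32 style sum stands in for
the real PI or checksum calculation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a bio that carries data plus a checksum. */
struct toy_io {
	unsigned char *data;	/* buffer the "device" reads from */
	size_t len;
	uint32_t csum;		/* checksum computed at submission time */
	int bounced;		/* nonzero if data is a private copy */
};

/* Adler-32 style checksum, standing in for PI / CRC generation. */
static uint32_t toy_csum(const unsigned char *p, size_t len)
{
	uint32_t a = 1, b = 0;

	for (size_t i = 0; i < len; i++) {
		a = (a + p[i]) % 65521;
		b = (b + a) % 65521;
	}
	return (b << 16) | a;
}

/* Submit a write, either zero copy or through a bounce buffer. */
static int toy_submit_write(struct toy_io *io, unsigned char *user,
		size_t len, int bounce)
{
	io->len = len;
	io->bounced = bounce;
	if (bounce) {
		/* Private copy: later stores to 'user' cannot reach it. */
		io->data = malloc(len);
		if (!io->data)
			return -1;
		memcpy(io->data, user, len);
	} else {
		/* Zero copy: the device sees whatever 'user' holds later. */
		io->data = user;
	}
	io->csum = toy_csum(io->data, len);
	return 0;
}

/* What the device would check when the data finally arrives. */
static int toy_device_verify(const struct toy_io *io)
{
	assert(io->data != NULL);
	return toy_csum(io->data, io->len) == io->csum;
}

/* Release the bounce buffer, if one was allocated. */
static void toy_complete(struct toy_io *io)
{
	if (io->bounced)
		free(io->data);
	io->data = NULL;
}
```

In this sketch, a store to the user buffer after toy_submit_write()
only breaks toy_device_verify() in the zero copy case.  With bounce
set, the checksum and the data handed to the device both come from the
same private copy.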

The file system needs to know about this because reads need a user
context to copy the data back, and the infrastructure to defer ioends
to a workqueue currently sits in XFS.  I'm going to look into moving
that into ioend and enabling it for other file systems.  Additionally,
btrfs already has its own infrastructure for this, and actually an
urgent need to bounce buffer, so this should be useful there and could
be wired up easily.  In fact the idea comes from patches by Qu that did
this in btrfs.
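
The need to defer read completions can be sketched in userspace as
well.  This is only an assumption-laden model, not the iomap or XFS
code: struct toy_read and the toy_* functions are hypothetical, and a
simple linked list stands in for the workqueue.  The point is that the
completion handler only queues the request, and the copy back to the
user buffer happens later from a context that is allowed to touch it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical bounced read: the device fills 'bounce', not 'user'. */
struct toy_read {
	unsigned char *bounce;
	unsigned char *user;
	size_t len;
	struct toy_read *next;
};

/* Completed reads waiting for the copy back (workqueue stand-in). */
static struct toy_read *toy_done_list;

/* Allocate the bounce buffer that the "device" will fill. */
static int toy_start_read(struct toy_read *rd, unsigned char *user,
		size_t len)
{
	rd->bounce = malloc(len);
	if (!rd->bounce)
		return -1;
	rd->user = user;
	rd->len = len;
	rd->next = NULL;
	return 0;
}

/*
 * Models the I/O completion handler: the data is in the bounce buffer,
 * but user memory must not be touched here, so only queue the request.
 */
static void toy_read_end_io(struct toy_read *rd)
{
	rd->next = toy_done_list;
	toy_done_list = rd;
}

/*
 * Models the deferred worker: copy each bounce buffer back to its user
 * buffer and free it.  Returns the number of reads completed.
 */
static size_t toy_run_completions(void)
{
	size_t n = 0;

	while (toy_done_list) {
		struct toy_read *rd = toy_done_list;

		toy_done_list = rd->next;
		assert(rd->bounce != NULL);
		memcpy(rd->user, rd->bounce, rd->len);
		free(rd->bounce);
		rd->bounce = NULL;
		n++;
	}
	return n;
}
```

Until toy_run_completions() runs, the user buffer still holds its old
contents, which is the extra step (and extra context switch) that a
bounced read pays compared to zero copy.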

This series fixes all but one of the xfstests failures on T10 PI capable
devices (generic/095 still seems to have issues with a mix of mmap and
splice, I'm looking into that separately), and makes qemu VMs running
Windows, or Linux with swap enabled, work fine on an XFS file on a
device using PI.

Performance numbers on my (not exactly state of the art) NVMe PI test
setup:

  Sequential reads using io_uring, QD=16.
  Bandwidth and CPU usage (usr/sys):

  | size |        zero copy         |          bounce          |
  +------+--------------------------+--------------------------+
  |   4k | 1316MiB/s (12.65/55.40%) | 1081MiB/s (11.76/49.78%) |
  |  64K | 3370MiB/s ( 5.46/18.20%) | 3365MiB/s ( 4.47/15.68%) |
  |   1M | 3401MiB/s ( 0.76/23.05%) | 3400MiB/s ( 0.80/09.06%) |
  +------+--------------------------+--------------------------+

  Sequential writes using io_uring, QD=16.
  Bandwidth and CPU usage (usr/sys):

  | size |        zero copy         |          bounce          |
  +------+--------------------------+--------------------------+
  |   4k |  882MiB/s (11.83/33.88%) |  750MiB/s (10.53/34.08%) |
  |  64K | 2009MiB/s ( 7.33/15.80%) | 2007MiB/s ( 7.47/24.71%) |
  |   1M | 1992MiB/s ( 7.26/ 9.13%) | 1992MiB/s ( 9.21/19.11%) |
  +------+--------------------------+--------------------------+

Note that the 64k read numbers look really odd to me for the baseline
zero copy case, but are reproducible over many repeated runs.

The bounce read numbers should improve further when moving the PI
validation to the file system and removing the double context switch.
I have patches for that, which will be sent as soon as we are done with
this series.

Diffstat:
 block/bio.c           |  323 ++++++++++++++++++++++++++++++--------------------
 block/blk.h           |   11 -
 fs/iomap/direct-io.c  |  189 +++++++++++++++--------------
 fs/iomap/ioend.c      |    8 +
 fs/xfs/xfs_aops.c     |    8 -
 fs/xfs/xfs_file.c     |   41 +++++-
 include/linux/bio.h   |   26 ++++
 include/linux/iomap.h |    9 +
 include/linux/uio.h   |    3 
 lib/iov_iter.c        |   98 +++++++++++++++
 10 files changed, 490 insertions(+), 226 deletions(-)

Thread overview: 74+ messages
2026-01-19  7:44 ` bounce buffer direct I/O when stable pages are required v2 Christoph Hellwig
2026-01-19  7:44   ` [PATCH 01/14] block: refactor get_contig_folio_len Christoph Hellwig
2026-01-22 11:00     ` Johannes Thumshirn
2026-01-22 17:54     ` Darrick J. Wong
2026-01-23  8:32     ` Damien Le Moal
2026-01-23  8:35       ` Christoph Hellwig
2026-01-23  8:44         ` Damien Le Moal
2026-01-23  8:45     ` Damien Le Moal
2026-01-23 12:14     ` Anuj Gupta
2026-01-19  7:44   ` [PATCH 02/14] block: open code bio_add_page and fix handling of mismatching P2P ranges Christoph Hellwig
2026-01-22 11:04     ` Johannes Thumshirn
2026-01-22 17:59     ` Darrick J. Wong
2026-01-23  5:43       ` Christoph Hellwig
2026-01-23  7:05         ` Darrick J. Wong
2026-01-23  8:35     ` Damien Le Moal
2026-01-23 12:15     ` Anuj Gupta
2026-01-19  7:44   ` [PATCH 03/14] iov_iter: extract a iov_iter_extract_bvecs helper from bio code Christoph Hellwig
2026-01-22 17:47     ` Darrick J. Wong
2026-01-23  5:44       ` Christoph Hellwig
2026-01-23  7:09         ` Darrick J. Wong
2026-01-23  7:14           ` Christoph Hellwig
2026-01-23 11:37     ` David Howells
2026-01-23 13:58       ` Christoph Hellwig
2026-01-23 14:57         ` David Howells
2026-01-26 17:36           ` Matthew Wilcox
2026-01-27  5:13             ` Christoph Hellwig
2026-01-27  5:44               ` Matthew Wilcox
2026-01-27  5:47                 ` Christoph Hellwig
2026-02-03  8:20           ` Askar Safin
2026-02-03 10:28           ` Askar Safin
2026-02-03 16:32             ` Christoph Hellwig
2026-01-19  7:44   ` [PATCH 04/14] block: remove bio_release_page Christoph Hellwig
2026-01-22 11:14     ` Johannes Thumshirn
2026-01-22 17:26     ` Darrick J. Wong
2026-01-23  8:43     ` Damien Le Moal
2026-01-23 12:17     ` Anuj Gupta
2026-01-19  7:44   ` [PATCH 05/14] block: add helpers to bounce buffer an iov_iter into bios Christoph Hellwig
2026-01-22 13:05     ` Johannes Thumshirn
2026-01-22 17:25     ` Darrick J. Wong
2026-01-23  5:51       ` Christoph Hellwig
2026-01-23  7:11         ` Darrick J. Wong
2026-01-23  7:16           ` Christoph Hellwig
2026-01-23  8:52     ` Damien Le Moal
2026-01-23 12:20     ` Anuj Gupta
2026-01-19  7:44   ` [PATCH 06/14] iomap: fix submission side handling of completion side errors Christoph Hellwig
2026-01-19 17:40     ` Darrick J. Wong
2026-01-23  8:54     ` Damien Le Moal
2026-01-19  7:44   ` [PATCH 07/14] iomap: simplify iomap_dio_bio_iter Christoph Hellwig
2026-01-19 17:43     ` Darrick J. Wong
2026-01-23  8:55     ` Damien Le Moal
2026-01-19  7:44   ` [PATCH 08/14] iomap: split out the per-bio logic from iomap_dio_bio_iter Christoph Hellwig
2026-01-23  8:57     ` Damien Le Moal
2026-01-19  7:44   ` [PATCH 09/14] iomap: share code between iomap_dio_bio_end_io and iomap_finish_ioend_direct Christoph Hellwig
2026-01-23  8:58     ` Damien Le Moal
2026-01-19  7:44   ` [PATCH 10/14] iomap: free the bio before completing the dio Christoph Hellwig
2026-01-19 17:43     ` Darrick J. Wong
2026-01-23  8:59     ` Damien Le Moal
2026-01-19  7:44   ` [PATCH 11/14] iomap: rename IOMAP_DIO_DIRTY to IOMAP_DIO_USER_BACKED Christoph Hellwig
2026-01-23  9:00     ` Damien Le Moal
2026-01-19  7:44   ` [PATCH 12/14] iomap: support ioends for direct reads Christoph Hellwig
2026-01-23  9:02     ` Damien Le Moal
2026-01-19  7:44   ` [PATCH 13/14] iomap: add a flag to bounce buffer direct I/O Christoph Hellwig
2026-01-23  9:05     ` Damien Le Moal
2026-01-19  7:44   ` [PATCH 14/14] xfs: use bounce buffering direct I/O when the device requires stable pages Christoph Hellwig
2026-01-19 17:45     ` Darrick J. Wong
2026-01-23  9:08     ` Damien Le Moal
2026-01-23 12:10   ` bounce buffer direct I/O when stable pages are required v2 Anuj Gupta
2026-01-23 14:01     ` Christoph Hellwig
2026-01-23 14:09     ` Keith Busch
2026-01-23 12:24   ` Christian Brauner
2026-01-23 14:10     ` block or iomap tree, was: " Christoph Hellwig
2026-01-27 10:31       ` Christian Brauner
2026-01-27 12:50         ` Christoph Hellwig
