* [PATCH RFC v4 0/3] block: enable RWF_DONTCACHE for block devices
From: Tal Zussman @ 2026-03-25 18:42 UTC
  To: Jens Axboe, Matthew Wilcox (Oracle), Christian Brauner,
	Darrick J. Wong, Carlos Maiolino, Alexander Viro, Jan Kara
  Cc: Christoph Hellwig, linux-block, linux-kernel, linux-xfs,
	linux-fsdevel, linux-mm, Tal Zussman

Add support for using RWF_DONTCACHE with block devices.

Dropbehind pruning needs to be done in non-IRQ context, but block
devices complete writeback in IRQ context.

To fix this, we can defer dropbehind invalidation to task context. We
introduce a new BIO_COMPLETE_IN_TASK flag that allows the bio submitter
to request task-context completion of bi_end_io. When bio_endio() sees
this flag in non-task context, it queues the bio to a per-CPU list and
schedules a work item to do bio completion.
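
For illustration only, the deferral described above can be modelled in
plain userspace C. This is a minimal sketch and not the code from patch 1:
every model_* name is hypothetical, a single struct stands in for the
per-CPU list, and a boolean stands in for the real task/IRQ context check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the BIO_COMPLETE_IN_TASK flag. */
#define MODEL_BIO_COMPLETE_IN_TASK (1u << 0)

struct model_bio {
	unsigned int flags;
	void (*end_io)(struct model_bio *bio);
	struct model_bio *next;
	bool done;              /* set once end_io has run */
	bool completed_in_task; /* context observed by end_io */
};

/* One instance models the deferred list of a single CPU. */
struct model_cpu {
	struct model_bio *head;
	struct model_bio *tail;
	bool work_scheduled;
};

/* Assumption: a plain boolean replaces the kernel's context check. */
static bool model_in_task;

/* Example completion handler: records the context it ran in. */
static void model_record_context(struct model_bio *bio)
{
	bio->done = true;
	bio->completed_in_task = model_in_task;
}

/* Complete a bio, deferring it if the flag is set and we are not in task context. */
static void model_bio_endio(struct model_cpu *cpu, struct model_bio *bio)
{
	if ((bio->flags & MODEL_BIO_COMPLETE_IN_TASK) && !model_in_task) {
		bio->next = NULL;
		if (cpu->tail)
			cpu->tail->next = bio;
		else
			cpu->head = bio;
		cpu->tail = bio;
		cpu->work_scheduled = true; /* models scheduling the work item */
		return;
	}
	bio->end_io(bio);
}

/* Work item body: drain the list in (modelled) task context; returns the count. */
static int model_run_work(struct model_cpu *cpu)
{
	struct model_bio *bio = cpu->head;
	bool saved = model_in_task;
	int completed = 0;

	cpu->head = cpu->tail = NULL;
	cpu->work_scheduled = false;
	model_in_task = true;
	while (bio) {
		struct model_bio *next = bio->next;

		bio->end_io(bio);
		completed++;
		bio = next;
	}
	model_in_task = saved;
	return completed;
}
```

In this model, a flagged bio completed while model_in_task is false only
lands on the list; its end_io runs later from model_run_work(). An
unflagged bio completes immediately in whatever context it was ended in.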

Patch 1 adds the BIO_COMPLETE_IN_TASK infrastructure in the block
layer.

Patch 2 wires BIO_COMPLETE_IN_TASK into iomap writeback for DONTCACHE
folios and removes the DONTCACHE workqueue deferral from XFS.

Patch 3 enables RWF_DONTCACHE for block devices, setting
BIO_COMPLETE_IN_TASK in submit_bh_wbc() for the CONFIG_BUFFER_HEAD
path.

This support is useful for databases that operate on raw block devices,
among other userspace applications.
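
As a rough sketch of what such an application might do, the helper below
issues a write with RWF_DONTCACHE through pwritev2(). The helper name
write_dontcache() and its fallback policy are assumptions made for this
example and are not part of the series. The #define is only a fallback
for userspace headers that predate the flag.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for pwritev2() */
#endif
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Fallback for older headers; value taken from the Linux uapi <linux/fs.h>. */
#ifndef RWF_DONTCACHE
#define RWF_DONTCACHE 0x00000080
#endif

/*
 * Hypothetical helper: write len bytes at offset off, requesting
 * dropbehind. If the flag is rejected (for example EOPNOTSUPP on a
 * kernel or file without support), retry as an ordinary buffered write.
 * *used_dontcache is set to 1 only if the RWF_DONTCACHE write succeeded.
 */
static ssize_t write_dontcache(int fd, const void *buf, size_t len,
			       off_t off, int *used_dontcache)
{
	struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
	ssize_t ret;

	*used_dontcache = 0;
	ret = pwritev2(fd, &iov, 1, off, RWF_DONTCACHE);
	if (ret >= 0) {
		*used_dontcache = 1;
		return ret;
	}
	if (errno != EOPNOTSUPP && errno != EINVAL && errno != ENOSYS)
		return -1;

	/* Flag not accepted here: fall back to a plain positional write. */
	return pwrite(fd, buf, len, off);
}
```

A caller would open the block device (or any file) as usual and pass the
descriptor in. As with any write, the return value may be short and should
be checked.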

I tested this (with CONFIG_BUFFER_HEAD=y) for reads and writes on a
single block device on a VM, so results may be noisy.

Reads were tested on the root partition with a 45GB range (~2x RAM).
Writes were tested on a disabled swap partition (~1GB) in a memcg of size
244MB to force reclaim pressure.

Results:

===== READS (/dev/nvme0n1p2) =====
 sec   normal MB/s  dontcache MB/s
----  ------------  --------------
   1        1098.6          1609.0
   2        1270.3          1506.6
   3        1093.3          1576.5
   4        1141.8          2393.9
   5        1365.3          2793.8
   6        1324.6          2065.9
   7         879.6          1920.7
   8        1434.1          1662.4
   9        1184.9          1857.9
  10        1166.4          1702.8
  11        1161.4          1653.4
  12        1086.9          1555.4
  13        1198.5          1718.9
  14        1111.9          1752.2
----  ------------  --------------
 avg        1173.7          1828.8  (+56%)

==== WRITES (/dev/nvme0n1p3) =====
 sec   normal MB/s  dontcache MB/s
----  ------------  --------------
   1         692.4          9297.7
   2        4810.8          9342.8
   3        5221.7          2955.2
   4         396.7          8488.3
   5        7249.2          9249.3
   6        6695.4          1376.2
   7         122.9          9125.8
   8        5486.5          9414.7
   9        6921.5          8743.5
  10          27.9          8997.8
----  ------------  --------------
 avg        3762.5          7699.1  (+105%)

---
Changes in v4:
- 1/3: Move dropbehind deferral from folio-level to bio-level using
  BIO_COMPLETE_IN_TASK, per Matthew and Jan.
- 1/3: Work function yields on need_resched() to avoid hogging the CPU,
  per Jan.
- 2/3: New patch. Set BIO_COMPLETE_IN_TASK on iomap writeback bios for
  DONTCACHE folios, removing the need for XFS-specific workqueue
  deferral.
- 3/3: Set BIO_COMPLETE_IN_TASK in submit_bh_wbc() for buffer_head
  path.
- 3/3: Update commit message to mention CONFIG_BUFFER_HEAD=n path.
- Link to v3: https://lore.kernel.org/r/20260227-blk-dontcache-v3-0-cd309ccd5868@columbia.edu

Changes in v3:
- 1/2: Convert dropbehind deferral to per-CPU folio_batches protected by
  local_lock using per-CPU work items, to reduce contention, per Jens.
- 1/2: Call folio_end_dropbehind_irq() directly from
  folio_end_writeback(), per Jens.
- 1/2: Add CPU hotplug dead callback to drain the departing CPU's folio
  batch.
- 2/2: Introduce block_write_begin_iocb(), per Christoph.
- 2/2: Dropped R-b due to changes.
- Link to v2: https://lore.kernel.org/r/20260225-blk-dontcache-v2-0-70e7ac4f7108@columbia.edu

Changes in v2:
- Add R-b from Jan Kara for 2/2.
- Add patch to defer dropbehind completion from IRQ context via a work
  item (1/2).
- Add initial performance numbers to cover letter.
- Link to v1: https://lore.kernel.org/r/20260218-blk-dontcache-v1-1-fad6675ef71f@columbia.edu

---
Tal Zussman (3):
      block: add BIO_COMPLETE_IN_TASK for task-context completion
      iomap: use BIO_COMPLETE_IN_TASK for dropbehind writeback
      block: enable RWF_DONTCACHE for block devices

 block/bio.c                 | 84 ++++++++++++++++++++++++++++++++++++++++++++-
 block/fops.c                |  5 +--
 fs/buffer.c                 | 22 ++++++++++--
 fs/iomap/ioend.c            |  2 ++
 fs/xfs/xfs_aops.c           |  4 ---
 include/linux/blk_types.h   |  1 +
 include/linux/buffer_head.h |  3 ++
 7 files changed, 111 insertions(+), 10 deletions(-)
---
base-commit: 2961f841b025fb234860bac26dfb7fa7cb0fb122
change-id: 20260218-blk-dontcache-338133dd045e

Best regards,
-- 
Tal Zussman <tz2294@columbia.edu>

