linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] block: Improve read ahead size for rotational devices
@ 2025-06-16  6:28 Damien Le Moal
  2025-06-16 21:24 ` Martin K. Petersen
                   ` (4 more replies)
  0 siblings, 5 replies; 8+ messages in thread
From: Damien Le Moal @ 2025-06-16  6:28 UTC (permalink / raw)
  To: Jens Axboe, linux-block, linux-fsdevel, linux-mm

For a device that does not advertize an optimal I/O size, the function
blk_apply_bdi_limits() defaults to an initial setting of the ra_pages
field of struct backing_dev_info to VM_READAHEAD_PAGES, that is, 128 KB.

This low I/O size value is far from being optimal for hard-disk devices:
when reading files from multiple contexts using buffered I/Os, the seek
overhead between the small read commands generated to read-ahead
multiple files will significantly limit the performance that can be
achieved.

This fact applies to all ATA devices as ATA does not define an optimal
I/O size and the SCSI SAT specification does not define a default value
to expose to the host.

Modify blk_apply_bdi_limits() to use a device max_sectors limit to
calculate the ra_pages field of struct backing_dev_info, when the device
is a rotational one (BLK_FEAT_ROTATIONAL feature is set). For a SCSI
disk, this defaults to 2560 KB, which significantly improve performance
for buffered reads. Using XFS and sequentially reading randomly selected
(large) files stored on a SATA HDD, the maximum throughput achieved with
8 readers reading files with 1MB buffered I/Os increases from 122 MB/s
to 167 MB/s (+36%). The improvement is even larger when reading files
using 128 KB buffered I/Os, with a throughput increasing from 57 MB/s to
165 MB/s (+189%).

Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
---
 block/blk-settings.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/block/blk-settings.c b/block/blk-settings.c
index a000daafbfb4..66d402de9026 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -58,16 +58,24 @@ EXPORT_SYMBOL(blk_set_stacking_limits);
 void blk_apply_bdi_limits(struct backing_dev_info *bdi,
 		struct queue_limits *lim)
 {
+	u64 io_opt = lim->io_opt;
+
 	/*
 	 * For read-ahead of large files to be effective, we need to read ahead
-	 * at least twice the optimal I/O size.
+	 * at least twice the optimal I/O size. For rotational devices that do
+	 * not report an optimal I/O size (e.g. ATA HDDs), use the maximum I/O
+	 * size to avoid falling back to the (rather inefficient) small default
+	 * read-ahead size.
 	 *
 	 * There is no hardware limitation for the read-ahead size and the user
 	 * might have increased the read-ahead size through sysfs, so don't ever
 	 * decrease it.
 	 */
+	if (!io_opt && (lim->features & BLK_FEAT_ROTATIONAL))
+		io_opt = (u64)lim->max_sectors << SECTOR_SHIFT;
+
 	bdi->ra_pages = max3(bdi->ra_pages,
-				lim->io_opt * 2 / PAGE_SIZE,
+				io_opt * 2 >> PAGE_SHIFT,
 				VM_READAHEAD_PAGES);
 	bdi->io_pages = lim->max_sectors >> PAGE_SECTORS_SHIFT;
 }
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2025-07-29 12:27 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-16  6:28 [PATCH] block: Improve read ahead size for rotational devices Damien Le Moal
2025-06-16 21:24 ` Martin K. Petersen
2025-06-16 22:45   ` Damien Le Moal
2025-06-17  5:10   ` Christoph Hellwig
2025-06-17  5:09 ` Christoph Hellwig
2025-06-17  6:58 ` John Garry
2025-07-29  7:58 ` Damien Le Moal
2025-07-29 12:27 ` Jens Axboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).