linux-mm.kvack.org archive mirror
* [LSF/MM/BPF TOPIC] Optimizing Page Cache Readahead Behavior
@ 2025-02-21 21:13 Kalesh Singh
From: Kalesh Singh @ 2025-02-21 21:13 UTC (permalink / raw)
  To: lsf-pc, open list:MEMORY MANAGEMENT, linux-fsdevel
  Cc: Suren Baghdasaryan, David Hildenbrand, Lorenzo Stoakes,
	Liam R. Howlett, Juan Yescas, android-mm, Matthew Wilcox,
	Vlastimil Babka, Michal Hocko

Hi organizers of LSF/MM,

I realize this is a late submission, but I was hoping there might
still be a chance to have this topic considered for discussion.

Problem Statement
=================

Readahead can result in unnecessary page cache pollution for mapped
regions that are never accessed. Current mechanisms to disable
readahead lack granularity: they operate only at the file or VMA
level. This proposal seeks to initiate discussion at LSFMM to explore
potential solutions for optimizing page cache/readahead behavior.


Background
==========

The read-ahead heuristics on file-backed memory mappings can
inadvertently populate the page cache with pages corresponding to
regions that user-space processes are known never to access, e.g. ELF
LOAD segment padding regions. While these pages are ultimately
reclaimable, their presence precipitates unnecessary I/O operations,
particularly when a substantial quantity of such regions exists.

Although the underlying file can be made sparse in these regions to
mitigate I/O, readahead will still allocate discrete zero pages when
populating the page cache within these ranges. These pages, while
subject to reclaim, introduce additional churn to the LRU. This
reclaim overhead is further exacerbated in filesystems that support
"fault-around" semantics, which can populate the surrounding pages'
PTEs if those pages are found in the page cache.

While the memory impact may be negligible for large files containing a
limited number of sparse regions, it becomes appreciable for many
small mappings characterized by numerous holes. This scenario can
arise from efforts to minimize vm_area_struct slab memory footprint.

Limitations of Existing Mechanisms
==================================

fadvise(..., POSIX_FADV_RANDOM, ...): disables read-ahead for the
entire file, rather than specific sub-regions. The offset and length
parameters primarily serve the POSIX_FADV_WILLNEED [1] and
POSIX_FADV_DONTNEED [2] cases.
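
To make the call shape concrete, here is a minimal sketch in Python
(os.posix_fadvise wraps the same syscall). The helper name and the
25-page/5-page split are illustrative assumptions borrowed from the
demonstration further down; the point is only that the range is
accepted while, as described above, the readahead change is not
confined to it.

```python
import os
import tempfile

PAGE = os.sysconf("SC_PAGE_SIZE")


def read_head_with_fadv_random(total_pages=25, head_pages=5):
    """Create a sparse file, pass POSIX_FADV_RANDOM for its tail, read the head.

    Hypothetical helper for illustration. Returns the number of bytes read.
    """
    with tempfile.NamedTemporaryFile() as tmp:
        # Extending with truncate writes no data, so the file is one big hole.
        os.truncate(tmp.name, total_pages * PAGE)
        fd = os.open(tmp.name, os.O_RDONLY)
        try:
            tail_off = head_pages * PAGE
            tail_len = (total_pages - head_pages) * PAGE
            # The offset/length pair is accepted here, but per the text above
            # the readahead setting ends up applying to the whole open file.
            os.posix_fadvise(fd, tail_off, tail_len, os.POSIX_FADV_RANDOM)
            return len(os.pread(fd, head_pages * PAGE, 0))
        finally:
            os.close(fd)
```

So a caller that wants readahead for the head but not for the tail
cannot express that with this advice value alone.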

madvise(..., MADV_RANDOM, ...): Similarly, this applies to the entire
VMA, rather than specific sub-regions. [3]

Guard Regions: While guard regions for file-backed VMAs circumvent
fault-around concerns, the fundamental issue of unnecessary page cache
population persists. [4]
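
A rough sketch of the madvise() case, again in Python (mmap.madvise
needs Python 3.8+). The helper names are hypothetical, and counting
lines in /proc/self/maps is only an informal way to observe VMAs. The
assumption being illustrated is that the advice is tracked per VMA, so
advising just the tail of a mapping is expected to leave it in a VMA
of its own rather than mark a sub-range inside one VMA.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE


def count_vmas(path):
    """Count /proc/self/maps entries that refer to `path` (informal probe)."""
    with open("/proc/self/maps") as maps:
        return sum(1 for line in maps if path in line)


def madvise_random_tail(total_pages=25, head_pages=5):
    """Map a sparse file and apply MADV_RANDOM to everything after the head.

    Hypothetical helper. Returns (vmas_before, vmas_after, first_byte).
    """
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.truncate(total_pages * PAGE)
        path = os.path.realpath(tmp.name)  # maps shows the resolved path
        size = total_pages * PAGE
        with mmap.mmap(tmp.fileno(), size, access=mmap.ACCESS_READ) as m:
            before = count_vmas(path)
            # start and length are page-aligned, as madvise() requires.
            m.madvise(mmap.MADV_RANDOM,
                      head_pages * PAGE,
                      (total_pages - head_pages) * PAGE)
            after = count_vmas(path)
            first_byte = m[0]  # touch the head; a hole reads back as zero
        return before, after, first_byte
```

If the count grows after the call, the finer granularity was paid for
with an extra vm_area_struct, which runs against the goal of keeping
the VMA count low for mappings with many such regions.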

Empirical Demonstration
=======================

Below is a simple script to demonstrate the issue. Assume that the
last 20 pages of the mapping are a region known never to be accessed
(perhaps a guard region).

cachestat is a simple C program I wrote that reports nr_cache for
the entire file using the new cachestat() syscall [5].
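
The C program itself is not included in this message. As a stand-in,
the following is a sketch of an equivalent probe in Python via ctypes,
not the author's tool. The syscall number 451, the struct layouts, and
the names nr_cache() and demo() are assumptions taken from the
cachestat() interface as I understand it; check them against your
kernel headers before relying on the numbers.

```python
import ctypes
import errno
import os
import tempfile

SYS_CACHESTAT = 451  # assumption: cachestat() syscall number


class CachestatRange(ctypes.Structure):
    _fields_ = [("off", ctypes.c_uint64), ("len", ctypes.c_uint64)]


class Cachestat(ctypes.Structure):
    _fields_ = [
        ("nr_cache", ctypes.c_uint64),
        ("nr_dirty", ctypes.c_uint64),
        ("nr_writeback", ctypes.c_uint64),
        ("nr_evicted", ctypes.c_uint64),
        ("nr_recently_evicted", ctypes.c_uint64),
    ]


def nr_cache(path):
    """Return the number of cached pages for `path`, or None if the
    syscall is unavailable (ENOSYS) or blocked (EPERM, assumed sandbox)."""
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    fd = os.open(path, os.O_RDONLY)
    try:
        rng = CachestatRange(0, 0)  # len == 0 queries up to end of file
        stat = Cachestat()
        ret = libc.syscall(ctypes.c_long(SYS_CACHESTAT), ctypes.c_long(fd),
                           ctypes.byref(rng), ctypes.byref(stat),
                           ctypes.c_long(0))  # flags must be 0
        if ret == 0:
            return stat.nr_cache
        err = ctypes.get_errno()
        if err in (errno.ENOSYS, errno.EPERM):
            return None
        raise OSError(err, os.strerror(err))
    finally:
        os.close(fd)


def demo(size=100 * 1024, head=20 * 1024):
    """Mirror the script: sparse file, read the head, compare counts."""
    with tempfile.NamedTemporaryFile() as tmp:
        os.truncate(tmp.name, size)
        before = nr_cache(tmp.name)
        with open(tmp.name, "rb") as f:
            f.read(head)
        after = nr_cache(tmp.name)
    return before, after
```

Calling demo() returns the cached-page count before and after reading
the first 20k; the exact values depend on the filesystem and the
readahead settings in effect.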

cat pollute_page_cache.sh

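#!/bin/bash
# Shows readahead populating the page cache for a sparse range that is
# never read.

FILE="myfile.txt"

echo "Creating sparse file of size 25 pages"
# 100 KiB = 25 pages, assuming 4 KiB pages
truncate -s 100k "$FILE"

# With "ls -s", column 6 is the apparent size and column 1 the
# allocated size.
apparent_size=$(ls -lahs "$FILE" | awk '{ print $6 }')
echo "Apparent Size: $apparent_size"

real_size=$(ls -lahs "$FILE" | awk '{ print $1 }')
echo "Real Size: $real_size"

nr_cached=$(./cachestat "$FILE" | grep nr_cache: | awk '{ print $2 }')
echo "Number cached pages: $nr_cached"

echo "Reading first 5 pages..."
# Discard the data; only the page cache side effect matters here.
head -c 20k "$FILE" > /dev/null

nr_cached=$(./cachestat "$FILE" | grep nr_cache: | awk '{ print $2 }')
echo "Number cached pages: $nr_cached"

rm "$FILE"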

-------

./pollute_page_cache.sh
Creating sparse file of size 25 pages
Apparent Size: 100K
Real Size: 0
Number cached pages: 0
Reading first 5 pages...
Number cached pages: 25


Thanks,
Kalesh

[1] https://github.com/torvalds/linux/blob/v6.14-rc3/mm/fadvise.c#L96
[2] https://github.com/torvalds/linux/blob/v6.14-rc3/mm/fadvise.c#L113
[3] https://github.com/torvalds/linux/blob/v6.14-rc3/mm/madvise.c#L1277
[4] https://lore.kernel.org/r/cover.1739469950.git.lorenzo.stoakes@oracle.com/
[5] https://lore.kernel.org/r/20230503013608.2431726-3-nphamcs@gmail.com/



