From: Mike Snitzer <snitzer@kernel.org>
To: Matthew Wilcox <willy@infradead.org>
Cc: Ming Lei <ming.lei@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, Don Dutile <ddutile@redhat.com>,
Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Christian Brauner <brauner@kernel.org>
Subject: Re: [RFC PATCH] mm/readahead: readahead aggressively if read drops in willneed range
Date: Sun, 28 Jan 2024 18:12:29 -0500 [thread overview]
Message-ID: <ZbbfXVg9FpWRUVDn@redhat.com> (raw)
In-Reply-To: <ZbbPCQZdazF7s0_b@casper.infradead.org>
On Sun, Jan 28 2024 at 5:02P -0500,
Matthew Wilcox <willy@infradead.org> wrote:
> On Sun, Jan 28, 2024 at 10:25:22PM +0800, Ming Lei wrote:
> > Since commit 6d2be915e589 ("mm/readahead.c: fix readahead failure for
> > memoryless NUMA nodes and limit readahead max_pages"), ADV_WILLNEED
> > only tries to readahead 512 pages, and the remained part in the advised
> > range fallback on normal readahead.
>
> Does the MAINTAINERS file mean nothing any more?
"Ming, please use scripts/get_maintainer.pl when submitting patches."
(I've cc'd accordingly with this email).
> > If bdi->ra_pages is set as small, readahead will perform not efficient
> > enough. Increasing read ahead may not be an option since workload may
> > have mixed random and sequential I/O.
>
> I thik there needs to be a lot more explanation than this about what's
> going on before we jump to "And therefore this patch is the right
> answer".
The patch is "RFC". Ming didn't declare his RFC is "the right answer".
All ideas for how best to fix this issue are welcome.
I agree this patch's header could've worked harder to establish the
problem that it fixes. But I'll now take a crack at backfilling the
regression report that motivated this patch be developed:
Linux 3.14 was the last kernel to allow madvise (MADV_WILLNEED)
allowed mmap'ing a file more optimally if read_ahead_kb < max_sectors_kb.
Ths regressed with commit 6d2be915e589 (so Linux 3.15) such that
mounting XFS on a device with read_ahead_kb=64 and max_sectors_kb=1024
and running this reproducer against a 2G file will take ~5x longer
(depending on the system's capabilities), mmap_load_test.java follows:
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
public class mmap_load_test {
public static void main(String[] args) throws FileNotFoundException, IOException, InterruptedException {
if (args.length == 0) {
System.out.println("Please provide a file");
System.exit(0);
}
FileChannel fc = new RandomAccessFile(new File(args[0]), "rw").getChannel();
MappedByteBuffer mem = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
System.out.println("Loading the file");
long startTime = System.currentTimeMillis();
mem.load();
long endTime = System.currentTimeMillis();
System.out.println("Done! Loading took " + (endTime-startTime) + " ms");
}
}
reproduce with:
javac mmap_load_test.java
echo 64 > /sys/block/sda/queue/read_ahead_kb
echo 1024 > /sys/block/sda/queue/max_sectors_kb
mkfs.xfs /dev/sda
mount /dev/sda /mnt/test
dd if=/dev/zero of=/mnt/test/2G_file bs=1024k count=2000
echo 3 > /proc/sys/vm/drop_caches
java mmap_load_test /mnt/test/2G_file
Without a fix, like the patch Ming provided, iostat will show rareq-sz
is 64 rather than ~1024.
> > @@ -972,6 +974,7 @@ struct file_ra_state {
> > unsigned int ra_pages;
> > unsigned int mmap_miss;
> > loff_t prev_pos;
> > + struct maple_tree *need_mt;
>
> No. Embed the struct maple tree. Don't allocate it.
Constructive feedback, thanks.
> What made you think this was the right approach?
But then you closed with an attack, rather than inform Ming and/or
others why you feel so strongly, e.g.: Best to keep memory used for
file_ra_state contiguous.
next prev parent reply other threads:[~2024-01-28 23:12 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-01-28 14:25 [RFC PATCH] mm/readahead: readahead aggressively if read drops in willneed range Ming Lei
2024-01-28 22:02 ` Matthew Wilcox
2024-01-28 23:12 ` Mike Snitzer [this message]
2024-01-29 0:21 ` Matthew Wilcox
2024-01-29 0:39 ` Mike Snitzer
2024-01-29 1:47 ` Dave Chinner
2024-01-29 2:12 ` Mike Snitzer
2024-01-29 4:56 ` Dave Chinner
2024-01-29 3:57 ` Ming Lei
2024-01-29 5:15 ` Dave Chinner
2024-01-29 8:25 ` Ming Lei
2024-01-29 13:26 ` Matthew Wilcox
2024-01-29 22:07 ` Dave Chinner
2024-01-30 3:13 ` Ming Lei
2024-01-30 5:29 ` Dave Chinner
2024-01-30 11:34 ` Ming Lei
2024-01-29 3:20 ` Ming Lei
2024-01-29 3:00 ` Ming Lei
2024-01-29 17:19 ` Mike Snitzer
2024-01-29 17:42 ` Mike Snitzer
2024-01-29 22:12 ` Dave Chinner
2024-01-29 22:46 ` Mike Snitzer
2024-01-30 10:43 ` David Hildenbrand
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZbbfXVg9FpWRUVDn@redhat.com \
--to=snitzer@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=brauner@kernel.org \
--cc=ddutile@redhat.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=ming.lei@redhat.com \
--cc=raghavendra.kt@linux.vnet.ibm.com \
--cc=viro@zeniv.linux.org.uk \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.