All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ming Lei <ming.lei@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Mike Snitzer <snitzer@kernel.org>,
	Matthew Wilcox <willy@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, Don Dutile <ddutile@redhat.com>,
	Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Christian Brauner <brauner@kernel.org>,
	linux-block@vger.kernel.org, ming.lei@redhat.com
Subject: Re: [RFC PATCH] mm/readahead: readahead aggressively if read drops in willneed range
Date: Mon, 29 Jan 2024 11:57:45 +0800	[thread overview]
Message-ID: <ZbciOba1h3V9mmup@fedora> (raw)
In-Reply-To: <ZbcDvTkeDKttPfJ4@dread.disaster.area>

On Mon, Jan 29, 2024 at 12:47:41PM +1100, Dave Chinner wrote:
> On Sun, Jan 28, 2024 at 07:39:49PM -0500, Mike Snitzer wrote:
> > On Sun, Jan 28, 2024 at 7:22 PM Matthew Wilcox <willy@infradead.org> wrote:
> > >
> > > On Sun, Jan 28, 2024 at 06:12:29PM -0500, Mike Snitzer wrote:
> > > > On Sun, Jan 28 2024 at  5:02P -0500,
> > > > Matthew Wilcox <willy@infradead.org> wrote:
> > > Understood.  But ... the application is asking for as much readahead as
> > > possible, and the sysadmin has said "Don't readahead more than 64kB at
> > > a time".  So why will we not get a bug report in 1-15 years time saying
> > > "I put a limit on readahead and the kernel is ignoring it"?  I think
> > > typically we allow the sysadmin to override application requests,
> > > don't we?
> > 
> > The application isn't knowingly asking for readahead.  It is asking to
> > mmap the file (and reporter wants it done as quickly as possible..
> > like occurred before).
> 
> ... which we do within the constraints of the given configuration.
> 
> > This fix is comparable to Jens' commit 9491ae4aade6 ("mm: don't cap
> > request size based on read-ahead setting") -- same logic, just applied
> > to callchain that ends up using madvise(MADV_WILLNEED).
> 
> Not really. There is a difference between performing a synchronous
> read IO here that we must complete, compared to optimistic
> asynchronous read-ahead which we can fail or toss away without the
> user ever seeing the data the IO returned.

Yeah, the big readahead in this patch happens when user starts to read
over mmaped buffer instead of madvise().

> 
> We want required IO to be done in as few, larger IOs as possible,
> and not be limited by constraints placed on background optimistic
> IOs.
> 
> madvise(WILLNEED) is optimistic IO - there is no requirement that it
> complete the data reads successfully. If the data is actually
> required, we'll guarantee completion when the user accesses it, not
> when madvise() is called.  IOWs, madvise is async readahead, and so
> really should be constrained by readahead bounds and not user IO
> bounds.
> 
> We could change this behaviour for madvise of large ranges that we
> force into the page cache by ignoring device readahead bounds, but
> I'm not sure we want to do this in general.
> 
> Perhaps fadvise/madvise(willneed) can fiddle the file f_ra.ra_pages
> value in this situation to override the device limit for large
> ranges (for some definition of large - say 10x bdi->ra_pages) and
> restore it once the readahead operation is done. This would make it
> behave less like readahead and more like a user read from an IO
> perspective...

->ra_pages is just one hint, which is 128KB at default, and either
device or userspace can override it.

fadvise/madvise(willneed) already readahead bytes from bdi->io_pages which
is the max device sector size(often 10X of ->ra_pages), please see
force_page_cache_ra().

Follows the current report:

1) usersapce call madvise(willneed, 1G)

2) only the 1st part(size is from bdi->io_pages, suppose it is 2MB) is
readahead in madvise(willneed, 1G) since commit 6d2be915e589

3) the other parts(2M ~ 1G) is readahead by unit of bdi->ra_pages which is
set as 64KB by userspace when userspace reads the mmaped buffer, then
the whole application becomes slower.

This patch changes 3) to use bdi->io_pages as readahead unit.


Thanks,
Ming


  parent reply	other threads:[~2024-01-29  3:58 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-28 14:25 [RFC PATCH] mm/readahead: readahead aggressively if read drops in willneed range Ming Lei
2024-01-28 22:02 ` Matthew Wilcox
2024-01-28 23:12   ` Mike Snitzer
2024-01-29  0:21     ` Matthew Wilcox
2024-01-29  0:39       ` Mike Snitzer
2024-01-29  1:47         ` Dave Chinner
2024-01-29  2:12           ` Mike Snitzer
2024-01-29  4:56             ` Dave Chinner
2024-01-29  3:57           ` Ming Lei [this message]
2024-01-29  5:15             ` Dave Chinner
2024-01-29  8:25               ` Ming Lei
2024-01-29 13:26                 ` Matthew Wilcox
2024-01-29 22:07                 ` Dave Chinner
2024-01-30  3:13                   ` Ming Lei
2024-01-30  5:29                     ` Dave Chinner
2024-01-30 11:34                       ` Ming Lei
2024-01-29  3:20       ` Ming Lei
2024-01-29  3:00   ` Ming Lei
2024-01-29 17:19 ` Mike Snitzer
2024-01-29 17:42   ` Mike Snitzer
2024-01-29 22:12   ` Dave Chinner
2024-01-29 22:46     ` Mike Snitzer
2024-01-30 10:43       ` David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZbciOba1h3V9mmup@fedora \
    --to=ming.lei@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=brauner@kernel.org \
    --cc=david@fromorbit.com \
    --cc=ddutile@redhat.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=raghavendra.kt@linux.vnet.ibm.com \
    --cc=snitzer@kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.