Re: [PATCH v3 6/8] filemap: Allow __filemap_get_folio to allocate large folios

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Matthew Wilcox <willy@infradead.org>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-fsdevel@vger.kernel.org, linux-xfs@vger.kernel.org,
	Wang Yugui <wangyugui@e16-tech.com>,
	Christoph Hellwig <hch@infradead.org>,
	"Darrick J . Wong" <djwong@kernel.org>
Subject: Re: [PATCH v3 6/8] filemap: Allow __filemap_get_folio to allocate large folios
Date: Tue, 13 Jun 2023 03:00:14 +0100	[thread overview]
Message-ID: <ZIfNrnUsJbcWGSD8@casper.infradead.org> (raw)
In-Reply-To: <ZIfGpWYNA1yd5K/l@dread.disaster.area>

On Tue, Jun 13, 2023 at 11:30:13AM +1000, Dave Chinner wrote:
> On Tue, Jun 13, 2023 at 01:42:51AM +0100, Matthew Wilcox wrote:
> > On Tue, Jun 13, 2023 at 08:49:05AM +1000, Dave Chinner wrote:
> > > On Mon, Jun 12, 2023 at 09:39:08PM +0100, Matthew Wilcox (Oracle) wrote:
> > > > Allow callers of __filemap_get_folio() to specify a preferred folio
> > > > order in the FGP flags.  This is only honoured in the FGP_CREATE path;
> > > > if there is already a folio in the page cache that covers the index,
> > > > we will return it, no matter what its order is.  No create-around is
> > > > attempted; we will only create folios which start at the specified index.
> > > > Unmodified callers will continue to allocate order 0 folios.
> > > .....
> > > > -		/* Init accessed so avoid atomic mark_page_accessed later */
> > > > -		if (fgp_flags & FGP_ACCESSED)
> > > > -			__folio_set_referenced(folio);
> > > > +		if (!mapping_large_folio_support(mapping))
> > > > +			order = 0;
> > > > +		if (order > MAX_PAGECACHE_ORDER)
> > > > +			order = MAX_PAGECACHE_ORDER;
> > > > +		/* If we're not aligned, allocate a smaller folio */
> > > > +		if (index & ((1UL << order) - 1))
> > > > +			order = __ffs(index);
> > > 
> > > If I read this right, if we pass in an unaligned index, we won't get
> > > the size of the folio we ask for?
> > 
> > Right.  That's implied by (but perhaps not obvious from) the changelog.
> > Folios are always naturally aligned in the file, so an order-4 folio
> > has to start at a multiple of 16.  If the index you pass in is not
> > a multiple of 16, we can't create an order-4 folio without starting
> > at an earlier index.
> > 
> > For a 4kB block size filesystem, that's what we want.  Applications
> > _generally_ don't write backwards, so creating an order-4 folio is just
> > wasting memory.
> > 
> > > e.g. if we want an order-4 folio (64kB) because we have a 64kB block
> > > size in the filesystem, then we have to pass in an index that
> > > order-4 aligned, yes?
> > > 
> > > I ask this, because the later iomap code that asks for large folios
> > > only passes in "pos >> PAGE_SHIFT" so it looks to me like it won't
> > > allocate large folios for anything other than large folio aligned
> > > writes, even if we need them.
> > > 
> > > What am I missing?
> > 
> > Perhaps what you're missing is that this isn't trying to solve the
> > problem of supporting a bs > ps filesystem?
> 
> No, that's not what I'm asking about. I know there's other changes
> needed to enforce minimum folio size/alignment for bs > ps.

OK.  Bringing up the 64kB block size filesystem confused me.

> What I'm asking about is when someone does a 16kB write at offset
> 12kB, they won't get a large folio allocated at all, right? Even
> though the write is large enough to enable it?

Right.

> Indeed, if we do a 1MB write at offset 4KB, we'll get 4kB at 4KB, 8KB
> and 12kB (because we can't do order-1 folios), then order-2 at 16KB,
> order-3 at 32kB, and so on until we hit offset 1MB where we will do
> an order-0 folio allocation again (because the remaining length is
> 4KB). The next 1MB write will then follow the same pattern, right?

Yes.  Assuming we get another write ...

> I think this ends up being sub-optimal and fairly non-obvious
> non-obvious behaviour from the iomap side of the fence which is
> clearly asking for high-order folios to be allocated. i.e. a small
> amount of allocate-around to naturally align large folios when the
> page cache is otherwise empty would make a big difference to the
> efficiency of non-large-folio-aligned sequential writes...

At this point we're arguing about what I/O pattern to optimise for.
I'm going for a "do no harm" approach where we only allocate exactly as
much memory as we did before.  You're advocating for a
higher-risk/higher-reward approach.

I'd prefer the low-risk approach for now; we can change it later!
I'd like to see some amount of per-fd write history (as we have per-fd
readahead history) to decide whether to allocate large folios ahead of
the current write position.  As with readahead, I'd like to see that even
doing single-byte writes can result in the allocation of large folios,
as long as the app has done enough of them.

next prev parent reply	other threads:[~2023-06-13  2:00 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-12 20:39 [PATCH v3 0/8] Create large folios in iomap buffered write path Matthew Wilcox (Oracle)
2023-06-12 20:39 ` [PATCH v3 1/8] iov_iter: Handle compound highmem pages in copy_page_from_iter_atomic() Matthew Wilcox (Oracle)
2023-06-13  4:52   ` Christoph Hellwig
2023-07-10  3:36     ` Matthew Wilcox
2023-06-12 20:39 ` [PATCH v3 2/8] iomap: Remove large folio handling in iomap_invalidate_folio() Matthew Wilcox (Oracle)
2023-06-12 20:39 ` [PATCH v3 3/8] doc: Correct the description of ->release_folio Matthew Wilcox (Oracle)
2023-06-13  4:53   ` Christoph Hellwig
2023-06-12 20:39 ` [PATCH v3 4/8] iomap: Remove unnecessary test from iomap_release_folio() Matthew Wilcox (Oracle)
2023-06-13  4:53   ` Christoph Hellwig
2023-06-13 16:19   ` Matthew Wilcox
2023-06-12 20:39 ` [PATCH v3 5/8] filemap: Add fgf_t typedef Matthew Wilcox (Oracle)
2023-06-13  4:53   ` Christoph Hellwig
2023-06-12 20:39 ` [PATCH v3 6/8] filemap: Allow __filemap_get_folio to allocate large folios Matthew Wilcox (Oracle)
2023-06-12 22:49   ` Dave Chinner
2023-06-13  0:42     ` Matthew Wilcox
2023-06-13  1:30       ` Dave Chinner
2023-06-13  2:00         ` Matthew Wilcox [this message]
2023-06-13  7:54           ` Dave Chinner
2023-06-13 13:34             ` Matthew Wilcox
2023-06-16 17:45             ` Matthew Wilcox
2023-06-16 22:40               ` Dave Chinner
2023-06-13  4:56   ` Christoph Hellwig
2023-06-12 20:39 ` [PATCH v3 7/8] iomap: Create large folios in the buffered write path Matthew Wilcox (Oracle)
2023-06-13  4:56   ` Christoph Hellwig
2023-06-12 20:39 ` [PATCH v3 8/8] iomap: Copy larger chunks from userspace Matthew Wilcox (Oracle)
2023-06-13  4:58   ` Christoph Hellwig
2023-06-13 19:43     ` Matthew Wilcox
2023-07-10  3:45       ` Matthew Wilcox
2023-06-17  7:13   ` Ritesh Harjani
2023-06-19 17:09     ` Matthew Wilcox
2023-07-10  3:57       ` Matthew Wilcox
2023-06-21 12:03 ` [PATCH v3 0/8] Create large folios in iomap buffered write path Wang Yugui
2023-07-10  3:55   ` Matthew Wilcox

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZIfNrnUsJbcWGSD8@casper.infradead.org \
    --to=willy@infradead.org \
    --cc=david@fromorbit.com \
    --cc=djwong@kernel.org \
    --cc=hch@infradead.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=wangyugui@e16-tech.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.