Re: [PATCH v4 7/9] filemap: Allow __filemap_get_folio to allocate large folios

linux-fsdevel.vger.kernel.org archive mirror

From: Luis Chamberlain <mcgrof@kernel.org>
To: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: linux-fsdevel@vger.kernel.org, linux-xfs@vger.kernel.org,
	Wang Yugui <wangyugui@e16-tech.com>,
	Dave Chinner <david@fromorbit.com>,
	Christoph Hellwig <hch@infradead.org>,
	"Darrick J . Wong" <djwong@kernel.org>,
	Kent Overstreet <kent.overstreet@linux.dev>,
	Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH v4 7/9] filemap: Allow __filemap_get_folio to allocate large folios
Date: Mon, 10 Jul 2023 16:58:55 -0700
Message-ID: <ZKybP22DRs1w4G3a@bombadil.infradead.org>
In-Reply-To: <20230710130253.3484695-8-willy@infradead.org>

On Mon, Jul 10, 2023 at 02:02:51PM +0100, Matthew Wilcox (Oracle) wrote:
> Allow callers of __filemap_get_folio() to specify a preferred folio
> order in the FGP flags.  This is only honoured in the FGP_CREATE path;
> if there is already a folio in the page cache that covers the index,
> we will return it, no matter what its order is.  No create-around is
> attempted; we will only create folios which start at the specified index.
> Unmodified callers will continue to allocate order 0 folios.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
>  include/linux/pagemap.h | 34 ++++++++++++++++++++++++++++++
>  mm/filemap.c            | 46 +++++++++++++++++++++++++++++------------
>  mm/readahead.c          | 13 ------------
>  3 files changed, 67 insertions(+), 26 deletions(-)
> 
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index 911201fc41fc..d87840acbfb2 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -470,6 +470,19 @@ static inline void *detach_page_private(struct page *page)
>  	return folio_detach_private(page_folio(page));
>  }
>  
> +/*
> + * There are some parts of the kernel which assume that PMD entries
> + * are exactly HPAGE_PMD_ORDER.  Those should be fixed, but until then,
> + * limit the maximum allocation order to PMD size.  I'm not aware of any
> + * assumptions about maximum order if THP are disabled, but 8 seems like
> + * a good order (that's 1MB if you're using 4kB pages)
> + */
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +#define MAX_PAGECACHE_ORDER	HPAGE_PMD_ORDER
> +#else
> +#define MAX_PAGECACHE_ORDER	8
> +#endif
> +
>  #ifdef CONFIG_NUMA
>  struct folio *filemap_alloc_folio(gfp_t gfp, unsigned int order);
>  #else
> @@ -535,9 +548,30 @@ typedef unsigned int __bitwise fgf_t;
>  #define FGP_NOWAIT		((__force fgf_t)0x00000020)
>  #define FGP_FOR_MMAP		((__force fgf_t)0x00000040)
>  #define FGP_STABLE		((__force fgf_t)0x00000080)
> +#define FGF_GET_ORDER(fgf)	(((__force unsigned)fgf) >> 26)	/* top 6 bits */
>  
>  #define FGP_WRITEBEGIN		(FGP_LOCK | FGP_WRITE | FGP_CREAT | FGP_STABLE)
>  
> +/**
> + * fgf_set_order - Encode a length in the fgf_t flags.
> + * @size: The suggested size of the folio to create.
> + *
> + * The caller of __filemap_get_folio() can use this to suggest a preferred
> + * size for the folio that is created.  If there is already a folio at
> + * the index, it will be returned, no matter what its size.  If a folio
> + * is freshly created, it may be of a different size than requested
> + * due to alignment constraints, memory pressure, or the presence of
> + * other folios at nearby indices.
> + */
> +static inline fgf_t fgf_set_order(size_t size)
> +{
> +	unsigned int shift = ilog2(size);
> +
> +	if (shift <= PAGE_SHIFT)
> +		return 0;
> +	return (__force fgf_t)((shift - PAGE_SHIFT) << 26);
> +}
> +
>  void *filemap_get_entry(struct address_space *mapping, pgoff_t index);
>  struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
>  		fgf_t fgp_flags, gfp_t gfp);
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 8a669fecfd1c..baafbf324c9f 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -1905,7 +1905,9 @@ struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
>  		folio_wait_stable(folio);
>  no_page:
>  	if (!folio && (fgp_flags & FGP_CREAT)) {
> +		unsigned order = FGF_GET_ORDER(fgp_flags);
>  		int err;
> +
>  		if ((fgp_flags & FGP_WRITE) && mapping_can_writeback(mapping))
>  			gfp |= __GFP_WRITE;
>  		if (fgp_flags & FGP_NOFS)
> @@ -1914,26 +1916,44 @@ struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
>  			gfp &= ~GFP_KERNEL;
>  			gfp |= GFP_NOWAIT | __GFP_NOWARN;
>  		}
> -
> -		folio = filemap_alloc_folio(gfp, 0);
> -		if (!folio)
> -			return ERR_PTR(-ENOMEM);
> -
>  		if (WARN_ON_ONCE(!(fgp_flags & (FGP_LOCK | FGP_FOR_MMAP))))
>  			fgp_flags |= FGP_LOCK;
>  
> -		/* Init accessed so avoid atomic mark_page_accessed later */
> -		if (fgp_flags & FGP_ACCESSED)
> -			__folio_set_referenced(folio);
> +		if (!mapping_large_folio_support(mapping))
> +			order = 0;
> +		if (order > MAX_PAGECACHE_ORDER)
> +			order = MAX_PAGECACHE_ORDER;

Curious how this ended up being the heuristic: shoot for the
MAX_PAGECACHE_ORDER sky first, then halve the size on each retry. I don't
see it explained in the commit log, but I'm sure there has to be
some reasonable rationale. From the cover letter, I could guess that
always using the largest folio possible brings latency savings
elsewhere, so the small latency spent searching is nowhere near the
savings from using the large folio. But I'd rather understand the
rationale a bit better.

Are there situations where it would make sense to limit this initial
preferred folio order to something smaller than MAX_PAGECACHE_ORDER?
If not, how do we know?

  Luis

Thread overview: 41+ messages
2023-07-10 13:02 [PATCH v4 0/9] Create large folios in iomap buffered write path Matthew Wilcox (Oracle)
2023-07-10 13:02 ` [PATCH v4 1/9] iov_iter: Handle compound highmem pages in copy_page_from_iter_atomic() Matthew Wilcox (Oracle)
2023-07-10 23:43   ` Luis Chamberlain
2023-07-11  0:03     ` Matthew Wilcox
2023-07-11  6:30   ` Christoph Hellwig
2023-07-11 20:05   ` Kent Overstreet
2023-07-13  4:42   ` Darrick J. Wong
2023-07-13 13:46     ` Matthew Wilcox
2023-07-10 13:02 ` [PATCH v4 2/9] iov_iter: Add copy_folio_from_iter_atomic() Matthew Wilcox (Oracle)
2023-07-11  6:31   ` Christoph Hellwig
2023-07-11 20:46     ` Matthew Wilcox
2023-07-24 15:56   ` Darrick J. Wong
2023-07-10 13:02 ` [PATCH v4 3/9] iomap: Remove large folio handling in iomap_invalidate_folio() Matthew Wilcox (Oracle)
2023-07-10 13:02 ` [PATCH v4 4/9] doc: Correct the description of ->release_folio Matthew Wilcox (Oracle)
2023-07-13  4:43   ` Darrick J. Wong
2023-07-10 13:02 ` [PATCH v4 5/9] iomap: Remove unnecessary test from iomap_release_folio() Matthew Wilcox (Oracle)
2023-07-13  4:45   ` Darrick J. Wong
2023-07-13  5:25     ` Ritesh Harjani
2023-07-13  5:33       ` Darrick J. Wong
2023-07-13  5:51         ` Ritesh Harjani
2023-07-10 13:02 ` [PATCH v4 6/9] filemap: Add fgf_t typedef Matthew Wilcox (Oracle)
2023-07-13  4:47   ` Darrick J. Wong
2023-07-13  5:08   ` Kent Overstreet
2023-07-10 13:02 ` [PATCH v4 7/9] filemap: Allow __filemap_get_folio to allocate large folios Matthew Wilcox (Oracle)
2023-07-10 23:58   ` Luis Chamberlain [this message]
2023-07-11  0:07     ` Matthew Wilcox
2023-07-11  0:21       ` Luis Chamberlain
2023-07-11  0:42         ` Matthew Wilcox
2023-07-11  0:47         ` Dave Chinner
2023-07-11  0:13     ` Kent Overstreet
2023-07-13  4:50   ` Darrick J. Wong
2023-07-13  5:04   ` Kent Overstreet
2023-07-13 14:42     ` Matthew Wilcox
2023-07-13 15:19       ` Kent Overstreet
2023-07-10 13:02 ` [PATCH v4 8/9] iomap: Create large folios in the buffered write path Matthew Wilcox (Oracle)
2023-07-13  4:56   ` Darrick J. Wong
2023-07-10 13:02 ` [PATCH v4 9/9] iomap: Copy larger chunks from userspace Matthew Wilcox (Oracle)
2023-07-13  4:58   ` Darrick J. Wong
2023-07-10 22:55 ` [PATCH v4 0/9] Create large folios in iomap buffered write path Luis Chamberlain
2023-07-10 23:53   ` Matthew Wilcox
2023-07-11  0:01   ` Dave Chinner