Re: [RFC v2 02/14] filemap: align the index to mapping_min_order in the page cache

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Dave Chinner <david@fromorbit.com>
To: "Pankaj Raghav (Samsung)" <kernel@pankajraghav.com>
Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	mcgrof@kernel.org, gost.dev@samsung.com,
	akpm@linux-foundation.org, kbusch@kernel.org, djwong@kernel.org,
	chandan.babu@oracle.com, p.raghav@samsung.com,
	linux-kernel@vger.kernel.org, hare@suse.de, willy@infradead.org,
	linux-mm@kvack.org
Subject: Re: [RFC v2 02/14] filemap: align the index to mapping_min_order in the page cache
Date: Wed, 14 Feb 2024 09:00:46 +1100	[thread overview]
Message-ID: <ZcvmjthSu0TNkf8z@dread.disaster.area> (raw)
In-Reply-To: <20240213093713.1753368-3-kernel@pankajraghav.com>

On Tue, Feb 13, 2024 at 10:37:01AM +0100, Pankaj Raghav (Samsung) wrote:
> From: Luis Chamberlain <mcgrof@kernel.org>
> 
> Supporting mapping_min_order implies that we guarantee each folio in the
> page cache has at least an order of mapping_min_order. So when adding new
> folios to the page cache we must ensure the index used is aligned to the
> mapping_min_order as the page cache requires the index to be aligned to
> the order of the folio.
> 
> A higher order folio than min_order by definition is a multiple of the
> min_order. If an index is aligned to an order higher than a min_order, it
> will also be aligned to the min order.
> 
> This effectively introduces no new functional changes when min order is
> not set other than a few rounding computations that should result in the
> same value.
> 
> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
> ---
>  mm/filemap.c | 34 ++++++++++++++++++++++++++--------
>  1 file changed, 26 insertions(+), 8 deletions(-)
> 
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 750e779c23db..323a8e169581 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2479,14 +2479,16 @@ static int filemap_get_pages(struct kiocb *iocb, size_t count,
>  {
>  	struct file *filp = iocb->ki_filp;
>  	struct address_space *mapping = filp->f_mapping;
> +	unsigned int min_nrpages = mapping_min_folio_nrpages(mapping);
>  	struct file_ra_state *ra = &filp->f_ra;
> -	pgoff_t index = iocb->ki_pos >> PAGE_SHIFT;
> +	pgoff_t index = round_down(iocb->ki_pos >> PAGE_SHIFT, min_nrpages);

This is pretty magic - this patch adds a bunch of what appear to be
random rounding operations for some undocumented reason.

>  	pgoff_t last_index;
>  	struct folio *folio;
>  	int err = 0;
>  
>  	/* "last_index" is the index of the page beyond the end of the read */
>  	last_index = DIV_ROUND_UP(iocb->ki_pos + count, PAGE_SIZE);
> +	last_index = round_up(last_index, min_nrpages);

Same here - this is pretty nasty - we round up twice, but no obvious
reason as to why the second round up exists or why it can't be done
by the DIV_ROUND_UP() call. Just looking at the code it's impossible
to reason why this is being done, let alone determine if it has been
implemented correctly.

>  retry:
>  	if (fatal_signal_pending(current))
>  		return -EINTR;
> @@ -2502,8 +2504,7 @@ static int filemap_get_pages(struct kiocb *iocb, size_t count,
>  	if (!folio_batch_count(fbatch)) {
>  		if (iocb->ki_flags & (IOCB_NOWAIT | IOCB_WAITQ))
>  			return -EAGAIN;
> -		err = filemap_create_folio(filp, mapping,
> -				iocb->ki_pos >> PAGE_SHIFT, fbatch);
> +		err = filemap_create_folio(filp, mapping, index, fbatch);
>  		if (err == AOP_TRUNCATED_PAGE)
>  			goto retry;
>  		return err;
> @@ -3095,7 +3096,10 @@ static struct file *do_sync_mmap_readahead(struct vm_fault *vmf)
>  	struct file *file = vmf->vma->vm_file;
>  	struct file_ra_state *ra = &file->f_ra;
>  	struct address_space *mapping = file->f_mapping;
> -	DEFINE_READAHEAD(ractl, file, ra, mapping, vmf->pgoff);
> +	unsigned int min_order = mapping_min_folio_order(mapping);
> +	unsigned int min_nrpages = mapping_min_folio_nrpages(file->f_mapping);

Why use file->f_mapping here and not mapping? And why not just

	unsigned int min_nrpages = 1U < min_order;

So it's obvious how the index alignment is related to the folio
order?

> +	pgoff_t index = round_down(vmf->pgoff, min_nrpages);
> +	DEFINE_READAHEAD(ractl, file, ra, mapping, index);
>  	struct file *fpin = NULL;
>  	unsigned long vm_flags = vmf->vma->vm_flags;
>  	unsigned int mmap_miss;
> @@ -3147,10 +3151,11 @@ static struct file *do_sync_mmap_readahead(struct vm_fault *vmf)
>  	 */
>  	fpin = maybe_unlock_mmap_for_io(vmf, fpin);
>  	ra->start = max_t(long, 0, vmf->pgoff - ra->ra_pages / 2);
> +	ra->start = round_down(ra->start, min_nrpages);

Again, another random rounding operation....

>  	ra->size = ra->ra_pages;
>  	ra->async_size = ra->ra_pages / 4;
>  	ractl._index = ra->start;
> -	page_cache_ra_order(&ractl, ra, 0);
> +	page_cache_ra_order(&ractl, ra, min_order);
>  	return fpin;
>  }
>  
> @@ -3164,7 +3169,9 @@ static struct file *do_async_mmap_readahead(struct vm_fault *vmf,
>  {
>  	struct file *file = vmf->vma->vm_file;
>  	struct file_ra_state *ra = &file->f_ra;
> -	DEFINE_READAHEAD(ractl, file, ra, file->f_mapping, vmf->pgoff);
> +	unsigned int min_nrpages = mapping_min_folio_nrpages(file->f_mapping);
> +	pgoff_t index = round_down(vmf->pgoff, min_nrpages);
> +	DEFINE_READAHEAD(ractl, file, ra, file->f_mapping, index);

Ok, this is begging for a new DEFINE_READAHEAD_ALIGNED() macro which
internally grabs the mapping_min_folio_nrpages() from the mapping
passed to the macro.

>  	struct file *fpin = NULL;
>  	unsigned int mmap_miss;
>  
> @@ -3212,13 +3219,17 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
>  	struct file *file = vmf->vma->vm_file;
>  	struct file *fpin = NULL;
>  	struct address_space *mapping = file->f_mapping;
> +	unsigned int min_order = mapping_min_folio_order(mapping);
> +	unsigned int nrpages = 1UL << min_order;

You didn't use mapping_min_folio_nrpages() for this.

At least initialise all the variables the same way in the same
patch!

>  	struct inode *inode = mapping->host;
> -	pgoff_t max_idx, index = vmf->pgoff;
> +	pgoff_t max_idx, index = round_down(vmf->pgoff, nrpages);

Yup, I can't help but think that with how many times this is being
repeated in this patchset that a helper or two is in order:

	index = mapping_align_start_index(mapping, vmf->pgoff);

And then most of the calls to mapping_min_folio_order() and
mapping_min_folio_nrpages() can be removed from this code, too.


>  	struct folio *folio;
>  	vm_fault_t ret = 0;
>  	bool mapping_locked = false;
>  
>  	max_idx = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);
> +	max_idx = round_up(max_idx, nrpages);

	max_index = mapping_align_end_index(mapping,
			DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE));

>  	if (unlikely(index >= max_idx))
>  		return VM_FAULT_SIGBUS;
>  
> @@ -3317,13 +3328,17 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
>  	 * We must recheck i_size under page lock.
>  	 */
>  	max_idx = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);
> +	max_idx = round_up(max_idx, nrpages);

Same again:

	max_index = mapping_align_end_index(mapping,
			DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE));
> +
>  	if (unlikely(index >= max_idx)) {
>  		folio_unlock(folio);
>  		folio_put(folio);
>  		return VM_FAULT_SIGBUS;
>  	}
>  
> -	vmf->page = folio_file_page(folio, index);
> +	VM_BUG_ON_FOLIO(folio_order(folio) < min_order, folio);
> +
> +	vmf->page = folio_file_page(folio, vmf->pgoff);
>  	return ret | VM_FAULT_LOCKED;
>  
>  page_not_uptodate:
> @@ -3658,6 +3673,9 @@ static struct folio *do_read_cache_folio(struct address_space *mapping,
>  {
>  	struct folio *folio;
>  	int err;
> +	unsigned int min_nrpages = mapping_min_folio_nrpages(mapping);
> +
> +	index = round_down(index, min_nrpages);

And more magic rounding.

	index = mapping_align_start_index(mapping, index);

-Dave.
-- 
Dave Chinner
david@fromorbit.com

next prev parent reply	other threads:[~2024-02-13 22:00 UTC|newest]

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-13  9:36 [RFC v2 00/14] enable bs > ps in XFS Pankaj Raghav (Samsung)
2024-02-13  9:37 ` [RFC v2 01/14] fs: Allow fine-grained control of folio sizes Pankaj Raghav (Samsung)
2024-02-13 12:03   ` Hannes Reinecke
2024-02-13 16:34   ` Darrick J. Wong
2024-02-13 21:05     ` Pankaj Raghav (Samsung)
2024-02-13 21:29       ` Darrick J. Wong
2024-02-14 19:00         ` Matthew Wilcox
2024-02-15 10:34           ` Pankaj Raghav (Samsung)
2024-02-14 18:49   ` Matthew Wilcox
2024-02-15 10:21     ` Pankaj Raghav (Samsung)
2024-02-13  9:37 ` [RFC v2 02/14] filemap: align the index to mapping_min_order in the page cache Pankaj Raghav (Samsung)
2024-02-13 12:20   ` Hannes Reinecke
2024-02-13 21:13     ` Pankaj Raghav (Samsung)
2024-02-13 22:00   ` Dave Chinner [this message]
2024-02-13  9:37 ` [RFC v2 03/14] filemap: use mapping_min_order while allocating folios Pankaj Raghav (Samsung)
2024-02-13 14:58   ` Hannes Reinecke
2024-02-13 16:38   ` Darrick J. Wong
2024-02-13 22:05   ` Dave Chinner
2024-02-14 10:13     ` Pankaj Raghav (Samsung)
2024-02-13  9:37 ` [RFC v2 04/14] readahead: set file_ra_state->ra_pages to be at least mapping_min_order Pankaj Raghav (Samsung)
2024-02-13 14:59   ` Hannes Reinecke
2024-02-13 16:46   ` Darrick J. Wong
2024-02-13 22:09   ` Dave Chinner
2024-02-14 13:32     ` Pankaj Raghav (Samsung)
2024-02-14 13:53       ` Pankaj Raghav (Samsung)
2024-02-13  9:37 ` [RFC v2 05/14] readahead: align index to mapping_min_order in ondemand_ra and force_ra Pankaj Raghav (Samsung)
2024-02-13 15:00   ` Hannes Reinecke
2024-02-13 16:46   ` Darrick J. Wong
2024-02-13 22:29   ` Dave Chinner
2024-02-14 15:10     ` Pankaj Raghav (Samsung)
2024-02-13  9:37 ` [RFC v2 06/14] readahead: rework loop in page_cache_ra_unbounded() Pankaj Raghav (Samsung)
2024-02-13 16:47   ` Darrick J. Wong
2024-02-13  9:37 ` [RFC v2 07/14] readahead: allocate folios with mapping_min_order in ra_(unbounded|order) Pankaj Raghav (Samsung)
2024-02-13 15:01   ` Hannes Reinecke
2024-02-13 16:47   ` Darrick J. Wong
2024-02-13  9:37 ` [RFC v2 08/14] mm: do not split a folio if it has minimum folio order requirement Pankaj Raghav (Samsung)
2024-02-13 15:02   ` Hannes Reinecke
2024-02-13  9:37 ` [RFC v2 09/14] mm: Support order-1 folios in the page cache Pankaj Raghav (Samsung)
2024-02-13 15:03   ` Hannes Reinecke
2024-02-13  9:37 ` [RFC v2 10/14] iomap: fix iomap_dio_zero() for fs bs > system page size Pankaj Raghav (Samsung)
2024-02-13 15:06   ` Hannes Reinecke
2024-02-13 16:30   ` Darrick J. Wong
2024-02-13 21:27     ` Pankaj Raghav (Samsung)
2024-02-13 21:30       ` Darrick J. Wong
2024-02-14 15:13         ` Pankaj Raghav (Samsung)
2024-02-13  9:37 ` [RFC v2 11/14] xfs: expose block size in stat Pankaj Raghav (Samsung)
2024-02-13 16:27   ` Darrick J. Wong
2024-02-13 21:32     ` Pankaj Raghav (Samsung)
2024-02-13  9:37 ` [RFC v2 12/14] xfs: make the calculation generic in xfs_sb_validate_fsb_count() Pankaj Raghav (Samsung)
2024-02-13 16:26   ` Darrick J. Wong
2024-02-13 21:48     ` Pankaj Raghav (Samsung)
2024-02-13 22:44       ` Dave Chinner
2024-02-14 15:51         ` Pankaj Raghav (Samsung)
2024-02-13  9:37 ` [RFC v2 13/14] xfs: add an experimental CONFIG_XFS_LBS option Pankaj Raghav (Samsung)
2024-02-13 16:39   ` Darrick J. Wong
2024-02-13 21:19   ` Dave Chinner
2024-02-13 21:54     ` Pankaj Raghav (Samsung)
2024-02-13 22:45       ` Dave Chinner
2024-02-13  9:37 ` [RFC v2 14/14] xfs: enable block size larger than page size support Pankaj Raghav (Samsung)
2024-02-13 16:20   ` Darrick J. Wong
2024-02-14 16:40     ` Pankaj Raghav (Samsung)
2024-02-13 21:34   ` Dave Chinner
2024-02-14 16:35     ` Pankaj Raghav (Samsung)
2024-02-15 22:17       ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZcvmjthSu0TNkf8z@dread.disaster.area \
    --to=david@fromorbit.com \
    --cc=akpm@linux-foundation.org \
    --cc=chandan.babu@oracle.com \
    --cc=djwong@kernel.org \
    --cc=gost.dev@samsung.com \
    --cc=hare@suse.de \
    --cc=kbusch@kernel.org \
    --cc=kernel@pankajraghav.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=mcgrof@kernel.org \
    --cc=p.raghav@samsung.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.