From: "Darrick J. Wong" <djwong@kernel.org>
To: "Pankaj Raghav (Samsung)" <kernel@pankajraghav.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>,
Matthew Wilcox <willy@infradead.org>,
david@fromorbit.com, chandan.babu@oracle.com, brauner@kernel.org,
akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
yang@os.amperecomputing.com, linux-mm@kvack.org,
john.g.garry@oracle.com, linux-fsdevel@vger.kernel.org,
hare@suse.de, p.raghav@samsung.com, mcgrof@kernel.org,
gost.dev@samsung.com, cl@os.amperecomputing.com,
linux-xfs@vger.kernel.org, hch@lst.de, Zi Yan <ziy@nvidia.com>
Subject: Re: [PATCH v10 01/10] fs: Allow fine-grained control of folio sizes
Date: Wed, 17 Jul 2024 08:25:22 -0700
Message-ID: <20240717152522.GF612460@frogsfrogsfrogs>
In-Reply-To: <20240717151251.x7vkwajb57pefs6m@quentin>

On Wed, Jul 17, 2024 at 03:12:51PM +0000, Pankaj Raghav (Samsung) wrote:
> > >>
> > >> This is really too much. It's something that will never happen. Just
> > >> delete the message.
> > >>
> > >>> + if (max > MAX_PAGECACHE_ORDER) {
> > >>> + VM_WARN_ONCE(1,
> > >>> + "max order > MAX_PAGECACHE_ORDER. Setting max_order to MAX_PAGECACHE_ORDER");
> > >>> + max = MAX_PAGECACHE_ORDER;
> > >>
> > >> Absolutely not. If the filesystem declares it can support a block size
> > >> of 4TB, then good for it. We just silently clamp it.
> > >
> > > Hmm, but you raised the point about clamping in the previous patches[1]
> > > after Ryan pointed out that we should not silently clamp the order.
> > >
> > > ```
> > >> It seems strange to silently clamp these? Presumably for the bs>ps usecase,
> > >> whatever values are passed in are a hard requirement? So wouldn't want them to
> > >> be silently reduced. (Especially given the recent change to reduce the size of
> > >> MAX_PAGECACHE_ORDER to less than PMD size in some cases).
> > >
> > > Hm, yes. We should probably make this return an errno. Including
> > > returning an errno for !IS_ENABLED() and min > 0.
> > > ```
> > >
> > > It was not clear from the conversation in the previous patches that we
> > > decided to just clamp the order (like it was done before).
> > >
> > > So let's just stick with how it was done before where we clamp the
> > > values if min and max > MAX_PAGECACHE_ORDER?
> > >
> > > [1] https://lore.kernel.org/linux-fsdevel/Zoa9rQbEUam467-q@casper.infradead.org/
> >
> > The way I see it, there are 2 approaches we could take:
> >
> > 1. Implement mapping_max_folio_size_supported(), write a headerdoc for
> > mapping_set_folio_order_range() that says min must be lte max, max must be lte
> > mapping_max_folio_size_supported(). Then emit VM_WARN() in
> > mapping_set_folio_order_range() if the constraints are violated, and clamp to
> > make it safe (from page cache's perspective). The VM_WARN()s can just be inline
>
> Inlining with the `if` is not possible since:
> 91241681c62a ("include/linux/mmdebug.h: make VM_WARN* non-rvals")
>
> > in the if statements to keep them clean. The FS is responsible for checking
> > mapping_max_folio_size_supported() and ensuring min and max meet requirements.
>
> This is sort of what is done here but IIUC willy's reply to the patch,
> he prefers silent clamping over having WARNINGS. I think because we check
> the constraints during the mount time, so it should be safe to call
> this I guess?

That's my read of the situation, but I'll ask about it at the next THP
meeting if that helps.
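
To make the "silent clamp" reading concrete, here is a minimal userspace
sketch. It is not the kernel implementation: struct sketch_mapping,
sketch_set_folio_order_range() and SKETCH_MAX_PAGECACHE_ORDER are
hypothetical stand-ins, the limit of 8 is an arbitrary placeholder for
whatever MAX_PAGECACHE_ORDER is on a given config, and raising max up to
min when the caller passes min > max is an assumption of this sketch.

```c
/*
 * Userspace model of silent clamping; all names are hypothetical
 * stand-ins, not kernel symbols.
 */

/* Assumption: placeholder for the kernel's MAX_PAGECACHE_ORDER. */
#define SKETCH_MAX_PAGECACHE_ORDER 8u

struct sketch_mapping {
	unsigned int min_order;	/* smallest folio order the fs accepts */
	unsigned int max_order;	/* largest folio order the fs accepts */
};

/*
 * Clamp both bounds to what the page cache can handle, without warning
 * and without returning an error. The caller is expected to have
 * validated its requirements earlier, e.g. at mount time.
 */
static void sketch_set_folio_order_range(struct sketch_mapping *m,
					 unsigned int min, unsigned int max)
{
	if (min > SKETCH_MAX_PAGECACHE_ORDER)
		min = SKETCH_MAX_PAGECACHE_ORDER;
	if (max > SKETCH_MAX_PAGECACHE_ORDER)
		max = SKETCH_MAX_PAGECACHE_ORDER;
	/* Assumption of this sketch: keep the range well-formed. */
	if (max < min)
		max = min;

	m->min_order = min;
	m->max_order = max;
}
```

With these placeholder values, a request for orders 2..20 is stored as
2..8, and the caller gets no indication that the upper bound was lowered.
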
> >
> > 2. Return an error from mapping_set_folio_order_range() (and the other functions
> > that set min/max). No need for warning. No state changed if error is returned.
> > FS can emit warning on error if it wants.
>
> I think Chinner was not happy with this approach because this is done
> per inode and basically we would just shutdown the filesystem in the
> first inode allocation instead of refusing the mount as we know about
> the MAX_PAGECACHE_ORDER even during the mount phase anyway.

I agree. Filesystem-wide properties (e.g. fs blocksize) should cause
the mount to fail if the pagecache cannot possibly handle any file
blocks. Inode-specific properties (e.g. the forcealign+notears write
work that John Garry is doing) could error out of open() with -EIO, but
that's a specialty file property.
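
A rough userspace sketch of that split, under stated assumptions: the
helper names, the 4k page size (SKETCH_PAGE_SHIFT of 12), the limit of 8
and the choice of -EINVAL for the mount failure are all hypothetical;
only the -EIO from open() comes from the paragraph above.

```c
#include <errno.h>

/* Assumptions: 4k pages and a placeholder page cache order limit. */
#define SKETCH_PAGE_SHIFT		12u
#define SKETCH_MAX_PAGECACHE_ORDER	8u

/* Order of the smallest folio that covers one fs block of 2^blocklog bytes. */
static unsigned int sketch_min_order_for_blocklog(unsigned int blocklog)
{
	return blocklog > SKETCH_PAGE_SHIFT ? blocklog - SKETCH_PAGE_SHIFT : 0;
}

/*
 * Filesystem-wide property: checked once at mount time, so an
 * unsupportable block size refuses the mount instead of shutting the
 * filesystem down at the first inode allocation.
 */
static int sketch_check_mount(unsigned int blocklog)
{
	if (sketch_min_order_for_blocklog(blocklog) > SKETCH_MAX_PAGECACHE_ORDER)
		return -EINVAL;	/* assumption: errno chosen for illustration */
	return 0;
}

/* Inode-specific property: only this file becomes unusable. */
static int sketch_check_open(unsigned int inode_min_order)
{
	if (inode_min_order > SKETCH_MAX_PAGECACHE_ORDER)
		return -EIO;
	return 0;
}
```

Under these placeholder numbers a 64k block size (blocklog 16) needs
order 4 and mounts, while a 4TB block size (blocklog 42) needs order 30
and is refused at mount time.
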
--D
> --
> Pankaj
>