linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Ryan Roberts <ryan.roberts@arm.com>
To: Dave Chinner <david@fromorbit.com>
Cc: "Pankaj Raghav (Samsung)" <kernel@pankajraghav.com>,
	Matthew Wilcox <willy@infradead.org>,
	chandan.babu@oracle.com, djwong@kernel.org, brauner@kernel.org,
	akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
	yang@os.amperecomputing.com, linux-mm@kvack.org,
	john.g.garry@oracle.com, linux-fsdevel@vger.kernel.org,
	hare@suse.de, p.raghav@samsung.com, mcgrof@kernel.org,
	gost.dev@samsung.com, cl@os.amperecomputing.com,
	linux-xfs@vger.kernel.org, hch@lst.de, Zi Yan <zi.yan@sent.com>
Subject: Re: [PATCH v8 01/10] fs: Allow fine-grained control of folio sizes
Date: Tue, 9 Jul 2024 09:11:51 +0100	[thread overview]
Message-ID: <5875f5ea-4d83-4691-914c-15834338410e@arm.com> (raw)
In-Reply-To: <ZoxvzXA1wcGDlQS2@dread.disaster.area>

On 09/07/2024 00:01, Dave Chinner wrote:
> On Fri, Jul 05, 2024 at 02:31:08PM +0100, Ryan Roberts wrote:
>> On 05/07/2024 14:24, Pankaj Raghav (Samsung) wrote:
>>>>> I suggest you handle it better than this.  If the device is asking for a
>>>>> blocksize > PMD_SIZE, you should fail to mount it.
>>>>
>>>> That's my point: we already do that.
>>>>
>>>> The largest block size we support is 64kB and that's way smaller
>>>> than PMD_SIZE on all platforms and we always check for bs > ps 
>>>> support at mount time when the filesystem bs > ps.
>>>>
>>>> Hence we're never going to set the min value to anything unsupported
>>>> unless someone makes a massive programming mistake. At which point,
>>>> we want a *hard, immediate fail* so the developer notices their
>>>> mistake immediately. All filesystems and block devices need to
>>>> behave this way so the limits should be encoded as asserts in the
>>>> function to trigger such behaviour.
>>>
>>> I agree, this kind of bug will be encountered only during developement 
>>> and not during actual production due to the limit we have fs block size
>>> in XFS.
>>>
>>>>
>>>>> If the device is
>>>>> asking for a blocksize > PAGE_SIZE and CONFIG_TRANSPARENT_HUGEPAGE is
>>>>> not set, you should also decline to mount the filesystem.
>>>>
>>>> What does CONFIG_TRANSPARENT_HUGEPAGE have to do with filesystems
>>>> being able to use large folios?
>>>>
>>>> If that's an actual dependency of using large folios, then we're at
>>>> the point where the mm side of large folios needs to be divorced
>>>> from CONFIG_TRANSPARENT_HUGEPAGE and always supported.
>>>> Alternatively, CONFIG_TRANSPARENT_HUGEPAGE needs to selected by the
>>>> block layer and also every filesystem that wants to support
>>>> sector/blocks sizes larger than PAGE_SIZE.  IOWs, large folio
>>>> support needs to *always* be enabled on systems that say
>>>> CONFIG_BLOCK=y.
>>>
>>> Why CONFIG_BLOCK? I think it is enough if it comes from the FS side
>>> right? And for now, the only FS that needs that sort of bs > ps 
>>> guarantee is XFS with this series. Other filesystems such as bcachefs 
>>> that call mapping_set_large_folios() only enable it as an optimization
>>> and it is not needed for the filesystem to function.
>>>
>>> So this is my conclusion from the conversation:
>>> - Add a dependency in Kconfig on THP for XFS until we fix the dependency
>>>   of large folios on THP
>>
>> THP isn't supported on some arches, so isn't this effectively saying XFS can no
>> longer be used with those arches, even if the bs <= ps?
> 
> I'm good with that - we're already long past the point where we try
> to support XFS on every linux platform. Indeed, we've recent been
> musing about making XFS depend on 64 bit only - 32 bit systems don't
> have the memory capacity to run the full xfs tool chain (e.g.
> xfs_repair) on filesystems over about a TB in size, and they are
> greatly limited in kernel memory and vmap areas, both of which XFS
> makes heavy use of. Basically, friends don't let friends use XFS on
> 32 bit systems, and that's been true for about 20 years now.
> 
> Our problem is the test matrix - if we now have to explicitly test
> XFS both with and without large folios enabled to support these
> platforms, we've just doubled our test matrix. The test matrix is
> already far too large to robustly cover, so anything that requires
> doubling the number of kernel configs we have to test is, IMO, a
> non-starter.
> 
> That's why we really don't support XFS on 32 bit systems anymore and
> why we're talking about making that official with a config option.
> If we're at the point where XFS will now depend on large folios (i.e
> THP), then we need to seriously consider reducing the supported
> arches to just those that support both 64 bit and THP. If niche
> arches want to support THP, or enable large folios without the need
> for THP, then they can do that work and then they get XFS for
> free.
> 
> Just because an arch might run a Linux kernel, it doesn't mean we
> have to support XFS on it....

OK. I was just pointing out the impact of adding this Kconfig dependency. If
that impact is explicitly considered and desired, then great. I'll leave you to it.

Thanks,
Ryan

> 
> -Dave.


  reply	other threads:[~2024-07-09  8:11 UTC|newest]

Thread overview: 61+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-25 11:44 [PATCH v8 00/10] enable bs > ps in XFS Pankaj Raghav (Samsung)
2024-06-25 11:44 ` [PATCH v8 01/10] fs: Allow fine-grained control of folio sizes Pankaj Raghav (Samsung)
2024-07-04 12:23   ` Ryan Roberts
2024-07-04 15:20     ` Matthew Wilcox
2024-07-04 15:52       ` Ryan Roberts
2024-07-04 21:28       ` Pankaj Raghav (Samsung)
2024-07-04 22:06       ` Dave Chinner
2024-07-04 23:56         ` Matthew Wilcox
2024-07-05  4:32           ` Dave Chinner
2024-07-05  9:03             ` Ryan Roberts
2024-07-05 12:45               ` Pankaj Raghav (Samsung)
2024-07-05 13:24             ` Pankaj Raghav (Samsung)
2024-07-05 13:31               ` Ryan Roberts
2024-07-05 14:14                 ` Pankaj Raghav (Samsung)
2024-07-08 23:01                 ` Dave Chinner
2024-07-09  8:11                   ` Ryan Roberts [this message]
2024-07-09 13:08                   ` Pankaj Raghav (Samsung)
2024-07-05 15:14             ` Matthew Wilcox
2024-07-04 21:34     ` Pankaj Raghav (Samsung)
2024-07-09 16:29   ` Pankaj Raghav (Samsung)
2024-07-09 16:38     ` Matthew Wilcox
2024-07-09 17:33       ` Pankaj Raghav (Samsung)
2024-07-09 16:50     ` Darrick J. Wong
2024-07-09 21:08       ` Pankaj Raghav (Samsung)
2024-07-09 21:59         ` Darrick J. Wong
2024-06-25 11:44 ` [PATCH v8 02/10] filemap: allocate mapping_min_order folios in the page cache Pankaj Raghav (Samsung)
2024-06-25 15:52   ` Matthew Wilcox
2024-06-25 18:06     ` Pankaj Raghav (Samsung)
2024-06-25 11:44 ` [PATCH v8 03/10] readahead: allocate folios with mapping_min_order in readahead Pankaj Raghav (Samsung)
2024-07-02 19:38   ` Darrick J. Wong
2024-07-03 14:10     ` Pankaj Raghav (Samsung)
2024-07-04 14:24   ` Ryan Roberts
2024-07-04 14:29     ` Matthew Wilcox
2024-06-25 11:44 ` [PATCH v8 04/10] mm: split a folio in minimum folio order chunks Pankaj Raghav (Samsung)
2024-06-25 14:45   ` Zi Yan
2024-06-25 17:20     ` Pankaj Raghav (Samsung)
2024-06-25 11:44 ` [PATCH v8 05/10] filemap: cap PTE range to be created to allowed zero fill in folio_map_range() Pankaj Raghav (Samsung)
2024-07-01 23:39   ` Darrick J. Wong
2024-06-25 11:44 ` [PATCH v8 06/10] iomap: fix iomap_dio_zero() for fs bs > system page size Pankaj Raghav (Samsung)
2024-07-01  2:37   ` Dave Chinner
2024-07-01 11:22     ` Pankaj Raghav (Samsung)
2024-07-01 23:40   ` Darrick J. Wong
2024-07-02  7:42   ` Christoph Hellwig
2024-07-02 10:15     ` Pankaj Raghav (Samsung)
2024-07-02 12:02       ` Christoph Hellwig
2024-07-02 14:01         ` Pankaj Raghav (Samsung)
2024-07-02 15:42           ` Christoph Hellwig
2024-07-02 16:13             ` Pankaj Raghav (Samsung)
2024-07-02 16:51               ` Matthew Wilcox
2024-07-02 17:10                 ` Pankaj Raghav (Samsung)
2024-07-03  5:16                   ` Christoph Hellwig
2024-07-02 16:50         ` Matthew Wilcox
2024-07-02 13:49       ` Luis Chamberlain
2024-06-25 11:44 ` [PATCH v8 07/10] xfs: use kvmalloc for xattr buffers Pankaj Raghav (Samsung)
2024-06-25 18:07   ` Pankaj Raghav (Samsung)
2024-06-25 11:44 ` [PATCH v8 08/10] xfs: expose block size in stat Pankaj Raghav (Samsung)
2024-07-01  2:33   ` Dave Chinner
2024-06-25 11:44 ` [PATCH v8 09/10] xfs: make the calculation generic in xfs_sb_validate_fsb_count() Pankaj Raghav (Samsung)
2024-07-01  2:34   ` Dave Chinner
2024-06-25 11:44 ` [PATCH v8 10/10] xfs: enable block size larger than page size support Pankaj Raghav (Samsung)
2024-07-01  2:34   ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5875f5ea-4d83-4691-914c-15834338410e@arm.com \
    --to=ryan.roberts@arm.com \
    --cc=akpm@linux-foundation.org \
    --cc=brauner@kernel.org \
    --cc=chandan.babu@oracle.com \
    --cc=cl@os.amperecomputing.com \
    --cc=david@fromorbit.com \
    --cc=djwong@kernel.org \
    --cc=gost.dev@samsung.com \
    --cc=hare@suse.de \
    --cc=hch@lst.de \
    --cc=john.g.garry@oracle.com \
    --cc=kernel@pankajraghav.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=mcgrof@kernel.org \
    --cc=p.raghav@samsung.com \
    --cc=willy@infradead.org \
    --cc=yang@os.amperecomputing.com \
    --cc=zi.yan@sent.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).