Re: [RFC 00/23] Enable block size > page size in XFS

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Luis Chamberlain <mcgrof@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: Pankaj Raghav <p.raghav@samsung.com>,
	Pankaj Raghav <kernel@pankajraghav.com>,
	linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	da.gomez@samsung.com, akpm@linux-foundation.org,
	linux-kernel@vger.kernel.org, willy@infradead.org,
	djwong@kernel.org, linux-mm@kvack.org, chandan.babu@oracle.com,
	gost.dev@samsung.com, riteshh@linux.ibm.com
Subject: Re: [RFC 00/23] Enable block size > page size in XFS
Date: Thu, 21 Sep 2023 00:18:13 -0700	[thread overview]
Message-ID: <ZQvuNaYIukAnlEDM@bombadil.infradead.org> (raw)
In-Reply-To: <ZQvczBjY4vTLJFBp@dread.disaster.area>

On Thu, Sep 21, 2023 at 04:03:56PM +1000, Dave Chinner wrote:
> On Wed, Sep 20, 2023 at 09:57:56PM -0700, Luis Chamberlain wrote:
> > On Wed, Sep 20, 2023 at 08:00:12PM -0700, Luis Chamberlain wrote:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/log/?h=large-block-linus
> > > 
> > > I haven't tested yet the second branch I pushed though but it applied without any changes
> > > so it should be good (usual famous last words).
> > 
> > I have run some preliminary tests on that branch as well above using fsx
> > with larger LBA formats running them all on the *same* system at the
> > same time. Kernel is happy.

<-- snip -->

> So I just pulled this, built it and run generic/091 as the very
> first test on this:
> 
> # ./run_check.sh --mkfs-opts "-m rmapbt=1 -b size=64k" --run-opts "-s xfs_64k generic/091"

The cover letter for this patch series acknowledged failures in fstests.

For kdevops now, we borrow the same last linux-next baseline:

git grep "generic/091" workflows/fstests/expunges/6.6.0-rc2-large-block-linus-nobdev
workflows/fstests/expunges/6.6.0-rc2-large-block-linus-nobdev/xfs/unassigned/xfs_reflink_1024.txt:generic/091 # possible regression
workflows/fstests/expunges/6.6.0-rc2-large-block-linus-nobdev/xfs/unassigned/xfs_reflink_16k.txt:generic/091
workflows/fstests/expunges/6.6.0-rc2-large-block-linus-nobdev/xfs/unassigned/xfs_reflink_32k.txt:generic/091
workflows/fstests/expunges/6.6.0-rc2-large-block-linus-nobdev/xfs/unassigned/xfs_reflink_64k_4ks.txt:generic/091

So well, we already know this fails.

> For all these assertions about how none of your testing is finding
> bugs in this code, It's taken me *4 seconds* of test runtime to find
> the first failure.

Because you know what to look for and this is not yet perfect.

> And, well, it's the same failure as I reported for the previous
> version of this code:

And we haven't done *any* new changes to the patch series so no surprise
either.

> Guess what? The fsx parameters being used means it is testing things you
> aren't.

I actualy found quite a bit of issues with -W. And it was useful.

> Yes, the '-Z -R -W' mean it is using direct IO for reads and writes,
> mmap() is disabled. Other parameters indicate that using 4k aligned reads and
> 512 byte aligned writes and truncates.

Thanks! This will help for sure!.

> There is a reason there are multiple different fsx tests in fstests;

You made it clear, and I documented the goal to ensure we get to the
point we pass all those:

https://kernelnewbies.org/KernelProjects/large-block-size#fsx

> they all exercise different sets of IO behaviours and alignments,
> and they exercise the IO paths differently.
> 
> So there's clearly something wrong here - it's likely that the
> filesystem IO alignment parameters pulled from the underlying block
> device (4k physical, 512 byte logical sector sizes) are improperly
> interpreted.  i.e. for a filesystem with a sector size of 4kB,
> direct IO with an alignment of 512 bytes should be rejected......

So yes, this is not yet complete.

But now let's step back and I want you to realize where we started
and why we decided to post, in particular me, I was suggesting we
post now, instead of waiting for us to resolve *it all*.

When we first started this work we simply thought it was impossible.
Unless of course you are Matthew and you believed hard in your work.

The progress, which you don't see, is that steps towards fixing fsx
issues have been logarithmic. Days, weeks, months before decent
progress, but the progress was steady...

And so to get to where we are today only just shows, well this is
actually not impossible, and Matthew did the right thing with the
right data structure, and the changes to the page cache with multi
index array stuff, it seems to be able to also be used for LBS.

At this point, from a logarithmic perspective, we have huge progress,
and I don't think it will stop. It gives us confidence Matthew was
right and LBS is possible indeed with the multi-index stuff.

It's not about, can this crash. Yes, we know, it can crash. It's about
how many different ways, and how many fixes left. Because clearly the
multi-index stuff is working well. The code feedback so far on this
patch series has mostly been "I don't think this patch is needed" or
"perhaps this way is better", and that's the kind of feedback we're
looking for. Because *each* new patch adds a huge a milestone. And
it seems the progress has been logarithmic. It is exactly why this
series went out with a few patches which ... we felt safer with them
than without. For instance the batch delete.. I still am suspicious
about us not needing as Hannes' patches also seem to rely on similer
rounding on the wait stuff, and it seems to bring back memories
on issues found on permissions. But anyway, the point is that, this
is clearly not ready. But try to think of progress here as logarithmic,
and any *dent* we make on the page cache to fix the last corner cases
will be huge, not small.

If you want to try, you can see for yourself, what's the next fix? :)
And if found, was it logarithmic? How do we polish this? That's the
goal of this patch series.

  Luis

next prev parent reply	other threads:[~2023-09-21  7:18 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-15 18:38 [RFC 00/23] Enable block size > page size in XFS Pankaj Raghav
2023-09-15 18:38 ` [RFC 01/23] fs: Allow fine-grained control of folio sizes Pankaj Raghav
2023-09-15 19:03   ` Matthew Wilcox
2023-09-15 18:38 ` [RFC 02/23] pagemap: use mapping_min_order in fgf_set_order() Pankaj Raghav
2023-09-15 18:55   ` Matthew Wilcox
2023-09-20  7:46     ` Pankaj Raghav
2023-09-15 18:38 ` [RFC 03/23] filemap: add folio with at least mapping_min_order in __filemap_get_folio Pankaj Raghav
2023-09-15 19:00   ` Matthew Wilcox
2023-09-20  8:06     ` Pankaj Raghav
2023-09-15 18:38 ` [RFC 04/23] filemap: set the order of the index in page_cache_delete_batch() Pankaj Raghav
2023-09-15 19:43   ` Matthew Wilcox
2023-09-18 18:20     ` Luis Chamberlain
2023-09-15 18:38 ` [RFC 05/23] filemap: align index to mapping_min_order in filemap_range_has_page() Pankaj Raghav
2023-09-15 19:45   ` Matthew Wilcox
2023-09-18 18:25     ` Luis Chamberlain
2023-09-15 18:38 ` [RFC 06/23] mm: call xas_set_order() in replace_page_cache_folio() Pankaj Raghav
2023-09-15 19:46   ` Matthew Wilcox
2023-09-18 18:27     ` Luis Chamberlain
2023-09-15 18:38 ` [RFC 07/23] filemap: align the index to mapping_min_order in __filemap_add_folio() Pankaj Raghav
2023-09-15 19:48   ` Matthew Wilcox
2023-09-18 18:32     ` Luis Chamberlain
2023-09-15 18:38 ` [RFC 08/23] filemap: align the index to mapping_min_order in filemap_get_folios_tag() Pankaj Raghav
2023-09-15 19:50   ` Matthew Wilcox
2023-09-18 18:36     ` Luis Chamberlain
2023-09-15 18:38 ` [RFC 09/23] filemap: use mapping_min_order while allocating folios Pankaj Raghav
2023-09-15 19:54   ` Matthew Wilcox
2023-09-15 18:38 ` [RFC 10/23] filemap: align the index to mapping_min_order in filemap_get_pages() Pankaj Raghav
2023-09-15 18:38 ` [RFC 11/23] filemap: align the index to mapping_min_order in do_[a]sync_mmap_readahead Pankaj Raghav
2023-09-15 18:38 ` [RFC 12/23] filemap: align index to mapping_min_order in filemap_fault() Pankaj Raghav
2023-09-15 18:38 ` [RFC 13/23] readahead: set file_ra_state->ra_pages to be at least mapping_min_order Pankaj Raghav
2023-09-15 18:38 ` [RFC 14/23] readahead: allocate folios with mapping_min_order in ra_unbounded() Pankaj Raghav
2023-09-15 18:38 ` [RFC 15/23] readahead: align with mapping_min_order in force_page_cache_ra() Pankaj Raghav
2023-09-15 18:38 ` [RFC 16/23] readahead: add folio with at least mapping_min_order in page_cache_ra_order Pankaj Raghav
2023-09-15 18:38 ` [RFC 17/23] readahead: set the minimum ra size in get_(init|next)_ra Pankaj Raghav
2023-09-15 18:38 ` [RFC 18/23] readahead: align ra start and size to mapping_min_order in ondemand_ra() Pankaj Raghav
2023-09-15 18:38 ` [RFC 19/23] truncate: align index to mapping_min_order Pankaj Raghav
2023-09-15 18:38 ` [RFC 20/23] mm: round down folio split requirements Pankaj Raghav
2023-09-15 18:38 ` [RFC 21/23] xfs: expose block size in stat Pankaj Raghav
2023-09-15 18:38 ` [RFC 22/23] xfs: enable block size larger than page size support Pankaj Raghav
2023-09-15 18:38 ` [RFC 23/23] xfs: set minimum order folio for page cache based on blocksize Pankaj Raghav
2023-09-15 18:50 ` [RFC 00/23] Enable block size > page size in XFS Matthew Wilcox
2023-09-18 12:35   ` Pankaj Raghav
2023-09-17 22:05 ` Dave Chinner
2023-09-18  2:04   ` Luis Chamberlain
2023-09-18  5:07     ` Dave Chinner
2023-09-18 12:29       ` Pankaj Raghav
2023-09-19 11:56         ` Ritesh Harjani
2023-09-19 21:15           ` Luis Chamberlain
2023-09-21  3:00         ` Luis Chamberlain
     [not found]           ` <ZQvNVAfZMjE3hgmN@bombadil.infradead.org>
2023-09-21  6:03             ` Dave Chinner
2023-09-21  7:18               ` Luis Chamberlain [this message]
2023-09-21  7:20                 ` Luis Chamberlain
2023-09-22  5:03                 ` Dave Chinner
2023-09-22 19:38               ` Matthew Wilcox

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZQvuNaYIukAnlEDM@bombadil.infradead.org \
    --to=mcgrof@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=chandan.babu@oracle.com \
    --cc=da.gomez@samsung.com \
    --cc=david@fromorbit.com \
    --cc=djwong@kernel.org \
    --cc=gost.dev@samsung.com \
    --cc=kernel@pankajraghav.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=p.raghav@samsung.com \
    --cc=riteshh@linux.ibm.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).