public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] xfs: miscellaneous bug fixes
@ 2024-11-12 22:05 Dave Chinner
  2024-11-12 22:05 ` [PATCH 1/3] xfs: fix sparse inode limits on runt AG Dave Chinner
                   ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: Dave Chinner @ 2024-11-12 22:05 UTC (permalink / raw)
  To: linux-xfs; +Cc: cem

These are three bug fixes for recent issues.

The first is a repost of the original patch to prevent allocation of
sparse inode clusters at the end of an unaligned runt AG. There
was plenty of discussion over that fix here:

https://lore.kernel.org/linux-xfs/20241024025142.4082218-1-david@fromorbit.com/

And the outcome of that discussion is that we can't allow sparse
inode clusters overlapping the end of the runt AG without an on disk
format definition change. Hence this patch to ensure the check is
done correctly is the only change we need to make to the kernel to
avoid this problem in the future.

Filesystems that have this problem on disk will need to run
xfs_repair to remove the bad cluster, but no data loss is possible
from this because the kernel currently disallows inode allocation
from the bad cluster and so none of the inodes in the sparse cluster
can actually be used. Hence there is no possible data loss or other
metadata corruption possible from this situation, all we need to do
is ensure that it doesn't happen again once repair has done it's
work.

The other two patches are for issues I've recently hit when running
lots of fstests in parallel. That changes loading and hence timing
of events during tests, exposing latent race conditions in the code.
The quota fix removes racy debug code that has been there since the
quota code was first committed in 1996.

The log shutdown race fix is a much more recent issue created by
trying to ensure shutdowns operate in a sane and predictable manner.
The logic flaw is that we allow multiple log shutdowns to start and
force the log before selecting on a single log shutdown task. This
leads to a situation where shutdown log item callback processing
gets stuck waiting on a task holding a buffer lock that is waiting
on a log force that is waiting on shutdown log item callback
processing to complete...

Thoughts?


^ permalink raw reply	[flat|nested] 17+ messages in thread
* [PATCH 0/3] xfs: sparse inodes overlap end of filesystem
@ 2024-10-24  2:51 Dave Chinner
  2024-10-24  2:51 ` [PATCH 1/3] xfs: fix sparse inode limits on runt AG Dave Chinner
  0 siblings, 1 reply; 17+ messages in thread
From: Dave Chinner @ 2024-10-24  2:51 UTC (permalink / raw)
  To: linux-xfs

We have had a large number of recent reports about cloud filesystems
with "corrupt" inode records recently. They are all the same, and
feature a filesystem that has been grown from a small size to a
larger size (to 30G or 50G). In all cases, they have a very small
runt AG at the end of the filesystem.  In the case of the 30GB
filesystems, this is 1031 blocks long.

These filesystems start issuing corruption warnings when trying to
allocate an in a sparse cluster at block 1024 of the runt AG. At
this point, there shouldn't be a sparse inode cluster because there
isn't space to fit an entire inode chunk (8 blocks) at block 1024.
i.e. it is only 7 blocks from the end of the AG.

Hence the first bug is that we allowed allocation of a sparse inode
cluster in this location when it should not have occurred. The first
patch in the series addresses this.

However, there is actually nothing corrupt in the on-disk sparse
inode record or inode cluster at agbno 1024. It is a 32 inode
cluster, which means it is 4 blocks in length, so sits entirely
within the AG and every inode in the record is addressable and
accessible. The only thing we can't do is make the sparse inode
record whole - the inode allocation code cannot allocate another 4
blocks that span beyond the end of the AG. Hence this inode record
and cluster remain sparse until all the inodes in it are freed and
the cluster removed from disk.

The second bug is that we don't consider inodes beyond inode cluster
alignment at the end of an AG as being valid. When sparse inode
alignment is in use, we set the in-memory inode cluster alignment to
match the inode chunk alignment, and so the maximum valid inode
number is inode chunk aligned, not inode cluster aligned. Hence when
we have an inode cluster at the end of the AG - so the max inode
number is cluster aligned - we reject that entire cluster as being
invalid. 

As stated above, there is nothing corrupt about the sparse inode
cluster at the end of the AG, it just doesn't match an arbitrary
alignment validation restriction for inodes at the end of the AG.
Given we have production filesystems out there with sparse inode
clusters allocated with cluster alignment at the end of the AG, we
need to consider these inodes as valid and not error out with a
corruption report.  The second patch addresses this.

The third issue I found is that we never validate the
sb->sb_spino_align valid when we mount the filesystem. It could have
any value and we just blindly use it when calculating inode
allocation geometry. The third patch adds sb->sb_spino_align range
validation to the superblock verifier.

There is one question that needs to be resolved in this patchset: if
we take patch 2 to allow sparse inodes at the end of the AG, why
would we need the change in patch 1? Indeed, at this point I have to
ask why we even need the min/max agbno guidelines to the inode chunk
allocation as we end up allowing any aligned location in the AG to
be used by sparse inodes. i.e. if we take patch 2, then patch 1 is
unnecessary and now we can remove a bunch of code (min/max_agbno
constraints) from the allocator paths...

I'd prefer that we take the latter path: ignore the first patch.
This results in more flexible behaviour, allows existing filesystems
with this issue to work without needing xfs_repair to fix them, and
we get to remove complexity from the code.

Thoughts?

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2024-11-25 11:57 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-11-12 22:05 [PATCH 0/3] xfs: miscellaneous bug fixes Dave Chinner
2024-11-12 22:05 ` [PATCH 1/3] xfs: fix sparse inode limits on runt AG Dave Chinner
2024-11-12 23:15   ` Darrick J. Wong
2024-11-13  0:12     ` Dave Chinner
2024-11-13  0:24       ` Darrick J. Wong
2024-11-12 22:05 ` [PATCH 2/3] xfs: delalloc and quota softlimit timers are incoherent Dave Chinner
2024-11-12 23:48   ` Darrick J. Wong
2024-11-13  0:14     ` Dave Chinner
2024-11-13  8:48   ` Christoph Hellwig
2024-11-12 22:05 ` [PATCH 3/3] xfs: prevent mount and log shutdown race Dave Chinner
2024-11-12 23:58   ` Darrick J. Wong
2024-11-13  0:56     ` Dave Chinner
2024-11-13  8:50   ` Christoph Hellwig
2024-11-12 23:59 ` [PATCH 0/3] xfs: miscellaneous bug fixes Darrick J. Wong
2024-11-13  1:09   ` Dave Chinner
2024-11-25 11:57 ` Carlos Maiolino
  -- strict thread matches above, loose matches on Subject: below --
2024-10-24  2:51 [PATCH 0/3] xfs: sparse inodes overlap end of filesystem Dave Chinner
2024-10-24  2:51 ` [PATCH 1/3] xfs: fix sparse inode limits on runt AG Dave Chinner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox