public inbox for linux-xfs@vger.kernel.org
From: Anders Blomdell <anders.blomdell@gmail.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Chandan Babu R <chandan.babu@oracle.com>,
	"Darrick J. Wong" <djwong@kernel.org>,
	Christoph Hellwig <hch@lst.de>
Subject: Re: XFS mount timeout in linux-6.9.11
Date: Sat, 10 Aug 2024 10:29:38 +0200
Message-ID: <252d91e2-282e-4af4-b99b-3b8147d98bc3@gmail.com>
In-Reply-To: <ZraeRdPmGXpbRM7V@dread.disaster.area>



On 2024-08-10 00:55, Dave Chinner wrote:
> On Fri, Aug 09, 2024 at 07:08:41PM +0200, Anders Blomdell wrote:
>> With a filesystem that contains a very large amount of hardlinks
>> the time to mount the filesystem skyrockets to around 15 minutes
>> on 6.9.11-200.fc40.x86_64 as compared to around 1 second on
>> 6.8.10-300.fc40.x86_64,
> 
> That sounds like the filesystem is not being cleanly unmounted on
> 6.9.11-200.fc40.x86_64 and so is having to run log recovery on the
> next mount and so is recovering lots of hardlink operations that
> weren't written back at unmount.
> 
> Hence this smells like an unmount or OS shutdown process issue, not
> a mount issue. e.g. if something in the shutdown scripts hangs,
> systemd may time out the shutdown and power off/reboot the machine
> without completing the full shutdown process. The result of this is
> the filesystem has to perform recovery on the next mount and so you
> see a long mount time because of some other unrelated issue.
> 
> What is the dmesg output for the mount operations? That will tell us
> if journal recovery is the difference for certain.  Have you also
> checked to see what is happening in the shutdown/unmount process
> before the long mount times occur?
echo $(uname -r) $(date +%H:%M:%S) > /dev/kmsg
mount /dev/vg1/test /test
echo $(uname -r) $(date +%H:%M:%S) > /dev/kmsg
umount /test
echo $(uname -r) $(date +%H:%M:%S) > /dev/kmsg
mount /dev/vg1/test /test
echo $(uname -r) $(date +%H:%M:%S) > /dev/kmsg

On the kernel with the offending commit (both mounts end clean, with no journal recovery logged):

[55581.470484] 6.8.0-rc4-00129-g14dd46cf31f4 09:17:20
[55581.492733] XFS (dm-7): Mounting V5 Filesystem e2159bbc-18fb-4d4b-a6c5-14c97b8e5380
[56048.292804] XFS (dm-7): Ending clean mount
[56516.433008] 6.8.0-rc4-00129-g14dd46cf31f4 09:32:55
[56516.434695] XFS (dm-7): Unmounting Filesystem e2159bbc-18fb-4d4b-a6c5-14c97b8e5380
[56516.925145] 6.8.0-rc4-00129-g14dd46cf31f4 09:32:56
[56517.039873] XFS (dm-7): Mounting V5 Filesystem e2159bbc-18fb-4d4b-a6c5-14c97b8e5380
[56986.017144] XFS (dm-7): Ending clean mount
[57454.876371] 6.8.0-rc4-00129-g14dd46cf31f4 09:48:34

And rebooting to the kernel before the offending commit:

[   60.177951] 6.8.0-rc4-00128-g8541a7d9da2d 10:23:00
[   61.009283] SGI XFS with ACLs, security attributes, realtime, scrub, quota, no debug enabled
[   61.017422] XFS (dm-7): Mounting V5 Filesystem e2159bbc-18fb-4d4b-a6c5-14c97b8e5380
[   61.351100] XFS (dm-7): Ending clean mount
[   61.366359] 6.8.0-rc4-00128-g8541a7d9da2d 10:23:01
[   61.367673] XFS (dm-7): Unmounting Filesystem e2159bbc-18fb-4d4b-a6c5-14c97b8e5380
[   61.444552] 6.8.0-rc4-00128-g8541a7d9da2d 10:23:01
[   61.459358] XFS (dm-7): Mounting V5 Filesystem e2159bbc-18fb-4d4b-a6c5-14c97b8e5380
[   61.513938] XFS (dm-7): Ending clean mount
[   61.524056] 6.8.0-rc4-00128-g8541a7d9da2d 10:23:01
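
For anyone comparing the two logs, the mount time is the gap between the
"Mounting" and "Ending clean mount" lines for the same device. A minimal
Python sketch of that arithmetic follows. The function name mount_durations
and the regular expression are illustrative assumptions, not part of any XFS
tooling; the sample lines are copied from the logs above.

```python
import re

# Matches lines such as
#   [55581.492733] XFS (dm-7): Mounting V5 Filesystem <uuid>
#   [56048.292804] XFS (dm-7): Ending clean mount
# "Unmounting Filesystem" lines do not match, because the event text
# must start directly after "): ".
LINE_RE = re.compile(
    r"^\[\s*(\d+\.\d+)\]\s+XFS \(([^)]+)\): (Mounting|Ending clean mount)"
)


def mount_durations(dmesg_text):
    """Return (device, seconds) for each Mounting -> Ending clean mount pair."""
    started = {}      # device -> timestamp of the pending "Mounting" line
    durations = []
    for line in dmesg_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        ts, dev, event = float(m.group(1)), m.group(2), m.group(3)
        if event == "Mounting":
            started[dev] = ts
        elif dev in started:
            durations.append((dev, round(ts - started.pop(dev), 3)))
    return durations


# First mount on 14dd46cf31f4, then second mount on 8541a7d9da2d.
SAMPLE = """\
[55581.492733] XFS (dm-7): Mounting V5 Filesystem e2159bbc-18fb-4d4b-a6c5-14c97b8e5380
[56048.292804] XFS (dm-7): Ending clean mount
[   61.459358] XFS (dm-7): Mounting V5 Filesystem e2159bbc-18fb-4d4b-a6c5-14c97b8e5380
[   61.513938] XFS (dm-7): Ending clean mount
"""

if __name__ == "__main__":
    for dev, secs in mount_durations(SAMPLE):
        print(f"{dev}: {secs:.3f} s")
```

On these sample lines the gap is roughly 466.8 seconds with 14dd46cf31f4 and
roughly 0.055 seconds with 8541a7d9da2d.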


> 
>> this of course makes booting drop
>> into emergency mode if the filesystem is in /etc/fstab. A git bisect
>> nails the offending commit as 14dd46cf31f4aaffcf26b00de9af39d01ec8d547.
> 
> Commit 14dd46cf31f4 ("xfs: split xfs_inobt_init_cursor") doesn't
> seem like a candidate for any sort of change of behaviour. It's just
> a refactoring patch that doesn't change any behaviour at all. 
> Are you sure the reproducer you used for the bisect is reliable?
Yes.

>> The filesystem is a collection of daily snapshots of a live filesystem
>> collected over a number of years, organized as a storage of unique files,
>> that are reflinked to inodes that contain the actual {owner,group,permission,
>> mtime}, and these inodes are hardlinked into the daily snapshot trees.
> 
> So it's reflinks and hardlinks. Recovering a reflink takes a lot
> more CPU time and journal traffic than recovering a hardlink, so
> that will also be a contributing factor.
> 
>> The numbers for the filesystem are:
>>
>>    Total file size:           3.6e+12 bytes
> 
> 3.6TB, not a large data set by any measurement.
> 
>>    Unique files:             12.4e+06
> 
> 12M files, not a lot.
> 
>>    Reflink inodes:           18.6e+06
> 
> 18M inodes with shared extents, not a huge number, either.
> 
>>    Hardlinks:                15.7e+09
> 
> Ok, 15.7 billion hardlinks is a *lot*.
:-)
> 
> And by a lot, I mean that's the largest number of hardlinks in an
> XFS filesystem I've personally ever heard about in 20 years.
Glad to be of service.

> 
> As a warning: hope like hell you never have a disaster with that
> storage and need to run xfs_repair on that filesystem. If you don't
> have many, many TBs of RAM, just checking the hardlinks resolve
> correctly could take billions of IOs...
I hope so as well :-), but it is not a critical system (used for testing
and statistics, will take about a month to rebuild though :-/).

> 
> -Dave.
