public inbox for linux-xfs@vger.kernel.org
From: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
To: Dave Chinner <david@fromorbit.com>
Cc: "Darrick J. Wong" <djwong@kernel.org>,
	shrikanth hegde <sshegde@linux.vnet.ibm.com>,
	dchinner@redhat.com, linux-xfs@vger.kernel.org,
	Srikar Dronamraju <srikar@linux.vnet.ibm.com>,
	ojaswin@linux.ibm.com
Subject: Re: xfs: system fails to boot up due to Internal error xfs_trans_cancel
Date: Thu, 16 Mar 2023 10:18:45 +0530	[thread overview]
Message-ID: <87y1nx713m.fsf@doe.com> (raw)
In-Reply-To: <20230310002907.GO360264@dread.disaster.area>

Dave Chinner <david@fromorbit.com> writes:

> On Thu, Mar 09, 2023 at 07:56:06PM +0530, Ritesh Harjani wrote:
>> Hi Darrick,
>>
>> Please find the information collected from the system. We added some
>> debug logs, and it looks like exactly what you pointed out is
>> happening.
>>
>> We added the debug kernel patch you had requested [1] to get more
>> info from the system.
>>
>> 1. We first broke into the emergency shell, where the root fs first
>> gets mounted read-only ("ro") on /sysroot. Here are the logs.
>>
>> [  OK  ] Started File System Check on /dev/mapper/rhel_ltcden3--lp1-root.
>>          Mounting /sysroot...
>> [    7.203990] SGI XFS with ACLs, security attributes, quota, no debug enabled
>> [    7.205835] XFS (dm-0): Mounting V5 Filesystem 7b801289-75a7-4d39-8cd3-24526e9e9da7
>> [   ***] A start job is running for /sysroot (15s / 1min 35s)[   17.439377] XFS (dm-0): Starting recovery (logdev: internal)
>> [  *** ] A start job is running for /sysroot (16s / 1min 35s)[   17.771158] xfs_log_mount_finish: Recovery needed is set
>> [   17.771172] xlog_recover_iunlink_ag: ran xlog_recover_iunlink_bucket for agi:0, bucket:0
>> [   17.771179] xlog_recover_iunlink_ag: ran xlog_recover_iunlink_bucket for agi:0, bucket:1
>> [   17.771184] xlog_recover_iunlink_ag: ran xlog_recover_iunlink_bucket for agi:0, bucket:2
>> [   17.771190] xlog_recover_iunlink_ag: ran xlog_recover_iunlink_bucket for agi:0, bucket:3
>> [   17.771196] xlog_recover_iunlink_ag: ran xlog_recover_iunlink_bucket for agi:0, bucket:4
>> [   17.771201] xlog_recover_iunlink_ag: ran xlog_recover_iunlink_bucket for agi:0, bucket:5
>> [   17.801033] xlog_recover_iunlink_ag: ran xlog_recover_iunlink_bucket for agi:0, bucket:6
>> [   17.801041] xlog_recover_iunlink_ag: ran xlog_recover_iunlink_bucket for agi:0, bucket:7
>> [   17.801046] xlog_recover_iunlink_ag: ran xlog_recover_iunlink_bucket for agi:0, bucket:8
>> [   17.801052] xlog_recover_iunlink_ag: ran xlog_recover_iunlink_bucket for agi:0, bucket:9
>> [   17.801057] xlog_recover_iunlink_ag: ran xlog_recover_iunlink_bucket for agi:0, bucket:10
>> [   17.801063] xlog_recover_iunlink_ag: ran xlog_recover_iunlink_bucket for agi:0, bucket:11
>> [   17.801068] xlog_recover_iunlink_ag: ran xlog_recover_iunlink_bucket for agi:0, bucket:12
>> [   17.801272] xlog_recover_iunlink_bucket: bucket: 13, agino: 3064909, ino: 3064909, iget ret: 0, previno:18446744073709551615, prev_agino:4294967295
>>
>> <previno, prev_agino> above are just -1 printed as %llu and %u
>> respectively; that's why the values look huge.
>
> That's NULLFSINO and NULLAGINO respectively. That's the indication
> that this is the last inode on the chain and that the recovery loop
> should terminate at this inode.
>
> Nothing looks wrong to me there.

Yes, that's right.

>
>
>> 2. These are the logs from xfs_repair -n /dev/dm-0.
>> Here you will notice the same agino 3064909 in bucket 13 (from
>> phase 2) that was also printed by xlog_recover_iunlink_ag() above.
>>
>> switch_root:/# xfs_repair -n /dev/dm-0
>> Phase 1 - find and verify superblock...
>> Phase 2 - using internal log
>>         - zero log...
>>         - scan filesystem freespace and inode maps...
>> agi unlinked bucket 13 is 3064909 in ag 0 (inode=3064909)
>>         - found root inode chunk
>> Phase 3 - for each AG...
>>         - scan (but don't clear) agi unlinked lists...
>>         - process known inodes and perform inode discovery...
>>         - agno = 0
>>         - agno = 1
>>         - agno = 2
>>         - agno = 3
>>         - process newly discovered inodes...
>> Phase 4 - check for duplicate blocks...
>>         - setting up duplicate extent list...
>>         - check for inodes claiming duplicate blocks...
>>         - agno = 0
>>         - agno = 2
>>         - agno = 1
>>         - agno = 3
>> No modify flag set, skipping phase 5
>> Phase 6 - check inode connectivity...
>>         - traversing filesystem ...
>>         - traversal finished ...
>>         - moving disconnected inodes to lost+found ...
>> Phase 7 - verify link counts...
>> would have reset inode 3064909 nlinks from 4294967291 to 2
>
> And there's the problem. The link count, as a signed int, is -5.
> It got reset to 2 probably because it was moved to lost+found due to
> having a non-zero link count. Not sure how it got to -5 and I
> suspect we'll never really know. However, I think that's irrelevant
> because if we fix the problem with unlink recovery failing to
> register an unlink recovery error from inodegc we'll intentionally
> leak the inode and it won't ever be a problem again.
>
>
>> No modify flag set, skipping filesystem flush and exiting.
>>
>>
>> 3. Then we exit from the shell so the system continues booting.
>> Just pasting the logs where the warning gets generated, plus the
>> extra logs our patch prints for the same inode.
>>
>>
>> it continues
>> ================
>> [  587.999113] ------------[ cut here ]------------
>> [  587.999121] WARNING: CPU: 48 PID: 2026 at fs/xfs/xfs_inode.c:1839 xfs_iunlink_lookup+0x58/0x80 [xfs]
>
> Yeah, that happens because we failed to recover the unlinked inode,
> we don't feed errors that occur in inodegc back to the unlink
> recovery code, so it didn't clear the AGI bucket when we failed to
> remove the inode from the unlinked list in inodegc. Hence it goes to
> add the new inode to the start of the unlinked list, and doesn't find
> the broken, unrecovered inode that the bucket points to in cache.
>
> IOWs, we fix the problem with feeding back errors from inodegc to
> iunlink recovery, and this specific issue will go away.
>
>> I can spend some more time to debug and understand on how to fix this.
>> But thought of sharing this info meanwhile and see if there are any
>> pointers on how to fix this in kernel.
>
> I suspect that we need to do something like gather an error in
> xfs_inodegc_flush() like we do for gathering the error in
> xfs_btree_split() from the offloaded xfs_btree_split_worker()...
>

Thanks, Dave, for the pointers. I haven't yet started looking into it;
once I get a few other things off my plate, I can start looking into
this.

-ritesh


Thread overview: 28+ messages
2023-02-17 11:15 xfs: system fails to boot up due to Internal error xfs_trans_cancel shrikanth hegde
2023-02-17 11:25 ` shrikanth hegde
2023-02-17 11:30 ` shrikanth hegde
2023-02-17 15:03 ` Linux regression tracking #adding (Thorsten Leemhuis)
2023-02-17 16:53 ` Darrick J. Wong
2023-02-17 20:25   ` Dave Chinner
2023-02-18  7:17   ` shrikanth hegde
2023-02-22 16:41     ` Darrick J. Wong
2023-02-24  8:04       ` shrikanth hegde
2023-02-24 21:18         ` Darrick J. Wong
2023-03-09 14:26       ` Ritesh Harjani
2023-03-09 17:27         ` Darrick J. Wong
2023-03-16  4:46           ` Ritesh Harjani
2023-03-16  5:20             ` Darrick J. Wong
2023-03-17 20:44               ` Darrick J. Wong
2023-03-18 16:50                 ` Ritesh Harjani
2023-03-18 19:20                   ` Darrick J. Wong
2023-03-20  5:20                     ` Ritesh Harjani
2023-04-17 11:16                       ` Linux regression tracking (Thorsten Leemhuis)
2023-04-18  4:56                         ` Darrick J. Wong
2023-04-21 13:04                           ` Linux regression tracking (Thorsten Leemhuis)
2023-06-05 13:27                           ` Thorsten Leemhuis
2023-06-05 21:57                             ` Darrick J. Wong
2023-06-06  2:46                               ` Dave Chinner
2023-06-06  3:22                                 ` Darrick J. Wong
2023-06-06 11:23                                 ` Thorsten Leemhuis
2023-03-10  0:29         ` Dave Chinner
2023-03-16  4:48           ` Ritesh Harjani [this message]
