From: Brian Foster <bfoster@redhat.com>
To: Paul Menzel <pmenzel@molgen.mpg.de>, linux-xfs@vger.kernel.org
Subject: Re: Corruption of in-memory data detected. Shutting down filesystem
Date: Mon, 18 Feb 2019 09:34:04 -0500
Message-ID: <20190218143403.GB33924@bfoster>
In-Reply-To: <20190218142203.p53qvipc4yul6mv6@hades.usersys.redhat.com>
On Mon, Feb 18, 2019 at 03:22:03PM +0100, Carlos Maiolino wrote:
> Hi.
>
> > Dear XFS folks,
> >
> >
>
> > [ 25.506600] XFS (sdd): Mounting V5 Filesystem
> > [ 25.629621] XFS (sdd): Starting recovery (logdev: internal)
> > [ 25.685100] NFSD: starting 90-second grace period (net f0000098)
> > [ 26.433828] XFS (sdd): xfs_do_force_shutdown(0x8) called from line 368 of file fs/xfs/xfs_trans.c. Return address = 00000000cfa623e1
> > [ 26.433834] XFS (sdd): Corruption of in-memory data detected. Shutting down filesystem
> > [ 26.433835] XFS (sdd): Please umount the filesystem and rectify the problem(s)
> > [ 26.433857] XFS (sdd): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
> >
> Ok, the filesystem shut itself down, likely because blocks allocated in the
> transaction exceeded the reservation.
>
> Could you please post the whole dmesg?
>
> > We mounted it with an overlay files,
>
> I'm not sure what you meant here; could you please specify what you mean by
> 'overlay files'? Are you using this XFS filesystem as an upper/lower FS for
> overlayfs?
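>
> For reference, a minimal sketch of such a setup (paths purely illustrative),
> with an XFS mount providing the lower and upper directories:
>
> ```
> # hypothetical directories on an XFS filesystem mounted at /srv
> mkdir -p /srv/lower /srv/upper /srv/work /srv/merged
> mount -t overlay overlay \
>       -o lowerdir=/srv/lower,upperdir=/srv/upper,workdir=/srv/work \
>       /srv/merged
> ```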
>
> > and the xfs_repair shows the
> > summary below.
> >
> > ```
> > # xfs_repair -vv /dev/mapper/sddovl
> > - block cache size set to 4201400 entries
> > Phase 2 - using internal log
> > - zero log...
> > zero_log: head block 3930112 tail block 3929088
> > ERROR: The filesystem has valuable metadata changes in a log which needs to
> > be replayed. Mount the filesystem to replay the log, and unmount it before
> > re-running xfs_repair. If you are unable to mount the filesystem, then use
> > the -L option to destroy the log and attempt a repair.
> > Note that destroying the log may cause corruption -- please attempt a mount
>
> Have you tried to mount/umount the filesystem before zeroing the log? Zeroing
> the log is supposed to be a last resort.
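>
> Roughly (a sketch, assuming the device is /dev/mapper/sddovl and /mnt/recovery
> is a spare mount point):
>
> ```
> # Let the kernel replay the log, then unmount cleanly...
> mkdir -p /mnt/recovery
> mount /dev/mapper/sddovl /mnt/recovery
> umount /mnt/recovery
>
> # ...and only then run xfs_repair without -L
> xfs_repair -v /dev/mapper/sddovl
>
> # Only if the mount itself fails should the log be discarded:
> # xfs_repair -L /dev/mapper/sddovl
> ```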
>
> >
> > The directory `lost+found` contains almost five million files
> >
> > # find lost+found | wc
> > 4859687 4859687 110985720
>
> We have neither the whole xfs_repair output nor more information about the
> filesystem itself, but it looks like you had huge directory updates in your log
> which were not replayed, and all the orphaned inodes ended up in lost+found =/
>
> >
> > We saved the output of `xfs_repair`, but it’s over 500 MB in size, so we
> > cannot attach it.
> >
> > `sudo xfs_metadump -go /dev/sdd sdd-metadump.dump` takes over 15 minutes
> > and the dump file is 8.8 GB in size.
>
> At this point, xfs_metadump won't help much, since you have already repaired the
> filesystem.
> Also, why are you taking a metadump of /dev/sdd when the fs you tried to
> repair is a device-mapper device? Are you facing this issue on more than one
> filesystem?
>
> >
> > It’d be great if you could give hints on debugging this issue further,
> > and comment on whether you think it is possible to recover the files, that is,
> > to fix the log so that it can be cleanly applied.
>
> Unfortunately, you have already gotten rid of the log, so you can't recover it
> anymore, but all the recovered files will be in lost+found, with their inode
> numbers as file names.
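>
> If it helps to sort through them, a rough sketch (assuming the filesystem is
> mounted at /mnt/recovery; the numeric names below are just example inode
> numbers):
>
> ```
> cd /mnt/recovery/lost+found
>
> # count recovered entries
> find . | wc -l
>
> # classify a couple of entries by content
> file 1234567 7654321
>
> # classify everything (will be slow on ~5 million entries)
> find . -type f -print0 | xargs -0 file > /tmp/lost+found-types.txt
> ```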
>
>
> Ok, so below is the dmesg; thanks for attaching it.
>
> One thing to note is that there are two failing devices, sdd and dm-0. So, my
> question again: is this the same filesystem, or are these two separate
> filesystems showing exactly the same issue? The filesystem found corrupted
> inodes in the AG's unlinked bucket, but this shouldn't affect log recovery.
>
> If they are two separate devices, did you run xfs_repair on both of them? After
> you repaired the filesystem(s), do you still see the in-memory corruption issue?
>
> At this point, there is not much we can do regarding the filesystem metadata,
> since you already forced an xfs_repair that zeroed the log.
>
> So, could you please tell us the current state of the filesystem (or filesystems,
> if there is more than one)? Are you still seeing the same in-memory corruption
> error even after running xfs_repair?
>
FWIW, if you do still have an original copy of the fs, we could see
about whether bypassing the shutdown allows us to trade log recovery
failure for a space accounting error. This would still require a
subsequent repair, but that may be less invasive than zapping the log
and dealing with the aftermath of that.
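If a pristine copy still needs to be taken, a rough sketch (assuming /dev/sdd
is the affected device and there is space for a full image):

```
# image the whole device without touching the original
dd if=/dev/sdd of=/srv/scratch/sdd.img bs=1M conv=sparse status=progress

# expose the image as a block device to experiment against
losetup --find --show /srv/scratch/sdd.img
```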
Brian
> And for completeness, please provide us with as much information as possible
> about this (or these) filesystem(s):
>
> http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F
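>
> Roughly (a sketch; adjust the mount point and device to your setup):
>
> ```
> uname -a                      # kernel version
> xfs_repair -V                 # xfsprogs version
> nproc; free -m                # number of CPUs and amount of RAM
> grep xfs /proc/mounts         # mount options in use
> xfs_info /path/to/mountpoint  # filesystem geometry (fs must be mounted)
> dmesg > dmesg.txt             # full kernel log
> ```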
>
> Cheers.
>
> > [ 1380.869451] XFS (sdd): Mounting V5 Filesystem
> > [ 1380.912559] XFS (sdd): Starting recovery (logdev: internal)
> > [ 1381.030780] XFS (sdd): xfs_do_force_shutdown(0x8) called from line 368 of file fs/xfs/xfs_trans.c. Return address = 00000000cfa623e1
> > [ 1381.030785] XFS (sdd): Corruption of in-memory data detected. Shutting down filesystem
> > [ 1381.030786] XFS (sdd): Please umount the filesystem and rectify the problem(s)
> > [ 1381.031086] XFS (sdd): xlog_recover_clear_agi_bucket: failed to clear agi 0. Continuing.
> > [ 1381.031088] XFS (sdd): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
> > [ 1381.031090] XFS (sdd): xlog_recover_clear_agi_bucket: failed to clear agi 0. Continuing.
> > [ 1381.031093] XFS (sdd): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
> > [ 1381.031095] XFS (sdd): xlog_recover_clear_agi_bucket: failed to clear agi 0. Continuing.
> <...>
> > [ 1381.031113] XFS (sdd): Ending recovery (logdev: internal)
> > [ 1381.031490] XFS (sdd): Error -5 reserving per-AG metadata reserve pool.
> > [ 1381.031492] XFS (sdd): xfs_do_force_shutdown(0x8) called from line 548 of file fs/xfs/xfs_fsops.c. Return address = 00000000217dbba5
>
> > [ 2795.123228] XFS (dm-0): Ending recovery (logdev: internal)
> > [ 2795.231020] XFS (dm-0): Error -5 reserving per-AG metadata reserve pool.
> > [ 2795.231023] XFS (dm-0): xfs_do_force_shutdown(0x8) called from line 548 of file fs/xfs/xfs_fsops.c. Return address = 00000000217dbba5
>
> > [10944.023429] XFS (dm-0): Mounting V5 Filesystem
> > [10944.035260] XFS (dm-0): Ending clean mount
> > [11664.862376] XFS (dm-0): Unmounting Filesystem
> > [11689.260213] XFS (dm-0): Mounting V5 Filesystem
> > [11689.338187] XFS (dm-0): Ending clean mount
>
>
>
>
> --
> Carlos