From: Brian Foster <bfoster@redhat.com>
To: Paul Menzel <pmenzel@molgen.mpg.de>, linux-xfs@vger.kernel.org
Subject: Re: Corruption of in-memory data detected. Shutting down filesystem
Date: Mon, 18 Feb 2019 09:34:04 -0500
Message-ID: <20190218143403.GB33924@bfoster>
In-Reply-To: <20190218142203.p53qvipc4yul6mv6@hades.usersys.redhat.com>
On Mon, Feb 18, 2019 at 03:22:03PM +0100, Carlos Maiolino wrote:
> Hi.
>
> > Dear XFS folks,
> >
> >
>
> > [ 25.506600] XFS (sdd): Mounting V5 Filesystem
> > [ 25.629621] XFS (sdd): Starting recovery (logdev: internal)
> > [ 25.685100] NFSD: starting 90-second grace period (net f0000098)
> > [ 26.433828] XFS (sdd): xfs_do_force_shutdown(0x8) called from line 368 of file fs/xfs/xfs_trans.c. Return address = 00000000cfa623e1
> > [ 26.433834] XFS (sdd): Corruption of in-memory data detected. Shutting down filesystem
> > [ 26.433835] XFS (sdd): Please umount the filesystem and rectify the problem(s)
> > [ 26.433857] XFS (sdd): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
> >
> Ok, the filesystem shut itself down, likely because blocks allocated in the
> transaction exceeded the reservation.
>
> Could you please post the whole dmesg?
>
> > We mounted it with an overlay files,
>
> I'm not sure what you meant here; could you please specify what you mean by
> 'overlay files'? Are you using this XFS filesystem as an upper/lower FS for
> overlayfs?
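>
> For reference, a minimal sketch of such a setup (paths purely illustrative),
> with an XFS mount providing the lower and upper directories:
>
> ```
> # hypothetical directories on an XFS filesystem mounted at /srv
> mkdir -p /srv/lower /srv/upper /srv/work /srv/merged
> mount -t overlay overlay \
>       -o lowerdir=/srv/lower,upperdir=/srv/upper,workdir=/srv/work \
>       /srv/merged
> ```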
>
> > and the xfs_repair shows the
> > summary below.
> >
> > ```
> > # xfs_repair -vv /dev/mapper/sddovl
> > - block cache size set to 4201400 entries
> > Phase 2 - using internal log
> > - zero log...
> > zero_log: head block 3930112 tail block 3929088
> > ERROR: The filesystem has valuable metadata changes in a log which needs to
> > be replayed. Mount the filesystem to replay the log, and unmount it before
> > re-running xfs_repair. If you are unable to mount the filesystem, then use
> > the -L option to destroy the log and attempt a repair.
> > Note that destroying the log may cause corruption -- please attempt a mount
>
> Have you tried to mount/umount the filesystem before zeroing the log? Zeroing
> the log is supposed to be a last resort.
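>
> Roughly (a sketch, assuming the device is /dev/mapper/sddovl and /mnt/recovery
> is a spare mount point):
>
> ```
> # Let the kernel replay the log, then unmount cleanly...
> mkdir -p /mnt/recovery
> mount /dev/mapper/sddovl /mnt/recovery
> umount /mnt/recovery
>
> # ...and only then run xfs_repair without -L
> xfs_repair -v /dev/mapper/sddovl
>
> # Only if the mount itself fails should the log be discarded:
> # xfs_repair -L /dev/mapper/sddovl
> ```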
>
> >
> > The directory `lost+found` contains almost five million files
> >
> > # find lost+found | wc
> > 4859687 4859687 110985720
>
> We have neither the whole xfs_repair output nor more information about the
> filesystem itself, but it looks like you had huge directory updates in your log
> which were not replayed, and all the orphaned inodes ended up in lost+found =/
>
> >
> > We saved the output of `xfs_repair`, but it’s over 500 MB in size, so we
> > cannot attach it.
> >
> > `sudo xfs_metadump -go /dev/sdd sdd-metadump.dump` takes over 15 minutes
> > and the dump file is 8.8 GB in size.
>
> At this point, xfs_metadump won't help much, since you have already repaired the
> filesystem.
> Also, why are you taking a metadump of /dev/sdd when the fs you tried to
> repair is a device-mapper device? Are you facing this issue on more than one
> filesystem?
>
> >
> > It’d be great if you could give hints on debugging this issue further,
> > and comment on whether you think it is possible to recover the files, that is,
> > to fix the log so that it can be cleanly applied.
>
> Unfortunately, you have already gotten rid of the log, so you can't recover it
> anymore, but all the recovered files will be in lost+found, with their inode
> numbers as file names.
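>
> If it helps to sort through them, a rough sketch (assuming the filesystem is
> mounted at /mnt/recovery; the numeric names below are just example inode
> numbers):
>
> ```
> cd /mnt/recovery/lost+found
>
> # count recovered entries
> find . | wc -l
>
> # classify a couple of entries by content
> file 1234567 7654321
>
> # classify everything (will be slow on ~5 million entries)
> find . -type f -print0 | xargs -0 file > /tmp/lost+found-types.txt
> ```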
>
>
> Ok, so below is the dmesg; thanks for attaching it.
>
> One thing to note is that there are two failing devices, sdd and dm-0. So, my
> question again: is this the same filesystem, or are these two separate
> filesystems showing exactly the same issue? The filesystem found corrupted
> inodes in the AG's unlinked bucket, but this shouldn't affect log recovery.
>
> If they are two separate devices, did you run xfs_repair on both of them? After
> you repaired the filesystem(s), do you still see the in-memory corruption issue?
>
> At this point, there is not much we can do regarding the filesystem metadata,
> since you already forced an xfs_repair that zeroed the log.
>
> So, could you please tell us the current state of the filesystem (or filesystems,
> if there is more than one)? Are you still seeing the same in-memory corruption
> error even after running xfs_repair?
>
FWIW, if you do still have an original copy of the fs, we could see
about whether bypassing the shutdown allows us to trade log recovery
failure for a space accounting error. This would still require a
subsequent repair, but that may be less invasive than zapping the log
and dealing with the aftermath of that.
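If a pristine copy still needs to be taken, a rough sketch (assuming /dev/sdd
is the affected device and there is space for a full image):

```
# image the whole device without touching the original
dd if=/dev/sdd of=/srv/scratch/sdd.img bs=1M conv=sparse status=progress

# expose the image as a block device to experiment against
losetup --find --show /srv/scratch/sdd.img
```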
Brian
> And for completeness, please provide us with as much information as possible
> about this (or these) filesystem(s):
>
> http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F
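>
> Roughly (a sketch; adjust the mount point and device to your setup):
>
> ```
> uname -a                      # kernel version
> xfs_repair -V                 # xfsprogs version
> nproc; free -m                # number of CPUs and amount of RAM
> grep xfs /proc/mounts         # mount options in use
> xfs_info /path/to/mountpoint  # filesystem geometry (fs must be mounted)
> dmesg > dmesg.txt             # full kernel log
> ```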
>
> Cheers.
>
> > [ 1380.869451] XFS (sdd): Mounting V5 Filesystem
> > [ 1380.912559] XFS (sdd): Starting recovery (logdev: internal)
> > [ 1381.030780] XFS (sdd): xfs_do_force_shutdown(0x8) called from line 368 of file fs/xfs/xfs_trans.c. Return address = 00000000cfa623e1
> > [ 1381.030785] XFS (sdd): Corruption of in-memory data detected. Shutting down filesystem
> > [ 1381.030786] XFS (sdd): Please umount the filesystem and rectify the problem(s)
> > [ 1381.031086] XFS (sdd): xlog_recover_clear_agi_bucket: failed to clear agi 0. Continuing.
> > [ 1381.031088] XFS (sdd): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
> > [ 1381.031090] XFS (sdd): xlog_recover_clear_agi_bucket: failed to clear agi 0. Continuing.
> > [ 1381.031093] XFS (sdd): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
> > [ 1381.031095] XFS (sdd): xlog_recover_clear_agi_bucket: failed to clear agi 0. Continuing.
> <...>
> > [ 1381.031113] XFS (sdd): Ending recovery (logdev: internal)
> > [ 1381.031490] XFS (sdd): Error -5 reserving per-AG metadata reserve pool.
> > [ 1381.031492] XFS (sdd): xfs_do_force_shutdown(0x8) called from line 548 of file fs/xfs/xfs_fsops.c. Return address = 00000000217dbba5
>
> > [ 2795.123228] XFS (dm-0): Ending recovery (logdev: internal)
> > [ 2795.231020] XFS (dm-0): Error -5 reserving per-AG metadata reserve pool.
> > [ 2795.231023] XFS (dm-0): xfs_do_force_shutdown(0x8) called from line 548 of file fs/xfs/xfs_fsops.c. Return address = 00000000217dbba5
>
> > [10944.023429] XFS (dm-0): Mounting V5 Filesystem
> > [10944.035260] XFS (dm-0): Ending clean mount
> > [11664.862376] XFS (dm-0): Unmounting Filesystem
> > [11689.260213] XFS (dm-0): Mounting V5 Filesystem
> > [11689.338187] XFS (dm-0): Ending clean mount
>
>
>
>
> --
> Carlos