Re: Corruption of in-memory data detected. Shutting down filesystem

From: Brian Foster <bfoster@redhat.com>
To: Paul Menzel <pmenzel@molgen.mpg.de>, linux-xfs@vger.kernel.org
Subject: Re: Corruption of in-memory data detected. Shutting down filesystem
Date: Mon, 18 Feb 2019 09:34:04 -0500
Message-ID: <20190218143403.GB33924@bfoster>
In-Reply-To: <20190218142203.p53qvipc4yul6mv6@hades.usersys.redhat.com>

On Mon, Feb 18, 2019 at 03:22:03PM +0100, Carlos Maiolino wrote:
> Hi.
> 
> > Dear XFS folks,
> > 
> > 
> 
> > [   25.506600] XFS (sdd): Mounting V5 Filesystem
> > [   25.629621] XFS (sdd): Starting recovery (logdev: internal)
> > [   25.685100] NFSD: starting 90-second grace period (net f0000098)
> > [   26.433828] XFS (sdd): xfs_do_force_shutdown(0x8) called from line 368 of file fs/xfs/xfs_trans.c.  Return address = 00000000cfa623e1
> > [   26.433834] XFS (sdd): Corruption of in-memory data detected.  Shutting down filesystem
> > [   26.433835] XFS (sdd): Please umount the filesystem and rectify the problem(s)
> > [   26.433857] XFS (sdd): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
> > 
> OK, the filesystem most likely shut itself down because the blocks allocated
> in the transaction exceeded the reservation.
> 
> Could you please post the whole dmesg?
> 
> > We mounted it with an overlay files,
> 
> I'm not sure what you meant here, could you please specify what you meant by
> 'overlay files'? Are you using this XFS filesystem as an upper/lower FS for
> overlayfs?
> 
> > and the xfs_repair shows the
> > summary below.
> > 
> > ```
> > # xfs_repair -vv /dev/mapper/sddovl
> >         - block cache size set to 4201400 entries
> > Phase 2 - using internal log
> >         - zero log...
> > zero_log: head block 3930112 tail block 3929088
> > ERROR: The filesystem has valuable metadata changes in a log which needs to
> > be replayed.  Mount the filesystem to replay the log, and unmount it before
> > re-running xfs_repair.  If you are unable to mount the filesystem, then use
> > the -L option to destroy the log and attempt a repair.
> > Note that destroying the log may cause corruption -- please attempt a mount
> 
> Have you tried to mount/umount the filesystem before zeroing the log? Zeroing
> the log is supposed to be used only as a last resort.
> 
> > 
> > The directory `lost+found` contains almost five million files
> > 
> >     # find lost+found | wc
> >     4859687 4859687 110985720
> 
> We have neither the whole xfs_repair output nor more information about the
> filesystem itself, but it looks like you had huge directory update(s) in your
> log which were not replayed, and all orphan inodes ended up in lost+found =/
> 
> > 
> > We saved the output of `xfs_repair`, but it’s over 500 MB in size, so we
> > cannot attach it.
> > 
> > `sudo xfs_metadump -go /dev/sdd sdd-metadump.dump` takes over 15 minutes
> > and the dump file is 8.8 GB in size.
> 
> At this point, xfs_metadump won't help much, since you have already repaired
> the filesystem.
> Also, why are you taking a metadump from /dev/sdd when the fs you tried to
> repair is a device-mapper device? Are you facing this issue on more than one
> filesystem?
> 
> > 
> > It’d be great, if you could give hints on debugging this issue further,
> > and comment, if you think it is possible to recover the files, that means,
> > to fix the log, so that it can be cleanly applied.
> 
> Unfortunately, you already got rid of the log, so you can't recover it anymore,
> but all the recovered files will be in lost+found, with their inode numbers as
> file names.
> 
> 
> Ok, so below is the dmesg, thanks for having attached it.
> 
> One thing is that there are 2 devices failing: sdd and dm-0. So my question
> again: is this the same filesystem, or are they 2 separate filesystems showing
> exactly the same issue? The filesystem has found corrupted inodes in the AG's
> unlinked bucket, but this shouldn't affect log recovery.
> 
> If they are two separate devices, did you run xfs_repair on both of them? After
> you repaired the filesystem(s), do you still see the memory corruption issue?
> 
> At this point, there is not much we can do regarding the filesystem metadata,
> since you already forced xfs_repair to zero the log.
> 
> So, could you please tell us the current state of the filesystem (or filesystems
> if there is more than one)? Are you still seeing the same memory corruption
> error even after running xfs_repair on it?
> 

FWIW, if you do still have an original copy of the fs, we could see
about whether bypassing the shutdown allows us to trade log recovery
failure for a space accounting error. This would still require a
subsequent repair, but that may be less invasive than zapping the log
and dealing with the aftermath of that.

Brian

> And for completeness, please provide us with as much information as possible
> about this (these) filesystem(s):
> 
> http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F
> 
> Cheers.
> 
> > [ 1380.869451] XFS (sdd): Mounting V5 Filesystem
> > [ 1380.912559] XFS (sdd): Starting recovery (logdev: internal)
> > [ 1381.030780] XFS (sdd): xfs_do_force_shutdown(0x8) called from line 368 of file fs/xfs/xfs_trans.c.  Return address = 00000000cfa623e1
> > [ 1381.030785] XFS (sdd): Corruption of in-memory data detected.  Shutting down filesystem
> > [ 1381.030786] XFS (sdd): Please umount the filesystem and rectify the problem(s)
> > [ 1381.031086] XFS (sdd): xlog_recover_clear_agi_bucket: failed to clear agi 0. Continuing.
> > [ 1381.031088] XFS (sdd): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
> > [ 1381.031090] XFS (sdd): xlog_recover_clear_agi_bucket: failed to clear agi 0. Continuing.
> > [ 1381.031093] XFS (sdd): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
> > [ 1381.031095] XFS (sdd): xlog_recover_clear_agi_bucket: failed to clear agi 0. Continuing.
> <...>
> > [ 1381.031113] XFS (sdd): Ending recovery (logdev: internal)
> > [ 1381.031490] XFS (sdd): Error -5 reserving per-AG metadata reserve pool.
> > [ 1381.031492] XFS (sdd): xfs_do_force_shutdown(0x8) called from line 548 of file fs/xfs/xfs_fsops.c.  Return address = 00000000217dbba5
> 
> > [ 2795.123228] XFS (dm-0): Ending recovery (logdev: internal)
> > [ 2795.231020] XFS (dm-0): Error -5 reserving per-AG metadata reserve pool.
> > [ 2795.231023] XFS (dm-0): xfs_do_force_shutdown(0x8) called from line 548 of file fs/xfs/xfs_fsops.c.  Return address = 00000000217dbba5
> 
> > [10944.023429] XFS (dm-0): Mounting V5 Filesystem
> > [10944.035260] XFS (dm-0): Ending clean mount
> > [11664.862376] XFS (dm-0): Unmounting Filesystem
> > [11689.260213] XFS (dm-0): Mounting V5 Filesystem
> > [11689.338187] XFS (dm-0): Ending clean mount
> 
> 
> 
> 
> -- 
> Carlos
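
For anyone following along with the quoted dmesg, a small script can make the
"one filesystem or two" question easier to eyeball by listing which devices hit
xfs_do_force_shutdown() and from which call site. This is only a rough sketch,
not an XFS tool: the function name shutdown_sites and the sample string are
made up for illustration, and the pattern assumes the message format shown in
the log lines quoted in this thread.

```python
import re
from collections import defaultdict

# Assumed format, taken from the log lines quoted in this thread:
#   XFS (sdd): xfs_do_force_shutdown(0x8) called from line 368 of file fs/xfs/xfs_trans.c.  Return address = ...
SHUTDOWN_RE = re.compile(
    r"XFS \((?P<dev>[^)]+)\): xfs_do_force_shutdown\((?P<flags>0x[0-9a-fA-F]+)\) "
    r"called from line (?P<line>\d+) of file (?P<file>\S+?)\.\s"
)


def shutdown_sites(dmesg_text):
    """Map each device name to the set of (file, line) shutdown call sites."""
    sites = defaultdict(set)
    for log_line in dmesg_text.splitlines():
        # search() rather than match(), so timestamps and mail quote
        # prefixes ("> > ") in front of the message are skipped over.
        m = SHUTDOWN_RE.search(log_line)
        if m:
            sites[m.group("dev")].add((m.group("file"), int(m.group("line"))))
    return dict(sites)


# Hypothetical sample, shortened from the log lines quoted above.
sample = """\
> > [ 1381.030780] XFS (sdd): xfs_do_force_shutdown(0x8) called from line 368 of file fs/xfs/xfs_trans.c.  Return address = 00000000cfa623e1
> > [ 1381.031492] XFS (sdd): xfs_do_force_shutdown(0x8) called from line 548 of file fs/xfs/xfs_fsops.c.  Return address = 00000000217dbba5
> > [ 2795.231023] XFS (dm-0): xfs_do_force_shutdown(0x8) called from line 548 of file fs/xfs/xfs_fsops.c.  Return address = 00000000217dbba5
"""

if __name__ == "__main__":
    for dev, call_sites in sorted(shutdown_sites(sample).items()):
        for file_name, line_no in sorted(call_sites):
            print(f"{dev}: {file_name}:{line_no}")
```

Grouping by device only shows where each shutdown was triggered; it cannot tell
whether sdd and dm-0 are backed by the same filesystem, so that question still
needs an answer from the reporter.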
