From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Brian Foster <bfoster@redhat.com>
Cc: Paul Menzel <pmenzel@molgen.mpg.de>,
linux-xfs@vger.kernel.org, it+linux-xfs@molgen.mpg.de
Subject: Re: Corruption of in-memory data detected. Shutting down filesystem
Date: Mon, 18 Feb 2019 09:32:51 -0800
Message-ID: <20190218173251.GF6503@magnolia>
In-Reply-To: <20190218161746.GE33924@bfoster>
On Mon, Feb 18, 2019 at 11:17:46AM -0500, Brian Foster wrote:
> On Mon, Feb 18, 2019 at 04:08:00PM +0100, Paul Menzel wrote:
> > Dear Brian, dear Carlos,
> >
> >
> > Thank you very much for your replies.
> >
> >
> > On 02/18/19 15:34, Brian Foster wrote:
> > > On Mon, Feb 18, 2019 at 03:22:03PM +0100, Carlos Maiolino wrote:
> >
> > >>> [ 25.506600] XFS (sdd): Mounting V5 Filesystem
> > >>> [ 25.629621] XFS (sdd): Starting recovery (logdev: internal)
> > >>> [ 25.685100] NFSD: starting 90-second grace period (net f0000098)
> > >>> [ 26.433828] XFS (sdd): xfs_do_force_shutdown(0x8) called from line 368 of file fs/xfs/xfs_trans.c. Return address = 00000000cfa623e1
> > >>> [ 26.433834] XFS (sdd): Corruption of in-memory data detected. Shutting down filesystem
> > >>> [ 26.433835] XFS (sdd): Please umount the filesystem and rectify the problem(s)
> > >>> [ 26.433857] XFS (sdd): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
> > >>>
> > >> Ok, the filesystem shut itself down, likely because blocks allocated in the
> > >> transaction exceeded the reservation.
> >
> > >>> We mounted it with an overlay files,
> > >>
> > >> I'm not sure what you meant here; could you please specify what you meant by
> > >> 'overlay files'? Are you using this XFS filesystem as an upper/lower FS for
> > >> overlayfs?
> >
> > To avoid touching the original data on `/dev/sdd` after the error, we created
> > an overlay FS to test xfs_repair and analyze the effect.
> >
> > ```
> > # more /scratch/local3/mkoverlay.sh
> > dev=/dev/sdd
> > ovl=/scratch/local3/X4042-overlay.file
> > newdevname=sddovl
> > size=$(blockdev --getsz "$dev")
> >
> > # Attach the overlay file to a loop device, then stack a dm snapshot on
> > # top of it so all writes land in the overlay and $dev stays untouched.
> > loop=$(losetup -f --show -- "$ovl")
> > table="0 $size snapshot $dev $loop P 8"
> > printf '%s\n' "$table"
> > printf '%s\n' "$table" | dmsetup create "$newdevname"
> > # lsblk
> > NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
> > loop0 7:0 0 10G 0 loop
> > loop1 7:1 0 10G 0 loop
> > └─sddovl 252:0 0 51T 0 dm
> > sda 8:0 0 931.5G 0 disk
> > ├─sda1 8:1 0 256G 0 part /
> > └─sda2 8:2 0 675.5G 0 part /amd/tinytina/0
> > sdb 8:16 0 931.5G 0 disk /amd/tinytina/1
> > sdc 8:32 0 27.3T 0 disk /amd/tinytina/2
> > sdd 8:48 0 51T 0 disk
> > └─sddovl 252:0 0 51T 0 dm
> > sde 8:64 0 51T 0 disk /amd/tinytina/X/X4075
> > sdf 8:80 0 51T 0 disk /amd/tinytina/X/X4076
> > ```
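> >
> > Once testing is done, the overlay should be removable with something along
> > these lines (sddovl and loop1 being the names from the lsblk output above):
> >
> > ```
> > # Remove the dm snapshot first, then detach the loop device backing
> > # the overlay file.
> > dmsetup remove sddovl
> > losetup -d /dev/loop1
> > ```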
> >
> > >>> and the xfs_repair shows the summary below.
> > >>>
> > >>> ```
> > >>> # xfs_repair -vv /dev/mapper/sddovl
> > >>> - block cache size set to 4201400 entries
> > >>> Phase 2 - using internal log
> > >>> - zero log...
> > >>> zero_log: head block 3930112 tail block 3929088
> > >>> ERROR: The filesystem has valuable metadata changes in a log which needs to
> > >>> be replayed. Mount the filesystem to replay the log, and unmount it before
> > >>> re-running xfs_repair. If you are unable to mount the filesystem, then use
> > >>> the -L option to destroy the log and attempt a repair.
> > >>> Note that destroying the log may cause corruption -- please attempt a mount
> > >>
> > >> Have you tried to mount/umount the filesystem before zeroing the log?
> > >> Zeroing out the log is supposed to be a last resort.
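> > >>
> > >> (Roughly: mount it somewhere so the kernel replays the log, unmount it,
> > >> then do a dry-run repair to see what is left, e.g. something like
> > >>
> > >>   mount /dev/sdd /mnt && umount /mnt
> > >>   xfs_repair -n /dev/sdd
> > >>
> > >> and only reach for -L if mounting, and thus log replay, fails.)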
> >
> > Yes, we tried that, and were told to run `xfs_repair`.
> >
> > >>> The directory `lost+found` contains almost five million files
> > >>>
> > >>> # find lost+found | wc
> > >>> 4859687 4859687 110985720
> > >>
> > >> We have neither the whole xfs_repair output nor more information about the
> > >> filesystem itself, but it looks like you had huge directory updates in your
> > >> log which were not replayed, and all the orphaned inodes ended up in lost+found =/
> >
> > I’ll see if I can share the 500 MB xfs_repair output.
> >
> > ```
> > $ lsblk /dev/sdd
> > NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
> > sdd 8:48 0 51T 0 disk
> > $ sudo xfs_info /dev/sdd
> > Password (for pmenzel) :
> > meta-data=/dev/sdd isize=512 agcount=51, agsize=268435392 blks
> > = sectsz=512 attr=2, projid32bit=1
> > = crc=1 finobt=1, sparse=0, rmapbt=0
> > = reflink=0
> > data = bsize=4096 blocks=13671838208, imaxpct=1
> > = sunit=64 swidth=896 blks
> > naming =version 2 bsize=4096 ascii-ci=0, ftype=1
> > log =internal log bsize=4096 blocks=521728, version=2
> > = sectsz=512 sunit=64 blks, lazy-count=1
> > realtime =none extsz=4096 blocks=0, rtextents=0
> > ```
> >
> > >>> We saved the output of `xfs_repair`, but it’s over 500 MB in size, so we
> > >>> cannot attach it.
> > >>>
> > >>> `sudo xfs_metadump -go /dev/sdd sdd-metadump.dump` takes over 15 minutes
> > >>> and the dump files is 8.8 GB in size.
> > >>
> > >> At this point, xfs_metadump won't help much once you already repaired the
> > >> filesystem.
> >
> > As written, we only ran the repair on the overlay.
> >
> > >> Although, why are you getting a metadump from /dev/sdd, when the fs you tried to
> > >> repair is a device-mapper device? Are you facing this issue in more than one
> > >> filesystem?
> >
> > As written above, that was done to avoid touching `/dev/sdd`, so the metadata
> > dump should still be usable.
> >
> > >>> It’d be great if you could give hints on debugging this issue further,
> > >>> and comment on whether you think it is possible to recover the files,
> > >>> that is, to fix the log so that it can be cleanly applied.
> > >>
> > >> Unfortunately, you already got rid of the log, so you can't recover it
> > >> anymore, but all the recovered files will be in lost+found, with their
> > >> inode numbers as file names.
> > >>
> > >> Ok, so below is the dmesg, thanks for having attached it.
> > >>
> > >> One thing is that there are two devices failing, sdd and dm-0. So my question
> > >> again: is this the same filesystem, or are these two separate filesystems
> > >> showing exactly the same issue? The filesystem has found corrupted inodes in
> > >> the AG's unlinked bucket, but this shouldn't affect log recovery.
> > >>
> > >> If they are two separate devices, did you xfs_repair both of them? After you
> > >> repaired the filesystem(s), do you still see the in-memory corruption issue?
> >
> > No, we do not see the corruption issue anymore, and mounting works as expected.
> >
> > >> At this point, there is not much we can do regarding the filesystem metadata,
> > >> now that you have already forced an xfs_repair that zeroed the log.
> > >>
> > >> So, could you please tell us the current state of the filesystem (or
> > >> filesystems, if there is more than one)? Are you still seeing the same
> > >> in-memory corruption error even after running xfs_repair?
> >
> > The user wants to continue using the device and does not need the recovered
> > files, so we had to go ahead and repair `/dev/sde` now too. We are only
> > interested in having this bug fixed in XFS, not in data recovery.
> >
>
> Well, if the metadump is still available and is based on the original
> (broken) fs, then it may let us reproduce the original problem. How
> large is it compressed? Would you be able to upload it somewhere?
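>
> (FWIW, metadumps tend to compress quite well, so even the 8.8G dump
> might shrink to something uploadable. Something along the lines of
>
>   xz -T0 sdd-metadump.dump
>
> should do; the file name here is just the one from the xfs_metadump
> command quoted above.)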
>
> Brian
>
> > > FWIW, if you do still have an original copy of the fs, we could see
> > > about whether bypassing the shutdown allows us to trade log recovery
> > > failure for a space accounting error. This would still require a
> > > subsequent repair, but that may be less invasive than zapping the log
> > > and dealing with the aftermath of that.
> >
> > Unfortunately, it’s 50 TB, so we do not have a copy anymore.
> >
> > >> And for completeness, please provide us with as much information as possible
> > >> from this(these) filesystem(s):
> > >>
> > >> http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F
> >
> > /dev/sdd /amd/tinytina/X/X4074 xfs rw,relatime,attr2,inode64,logbsize=256k,sunit=512,swidth=7168,noquota 0 0
> >
> > >>> [ 1380.869451] XFS (sdd): Mounting V5 Filesystem
> > >>> [ 1380.912559] XFS (sdd): Starting recovery (logdev: internal)
> > >>> [ 1381.030780] XFS (sdd): xfs_do_force_shutdown(0x8) called from line 368 of file fs/xfs/xfs_trans.c. Return address = 00000000cfa623e1
<groan> Funny, I saw this exact same thing a week ago. We really ought
to record a stack trace when we cancel dirty transactions.
Hmm, so leaking millions of nfs file handles ... were they temporary or
deleted files?
I bet I know exactly what caused this -- freeing unlinked inodes that
required a free inode btree expansion, but m_inotbt_nores was
incorrectly set during log recovery so we didn't reserve any blocks and
blew up.
Relevant patches that might fix this:
"xfs: rename m_inotbt_nores to m_finobt_nores"
"xfs: reserve blocks for ifree transaction during log recovery"
(As mentioned earlier, there's not much we can do once the log has been
zeroed.)
--D
> > >>> [ 1381.030785] XFS (sdd): Corruption of in-memory data detected. Shutting down filesystem
> > >>> [ 1381.030786] XFS (sdd): Please umount the filesystem and rectify the problem(s)
> > >>> [ 1381.031086] XFS (sdd): xlog_recover_clear_agi_bucket: failed to clear agi 0. Continuing.
> > >>> [ 1381.031088] XFS (sdd): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
> > >>> [ 1381.031090] XFS (sdd): xlog_recover_clear_agi_bucket: failed to clear agi 0. Continuing.
> > >>> [ 1381.031093] XFS (sdd): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
> > >>> [ 1381.031095] XFS (sdd): xlog_recover_clear_agi_bucket: failed to clear agi 0. Continuing.
> > >> <...>
> > >>> [ 1381.031113] XFS (sdd): Ending recovery (logdev: internal)
> > >>> [ 1381.031490] XFS (sdd): Error -5 reserving per-AG metadata reserve pool.
> > >>> [ 1381.031492] XFS (sdd): xfs_do_force_shutdown(0x8) called from line 548 of file fs/xfs/xfs_fsops.c. Return address = 00000000217dbba5
> > >>
> > >>> [ 2795.123228] XFS (dm-0): Ending recovery (logdev: internal)
> > >>> [ 2795.231020] XFS (dm-0): Error -5 reserving per-AG metadata reserve pool.
> > >>> [ 2795.231023] XFS (dm-0): xfs_do_force_shutdown(0x8) called from line 548 of file fs/xfs/xfs_fsops.c. Return address = 00000000217dbba5
> > >>
> > >>> [10944.023429] XFS (dm-0): Mounting V5 Filesystem
> > >>> [10944.035260] XFS (dm-0): Ending clean mount
> > >>> [11664.862376] XFS (dm-0): Unmounting Filesystem
> > >>> [11689.260213] XFS (dm-0): Mounting V5 Filesystem
> > >>> [11689.338187] XFS (dm-0): Ending clean mount
> >
> > As written, data recovery is not needed anymore, but if it’d help improve the
> > XFS driver, we could share the logs we have (after making sure they are
> > obfuscated).
> >
> > Please tell us if you are interested, and which dumps, xfs_repair output, and
> > so on would be needed.
> >
> >
> > Kind regards,
> >
> > Paul
> >
>
>