My XFS volume died, please help!

linux-xfs.vger.kernel.org archive mirror

* My XFS volume died, please help!
@ 2017-06-02 23:21 Péter András Felvégi
  2017-06-05  8:57 ` Carlos Maiolino
  2017-06-05 17:29 ` Eric Sandeen
From: Péter András Felvégi @ 2017-06-02 23:21 UTC
  To: linux-xfs

Hello,

after a power outage, the mount replayed the journal and no errors
were reported, but the mounted XFS volume had suspiciously little free
space. So I unmounted, and ran xfs_repair which crashed in phase 5
with a floating point exception. After that I was unable to mount
again, due to metadata corruption, so the repair made things worse.
Please help!

Linux 4.4.66 x86_64, Debian Jessie, xfsprogs 3.2.1

xfs_repair output:
Phase 5 - rebuild AG headers and trees...
        - agno = 0
traps: xfs_repair[4786] trap divide error ip:417fef sp:7ffe43770d60
error:0 in xfs_repair[400000+7a000]
Floating point exception

mount output:
XFS (dm-2): Mounting V5 Filesystem
XFS (dm-2): Corruption warning: Metadata has LSN (11:170112) ahead of
current LSN (1:64). Please unmount and run xfs_repair (>= v4.3) to
resolve.
XFS (dm-2): log mount/recovery failed: error -22
XFS (dm-2): log mount failed
mount: wrong fs type, bad option, bad superblock on /dev/mapper/storage-crypt,
       missing codepage or helper program, or other error

       In some cases useful info is found in syslog - try
       dmesg | tail or so.

I dumped the whole volume with dd so that I can revert to the current
(hopefully not hopelessly screwed) state. Downloaded xfsprogs 4.11.0
and checked the changelog. There are fixes for xfs_repair phase 5,
though the FP exception is not mentioned. Is this bug already fixed?
Should I try xfs_repair from 4.11?
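The full-volume dd copy described above is what makes every later experiment reversible. Below is a minimal sketch, assuming GNU coreutils dd and cmp. The helper name image_volume is hypothetical, the paths are placeholders, and the filesystem must not be mounted while it runs.

```shell
#!/usr/bin/env bash
# image_volume SRC DST: hypothetical helper that copies a whole (unmounted)
# block device or file to DST byte for byte, then verifies the copy.
image_volume() {
    local src=$1 dst=$2
    # bs=4M keeps the copy reasonably fast; conv=fsync flushes DST before
    # dd exits. Without conv=noerror, dd stops at the first read error
    # instead of continuing past unreadable sectors.
    dd if="$src" of="$dst" bs=4M conv=fsync status=none || return 1
    # Compare source and copy; cmp -s exits non-zero on any difference.
    cmp -s "$src" "$dst"
}

# Example (placeholder paths):
#   image_volume /dev/mapper/storage-crypt /backup/storage-crypt.img
# Reverting later is the same copy in the opposite direction.
```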

Thanks & kind regards, Peter

* Re: My XFS volume died, please help!
From: Carlos Maiolino @ 2017-06-05  8:57 UTC
  To: Péter András Felvégi; +Cc: linux-xfs

Hi,

On Sat, Jun 03, 2017 at 01:21:31AM +0200, Péter András Felvégi wrote:
> Hello,
> 
> after a power outage, the mount replayed the journal and no errors
> were reported, but the mounted XFS volume had suspiciously little free
> space. So I unmounted, and ran xfs_repair which crashed in phase 5
> with a floating point exception. After that I was unable to mount
> again, due to metadata corruption, so the repair made things worse.
> Please help!
> 
> Linux 4.4.66 x86_64, Debian Jessie, xfsprogs 3.2.1
> 
> xfs_repair output:
> Phase 5 - rebuild AG headers and trees...
>         - agno = 0
> traps: xfs_repair[4786] trap divide error ip:417fef sp:7ffe43770d60
> error:0 in xfs_repair[400000+7a000]
> Floating point exception
> 

Hmm, I've seen this before, but honestly I remember neither when nor what
happened with it.

> mount output:
> XFS (dm-2): Mounting V5 Filesystem
> XFS (dm-2): Corruption warning: Metadata has LSN (11:170112) ahead of
> current LSN (1:64). Please unmount and run xfs_repair (>= v4.3) to
> resolve.
> XFS (dm-2): log mount/recovery failed: error -22
> XFS (dm-2): log mount failed
> mount: wrong fs type, bad option, bad superblock on /dev/mapper/storage-crypt,
>        missing codepage or helper program, or other error
> 
>        In some cases useful info is found in syslog - try
>        dmesg | tail or so.
> 
> I dumped the whole volume with dd so that I can revert to the current
> (hopefully not hopelessly screwed) state. Downloaded xfsprogs 4.11.0
> and checked the changelog. There are fixes for xfs_repair phase 5,
> though the FP exception is not mentioned. Is this bug already fixed?
> Should I try xfs_repair from 4.11?
>

Yes, you might want to run the latest xfs_repair on it. If you are afraid of
making things worse, you can take a metadump of your current broken fs and
run xfs_repair on that to see what it finds, or run xfs_repair with -n to see
what it would find in your FS.
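According to the xfs_repair(8) man page, a no-modify run (-n) exits with status 1 when it detects corruption and with 0 when it finds none, so the dry run can be scripted. Below is a minimal sketch. The helper name dry_run is hypothetical. It takes the command to run as its arguments, so it can be tried without a real device.

```shell
#!/usr/bin/env bash
# dry_run CMD...: hypothetical wrapper that runs a no-modify check and
# summarizes its exit status. CMD is normally: xfs_repair -n DEVICE
dry_run() {
    local rc=0
    # Send the checker's own report to stderr so stdout carries only the verdict.
    "$@" >&2 || rc=$?
    case $rc in
        0) echo "clean" ;;
        1) echo "corruption detected" ;;
        # Anything else is a failure of the check itself; a crash such as
        # the SIGFPE above shows up in bash as 128 + 8 = 136.
        *) echo "check failed (exit $rc)" ;;
    esac
    return "$rc"
}

# Example (placeholder device; it must be unmounted):
#   dry_run xfs_repair -n /dev/dm-2
```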

cheers

> Thanks & kind regards, Peter
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Carlos

* Re: My XFS volume died, please help!
From: Eric Sandeen @ 2017-06-05 17:29 UTC
  To: Péter András Felvégi, linux-xfs

On 6/2/17 6:21 PM, Péter András Felvégi wrote:
> Hello,
> 
> after a power outage, the mount replayed the journal and no errors
> were reported, but the mounted XFS volume had suspiciously little free
> space.

Did it happen to be your root filesystem, or some other?

> So I unmounted, and ran xfs_repair which crashed in phase 5

Did it happen to find corruption before that?

> with a floating point exception. After that I was unable to mount
> again, due to metadata corruption, so the repair made things worse.
> Please help!
> 
> Linux 4.4.66 x86_64, Debian Jessie, xfsprogs 3.2.1

3 years old :(

> xfs_repair output:
> Phase 5 - rebuild AG headers and trees...
>         - agno = 0
> traps: xfs_repair[4786] trap divide error ip:417fef sp:7ffe43770d60
> error:0 in xfs_repair[400000+7a000]
> Floating point exception

> mount output:
> XFS (dm-2): Mounting V5 Filesystem
> XFS (dm-2): Corruption warning: Metadata has LSN (11:170112) ahead of
> current LSN (1:64). Please unmount and run xfs_repair (>= v4.3) to
> resolve.

I'd follow the advice as printed, and you'll probably be fine.
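Each LSN in that warning is printed as cycle:block. One LSN is "ahead" of another when its cycle is higher, or when the cycles are equal and its block is higher. Here the metadata carries cycle 11 while the log restarts at cycle 1. That fits the earlier repair runs having zeroed the log, although this is an inference from the messages in this thread. Below is a minimal bash sketch of the comparison. The helper name lsn_ahead is hypothetical and is not part of xfsprogs.

```shell
#!/usr/bin/env bash
# lsn_ahead A B: hypothetical helper; succeeds (exit 0) when LSN A is ahead
# of LSN B. Both are given as "cycle:block", e.g. 11:170112.
lsn_ahead() {
    local a_cycle=${1%%:*} a_block=${1#*:}
    local b_cycle=${2%%:*} b_block=${2#*:}
    # Compare cycles first; blocks only break a tie.
    if (( a_cycle != b_cycle )); then
        (( a_cycle > b_cycle ))
    else
        (( a_block > b_block ))
    fi
}

# The values from the mount warning above:
if lsn_ahead 11:170112 1:64; then
    echo "metadata LSN is ahead of the log"
fi
```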

If you want a test run, do what Carlos suggested, after getting
a modern xfsprogs package:

# xfs_metadump -o /dev/dm-2 - | xfs_mdrestore - dm-2.img
# xfs_repair dm-2.img
# mkdir -p mnt/
# mount -o loop dm-2.img mnt/

and see how that looks (metadata only, the image created
above will contain no file data)

If current xfsprogs also segfaults, please provide a metadump
image:

# xfs_metadump /dev/dm-2 dm-2.metadump
# bzip2 dm-2.metadump

and provide the result to me or Carlos (on a side channel, out
of an abundance of caution for privacy).

-Eric

> XFS (dm-2): log mount/recovery failed: error -22
> XFS (dm-2): log mount failed
> mount: wrong fs type, bad option, bad superblock on /dev/mapper/storage-crypt,
>        missing codepage or helper program, or other error
> 
>        In some cases useful info is found in syslog - try
>        dmesg | tail or so.


> I dumped the whole volume with dd so that I can revert to the current
> (hopefully not hopelessly screwed) state. Downloaded xfsprogs 4.11.0
> and checked the changelog. There are fixes for xfs_repair phase 5,
> though the FP exception is not mentioned. Is this bug already fixed?
> Should I try xfs_repair from 4.11?
> 
> Thanks & kind regards, Peter

* Re: My XFS volume died, please help!
From: Péter András Felvégi @ 2017-06-07 11:11 UTC
  To: linux-xfs

Hello,

On 05/06/2017, Eric Sandeen <sandeen@sandeen.net> wrote:
> On 6/2/17 6:21 PM, Péter András Felvégi wrote:
>> Hello,
>>
>> after a power outage, the mount replayed the journal and no errors
>> were reported, but the mounted XFS volume had suspiciously little free
>> space.
>
> Did it happen to be your root filesystem, or some other?

Some other one. It's a RAID+LVM+LUKS volume.

>> So I unmounted, and ran xfs_repair which crashed in phase 5
>
> Did it happen to find corruption before that?

I ran xfs_repair twice; it crashed in phase 5 each time with the same
message. I don't recall it finding corruption, but the log zeroing
printed different numbers.

If I run xfs_repair 4.11 with -n, it skips phase 5 altogether :S

> # xfs_metadump -o /dev/dm-2 - | xfs_mdrestore - dm-2.img

Metadata corruption detected at block 0x2dd04801/0x200
xfs_metadump: cannot init perag data (117). Continuing anyway.
Metadata CRC error detected at block 0x20b95802/0x200
Metadata CRC error detected at block 0x23ff1402/0x200
Metadata CRC error detected at block 0x2744d002/0x200
Metadata CRC error detected at block 0x2a8a8c02/0x200
Metadata corruption detected at block 0x2dd04801/0x200
Metadata CRC error detected at block 0x2dd04802/0x200
Metadata CRC error detected at block 0x31160402/0x200
Metadata CRC error detected at block 0x345bc002/0x200
Metadata corruption detected at block 0x37a17c01/0x200
Metadata CRC error detected at block 0x37a17c02/0x200
Metadata CRC error detected at block 0x3ae73802/0x200
Metadata CRC error detected at block 0x3e2cf402/0x200
Metadata CRC error detected at block 0x4172b002/0x200
Metadata CRC error detected at block 0x44b86c02/0x200
Metadata CRC error detected at block 0x47fe2802/0x200
Metadata CRC error detected at block 0x4b43e402/0x200
Metadata CRC error detected at block 0x4e89a002/0x200
Metadata CRC error detected at block 0x51cf5c02/0x200
Metadata CRC error detected at block 0x55151802/0x200
Metadata CRC error detected at block 0x585ad402/0x200
Metadata CRC error detected at block 0x5ba09002/0x200
Metadata CRC error detected at block 0x5ee64c02/0x200
Metadata CRC error detected at block 0x622c0802/0x200
Metadata CRC error detected at block 0x6571c402/0x200

Isn't this too many blocks for this kind of failure? There was no
activity on the volume except for compressing a single large file.
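The repeated addresses follow a regular pattern. Assume that each "block 0xADDR/0xLEN" pair is a disk address in 512-byte units followed by a length in bytes, so that 0x200 is one 512-byte sector. Consecutive addresses then step by a constant 0x345bc00 units. That would make them the same header sectors in consecutive allocation groups. With 512-byte sectors, sector 1 of an AG holds the AGF and sector 2 holds the AGI. These are plausibly the per-AG headers that phase 5 ("rebuild AG headers and trees") rewrites. The AG size used below is inferred from that spacing and was not read from the filesystem. xfs_info reports the real agsize, in filesystem blocks. Below is a minimal bash sketch with a hypothetical helper, daddr_to_ag.

```shell
#!/usr/bin/env bash
# daddr_to_ag DADDR AGSIZE_BB: hypothetical helper that splits a disk address
# (in 512-byte basic blocks) into an allocation group number and the sector
# offset inside that AG. AGSIZE_BB is the AG size in 512-byte units.
daddr_to_ag() {
    local daddr=$(( $1 )) agsize=$(( $2 ))   # bash arithmetic accepts 0x... hex
    echo "ag=$(( daddr / agsize )) sector=$(( daddr % agsize ))"
}

# AG size inferred from the spacing of two consecutive addresses in the
# output above (an assumption, not a value read from this filesystem):
agsize_bb=$(( 0x23ff1402 - 0x20b95802 ))    # 0x345bc00

for addr in 0x20b95802 0x2dd04801 0x2dd04802 0x6571c402; do
    echo "$addr -> $(daddr_to_ag "$addr" "$agsize_bb")"
done
# 0x20b95802 -> ag=10 sector=2
# 0x2dd04801 -> ag=14 sector=1
# 0x2dd04802 -> ag=14 sector=2
# 0x6571c402 -> ag=31 sector=2
```

Under these assumptions, the list covers a header sector in each of AGs 10 through 31, rather than scattered damage across the volume.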

> # xfs_repair dm-2.img
Phase 5 ran without error. After Phase 7 it printed this:
Maximum metadata LSN (11:170112) is ahead of log (1:64).
Format log to cycle 14.
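These two lines show how repair resolved the mismatch from the mount warning. It reformatted the log to cycle 14, three cycles past the highest metadata cycle of 11, so the metadata LSNs are no longer ahead of the log. That is consistent with the loop mount succeeding afterwards. Below is a small bash sketch that pulls the numbers out of the warning line so they can be checked in a script. The helper parse_lsn_warning is hypothetical, and the regex only matches the exact wording quoted above.

```shell
#!/usr/bin/env bash
# parse_lsn_warning LINE: hypothetical helper that extracts the four numbers
# from xfs_repair's "Maximum metadata LSN (C:B) is ahead of log (C:B)." line.
parse_lsn_warning() {
    local re='Maximum metadata LSN \(([0-9]+):([0-9]+)\) is ahead of log \(([0-9]+):([0-9]+)\)'
    [[ $1 =~ $re ]] || return 1    # not the line we are looking for
    echo "meta_cycle=${BASH_REMATCH[1]} meta_block=${BASH_REMATCH[2]}" \
         "log_cycle=${BASH_REMATCH[3]} log_block=${BASH_REMATCH[4]}"
}

parse_lsn_warning "Maximum metadata LSN (11:170112) is ahead of log (1:64)."
# meta_cycle=11 meta_block=170112 log_cycle=1 log_block=64
```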

> # mkdir -p mnt/
> # mount -o loop dm-2.img mnt/

> and see how that looks (metadata only, the image created
> above will contain no file data)

Seems OK :)

After running the repair on dm-2, the mount failed and my heart sank,
but luckily this was just due to duplicate fs UUIDs, as the restored
metadata image was still mounted.

So the fs is mounted, everything looks OK so far.

THANK YOU!

Kind regards, Peter
