* XFS corruption after power surge/outage
From: Jorge Garcia @ 2024-02-09 18:39 UTC
To: linux-xfs

Hello,

We have a server with a very large (300+ TB) XFS filesystem that we use to provide downloads to the world. Last week's storms in California damaged our machine room, causing unexpected power surges and outages even in our UPS- and generator-backed data center. One of the end results was some data corruption on our server (running CentOS 8). After looking around the internet for solutions to our issues, the general consensus seemed to be to run xfs_repair on the filesystem to get it to recover. We tried that (xfs_repair V 5.0) and it seemed to report lots of issues before eventually failing during "Phase 6" with an error like:

Metadata corruption detected at 0x46d6c4, inode 0x8700657ff8 dinode

fatal error -- couldn't map inode 579827236856, err = 117

After another set of internet searches, we found some postings that suggested this could be a bug that may have been fixed in later versions, so we built xfs_repair V 6.5 and tried the repair again. The results were the same. We even tried "xfs_repair -L", and no joy. So now we're desperate. Is the data all lost? We can't mount the filesystem. We tried using xfs_metadump (another suggestion from our searches) and it reports lots of metadata corruption, ending with:

Metadata corruption detected at 0x4382f0, xfs_cntbt block 0x1300023518/0x1000
Metadata corruption detected at 0x4382f0, xfs_cntbt block 0x1300296bf8/0x1000
Metadata corruption detected at 0x4382f0, xfs_bnobt block 0x137fffb258/0x1000
Metadata corruption detected at 0x4382f0, xfs_bnobt block 0x138009ebd8/0x1000
Metadata corruption detected at 0x467858, xfs_inobt block 0x138067f550/0x1000
Metadata corruption detected at 0x467858, xfs_inobt block 0x13834b39e0/0x1000
xfs_metadump: bad starting inode offset 5

Not sure what to try next. Any help would be greatly appreciated. Thanks!

Jorge
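As a point of reference for the metadump discussion that follows, a dump for developer analysis is usually generated along these lines. The output path is a placeholder, and the -g/-o/-w flags (progress, no name obfuscation, print warnings) should be checked against the xfs_metadump man page for the xfsprogs version in use:

# xfs_metadump -g -o -w /dev/sda1 /some/other/disk/sda1.metadump
# xfs_mdrestore /some/other/disk/sda1.metadump /some/other/disk/sda1.img

xfs_mdrestore turns the metadata-only dump back into a sparse image that xfs_repair or xfs_db can be pointed at without touching the original device.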
* Re: XFS corruption after power surge/outage
From: Eric Sandeen @ 2024-02-11 20:39 UTC
To: Jorge Garcia, linux-xfs

On 2/9/24 12:39 PM, Jorge Garcia wrote:
> Hello,
>
> We have a server with a very large (300+ TB) XFS filesystem that we use to provide downloads to the world. Last week's storms in California damaged our machine room, causing unexpected power surges and outages even in our UPS- and generator-backed data center. One of the end results was some data corruption on our server (running CentOS 8). After looking around the internet for solutions to our issues, the general consensus seemed to be to run xfs_repair on the filesystem to get it to recover. We tried that (xfs_repair V 5.0) and it seemed to report lots of issues before eventually failing during "Phase 6" with an error like:
>
> Metadata corruption detected at 0x46d6c4, inode 0x8700657ff8 dinode
>
> fatal error -- couldn't map inode 579827236856, err = 117
>
> After another set of internet searches, we found some postings that suggested this could be a bug that may have been fixed in later versions, so we built xfs_repair V 6.5 and tried the repair again. The results were the same. We even tried "xfs_repair -L", and no joy. So now we're desperate. Is the data all lost? We can't mount the filesystem. We tried using xfs_metadump (another suggestion from our searches) and it reports lots of metadata corruption, ending with:

I was going to suggest creating an xfs_metadump image for analysis. Was that created with xfsprogs v6.5.0 as well?

> Metadata corruption detected at 0x4382f0, xfs_cntbt block 0x1300023518/0x1000
> Metadata corruption detected at 0x4382f0, xfs_cntbt block 0x1300296bf8/0x1000
> Metadata corruption detected at 0x4382f0, xfs_bnobt block 0x137fffb258/0x1000
> Metadata corruption detected at 0x4382f0, xfs_bnobt block 0x138009ebd8/0x1000
> Metadata corruption detected at 0x467858, xfs_inobt block 0x138067f550/0x1000
> Metadata corruption detected at 0x467858, xfs_inobt block 0x13834b39e0/0x1000
> xfs_metadump: bad starting inode offset 5

So the metadump did not complete?

Does the filesystem mount? Can you mount it -o ro or -o ro,norecovery to see how much you can read off of it?

If mount fails, what is in the kernel log when it fails?

> Not sure what to try next. Any help would be greatly appreciated. Thanks!

Power losses really should not cause corruption; it's a metadata journaling filesystem which should maintain consistency even with a power loss.

What kind of storage do you have, though? Corruption after a power loss often stems from a filesystem on a RAID with a write cache that does not honor data integrity commands and/or does not have its own battery backup.

-Eric

> Jorge
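A minimal sketch of the read-only mount attempts Eric is asking about, using the device and mount point that appear later in the thread; norecovery skips log replay, so it is only suitable for salvage, not for normal use:

# mount -o ro /dev/sda1 /data
# mount -o ro,norecovery /dev/sda1 /data
# dmesg | tail -50          # capture the XFS messages if either mount fails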
* Re: XFS corruption after power surge/outage
From: Jorge Garcia @ 2024-02-12 18:07 UTC
To: Eric Sandeen
Cc: linux-xfs

On Sun, Feb 11, 2024 at 12:39 PM Eric Sandeen <sandeen@sandeen.net> wrote:

> I was going to suggest creating an xfs_metadump image for analysis.
> Was that created with xfsprogs v6.5.0 as well?
>
> So the metadump did not complete?

I actually tried running xfs_metadump with both v5.0 and v6.5.0. They both gave many error messages, but they created files. Not sure what I can do with those files.

> Does the filesystem mount? Can you mount it -o ro or -o ro,norecovery
> to see how much you can read off of it?

The file system doesn't mount. The message when I try to mount it is:

mount: /data: wrong fs type, bad option, bad superblock on /dev/sda1, missing codepage or helper program, or other error.

and

Feb 12 10:06:02 hgdownload1 kernel: XFS (sda1): Superblock has unknown incompatible features (0x10) enabled.
Feb 12 10:06:02 hgdownload1 kernel: XFS (sda1): Filesystem cannot be safely mounted by this kernel.
Feb 12 10:06:02 hgdownload1 kernel: XFS (sda1): SB validate failed with error -22.

I wonder if that is because I tried an xfs_repair with a newer version...

> If mount fails, what is in the kernel log when it fails?
>
> Power losses really should not cause corruption; it's a metadata journaling
> filesystem which should maintain consistency even with a power loss.
>
> What kind of storage do you have, though? Corruption after a power loss often
> stems from a filesystem on a RAID with a write cache that does not honor
> data integrity commands and/or does not have its own battery backup.

We have a RAID 6 card with a BBU:

Product Name     : AVAGO MegaRAID SAS 9361-8i
Serial No        : SK00485396
FW Package Build : 24.21.0-0017

I agree that power issues should not cause corruption, but here we are.

Somewhere on one of the discussion threads I saw somebody mention ufsexplorer, and when I downloaded the trial version, it seemed to see most of the files on the device. I guess if I can't find a way to recover the current filesystem, I will try to use that to recover the data.
* Re: XFS corruption after power surge/outage
From: Dave Chinner @ 2024-02-12 21:06 UTC
To: Jorge Garcia
Cc: Eric Sandeen, linux-xfs

On Mon, Feb 12, 2024 at 10:07:33AM -0800, Jorge Garcia wrote:
> On Sun, Feb 11, 2024 at 12:39 PM Eric Sandeen <sandeen@sandeen.net> wrote:
>
> > I was going to suggest creating an xfs_metadump image for analysis.
> > Was that created with xfsprogs v6.5.0 as well?
> >
> > So the metadump did not complete?
>
> I actually tried running xfs_metadump with both v5.0 and v6.5.0. They
> both gave many error messages, but they created files. Not sure what I
> can do with those files.

Nothing - they are incomplete, as metadump aborted when it got that error.

> > Does the filesystem mount? Can you mount it -o ro or -o ro,norecovery
> > to see how much you can read off of it?
>
> The file system doesn't mount. The message when I try to mount it is:
>
> mount: /data: wrong fs type, bad option, bad superblock on /dev/sda1,
> missing codepage or helper program, or other error.
>
> and
>
> Feb 12 10:06:02 hgdownload1 kernel: XFS (sda1): Superblock has unknown
> incompatible features (0x10) enabled.
> Feb 12 10:06:02 hgdownload1 kernel: XFS (sda1): Filesystem cannot be
> safely mounted by this kernel.
> Feb 12 10:06:02 hgdownload1 kernel: XFS (sda1): SB validate failed
> with error -22.

That has the XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR bit set...

> I wonder if that is because I tried an xfs_repair with a newer version...

.... which is a result of xfs_repair 6.5.0 crashing midway through repair of the filesystem. Your kernel is too old to recognise the NEEDSREPAIR bit. You can clear it with xfs_db like this:

Run this to get the current field value:

# xfs_db -c "sb 0" -c "p features_incompat" <dev>

Then subtract 0x10 from the value returned and run:

# xfs_db -c "sb 0" -c "write features_incompat <val>" <dev>

But that won't get you too far - the filesystem is still corrupt and inconsistent. By blowing away the log with xfs_repair before actually determining if the problem was caused by a RAID array issue, you've essentially forced yourself into a filesystem recovery situation.

> > If mount fails, what is in the kernel log when it fails?
>
> > Power losses really should not cause corruption; it's a metadata journaling
> > filesystem which should maintain consistency even with a power loss.
> >
> > What kind of storage do you have, though? Corruption after a power loss often
> > stems from a filesystem on a RAID with a write cache that does not honor
> > data integrity commands and/or does not have its own battery backup.
>
> We have a RAID 6 card with a BBU:
>
> Product Name     : AVAGO MegaRAID SAS 9361-8i
> Serial No        : SK00485396
> FW Package Build : 24.21.0-0017

Ok, so they don't actually have a BBU on board - it's an option to add via a module, but the basic RAID controller doesn't have any power failure protection. These cards are also pretty old tech now - how old is this card, and when was the last time the cache protection module was tested?

Indeed, how long was the power out for? The BBU on most RAID controllers is only guaranteed to hold the state for 72 hours (when new) and I've personally seen them last for only a few minutes before dying when the RAID controller had been in continuous service for ~5 years. So the duration of the power failure may be important here.

Also, how are the back end disks configured? Do they have their volatile write caches turned off? What cache mode was the RAID controller operating in - write-back or write-through?

What's the rest of your storage stack? Do you have MD, LVM, etc. between the storage hardware and the filesystem?

> I agree that power issues should not cause corruption, but here we
> are.

Yup. Keep in mind that we do occasionally see these old LSI hardware RAID cards corrupt storage on power failure, so we're not necessarily even looking for filesystem problems at this point in time. We need to rule that out first before doing any more damage to the filesystem than you've already done trying to recover it so far...

> Somewhere on one of the discussion threads I saw somebody mention
> ufsexplorer, and when I downloaded the trial version, it seemed to see
> most of the files on the device. I guess if I can't find a way to
> recover the current filesystem, I will try to use that to recover the
> data.

Well, that's a last resort. But if your RAID controller is unhealthy or the volume has been corrupted by the RAID controller, ufsexplorer won't help you get your data back, either....

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
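A worked example of the xfs_db sequence Dave gives above, using a purely hypothetical starting value; depending on the xfsprogs version the field may be printed in decimal rather than hex, and the write may require xfs_db's expert mode (-x), so treat this strictly as a sketch:

# xfs_db -c "sb 0" -c "p features_incompat" /dev/sda1
features_incompat = 17                                  (hypothetical output: 0x11, i.e. NEEDSREPAIR 0x10 plus one other bit)
# xfs_db -x -c "sb 0" -c "write features_incompat 1" /dev/sda1      (17 - 16 = 1)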
* Re: XFS corruption after power surge/outage
From: Jorge Garcia @ 2024-02-12 21:46 UTC
To: Dave Chinner
Cc: Eric Sandeen, linux-xfs

On Mon, Feb 12, 2024 at 1:06 PM Dave Chinner <david@fromorbit.com> wrote:

> Ok, so they don't actually have a BBU on board - it's an option to
> add via a module, but the basic RAID controller doesn't have any
> power failure protection. These cards are also pretty old tech now -
> how old is this card, and when was the last time the cache
> protection module was tested?

The card reports a Mfg. Date of 01/26/20, which is not too old. The last time the cache protection was tested? No idea. The BBU status for the card reports battery state optimal.

> Indeed, how long was the power out for?

I'm not exactly sure how long the power was out, but probably less than an hour, and more likely just a few minutes. The data center is supposed to have UPS power and generator power, but a breaker tripped and we lost power.

> The BBU on most RAID controllers is only guaranteed to hold the
> state for 72 hours (when new) and I've personally seen them last for
> only a few minutes before dying when the RAID controller had been in
> continuous service for ~5 years. So the duration of the power
> failure may be important here.
>
> Also, how are the back end disks configured? Do they have their
> volatile write caches turned off? What cache mode was the RAID
> controller operating in - write-back or write-through?
>
> What's the rest of your storage stack? Do you have MD, LVM, etc.
> between the storage hardware and the filesystem?

You may be asking questions I'm not sure how to answer. Most of the settings are default settings. The RAID controller was operating in WB (write-back) mode. There is no MD or LVM, just 24 disks in a MegaRAID RAID-6 configuration, seen by the OS as one device, which was formatted as XFS.

> > Somewhere on one of the discussion threads I saw somebody mention
> > ufsexplorer, and when I downloaded the trial version, it seemed to see
> > most of the files on the device. I guess if I can't find a way to
> > recover the current filesystem, I will try to use that to recover the
> > data.
>
> Well, that's a last resort. But if your RAID controller is unhealthy
> or the volume has been corrupted by the RAID controller, ufsexplorer
> won't help you get your data back, either....

The controller is reporting everything as working, all disks are Online and Spun Up, and no errors are reported as far as I can tell.

I did get ufsexplorer, and it seems to be recovering data, but it will take days or weeks to recover all of it.

I would still like to know more about what happened, how to prevent it from happening in the future, and what the correct sequence of steps would have been when encountering a problem like this. Thanks for all your help!
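A sketch of how the controller-side questions above might be checked, assuming the storcli management utility that ships with these MegaRAID cards is installed and the controller is /c0; exact syntax varies between storcli releases, so verify against its documentation before running anything that changes settings:

# storcli64 /c0 show all             # controller summary, including cache and BBU/CacheVault status
# storcli64 /c0/bbu show all         # battery state and learn-cycle history
# storcli64 /c0/vall show all        # per-virtual-drive cache policies (write-back vs write-through, disk cache)
# storcli64 /c0/vall set wrcache=wt  # switch to write-through if the cache protection cannot be trusted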
* Re: XFS corruption after power surge/outage
From: Eric Sandeen @ 2024-02-12 22:39 UTC
To: Dave Chinner, Jorge Garcia
Cc: linux-xfs

On 2/12/24 3:06 PM, Dave Chinner wrote:

> That has the XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR bit set...
>
>> I wonder if that is because I tried an xfs_repair with a newer version...
>
> .... which is a result of xfs_repair 6.5.0 crashing midway through
> repair of the filesystem. Your kernel is too old to recognise the
> NEEDSREPAIR bit. You can clear it with xfs_db like this:
>
> Run this to get the current field value:
>
> # xfs_db -c "sb 0" -c "p features_incompat" <dev>
>
> Then subtract 0x10 from the value returned and run:
>
> # xfs_db -c "sb 0" -c "write features_incompat <val>" <dev>
>
> But that won't get you too far - the filesystem is still corrupt and
> inconsistent. By blowing away the log with xfs_repair before
> actually determining if the problem was caused by a RAID array
> issue, you've essentially forced yourself into a filesystem recovery
> situation.

Everything Dave said, yes.

Depending on how bad the corruption is, you *might* be able to do a readonly or readonly/norecovery mount and scrape some data out.

Ideally the first thing to do would be to make a 1:1 dd image of the device as a safe backup, but I understand it's 300T ...

-Eric
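A minimal sketch of the 1:1 image Eric mentions, assuming a destination with enough free space exists (the hard part at 300+ TB); GNU ddrescue is often preferred over plain dd here because it tolerates read errors and keeps a map of what it could not read:

# dd if=/dev/sda1 of=/backup/sda1.img bs=64M conv=noerror,sync status=progress
or
# ddrescue -d /dev/sda1 /backup/sda1.img /backup/sda1.map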