unexpected XFS SB magic number

public inbox for linux-xfs@vger.kernel.org

* unexpected XFS SB magic number
@ 2006-12-22 17:42 Gaspar Bakos
  2006-12-22 20:12 ` Russell Cattelan
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Gaspar Bakos @ 2006-12-22 17:42 UTC (permalink / raw)
  To: linux-xfs

Dear all,

I have a 12 x 500 GB RAID-5 hardware RAID array on an ARECA 1130-ML
controller. There is one single partition on it, exported as /dev/sdc1.
This configuration worked fine for 4 months.
Then the computer crashed a couple of times, which led to a situation where
the output of xfs_check /dev/sdc1 is:

xfs_check: unexpected XFS SB magic number 0x45464920
xfs_check: size check failed
xfs_check: read failed: Invalid argument
xfs_check: data size check failed
xfs_check: failed to alloc 58876353264 bytes: Cannot allocate memory

I also checked the RAID, and seemingly the controller is fine; I can
communicate with it, all 12 disks are visible, their SMART status is
OK, the RAID-5 is reported to be in 'normal' condition, etc.

[root@localhost ~]# xfs_db -r /dev/sdc1
xfs_db: unexpected XFS SB magic number 0x45464920
xfs_db: size check failed
xfs_db: read failed: Invalid argument
xfs_db: data size check failed
xfs_db: failed to alloc 58876353264 bytes: Cannot allocate memory

--------------------

[root@localhost ~]# xfs_repair -nv /dev/sdc1

Phase 1 - find and verify superblock...
bad primary superblock - bad magic number !!!
attempting to find secondary superblock...
................................................................................
...
....................found candidate secondary superblock...
unable to verify superblock, continuing...
...
....................found candidate secondary superblock...
verified secondary superblock...
would write modified primary superblock
Primary superblock would have been modified.
Cannot proceed further in no_modify mode.
Exiting now.

----------------

I would very much appreciate advice on how to proceed in such a situation.
I worry that xfs_repair will repair the filesystem but may leave a mess that
is hard to recover from. I am hoping there may be a safer way.

Best regards
Gaspar

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: unexpected XFS SB magic number
  2006-12-22 17:42 unexpected XFS SB magic number Gaspar Bakos
@ 2006-12-22 20:12 ` Russell Cattelan
  2006-12-22 20:51   ` Gaspar Bakos
  2006-12-22 23:15 ` Eric Sandeen
  2007-10-07  9:47 ` qon
  2 siblings, 1 reply; 13+ messages in thread
From: Russell Cattelan @ 2006-12-22 20:12 UTC (permalink / raw)
  To: gbakos; +Cc: linux-xfs


On Fri, 2006-12-22 at 12:42 -0500, Gaspar Bakos wrote:
> Dear all,
> 
> I have a 12 x 500Gb RAID-5 hardware RAID array on an ARECA 1130-ML
> controller. There is one single partition on it, exported as /dev/sdc1.
> This configuration used to work fine for 4 months.
> Then the computer crashed a couple of times, and led to a situation where
> 
> xfs_check /dev/sdc1 output is:
> 
> xfs_check: unexpected XFS SB magic number 0x45464920
> xfs_check: size check failed
> xfs_check: read failed: Invalid argument
> xfs_check: data size check failed
> xfs_check: failed to alloc 58876353264 bytes: Cannot allocate memory

What does

xfs_db /dev/sdc1
sb 0
p

look like?
That will print out the contents of the superblock.

-- 
Russell Cattelan <cattelan@thebarn.com>

* Re: unexpected XFS SB magic number
  2006-12-22 20:12 ` Russell Cattelan
@ 2006-12-22 20:51   ` Gaspar Bakos
  0 siblings, 0 replies; 13+ messages in thread
From: Gaspar Bakos @ 2006-12-22 20:51 UTC (permalink / raw)
  To: Russell Cattelan; +Cc: linux-xfs

Dear Russell,

RE:
> xfs_db /dev/sdc1
> sb 0
> p
> look like?


Unfortunately I cannot enter interactive mode with xfs_db:

[root@localhost ~]# xfs_db /dev/sdc1
xfs_db: unexpected XFS SB magic number 0x45464920
xfs_db: size check failed
xfs_db: read failed: Invalid argument
xfs_db: data size check failed
xfs_db: failed to alloc 58876353264 bytes: Cannot allocate memory

[root@localhost ~]#

I also tried specifying the command on the cmdline:

xfs_db -c "sb 0" -c p /dev/sdc1

But I get exactly the same message.
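Since xfs_db refuses to start on this device, the first sector can still be inspected read-only without it. A minimal Python sketch, assuming the standard XFS on-disk superblock layout (big-endian `sb_magicnum`, `sb_blocksize`, `sb_dblocks` as the first three fields) and 512-byte sectors; the helper names `parse_xfs_sb_head` and `read_first_sector` are hypothetical:

```python
import struct

XFS_SB_MAGIC = 0x58465342  # "XFSB"


def parse_xfs_sb_head(buf: bytes) -> dict:
    """Decode the first three fields of an XFS on-disk superblock (big-endian)."""
    if len(buf) < 16:
        raise ValueError("need at least 16 bytes")
    magic, blocksize, dblocks = struct.unpack_from(">IIQ", buf, 0)
    return {
        "magic": magic,
        "magic_ascii": buf[:4].decode("ascii", errors="replace"),
        "valid": magic == XFS_SB_MAGIC,
        "blocksize": blocksize,
        "dblocks": dblocks,
        "size_bytes": blocksize * dblocks,
    }


def read_first_sector(path: str, offset: int = 0) -> bytes:
    # Opened read-only ("rb"): nothing is ever written to the device.
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(512)
```

Something like `parse_xfs_sb_head(read_first_sector("/dev/sdc1"))` would, on this device, report `magic_ascii == "EFI "` and `valid == False`; the block size and block count fields would then be meaningless, because those bytes are not an XFS superblock at all.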

Cheers,
Gaspar

* Re: unexpected XFS SB magic number
  2006-12-22 17:42 unexpected XFS SB magic number Gaspar Bakos
  2006-12-22 20:12 ` Russell Cattelan
@ 2006-12-22 23:15 ` Eric Sandeen
  2006-12-22 23:28   ` Gaspar Bakos
  2007-10-07  9:47 ` qon
  2 siblings, 1 reply; 13+ messages in thread
From: Eric Sandeen @ 2006-12-22 23:15 UTC (permalink / raw)
  To: gbakos; +Cc: linux-xfs

Gaspar Bakos wrote:
> Dear all,
> 
> I have a 12 x 500Gb RAID-5 hardware RAID array on an ARECA 1130-ML
> controller. There is one single partition on it, exported as /dev/sdc1.
> This configuration used to work fine for 4 months.
> Then the computer crashed a couple of times, and led to a situation where
> 
> xfs_check /dev/sdc1 output is:
> 
> xfs_check: unexpected XFS SB magic number 0x45464920

That spells "EFI"
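The decoding can be checked by packing the magic number back into bytes. XFS stores `sb_magicnum` big-endian, so 0x45464920 is the byte sequence 45 46 49 20, which is also the first four bytes of "EFI PART", the signature that opens a GPT partition table header. A small Python sketch (`magic_to_ascii` is a hypothetical helper name):

```python
import struct


def magic_to_ascii(magic: int) -> str:
    # XFS stores sb_magicnum big-endian, so pack it the same way before decoding
    return struct.pack(">I", magic).decode("ascii")


print(repr(magic_to_ascii(0x45464920)))  # 'EFI '  (what was found on disk)
print(repr(magic_to_ascii(0x58465342)))  # 'XFSB'  (what XFS expects)
print(b"EFI PART".startswith(struct.pack(">I", 0x45464920)))  # True
```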

Looks like somebody came along and labeled your disk for you. Dangers of
being on a SAN, I suppose (if you are...).

repair might fix it....

-Eric

* Re: unexpected XFS SB magic number
  2006-12-22 23:15 ` Eric Sandeen
@ 2006-12-22 23:28   ` Gaspar Bakos
  2006-12-22 23:33     ` Eric Sandeen
  2006-12-22 23:38     ` Eric Sandeen
  0 siblings, 2 replies; 13+ messages in thread
From: Gaspar Bakos @ 2006-12-22 23:28 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: linux-xfs

Hi, Eric,

RE:
> > xfs_check: unexpected XFS SB magic number 0x45464920
> That spells "EFI"

Does this have any special meaning? E.g., could I figure out who
labelled the disk (array)?

> looks like somebody came along & labeled your disk for you.  Dangers of
> being on a san I suppose (if you are....)

Hmm. This is an FC5 system with SMP Opterons. The RAID-5 (12 disks) is run by
an ARECA 1130-ML card.

> repair might fix it....

I worry that what repair will leave behind is just entries named by
numbers (and many of those), like
11212/
12133/
121212/
...
and reconstructing ~3 TB from that is not trivial...

This reminds me: what paranoid safety measures can one take? E.g., keep
an external log for the filesystem as well?

Thanks for the thoughts!

Gaspar

* Re: unexpected XFS SB magic number
  2006-12-22 23:28   ` Gaspar Bakos
@ 2006-12-22 23:33     ` Eric Sandeen
  2006-12-22 23:38     ` Eric Sandeen
  1 sibling, 0 replies; 13+ messages in thread
From: Eric Sandeen @ 2006-12-22 23:33 UTC (permalink / raw)
  To: gbakos; +Cc: linux-xfs

Gaspar Bakos wrote:
> Hi, Eric,
> 
> RE:
>>> xfs_check: unexpected XFS SB magic number 0x45464920
>> That spells "EFI"
> 
> Does this have any special meaning? E.g. I could figure out who
> labelled the disk (array)?

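EFI is the Extensible Firmware Interface, which is where the GPT partition
table format comes from; a GPT header begins with the signature "EFI PART"....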

* Re: unexpected XFS SB magic number
  2006-12-22 23:28   ` Gaspar Bakos
  2006-12-22 23:33     ` Eric Sandeen
@ 2006-12-22 23:38     ` Eric Sandeen
  2006-12-23 23:08       ` Gaspar Bakos
  1 sibling, 1 reply; 13+ messages in thread
From: Eric Sandeen @ 2006-12-22 23:38 UTC (permalink / raw)
  To: gbakos; +Cc: linux-xfs

Gaspar Bakos wrote:

> This reminds me: what paranoid safety measures can one take? E.g keep
> an external log of the filesystem as well?

It looks like you wound up with an EFI bootloader splatted over the
front of your partition. Maybe the RAID got scrambled around? Or maybe
someone actually installed a bootloader over an otherwise-OK filesystem?
Hard to say. Not sure what could have prevented either of those.

Are you sure the RAID is in the right shape, with the disks in the right
order?

I also remember something about parted (maybe...) finding a backup GPT
signature at the end of a disk and "helpfully" copying it over the
front if so. This was a bug. SGI guys, do you remember?

-Eric

* Re: unexpected XFS SB magic number
  2006-12-22 23:38     ` Eric Sandeen
@ 2006-12-23 23:08       ` Gaspar Bakos
  2006-12-24  3:50         ` Eric Sandeen
  2006-12-24 12:16         ` Iustin Pop
  0 siblings, 2 replies; 13+ messages in thread
From: Gaspar Bakos @ 2006-12-23 23:08 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: linux-xfs

Hi, Eric,

RE:
> it looks like you wound up with an efi bootloader splatted over the
> front of your partition.  maybe the raid got scrambled around?  Or maybe
> someone actually installed a bootloader over an otherwise-ok filesystem?
>   hard to say.  not sure what could have prevented either of those.

To summarize: this is a hardware RAID-6 (not a RAID-5 as I wrote
earlier) of 12 x 500 GB disks, so the usable size is 5 TB.
The RAID card is an ARECA 1130-ML.
The computer runs FC5 with a 2.6.17-6 (kernel.org) kernel.
It was quite stable for 4 months.

I remember there was originally a problem with partitioning: fdisk
could not handle the 5 TB partition size (I needed one big partition;
splitting it up was out of the question).

Then, on the recommendation of someone from another list, I used gparted
and set the partition table type to GPT. This did the job, and I was
able to create a 5 TB partition.
mkfs.xfs worked fine (xfsprogs-2.7.3-1.2.1).

I can definitely say that the RAID was not scrambled around.
There are only a few users on this computer, only one superuser
(myself), and no physical access to the computer by others.

One of the users was running quite memory- and disk-I/O-intensive tasks
last week. This led to a crash (I was not around to keep an eye on
it). The computer rebooted, a few days later it crashed again, etc.
Finally, when I returned this week, I found it powered off.

And then I realized that the sdc1 partition cannot be mounted any more.

> i also remember something about parted (maybe...) finding a backup gpt
> signature at the end of a disk, and "helpfully" copying it over the
> front end if so.  This was a bug.  sgi guys do you remember?

But for this one has to invoke parted and commit the operations,
am I right?
Maybe there is a nasty daemon doing something. The fs was also exported
over NFS and mounted by two other hosts.

---------
So the questions are:
- What partition table type should I choose next time?
- Is there a simpler way of recovery (than xfs_repair)? I.e., the
  first few bytes of the partition need to be changed back to the
  XFS magic, and the rest is probably untouched?

Cheers
Gaspar

* Re: unexpected XFS SB magic number
  2006-12-23 23:08       ` Gaspar Bakos
@ 2006-12-24  3:50         ` Eric Sandeen
  2006-12-24  6:08           ` Gaspar Bakos
  2006-12-24 12:16         ` Iustin Pop
  1 sibling, 1 reply; 13+ messages in thread
From: Eric Sandeen @ 2006-12-24  3:50 UTC (permalink / raw)
  To: gbakos; +Cc: linux-xfs

Gaspar Bakos wrote:
> Hi, Eric,

> One of the users was running quite memory and disk IO intensive tasks
> past week. This lead to a crash. (I was not around to keep an eye on
> it). The computer rebooted, and few days later another crash, etc.
> Finally, when I returned this week, I found it powered off.
> 
> And then I realized that the sdc1 partition can not be mounted any more.

Well, something put a gpt label on top of your xfs partition... and it 
wasn't xfs :)

>> i also remember something about parted (maybe...) finding a backup gpt
>> signature at the end of a disk, and "helpfully" copying it over the
>> front end if so.  This was a bug.  sgi guys do you remember?
> 
> But for this one has to invoke parted, and commit the operations done,
> am I right?

if I recall, even invoking parted could do this.

> Maybe there is a nasty daemon doing something. The fs was also exported
> as NFS and mounted by two other hosts.
> 
> ---------
> So the questions are:
> - what partition type to choose next time?
> - is there a simpler way of recovery (than xfs_recovery), i.e. the
>   first few bytes of the partition need to be changed back to something
>   XFS magic, and the rest is probably untouched?

I'd google around and find out how big the GPT header is, to work
out how much of the front of your partition got clobbered. That'll give
a clue as to how much you may have lost.
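The GPT layout itself answers part of that question. Per the UEFI specification, the primary GPT header sits at LBA 1 and starts with "EFI PART"; it records where the partition entry array lives (commonly LBA 2, with 128 entries of 128 bytes, i.e. LBAs 2-33) and the first usable LBA (commonly 34). Those are typical values, not guaranteed ones, so the sketch below reads them from the header. It assumes 512-byte sectors, and the helper names are hypothetical:

```python
import struct

SECTOR = 512  # assumption: 512-byte logical sectors

# GPT header layout (UEFI spec): little-endian, 92 bytes
GPT_HEADER = struct.Struct("<8sIIIIQQQQ16sQIII")


def parse_gpt_header(sector: bytes) -> dict:
    """Decode a GPT header read from LBA 1 (or the backup at the last LBA)."""
    (sig, revision, header_size, _header_crc, _reserved, my_lba, alternate_lba,
     first_usable_lba, last_usable_lba, _disk_guid, entries_lba,
     num_entries, entry_size, _entries_crc) = GPT_HEADER.unpack_from(sector, 0)
    if sig != b"EFI PART":
        raise ValueError("no GPT signature at start of sector")
    return {
        "revision": revision,
        "header_size": header_size,
        "my_lba": my_lba,
        "alternate_lba": alternate_lba,
        "first_usable_lba": first_usable_lba,
        "last_usable_lba": last_usable_lba,
        "entries_lba": entries_lba,
        "num_entries": num_entries,
        "entry_size": entry_size,
    }


def entry_array_bytes(hdr: dict) -> int:
    # Size of the partition entry array described by the header
    return hdr["num_entries"] * hdr["entry_size"]


def gpt_metadata_bytes(hdr: dict, sector: int = SECTOR) -> int:
    # Bytes from the header's own LBA up to (not including) the first usable LBA
    return (hdr["first_usable_lba"] - hdr["my_lba"]) * sector
```

With the common values (entries at LBA 2, 128 entries of 128 bytes, first usable LBA 34), `entry_array_bytes` gives 16384 and `gpt_metadata_bytes` gives 33 x 512 = 16896 bytes, so a GPT written over the start of a filesystem would touch only its first few tens of KiB.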

-Eric

* Re: unexpected XFS SB magic number
  2006-12-24  3:50         ` Eric Sandeen
@ 2006-12-24  6:08           ` Gaspar Bakos
  0 siblings, 0 replies; 13+ messages in thread
From: Gaspar Bakos @ 2006-12-24  6:08 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: linux-xfs

Hi, Eric,

RE:
> Well, something put a gpt label on top of your xfs partition... and it
> wasn't xfs :)

After some contemplation I allowed xfs_repair to make an attempt.
It seems that nothing was recovered.

It found a secondary superblock, and then a lot of messages followed
(attached).

The repaired XFS filesystem is 2 TB in size (as opposed to 5 TB), and
it is essentially empty:

Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sdd1            2147352448       528 2147351920   1% /mnt/ar1/un0

After the recovery, parted reports:
Error: Unable to open /dev/sdd - unrecognised disk label.

(For parted /dev/sdd1, the partition table type is reported as 'loop'.)

fdisk reports GPT partition type for /dev/sdd:

[root@cfhat7 xfs]# fdisk -l /dev/sdd
WARNING: GPT (GUID Partition Table) detected on '/dev/sdd'!
The util fdisk doesn't support GPT. Use GNU Parted.
Disk /dev/sdd: 4999.9 GB, 4999998341120 bytes
255 heads, 63 sectors/track, 607881 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1               1      267350  2147483647+  ee  EFI GPT

---------
So all this is a great mess.
parted is supposed to be better than fdisk, yet it does not recognize
what fdisk does?

Now that I have lost all the data, I may start experimenting to see if I
can reproduce this.

Gaspar

* Re: unexpected XFS SB magic number
  2006-12-23 23:08       ` Gaspar Bakos
  2006-12-24  3:50         ` Eric Sandeen
@ 2006-12-24 12:16         ` Iustin Pop
  1 sibling, 0 replies; 13+ messages in thread
From: Iustin Pop @ 2006-12-24 12:16 UTC (permalink / raw)
  To: Gaspar Bakos; +Cc: linux-xfs

On Sat, Dec 23, 2006 at 06:08:58PM -0500, Gaspar Bakos wrote:
> So the questions are:
> - what partition type to choose next time?

In cases where I want to use the whole disk, I usually make the
filesystem directly over the disk (e.g. /dev/sdc) instead of
partitioning it.

I have never had any problems with this setup.

Regards,
Iustin

* Re: unexpected XFS SB magic number
  2006-12-22 17:42 unexpected XFS SB magic number Gaspar Bakos
  2006-12-22 20:12 ` Russell Cattelan
  2006-12-22 23:15 ` Eric Sandeen
@ 2007-10-07  9:47 ` qon
  2 siblings, 0 replies; 13+ messages in thread
From: qon @ 2007-10-07  9:47 UTC (permalink / raw)
  To: linux-xfs

I'm sorry this is quite late for you, but I recently had exactly the same
problem and have finally found the solution. Hopefully this will help others
who get the same error.

You used a GPT partition table. To protect the GPT from operating systems
that do not support it, a standard MBR with an msdos partition table (the
"protective MBR") is written to the disk in front of the GPT. It declares
that the whole disk is used by a single partition of an unknown type (0xEE),
capped at 2 TB because msdos tables do not support larger partitions.
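This description can be checked against the fdisk output Gaspar posted (partition 1, type ee, "2147483647+" blocks). A minimal Python sketch, assuming the classic MBR layout (four 16-byte entries at byte 446, type byte at entry offset 4, little-endian 32-bit start LBA and sector count at offsets 8 and 12, 0x55AA signature at byte 510) and 512-byte sectors. It builds a synthetic protective MBR rather than reading a real disk, and `parse_mbr_partitions` is a hypothetical name:

```python
import struct

SECTOR = 512  # assumption: 512-byte logical sectors


def parse_mbr_partitions(sector0: bytes) -> list:
    """Return the non-empty primary partition entries of an MBR (LBA 0)."""
    if sector0[510:512] != b"\x55\xaa":
        raise ValueError("no MBR boot signature")
    parts = []
    for i in range(4):
        off = 446 + 16 * i              # four 16-byte entries start at byte 446
        ptype = sector0[off + 4]        # partition type byte (0xEE = GPT protective)
        start_lba, n_sectors = struct.unpack_from("<II", sector0, off + 8)
        if ptype != 0:
            parts.append({"index": i + 1, "type": ptype,
                          "start_lba": start_lba, "sectors": n_sectors})
    return parts


# Build a protective MBR like the one a disk larger than 2 TB carries.
mbr = bytearray(SECTOR)
mbr[446 + 4] = 0xEE
struct.pack_into("<II", mbr, 446 + 8, 1, 0xFFFFFFFF)  # starts at LBA 1, max length
mbr[510:512] = b"\x55\xaa"

part = parse_mbr_partitions(bytes(mbr))[0]
print(part["start_lba"])                     # 1: the sector holding the GPT header
print(part["sectors"] * SECTOR)              # 2199023255040 bytes, about 2 TiB
print(part["sectors"] * SECTOR / 1024)       # 2147483647.5 one-KiB blocks
print(4999998341120 // SECTOR > 0xFFFFFFFF)  # True: the real disk is larger
```

The 32-bit sector count is why the protective partition tops out just under 2 TiB, matching fdisk's `2147483647+` blocks and the roughly 2 TB filesystem left after the repair; and because it starts at LBA 1, a kernel that sees only this table makes the first partition begin at the GPT header.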

Linux recognizes a GPT at boot only when support for it is explicitly enabled
in the kernel (File systems / Partition Types / EFI GUID Partition support,
i.e. CONFIG_EFI_PARTITION). However, when you create a GPT with parted, for
example, and the partition table is then reread, the partition is recognized
correctly. This can change after a reboot, as in your case. For me, the kernel
then reported a 2 TB partition on my 2.x TB hard disk, and I could no longer
mount my XFS filesystem. I ran xfs_check and got exactly the same error as you
("unexpected XFS SB magic number"). This is because the kernel used the 2 TB
partition from the fake msdos partition table, which starts exactly where the
GPT header resides, so when looking for the superblock xfs_check found the GPT
header (whose signature begins with "EFI ") instead of the XFS superblock.

When you ran xfs_repair, it of course found a superblock (maybe even the
correct one), but all locations were shifted relative to the partition start,
so it did not find any files. Your partition was then reported as a 2 TB
partition after the repair because the size came from the msdos table again.
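If the explanation above applies, the real XFS primary superblock should still sit a few sectors after LBA 1 of the whole disk (with parted's common layout, at LBA 34, i.e. byte offset 17408; that exact offset is an assumption). A read-only scan for the on-disk magic "XFSB" at sector boundaries is one way to locate it before running any repair. A minimal Python sketch; `find_xfs_magic` and `scan_device` are hypothetical names:

```python
SECTOR = 512          # assumption: 512-byte logical sectors
XFS_MAGIC = b"XFSB"   # sb_magicnum as it appears on disk


def find_xfs_magic(data: bytes, sector: int = SECTOR) -> list:
    """Byte offsets of sector-aligned positions whose first 4 bytes are 'XFSB'."""
    return [off for off in range(0, len(data) - len(XFS_MAGIC) + 1, sector)
            if data[off:off + len(XFS_MAGIC)] == XFS_MAGIC]


def scan_device(path: str, max_bytes: int = 1 << 20) -> list:
    # Read-only scan of the first max_bytes of the whole disk (e.g. /dev/sdc)
    with open(path, "rb") as f:
        return find_xfs_magic(f.read(max_bytes))
```

Treat any hit only as a candidate: secondary superblocks at allocation-group boundaries carry the same magic, and file data can contain these bytes by chance. A plausible offset could then be handed to `losetup -o` to attach a loop device at that position (see losetup(8) for the read-only option on your version) and checked with `xfs_repair -n` before anything is written.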

So for me, updating the kernel to support GPT (EFI) partition tables directly
solved the problem.

qon

-- 
View this message in context: http://www.nabble.com/unexpected-XFS-SB-magic-number-tf2871877.html#a13081228
Sent from the linux-xfs mailing list archive at Nabble.com.

* unexpected XFS SB magic number
@ 2006-12-22 17:45 Gaspar Bakos
  0 siblings, 0 replies; 13+ messages in thread
From: Gaspar Bakos @ 2006-12-22 17:45 UTC (permalink / raw)
  To: linux-xfs

Dear all,

I forgot to mention some crucial information
(RE: unexpected XFS SB magic number):

This is an AMD64 SMP system with a 2.6.17-6 kernel under FC5.

Cheers,
Gaspar

end of thread, other threads:[~2007-10-07  9:47 UTC | newest]

Thread overview: 13+ messages
2006-12-22 17:42 unexpected XFS SB magic number Gaspar Bakos
2006-12-22 20:12 ` Russell Cattelan
2006-12-22 20:51   ` Gaspar Bakos
2006-12-22 23:15 ` Eric Sandeen
2006-12-22 23:28   ` Gaspar Bakos
2006-12-22 23:33     ` Eric Sandeen
2006-12-22 23:38     ` Eric Sandeen
2006-12-23 23:08       ` Gaspar Bakos
2006-12-24  3:50         ` Eric Sandeen
2006-12-24  6:08           ` Gaspar Bakos
2006-12-24 12:16         ` Iustin Pop
2007-10-07  9:47 ` qon
  -- strict thread matches above, loose matches on Subject: below --
2006-12-22 17:45 Gaspar Bakos

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox