* My Thecus RAID-0 filesystem unmountable with mdadm. Please help.
From: David Reniau @ 2010-05-24 12:40 UTC (permalink / raw)
To: linux-raid
Hello Linux-Raid list,
------------------------
My problem in a nutshell:
------------------------
I am unable to mount a RAID-0 (EXT3?) filesystem which I previously
assembled with mdadm under Ubuntu 9.10 32-bit. This RAID-0 array was
originally created by my NAS Thecus N4100.
I am getting the following console message:
mount: wrong fs type, bad option, bad superblock on /dev/md0,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so
See test [T007] below for detailed messages.
In other words, I cannot recover my stored data.
Can you help? It's crucial to me.
Thanks in advance,
David
------------------------
My story in (very) short:
------------------------
I own a NAS Thecus N4100 -- it works perfectly -- with 4 x 400GB disks
running as a RAID-0 array. There is no physical disk error; the RAID is
perfectly healthy.
In Feb 2010, I had to extract the 4 disks from the NAS rack in order
to remount the RAID under a regular Linux box. I placed the 4 disks in
USB cases, labelled the cases (1, 2, 3, 4) according to the disks'
original order in the NAS rack (see Figure 1 below), and tried to
rebuild the RAID-0 array by means of mdadm under Ubuntu 9.10, using
the disk connection layout depicted in Figure 2.
After several trial and error manipulations (not only, but in
particular, to regenerate the RAID superblocks), I was able to
re-create the RAID-0 array but... I am unable to mount the RAID file
system in the end.
Panic: I attempted to insert the disks back into the NAS to check
whether the RAID was still "alive" there. The NAS rebooted for about
10 minutes (doing what, I do not know), then reported that my RAID
configuration was gone. I wrote nothing to the NAS, shut it down
properly, and put the disks back into their USB cases in order to
resolve this RAID issue (if resolvable?) with mdadm.
I performed several tests (reported below) which formally describe the
situation I am facing.
I do need your help to understand what is wrong and whether (and how)
to solve this issue.
NOTE: I am currently working with 4 disk images of my physical disks,
as depicted in Figure 3 below, so I can safely perform any destructive
tests and manipulations you may suggest, with no risk to the original
disks. I (only...) need about 15 hours to recreate a disk image set.
-----------------
My main questions:
-----------------
1) Test [T006]
***********
(1.1) Is something wrong with my RAID superblocks? mdadm accepts to
assemble the disks and reports a healthy RAID, BUT the mdadm -E details
for each disk unexpectedly reference a fixed /dev/sdc2 partition in the
array, which is in no way involved in the array (although a /dev/sdc2
partition does exist on my /dev/sdc disk).
(1.2) Can this be a source of the problems?
(1.3) Can this so-called RAID superblock inconsistency be due to some
deeper EXT2/3 filesystem issue or corruption mentioned in question 2
below?
(1.4) How can I practically perform a deep RAID superblock check on
each disk, other than mdadm -E /dev/diskN, that would make such an
inconsistency explainable?
2) Test [T007]: EXT2/3 file system issues
***************************************
(2.1) Do I FIRST need to resolve the EXT2/3 filesystem issues reported
when mounting the RAID filesystem (i.e. assuming the EXT2/3 problem
induces the RAID issue)?
(2.2) Or conversely: are the EXT2/3 issues the consequence of a RAID issue?
(2.3) At which level do I need to work in order to sort this issue out?
(2.4) Any advice on working methodology is welcome...
3) Do I need to recover my RAID partition in order to mount it, or is
there any RAID-related manipulation, configuration I missed with
mdadm, which prevents me from mounting it?
4) Test [T003]
***********
The tests performed with Palimpsest report a 201 MB "unknown" partition:
(4.1) Where does this disk zone come from?
(4.2) Was it accidentally created by the NAS when the disks were
re-inserted into its rack? (I sent a ticket to Thecus support about
this point -- no answer yet.)
(4.3) Is this 201 MB unknown zone a RAID-0 disk feature common to all
RAID-0 arrays? If yes, what is this zone supposed to contain?
(4.4) If yes to question (4.2): does this mean my RAID-0 data are
definitively lost because the N4100 implicitly deleted part of the
RAID partition?
5) Tests [T010] to [T012]
**********************
Tests executed with TestDisk report a Linux partition that seems to be
living beside the RAID component partition, and could be a "lost"
partition buried in the 201 MB unknown zone.
(5.1) Is this partition a feature of RAID-0 arrays?
(5.2) Is this an inconsistency caused by misusing mdadm? Or by the
NAS when the disks were inserted back into their rack?
(5.3) How can I resolve that?
6) Any piece of advice, tests to perform, manipulations, etc. are welcome.
----------------------------------
THECUS N4100 initial configuration:
----------------------------------
- Firmware : 1.3.06 (SSH plugin installed)
- RAID Level : RAID-0
- Disks : 4 x Seagate Barracuda ST3400832AS 400 GB
- Total RAID capacity : 1.6 TB
- Used space : around 75%
- Seagate ST3400832AS features (from manufacturer):
* Total capacity : 400 GB
* Usable capacity : 372.6 GB
* Cylinders : 16383
* Heads : 16
* Sectors : 63
- Figure 1: Disks genuine ordering in the NAS rack:
********
+------------+
* Top disk : | Disk 1 |
+------------+
* Next disk : | Disk 2 |
+------------+
* Third disk : | Disk 3 |
+------------+
* Bottom disk : | Disk 4 |
+------------+
- Figure 2: Disks connections in the Linux box:
********
Thecus USB Linux Disk
N4100 devices devices partitions
+--------+
| | --> 201MB (Unknown)
| DISK 1 | --> USB Disk 1 --> /dev/sdf |
| | --> 372.4GB (RAID compon.1)
+--------+
+--------+
| | --> 201MB (Unknown)
| DISK 2 | --> USB Disk 2 --> /dev/sdg |
| | --> 372.4GB (RAID compon.2)
+--------+
+--------+
| | --> 201MB (Unknown)
| DISK 3 | --> USB Disk 3 --> /dev/sdh |
| | --> 372.4GB (RAID compon.3)
+--------+
+--------+
| | --> 201MB (Unknown)
| DISK 4 | --> USB Disk 4 --> /dev/sdi |
| | --> 372.4GB (RAID compon.4)
+--------+
- Figure 3: Corresponding disk images situation:
********
Thecus Disk Loop Mapped disk
N4100 images devices partitions
+--------+ --> 201MB (Unknown)
| | | /dev/mapper/loop0p1
| DISK 1 | --> disk0.hd --> /dev/loop0 |
| | --> 372.4GB (RAID compo.1)
+--------+ /dev/mapper/loop0p2
+--------+ --> 201MB (Unknown)
| | | /dev/mapper/loop1p1
| DISK 2 | --> disk1.hd --> /dev/loop1 |
| | --> 372.4GB (RAID compo.2)
+--------+ /dev/mapper/loop1p2
+--------+ --> 201MB (Unknown)
| | | /dev/mapper/loop2p1
| DISK 3 | --> disk2.hd --> /dev/loop2 |
| | --> 372.4GB (RAID compo.3)
+--------+ /dev/mapper/loop2p2
+--------+ --> 201MB (Unknown)
| | | /dev/mapper/loop3p1
| DISK 4 | --> disk3.hd --> /dev/loop3 |
| | --> 372.4GB (RAID compo.4)
+--------+ /dev/mapper/loop3p2
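For reference, the images and loop mappings shown in Figure 3 were
prepared roughly along these lines (a sketch from memory; image file
names and loop numbers are illustrative):
$ sudo dd if=/dev/sdf of=disk0.hd bs=1M conv=noerror,sync
$ sudo losetup /dev/loop0 disk0.hd
$ sudo kpartx -av /dev/loop0
kpartx creates the /dev/mapper/loop0p1 and /dev/mapper/loop0p2
mappings; the same steps were repeated for disks 2, 3 and 4.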
-------------------------------------------------------
Performed tests & manipulation descriptions and results:
-------------------------------------------------------
PART 1: TESTS USING mdadm
*************************
------
[T001] Connecting USB disk 1, disk 2, disk 3, disk 4 and gathering information.
------
The purpose of this test suite is to verify the response of the system
when connecting each physical RAID disk as a USB device.
* ACTION *
I am connecting disk 1 as USB device /dev/sdf to my Ubuntu system:
* messages.log *
May 15 14:42:19 obelix kernel: [176690.908772] usb 1-7.3.3: new high
speed USB device using ehci_hcd and address 11
May 15 14:42:19 obelix kernel: [176691.002540] usb 1-7.3.3:
configuration #1 chosen from 1 choice
May 15 14:42:19 obelix kernel: [176691.011777] scsi10 : SCSI emulation
for USB Mass Storage devices
May 15 14:42:24 obelix kernel: [176696.059395] scsi 10:0:0:0:
Direct-Access ST340083 2AS PQ: 0 ANSI: 2 CCS
May 15 14:42:24 obelix kernel: [176696.060124] sd 10:0:0:0: Attached
scsi generic sg8 type 0
May 15 14:42:24 obelix kernel: [176696.071314] sd 10:0:0:0: [sdf]
781422768 512-byte logical blocks: (400 GB/372 GiB)
May 15 14:42:24 obelix kernel: [176696.075622] sd 10:0:0:0: [sdf]
Write Protect is off
May 15 14:42:24 obelix kernel: [176696.078622] sdf: sdf1 sdf2
May 15 14:42:24 obelix kernel: [176696.116632] sd 10:0:0:0: [sdf]
Attached SCSI disk
* ACTION *
I am connecting disk 2 as USB device /dev/sdg to my Ubuntu system:
* messages.log *
May 15 14:52:11 obelix kernel: [177282.841023] usb 1-7.3.1: new high
speed USB device using ehci_hcd and address 12
May 15 14:52:11 obelix kernel: [177282.936281] usb 1-7.3.1:
configuration #1 chosen from 1 choice
May 15 14:52:11 obelix kernel: [177282.955419] scsi11 : SCSI emulation
for USB Mass Storage devices
May 15 14:52:16 obelix kernel: [177287.961386] scsi 11:0:0:0:
Direct-Access ST340083 2AS PQ: 0 ANSI: 2
May 15 14:52:16 obelix kernel: [177287.962147] sd 11:0:0:0: Attached
scsi generic sg9 type 0
May 15 14:52:16 obelix kernel: [177287.969607] sd 11:0:0:0: [sdg]
781422768 512-byte logical blocks: (400 GB/372 GiB)
May 15 14:52:16 obelix kernel: [177287.975128] sd 11:0:0:0: [sdg]
Write Protect is off
May 15 14:52:16 obelix kernel: [177287.980862] sdg: sdg1 sdg2
May 15 14:52:16 obelix kernel: [177288.011894] sd 11:0:0:0: [sdg]
Attached SCSI disk
* ACTION *
I am connecting disk 3 as USB device /dev/sdh to my Ubuntu system:
* messages.log *
May 15 14:59:33 obelix kernel: [177724.441158] usb 1-7.2: new high
speed USB device using ehci_hcd and address 14
May 15 14:59:33 obelix kernel: [177724.536461] usb 1-7.2:
configuration #1 chosen from 1 choice
May 15 14:59:33 obelix kernel: [177724.543552] scsi13 : SCSI emulation
for USB Mass Storage devices
May 15 14:59:38 obelix kernel: [177729.545857] scsi 13:0:0:0:
Direct-Access ST340083 2AS PQ: 0 ANSI: 2
May 15 14:59:38 obelix kernel: [177729.546667] sd 13:0:0:0: Attached
scsi generic sg10 type 0
May 15 14:59:38 obelix kernel: [177729.552659] sd 13:0:0:0: [sdh]
781422768 512-byte logical blocks: (400 GB/372 GiB)
May 15 14:59:38 obelix kernel: [177729.556128] sd 13:0:0:0: [sdh]
Write Protect is off
May 15 14:59:38 obelix kernel: [177729.561068] sdh: sdh1 sdh2
May 15 14:59:38 obelix kernel: [177729.590054] sd 13:0:0:0: [sdh]
Attached SCSI disk
* ACTION *
I am connecting disk 4 as USB device /dev/sdi to my Ubuntu system:
* messages.log *
May 15 15:00:14 obelix kernel: [177765.658207] usb 1-7.3.4: new high
speed USB device using ehci_hcd and address 15
May 15 15:00:14 obelix kernel: [177765.752468] usb 1-7.3.4:
configuration #1 chosen from 1 choice
May 15 15:00:14 obelix kernel: [177765.773190] scsi14 : SCSI emulation
for USB Mass Storage devices
May 15 15:00:19 obelix kernel: [177770.777746] scsi 14:0:0:0:
Direct-Access ST340083 2AS PQ: 0 ANSI: 2
May 15 15:00:19 obelix kernel: [177770.778639] sd 14:0:0:0: Attached
scsi generic sg11 type 0
May 15 15:00:19 obelix kernel: [177770.789192] sd 14:0:0:0: [sdi]
781422768 512-byte logical blocks: (400 GB/372 GiB)
May 15 15:00:19 obelix kernel: [177770.796334] sd 14:0:0:0: [sdi]
Write Protect is off
May 15 15:00:19 obelix kernel: [177770.805059] sdi: sdi1 sdi2
May 15 15:00:19 obelix kernel: [177770.837077] sd 14:0:0:0: [sdi]
Attached SCSI disk
* ACTION *
I am now collecting summary information about the USB RAID disks
connected to my Ubuntu box:
$ sudo blkid
* CONSOLE-OUT *
(... other devices ...)
/dev/sdf2: UUID="ecfe8404-2f35-4a45-d668-56da8e136666" TYPE="linux_raid_member"
/dev/sdg2: UUID="ecfe8404-2f35-4a45-d668-56da8e136666" TYPE="linux_raid_member"
/dev/sdh2: UUID="ecfe8404-2f35-4a45-d668-56da8e136666" TYPE="linux_raid_member"
/dev/sdi2: UUID="ecfe8404-2f35-4a45-d668-56da8e136666" TYPE="linux_raid_member"
* QUESTION *
The 4 RAID disk partitions are detected as "Linux Raid Members" and
share the same UUID, which should be normal since they belong to the
same RAID array. Is this right?
------
[T002] Disks 1, 2, 3 and 4 geometry and partitions using fdisk -l
------
The purpose of this test suite is to report the physical geometry and
partitioning information returned by fdisk -l for each physical RAID
disk.
* ACTION *
I am examining each physical disk's geometry and partitioning as
reported by fdisk.
$ sudo fdisk -l /dev/sdf
* CONSOLE-OUT *
Disk /dev/sdf: 400.1 GB, 400088457216 bytes
16 heads, 63 sectors/track, 775221 cylinders
Units = cylinders of 1008 * 512 = 516096 bytes
Disk identifier : 0x00000000
Device Boot Start End Blocks Id System
/dev/sdf1 1 389 196024+ 83 Linux
/dev/sdf2 390 775221 390515328 fd Linux raid autodetect
$ sudo fdisk -l /dev/sdg
* CONSOLE-OUT *
Disk /dev/sdg: 400.1 GB, 400088457216 bytes
16 heads, 63 sectors/track, 775221 cylinders
Units = cylinders of 1008 * 512 = 516096 bytes
Disk identifier : 0x00000000
Device Boot Start End Blocks Id System
/dev/sdg1 1 389 196024+ 83 Linux
/dev/sdg2 390 775221 390515328 fd Linux raid autodetect
$ sudo fdisk -l /dev/sdh
* CONSOLE-OUT *
Disk /dev/sdh: 400.1 GB, 400088457216 bytes
16 heads, 63 sectors/track, 775221 cylinders
Units = cylinders of 1008 * 512 = 516096 bytes
Disk identifier : 0x00000000
Device Boot Start End Blocks Id System
/dev/sdh1 1 389 196024+ 83 Linux
/dev/sdh2 390 775221 390515328 fd Linux raid autodetect
$ sudo fdisk -l /dev/sdi
* CONSOLE-OUT *
Disk /dev/sdi: 400.1 GB, 400088457216 bytes
16 heads, 63 sectors/track, 775221 cylinders
Units = cylinders of 1008 * 512 = 516096 bytes
Disk identifier : 0x00000000
Device Boot Start End Blocks Id System
/dev/sdi1 1 389 196024+ 83 Linux
/dev/sdi2 390 775221 390515328 fd Linux raid autodetect
------
[T003] Using palimpsest to view disks partition structures.
------
* ACTION *
For this diagnostic, I am using graphical disk manager application
"palimpsest" under Gnome to visualize the 4 USB disk devices /dev/sdf,
/dev/sdg, /dev/sdh, /dev/sdi in order to confirm the results I got
from previous fdisk -l commands.
* RESULTS *
- See attached images 01 to 04.
- The 4 USB disks are correctly displayed in the disk tree (image01.png)
- For each USB disk, there is an unknown or unused 201MB partition
(image02.png)
- Each Seagate disk contains a second partition labelled "Linux Raid
Member" (image03.png)
- The 4 disks are detected as a coherent RAID drive (image04.png)
- The assembled filesystem is reported "mountable" by Palimpsest (image05.png)
* COMMENTS *
- On image 04, one notices that only the second partition (/dev/sdf2,
/dev/sdg2, /dev/sdh2, /dev/sdi2) typed as "linux raid member"
partition of each disk is used for assembling the final RAID drive.
- The assembled filesystem is reported to be an ext2 filesystem
- I am unable to mount the RAID filesystem by using Palimpsest.
------
[T004] Using mdadm to assemble the full disks as one single RAID-0 device.
------
* ACTION *
I am using the standard RAID management tool mdadm to assemble the 4
USB physical disks as one single RAID device. I am using switch -A (not
switch --create) because I have already created the array previously
and regenerated the persistent superblocks on each disk.
Nevertheless, please note that I am explicitly specifying which
devices (and their order) are involved in the assembled array.
$ sudo mdadm -A /dev/md0 /dev/sdf /dev/sdg /dev/sdh /dev/sdi
^ ^ ^ ^
| | | |
DISK 1 DISK 2 DISK 3 DISK 4
* CONSOLE-OUT *
mdadm: no recogniseable superblock on /dev/sdf
mdadm: /dev/sdf has no superblock - assembly aborted
* COMMENTS *
This error seems logical: for each disk, only the second partition,
labelled "Linux raid member" is supposed to be part of the RAID array.
------
[T005] mdadm to assemble the "linux raid" partitions as one single
RAID-0 device.
------
* ACTION *
Same test as [T004], but this time I am explicitly assembling the
"linux raid member" partition of each disk. See [T001] and [T002] for
the partitions of each disk.
$ sudo mdadm -A /dev/md0 /dev/sdf2 /dev/sdg2 /dev/sdh2 /dev/sdi2
^ ^ ^ ^
| | | |
RAID comp.1 RAID comp.2 RAID comp.3 RAID comp.4
* CONSOLE-OUT *
mdadm: /dev/md0 has been started with 4 drives.
* messages.log *
May 15 16:42:49 obelix kernel: [183920.968499] md: md0 stopped.
May 15 16:42:49 obelix kernel: [183921.161066] md: bind<sdg2>
May 15 16:42:49 obelix kernel: [183921.173482] md: bind<sdh2>
May 15 16:42:49 obelix kernel: [183921.181697] md: bind<sdi2>
May 15 16:42:49 obelix kernel: [183921.183694] md: bind<sdf2>
May 15 16:42:49 obelix kernel: [183921.186312] raid0: looking at sdf2
May 15 16:42:49 obelix kernel: [183921.186318] raid0: comparing
sdf2(781030528)
May 15 16:42:49 obelix kernel: [183921.186323] with sdf2(781030528)
May 15 16:42:49 obelix kernel: [183921.186327] raid0: END
May 15 16:42:49 obelix kernel: [183921.186330] raid0: ==> UNIQUE
May 15 16:42:49 obelix kernel: [183921.186333] raid0: 1 zones
May 15 16:42:49 obelix kernel: [183921.186337] raid0: looking at sdi2
May 15 16:42:49 obelix kernel: [183921.186342] raid0: comparing
sdi2(781030528)
May 15 16:42:49 obelix kernel: [183921.186346] with sdf2(781030528)
May 15 16:42:49 obelix kernel: [183921.186349] raid0: EQUAL
May 15 16:42:49 obelix kernel: [183921.186353] raid0: looking at sdh2
May 15 16:42:49 obelix kernel: [183921.186358] raid0: comparing
sdh2(781030528)
May 15 16:42:49 obelix kernel: [183921.186362] with sdf2(781030528)
May 15 16:42:49 obelix kernel: [183921.186365] raid0: EQUAL
May 15 16:42:49 obelix kernel: [183921.186369] raid0: looking at sdg2
May 15 16:42:49 obelix kernel: [183921.186374] raid0: comparing
sdg2(781030528)
May 15 16:42:49 obelix kernel: [183921.186378] with sdf2(781030528)
May 15 16:42:49 obelix kernel: [183921.186381] raid0: EQUAL
May 15 16:42:49 obelix kernel: [183921.186384] raid0: FINAL 1 zones
May 15 16:42:49 obelix kernel: [183921.186393] raid0: done.
May 15 16:42:49 obelix kernel: [183921.186397] raid0 : md_size is
3124122112 sectors.
May 15 16:42:49 obelix kernel: [183921.186401] ******* md0
configuration *********
May 15 16:42:49 obelix kernel: [183921.186405] zone0=[sdf2/sdg2/sdh2/sdi2/]
May 15 16:42:49 obelix kernel: [183921.186417] zone offset=0kb
device offset=0kb size=1562061056kb
May 15 16:42:49 obelix kernel: [183921.186421]
**********************************
May 15 16:42:49 obelix kernel: [183921.186423]
May 15 16:42:49 obelix kernel: [183921.186446] md0: detected capacity
change from 0 to 1599550521344
May 15 16:42:49 obelix kernel: [183921.194595] md0: unknown partition table
* COMMENTS *
The command apparently worked this time. The RAID array seems to be
assembled. In test [T006] below, I perform a simple RAID diagnosis
using the mdadm command.
* QUESTION *
Messages.log indicates that there is no partition table available on
device /dev/md0. Is this normal?
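As a quick read-only cross-check (not run here) that /dev/md0 carries a
filesystem directly rather than a partition table, one could simply ask
blkid and file what they see at the start of the device:
$ sudo blkid /dev/md0
$ sudo file -s /dev/md0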
------
[T006] Diagnosing the assembled RAID array using mdadm
------
* ACTION *
Listing the assembled arrays at kernel
$ sudo cat /proc/mdstat
* CONSOLE-OUT *
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
[raid4] [raid10]
md0 : active raid0 sdf2[0] sdi2[3] sdh2[2] sdg2[1]
1562061056 blocks 64k chunks
* COMMENTS *
The kernel sees a RAID-0 device /dev/md0 assembled from the following
partitions, in this order: /dev/sdf2, /dev/sdg2, /dev/sdh2, and
/dev/sdi2.
* ACTION *
Let's get details about the assembled RAID-0 device /dev/md0
$ sudo mdadm -D /dev/md0
* CONSOLE-OUT *
/dev/md0:
Version : 00.90
Creation Time : Fri Feb 19 01:23:02 2010
Raid Level : raid0
Array Size : 1562061056 (1489.70 GiB 1599.55 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Fri Feb 19 01:23:02 2010
State : clean
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Chunk Size : 64K
UUID : ecfe8404:2f354a45:d66856da:8e136666
Events : 0.1
Number Major Minor RaidDevice State
0 8 82 0 active sync /dev/sdf2
1 8 98 1 active sync /dev/sdg2
2 8 114 2 active sync /dev/sdh2
3 8 130 3 active sync /dev/sdi2
* COMMENTS *
This result seems consistent!
* ACTION *
Let's get details about RAID component partition /dev/sdf2 (DISK 1) with mdadm:
$ sudo mdadm -E /dev/sdf2
* CONSOLE-OUTPUT *
/dev/sdf2:
Magic : a92b4efc
Version : 00.90.00
UUID : ecfe8404:2f354a45:d66856da:8e136666
Creation Time : Fri Feb 19 01:23:02 2010
Raid Level : raid0
Used Dev Size : 0
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0
Update Time : Fri Feb 19 01:23:02 2010
State : active
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Checksum : c0d7901b - correct
Events : 1
Chunk Size : 64K
Number Major Minor RaidDevice State
this 0 8 34 0 active sync /dev/sdc2
0 0 8 34 0 active sync /dev/sdc2
1 1 8 50 1 active sync
2 2 8 66 2 active sync
3 3 8 82 3 active sync /dev/sdf2
* COMMENTS *
This result does not look consistent:
- Why is /dev/sdc2 mentioned here as the current device? It should be /dev/sdf2.
- Why are devices 1 and 2 left blank?
- Why is device /dev/sdf2 (the current device) listed as device 3?
* ACTION *
Let's get details about RAID component partition /dev/sdg2 (DISK 2) with mdadm:
$ sudo mdadm -E /dev/sdg2
* CONSOLE-OUTPUT *
/dev/sdg2:
Magic : a92b4efc
Version : 00.90.00
UUID : ecfe8404:2f354a45:d66856da:8e136666
Creation Time : Fri Feb 19 01:23:02 2010
Raid Level : raid0
Used Dev Size : 0
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0
Update Time : Fri Feb 19 01:23:02 2010
State : active
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Checksum : c0d7902d - correct
Events : 1
Chunk Size : 64K
Number Major Minor RaidDevice State
this 1 8 50 1 active sync
0 0 8 34 0 active sync /dev/sdc2
1 1 8 50 1 active sync
2 2 8 66 2 active sync
3 3 8 82 3 active sync /dev/sdf2
* COMMENTS *
This result does not look consistent:
- Why is (blank) mentioned here as the current device? It should be /dev/sdg2.
- Why are devices 1 and 2 left blank?
- Why is device /dev/sdf2 listed as device 3?
* ACTION *
Let's get details about RAID component partition /dev/sdh2 (DISK 3) with mdadm:
$ sudo mdadm -E /dev/sdh2
* CONSOLE-OUTPUT *
/dev/sdh2:
Magic : a92b4efc
Version : 00.90.00
UUID : ecfe8404:2f354a45:d66856da:8e136666
Creation Time : Fri Feb 19 01:23:02 2010
Raid Level : raid0
Used Dev Size : 0
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0
Update Time : Fri Feb 19 01:23:02 2010
State : active
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Checksum : c0d7903f - correct
Events : 1
Chunk Size : 64K
Number Major Minor RaidDevice State
this 2 8 66 2 active sync
0 0 8 34 0 active sync /dev/sdc2
1 1 8 50 1 active sync
2 2 8 66 2 active sync
3 3 8 82 3 active sync /dev/sdf2
* COMMENTS *
This result does not look consistent:
- Why is (blank) mentioned here as the current device? It should be /dev/sdh2.
- Why are devices 1 and 2 left blank?
- Why is device /dev/sdf2 listed as device 3?
* ACTION *
Let's get details about RAID component partition /dev/sdi2 (DISK 4) with mdadm:
$ sudo mdadm -E /dev/sdi2
* CONSOLE-OUTPUT *
/dev/sdi2:
Magic : a92b4efc
Version : 00.90.00
UUID : ecfe8404:2f354a45:d66856da:8e136666
Creation Time : Fri Feb 19 01:23:02 2010
Raid Level : raid0
Used Dev Size : 0
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0
Update Time : Fri Feb 19 01:23:02 2010
State : active
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Checksum : c0d79051 - correct
Events : 1
Chunk Size : 64K
Number Major Minor RaidDevice State
this 3 8 82 3 active sync /dev/sdf2
0 0 8 34 0 active sync /dev/sdc2
1 1 8 50 1 active sync
2 2 8 66 2 active sync
3 3 8 82 3 active sync /dev/sdf2
* COMMENTS *
This result does not look consistent:
- Why is (blank) mentioned here as the current device? It should be /dev/sdi2.
- Why are devices 1 and 2 left blank?
- Why is device /dev/sdf2 listed as device 3?
------
[T007] Mounting the assembled RAID's filesystem as ext3-fs.
------
$ sudo mount -t ext3 /dev/md0 /media/N4100
* CONSOLE-OUT *
mount: wrong fs type, bad option, bad superblock on /dev/md0,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so
* kern.log *
May 15 16:48:08 obelix kernel: [184240.160548] EXT3-fs error (device
md0): ext3_check_descriptors: Block bitmap for group 1920 not in group
(block 0)!
May 15 16:48:08 obelix kernel: [184240.163677] EXT3-fs: group
descriptors corrupted!
* COMMENTS *
There is an obvious filesystem issue on the assembled device, which
seems related to corrupted ext3 group descriptors.
* ACTION *
I re-issue the mount command, this time not forcing the filesystem type:
$ sudo mount /dev/md0 /media/N4100
* CONSOLE-OUT *
mount: wrong fs type, bad option, bad superblock on /dev/md0,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so
* kern.log *
May 15 16:51:42 obelix kernel: [184453.959766] EXT2-fs error (device
md0): ext2_check_descriptors: Block bitmap for group 1920 not in group
(block 0)!
May 15 16:51:42 obelix kernel: [184453.959783] EXT2-fs: group
descriptors corrupted!
* QUESTION *
Is this issue related to the apparent inconsistencies in the mdadm
diagnosis performed on each individual disk in [T006]?
* COMMENTS *
In its current state, the filesystem on RAID device /dev/md0 cannot be
mounted and exhibits severe inconsistencies...
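Before going any further, one low-risk thing I may try next is a
read-only mount that skips the journal replay entirely (a sketch, not
attempted yet):
$ sudo mount -t ext3 -o ro,noload /dev/md0 /media/N4100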
------
[T008] RAID array device /dev/md0 geometry and partitioning information
------
$ sudo fdisk -l /dev/md0
* CONSOLE-OUT *
Disk /dev/md0: 1599.6 GB, 1599550521344 bytes
2 heads, 4 sectors/track, 390515264 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Disk identifier : 0x00000000
Disk /dev/md0 doesn't contain a valid partition table.
* QUESTIONS *
Is there anything to fix here? How?
------
[T009] Checking what is wrong on /dev/md0 filesystem by means of fsck.ext3
------
* COMMENTS *
Not performing any write action on the assembled physical array...
$ sudo e2fsck -n /dev/md0
* CONSOLE-OUT *
e2fsck 1.41.9 (22-Aug-2009)
e2fsck: Group descriptors look bad... trying backup blocks...
The superblock has an invalid journal (inode 8).
Clear? no
e2fsck: Illegal inode number while checking ext3 journal for /dev/md0
* COMMENTS *
This diagnostic is insufficient for now but I do not want to perform
any intrusive diagnostic on the physical disks.
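To dig a little deeper while staying strictly read-only, I could also
list the backup superblock locations and point e2fsck at one of them
(a sketch; the 32768 / 4096 values are the usual ext3 defaults and are
not yet confirmed for this filesystem):
$ sudo dumpe2fs /dev/md0 | grep -i "superblock at"
$ sudo e2fsck -n -b 32768 -B 4096 /dev/md0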
PART 2: TESTS USING testdisk on disk images
*******************************************
------
[T010] Global analysis of the assembled raid image /dev/md0
------
* COMMENTS *
I am performing a testdisk analysis on the final RAID device assembled
from the 4 disk images disk0.hd, disk1.hd, disk2.hd and disk3.hd.
$ sudo testdisk /dev/md0
* CONSOLE-OUT *
Screen 1
---------------------------------------------------------------------
TestDisk 6.11, Data Recovery Utility, April 2009
Christophe GRENIER <grenier@cgsecurity.org>
http://www.cgsecurity.org
TestDisk is free software, and
comes with ABSOLUTELY NO WARRANTY.
Select a media (use Arrow keys, then press Enter):
Disk /dev/md0 - 1599 GB / 1489 GiB
[Proceed ] [ Quit ]
Note: Disk capacity must be correctly detected for a successful recovery.
If a disk listed above has incorrect size, check HD jumper settings, BIOS
detection, and install the latest OS patches and disk drivers.
---------------------------------------------------------------------
Screen 2
---------------------------------------------------------------------
TestDisk 6.11, Data Recovery Utility, April 2009
Christophe GRENIER <grenier@cgsecurity.org>
http://www.cgsecurity.org
Disk /dev/md0 - 1599 GB / 1489 GiB
Please select the partition table type, press Enter when done.
[Intel ] Intel/PC partition
[EFI GPT] EFI GPT partition map (Mac i386, some x86_64...)
[Mac ] Apple partition map
[None ] Non partitioned media
[Sun ] Sun Solaris partition
[XBox ] XBox partition
[Return ] Return to disk selection
Note: Do NOT select 'None' for media with only a single partition. It's very
rare for a drive to be 'Non-partitioned'.
---------------------------------------------------------------------
* ACTION *
I select None: indeed, according to my assumption, /dev/md0 SHOULD be
an ext3 filesystem, and therefore does not contain sub-partitions.
Screen 3
---------------------------------------------------------------------
TestDisk 6.11, Data Recovery Utility, April 2009
Christophe GRENIER <grenier@cgsecurity.org>
http://www.cgsecurity.org
Disk /dev/md0 - 1599 GB / 1489 GiB - CHS 390515264 2 4
[ Analyse ] Analyse current partition structure and search for lost partitions
[ Advanced ] Filesystem Utils
[ Geometry ] Change disk geometry
[ Options ] Modify options
[ Quit ] Return to disk selection
Note: Correct disk geometry is required for a successful recovery. 'Analyse'
process may give some warnings if it thinks the logical geometry is mismatched.
---------------------------------------------------------------------
* ACTION *
I select option Analyse
Screen 4
---------------------------------------------------------------------
TestDisk 6.11, Data Recovery Utility, April 2009
Christophe GRENIER <grenier@cgsecurity.org>
http://www.cgsecurity.org
Disk /dev/md0 - 1599 GB / 1489 GiB - CHS 390515264 2 4
Current partition structure:
Partition Start End Size in sectors
P ext2 0 0 1 390515263 1 4 3124122112
[Quick Search]
Try to locate partition
---------------------------------------------------------------------
* ACTION *
I select Quick Search. The Quick Search analysis starts... and the
following result is displayed.
Screen 5
---------------------------------------------------------------------
TestDisk 6.11, Data Recovery Utility, April 2009
Christophe GRENIER <grenier@cgsecurity.org>
http://www.cgsecurity.org
Disk /dev/md0 - 1599 GB / 1489 GiB - CHS 390515264 2 4
Partition Start End Size in sectors
P ext2 0 0 1 390515199 1 4 3124121600
Structure: Ok.
Keys T: change type, P: list files,
Enter: to continue
EXT2 Large file Sparse superblock, 1599 GB / 1489 GiB
---------------------------------------------------------------------
* ACTION *
I press P to list the files on this filesystem.
Screen 6
---------------------------------------------------------------------
TestDisk 6.11, Data Recovery Utility, April 2009
Christophe GRENIER <grenier@cgsecurity.org>
http://www.cgsecurity.org
P ext2 0 0 1 390515199 1 4 3124121600
Directory /
No file found, filesystem seems damaged.
Use Right arrow to change directory, c to copy,
h to hide deleted files, q to quit
---------------------------------------------------------------------
* COMMENTS *
What is wrong?
* ACTION *
I press Q to return to screen 5. Then I press Enter to continue.
Screen 7
---------------------------------------------------------------------
TestDisk 6.11, Data Recovery Utility, April 2009
Christophe GRENIER <grenier@cgsecurity.org>
http://www.cgsecurity.org
Disk /dev/md0 - 1599 GB / 1489 GiB - CHS 390515264 2 4
Partition Start End Size in sectors
P ext2 0 0 1 390515199 1 4 3124121600
Write isn't available because the partition table type "None" has been selected.
[ Quit ] [Deeper Search]
Try to find more partitions
---------------------------------------------------------------------
* COMMENTS *
So, what can I do? I cannot write the partition organization to the
disk because I selected "None" as the partition table type for the
analysis... How can I practically change that?
------
[T011] Analysis of the unknown 201 MB partition.
------
* NOTES *
I execute this testdisk analysis on the first loopback partition
mapped as /dev/mapper/loop0p1.
Unlike with the /dev/md0 device, it seems I cannot perform any RAID
assembly of the 4 p1 partitions /dev/mapper/loop0p1,
/dev/mapper/loop1p1, /dev/mapper/loop2p1 and /dev/mapper/loop3p1,
because this disk zone does not seem to contain any RAID superblocks.
In other words, the 201 MB unknown zone reported on each disk DOES NOT
look like a RAID partition (unless its type was accidentally changed by
the above manipulations). Therefore, and unlike with the assembled
device /dev/md0, I am forced to run TestDisk on a single disk image.
I select the first disk image /dev/loop0, and I execute TestDisk on
its p1 partition as follows:
$ sudo testdisk /dev/mapper/loop0p1
* CONSOLE-OUT *
Screen 1
---------------------------------------------------------------------
TestDisk 6.11, Data Recovery Utility, April 2009
Christophe GRENIER <grenier@cgsecurity.org>
http://www.cgsecurity.org
TestDisk is free software, and
comes with ABSOLUTELY NO WARRANTY.
Select a media (use Arrow keys, then press Enter):
Disk /dev/mapper/loop0p1 - 200 MB / 191 MiB
[Proceed ] [ Quit ]
Note: Disk capacity must be correctly detected for a successful recovery.
If a disk listed above has incorrect size, check HD jumper settings, BIOS
detection, and install the latest OS patches and disk drivers.
---------------------------------------------------------------------
Screen 2
---------------------------------------------------------------------
TestDisk 6.11, Data Recovery Utility, April 2009
Christophe GRENIER <grenier@cgsecurity.org>
http://www.cgsecurity.org
Disk /dev/mapper/loop0p1 - 200 MB / 191 MiB
Please select the partition table type, press Enter when done.
[Intel ] Intel/PC partition
[EFI GPT] EFI GPT partition map (Mac i386, some x86_64...)
[Mac ] Apple partition map
[None ] Non partitioned media
[Sun ] Sun Solaris partition
[XBox ] XBox partition
[Return ] Return to disk selection
Note: Do NOT select 'None' for media with only a single partition. It's very
rare for a drive to be 'Non-partitioned'.
---------------------------------------------------------------------
* ACTION *
I select Intel/PC partition, just in case this zone contains some
deleted partition.
Screen 3
---------------------------------------------------------------------
TestDisk 6.11, Data Recovery Utility, April 2009
Christophe GRENIER <grenier@cgsecurity.org>
http://www.cgsecurity.org
Disk /dev/mapper/loop0p1 - 200 MB / 191 MiB - CHS 392049 1 1
Current partition structure:
Partition Start End Size in sectors
Partition sector doesn't have the endmark 0xAA55
*=Primary bootable P=Primary L=Logical E=Extended D=Deleted
[Quick Search]
Try to locate partition
---------------------------------------------------------------------
* COMMENTS *
Where does the endmark 0xAA55 error come from?
* ACTION *
I select the Quick Search option, and I get the following result.
Screen 4
---------------------------------------------------------------------
TestDisk 6.11, Data Recovery Utility, April 2009
Christophe GRENIER <grenier@cgsecurity.org>
http://www.cgsecurity.org
Disk /dev/mapper/loop0p1 - 200 MB / 191 MiB - CHS 392049 1 1
Partition Start End Size in sectors
No partition found or selected for recovery
[ Quit ] [Deeper Search]
Try to find more partitions
---------------------------------------------------------------------
* COMMENTS *
Unknown zone p1 does not contain any partition.
------
[T012] Analysis of one RAID disk image /dev/loop0
------
* NOTES *
I execute this testdisk analysis on the first loopback disk mapped as
/dev/loop0.
Identical results would also be found performing a testdisk analysis
on images /dev/loop1, /dev/loop2, or /dev/loop3.
Please note that:
- I ONLY perform an analysis on ONE disk image, not on the entire
/dev/md0 RAID device image,
- I am performing the analysis of ONE whole disk image, unlike test
[T011] where I ONLY analyzed the 201 MB unknown partition.
By doing this test, I expect TestDisk to give me accurate partition
information about each individual disk involved in the RAID array, and
in particular I hope to get more accurate information about this
201 MB zone.
$ sudo testdisk /dev/loop0
* CONSOLE-OUT *
Screen 1
---------------------------------------------------------------------
TestDisk 6.11, Data Recovery Utility, April 2009
Christophe GRENIER <grenier@cgsecurity.org>
http://www.cgsecurity.org
TestDisk is free software, and
comes with ABSOLUTELY NO WARRANTY.
Select a media (use Arrow keys, then press Enter):
Disk /dev/loop0 - 400 GB / 372 GiB
[Proceed ] [ Quit ]
Note: Disk capacity must be correctly detected for a successful recovery.
If a disk listed above has incorrect size, check HD jumper settings, BIOS
detection, and install the latest OS patches and disk drivers.
---------------------------------------------------------------------
Screen 2
---------------------------------------------------------------------
TestDisk 6.11, Data Recovery Utility, April 2009
Christophe GRENIER <grenier@cgsecurity.org>
http://www.cgsecurity.org
Disk /dev/loop0 - 400 GB / 372 GiB
Please select the partition table type, press Enter when done.
[Intel ] Intel/PC partition
[EFI GPT] EFI GPT partition map (Mac i386, some x86_64...)
[Mac ] Apple partition map
[None ] Non partitioned media
[Sun ] Sun Solaris partition
[XBox ] XBox partition
[Return ] Return to disk selection
Note: Do NOT select 'None' for media with only a single partition. It's very
rare for a drive to be 'Non-partitioned'.
---------------------------------------------------------------------
* ACTION *
I select Intel/PC partition, because I assume there should be a
regular partition table in which the RAID ext2 (or ext3) partition is
referenced.
Screen 3
---------------------------------------------------------------------
TestDisk 6.11, Data Recovery Utility, April 2009
Christophe GRENIER <grenier@cgsecurity.org>
http://www.cgsecurity.org
Disk /dev/loop0 - 400 GB / 372 GiB - CHS 781422768 1 1
[ Analyse ] Analyse current partition structure and search for lost partitions
[ Advanced ] Filesystem Utils
[ Geometry ] Change disk geometry
[ Options ] Modify options
[ MBR Code ] Write TestDisk MBR code to first sector
[ Delete ] Delete all data in the partition table
[ Quit ] Return to disk selection
Note: Correct disk geometry is required for a successful recovery. 'Analyse'
process may give some warnings if it thinks the logical geometry is mismatched.
---------------------------------------------------------------------
* COMMENTS *
The disk geometry CHS = 781422768 1 1 reported by TestDisk does not
match the CHS information reported by fdisk -l in test [T002].
I decide to correct TestDisk's geometry parameters by replacing them
with the parameters CHS = 775221 16 63 reported by fdisk -l in test
[T002].
(Note: I also performed test [T018] with no geometry change: its
results are not reported in this document because they are
inconsistent.)
* ACTION *
I select Analyse.
Screen 4
---------------------------------------------------------------------
TestDisk 6.11, Data Recovery Utility, April 2009
Christophe GRENIER <grenier@cgsecurity.org>
http://www.cgsecurity.org
Disk /dev/loop0 - 400 GB / 372 GiB - CHS 775221 16 63
Current partition structure:
Partition Start End Size in sectors
No EXT2, JFS, Reiser, cramfs or XFS marker
1 P Linux 0 1 1 388 15 63 392049
1 P Linux 0 1 1 388 15 63 392049
2 P Linux RAID 389 0 1 775220 15 63 781030656 [md0]
No partition is bootable
[Quick Search] [ Backup ]
Try to locate partition
---------------------------------------------------------------------
* COMMENTS *
This time I seem to get some more information about the global
partition structure of the disk:
- Partition 2 is obviously the RAID component partition.
- Partition 1 is supposedly a Linux partition. But where is this
partition? Furthermore, there seem to be 2 traces of the same
partition... Was a second partition created on top of an older one?
Up to now, there seems to be a ray of hope: the RAID partition is
effectively referenced in the partition table of a RAID disk, AND
there also seems to be a Linux partition, probably damaged. I suspect
that this Linux partition may have been created by the Thecus N4100
NAS and may contain the SHARED FOLDERS configuration and access
rights...
Nevertheless, does the fact that I cannot see this Linux partition 1
prevent me from accessing and mounting RAID partition 2?
* ACTION *
I select Quick search in order to perform some partition search.
Results are reported in Screen 5 below.
Screen 5
---------------------------------------------------------------------
TestDisk 6.11, Data Recovery Utility, April 2009
Christophe GRENIER <grenier@cgsecurity.org>
http://www.cgsecurity.org
Disk /dev/loop0 - 400 GB / 372 GiB - CHS 775221 16 63
The harddisk (400 GB / 372 GiB) seems too small! (< 1599 GB / 1489 GiB)
Check the harddisk size: HD jumpers settings, BIOS detection...
The following partitions can't be recovered:
Partition Start End Size in sectors
Linux 389 0 1 3099715 15 47 3124121600
Linux 397 0 1 3099723 15 47 3124121600
[ Continue ]
EXT2 Large file Sparse superblock, 1599 GB / 1489 GiB
---------------------------------------------------------------------
* COMMENTS *
The hard disk seems too small!! How is this possible? What is wrong?
I am using the correct geometry information, am I not?
Screen 6
---------------------------------------------------------------------
TestDisk 6.11, Data Recovery Utility, April 2009
Christophe GRENIER <grenier@cgsecurity.org>
http://www.cgsecurity.org
Disk /dev/loop0 - 400 GB / 372 GiB - CHS 775221 16 63
Warning: the current number of heads per cylinder is 16
but the correct value may be 255.
You can use the Geometry menu to change this value.
It's something to try if
- some partitions are not found by TestDisk
- or the partition table can not be written because partitions overlaps.
[ Continue ]
---------------------------------------------------------------------
* QUESTION *
What am I supposed to do now? The geometry of /dev/loop0 matches what
the fdisk -l tests reported previously. There cannot be a disk
geometry issue, can there?
Screen 7
---------------------------------------------------------------------
TestDisk 6.11, Data Recovery Utility, April 2009
Christophe GRENIER <grenier@cgsecurity.org>
http://www.cgsecurity.org
Disk /dev/loop0 - 400 GB / 372 GiB - CHS 775221 16 63
Partition Start End Size in sectors
L Linux RAID 775220 13 62 775220 15 63 128 [md0]
Structure: Ok. Use Up/Down Arrow keys to select partition.
Use Left/Right Arrow keys to CHANGE partition characteristics:
*=Primary bootable P=Primary L=Logical E=Extended D=Deleted
Keys A: add partition, L: load backup, T: change type,
Enter: to continue
md 0.90.0 Raid 0: devices 0(8,34)* 1(8,50) 2(8,66) 3(8,82), 65 KB / 64 KiB
---------------------------------------------------------------------
* COMMENTS *
Now, only the RAID partition is shown in the list. Linux partition 1
has disappeared... Why?
* Re: My Thecus RAID-0 filesystem unmountable with mdadm. Please help.
From: Michael Evans @ 2010-05-24 21:57 UTC (permalink / raw)
To: David Reniau; +Cc: linux-raid
Way too big and spread out, I don't know if anyone will have the time
to read that.
First, https://raid.wiki.kernel.org/index.php/RAID_Recovery
Second, what parameters do you remember about the original array?
Third, you probably want to try hexdumping the first megabyte or two
of each drive and looking for a superblock of your filesystem.
(man hexdump)
hexdump -Cn $((1024*1024)) /dev/whatever
You may also want to use dd to get the end of the block device.
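For example, something along these lines should grab the last mebibyte
(untested here; substitute your device, and note that blockdev --getsz
reports the size in 512-byte sectors):
SZ=$(blockdev --getsz /dev/whatever)
dd if=/dev/whatever bs=512 skip=$((SZ - 2048)) count=2048 | hexdump -C | less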
Most importantly, knowing //where// the superblock is on your device
will allow for a guess about where in the block device the data is
supposed to start. That ///may/// allow for the creation of a
superblock with the proper alignment (and hopefully chunk size) to
read your data.
* Re: My Thecus RAID-0 filesystem unmountable with mdadm. Please help.
From: Neil Brown @ 2010-05-25 0:33 UTC (permalink / raw)
To: David Reniau; +Cc: linux-raid
On Mon, 24 May 2010 14:40:25 +0200
David Reniau <david.reniau@gmail.com> wrote:
> Hello Linux-Raid list,
>
> ------------------------
> My problem in a nutshell:
> ------------------------
>
> I am unable to mount a RAID-0 (EXT3?) filesystem which I previously
> assembled with mdadm under Ubuntu 9.10 32-bit. This RAID-0 array was
> originally created by my NAS Thecus N4100.
>
As far as I can tell from all the details you provided, everything is
behaving as expected except that the filesystem looks bad. It could be that
the Thecus NAS vendors made some incompatible change in the ext3 filesystem
format for their product. I would rate that as fairly unlikely but
definitely possible. I believe it has happened before.
However you say (I think) that when you put the devices back in the NAS they
still don't work. That suggests some on-device corruption so we cannot
really blame the vendor.
The small partition at the start of each device is probably some boot
partition. It might be raided, it might be ext3, or it might just be a raw
copy of the kernel that the bios loads directly. It isn't really important
to you.
The RAID0 assembled from the '2' partitions is fairly clearly the correct
raid0. mount and fsck seem to be able to read an ext3 superblock from the
start of md0 which suggests that there aren't partitions (agreeing with what
fdisk says) and that they weren't using LVM (which is a common practice).
It wouldn't hurt to run 'vgscan' to see if it can find any LVM headers though.
It might also be interesting to run "tune2fs -l /dev/md0" and report the
result.
The fact that "mdadm -E" gives confusing messages about the device identifies
is not very interesting. The device names that it gives are the names that
the devices had the last time the array was active. When you move devices
between machines it is very likely for the names to change.
If you can assemble md0 from the loopXp2 devices and and happy to run
possibly destructive tests on that, try
fsck -a /dev/md0
and see if it managed to make any sense of the filesystem.
I'm afraid there is nothing else I can suggest.
NeilBrown
* Re: My Thecus RAID-0 filesystem unmountable with mdadm. Please help.
From: Luca Berra @ 2010-05-25 5:09 UTC (permalink / raw)
To: linux-raid
On Mon, May 24, 2010 at 02:40:25PM +0200, David Reniau wrote:
>After several trial and error manipulations (not only, but in
>particular, to regenerate the RAID superblocks), I was able to
>re-create the RAID-0 array but... I am unable to mount the RAID file
>system in the end.
do i read the above correctly? in your tries you created a new raid0
over the existing data?
in case i am misreading, could you please explain what you mean by
"regenerate the RAID superblocks"?
L.
--
Luca Berra -- bluca@comedia.it
Communication Media & Services S.r.l.
/"\
\ / ASCII RIBBON CAMPAIGN
X AGAINST HTML MAIL
/ \
* Re: My Thecus RAID-0 filesystem unmountable with mdadm. Please help.
From: Mikael Abrahamsson @ 2010-05-25 5:57 UTC (permalink / raw)
To: Neil Brown; +Cc: David Reniau, linux-raid
On Tue, 25 May 2010, Neil Brown wrote:
> As far as I can tell from all the details you provided, everything is
> behaving as expected except that the filesystem looks bad. It could be that
> the Thecus NAS vendors made some incompatible change in the ext3 filesystem
> format for their product. I would rate that as fairly unlikely but
> definitely possible. I believe it has happened before.
Yes, I tried putting some NAS drives (don't remember the brand) into a
regular linux box and I could read most but not all files, and my research
back then indicated that they had some kind of mix of ext3 and ext4 on
there that was not in the mainline kernel.
--
Mikael Abrahamsson email: swmike@swm.pp.se
* Re: My Thecus RAID-0 filesystem unmountable with mdadm. Please help.
From: John Robinson @ 2010-05-25 7:59 UTC (permalink / raw)
To: David Reniau; +Cc: linux-raid
On 24/05/2010 13:40, David Reniau wrote:
> In Feb 2010, I had to extract the 4 disks from the NAS rack in order
> to remount the RAID under a regular Linux box. I placed the 4 disks in
> USB cases, labelled the cases (1, 2, 3, 4) according to the disks'
> original order in the NAS rack (see Figure 1 below), and tried to
> rebuild the RAID-0 array by means of mdadm under Ubuntu 9.10, using
> the disk connection layout depicted in Figure 2.
>
> After several trial and error manipulations (not only, but in
> particular, to regenerate the RAID superblocks), I was able to
> re-create the RAID-0 array but... I am unable to mount the RAID file
> system in the end.
If you've recreated the array under Ubuntu with something like `mdadm -C
/dev/mdX -l 0 -n 4 /dev/disc1p2 /dev/disc2p2 /dev/disc3p2 /dev/disc4p2`
then you may have the discs in the wrong order, the NAS may internally
count from the bottom up, so you should also try `mdadm -C /dev/mdX -l 0
-n 4 /dev/disc4p2 /dev/disc3p2 /dev/disc2p2 /dev/disc1p2`. Other orders
are also possible but perhaps unlikely.
It's also possible that Thecus used a non-default chunk size. If they
did you need to find out and include it in your array creation line with
-c. This is a good one to check out since if the order of the discs is
right but the chunk size is wrong I can imagine perhaps a quarter of
blocks appearing in the right places in the resulting /dev/mdX, perhaps
enough for e2fsck to think there is a damaged ext2/3 filesystem there.
It's also possible that Thecus used a non-default metadata type. If they
did you need to find out and include it in your array creation line with
-e. Note also that mdadm's default metadata type has changed so if
Ubuntu's mdadm is recent enough you may need to specify the 0.90
metadata if that's what Thecus used.
If you have definitely only ever attempted to assemble the array with
`mdadm -A` and never recreated it with `mdadm -C` then don't go trying
any new create lines, there may be valuable information available in the
RAID superblocks that I couldn't quite get a handle on from your
original 50K+ email.
Good luck.
Cheers,
John.
* Re: My Thecus RAID-0 filesystem unmountable with mdadm. Please help.
From: David Reniau @ 2010-05-27 1:03 UTC (permalink / raw)
To: linux-raid
Dear Linux-RAID gurus,
Thank you all for your patience, your courage, and for your precious
answers, which help me focus on the important things.
@Michael Evans, John Robinson:
-----------------------------
From the RAID recovery wiki:
- I am trying not to get into a panic :-/... since February 2010...
- In Feb. 2010, I (manually) went through the 4! = 24 RAID disk
permutations to create the RAID array via the following loop procedure:
* check that mdadm.conf is deleted > USB disks powered off > PC reboot > disks
sequentially powered on: 1, 2, 3, 4,
followed by:
* sudo mdadm --create /dev/md0 --level=raid0 --raid-devices=4
/dev/sd[diskpermut1] /dev/sd[diskpermut2] /dev/sd[diskpermut3]
/dev/sd[diskpermut4]
* sudo mount /dev/md0 /media/mylostraid
This procedure was a failure: no combination allowed the RAID
filesystem to be mounted. Only combinations starting with DISK 1 as
the first disk seemed to return so-called "consistent" error messages.
I empirically concluded that the order DISK 1, 2, 3, 4 is probably
correct, but so far, I must admit I cannot certify it.
@Luca Berra:
-----------
From the above: yes, you read correctly, I used mdadm --create several
times. I am (now) aware that this usage has altered the original RAID
superblock information, which could be one of the causes of this
issue.
As far as I remember (manipulations performed in Feb 2010), I
nevertheless (wrongly?) felt I had no better choice at that time;
indeed, I could not get the array assembled from these 4
freshly extracted NAS disks under Linux with a straight mdadm
--assemble. The only way seemed to be to perform a
--zero-superblock first, then to use the mdadm --create option to
somehow force the regeneration of the superblocks. This is my
understanding of what happened behind the scenes...
It seemed to work: with potential RAID information loss? Impact(s)?
@John Robinson:
--------------
"...It's also possible that Thecus used a non-default metadata type..."
In my case, passing either 0.90 or a newer version for the metadata
did not change anything... But I should pay attention to that point
anyway the next time I assemble the array under Ubuntu 10.04.
"...then don't go trying any new create lines, there may be valuable
information available in the RAID superblocks"
Err... From my comments to Luca above, I am afraid all of my original
RAID info is gone, right?
@Neil Brown:
-----------
Your scenario assumptions confirm what I was more or less thinking,
and helped me minimize my initial concerns about this 201 MB partition
and about the mdadm -E reports.
Following your suggestion:
$ sudo tune2fs -l /dev/md0
tune2fs 1.41.9 (22-Aug-2009)
Filesystem volume name: <none>
Last mounted on: <not available>
Filesystem UUID: 7db2aaee-1830-4f1f-8f1b-fd97a6d48a54
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: ext_attr filetype sparse_super large_file
Filesystem flags: signed_directory_hash
Default mount options: (none)
Filesystem state: not clean with errors
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 195264512
Block count: 390515200
Reserved block count: 19525760
Free blocks: 88837799
Free inodes: 64986461
First block: 0
Block size: 4096
Fragment size: 4096
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 16384
Inode blocks per group: 512
Last mount time: Sat Feb 13 05:44:45 2010
Last write time: Wed May 26 18:53:25 2010
Mount count: 53
Maximum mount count: 27
Last checked: Wed Jun 28 00:13:42 2006
Check interval: 15552000 (6 months)
Next check after: Sun Dec 24 23:13:42 2006
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 128
How can I exploit this information?
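(One sanity check these numbers already allow: Block count 390515200 x
Block size 4096 = 1,599,550,259,200 bytes, i.e. only 256 KiB less than
the 1,599,550,521,344-byte /dev/md0 reported in [T005], so the
superblock's idea of the filesystem size is at least consistent with
the assembled array.)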
Repair test fsck -a /dev/md0: I will probably give it a try during the
week-end, on the disk images. Since my disk images may be definitively
lost afterwards (plus about 15 boring hours to recreate them), I will
try to perform other (non-destructive) tests on them first... Anyhow,
this manipulation is in the pipeline...
@John Robinson, Mikael Abrahamsson, Neil Brown, Michael Evans:
-------------------------------------------------------------
Okay. Therefore, I should not exclude that Thecus may make use of
proprietary and/or non-standard RAID parameters to build a RAID
array. Thinking about it, this assumption is actually quite likely,
considering that the way RAID is managed seems to vary from one Thecus
model to another. Christophe Grenier (the TestDisk author) suggested
that Thecus may -- why not? -- make use of a patched kernel to read
the RAID fs of the Thecus N4100.
Therefore, my first priority BEFORE proceeding to miscellaneous tests
and analyses MUST be to collect all possible accurate information
about the Thecus N4100's RAID creation and configuration parameters,
in particular:
- Chunk size,
- RAID metadata version,
- Disk assembly order,
- Exact type/features of the RAID filesystem: ext2, ext3, some
proprietary fs, etc.
After reading your emails:
- I opened a ticket with Thecus support requesting all the technical
details about the RAID management parameters used internally by the
Thecus N4100. I am now awaiting their answer.
- I also posted the contents of this request to the relevant
Thecus forums. Quicker?
- Within the next few days, I will reconnect the Thecus (diskless),
and I will try to investigate its logs and config files (provided I
can access them) in order to find out more about this point.
@Michael Evans:
--------------
I will use the hexdump technique you suggested. Having read about
the crucial need to know the original RAID parameters, wouldn't it
make sense to use this technique _after_ I have first figured out
and applied the exact Thecus N4100 RAID params to my disks via mdadm?
If only to guarantee that the hexdumps I extract are consistent and
reliable?
@Neil Brown, John Robinson:
--------------------------
Similarly, IF the default chunk size (64K) I've been using so far with
mdadm happens to be different from the Thecus', would the above
tune2fs -l /dev/md0 command deliver results identical to the above
ones with the corrected Thecus chunk size applied via mdadm
--assemble? Could this parameter modification be sufficient to get my
RAID filesystem back and mountable?
Kind regards,
David
* Re: My Thecus RAID-0 filesystem unmountable with mdadm. Please help.
From: John Robinson @ 2010-05-27 10:39 UTC (permalink / raw)
To: David Reniau, Linux RAID
On 27/05/2010 02:03, David Reniau wrote:
> Dear Linux-RAID gurus,
That's very kind of you but I'm no guru, just a user like yourself,
though perhaps with a little more experience and I've hung around here a
while because I like to know what goes on inside the black box.
[...]
> Similarly, IF the default chunk size (64K) i've been using so far with
> mdadm happens to be different from the Thecus', would the above
> tune2fs -l /dev/md0 command deliver results identical to the above
> ones with the corrected Thecus chunk size applied via mdadm
> --assemble? Could this parameter modification be sufficient to get my
> RAID filesystem back and mountable?
I don't know if the tune2fs info will be identical but it would be
similar. You'll need to --create the array again with the corrected
chunk size, not just --assemble it, but yes this on its own might very
well be enough to get your filesystem back.
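On the image copies (never the originals), a brute-force sweep over
candidate chunk sizes might look something like this -- untested, using
the loop-device mappings from your Figure 3, and 0.90 metadata as shown
by your mdadm -E output:
# try each plausible chunk size in turn
for chunk in 16 32 64 128 256 512 1024; do
    mdadm --stop /dev/md0 2>/dev/null
    mdadm --create /dev/md0 --run --metadata=0.90 --level=0 --raid-devices=4 \
          --chunk=$chunk /dev/mapper/loop0p2 /dev/mapper/loop1p2 \
          /dev/mapper/loop2p2 /dev/mapper/loop3p2
    # read-only check; even the right chunk size may still show damage,
    # so judge by how sane the output looks, not just the exit status
    e2fsck -n /dev/md0 > fsck-chunk-$chunk.log 2>&1
done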
Cheers,
John.
* Re: My Thecus RAID-0 filesystem unmountable with mdadm. Please help.
From: Michael Evans @ 2010-05-28 1:32 UTC (permalink / raw)
To: David Reniau; +Cc: linux-raid
I was suggesting the hexdump utility to determine several things:
1) Which disk contained the extfs magic header (thus the first disk in
almost all cases)
2) Where the start of that header was, which would /typically/ begin
after 1kb of padding (it's there for other things to use in some
cases).
3) If created properly, the extfs header /may/ give you a suggested
stripe/chunk size.
As an example...
http://www.virtualblueness.net/Ext2fs-overview/Ext2fs-overview-0.1-12.html
http://www.monstrmoose.com/repository/Halo_Tools/Etc/WinHex_15.1/Ext%20Superblock.tpl
(0x400 == 1024)
Looking at the blocks, in the standard output the 'magic signature'
should be in the 4th row, starting halfway across, which it is even
for ext4 filesystems. Typically, but not always, there is
zero-filled padding around this area, as denoted by the line of 0s and
then the * indicating that the last line repeats until the next
address.
hexdump -Cn2048 /dev/mapper/lin-lucid--root_crypt
00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000400 90 97 05 00 ff 56 16 00 f3 1d 01 00 a3 4b 04 00 |.....V.......K..|
00000410 2f 95 02 00 00 00 00 00 02 00 00 00 02 00 00 00 |/...............|
00000420 00 80 00 00 00 80 00 00 d0 1f 00 00 ec 1e e6 4b |...............K|
00000430 32 d3 e0 4b 11 00 1f 00 53 ef