* recovery problems, might be driver-related
From: Stefan Hübner @ 2010-02-07 13:38 UTC
To: linux-raid
Hi Everybody,
I've recently come across some RAID recovery problems that were not
so easy to understand. I was trying to recover damaged RAID5 arrays
where one disk died and another dropped out during resync with a
spare/new disk (most likely due to a read-error-recovery timeout,
i.e. the drive not reporting back while error recovery was active).
After taking double backups I tried to recreate the RAID from the
needed working disk images (to make the superblocks consistent).
During that operation mdadm told me that the last-dropped disk
contained a valid ext3 filesystem and was obviously part of an
md array.
This happened with NAS devices from two different vendors (namely
Thecus and Synology), which made me think it must be an md-raid
thing. Does md-raid create a filesystem after a disk drops out? Or
might something else in the system cause this strange behaviour?
All in all: after recreating the RAIDs, the filesystem contained on
them was totally damaged (it could not even be mounted). fsck ran
for multiple days, with excessive data loss.
P.S.: the mdadm lines to recreate the RAIDs were derived from the
mdadm -E outputs of the original partitions, so I believe the
procedure should have worked (it also worked well on other
recoveries I did before).
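For illustration, such a recreate line looks roughly like the one
below. Everything in it is a placeholder: level, chunk size, metadata
version, device names and their order would all come from the
mdadm -E outputs, and "missing" stands in for the dead disk:

    # placeholders throughout -- take level/chunk/metadata/order from mdadm -E
    mdadm --create /dev/md0 --assume-clean --metadata=0.90 \
          --level=5 --raid-devices=4 --chunk=64 \
          /dev/sda1 /dev/sdb1 /dev/sdc1 missing

(--assume-clean keeps md from starting a resync, which would
otherwise rewrite parity over the very data being recovered.)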
Does anyone have an idea what is going on here? Or might it have
happened - well, I don't want to say something that could get me sued.
All the best,
Stefan Hübner
* Re: recovery problems, might be driver-related
From: Neil Brown @ 2010-02-08 5:52 UTC
To: stefan.huebner; +Cc: linux-raid
On Sun, 07 Feb 2010 14:38:18 +0100
Stefan Hübner <stefan.huebner@stud.tu-ilmenau.de> wrote:
> Hi Everybody,
>
> I've recently come across some RAID recovery problems that were not
> so easy to understand. I was trying to recover damaged RAID5 arrays
> where one disk died and another dropped out during resync with a
> spare/new disk (most likely due to a read-error-recovery timeout,
> i.e. the drive not reporting back while error recovery was active).
> After taking double backups I tried to recreate the RAID from the
> needed working disk images (to make the superblocks consistent).
> During that operation mdadm told me that the last-dropped disk
> contained a valid ext3 filesystem and was obviously part of an
> md array.
When examining a device, mdadm only checks whether the start of the
device looks like an ext3 superblock. So if an md array holds a valid
filesystem, it is very likely that at least one of its member devices
will appear to mdadm to hold a valid filesystem as well: with 0.90/1.0
metadata the md superblock sits at the end of the device, so the start
of the first data disk is also the start of the filesystem.
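For illustration, here is a minimal check of the same kind (not
mdadm's actual code; /dev/sdX1 is a placeholder for a member
partition). The ext2/3 superblock starts 1024 bytes into the device
and carries the magic 0xEF53 at offset 56 within it, i.e. at absolute
byte offset 1080:

    # read the two magic bytes at absolute offset 1080 (0x438);
    # ext3 stores 0xEF53 little-endian, so a filesystem shows "53 ef"
    dd if=/dev/sdX1 bs=1 skip=1080 count=2 2>/dev/null | od -An -tx1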
>
> This happened with NAS devices from two different vendors (namely
> Thecus and Synology), which made me think it must be an md-raid
> thing. Does md-raid create a filesystem after a disk drops out? Or
> might something else in the system cause this strange behaviour?
No, nothing would try to create a filesystem on a device just because it has
dropped out of a RAID.
>
> All in all: after recreating the RAIDs, the filesystem contained on
> them was totally damaged (it could not even be mounted). fsck ran
> for multiple days, with excessive data loss.
Maybe there was meant to be another layer between the md array and
the filesystem - maybe LVM? If there should have been an LVM layer
and there wasn't, the filesystem would definitely look very corrupt
even though the superblock might appear to be in the right place.
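One quick way to test for a missing LVM layer (here /dev/md0 stands
for the recreated array) is to look for the LVM2 physical-volume
label, which lives in one of the first four 512-byte sectors:

    # an LVM2 PV carries the ASCII label "LABELONE" in one of its first
    # four sectors; a hit suggests the filesystem sat inside LVM
    dd if=/dev/md0 bs=512 count=4 2>/dev/null | strings | grep LABELONE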
NeilBrown
>
> P.S.: the mdadm lines to recreate the RAIDs were derived from the
> mdadm -E outputs of the original partitions, so I believe the
> procedure should have worked (it also worked well on other
> recoveries I did before).
>
> Does anyone have an idea what is going on here? Or might it have
> happened - well, I don't want to say something that could get me sued.
>
> All the best,
> Stefan Hübner
* Re: recovery problems, might be driver-related
From: Stefan /*St0fF*/ Hübner @ 2010-02-08 6:49 UTC
To: linux-raid
Neil Brown wrote:
> On Sun, 07 Feb 2010 14:38:18 +0100
> [...]
> When examining a device, mdadm only checks whether the start of the
> device looks like an ext3 superblock. So if an md array holds a valid
> filesystem, it is very likely that at least one of its member devices
> will appear to mdadm to hold a valid filesystem as well: with 0.90/1.0
> metadata the md superblock sits at the end of the device, so the start
> of the first data disk is also the start of the filesystem.
Indeed, that makes perfect sense.
>
>[...]
>
> Maybe there was meant to be another layer between the md array and
> the filesystem - maybe LVM? If there should have been an LVM layer
> and there wasn't, the filesystem would definitely look very corrupt
> even though the superblock might appear to be in the right place.
>
Well, no need to tell me - the guys in Taiwan just don't use it on
their NAS devices. But thanks for the hint. I recall that the
successful data recoveries were on a Thecus N5200, which does indeed
use LVM (to separate iSCSI space, user space, config space, etc.).
>
> NeilBrown
>
Bottom line: the customers' RAIDs were just f**ked; they should have
taken drive health more seriously... And the vendors' code should get
"normalized" so that filesystem checks take place at regular intervals...
Thanks for the help,
Stefan Hübner