* questions about softraid limitations
From: Janos Haar @ 2008-05-14  0:34 UTC
To: linux-raid

Hello list, Neil,

A few days ago I worked on a data recovery job for a faulty hardware RAID card.
The project was completed successfully, but I ran into some limitations.

To be as safe as possible, I first jumpered the drives to read-only mode.
Then I tried to build "old fashioned" linear arrays, each made of one disk plus another 64k block device (to store the superblock).
But mdadm refused to _build_ the array, because the source SCSI drive is jumpered read-only. Why? :-)

I tried to build the array with the --readonly option, but mdadm still did not understand what I wanted. (Yes, I know, RTFM...)

The next step was:
mdadm --create --assume-clean -l 5 /dev/mdX --raid-disks 5 /dev/....
(And this step is also refused if the drives, or some of the source devices, are read-only.)

That is OK, but what about building a read-only RAID 5 array for recovery use only? :-)
And being able to _build_ a degraded, read-only RAID 4/5/6 would be even better, if that ever becomes available...
Right now I have to set the read-only flag _after_ the --create, but --create + --assume-clean would be a logical and working combination with --readonly.
Additionally, I think md should only write the superblocks to the disks when the --readonly flag is cleared.
And if a read-only RAID 4/5/6 array could also handle and recover the bad sectors from the parity information... hmmm... :-)

These little options could be a great help for data recovery, and all of them are safe.

Thanks and regards,
Janos Haar
* Re: questions about softraid limitations
From: David Greaves @ 2008-05-14 10:45 UTC
To: Janos Haar; +Cc: linux-raid

Janos Haar wrote:
> Hello list, Neil,

Hi Janos

> A few days ago I worked on a data recovery job for a faulty hardware RAID card.
> The project was completed successfully, but I ran into some limitations.

Firstly, are you aware that Linux SW RAID will not understand disks written by hardware RAID?

> Then I tried to build "old fashioned" linear arrays, each made of one disk plus another 64k block device (to store the superblock).
> But mdadm refused to _build_ the array, because the source SCSI drive is jumpered read-only. Why? :-)

This will not allow md to write superblocks to the disks.

> I tried to build the array with the --readonly option, but mdadm still did not understand what I wanted. (Yes, I know, RTFM...)

This will start the array in read-only mode - you've not created an array yet, because you haven't written any superblocks...

> That is OK, but what about building a read-only RAID 5 array for recovery use only? :-)

That's fine, if they are md RAID disks. Yours aren't yet, since you haven't written the superblocks.

David
* Re: questions about softraid limitations
From: Janos Haar @ 2008-05-14 23:29 UTC
To: David Greaves; +Cc: linux-raid

Hello David,

David Greaves wrote:
> Firstly, are you aware that Linux SW RAID will not understand disks written by hardware RAID?

Yes, I know, but Linux RAID is a great tool to try it with, and if the user knows what he is doing, it is safe too. :-)

>> Then I tried to build "old fashioned" linear arrays, each made of one disk plus another 64k block device (to store the superblock).
>> But mdadm refused to _build_ the array, because the source SCSI drive is jumpered read-only. Why? :-)
> This will not allow md to write superblocks to the disks.

I mean exactly these steps:

dd if=/dev/zero of=superblock.bin bs=64k count=1
losetup /dev/loop0 superblock.bin
blockdev --setro /dev/sda
mdadm --build -l linear /dev/md0 /dev/sda /dev/loop0

The superblock area is writable.
And this should be enough to assemble the array and do the recovery, but this step is refused.

>> I tried to build the array with the --readonly option, but mdadm still did not understand what I wanted. (Yes, I know, RTFM...)
> This will start the array in read-only mode - you've not created an array yet, because you haven't written any superblocks...

Yes, I only want to build, not to create.

>> That is OK, but what about building a read-only RAID 5 array for recovery use only? :-)
> That's fine, if they are md RAID disks. Yours aren't yet, since you haven't written the superblocks.

I only want to help some people get their data back.
I only need to build, not to create.

Thanks,
Janos Haar
* Re: questions about softraid limitations
From: Neil Brown @ 2008-05-16  1:39 UTC
To: Janos Haar; +Cc: David Greaves, linux-raid

On Thursday May 15, djani22@netcenter.hu wrote:
>> Firstly, are you aware that Linux SW RAID will not understand disks written by hardware RAID?
>
> Yes, I know, but Linux RAID is a great tool to try it with, and if the user knows what he is doing, it is safe too. :-)

As long as the user also knows what the kernel is doing.....

If you build an md array on top of a read-only device, the array is still writable, and the device gets written to!!

Yes, it is a bug. I hadn't thought about that case before. I will look into it.

> I mean exactly these steps:
>
> dd if=/dev/zero of=superblock.bin bs=64k count=1
> losetup /dev/loop0 superblock.bin
> blockdev --setro /dev/sda
> mdadm --build -l linear /dev/md0 /dev/sda /dev/loop0
>
> The superblock area is writable.
> And this should be enough to assemble the array and do the recovery, but this step is refused.

What error message do you get? It worked for me (once I added --raid-disks=2).

You probably want superblock.bin to be larger than 64K. The superblock is located between 64K and 128K from the end of the device, depending on the device size. It is always at a multiple of 64K from the start of the device.

> Yes, I only want to build, not to create.
>
> I only want to help some people get their data back.
> I only need to build, not to create.

And this you can do ... but not with mdadm at the moment, unfortunately.

Watch carefully :-)
--------------------------------------------------------------
/tmp# cd /sys/block/md0/md
/sys/block/md0/md# echo 65536 > chunk_size
/sys/block/md0/md# echo 2 > layout
/sys/block/md0/md# echo raid5 > level
/sys/block/md0/md# echo none > metadata_version
/sys/block/md0/md# echo 5 > raid_disks
/sys/block/md0/md# ls -l /dev/sdb
brw-rw---- 1 root disk 8, 16 2008-05-16 11:13 /dev/sdb
/sys/block/md0/md# ls -l /dev/sdc
brw-rw---- 1 root disk 8, 32 2008-05-16 11:13 /dev/sdc
/sys/block/md0/md# echo 8:16 > new_dev
/sys/block/md0/md# echo 8:32 > new_dev
/sys/block/md0/md# echo 8:48 > new_dev
/sys/block/md0/md# echo 8:64 > new_dev
/sys/block/md0/md# echo 8:80 > new_dev
/sys/block/md0/md# echo 0 > dev-sdb/slot
/sys/block/md0/md# echo 1 > dev-sdc/slot
/sys/block/md0/md# echo 2 > dev-sdd/slot
/sys/block/md0/md# echo 3 > dev-sde/slot
/sys/block/md0/md# echo 4 > dev-sdf/slot
/sys/block/md0/md# echo 156250000 > dev-sdb/size
/sys/block/md0/md# echo 156250000 > dev-sdc/size
/sys/block/md0/md# echo 156250000 > dev-sdd/size
/sys/block/md0/md# echo 156250000 > dev-sde/size
/sys/block/md0/md# echo 156250000 > dev-sdf/size
/sys/block/md0/md# echo readonly > array_state
/sys/block/md0/md# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] [faulty]
md0 : active (read-only) raid5 sdf[4] sde[3] sdd[2] sdc[1] sdb[0]
      624999936 blocks super non-persistent level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]

unused devices: <none>
----------------------------------------------------------

Did you catch all of that?

The per-device 'size' is in K - I took it straight from /proc/partitions.
The chunk_size is in bytes.

Have fun.

NeilBrown
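For reference, the sysfs steps above could be collected into a small helper script along the following lines. This is only a sketch, not something from the original thread: the member names, slot order, per-member size, chunk size and layout are placeholders that must be replaced with the real geometry of the array being recovered, and it assumes /sys/block/md0/md already exists (i.e. the md0 device node has been opened once) on a kernel new enough to expose these files.

--------------------------------------------------------------
#!/bin/sh
# Sketch: assemble a superblock-less, read-only RAID5 through sysfs only.
set -e

SYS=/sys/block/md0/md
MEMBERS="sdb sdc sdd sde sdf"   # placeholder member names, in slot order 0..4
SIZE_KB=156250000               # placeholder per-member size in KiB (see /proc/partitions)

echo 65536 > $SYS/chunk_size          # chunk size in bytes
echo 2     > $SYS/layout              # left-symmetric, the md default for raid5
echo raid5 > $SYS/level
echo none  > $SYS/metadata_version    # no superblocks on the members
echo 5     > $SYS/raid_disks

slot=0
for d in $MEMBERS; do
    maj=$((0x$(stat -c %t /dev/$d)))  # major/minor of the member device
    min=$((0x$(stat -c %T /dev/$d)))
    echo $maj:$min  > $SYS/new_dev    # hand the device to md
    echo $slot      > $SYS/dev-$d/slot
    echo $SIZE_KB   > $SYS/dev-$d/size
    slot=$((slot + 1))
done

echo readonly > $SYS/array_state      # start the array read-only
cat /proc/mdstat
--------------------------------------------------------------

Getting the geometry right (slot order, chunk size, layout, per-member size) is the hard part; if any of these guesses are wrong the array still starts, but the data comes out scrambled, which is why starting it read-only first is so attractive.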
* [OT] Re: questions about softraid limitations
From: Peter Rabbitson @ 2008-05-16  6:05 UTC
To: Neil Brown; +Cc: Janos Haar, David Greaves, linux-raid

Neil Brown wrote:
> On Thursday May 15, djani22@netcenter.hu wrote:
>> I only want to help some people get their data back.
>> I only need to build, not to create.
>
> And this you can do ... but not with mdadm at the moment, unfortunately.
>
> Watch carefully :-)
> [snip - sysfs walkthrough]

Wow, this is big :) Is it correct to say that today most of mdadm's tasks can be carried out by toying with sysfs? Fascinating!
* Re: [OT] Re: questions about softraid limitations
From: Neil Brown @ 2008-05-18 23:52 UTC
To: Peter Rabbitson; +Cc: Janos Haar, David Greaves, linux-raid

On Friday May 16, rabbit+list@rabbit.us wrote:
> Wow, this is big :) Is it correct to say that today most of mdadm's tasks can be carried out by toying with sysfs? Fascinating!

Sort of. Many of mdadm's tasks are "make it easy to ..." or "automatically figure out ...". Those things cannot be done by toying in sysfs.

But yes: most (if not all) direct manipulations of an md array can be done through text files in sysfs.

NeilBrown
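As a small illustration of that point (an aside, not from the thread itself): the periodic array scrub that distributions normally schedule through mdadm or a cron script is itself just a couple of sysfs reads and writes, assuming md0 exists and is running:

--------------------------------------------------------------
# Start a background consistency check of md0 and watch it run.
echo check > /sys/block/md0/md/sync_action
cat /proc/mdstat                         # shows the check progressing
cat /sys/block/md0/md/mismatch_cnt       # parity/copy mismatches found so far
--------------------------------------------------------------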
* Re: questions about softraid limitations
From: Janos Haar @ 2008-05-16 10:00 UTC
To: Neil Brown; +Cc: linux-raid

Hello Neil,

(Sorry Neil for the duplication, but the first time I forgot to CC the list.)

Neil Brown wrote:
>> Yes, I know, but Linux RAID is a great tool to try it with, and if the user knows what he is doing, it is safe too. :-)
>
> As long as the user also knows what the kernel is doing.....
>
> If you build an md array on top of a read-only device, the array is still writable, and the device gets written to!!
>
> Yes, it is a bug. I hadn't thought about that case before. I will look into it.

Ooops. :-)

>> dd if=/dev/zero of=superblock.bin bs=64k count=1
>> losetup /dev/loop0 superblock.bin
>> blockdev --setro /dev/sda
>> mdadm --build -l linear /dev/md0 /dev/sda /dev/loop0
>>
>> The superblock area is writable.
>> And this should be enough to assemble the array and do the recovery, but this step is refused.
>
> What error message do you get? It worked for me (once I added --raid-disks=2).

The previous example was just typed on the fly; with a real read-only-jumpered SCSI sda it does give an error message!

> You probably want superblock.bin to be larger than 64K. The superblock is located between 64K and 128K from the end of the device, depending on the device size. It is always at a multiple of 64K from the start of the device.

Usually I use 128MB disk partitions for this, which is safe enough.
And sometimes we need more than 8 loop devices... :-)

> [snip - sysfs walkthrough]
>
> Did you catch all of that?

sysfs! :-) Wow! :-)
This is what really helps me, thanks! :-)
But what about other people? Will mdadm learn to do this too?

> The per-device 'size' is in K - I took it straight from /proc/partitions.
> The chunk_size is in bytes.
>
> Have fun.

Thank you! Next time I will try it. :-)

Janos
* Re: questions about softraid limitations
From: David Greaves @ 2008-05-16  8:36 UTC
To: Janos Haar; +Cc: linux-raid

Janos Haar wrote:
>> Firstly, are you aware that Linux SW RAID will not understand disks written by hardware RAID?
> Yes, I know, but Linux RAID is a great tool to try it with, and if the user knows what he is doing, it is safe too. :-)
OK - just checking :)

>> This will not allow md to write superblocks to the disks.
>
> I mean exactly these steps:
>
> dd if=/dev/zero of=superblock.bin bs=64k count=1
> losetup /dev/loop0 superblock.bin
> blockdev --setro /dev/sda
> mdadm --build -l linear /dev/md0 /dev/sda /dev/loop0
>
> The superblock area is writable.
> And this should be enough to assemble the array and do the recovery, but this step is refused.

Ah, I understand now.
I think you need -n2 to tell mdadm to use 2 devices.

>>> That is OK, but what about building a read-only RAID 5 array for recovery use only? :-)
>> That's fine, if they are md RAID disks. Yours aren't yet, since you haven't written the superblocks.
>
> I only want to help some people get their data back.
> I only need to build, not to create.

I think this would be really hard if they are not md arrays, since the on-disk layout is likely to be different. Not something I know how to do.

Typically the first step in recovery is to duplicate the disks using ddrescue and work on copies of the duplicates, where you can overwrite things.
If you have had a hardware failure on the drive, then even mounting read-only can make things worse. (If the mb/controller failed then fair enough - but in that case it's not a 'recovery', just a simple, 'no-risk(tm)' migration... ?)

Tell us more about the failed system:
* hardware or md raid5 (if hw then you'll need a *lot* of info about the on-disk layout, and I personally have no clue how to help - sorry)

If md:
* kernel of the original system and the new system
* new mdadm version
* what kind of failure occurred
* any dmesg data you have
* can you ddrescue the drives and run mdadm --examine /dev/sd<partition> for each component?

Cheers
David
PS Apologies if I'm stating things that are obvious to you :)
* Re: questions about softraid limitations
From: David Greaves @ 2008-05-16  9:18 UTC
To: Janos Haar; +Cc: linux-raid

David Greaves wrote:
> [snip]

That was written before checking the list this morning and seeing that Neil had replied...

Anyhow - consider what I said about the disk failure issues, and be careful with Neil's incantation if the only copy of the data is on those potentially flaky drives.

If things aren't clear to you, then feel free to answer my questions before doing anything you later regret :)

cheers
David
* Re: questions about softraid limitations
From: Janos Haar @ 2008-05-16  9:28 UTC
To: David Greaves; +Cc: linux-raid

David Greaves wrote:
>> Yes, I know, but Linux RAID is a great tool to try it with, and if the user knows what he is doing, it is safe too. :-)
> OK - just checking :)

:)

>> I mean exactly these steps:
>>
>> dd if=/dev/zero of=superblock.bin bs=64k count=1
>> losetup /dev/loop0 superblock.bin
>> blockdev --setro /dev/sda
>> mdadm --build -l linear /dev/md0 /dev/sda /dev/loop0
>>
>> The superblock area is writable.
>> And this should be enough to assemble the array and do the recovery, but this step is refused.
> Ah, I understand now.
> I think you need -n2 to tell mdadm to use 2 devices.

Sorry, I forgot that. :-)
I wrote it on the fly, not as a real example...

>> I only want to help some people get their data back.
>> I only need to build, not to create.
>
> I think this would be really hard if they are not md arrays, since the on-disk layout is likely to be different. Not something I know how to do.
>
> Typically the first step in recovery is to duplicate the disks using ddrescue and work on copies of the duplicates, where you can overwrite things.
> If you have had a hardware failure on the drive, then even mounting read-only can make things worse. (If the mb/controller failed then fair enough - but in that case it's not a 'recovery', just a simple, 'no-risk(tm)' migration... ?)

Yes, I agree.
At the moment I am working in my data recovery company, and sometimes we need to recover broken hardware RAID arrays too.
(With md arrays we have no problem at all. :-) )

In your lines we are talking about two cases:

a) A disk hardware problem (only bad sectors; a completely failed disk belongs to case 'b').
Yes, ddrescue is the best way to do the recovery, but:
ddrescue is too aggressive with the default -e 0 setting!
This can easily bring the drive down completely! (Depending on the reason for the bad sectors.)
And with the images we have another problem: the 0x00 holes.
Neither the hardware nor md has any idea where we need to recover from parity and where we have real zero blocks...
Overall, this is why data recovery companies keep learning and developing more and more... :-)

b) The disks are fine, but the hardware RAID card failed, or the array has a logical problem, e.g. 2 disks in a RAID 5 are out of sync.
In this case the duplication is only a waste of time; the recovery can be done safely in read-only mode.
Often the problem with these arrays is time: the servers are using the array, and all downtime is expensive.
In my case the recovery was already done successfully, but yes, I would have had to copy all of the x TB of data just to make a read-only probe.... :-(

> Tell us more about the failed system:
> * hardware or md raid5 (if hw then you'll need a *lot* of info about the on-disk layout, and I personally have no clue how to help - sorry)

The card was an Adaptec SCSI RAID card, with 5 disks in RAID 5.
An electrical problem (a blackout) put 2 of the disks out of sync, and the card gave me no chance to repair or safely rebuild the array, only to erase it...
But with md, it's done. :-)

I don't need help at this time; I just want to share my ideas, to help upgrade/develop md and to help people...

Thanks,
Janos
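For context, the usual md-based approach to a case like (b) - sketched here with placeholder device names and guessed geometry, since the thread does not give the exact commands that were used - is to recreate the array's geometry over ddrescue'd copies (or over the originals, accepting that --create writes new superblocks near the end of each member), with --assume-clean so nothing is resynced, and then to verify everything read-only before trusting it. It only works if the controller's data layout and offsets can be matched by one of md's layouts, which is not guaranteed for hardware RAID formats.

--------------------------------------------------------------
# Sketch only: chunk size, layout and disk order are guesses that must
# match the original controller; the copy-* devices are placeholders.
mdadm --create /dev/md1 --assume-clean --level=5 --raid-devices=5 \
      --chunk=64 --layout=left-symmetric \
      /dev/mapper/copy-sdb /dev/mapper/copy-sdc /dev/mapper/copy-sdd \
      /dev/mapper/copy-sde /dev/mapper/copy-sdf

mdadm --readonly /dev/md1            # refuse any further writes to the array
fsck -n /dev/md1                     # non-destructive check (assuming a filesystem sits directly on the array)
mount -o ro /dev/md1 /mnt/recovery   # read-only mount for copying the data out
--------------------------------------------------------------

If the guessed geometry is wrong, the array assembles but its contents look like garbage; since nothing beyond the superblocks is written, the parameters can be retried until fsck and the mount look sane.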
* Re: questions about softraid limitations
From: David Greaves @ 2008-05-18  9:11 UTC
To: Janos Haar; +Cc: linux-raid, David Lethe

Janos Haar wrote:
> At the moment I am working in my data recovery company, and sometimes we need
Ah - I missed this too.

> to recover broken hardware RAID arrays too.
> (With md arrays we have no problem at all. :-) )
Nice quote for "the benefits of software raid" somewhere :)

> In your lines we are talking about two cases:
>
> a) A disk hardware problem (only bad sectors; a completely failed disk belongs to case 'b').
> Yes, ddrescue is the best way to do the recovery, but:
> ddrescue is too aggressive with the default -e 0 setting!
> This can easily bring the drive down completely! (Depending on the reason for the bad sectors.)
OK, worth knowing - what would you suggest?

> And with the images we have another problem: the 0x00 holes.
> Neither the hardware nor md has any idea where we need to recover from parity and where we have real zero blocks...
> Overall, this is why data recovery companies keep learning and developing more and more... :-)

Hmm - I wonder if things like ddrescue could work with the md bitmaps to improve this situation?
Is this related to David Lethe's recent request?

> I don't need help at this time; I just want to share my ideas, to help upgrade/develop md and to help people...
OK - ta.

David
* Re: questions about softraid limitations
From: Janos Haar @ 2008-05-18 11:11 UTC
To: David Greaves; +Cc: linux-raid

David Greaves wrote:
>> Yes, ddrescue is the best way to do the recovery, but:
>> ddrescue is too aggressive with the default -e 0 setting!
>> This can easily bring the drive down completely! (Depending on the reason for the bad sectors.)
> OK, worth knowing - what would you suggest?

The currently best way is this:

blockdev --setro /dev/SOURCE
blockdev --setra 0 /dev/SOURCE
ddrescue -v -b 4096 -B 1024 -e 1 -y 0 /dev/SOURCE /dev/TARGET
(If you don't use a really clean target disk full of zeros, you need the -A option too.)

+ you need a kernel patch to disable all retries on the source disk.

The best recovery strategy is in 3 steps:
1. Read the drive with these settings, and after it stops on an error, jump forward by x KByte and continue reading...
2. Jump over only the starting KBytes and read backward (with -e 1) to the first error, or until the end of the x KByte.
3. Read only the remaining holes, with -B 1024 -b 1024 -e 0 -y 0. :-)
+ you need to count the errors right after the jumps!
(If the drive gives a read error after 3-5 consecutive jumps, it means something bad is happening and the drive needs to be stopped immediately!)

The best jump size x is 16 KByte - 8 MByte for good heads and good platters, and 100 MB - 1 GB for a damaged head and/or damaged platters.

There is no easy solution. :-(

>> And with the images we have another problem: the 0x00 holes.
>> Neither the hardware nor md has any idea where we need to recover from parity and where we have real zero blocks...
>> Overall, this is why data recovery companies keep learning and developing more and more... :-)
>
> Hmm - I wonder if things like ddrescue could work with the md bitmaps to improve this situation?
> Is this related to David Lethe's recent request?

I think ddrescue is for copying/rescuing the data, not for processing it.
But it can log the errors... ;-)

But we have another problem at this point:
The practical step is to copy the whole block device, not only the partition.
(If something goes wrong with the damaged heads or platters, we don't know whether the MBR (the first sector) will still exist at the next reboot...)

Cheers,
Janos
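The gist of the preparation above, with one addition that is an assumption on my part rather than something from the thread: without the retry-disabling kernel patch Janos mentions, shortening the per-command SCSI timeout via sysfs at least limits how long the kernel stalls on each bad sector. It does not remove the low-level retries, and the timeout value is only an example; device names are placeholders.

--------------------------------------------------------------
# Prepare the patient drive before the first ddrescue pass.
blockdev --setro /dev/sdb              # never write to the source
blockdev --setra 0 /dev/sdb            # no readahead: read only what is asked for

# Assumed workaround when the no-retry kernel patch is not available:
# shorten the SCSI command timeout (seconds) so error handling gives up sooner.
echo 7 > /sys/block/sdb/device/timeout

# First, gentle pass exactly as described above (sdb = source, sdc = target).
ddrescue -v -b 4096 -B 1024 -e 1 -y 0 /dev/sdb /dev/sdc
--------------------------------------------------------------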
* Re: questions about softraid limitations
From: David Greaves @ 2008-05-18 13:00 UTC
To: Janos Haar; +Cc: linux-raid

Janos Haar wrote:
> The currently best way is this:
>
> blockdev --setro /dev/SOURCE
> blockdev --setra 0 /dev/SOURCE
> ddrescue -v -b 4096 -B 1024 -e 1 -y 0 /dev/SOURCE /dev/TARGET
> (If you don't use a really clean target disk full of zeros, you need the -A option too.)
>
> + you need a kernel patch to disable all retries on the source disk.

Is that something you could share?

It sounds like something that would be worth including in the libata error handling, to be controlled by something like
/sys/block/sdX/device/ioerr_retry_count

I'd be interested in playing with it and sending it to Tejun Heo if you don't have time.

> The best recovery strategy is in 3 steps:
[snip]
> There is no easy solution. :-(

OK - I'll put this up on the wiki sometime.

>> Hmm - I wonder if things like ddrescue could work with the md bitmaps to improve this situation?
>> Is this related to David Lethe's recent request?
>
> I think ddrescue is for copying/rescuing the data, not for processing it.
> But it can log the errors... ;-)

Yes - I meant taking the ddrescue output and manipulating the bitmap on the copied drive to mark certain parts as 'bad'.
Ideally this could be done in the event of a dual failure, so that you could re-add 2 failed drives and the bitmaps could indicate which blocks still need to be reconstructed.

This may require something closer to a bytemap but...

David
* Re: questions about softraid limitations
From: Janos Haar @ 2008-05-18 21:51 UTC
To: David Greaves; +Cc: linux-raid

David Greaves wrote:
>> + you need a kernel patch to disable all retries on the source disk.
> Is that something you could share?

Ahh, I made it myself about a year ago against 2.6.18, but unfortunately the source has already been deleted and I didn't save the diff. :-(
I am not a good C programmer, but it only takes about an hour. ;-)

> It sounds like something that would be worth including in the libata error handling, to be controlled by something like
> /sys/block/sdX/device/ioerr_retry_count

This is a great idea! :)
But not just for IDE drives! I think this would be good for all block devices! (USB, SCSI, IDE, SATA...)

Oops, and an old problem comes up again:
On IDE drives with read errors, DMA sometimes gets dropped.
My script can sometimes re-enable it, but that is not a really good solution...
If this could be fixed too, that would be great! :-)

> I'd be interested in playing with it and sending it to Tejun Heo if you don't have time.
>
>> The best recovery strategy is in 3 steps:
> [snip]
>> There is no easy solution. :-(
>
> OK - I'll put this up on the wiki sometime.

> Yes - I meant taking the ddrescue output and manipulating the bitmap on the copied drive to mark certain parts as 'bad'.
> Ideally this could be done in the event of a dual failure, so that you could re-add 2 failed drives and the bitmaps could indicate which blocks still need to be reconstructed.
>
> This may require something closer to a bytemap but...

This would be good, done that way. :-)

Cheers,
Janos
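The DMA fallback mentioned here is the old IDE-layer behaviour of dropping to PIO after repeated errors; a re-enabling script is presumably something along these lines (an assumption, not Janos's actual script - the device name is a placeholder):

--------------------------------------------------------------
# Query the current DMA setting of the source IDE disk, then turn it back on.
hdparm -d /dev/hda      # reports the current using_dma setting
hdparm -d1 /dev/hda     # re-enable DMA; repeat after each fallback to PIO
--------------------------------------------------------------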
* RE: questions about softraid limitations
From: David Lethe @ 2008-05-18 19:36 UTC
To: David Greaves, Janos Haar; +Cc: linux-raid

David Greaves wrote:
> Hmm - I wonder if things like ddrescue could work with the md bitmaps to improve this situation?
> Is this related to David Lethe's recent request?

No, we are trying two different approaches.

In my situation I already know that the data is munged on a particular block, so the solution is to calculate the correct data from the surviving parity and just write the new value. There is no reason to worry about md bitmaps, or even whether or not there are 0x00 holes.

I am not trying to fix a problem such as a rebuild gone bad or an intermittent disk failure that put the md array in a partially synced, and totally confused, state.

[I also do data recovery, and have a software bag-o-tricks, but I only take on jobs relating to certain hardware RAID controllers where I am intimately familiar with the metadata layout ... and the bag-o-tricks nearly always has to be modified for the original configuration and chain of events.]

My desire is to limit damage before a full disk recovery needs to be performed, by ensuring that there are no double errors that would make stripe-level recovery impossible (assuming they aren't using RAID6). For that I need a mechanism to repair a stripe given a physical disk and offset. There is no completely failed disk to contend with, merely a block of bad data that will repair itself once I issue a simple write command. (The trick, of course, is to figure out exactly what to write and where, and to deal with potential locking issues relating to the file system.)
* Re: questions about softraid limitations
From: David Greaves @ 2008-05-18 22:23 UTC
To: David Lethe; +Cc: Janos Haar, linux-raid

David Lethe wrote:
> No, we are trying two different approaches.
> In my situation I already know that the data is munged on a particular block, so the solution is to calculate the correct data from the surviving parity and just write the new value. There is no reason to worry about md bitmaps, or even whether or not there are 0x00 holes.

I think we (or I) may be talking about the same thing?

Consider an array sd[abcde] with a bad block (42) on sdb, followed by a bad block elsewhere (142) on sdc.
I would like to ddrescue sdb to sdb' and sdc to sdc' (leaving holes):
block 42 should be recovered from sd[acde] to sdb'
block 142 should be recovered from sd[abde] to sdc'

The idea was to possibly tristate the bitmap: clean/dirty/corrupt.
If md gets a read/write error then it marks the block corrupt; alternatively we could use the output from ddrescue to identify corrupt blocks that md may not have seen.

I wondered whether each block actually needed to record the event it was last updated with. I haven't thought through the various failure cases but...

> I am not trying to fix a problem such as a rebuild gone bad or an intermittent disk failure that put the md array in a partially synced, and totally confused, state.

No, me neither...

> My desire is to limit damage before a full disk recovery needs to be performed, by ensuring that there are no double errors that would make stripe-level recovery impossible (assuming they aren't using RAID6).
> For that I need a mechanism to repair a stripe given a physical disk and offset. There is no completely failed disk to contend with, merely a block of bad data that will repair itself once I issue a simple write command. (The trick, of course, is to figure out exactly what to write and where, and to deal with potential locking issues relating to the file system.)

I think I'm describing that too.
If you simplify my case to a single bad block, do we meet?

David
* Re: questions about softraid limitations
From: Janos Haar @ 2008-05-18 22:38 UTC
To: David Lethe, David Greaves; +Cc: linux-raid

David Greaves wrote:
> I think we (or I) may be talking about the same thing?
>
> Consider an array sd[abcde] with a bad block (42) on sdb, followed by a bad block elsewhere (142) on sdc.
> I would like to ddrescue sdb to sdb' and sdc to sdc' (leaving holes):
> block 42 should be recovered from sd[acde] to sdb'
> block 142 should be recovered from sd[abde] to sdc'

If I read this correctly, David Lethe wants an on-the-fly solution, to keep the integrity of the big online array before some application reads the bad block and a resync is needed...
(David, think twice before buying disks! ;-)

> The idea was to possibly tristate the bitmap: clean/dirty/corrupt.
> If md gets a read/write error then it marks the block corrupt; alternatively we could use the output from ddrescue to identify corrupt blocks that md may not have seen.

I am not sure, but am I right that the bitmap cannot be tristate?
In my case, though, the dirty flag is enough, because I only need a read-only array to read the data, and there is no need for the kernel to rewrite the bad block.

> I think I'm describing that too.
> If you simplify my case to a single bad block, do we meet?
>
> David