* Map Block number from hdd to md
From: Michael <michael@rw23.de>
Date: 2010-02-12 0:24 UTC
To: linux-raid

Hello,

I've run into the situation that one of my 4 mdadm RAID5 drives failed.
Not really failed, but it was not detected at system startup. So I started
a resync, and one of the remaining drives had a bad block and failed. So
now 2 drives are offline and the RAID is not functional anymore.

1st question:
I have read that it is possible with debugfs to locate which file a bad
block belongs to on an ext file system. Good, so I can check whether I have
*lost* an important or an unimportant file... or just free space. The
problem is that I can't map the known bad block from, let's say, sda to my
RAID array md0.

Is there any method to find that bad block in the context of the RAID block
device? Reading all files is not a good option on large RAID sets.
Level 5, 64k chunk, algorithm 2.

2nd question:
In my case I have a functional RAID5 array with 3 of 4 disks, in which one
of the active disks has a bad sector. Assume that the one failed disk has
consistent parity information/data for this sector, but has been altered
elsewhere so that a complete resync would not work. Is there a way to
resync only the one chunk that the bad block belongs to, using the data
from the 3 drives without a bad block, even if one of them is not an active
part of the array but was before?
* Re: Map Block number from hdd to md
From: Neil Brown <neilb@suse.de>
Date: 2010-02-16 1:20 UTC
To: Michael; +Cc: linux-raid

On Fri, 12 Feb 2010 01:24:30 +0100 Michael <michael@rw23.de> wrote:

> 1st question:
> I have read that it is possible with debugfs to locate which file a bad
> block belongs to on an ext file system. Good, so I can check whether I
> have *lost* an important or an unimportant file... or just free space.
> The problem is that I can't map the known bad block from, let's say, sda
> to my RAID array md0.
>
> Is there any method to find that bad block in the context of the RAID
> block device? Reading all files is not a good option on large RAID sets.
> Level 5, 64k chunk, algorithm 2.

It isn't that hard.  The code is in drivers/md/raid5.c in the kernel.....

Rather than trying to describe it in general, give me the block number, the
device, and "mdadm --examine" of that device, and I'll tell you how I get
the answer.

> 2nd question:
> In my case I have a functional RAID5 array with 3 of 4 disks, in which
> one of the active disks has a bad sector. Assume that the one failed disk
> has consistent parity information/data for this sector, but has been
> altered elsewhere so that a complete resync would not work. Is there a
> way to resync only the one chunk that the bad block belongs to, using the
> data from the 3 drives without a bad block, even if one of them is not an
> active part of the array but was before?

No.

If you were desperate, you could use 'dd' to read each of the chunks into a
file, then write a little C/perl/whatever program to xor those files
together, then use 'dd' to write that file back out to the target chunk.

NeilBrown
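A minimal sketch of that dd-plus-xor approach, in Python rather than C or
perl. Everything in it is illustrative: the chunk size, stripe number and
device names are placeholders, and the correct stripe/offset has to be
worked out first (as is done later in this thread), so treat it as a sketch
of the idea rather than a ready-to-run repair tool.

    #!/usr/bin/env python3
    # Sketch only: reconstruct one missing RAID5 chunk by XOR-ing the chunks
    # of the same stripe from the surviving members, then write it back.
    # CHUNK, STRIPE, GOOD and TARGET are hypothetical placeholder values.
    CHUNK  = 64 * 1024                 # chunk size in bytes (64k as in this array)
    STRIPE = 953599                    # stripe number that holds the bad block
    OFFSET = STRIPE * CHUNK            # byte offset of that chunk on each member
                                       # (assumes data offset 0)
    GOOD   = ["/dev/sdc3", "/dev/sdd3", "/dev/sdf3"]  # readable members of the stripe
    TARGET = "/dev/sda3"               # member whose chunk is to be rewritten

    buf = bytearray(CHUNK)
    for dev in GOOD:
        with open(dev, "rb") as f:     # like: dd if=dev bs=64k skip=STRIPE count=1
            f.seek(OFFSET)
            chunk = f.read(CHUNK)
        for i, b in enumerate(chunk):
            buf[i] ^= b                # parity XOR data chunks == missing chunk

    with open(TARGET, "r+b") as f:     # like: dd of=TARGET bs=64k seek=STRIPE conv=notrunc
        f.seek(OFFSET)
        f.write(bytes(buf))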
* Re: Map Block number from hdd to md
From: Keld Simonsen <keld@keldix.com>
Date: 2010-02-16 4:02 UTC
To: Neil Brown; +Cc: Michael, linux-raid

On Tue, Feb 16, 2010 at 12:20:14PM +1100, Neil Brown wrote:
> On Fri, 12 Feb 2010 01:24:30 +0100 Michael <michael@rw23.de> wrote:
>
> > I've run into the situation that one of my 4 mdadm RAID5 drives failed.
> > Not really failed, but it was not detected at system startup. So I
> > started a resync, and one of the remaining drives had a bad block and
> > failed. So now 2 drives are offline and the RAID is not functional
> > anymore.

I just had a similar situation. A RAID5 with 4 disks had block errors on
one disk and it was failed. I checked the disk and it seemed to be without
errors, so I wanted to re-add it. But then another disk (a Samsung 1 TB)
erred during the resync because it also had bad blocks.

I managed to get the RAID5 running again by forcing it to run with only
3 disks (one of them with bad blocks), and after checking the fs with
xfs_repair I found that I was lucky: the fs integrity (directories, inodes
etc.) was undamaged. So I could run the array.

But I cannot resync it, as resyncing almost immediately runs into the bad
blocks on the Samsung disk. It would have been nice if there was some sort
of bad-block management in Linux MD, but I understand that this is in the
works.

I also understand that ext3 badblock management would not have saved me
here, true? MD resyncing happens at an underlying level and does not take
ext3 badblock handling into account, I think.

> > Is there any method to find that bad block in the context of the RAID
> > block device? Reading all files is not a good option on large RAID
> > sets. Level 5, 64k chunk, algorithm 2.
>
> It isn't that hard.  The code is in drivers/md/raid5.c in the kernel.....
>
> Rather than trying to describe it in general, give me the block number,
> the device, and "mdadm --examine" of that device, and I'll tell you how
> I get the answer.

Furthermore, I would have liked to find out which files were affected.
Is there a way to do this with XFS? debugfs is for ext3, and I was not
able to find a program that maps a sector to an inode in XFS. And then
there is the need to map the physical bad block number on the device to
the actual block in the (damaged) RAID5. How to do that? I think this is
almost the same question as Michael's (with an XFS variation).

Best regards
Keld
* Re: Map Block number from hdd to md
From: Keld Simonsen <keld@keldix.com>
Date: 2010-02-16 4:38 UTC
To: Neil Brown; +Cc: Michael, linux-raid

Further to my problems described below, I dreamt up something that could
solve my problem until I get new disks installed.

I am actually alive with a RAID5 that has 2 malfunctioning devices -
something that is supposed to be impossible... and I think it could be
revived. I also think this is not an uncommon situation.

I have bad blocks, but only about 60 blocks on one drive and 10 on the
other, out of 4 drives. That is an error rate of about 1 out of 20,000, or
a 99.995% good data rate. If I could resync both of the erroneous drives
and avoid the bad blocks in the process, I would be safe (for some time).

So if the resync could be told to avoid the bad blocks, and the file system
in question could also be told to avoid those blocks, then I could be back
in the air. I was thinking of a userland resync process - no need to change
the kernel, just install a new mdadm and friends. Is that doable and
useful?

best regards
keld

On Tue, Feb 16, 2010 at 06:02:52AM +0200, Keld Simonsen wrote:
> I just had a similar situation. A RAID5 with 4 disks had block errors on
> one disk and it was failed. I checked the disk and it seemed to be
> without errors, so I wanted to re-add it. But then another disk (a
> Samsung 1 TB) erred during the resync because it also had bad blocks.
>
> I managed to get the RAID5 running again by forcing it to run with only
> 3 disks (one of them with bad blocks), and after checking the fs with
> xfs_repair I found that I was lucky: the fs integrity (directories,
> inodes etc.) was undamaged. So I could run the array.
>
> But I cannot resync it, as resyncing almost immediately runs into the
> bad blocks on the Samsung disk. It would have been nice if there was
> some sort of bad-block management in Linux MD, but I understand that
> this is in the works.
>
> [...]
>
> Furthermore, I would have liked to find out which files were affected.
> Is there a way to do this with XFS? debugfs is for ext3, and I was not
> able to find a program that maps a sector to an inode in XFS. And then
> there is the need to map the physical bad block number on the device to
> the actual block in the (damaged) RAID5. How to do that? I think this is
> almost the same question as Michael's (with an XFS variation).
>
> Best regards
> Keld
* Re: Map Block number from hdd to md
From: Michael <michael@rw23.de>
Date: 2010-02-16 10:57 UTC
To: Keld Simonsen; +Cc: Neil Brown, linux-raid

Hi Keld,

if you do a smartctl -A on /dev/sdX you should see something under
Current_Pending_Sector and Offline_Uncorrectable. Your hard drive replaces
the bad blocks with spare blocks as soon as you write something to them.

I solved the resync issue by using

  dd if=/dev/zero of=/dev/sdX bs=512 seek=<bad-block-number> count=1

You can test whether a block number is really bad with

  dd if=/dev/sdX of=/dev/null bs=512 skip=<bad-block-number> count=1

If that command causes an input/output error, the block is bad. Of course,
with each such block you have "lost" 512 bytes of data. Your problem is
very similar to mine: after overwriting the bad blocks, everything should
be fine again.

You should be able to "repair" all those bad blocks with the little
xor'ing script/program mentioned by Neil Brown. It would be nice to have
such a script where you can tell it which block/chunk is wrong, which
device to write to and which to read from. With that program, the bad
block would be overwritten with the (hopefully) valid data and become
functional again.

I also think this is a very common issue: after a one-disk failure a
second disk fails during resync because of bad blocks. This could be
prevented by doing a long SMART check once a week or so, but I did not
have the idea to do that until today :)

On Tue, 16 Feb 2010 06:38:41 +0200, Keld Simonsen <keld@keldix.com> wrote:
> Further to my problems described below, I dreamt up something that could
> solve my problem until I get new disks installed.
>
> I am actually alive with a RAID5 that has 2 malfunctioning devices -
> something that is supposed to be impossible... and I think it could be
> revived. I also think this is not an uncommon situation.
>
> I have bad blocks, but only about 60 blocks on one drive and 10 on the
> other, out of 4 drives. That is an error rate of about 1 out of 20,000,
> or a 99.995% good data rate. If I could resync both of the erroneous
> drives and avoid the bad blocks in the process, I would be safe (for
> some time).
>
> So if the resync could be told to avoid the bad blocks, and the file
> system in question could also be told to avoid those blocks, then I
> could be back in the air. I was thinking of a userland resync process -
> no need to change the kernel, just install a new mdadm and friends. Is
> that doable and useful?
>
> best regards
> keld
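A small sketch of the same test-then-overwrite procedure for a whole list
of suspect sectors, again in Python; the device name and sector list are
placeholders, and zeroing a sector destroys whatever was stored there, so
this only illustrates the dd commands above rather than being a tool to
run as-is.

    #!/usr/bin/env python3
    # Sketch: probe a list of suspect 512-byte sectors; if a sector cannot
    # be read, overwrite it with zeroes so the drive can remap it (the 512
    # bytes that were there are lost).  DEV and SUSPECT are placeholders.
    import os

    DEV = "/dev/sdX"
    SUSPECT = [122060740]             # sector numbers reported as bad

    fd = os.open(DEV, os.O_RDWR)
    for sector in SUSPECT:
        os.lseek(fd, sector * 512, os.SEEK_SET)
        try:
            os.read(fd, 512)          # dd if=DEV bs=512 skip=sector count=1
            print(f"{sector}: readable")
        except OSError:
            print(f"{sector}: read error, zeroing")
            os.lseek(fd, sector * 512, os.SEEK_SET)
            os.write(fd, b"\0" * 512) # dd if=/dev/zero of=DEV bs=512 seek=sector count=1
    os.close(fd)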
* Re: Map Block number from hdd to md
From: Keld Simonsen <keld@keldix.com>
Date: 2010-02-17 3:34 UTC
To: Michael; +Cc: Neil Brown, linux-raid

On Tue, Feb 16, 2010 at 11:57:00AM +0100, Michael wrote:
> if you do a smartctl -A on /dev/sdX you should see something under
> Current_Pending_Sector and Offline_Uncorrectable. Your hard drive
> replaces the bad blocks with spare blocks as soon as you write something
> to them.
>
> I solved the resync issue by using
>
>   dd if=/dev/zero of=/dev/sdX bs=512 seek=<bad-block-number> count=1
>
> You can test whether a block number is really bad with
>
>   dd if=/dev/sdX of=/dev/null bs=512 skip=<bad-block-number> count=1
>
> If that command causes an input/output error, the block is bad.

Yes, that cleared some errors, but unfortunately not all. That is, one
device had 72 bad blocks beforehand and 44 afterwards, and the other had
9 beforehand and 5 after.

The second dd command actually did not report any bad blocks, but a
selective badblocks command did.

Anyway, is there something about Samsung disks not having spare blocks for
this?

> Of course, with each such block you have "lost" 512 bytes of data. Your
> problem is very similar to mine: after overwriting the bad blocks,
> everything should be fine again.
>
> You should be able to "repair" all those bad blocks with the little
> xor'ing script/program mentioned by Neil Brown. It would be nice to have
> such a script where you can tell it which block/chunk is wrong, which
> device to write to and which to read from. With that program, the bad
> block would be overwritten with the (hopefully) valid data and become
> functional again.

Yes, I still would like to find the inode in the RAID file system from the
bad block on a physical disk.

> I also think this is a very common issue: after a one-disk failure a
> second disk fails during resync because of bad blocks. This could be
> prevented by doing a long SMART check once a week or so, but I did not
> have the idea to do that until today :)

I will write up some description of this on the wiki in a while. Others
may also contribute - you are most welcome to write something up for the
wiki.
* Re: Map Block number from hdd to md
From: Michael <michael@rw23.de>
Date: 2010-02-17 8:43 UTC
To: Keld Simonsen; +Cc: linux-raid

On Wed, 17 Feb 2010 05:34:38 +0200, Keld Simonsen <keld@keldix.com> wrote:
> Yes, that cleared some errors, but unfortunately not all. That is, one
> device had 72 bad blocks beforehand and 44 afterwards, and the other had
> 9 beforehand and 5 after.
>
> The second dd command actually did not report any bad blocks, but a
> selective badblocks command did.

Strange. If the second command works, you can recover the data and write it
back. I have heard about bad blocks that are "sometimes" bad and other
times not; I'm not sure.

> Anyway, is there something about Samsung disks not having spare blocks
> for this?

  Model Family:     SAMSUNG SpinPoint F1 DT series
  Device Model:     SAMSUNG HD103UJ

That's my disk, and it has spare blocks. Check smartctl -A /dev/sdX for
Current_Pending_Sector and Offline_Uncorrectable.

> Yes, I still would like to find the inode in the RAID file system from
> the bad block on a physical disk.

Yeah, me too.
* Re: Map Block number from hdd to md
From: Michael <michael@rw23.de>
Date: 2010-02-16 11:14 UTC
To: Neil Brown; +Cc: linux-raid

On Tue, 16 Feb 2010 12:20:14 +1100, Neil Brown <neilb@suse.de> wrote:
>> Is there any method to find that bad block in the context of the RAID
>> block device? Reading all files is not a good option on large RAID
>> sets. Level 5, 64k chunk, algorithm 2.
>
> It isn't that hard.  The code is in drivers/md/raid5.c in the kernel.....
>
> Rather than trying to describe it in general, give me the block number,
> the device, and "mdadm --examine" of that device, and I'll tell you how
> I get the answer.

The bad block number was 122060740.

[root@raw sqla]mdadm --examine /dev/sda3
/dev/sda3:
          Magic : a92b4efc
        Version : 0.91.00
           UUID : 9815a2c6:c83a9a53:2a8015ce:9d8e5e8c (local to host raw)
  Creation Time : Thu Feb 11 16:01:12 2010
     Raid Level : raid6
  Used Dev Size : 966060672 (921.31 GiB 989.25 GB)
     Array Size : 2898182016 (2763.92 GiB 2967.74 GB)
   Raid Devices : 5
  Total Devices : 5
Preferred Minor : 2

  Reshape pos'n : 974014464 (928.89 GiB 997.39 GB)
     New Layout : left-symmetric

    Update Time : Tue Feb 16 11:58:37 2010
          State : clean
 Active Devices : 5
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 16372b12 - correct
         Events : 363519

         Layout : left-symmetric-6
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     2       8        3        2      active sync   /dev/sda3

   0     0       8       35        0      active sync   /dev/sdc3
   1     1       8       51        1      active sync   /dev/sdd3
   2     2       8        3        2      active sync   /dev/sda3
   3     3       8       83        3      active sync   /dev/sdf3
   4     4       8       99        4      active        /dev/sdg3

Thank you.

I am currently reshaping my RAID5 to a RAID6.

One note: I had the "too-old metadata" problem with "mdadm - v3.1.1 - 19th
November 2009"; commenting out that check started my array again. I thought
this was supposed to have been fixed in that version?

What is the right way to stop the reshaping process? kill <pid of mdadm
--grow/assemble> and then mdadm --stop /dev/mdX, or just mdadm --stop
/dev/mdX without killing anything?

Other question: what happens when an operating RAID5/6 encounters a bad
block at read time? Does it just mark the corresponding device as failed?

> If you were desperate, you could use 'dd' to read each of the chunks
> into a file, then write a little C/perl/whatever program to xor those
> files together, then use 'dd' to write that file back out to the target
> chunk.
>
> NeilBrown

Sounds easy so far. Is mapping blocks to chunks also easy? And what should
be done in the RAID6 case?
* Re: Map Block number from hdd to md
From: Neil Brown <neilb@suse.de>
Date: 2010-02-17 23:47 UTC
To: Michael; +Cc: linux-raid

On Tue, 16 Feb 2010 12:14:38 +0100 Michael <michael@rw23.de> wrote:

> The bad block number was 122060740.
>
> [root@raw sqla]mdadm --examine /dev/sda3
> /dev/sda3:
>           Magic : a92b4efc
>         Version : 0.91.00
>            UUID : 9815a2c6:c83a9a53:2a8015ce:9d8e5e8c (local to host raw)
>   Creation Time : Thu Feb 11 16:01:12 2010
>      Raid Level : raid6
>   Used Dev Size : 966060672 (921.31 GiB 989.25 GB)
>      Array Size : 2898182016 (2763.92 GiB 2967.74 GB)
>    Raid Devices : 5
>   Total Devices : 5
> Preferred Minor : 2
>
>   Reshape pos'n : 974014464 (928.89 GiB 997.39 GB)
>      New Layout : left-symmetric
>
>          Layout : left-symmetric-6
>      Chunk Size : 64K

So...
There is no Data Offset given, so it is zero, and the block is 122060740
sectors into the data area of the devices.
The chunk size is 64k (128 sectors), so
  122060740 / 128 == 953599, remainder 68.
So: stripe number 953599, and sector 68 of device '2' of that stripe.

A stripe had 4 disks when raid5, 5 when raid6, so 3 data drives.
So stripe 953599 starts 953599 * 3 * 128 sectors from the start of the
array, i.e. 366182016 sectors.

In the raid5 layout:
4 drives, so 4 different stripe layouts.
953599 % 4 == 3, so it is layout 3 (of 0, 1, 2, 3).
Looking at the code in raid5.c for LEFT_SYMMETRIC, the parity disk is
disk 0. The data disks follow that, so device '2' holds data chunk '1'.
So we add 1 full chunk plus the 68 sectors of the partial chunk,
i.e. that sector is 366182016 + 128 + 68, or sector 366182212 in the array.

After the conversion to RAID6 there are 5 drives, so 5 stripe layouts.
953599 % 5 == 4, so layout 4.
So 'P' is device 0, 'Q' is device 1, D0 is device 2, etc.
So sda3 is the first data disk in the stripe, and there are no full chunks
to add, just the partial chunk:
  366182016 + 68 == 366182084

> I am currently reshaping my RAID5 to a RAID6.
>
> One note: I had the "too-old metadata" problem with "mdadm - v3.1.1 -
> 19th November 2009"; commenting out that check started my array again.
> I thought this was supposed to have been fixed in that version?

I thought so too.  I'll have to have another look.

> What is the right way to stop the reshaping process? kill <pid of mdadm
> --grow/assemble> and then mdadm --stop /dev/mdX, or just mdadm --stop
> /dev/mdX without killing anything?

Don't kill things.  Just --stop the array.

> Other question: what happens when an operating RAID5/6 encounters a bad
> block at read time? Does it just mark the corresponding device as
> failed?

A read error only causes the device to be failed if the array is degraded.
If the array is not degraded, md tries to recover the data and write it
back out.  If this fails, then the device is failed.

> > If you were desperate, you could use 'dd' to read each of the chunks
> > into a file, then write a little C/perl/whatever program to xor those
> > files together, then use 'dd' to write that file back out to the
> > target chunk.
>
> Sounds easy so far. Is mapping blocks to chunks also easy? And what
> should be done in the RAID6 case?

Much the same - it is just the final mapping within a stripe that is
interesting ... look at the code :-)

NeilBrown
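A rough sketch of that arithmetic as a small program, covering only the
pre-conversion case: a 4-disk RAID5 with the left-symmetric layout, 64k
chunks and data offset 0, as above. The RAID6 case would additionally have
to account for the 'Q' device, and this mirrors the walk-through rather
than the kernel code, so it is an illustration only.

    def raid5_ls_array_sector(dev_sector, dev_idx, raid_disks=4,
                              chunk=128, data_offset=0):
        """Map a sector of one member device to the corresponding sector of
        the md array, for RAID5 with the left-symmetric layout (chunk given
        in sectors)."""
        stripe, in_chunk = divmod(dev_sector - data_offset, chunk)
        data_disks = raid_disks - 1
        pd_idx = data_disks - stripe % raid_disks         # parity device of this stripe
        if dev_idx == pd_idx:
            raise ValueError("that sector holds parity, not data")
        data_chunk = (dev_idx - pd_idx - 1) % raid_disks  # data chunk number in the stripe
        return (stripe * data_disks + data_chunk) * chunk + in_chunk

    # The example above: sector 122060740 on device '2' of the 4-disk RAID5
    print(raid5_ls_array_sector(122060740, 2))            # -> 366182212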
* Re: Map Block number from hdd to md
From: Keld Simonsen <keld@keldix.com>
Date: 2010-02-18 4:12 UTC
To: Neil Brown; +Cc: Michael, linux-raid

On Thu, Feb 18, 2010 at 10:47:55AM +1100, Neil Brown wrote:
> So...
> There is no Data Offset given, so it is zero, and the block is 122060740
> sectors into the data area of the devices.
> The chunk size is 64k (128 sectors), so
>   122060740 / 128 == 953599, remainder 68.
> So: stripe number 953599, and sector 68 of device '2' of that stripe.
>
> A stripe had 4 disks when raid5, 5 when raid6, so 3 data drives.
> So stripe 953599 starts 953599 * 3 * 128 sectors from the start of the
> array, i.e. 366182016 sectors.
>
> In the raid5 layout:
> 4 drives, so 4 different stripe layouts.
> 953599 % 4 == 3, so it is layout 3 (of 0, 1, 2, 3).
> Looking at the code in raid5.c for LEFT_SYMMETRIC, the parity disk is
> disk 0. The data disks follow that, so device '2' holds data chunk '1'.
> So we add 1 full chunk plus the 68 sectors of the partial chunk,
> i.e. that sector is 366182016 + 128 + 68, or sector 366182212 in the
> array.
>
> After the conversion to RAID6 there are 5 drives, so 5 stripe layouts.
> 953599 % 5 == 4, so layout 4.
> So 'P' is device 0, 'Q' is device 1, D0 is device 2, etc.
> So sda3 is the first data disk in the stripe, and there are no full
> chunks to add, just the partial chunk:
>   366182016 + 68 == 366182084

Sounds like it would be nice to have a program to calculate this :-)

Seriously, Neil, would it be possible to lift the code out of the kernel to
make a small utility - for salvaging raid file systems?

best regards
keld