* Map Block number from hdd to md
From: Michael <michael@rw23.de>
Date: 2010-02-12 0:24 UTC
To: linux-raid

Hello,

I've run into the situation that one of my 4 mdadm RAID5 drives failed.
Not really failed, but it was not detected at system startup. So I started
a resync, and one of the remaining drives had a bad block and failed. So
now 2 drives are offline and the RAID is not functional anymore.

1st question:
I have read that it is possible with debugfs to locate which file a bad
block belongs to on an ext file system. Good, so I can check whether I have
*lost* an important or an unimportant file... or just free space. The
problem is that I can't map the known bad block from, let's say, sda to my
RAID array md0.

Is there any method to find that bad block in the context of the RAID block
device? Reading all files is not a good option on large RAID sets.
Level 5, 64k chunk, algorithm 2.

2nd question:
In my case I have a functional RAID5 array with 3 of 4 disks, in which one
of the active disks has a bad sector. Assume that the one failed disk has
consistent parity information/data for this sector, but has been altered
elsewhere so that a complete resync would not work. Is there a way to
resync only the one chunk that the bad block belongs to, using the data
from the 3 drives without a bad block, even if one of them is not an active
part of the array but was before?
* Re: Map Block number from hdd to md
From: Neil Brown <neilb@suse.de>
Date: 2010-02-16 1:20 UTC
To: Michael; +Cc: linux-raid

On Fri, 12 Feb 2010 01:24:30 +0100 Michael <michael@rw23.de> wrote:

> 1st question:
> I have read that it is possible with debugfs to locate which file a bad
> block belongs to on an ext file system. Good, so I can check whether I
> have *lost* an important or an unimportant file... or just free space.
> The problem is that I can't map the known bad block from, let's say, sda
> to my RAID array md0.
>
> Is there any method to find that bad block in the context of the RAID
> block device? Reading all files is not a good option on large RAID sets.
> Level 5, 64k chunk, algorithm 2.

It isn't that hard.  The code is in drivers/md/raid5.c in the kernel.....

Rather than trying to describe it in general, give me the block number, the
device, and "mdadm --examine" of that device, and I'll tell you how I get
the answer.

> 2nd question:
> In my case I have a functional RAID5 array with 3 of 4 disks, in which
> one of the active disks has a bad sector. Assume that the one failed disk
> has consistent parity information/data for this sector, but has been
> altered elsewhere so that a complete resync would not work. Is there a
> way to resync only the one chunk that the bad block belongs to, using the
> data from the 3 drives without a bad block, even if one of them is not an
> active part of the array but was before?

No.

If you were desperate, you could use 'dd' to read each of the chunks into a
file, then write a little C/perl/whatever program to xor those files
together, then use 'dd' to write that file back out to the target chunk.

NeilBrown
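A minimal sketch of that dd-plus-xor approach, in Python rather than C or
perl. Everything in it is illustrative: the chunk size, stripe number and
device names are placeholders, and the correct stripe/offset has to be
worked out first (as is done later in this thread), so treat it as a sketch
of the idea rather than a ready-to-run repair tool.

    #!/usr/bin/env python3
    # Sketch only: reconstruct one missing RAID5 chunk by XOR-ing the chunks
    # of the same stripe from the surviving members, then write it back.
    # CHUNK, STRIPE, GOOD and TARGET are hypothetical placeholder values.
    CHUNK  = 64 * 1024                 # chunk size in bytes (64k as in this array)
    STRIPE = 953599                    # stripe number that holds the bad block
    OFFSET = STRIPE * CHUNK            # byte offset of that chunk on each member
                                       # (assumes data offset 0)
    GOOD   = ["/dev/sdc3", "/dev/sdd3", "/dev/sdf3"]  # readable members of the stripe
    TARGET = "/dev/sda3"               # member whose chunk is to be rewritten

    buf = bytearray(CHUNK)
    for dev in GOOD:
        with open(dev, "rb") as f:     # like: dd if=dev bs=64k skip=STRIPE count=1
            f.seek(OFFSET)
            chunk = f.read(CHUNK)
        for i, b in enumerate(chunk):
            buf[i] ^= b                # parity XOR data chunks == missing chunk

    with open(TARGET, "r+b") as f:     # like: dd of=TARGET bs=64k seek=STRIPE conv=notrunc
        f.seek(OFFSET)
        f.write(bytes(buf))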
* Re: Map Block number from hdd to md
From: Keld Simonsen <keld@keldix.com>
Date: 2010-02-16 4:02 UTC
To: Neil Brown; +Cc: Michael, linux-raid

On Tue, Feb 16, 2010 at 12:20:14PM +1100, Neil Brown wrote:
> On Fri, 12 Feb 2010 01:24:30 +0100 Michael <michael@rw23.de> wrote:
>
> > I've run into the situation that one of my 4 mdadm RAID5 drives failed.
> > Not really failed, but it was not detected at system startup. So I
> > started a resync, and one of the remaining drives had a bad block and
> > failed. So now 2 drives are offline and the RAID is not functional
> > anymore.

I just had a similar situation. A RAID5 with 4 disks had block errors on
one disk and it was failed. I checked the disk and it seemed to be without
errors, so I wanted to re-add it. But then another disk (a Samsung 1 TB)
erred during the resync because it also had bad blocks.

I managed to get the RAID5 running again by forcing it to run with only
3 disks (one of them with bad blocks), and after checking the fs with
xfs_repair I found that I was lucky: the fs integrity (directories, inodes
etc.) was undamaged. So I could run the array.

But I cannot resync it, as resyncing almost immediately runs into the bad
blocks on the Samsung disk. It would have been nice if there was some sort
of bad-block management in Linux MD, but I understand that this is in the
works.

I also understand that ext3 badblock management would not have saved me
here, true? MD resyncing happens at an underlying level and does not take
ext3 badblock handling into account, I think.

> > Is there any method to find that bad block in the context of the RAID
> > block device? Reading all files is not a good option on large RAID
> > sets. Level 5, 64k chunk, algorithm 2.
>
> It isn't that hard.  The code is in drivers/md/raid5.c in the kernel.....
>
> Rather than trying to describe it in general, give me the block number,
> the device, and "mdadm --examine" of that device, and I'll tell you how
> I get the answer.

Furthermore, I would have liked to find out which files were affected.
Is there a way to do this with XFS? debugfs is for ext3, and I was not
able to find a program that maps a sector to an inode in XFS. And then
there is the need to map the physical bad block number on the device to
the actual block in the (damaged) RAID5. How to do that? I think this is
almost the same question as Michael's (with an XFS variation).

Best regards
Keld
* Re: Map Block number from hdd to md
From: Keld Simonsen <keld@keldix.com>
Date: 2010-02-16 4:38 UTC
To: Neil Brown; +Cc: Michael, linux-raid

Further to my problems described below, I dreamt up something that could
solve my problem until I get new disks installed.

I am actually alive with a RAID5 that has 2 malfunctioning devices -
something that is supposed to be impossible... and I think it could be
revived. I also think this is not an uncommon situation.

I have bad blocks, but only about 60 blocks on one drive and 10 on the
other, out of 4 drives. That is an error rate of about 1 out of 20,000, or
a 99.995% good data rate. If I could resync both of the erroneous drives
and avoid the bad blocks in the process, I would be safe (for some time).

So if the resync could be told to avoid the bad blocks, and the file system
in question could also be told to avoid those blocks, then I could be back
in the air. I was thinking of a userland resync process - no need to change
the kernel, just install a new mdadm and friends. Is that doable and
useful?

best regards
keld

On Tue, Feb 16, 2010 at 06:02:52AM +0200, Keld Simonsen wrote:
> I just had a similar situation. A RAID5 with 4 disks had block errors on
> one disk and it was failed. I checked the disk and it seemed to be
> without errors, so I wanted to re-add it. But then another disk (a
> Samsung 1 TB) erred during the resync because it also had bad blocks.
>
> I managed to get the RAID5 running again by forcing it to run with only
> 3 disks (one of them with bad blocks), and after checking the fs with
> xfs_repair I found that I was lucky: the fs integrity (directories,
> inodes etc.) was undamaged. So I could run the array.
>
> But I cannot resync it, as resyncing almost immediately runs into the
> bad blocks on the Samsung disk. It would have been nice if there was
> some sort of bad-block management in Linux MD, but I understand that
> this is in the works.
>
> [...]
>
> Furthermore, I would have liked to find out which files were affected.
> Is there a way to do this with XFS? debugfs is for ext3, and I was not
> able to find a program that maps a sector to an inode in XFS. And then
> there is the need to map the physical bad block number on the device to
> the actual block in the (damaged) RAID5. How to do that? I think this is
> almost the same question as Michael's (with an XFS variation).
>
> Best regards
> Keld
* Re: Map Block number from hdd to md
From: Michael <michael@rw23.de>
Date: 2010-02-16 10:57 UTC
To: Keld Simonsen; +Cc: Neil Brown, linux-raid

Hi Keld,

if you do a smartctl -A on /dev/sdX you should see something under
Current_Pending_Sector and Offline_Uncorrectable. Your hard drive replaces
the bad blocks with spare blocks as soon as you write something to them.

I solved the resync issue by using

  dd if=/dev/zero of=/dev/sdX bs=512 seek=<bad-block-number> count=1

You can test whether a block number is really bad with

  dd if=/dev/sdX of=/dev/null bs=512 skip=<bad-block-number> count=1

If that command causes an input/output error, the block is bad. Of course,
with each such block you have "lost" 512 bytes of data. Your problem is
very similar to mine: after overwriting the bad blocks, everything should
be fine again.

You should be able to "repair" all those bad blocks with the little
xor'ing script/program mentioned by Neil Brown. It would be nice to have
such a script where you can tell it which block/chunk is wrong, which
device to write to and which to read from. With that program, the bad
block would be overwritten with the (hopefully) valid data and become
functional again.

I also think this is a very common issue: after a one-disk failure a
second disk fails during resync because of bad blocks. This could be
prevented by doing a long SMART check once a week or so, but I did not
have the idea to do that until today :)

On Tue, 16 Feb 2010 06:38:41 +0200, Keld Simonsen <keld@keldix.com> wrote:
> Further to my problems described below, I dreamt up something that could
> solve my problem until I get new disks installed.
>
> I am actually alive with a RAID5 that has 2 malfunctioning devices -
> something that is supposed to be impossible... and I think it could be
> revived. I also think this is not an uncommon situation.
>
> I have bad blocks, but only about 60 blocks on one drive and 10 on the
> other, out of 4 drives. That is an error rate of about 1 out of 20,000,
> or a 99.995% good data rate. If I could resync both of the erroneous
> drives and avoid the bad blocks in the process, I would be safe (for
> some time).
>
> So if the resync could be told to avoid the bad blocks, and the file
> system in question could also be told to avoid those blocks, then I
> could be back in the air. I was thinking of a userland resync process -
> no need to change the kernel, just install a new mdadm and friends. Is
> that doable and useful?
>
> best regards
> keld
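A small sketch of the same test-then-overwrite procedure for a whole list
of suspect sectors, again in Python; the device name and sector list are
placeholders, and zeroing a sector destroys whatever was stored there, so
this only illustrates the dd commands above rather than being a tool to
run as-is.

    #!/usr/bin/env python3
    # Sketch: probe a list of suspect 512-byte sectors; if a sector cannot
    # be read, overwrite it with zeroes so the drive can remap it (the 512
    # bytes that were there are lost).  DEV and SUSPECT are placeholders.
    import os

    DEV = "/dev/sdX"
    SUSPECT = [122060740]             # sector numbers reported as bad

    fd = os.open(DEV, os.O_RDWR)
    for sector in SUSPECT:
        os.lseek(fd, sector * 512, os.SEEK_SET)
        try:
            os.read(fd, 512)          # dd if=DEV bs=512 skip=sector count=1
            print(f"{sector}: readable")
        except OSError:
            print(f"{sector}: read error, zeroing")
            os.lseek(fd, sector * 512, os.SEEK_SET)
            os.write(fd, b"\0" * 512) # dd if=/dev/zero of=DEV bs=512 seek=sector count=1
    os.close(fd)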
* Re: Map Block number from hdd to md
From: Keld Simonsen <keld@keldix.com>
Date: 2010-02-17 3:34 UTC
To: Michael; +Cc: Neil Brown, linux-raid

On Tue, Feb 16, 2010 at 11:57:00AM +0100, Michael wrote:
> if you do a smartctl -A on /dev/sdX you should see something under
> Current_Pending_Sector and Offline_Uncorrectable. Your hard drive
> replaces the bad blocks with spare blocks as soon as you write something
> to them.
>
> I solved the resync issue by using
>
>   dd if=/dev/zero of=/dev/sdX bs=512 seek=<bad-block-number> count=1
>
> You can test whether a block number is really bad with
>
>   dd if=/dev/sdX of=/dev/null bs=512 skip=<bad-block-number> count=1
>
> If that command causes an input/output error, the block is bad.

Yes, that cleared some errors, but unfortunately not all. That is, one
device had 72 bad blocks beforehand and 44 afterwards, and the other had
9 beforehand and 5 after.

The second dd command actually did not report any bad blocks, but a
selective badblocks command did.

Anyway, is there something about Samsung disks not having spare blocks for
this?

> Of course, with each such block you have "lost" 512 bytes of data. Your
> problem is very similar to mine: after overwriting the bad blocks,
> everything should be fine again.
>
> You should be able to "repair" all those bad blocks with the little
> xor'ing script/program mentioned by Neil Brown. It would be nice to have
> such a script where you can tell it which block/chunk is wrong, which
> device to write to and which to read from. With that program, the bad
> block would be overwritten with the (hopefully) valid data and become
> functional again.

Yes, I still would like to find the inode in the RAID file system from the
bad block on a physical disk.

> I also think this is a very common issue: after a one-disk failure a
> second disk fails during resync because of bad blocks. This could be
> prevented by doing a long SMART check once a week or so, but I did not
> have the idea to do that until today :)

I will write up some description of this on the wiki in a while. Others
may also contribute - you are most welcome to write something up for the
wiki.
* Re: Map Block number from hdd to md
From: Michael <michael@rw23.de>
Date: 2010-02-17 8:43 UTC
To: Keld Simonsen; +Cc: linux-raid

On Wed, 17 Feb 2010 05:34:38 +0200, Keld Simonsen <keld@keldix.com> wrote:
> Yes, that cleared some errors, but unfortunately not all. That is, one
> device had 72 bad blocks beforehand and 44 afterwards, and the other had
> 9 beforehand and 5 after.
>
> The second dd command actually did not report any bad blocks, but a
> selective badblocks command did.

Strange. If the second command works, you can recover the data and write it
back. I have heard about bad blocks that are "sometimes" bad and other
times not; I'm not sure.

> Anyway, is there something about Samsung disks not having spare blocks
> for this?

  Model Family:     SAMSUNG SpinPoint F1 DT series
  Device Model:     SAMSUNG HD103UJ

That's my disk, and it has spare blocks. Check smartctl -A /dev/sdX for
Current_Pending_Sector and Offline_Uncorrectable.

> Yes, I still would like to find the inode in the RAID file system from
> the bad block on a physical disk.

Yeah, me too.
* Re: Map Block number from hdd to md
From: Michael <michael@rw23.de>
Date: 2010-02-16 11:14 UTC
To: Neil Brown; +Cc: linux-raid

On Tue, 16 Feb 2010 12:20:14 +1100, Neil Brown <neilb@suse.de> wrote:
>> Is there any method to find that bad block in the context of the RAID
>> block device? Reading all files is not a good option on large RAID
>> sets. Level 5, 64k chunk, algorithm 2.
>
> It isn't that hard.  The code is in drivers/md/raid5.c in the kernel.....
>
> Rather than trying to describe it in general, give me the block number,
> the device, and "mdadm --examine" of that device, and I'll tell you how
> I get the answer.

The bad block number was 122060740.

[root@raw sqla]mdadm --examine /dev/sda3
/dev/sda3:
          Magic : a92b4efc
        Version : 0.91.00
           UUID : 9815a2c6:c83a9a53:2a8015ce:9d8e5e8c (local to host raw)
  Creation Time : Thu Feb 11 16:01:12 2010
     Raid Level : raid6
  Used Dev Size : 966060672 (921.31 GiB 989.25 GB)
     Array Size : 2898182016 (2763.92 GiB 2967.74 GB)
   Raid Devices : 5
  Total Devices : 5
Preferred Minor : 2

  Reshape pos'n : 974014464 (928.89 GiB 997.39 GB)
     New Layout : left-symmetric

    Update Time : Tue Feb 16 11:58:37 2010
          State : clean
 Active Devices : 5
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 16372b12 - correct
         Events : 363519

         Layout : left-symmetric-6
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     2       8        3        2      active sync   /dev/sda3

   0     0       8       35        0      active sync   /dev/sdc3
   1     1       8       51        1      active sync   /dev/sdd3
   2     2       8        3        2      active sync   /dev/sda3
   3     3       8       83        3      active sync   /dev/sdf3
   4     4       8       99        4      active        /dev/sdg3

Thank you.

I am currently reshaping my RAID5 to a RAID6.

One note: I had the "too-old metadata" problem with "mdadm - v3.1.1 - 19th
November 2009"; commenting out that check started my array again. I thought
this was supposed to have been fixed in that version?

What is the right way to stop the reshaping process? kill <pid of mdadm
--grow/assemble> and then mdadm --stop /dev/mdX, or just mdadm --stop
/dev/mdX without killing anything?

Other question: what happens when an operating RAID5/6 encounters a bad
block at read time? Does it just mark the corresponding device as failed?

> If you were desperate, you could use 'dd' to read each of the chunks
> into a file, then write a little C/perl/whatever program to xor those
> files together, then use 'dd' to write that file back out to the target
> chunk.
>
> NeilBrown

Sounds easy so far. Is mapping blocks to chunks also easy? And what should
be done in the RAID6 case?
* Re: Map Block number from hdd to md
From: Neil Brown <neilb@suse.de>
Date: 2010-02-17 23:47 UTC
To: Michael; +Cc: linux-raid

On Tue, 16 Feb 2010 12:14:38 +0100 Michael <michael@rw23.de> wrote:

> The bad block number was 122060740.
>
> [root@raw sqla]mdadm --examine /dev/sda3
> /dev/sda3:
>           Magic : a92b4efc
>         Version : 0.91.00
>            UUID : 9815a2c6:c83a9a53:2a8015ce:9d8e5e8c (local to host raw)
>   Creation Time : Thu Feb 11 16:01:12 2010
>      Raid Level : raid6
>   Used Dev Size : 966060672 (921.31 GiB 989.25 GB)
>      Array Size : 2898182016 (2763.92 GiB 2967.74 GB)
>    Raid Devices : 5
>   Total Devices : 5
> Preferred Minor : 2
>
>   Reshape pos'n : 974014464 (928.89 GiB 997.39 GB)
>      New Layout : left-symmetric
>
>          Layout : left-symmetric-6
>      Chunk Size : 64K

So...
There is no Data Offset given, so it is zero, and the block is 122060740
sectors into the data area of the devices.
The chunk size is 64k (128 sectors), so
  122060740 / 128 == 953599, remainder 68.
So: stripe number 953599, and sector 68 of device '2' of that stripe.

A stripe had 4 disks when raid5, 5 when raid6, so 3 data drives.
So stripe 953599 starts 953599 * 3 * 128 sectors from the start of the
array, i.e. 366182016 sectors.

In the raid5 layout:
4 drives, so 4 different stripe layouts.
953599 % 4 == 3, so it is layout 3 (of 0, 1, 2, 3).
Looking at the code in raid5.c for LEFT_SYMMETRIC, the parity disk is
disk 0. The data disks follow that, so device '2' holds data chunk '1'.
So we add 1 full chunk plus the 68 sectors of the partial chunk,
i.e. that sector is 366182016 + 128 + 68, or sector 366182212 in the array.

After the conversion to RAID6 there are 5 drives, so 5 stripe layouts.
953599 % 5 == 4, so layout 4.
So 'P' is device 0, 'Q' is device 1, D0 is device 2, etc.
So sda3 is the first data disk in the stripe, and there are no full chunks
to add, just the partial chunk:
  366182016 + 68 == 366182084

> I am currently reshaping my RAID5 to a RAID6.
>
> One note: I had the "too-old metadata" problem with "mdadm - v3.1.1 -
> 19th November 2009"; commenting out that check started my array again.
> I thought this was supposed to have been fixed in that version?

I thought so too.  I'll have to have another look.

> What is the right way to stop the reshaping process? kill <pid of mdadm
> --grow/assemble> and then mdadm --stop /dev/mdX, or just mdadm --stop
> /dev/mdX without killing anything?

Don't kill things.  Just --stop the array.

> Other question: what happens when an operating RAID5/6 encounters a bad
> block at read time? Does it just mark the corresponding device as
> failed?

A read error only causes the device to be failed if the array is degraded.
If the array is not degraded, md tries to recover the data and write it
back out.  If this fails, then the device is failed.

> > If you were desperate, you could use 'dd' to read each of the chunks
> > into a file, then write a little C/perl/whatever program to xor those
> > files together, then use 'dd' to write that file back out to the
> > target chunk.
>
> Sounds easy so far. Is mapping blocks to chunks also easy? And what
> should be done in the RAID6 case?

Much the same - it is just the final mapping within a stripe that is
interesting ... look at the code :-)

NeilBrown
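A rough sketch of that arithmetic as a small program, covering only the
pre-conversion case: a 4-disk RAID5 with the left-symmetric layout, 64k
chunks and data offset 0, as above. The RAID6 case would additionally have
to account for the 'Q' device, and this mirrors the walk-through rather
than the kernel code, so it is an illustration only.

    def raid5_ls_array_sector(dev_sector, dev_idx, raid_disks=4,
                              chunk=128, data_offset=0):
        """Map a sector of one member device to the corresponding sector of
        the md array, for RAID5 with the left-symmetric layout (chunk given
        in sectors)."""
        stripe, in_chunk = divmod(dev_sector - data_offset, chunk)
        data_disks = raid_disks - 1
        pd_idx = data_disks - stripe % raid_disks         # parity device of this stripe
        if dev_idx == pd_idx:
            raise ValueError("that sector holds parity, not data")
        data_chunk = (dev_idx - pd_idx - 1) % raid_disks  # data chunk number in the stripe
        return (stripe * data_disks + data_chunk) * chunk + in_chunk

    # The example above: sector 122060740 on device '2' of the 4-disk RAID5
    print(raid5_ls_array_sector(122060740, 2))            # -> 366182212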
* Re: Map Block number from hdd to md
From: Keld Simonsen <keld@keldix.com>
Date: 2010-02-18 4:12 UTC
To: Neil Brown; +Cc: Michael, linux-raid

On Thu, Feb 18, 2010 at 10:47:55AM +1100, Neil Brown wrote:
> So...
> There is no Data Offset given, so it is zero, and the block is 122060740
> sectors into the data area of the devices.
> The chunk size is 64k (128 sectors), so
>   122060740 / 128 == 953599, remainder 68.
> So: stripe number 953599, and sector 68 of device '2' of that stripe.
>
> A stripe had 4 disks when raid5, 5 when raid6, so 3 data drives.
> So stripe 953599 starts 953599 * 3 * 128 sectors from the start of the
> array, i.e. 366182016 sectors.
>
> In the raid5 layout:
> 4 drives, so 4 different stripe layouts.
> 953599 % 4 == 3, so it is layout 3 (of 0, 1, 2, 3).
> Looking at the code in raid5.c for LEFT_SYMMETRIC, the parity disk is
> disk 0. The data disks follow that, so device '2' holds data chunk '1'.
> So we add 1 full chunk plus the 68 sectors of the partial chunk,
> i.e. that sector is 366182016 + 128 + 68, or sector 366182212 in the
> array.
>
> After the conversion to RAID6 there are 5 drives, so 5 stripe layouts.
> 953599 % 5 == 4, so layout 4.
> So 'P' is device 0, 'Q' is device 1, D0 is device 2, etc.
> So sda3 is the first data disk in the stripe, and there are no full
> chunks to add, just the partial chunk:
>   366182016 + 68 == 366182084

Sounds like it would be nice to have a program to calculate this :-)

Seriously, Neil, would it be possible to lift the code out of the kernel to
make a small utility - for salvaging raid file systems?

best regards
keld