From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Janos Haar"
Subject: Re: Suggestion needed for fixing RAID6
Date: Sat, 1 May 2010 11:37:36 +0200
Message-ID: <12cf01cae911$f0d92940$0400a8c0@dcccs>
References: <626601cae203$dae35030$0400a8c0@dcccs> <20100423065143.GA17743@maude.comedia.it>
 <695a01cae2c1$a72907d0$0400a8c0@dcccs> <4BD193D0.5080003@shiftmail.org>
 <717901cae3e5$6a5fa730$0400a8c0@dcccs> <4BD3751A.5000403@shiftmail.org>
 <756601cae45e$213d6190$0400a8c0@dcccs> <4BD569E2.7010409@shiftmail.org>
 <7a3e01cae53f$684122c0$0400a8c0@dcccs> <4BD5C51E.9040207@shiftmail.org>
 <80a201cae621$684daa30$0400a8c0@dcccs> <4BD76CF6.5020804@shiftmail.org>
 <20100428113732.03486490@notabene.brown> <4BD830B0.1080406@shiftmail.org>
 <025e01cae6d7$30bb7870$0400a8c0@dcccs> <4BD843D4.7030700@shiftmail.org>
 <062001cae771$545e0910$0400a8c0@dcccs> <4BD9A41E.9050009@shiftmail.org>
 <0c1201cae7e0$01f9a930$0400a8c0@dcccs> <4BDA0F88.70907@shiftmail.org>
 <0d6401cae82c$da8b5590$0400a8c0@dcccs> <4BDB6DB6.5020306@shiftmail.org>
Mime-Version: 1.0
Content-Type: text/plain; format=flowed; charset="ISO-8859-1"; reply-type=response
Content-Transfer-Encoding: 7bit
Return-path:
Sender: linux-raid-owner@vger.kernel.org
To: MRK
Cc: linux-raid@vger.kernel.org
List-Id: linux-raid.ids

Hello,

Now I have tried with a 1-sector snapshot size. The result was the same:
first the snapshot was invalidated, then DM was dropped from the RAID.
The next thing was this:

md3 : active raid6 sdl4[11] sdk4[10] sdj4[9] sdi4[8] dm-1[12](F) sdg4[6] sdf4[5] dm-0[4] sdc4[2] sdb4[1] sda4[0]
      14626538880 blocks level 6, 16k chunk, algorithm 2 [12/10] [UUU_UUU_UUUU]
      [===================>.]  resync = 99.9% (1462653628/1462653888) finish=0.0min speed=2512K/sec

The sync progress bar jumped from 58.8% to 99.9%, the speed fell, and the
counter froze at 1462653628/1462653888. I could run dmesg once by hand and
save the output to a file, but the system crashed after that. The entire
story took about 1 minute.

However, the sync_min option generally solves my problem, because I can
rebuild the missing disk from the 90% point, which is good enough for
me. :-)

If somebody is interested in playing more with this system, I still have
it for some days, but I am not interested anymore in tracing the md-dm
behavior in this situation.... Additionally, I don't want to put the data
at risk if it is not really needed....

Thanks a lot,
Janos Haar

----- Original Message -----
From: "MRK"
To: "Janos Haar"
Cc:
Sent: Saturday, May 01, 2010 1:54 AM
Subject: Re: Suggestion needed for fixing RAID6

> On 04/30/2010 08:17 AM, Janos Haar wrote:
>> Hello,
>>
>> OK, MRK you are right (again).
>> There was a line in the messages which escaped my attention.
>> The entire log is here:
>> http://download.netcenter.hu/bughunt/20100430/messages
>>
>
> Ah, here we go:
>
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: device-mapper: snapshots:
> Invalidating snapshot: Error reading/writing.
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: ata8: EH complete
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5: Disk failure on dm-1,
> disabling device.
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5: Operation continuing on 10
> devices.
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: md: md3: recovery done.
>
> Firstly, I'm not totally sure how DM passed the information about the
> failing device to MD. There is no error message about it from MD. If it
> was a read error, MD should have performed the rewrite, but this
> apparently did not happen (the error message for a failed rewrite by MD
> is, I think, "read error NOT corrected!!").
> But anyway...
>
>> DM finds my cow devices invalid, but I don't know why at this time.
>>
>
> I have just had a brief look at the DM code. I understand about 1% of it
> right now, but I am thinking that, in a not-perfectly-optimized way of
> doing things, if you specified 8 sectors (8x512b = 4k, which you did) of
> granularity during the creation of your cow and cow2 devices, then
> whenever you write to the COW device DM might do the thing in 2 steps:
>
> 1- copy 8 (or a multiple of 8) sectors from the HD to the cow device,
>    enough to cover the area to which you are writing
> 2- overwrite those 8 sectors with the data coming from MD.
>
> Of course this is not optimal in case you are writing exactly 8 sectors
> with MD and these are aligned to the ones that DM uses (both of which I
> think are true in your case), because DM could have skipped #1 in this
> case.
> However, supposing DM is not so smart and indeed does not skip step #1,
> then I think I understand why it disables the device: it's because #1
> fails with a read error and DM does not know how to handle that
> situation in general. If you had written a smaller amount with MD, such
> as 512 bytes, and step #1 fails, what do you write in the other 7
> sectors around it? The right semantics is not obvious, so they disable
> the device.
>
> Firstly, you could try with 1-sector granularity instead of 8 during the
> creation of the dm cow devices. This MIGHT work around the issue if DM
> is at least a bit smart. Right now it's not obvious to me where in the
> code the logic for the COW copying is. Maybe tomorrow I will understand
> this.
>
> If this doesn't work, the best thing is probably to write to the DM
> mailing list asking why it behaves like this and whether they can
> suggest a workaround. You can keep me in cc, I'm interested.
>
>
>> [CUT]
>>
>> echo 0 $(blockdev --getsize /dev/sde4) \
>> snapshot /dev/sde4 /dev/loop3 p 8 | \
>> dmsetup create cow
>>
>> echo 0 $(blockdev --getsize /dev/sdh4) \
>> snapshot /dev/sdh4 /dev/loop4 p 8 | \
>> dmsetup create cow2
>
> See, you are creating them with 8-sector granularity... try with 1.
>
>> I can try again if there is any new idea, but it would be really good
>> to do some trick with bitmaps, or to set the recovery's start point or
>> something similar, because every time I need >16 hours to get to the
>> first point where the raid does something interesting....
>>
>> Neil,
>> Can you say something useful about this?
>>
>
> I just looked into this and it seems the feature is already there.
> See if you have these files:
> /sys/block/md3/md/sync_min and sync_max
> Those are the starting and ending sectors.
> But keep in mind you have to enter them in multiples of the chunk size,
> so if your chunk is e.g. 1024k then you need to enter multiples of 2048
> (sectors).
> Enter the value before starting the sync. Or stop the sync by entering
> "idle" in sync_action, then change the sync_min value, then restart the
> sync by entering "check" in sync_action. It should work, I just tried it
> on my comp.
>
> Good luck
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
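
For reference, a minimal sketch of the 1-sector-granularity variant MRK
suggests above, reusing the device, loop file and target names from the
quoted commands (/dev/sde4, /dev/loop3, "cow"); on any other system these
names are only placeholders. The snapshot table line has the form
"<start> <length> snapshot <origin> <cow-device> <p|n> <chunk-sectors>",
so only the last field changes from 8 to 1:

# same snapshot setup as quoted above, but with a 1-sector COW chunk size
echo 0 $(blockdev --getsize /dev/sde4) \
    snapshot /dev/sde4 /dev/loop3 p 1 | \
    dmsetup create cow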
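
Similarly, a minimal sketch of the sync_min/sync_max procedure MRK
describes, assuming the md3 array from this thread (16k chunk, so the
offsets must be multiples of 32 sectors); the start offset below is a
made-up example value, not one taken from the thread:

echo idle > /sys/block/md3/md/sync_action     # stop any sync that is running
echo 1000000000 > /sys/block/md3/md/sync_min  # start point in sectors (multiple of 32); example value only
echo max > /sys/block/md3/md/sync_max         # sync to the end of the array (the default)
echo check > /sys/block/md3/md/sync_action    # restart the sync from sync_min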