From: "Janos Haar" <janos.haar@netcenter.hu>
To: MRK <mrk@shiftmail.org>
Cc: linux-raid@vger.kernel.org
Subject: Re: Suggestion needed for fixing RAID6
Date: Sat, 1 May 2010 11:37:36 +0200 [thread overview]
Message-ID: <12cf01cae911$f0d92940$0400a8c0@dcccs> (raw)
In-Reply-To: 4BDB6DB6.5020306@sh iftmail.org
Hello,
Now i am tried with 1 sector snapshot size.
the result was the same
first the snapshot have been invalidated, than DM dropped from the raid.
The next was this:
md3 : active raid6 sdl4[11] sdk4[10] sdj4[9] sdi4[8] dm-1[12](F) sdg4[6]
sdf4[5]
dm-0[4] sdc4[2] sdb4[1] sda4[0]
14626538880 blocks level 6, 16k chunk, algorithm 2 [12/10]
[UUU_UUU_UUUU]
[===================>.] resync = 99.9% (1462653628/1462653888)
finish=0.0
min speed=2512K/sec
The sync progress bar jumped from 58.8% to 99.9% the speed falls, the
1462653628/1462653888 is freezed in this point.
I can do dmesg once by hand, than save the dmesg output to file, but the
system crashed after this.
The entire story was about 1 minute.
Whoever, the sync_min option generally solves my problem, becasue i can
build up the missing disk from the 90% wich is good enough for me. :-)
If somebody is interested about playing more with this system, i still have
some days for it, but i am not interested anymore to trace the md-dm
behavior in this situation....
Additionally, i don't want to put in risk the data if not really needed....
Thanks a lot,
Janos Haar
----- Original Message -----
From: "MRK" <mrk@shiftmail.org>
To: "Janos Haar" <janos.haar@netcenter.hu>
Cc: <linux-raid@vger.kernel.org>
Sent: Saturday, May 01, 2010 1:54 AM
Subject: Re: Suggestion needed for fixing RAID6
> On 04/30/2010 08:17 AM, Janos Haar wrote:
>> Hello,
>>
>> OK, MRK you are right (again).
>> There was some line in the messages wich avoids my attention.
>> The entire log is here:
>> http://download.netcenter.hu/bughunt/20100430/messages
>>
>
> Ah here we go:
>
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: device-mapper: snapshots:
> Invalidating snapshot: Error reading/writing.
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: ata8: EH complete
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5: Disk failure on dm-1,
> disabling device.
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5: Operation continuing on 10
> devices.
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: md: md3: recovery done.
>
> Firstly I'm not totally sure of how DM passed the information of the
> device failing to MD. There is no error message about this on MD. If it
> was a read error, MD should have performed the rewrite but this apparently
> did not happen (the error message for a failed rewrite by MD I think is
> "read error NOT corrected!!"). But anyway...
>
>> The dm founds invalid my cow devices, but i don't know why at this time.
>>
>
> I have just had a brief look ad DM code. I understand like 1% of it right
> now, however I am thinking that in a not-perfectly-optimized way of doing
> things, if you specified 8 sectors (8x512b = 4k, which you did)
> granularity during the creation of your cow and cow2 devices, whenever you
> write to the COW device, DM might do the thing in 2 steps:
>
> 1- copy 8 (or multiple of 8) sectors from the HD to the cow device, enough
> to cover the area to which you are writing
> 2- overwrite such 8 sectors with the data coming from MD.
>
> Of course this is not optimal in case you are writing exactly 8 sectors
> with MD, and these are aligned to the ones that DM uses (both things I
> think are true in your case) because DM could have skipped #1 in this
> case.
> However supposing DM is not so smart and it indeed does not skip step #1,
> then I think I understand why it disables the device: it's because #1
> fails with read error and DM does not know how to handle the situation in
> that case in general. If you had written a smaller amount with MD such as
> 512 bytes, if step #1 fails, what do you write in the other 7 sectors
> around it? The right semantics is not obvious so they disable the device.
>
> Firstly you could try with 1 sector granularity instead of 8, during the
> creation of dm cow devices. This MIGHT work around the issue if DM is at
> least a bit smart. Right now it's not obvious to me where in the is code
> the logic for the COW copying. Maybe tomorrow I will understand this.
>
> If this doesn't work, the best thing is probably if you can write to the
> DM mailing list asking why it behaves like this and if they can guess a
> workaround. You can keep me in cc, I'm interested.
>
>
>> [CUT]
>>
>> echo 0 $(blockdev --getsize /dev/sde4) \
>> snapshot /dev/sde4 /dev/loop3 p 8 | \
>> dmsetup create cow
>>
>> echo 0 $(blockdev --getsize /dev/sdh4) \
>> snapshot /dev/sdh4 /dev/loop4 p 8 | \
>> dmsetup create cow2
>
> See, you are creating it with 8 sectors granularity... try with 1.
>
>> I can try again, if there is any new idea, but it would be really good to
>> do some trick with bitmaps or set the recovery's start point or something
>> similar, because every time i need >16 hour to get the first poit where
>> the raid do something interesting....
>>
>> Neil,
>> Can you say something useful about this?
>>
>
> I just looked into this and it seems this feature is already there.
> See if you have these files:
> /sys/block/md3/md/sync_min and sync_max
> Those are the starting and ending sector.
> But keep in mind you have to enter them in multiples of the chunk size so
> if your chunk is e.g. 1024k then you need to enter multiples of 2048
> (sectors).
> Enter the value before starting the sync. Or stop the sync by entering
> "idle" in sync_action, then change the sync_min value, then restart the
> sync entering "check" in sync_action. It should work, I just tried it on
> my comp.
>
> Good luck
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2010-05-01 9:37 UTC|newest]
Thread overview: 48+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-04-22 10:09 Suggestion needed for fixing RAID6 Janos Haar
2010-04-22 15:00 ` Mikael Abrahamsson
2010-04-22 15:12 ` Janos Haar
2010-04-22 15:18 ` Mikael Abrahamsson
2010-04-22 16:25 ` Janos Haar
2010-04-22 16:32 ` Peter Rabbitson
[not found] ` <4BD0AF2D.90207@stud.tu-ilmenau.de>
2010-04-22 20:48 ` Janos Haar
2010-04-23 6:51 ` Luca Berra
2010-04-23 8:47 ` Janos Haar
2010-04-23 12:34 ` MRK
2010-04-24 19:36 ` Janos Haar
2010-04-24 22:47 ` MRK
2010-04-25 10:00 ` Janos Haar
2010-04-26 10:24 ` MRK
2010-04-26 12:52 ` Janos Haar
2010-04-26 16:53 ` MRK
2010-04-26 22:39 ` Janos Haar
2010-04-26 23:06 ` Michael Evans
[not found] ` <7cfd01cae598$419e8d20$0400a8c0@dcccs>
2010-04-27 0:04 ` Michael Evans
2010-04-27 15:50 ` Janos Haar
2010-04-27 23:02 ` MRK
2010-04-28 1:37 ` Neil Brown
2010-04-28 2:02 ` Mikael Abrahamsson
2010-04-28 2:12 ` Neil Brown
2010-04-28 2:30 ` Mikael Abrahamsson
2010-05-03 2:29 ` Neil Brown
2010-04-28 12:57 ` MRK
2010-04-28 13:32 ` Janos Haar
2010-04-28 14:19 ` MRK
2010-04-28 14:51 ` Janos Haar
2010-04-29 7:55 ` Janos Haar
2010-04-29 15:22 ` MRK
2010-04-29 21:07 ` Janos Haar
2010-04-29 23:00 ` MRK
2010-04-30 6:17 ` Janos Haar
2010-04-30 23:54 ` MRK
[not found] ` <4BDB6DB6.5020306@sh iftmail.org>
2010-05-01 9:37 ` Janos Haar [this message]
2010-05-01 17:17 ` MRK
2010-05-01 21:44 ` Janos Haar
2010-05-02 23:05 ` MRK
2010-05-03 2:17 ` Neil Brown
2010-05-03 10:04 ` MRK
2010-05-03 10:21 ` MRK
2010-05-03 21:04 ` Neil Brown
2010-05-03 21:02 ` Neil Brown
[not found] ` <4BDE9FB6.80309@shiftmai! l.org>
2010-05-03 10:20 ` Janos Haar
2010-05-05 15:24 ` Suggestion needed for fixing RAID6 [SOLVED] Janos Haar
2010-05-05 19:27 ` MRK
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='12cf01cae911$f0d92940$0400a8c0@dcccs' \
--to=janos.haar@netcenter.hu \
--cc=linux-raid@vger.kernel.org \
--cc=mrk@shiftmail.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).