* List of mismatched blocks?
@ 2010-07-03 17:15 Niobos
2010-07-03 21:29 ` Mario 'BitKoenig' Holbe
0 siblings, 1 reply; 5+ messages in thread
From: Niobos @ 2010-07-03 17:15 UTC (permalink / raw)
To: linux-raid
Hi,
I check my RAID1 array weekly using:
echo "check" >> /sys/block/$MD/md/sync_action
Every few weeks, I get a non-zero mismatch count (which is easily fixed
with a "sync"). Is there an easy way to get a list of the block numbers
that mismatch? I'd like to figure out what may be causing these mismatches.
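For reference, the count itself can be read back from sysfs once the check has finished. The following is only a minimal sketch: it assumes the md driver's mismatch_cnt attribute sits next to sync_action under /sys/block/<md>/md/, and the function name and the sysfs_root parameter are hypothetical, added so the code can be pointed at a test directory.

```python
from pathlib import Path


def read_mismatch_cnt(md_name, sysfs_root="/sys/block"):
    """Return the integer stored in <sysfs_root>/<md_name>/md/mismatch_cnt.

    md_name is the array's device name (for example "md1");
    sysfs_root is a hypothetical parameter used only to make testing easy.
    """
    path = Path(sysfs_root) / md_name / "md" / "mismatch_cnt"
    return int(path.read_text().strip())
```

Calling read_mismatch_cnt("md1") after a check would return the number reported by the kernel; it says nothing about where on the disks the mismatches are.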
Regards,
Niobos
* Re: List of mismatched blocks?
2010-07-03 17:15 List of mismatched blocks? Niobos
@ 2010-07-03 21:29 ` Mario 'BitKoenig' Holbe
2010-07-04 10:16 ` Niobos
0 siblings, 1 reply; 5+ messages in thread
From: Mario 'BitKoenig' Holbe @ 2010-07-03 21:29 UTC (permalink / raw)
To: linux-raid
Niobos <niobos@dest-unreach.be> wrote:
> Weekly, I'm checking my RAID1 array using:
> Every few weeks, I get a mismatch count (which are easily solved with a
Which kernel version and which filesystem on top of the RAID1?
> "sync"). Is there an easy way to get a list of blocknumbers that
> mismatch? I'd like to figure out what may be causing this mismatch.
You can just cmp -l your component devices and get a list of addresses
and mismatches - easy to convert to block numbers. For ext2/3 I'd bet
the mismatches are in inode blocks.
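As a rough illustration of that conversion: cmp -l prints one line per differing byte, starting with the 1-based byte offset in decimal, followed by the two byte values in octal. The sketch below maps such lines to 0-based block numbers. The 4096-byte block size is an assumption (check the real value with tune2fs -l), the function names are made up for this example, and the result is relative to the start of the component device, so it only lines up with filesystem blocks if the array data starts at offset 0 of the member.

```python
def cmp_line_to_block(line, block_size=4096):
    """Map one `cmp -l` output line to a 0-based block number."""
    offset = int(line.split()[0])      # 1-based byte offset, decimal
    return (offset - 1) // block_size  # convert to 0-based, then to a block


def mismatched_blocks(lines, block_size=4096):
    """Return the sorted, de-duplicated block numbers for many cmp -l lines."""
    blocks = set()
    for line in lines:
        if line.strip():               # skip empty lines
            blocks.add(cmp_line_to_block(line, block_size))
    return sorted(blocks)
```

Feeding it an open file containing saved cmp -l output works, since the function only iterates over lines.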
regards
Mario
--
To be happy with a man, you must understand him very well
and love him a little.
To be happy with a woman, you must love her very much
and not even try to understand her.
* Re: List of mismatched blocks?
2010-07-03 21:29 ` Mario 'BitKoenig' Holbe
@ 2010-07-04 10:16 ` Niobos
2010-07-04 18:29 ` Mario 'BitKoenig' Holbe
0 siblings, 1 reply; 5+ messages in thread
From: Niobos @ 2010-07-04 10:16 UTC (permalink / raw)
To: linux-raid
On 2010-07-03 23:29, Mario 'BitKoenig' Holbe wrote:
> Niobos <niobos@dest-unreach.be> wrote:
>> Weekly, I'm checking my RAID1 array using:
>> Every few weeks, I get a mismatch count (which are easily solved with a
>
> Which kernel version and which filesystem on top of the RAID1?
Linux serv02.<omitted> 2.6.31-15-server #50-Ubuntu SMP Tue Nov 10
15:50:36 UTC 2009 x86_64 GNU/Linux
/dev/md1 on / type ext3 (rw,errors=remount-ro)
>> "sync"). Is there an easy way to get a list of blocknumbers that
>> mismatch? I'd like to figure out what may be causing this mismatch.
>
> You can just cmp -l your component devices and get a list of addresses
> and mismatches - easy to convert to block numbers. For ext2/3 I'd bet
> the mismatches are in inode blocks.
I found out about cmp -l, but that gives me roughly 9 million differing
bytes. They are nicely grouped in ranges, but I was hoping for a way
that would output "sectors 517-523 and 4554-4559" or similar.
For the few mismatches that I calculated manually, the mismatching file
seems to be the ext3-journal itself.
I'm running cmp on an active RAID. My guess is that this causes a large
number of false positives: cmp reads the two members at different times
and hence sees different versions. That was the main reason to ask
for another way to get the mismatching blocks through the RAID layer.
Niobos
PS: Both disks report "reallocated sector count = 0" via SMART.
* Re: List of mismatched blocks?
2010-07-04 10:16 ` Niobos
@ 2010-07-04 18:29 ` Mario 'BitKoenig' Holbe
2010-07-05 10:16 ` Niobos
0 siblings, 1 reply; 5+ messages in thread
From: Mario 'BitKoenig' Holbe @ 2010-07-04 18:29 UTC (permalink / raw)
To: linux-raid
Niobos <niobos@dest-unreach.be> wrote:
> I found out about cmp -l, but that gives me roughly 9 million differing
> bytes. They are nicely grouped in ranges, but I was hoping for a way
> that would output "sectors 517-523 and 4554-4559" or similar.
Well, a little awk (or whatever scripting language you prefer) script
afterwards? :)
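A small post-processing script along those lines could look like this. It is written in Python rather than awk purely for illustration, and it is a sketch under assumptions: 512-byte sectors, input lines in ascending offset order (which is how cmp -l emits them), and hypothetical function names.

```python
def sector_ranges(lines, sector_size=512):
    """Collapse `cmp -l` lines into inclusive (first, last) sector ranges.

    Each line starts with a 1-based decimal byte offset. Sectors that are
    equal or directly adjacent to the previous one extend the current range.
    """
    ranges = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        sector = (int(fields[0]) - 1) // sector_size
        if ranges and sector <= ranges[-1][1] + 1:
            ranges[-1][1] = max(ranges[-1][1], sector)
        else:
            ranges.append([sector, sector])
    return [(first, last) for first, last in ranges]


def format_ranges(ranges):
    """Render ranges as text such as "517-519, 4554"."""
    return ", ".join(
        str(first) if first == last else f"{first}-{last}"
        for first, last in ranges
    )
```

Something like print(format_ranges(sector_ranges(sys.stdin))) with the cmp -l output piped in would then print one compact line instead of millions of byte offsets.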
> For the few mismatches that I calculated manually, the mismatching file
> seems to be the ext3-journal itself.
Yes, also a good bet.
With frequently changing data this can unfortunately happen on RAID1 due to
the handling of dirty pages (there's a thread from back in 2006 here on the
list where Heinz Mauelshagen explained this quite nicely - somewhere below
the Subject: No syncing after crash. Is this a software raid bug?).
I haven't noticed it on kernels newer than 2.6.26, so I thought it was fixed,
but this could also be because the probability shrank due to new hardware.
> I'm running cmp on an active RAID. My guess is that this cause a large
> number of false positives: cmp reading both members at a different time,
> and hence reading a different version. That was the main reason to ask
Not very likely unless your filesystem is under very high load. To be
more specific: I ran cmp -l on my mirrors regularly before the "check"
sync_action was developed (and still do it from time to time -
never trust... :)) and never experienced such things.
regards
Mario
--
If you think technology can solve your problems you don't understand
technology and you don't understand your problems.
-- Bruce Schneier
* Re: List of mismatched blocks?
2010-07-04 18:29 ` Mario 'BitKoenig' Holbe
@ 2010-07-05 10:16 ` Niobos
0 siblings, 0 replies; 5+ messages in thread
From: Niobos @ 2010-07-05 10:16 UTC (permalink / raw)
To: linux-raid
On 2010-07-04 20:29, Mario 'BitKoenig' Holbe wrote:
> Niobos <niobos@dest-unreach.be> wrote:
> On high-frequent changes this can unfortunately happen with RAID1 due to
> the handling of dirty pages (there's a thread back in 2k6 here on the
> list where Heinz Mauelshagen explained this quite nice - somewhere below
> the Subject: No syncing after crash. Is this a software raid bug?).
> I didn't notice it anymore since >2.6.26, so I thought it was fixed, but
> this could also be because the probability shrunk due to new hardware.
>
>> I'm running cmp on an active RAID. My guess is that this cause a large
>> number of false positives: cmp reading both members at a different time,
>> and hence reading a different version. That was the main reason to ask
>
> Not very likely unless your filesystem is under very high load. To be
> more specific: I did cmp -l my mirrors regularly before the "check"
> sync_action had been developed (and still do it from time to time -
> never trust... :)) and never experienced such things.
I just ran cmp five times in a row, with minutes in between the runs:
-rw-r--r-- 1 root root 15016204 2010-07-05 09:18 sd[ab]1.1.diff
-rw-r--r-- 1 root root 21404544 2010-07-05 09:49 sd[ab]1.2.diff
-rw-r--r-- 1 root root 40538120 2010-07-05 09:54 sd[ab]1.3.diff
-rw-r--r-- 1 root root 415879184 2010-07-05 10:39 sd[ab]1.4.diff
-rw-r--r-- 1 root root 24811696 2010-07-05 11:45 sd[ab]1.5.diff
From the output size alone, one can see that something is not working as
described. There are millions of bytes that mismatched at 10:39 and that
magically were "fixed" an hour later...
I'm not sure how to quantify "very heavy filesystem load", but I don't
think this is it.
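One way to separate such transient differences from persistent ones would be to keep only the sectors that show up in every run. This is a sketch only: it assumes each .diff file holds plain cmp -l output, uses 512-byte sectors, and the function names are hypothetical. It also holds one set of sector numbers per run in memory.

```python
def sectors_in(lines, sector_size=512):
    """Return the set of 0-based sectors touched by the given cmp -l lines."""
    return {
        (int(line.split()[0]) - 1) // sector_size
        for line in lines
        if line.strip()
    }


def persistent_sectors(runs, sector_size=512):
    """Return the sorted sectors that mismatch in every run.

    runs is an iterable of line iterables, one per cmp -l run.
    """
    sets = [sectors_in(run, sector_size) for run in runs]
    if not sets:
        return []
    return sorted(set.intersection(*sets))
```

Passing the five opened diff files as runs would leave only the sectors that differed every time; anything that appears in one run and not the next is more likely an in-flight write than a real mismatch.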
thanks,
Niobos