* /etc/cron.weekly/99-raid-check
@ 2009-11-30 13:08 Farkas Levente
0 siblings, 0 replies; 5+ messages in thread
From: Farkas Levente @ 2009-11-30 13:08 UTC (permalink / raw)
To: CentOS mailing list, linux-raid
hi,
it's been a few weeks since rhel/centos 5.4 was released, and there has
been a lot of discussion about its new "feature", the weekly raid
partition check. we have many servers with raid1 arrays, and i have
already tried to configure them not to send these reports, but without
success, i.e. i already added all of my swap partitions to SKIP_DEVS
(since i read on the linux-kernel list that swap can show a non-zero
mismatch_cnt, even though i still don't understand why). but even the
data partitions (i.e. all raid1 partitions on all of my servers) produce
this report, i.e. their mismatch_cnt is never 0 at the weekend, and this
causes all of my raid1 partitions to be rebuilt over the weekend, and i
don't like it :-(
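(for reference, the settings i'm talking about live in
/etc/sysconfig/raid-check on a stock 5.4 install; a minimal sketch of
what i mean, where everything except the SKIP_DEVS line is my assumption
about the file's layout and the device names are made up:)

  # /etc/sysconfig/raid-check -- sketch only, not a verbatim copy
  ENABLED=yes          # run the weekly scan at all
  CHECK=check          # "check" only counts mismatches, "repair" rewrites them
  SKIP_DEVS="md1 md3"  # hypothetical swap arrays excluded from the scan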
so my questions:
- is it a real bug in the raid1 system?
- is it a real bug in the disks running the raid (which i don't really
  believe, since it happens on dozens of servers)?
- is /etc/cron.weekly/99-raid-check wrong in rhel/centos-5.4?
or what else is the problem?
can someone enlighten me?
thanks in advance.
regards.
--
Levente "Si vis pacem para bellum!"
* Re: /etc/cron.weekly/99-raid-check
@ 2009-12-02 15:38 ` greg
2009-12-03 9:11 ` /etc/cron.weekly/99-raid-check CoolCold
0 siblings, 1 reply; 5+ messages in thread
From: greg @ 2009-12-02 15:38 UTC (permalink / raw)
To: Farkas Levente, CentOS mailing list, linux-raid
On Nov 30, 2:08pm, Farkas Levente wrote:
} Subject: /etc/cron.weekly/99-raid-check
> hi,
Hi Farkas, hope your day is going well. Just thought I would respond
for the edification of others who are troubled by this issue.
> it's been a few weeks since rhel/centos 5.4 was released, and there has
> been a lot of discussion about its new "feature", the weekly raid
> partition check. we have many servers with raid1 arrays, and i have
> already tried to configure them not to send these reports, but without
> success, i.e. i already added all of my swap partitions to SKIP_DEVS
> (since i read on the linux-kernel list that swap can show a non-zero
> mismatch_cnt, even though i still don't understand why). but even the
> data partitions (i.e. all raid1 partitions on all of my servers) produce
> this report, i.e. their mismatch_cnt is never 0 at the weekend, and this
> causes all of my raid1 partitions to be rebuilt over the weekend, and i
> don't like it :-(
> so my questions:
> - is it a real bug in the raid1 system?
> - is it a real bug in the disks running the raid (which i don't really
>   believe, since it happens on dozens of servers)?
> - is /etc/cron.weekly/99-raid-check wrong in rhel/centos-5.4?
> or what else is the problem?
> can someone enlighten me?
It's a combination of what I would consider a misfeature with what MAY
BE, and I stress MAY be, a genuine bug someplace.
The current RAID/IO stack does not 'pin' pages which are destined to
be written out to disk. As a result the contents of the pages may
change as the request to do I/O against these pages transits the I/O
stack down to disk.
This results in a 'race' condition where one side of a RAID1 mirror
gets one version of the data written to it while the other side of the
mirror gets a different version. In the case of a swap partition this
appears to be harmless. In the case of filesystems there seems to be a
general assurance that this occurs only in uninhabited portions of the
filesystem.
The 'check' feature of the MD system, which the 99-raid-check script
uses, reads back the underlying physical devices of a composite RAID
device. The mismatch_cnt counter is raised whenever the contents of
mirrored sectors are not identical.
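In sysfs terms, what the cron job does for one array boils down to
roughly the following; md0 is simply an example device name:

  # Start a data-consistency scan; this is the same sync_action the
  # 99-raid-check script drives for each configured device.
  echo check > /sys/block/md0/md/sync_action

  # Progress shows up in /proc/mdstat; once the scan finishes, the
  # counter below (in 512-byte sectors) is what the weekly report
  # is built from.
  cat /proc/mdstat
  cat /sys/block/md0/md/mismatch_cnt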
The intersection of all this is problematic now that major
distributions have shipped this raid-check feature: there are probably
hundreds if not thousands of systems now reporting what may or may not
be false positives with respect to data corruption.
The current RAID stack has an option to 'repair' a RAID set which has
mismatches. Unfortunately there is no intelligence in this facility; it
arbitrarily picks one of the sectors as being 'good' and uses that to
replace the contents of the other sector. I'm somewhat reluctant to
recommend the use of this facility given the issues at hand.
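For completeness, the repair facility is driven through the same sysfs
knob; the following is a sketch of the mechanism only, not a
recommendation, for the reasons just given:

  # Rewrite mismatched regions so that both halves agree; md has no way
  # of knowing which copy was the intended one.
  echo repair > /sys/block/md0/md/sync_action

  # An in-flight check or repair can be aborted by writing 'idle'.
  echo idle > /sys/block/md0/md/sync_action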
A complicating factor is that the kernel does not report the location
of where the mismatches occur. There appears to be movement underway
to include support in the kernel for printing out the sector locations
of the mismatches.
When that feature becomes available there will be a need to have some
type of tool, in the case of RAID1 devices backing filesystems, to
make an assessment of which version of the data is 'correct' so the
faulty version can be over-written with the correct version.
As an aside, what is really needed is a tool which assesses whether or
not the mismatched sectors actually fall in an inhabited portion of the
filesystem. If they do not, the RAID1 'repair' facility could
presumably be run with no issues, given appropriate
coherency/validation checks to confirm that the region is still
uninhabited and that the mismatch is not simply the result of a race in
which an uninhabited portion has since become inhabited.
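Once those sector locations are reported, a crude form of that
'inhabited?' test is already possible on ext2/ext3 via debugfs; a
sketch, assuming a filesystem sitting directly on /dev/md0 with
4096-byte blocks and a purely hypothetical sector number:

  SECTOR=123456                       # hypothetical mismatch location
  BLOCK=$((SECTOR / 8))               # 4096-byte block = 8 x 512-byte sectors
  debugfs -R "testb $BLOCK" /dev/md0  # reports whether the block is in use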
We see the issue over a large range of production systems running
standard RHEL5 kernels all the way up to recent versions of Fedora.
Interestingly the mismatch counts are always an exact multiple of 128
on all the systems.
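That observation is easy to reproduce; something along these lines
prints the counter for every array and flags anything that is not a
whole multiple of 128 sectors (128 x 512 bytes = 64 KiB):

  for d in /sys/block/md*/md; do
      c=$(cat "$d/mismatch_cnt" 2>/dev/null) || continue
      printf '%s: %s sectors' "$d" "$c"
      [ $((c % 128)) -ne 0 ] && printf ' (not a multiple of 128)'
      printf '\n'
  done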
We have also isolated the problem to RAID1 itself, independent of the
backing store. We run geographical mirrors where an initiator is fed
from two separate data centers, each mirror half being backed by a
RAID5 Linux target. On RAID1 mirrors which show mismatches, the two
separate RAID5 backing volumes both report as completely consistent.
So there is the situation as I believe it currently stands.
The notion of running the 'check' sync_action is well founded; the
problem of 'silent' data corruption is real and well understood. The
Linux RAID system has, for a couple of years now, re-written any
sectors which come up as unreadable during the check process. Disk
drives then re-allocate such a sector from their re-mapping pool,
effectively replacing the bad sector. This pays huge dividends with
respect to maintaining healthy RAID farms.
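The drive-side effect of those rewrites can be watched with
smartmontools; a quick sketch, with /dev/sda standing in for whichever
member disk is of interest:

  # Reallocated_Sector_Ct climbs as the drive remaps sectors that the
  # check-triggered rewrites forced it to deal with; steady growth is a
  # good early-replacement signal.
  smartctl -A /dev/sda | grep -i -e Reallocated_Sector -e Current_Pending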
Unfortunately, reporting the mismatch_cnt values is problematic given
the issues above. I think it is unfortunate that the vendors opted to
release this checking/reporting while those issues are still unresolved.
> thanks in advance.
> regards.
>
> --
> Levente "Si vis pacem para bellum!"
Hope the above information is helpful for everyone running into this
issue.
Best wishes for a productive remainder of the week to everyone.
Greg
}-- End of excerpt from Farkas Levente
As always,
Dr. G.W. Wettstein, Ph.D. Enjellic Systems Development, LLC.
4206 N. 19th Ave. Specializing in information infra-structure
Fargo, ND 58102 development.
PH: 701-281-1686
FAX: 701-281-3949 EMAIL: greg@enjellic.com
------------------------------------------------------------------------------
"Experience is something you don't get until just after you need it."
-- Olivier
* Re: /etc/cron.weekly/99-raid-check
2009-12-02 15:38 ` /etc/cron.weekly/99-raid-check greg
@ 2009-12-03 9:11 ` CoolCold
2009-12-03 9:53 ` /etc/cron.weekly/99-raid-check Robin Hill
2009-12-03 10:49 ` /etc/cron.weekly/99-raid-check Sujit K M
0 siblings, 2 replies; 5+ messages in thread
From: CoolCold @ 2009-12-03 9:11 UTC (permalink / raw)
To: greg; +Cc: Farkas Levente, CentOS mailing list, linux-raid
On Wed, Dec 2, 2009 at 6:38 PM, <greg@enjellic.com> wrote:
> The current RAID/IO stack does not 'pin' pages which are destined to
> be written out to disk. As a result the contents of the pages may
> change as the request to do I/O against these pages transits the I/O
> stack down to disk.
Can you write a bit more about "the pages may change"? Who can change
the page contents?
--
Best regards,
[COOLCOLD-RIPN]
* Re: /etc/cron.weekly/99-raid-check
2009-12-03 9:11 ` /etc/cron.weekly/99-raid-check CoolCold
@ 2009-12-03 9:53 ` Robin Hill
2009-12-03 10:49 ` /etc/cron.weekly/99-raid-check Sujit K M
1 sibling, 0 replies; 5+ messages in thread
From: Robin Hill @ 2009-12-03 9:53 UTC (permalink / raw)
To: linux-raid
On Thu Dec 03, 2009 at 12:11:29PM +0300, CoolCold wrote:
> > The current RAID/IO stack does not 'pin' pages which are destined to
> > be written out to disk. As a result the contents of the pages may
> > change as the request to do I/O against these pages transits the I/O
> > stack down to disk.
>
> Can you write a bit more about "the pages may change"? Who can change
> the page contents?
>
My understanding of this is:
- An application maps a file to memory.
- It makes some modifications.
- The kernel flags the md layer to write these changes to disk.
- The md layer writes to one copy of the RAID1.
- The application makes some more modifications.
- The md layer writes to the second RAID1 copy (getting different
data).
- The kernel flags the md layer to write the new changes.
- The md layer writes both copies.
The second set of writes can go to a different section of the disk, so
you're left with the (now unused) blocks differing between the two
copies. There's no data problem, as the kernel always flags the md
layer to write _after_ the data has changed, and the md layer always
writes the data _after_ the kernel notifies it.
Note: I've not looked at the code for any of this, so I don't _know_
that this is what's happening, but that's my understanding from what
I've read on here in the past.
HTH,
Robin
--
___
( ' } | Robin Hill <robin@robinhill.me.uk> |
/ / ) | Little Jim says .... |
// !! | "He fallen in de water !!" |
* Re: /etc/cron.weekly/99-raid-check
2009-12-03 9:11 ` /etc/cron.weekly/99-raid-check CoolCold
2009-12-03 9:53 ` /etc/cron.weekly/99-raid-check Robin Hill
@ 2009-12-03 10:49 ` Sujit K M
1 sibling, 0 replies; 5+ messages in thread
From: Sujit K M @ 2009-12-03 10:49 UTC (permalink / raw)
To: CoolCold; +Cc: greg, Farkas Levente, CentOS mailing list, linux-raid
Kindly follow mailing list etiquette. Do not jump into unrelated
threads without knowing what is being discussed.
--
-- Sujit K M
blog(http://kmsujit.blogspot.com/)
Thread overview: 5+ messages
2009-11-30 13:08 /etc/cron.weekly/99-raid-check Farkas Levente
[not found] <lfarkas@lfarkas.org>
2009-12-02 15:38 ` /etc/cron.weekly/99-raid-check greg
2009-12-03 9:11 ` /etc/cron.weekly/99-raid-check CoolCold
2009-12-03 9:53 ` /etc/cron.weekly/99-raid-check Robin Hill
2009-12-03 10:49 ` /etc/cron.weekly/99-raid-check Sujit K M