linux-raid.vger.kernel.org archive mirror
* md devices: Suggestion for in place time and checksum within the RAID
@ 2010-03-13 23:00 Joachim Otahal
  2010-03-14  0:04 ` Bill Davidsen
  0 siblings, 1 reply; 8+ messages in thread
From: Joachim Otahal @ 2010-03-13 23:00 UTC (permalink / raw)
  To: linux-raid

Current Situation in RAID:
If a drive fails silently and gives out wrong data instead of read 
errors, there is no way to detect that corruption (no fun, I have had 
that a few times already).
Even in RAID1 with three drives there is no "two over three" voting 
mechanism.

A workaround for that problem would be:
Adding one sector to each chunk to store the time (in nanosecond 
resolution) + a CRC or ECC value of the whole stripe, making it possible 
to see and handle such errors below the filesystem level.
The time is in nanoseconds only to tell apart the many writes that 
actually happen; it does not really matter how precise the time actually 
is, just that every stripe update gets a different time value from the 
previous update.
It would be an easy way to know which chunks are actually the latest (or 
which contain correct data in case one out of three+ chunks has a wrong 
time upon reading). A random unique ID or counter could also do the job 
of the time value if anyone prefers, but I doubt that, since the collision 
probability would be higher.
The use of CRC or ECC or whatever hash should be obvious: their 
existence would make it easy to detect drive degradation, even in a RAID0 
or LINEAR.
Bad side: Adding this might break the on-the-fly RAID expansion 
capabilities. A workaround might be to use 8K (+ one sector) chunks by 
default upon creation, or to require specifying the chunk size on creation 
(like 8k + 1 sector) if future expansion capabilities are actually wanted 
with RAID0/4/5/6, but that is a different issue anyway.
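
To make the voting + checksum idea a bit more concrete, here is a tiny 
userspace sketch (purely illustrative: the device names and the region 
number are made up, md does nothing like this today, and a real check 
would have to skip the superblock/data offset). It reads the same 64k 
region from three RAID1 members and lets the majority checksum win:

#!/bin/sh
MEMBERS="/dev/sda2 /dev/sdb2 /dev/sdc2"
REGION=12345      # which 64k region to compare, made-up example value
for X in ${MEMBERS} ; do
   SUM=`/bin/dd if=${X} bs=64k skip=${REGION} count=1 2>/dev/null | /usr/bin/md5sum | /usr/bin/cut -d' ' -f1`
   /bin/echo "${SUM} ${X}"
done | /usr/bin/sort | /usr/bin/uniq -c -w 32 | /usr/bin/sort -rn
# The checksum listed with count 2 is the majority ("two over three");
# a member listed alone with a different checksum is the suspect copy.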

Question:
Will RAID4/5/6 in the future use the parity upon read too? Currently it 
would not detect wrong data reads from the parity chunk, resulting in a 
disaster when it is actually needed.

Do such plans already exist, making my post completely useless?

Sorry that I cannot give patches; my last kernel patch + compile was 
2.2.26, and I have not compiled a kernel since then.

Joachim Otahal



* Re: md devices: Suggestion for in place time and checksum within the RAID
  2010-03-13 23:00 md devices: Suggestion for in place time and checksum within the RAID Joachim Otahal
@ 2010-03-14  0:04 ` Bill Davidsen
  2010-03-14  1:25   ` Joachim Otahal
  0 siblings, 1 reply; 8+ messages in thread
From: Bill Davidsen @ 2010-03-14  0:04 UTC (permalink / raw)
  To: Joachim Otahal; +Cc: linux-raid

Joachim Otahal wrote:
> Current Situation in RAID:
> If a drive fails silently and is giving out wrong data instead of read 
> errors there is no way to detect that corruption (no fun, I had that a 
> few times already).

That is almost certainly a hardware issue; the chance of silent bad 
data is tiny, while the chance of bad hardware mangling the data is much 
higher. Often cable issues.

> Even in RAID1 with three drives there is no "two over three" voting 
> mechanism.
>
> A workaround for that problem would be:
> Adding one sector to each chunk to store the time (in nanoseconds 
> resolution) + CRC or ECC value of the whole stripe, making it possible 
> to see and handle such errors below the filesystem level.
> Time in nanoseconds only to differ between those many writes that 
> actually happen, it does not really matter how precise the time 
> actually is, just every stripe update should have a different time 
> value from the previous update.

Unlikely to have meaning; there is so much caching and delay that it 
would be inaccurate. A simple monotonic counter of writes would do as 
well. And I think you would need to do it at a lower level than the chunk, 
like the sector. Have to look at that code again.

> It would be an easy way to know which chunks are actually the latest 
> (or which contain correct data in case one out of three+ chunks has a 
> wrong time upon reading). A random unique ID or counter could also do 
> the job of the time value if anyone prefers, but I doubt since the 
> collision possibility would be higher.

You can only know the time when the buffer is filled; after that you 
have write cache, drive cache, and rotational delay. A count does as 
well and doesn't depend on the time between CPUs being the same at ns level.

> The use of CRC or ECC or whatever hash should be obvious, their 
> existence would make it easy to detect drive degradation, even in a 
> RAID0 or LINEAR.

There is a ton of that in the drive already.

> Bad side: Adding this might break the on the fly raid expansion 
> capabilities. A workaround might be using 8K(+ one sector) chunks by 
> default upon creation or the need to specify the chunk size on 
> creation (like 8k+1 sector) if future expansion capabilities are 
> actually wanted with RAID0/4/5/6, but that is a different issue anyway.
>
> Question:
> Will RAID4/5/6 in the future use the parity upon read too? Currently 
> it would not detect wrong data reads from the parity chunk, resulting 
> in a disaster when it is actually needed.
>
> Do those plans already exist and my post was completely useless?
>
> Sorry that I cannot give patches, my last kernel patch + compile was 
> 2.2.26, since then I never compiled a kernel.
>
> Joachim Otahal
>
> -- 
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>


-- 
Bill Davidsen <davidsen@tmr.com>
  "We can't solve today's problems by using the same thinking we
   used in creating them." - Einstein



* Re: md devices: Suggestion for in place time and checksum within the RAID
  2010-03-14  0:04 ` Bill Davidsen
@ 2010-03-14  1:25   ` Joachim Otahal
  2010-03-14 10:20     ` Keld Simonsen
  0 siblings, 1 reply; 8+ messages in thread
From: Joachim Otahal @ 2010-03-14  1:25 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: linux-raid

Bill Davidsen schrieb:
> Joachim Otahal wrote:
>> Current Situation in RAID:
>> If a drive fails silently and is giving out wrong data instead of 
>> read errors there is no way to detect that corruption (no fun, I had 
>> that a few times already).
>
> That is almost certainly a hardware issue, the chances of silent bad 
> data are tiny, the chances of bad hardware messing the data is more 
> likely. Often cable issues.
In over 20 years (including our customers' drives) I have seen about ten 
hard drives of that type. It does indeed not happen often. These were not 
cable issues; we replaced the drive with the same type and vendor and 
RMA'd the original. It is not vendor specific, it seems every vendor has 
such problematic drives at some point. The last case was just a few 
months ago.

>> Even in RAID1 with three drives there is no "two over three" voting 
>> mechanism.
>>
>> A workaround for that problem would be:
>> Adding one sector to each chunk to store the time (in nanoseconds 
>> resolution) + CRC or ECC value of the whole stripe, making it 
>> possible to see and handle such errors below the filesystem level.
>> Time in nanoseconds only to differ between those many writes that 
>> actually happen, it does not really matter how precise the time 
>> actually is, just every stripe update should have a different time 
>> value from the previous update.
>
> Unlikely to have meaning, there is so much caching and delay that it 
> would be inaccurate. A simple monotonic counter of writes would do as 
> well. And I think you need to do it at a lower level than chunk, like 
> sector. Have to look at that code again.
 From what I know from the docs: the "stripe" is normally 64k, so the 
"chunk" on each drive when using RAID5 with three drives is 32k, smaller 
with more drives. At least that is what I am referring to :). The 
filesystem level never sees what is done at the RAID level, not even in 
the ZFS implementation on Linux, which was originally designed for such a 
case.
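
To check what an existing array actually uses (assuming the array is 
/dev/md0, just as an example), the chunk size is reported directly:

/sbin/mdadm --detail /dev/md0 | /bin/grep -i 'chunk size'
/bin/cat /proc/mdstat     # also shows the chunk size for striped arrays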

>> The use of CRC or ECC or whatever hash should be obvious, their 
>> existence would make it easy to detect drive degradation, even in a 
>> RAID0 or LINEAR.
>
> There is a ton of that in the drive already.
That is mainly meant to know whether the stripe is consistent (after a 
power failure etc.), and if not, to correct it. Currently that cannot be 
detected, especially since the parity is not read in the current 
implementation (at least the docs say so!). If the data can be reconstructed 
using the ECC and/or parity, write the corrected data back silently (if 
mounted rw) to get the data consistent again. For a successful silent 
correction one syslog line would be enough; if correction is not 
possible it can still fall back to the current default behaviour and read 
whatever is there, but at least we could _detect_ such an inconsistency.
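
What does exist today is the out-of-band scrub; a quick sketch, assuming 
the array is /dev/md0:

/bin/echo check > /sys/block/md0/md/sync_action    # read everything and count mismatches
/bin/cat /sys/block/md0/md/mismatch_cnt            # non-zero afterwards = an inconsistency was found
/bin/echo repair > /sys/block/md0/md/sync_action   # rewrite the redundancy so the stripes agree again

But it cannot tell _which_ copy was the wrong one, which is exactly what 
the proposed checksum + time would add.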

>> Bad side: Adding this might break the on the fly raid expansion 
>> capabilities. A workaround might be using 8K(+ one sector) chunks by 
>> default upon creation or the need to specify the chunk size on 
>> creation (like 8k+1 sector) if future expansion capabilities are 
>> actually wanted with RAID0/4/5/6, but that is a different issue anyway.
>>
>> Question:
>> Will RAID4/5/6 in the future use the parity upon read too? Currently 
>> it would not detect wrong data reads from the parity chunk, resulting 
>> in a disaster when it is actually needed.
>>
>> Do those plans already exist and my post was completely useless?
>>
>> Sorry that I cannot give patches, my last kernel patch + compile was 
>> 2.2.26, since then I never compiled a kernel.
>>
>> Joachim Otahal
>>
>> -- 
>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>
>



* Re: md devices: Suggestion for in place time and checksum within the RAID
  2010-03-14  1:25   ` Joachim Otahal
@ 2010-03-14 10:20     ` Keld Simonsen
  2010-03-14 11:58       ` Joachim Otahal
  0 siblings, 1 reply; 8+ messages in thread
From: Keld Simonsen @ 2010-03-14 10:20 UTC (permalink / raw)
  To: Joachim Otahal; +Cc: Bill Davidsen, linux-raid

On Sun, Mar 14, 2010 at 02:25:38AM +0100, Joachim Otahal wrote:
> Bill Davidsen schrieb:
> >Joachim Otahal wrote:
> >>Current Situation in RAID:
> >>If a drive fails silently and is giving out wrong data instead of 
> >>read errors there is no way to detect that corruption (no fun, I had 
> >>that a few times already).
> >
> >That is almost certainly a hardware issue, the chances of silent bad 
> >data are tiny, the chances of bad hardware messing the data is more 
> >likely. Often cable issues.
> In over 20 years (including our customer drives) about ten harddrives of 
> that type. Does indeed not happen often. Were not cable issues, we 
> replaced the drive with the same type and vendor and RMA'd the original. 
> It is not vendor specific, it's like every vendor does have such 
> problematic drives during their existence. The last case was just a few 
> months ago.
> 
> >>Even in RAID1 with three drives there is no "two over three" voting 
> >>mechanism.
> >>
> >>A workaround for that problem would be:
> >>Adding one sector to each chunk to store the time (in nanoseconds 
> >>resolution) + CRC or ECC value of the whole stripe, making it 
> >>possible to see and handle such errors below the filesystem level.
> >>Time in nanoseconds only to differ between those many writes that 
> >>actually happen, it does not really matter how precise the time 
> >>actually is, just every stripe update should have a different time 
> >>value from the previous update.
> >
> >Unlikely to have meaning, there is so much caching and delay that it 
> >would be inaccurate. A simple monotonic counter of writes would do as 
> >well. And I think you need to do it at a lower level than chunk, like 
> >sector. Have to look at that code again.
> From what I know from the docs: The "stripe" is normally 64k, so the 
> "chunk" on each drive when using raid5 with three drives is 32k, smaller 
> with more drives. At least that is what I am referring to : ). The 
> filesystem level never sees what is done on the raid level not even in 
> the ZFS implementation on linux which was originally designed for such a 
> case.
> 
> >>The use of CRC or ECC or whatever hash should be obvious, their 
> >>existence would make it easy to detect drive degradation, even in a 
> >>RAID0 or LINEAR.
> >
> >There is a ton of that in the drive already.
> That is mainly meant to know whether the stripe is consistent (after 
> power fail etc), and if not, correct it. Currently that cannot be 
> detected, especially since the parity is not read in the current 
> implementation (at least the docs say so!). If it can be reconstructed 
> using the ECC and/or parity write the corrected data back silently (if 
> mounted rw) to get the data consistent again. For successful silent 
> correction only one syslog line would be enough, if correction is not 
> possible it can still go back to the current default behaviour, read 
> whatever is there, but at least we could _detect_ such inconsistency.
> 
> >>Bad side: Adding this might break the on the fly raid expansion 
> >>capabilities. A workaround might be using 8K(+ one sector) chunks by 
> >>default upon creation or the need to specify the chunk size on 
> >>creation (like 8k+1 sector) if future expansion capabilities are 
> >>actually wanted with RAID0/4/5/6, but that is a different issue anyway.
> >>
> >>Question:
> >>Will RAID4/5/6 in the future use the parity upon read too? Currently 
> >>it would not detect wrong data reads from the parity chunk, resulting 
> >>in a disaster when it is actually needed.
> >>
> >>Do those plans already exist and my post was completely useless?
> >>
> >>Sorry that I cannot give patches, my last kernel patch + compile was 
> >>2.2.26, since then I never compiled a kernel.
> >>
> >>Joachim Otahal

Hmm, would that not be detected by a check - initiated by cron?

Which data to believe could then be determined according to a number 
of techniques, like taking the best 2 out of 3 for a 3-copy array,
investigating the error log of the drives, and relaying the error
information to the file system layer for manual inspection and repair.
I would expect this is not something that occurs frequently, so maybe 
once a year for the unlucky or for systems with many disks.

best regards
keld


* Re: md devices: Suggestion for in place time and checksum within the RAID
  2010-03-14 10:20     ` Keld Simonsen
@ 2010-03-14 11:58       ` Joachim Otahal
  2010-03-14 13:03         ` Keld Simonsen
  0 siblings, 1 reply; 8+ messages in thread
From: Joachim Otahal @ 2010-03-14 11:58 UTC (permalink / raw)
  To: Keld Simonsen; +Cc: Bill Davidsen, linux-raid

Keld Simonsen schrieb:
> On Sun, Mar 14, 2010 at 02:25:38AM +0100, Joachim Otahal wrote
>>>> Question:
>>>> Will RAID4/5/6 in the future use the parity upon read too? Currently
>>>> it would not detect wrong data reads from the parity chunk, resulting
>>>> in a disaster when it is actually needed.
>>>>
>>>> Do those plans already exist and my post was completely useless?
>>>>
>>>> Sorry that I cannot give patches, my last kernel patch + compile was
>>>> 2.2.26, since then I never compiled a kernel.
>>>>
>>>> Joachim Otahal
>>>>          
> Hmm, would that not be detected by a check - initiated by cron?
>    
Debian schedules a monthly check (first Sunday, 00:57), IMHO the best 
possible time and frequency; less is dangerous, more is useless. I added 
a cron job that checks every 15 minutes for changes in /proc/mdstat and 
in the SMART info (reallocated sector count and drive-internal 
error list only) and emails me if something changed since the previous check.
I use the script because /etc/mdadm/mdadm.conf only takes ONE email 
address and requires a local MTA installed, and I always uninstall the 
local MTA if the machine is not going to be a mail server.
But why not check parity during normal read operation? Was that a 
performance decision? It is not _that_ bad not doing it during normal 
operation since the good dists schedule a regular check, but could it be 
controlled by something like echo "1" > 
/proc/sys/dev/raid/always_read_parity ?
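
For reference, Debian's monthly run and my 15-minute check are just cron 
entries; a rough sketch in /etc/cron.d style (the checkarray options are 
from memory and the script name is made up):

# first Sunday of the month, 00:57 - triggers "check" on all md arrays
57 0 * * 0    root  [ "$(/bin/date +\%d)" -le 7 ] && /usr/share/mdadm/checkarray --cron --all --quiet
# every 15 minutes - my own mdstat/SMART health check
*/15 * * * *  root  /usr/local/sbin/raid-healthcheck.sh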

> Which data to believe could then be determined according to a number
> of techniques, like for a 3 copy array the best 2 out of 3,
> investigating the error log of the drives, and relaying the error
> information to the file system layer for manual inspection and repair.
>    
That is a matter of "belief" and "best guess", not of "knowing" which 
copy contains the correct data in redundant array levels, hence the 
suggestion from before to include a timer + ECC (or better) at the RAID 
level, so we actually _know_ which is the newest and we _know_ which 
stripe has consistent data. No guessing needed, we can apply 
crystal-clear rules.
My ruleset would be:
first use: newest time and correct ECC
second use: newest time and correctable ECC
third use: any time and correct ECC (hint a possible filesystem error to 
the next layer)
fourth use: any time and correctable ECC (hint a possible filesystem error 
to the next layer)
fifth use: current implementation, use the data from the active drives 
ordered according to the list in the superblock + hint a possible 
filesystem error to the next layer.
A RAID-aware filesystem would be perfect (compare with ZFS on Solaris), 
eliminating the write-hole problem, but doing the checksum at the RAID level 
makes it more flexible.

> I would expect this is not something that occurs frequently, so maybe
> once a year for the unlucky or systems with many disks.
>    
If you are paranoid about really important data, corrupting it once in 5 
years is already too much. Implementing the checksum + timestamp would lift 
Linux software RAID to the next level, closer to the enterprise world where 
such techniques are actually in use. At its current level it is very good 
and solid, so it is time to go to the next level for long-term archiving.

regards,

Joachim Otahal



* Re: md devices: Suggestion for in place time and checksum within the RAID
  2010-03-14 11:58       ` Joachim Otahal
@ 2010-03-14 13:03         ` Keld Simonsen
  2010-03-14 14:00           ` Joachim Otahal
  2010-03-15 21:28           ` Joachim Otahal
  0 siblings, 2 replies; 8+ messages in thread
From: Keld Simonsen @ 2010-03-14 13:03 UTC (permalink / raw)
  To: Joachim Otahal; +Cc: Bill Davidsen, linux-raid

On Sun, Mar 14, 2010 at 12:58:50PM +0100, Joachim Otahal wrote:
> Keld Simonsen schrieb:
> >On Sun, Mar 14, 2010 at 02:25:38AM +0100, Joachim Otahal wrote
> >>>>Question:
> >>>>Will RAID4/5/6 in the future use the parity upon read too? Currently
> >>>>it would not detect wrong data reads from the parity chunk, resulting
> >>>>in a disaster when it is actually needed.
> >>>>
> >>>>Do those plans already exist and my post was completely useless?
> >>>>
> >>>>Sorry that I cannot give patches, my last kernel patch + compile was
> >>>>2.2.26, since then I never compiled a kernel.
> >>>>
> >>>>Joachim Otahal
> >>>>         
> >Hmm, would that not be detected by a check - initiated by cron?
> >   
> Debian schedules a monthly check (first sunday 00:57), IMHO the best 
> possible time and frequency, less is dangerous, more is useless. I added 
> a cronjob to check every 15 minutes for changes from /proc/mdstat and 
> changes from smart info (reallocated sector count and drive internal 
> error list only) and emails me if something changed from the previous check.
> I use the script because /etc/mdadm/mdadm.conf only takes ONE email 
> address and requires a local MTA installed, I always uninstall the 
> local MTA if the machine is not going to be a mail server.

Interesting! I would like to see your scripts....

> But why not check parity during normal read operation? Was that a 
> performance decision?

I don't know, but I do think it would hurt performance considerably.


> It is not _that_ bad not doing it during normal 
> operation since the good dists schedule a regular check, but can it be 
> controlled by something like echo "1" > 
> /proc/sys/dev/raid/always_read_parity ?

Well, I think making it an optional check would be fine.
I don't know if it could be done in a way that does not hurt performance, such
as being delayed or running at a lower IO priority.
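
Something in that direction already exists for the background 
resync/check; a sketch, assuming the array is /dev/md0:

# system-wide limits for resync/check/repair, in KB/s per device:
# speed_limit_min is the floor md keeps even under load, speed_limit_max the ceiling
/bin/echo 1000  > /proc/sys/dev/raid/speed_limit_min
/bin/echo 20000 > /proc/sys/dev/raid/speed_limit_max
# or per array
/bin/echo 20000 > /sys/block/md0/md/sync_speed_max

That only throttles the scrub, of course; it says nothing about a 
per-read parity check.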

> >Which data to believe could then be determined according to a number
> >of techniques, like for a 3 copy array the best 2 out of 3,
> >investigating the error log of the drives, and relaying the error
> >information to the file system layer for manual inspection and repair.
> >   
> That is a matter of "believe" and "best guess" and not "knowing" which 
> contains the correct data in redundant array levels, hence the 
> suggestion from before to include a timer + ECC (or better) at the raid 
> level, so we actually _know_ which is the newest, and we _know_ which 
> stripe does have consistent data, no guessing needed, we can apply 
> crystal clear rules.
> My ruleset would be:
> first use: newest time and correct ECC
> second use: newest time and correctable ECC
> third use: any time and correct ECC (hint possible filesystem error to 
> the next layer)
> fourth use: any time and correctable ECC (hint possible filesystem error 
> to the next layer)
> fifth use: Current implementation, use the data from the active drive 
> ordering according to the list in the superblock + hint possible 
> filesystem error to the next layer.
> A raid aware filesystem would be perfect (compare with ZFS on Solaris) 
> eliminating the write hole problem, doing the checksum at raid level 
> makes it more flexible.

Interesting ideas

> >I would expect this is not something that occurs frequently, so maybe
> >once a year for the unlucky or systems with many disks.
> >   
> If you get paranoid about corrupting really important data once in 5 
> years too much. Implementing the checksum + timestamp would lift linux 
> software raid to the next level, closer to enterprise where such 
> techniques are actually in use. At its current level it is very good 
> and solid, so it is time to get to the next level for long time archiving.

I was not trying to say this is not important, but rather that error
correction could be done by manual intervention, given that it is not so
frequent. Or at least that manual correction should be one of the
implemented ways of addressing it.

best regards
keld


* Re: md devices: Suggestion for in place time and checksum within the RAID
  2010-03-14 13:03         ` Keld Simonsen
@ 2010-03-14 14:00           ` Joachim Otahal
  2010-03-15 21:28           ` Joachim Otahal
  1 sibling, 0 replies; 8+ messages in thread
From: Joachim Otahal @ 2010-03-14 14:00 UTC (permalink / raw)
  To: Keld Simonsen; +Cc: Bill Davidsen, linux-raid

Keld Simonsen schrieb:
> On Sun, Mar 14, 2010 at 12:58:50PM +0100, Joachim Otahal wrote:
>    
>> Debian schedules a monthly check (first sunday 00:57), IMHO the best
>> possible time and frequency, less is dangerous, more is useless. I added
>> a cronjob to check every 15 minutes for changes from /proc/mdstat and
>> changes from smart info (reallocated sector count and drive internal
>> error list only) and emails me if something changed from the previous check.
>> I use the script because /etc/mdadm/mdadm.conf only takes ONE email
>> address and requires a local MTA installed, I always uninstall the
>> local MTA if the machine is not going to be a mail server.
>>      
> Interesting! I would like to see your scripts....
>    
sendEmail.pl is from 
http://caspian.dotconf.net/menu/Software/SendEmail/; in his latest 
update he managed to get rid of the TLS and base64-encoding problems.
Here is the unpolished script, in "it does what it should do" state. The 
HEALTHFILE variable is reassigned somewhere in the middle. The locations 
are chosen so that the RAID info is sent at every boot + upon change, and 
the SMART info only when something changes. It is run every 15 minutes 
from cron. One of my HDDs had a growing reallocated sector count every 
two weeks, but it seems to have stabilized now; I can nicely follow that 
in my inbox.

#!/bin/sh
# Compare the current /proc/mdstat and SMART info against the previous
# run and mail the new state whenever something changed.
HEALTHFILE="/tmp/healthcheck.mdstat"
HARDDRIVES="/dev/sda /dev/sdb /dev/sdc /dev/sdd"
SENDEMAILCOMMAND="/usr/local/sbin/sendEmail.pl -f <sender> -t <recipient> -cc <recipient> -cc <recipient> -s <smtp-server> -o tls=auto -xu <smtp-user> -xp <smtp-password>"

# rotate the previous RAID snapshot and take a new one
if [ -f ${HEALTHFILE}.1 ] ; then /bin/rm -f ${HEALTHFILE}.1 ; fi
if [ -f ${HEALTHFILE}.0 ] ; then /bin/mv ${HEALTHFILE}.0 ${HEALTHFILE}.1 ; else /usr/bin/touch ${HEALTHFILE}.1 ; fi
/bin/cat /proc/mdstat > ${HEALTHFILE}.0
# mail only if the RAID status differs from the last run (diff exits 1 on differences)
/usr/bin/diff ${HEALTHFILE}.0 ${HEALTHFILE}.1 > /dev/null
case "$?" in
   0)
     #
   ;;
   1)
     ${SENDEMAILCOMMAND} -u "RAID status" < ${HEALTHFILE}.0
   ;;
esac

# same game for the SMART reallocated sector counts and drive error logs
HEALTHFILE="/var/log/healthcheck.smartdtl.realloc-sector-count"
if [ -f ${HEALTHFILE}.1 ] ; then /bin/rm -f ${HEALTHFILE}.1 ; fi
if [ -f ${HEALTHFILE}.0 ] ; then /bin/mv ${HEALTHFILE}.0 ${HEALTHFILE}.1 ; else /usr/bin/touch ${HEALTHFILE}.1 ; fi
echo "SMART shot info:"> ${HEALTHFILE}.0
for X in ${HARDDRIVES} ; do
   /bin/echo "${X}">> ${HEALTHFILE}.0
   /usr/local/sbin/smartctl --all ${X} | /bin/grep -i Reallocated_Sector_Ct >> ${HEALTHFILE}.0
done
/bin/echo "------------------------------------------------------------------------">> ${HEALTHFILE}.0
/bin/echo "Error Log from drives">> ${HEALTHFILE}.0
for X in ${HARDDRIVES} ; do
   /bin/echo "${X}">> ${HEALTHFILE}.0
   /usr/local/sbin/smartctl --all ${X} | /bin/grep -i -A 999 "SMART Error Log" | grep -v "without error" >> ${HEALTHFILE}.0
   /bin/echo "------------------------------------------------------------------------">> ${HEALTHFILE}.0
done
/usr/bin/diff ${HEALTHFILE}.0 ${HEALTHFILE}.1 > /dev/null
case "$?" in
   0)
     #
   ;;
   1)
     ${SENDEMAILCOMMAND} -u "SMART Status, Reallocated Sector Count" < ${HEALTHFILE}.0
   ;;
esac
>> But why not check parity during normal read operation? Was that a
>> performance decision?
>>      
> I don't know, but I do think it would hurt performance considerably.
>    
If http://www.accs.com/p_and_p/RAID/LinuxRAID.html is still current 
info: it will hurt performance due to the default "left-symmetric" layout, 
but I expect the real-world difference to be small.
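
To see which layout an array actually uses (assuming /dev/md0, just as 
an example):

/sbin/mdadm --detail /dev/md0 | /bin/grep -i layout     # e.g. "Layout : left-symmetric"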

>> It is not _that_ bad not doing it during normal
>> operation since the good dists schedule a regular check, but can it be
>> controlled by something like echo "1">
>> /proc/sys/dev/raid/always_read_parity ?
>>      
> Well, I think making an optional check would be fine.
> I dont know if it could be done in a non-performance hurting way, such
> as being deleyed or running at a lower IO priority.
>    
I doubt delaying would help the performance; in the asymmetric layouts it 
is the fifth HD doing a read, in the symmetric layouts the 
next chunk to read is directly after the parity chunk.

kind regards,

Joachim Otahal



* Re: md devices: Suggestion for in place time and checksum within the RAID
  2010-03-14 13:03         ` Keld Simonsen
  2010-03-14 14:00           ` Joachim Otahal
@ 2010-03-15 21:28           ` Joachim Otahal
  1 sibling, 0 replies; 8+ messages in thread
From: Joachim Otahal @ 2010-03-15 21:28 UTC (permalink / raw)
  To: Keld Simonsen; +Cc: Bill Davidsen, linux-raid

Keld Simonsen schrieb:
> Interesting! I would like to see your scripts....
>    
I did not realize how OLD that script was until I saw it today; I could 
not leave it that way. Here is a revised and less embarrassing version, 
easy to extend so it bangs you with emails on every RAID error too, but 
South Park is on TV now:

#!/bin/sh
# Mail the current /proc/mdstat and SMART summary whenever they differ
# from the previous run.
HEALTHFILE="/tmp/healthcheck.mdstat"
HARDDRIVES="/dev/sda /dev/sdb /dev/sdc /dev/sdd"
SENDEMAILCOMMAND="/usr/local/sbin/sendEmail.pl <commandline-here>"

# rotate the previous RAID snapshot, take a new one, mail on any change
[ -f ${HEALTHFILE}.1 ] && /bin/rm -f ${HEALTHFILE}.1
[ -f ${HEALTHFILE}.0 ] && /bin/mv ${HEALTHFILE}.0 ${HEALTHFILE}.1
/usr/bin/touch ${HEALTHFILE}.1
/bin/cat /proc/mdstat > ${HEALTHFILE}.0
/usr/bin/diff ${HEALTHFILE}.0 ${HEALTHFILE}.1 > /dev/null
if [ $? = 1 ] ; then
   ${SENDEMAILCOMMAND} -u "RAID Status" < ${HEALTHFILE}.0
fi

# same game for the SMART info
HEALTHFILE="/var/log/healthcheck.smartctl"
[ -f ${HEALTHFILE}.1 ] && /bin/rm -f ${HEALTHFILE}.1
[ -f ${HEALTHFILE}.0 ] && /bin/mv ${HEALTHFILE}.0 ${HEALTHFILE}.1
/usr/bin/touch ${HEALTHFILE}.1
echo "SMART info:"> ${HEALTHFILE}.0
EMAILSUBJECT="SMART Status, Reallocated Sector Count"
for X in ${HARDDRIVES} ; do
   # reallocated sector count per drive
   Y="`/usr/local/sbin/smartctl --all ${X} | /bin/grep -i Reallocated_Sector_Ct`"
   if [ "${Y}" != "" ] ; then
     /bin/echo "${X}     ${Y}">> ${HEALTHFILE}.0
     if [ "`/usr/local/sbin/smartctl --all ${X} | /bin/grep -o 'No Errors Logged'`" = "No Errors Logged" ] ; then
       /bin/echo "${X}    No Errors Logged">> ${HEALTHFILE}.0
     else
       # the drive has logged errors: force a mail and append its error log
       EMAILSUBJECT="SMART ERRORS LOGGED, Reallocated Sector Count"
       [ -f ${HEALTHFILE}.1 ] && /bin/rm -f ${HEALTHFILE}.1
       /usr/bin/touch ${HEALTHFILE}.1
       /bin/echo "------------------------------------------------------------------------">> ${HEALTHFILE}.0
       /bin/echo "${X}">> ${HEALTHFILE}.0
       /usr/local/sbin/smartctl --all ${X} | /bin/grep -i -A 999 "SMART Error Log" >> ${HEALTHFILE}.0
       /bin/echo "------------------------------------------------------------------------">> ${HEALTHFILE}.0
     fi
   fi
done
/usr/bin/diff ${HEALTHFILE}.0 ${HEALTHFILE}.1 > /dev/null
if [ $? = 1 ] ; then
   ${SENDEMAILCOMMAND} -u "${EMAILSUBJECT}" < ${HEALTHFILE}.0
fi

regards,

Joachim Otahal

