* SMART, RAID and real world experience of failures.
@ 2012-01-05 23:53 Steven Haigh
2012-01-06 0:42 ` Roman Mamedov
2012-01-06 11:22 ` Peter Grandi
0 siblings, 2 replies; 9+ messages in thread
From: Steven Haigh @ 2012-01-05 23:53 UTC (permalink / raw)
To: linux-raid
Hi all,
Extremely long-time listener but very infrequent poster.
I got a SMART error email yesterday from my home server with a 4 x 1Tb
RAID6. It basically boiled down to:
The following warning/error was logged by the smartd daemon:
Device: /dev/sdd [SAT], 1 Currently unreadable (pending) sectors
Device: /dev/sdd [SAT], 1 Offline uncorrectable sectors
This got me wondering so I ran a long test (smartctl -t long /dev/sdd)
and sure enough, after an hour or so I got this:
# 2  Extended offline    Completed: read failure       50%     17465         1172842872
So, in the spirit of experimentation, I did the following:
# mdadm /dev/md2 --manage --fail /dev/sdd
# mdadm /dev/md2 --manage --remove /dev/sdd
# dd if=/dev/zero of=/dev/sdd bs=10M
# mdadm /dev/md2 --manage --add /dev/sdd
< a resync occurred here, afterwards >
# smartctl -t long /dev/sdd
< long wait >
# smartctl -a /dev/sdd
This is where it gets interesting. Although it originally logged an
error, I now see the following (with lots of other info trimmed):
ID#  ATTRIBUTE_NAME          FLAG    VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED  RAW_VALUE
  4  Start_Stop_Count        0x0032  100   100   020    Old_age   Always   -            154
  5  Reallocated_Sector_Ct   0x0033  100   100   036    Pre-fail  Always   -            0
  9  Power_On_Hours          0x0032  081   081   000    Old_age   Always   -            17493
 12  Power_Cycle_Count       0x0032  100   100   020    Old_age   Always   -            77
197  Current_Pending_Sector  0x0012  100   100   000    Old_age   Always   -            0
198  Offline_Uncorrectable   0x0010  100   100   000    Old_age   Offline  -            0
Then even more interesting:
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error        00%            17489  -
# 2  Extended offline    Completed: read failure        50%            17465  1172842872
This makes me ponder. Has the drive recovered? Has the sector with the
read failure been remapped and hidden from view? Is it still (more?)
likely to fail in the near future?
--
Steven Haigh
Email: netwiz@crc.id.au
Web: http://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Fax: (03) 8338 0299
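A narrower way to probe the single LBA reported by the self-test, as a rough sketch rather than a recommendation (1172842872 is the LBA from the log above, /dev/sdd as in this thread; the write destroys the contents of that one sector):

  # Read the suspect sector directly, bypassing the block layer; an I/O
  # error here means the drive still cannot return its data.
  hdparm --read-sector 1172842872 /dev/sdd

  # Writing the sector forces the drive to either refresh it in place or
  # remap it to a spare. This destroys the data stored in that sector.
  hdparm --write-sector 1172842872 --yes-i-know-what-i-am-doing /dev/sdd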
* Re: SMART, RAID and real world experience of failures.
2012-01-05 23:53 SMART, RAID and real world experience of failures Steven Haigh
@ 2012-01-06 0:42 ` Roman Mamedov
2012-01-06 11:22 ` Peter Grandi
1 sibling, 0 replies; 9+ messages in thread
From: Roman Mamedov @ 2012-01-06 0:42 UTC (permalink / raw)
To: Steven Haigh; +Cc: linux-raid

On Fri, 06 Jan 2012 10:53:44 +1100
Steven Haigh <netwiz@crc.id.au> wrote:

> So, in the spirit of experimentation, I did the following:
> # mdadm /dev/md2 --manage --fail /dev/sdd
> # mdadm /dev/md2 --manage --remove /dev/sdd
> # dd if=/dev/zero of=/dev/sdd bs=10M
> # mdadm /dev/md2 --manage --add /dev/sdd

Congratulations on doing absolutely the right thing, IMO :)

> Has the sector with the read failure been remapped and hidden from view?

More likely the drive decided the sector wasn't bad after all. E.g. by
overwriting the sector, and thus allowing the drive to throw away the
old user data in it, you have also let the drive rewrite the ECC or
system area of that sector, or whatever they have today in hard drives
at the low level.

If it had indeed been remapped, the logic would also require an
increase in Reallocated_Sector_Ct; but SMART attribute readings do have
a lot of peculiarities, many of which are also vendor-specific.

> Is it still (more?) likely to fail in the near future?

In my experience not at all, unless you start getting stuff like this
every week.

-- 
With respect,
Roman

~~~~~~~~~~~~~~~~~~~~~~~~~~~
"Stallman had a printer,
with code he could not see.
So he began to tinker,
and set the software free."
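A minimal sketch of how to keep an eye on the counters mentioned here, assuming smartmontools is installed (/dev/sdd is only an example device name):

  #!/bin/bash
  DEV=${1:-/dev/sdd}

  # -A prints the vendor attribute table; grep the three counters that
  # matter for the "was it remapped or just refreshed?" question.
  smartctl -A "$DEV" | grep -E \
      'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable'

  # The self-test log shows whether the last long test passed and, if
  # not, the LBA of the first read failure.
  smartctl -l selftest "$DEV"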
* Re: SMART, RAID and real world experience of failures.
2012-01-05 23:53 SMART, RAID and real world experience of failures Steven Haigh
2012-01-06 0:42 ` Roman Mamedov
@ 2012-01-06 11:22 ` Peter Grandi
2012-01-06 11:40 ` Steven Haigh
1 sibling, 1 reply; 9+ messages in thread
From: Peter Grandi @ 2012-01-06 11:22 UTC (permalink / raw)
To: Linux RAID

[ ... ]

> I got a SMART error email yesterday from my home server with a
> 4 x 1Tb RAID6. [ ... ]

That's an (euphemism alert) imaginative setup. Why not a 4-drive
RAID10? In general there are vanishingly few cases in which RAID6
makes sense, and in the 4-drive case a RAID10 makes even more sense
than usual, especially with the really cool setup options that MD
RAID10 offers.

[ ... ]

>> dd if=/dev/zero of=/dev/sdd bs=10M
>> mdadm /dev/md2 --manage --add /dev/sdd

> < a resync occurred here, afterwards >

That means that *two* write-overs have been done, one by way of 'dd'
and one by way of the resync. Usually, if one wants to
"refresh"/"retest" a drive, a SECURITY ERASE UNIT is possibly more
suitable than a zero-over:

  http://www.sabi.co.uk/blog/12-one.html#120104c

but before the security erase was available, the zero-over was the
usual choice.

> This makes me ponder. Has the drive recovered? Has the sector
> with the read failure been remapped and hidden from view? Is
> it still (more?) likely to fail in the near future?

Uhmmm, slightly naive questions. A 1TB drive has almost 2 billion
sectors, so "bad" sectors should be common.

But the main point is that what counts as a "bad" sector is a messy
story, and most "bad" sectors are really marginal (and an argument can
be made that most sectors are marginal, or else PRML encoding would
not be necessary). So many things can go wrong, and not all fatally.
For example, when writing some "bad" sectors the drive was vibrating a
bit more and the head was accordingly a little bit off, etc.

Writing over some marginal sectors often refreshes the recording, so
that they are no longer marginal; otherwise, as you guessed, the drive
can substitute the sector with a spare (something that it cannot
really do on reading, of course).
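A minimal sketch of the security-erase approach mentioned here, using hdparm; the device name and password are placeholders, the drive must not be "frozen" or behind a controller that blocks ATA security commands, and the erase destroys all data on the drive:

  DEV=/dev/sdX

  # Confirm the drive supports the security feature set and is not frozen.
  hdparm -I "$DEV" | grep -A8 "^Security:"

  # Set a temporary user password (required before an erase can be issued).
  hdparm --user-master u --security-set-pass Eins "$DEV"

  # Issue SECURITY ERASE UNIT; the drive itself rewrites every sector.
  hdparm --user-master u --security-erase Eins "$DEV"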
* Re: SMART, RAID and real world experience of failures.
2012-01-06 11:22 ` Peter Grandi
@ 2012-01-06 11:40 ` Steven Haigh
2012-01-06 13:38 ` Phil Turmel
2012-01-09 13:59 ` Peter Grandi
0 siblings, 2 replies; 9+ messages in thread
From: Steven Haigh @ 2012-01-06 11:40 UTC (permalink / raw)
To: Linux RAID

On 6/01/2012 10:22 PM, Peter Grandi wrote:
> [ ... ]
>
>> I got a SMART error email yesterday from my home server with a
>> 4 x 1Tb RAID6. [ ... ]
>
> That's an (euphemism alert) imaginative setup. Why not a 4 drive
> RAID10? In general there are vanishingly few cases in which
> RAID6 makes sense, and in the 4 drive case a RAID10 makes even
> more sense than usual. Especially with the really cool setup
> options that MD RAID10 offers.

The main reason is the easy ability to grow the RAID6 by an extra
drive when I need the space. I've just about allocated all of the
array to various VMs and file storage. Once that's full, it's easier
to add another 1Tb drive, grow the RAID, grow the PV and then either
add more LVs or grow the ones that need it. Sadly, I don't have the
cash flow to just replace the 1Tb drives with 2Tb drives, or whatever
the flavour of the month is, after 2 years.

>> This makes me ponder. Has the drive recovered? Has the sector
>> with the read failure been remapped and hidden from view? Is
>> it still (more?) likely to fail in the near future?
>
> Uhmmm, slightly naive questions. A 1TB drive has almost 2
> billion sectors, so "bad" sectors should be common.
>
> But the main point is that what counts as a "bad" sector is a
> messy story, and most "bad" sectors are really marginal (and an
> argument can be made that most sectors are marginal, or else
> PRML encoding would not be necessary). So many things can go
> wrong, and not all fatally. For example, when writing some "bad"
> sectors the drive was vibrating a bit more and the head was
> accordingly a little bit off, etc.
>
> Writing over some marginal sectors often refreshes the
> recording, so that they are no longer marginal; otherwise, as
> you guessed, the drive can substitute the sector with a spare
> (something that it cannot really do on reading, of course).

This is what I was wondering... The drive has been running for about
1.9 years - pretty much 24/7. From checking the Seagate web site, it's
still under warranty until the end of 2012.

I guess the best thing to do is to monitor the drive as I have been
doing and see if this is a one-off or becomes a regular occurrence. My
system does a check of the RAID every week as part of the cron setup,
so I'd hope things like this get picked up before it starts losing any
redundancy.

-- 
Steven Haigh

Email: netwiz@crc.id.au
Web: http://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Fax: (03) 8338 0299
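A minimal sketch of the grow sequence described above, assuming LVM sits directly on /dev/md2 and the filesystem is ext4; the new drive, VG and LV names are examples only:

  # Add the new drive as a spare, then grow the array onto it
  # (older mdadm versions may also want a --backup-file here).
  mdadm /dev/md2 --manage --add /dev/sde
  mdadm --grow /dev/md2 --raid-devices=5

  # Wait for the reshape to finish before relying on the new capacity.
  cat /proc/mdstat

  # Tell LVM about the larger PV, then grow an LV and its filesystem.
  pvresize /dev/md2
  lvextend -L +500G /dev/vg0/storage
  resize2fs /dev/vg0/storage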
* Re: SMART, RAID and real world experience of failures.
2012-01-06 11:40 ` Steven Haigh
@ 2012-01-06 13:38 ` Phil Turmel
2012-01-09 14:50 ` Peter Grandi
1 sibling, 1 reply; 9+ messages in thread
From: Phil Turmel @ 2012-01-06 13:38 UTC (permalink / raw)
To: Steven Haigh; +Cc: Linux RAID

On 01/06/2012 06:40 AM, Steven Haigh wrote:
> On 6/01/2012 10:22 PM, Peter Grandi wrote:
>> [ ... ]
>>
>>> I got a SMART error email yesterday from my home server with a 4
>>> x 1Tb RAID6. [ ... ]
>>
>> That's an (euphemism alert) imaginative setup. Why not a 4 drive
>> RAID10? In general there are vanishingly few cases in which RAID6
>> makes sense, and in the 4 drive case a RAID10 makes even more sense
>> than usual. Especially with the really cool setup options that MD
>> RAID10 offers.

In this case, the raid6 can suffer the loss of any two drives and
continue operating. Raid10 cannot, unless you give up more space for
triple redundancy. Basic trade-off: speed vs. safety.

> The main reason is the easy ability to grow the RAID6 by an extra
> drive when I need the space. I've just about allocated all of the
> array to various VMs and file storage. Once that's full, it's easier
> to add another 1Tb drive, grow the RAID, grow the PV and then either
> add more LVs or grow the ones that need it. Sadly, I don't have the
> cash flow to just replace the 1Tb drives with 2Tb drives, or whatever
> the flavour of the month is, after 2 years.

Watch out for the change in SCTERC support. I replaced some 1T Seagate
desktop drives on a home server with newer 2T desktop drives, which
appeared to be Seagate's recommended successors to the older models.
But no SCTERC support. My fault for not rechecking the spec sheet, but
still, they shouldn't have been pitched as "successors", IMO. Some
Hitachi desktop drives still show SCTERC on their specs, but I fear
that'll change before I'm ready to spend again.

>>> This makes me ponder. Has the drive recovered? Has the sector
>>> with the read failure been remapped and hidden from view? Is it
>>> still (more?) likely to fail in the near future?
>>
>> Uhmmm, slightly naive questions. A 1TB drive has almost 2 billion
>> sectors, so "bad" sectors should be common.
>>
>> But the main point is that what counts as a "bad" sector is a messy
>> story, and most "bad" sectors are really marginal (and an argument
>> can be made that most sectors are marginal, or else PRML encoding
>> would not be necessary). So many things can go wrong, and not all
>> fatally. For example, when writing some "bad" sectors the drive was
>> vibrating a bit more and the head was accordingly a little bit off,
>> etc.
>>
>> Writing over some marginal sectors often refreshes the recording,
>> so that they are no longer marginal; otherwise, as you guessed, the
>> drive can substitute the sector with a spare (something that it
>> cannot really do on reading, of course).

Also keep in mind that modern hard drives have manufacturing defects
that are mapped out at the factory and excluded from all further use.
Those don't show up in the statistics.

> This is what I was wondering... The drive has been running for about
> 1.9 years - pretty much 24/7. From checking the Seagate web site,
> it's still under warranty until the end of 2012.
>
> I guess the best thing to do is to monitor the drive as I have been
> doing and see if this is a one-off or becomes a regular occurrence.
> My system does a check of the RAID every week as part of the cron
> setup, so I'd hope things like this get picked up before it starts
> losing any redundancy.

You might also want to perform a "repair" scrub the next time this kind
of problem appears ("check" is most appropriate in the cron job,
though). It might have saved you a bunch of time, while maintaining the
extra redundancy on the rest of the array.

Phil
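A minimal sketch of the check/repair scrub mechanism referred to here, via the md sysfs interface; md2 is the array name used in this thread, and the cron schedule is only an example:

  # Weekly consistency check from cron (e.g. in /etc/cron.d/raid-check);
  # "check" only reads and counts mismatches.
  # 0 3 * * 0  root  echo check > /sys/block/md2/md/sync_action

  # After a reported read error, a "repair" pass rewrites unreadable
  # blocks from the redundant data instead of failing the whole drive:
  echo repair > /sys/block/md2/md/sync_action

  # Watch progress and the mismatch count.
  cat /proc/mdstat
  cat /sys/block/md2/md/mismatch_cnt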
* Re: SMART, RAID and real world experience of failures.
2012-01-06 13:38 ` Phil Turmel
@ 2012-01-09 14:50 ` Peter Grandi
2012-01-09 16:37 ` Phil Turmel
2012-01-09 20:23 ` Peter Grandi
0 siblings, 2 replies; 9+ messages in thread
From: Peter Grandi @ 2012-01-09 14:50 UTC (permalink / raw)
To: Linux RAID

>>> I got a SMART error email yesterday from my home server with a 4
>>> x 1Tb RAID6. [ ... ]

>>> That's an (euphemism alert) imaginative setup. Why not a 4
>>> drive RAID10? In general there are vanishingly few cases in
>>> which RAID6 makes sense, and in the 4 drive case a RAID10
>>> makes even more sense than usual. Especially with the really
>>> cool setup options that MD RAID10 offers.

> In this case, the raid6 can suffer the loss of any two drives
> and continue operating. Raid10 cannot, unless you give up
> more space for triple redundancy.

When I see arguments like this I am sometimes (euphemism alert)
enthused by their (euphemism alert) profundity. A defense of a
4-drive RAID6 is a particularly compelling example, and this type of
(euphemism alert) astute observation even more so.

In my shallowness I had thought that one goal of redundant RAID
setups like RAID10 and RAID6 is to take advantage of redundancy to
deliver greater reliability, a statistical property, related also to
the expected probability (and correlation and cost) of failure modes,
not just to geometry. But even as to geometrical arguments, consider:

* While RAID6 can «suffer the loss of any two drives and continue
  operating», RAID10 can "suffer the loss of any number of non-paired
  drives and continue operating", which is not directly comparable,
  but is not necessarily a weaker property overall (it is weaker only
  in the paired case and much stronger in the non-paired case).

  This ''geometric'' property is of great advantage in engineering
  terms because it allows putting drives in two mostly uncorrelated
  sets, and lack of correlation is a very important property in
  statistical redundancy work. In practice this allows, for example,
  putting two shelves of drives in different racks, on different
  power supplies, on different host adapters, or even (with DRBD for
  example) on different computers on different networks.

But this is not the whole story; let's look at further probabilistic
aspects:

* the failure of any two paired drives is a lot less likely than that
  of any two non-paired drives;

* the failure of any two paired drives is even less likely than the
  failure of any single drive, which is by far the single biggest
  problem likely to happen (unless there are huge environmental
  common modes, outside "geometric" arguments);

* the failure of any two paired drives at the same time (outside
  rebuild) is probably less likely than the failure of any other RAID
  setup component, like the host bus adapter or the power supplies;

* as mentioned above, the biggest problem with redundancy is
  correlation, that is, common modes of failure, for example via
  environmental factors, and RAID10 affords simpletons like me the
  luxury of setting up two mostly uncorrelated sets, while RAID6
  (like all parity RAID) effectively tangles all drives together.

As to the latter, in my naive thoughts before being exposed to the
(euphemism alert) superior wisdom of the fact that «raid6 can suffer
the loss of any two drives and continue operating», I worried about
what happens *after* a failure, in particular about common modes:

* In the common case of the loss of a single drive, the only drive
  impacted in RAID10 is the paired one, and the rebuild involves a
  pretty simple linear mirroring, with very little extra activity.
  This means that both as to performance and as to environmental
  factors like extra vibration, heat and power draw, the impact is
  minimal. Similarly for the loss of any N non-paired drives. In all
  cases the duration of the vulnerable rebuild period is limited to
  drive duplication.

* For RAID6 the loss of one or two drives involves a massive
  whole-array activity surge, with a lot of read-write cycles on each
  drive (all drives must be both read and written), which both
  interferes hugely with array performance and may heavily impact
  vibration, heat and power draw levels, as usually the drives are
  contiguous (and the sort of people who like RAID6 tend to build
  arrays from identical units pulled from the same carton...).
  Because of the massive extra activity, the vulnerable rebuild
  period is greatly lengthened, and during this period all drives are
  subject to new and largely identical stresses, which may well
  greatly raise the probability of further failures, including the
  failure of more than one extra drive.

There are other little problems with parity RAID rebuilds, as
described on the BAARF.com site for example. But the above points
seemed a pretty huge deal to me, before I read the (euphemism alert)
imaginative geometric point that «raid6 can suffer the loss of any
two drives and continue operating», as if that were all that matters.

> Basic trade-off: speed vs. safety.

I was previously unable to imagine why one would want to trade off
much lower speed to achieve lower safety as well... :-)
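A minimal sketch of the kind of MD RAID10 layout being argued for, with mirror pairs deliberately split across two sets of drives; device names are examples, and with the 'near' layout adjacent devices in the list form a mirror pair:

  # sda/sdb on one controller or shelf, sdc/sdd on another.
  # Interleaving them pairs each mirror across the two sets, so losing
  # either whole set still leaves one copy of everything.
  mdadm --create /dev/md0 --level=10 --layout=n2 --raid-devices=4 \
        /dev/sda1 /dev/sdc1 /dev/sdb1 /dev/sdd1

  # The "far" layout is one of the setup options mentioned earlier:
  # still mirrored, but with near-RAID0 sequential read performance.
  # mdadm --create /dev/md0 --level=10 --layout=f2 --raid-devices=4 ...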
* Re: SMART, RAID and real world experience of failures.
2012-01-09 14:50 ` Peter Grandi
@ 2012-01-09 16:37 ` Phil Turmel
2012-01-09 20:23 ` Peter Grandi
1 sibling, 0 replies; 9+ messages in thread
From: Phil Turmel @ 2012-01-09 16:37 UTC (permalink / raw)
To: Peter Grandi; +Cc: Linux RAID

On 01/09/2012 09:50 AM, Peter Grandi wrote:
>>>> I got a SMART error email yesterday from my home server with a 4
>>>> x 1Tb RAID6. [ ... ]
>
>>>> That's an (euphemism alert) imaginative setup. Why not a 4
>>>> drive RAID10? In general there are vanishingly few cases in
>>>> which RAID6 makes sense, and in the 4 drive case a RAID10
>>>> makes even more sense than usual. Especially with the really
>>>> cool setup options that MD RAID10 offers.
>
>> In this case, the raid6 can suffer the loss of any two drives
>> and continue operating. Raid10 cannot, unless you give up
>> more space for triple redundancy.
>
> When I see arguments like this I am sometimes (euphemism alert)
> enthused by their (euphemism alert) profundity. A defense of a
> 4-drive RAID6 is a particularly compelling example, and this
> type of (euphemism alert) astute observation even more so.

<< plonk >>
* Re: SMART, RAID and real world experience of failures.
2012-01-09 14:50 ` Peter Grandi
2012-01-09 16:37 ` Phil Turmel
@ 2012-01-09 20:23 ` Peter Grandi
1 sibling, 0 replies; 9+ messages in thread
From: Peter Grandi @ 2012-01-09 20:23 UTC (permalink / raw)
To: Linux RAID

>>> [ ... ] In general there are vanishingly few cases in which
>>> RAID6 makes sense, and in the 4 drive case a RAID10 makes
>>> even more sense than usual. [ ... ]

>> In this case, the raid6 can suffer the loss of any two drives
>> and continue operating. Raid10 cannot, unless you give up
>> more space for triple redundancy. [ ... ]

[ ... ]

> * While RAID6 can «suffer the loss of any two drives and
>   continue operating», RAID10 can "suffer the loss of any
>   number of non-paired drives and continue operating", [ ... ]

Let's put behind us the (euphemism alert) entertaining
misunderestimation that RAID6 is safer than RAID10, even though a
(wide) RAID10 can take more than 2 non-paired failures while
continuing service, and has much more desirable rebuild profiles and
statistical risks, which is a bigger deal in practice.

But the possibly amusing detail here is that while I have a server at
home with 4 drives, I don't use RAID10 for that. What I use is
not-RAID: I have 2 "active" drives and 2 "backup" drives, and I
image-copy each partition in each pair during the night (using a
simple cron script), because:

* I don't particularly value the additional performance that RAID10
  would give for a "home" server.

* Data preservation by avoiding common modes, to maximize the
  advantages of redundancy, seems more important to me. I have 4
  drives of different makes and models, and I keep 2 separate
  "active" drives so that *maintenance* or a failure of one or the
  other does not fully terminate service (a minor concern though for
  my "home" server), and the 2 "backup" drives can be kept in sleep
  mode until the night-time copy, and have a completely different
  stress profile from the 2 "active" drives (and between them too).

  Unfortunately all 4 are in the same cage in the same PC box, and
  thus in the same environmental conditions and all 'online', and
  thus with undesirable common modes. Thus I also have 'offline'
  drives that I put in an external eSATA cradle every now and then to
  further copy the contents of the 2 active drives onto them, and
  then I put them on a shelf, in totally different environmental
  conditions.

* I could be using continuous RAID1 for the mirroring between
  "active" and "backup" drives, even for the additional eSATA offline
  mirrors, and I do that on some "work" servers. But I prefer the
  mirroring to be noncontinuous and periodic, to enjoy a sort of
  simple-minded versioning system (and to reduce stress on the
  "backup" drives, also allowing them to stay in low-power sleep mode
  for most of the day). I can then recover accidental deletions that
  are up to a day old from the "inline" nightly copies, or up to
  several days or weeks old from the "offline" copies.

* Since for my "home" server I don't really need continuous
  availability, if an "active" drive fails I just power down, swap
  cables with the "backup" drive and reboot, accepting the loss of up
  to a day of updates. But for most "work" servers (which have higher
  continuous availability targets) I use RAID1 for those that don't
  have high write transfer rate requirements, or RAID10 for those
  that do (very very very rarely a narrow 2+1 or even 3+1 RAID5 for
  mostly read-only content caches).

For mostly read-only, very multithreaded loads with small data sizes
I have occasionally used 3-4 drive RAID1s. For very "transient" data
areas with low availability requirements, even more rarely RAID0s
(never a 'concat'!).

Always using MD (without ever reshaping), virtually never hw RAID
(most hw RAID cards are amazingly buggy, as well as having
configuration oddities) and almost never DM/LVM2; that is, except for
DM's snapshots sometimes, and I am not yet using, but am very
interested in, COW filesystems that offer built-in snapshotting. And
as a rule I use JFS or XFS (looking sometimes wistfully at OCFS2) as
filesystems, and almost never grow any of their filetrees (I tend to
use 1 or at most 2 filetrees per disk or RAID set).
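A minimal sketch of the kind of nightly image-copy arrangement described above; the partition pairs and schedule are examples only, the target partitions must be at least as large as their sources, and neither side should be mounted read-write while the copy runs:

  #!/bin/bash
  # nightly-clone.sh - copy each "active" partition onto its "backup"
  # twin. Run from cron, e.g.:
  #   30 2 * * *  root  /usr/local/sbin/nightly-clone.sh
  set -e

  PAIRS="/dev/sda1:/dev/sdc1 /dev/sda2:/dev/sdc2 /dev/sdb1:/dev/sdd1"

  for pair in $PAIRS; do
      src=${pair%%:*}
      dst=${pair##*:}
      # Block-level copy; the backup drive only spins up for this window.
      dd if="$src" of="$dst" bs=16M conv=fsync
  done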
* Re: SMART, RAID and real world experience of failures.
2012-01-06 11:40 ` Steven Haigh
2012-01-06 13:38 ` Phil Turmel
@ 2012-01-09 13:59 ` Peter Grandi
1 sibling, 0 replies; 9+ messages in thread
From: Peter Grandi @ 2012-01-09 13:59 UTC (permalink / raw)
To: Linux RAID

>>> I got a SMART error email yesterday from my home server with
>>> a 4 x 1Tb RAID6. [ ... ]

>> That's an (euphemism alert) imaginative setup. Why not a 4
>> drive RAID10? [ ... ]

> The main reason is the easy ability to grow the RAID6 by an
> extra drive when I need the space. I've just about allocated
> all of the array to various VMs and file storage. Once that's
> full, it's easier to add another 1Tb drive, grow the RAID, grow
> the PV and then either add more LVs or grow the ones that need
> it.

That's indeed an (euphemism alert) audacious plan. It has all the
popular ideas that I consider very brave indeed: RAID6, LVM2, growing
arrays in place, growing filesystems in place, and of course VM
images on top (which, following the usual logic, I guess will be
sparse).

It has been my misfortune to see much the same audacious plan set up
by optimistic predecessors in a few workplaces, and then to have to
turn it into a less brave one in a hurry when it ended up as such
plans usually do. But at least in your case it is just a home setup.