Re: scrub implies failing drive - smartctl blissfully unaware

From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: scrub implies failing drive - smartctl blissfully unaware
Date: Wed, 19 Nov 2014 02:46:29 +0000 (UTC)
Message-ID: <pan$7c051$a7c25939$cafa59e$f63379c9@cox.net>
In-Reply-To: 546BB2EA.5080809@ubuntu.com

Phillip Susi posted on Tue, 18 Nov 2014 15:58:18 -0500 as excerpted:

> Are there really any that take longer than 30 seconds?  That's enough
> time for thousands of retries.  If it can't be read after a dozen tries,
> it ain't never gonna work.  It seems absurd that a drive would keep
> trying for so long.

I'm not sure about normal operation, but certainly, many drives take 
longer than 30 seconds to stabilize after power-on, and I routinely see 
resets during this time.

In fact, as I recently posted, power-up stabilization time can and often 
does break reliable resume of multi-device setups (my experience is with 
mdraid and btrfs raid) from suspend to RAM, from hibernate to disk, or 
both.  Often enough, one device takes enough longer to stabilize than the 
others that it gets failed out of the raid.

This doesn't happen with single-hardware-device block devices and 
filesystems, because there it's all or nothing: if the device doesn't 
come up in time, the resume simply fails entirely.  In the multi-device 
scenario, by contrast, it's unfortunately all too common for the system 
to come up with one or more devices present but others missing because 
they didn't stabilize in time.
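
For the mdraid case, one way to notice after a resume that a member got failed out is to look at the "[expected/active]" counter that md prints per array in /proc/mdstat.  What follows is only a minimal Python sketch of that check, not anything the kernel or mdadm ships; the function name is made up for illustration, and it assumes the usual mdstat layout where the counter appears on the status line following the "mdN : ..." line.

```python
import re

# The per-array status line carries a "[expected/active]" counter, e.g. "[2/1]".
_COUNTS = re.compile(r"\[(\d+)/(\d+)\]")
# Array header lines look like "md0 : active raid1 sdb1[1] sda1[0]".
_ARRAY = re.compile(r"^(md\S*)\s*:")


def degraded_arrays(mdstat_text):
    """Return the names of md arrays with fewer active than expected members."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = _ARRAY.match(line)
        if header:
            current = header.group(1)
            continue
        counts = _COUNTS.search(line)
        if counts and current is not None:
            expected, active = int(counts.group(1)), int(counts.group(2))
            if active < expected:
                degraded.append(current)
            # Only the first counter after a header belongs to that array.
            current = None
    return degraded
```

Feeding it the contents of /proc/mdstat right after resume would list any arrays that came back short a device.  It says nothing about btrfs raid, which would need a different check.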

I've seen this with both spinning rust and SSDs, with mdraid and btrfs, 
with multiple mobos and device controllers, and with resume both from 
suspend to RAM (if the machine powers down the storage devices in that 
case, as most modern ones do) and from hibernate to permanent storage, 
over several years' worth of kernel series.  So it's a reasonably 
widespread phenomenon, at least among consumer-level SATA devices.  (My 
experience doesn't extend to enterprise-raid-level devices or proper 
SCSI, etc, so I simply don't know, there.)

While two minutes is getting a bit long, I think it's still within normal 
range, and some devices definitely take over a minute enough of the time 
to be both noticeable and irritating.

That said, I SHOULD say I'd be far *MORE* irritated if the device simply 
pretended it was stable and started reading/writing data before it really 
had stabilized.  That's particularly true of SSDs, where that sort of 
behavior has been observed and is known to put some devices at risk of 
complete scrambling of either media or firmware, at times beyond 
recovery.  That of course is the risk of going the other direction, and 
I'd a WHOLE lot rather have devices play it safe for another 30 seconds 
or so after they /think/ they're stable and be SURE, than pretend to be 
just fine when voltages have NOT stabilized yet and thus end up 
scrambling things irrecoverably.  I've never had that happen here, though 
I've never stress-tested for it, only done normal operation.  But I've 
seen testing reports where the testers DID make it happen surprisingly 
easily, on a surprising number of their test devices.

So, umm... I suspect the 2-minute default is 2 minutes due to power-up 
stabilizing issues, where two minutes is a reasonable compromise between 
failing the boot most of the time if the timeout is too low, and taking 
excessively long for very little further gain.
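
For anyone who wants to check what their own system is actually using rather than assume a default, here is a minimal Python sketch.  It assumes the timeout at issue is the Linux per-device SCSI command timer, which sysfs exposes in seconds as /sys/block/<dev>/device/timeout; whether that is the same timeout discussed above is my assumption.  The function name and the sysfs_root parameter are made up for illustration.

```python
from pathlib import Path


def command_timeouts(sysfs_root="/sys/block"):
    """Map block device name -> command timeout in seconds.

    Devices that have no device/timeout attribute (loop, ram, ...) are skipped.
    """
    timeouts = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return timeouts
    for dev in sorted(root.iterdir()):
        attr = dev / "device" / "timeout"
        try:
            timeouts[dev.name] = int(attr.read_text().strip())
        except (OSError, ValueError):
            # Attribute missing, unreadable, or not an integer: skip it.
            continue
    return timeouts
```

Calling command_timeouts() with no arguments on a Linux box should give one entry per SCSI/SATA disk; the sysfs_root parameter is only there so the function can be pointed at a test directory.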

And in my experience, the only way around that, at the consumer level at 
least, would be to split the timeouts: set the power-on timeout even 
higher, perhaps 2.5-3 minutes, while lowering the operational timeout to 
something more sane, probably 30 seconds or so by default, but easily 
tunable down to 10-20 seconds (or even lower, say 5 seconds, even for 
consumer-level devices?) for those whose hardware fits within that 
tolerance and who want the performance.  But at least to my knowledge, 
there's no such split in reset timeout values available (maybe for 
SCSI?).  And due to auto-spindown and power-saving, I'm not sure whether 
it's even possible without some specific hardware feature to tell the 
kernel that the device has in fact NOT been in power-saving mode for, 
say, 5-10 minutes, hopefully long enough that voltage readings really 
/are/ fully stabilized and a shorter timeout is possible.
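
To make the split idea concrete, here is a purely illustrative Python sketch of the policy, not an existing kernel knob.  The function and parameter names are hypothetical; the defaults (180 seconds at power-on, 30 seconds operational, a 10-minute settle window) are just the figures floated above.  The input it needs, time since the device last powered up or left power-saving, is exactly the information I'm not sure the kernel can get.

```python
def pick_timeout(secs_since_power_event,
                 power_on_timeout=180,
                 operational_timeout=30,
                 settle_window=600):
    """Return the timeout (seconds) to use under a hypothetical split policy.

    secs_since_power_event: seconds since the device last powered on,
    resumed, or spun up.  None means "unknown", which is treated as
    not yet settled, so the long timeout applies.
    """
    if secs_since_power_event is None or secs_since_power_event < settle_window:
        # Device may still be stabilizing: be patient.
        return power_on_timeout
    # Device has been up long enough: fail fast on real errors.
    return operational_timeout
```

Someone with fast hardware could pass operational_timeout=10 or lower; the power-on value stays long either way, which is the whole point of the split.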

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
