From: Phillip Susi <psusi@ubuntu.com>
To: Duncan <1i5t5.duncan@cox.net>, linux-btrfs@vger.kernel.org
Subject: Re: scrub implies failing drive - smartctl blissfully unaware
Date: Wed, 19 Nov 2014 11:07:43 -0500
Message-ID: <546CC04F.6040207@ubuntu.com>
In-Reply-To: <pan$7c051$a7c25939$cafa59e$f63379c9@cox.net>

On 11/18/2014 9:46 PM, Duncan wrote:
> I'm not sure about normal operation, but certainly, many drives
> take longer than 30 seconds to stabilize after power-on, and I
> routinely see resets during this time.

As far as I have seen, typical drive spin-up time is on the order of
3-7 seconds.  Hell, I remember my pair of first generation Seagate
Cheetah 15,000 rpm drives seemed to take *forever* to spin up, and that
was still maybe only 15 seconds.  If a drive takes longer than 30
seconds, then there is something wrong with it.  I figure there is a
reason spin-up time is tracked by SMART: a long spin-up time is a sign
of a sick drive.
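
To check what SMART reports for spin-up time on a given drive,
something like the following rough sketch could be used.  It assumes
smartmontools is installed and that `smartctl -A` prints its usual
ten-column attribute table.  The function names are made up for
illustration, and the raw value is vendor-specific, so it is returned
as text and not interpreted.

```python
import subprocess


def parse_spin_up_time(smartctl_output):
    """Return (normalized_value, raw_value) for SMART attribute 3, or None.

    Assumes the ten-column attribute table printed by `smartctl -A`.
    """
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with the numeric ID; 3 is Spin_Up_Time.
        if len(fields) >= 10 and fields[0] == "3" and fields[1] == "Spin_Up_Time":
            # Column 4 is the normalized VALUE; column 10 onward is RAW_VALUE.
            return int(fields[3]), " ".join(fields[9:])
    return None


def spin_up_time(device):
    """Run `smartctl -A` on a device (hypothetical helper) and parse the result."""
    # smartctl uses its exit status as a bitmask, so a non-zero status
    # is not treated as an error here.
    result = subprocess.run(
        ["smartctl", "-A", device],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_spin_up_time(result.stdout)
```

Calling spin_up_time("/dev/sda"), typically as root, returns the
normalized and raw values, or None if the drive does not report
attribute 3 (many SSDs do not).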

> This doesn't happen on single-hardware-device block devices and 
> filesystems because in that case it's either up or down, if the
> device doesn't come up in time the resume simply fails entirely,
> instead of coming up with one or more devices there, but others
> missing as they didn't stabilize in time, as is unfortunately all
> too common in the multi-device scenario.

No, the resume doesn't "fail entirely".  The drive is reset, and the
IO request is retried, and by then it should succeed.

> I've seen this with both spinning rust and with SSDs, with mdraid
> and btrfs, with multiple mobos and device controllers, and with
> resume both from suspend to ram (if the machine powers down the
> storage devices in that case, as most modern ones do) and hibernate
> to permanent storage device, over several years worth of kernel
> series, so it's a reasonably widespread phenomenon, at least among
> consumer-level SATA devices.  (My experience doesn't extend to
> enterprise-raid-level devices or proper SCSI, etc, so I simply
> don't know, there.)

If you are restoring from hibernation, then the drives are already
spun up before the kernel is loaded.

> While two minutes is getting a bit long, I think it's still within
> normal range, and some devices definitely take over a minute enough
> of the time to be both noticeable and irritating.

It certainly is not normal for a drive to take that long to spin up.
IIRC, the 30 second timeout comes from the ATA specs which state that
it can take up to 30 seconds for a drive to spin up.

> That said, I SHOULD say I'd be far *MORE* irritated if the device
> simply pretended it was stable and started reading/writing data
> before it really had stabilized, particularly with SSDs where that
> sort of behavior has been observed and is known to put some devices
> at risk of complete scrambling of either media or firmware, beyond
> recovery at times.  That of course is the risk of going the other
> direction, and I'd a WHOLE lot rather have devices play it safe for
> another 30 seconds or so after they /think/ they're stable and be
> SURE, than pretend to be just fine when voltages have NOT
> stabilized yet and thus end up scrambling things irrecoverably.
> I've never had that happen here tho I've never stress-tested for
> it, only done normal operation, but I've seen testing reports where
> the testers DID make it happen surprisingly easily, to a surprising
> number of their test devices.

Power supply voltage is stable within milliseconds.  What takes HDDs
time to start up is mechanically bringing the spinning rust up to
speed.  On SSDs, I think you are confusing testing done on power
*cycling* (i.e. yanking the power cord in the middle of a write)
with startup.

> So, umm... I suspect the 2-minute default is 2 minutes due to
> power-up stabilizing issues, where two minutes is a reasonable
> compromise between failing the boot most of the time if the timeout
> is too low, and taking excessively long for very little further
> gain.

The default is 30 seconds, not 2 minutes.
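
The per-device command timeout is exposed through sysfs, so it is easy
to confirm.  A minimal sketch, assuming a SCSI/libata disk that shows
up under /sys/block; the helper name and the sysfs_root parameter are
hypothetical, added only so the function can be pointed at a test
directory.

```python
from pathlib import Path


def command_timeout_seconds(device, sysfs_root="/sys/block"):
    """Return the command timeout in seconds for a block device.

    `device` is a kernel name such as "sda".  The value is read from
    <sysfs_root>/<device>/device/timeout, which holds a plain integer.
    """
    path = Path(sysfs_root) / device / "device" / "timeout"
    return int(path.read_text().strip())
```

On a system where nobody has changed it, command_timeout_seconds("sda")
should normally return 30.  Writing a new number to the same file as
root changes the timeout for that device until the next boot.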

> sure whether it's even possible, without some specific hardware
> feature available to tell the kernel that it has in fact NOT been
> in power-saving mode for say 5-10 minutes, hopefully long enough
> that voltage readings really /are/ fully stabilized and a shorter
> timeout is possible.

Again, there is no several-minute period where voltage stabilizes and
the drive takes longer to access.  This is a complete red herring.

