From: Martin Steigerwald <martin@lichtvoll.de>
To: Tejun Heo <tj@kernel.org>
Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br>,
linux-kernel@vger.kernel.org, linux-scsi@vger.kernel.org,
linux-ide@vger.kernel.org, Hans de Goede <hdegoede@redhat.com>
Subject: Re: Race to power off harming SATA SSDs
Date: Tue, 11 Apr 2017 12:37:43 +0200 [thread overview]
Message-ID: <3231980.BbEtxjAFS5@merkaba> (raw)
In-Reply-To: <20170410235206.GA28603@wtj.duckdns.org>
On Tuesday, 11 April 2017, 08:52:06 CEST, Tejun Heo wrote:
> > Evidently, how often the SSD will lose the race depends on a platform
> > and SSD combination, and also on how often the system is powered off.
> > A sluggish firmware that takes its time to cut power can save the day...
> >
> >
> > Observing the effects:
> >
> > An unclean SSD power-off will be signaled by the SSD device through an
> > increase on a specific S.M.A.R.T attribute. These SMART attributes can
> > be read using the smartmontools package from www.smartmontools.org,
> > which should be available in just about every Linux distro.
> >
> > smartctl -A /dev/sd#
> >
> > The SMART attribute related to unclean power-off is vendor-specific, so
> > one might have to track down the SSD datasheet to know which attribute a
> > particular SSD uses. The naming of the attribute also varies.
> >
> > For a Crucial M500 SSD with up-to-date firmware, this would be attribute
> > 174 "Unexpect_Power_Loss_Ct", for example.
> >
> > NOTE: unclean SSD power-offs are dangerous and may brick the device in
> > the worst case, or otherwise harm it (reduce longevity, damage flash
> > blocks). It is also not impossible to get data corruption.
>
> I get that the incrementing counters might not be pretty but I'm a bit
> skeptical about this being an actual issue. Because if that were
> true, the device would be bricking itself from any sort of power
> loss, be that an actual power loss, battery rundown or hard power off
> after a crash.
The write-up by Henrique has been a very informative and interesting read for
me. I wondered about the same question though.
I do have a Crucial M500 and I do have an increase of that counter:
martin@merkaba:~[…]/Crucial-M500> grep "^174" smartctl-a-201*
smartctl-a-2014-03-05.txt:174 Unexpect_Power_Loss_Ct 0x0032 100 100 000 Old_age Always - 1
smartctl-a-2014-10-11-nach-prüfsummenfehlern.txt:174 Unexpect_Power_Loss_Ct 0x0032 100 100 000 Old_age Always - 67
smartctl-a-2015-05-01.txt:174 Unexpect_Power_Loss_Ct 0x0032 100 100 000 Old_age Always - 105
smartctl-a-2016-02-06.txt:174 Unexpect_Power_Loss_Ct 0x0032 100 100 000 Old_age Always - 148
smartctl-a-2016-07-08-unreadable-sector.txt:174 Unexpect_Power_Loss_Ct 0x0032 100 100 000 Old_age Always - 201
smartctl-a-2017-04-11.txt:174 Unexpect_Power_Loss_Ct 0x0032 100 100 000 Old_age Always - 272
I mostly didn't notice anything, except for one time where I indeed had a
BTRFS checksum error, luckily within a BTRFS RAID 1 with an Intel SSD (which
also has an attribute for unclean shutdowns that increases).
I blogged about this in German quite some time ago:
https://blog.teamix.de/2015/01/19/btrfs-raid-1-selbstheilung-in-aktion/
(I think it's easy enough to get the point of the blog post even without
understanding German.)
Result of scrub:
scrub started at Thu Oct 9 15:52:00 2014 and finished after 564 seconds
total bytes scrubbed: 268.36GiB with 60 errors
error details: csum=60
corrected errors: 60, uncorrectable errors: 0, unverified errors: 0
Device errors were on:
merkaba:~> btrfs device stats /home
[/dev/mapper/msata-home].write_io_errs 0
[/dev/mapper/msata-home].read_io_errs 0
[/dev/mapper/msata-home].flush_io_errs 0
[/dev/mapper/msata-home].corruption_errs 60
[/dev/mapper/msata-home].generation_errs 0
[…]
(that's the Crucial M500)
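(A quick way to spot such counters across devices is to parse the "btrfs device stats" output; a sketch, assuming only the "[device].counter value" line format shown above:)

```python
# Sketch: parse "btrfs device stats" output lines and report any
# non-zero error counters, e.g. corruption_errs as in my case above.
def nonzero_counters(stats_text):
    found = {}
    for line in stats_text.splitlines():
        parts = line.split()
        # Expect exactly "[<device>].<counter> <value>" per line.
        if len(parts) != 2 or "]." not in parts[0]:
            continue
        name = parts[0].split("].")[1]
        value = int(parts[1])
        if value:
            found[name] = value
    return found

sample = """\
[/dev/mapper/msata-home].write_io_errs 0
[/dev/mapper/msata-home].read_io_errs 0
[/dev/mapper/msata-home].flush_io_errs 0
[/dev/mapper/msata-home].corruption_errs 60
[/dev/mapper/msata-home].generation_errs 0
"""
print(nonzero_counters(sample))  # only corruption_errs is non-zero here
```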
I didn't have any explanation for this, but I suspected some unclean shutdown,
even though I remembered none. I take good care to always have a battery in
this ThinkPad T520, due to the unclean shutdown issue of the Intel SSD 320
(a bricked device that reports 8 MiB as its capacity, probably fixed by the
firmware update I applied back then).
The write-up by Henrique gave me the idea that maybe it wasn't a user-triggered
unclean shutdown that caused the issue, but an unclean shutdown triggered by
the Linux kernel's SSD shutdown procedure.
Of course, I don't know whether this is the case, and I think there is no way
to prove or falsify it years after it happened. I never saw this happen
again.
Thanks,
--
Martin
Thread overview: 39+ messages
2017-04-10 23:21 Race to power off harming SATA SSDs Henrique de Moraes Holschuh
2017-04-10 23:34 ` Bart Van Assche
2017-04-10 23:50 ` Henrique de Moraes Holschuh
2017-04-10 23:49 ` sd: wait for slow devices on shutdown path Henrique de Moraes Holschuh
2017-04-10 23:52 ` Race to power off harming SATA SSDs Tejun Heo
2017-04-10 23:57 ` James Bottomley
2017-04-11 2:02 ` Henrique de Moraes Holschuh
2017-04-11 1:26 ` Henrique de Moraes Holschuh
2017-04-11 10:37 ` Martin Steigerwald [this message]
2017-04-11 14:31 ` Henrique de Moraes Holschuh
2017-04-12 7:47 ` Martin Steigerwald
2017-05-07 20:40 ` Pavel Machek
2017-05-08 7:21 ` David Woodhouse
2017-05-08 7:38 ` Ricard Wanderlof
2017-05-08 8:13 ` David Woodhouse
2017-05-08 8:36 ` Ricard Wanderlof
2017-05-08 8:54 ` David Woodhouse
2017-05-08 9:06 ` Ricard Wanderlof
2017-05-08 9:09 ` Hans de Goede
2017-05-08 10:13 ` David Woodhouse
2017-05-08 11:50 ` Boris Brezillon
2017-05-08 15:40 ` David Woodhouse
2017-05-08 21:36 ` Pavel Machek
2017-05-08 16:43 ` Pavel Machek
2017-05-08 17:43 ` Tejun Heo
2017-05-08 18:56 ` Pavel Machek
2017-05-08 19:04 ` Tejun Heo
2017-05-08 18:29 ` Atlant Schmidt
2017-05-08 10:12 ` David Woodhouse
2017-05-08 9:28 ` Pavel Machek
2017-05-08 9:34 ` David Woodhouse
2017-05-08 10:49 ` Pavel Machek
2017-05-08 11:06 ` Richard Weinberger
2017-05-08 11:48 ` Boris Brezillon
2017-05-08 11:55 ` Boris Brezillon
2017-05-08 12:13 ` Richard Weinberger
2017-05-08 11:09 ` David Woodhouse
2017-05-08 12:32 ` Pavel Machek
2017-05-08 9:51 ` Richard Weinberger