public inbox for linux-ide@vger.kernel.org
 help / color / mirror / Atom feed
From: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
To: Martin Steigerwald <martin@lichtvoll.de>
Cc: Tejun Heo <tj@kernel.org>,
	linux-kernel@vger.kernel.org, linux-scsi@vger.kernel.org,
	linux-ide@vger.kernel.org, Hans de Goede <hdegoede@redhat.com>
Subject: Re: Race to power off harming SATA SSDs
Date: Tue, 11 Apr 2017 11:31:29 -0300	[thread overview]
Message-ID: <20170411143129.GA28632@khazad-dum.debian.net> (raw)
In-Reply-To: <3231980.BbEtxjAFS5@merkaba>

On Tue, 11 Apr 2017, Martin Steigerwald wrote:
> I do have a Crucial M500 and I do have an increase of that counter:
> 
> martin@merkaba:~[…]/Crucial-M500> grep "^174" smartctl-a-201*   
> smartctl-a-2014-03-05.txt:174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    
> Old_age   Always       -       1
> smartctl-a-2014-10-11-nach-prüfsummenfehlern.txt:174 Unexpect_Power_Loss_Ct  
> 0x0032   100   100   000    Old_age   Always       -       67
> smartctl-a-2015-05-01.txt:174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    
> Old_age   Always       -       105
> smartctl-a-2016-02-06.txt:174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    
> Old_age   Always       -       148
> smartctl-a-2016-07-08-unreadable-sector.txt:174 Unexpect_Power_Loss_Ct  0x0032   
> 100   100   000    Old_age   Always       -       201
> smartctl-a-2017-04-11.txt:174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    
> Old_age   Always       -       272
> 
> 
> I mostly didn´t notice anything, except for one time where I indeed had a 
> BTRFS checksum error, luckily within a BTRFS RAID 1 with an Intel SSD (which 
> also has an attribute for unclean shutdown which raises).

The Crucial M500 has something called "RAIN" which it got unmodified
from its Micron datacenter siblings of the time, along with a large
amount of flash overprovisioning.  Too bad it lost the overprovisioned
supercapacitor bank present on the Microns.

RAIN does block-level N+1 RAID5-like parity across the flash chips on
top of the usual block-based ECC, and the SSD has a background scrubber
task that repairs and blocks that fail ECC correction using the RAIN
parity information.

On such an SSD, you really need multi-chip flash corruption beyond what
ECC can fix to even get the operating system/filesystem to notice any
damage, unless you are paying attention to its SMART attributes (it
counts the number of blocks that required RAIN recovery -- which implies
ECC failed to correct that block in the first place), etc.

Unfortunately, I do not have correlation data to know whether there is
an increase on RAIN-corrected or ECC-corrected blocks during the 24h
after an unclean poweroff right after STANDBY IMMEDIATE on a Crucial
M500 SSD.

> The write-up Henrique gave me the idea, that maybe it wasn´t an user triggered 
> unclean shutdown that caused the issue, but an unclean shutdown triggered by 
> the Linux kernel SSD shutdown procedure implementation.

Maybe.  But that corruption could easily having been caused by something
else.  There is no shortage of possible culprits.

I expect most damage caused by unclean SSD power-offs to be hidden from
the user/operating system/filesystem by the extensive recovery
facilities present on most SSDs.

Note that the fact that data was transparently (and sucessfully)
recovered doesn't mean damage did not happen, or that the unit was not
harmed by it: it likely got some extra flash wear at the very least.

BTW, for the record, Windows 7 also appears to have had (and maybe still
have) this issue as far as I can tell.  Almost every user report of
excessive unclean power off alerts (and also of SSD bricking) to be
found on SSD vendor forums come from Windows users.

-- 
  Henrique Holschuh

  reply	other threads:[~2017-04-11 14:31 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-04-10 23:21 Race to power off harming SATA SSDs Henrique de Moraes Holschuh
2017-04-10 23:34 ` Bart Van Assche
2017-04-10 23:50   ` Henrique de Moraes Holschuh
2017-04-10 23:49 ` sd: wait for slow devices on shutdown path Henrique de Moraes Holschuh
2017-04-10 23:52 ` Race to power off harming SATA SSDs Tejun Heo
2017-04-10 23:57   ` James Bottomley
2017-04-11  2:02     ` Henrique de Moraes Holschuh
2017-04-11  1:26   ` Henrique de Moraes Holschuh
2017-04-11 10:37   ` Martin Steigerwald
2017-04-11 14:31     ` Henrique de Moraes Holschuh [this message]
2017-04-12  7:47       ` Martin Steigerwald
2017-05-07 20:40   ` Pavel Machek
2017-05-08  7:21     ` David Woodhouse
2017-05-08  7:38       ` Ricard Wanderlof
2017-05-08  8:13         ` David Woodhouse
2017-05-08  8:36           ` Ricard Wanderlof
2017-05-08  8:54             ` David Woodhouse
2017-05-08  9:06               ` Ricard Wanderlof
2017-05-08  9:09                 ` Hans de Goede
2017-05-08 10:13                   ` David Woodhouse
2017-05-08 11:50                     ` Boris Brezillon
2017-05-08 15:40                       ` David Woodhouse
2017-05-08 21:36                         ` Pavel Machek
2017-05-08 16:43                       ` Pavel Machek
2017-05-08 17:43                         ` Tejun Heo
2017-05-08 18:56                           ` Pavel Machek
2017-05-08 19:04                             ` Tejun Heo
2017-05-08 18:29                         ` Atlant Schmidt
2017-05-08 10:12                 ` David Woodhouse
2017-05-08  9:28       ` Pavel Machek
2017-05-08  9:34         ` David Woodhouse
2017-05-08 10:49           ` Pavel Machek
2017-05-08 11:06             ` Richard Weinberger
2017-05-08 11:48               ` Boris Brezillon
2017-05-08 11:55                 ` Boris Brezillon
2017-05-08 12:13                 ` Richard Weinberger
2017-05-08 11:09             ` David Woodhouse
2017-05-08 12:32               ` Pavel Machek
2017-05-08  9:51         ` Richard Weinberger

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170411143129.GA28632@khazad-dum.debian.net \
    --to=hmh@hmh.eng.br \
    --cc=hdegoede@redhat.com \
    --cc=linux-ide@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=martin@lichtvoll.de \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox