Re: Experiences on BTRFS Dual SSD RAID 1 with outage of one SSD

From: Martin Steigerwald <martin@lichtvoll.de>
To: Roman Mamedov <rm@romanrm.net>
Cc: "Austin S. Hemmelgarn" <ahferroin7@gmail.com>,
	linux-btrfs@vger.kernel.org
Subject: Re: Experiences on BTRFS Dual SSD RAID 1 with outage of one SSD
Date: Sat, 18 Aug 2018 10:47:15 +0200
Message-ID: <3878092.uyhAA9QzPZ@merkaba>
In-Reply-To: <20180818121230.7339a49f@natsu>

Roman Mamedov - 18.08.18, 09:12:
> On Fri, 17 Aug 2018 23:17:33 +0200
> 
> Martin Steigerwald <martin@lichtvoll.de> wrote:
> > > Do not consider SSD "compression" as a factor in any of your
> > > calculations or planning. Modern controllers do not do it anymore,
> > > the last ones that did are SandForce, and that's 2010 era stuff.
> > > You can check for yourself by comparing write speeds of compressible
> > > vs incompressible data, it should be the same. At most, the modern
> > > ones know to recognize a stream of binary zeroes and have a special
> > > case for that.
> > 
> > Interesting. Do you have any backup for your claim?
> 
> Just "something I read". I follow quite a bit of SSD-related articles
> and reviews which often also include a section to talk about the
> controller utilized, its background and technological
> improvements/changes -- and the compression going out of fashion
> after SandForce seems to be considered a well-known fact.
> 
> Incidentally, your old Intel 320 SSDs actually seem to be based on
> that old SandForce controller (or at least license some of that IP to
> extend on it), and hence those indeed might perform compression.

Interesting. Back then I read that the Intel SSD 320 would not compress.
I think it's difficult to know for sure with those proprietary controllers.

> > As the data still needs to be transferred to the SSD, at least when
> > the SATA connection is maxed out, I bet you won't see any difference
> > in write speed whether the SSD compresses in real time or not.
> 
> Most controllers expose two readings in SMART:
> 
>   - Lifetime writes from host (SMART attribute 241)
>   - Lifetime writes to flash (attribute 233, or 177, or 173...)
>
> It might be difficult to get the second one, as often it needs to be
> decoded from others such as "Average block erase count" or "Wear
> leveling count". (And seems to be impossible on Samsung NVMe ones,
> for example)

I got the impression every manufacturer does their own thing here. And I
would not even be surprised if it differs between generations of SSDs
from the same manufacturer.

# Crucial mSATA

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   100   100   000    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       16345
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       4193
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
173 Wear_Leveling_Count     0x0032   078   078   000    Old_age   Always       -       663
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       362
180 Unused_Rsvd_Blk_Cnt_Tot 0x0033   000   000   000    Pre-fail  Always       -       8219
183 SATA_Iface_Downshift    0x0032   100   100   000    Old_age   Always       -       1
184 End-to-End_Error        0x0032   100   100   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   046   020   000    Old_age   Always       -       54 (Min/Max -10/80)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       16
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
202 Percent_Lifetime_Used   0x0031   078   078   000    Pre-fail  Offline      -       22

I expect the raw value of this to rise more slowly now that almost
100 GiB are completely unused and there is lots of free space in the filesystems.
But even if not, the SSD has been in use since March 2014, so it has plenty of
time to go.

206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0
246 Total_Host_Sector_Write 0x0032   100   100   ---    Old_age   Always       -       91288276930

^^ In sectors. 91288276930 * 512 / 1024 / 1024 / 1024 ~= 43529 GiB

Could be 4 KiB… but as the attribute name says Host_Sector and the value
multiplied by eight does not make any sense, I bet it's 512 bytes.

% smartctl /dev/sdb --all |grep "Sector Size"
Sector Sizes:     512 bytes logical, 4096 bytes physical
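
The unit question above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the raw value is attribute 246 from the Crucial table, and the helper name sectors_to_gib is made up for illustration.

```python
GIB = 1024 ** 3

def sectors_to_gib(sectors, sector_bytes=512):
    """Convert a raw sector counter to GiB for an assumed sector size."""
    return sectors * sector_bytes / GIB

raw_246 = 91288276930  # Total_Host_Sector_Write of the Crucial mSATA SSD

# Compare the two candidate sector sizes: logical (512) and physical (4096).
for size in (512, 4096):
    gib = sectors_to_gib(raw_246, size)
    print(f"{size:>4} byte sectors: {gib:,.0f} GiB ({gib / 1024:.1f} TiB)")
```

With 512 byte sectors this gives about 42.5 TiB, which is in the same range as the host writes of the Intel SSD 320. With 4 KiB sectors it would be eight times that, roughly 340 TiB, which is the figure that does not make sense for this drive.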

247 Host_Program_Page_Count 0x0032   100   100   ---    Old_age   Always       -       2892511571
248 Bckgnd_Program_Page_Cnt 0x0032   100   100   ---    Old_age   Always       -       742817198


# Intel SSD 320, before secure erase

The Intel SSD 320 in April 2017. I lost the smartctl -a output taken directly
before the secure erase because I wrote it to the /home filesystem after the
backup. I do have the more recent attrlog CSV file, but I feel too lazy
to format it in a meaningful way:

SMART Attributes Data Structure revision number: 5
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  3 Spin_Up_Time            0x0020   100   100   000    Old_age   Offline      -       0
  4 Start_Stop_Count        0x0030   100   100   000    Old_age   Offline      -       0
  5 Reallocated_Sector_Ct   0x0032   100   100   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       21035
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       5292
170 Reserve_Block_Count     0x0033   100   100   010    Pre-fail  Always       -       0
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       169
183 SATA_Downshift_Count    0x0030   100   100   000    Old_age   Offline      -       3
184 End-to-End_Error        0x0032   100   100   090    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
192 Unsafe_Shutdown_Count   0x0032   100   100   000    Old_age   Always       -       462
199 CRC_Error_Count         0x0030   100   100   000    Old_age   Offline      -       0
225 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       1370316
226 Workld_Media_Wear_Indic 0x0032   100   100   000    Old_age   Always       -       2206583
227 Workld_Host_Reads_Perc  0x0032   100   100   000    Old_age   Always       -       49
228 Workload_Minutes        0x0032   100   100   000    Old_age   Always       -       13857327
232 Available_Reservd_Space 0x0033   100   100   010    Pre-fail  Always       -       0
233 Media_Wearout_Indicator 0x0032   097   097   000    Old_age   Always       -       0

^^ almost new. I have a PDF from Intel explaining this value somewhere.
The Intel SSD 320 had more free space than the Crucial M500 for a good part
of the time they were in use.

241 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       1370316

^^ 1370316 * 32 / 1024 ~= 42822 GiB

242 Host_Reads_32MiB        0x0032   100   100   000    Old_age   Always       -       2016560

The Intel SSD has been in use for longer, since May 2011.


# Intel SSD 320 after secure erase:

Interestingly the secure erase nuked the SMART values:

SMART Attributes Data Structure revision number: 5
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  3 Spin_Up_Time            0x0020   100   100   000    Old_age   Offline      -       0
  4 Start_Stop_Count        0x0030   100   100   000    Old_age   Offline      -       0
  5 Reallocated_Sector_Ct   0x0032   100   100   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       3
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       6726
170 Reserve_Block_Count     0x0033   100   100   010    Pre-fail  Always       -       0
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
183 SATA_Downshift_Count    0x0030   100   100   000    Old_age   Offline      -       0
184 End-to-End_Error        0x0032   100   100   090    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
192 Unsafe_Shutdown_Count   0x0032   100   100   000    Old_age   Always       -       537
199 CRC_Error_Count         0x0030   100   100   000    Old_age   Offline      -       0
225 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       5768
226 Workld_Media_Wear_Indic 0x0032   100   100   000    Old_age   Always       -       65535
227 Workld_Host_Reads_Perc  0x0032   100   100   000    Old_age   Always       -       65535
228 Workload_Minutes        0x0032   100   100   000    Old_age   Always       -       65535
232 Available_Reservd_Space 0x0033   100   100   010    Pre-fail  Always       -       0
233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       0
241 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       5768
242 Host_Reads_32MiB        0x0032   100   100   000    Old_age   Always       -

Good for selling it. You could claim it is all fresh and new :)


# Samsung Pro 860

Note this SSD is almost new – smartctl 6.6 2016-05-31 does not know about 
one attribute. I am not sure why the command is so old in Debian Sid:

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       50
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       26
177 Wear_Leveling_Count     0x0013   100   100   000    Pre-fail  Always       -       0

^^ new :)

179 Used_Rsvd_Blk_Cnt_Tot   0x0013   100   100   010    Pre-fail  Always       -       0
181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0013   100   100   010    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0032   065   052   000    Old_age   Always       -       35
195 Hardware_ECC_Recovered  0x001a   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       0
235 Unknown_Attribute       0x0012   099   099   000    Old_age   Always       -       1
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       1133999775

According to references on the internet, this attribute counts sectors, so:

1133999775 * 512 / 1024 / 1024 / 1024 ~= 541 GiB

% smartctl /dev/sda --all |grep "Sector Size"
Sector Size:      512 bytes logical/physical
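
To avoid copying raw values by hand, a row of the attribute table can be split mechanically. A minimal sketch, assuming the classic ten-column layout shown in the tables above; the function name parse_smart_row is hypothetical, and the 512 byte unit is the same assumption as in the calculation above.

```python
def parse_smart_row(line):
    """Split one row of the smartctl attribute table into a small dict.

    Returns None for headers, comments, and rows without a raw value.
    """
    # The first nine columns never contain spaces. RAW_VALUE may
    # (for example "54 (Min/Max -10/80)"), so cap the split at nine.
    fields = line.split(None, 9)
    if len(fields) < 10 or not fields[0].isdigit():
        return None
    return {"id": int(fields[0]), "name": fields[1], "raw": fields[9].strip()}

row = ("241 Total_LBAs_Written      0x0032   099   099   000    "
       "Old_age   Always       -       1133999775")
attr = parse_smart_row(row)

# Assumption: one LBA is 512 bytes, matching the reported sector size.
gib_written = int(attr["raw"]) * 512 / 1024 ** 3
print(f"{attr['name']}: about {gib_written:.0f} GiB")
```

A row whose raw value is missing, like the last Host_Reads_32MiB line of the Intel table after the secure erase, yields None instead of a wrong number.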

> But if you have numbers for both, you know the write amplification of
> the drive (and its past workload).

Sure.
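
For the Crucial drive, the two page counters (attributes 247 and 248) allow a rough estimate. The sketch below assumes the formula Micron documents for its own SSDs, total programmed pages divided by host-programmed pages. Whether it applies exactly to this model is an assumption, and the function name is made up for illustration.

```python
def write_amplification(host_pages, background_pages):
    """Estimate write amplification as total NAND pages over host pages."""
    if host_pages <= 0:
        raise ValueError("host_pages must be positive")
    return (host_pages + background_pages) / host_pages

# Raw values of attributes 247 and 248 from the Crucial mSATA table.
host_program_pages = 2892511571        # 247 Host_Program_Page_Count
background_program_pages = 742817198   # 248 Bckgnd_Program_Page_Cnt

waf = write_amplification(host_program_pages, background_program_pages)
print(f"estimated write amplification: {waf:.2f}")
```

Under that assumption the drive has written about a quarter more to flash than the host requested over its lifetime.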

> If there is compression at work, you'd see the 2nd number being
> somewhat, or significantly lower -- and barely increase at all, if
> you write highly compressible data. This is not typically observed on
> modern SSDs, except maybe when writing zeroes. Writes to flash will
> be the same as writes from host, or most often somewhat higher, as
> the hardware can typically erase flash only in chunks of 2MB or so,
> hence there's quite a bit of under the hood reorganizing going on.
> Also as a result depending on workloads the "to flash" number can be
> much higher than "from host".

Okay, I get that, but it would take quite some effort to make reliable
measurements, because you'd need to write quite a large amount of data
for the media wearout indicator to change. I do not intend to do that.

> Point is, even when the SATA link is maxed out in both cases, you can
> still check if there's compression at work via using those SMART
> attributes.

Sure. But with quite some effort. And with some aging of the SSDs involved.

I can imagine better uses of my time :)
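
For anyone who does want to try it, the check Roman describes boils down to comparing the growth of both counters across a test write. A minimal sketch: the function name is hypothetical, both readings are assumed to be already converted to bytes, and the snapshot numbers are invented for illustration, not measured on any drive in this thread.

```python
GIB = 1024 ** 3

def flash_to_host_ratio(before, after):
    """Ratio of flash writes to host writes between two SMART snapshots.

    `before` and `after` are (host_bytes, flash_bytes) tuples.
    """
    host_delta = after[0] - before[0]
    flash_delta = after[1] - before[1]
    if host_delta <= 0:
        raise ValueError("no host writes between the two snapshots")
    return flash_delta / host_delta

# Hypothetical snapshots taken before and after writing 100 GiB of
# highly compressible data (not real measurements).
before = (1000 * GIB, 1250 * GIB)
after = (1100 * GIB, 1370 * GIB)

ratio = flash_to_host_ratio(before, after)
print(f"flash/host ratio for this run: {ratio:.2f}")
```

A ratio at or above 1 for compressible data would suggest there is no compression at work. A ratio well below 1 would point the other way. As noted above, the write has to be large enough for the flash-side counter to move at all.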

> > In any case: It was an experience report, no request for help, so I
> > don't see why exact error messages are absolutely needed. If I had
> > a support inquiry that would be different, I agree.
> 
> Well, when reading such stories (involving software that I also use) I
> imagine what if I had been in that situation myself, what would I do,
> would I have anything else to try, do I know about any workaround for
> this. And without any technical details to go from, those are all
> questions left unanswered.

Sure, I get that.

My priority was to bring the machine back online. I managed to put the
screen log on a filesystem I destroyed afterwards, and I put it there after
the backup of that filesystem was complete… so c’est la vie, the log is gone.
Even if I still had it, I probably would not have included all error
messages, but I would have been able to provide those you are interested in.
Anyway, it's gone and that is it.

Thanks,
-- 
Martin
