* Experiences on BTRFS Dual SSD RAID 1 with outage of one SSD
From: Martin Steigerwald @ 2018-08-17 9:08 UTC
To: linux-btrfs
Hi!
This happened about two weeks ago. I already dealt with it and all is
well.
Linux hung on suspend so I switched off this ThinkPad T520 forcefully.
After that it did not boot the operating system anymore. The Intel SSD 320,
despite the latest firmware, which should patch this bug but apparently does
not, was suddenly only 8 MiB big. Those 8 MiB just contain zeros.
Access via GRML and "mount -fo degraded" worked. I initially was even
able to write onto this degraded filesystem. First I copied all data to
a backup drive.
I even started a balance to "single" so that it would work with one SSD.
But later I learned that secure erase may recover the Intel SSD 320 and
since I had no other SSD at hand, did that. And yes, it did. So I
canceled the balance.
I partitioned the Intel SSD 320 and put LVM on it, just as I had it before.
But at that time I was no longer able to mount the degraded BTRFS on the
other SSD as writable, not even with "-f" ("I know what I am doing"). Thus I
was not able to add a device to it and balance it back to RAID 1. Even
"btrfs replace" was not working.
I thus formatted a new BTRFS RAID 1 and restored.
A week later I migrated the Intel SSD 320 to a Samsung 860 Pro. Again
via one full backup and restore cycle. However, this time I was able to
copy most of the data off the Intel SSD 320 with "mount -fo degraded" via
eSATA and thus the copy operation was way faster.
So conclusion:
1. Pro: BTRFS RAID 1 really protected my data against a complete SSD
outage.
2. Con: It does not allow me to add a device and balance to RAID 1 or
replace one device that is already missing at this time.
3. I keep using BTRFS RAID 1 on two SSDs for often changed, critical
data.
4. And yes, I know it does not replace a backup. As it was holidays and
I was lazy, the backup was already two weeks old, so I was happy to have all
my data still on the other SSD.
5. The error messages in the kernel when mounting without "-o degraded" are
less than helpful. They indicate a corrupted filesystem instead of just
telling that one device is missing and "-o degraded" would help here.
I have seen a discussion about the limitation in point 2. That allowing
to add a device and make it into RAID 1 again might be dangerous, cause
of system chunk and probably other reasons. I did not completely read
and understand it, though.
So I still don´t get it, cause:
Either it is a RAID 1, then, one disk may fail and I still have *all*
data. Also for the system chunk, which according to btrfs fi df / btrfs
fi sh was indeed RAID 1. If so, then period. Then I don´t see why it
would need to disallow me to make it into a RAID 1 again after one
device has been lost.
Or it is no RAID 1 and then what is the point to begin with? As I was
able to copy off all data from the degraded mount, I'd say it was a RAID 1.
(I know that BTRFS RAID 1 is not a regular RAID 1 anyway, but just does
two copies regardless of how many drives you use.)
For this laptop it was not all that important but I wonder about BTRFS
RAID 1 in an enterprise environment, cause restoring from backup adds a
significantly higher downtime.
Anyway, creating a new filesystem may have been better here anyway,
cause it replaced a BTRFS that had aged over several years with a new one.
Due to the increased capacity and due to me thinking that Samsung 860
Pro compresses itself, I removed LZO compression. This would also give
larger extents on files that are not fragmented or only slightly
fragmented. I think that Intel SSD 320 did not compress, but Crucial
m500 mSATA SSD does. That has been the secondary SSD that still had all
the data after the outage of the Intel SSD 320.
Overall I am happy, cause BTRFS RAID 1 gave me access to the data after
the SSD outage. That is the most important thing about it for me.
Thanks,
--
Martin
* Re: Experiences on BTRFS Dual SSD RAID 1 with outage of one SSD
From: Austin S. Hemmelgarn @ 2018-08-17 11:58 UTC
To: Martin Steigerwald, linux-btrfs
On 2018-08-17 05:08, Martin Steigerwald wrote:
> Hi!
>
> This happened about two weeks ago. I already dealt with it and all is
> well.
>
> Linux hung on suspend so I switched off this ThinkPad T520 forcefully.
> After that it did not boot the operating system anymore. The Intel SSD 320,
> despite the latest firmware, which should patch this bug but apparently does
> not, was suddenly only 8 MiB big. Those 8 MiB just contain zeros.
>
> Access via GRML and "mount -fo degraded" worked. I initially was even
> able to write onto this degraded filesystem. First I copied all data to
> a backup drive.
>
> I even started a balance to "single" so that it would work with one SSD.
>
> But later I learned that secure erase may recover the Intel SSD 320 and
> since I had no other SSD at hand, did that. And yes, it did. So I
> canceled the balance.
>
> I partitioned the Intel SSD 320 and put LVM on it, just as I had it before.
> But at that time I was no longer able to mount the degraded BTRFS on the
> other SSD as writable, not even with "-f" ("I know what I am doing"). Thus I
> was not able to add a device to it and balance it back to RAID 1. Even
> "btrfs replace" was not working.
>
> I thus formatted a new BTRFS RAID 1 and restored.
>
> A week later I migrated the Intel SSD 320 to a Samsung 860 Pro. Again
> via one full backup and restore cycle. However, this time I was able to
> copy most of the data off the Intel SSD 320 with "mount -fo degraded" via
> eSATA and thus the copy operation was way faster.
>
> So conclusion:
>
> 1. Pro: BTRFS RAID 1 really protected my data against a complete SSD
> outage.
Glad to hear I'm not the only one!
>
> 2. Con: It does not allow me to add a device and balance to RAID 1 or
> replace one device that is already missing at this time.
See below where you comment about this more, I've replied regarding it
there.
>
> 3. I keep using BTRFS RAID 1 on two SSDs for often changed, critical
> data.
>
> 4. And yes, I know it does not replace a backup. As it was holidays and
> I was lazy, the backup was already two weeks old, so I was happy to have all
> my data still on the other SSD.
>
> 5. The error messages in the kernel when mounting without "-o degraded" are
> less than helpful. They indicate a corrupted filesystem instead of just
> telling that one device is missing and "-o degraded" would help here.
Agreed, the kernel error messages need significant improvement, not just
for this case, but in general (I would _love_ to make sure that there
are exactly zero exit paths for open_ctree that don't involve a proper
error message being printed beyond the ubiquitous `open_ctree failed`
message you get when it fails).
>
>
> I have seen a discussion about the limitation in point 2. That allowing
> to add a device and make it into RAID 1 again might be dangerous, cause
> of system chunk and probably other reasons. I did not completely read
> and understand it, though.
>
> So I still don´t get it, cause:
>
> Either it is a RAID 1, then, one disk may fail and I still have *all*
> data. Also for the system chunk, which according to btrfs fi df / btrfs
> fi sh was indeed RAID 1. If so, then period. Then I don´t see why it
> would need to disallow me to make it into a RAID 1 again after one
> device has been lost.
>
> Or it is no RAID 1 and then what is the point to begin with? As I was
> able to copy off all data from the degraded mount, I'd say it was a RAID 1.
>
> (I know that BTRFS RAID 1 is not a regular RAID 1 anyway, but just does
> two copies regardless of how many drives you use.)
So, what's happening here is a bit complicated. The issue is entirely
with older kernels that are missing a couple of specific patches, but it
appears that not all distributions have their kernels updated to include
those patches yet.
In short, when you have a volume consisting of _exactly_ two devices
using raid1 profiles that is missing one device, and you mount it
writable and degraded on such a kernel, newly created chunks will be
single-profile chunks instead of raid1 chunks with one half missing.
Any write has the potential to trigger allocation of a new chunk, and
more importantly any _read_ has the potential to trigger allocation of a
new chunk if you don't use the `noatime` mount option (because a read
will trigger an atime update, which results in a write).
When older kernels then go and try to mount that volume a second time,
they see that there are single-profile chunks (which can't tolerate
_any_ device failures), and refuse to mount at all (because they can't
guarantee that metadata is intact). Newer kernels fix this part by
checking per-chunk if a chunk is degraded/complete/missing, which avoids
this because all the single chunks are on the remaining device.
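A quick way to check whether that has already happened on a given volume (the
mount point here is just a placeholder):

  btrfs filesystem df /mnt
  # "Data, single: ..." or "Metadata, single: ..." lines showing up next to
  # the RAID1 lines mean new chunks were allocated while mounted writable
  # and degraded
  btrfs filesystem usage /mnt   # additionally shows which device holds those chunks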
As far as avoiding this in the future:
* If you're just pulling data off the device, mark the device read-only
in the _block layer_, not the filesystem, before you mount it. If
you're using LVM, just mark the LV read-only using LVM commands (see the
example commands below this list). This will make 100% certain that nothing
gets written to the device, and thus makes sure that you won't accidentally
cause issues like this.
* If you're going to convert to a single device, just do it and don't
stop it part way through. In particular, make sure that your system
will not lose power.
* Otherwise, don't mount the volume unless you know you're going to
repair it.
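For the first point, something along these lines should do (device, VG and LV
names are placeholders; "nologreplay" is optional extra caution so that not
even log-tree replay writes to the device):

  blockdev --setro /dev/sdb3              # whole block device read-only
  lvchange --permission r vg0/home        # or, for an LV, via LVM itself
  mount -o ro,nologreplay,degraded /dev/vg0/home /mnt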
>
>
> For this laptop it was not all that important but I wonder about BTRFS
> RAID 1 in an enterprise environment, cause restoring from backup adds a
> significantly higher downtime.
>
> Anyway, creating a new filesystem may have been better here anyway,
> cause it replaced a BTRFS that had aged over several years with a new one.
> Due to the increased capacity and due to me thinking that Samsung 860
> Pro compresses itself, I removed LZO compression. This would also give
> larger extents on files that are not fragmented or only slightly
> fragmented. I think that Intel SSD 320 did not compress, but Crucial
> m500 mSATA SSD does. That has been the secondary SSD that still had all
> the data after the outage of the Intel SSD 320.
First off, keep in mind that the SSD firmware doing compression only
really helps with wear-leveling. Doing it in the filesystem will help
not only with that, but will also give you more space to work with.
Secondarily, keep in mind that most SSD's use compression algorithms
that are fast, but don't generally get particularly amazing compression
ratios (think LZ4 or Snappy for examples of this). In comparison, BTRFS
provides a couple of options that are slower, but get far better ratios
most of the time (zlib, and more recently zstd, which is actually pretty
fast).
>
>
> Overall I am happy, cause BTRFS RAID 1 gave me access to the data after
> the SSD outage. That is the most important thing about it for me.
* Re: Experiences on BTRFS Dual SSD RAID 1 with outage of one SSD
From: Martin Steigerwald @ 2018-08-17 12:28 UTC
To: Austin S. Hemmelgarn; +Cc: linux-btrfs
Thanks for your detailed answer.
Austin S. Hemmelgarn - 17.08.18, 13:58:
> On 2018-08-17 05:08, Martin Steigerwald wrote:
[…]
> > I have seen a discussion about the limitation in point 2. That
> > allowing to add a device and make it into RAID 1 again might be
> > dangerous, cause of system chunk and probably other reasons. I did
> > not completely read and understand it, though.
> >
> > So I still don´t get it, cause:
> >
> > Either it is a RAID 1, then, one disk may fail and I still have
> > *all*
> > data. Also for the system chunk, which according to btrfs fi df /
> > btrfs fi sh was indeed RAID 1. If so, then period. Then I don´t see
> > why it would need to disallow me to make it into a RAID 1 again
> > after one device has been lost.
> >
> > Or it is no RAID 1 and then what is the point to begin with? As I
> > was
> > able to copy off all data from the degraded mount, I'd say it was a
> > RAID 1.
> >
> > (I know that BTRFS RAID 1 is not a regular RAID 1 anyway, but just
> > does two copies regardless of how many drives you use.)
>
> So, what's happening here is a bit complicated. The issue is entirely
> with older kernels that are missing a couple of specific patches, but
> it appears that not all distributions have their kernels updated to
> include those patches yet.
>
> In short, when you have a volume consisting of _exactly_ two devices
> using raid1 profiles that is missing one device, and you mount it
> writable and degraded on such a kernel, newly created chunks will be
> single-profile chunks instead of raid1 chunks with one half missing.
> Any write has the potential to trigger allocation of a new chunk, and
> more importantly any _read_ has the potential to trigger allocation of
> a new chunk if you don't use the `noatime` mount option (because a
> read will trigger an atime update, which results in a write).
>
> When older kernels then go and try to mount that volume a second time,
> they see that there are single-profile chunks (which can't tolerate
> _any_ device failures), and refuse to mount at all (because they
> can't guarantee that metadata is intact). Newer kernels fix this
> part by checking per-chunk if a chunk is degraded/complete/missing,
> which avoids this because all the single chunks are on the remaining
> device.
How new does the kernel need to be for that to happen?
Do I get this right that it would be the kernel used for recovery, i.e.
the one on the live distro, that needs to be new enough? The one on this
laptop is meanwhile already 4.18.1.
I used the latest GRML stable release 2017.05, which has a 4.9 kernel.
> As far as avoiding this in the future:
I hope that with the new Samsung 860 Pro together with the existing
Crucial m500 I am spared from this for years to come. According to its
SMART status about lifetime used, that Crucial SSD still has quite some
time to go.
> * If you're just pulling data off the device, mark the device
> read-only in the _block layer_, not the filesystem, before you mount
> it. If you're using LVM, just mark the LV read-only using LVM
> commands. This will make 100% certain that nothing gets written to
> the device, and thus makes sure that you won't accidentally cause
> issues like this.
> * If you're going to convert to a single device,
> just do it and don't stop it part way through. In particular, make
> sure that your system will not lose power.
> * Otherwise, don't mount the volume unless you know you're going to
> repair it.
Thanks for those. Good to keep in mind.
> > For this laptop it was not all that important but I wonder about
> > BTRFS RAID 1 in an enterprise environment, cause restoring from backup
> > adds a significantly higher downtime.
> >
> > Anyway, creating a new filesystem may have been better here anyway,
> > cause it replaced a BTRFS that had aged over several years with a new
> > one. Due to the increased capacity and due to me thinking that
> > Samsung 860 Pro compresses itself, I removed LZO compression. This
> > would also give larger extents on files that are not fragmented or
> > only slightly fragmented. I think that Intel SSD 320 did not
> > compress, but Crucial m500 mSATA SSD does. That has been the
> > secondary SSD that still had all the data after the outage of the
> > Intel SSD 320.
>
> First off, keep in mind that the SSD firmware doing compression only
> really helps with wear-leveling. Doing it in the filesystem will help
> not only with that, but will also give you more space to work with.
While also reducing the ability of the SSD to wear-level. The more data
I fit on the SSD, the less it can wear-level. And the better I compress
that data, the less it can wear-level.
> Secondarily, keep in mind that most SSD's use compression algorithms
> that are fast, but don't generally get particularly amazing
> compression ratios (think LZ4 or Snappy for examples of this). In
> comparison, BTRFS provides a couple of options that are slower, but
> get far better ratios most of the time (zlib, and more recently zstd,
> which is actually pretty fast).
I considered switching to zstd. But it may not be compatible with the GRML
2017.05 4.9 kernel; of course I could test a GRML snapshot with a newer
kernel. I always like to be able to recover with some live distro :).
And GRML is the one of my choice.
However… I am not all that convinced that it would benefit me as long as
I have enough space. That SSD replacement more than doubled capacity
from about 680 GB to 1480 GB. I have a ton of free space in the
filesystems – usage of /home is only 46% for example – and there are 96
GiB completely unused in LVM on the Crucial SSD and even more than 183
GiB completely unused on Samsung SSD. The system is doing weekly
"fstrim" on all filesystems. I think that this is more than is needed
for the longevity of the SSDs, but well actually I just don´t need the
space, so…
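In case anyone wonders, the weekly trim is nothing fancy; with a systemd-based
distro it can be as simple as the util-linux timer (assuming the distro ships
util-linux's fstrim.timer unit), or an occasional manual run:

  systemctl enable --now fstrim.timer
  fstrim -av     # trim all mounted filesystems that support it, verbosely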
Of course, in case I manage to fill up all that space, I consider using
compression. Until then, I am not all that convinced that I´d benefit
from it.
Of course it may increase read speeds and in case of nicely compressible
data also write speeds, I am not sure whether it even matters. Also it
uses up some CPU cycles on a dual core (+ hyperthreading) Sandybridge
mobile i5. While I am not sure about it, I bet also having larger
possible extent sizes may help a bit. As well as no compression may also
help a bit with fragmentation.
Well putting this to a (non-scientific) test:
[…]/.local/share/akonadi/db_data/akonadi> du -sh * | sort -rh | head -5
3,1G parttable.ibd
[…]/.local/share/akonadi/db_data/akonadi> filefrag parttable.ibd
parttable.ibd: 11583 extents found
Hmmm, already quite many extents after just about one week with the new
filesystem. On the old filesystem I had somewhat around 40000-50000
extents on that file.
Well actually what do I know: I don´t even have an idea whether not
using compression would be beneficial. Maybe it does not even matter all
that much.
I bet testing it to the point that I could be sure about it for my
workload would take considerable amount of time.
Ciao,
--
Martin
* Re: Experiences on BTRFS Dual SSD RAID 1 with outage of one SSD
From: Roman Mamedov @ 2018-08-17 12:50 UTC
To: Martin Steigerwald; +Cc: Austin S. Hemmelgarn, linux-btrfs
On Fri, 17 Aug 2018 14:28:25 +0200
Martin Steigerwald <martin@lichtvoll.de> wrote:
> > First off, keep in mind that the SSD firmware doing compression only
> > really helps with wear-leveling. Doing it in the filesystem will help
> > not only with that, but will also give you more space to work with.
>
> While also reducing the ability of the SSD to wear-level. The more data
> I fit on the SSD, the less it can wear-level. And the better I compress
> that data, the less it can wear-level.
Do not consider SSD "compression" as a factor in any of your calculations or
planning. Modern controllers do not do it anymore, the last ones that did are
SandForce, and that's 2010 era stuff. You can check for yourself by comparing
write speeds of compressible vs incompressible data, it should be the same. At
most, the modern ones know to recognize a stream of binary zeroes and have a
special case for that.
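A rough way to run that comparison yourself (paths and sizes are placeholders,
the target filesystem should have its own compression off, and ideally the
drive's write cache is disabled first so it cannot mask the result):

  hdparm -W0 /dev/sdX                                   # switch off the drive write cache
  dd if=/dev/urandom of=/tmp/rand.bin bs=1M count=2048  # incompressible test data
  dd if=/dev/zero of=/mnt/test-zero.bin bs=1M count=2048 oflag=direct conv=fsync
  dd if=/tmp/rand.bin of=/mnt/test-rand.bin bs=1M oflag=direct conv=fsync

If both writes report about the same throughput, the controller is at least
not compressing in real time.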
As a general comment on this thread: always try to save the exact messages
you get when troubleshooting or getting failures from your system. Saying just
"was not able to add" or "btrfs replace not working" without any exact details
isn't really helpful as a bug report or even as a general "experiences" story,
as we don't know what the exact cause of those was, whether it could have been
avoided or worked around, not to mention what your FS state was at the time
(as in "btrfs fi show" and "fi df").
--
With respect,
Roman
* Re: Experiences on BTRFS Dual SSD RAID 1 with outage of one SSD
From: Austin S. Hemmelgarn @ 2018-08-17 12:55 UTC
To: Martin Steigerwald; +Cc: linux-btrfs
On 2018-08-17 08:28, Martin Steigerwald wrote:
> Thanks for your detailed answer.
>
> Austin S. Hemmelgarn - 17.08.18, 13:58:
>> On 2018-08-17 05:08, Martin Steigerwald wrote:
> […]
>>> I have seen a discussion about the limitation in point 2. That
>>> allowing to add a device and make it into RAID 1 again might be
>>> dangerous, cause of system chunk and probably other reasons. I did
>>> not completely read and understand it, though.
>>>
>>> So I still don´t get it, cause:
>>>
>>> Either it is a RAID 1, then, one disk may fail and I still have
>>> *all*
>>> data. Also for the system chunk, which according to btrfs fi df /
>>> btrfs fi sh was indeed RAID 1. If so, then period. Then I don´t see
>>> why it would need to disallow me to make it into a RAID 1 again
>>> after one device has been lost.
>>>
>>> Or it is no RAID 1 and then what is the point to begin with? As I
>>> was
>>> able to copy off all data from the degraded mount, I'd say it was a
>>> RAID 1.
>>>
>>> (I know that BTRFS RAID 1 is not a regular RAID 1 anyway, but just
>>> does two copies regardless of how many drives you use.)
>>
>> So, what's happening here is a bit complicated. The issue is entirely
>> with older kernels that are missing a couple of specific patches, but
>> it appears that not all distributions have their kernels updated to
>> include those patches yet.
>>
>> In short, when you have a volume consisting of _exactly_ two devices
>> using raid1 profiles that is missing one device, and you mount it
>> writable and degraded on such a kernel, newly created chunks will be
>> single-profile chunks instead of raid1 chunks with one half missing.
>> Any write has the potential to trigger allocation of a new chunk, and
>> more importantly any _read_ has the potential to trigger allocation of
>> a new chunk if you don't use the `noatime` mount option (because a
>> read will trigger an atime update, which results in a write).
>>
>> When older kernels then go and try to mount that volume a second time,
>> they see that there are single-profile chunks (which can't tolerate
>> _any_ device failures), and refuse to mount at all (because they
>> can't guarantee that metadata is intact). Newer kernels fix this
>> part by checking per-chunk if a chunk is degraded/complete/missing,
>> which avoids this because all the single chunks are on the remaining
>> device.
>
> How new does the kernel need to be for that to happen?
>
> Do I get this right that it would be the kernel used for recovery, i.e.
> the one on the live distro, that needs to be new enough? The one on this
> laptop is meanwhile already 4.18.1.
Yes, the kernel used for recovery is the important one here. I don't
remember for certain when the patches went in, but I'm pretty sure it was
no earlier than 4.14. FWIW, I'm pretty sure SystemRescueCD has a
new enough kernel, but they still (sadly) lack zstd support.
>
> I used the latest GRML stable release 2017.05, which has a 4.9 kernel.
While I don't know exactly when the patches went in, I'm fairly certain
that 4.9 never got them.
>
>> As far as avoiding this in the future:
>
> I hope that with the new Samsung 860 Pro together with the existing
> Crucial m500 I am spared from this for years to come. According to its
> SMART status about lifetime used, that Crucial SSD still has quite some
> time to go.
Yes, hopefully. And the SMART status on that Crucial is probably right;
in my experience they tend to do a very good job of accurately
measuring life expectancy (that, or they're just _really_ good at
predicting failures; I've never had a Crucial SSD that did not indicate
correctly in the SMART status that it would fail in the near future).
>
>> * If you're just pulling data off the device, mark the device
>> read-only in the _block layer_, not the filesystem, before you mount
>> it. If you're using LVM, just mark the LV read-only using LVM
>> commands. This will make 100% certain that nothing gets written to
>> the device, and thus makes sure that you won't accidentally cause
>> issues like this.
>
>> * If you're going to convert to a single device,
>> just do it and don't stop it part way through. In particular, make
>> sure that your system will not lose power.
>
>> * Otherwise, don't mount the volume unless you know you're going to
>> repair it.
>
> Thanks for those. Good to keep in mind.
The last one is actually good advice in general, not just for BTRFS. I
can't count how many stories I've heard of people who tried to run half
an array simply to avoid downtime, and ended up making things far worse
than they were as a result.
>
>>> For this laptop it was not all that important but I wonder about
>>> BTRFS RAID 1 in an enterprise environment, cause restoring from backup
>>> adds a significantly higher downtime.
>>>
>>> Anyway, creating a new filesystem may have been better here anyway,
>>> cause it replaced a BTRFS that had aged over several years with a new
>>> one. Due to the increased capacity and due to me thinking that
>>> Samsung 860 Pro compresses itself, I removed LZO compression. This
>>> would also give larger extents on files that are not fragmented or
>>> only slightly fragmented. I think that Intel SSD 320 did not
>>> compress, but Crucial m500 mSATA SSD does. That has been the
>>> secondary SSD that still had all the data after the outage of the
>>> Intel SSD 320.
>>
>> First off, keep in mind that the SSD firmware doing compression only
>> really helps with wear-leveling. Doing it in the filesystem will help
>> not only with that, but will also give you more space to work with.
>
> While also reducing the ability of the SSD to wear-level. The more data
> I fit on the SSD, the less it can wear-level. And the better I compress
> that data, the less it can wear-level.
No, the better you compress the data, the _less_ data you are physically
putting on the SSD, just like compressing a file makes it take up less
space. This actually makes it easier for the firmware to do
wear-leveling. Wear-leveling is entirely about picking where to put
data, and by reducing the total amount of data you are writing to the
SSD, you're making that decision easier for the firmware, and also
reducing the number of blocks of flash memory needed (which also helps
with SSD life expectancy because it translates to fewer erase cycles).
The compression they do internally operates on the same principle, the
only difference is that you have no control over how it's doing it and
no way to see exactly how efficient it is (but it's pretty well known it
needs to be fast, and fast compression usually does not get good
compression ratios).
>
>> Secondarily, keep in mind that most SSD's use compression algorithms
>> that are fast, but don't generally get particularly amazing
>> compression ratios (think LZ4 or Snappy for examples of this). In
>> comparison, BTRFS provides a couple of options that are slower, but
>> get far better ratios most of the time (zlib, and more recently zstd,
>> which is actually pretty fast).
>
> I considered switching to zstd. But it may not be compatible with grml
> 2017.05 4.9 kernel, of course I could test a grml snapshot with a newer
> kernel. I always like to be able to recover with some live distro :).
> And GRML is the one of my choice.
>
> However… I am not all that convinced that it would benefit me as long as
> I have enough space. That SSD replacement more than doubled capacity
> from about 680 GB to 1480 GB. I have a ton of free space in the
> filesystems – usage of /home is only 46% for example – and there are 96
> GiB completely unused in LVM on the Crucial SSD and even more than 183
> GiB completely unused on Samsung SSD. The system is doing weekly
> "fstrim" on all filesystems. I think that this is more than is needed
> for the longevity of the SSDs, but well actually I just don´t need the
> space, so…
>
> Of course, in case I manage to fill up all that space, I consider using
> compression. Until then, I am not all that convinced that I´d benefit
> from it.
>
> Of course it may increase read speeds and in case of nicely compressible
> data also write speeds, I am not sure whether it even matters. Also it
> uses up some CPU cycles on a dual core (+ hyperthreading) Sandybridge
> mobile i5. While I am not sure about it, I bet also having larger
> possible extent sizes may help a bit. As well as no compression may also
> help a bit with fragmentation.
It generally does actually. Less data physically on the device means
lower chances of fragmentation. In your case, it may not improve speed
much though (your i5 _probably_ can't compress data much faster than it
can access your SSD's, which means you likely won't see much performance
benefit other than reducing fragmentation).
>
> Well putting this to a (non-scientific) test:
>
> […]/.local/share/akonadi/db_data/akonadi> du -sh * | sort -rh | head -5
> 3,1G parttable.ibd
>
> […]/.local/share/akonadi/db_data/akonadi> filefrag parttable.ibd
> parttable.ibd: 11583 extents found
>
> Hmmm, already quite many extents after just about one week with the new
> filesystem. On the old filesystem I had somewhat around 40000-50000
> extents on that file.
Filefrag doesn't properly handle compressed files on BTRFS. It treats
each 128KiB compression block as a separate extent, even though they may
be contiguous as part of one BTRFS extent. That one file by itself
should have reported as about 25396 extents on the old volume (assuming
it was entirely compressed), so your numbers seem to match up
realistically.
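For reference, that estimate is simply the file size divided into 128KiB
compression blocks: 3.1 GiB is roughly 3,250,586 KiB, and 3,250,586 / 128
is about 25,400.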
>
> Well actually what do I know: I don´t even have an idea whether not
> using compression would be beneficial. Maybe it does not even matter all
> that much.
>
> I bet testing it to the point that I could be sure about it for my
> workload would take considerable amount of time.
>
One last quick thing about compression in general on BTRFS. Unless you
have a lot of files that are likely to be completely incompressible,
you're generally better off using `compress-force` instead of
`compress`. With regular `compress`, BTRFS will try to compress the
first few blocks of a file, and if that fails will mark the file as
incompressible and not try to compress any of it automatically ever
again. With `compress-force`, BTRFS will just unconditionally compress
everything.
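In /etc/fstab that would look something like this (the UUID and mount point
are placeholders, and zstd needs a 4.14 or newer kernel):

  UUID=<uuid>  /home  btrfs  defaults,noatime,compress=zstd        0  0
  # or, to compress unconditionally:
  UUID=<uuid>  /home  btrfs  defaults,noatime,compress-force=zstd  0  0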
* Re: Experiences on BTRFS Dual SSD RAID 1 with outage of one SSD
From: Austin S. Hemmelgarn @ 2018-08-17 13:01 UTC
To: Roman Mamedov, Martin Steigerwald; +Cc: linux-btrfs
On 2018-08-17 08:50, Roman Mamedov wrote:
> On Fri, 17 Aug 2018 14:28:25 +0200
> Martin Steigerwald <martin@lichtvoll.de> wrote:
>
>>> First off, keep in mind that the SSD firmware doing compression only
>>> really helps with wear-leveling. Doing it in the filesystem will help
>>> not only with that, but will also give you more space to work with.
>>
>> While also reducing the ability of the SSD to wear-level. The more data
>> I fit on the SSD, the less it can wear-level. And the better I compress
>> that data, the less it can wear-level.
>
> Do not consider SSD "compression" as a factor in any of your calculations or
> planning. Modern controllers do not do it anymore, the last ones that did are
> SandForce, and that's 2010 era stuff. You can check for yourself by comparing
> write speeds of compressible vs incompressible data, it should be the same. At
> most, the modern ones know to recognize a stream of binary zeroes and have a
> special case for that.
All that testing write speeds for compressible versus incompressible
data tells you is whether the SSD is doing real-time compression of data, not
whether it is doing any compression at all. Also, this test only works
if you turn the write-cache on the device off.
Besides, you can't prove 100% for certain that any manufacturer who does
not sell their controller chips isn't doing this, which means there are
a few manufacturers that may still be doing it.
* Re: Experiences on BTRFS Dual SSD RAID 1 with outage of one SSD
From: Martin Steigerwald @ 2018-08-17 21:16 UTC
To: Austin S. Hemmelgarn; +Cc: Roman Mamedov, linux-btrfs
Austin S. Hemmelgarn - 17.08.18, 15:01:
> On 2018-08-17 08:50, Roman Mamedov wrote:
> > On Fri, 17 Aug 2018 14:28:25 +0200
> >
> > Martin Steigerwald <martin@lichtvoll.de> wrote:
> >>> First off, keep in mind that the SSD firmware doing compression
> >>> only
> >>> really helps with wear-leveling. Doing it in the filesystem will
> >>> help not only with that, but will also give you more space to
> >>> work with.
> >> While also reducing the ability of the SSD to wear-level. The more
> >> data I fit on the SSD, the less it can wear-level. And the better
> >> I compress that data, the less it can wear-level.
> >
> > Do not consider SSD "compression" as a factor in any of your
> > calculations or planning. Modern controllers do not do it anymore,
> > the last ones that did are SandForce, and that's 2010 era stuff.
> > You can check for yourself by comparing write speeds of
> > compressible vs incompressible data, it should be the same. At
> > most, the modern ones know to recognize a stream of binary zeroes
> > and have a special case for that.
>
> All that testing write speeds for compressible versus incompressible
> data tells you is whether the SSD is doing real-time compression of data,
> not whether it is doing any compression at all. Also, this test only
> works if you turn the write-cache on the device off.
As the data still needs to be transferred to the SSD, I bet that at least
when the SATA connection is maxed out you won't see any difference in write
speed whether the SSD compresses in real time or not.
> Besides, you can't prove 100% for certain that any manufacturer who
> does not sell their controller chips isn't doing this, which means
> there are a few manufacturers that may still be doing it.
Who really knows what SSD controller manufacturers are doing? I have not
seen any Open Channel SSD stuff for laptops so far.
Thanks,
--
Martin
* Re: Experiences on BTRFS Dual SSD RAID 1 with outage of one SSD
From: Martin Steigerwald @ 2018-08-17 21:17 UTC
To: Roman Mamedov; +Cc: Austin S. Hemmelgarn, linux-btrfs
Hi Roman.
Now with proper CC.
Roman Mamedov - 17.08.18, 14:50:
> On Fri, 17 Aug 2018 14:28:25 +0200
>
> Martin Steigerwald <martin@lichtvoll.de> wrote:
> > > First off, keep in mind that the SSD firmware doing compression
> > > only
> > > really helps with wear-leveling. Doing it in the filesystem will
> > > help not only with that, but will also give you more space to
> > > work with.
> > While also reducing the ability of the SSD to wear-level. The more
> > data I fit on the SSD, the less it can wear-level. And the better I
> > compress that data, the less it can wear-level.
>
> Do not consider SSD "compression" as a factor in any of your
> calculations or planning. Modern controllers do not do it anymore,
> the last ones that did are SandForce, and that's 2010 era stuff. You
> can check for yourself by comparing write speeds of compressible vs
> incompressible data, it should be the same. At most, the modern ones
> know to recognize a stream of binary zeroes and have a special case
> for that.
Interesting. Do you have any backup for your claim?
> As a general comment on this thread: always try to save the exact
> messages you get when troubleshooting or getting failures from your
> system. Saying just "was not able to add" or "btrfs replace not
> working" without any exact details isn't really helpful as a bug
> report or even as a general "experiences" story, as we don't know
> what the exact cause of those was, whether it could have been avoided
> or worked around, not to mention what your FS state was at the time
> (as in "btrfs fi show" and "fi df").
I had a screen.log, but I put it on the filesystem after the
backup was made, so it was lost.
Anyway, the reason for not being able to add the device was the read-only
state of the BTRFS, as I wrote. The same goes for replace. I was able
to read the error message just fine. AFAIR the exact wording was "read
only filesystem".
In any case: It was an experience report, not a request for help, so I don't
see why exact error messages are absolutely needed. If I had a support
inquiry that would be different, I agree.
Thanks,
--
Martin
* Re: Experiences on BTRFS Dual SSD RAID 1 with outage of one SSD
From: Martin Steigerwald @ 2018-08-17 21:26 UTC
To: Austin S. Hemmelgarn; +Cc: linux-btrfs
Austin S. Hemmelgarn - 17.08.18, 14:55:
> On 2018-08-17 08:28, Martin Steigerwald wrote:
> > Thanks for your detailed answer.
> >
> > Austin S. Hemmelgarn - 17.08.18, 13:58:
> >> On 2018-08-17 05:08, Martin Steigerwald wrote:
[…]
> >>> Anyway, creating a new filesystem may have been better here
> >>> anyway,
> >>> cause it replaced a BTRFS that had aged over several years with a new
> >>> one. Due to the increased capacity and due to me thinking that
> >>> Samsung 860 Pro compresses itself, I removed LZO compression. This
> >>> would also give larger extents on files that are not fragmented or
> >>> only slightly fragmented. I think that Intel SSD 320 did not
> >>> compress, but Crucial m500 mSATA SSD does. That has been the
> >>> secondary SSD that still had all the data after the outage of the
> >>> Intel SSD 320.
> >>
> >> First off, keep in mind that the SSD firmware doing compression
> >> only
> >> really helps with wear-leveling. Doing it in the filesystem will
> >> help not only with that, but will also give you more space to work
> >> with.
> > While also reducing the ability of the SSD to wear-level. The more
> > data I fit on the SSD, the less it can wear-level. And the better I
> > compress that data, the less it can wear-level.
>
> No, the better you compress the data, the _less_ data you are
> physically putting on the SSD, just like compressing a file makes it
> take up less space. This actually makes it easier for the firmware
> to do wear-leveling. Wear-leveling is entirely about picking where
> to put data, and by reducing the total amount of data you are writing
> to the SSD, you're making that decision easier for the firmware, and
> also reducing the number of blocks of flash memory needed (which also
> helps with SSD life expectancy because it translates to fewer erase
> cycles).
On one hand I can go with this, but:
If I fill the SSD to 99% with already compressed data, then in case it
compresses data itself for wear leveling, it has less chance to wear-level
than with 99% of not yet compressed data that it could still compress itself.
That was the point I was trying to make.
Sure, with a fill rate of about 46% for home, compression would help the
wear leveling. And if the controller does not compress at all, it would
help as well.
Hmmm, maybe I will enable "zstd", but on the other hand I save CPU cycles
by not enabling it.
> > However… I am not all that convinced that it would benefit me as
> > long as I have enough space. That SSD replacement more than doubled
> > capacity from about 680 GB to 1480 GB. I have a ton of free space in
> > the filesystems – usage of /home is only 46% for example – and
> > there are 96 GiB completely unused in LVM on the Crucial SSD and
> > even more than 183 GiB completely unused on Samsung SSD. The system
> > is doing weekly "fstrim" on all filesystems. I think that this is
> > more than is needed for the longevity of the SSDs, but well
> > actually I just don´t need the space, so…
> >
> > Of course, in case I manage to fill up all that space, I consider
> > using compression. Until then, I am not all that convinced that I´d
> > benefit from it.
> >
> > Of course it may increase read speeds and in case of nicely
> > compressible data also write speeds, I am not sure whether it even
> > matters. Also it uses up some CPU cycles on a dual core (+
> > hyperthreading) Sandybridge mobile i5. While I am not sure about
> > it, I bet also having larger possible extent sizes may help a bit.
> > As well as no compression may also help a bit with fragmentation.
>
> It generally does actually. Less data physically on the device means
> lower chances of fragmentation. In your case, it may not improve
I thought "no compression" may help with fragmentation, but I think you
think that "compression" helps with fragmentation and misunderstood what
I wrote.
> speed much though (your i5 _probably_ can't compress data much faster
> than it can access your SSD's, which means you likely won't see much
> performance benefit other than reducing fragmentation).
>
> > Well putting this to a (non-scientific) test:
> >
> > […]/.local/share/akonadi/db_data/akonadi> du -sh * | sort -rh | head
> > -5 3,1G parttable.ibd
> >
> > […]/.local/share/akonadi/db_data/akonadi> filefrag parttable.ibd
> > parttable.ibd: 11583 extents found
> >
> > Hmmm, already quite many extents after just about one week with the
> > new filesystem. On the old filesystem I had somewhat around
> > 40000-50000 extents on that file.
>
> Filefrag doesn't properly handle compressed files on BTRFS. It treats
> each 128KiB compression block as a separate extent, even though they
> may be contiguous as part of one BTRFS extent. That one file by
> itself should have reported as about 25396 extents on the old volume
> (assuming it was entirely compressed), so your numbers seem to match
> up realistically.
Oh, thanks. I did not know that filefrag does not understand extents for
compressed files in BTRFS.
> > Well actually what do I know: I don´t even have an idea whether not
> > using compression would be beneficial. Maybe it does not even matter
> > all that much.
> >
> > I bet testing it to the point that I could be sure about it for my
> > workload would take considerable amount of time.
>
> One last quick thing about compression in general on BTRFS. Unless
> you have a lot of files that are likely to be completely
> incompressible, you're generally better off using `compress-force`
> instead of `compress`. With regular `compress`, BTRFS will try to
> compress the first few blocks of a file, and if that fails will mark
> the file as incompressible and not try to compress any of it
> automatically ever again. With `compress-force`, BTRFS will just
> unconditionally compress everything.
Well on one filesystem which is on a single SSD, I do have lots of image
files, mostly jpg, and audio files in mp3 or ogg vorbis formats.
Thanks,
--
Martin
* Re: Experiences on BTRFS Dual SSD RAID 1 with outage of one SSD
From: Roman Mamedov @ 2018-08-18 7:12 UTC
To: Martin Steigerwald; +Cc: Austin S. Hemmelgarn, linux-btrfs
On Fri, 17 Aug 2018 23:17:33 +0200
Martin Steigerwald <martin@lichtvoll.de> wrote:
> > Do not consider SSD "compression" as a factor in any of your
> > calculations or planning. Modern controllers do not do it anymore,
> > the last ones that did are SandForce, and that's 2010 era stuff. You
> > can check for yourself by comparing write speeds of compressible vs
> > incompressible data, it should be the same. At most, the modern ones
> > know to recognize a stream of binary zeroes and have a special case
> > for that.
>
> Interesting. Do you have any backup for your claim?
Just "something I read". I follow quote a bit of SSD-related articles and
reviews which often also include a section to talk about the controller
utilized, its background and technological improvements/changes -- and the
compression going out of fashion after SandForce seems to be considered a
well-known fact.
Incidentally, your old Intel 320 SSDs actually seem to be based on that old
SandForce controller (or at least license some of that IP to extend on it),
and hence those indeed might perform compression.
> As the data still needs to be transferred to the SSD at least when the
> SATA connection is maxed out I bet you won´t see any difference in write
> speed whether the SSD compresses in real time or not.
Most controllers expose two readings in SMART:
- Lifetime writes from host (SMART attribute 241)
- Lifetime writes to flash (attribute 233, or 177, or 173...)
It might be difficult to get the second one, as often it needs to be decoded
from others such as "Average block erase count" or "Wear leveling count".
(And seems to be impossible on Samsung NVMe ones, for example)
But if you have numbers for both, you know the write amplification of the
drive (and its past workload).
If there is compression at work, you'd see the 2nd number being somewhat, or
significantly lower -- and barely increase at all, if you write highly
compressible data. This is not typically observed on modern SSDs, except maybe
when writing zeroes. Writes to flash will be the same as writes from host, or
most often somewhat higher, as the hardware can typically erase flash only in
chunks of 2MB or so, hence there's quite a bit of under the hood reorganizing
going on. Also as a result depending on workloads the "to flash" number can be
much higher than "from host".
Point is, even when the SATA link is maxed out in both cases, you can still
check if there's compression at work via using those SMART attributes.
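A concrete way to pull those numbers (attribute IDs and raw-value units vary
by vendor, as said; e.g. 32 MiB units on Intel, sectors elsewhere, so treat
this as a sketch):

  smartctl -A /dev/sda | awk '$1==241 || $1==233 || $1==177 || $1==173'
  # write amplification is roughly:
  #   lifetime writes to flash / lifetime writes from host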
> In any case: It was a experience report, no request for help, so I don´t
> see why exact error messages are absolutely needed. If I had a support
> inquiry that would be different, I agree.
Well, when reading such stories (involving software that I also use) I imagine
what if I had been in that situation myself, what would I do, would I have
anything else to try, do I know about any workaround for this. And without any
technical details to go from, those are all questions left unanswered.
--
With respect,
Roman
* Re: Experiences on BTRFS Dual SSD RAID 1 with outage of one SSD
From: Martin Steigerwald @ 2018-08-18 8:47 UTC
To: Roman Mamedov; +Cc: Austin S. Hemmelgarn, linux-btrfs
Roman Mamedov - 18.08.18, 09:12:
> On Fri, 17 Aug 2018 23:17:33 +0200
>
> Martin Steigerwald <martin@lichtvoll.de> wrote:
> > > Do not consider SSD "compression" as a factor in any of your
> > > calculations or planning. Modern controllers do not do it anymore,
> > > the last ones that did are SandForce, and that's 2010 era stuff.
> > > You
> > > can check for yourself by comparing write speeds of compressible
> > > vs
> > > incompressible data, it should be the same. At most, the modern
> > > ones
> > > know to recognize a stream of binary zeroes and have a special
> > > case
> > > for that.
> >
> > Interesting. Do you have any backup for your claim?
>
> Just "something I read". I follow quote a bit of SSD-related articles
> and reviews which often also include a section to talk about the
> controller utilized, its background and technological
> improvements/changes -- and the compression going out of fashion
> after SandForce seems to be considered a well-known fact.
>
> Incidentally, your old Intel 320 SSDs actually seem to be based on
> that old SandForce controller (or at least license some of that IP to
> extend on it), and hence those indeed might perform compression.
Interesting. Back then I read that the Intel SSD 320 would not compress.
I think it's difficult to know for sure with those proprietary controllers.
> > As the data still needs to be transferred to the SSD at least when
> > the SATA connection is maxed out I bet you won´t see any difference
> > in write speed whether the SSD compresses in real time or not.
>
> Most controllers expose two readings in SMART:
>
> - Lifetime writes from host (SMART attribute 241)
> - Lifetime writes to flash (attribute 233, or 177, or 173...)
>
> It might be difficult to get the second one, as often it needs to be
> decoded from others such as "Average block erase count" or "Wear
> leveling count". (And seems to be impossible on Samsung NVMe ones,
> for example)
I got the impression that every manufacturer does their own thing here. And I
would not even be surprised if it's different between different generations
of SSDs by one manufacturer.
# Crucial mSATA
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 100 100 000 Pre-fail Always - 0
5 Reallocated_Sector_Ct 0x0033 100 100 000 Pre-fail Always - 0
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 16345
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 4193
171 Program_Fail_Count 0x0032 100 100 000 Old_age Always - 0
172 Erase_Fail_Count 0x0032 100 100 000 Old_age Always - 0
173 Wear_Leveling_Count 0x0032 078 078 000 Old_age Always - 663
174 Unexpect_Power_Loss_Ct 0x0032 100 100 000 Old_age Always - 362
180 Unused_Rsvd_Blk_Cnt_Tot 0x0033 000 000 000 Pre-fail Always - 8219
183 SATA_Iface_Downshift 0x0032 100 100 000 Old_age Always - 1
184 End-to-End_Error 0x0032 100 100 000 Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
194 Temperature_Celsius 0x0022 046 020 000 Old_age Always - 54 (Min/Max -10/80)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 16
197 Current_Pending_Sector 0x0032 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 100 100 000 Old_age Always - 0
202 Percent_Lifetime_Used 0x0031 078 078 000 Pre-fail Offline - 22
I expect the raw value of this to rise more slowly now that there are almost
100 GiB completely unused and there is lots of free space in the filesystems.
But even if not, the SSD has been in use since March 2014, so it has plenty
of time to go.
206 Write_Error_Rate 0x000e 100 100 000 Old_age Always - 0
210 Success_RAIN_Recov_Cnt 0x0032 100 100 000 Old_age Always - 0
246 Total_Host_Sector_Write 0x0032 100 100 --- Old_age Always - 91288276930
^^ In sectors. 91288276930 * 512 / 1024 / 1024 / 1024 ~= 43529 GiB
It could be 4 KiB… but as it's talking about Host_Sector and the value
multiplied by eight does not make any sense, I bet it's 512 bytes.
% smartctl /dev/sdb --all |grep "Sector Size"
Sector Sizes: 512 bytes logical, 4096 bytes physical
247 Host_Program_Page_Count 0x0032 100 100 --- Old_age Always - 2892511571
248 Bckgnd_Program_Page_Cnt 0x0032 100 100 --- Old_age Always - 742817198
# Intel SSD 320, before secure erase
This is the Intel SSD 320 in April 2017. I lost the smartctl -a output from
directly before the secure erase due to writing it to the /home filesystem
after the backup – I do have the more recent attrlog CSV file, but I feel too
lazy to format it in a meaningful way:
SMART Attributes Data Structure revision number: 5
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
3 Spin_Up_Time 0x0020 100 100 000 Old_age Offline - 0
4 Start_Stop_Count 0x0030 100 100 000 Old_age Offline - 0
5 Reallocated_Sector_Ct 0x0032 100 100 000 Old_age Always - 0
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 21035
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 5292
170 Reserve_Block_Count 0x0033 100 100 010 Pre-fail Always - 0
171 Program_Fail_Count 0x0032 100 100 000 Old_age Always - 0
172 Erase_Fail_Count 0x0032 100 100 000 Old_age Always - 169
183 SATA_Downshift_Count 0x0030 100 100 000 Old_age Offline - 3
184 End-to-End_Error 0x0032 100 100 090 Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
192 Unsafe_Shutdown_Count 0x0032 100 100 000 Old_age Always - 462
199 CRC_Error_Count 0x0030 100 100 000 Old_age Offline - 0
225 Host_Writes_32MiB 0x0032 100 100 000 Old_age Always - 1370316
226 Workld_Media_Wear_Indic 0x0032 100 100 000 Old_age Always - 2206583
227 Workld_Host_Reads_Perc 0x0032 100 100 000 Old_age Always - 49
228 Workload_Minutes 0x0032 100 100 000 Old_age Always - 13857327
232 Available_Reservd_Space 0x0033 100 100 010 Pre-fail Always - 0
233 Media_Wearout_Indicator 0x0032 097 097 000 Old_age Always - 0
^^ Almost new. I have a PDF from Intel explaining this value somewhere.
The Intel SSD 320 had more free space than the Crucial m500 for a good part
of their usage.
241 Host_Writes_32MiB 0x0032 100 100 000 Old_age Always - 1370316
^^ 1370316 * 32 / 1024 ~= 42822 GiB
242 Host_Reads_32MiB 0x0032 100 100 000 Old_age Always - 2016560
The Intel SSD is in use for a longer time, since May 2011.
# Intel SSD 320 after secure erase:
Interestingly the secure erase nuked the SMART values:
SMART Attributes Data Structure revision number: 5
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
3 Spin_Up_Time 0x0020 100 100 000 Old_age Offline - 0
4 Start_Stop_Count 0x0030 100 100 000 Old_age Offline - 0
5 Reallocated_Sector_Ct 0x0032 100 100 000 Old_age Always - 0
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 3
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 6726
170 Reserve_Block_Count 0x0033 100 100 010 Pre-fail Always - 0
171 Program_Fail_Count 0x0032 100 100 000 Old_age Always - 0
172 Erase_Fail_Count 0x0032 100 100 000 Old_age Always - 0
183 SATA_Downshift_Count 0x0030 100 100 000 Old_age Offline - 0
184 End-to-End_Error 0x0032 100 100 090 Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
192 Unsafe_Shutdown_Count 0x0032 100 100 000 Old_age Always - 537
199 CRC_Error_Count 0x0030 100 100 000 Old_age Offline - 0
225 Host_Writes_32MiB 0x0032 100 100 000 Old_age Always - 5768
226 Workld_Media_Wear_Indic 0x0032 100 100 000 Old_age Always - 65535
227 Workld_Host_Reads_Perc 0x0032 100 100 000 Old_age Always - 65535
228 Workload_Minutes 0x0032 100 100 000 Old_age Always - 65535
232 Available_Reservd_Space 0x0033 100 100 010 Pre-fail Always - 0
233 Media_Wearout_Indicator 0x0032 100 100 000 Old_age Always - 0
241 Host_Writes_32MiB 0x0032 100 100 000 Old_age Always - 5768
242 Host_Reads_32MiB 0x0032 100 100 000 Old_age Always -
Good for selling it. You could claim it is all fresh and new :)
# Samsung Pro 860
Note this SSD is almost new – smartctl 6.6 2016-05-31 does not know about
one attribute. I am not sure why the command is so old in Debian Sid:
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
9 Power_On_Hours 0x0032 099 099 000 Old_age Always - 50
12 Power_Cycle_Count 0x0032 099 099 000 Old_age Always - 26
177 Wear_Leveling_Count 0x0013 100 100 000 Pre-fail Always - 0
^^ new :)
179 Used_Rsvd_Blk_Cnt_Tot 0x0013 100 100 010 Pre-fail Always - 0
181 Program_Fail_Cnt_Total 0x0032 100 100 010 Old_age Always - 0
182 Erase_Fail_Count_Total 0x0032 100 100 010 Old_age Always - 0
183 Runtime_Bad_Block 0x0013 100 100 010 Pre-fail Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0032 065 052 000 Old_age Always - 35
195 Hardware_ECC_Recovered 0x001a 200 200 000 Old_age Always - 0
199 UDMA_CRC_Error_Count 0x003e 100 100 000 Old_age Always - 0
235 Unknown_Attribute 0x0012 099 099 000 Old_age Always - 1
241 Total_LBAs_Written 0x0032 099 099 000 Old_age Always - 1133999775
According to references on the internet, sectors are meant here, so:
1133999775 * 512 / 1024 / 1024 / 1024 ~= 541 GiB
% smartctl /dev/sda --all |grep "Sector Size"
Sector Size: 512 bytes logical/physical
> But if you have numbers for both, you know the write amplification of
> the drive (and its past workload).
Sure.
> If there is compression at work, you'd see the 2nd number being
> somewhat, or significantly lower -- and barely increase at all, if
> you write highly compressible data. This is not typically observed on
> modern SSDs, except maybe when writing zeroes. Writes to flash will
> be the same as writes from host, or most often somewhat higher, as
> the hardware can typically erase flash only in chunks of 2MB or so,
> hence there's quite a bit of under the hood reorganizing going on.
> Also as a result depending on workloads the "to flash" number can be
> much higher than "from host".
Okay, I get that, but it would be quite some effort to make reliable
measurements cause you´d need to write quite some amount of data
for the media wearout indicator to change. I do not intend to do that.
> Point is, even when the SATA link is maxed out in both cases, you can
> still check if there's compression at work via using those SMART
> attributes.
Sure. But with quite some effort. And with some aging of the SSDs involved.
I can imagine better uses of my time :)
> > In any case: It was an experience report, not a request for help, so I
> > don't see why exact error messages are absolutely needed. If I had
> > a support inquiry that would be different, I agree.
>
> Well, when reading such stories (involving software that I also use) I
> imagine what if I had been in that situation myself, what would I do,
> would I have anything else to try, do I know about any workaround for
> this. And without any technical details to go from, those are all
> questions left unanswered.
Sure, I get that.
My priority was to bring the machine back online. I managed to put the
screen log on a filesystem I destroyed afterwards, and I put it there after
the backup of that filesystem was already complete… so c'est la vie, the
log is gone. But even if I still had it, I probably would not have included
all error messages. But I would have been able to provide those you
are interested in. Anyway, it's gone and that is it.
Thanks,
--
Martin