* SSD erase state and reducing SSD wear
From: Martin @ 2012-05-22 21:47 UTC
To: linux-btrfs
I've got two recent examples of SSDs. Their pristine state from the
manufacturer shows:
Device Model: OCZ-VERTEX3
# hexdump -C /dev/sdd
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
1bf2976000
Device Model: OCZ VERTEX PLUS
(OCZ VERTEX 2E)
# hexdump -C /dev/sdd
00000000  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
*
df99e6000
What's a good way to test what state they get erased to by a TRIM
operation?
Can btrfs detect the erase state and pad unused space in filesystem
writes with the same value so as to reduce SSD wear?
Regards,
Martin
* Re: SSD erase state and reducing SSD wear
From: Calvin Walton @ 2012-05-23 4:19 UTC
To: Martin; +Cc: linux-btrfs
On Tue, 2012-05-22 at 22:47 +0100, Martin wrote:
> I've got two recent examples of SSDs. Their pristine state from the
> manufacturer shows:
> Device Model: OCZ-VERTEX3
> 00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> Device Model: OCZ VERTEX PLUS
> 00000000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> What's a good way to test what state they get erased to by a TRIM
> operation?
This pristine state probably matches up with the result of a trim
command on the drive. In particular, a freshly erased flash block is in
a state where the bits are all 1, so the Vertex Plus drive is showing
you the flash contents directly. The Vertex 3 has substantially more
processing, and the 0s are effectively generated on the fly for unmapped
flash blocks (similar to how the missing portions of a sparse file
contain 0s).
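One rough way to see what the drive actually returns after a TRIM (a
sketch only: it assumes a util-linux recent enough to ship blkdiscard,
the 1 GiB offset is arbitrary, and the commands destroy whatever is
stored in that 1 MiB range of /dev/sdd).
First check what the drive claims (look for "Deterministic read ZEROs
after TRIM"):
# hdparm -I /dev/sdd | grep -i trim
Then write a recognisable pattern into a 1 MiB region, discard it, and
read it back around the page cache:
# dd if=/dev/urandom of=/dev/sdd bs=1M count=1 seek=1024 oflag=direct
# blkdiscard --offset $((1024*1024*1024)) --length $((1024*1024)) /dev/sdd
# dd if=/dev/sdd bs=1M count=1 skip=1024 iflag=direct | hexdump -C | head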
> Can btrfs detect the erase state and pad unused space in filesystem
> writes with the same value so as to reduce SSD wear?
On the Vertex 3, this wouldn't actually do what you'd hope. The firmware
in that drive actually compresses, deduplicates, and encrypts all the
data prior to writing it to flash - and as a result the data that hits
the flash looks nothing like what the filesystem wrote.
(For best performance, it might make sense to disable btrfs's built-in
compression on the Vertex 3 drive to allow the drive's compression to
kick in. Let us know if you benchmark it either way.)
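If anyone does benchmark it, a minimal sketch of the comparison (the
partition, mount point and test file below are placeholders, and the
result will depend heavily on how compressible the test data is):
# mount -o ssd /dev/sdd1 /mnt
# time sh -c 'cp testfile /mnt/ && sync'
# umount /mnt
# mount -o ssd,compress=lzo /dev/sdd1 /mnt
# time sh -c 'cp testfile /mnt/ && sync'
Without a compress option btrfs writes the data as-is, so the SandForce
controller sees the raw stream; with compress=lzo the drive mostly sees
already-compressed data.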
The benefit to doing this on the Vertex Plus is probably fairly small,
since to rewrite a block - even if the block is partially unwritten - is
still likely to require a read-modify-write cycle with an erase step.
The granularity of the erase blocks is just too big for the savings to
be very meaningful.
--
Calvin Walton <calvin.walton@kepstin.ca>
* Re: SSD erase state and reducing SSD wear
From: Martin @ 2012-05-23 15:44 UTC
To: linux-btrfs
On 23/05/12 05:19, Calvin Walton wrote:
> On Tue, 2012-05-22 at 22:47 +0100, Martin wrote:
>> I've got two recent examples of SSDs. Their pristine state from the
>> manufacturer shows:
>
>> Device Model: OCZ-VERTEX3
>> 00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
>> Device Model: OCZ VERTEX PLUS
>> 00000000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>
>> What's a good way to test what state they get erased to by a TRIM
>> operation?
>
> This pristine state probably matches up with the result of a trim
> command on the drive. In particular, a freshly erased flash block is in
> a state where the bits are all 1, so the Vertex Plus drive is showing
> you the flash contents directly. The Vertex 3 has substantially more
> processing, and the 0s are effectively generated on the fly for unmapped
> flash blocks (similar to how the missing portions of a sparse file
> contain 0s).
So for that example of reading an 'empty' drive, the OCZ-VERTEX3 might
not even be reading the flash chips at all!...
>> Can btrfs detect the erase state and pad unused space in filesystem
>> writes with the same value so as to reduce SSD wear?
>
> On the Vertex 3, this wouldn't actually do what you'd hope. The firmware
> in that drive actually compresses, deduplicates, and encrypts all the
> data prior to writing it to flash - and as a result the data that hits
> the flash looks nothing like what the filesystem wrote.
> (For best performance, it might make sense to disable btrfs's built-in
> compression on the Vertex 3 drive to allow the drive's compression to
> kick in. Let us know if you benchmark it either way.)
Very good comment, thanks. That leaves the question of how the
Sandforce controller uses the flash. Does it implement its own 'virtual
block level' interface and then manage the underlying flash with
structures that are not visible externally?
What does that do to concerns about alignment?...
And for what granularity of write chunks?
> The benefit to doing this on the Vertex Plus is probably fairly small,
> since to rewrite a block - even if the block is partially unwritten - is
> still likely to require a read-modify-write cycle with an erase step.
> The granularity of the erase blocks is just too big for the savings to
> be very meaningful.
My understanding is that the 'wear' mechanism in flash is a problem of
charge getting trapped in the insulation material itself that surrounds
the floating gate of a cell. The permanently trapped charge accumulates
further for each change of state until a high enough offset voltage has
accumulated to exceed what can be tolerated for correct operation of the
cell.
Hence, writing the *same value* as that already stored in a cell
should not cause any wear, since you are not changing the state of the
cell. (No change in charge levels.)
For non-Sandforce controllers, that suggests doing a read-modify-write
to pad out whatever the minimum-sized write chunk is. That would be
rather poor for performance, and the manufacturer's secrecy means we
cannot be sure of the underlying write block size for minimum-sized
alignment.
Alternatively, padding out writes with the erased-state value means that
no further wear should be caused when that block is eventually
TRIMed/erased for rewriting.
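As a userspace illustration only (the 4 KiB block size, 512-byte
payload and 0xff erase state are assumptions here, and a real
implementation would have to do this inside the filesystem's write
path), such a padded block could be built and inspected like this:
# ( head -c 512 /dev/urandom; head -c 3584 /dev/zero | tr '\0' '\377' ) > block.bin
# hexdump -C block.bin | tail -4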
That should also be a 'soft' option for the Sandforce controllers in
that /hopefully/ their compression/deduplication will compress down the
padding so as not to be a problem.
(Damn the manufacturer's secrecy!)
Regards,
Martin
* Re: SSD erase state and reducing SSD wear
From: Calvin Walton @ 2012-05-23 19:50 UTC
To: Martin; +Cc: linux-btrfs
On Wed, 2012-05-23 at 16:44 +0100, Martin wrote:
> On 23/05/12 05:19, Calvin Walton wrote:
> > On Tue, 2012-05-22 at 22:47 +0100, Martin wrote:
> >> I've got two recent examples of SSDs. Their pristine state from the
> >> manufacturer shows:
> >
> >> Device Model: OCZ-VERTEX3
> >> 00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >
> >> Device Model: OCZ VERTEX PLUS
> >> 00000000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> >> Can btrfs detect the erase state and pad unused space in filesystem
> >> writes with the same value so as to reduce SSD wear?
> > The benefit to doing this on the Vertex Plus is probably fairly small,
> > since to rewrite a block - even if the block is partially unwritten - is
> > still likely to require a read-modify-write cycle with an erase step.
> > The granularity of the erase blocks is just too big for the savings to
> > be very meaningful.
>
> My understanding is that the 'wear' mechanism in flash is a problem of
> charge getting trapped in the insulation material itself that surrounds
> the floating gate of a cell. The permanently trapped charge accumulates
> further for each change of state until a high enough offset voltage has
> accumulated to exceed what can be tolerated for correct operation of the
> cell.
>
> Hence, writing the *same value* as that already stored in a cell
> should not cause any wear, since you are not changing the state of the
> cell. (No change in charge levels.)
>
> For non-Sandforce controllers, that suggests doing a read-modify-write
> to pad out whatever the minimum-sized write chunk is. That would be
> rather poor for performance, and the manufacturer's secrecy means we
> cannot be sure of the underlying write block size for minimum-sized
> alignment.
It's very unlikely that the firmware in any modern high-performance SSD
would ever do an in-place read-modify-write sequence. If you write data
to the same sector on the disk twice, the drive is more likely to
write it to two different places in the flash.
A flash erase block typically won't be re-used until all of the data
that had been in it gets rewritten somewhere else. The Indilinx
controller in the Vertex 1 drives has a garbage collector that runs in
the background to look for flash erase blocks that have been partially
rewritten, and consolidate the remaining data from multiple blocks into
one block to free new space for future writing.
> Alternatively, padding out writes with the erased-state value means that
> no further wear should be caused when that block is eventually
> TRIMed/erased for rewriting.
It is certainly possible that this could be the case. The difference is
likely to be fairly minimal. But unless you are an SSD manufacturer,
you'll probably never know how much actual difference it would make :)
--
Calvin Walton <calvin.walton@kepstin.ca>