From: Chris Mason <chris.mason@oracle.com>
To: Hubert Kario <hka@qbs.com.pl>
Cc: Gordan Bobic <gordan@bobich.net>, linux-btrfs@vger.kernel.org
Subject: Re: SSD Optimizations
Date: Thu, 11 Mar 2010 20:42:48 -0500
Message-ID: <20100312014247.GB3035@think>
In-Reply-To: <201003120207.40740.hka@qbs.com.pl>
On Fri, Mar 12, 2010 at 02:07:40AM +0100, Hubert Kario wrote:
> > > For example - you have a disk that has had all its addressable blocks
> > > tainted. A new write comes in - what do you do with it? Worse, a write
> > > comes in spanning two erase blocks as a consequence of the data
> > > re-alignment in the firmware. You have no choice but to wipe them both
> > > and re-write the data. You'd be better off not doing the magic and
> > > assuming that the FS is sensibly aligned.
> >
> > Ok, how exactly would the FS help here? We have a device with a 256KiB
> > erasure size, and userland does a 4KiB write followed by an fsync.
>
> I assume here that the FS knows about erasure size and does implement TRIM.
>
> > If the FS were to be smart and know about the 256KiB requirement, it
> > would do a read/modify/write cycle somewhere and then write the 4KiB.
>
> If all the free blocks have been TRIMmed, the FS should pick a completely free
> erase block and write those 4KiB of data.
>
> A correct implementation of wear leveling in the drive should notice that the
> write is entirely inside a free block and perform just a write cycle, padding
> the end of the supplied data with zeros.
>
> > The underlying implementation is the same in the device. It picks a
> > destination, reads it then writes it back. You could argue (and many
> > people do) that this operation is risky and has a good chance of
> > destroying old data. Perhaps we're best off if the FS does the rmw
> > cycle instead into an entirely safe location.
>
> And IMO that's the idea behind TRIM -- not to force the device to do rmw
> cycles, only write or erase cycles, provided there's free space and the
> free space doesn't have considerably more write cycles than the already
> allocated data.
>
> >
> > It's a great place for research and people are definitely looking at it.
> >
> > But with all of that said, it has nothing to do with alignment or trim.
> > Modern SSDs are RAID devices with a large stripe size, and someone
> > somewhere is going to do a read/modify/write to service any small write.
> > You can force this up to the FS or the application, it'll happen
> > somewhere.
>
> Yes, and if the partition is full, rmw will happen in the drive. But if the
> partition is far from full and free space is TRIMmed, then the r/m/w cycle
> will happen inside btrfs and the SSD won't have to do its magic -- making the
> process faster.
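For the quoted point about a write that ends up spanning two erase blocks, here is a minimal sketch of the alignment arithmetic, assuming the 256KiB erase block used in this thread. erase_blocks_touched is a hypothetical helper for illustration, not btrfs or firmware code.

```python
# Hypothetical helper, not btrfs or firmware code: which erase blocks
# does a write of `length` bytes at byte `offset` land in?
ERASE_BLOCK = 256 * 1024  # 256KiB, the erase block size assumed in this thread


def erase_blocks_touched(offset: int, length: int, erase_block: int = ERASE_BLOCK) -> range:
    """Return the indices of the erase blocks covered by [offset, offset + length)."""
    if length <= 0:
        return range(0)
    first = offset // erase_block
    last = (offset + length - 1) // erase_block
    return range(first, last + 1)


if __name__ == "__main__":
    # An aligned 4KiB write stays inside a single erase block.
    print(list(erase_blocks_touched(0, 4096)))  # [0]
    # The same 4KiB write shifted to straddle a 256KiB boundary touches two,
    # so whoever services it has to rewrite both erase blocks.
    print(list(erase_blocks_touched(ERASE_BLOCK - 2048, 4096)))  # [0, 1]
```

Misalignment does not change how much data the caller writes, only how many erase blocks have to be rewritten to absorb it.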
The filesystem cannot do read/modify/write faster or better than the
drive. The drive is moving data around internally, while the FS has to
pull it in and push it back out over the SATA bus. The drive is much faster.
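To put rough numbers on the bus cost, here is a back-of-the-envelope sketch. It assumes a 256KiB erase block, a 4KiB write that fits inside one erase block, and that an FS-side rmw must read the whole old erase block and write the whole merged one back. host_bus_bytes is a hypothetical model, not a measurement of any real drive.

```python
# Back-of-the-envelope model (an assumption, not a measurement): how many bytes
# cross the host bus to service one small write that fits in one erase block.
ERASE_BLOCK = 256 * 1024  # 256KiB erase block
SMALL_WRITE = 4 * 1024    # 4KiB write followed by fsync


def host_bus_bytes(write_len: int, erase_block: int, fs_does_rmw: bool) -> int:
    if write_len > erase_block:
        raise ValueError("model only covers writes that fit in one erase block")
    if fs_does_rmw:
        # The FS reads the old erase block in, patches it in memory,
        # then writes the whole merged erase block back out.
        return 2 * erase_block
    # The drive merges internally; only the new data crosses the bus.
    return write_len


if __name__ == "__main__":
    fs = host_bus_bytes(SMALL_WRITE, ERASE_BLOCK, fs_does_rmw=True)
    drive = host_bus_bytes(SMALL_WRITE, ERASE_BLOCK, fs_does_rmw=False)
    print(fs, drive, fs // drive)  # 524288 4096 128
```

Under these assumptions the FS-side cycle moves 128 times as much data across the bus as handing the 4KiB write to the drive, before counting any time spent inside the flash itself.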
The FS can be safer than the drive because it is able to do more
consistency checks on the data as it reads. But this also has a cost,
because the CRCs for the blocks might not be stored next to the blocks
themselves, so verifying them can take extra reads.
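A minimal sketch of what a checksum-verified FS-side rmw could look like, assuming 4KiB blocks and using zlib.crc32 as a stand-in for the real checksum (btrfs uses crc32c, which the Python standard library does not provide). verified_rmw and its parameters are hypothetical; the checksums are passed as a separate list to mirror the point that they are not necessarily stored next to the data.

```python
import zlib

BLOCK = 4096  # 4KiB filesystem block (assumed)


def verified_rmw(old_blocks: list, stored_crcs: list, index: int, new_data: bytes) -> list:
    """Check every block read back against its stored checksum, then splice in new_data.

    zlib.crc32 is only a stand-in for the filesystem's real checksum.
    stored_crcs is kept separate from old_blocks because the checksums are
    not necessarily stored next to the data they cover.
    """
    if len(new_data) != BLOCK:
        raise ValueError("new_data must be exactly one block")
    for i, blk in enumerate(old_blocks):
        if zlib.crc32(blk) != stored_crcs[i]:
            # Refuse to rewrite an erase block whose old contents are already bad.
            raise OSError(f"checksum mismatch in block {i}; refusing to rewrite")
    merged = list(old_blocks)
    merged[index] = new_data
    return merged


if __name__ == "__main__":
    old = [bytes([i]) * BLOCK for i in range(4)]
    crcs = [zlib.crc32(b) for b in old]
    merged = verified_rmw(old, crcs, 2, b"\xff" * BLOCK)
    print(merged[2] == b"\xff" * BLOCK)  # True
```

The extra safety comes from refusing to propagate corrupt old data into the rewritten block; the extra cost is fetching and checking every checksum before the write can go out.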
If the FS is the FTL (flash translation layer), we don't need TRIM because
the FS already knows which blocks are in use. So there isn't as much
complexity in finding a free erase block. The FS FTL does win there.
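As a sketch of why an FS acting as the FTL can skip TRIM, assume its allocation state is available as a flat list of per-4KiB-block "in use" flags, with 64 blocks per 256KiB erase block. The in_use list and find_free_erase_block are hypothetical stand-ins for whatever allocation metadata a real filesystem keeps.

```python
BLOCKS_PER_ERASE = 256 // 4  # 256KiB erase block / 4KiB fs block = 64 (assumed)


def find_free_erase_block(in_use: list, blocks_per_erase: int = BLOCKS_PER_ERASE):
    """Return the index of the first erase block with no allocated blocks, or None.

    in_use is a hypothetical per-block allocation map that the FS already
    maintains, so no TRIM is needed to learn which erase blocks are empty.
    """
    for eb in range(len(in_use) // blocks_per_erase):
        start = eb * blocks_per_erase
        if not any(in_use[start:start + blocks_per_erase]):
            return eb
    return None


if __name__ == "__main__":
    in_use = [False] * (3 * BLOCKS_PER_ERASE)
    in_use[5] = True                      # erase block 0 holds live data
    in_use[BLOCKS_PER_ERASE + 10] = True  # so does erase block 1
    print(find_free_erase_block(in_use))  # 2
```

A linear scan keeps the sketch short; a real allocator would more likely keep a per-erase-block count of live blocks so a free one can be found without rescanning.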
-chris