From: Chris Mason <chris.mason@oracle.com>
To: Gordan Bobic <gordan@bobich.net>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: SSD Optimizations
Date: Thu, 11 Mar 2010 11:29:07 -0500
Message-ID: <20100311162907.GI6509@think>
In-Reply-To: <fdd915134f8941e4f2087fc2f4cf4e79@localhost>

On Thu, Mar 11, 2010 at 04:18:48PM +0000, Gordan Bobic wrote:
> On Thu, 11 Mar 2010 09:21:30 -0500, Chris Mason <chris.mason@oracle.com>
> wrote:
> > On Wed, Mar 10, 2010 at 07:49:34PM +0000, Gordan Bobic wrote:
> >> I'm looking to try BTRFS on a SSD, and I would like to know what SSD
> >> optimizations it applies. Is there a comprehensive list of what ssd
> >> mount option does? How are the blocks and metadata arranged? Are
> >> there options available comparable to ext2/ext3 to help reduce wear
> >> and improve performance?
> >>
> >> Specifically, on ext2 (journal means more writes, so I don't use
> >> ext3 on SSDs, since fsck typically only takes a few seconds when
> >> access time is < 100us), I usually apply the
> >> -b 4096 -E stripe-width = (erase_block/4096)
> >> parameters to mkfs in order to reduce the multiple erase cycles on
> >> the same underlying block.
> >>
> >> Are there similar optimizations available in BTRFS?
> >
> > All devices (raid, ssd, single spindle) tend to benefit from big chunks
> > of writes going down close together on disk. This is true for different
> > reasons on each one, but it is still the easiest way to optimize writes.
> > COW filesystems like btrfs are very well suited to send down lots of big
> > writes because we're always reallocating things.
>
> Doesn't this mean _more_ writes? If that's the case, then that would make
> btrfs a _bad_ choice for flash based media with limited write cycles.

It just means that when we do write, we don't overwrite the existing
data in the file. We allocate a new block instead and write there
(freeing the old one).

This gives us a lot of control over grouping writes together, instead of
being restricted to the layout from when the file was first created.
It also fragments the files much more, but this isn't an issue on ssd.
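
To make the point above concrete, here is a toy model of copy-on-write
allocation. It is only a sketch: the class ToyCowFile and its bump
allocator are made up for illustration and are not how the btrfs
allocator works. It shows that scattered logical updates can still go
down as one contiguous run of physical writes.

```python
class ToyCowFile:
    """Toy model of copy-on-write block allocation (NOT the btrfs allocator)."""

    def __init__(self, nblocks, first_free):
        # Logical block i initially lives at physical block i.
        self.mapping = list(range(nblocks))
        self.next_free = first_free  # hypothetical bump allocator
        self.freed = []              # physical blocks released by rewrites

    def write(self, logical_blocks):
        """Rewrite the given logical blocks; return the physical blocks written."""
        written = []
        for lb in logical_blocks:
            self.freed.append(self.mapping[lb])  # old copy is released
            self.mapping[lb] = self.next_free    # new copy goes to fresh space
            written.append(self.next_free)
            self.next_free += 1
        return written


f = ToyCowFile(nblocks=100, first_free=1000)
physical = f.write([3, 57, 12, 90])
print(physical)  # physical blocks actually written
print(f.freed)   # old locations that became free
```

The logical blocks are far apart, but the writes land next to each
other, and the file's mapping ends up fragmented as a result.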
>
> > For traditional storage, we also need to keep blocks from one file (or
> > files in a directory) close together to reduce seeks during reads. SSDs
> > have no such restrictions, and so the mount -o ssd related options in
> > btrfs focus on tossing out tradeoffs that slow down writes in hopes of
> > reading faster later.
> >
> > Someone already mentioned the mount -o ssd and ssd_spread options.
> > Mount -o ssd is targeted at faster SSD that is good at wear leveling and
> > generally just benefits from having a bunch of data sent down close
> > together. In mount -o ssd, you might find a write pattern like this:
> >
> > block N, N+2, N+3, N+4, N+6, N+7, N+16, N+17, N+18, N+19, N+20 ...
> >
> > It's a largely contiguous chunk of writes, but there may be gaps. Good
> > ssds don't really care about the gaps, and they benefit more from the
> > fact that we're preferring to reuse blocks that had once been written
> > than to go off and find completely contiguous areas of the disk to
> > write (which are more likely to have never been written at all).
> >
> > mount -o ssd_spread is much more strict. You'll get N, N+1, N+2, N+3,
> > N+4 etc because crummy ssds really do care about the gaps.
> >
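
As a small illustration of the two patterns quoted above, the sketch
below lists the gaps in a run of written blocks. The helper name gaps()
is made up for this example; it only inspects a block list and says
nothing about how the allocator chooses blocks.

```python
def gaps(blocks):
    """Return (prev, next) pairs where consecutive written blocks are not adjacent."""
    return [(a, b) for a, b in zip(blocks, blocks[1:]) if b != a + 1]


N = 0  # arbitrary starting block for the example
ssd_pattern = [N, N + 2, N + 3, N + 4, N + 6, N + 7,
               N + 16, N + 17, N + 18, N + 19, N + 20]
contiguous = list(range(N, N + 11))

print(gaps(ssd_pattern))  # the largely contiguous pattern still has holes
print(gaps(contiguous))   # a strictly contiguous run has none
```
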
> > Now, btrfs could go off and probe for the erasure size and work very
> > hard to align things to it. As someone said, alignment of the partition
> > table is very important here as well. But for modern ssd this generally
> > matters much less than just doing big ios and letting the little log
> > structured squirrel inside the device figure things out.
>
> Thanks, that's quite helpful. Can you provide any insight into alignment
> of FS structures in such a way that they do not straddle erase block
> boundaries?
We align on 4k (but partition alignment can defeat this). We don't
attempt to understand or guess at erasure blocks. Unless the filesystem
completely takes over the FTL duties, I don't think it makes sense to do
more than send large writes whenever we can.
The raid 5/6 patches will add more knobs for strict alignment, but I'd
be very surprised if they made a big difference on modern ssd.
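
To show how partition alignment can defeat the 4k alignment, here is a
rough sketch. The 512-byte sector size, the helper is_aligned() and the
128 KiB erase block are assumptions for the example, not values the
filesystem knows about or uses.

```python
SECTOR = 512      # bytes per logical sector (assumed)
FS_ALIGN = 4096   # the 4k alignment mentioned above


def is_aligned(start_sector, fs_offset, boundary=FS_ALIGN):
    """True if a filesystem offset lands on `boundary` on the raw device."""
    return (start_sector * SECTOR + fs_offset) % boundary == 0


# A partition starting at sector 63 (old DOS-style layout) versus one
# starting at sector 2048 (1 MiB), checked for a 4k-aligned fs offset.
for start in (63, 2048):
    print(start, is_aligned(start, 8 * FS_ALIGN))

# Same check against a hypothetical 128 KiB erase block.
print(is_aligned(2048, 0, boundary=128 * 1024))
```

With a start at sector 63, every 4k-aligned offset inside the filesystem
is shifted off a 4k boundary on the device.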
>
> > For trim, we do have mount -o discard. It does introduce a run time
> > performance hit (this varies wildly from device to device) and we're
> > tuning things as discard capable devices become more common. If anyone
> > is looking for a project it would be nice to have an ioctl that triggers
> > free space discards in bulk.
>
> Are you saying that -o discard implements trim support?
Yes, it sends trim/discards down to devices that support it.
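
One way to check whether a device advertises discard support is the
block queue attribute discard_max_bytes in sysfs, where 0 means no
discard. This is a sketch under assumptions: it relies on the sysfs
layout of current Linux kernels, the function name supports_discard()
is made up, and the demo uses a fake sysfs tree so it runs anywhere.

```python
import tempfile
from pathlib import Path


def supports_discard(dev, sysfs_root="/sys/block"):
    """Return True if the block layer reports a nonzero discard_max_bytes for `dev`."""
    attr = Path(sysfs_root) / dev / "queue" / "discard_max_bytes"
    try:
        return int(attr.read_text().strip()) > 0
    except (OSError, ValueError):
        return False  # attribute missing or unreadable: treat as unsupported


# Demo against a fake sysfs tree (device names and values are invented).
with tempfile.TemporaryDirectory() as root:
    for name, value in (("fake_ssd", "2147450880"), ("fake_hdd", "0")):
        queue_dir = Path(root) / name / "queue"
        queue_dir.mkdir(parents=True)
        (queue_dir / "discard_max_bytes").write_text(value + "\n")
    results = {d: supports_discard(d, root)
               for d in ("fake_ssd", "fake_hdd", "missing")}

print(results)
```

On a real system, call supports_discard("sda") (or whichever device)
with the default sysfs_root.
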
-chris