From: Gordan Bobic <gordan@bobich.net>
To: cwillu <cwillu@cwillu.com>
Cc: sander@humilis.net, jarktasaa@gmail.com, linux-btrfs@vger.kernel.org
Subject: Re: SSD optimizations
Date: Mon, 13 Dec 2010 16:48:19 +0000 [thread overview]
Message-ID: <4D064E53.6050207@bobich.net> (raw)
In-Reply-To: <AANLkTinpD+CXsHMB10DmXm-jF5a_rCUJ-tkeVthy4FYV@mail.gmail.com>
On 13/12/2010 15:17, cwillu wrote:
>>>>> In a few weeks parts for my new computer will be arriving. The storage
>>>>> will be a 128GB SSD. A few weeks after that I will order three large
>>>>> disks for a RAID array. I understand that BTRFS RAID 5 support will be
>>>>> available shortly. What is the best possible way for me to get the
>>>>> highest performance out of this setup? I know of the option to optimize
>>>>> for SSDs.
>>>>
>>>> BTRFS is hardly the best option for SSDs. I typically use ext4
>>>> without a journal on SSDs, or ext2 if that is not available.
>>>> Journalling causes more writes to hit the disk, which wears out
>>>> flash faster. Plus, SSDs typically have much slower writes than
>>>> reads, so avoiding writes is a good thing.
>>>
>>> Gordan, this you wrote is so wrong I don't even know where to begin.
>>>
>>> You'd better google a bit on the subject (ssd, and btrfs on ssd), as much
>>> is written about it already.
>>
>> I suggest you back your opinion up with some hard data before making such
>> statements. Here's a quick test: make an ext2 fs and a btrfs on two similar
>> disk partitions (any disk; for the sake of the experiment it doesn't have to
>> be an SSD), then check vmstat -d to get a baseline. Then put the kernel
>> sources on each, do a full build, then make clean, and check vmstat -d
>> again. See how many writes (sectors) hit the disk with ext2 and how many
>> with btrfs. You'll find that there were many more writes with btrfs. You
>> can't go faster when doing more. Journalling is expensive.
>
> Of course. But that applies to rotating media as well (where the
> seeks involved hurt much more), and has little if anything to do with
> why you would use btrfs instead of ext2.
Indeed - btrfs is about features, most specifically the checksumming that
allows smart recovery from disk media failure. But on flash, write
volumes are something that shouldn't be ignored.
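The experiment described above boils down to sampling the kernel's
per-device write counters before and after a workload. A minimal sketch
(the device name "sda1" is an assumption; substitute your own partition):

```python
# Sketch of the vmstat -d experiment: sample the cumulative
# sectors-written counter for a partition, run the workload, sample
# again, and compare. Reads field 10 of /proc/diskstats, which is the
# same counter vmstat -d reports as "sectors" under "writes".

def write_sectors(dev, path="/proc/diskstats"):
    """Return cumulative sectors written to `dev`, or raise if absent."""
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) > 9 and fields[2] == dev:
                return int(fields[9])
    raise ValueError("device %r not found in %s" % (dev, path))

# before = write_sectors("sda1")   # baseline
# ... untar kernel sources, full build, make clean ...
# after = write_sectors("sda1")
# print("sectors written:", after - before)
```

Run the same steps on the ext2 partition and the btrfs partition and
compare the two deltas.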
> Good ssd drives (by which I mean anything but consumer flash as it
> exists on sd cards and usb sticks) have very good wear leveling, good
> enough that you could overwrite the same logical sector billions of
> times before you'd experience any failure due to wear.
It comes down to volumes even in the best-case scenario. A _very_ good
SSD (e.g. Intel) might get write amplification down to about 1.2:1, but
more typical figures are in the region of 10-20:1. Every write that can
be avoided, should be avoided.
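To put rough numbers on that, here is a back-of-envelope sketch; the
capacity, P/E cycle count and daily host write volume are illustrative
assumptions, not measurements of any particular drive:

```python
# Illustrative arithmetic only: all figures below are assumptions.

def lifetime_days(capacity_gb, pe_cycles, write_amp, host_gb_per_day):
    """Days until the flash's raw program/erase budget is exhausted."""
    endurance_gb = capacity_gb * pe_cycles          # total flash writes available
    flash_gb_per_day = host_gb_per_day * write_amp  # what actually hits the flash
    return endurance_gb / flash_gb_per_day

# 128GB drive, 3000 P/E cycles, 20GB/day of host writes:
good = lifetime_days(128, 3000, 1.2, 20)   # WA 1.2:1 -> 16000 days
poor = lifetime_days(128, 3000, 15.0, 20)  # WA 15:1  ->  1280 days
```

The same host workload wears out the poorly-amplifying drive more than
an order of magnitude faster, which is why avoiding writes matters.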
> The issues
> with cheaper ssd drives (which I distinguish from things like sd
> cards) are uniformly performance degradation due to crappy garbage
> collection and lack of trim support to compensate. A journal is _not_
> a problem here.
The journal doesn't help. It can cause more than a 50% overhead on
metadata-heavy operations.
> On crappy flash, yes, you want to avoid a journal, mainly because the
> write leveling for a given sector only occurs over a fixed small
> number of erase blocks, resulting in a filesystem that you can burn
> out quite easily - I have a small pile of sd cards on my desk that I
> sent to such a fate. Even here there is reason to use btrfs. The
> journaling performed is much less strenuous than ext3/4's: it's
> basically just a version stamp, as opposed to actually journaling the
> metadata involved. The actual metadata writes, being copy-on-write,
> provide pretty much the best case for crappy flash, as cow inherently
> wear-levels over the entire device (ssd_spread). To say nothing of
> checksums and duplicated metadata, allowing you to actually determine
> if you're running into corrupted metadata, and often recover from it
> transparently. Ext2's behavior in this respect is less than ideal.
I'm not disputing that, but the OP was talking about using the SSD as a
cache for a slower disk subsystem. That is likely to wear out the SSD
pretty quickly purely by volume of writes, regardless of how good the
wear leveling is. That may be fine on a setup where the SSD is treated
as a disposable, throw-away cache item that doesn't lose you data when
it goes wrong, but what was being discussed isn't an expensive
enterprise-grade setup that behaves that way.
Gordan