From: Gordan Bobic
Subject: Re: SSD optimizations
Date: Mon, 13 Dec 2010 16:48:19 +0000
Message-ID: <4D064E53.6050207@bobich.net>
References: <1292174654.11248.10.camel@paddy-desktop> <4D05630E.7070809@bobich.net> <20101213051157.GA19543@attic.humilis.net> <4D05E681.5090004@bobich.net>
To: cwillu
Cc: sander@humilis.net, jarktasaa@gmail.com, linux-btrfs@vger.kernel.org

On 13/12/2010 15:17, cwillu wrote:
>>>>> In a few weeks parts for my new computer will be arriving. The storage
>>>>> will be a 128GB SSD. A few weeks after that I will order three large
>>>>> disks for a RAID array. I understand that BTRFS RAID 5 support will be
>>>>> available shortly. What is the best possible way for me to get the
>>>>> highest performance out of this setup. I know of the option to optimize
>>>>> for SSD's
>>>>
>>>> BTRFS is hardly the best option for SSDs. I typically use ext4
>>>> without a journal on SSDs, or ext2 if that is not available.
>>>> Journalling causes more writes to hit the disk, which wears out
>>>> flash faster. Plus, SSDs typically have much slower writes than
>>>> reads, so avoiding writes is a good thing.
>>>
>>> Gordan, this you wrote is so wrong I don't even know where to begin.
>>>
>>> You'd better google a bit on the subject (ssd, and btrfs on ssd) as much
>>> is written about it already.
>>
>> I suggest you back your opinion up with some hard data before making such
>> statements. Here's a quick test - make an ext2 fs and a btrfs on two similar
>> disk partitions (any disk, for the sake of the experiment it doesn't have to
>> be an ssd), then check vmstat -d to get a baseline. Then put the kernel
>> sources on each of them, do a full build, then make clean and check vmstat -d
>> again.
>> See how many writes (sectors) hit
>> the disk with ext2 and how many with btrfs. You'll find that there were many
>> more writes with BTRFS. You can't go faster when doing more. Journaling is
>> expensive.
>
> Of course. But that applies to rotating media as well (where the
> seeks involved hurt much more), and has little if anything to do with
> why you would use btrfs instead of ext2.

Indeed - btrfs is about features, most specifically the checksumming that
allows smart recovery from disk media failure. But on flash, write
volumes are something that shouldn't be ignored.

> Good ssd drives (by which I mean anything but consumer flash as it
> exists on sd cards and usb sticks) have very good wear leveling, good
> enough that you could overwrite the same logical sector billions of
> times before you'd experience any failure due to wear.

It comes down to volumes even in the best case scenario. A _very_ good
SSD (e.g. Intel) might get write amplification down to about 1.2:1, but
more typical figures are in the region of 10-20:1. Every write that can
be avoided, should be avoided.

> The issues
> with cheaper ssd drives (which I distinguish from things like sd
> cards) are uniformly performance degradation due to crappy garbage
> collection and lack of trim support to compensate. A journal is _not_
> a problem here.

The journal doesn't help. It can cause more than a 50% overhead on
metadata-heavy operations.

> On crappy flash, yes, you want to avoid a journal, mainly because the
> write leveling for a given sector only occurs over a fixed small
> number of erase blocks, resulting in a filesystem that you can burn
> out quite easily - I have a small pile of sd cards on my desk that I
> sent to such a fate. Even here there is reason to use btrfs. The
> journaling performed is much less strenuous than ext3/4: it's
> basically just a version stamp, as opposed to actually journaling the
> metadata involved.
> The actual metadata writes, being copy-on-write,
> provide pretty much the best case for crappy flash, as cow inherently
> wear-levels over the entire device (ssd_spread). To say nothing of
> checksums and duplicated metadata, allowing you to actually determine
> if you're running into corrupted metadata, and often recover from it
> transparently. Ext2's behavior in this respect is less than ideal.

I'm not disputing that, but the OP was talking about using the SSD as a
cache for a slower disk subsystem. That is likely to waste the SSD
pretty quickly purely by volume of writes, regardless of how good the
wear leveling is. That may be fine on a setup where the SSD is treated
as a disposable throw-away cache item that doesn't lose you data when it
goes wrong, but what was being discussed isn't an expensive enterprise
grade setup that behaves that way.

Gordan
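
For anyone who wants to repeat the vmstat -d comparison described above, here is a rough sketch of the bookkeeping. It assumes the per-device "sectors written" counter is read from two snapshots in /proc/diskstats format (as far as I know, the same counters vmstat -d reports), taken before and after the kernel build. The function names, the device name and the sample numbers are all made up for illustration, and the write amplification factors are simply the ones quoted in this message, not measurements.

```python
def sectors_written(diskstats_text, device):
    """Return the cumulative 'sectors written' counter for one device.

    Expects text in /proc/diskstats layout:
    major minor name reads reads_merged sectors_read ms_reading
    writes writes_merged sectors_written ...
    """
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[2] == device:
            return int(fields[9])
    raise KeyError(device)


def written_delta(before_text, after_text, device):
    """Sectors written to `device` between the two snapshots."""
    return sectors_written(after_text, device) - sectors_written(before_text, device)


def flash_sectors(host_sectors, write_amplification):
    """Estimate sectors physically written to flash for a given host write volume."""
    return host_sectors * write_amplification


def read_diskstats(path="/proc/diskstats"):
    """Take a snapshot on a live Linux system (not used in the demo below)."""
    with open(path) as f:
        return f.read()


if __name__ == "__main__":
    # Hypothetical snapshots for a partition called sdb1 (illustrative numbers only).
    before = "   8      17 sdb1 1000 20 80000 500 2000 30 160000 900 0 700 1400\n"
    after = "   8      17 sdb1 9000 40 720000 4100 51000 800 460000 7300 0 5200 11400\n"

    host = written_delta(before, after, "sdb1")
    print("sectors written by the host:", host)
    for wa in (1.2, 10, 20):
        print("write amplification %s:1 -> roughly %d flash sectors"
              % (wa, round(flash_sectors(host, wa))))
```

Run the same measurement once per filesystem (ext2 and btrfs on similar partitions) and compare the two host-side deltas; the write amplification lines only show how the difference scales once it reaches the flash.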