To: linux-btrfs@vger.kernel.org
From: Kai Krakow
Subject: Re: Options for SSD - autodefrag etc?
Date: Fri, 24 Jan 2014 21:14:21 +0100

KC wrote:

> I was wondering whether to use options like "autodefrag" and
> "inode_cache" on SSDs.
>
> On one hand, one always hears that defragmentation of SSDs is a no-no;
> does that apply to btrfs's autodefrag?
> Also, just recently, I heard something similar about "inode_cache".
>
> On the other hand, the Arch btrfs wiki recommends using both options on
> SSDs: http://wiki.archlinux.org/index.php/Btrfs#Mount_options
>
> So to clear things up, I'm asking at the source, where people should
> know best.
>
> Does using those options on SSDs give any benefit, and does it cause a
> non-negligible increase in SSD wear?

I'm not an expert, but I have wondered about this myself. And while I
still have no SSD yet, I would prefer turning autodefrag on even for an
SSD - at least as long as I have no big write-intensive files on the
device (but you should plan your FS so that it does not hold those on an
SSD anyway), because btrfs may rewrite large files on the SSD just for
the purpose of autodefragging. I hope that will improve soon, maybe by
only defragging parts of a file given some sane thresholds.

Why did I decide I would turn it on? Well, heavily fragmented files give
a performance overhead, and btrfs tends to fragment files fast (except
with the nodatacow mount flag, which has its own downsides). An adaptive
online defrag ensures you suffer no performance loss due to very
scattered extents.

And: fragmented files (or, let's better say, fragmented free space)
increase write amplification (at least on long-living filesystems),
because when small amounts of free space are randomly scattered all over
the device, the filesystem has to fill these holes at some point in
time. That decreases performance, because it has to find these holes and
possibly split batched write requests, and it potentially decreases the
lifetime of your SSD, because the read-modify-write-erase cycle takes
place in more locations than would be needed if the free space hole had
just been big enough. I don't know how big erase blocks [*] are - but
think about it, and you will come to the conclusion that it reduces
lifetime.

So it is generally recommended to defragment heavily fragmented files,
leave the not-so-heavily fragmented files alone, and coalesce free space
holes into bigger free areas on a regular basis. I think an effective
algorithm could coalesce free space into bigger areas and, as a side
effect, simply defragment those files whose parts had to be moved anyway
to merge the free space. During this process, a trim should be applied.
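
To give a rough idea of what "defragment only what is heavily
fragmented, then trim" could look like with today's tools - the 32M
threshold and the mount point /mnt/ssd are only assumptions for the
example, not values I have benchmarked:

  # extents of 32 MiB or more are left alone, only smaller ones get
  # rewritten into bigger extents
  btrfs filesystem defragment -r -t 32M /mnt/ssd
  # afterwards hand the coalesced free space back to the SSD
  fstrim -v /mnt/ssd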
I wonder if btrfs will optimize for this use case in the future...

All in all, I'd say: defragmenting an SSD is not that bad if done right,
and if done right it will even improve lifetime and performance. And I
believe this is why the wiki recommends it.

I'd recommend combining it with compress=lzo or maybe even
compress-force=lzo (unless your SSD firmware does compression) - it
should give a performance boost and reduce writes to your SSD. YMMV - so
do your (long-term) benchmarking.

If performance and lifetime are a really big concern, then only ever
partition and use 75% of your device and leave the rest untouched, so it
can be used as spare area for wear-levelling [**]. It will give you good
long-term performance and should increase lifetime.

[*] Erase blocks are usually much, much bigger than the block size you
can read and write data at. Flash memory cannot be overwritten in place;
it is essentially write-once-read-many, so it needs to be erased first.
This is where the read-modify-write-erase cycle comes from and why
wear-levelling is needed: read the whole erase block, modify it with
your data block, write it to a new location, then erase and free the old
block. So you see: writing just 4k can result in (128k-4k) read, 128k
written and 128k erased (a write-amplification factor of 32), given an
erase block size of 128k. Do this a lot, randomly scattered, and
performance and lifetime will suffer a lot.

The SSD firmware will try to buffer as much data as possible before the
read-modify-write-erase cycle kicks in, to reduce the bad effects of
random writes. So a block-sorting scheduler (deadline instead of noop)
and increasing nr_requests may be a good idea (I have put some rough
example settings at the very end of this mail). This is also why you may
want to look into filesystems that turn random writes into sequential
writes, like f2fs, or why you may want to use bcache, which also turns
random writes into sequential writes for the cache device (your SSD).

[**] This ([*]) is why you should keep a spare area...

These are just my humble thoughts. You see: the topic may be a lot more
complex than just saying "use the noop scheduler" and "an SSD needs no
defragmentation". I think those statements are just plain wrong.

--
Replies to list only preferred.
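
PS: To make the mount options and queue settings above a bit more
concrete, here is a rough sketch - the device name sda, the mount point
/, and the concrete values are just assumptions for illustration, not
something I have measured:

  # /etc/fstab: btrfs root on an SSD with autodefrag and lzo compression
  /dev/sda2  /  btrfs  ssd,noatime,autodefrag,compress=lzo  0  0

  # switch to a sorting scheduler and allow more queued requests
  echo deadline > /sys/block/sda/queue/scheduler
  echo 256 > /sys/block/sda/queue/nr_requests

  # when partitioning, leave roughly 25% of the device unpartitioned so
  # the firmware can use it as spare area for wear-levelling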