From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from plane.gmane.org ([80.91.229.3]:36050 "EHLO plane.gmane.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751686AbaAXGy7 (ORCPT ); Fri, 24 Jan 2014 01:54:59 -0500
Received: from list by plane.gmane.org with local (Exim 4.69) (envelope-from ) id 1W6afJ-00005m-EA for linux-btrfs@vger.kernel.org; Fri, 24 Jan 2014 07:54:57 +0100
Received: from ip68-231-22-224.ph.ph.cox.net ([68.231.22.224]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Fri, 24 Jan 2014 07:54:57 +0100
Received: from 1i5t5.duncan by ip68-231-22-224.ph.ph.cox.net with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Fri, 24 Jan 2014 07:54:57 +0100
To: linux-btrfs@vger.kernel.org
From: Duncan <1i5t5.duncan@cox.net>
Subject: Re: Options for SSD - autodefrag etc?
Date: Fri, 24 Jan 2014 06:54:31 +0000 (UTC)
Message-ID:
References: <52E19667.6090005@gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Sender: linux-btrfs-owner@vger.kernel.org
List-ID:

KC posted on Thu, 23 Jan 2014 23:23:35 +0100 as excerpted:

> I was wondering about whether using options like "autodefrag" and
> "inode_cache" on SSDs.
>
> On one hand, one always hears that defragmentation of SSD is a no-no,
> does that apply to BTRFS's autodefrag?
> Also, just recently, I heard something similar about "inode_cache".
>
> On the other hand, Arch BTRFS wiki recommends to use both options on
> SSDs http://wiki.archlinux.org/index.php/Btrfs#Mount_options
>
> So to clear things up, I ask at the source where people should know
> best.
>
> Does using those options on SSDs gives any benefits and causes
> non-negligible increase in SSD wear?

inode_cache is not recommended for general use, tho it can make sense for use-cases such as busy maildir based email servers where there's a lot of small files being constantly written and erased.
Additionally, since btrfs is not yet fully stable (tho with kernel 3.13 the kconfig warning for btrfs was officially decreased in severity), my thought is if it's disabled, that's one less feature I have to worry about bugs in, for my filesystems. =:^) So don't enable inode_cache unless you know you need it.

autodefrag is an interesting one, and I asked about it too when I was setting up my ssd-backed btrfs filesystems, so good question! =:^)

Yes, autodefrag does use up some of an SSD's somewhat limited write cycles, and yes, there's no seek time to worry about on SSDs so fragmentation doesn't hurt as badly as it does on spinning rust. There's still some cost to fragmentation, however -- each file fragment is an IOPS count on access, and while modern SSDs are rated for pretty high IOPS, copy-on-write (COW) based filesystems like btrfs can heavily fragment "internally rewritten" files (as opposed to files written once and never changed, or with new data always appended at the end like a log file or streaming media recording).

We've seen worst-case internally rewritten files such as multi-gig VMs reported here, with 100K extents or more! That *WILL* eat up IOPS, even on SSDs, and there are other serious issues with that heavily fragmented a file as well, not least the additional chance of damage to it given all the work btrfs has to do tracking all those extents! But for that large a file, autodefrag isn't really the best option. See a couple paragraphs down for a better one for such large files.

There are several COW-triggered fragmentation worst-cases. Perhaps the most common one on a typical desktop is small database files such as the sqlite files used for firefox history, cookies, etc, and this is where the autodefrag mount option really shines and what it was designed for.

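For anyone wanting to check which of these options are actually active on their btrfs mounts, here is a minimal Python sketch. It assumes the usual /proc/mounts field layout (device, mount point, filesystem type, comma-separated options, dump, pass) and that the kernel lists autodefrag and inode_cache by those names when they are enabled. The function names and the sample text are made up for illustration and are not part of any btrfs tool.

```python
def btrfs_mount_options(mounts_text):
    """Map each btrfs mount point to the set of its mount options.

    Expects text in /proc/mounts layout:
    device mountpoint fstype options dump pass
    """
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Skip malformed lines and anything that is not btrfs.
        if len(fields) < 4 or fields[2] != "btrfs":
            continue
        result[fields[1]] = set(fields[3].split(","))
    return result


def report(mounts_text, wanted=("autodefrag", "inode_cache")):
    """Return {mountpoint: {option: enabled?}} for the options of interest."""
    return {
        mnt: {opt: opt in opts for opt in wanted}
        for mnt, opts in btrfs_mount_options(mounts_text).items()
    }


# Hypothetical sample; on a live system, pass the contents of /proc/mounts.
SAMPLE = (
    "/dev/sda2 / btrfs rw,ssd,autodefrag,space_cache 0 0\n"
    "/dev/sdb1 /boot ext4 rw,relatime 0 0\n"
)

for mountpoint, flags in report(SAMPLE).items():
    print(mountpoint, flags)
```

The parsing is kept separate from any file access so it can be tried on pasted text first. Options that carry a value, such as compress=lzo, stay in the set as a single string.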
Larger internal-write files (say half a gig or bigger), particularly highly active ones where file updates may come fast enough that rewriting the whole file slows things down, like big active database files, pre-allocated bittorrent download files, or multi-gig VM images, are a rather different problem, and autodefrag doesn't work as well with them. For these, the NOCOW file attribute (set with chattr +C, see the chattr manpage), which with btrfs must be set before data is written into the file, works rather better. The easiest way to set the attribute before the file is written into is to set it on the containing directory so new files created in it inherit the attribute automatically. So set up your database, VMs, or torrent client to use the same dir for everything, then set +C/NOCOW on that dir before the files are downloaded/created/copied-into-it/whatever. That way, rewrites happen in-place instead of creating a new extent every time some bit of the file changes.

Of course another alternative is to use an entirely separate filesystem for your big internal-write files, either something like ext4 that's not COW-based, or btrfs with the nodatacow mount option set (tho you'd definitely not want to use that for a general purpose btrfs).

But back to autodefrag. It's also worth noting that actually doing the install with this option enabled can make a difference too, as apparently a number of popular distro installers trigger fragmentation during their work, leaving even brand new installations heavily fragmented if the install is to btrfs mounted without autodefrag.

One more note on fragmentation. filefrag doesn't yet understand btrfs compression, and reports each compression block (128 KiB IIRC) as a separate extent. So if you use compression (I use compress=lzo, here), don't be surprised to see larger files reported as several hundred extents, perhaps a few thousand on gigabyte sized files.

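To put rough numbers on that, the Python sketch below works out how many extents filefrag could report for a file purely because of compression. It assumes the 128 KiB block size recalled above and that every block of the file is compressed, so the result is an upper bound, not a prediction. The helper names are made up for illustration. The parser assumes filefrag's usual summary line of the form "name: N extents found".

```python
import re

COMPRESSION_BLOCK = 128 * 1024  # 128 KiB, the figure recalled ("IIRC") above


def max_reported_extents(file_size_bytes, block_size=COMPRESSION_BLOCK):
    """Upper bound on extents filefrag may show for a fully compressed file."""
    if file_size_bytes < 0:
        raise ValueError("file size must not be negative")
    # Integer ceiling division: a partial last block still counts as one.
    return (file_size_bytes + block_size - 1) // block_size


def parse_extent_count(filefrag_line):
    """Extract N from a filefrag summary line such as 'name: N extents found'."""
    match = re.search(r":\s*(\d+) extents? found\s*$", filefrag_line)
    if match is None:
        raise ValueError("not a filefrag summary line")
    return int(match.group(1))


one_gib = 1024 ** 3
print(max_reported_extents(one_gib))  # 8192
```

A reported count at or below this bound can be explained by compression alone. A count far above it suggests real fragmentation.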
If you're worried about it, defrag the file manually (btrfs fi defrag) and see if the number of reported extents goes down significantly. If it does, the file was fragmented and defragmenting helped. If not, defragmenting didn't help.

Anyway, yes, I turned autodefrag on for my SSDs, here, but there are arguments to be made in either direction, so I can understand people choosing not to do that.

One not-btrfs specific mount option that's very useful for btrfs, particularly if you're using btrfs snapshotting features, SSD or not, is noatime. Admins have been disabling atime updates for years to get better performance, and that's recommended in general unless you run mutt (with other than mbox files) or something else that requires it. But on btrfs there's an additional reason: given that the exclusive size of a snapshot is the size of the filesystem changes written between it and the previous snapshot, with atime updates on and not a lot of other writes, those atime updates can be a big part of the exclusive size of that snapshot! So disabling them means smaller and more efficient snapshots, particularly if there isn't that much other write activity going on either.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman