linux-btrfs.vger.kernel.org archive mirror
From: Kai Krakow <hurikhan77+btrfs@gmail.com>
To: linux-btrfs@vger.kernel.org
Subject: Re: Options for SSD - autodefrag etc?
Date: Fri, 24 Jan 2014 21:14:21 +0100	[thread overview]
Message-ID: <t5uara-vp4.ln1@hurikhan77.spdns.de> (raw)
In-Reply-To: 52E19667.6090005@gmail.com

KC <conrad.francois.artus@googlemail.com> schrieb:

> I was wondering about using options like "autodefrag" and
> "inode_cache" on SSDs.
> 
> On one hand, one always hears that defragmenting an SSD is a no-no;
> does that apply to BTRFS's autodefrag?
> Also, just recently, I heard something similar about "inode_cache".
> 
> On the other hand, the Arch BTRFS wiki recommends using both options on SSDs:
>   http://wiki.archlinux.org/index.php/Btrfs#Mount_options
> 
> So to clear things up, I ask at the source where people should know best.
> 
> Does using those options on SSDs give any benefit, and does it cause a
> non-negligible increase in SSD wear?

I'm not an expert, but I have wondered about this myself. And while I have no 
SSD yet, I would prefer turning autodefrag on even for an SSD - at least when 
I have no big write-intensive files on the device (but you should plan your 
FS so as not to have those on an SSD anyway), because btrfs may rewrite large 
files on an SSD just for the purpose of autodefragging. I hope that will 
improve soon, maybe by only defragging parts of the file given some sane 
thresholds.

Why did I decide I would turn it on? Well, heavily fragmented files cause a 
performance overhead, and btrfs tends to fragment files fast (except with the 
nodatacow mount flag, which has its own downsides). An adaptive online defrag 
ensures you suffer no performance loss due to very scattered extents. And: 
fragmented files (or better, fragmented free space) increase 
write amplification (at least for long-living filesystems), because when 
small amounts of free space are randomly scattered all over the device, the 
filesystem has to fill these holes at some point in time. This decreases 
performance because it has to find these holes and possibly split batched 
write requests, and it potentially decreases the lifetime of your SSD because 
the read-modify-write-erase cycle takes place in more locations than would be 
needed if the free space hole had just been big enough. I don't know how big 
erase blocks [*] are - but think about it. You will come to the conclusion 
that it reduces lifetime.

So it is generally recommended to defragment heavily fragmented files, leave 
the not-so-heavily fragmented files alone, and coalesce free space holes into 
bigger free space areas on a regular basis. I think an effective algorithm 
could coalesce free space into bigger areas and, as a side effect, simply 
defragment those files whose parts had to be moved anyway to merge free 
space. During this process, a trim should be applied.

I wonder if btrfs will optimize for this use case in the future...

All in all, I'd say: defragmenting an SSD is not that bad if done right, and 
if done right it will even improve lifetime and performance. And I believe 
this is why the wiki recommends it. I'd recommend combining it with 
compress=lzo or maybe even compress-force=lzo (unless your SSD firmware does 
compression) - it should give a performance boost and reduce writes to your 
SSD. YMMV - so do your own (long-term) benchmarking.
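To illustrate, such options could be combined in an fstab entry like the 
following sketch (the UUID and mount point are placeholders, and whether you 
want compress vs. compress-force is your call):

```shell
# Example /etc/fstab entry for a btrfs filesystem on an SSD (placeholders):
#
#   UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  /data  btrfs  ssd,noatime,autodefrag,compress=lzo  0  0
#
# The same options can be applied to an already-mounted filesystem;
# compression only affects data written after the remount:
#
#   mount -o remount,autodefrag,compress=lzo /data
```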

If performance and lifetime are a really big concern, then partition and ever 
use only 75% of your device and leave the rest untouched so it can be used as 
spare area for wear-levelling [**]. It will give you good long-term 
performance and should increase lifetime.

[*] Erase blocks are usually much, much bigger than the block size you can 
read and write data at. Flash memory cannot be overwritten in place; it is 
essentially write-once-read-many, so it needs to be erased first. This is 
where the read-modify-write-erase cycle comes from and why wear-levelling is 
needed: read the whole erase block, modify it with your data block, write it 
to a new location, then erase and free the old block. So you see: writing 
just 4k can result in (128k-4k) read, 128k written, and 128k erased (so 
something like a write-amplification factor of 32), given an erase block size 
of 128k. Do this a lot, randomly scattered, and performance and lifetime will 
suffer a lot. The SSD firmware will try to buffer as much data as possible 
before the read-modify-write-erase cycle kicks in, to lessen the bad effects 
of random writes. So a block-sorting scheduler (deadline instead of noop) and 
increasing nr_requests may be a good idea. This is also why you may want to 
look into filesystems that turn random writes into sequential writes, like 
f2fs, or why you may want to use bcache, which also turns random writes into 
sequential writes for the cache device (your SSD).
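The arithmetic above can be sketched in a few lines of shell (the 128k erase 
block is just an assumed example figure; real erase blocks vary by device and 
are often much larger):

```shell
# Worked example of the write-amplification arithmetic from the footnote.
# Assumed figures: 128 KiB erase block, one 4 KiB host write.
erase_block_kib=128
host_write_kib=4

# Worst case: one whole erase block is rewritten per small host write,
# so the flash sees erase_block/host_write times the data the host sent.
wa_factor=$(( erase_block_kib / host_write_kib ))
echo "write-amplification factor: $wa_factor"   # prints 32
```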

[**] This ([*]) is why you should keep a spare area...

These are just my humble thoughts. You see: The topic may be a lot more 
complex than just saying "use noop scheduler" and "SSD needs no 
defragmentation". I think those statements are just plain wrong.

-- 
Replies to list only preferred.


