From: Kai Krakow <hurikhan77+btrfs@gmail.com>
To: linux-btrfs@vger.kernel.org
Subject: Re: Options for SSD - autodefrag etc?
Date: Fri, 24 Jan 2014 21:14:21 +0100 [thread overview]
Message-ID: <t5uara-vp4.ln1@hurikhan77.spdns.de>
In-Reply-To: <52E19667.6090005@gmail.com>
KC <conrad.francois.artus@googlemail.com> wrote:
> I was wondering about whether to use options like "autodefrag" and
> "inode_cache" on SSDs.
>
> On one hand, one always hears that defragmentation of SSD is a no-no,
> does that apply to BTRFS's autodefrag?
> Also, just recently, I heard something similar about "inode_cache".
>
> On the other hand, Arch BTRFS wiki recommends to use both options on SSDs
> http://wiki.archlinux.org/index.php/Btrfs#Mount_options
>
> So to clear things up, I ask at the source where people should know best.
>
> Does using those options on SSDs give any benefit, and does it cause a
> non-negligible increase in SSD wear?
I'm not an expert, but I've wondered about this myself. While I don't own an
SSD yet, I would prefer turning autodefrag on even for an SSD - at least as
long as there are no big write-intensive files on the device (but you should
plan your FS so those don't live on an SSD anyway), because btrfs may rewrite
such large files in full just for the purpose of autodefragging them. I hope
that will improve soon, maybe by defragging only parts of a file given some
sane thresholds.
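Just to illustrate what I have in mind (the UUID and mount point below are
placeholders, adjust them to your own setup), the fstab line could look
something like this:

  # example only - UUID and mount point are made up
  UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  /  btrfs  defaults,ssd,noatime,autodefrag  0  1

Btrfs should detect a non-rotational device and enable "ssd" by itself, but
spelling it out makes the intent obvious.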
Why did I decide I would turn it on? Well, heavily fragmented files cause a
performance overhead, and btrfs tends to fragment files fast (unless you use
the nodatacow mount flag, which has downsides of its own). An adaptive online
defrag ensures you don't lose performance to very scattered extents. And:
fragmented files (or rather, fragmented free space) increase write
amplification, at least on long-living filesystems, because when small chunks
of free space are randomly scattered all over the device the filesystem has to
fill these holes at some point. That hurts performance, because it has to find
the holes and possibly split batched write requests, and it potentially
shortens the life of your SSD, because the read-modify-write-erase cycle has
to take place in more locations than would be needed if the free space hole
had simply been big enough. I don't know exactly how big erase blocks [*] are
on current drives - but think it through and you will come to the conclusion
that it reduces lifetime.
So it is generally recommended to defragment heavily fragmented files, leave
the not-so-heavily fragmented files alone, and coalesce free space holes into
bigger contiguous areas on a regular basis. I think an effective algorithm
could merge free space into bigger areas and, as a side effect, simply
defragment those files whose parts had to be moved anyway to merge the free
space. During this process, a trim should be applied.
I wonder if btrfs will optimize for this use case in the future...
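Until then you can approximate it by hand. Something along these lines should
work - the mount point is just an example, and please check the btrfs-progs
man page for the exact meaning of the flags on your version:

  # rewrite only files whose extents are smaller than the 32M target,
  # leaving already mostly-contiguous files alone
  btrfs filesystem defragment -r -t 32M /mnt/data

  # afterwards tell the SSD which blocks are actually free again
  fstrim -v /mnt/data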
All in all, I'd say: defragmenting an SSD is not that bad if done right, and
done right it will even improve lifetime and performance. I believe this is
why the wiki recommends it. I'd recommend combining it with compress=lzo or
maybe even compress-force=lzo (unless your SSD firmware already does
compression) - it should give a performance boost and reduce the amount of
data written to your SSD. YMMV, so do your own (long-term) benchmarking.
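Trying the compression out is cheap, a remount is enough (the mount point is
again just an example):

  mount -o remount,compress=lzo /

Note that this only affects data written from then on; existing files can be
recompressed with "btrfs filesystem defragment -r -clzo <mountpoint>" if you
want that.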
If performance and lifetime are a really big concern, then partition and ever
use only about 75% of your device and leave the rest untouched so it can serve
as spare area for wear-levelling [**]. That will give you good long-term
performance and should increase lifetime.
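For example with parted (sdX is a placeholder - double-check the device name,
this is destructive):

  parted /dev/sdX mklabel gpt
  # one partition spanning the first 75% of the disk, the rest stays unpartitioned
  parted /dev/sdX mkpart primary 1MiB 75%

For the unpartitioned area to really act as spare area the firmware has to
know it is free, so ideally do this on a freshly trimmed or secure-erased
drive.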
[*] Erase blocks are usually much, much bigger than the block size you can
read and write data at. Flash memory cannot be overwritten in place, it is
essentially write-once-read-many, so it needs to be erased first. This is
where the read-modify-write-erase cycle comes from and why wear-levelling is
needed: read the whole erase block, modify it with your data block, write it
to a new location, then erase and free the old block. So you see: writing just
4k can result in 124k read, 128k written and 128k erased - a
write-amplification factor of 32 for the flash writes alone, plus the read and
erase overhead - given an erase block size of 128k. Do this a lot and randomly
scattered, and performance and lifetime will suffer a lot. The SSD firmware
will try to buffer as much data as possible before the
read-modify-write-erase cycle kicks in, to reduce the bad effects of random
writes. So a block-sorting scheduler (deadline instead of noop) and increasing
nr_requests may be a good idea. This is also why you may want to look into
filesystems that turn random writes into sequential writes, like f2fs, or why
you may want to use bcache, which also turns random writes into sequential
writes for the cache device (your SSD).
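If you want to experiment with the scheduler and queue depth, the sysfs knobs
look like this (sda is a placeholder for your SSD, and your distribution's
defaults may already be sensible):

  # show the available schedulers and the active one, then switch
  cat /sys/block/sda/queue/scheduler
  echo deadline > /sys/block/sda/queue/scheduler

  # allow more requests to be queued, merged and sorted
  echo 512 > /sys/block/sda/queue/nr_requests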
[**] This ([*]) is why you should keep a spare area...
These are just my humble thoughts. You see: the topic is a lot more complex
than just saying "use the noop scheduler" and "an SSD needs no
defragmentation". I think those statements are just plain wrong.
--
Replies to list only preferred.