From: "Austin S. Hemmelgarn" <ahferroin7@gmail.com>
To: Chris Murphy <lists@colorremedies.com>
Cc: Imran Geriskovan <imran.geriskovan@gmail.com>,
	Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: Btrfs/SSD
Date: Mon, 17 Apr 2017 13:13:39 -0400
Message-ID: <8f046fa5-a458-9db8-b616-907afd34383b@gmail.com>
In-Reply-To: <CAJCQCtS=xqcWMqiRxC_uoqTRUaW6aMwayoqjtMqq6XhcCJNVRg@mail.gmail.com>

On 2017-04-17 12:58, Chris Murphy wrote:
> On Mon, Apr 17, 2017 at 5:53 AM, Austin S. Hemmelgarn
> <ahferroin7@gmail.com> wrote:
>
>> Regarding BTRFS specifically:
>> * Given my recently newfound understanding of what the 'ssd' mount option
>> actually does, I'm inclined to recommend that people who are using high-end
>> SSDs _NOT_ use it, as it will heavily increase fragmentation and will likely
>> have near-zero impact on actual device lifetime (but may _hurt_
>> performance).  It will still probably help with mid-range and low-end SSDs.
>
> What is a high end SSD these days? Built-in NVMe?
One with a good FTL (flash translation layer) in the firmware.  At
minimum, the good Samsung EVO drives, the high-quality Intel ones, and
the Crucial MX series, but probably some others too.  My choice of words
here probably wasn't the best, though.
>
>
>
>> * Files with NOCOW and filesystems with 'nodatacow' set will both hurt
>> performance for BTRFS on SSDs, and appear to reduce the lifetime of the
>> SSD.
>
> Can you elaborate? It's an interesting problem. On a small scale, the
> systemd folks have journald set +C on /var/log/journal so that any new
> journals are nocow. There is an initial fallocate, but the write
> behavior is writing in the same place at the head and tail. But at the
> tail, the writes get pushed toward the middle. So the file is growing
> into its fallocated space from the tail. The header changes in the
> same location; it's an overwrite.
For a normal filesystem, or BTRFS with nodatacow or NOCOW, the block gets
rewritten in-place.  This means that cheap FTLs will rewrite that erase
block in-place (which won't hurt performance but will impact device
lifetime), and good ones will rewrite into a free block somewhere else
but may not free the original block for quite some time (which is bad
for performance but slightly better for device lifetime).

When BTRFS does a COW operation on a block, however, it guarantees
that the block moves.  Because of this, the old location will either:
1. Be discarded by the FS itself if the 'discard' mount option is set.
2. Be caught by a scheduled call to 'fstrim'.
3. Lie dormant for at least a while.

The first case is ideal for most FTLs, because it lets them know
immediately that the data isn't needed and the space can be reused.
The second is close to ideal, but defers telling the FTL that the block
is unused, which can be better on some SSDs (some have firmware that
handles wear-leveling better in batches).  The third is not ideal, but
is still better than what happens with NOCOW or nodatacow set.

Overall, this boils down to the fact that most FTLs get slower if they
can't wear-level the device properly, and in-place rewrites make it
harder for them to do proper wear-leveling.
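
To make the wear argument concrete, here is a toy sketch in Python.  It
is only an illustration under loud assumptions: it models a single hot
logical block, treats every write as one erase, and uses plain
round-robin placement as a stand-in for whatever a real FTL does.  The
function name and parameters are made up for this example and do not
correspond to anything in BTRFS or in any drive firmware.

```python
from collections import Counter


def simulate_wear(num_blocks, hot_writes, relocate):
    """Count erases per physical block for repeated writes to one logical block.

    relocate=False: every rewrite lands on the same physical block
    (the in-place case).
    relocate=True: every rewrite moves to the next physical block,
    round-robin (an assumed, idealized stand-in for wear-leveling).
    """
    erases = Counter()
    current = 0
    for _ in range(hot_writes):
        if relocate:
            current = (current + 1) % num_blocks
        erases[current] += 1
    return erases


in_place = simulate_wear(num_blocks=8, hot_writes=800, relocate=False)
moved = simulate_wear(num_blocks=8, hot_writes=800, relocate=True)

print("in-place, max erases on one block: ", max(in_place.values()))
print("relocated, max erases on one block:", max(moved.values()))
```

In this model the in-place case puts all 800 erases on one block, while
the relocated case spreads them evenly at 100 per block.  Real devices
sit somewhere in between, depending on how much freedom the FTL has to
move data around.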
>
> So long as this file is not reflinked or snapshotted, filefrag shows a
> pile of mostly 4096-byte blocks, thousands of them. But as they're pretty
> much all contiguous, the file fragmentation (extent count) is usually never
> higher than 12. It meanders between 1 and 12 extents for its life.
>
> Except on the system using the ssd_spread mount option. That one has a
> journal file that is +C, is not being snapshotted, but has over 3000
> extents per filefrag and btrfs-progs/debugfs. Really weird.
Given how the 'ssd' mount option behaves and how frequently most
systemd instances write to their journals, that's actually reasonably
expected.  We look for big chunks of free space to write into and then
align to 2M regardless of the actual size of the write, which in turn
means that files like the systemd journal, which see lots of small
(relatively speaking) writes, will have way more extents than they should
until you defragment them.
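
A rough way to see why the extent count balloons is the Python sketch
below.  It is a worst-case toy model, not the actual BTRFS allocator: it
assumes every small write is placed at the next aligned offset after the
previous one, and it counts a new extent whenever a write is not
physically adjacent to the one before it.  The helper names and the
write count of 3000 are chosen purely for illustration.

```python
KIB = 1024
MIB = 1024 * KIB


def place_writes(num_writes, write_size, align):
    """Return (physical_start, length) for each write, in file order.

    Assumption: each write starts at the first multiple of `align`
    at or after the end of the previous write.
    """
    placements = []
    cursor = 0
    for _ in range(num_writes):
        start = -(-cursor // align) * align  # round cursor up to a multiple of align
        placements.append((start, write_size))
        cursor = start + write_size
    return placements


def count_extents(placements):
    """Count runs of physically contiguous writes; each run is one extent."""
    extents = 0
    prev_end = None
    for start, length in placements:
        if start != prev_end:
            extents += 1
        prev_end = start + length
    return extents


packed = place_writes(3000, 4 * KIB, align=4 * KIB)
spread = place_writes(3000, 4 * KIB, align=2 * MIB)

print("4 KiB alignment:", count_extents(packed), "extent(s)")
print("2 MiB alignment:", count_extents(spread), "extent(s)")
```

With 4 KiB alignment the writes pack back to back and merge into a
single extent; with 2 MiB alignment every 4 KiB write ends up isolated,
so the model reports one extent per write.  The real allocator will not
always be this pessimistic, but the direction of the effect is the same.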
>
> Now, systemd aside, there are databases that behave this same way,
> where there's a small section constantly being overwritten, and one or
> more sections that grow the database file from within and at the end.
> If this is made cow, the file will absolutely fragment a ton,
> especially if the changes are mostly 4KiB block sizes that then are
> fsync'd.
>
> It's almost like we need these things to not fsync at all, and just
> rely on the filesystem commit time...
Essentially yes, but that causes all kinds of other problems.
