From: "Austin S. Hemmelgarn" <ahferroin7@gmail.com>
To: Dave <davestechshop@gmail.com>,
	Linux fs Btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: defragmenting best practice?
Date: Thu, 2 Nov 2017 14:37:42 -0400
Message-ID: <cbb18834-342a-c724-35ac-8bb9dcfe0a35@gmail.com>
In-Reply-To: <CAH=dxU4Rnhajhmq6j0B8t82_YUfUCSbestfSivm3JDwuV=6wYA@mail.gmail.com>

On 2017-11-02 14:09, Dave wrote:
> On Thu, Nov 2, 2017 at 7:17 AM, Austin S. Hemmelgarn
> <ahferroin7@gmail.com> wrote:
> 
>>> And the worst performing machine was the one with the most RAM and a
>>> fast NVMe drive and top of the line hardware.
>>
>> Somewhat nonsensically, I'll bet that NVMe is a contributing factor in this
>> particular case.  NVMe has particularly bad performance with the old block
>> IO schedulers (though it is NVMe, so it should still be better than a SATA
>> or SAS SSD), and the new blk-mq framework just got scheduling support in
>> 4.12, and only got reasonably good scheduling options in 4.13.  I doubt it's
>> the entirety of the issue, but it's probably part of it.
> 
> Thanks for that news. Based on that, I assume the advice here (to use
> noop for NVMe) is now outdated?
> https://stackoverflow.com/a/27664577/463994
> 
> Is the solution as simple as running a kernel >= 4.13? Or do I need to
> specify which scheduler to use?
> 
> I just checked one computer:
> 
> uname -a
> Linux morpheus 4.13.5-1-ARCH #1 SMP PREEMPT Fri Oct 6 09:58:47 CEST
> 2017 x86_64 GNU/Linux
> 
> $ sudo find /sys -name scheduler -exec grep . {} +
> /sys/devices/pci0000:00/0000:00:1d.0/0000:08:00.0/nvme/nvme0/nvme0n1/queue/scheduler:[none] mq-deadline kyber bfq
> 
>  From this article, it sounds like (maybe) I should use kyber. I see
> kyber listed in the output above, so I assume that means it is
> available. I also think [none] is the current scheduler being used, as
> it is in brackets.
> 
> I checked this:
> https://www.kernel.org/doc/Documentation/block/switching-sched.txt
> Based on that, I assume I would do this at runtime:
> 
> echo kyber > /sys/devices/pci0000:00/0000:00:1d.0/0000:08:00.0/nvme/nvme0/nvme0n1/queue/scheduler
> 
> I assume this is equivalent:
> 
> echo kyber > /sys/block/nvme0n1/queue/scheduler
> 
> How would I set it permanently at boot time?
It's kind of complicated overall.  As of 4.14, there are four options 
for the blk-mq path.  The 'none' scheduler is the old behavior prior to 
4.13 and does no scheduling.  'mq-deadline' is the default AFAIK, and 
behaves like the old deadline I/O scheduler (I'm not sure whether it 
supports I/O priorities).  'bfq' is a blk-mq port of a scheduler 
originally designed to replace the default CFQ scheduler from the old 
block layer.  'kyber' I know essentially nothing about; I never saw the 
patches on LKML (not sure if I just missed them or they only went to 
topic lists), and I haven't tried it myself.
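
To the runtime question further up: the long /sys/devices path and the 
short /sys/block path refer to the same sysfs node (/sys/block/nvme0n1 
is just a symlink to the device directory), so either form works.  
Purely as a sketch, switching and then confirming the active scheduler 
would look something like this (using the nvme0n1 device from your 
output):

echo kyber > /sys/block/nvme0n1/queue/scheduler
cat /sys/block/nvme0n1/queue/scheduler

The scheduler shown in square brackets in the second command's output 
is the one currently in effect.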

I have no personal experience with anything but the 'none' scheduler on 
NVMe devices, so I can't comment much beyond two observations: I saw a 
huge difference on the SATA SSD's I use, first when the deadline 
scheduler became the default and again when I switched to BFQ on my 
systems, and I've seen reports of the deadline scheduler improving 
things on NVMe as well.

As far as setting it at boot time: there's currently no kernel 
configuration option to set a default like there is for the old block 
interface, and I don't know of any kernel command line option to set it 
either, but a udev rule setting it as an attribute works reliably.  I'm 
using something like the following to set all my SATA devices to use 
BFQ by default:

KERNEL=="sd?", SUBSYSTEM=="block", ACTION=="add", ATTR{queue/scheduler}="bfq"
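
An equivalent rule for NVMe devices would presumably look like the 
following; this is an untested sketch on my part, assuming you want 
kyber and that it's built into (or loaded on) your kernel:

KERNEL=="nvme[0-9]n[0-9]", SUBSYSTEM=="block", ACTION=="add", ATTR{queue/scheduler}="kyber"

Drop it into a file under /etc/udev/rules.d/ (the file name is up to 
you, something like 60-ioscheduler.rules) and it gets applied the next 
time the device is added, or right away if you reload the rules and run 
'udevadm trigger --action=add' on the device.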
> 
>>> While Firefox and Linux in general have their performance "issues",
>>> that's not relevant here. I'm comparing the same distros, same Firefox
>>> versions, same Firefox add-ons, etc. I eventually tested many hardware
>>> configurations: different CPU's, motherboards, GPU's, SSD's, RAM, etc.
>>> The only remaining difference I can find is that the computer with
>>> acceptable performance uses LVM + EXT4 while all the others use BTRFS.
>>>
>>> With all the great feedback I have gotten here, I'm now ready to
>>> retest this after implementing all the BTRFS-related suggestions I
>>> have received. Maybe that will solve the problem or maybe this mystery
>>> will continue...
>>
>> Hmm, if you're only using SSD's, that may partially explain things.  I don't
>> remember if it was mentioned earlier in this thread, but you might try
>> adding 'nossd' to the mount options.  The 'ssd' mount option (which gets set
>> automatically if the device reports as non-rotational) impacts how the block
>> allocator works, and that can have a pretty insane impact on performance.
> 
> I will test the "nossd" mount option.
If you're not seeing any difference on the newest kernels (I hadn't 
realized you were running 4.13 on anything), you might not see any 
impact from doing this.  I'd also suggest running a full balance prior 
to testing _after_ switching the option, since part of the performance 
impact comes from the resultant on-disk layout.
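
As a rough sketch of that sequence (assuming the filesystem in question 
is mounted at /mnt/data; adjust the path, and add nossd to that entry 
in /etc/fstab so it survives a reboot):

mount -o remount,nossd /mnt/data
btrfs balance start /mnt/data

Note that newer btrfs-progs warn before an unfiltered balance (passing 
--full-balance skips the warning), and a full balance rewrites 
everything, so expect it to take a while and generate a lot of I/O.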
> 
>> Additionally, independently from that, try toggling the 'discard' mount
>> option.  If you have it enabled, disable it, if you have it disabled, enable
>> it.  Inline discards can be very expensive on some hardware, especially
>> older SSD's, and discards happen pretty frequently in a COW filesystem.
> 
> I have been following this advice, so I have never enabled discard for
> an NVMe drive. Do you think it is worth testing?
> 
> Solid State Drives/NVMe - ArchWiki
> https://wiki.archlinux.org/index.php/Solid_State_Drives/NVMe
> 
> Discards:
> Note: Although continuous TRIM is an option (albeit not recommended)
> for SSDs, NVMe devices should not be issued discards.
I've never heard this particular advice before, and it offers no source 
for the claim.  I have, however, seen the advice from Intel that they 
quote just below that, and would tend to agree with it for most users.  
The part that makes this all complicated is that different devices 
handle batched discards (what the Arch people call 'Periodic TRIM') and 
on-demand discards (what the Arch people call 'Continuous TRIM') 
differently.  Some devices (especially old ones) do better with batched 
discards, while others seem to do better with on-demand discards.  On 
top of that, there's significant variance based on the actual workload 
(including that from the filesystem itself).
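
In concrete terms, on-demand discards are what you get from the 
'discard' mount option, while batched discards are just fstrim run 
periodically; for example (device and mount point are only 
placeholders):

mount -o discard /dev/sda1 /mnt        # on-demand ('Continuous TRIM')
fstrim -v /mnt                         # batched ('Periodic TRIM')

Most distributions ship a systemd fstrim.timer (part of util-linux) 
that you can enable to get the batched variant on a weekly schedule.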

Based on my own experience using BTRFS on SATA SSD's, it's usually 
better to do batched discards unless you only write to the filesystem 
infrequently, because:
1. With on-demand discards, each COW operation triggers an associated 
discard (and this can seriously hurt performance).
2. Because old copies of blocks get discarded immediately, it's much 
harder to recover a damaged filesystem.

There are some odd exceptions, though.  If, for example, you're running 
BTRFS on a ramdisk or ZRAM device, you should just use on-demand 
discards, as that will free up memory immediately.
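
For that ZRAM case, a minimal sketch (assuming /dev/zram0 is already 
set up and sized, and /mnt/scratch is where you want it):

mkfs.btrfs /dev/zram0
mount -o discard /dev/zram0 /mnt/scratch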
