From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: Very slow filesystem
Date: Thu, 5 Jun 2014 03:05:26 +0000 (UTC)
Message-ID: <pan$765af$b28af9a8$903ce16b$940baf34@cox.net>
In-Reply-To: CAEezp7uMXA3t8-GDVcVzx9An+hPBZmynYZ-wUXzhJuP93czs1w@mail.gmail.com

Igor M posted on Thu, 05 Jun 2014 00:15:31 +0200 as excerpted:

> Why does btrfs become EXTREMELY slow after some time (months) of usage?
> This has now happened a second time; the first time I thought it was a
> hard drive fault, but now the drive seems OK.
> The filesystem is mounted with compress-force=lzo and is used for MySQL
> databases; the files are mostly big, 2G-8G.

That's the problem right there: a database access pattern on files over 
1 GiB in size.  The problem, along with the fix, has been covered over 
and over and over again on this list, and it's covered on the btrfs wiki 
as well, so I guess you haven't checked existing answers before asking 
the same question yet again.

Nevertheless, here's the basic answer yet again...

Btrfs, like all copy-on-write (COW) filesystems, has a tough time with a 
particular file rewrite pattern: data that is frequently changed and 
rewritten internal to an existing file (as opposed to appended to it, 
like a log file).  In the normal case, such an internal-rewrite pattern 
triggers copies of the rewritten blocks every time they change, *HIGHLY* 
fragmenting this type of file after only a relatively short period.  
While compression changes things up a bit (filefrag doesn't know how to 
deal with it yet, so its report isn't reliable on compressed files), it's 
not unusual for people with several-gig files of this write pattern on 
btrfs without compression to see filefrag report literally hundreds of 
thousands of extents!
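
If you want to check a suspect file yourself, filefrag reports the 
extent count.  A minimal sketch, assuming an uncompressed file at a 
stock MySQL path (the path is just an example, not from the original 
report):

  # count extents; a six-figure number here is the internal-rewrite
  # fragmentation described above
  filefrag /var/lib/mysql/bigdb.ibd

  # -v additionally lists each extent, if you want the gory details
  filefrag -v /var/lib/mysql/bigdb.ibd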

For smaller files with this access pattern (think firefox/thunderbird 
sqlite database files and the like), typically up to a few hundred MiB or 
so, btrfs' autodefrag mount option works reasonably well: when it sees a 
file fragmenting due to rewrites, it queues that file for background 
defrag via sequential copy, deleting the old fragmented copy after the 
defrag is done.
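
For reference, autodefrag is just a mount option; the device and 
mountpoint below are placeholders, not from the original report:

  # /etc/fstab -- autodefrag alongside the existing mount options
  /dev/sdX1  /data  btrfs  compress-force=lzo,autodefrag  0 0

  # or switch it on for a mounted filesystem without unmounting:
  mount -o remount,autodefrag /data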

For larger files (say a gig plus) with this access pattern, typically 
larger database files as well as VM images, autodefrag doesn't scale so 
well, as the whole file must be rewritten each time, and at that size the 
changes can come faster than the file can be rewritten.  So a different 
solution must be used for them.

The recommended solution for larger internal-rewrite-pattern files is to 
give them the NOCOW file attribute (chattr +C), so they're updated in 
place.  However, this attribute cannot usefully be added to a file that 
already contains data; NOCOW must be set before the file contains data.  
The easiest way to do that is to set the attribute on the subdir that 
will contain the files and let the files inherit the attribute as they 
are created.  Then you can copy (not move, and don't use cp's --reflink 
option) existing files into the new subdir, such that the new copy gets 
created with the NOCOW attribute.
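
Concretely, the setup looks something like this (directory and file 
names are purely for illustration):

  # create the dir and set NOCOW on it *before* any files exist in it
  mkdir /data/mysql
  chattr +C /data/mysql

  # copy (not move, and per the above, no --reflink) the existing files
  # in; the new copies are created inside the +C dir and thus inherit
  # the attribute
  cp /old/mysql/bigdb.ibd /data/mysql/

  # verify: lsattr should show the 'C' flag on the new copy
  lsattr /data/mysql/bigdb.ibd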

NOCOW files are updated in-place, thereby eliminating the fragmentation 
that would otherwise occur, keeping them fast to access.

However, there are a few caveats.  Setting NOCOW turns off file 
compression and checksumming as well, which is actually what you want for 
such files, as it eliminates race conditions and other complex issues 
that would otherwise occur when trying to update the files in-place 
(which is why such features aren't part of most non-COW filesystems, 
which update in-place by default).

Additionally, taking a btrfs snapshot locks the existing data in place 
for the snapshot, so the first rewrite to a file block (4096 bytes, I 
believe) after a snapshot will always be COW, even if the file has the 
NOCOW attribute set.  Some people run automatic snapshotting software and 
can be taking snapshots as often as once a minute.  Obviously, this 
almost kills NOCOW entirely, since it's then only effective on changes 
after the first one between snapshots, and with snapshots only a minute 
apart, the file fragments almost as fast as it would have otherwise!

So snapshots and the NOCOW attribute basically don't get along with each 
other.  But because snapshots stop at subvolume boundaries, one method to 
avoid snapshotting NOCOW files is to put them, already in their own 
subdirs if using the suggestion above, into dedicated subvolumes as well.  
That lets you continue taking snapshots of the parent subvolume, without 
snapshotting the dedicated subvolumes containing the NOCOW database or 
VM-image files.
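
In practice that might look like the following (paths are illustrative 
only):

  # dedicated subvolume for the NOCOW database files
  btrfs subvolume create /data/mysql
  chattr +C /data/mysql

  # snapshots of the parent stop at the subvolume boundary, so the
  # contents of /data/mysql are not included in the snapshot
  mkdir -p /data/.snapshots
  btrfs subvolume snapshot /data /data/.snapshots/data-snap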

You'd then do conventional backups of your database and VM-image files, 
instead of snapshotting them.
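
A minimal sketch of such a backup, assuming MySQL plus rsync (neither 
the paths nor the tools beyond MySQL itself come from the original 
report):

  # dump the databases rather than snapshotting the raw files
  mysqldump --all-databases | gzip > /backup/mysql-$(date +%F).sql.gz

  # or rsync the raw files while the database is shut down
  rsync -a /data/mysql/ /backup/mysql/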

Of course, if you're not using btrfs snapshots in the first place, you 
can skip the whole subvolume thing and just put your NOCOW files in their 
own subdirs, setting NOCOW on the subdir as suggested above, so files 
(and nested subdirs, which inherit NOCOW as well) pick up the attribute 
of the subdir they're created in, at creation time.

Meanwhile, it can be noted that once you turn off COW/compression/
checksumming, and if you're not snapshotting, you're almost back to the 
features of a normal filesystem anyway, except that you can still use the 
btrfs multi-device features, of course.  So if you're not using the multi-
device features either, an alternative solution is to simply use a more 
traditional filesystem (like ext4 or xfs, with xfs being targeted at 
large files anyway, so for multi-gig database and VM-image files it could 
be a good choice =:^) for your large internal-rewrite-pattern files, 
while potentially continuing to use btrfs for your normal files, where 
btrfs' COW nature and other features are a better match for the use-case 
than they are for gig-plus internal-rewrite-pattern files.
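
If you go that route, carving out a separate xfs volume for the big 
files is straightforward (the device name is a placeholder):

  # dedicated xfs filesystem just for the internal-rewrite files
  mkfs.xfs /dev/sdY1
  mount /dev/sdY1 /data/mysql

  # everything else stays on btrfs as before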

As I said, there's further discussion elsewhere already, but that's the 
problem you're seeing, along with a couple of potential solutions.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


