From: Qu Wenruo <quwenruo@cn.fujitsu.com>
To: Mitchell Fossen <msfossen@gmail.com>,
Duncan <1i5t5.duncan@cox.net>, <linux-btrfs@vger.kernel.org>
Subject: Re: btrfs: poor performance on deleting many large files
Date: Fri, 27 Nov 2015 09:49:04 +0800
Message-ID: <5657B690.3080900@cn.fujitsu.com>
In-Reply-To: <1448488198.4717.4.camel@gmail.com>
Mitchell Fossen wrote on 2015/11/25 15:49 -0600:
> On Mon, 2015-11-23 at 06:29 +0000, Duncan wrote:
>
>> Using subvolumes was the first recommendation I was going to make, too,
>> so you're on the right track. =:^)
>>
>> Also, in case you are using it (you didn't say, but this has been
>> demonstrated to solve similar issues for others so it's worth
>> mentioning), try turning btrfs quota functionality off. While the devs
>> are working very hard on that feature for btrfs, the fact is that it's
>> simply still buggy and doesn't work reliably anyway, in addition to
>> triggering scaling issues before they'd otherwise occur. So my
>> recommendation has been, and remains, unless you're working directly with
>> the devs to fix quota issues (in which case, thanks!), if you actually
>> NEED quota functionality, use a filesystem where it works reliably, while
>> if you don't, just turn it off and avoid the scaling and other issues
>> that currently still come with it.
>>
>
> I did indeed have quotas turned on for the home directories! Since they were
> mostly to calculate space used by everyone (since du -hs is so slow) and not
> actually needed to limit people, I disabled them.
[[About quota]]
Personally speaking, I'd like to see a comparison between quota enabled
and quota disabled, to help determine whether quota is causing the
problem.
If you can find a good and reliable reproducer, it would be very helpful
for developers to improve btrfs.
BTW, it's also a good idea to use ps to find out which process is
running at the time your btrfs hangs.
If it's the kernel thread named btrfs-transaction, then it may be
related to quota.
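
As a rough sketch of the ps check above, the Python snippet below
filters `ps -eo pid,stat,comm` style output for tasks in uninterruptible
sleep (state D) or for the btrfs transaction kernel thread. The helper
name suspicious_tasks is hypothetical and the sample text is made up for
illustration; on a real system you would feed it ps output captured
during the hang. Note that ps truncates the thread name to 15
characters, so it shows up as "btrfs-transacti".

```python
def suspicious_tasks(ps_output):
    """Return (pid, stat, comm) for tasks that are in D state or that
    look like the btrfs transaction kernel thread."""
    hits = []
    # Skip the header line, then split each row into three columns.
    for line in ps_output.strip().splitlines()[1:]:
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue
        pid, stat, comm = parts
        # Kernel task names are limited to 15 characters, so the thread
        # appears as "btrfs-transacti" in the COMMAND column.
        if stat.startswith("D") or comm.startswith("btrfs-transacti"):
            hits.append((int(pid), stat, comm.strip()))
    return hits


# Made-up sample output, for illustration only.
SAMPLE = """\
  PID STAT COMMAND
    1 Ss   systemd
  812 D    btrfs-transacti
 4321 D+   rm
 4400 R+   ps
"""

for task in suspicious_tasks(SAMPLE):
    print(task)
```

If the transaction thread shows up in D state together with the rm that
is hanging, that would be consistent with the quota suspicion, though it
is not proof on its own.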
>
>> As for defrag, that's quite a topic of its own, with complications
>> related to snapshots and the nocow file attribute. Very briefly, if you
>> haven't been running it regularly or using the autodefrag mount option by
>> default, chances are your available free space is rather fragmented as
>> well, and while defrag may help, it may not reduce fragmentation to the
>> degree you'd like. (I'd suggest using filefrag to check fragmentation,
>> but it doesn't know how to deal with btrfs compression, and will report
>> heavy fragmentation for compressed files even if they're fine. Since you
>> use compression, that kind of eliminates using filefrag to actually see
>> what your fragmentation is.)
>> Additionally, defrag isn't snapshot aware (they tried it for a few
>> kernels a couple years ago but it simply didn't scale), so if you're
>> using snapshots (as I believe Ubuntu does by default on btrfs, at least
>> taking snapshots for upgrade-in-place), using defrag on files that
>> exist in the snapshots as well can dramatically increase space usage,
>> since defrag will break the reflinks to the snapshotted extents and
>> create new extents for defragged files.
>>
>> Meanwhile, the absolute worst-case fragmentation on btrfs occurs with
>> random-internal-rewrite-pattern files (as opposed to never changed, or
>> append-only). Common examples are database files and VM images. For
>> /relatively/ small files, to say 256 MiB, the autodefrag mount option is
>> a reasonably effective solution, but it tends to have scaling issues with
>> files over half a GiB so you can call this a negative recommendation for
>> trying that option with half-gig-plus internal-random-rewrite-pattern
>> files. There are other mitigation strategies that can be used, but here
>> the subject gets complex so I'll not detail them. Suffice it to say that
>> if the filesystem in question is used with large VM images or database
>> files and you haven't taken specific fragmentation avoidance measures,
>> that's very likely a good part of your problem right there, and you can
>> call this a hint that further research is called for.
>>
>> If your half-gig-plus files are mostly write-once, for example most media
>> files unless you're doing heavy media editing, however, then autodefrag
>> could be a good option in general, as it deals well with such files and
>> with random-internal-rewrite-pattern files under a quarter gig or so. Be
>> aware, however, that if it's enabled on an already heavily fragmented
>> filesystem (as yours likely is), it's likely to actually make performance
>> worse until it gets things under control. Your best bet in that case, if
>> you have spare devices available to do so, is probably to create a fresh
>> btrfs and consistently use autodefrag as you populate it from the
>> existing heavily fragmented btrfs. That way, it'll never have a chance
>> for the fragmentation to build up in the first place, and autodefrag used
>> as a routine mount option should keep it from getting bad in normal use.
>
> Thanks for explaining that! Most of these files are written once and then read
> from for the rest of their "lifetime" until the simulations are done and they
> get archived/deleted. I'll try leaving autodefrag on and defragging directories
> over the holiday weekend when no one is using the server. There is some database
> usage, but I turned off COW for its folder and it only gets used sporadically
> and shouldn't be a huge factor in day-to-day usage.
>
> Also, is there a recommendation for relatime vs noatime mount options? I don't
> believe anything that runs on the server needs to use file access times, so if
> it can help with performance/disk usage I'm fine with setting it to noatime.
>
> I just tried copying a 70GB folder and then rm -rf it and it didn't appear to
> impact performance, and I plan to try some larger tests later.
It depends on the folder structure, but even for the worst case, it
won't really trigger your problem.
[[About large files in btrfs]]
I agree with Duncan's suggestion completely. This is a problem with the
btrfs fs tree design: it causes too much contention on the same tree
lock.
Changing to a multi-subvolume layout will improve performance greatly,
especially for large files/directories.
The real problem is that btrfs deletes a large file in a way that does
not scale: it blocks the transaction until *all* the file extents
belonging to the inode are deleted.
See the __btrfs_update_delayed_inode() function in
fs/btrfs/delayed-inode.c.
For small files that's OK, but for super huge files it's a nightmare,
as the transaction won't be committed until all the file extents are
deleted.
In the 70G case, the data will consist of fewer than 600 file extents.
2~3 leaves can hold them, so you may not notice any glitch when the
delayed inode is run.
But in your 500~700G case, btrfs will need to delete about 4K file
extents; the deletion may change the b-tree hugely and take much longer.
So in your case, you may need files that large to trigger the problem...
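
The extent counts above can be checked with some quick arithmetic. The
sketch below assumes a 128 MiB maximum size for an uncompressed data
extent, the default 16 KiB leaf size, and the on-disk sizes noted in the
comments; all of these are assumptions of this example, not something
measured on your filesystem. The results are best-case lower bounds,
since real files are usually more fragmented, and compressed extents
are, as far as I know, much smaller.

```python
GIB = 1024 ** 3
MAX_EXTENT = 128 * 1024 ** 2   # assumed: max uncompressed data extent
NODESIZE = 16 * 1024           # assumed: default leaf size
LEAF_HEADER = 101              # assumed: size of the leaf header
ITEM_SIZE = 25 + 53            # assumed: item header + file extent item


def ceil_div(a, b):
    """Integer division rounded up."""
    return -(-a // b)


def min_extents(size_bytes):
    """Fewest file extents that can cover size_bytes of data."""
    return ceil_div(size_bytes, MAX_EXTENT)


def min_leaves(n_extents):
    """Fewest leaves needed to hold n_extents file extent items."""
    per_leaf = (NODESIZE - LEAF_HEADER) // ITEM_SIZE
    return ceil_div(n_extents, per_leaf)


for gib in (70, 500, 700):
    n = min_extents(gib * GIB)
    print(f"{gib} GiB: at least {n} extents, {min_leaves(n)} leaves")
```

Under these assumptions 70 GiB comes out at 560 extents in 3 leaves,
while 500 to 700 GiB needs 4000 to 5600 extents spread over 20 or more
leaves, which matches the rough figures given above.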
We can try a better method that deletes the file extents a few at a
time, transaction by transaction, and hopefully that will help your
case.
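
To illustrate the idea only, here is a toy Python model; it is not
kernel code and does not reflect how any patch would actually be
written. The function name batches and the batch size of 512 are
hypothetical choices for this example. It compares deleting about 4K
extents in one transaction with spreading them over several
transactions.

```python
def batches(n_extents, per_transaction):
    """Yield the number of extents each transaction would delete."""
    remaining = n_extents
    while remaining > 0:
        step = min(per_transaction, remaining)
        yield step
        remaining -= step


# Current behaviour in this toy model: everything in one transaction.
all_at_once = list(batches(4000, 4000))
# Proposed behaviour: a bounded amount of work per transaction.
chunked = list(batches(4000, 512))

print("one shot:", len(all_at_once), "transaction(s), largest",
      max(all_at_once))
print("chunked: ", len(chunked), "transaction(s), largest",
      max(chunked))
```

The total work is the same in both cases; the point is that no single
transaction has to wait for more than one batch of deletions before it
can commit.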
Thanks,
Qu
>
> Thanks again for the help!
>
> -Mitch