From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: Blocket for more than 120 seconds
Date: Sun, 15 Dec 2013 13:24:08 +0000 (UTC) [thread overview]
Message-ID: <pan$e9ec8$f8abd102$9f0d15ad$59d83b5d@cox.net> (raw)
In-Reply-To: CAD_cGvF8-pxXXvwpB+yNYnKRW09S4c0VuKUGiWa7=8C2uFkvqA@mail.gmail.com
Hans-Kristian Bakke posted on Sun, 15 Dec 2013 03:35:53 +0100 as
excerpted:
> I have done some more testing. I turned off everything using the disk
> and only did defrag. I have created a script that gives me a list of the
> files with the most extents. I started from the top to improve the
> fragmentation of the worst files. The most fragmented file was a file of
> about 32GB with over 250 000 extents!
> It seems that I can defrag two or three largish (15-30GB, ~100 000
> extent) files just fine, but after a while the system locks up (not a
> complete hard lock, but everything hangs and a restart is necessary to
> get a fully working system again).
>
> It seems like defrag operations are triggering the issue. Probably in
> combination with the large and heavily fragmented files.
>
> I have slowly managed to defragment the most fragmented files,
> rebooting 4 times, so one of the worst files now is this one:
>
> # filefrag vide01.mkv
> vide01.mkv: 77810 extents found
> # lsattr vide01.mkv
> ---------------- vide01.mkv
>
> All the large fragmented files are ordinary mkv-files (video). The
> reason for the heavy fragmentation was that perhaps 50 to 100 files were
> written at the same time over a period of several days, with lots of
> other activity going on as well. No problem for the system as it was
> network limited most of the time.
> Although defrag alone can trigger blocking, so can a straight rsync from
> another internal array capable of 1000 MB/s continuous reads, combined
> with some random activity. It seems that the cause is just heavy IO. Is
> it possible that even though I have seemingly lots of space free in
> measured MBytes, that it is all so fragmented that btrfs can't allocate
> space efficiently enough? Or would that give other errors?
>
> I actually downgraded from kernel 3.13-rc2 because I could not do
> anything else while copying between the internal arrays without btrfs
> hanging, although seemingly just temporarily and not as bad as the
> defrag blocking.
>
> I will try to free up some space before running more defrag too, just to
> check if that is the issue.
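An extent-count report like the script described above might be sketched as
follows (hypothetical; the path is a placeholder):

```shell
# List files under a path sorted by extent count, worst first.
# filefrag prints lines like "vide01.mkv: 77810 extents found".
find /data -xdev -type f -exec filefrag {} + 2>/dev/null \
  | awk -F': ' '{ n=$2; sub(/ extents? found/, "", n); print n+0, $1 }' \
  | sort -rn | head -n 20
```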
Three points based on bits you've mentioned, the third likely being the
most critical for this thread, plus a fourth point, not something you've
mentioned but just in case...:
1) You mentioned compress=lzo. It's worth noting that at present,
filefrag interprets the segments btrfs breaks compressed files into as
part of the compression process (of IIRC 128 KiB each, although I'm not
absolutely sure of that number) as fragments, so anything that's
compressed and over that size will be reported by filefrag as fragmented,
even if it's not.
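A quick way to see this effect (hypothetical path, just an illustration):

```shell
# On a compressed btrfs file, filefrag counts each ~128 KiB compressed
# segment as a separate extent, so the count can look alarmingly high
# even for a file laid out contiguously on disk.
filefrag /mnt/data/bigfile.tar

# The verbose map shows the individual (encoded) extents it is counting:
filefrag -v /mnt/data/bigfile.tar
```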
They're working on teaching filefrag about this sort of thing, and in
fact I saw some proposed patches for the kernel side of things just
yesterday, IIRC, but it'll be a few months before all the various pieces
are in the kernel and filefrag upstreams, and it'll probably be a few
months to a year or more beyond that before those fixes filter out to
what the distros are shipping.
However, btrfs won't ordinarily attempt to compress known video files
(unless the compress-force mount option is used) since they're normally
already compressed, so that's unlikely to be the issue with your mkvs.
Additionally, if defragging them reduces the reported fragmentation
dramatically, compression isn't what's being counted; if it were,
defragging wouldn't change the numbers.
But it might make a difference on some other files you have...
2) You mentioned backups. Your backups aren't of the type that use lots
and lots of hardlinks are they? Because btrfs isn't the most efficient
at processing large numbers of hardlinks. For hardlink-type backups,
etc., a filesystem other than btrfs is preferable. (Additionally,
since btrfs is still experimental, it's probably a good idea to avoid
having both your working system and backups on btrfs anyway. Better to
have the backups on something else, in case btrfs lives up to the risk
level implied by its development status.)
3) Critically, the blocked task in your first post was rtorrent. Given
that and the filetypes (mkv video files) involved, one can guess that you
do quite a bit of torrenting.
I'm not sure about rtorrent, but a lot of torrent clients (possibly
optionally) pre-allocate the space required for a file, then fill in
individual chunks in random order as they are downloaded and written.
*** THIS IS ONE OF THE WORST USE-CASES POSSIBLE FOR ANY COPY-ON-WRITE
FILESYSTEM, BTRFS INCLUDED!! ***
What happens is that each of those random chunk-writes creates a new
extent, a new fragment of the file, since COW means it isn't rewritten in-
place and thus must be mapped to a new location on the disk. If that's
what you're doing, then no WONDER those files have so many extents -- a
32-gig file with a quarter million extents in the worst-case you
mentioned. And especially on spinning rust, YES, something that heavily
fragmented WILL trigger I/O blockages for minutes at a time!
(The other very common bad case is virtual machine images, where writes
to the virtual disk in the VM end up being "internal file writes" in the
file containing that image on the host filesystem. I don't believe it's
quite as bad as the torrent case, since VMs don't commonly rewrite the
/entire/ image, only large parts of it. The recommendations below apply
there as well.)
There are several possible workarounds, including turning off the pre-
allocate option in your torrent client, if possible, and several variants
on the theme of telling btrfs not to COW those particular files so they
get rewritten in-place instead.
3a) Create a separate filesystem for your torrent files and either use
something other than a COW filesystem (ext4 or xfs might be usable
options), or if you use btrfs, mount that filesystem with the nodatacow
mount-option.
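For the nodatacow-mount variant, something along these lines (device and
mountpoint are placeholders):

```shell
# A dedicated btrfs filesystem for torrents, with COW disabled for data
# on the whole mount.  Note nodatacow applies per-mount, not per-file.
mount -o nodatacow /dev/sdb1 /mnt/torrents

# Or persistently, via an /etc/fstab entry:
# /dev/sdb1  /mnt/torrents  btrfs  nodatacow  0  0
```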
3b) Configure your torrent client to use a particular directory (which it
probably already does by default, but make sure all the files are going
there -- you're not telling it to directly write some torrent downloads
elsewhere instead), then set the NOCOW attribute on that directory.
Newly created files in it should inherit that NOCOW.
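Setting NOCOW on a directory is done with chattr (hypothetical path):

```shell
# +C sets the NOCOW attribute; files created in the directory afterwards
# inherit it.  It has no effect on data already in existing files.
mkdir -p /mnt/data/torrents
chattr +C /mnt/data/torrents
lsattr -d /mnt/data/torrents    # the 'C' flag should now be listed
```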
3c) Arrange to set NOCOW on individual files before you start writing
into them. Often this is done by touching the file to create it, then
setting the NOCOW attribute, then writing into the existing zero-length
file. The attribute needs to be set before there's data in the file --
setting it after the fact doesn't really help, and this is one way to do
it (with inherit from the directory as in 3b another). However, this
could be impossible or at minimum rather complicated to handle with the
torrent client, so 3a or 3b are likely to be more practical choices.
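If you can arrange it, the per-file sequence looks like this (filename is
just an example):

```shell
# NOCOW must be set while the file is still empty -- setting it after
# data has been written doesn't really help.
touch vide01.mkv         # create a zero-length file
chattr +C vide01.mkv     # set NOCOW before any data lands in it
# ...now let the torrent client (pre)allocate and write into it.
```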
3d) As mentioned, in some clients it's possible to turn off the pre-
allocation option. However, this can have other effects as well or pre-
allocation wouldn't be a common torrent client practice in the first
place, so it may not be what you want in any case. Pre-allocation is
fine, as long as the file is set NOCOW using one of the methods above.
Of course once you have that setup, you'll still have to deal with the
existing heavily fragmented files, but at least you won't have a
continuing regenerating problem you have to deal with. =:^)
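Cleaning up the existing files is a matter of defragmenting them one at a
time, along the lines of what you've been doing:

```shell
# Defragment one heavily-fragmented file, then re-check the extent count.
# (Keep in mind filefrag over-counts on compressed files, per point 1.)
btrfs filesystem defragment vide01.mkv
filefrag vide01.mkv
```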
4) This one you didn't mention but just in case... There have been some
issues with btrfs qgroups that I'm not sure are fully ironed out yet. In
general, I'd recommend staying away from quotas and their btrfs qgroups
implementation for now. As with hardlink-heavy use-cases, use a
different filesystem if you are dependent on quotas, at least for the
time being.
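If quotas were enabled experimentally at some point, they can be checked
and turned off again (mountpoint is a placeholder):

```shell
# Disable qgroup tracking on the filesystem for now.
btrfs quota disable /mnt/data
```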
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
2013-12-14 20:30 Blocket for more than 120 seconds Hans-Kristian Bakke
2013-12-14 21:35 ` Chris Murphy
2013-12-14 23:19 ` Hans-Kristian Bakke
2013-12-14 23:50 ` Chris Murphy
2013-12-15 0:28 ` Hans-Kristian Bakke
2013-12-15 1:59 ` Chris Murphy
2013-12-15 2:35 ` Hans-Kristian Bakke
2013-12-15 13:24 ` Duncan [this message]
2013-12-15 14:51 ` Hans-Kristian Bakke
2013-12-15 23:08 ` Duncan
2013-12-16 0:06 ` Hans-Kristian Bakke
2013-12-16 10:19 ` Duncan
2013-12-16 10:55 ` Hans-Kristian Bakke
2013-12-16 15:00 ` Duncan
2013-12-16 15:18 ` Chris Mason
2013-12-16 16:32 ` Hans-Kristian Bakke
2013-12-16 18:16 ` Chris Mason
2013-12-16 18:22 ` Hans-Kristian Bakke
2013-12-16 18:33 ` Chris Mason
2013-12-16 18:41 ` Hans-Kristian Bakke
2013-12-15 3:47 ` George Mitchell
2013-12-15 23:39 ` Charles Cazabon
2013-12-16 0:16 ` Hans-Kristian Bakke