From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: Blocket for more than 120 seconds
Date: Sun, 15 Dec 2013 13:24:08 +0000 (UTC) [thread overview]
Message-ID: <pan$e9ec8$f8abd102$9f0d15ad$59d83b5d@cox.net> (raw)
In-Reply-To: CAD_cGvF8-pxXXvwpB+yNYnKRW09S4c0VuKUGiWa7=8C2uFkvqA@mail.gmail.com
Hans-Kristian Bakke posted on Sun, 15 Dec 2013 03:35:53 +0100 as
excerpted:
> I have done some more testing. I turned off everything using the disk
> and only did defrag. I have created a script that gives me a list of the
> files with the most extents. I started from the top to improve the
> fragmentation of the worst files. The most fragmented file was a file of
> about 32GB with over 250 000 extents!
> It seems that I can defrag two or three largish (15-30GB, ~100 000
> extent) files just fine, but after a while the system locks up (not a
> complete hard lock, but everything hangs and a restart is necessary to
> get a fully working system again).
>
> It seems like defrag operations are triggering the issue. Probably in
> combination with the large and heavily fragmented files.
>
> I have slowly managed to defragment the most fragmented files,
> rebooting 4 times, so one of the worst files now is this one:
>
> # filefrag vide01.mkv
> vide01.mkv: 77810 extents found
> # lsattr vide01.mkv
> ---------------- vide01.mkv
>
> All the large fragmented files are ordinary mkv-files (video). The
> reason for the heavy fragmentation was that perhaps 50 to 100 files were
> written at the same time over a period of several days, with lots of
> other activity going on as well. No problem for the system as it was
> network limited most of the time.
> Although defrag alone can trigger blocking, so can a straight rsync from
> another internal array capable of 1000 MB/s continuous reads, combined
> with some random activity. It seems that the cause is just heavy IO. Is
> it possible that even though I have seemingly lots of space free in
> measured MBytes, that it is all so fragmented that btrfs can't allocate
> space efficiently enough? Or would that give other errors?
>
> I actually downgraded from kernel 3.13-rc2 because I could not do
> anything else while copying between the internal arrays without btrfs
> hanging, although seemingly just temporarily and not as bad as the
> defrag blocking.
>
> I will try to free up some space before running more defrag too, just to
> check if that is the issue.
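An extent-count report like the script described above might be sketched as
follows (hypothetical; the path is a placeholder):

```shell
# List files under a path sorted by extent count, worst first.
# filefrag prints lines like "vide01.mkv: 77810 extents found".
find /data -xdev -type f -exec filefrag {} + 2>/dev/null \
  | awk -F': ' '{ n=$2; sub(/ extents? found/, "", n); print n+0, $1 }' \
  | sort -rn | head -n 20
```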
Three points based on bits you've mentioned, the third likely being the
most critical for this thread, plus a fourth point, not something you've
mentioned but just in case...:
1) You mentioned compress=lzo. It's worth noting that at present,
filefrag interprets the segments btrfs breaks compressed files into as
part of the compression process (of IIRC 128 KiB each, although I'm not
absolutely sure of that number) as fragments, so anything that's
compressed and over that size will be reported by filefrag as fragmented,
even if it's not.
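A quick way to see this effect (hypothetical path, just an illustration):

```shell
# On a compressed btrfs file, filefrag counts each ~128 KiB compressed
# segment as a separate extent, so the count can look alarmingly high
# even for a file laid out contiguously on disk.
filefrag /mnt/data/bigfile.tar

# The verbose map shows the individual (encoded) extents it is counting:
filefrag -v /mnt/data/bigfile.tar
```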
They're working on teaching filefrag about this sort of thing, and in
fact I saw some proposed patches for the kernel side of things just
yesterday, IIRC, but it'll be a few months before all the various pieces
are in the kernel and filefrag upstreams, and it'll probably be a few
months to a year or more beyond that before those fixes filter out to
what the distros are shipping.
However, btrfs won't ordinarily attempt to compress known video files
(unless the compress-force mount option is used) since they're normally
already compressed, so that's unlikely to be the issue with your mkvs.
Additionally, if defragging them reduces the reported fragmentation
dramatically, compression isn't what's being counted; if it were,
defragging wouldn't change the numbers.
But it might make a difference on some other files you have...
2) You mentioned backups. Your backups aren't of the type that use lots
and lots of hardlinks are they? Because btrfs isn't the most efficient
at processing large numbers of hardlinks. For hardlink-type backups,
etc., a filesystem other than btrfs is preferable. (Additionally,
since btrfs is still experimental, it's probably a good idea to avoid
having both your working system and backups on btrfs anyway. Better to
have the backups on something else, in case btrfs lives up to the risk
level implied by its development status.)
3) Critically, the blocked task in your first post was rtorrent. Given
that and the filetypes (mkv video files) involved, one can guess that you
do quite a bit of torrenting.
I'm not sure about rtorrent, but a lot of torrent clients (possibly
optionally) pre-allocate the space required for a file, then fill in
individual chunks in random order as they are downloaded and written.
*** THIS IS ONE OF THE WORST USE-CASES POSSIBLE FOR ANY COPY-ON-WRITE
FILESYSTEM, BTRFS INCLUDED!! ***
What happens is that each of those random chunk-writes creates a new
extent, a new fragment of the file, since COW means it isn't rewritten in-
place and thus must be mapped to a new location on the disk. If that's
what you're doing, then no WONDER those files have so many extents -- a
32-gig file with a quarter million extents in the worst-case you
mentioned. And especially on spinning rust, YES, something that heavily
fragmented WILL trigger I/O blockages for minutes at a time!
(The other very common bad case is virtual machine images, where writes
to the virtual disk in the VM end up being "internal file writes" in the
file containing that image on the host filesystem. I don't believe it's
quite as bad as the torrent case, since VMs don't commonly rewrite the
/entire/ image, only large parts of it. The recommendations below apply
there as well.)
There are several possible workarounds, including turning off the pre-
allocate option in your torrent client, if possible, and several variants
on the theme of telling btrfs not to COW those particular files so they
get rewritten in-place instead.
3a) Create a separate filesystem for your torrent files and either use
something other than a COW filesystem (ext4 or xfs might be usable
options), or if you use btrfs, mount that filesystem with the nodatacow
mount-option.
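For the nodatacow-mount variant, something along these lines (device and
mountpoint are placeholders):

```shell
# A dedicated btrfs filesystem for torrents, with COW disabled for data
# on the whole mount.  Note nodatacow applies per-mount, not per-file.
mount -o nodatacow /dev/sdb1 /mnt/torrents

# Or persistently, via an /etc/fstab entry:
# /dev/sdb1  /mnt/torrents  btrfs  nodatacow  0  0
```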
3b) Configure your torrent client to use a particular directory (which it
probably already does by default, but make sure all the files are going
there -- you're not telling it to directly write some torrent downloads
elsewhere instead), then set the NOCOW attribute on that directory.
Newly created files in it should inherit that NOCOW.
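Setting NOCOW on a directory is done with chattr (hypothetical path):

```shell
# +C sets the NOCOW attribute; files created in the directory afterwards
# inherit it.  It has no effect on data already in existing files.
mkdir -p /mnt/data/torrents
chattr +C /mnt/data/torrents
lsattr -d /mnt/data/torrents    # the 'C' flag should now be listed
```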
3c) Arrange to set NOCOW on individual files before you start writing
into them. Often this is done by touching the file to create it, then
setting the NOCOW attribute, then writing into the existing zero-length
file. The attribute needs to be set before there's data in the file --
setting it after the fact doesn't really help, and this is one way to do
it (with inherit from the directory as in 3b another). However, this
could be impossible or at minimum rather complicated to handle with the
torrent client, so 3a or 3b are likely to be more practical choices.
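If you can arrange it, the per-file sequence looks like this (filename is
just an example):

```shell
# NOCOW must be set while the file is still empty -- setting it after
# data has been written doesn't really help.
touch vide01.mkv         # create a zero-length file
chattr +C vide01.mkv     # set NOCOW before any data lands in it
# ...now let the torrent client (pre)allocate and write into it.
```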
3d) As mentioned, in some clients it's possible to turn off the pre-
allocation option. However, this can have other effects as well or pre-
allocation wouldn't be a common torrent client practice in the first
place, so it may not be what you want in any case. Pre-allocation is
fine, as long as the file is set NOCOW using one of the methods above.
Of course once you have that setup, you'll still have to deal with the
existing heavily fragmented files, but at least you won't have a
continuing regenerating problem you have to deal with. =:^)
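Cleaning up the existing files is a matter of defragmenting them one at a
time, along the lines of what you've been doing:

```shell
# Defragment one heavily-fragmented file, then re-check the extent count.
# (Keep in mind filefrag over-counts on compressed files, per point 1.)
btrfs filesystem defragment vide01.mkv
filefrag vide01.mkv
```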
4) This one you didn't mention but just in case... There have been some
issues with btrfs qgroups that I'm not sure are fully ironed out yet. In
general, I'd recommend staying away from quotas and their btrfs qgroups
implementation for now. As with hardlink-heavy use-cases, use a
different filesystem if you are dependent on quotas, at least for the
time being.
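If quotas were enabled experimentally at some point, they can be checked
and turned off again (mountpoint is a placeholder):

```shell
# Disable qgroup tracking on the filesystem for now.
btrfs quota disable /mnt/data
```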
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
2013-12-14 20:30 Blocket for more than 120 seconds Hans-Kristian Bakke
2013-12-14 21:35 ` Chris Murphy
2013-12-14 23:19 ` Hans-Kristian Bakke
2013-12-14 23:50 ` Chris Murphy
2013-12-15 0:28 ` Hans-Kristian Bakke
2013-12-15 1:59 ` Chris Murphy
2013-12-15 2:35 ` Hans-Kristian Bakke
2013-12-15 13:24 ` Duncan [this message]
2013-12-15 14:51 ` Hans-Kristian Bakke
2013-12-15 23:08 ` Duncan
2013-12-16 0:06 ` Hans-Kristian Bakke
2013-12-16 10:19 ` Duncan
2013-12-16 10:55 ` Hans-Kristian Bakke
2013-12-16 15:00 ` Duncan
2013-12-16 15:18 ` Chris Mason
2013-12-16 16:32 ` Hans-Kristian Bakke
2013-12-16 18:16 ` Chris Mason
2013-12-16 18:22 ` Hans-Kristian Bakke
2013-12-16 18:33 ` Chris Mason
2013-12-16 18:41 ` Hans-Kristian Bakke
2013-12-15 3:47 ` George Mitchell
2013-12-15 23:39 ` Charles Cazabon
2013-12-16 0:16 ` Hans-Kristian Bakke