linux-btrfs.vger.kernel.org archive mirror
From: Martin Steigerwald <Martin@lichtvoll.de>
To: Hugo Mills <hugo@carfax.org.uk>
Cc: Robert White <rwhite@pobox.com>, linux-btrfs@vger.kernel.org
Subject: Re: BTRFS free space handling still needs more work: Hangs again
Date: Sat, 27 Dec 2014 18:11:21 +0100
Message-ID: <9346949.uCfVN6IAc7@merkaba>
In-Reply-To: <20141227162642.GK25267@carfax.org.uk>

On Saturday, 27 December 2014, 16:26:42, Hugo Mills wrote:
> On Sat, Dec 27, 2014 at 06:54:33AM -0800, Robert White wrote:
> > On 12/27/2014 05:55 AM, Martin Steigerwald wrote:
> [snip]
> > >while fio was just *laying* out the 4 GiB file. Yes, that's 100% system CPU
> > >for 10 seconds while allocating a 4 GiB file on a filesystem like:
> > >
> > >martin@merkaba:~> LANG=C df -hT /home
> > >Filesystem             Type   Size  Used Avail Use% Mounted on
> > >/dev/mapper/msata-home btrfs  170G  156G   17G  91% /home
> > >
> > >where a 4 GiB file should easily fit, no? (And this output is with the 4
> > >GiB file. So it was even 4 GiB more free before.)
> > 
> > No. /usr/bin/df is an _approximation_ in BTRFS because of the limits
> > of the statfs() function call. The statfs() function call was defined
> > in 1990 and "can't understand" the dynamic allocation model used in
> > BTRFS as it assumes fixed geometry for filesystems. You do _not_
> > have 17G actually available. You need to rely on btrfs fi df and
> > btrfs fi show to figure out how much space you _really_ have.
> > 
> > According to this block you have a RAID1 of ~ 160GB expanse (two 160G disks)
> > 
> > > merkaba:~> date; btrfs fi sh /home ; btrfs fi df /home
> > > Sa 27. Dez 13:26:39 CET 2014
> > > Label: 'home'  uuid: [some UUID]
> > >          Total devices 2 FS bytes used 152.83GiB
> > >          devid    1 size 160.00GiB used 160.00GiB path /dev/mapper/msata-home
> > >          devid    2 size 160.00GiB used 160.00GiB path /dev/mapper/sata-home
> > 
> > And according to this block you have about 5.39GiB of data space:
> > 
> > > Btrfs v3.17
> > > Data, RAID1: total=154.97GiB, used=149.58GiB
> > > System, RAID1: total=32.00MiB, used=48.00KiB
> > > Metadata, RAID1: total=5.00GiB, used=3.26GiB
> > > GlobalReserve, single: total=512.00MiB, used=0.00B
> > 
> > 154.97
> >   5.00
> >   0.032
> > + 0.512
> > 
> > Pretty much as close to 160GiB as you are going to get (those
> > numbers being rounded up in places for "human readability"). BTRFS
> > has allocated 100% of the raw storage into typed extents.
> > 
> > A large datafile can only fit in the 154.97-149.58 = 5.39
> 
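
For reference, the arithmetic in the quoted block can be checked with a few lines of Python. This is only a sketch that reuses the figures exactly as they were written in the quoted message (in GiB, including the quoted 0.032 and 0.512 values for System and GlobalReserve); the variable names are made up for illustration.

```python
# Chunk allocation per type, in GiB, as written in the quoted sum.
allocated = {
    "Data": 154.97,
    "Metadata": 5.00,
    "System": 0.032,
    "GlobalReserve": 0.512,
}
data_used = 149.58  # "used" value of the Data line, in GiB

# Total space already allocated to typed chunks on each 160 GiB device.
total_allocated = sum(allocated.values())

# Room left inside the already allocated data chunks.
data_free = allocated["Data"] - data_used

print(f"allocated: {total_allocated:.3f} GiB")
print(f"free inside data chunks: {data_free:.2f} GiB")
```

The second figure is the one that matters for the 4 GiB file: with no unallocated space left, new data has to fit into the free space inside the existing data chunks.
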
>    I appreciate that this is something of a minor point in the grand
> scheme of things, but I'm afraid I've lost the enthusiasm to engage
> with the broader (somewhat rambling, possibly-at-cross-purposes)
> conversation in this thread. However...
> 
> > Trying to allocate that 4GiB file into that 5.39GiB of space becomes
> > an NP-complete (e.g. "very hard") problem if it is very fragmented.
> 
>    This is... badly mistaken, at best. The problem of where to write a
> file into a set of free extents is definitely *not* an NP-hard
> problem. It's a P problem, with an O(n log n) solution, where n is the
> number of free extents in the free space cache. The simple approach:
> fill the first hole with as many bytes as you can, then move on to the
> next hole. More complex: order the free extents by size first. Both of
> these are O(n log n) algorithms, given an efficient general-purpose
> index of free space.
> 
>    The problem of placing file data isn't a bin-packing problem; it's
> not like allocating RAM (where each allocation must be contiguous).
> The items being placed may be split as much as you like, although
> minimising the amount of splitting is a goal.
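
The two approaches described above can be sketched in Python as follows. This is an illustration only, not how btrfs implements allocation: the (offset, length) tuple representation, the function name place_file and its parameters are assumptions made for the example.

```python
import errno
from typing import List, Tuple

# Hypothetical representation of a free extent: (offset, length) in bytes.
Extent = Tuple[int, int]


def place_file(free_extents: List[Extent], size: int,
               largest_first: bool = False) -> List[Extent]:
    """Greedily split `size` bytes across the given free extents.

    Returns the (offset, length) pieces used. The sort dominates the
    cost, so the whole thing is O(n log n) in the number of extents.
    """
    if largest_first:
        # "More complex" variant: try the biggest holes first.
        order = sorted(free_extents, key=lambda e: e[1], reverse=True)
    else:
        # "Simple" variant: walk the holes in offset order.
        order = sorted(free_extents)

    placed: List[Extent] = []
    remaining = size
    for offset, length in order:
        if remaining == 0:
            break
        if length <= 0:
            continue  # ignore empty holes
        take = min(length, remaining)
        placed.append((offset, take))
        remaining -= take

    if remaining:
        raise OSError(errno.ENOSPC, "not enough free space")
    return placed


if __name__ == "__main__":
    holes = [(0, 4), (10, 2), (20, 8)]
    print(place_file(holes, 9))
    print(place_file(holes, 9, largest_first=True))
```

Because the data may be split at any point, neither variant has to search for a best fit. The largest-first ordering simply tends to produce fewer pieces.
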
> 
>    I suspect that the performance problems that Martin is seeing may
> indeed be related to free space fragmentation, in that finding and
> creating all of those tiny extents for a huge file is causing
> problems. I believe that btrfs isn't alone in this, but it may well be
> showing the problem to a far greater degree than other FSes. I don't
> have figures to compare, I'm afraid.

That's what I wanted to hint at.

I suspect an issue with free space fragmentation, and this is what I think I
see:

btrfs balance minimizes free space fragmentation within chunks.

And that is my whole case on why I think it does help with my /home
filesystem.

So while btrfs filesystem defragment may help with defragmenting individual
files, possibly at the cost of fragmenting free space, at least when the
filesystem is almost full, I think there are only three options at the moment
to help with free space fragmentation:

1) reformat and restore via rsync or btrfs send from backup (i.e. file based)

2) make the BTRFS filesystem itself bigger

3) btrfs balance at least some chunks, namely those that are not more than
70% or 80% full.

Do you know of any other ways to deal with it?

So yes, in case it really is free space fragmentation, I do think a balance
may be helpful, even if a balance is usually not something one should need.
 
> > I also don't know what kind of tool you are using, but it might be
> > repeatedly trying and failing to fallocate the file as a single
> > extent or something equally dumb.
> 
>    Userspace doesn't, as far as I know, get to make that decision. I've
> just read the fallocate(2) man page, and it says nothing at all about
> the contiguity of the extent(s) storage allocated by the call.

fio fallocates just once and then writes, even if the fallocate call fails.

That was nice to see at one point: BTRFS returned out of space on the
fallocate but was still able to write the 4 GiB of random data. I bet
the latter was due to compression. Thus, while it could not guarantee
that the 4 GiB would be there in all cases, i.e. even with incompressible
data, it was able to write out the random buffer that fio writes repeatedly.
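
The behaviour described above (preallocate once, then write regardless of the result) looks roughly like the following Python sketch. It is not fio's actual code; the function name lay_out_file and its parameters are made up for illustration, and it assumes a Unix system where os.posix_fallocate is available.

```python
import os


def lay_out_file(path: str, size: int, chunk: int = 1 << 20) -> bool:
    """Preallocate `size` bytes once, then write them regardless.

    Returns True if the preallocation succeeded. A failed preallocation
    (for example ENOSPC) is not fatal; only a failing write is.
    """
    # One random buffer that is written over and over, as described above.
    buf = os.urandom(min(chunk, size)) if size else b""

    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        try:
            os.posix_fallocate(fd, 0, size)
            preallocated = True
        except OSError:
            # Reservation failed; carry on and try the writes anyway.
            preallocated = False

        written = 0
        while written < size:
            # os.write may write less than requested, so count the result.
            written += os.write(fd, buf[: size - written])
    finally:
        os.close(fd)
    return preallocated
```

With compression enabled, the writes can still succeed after the reservation has failed, because the reservation has to assume the data will not compress.
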


I think I will step back from this now; it's the weekend and a quiet time
after all.

I probably got a bit too engaged with this discussion. Yet I had the feeling
I was treated by Robert like someone who doesn't know a thing. I want to
approach this with a willingness to learn, and I don't want to see an
empirical result explained away before someone has even had a closer look
at it.

I had this before, where an expert claimed that he wouldn't reduce the
dirty_background_ratio in an rsync via NFS case, and I actually needed to
prove the result to him before he (I don't even know) eventually
accepted it.

I may be off with my free space fragmentation idea, so let the kern.log
and my results speak for themselves. I don't see much point in continuing
this discussion before a BTRFS developer has had a look at it.

I attached the kern.log with the sysrq-trigger t output to the bug report.
The bugzilla does not seem to be reachable from here at the moment (nginx
reports "502 Bad Gateway"), but the kern.log is attached to it. In case
someone needs it by mail, just ping me.

-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7
