From: "Austin S. Hemmelgarn" <ahferroin7@gmail.com>
To: "Holger Hoffstätte" <holger.hoffstaette@googlemail.com>,
	bo.li.liu@oracle.com
Cc: linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: Stray 4k extents with slow buffered writes
Date: Thu, 3 Mar 2016 15:47:16 -0500
Message-ID: <56D8A2D4.2010907@gmail.com>
In-Reply-To: <56D89655.70504@googlemail.com>

On 2016-03-03 14:53, Holger Hoffstätte wrote:
> On 03/03/16 19:33, Liu Bo wrote:
>> On Thu, Mar 03, 2016 at 01:28:29PM +0100, Holger Hoffstätte wrote:
> (..)
>>> I've noticed that slow buffered writes create a huge number of
>>> unnecessary 4k sized extents. At first I wrote it off as odd buffering
>>> behaviour of the application (a download manager), but it can be easily
>>> reproduced. For example:
>>
>> On a new fresh btrfs, I cannot reproduce the fragmented layout with "wget --limit-rate=1m",
>
> For better effect lower the bandwidth, 100k or so.
>
>> [root@10-11-17-236 btrfs]# filefrag -v -b linux-4.5-rc6.tar.xz
>> Filesystem type is: 9123683e
>> File size of linux-4.5-rc6.tar.xz is 88362576 (86292 blocks, blocksize 1024)
>>   ext logical physical expected length flags
>>     0       0   143744            5264
>>     1    5264   149008           35884
>>     2   41148   220848   184892      4
>
> So you also have one, after ~35 MB. See below.
>
>>     3   41152   184896   220852  35948
>>     4   77100   220852   220844   9192 eof
>> linux-4.5-rc6.tar.xz: 4 extents found
>
> No sync? filefrag is a notorious liar. ;)
>
> It changes things because you likely have a higher value set for
> vm/dirty_expire_centisecs or dirty_bytes explicitly configured; I have
> it set to 1000 (10s) to prevent large writebacks from choking everything.
> The default is probably still 30s aka 3000.
Last I looked (about a month ago), the default was still 3000.
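For anyone who wants to check their own setting: the value lives in
/proc/sys/vm/dirty_expire_centisecs and is expressed in hundredths of a
second. Below is a minimal Python sketch that reads it and converts it
to seconds. The helper names are my own and purely illustrative, not
part of any library.

```python
from pathlib import Path

# Standard procfs location of this sysctl on Linux.
SYSCTL_PATH = Path("/proc/sys/vm/dirty_expire_centisecs")


def centisecs_to_seconds(centisecs):
    """Convert a value in hundredths of a second to seconds."""
    return centisecs / 100


def read_dirty_expire_seconds(path=SYSCTL_PATH):
    """Return the writeback expiry in seconds, or None if it cannot be read."""
    try:
        return centisecs_to_seconds(int(path.read_text().strip()))
    except (OSError, ValueError):
        # Not on Linux, no procfs, or unexpected file contents.
        return None


if __name__ == "__main__":
    print(read_dirty_expire_seconds())
```

With the values mentioned in this thread, 3000 corresponds to 30s and
1000 to 10s.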
>
> I understand that I should get smaller extents overall, but not the stray
> 4k sized ones in regular intervals.
>
>> Can you gather your mount options and 'btrfs fi show/df' output?
>
> I can reproduce that on another machine/drive where it also initially
> didn't show the 4k extents in a parallel-running filefrag, but did
> after a sync (when the extents were written). That was surprising.
>
> Anyway, it's just an external scratch drive... the mount options really
> don't matter much:
>
> $ mount | grep sdf
> /dev/sdf1 on /mnt/usb type btrfs (rw,relatime,space_cache=v2,subvolid=5,subvol=/)
Do you still see the same behavior with the old space_cache format?
This looks like a space management and allocation issue, so the v2
space cache may be playing a part.
>
> $ btrfs fi df /mnt/usb
> Data, single: total=4.00GiB, used=3.31GiB
> System, single: total=32.00MiB, used=16.00KiB
> Metadata, single: total=1.00GiB, used=4.45MiB
> GlobalReserve, single: total=16.00MiB, used=0.00B
>
> $ btrfs fi show /mnt/usb
> Label: 'Test'  uuid: 1d37a067-5b7d-4dcf-b2c1-7c5745b9c7a5
> 	Total devices 1 FS bytes used 3.32GiB
> 	devid    1 size 111.79GiB used 5.03GiB path /dev/sdf1
>
> I then remounted with -ocommit=300 and set dirty_expire_centisecs=10000
> (100s). That results in a single large extent, even after sync, so
> writeback expiry and commit definitely play a part.
>
> Here is what it looks like when both dirty_expire and commit are set
> to a very low 5s:
I'd be curious to see whether something similar happens on other
filesystems with such low writeback timeouts.  My guess is that
BTRFS's allocator isn't smart enough to merge new extents into
existing ones when possible.
>
> $ filefrag -ek linux-4.4.4.tar.bz2
> Filesystem type is: 9123683e
> File size of linux-4.4.4.tar.bz2 is 105008928 (102548 blocks of 1024 bytes)
>   ext:     logical_offset:        physical_offset: length:   expected: flags:
>     0:        0..    5199:  227197920.. 227203119:   5200:
>     1:     5200..    5203:  227169600.. 227169603:      4:  227203120:
>     2:     5204..   15407:  227203124.. 227213327:  10204:  227169604:
>     3:    15408..   20623:  227213332.. 227218547:   5216:  227213328:
>     4:    20624..   20627:  227169604.. 227169607:      4:  227218548:
>     5:    20628..   30831:  227218552.. 227228755:  10204:  227169608:
>     6:    30832..   36047:  227228760.. 227233975:   5216:  227228756:
>     7:    36048..   36051:  227169608.. 227169611:      4:  227233976:
>     8:    36052..   41263:  227233980.. 227239191:   5212:  227169612:
>     9:    41264..   46479:  227271164.. 227276379:   5216:  227239192:
>    10:    46480..   46483:  227239196.. 227239199:      4:  227276380:
>    11:    46484..   51695:  227276384.. 227281595:   5212:  227239200:
>    12:    51696..   61903:  227281600.. 227291807:  10208:  227281596:
>    13:    61904..   61907:  227239200.. 227239203:      4:  227291808:
>    14:    61908..   67119:  227291812.. 227297023:   5212:  227239204:
>    15:    67120..   77327:  227297028.. 227307235:  10208:  227297024:
>    16:    77328..   77331:  227239204.. 227239207:      4:  227307236:
>    17:    77332..   82543:  227307240.. 227312451:   5212:  227239208:
>    18:    82544..   92751:  227312456.. 227322663:  10208:  227312452:
>    19:    92752..   92755:  227239208.. 227239211:      4:  227322664:
>    20:    92756..   97967:  227322668.. 227327879:   5212:  227239212:
>    21:    97968..  102547:  227239212.. 227243791:   4580:  227327880: last,eof
> linux-4.4.4.tar.bz2: 22 extents found
>
> There's definitely a pattern here.
What I find particularly interesting here is that the small extents
appear to be packed out of order into the gaps left between the
bigger ones.  For something that you don't need super fast access to,
this is actually a good thing because it reduces free space
fragmentation, but BTRFS has no way of knowing whether this trade-off
is worth it for that particular file.
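
To look at this on other files, here is a rough Python sketch that
parses `filefrag -ek` rows and picks out the small extents along with
their physical locations. The sample rows are the first six from the
output quoted above; the regex, function names, and dictionary keys are
my own assumptions for illustration and have only been matched against
this particular output layout.

```python
import re

# First six rows of the filefrag -ek output quoted above (quote markers removed).
SAMPLE = """\
    0:        0..    5199:  227197920.. 227203119:   5200:
    1:     5200..    5203:  227169600.. 227169603:      4:  227203120:
    2:     5204..   15407:  227203124.. 227213327:  10204:  227169604:
    3:    15408..   20623:  227213332.. 227218547:   5216:  227213328:
    4:    20624..   20627:  227169604.. 227169607:      4:  227218548:
    5:    20628..   30831:  227218552.. 227228755:  10204:  227169608:
"""

# index: logical_start..logical_end: physical_start..physical_end: length:
ROW = re.compile(
    r"^\s*(\d+):\s*(\d+)\.\.\s*(\d+):\s*(\d+)\.\.\s*(\d+):\s*(\d+):"
)


def parse_extents(text):
    """Parse filefrag -ek extent rows into a list of dicts (units: 1 KiB blocks)."""
    extents = []
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # skip headers and summary lines
        idx, lstart, lend, pstart, pend, length = map(int, m.groups())
        extents.append({
            "index": idx,
            "logical_start": lstart,
            "logical_end": lend,
            "phys_start": pstart,
            "phys_end": pend,
            "length": length,
        })
    return extents


def small_extents(extents, max_blocks=4):
    """Return extents no longer than max_blocks (4 KiB by default)."""
    return [e for e in extents if e["length"] <= max_blocks]


if __name__ == "__main__":
    for e in small_extents(parse_extents(SAMPLE)):
        print(e["index"], e["phys_start"], e["phys_end"])
```

On the sample rows this selects extents 1 and 4, and the physical range
of extent 4 starts directly after the end of extent 1.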
>
> Out of curiosity I also tried the above run with autodefrag enabled, and
> that helped a little bit: it merges those 4k extents into 256k-sized ones
> with the adjacent followup extent. That was nice, but still a bit unexpected
> since we've been told autodefrag is for random writes.
> It also doesn't really explain the original behaviour.
>
> I guess I need to add autodefrag everywhere now. :)

