From: Liu Bo <bo.li.liu@oracle.com>
To: "Holger Hoffstätte" <holger.hoffstaette@googlemail.com>
Cc: "Austin S. Hemmelgarn" <ahferroin7@gmail.com>,
linux-btrfs <linux-btrfs@vger.kernel.org>,
Chris Mason <clm@fb.com>
Subject: Re: Stray 4k extents with slow buffered writes
Date: Thu, 3 Mar 2016 17:37:24 -0800
Message-ID: <20160304013723.GC8666@localhost.localdomain>
In-Reply-To: <20160303221309.GA8666@localhost.localdomain>
On Thu, Mar 03, 2016 at 02:13:09PM -0800, Liu Bo wrote:
> On Thu, Mar 03, 2016 at 10:50:58PM +0100, Holger Hoffstätte wrote:
> > On 03/03/16 21:47, Austin S. Hemmelgarn wrote:
> > >> $mount | grep sdf
> > >> /dev/sdf1 on /mnt/usb type btrfs (rw,relatime,space_cache=v2,subvolid=5,subvol=/)
> > > Do you still see the same behavior with the old space_cache format?
> > > This appears to be an issue of space management and allocation, so
> > > this may be playing a part.
> >
> > I just did the clear_cache,space_cache=v1 dance. Now a download with
> > bandwidth-limit=1M, dirty_expire=20s, commit=30 and *no* autodefrag
> > first ended up looking like this:
> >
> > $filefrag -ek linux-4.5-rc6.tar.xz
> > Filesystem type is: 9123683e
> > File size of linux-4.5-rc6.tar.xz is 88362576 (86292 blocks of 1024 bytes)
> > ext: logical_offset: physical_offset: length: expected: flags:
> > 0: 0.. 7427: 227197920.. 227205347: 7428:
> > 1: 7428.. 33027: 227205348.. 227230947: 25600:
> > 2: 33028.. 53011: 227271164.. 227291147: 19984: 227230948:
> > 3: 53012.. 72995: 227291148.. 227311131: 19984:
> > 4: 72996.. 86291: 227311132.. 227324427: 13296: last,eof
> > linux-4.5-rc6.tar.xz: 2 extents found
> >
> > Yay! But wait, there's more!
> >
> > $sync
> > $filefrag -ek linux-4.5-rc6.tar.xz
> > Filesystem type is: 9123683e
> > File size of linux-4.5-rc6.tar.xz is 88362576 (86292 blocks of 1024 bytes)
> > ext: logical_offset: physical_offset: length: expected: flags:
> > 0: 0.. 7423: 227197920.. 227205343: 7424:
> > 1: 7424.. 7427: 227169600.. 227169603: 4: 227205344:
> > 2: 7428.. 33023: 227205348.. 227230943: 25596: 227169604:
> > 3: 33024.. 33027: 227169604.. 227169607: 4: 227230944:
> > 4: 33028.. 53007: 227271164.. 227291143: 19980: 227169608:
> > 5: 53008.. 53011: 227230948.. 227230951: 4: 227291144:
> > 6: 53012.. 72991: 227291148.. 227311127: 19980: 227230952:
> > 7: 72992.. 72995: 227230952.. 227230955: 4: 227311128:
> > 8: 72996.. 86291: 227311132.. 227324427: 13296: 227230956: last,eof
> > linux-4.5-rc6.tar.xz: 9 extents found
> >
> > Now I'm like ¯\(ツ)/¯
>
> Yeah, after sync, I also get this file layout.
OK... I think I've found why we get this weird layout: btrfs applies COW
to overwrites, while ext4 just updates the data in place.
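To make the COW vs. in-place difference concrete, here is a toy Python model. It is purely illustrative and nothing like the real btrfs code: the extent tuples, the overwrite() helper and all block numbers are made up, and it assumes the overwritten range is already mapped and written back.

```python
from typing import List, Tuple

# (logical_start, physical_start, length), all in blocks; hypothetical layout
Extent = Tuple[int, int, int]

def merge(extents: List[Extent]) -> List[Extent]:
    """Coalesce neighbours that are contiguous both logically and physically."""
    merged: List[Extent] = []
    for log, phys, length in sorted(extents):
        if merged:
            plog, pphys, plen = merged[-1]
            if plog + plen == log and pphys + plen == phys:
                merged[-1] = (plog, pphys, plen + length)
                continue
        merged.append((log, phys, length))
    return merged

def overwrite(extents: List[Extent], start: int, length: int,
              cow: bool, next_free: int) -> List[Extent]:
    """Overwrite logical blocks [start, start + length) after writeback."""
    if not cow:
        # In-place update: the logical-to-physical mapping does not change.
        return merge(extents)
    end = start + length
    result: List[Extent] = []
    for log, phys, elen in extents:
        eend = log + elen
        if eend <= start or log >= end:
            result.append((log, phys, elen))      # untouched extent
            continue
        if log < start:
            result.append((log, phys, start - log))  # head stays where it was
        if eend > end:
            result.append((end, phys + (end - log), eend - end))  # tail stays
    # COW: the rewritten blocks land at a new physical location.
    result.append((start, next_free, length))
    return merge(result)

one_extent = [(0, 1000, 100)]
print(len(overwrite(one_extent, 40, 4, cow=True, next_free=5000)))   # 3
print(len(overwrite(one_extent, 40, 4, cow=False, next_free=5000)))  # 1
```

In this model a single 4-block overwrite turns one extent into three under COW, and leaves it as one extent when updated in place.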
Here is my filefrag output after sync,
# !filefrag
filefrag -vb /mnt/btrfs/linux-4.5-rc6.tar.xz
Filesystem type is: 9123683e
File size of /mnt/btrfs/linux-4.5-rc6.tar.xz is 88362576 (86292 blocks, blocksize 1024)
 ext logical physical expected length flags
   0       0    12352             5020
   1    5020    17376    17372       4
   2    5024   133504    17380   30908
   3   35932   195296   164412       4
   4   35936   164416   195300   30876
   5   66812   195300   195292   19480 eof
/mnt/btrfs/linux-4.5-rc6.tar.xz: 6 extents found
And here is the btrfs_dirty_pages trace output, grepped for the first single 4k extent:
# trace-cmd report -i /tmp/trace.dat | grep "dirty_page" | grep $((5020 << 10)) -A 2 -B 2
wget-29482 [003] 783746.039682: bprint: btrfs_dirty_pages: page start 5124096 end 5132287
wget-29482 [003] 783746.039771: bprint: btrfs_dirty_pages: page start 5128192 end 5144575
wget-29482 [003] 783746.263238: bprint: btrfs_dirty_pages: page start 5140480 end 5148671
wget-29482 [003] 783746.263304: bprint: btrfs_dirty_pages: page start 5144576 end 5160959
wget-29482 [003] 783746.263546: bprint: btrfs_dirty_pages: page start 5156864 end 5165055
So it turns out that wget writes the data in an overlapping way:
extent [5140480, 4096) is written twice, and the second write to the
extent can trigger a COW write once the first write to the extent has
finished its endio.
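For anyone who wants to double-check the overlap, here is a small Python sketch that takes the five trace lines above and computes where consecutive dirty ranges overlap. The helper names are made up for illustration; the only input is the trace text as shown, with the end offsets treated as inclusive.

```python
import re

TRACE = """\
wget-29482 [003] 783746.039682: bprint: btrfs_dirty_pages: page start 5124096 end 5132287
wget-29482 [003] 783746.039771: bprint: btrfs_dirty_pages: page start 5128192 end 5144575
wget-29482 [003] 783746.263238: bprint: btrfs_dirty_pages: page start 5140480 end 5148671
wget-29482 [003] 783746.263304: bprint: btrfs_dirty_pages: page start 5144576 end 5160959
wget-29482 [003] 783746.263546: bprint: btrfs_dirty_pages: page start 5156864 end 5165055
"""

PATTERN = re.compile(r"page start (\d+) end (\d+)")

def dirty_ranges(text):
    """Return (start, end) byte ranges in trace order; end is inclusive."""
    return [(int(s), int(e)) for s, e in PATTERN.findall(text)]

def overlaps(ranges):
    """Return (offset, length) for the overlap of each consecutive pair."""
    found = []
    for (s1, e1), (s2, e2) in zip(ranges, ranges[1:]):
        lo, hi = max(s1, s2), min(e1, e2)
        if lo <= hi:
            found.append((lo, hi - lo + 1))
    return found

found = overlaps(dirty_ranges(TRACE))
print(found)
# filefrag extent 1 starts at block 5020 of 1024 bytes, i.e. byte 5020 << 10
print((5020 << 10, 4096) in found)
```

Every consecutive pair of writes here shares exactly one 4096-byte page, and one of those shared pages is the [5140480, 4096) extent from the filefrag output.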
With mount -onodatacow,
# !filefrag
filefrag -vb /mnt/btrfs/linux-4.5-rc6.tar.xz
Filesystem type is: 9123683e
File size of /mnt/btrfs/linux-4.5-rc6.tar.xz is 88362576 (86292 blocks, blocksize 1024)
 ext logical physical expected length flags
   0       0    12416             5292
   1    5292   133504    17708   35872
   2   41164   169376            30880
   3   72044   200256            14248 eof
/mnt/btrfs/linux-4.5-rc6.tar.xz: 2 extents found
Anyway, it's not due to any btrfs allocator bug (although I thought it
was and was trying to track it down...).
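If someone wants to try this without wget, a rough Python sketch of a slow writer follows. It is an untested assumption that this reproduces the layout: the idea is only that writes which are not a multiple of 4k dirty the boundary page twice, and whether the first dirtying has been written back before the second depends on the pause and on the dirty_expire/commit settings. The function name, chunk size and delay are all made up; the open flags are the ones from the wget strace.

```python
import os
import time

def slow_unaligned_write(path, total_bytes, chunk_bytes=10000, delay_s=0.0):
    """Write total_bytes sequentially in chunk_bytes pieces with pauses.

    chunk_bytes is deliberately not a multiple of 4096, so each write ends
    in the middle of a page and the next write dirties that page again.
    """
    # Same flags as in the wget strace (no O_APPEND); fails if path exists.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o666)
    written = 0
    try:
        while written < total_bytes:
            chunk = b"x" * min(chunk_bytes, total_bytes - written)
            sent = 0
            while sent < len(chunk):
                sent += os.write(fd, chunk[sent:])  # handle short writes
            written += len(chunk)
            if delay_s > 0 and written < total_bytes:
                time.sleep(delay_s)  # give writeback a chance to run
    finally:
        os.close(fd)
    return written
```

Something like slow_unaligned_write("/mnt/btrfs/test.bin", 8 << 20, delay_s=1.0) followed by sync and filefrag -v would be the experiment; the file name and the numbers are only placeholders.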
Thanks,
-liubo
>
> >
> > With autodefrag the same happens, though it then eventually does the
> > merging from 4k -> 256k. I went searching for that hardcoded 256k value
> > and found it as default in ioctl.c:btrfs_defrag_file() when no threshold
> > has been passed, as is the case for autodefrag. I'll try to increase that
> > and see how much I can destroy.
> >
> > Also, rsync with --bwlimit=1m does _not_ seem to create files like this:
> >
> > $rsync (..)
> > $filefrag -ek linux-4.4.4.tar.bz2
> > Filesystem type is: 9123683e
> > File size of linux-4.4.4.tar.bz2 is 105008928 (102548 blocks of 1024 bytes)
> > ext: logical_offset: physical_offset: length: expected: flags:
> > 0: 0.. 4095: 227197920.. 227202015: 4096:
> > 1: 4096.. 25599: 227202016.. 227223519: 21504:
> > 2: 25600.. 51199: 227271164.. 227296763: 25600: 227223520:
> > 3: 51200.. 76799: 227296764.. 227322363: 25600:
> > 4: 76800.. 102547: 227322364.. 227348111: 25748: last,eof
> > linux-4.4.4.tar.bz2: 2 extents found
> >
> > Which looks exactly as one would expect, probably - as Chris' mail
> > just explained - it doesn't use O_APPEND, whereas wget apparently does.
>
> Interesting, my strace log shows wget doesn't open the file with O_APPEND.
>
> open("linux-4.5-rc6.tar.xz", O_WRONLY|O_CREAT|O_EXCL, 0666) = 4
>
> Thanks,
>
> -liubo
>
> >
> > > I'd be somewhat curious to see if something similar happens on other
> > > filesystems with such low writeback timeouts. My thought in this
> > > case is that the issue is that BTRFS's allocator isn't smart enough
> > > to try and merge new extents into existing ones when possible.
> >
> > ext4 creates 1-2 extents, regardless of method.
> >
> > Holger
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html