From: "Darrick J. Wong" <djwong@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-btrfs@vger.kernel.org, xfs <linux-xfs@vger.kernel.org>
Subject: Re: Unexpected reflink/subvol snapshot behaviour
Date: Mon, 1 Feb 2021 18:14:21 -0800
Message-ID: <20210202021421.GA7181@magnolia>
In-Reply-To: <20210121222051.GB4626@dread.disaster.area>

On Fri, Jan 22, 2021 at 09:20:51AM +1100, Dave Chinner wrote:
> Hi btrfs-gurus,
> 
> I'm running a simple reflink/snapshot/COW scalability test at the
> moment. It is just a loop that does "fio overwrite of 10,000 4kB
> random direct IOs in a 4GB file; snapshot" and I want to check a
> couple of things I'm seeing with btrfs. fio config file is appended
> to the email.
> 
> Firstly, what is the expected "space amplification" of such a
> workload over 1000 iterations on btrfs? This will write 40GB of user
> data, and I'm seeing btrfs consume ~220GB of space for the workload
> regardless of whether I use subvol snapshot or file clones
> (reflink).  That's a space amplification of ~5.5x (a lot!) so I'm
> wondering if this is expected or whether there's something else
> going on. XFS amplification for 1000 iterations using reflink is
> only 1.4x, so 5.5x seems somewhat excessive to me.
> 
> On a similar note, the IO bandwidth consumed by btrfs is way out of
> proportion with the amount of user data being written. I'm seeing
> multiple GBs being written by btrfs on every iteration - easily
> exceeding 5GB of writes per cycle in the later iterations of the
> test. Given that only 40MB of user data is being written per cycle,
> there's a write amplification factor of well over 100x occurring
> here. In comparison, XFS is writing roughly consistently at 80MB/s
> to disk over the course of the entire workload, largely because of
> journal traffic for the transactions run during COW and clone
> operations.  Is such a huge amount of IO expected for btrfs in
> this situation?
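
As a rough cross-check of the two ratios quoted above, here is a minimal
sketch of the arithmetic. Nothing in it is measured: the inputs are the
figures from the quoted text, "5GB" is assumed to mean 5000MB, and the
ratio helper and the variable names are made up for illustration.

```shell
#!/bin/sh
# Figures are copied from the quoted message; nothing is measured here.

# ratio NUM DEN: print NUM/DEN to one decimal place (hypothetical helper).
ratio() {
    LC_ALL=C awk -v n="$1" -v d="$2" 'BEGIN { printf "%.1f\n", n / d }'
}

user_mb_per_cycle=40            # 10,000 IOs x 4kB per iteration
cycles=1000
btrfs_space_gb=220              # space consumed after 1000 iterations
btrfs_write_mb_per_cycle=5000   # "easily exceeding 5GB", assumed decimal

# Total user data written over the whole run, in GB.
user_total_gb=$((user_mb_per_cycle * cycles / 1000))

echo "user data written:   ${user_total_gb}GB"
echo "space amplification: $(ratio "$btrfs_space_gb" "$user_total_gb")x"
echo "write amplification: $(ratio "$btrfs_write_mb_per_cycle" "$user_mb_per_cycle")x"
```

With those inputs the space ratio matches the ~5.5x quoted, and the
per-cycle write ratio is consistent with "well over 100x".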

<just gonna snip this part>

> FYI, I've compared btrfs reflink to XFS reflink, too, and XFS fio
> performance stays largely consistent across all 1000 iterations at
> around 13-14k +/-2k IOPS. The reflink time also scales linearly with
> the number of extents in the source file and levels off at about
> 10-11s per cycle as the extent count in the source file levels off
> at ~850,000 extents. XFS completes the 1000 iterations of
> write/clone in about 4 hours, btrfs completes the same part of the
> workload in about 9 hours.

Just out of curiosity, do any of the patches in [1] improve those
numbers for xfs?  As you noted a long time ago, the transaction
reservations are kind of huge, so I fixed those and shook out a few
other warts while I was at it.

--D

[1] https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=reflink-speedups
> 
> Oh, I almost forgot - FIEMAP performance. After the reflink test, I
> map all the extents in all the cloned files to a) count the extents
> and b) confirm that the difference between clones is correct (~10000
> extents not shared with the previous iteration). Pulling the extent
> maps out of XFS takes about 3s a clone (~850,000 extents), or 30
> minutes for the whole set when run serialised. btrfs takes 90-100s
> per clone - after 8 hours it had only managed to map 380 files and
> was running at 6-7000 read IOPS the entire time. IOWs, it was taking
> _half a million_ read IOs to map the extents of a single clone that
> only had a million extents in it. Is it expected that FIEMAP is so
> slow and IO intensive on cloned files?
> 
> As there are no performance anomalies or memory reclaim issues with
> XFS running this workload, I suspect the issues I note above are
> btrfs issues, not expected behaviour.  I'm not sure what the
> expected scalability of btrfs file clones and snapshots is though,
> so I'm interested to hear if these results are expected or not.
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 
> JOBS=4
> IODEPTH=4
> IOCOUNT=$((10000 / $JOBS))
> FILESIZE=4g
> 
> cat >$fio_config <<EOF
> [global]
> name=${DST}.name
> directory=${DST}
> size=${FILESIZE}
> randrepeat=0
> bs=4k
> ioengine=libaio
> iodepth=${IODEPTH}
> iodepth_low=2
> direct=1
> end_fsync=1
> fallocate=none
> overwrite=1
> number_ios=${IOCOUNT}
> runtime=30s
> group_reporting=1
> disable_lat=1
> lat_percentiles=0
> clat_percentiles=0
> slat_percentiles=0
> disk_util=0
> 
> [j1]
> filename=testfile
> rw=randwrite
> 
> [j2]
> filename=testfile
> rw=randwrite
> 
> [j3]
> filename=testfile
> rw=randwrite
> 
> [j4]
> filename=testfile
> rw=randwrite
> EOF
> 
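
For readers who want to picture the loop around that fio job ("fio
overwrite; snapshot", repeated 1000 times), here is a minimal sketch. It
is not the script used in the test: the clone naming scheme, the default
paths, and the DRY_RUN switch are assumptions for illustration, and it
only shows the file clone (reflink) variant via cp --reflink=always.

```shell
#!/bin/sh
# Hypothetical driver for the overwrite/clone loop described in the quoted
# message. Paths, clone names and DRY_RUN are illustrative assumptions.

DST=${DST:-/mnt/scratch}                        # assumed test directory
fio_config=${fio_config:-/tmp/reflink-test.fio} # assumed job file location
ITERATIONS=${ITERATIONS:-1000}
DRY_RUN=${DRY_RUN:-1}                           # 1 = print commands only

# run CMD...: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# cycle N: one iteration, random overwrite followed by a reflink clone.
cycle() {
    run fio "$fio_config"
    run cp --reflink=always "$DST/testfile" "$DST/testfile.$1"
}

i=1
while [ "$i" -le "$ITERATIONS" ]; do
    cycle "$i"
    i=$((i + 1))
done
```

By default this only prints the commands; setting DRY_RUN=0 would run
them against whatever DST points at. For the subvolume variant, the cp
would presumably be replaced by a btrfs subvolume snapshot of a subvolume
holding testfile, which is not shown here.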
