public inbox for linux-xfs@vger.kernel.org
From: Dave Chinner <david@fromorbit.com>
To: Suyash Mahar <smahar@ucsd.edu>
Cc: "Darrick J. Wong" <djwong@kernel.org>,
	linux-xfs@vger.kernel.org, tpkelly@eecs.umich.edu,
	Suyash Mahar <suyash12mahar@outlook.com>
Subject: Re: XFS reflink overhead, ioctl(FICLONE)
Date: Thu, 15 Dec 2022 11:19:44 +1100	[thread overview]
Message-ID: <20221215001944.GC1971568@dread.disaster.area> (raw)
In-Reply-To: <CACQnzjua_0=Nz_gyza=iFVigceJO6Wbzn4X86E2y4N_Od3Yi0g@mail.gmail.com>

On Tue, Dec 13, 2022 at 08:47:03PM -0800, Suyash Mahar wrote:
> Hi Darrick,
> 
> Thank you for the response. I have replied inline.
> 
> -Suyash
> 
> > On Tue, Dec 13, 2022 at 09:18, Darrick J. Wong <djwong@kernel.org> wrote:
> >
> > [ugh, your email never made it to the list.  I bet the email security
> > standards have been tightened again.  <insert rant about dkim and dmarc
> > silent failures here>] :(
> >
> > On Sat, Dec 10, 2022 at 09:28:36PM -0800, Suyash Mahar wrote:
> > > Hi all!
> > >
> > > While using XFS's ioctl(FICLONE), we found that XFS seems to have
> > > poor performance (ioctl takes milliseconds for sparse files) and the
> > > overhead
> > > increases with every call.
> > >
> > > For the demo, we are using an Optane DC-PMM configured as a
> > > block device (fsdax) and running XFS (Linux v5.18.13).
> >
> > How are you using fsdax and reflink on a 5.18 kernel?  That combination
> > of features wasn't supported until 6.0, and the data corruption problems
> > won't get fixed until a pull request that's about to happen for 6.2.
> 
> We did not enable the dax option. The optane DIMMs are configured to
> appear as a block device.
> 
> $ mount | grep xfs
> /dev/pmem0p4 on /mnt/pmem0p4 type xfs
> (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
> 
> Regardless of the block device (the plot includes results for optane
> and RamFS), it seems like the ioctl(FICLONE) call is slow.

Please define "slow" - is it actually slower than it should be
(i.e. a bug), or does it simply not perform according to your
expectations?

Here are a few things you can quantify to answer those questions.

1. What is the actual rate at which it is cloning extents, i.e.
extent count / clone time?  Is this rate consistent/sustained, or
does it drop substantially over time and/or with increasing extent
count?

2. How does the clone speed of a given file compare to the actual
data copy speed of that file (please include fsync time in the data
copy results)? Is cloning faster or slower than copying
the data? What is the extent count of the file at the cross-over
point where cloning goes from being faster to slower than copying
the data?

3. How does it compare with btrfs running the same write/clone
workload? Does btrfs run faster? Does it perform better with
high extent counts than XFS? What about with high sharing counts
(e.g. after 500 or 1000 clones of the source file)?

Basically, I'm trying to understand what "slow" means in the context
of the operations you are performing.  I haven't seen any recent
performance regressions in clone speed on XFS, so I'm trying to
understand what you are seeing and why you think it is slower than
it should be.
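
One way to get at questions (1) and (2) is to log the extent count
and clone time for each iteration and derive the per-extent rate and
the cross-over against a plain data copy. A minimal sketch of that
arithmetic (the sample numbers below are made up for illustration,
not real benchmark results):

```python
# Derive the metrics above from per-iteration measurements.
# All numbers here are placeholders, not measured results.

def clone_rate(nextents, clone_secs):
    """Extents cloned per second for one FICLONE call."""
    return nextents / clone_secs

def crossover_extents(per_extent_clone_cost, copy_secs):
    """Extent count at which cloning becomes slower than copying
    the whole file (copy time should include the fsync)."""
    return copy_secs / per_extent_clone_cost

# Example: 250,000 extents cloned in 0.5s -> 500,000 extents/s.
rate = clone_rate(250_000, 0.5)

# If a full data copy + fsync takes 2.0s, cloning becomes the
# slower option once the file holds ~1,000,000 extents.
xover = crossover_extents(1 / rate, 2.0)
```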

> > > We create a 1 GiB dense file, then repeatedly modify a tiny random
> > > fraction of it and make a clone via ioctl(FICLONE).
> >
> > Yay, random cow writes, that will slowly increase the number of space
> > mapping records in the file metadata.

Yup, the scripts I use do exactly this - 10,000 random 4kB writes to
an 8GB file between reflink clones. I then iterate a few thousand
times and measure the reflink time.
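
For reference, a rough Python sketch of that kind of workload (this
is not Dave's actual script; the FICLONE value is taken from
linux/fs.h, and the clone will only succeed on a reflink-capable
filesystem such as XFS with reflink=1):

```python
import fcntl
import os
import random
import time

FICLONE = 0x40049409  # _IOW(0x94, 9, int), from linux/fs.h

def random_overwrites(fd, file_size, n_writes=10_000, blk=4096):
    """Dirty n_writes random 4kB blocks, forcing COW of shared
    extents and fragmenting the file's extent map."""
    nblocks = file_size // blk
    buf = os.urandom(blk)
    for _ in range(n_writes):
        os.pwrite(fd, buf, random.randrange(nblocks) * blk)

def timed_clone(src_fd, dst_path):
    """Clone src into a fresh file via FICLONE; return elapsed seconds."""
    dst_fd = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        t0 = time.perf_counter()
        fcntl.ioctl(dst_fd, FICLONE, src_fd)
        return time.perf_counter() - t0
    finally:
        os.close(dst_fd)
```

Iterating random_overwrites() followed by timed_clone() a few
thousand times and plotting the elapsed times against the extent
count reproduces the shape of the measurement being discussed.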

> > > The time required for the ioctl() calls increases from large to insane
> > > over the course of ~250 iterations: From roughly a millisecond for the
> > > first iteration or two (which seems high, given that this is on
> > > Optane and the code doesn't fsync or msync anywhere at all, ever) to 20
> > > milliseconds (which seems crazy).
> >
> > Does the system call runtime increase with O(number_extents)?  You might
> > record the number of extents in the file you're cloning by running this
> > periodically:
> >
> > xfs_io -c stat $path | grep fsxattr.nextents
> 
> The extent count does increase linearly (just like the ioctl() call latency).

As expected. Changing the sharing state of a single extent has a
roughly constant overhead regardless of the number of extents in the
file. Hence clone time should scale linearly with the number of
extents that need to have their shared state modified.
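
One way to confirm that scaling empirically is to check that the
per-extent cost stays roughly constant as the extent count grows.
A sketch with synthetic numbers (for illustration only):

```python
# Synthetic (nextents, clone_ms) samples. With linear scaling,
# the per-extent cost should stay roughly constant over the run.
samples = [(50_000, 100.0), (150_000, 300.0), (250_000, 500.0)]

per_extent_us = [1000.0 * ms / n for n, ms in samples]
spread = max(per_extent_us) / min(per_extent_us)

# A spread near 1.0 means clone time tracks extent count linearly;
# a growing spread would point at superlinear overhead instead.
```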

> I used the xfs_bmap tool, let me know if this is not the right way. If
> it is not, I'll update the microbenchmark to run xfs_io.

xfs_bmap is the slow way - it has to iterate every extent and
format them all out to userspace. The above mechanism just does a
single syscall to query the count of extents from the inode. Using
the fsxattr extent count query is much faster, especially when you
have files with tens of millions of extents in them....
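
That fsxattr query boils down to a single FS_IOC_FSGETXATTR ioctl.
A Python sketch (the ioctl number and struct layout follow
linux/fs.h; the call will fail on filesystems that don't implement
the fsxattr interface):

```python
import fcntl
import struct

# FS_IOC_FSGETXATTR = _IOR('X', 31, struct fsxattr).  struct fsxattr
# is 28 bytes: xflags, extsize, nextents, projid, cowextsize, 8 pad.
FS_IOC_FSGETXATTR = (2 << 30) | (28 << 16) | (ord('X') << 8) | 31

def nextents(fd):
    """Return fsx_nextents for an open file descriptor - the same
    count xfs_io's 'stat' command reports, in one syscall."""
    buf = fcntl.ioctl(fd, FS_IOC_FSGETXATTR, bytes(28))
    xflags, extsize, nxt, projid, cowextsize = struct.unpack("IIIII8x", buf)
    return nxt
```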

Cheers,

Dave.

-- 
Dave Chinner
david@fromorbit.com


Thread overview: 14+ messages
     [not found] <CACQnzjuhRzNruTm369wVQU3y091da2c+h+AfRED+AtA-dYqXNQ@mail.gmail.com>
2022-12-13 17:18 ` XFS reflink overhead, ioctl(FICLONE) Darrick J. Wong
2022-12-14  1:46   ` Terence Kelly
2022-12-14  4:47   ` Suyash Mahar
2022-12-15  0:19     ` Dave Chinner [this message]
2022-12-16  1:06       ` Terence Kelly
2022-12-17 17:30         ` Mike Fleetwood
2022-12-17 18:43           ` Terence Kelly
2022-12-18  1:46         ` Dave Chinner
2022-12-18  4:47           ` Suyash Mahar
2022-12-20  3:06             ` Darrick J. Wong
2022-12-21 22:34               ` atomic file commits (was: Re: XFS reflink overhead, ioctl(FICLONE)) Terence Kelly
2022-12-18 23:40           ` XFS reflink overhead, ioctl(FICLONE) Terence Kelly
2022-12-20  2:16             ` Dave Chinner
2022-12-21 23:07               ` wish list for Santa (was: Re: XFS reflink overhead, ioctl(FICLONE)) Terence Kelly