From: Brian Foster <bfoster@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: xfs@oss.sgi.com
Subject: Re: [PATCH v2 02/12] libxfs: track largest metadata LSN in use via verifiers
Date: Fri, 2 Oct 2015 07:33:04 -0400
Message-ID: <20151002113304.GB51208@bfoster.bfoster>
In-Reply-To: <20151002021634.GJ3902@dastard>
On Fri, Oct 02, 2015 at 12:16:34PM +1000, Dave Chinner wrote:
> On Thu, Oct 01, 2015 at 04:38:51PM -0400, Brian Foster wrote:
> > On Thu, Sep 24, 2015 at 08:36:25AM +1000, Dave Chinner wrote:
> > > On Wed, Sep 23, 2015 at 09:18:31AM -0400, Brian Foster wrote:
> > > > > This will have the same lock contention problems that the kernel
> > > > > code would have had - my repair scalability tests regularly reach
> > > > > over 1GB/s of metadata being prefetched through tens of threads, so
> > > > > this is going to have a significant impact on performance in those
> > > > > tests....
> > ...
> > >
> > > > I'll have to think about this some more and see what's effective. I'd
> > > > also like to quantify the effect the current locking has on performance
> > > > if possible. Can you provide a brief description of your typical repair
> > > > test that you would expect this to hurt? E.g., a large fs, many AGs,
> > > > populated with fs_mark and repaired with many threads..? Any special
> > > > storage configuration? Thanks.
> > >
> > > Just my usual 500TB fs_mark test...
> >
> > Thanks for the test information and sample results. I wasn't able to get
> > close enough to the base numbers you mentioned on IRC with the spinning
> > rust storage I have available. Instead, I tried running something
> > similar using a large ramdisk as a backing store. I have a 500T sparse
> > file formatted with XFS and populated with ~25m inodes that uses roughly
> > ~16GB of the backing store (leaving another 16GB of usable RAM for the
> > server).
>
> Keep in mind when doing this sort of testing that a ramdisk does not
> perform like normal storage - it does synchronous IO via memcpy()
> rather than async IO via DMA, and hence has a very different CPU
> cache footprint, concurrency and latency profile to normal IO.
>
> I'm not saying your numbers are invalid, just making sure that you
> remember that ramdisks, while fast, may not give results that are
> representative of normal storage behaviour...
>
Yeah, good point. I ended up on the ramdisk more as a last resort than
anything. I'm going to see if I can get something set up to test this
kind of thing more easily in the future.
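For the record, what I have in mind is along these lines: a large sparse
file on whatever fast storage is handy, loop mounted and populated with
fs_mark, then repaired directly from the image file. The sizes and
fs_mark parameters below are purely illustrative, not the exact test:

  truncate -s 500T /mnt/fast/fsfile      # sparse 500T backing file
  mkfs.xfs -f /mnt/fast/fsfile           # defaults (crc=1 on current xfsprogs)
  mkdir -p /mnt/scratch
  mount -o loop /mnt/fast/fsfile /mnt/scratch

  # populate with a pile of zero-length files, one fs_mark thread per dir
  mkdir -p /mnt/scratch/{0..3}
  fs_mark -D 10000 -S0 -n 100000 -s 0 \
          -d /mnt/scratch/0 -d /mnt/scratch/1 \
          -d /mnt/scratch/2 -d /mnt/scratch/3

  umount /mnt/scratch
  time xfs_repair -f /mnt/fast/fsfile    # -f: repair a filesystem image file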
> > I run xfs_repair[1] against that 500TB fs and see spikes of throughput up
> > over 2GB/s and get repair result reports like the following:
>
> <snip>
>
> > So I'm not seeing much difference here with the max LSN tracking as it
> > is implemented in this series.
>
> Yup, same here. min/max spread between the 4.2.0 debian unstable
> package and a locally built 4.2.0+lsn tracking is the same. Both
> varied between 2m55s and 3m5s, all used roughly the same user and
> system CPU. So we'll commit it as is ;)
>
Ok, thanks for verifying that. I still have to rebase this code once the
kernel side is reviewed and make a couple minor updates as well.
> > Out of curiosity, I ran a v3.2.2
> > xfs_repair binary that happened to be installed on this host, got a much
> > faster result than even the current master, and via perf diff discovered
> > that the biggest difference between the runs was actual CRC calculation.
>
> I think you might have drawn the incorrect conclusion from the
> profile data...
>
> > Based on that, I ran the same crc=0 test against the current code with
> > the following results:
> >
> > Phase     Start           End             Duration
> > Phase 1:  10/01 13:53:49  10/01 13:53:49
> > Phase 2:  10/01 13:53:49  10/01 13:53:50  1 second
> > Phase 3:  10/01 13:53:50  10/01 13:54:52  1 minute, 2 seconds
> > Phase 4:  10/01 13:54:52  10/01 13:55:01  9 seconds
> > Phase 5:  10/01 13:55:01  10/01 13:55:01
> > Phase 6:  10/01 13:55:01  10/01 13:55:35  34 seconds
> > Phase 7:  10/01 13:55:35  10/01 13:55:35
> >
> > ... so that knocks off another 15s or so from the test. Note that the
> > lsn lock is irrelevant in the crc=0 case as there are no metadata LSNs,
> > thus no verification occurs.
>
> My results:
>
>                       fsmark creation
> mkfs (4.2.0)          files/s (avg)   wall time      repair
> defaults              203209          4m57s          2m55s-3m05s
> -m crc=0              188837          5m06s          2m07s-2m10s
>
> At first glance, I've just replicated your results. But the PCP
> monitoring I run all the time tells a different story. Both
> xfs_repair runs are IO bound during phases 3 and 4, but the crc=0 run
> does a *lot less IO*.
>
> The reason for this will be obvious when I tell you: inodes are 512
> bytes when CRCs are enabled, 256 bytes when they aren't. Hence
> there's twice as much metadata to be read when CRCs are enabled, and
> when it is read at a fixed rate, it takes twice as long to read.
>
Ah, right. I probably should have formatted the crc=0 fs with isize=512.
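(Rough numbers, in case they help: with ~25m inodes that's on the order
of 25m * 512 bytes ≈ 12.8GB of inode metadata with CRCs vs. ~6.4GB at
isize=256, so the inode size difference alone plausibly accounts for the
extra IO in the crc=1 runs.)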
> The trap here is that perf profiles tend to present
> "percentage of CPU used by a function" rather than absolute CPU
> consumed by the function across a test run. That's why you might
> think that the CPU consumed by CRCs is responsible for the
> difference in performance - it's the only really obvious difference
> in the profile. What I think you'll find if you look deeper is that
> the time spent in memcpy() deep in the ramdisk IO path has doubled
> and that's where most of the additional time comes from.
>
> Apples to apples comparison at the IO level on my test rig:
>
>                       fsmark creation
> mkfs                  files/s (avg)   wall time      repair
> defaults              203209          4m57s          2m55s-3m05s
> -m crc=0              188837          5m06s          2m07s-2m10s
> -m crc=0 -i size=512  194331          5m00s          3m00s-3m10s
>
> Yup, when doing IO-equivalent testing, CRCs have no measurable
> wall time impact on xfs_repair runtime.
>
Ok, that minor range shift seems like a much more reasonable result. The
1m or so difference I saw (on spinning storage) seemed like a lot, which
is what prompted me to dig into what was going on there. I suspected a
regression at first, but the CRC code jumped out at the top of the CPU
profile, so I stopped there and forgot about the isize change. Thanks
for the analysis...
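Next time I'll make a point of checking the relative profile against
absolute numbers before drawing conclusions. Something like the
following is probably what I should have reached for first (illustrative
invocations and paths, not what I actually ran):

  # absolute CPU and wall time consumed, rather than a percentage breakdown
  perf stat -- xfs_repair -f /mnt/fast/fsfile

  # keep both profiles around so the comparison is explicit
  perf record -o crc1.data -- xfs_repair -f /mnt/fast/fsfile
  perf record -o crc0.data -- xfs_repair -f /mnt/fast/fsfile.nocrc
  perf diff crc1.data crc0.data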
Brian
> Performance analysis is tricky stuff.... ;)
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com