linux-xfs.vger.kernel.org archive mirror
From: Dave Chinner <david@fromorbit.com>
To: Nicholas Piggin <npiggin@gmail.com>
Cc: linux-xfs@vger.kernel.org, Christoph Hellwig <hch@lst.de>,
	Dave Chinner <dchinner@redhat.com>,
	"Darrick J. Wong" <darrick.wong@oracle.com>
Subject: Re: [rfc] larger batches for crc32c
Date: Mon, 31 Oct 2016 14:08:53 +1100
Message-ID: <20161031030853.GK22126@dastard>
In-Reply-To: <20161028160218.1af40906@roar.ozlabs.ibm.com>

On Fri, Oct 28, 2016 at 04:02:18PM +1100, Nicholas Piggin wrote:
> Okay, the XFS crc sizes indeed don't look too bad, so it's more the
> crc implementation I suppose. I was seeing a lot of small calls to crc,
> but as a fraction of the total number of bytes, it's not as significant
> as I thought. That said, there is some improvement you may be able to
> get even from the x86 implementation.
> 
> I took an ilog2 histogram of frequency and total bytes going to XFS

Which means ilog2 = 3 is 8-15 bytes and 9 is 512-1023 bytes? 
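For reference, a minimal sketch of the bucket ranges that a strict
floor(log2) histogram would give. It assumes the histogram index is the
plain floor of log2 of the length in bytes; the helper names `ilog2` and
`bucket_range` are illustrative only and are not taken from the test
harness that produced the numbers.

```python
def ilog2(n):
    """Floor of log base 2 of a positive integer."""
    if n <= 0:
        raise ValueError("ilog2 is undefined for n <= 0")
    return n.bit_length() - 1


def bucket_range(k):
    """Inclusive range of byte lengths that land in bucket k."""
    return (1 << k, (1 << (k + 1)) - 1)


# Print the ranges for a few of the buckets discussed in this thread.
for k in (3, 9, 10, 13):
    lo, hi = bucket_range(k)
    print(f"ilog2 = {k:2d}: {lo}-{hi} bytes")
```

If the histogram was instead indexed 1-based (fls()-style), every range
shifts down by one power of two, so this is worth confirming.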

> checksum, with total, head, and tail lengths. I'll give as percentages
> of total for easier comparison (total calls were around 1 million and
> 500MB of data):

Does this table match the profile you showed with all the overhead
being through the fsync->log write path?

Decode table - these are the offsets of crc fields in XFS structures.
('pahole fs/xfs/xfs.o |grep crc' and cleaned up; the two numbers in
each comment are the byte offset and size of the field)

	AGF header                 agf_crc;              /*   216     4 */
	short btree                bb_crc;               /*    44     4 */
	long btree                 bb_crc;               /*    56     4 */
	log buffer header          h_crc;                /*    32     4 */
	AG Free list               agfl_crc;             /*    32     4 */
	dir/attr leaf/node block   crc;                  /*    12     4 */
	remote attribute           rm_crc;               /*    12     4 */
	directory data block       crc;                  /*     4     4 */
	dquot                      dd_crc;               /*   108     4 */
	AGI header                 agi_crc;              /*   312     4 */
	inode                      di_crc;               /*   100     4 */
	superblock                 sb_crc;               /*   224     4 */
	symlink                    sl_crc;               /*    12     4 */
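A rough sketch of how a crc offset from the table turns into head and
tail lengths. It assumes the checksum runs over the bytes before the crc
field (head) and the bytes after it (tail), skipping the 4-byte field
itself. The object sizes below are assumptions for a default
configuration (512 byte inodes and log header sectors, 4k directory
blocks), not values taken from the test filesystem, and `head_tail` is a
hypothetical helper.

```python
CRC_SIZE = 4  # every crc field in the table above is 4 bytes wide

# (name, crc offset from the table, ASSUMED object size in bytes)
OBJECTS = [
    ("inode", 100, 512),
    ("log buffer header", 32, 512),
    ("directory data block", 4, 4096),
]


def head_tail(crc_offset, object_size):
    """Byte counts before and after the crc field of one object."""
    head = crc_offset
    tail = object_size - crc_offset - CRC_SIZE
    return head, tail


def ilog2(n):
    """Floor of log base 2 of a positive integer."""
    return n.bit_length() - 1


for name, offset, size in OBJECTS:
    head, tail = head_tail(offset, size)
    print(f"{name:22s} head={head:4d} (ilog2 {ilog2(head)}) "
          f"tail={tail:4d} (ilog2 {ilog2(tail)})")
```

With these assumed sizes the strict floor(log2) values come out one
lower than the histogram rows annotated for the same objects below,
which may mean the histogram index is 1-based. That is an inference from
the arithmetic, not something stated in the thread.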

> 
>                 frequency                   bytes
> ilog2   total   | head | tail       total | head | tail
>   3         0     1.51      0           0   0.01      0

Directory data blocks.

>   4         0        0      0           0      0      0
>   5         0        0      0           0      0      0
>   6         0    22.35      0           0   1.36      0

log buffer headers, short/long btree blocks

>   7         0    76.10      0           0  14.40      0

Inodes.

>   8         0     0.04     ~0           0   0.02     ~0
>   9     22.25       ~0  98.39       13.81     ~0  71.07

Inode, log buffer header tails.

>  10     76.14        0      0       73.77      0      0

Full sector, no head, no tail (i.e. external crc store)? I think
only log buffers (the extended header sector CRCs) can do that.
That implies a large log buffer (e.g. 256k) is configured and
(possibly) log stripe unit padding is being done. What is the
xfs_info and mount options from the test filesystem?

>  11         0        0      0           0      0      0
>  12         0        0   1.60           0      0  12.39

Directory data block tails.

>  13      1.60        0      0       12.42      0      0

Larger than 4k? Probably only log buffers.

> Keep in mind you have to sum the number of bytes for head and tail to
> get ~100%.
> 
> Now for x86-64, you need to be at 9-10 (depending on configuration) or
> greater to exceed the breakeven point for their fastest implementation.
> Split crc implementation will use the fast algorithm for about 85% of
> bytes in the best case, 12% at worst. Combined gets there for 85% at
> worst, and 100% at best. The slower x86 implementation still uses a
> hardware instruction, so it doesn't do too badly.
> 
> For powerpc, the breakeven is at 512 + 16 bytes (9ish), but it falls
> back to generic implementation for bytes below that.

Which means for the most common objects we won't be able to reach
breakeven easily simply because of the size of the objects we are
running CRCs on. e.g. sectors and inodes/dquots by default are all
512 bytes or smaller. There's only so much that can be optimised
here...
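The split vs combined percentages quoted above can be roughly re-derived
from the bytes columns of the histogram. This is a coarse sketch: it
treats every call in a bucket at or above the breakeven bucket as
reaching the fast path, drops the "~0" entries, and uses the histogram
row numbers as given. `fast_path_share` is a hypothetical helper.

```python
# Percent of total bytes per histogram row, copied from the quoted table.
BYTES_TOTAL = {9: 13.81, 10: 73.77, 13: 12.42}
BYTES_HEAD = {3: 0.01, 6: 1.36, 7: 14.40, 8: 0.02}
BYTES_TAIL = {9: 71.07, 12: 12.39}


def fast_path_share(buckets, breakeven):
    """Sum the byte percentages of all rows at or above the breakeven row."""
    return sum(pct for row, pct in buckets.items() if row >= breakeven)


for breakeven in (9, 10):
    split = (fast_path_share(BYTES_HEAD, breakeven)
             + fast_path_share(BYTES_TAIL, breakeven))
    combined = fast_path_share(BYTES_TOTAL, breakeven)
    print(f"breakeven row {breakeven}: "
          f"split {split:.2f}%  combined {combined:.2f}%")
```

A breakeven at row 9 gives roughly 83% of bytes for the split
implementation and 100% for the combined one; at row 10 it is roughly
12% and 86%. That lines up with the "85% best, 12% worst" and "85%
worst, 100% best" figures quoted above.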

> I think we can
> reduce the break even point on powerpc slightly and capture most of
> the rest, so it's not so bad.
> 
> Anyway at least that's a data point to consider. Small improvement is
> possible.

Yup, but there's no huge gain to be made here - these numbers say to
me that the problem may not be the CRC overhead, but instead is the
amount of CRC work being done. Hence my request for mount options
+ xfs_info to determine if what you are seeing is simply a bad fs
configuration for optimal small log write performance. CRC overhead
may just be a symptom of a filesystem config issue...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
