Date: Tue, 1 Nov 2016 14:39:18 +1100
From: Nicholas Piggin
Subject: Re: [rfc] larger batches for crc32c
Message-ID: <20161101143918.4f154154@roar.ozlabs.ibm.com>
In-Reply-To: <20161031030853.GK22126@dastard>
References: <20161028031747.68472ac7@roar.ozlabs.ibm.com>
	<20161027214244.GO14023@dastard>
	<20161028131234.24a5cb6f@roar.ozlabs.ibm.com>
	<20161028160218.1af40906@roar.ozlabs.ibm.com>
	<20161031030853.GK22126@dastard>
To: Dave Chinner
Cc: linux-xfs@vger.kernel.org, Christoph Hellwig, Dave Chinner,
	"Darrick J. Wong"

On Mon, 31 Oct 2016 14:08:53 +1100
Dave Chinner wrote:

> On Fri, Oct 28, 2016 at 04:02:18PM +1100, Nicholas Piggin wrote:
> > Okay, the XFS crc sizes indeed don't look so bad, so it's more the
> > crc implementation I suppose. I was seeing a lot of small calls to
> > crc, but as a fraction of the total number of bytes, it's not as
> > significant as I thought. That said, there is some improvement you
> > may be able to get even from the x86 implementation.
> > 
> > I took an ilog2 histogram of frequency and total bytes going to XFS
> 
> Which means ilog2 = 3 is 8-15 bytes and 9 is 512-1023 bytes?

Yes.

> > checksum, with total, head, and tail lengths. I'll give them as
> > percentages of total for easier comparison (total calls were around
> > 1 million and 500MB of data):
> 
> Does this table match the profile you showed with all the overhead
> being through the fsync->log write path?

Yes.

[snip interesting summary]

> Full sector, no head, no tail (i.e. external crc store)? I think
> only log buffers (the extended header sector CRCs) can do that.
> That implies a large log buffer (e.g. 256k) is configured and
> (possibly) log stripe unit padding is being done. What is the
> xfs_info and mount options from the test filesystem?

See the end of the mail.

[snip]

> > Keep in mind you have to sum the number of bytes for head and tail
> > to get ~100%.
> > 
> > Now for x86-64, you need to be at 9-10 (depending on configuration)
> > or greater to exceed the breakeven point for their fastest
> > implementation. A split crc implementation will use the fast
> > algorithm for about 85% of bytes in the best case, 12% at worst.
> > Combined gets there for 85% at worst, and 100% at best. The slower
> > x86 implementation still uses a hardware instruction, so it doesn't
> > do too badly.
> > 
> > For powerpc, the breakeven is at 512 + 16 bytes (9ish), but it
> > falls back to the generic implementation for bytes below that.
> 
> Which means for the most common objects we won't be able to reach
> breakeven easily, simply because of the size of the objects we are
> running CRCs on. e.g. sectors and inodes/dquots by default are all
> 512 bytes or smaller. There's only so much that can be optimised
> here...

Well, for this workload at least, the full checksum size seems to
always be >= 512. The small heads cut it down and drag a lot of crc32c
calls from the 1024-2047 range (optimal for Intel) down to 512-1023.
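For reference, bucket n in that histogram covers lengths in
[2^n, 2^(n+1)). The counting was along these lines (a userspace sketch
rather than the actual instrumentation; account_crc_len() is a made-up
hook fed the length of each crc32c() call):

#include <stddef.h>

static unsigned long hist_calls[32];	/* calls per ilog2 bucket */
static unsigned long hist_bytes[32];	/* bytes per ilog2 bucket */

/* bucket = ilog2(len): bucket 3 is 8-15 bytes, bucket 9 is 512-1023 */
static void account_crc_len(size_t len)
{
	unsigned int bucket = 0;

	if (!len)
		return;
	while (len >> (bucket + 1))
		bucket++;
	hist_calls[bucket]++;
	hist_bytes[bucket] += len;
}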
I don't *think* I've done the wrong thing here, but if it looks odd to
you, I'll go back and double check.

> > I think we can reduce the breakeven point on powerpc slightly and
> > capture most of the rest, so it's not so bad.
> > 
> > Anyway, at least that's a data point to consider. A small
> > improvement is possible.
> 
> Yup, but there's no huge gain to be made here - these numbers say to
> me that the problem may not be the CRC overhead, but instead is the
> amount of CRC work being done. Hence my request for mount options
> + xfs_info to determine if what you are seeing is simply a bad fs
> configuration for optimal small log write performance. CRC overhead
> may just be a symptom of a filesystem config issue...

Yes, sorry, I forgot to send an xfs_info sample. mkfs.xfs is 4.3.0
from Ubuntu 16.04.

npiggin@fstn3:/etc$ sudo mkfs.xfs -f /dev/ram0
specified blocksize 4096 is less than device physical sector size 65536
switching to logical sector size 512
meta-data=/dev/ram0              isize=512    agcount=4, agsize=4194304 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=0
data     =                       bsize=4096   blocks=16777216, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=8192, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

Mount options are standard:

/dev/ram0 on /mnt type xfs (rw,relatime,attr2,inode64,noquota)

XFS stats sample:

extent_alloc 64475 822567 74740 1164625
abt 0 0 0 0
blk_map 1356685 1125591 334183 64406 227190 2816523 0
bmbt 0 0 0 0
dir 79418 460612 460544 5685160
trans 0 3491960 0
ig 381191 378085 0 3106 0 2972 153329
log 89045 2859542 62 132145 143932
push_ail 3491960 24 619 53860 0 6433 13135 284324 0 445
xstrat 64342 0
rw 951375 2937203
attr 0 0 0 0
icluster 47412 38985 221903
vnodes 5294 0 0 0 381123 381123 381123 0
buf 4497307 6910 4497106 1054073 13012 201 0 0 0
abtb2 139597 675266 27639 27517 0 0 0 0 0 0 0 0 0 0 1411718
abtc2 240942 1207277 120532 120410 0 0 0 0 0 0 0 0 0 0 4618844
bmbt2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ibt2 762383 3048311 69 67 0 0 0 0 0 0 0 0 0 0 263
fibt2 1114420 2571311 143583 143582 0 0 0 0 0 0 0 0 0 0 1232534
rmapbt 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
refcntbt 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
qm 0 0 0 0 0 0 0 0
xpc 3366711296 24870568605 34799779740
debug 0

Thanks,
Nick
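P.S. For anyone following along, what I mean by "split" versus
"combined" above: XFS checksums around the on-disk CRC field, so one
object generates a head call, a zeroed-field call, and a tail call
rather than a single crc32c() over the whole buffer. From memory it is
roughly the following (a sketch with simplified names, not the exact
kernel helpers):

#include <stdint.h>
#include <stddef.h>

/* assume a kernel-style crc32c is available: seed, buffer, length */
uint32_t crc32c(uint32_t seed, const void *data, size_t len);

/*
 * Checksum a buffer whose CRC field at crc_off must be treated as
 * zero.  Each of the three calls sees a shorter length than the
 * combined buffer, which is what drags so many calls below the
 * breakeven length of the vectorised implementations.
 */
static uint32_t cksum_around_crc_field(const char *buf, size_t len,
				       size_t crc_off)
{
	uint32_t zero = 0;
	uint32_t crc = ~0U;				/* seed */

	crc = crc32c(crc, buf, crc_off);		/* head */
	crc = crc32c(crc, &zero, sizeof(zero));		/* CRC field as 0 */
	crc = crc32c(crc, buf + crc_off + sizeof(zero),	/* tail */
		     len - crc_off - sizeof(zero));
	return ~crc;
}

A "combined" approach would instead zero the CRC field in place and
make one call over the whole buffer, keeping the per-call length in
the faster range.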