From: Ben Myers
Date: Mon, 18 Nov 2013 16:28:26 -0600
Subject: Re: [PATCH 2/5] xfs: open code inc_inode_iversion when logging an inode
Message-ID: <20131118222826.GJ1935@sgi.com>
References: <1383280040-21979-1-git-send-email-david@fromorbit.com> <1383280040-21979-3-git-send-email-david@fromorbit.com> <528A8C90.3010401@sandeen.net>
In-Reply-To: <528A8C90.3010401@sandeen.net>
To: Eric Sandeen
Cc: xfs@oss.sgi.com

On Mon, Nov 18, 2013 at 03:54:24PM -0600, Eric Sandeen wrote:
> On 10/31/13, 11:27 PM, Dave Chinner wrote:
> > From: Dave Chinner
> >
> > Michael L Semon reported that generic/069 runtime increased on v5
> > superblocks by 100% compared to v4 superblocks. His perf-based
> > analysis pointed directly at the timestamp updates being done by the
> > write path in this workload. The append writers are doing 4-byte
> > writes, so there are lots of timestamp updates occurring.
> >
> > The thing is, they aren't being triggered by timestamp changes -
> > they are being triggered by the inode change counter needing to be
> > updated. That is, every write(2) system call needs to bump the inode
> > version count, and it does that through the timestamp update
> > mechanism. Hence test generic/069 is running 3 orders of magnitude
> > more timestamp update transactions on v5 filesystems due to the
> > fact it does a huge number of *4 byte* write(2) calls.
> >
> > This isn't a real-world scenario we really need to address - anyone
> > doing such sequential IO should be using fwrite(3), not write(2).
> > i.e. fwrite(3) buffers the writes in userspace to minimise the
> > number of write(2) syscalls, and the problem goes away.
> >
> > However, there is a small change we can make to improve the
> > situation - removing the expensive lock operation on the change
> > counter update. All inode version counter changes in XFS occur
> > under the ip->i_ilock during a transaction, and therefore we
> > don't actually need the spin lock that provides exclusive access to
> > it through inc_inode_iversion().
> >
> > Hence avoid the lock and just open code the increment ourselves when
> > logging the inode.
>
> Well, ok.  Maybe worth a note about why the unlocked read is 99.9999% ok...
>
> Reviewed-by: Eric Sandeen

Ah, sorry Eric, I didn't realize you were still reviewing this guy.  I pulled
him in a bit earlier in the day.

Thanks,
	Ben
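
For reference, the fwrite(3) versus write(2) point in the quoted commit message can be illustrated with a small userspace sketch. This is not code from the patch or from generic/069. The helper names (append_with_write, append_with_fwrite, file_size) and the record contents are made up for illustration. The first helper issues one write(2) per 4-byte record, which is the pattern that bumps the inode change counter on every call. The second hands the same records to stdio, which accumulates them in a userspace buffer and issues far fewer write(2) calls (how many depends on the stdio buffer size chosen by the C library).

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: append nrec 4-byte records, one write(2) each.
 * Returns the number of write(2) calls issued, or -1 on error. */
static long append_with_write(const char *path, long nrec)
{
	const char rec[4] = { 'a', 'b', 'c', 'd' };
	long calls = 0;
	int fd;

	assert(nrec >= 0);
	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_APPEND, 0600);
	if (fd < 0)
		return -1;
	for (long i = 0; i < nrec; i++) {
		/* Every iteration is a separate system call. */
		if (write(fd, rec, sizeof(rec)) != (ssize_t)sizeof(rec)) {
			close(fd);
			return -1;
		}
		calls++;
	}
	return close(fd) == 0 ? calls : -1;
}

/* Hypothetical helper: write the same records through stdio.
 * fwrite(3) copies into a userspace buffer; the C library only calls
 * write(2) when that buffer fills or the stream is closed.
 * Returns 0 on success, -1 on error. */
static int append_with_fwrite(const char *path, long nrec)
{
	const char rec[4] = { 'a', 'b', 'c', 'd' };
	FILE *fp;

	assert(nrec >= 0);
	fp = fopen(path, "w");
	if (fp == NULL)
		return -1;
	for (long i = 0; i < nrec; i++) {
		if (fwrite(rec, sizeof(rec), 1, fp) != 1) {
			fclose(fp);
			return -1;
		}
	}
	/* fclose() flushes whatever is still buffered. */
	return fclose(fp) == 0 ? 0 : -1;
}

/* Hypothetical helper: file size in bytes, or -1 on error. */
static long file_size(const char *path)
{
	struct stat st;

	return stat(path, &st) == 0 ? (long)st.st_size : -1;
}
```

Both helpers produce identical file contents; only the number of system calls differs. Calling them from a small main() and running the result under strace -c -e trace=write is one way to compare the write(2) counts on a given system.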