public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: xfs@oss.sgi.com
Subject: [PATCH 02/15] xfs: open code inc_inode_iversion when logging an inode
Date: Tue, 29 Oct 2013 22:11:45 +1100	[thread overview]
Message-ID: <1383045118-31107-3-git-send-email-david@fromorbit.com> (raw)
In-Reply-To: <1383045118-31107-1-git-send-email-david@fromorbit.com>

From: Dave Chinner <dchinner@redhat.com>

Michael L Semon reported that generic/069 runtime increased on v5
superblocks by 100% compared to v4 superblocks. his perf-based
analysis pointed directly at the timestamp updates being done by the
write path in this workload. The append writers are doing 4-byte
writes, so there are lots of timestamp updates occurring.

The thing is, they aren't being triggered by timestamp changes -
they are being triggered by the inode change counter needing to be
updated. That is, every write(2) system call needs to bump the inode
version count, and it does that through the timestamp update
mechanism. Hence for v5 filesystems, test generic/069 is running 3
orders of magnitude more timestmap update transactions on v5
filesystems due to the fact it does a huge number of *4 byte*
write(2) calls.

This isn't a real world scenario we really need to address - anyone
doing such sequential IO should be using fwrite(3), not write(2).
i.e. fwrite(3) buffers the writes in userspace to minimise the
number of write(2) syscalls, and the problem goes away.

However, there is a small change we can make to improve the
situation - removing the expensive lock operation on the change
counter update.  All inode version counter changes in XFS occur
under the ip->i_ilock during a transaction, and therefore we
don't actually need the spin lock that provides exclusive access to
it through inc_inode_iversion().

Hence avoid the lock and just open code the increment ourselves when
logging the inode.

Reported-by: Michael L. Semon <mlsemon35@gmail.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_trans_inode.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_trans_inode.c b/fs/xfs/xfs_trans_inode.c
index 1bba7f6..50c3f56 100644
--- a/fs/xfs/xfs_trans_inode.c
+++ b/fs/xfs/xfs_trans_inode.c
@@ -111,12 +111,14 @@ xfs_trans_log_inode(
 
 	/*
 	 * First time we log the inode in a transaction, bump the inode change
-	 * counter if it is configured for this to occur.
+	 * counter if it is configured for this to occur. We don't use
+	 * inode_inc_version() because there is no need for extra locking around
+	 * i_version as we already hold the inode locked exclusively for
+	 * metadata modification.
 	 */
 	if (!(ip->i_itemp->ili_item.li_desc->lid_flags & XFS_LID_DIRTY) &&
 	    IS_I_VERSION(VFS_I(ip))) {
-		inode_inc_iversion(VFS_I(ip));
-		ip->i_d.di_changecount = VFS_I(ip)->i_version;
+		ip->i_d.di_changecount = ++VFS_I(ip)->i_version;
 		flags |= XFS_ILOG_CORE;
 	}
 
-- 
1.8.4.rc3

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

  parent reply	other threads:[~2013-10-29 11:12 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-10-29 11:11 [PATCH 00/15] xfs: patches for 3.13 Dave Chinner
2013-10-29 11:11 ` [PATCH 01/15] xfs: xfs_remove deadlocks due to inverted AGF vs AGI lock ordering Dave Chinner
2013-10-30 22:39   ` Ben Myers
2013-10-30 23:15     ` Dave Chinner
2013-11-04 23:10       ` Ben Myers
2013-10-29 11:11 ` Dave Chinner [this message]
2013-10-29 11:11 ` [PATCH 03/15] xfs: abstract the differences in dir2/dir3 via an ops vector Dave Chinner
2013-10-29 11:11 ` [PATCH 04/15] xfs: vectorise remaining shortform dir2 ops Dave Chinner
2013-10-29 11:11 ` [PATCH 05/15] xfs: vectorise directory data operations Dave Chinner
2013-10-29 11:11 ` [PATCH 06/15] xfs: vectorise directory data operations part 2 Dave Chinner
2013-10-29 11:11 ` [PATCH 07/15] xfs: vectorise directory leaf operations Dave Chinner
2013-10-29 11:11 ` [PATCH 08/15] xfs: vectorise DA btree operations Dave Chinner
2013-10-29 11:11 ` [PATCH 09/15] xfs: vectorise encoding/decoding directory headers Dave Chinner
2013-10-29 19:06   ` Ben Myers
2013-10-29 11:11 ` [PATCH 10/15] xfs: vectorise directory leaf operations Dave Chinner
2013-10-29 19:13   ` Ben Myers
2013-10-29 11:11 ` [PATCH 11/15] xfs: convert directory vector functions to constants Dave Chinner
2013-10-29 19:22   ` Ben Myers
2013-10-29 22:15   ` [PATCH 11/15 V2] " Dave Chinner
2013-10-30 18:09     ` Ben Myers
2013-10-29 11:11 ` [PATCH 12/15] xfs: make dir2 ftype offset pointers explicit Dave Chinner
2013-10-29 20:00   ` Ben Myers
2013-10-29 22:15     ` Dave Chinner
2013-10-30 18:51       ` Ben Myers
2013-10-29 11:11 ` [PATCH 13/15] xfs: validity check the directory block leaf entry count Dave Chinner
2013-10-29 20:43   ` Ben Myers
2013-10-29 11:11 ` [PATCH 14/15] xfs: prevent stack overflows from page cache allocation Dave Chinner
2013-10-30 10:23   ` Christoph Hellwig
2013-10-30 21:40   ` Ben Myers
2013-10-29 11:11 ` [PATCH 15/15] xfs: fix static and extern sparse warnings Dave Chinner
2013-10-29 21:12   ` Ben Myers
2013-10-30 19:22 ` [PATCH 00/15] xfs: patches for 3.13 Ben Myers

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1383045118-31107-3-git-send-email-david@fromorbit.com \
    --to=david@fromorbit.com \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox