linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Josef Bacik <josef-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
To: Andreas Dilger <adilger-m1MBpc4rdrD3fQ9qLvQP4Q@public.gmane.org>
Cc: Josef Bacik <josef-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	Boaz Harrosh <bharrosh-C4P08NqkoRlBDgjK7y7TUQ@public.gmane.org>,
	linux-ext4-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	bfields-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org,
	tytso-3s7WtUTddSA@public.gmane.org,
	jack-AlSwsSmVLrQ@public.gmane.org
Subject: Re: [PATCH] ext4: fix how i_version is modified and turn it on by default V2
Date: Tue, 15 May 2012 16:05:34 -0400	[thread overview]
Message-ID: <20120515200533.GD1907@localhost.localdomain> (raw)
In-Reply-To: <AB91C817-DFEE-415F-8769-78831D72C6B7-m1MBpc4rdrD3fQ9qLvQP4Q@public.gmane.org>

On Tue, May 15, 2012 at 01:55:33PM -0600, Andreas Dilger wrote:

<snip>

> > 
> > Yeah but for every write() call we have to do this in-memory operation (via file_update_time()), so in the worst case where you are doing 1
> > byte writes where you would notice this sort of overhead you get a 40%
> > overhead just from:
> > 
> > spin_lock(&inode->i_lock);
> > inode->i_version++;
> > spin_unlock(&inode->i_lock);
> > 
> > and then the subsequent mark_inode_dirty() call.
> 
> Right.  That this is so noticeable (40% wallclock slowdown for
> filesystem IO) means that the CPU usage is higher with the
> patch than without, likely MUCH more than the 40% shown by the
> wallclock time.
> 
> > What is likely saving us here is that the 1 byte writes are happening
> > fast enough that the normal ctime/mtime operations aren't triggering
> > a mark_inode_dirty() since it appears to happen within the same time,
> > but now that we're doing the i_version++ we are calling
> > mark_inode_dirty() more than before.
> 
> Right, and hence my objection to the "basic" version of this patch
> that just turns on i_version updating for everyone.
> 

Yeah agreed, I wasn't completely convinced this would happen until I ran the
tests, and usual I was wrong.

> > I'd have to profile it to verify that's what is actually happening,
> > but that means turning on ftrace and parsing a bunch of output and
> > man that sounds like more time than I want to waste on this.  The
> > question is do we care that the worst case is so awful since the
> > worst case is likely to not show up often if at all?
> 
> Based on my experience, there are a LOT of badly written applications
> in the world, and I don't think anyone would be happy with a 40%
> slowdown, PLUS 100% CPU usage on the node while doing IO.  I would
> guess that the number of systems running badly-written applications
> far exceeds the number of systems that are doing NFS exporting, so
> I don't think this is a good tradeoff.
> 
> > If we do then I guess the next step is to add a fs flag for i_version
> > so that people who are going to be exporting file systems can get
> > this without having to use a special option all the time.
> 
> It should be fairly straight forward to have a flag set in the ext4
> superblock (s_state flag?) that indicates that the filesystem has
> been exported via NFS.  There might be other optimizations that can
> be done based on this (e.g. avoid some of the directory cookie
> hijinx that are only needed if NFS has exported the filesystem and
> needs to keep persistent cookies across reboots).
> 
> I think that the ext4_mark_inode_dirty() performance problem could
> be at least partially fixed by deferring the copy of in-core inode
> to on-disk inode to use a journal commit callback.  This is far
> more work than just setting a flag in the superblock, but it has
> the potential to _improve_ performance rather than make it worse.
> 

Yeah Btrfs doesn't have this sort of problem since we delay inode updating sinc
it is so costly, we simply let it hang around in the in-core inode until we feel
like updating it at some point down the road.  I'll put together a feature flag
or something to make it be enabled for always if somebody turns it on.  Thanks,

Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

  parent reply	other threads:[~2012-05-15 20:05 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-05-15 14:33 [PATCH] ext4: fix how i_version is modified and turn it on by default V2 Josef Bacik
2012-05-15 15:18 ` Jan Kara
2012-05-15 17:53 ` Josef Bacik
     [not found]   ` <20120515175308.GB1907-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2012-05-15 18:17     ` Boaz Harrosh
2012-05-15 18:29       ` Josef Bacik
2012-05-15 19:55         ` Andreas Dilger
     [not found]           ` <AB91C817-DFEE-415F-8769-78831D72C6B7-m1MBpc4rdrD3fQ9qLvQP4Q@public.gmane.org>
2012-05-15 20:05             ` Josef Bacik [this message]
     [not found]               ` <20120515200533.GD1907-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2012-05-15 21:00                 ` J. Bruce Fields
     [not found]                   ` <20120515210029.GA11932-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
2012-05-15 21:08                     ` Josef Bacik
2012-05-15 22:19                     ` Andreas Dilger
     [not found]         ` <20120515182914.GC1907-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2012-05-15 20:24           ` Boaz Harrosh
2012-05-16  1:36   ` Ted Ts'o

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120515200533.GD1907@localhost.localdomain \
    --to=josef-h+wxahxf7alqt0dzr+alfa@public.gmane.org \
    --cc=adilger-m1MBpc4rdrD3fQ9qLvQP4Q@public.gmane.org \
    --cc=bfields-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org \
    --cc=bharrosh-C4P08NqkoRlBDgjK7y7TUQ@public.gmane.org \
    --cc=jack-AlSwsSmVLrQ@public.gmane.org \
    --cc=linux-ext4-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=tytso-3s7WtUTddSA@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).