linux-nfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Boaz Harrosh <bharrosh@panasas.com>
To: Josef Bacik <josef@redhat.com>
Cc: <linux-ext4@vger.kernel.org>, <linux-nfs@vger.kernel.org>,
	<bfields@fieldses.org>, <adilger@dilger.ca>, <tytso@mit.edu>,
	<jack@suse.cz>
Subject: Re: [PATCH] ext4: fix how i_version is modified and turn it on by default V2
Date: Tue, 15 May 2012 23:24:48 +0300	[thread overview]
Message-ID: <4FB2BB90.30900@panasas.com> (raw)
In-Reply-To: <20120515182914.GC1907@localhost.localdomain>

On 05/15/2012 09:29 PM, Josef Bacik wrote:

> On Tue, May 15, 2012 at 09:17:16PM +0300, Boaz Harrosh wrote:

<snip>

>>
>> do you mean that the "with the patch" is 40% slower then "without the patch"
>> 	in the same "bs=1 count=10485760 test" ?
>>
>> Then count me clueless. Do you understand this difference?
>>
>> The way I read your patch the inode is copied and written to disk exactly the
>> same number of times, as before. Only that now i_version is also updated
>> together with ctime and/or mtime. What is the fundamental difference then?
>> Is it just that i_version++ in-memory operation?
>>
> 
> Yeah but for every write() call we have to do this in-memory operation (via
> file_update_time()), so in the worst case where you are doing 1 byte writes
> where you would notice this sort of overhead you get a 40% overhead just from
> 
> spin_lock(&inode->i_lock);
> inode->i_version++;
> spin_unlock(&inode->i_lock);
> 


We should be optimizing this we can take the spin_lock once and change
the i_version and ctime/mtime at the same locking cycle.

> and then the subsequent mark_inode_dirty() call.  What is likely saving us here
> is that the 1 byte writes are happening fast enough that the normal ctime/mtime
> operations aren't triggering a mark_inode_dirty() since it appears to happen
> wihtin the same time, 


Ha, OK that Jiffy resolution of ctime. Hence the core problem BTW, changes come
in and we can't differentiate them.

So you are saying that there is somewhere in code that does
if (old_ctime != cur_ctime)
	mark_inode_dirty()

Isn't the mark_inode_dirty() called every-time regardless of the value set
at ctime ?

> but now that we're doing the i_version++ we are calling
> mark_inode_dirty() more than before.  


Can we fix that. Can we mark the inode dirty at Jiffy resolution?

In the case of a Server crash the nfs_changed attr is expected to reset
back to old value until such time that some client did a COMMIT. Up to
COMMIT it might happen that on disk version is older than runtime.

But the Jiffy resolution is a real bummer believe me.

> I'd have to profile it to verify that's
> what is actually happening, but that means turning on ftrace and parsing a bunch
> of output and man that sounds like more time than I want to waste on this.  The
> question is do we care that the worst case is so awful since the worst case is
> likely to not show up often if at all?  If we do then I guess the next step is
> to add a fs flag for i_version so that people who are going to be exporting file
> systems can get this without having to use a special option all the time.
> Thanks,
> 


Lets just think about it a bit, Can we make it that disk updates happen just
as often as today and no more. But in memory inquiry of i_version gives us
a much higher resolution.

What about that Al's proposal of incrementing only after an NFSD inquiry

> Josef 


Thanks for looking into this. We all hate this for a long time but never
took the time to fix it.

Boaz

  parent reply	other threads:[~2012-05-15 20:31 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-05-15 14:33 [PATCH] ext4: fix how i_version is modified and turn it on by default V2 Josef Bacik
2012-05-15 15:18 ` Jan Kara
2012-05-15 17:53 ` Josef Bacik
2012-05-15 18:17   ` Boaz Harrosh
2012-05-15 18:29     ` Josef Bacik
2012-05-15 19:55       ` Andreas Dilger
2012-05-15 20:05         ` Josef Bacik
2012-05-15 21:00           ` J. Bruce Fields
2012-05-15 21:08             ` Josef Bacik
2012-05-15 22:19             ` Andreas Dilger
2012-05-15 20:24       ` Boaz Harrosh [this message]
2012-05-16  1:36   ` Ted Ts'o

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4FB2BB90.30900@panasas.com \
    --to=bharrosh@panasas.com \
    --cc=adilger@dilger.ca \
    --cc=bfields@fieldses.org \
    --cc=jack@suse.cz \
    --cc=josef@redhat.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).