From: Dave Chinner <david@fromorbit.com>
To: bpm@sgi.com
Cc: xfs@oss.sgi.com
Subject: Re: nfs performance delta between filesystems
Date: Sat, 23 Jan 2010 23:30:53 +1100
Message-ID: <20100123123053.GF25842@discord.disaster>
In-Reply-To: <20100122183848.GB28561@sgi.com>
On Fri, Jan 22, 2010 at 12:38:48PM -0600, bpm@sgi.com wrote:
> Hey Emmanuel,
>
> I did some research on this in April last year on an old, old kernel.
> One of the codepaths I flagged:
>
> nfsd_create
> write_inode_now
> __sync_single_inode
> write_inode
> xfs_fs_write_inode
> xfs_inode_flush
> xfs_iflush
>
> There were small gains to be had by reordering the sync of the parent and
> child syncs where the two inodes were in the same cluster. The larger
> problem seemed to be that we're not treating the log as stable storage.
> By calling write_inode_now we've written the changes to the log first
> and then gone and also written them out to the inode.
Pretty much right, but there are historical reasons for that
behaviour. The ->write_inode() path is the only
method for the higher layers to say "write this inode to disk".
That's how XFS has been treating it for a long time - as a command
to _physically_ write a dirty inode some time after it was first
changed and the transaction is already on disk.

Unfortunately, NFS is using the same call as a method for saying
"commit this changed inode to disk immediately", which is a
different semantic to the way the sync code uses it, and physical
inode IO really hurts here.
> nfsd_create, nfsd_link, and nfsd_setattr all do this (or do in the old
> kernel I'm looking at). I have a patchset that changes
> this to an fsync so we force the log and call it good. I'll be happy to
> dust it off if someone hasn't already addressed this situation.
The delayed write inode flushing patchset I'm finalising does this.
We now have reliable tracking of dirty inodes in XFS and a method
for efficient physical writeback, so we no longer need to rely on
->write_inode to tell us to write inodes to disk. Hence the patchset
turns the inode write into an xfs_fsync() if it is a sync write or
a delayed write if it is async. I'm hoping to have that ready for
.34 inclusion sometime next week...
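
Roughly, the sync/async split described above looks like the toy model
below. This is a userspace sketch under stated assumptions, not the
patchset itself: model_write_inode(), struct model_inode and the
FLUSH_* values are hypothetical names made up for illustration, and the
counters merely stand in for "force the log" and "queue a delayed
write".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model only: these names are hypothetical, not kernel APIs. */
enum flush_action {
    FLUSH_LOG_FORCE,      /* stands in for the xfs_fsync() path */
    FLUSH_DELAYED_WRITE   /* stands in for queueing a delayed write */
};

struct model_inode {
    int log_forces;       /* times the log was forced for this inode */
    int delwri_queued;    /* times the inode was queued for delayed write */
    int physical_writes;  /* immediate in-place inode writes (never bumped here) */
};

/*
 * Sync callers (the NFS case) only need the change to be stable, so the
 * model forces the log and returns. Async callers (background sync) get
 * the inode queued for later, batched writeback. Neither branch issues
 * an immediate physical inode write.
 */
static enum flush_action model_write_inode(struct model_inode *ip, bool sync)
{
    assert(ip != NULL);

    if (sync) {
        ip->log_forces++;
        return FLUSH_LOG_FORCE;
    }
    ip->delwri_queued++;
    return FLUSH_DELAYED_WRITE;
}
```

The point of the split is that physical_writes stays at zero on both
paths: the log is treated as stable storage for the sync case, and the
in-place inode write is deferred for the async case.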
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs