From: Andreas Dilger <adilger@clusterfs.com>
To: Xin Zhao <uszhaoxin@gmail.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	linux-fsdevel@vger.kernel.org
Subject: Re: Why must NFS access metadata in synchronous mode?
Date: Thu, 1 Jun 2006 15:40:14 -0600
Message-ID: <20060601214014.GI5964@schatzie.adilger.int>
In-Reply-To: <4ae3c140606010927t308e7d6ag5a9fc112c859aa45@mail.gmail.com>
On Jun 01, 2006 12:27 -0400, Xin Zhao wrote:
> > Question 3: Cache consistency requirements are _much_ more stringent
> > for asynchronous operation.
> I agree. But I am not sure how local file system like Ext3 handle this
> problem. I don't think Ext3 must synchronously write metadata (I will
> double check the ext3 code). If I remember correctly, when change
> metadata, Ext3 just change it in memory and mark this page to be
> dirty. The page will be flushed to disk afterward. If the server
> exports an Ext3 code, it should be able to do the same thing. When a
> client requests to change metadata, server writes to the mmaped
> metadata page and then return to client instead of having to sync the
> change to disk. With this mechanism, at least the client does not have
> to wait for the disk flush time. Does it make sense? To prevent
> interleave change on metadata before it is flushed to disk, the server
> can even mark the metadata page to be read-only before it is flushed
> to disk.

The difference between local and remote filesystems is that if
asynchronous operations on a local filesystem are lost due to node
failure, the application generally fails at the same time, so when the
application is restarted along with the node, it gets the old state from
disk. If an application cares about on-disk consistency (say, because it
is itself sharing state with a remote system, like sendmail), then it will
fsync the file(s) before updating the remote state.
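
The fsync-before-remote-update ordering can be sketched roughly as follows. This is only an illustration, not code from sendmail or any other application: `durable_update` and the `notify_remote` callback are hypothetical names, and a newly created file would also need an fsync of its parent directory, which is omitted here.

```python
import os


def durable_update(path, data, notify_remote):
    """Write data to path, force it to stable storage, then update the peer.

    notify_remote is a hypothetical callback standing in for whatever
    tells the remote system that the new state exists.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(data)
        # os.write may write fewer bytes than requested, so loop.
        while len(view) > 0:
            written = os.write(fd, view)
            view = view[written:]
        # Block until the kernel reports the file contents are on disk.
        os.fsync(fd)
    finally:
        os.close(fd)
    # Only now is it safe for the remote side to depend on the new state.
    notify_remote(path)
```

If the node crashes before `os.fsync` returns, the remote side has never been told about the update, so the two stay consistent.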

In the NFS case, a remote client keeps the "new" state, which is
inconsistent with what the server has on disk (if the server is running
asynchronously), so there is no way to reconcile the two. In the case of
Lustre, the clients are stateful and keep a record of all operations they
perform until the server later confirms that they are safe on disk. After
a server failure, the clients replay the uncommitted operations to the
server. This allows the server filesystem to run asynchronously.
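
As a rough illustration of the replay idea, here is a toy single-client model. It is not Lustre's actual protocol or API: the `Server` and `Client` classes, the transaction numbers, and every method name are assumptions made up for this sketch.

```python
class Server:
    """Toy server that applies changes in memory and commits them later."""

    def __init__(self):
        self.disk = {}            # state that survives a crash
        self.memory = {}          # state including uncommitted changes
        self.last_committed = 0   # highest transaction number on disk
        self.next_transno = 1

    def apply(self, key, value):
        # Reply immediately, without waiting for the disk.
        transno = self.next_transno
        self.next_transno += 1
        self.memory[key] = value
        return transno, self.last_committed

    def commit(self):
        # Asynchronous flush: everything applied so far reaches disk.
        self.disk = dict(self.memory)
        self.last_committed = self.next_transno - 1

    def crash_and_restart(self):
        # Uncommitted changes are lost; only on-disk state comes back.
        self.memory = dict(self.disk)
        self.next_transno = self.last_committed + 1


class Client:
    """Toy stateful client that remembers operations until they are on disk."""

    def __init__(self, server):
        self.server = server
        self.replay_log = []      # (transno, key, value) not yet committed

    def set(self, key, value):
        transno, committed = self.server.apply(key, value)
        self.replay_log.append((transno, key, value))
        self._trim(committed)

    def _trim(self, committed):
        # Forget operations the server has confirmed are safe on disk.
        self.replay_log = [op for op in self.replay_log if op[0] > committed]

    def recover(self):
        # After a server restart, resend everything it had not committed.
        self._trim(self.server.last_committed)
        pending, self.replay_log = self.replay_log, []
        for _, key, value in pending:
            self.set(key, value)
```

A stateless client in the NFS style would have nothing like `replay_log`, so after `crash_and_restart()` the lost operations could not be recovered.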
Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.