From mboxrd@z Thu Jan  1 00:00:00 1970
From: Can Sar <csar@stanford.edu>
Subject: Re: Why must NFS access metadata in synchronous mode?
Date: Thu, 1 Jun 2006 20:42:19 -0700
Message-ID: <3B8B2FFD-A2CD-4657-9AB2-564CB64D61CF@stanford.edu>
References: <4ae3c140605312104m441ca006j784a93354456faf8@mail.gmail.com> <1149141341.13298.21.camel@lade.trondhjem.org> <4ae3c140606010927t308e7d6ag5a9fc112c859aa45@mail.gmail.com> <1149182779.3549.25.camel@lade.trondhjem.org>
Mime-Version: 1.0 (Apple Message framework v750)
Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Xin Zhao <uszhaoxin@gmail.com>, linux-fsdevel@vger.kernel.org
Return-path: <linux-fsdevel-owner@vger.kernel.org>
Received: from smtp-roam.Stanford.EDU ([171.64.10.152]:8065 "EHLO
	smtp-roam.Stanford.EDU") by vger.kernel.org with ESMTP
	id S1751251AbWFBDmc (ORCPT <rfc822;linux-fsdevel@vger.kernel.org>);
	Thu, 1 Jun 2006 23:42:32 -0400
In-Reply-To: <1149182779.3549.25.camel@lade.trondhjem.org>
To: Trond Myklebust <trond.myklebust@fys.uio.no>
Sender: linux-fsdevel-owner@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org


On Jun 1, 2006, at 10:26 AM, Trond Myklebust wrote:

> On Thu, 2006-06-01 at 12:27 -0400, Xin Zhao wrote:
>> Question 1: ...and how many NFS implementations have you seen based on
>> that paper?
>> I don't know. I only read the NFS implementations distributed with
>> Linux kernel. But some paper mentioned that the soft update mechanism
>> suggested in that paper has been adopted by FreeBSD.
>
> FreeBSD does not use soft updates for NFS afaik.
>
>> Question 2: NFS permissions are checked by the _server_, not the client.
>> That's true. But I was not saying that all metadata access must be
>> asynchronous. Even for permission checking, speculative execution
>> mechanism proposed in Ed Nightingale's "speculative execution ...."
>> paper published in SOSP 2005 can be used to avoid waiting. The basic
>> idea is that a NFS client speculatively assume permission checking
>> returns "OK" and set a checkpoint, then the client can go ahead to
>> send further requests. If the actual result turns out to be "OK", the
>> client can discard the checkpoint, otherwise, it rolls back to the
>> checking point. This can make waiting time overlap with the sending
>> time of subsequent requests.
>
> ...and how does that help the user that has been told the operation
> succeeded?

It wouldn't. For externally visible operations the kernel simply waits
until it has the real answer.
The techniques from that paper would definitely speed up NFS on Linux,
but they require extensive work, which is one reason no one has
implemented a production-level version yet (researchers generally move
on to new papers instead).
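
To make the checkpoint/rollback idea concrete, here is a minimal
user-space sketch in Python. It is only an illustration under my own
assumptions, not how the paper or any kernel does it: the class and
method names (SpeculativeClient, emit, run) are made up, and the server
check is called inline where a real client would overlap it with the
speculative work. The point it shows is that output is held back until
the real answer arrives, so the user is never told about a success that
later gets rolled back.

```python
import copy


class SpeculativeClient:
    """Toy model of speculate / checkpoint / rollback (hypothetical names)."""

    def __init__(self):
        self.state = {}      # client-local view of file metadata
        self.visible = []    # output already released to the user
        self._pending = []   # output held back while speculating

    def emit(self, message):
        # Externally visible output is buffered, never shown while speculating.
        self._pending.append(message)

    def run(self, check, speculative_work):
        # 1. Take a checkpoint of the client state.
        checkpoint = copy.deepcopy(self.state)
        self._pending = []
        # 2. Go ahead as if the permission check had returned "OK".
        speculative_work(self)
        # 3. The real answer arrives (inline here; concurrent in a real client).
        if check():
            # Speculation was right: drop the checkpoint, release the output.
            self.visible.extend(self._pending)
            self._pending = []
            return True
        # Speculation was wrong: restore the checkpoint, discard the output.
        self.state = checkpoint
        self._pending = []
        return False


def write_f(client):
    client.state["f"] = "new contents"
    client.emit("wrote f")


def write_g(client):
    client.state["g"] = "new contents"
    client.emit("wrote g")


client = SpeculativeClient()
ok_result = client.run(lambda: True, write_f)       # server says OK
denied_result = client.run(lambda: False, write_g)  # server says EACCES
```

After both calls, only the first write is in client.state and only its
message is in client.visible; the denied write left no trace.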

>
>> Question 3: Cache consistency requirements are _much_ more stringent
>> for asynchronous operation.
>> I agree. But I am not sure how local file system like Ext3 handle this
>> problem. I don't think Ext3 must synchronously write metadata (I will
>> double check the ext3 code). If I remember correctly, when change
>> metadata, Ext3 just change it in memory and mark this page to be
>> dirty. The page will be flushed to disk afterward. If the server
>> exports an Ext3 code, it should be able to do the same thing. When a
>> client requests to change metadata, server writes to the mmaped
>> metadata page and then return to client instead of having to sync the
>> change to disk. With this mechanism, at least the client does not have
>> to wait for the disk flush time. Does it make sense? To prevent
>> interleave change on metadata before it is flushed to disk, the server
>> can even mark the metadata page to be read-only before it is flushed
>> to disk.
>
> 'man 5 exports'. Read _carefully_ the entry on the "async" export
> option, and see the NFS FAQ, nfs mailing list archives, etc... why it is
> a bad idea.
>
>
> Cheers,
>   Trond