From mboxrd@z Thu Jan  1 00:00:00 1970
From: Mike Shaver <shaver@clusterfs.com>
Subject: Re: [Lustre-devel] Re: fixing redundant network opens on Linux file creation
Date: Tue, 7 Jan 2003 08:19:27 -0500
Sender: samba-technical-admin@lists.samba.org
Message-ID: <20030107081927.Z11553@off.net>
References: <20030106154853.N31555@schatzie.adilger.int> <OF37000125.004F402A-ON87256CA7.00018EB2-88256CA7.0005DCA2@us.ibm.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Andreas Dilger <adilger@clusterfs.com>, Jan Hudec <bulb@ucw.cz>,
	linux-fsdevel@vger.kernel.org, linux-fsdevel-owner@vger.kernel.org,
	Lustre Development Mailing List <lustre-devel@lists.sourceforge.net>,
	Richard Sharpe <rsharpe@richardsharpe.com>,
	samba-technical@samba.org, Steven French <sfrench@us.ibm.com>
Return-path: <samba-technical-admin@lists.samba.org>
To: Bryan Henderson <hbryan@us.ibm.com>
Content-Disposition: inline
In-Reply-To: <OF37000125.004F402A-ON87256CA7.00018EB2-88256CA7.0005DCA2@us.ibm.com>; from hbryan@us.ibm.com on Mon, Jan 06, 2003 at 05:06:23PM -0800
Errors-To: samba-technical-admin@lists.samba.org
List-Help: <mailto:samba-technical-request@lists.samba.org?subject=help>
List-Post: <mailto:samba-technical@lists.samba.org>
List-Subscribe: <http://lists.samba.org/mailman/listinfo/samba-technical>,
	<mailto:samba-technical-request@lists.samba.org?subject=subscribe>
List-Unsubscribe: <http://lists.samba.org/mailman/listinfo/samba-technical>,
	<mailto:samba-technical-request@lists.samba.org?subject=unsubscribe>
List-Archive: <http://lists.samba.org/pipermail/samba-technical/>
List-Id: linux-fsdevel.vger.kernel.org

On Jan 06, Bryan Henderson wrote:
> That's really orthogonal to this discussion.  If you want to conserve the
> number of VFS operation routines, you can have a single routine with
> parameters for a dozen different operations whether it is
> lookup-with-intent or lookup-and-do.  Pretty much the only difference in
> the C code is the name of the routine.

That may be true, but the change to the Linux VFS would likely be much
more invasive.  Our intent patches are pretty small, and therefore much
easier to port between versions, as well as more likely to be
integrated into 2.5/2.6.

> But my discomfort with the lookup-with-intent approach is focused on the
> open/create operation in particular.  From what I can tell, these intents
> are more than just declaration of intent. They're promises.  If the VFS
> caller did a lookup with intent to create if not found, and then didn't
> follow through on that intent, I guess that would cause trouble on Lustre
> since the implementation of lookup-with-intent actually created the file.

Do you use "the VFS caller" to mean "the code that calls into the VFS",
or "the caller of the intent-handling operations, which is the VFS"?
It's my understanding that these changes are transparent to the caller
of the VFS, but if the VFS itself were to "abort" halfway we might well
have a problem.  Not because something created the file, but because we
wouldn't necessarily clean up the intent structures correctly.  I expect
that this is a soluble problem, at the expense of more changes to the
VFS.

We haven't seen any problems with "aborted intent" in part because we
don't depend on the caller-into-the-VFS to cooperate; the VFS itself
completes the intent protocol correctly, every time, in no small part
because the intent is declarative and binding, rather than just
speculative.

> That's not the concept of intent declaration as I've seen it everywhere
> else.  Something like "open with write intent" always means either "open
> the file and I won't do anything but write to it," or "open the file and
> I'll probably be writing to it," but never "open the file and the next
> thing you see from me will be a write of 10 bytes at offset 20."

Is the objection really just to the terminology, then?  JFS, VxFS and
NetApp seem to use "intent logging" to mean something similar ("I will
be doing this next", rather than "I might be doing this next, but maybe
not").  Maybe I misunderstand the intent log, though, and the time at
which it gets updated.  It certainly does seem to describe fact rather
than a fallible expectation.

As I understand it, the intent mechanism really originates in the
locking: the client requests a lock with the declared intent of performing
some other FS operation (getattr, create of a child, etc.).  The
presence of that intent information, in the form of a fully-specified FS
operation, is what permits the server to perform the desired operation
on behalf of the client in cases where giving one client an exclusive
lock on a contended resource would degrade system performance
unacceptably.  That we have intent-driven behaviour in lookup/lookup2 is
largely due to the fact that it's in lookup that we need to acquire our
locks.
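
To make the lock-with-intent idea concrete, here is a minimal sketch in
C.  Every name in it (toy_dir, lock_intent, enqueue_with_intent and so
on) is hypothetical and chosen for illustration; it is not the Lustre
code or its wire protocol.  It only models the decision described
above: on an uncontended resource the server hands out the lock, and on
a contended one it executes the declared operation itself and keeps the
lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical operations a client may declare with a lock request. */
enum intent_op { IT_NONE, IT_GETATTR, IT_CREATE };

struct lock_intent {
    enum intent_op op;
    const char *name;      /* child name, used by IT_CREATE */
};

struct lock_reply {
    bool lock_granted;     /* the client now holds the lock */
    bool op_done;          /* the server executed the intent itself */
};

/* Toy stand-in for a directory on the server (an assumption). */
struct toy_dir {
    bool contended;        /* other clients are competing for the lock */
    int  nchildren;
};

/* Decide between granting the lock and executing the intent. */
struct lock_reply enqueue_with_intent(struct toy_dir *dir,
                                      const struct lock_intent *it)
{
    struct lock_reply r = { false, false };

    assert(dir != NULL && it != NULL);

    /*
     * Without an intent the server has nothing it could execute, so
     * all it can do is grant the lock (revoking it from other holders
     * is not modelled here).
     */
    if (!dir->contended || it->op == IT_NONE) {
        r.lock_granted = true;
        return r;
    }

    /* Contended: perform the operation on the client's behalf. */
    switch (it->op) {
    case IT_CREATE:
        dir->nchildren++;
        r.op_done = true;
        break;
    case IT_GETATTR:
        r.op_done = true;  /* attributes would travel in the reply */
        break;
    default:
        break;
    }
    return r;
}
```

A caller that sees op_done set in the reply knows the operation has
already happened and must not repeat it; a caller that sees
lock_granted performs the operation itself under the lock.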

> Another thing the structure of this "intent" interface says to me is that a
> filesystem driver might choose in some cases not to open the file but wait
> until the open is actually requested.  If so, doesn't the filesystem driver
> have to maintain some cognizance of the thread of file accesses, so it can
> match up an open with a previous lookup-with-intent and know if that
> particular open is already done?  That kind of state has always been
> intentionally omitted from the VFS interface.

I think it's that state, specifically, that's represented by the intent
parameters added to the various ops.  I understand that it was a design
compromise motivated in no small part by the desire to minimize changes
to the Linux VFS at this stage.  I'm not at all certain that we would
structure things in this form if we were writing an intent-enabled VFS
from first principles.
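
As a rough illustration of how an intent parameter can carry that
state between ops, consider the following C sketch.  The names
(toy_intent, toy_fs, toy_lookup, toy_create) are hypothetical and the
"filesystem" is a single flag; this is not the signature of any real
VFS operation or of our patches.  The point is only that the intent
structure passed to both calls records whether lookup already did the
work, so the later create can recognise that and skip the second round
trip.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_IT_CREAT 0x1   /* hypothetical "create if missing" intent */

struct toy_intent {
    int  op;               /* what the VFS has committed to do next */
    bool executed;         /* set by lookup if it already did it */
};

/* Toy filesystem holding one possible file (an assumption). */
struct toy_fs {
    bool exists;           /* does the file exist on the server? */
    int  rpcs;             /* number of server round trips so far */
};

/* Lookup; with a create intent, a missing file is created here. */
bool toy_lookup(struct toy_fs *fs, struct toy_intent *it)
{
    assert(fs != NULL);

    fs->rpcs++;            /* lookup always costs one round trip */
    if (!fs->exists && it != NULL && (it->op & TOY_IT_CREAT)) {
        fs->exists = true;
        it->executed = true;
    }
    return fs->exists;
}

/* Create; a no-op if the matching lookup already executed it. */
int toy_create(struct toy_fs *fs, struct toy_intent *it)
{
    assert(fs != NULL);

    if (it != NULL && it->executed) {
        it->executed = false;   /* consume the intent */
        return 0;
    }
    if (fs->exists)
        return -1;              /* stand-in for an "exists" error */
    fs->rpcs++;                 /* second round trip */
    fs->exists = true;
    return 0;
}
```

With the intent passed through, lookup followed by create costs one
round trip in this model; with a NULL intent the same pair costs two.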

Mike