public inbox for linux-kernel@vger.kernel.org
* RE: [PATCH/RFC] Lustre VFS patch, version 2
@ 2004-06-02 23:15 Peter J. Braam
From: Peter J. Braam @ 2004-06-02 23:15 UTC
  To: linux-kernel
  Cc: hch, axboe, lmb, kevcorry, arjanv, iro, trond.myklebust, anton,
	lustre-devel

[-- Attachment #1: Type: text/plain, Size: 7683 bytes --]

Hello!
 
The feedback on the Lustre patches was of very high quality; thanks a
lot for studying them carefully.  Things are simpler now.
 
Oleg Drokin and I discussed the emails extensively and here is our reply.
We have attached another collection of patches, addressing many of the
concerns.
We felt it was perhaps easier to keep this all in one long email.
 
People requested to see the code that uses the patch.  We have uploaded that
to:
 
ftp://ftp.clusterfs.com:/pub/lustre/lkml/lustre-client_and_mds.tgz 
 
The client file system is the primary user of the kernel patch, in the
llite directory. The MDS server is a sample user of do_kern_mount.  As
requested I have removed many other things from the tar ball to make
review simple (so this won't compile or run).
 
1. Export symbols concerns by Christoph Hellwig: 
 
Indeed we can do without __iget, kernel_text_address, reparent_to_init
and exit_files.
 
We actually need do_kern_mount and truncate_complete_page.
do_kern_mount is used because the servers use a file system namespace
inside the kernel without exporting it to user space (mds/handler.c).
The server file systems are ext3 file systems, but we replace VFS
locking with DLM locks, and it would take considerable work to export
that as a file system.
 
truncate_complete_page is used to remove pages in the middle of a file
mapping when lock revocations happen (llite/file.c,
ll_extent_lock_callback calling ll_pgcache_remove_extent).
 
2. lustre_version.patch concerns by Christoph Hellwig:
 
This one can easily be removed, but the kernel version alone does not
necessarily represent anything useful. Many people patch their
kernels, even applying parts of a newer kernel, while leaving the
kernel version at its old value (distributions immediately come to
mind). So we still need something to identify the version of the
necessary bits, e.g. the version of the intent API.
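
As a rough sketch of how a module could test for the necessary bits
independently of the kernel version: only LUSTRE_KERNEL_VERSION comes
from the attached lustre_version.patch; the minimum of 34 and the names
LUSTRE_MIN_KERNEL_VERSION and lustre_patchset_ok are made up for
illustration.

```c
/* Normally provided by include/linux/lustre_version.h from the patch. */
#ifndef LUSTRE_KERNEL_VERSION
#define LUSTRE_KERNEL_VERSION 36
#endif

/* Hypothetical: oldest patch-set version this module can build against. */
#define LUSTRE_MIN_KERNEL_VERSION 34

#if LUSTRE_KERNEL_VERSION < LUSTRE_MIN_KERNEL_VERSION
#error "kernel patch set is too old for this module"
#endif

/* Hypothetical run-time form of the same check; returns 1 when usable. */
static int lustre_patchset_ok(void)
{
	return LUSTRE_KERNEL_VERSION >= LUSTRE_MIN_KERNEL_VERSION;
}
```

The point is that the check keys on the patch-set version, not on the
kernel release string, which a distribution may leave unchanged.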
 
3. Introduction of lock-less version of d_rehash (__d_rehash) by
   Christoph Hellwig:
 
In some places Lustre needs to do several things to dentries with the
dcache lock already held, e.g. traverse the alias dentries of an inode to
find one with the same name and parent as the one we already have.  Lustre
can invalidate busy dentries, which we put on a list.  If these are
looked up again, concurrently, we find them on this list and re-use
them, to avoid having several identical aliases in an inode.  See
llite/{dcache.c,namei.c} ll_revalidate and the lock callback function
ll_mdc_blocking_ast which calls ll_unhash_aliases.  We use d_move to
manipulate dentries associated with raw inodes and names in ext3.
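
The locked/lock-less split can be illustrated in user space roughly as
follows.  This is only a sketch: the lock is a plain flag, not a
spinlock, and every name ending in _sketch is hypothetical, not part of
the attached patches.

```c
#include <assert.h>

/* Toy lock: records whether it is held so misuse trips an assert. */
struct lock_sketch { int held; };

static struct lock_sketch dcache_lock_sketch;

static void lock_acquire(struct lock_sketch *l) { assert(!l->held); l->held = 1; }
static void lock_release(struct lock_sketch *l) { assert(l->held); l->held = 0; }

struct dentry_sketch { int hashed; };

/* Lock-less variant: the caller must already hold the lock. */
static void rehash_locked_sketch(struct dentry_sketch *d)
{
	assert(dcache_lock_sketch.held);
	d->hashed = 1;
}

/* Classic variant: takes the lock around the lock-less one. */
static void rehash_sketch(struct dentry_sketch *d)
{
	lock_acquire(&dcache_lock_sketch);
	rehash_locked_sketch(d);
	lock_release(&dcache_lock_sketch);
}

/* Several updates under a single lock hold, which is the case that
 * needs the lock-less variant.  Returns the number of entries hashed. */
static int rehash_all_sketch(struct dentry_sketch *v, int n)
{
	int i;

	lock_acquire(&dcache_lock_sketch);
	for (i = 0; i < n; i++)
		rehash_locked_sketch(&v[i]);
	lock_release(&dcache_lock_sketch);
	return n;
}
```

Calling rehash_sketch() from inside rehash_all_sketch() would try to
take the lock twice, which is why the inner helper is needed.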
 
4. Concerns about the VFS intent API changing the kernel's exported API,
   by Christoph Hellwig:
 
With a slight modification it is possible to reduce the changes to just
renaming the intent structure itself and some of its fields.
 
This renaming was requested by Linus, but we can change the names back
easily if needed; that would avoid any API change.  If there are other
users, please let us know what to do.
 
All the functions can easily be split into ones that expect a valid
intent (with a suffix in the name like _it) and those that are part of
the old API, which would just initialise the intent to something
sensible and then call the corresponding intent-expecting function. No
harm should be done to external filesystems this way. We have modified
the VFS intent API patch
to achieve this.
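
A minimal user-space sketch of that split is shown here.  The opcodes,
INTENT_MAGIC and intent_init mirror the attached
vfs-intent_api-vanilla-2.6.patch (the it_op_release field is left out);
struct nameidata_sketch and the two *_sketch functions are hypothetical
stand-ins that only model the intent hand-off, not a real path walk.

```c
#include <string.h>

#define IT_OPEN      (1)
#define IT_LOOKUP    (1 << 4)
#define INTENT_MAGIC 0x19620323

struct lookup_intent {
	int it_magic;
	int it_op;
	int it_flags;
	int it_create_mode;
};

/* Hypothetical stand-in for struct nameidata: only the intent is kept. */
struct nameidata_sketch {
	struct lookup_intent intent;
};

static void intent_init(struct lookup_intent *it, int op)
{
	memset(it, 0, sizeof(*it));
	it->it_magic = INTENT_MAGIC;
	it->it_op = op;
}

/* Intent-expecting variant: trusts the caller to have filled nd->intent.
 * Here it only reports which opcode the filesystem would get to see. */
static int path_walk_it_sketch(const char *name, struct nameidata_sketch *nd)
{
	(void)name;
	return nd->intent.it_op;
}

/* Old-API variant: sets a sensible default intent, then delegates. */
static int path_walk_sketch(const char *name, struct nameidata_sketch *nd)
{
	intent_init(&nd->intent, IT_LOOKUP);
	return path_walk_it_sketch(name, nd);
}
```

Existing callers keep using the old name and never see an
uninitialised intent; only intent-aware callers use the _it form.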
 
5. Some objections from Trond Myklebust about open flags in exec, cwd
   revalidation, and revalidate_counter patch:
 
We have fixed the exec open flags issue (our error). The
revalidate_counter patch was also dropped, since we can do this inside
Lustre as well. CWD revalidation can be converted to FS_REVAL_DOT in
the fs flags instead, but we still need part of that patch, the
LOOKUP_LAST/LOOKUP_NOT_LAST part. Lustre needs to know when we have
reached the last component in the path, since that is where the intent
must be looked
at. (It seems we cannot use LOOKUP_CONTINUE for this reliably).
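
To illustrate what the last/not-last distinction gives a filesystem,
here is a user-space sketch that walks a path string and classifies
each component.  The flag values, struct walk_stats and
walk_path_sketch are hypothetical; the real flags live in the
vfs-lookup_last patch, which is not reproduced here.

```c
#include <string.h>

#define LOOKUP_LAST_SKETCH     0x1  /* hypothetical value */
#define LOOKUP_NOT_LAST_SKETCH 0x2  /* hypothetical value */

struct walk_stats {
	int not_last;  /* components seen with the not-last flag */
	int last;      /* components seen with the last flag */
};

/* Walks 'path' component by component and returns the component count. */
static int walk_path_sketch(const char *path, struct walk_stats *st)
{
	int components = 0;

	st->not_last = 0;
	st->last = 0;
	for (;;) {
		int flag;

		path += strspn(path, "/");   /* skip leading slashes */
		if (*path == '\0')
			break;
		path += strcspn(path, "/");  /* consume one component */
		/* The component is last if only slashes (or nothing) remain. */
		flag = path[strspn(path, "/")] ? LOOKUP_NOT_LAST_SKETCH
					       : LOOKUP_LAST_SKETCH;
		if (flag == LOOKUP_LAST_SKETCH)
			st->last++;
		else
			st->not_last++;
		components++;
	}
	return components;
}
```

Only the component flagged as last would have the caller's intent
examined; all earlier ones are plain lookups.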
 
6. from Trond Myklebust:
 
> The vfs-intent_lustre-vanilla-2.6.patch + the "intent_release()"
> code. What if you end up crossing a mountpoint? How do you then know 
> to which superblock/filesystem the private field belongs if there are 
> more than one user of this mechanism?
 
Basically an intent only makes sense for the last component. Our code
checks for that, and if we are looking up a component before the last,
a dummy IT_LOOKUP intent is created on the stack and we work with
that. Perhaps the same would work for other filesystems that would like
to use this mechanism.
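
The check described above might look roughly like this in isolation.
The struct and opcodes follow the attached intent API patch (without
it_op_release); intent_for_component is a hypothetical helper, not a
function from the Lustre tree.

```c
#include <string.h>

#define IT_OPEN      (1)
#define IT_LOOKUP    (1 << 4)
#define INTENT_MAGIC 0x19620323

struct lookup_intent {
	int it_magic;
	int it_op;
	int it_flags;
	int it_create_mode;
};

/* Hypothetical helper: choose the intent to act on for one component.
 * For the last component the caller's intent is used; for any earlier
 * one a dummy IT_LOOKUP intent is built in caller-provided storage
 * (on the stack in practice), so private data never leaks across. */
static const struct lookup_intent *
intent_for_component(const struct lookup_intent *caller_it, int is_last,
		     struct lookup_intent *dummy)
{
	if (is_last)
		return caller_it;
	memset(dummy, 0, sizeof(*dummy));
	dummy->it_magic = INTENT_MAGIC;
	dummy->it_op = IT_LOOKUP;
	return dummy;
}
```

Because intermediate components only ever see the dummy, a filesystem
crossed on the way never touches another filesystem's private field.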
 
7. raw operations concerns by various people:
 
We have now implemented an alternative approach to this, which takes
place when the parent lookup is done, using intents.  For setattr
we managed to remove the raw operations altogether (praying that we
haven't forgotten some awful problem we solved that led to the
introduction of setattr_raw in the first place).
 
The correctly filled intent is recognised by the filesystem's lookup or
revalidate method.  After the parent is looked up, the correct "raw"
server call is executed within the file system, based on the intent.
Then a special flag is set in the intent; the caller of the parent
lookup checks for the flag and, if it is set, the function returns
immediately with the exit code supplied in the intent, without instantiating
child dentries.
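
The control flow just described can be sketched in user space as
follows.  The flag name IT_SKETCH_DONE, the fields it_int_flags and
it_status, and both functions are hypothetical; the email does not
name the actual flag or field.

```c
#include <errno.h>

#define IT_UNLINK      (1 << 5)
#define IT_SKETCH_DONE 0x1  /* hypothetical "already handled" flag */

struct intent_sketch {
	int it_op;
	int it_int_flags;  /* hypothetical: internal flags */
	int it_status;     /* hypothetical: exit code supplied by the fs */
};

/* Hypothetical fs parent lookup: if it recognises the intent, it does
 * the "raw" server call itself (modelled by server_rc) and marks the
 * intent as handled. */
static void fs_parent_lookup_sketch(struct intent_sketch *it, int server_rc)
{
	if (it->it_op == IT_UNLINK) {
		it->it_status = server_rc;
		it->it_int_flags |= IT_SKETCH_DONE;
	}
}

/* Hypothetical VFS-side caller: returns early when the flag is set,
 * otherwise continues on the normal path and instantiates the child. */
static int vfs_op_sketch(struct intent_sketch *it, int server_rc,
			 int *child_instantiated)
{
	fs_parent_lookup_sketch(it, server_rc);
	if (it->it_int_flags & IT_SKETCH_DONE)
		return it->it_status;
	*child_instantiated = 1;
	return 0;
}
```

A filesystem that ignores intents never sets the flag, so it takes the
unchanged path through the VFS.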
 
This needs some minor changes to VFS, though. There are at
least two approaches.
 
One is to not introduce any new methods and just rely on the fs's methods
to do everything. For this to work the filesystem needs to know the
remaining path to be traversed (we can fill nd->last with the remaining
path before calling into the fs).  In the root directory of the mount, we
need to call revalidate (if supported by the fs) on the mountpoint to
intercept the intent after we have crossed the mountpoint. We have this
approach implemented in the attached patch.  Does it look better than
the raw operations?
 
Much simpler for us is to add an additional inode operation, a
"process_intent" method, that would be called when a LOOKUP_PARENT sort
of lookup was requested and we are about to leave link_path_walk()
with the nameidata structure filled and everything ready.  Then the same
flag in the intent will be set and everything else proceeds as in the
previous approach.
 
We believe both methods are less intrusive than the raw methods, but
slightly more delicate.
 
8. Mountpoint-crossing issues during rename (and link) noticed by
   Arjan van de Ven:
 
Well, indeed this can happen if the source or destination is a mountpoint
on the client but not on the server. This needs to be addressed by the
individual filesystems that choose to implement those raw methods.
 
9. dev_readonly patch concerns by Jens Axboe:
 
We already clarified why we need it in this exact way. But there were
some valid suggestions to use other means like dm-flakey device mapper
module, so we decided to write a failure simulator DM.
 
10. "Have these patches undergone any significant test?" by Anton Blanchard:
 
There are two important questions I think: 
- Do the patches cause damage?
   Probably not anymore. SUSE has done testing and it appears the
   original patch I attached didn't break things (after one fix was
   made).
- Is Lustre stable?
   On 2.4 Lustre is quite stable.  On 2.6 we have done testing but,
   for example, never on more than 40 nodes.  We don't consider it
   rock solid on 2.6, although it does pass POSIX and just about every
   other benchmark without failures.
 
Since the patches were modified for this discussion there are of
course some new issues which Oleg Drokin is now ironing out.
 
Our test results are visible at https://buffalo.lustre.org
 
Well, how close are we now to this being acceptable?
 
- Peter J. Braam & Oleg Drokin -


[-- Attachment #2: export-vanilla-2.6.patch --]
[-- Type: application/octet-stream, Size: 3228 bytes --]

 fs/jbd/journal.c   |    1 +
 fs/super.c         |    2 ++
 include/linux/fs.h |    1 +
 include/linux/mm.h |    3 +++
 mm/truncate.c      |    4 +++-
 5 files changed, 10 insertions(+), 1 deletion(-)

Index: linux-2.6.6/fs/jbd/journal.c
===================================================================
--- linux-2.6.6.orig/fs/jbd/journal.c	2004-05-26 20:25:49.000000000 +0300
+++ linux-2.6.6/fs/jbd/journal.c	2004-05-27 21:08:52.686693408 +0300
@@ -71,6 +71,7 @@
 EXPORT_SYMBOL(journal_errno);
 EXPORT_SYMBOL(journal_ack_err);
 EXPORT_SYMBOL(journal_clear_err);
+EXPORT_SYMBOL(log_start_commit);
 EXPORT_SYMBOL(log_wait_commit);
 EXPORT_SYMBOL(journal_start_commit);
 EXPORT_SYMBOL(journal_wipe);
Index: linux-2.6.6/fs/super.c
===================================================================
--- linux-2.6.6.orig/fs/super.c	2004-05-26 20:25:43.000000000 +0300
+++ linux-2.6.6/fs/super.c	2004-05-27 21:08:52.718688544 +0300
@@ -788,6 +788,8 @@
 	return (struct vfsmount *)sb;
 }
 
+EXPORT_SYMBOL(do_kern_mount);
+
 struct vfsmount *kern_mount(struct file_system_type *type)
 {
 	return do_kern_mount(type->name, 0, type->name, NULL);
Index: linux-2.6.6/include/linux/mm.h
===================================================================
--- linux-2.6.6.orig/include/linux/mm.h	2004-05-26 20:26:11.000000000 +0300
+++ linux-2.6.6/include/linux/mm.h	2004-05-27 21:08:52.735685960 +0300
@@ -589,6 +589,9 @@
 	return 0;
 }
 
+/* truncate.c */
+extern void truncate_complete_page(struct address_space *mapping,struct page *);
+
 /* filemap.c */
 extern unsigned long page_unuse(struct page *);
 extern void truncate_inode_pages(struct address_space *, loff_t);
Index: linux-2.6.6/include/linux/fs.h
===================================================================
--- linux-2.6.6.orig/include/linux/fs.h	2004-05-27 21:08:45.986711960 +0300
+++ linux-2.6.6/include/linux/fs.h	2004-05-27 21:08:52.738685504 +0300
@@ -1137,6 +1137,7 @@
 extern int unregister_filesystem(struct file_system_type *);
 extern struct vfsmount *kern_mount(struct file_system_type *);
 extern int may_umount(struct vfsmount *);
+struct vfsmount *do_kern_mount(const char *type, int flags, const char *name, void *data);
 extern long do_mount(char *, char *, char *, unsigned long, void *);
 
 extern int vfs_statfs(struct super_block *, struct kstatfs *);
Index: linux-2.6.6/mm/truncate.c
===================================================================
--- linux-2.6.6.orig/mm/truncate.c	2004-05-26 20:26:14.000000000 +0300
+++ linux-2.6.6/mm/truncate.c	2004-05-27 21:08:52.750683680 +0300
@@ -42,7 +42,7 @@
  * its lock, b) when a concurrent invalidate_inode_pages got there first and
  * c) when tmpfs swizzles a page between a tmpfs inode and swapper_space.
  */
-static void
+void
 truncate_complete_page(struct address_space *mapping, struct page *page)
 {
 	if (page->mapping != mapping)
@@ -58,6 +58,8 @@
 	page_cache_release(page);	/* pagecache ref */
 }
 
+EXPORT_SYMBOL(truncate_complete_page);
+
 /*
  * This is for invalidate_inode_pages().  That function can be called at
  * any time, and is not supposed to throw away dirty pages.  But pages can


[-- Attachment #3: header_guards-vanilla-2.6.patch --]
[-- Type: application/octet-stream, Size: 1481 bytes --]

%diffstat
 blockgroup_lock.h |    4 +++-
 percpu_counter.h  |    4 ++++
 2 files changed, 7 insertions(+), 1 deletion(-)

%patch
Index: linux-2.6.6/include/linux/percpu_counter.h
===================================================================
--- linux-2.6.6.orig/include/linux/percpu_counter.h	2004-04-04 11:37:23.000000000 +0800
+++ linux-2.6.6/include/linux/percpu_counter.h	2004-05-22 16:08:16.000000000 +0800
@@ -3,6 +3,8 @@
  *
  * WARNING: these things are HUGE.  4 kbytes per counter on 32-way P4.
  */
+#ifndef _LINUX_PERCPU_COUNTER_H
+#define _LINUX_PERCPU_COUNTER_H
 
 #include <linux/config.h>
 #include <linux/spinlock.h>
@@ -101,3 +103,5 @@ static inline void percpu_counter_dec(st
 {
 	percpu_counter_mod(fbc, -1);
 }
+
+#endif /* _LINUX_PERCPU_COUNTER_H */
Index: linux-2.6.6/include/linux/blockgroup_lock.h
===================================================================
--- linux-2.6.6.orig/include/linux/blockgroup_lock.h	2004-04-04 11:36:26.000000000 +0800
+++ linux-2.6.6/include/linux/blockgroup_lock.h	2004-05-22 16:08:45.000000000 +0800
@@ -3,6 +3,8 @@
  *
  * Simple hashed spinlocking.
  */
+#ifndef _LINUX_BLOCKGROUP_LOCK_H
+#define _LINUX_BLOCKGROUP_LOCK_H
 
 #include <linux/config.h>
 #include <linux/spinlock.h>
@@ -55,4 +57,4 @@ static inline void bgl_lock_init(struct 
 #define sb_bgl_lock(sb, block_group) \
 	(&(sb)->s_blockgroup_lock.locks[(block_group) & (NR_BG_LOCKS-1)].lock)
 
-
+#endif


[-- Attachment #4: lustre_version.patch --]
[-- Type: application/octet-stream, Size: 482 bytes --]

Version 36: don't dput dentry after error (b=2350), zero page->private (3119)
Version 35: pass intent to real_lookup after revalidate failure (b=3285)
Version 34: fix ext3 iopen assertion failure (b=2517, b=2399)

 include/linux/lustre_version.h |    1 +
 1 files changed, 1 insertion(+)

--- /dev/null	Fri Aug 30 17:31:37 2002
+++ linux-2.4.18-18.8.0-l12-braam/include/linux/lustre_version.h	Thu Feb 13 07:58:33 2003
@@ -0,0 +1 @@
+#define LUSTRE_KERNEL_VERSION 36

_

[-- Attachment #5: vanilla-2.6.6 --]
[-- Type: application/octet-stream, Size: 374 bytes --]

lustre_version.patch
vfs_intent-flags_rename-vanilla-2.6.patch 
vfs-dcache_locking-vanilla-2.6.patch
vfs-dcache_lustre_invalid-vanilla-2.6.patch 
vfs-intent_api-vanilla-2.6.patch 
vfs-raw_ops-vanilla-2.6.patch 
export-vanilla-2.6.patch 
header_guards-vanilla-2.6.patch 
vfs-intent_lustre-vanilla-2.6.patch 
vfs-do_truncate.patch
vfs-lookup_last-vanilla-2.6.patch

[-- Attachment #6: vfs_intent-flags_rename-vanilla-2.6.patch --]
[-- Type: application/octet-stream, Size: 8145 bytes --]

%diffstat
 fs/cifs/dir.c         |   14 +++++++-------
 fs/exec.c             |    4 ++--
 fs/namei.c            |    4 ++--
 fs/nfs/dir.c          |   14 +++++++-------
 fs/nfs/nfs4proc.c     |    6 +++---
 include/linux/namei.h |   15 +++++++--------
 6 files changed, 28 insertions(+), 29 deletions(-)

%patch
Index: linux-2.6.6/fs/exec.c
===================================================================
--- linux-2.6.6.orig/fs/exec.c	2004-05-22 00:46:19.000000000 +0800
+++ linux-2.6.6/fs/exec.c	2004-05-22 01:36:12.000000000 +0800
@@ -122,7 +122,7 @@ asmlinkage long sys_uselib(const char __
 	struct nameidata nd;
 	int error;
 
-	nd.intent.open.flags = FMODE_READ;
+	nd.intent.it_flags = FMODE_READ;
 	error = __user_walk(library, LOOKUP_FOLLOW|LOOKUP_OPEN, &nd);
 	if (error)
 		goto out;
@@ -483,7 +483,7 @@ struct file *open_exec(const char *name)
 	int err;
 	struct file *file;
 
-	nd.intent.open.flags = FMODE_READ;
+	nd.intent.it_flags = FMODE_READ;
 	err = path_lookup(name, LOOKUP_FOLLOW|LOOKUP_OPEN, &nd);
 	file = ERR_PTR(err);
 
Index: linux-2.6.6/fs/namei.c
===================================================================
--- linux-2.6.6.orig/fs/namei.c	2004-05-22 00:46:19.000000000 +0800
+++ linux-2.6.6/fs/namei.c	2004-05-22 01:36:46.000000000 +0800
@@ -1266,8 +1266,8 @@ int open_namei(const char * pathname, in
 		acc_mode |= MAY_APPEND;
 
 	/* Fill in the open() intent data */
-	nd->intent.open.flags = flag;
-	nd->intent.open.create_mode = mode;
+	nd->intent.it_flags = flag;
+	nd->intent.it_create_mode = mode;
 
 	/*
 	 * The simplest case - just a plain lookup.
Index: linux-2.6.6/fs/nfs/dir.c
===================================================================
--- linux-2.6.6.orig/fs/nfs/dir.c	2004-04-04 11:37:06.000000000 +0800
+++ linux-2.6.6/fs/nfs/dir.c	2004-05-22 01:58:56.000000000 +0800
@@ -705,7 +705,7 @@ int nfs_is_exclusive_create(struct inode
 		return 0;
 	if (!nd || (nd->flags & LOOKUP_CONTINUE) || !(nd->flags & LOOKUP_CREATE))
 		return 0;
-	return (nd->intent.open.flags & O_EXCL) != 0;
+	return (nd->intent.it_flags & O_EXCL) != 0;
 }
 
 static struct dentry *nfs_lookup(struct inode *dir, struct dentry * dentry, struct nameidata *nd)
@@ -778,7 +778,7 @@ static int is_atomic_open(struct inode *
 	if (nd->flags & LOOKUP_DIRECTORY)
 		return 0;
 	/* Are we trying to write to a read only partition? */
-	if (IS_RDONLY(dir) && (nd->intent.open.flags & (O_CREAT|O_TRUNC|FMODE_WRITE)))
+	if (IS_RDONLY(dir) && (nd->intent.it_flags & (O_CREAT|O_TRUNC|FMODE_WRITE)))
 		return 0;
 	return 1;
 }
@@ -799,7 +799,7 @@ static struct dentry *nfs_atomic_lookup(
 	dentry->d_op = NFS_PROTO(dir)->dentry_ops;
 
 	/* Let vfs_create() deal with O_EXCL */
-	if (nd->intent.open.flags & O_EXCL)
+	if (nd->intent.it_flags & O_EXCL)
 		goto no_entry;
 
 	/* Open the file on the server */
@@ -807,7 +807,7 @@ static struct dentry *nfs_atomic_lookup(
 	/* Revalidate parent directory attribute cache */
 	nfs_revalidate_inode(NFS_SERVER(dir), dir);
 
-	if (nd->intent.open.flags & O_CREAT) {
+	if (nd->intent.it_flags & O_CREAT) {
 		nfs_begin_data_update(dir);
 		inode = nfs4_atomic_open(dir, dentry, nd);
 		nfs_end_data_update(dir);
@@ -823,7 +823,7 @@ static struct dentry *nfs_atomic_lookup(
 				break;
 			/* This turned out not to be a regular file */
 			case -ELOOP:
-				if (!(nd->intent.open.flags & O_NOFOLLOW))
+				if (!(nd->intent.it_flags & O_NOFOLLOW))
 					goto no_open;
 			/* case -EISDIR: */
 			/* case -EINVAL: */
@@ -857,7 +857,7 @@ static int nfs_open_revalidate(struct de
 	dir = parent->d_inode;
 	if (!is_atomic_open(dir, nd))
 		goto no_open;
-	openflags = nd->intent.open.flags;
+	openflags = nd->intent.it_flags;
 	if (openflags & O_CREAT) {
 		/* If this is a negative dentry, just drop it */
 		if (!inode)
@@ -1022,7 +1022,7 @@ static int nfs_create(struct inode *dir,
 	attr.ia_valid = ATTR_MODE;
 
 	if (nd && (nd->flags & LOOKUP_CREATE))
-		open_flags = nd->intent.open.flags;
+		open_flags = nd->intent.it_flags;
 
 	/*
 	 * The 0 argument passed into the create function should one day
Index: linux-2.6.6/fs/nfs/nfs4proc.c
===================================================================
--- linux-2.6.6.orig/fs/nfs/nfs4proc.c	2004-05-22 00:46:19.000000000 +0800
+++ linux-2.6.6/fs/nfs/nfs4proc.c	2004-05-22 01:59:41.000000000 +0800
@@ -475,17 +475,17 @@ nfs4_atomic_open(struct inode *dir, stru
 	struct nfs4_state *state;
 
 	if (nd->flags & LOOKUP_CREATE) {
-		attr.ia_mode = nd->intent.open.create_mode;
+		attr.ia_mode = nd->intent.it_create_mode;
 		attr.ia_valid = ATTR_MODE;
 		if (!IS_POSIXACL(dir))
 			attr.ia_mode &= ~current->fs->umask;
 	} else {
 		attr.ia_valid = 0;
-		BUG_ON(nd->intent.open.flags & O_CREAT);
+		BUG_ON(nd->intent.it_flags & O_CREAT);
 	}
 
 	cred = rpcauth_lookupcred(NFS_SERVER(dir)->client->cl_auth, 0);
-	state = nfs4_do_open(dir, &dentry->d_name, nd->intent.open.flags, &attr, cred);
+	state = nfs4_do_open(dir, &dentry->d_name, nd->intent.it_flags, &attr, cred);
 	put_rpccred(cred);
 	if (IS_ERR(state))
 		return (struct inode *)state;
Index: linux-2.6.6/fs/cifs/dir.c
===================================================================
--- linux-2.6.6.orig/fs/cifs/dir.c	2004-05-22 00:46:19.000000000 +0800
+++ linux-2.6.6/fs/cifs/dir.c	2004-05-22 02:00:12.000000000 +0800
@@ -146,22 +146,22 @@ cifs_create(struct inode *inode, struct 
 	if(nd) { 
 		cFYI(1,("In create for inode %p dentry->inode %p nd flags = 0x%x for %s",inode, direntry->d_inode, nd->flags,full_path));
 
-		if ((nd->intent.open.flags & O_ACCMODE) == O_RDONLY)
+		if ((nd->intent.it_flags & O_ACCMODE) == O_RDONLY)
 			desiredAccess = GENERIC_READ;
-		else if ((nd->intent.open.flags & O_ACCMODE) == O_WRONLY)
+		else if ((nd->intent.it_flags & O_ACCMODE) == O_WRONLY)
 			desiredAccess = GENERIC_WRITE;
-		else if ((nd->intent.open.flags & O_ACCMODE) == O_RDWR) {
+		else if ((nd->intent.it_flags & O_ACCMODE) == O_RDWR) {
 			/* GENERIC_ALL is too much permission to request */
 			/* can cause unnecessary access denied on create */
 			/* desiredAccess = GENERIC_ALL; */
 			desiredAccess = GENERIC_READ | GENERIC_WRITE;
 		}
 
-		if((nd->intent.open.flags & (O_CREAT | O_EXCL)) == (O_CREAT | O_EXCL))
+		if((nd->intent.it_flags & (O_CREAT | O_EXCL)) == (O_CREAT | O_EXCL))
 			disposition = FILE_CREATE;
-		else if((nd->intent.open.flags & (O_CREAT | O_TRUNC)) == (O_CREAT | O_TRUNC))
+		else if((nd->intent.it_flags & (O_CREAT | O_TRUNC)) == (O_CREAT | O_TRUNC))
 			disposition = FILE_OVERWRITE_IF;
-		else if((nd->intent.open.flags & O_CREAT) == O_CREAT)
+		else if((nd->intent.it_flags & O_CREAT) == O_CREAT)
 			disposition = FILE_OPEN_IF;
 		else {
 			cFYI(1,("Create flag not set in create function"));
@@ -311,7 +311,7 @@ cifs_lookup(struct inode *parent_dir_ino
 	      parent_dir_inode, direntry->d_name.name, direntry));
 
 	if(nd) {  /* BB removeme */
-		cFYI(1,("In lookup nd flags 0x%x open intent flags 0x%x",nd->flags,nd->intent.open.flags));
+		cFYI(1,("In lookup nd flags 0x%x open intent flags 0x%x",nd->flags,nd->intent.it_flags));
 	} /* BB removeme BB */
 	/* BB Add check of incoming data - e.g. frame not longer than maximum SMB - let server check the namelen BB */
 
Index: linux-2.6.6/include/linux/namei.h
===================================================================
--- linux-2.6.6.orig/include/linux/namei.h	2004-04-04 11:36:55.000000000 +0800
+++ linux-2.6.6/include/linux/namei.h	2004-05-22 01:46:25.000000000 +0800
@@ -5,9 +5,12 @@
 
 struct vfsmount;
 
-struct open_intent {
-	int	flags;
-	int	create_mode;
+#define INTENT_MAGIC 0x19620323
+struct lookup_intent {
+	int     it_magic;
+	int     it_op;
+	int     it_flags;
+	int     it_create_mode;
 };
 
 struct nameidata {
@@ -16,11 +19,7 @@ struct nameidata {
 	struct qstr	last;
 	unsigned int	flags;
 	int		last_type;
-
-	/* Intent data */
-	union {
-		struct open_intent open;
-	} intent;
+	struct lookup_intent intent;
 };
 
 /*


[-- Attachment #7: vfs-dcache_locking-vanilla-2.6.patch --]
[-- Type: application/octet-stream, Size: 2685 bytes --]

%diffstat
 fs/dcache.c            |   22 ++++++++++++++++++----
 include/linux/dcache.h |    2 ++
 2 files changed, 20 insertions(+), 4 deletions(-)

%patch
Index: linux-2.6.6/fs/dcache.c
===================================================================
--- linux-2.6.6.orig/fs/dcache.c	2004-05-22 00:46:19.000000000 +0800
+++ linux-2.6.6/fs/dcache.c	2004-05-22 02:11:17.000000000 +0800
@@ -1115,13 +1115,20 @@ void d_delete(struct dentry * dentry)
  * Adds a dentry to the hash according to its name.
  */
  
-void d_rehash(struct dentry * entry)
+void __d_rehash(struct dentry * entry)
 {
 	struct hlist_head *list = d_hash(entry->d_parent, entry->d_name.hash);
-	spin_lock(&dcache_lock);
  	entry->d_vfs_flags &= ~DCACHE_UNHASHED;
 	entry->d_bucket = list;
  	hlist_add_head_rcu(&entry->d_hash, list);
+}
+
+EXPORT_SYMBOL(__d_rehash);
+
+void d_rehash(struct dentry * entry)
+{
+	spin_lock(&dcache_lock);
+        __d_rehash(entry);
 	spin_unlock(&dcache_lock);
 }
 
@@ -1185,12 +1192,11 @@ static inline void switch_names(struct d
  * dcache entries should not be moved in this way.
  */
 
-void d_move(struct dentry * dentry, struct dentry * target)
+void __d_move(struct dentry * dentry, struct dentry * target)
 {
 	if (!dentry->d_inode)
 		printk(KERN_WARNING "VFS: moving negative dcache entry\n");
 
-	spin_lock(&dcache_lock);
 	write_seqlock(&rename_lock);
 	/*
 	 * XXXX: do we really need to take target->d_lock?
@@ -1243,6 +1249,14 @@ already_unhashed:
 	spin_unlock(&target->d_lock);
 	spin_unlock(&dentry->d_lock);
 	write_sequnlock(&rename_lock);
+}
+
+EXPORT_SYMBOL(__d_move);
+
+void d_move(struct dentry *dentry, struct dentry *target)
+{
+	spin_lock(&dcache_lock);
+	__d_move(dentry, target);
 	spin_unlock(&dcache_lock);
 }
 
Index: linux-2.6.6/include/linux/dcache.h
===================================================================
--- linux-2.6.6.orig/include/linux/dcache.h	2004-05-22 00:46:20.000000000 +0800
+++ linux-2.6.6/include/linux/dcache.h	2004-05-22 02:10:01.000000000 +0800
@@ -224,6 +224,7 @@ extern int have_submounts(struct dentry 
  * This adds the entry to the hash queues.
  */
 extern void d_rehash(struct dentry *);
+extern void __d_rehash(struct dentry *);
 
 /**
  * d_add - add dentry to hash queues
@@ -242,6 +243,7 @@ static inline void d_add(struct dentry *
 
 /* used for rename() and baskets */
 extern void d_move(struct dentry *, struct dentry *);
+extern void __d_move(struct dentry *, struct dentry *);
 
 /* appendix may either be NULL or be used for transname suffixes */
 extern struct dentry * d_lookup(struct dentry *, struct qstr *);


[-- Attachment #8: vfs-dcache_lustre_invalid-vanilla-2.6.patch --]
[-- Type: application/octet-stream, Size: 1252 bytes --]

%diffstat
 fs/dcache.c            |    7 +++++++
 include/linux/dcache.h |    1 +
 2 files changed, 8 insertions(+)

%patch
Index: linux-2.6.6/fs/dcache.c
===================================================================
--- linux-2.6.6.orig/fs/dcache.c	2004-05-22 02:11:17.000000000 +0800
+++ linux-2.6.6/fs/dcache.c	2004-05-22 02:14:46.000000000 +0800
@@ -217,6 +217,13 @@ int d_invalidate(struct dentry * dentry)
 		spin_unlock(&dcache_lock);
 		return 0;
 	}
+
+	/* network invalidation by Lustre */
+	if (dentry->d_flags & DCACHE_LUSTRE_INVALID) {
+		spin_unlock(&dcache_lock);
+		return 0;
+	}
+
 	/*
 	 * Check whether to do a partial shrink_dcache
 	 * to get rid of unused child entries.
Index: linux-2.6.6/include/linux/dcache.h
===================================================================
--- linux-2.6.6.orig/include/linux/dcache.h	2004-05-22 02:10:01.000000000 +0800
+++ linux-2.6.6/include/linux/dcache.h	2004-05-22 02:15:17.000000000 +0800
@@ -153,6 +153,7 @@ d_iput:		no		no		yes
 
 #define DCACHE_REFERENCED	0x0008  /* Recently used, don't discard. */
 #define DCACHE_UNHASHED		0x0010	
+#define DCACHE_LUSTRE_INVALID	0x0020	/* invalidated by Lustre */
 
 extern spinlock_t dcache_lock;
 


[-- Attachment #9: vfs-do_truncate.patch --]
[-- Type: application/octet-stream, Size: 3284 bytes --]

 fs/exec.c          |    2 +-
 fs/namei.c         |    2 +-
 fs/open.c          |    8 +++++---
 include/linux/fs.h |    3 ++-
 4 files changed, 9 insertions(+), 6 deletions(-)
Index: linux-2.6.6/fs/namei.c
===================================================================
--- linux-2.6.6.orig/fs/namei.c	2004-05-30 23:17:06.267030976 +0300
+++ linux-2.6.6/fs/namei.c	2004-05-30 23:23:15.642877312 +0300
@@ -1270,7 +1270,7 @@
 		if (!error) {
 			DQUOT_INIT(inode);
 			
-			error = do_truncate(dentry, 0);
+			error = do_truncate(dentry, 0, 1);
 		}
 		put_write_access(inode);
 		if (error)
Index: linux-2.6.6/fs/open.c
===================================================================
--- linux-2.6.6.orig/fs/open.c	2004-05-30 20:05:26.857206992 +0300
+++ linux-2.6.6/fs/open.c	2004-05-30 23:24:38.908219056 +0300
@@ -189,7 +189,7 @@
 	return error;
 }
 
-int do_truncate(struct dentry *dentry, loff_t length)
+int do_truncate(struct dentry *dentry, loff_t length, int called_from_open)
 {
 	int err;
 	struct iattr newattrs;
@@ -202,6 +202,8 @@
 	newattrs.ia_valid = ATTR_SIZE | ATTR_CTIME;
 	down(&dentry->d_inode->i_sem);
 	down_write(&dentry->d_inode->i_alloc_sem);
+	if (called_from_open)
+		newattrs.ia_valid |= ATTR_FROM_OPEN;
 	err = notify_change(dentry, &newattrs);
 	up_write(&dentry->d_inode->i_alloc_sem);
 	up(&dentry->d_inode->i_sem);
@@ -259,7 +261,7 @@
 	error = locks_verify_truncate(inode, NULL, length);
 	if (!error) {
 		DQUOT_INIT(inode);
-		error = do_truncate(nd.dentry, length);
+		error = do_truncate(nd.dentry, length, 0);
 	}
 	put_write_access(inode);
 
@@ -311,7 +313,7 @@
 
 	error = locks_verify_truncate(inode, file, length);
 	if (!error)
-		error = do_truncate(dentry, length);
+		error = do_truncate(dentry, length, 0);
 out_putf:
 	fput(file);
 out:
Index: linux-2.6.6/fs/exec.c
===================================================================
--- linux-2.6.6.orig/fs/exec.c	2004-05-30 20:05:26.862206232 +0300
+++ linux-2.6.6/fs/exec.c	2004-05-30 23:23:15.648876400 +0300
@@ -1395,7 +1395,7 @@
 		goto close_fail;
 	if (!file->f_op->write)
 		goto close_fail;
-	if (do_truncate(file->f_dentry, 0) != 0)
+	if (do_truncate(file->f_dentry, 0, 0) != 0)
 		goto close_fail;
 
 	retval = binfmt->core_dump(signr, regs, file);
Index: linux-2.6.6/include/linux/fs.h
===================================================================
--- linux-2.6.6.orig/include/linux/fs.h	2004-05-30 23:20:11.979798344 +0300
+++ linux-2.6.6/include/linux/fs.h	2004-05-30 23:25:29.167578472 +0300
@@ -249,6 +249,7 @@
 #define ATTR_ATTR_FLAG	1024
 #define ATTR_KILL_SUID	2048
 #define ATTR_KILL_SGID	4096
+#define ATTR_FROM_OPEN	8192   /* called from open path, ie O_TRUNC */
 
 /*
  * This is the Inode Attributes structure, used for notify_change().  It
@@ -1189,7 +1190,7 @@
 
 /* fs/open.c */
 
-extern int do_truncate(struct dentry *, loff_t start);
+extern int do_truncate(struct dentry *, loff_t start, int called_from_open);
 extern struct file *filp_open(const char *, int, int);
 extern struct file * dentry_open(struct dentry *, struct vfsmount *, int);
 extern struct file * dentry_open_it(struct dentry *, struct vfsmount *, int, struct lookup_intent *);

[-- Attachment #10: vfs-intent_api-vanilla-2.6.patch --]
[-- Type: application/octet-stream, Size: 16232 bytes --]

 fs/exec.c             |   10 ++++---
 fs/namei.c            |   69 ++++++++++++++++++++++++++++++++++++++++++++------
 fs/namespace.c        |    1 
 fs/open.c             |   28 +++++++++++++++-----
 fs/stat.c             |   10 +++++--
 fs/xattr.c            |   12 +++++---
 include/linux/fs.h    |    2 +
 include/linux/namei.h |   27 +++++++++++++++++++
 8 files changed, 134 insertions(+), 25 deletions(-)

Index: linux-2.6.6/include/linux/namei.h
===================================================================
--- linux-2.6.6.orig/include/linux/namei.h	2004-05-30 19:46:50.238958768 +0300
+++ linux-2.6.6/include/linux/namei.h	2004-05-30 20:05:26.849208208 +0300
@@ -2,17 +2,36 @@
 #define _LINUX_NAMEI_H
 
 #include <linux/linkage.h>
+#include <linux/string.h>
 
 struct vfsmount;
 
+/* intent opcodes */
+#define IT_OPEN		(1)
+#define IT_CREAT	(1<<1)
+#define IT_READDIR	(1<<2)
+#define IT_GETATTR	(1<<3)
+#define IT_LOOKUP	(1<<4)
+#define IT_UNLINK	(1<<5)
+#define IT_TRUNC	(1<<6)
+#define IT_GETXATTR	(1<<7)
+
 #define INTENT_MAGIC 0x19620323
 struct lookup_intent {
 	int     it_magic;
 	int     it_op;
+	void    (*it_op_release)(struct lookup_intent *);
 	int     it_flags;
 	int     it_create_mode;
 };
 
+static inline void intent_init(struct lookup_intent *it, int op)
+{
+	memset(it, 0, sizeof(*it));
+	it->it_magic = INTENT_MAGIC;
+	it->it_op = op;
+}
+
 struct nameidata {
 	struct dentry	*dentry;
 	struct vfsmount *mnt;
@@ -48,14 +67,22 @@
 #define LOOKUP_ACCESS		(0x0400)
 
 extern int FASTCALL(__user_walk(const char __user *, unsigned, struct nameidata *));
+extern int FASTCALL(__user_walk_it(const char __user *, unsigned, struct nameidata *));
 #define user_path_walk(name,nd) \
 	__user_walk(name, LOOKUP_FOLLOW, nd)
+#define user_path_walk_it(name,nd) \
+	__user_walk_it(name, LOOKUP_FOLLOW, nd)
 #define user_path_walk_link(name,nd) \
 	__user_walk(name, 0, nd)
+#define user_path_walk_link_it(name,nd) \
+	__user_walk_it(name, 0, nd)
 extern int FASTCALL(path_lookup(const char *, unsigned, struct nameidata *));
+extern int FASTCALL(path_lookup_it(const char *, unsigned, struct nameidata *));
 extern int FASTCALL(path_walk(const char *, struct nameidata *));
+extern int FASTCALL(path_walk_it(const char *, struct nameidata *));
 extern int FASTCALL(link_path_walk(const char *, struct nameidata *));
 extern void path_release(struct nameidata *);
+extern void intent_release(struct lookup_intent *);
 
 extern struct dentry * lookup_one_len(const char *, struct dentry *, int);
 extern struct dentry * lookup_hash(struct qstr *, struct dentry *);
Index: linux-2.6.6/include/linux/fs.h
===================================================================
--- linux-2.6.6.orig/include/linux/fs.h	2004-05-26 20:26:11.000000000 +0300
+++ linux-2.6.6/include/linux/fs.h	2004-05-30 20:05:26.852207752 +0300
@@ -576,6 +576,7 @@
 	spinlock_t		f_ep_lock;
 #endif /* #ifdef CONFIG_EPOLL */
 	struct address_space	*f_mapping;
+	struct lookup_intent	*f_it;
 };
 extern spinlock_t files_lock;
 #define file_list_lock() spin_lock(&files_lock);
@@ -1190,6 +1191,7 @@
 extern int do_truncate(struct dentry *, loff_t start);
 extern struct file *filp_open(const char *, int, int);
 extern struct file * dentry_open(struct dentry *, struct vfsmount *, int);
+extern struct file * dentry_open_it(struct dentry *, struct vfsmount *, int, struct lookup_intent *);
 extern int filp_close(struct file *, fl_owner_t id);
 extern char * getname(const char __user *);
 
Index: linux-2.6.6/fs/namei.c
===================================================================
--- linux-2.6.6.orig/fs/namei.c	2004-05-30 19:46:50.185966824 +0300
+++ linux-2.6.6/fs/namei.c	2004-05-30 20:05:26.855207296 +0300
@@ -272,8 +272,19 @@
 	return 0;
 }
 
+void intent_release(struct lookup_intent *it)
+{
+	if (!it)
+		return;
+	if (it->it_magic != INTENT_MAGIC)
+		return;
+	if (it->it_op_release)
+		it->it_op_release(it);
+}
+
 void path_release(struct nameidata *nd)
 {
+	intent_release(&nd->intent);
 	dput(nd->dentry);
 	mntput(nd->mnt);
 }
@@ -774,8 +785,14 @@
 	return err;
 }
 
+int fastcall path_walk_it(const char * name, struct nameidata *nd)
+{
+	current->total_link_count = 0;
+	return link_path_walk(name, nd);
+}
 int fastcall path_walk(const char * name, struct nameidata *nd)
 {
+	intent_init(&nd->intent, IT_LOOKUP);
 	current->total_link_count = 0;
 	return link_path_walk(name, nd);
 }
@@ -784,7 +801,7 @@
 /* returns 1 if everything is done */
 static int __emul_lookup_dentry(const char *name, struct nameidata *nd)
 {
-	if (path_walk(name, nd))
+	if (path_walk_it(name, nd))
 		return 0;		/* something went wrong... */
 
 	if (!nd->dentry->d_inode || S_ISDIR(nd->dentry->d_inode->i_mode)) {
@@ -861,7 +878,18 @@
 	return 1;
 }
 
-int fastcall path_lookup(const char *name, unsigned int flags, struct nameidata *nd)
+static inline int it_mode_from_lookup_flags(int flags)
+{
+	int mode = IT_LOOKUP;
+
+	if (flags & LOOKUP_OPEN)
+		mode = IT_OPEN;
+	if (flags & LOOKUP_CREATE)
+		mode |= IT_CREAT;
+	return mode;
+}
+
+int fastcall path_lookup_it(const char *name, unsigned int flags, struct nameidata *nd)
 {
 	int retval;
 
@@ -896,6 +924,12 @@
 	return retval;
 }
 
+int fastcall path_lookup(const char *name, unsigned int flags, struct nameidata *nd)
+{
+	intent_init(&nd->intent, it_mode_from_lookup_flags(flags));
+	return path_lookup_it(name, flags, nd);
+}
+
 /*
  * Restricted form of lookup. Doesn't follow links, single-component only,
  * needs parent already locked. Doesn't follow mounts.
@@ -946,7 +980,7 @@
 }
 
 /* SMP-safe */
-struct dentry * lookup_one_len(const char * name, struct dentry * base, int len)
+struct dentry * lookup_one_len_it(const char * name, struct dentry * base, int len, struct nameidata *nd)
 {
 	unsigned long hash;
 	struct qstr this;
@@ -966,11 +1000,16 @@
 	}
 	this.hash = end_name_hash(hash);
 
-	return lookup_hash(&this, base);
+	return __lookup_hash(&this, base, nd);
 access:
 	return ERR_PTR(-EACCES);
 }
 
+struct dentry * lookup_one_len(const char * name, struct dentry * base, int len)
+{
+	return lookup_one_len_it(name, base, len, NULL);
+}
+
 /*
  *	namei()
  *
@@ -982,18 +1021,24 @@
  * that namei follows links, while lnamei does not.
  * SMP-safe
  */
-int fastcall __user_walk(const char __user *name, unsigned flags, struct nameidata *nd)
+int fastcall __user_walk_it(const char __user *name, unsigned flags, struct nameidata *nd)
 {
 	char *tmp = getname(name);
 	int err = PTR_ERR(tmp);
 
 	if (!IS_ERR(tmp)) {
-		err = path_lookup(tmp, flags, nd);
+		err = path_lookup_it(tmp, flags, nd);
 		putname(tmp);
 	}
 	return err;
 }
 
+int fastcall __user_walk(const char __user *name, unsigned flags, struct nameidata *nd)
+{
+	intent_init(&nd->intent, it_mode_from_lookup_flags(flags));
+	return __user_walk_it(name, flags, nd);
+}
+
 /*
  * It's inline, so penalty for filesystems that don't use sticky bit is
  * minimal.
@@ -1273,7 +1318,7 @@
 	 * The simplest case - just a plain lookup.
 	 */
 	if (!(flag & O_CREAT)) {
-		error = path_lookup(pathname, lookup_flags(flag)|LOOKUP_OPEN, nd);
+		error = path_lookup_it(pathname, lookup_flags(flag), nd);
 		if (error)
 			return error;
 		goto ok;
@@ -1282,7 +1327,8 @@
 	/*
 	 * Create - we need to know the parent.
 	 */
-	error = path_lookup(pathname, LOOKUP_PARENT|LOOKUP_OPEN|LOOKUP_CREATE, nd);
+	nd->intent.it_op |= IT_CREAT;
+	error = path_lookup_it(pathname, LOOKUP_PARENT, nd);
 	if (error)
 		return error;
 
@@ -2165,6 +2211,7 @@
 __vfs_follow_link(struct nameidata *nd, const char *link)
 {
 	int res = 0;
+	struct lookup_intent it = nd->intent;
 	char *name;
 	if (IS_ERR(link))
 		goto fail;
@@ -2175,6 +2222,9 @@
 			/* weird __emul_prefix() stuff did it */
 			goto out;
 	}
+	intent_init(&nd->intent, it.it_op);
+	nd->intent.it_flags = it.it_flags;
+	nd->intent.it_create_mode = it.it_create_mode;
 	res = link_path_walk(link, nd);
 out:
 	if (current->link_count || res || nd->last_type!=LAST_NORM)
@@ -2249,6 +2299,7 @@
 	return res;
 }
 
+
 int page_symlink(struct inode *inode, const char *symname, int len)
 {
 	struct address_space *mapping = inode->i_mapping;
@@ -2309,8 +2360,10 @@
 EXPORT_SYMBOL(page_symlink);
 EXPORT_SYMBOL(page_symlink_inode_operations);
 EXPORT_SYMBOL(path_lookup);
+EXPORT_SYMBOL(path_lookup_it);
 EXPORT_SYMBOL(path_release);
 EXPORT_SYMBOL(path_walk);
+EXPORT_SYMBOL(path_walk_it);
 EXPORT_SYMBOL(permission);
 EXPORT_SYMBOL(unlock_rename);
 EXPORT_SYMBOL(vfs_create);
Index: linux-2.6.6/fs/open.c
===================================================================
--- linux-2.6.6.orig/fs/open.c	2004-05-26 20:25:43.000000000 +0300
+++ linux-2.6.6/fs/open.c	2004-05-30 20:05:26.857206992 +0300
@@ -214,11 +214,12 @@
 	struct inode * inode;
 	int error;
 
+	intent_init(&nd.intent, IT_GETATTR);
 	error = -EINVAL;
 	if (length < 0)	/* sorry, but loff_t says... */
 		goto out;
 
-	error = user_path_walk(path, &nd);
+	error = user_path_walk_it(path, &nd);
 	if (error)
 		goto out;
 	inode = nd.dentry->d_inode;
@@ -473,6 +474,7 @@
 	kernel_cap_t old_cap;
 	int res;
 
+	intent_init(&nd.intent, IT_GETATTR);
 	if (mode & ~S_IRWXO)	/* where's F_OK, X_OK, W_OK, R_OK? */
 		return -EINVAL;
 
@@ -496,7 +498,7 @@
 	else
 		current->cap_effective = current->cap_permitted;
 
-	res = __user_walk(filename, LOOKUP_FOLLOW|LOOKUP_ACCESS, &nd);
+	res = __user_walk_it(filename, LOOKUP_FOLLOW|LOOKUP_ACCESS, &nd);
 	if (!res) {
 		res = permission(nd.dentry->d_inode, mode, &nd);
 		/* SuS v2 requires we report a read only fs too */
@@ -518,7 +520,8 @@
 	struct nameidata nd;
 	int error;
 
-	error = __user_walk(filename, LOOKUP_FOLLOW|LOOKUP_DIRECTORY, &nd);
+	intent_init(&nd.intent, IT_GETATTR);
+	error = __user_walk_it(filename, LOOKUP_FOLLOW|LOOKUP_DIRECTORY, &nd);
 	if (error)
 		goto out;
 
@@ -569,7 +572,8 @@
 	struct nameidata nd;
 	int error;
 
-	error = __user_walk(filename, LOOKUP_FOLLOW | LOOKUP_DIRECTORY | LOOKUP_NOALT, &nd);
+	intent_init(&nd.intent, IT_GETATTR);
+	error = __user_walk_it(filename, LOOKUP_FOLLOW | LOOKUP_DIRECTORY | LOOKUP_NOALT, &nd);
 	if (error)
 		goto out;
 
@@ -752,6 +756,7 @@
 {
 	int namei_flags, error;
 	struct nameidata nd;
+	intent_init(&nd.intent, IT_OPEN);
 
 	namei_flags = flags;
 	if ((namei_flags+1) & O_ACCMODE)
@@ -761,14 +766,14 @@
 
 	error = open_namei(filename, namei_flags, mode, &nd);
 	if (!error)
-		return dentry_open(nd.dentry, nd.mnt, flags);
+		return dentry_open_it(nd.dentry, nd.mnt, flags, &nd.intent);
 
 	return ERR_PTR(error);
 }
 
 EXPORT_SYMBOL(filp_open);
 
-struct file *dentry_open(struct dentry *dentry, struct vfsmount *mnt, int flags)
+struct file *dentry_open_it(struct dentry *dentry, struct vfsmount *mnt, int flags, struct lookup_intent *it)
 {
 	struct file * f;
 	struct inode *inode;
@@ -780,6 +785,7 @@
 		goto cleanup_dentry;
 	f->f_flags = flags;
 	f->f_mode = (flags+1) & O_ACCMODE;
+	f->f_it = it;
 	inode = dentry->d_inode;
 	if (f->f_mode & FMODE_WRITE) {
 		error = get_write_access(inode);
@@ -799,6 +805,7 @@
 		error = f->f_op->open(inode,f);
 		if (error)
 			goto cleanup_all;
+		intent_release(it);
 	}
 	f->f_flags &= ~(O_CREAT | O_EXCL | O_NOCTTY | O_TRUNC);
 
@@ -823,11 +830,20 @@
 cleanup_file:
 	put_filp(f);
 cleanup_dentry:
+	intent_release(it);
 	dput(dentry);
 	mntput(mnt);
 	return ERR_PTR(error);
 }
 
+struct file *dentry_open(struct dentry *dentry, struct vfsmount *mnt, int flags)
+{
+	struct lookup_intent it;
+	intent_init(&it, IT_LOOKUP);
+
+	return dentry_open_it(dentry, mnt, flags, &it);
+}
+
 EXPORT_SYMBOL(dentry_open);
 
 /*
Index: linux-2.6.6/fs/stat.c
===================================================================
--- linux-2.6.6.orig/fs/stat.c	2004-05-26 20:25:43.000000000 +0300
+++ linux-2.6.6/fs/stat.c	2004-05-30 23:46:08.545164440 +0300
@@ -58,15 +58,15 @@
 	}
 	return 0;
 }
-
 EXPORT_SYMBOL(vfs_getattr);
 
 int vfs_stat(char __user *name, struct kstat *stat)
 {
 	struct nameidata nd;
 	int error;
+	intent_init(&nd.intent, IT_GETATTR);
 
-	error = user_path_walk(name, &nd);
+	error = user_path_walk_it(name, &nd);
 	if (!error) {
 		error = vfs_getattr(nd.mnt, nd.dentry, stat);
 		path_release(&nd);
@@ -80,8 +80,9 @@
 {
 	struct nameidata nd;
 	int error;
+	intent_init(&nd.intent, IT_GETATTR);
 
-	error = user_path_walk_link(name, &nd);
+	error = user_path_walk_link_it(name, &nd);
 	if (!error) {
 		error = vfs_getattr(nd.mnt, nd.dentry, stat);
 		path_release(&nd);
@@ -95,9 +96,12 @@
 {
 	struct file *f = fget(fd);
 	int error = -EBADF;
+	struct nameidata nd;
+	intent_init(&nd.intent, IT_GETATTR);
 
 	if (f) {
 		error = vfs_getattr(f->f_vfsmnt, f->f_dentry, stat);
+		intent_release(&nd.intent);
 		fput(f);
 	}
 	return error;
Index: linux-2.6.6/fs/namespace.c
===================================================================
--- linux-2.6.6.orig/fs/namespace.c	2004-05-26 20:25:43.000000000 +0300
+++ linux-2.6.6/fs/namespace.c	2004-05-30 20:05:26.860206536 +0300
@@ -115,6 +115,7 @@
 
 static void detach_mnt(struct vfsmount *mnt, struct nameidata *old_nd)
 {
+	memset(old_nd, 0, sizeof(*old_nd));
 	old_nd->dentry = mnt->mnt_mountpoint;
 	old_nd->mnt = mnt->mnt_parent;
 	mnt->mnt_parent = mnt;
Index: linux-2.6.6/fs/exec.c
===================================================================
--- linux-2.6.6.orig/fs/exec.c	2004-05-30 19:46:50.182967280 +0300
+++ linux-2.6.6/fs/exec.c	2004-05-30 20:05:26.862206232 +0300
@@ -122,8 +122,9 @@
 	struct nameidata nd;
 	int error;
 
+	intent_init(&nd.intent, IT_OPEN);
 	nd.intent.it_flags = FMODE_READ;
-	error = __user_walk(library, LOOKUP_FOLLOW|LOOKUP_OPEN, &nd);
+	error = user_path_walk_it(library, &nd);
 	if (error)
 		goto out;
 
@@ -135,7 +136,7 @@
 	if (error)
 		goto exit;
 
-	file = dentry_open(nd.dentry, nd.mnt, O_RDONLY);
+	file = dentry_open_it(nd.dentry, nd.mnt, O_RDONLY, &nd.intent);
 	error = PTR_ERR(file);
 	if (IS_ERR(file))
 		goto out;
@@ -483,8 +484,9 @@
 	int err;
 	struct file *file;
 
+	intent_init(&nd.intent, IT_OPEN);
 	nd.intent.it_flags = FMODE_READ;
-	err = path_lookup(name, LOOKUP_FOLLOW|LOOKUP_OPEN, &nd);
+	err = path_lookup_it(name, LOOKUP_FOLLOW, &nd);
 	file = ERR_PTR(err);
 
 	if (!err) {
@@ -497,7 +499,7 @@
 				err = -EACCES;
 			file = ERR_PTR(err);
 			if (!err) {
-				file = dentry_open(nd.dentry, nd.mnt, O_RDONLY);
+				file = dentry_open_it(nd.dentry, nd.mnt, O_RDONLY, &nd.intent);
 				if (!IS_ERR(file)) {
 					err = deny_write_access(file);
 					if (err) {
Index: linux-2.6.6/fs/xattr.c
===================================================================
--- linux-2.6.6.orig/fs/xattr.c	2004-05-26 20:25:43.000000000 +0300
+++ linux-2.6.6/fs/xattr.c	2004-05-30 20:05:26.863206080 +0300
@@ -161,7 +161,8 @@
 	struct nameidata nd;
 	ssize_t error;
 
-	error = user_path_walk(path, &nd);
+	intent_init(&nd.intent, IT_GETXATTR);
+	error = user_path_walk_it(path, &nd);
 	if (error)
 		return error;
 	error = getxattr(nd.dentry, name, value, size);
@@ -176,7 +177,8 @@
 	struct nameidata nd;
 	ssize_t error;
 
-	error = user_path_walk_link(path, &nd);
+	intent_init(&nd.intent, IT_GETXATTR);
+	error = user_path_walk_link_it(path, &nd);
 	if (error)
 		return error;
 	error = getxattr(nd.dentry, name, value, size);
@@ -242,7 +244,8 @@
 	struct nameidata nd;
 	ssize_t error;
 
-	error = user_path_walk(path, &nd);
+	intent_init(&nd.intent, IT_GETXATTR);
+	error = user_path_walk_it(path, &nd);
 	if (error)
 		return error;
 	error = listxattr(nd.dentry, list, size);
@@ -256,7 +259,8 @@
 	struct nameidata nd;
 	ssize_t error;
 
-	error = user_path_walk_link(path, &nd);
+	intent_init(&nd.intent, IT_GETXATTR);
+	error = user_path_walk_link_it(path, &nd);
 	if (error)
 		return error;
 	error = listxattr(nd.dentry, list, size);
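
The intent lifecycle used throughout the patch above (initialise, optionally
attach a release hook, release exactly once) can be modelled in user space.
This is only a sketch, not the kernel code: the struct mirrors the fields
shown in the namei.h hunk, while DEMO_LOOKUP_OPEN, DEMO_LOOKUP_CREATE,
demo_release and demo_release_calls are hypothetical names invented for
illustration, and the DEMO_* flag values are not the kernel's LOOKUP_* values.

```c
#include <string.h>

#define INTENT_MAGIC 0x19620323
#define IT_OPEN      (1)
#define IT_CREAT     (1 << 1)
#define IT_LOOKUP    (1 << 4)

/* Assumed stand-in flag values; the kernel's LOOKUP_* constants differ. */
#define DEMO_LOOKUP_OPEN   0x1
#define DEMO_LOOKUP_CREATE 0x2

struct lookup_intent {
	int  it_magic;
	int  it_op;
	void (*it_op_release)(struct lookup_intent *);
	int  it_flags;
	int  it_create_mode;
};

/* Zero the intent, then stamp the magic and the requested operation. */
static void intent_init(struct lookup_intent *it, int op)
{
	memset(it, 0, sizeof(*it));
	it->it_magic = INTENT_MAGIC;
	it->it_op = op;
}

/* Run the release hook only for an initialised intent that has one. */
static void intent_release(struct lookup_intent *it)
{
	if (!it || it->it_magic != INTENT_MAGIC)
		return;
	if (it->it_op_release)
		it->it_op_release(it);
}

/* Derive the intent opcode from lookup flags, as the patch does. */
static int it_mode_from_lookup_flags(int flags)
{
	int mode = IT_LOOKUP;

	if (flags & DEMO_LOOKUP_OPEN)
		mode = IT_OPEN;
	if (flags & DEMO_LOOKUP_CREATE)
		mode |= IT_CREAT;
	return mode;
}

/* Hypothetical release hook that just counts how often it ran. */
static int demo_release_calls;

static void demo_release(struct lookup_intent *it)
{
	(void)it;
	demo_release_calls++;
}
```

A caller would pair intent_init() with a single intent_release(); in this
model, an intent without a hook, a NULL pointer, or one whose magic does not
match is silently ignored.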

[-- Attachment #11: vfs-intent_lustre-vanilla-2.6.patch --]
[-- Type: application/octet-stream, Size: 940 bytes --]

 namei.h |   11 +++++++++++
 1 files changed, 11 insertions(+)
Index: linux-2.6.6/include/linux/namei.h
===================================================================
--- linux-2.6.6.orig/include/linux/namei.h	2004-05-31 11:55:11.399239832 +0300
+++ linux-2.6.6/include/linux/namei.h	2004-05-31 11:56:45.338958824 +0300
@@ -22,6 +22,14 @@
 #define IT_MKNOD	(1<<12)
 #define IT_SYMLINK	(1<<13)
 
+struct lustre_intent_data {
+	int       it_disposition;
+	int       it_status;
+	__u64     it_lock_handle;
+	void     *it_data;
+	int       it_lock_mode;
+};
+
 #define INTENT_MAGIC 0x19620323
 #define IT_STATUS_RAW (1<<10)	/* Setting this in it_flags on exit from lookup
 				   means everything was done already and return
@@ -38,6 +46,9 @@
 		char	*link;	/* For symlink */
 		struct nameidata *source_nd; /* For link/rename */
 	} it_create;
+	union {
+		struct lustre_intent_data *lustre;
+	} d;
 };
 
 

[-- Attachment #12: vfs-lookup_last-vanilla-2.6.patch --]
[-- Type: application/octet-stream, Size: 1803 bytes --]

 fs/namei.c            |    8 ++++++++
 include/linux/namei.h |    3 +++
 2 files changed, 11 insertions(+)

Index: linux-2.6.6/fs/namei.c
===================================================================
--- linux-2.6.6.orig/fs/namei.c	2004-05-27 21:24:45.151896688 +0300
+++ linux-2.6.6/fs/namei.c	2004-05-27 22:48:34.155371952 +0300
@@ -677,7 +677,9 @@
 
 		if (inode->i_op->follow_link) {
 			mntget(next.mnt);
+			nd->flags |= LOOKUP_LINK_NOTLAST;
 			err = do_follow_link(next.dentry, nd);
+			nd->flags &= ~LOOKUP_LINK_NOTLAST;
 			dput(next.dentry);
 			mntput(next.mnt);
 			if (err)
@@ -723,7 +725,9 @@
 			if (err < 0)
 				break;
 		}
+		nd->flags |= LOOKUP_LAST;
 		err = do_lookup(nd, &this, &next);
+		nd->flags &= ~LOOKUP_LAST;
 		if (err)
 			break;
 		follow_mount(&next.mnt, &next.dentry);
@@ -1344,7 +1348,9 @@
 	dir = nd->dentry;
 	nd->flags &= ~LOOKUP_PARENT;
 	down(&dir->d_inode->i_sem);
+	nd->flags |= LOOKUP_LAST;
 	dentry = __lookup_hash(&nd->last, nd->dentry, nd);
+	nd->flags &= ~LOOKUP_LAST;
 
 do_last:
 	error = PTR_ERR(dentry);
@@ -1449,7 +1455,9 @@
 	}
 	dir = nd->dentry;
 	down(&dir->d_inode->i_sem);
+	nd->flags |= LOOKUP_LAST;
 	dentry = __lookup_hash(&nd->last, nd->dentry, nd);
+	nd->flags &= ~LOOKUP_LAST;
 	putname(nd->last.name);
 	goto do_last;
 }
Index: linux-2.6.6/include/linux/namei.h
===================================================================
--- linux-2.6.6.orig/include/linux/namei.h	2004-05-27 21:24:45.078907784 +0300
+++ linux-2.6.6/include/linux/namei.h	2004-05-27 22:47:58.870736032 +0300
@@ -70,6 +70,9 @@
 #define LOOKUP_CONTINUE		 4
 #define LOOKUP_PARENT		16
 #define LOOKUP_NOALT		32
+#define LOOKUP_LAST		64
+#define LOOKUP_LINK_NOTLAST	128
+
 /*
  * Intent data
  */

[-- Attachment #13: vfs-raw_ops-vanilla-2.6.patch --]
[-- Type: application/octet-stream, Size: 6446 bytes --]

 fs/namei.c            |   73 ++++++++++++++++++++++++++++++++++++++++++++------
 include/linux/namei.h |   17 +++++++++++
 2 files changed, 82 insertions(+), 8 deletions(-)

Index: linux-2.6.6/fs/namei.c
===================================================================
--- linux-2.6.6.orig/fs/namei.c	2004-06-02 17:01:51.115405512 +0300
+++ linux-2.6.6/fs/namei.c	2004-06-02 17:05:18.898817632 +0300
@@ -560,12 +560,14 @@
 	return 0;
 
 need_lookup:
+	nd->last = *name;
 	dentry = real_lookup(nd->dentry, name, nd);
 	if (IS_ERR(dentry))
 		goto fail;
 	goto done;
 
 need_revalidate:
+	nd->last = *name;
 	if (dentry->d_op->d_revalidate(dentry, nd))
 		goto done;
 	if (d_invalidate(dentry))
@@ -606,6 +608,7 @@
 		unsigned long hash;
 		struct qstr this;
 		unsigned int c;
+		int span_mount = 0;
 
 		err = exec_permission_lite(inode, nd);
 		if (err == -EAGAIN) { 
@@ -665,7 +668,8 @@
 		if (err)
 			break;
 		/* Check mountpoints.. */
-		follow_mount(&next.mnt, &next.dentry);
+		if (follow_mount(&next.mnt, &next.dentry))
+			span_mount = 1;
 
 		err = -ENOENT;
 		inode = next.dentry->d_inode;
@@ -693,6 +697,12 @@
 			dput(nd->dentry);
 			nd->mnt = next.mnt;
 			nd->dentry = next.dentry;
+			if (span_mount && next.dentry->d_op &&
+			    next.dentry->d_op->d_revalidate) {
+				nd->last = this;
+				next.dentry->d_op->d_revalidate(next.dentry, nd);
+				span_mount = 0;
+			}
 		}
 		err = -ENOTDIR; 
 		if (!inode->i_op->lookup)
@@ -1523,9 +1533,18 @@
 	if (IS_ERR(tmp))
 		return PTR_ERR(tmp);
 
-	error = path_lookup(tmp, LOOKUP_PARENT, &nd);
+	intent_init(&nd.intent, IT_MKNOD);
+	nd.intent.it_create_mode = mode;
+	nd.intent.it_create.dev = dev;
+
+	error = path_lookup_it(tmp, LOOKUP_PARENT, &nd);
 	if (error)
 		goto out;
+	if (nd.intent.it_flags & IT_STATUS_RAW) {
+		error = nd.intent.it_create.raw_status;
+		goto out2;
+	}
+
 	dentry = lookup_create(&nd, 0);
 	error = PTR_ERR(dentry);
 
@@ -1552,6 +1571,7 @@
 		dput(dentry);
 	}
 	up(&nd.dentry->d_inode->i_sem);
+out2:
 	path_release(&nd);
 out:
 	putname(tmp);
@@ -1594,9 +1614,15 @@
 		struct dentry *dentry;
 		struct nameidata nd;
 
-		error = path_lookup(tmp, LOOKUP_PARENT, &nd);
+		intent_init(&nd.intent, IT_MKDIR);
+		nd.intent.it_create_mode = mode;
+		error = path_lookup_it(tmp, LOOKUP_PARENT, &nd);
 		if (error)
 			goto out;
+		if (nd.intent.it_flags & IT_STATUS_RAW) {
+			error = nd.intent.it_create.raw_status;
+			goto out2;
+		}
 		dentry = lookup_create(&nd, 1);
 		error = PTR_ERR(dentry);
 		if (!IS_ERR(dentry)) {
@@ -1606,6 +1632,7 @@
 			dput(dentry);
 		}
 		up(&nd.dentry->d_inode->i_sem);
+out2:
 		path_release(&nd);
 out:
 		putname(tmp);
@@ -1691,9 +1718,14 @@
 	if(IS_ERR(name))
 		return PTR_ERR(name);
 
-	error = path_lookup(name, LOOKUP_PARENT, &nd);
+	intent_init(&nd.intent, IT_RMDIR);
+	error = path_lookup_it(name, LOOKUP_PARENT, &nd);
 	if (error)
 		goto exit;
+	if (nd.intent.it_flags & IT_STATUS_RAW) {
+		error = nd.intent.it_create.raw_status;
+		goto exit1;
+	}
 
 	switch(nd.last_type) {
 		case LAST_DOTDOT:
@@ -1769,9 +1801,15 @@
 	if(IS_ERR(name))
 		return PTR_ERR(name);
 
-	error = path_lookup(name, LOOKUP_PARENT, &nd);
+	intent_init(&nd.intent, IT_UNLINK);
+	error = path_lookup_it(name, LOOKUP_PARENT, &nd);
 	if (error)
 		goto exit;
+	if (nd.intent.it_flags & IT_STATUS_RAW) {
+		error = nd.intent.it_create.raw_status;
+		goto exit1;
+	}
+
 	error = -EISDIR;
 	if (nd.last_type != LAST_NORM)
 		goto exit1;
@@ -1843,9 +1881,15 @@
 		struct dentry *dentry;
 		struct nameidata nd;
 
-		error = path_lookup(to, LOOKUP_PARENT, &nd);
+		intent_init(&nd.intent, IT_SYMLINK);
+		nd.intent.it_create.link = from;
+		error = path_lookup_it(to, LOOKUP_PARENT, &nd);
 		if (error)
 			goto out;
+		if (nd.intent.it_flags & IT_STATUS_RAW) {
+			error = nd.intent.it_create.raw_status;
+			goto out2;
+		}
 		dentry = lookup_create(&nd, 0);
 		error = PTR_ERR(dentry);
 		if (!IS_ERR(dentry)) {
@@ -1853,6 +1897,7 @@
 			dput(dentry);
 		}
 		up(&nd.dentry->d_inode->i_sem);
+out2:
 		path_release(&nd);
 out:
 		putname(to);
@@ -1924,9 +1969,15 @@
 	error = __user_walk(oldname, 0, &old_nd);
 	if (error)
 		goto exit;
-	error = path_lookup(to, LOOKUP_PARENT, &nd);
+	intent_init(&nd.intent, IT_LINK);
+	nd.intent.it_create.source_nd = &old_nd;
+	error = path_lookup_it(to, LOOKUP_PARENT, &nd);
 	if (error)
 		goto out;
+	if (nd.intent.it_flags & IT_STATUS_RAW) {
+		error = nd.intent.it_create.raw_status;
+		goto out_release;
+	}
 	error = -EXDEV;
 	if (old_nd.mnt != nd.mnt)
 		goto out_release;
@@ -2107,9 +2158,15 @@
 	if (error)
 		goto exit;
 
-	error = path_lookup(newname, LOOKUP_PARENT, &newnd);
+	intent_init(&newnd.intent, IT_RENAME);
+	newnd.intent.it_create.source_nd = &oldnd;
+	error = path_lookup_it(newname, LOOKUP_PARENT, &newnd);
 	if (error)
 		goto exit1;
+	if (newnd.intent.it_flags & IT_STATUS_RAW) {
+		error = newnd.intent.it_create.raw_status;
+		goto exit2;
+	}
 
 	error = -EXDEV;
 	if (oldnd.mnt != newnd.mnt)
Index: linux-2.6.6/include/linux/namei.h
===================================================================
--- linux-2.6.6.orig/include/linux/namei.h	2004-06-02 17:01:51.091409160 +0300
+++ linux-2.6.6/include/linux/namei.h	2004-06-02 17:01:54.912828216 +0300
@@ -15,16 +15,33 @@
 #define IT_UNLINK	(1<<5)
 #define IT_TRUNC	(1<<6)
 #define IT_GETXATTR	(1<<7)
+#define IT_RMDIR	(1<<8)
+#define IT_LINK		(1<<9)
+#define IT_RENAME	(1<<10)
+#define IT_MKDIR	(1<<11)
+#define IT_MKNOD	(1<<12)
+#define IT_SYMLINK	(1<<13)
 
 #define INTENT_MAGIC 0x19620323
+#define IT_STATUS_RAW (1<<10)	/* Setting this in it_flags on exit from lookup
+				   means everything was done already and return
+				   value from lookup is in fact status of
+				   already performed operation */
 struct lookup_intent {
 	int     it_magic;
 	int     it_op;
 	void    (*it_op_release)(struct lookup_intent *);
 	int     it_flags;
 	int     it_create_mode;
+	union {
+		int	raw_status;	/* return value from raw method */
+		unsigned	dev;	/* For mknod */
+		char	*link;	/* For symlink */
+		struct nameidata *source_nd; /* For link/rename */
+	} it_create;
 };
 
+
 static inline void intent_init(struct lookup_intent *it, int op)
 {
 	memset(it, 0, sizeof(*it));
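
The IT_STATUS_RAW convention in the raw_ops patch above (the lookup performs
the whole operation and the caller returns the stored status instead of going
down the generic path) can be sketched in user space as follows. Everything
prefixed demo_ is a hypothetical stand-in invented for illustration, the
struct is trimmed to the fields the sketch needs, and none of this is the
kernel implementation.

```c
#include <errno.h>
#include <string.h>

#define INTENT_MAGIC  0x19620323
#define IT_MKDIR      (1 << 11)
#define IT_STATUS_RAW (1 << 10)	/* set in it_flags, not in it_op */

struct lookup_intent {
	int it_magic;
	int it_op;
	int it_flags;
	int it_create_mode;
	union {
		int raw_status;	/* return value from raw method */
	} it_create;
};

/* Hypothetical lookup callback type standing in for the path walk. */
typedef int (*demo_lookup_fn)(struct lookup_intent *it);

/* A lookup that executes the operation itself and reports its status. */
static int demo_raw_lookup(struct lookup_intent *it)
{
	it->it_flags |= IT_STATUS_RAW;
	it->it_create.raw_status = -EEXIST;
	return 0;
}

/* A lookup that only resolves the path and leaves the work to the caller. */
static int demo_plain_lookup(struct lookup_intent *it)
{
	(void)it;
	return 0;
}

/* Counts how often the generic (non-raw) path was taken. */
static int demo_generic_calls;

static int demo_generic_mkdir(int mode)
{
	(void)mode;
	demo_generic_calls++;
	return 0;
}

/* Mirrors the shape of the patched sys_mkdir: init, lookup, raw check. */
static int demo_sys_mkdir(demo_lookup_fn lookup, int mode)
{
	struct lookup_intent it;
	int error;

	memset(&it, 0, sizeof(it));
	it.it_magic = INTENT_MAGIC;
	it.it_op = IT_MKDIR;
	it.it_create_mode = mode;

	error = lookup(&it);
	if (error)
		return error;
	if (it.it_flags & IT_STATUS_RAW)
		return it.it_create.raw_status;	/* operation already done */
	return demo_generic_mkdir(mode);
}
```

With demo_raw_lookup the generic path is skipped and the stored status is
returned; with demo_plain_lookup the call falls through to
demo_generic_mkdir, which is the behaviour a filesystem that never sets the
flag would see.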

^ permalink raw reply	[flat|nested] 13+ messages in thread
* RE: [PATCH/RFC] Lustre VFS patch, version 2
@ 2004-06-03 15:53 Peter J. Braam
  2004-06-06 17:00 ` Christoph Hellwig
  0 siblings, 1 reply; 13+ messages in thread
From: Peter J. Braam @ 2004-06-03 15:53 UTC (permalink / raw)
  To: linux-kernel, torvalds, akpm
  Cc: Christoph Hellwig, axboe, kevcorry, arjanv, viro, anton,
	Trond Myklebust, Lars Marowsky-Bree

Hi,

Of course I am totally happy to include or not include the Lustre client
with it.  However, that does lead to a sizeable amount of (completely
modular) code, as it depends on the networking, lock manager, logical
volume driver and metadata and object storage clients and the management
framework.  It's 2M.

I'd like to also acknowledge that we should remove the small
incompatibility in the names of intents, to preserve API compatibility,
and add an inode method for intent execution.  Yes, the LUSTRE_INVALID
flag was discussed on IRC with Al Viro: he said that I probably did
need _something_ but that it's hairy, so it was coded not to affect
anyone who doesn't use that flag.

I have not worked on Coda for 5 years, and have nothing to say about it.
I have recently withdrawn InterMezzo to be helpful to the kernel
community.  Of course I would offer the same for Lustre.  But as I have
said before, this time there are a lot of resources to maintain this.

Perhaps it is useful to explain that vendors (Novell, Dell, HP and
others) have urged me to enquire whether the hooks could go into 2.6.
All of them have really major Lustre customers, running top-10
supercomputing clusters with Lustre.  Having the hooks avoids having to
patch vendor kernels, which breaks support arrangements.  As for our
position, it's in fact easier to wait and just collect clever insights
from time to time.

I represent them here.  I understand and would respect the "wait until
2.7" argument, but I think it is workable to get them into 2.6.  Is it
really a big deal to go through these small patches a few more times to
judge if they are safe, and to include them?  I think it would help
people who care and support Linux financially.  I only hear Christoph
arguing against it; are there other insights?

Again, many thanks for taking the time to study the patches; it has
already helped Lustre get better.

- Peter -


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2004-06-07 18:06 UTC | newest]

Thread overview: 13+ messages
2004-06-02 23:15 [PATCH/RFC] Lustre VFS patch, version 2 Peter J. Braam
2004-06-03 13:59 ` Christoph Hellwig
2004-06-03 14:19   ` Lars Marowsky-Bree
2004-06-03 14:26     ` Christoph Hellwig
2004-06-03 14:33     ` Christoph Hellwig
2004-06-03 14:49     ` Trond Myklebust
2004-06-03 18:10     ` Jan Harkes
2004-06-04  5:03     ` Daniel Phillips
2004-06-03 14:27 ` Christoph Hellwig
2004-06-04 16:55 ` Anton Blanchard
2004-06-07 18:02   ` Dipankar Sarma
  -- strict thread matches above, loose matches on Subject: below --
2004-06-03 15:53 Peter J. Braam
2004-06-06 17:00 ` Christoph Hellwig
