* nfsd-fh: found a name that I didn't expect
@ 2003-03-28 17:28 ` Bernd Schubert
  0 siblings, 0 replies; 27+ messages in thread
From: Bernd Schubert @ 2003-03-28 17:28 UTC (permalink / raw)
  To: nfs; +Cc: reiserfs-list

Hi,

Due to hardware problems I just started our fallback server and got these
messages for 2 files:

nfsd-fh: found a name that I didn't expect: bin/uptime
nfsd-fh: found a name that I didn't expect: bin/uptime
nfsd: last server has exited
nfsd: unexporting all filesystems
nfsd-fh: found a name that I didn't expect: lib/libident.so.0
nfsd-fh: found a name that I didn't expect: lib/libident.so.0

Well, just a short explanation of how our fallback solution works:
	The server exports '/' (hda5) via NFS to all clients and via nbd to one of
its clients (the fallback server). The nbd export is used for mirroring the
device via a cron job (by "dd'ing" the device).
The cron job script also executes a 'reiserfsck --fix-fixable' and afterwards
a 'reiserfsck --check', so apart from the partition checks, both devices
should be identical.
However, when I started the fallback server, it showed the messages for these
2 files and the clients got I/O errors for these files.

The solution was to stop the NFS server, copy the files, delete the old ones
and move the copies back to the old names (just as I'm used to doing when
this happens to directories served by ClusterNFS).

Any ideas how we can prevent this in the future ?


Thanks,
	Bernd

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: nfsd-fh: found a name that I didn't expect
  2003-03-28 17:28 ` Bernd Schubert
  (?)
@ 2003-03-29 11:01 ` Oleg Drokin
  2003-03-29 11:54   ` Bernd Schubert
  2003-03-29 12:04   ` Bernd Schubert
  -1 siblings, 2 replies; 27+ messages in thread
From: Oleg Drokin @ 2003-03-29 11:01 UTC (permalink / raw)
  To: Bernd Schubert; +Cc: nfs, reiserfs-list

Hello!

On Fri, Mar 28, 2003 at 06:28:55PM +0100, Bernd Schubert wrote:
> due to hardware problems I just started our fall back server and got these 
> messages for 2 files:
> nfsd-fh: found a name that I didn't expect: bin/uptime
> nfsd-fh: found a name that I didn't expect: bin/uptime
> nfsd: last server has exited
> nfsd: unexporting all filesystems
> nfsd-fh: found a name that I didn't expect: lib/libident.so.0
> nfsd-fh: found a name that I didn't expect: lib/libident.so.0

Hm. Is the message visible each time you access those files?
Has reiserfsck found anything?

> Any ideas how we can prevent this in the future ?

We do not yet understand how to reproduce that locally, because this message
indicates pretty strange conditions.

Bye,
    Oleg

* Re: nfsd-fh: found a name that I didn't expect
  2003-03-29 11:01 ` Oleg Drokin
@ 2003-03-29 11:54   ` Bernd Schubert
  2003-03-29 11:59     ` Oleg Drokin
  2003-03-29 12:04   ` Bernd Schubert
  1 sibling, 1 reply; 27+ messages in thread
From: Bernd Schubert @ 2003-03-29 11:54 UTC (permalink / raw)
  To: Oleg Drokin; +Cc: nfs, reiserfs-list

On Saturday 29 March 2003 12:01, you wrote:
> Hello!
>
> On Fri, Mar 28, 2003 at 06:28:55PM +0100, Bernd Schubert wrote:
> > due to hardware problems I just started our fall back server and got
> > these messages for 2 files:
> > nfsd-fh: found a name that I didn't expect: bin/uptime
> > nfsd-fh: found a name that I didn't expect: bin/uptime
> > nfsd: last server has exited
> > nfsd: unexporting all filesystems
> > nfsd-fh: found a name that I didn't expect: lib/libident.so.0
> > nfsd-fh: found a name that I didn't expect: lib/libident.so.0
>
> Hm. Does the message is visible each time you access those files?

Yes. Probably because uptime and libident.so.0 were called in an endless loop
from the clients, the NFS server's log was filled with those messages and I
had to stop nfsd and do the cp and mv procedure for those files.

> Does reiserfsck have found anything?

As I said, the backup script runs reiserfsck itself and found no problems. I
hope there is no corruption after a simple boot, but I will check this later
today.

>
> > Any ideas how we can prevent this in the future ?
>
> We do not yet understand on how to reproduce that locally, because this
> message indicates pretty strange conditions.

Is there anything I can do to debug this when I observe this phenomenon the
next time (e.g. enabling NFS debugging via proc, etc.)?

Thanks,
	Bernd

* Re: nfsd-fh: found a name that I didn't expect
  2003-03-29 11:54   ` Bernd Schubert
@ 2003-03-29 11:59     ` Oleg Drokin
  2003-03-29 14:09       ` Bernd Schubert
  0 siblings, 1 reply; 27+ messages in thread
From: Oleg Drokin @ 2003-03-29 11:59 UTC (permalink / raw)
  To: Bernd Schubert; +Cc: reiserfs-list

Hello!

On Sat, Mar 29, 2003 at 12:54:02PM +0100, Bernd Schubert wrote:
> > > due to hardware problems I just started our fall back server and got
> > > these messages for 2 files:
> > > nfsd-fh: found a name that I didn't expect: bin/uptime
> > > nfsd-fh: found a name that I didn't expect: bin/uptime
> > > nfsd: last server has exited
> > > nfsd: unexporting all filesystems
> > > nfsd-fh: found a name that I didn't expect: lib/libident.so.0
> > > nfsd-fh: found a name that I didn't expect: lib/libident.so.0
> > Hm. Does the message is visible each time you access those files?
> Yes, probably since uptime and libident.so.0 were called in and endless loop 
> from the clients, the nfs-servers log was filled with those messages and I 
> had to stop the nfsd and do the cp- and mv-procedure for those files.

Hm.

> > Does reiserfsck have found anything?
> As I said, the backup script runs a reiserfsck itself and found no problems, I 
> hope there is no corruption after a simple boot, but I will check this later 
> on this day.

Ok, let us know of the results.

> > We do not yet understand on how to reproduce that locally, because this
> > message indicates pretty strange conditions.
> Is there anything I can do to debug this, when I observe this phenomena the 
> next time  (e.g. enabling nfs-debugging via proc, etc) ?

Please try to reboot the server without changing the fs. If the problem still
persists, then take an FS metadata snapshot (debugreiserfs -p /dev/device |
bzip2 -9c >metadata.bz2; use recent reiserfsprogs please) and make it
available for us to download.

Thank you.

Bye,
    Oleg

* Re: nfsd-fh: found a name that I didn't expect
  2003-03-29 11:01 ` Oleg Drokin
  2003-03-29 11:54   ` Bernd Schubert
@ 2003-03-29 12:04   ` Bernd Schubert
  1 sibling, 0 replies; 27+ messages in thread
From: Bernd Schubert @ 2003-03-29 12:04 UTC (permalink / raw)
  To: Oleg Drokin; +Cc: nfs, reiserfs-list

>
> We do not yet understand on how to reproduce that locally, because this
> message indicates pretty strange conditions.

Oh, I just logged in from home and checked the logfiles and saw that
/usr/sbin/logrotate is affected as well. However, this time the clients don't
get I/O errors for this file; it's only the server that reports the problems.

Well, if you like, I could provide you a dd image of this partition, though
it's rather large (19GB).


Bernd

* Re: nfsd-fh: found a name that I didn't expect
  2003-03-29 11:59     ` Oleg Drokin
@ 2003-03-29 14:09       ` Bernd Schubert
  2003-03-29 14:13         ` Oleg Drokin
  0 siblings, 1 reply; 27+ messages in thread
From: Bernd Schubert @ 2003-03-29 14:09 UTC (permalink / raw)
  To: Oleg Drokin; +Cc: reiserfs-list

Hello,

> > > Does reiserfsck have found anything?
> >
> > As I said, the backup script runs a reiserfsck itself and found no
> > problems, I hope there is no corruption after a simple boot, but I will
> > check this later on this day.
>
> Ok, let us know of the results.

As I expected, 'reiserfsck --check' didn't find any problems.

>
> > > We do not yet understand on how to reproduce that locally, because this
> > > message indicates pretty strange conditions.
> >
> > Is there anything I can do to debug this, when I observe this phenomena
> > the next time  (e.g. enabling nfs-debugging via proc, etc) ?
>
> Please try to reboot the server not changing fs, if the problem still
> persists, then take FS metadata snapshot (debugreiserfs -p /dev/device |
> bzip2 -9c >metadata.bz2, use recent reiserfsprogs please) and make it
> available for us to download, please.
>

As noted in my other email, /usr/sbin/logrotate was also affected. I don't
like to say it, but after rebooting everything was fine, so those messages
don't appear any longer.
Does this mean memory problems again? We have even loaded the
ecc kernel module since last week, to be able to detect problems before the
filesystem is affected, but it didn't show any memory problem on this
machine.
Or could it be that we have 3GB RAM and the vanilla kernel is not suitable for
this? (I read something like this on the LKML, but can't remember the
details.)


Thanks,
	Bernd

* Re: nfsd-fh: found a name that I didn't expect
  2003-03-29 14:09       ` Bernd Schubert
@ 2003-03-29 14:13         ` Oleg Drokin
  2003-03-29 15:06           ` Bernd Schubert
  0 siblings, 1 reply; 27+ messages in thread
From: Oleg Drokin @ 2003-03-29 14:13 UTC (permalink / raw)
  To: Bernd Schubert; +Cc: reiserfs-list

Hello!

On Sat, Mar 29, 2003 at 03:09:23PM +0100, Bernd Schubert wrote:
> > Ok, let us know of the results.
> As I expected, 'reiserfsck --check' didn't find any problems.

Ok.

> As noted in my other e-email, /usr/sbin/logrotate was also affected. I don't 
> like it to say, but after rebooting everything was fine, so those messages 
> doesn't  appear any longer. 
> Does this mean memory problems again ? We even have loaded the 

Not likely.

> ecc-kernel-module since last week, to be able to problems befor the 
> filesystem is affected, but it didn't show any memory-problem on this 
> machine.
> Or could it be that we have 3GB RAM and the vanilla kernel is not suitable for 
> this (I read something like this in the LKML, but can't remember the 
> details).

The problem seems to have something to do with strange dentries appearing in
the dentry cache; also, you have verified that this is not because of an fs
error.
What kernel are you running?
I think a race in iget4 might cause this, but I am not sure.
We have a patch for this iget4 race, and if you are willing to test it, I can
send it to you.

Bye,
    Oleg

* Re: nfsd-fh: found a name that I didn't expect
  2003-03-29 14:13         ` Oleg Drokin
@ 2003-03-29 15:06           ` Bernd Schubert
  2003-03-29 17:37             ` Oleg Drokin
  0 siblings, 1 reply; 27+ messages in thread
From: Bernd Schubert @ 2003-03-29 15:06 UTC (permalink / raw)
  To: Oleg Drokin; +Cc: reiserfs-list

Hello,

> > As noted in my other e-email, /usr/sbin/logrotate was also affected. I
> > don't like it to say, but after rebooting everything was fine, so those
> > messages doesn't  appear any longer.
> > Does this mean memory problems again ? We even have loaded the
>
> Not likely.

The first good message this day! :) I was really worried again.

>
> The problems seems to have something to do with strange dentries appearing
> in dentry cache, also you have verified that this is not because of fs
> error.
> What kernel are you running?

2.4.20 with the ptrace patch.

> I think may be race with iget4 might cause this, but I am not sure.
> We have a patch for this iget4 race, and if you are willing to test it, I
> can send it to you.

Yes, of course I am. I hope the other admins won't kill me, but I guess they
know that I am willing to do experiments from time to time ;-)
Could something worse happen when I try the patch, something that could not
be fixed by simply rebooting?

The problem is to detect whether this is really caused by this race. Since I
still don't know a way to trigger this reliably, and even a reboot can fix it,
it is difficult to find out whether the patch really helps.
Do you see a way to enable your patch via a proc interface? Then I could
reboot the server and run 'find /' on the client in a loop until the problem
occurs. When I see the problem, I could enable the patch via proc and check
if it works.
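
A rough way to generate heavily parallel access to one file, rather than a
plain 'find /' loop, is sketched below. It is only an assumption that
concurrent lookups of the same file are what triggers the message, and
hammer_path(), hammer_worker() and HAMMER_MAX are made-up names for this
illustration, not part of any NFS tooling. Several threads call stat() on the
same path over and over and the failed calls are counted.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/stat.h>

#define HAMMER_MAX 32	/* arbitrary cap on threads for this sketch */

struct hammer_args {
	const char *path;	/* file to look up, e.g. on an NFS mount */
	int rounds;		/* stat() calls per thread */
	int failures;		/* stat() calls that returned an error */
};

/* Each thread looks up the same path repeatedly and counts errors. */
static void *hammer_worker(void *arg)
{
	struct hammer_args *a = arg;
	struct stat st;
	int i;

	assert(a && a->path);
	for (i = 0; i < a->rounds; i++)
		if (stat(a->path, &st) != 0)
			a->failures++;
	return NULL;
}

/*
 * Run nthreads parallel workers against path.
 * Returns the total number of failed stat() calls, or -1 on bad
 * arguments or if a thread could not be started.
 */
int hammer_path(const char *path, int nthreads, int rounds)
{
	pthread_t tid[HAMMER_MAX];
	struct hammer_args args[HAMMER_MAX];
	int i, started = 0, total = 0;

	if (!path || nthreads < 1 || nthreads > HAMMER_MAX || rounds < 0)
		return -1;

	for (i = 0; i < nthreads; i++) {
		args[i].path = path;
		args[i].rounds = rounds;
		args[i].failures = 0;
		if (pthread_create(&tid[i], NULL, hammer_worker, &args[i]) != 0)
			break;
		started++;
	}
	/* Wait for every started thread, then add up its private counter. */
	for (i = 0; i < started; i++) {
		pthread_join(tid[i], NULL);
		total += args[i].failures;
	}
	return started == nthreads ? total : -1;
}
```

One way to use it would be to call hammer_path("/mnt/nfs/bin/uptime", 16,
100000) from a small main() on a client (the mount point is a placeholder)
while watching the server log. Client-side attribute caching may satisfy many
of these calls without contacting the server, so this may well reproduce
nothing.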

Update: While I was writing and thinking about how to trigger it, I just saw
the problem again for /usr/bin/uptime and another file, and again a reboot
fixed it for those files.


Thanks a lot for your help,
	Bernd


* Re: nfsd-fh: found a name that I didn't expect
  2003-03-29 15:06           ` Bernd Schubert
@ 2003-03-29 17:37             ` Oleg Drokin
  2003-03-29 18:22               ` Bernd Schubert
                                 ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Oleg Drokin @ 2003-03-29 17:37 UTC (permalink / raw)
  To: Bernd Schubert; +Cc: reiserfs-list

[-- Attachment #1: Type: text/plain, Size: 1274 bytes --]

Hello!

On Sat, Mar 29, 2003 at 04:06:54PM +0100, Bernd Schubert wrote:
> > The problems seems to have something to do with strange dentries appearing
> > in dentry cache, also you have verified that this is not because of fs
> > error.
> > What kernel are you running?
> 2.4.20 with the ptrace patch.

Ok.

> > I think may be race with iget4 might cause this, but I am not sure.
> > We have a patch for this iget4 race, and if you are willing to test it, I
> > can send it to you.
> Yes of course I am. I hope the other admins won't kill me, but I guess they 
> know that I willing to do experiments from time to time ;-)
> Could something worse happen when I try the patch, that could not be fixed by 
> simple rebooting ?

See the patch below.
It should not break anything.

> The problems is to detect if this is really caused by this race. Since I still 

Well, the race is of the type where several parallel processes try to access
a file whose inode is not in memory.
This resembles what you describe, with lots of clients accessing the
same file over NFS at the same time.
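
The general shape of this kind of race, and of the "return the new object
locked, let the caller fill it in, then unlock" approach, can be sketched in
userspace. This is a simplified model under stated assumptions, not the
kernel code: model_iget(), model_race(), MODEL_NEW and the other names are
hypothetical, a pthread mutex and condition variable stand in for the inode
lock and wait queue, and the allocation happens under the lock for brevity.
The point it illustrates is that the identity (key) is set before the lock is
dropped, so parallel lookups find the entry and wait instead of creating a
second one.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define MODEL_NEW 1		/* plays the role of I_NEW: not filled in yet */
#define MODEL_MAX_THREADS 64	/* arbitrary cap for this sketch */

struct model_inode {
	unsigned long key;	/* identity, set while the cache lock is held */
	int state;		/* MODEL_NEW until the creator has filled it in */
	int fill_count;		/* how often the simulated disk read ran */
	struct model_inode *next;
};

static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cache_wait = PTHREAD_COND_INITIALIZER;
static struct model_inode *cache_head;

/* Caller must hold cache_lock. */
static struct model_inode *model_find(unsigned long key)
{
	struct model_inode *p;

	for (p = cache_head; p; p = p->next)
		if (p->key == key)
			return p;
	return NULL;
}

/* Look up key; create, fill and publish it if it is not cached yet. */
struct model_inode *model_iget(unsigned long key)
{
	struct model_inode *inode;

	pthread_mutex_lock(&cache_lock);
	inode = model_find(key);
	if (inode) {
		/* Someone else may still be filling it in: wait for that. */
		while (inode->state & MODEL_NEW)
			pthread_cond_wait(&cache_wait, &cache_lock);
		pthread_mutex_unlock(&cache_lock);
		return inode;
	}
	inode = calloc(1, sizeof(*inode));
	if (!inode) {
		pthread_mutex_unlock(&cache_lock);
		return NULL;
	}
	inode->key = key;		/* identity fixed before unlocking */
	inode->state = MODEL_NEW;
	inode->next = cache_head;
	cache_head = inode;
	pthread_mutex_unlock(&cache_lock);

	inode->fill_count++;		/* stands in for reading from disk */

	pthread_mutex_lock(&cache_lock);
	inode->state &= ~MODEL_NEW;	/* comparable to unlock_new_inode() */
	pthread_cond_broadcast(&cache_wait);
	pthread_mutex_unlock(&cache_lock);
	return inode;
}

static void *model_worker(void *arg)
{
	return model_iget(*(unsigned long *)arg);
}

/*
 * Start nthreads parallel lookups of key. Returns how many times the
 * entry was filled in (1 if the pattern works), or -1 on error.
 */
int model_race(unsigned long key, int nthreads)
{
	pthread_t tid[MODEL_MAX_THREADS];
	struct model_inode *first = NULL;
	int i, started = 0;

	if (nthreads < 1 || nthreads > MODEL_MAX_THREADS)
		return -1;
	for (i = 0; i < nthreads; i++) {
		if (pthread_create(&tid[i], NULL, model_worker, &key) != 0)
			break;
		started++;
	}
	for (i = 0; i < started; i++) {
		void *res;

		pthread_join(tid[i], &res);
		if (!first)
			first = res;
		assert(res == first);	/* every thread got the same entry */
	}
	return first ? first->fill_count : -1;
}
```

In this model, model_race(42UL, 8) should report a single fill no matter how
the threads interleave, because the entry is published with its key before
the lock is released.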

> Update: While I was writing and thinking about how to trigger it, I just again 
> saw the problem for /usr/bin/uptime and another file and again a reboot could 
> fix it for those files.

Bye,
    Oleg

[-- Attachment #2: iget5_locked_2.4.20.diff --]
[-- Type: text/plain, Size: 34608 bytes --]

===== Documentation/filesystems/Locking 1.5 vs edited =====
--- 1.5/Documentation/filesystems/Locking	Thu Sep  5 00:14:54 2002
+++ edited/Documentation/filesystems/Locking	Fri Mar 21 15:23:44 2003
@@ -112,7 +112,7 @@
 remount_fs:	yes	yes	maybe		(see below)
 umount_begin:	yes	no	maybe		(see below)
 
-->read_inode() is not a method - it's a callback used in iget()/iget4().
+->read_inode() is not a method - it's a callback used in iget().
 rules for mount_sem are not too nice - it is going to die and be replaced
 by better scheme anyway.
 
===== fs/Makefile 1.16 vs edited =====
--- 1.16/fs/Makefile	Thu Sep 12 04:00:00 2002
+++ edited/fs/Makefile	Fri Mar 21 15:23:46 2003
@@ -7,7 +7,7 @@
 
 O_TARGET := fs.o
 
-export-objs :=	filesystems.o open.o dcache.o buffer.o
+export-objs :=	filesystems.o open.o dcache.o buffer.o inode.o
 mod-subdirs :=	nls
 
 obj-y :=	open.o read_write.o devices.o file_table.o buffer.o \
===== fs/inode.c 1.35 vs edited =====
--- 1.35/fs/inode.c	Sat Mar 16 01:06:57 2002
+++ edited/fs/inode.c	Fri Mar 21 16:20:29 2003
@@ -17,6 +17,7 @@
 #include <linux/swapctl.h>
 #include <linux/prefetch.h>
 #include <linux/locks.h>
+#include <linux/module.h>
 
 /*
  * New inode.c implementation.
@@ -643,7 +644,7 @@
 	invalidate_buffers(dev);
 	return res;
 }
-
+EXPORT_SYMBOL(unlock_new_inode);
 
 /*
  * This is called with the inode lock held. It searches
@@ -734,7 +735,32 @@
  * by hand after calling find_inode now! This simplifies iunique and won't
  * add any additional branch in the common code.
  */
-static struct inode * find_inode(struct super_block * sb, unsigned long ino, struct list_head *head, find_inode_t find_actor, void *opaque)
+static struct inode * find_inode(struct super_block * sb, struct list_head *head, int (*test)(struct inode *, void *), void *data)
+{
+	struct list_head *tmp;
+	struct inode * inode;
+
+	tmp = head;
+	for (;;) {
+		tmp = tmp->next;
+		inode = NULL;
+		if (tmp == head)
+			break;
+		inode = list_entry(tmp, struct inode, i_hash);
+		if (inode->i_sb != sb)
+			continue;
+		if (!test(inode, data))
+			continue;
+		break;
+	}
+	return inode;
+}
+
+/*
+ * find_inode_fast is the fast path version of find_inode, see the comment at
+ * iget_locked for details.
+ */
+static struct inode * find_inode_fast(struct super_block * sb, struct list_head *head, unsigned long ino)
 {
 	struct list_head *tmp;
 	struct inode * inode;
@@ -750,8 +776,6 @@
 			continue;
 		if (inode->i_sb != sb)
 			continue;
-		if (find_actor && !find_actor(inode, ino, opaque))
-			continue;
 		break;
 	}
 	return inode;
@@ -827,13 +851,28 @@
 	return inode;
 }
 
+void unlock_new_inode(struct inode *inode)
+{
+	/*
+	 * This is special!  We do not need the spinlock
+	 * when clearing I_LOCK, because we're guaranteed
+	 * that nobody else tries to do anything about the
+	 * state of the inode when it is locked, as we
+	 * just created it (so there can be no old holders
+	 * that haven't tested I_LOCK).
+	 */
+	inode->i_state &= ~(I_LOCK|I_NEW);
+	wake_up(&inode->i_wait);
+}
+
+
 /*
  * This is called without the inode lock held.. Be careful.
  *
  * We no longer cache the sb_flags in i_flags - see fs.h
  *	-- rmk@arm.uk.linux.org
  */
-static struct inode * get_new_inode(struct super_block *sb, unsigned long ino, struct list_head *head, find_inode_t find_actor, void *opaque)
+static struct inode * get_new_inode(struct super_block *sb, struct list_head *head, int (*test)(struct inode *, void *), int (*set)(struct inode *, void *), void *data)
 {
 	struct inode * inode;
 
@@ -843,44 +882,27 @@
 
 		spin_lock(&inode_lock);
 		/* We released the lock, so.. */
-		old = find_inode(sb, ino, head, find_actor, opaque);
+		old = find_inode(sb, head, test, data);
 		if (!old) {
+			if (set(inode, data))
+				goto set_failed;
+
 			inodes_stat.nr_inodes++;
 			list_add(&inode->i_list, &inode_in_use);
 			list_add(&inode->i_hash, head);
 			inode->i_sb = sb;
 			inode->i_dev = sb->s_dev;
 			inode->i_blkbits = sb->s_blocksize_bits;
-			inode->i_ino = ino;
 			inode->i_flags = 0;
 			atomic_set(&inode->i_count, 1);
-			inode->i_state = I_LOCK;
+			inode->i_state = I_LOCK|I_NEW;
 			spin_unlock(&inode_lock);
 
 			clean_inode(inode);
 
-			/* reiserfs specific hack right here.  We don't
-			** want this to last, and are looking for VFS changes
-			** that will allow us to get rid of it.
-			** -- mason@suse.com 
-			*/
-			if (sb->s_op->read_inode2) {
-				sb->s_op->read_inode2(inode, opaque) ;
-			} else {
-				sb->s_op->read_inode(inode);
-			}
-
-			/*
-			 * This is special!  We do not need the spinlock
-			 * when clearing I_LOCK, because we're guaranteed
-			 * that nobody else tries to do anything about the
-			 * state of the inode when it is locked, as we
-			 * just created it (so there can be no old holders
-			 * that haven't tested I_LOCK).
+			/* Return the locked inode with I_NEW set, the
+			 * caller is responsible for filling in the contents
 			 */
-			inode->i_state &= ~I_LOCK;
-			wake_up(&inode->i_wait);
-
 			return inode;
 		}
 
@@ -896,11 +918,53 @@
 		wait_on_inode(inode);
 	}
 	return inode;
+
+set_failed:
+	spin_unlock(&inode_lock);
+	destroy_inode(inode);
+	return NULL;
+}
+
+/*
+ * get_new_inode_fast is the fast path version of get_new_inode, see the
+ * comment at iget_locked for details.
+ */
+static struct inode * get_new_inode_fast(struct super_block *sb, struct list_head *head, unsigned long ino)
+{
+	struct inode * inode;
+
+	inode = alloc_inode();
+	if (inode) {
+		struct inode * old;
+
+		spin_lock(&inode_lock);
+		/* We released the lock, so.. */
+		old = find_inode_fast(sb, head, ino);
+		if (!old) {
+			inode->i_ino = ino;
+			inodes_stat.nr_inodes++;
+			list_add(&inode->i_list, &inode_in_use);
+			list_add(&inode->i_hash, head);
+			inode->i_sb = sb;
+			inode->i_dev = sb->s_dev;
+			inode->i_blkbits = sb->s_blocksize_bits;
+			inode->i_flags = 0;
+			inode->i_state = I_LOCK|I_NEW;
+			spin_unlock(&inode_lock);
+
+			clean_inode(inode);
+			/* Return the locked inode with I_NEW set, the
+			 * caller is responsible for filling in the contents
+			 */
+			return inode;
+		}
+	}
+	return inode;
 }
 
-static inline unsigned long hash(struct super_block *sb, unsigned long i_ino)
+static inline unsigned long hash(struct super_block *sb, unsigned long hashval)
 {
-	unsigned long tmp = i_ino + ((unsigned long) sb / L1_CACHE_BYTES);
+	unsigned long tmp = hashval + ((unsigned long) sb / L1_CACHE_BYTES);
 	tmp = tmp + (tmp >> I_HASHBITS);
 	return tmp & I_HASHMASK;
 }
@@ -932,7 +996,8 @@
 retry:
 	if (counter > max_reserved) {
 		head = inode_hashtable + hash(sb,counter);
-		inode = find_inode(sb, res = counter++, head, NULL, NULL);
+		res = counter++;
+		inode = find_inode_fast(sb, head, res);
 		if (!inode) {
 			spin_unlock(&inode_lock);
 			return res;
@@ -960,14 +1025,63 @@
 	return inode;
 }
 
+/**
+ * ifind - internal function, you want ilookup5() or iget5().
+ * @sb:		super block of file system to search
+ * @hashval:	hash value (usually inode number) to search for
+ * @test:	callback used for comparisons between inodes
+ * @data:	opaque data pointer to pass to @test
+ *
+ * ifind() searches for the inode specified by @hashval and @data in the inode
+ * cache. This is a generalized version of ifind_fast() for file systems where
+ * the inode number is not sufficient for unique identification of an inode.
+ *
+ * If the inode is in the cache, the inode is returned with an incremented
+ * reference count.
+ *
+ * Otherwise NULL is returned.
+ *
+ * Note, @test is called with the inode_lock held, so can't sleep.
+ */
+static inline struct inode *ifind(struct super_block *sb,
+		struct list_head *head, int (*test)(struct inode *, void *),
+		void *data)
+{
+	struct inode *inode;
+
+	spin_lock(&inode_lock);
+	inode = find_inode(sb, head, test, data);
+	if (inode) {
+		__iget(inode);
+		spin_unlock(&inode_lock);
+		wait_on_inode(inode);
+		return inode;
+	}
+	spin_unlock(&inode_lock);
+	return NULL;
+}
 
-struct inode *iget4(struct super_block *sb, unsigned long ino, find_inode_t find_actor, void *opaque)
+/**
+ * ifind_fast - internal function, you want ilookup() or iget().
+ * @sb:		super block of file system to search
+ * @ino:	inode number to search for
+ *
+ * ifind_fast() searches for the inode @ino in the inode cache. This is for
+ * file systems where the inode number is sufficient for unique identification
+ * of an inode.
+ *
+ * If the inode is in the cache, the inode is returned with an incremented
+ * reference count.
+ *
+ * Otherwise NULL is returned.
+ */
+static inline struct inode *ifind_fast(struct super_block *sb,
+		struct list_head *head, unsigned long ino)
 {
-	struct list_head * head = inode_hashtable + hash(sb,ino);
-	struct inode * inode;
+	struct inode *inode;
 
 	spin_lock(&inode_lock);
-	inode = find_inode(sb, ino, head, find_actor, opaque);
+	inode = find_inode_fast(sb, head, ino);
 	if (inode) {
 		__iget(inode);
 		spin_unlock(&inode_lock);
@@ -975,27 +1089,147 @@
 		return inode;
 	}
 	spin_unlock(&inode_lock);
+	return NULL;
+}
 
+/**
+ * ilookup5 - search for an inode in the inode cache
+ * @sb:		super block of file system to search
+ * @hashval:	hash value (usually inode number) to search for
+ * @test:	callback used for comparisons between inodes
+ * @data:	opaque data pointer to pass to @test
+ *
+ * ilookup5() uses ifind() to search for the inode specified by @hashval and
+ * @data in the inode cache. This is a generalized version of ilookup() for
+ * file systems where the inode number is not sufficient for unique
+ * identification of an inode.
+ *
+ * If the inode is in the cache, the inode is returned with an incremented
+ * reference count.
+ *
+ * Otherwise NULL is returned.
+ *
+ * Note, @test is called with the inode_lock held, so can't sleep.
+ */
+struct inode *ilookup5(struct super_block *sb, unsigned long hashval,
+		int (*test)(struct inode *, void *), void *data)
+{
+	struct list_head *head = inode_hashtable + hash(sb, hashval);
+
+	return ifind(sb, head, test, data);
+}
+EXPORT_SYMBOL(ilookup5);
+
+/**
+ * ilookup - search for an inode in the inode cache
+ * @sb:		super block of file system to search
+ * @ino:	inode number to search for
+ *
+ * ilookup() uses ifind_fast() to search for the inode @ino in the inode cache.
+ * This is for file systems where the inode number is sufficient for unique
+ * identification of an inode.
+ *
+ * If the inode is in the cache, the inode is returned with an incremented
+ * reference count.
+ *
+ * Otherwise NULL is returned.
+ */
+struct inode *ilookup(struct super_block *sb, unsigned long ino)
+{
+	struct list_head *head = inode_hashtable + hash(sb, ino);
+
+	return ifind_fast(sb, head, ino);
+}
+EXPORT_SYMBOL(ilookup);
+
+/**
+ * iget5_locked - obtain an inode from a mounted file system
+ * @sb:		super block of file system
+ * @hashval:	hash value (usually inode number) to get
+ * @test:	callback used for comparisons between inodes
+ * @set:	callback used to initialize a new struct inode
+ * @data:	opaque data pointer to pass to @test and @set
+ *
+ * This is iget() without the read_inode() portion of get_new_inode().
+ *
+ * iget5_locked() uses ifind() to search for the inode specified by @hashval
+ * and @data in the inode cache and if present it is returned with an increased
+ * reference count. This is a generalized version of iget_locked() for file
+ * systems where the inode number is not sufficient for unique identification
+ * of an inode.
+ *
+ * If the inode is not in cache, get_new_inode() is called to allocate a new
+ * inode and this is returned locked, hashed, and with the I_NEW flag set. The
+ * file system gets to fill it in before unlocking it via unlock_new_inode().
+ *
+ * Note both @test and @set are called with the inode_lock held, so can't sleep.
+ */
+struct inode *iget5_locked(struct super_block *sb, unsigned long hashval,
+		int (*test)(struct inode *, void *),
+		int (*set)(struct inode *, void *), void *data)
+{
+	struct list_head *head = inode_hashtable + hash(sb, hashval);
+	struct inode *inode;
+
+	inode = ifind(sb, head, test, data);
+	if (inode)
+		return inode;
 	/*
 	 * get_new_inode() will do the right thing, re-trying the search
 	 * in case it had to block at any point.
 	 */
-	return get_new_inode(sb, ino, head, find_actor, opaque);
+	return get_new_inode(sb, head, test, set, data);
+}
+EXPORT_SYMBOL(iget5_locked);
+
+/**
+ * iget_locked - obtain an inode from a mounted file system
+ * @sb:		super block of file system
+ * @ino:	inode number to get
+ *
+ * This is iget() without the read_inode() portion of get_new_inode_fast().
+ *
+ * iget_locked() uses ifind_fast() to search for the inode specified by @ino in
+ * the inode cache and if present it is returned with an increased reference
+ * count. This is for file systems where the inode number is sufficient for
+ * unique identification of an inode.
+ *
+ * If the inode is not in cache, get_new_inode_fast() is called to allocate a
+ * new inode and this is returned locked, hashed, and with the I_NEW flag set.
+ * The file system gets to fill it in before unlocking it via
+ * unlock_new_inode().
+ */
+struct inode *iget_locked(struct super_block *sb, unsigned long ino)
+{
+	struct list_head *head = inode_hashtable + hash(sb, ino);
+	struct inode *inode;
+
+	inode = ifind_fast(sb, head, ino);
+	if (inode)
+		return inode;
+	/*
+	 * get_new_inode_fast() will do the right thing, re-trying the search
+	 * in case it had to block at any point.
+	 */
+	return get_new_inode_fast(sb, head, ino);
 }
+EXPORT_SYMBOL(iget_locked);
 
 /**
- *	insert_inode_hash - hash an inode
+ *	__insert_inode_hash - hash an inode
  *	@inode: unhashed inode
+ *	@hashval: unsigned long value used to locate this object in the
+ *		inode_hashtable.
  *
  *	Add an inode to the inode hash for this superblock. If the inode
  *	has no superblock it is added to a separate anonymous chain.
  */
  
-void insert_inode_hash(struct inode *inode)
+void __insert_inode_hash(struct inode *inode, unsigned long hashval)
 {
 	struct list_head *head = &anon_hash_chain;
 	if (inode->i_sb)
-		head = inode_hashtable + hash(inode->i_sb, inode->i_ino);
+		head = inode_hashtable + hash(inode->i_sb, hashval);
 	spin_lock(&inode_lock);
 	list_add(&inode->i_hash, head);
 	spin_unlock(&inode_lock);
===== fs/coda/cnode.c 1.8 vs edited =====
--- 1.8/fs/coda/cnode.c	Wed May 29 19:20:33 2002
+++ edited/fs/coda/cnode.c	Fri Mar 21 15:23:49 2003
@@ -27,11 +27,6 @@
 	return 1;
 }
 
-static int coda_inocmp(struct inode *inode, unsigned long ino, void *opaque)
-{
-	return (coda_fideq((ViceFid *)opaque, &(ITOC(inode)->c_fid)));
-}
-
 static struct inode_operations coda_symlink_inode_operations = {
 	readlink:	page_readlink,
 	follow_link:	page_follow_link,
@@ -62,29 +57,46 @@
                 init_special_inode(inode, inode->i_mode, attr->va_rdev);
 }
 
+static int coda_test_inode(struct inode *inode, void *data)
+{
+	ViceFid *fid = (ViceFid *)data;
+	return coda_fideq(&(ITOC(inode)->c_fid), fid);
+}
+
+static int coda_set_inode(struct inode *inode, void *data)
+{
+	ViceFid *fid = (ViceFid *)data;
+	ITOC(inode)->c_fid = *fid;
+	return 0;
+}
+
+static int coda_fail_inode(struct inode *inode, void *data)
+{
+	return -1;
+}
+
 struct inode * coda_iget(struct super_block * sb, ViceFid * fid,
 			 struct coda_vattr * attr)
 {
 	struct inode *inode;
 	struct coda_inode_info *cii;
-	ino_t ino = coda_f2i(fid);
 	struct coda_sb_info *sbi = coda_sbp(sb);
+	unsigned long hash = coda_f2i(fid);
 
-	down(&sbi->sbi_iget4_mutex);
-	inode = iget4(sb, ino, coda_inocmp, fid);
+	inode = iget5_locked(sb, hash, coda_test_inode, coda_set_inode, fid);
 
 	if ( !inode ) { 
-		CDEBUG(D_CNODE, "coda_iget: no inode\n");
-		up(&sbi->sbi_iget4_mutex);
 		return ERR_PTR(-ENOMEM);
 	}
 
-	/* check if the inode is already initialized */
-	cii = ITOC(inode);
-	if (coda_isnullfid(&cii->c_fid))
-		/* new, empty inode found... initializing */
-		cii->c_fid = *fid;
-	up(&sbi->sbi_iget4_mutex);
+	if (inode->i_state & I_NEW) {
+		cii = ITOC(inode);
+		/* we still need to set i_ino for things like stat(2) */
+		inode->i_ino = hash;
+		list_add(&cii->c_cilist, &sbi->sbi_cihead);
+		unlock_new_inode(inode);
+	}
+
 
 	/* always replace the attributes, type might have changed */
 	coda_fill_inode(inode, attr);
@@ -129,6 +141,7 @@
 		      struct ViceFid *newfid)
 {
 	struct coda_inode_info *cii;
+	unsigned long hash = coda_f2i(newfid);
 	
 	cii = ITOC(inode);
 
@@ -139,17 +152,16 @@
 	/* XXX we probably need to hold some lock here! */
 	remove_inode_hash(inode);
 	cii->c_fid = *newfid;
-	inode->i_ino = coda_f2i(newfid);
-	insert_inode_hash(inode);
+	inode->i_ino = hash;
+	__insert_inode_hash(inode, hash);
 }
 
 /* convert a fid to an inode. */
 struct inode *coda_fid_to_inode(ViceFid *fid, struct super_block *sb) 
 {
-	ino_t nr;
+	
 	struct inode *inode;
-	struct coda_inode_info *cii;
-	struct coda_sb_info *sbi;
+	unsigned long hash = coda_f2i(fid);
 
 	if ( !sb ) {
 		printk("coda_fid_to_inode: no sb!\n");
@@ -158,47 +170,29 @@
 
 	CDEBUG(D_INODE, "%s\n", coda_f2s(fid));
 
-	sbi = coda_sbp(sb);
-	nr = coda_f2i(fid);
-	down(&sbi->sbi_iget4_mutex);
-	inode = iget4(sb, nr, coda_inocmp, fid);
-	if ( !inode ) {
-		printk("coda_fid_to_inode: null from iget, sb %p, nr %ld.\n",
-		       sb, (long)nr);
-		goto out_unlock;
-	}
-
-	cii = ITOC(inode);
+	inode = iget5_locked(sb, hash, coda_test_inode, coda_fail_inode, fid);
+	if ( !inode )
+		return NULL;
 
-	/* The inode could already be purged due to memory pressure */
-	if (coda_isnullfid(&cii->c_fid)) {
-		inode->i_nlink = 0;
-		iput(inode);
-		goto out_unlock;
-	}
+	/* we should never see newly created inodes because we intentionally
+	 * fail in the initialization callback */
+	BUG_ON(inode->i_state & I_NEW);
 
         CDEBUG(D_INODE, "found %ld\n", inode->i_ino);
-	up(&sbi->sbi_iget4_mutex);
 	return inode;
-
-out_unlock:
-	up(&sbi->sbi_iget4_mutex);
-	return NULL;
 }
 
 /* the CONTROL inode is made without asking attributes from Venus */
 int coda_cnode_makectl(struct inode **inode, struct super_block *sb)
 {
-	int error = 0;
+	int error = -ENOMEM;
 
 	*inode = iget(sb, CTL_INO);
-	if ( *inode ) {
+	if (*inode) {
 		(*inode)->i_op = &coda_ioctl_inode_operations;
 		(*inode)->i_fop = &coda_ioctl_operations;
 		(*inode)->i_mode = 0444;
 		error = 0;
-	} else { 
-		error = -ENOMEM;
 	}
     
 	return error;
===== fs/coda/inode.c 1.9 vs edited =====
--- 1.9/fs/coda/inode.c	Wed May 29 19:17:41 2002
+++ edited/fs/coda/inode.c	Fri Mar 21 15:23:51 2003
@@ -34,7 +34,6 @@
 
 /* VFS super_block ops */
 static struct super_block *coda_read_super(struct super_block *, void *, int);
-static void coda_read_inode(struct inode *);
 static void coda_clear_inode(struct inode *);
 static void coda_put_super(struct super_block *);
 static int coda_statfs(struct super_block *sb, struct statfs *buf);
@@ -42,7 +41,6 @@
 /* exported operations */
 struct super_operations coda_super_operations =
 {
-	read_inode:	coda_read_inode,
 	clear_inode:	coda_clear_inode,
 	put_super:	coda_put_super,
 	statfs:		coda_statfs,
@@ -179,24 +177,6 @@
 
 	printk("Coda: Bye bye.\n");
 	kfree(sbi);
-}
-
-/* all filling in of inodes postponed until lookup */
-static void coda_read_inode(struct inode *inode)
-{
-	struct coda_sb_info *sbi = coda_sbp(inode->i_sb);
-	struct coda_inode_info *cii;
-
-        if (!sbi) BUG();
-
-	cii = ITOC(inode);
-	if (!coda_isnullfid(&cii->c_fid)) {
-            printk("coda_read_inode: initialized inode");
-            return;
-        }
-
-	cii->c_mapcount = 0;
-	list_add(&cii->c_cilist, &sbi->sbi_cihead);
 }
 
 static void coda_clear_inode(struct inode *inode)
===== fs/nfs/inode.c 1.18 vs edited =====
--- 1.18/fs/nfs/inode.c	Thu Aug 15 05:05:32 2002
+++ edited/fs/nfs/inode.c	Fri Mar 21 15:23:51 2003
@@ -45,7 +45,6 @@
 void nfs_zap_caches(struct inode *);
 static void nfs_invalidate_inode(struct inode *);
 
-static void nfs_read_inode(struct inode *);
 static void nfs_write_inode(struct inode *,int);
 static void nfs_delete_inode(struct inode *);
 static void nfs_put_super(struct super_block *);
@@ -55,7 +54,6 @@
 static int  nfs_show_options(struct seq_file *, struct vfsmount *);
 
 static struct super_operations nfs_sops = { 
-	read_inode:	nfs_read_inode,
 	write_inode:	nfs_write_inode,
 	delete_inode:	nfs_delete_inode,
 	put_super:	nfs_put_super,
@@ -92,30 +90,6 @@
 	return nfs_fileid_to_ino_t(fattr->fileid);
 }
 
-/*
- * The "read_inode" function doesn't actually do anything:
- * the real data is filled in later in nfs_fhget. Here we
- * just mark the cache times invalid, and zero out i_mode
- * (the latter makes "nfs_refresh_inode" do the right thing
- * wrt pipe inodes)
- */
-static void
-nfs_read_inode(struct inode * inode)
-{
-	inode->i_blksize = inode->i_sb->s_blocksize;
-	inode->i_mode = 0;
-	inode->i_rdev = 0;
-	/* We can't support UPDATE_ATIME(), since the server will reset it */
-	inode->i_flags |= S_NOATIME;
-	INIT_LIST_HEAD(&inode->u.nfs_i.read);
-	INIT_LIST_HEAD(&inode->u.nfs_i.dirty);
-	INIT_LIST_HEAD(&inode->u.nfs_i.commit);
-	INIT_LIST_HEAD(&inode->u.nfs_i.writeback);
-	NFS_CACHEINV(inode);
-	NFS_ATTRTIMEO(inode) = NFS_MINATTRTIMEO(inode);
-	NFS_ATTRTIMEO_UPDATE(inode) = jiffies;
-}
-
 static void
 nfs_write_inode(struct inode *inode, int sync)
 {
@@ -634,7 +608,6 @@
 	 * do this once. (We don't allow inodes to change types.)
 	 */
 	if (inode->i_mode == 0) {
-		NFS_FILEID(inode) = fattr->fileid;
 		inode->i_mode = fattr->mode;
 		/* Why so? Because we want revalidate for devices/FIFOs, and
 		 * that's precisely what we have in nfs_file_inode_operations.
@@ -650,9 +623,7 @@
 			inode->i_op = &nfs_symlink_inode_operations;
 		else
 			init_special_inode(inode, inode->i_mode, fattr->rdev);
-		memcpy(&inode->u.nfs_i.fh, fh, sizeof(inode->u.nfs_i.fh));
 	}
-	nfs_refresh_inode(inode, fattr);
 }
 
 struct nfs_find_desc {
@@ -667,7 +638,7 @@
  * i_ino.
  */
 static int
-nfs_find_actor(struct inode *inode, unsigned long ino, void *opaque)
+nfs_find_actor(struct inode *inode, void *opaque)
 {
 	struct nfs_find_desc	*desc = (struct nfs_find_desc *)opaque;
 	struct nfs_fh		*fh = desc->fh;
@@ -685,6 +656,18 @@
 	return 1;
 }
 
+static int
+nfs_init_locked(struct inode *inode, void *opaque)
+{
+	struct nfs_find_desc	*desc = (struct nfs_find_desc *)opaque;
+	struct nfs_fh		*fh = desc->fh;
+	struct nfs_fattr	*fattr = desc->fattr;
+
+	NFS_FILEID(inode) = fattr->fileid;
+	memcpy(NFS_FH(inode), fh, sizeof(struct nfs_fh));
+	return 0;
+}
+
 /*
  * This is our own version of iget that looks up inodes by file handle
  * instead of inode number.  We use this technique instead of using
@@ -712,7 +695,7 @@
 {
 	struct nfs_find_desc desc = { fh, fattr };
 	struct inode *inode = NULL;
-	unsigned long ino;
+	unsigned long hash;
 
 	if ((fattr->valid & NFS_ATTR_FATTR) == 0)
 		goto out_no_inode;
@@ -722,12 +705,29 @@
 		goto out_no_inode;
 	}
 
-	ino = nfs_fattr_to_ino_t(fattr);
+	hash = nfs_fattr_to_ino_t(fattr);
 
-	if (!(inode = iget4(sb, ino, nfs_find_actor, &desc)))
+	if (!(inode = iget5_locked(sb, hash, nfs_find_actor, nfs_init_locked, &desc)))
 		goto out_no_inode;
 
-	nfs_fill_inode(inode, fh, fattr);
+        if (inode->i_state & I_NEW) {
+		inode->i_ino = hash;
+		inode->i_blksize = inode->i_sb->s_blocksize;
+		inode->i_mode = 0;
+		inode->i_rdev = 0;
+		/* We can't support UPDATE_ATIME(), since the server will reset it */
+		inode->i_flags |= S_NOATIME;
+		INIT_LIST_HEAD(&inode->u.nfs_i.read);
+		INIT_LIST_HEAD(&inode->u.nfs_i.dirty);
+		INIT_LIST_HEAD(&inode->u.nfs_i.commit);
+		INIT_LIST_HEAD(&inode->u.nfs_i.writeback);
+		NFS_CACHEINV(inode);
+		NFS_ATTRTIMEO(inode) = NFS_MINATTRTIMEO(inode);
+		NFS_ATTRTIMEO_UPDATE(inode) = jiffies;
+		nfs_fill_inode(inode, fh, fattr);
+		unlock_new_inode(inode);
+	} else
+		nfs_refresh_inode(inode, fattr);
 	dprintk("NFS: __nfs_fhget(%x/%Ld ct=%d)\n",
 		inode->i_dev, (long long)NFS_FILEID(inode),
 		atomic_read(&inode->i_count));
===== fs/reiserfs/inode.c 1.39 vs edited =====
--- 1.39/fs/reiserfs/inode.c	Thu Sep 12 12:39:21 2002
+++ edited/fs/reiserfs/inode.c	Fri Mar 21 15:23:52 2003
@@ -30,7 +30,7 @@
     lock_kernel() ; 
 
     /* The = 0 happens when we abort creating a new inode for some reason like lack of space.. */
-    if (INODE_PKEY(inode)->k_objectid != 0) { /* also handles bad_inode case */
+    if (!(inode->i_state & I_NEW) && INODE_PKEY(inode)->k_objectid != 0) { /* also handles bad_inode case */
 	down (&inode->i_sem); 
 
 	journal_begin(&th, inode->i_sb, jbegin_count) ;
@@ -867,7 +867,7 @@
 // item version directly
 //
 
-// called by read_inode
+// called by read_locked_inode
 static void init_inode (struct inode * inode, struct path * path)
 {
     struct buffer_head * bh;
@@ -1107,27 +1107,24 @@
     make_bad_inode(inode);
 }
 
-void reiserfs_read_inode(struct inode *inode) {
-    reiserfs_make_bad_inode(inode) ;
+int reiserfs_init_locked_inode (struct inode * inode, void *p)
+{
+    struct reiserfs_iget_args *args = (struct reiserfs_iget_args *)p ;
+    inode->i_ino = args->objectid;
+    INODE_PKEY(inode)->k_dir_id = cpu_to_le32(args->dirid);
+    return 0;
 }
 
-
 /* looks for stat data in the tree, and fills up the fields of in-core
    inode stat data fields */
-void reiserfs_read_inode2 (struct inode * inode, void *p)
+void reiserfs_read_locked_inode (struct inode * inode, struct reiserfs_iget_args *args)
 {
     INITIALIZE_PATH (path_to_sd);
     struct cpu_key key;
-    struct reiserfs_iget4_args *args = (struct reiserfs_iget4_args *)p ;
     unsigned long dirino;
     int retval;
 
-    if (!p) {
-	reiserfs_make_bad_inode(inode) ;
-	return;
-    }
-
-    dirino = args->objectid ;
+    dirino = args->dirid ;
 
     /* set version 1, version 2 could be used too, because stat data
        key is the same in both versions */
@@ -1140,7 +1137,7 @@
     /* look for the object's stat data */
     retval = search_item (inode->i_sb, &key, &path_to_sd);
     if (retval == IO_ERROR) {
-	reiserfs_warning ("vs-13070: reiserfs_read_inode2: "
+	reiserfs_warning ("vs-13070: reiserfs_read_locked_inode: "
                     "i/o failure occurred trying to find stat data of %K\n",
                     &key);
 	reiserfs_make_bad_inode(inode) ;
@@ -1172,7 +1169,7 @@
        during mount (fs/reiserfs/super.c:finish_unfinished()). */
     if( ( inode -> i_nlink == 0 ) && 
 	! inode -> i_sb -> u.reiserfs_sb.s_is_unlinked_ok ) {
-	    reiserfs_warning( "vs-13075: reiserfs_read_inode2: "
+	    reiserfs_warning( "vs-13075: reiserfs_read_locked_inode: "
 			      "dead inode read from disk %K. "
 			      "This is likely to be race with knfsd. Ignore\n", 
 			      &key );
@@ -1184,38 +1181,43 @@
 }
 
 /**
- * reiserfs_find_actor() - "find actor" reiserfs supplies to iget4().
+ * reiserfs_find_actor() - "find actor" reiserfs supplies to iget5_locked().
  *
  * @inode:    inode from hash table to check
- * @inode_no: inode number we are looking for
- * @opaque:   "cookie" passed to iget4(). This is &reiserfs_iget4_args.
+ * @opaque:   "cookie" passed to iget5_locked(). This is &reiserfs_iget_args.
  *
- * This function is called by iget4() to distinguish reiserfs inodes
+ * This function is called by iget5_locked() to distinguish reiserfs inodes
  * having the same inode numbers. Such inodes can only exist due to some
  * error condition. One of them should be bad. Inodes with identical
  * inode numbers (objectids) are distinguished by parent directory ids.
  *
  */
-static int reiserfs_find_actor( struct inode *inode, 
-				unsigned long inode_no, void *opaque )
+int reiserfs_find_actor( struct inode *inode, void *opaque )
 {
-    struct reiserfs_iget4_args *args;
+    struct reiserfs_iget_args *args;
 
     args = opaque;
     /* args is already in CPU order */
-    return le32_to_cpu(INODE_PKEY(inode)->k_dir_id) == args -> objectid;
+    return (inode->i_ino == args->objectid) &&
+	(le32_to_cpu(INODE_PKEY(inode)->k_dir_id) == args->dirid);
 }
 
 struct inode * reiserfs_iget (struct super_block * s, const struct cpu_key * key)
 {
     struct inode * inode;
-    struct reiserfs_iget4_args args ;
+    struct reiserfs_iget_args args ;
 
-    args.objectid = key->on_disk_key.k_dir_id ;
-    inode = iget4 (s, key->on_disk_key.k_objectid, 
-		   reiserfs_find_actor, (void *)(&args));
+    args.objectid = key->on_disk_key.k_objectid ;
+    args.dirid = key->on_disk_key.k_dir_id ;
+    inode = iget5_locked (s, key->on_disk_key.k_objectid, 
+		   reiserfs_find_actor, reiserfs_init_locked_inode, (void *)(&args));
     if (!inode) 
 	return ERR_PTR(-ENOMEM) ;
+
+    if (inode->i_state & I_NEW) {
+	reiserfs_read_locked_inode(inode, &args);
+	unlock_new_inode(inode);
+    }
 
     if (comp_short_keys (INODE_PKEY (inode), key) || is_bad_inode (inode)) {
 	/* either due to i/o error or a stale NFS handle */
===== fs/reiserfs/super.c 1.27 vs edited =====
--- 1.27/fs/reiserfs/super.c	Wed Oct 30 19:42:36 2002
+++ edited/fs/reiserfs/super.c	Fri Mar 21 15:23:53 2003
@@ -381,8 +381,6 @@
 
 struct super_operations reiserfs_sops = 
 {
-  read_inode: reiserfs_read_inode,
-  read_inode2: reiserfs_read_inode2,
   write_inode: reiserfs_write_inode,
   dirty_inode: reiserfs_dirty_inode,
   delete_inode: reiserfs_delete_inode,
@@ -1117,7 +1115,7 @@
     int old_format = 0;
     unsigned long blocks;
     int jinit_done = 0 ;
-    struct reiserfs_iget4_args args ;
+    struct reiserfs_iget_args args ;
     int old_magic;
     struct reiserfs_super_block * rs;
 
@@ -1194,11 +1192,17 @@
         printk("clm-7000: Detected readonly device, marking FS readonly\n") ;
 	s->s_flags |= MS_RDONLY ;
     }
-    args.objectid = REISERFS_ROOT_PARENT_OBJECTID ;
-    root_inode = iget4 (s, REISERFS_ROOT_OBJECTID, 0, (void *)(&args));
+    args.objectid = REISERFS_ROOT_OBJECTID ;
+    args.dirid = REISERFS_ROOT_PARENT_OBJECTID ;
+    root_inode = iget5_locked (s, REISERFS_ROOT_OBJECTID, reiserfs_find_actor, reiserfs_init_locked_inode, (void *)(&args));
     if (!root_inode) {
 	printk ("reiserfs_read_super: get root inode failed\n");
 	goto error;
+    }
+ 
+    if (root_inode->i_state & I_NEW) {
+	reiserfs_read_locked_inode(root_inode, &args);
+	unlock_new_inode(root_inode);
     }
 
     s->s_root = d_alloc_root(root_inode);  
===== include/linux/fs.h 1.69 vs edited =====
--- 1.69/include/linux/fs.h	Thu Sep  5 00:14:54 2002
+++ edited/include/linux/fs.h	Fri Mar 21 16:14:25 2003
@@ -881,13 +881,6 @@
 struct super_operations {
 	void (*read_inode) (struct inode *);
   
-  	/* reiserfs kludge.  reiserfs needs 64 bits of information to
-    	** find an inode.  We are using the read_inode2 call to get
-   	** that information.  We don't like this, and are waiting on some
-   	** VFS changes for the real solution.
-   	** iget4 calls read_inode2, iff it is defined
-   	*/
-    	void (*read_inode2) (struct inode *, void *) ;
    	void (*dirty_inode) (struct inode *);
 	void (*write_inode) (struct inode *, int);
 	void (*put_inode) (struct inode *);
@@ -935,6 +928,7 @@
 #define I_LOCK			8
 #define I_FREEING		16
 #define I_CLEAR			32
+#define I_NEW			64
 
 #define I_DIRTY (I_DIRTY_SYNC | I_DIRTY_DATASYNC | I_DIRTY_PAGES)
 
@@ -1347,11 +1341,24 @@
 extern struct inode * igrab(struct inode *);
 extern ino_t iunique(struct super_block *, ino_t);
 
-typedef int (*find_inode_t)(struct inode *, unsigned long, void *);
-extern struct inode * iget4(struct super_block *, unsigned long, find_inode_t, void *);
+extern struct inode *ilookup5(struct super_block *sb, unsigned long hashval,
+	      int (*test)(struct inode *, void *), void *data);
+extern struct inode *ilookup(struct super_block *sb, unsigned long ino);
+
+extern struct inode * iget5_locked(struct super_block *, unsigned long, int (*test)(struct inode *, void *), int (*set)(struct inode *, void *), void *);
+extern struct inode * iget_locked(struct super_block *, unsigned long);
+extern void unlock_new_inode(struct inode *);
+
 static inline struct inode *iget(struct super_block *sb, unsigned long ino)
 {
-	return iget4(sb, ino, NULL, NULL);
+      struct inode *inode = iget_locked(sb, ino);
+
+      if (inode && (inode->i_state & I_NEW)) {
+	      sb->s_op->read_inode(inode);
+	      unlock_new_inode(inode);
+      }
+
+      return inode;
 }
 
 extern void clear_inode(struct inode *);
@@ -1369,8 +1376,12 @@
 }
 extern void remove_suid(struct inode *inode);
 
-extern void insert_inode_hash(struct inode *);
+extern void __insert_inode_hash(struct inode *, unsigned long hashval);
 extern void remove_inode_hash(struct inode *);
+static inline void insert_inode_hash(struct inode *inode) {
+      __insert_inode_hash(inode, inode->i_ino);
+}
+
 extern struct file * get_empty_filp(void);
 extern void file_move(struct file *f, struct list_head *list);
 extern struct buffer_head * get_hash_table(kdev_t, int, int);
===== include/linux/reiserfs_fs.h 1.25 vs edited =====
--- 1.25/include/linux/reiserfs_fs.h	Thu Sep 12 12:39:21 2002
+++ edited/include/linux/reiserfs_fs.h	Fri Mar 21 16:15:38 2003
@@ -1478,8 +1478,9 @@
 #define B_I_POS_UNFM_POINTER(bh,ih,pos) le32_to_cpu(*(((unp_t *)B_I_PITEM(bh,ih)) + (pos)))
 #define PUT_B_I_POS_UNFM_POINTER(bh,ih,pos, val) do {*(((unp_t *)B_I_PITEM(bh,ih)) + (pos)) = cpu_to_le32(val); } while (0)
 
-struct reiserfs_iget4_args {
+struct reiserfs_iget_args {
     __u32 objectid ;
+    __u32 dirid ;
 } ;
 
 /***************************************************************************/
@@ -1730,8 +1731,9 @@
 
 /* inode.c */
 
-void reiserfs_read_inode (struct inode * inode) ;
-void reiserfs_read_inode2(struct inode * inode, void *p) ;
+void reiserfs_read_locked_inode(struct inode * inode, struct reiserfs_iget_args *args) ;
+int reiserfs_find_actor(struct inode * inode, void *p) ;
+int reiserfs_init_locked_inode(struct inode * inode, void *p) ;
 void reiserfs_delete_inode (struct inode * inode);
 void reiserfs_write_inode (struct inode * inode, int) ;
 struct dentry *reiserfs_fh_to_dentry(struct super_block *sb, __u32 *data,
===== kernel/ksyms.c 1.64 vs edited =====
--- 1.64/kernel/ksyms.c	Thu Sep 19 04:55:42 2002
+++ edited/kernel/ksyms.c	Fri Mar 21 16:01:41 2003
@@ -140,7 +140,6 @@
 EXPORT_SYMBOL(fget);
 EXPORT_SYMBOL(igrab);
 EXPORT_SYMBOL(iunique);
-EXPORT_SYMBOL(iget4);
 EXPORT_SYMBOL(iput);
 EXPORT_SYMBOL(force_delete);
 EXPORT_SYMBOL(follow_up);
@@ -524,7 +523,7 @@
 EXPORT_SYMBOL(read_ahead);
 EXPORT_SYMBOL(get_hash_table);
 EXPORT_SYMBOL(get_empty_inode);
-EXPORT_SYMBOL(insert_inode_hash);
+EXPORT_SYMBOL(__insert_inode_hash);
 EXPORT_SYMBOL(remove_inode_hash);
 EXPORT_SYMBOL(buffer_insert_inode_queue);
 EXPORT_SYMBOL(buffer_insert_inode_data_queue);
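
The calling convention this patch introduces (look up by hash value with a test callback, let a set callback initialise a freshly allocated inode, then fill it in and call unlock_new_inode() once I_NEW is seen) can be modelled in userspace. The sketch below is a simplified illustration, not kernel code: every toy_* name, the fixed-size bucket array and the complete absence of locking are assumptions made for the example. Only the value of I_NEW is taken from the patch.

```c
#include <assert.h>
#include <stdlib.h>

#define I_NEW    64   /* same value the patch adds to include/linux/fs.h */
#define NBUCKETS 8    /* hypothetical toy hash table size */

/* Toy stand-in for struct inode; dirid plays the role of reiserfs k_dir_id. */
struct toy_inode {
    unsigned long i_ino;
    unsigned int i_state;
    unsigned int dirid;
    struct toy_inode *next;
};

/* Toy stand-in for struct reiserfs_iget_args. */
struct toy_args {
    unsigned int objectid;
    unsigned int dirid;
};

static struct toy_inode *buckets[NBUCKETS];

/* "test" callback: does this cached inode match the full key? */
static int toy_test(struct toy_inode *inode, void *data)
{
    struct toy_args *a = data;
    return inode->i_ino == a->objectid && inode->dirid == a->dirid;
}

/* "set" callback: store the full key in a newly allocated inode. */
static int toy_set(struct toy_inode *inode, void *data)
{
    struct toy_args *a = data;
    inode->i_ino = a->objectid;
    inode->dirid = a->dirid;
    return 0;
}

/* Return a cached inode, or a new one with I_NEW set; NULL on failure. */
static struct toy_inode *toy_iget5(unsigned long hashval,
        int (*test)(struct toy_inode *, void *),
        int (*set)(struct toy_inode *, void *), void *data)
{
    struct toy_inode **head = &buckets[hashval % NBUCKETS];
    struct toy_inode *inode;

    for (inode = *head; inode; inode = inode->next)
        if (test(inode, data))
            return inode;

    inode = calloc(1, sizeof(*inode));
    if (!inode)
        return NULL;
    if (set(inode, data)) {   /* a failing set callback aborts creation */
        free(inode);
        return NULL;
    }
    inode->i_state = I_NEW;
    inode->next = *head;
    *head = inode;
    return inode;
}

/* Caller clears I_NEW after it has filled in the inode. */
static void toy_unlock_new(struct toy_inode *inode)
{
    inode->i_state &= ~I_NEW;
}
```

Two lookups with the same hash value but different dirid yield two distinct inodes in this model, which is the case the reiserfs find actor in the patch is written to handle. The model leaves out I_LOCK and the waiting that the real code performs on a locked inode.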

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: nfsd-fh: found a name that I didn't expect
  2003-03-29 17:37             ` Oleg Drokin
@ 2003-03-29 18:22               ` Bernd Schubert
  2003-03-29 18:45               ` Soeren Sonnenburg
  2003-03-30 15:08               ` Bernd Schubert
  2 siblings, 0 replies; 27+ messages in thread
From: Bernd Schubert @ 2003-03-29 18:22 UTC (permalink / raw)
  To: Oleg Drokin; +Cc: reiserfs-list

Hello!

> See the patch below.
> It should not break anything.

Thanks a lot for the patch, I'm going to test it tomorrow.

>
> > The problem is to detect if this is really caused by this race. Since I
> > still
>
> Well, the race is of the type where several parallel processes try to access a
> file whose inode is not in memory.
> This resembles what you describe, with lots of clients accessing the
> same file over NFS at the same time.

Ah, good to know. So I could trigger this by writing a script that reads a 
file on the server in an endless loop. Running this on all clients and then 
rebooting the server should cause a problem for this file, shouldn't it?
This way we could prove that the patch helps.
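
A minimal sketch of such a reader, written in C rather than as a shell script. The function names are hypothetical, and the path passed in is assumed to be a file on the NFS mount. Whether each round actually reaches the server depends on client-side caching, so this is only a starting point for a reproducer, not a guaranteed trigger.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Open path, read it to the end, close it. Returns bytes read, or -1. */
static long read_whole_file(const char *path)
{
    char buf[4096];
    long total = 0;
    ssize_t n;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    while ((n = read(fd, buf, sizeof(buf))) > 0)
        total += n;
    close(fd);
    return n < 0 ? -1 : total;
}

/* Repeat the read `rounds` times; returns the number of failed rounds. */
static long hammer_file(const char *path, long rounds)
{
    long failures = 0;

    for (long i = 0; i < rounds; i++)
        if (read_whole_file(path) < 0)
            failures++;
    return failures;
}
```

Run on every client against the same file while the server is rebooted, a non-zero failure count would show up as the I/O errors described at the start of the thread.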


Best regards,
	Bernd


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: nfsd-fh: found a name that I didn't expect
  2003-03-29 17:37             ` Oleg Drokin
  2003-03-29 18:22               ` Bernd Schubert
@ 2003-03-29 18:45               ` Soeren Sonnenburg
  2003-03-31  8:37                 ` Oleg Drokin
  2003-03-30 15:08               ` Bernd Schubert
  2 siblings, 1 reply; 27+ messages in thread
From: Soeren Sonnenburg @ 2003-03-29 18:45 UTC (permalink / raw)
  To: Oleg Drokin; +Cc: Bernd Schubert, reiserfs-list

On Sat, 2003-03-29 at 18:37, Oleg Drokin wrote:
> Hello!
> 
> On Sat, Mar 29, 2003 at 04:06:54PM +0100, Bernd Schubert wrote:
> > > The problem seems to have something to do with strange dentries appearing
> > > in the dentry cache; also, you have verified that this is not because of an fs
> > > error.
> > > What kernel are you running?
> > 2.4.20 with the ptrace patch.
> 
> Ok.

Here it is 2.4.20 with the ext3 umount patch.

> > > I think a race with iget4 might cause this, but I am not sure.
> > > We have a patch for this iget4 race, and if you are willing to test it, I
> > > can send it to you.
> > Yes, of course I am. I hope the other admins won't kill me, but I guess they 
> > know that I am willing to do experiments from time to time ;-)
> > Could something worse happen when I try the patch that could not be fixed by 
> > simply rebooting?
> 
> See the patch below.
> It should not break anything.

Well, we keep getting this error in masses too, and in our case it is
very likely that the same file is accessed from all the clients.

However, I cannot quickly test that at the moment, but if it is really safe I might
test it next week.

Soeren.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: nfsd-fh: found a name that I didn't expect
  2003-03-29 17:37             ` Oleg Drokin
  2003-03-29 18:22               ` Bernd Schubert
  2003-03-29 18:45               ` Soeren Sonnenburg
@ 2003-03-30 15:08               ` Bernd Schubert
  2003-03-31  8:33                 ` Oleg Drokin
  2003-03-31 10:24                 ` Oleg Drokin
  2 siblings, 2 replies; 27+ messages in thread
From: Bernd Schubert @ 2003-03-30 15:08 UTC (permalink / raw)
  To: Oleg Drokin; +Cc: reiserfs-list

[-- Attachment #1: Type: text/plain, Size: 931 bytes --]

Hello!

> See the patch below.
> It should not break anything.
>

I just tried to cause the problems again, but without any success. I simply 
don't get the messages today. I tried to call /usr/bin/uptime from all 
clients in an endless loop, rebooted the server -- no success. Then I stopped 
the nfs-server, copied uptime to uptime.old, removed uptime and moved 
uptime.old to uptime (to make the filehandle invalid) and rebooted -- still 
no success. I also tried several other things, but none of them helped.
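
The copy, remove and rename sequence described above can be sketched in C as follows. The helper names are hypothetical. The sketch assumes both paths live on the same filesystem, and it creates the copy with mode 0644 instead of preserving the original permissions and ownership, so an executable such as uptime would need its mode restored afterwards.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Copy src to dst. Returns 0 on success, -1 on error. */
static int copy_file(const char *src, const char *dst)
{
    char buf[4096];
    ssize_t n = 0;
    int in = open(src, O_RDONLY);
    int out;

    if (in < 0)
        return -1;
    out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) {
        close(in);
        return -1;
    }
    while ((n = read(in, buf, sizeof(buf))) > 0) {
        if (write(out, buf, (size_t)n) != n) {
            n = -1;   /* short or failed write */
            break;
        }
    }
    close(in);
    if (close(out) < 0)
        n = -1;
    return n < 0 ? -1 : 0;
}

/* Replace path with a fresh copy of itself, so it ends up on a new inode. */
static int reinode_file(const char *path, const char *tmp)
{
    if (copy_file(path, tmp) < 0)
        return -1;
    if (unlink(path) < 0)
        return -1;
    return rename(tmp, path);
}
```

Because the copy is created while the original still exists, it is allocated a different inode, so a client still holding a file handle for the old inode should see that handle as stale.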

Anyway, I also tested the kernel patch, but it causes problems with mounts from 
another nfs-server. Mounting works fine, but e.g. running 'ls 
{mountdir}' returns 'Unknown error 524' (or was it 542? I can't remember). So 
even my home directory was not available on the server when I tried the 
iget-patch.
On system reboot I also get an oops there. I have attached the ksymoops 
output.
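
For reading the numeric error: values above 511 are kernel-internal codes that are not meant to reach userspace, which is why the C library can only print "Unknown error N" for them. The lookup below is a small sketch. The table is reproduced from memory of the "Defined for the NFSv3 protocol" block in include/linux/errno.h, so it should be checked against the tree in use, and the helper name is hypothetical. It says nothing about why the patched kernel returned the code.

```c
#include <stddef.h>

struct kernel_errno {
    int code;
    const char *name;
};

/* Kernel-internal NFSv3 error codes (assumed from include/linux/errno.h). */
static const struct kernel_errno nfs_internal_errnos[] = {
    { 521, "EBADHANDLE" },   /* illegal NFS file handle */
    { 522, "ENOTSYNC" },     /* update synchronization mismatch */
    { 523, "EBADCOOKIE" },   /* cookie is stale */
    { 524, "ENOTSUPP" },     /* operation is not supported */
    { 525, "ETOOSMALL" },    /* buffer or request is too small */
    { 526, "ESERVERFAULT" }, /* an untranslatable error occurred */
    { 527, "EBADTYPE" },     /* type not supported by server */
    { 528, "EJUKEBOX" },     /* request will not complete before timeout */
};

/* Return the symbolic name for a kernel-internal code, or NULL if unknown. */
static const char *kernel_errno_name(int code)
{
    size_t count = sizeof(nfs_internal_errnos) / sizeof(nfs_internal_errnos[0]);

    for (size_t i = 0; i < count; i++)
        if (nfs_internal_errnos[i].code == code)
            return nfs_internal_errnos[i].name;
    return NULL;
}
```

Under that assumption, 524 maps to ENOTSUPP, while 542 has no entry in this range.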

Best regards,
	Bernd

[-- Attachment #2: ksymoops.out --]
[-- Type: text/plain, Size: 2653 bytes --]

ksymoops 2.4.8 on i686 2.4.20-athlon-pp.  Options used
     -v vmlinux__2.4.20-athlon-pp-igetp (specified)
     -K (specified)
     -L (specified)
     -o /lib/modules/2.4.20-athlon-pp-igetp/ (specified)
     -m System.map__2.4.20-athlon-pp-igetp (specified)

No modules in ksyms, skipping objects
Mar 30 16:21:38 hamilton kernel: kernel BUG at inode.c:1268!
Mar 30 16:21:38 hamilton kernel: invalid operand: 0000
Mar 30 16:21:38 hamilton kernel: CPU:    0
Mar 30 16:21:38 hamilton kernel: EIP:    0010:[<c0149870>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
Mar 30 16:21:38 hamilton kernel: EFLAGS: 00010246
Mar 30 16:21:38 hamilton kernel: eax: f6dc6280   ebx: f6dc6280   ecx: f89afad4   edx: f6dc63a0
Mar 30 16:21:38 hamilton kernel: esi: f6d82400   edi: 00000000   ebp: 080551f0   esp: f6d21f28
Mar 30 16:21:38 hamilton kernel: ds: 0018   es: 0018   ss: 0018
Mar 30 16:21:38 hamilton kernel: Process umount (pid: 892, stackpage=f6d21000)
Mar 30 16:21:38 hamilton kernel: Stack: f89afad4 f6d824d4 f89afca0 f89a930d f6dc6280 f6d82400 f89a9df8 f6dc6280
Mar 30 16:21:38 hamilton kernel:        f6d82400 f6d82444 c013b3e1 f6d82400 c342a540 f6d82400 0804ffd8 c014af0e
Mar 30 16:21:38 hamilton kernel:        f6d82400 c342a540 f744bc40 f6d21f98 00000000 c013f067 c342a540 f6d21f98
Mar 30 16:21:38 hamilton kernel: Call Trace:    [<f89afad4>] [<f89afca0>] [<f89a930d>] [<f89a9df8>] [<c013b3e1>]
Mar 30 16:21:38 hamilton kernel:   [<c014af0e>] [<c013f067>] [<c014b5af>] [<c01283b5>] [<c014b5cc>] [<c0108837>]
Mar 30 16:21:38 hamilton kernel: Code: 0f 0b f4 04 bd 0b 27 c0 85 f6 74 03 8b 7e 20 85 ff 74 0d 8b


>>EIP; c0149870 <iput+20/200>   <=====

Trace; f89afad4 <END_OF_CODE+3865ec30/????>
Trace; f89afca0 <END_OF_CODE+3865edfc/????>
Trace; f89a930d <END_OF_CODE+38658469/????>
Trace; f89a9df8 <END_OF_CODE+38658f54/????>
Trace; c013b3e1 <kill_super+a1/e0>
Trace; c014af0e <__mntput+1e/30>
Trace; c013f067 <path_release+27/30>
Trace; c014b5af <sys_umount+6f/80>
Trace; c01283b5 <sys_munmap+35/60>
Trace; c014b5cc <sys_oldumount+c/10>
Trace; c0108837 <system_call+33/38>

Code;  c0149870 <iput+20/200>
00000000 <_EIP>:
Code;  c0149870 <iput+20/200>   <=====
   0:   0f 0b                     ud2a      <=====
Code;  c0149872 <iput+22/200>
   2:   f4                        hlt    
Code;  c0149873 <iput+23/200>
   3:   04 bd                     add    $0xbd,%al
Code;  c0149875 <iput+25/200>
   5:   0b 27                     or     (%edi),%esp
Code;  c0149877 <iput+27/200>
   7:   c0 85 f6 74 03 8b 7e      rolb   $0x7e,0x8b0374f6(%ebp)
Code;  c014987e <iput+2e/200>
   e:   20 85 ff 74 0d 8b         and    %al,0x8b0d74ff(%ebp)


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: nfsd-fh: found a name that I didn't expect
  2003-03-30 15:08               ` Bernd Schubert
@ 2003-03-31  8:33                 ` Oleg Drokin
  2003-03-31 10:24                 ` Oleg Drokin
  1 sibling, 0 replies; 27+ messages in thread
From: Oleg Drokin @ 2003-03-31  8:33 UTC (permalink / raw)
  To: Bernd Schubert; +Cc: reiserfs-list

[-- Attachment #1: Type: text/plain, Size: 1154 bytes --]

Hello!

On Sun, Mar 30, 2003 at 04:08:11PM +0100, Bernd Schubert wrote:

> > See the patch below.
> > It should not break anything.
> I just tried to cause the problems again, but without any success. I simply 
> don't get the messages today. I tried to call /usr/bin/uptime from all 

This probably confirms that the problem is a race of some kind.

> Anyway I also tested the kernel-patch, but it causes problems with mounts from 
> another nfs-server. Well, mounting works fine, but e.g. running 'ls 
> {mountdir}'  returns 'Unknown error 524' (or was it 542 I can't remember). So 
> even my homedirectory was not available on the server when I tried the 
> iget-patch.

Hm. This is strange.
While backporting the iget5_locked patch, the only conflicts were in core fs code.
The NFS-client part of the patch is the same as in the patch for 2.4.21-pre6, and 
our tester has found zero problems with the patch on 2.4.21-pre6.

> Well on system-reboot there I also get an oops. I have attached the ksymoops 
> output.

Ok, I will try to reproduce that.

Meanwhile, if you can try 2.4.21-pre6 with the attached patch, that would be great.

Thank you.

Bye,
    Oleg

[-- Attachment #2: iget5_locked.diff --]
[-- Type: text/plain, Size: 34447 bytes --]

===== Documentation/filesystems/Locking 1.7 vs edited =====
--- 1.7/Documentation/filesystems/Locking	Thu Dec 19 05:34:24 2002
+++ edited/Documentation/filesystems/Locking	Fri Feb 21 14:19:38 2003
@@ -114,7 +114,7 @@
 remount_fs:	yes	yes	maybe		(see below)
 umount_begin:	yes	no	maybe		(see below)
 
-->read_inode() is not a method - it's a callback used in iget()/iget4().
+->read_inode() is not a method - it's a callback used in iget().
 rules for mount_sem are not too nice - it is going to die and be replaced
 by better scheme anyway.
 
===== fs/Makefile 1.16 vs edited =====
--- 1.16/fs/Makefile	Thu Sep 12 04:00:00 2002
+++ edited/fs/Makefile	Fri Feb 21 14:24:21 2003
@@ -7,7 +7,7 @@
 
 O_TARGET := fs.o
 
-export-objs :=	filesystems.o open.o dcache.o buffer.o
+export-objs :=	filesystems.o open.o dcache.o buffer.o inode.o
 mod-subdirs :=	nls
 
 obj-y :=	open.o read_write.o devices.o file_table.o buffer.o \
===== fs/inode.c 1.36 vs edited =====
--- 1.36/fs/inode.c	Thu Aug 29 07:02:23 2002
+++ edited/fs/inode.c	Mon Mar  3 19:32:54 2003
@@ -17,6 +17,7 @@
 #include <linux/swapctl.h>
 #include <linux/prefetch.h>
 #include <linux/locks.h>
+#include <linux/module.h>
 
 /*
  * New inode.c implementation.
@@ -692,7 +693,7 @@
 	invalidate_buffers(dev);
 	return res;
 }
-
+EXPORT_SYMBOL(unlock_new_inode);
 
 /*
  * This is called with the inode lock held. It searches
@@ -783,7 +784,32 @@
  * by hand after calling find_inode now! This simplifies iunique and won't
  * add any additional branch in the common code.
  */
-static struct inode * find_inode(struct super_block * sb, unsigned long ino, struct list_head *head, find_inode_t find_actor, void *opaque)
+static struct inode * find_inode(struct super_block * sb, struct list_head *head, int (*test)(struct inode *, void *), void *data)
+{
+	struct list_head *tmp;
+	struct inode * inode;
+
+	tmp = head;
+	for (;;) {
+		tmp = tmp->next;
+		inode = NULL;
+		if (tmp == head)
+			break;
+		inode = list_entry(tmp, struct inode, i_hash);
+		if (inode->i_sb != sb)
+			continue;
+		if (!test(inode, data))
+			continue;
+		break;
+	}
+	return inode;
+}
+
+/*
+ * find_inode_fast is the fast path version of find_inode, see the comment at
+ * iget_locked for details.
+ */
+static struct inode * find_inode_fast(struct super_block * sb, struct list_head *head, unsigned long ino)
 {
 	struct list_head *tmp;
 	struct inode * inode;
@@ -799,8 +825,6 @@
 			continue;
 		if (inode->i_sb != sb)
 			continue;
-		if (find_actor && !find_actor(inode, ino, opaque))
-			continue;
 		break;
 	}
 	return inode;
@@ -832,13 +856,28 @@
 	return inode;
 }
 
+void unlock_new_inode(struct inode *inode)
+{
+	/*
+	 * This is special!  We do not need the spinlock
+	 * when clearing I_LOCK, because we're guaranteed
+	 * that nobody else tries to do anything about the
+	 * state of the inode when it is locked, as we
+	 * just created it (so there can be no old holders
+	 * that haven't tested I_LOCK).
+	 */
+	inode->i_state &= ~(I_LOCK|I_NEW);
+	wake_up(&inode->i_wait);
+}
+
+
 /*
  * This is called without the inode lock held.. Be careful.
  *
  * We no longer cache the sb_flags in i_flags - see fs.h
  *	-- rmk@arm.uk.linux.org
  */
-static struct inode * get_new_inode(struct super_block *sb, unsigned long ino, struct list_head *head, find_inode_t find_actor, void *opaque)
+static struct inode * get_new_inode(struct super_block *sb, struct list_head *head, int (*test)(struct inode *, void *), int (*set)(struct inode *, void *), void *data)
 {
 	struct inode * inode;
 
@@ -848,37 +887,68 @@
 
 		spin_lock(&inode_lock);
 		/* We released the lock, so.. */
-		old = find_inode(sb, ino, head, find_actor, opaque);
+		old = find_inode(sb, head, test, data);
 		if (!old) {
+			if (set(inode, data))
+				goto set_failed;
+
 			inodes_stat.nr_inodes++;
 			list_add(&inode->i_list, &inode_in_use);
 			list_add(&inode->i_hash, head);
-			inode->i_ino = ino;
-			inode->i_state = I_LOCK;
+			inode->i_state = I_LOCK|I_NEW;
 			spin_unlock(&inode_lock);
 
-			/* reiserfs specific hack right here.  We don't
-			** want this to last, and are looking for VFS changes
-			** that will allow us to get rid of it.
-			** -- mason@suse.com 
-			*/
-			if (sb->s_op->read_inode2) {
-				sb->s_op->read_inode2(inode, opaque) ;
-			} else {
-				sb->s_op->read_inode(inode);
-			}
-
-			/*
-			 * This is special!  We do not need the spinlock
-			 * when clearing I_LOCK, because we're guaranteed
-			 * that nobody else tries to do anything about the
-			 * state of the inode when it is locked, as we
-			 * just created it (so there can be no old holders
-			 * that haven't tested I_LOCK).
+			/* Return the locked inode with I_NEW set, the
+			 * caller is responsible for filling in the contents
 			 */
-			inode->i_state &= ~I_LOCK;
-			wake_up(&inode->i_wait);
+			return inode;
+		}
 
+		/*
+		 * Uhhuh, somebody else created the same inode under
+		 * us. Use the old inode instead of the one we just
+		 * allocated.
+		 */
+		__iget(old);
+		spin_unlock(&inode_lock);
+		destroy_inode(inode);
+		inode = old;
+		wait_on_inode(inode);
+	}
+	return inode;
+
+set_failed:
+	spin_unlock(&inode_lock);
+	destroy_inode(inode);
+	return NULL;
+}
+
+/*
+ * get_new_inode_fast is the fast path version of get_new_inode, see the
+ * comment at iget_locked for details.
+ */
+static struct inode * get_new_inode_fast(struct super_block *sb, struct list_head *head, unsigned long ino)
+{
+	struct inode * inode;
+
+	inode = alloc_inode(sb);
+	if (inode) {
+		struct inode * old;
+
+		spin_lock(&inode_lock);
+		/* We released the lock, so.. */
+		old = find_inode_fast(sb, head, ino);
+		if (!old) {
+			inode->i_ino = ino;
+			inodes_stat.nr_inodes++;
+			list_add(&inode->i_list, &inode_in_use);
+			list_add(&inode->i_hash, head);
+			inode->i_state = I_LOCK|I_NEW;
+			spin_unlock(&inode_lock);
+
+			/* Return the locked inode with I_NEW set, the
+			 * caller is responsible for filling in the contents
+			 */
 			return inode;
 		}
 
@@ -896,9 +966,9 @@
 	return inode;
 }
 
-static inline unsigned long hash(struct super_block *sb, unsigned long i_ino)
+static inline unsigned long hash(struct super_block *sb, unsigned long hashval)
 {
-	unsigned long tmp = i_ino + ((unsigned long) sb / L1_CACHE_BYTES);
+	unsigned long tmp = hashval + ((unsigned long) sb / L1_CACHE_BYTES);
 	tmp = tmp + (tmp >> I_HASHBITS);
 	return tmp & I_HASHMASK;
 }
@@ -930,7 +1000,8 @@
 retry:
 	if (counter > max_reserved) {
 		head = inode_hashtable + hash(sb,counter);
-		inode = find_inode(sb, res = counter++, head, NULL, NULL);
+		res = counter++;
+		inode = find_inode_fast(sb, head, res);
 		if (!inode) {
 			spin_unlock(&inode_lock);
 			return res;
@@ -958,14 +1029,63 @@
 	return inode;
 }
 
+/**
+ * ifind - internal function, you want ilookup5() or iget5().
+ * @sb:		super block of file system to search
+ * @hashval:	hash value (usually inode number) to search for
+ * @test:	callback used for comparisons between inodes
+ * @data:	opaque data pointer to pass to @test
+ *
+ * ifind() searches for the inode specified by @hashval and @data in the inode
+ * cache. This is a generalized version of ifind_fast() for file systems where
+ * the inode number is not sufficient for unique identification of an inode.
+ *
+ * If the inode is in the cache, the inode is returned with an incremented
+ * reference count.
+ *
+ * Otherwise NULL is returned.
+ *
+ * Note, @test is called with the inode_lock held, so can't sleep.
+ */
+static inline struct inode *ifind(struct super_block *sb,
+		struct list_head *head, int (*test)(struct inode *, void *),
+		void *data)
+{
+	struct inode *inode;
+
+	spin_lock(&inode_lock);
+	inode = find_inode(sb, head, test, data);
+	if (inode) {
+		__iget(inode);
+		spin_unlock(&inode_lock);
+		wait_on_inode(inode);
+		return inode;
+	}
+	spin_unlock(&inode_lock);
+	return NULL;
+}
 
-struct inode *iget4(struct super_block *sb, unsigned long ino, find_inode_t find_actor, void *opaque)
+/**
+ * ifind_fast - internal function, you want ilookup() or iget().
+ * @sb:		super block of file system to search
+ * @ino:	inode number to search for
+ *
+ * ifind_fast() searches for the inode @ino in the inode cache. This is for
+ * file systems where the inode number is sufficient for unique identification
+ * of an inode.
+ *
+ * If the inode is in the cache, the inode is returned with an incremented
+ * reference count.
+ *
+ * Otherwise NULL is returned.
+ */
+static inline struct inode *ifind_fast(struct super_block *sb,
+		struct list_head *head, unsigned long ino)
 {
-	struct list_head * head = inode_hashtable + hash(sb,ino);
-	struct inode * inode;
+	struct inode *inode;
 
 	spin_lock(&inode_lock);
-	inode = find_inode(sb, ino, head, find_actor, opaque);
+	inode = find_inode_fast(sb, head, ino);
 	if (inode) {
 		__iget(inode);
 		spin_unlock(&inode_lock);
@@ -973,27 +1093,147 @@
 		return inode;
 	}
 	spin_unlock(&inode_lock);
+	return NULL;
+}
+
+/**
+ * ilookup5 - search for an inode in the inode cache
+ * @sb:		super block of file system to search
+ * @hashval:	hash value (usually inode number) to search for
+ * @test:	callback used for comparisons between inodes
+ * @data:	opaque data pointer to pass to @test
+ *
+ * ilookup5() uses ifind() to search for the inode specified by @hashval and
+ * @data in the inode cache. This is a generalized version of ilookup() for
+ * file systems where the inode number is not sufficient for unique
+ * identification of an inode.
+ *
+ * If the inode is in the cache, the inode is returned with an incremented
+ * reference count.
+ *
+ * Otherwise NULL is returned.
+ *
+ * Note, @test is called with the inode_lock held, so can't sleep.
+ */
+struct inode *ilookup5(struct super_block *sb, unsigned long hashval,
+		int (*test)(struct inode *, void *), void *data)
+{
+	struct list_head *head = inode_hashtable + hash(sb, hashval);
 
+	return ifind(sb, head, test, data);
+}
+EXPORT_SYMBOL(ilookup5);
+
+/**
+ * ilookup - search for an inode in the inode cache
+ * @sb:		super block of file system to search
+ * @ino:	inode number to search for
+ *
+ * ilookup() uses ifind_fast() to search for the inode @ino in the inode cache.
+ * This is for file systems where the inode number is sufficient for unique
+ * identification of an inode.
+ *
+ * If the inode is in the cache, the inode is returned with an incremented
+ * reference count.
+ *
+ * Otherwise NULL is returned.
+ */
+struct inode *ilookup(struct super_block *sb, unsigned long ino)
+{
+	struct list_head *head = inode_hashtable + hash(sb, ino);
+
+	return ifind_fast(sb, head, ino);
+}
+EXPORT_SYMBOL(ilookup);
+
+/**
+ * iget5_locked - obtain an inode from a mounted file system
+ * @sb:		super block of file system
+ * @hashval:	hash value (usually inode number) to get
+ * @test:	callback used for comparisons between inodes
+ * @set:	callback used to initialize a new struct inode
+ * @data:	opaque data pointer to pass to @test and @set
+ *
+ * This is iget() without the read_inode() portion of get_new_inode().
+ *
+ * iget5_locked() uses ifind() to search for the inode specified by @hashval
+ * and @data in the inode cache and if present it is returned with an increased
+ * reference count. This is a generalized version of iget_locked() for file
+ * systems where the inode number is not sufficient for unique identification
+ * of an inode.
+ *
+ * If the inode is not in cache, get_new_inode() is called to allocate a new
+ * inode and this is returned locked, hashed, and with the I_NEW flag set. The
+ * file system gets to fill it in before unlocking it via unlock_new_inode().
+ *
+ * Note both @test and @set are called with the inode_lock held, so can't sleep.
+ */
+struct inode *iget5_locked(struct super_block *sb, unsigned long hashval,
+		int (*test)(struct inode *, void *),
+		int (*set)(struct inode *, void *), void *data)
+{
+	struct list_head *head = inode_hashtable + hash(sb, hashval);
+	struct inode *inode;
+
+	inode = ifind(sb, head, test, data);
+	if (inode)
+		return inode;
 	/*
 	 * get_new_inode() will do the right thing, re-trying the search
 	 * in case it had to block at any point.
 	 */
-	return get_new_inode(sb, ino, head, find_actor, opaque);
+	return get_new_inode(sb, head, test, set, data);
+}
+EXPORT_SYMBOL(iget5_locked);
+
+/**
+ * iget_locked - obtain an inode from a mounted file system
+ * @sb:		super block of file system
+ * @ino:	inode number to get
+ *
+ * This is iget() without the read_inode() portion of get_new_inode_fast().
+ *
+ * iget_locked() uses ifind_fast() to search for the inode specified by @ino in
+ * the inode cache and if present it is returned with an increased reference
+ * count. This is for file systems where the inode number is sufficient for
+ * unique identification of an inode.
+ *
+ * If the inode is not in cache, get_new_inode_fast() is called to allocate a
+ * new inode and this is returned locked, hashed, and with the I_NEW flag set.
+ * The file system gets to fill it in before unlocking it via
+ * unlock_new_inode().
+ */
+struct inode *iget_locked(struct super_block *sb, unsigned long ino)
+{
+	struct list_head *head = inode_hashtable + hash(sb, ino);
+	struct inode *inode;
+
+	inode = ifind_fast(sb, head, ino);
+	if (inode)
+		return inode;
+	/*
+	 * get_new_inode_fast() will do the right thing, re-trying the search
+	 * in case it had to block at any point.
+	 */
+	return get_new_inode_fast(sb, head, ino);
 }
+EXPORT_SYMBOL(iget_locked);
 
 /**
- *	insert_inode_hash - hash an inode
+ *	__insert_inode_hash - hash an inode
  *	@inode: unhashed inode
+ *	@hashval: unsigned long value used to locate this object in the
+ *		inode_hashtable.
  *
  *	Add an inode to the inode hash for this superblock. If the inode
  *	has no superblock it is added to a separate anonymous chain.
  */
  
-void insert_inode_hash(struct inode *inode)
+void __insert_inode_hash(struct inode *inode, unsigned long hashval)
 {
 	struct list_head *head = &anon_hash_chain;
 	if (inode->i_sb)
-		head = inode_hashtable + hash(inode->i_sb, inode->i_ino);
+		head = inode_hashtable + hash(inode->i_sb, hashval);
 	spin_lock(&inode_lock);
 	list_add(&inode->i_hash, head);
 	spin_unlock(&inode_lock);
===== fs/coda/cnode.c 1.8 vs edited =====
--- 1.8/fs/coda/cnode.c	Wed May 29 19:20:33 2002
+++ edited/fs/coda/cnode.c	Mon Feb 24 13:20:12 2003
@@ -27,11 +27,6 @@
 	return 1;
 }
 
-static int coda_inocmp(struct inode *inode, unsigned long ino, void *opaque)
-{
-	return (coda_fideq((ViceFid *)opaque, &(ITOC(inode)->c_fid)));
-}
-
 static struct inode_operations coda_symlink_inode_operations = {
 	readlink:	page_readlink,
 	follow_link:	page_follow_link,
@@ -62,29 +57,46 @@
                 init_special_inode(inode, inode->i_mode, attr->va_rdev);
 }
 
+static int coda_test_inode(struct inode *inode, void *data)
+{
+	ViceFid *fid = (ViceFid *)data;
+	return coda_fideq(&(ITOC(inode)->c_fid), fid);
+}
+
+static int coda_set_inode(struct inode *inode, void *data)
+{
+	ViceFid *fid = (ViceFid *)data;
+	ITOC(inode)->c_fid = *fid;
+	return 0;
+}
+
+static int coda_fail_inode(struct inode *inode, void *data)
+{
+	return -1;
+}
+
 struct inode * coda_iget(struct super_block * sb, ViceFid * fid,
 			 struct coda_vattr * attr)
 {
 	struct inode *inode;
 	struct coda_inode_info *cii;
-	ino_t ino = coda_f2i(fid);
 	struct coda_sb_info *sbi = coda_sbp(sb);
+	unsigned long hash = coda_f2i(fid);
 
-	down(&sbi->sbi_iget4_mutex);
-	inode = iget4(sb, ino, coda_inocmp, fid);
+	inode = iget5_locked(sb, hash, coda_test_inode, coda_set_inode, fid);
 
 	if ( !inode ) { 
-		CDEBUG(D_CNODE, "coda_iget: no inode\n");
-		up(&sbi->sbi_iget4_mutex);
 		return ERR_PTR(-ENOMEM);
 	}
 
-	/* check if the inode is already initialized */
-	cii = ITOC(inode);
-	if (coda_isnullfid(&cii->c_fid))
-		/* new, empty inode found... initializing */
-		cii->c_fid = *fid;
-	up(&sbi->sbi_iget4_mutex);
+	if (inode->i_state & I_NEW) {
+		cii = ITOC(inode);
+		/* we still need to set i_ino for things like stat(2) */
+		inode->i_ino = hash;
+		list_add(&cii->c_cilist, &sbi->sbi_cihead);
+		unlock_new_inode(inode);
+	}
+
 
 	/* always replace the attributes, type might have changed */
 	coda_fill_inode(inode, attr);
@@ -129,6 +141,7 @@
 		      struct ViceFid *newfid)
 {
 	struct coda_inode_info *cii;
+	unsigned long hash = coda_f2i(newfid);
 	
 	cii = ITOC(inode);
 
@@ -139,17 +152,16 @@
 	/* XXX we probably need to hold some lock here! */
 	remove_inode_hash(inode);
 	cii->c_fid = *newfid;
-	inode->i_ino = coda_f2i(newfid);
-	insert_inode_hash(inode);
+	inode->i_ino = hash;
+	__insert_inode_hash(inode, hash);
 }
 
 /* convert a fid to an inode. */
 struct inode *coda_fid_to_inode(ViceFid *fid, struct super_block *sb) 
 {
-	ino_t nr;
+	
 	struct inode *inode;
-	struct coda_inode_info *cii;
-	struct coda_sb_info *sbi;
+	unsigned long hash = coda_f2i(fid);
 
 	if ( !sb ) {
 		printk("coda_fid_to_inode: no sb!\n");
@@ -158,47 +170,29 @@
 
 	CDEBUG(D_INODE, "%s\n", coda_f2s(fid));
 
-	sbi = coda_sbp(sb);
-	nr = coda_f2i(fid);
-	down(&sbi->sbi_iget4_mutex);
-	inode = iget4(sb, nr, coda_inocmp, fid);
-	if ( !inode ) {
-		printk("coda_fid_to_inode: null from iget, sb %p, nr %ld.\n",
-		       sb, (long)nr);
-		goto out_unlock;
-	}
-
-	cii = ITOC(inode);
+	inode = iget5_locked(sb, hash, coda_test_inode, coda_fail_inode, fid);
+	if ( !inode )
+		return NULL;
 
-	/* The inode could already be purged due to memory pressure */
-	if (coda_isnullfid(&cii->c_fid)) {
-		inode->i_nlink = 0;
-		iput(inode);
-		goto out_unlock;
-	}
+	/* we should never see newly created inodes because we intentionally
+	 * fail in the initialization callback */
+	BUG_ON(inode->i_state & I_NEW);
 
         CDEBUG(D_INODE, "found %ld\n", inode->i_ino);
-	up(&sbi->sbi_iget4_mutex);
 	return inode;
-
-out_unlock:
-	up(&sbi->sbi_iget4_mutex);
-	return NULL;
 }
 
 /* the CONTROL inode is made without asking attributes from Venus */
 int coda_cnode_makectl(struct inode **inode, struct super_block *sb)
 {
-	int error = 0;
+	int error = -ENOMEM;
 
 	*inode = iget(sb, CTL_INO);
-	if ( *inode ) {
+	if (*inode) {
 		(*inode)->i_op = &coda_ioctl_inode_operations;
 		(*inode)->i_fop = &coda_ioctl_operations;
 		(*inode)->i_mode = 0444;
 		error = 0;
-	} else { 
-		error = -ENOMEM;
 	}
     
 	return error;
===== fs/coda/inode.c 1.9 vs edited =====
--- 1.9/fs/coda/inode.c	Wed May 29 19:17:41 2002
+++ edited/fs/coda/inode.c	Mon Feb 24 13:20:12 2003
@@ -34,7 +34,6 @@
 
 /* VFS super_block ops */
 static struct super_block *coda_read_super(struct super_block *, void *, int);
-static void coda_read_inode(struct inode *);
 static void coda_clear_inode(struct inode *);
 static void coda_put_super(struct super_block *);
 static int coda_statfs(struct super_block *sb, struct statfs *buf);
@@ -42,7 +41,6 @@
 /* exported operations */
 struct super_operations coda_super_operations =
 {
-	read_inode:	coda_read_inode,
 	clear_inode:	coda_clear_inode,
 	put_super:	coda_put_super,
 	statfs:		coda_statfs,
@@ -179,24 +177,6 @@
 
 	printk("Coda: Bye bye.\n");
 	kfree(sbi);
-}
-
-/* all filling in of inodes postponed until lookup */
-static void coda_read_inode(struct inode *inode)
-{
-	struct coda_sb_info *sbi = coda_sbp(inode->i_sb);
-	struct coda_inode_info *cii;
-
-        if (!sbi) BUG();
-
-	cii = ITOC(inode);
-	if (!coda_isnullfid(&cii->c_fid)) {
-            printk("coda_read_inode: initialized inode");
-            return;
-        }
-
-	cii->c_mapcount = 0;
-	list_add(&cii->c_cilist, &sbi->sbi_cihead);
 }
 
 static void coda_clear_inode(struct inode *inode)
===== fs/nfs/inode.c 1.18 vs edited =====
--- 1.18/fs/nfs/inode.c	Thu Aug 15 05:05:32 2002
+++ edited/fs/nfs/inode.c	Mon Feb 24 12:38:30 2003
@@ -45,7 +45,6 @@
 void nfs_zap_caches(struct inode *);
 static void nfs_invalidate_inode(struct inode *);
 
-static void nfs_read_inode(struct inode *);
 static void nfs_write_inode(struct inode *,int);
 static void nfs_delete_inode(struct inode *);
 static void nfs_put_super(struct super_block *);
@@ -55,7 +54,6 @@
 static int  nfs_show_options(struct seq_file *, struct vfsmount *);
 
 static struct super_operations nfs_sops = { 
-	read_inode:	nfs_read_inode,
 	write_inode:	nfs_write_inode,
 	delete_inode:	nfs_delete_inode,
 	put_super:	nfs_put_super,
@@ -92,30 +90,6 @@
 	return nfs_fileid_to_ino_t(fattr->fileid);
 }
 
-/*
- * The "read_inode" function doesn't actually do anything:
- * the real data is filled in later in nfs_fhget. Here we
- * just mark the cache times invalid, and zero out i_mode
- * (the latter makes "nfs_refresh_inode" do the right thing
- * wrt pipe inodes)
- */
-static void
-nfs_read_inode(struct inode * inode)
-{
-	inode->i_blksize = inode->i_sb->s_blocksize;
-	inode->i_mode = 0;
-	inode->i_rdev = 0;
-	/* We can't support UPDATE_ATIME(), since the server will reset it */
-	inode->i_flags |= S_NOATIME;
-	INIT_LIST_HEAD(&inode->u.nfs_i.read);
-	INIT_LIST_HEAD(&inode->u.nfs_i.dirty);
-	INIT_LIST_HEAD(&inode->u.nfs_i.commit);
-	INIT_LIST_HEAD(&inode->u.nfs_i.writeback);
-	NFS_CACHEINV(inode);
-	NFS_ATTRTIMEO(inode) = NFS_MINATTRTIMEO(inode);
-	NFS_ATTRTIMEO_UPDATE(inode) = jiffies;
-}
-
 static void
 nfs_write_inode(struct inode *inode, int sync)
 {
@@ -634,7 +608,6 @@
 	 * do this once. (We don't allow inodes to change types.)
 	 */
 	if (inode->i_mode == 0) {
-		NFS_FILEID(inode) = fattr->fileid;
 		inode->i_mode = fattr->mode;
 		/* Why so? Because we want revalidate for devices/FIFOs, and
 		 * that's precisely what we have in nfs_file_inode_operations.
@@ -650,9 +623,7 @@
 			inode->i_op = &nfs_symlink_inode_operations;
 		else
 			init_special_inode(inode, inode->i_mode, fattr->rdev);
-		memcpy(&inode->u.nfs_i.fh, fh, sizeof(inode->u.nfs_i.fh));
 	}
-	nfs_refresh_inode(inode, fattr);
 }
 
 struct nfs_find_desc {
@@ -667,7 +638,7 @@
  * i_ino.
  */
 static int
-nfs_find_actor(struct inode *inode, unsigned long ino, void *opaque)
+nfs_find_actor(struct inode *inode, void *opaque)
 {
 	struct nfs_find_desc	*desc = (struct nfs_find_desc *)opaque;
 	struct nfs_fh		*fh = desc->fh;
@@ -685,6 +656,18 @@
 	return 1;
 }
 
+static int
+nfs_init_locked(struct inode *inode, void *opaque)
+{
+	struct nfs_find_desc	*desc = (struct nfs_find_desc *)opaque;
+	struct nfs_fh		*fh = desc->fh;
+	struct nfs_fattr	*fattr = desc->fattr;
+
+	NFS_FILEID(inode) = fattr->fileid;
+	memcpy(NFS_FH(inode), fh, sizeof(struct nfs_fh));
+	return 0;
+}
+
 /*
  * This is our own version of iget that looks up inodes by file handle
  * instead of inode number.  We use this technique instead of using
@@ -712,7 +695,7 @@
 {
 	struct nfs_find_desc desc = { fh, fattr };
 	struct inode *inode = NULL;
-	unsigned long ino;
+	unsigned long hash;
 
 	if ((fattr->valid & NFS_ATTR_FATTR) == 0)
 		goto out_no_inode;
@@ -722,12 +705,29 @@
 		goto out_no_inode;
 	}
 
-	ino = nfs_fattr_to_ino_t(fattr);
+	hash = nfs_fattr_to_ino_t(fattr);
 
-	if (!(inode = iget4(sb, ino, nfs_find_actor, &desc)))
+	if (!(inode = iget5_locked(sb, hash, nfs_find_actor, nfs_init_locked, &desc)))
 		goto out_no_inode;
 
-	nfs_fill_inode(inode, fh, fattr);
+        if (inode->i_state & I_NEW) {
+		inode->i_ino = hash;
+		inode->i_blksize = inode->i_sb->s_blocksize;
+		inode->i_mode = 0;
+		inode->i_rdev = 0;
+		/* We can't support UPDATE_ATIME(), since the server will reset it */
+		inode->i_flags |= S_NOATIME;
+		INIT_LIST_HEAD(&inode->u.nfs_i.read);
+		INIT_LIST_HEAD(&inode->u.nfs_i.dirty);
+		INIT_LIST_HEAD(&inode->u.nfs_i.commit);
+		INIT_LIST_HEAD(&inode->u.nfs_i.writeback);
+		NFS_CACHEINV(inode);
+		NFS_ATTRTIMEO(inode) = NFS_MINATTRTIMEO(inode);
+		NFS_ATTRTIMEO_UPDATE(inode) = jiffies;
+		nfs_fill_inode(inode, fh, fattr);
+		unlock_new_inode(inode);
+	} else
+		nfs_refresh_inode(inode, fattr);
 	dprintk("NFS: __nfs_fhget(%x/%Ld ct=%d)\n",
 		inode->i_dev, (long long)NFS_FILEID(inode),
 		atomic_read(&inode->i_count));
===== fs/reiserfs/inode.c 1.42 vs edited =====
--- 1.42/fs/reiserfs/inode.c	Thu Feb 13 15:42:42 2003
+++ edited/fs/reiserfs/inode.c	Fri Feb 21 14:29:42 2003
@@ -30,7 +30,7 @@
     lock_kernel() ; 
 
     /* The = 0 happens when we abort creating a new inode for some reason like lack of space.. */
-    if (INODE_PKEY(inode)->k_objectid != 0) { /* also handles bad_inode case */
+    if (!(inode->i_state & I_NEW) && INODE_PKEY(inode)->k_objectid != 0) { /* also handles bad_inode case */
 	down (&inode->i_sem); 
 
 	journal_begin(&th, inode->i_sb, jbegin_count) ;
@@ -887,7 +887,7 @@
 // item version directly
 //
 
-// called by read_inode
+// called by read_locked_inode
 static void init_inode (struct inode * inode, struct path * path)
 {
     struct buffer_head * bh;
@@ -1127,27 +1127,24 @@
     make_bad_inode(inode);
 }
 
-void reiserfs_read_inode(struct inode *inode) {
-    reiserfs_make_bad_inode(inode) ;
+int reiserfs_init_locked_inode (struct inode * inode, void *p)
+{
+    struct reiserfs_iget_args *args = (struct reiserfs_iget_args *)p ;
+    inode->i_ino = args->objectid;
+    INODE_PKEY(inode)->k_dir_id = cpu_to_le32(args->dirid);
+    return 0;
 }
 
-
 /* looks for stat data in the tree, and fills up the fields of in-core
    inode stat data fields */
-void reiserfs_read_inode2 (struct inode * inode, void *p)
+void reiserfs_read_locked_inode (struct inode * inode, struct reiserfs_iget_args *args)
 {
     INITIALIZE_PATH (path_to_sd);
     struct cpu_key key;
-    struct reiserfs_iget4_args *args = (struct reiserfs_iget4_args *)p ;
     unsigned long dirino;
     int retval;
 
-    if (!p) {
-	reiserfs_make_bad_inode(inode) ;
-	return;
-    }
-
-    dirino = args->objectid ;
+    dirino = args->dirid ;
 
     /* set version 1, version 2 could be used too, because stat data
        key is the same in both versions */
@@ -1160,7 +1157,7 @@
     /* look for the object's stat data */
     retval = search_item (inode->i_sb, &key, &path_to_sd);
     if (retval == IO_ERROR) {
-	reiserfs_warning ("vs-13070: reiserfs_read_inode2: "
+	reiserfs_warning ("vs-13070: reiserfs_read_locked_inode: "
                     "i/o failure occurred trying to find stat data of %K\n",
                     &key);
 	reiserfs_make_bad_inode(inode) ;
@@ -1192,7 +1189,7 @@
        during mount (fs/reiserfs/super.c:finish_unfinished()). */
     if( ( inode -> i_nlink == 0 ) && 
 	! inode -> i_sb -> u.reiserfs_sb.s_is_unlinked_ok ) {
-	    reiserfs_warning( "vs-13075: reiserfs_read_inode2: "
+	    reiserfs_warning( "vs-13075: reiserfs_read_locked_inode: "
 			      "dead inode read from disk %K. "
 			      "This is likely to be race with knfsd. Ignore\n", 
 			      &key );
@@ -1204,38 +1201,43 @@
 }
 
 /**
- * reiserfs_find_actor() - "find actor" reiserfs supplies to iget4().
+ * reiserfs_find_actor() - "find actor" reiserfs supplies to iget5_locked().
  *
  * @inode:    inode from hash table to check
- * @inode_no: inode number we are looking for
- * @opaque:   "cookie" passed to iget4(). This is &reiserfs_iget4_args.
+ * @opaque:   "cookie" passed to iget5_locked(). This is &reiserfs_iget_args.
  *
- * This function is called by iget4() to distinguish reiserfs inodes
+ * This function is called by iget5_locked() to distinguish reiserfs inodes
  * having the same inode numbers. Such inodes can only exist due to some
  * error condition. One of them should be bad. Inodes with identical
  * inode numbers (objectids) are distinguished by parent directory ids.
  *
  */
-static int reiserfs_find_actor( struct inode *inode, 
-				unsigned long inode_no, void *opaque )
+int reiserfs_find_actor( struct inode *inode, void *opaque )
 {
-    struct reiserfs_iget4_args *args;
+    struct reiserfs_iget_args *args;
 
     args = opaque;
     /* args is already in CPU order */
-    return le32_to_cpu(INODE_PKEY(inode)->k_dir_id) == args -> objectid;
+    return (inode->i_ino == args->objectid) &&
+	(le32_to_cpu(INODE_PKEY(inode)->k_dir_id) == args->dirid);
 }
 
 struct inode * reiserfs_iget (struct super_block * s, const struct cpu_key * key)
 {
     struct inode * inode;
-    struct reiserfs_iget4_args args ;
+    struct reiserfs_iget_args args ;
 
-    args.objectid = key->on_disk_key.k_dir_id ;
-    inode = iget4 (s, key->on_disk_key.k_objectid, 
-		   reiserfs_find_actor, (void *)(&args));
+    args.objectid = key->on_disk_key.k_objectid ;
+    args.dirid = key->on_disk_key.k_dir_id ;
+    inode = iget5_locked (s, key->on_disk_key.k_objectid, 
+		   reiserfs_find_actor, reiserfs_init_locked_inode, (void *)(&args));
     if (!inode) 
 	return ERR_PTR(-ENOMEM) ;
+
+    if (inode->i_state & I_NEW) {
+	reiserfs_read_locked_inode(inode, &args);
+	unlock_new_inode(inode);
+    }
 
     if (comp_short_keys (INODE_PKEY (inode), key) || is_bad_inode (inode)) {
 	/* either due to i/o error or a stale NFS handle */
===== fs/reiserfs/super.c 1.27 vs edited =====
--- 1.27/fs/reiserfs/super.c	Wed Oct 30 19:42:36 2002
+++ edited/fs/reiserfs/super.c	Fri Feb 21 14:28:06 2003
@@ -381,8 +381,6 @@
 
 struct super_operations reiserfs_sops = 
 {
-  read_inode: reiserfs_read_inode,
-  read_inode2: reiserfs_read_inode2,
   write_inode: reiserfs_write_inode,
   dirty_inode: reiserfs_dirty_inode,
   delete_inode: reiserfs_delete_inode,
@@ -1117,7 +1115,7 @@
     int old_format = 0;
     unsigned long blocks;
     int jinit_done = 0 ;
-    struct reiserfs_iget4_args args ;
+    struct reiserfs_iget_args args ;
     int old_magic;
     struct reiserfs_super_block * rs;
 
@@ -1194,11 +1192,17 @@
         printk("clm-7000: Detected readonly device, marking FS readonly\n") ;
 	s->s_flags |= MS_RDONLY ;
     }
-    args.objectid = REISERFS_ROOT_PARENT_OBJECTID ;
-    root_inode = iget4 (s, REISERFS_ROOT_OBJECTID, 0, (void *)(&args));
+    args.objectid = REISERFS_ROOT_OBJECTID ;
+    args.dirid = REISERFS_ROOT_PARENT_OBJECTID ;
+    root_inode = iget5_locked (s, REISERFS_ROOT_OBJECTID, reiserfs_find_actor, reiserfs_init_locked_inode, (void *)(&args));
     if (!root_inode) {
 	printk ("reiserfs_read_super: get root inode failed\n");
 	goto error;
+    }
+ 
+    if (root_inode->i_state & I_NEW) {
+	reiserfs_read_locked_inode(root_inode, &args);
+	unlock_new_inode(root_inode);
     }
 
     s->s_root = d_alloc_root(root_inode);  
===== include/linux/fs.h 1.74 vs edited =====
--- 1.74/include/linux/fs.h	Sat Jan  4 06:09:16 2003
+++ edited/include/linux/fs.h	Mon Mar  3 19:32:51 2003
@@ -885,13 +885,6 @@
 
 	void (*read_inode) (struct inode *);
   
-  	/* reiserfs kludge.  reiserfs needs 64 bits of information to
-    	** find an inode.  We are using the read_inode2 call to get
-   	** that information.  We don't like this, and are waiting on some
-   	** VFS changes for the real solution.
-   	** iget4 calls read_inode2, iff it is defined
-   	*/
-    	void (*read_inode2) (struct inode *, void *) ;
    	void (*dirty_inode) (struct inode *);
 	void (*write_inode) (struct inode *, int);
 	void (*put_inode) (struct inode *);
@@ -940,6 +933,7 @@
 #define I_LOCK			8
 #define I_FREEING		16
 #define I_CLEAR			32
+#define I_NEW			64
 
 #define I_DIRTY (I_DIRTY_SYNC | I_DIRTY_DATASYNC | I_DIRTY_PAGES)
 
@@ -1378,19 +1372,36 @@
 extern struct inode * igrab(struct inode *);
 extern ino_t iunique(struct super_block *, ino_t);
 
-typedef int (*find_inode_t)(struct inode *, unsigned long, void *);
-extern struct inode * iget4(struct super_block *, unsigned long, find_inode_t, void *);
+extern struct inode *ilookup5(struct super_block *sb, unsigned long hashval,
+		int (*test)(struct inode *, void *), void *data);
+extern struct inode *ilookup(struct super_block *sb, unsigned long ino);
+
+extern struct inode * iget5_locked(struct super_block *, unsigned long, int (*test)(struct inode *, void *), int (*set)(struct inode *, void *), void *);
+extern struct inode * iget_locked(struct super_block *, unsigned long);
+extern void unlock_new_inode(struct inode *);
+
 static inline struct inode *iget(struct super_block *sb, unsigned long ino)
 {
-	return iget4(sb, ino, NULL, NULL);
+	struct inode *inode = iget_locked(sb, ino);
+
+	if (inode && (inode->i_state & I_NEW)) {
+		sb->s_op->read_inode(inode);
+		unlock_new_inode(inode);
+	}
+
+	return inode;
 }
 
 extern void clear_inode(struct inode *);
 extern struct inode *new_inode(struct super_block *sb);
 extern void remove_suid(struct inode *inode);
 
-extern void insert_inode_hash(struct inode *);
+extern void __insert_inode_hash(struct inode *, unsigned long hashval);
 extern void remove_inode_hash(struct inode *);
+static inline void insert_inode_hash(struct inode *inode) {
+	__insert_inode_hash(inode, inode->i_ino);
+}
+
 extern struct file * get_empty_filp(void);
 extern void file_move(struct file *f, struct list_head *list);
 extern struct buffer_head * get_hash_table(kdev_t, int, int);
===== include/linux/reiserfs_fs.h 1.26 vs edited =====
--- 1.26/include/linux/reiserfs_fs.h	Mon Jan 20 13:19:30 2003
+++ edited/include/linux/reiserfs_fs.h	Mon Mar  3 19:34:27 2003
@@ -1478,8 +1478,9 @@
 #define B_I_POS_UNFM_POINTER(bh,ih,pos) le32_to_cpu(*(((unp_t *)B_I_PITEM(bh,ih)) + (pos)))
 #define PUT_B_I_POS_UNFM_POINTER(bh,ih,pos, val) do {*(((unp_t *)B_I_PITEM(bh,ih)) + (pos)) = cpu_to_le32(val); } while (0)
 
-struct reiserfs_iget4_args {
+struct reiserfs_iget_args {
     __u32 objectid ;
+    __u32 dirid ;
 } ;
 
 /***************************************************************************/
@@ -1730,8 +1731,9 @@
 
 /* inode.c */
 
-void reiserfs_read_inode (struct inode * inode) ;
-void reiserfs_read_inode2(struct inode * inode, void *p) ;
+void reiserfs_read_locked_inode(struct inode * inode, struct reiserfs_iget_args *args) ;
+int reiserfs_find_actor(struct inode * inode, void *p) ;
+int reiserfs_init_locked_inode(struct inode * inode, void *p) ;
 void reiserfs_delete_inode (struct inode * inode);
 void reiserfs_write_inode (struct inode * inode, int) ;
 struct dentry *reiserfs_fh_to_dentry(struct super_block *sb, __u32 *data,
===== kernel/ksyms.c 1.67 vs edited =====
--- 1.67/kernel/ksyms.c	Tue Oct  1 22:34:41 2002
+++ edited/kernel/ksyms.c	Fri Feb 21 14:19:50 2003
@@ -140,7 +140,6 @@
 EXPORT_SYMBOL(fget);
 EXPORT_SYMBOL(igrab);
 EXPORT_SYMBOL(iunique);
-EXPORT_SYMBOL(iget4);
 EXPORT_SYMBOL(iput);
 EXPORT_SYMBOL(inode_init_once);
 EXPORT_SYMBOL(force_delete);
@@ -528,7 +527,7 @@
 EXPORT_SYMBOL(read_ahead);
 EXPORT_SYMBOL(get_hash_table);
 EXPORT_SYMBOL(new_inode);
-EXPORT_SYMBOL(insert_inode_hash);
+EXPORT_SYMBOL(__insert_inode_hash);
 EXPORT_SYMBOL(remove_inode_hash);
 EXPORT_SYMBOL(buffer_insert_list);
 EXPORT_SYMBOL(make_bad_inode);
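
For readers following the patch above, the calling convention it introduces
can be sketched in a few lines. What follows is only a single-threaded
userspace model, not kernel code: the model_* names, the cut-down struct
inode, the fixed-size hash table and the dirid field (standing in for
INODE_PKEY(inode)->k_dir_id) are assumptions made for illustration, and the
inode_lock and wait_on_inode() handling of the real functions is left out.
It shows the lookup matching on both objectid and dirid through the test
callback, the set callback filling in the identity of a freshly allocated
inode, and the caller finishing initialization only when I_NEW is set.

```c
/* Userspace model of the iget5_locked() calling pattern; not kernel code. */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define I_LOCK   8
#define I_NEW    64
#define NBUCKETS 16

struct inode {
	unsigned long i_ino;
	unsigned long i_state;
	unsigned int i_count;
	unsigned int dirid;         /* stands in for INODE_PKEY(inode)->k_dir_id */
	struct inode *i_hash_next;  /* simplified singly linked hash chain */
};

struct iget_args {
	unsigned int objectid;
	unsigned int dirid;
};

static struct inode *hashtable[NBUCKETS];

/* test callback: an inode matches only if both halves of the key match */
static int model_find_actor(struct inode *inode, void *data)
{
	struct iget_args *args = data;

	return inode->i_ino == args->objectid && inode->dirid == args->dirid;
}

/* set callback: record the identity of a new inode before it is hashed */
static int model_init_locked(struct inode *inode, void *data)
{
	struct iget_args *args = data;

	inode->i_ino = args->objectid;
	inode->dirid = args->dirid;
	return 0;
}

/* Return a cached inode, or a new one that is hashed and flagged I_NEW. */
struct inode *model_iget5_locked(unsigned long hashval,
		int (*test)(struct inode *, void *),
		int (*set)(struct inode *, void *), void *data)
{
	struct inode **head = &hashtable[hashval % NBUCKETS];
	struct inode *inode;

	for (inode = *head; inode; inode = inode->i_hash_next) {
		if (test(inode, data)) {
			inode->i_count++;
			return inode;
		}
	}

	inode = calloc(1, sizeof(*inode));
	if (!inode)
		return NULL;
	if (set(inode, data)) {
		free(inode);
		return NULL;
	}
	inode->i_count = 1;
	inode->i_state = I_LOCK | I_NEW;
	inode->i_hash_next = *head;
	*head = inode;
	return inode;
}

void model_unlock_new_inode(struct inode *inode)
{
	assert(inode->i_state & I_NEW);  /* only new inodes are unlocked here */
	inode->i_state &= ~(unsigned long)(I_LOCK | I_NEW);
}

/* Caller side, shaped like reiserfs_iget() in the patch. */
struct inode *model_reiserfs_iget(unsigned int dirid, unsigned int objectid)
{
	struct iget_args args = { objectid, dirid };
	struct inode *inode;

	inode = model_iget5_locked(objectid, model_find_actor,
				   model_init_locked, &args);
	if (inode && (inode->i_state & I_NEW)) {
		/* the real code reads the stat data from disk at this point */
		model_unlock_new_inode(inode);
	}
	return inode;
}
```

In this model, two lookups with the same objectid but different dirid values
return distinct inodes, while repeating a lookup returns the cached inode
with its reference count raised.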


* Re: nfsd-fh: found a name that I didn't expect
  2003-03-29 18:45               ` Soeren Sonnenburg
@ 2003-03-31  8:37                 ` Oleg Drokin
  2003-03-31  8:43                   ` Soeren Sonnenburg
  0 siblings, 1 reply; 27+ messages in thread
From: Oleg Drokin @ 2003-03-31  8:37 UTC (permalink / raw)
  To: Soeren Sonnenburg; +Cc: reiserfs-list

Hello!

On Sat, Mar 29, 2003 at 07:45:34PM +0100, Soeren Sonnenburg wrote:
> > > > I think may be race with iget4 might cause this, but I am not sure.
> > > > We have a patch for this iget4 race, and if you are willing to test it, I
> > > > can send it to you.
> > > Yes of course I am. I hope the other admins won't kill me, but I guess they 
> > > know that I willing to do experiments from time to time ;-)
> > > Could something worse happen when I try the patch, that could not be fixed by 
> > > simple rebooting ?
> > See the patch below.
> > It should not break anything.
> Well, we keep getting this error in massess too... and in our case it is
> very likely that clients access the same file from all the clients...

Are all of the NFS exports where these errors happen on reiserfs?

> However I cannot quickly test that atm but if it is really safe I might
> test it next week.

Well, there is a problem reported with that 2.4.20 backport that I am going to look into.
If you can try 2.4.21-pre6 with iget5_locked.diff (which I just sent to the list in this same
thread), that would be great.

Thank you.

Bye,
    Oleg


* Re: nfsd-fh: found a name that I didn't expect
  2003-03-31  8:37                 ` Oleg Drokin
@ 2003-03-31  8:43                   ` Soeren Sonnenburg
  2003-03-31  9:05                     ` Oleg Drokin
  0 siblings, 1 reply; 27+ messages in thread
From: Soeren Sonnenburg @ 2003-03-31  8:43 UTC (permalink / raw)
  To: Oleg Drokin; +Cc: Soeren Sonnenburg, reiserfs-list



On Mon, 31 Mar 2003, Oleg Drokin wrote:

> Hello!
> 
> On Sat, Mar 29, 2003 at 07:45:34PM +0100, Soeren Sonnenburg wrote:
> > > > > I think may be race with iget4 might cause this, but I am not sure.
> > > > > We have a patch for this iget4 race, and if you are willing to test it, I
> > > > > can send it to you.
> > > > Yes of course I am. I hope the other admins won't kill me, but I guess they 
> > > > know that I willing to do experiments from time to time ;-)
> > > > Could something worse happen when I try the patch, that could not be fixed by 
> > > > simple rebooting ?
> > > See the patch below.
> > > It should not break anything.
> > Well, we keep getting this error in massess too... and in our case it is
> > very likely that clients access the same file from all the clients...
> 
> Are all of your NFS exports where these errors happens reiserfs?

This is the same machine I was telling you about on lkml ...
The errors I got now are all on the most actively used nfs share which
uses reiserfs as the underlying filesystem.

> > However I cannot quickly test that atm but if it is really safe I might
> > test it next week.
> 
> Well, there is a problem reported with that 2.4.20 backport, that I am going to look into.
> If you can try 2.4.21-pre6 with iget5_locked.diff (I just sent to the list in this same
> thread) that would be great.
> 
> Thank you.

Well, I can't do experimental testing, as there are about 40 diskless
machines using that disk ... I could run some stress tests sometime in
the evening/night, but that's it...

Soeren
 
> Bye,
>     Oleg
> 


* Re: nfsd-fh: found a name that I didn't expect
  2003-03-31  8:43                   ` Soeren Sonnenburg
@ 2003-03-31  9:05                     ` Oleg Drokin
  0 siblings, 0 replies; 27+ messages in thread
From: Oleg Drokin @ 2003-03-31  9:05 UTC (permalink / raw)
  To: Soeren Sonnenburg; +Cc: Soeren Sonnenburg, reiserfs-list

Hello!

On Mon, Mar 31, 2003 at 10:43:56AM +0200, Soeren Sonnenburg wrote:
> > > > > Yes of course I am. I hope the other admins won't kill me, but I guess they 
> > > > > know that I willing to do experiments from time to time ;-)
> > > > > Could something worse happen when I try the patch, that could not be fixed by 
> > > > > simple rebooting ?
> > > > See the patch below.
> > > > It should not break anything.
> > > Well, we keep getting this error in massess too... and in our case it is
> > > very likely that clients access the same file from all the clients...
> > Are all of your NFS exports where these errors happens reiserfs?
> This is the same machine I was telling you about on lkml ...

Ah, ok.

> well, I can't do experimental testing as there are like 40
> diskless machines using
> that disk ... I could somewhen in the evening/night do some stress test
> but thats it...

Ok, I understand. Well, I guess I will spend some time on this iget5 2.4.20 backport
and on trying to reproduce the problem locally.

Bye,
    Oleg

* Re: nfsd-fh: found a name that I didn't expect
  2003-03-30 15:08               ` Bernd Schubert
  2003-03-31  8:33                 ` Oleg Drokin
@ 2003-03-31 10:24                 ` Oleg Drokin
  2003-03-31 10:37                   ` Soeren Sonnenburg
                                     ` (2 more replies)
  1 sibling, 3 replies; 27+ messages in thread
From: Oleg Drokin @ 2003-03-31 10:24 UTC (permalink / raw)
  To: Bernd Schubert, reiserfs; +Cc: reiserfs-list

Hello!

On Sun, Mar 30, 2003 at 04:08:11PM +0100, Bernd Schubert wrote:

> Anyway I also tested the kernel-patch, but it causes problems with mounts from 
> another nfs-server. Well, mounting works fine, but e.g. running 'ls 

Ah, stupid me.
I have another patch that we created before the iget5_locked backport.
It's a bit of a hack to beat the race, but it works (fixes the race).
It does not touch any code outside of reiserfs.
So please apply it to 2.4.20 and see whether you can still see the NFS problem
over time.
The patch is below.

Bye,
    Oleg

===== fs/reiserfs/inode.c 1.42 vs edited =====
--- 1.42/fs/reiserfs/inode.c	Thu Feb 13 15:42:42 2003
+++ edited/fs/reiserfs/inode.c	Thu Feb 20 17:23:24 2003
@@ -20,6 +20,10 @@
 static int reiserfs_get_block (struct inode * inode, long block,
 			       struct buffer_head * bh_result, int create);
 
+/* This spinlock guards inode pkey in private part of inode
+   against race between find_actor() vs reiserfs_read_inode2 */
+static spinlock_t keycopy_lock = SPIN_LOCK_UNLOCKED;
+
 void reiserfs_delete_inode (struct inode * inode)
 {
     int jbegin_count = JOURNAL_PER_BALANCE_CNT * 2; 
@@ -898,8 +902,9 @@
     bh = PATH_PLAST_BUFFER (path);
     ih = PATH_PITEM_HEAD (path);
 
-
+    spin_lock(&keycopy_lock);
     copy_key (INODE_PKEY (inode), &(ih->ih_key));
+    spin_unlock(&keycopy_lock);
     inode->i_blksize = PAGE_SIZE;
 
     INIT_LIST_HEAD(&inode->u.reiserfs_i.i_prealloc_list) ;
@@ -1220,10 +1225,27 @@
 				unsigned long inode_no, void *opaque )
 {
     struct reiserfs_iget4_args *args;
+    int retval;
 
     args = opaque;
+    /* We protect against possible parallel init_inode() on another CPU here. */
+    spin_lock(&keycopy_lock);
     /* args is already in CPU order */
-    return le32_to_cpu(INODE_PKEY(inode)->k_dir_id) == args -> objectid;
+    if (le32_to_cpu(INODE_PKEY(inode)->k_dir_id) == args -> objectid)
+	retval = 1;
+    else
+	/* If The key does not match, lets see if we are racing
+	   with another iget4, that already progressed so far
+	   to reiserfs_read_inode2() and was preempted in
+	   call to search_by_key(). The signs of that are:
+	     Inode is locked
+	     dirid and object id are zero (not yet initialized)*/
+	retval = (inode->i_state & I_LOCK) &&
+		 !INODE_PKEY(inode)->k_dir_id &&
+		 !INODE_PKEY(inode)->k_objectid;
+
+    spin_unlock(&keycopy_lock);
+    return retval;
 }
 
 struct inode * reiserfs_iget (struct super_block * s, const struct cpu_key * key)
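
For readers following the patch, the locking idea can be sketched in userspace. This is only a minimal illustration under stated assumptions, not the kernel code: a pthread mutex stands in for the kernel spinlock, the `locked` field stands in for `inode->i_state & I_LOCK`, and the names `sketch_inode`, `sketch_copy_key` and `sketch_find_actor` are hypothetical.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the key kept in the in-core inode. */
struct sketch_inode {
    uint32_t dir_id;    /* 0 until the key has been copied in */
    uint32_t objectid;  /* 0 until the key has been copied in */
    int locked;         /* stands in for (i_state & I_LOCK) */
};

/* One global lock, as in the patch; a mutex replaces the kernel spinlock. */
static pthread_mutex_t keycopy_lock = PTHREAD_MUTEX_INITIALIZER;

/* Writer side: both key fields are published under the lock, so a
   concurrent reader never observes a half-copied key. */
static void sketch_copy_key(struct sketch_inode *inode,
                            uint32_t dir_id, uint32_t objectid)
{
    pthread_mutex_lock(&keycopy_lock);
    inode->dir_id = dir_id;
    inode->objectid = objectid;
    pthread_mutex_unlock(&keycopy_lock);
}

/* Reader side: returns 1 if the directory id matches, or if the inode
   still looks like it is being initialized (locked and key all zero). */
static int sketch_find_actor(struct sketch_inode *inode,
                             uint32_t wanted_dir_id)
{
    int retval;

    pthread_mutex_lock(&keycopy_lock);
    if (inode->dir_id == wanted_dir_id)
        retval = 1;
    else
        retval = inode->locked && !inode->dir_id && !inode->objectid;
    pthread_mutex_unlock(&keycopy_lock);
    return retval;
}
```

In this sketch, a lookup that hits an inode whose key has not been filled in yet treats it as a match instead of concluding that no such inode exists, which mirrors the second branch added to the find actor in the patch above.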

* Re: nfsd-fh: found a name that I didn't expect
  2003-03-31 10:24                 ` Oleg Drokin
@ 2003-03-31 10:37                   ` Soeren Sonnenburg
  2003-03-31 10:41                     ` Oleg Drokin
  2003-03-31 10:49                   ` Bernd Schubert
  2003-08-06 21:00                   ` John Dalbec
  2 siblings, 1 reply; 27+ messages in thread
From: Soeren Sonnenburg @ 2003-03-31 10:37 UTC (permalink / raw)
  To: Oleg Drokin; +Cc: Bernd Schubert, reiserfs, reiserfs-list



On Mon, 31 Mar 2003, Oleg Drokin wrote:

> Hello!
> 
> On Sun, Mar 30, 2003 at 04:08:11PM +0100, Bernd Schubert wrote:
> 
> > Anyway I also tested the kernel-patch, but it causes problems with mounts from 
> > another nfs-server. Well, mounting works fine, but e.g. running 'ls 
> 
> Ah, stupid me.
> I have another patch that we created before iget5_locked backport.
> It's a bit of hack to beat the race, but it works (fixes the race).
> It does not touch any code outside of reiserfs.
> So if you apply that to 2.4.20 and see if you still can see the NFS problem
> over time...
> Patch is below.
> 

Err, sorry but I have to ask again:

This problem needs a patch on the server only, yes?

Then I will at least patch+compile the kernel.

Soeren.


* Re: nfsd-fh: found a name that I didn't expect
  2003-03-31 10:37                   ` Soeren Sonnenburg
@ 2003-03-31 10:41                     ` Oleg Drokin
  2003-03-31 13:28                       ` Soeren Sonnenburg
  0 siblings, 1 reply; 27+ messages in thread
From: Oleg Drokin @ 2003-03-31 10:41 UTC (permalink / raw)
  To: Soeren Sonnenburg; +Cc: Bernd Schubert, reiserfs, reiserfs-list

Hello!

On Mon, Mar 31, 2003 at 12:37:12PM +0200, Soeren Sonnenburg wrote:
> > I have another patch that we created before iget5_locked backport.
> > It's a bit of hack to beat the race, but it works (fixes the race).
> > It does not touch any code outside of reiserfs.
> > So if you apply that to 2.4.20 and see if you still can see the NFS problem
> > over time...
> > Patch is below.
> Err, sorry but I have to ask again:
> This problem needs a patch on the server only - yes ?

Yes. Well, maybe this race is not the cause of the problem you are observing,
but the prerequisites are similar, so we'd better find out whether this is a different problem.

> Then I will at least patch+compile the kernel.

Thank you.

Bye,
    Oleg

* Re: nfsd-fh: found a name that I didn't expect
  2003-03-31 10:24                 ` Oleg Drokin
  2003-03-31 10:37                   ` Soeren Sonnenburg
@ 2003-03-31 10:49                   ` Bernd Schubert
  2003-08-06 21:00                   ` John Dalbec
  2 siblings, 0 replies; 27+ messages in thread
From: Bernd Schubert @ 2003-03-31 10:49 UTC (permalink / raw)
  To: Oleg Drokin, reiserfs; +Cc: reiserfs-list

On Monday 31 March 2003 12:24, Oleg Drokin wrote:

Hello Oleg!

>
> On Sun, Mar 30, 2003 at 04:08:11PM +0100, Bernd Schubert wrote:
> > Anyway I also tested the kernel-patch, but it causes problems with mounts
> > from another nfs-server. Well, mounting works fine, but e.g. running 'ls
>
> Ah, stupid me.
> I have another patch that we created before iget5_locked backport.
> It's a bit of hack to beat the race, but it works (fixes the race).
> It does not touch any code outside of reiserfs.
> So if you apply that to 2.4.20 and see if you still can see the NFS problem
> over time...
> Patch is below.


Thanks a lot for both new patches. I will try them as soon as possible, though 
it is usually more difficult to reboot during the week (so it may be next 
weekend before I can try them). Anyway, I will also try them on our data 
server, where we also see those messages from time to time, even without 
rebooting.


Best regards,
	Bernd


* Re: nfsd-fh: found a name that I didn't expect
  2003-03-31 10:41                     ` Oleg Drokin
@ 2003-03-31 13:28                       ` Soeren Sonnenburg
  2003-06-02 15:01                         ` Oleg Drokin
  0 siblings, 1 reply; 27+ messages in thread
From: Soeren Sonnenburg @ 2003-03-31 13:28 UTC (permalink / raw)
  To: Oleg Drokin; +Cc: Bernd Schubert, reiserfs, reiserfs-list



On Mon, 31 Mar 2003, Oleg Drokin wrote:

> > Err, sorry but I have to ask again:
> > This problem needs a patch on the server only - yes ?
> 
> Yes. Well, may be this race is not the cause of the problem you are observing,
> but prerequisites are similar, so we'd better make sure this is different problem.

Well, I know that these files were accessed simultaneously from a number of
clients... so it is very likely that this is the same bug...

> > Then I will at least patch+compile the kernel.
> 
> Thank you.

I will install this kernel Tuesday night (tomorrow) and run some tests too.

We will know on Wednesday whether that was the cause.

Soeren.


* Re: nfsd-fh: found a name that I didn't expect
       [not found]   ` <20030331122820.D25533@namesys.com>
@ 2003-03-31 21:44     ` Roland
  0 siblings, 0 replies; 27+ messages in thread
From: Roland @ 2003-03-31 21:44 UTC (permalink / raw)
  To: Oleg Drokin; +Cc: reiserfs-list

Hi Oleg,

I forgot to CC reiserfs-list last time, so I am doing a full quote.

On Monday 31 March 2003 10:28, you wrote:
> Hello!
>
> On Sat, Mar 29, 2003 at 07:33:48PM +0100, Roland wrote:
> > i've read your thread very interessted, because i think i've a very
> > similar problem (but until know, i didn't think about reiserfs as
> > source): kernel 2.4.20 SMP, HighMem, HighMem IO (without ptrace patch)
> > root filesystem ~70GB ext3 on Compaq CISS
> > /data filesystems ~1.1TB reiserfs (on top of lvm) on a second Compaq CISS
> > System: Compaq Dual PIII 2GB Ram
> > the system is used as nfs fileserver for diskless clients (about 30 right
> > now) first we experienced high load situations during daily cron jobs,
> > later we saw log messages very similar to what Bernd has
> > (/usr/sbin/logrotate, /usr/lib/perl5/...) and last week we had a crash of
> > the fileserver (no sysreq key's, console blanking didn't responded, ping
> > to system was ok)
>
> Was there anything in logs after reboot?

no, nothing...

> > since that crash we don't have any 'nfsd-fh: found a name that I didn't
> > expect:' messages; but we need to find out what was behind that crash...
> > Any ideas?
>
> If only you had related kernel oops/whatever was caused the crash
> (either on your HDD, or captured via serial console/network oops
> dumper/whatever), that would be much easier.

The system has no serial console attached *sigh*, and the logs on disk carried 
nothing useful...

> > (it would be possible for me to provide a machine which is 100% identical
> > to our production fileserver if this would help for tests, but i don't
> > have ressources to put up more than one client for this testsystem...)
>
> Since we do not know yet on how to produce those nfsd-fh stuff, there is no
> point in using your test server yet.

hmm... ok if you think it can be useful, let me know...

Greetings,

Roland


* Re: nfsd-fh: found a name that I didn't expect
  2003-03-31 13:28                       ` Soeren Sonnenburg
@ 2003-06-02 15:01                         ` Oleg Drokin
  0 siblings, 0 replies; 27+ messages in thread
From: Oleg Drokin @ 2003-06-02 15:01 UTC (permalink / raw)
  To: Soeren Sonnenburg; +Cc: reiserfs-list

Hello!

On Mon, Mar 31, 2003 at 03:28:54PM +0200, Soeren Sonnenburg wrote:

Hmm....
Now this explains why I did not get some of the mails you have referred to.
It seems that somebody turned off a mail server with a non-empty mail queue
and only turned it back on now ;)

Bye,
    Oleg

* Re: nfsd-fh: found a name that I didn't expect
  2003-03-31 10:24                 ` Oleg Drokin
  2003-03-31 10:37                   ` Soeren Sonnenburg
  2003-03-31 10:49                   ` Bernd Schubert
@ 2003-08-06 21:00                   ` John Dalbec
  2003-08-06 22:06                     ` Bernd Schubert
  2003-08-07  5:28                     ` Oleg Drokin
  2 siblings, 2 replies; 27+ messages in thread
From: John Dalbec @ 2003-08-06 21:00 UTC (permalink / raw)
  To: Oleg Drokin; +Cc: Bernd Schubert, reiserfs, reiserfs-list



Oleg Drokin wrote:
> Hello!
> 
> On Sun, Mar 30, 2003 at 04:08:11PM +0100, Bernd Schubert wrote:
> 
> 
>>Anyway I also tested the kernel-patch, but it causes problems with mounts from 
>>another nfs-server. Well, mounting works fine, but e.g. running 'ls 
> 
> 
> Ah, stupid me.
> I have another patch that we created before iget5_locked backport.
> It's a bit of hack to beat the race, but it works (fixes the race).
> It does not touch any code outside of reiserfs.
> So if you apply that to 2.4.20 and see if you still can see the NFS problem
> over time...
> Patch is below.

I just got an "nfsd-fh: found a name that I didn't expect" yesterday. 
I'm using a Red Hat 2.4.20 RPM with 2.4.20-pending+data-logging+quota.
Should I apply just this patch or both this patch and the 
iget5_locked_2.4.20 patch?
Thanks,
John Dalbec
> 
> Bye,
>     Oleg
> 
> ===== fs/reiserfs/inode.c 1.42 vs edited =====
> --- 1.42/fs/reiserfs/inode.c	Thu Feb 13 15:42:42 2003
> +++ edited/fs/reiserfs/inode.c	Thu Feb 20 17:23:24 2003
> @@ -20,6 +20,10 @@
>  static int reiserfs_get_block (struct inode * inode, long block,
>  			       struct buffer_head * bh_result, int create);
>  
> +/* This spinlock guards inode pkey in private part of inode
> +   against race between find_actor() vs reiserfs_read_inode2 */
> +static spinlock_t keycopy_lock = SPIN_LOCK_UNLOCKED;
> +
>  void reiserfs_delete_inode (struct inode * inode)
>  {
>      int jbegin_count = JOURNAL_PER_BALANCE_CNT * 2; 
> @@ -898,8 +902,9 @@
>      bh = PATH_PLAST_BUFFER (path);
>      ih = PATH_PITEM_HEAD (path);
>  
> -
> +    spin_lock(&keycopy_lock);
>      copy_key (INODE_PKEY (inode), &(ih->ih_key));
> +    spin_unlock(&keycopy_lock);
>      inode->i_blksize = PAGE_SIZE;
>  
>      INIT_LIST_HEAD(&inode->u.reiserfs_i.i_prealloc_list) ;
> @@ -1220,10 +1225,27 @@
>  				unsigned long inode_no, void *opaque )
>  {
>      struct reiserfs_iget4_args *args;
> +    int retval;
>  
>      args = opaque;
> +    /* We protect against possible parallel init_inode() on another CPU here. */
> +    spin_lock(&keycopy_lock);
>      /* args is already in CPU order */
> -    return le32_to_cpu(INODE_PKEY(inode)->k_dir_id) == args -> objectid;
> +    if (le32_to_cpu(INODE_PKEY(inode)->k_dir_id) == args -> objectid)
> +	retval = 1;
> +    else
> +	/* If The key does not match, lets see if we are racing
> +	   with another iget4, that already progressed so far
> +	   to reiserfs_read_inode2() and was preempted in
> +	   call to search_by_key(). The signs of that are:
> +	     Inode is locked
> +	     dirid and object id are zero (not yet initialized)*/
> +	retval = (inode->i_state & I_LOCK) &&
> +		 !INODE_PKEY(inode)->k_dir_id &&
> +		 !INODE_PKEY(inode)->k_objectid;
> +
> +    spin_unlock(&keycopy_lock);
> +    return retval;
>  }
>  
>  struct inode * reiserfs_iget (struct super_block * s, const struct cpu_key * key)
> 



* Re: nfsd-fh: found a name that I didn't expect
  2003-08-06 21:00                   ` John Dalbec
@ 2003-08-06 22:06                     ` Bernd Schubert
  2003-08-07  5:28                     ` Oleg Drokin
  1 sibling, 0 replies; 27+ messages in thread
From: Bernd Schubert @ 2003-08-06 22:06 UTC (permalink / raw)
  To: John Dalbec, Oleg Drokin; +Cc: reiserfs, reiserfs-list

On Wednesday 06 August 2003 23:00, John Dalbec wrote:
> Oleg Drokin wrote:
> > Hello!
> >
> > On Sun, Mar 30, 2003 at 04:08:11PM +0100, Bernd Schubert wrote:
> >>Anyway I also tested the kernel-patch, but it causes problems with mounts
> >> from another nfs-server. Well, mounting works fine, but e.g. running 'ls
> >
> > Ah, stupid me.
> > I have another patch that we created before iget5_locked backport.
> > It's a bit of hack to beat the race, but it works (fixes the race).
> > It does not touch any code outside of reiserfs.
> > So if you apply that to 2.4.20 and see if you still can see the NFS
> > problem over time...
> > Patch is below.
>
> I just got an "nfsd-fh: found a name that I didn't expect" yesterday.
> I'm using a Red Hat 2.4.20 RPM with 2.4.20-pending+data-logging+quota.
> Should I apply just this patch or both this patch and the
> iget5_locked_2.4.20 patch?
> Thanks,
> John Dalbec

Hi,

only the patch below! 
Don't install the iget5_locked_2.4.20 patch as it may cause problems with 
imports from other nfs-servers.

Bernd

>
> > Bye,
> >     Oleg
> >
> > ===== fs/reiserfs/inode.c 1.42 vs edited =====
> > --- 1.42/fs/reiserfs/inode.c	Thu Feb 13 15:42:42 2003
> > +++ edited/fs/reiserfs/inode.c	Thu Feb 20 17:23:24 2003
> > @@ -20,6 +20,10 @@
> >  static int reiserfs_get_block (struct inode * inode, long block,
> >  			       struct buffer_head * bh_result, int create);
> >
> > +/* This spinlock guards inode pkey in private part of inode
> > +   against race between find_actor() vs reiserfs_read_inode2 */
> > +static spinlock_t keycopy_lock = SPIN_LOCK_UNLOCKED;
> > +
> >  void reiserfs_delete_inode (struct inode * inode)
> >  {
> >      int jbegin_count = JOURNAL_PER_BALANCE_CNT * 2;
> > @@ -898,8 +902,9 @@
> >      bh = PATH_PLAST_BUFFER (path);
> >      ih = PATH_PITEM_HEAD (path);
> >
> > -
> > +    spin_lock(&keycopy_lock);
> >      copy_key (INODE_PKEY (inode), &(ih->ih_key));
> > +    spin_unlock(&keycopy_lock);
> >      inode->i_blksize = PAGE_SIZE;
> >
> >      INIT_LIST_HEAD(&inode->u.reiserfs_i.i_prealloc_list) ;
> > @@ -1220,10 +1225,27 @@
> >  				unsigned long inode_no, void *opaque )
> >  {
> >      struct reiserfs_iget4_args *args;
> > +    int retval;
> >
> >      args = opaque;
> > > +    /* We protect against possible parallel init_inode() on another CPU here. */
> > > +    spin_lock(&keycopy_lock);
> >      /* args is already in CPU order */
> > -    return le32_to_cpu(INODE_PKEY(inode)->k_dir_id) == args -> objectid;
> > +    if (le32_to_cpu(INODE_PKEY(inode)->k_dir_id) == args -> objectid)
> > +	retval = 1;
> > +    else
> > +	/* If The key does not match, lets see if we are racing
> > +	   with another iget4, that already progressed so far
> > +	   to reiserfs_read_inode2() and was preempted in
> > +	   call to search_by_key(). The signs of that are:
> > +	     Inode is locked
> > +	     dirid and object id are zero (not yet initialized)*/
> > +	retval = (inode->i_state & I_LOCK) &&
> > +		 !INODE_PKEY(inode)->k_dir_id &&
> > +		 !INODE_PKEY(inode)->k_objectid;
> > +
> > +    spin_unlock(&keycopy_lock);
> > +    return retval;
> >  }
> >
> > >  struct inode * reiserfs_iget (struct super_block * s, const struct cpu_key * key)


* Re: nfsd-fh: found a name that I didn't expect
  2003-08-06 21:00                   ` John Dalbec
  2003-08-06 22:06                     ` Bernd Schubert
@ 2003-08-07  5:28                     ` Oleg Drokin
  1 sibling, 0 replies; 27+ messages in thread
From: Oleg Drokin @ 2003-08-07  5:28 UTC (permalink / raw)
  To: John Dalbec; +Cc: Bernd Schubert, reiserfs, reiserfs-list

Hello!

On Wed, Aug 06, 2003 at 05:00:03PM -0400, John Dalbec wrote:

> I just got an "nfsd-fh: found a name that I didn't expect" yesterday. 
> I'm using a Red Hat 2.4.20 RPM with 2.4.20-pending+data-logging+quota.
> Should I apply just this patch or both this patch and the 
> iget5_locked_2.4.20 patch?

You only need the patch below. iget5_locked_2.4.20 patch is broken.

Bye,
    Oleg
> >===== fs/reiserfs/inode.c 1.42 vs edited =====
> >--- 1.42/fs/reiserfs/inode.c	Thu Feb 13 15:42:42 2003
> >+++ edited/fs/reiserfs/inode.c	Thu Feb 20 17:23:24 2003
> >@@ -20,6 +20,10 @@
> > static int reiserfs_get_block (struct inode * inode, long block,
> > 			       struct buffer_head * bh_result, int create);
> > 
> >+/* This spinlock guards inode pkey in private part of inode
> >+   against race between find_actor() vs reiserfs_read_inode2 */
> >+static spinlock_t keycopy_lock = SPIN_LOCK_UNLOCKED;
> >+
> > void reiserfs_delete_inode (struct inode * inode)
> > {
> >     int jbegin_count = JOURNAL_PER_BALANCE_CNT * 2; 
> >@@ -898,8 +902,9 @@
> >     bh = PATH_PLAST_BUFFER (path);
> >     ih = PATH_PITEM_HEAD (path);
> > 
> >-
> >+    spin_lock(&keycopy_lock);
> >     copy_key (INODE_PKEY (inode), &(ih->ih_key));
> >+    spin_unlock(&keycopy_lock);
> >     inode->i_blksize = PAGE_SIZE;
> > 
> >     INIT_LIST_HEAD(&inode->u.reiserfs_i.i_prealloc_list) ;
> >@@ -1220,10 +1225,27 @@
> > 				unsigned long inode_no, void *opaque )
> > {
> >     struct reiserfs_iget4_args *args;
> >+    int retval;
> > 
> >     args = opaque;
> >+    /* We protect against possible parallel init_inode() on another CPU here. */
> >+    spin_lock(&keycopy_lock);
> >     /* args is already in CPU order */
> >-    return le32_to_cpu(INODE_PKEY(inode)->k_dir_id) == args -> objectid;
> >+    if (le32_to_cpu(INODE_PKEY(inode)->k_dir_id) == args -> objectid)
> >+	retval = 1;
> >+    else
> >+	/* If The key does not match, lets see if we are racing
> >+	   with another iget4, that already progressed so far
> >+	   to reiserfs_read_inode2() and was preempted in
> >+	   call to search_by_key(). The signs of that are:
> >+	     Inode is locked
> >+	     dirid and object id are zero (not yet initialized)*/
> >+	retval = (inode->i_state & I_LOCK) &&
> >+		 !INODE_PKEY(inode)->k_dir_id &&
> >+		 !INODE_PKEY(inode)->k_objectid;
> >+
> >+    spin_unlock(&keycopy_lock);
> >+    return retval;
> > }
> > 
> > struct inode * reiserfs_iget (struct super_block * s, const struct cpu_key * key)
> >
> 
> 
