----- Original Message -----
Sent: Tuesday, February 11, 2003 12:17 PM
Subject: the inode semaphore
We’re working on a high-availability experimental project for our Linux-based server, and the nfsd threads hang from time to time. We observed that NFS version 3 procedures 12 and 13 (REMOVE and RMDIR) obtain an inode semaphore within fh_lock()/nfsd_unlink() but never release it. Theoretically this inode is going to be released eventually, but what prevents the following two cases, either of which would end up hanging nfsd?
1. Due to a slow network and/or system, the client resends the same request with a different xid. The nfsd thread that accepts the new request would be stuck in nfsd_unlink() waiting for the inode semaphore.
2. A “create” immediately follows the “remove” for the same file.
There is a flag called fh_locked, but it is passed in with each individual NFS request (a local copy, not from the dcache), so it doesn’t help. I also browsed through some file systems’ (e.g., ext3) inode-freeing code but can’t find anywhere that this semaphore gets released.
We’re still debugging our failover logic, but it would be a great help if someone could confirm that the above is indeed a bug (or explain why it will not happen). Or should this semaphore get “zeroed” out by the file system (e.g., ext3)?
We’re on the 2.4.19 kernel.
Thanks for the help.
Wendy