* fcntl locks prevent unmounting of underlying filesystem
From: Jeffrey Layton @ 2004-05-26 16:43 UTC (permalink / raw)
  To: nfs

I seem to have run across a problem with NFS and fcntl locks. I'm
trying to implement a HA-NFS solution using heartbeat, DRBD, LVM2, etc.
I'm running the following:

2.6.6 kernel
nfs-kernel-server and nfs-common 1.0.6-3 (debian packages)

The underlying filesystem is reiserfs. Essentially what I'm seeing is
that when I try to shut down NFS and unmount the filesystem for a
failover, I'm unable to unmount it if a client holds an fcntl lock on a
file in it.

Here's the C program I used to test the locks (my C coding is not the
best, I copied a lot of this from R. Stevens' book):


-------------------------[snip]------------------------

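#include <sys/types.h>
#include <unistd.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>

int lock_reg(int fd, int cmd, int type, off_t offset, int whence, off_t len);

/* Non-blocking (F_SETLK) request for a write lock on a byte range. */
#define write_lock(fd, offset, whence, len) \
        lock_reg(fd, F_SETLK, F_WRLCK, offset, whence, len)

int main(int argc, char *argv[]) {

    int fd;

    if (argc != 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }

    fd = open(argv[1], O_RDWR);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* offset 0 from SEEK_SET with length 0 locks the whole file */
    if (write_lock(fd, 0, SEEK_SET, 0) < 0) {
        perror("Unable to acquire lock");
    } else {
        printf("Got the lock. Sleeping for 300 secs.\n");
        sleep(300);
    }

    /* closing the descriptor releases the lock */
    close(fd);
    return 0;
}

int lock_reg(int fd, int cmd, int type, off_t offset, int whence, off_t len) {

    struct flock lock;

    lock.l_type = type;       /* F_RDLCK, F_WRLCK or F_UNLCK */
    lock.l_start = offset;    /* byte offset, relative to l_whence */
    lock.l_whence = whence;   /* SEEK_SET, SEEK_CUR or SEEK_END */
    lock.l_len = len;         /* 0 means "through end of file" */

    return fcntl(fd, cmd, &lock);
}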

-----------------------[snip]-----------------------------

I run this program against a file on an NFS mounted directory on the
client machine. On the server, I then shut down NFS (using the debian
nfs-common and nfs-kernel-server startup scripts).

I'm then unable to unmount the underlying filesystem. The error message
is (yes, umount prints it twice for some reason):

umount: /services/NFS/home: device is busy
umount: /services/NFS/home: device is busy
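
For reference, "device is busy" is how umount(8) reports an EBUSY
failure from the kernel. A minimal sketch of how a failover script's
unmount step might tell a busy filesystem apart from other errors,
assuming Linux's umount2() from <sys/mount.h>; the helper name
try_unmount is hypothetical:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

/*
 * Hypothetical helper: try once to unmount `target`.
 * Returns 0 on success, otherwise the errno value from umount2().
 * EBUSY is what umount(8) prints as "device is busy".
 */
int try_unmount(const char *target)
{
    int err;

    if (umount2(target, 0) == 0)
        return 0;

    err = errno;
    if (err == EBUSY)
        fprintf(stderr, "%s: still busy (something holds a reference)\n",
                target);
    else
        fprintf(stderr, "%s: umount2 failed: %s\n", target, strerror(err));
    return err;
}
```

With the client still holding its lock, try_unmount("/services/NFS/home")
would presumably return EBUSY, matching the message above.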

If I then start up NFS again, and kill the locktest program, I'm then
able to shut down nfs and unmount the filesystem.

I also did a test where I just opened the file r/w without locking it,
and it didn't seem to have the same problem, so it seems like the
fcntl() lock is what is causing the problem (though I could be wrong
here).
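
One way to double-check that the fcntl() lock is still in place is to
query it with F_GETLK from a second process. A minimal sketch; the
helper name lock_holder is hypothetical, and on an NFS mount the
reported l_pid may not correspond to a local process, so only the
"locked / not locked" answer should be relied on:

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: would a whole-file write lock conflict with a
 * lock held by another process?
 * Returns the PID reported for a conflicting holder, 0 if none,
 * -1 on error. Locks held by the calling process itself never
 * conflict, so F_GETLK reports them as F_UNLCK (i.e. 0 here).
 */
pid_t lock_holder(int fd)
{
    struct flock lock;

    lock.l_type = F_WRLCK;
    lock.l_whence = SEEK_SET;
    lock.l_start = 0;
    lock.l_len = 0;            /* 0 = through end of file */

    if (fcntl(fd, F_GETLK, &lock) < 0)
        return -1;
    if (lock.l_type == F_UNLCK)
        return 0;              /* nothing would block us */
    return lock.l_pid;         /* one conflicting holder */
}
```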

I've been able to replicate this problem both with and without
/proc/fs/nfsd mounted on the server.
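
To confirm which of the two cases a given test run is in, a sketch that
checks the mount table for an nfsd-type filesystem, assuming glibc's
<mntent.h> interface (setmntent/getmntent/endmntent); the helper name
fs_type_mounted is hypothetical:

```c
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if a filesystem of type `fstype` is
 * listed in `mounts_file` (normally "/proc/mounts"), 0 if not, and
 * -1 if the file cannot be opened.
 */
int fs_type_mounted(const char *mounts_file, const char *fstype)
{
    FILE *fp = setmntent(mounts_file, "r");
    struct mntent *ent;
    int found = 0;

    if (fp == NULL)
        return -1;
    while ((ent = getmntent(fp)) != NULL) {
        if (strcmp(ent->mnt_type, fstype) == 0) {
            found = 1;
            break;
        }
    }
    endmntent(fp);
    return found;
}
```

For example, fs_type_mounted("/proc/mounts", "nfsd") on the server.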

I also tried applying the exportfs patch that was in the thread:

   nfsd, rmtab, failover, and stale filehandles

on this mailing list earlier this month, and it didn't help. Has anyone
else seen this problem?

If there's any other info you need me to provide to help diagnose this,
please don't hesitate to ask!

Thanks,
Jeff




-------------------------------------------------------
This SF.Net email is sponsored by: Oracle 10g
Get certified on the hottest thing ever to hit the market... Oracle 10g. 
Take an Oracle 10g class now, and we'll give you the exam FREE.
http://ads.osdn.com/?ad_id=3149&alloc_id=8166&op=click
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs


* Re: fcntl locks prevent unmounting of underlying filesystem
From: Jeffrey Layton @ 2004-05-27 14:27 UTC (permalink / raw)
  To: nfs

FWIW, I've also been able to replicate this problem on the 2.4 kernels.
I'm unable to unmount the underlying filesystem of the NFS
server until the POSIX locks that the clients hold have been released.
Needless to say, this is not good for a HA-NFS server, since it prevents
me from reliably failing over. How have others dealt with this problem?
Or perhaps I don't have something configured correctly?

Any help or insight would be much appreciated...

-- Jeff





* Re: fcntl locks prevent unmounting of underlying filesystem
From: Jeffrey Layton @ 2004-05-27 17:31 UTC (permalink / raw)
  To: nfs

Got a response from the linux-ha list that this is a known bug :-(. The
current workaround for HA setups seems to be to force an immediate
reboot if it occurs (blech!). Any ideas what the issue is, and how it
can be fixed?

Thanks,
Jeff






* Re: fcntl locks prevent unmounting of underlying filesystem
From: Neil Brown @ 2004-05-29 22:51 UTC (permalink / raw)
  To: Jeffrey Layton; +Cc: nfs

On Wednesday May 26, jtlayton@poochiereds.net wrote:
> I seem to have run across a problem with NFS and fcntl locks. I'm
> trying to implement a HA-NFS solution using heartbeat, DRBD, LVM2, etc.
> I'm running the following:
> 
> 2.6.6 kernel
> nfs-kernel-server and nfs-common 1.0.6-3 (debian packages)
> 
> The underlying filesystem is reiserfs. Essentially what I'm seeing is
> that when I try to shut down NFS and unmount the filesystem for a
> failover, I'm unable to unmount if I have an fcntl lock on the file.

You need to make sure that lockd gets killed as well.
Just shutting down nfsd doesn't necessarily kill lockd: if you have
any active NFS mounts, lockd will stay up for them.
So, when you have shut down nfsd and before you try to unmount, could
you check if lockd is still running or not?
If it is, send it a SIGKILL.  It won't exit, but it should release any
locks that it is holding.
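
The check-and-kill step described here is what `pkill -KILL -x lockd`
does. For a failover tool written in C, a rough equivalent might scan
/proc and match the command name in each /proc/<pid>/stat. This is a
sketch only: the helper names stat_line_is and signal_by_name are
hypothetical, and it relies on lockd dropping its locks on SIGKILL (as
described above) rather than exiting.

```c
#include <ctype.h>
#include <dirent.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical helper: return 1 if a /proc/<pid>/stat line names the
 * command `name`. The command is the parenthesised second field,
 * e.g. "1234 (lockd) S ...".
 */
int stat_line_is(const char *line, const char *name)
{
    const char *lp = strchr(line, '(');
    const char *rp = strrchr(line, ')');
    size_t len;

    if (lp == NULL || rp == NULL || rp < lp)
        return 0;
    len = (size_t)(rp - lp - 1);
    return len == strlen(name) && strncmp(lp + 1, name, len) == 0;
}

/*
 * Send `sig` to every process whose command name is exactly `name`.
 * Returns the number of processes signalled, or -1 if /proc cannot
 * be read. Passing sig = 0 only tests whether such processes exist.
 */
int signal_by_name(const char *name, int sig)
{
    DIR *proc = opendir("/proc");
    struct dirent *de;
    int count = 0;

    if (proc == NULL)
        return -1;
    while ((de = readdir(proc)) != NULL) {
        char path[300], line[512];
        FILE *fp;

        if (!isdigit((unsigned char)de->d_name[0]))
            continue;                  /* not a PID directory */
        snprintf(path, sizeof path, "/proc/%s/stat", de->d_name);
        fp = fopen(path, "r");
        if (fp == NULL)
            continue;                  /* process already gone */
        if (fgets(line, sizeof line, fp) != NULL &&
            stat_line_is(line, name) &&
            kill((pid_t)atoi(de->d_name), sig) == 0)
            count++;
        fclose(fp);
    }
    closedir(proc);
    return count;
}
```

On the server, signal_by_name("lockd", SIGKILL) would then be run after
stopping nfsd and before attempting the unmount.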

If lockd has gone away at this point but locks are still being held,
then that is a real problem and I will try to look into it.

NeilBrown




* Re: fcntl locks prevent unmounting of underlying filesystem
From: Jeff Layton @ 2004-05-30 10:53 UTC (permalink / raw)
  To: Neil Brown; +Cc: nfs

> You need to make sure that lockd gets killed as well.
> Just shutting down nfsd doesn't necessarily kill lockd: if you have
> any active NFS mounts, lockd will stay up for them.
> So, when you have shut down nfsd and before you try to unmount, could
> you check if lockd is still running or not?
> If it is, send it a SIGKILL.  It won't exit, but it should release any
> locks that it is holding.
> 
> If lockd has gone away at this point but locks are still being held,
> then that is a real problem and I will try to look into it.
> 
> NeilBrown

Ahh, thanks for the info! That does indeed seem to take care of the hold
on the underlying filesystem. I need to do a little more testing to see
how locks are handled, but that at least removes my current logjam.

Cheers!
-- 
Jeff Layton <jtlayton@poochiereds.net>



* Re: [NFS] fcntl locks prevent unmounting of underlying filesystem
From: Jeffrey Layton @ 2004-06-04 12:03 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-ha, nfs

On Sat, 2004-05-29 at 18:51, Neil Brown wrote:
> If lockd has gone away at this point but locks are still being held,
> then that is a real problem and I will try to look into it.
> 
> NeilBrown

OK, I think we have a test case where even a SIGKILL to lockd won't
help. When I took a single POSIX lock on a file in the filesystem,
sending the SIGKILL to lockd did seem to clear up the problem unmounting the
filesystem. The folks on the linux-ha list were still having problems
and suggested I try a Connectathon test to see if the SIGKILL still
worked after that.

I downloaded connectathon:

http://www.connectathon.org/nfstests.html

And ran the locking test on a filesystem I had mounted from the server.
I then did on the server:

exportfs -u <filesystem>
exportfs -f
pkill -KILL -x lockd

And tried to unmount the filesystem. It wouldn't unmount. I then killed
connectathon and tried to unmount it. It wouldn't unmount. I then
unmounted the filesystem from the client, and still I couldn't unmount
the underlying filesystem. At this point, I rebooted the box, as I
didn't see any alternative.

So whatever Connectathon does, it seems to hose up Linux NFS locking
pretty solidly. One thing it does seem to do is take _a_lot_ of locks,
so perhaps the problem is related to the number of locks held. If you
could download and try it, perhaps you could get to the bottom of the
problem.
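
If the sheer number of locks is the trigger, one way to test that
theory without Connectathon is a small program that takes many separate
byte-range locks on one file on the NFS mount. A hypothetical sketch
(take_many_locks is not part of Connectathon or any library):

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical stress helper: take `count` separate one-byte write
 * locks at offsets 0, 2, 4, ... The one-byte gaps stop the kernel
 * from merging adjacent ranges, so each success leaves a distinct
 * lock record. Returns the number of locks acquired.
 */
int take_many_locks(int fd, int count)
{
    int taken = 0;

    for (int i = 0; i < count; i++) {
        struct flock lock;

        lock.l_type = F_WRLCK;
        lock.l_whence = SEEK_SET;
        lock.l_start = (off_t)i * 2;
        lock.l_len = 1;
        if (fcntl(fd, F_SETLK, &lock) < 0) {
            perror("fcntl(F_SETLK)");
            break;
        }
        taken++;
    }
    return taken;
}
```

Holding, say, a few thousand such locks from a client and then repeating
the exportfs/pkill/umount sequence above would show whether the lock
count alone reproduces the hang.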

FWIW, I'm using the 2.6.6 kernel on the server, with /proc/fs/nfsd
mounted and nfs-utils 1.0.6-3 from the Debian archive. Guochun Shi said
that he's been able to replicate this problem on recent 2.4 kernels as well.

Many thanks for your help!
-- Jeff


_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
