* Killed process on NFS client can result in lost lock on server
@ 2003-09-30 20:06 Philippe Troin
2003-09-30 20:52 ` Trond Myklebust
0 siblings, 1 reply; 5+ messages in thread
From: Philippe Troin @ 2003-09-30 20:06 UTC (permalink / raw)
To: nfs
I first noticed this with bogofilter, and was able to reproduce the
problem with the enclosed test program.
Setup: kernel 2.4.22 and nfs-utils 1.0.5
An NFS client mounts a file system from the NFS server with these
options (from /proc/mounts):
server:/fs /fs nfs rw,nodev,v3,rsize=8192,wsize=8192,hard,intr,udp,lock,addr=server
If a process running on the NFS client is killed by a signal while
holding a lock on an NFS file, the server might not release the
lock even though the lock holder is dead.
Try compiling and running the enclosed C program on an NFS client to
demonstrate the problem:
phil@client:~% gcc -Wall -W -o kill-locks kill-locks.c
phil@client:~% ./kill-locks
[child] fcntl(F_SETLK): Resource temporarily unavailable
unexpected status from child 00000100
successful locking attempts: 2
zsh: 10479 exit 1 ./kill-locks
phil@client:~% ./kill-locks
[child] fcntl(F_SETLK): Resource temporarily unavailable
unexpected status from child 00000100
successful locking attempts: 0
zsh: 10483 exit 1 ./kill-locks
phil@client:~% ls -i kill-locks.tmp
371922 kill-locks.tmp
phil@client:~% grep 371922 /proc/locks
zsh: 10492 exit 1 grep 371922 /proc/locks
phil@client:~%
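A note on the status value printed above: 00000100 is the raw wait
status from waitpid(). On Linux that encoding means the child exited
normally with code 1 (its exit(1) after the failed fcntl()) rather than
being killed by DEATHSIG. A minimal sketch of decoding it with the
standard <sys/wait.h> macros follows; the helper names are made up for
illustration, and the raw value assumes the Linux status layout.

```c
#include <assert.h>
#include <sys/wait.h>

/* Exit code of a child that exited normally, or -1 otherwise. */
static int child_exit_code(int status)
{
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Signal that terminated the child, or -1 if it was not killed by one. */
static int child_term_signal(int status)
{
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}

/* The status printed in the transcript, decoded (Linux layout assumed). */
static void check_reported_status(void)
{
    assert(child_exit_code(0x0100) == 1);
    assert(child_term_signal(0x0100) == -1);
}
```

In the test program, a run only counts as successful when
child_term_signal(status) would equal DEATHSIG.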
On the server:
phil@server:~% grep 371922 /proc/locks
2: POSIX ADVISORY WRITE 10480 3a:04:371922 0 EOF c8138840 c8138484 cda9d324 00000000 c813884c
phil@server:~%
The lock is still held.
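As a rough sketch of automating the grep above, the following C helper
checks whether one /proc/locks line refers to a given inode. The
function name lock_line_matches_inode is hypothetical, and the field
layout is assumed from the single line shown above (index, class, mode,
type, pid, then major:minor:inode), not from a format specification.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Return 1 if the /proc/locks line refers to the given inode, 0 if it
 * refers to another inode, and -1 if the line does not parse.
 * Assumed layout: "N: CLASS MODE TYPE PID MAJ:MIN:INODE START END ...".
 */
static int lock_line_matches_inode(const char *line, unsigned long inode)
{
    int idx;
    char cls[16], mode[16], type[16];
    long pid;
    unsigned int maj, min;
    unsigned long ino;

    assert(line != NULL);
    /* MAJ and MIN are read as hex (e.g. "3a:04"), the inode as decimal. */
    if (sscanf(line, "%d: %15s %15s %15s %ld %x:%x:%lu",
               &idx, cls, mode, type, &pid, &maj, &min, &ino) != 8)
        return -1;
    return ino == inode;
}
```

Feeding it each line read from /proc/locks on the server, with the
inode number reported by ls -i, would flag the leftover lock.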
While trying to make this test program, I've noticed that the problem
only occurs while I/O is done on the locked file. Note the write() in
a while loop in the test program. I could not get the bad behavior to
show up if no I/O is going on.
Phil.
[-- Attachment #2: kill-locks.c --]
[-- Type: text/x-csrc, Size: 2274 bytes --]
#define _GNU_SOURCE
#define _LARGEFILE_SOURCE
#define _FILE_OFFSET_BITS 64

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <signal.h>
#include <sys/wait.h>
#include <errno.h>

#define FNAME "kill-locks.tmp"
#define BUFSIZE 16384
#define DEATHSIG SIGINT

void sighandler(int signum)
{
  if (0) signum = 0;
}

int
main()
{
  int successcount = 0;
  struct sigaction sa;
  sigset_t blockset, origset, waitset;
  /**/

  sa.sa_handler = &sighandler;
  sa.sa_flags = 0;
  sigemptyset(&sa.sa_mask);
  if (sigaction(SIGUSR1, &sa, NULL) == -1)
    perror("sigaction(SIGUSR1)"), exit(1);
  if (sigaction(SIGCHLD, &sa, NULL) == -1)
    perror("sigaction(SIGCHLD)"), exit(1);

  sigemptyset(&blockset);
  sigaddset(&blockset, SIGUSR1);
  sigaddset(&blockset, SIGCHLD);
  if (sigprocmask(SIG_BLOCK, &blockset, &origset) == -1)
    perror("sigprocmask"), exit(1);

  waitset = origset;
  sigdelset(&waitset, SIGUSR1);
  sigdelset(&waitset, SIGCHLD);
  sigaddset(&waitset, DEATHSIG);

  while (1)
  {
    pid_t childpid = fork();
    int status;
    /**/

    if (childpid == (pid_t) -1)
      perror("fork()"), exit(1);

    if (childpid == 0)
    {
      /* Child */
      int fd;
      struct flock lck;
      char buf[BUFSIZE];
      /**/

      if (sigprocmask(SIG_SETMASK, &origset, NULL) == -1)
        perror("[child] sigprocmask"), exit(1);

      fd = open(FNAME, O_RDWR|O_CREAT, 0666);
      if (fd == -1)
        perror("[child] open()"), exit(1);

      lck.l_type = F_WRLCK;
      lck.l_whence = SEEK_SET;
      lck.l_start = (off_t)0;
      lck.l_len = (off_t)0;
      if (fcntl(fd, F_SETLK, &lck) == -1)
        perror("[child] fcntl(F_SETLK)"), exit(1);

      memset(buf, 0, sizeof(buf));
      kill(getppid(), SIGUSR1);
      while(1)
        write(fd, buf, sizeof(buf));
    }

    if ( ! (sigsuspend(&waitset) == -1 && errno == EINTR))
      perror("sigsuspend"), exit(1);

    usleep(rand()%1000);
    kill(childpid, DEATHSIG);

    if (waitpid(childpid, &status, 0) != childpid)
      perror("waitpid"), exit(1);

    if ( ! (WIFSIGNALED(status) && WTERMSIG(status) == DEATHSIG))
    {
      fprintf(stderr,
              "unexpected status from child %08X\n"
              "successful locking attempts: %d\n",
              status, successcount);
      exit(1);
    }
    ++successcount;
  }
}
* Re: Killed process on NFS client can result in lost lock on server
2003-09-30 20:06 Killed process on NFS client can result in lost lock on server Philippe Troin
@ 2003-09-30 20:52 ` Trond Myklebust
2003-09-30 22:35 ` Philippe Troin
0 siblings, 1 reply; 5+ messages in thread
From: Trond Myklebust @ 2003-09-30 20:52 UTC (permalink / raw)
To: nfs; +Cc: Trond Myklebust
>>>>> " " == Philippe Troin <phil@fifi.org> writes:
> While trying to make this test program, I've noticed that the
> problem only occurs while I/O is done on the locked file. Note
> the write() in a while loop in the test program. I could not
> get the bad behavior to show up if no I/O is going on.
Yep. It's the same problem as in nlmclnt_proc(): we have to clean up
all locks come rain or shine when the process exits.
Cheers,
Trond
--- linux-2.4.23-pre5/fs/nfs/file.c.orig 2003-07-09 14:10:21.000000000 -0400
+++ linux-2.4.23-pre5/fs/nfs/file.c 2003-09-30 16:48:52.000000000 -0400
@@ -293,7 +293,8 @@
 	status2 = filemap_fdatawait(inode->i_mapping);
 	if (status2 && !status)
 		status = status2;
-	if (status < 0)
+	/* Note: Ignore status if we're cleaning up locks on process exit */
+	if (status < 0 && !(current->flags & PF_EXITING))
 		return status;
 
 	lock_kernel();
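
To make the effect of the one-line change easier to see, here is a
userspace model of the condition before and after the patch. It is only
a sketch: the function names are invented, the exiting flag stands in
for (current->flags & PF_EXITING), and -EINTR is just an example of a
negative status a flush might return; the thread does not say which
error the flush actually reports.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Before the patch: any flush error returns before the lock call. */
static bool bails_out_old(int flush_status)
{
    assert(flush_status <= 0);  /* kernel style: 0 or a negative errno */
    return flush_status < 0;
}

/* After the patch: a flush error is ignored while the process is exiting,
 * so the lock cleanup still goes ahead. */
static bool bails_out_new(int flush_status, bool exiting)
{
    assert(flush_status <= 0);
    return flush_status < 0 && !exiting;
}
```

In this model, bails_out_old(-EINTR) is true regardless of why the lock
call was made, while bails_out_new(-EINTR, true) is false, so an
unlock issued during process exit is no longer skipped.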
-------------------------------------------------------
This sf.net email is sponsored by:ThinkGeek
Welcome to geek heaven.
http://thinkgeek.com/sf
_______________________________________________
NFS maillist - NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs
* Re: Killed process on NFS client can result in lost lock on server
2003-09-30 20:52 ` Trond Myklebust
@ 2003-09-30 22:35 ` Philippe Troin
2003-09-30 22:36 ` Trond Myklebust
2003-10-01 6:38 ` Philippe Troin
0 siblings, 2 replies; 5+ messages in thread
From: Philippe Troin @ 2003-09-30 22:35 UTC (permalink / raw)
To: Trond Myklebust; +Cc: nfs
Trond Myklebust <trond.myklebust@fys.uio.no> writes:
> >>>>> " " == Philippe Troin <phil@fifi.org> writes:
>
> > While trying to make this test program, I've noticed that the
> > problem only occurs while I/O is done on the locked file. Note
> > the write() in a while loop in the test program. I could not
> > get the bad behavior to show up if no I/O is going on.
>
> Yep. It's the same problem as in nlmclnt_proc(): we have to clean up
> all locks come rain or shine when the process exits.
Thanks for the patch Trond, I'll give it a shot later today.
Is this patch in a -pre kernel yet or in 2.6.x or is it a new bug?
Phil.
* Re: Killed process on NFS client can result in lost lock on server
2003-09-30 22:35 ` Philippe Troin
@ 2003-09-30 22:36 ` Trond Myklebust
2003-10-01 6:38 ` Philippe Troin
1 sibling, 0 replies; 5+ messages in thread
From: Trond Myklebust @ 2003-09-30 22:36 UTC (permalink / raw)
To: Philippe Troin; +Cc: nfs
> Is this patch in a -pre kernel yet or in 2.6.x or is it a new
> bug?
It is new...
Cheers,
Trond
* Re: Killed process on NFS client can result in lost lock on server
2003-09-30 22:35 ` Philippe Troin
2003-09-30 22:36 ` Trond Myklebust
@ 2003-10-01 6:38 ` Philippe Troin
1 sibling, 0 replies; 5+ messages in thread
From: Philippe Troin @ 2003-10-01 6:38 UTC (permalink / raw)
To: Trond Myklebust; +Cc: nfs
Philippe Troin <phil@fifi.org> writes:
> Trond Myklebust <trond.myklebust@fys.uio.no> writes:
>
> > >>>>> " " == Philippe Troin <phil@fifi.org> writes:
> >
> > > While trying to make this test program, I've noticed that the
> > > problem only occurs while I/O is done on the locked file. Note
> > > the write() in a while loop in the test program. I could not
> > > get the bad behavior to show up if no I/O is going on.
> >
> > Yep. It's the same problem as in nlmclnt_proc(): we have to clean up
> > all locks come rain or shine when the process exits.
>
> Thanks for the patch Trond, I'll give it a shot later today.
Unfortunately, your patch does not fix the bug, although it makes it
less frequent. Please try running kill-locks for extended periods of
time to see the bug happen again. I can see the problem develop after
100-200 successful locking attempts here, versus just a few (< 10)
successful locking attempts before your patch.
Phil.
Thread overview: 5+ messages
2003-09-30 20:06 Killed process on NFS client can result in lost lock on server Philippe Troin
2003-09-30 20:52 ` Trond Myklebust
2003-09-30 22:35 ` Philippe Troin
2003-09-30 22:36 ` Trond Myklebust
2003-10-01 6:38 ` Philippe Troin