* Killed process on NFS client can result in lost lock on server
@ 2003-09-30 20:06 Philippe Troin
  2003-09-30 20:52 ` Trond Myklebust
  0 siblings, 1 reply; 5+ messages in thread
From: Philippe Troin @ 2003-09-30 20:06 UTC
  To: nfs

I first noticed this with bogofilter, and was able to reproduce the
problem with the enclosed test program.

Setup: kernel 2.4.22 and nfs-utils 1.0.5

An NFS client mounts a file system from the NFS server with these
options (from /proc/mounts):

server:/fs /fs nfs rw,nodev,v3,rsize=8192,wsize=8192,hard,intr,udp,lock,addr=server

If a process running on the NFS client is killed by a signal while
holding a lock on an NFS file, the server may keep the lock even
though the process that took it is dead.

Compile and run the enclosed C program on an NFS client to
demonstrate the problem:

   phil@client:~% gcc -Wall -W -o kill-locks kill-locks.c
   phil@client:~% ./kill-locks
   [child] fcntl(F_SETLK): Resource temporarily unavailable
   unexpected status from child 00000100
   successful locking attempts: 2
   zsh: 10479 exit 1     ./kill-locks
   phil@client:~% ./kill-locks
   [child] fcntl(F_SETLK): Resource temporarily unavailable
   unexpected status from child 00000100
   successful locking attempts: 0
   zsh: 10483 exit 1     ./kill-locks
   phil@client:~% ls -i kill-locks.tmp
    371922 kill-locks.tmp
   phil@client:~% grep 371922 /proc/locks
   zsh: 10492 exit 1     grep 371922 /proc/locks
   phil@client:~%
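
For reference, the "unexpected status from child 00000100" line above
is the raw waitpid() status printed in hex. A minimal sketch of
decoding it with the standard <sys/wait.h> macros follows;
describe_status() is a hypothetical helper, not part of kill-locks.c.

```c
#include <stdio.h>
#include <sys/wait.h>

/* Hypothetical helper: render a waitpid() status as text in buf.
 * Returns whatever snprintf() returns. */
static int describe_status(int status, char *buf, size_t len)
{
    if (WIFEXITED(status))
        return snprintf(buf, len, "exited with code %d",
                        WEXITSTATUS(status));
    if (WIFSIGNALED(status))
        return snprintf(buf, len, "killed by signal %d",
                        WTERMSIG(status));
    return snprintf(buf, len, "other status 0x%08X", (unsigned)status);
}
```

On Linux, a raw status of 0x100 decodes as "exited with code 1", which
matches the child's exit(1) after the failed fcntl(F_SETLK) rather
than a death by SIGINT.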

On the server:

   phil@server:~% grep 371922 /proc/locks
   2: POSIX  ADVISORY  WRITE 10480 3a:04:371922 0 EOF c8138840 c8138484 cda9d324 00000000 c813884c
   phil@server:~%

The lock is still held.
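
Grepping for the inode number can also match a pid or another field.
As a rough sketch, the leading fields of a /proc/locks line can be
parsed explicitly. The field layout here is an assumption taken from
the sample line above (id, lock kind, mode, access, pid,
major:minor:inode); the trailing fields are ignored, and
parse_lock_line() and struct lock_entry are hypothetical names.

```c
#include <stdio.h>

/* Hypothetical record for the leading fields of one /proc/locks line. */
struct lock_entry {
    int           id;
    char          kind[16];    /* e.g. "POSIX" */
    char          mode[16];    /* e.g. "ADVISORY" */
    char          access[16];  /* e.g. "WRITE" */
    int           pid;
    unsigned int  major, minor; /* device numbers, printed in hex */
    unsigned long inode;
};

/* Returns 1 if all eight leading fields were parsed, 0 otherwise. */
static int parse_lock_line(const char *line, struct lock_entry *e)
{
    int n = sscanf(line, "%d: %15s %15s %15s %d %x:%x:%lu",
                   &e->id, e->kind, e->mode, e->access,
                   &e->pid, &e->major, &e->minor, &e->inode);
    return n == 8;
}
```

A caller would read /proc/locks line by line and compare e.inode (and,
to be strict, the device numbers) against the values reported by
stat() for the file of interest.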

While trying to make this test program, I've noticed that the problem
only occurs while I/O is done on the locked file. Note the write() in
a while loop in the test program. I could not get the bad behavior to
show up if no I/O is going on.
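
When the server's /proc/locks is not at hand, a client-side probe with
fcntl(F_GETLK) may help confirm that a stale lock is still in place.
This is only a sketch: probe_write_lock() is a hypothetical helper,
and it assumes that F_GETLK on a mount with the "lock" option is
answered by the server's lock manager. The pid reported for a lock
held through NFS may not correspond to any local process.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if some other owner holds a lock that
 * conflicts with a whole-file write lock on fd, 0 if none does, and -1
 * on error. If holder is non-NULL and a lock is found, *holder receives
 * the pid reported by the kernel. */
static int probe_write_lock(int fd, pid_t *holder)
{
    struct flock lck;

    lck.l_type   = F_WRLCK;
    lck.l_whence = SEEK_SET;
    lck.l_start  = 0;
    lck.l_len    = 0;   /* 0 means "through end of file" */
    lck.l_pid    = 0;
    if (fcntl(fd, F_GETLK, &lck) == -1)
        return -1;
    if (lck.l_type == F_UNLCK)
        return 0;       /* nothing would block the lock */
    if (holder)
        *holder = lck.l_pid;
    return 1;
}
```

F_GETLK never reports locks held by the calling process itself, so the
probe has to run from a process other than the one under test.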

Phil.

[-- Attachment #2: kill-locks.c --]
[-- Type: text/x-csrc, Size: 2274 bytes --]

#define _GNU_SOURCE
#define _LARGEFILE_SOURCE
#define _FILE_OFFSET_BITS 64

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <signal.h>
#include <sys/wait.h>
#include <errno.h>

#define FNAME		"kill-locks.tmp"
#define BUFSIZE		16384
#define DEATHSIG	SIGINT

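/* Intentionally empty: installed only so that SIGUSR1 and SIGCHLD
   interrupt sigsuspend() in the parent instead of being ignored or
   killing it. */
void sighandler(int signum)
{
  (void) signum;
}

int
main(void)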
{
  int			successcount = 0;
  struct sigaction	sa;
  sigset_t		blockset, origset, waitset;
  /**/

  sa.sa_handler = &sighandler;
  sa.sa_flags	= 0;
  sigemptyset(&sa.sa_mask);
  if (sigaction(SIGUSR1, &sa, NULL) == -1)
    perror("sigaction(SIGUSR1)"), exit(1);
  if (sigaction(SIGCHLD, &sa, NULL) == -1)
    perror("sigaction(SIGCHLD)"), exit(1);

  sigemptyset(&blockset);
  sigaddset(&blockset, SIGUSR1);
  sigaddset(&blockset, SIGCHLD);
  if (sigprocmask(SIG_BLOCK, &blockset, &origset) == -1)
    perror("sigprocmask"), exit(1);
  waitset = origset;
  sigdelset(&waitset, SIGUSR1);
  sigdelset(&waitset, SIGCHLD);
  sigaddset(&waitset, DEATHSIG);

  while (1)
    {
      pid_t	childpid = fork();
      int	status;
      /**/

      if (childpid == (pid_t) -1)
	perror("fork()"), exit(1);
      if (childpid == 0)
	{
	  /* Child */
	  int		fd;
	  struct flock	lck;
	  char		buf[BUFSIZE];
	  /**/

	  if (sigprocmask(SIG_SETMASK, &origset, NULL) == -1)
	    perror("[child] sigprocmask"), exit(1);

	  fd = open(FNAME, O_RDWR|O_CREAT, 0666);
	  if (fd == -1)
	    perror("[child] open()"), exit(1);

	  lck.l_type   = F_WRLCK;
	  lck.l_whence = SEEK_SET;
	  lck.l_start  = (off_t)0;
	  lck.l_len    = (off_t)0;
	  if (fcntl(fd, F_SETLK, &lck) == -1)
	    perror("[child] fcntl(F_SETLK)"), exit(1);
	  memset(buf, 0, sizeof(buf));
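	  /* Tell the parent the lock is held, then keep writes in flight
	     so that DEATHSIG arrives while I/O is pending. */
	  kill(getppid(), SIGUSR1);
	  while (1)
	    write(fd, buf, sizeof(buf));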
	}

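      /* Wait for SIGUSR1 (child holds the lock) or SIGCHLD (child
         exited early); when sigsuspend() returns it is with -1/EINTR. */
      if ( ! (sigsuspend(&waitset) == -1 && errno == EINTR))
	perror("sigsuspend"), exit(1);
      /* Kill the child after a random delay of up to 1 ms. */
      usleep(rand()%1000);
      kill(childpid, DEATHSIG);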
      if (waitpid(childpid, &status, 0) != childpid)
	perror("waitpid"), exit(1);
      if ( ! (WIFSIGNALED(status) && WTERMSIG(status) == DEATHSIG))
	{
	  fprintf(stderr,
		  "unexpected status from child %08X\n"
		  "successful locking attempts: %d\n",
		  status, successcount);
	  exit(1);
	}
      ++successcount;
    }
}


* Re: Killed process on NFS client can result in lost lock on server
  2003-09-30 20:06 Killed process on NFS client can result in lost lock on server Philippe Troin
@ 2003-09-30 20:52 ` Trond Myklebust
  2003-09-30 22:35   ` Philippe Troin
  0 siblings, 1 reply; 5+ messages in thread
From: Trond Myklebust @ 2003-09-30 20:52 UTC
  To: nfs; +Cc: Trond Myklebust

>>>>> " " == Philippe Troin <phil@fifi.org> writes:

     > While trying to make this test program, I've noticed that the
     > problem only occurs while I/O is done on the locked file. Note
     > the write() in a while loop in the test program. I could not
     > get the bad behavior to show up if no I/O is going on.

Yep. It's the same problem as in nlmclnt_proc(): we have to clean up
all locks come rain or shine when the process exits.

Cheers,
  Trond

--- linux-2.4.23-pre5/fs/nfs/file.c.orig	2003-07-09 14:10:21.000000000 -0400
+++ linux-2.4.23-pre5/fs/nfs/file.c	2003-09-30 16:48:52.000000000 -0400
@@ -293,7 +293,8 @@
 	status2 = filemap_fdatawait(inode->i_mapping);
 	if (status2 && !status)
 		status = status2;
-	if (status < 0)
+	/* Note: Ignore status if we're cleaning up locks on process exit */
+	if (status < 0 && !(current->flags & PF_EXITING))
 		return status;
 
 	lock_kernel();
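
As I read the hunk above, the early return is what lets a lock leak:
if flushing dirty pages fails (for instance because the write was
interrupted by the fatal signal), the function returns before the
unlock request is ever sent. The following is a user-space model of
that condition only, not kernel code; both function names and the
"exiting" parameter are hypothetical stand-ins, the latter for
(current->flags & PF_EXITING).

```c
#include <stdbool.h>

/* Before the patch: any flush error returns early, skipping the
 * lock/unlock request. */
static bool old_returns_early(int status)
{
    return status < 0;
}

/* After the patch: a flush error returns early only while the process
 * is still alive; during exit the unlock goes ahead regardless. */
static bool new_returns_early(int status, bool exiting)
{
    return status < 0 && !exiting;
}
```

With a negative status and exiting set, old_returns_early() is true
while new_returns_early() is false, which is the case the patch is
meant to change. The behavior for a live process is the same in both.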




-------------------------------------------------------
This sf.net email is sponsored by:ThinkGeek
Welcome to geek heaven.
http://thinkgeek.com/sf
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs


* Re: Killed process on NFS client can result in lost lock on server
  2003-09-30 20:52 ` Trond Myklebust
@ 2003-09-30 22:35   ` Philippe Troin
  2003-09-30 22:36     ` Trond Myklebust
  2003-10-01  6:38     ` Philippe Troin
  0 siblings, 2 replies; 5+ messages in thread
From: Philippe Troin @ 2003-09-30 22:35 UTC
  To: Trond Myklebust; +Cc: nfs

Trond Myklebust <trond.myklebust@fys.uio.no> writes:

> >>>>> " " == Philippe Troin <phil@fifi.org> writes:
> 
>      > While trying to make this test program, I've noticed that the
>      > problem only occurs while I/O is done on the locked file. Note
>      > the write() in a while loop in the test program. I could not
>      > get the bad behavior to show up if no I/O is going on.
> 
> Yep. It's the same problem as in nlmclnt_proc(): we have to clean up
> all locks come rain or shine when the process exits.

Thanks for the patch Trond, I'll give it a shot later today.

Is this patch in a -pre kernel yet or in 2.6.x or is it a new bug?

Phil.



* Re: Killed process on NFS client can result in lost lock on server
  2003-09-30 22:35   ` Philippe Troin
@ 2003-09-30 22:36     ` Trond Myklebust
  2003-10-01  6:38     ` Philippe Troin
  1 sibling, 0 replies; 5+ messages in thread
From: Trond Myklebust @ 2003-09-30 22:36 UTC
  To: Philippe Troin; +Cc: nfs


     > Is this patch in a -pre kernel yet or in 2.6.x or is it a new
     > bug?

It is new...

Cheers,
  Trond



* Re: Killed process on NFS client can result in lost lock on server
  2003-09-30 22:35   ` Philippe Troin
  2003-09-30 22:36     ` Trond Myklebust
@ 2003-10-01  6:38     ` Philippe Troin
  1 sibling, 0 replies; 5+ messages in thread
From: Philippe Troin @ 2003-10-01  6:38 UTC
  To: Trond Myklebust; +Cc: nfs

Philippe Troin <phil@fifi.org> writes:

> Trond Myklebust <trond.myklebust@fys.uio.no> writes:
> 
> > >>>>> " " == Philippe Troin <phil@fifi.org> writes:
> > 
> >      > While trying to make this test program, I've noticed that the
> >      > problem only occurs while I/O is done on the locked file. Note
> >      > the write() in a while loop in the test program. I could not
> >      > get the bad behavior to show up if no I/O is going on.
> > 
> > Yep. It's the same problem as in nlmclnt_proc(): we have to clean up
> > all locks come rain or shine when the process exits.
> 
> Thanks for the patch Trond, I'll give it a shot later today.

Unfortunately, your patch does not fix the bug, although it makes it
less frequent. Please try running kill-locks for an extended period to
see the bug happen again. With the patch, the problem shows up here
after 100-200 successful locking attempts, versus fewer than 10
before it.

Phil.

