public inbox for linux-kernel@vger.kernel.org
* NFS file locking?
@ 2001-10-14 18:11 Larry McVoy
  2001-10-14 23:52 ` Neil Brown
  2001-10-15  1:43 ` Alan Cox
  0 siblings, 2 replies; 5+ messages in thread
From: Larry McVoy @ 2001-10-14 18:11 UTC (permalink / raw)
  To: linux-kernel


Hi, the open(2) man page says:

       O_EXCL When  used with O_CREAT, if the file already exists
              it is an error and the open will fail.   O_EXCL  is
              broken  on NFS file systems, programs which rely on
              it for performing locking tasks will contain a race
              condition.  The solution for performing atomic file
              locking using a lockfile is to create a unique file
              on  the  same  fs (e.g., incorporating hostname and
              pid), use link(2) to make a link to  the  lockfile.
              If  link() returns 0, the lock is successful.
              Otherwise, use stat(2) on the unique file to check  if
              its  link  count  has increased to 2, in which case
              the lock is also successful.

I coded this up and tried it here on a cluster of different operating
systems (Linux 2.4.5 server, linux, freebsd, solaris, aix, hpux, irix
clients) and it doesn't work.

2 questions:

a) is it the belief of folks here that this should work?

b) if performance isn't a big issue, is there any portable way to do
   locking over NFS with just files?



* Re: NFS file locking?
  2001-10-14 18:11 NFS file locking? Larry McVoy
@ 2001-10-14 23:52 ` Neil Brown
  2001-10-15  2:38   ` Larry McVoy
  2001-10-15  1:43 ` Alan Cox
  1 sibling, 1 reply; 5+ messages in thread
From: Neil Brown @ 2001-10-14 23:52 UTC (permalink / raw)
  To: Larry McVoy; +Cc: linux-kernel

On Sunday October 14, lm@bitmover.com wrote:
> Hi, the open(2) man page says:
> 
>        O_EXCL When  used with O_CREAT, if the file already exists
>               it is an error and the open will fail.   O_EXCL  is
>               broken  on NFS file systems, programs which rely on
>               it for performing locking tasks will contain a race
>               condition.  The solution for performing atomic file
>               locking using a lockfile is to create a unique file
>               on  the  same  fs (e.g., incorporating hostname and
>               pid), use link(2) to make a link to  the  lockfile.
>               If  link() returns 0, the lock is successful.
>               Otherwise, use stat(2) on the unique file to check  if
>               its  link  count  has increased to 2, in which case
>               the lock is also successful.
> 
> I coded this up and tried it here on a cluster of different operating
> systems (Linux 2.4.5 server, linux, freebsd, solaris, aix, hpux, irix
> clients) and it doesn't work.
> 
> 2 questions:
> 
> a) is it the belief of folks here that this should work?

No.  It is unsupportable with NFSv2.
The NFSv3 protocol does provide support, but I don't think the Linux
NFSv3 client supports it yet because the VFS layer tries to handle all
the exclusion, and doesn't give the file-system a chance.

> 
> b) if performance isn't a big issue, is there any portable way to do
>    locking over NFS with just files?

   Instead of creating a lock file, create a lock symlink.
   Have the content of the symlink be something recognisably unique.
   e.g. hostname.pid
   If the "symlink" syscall succeeds, you have got the lock.
   If it fails, issue a readlink and see if the content is what you
   tried to create (RPC packet loss and retransmit could have caused
   an incorrect failure return).  If it is, you have the lock.
   If not, you don't.

   Similar tricks can be done with hard links if you really want a
   file.
   i.e. create a file with a unique name and then hard-link it to the
   lock-file-name.  On apparent failure, check the inode number.
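
A minimal sketch of the hard-link variant just described, assuming a
POSIX system. The function name try_link_lock and both path arguments
are made up for illustration; this is not code from anyone in the
thread. It ignores the return value of link() and decides purely by
comparing device and inode numbers, so a lost RPC reply cannot produce
a false failure. Whether it is safe in practice still depends on how
the NFS client caches attributes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Try once to take `lockfile` by hard-linking a private file to it.
 * `unique` is a path on the same filesystem that only this process
 * uses (e.g. built from hostname and pid).  Both names are hypothetical.
 * Returns 0 if we hold the lock, -1 otherwise.
 */
int
try_link_lock(const char *unique, const char *lockfile)
{
	struct stat	mine, theirs;
	int	fd, got = -1;

	assert(unique && lockfile);
	fd = open(unique, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0) return (-1);
	close(fd);

	/* Result ignored: a retransmitted request can report a bogus error. */
	(void)link(unique, lockfile);

	/* We own the lock only if both names refer to the same inode. */
	if (stat(unique, &mine) == 0 && stat(lockfile, &theirs) == 0 &&
	    mine.st_dev == theirs.st_dev && mine.st_ino == theirs.st_ino) {
		got = 0;
	}
	unlink(unique);		/* the lock name keeps the inode alive */
	return (got);
}
```

Releasing the lock is then just unlink(lockfile) by the holder.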


   With all these approaches (including O_EXCL) the tricky bit is
   cleaning up after a failed application left a lockfile lying
   around.
   Automatically deleting it is racy unless you guarantee that only
   one process could ever consider deleting an old lock file.  e.g. a
   cron job on the fileserver that runs every 5 minutes and deletes
   any lock file older than 10 minutes.
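
A rough sketch of what such a single cleaner might do for one lock,
assuming POSIX. The name reap_stale_lock and its parameters are
hypothetical. The current time is passed in rather than read inside
the function, so the caller decides whose clock is used; running it on
the fileserver, as suggested above, avoids comparing a client clock
against server timestamps.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/*
 * Remove `lockfile` if it was last modified more than `max_age`
 * seconds before `now`.  Returns 1 if it was removed, 0 if it was
 * left alone (absent or still young), -1 if unlink() failed.
 * lstat() is used so a symlink lock is judged by its own timestamp.
 */
int
reap_stale_lock(const char *lockfile, time_t now, time_t max_age)
{
	struct stat	sb;

	assert(lockfile && max_age >= 0);
	if (lstat(lockfile, &sb) != 0) return (0);
	if (now - sb.st_mtime <= max_age) return (0);
	return (unlink(lockfile) == 0 ? 1 : -1);
}
```

A cron job would call it as reap_stale_lock(path, time(NULL), 600) for
the 10 minute limit. This is only race-free under the condition stated
above: nothing else ever deletes old lock files.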

NeilBrown


* Re: NFS file locking?
  2001-10-14 18:11 NFS file locking? Larry McVoy
  2001-10-14 23:52 ` Neil Brown
@ 2001-10-15  1:43 ` Alan Cox
  1 sibling, 0 replies; 5+ messages in thread
From: Alan Cox @ 2001-10-15  1:43 UTC (permalink / raw)
  To: Larry McVoy; +Cc: linux-kernel

> a) is it the belief of folks here that this should work?

NFSv2 doesn't have the needed semantics

> b) if performance isn't a big issue, is there any portable way to do
>    locking over NFS with just files?

The classic way is to use link(). 


* Re: NFS file locking?
  2001-10-14 23:52 ` Neil Brown
@ 2001-10-15  2:38   ` Larry McVoy
  2001-10-17 11:15     ` Miquel van Smoorenburg
  0 siblings, 1 reply; 5+ messages in thread
From: Larry McVoy @ 2001-10-15  2:38 UTC (permalink / raw)
  To: Neil Brown; +Cc: Larry McVoy, linux-kernel

>    Instead of creating a lock file, create a lock symlink.
>    Have the content of the symlink be something recognisably unique.
>    e.g. hostname.pid
>    If the "symlink" syscall succeeds, you have got the lock.
>    If it fails, issue a readlink and see if the content is what you
>    tried to create (RPC packet loss and retransmit could have caused
>    an incorrect failure return).  If it is, you have the lock.
>    If not, you don't.

OK, tried that too, here's the code.  Doesn't work.  Neither does the
link approach.  Am I doing something wrong?  It seems to me that I'm
completely at the mercy of the client NFS implementation - if it caches
stuff wrong, I'm hosed.  There has to be some cute trick to get past this.

--lm


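#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Local helpers; these definitions are assumed, they were not posted. */
#define	unless(e)	if (!(e))
#define	streq(a, b)	(!strcmp((a), (b)))

char	*aprintf(char *fmt, ...);
char	*sccs_gethost(void);
int	mine(char *file);

int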
sccs_lockfile(char *lockfile, int seconds)
{
	char	*s;
	char	buf[300];
	int	n, uslp = 1000, waited = 0;

	s = aprintf("%u %s", getpid(), sccs_gethost());
	for ( ;; ) {
		if (symlink(s, lockfile) == 0) {
			free(s);
			return (0);
		}
		/* Leave room for the terminating NUL. */
		n = readlink(lockfile, buf, sizeof(buf) - 1);
		if (n > 0) {
			buf[n] = 0;
			if (streq(s, buf)) {
				free(s);
				return (0);
			}
		}
		if (seconds && ((waited / 1000000) >= seconds)) {
			fprintf(stderr, "timed out waiting for %s\n", lockfile);
			free(s);
			return (-1);
		}
		usleep(uslp);
		waited += uslp;
		if (uslp < 20000) uslp <<= 1;
	}
	/* NOTREACHED */
}

/*
 * Usage: a.out iterations lockfile
 */
int
main(int ac, char **av)
{
	int	i, iter;
	int	me = getpid();

	unless (ac == 3) return (1);
	unless ((iter = atoi(av[1])) > 0) return (1);
	printf("%d starts\n", me);
	for (i = 1; i <= iter; ++i) {
		sccs_lockfile(av[2], 0);
		assert(mine(av[2]));
		unlink(av[2]);
		unless (i % 10) printf("%d locked %d times\n", me, i);
	}
	printf("%d done\n", me);
	return (0);
}

int
mine(char *file)
{
	char	buf[300];
	char	*s;
	int	n;

	n = readlink(file, buf, sizeof(buf) - 1);	/* room for the NUL */
	if (n > 0) {
		s = aprintf("%u %s", getpid(), sccs_gethost());
		buf[n] = 0;
		n = streq(s, buf);
		unless (n) fprintf(stderr, "%s != %s\n", s, buf);
		free(s);
		return (n);
	}
	return (0);
}

/*
 * This function works like sprintf(), except it return a
 * malloc'ed buffer which caller should free when done
 */
char *
aprintf(char *fmt, ...)
{
	va_list	ptr;
	int	rc, size = strlen(fmt) + 64;
	char	*buf = malloc(size);

	va_start(ptr, fmt);
	rc = vsnprintf(buf, size, fmt, ptr);
	va_end(ptr);
	/*
	 * On IRIX, it truncates and returns size-1.
	 * We can't assume that that is OK, even though that might be
	 * a perfect fit.  We always bump up the size and try again.
	 * This can rarely lead to an extra alloc that we didn't need,
	 * but that's tough.
	 */
	while ((rc < 0) || (rc >= (size-1))) {
		size *= 2;
		free(buf);
		buf = malloc(size);
		va_start(ptr, fmt);
		rc = vsnprintf(buf, size, fmt, ptr);
		va_end(ptr);
	}
	return (buf); /* caller should free */
}

char	*
sccs_gethost()
{
	static	char	host[256];

	if (gethostname(host, sizeof(host)) == -1) return "?";
	return (host);
}


* Re: NFS file locking?
  2001-10-15  2:38   ` Larry McVoy
@ 2001-10-17 11:15     ` Miquel van Smoorenburg
  0 siblings, 0 replies; 5+ messages in thread
From: Miquel van Smoorenburg @ 2001-10-17 11:15 UTC (permalink / raw)
  To: linux-kernel


In article <20011014193844.C13153@work.bitmover.com>,
Larry McVoy  <lm@bitmover.com> wrote:
>OK, tried that too, here's the code.  Doesn't work.  Neither does the
>link approach.  Am I doing something wrong?  It seems to me that I'm
>completely at the mercy of the client NFS implementation - if it caches
>stuff wrong, I'm hosed.  There has to be some cute trick to get past this.

Download ftp://ftp.debian.org/debian/pool/main/libl/liblockfile/liblockfile_1.03.tar.gz

It contains NFS safe locking functions, and it knows how to work around
NFS client caches. And it documents all algorithms in the manpages too.

ALGORITHM
       The algorithm that is used to  create  a  lockfile  in  an
       atomic way, even over NFS, is as follows:

       1      A  unique  file  is  created. In printf format, the
              name of the file is .lk%05d%x%s. The first argument
              (%05d) is the current process id. The second argument
              (%x) consists of the 4 minor bits of the value
              returned by time(2). The last argument is the system
              hostname.


       2      Then the lockfile is  created  using  link(2).  The
              return value of link is ignored.


       3      Now the lockfile is stat()ed. If the stat fails, we
              go to step 6.


       4      The stat value of the  lockfile  is  compared  with
              that  of  the temporary file. If they are the same,
              we have the lock. The temporary file is deleted and
              a value of 0 (success) is returned to the caller.


       5      A  check is made to see if the existing lockfile is
              a valid one. If it isn't valid, the stale  lockfile
              is deleted.


       6      Before retrying, we sleep for n seconds. n is initially
              5 seconds, but after  every  retry  5  extra
              seconds  is added up to a maximum of 60 seconds (an
              incremental backoff). Then we go to step  2  up  to
              retries times.
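
Steps 1 and 6 are the easiest to get subtly wrong, so here is a small
sketch of just those two, assuming POSIX. The function names
lk_tempname and lk_backoff are invented for this example and are not
the liblockfile API. "The 4 minor bits" is read here as the low-order
4 bits of the time value, which is an assumption about the manpage's
wording. Steps 2 to 5 (link, stat, compare, stale check) are left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <time.h>

/*
 * Step 1: format the unique name as .lk%05d%x%s from the pid, the
 * low 4 bits of the time, and the hostname.  Returns what snprintf()
 * returns (the length needed, not counting the NUL).
 */
int
lk_tempname(char *out, size_t len, pid_t pid, time_t now, const char *host)
{
	assert(out && host);
	return (snprintf(out, len, ".lk%05d%x%s",
	    (int)pid, (unsigned)(now & 15), host));
}

/*
 * Step 6: seconds to sleep before retry number `attempt` (0-based).
 * Starts at 5, grows by 5 per retry, and is capped at 60.
 */
unsigned
lk_backoff(unsigned attempt)
{
	if (attempt >= 11) return (60);
	return (5 * (attempt + 1));
}
```

For pid 42, a time whose low 4 bits are all set, and host "box", the
name comes out as .lk00042fbox. The delays run 5, 10, 15 and so on up
to 60 seconds.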


REMOTE FILE SYSTEMS AND THE KERNEL ATTRIBUTE CACHE
       If  you  are  using  lockfile_create to create a lock on a
       file that resides on a remote server, and you already have
       that  file open, you need to flush the NFS attribute cache
       after locking. This is needed  to  prevent  the  following
       scenario:

       o  open /var/mail/USERNAME
       o  attributes,  such as size, inode, etc are now cached in
          the kernel!
       o  meanwhile,  another  remote  system  appends  data   to
          /var/mail/USERNAME
       o  grab lock using lockfile_create()
       o  seek to end of file
       o  write data

       Now the end of the file really isn't the end of the file -
       the kernel cached the attributes on open, and  st_size  is
       not  the  end  of  the  file anymore. So after locking the
       file, you need to tell the kernel to flush  the  NFS  file
       attribute cache.

       The only portable way to do this is the POSIX fcntl() file
       locking primitives - locking a file using fcntl() has  the
       fortunate   side-effect   of  invalidating  the  NFS  file
       attribute cache of the kernel.

       lockfile_create() cannot do this for you for two  reasons.
       One,  it  just  creates  a lockfile- it doesn't know which
       file you are actually trying to  lock!  Two,  even  if  it
       could deduce the file you're locking from the filename, by
       just opening and  closing  it,  it  would  invalidate  any
       existing  POSIX  locks  the  program might already have on
       that file (yes, POSIX locking semantics are insane!).

       So basically what you need to do is something like this:

         fd = open("/var/mail/USER", O_RDWR);
         .. program code ..

         lockfile_create("/var/mail/USER.lock", x, y);

         /* Invalidate NFS attribute cache using POSIX locks */
         if (lockf(fd, F_TLOCK, 0) == 0) lockf(fd, F_ULOCK, 0);

       You have to be careful with this if you're putting this in
       an  existing  program that might already be using fcntl(),
       flock() or lockf() locking- you might invalidate  existing
       locks.



       There  is also a non-portable way. A lot of NFS operations
       return the updated attributes - and the Linux kernel actually
       uses  these  to  update  the attribute cache. One of
       these operations is chmod(2).

       So stat()ing a file and then chmod()ing it  to  st.st_mode
       will  not  actually change the file, nor will it interfere
       with any locks on the file, but  it  will  invalidate  the
       attribute cache. The equivalent to use from a shell script
       would be

         chmod u=u /var/mail/USER
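
From C, the same stat-then-chmod trick might look like the sketch
below, assuming POSIX. The function name refresh_attrs is made up.
As the text above says, this relies on Linux client behaviour and is
not portable as a cache flush; on other clients it may simply be a
harmless no-op.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/stat.h>

/*
 * Re-apply a file's current permission bits.  The file is unchanged,
 * but the chmod reply carries fresh attributes that the client can
 * use to update its cache.  Returns 0 on success, -1 on error.
 */
int
refresh_attrs(const char *path)
{
	struct stat	sb;

	assert(path);
	if (stat(path, &sb) != 0) return (-1);
	/* Drop the file-type bits; chmod() takes permission bits only. */
	return (chmod(path, sb.st_mode & 07777));
}
```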

Mike.
-- 
Move sig.


