NFS: What is this delay for?

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* NFS: What is this delay for?
@ 2012-08-24 14:19 Tetsuo Handa
  2012-08-26 14:56 ` Idan Kedar
  0 siblings, 1 reply; 4+ messages in thread
From: Tetsuo Handa @ 2012-08-24 14:19 UTC (permalink / raw)
  To: linux-fsdevel

Hello.

I noticed a strange delay when I was running a test program for
http://serverfault.com/questions/235059/vfs-file-max-limit-1231582-reached .

---------- test program start ----------
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/file.h>
#include <stdio.h>
#include <unistd.h>
#include <errno.h>

int main(int argc, char *argv[]) {
	while (1) {
		struct flock lock = { };
		int fd = open("/mnt/file", O_CREAT | O_RDWR, 0600);
		lock.l_type = F_WRLCK;
		lock.l_whence = SEEK_SET;
		if (fcntl(fd, F_SETLK, &lock)) {
			int err = errno;
			fprintf(stderr, "Error %d\n", err);
			return 1;
		}
		close(fd);
		fprintf(stderr, ".");
	}
}
---------- test program end ----------

When above program is executed after "service nfs start", above program sleeps
for about 90-100 seconds at the first fcntl() attempt. But after once the first
'.' is printed, above program never sleeps even after above program was
re-executed.

# cat /etc/exports
/mountdir *(rw,all_squash,sync,fsid=0)
# service nfs start
# mount -t nfs 127.0.0.1:/mountdir /mnt/
# time ./a.out

sysrq on 3.6.0-rc2-00400-g23dcfa6 kernel shows:

[  728.514014] a.out           S 00000000  5656  4400   3765 0x00000080
[  728.514014]  d9a0dd7c 00000046 d9a0dcf0 00000000 00000001 dcf0aec8 00000023 00000000
[  728.514014]  00000000 dcf0aa60 c17d2220 c17d2220 9fac0f21 000000a8 c17d2220 c17d2220
[  728.514014]  9fa96cf7 000000a8 dcf0aa60 000000a8 9fac0995 dcf0aa60 00000000 00000000
[  728.514014] Call Trace:
[  728.514014]  [<c143fc55>] ? _raw_spin_unlock_irqrestore+0x35/0x60
[  728.514014]  [<c143e45e>] schedule+0x1e/0x50
[  728.514014]  [<c143c6ea>] schedule_timeout+0xea/0x170
[  728.514014]  [<c1040a10>] ? lock_timer_base+0x50/0x50
[  728.514014]  [<c11e1f27>] nlmclnt_call+0x97/0x1f0
[  728.514014]  [<c1053210>] ? wake_up_bit+0x30/0x30
[  728.514014]  [<c11e282e>] nlmclnt_proc+0x22e/0x6f0
[  728.514014]  [<c1062e1e>] ? local_clock+0x4e/0x60
[  728.514014]  [<c11a567a>] nfs3_proc_lock+0x1a/0x20
[  728.514014]  [<c1196069>] do_setlk+0xc9/0xf0
[  728.514014]  [<c11961b8>] nfs_lock+0x98/0x170
[  728.514014]  [<c1196120>] ? nfs_flock+0x90/0x90
[  728.514014]  [<c1122161>] vfs_lock_file+0x21/0x50
[  728.514014]  [<c112227a>] do_lock_file_wait+0x4a/0xd0
[  728.514014]  [<c10c7426>] ? might_fault+0x46/0xa0
[  728.514014]  [<c1230304>] ? _copy_from_user+0x34/0x80
[  728.514014]  [<c112278e>] fcntl_setlk+0xce/0x2c0
[  728.514014]  [<c107ba3b>] ? trace_hardirqs_on+0xb/0x10
[  728.514014]  [<c10f0b82>] do_fcntl+0x1e2/0x5f0
[  728.514014]  [<c10f1013>] sys_fcntl64+0x83/0xd0
[  728.514014]  [<c144048c>] sysenter_do_call+0x12/0x32

Same result for CentOS 6.3's 2.6.32-279.5.1.el6.i686 kernel:

a.out         S 00000286     0  1290   1160 0x00000080
 df4d1aa0 00000086 e0d20aac 00000286 00000007 df57c24c dcd94544 0000ad6f
 00000000 df8393c0 0000001e d1daf53c 0000001e c0b25680 c0b25680 df4d1d48
 c0b25680 c0b21024 c0b25680 df4d1d48 de01e000 00000246 e0d25b24 e0d3dd88
Call Trace:
 [<e0d20aac>] ? xs_sendpages+0x1cc/0x240 [sunrpc]
 [<e0d25b24>] ? rpc_wake_up_next+0xc4/0x180 [sunrpc]
 [<c0440372>] ? __wake_up+0x42/0x60
 [<e0d1cecb>] ? rpc_release_client+0x7b/0x90 [sunrpc]
 [<c0466137>] ? lock_timer_base+0x27/0x50
 [<c083d34c>] ? schedule_timeout+0x12c/0x250
 [<c04e3591>] ? mempool_free+0x81/0x90
 [<c0466210>] ? process_timeout+0x0/0x10
 [<e0daf8a6>] ? nlmclnt_call+0xd6/0x290 [lockd]
 [<c0476a10>] ? autoremove_wake_function+0x0/0x40
 [<e0db0349>] ? nlmclnt_proc+0x219/0x760 [lockd]
 [<c04769f3>] ? wake_up_bit+0x13/0x30
 [<e0eeda0e>] ? do_setlk+0xbe/0xe0 [nfs]
 [<e0eedbd3>] ? nfs_lock+0xb3/0x1b0 [nfs]
 [<c053be35>] ? do_filp_open+0x635/0xb00
 [<c051f102>] ? cache_alloc_refill+0x172/0x510
 [<e0eedb20>] ? nfs_lock+0x0/0x1b0 [nfs]
 [<c056e1d2>] ? vfs_lock_file+0x22/0x50
 [<c056e2eb>] ? do_lock_file_wait+0x4b/0xd0
 [<c056e64f>] ? fcntl_setlk+0xff/0x1e0
 [<c053cf52>] ? do_fcntl+0x92/0x440
 [<c04b14cc>] ? audit_syscall_entry+0x21c/0x240
 [<c053d35a>] ? sys_fcntl64+0x5a/0xd0
 [<c0409a9f>] ? sysenter_do_call+0x12/0x28

I wonder what this sleeping period is for.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: NFS: What is this delay for?
  2012-08-24 14:19 NFS: What is this delay for? Tetsuo Handa
@ 2012-08-26 14:56 ` Idan Kedar
  2012-08-27 19:16   ` J. Bruce Fields
  0 siblings, 1 reply; 4+ messages in thread
From: Idan Kedar @ 2012-08-26 14:56 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: linux-fsdevel

On Fri, Aug 24, 2012 at 5:19 PM, Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
> Hello.
>
> I noticed a strange delay when I was running a test program for
> http://serverfault.com/questions/235059/vfs-file-max-limit-1231582-reached .
>
> ---------- test program start ----------
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <sys/file.h>
> #include <stdio.h>
> #include <unistd.h>
> #include <errno.h>
>
> int main(int argc, char *argv[]) {
>         while (1) {
>                 struct flock lock = { };
>                 int fd = open("/mnt/file", O_CREAT | O_RDWR, 0600);
>                 lock.l_type = F_WRLCK;
>                 lock.l_whence = SEEK_SET;
>                 if (fcntl(fd, F_SETLK, &lock)) {
>                         int err = errno;
>                         fprintf(stderr, "Error %d\n", err);
>                         return 1;
>                 }
>                 close(fd);
>                 fprintf(stderr, ".");
>         }
> }
> ---------- test program end ----------
>
> When above program is executed after "service nfs start", above program sleeps
> for about 90-100 seconds at the first fcntl() attempt. But after once the first
> '.' is printed, above program never sleeps even after above program was
> re-executed.
>
> # cat /etc/exports
> /mountdir *(rw,all_squash,sync,fsid=0)
> # service nfs start
> # mount -t nfs 127.0.0.1:/mountdir /mnt/
> # time ./a.out
>
> sysrq on 3.6.0-rc2-00400-g23dcfa6 kernel shows:
>
> [  728.514014] a.out           S 00000000  5656  4400   3765 0x00000080
> [  728.514014]  d9a0dd7c 00000046 d9a0dcf0 00000000 00000001 dcf0aec8 00000023 00000000
> [  728.514014]  00000000 dcf0aa60 c17d2220 c17d2220 9fac0f21 000000a8 c17d2220 c17d2220
> [  728.514014]  9fa96cf7 000000a8 dcf0aa60 000000a8 9fac0995 dcf0aa60 00000000 00000000
> [  728.514014] Call Trace:
> [  728.514014]  [<c143fc55>] ? _raw_spin_unlock_irqrestore+0x35/0x60
> [  728.514014]  [<c143e45e>] schedule+0x1e/0x50
> [  728.514014]  [<c143c6ea>] schedule_timeout+0xea/0x170
> [  728.514014]  [<c1040a10>] ? lock_timer_base+0x50/0x50
> [  728.514014]  [<c11e1f27>] nlmclnt_call+0x97/0x1f0
> [  728.514014]  [<c1053210>] ? wake_up_bit+0x30/0x30
> [  728.514014]  [<c11e282e>] nlmclnt_proc+0x22e/0x6f0
> [  728.514014]  [<c1062e1e>] ? local_clock+0x4e/0x60
> [  728.514014]  [<c11a567a>] nfs3_proc_lock+0x1a/0x20
> [  728.514014]  [<c1196069>] do_setlk+0xc9/0xf0
> [  728.514014]  [<c11961b8>] nfs_lock+0x98/0x170
> [  728.514014]  [<c1196120>] ? nfs_flock+0x90/0x90
> [  728.514014]  [<c1122161>] vfs_lock_file+0x21/0x50
> [  728.514014]  [<c112227a>] do_lock_file_wait+0x4a/0xd0
> [  728.514014]  [<c10c7426>] ? might_fault+0x46/0xa0
> [  728.514014]  [<c1230304>] ? _copy_from_user+0x34/0x80
> [  728.514014]  [<c112278e>] fcntl_setlk+0xce/0x2c0
> [  728.514014]  [<c107ba3b>] ? trace_hardirqs_on+0xb/0x10
> [  728.514014]  [<c10f0b82>] do_fcntl+0x1e2/0x5f0
> [  728.514014]  [<c10f1013>] sys_fcntl64+0x83/0xd0
> [  728.514014]  [<c144048c>] sysenter_do_call+0x12/0x32
>
> Same result for CentOS 6.3's 2.6.32-279.5.1.el6.i686 kernel:
>
> a.out         S 00000286     0  1290   1160 0x00000080
>  df4d1aa0 00000086 e0d20aac 00000286 00000007 df57c24c dcd94544 0000ad6f
>  00000000 df8393c0 0000001e d1daf53c 0000001e c0b25680 c0b25680 df4d1d48
>  c0b25680 c0b21024 c0b25680 df4d1d48 de01e000 00000246 e0d25b24 e0d3dd88
> Call Trace:
>  [<e0d20aac>] ? xs_sendpages+0x1cc/0x240 [sunrpc]
>  [<e0d25b24>] ? rpc_wake_up_next+0xc4/0x180 [sunrpc]
>  [<c0440372>] ? __wake_up+0x42/0x60
>  [<e0d1cecb>] ? rpc_release_client+0x7b/0x90 [sunrpc]
>  [<c0466137>] ? lock_timer_base+0x27/0x50
>  [<c083d34c>] ? schedule_timeout+0x12c/0x250
>  [<c04e3591>] ? mempool_free+0x81/0x90
>  [<c0466210>] ? process_timeout+0x0/0x10
>  [<e0daf8a6>] ? nlmclnt_call+0xd6/0x290 [lockd]
>  [<c0476a10>] ? autoremove_wake_function+0x0/0x40
>  [<e0db0349>] ? nlmclnt_proc+0x219/0x760 [lockd]
>  [<c04769f3>] ? wake_up_bit+0x13/0x30
>  [<e0eeda0e>] ? do_setlk+0xbe/0xe0 [nfs]
>  [<e0eedbd3>] ? nfs_lock+0xb3/0x1b0 [nfs]
>  [<c053be35>] ? do_filp_open+0x635/0xb00
>  [<c051f102>] ? cache_alloc_refill+0x172/0x510
>  [<e0eedb20>] ? nfs_lock+0x0/0x1b0 [nfs]
>  [<c056e1d2>] ? vfs_lock_file+0x22/0x50
>  [<c056e2eb>] ? do_lock_file_wait+0x4b/0xd0
>  [<c056e64f>] ? fcntl_setlk+0xff/0x1e0
>  [<c053cf52>] ? do_fcntl+0x92/0x440
>  [<c04b14cc>] ? audit_syscall_entry+0x21c/0x240
>  [<c053d35a>] ? sys_fcntl64+0x5a/0xd0
>  [<c0409a9f>] ? sysenter_do_call+0x12/0x28
>
> I wonder what this sleeping period is for.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

After the NFS server restarts it has a grace period of 90 seconds, for
recovery purposes. You can read about it online.
You can also look it up on the NFS 4.1 RFC 5661:
http://tools.ietf.org/rfcmarkup?doc=5661

-- 
idank

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: NFS: What is this delay for?
  2012-08-26 14:56 ` Idan Kedar
@ 2012-08-27 19:16   ` J. Bruce Fields
  2012-08-27 21:50     ` Tetsuo Handa
  0 siblings, 1 reply; 4+ messages in thread
From: J. Bruce Fields @ 2012-08-27 19:16 UTC (permalink / raw)
  To: Idan Kedar; +Cc: Tetsuo Handa, linux-fsdevel

On Sun, Aug 26, 2012 at 05:56:30PM +0300, Idan Kedar wrote:
> On Fri, Aug 24, 2012 at 5:19 PM, Tetsuo Handa
> <penguin-kernel@i-love.sakura.ne.jp> wrote:
> > Hello.
> >
> > I noticed a strange delay when I was running a test program for
> > http://serverfault.com/questions/235059/vfs-file-max-limit-1231582-reached .
> >
> > ---------- test program start ----------
> > #include <sys/types.h>
> > #include <sys/stat.h>
> > #include <sys/file.h>
> > #include <stdio.h>
> > #include <unistd.h>
> > #include <errno.h>
> >
> > int main(int argc, char *argv[]) {
> >         while (1) {
> >                 struct flock lock = { };
> >                 int fd = open("/mnt/file", O_CREAT | O_RDWR, 0600);
> >                 lock.l_type = F_WRLCK;
> >                 lock.l_whence = SEEK_SET;
> >                 if (fcntl(fd, F_SETLK, &lock)) {
> >                         int err = errno;
> >                         fprintf(stderr, "Error %d\n", err);
> >                         return 1;
> >                 }
> >                 close(fd);
> >                 fprintf(stderr, ".");
> >         }
> > }
> > ---------- test program end ----------
> >
> > When above program is executed after "service nfs start", above program sleeps
> > for about 90-100 seconds at the first fcntl() attempt. But after once the first
> > '.' is printed, above program never sleeps even after above program was
> > re-executed.
> >
> > # cat /etc/exports
> > /mountdir *(rw,all_squash,sync,fsid=0)
> > # service nfs start
> > # mount -t nfs 127.0.0.1:/mountdir /mnt/
> > # time ./a.out
> >
> > sysrq on 3.6.0-rc2-00400-g23dcfa6 kernel shows:
> >
> > [  728.514014] a.out           S 00000000  5656  4400   3765 0x00000080
> > [  728.514014]  d9a0dd7c 00000046 d9a0dcf0 00000000 00000001 dcf0aec8 00000023 00000000
> > [  728.514014]  00000000 dcf0aa60 c17d2220 c17d2220 9fac0f21 000000a8 c17d2220 c17d2220
> > [  728.514014]  9fa96cf7 000000a8 dcf0aa60 000000a8 9fac0995 dcf0aa60 00000000 00000000
> > [  728.514014] Call Trace:
> > [  728.514014]  [<c143fc55>] ? _raw_spin_unlock_irqrestore+0x35/0x60
> > [  728.514014]  [<c143e45e>] schedule+0x1e/0x50
> > [  728.514014]  [<c143c6ea>] schedule_timeout+0xea/0x170
> > [  728.514014]  [<c1040a10>] ? lock_timer_base+0x50/0x50
> > [  728.514014]  [<c11e1f27>] nlmclnt_call+0x97/0x1f0
> > [  728.514014]  [<c1053210>] ? wake_up_bit+0x30/0x30
> > [  728.514014]  [<c11e282e>] nlmclnt_proc+0x22e/0x6f0
> > [  728.514014]  [<c1062e1e>] ? local_clock+0x4e/0x60
> > [  728.514014]  [<c11a567a>] nfs3_proc_lock+0x1a/0x20
> > [  728.514014]  [<c1196069>] do_setlk+0xc9/0xf0
> > [  728.514014]  [<c11961b8>] nfs_lock+0x98/0x170
> > [  728.514014]  [<c1196120>] ? nfs_flock+0x90/0x90
> > [  728.514014]  [<c1122161>] vfs_lock_file+0x21/0x50
> > [  728.514014]  [<c112227a>] do_lock_file_wait+0x4a/0xd0
> > [  728.514014]  [<c10c7426>] ? might_fault+0x46/0xa0
> > [  728.514014]  [<c1230304>] ? _copy_from_user+0x34/0x80
> > [  728.514014]  [<c112278e>] fcntl_setlk+0xce/0x2c0
> > [  728.514014]  [<c107ba3b>] ? trace_hardirqs_on+0xb/0x10
> > [  728.514014]  [<c10f0b82>] do_fcntl+0x1e2/0x5f0
> > [  728.514014]  [<c10f1013>] sys_fcntl64+0x83/0xd0
> > [  728.514014]  [<c144048c>] sysenter_do_call+0x12/0x32
> >
> > Same result for CentOS 6.3's 2.6.32-279.5.1.el6.i686 kernel:
> >
> > a.out         S 00000286     0  1290   1160 0x00000080
> >  df4d1aa0 00000086 e0d20aac 00000286 00000007 df57c24c dcd94544 0000ad6f
> >  00000000 df8393c0 0000001e d1daf53c 0000001e c0b25680 c0b25680 df4d1d48
> >  c0b25680 c0b21024 c0b25680 df4d1d48 de01e000 00000246 e0d25b24 e0d3dd88
> > Call Trace:
> >  [<e0d20aac>] ? xs_sendpages+0x1cc/0x240 [sunrpc]
> >  [<e0d25b24>] ? rpc_wake_up_next+0xc4/0x180 [sunrpc]
> >  [<c0440372>] ? __wake_up+0x42/0x60
> >  [<e0d1cecb>] ? rpc_release_client+0x7b/0x90 [sunrpc]
> >  [<c0466137>] ? lock_timer_base+0x27/0x50
> >  [<c083d34c>] ? schedule_timeout+0x12c/0x250
> >  [<c04e3591>] ? mempool_free+0x81/0x90
> >  [<c0466210>] ? process_timeout+0x0/0x10
> >  [<e0daf8a6>] ? nlmclnt_call+0xd6/0x290 [lockd]
> >  [<c0476a10>] ? autoremove_wake_function+0x0/0x40
> >  [<e0db0349>] ? nlmclnt_proc+0x219/0x760 [lockd]
> >  [<c04769f3>] ? wake_up_bit+0x13/0x30
> >  [<e0eeda0e>] ? do_setlk+0xbe/0xe0 [nfs]
> >  [<e0eedbd3>] ? nfs_lock+0xb3/0x1b0 [nfs]
> >  [<c053be35>] ? do_filp_open+0x635/0xb00
> >  [<c051f102>] ? cache_alloc_refill+0x172/0x510
> >  [<e0eedb20>] ? nfs_lock+0x0/0x1b0 [nfs]
> >  [<c056e1d2>] ? vfs_lock_file+0x22/0x50
> >  [<c056e2eb>] ? do_lock_file_wait+0x4b/0xd0
> >  [<c056e64f>] ? fcntl_setlk+0xff/0x1e0
> >  [<c053cf52>] ? do_fcntl+0x92/0x440
> >  [<c04b14cc>] ? audit_syscall_entry+0x21c/0x240
> >  [<c053d35a>] ? sys_fcntl64+0x5a/0xd0
> >  [<c0409a9f>] ? sysenter_do_call+0x12/0x28
> >
> > I wonder what this sleeping period is for.
...
> After the NFS server restarts it has a grace period of 90 seconds, for
> recovery purposes. You can read about it online.
> You can also look it up on the NFS 4.1 RFC 5661:
> http://tools.ietf.org/rfcmarkup?doc=5661

Yes.  In this particular case it's NLM, not NFSv4, but it's the same
basic idea: no new locks (opens too in the case of NFSv4) are allowed
for a while at the start, to give previous clients a chance to reclaim
their locks first.

--b.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: NFS: What is this delay for?
  2012-08-27 19:16   ` J. Bruce Fields
@ 2012-08-27 21:50     ` Tetsuo Handa
  0 siblings, 0 replies; 4+ messages in thread
From: Tetsuo Handa @ 2012-08-27 21:50 UTC (permalink / raw)
  To: bfields, idank; +Cc: linux-fsdevel

J. Bruce Fields wrote:
> On Sun, Aug 26, 2012 at 05:56:30PM +0300, Idan Kedar wrote:
> > After the NFS server restarts it has a grace period of 90 seconds, for
> > recovery purposes. You can read about it online.
> > You can also look it up on the NFS 4.1 RFC 5661:
> > http://tools.ietf.org/rfcmarkup?doc=5661
> 
> Yes.  In this particular case it's NLM, not NFSv4, but it's the same
> basic idea: no new locks (opens too in the case of NFSv4) are allowed
> for a while at the start, to give previous clients a chance to reclaim
> their locks first.

I see. I wondered because the process blocked at fcntl() rather than open(),
but I understand that this is expected behaviour. Thank you.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2012-08-27 21:51 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-08-24 14:19 NFS: What is this delay for? Tetsuo Handa
2012-08-26 14:56 ` Idan Kedar
2012-08-27 19:16   ` J. Bruce Fields
2012-08-27 21:50     ` Tetsuo Handa

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).