Re: AutoFS+NFSv4 server down = LOOOOONG timeout.

From: Ian Kent <ikent@redhat.com>
To: "Carlos André" <candrecn@gmail.com>
Cc: Chuck Lever <chuck.lever@oracle.com>,
	Linux NFSv4 mailing list <nfsv4@linux-nfs.org>,
	NFS list <linux-nfs@vger.kernel.org>
Subject: Re: AutoFS+NFSv4 server down = LOOOOONG timeout.
Date: Mon, 24 Aug 2009 22:57:07 +0800
Message-ID: <4A92AA43.6070304@redhat.com>
In-Reply-To: <f6ce31e30908240627gff0a7eeu3c884185e6324518-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

Carlos André wrote:
> Hi Ian,
>
> Thanks for the patch, and sorry for the delay (I was expecting to receive
> your reply on the bug tracker, not here) :)
>
> But this patch didn't work for me as expected...  :(
>
>
> First I changed "#MOUNT_WAIT=-1" to "MOUNT_WAIT=10",
> and later changed "10" to "2", with the same results...
> (always restarting the service, of course :)
>
> Then I tried removing "sec=krb5p", and later removed "nfs4", but I got
> the same results again.
>
> Or am I doing something wrong?
>
>
> [root@KSTATION areas]# automount -V
>
> Linux automount version 5.0.1-0.rc2.131.bz517349.1
> [...]
>
> [root@KSTATION areas]# time ls -la testdown
> ls: testedown: No such file or directory
>
> real    3m9.006s
> user    0m0.002s
> sys     0m0.000s

OK, that isn't behaving the way I expect. I'll have a look.
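
For anyone following along, the behaviour the patch description quoted
below asks for (wait a bounded time for the spawned mount(8), then send
it SIGTERM) looks roughly like the following. This is only a minimal
sketch and not the autofs code: spawn_with_wait() and its parameters are
made up for illustration and merely stand in for what do_spawn() is
expected to do. A negative wait is taken to mean "no limit", as with
MOUNT_WAIT=-1.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper (NOT autofs's do_spawn()): run argv[0] and wait for it.
 * If wait_secs >= 0 and the child is still running after that many seconds,
 * send it SIGTERM once, then keep waiting for it to exit.
 * Sets *timed_out to 1 if SIGTERM was sent.
 * Returns the child's exit status, 128 + signal number if the child was
 * killed by a signal, or -1 on error.
 */
int spawn_with_wait(char *const argv[], int wait_secs, int *timed_out)
{
    const struct timespec poll_interval = { 0, 100 * 1000 * 1000 }; /* 100 ms */
    struct timespec start, now;
    int status;
    pid_t pid;

    assert(argv != NULL && argv[0] != NULL && timed_out != NULL);
    *timed_out = 0;

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);                     /* exec failed */
    }

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (;;) {
        /* Block in waitpid() when there is no limit or TERM was already sent. */
        int flags = (wait_secs < 0 || *timed_out) ? 0 : WNOHANG;
        pid_t r = waitpid(pid, &status, flags);
        double elapsed;

        if (r == pid)
            break;
        if (r < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }

        /* r == 0: the child is still running, so check the deadline. */
        clock_gettime(CLOCK_MONOTONIC, &now);
        elapsed = (double)(now.tv_sec - start.tv_sec)
                + (double)(now.tv_nsec - start.tv_nsec) / 1e9;
        if (elapsed >= (double)wait_secs) {
            kill(pid, SIGTERM);
            *timed_out = 1;
            continue;
        }
        nanosleep(&poll_interval, NULL);
    }

    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Note that the final wait after the SIGTERM is unbounded in this sketch. A
mount(8) that is stuck in the RPC layer may not exit until the RPC
timeout expires, which is the caveat the man page text in the patch
already mentions.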

>
>
> LOGGING:
> -----------------------------------------
> Aug 24 09:23:51 KSTATION automount[20803]: mount_mount: mount(nfs):
> calling mount -t nfs4 -s -o rw,acl,sec=krb5p 1.2.3.4:/areas/testdown
> /misc/areas/testdown
> Aug 24 09:27:00 KSTATION automount[20803]: mount(nfs): nfs: mount
> failure 1.2.3.4:/areas/testdown on /misc/areas/testdown
> Aug 24 09:27:00 KSTATION automount[20803]: ioctl_send_fail: token = 91
> Aug 24 09:27:00 KSTATION automount[20803]: failed to mount /misc/areas/testdown
> -----------------------------------------
>
>
> 2009/8/17 Ian Kent <ikent@redhat.com>:
>> On Thu, 2009-08-13 at 12:18 -0300, Carlos André wrote:
>>> Filed bug report:
>>> https://bugzilla.redhat.com/show_bug.cgi?id=517349
>> Hi Carlos,
>>
>> I have a patched source rpm to add a mount wait parameter to autofs
>> located at:
>> http://people.redhat.com/~ikent/autofs-5.0.1-0.rc2.131.bz517349.1
>>
>> Could you build it and see if it works?
>> I haven't tested it at all but it is fairly straightforward.
>> It is still unclear if this is the right way to do this and what the
>> consequences are of sending a TERM signal to mount. This mount request
>> will likely be followed by other requests for the same mount, causing an
>> accumulation of mount(8) processes waiting for RPC timeouts before they
>> can answer the TERM signal.
>>
>> Anyway, for information, the patch included in the source rpm above is:
>>
>> autofs-5.0.4 - add mount wait parameter
>>
>> From: Ian Kent <raven@themaw.net>
>>
>> Often delays when trying to mount from a server that is not responding
>> for some reason are undesirable. To try and prevent these delays we
>> provide a configuration setting to limit the time that we wait for
>> our spawned mount(8) process to complete before sending it a SIGTERM
>> signal. This patch adds a configuration parameter to allow us to
>> request we limit the time we wait for mount(8) to complete before
>> sending it a TERM signal.
>> ---
>>
>>  daemon/spawn.c                 |    3 ++-
>>  include/defaults.h             |    2 ++
>>  lib/defaults.c                 |   13 +++++++++++++
>>  man/auto.master.5.in           |    7 +++++++
>>  redhat/autofs.sysconfig.in     |    9 +++++++++
>>  samples/autofs.conf.default.in |    9 +++++++++
>>  6 files changed, 42 insertions(+), 1 deletion(-)
>>
>>
>> --- autofs-5.0.1.orig/daemon/spawn.c
>> +++ autofs-5.0.1/daemon/spawn.c
>> @@ -312,6 +312,7 @@ int spawn_mount(unsigned logopt, ...)
>>        unsigned int options;
>>        unsigned int retries = MTAB_LOCK_RETRIES;
>>        int update_mtab = 1, ret, printed = 0;
>> +       unsigned int wait = defaults_get_mount_wait();
>>        char buf[PATH_MAX];
>>
>>        /* If we use mount locking we can't validate the location */
>> @@ -353,7 +354,7 @@ int spawn_mount(unsigned logopt, ...)
>>        va_end(arg);
>>
>>        while (retries--) {
>> -               ret = do_spawn(logopt, -1, options, prog, (const char **) argv);
>> +               ret = do_spawn(logopt, wait, options, prog, (const char **) argv);
>>                if (ret & MTAB_NOTUPDATED) {
>>                        struct timespec tm = {3, 0};
>>
>> --- autofs-5.0.1.orig/include/defaults.h
>> +++ autofs-5.0.1/include/defaults.h
>> @@ -24,6 +24,7 @@
>>
>>  #define DEFAULT_TIMEOUT                        600
>>  #define DEFAULT_NEGATIVE_TIMEOUT       60
>> +#define DEFAULT_MOUNT_WAIT             -1
>>  #define DEFAULT_UMOUNT_WAIT            12
>>  #define DEFAULT_BROWSE_MODE            1
>>  #define DEFAULT_LOGGING                        0
>> @@ -62,6 +63,7 @@ struct ldap_schema *defaults_get_schema(
>>  struct ldap_searchdn *defaults_get_searchdns(void);
>>  void defaults_free_searchdns(struct ldap_searchdn *);
>>  unsigned int defaults_get_append_options(void);
>> +unsigned int defaults_get_mount_wait(void);
>>  unsigned int defaults_get_umount_wait(void);
>>  const char *defaults_get_auth_conf_file(void);
>>  unsigned int defaults_get_map_hash_table_size(void);
>> --- autofs-5.0.1.orig/lib/defaults.c
>> +++ autofs-5.0.1/lib/defaults.c
>> @@ -45,6 +45,7 @@
>>  #define ENV_NAME_VALUE_ATTR            "VALUE_ATTRIBUTE"
>>
>>  #define ENV_APPEND_OPTIONS             "APPEND_OPTIONS"
>> +#define ENV_MOUNT_WAIT                 "MOUNT_WAIT"
>>  #define ENV_UMOUNT_WAIT                        "UMOUNT_WAIT"
>>  #define ENV_AUTH_CONF_FILE             "AUTH_CONF_FILE"
>>
>> @@ -323,6 +324,7 @@ unsigned int defaults_read_config(unsign
>>                    check_set_config_value(key, ENV_NAME_ENTRY_ATTR, value, to_syslog) ||
>>                    check_set_config_value(key, ENV_NAME_VALUE_ATTR, value, to_syslog) ||
>>                    check_set_config_value(key, ENV_APPEND_OPTIONS, value, to_syslog) ||
>> +                   check_set_config_value(key, ENV_MOUNT_WAIT, value, to_syslog) ||
>>                    check_set_config_value(key, ENV_UMOUNT_WAIT, value, to_syslog) ||
>>                    check_set_config_value(key, ENV_AUTH_CONF_FILE, value, to_syslog) ||
>>                    check_set_config_value(key, ENV_MAP_HASH_TABLE_SIZE, value, to_syslog))
>> @@ -652,6 +654,17 @@ unsigned int defaults_get_append_options
>>        return res;
>>  }
>>
>> +unsigned int defaults_get_mount_wait(void)
>> +{
>> +       long wait;
>> +
>> +       wait = get_env_number(ENV_MOUNT_WAIT);
>> +       if (wait < 0)
>> +               wait = DEFAULT_MOUNT_WAIT;
>> +
>> +       return (unsigned int) wait;
>> +}
>> +
>>  unsigned int defaults_get_umount_wait(void)
>>  {
>>        long wait;
>> --- autofs-5.0.1.orig/man/auto.master.5.in
>> +++ autofs-5.0.1/man/auto.master.5.in
>> @@ -175,6 +175,13 @@ Set the default timeout for caching fail
>>  60). If the equivalent command line option is given it will override this
>>  setting.
>>  .TP
>> +.B MOUNT_WAIT
>> +Set the default time to wait for a response from a spawned mount(8)
>> +before sending it a SIGTERM. Note that we still need to wait for the
>> +RPC layer to timeout before the sub-process exits so this isn't ideal
>> +but it is the best we can do. The default is to wait until mount(8)
>> +returns without intervention.
>> +.TP
>>  .B UMOUNT_WAIT
>>  Set the default time to wait for a response from a spawned umount(8)
>>  before sending it a SIGTERM. Note that we still need to wait for the
>> --- autofs-5.0.1.orig/redhat/autofs.sysconfig.in
>> +++ autofs-5.0.1/redhat/autofs.sysconfig.in
>> @@ -14,6 +14,15 @@ TIMEOUT=300
>>  #
>>  #NEGATIVE_TIMEOUT=60
>>  #
>> +# MOUNT_WAIT - time to wait for a response from mount(8).
>> +#             Setting this timeout can cause problems when
>> +#             mount would otherwise wait for a server that
>> +#             is temporarily unavailable, such as when it's
>> +#             restarting. The default of waiting for mount(8)
>> +#             usually results in a wait of around 3 minutes.
>> +#
>> +#MOUNT_WAIT=-1
>> +#
>>  # UMOUNT_WAIT - time to wait for a response from umount(8).
>>  #
>>  #UMOUNT_WAIT=12
>> --- autofs-5.0.1.orig/samples/autofs.conf.default.in
>> +++ autofs-5.0.1/samples/autofs.conf.default.in
>> @@ -14,6 +14,15 @@ TIMEOUT=300
>>  #
>>  #NEGATIVE_TIMEOUT=60
>>  #
>> +# MOUNT_WAIT - time to wait for a response from mount(8).
>> +#             Setting this timeout can cause problems when
>> +#             mount would otherwise wait for a server that
>> +#             is temporarily unavailable, such as when it's
>> +#             restarting. The default of waiting for mount(8)
>> +#             usually results in a wait of around 3 minutes.
>> +#
>> +#MOUNT_WAIT=-1
>> +#
>>  # UMOUNT_WAIT - time to wait for a response from umount(8).
>>  #
>>  #UMOUNT_WAIT=12
>>
>>
>>> Thanks!
>>>
>>> 2009/8/13 Carlos André <candrecn@gmail.com>:
>>>> 2009/8/13 Ian Kent <ikent@redhat.com>:
>>>>> Carlos André wrote:
>>>>>> Today (2009-08-12) I'm using:
>>>>>> kernel-2.6.18-128.2.1.el5
>>>>>> autofs-5.0.1-0.rc2.102.el5_3.1
>>>>> Thanks,
>>>>>
>>>>> My mistake, the wait time I was referring to is used for umounts during
>>>>> expires and is present in rev rc2.102.
>>>>>
>>>>> It shouldn't be hard to add this for mount as well.
>>>>> Would you like me to put something together?
>>>> Sure! That'll help me a lot (and other people too, for sure) :) Thanks :)
>>>>
>>>>> It would probably be good to test something out to see if we can make a
>>>>> difference by killing mount after some configured timeout but, if
>>>>> we make progress, probably the best way to deal with it is for you to
>>>>> log a bug against rhel-5 so I can get it committed to the rhel package.
>>>>> The possible issue is that I'm not sure if the RPC subsystem in the
>>>>> above rhel kernel will respond well to process death with potential
>>>>> outstanding requests. But we'll see.
>>>> Ok, on my way :)
>>>>
>>>> Thanks a lot!
>>>>
>>>>>>
>>>>>> Look my last test:
>>>>>> --------------------------------------------------------------
>>>>>> [root@KSTATION areas]# time ls testdown
>>>>>> ls: testdown: No such file or directory
>>>>>>
>>>>>> real    3m9.025s
>>>>>> user    0m0.000s
>>>>>> sys     0m0.002s
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Aug 12 12:57:07 KSTATION automount[15471]: sun_mount: parse(sun):
>>>>>> mounting root /misc/areas, mountpoint testdown, what
>>>>>> 1.2.3.4:/areas/testdown, fstype nfs4, options
>>>>>> acl,sec=krb5p,proto=tcp,retry=0
>>>>>> Aug 12 12:57:07 KSTATION automount[15471]: do_mount:
>>>>>> 1.2.3.4:/areas/testdown /misc/areas/testdown type nfs4 options
>>>>>> acl,sec=krb5p,proto=tcp,retry=0 using module nfs4
>>>>>> Aug 12 12:57:07 KSTATION automount[15471]: mount_mount: mount(nfs):
>>>>>> root=/misc/areas name=testdown what=1.2.3.4:/areas/testdown,
>>>>>> fstype=nfs4, options=acl,sec=krb5p,proto=tcp,retry=0
>>>>>> Aug 12 12:57:07 KSTATION automount[15471]: mount_mount: mount(nfs):
>>>>>> nfs options="acl,sec=krb5p,proto=tcp,retry=0", nosymlink=0, ro=0
>>>>>> Aug 12 12:57:07 KSTATION automount[15471]: mount_mount: mount(nfs):
>>>>>> calling mkdir_path /misc/areas/testdown
>>>>>> Aug 12 12:57:07 KSTATION automount[15471]: mount_mount: mount(nfs):
>>>>>> calling mount -t nfs4 -s -o acl,sec=krb5p,proto=tcp,retry=0
>>>>>> 1.2.3.4:/areas/testdown /misc/areas/testdown
>>>>>> Aug 12 12:58:12 KSTATION automount[15471]: st_expire: state 1 path /misc
>>>>>> Aug 12 12:58:12 KSTATION automount[15471]: expire_proc: exp_proc =
>>>>>> 3078093712 path /misc
>>>>>> Aug 12 12:58:13 KSTATION automount[15471]: expire_proc_indirect: 2
>>>>>> submounts remaining in /misc
>>>>>> Aug 12 12:58:13 KSTATION automount[15471]: expire_cleanup: got thid
>>>>>> 3078093712 path /misc stat 3
>>>>>> Aug 12 12:58:13 KSTATION automount[15471]: expire_cleanup: sigchld:
>>>>>> exp 3078093712 finished, switching from 2 to 1
>>>>>> Aug 12 12:58:13 KSTATION automount[15471]: st_ready: st_ready(): state
>>>>>> = 2 path /misc
>>>>>> Aug 12 12:59:28 KSTATION automount[15471]: st_expire: state 1 path /misc
>>>>>> Aug 12 12:59:28 KSTATION automount[15471]: expire_proc: exp_proc =
>>>>>> 3078093712 path /misc
>>>>>> Aug 12 12:59:28 KSTATION automount[15471]: expire_proc_indirect: 2
>>>>>> submounts remaining in /misc
>>>>>> Aug 12 12:59:28 KSTATION automount[15471]: expire_cleanup: got thid
>>>>>> 3078093712 path /misc stat 3
>>>>>> Aug 12 12:59:28 KSTATION automount[15471]: expire_cleanup: sigchld:
>>>>>> exp 3078093712 finished, switching from 2 to 1
>>>>>> Aug 12 12:59:28 KSTATION automount[15471]: st_ready: st_ready(): state
>>>>>> = 2 path /misc
>>>>>> Aug 12 13:00:16 KSTATION automount[15471]: >> mount: mount to NFS
>>>>>> server '1.2.3.4' failed: timed out (giving up).
>>>>>> Aug 12 13:00:16 KSTATION automount[15471]: mount(nfs): nfs: mount
>>>>>> failure 1.2.3.4:/areas/testdown on /misc/areas/testdown
>>>>>> Aug 12 13:00:16 KSTATION automount[15471]: send_fail: token = 17
>>>>>> Aug 12 13:00:16 KSTATION automount[15471]: failed to mount /misc/areas/testdown
>>>>>> Aug 12 13:00:43 KSTATION automount[15471]: st_expire: state 1 path /misc
>>>>>> --------------------------------------------------------------
>>>>>>
>>>>>> 2009/8/12 Ian Kent <ikent@redhat.com>:
>>>>>>> Carlos André wrote:
>>>>>>>> Hi Ian,
>>>>>>>> I'm going crazy trying to get "retry=" to work on mount... this option
>>>>>>>> just DOESN'T WORK if I use proto=tcp and/or kerberos (sec=krb5/krb5i/krb5p),
>>>>>>>> as you can see in my previous emails...
>>>>>>> Right, my mistake for not looking closely enough at your post.
>>>>>>>
>>>>>>> Maybe this is related to the same sort of problem we had with mount in
>>>>>>> the past, before the options parsing went into the kernel, where other
>>>>>>> services, like portmapper (or rpcbind), were being done with different
>>>>>>> timeout parameters before the RPC calls for mounting. That's just an
>>>>>>> example as NFSv4 shouldn't be sensitive to portmapper anyway.
>>>>>>>
>>>>>>> But what version of autofs and kernel did you say you were using?
>>>>>>>
>>>>>>>> I appreciate any help.
>>>>>>>>
>>>>>>>> Carlos.
>>>>>>>>
>>>>>>>>
>>>>>>>> 2009/8/12 Ian Kent <ikent@redhat.com>:
>>>>>>>>> Chuck Lever wrote:
>>>>>>>>>> On Aug 11, 2009, at 8:41 AM, Carlos André wrote:
>>>>>>>>>>> This long timeout is good if a workstation needs to mount a critical
>>>>>>>>>>> directory using /etc/fstab at boot (for example).
>>>>>>>>>>> But in my case this long timeout doesn't make any sense,
>>>>>>>>>>> since autofs retries the mount on access. In fact it gives me
>>>>>>>>>>> a lot of headaches, because user login will just hang if one server
>>>>>>>>>>> goes down for any reason, and will hang again if the user tries to
>>>>>>>>>>> access a directory pointing to a down NFS server...
>>>>>>>>>> "retry=0" means the mount command will fail as soon as the first
>>>>>>>>>> mount(2) system call fails.  When you set SYN retries to 1, this means
>>>>>>>>>> after 9 seconds, the connect fails, and that causes the mount(2) system
>>>>>>>>>> call to fail.
>>>>>>>>>>
>>>>>>>>>> Recent conversations with Ian suggested that a long timeout was desired
>>>>>>>>>> for automounter as well as other cases.  Ian, is there something else we
>>>>>>>>>> need to consider to determine the correct retry timeout for NFS/TCP
>>>>>>>>>> mount points handled via automounter?  How should mount.nfs wait so we
>>>>>>>>>> don't make other use cases worse?  (Looks like most of the history is
>>>>>>>>>> intact below).
>>>>>>>>> Of course we know that autofs is entirely at the mercy of mount(8) (and
>>>>>>>>> mount.nfs in particular). This has always been a difficult situation for
>>>>>>>>> the automounter because interactive mount invocations should wait. But I
>>>>>>>>> believe automount mounts should always time out quickly, but that leads
>>>>>>>>> to its own set of problems, especially when home directories are concerned.
>>>>>>>>>
>>>>>>>>> I think adding "retry=0" is the right thing to do myself but I'm not
>>>>>>>>> certain that will work as we expect. I'll have to do some experimentation.
>>>>>>>>>
>>>>>>>>>> How long do you think is appropriate for the automounter to wait if the
>>>>>>>>>> server is down, in your case, Carlos?
>>>>>>>>>>
>>>>>>>>>>> Am I missing something, or is something weird going on...!?
>>>>>>>>>>> ------------------------------------------------
>>>>>>>>>>> [root@KSTATION ~]# echo 5 > /proc/sys/net/ipv4/tcp_syn_retries  [DEFAULT]
>>>>>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o proto=tcp,retry=1
>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>>>>
>>>>>>>>>>> real    3m9.000s
>>>>>>>>>>> user    0m0.002s
>>>>>>>>>>> sys     0m0.001s
>>>>>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o sec=krb5p,proto=tcp,retry=1
>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>>>>
>>>>>>>>>>> real    3m9.000s
>>>>>>>>>>> user    0m0.000s
>>>>>>>>>>> sys     0m0.002s
>>>>>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o proto=tcp,retry=0
>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>>>>
>>>>>>>>>>> real    3m9.001s
>>>>>>>>>>> user    0m0.000s
>>>>>>>>>>> sys     0m0.003s
>>>>>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o sec=krb5p,proto=tcp,retry=0
>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>>>>
>>>>>>>>>>> real    3m9.001s
>>>>>>>>>>> user    0m0.002s
>>>>>>>>>>> sys     0m0.001s
>>>>>>>>>>>
>>>>>>>>>>> [root@KSTATION ~]# echo 1 > /proc/sys/net/ipv4/tcp_syn_retries [ 5 to 1 ]
>>>>>>>>>>>
>>>>>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o proto=tcp,retry=1
>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying). [x 6]
>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>>>>
>>>>>>>>>>> real    1m3.002s
>>>>>>>>>>> user    0m0.000s
>>>>>>>>>>> sys     0m0.002s
>>>>>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o sec=krb5p,proto=tcp,retry=1
>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying). [x 13]
>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>>>>
>>>>>>>>>>> real    2m6.000s
>>>>>>>>>>> user    0m0.000s
>>>>>>>>>>> sys     0m0.002s
>>>>>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o proto=tcp,retry=0
>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>>>>
>>>>>>>>>>> real    0m9.003s
>>>>>>>>>>> user    0m0.001s
>>>>>>>>>>> sys     0m0.002s
>>>>>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o sec=krb5p,proto=tcp,retry=0
>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying). [x 13]
>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>>>>
>>>>>>>>>>> real    2m6.001s
>>>>>>>>>>> user    0m0.001s
>>>>>>>>>>> sys     0m0.002s
>>>>>>>>>>> [root@KSTATION ~]#
>>>>>>>>>>> ------------------------------------------------
>>>>>>>>>>> max timeout goes to 2m6s changing tcp_syn_retries from 5 to 1... and
>>>>>>>>>>> using retry=0 without kerberos I got only 9s...
>>>>>>>>>>>
>>>>>>>>>>> *sigh*
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> 2009/8/10 Chuck Lever <chuck.lever@oracle.com>:
>>>>>>>>>>>> On Aug 10, 2009, at 4:05 PM, Carlos André wrote:
>>>>>>>>>>>>> Something funny: Using default tcp_syn_retries (5) i got
>>>>>>>>>>>>> "3,6,12,24,48,96" secs interval... but if i change tcp_syn_retries to
>>>>>>>>>>>>> 1 i got "3,6,3,6,3,6..." secs interval...
>>>>>>>>>>>> Right.  Normally the RPC client calls the kernel's socket connect
>>>>>>>>>>>> function, which does 6 SYN retries.  That one call usually takes
>>>>>>>>>>>> longer than the RPC client's connect timeout, so it only makes one
>>>>>>>>>>>> connect call, and then fails.
>>>>>>>>>>>>
>>>>>>>>>>>> Reducing the number of SYN retries per connect attempt causes the RPC
>>>>>>>>>>>> client to retry the connect call until its connect timeout expires.
>>>>>>>>>>>> Each connect call resets the SYN timeout to 3 seconds.
>>>>>>>>>>>>
>>>>>>>>>>>>> [root@KSERVER mnt]# time mount 1.2.3.4:/blabla tmp/ -t nfs4 -o sec=krb5p,proto=tcp
>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>>>>>>
>>>>>>>>>>>>> real    3m9.000s
>>>>>>>>>>>>> user    0m0.000s
>>>>>>>>>>>>> sys     0m0.002s
>>>>>>>>>>>>>
>>>>>>>>>>>>> [root@KSERVER /]# echo 1 > /proc/sys/net/ipv4/tcp_syn_retries
>>>>>>>>>>>>> [root@KSERVER mnt]# time mount 1.2.3.4:/blabla tmp/ -t nfs4 -o sec=krb5p,proto=tcp  ("retry=1" = no change)
>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>>>>>>>>>
>>>>>>>>>>>>> real    2m6.004s
>>>>>>>>>>>>> user    0m0.000s
>>>>>>>>>>>>> sys     0m0.004s
>>>>>>>>>>>>>
>>>>>>>>>>>>> (3,6,3,6... secs interval)
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2009/8/10 Carlos André <candrecn@gmail.com>:
>>>>>>>>>>>>>> No, I'm just using packages from the CentOS repo...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> And you're right about the exponential retries... with tcpdump I
>>>>>>>>>>>>>> monitored the traffic and I got SYN retries at 3, 6, 12, 24, 48, 96
>>>>>>>>>>>>>> secs on port 2049...
>>>>>>>>>>>>>> I tried using the "retry=1" option on mount without any change... I
>>>>>>>>>>>>>> don't want to change the source or TCP timers... just the NFSv4 client.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2009/8/10 Chuck Lever <chuck.lever@oracle.com>:
>>>>>>>>>>>>>>> On Aug 10, 2009, at 2:29 PM, Carlos André wrote:
>>>>>>>>>>>>>>>> Bruce, no... you're right.  I'm describing a situation where my
>>>>>>>>>>>>>>>> server died... I need the mount to fail faster (10 or 15 secs max)
>>>>>>>>>>>>>>>> than 3 minutes and 9 seconds...
>>>>>>>>>>>>>>> The 189 second timeout is likely how long it takes the kernel to
>>>>>>>>>>>>>>> give up trying to connect a TCP socket to the server (6 SYN attempts
>>>>>>>>>>>>>>> with exponential retries, or something like that).  For stock CentOS
>>>>>>>>>>>>>>> 5.3, I think user space does only a DNS lookup for normal NFSv4
>>>>>>>>>>>>>>> mounts -- the kernel just tries to connect a TCP socket to port 2049,
>>>>>>>>>>>>>>> with no preceding rpcbind request.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Carlos, let us know if you have replaced any NFS-related CentOS
>>>>>>>>>>>>>>> components (kernel, nfs-utils) with something you've built yourself.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2009/8/7 J. Bruce Fields <bfields@fieldses.org>:
>>>>>>>>>>>>>>>>> On Fri, Aug 07, 2009 at 09:42:18AM +0300, Benny Halevy wrote:
>>>>>>>>>>>>>>>>>> On Aug. 07, 2009, 3:18 +0300, Carlos André <candrecn@gmail.com>
>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>> Anyone ?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 2009/7/29 Carlos André <candrecn@gmail.com>:
>>>>>>>>>>>>>>>>>>>> People, I need to get a CentOS 5.3 (updated) NFSv4 server working
>>>>>>>>>>>>>>>>>>>> with Kerberos and AutoFS, but I have a problem: if the NFS server
>>>>>>>>>>>>>>>>>>>> goes down I get a LOOOOOOONG mount timeout on the CentOS 5.3
>>>>>>>>>>>>>>>>>>>> (updated) NFSv4 client...
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Since I need to mount some (3 to 6) dirs during the user logon
>>>>>>>>>>>>>>>>>>>> process, if mount hangs, user logon hangs. So I want to configure
>>>>>>>>>>>>>>>>>>>> it to time out (if the server is down) after 10-15 secs (MAX) on
>>>>>>>>>>>>>>>>>>>> each mount attempt.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I already set up a lab and tried a LOT of combinations; here are my
>>>>>>>>>>>>>>>>>>>> findings (server DOWN IP: 172.16.0.10 / client IP: 172.16.1.10)
>>>>>>>>>>>>>>>>>>>> using the basic command
>>>>>>>>>>>>>>>>>>>> (time mount 172.16.0.10:/remotedir /localdir/ -t nfs4 -o
>>>>>>>>>>>>>>>>>>>> sec=krb5,proto=<tcp/udp>) from the NFS client:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> - Once I try to access the mount point using AutoFS (proto=tcp OR
>>>>>>>>>>>>>>>>>>>> proto=udp) it hangs for 189 secs (3m9s: real  3m9.001s) until it
>>>>>>>>>>>>>>>>>>>> shows an error (mount: mount to NFS server '172.16.0.10' failed:
>>>>>>>>>>>>>>>>>>>> timed out (giving up))
>>>>>>>>>>>>>>>>>> Sounds like you're hitting the server's grace period.
>>>>>>>>>>>>>>>>> I thought he was describing a situation where the server
>>>>>>>>>>>>>>>>> is completely gone and isn't coming back, and wondering how to make
>>>>>>>>>>>>>>>>> the mount fail faster.  But I may be misunderstanding.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --b.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Chuck Lever
>>>>>>>>>>>>>>> chuck[dot]lever[at]oracle[dot]com
>>
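
The timings reported in this thread are consistent with Chuck's
explanation quoted above (6 SYN attempts with exponential backoff). As a
worked check, here is a minimal sketch of the arithmetic. It assumes a
3-second initial SYN timeout that doubles after every retransmission,
which is what Carlos's tcpdump intervals show; it is not taken from the
kernel source, and syn_connect_timeout() is a made-up name.

```c
#include <assert.h>

/*
 * Total seconds a blocking TCP connect() waits before giving up, ASSUMING
 * an initial SYN timeout of initial_rto seconds that doubles after every
 * retransmission. syn_retries corresponds to the value written to
 * /proc/sys/net/ipv4/tcp_syn_retries.
 */
unsigned int syn_connect_timeout(unsigned int syn_retries,
                                 unsigned int initial_rto)
{
    unsigned int total = 0;
    unsigned int rto = initial_rto;
    unsigned int attempt;

    assert(initial_rto > 0);
    assert(syn_retries <= 16);  /* keep the doubling well inside unsigned range */

    /* One initial SYN plus syn_retries retransmissions, each followed by a wait. */
    for (attempt = 0; attempt <= syn_retries; attempt++) {
        total += rto;
        rto *= 2;
    }
    return total;
}
```

syn_connect_timeout(5, 3) gives 3+6+12+24+48+96 = 189, i.e. the 3m9s seen
with the default tcp_syn_retries, and syn_connect_timeout(1, 3) gives
3+6 = 9, matching the 0m9s result with retry=0 and no Kerberos. The 1m3s
and 2m6s results are 7 and 14 times 9 seconds, which fits the RPC client
repeating a 9-second connect attempt 7 and 14 times.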
