long cifs timeout when share becomes unavailable

* long cifs timeout when share becomes unavailable
@ 2012-05-24  6:13 Sergey Urushkin
From: Sergey Urushkin @ 2012-05-24  6:13 UTC
  To: linux-cifs-u79uwXL29TY76Z2rM5mHXA

Hi,

there are issues with a cifs share mounted via mount.cifs (with recent
kernels): after the server becomes unavailable, the first 'ls' on the
directory where the share is mounted 1) hangs (it can't be interrupted
with ^C) and 2) lasts about 5 minutes. The first problem appears
everywhere I tested (ubuntu 10.04 with every distributed kernel, ubuntu
12.04, fedora 17), but with old kernels (tested with ubuntu 10.04 2.6.32
and 2.6.35) 'ls' is uninterruptible yet hangs for only about 25 seconds,
which makes the problem much less serious on old kernels. With new
kernels (ubuntu 10.04 3.0, ubuntu 12.04 3.2, fedora 17 3.3) I'm facing
very long hangs of 'ls' (the second problem). Many GUI applications
(e.g. nautilus, firefox, gnome-panel, mc) that query that directory for
some reason appear to act the same way as 'ls', so the system becomes
unusable for 5(!) minutes. When the server (tested with samba 3.6,
win2003) becomes unavailable, nothing is being written to the mounted
directory, so I can't understand why this timeout is so big. Here is
how I tested this:

 # mount.cifs //fsrv/home /mnt "-ouser=test,dom=wg,soft"
 Password:
 # time ls /mnt
 Desktop Documents Program Files WINDOWS

 real 0m0.019s
 user 0m0.004s
 sys 0m0.012s
 # iptables -I OUTPUT -d 172.17.0.65 -j DROP
 # time ls /mnt # This 'ls' cannot be interrupted
 ls: cannot access /mnt: Host is down

 real 4m51.668s
 user 0m0.004s
 sys 0m0.016s
 # time ls /mnt # This 'ls' and all others after can be interrupted
 ls: cannot access /mnt: Host is down

 real 0m10.014s
 user 0m0.008s
 sys 0m0.004s

I see these messages in syslog:

 kernel: [ 1625.552044] CIFS VFS: Server fsrv has not responded in 300 seconds. Reconnecting...
 kernel: [ 1655.509422] CIFS VFS: Unexpected lookup error -112

 ...

I can't see any timeout options for mount.cifs (except the acl timeout).
So the actual questions are: 1) is there a way to avoid these hangs (an
analog of NFS's 'intr'?), and 2) how can I reduce this unreachable-host
timeout (an analog of NFS's 'timeo'?)? Maybe there are some variables in
the sources?

Thanks.

-- 
Best regards,
Sergey Urushkin

* Re: long cifs timeout when share becomes unavailable
@ 2012-05-24 10:31   ` Jeff Layton
From: Jeff Layton @ 2012-05-24 10:31 UTC
  To: Sergey Urushkin; +Cc: linux-cifs-u79uwXL29TY76Z2rM5mHXA

On Thu, 24 May 2012 10:13:29 +0400
Sergey Urushkin <urushkin-MKmKHxvQ5r6HXe+LvDLADg@public.gmane.org> wrote:

>  And I can not see any timeout options for mount.cifs (except acl timeout).
> So, the actual questions are: 1) is there a way to avoid these hangs
> (analog of 'intr'?) and 2) how can I reduce this unreachable-host timeout
> (analog of 'timeo'?)? Maybe there are some variables in the sources?
> 
> Thanks.
> 

You can try setting the echo_retries kernel module parameter to 1,
which should cut down the wait time to 60s. In 3.4, we've removed
that parm and it's now set to 1 always. The timeout between echo
requests (which is how we detect whether the server is still
responding) is not currently tunable.
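The figures in this thread fit a simple product: the reconnect delay is roughly echo_retries times a 60-second echo interval, so the 300 seconds in the syslog above would correspond to echo_retries=5, and the suggested value of 1 gives 60 seconds. The sketch below, for kernels before 3.4 only, computes that product and writes the parameter at runtime. It assumes the parameter is exposed as a writable file under /sys/module/cifs/parameters/ on your kernel; the function names and the modprobe.d file name are hypothetical.

```shell
#!/bin/sh
# Sketch for kernels before 3.4, where echo_retries is still a cifs module
# parameter. Function names are hypothetical helpers, not part of cifs-utils.

# Approximate seconds before an unresponsive server triggers a reconnect:
# echo_retries multiplied by the echo interval (60 s in this thread).
unresponsive_timeout() {
    retries=$1
    interval=${2:-60}
    echo $((retries * interval))
}

# Write a new echo_retries value at runtime. The target path can be
# overridden so the function can be exercised against an ordinary file.
set_echo_retries() {
    value=$1
    param=${2:-/sys/module/cifs/parameters/echo_retries}
    if [ -w "$param" ]; then
        echo "$value" > "$param"
    else
        echo "cannot write $param (cifs not loaded, kernel >= 3.4, or not root)" >&2
        return 1
    fi
}

# To keep the setting across module reloads, a modprobe.d line such as
#   options cifs echo_retries=1
# in a file like /etc/modprobe.d/cifs.conf (name is illustrative) is the
# usual approach.
```

For example, run `set_echo_retries 1` as root, then repeat the iptables test above to check the new delay.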

-- 
Jeff Layton <jlayton-eUNUBHrolfbYtjvyW6yDsg@public.gmane.org>

* Re: long cifs timeout when share becomes unavailable
@ 2012-05-24 12:44       ` Sergey Urushkin
From: Sergey Urushkin @ 2012-05-24 12:44 UTC
  To: Jeff Layton; +Cc: Linux cifs

Jeff Layton wrote on 24.05.2012 14:31:
> On Thu, 24 May 2012 10:13:29 +0400
> Sergey Urushkin <urushkin-MKmKHxvQ5r6HXe+LvDLADg@public.gmane.org> wrote:
>>
>>  And I can not see any timeout options for mount.cifs (except acl 
>> timeout).
>> So, the actual questions are: 1) is there a way to avoid these hangs
>> (analog of 'intr'?) and 2) how can I reduce this unreachable-host 
>> timeout
>> (analog of 'timeo'?)? Maybe there are some variables in the sources?
>
> You can try setting the echo_retries kernel module parameter to 1,
> which should cut down the wait time to 60s. In 3.4, we've removed
> that parm and it's now set to 1 always. The timeout between echo
> requests (which is how we detect whether the server is still
> responding) is not currently tunable.

That works like a charm; 60s is much better than 5m, thanks a lot. The
documentation (for kernels before 3.4) doesn't make clear what behavior
this parm changes; maybe the docs should be fixed somehow?
Could you explain why SMB_ECHO_INTERVAL is so big by default? Is there
any side effect of changing it to, for example, "30"? Does this change
mean that the client will send special packets to the server every 30
secs (twice as often), even when the mounted share isn't being used by
any application at the moment?
What about my first question? Could I make ls-like applications
interruptible somehow?

Thanks a lot again!

-- 
Best regards,
Sergey Urushkin

* Re: long cifs timeout when share becomes unavailable
@ 2012-05-25 12:26           ` Jeff Layton
From: Jeff Layton @ 2012-05-25 12:26 UTC
  To: Sergey Urushkin; +Cc: Linux cifs

On Thu, 24 May 2012 16:44:58 +0400
Sergey Urushkin <urushkin-MKmKHxvQ5r6HXe+LvDLADg@public.gmane.org> wrote:

> Jeff Layton wrote on 24.05.2012 14:31:
> > On Thu, 24 May 2012 10:13:29 +0400
> > Sergey Urushkin <urushkin-MKmKHxvQ5r6HXe+LvDLADg@public.gmane.org> wrote:
> >>
> >>  And I can not see any timeout options for mount.cifs (except acl 
> >> timeout).
> >> So, the actual questions are: 1) is there a way to avoid these hangs
> >> (analog of 'intr'?) and 2) how can I reduce this unreachable-host 
> >> timeout
> >> (analog of 'timeo'?)? Maybe there are some variables in the sources?
> >
> > You can try setting the echo_retries kernel module parameter to 1,
> > which should cut down the wait time to 60s. In 3.4, we've removed
> > that parm and it's now set to 1 always. The timeout between echo
> > requests (which is how we detect whether the server is still
> > responding) is not currently tunable.
> 
> That works like a charm, 60s is much better than 5m, thanks a lot. From 
> documentation (for kernels before 3.4) it isn't clear what behavior this 
> parm changes, maybe docs should be fixed some way?

Well, this parm has gone away in recent kernels, so I'm not inclined to
bother documenting it. If you'd like to do so, maybe write up a section
for the manpage on the socket and reconnect behavior?

> Could you explain to me why SMB_ECHO_INTERVAL is so big by default?

We don't really want to spam the server with a ton of these requests.
The idea is that we wait a while for a response and then check with the
server to see if it's still alive before timing out the packet. Most
servers will respond to most calls within this time period, and it's on
par with the default timeo= value for NFS over TCP.

Some calls, however, can take a very long time (minutes): writes long
past the end of the file, for instance. NTFS doesn't do sparse files, so
it has to zero-fill them, and that can take a long time on slow storage.

The echo is primarily to allow us to distinguish between servers that
are just slow to respond to certain calls, and those that are truly
unreachable.
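The detection scheme described above can be modelled as a small decision function. This is a simplified sketch, not the kernel's actual code: it assumes an echo is sent once the server has been silent for a full interval, and that the client reconnects once silence exceeds echo_retries intervals, which is consistent with the 300 s and 60 s figures in this thread. The function name and its output strings are hypothetical.

```shell
#!/bin/sh
# Simplified model of echo-based liveness detection (not kernel code).
#   $1 = seconds since the last response of any kind from the server
#   $2 = echo interval in seconds (default 60)
#   $3 = echo retries (default 5, inferred from the 300 s syslog message;
#        fixed at 1 from 3.4 on, per this thread)
# Prints the action the client would take at that moment.
liveness_action() {
    idle=$1
    interval=${2:-60}
    retries=${3:-5}
    if [ "$idle" -gt $((retries * interval)) ]; then
        # Silent for longer than retries x interval: treat as unreachable.
        echo reconnect
    elif [ "$idle" -ge "$interval" ]; then
        # A slow call may still be running; ask whether the server is alive.
        echo send-echo
    else
        # Recent traffic, nothing to do yet.
        echo ok
    fi
}
```

In this model a server that is merely slow (for example, zero-filling a large file) still answers the echoes, and each answer resets the idle counter, so only a truly unreachable server ever reaches the reconnect branch.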

> Is 
> there any side effect of changing it to, for example, "30"? Does this 
> change means that every 30 secs (twice more often) client will send 
> special packets to the server (even in the case mounted share isn't used 
> by any application at the moment)?

Yes, reducing that interval will make it send SMB echoes to the server
more frequently.
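As a rough upper bound, assuming at most one echo per interval on an otherwise idle mount, the echo rate scales inversely with the interval, so halving it from 60 s to 30 s doubles the traffic Sergey asked about. The helper name is hypothetical.

```shell
#!/bin/sh
# Upper bound on SMB echoes per hour for a given echo interval (seconds),
# assuming one echo per interval while the mount sees no other traffic.
echoes_per_hour() {
    echo $((3600 / $1))
}

echoes_per_hour 60   # interval discussed in this thread
echoes_per_hour 30   # the value Sergey proposed
```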

Will that have any side effects? Probably not, but I've not
experimented with it. I don't see a lot of value in making this tunable
since this is an error condition.

> What's about my first question? Could I make ls-like applications 
> interruptible somehow?
> 

Most of these sleeps are TASK_KILLABLE, so they should respond to fatal
signals (SIGKILL).
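Because these sleeps are killable rather than interruptible, one workaround for scripts is to bound a filesystem access with SIGKILL instead of relying on ^C. A minimal sketch, assuming GNU coreutils `timeout` (its `-s`/`--signal` option); the function name and the 15-second limit are illustrative. If a process is blocked in a sleep that is not killable, the signal only takes effect once that sleep ends.

```shell
#!/bin/sh
# Run a command, sending SIGKILL if it is still running after $1 seconds.
# With -s KILL, GNU timeout exits with status 137 (128 + 9) when it fires.
run_killable() {
    limit=$1
    shift
    timeout -s KILL "$limit" "$@"
    status=$?
    if [ "$status" -eq 137 ]; then
        echo "killed after ${limit}s: $*" >&2
    fi
    return "$status"
}

# Example: give up on listing the CIFS mount after 15 seconds.
# run_killable 15 ls /mnt
```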

-- 
Jeff Layton <jlayton-eUNUBHrolfbYtjvyW6yDsg@public.gmane.org>
