From: Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
To: smfrench-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
Cc: linux-cifs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: [PATCH 00/14] cifs: overhaul request timeout behavior in CIFS (try #2)
Date: Fri, 10 Dec 2010 10:44:24 -0500
Message-ID: <1291995877-2276-1-git-send-email-jlayton@redhat.com>

This is the second spin of the patchset to overhaul timeout behavior in
CIFS. The main differences are bugfixes, mainly to ensure that cifsd
isn't holding the GlobalMid_Lock when calling the mid callback functions.

I've also dropped the patch to change the default to "hard". I think it
would make for better data integrity in the face of reconnection events
but it's probably better to separate that patch from this set.

Finally, I've cleaned up the patch to handle -EAGAIN errors in
cifs_writepages. Rather than retrying in the WB_SYNC_NONE case, it has
cifs_writepages re-mark the page as dirty and just skip it. That should
prevent long hangs in cifs_writepages for non-data-integrity syncs.
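
The -EAGAIN handling described above can be modeled roughly as follows.
This is a userspace sketch, not code from the patch: struct page_model,
handle_write_result() and the PAGE_* actions are hypothetical names, the
sync modes are redefined locally so the example builds outside the kernel,
and retrying in the WB_SYNC_ALL case is an assumption inferred from the
description.

```c
#include <errno.h>
#include <stdbool.h>

/* Local stand-ins for the kernel's WB_SYNC_NONE / WB_SYNC_ALL modes. */
enum sync_mode_model { MODEL_WB_SYNC_NONE, MODEL_WB_SYNC_ALL };

/* Hypothetical outcomes for one page after a write attempt. */
enum page_action { PAGE_DONE, PAGE_SKIP_REDIRTIED, PAGE_RETRY, PAGE_ERROR };

/* Hypothetical stand-in for the dirty state of a page. */
struct page_model {
	bool dirty;
};

/* Decide what to do with a page given the result code of the write. */
enum page_action handle_write_result(struct page_model *page, int rc,
				     enum sync_mode_model mode)
{
	if (rc == 0) {
		/* Write succeeded: the page is clean now. */
		page->dirty = false;
		return PAGE_DONE;
	}
	if (rc != -EAGAIN)
		return PAGE_ERROR;

	if (mode == MODEL_WB_SYNC_NONE) {
		/* Not a data-integrity sync: re-mark dirty and skip it. */
		page->dirty = true;
		return PAGE_SKIP_REDIRTIED;
	}

	/* Assumed: data-integrity syncs retry the write. */
	return PAGE_RETRY;
}
```

The point of the split is that a background flush never blocks on a
reconnecting server, while the page stays dirty so a later pass can pick
it up.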

This patchset is intended to fix the unreliable behavior in CIFS in the
face of a server that's taking a long time to process requests.  Much of
my rationale for this set has been outlined in the separate discussion
thread entitled:

    "cifs client timeouts and hard/soft mounts"

In general, the current code sets a timeout for all requests that are
sent on the wire. If the server doesn't respond to the request within
that timeout, the client performs a reconnect and retries the request.

This is dangerous and wasteful behavior for the client. Much of the
state of a CIFS mount is bound to the socket connection. Break the
socket connection and state is lost.

I believe this is the root cause of some data corruption issues that have
been reported to me. We had a partner report that when they copied a
large file to a CIFS server and then compared the result to the original,
there was sometimes a mismatch. The problem is highly correlated with
messages in the ring buffer that indicate that the client reconnected
the socket during the test run.

Another problem is one that I can reliably reproduce. I have win2k8
installed as a VM guest. When I run the connectathon tests against that
server, the run frequently fails on the test that writes 4GB past the
EOF. The storage on this server is slow, and it can take longer than
180s for it to zero-fill the output file.

The intent of this patchset is to fundamentally change when the client
decides to reconnect the socket. Instead of the old behavior, this
patchset makes the client wait indefinitely for a response. Rather than
waiting in TASK_UNINTERRUPTIBLE sleep, however, the client waits in
TASK_KILLABLE sleep so that fatal signals will end the sleep and
return -ERESTARTSYS to the caller.
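
In the kernel, this kind of wait is typically expressed with something
like wait_event_killable(), which returns -ERESTARTSYS when a fatal
signal arrives. The userspace sketch below only models the outcome of
such a wait. The names wake_event, wait_for_response_model() and
STILL_WAITING are hypothetical, and ERESTARTSYS is redefined locally
because it is not exported to userspace.

```c
#include <stddef.h>

/* Kernel-internal error code, redefined here for the userspace model. */
#define MODEL_ERESTARTSYS 512

/* Hypothetical result meaning "no response yet, keep sleeping". */
#define STILL_WAITING 1

/* Hypothetical events that can arrive while a request is outstanding. */
enum wake_event { EV_RESPONSE, EV_FATAL_SIGNAL, EV_NONFATAL_SIGNAL };

/*
 * Walk a sequence of events and report how a killable wait would end:
 * 0 once the response arrives, -MODEL_ERESTARTSYS on a fatal signal,
 * STILL_WAITING if neither happens. There is no timeout branch.
 */
int wait_for_response_model(const enum wake_event *events, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		switch (events[i]) {
		case EV_RESPONSE:
			return 0;
		case EV_FATAL_SIGNAL:
			return -MODEL_ERESTARTSYS;
		case EV_NONFATAL_SIGNAL:
			/* A killable sleep is not woken by non-fatal signals. */
			break;
		}
	}
	return STILL_WAITING;
}
```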

In order to determine whether the server is completely dead or just
taking a long time to process requests, this patchset has the client do
an asynchronous SMB echo request every 30s when the client hasn't gotten
a response. If the server doesn't respond after 3 echo attempts, the
client will attempt to reconnect the socket.
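
The echo policy above can be sketched as a small state machine. This is
a userspace model under stated assumptions, not the code in the patches:
struct echo_state, note_response() and echo_tick() are hypothetical names,
and counting unanswered echoes is only one way to implement the rule (a
timestamp comparison would work as well). echo_tick() is assumed to be
called once per 30s interval.

```c
#include <stdbool.h>

/* From the description: reconnect after 3 unanswered echo attempts. */
#define MAX_UNANSWERED_ECHOES 3

/* Hypothetical per-server state for the model. */
struct echo_state {
	bool heard_since_tick;    /* any response since the last tick? */
	unsigned int unanswered;  /* echoes sent without hearing anything */
};

/* Hypothetical actions the periodic job can take. */
enum echo_action { ECHO_IDLE, ECHO_SEND, ECHO_RECONNECT };

/* Call whenever any response arrives from the server. */
void note_response(struct echo_state *st)
{
	st->heard_since_tick = true;
	st->unanswered = 0;
}

/* Call once per echo interval (30s in the description). */
enum echo_action echo_tick(struct echo_state *st)
{
	if (st->heard_since_tick) {
		/* Server is alive: no echo needed this interval. */
		st->heard_since_tick = false;
		return ECHO_IDLE;
	}
	if (st->unanswered >= MAX_UNANSWERED_ECHOES)
		return ECHO_RECONNECT;

	st->unanswered++;
	return ECHO_SEND;
}
```

With a silent server, three consecutive ticks return ECHO_SEND and the
fourth returns ECHO_RECONNECT. Any response in between resets the count,
so a slow but live server is never reconnected.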

With this patchset, I can reliably run the connectathon tests against my
slow server. Preliminary results using the proprietary test that was
seeing data corruption have also been promising.

I'd like to see this set considered for inclusion into 2.6.38. Timely
review would be appreciated so that I have time to make changes before
the merge window if they are needed.

Jeff Layton (13):
  cifs: don't fail writepages on -EAGAIN errors
  cifs: make wait_for_free_request take a TCP_Server_Info pointer
  cifs: move mid result processing into common function
  cifs: wait indefinitely for responses
  cifs: don't reconnect server when we don't get a response
  cifs: clean up handle_mid_response
  cifs: allow for different handling of received response
  cifs: handle cancelled requests better
  cifs: add cifs_call_async
  cifs: add ability to send an echo request
  cifs: set up recurring workqueue job to do SMB echo requests
  cifs: reconnect unresponsive servers
  cifs: remove code for setting timeouts on requests

 fs/cifs/cifs_debug.c |    8 +-
 fs/cifs/cifsglob.h   |   19 ++-
 fs/cifs/cifspdu.h    |   15 ++
 fs/cifs/cifsproto.h  |    7 +
 fs/cifs/cifssmb.c    |   55 ++++++++-
 fs/cifs/connect.c    |  146 +++++++++++++++++-----
 fs/cifs/file.c       |   67 +++-------
 fs/cifs/sess.c       |    2 +-
 fs/cifs/transport.c  |  345 ++++++++++++++++++++++----------------------------
 9 files changed, 375 insertions(+), 289 deletions(-)

-- 
1.7.3.2
