From: Ben Greear <greearb-my8/4N5VtI7c+919tysfdA@public.gmane.org>
To: Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: Steve French <smfrench-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
	linux-cifs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: CIFS endless console spammage in 2.6.38.7
Date: Tue, 31 May 2011 13:51:58 -0700
Message-ID: <4DE554EE.3010402@candelatech.com>
In-Reply-To: <20110531164408.178eeebf-9yPaYZwiELC+kQycOl6kW4xkIHaj4LzF@public.gmane.org>

On 05/31/2011 01:44 PM, Jeff Layton wrote:
> On Tue, 31 May 2011 12:45:37 -0700
> Ben Greear<greearb-my8/4N5VtI7c+919tysfdA@public.gmane.org>  wrote:
>
>> On 05/31/2011 12:36 PM, Steve French wrote:
>>> This is on setting up a session, so could be something like:
>>> - mount
>>> - do write
>>> - server crash
>>> - attempt to reconnect
>>> - socket returns ENOTSOCK
>>> - attempt to reconnect ...
>>> - repeat
>>>
>>> Is this repeatable enough that we could modify the client to stop on
>>> the reconnect to see what is causing the socket to go bad and which
>>> operation we are repeating the reconnect on?
>>
>> Well, ENOTSOCK sounds like a pretty serious coding problem.  Maybe
>> a use-after-close or something?
>>
>> At the least, we could look for some particular errors (such as ENOTSOCK)
>> and print more info and do a more thorough job of cleaning up.
>>
>> Maybe a WARN_ON_ONCE() when the rv is ENOTSOCK as well?
>>
>> Seems we can reproduce this only when our open-filer HA system
>> craps itself during failover, but we can get that to happen usually
>> within hours, sometimes maybe about a day.  And, CIFS errors don't always
>> happen when the HA cluster goes bad.
>>
>> So, I'm happy to test patches, but since it's a bit tricky to
>> reproduce this...I'm hoping to get the best info possible with
>> each patch iteration!
>>
>
> I had a report of a similar problem on a RHEL5 (2.6.18) kernel:
>
>      https://bugzilla.redhat.com/show_bug.cgi?id=704921
>
> In this case, it caused an oops as well. Your problem may or may not be
> the same, but if it is, I suspect that the root cause is a lack of
> clear locking rules for the TCP_Server_Info->tcpStatus.
>
> What I think happened in that case was that the client was in the
> middle of a NEGOTIATE request and got a response, and another reconnect
> occurred while it was processing it. While the client was tearing down
> and creating a new socket, the thread that issued the NEGOTIATE on the
> previous socket marked the tcpStatus as CifsGood.
>
> Fixing it looks to be anything but trivial. I'm not even quite sure how
> to approach it at this point. Suggestions welcome.
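
The race described above (a stale NEGOTIATE reply marking the status good after a reconnect has already replaced the socket) can be illustrated with a small userspace C sketch. This is not CIFS code and not a proposed fix: `struct conn`, the `generation` counter, and all function names are hypothetical, and a pthread mutex stands in for whatever locking the kernel code would actually use. The idea shown is only that a status update is accepted if it belongs to the current connection.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the connection states discussed above. */
enum conn_status { CONN_NEED_RECONNECT, CONN_NEED_NEGOTIATE, CONN_GOOD };

/* Hypothetical connection object; "generation" is an assumption added
 * for this sketch and counts how many times the socket was replaced. */
struct conn {
    pthread_mutex_t lock;
    enum conn_status status;
    unsigned long generation;
};

/* Called before sending NEGOTIATE: remember which socket we used. */
static unsigned long conn_begin_negotiate(struct conn *c)
{
    unsigned long gen;

    pthread_mutex_lock(&c->lock);
    gen = c->generation;
    pthread_mutex_unlock(&c->lock);
    return gen;
}

/* Called when the socket is torn down and recreated. */
static void conn_reconnect(struct conn *c)
{
    pthread_mutex_lock(&c->lock);
    c->generation++;
    c->status = CONN_NEED_NEGOTIATE;
    pthread_mutex_unlock(&c->lock);
}

/* Called when a NEGOTIATE reply arrives. The status only becomes
 * CONN_GOOD if no reconnect happened since the request was sent. */
static bool conn_mark_good(struct conn *c, unsigned long gen)
{
    bool current;

    pthread_mutex_lock(&c->lock);
    current = (c->generation == gen);
    if (current)
        c->status = CONN_GOOD;
    pthread_mutex_unlock(&c->lock);
    return current;
}
```

A caller would take the value from `conn_begin_negotiate()` before sending the request and pass it to `conn_mark_good()` when the reply arrives; a reply that raced with `conn_reconnect()` is then ignored instead of overwriting the new state.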

Well, I grepped through 2GB of console logs and found no oopses
in my case.

Seems to me that the retry logic either isn't being properly done,
or maybe it's just trying too often and stuck in basically a tight
loop writing logs to the console.  (My HA server cluster is still hosed,
left it busted while debugging this, so there is no way that CIFS can
actually recover the connection as of now.)

If it's just a log-spam tight loop, then rate-limiting the messages
should help, and some timeouts or backoffs should be added to CIFS.
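
To make the rate-limiting and backoff idea concrete, here is a minimal userspace C sketch. It is not CIFS code: `backoff_delay_ms()`, `struct ratelimit`, and `ratelimit_allow()` are hypothetical names made up for illustration, and the doubling delay and fixed-window burst policy are assumptions. In the kernel, the existing printk_ratelimited() helper would presumably cover the logging side.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: delay before reconnect attempt number "attempt"
 * (0-based). Starts at base_ms, doubles each attempt, capped at max_ms. */
static uint32_t backoff_delay_ms(unsigned attempt, uint32_t base_ms,
                                 uint32_t max_ms)
{
    uint64_t delay = base_ms;

    /* Stop doubling once the cap is reached, so delay cannot overflow. */
    while (attempt-- > 0 && delay < max_ms)
        delay *= 2;
    return delay > max_ms ? max_ms : (uint32_t)delay;
}

/* Hypothetical fixed-window rate limiter: at most "burst" messages are
 * allowed per "interval_ms"; the rest are counted as suppressed. */
struct ratelimit {
    uint64_t interval_ms;   /* length of one window */
    unsigned burst;         /* messages allowed per window */
    uint64_t window_start;  /* time the current window began */
    unsigned printed;       /* messages allowed in the current window */
    unsigned suppressed;    /* total messages dropped so far */
};

/* Returns true if a message may be logged at time now_ms. */
static bool ratelimit_allow(struct ratelimit *rl, uint64_t now_ms)
{
    if (now_ms - rl->window_start >= rl->interval_ms) {
        /* A new window begins: reset the per-window counter. */
        rl->window_start = now_ms;
        rl->printed = 0;
    }
    if (rl->printed < rl->burst) {
        rl->printed++;
        return true;
    }
    rl->suppressed++;
    return false;
}
```

A reconnect loop would sleep for `backoff_delay_ms(attempt, ...)` between attempts and only print the error when `ratelimit_allow()` returns true, so a dead server produces a few messages per interval instead of a tight loop of console output.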

Building new kernels now, and we'll try to reproduce with the
extra debugging code.

Thanks,
Ben

-- 
Ben Greear <greearb-my8/4N5VtI7c+919tysfdA@public.gmane.org>
Candela Technologies Inc  http://www.candelatech.com
