From: Ben Greear <greearb-my8/4N5VtI7c+919tysfdA@public.gmane.org>
To: Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: Steve French <smfrench-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
linux-cifs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: CIFS endless console spammage in 2.6.38.7
Date: Mon, 06 Jun 2011 09:47:40 -0700 [thread overview]
Message-ID: <4DED04AC.7090508@candelatech.com> (raw)
In-Reply-To: <20110606094547.0c04d1c5-9yPaYZwiELC+kQycOl6kW4xkIHaj4LzF@public.gmane.org>
On 06/06/2011 06:45 AM, Jeff Layton wrote:
> On Sat, 4 Jun 2011 07:19:23 -0400
> Jeff Layton<jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
>
>> On Fri, 03 Jun 2011 22:03:43 -0700
>> Ben Greear<greearb-my8/4N5VtI7c+919tysfdA@public.gmane.org> wrote:
>>
>>> On 06/03/2011 06:42 PM, Jeff Layton wrote:
>>>> On Fri, 03 Jun 2011 14:01:11 -0700
>>>> Ben Greear<greearb-my8/4N5VtI7c+919tysfdA@public.gmane.org> wrote:
>>>>
>>>>> Ok, we had some luck. Here's the backtrace and attending dmesg
>>>>> output. The filer has been doing failover, but it has not gone
>>>>> into a failed state...so, the system *should* be able to reconnect.
>>>>>
>>>>> We have the system in the failed state now and will leave it that way
>>>>> for a bit in case you have some commands you'd like me to run.
>>>>>
>>>>> Aside from the hung cifs processes (anything accessing those mounts
>>>>> gets into the D state), the system seems fine.
>>>>>
>>>>>
>>>>> CIFS VFS: Unexpected lookup error -112
>>>>> CIFS VFS: Reconnecting tcp session
>>>>> CIFS VFS: Reconnecting tcp session
>>>>> CIFS VFS: Unexpected lookup error -11
>>>>> CIFS VFS: Unexpected lookup error -112
>>>>> CIFS VFS: Reconnecting tcp session
>>>>> CIFS VFS: Reconnecting tcp session
>>>>> CIFS VFS: Reconnecting tcp session
>>>>> CIFS VFS: Reconnecting tcp session
>>>>> CIFS VFS: Reconnecting tcp session
>>>>> CIFS VFS: Reconnecting tcp session
>>>>> CIFS VFS: Reconnecting tcp session
>>>>> CIFS VFS: Reconnecting tcp session
>>>>> CIFS VFS: Reconnecting tcp session
>>>>> CIFS VFS: Reconnecting tcp session
>>>>> CIFS VFS: Reconnecting tcp session
>>>>> CIFS VFS: Reconnecting tcp session
>>>>> CIFS VFS: Reconnecting tcp session
>>>>> CIFS VFS: Reconnecting tcp session
>>>>> CIFS VFS: Reconnecting tcp session
>>>>> CIFS VFS: Reconnecting tcp session
>>>>> CIFS VFS: Reconnecting tcp session
>>>>> CIFS VFS: Reconnecting tcp session
>>>>> CIFS VFS: Unexpected lookup error -112
>>>>> CIFS VFS: Reconnecting tcp session
>>>>> CIFS VFS: Reconnecting tcp session
>>>>> CIFS VFS: Unexpected lookup error -11
>>>>> CIFS VFS: Reconnecting tcp session
>>>>> CIFS VFS: Reconnecting tcp session
>>>>> CIFS VFS: Reconnecting tcp session
>>>>> CIFS VFS: need to reconnect in sendv here
>>>>> ------------[ cut here ]------------
>>>>> WARNING: at /home/greearb/git/linux-2.6.dev.38.y/fs/cifs/transport.c:137 smb_sendv+0x7a/0x2cf [cifs]()
>>>>> BUG: unable to handle kernel
>>>>> Hardware name: X8ST3
>>>>> NULL pointer dereference
>>>>> Modules linked in: at 0000000000000020
>>>>> be2iscsi
>>>>> IP: iscsi_boot_sysfs [<ffffffff81356230>] __sock_recvmsg_nosec+0x12/0x6e
>>>>> bnx2iPGD 0 cnic
>>>>> uio
>>>>> Oops: 0000 [#1] cxgb3iPREEMPT libcxgbiSMP cxgb3
>>>>> mdio
>>>>> last sysfs file: /sys/devices/platform/host10/session7/target10:0:0/10:0:0:0/block/sde/sde1/stat
>>>>> ib_iserCPU 2 rdma_cm
>>>>> Modules linked in: ib_cm be2iscsi iw_cm iscsi_boot_sysfs ib_sa bnx2i ib_mad cnic ib_core uio ib_addr cxgb3i md4 libcxgbi nls_utf8 cxgb3 cifs mdio xt_TPROXY
>>>>> ib_iser rdma_cm nf_tproxy_core ib_cm xt_socket iw_cm ib_sa ip6_tables ib_mad ib_core nf_defrag_ipv6 ib_addr md4 nls_utf8 xt_connlimit cifs xt_TPROXY
>>>>> nf_tproxy_core xt_socket ip6_tables nf_defrag_ipv6 xt_connlimit 8021q garp bridge stp llc fuse macvlan wanlink(P) pktgen iscsi_tcp libiscsi_tcp libiscsi
>>>>> scsi_transport_iscsi nfs lockd fscache nfs_acl auth_rpcgss sunrpc ipv6 uinput i2c_i801 e1000e 8021q i2c_core garp igb bridge ioatdma stp iTCO_wdt llc
>>>>> i7core_edac fuse iTCO_vendor_support macvlan pcspkr wanlink(P) dca pktgen edac_core iscsi_tcp microcode [last unloaded: ipt_addrtype] libiscsi_tcp
>>>>> libiscsi
>>>>> scsi_transport_iscsi
>>>>> Pid: 5047, comm: cifsd Tainted: P 2.6.38.8+ #12 nfs lockdSupermicro X8ST3 fscache/X8ST3 nfs_acl
>>>>> auth_rpcgss
>>>>> RIP: 0010:[<ffffffff81356230>] [<ffffffff81356230>] __sock_recvmsg_nosec+0x12/0x6e
>>>>> sunrpc
>>>>> RSP: 0018:ffff8802e64e5bc0 EFLAGS: 00010286
>>>>> ipv6
>>>>> RAX: 0000000000000000 RBX: ffff8802e64e5c40 RCX: 0000000000000004
>>>>> uinput
>>>>> RDX: ffff8802e64e5e40 RSI: 0000000000000000 RDI: ffff8802e64e5c40
>>>>> i2c_i801
>>>>> RBP: ffff8802e64e5bf0 R08: 0000000000000000 R09: 0000000000000000
>>>>> R10: ffff8802e64e5d80 R11: ffff8802e64e5e40 R12: ffff8802e64e5d10
>>>>> e1000e
>>>>> R13: 0000000000000000 R14: ffff8802e64e5c40 R15: ffff8802e6429f80
>>>>> i2c_core
>>>>> FS: 0000000000000000(0000) GS:ffff8800df440000(0000) knlGS:0000000000000000
>>>>> igb
>>>>> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>>>>> ioatdma
>>>>> CR2: 0000000000000020 CR3: 0000000001803000 CR4: 00000000000006e0
>>>>> iTCO_wdt
>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>> i7core_edac
>>>>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>>>>> iTCO_vendor_supportProcess cifsd (pid: 5047, threadinfo ffff8802e64e4000, task ffff88030482d880)
>>>>> pcspkr
>>>>> Stack:
>>>>> dca ffff8802e64e5c10 edac_core ffffffff81039b72 microcode ffff880200000001 [last unloaded: ipt_addrtype] ffff880305a40fc8
>>>>>
>>>>> ffff8802e64e5e40Pid: 4754, comm: btserver Tainted: P 2.6.38.8+ #12
>>>>> 0000000000000004Call Trace:
>>>>> ffff8802e64e5c30 ffffffff8135792c
>>>>> 0000000000000000 0000000000000000 [<ffffffff8104556a>] ? warn_slowpath_common+0x80/0x98
>>>>> ffff8802e64e5c40 ffffffffffffffff [<ffffffff81045597>] ? warn_slowpath_null+0x15/0x17
>>>>>
>>>>> Call Trace:
>>>>> [<ffffffffa0330a2c>] ? smb_sendv+0x7a/0x2cf [cifs]
>>>>> [<ffffffff81039b72>] ? select_idle_sibling+0xec/0x127
>>>>> [<ffffffff8135792c>] __sock_recvmsg+0x49/0x54
>>>>> [<ffffffff81357e96>] sock_recvmsg+0xa6/0xbf
>>>>> [<ffffffff8103fde4>] ? get_parent_ip+0x11/0x42
>>>>> [<ffffffff81041ee4>] ? try_to_wake_up+0x1ad/0x1c8
>>>>> [<ffffffff8103fde4>] ? get_parent_ip+0x11/0x42
>>>>> [<ffffffff81041f0c>] ? default_wake_function+0xd/0xf
>>>>> [<ffffffff8105c7e0>] ? autoremove_wake_function+0x11/0x34
>>>>> [<ffffffffa0330ca2>] ? smb_send+0x21/0x23 [cifs]
>>>>> [<ffffffff8103fde4>] ? get_parent_ip+0x11/0x42
>>>>> [<ffffffff814164ec>] ? sub_preempt_count+0x92/0xa5
>>>>> [<ffffffffa03311d1>] ? SendReceive+0x13f/0x317 [cifs]
>>>>> [<ffffffff814133f8>] ? _raw_spin_unlock_irqrestore+0x3a/0x47
>>>>> [<ffffffff810382c6>] ? __wake_up+0x3f/0x48
>>>>> [<ffffffffa031d839>] ? CIFSSMBNegotiate+0x191/0x766 [cifs]
>>>>> [<ffffffff81357ee4>] kernel_recvmsg+0x35/0x41
>>>>> [<ffffffff81412526>] ? __mutex_lock_common+0x358/0x3bc
>>>>> [<ffffffffa0321d20>] cifs_demultiplex_thread+0x21e/0xcd9 [cifs]
>>>>> [<ffffffffa031fd7b>] ? cifs_negotiate_protocol+0x37/0x87 [cifs]
>>>>> [<ffffffff8103fde4>] ? get_parent_ip+0x11/0x42
>>>>> [<ffffffff8141259e>] ? __mutex_lock_slowpath+0x14/0x16
>>>>> [<ffffffff8103838e>] ? need_resched+0x1e/0x28
>>>>> [<ffffffffa0315fed>] ? cifs_reconnect_tcon+0x19a/0x2c9 [cifs]
>>>>> [<ffffffffa0321b02>] ? cifs_demultiplex_thread+0x0/0xcd9 [cifs]
>>>>> [<ffffffff8105c7cf>] ? autoremove_wake_function+0x0/0x34
>>>>> [<ffffffffa0321b02>] ? cifs_demultiplex_thread+0x0/0xcd9 [cifs]
>>>>> [<ffffffff8105c3bf>] kthread+0x7d/0x85
>>>>> [<ffffffffa031af96>] ? small_smb_init+0x27/0x70 [cifs]
>>>>> [<ffffffff8100b8e4>] kernel_thread_helper+0x4/0x10
>>>>> [<ffffffffa031c0ad>] ? CIFSSMBWrite2+0xa3/0x242 [cifs]
>>>>> [<ffffffff8105c342>] ? kthread+0x0/0x85
>>>>> [<ffffffff8100b8e0>] ? kernel_thread_helper+0x0/0x10
>>>>> [<ffffffffa032a117>] ? cifs_writepages+0x461/0x714 [cifs]
>>>>> Code: 48 8b 4d d8 [<ffffffff810ac1e8>] ? do_writepages+0x1f/0x28
>>>>> 48 8b [<ffffffff810a4735>] ? __filemap_fdatawrite_range+0x4e/0x50
>>>>> 55 e0 48 [<ffffffff810a4c3c>] ? filemap_fdatawrite+0x1a/0x1c
>>>>> 8b 75 [<ffffffff810a4c56>] ? filemap_write_and_wait+0x18/0x33
>>>>> e8 ff 90 [<ffffffffa0326648>] ? cifs_flush+0x2d/0x60 [cifs]
>>>>> a8 00 [<ffffffff810e901f>] ? filp_close+0x3e/0x6d
>>>>> 00 00 [<ffffffff810e90f6>] ? sys_close+0xa8/0xe2
>>>>> 48 83 c4 [<ffffffff8100aad2>] ? system_call_fastpath+0x16/0x1b
>>>>> 28 5b
>>>>> ---[ end trace 3387e7bab0a9c645 ]---
>>>>
>>>> Kaboom. So you're seeing oopses too. Could you get a listing of the
>>>> place where it oopsed by following the instructions here?
>>>>
>>>> http://wiki.samba.org/index.php/LinuxCIFS_troubleshooting#Oopses
>>>>
>>>> I suspect that "sock" is NULL in this case too and it blew up in
>>>> kernel_recvmsg.
>>>
>>> I added code to WARN_ON when ssocket was null. This isn't a real panic,
>>> just a WARN_ON:
>>>
>>>
>>> static int
>>> smb_sendv(struct TCP_Server_Info *server, struct kvec *iov, int n_vec)
>>> {
>>> int rc = 0;
>>> int i = 0;
>>> struct msghdr smb_msg;
>>> struct smb_hdr *smb_buffer = iov[0].iov_base;
>>> unsigned int len = iov[0].iov_len;
>>> unsigned int total_len;
>>> int first_vec = 0;
>>> unsigned int smb_buf_length = smb_buffer->smb_buf_length;
>>> struct socket *ssocket = server->ssocket;
>>>
>>> if (ssocket == NULL) {
>>> cERROR(1, "need to reconnect in sendv here");
>>> *** HERE *** WARN_ON_ONCE(1);
>>> return -ENOTSOCK; /* BB eventually add reconnect code here */
>>> }
>>>
>>> A second warn-on when ENOTSOCK is perculated up to the calling stack
>>> a bit causes the other stack dumpage. I think the one above is root
>>> cause...need to figure out how to have it gracefully bail out and re-connect
>>> when it hits this state, as current code just calls this general loop over
>>> and over again.
>>>
>>
>> No, your warning is there, but it's Oopsing too:
>>
>>>>> ------------[ cut here ]------------
>>>>> WARNING: at /home/greearb/git/linux-2.6.dev.38.y/fs/cifs/transport.c:137 smb_sendv+0x7a/0x2cf [cifs]()
>>>>> BUG: unable to handle kernel
>>>>> Hardware name: X8ST3
>>>>> NULL pointer dereference
>>>>> Modules linked in: at 0000000000000020
>>>>> be2iscsi
>>>>> IP: iscsi_boot_sysfs [<ffffffff81356230>] __sock_recvmsg_nosec+0x12/0x6e
>>
>>
>> ...smb_sendv is called by the "send" side which is generally a
>> userspace process. The oops happened on the receive side. cifsd called
>> kernel_recvmsg, and it looks like it passed in a NULL sock pointer.
>>
>
> I suspect that the following (untested) patch will fix this. I think
> the symptoms that you've seen are consistent with the patch
> description. Ben, would you be able to test this in your setup? This
> should at least prevent the oopses.
>
> ------------------[snip]--------------------
>
> [PATCH] cifs: don't allow cifs_reconnect to exit with NULL socket pointer
>
> It's possible for the following set of events to happen:
>
> cifsd calls cifs_reconnect which reconnects the socket. A userspace
> process then calls cifs_negotiate_protocol to handle the NEGOTIATE and
> gets a reply. But, while processing the reply, cifsd calls
> cifs_reconnect again. Eventually the GlobalMid_Lock is dropped and the
> reply from the earlier NEGOTIATE completes and the tcpStatus is set to
> CifsGood. cifs_reconnect then goes through and closes the socket and sets the
> pointer to zero, but because the status is now CifsGood, the new socket
> is not created and cifs_reconnect exits with the socket pointer set to
> NULL.
>
> Fix this by only setting the tcpStatus to CifsGood if the tcpStatus is
> CifsNeedNegotiate, and by making sure that generic_ip_connect is always
> called at least once in cifs_reconnect.
>
> Note that this is not a perfect fix for this issue. It's still possible
> that the NEGOTIATE reply is handled after the socket has been closed and
> reconnected. In that case, the socket state will look correct but it no
> NEGOTIATE was performed on it. In that situation though the server
> should just shut down the socket on the next attempted send, rather
> than causing the oops that occurs today.
>
> Reported-by: Ben Greear<greearb-my8/4N5VtI7c+919tysfdA@public.gmane.org>
> Signed-off-by: Jeff Layton<jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> ---
> fs/cifs/connect.c | 6 +++---
> 1 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
> index 84c7307..8bb55bc 100644
> --- a/fs/cifs/connect.c
> +++ b/fs/cifs/connect.c
> @@ -152,7 +152,7 @@ cifs_reconnect(struct TCP_Server_Info *server)
> mid_entry->callback(mid_entry);
> }
>
> - while (server->tcpStatus == CifsNeedReconnect) {
> + do {
> try_to_freeze();
>
> /* we should try only the port we connected to before */
> @@ -167,7 +167,7 @@ cifs_reconnect(struct TCP_Server_Info *server)
> server->tcpStatus = CifsNeedNegotiate;
> spin_unlock(&GlobalMid_Lock);
> }
> - }
> + } while (server->tcpStatus == CifsNeedReconnect);
>
> return rc;
> }
> @@ -3371,7 +3371,7 @@ int cifs_negotiate_protocol(unsigned int xid, struct cifs_ses *ses)
> }
> if (rc == 0) {
> spin_lock(&GlobalMid_Lock);
> - if (server->tcpStatus != CifsExiting)
> + if (server->tcpStatus == CifsNeedNegotiate)
> server->tcpStatus = CifsGood;
> else
> rc = -EHOSTDOWN;
This has some merge issues on 3.6.38.8:
<<<<<<<
while ((server->tcpStatus != CifsExiting) &&
(server->tcpStatus != CifsGood)) {
=======
do {
>>>>>>>
Should I keep your comparison for tcpStatus == CifsNeedReconnect
instead of these != comparisons above?
Thanks,
Ben
--
Ben Greear <greearb-my8/4N5VtI7c+919tysfdA@public.gmane.org>
Candela Technologies Inc http://www.candelatech.com
next prev parent reply other threads:[~2011-06-06 16:47 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-05-31 18:50 CIFS endless console spammage in 2.6.38.7 Ben Greear
[not found] ` <4DE5385C.1030808-my8/4N5VtI7c+919tysfdA@public.gmane.org>
2011-05-31 19:36 ` Steve French
[not found] ` <BANLkTik+Z32vDVjB3_Rt7iPrqpJPJYnpwA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-05-31 19:45 ` Ben Greear
[not found] ` <4DE54561.1090906-my8/4N5VtI7c+919tysfdA@public.gmane.org>
2011-05-31 20:44 ` Jeff Layton
[not found] ` <20110531164408.178eeebf-9yPaYZwiELC+kQycOl6kW4xkIHaj4LzF@public.gmane.org>
2011-05-31 20:51 ` Steve French
[not found] ` <BANLkTinyb=tekDwPLqxuSqyQfrgc8MykCw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-05-31 20:53 ` Ben Greear
[not found] ` <4DE55537.5040705-my8/4N5VtI7c+919tysfdA@public.gmane.org>
2011-05-31 20:54 ` Steve French
[not found] ` <BANLkTimNgW-Ff_50HeuFqmS7PXXjuLmYVw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-06-01 18:01 ` Jeff Layton
[not found] ` <20110601140139.079287da-9yPaYZwiELC+kQycOl6kW4xkIHaj4LzF@public.gmane.org>
2011-06-01 18:07 ` Ben Greear
[not found] ` <4DE67FFE.3040907-my8/4N5VtI7c+919tysfdA@public.gmane.org>
2011-06-01 19:06 ` Jeff Layton
[not found] ` <20110601150621.7b465941-9yPaYZwiELC+kQycOl6kW4xkIHaj4LzF@public.gmane.org>
2011-06-01 19:17 ` Ben Greear
[not found] ` <4DE69041.5070802-my8/4N5VtI7c+919tysfdA@public.gmane.org>
2011-06-03 21:01 ` Ben Greear
[not found] ` <4DE94B97.8090302-my8/4N5VtI7c+919tysfdA@public.gmane.org>
2011-06-04 1:42 ` Jeff Layton
[not found] ` <20110603214204.318602e8-4QP7MXygkU+dMjc06nkz3ljfA9RmPOcC@public.gmane.org>
2011-06-04 5:03 ` Ben Greear
[not found] ` <4DE9BCAF.10303-my8/4N5VtI7c+919tysfdA@public.gmane.org>
2011-06-04 11:19 ` Jeff Layton
[not found] ` <20110604071923.777c666f-4QP7MXygkU+dMjc06nkz3ljfA9RmPOcC@public.gmane.org>
2011-06-06 13:45 ` Jeff Layton
[not found] ` <20110606094547.0c04d1c5-9yPaYZwiELC+kQycOl6kW4xkIHaj4LzF@public.gmane.org>
2011-06-06 15:37 ` Steve French
2011-06-06 16:47 ` Ben Greear [this message]
[not found] ` <4DED04AC.7090508-my8/4N5VtI7c+919tysfdA@public.gmane.org>
2011-06-06 16:51 ` Jeff Layton
[not found] ` <20110606125143.56da1fdb-9yPaYZwiELC+kQycOl6kW4xkIHaj4LzF@public.gmane.org>
2011-06-06 17:22 ` Ben Greear
[not found] ` <4DED0CCA.6090305-my8/4N5VtI7c+919tysfdA@public.gmane.org>
2011-06-07 1:00 ` Steve French
[not found] ` <BANLkTimkinsfojB0=Sf5=o5HBOfiTWTsAA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-06-10 18:55 ` Ben Greear
2011-05-31 20:51 ` Ben Greear
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4DED04AC.7090508@candelatech.com \
--to=greearb-my8/4n5vti7c+919tysfda@public.gmane.org \
--cc=jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org \
--cc=linux-cifs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=smfrench-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox