From mboxrd@z Thu Jan 1 00:00:00 1970 From: Ben Greear Subject: Re: CIFS endless console spammage in 2.6.38.7 Date: Fri, 03 Jun 2011 22:03:43 -0700 Message-ID: <4DE9BCAF.10303@candelatech.com> References: <4DE5385C.1030808@candelatech.com> <4DE54561.1090906@candelatech.com> <20110531164408.178eeebf@tlielax.poochiereds.net> <4DE55537.5040705@candelatech.com> <20110601140139.079287da@tlielax.poochiereds.net> <4DE67FFE.3040907@candelatech.com> <20110601150621.7b465941@tlielax.poochiereds.net> <4DE69041.5070802@candelatech.com> <4DE94B97.8090302@candelatech.com> <20110603214204.318602e8@corrin.poochiereds.net> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Steve French , linux-cifs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org To: Jeff Layton Return-path: In-Reply-To: <20110603214204.318602e8-4QP7MXygkU+dMjc06nkz3ljfA9RmPOcC@public.gmane.org> Sender: linux-cifs-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: On 06/03/2011 06:42 PM, Jeff Layton wrote: > On Fri, 03 Jun 2011 14:01:11 -0700 > Ben Greear wrote: > >> Ok, we had some luck. Here's the backtrace and attending dmesg >> output. The filer has been doing failover, but it has not gone >> into a failed state...so, the system *should* be able to reconnect. >> >> We have the system in the failed state now and will leave it that way >> for a bit in case you have some commands you'd like me to run. >> >> Aside from the hung cifs processes (anything accessing those mounts >> gets into the D state), the system seems fine. >> >> >> CIFS VFS: Unexpected lookup error -112 >> CIFS VFS: Reconnecting tcp session >> CIFS VFS: Reconnecting tcp session >> CIFS VFS: Unexpected lookup error -11 >> CIFS VFS: Unexpected lookup error -112 >> CIFS VFS: Reconnecting tcp session >> CIFS VFS: Reconnecting tcp session >> CIFS VFS: Reconnecting tcp session >> CIFS VFS: Reconnecting tcp session >> CIFS VFS: Reconnecting tcp session >> CIFS VFS: Reconnecting tcp session >> CIFS VFS: Reconnecting tcp session >> CIFS VFS: Reconnecting tcp session >> CIFS VFS: Reconnecting tcp session >> CIFS VFS: Reconnecting tcp session >> CIFS VFS: Reconnecting tcp session >> CIFS VFS: Reconnecting tcp session >> CIFS VFS: Reconnecting tcp session >> CIFS VFS: Reconnecting tcp session >> CIFS VFS: Reconnecting tcp session >> CIFS VFS: Reconnecting tcp session >> CIFS VFS: Reconnecting tcp session >> CIFS VFS: Reconnecting tcp session >> CIFS VFS: Unexpected lookup error -112 >> CIFS VFS: Reconnecting tcp session >> CIFS VFS: Reconnecting tcp session >> CIFS VFS: Unexpected lookup error -11 >> CIFS VFS: Reconnecting tcp session >> CIFS VFS: Reconnecting tcp session >> CIFS VFS: Reconnecting tcp session >> CIFS VFS: need to reconnect in sendv here >> ------------[ cut here ]------------ >> WARNING: at /home/greearb/git/linux-2.6.dev.38.y/fs/cifs/transport.c:137 smb_sendv+0x7a/0x2cf [cifs]() >> BUG: unable to handle kernel >> Hardware name: X8ST3 >> NULL pointer dereference >> Modules linked in: at 0000000000000020 >> be2iscsi >> IP: iscsi_boot_sysfs [] __sock_recvmsg_nosec+0x12/0x6e >> bnx2iPGD 0 cnic >> uio >> Oops: 0000 [#1] cxgb3iPREEMPT libcxgbiSMP cxgb3 >> mdio >> last sysfs file: /sys/devices/platform/host10/session7/target10:0:0/10:0:0:0/block/sde/sde1/stat >> ib_iserCPU 2 rdma_cm >> Modules linked in: ib_cm be2iscsi iw_cm iscsi_boot_sysfs ib_sa bnx2i ib_mad cnic ib_core uio ib_addr cxgb3i md4 libcxgbi nls_utf8 cxgb3 cifs mdio xt_TPROXY >> ib_iser rdma_cm nf_tproxy_core ib_cm xt_socket iw_cm ib_sa ip6_tables ib_mad ib_core nf_defrag_ipv6 ib_addr md4 nls_utf8 xt_connlimit cifs xt_TPROXY >> nf_tproxy_core xt_socket ip6_tables nf_defrag_ipv6 xt_connlimit 8021q garp bridge stp llc fuse macvlan wanlink(P) pktgen iscsi_tcp libiscsi_tcp libiscsi >> scsi_transport_iscsi nfs lockd fscache nfs_acl auth_rpcgss sunrpc ipv6 uinput i2c_i801 e1000e 8021q i2c_core garp igb bridge ioatdma stp iTCO_wdt llc >> i7core_edac fuse iTCO_vendor_support macvlan pcspkr wanlink(P) dca pktgen edac_core iscsi_tcp microcode [last unloaded: ipt_addrtype] libiscsi_tcp >> libiscsi >> scsi_transport_iscsi >> Pid: 5047, comm: cifsd Tainted: P 2.6.38.8+ #12 nfs lockdSupermicro X8ST3 fscache/X8ST3 nfs_acl >> auth_rpcgss >> RIP: 0010:[] [] __sock_recvmsg_nosec+0x12/0x6e >> sunrpc >> RSP: 0018:ffff8802e64e5bc0 EFLAGS: 00010286 >> ipv6 >> RAX: 0000000000000000 RBX: ffff8802e64e5c40 RCX: 0000000000000004 >> uinput >> RDX: ffff8802e64e5e40 RSI: 0000000000000000 RDI: ffff8802e64e5c40 >> i2c_i801 >> RBP: ffff8802e64e5bf0 R08: 0000000000000000 R09: 0000000000000000 >> R10: ffff8802e64e5d80 R11: ffff8802e64e5e40 R12: ffff8802e64e5d10 >> e1000e >> R13: 0000000000000000 R14: ffff8802e64e5c40 R15: ffff8802e6429f80 >> i2c_core >> FS: 0000000000000000(0000) GS:ffff8800df440000(0000) knlGS:0000000000000000 >> igb >> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b >> ioatdma >> CR2: 0000000000000020 CR3: 0000000001803000 CR4: 00000000000006e0 >> iTCO_wdt >> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >> i7core_edac >> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 >> iTCO_vendor_supportProcess cifsd (pid: 5047, threadinfo ffff8802e64e4000, task ffff88030482d880) >> pcspkr >> Stack: >> dca ffff8802e64e5c10 edac_core ffffffff81039b72 microcode ffff880200000001 [last unloaded: ipt_addrtype] ffff880305a40fc8 >> >> ffff8802e64e5e40Pid: 4754, comm: btserver Tainted: P 2.6.38.8+ #12 >> 0000000000000004Call Trace: >> ffff8802e64e5c30 ffffffff8135792c >> 0000000000000000 0000000000000000 [] ? warn_slowpath_common+0x80/0x98 >> ffff8802e64e5c40 ffffffffffffffff [] ? warn_slowpath_null+0x15/0x17 >> >> Call Trace: >> [] ? smb_sendv+0x7a/0x2cf [cifs] >> [] ? select_idle_sibling+0xec/0x127 >> [] __sock_recvmsg+0x49/0x54 >> [] sock_recvmsg+0xa6/0xbf >> [] ? get_parent_ip+0x11/0x42 >> [] ? try_to_wake_up+0x1ad/0x1c8 >> [] ? get_parent_ip+0x11/0x42 >> [] ? default_wake_function+0xd/0xf >> [] ? autoremove_wake_function+0x11/0x34 >> [] ? smb_send+0x21/0x23 [cifs] >> [] ? get_parent_ip+0x11/0x42 >> [] ? sub_preempt_count+0x92/0xa5 >> [] ? SendReceive+0x13f/0x317 [cifs] >> [] ? _raw_spin_unlock_irqrestore+0x3a/0x47 >> [] ? __wake_up+0x3f/0x48 >> [] ? CIFSSMBNegotiate+0x191/0x766 [cifs] >> [] kernel_recvmsg+0x35/0x41 >> [] ? __mutex_lock_common+0x358/0x3bc >> [] cifs_demultiplex_thread+0x21e/0xcd9 [cifs] >> [] ? cifs_negotiate_protocol+0x37/0x87 [cifs] >> [] ? get_parent_ip+0x11/0x42 >> [] ? __mutex_lock_slowpath+0x14/0x16 >> [] ? need_resched+0x1e/0x28 >> [] ? cifs_reconnect_tcon+0x19a/0x2c9 [cifs] >> [] ? cifs_demultiplex_thread+0x0/0xcd9 [cifs] >> [] ? autoremove_wake_function+0x0/0x34 >> [] ? cifs_demultiplex_thread+0x0/0xcd9 [cifs] >> [] kthread+0x7d/0x85 >> [] ? small_smb_init+0x27/0x70 [cifs] >> [] kernel_thread_helper+0x4/0x10 >> [] ? CIFSSMBWrite2+0xa3/0x242 [cifs] >> [] ? kthread+0x0/0x85 >> [] ? kernel_thread_helper+0x0/0x10 >> [] ? cifs_writepages+0x461/0x714 [cifs] >> Code: 48 8b 4d d8 [] ? do_writepages+0x1f/0x28 >> 48 8b [] ? __filemap_fdatawrite_range+0x4e/0x50 >> 55 e0 48 [] ? filemap_fdatawrite+0x1a/0x1c >> 8b 75 [] ? filemap_write_and_wait+0x18/0x33 >> e8 ff 90 [] ? cifs_flush+0x2d/0x60 [cifs] >> a8 00 [] ? filp_close+0x3e/0x6d >> 00 00 [] ? sys_close+0xa8/0xe2 >> 48 83 c4 [] ? system_call_fastpath+0x16/0x1b >> 28 5b >> ---[ end trace 3387e7bab0a9c645 ]--- > > Kaboom. So you're seeing oopses too. Could you get a listing of the > place where it oopsed by following the instructions here? > > http://wiki.samba.org/index.php/LinuxCIFS_troubleshooting#Oopses > > I suspect that "sock" is NULL in this case too and it blew up in > kernel_recvmsg. I added code to WARN_ON when ssocket was null. This isn't a real panic, just a WARN_ON: static int smb_sendv(struct TCP_Server_Info *server, struct kvec *iov, int n_vec) { int rc = 0; int i = 0; struct msghdr smb_msg; struct smb_hdr *smb_buffer = iov[0].iov_base; unsigned int len = iov[0].iov_len; unsigned int total_len; int first_vec = 0; unsigned int smb_buf_length = smb_buffer->smb_buf_length; struct socket *ssocket = server->ssocket; if (ssocket == NULL) { cERROR(1, "need to reconnect in sendv here"); *** HERE *** WARN_ON_ONCE(1); return -ENOTSOCK; /* BB eventually add reconnect code here */ } A second warn-on when ENOTSOCK is perculated up to the calling stack a bit causes the other stack dumpage. I think the one above is root cause...need to figure out how to have it gracefully bail out and re-connect when it hits this state, as current code just calls this general loop over and over again. Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com