From mboxrd@z Thu Jan 1 00:00:00 1970
From: Shawn Bohrer
Subject: Re: [BUG] 2.6.32.21 bnx2_napi->hw_tx_cons_ptr NULL pointer dereference
Date: Wed, 8 Sep 2010 20:34:38 -0500
Message-ID: <20100909013437.GA6400@lintop>
References: <20100908224447.GA4979@BohrerMBP.rgmadvisors.com> <1283986692.9271.5.camel@HP1>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: "netdev@vger.kernel.org", "linux-kernel@vger.kernel.org", stable@kernel.org
To: Michael Chan
Return-path:
Received: from mail-vw0-f46.google.com ([209.85.212.46]:54649 "EHLO mail-vw0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755221Ab0IIBeu (ORCPT ); Wed, 8 Sep 2010 21:34:50 -0400
Content-Disposition: inline
In-Reply-To: <1283986692.9271.5.camel@HP1>
Sender: netdev-owner@vger.kernel.org
List-ID:

On Wed, Sep 08, 2010 at 03:58:12PM -0700, Michael Chan wrote:
>
> On Wed, 2010-09-08 at 15:44 -0700, Shawn Bohrer wrote:
> > Hello,
> >
> > While testing 2.6.32.21 I had the following Oops occur on one of my
> > machines:
>
> This has been fixed by this patch:
>
> commit 4327ba435a56ada13eedf3eb332e583c7a0586a9
>
>     bnx2: Fix netpoll crash.
>
> But it caused a regression which was later fixed by this patch:
>
> commit f048fa9c8686119c3858a463cab6121dced7c0bf
>
>     bnx2: Fix hang during rmmod bnx2.
>
> Both patches are in 2.6.34.2.

Thanks. Is there any reason these can't/shouldn't be applied to the
2.6.32.y stable series? It looks like they apply cleanly.

> >
> >
> > Sep 8 12:32:01 dev4 kernel: tcp_v4_hash_offload: sk=ffff88081ba0cb00
> > Sep 8 12:32:01 dev4 kernel: svc: failed to register lockdv1 RPC service (errno 97).
> > Sep 8 12:32:04 dev4 sshd[15072]: Accepted publickey for hbi from 10.0.4.17 port 60687 ssh2
> > Sep 8 12:33:03 dev4 sshd[15096]: Accepted publickey for hbi from 10.0.4.17 port 60931 ssh2
> > Sep 8 12:34:03 dev4 sshd[15118]: Accepted publickey for hbi from 10.0.4.17 port 32929 ssh2
> > Sep 8 12:35:03 dev4 sshd[15141]: Accepted publickey for hbi from 10.0.4.17 port 40107 ssh2
> > Sep 8 12:36:03 dev4 sshd[15166]: Accepted publickey for hbi from 10.0.4.17 port 40361 ssh2
> > Sep 8 12:37:02 dev4 sshd[15191]: Connection closed by 10.0.0.104
> > Sep 8 12:37:03 dev4 sshd[15192]: Accepted publickey for hbi from 10.0.4.17 port 40642 ssh2
> > Sep 8 12:38:01 dev4 kernel: tcp_v4_hash_offload: sk=ffff88082f1d5900
> > Sep 8 12:38:01 dev4 kernel: svc: failed to register lockdv1 RPC service (errno 97).
> > Sep 8 12:38:01 dev4 kernel: BUG: unable to handle kernel NULL pointer dereference at (null)
> > Sep 8 12:38:01 dev4 kernel: IP: [] bnx2_poll_work+0x3a/0x1140 [bnx2]
> > Sep 8 12:38:01 dev4 kernel: PGD 0
> > Sep 8 12:38:01 dev4 kernel: Oops: 0000 [#1] SMP
> > Sep 8 12:38:01 dev4 kernel: last sysfs file: /sys/devices/pci0000:00/0000:00:06.0/0000:0c:00.0/irq
> > Sep 8 12:38:01 dev4 kernel: CPU 0
> > Sep 8 12:38:01 dev4 kernel: Modules linked in: ipmi_si mptctl mptbase ipmi_devintf ipmi_msghandler dell_rbu nfs lockd fscache nfs_acl auth_rpcgss sunrpc ipv6 netconsole configfs autofs4 fuse ext3 jbd mbcache dm_mirror dm_multipath scsi_dh video output sbs sbshc acpi_pad parport_pc lp parport joydev t3_tom ses enclosure bnx2 sg toecore cxgb3 radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core sr_mod cdrom pata_acpi i5k_amb ata_generic hwmon iTCO_wdt iTCO_vendor_support i5000_edac edac_core serio_raw snd_pcm dcdbas snd_timer snd soundcore snd_page_alloc pcspkr dm_region_hash dm_log dm_mod ata_piix libata shpchp megaraid_sas sd_mod crc_t10dif scsi_mod xfs exportfs uhci_hcd ohci_hcd ssb mmc_core ehci_hcd [last unloaded: ipmi_si]
> > Sep 8 12:38:01 dev4 kernel: Pid: 15224, comm: mount.nfs Not tainted 2.6.32.21-1.rgm #1 PowerEdge 1950
> > Sep 8 12:38:01 dev4 kernel: RIP: 0010:[] [] bnx2_poll_work+0x3a/0x1140 [bnx2]
> > Sep 8 12:38:01 dev4 kernel: RSP: 0018:ffff88081ba033f8 EFLAGS: 00010092
> > Sep 8 12:38:01 dev4 kernel: RAX: 0000000000000000 RBX: ffff88085b1a9800 RCX: 0000000000000010
> > Sep 8 12:38:01 dev4 kernel: RDX: ffff88085b1a9800 RSI: ffff88085b1a9800 RDI: ffff88085b1a85c0
> > Sep 8 12:38:01 dev4 kernel: RBP: ffff88081ba034f8 R08: 0000000000000000 R09: 0000000000000000
> > Sep 8 12:38:01 dev4 kernel: R10: 0000000000000000 R11: 0000000000000006 R12: 0000000000000010
> > Sep 8 12:38:01 dev4 kernel: R13: ffff88085b1a85c0 R14: 0000000000000000 R15: ffff88085bbac240
> > Sep 8 12:38:01 dev4 kernel: FS: 00007fb00eb566e0(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
> > Sep 8 12:38:01 dev4 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > Sep 8 12:38:01 dev4 kernel: CR2: 0000000000000000 CR3: 00000008487e4000 CR4: 00000000000406f0
> > Sep 8 12:38:01 dev4 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > Sep 8 12:38:01 dev4 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > Sep 8 12:38:01 dev4 kernel: Process mount.nfs (pid: 15224, threadinfo ffff88081ba02000, task ffff880821000700)
> > Sep 8 12:38:01 dev4 kernel: Stack:
> > Sep 8 12:38:01 dev4 kernel: ffff880857852800 ffff88085b751400 0000000000000001 00000000000003f0
> > Sep 8 12:38:01 dev4 kernel: <0> 000000001ba03518 ffffffff8123aa4b 0000000000000010 0000000000000000
> > Sep 8 12:38:01 dev4 kernel: <0> ffff88081ba03458 ffffffff8121ef00 ffff88085b1a9800 ffff880857c5f600
> > Sep 8 12:38:01 dev4 kernel: Call Trace:
> > Sep 8 12:38:01 dev4 kernel: [] ? bit_cursor+0x65b/0x6c0
> > Sep 8 12:38:01 dev4 kernel: [] ? msi_set_mask_bit+0x80/0x90
> > Sep 8 12:38:01 dev4 kernel: [] ? unmask_msi_irq+0x10/0x20
> > Sep 8 12:38:01 dev4 kernel: [] ? default_enable+0x29/0x40
> > Sep 8 12:38:01 dev4 kernel: [] ? check_irq_resend+0x2c/0x80
> > Sep 8 12:38:01 dev4 kernel: [] ? __enable_irq+0x7b/0x90
> > Sep 8 12:38:01 dev4 kernel: [] bnx2_poll_msix+0x3d/0xc0 [bnx2]
> > Sep 8 12:38:01 dev4 kernel: [] ? poll_bnx2+0x63/0x80 [bnx2]
> > Sep 8 12:38:01 dev4 kernel: [] netpoll_poll+0xd7/0x420
> > Sep 8 12:38:01 dev4 kernel: [] netpoll_send_skb+0x113/0x200
> > Sep 8 12:38:01 dev4 kernel: [] netpoll_send_udp+0x20f/0x220
> > Sep 8 12:38:01 dev4 kernel: [] write_msg+0xc3/0x110 [netconsole]
> > Sep 8 12:38:01 dev4 kernel: [] __call_console_drivers+0x75/0x90
> > Sep 8 12:38:01 dev4 kernel: [] _call_console_drivers+0x4a/0x80
> > Sep 8 12:38:01 dev4 kernel: [] release_console_sem+0xf0/0x210
> > Sep 8 12:38:01 dev4 kernel: [] vprintk+0x1b9/0x4e0
> > Sep 8 12:38:01 dev4 kernel: [] printk+0x41/0x43
> > Sep 8 12:38:01 dev4 kernel: [] svc_register+0xdf/0x2b0 [sunrpc]
> > Sep 8 12:38:01 dev4 kernel: [] svc_setup_socket+0xa4/0x360 [sunrpc]
> > Sep 8 12:38:01 dev4 kernel: [] svc_create_socket+0x18b/0x2c0 [sunrpc]
> > Sep 8 12:38:01 dev4 kernel: [] ? rpcb_register_call+0x90/0xf0 [sunrpc]
> > Sep 8 12:38:01 dev4 kernel: [] svc_udp_create+0x1b/0x20 [sunrpc]
> > Sep 8 12:38:01 dev4 kernel: [] svc_create_xprt+0x175/0x2a0 [sunrpc]
> > Sep 8 12:38:01 dev4 kernel: [] create_lockd_listener+0x75/0x80 [lockd]
> > Sep 8 12:38:01 dev4 kernel: [] create_lockd_family+0x31/0x60 [lockd]
> > Sep 8 12:38:01 dev4 kernel: [] lockd_up+0xb2/0x210 [lockd]
> > Sep 8 12:38:01 dev4 kernel: [] nlmclnt_init+0x1d/0x70 [lockd]
> > Sep 8 12:38:01 dev4 kernel: [] nfs_start_lockd+0x8a/0xc0 [nfs]
> > Sep 8 12:38:01 dev4 kernel: [] nfs_create_server+0x162/0x650 [nfs]
> > Sep 8 12:38:01 dev4 kernel: [] ? mntput_no_expire+0x29/0xf0
> > Sep 8 12:38:01 dev4 kernel: [] ? pcpu_alloc_area+0x23c/0x340
> > Sep 8 12:38:01 dev4 kernel: [] ? pcpu_next_pop+0x4e/0x70
> > Sep 8 12:38:01 dev4 kernel: [] ? pcpu_alloc+0x3ea/0xa30
> > Sep 8 12:38:01 dev4 kernel: [] nfs_get_sb+0x3e2/0x990 [nfs]
> > Sep 8 12:38:01 dev4 kernel: [] vfs_kern_mount+0x7b/0x1b0
> > Sep 8 12:38:01 dev4 kernel: [] do_kern_mount+0x52/0x130
> > Sep 8 12:38:01 dev4 kernel: [] do_mount+0x2d5/0x850
> > Sep 8 12:38:01 dev4 kernel: [] sys_mount+0x98/0xf0
> > Sep 8 12:38:01 dev4 kernel: [] system_call_fastpath+0x16/0x1b
> > Sep 8 12:38:01 dev4 kernel: Code: ec d8 00 00 00 0f 1f 44 00 00 48 89 b5 50 ff ff ff 89 95 24 ff ff ff 49 89 fd 89 8d 30 ff ff ff 48 8b 95 50 ff ff ff 48 8b 42 70 <0f> b7 00 3c ff 0f 84 e2 10 00 00 48 8b 8d 50 ff ff ff 66 39 81
> > Sep 8 12:38:01 dev4 kernel: RIP [] bnx2_poll_work+0x3a/0x1140 [bnx2]
> > Sep 8 12:38:01 dev4 kernel: RSP
> > Sep 8 12:38:01 dev4 kernel: CR2: 0000000000000000
> > Sep 8 12:38:01 dev4 kernel: ---[ end trace 8f10ffb4a2f96c8d ]---
> >
> >
> > All code
> > ========
> >    0:   ec                      in     (%dx),%al
> >    1:   d8 00                   fadds  (%rax)
> >    3:   00 00                   add    %al,(%rax)
> >    5:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
> >    a:   48 89 b5 50 ff ff ff    mov    %rsi,0xffffffffffffff50(%rbp)
> >   11:   89 95 24 ff ff ff       mov    %edx,0xffffffffffffff24(%rbp)
> >   17:   49 89 fd                mov    %rdi,%r13
> >   1a:   89 8d 30 ff ff ff       mov    %ecx,0xffffffffffffff30(%rbp)
> >   20:   48 8b 95 50 ff ff ff    mov    0xffffffffffffff50(%rbp),%rdx
> >   27:   48 8b 42 70             mov    0x70(%rdx),%rax
> >   2b:*  0f b7 00                movzwl (%rax),%eax     <-- trapping instruction
> >   2e:   3c ff                   cmp    $0xff,%al
> >   30:   0f 84 e2 10 00 00       je     0x1118
> >   36:   48 8b 8d 50 ff ff ff    mov    0xffffffffffffff50(%rbp),%rcx
> >   3d:   66                      data16
> >   3e:   39                      .byte 0x39
> >   3f:   81                      .byte 0x81
> >
> > Code starting with the faulting instruction
> > ===========================================
> >    0:   0f b7 00                movzwl (%rax),%eax
> >    3:   3c ff                   cmp    $0xff,%al
> >    5:   0f 84 e2 10 00 00       je     0x10ed
> >    b:   48 8b 8d 50 ff ff ff    mov    0xffffffffffffff50(%rbp),%rcx
> >   12:   66                      data16
> >   13:   39                      .byte 0x39
> >   14:   81                      .byte 0x81
> >
> >
> > I'm not sure if the:
> >
> > svc: failed to register lockdv1 RPC service (errno 97).
> >
> > messages are related, but I get those regularly and the crash appears
> > to have happened when mounting nfs. I looked a little closer and the
> > NULL pointer that is getting dereferenced appears to be the
> > bnx2_napi->hw_tx_cons_ptr though that is as far as I looked. Let me
> > know if there is any other useful information I can provide.
> >
> > Thanks,
> > Shawn
> >
> >
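
For anyone cross-checking the quoted Oops: a minimal sketch, assuming the usual convention that the byte wrapped in angle brackets on the "Code:" line is the one at the faulting RIP. It only counts bytes up to that marker so the result can be compared with the "2b:*" line in the disassembly. The helper name trapping_offset is made up for illustration and is not part of any kernel tooling.

```python
import re

# "Code:" bytes copied from the quoted Oops; <..> marks the byte at the faulting RIP.
CODE = (
    "ec d8 00 00 00 0f 1f 44 00 00 48 89 b5 50 ff ff ff 89 95 24 ff ff ff "
    "49 89 fd 89 8d 30 ff ff ff 48 8b 95 50 ff ff ff 48 8b 42 70 <0f> b7 00 "
    "3c ff 0f 84 e2 10 00 00 48 8b 8d 50 ff ff ff 66 39 81"
)


def trapping_offset(code):
    """Return (offset, byte) of the <..>-marked byte in an Oops "Code:" string."""
    for index, token in enumerate(code.split()):
        match = re.fullmatch(r"<([0-9a-fA-F]{2})>", token)
        if match:
            # Each token is one byte, so the token index is the byte offset.
            return index, int(match.group(1), 16)
    raise ValueError("no <..> marker found in Code: line")


offset, byte = trapping_offset(CODE)
print(f"marked byte 0x{byte:02x} at offset 0x{offset:x}")
```

The offset lands on the movzwl (%rax),%eax instruction, which reads through the pointer loaded just before it by mov 0x70(%rdx),%rax; with RAX all zeroes in the register dump, that is consistent with a NULL pointer being dereferenced.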