From mboxrd@z Thu Jan 1 00:00:00 1970
From: Or Gerlitz
Subject: Re: [PATCH 3.14-stable] net/mlx4_core: Preserve pci_dev_data after __mlx4_remove_one()
Date: Sun, 1 Jun 2014 12:30:51 +0300
Message-ID: <538AF2CB.20603@mellanox.com>
References: <1401607475-8367-1-git-send-email-weiyang@linux.vnet.ibm.com> <20140601073853.GA8635@richard>
Mime-Version: 1.0
Content-Type: text/plain; charset="ISO-8859-1"; format=flowed
Content-Transfer-Encoding: 7bit
Cc: , Bjorn Helgaas , Amir Vadai , Jack Morgenstein
To: Wei Yang ,
Return-path:
Received: from eu1sys200aog117.obsmtp.com ([207.126.144.143]:46071 "EHLO eu1sys200aog117.obsmtp.com" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1756786AbaFAJcP (ORCPT );
	Sun, 1 Jun 2014 05:32:15 -0400
In-Reply-To: <20140601073853.GA8635@richard>
Sender: netdev-owner@vger.kernel.org
List-ID:

On 01/06/2014 10:38, Wei Yang wrote:
> David,
> The following are the backports of this patch to the 3.14, 3.10, 3.4 and 3.2 stable trees.

Wait, I recently noticed that on 3.15-rcX, if the host is rebooted while the
mlx4_core driver is loaded in SRIOV mode, we crash as shown in the log that
follows. I am looking into this now, and I think there's a chance it is related
to your upstream change befdf89 "net/mlx4_core: Preserve pci_dev_data after
__mlx4_remove_one()"

Or.

[ 152.121286] mlx4_core 0000:06:00.0: Received reset from slave:2
[ 152.128031] mlx4_core 0000:06:00.0: Have more references for index 0,no need to modify mac table
[ 152.209248] mlx4_core 0000:06:00.0: Received reset from slave:1
[ 152.215889] mlx4_core 0000:06:00.0: Have more references for index 0,no need to modify mac table
[ 152.216305] sd 1:0:1:0: [sdd] Synchronizing SCSI cache
[ 152.221714] sd 1:0:0:0: [sdc] Synchronizing SCSI cache
[ 152.227108] sd 0:0:1:0: [sdb] Synchronizing SCSI cache
[ 152.232494] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[ 152.271991] mlx4_en 0000:06:00.0: removed PHC
[ 152.281611] mlx4_core 0000:06:00.0: Have more references for index 0,no need to modify mac table
[ 152.318395] mlx4_core 0000:06:00.0: Disabling SR-IOV
[ 152.323513] BUG: unable to handle kernel NULL pointer dereference at 0000000000000378
[ 152.331523] IP: [] __mlx4_remove_one+0x20/0x370 [mlx4_core]
[ 152.338778] PGD 0
[ 152.340908] Oops: 0000 [#1] PREEMPT SMP
[ 152.345058] Modules linked in: netconsole nfsv3 nfs_acl auth_rpcgss oid_registry nfsv4 nfs lockd autofs4 8021q sunrpc cpufreq_ondemand bridge stp llc ext4 jbd2 crc16 raid0 dm_mirror dm_region_hash dm_log vhost_net macvtap macvlan vhost tun kvm_intel kvm dm_mod ixgbevf microcode pcspkr joydev i2c_i801 sg ehci_pci ehci_hcd mlx4_ib mlx4_en ioatdma ib_sa ib_mad ib_core ib_addr vxlan ipv6 mlx4_core ixgbe mdio igb dca ptp pps_core hwmon button ext3 jbd sd_mod ata_piix libata scsi_mod uhci_hcd
[ 152.392161] CPU: 8 PID: 4557 Comm: reboot Not tainted 3.15.0-rc6+ #149
[ 152.398760] Hardware name: Supermicro X8DTU/X8DTU, BIOS 2.1c 08/03/2012
[ 152.405954] task: ffff880331fca490 ti: ffff8800bb1a6000 task.ti: ffff8800bb1a6000
[ 152.413507] RIP: 0010:[] [] __mlx4_remove_one+0x20/0x370 [mlx4_core]
[ 152.423220] RSP: 0018:ffff8800bb1a7b98 EFLAGS: 00010286
[ 152.428598] RAX: 0000000000000000 RBX: ffff880630a78098 RCX: 0000000000000000
[ 152.435793] RDX: 0000000000000000 RSI: 0000000000000202 RDI: ffff880630a78098
[ 152.442987] RBP: ffff8800bb1a7bc8 R08: 0000000000000000 R09: ffffffff81584556
[ 152.450181] R10: ffffea000cc42e18 R11: ffffffff811ab129 R12: ffff880630a78000
[ 152.457374] R13: ffff880630a78000 R14: 0000000000000000 R15: ffff8800bb1a7cc8
[ 152.464568] FS: 00007f60f21f6700(0000) GS:ffff88063fc80000(0000) knlGS:0000000000000000
[ 152.472731] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 152.478538] CR2: 0000000000000378 CR3: 00000000bcdf0000 CR4: 00000000000007e0
[ 152.485734] Stack:
[ 152.487805] 0000000000000000 ffff880630a78098 ffff880630a78000 0000000000000000
[ 152.495534] 0000000000000000 ffff8800bb1a7cc8 ffff8800bb1a7bf8 ffffffffa0166c91
[ 152.503264] ffff880630a78098 ffff880630a78098 ffffffffa0181640 ffff880630a78000
[ 152.510978] Call Trace:
[ 152.513492] [] mlx4_remove_one+0x31/0x60 [mlx4_core]
[ 152.520172] [] pci_device_remove+0x41/0xc0
[ 152.525987] [] __device_release_driver+0x7a/0xe0
[ 152.532320] [] device_release_driver+0x28/0x40
[ 152.538475] [] pci_stop_bus_device+0x9c/0xb0
[ 152.544461] [] pci_stop_and_remove_bus_device+0x11/0x20
[ 152.551399] [] virtfn_remove.clone.0+0xdd/0x140
[ 152.557645] [] ? dev_warn+0x4e/0x50
[ 152.562841] [] pci_disable_sriov+0x5f/0xf0
[ 152.568655] [] __mlx4_remove_one+0x334/0x370 [mlx4_core]
[ 152.575685] [] mlx4_remove_one+0x31/0x60 [mlx4_core]
[ 152.582364] [] pci_device_shutdown+0x3c/0x90
[ 152.588343] [] device_shutdown+0x15/0x180
[ 152.594065] [] kernel_restart_prepare+0x31/0x40
[ 152.600304] [] kernel_restart+0x11/0x60
[ 152.605851] [] SyS_reboot+0x1b0/0x200
[ 152.611226] [] ? mntput_no_expire+0x33/0x180
[ 152.617204] [] ? mntput+0x1c/0x30
[ 152.622232] [] ? __fput+0x144/0x1f0
[ 152.627432] [] ? ____fput+0x9/0x10
[ 152.632545] [] ? task_work_run+0x8c/0xe0
[ 152.638180] [] ? do_notify_resume+0x74/0x80
[ 152.644075] [] ? __audit_syscall_exit+0x236/0x2e0
[ 152.650490] [] ? int_signal+0x12/0x17
[ 152.655869] [] system_call_fastpath+0x16/0x1b
[ 152.661935] Code: 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 57 41 56 41 55 49 89 fd 48 8d bf 98 00 00 00 41 54 53 48 83 ec 08 e8 60 86 18 e1 <8b> 90 78 03 00 00 48 89 c3 85 d2 0f 85 30 02 00 00 f6 40 08 04
[ 152.684806] RIP [] __mlx4_remove_one+0x20/0x370 [mlx4_core]
[ 152.692170] RSP
[ 152.695723] CR2: 0000000000000378
[ 152.699163] ---[ end trace 9c36c3b85b765771 ]---

>
> On 3.14, only this patch is backported.
> On 3.10, a previous related one, "pass pci_device_id.driver_data to
> __mlx4_init_one during reset", is backported too.
> On 3.4, "pass pci_device_id.driver_data to __mlx4_init_one during reset" is
> not backported, since the slot_reset handler is not present.
> Instead, another one, "Stash PCI ID driver_data in mlx4_priv structure",
> is backported to make this patch valid on this version.
> On 3.2, the same as 3.4.
>
> All versions compiled successfully. 3.14 and 3.10 are verified, while 3.4
> and 3.2 are not.
>
> I am not sure how to put them all in one big patch set, so I am sending them
> separately. Each version is contained in one patch set. If there is a better
> way for you to merge them, please let me know.
>
> Finally, happy Children's Day to all :-)
>
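
A note on reading the oops above: RAX is 0 and CR2 is 0x378, and the faulting bytes in the Code line (<8b> 90 78 03 00 00) decode to a 32-bit load from RAX + 0x378. That is consistent with a NULL structure pointer being dereferenced at a member that sits 0x378 bytes into the structure. The user-space sketch below only illustrates that arithmetic and a defensive NULL check. The struct demo_priv type, its flag member, and both helper functions are hypothetical stand-ins; the real mlx4_priv layout and the actual driver fix are not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a driver-private structure. The padding is
 * chosen only so that one member lands at offset 0x378, matching the
 * fault address in the log; it does not reflect the real mlx4_priv.
 */
struct demo_priv {
	unsigned char pad[0x378];
	int flag;		/* hypothetical member read at +0x378 */
};

/* Address that would be touched when reading the flag member from base. */
static uintptr_t demo_fault_address(uintptr_t base)
{
	return base + offsetof(struct demo_priv, flag);
}

/* Guarded accessor: returns -1 instead of dereferencing a NULL pointer. */
static int demo_read_flag(const struct demo_priv *priv)
{
	if (priv == NULL)
		return -1;
	return priv->flag;
}
```

With a base of 0, demo_fault_address() yields 0x378, the same value the log reports in CR2. Whether a NULL check of this kind is the right remedy in the driver, or whether the second entry into __mlx4_remove_one() should be avoided in the first place, is a question for the maintainers.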