From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mailout3.hostsharing.net ([176.9.242.54]:47423 "EHLO
	mailout3.hostsharing.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1753520AbcJUQgY (ORCPT );
	Fri, 21 Oct 2016 12:36:24 -0400
Date: Fri, 21 Oct 2016 18:36:55 +0200
From: Lukas Wunner
To: Keith Busch
Cc: linux-pci@vger.kernel.org, Bjorn Helgaas, Ralf Baechle, Wei Zhang,
	Andreas Noever
Subject: Re: [PATCHv3 2/5] pci: Add is_removed state
Message-ID: <20161021163655.GC4221@wunner.de>
References: <1475007815-28354-1-git-send-email-keith.busch@intel.com>
	<1475007815-28354-3-git-send-email-keith.busch@intel.com>
	<20161021153714.GA4221@wunner.de>
	<20161021161515.GA8596@localhost.localdomain>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
In-Reply-To: <20161021161515.GA8596@localhost.localdomain>
Sender: linux-pci-owner@vger.kernel.org
List-ID:

On Fri, Oct 21, 2016 at 12:15:16PM -0400, Keith Busch wrote:
> On Fri, Oct 21, 2016 at 05:37:14PM +0200, Lukas Wunner wrote:
> > With your patch above, the is_removed bit is only set on 0000:09:00.0
> > but not on its children. Consequently the "tg3" driver tries to
> > access the hot-removed Broadcom 57762 Ethernet chip as before,
> > causing a soft lockup.
>
> Is that something that can be fixed in the tg3 driver? I don't think
> drivers can rely on this patch to fence off their unintended access since
> we can't stop tg3 from accessing a removed device before 'is_removed'
> is set.

I haven't tested yet what happens when the adapter is unplugged while
packets are in-flight, but at least unplugging works fine when the
adapter is idle (with your series plus the small changes I outlined).

*Without* your series, I have to set the interface to down with ifconfig
before unplugging. If I ever forget that, the machine locks up:

NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [kworker/2:2:299]
...
Workqueue: pciehp-4 pciehp_power_thread
RIP: 0010:[] [] tg3_read32+0xd/0x10 [tg3]
...
Call Trace:
 [] ? tg3_stop_block.constprop.126+0x80/0x110 [tg3]
 [] ? tg3_abort_hw+0x68/0x2f0 [tg3]
 [] ? tg3_halt+0x2d/0x180 [tg3]
 [] ? tg3_stop+0x157/0x210 [tg3]
 [] ? tg3_close+0x2b/0xe0 [tg3]
 [] ? __dev_close_many+0x84/0xd0
 [] ? dev_close_many+0x74/0x100
 [] ? rollback_registered_many+0xfb/0x2e0
 [] ? rollback_registered+0x29/0x40
 [] ? unregister_netdevice_queue+0x40/0x90
 [] ? unregister_netdev+0x18/0x20
 [] ? tg3_remove_one+0x8b/0x130 [tg3]
 [] ? pci_device_remove+0x36/0xb0
 [] ? __device_release_driver+0x9a/0x140
 [] ? device_release_driver+0x1e/0x30
 [] ? pci_stop_bus_device+0x84/0xa0
 [] ? pci_stop_bus_device+0x2b/0xa0
 [] ? pci_stop_bus_device+0x2b/0xa0
 [] ? pci_stop_and_remove_bus_device+0xe/0x20
 [] ? pciehp_unconfigure_device+0x9a/0x180
 [] ? pciehp_disable_slot+0x3f/0xb0
 [] ? pciehp_power_thread+0x85/0xa0
 [] ? process_one_work+0x19f/0x3d0
 [] ? worker_thread+0x4d/0x450
 [] ? process_one_work+0x3d0/0x3d0
 [] ? kthread+0xbd/0xe0
 [] ? kthread_create_on_node+0x170/0x170
 [] ? ret_from_fork+0x3f/0x70
 [] ? kthread_create_on_node+0x170/0x170

Being able to just unplug without having to think of ifconfig is already
a massive improvement.

Thanks,

Lukas
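
For illustration only, here is a minimal user-space sketch of the idea discussed above: the removed flag is set on a device and on everything below it, so that a register poll in a child's driver can give up instead of spinning. It is not the patch under review and not kernel code. struct model_dev and the model_*() helpers are hypothetical names invented for this sketch, and returning all ones from a removed device is an assumption of the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CHILDREN 4

/* Hypothetical stand-in for a device node; NOT the kernel's struct pci_dev. */
struct model_dev {
	const char *name;
	bool is_removed;
	uint32_t reg;				/* pretend register contents */
	struct model_dev *children[MAX_CHILDREN];
	size_t nr_children;
};

/* Mark a device and every device below it as removed. */
void model_mark_removed(struct model_dev *dev)
{
	assert(dev != NULL);
	dev->is_removed = true;
	for (size_t i = 0; i < dev->nr_children; i++)
		model_mark_removed(dev->children[i]);
}

/* Read the register; the model returns all ones once the device is gone. */
uint32_t model_read32(const struct model_dev *dev)
{
	assert(dev != NULL);
	return dev->is_removed ? UINT32_MAX : dev->reg;
}

/*
 * Poll until busy_bit clears. Returns 0 when the device went idle and
 * -1 when it was removed or the poll budget ran out.
 */
int model_wait_idle(const struct model_dev *dev, uint32_t busy_bit,
		    int max_polls)
{
	assert(dev != NULL);
	for (int i = 0; i < max_polls; i++) {
		if (dev->is_removed)
			return -1;	/* bail out early, do not keep polling */
		if (!(model_read32(dev) & busy_bit))
			return 0;
	}
	return -1;
}
```

In this model, calling model_mark_removed() on the parent alone is enough for model_wait_idle() on a child to return immediately. If the flag were set only on the parent, the child's poll would keep reading until its budget ran out.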