From mboxrd@z Thu Jan 1 00:00:00 1970
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org
MIME-Version: 1.0
Date: Thu, 26 Jun 2025 10:40:14 -0700
From: Greg Chandler
To: "Maciej W. Rozycki"
Cc: "Maciej W.
 Rozycki", Florian Fainelli, stable@vger.kernel.org, netdev@vger.kernel.org
Subject: Re: Tulip 21142 panic on physical link disconnect
In-Reply-To:
References: <53bb866f5bb12cc1b6c33b3866007f2b@wizardsworks.org>
 <02e3f9b8-9e60-4574-88e2-906ccd727829@gmail.com>
 <385f2469f504dd293775d3c39affa979@wizardsworks.org>
 <6a079cd0233b33c6faf6af6a1da9661f@wizardsworks.org>
 <9292e561-09bf-4d70-bcb7-f90f9cfbae7b@gmail.com>
 <12ccf3e4c24e8db2545f6ccaba8ce273@wizardsworks.org>
 <8c06f8969e726912b46ef941d36571ad@wizardsworks.org>
 <52564e1f-ab05-4347-bd64-b38a69180499@gmail.com>
 <5a21c21844beadb68ead00cb401ca1c0@wizardsworks.org>
 <2e30ae181acadd45da8cb91619326f37@wizardsworks.org>
Message-ID:
X-Sender: chandleg@wizardsworks.org
Content-Type: text/plain; charset=US-ASCII; format=flowed
Content-Transfer-Encoding: 7bit

On 2025/06/24 16:18, Greg Chandler wrote:
> On 2025/06/24 16:10, Greg Chandler wrote:
>> On 2025/06/19 17:57, Maciej W. Rozycki wrote:
>>> On Thu, 19 Jun 2025, Greg Chandler wrote:
>>>
>>>> > > I am still not sure why I could not see that warning on my Cobalt Qube2
>>>> > > trying to reproduce Greg's original issue, that is with an IP assigned
>>>> > > on the interface yanking the cable did not trigger a timer warning. It
>>>> > > could be that machine is orders of magnitude slower and has a different
>>>> > > CONFIG_HZ value that just made it less likely to be seen?
>>>> >
>>>> > Can it have a different PHY attached? There's this code:
>>>> >
>>>> > if (tp->chip_id == PNIC2)
>>>> >         tp->link_change = pnic2_lnk_change;
>>>> > else if (tp->flags & HAS_NWAY)
>>>> >         tp->link_change = t21142_lnk_change;
>>>> > else if (tp->flags & HAS_PNICNWAY)
>>>> >         tp->link_change = pnic_lnk_change;
>>>>
>>>> I'm not sure which of us that was directed at, but for my onboard
>>>> tulips:
>>>
>>> It was for Florian, as obviously your system does trigger the issue.
>>>
>>>> I found a link to the datasheet (If needed), but have had mixed luck with
>>>> alldatasheets:
>>>> https://www.alldatasheet.com/datasheet-pdf/pdf/75840/MICRO-LINEAR/ML6698CH.html
>>>
>>> There's no need to chase hw documentation as the issue isn't directly
>>> related to it.
>>>
>>> As I noted in the earlier e-mail it seems a regression in the handling of
>>> `del_timer_sync', perhaps deliberate, introduced sometime between 5.18 and
>>> 6.4. I suggest that you try 5.18 (or 5.17 as it was 5.18.0-rc2 actually
>>> here that worked correctly) and see if it still triggers the problem and
>>> if it does not then bisect it (perhaps limiting the upper bound to 6.4 if
>>> it does trigger it for you, to save an iteration or a couple). Once you
>>> know the offender you'll likely know the solution. Or you can come back
>>> with results and ask for one if unsure.
>>>
>>> HTH,
>>>
>>> Maciej
>>
>>
>> I haven't had keyboard time in quite a few days, but I've been looking
>> over the code today.
>> I removed the HAS_ACPI from the 21142 setup, only to find later it was
>> only used in a single function to deal with sleep mode stuff.
>> As I was reading over the driver, I've been taking a look at what
>> could potentially drop in some of the debugging statements, and
>> loaded the module with:
>>
>> insmod ./tulip.ko tulip_debug=100
>>
>> [16933.489376] tulip0: EEPROM default media type Autosense
>> [16933.489376] tulip0: Index #0 - Media 10baseT (#0) described by a 21142 Serial PHY (2) block
>> [16933.489376] tulip0: Index #1 - Media 10baseT-FDX (#4) described by a 21142 Serial PHY (2) block
>> [16933.489376] tulip0: Index #2 - Media 100baseTx (#3) described by a 21143 SYM PHY (4) block
>> [16933.489376] tulip0: Index #3 - Media 100baseTx-FDX (#5) described by a 21143 SYM PHY (4) block
>> [16933.498165] net eth0: Digital DS21142/43 Tulip rev 65 at MMIO 0xa120000, 08:00:2b:86:ab:b1, IRQ 29
>> [16933.498165] tulip 0000:00:09.0 eth0: Restarting 21143 autonegotiation, csr14=0003ffff
>> [16933.498165] tulip 0000:00:09.0: vgaarb: pci_notify
>> [16933.498165] tulip 0000:00:0b.0: vgaarb: pci_notify
>> [16933.498165] tulip 0000:00:0b.0: assign IRQ: got 30
>> [16933.498165] tulip 0000:00:0b.0 (unnamed net_device) (uninitialized): tulip_mwi_config()
>> [16933.498165] tulip 0000:00:0b.0 (unnamed net_device) (uninitialized): MWI config cacheline=16, csr0=01a09000
>> [16933.498165] tulip 0000:00:0b.0: enabling bus mastering
>> [16933.505001] tulip1: EEPROM default media type Autosense
>> [16933.505001] tulip1: Index #0 - Media 10baseT (#0) described by a 21142 Serial PHY (2) block
>> [16933.505001] tulip1: Index #1 - Media 10baseT-FDX (#4) described by a 21142 Serial PHY (2) block
>> [16933.505001] tulip1: Index #2 - Media 100baseTx (#3) described by a 21143 SYM PHY (4) block
>> [16933.505001] tulip1: Index #3 - Media 100baseTx-FDX (#5) described by a 21143 SYM PHY (4) block
>> [16933.513790] net eth1: Digital DS21142/43 Tulip rev 65 at MMIO 0xa121000, 08:00:2b:86:a8:5b, IRQ 30
>> [16933.513790] tulip 0000:00:0b.0 eth1: Restarting 21143 autonegotiation, csr14=0003ffff
>> [16933.513790] tulip 0000:00:0b.0: vgaarb: pci_notify
>> [16933.609494] tulip 0000:00:09.0 eth109: renamed from eth0
>> [16933.619259] tulip 0000:00:09.0 eth2: renamed from eth109
>>
>>
>> This popped up when I bound an IP address to the interface (but not
>> before)
>>
>> [17042.757875] tulip 0000:00:0b.0 eth1: tulip_up(), irq==30
>> [17042.757875] tulip 0000:00:0b.0 eth1: Restarting 21143 autonegotiation, csr14=0003ffff
>> [17042.757875] tulip 0000:00:0b.0 eth1: interrupt csr5=0xf0670004 new csr5=0xf0660000
>> [17042.757875] tulip 0000:00:0b.0 eth1: exiting interrupt, csr5=0xf0660000
>> [17042.757875] tulip 0000:00:0b.0 eth1: Done tulip_up(), CSR0 f9a09000, CSR5 f0760000 CSR6 b2422202
>> [17042.757875] tulip 0000:00:0b.0 eth1: interrupt csr5=0xf0670004 new csr5=0xf0660000
>> [17042.757875] tulip 0000:00:0b.0 eth1: exiting interrupt, csr5=0xf0660000
>> [17042.757875] tulip 0000:00:0b.0 eth1: interrupt csr5=0xf0670004 new csr5=0xf0660000
>> [17042.757875] tulip 0000:00:0b.0 eth1: exiting interrupt, csr5=0xf0660000
>> [17042.758852] tulip 0000:00:0b.0 eth1: interrupt csr5=0xf0670004 new csr5=0xf0660000
>> [17042.758852] tulip 0000:00:0b.0 eth1: exiting interrupt, csr5=0xf0660000
>> [17043.033266] tulip 0000:00:09.0 eth2: tulip_up(), irq==29
>> [17043.033266] tulip 0000:00:09.0 eth2: Restarting 21143 autonegotiation, csr14=0003ffff
>> [17043.033266] tulip 0000:00:09.0 eth2: interrupt csr5=0xf0670004 new csr5=0xf0660000
>> [17043.033266] tulip 0000:00:09.0 eth2: exiting interrupt, csr5=0xf0660000
>> [17043.034242] tulip 0000:00:09.0 eth2: Done tulip_up(), CSR0 f9a09000, CSR5 f0760000 CSR6 b2422202
>> [17043.034242] tulip 0000:00:09.0 eth2: interrupt csr5=0xf0670004 new csr5=0xf0660000
>> [17043.034242] tulip 0000:00:09.0 eth2: exiting interrupt, csr5=0xf0660000
>> [17043.034242] tulip 0000:00:09.0 eth2: interrupt csr5=0xf0670004 new csr5=0xf0660000
>> [17043.034242] tulip 0000:00:09.0 eth2: exiting interrupt, csr5=0xf0660000
>> [17043.035219] tulip 0000:00:09.0 eth2: interrupt csr5=0xf0670004 new csr5=0xf0660000
>> [17043.035219] tulip 0000:00:09.0 eth2: exiting interrupt, csr5=0xf0660000
>> [17043.330140] e1000: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
>> [17044.690491] tulip 0000:00:09.0 eth2: interrupt csr5=0xf0268010 new csr5=0xf0260000
>> [17044.690491] net eth2: 21143 link status interrupt cde1d2ce, CSR5 f0268010, fffbffff
>> [17044.690491] net eth2: Switching to 100baseTx-FDX based on link negotiation 01e0 & cde1 = 01e0
>> [17044.690491] tulip 0000:00:09.0 eth2: 21143 non-MII 100baseTx-FDX transceiver control 08af/00a0
>> [17044.690491] tulip 0000:00:09.0 eth2: Setting CSR15 to 08af0008/00a00008
>> [17044.690491] tulip 0000:00:09.0 eth2: Using media type 100baseTx-FDX, CSR12 is ce
>> [17044.690491] tulip 0000:00:09.0 eth2: Setting CSR6 83860200/b3862202 CSR12 cde1d2ce
>> [17044.690491] tulip 0000:00:09.0 eth2: interrupt csr5=0xf0670004 new csr5=0xf0660000
>> [17044.690491] tulip 0000:00:09.0 eth2: Transmit error, Tx status 7fffbc85
>> [17044.690491] tulip 0000:00:09.0 eth2: Transmit error, Tx status 7fffbc84
>> [17044.690491] tulip 0000:00:09.0 eth2: Transmit error, Tx status 7fffbc84
>> [17044.690491] tulip 0000:00:09.0 eth2: Transmit error, Tx status 7fffbc84
>> [17044.690491] tulip 0000:00:09.0 eth2: exiting interrupt, csr5=0xf0660000
>> [17044.691468] tulip 0000:00:09.0 eth2: interrupt csr5=0xf8668000 new csr5=0xf8668000
>> [17044.691468] net eth2: 21143 link status interrupt cde1d2cc, CSR5 f8668000, fffbffff
>> [17044.691468] net eth2: 21143 100baseTx-FDX link beat good
>> [17044.691468] tulip 0000:00:09.0 eth2: exiting interrupt, csr5=0xf0660000
>> [17044.691468] tulip 0000:00:09.0 eth2: interrupt csr5=0xf0668010 new csr5=0xf0660000
>> [17044.691468] net eth2: 21143 link status interrupt 000002c8, CSR5 f0668010, fffbff7f
>> [17044.691468] net eth2: 21143 100baseTx-FDX link beat good
>> [17044.691468] tulip 0000:00:09.0 eth2: exiting interrupt, csr5=0xf0660000
>> [17045.493225] tulip 0000:00:09.0 eth2: interrupt csr5=0xf0670004 new csr5=0xf0660000
>> [17045.493225] tulip 0000:00:09.0 eth2: Transmit error, Tx status 7fffb000
>> [17045.493225] tulip 0000:00:09.0 eth2: exiting interrupt, csr5=0xf0660000
>> [17045.803772] net eth1: 21143 negotiation status 000021c6, 10baseT
>> [17045.803772] net eth1: 21143 negotiation failed, status 000021c6
>> [17045.803772] net eth1: Testing new 21143 media 100baseTx
>> [17045.803772] tulip 0000:00:0b.0 eth1: interrupt csr5=0xf0208100 new csr5=0xf0200000
>> [17045.803772] tulip 0000:00:0b.0 eth1: exiting interrupt, csr5=0xf0260000
>> [17045.803772] tulip 0000:00:0b.0 eth1: interrupt csr5=0xf0670004 new csr5=0xf0660000
>> [17045.803772] tulip 0000:00:0b.0 eth1: Transmit error, Tx status 7fffbc85
>> [17045.803772] tulip 0000:00:0b.0 eth1: Transmit error, Tx status 7fffbc84
>> [17045.803772] tulip 0000:00:0b.0 eth1: Transmit error, Tx status 7fffbc84
>> [17045.803772] tulip 0000:00:0b.0 eth1: Transmit error, Tx status 7fffbc84
>> [17045.803772] tulip 0000:00:0b.0 eth1: Transmit error, Tx status 7fffbc84
>> [17045.803772] tulip 0000:00:0b.0 eth1: exiting interrupt, csr5=0xf0660000
>> [17045.805725] tulip 0000:00:0b.0 eth1: tulip_stop_rxtx() failed (CSR5 0xf0660000 CSR6 0xb3862002)
>> [17046.053772] net eth2: 21143 negotiation status 000002c8, 100baseTx-FDX
>> [17046.053772] net eth2: Using NWay-set 100baseTx-FDX media, csr12 000002c8
>>
>>
>> I'm still working my way through the driver, but I figured I'd post
>> the additional debug info in case anyone wanted it.
>
>
> As I hit send on that last mail, I noticed a line that has not shown up
> before:
> [17044.690491] net eth2: Switching to 100baseTx-FDX based on link negotiation 01e0 & cde1 = 01e0
>
> I looked down at the switch, and it was actually linked at 100MB/FDX,
> until now it has only linked at 10-Half
>
> The interface worked even with the errors above (I brought the Intel
> adapter hard down and unplugged the cable to check).
>
> The only thing I have changed is the ACPI disable which should do
> literally nothing in this case, and loading the module with a debug
> flag.
> I am going to reboot the machine to clear out everything and see what
> exactly did this. I can't believe that turning on debugging fixed it,
> but I have seen much weirder stuff happen.

Another bit of info that might help as I am tracing through this.

Debug levels 1-10 panic:
insmod ./tulip.ko tulip_debug=1
insmod ./tulip.ko tulip_debug=2
insmod ./tulip.ko tulip_debug=3
insmod ./tulip.ko tulip_debug=4
insmod ./tulip.ko tulip_debug=5
insmod ./tulip.ko tulip_debug=6
insmod ./tulip.ko tulip_debug=7
insmod ./tulip.ko tulip_debug=8
insmod ./tulip.ko tulip_debug=9
insmod ./tulip.ko tulip_debug=10

This does not, so hopefully that will narrow the search today:
insmod ./tulip.ko tulip_debug=100
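
A minimal sketch of how the gap between tulip_debug=10 and tulip_debug=100
could be narrowed with a binary search. It assumes the behaviour flips at a
single threshold somewhere in that range, which the results above do not
prove. The names bisect_debug_level and ask_operator are hypothetical and not
part of the driver. The probe only asks the operator what happened, so the
script is meant to run on a second machine while the test box is rebooted
between attempts.

```sh
#!/bin/sh
# Hypothetical helper, not part of the tulip driver.
# Assumption: LO is known to panic, HI is known to survive, and there is
# a single threshold somewhere in between.

# ask_operator LEVEL: returns 0 (true) if the operator reports a panic.
ask_operator() {
    printf 'Load with: insmod ./tulip.ko tulip_debug=%s\n' "$1" >&2
    printf 'Did the machine panic? [y/n] ' >&2
    read -r answer
    [ "$answer" = "y" ]
}

# bisect_debug_level LO HI PROBE: prints the lowest level that survives.
bisect_debug_level() {
    lo=$1
    hi=$2
    probe=$3
    while [ $((hi - lo)) -gt 1 ]; do
        mid=$(( (lo + hi) / 2 ))
        if "$probe" "$mid"; then
            lo=$mid    # mid still panics, so the threshold is above it
        else
            hi=$mid    # mid survives, so the threshold is at or below it
        fi
    done
    echo "$hi"
}
```

Calling `bisect_debug_level 10 100 ask_operator` would need at most seven
attempts to cover levels 11-99, instead of loading every value by hand.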