netdev.vger.kernel.org archive mirror
From: Greg Chandler <chandleg@wizardsworks.org>
To: "Maciej W. Rozycki" <macro@orcam.me.uk>
Cc: "Maciej W. Rozycki" <macro@orcam.me.uk>,
	Florian Fainelli <f.fainelli@gmail.com>,
	stable@vger.kernel.org, netdev@vger.kernel.org
Subject: Re: Tulip 21142 panic on physical link disconnect
Date: Thu, 26 Jun 2025 10:40:14 -0700
Message-ID: <dce2d0518711d710806c87f98669cd39@wizardsworks.org>
In-Reply-To: <c56aabfb06cfc653ff3619da4eacb4c1@wizardsworks.org>

On 2025/06/24 16:18, Greg Chandler wrote:
> On 2025/06/24 16:10, Greg Chandler wrote:
>> On 2025/06/19 17:57, Maciej W. Rozycki wrote:
>>> On Thu, 19 Jun 2025, Greg Chandler wrote:
>>> 
>>>> > > I am still not sure why I could not see that warning on my Cobalt Qube2
>>>> > > trying to reproduce Greg's original issue, that is with an IP assigned
>>>> > > on the interface yanking the cable did not trigger a timer warning. It
>>>> > > could be that machine is orders of magnitude slower and has a different
>>>> > > CONFIG_HZ value that just made it less likely to be seen?
>>>> >
>>>> >  Can it have a different PHY attached?  There's this code:
>>>> >
>>>> > 	if (tp->chip_id == PNIC2)
>>>> > 		tp->link_change = pnic2_lnk_change;
>>>> > 	else if (tp->flags & HAS_NWAY)
>>>> > 		tp->link_change = t21142_lnk_change;
>>>> > 	else if (tp->flags & HAS_PNICNWAY)
>>>> > 		tp->link_change = pnic_lnk_change;
>>>> 
>>>> I'm not sure which of us that was directed at, but for my onboard tulips:
>>> 
>>>  It was for Florian, as obviously your system does trigger the issue.
>>> 
>>>> I found a link to the datasheet (if needed), but have had mixed luck with
>>>> alldatasheets:
>>>> https://www.alldatasheet.com/datasheet-pdf/pdf/75840/MICRO-LINEAR/ML6698CH.html
>>> 
>>>  There's no need to chase hw documentation as the issue isn't directly
>>> related to it.
>>> 
>>>  As I noted in the earlier e-mail it seems a regression in the handling of
>>> `del_timer_sync', perhaps deliberate, introduced sometime between 5.18 and
>>> 6.4.  I suggest that you try 5.18 (or 5.17 as it was 5.18.0-rc2 actually
>>> here that worked correctly) and see if it still triggers the problem and
>>> if it does not then bisect it (perhaps limiting the upper bound to 6.4 if
>>> it does trigger it for you, to save an iteration or a couple).  Once you
>>> know the offender you'll likely know the solution.  Or you can come back
>>> with results and ask for one if unsure.
>>> 
>>>  HTH,
>>> 
>>>   Maciej
>> 
>> 
>> I haven't had keyboard time in quite a few days, but I've been looking over
>> the code today.
>> I removed the HAS_ACPI from the 21142 setup, only to find later it was only
>> used in a single function to deal with sleep mode stuff.
>> As I was reading over the driver, I've been taking a look at what could
>> potentially drop in some of the debugging statements, and loaded the module
>> with:
>> 
>> insmod ./tulip.ko tulip_debug=100
>> 
>> [16933.489376] tulip0: EEPROM default media type Autosense
>> [16933.489376] tulip0: Index #0 - Media 10baseT (#0) described by a 21142 Serial PHY (2) block
>> [16933.489376] tulip0: Index #1 - Media 10baseT-FDX (#4) described by a 21142 Serial PHY (2) block
>> [16933.489376] tulip0: Index #2 - Media 100baseTx (#3) described by a 21143 SYM PHY (4) block
>> [16933.489376] tulip0: Index #3 - Media 100baseTx-FDX (#5) described by a 21143 SYM PHY (4) block
>> [16933.498165] net eth0: Digital DS21142/43 Tulip rev 65 at MMIO 0xa120000, 08:00:2b:86:ab:b1, IRQ 29
>> [16933.498165] tulip 0000:00:09.0 eth0: Restarting 21143 autonegotiation, csr14=0003ffff
>> [16933.498165] tulip 0000:00:09.0: vgaarb: pci_notify
>> [16933.498165] tulip 0000:00:0b.0: vgaarb: pci_notify
>> [16933.498165] tulip 0000:00:0b.0: assign IRQ: got 30
>> [16933.498165] tulip 0000:00:0b.0 (unnamed net_device) (uninitialized): tulip_mwi_config()
>> [16933.498165] tulip 0000:00:0b.0 (unnamed net_device) (uninitialized): MWI config cacheline=16, csr0=01a09000
>> [16933.498165] tulip 0000:00:0b.0: enabling bus mastering
>> [16933.505001] tulip1: EEPROM default media type Autosense
>> [16933.505001] tulip1: Index #0 - Media 10baseT (#0) described by a 21142 Serial PHY (2) block
>> [16933.505001] tulip1: Index #1 - Media 10baseT-FDX (#4) described by a 21142 Serial PHY (2) block
>> [16933.505001] tulip1: Index #2 - Media 100baseTx (#3) described by a 21143 SYM PHY (4) block
>> [16933.505001] tulip1: Index #3 - Media 100baseTx-FDX (#5) described by a 21143 SYM PHY (4) block
>> [16933.513790] net eth1: Digital DS21142/43 Tulip rev 65 at MMIO 0xa121000, 08:00:2b:86:a8:5b, IRQ 30
>> [16933.513790] tulip 0000:00:0b.0 eth1: Restarting 21143 autonegotiation, csr14=0003ffff
>> [16933.513790] tulip 0000:00:0b.0: vgaarb: pci_notify
>> [16933.609494] tulip 0000:00:09.0 eth109: renamed from eth0
>> [16933.619259] tulip 0000:00:09.0 eth2: renamed from eth109
>> 
>> This popped up when I bound an IP address to the interface (but not before):
>> 
>> [17042.757875] tulip 0000:00:0b.0 eth1: tulip_up(), irq==30
>> [17042.757875] tulip 0000:00:0b.0 eth1: Restarting 21143 autonegotiation, csr14=0003ffff
>> [17042.757875] tulip 0000:00:0b.0 eth1: interrupt  csr5=0xf0670004 new csr5=0xf0660000
>> [17042.757875] tulip 0000:00:0b.0 eth1: exiting interrupt, csr5=0xf0660000
>> [17042.757875] tulip 0000:00:0b.0 eth1: Done tulip_up(), CSR0 f9a09000, CSR5 f0760000 CSR6 b2422202
>> [17042.757875] tulip 0000:00:0b.0 eth1: interrupt  csr5=0xf0670004 new csr5=0xf0660000
>> [17042.757875] tulip 0000:00:0b.0 eth1: exiting interrupt, csr5=0xf0660000
>> [17042.757875] tulip 0000:00:0b.0 eth1: interrupt  csr5=0xf0670004 new csr5=0xf0660000
>> [17042.757875] tulip 0000:00:0b.0 eth1: exiting interrupt, csr5=0xf0660000
>> [17042.758852] tulip 0000:00:0b.0 eth1: interrupt  csr5=0xf0670004 new csr5=0xf0660000
>> [17042.758852] tulip 0000:00:0b.0 eth1: exiting interrupt, csr5=0xf0660000
>> [17043.033266] tulip 0000:00:09.0 eth2: tulip_up(), irq==29
>> [17043.033266] tulip 0000:00:09.0 eth2: Restarting 21143 autonegotiation, csr14=0003ffff
>> [17043.033266] tulip 0000:00:09.0 eth2: interrupt  csr5=0xf0670004 new csr5=0xf0660000
>> [17043.033266] tulip 0000:00:09.0 eth2: exiting interrupt, csr5=0xf0660000
>> [17043.034242] tulip 0000:00:09.0 eth2: Done tulip_up(), CSR0 f9a09000, CSR5 f0760000 CSR6 b2422202
>> [17043.034242] tulip 0000:00:09.0 eth2: interrupt  csr5=0xf0670004 new csr5=0xf0660000
>> [17043.034242] tulip 0000:00:09.0 eth2: exiting interrupt, csr5=0xf0660000
>> [17043.034242] tulip 0000:00:09.0 eth2: interrupt  csr5=0xf0670004 new csr5=0xf0660000
>> [17043.034242] tulip 0000:00:09.0 eth2: exiting interrupt, csr5=0xf0660000
>> [17043.035219] tulip 0000:00:09.0 eth2: interrupt  csr5=0xf0670004 new csr5=0xf0660000
>> [17043.035219] tulip 0000:00:09.0 eth2: exiting interrupt, csr5=0xf0660000
>> [17043.330140] e1000: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
>> [17044.690491] tulip 0000:00:09.0 eth2: interrupt  csr5=0xf0268010 new csr5=0xf0260000
>> [17044.690491] net eth2: 21143 link status interrupt cde1d2ce, CSR5 f0268010, fffbffff
>> [17044.690491] net eth2: Switching to 100baseTx-FDX based on link negotiation 01e0 & cde1 = 01e0
>> [17044.690491] tulip 0000:00:09.0 eth2: 21143 non-MII 100baseTx-FDX transceiver control 08af/00a0
>> [17044.690491] tulip 0000:00:09.0 eth2: Setting CSR15 to 08af0008/00a00008
>> [17044.690491] tulip 0000:00:09.0 eth2: Using media type 100baseTx-FDX, CSR12 is ce
>> [17044.690491] tulip 0000:00:09.0 eth2:  Setting CSR6 83860200/b3862202 CSR12 cde1d2ce
>> [17044.690491] tulip 0000:00:09.0 eth2: interrupt  csr5=0xf0670004 new csr5=0xf0660000
>> [17044.690491] tulip 0000:00:09.0 eth2: Transmit error, Tx status 7fffbc85
>> [17044.690491] tulip 0000:00:09.0 eth2: Transmit error, Tx status 7fffbc84
>> [17044.690491] tulip 0000:00:09.0 eth2: Transmit error, Tx status 7fffbc84
>> [17044.690491] tulip 0000:00:09.0 eth2: Transmit error, Tx status 7fffbc84
>> [17044.690491] tulip 0000:00:09.0 eth2: exiting interrupt, csr5=0xf0660000
>> [17044.691468] tulip 0000:00:09.0 eth2: interrupt  csr5=0xf8668000 new csr5=0xf8668000
>> [17044.691468] net eth2: 21143 link status interrupt cde1d2cc, CSR5 f8668000, fffbffff
>> [17044.691468] net eth2: 21143 100baseTx-FDX link beat good
>> [17044.691468] tulip 0000:00:09.0 eth2: exiting interrupt, csr5=0xf0660000
>> [17044.691468] tulip 0000:00:09.0 eth2: interrupt  csr5=0xf0668010 new csr5=0xf0660000
>> [17044.691468] net eth2: 21143 link status interrupt 000002c8, CSR5 f0668010, fffbff7f
>> [17044.691468] net eth2: 21143 100baseTx-FDX link beat good
>> [17044.691468] tulip 0000:00:09.0 eth2: exiting interrupt, csr5=0xf0660000
>> [17045.493225] tulip 0000:00:09.0 eth2: interrupt  csr5=0xf0670004 new csr5=0xf0660000
>> [17045.493225] tulip 0000:00:09.0 eth2: Transmit error, Tx status 7fffb000
>> [17045.493225] tulip 0000:00:09.0 eth2: exiting interrupt, csr5=0xf0660000
>> [17045.803772] net eth1: 21143 negotiation status 000021c6, 10baseT
>> [17045.803772] net eth1: 21143 negotiation failed, status 000021c6
>> [17045.803772] net eth1: Testing new 21143 media 100baseTx
>> [17045.803772] tulip 0000:00:0b.0 eth1: interrupt  csr5=0xf0208100 new csr5=0xf0200000
>> [17045.803772] tulip 0000:00:0b.0 eth1: exiting interrupt, csr5=0xf0260000
>> [17045.803772] tulip 0000:00:0b.0 eth1: interrupt  csr5=0xf0670004 new csr5=0xf0660000
>> [17045.803772] tulip 0000:00:0b.0 eth1: Transmit error, Tx status 7fffbc85
>> [17045.803772] tulip 0000:00:0b.0 eth1: Transmit error, Tx status 7fffbc84
>> [17045.803772] tulip 0000:00:0b.0 eth1: Transmit error, Tx status 7fffbc84
>> [17045.803772] tulip 0000:00:0b.0 eth1: Transmit error, Tx status 7fffbc84
>> [17045.803772] tulip 0000:00:0b.0 eth1: Transmit error, Tx status 7fffbc84
>> [17045.803772] tulip 0000:00:0b.0 eth1: exiting interrupt, csr5=0xf0660000
>> [17045.805725] tulip 0000:00:0b.0 eth1: tulip_stop_rxtx() failed (CSR5 0xf0660000 CSR6 0xb3862002)
>> [17046.053772] net eth2: 21143 negotiation status 000002c8, 100baseTx-FDX
>> [17046.053772] net eth2: Using NWay-set 100baseTx-FDX media, csr12 000002c8
>> 
>> I'm still working my way through the driver, but I figured I'd post the
>> additional debug info in case anyone wanted it.
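
For anyone reading the csr5 values in the log above, here is a small C
sketch that names some of the status bits.  The bit names and values are as
I read them from the driver's tulip.h, so treat them as assumptions and
check them against your tree.  csr5_describe() and csr5_is_link_event() are
hypothetical helpers for illustration, not driver functions, and only a
subset of the bits is listed.

```c
#include <stdint.h>
#include <stdio.h>

/* Subset of CSR5 status bits; names/values assumed from tulip.h. */
struct csr5_bit {
	uint32_t mask;
	const char *name;
};

static const struct csr5_bit csr5_bits[] = {
	{ 0x00000004, "TxNoBuf" },
	{ 0x00000010, "TPLnkPass" },
	{ 0x00000100, "RxDied" },
	{ 0x00001000, "TPLnkFail" },
	{ 0x00008000, "AbnormalIntr" },
	{ 0x00010000, "NormalIntr" },
	{ 0x08000000, "LinkChanged" },	/* 21143 link-changed bit */
};

/* Write the names of the set bits into buf, separated by spaces. */
static char *csr5_describe(uint32_t csr5, char *buf, size_t len)
{
	size_t used = 0;
	size_t i;

	if (len == 0)
		return buf;
	buf[0] = '\0';
	for (i = 0; i < sizeof(csr5_bits) / sizeof(csr5_bits[0]); i++) {
		int n;

		if (!(csr5 & csr5_bits[i].mask))
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
			     used ? " " : "", csr5_bits[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break;	/* out of room, stop here */
		used += (size_t)n;
	}
	return buf;
}

/* Nonzero if any link-related bit is set.  I believe these are the bits
 * the interrupt handler tests before calling tp->link_change. */
static int csr5_is_link_event(uint32_t csr5)
{
	return (csr5 & (0x00000010 | 0x00001000 | 0x08000000)) != 0;
}
```

With this, csr5=0xf0670004 decodes to "TxNoBuf NormalIntr", and 0xf0268010
decodes to "TPLnkPass AbnormalIntr", which fits the "link status interrupt"
line that follows it in the log.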
> 
> As I hit send on that last mail, I noticed a line that has not shown up before:
> [17044.690491] net eth2: Switching to 100baseTx-FDX based on link negotiation 01e0 & cde1 = 01e0
> 
> I looked down at the switch, and it was actually linked at 100Mb/FDX; until
> now it has only linked at 10-Half.
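
To spell out the arithmetic in that "01e0 & cde1 = 01e0" line: the cde1 is
the upper half of the csr12 value logged with the link status interrupt
(cde1d2ce), and 01e0 is what we advertise.  The sketch below ANDs the two
and picks the best common mode using the standard MII advertisement bits.
negotiated_media() is a hypothetical helper, and the priority order and the
10baseT fallback are my assumptions about the usual resolution, not a copy
of the driver code.

```c
#include <stdint.h>

/* Standard MII advertisement bits (same values as ADVERTISE_* in
 * linux/mii.h). */
#define ADV_10HALF	0x0020
#define ADV_10FULL	0x0040
#define ADV_100HALF	0x0080
#define ADV_100FULL	0x0100

/* Hypothetical helper: return the best mode both ends support. */
static const char *negotiated_media(uint16_t advertise, uint32_t csr12)
{
	/* Link partner ability sits in the upper 16 bits of csr12. */
	uint16_t partner = (uint16_t)(csr12 >> 16);
	uint16_t common = advertise & partner;

	if (common & ADV_100FULL)
		return "100baseTx-FDX";
	if (common & ADV_100HALF)
		return "100baseTx";
	if (common & ADV_10FULL)
		return "10baseT-FDX";
	return "10baseT";	/* assumed fallback */
}
```

Calling negotiated_media(0x01e0, 0xcde1d2ce) gives "100baseTx-FDX", which
is what the switch showed.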
> 
> The interface worked even with the errors above (I brought the Intel adapter
> hard down and unplugged the cable to check).
> 
> The only thing I have changed is the ACPI disable, which should do literally
> nothing in this case, and loading the module with a debug flag.
> I am going to reboot the machine to clear out everything and see what exactly
> did this.  I can't believe that turning on debugging fixed it, but I have
> seen much weirder stuff happen.



Another bit of info that might help as I am tracing through this.
Debug levels 1-10 panic:
insmod ./tulip.ko tulip_debug=1
insmod ./tulip.ko tulip_debug=2
insmod ./tulip.ko tulip_debug=3
insmod ./tulip.ko tulip_debug=4
insmod ./tulip.ko tulip_debug=5
insmod ./tulip.ko tulip_debug=6
insmod ./tulip.ko tulip_debug=7
insmod ./tulip.ko tulip_debug=8
insmod ./tulip.ko tulip_debug=9
insmod ./tulip.ko tulip_debug=10

This one does not, so hopefully that will narrow the search today:
insmod ./tulip.ko tulip_debug=100

Thread overview: 19+ messages
2025-06-09 22:43 Tulip 21142 panic on physical link disconnect Greg Chandler
2025-06-10 16:27 ` Florian Fainelli
2025-06-10 18:33   ` Greg Chandler
2025-06-10 18:53   ` Greg Chandler
2025-06-16 19:01     ` Florian Fainelli
2025-06-17 18:19       ` Greg Chandler
2025-06-17 18:22         ` Florian Fainelli
2025-06-18 20:59           ` Greg Chandler
2025-06-18 22:51             ` Greg Chandler
2025-06-19 18:57               ` Greg Chandler
2025-06-19 19:36                 ` Maciej W. Rozycki
2025-06-19 19:46                   ` Florian Fainelli
2025-06-19 21:53                     ` Maciej W. Rozycki
2025-06-19 22:56                       ` Greg Chandler
2025-06-19 23:32                         ` Greg Chandler
2025-06-20  0:57                         ` Maciej W. Rozycki
2025-06-24 23:10                           ` Greg Chandler
2025-06-24 23:18                             ` Greg Chandler
2025-06-26 17:40                               ` Greg Chandler [this message]
