From: Florian Fainelli <f.fainelli@gmail.com>
To: "Maciej W. Rozycki" <macro@orcam.me.uk>,
	Greg Chandler <chandleg@wizardsworks.org>
Cc: stable@vger.kernel.org, netdev@vger.kernel.org
Subject: Re: Tulip 21142 panic on physical link disconnect
Date: Thu, 19 Jun 2025 12:46:35 -0700
Message-ID: <52564e1f-ab05-4347-bd64-b38a69180499@gmail.com>
In-Reply-To: <alpine.DEB.2.21.2506192007440.37405@angie.orcam.me.uk>

Hi Maciej,

On 6/19/25 12:36, Maciej W. Rozycki wrote:
> On Thu, 19 Jun 2025, Greg Chandler wrote:
> 
>> So what I know for sure is this:
>> The tulip driver on alpha (generic and DP264) oops/panic on physical
>> disconnect, but only when an IP address is bound.
>> It does not panic when no address is bound to the interface.
>> It does not matter if the driver is compiled in, or if it is compiled as a
>> module.
>> It does not matter if all of the options are set for tulip or if none of them
>> are:
>>      New bus configuration
>>      Use PCI shared mem for NIC registers
>>      Use RX polling (NAPI)
>>      Use Interrupt Mitigation
>> The physical link does not auto-negotiate, and mii-tool does not seem to be
>> able to force it with -F or -A like you would expect it to.
>> The kernel does not drop the "Link is Up/Link is Down" messages when the PHY
>> "links"
>> The switch and interface both show LEDs as if linked at 10-Half-Duplex, and
>> the lights turn off when the link is broken.
>> Subsequently they do relink at 10-Half again if plugged back in.
>> I did also attempt to test the kernel level stack for nfsroot, just to see if
>> it worked prior to init launching everything else, and it did not.
>> I used the same IP configuration for that test as all of the tests in these
>> emails.
>> All of the oops/panics seem to happen at:
>>      kernel/time/timer.c:1657 __timer_delete_sync+0x10c/0x150
> 
>   FYI something's changed a while ago in how `del_timer_sync' is handled
> and I can see a similar warning nowadays with another network driver with
> the MIPS platform.
> 
>   Since I'm the maintainer of said driver I mean to bisect it and figure
> out what's going here, but haven't found time so far owing to other
> commitments (and the driver otherwise works just fine regardless, so it's
> minor annoyance).  If you beat me to it, then I'll gladly accept it, but
> otherwise I'm just letting you know you're not alone with this issue and
> that it's not specific to the DEC Tulip driver on your system.
> 
>   For the record:
> 
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 0 at kernel/time/timer.c:1563 __timer_delete_sync+0x110/0x118
> Modules linked in:
> CPU: 0 PID: 0 Comm: swapper Tainted: G        W          6.4.0-rc3-00030-gae62c49c0cef #21
> Stack : 807a0000 80095a8c 00000000 00000004 806a0000 00000009 80c09dac 807d0000
>          807a0000 807056ec 80769fac 807a13f3 807d30c4 1000ec00 80c09d58 80787a18
>          00000000 00000000 807056ec 00000000 00000001 80c09c94 00000077 34633236
>          20202020 00000000 807d7311 20202020 807056ec 1000ec00 00000000 00000000
>          806fcb60 806fcb38 807a0000 00000001 00000000 fffffffe 00000000 807d0000
>          ...
> Call Trace:
> [<80048ecc>] show_stack+0x2c/0xf8
> [<80645c88>] dump_stack_lvl+0x34/0x4c
> [<80641d00>] __warn+0xb4/0xe8
> [<80641d84>] warn_slowpath_fmt+0x50/0x88
> [<800b177c>] __timer_delete_sync+0x110/0x118
> [<8040f4b0>] fza_interrupt+0x904/0x1004
> [<80098d7c>] __handle_irq_event_percpu+0x84/0x188
> [<80098f1c>] handle_irq_event+0x38/0xbc
> [<8009d4e4>] handle_level_irq+0xc8/0x208
> [<80098110>] generic_handle_irq+0x44/0x5c
> [<8064f450>] do_IRQ+0x1c/0x28
> [<80041cf0>] dec_irq_dispatch+0x10/0x20
> [<80043754>] handle_int+0x14c/0x158
> [<8008bf64>] do_idle+0x5c/0x15c
> [<8008c368>] cpu_startup_entry+0x20/0x28
> [<8064657c>] kernel_init+0x0/0x114
> 
> ---[ end trace 0000000000000000 ]---
> 
> -- the arrival of this particular device state change interrupt means the
> timer set up just in case the device gets stuck can be deleted, so I'm not
> sure why calling `del_timer_sync' to discard the timer has become a no-no
> now; this code is 20+ years old now, though I sat on it for a while and
> then it took some time and effort to get it upstream too.  The issue has
> started sometime between 5.18 (clean boot) and 6.4 (quoted above).
> 
>   Maybe it'll ring someone's bell and they'll chime in or otherwise I'll
> bisect it... sometime.  Or feel free to start yourself with 5.18, as it's
> not terribly old, only a bit and certainly not so as 2.6 is.

I am still not sure why I could not see that warning on my Cobalt Qube2
when trying to reproduce Greg's original issue: with an IP address
assigned to the interface, yanking the cable did not trigger a timer
warning. Could it be that the machine is orders of magnitude slower and
has a different CONFIG_HZ value, which just made it less likely to be
seen?
-- 
Florian
