The Linux Kernel Mailing List
 help / color / mirror / Atom feed
* [PATCH net-next] FDDI: defza: Sanitise the reset safety timer
@ 2026-05-09 21:04 Maciej W. Rozycki
  2026-05-13  1:06 ` Jakub Kicinski
  2026-05-14  0:50 ` patchwork-bot+netdevbpf
  0 siblings, 2 replies; 5+ messages in thread
From: Maciej W. Rozycki @ 2026-05-09 21:04 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni
  Cc: netdev, linux-kernel

The reset actions of the DEFZA adapters are exceedingly slow, taking up 
to 30 seconds to complete by the device spec and typically in the range 
of 10 seconds in reality, as required for the device RTOS to boot, still 
quite a lot.  Therefore a state machine is used that's interrupt driven, 
however a safety mechanism is required in case of adapter malfunction, 
so that if no state change interrupt has arrived in time, then the 
situation is taken care of.

The safety mechanism depends on the origin of the reset.  For regular 
adapter initialisation at the device probe time a sleep is requested.  
However a reset is also required by the device spec when the adapter has 
transitioned into the halted state, such as in response to a PC Trace 
event in the course of ring fault recovery, possibly a common network 
event.  In that case no sleep is possible as a device halt is reported 
at the hardirq level.

A timer is therefore set up to ensure progress in case no adapter state 
change interrupt has arrived in time, but as from commit 168f6b6ffbee 
("timers: Use del_timer_sync() even on UP") a warning is issued as the 
timer is deleted in the hardirq handler upon an expected state change:

  defza: v.1.1.4  Oct  6 2018  Maciej W. Rozycki
  tc2: DEC FDDIcontroller 700 or 700-C at 0x18000000, irq 4
  tc2: resetting the board...
  ------------[ cut here ]------------
  WARNING: kernel/time/timer.c:1611 at __timer_delete_sync+0x104/0x120, CPU#0: swapper/0/0
  Modules linked in:
  CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 7.0.0-dirty #2 VOLUNTARY 
  Stack : 9800000002027d08 00000000140120e0 0000000000000000 ffffffff8089d468
          0000000000000000 0000000000000000 ffffffff807ed6b8 ffffffff80897458
          ffffffff80897400 9800000002027b88 0000000000000000 7070617773203a6d
          0000000000000000 9800000002027ba4 0000000000001000 6465746e69617420
          0000000000000000 ffffffff807ed6b8 00000000140120e0 0000000000000009
          000000000000064b ffffffff800dd14c 0000000000000036 9800000002184000
          0000000000000000 0000000000000020 0000000000000000 ffffffff80910000
          ffffffff8085c000 9800000002027c70 0000000000000001 ffffffff80045fa0
          0000000000000000 0000000000000000 0000000000000000 0000000000000009
          000000000000064b ffffffff800502b8 ffffffff807ed6b8 ffffffff80045fa0
          ...
  Call Trace:
  [<ffffffff800502b8>] show_stack+0x28/0xf0
  [<ffffffff80045fa0>] dump_stack_lvl+0x48/0x7c
  [<ffffffff80068c98>] __warn+0xa0/0x128
  [<ffffffff8004120c>] warn_slowpath_fmt+0x64/0xa4
  [<ffffffff800dd14c>] __timer_delete_sync+0x104/0x120
  [<ffffffff804934ac>] fza_interrupt+0xc74/0xeb8
  [<ffffffff800c6390>] __handle_irq_event_percpu+0x70/0x228
  [<ffffffff800c6560>] handle_irq_event_percpu+0x18/0x78
  [<ffffffff800cc320>] handle_percpu_irq+0x50/0x80
  [<ffffffff800c5970>] generic_handle_irq+0x90/0xd0
  [<ffffffff806e956c>] do_IRQ+0x1c/0x30
  [<ffffffff8004ad4c>] handle_int+0x148/0x154
  [<ffffffff800ab7c0>] do_idle+0x40/0x108
  [<ffffffff800abb0c>] cpu_startup_entry+0x2c/0x38
  [<ffffffff806dfec8>] kernel_init+0x0/0x108
  
  ---[ end trace 0000000000000000 ]---
  tc2: OK
  tc2: model 700 (DEFZA-AA), MMF PMD, address 08-00-2b-xx-xx-xx
  tc2: ROM rev. 1.0, firmware rev. 1.2, RMC rev. A, SMT ver. 1
  tc2: link unavailable
  ------------[ cut here ]------------
  WARNING: kernel/time/timer.c:1611 at __timer_delete_sync+0x104/0x120, CPU#0: swapper/0/0
  Modules linked in:
  CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Tainted: G        W           7.0.0-dirty #2 VOLUNTARY 
  Tainted: [W]=WARN
  Stack : 9800000002027d08 00000000140120e0 0000000000000000 ffffffff8089d468
          0000000000000000 0000000000000000 ffffffff807ed6b8 ffffffff80897458
          ffffffff80897400 9800000002027b88 0000000000000000 0000000000000000
          0000000000000000 9800000002027ba4 0000000000001000 0000000000000000
          0000000000000000 ffffffff807ed6b8 00000000140120e0 0000000000000009
          000000000000064b ffffffff800dd14c 0000000000000036 9800000002184000
          0000000000000000 0000000000000020 0000000000000000 ffffffff80910000
          ffffffff8085c000 9800000002027c70 0000000000000001 ffffffff80045fa0
          0000000000000000 0000000000000000 0000000000000000 0000000000000009
          000000000000064b ffffffff800502b8 ffffffff807ed6b8 ffffffff80045fa0
          ...
  Call Trace:
  [<ffffffff800502b8>] show_stack+0x28/0xf0
  [<ffffffff80045fa0>] dump_stack_lvl+0x48/0x7c
  [<ffffffff80068c98>] __warn+0xa0/0x128
  [<ffffffff8004120c>] warn_slowpath_fmt+0x64/0xa4
  [<ffffffff800dd14c>] __timer_delete_sync+0x104/0x120
  [<ffffffff804934ac>] fza_interrupt+0xc74/0xeb8
  [<ffffffff800c6390>] __handle_irq_event_percpu+0x70/0x228
  [<ffffffff800c6560>] handle_irq_event_percpu+0x18/0x78
  [<ffffffff800cc320>] handle_percpu_irq+0x50/0x80
  [<ffffffff800c5970>] generic_handle_irq+0x90/0xd0
  [<ffffffff806e956c>] do_IRQ+0x1c/0x30
  [<ffffffff8004ad4c>] handle_int+0x148/0x154
  [<ffffffff806de8a4>] arch_local_irq_disable+0x4/0x28
  [<ffffffff800ab7d0>] do_idle+0x50/0x108
  [<ffffffff800abb0c>] cpu_startup_entry+0x2c/0x38
  [<ffffffff806dfec8>] kernel_init+0x0/0x108
  
  ---[ end trace 0000000000000000 ]---
  tc2: registered as fddi0

The immediate origin of the new warning is the switch away from aliasing 
del_timer_sync() to del_timer() (timer_delete_sync() to timer_delete() 
in terms of current function names) for UP configurations, which however 
is the only choice for this driver anyway as no SMP hardware supports 
the TURBOchannel bus this device interfaces to.  Therefore there is a 
very remote issue only this is a sign of.

Specifically if an adapter reset issued upon a transition to the halted 
state times out and first triggers fza_reset_timer() for another reset 
assertion, which then schedules fza_reset_timer() for reset deassertion 
and then that second call is pre-empted after poking at the hardware, 
but before the timer has been rearmed and owing to high system load 
causing exceedingly high scheduling latency control is not handed back 
before a transition to the uninitialised state has caused the timer to 
be deleted even before it has been started, then fza_reset_timer() will 
be called yet again and issue another reset even though by then the 
adapter has already recovered.

Prevent this situation from happening by switching to timer_delete() for 
the transition to the halted state and protect the code region affected 
with a spinlock, also to make sure add_timer() has not been called twice 
in a row due to an execution race between the interrupt handler and the 
timer handler (though it could only happen on SMP, but let's keep the 
driver clean).  It's a very unlikely sequence of events to happen and 
therefore there's no point in trying to be overly clever about it, such 
as by placing printk() calls outside the protection.  For the transition 
to the uninitialised state switch to timer_delete_sync_try() instead, so 
that a timer isn't deleted that's just been rearmed by the timer handler 
and needs to watch for the device to come out of reset again (again, an 
SMP scenario only).

Retain timer_delete_sync() invocations outside the hardirq context for a 
stray timer not to fire once device structures have been released.

Fixes: 61414f5ec9834 ("FDDI: defza: Add support for DEC FDDIcontroller 700 TURBOchannel adapter")
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
---
 drivers/net/fddi/defza.c |   11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

linux-defza-timer-lock.diff
Index: linux-macro/drivers/net/fddi/defza.c
===================================================================
--- linux-macro.orig/drivers/net/fddi/defza.c
+++ linux-macro/drivers/net/fddi/defza.c
@@ -984,7 +984,7 @@ static irqreturn_t fza_interrupt(int irq
 
 		case FZA_STATE_UNINITIALIZED:
 			netif_carrier_off(dev);
-			timer_delete_sync(&fp->reset_timer);
+			timer_delete_sync_try(&fp->reset_timer);
 			fp->ring_cmd_index = 0;
 			fp->ring_uns_index = 0;
 			fp->ring_rmc_tx_index = 0;
@@ -1018,7 +1018,9 @@ static irqreturn_t fza_interrupt(int irq
 			fp->queue_active = 0;
 			netif_stop_queue(dev);
 			pr_debug("%s: queue stopped\n", fp->name);
-			timer_delete_sync(&fp->reset_timer);
+
+			spin_lock(&fp->lock);
+			timer_delete(&fp->reset_timer);
 			pr_warn("%s: halted, reason: %x\n", fp->name,
 				FZA_STATUS_GET_HALT(status));
 			fza_regs_dump(fp);
@@ -1027,6 +1029,8 @@ static irqreturn_t fza_interrupt(int irq
 			fp->timer_state = 0;
 			fp->reset_timer.expires = jiffies + 45 * HZ;
 			add_timer(&fp->reset_timer);
+			spin_unlock(&fp->lock);
+
 			break;
 
 		default:
@@ -1046,7 +1050,9 @@ static irqreturn_t fza_interrupt(int irq
 static void fza_reset_timer(struct timer_list *t)
 {
 	struct fza_private *fp = timer_container_of(fp, t, reset_timer);
+	unsigned long flags;
 
+	spin_lock_irqsave(&fp->lock, flags);
 	if (!fp->timer_state) {
 		pr_err("%s: RESET timed out!\n", fp->name);
 		pr_info("%s: trying harder...\n", fp->name);
@@ -1069,6 +1075,7 @@ static void fza_reset_timer(struct timer
 		fp->reset_timer.expires = jiffies + 45 * HZ;
 	}
 	add_timer(&fp->reset_timer);
+	spin_unlock_irqrestore(&fp->lock, flags);
 }
 
 static int fza_set_mac_address(struct net_device *dev, void *addr)

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH net-next] FDDI: defza: Sanitise the reset safety timer
  2026-05-09 21:04 [PATCH net-next] FDDI: defza: Sanitise the reset safety timer Maciej W. Rozycki
@ 2026-05-13  1:06 ` Jakub Kicinski
  2026-05-13 15:29   ` Maciej W. Rozycki
  2026-05-14  0:50 ` patchwork-bot+netdevbpf
  1 sibling, 1 reply; 5+ messages in thread
From: Jakub Kicinski @ 2026-05-13  1:06 UTC (permalink / raw)
  To: Maciej W. Rozycki
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Paolo Abeni, netdev,
	linux-kernel

On Sat, 9 May 2026 22:04:50 +0100 (BST) Maciej W. Rozycki wrote:
>  		case FZA_STATE_UNINITIALIZED:
>  			netif_carrier_off(dev);
> -			timer_delete_sync(&fp->reset_timer);
> +			timer_delete_sync_try(&fp->reset_timer);
>  			fp->ring_cmd_index = 0;
>  			fp->ring_uns_index = 0;
>  			fp->ring_rmc_tx_index = 0;
> @@ -1018,7 +1018,9 @@ static irqreturn_t fza_interrupt(int irq
>  			fp->queue_active = 0;
>  			netif_stop_queue(dev);
>  			pr_debug("%s: queue stopped\n", fp->name);
> -			timer_delete_sync(&fp->reset_timer);
> +
> +			spin_lock(&fp->lock);
> +			timer_delete(&fp->reset_timer);

Noob q, why use timer_delete_sync_try() above ?

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH net-next] FDDI: defza: Sanitise the reset safety timer
  2026-05-13  1:06 ` Jakub Kicinski
@ 2026-05-13 15:29   ` Maciej W. Rozycki
  2026-05-14  0:18     ` Jakub Kicinski
  0 siblings, 1 reply; 5+ messages in thread
From: Maciej W. Rozycki @ 2026-05-13 15:29 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Paolo Abeni, netdev,
	linux-kernel

On Tue, 12 May 2026, Jakub Kicinski wrote:

> >  		case FZA_STATE_UNINITIALIZED:
> >  			netif_carrier_off(dev);
> > -			timer_delete_sync(&fp->reset_timer);
> > +			timer_delete_sync_try(&fp->reset_timer);
> >  			fp->ring_cmd_index = 0;
> >  			fp->ring_uns_index = 0;
> >  			fp->ring_rmc_tx_index = 0;
> > @@ -1018,7 +1018,9 @@ static irqreturn_t fza_interrupt(int irq
> >  			fp->queue_active = 0;
> >  			netif_stop_queue(dev);
> >  			pr_debug("%s: queue stopped\n", fp->name);
> > -			timer_delete_sync(&fp->reset_timer);
> > +
> > +			spin_lock(&fp->lock);
> > +			timer_delete(&fp->reset_timer);
> 
> Noob q, why use timer_delete_sync_try() above ?

 I guess it got lost in the lengthy commit description:

> [...]  For the transition 
> to the uninitialised state switch to timer_delete_sync_try() instead, so 
> that a timer isn't deleted that's just been rearmed by the timer handler 
> and needs to watch for the device to come out of reset again (again, an 
> SMP scenario only).

 Does it answer your question, or shall I rephrase it?

 Actually I gave this piece a bit of extra thought before sending this 
final version.  I considered having extra state added and protected with 
the data access spinlock and then using that state to refrain from 
reissuing the reset if the change to the uninitialised state happened 
literally as the reset timer handler has been entered.

 But I concluded it was not worth it given that it's such an unlikely
state to arrive at.  We're talking about recovery from failed recovery 
here.  So I chose to keep it simple: if the timer has already fired, then 
instead just let it re-reset the device and rearm itself, and abandon the 
timer deletion here, exactly what timer_delete_sync_try() does.  After all 
if the device state change happened a moment later, that's what would 
happen anyway with either approach and we gave the device enough time 
margin to come out of reset, so there's no need to be gentle.

 FWIW I want to port this code sometime to the defxx driver, which handles 
a newer generation of devices that are an evolution of and quite similar 
in many respects to ones handled by this defza driver and which in 
particular implement an analogous state machine handled by an RTOS running 
on m68k hardware.  While devices handled by the defxx driver are much 
faster and their reset takes ~1s rather than ~10s, it's still quite a lot 
and the original DEC code which just busy-waits in the hardirq context 
results in a horrible OS latency spike one user has actually complained 
about, as apparently it was quite a frequent event in their setup.

 As the defxx driver can actually run on SMP hardware (and I have MIPS, 
POWER9, RISC-V, and x86 SMP test systems for the defxx driver in addition 
to a bunch of UP machines of various CPU architectures) at that point I 
may revisit the choice made here and make an incremental change so that 
both drivers use the same approach, but I don't think there's a need right 
now.  Then again, I might not, as it might be good enough a path of least 
resistance approach for both drivers after all.

  Maciej

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH net-next] FDDI: defza: Sanitise the reset safety timer
  2026-05-13 15:29   ` Maciej W. Rozycki
@ 2026-05-14  0:18     ` Jakub Kicinski
  0 siblings, 0 replies; 5+ messages in thread
From: Jakub Kicinski @ 2026-05-14  0:18 UTC (permalink / raw)
  To: Maciej W. Rozycki
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Paolo Abeni, netdev,
	linux-kernel

On Wed, 13 May 2026 16:29:37 +0100 (BST) Maciej W. Rozycki wrote:
>  I guess it got lost in the lengthy commit description:

Indeed :) Thanks for re-explaining 

> > [...]  For the transition 
> > to the uninitialised state switch to timer_delete_sync_try() instead, so 
> > that a timer isn't deleted that's just been rearmed by the timer handler 
> > and needs to watch for the device to come out of reset again (again, an 
> > SMP scenario only). 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH net-next] FDDI: defza: Sanitise the reset safety timer
  2026-05-09 21:04 [PATCH net-next] FDDI: defza: Sanitise the reset safety timer Maciej W. Rozycki
  2026-05-13  1:06 ` Jakub Kicinski
@ 2026-05-14  0:50 ` patchwork-bot+netdevbpf
  1 sibling, 0 replies; 5+ messages in thread
From: patchwork-bot+netdevbpf @ 2026-05-14  0:50 UTC (permalink / raw)
  To: Maciej W. Rozycki
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, netdev,
	linux-kernel

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Sat, 9 May 2026 22:04:50 +0100 (BST) you wrote:
> The reset actions of the DEFZA adapters are exceedingly slow, taking up
> to 30 seconds to complete by the device spec and typically in the range
> of 10 seconds in reality, as required for the device RTOS to boot, still
> quite a lot.  Therefore a state machine is used that's interrupt driven,
> however a safety mechanism is required in case of adapter malfunction,
> so that if no state change interrupt has arrived in time, then the
> situation is taken care of.
> 
> [...]

Here is the summary with links:
  - [net-next] FDDI: defza: Sanitise the reset safety timer
    https://git.kernel.org/netdev/net/c/4694efc41641

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2026-05-14  0:51 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-09 21:04 [PATCH net-next] FDDI: defza: Sanitise the reset safety timer Maciej W. Rozycki
2026-05-13  1:06 ` Jakub Kicinski
2026-05-13 15:29   ` Maciej W. Rozycki
2026-05-14  0:18     ` Jakub Kicinski
2026-05-14  0:50 ` patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox