Netdev List
 help / color / mirror / Atom feed
* [PATCH net-next] FDDI: defza: Sanitise the reset safety timer
@ 2026-05-09 21:04 Maciej W. Rozycki
  2026-05-13  1:06 ` Jakub Kicinski
  0 siblings, 1 reply; 2+ messages in thread
From: Maciej W. Rozycki @ 2026-05-09 21:04 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni
  Cc: netdev, linux-kernel

The reset actions of the DEFZA adapters are exceedingly slow, taking up 
to 30 seconds to complete by the device spec and typically in the range 
of 10 seconds in reality, as required for the device RTOS to boot, still 
quite a lot.  Therefore a state machine is used that's interrupt driven, 
however a safety mechanism is required in case of adapter malfunction, 
so that if no state change interrupt has arrived in time, then the 
situation is taken care of.

The safety mechanism depends on the origin of the reset.  For regular 
adapter initialisation at the device probe time a sleep is requested.  
However a reset is also required by the device spec when the adapter has 
transitioned into the halted state, such as in response to a PC Trace 
event in the course of ring fault recovery, possibly a common network 
event.  In that case no sleep is possible as a device halt is reported 
at the hardirq level.

A timer is therefore set up to ensure progress in case no adapter state 
change interrupt has arrived in time, but as from commit 168f6b6ffbee 
("timers: Use del_timer_sync() even on UP") a warning is issued as the 
timer is deleted in the hardirq handler upon an expected state change:

  defza: v.1.1.4  Oct  6 2018  Maciej W. Rozycki
  tc2: DEC FDDIcontroller 700 or 700-C at 0x18000000, irq 4
  tc2: resetting the board...
  ------------[ cut here ]------------
  WARNING: kernel/time/timer.c:1611 at __timer_delete_sync+0x104/0x120, CPU#0: swapper/0/0
  Modules linked in:
  CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 7.0.0-dirty #2 VOLUNTARY 
  Stack : 9800000002027d08 00000000140120e0 0000000000000000 ffffffff8089d468
          0000000000000000 0000000000000000 ffffffff807ed6b8 ffffffff80897458
          ffffffff80897400 9800000002027b88 0000000000000000 7070617773203a6d
          0000000000000000 9800000002027ba4 0000000000001000 6465746e69617420
          0000000000000000 ffffffff807ed6b8 00000000140120e0 0000000000000009
          000000000000064b ffffffff800dd14c 0000000000000036 9800000002184000
          0000000000000000 0000000000000020 0000000000000000 ffffffff80910000
          ffffffff8085c000 9800000002027c70 0000000000000001 ffffffff80045fa0
          0000000000000000 0000000000000000 0000000000000000 0000000000000009
          000000000000064b ffffffff800502b8 ffffffff807ed6b8 ffffffff80045fa0
          ...
  Call Trace:
  [<ffffffff800502b8>] show_stack+0x28/0xf0
  [<ffffffff80045fa0>] dump_stack_lvl+0x48/0x7c
  [<ffffffff80068c98>] __warn+0xa0/0x128
  [<ffffffff8004120c>] warn_slowpath_fmt+0x64/0xa4
  [<ffffffff800dd14c>] __timer_delete_sync+0x104/0x120
  [<ffffffff804934ac>] fza_interrupt+0xc74/0xeb8
  [<ffffffff800c6390>] __handle_irq_event_percpu+0x70/0x228
  [<ffffffff800c6560>] handle_irq_event_percpu+0x18/0x78
  [<ffffffff800cc320>] handle_percpu_irq+0x50/0x80
  [<ffffffff800c5970>] generic_handle_irq+0x90/0xd0
  [<ffffffff806e956c>] do_IRQ+0x1c/0x30
  [<ffffffff8004ad4c>] handle_int+0x148/0x154
  [<ffffffff800ab7c0>] do_idle+0x40/0x108
  [<ffffffff800abb0c>] cpu_startup_entry+0x2c/0x38
  [<ffffffff806dfec8>] kernel_init+0x0/0x108
  
  ---[ end trace 0000000000000000 ]---
  tc2: OK
  tc2: model 700 (DEFZA-AA), MMF PMD, address 08-00-2b-xx-xx-xx
  tc2: ROM rev. 1.0, firmware rev. 1.2, RMC rev. A, SMT ver. 1
  tc2: link unavailable
  ------------[ cut here ]------------
  WARNING: kernel/time/timer.c:1611 at __timer_delete_sync+0x104/0x120, CPU#0: swapper/0/0
  Modules linked in:
  CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Tainted: G        W           7.0.0-dirty #2 VOLUNTARY 
  Tainted: [W]=WARN
  Stack : 9800000002027d08 00000000140120e0 0000000000000000 ffffffff8089d468
          0000000000000000 0000000000000000 ffffffff807ed6b8 ffffffff80897458
          ffffffff80897400 9800000002027b88 0000000000000000 0000000000000000
          0000000000000000 9800000002027ba4 0000000000001000 0000000000000000
          0000000000000000 ffffffff807ed6b8 00000000140120e0 0000000000000009
          000000000000064b ffffffff800dd14c 0000000000000036 9800000002184000
          0000000000000000 0000000000000020 0000000000000000 ffffffff80910000
          ffffffff8085c000 9800000002027c70 0000000000000001 ffffffff80045fa0
          0000000000000000 0000000000000000 0000000000000000 0000000000000009
          000000000000064b ffffffff800502b8 ffffffff807ed6b8 ffffffff80045fa0
          ...
  Call Trace:
  [<ffffffff800502b8>] show_stack+0x28/0xf0
  [<ffffffff80045fa0>] dump_stack_lvl+0x48/0x7c
  [<ffffffff80068c98>] __warn+0xa0/0x128
  [<ffffffff8004120c>] warn_slowpath_fmt+0x64/0xa4
  [<ffffffff800dd14c>] __timer_delete_sync+0x104/0x120
  [<ffffffff804934ac>] fza_interrupt+0xc74/0xeb8
  [<ffffffff800c6390>] __handle_irq_event_percpu+0x70/0x228
  [<ffffffff800c6560>] handle_irq_event_percpu+0x18/0x78
  [<ffffffff800cc320>] handle_percpu_irq+0x50/0x80
  [<ffffffff800c5970>] generic_handle_irq+0x90/0xd0
  [<ffffffff806e956c>] do_IRQ+0x1c/0x30
  [<ffffffff8004ad4c>] handle_int+0x148/0x154
  [<ffffffff806de8a4>] arch_local_irq_disable+0x4/0x28
  [<ffffffff800ab7d0>] do_idle+0x50/0x108
  [<ffffffff800abb0c>] cpu_startup_entry+0x2c/0x38
  [<ffffffff806dfec8>] kernel_init+0x0/0x108
  
  ---[ end trace 0000000000000000 ]---
  tc2: registered as fddi0

The immediate origin of the new warning is the switch away from aliasing 
del_timer_sync() to del_timer() (timer_delete_sync() to timer_delete() 
in terms of current function names) for UP configurations, which however 
is the only choice for this driver anyway as no SMP hardware supports 
the TURBOchannel bus this device interfaces to.  Therefore there is a 
very remote issue only this is a sign of.

Specifically if an adapter reset issued upon a transition to the halted 
state times out and first triggers fza_reset_timer() for another reset 
assertion, which then schedules fza_reset_timer() for reset deassertion 
and then that second call is pre-empted after poking at the hardware, 
but before the timer has been rearmed and owing to high system load 
causing exceedingly high scheduling latency control is not handed back 
before a transition to the uninitialised state has caused the timer to 
be deleted even before it has been started, then fza_reset_timer() will 
be called yet again and issue another reset even though by then the 
adapter has already recovered.

Prevent this situation from happening by switching to timer_delete() for 
the transition to the halted state and protect the code region affected 
with a spinlock, also to make sure add_timer() has not been called twice 
in a row due to an execution race between the interrupt handler and the 
timer handler (though it could only happen on SMP, but let's keep the 
driver clean).  It's a very unlikely sequence of events to happen and 
therefore there's no point in trying to be overly clever about it, such 
as by placing printk() calls outside the protection.  For the transition 
to the uninitialised state switch to timer_delete_sync_try() instead, so 
that a timer isn't deleted that's just been rearmed by the timer handler 
and needs to watch for the device to come out of reset again (again, an 
SMP scenario only).

Retain timer_delete_sync() invocations outside the hardirq context for a 
stray timer not to fire once device structures have been released.

Fixes: 61414f5ec9834 ("FDDI: defza: Add support for DEC FDDIcontroller 700 TURBOchannel adapter")
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
---
 drivers/net/fddi/defza.c |   11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

linux-defza-timer-lock.diff
Index: linux-macro/drivers/net/fddi/defza.c
===================================================================
--- linux-macro.orig/drivers/net/fddi/defza.c
+++ linux-macro/drivers/net/fddi/defza.c
@@ -984,7 +984,7 @@ static irqreturn_t fza_interrupt(int irq
 
 		case FZA_STATE_UNINITIALIZED:
 			netif_carrier_off(dev);
-			timer_delete_sync(&fp->reset_timer);
+			timer_delete_sync_try(&fp->reset_timer);
 			fp->ring_cmd_index = 0;
 			fp->ring_uns_index = 0;
 			fp->ring_rmc_tx_index = 0;
@@ -1018,7 +1018,9 @@ static irqreturn_t fza_interrupt(int irq
 			fp->queue_active = 0;
 			netif_stop_queue(dev);
 			pr_debug("%s: queue stopped\n", fp->name);
-			timer_delete_sync(&fp->reset_timer);
+
+			spin_lock(&fp->lock);
+			timer_delete(&fp->reset_timer);
 			pr_warn("%s: halted, reason: %x\n", fp->name,
 				FZA_STATUS_GET_HALT(status));
 			fza_regs_dump(fp);
@@ -1027,6 +1029,8 @@ static irqreturn_t fza_interrupt(int irq
 			fp->timer_state = 0;
 			fp->reset_timer.expires = jiffies + 45 * HZ;
 			add_timer(&fp->reset_timer);
+			spin_unlock(&fp->lock);
+
 			break;
 
 		default:
@@ -1046,7 +1050,9 @@ static irqreturn_t fza_interrupt(int irq
 static void fza_reset_timer(struct timer_list *t)
 {
 	struct fza_private *fp = timer_container_of(fp, t, reset_timer);
+	unsigned long flags;
 
+	spin_lock_irqsave(&fp->lock, flags);
 	if (!fp->timer_state) {
 		pr_err("%s: RESET timed out!\n", fp->name);
 		pr_info("%s: trying harder...\n", fp->name);
@@ -1069,6 +1075,7 @@ static void fza_reset_timer(struct timer
 		fp->reset_timer.expires = jiffies + 45 * HZ;
 	}
 	add_timer(&fp->reset_timer);
+	spin_unlock_irqrestore(&fp->lock, flags);
 }
 
 static int fza_set_mac_address(struct net_device *dev, void *addr)

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: [PATCH net-next] FDDI: defza: Sanitise the reset safety timer
  2026-05-09 21:04 [PATCH net-next] FDDI: defza: Sanitise the reset safety timer Maciej W. Rozycki
@ 2026-05-13  1:06 ` Jakub Kicinski
  0 siblings, 0 replies; 2+ messages in thread
From: Jakub Kicinski @ 2026-05-13  1:06 UTC (permalink / raw)
  To: Maciej W. Rozycki
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Paolo Abeni, netdev,
	linux-kernel

On Sat, 9 May 2026 22:04:50 +0100 (BST) Maciej W. Rozycki wrote:
>  		case FZA_STATE_UNINITIALIZED:
>  			netif_carrier_off(dev);
> -			timer_delete_sync(&fp->reset_timer);
> +			timer_delete_sync_try(&fp->reset_timer);
>  			fp->ring_cmd_index = 0;
>  			fp->ring_uns_index = 0;
>  			fp->ring_rmc_tx_index = 0;
> @@ -1018,7 +1018,9 @@ static irqreturn_t fza_interrupt(int irq
>  			fp->queue_active = 0;
>  			netif_stop_queue(dev);
>  			pr_debug("%s: queue stopped\n", fp->name);
> -			timer_delete_sync(&fp->reset_timer);
> +
> +			spin_lock(&fp->lock);
> +			timer_delete(&fp->reset_timer);

Noob q, why use timer_delete_sync_try() above ?

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2026-05-13  1:06 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-09 21:04 [PATCH net-next] FDDI: defza: Sanitise the reset safety timer Maciej W. Rozycki
2026-05-13  1:06 ` Jakub Kicinski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox