Re: "Dead loop on virtual device" error without softirq-BKL on PREEMPT

public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed

* Re: "Dead loop on virtual device" error without softirq-BKL on PREEMPT_RT
       [not found]             ` <4fba57892e5bd6a1afc4a36a80b40e3ecc28cac5.camel@web.de>
@ 2026-02-17 11:24               ` Bert Karwatzki
  2026-02-17 16:52                 ` Bert Karwatzki
  0 siblings, 1 reply; 11+ messages in thread
From: Bert Karwatzki @ 2026-02-17 11:24 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Thomas Gleixner, linux-kernel, linux-rt-devel, spasswolf,
	Jakub Kicinski, Eric Dumazet, netdev

Am Dienstag, dem 17.02.2026 um 11:42 +0100 schrieb Bert Karwatzki:
> 
> I just wondered if we can completely skip the
> 
> 	if (READ_ONCE(txq->xmit_lock_owner) != cpu) {
> 		[...]
> 	} else 
> 	{
> 		/* "Recursion" alert */
> 	}
>  
> check, as the synchronization will we provided by HARD_TX_{LOCK,UNLOCK}.
> 

I thought about that again, and it seems like a bad idea as (in the non-preempt) case
other threads trying to access the queue would wait for the spinlock to be freed, perhaps
one can just change the code like this:

commit 05026868843a4eea51d45811d87706f36896e828
Author: Bert Karwatzki <spasswolf@web.de>
Date:   Tue Feb 17 12:08:35 2026 +0100

    net: core: dev: don't warn about recursion when on same CPU
    
    This prints a message if we're on the same CPU and the lock is
    already taken, in a production use we would of course skip this
    message.
    
    Signed-off-by: Bert Karwatzki <spasswolf@web.de>

diff --git a/net/core/dev.c b/net/core/dev.c
index 5b536860138d..cac5588640b3 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4784,15 +4784,18 @@ int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
 			net_crit_ratelimited("Virtual device %s asks to queue packet!\n",
 					     dev->name);
 		} else {
-			/* Recursion is detected! It is possible,
-			 * unfortunately
-			 */
-recursion_alert:
-			net_crit_ratelimited("Dead loop on virtual device %s, fix it urgently!\n",
-					     dev->name);
+			net_crit_ratelimited("Lock taken already on %s!\n", dev->name);
+			goto lock_taken;
 		}
 	}
 
+	/* Recursion is detected! It is possible,
+	 * unfortunately
+	 */
+recursion_alert:
+	net_crit_ratelimited("Dead loop on virtual device %s, fix it urgently!\n",
+					     dev->name);
+lock_taken:
 	rc = -ENETDOWN;
 	rcu_read_unlock_bh();
 

With this I get these messages (be skipped in production code)
instead of the "Dead loop on virtual device":

[   51.994435] [   T1525] Lock taken already on wlp4s0!
[   51.994546] [   T1525] Lock taken already on wlp4s0!
[   51.994650] [   T1525] Lock taken already on wlp4s0!
[   51.994746] [   T1525] Lock taken already on wlp4s0!
[   51.994845] [   T1525] Lock taken already on wlp4s0!
[   51.994948] [   T1525] Lock taken already on wlp4s0!
[   51.995037] [   T1525] Lock taken already on wlp4s0!
[   51.995128] [   T1525] Lock taken already on wlp4s0!
[   51.995220] [   T1525] Lock taken already on wlp4s0!


Bert Karwatzki

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: "Dead loop on virtual device" error without softirq-BKL on PREEMPT_RT
  2026-02-17 11:24               ` "Dead loop on virtual device" error without softirq-BKL on PREEMPT_RT Bert Karwatzki
@ 2026-02-17 16:52                 ` Bert Karwatzki
  2026-02-17 19:10                   ` Bert Karwatzki
  0 siblings, 1 reply; 11+ messages in thread
From: Bert Karwatzki @ 2026-02-17 16:52 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Thomas Gleixner, linux-kernel, linux-rt-devel, Jakub Kicinski,
	Eric Dumazet, netdev, spasswolf

Am Dienstag, dem 17.02.2026 um 12:24 +0100 schrieb Bert Karwatzki:
> Am Dienstag, dem 17.02.2026 um 11:42 +0100 schrieb Bert Karwatzki:
> > 
> > I just wondered if we can completely skip the
> > 
> > 	if (READ_ONCE(txq->xmit_lock_owner) != cpu) {
> > 		[...]
> > 	} else 
> > 	{
> > 		/* "Recursion" alert */
> > 	}
> >  
> > check, as the synchronization will we provided by HARD_TX_{LOCK,UNLOCK}.
> > 
> 
> I thought about that again, and it seems like a bad idea as (in the non-preempt) case
> other threads trying to access the queue would wait for the spinlock to be freed, perhaps
> one can just change the code like this:

My argument above seems wrong: In the non-preempt case we cannot have another thread accessing the txq
from the same CPU if the lock is taken (it is not preemptible) and for other threads
accessing the txq from different CPUs the check (txq->xmit_lock_owner != cpu) would succeed
and they would try the spinlock anyway, So this would not speak against killing the lock owner
check. As for the recursion detection, perhaps dev_xmit_recursion() is enough?

Another Idea (more vague ...):

Using the CPU Id as the lock_owner seems to make sense for locks that are not
preemptible (raw_spinlock in the RT case). For preemptible locks a thread ID (which one exactly I'm not sure ...)
would perhaps make a better lock owner ...

Bert Karwatzki

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: "Dead loop on virtual device" error without softirq-BKL on PREEMPT_RT
  2026-02-17 16:52                 ` Bert Karwatzki
@ 2026-02-17 19:10                   ` Bert Karwatzki
  2026-02-18  7:30                     ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 11+ messages in thread
From: Bert Karwatzki @ 2026-02-17 19:10 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Thomas Gleixner, linux-kernel, linux-rt-devel, Jakub Kicinski,
	Eric Dumazet, netdev, spasswolf

Am Dienstag, dem 17.02.2026 um 17:52 +0100 schrieb Bert Karwatzki:
> Am Dienstag, dem 17.02.2026 um 12:24 +0100 schrieb Bert Karwatzki:
> > Am Dienstag, dem 17.02.2026 um 11:42 +0100 schrieb Bert Karwatzki:
> > > 
> > > I just wondered if we can completely skip the
> > > 
> > > 	if (READ_ONCE(txq->xmit_lock_owner) != cpu) {
> > > 		[...]
> > > 	} else 
> > > 	{
> > > 		/* "Recursion" alert */
> > > 	}
> > >  
> > > check, as the synchronization will we provided by HARD_TX_{LOCK,UNLOCK}.
> > > 
> > 
> > I thought about that again, and it seems like a bad idea as (in the non-preempt) case
> > other threads trying to access the queue would wait for the spinlock to be freed, perhaps
> > one can just change the code like this:
> 
> My argument above seems wrong: In the non-preempt case we cannot have another thread accessing the txq
> from the same CPU if the lock is taken (it is not preemptible) and for other threads
> accessing the txq from different CPUs the check (txq->xmit_lock_owner != cpu) would succeed
> and they would try the spinlock anyway, So this would not speak against killing the lock owner
> check. As for the recursion detection, perhaps dev_xmit_recursion() is enough?

I tried to research the original commit which introduced the xmit_lock_owner check, but
it is present since linux 2.3.6 (released 19990610) (when __dev_queue_xmit() was still called dev_queue_xmit()),
so I can't tell the original idea behind that check (perhaps recuesion detection ...), so I'm
not completely sure if it can be omitted (and just let dev_xmit_recursion() do the recursion checking).

Bert Karwatzki

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: "Dead loop on virtual device" error without softirq-BKL on PREEMPT_RT
  2026-02-17 19:10                   ` Bert Karwatzki
@ 2026-02-18  7:30                     ` Sebastian Andrzej Siewior
  2026-02-18 12:50                       ` Bert Karwatzki
  0 siblings, 1 reply; 11+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-02-18  7:30 UTC (permalink / raw)
  To: Bert Karwatzki
  Cc: Thomas Gleixner, linux-kernel, linux-rt-devel, Jakub Kicinski,
	Eric Dumazet, netdev

On 2026-02-17 20:10:09 [+0100], Bert Karwatzki wrote:
> 
> I tried to research the original commit which introduced the xmit_lock_owner check, but
> it is present since linux 2.3.6 (released 19990610) (when __dev_queue_xmit() was still called dev_queue_xmit()),
> so I can't tell the original idea behind that check (perhaps recuesion detection ...), so I'm
> not completely sure if it can be omitted (and just let dev_xmit_recursion() do the recursion checking).

Okay. Thank you. I add it to my list.

> Bert Karwatzki

Sebastian

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: "Dead loop on virtual device" error without softirq-BKL on PREEMPT_RT
  2026-02-18  7:30                     ` Sebastian Andrzej Siewior
@ 2026-02-18 12:50                       ` Bert Karwatzki
  2026-02-26 17:29                         ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 11+ messages in thread
From: Bert Karwatzki @ 2026-02-18 12:50 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Thomas Gleixner, linux-kernel, linux-rt-devel, Jakub Kicinski,
	Eric Dumazet, netdev, spasswolf

Am Mittwoch, dem 18.02.2026 um 08:30 +0100 schrieb Sebastian Andrzej Siewior:
> On 2026-02-17 20:10:09 [+0100], Bert Karwatzki wrote:
> > 
> > I tried to research the original commit which introduced the xmit_lock_owner check, but
> > it is present since linux 2.3.6 (released 19990610) (when __dev_queue_xmit() was still called dev_queue_xmit()),
> > so I can't tell the original idea behind that check (perhaps recuesion detection ...), so I'm
> > not completely sure if it can be omitted (and just let dev_xmit_recursion() do the recursion checking).
> 
> Okay. Thank you. I add it to my list.
> 
I've thought about it again and I now think the xmit_lock_owner check IS necessary to
avoid deadlocks on recursion in the non-RT case.

My idea to use get_current()->tgid as lock owner also does not work as interrupts are still enabled
and __dev_queue_xmit() can be called from interrupt context. So in a situation where an interrupt occurs
after the lock has been taken and the interrupt handler calls __dev_queue_xmit() on the same CPU a deadlock
would occur.

Bert Karwatzki

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: "Dead loop on virtual device" error without softirq-BKL on PREEMPT_RT
  2026-02-18 12:50                       ` Bert Karwatzki
@ 2026-02-26 17:29                         ` Sebastian Andrzej Siewior
  2026-03-18 10:30                           ` Daniel Vacek
  0 siblings, 1 reply; 11+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-02-26 17:29 UTC (permalink / raw)
  To: Bert Karwatzki
  Cc: Thomas Gleixner, linux-kernel, linux-rt-devel, Jakub Kicinski,
	Eric Dumazet, netdev

On 2026-02-18 13:50:14 [+0100], Bert Karwatzki wrote:
> Am Mittwoch, dem 18.02.2026 um 08:30 +0100 schrieb Sebastian Andrzej Siewior:
> > On 2026-02-17 20:10:09 [+0100], Bert Karwatzki wrote:
> > > 
> > > I tried to research the original commit which introduced the xmit_lock_owner check, but
> > > it is present since linux 2.3.6 (released 19990610) (when __dev_queue_xmit() was still called dev_queue_xmit()),
> > > so I can't tell the original idea behind that check (perhaps recuesion detection ...), so I'm
> > > not completely sure if it can be omitted (and just let dev_xmit_recursion() do the recursion checking).
> > 
> > Okay. Thank you. I add it to my list.
> > 
> I've thought about it again and I now think the xmit_lock_owner check IS necessary to
> avoid deadlocks on recursion in the non-RT case.
> 
> My idea to use get_current()->tgid as lock owner also does not work as interrupts are still enabled
> and __dev_queue_xmit() can be called from interrupt context. So in a situation where an interrupt occurs
> after the lock has been taken and the interrupt handler calls __dev_queue_xmit() on the same CPU a deadlock
> would occur.

The warning happens because taskA on cpuX goes through
HARD_TX_LOCK(), gets preempted and then taskB on cpuX wants also to send
send a packet. The second one throws the warning.

We could ignore this check because a deadlock will throw a warning and
"halt" the task that runs into the deadlock.
But then we could be smart about this in the same way !RT is and
pro-active check for the simple deadlock. The lock owner of
netdev_queue::_xmit_lock is recorded can be checked vs current.

The snippet below should work. I need to see if tomorrow this is still a
good idea.

diff --git a/net/core/dev.c b/net/core/dev.c
index 6ff4256700e60..de342ceb17201 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4821,7 +4821,7 @@ int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
 		/* Other cpus might concurrently change txq->xmit_lock_owner
 		 * to -1 or to their cpu id, but not to our id.
 		 */
-		if (READ_ONCE(txq->xmit_lock_owner) != cpu) {
+		if (rt_mutex_owner(&txq->_xmit_lock.lock) != current) {
 			if (dev_xmit_recursion())
 				goto recursion_alert;
 

> Bert Karwatzki

Sebastian

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: "Dead loop on virtual device" error without softirq-BKL on PREEMPT_RT
  2026-02-26 17:29                         ` Sebastian Andrzej Siewior
@ 2026-03-18 10:30                           ` Daniel Vacek
  2026-03-18 11:18                             ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 11+ messages in thread
From: Daniel Vacek @ 2026-03-18 10:30 UTC (permalink / raw)
  To: bigeasy
  Cc: edumazet, kuba, linux-kernel, linux-rt-devel, netdev, spasswolf,
	tglx, Daniel Vacek

On Thu, 26 Feb 2026 18:29:27 +0100, Sebastian Andrzej Siewior wrote:
> On 2026-02-18 13:50:14 [+0100], Bert Karwatzki wrote:
> > Am Mittwoch, dem 18.02.2026 um 08:30 +0100 schrieb Sebastian Andrzej Siewior:
> > > On 2026-02-17 20:10:09 [+0100], Bert Karwatzki wrote:
> > > > 
> > > > I tried to research the original commit which introduced the xmit_lock_owner check, but
> > > > it is present since linux 2.3.6 (released 19990610) (when __dev_queue_xmit() was still called dev_queue_xmit()),
> > > > so I can't tell the original idea behind that check (perhaps recuesion detection ...), so I'm
> > > > not completely sure if it can be omitted (and just let dev_xmit_recursion() do the recursion checking).
> > > 
> > > Okay. Thank you. I add it to my list.
> > > 
> > I've thought about it again and I now think the xmit_lock_owner check IS necessary to
> > avoid deadlocks on recursion in the non-RT case.
> > 
> > My idea to use get_current()->tgid as lock owner also does not work as interrupts are still enabled
> > and __dev_queue_xmit() can be called from interrupt context. So in a situation where an interrupt occurs
> > after the lock has been taken and the interrupt handler calls __dev_queue_xmit() on the same CPU a deadlock
> > would occur.
> 
> The warning happens because taskA on cpuX goes through
> HARD_TX_LOCK(), gets preempted and then taskB on cpuX wants also to send
> send a packet. The second one throws the warning.
> 
> We could ignore this check because a deadlock will throw a warning and
> "halt" the task that runs into the deadlock.
> But then we could be smart about this in the same way !RT is and
> pro-active check for the simple deadlock. The lock owner of
> netdev_queue::_xmit_lock is recorded can be checked vs current.
> 
> The snippet below should work. I need to see if tomorrow this is still a
> good idea.
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 6ff4256700e60..de342ceb17201 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4821,7 +4821,7 @@ int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
>  		/* Other cpus might concurrently change txq->xmit_lock_owner
>  		 * to -1 or to their cpu id, but not to our id.
>  		 */
> -		if (READ_ONCE(txq->xmit_lock_owner) != cpu) {
> +		if (rt_mutex_owner(&txq->_xmit_lock.lock) != current) {

Ain't this changing the behavior for !RT case? Previously, if it was the same thread
which has already locked the queue (and hence the same CPU) evaluating this condition,
the condition was skipped, which is no longer the case with this change.

How about something like this instead?:

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index d99b0fbc1942..27d090b03493 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -705,7 +705,7 @@ struct netdev_queue {
 	struct dql		dql;
 #endif
 	spinlock_t		_xmit_lock ____cacheline_aligned_in_smp;
-	int			xmit_lock_owner;
+	struct task_struct	*xmit_lock_owner;
 	/*
 	 * Time (in jiffies) of last Tx
 	 */
@@ -4709,7 +4709,7 @@ static inline void __netif_tx_lock(struct netdev_queue *txq, int cpu)
 {
 	spin_lock(&txq->_xmit_lock);
 	/* Pairs with READ_ONCE() in __dev_queue_xmit() */
-	WRITE_ONCE(txq->xmit_lock_owner, cpu);
+	WRITE_ONCE(txq->xmit_lock_owner, current);
 }
 
 static inline bool __netif_tx_acquire(struct netdev_queue *txq)
@@ -4727,7 +4727,7 @@ static inline void __netif_tx_lock_bh(struct netdev_queue *txq)
 {
 	spin_lock_bh(&txq->_xmit_lock);
 	/* Pairs with READ_ONCE() in __dev_queue_xmit() */
-	WRITE_ONCE(txq->xmit_lock_owner, smp_processor_id());
+	WRITE_ONCE(txq->xmit_lock_owner, current);
 }
 
 static inline bool __netif_tx_trylock(struct netdev_queue *txq)
@@ -4736,7 +4736,7 @@ static inline bool __netif_tx_trylock(struct netdev_queue *txq)
 
 	if (likely(ok)) {
 		/* Pairs with READ_ONCE() in __dev_queue_xmit() */
-		WRITE_ONCE(txq->xmit_lock_owner, smp_processor_id());
+		WRITE_ONCE(txq->xmit_lock_owner, current);
 	}
 	return ok;
 }
@@ -4744,14 +4744,14 @@ static inline bool __netif_tx_trylock(struct netdev_queue *txq)
 static inline void __netif_tx_unlock(struct netdev_queue *txq)
 {
 	/* Pairs with READ_ONCE() in __dev_queue_xmit() */
-	WRITE_ONCE(txq->xmit_lock_owner, -1);
+	WRITE_ONCE(txq->xmit_lock_owner, NULL);
 	spin_unlock(&txq->_xmit_lock);
 }
 
 static inline void __netif_tx_unlock_bh(struct netdev_queue *txq)
 {
 	/* Pairs with READ_ONCE() in __dev_queue_xmit() */
-	WRITE_ONCE(txq->xmit_lock_owner, -1);
+	WRITE_ONCE(txq->xmit_lock_owner, NULL);
 	spin_unlock_bh(&txq->_xmit_lock);
 }
 
diff --git a/net/core/dev.c b/net/core/dev.c
index ccef685023c2..f62bffb8edf6 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4817,7 +4817,7 @@ int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
 		/* Other cpus might concurrently change txq->xmit_lock_owner
 		 * to -1 or to their cpu id, but not to our id.
 		 */
-		if (READ_ONCE(txq->xmit_lock_owner) != cpu) {
+		if (READ_ONCE(txq->xmit_lock_owner) != current) {
 			if (dev_xmit_recursion())
 				goto recursion_alert;
 
@@ -11178,7 +11178,7 @@ static void netdev_init_one_queue(struct net_device *dev,
 	/* Initialize queue lock */
 	spin_lock_init(&queue->_xmit_lock);
 	netdev_set_xmit_lockdep_class(&queue->_xmit_lock, dev->type);
-	queue->xmit_lock_owner = -1;
+	queue->xmit_lock_owner = NULL;
 	netdev_queue_numa_node_write(queue, NUMA_NO_NODE);
 	queue->dev = dev;
 #ifdef CONFIG_BQL

Daniel

>  			if (dev_xmit_recursion())
>  				goto recursion_alert;
>  
> 
> > Bert Karwatzki
> 
> Sebastian

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: "Dead loop on virtual device" error without softirq-BKL on PREEMPT_RT
  2026-03-18 10:30                           ` Daniel Vacek
@ 2026-03-18 11:18                             ` Sebastian Andrzej Siewior
  2026-03-18 14:43                               ` Daniel Vacek
  0 siblings, 1 reply; 11+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-03-18 11:18 UTC (permalink / raw)
  To: Daniel Vacek
  Cc: edumazet, kuba, linux-kernel, linux-rt-devel, netdev, spasswolf,
	tglx

On 2026-03-18 11:30:09 [+0100], Daniel Vacek wrote:
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -4821,7 +4821,7 @@ int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
> >  		/* Other cpus might concurrently change txq->xmit_lock_owner
> >  		 * to -1 or to their cpu id, but not to our id.
> >  		 */
> > -		if (READ_ONCE(txq->xmit_lock_owner) != cpu) {
> > +		if (rt_mutex_owner(&txq->_xmit_lock.lock) != current) {
> 
> Ain't this changing the behavior for !RT case? Previously, if it was the same thread
> which has already locked the queue (and hence the same CPU) evaluating this condition,
> the condition was skipped, which is no longer the case with this change.

The above was me thinking and does not even compile for !RT. Commit
b824c3e16c190 ("net: Provide a PREEMPT_RT specific check for
netdev_queue::_xmit_lock") is what was merged in the end.

Sebastian

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: "Dead loop on virtual device" error without softirq-BKL on PREEMPT_RT
  2026-03-18 11:18                             ` Sebastian Andrzej Siewior
@ 2026-03-18 14:43                               ` Daniel Vacek
  2026-03-18 14:51                                 ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 11+ messages in thread
From: Daniel Vacek @ 2026-03-18 14:43 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: edumazet, kuba, linux-kernel, linux-rt-devel, netdev, spasswolf,
	tglx

On Wed, 18 Mar 2026 at 12:18, Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
>
> On 2026-03-18 11:30:09 [+0100], Daniel Vacek wrote:
> > > --- a/net/core/dev.c
> > > +++ b/net/core/dev.c
> > > @@ -4821,7 +4821,7 @@ int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
> > >             /* Other cpus might concurrently change txq->xmit_lock_owner
> > >              * to -1 or to their cpu id, but not to our id.
> > >              */
> > > -           if (READ_ONCE(txq->xmit_lock_owner) != cpu) {
> > > +           if (rt_mutex_owner(&txq->_xmit_lock.lock) != current) {
> >
> > Ain't this changing the behavior for !RT case? Previously, if it was the same thread
> > which has already locked the queue (and hence the same CPU) evaluating this condition,
> > the condition was skipped, which is no longer the case with this change.
>
> The above was me thinking and does not even compile for !RT. Commit
> b824c3e16c190 ("net: Provide a PREEMPT_RT specific check for
> netdev_queue::_xmit_lock") is what was merged in the end.

Hmm, that means txq->xmit_lock_owner is not used at all for PREEMT_RT.
It's pointless to even store it. Shall we care?

--nX

> Sebastian

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: "Dead loop on virtual device" error without softirq-BKL on PREEMPT_RT
  2026-03-18 14:43                               ` Daniel Vacek
@ 2026-03-18 14:51                                 ` Sebastian Andrzej Siewior
  2026-03-18 14:58                                   ` Daniel Vacek
  0 siblings, 1 reply; 11+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-03-18 14:51 UTC (permalink / raw)
  To: Daniel Vacek
  Cc: edumazet, kuba, linux-kernel, linux-rt-devel, netdev, spasswolf,
	tglx

On 2026-03-18 15:43:52 [+0100], Daniel Vacek wrote:
> On Wed, 18 Mar 2026 at 12:18, Sebastian Andrzej Siewior
> <bigeasy@linutronix.de> wrote:
> >
> > On 2026-03-18 11:30:09 [+0100], Daniel Vacek wrote:
> > > > --- a/net/core/dev.c
> > > > +++ b/net/core/dev.c
> > > > @@ -4821,7 +4821,7 @@ int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
> > > >             /* Other cpus might concurrently change txq->xmit_lock_owner
> > > >              * to -1 or to their cpu id, but not to our id.
> > > >              */
> > > > -           if (READ_ONCE(txq->xmit_lock_owner) != cpu) {
> > > > +           if (rt_mutex_owner(&txq->_xmit_lock.lock) != current) {
> > >
> > > Ain't this changing the behavior for !RT case? Previously, if it was the same thread
> > > which has already locked the queue (and hence the same CPU) evaluating this condition,
> > > the condition was skipped, which is no longer the case with this change.
> >
> > The above was me thinking and does not even compile for !RT. Commit
> > b824c3e16c190 ("net: Provide a PREEMPT_RT specific check for
> > netdev_queue::_xmit_lock") is what was merged in the end.
> 
> Hmm, that means txq->xmit_lock_owner is not used at all for PREEMT_RT.
> It's pointless to even store it. Shall we care?

For PREEMPT_RT the xmit_lock_owner member is only stored and not used
otherwise. It could be removed as in ifdef-ed away but I do not care
enough to sprinkle it and the gain is little (proof me wrong). So I am
happy as-is.

As for the check itself as we have now, we detect the deadlock before it
happens and this is nice to have. The alternative (in case of a
deadlock) would be the deadlock detection in rtmutex code which would
freeze the thread.

> --nX
> 
Sebastian

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: "Dead loop on virtual device" error without softirq-BKL on PREEMPT_RT
  2026-03-18 14:51                                 ` Sebastian Andrzej Siewior
@ 2026-03-18 14:58                                   ` Daniel Vacek
  0 siblings, 0 replies; 11+ messages in thread
From: Daniel Vacek @ 2026-03-18 14:58 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: edumazet, kuba, linux-kernel, linux-rt-devel, netdev, spasswolf,
	tglx

On Wed, 18 Mar 2026 at 15:51, Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
> On 2026-03-18 15:43:52 [+0100], Daniel Vacek wrote:
> > On Wed, 18 Mar 2026 at 12:18, Sebastian Andrzej Siewior
> > <bigeasy@linutronix.de> wrote:
> > >
> > > On 2026-03-18 11:30:09 [+0100], Daniel Vacek wrote:
> > > > > --- a/net/core/dev.c
> > > > > +++ b/net/core/dev.c
> > > > > @@ -4821,7 +4821,7 @@ int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
> > > > >             /* Other cpus might concurrently change txq->xmit_lock_owner
> > > > >              * to -1 or to their cpu id, but not to our id.
> > > > >              */
> > > > > -           if (READ_ONCE(txq->xmit_lock_owner) != cpu) {
> > > > > +           if (rt_mutex_owner(&txq->_xmit_lock.lock) != current) {
> > > >
> > > > Ain't this changing the behavior for !RT case? Previously, if it was the same thread
> > > > which has already locked the queue (and hence the same CPU) evaluating this condition,
> > > > the condition was skipped, which is no longer the case with this change.
> > >
> > > The above was me thinking and does not even compile for !RT. Commit
> > > b824c3e16c190 ("net: Provide a PREEMPT_RT specific check for
> > > netdev_queue::_xmit_lock") is what was merged in the end.
> >
> > Hmm, that means txq->xmit_lock_owner is not used at all for PREEMT_RT.
> > It's pointless to even store it. Shall we care?
>
> For PREEMPT_RT the xmit_lock_owner member is only stored and not used
> otherwise. It could be removed as in ifdef-ed away but I do not care
> enough to sprinkle it and the gain is little (proof me wrong). So I am
> happy as-is.

Right. Sorry for bothering then. I missed the merged commit.

--nX

> As for the check itself as we have now, we detect the deadlock before it
> happens and this is nice to have. The alternative (in case of a
> deadlock) would be the deadlock detection in rtmutex code which would
> freeze the thread.
>
> > --nX
> >
> Sebastian

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2026-03-18 14:58 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20260216134333.412332-1-spasswolf@web.de>
     [not found] ` <6274de932f4a62c51b424b65fc875ef3cb5ffd60.camel@web.de>
     [not found]   ` <20260216153745.CA3__zRc@linutronix.de>
     [not found]     ` <37d6e27f96afb57c5716798530cb3560d25202e5.camel@web.de>
     [not found]       ` <20260217071952.WCXLGs5-@linutronix.de>
     [not found]         ` <80114792206dc00d0099f00999a209e717debb12.camel@web.de>
     [not found]           ` <20260217095700.SjYjM8RO@linutronix.de>
     [not found]             ` <4fba57892e5bd6a1afc4a36a80b40e3ecc28cac5.camel@web.de>
2026-02-17 11:24               ` "Dead loop on virtual device" error without softirq-BKL on PREEMPT_RT Bert Karwatzki
2026-02-17 16:52                 ` Bert Karwatzki
2026-02-17 19:10                   ` Bert Karwatzki
2026-02-18  7:30                     ` Sebastian Andrzej Siewior
2026-02-18 12:50                       ` Bert Karwatzki
2026-02-26 17:29                         ` Sebastian Andrzej Siewior
2026-03-18 10:30                           ` Daniel Vacek
2026-03-18 11:18                             ` Sebastian Andrzej Siewior
2026-03-18 14:43                               ` Daniel Vacek
2026-03-18 14:51                                 ` Sebastian Andrzej Siewior
2026-03-18 14:58                                   ` Daniel Vacek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox