public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH net v2] net: Drop the lock in skb_may_tx_timestamp()
@ 2026-02-20 18:38 Sebastian Andrzej Siewior
  2026-02-20 21:20 ` Willem de Bruijn
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-02-20 18:38 UTC (permalink / raw)
  To: netdev
  Cc: Willem de Bruijn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Kurt Kanzenbach, Paolo Abeni, Simon Horman, Willem de Bruijn

skb_may_tx_timestamp() may acquire sock::sk_callback_lock. The lock must
not be taken in IRQ context, only softirq is okay. A few drivers receive
the timestamp via a dedicated interrupt and complete the TX timestamp
from that handler. This will lead to a deadlock if the lock is already
write-locked on the same CPU.

Taking the lock can be avoided. The socket (pointed by the skb) will
remain valid until the skb is released. The ->sk_socket and ->file
member will be set to NULL once the user closes the socket which may
happen before the timestamp arrives.
If we happen to observe the pointer while the socket is closing but
before the pointer is set to NULL then we may use it because both
pointer (and the file's cred member) are RCU freed.

Drop the lock. Use READ_ONCE() to obtain the individual pointer. Add a
matching WRITE_ONCE() where the pointer are cleared.

Link: https://lore.kernel.org/all/20260205145104.iWinkXHv@linutronix.de
Fixes: b245be1f4db1a ("net-timestamp: no-payload only sysctl")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---

v1…2: https://lore.kernel.org/all/20260214232456.A37oV4KQ@linutronix.de/
  - Added matching WRITE_ONCE() as per Eric.
  - Added a RCU read section. There are users from preemptible
    context.
  - Added a Link: to the original thread as per Willem de Bruijn

 include/net/sock.h |  2 +-
 net/core/skbuff.c  | 23 ++++++++++++++++++-----
 net/socket.c       |  2 +-
 3 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index aafe8bdb2c0f9..ff65c3a67efa2 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2089,7 +2089,7 @@ static inline int sk_rx_queue_get(const struct sock *sk)
 
 static inline void sk_set_socket(struct sock *sk, struct socket *sock)
 {
-	sk->sk_socket = sock;
+	WRITE_ONCE(sk->sk_socket, sock);
 	if (sock) {
 		WRITE_ONCE(sk->sk_uid, SOCK_INODE(sock)->i_uid);
 		WRITE_ONCE(sk->sk_ino, SOCK_INODE(sock)->i_ino);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 61746c2b95f63..a01fb7c053bfa 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -5555,15 +5555,28 @@ static void __skb_complete_tx_timestamp(struct sk_buff *skb,
 
 static bool skb_may_tx_timestamp(struct sock *sk, bool tsonly)
 {
-	bool ret;
+	struct socket *sock;
+	struct file *file;
+	bool ret = false;
 
 	if (likely(tsonly || READ_ONCE(sock_net(sk)->core.sysctl_tstamp_allow_data)))
 		return true;
 
-	read_lock_bh(&sk->sk_callback_lock);
-	ret = sk->sk_socket && sk->sk_socket->file &&
-	      file_ns_capable(sk->sk_socket->file, &init_user_ns, CAP_NET_RAW);
-	read_unlock_bh(&sk->sk_callback_lock);
+	/* The sk pointer remains valid as long as the skb is. The sk_socket and
+	 * file pointer may become NULL if the socket is closed. Both structures
+	 * (including file->cred) are RCU freed which means they can be accessed
+	 * within a RCU read section.
+	 */
+	rcu_read_lock();
+	sock = READ_ONCE(sk->sk_socket);
+	if (!sock)
+		goto out;
+	file = READ_ONCE(sock->file);
+	if (!file)
+		goto out;
+	ret = file_ns_capable(file, &init_user_ns, CAP_NET_RAW);
+out:
+	rcu_read_unlock();
 	return ret;
 }
 
diff --git a/net/socket.c b/net/socket.c
index 136b98c54fb37..05952188127f5 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -674,7 +674,7 @@ static void __sock_release(struct socket *sock, struct inode *inode)
 		iput(SOCK_INODE(sock));
 		return;
 	}
-	sock->file = NULL;
+	WRITE_ONCE(sock->file, NULL);
 }
 
 /**
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH net v2] net: Drop the lock in skb_may_tx_timestamp()
  2026-02-20 18:38 [PATCH net v2] net: Drop the lock in skb_may_tx_timestamp() Sebastian Andrzej Siewior
@ 2026-02-20 21:20 ` Willem de Bruijn
  2026-02-20 21:29 ` Jakub Kicinski
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: Willem de Bruijn @ 2026-02-20 21:20 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, netdev
  Cc: Willem de Bruijn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Kurt Kanzenbach, Paolo Abeni, Simon Horman, Willem de Bruijn

Sebastian Andrzej Siewior wrote:
> skb_may_tx_timestamp() may acquire sock::sk_callback_lock. The lock must
> not be taken in IRQ context, only softirq is okay. A few drivers receive
> the timestamp via a dedicated interrupt and complete the TX timestamp
> from that handler. This will lead to a deadlock if the lock is already
> write-locked on the same CPU.
> 
> Taking the lock can be avoided. The socket (pointed by the skb) will
> remain valid until the skb is released. The ->sk_socket and ->file
> member will be set to NULL once the user closes the socket which may
> happen before the timestamp arrives.
> If we happen to observe the pointer while the socket is closing but
> before the pointer is set to NULL then we may use it because both
> pointer (and the file's cred member) are RCU freed.
> 
> Drop the lock. Use READ_ONCE() to obtain the individual pointer. Add a
> matching WRITE_ONCE() where the pointer are cleared.
> 
> Link: https://lore.kernel.org/all/20260205145104.iWinkXHv@linutronix.de
> Fixes: b245be1f4db1a ("net-timestamp: no-payload only sysctl")
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Reviewed-by: Willem de Bruijn <willemb@google.com>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net v2] net: Drop the lock in skb_may_tx_timestamp()
  2026-02-20 18:38 [PATCH net v2] net: Drop the lock in skb_may_tx_timestamp() Sebastian Andrzej Siewior
  2026-02-20 21:20 ` Willem de Bruijn
@ 2026-02-20 21:29 ` Jakub Kicinski
  2026-02-20 21:49   ` Willem de Bruijn
  2026-02-21  0:45 ` Jason Xing
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Jakub Kicinski @ 2026-02-20 21:29 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: netdev, Willem de Bruijn, David S. Miller, Eric Dumazet,
	Kurt Kanzenbach, Paolo Abeni, Simon Horman, Willem de Bruijn

On Fri, 20 Feb 2026 19:38:58 +0100 Sebastian Andrzej Siewior wrote:
> skb_may_tx_timestamp() may acquire sock::sk_callback_lock. The lock must
> not be taken in IRQ context, only softirq is okay. A few drivers receive
> the timestamp via a dedicated interrupt and complete the TX timestamp
> from that handler. This will lead to a deadlock if the lock is already
> write-locked on the same CPU.

Fine by me, the fix is fairly simple. But FWIW I can't stop myself from
restating that networking core is not supposed to be called by drivers
in hard IRQ context in general. Especially when it comes to anything
that may involve user sockets. So my intuition is that fixing the driver
to use a tasklet^w work_bh would make the expectations clearer.

Willem, you don't find that argument convincing?

BTW IIUC the driver in question was igc, and it did not exist for
another 3 years after the commit under Fixes, which is another bad
smell here.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net v2] net: Drop the lock in skb_may_tx_timestamp()
  2026-02-20 21:29 ` Jakub Kicinski
@ 2026-02-20 21:49   ` Willem de Bruijn
  2026-02-20 22:02     ` Jakub Kicinski
  0 siblings, 1 reply; 14+ messages in thread
From: Willem de Bruijn @ 2026-02-20 21:49 UTC (permalink / raw)
  To: Jakub Kicinski, Sebastian Andrzej Siewior
  Cc: netdev, Willem de Bruijn, David S. Miller, Eric Dumazet,
	Kurt Kanzenbach, Paolo Abeni, Simon Horman, Willem de Bruijn

Jakub Kicinski wrote:
> On Fri, 20 Feb 2026 19:38:58 +0100 Sebastian Andrzej Siewior wrote:
> > skb_may_tx_timestamp() may acquire sock::sk_callback_lock. The lock must
> > not be taken in IRQ context, only softirq is okay. A few drivers receive
> > the timestamp via a dedicated interrupt and complete the TX timestamp
> > from that handler. This will lead to a deadlock if the lock is already
> > write-locked on the same CPU.
> 
> Fine by me, the fix is fairly simple. But FWIW I can't stop myself from
> restating that networking core is not supposed to be called by drivers
> in hard IRQ context in general. Especially when it comes to anything
> that may involve user sockets. So my intuition is that fixing the driver
> to use a tasklet^w work_bh would make the expectations clearer.
> 
> Willem, you don't find that argument convincing?

This narrow case seems fine to me too.

But I can understand if that would be a global rule. As it simplifies
reasoning about correctness of core code quite a bit. In which case
that would trump this. No preference from me. Clearly other drivers
are quite capable of making this work without requiring updates from
hard IRQ.
 
> BTW IIUC the driver in question was igc, and it did not exist for
> another 3 years after the commit under Fixes, which is another bad
> smell here.

I was wondering about that too. But Sebastian shared quite a few more
drivers: https://lore.kernel.org/netdev/20260217132838.kgRAQ87W@linutronix.de/
I did not bother to check whether all were newer than the introduction
of the blamed commit.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net v2] net: Drop the lock in skb_may_tx_timestamp()
  2026-02-20 21:49   ` Willem de Bruijn
@ 2026-02-20 22:02     ` Jakub Kicinski
  2026-02-23 17:07       ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 14+ messages in thread
From: Jakub Kicinski @ 2026-02-20 22:02 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Sebastian Andrzej Siewior, netdev, David S. Miller, Eric Dumazet,
	Kurt Kanzenbach, Paolo Abeni, Simon Horman, Willem de Bruijn

On Fri, 20 Feb 2026 16:49:24 -0500 Willem de Bruijn wrote:
> Jakub Kicinski wrote:
> > On Fri, 20 Feb 2026 19:38:58 +0100 Sebastian Andrzej Siewior wrote:  
> > > skb_may_tx_timestamp() may acquire sock::sk_callback_lock. The lock must
> > > not be taken in IRQ context, only softirq is okay. A few drivers receive
> > > the timestamp via a dedicated interrupt and complete the TX timestamp
> > > from that handler. This will lead to a deadlock if the lock is already
> > > write-locked on the same CPU.  
> > 
> > Fine by me, the fix is fairly simple. But FWIW I can't stop myself from
> > restating that networking core is not supposed to be called by drivers
> > in hard IRQ context in general. Especially when it comes to anything
> > that may involve user sockets. So my intuition is that fixing the driver
> > to use a tasklet^w work_bh would make the expectations clearer.
> > 
> > Willem, you don't find that argument convincing?  
> 
> This narrow case seems fine to me too.
> 
> But I can understand if that would be a global rule. As it simplifies
> reasoning about correctness of core code quite a bit. In which case
> that would trump this. No preference from me. Clearly other drivers
> are quite capable of making this work without requiring updates from
> hard IRQ.

Well put. I guess we're in the same boat then. No strong preference
here either, but I like the clearly drawn expectations to simplify
reasoning.

> > BTW IIUC the driver in question was igc, and it did not exist for
> > another 3 years after the commit under Fixes, which is another bad
> > smell here.  
> 
> I was wondering about that too. But Sebastian shared quite a few more
> drivers: https://lore.kernel.org/netdev/20260217132838.kgRAQ87W@linutronix.de/
> I did not bother to check whether all were newer than the introduction
> of the blamed commit.

IIRC CAN did not have timestamping until relatively recently so none 
of those count. hclge is even newer than igc.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net v2] net: Drop the lock in skb_may_tx_timestamp()
  2026-02-20 18:38 [PATCH net v2] net: Drop the lock in skb_may_tx_timestamp() Sebastian Andrzej Siewior
  2026-02-20 21:20 ` Willem de Bruijn
  2026-02-20 21:29 ` Jakub Kicinski
@ 2026-02-21  0:45 ` Jason Xing
  2026-02-23  8:42 ` Eric Dumazet
  2026-02-24 10:50 ` patchwork-bot+netdevbpf
  4 siblings, 0 replies; 14+ messages in thread
From: Jason Xing @ 2026-02-21  0:45 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: netdev, Willem de Bruijn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Kurt Kanzenbach, Paolo Abeni, Simon Horman,
	Willem de Bruijn

On Sat, Feb 21, 2026 at 2:40 AM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
>
> skb_may_tx_timestamp() may acquire sock::sk_callback_lock. The lock must
> not be taken in IRQ context, only softirq is okay. A few drivers receive
> the timestamp via a dedicated interrupt and complete the TX timestamp
> from that handler. This will lead to a deadlock if the lock is already
> write-locked on the same CPU.
>
> Taking the lock can be avoided. The socket (pointed by the skb) will
> remain valid until the skb is released. The ->sk_socket and ->file
> member will be set to NULL once the user closes the socket which may
> happen before the timestamp arrives.
> If we happen to observe the pointer while the socket is closing but
> before the pointer is set to NULL then we may use it because both
> pointer (and the file's cred member) are RCU freed.
>
> Drop the lock. Use READ_ONCE() to obtain the individual pointer. Add a
> matching WRITE_ONCE() where the pointer are cleared.
>
> Link: https://lore.kernel.org/all/20260205145104.iWinkXHv@linutronix.de
> Fixes: b245be1f4db1a ("net-timestamp: no-payload only sysctl")
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Thanks for the fix!

Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net v2] net: Drop the lock in skb_may_tx_timestamp()
  2026-02-20 18:38 [PATCH net v2] net: Drop the lock in skb_may_tx_timestamp() Sebastian Andrzej Siewior
                   ` (2 preceding siblings ...)
  2026-02-21  0:45 ` Jason Xing
@ 2026-02-23  8:42 ` Eric Dumazet
  2026-02-24 10:50 ` patchwork-bot+netdevbpf
  4 siblings, 0 replies; 14+ messages in thread
From: Eric Dumazet @ 2026-02-23  8:42 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: netdev, Willem de Bruijn, David S. Miller, Jakub Kicinski,
	Kurt Kanzenbach, Paolo Abeni, Simon Horman, Willem de Bruijn

On Fri, Feb 20, 2026 at 7:39 PM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
>
> skb_may_tx_timestamp() may acquire sock::sk_callback_lock. The lock must
> not be taken in IRQ context, only softirq is okay. A few drivers receive
> the timestamp via a dedicated interrupt and complete the TX timestamp
> from that handler. This will lead to a deadlock if the lock is already
> write-locked on the same CPU.
>
> Taking the lock can be avoided. The socket (pointed by the skb) will
> remain valid until the skb is released. The ->sk_socket and ->file
> member will be set to NULL once the user closes the socket which may
> happen before the timestamp arrives.
> If we happen to observe the pointer while the socket is closing but
> before the pointer is set to NULL then we may use it because both
> pointer (and the file's cred member) are RCU freed.
>
> Drop the lock. Use READ_ONCE() to obtain the individual pointer. Add a
> matching WRITE_ONCE() where the pointer are cleared.
>
> Link: https://lore.kernel.org/all/20260205145104.iWinkXHv@linutronix.de
> Fixes: b245be1f4db1a ("net-timestamp: no-payload only sysctl")
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---

Reviewed-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net v2] net: Drop the lock in skb_may_tx_timestamp()
  2026-02-20 22:02     ` Jakub Kicinski
@ 2026-02-23 17:07       ` Sebastian Andrzej Siewior
  2026-02-23 23:27         ` Jakub Kicinski
  0 siblings, 1 reply; 14+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-02-23 17:07 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Willem de Bruijn, netdev, David S. Miller, Eric Dumazet,
	Kurt Kanzenbach, Paolo Abeni, Simon Horman, Willem de Bruijn

On 2026-02-20 14:02:41 [-0800], Jakub Kicinski wrote:
> > > Willem, you don't find that argument convincing?  
> > 
> > This narrow case seems fine to me too.
> > 
> > But I can understand if that would be a global rule. As it simplifies
> > reasoning about correctness of core code quite a bit. In which case
> > that would trump this. No preference from me. Clearly other drivers
> > are quite capable of making this work without requiring updates from
> > hard IRQ.
> 
> Well put. I guess we're in the same boat then. No strong preference
> here either, but I like the clearly drawn expectations to simplify
> reasoning.

We kind of have NAPI and the rule that we inject packets into the stack
only from within sofirq. As far as I remember we got NAPI within the 2.5
cycle and then this got as an written rule (or written). Simply because
injecting packets from an IRQ could soft lock the system if you receive
a lot.
Using softirq has the benefit of moving the load to ksoftirqd which can
then be preempted by the scheduler.
This was easy to enforce as an argument of its own and the fact that
almost all required locking is BH-only.

Since around v2.3.4 the locking was moved into struct sk_buff_head,
before that there was global skb_queue_lock. Till this day the lock is
acquired with IRQ-off. This kind of makes it possible to queue the skb
from within the IRQ. This, and the fact that the "cloned" skb has no
extra baggage (such as an destructor) and the kfree_skb() works here,
too.

This can be a lot what you have to know especially if you want to extend
things later on and you don't want to think if this can be done here or
if we violate an exception to a rule.

Based on that I would say it is absolutely sane to enforce delivery of
skb here from !hardirq context. After looking into it last week, the BH
worker shouldn't be worst choice here.

How do we move forward? Merge this, backport stable and then convert
in-tree user away from IRQ delivery _or_ fix the driver one by one and
backport them stable? Or…

Sebastian

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net v2] net: Drop the lock in skb_may_tx_timestamp()
  2026-02-23 17:07       ` Sebastian Andrzej Siewior
@ 2026-02-23 23:27         ` Jakub Kicinski
  2026-02-24  9:02           ` Eric Dumazet
  0 siblings, 1 reply; 14+ messages in thread
From: Jakub Kicinski @ 2026-02-23 23:27 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, Paolo Abeni
  Cc: Willem de Bruijn, netdev, David S. Miller, Eric Dumazet,
	Kurt Kanzenbach, Simon Horman, Willem de Bruijn

On Mon, 23 Feb 2026 18:07:50 +0100 Sebastian Andrzej Siewior wrote:
> How do we move forward? Merge this, backport stable and then convert
> in-tree user away from IRQ delivery _or_ fix the driver one by one and
> backport them stable? Or…

If we plan to convert in-tree users away from IRQ delivery I'd do that
directly, without applying this patch. But I'll let Paolo have the final
word. Eric's review tag seems to indicate to me that he disagrees.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net v2] net: Drop the lock in skb_may_tx_timestamp()
  2026-02-23 23:27         ` Jakub Kicinski
@ 2026-02-24  9:02           ` Eric Dumazet
  2026-02-24 10:26             ` Paolo Abeni
  0 siblings, 1 reply; 14+ messages in thread
From: Eric Dumazet @ 2026-02-24  9:02 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Sebastian Andrzej Siewior, Paolo Abeni, Willem de Bruijn, netdev,
	David S. Miller, Kurt Kanzenbach, Simon Horman, Willem de Bruijn

On Tue, Feb 24, 2026 at 12:27 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Mon, 23 Feb 2026 18:07:50 +0100 Sebastian Andrzej Siewior wrote:
> > How do we move forward? Merge this, backport stable and then convert
> > in-tree user away from IRQ delivery _or_ fix the driver one by one and
> > backport them stable? Or…
>
> If we plan to convert in-tree users away from IRQ delivery I'd do that
> directly, without applying this patch. But I'll let Paolo have the final
> word. Eric's review tag seems to indicate to me that he disagrees.

This patch is orthogonal and will reduce cpu costs for NAPI drivers ?

We probably can remove sk_callback_lock cache line dirtyng in other places.

For instance:

diff --git a/net/netfilter/xt_owner.c b/net/netfilter/xt_owner.c
index 50332888c8d233aab0915a31f2f616f3171da45e..4ce9fae5eaf92bee7235c4619880bd9690005121
100644
--- a/net/netfilter/xt_owner.c
+++ b/net/netfilter/xt_owner.c
@@ -63,9 +63,10 @@ static bool
 owner_mt(const struct sk_buff *skb, struct xt_action_param *par)
 {
        const struct xt_owner_match_info *info = par->matchinfo;
-       const struct file *filp;
        struct sock *sk = skb_to_full_sk(skb);
        struct net *net = xt_net(par);
+       const struct socket *sock;
+       const struct file *filp;

        if (!sk || !sk->sk_socket || !net_eq(net, sock_net(sk)))
                return (info->match ^ info->invert) == 0;
@@ -76,10 +77,11 @@ owner_mt(const struct sk_buff *skb, struct
xt_action_param *par)
                 */
                return false;

-       read_lock_bh(&sk->sk_callback_lock);
-       filp = sk->sk_socket ? sk->sk_socket->file : NULL;
+       rcu_read_lock();
+       sock = READ_ONCE(sk->sk_socket);
+       filp = sock ? READ_ONCE(sock->file) : NULL;
        if (filp == NULL) {
-               read_unlock_bh(&sk->sk_callback_lock);
+               rcu_read_unlock();
                return ((info->match ^ info->invert) &
                       (XT_OWNER_UID | XT_OWNER_GID)) == 0;
        }
@@ -90,7 +92,7 @@ owner_mt(const struct sk_buff *skb, struct
xt_action_param *par)
                if ((uid_gte(filp->f_cred->fsuid, uid_min) &&
                     uid_lte(filp->f_cred->fsuid, uid_max)) ^
                    !(info->invert & XT_OWNER_UID)) {
-                       read_unlock_bh(&sk->sk_callback_lock);
+                       rcu_read_unlock();
                        return false;
                }
        }
@@ -118,12 +120,12 @@ owner_mt(const struct sk_buff *skb, struct
xt_action_param *par)
                }

                if (match ^ !(info->invert & XT_OWNER_GID)) {
-                       read_unlock_bh(&sk->sk_callback_lock);
+                       rcu_read_unlock();
                        return false;
                }
        }

-       read_unlock_bh(&sk->sk_callback_lock);
+       rcu_read_unlock();
        return true;
 }

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net v2] net: Drop the lock in skb_may_tx_timestamp()
  2026-02-24  9:02           ` Eric Dumazet
@ 2026-02-24 10:26             ` Paolo Abeni
  0 siblings, 0 replies; 14+ messages in thread
From: Paolo Abeni @ 2026-02-24 10:26 UTC (permalink / raw)
  To: Eric Dumazet, Jakub Kicinski
  Cc: Sebastian Andrzej Siewior, Willem de Bruijn, netdev,
	David S. Miller, Kurt Kanzenbach, Simon Horman, Willem de Bruijn

On 2/24/26 10:02 AM, Eric Dumazet wrote:
> On Tue, Feb 24, 2026 at 12:27 AM Jakub Kicinski <kuba@kernel.org> wrote:
>> On Mon, 23 Feb 2026 18:07:50 +0100 Sebastian Andrzej Siewior wrote:
>>> How do we move forward? Merge this, backport stable and then convert
>>> in-tree user away from IRQ delivery _or_ fix the driver one by one and
>>> backport them stable? Or…
>>
>> If we plan to convert in-tree users away from IRQ delivery I'd do that
>> directly, without applying this patch. But I'll let Paolo have the final
>> word. Eric's review tag seems to indicate to me that he disagrees.
> 
> This patch is orthogonal and will reduce cpu costs for NAPI drivers ?

I find this point rater convincing: we will have this code regardless of
the fix tag.

/P



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net v2] net: Drop the lock in skb_may_tx_timestamp()
  2026-02-20 18:38 [PATCH net v2] net: Drop the lock in skb_may_tx_timestamp() Sebastian Andrzej Siewior
                   ` (3 preceding siblings ...)
  2026-02-23  8:42 ` Eric Dumazet
@ 2026-02-24 10:50 ` patchwork-bot+netdevbpf
  2026-02-26  8:11   ` Kurt Kanzenbach
  4 siblings, 1 reply; 14+ messages in thread
From: patchwork-bot+netdevbpf @ 2026-02-24 10:50 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: netdev, willemdebruijn.kernel, davem, edumazet, kuba, kurt,
	pabeni, horms, willemb

Hello:

This patch was applied to netdev/net.git (main)
by Paolo Abeni <pabeni@redhat.com>:

On Fri, 20 Feb 2026 19:38:58 +0100 you wrote:
> skb_may_tx_timestamp() may acquire sock::sk_callback_lock. The lock must
> not be taken in IRQ context, only softirq is okay. A few drivers receive
> the timestamp via a dedicated interrupt and complete the TX timestamp
> from that handler. This will lead to a deadlock if the lock is already
> write-locked on the same CPU.
> 
> Taking the lock can be avoided. The socket (pointed by the skb) will
> remain valid until the skb is released. The ->sk_socket and ->file
> member will be set to NULL once the user closes the socket which may
> happen before the timestamp arrives.
> If we happen to observe the pointer while the socket is closing but
> before the pointer is set to NULL then we may use it because both
> pointer (and the file's cred member) are RCU freed.
> 
> [...]

Here is the summary with links:
  - [net,v2] net: Drop the lock in skb_may_tx_timestamp()
    https://git.kernel.org/netdev/net/c/983512f3a87f

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net v2] net: Drop the lock in skb_may_tx_timestamp()
  2026-02-24 10:50 ` patchwork-bot+netdevbpf
@ 2026-02-26  8:11   ` Kurt Kanzenbach
  2026-02-26  8:50     ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 14+ messages in thread
From: Kurt Kanzenbach @ 2026-02-26  8:11 UTC (permalink / raw)
  To: patchwork-bot+netdevbpf, Sebastian Andrzej Siewior
  Cc: netdev, willemdebruijn.kernel, davem, edumazet, kuba, pabeni,
	horms, willemb

[-- Attachment #1: Type: text/plain, Size: 1613 bytes --]

On Tue Feb 24 2026, patchwork-bot+netdevbpf@kernel.org wrote:
> Hello:
>
> This patch was applied to netdev/net.git (main)
> by Paolo Abeni <pabeni@redhat.com>:
>
> On Fri, 20 Feb 2026 19:38:58 +0100 you wrote:
>> skb_may_tx_timestamp() may acquire sock::sk_callback_lock. The lock must
>> not be taken in IRQ context, only softirq is okay. A few drivers receive
>> the timestamp via a dedicated interrupt and complete the TX timestamp
>> from that handler. This will lead to a deadlock if the lock is already
>> write-locked on the same CPU.
>> 
>> Taking the lock can be avoided. The socket (pointed by the skb) will
>> remain valid until the skb is released. The ->sk_socket and ->file
>> member will be set to NULL once the user closes the socket which may
>> happen before the timestamp arrives.
>> If we happen to observe the pointer while the socket is closing but
>> before the pointer is set to NULL then we may use it because both
>> pointer (and the file's cred member) are RCU freed.
>> 
>> [...]
>
> Here is the summary with links:
>   - [net,v2] net: Drop the lock in skb_may_tx_timestamp()
>     https://git.kernel.org/netdev/net/c/983512f3a87f
>
> You are awesome, thank you!

Hi maintainers,

thanks for applying the patch.

What's your preferred solution for igb? Can we do the time stamping from
IRQ context now [1] or should we switch to BH workqueue [2]?

Thanks,
Kurt

[1] - https://lore.kernel.org/intel-wired-lan/20260205-igb_irq_ts-v3-1-2efc7bc4b885@linutronix.de/
[2] - https://lore.kernel.org/intel-wired-lan/20260211134436.1e623034@kernel.org/

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 861 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net v2] net: Drop the lock in skb_may_tx_timestamp()
  2026-02-26  8:11   ` Kurt Kanzenbach
@ 2026-02-26  8:50     ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 14+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-02-26  8:50 UTC (permalink / raw)
  To: Kurt Kanzenbach
  Cc: netdev, willemdebruijn.kernel, davem, edumazet, kuba, pabeni,
	horms, willemb

On 2026-02-26 09:11:16 [+0100], Kurt Kanzenbach wrote:
> Hi maintainers,
> 
> thanks for applying the patch.
> 
> What's your preferred solution for igb? Can we do the time stamping from
> IRQ context now [1] or should we switch to BH workqueue [2]?

We have this quote from Jakub [a]:

| But FWIW I can't stop myself from restating that networking core is not
| supposed to be called by drivers in hard IRQ context in general.
| Especially when it comes to anything that may involve user sockets.

which I kind of agree in terms of "easy" maintenance. The kfree_skb()
in the error case does not hurt as of today so does the socket
interaction.

Based on Paolo's [b], we can have this fix and insist on !hardirq
processing?

[a] https://lore.kernel.org/all/20260220132932.1ed45656@kernel.org/
[b] https://lore.kernel.org/all/55b24261-45cb-41f1-857f-f29527a1ee8d@redhat.com

> Thanks,
> Kurt
> 
> [1] - https://lore.kernel.org/intel-wired-lan/20260205-igb_irq_ts-v3-1-2efc7bc4b885@linutronix.de/
> [2] - https://lore.kernel.org/intel-wired-lan/20260211134436.1e623034@kernel.org/

Sebastian

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2026-02-26  8:50 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-02-20 18:38 [PATCH net v2] net: Drop the lock in skb_may_tx_timestamp() Sebastian Andrzej Siewior
2026-02-20 21:20 ` Willem de Bruijn
2026-02-20 21:29 ` Jakub Kicinski
2026-02-20 21:49   ` Willem de Bruijn
2026-02-20 22:02     ` Jakub Kicinski
2026-02-23 17:07       ` Sebastian Andrzej Siewior
2026-02-23 23:27         ` Jakub Kicinski
2026-02-24  9:02           ` Eric Dumazet
2026-02-24 10:26             ` Paolo Abeni
2026-02-21  0:45 ` Jason Xing
2026-02-23  8:42 ` Eric Dumazet
2026-02-24 10:50 ` patchwork-bot+netdevbpf
2026-02-26  8:11   ` Kurt Kanzenbach
2026-02-26  8:50     ` Sebastian Andrzej Siewior

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox