public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
To: Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	 netdev@vger.kernel.org
Cc: "David S. Miller" <davem@davemloft.net>,
	 Eric Dumazet <edumazet@google.com>,
	 Jakub Kicinski <kuba@kernel.org>,
	 Kurt Kanzenbach <kurt@linutronix.de>,
	 Paolo Abeni <pabeni@redhat.com>,
	 Simon Horman <horms@kernel.org>,
	 Willem de Bruijn <willemb@google.com>
Subject: Re: [PATCH net] net: Drop the lock in skb_may_tx_timestamp()
Date: Sat, 14 Feb 2026 23:05:27 -0500	[thread overview]
Message-ID: <willemdebruijn.kernel.1682a93525c87@gmail.com> (raw)
In-Reply-To: <20260214232456.A37oV4KQ@linutronix.de>

Sebastian Andrzej Siewior wrote:
> skb_may_tx_timestamp() may acquire sock::sk_callback_lock. The lock must
> not be taken in IRQ context, only softirq is okay. A few drivers receive
> the timestamp via a dedicated interrupt and complete the TX timestamp
> from that handler. This will lead to a deadlock if the lock is already
> write-locked on the same CPU.
> 
> Taking the lock can be avoided. The socket (pointed by the skb) will
> remain valid until the skb is released. The ->sk_socket and ->file
> member will be set to NULL once the user closes the socket which may
> happen before the timestamp arrives.
> If we happen to observe the pointer while the socket is closing but
> before the pointer is set to NULL then we may use it because both
> pointer (and the file's cred member) are RCU freed and the IRQ and
> softirq handler qualify as a RCU-read section.
> 
> Fixes: b245be1f4db1a ("net-timestamp: no-payload only sysctl")
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>  net/core/skbuff.c | 21 +++++++++++++++------
>  1 file changed, 15 insertions(+), 6 deletions(-)
> 
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 61746c2b95f63..a174010e334a1 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -5555,16 +5555,25 @@ static void __skb_complete_tx_timestamp(struct sk_buff *skb,
>  
>  static bool skb_may_tx_timestamp(struct sock *sk, bool tsonly)
>  {
> -	bool ret;
> +	struct socket *sock;
> +	struct file *file;
>  
>  	if (likely(tsonly || READ_ONCE(sock_net(sk)->core.sysctl_tstamp_allow_data)))
>  		return true;
>  
> -	read_lock_bh(&sk->sk_callback_lock);
> -	ret = sk->sk_socket && sk->sk_socket->file &&
> -	      file_ns_capable(sk->sk_socket->file, &init_user_ns, CAP_NET_RAW);
> -	read_unlock_bh(&sk->sk_callback_lock);
> -	return ret;
> +	/* The sk pointer remains valid as long as the skb is. The sk_socket and
> +	 * file pointer may become NULL if the socket is closed. Both structures
> +	 * (including file->cred) are RCU freed which means they can be accessed
> +	 * within a RCU read section.

The fields are not __rcu annotated. Is this common knowledge or is
there clear documentation of this behavior? Not disputing it, just not
familiar with the rules around these fields.

It does seem that updates to sk_socket, in sock_orphan and
sock_graft, are protected by sk_callback_lock.
> +	 */
> +	lockdep_assert_in_rcu_reader();

This function can be called from process context outside an RCU read
side section, I think.

Such as a packet socket sending and hitting the __skb_tstamp_tx
SCM_TSTAMP_SCHED at the top of __dev_queue_xmit.

> +	sock = READ_ONCE(sk->sk_socket);
> +	if (!sock)
> +		return false;
> +	file = READ_ONCE(sock->file);
> +	if (!file)
> +		return false;
> +	return file_ns_capable(file, &init_user_ns, CAP_NET_RAW);
>  }
>  
>  void skb_complete_tx_timestamp(struct sk_buff *skb,
> -- 
> 2.51.0
> 



  reply	other threads:[~2026-02-15  4:05 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-14 23:24 [PATCH net] net: Drop the lock in skb_may_tx_timestamp() Sebastian Andrzej Siewior
2026-02-15  4:05 ` Willem de Bruijn [this message]
2026-02-15 18:46   ` Sebastian Andrzej Siewior
2026-02-16 16:47     ` Vadim Fedorenko
2026-02-16 18:08       ` Willem de Bruijn
2026-02-17 13:28         ` Sebastian Andrzej Siewior
2026-02-17 14:14           ` Willem de Bruijn
2026-02-17 12:13       ` Sebastian Andrzej Siewior
2026-02-17  7:48     ` Jason Xing
2026-02-17 14:44       ` Sebastian Andrzej Siewior
2026-02-17  8:01 ` Eric Dumazet
2026-02-17 14:45   ` Sebastian Andrzej Siewior

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=willemdebruijn.kernel.1682a93525c87@gmail.com \
    --to=willemdebruijn.kernel@gmail.com \
    --cc=bigeasy@linutronix.de \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=horms@kernel.org \
    --cc=kuba@kernel.org \
    --cc=kurt@linutronix.de \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=willemb@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox