public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
To: Vadim Fedorenko <vadim.fedorenko@linux.dev>,
	 Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	 Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: netdev@vger.kernel.org,  "David S. Miller" <davem@davemloft.net>,
	 Eric Dumazet <edumazet@google.com>,
	 Jakub Kicinski <kuba@kernel.org>,
	 Kurt Kanzenbach <kurt@linutronix.de>,
	 Paolo Abeni <pabeni@redhat.com>,
	 Simon Horman <horms@kernel.org>,
	 Willem de Bruijn <willemb@google.com>
Subject: Re: [PATCH net] net: Drop the lock in skb_may_tx_timestamp()
Date: Mon, 16 Feb 2026 13:08:43 -0500	[thread overview]
Message-ID: <willemdebruijn.kernel.3b597169afaae@gmail.com> (raw)
In-Reply-To: <9130dfca-1e85-4975-8df9-4de8bd9e8f5e@linux.dev>

Vadim Fedorenko wrote:
> On 15/02/2026 18:46, Sebastian Andrzej Siewior wrote:
> > On 2026-02-14 23:05:27 [-0500], Willem de Bruijn wrote:
> >>> --- a/net/core/skbuff.c
> >>> +++ b/net/core/skbuff.c
> >>> @@ -5555,16 +5555,25 @@ static void __skb_complete_tx_timestamp(struct sk_buff *skb,
> >>>   
> >>>   static bool skb_may_tx_timestamp(struct sock *sk, bool tsonly)
> >>>   {
> >>> -	bool ret;
> >>> +	struct socket *sock;
> >>> +	struct file *file;
> >>>   
> >>>   	if (likely(tsonly || READ_ONCE(sock_net(sk)->core.sysctl_tstamp_allow_data)))
> >>>   		return true;
> >>>   
> >>> -	read_lock_bh(&sk->sk_callback_lock);
> >>> -	ret = sk->sk_socket && sk->sk_socket->file &&
> >>> -	      file_ns_capable(sk->sk_socket->file, &init_user_ns, CAP_NET_RAW);
> >>> -	read_unlock_bh(&sk->sk_callback_lock);
> >>> -	return ret;
> >>> +	/* The sk pointer remains valid as long as the skb is. The sk_socket and
> >>> +	 * file pointer may become NULL if the socket is closed. Both structures
> >>> +	 * (including file->cred) are RCU freed which means they can be accessed
> >>> +	 * within a RCU read section.
> >>
> >> The fields are not __rcu annotated. Is this common knowledge or is
> >> there clear documentation of this behavior? Not disputing it, just not
> >> familiar with the rules around these fields.
> > 
> > The pointer is not RCU annotated because normally, within a syscall, the
> > file descriptor does not vanish and so the pointer remains valid. The
> > interrupt callback is an exception because we don't hold a reference on
> > the fd. But since the underlying data structures have a RCU lifetime we
> > could use it.
> > 
> > sock_alloc() allocates a struct socket_alloc which contains the struct
> > socket and struct inode (sock_alloc_inode()). This the sk_socket in
> > struct sock. Once you close the socket via the close syscall and the
> > last reference is gone then you end up in __fput(). The invoked
> > ->release() callback is sock_close(). Here socket::file gets NULLed and
> > proto::release should sock_orphan() the sock where sock::sk_socket gets
> > NULLed (I don't see where else and packet_release() does so). The
> > "struct sock" remains around because the skb has a reference on.
> > Otherwise it would go, too.
> > 
> > Once that is done __fput() does dput() -> __dentry_kill() -> iput() ->
> > destroy_inode() which does RCU free the inode.
> > That means if we observe the pointer non-NULL we can use it under the
> > RCU rules because the inode is released via RCU.
> > 
> > ->file. At the end of __fput() there is file_free() which releases the
> > pointer. file goes to filp_cachep which is SLAB_TYPESAFE_BY_RCU. There
> > is a put_cred for file::f_cred before that and it does call_rcu() at the
> > end, too.
> > 
> > This is me reading the code on FRI/ SAT and double checking it with
> > ptp4l. I might have missed something but so far, it makes sense.
> > 
> >> It does seem that updates to sk_socket, in sock_orphan and
> >> sock_graft, are protected by sk_callback_lock.
> >>> +	 */
> >>> +	lockdep_assert_in_rcu_reader();

It will be great to have this additional context in the commit
message.

And a Link: to the original discussion thread.

Btw, the commit states that a few drivers already call this from their
hard interrupt handler. Can you point to one or two?

If this is only for new drivers, it would be vastly preferable if this
can go through net-next.

> >>
> >> This function can be called from process context outside an RCU read
> >> side section, I think.
> >>
> >> Such as a packet socket sending and hitting the __skb_tstamp_tx
> >> SCM_TSTAMP_SCHED at the top of __dev_queue_xmit.
> > 
> > If that is the case, then we could create a RCU read section here.
> 
> well, __dev_queue_xmit() grabs rcu_read_lock_bh() already, we just have
> to move it right before timestamping check:

True. That was just the first example I found though.

I did not check tcp_ack_tstamp, or whether any drivers defer the
operation to a process context kthread without RCU protection.


  reply	other threads:[~2026-02-16 18:08 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-14 23:24 [PATCH net] net: Drop the lock in skb_may_tx_timestamp() Sebastian Andrzej Siewior
2026-02-15  4:05 ` Willem de Bruijn
2026-02-15 18:46   ` Sebastian Andrzej Siewior
2026-02-16 16:47     ` Vadim Fedorenko
2026-02-16 18:08       ` Willem de Bruijn [this message]
2026-02-17 13:28         ` Sebastian Andrzej Siewior
2026-02-17 14:14           ` Willem de Bruijn
2026-02-17 12:13       ` Sebastian Andrzej Siewior
2026-02-17  7:48     ` Jason Xing
2026-02-17 14:44       ` Sebastian Andrzej Siewior
2026-02-17  8:01 ` Eric Dumazet
2026-02-17 14:45   ` Sebastian Andrzej Siewior

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=willemdebruijn.kernel.3b597169afaae@gmail.com \
    --to=willemdebruijn.kernel@gmail.com \
    --cc=bigeasy@linutronix.de \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=horms@kernel.org \
    --cc=kuba@kernel.org \
    --cc=kurt@linutronix.de \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=vadim.fedorenko@linux.dev \
    --cc=willemb@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox