public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
To: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: netdev@vger.kernel.org, "David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>,
	Kurt Kanzenbach <kurt@linutronix.de>,
	Paolo Abeni <pabeni@redhat.com>, Simon Horman <horms@kernel.org>,
	Willem de Bruijn <willemb@google.com>
Subject: Re: [PATCH net] net: Drop the lock in skb_may_tx_timestamp()
Date: Sun, 15 Feb 2026 19:46:08 +0100	[thread overview]
Message-ID: <20260215184608.8dk4hemG@linutronix.de> (raw)
In-Reply-To: <willemdebruijn.kernel.1682a93525c87@gmail.com>

On 2026-02-14 23:05:27 [-0500], Willem de Bruijn wrote:
> > --- a/net/core/skbuff.c
> > +++ b/net/core/skbuff.c
> > @@ -5555,16 +5555,25 @@ static void __skb_complete_tx_timestamp(struct sk_buff *skb,
> >  
> >  static bool skb_may_tx_timestamp(struct sock *sk, bool tsonly)
> >  {
> > -	bool ret;
> > +	struct socket *sock;
> > +	struct file *file;
> >  
> >  	if (likely(tsonly || READ_ONCE(sock_net(sk)->core.sysctl_tstamp_allow_data)))
> >  		return true;
> >  
> > -	read_lock_bh(&sk->sk_callback_lock);
> > -	ret = sk->sk_socket && sk->sk_socket->file &&
> > -	      file_ns_capable(sk->sk_socket->file, &init_user_ns, CAP_NET_RAW);
> > -	read_unlock_bh(&sk->sk_callback_lock);
> > -	return ret;
> > +	/* The sk pointer remains valid as long as the skb is. The sk_socket and
> > +	 * file pointer may become NULL if the socket is closed. Both structures
> > +	 * (including file->cred) are RCU freed which means they can be accessed
> > +	 * within a RCU read section.
> 
> The fields are not __rcu annotated. Is this common knowledge or is
> there clear documentation of this behavior? Not disputing it, just not
> familiar with the rules around these fields.

The pointer is not RCU annotated because normally, within a syscall, the
file descriptor does not vanish and so the pointer remains valid. The
interrupt callback is an exception because we don't hold a reference on
the fd. But since the underlying data structures have a RCU lifetime we
could use it.

sock_alloc() allocates a struct socket_alloc which contains the struct
socket and struct inode (sock_alloc_inode()). This the sk_socket in
struct sock. Once you close the socket via the close syscall and the
last reference is gone then you end up in __fput(). The invoked
->release() callback is sock_close(). Here socket::file gets NULLed and
proto::release should sock_orphan() the sock where sock::sk_socket gets
NULLed (I don't see where else and packet_release() does so). The
"struct sock" remains around because the skb has a reference on.
Otherwise it would go, too.

Once that is done __fput() does dput() -> __dentry_kill() -> iput() ->
destroy_inode() which does RCU free the inode.
That means if we observe the pointer non-NULL we can use it under the
RCU rules because the inode is released via RCU. 

->file. At the end of __fput() there is file_free() which releases the
pointer. file goes to filp_cachep which is SLAB_TYPESAFE_BY_RCU. There
is a put_cred for file::f_cred before that and it does call_rcu() at the
end, too.

This is me reading the code on FRI/ SAT and double checking it with
ptp4l. I might have missed something but so far, it makes sense.

> It does seem that updates to sk_socket, in sock_orphan and
> sock_graft, are protected by sk_callback_lock.
> > +	 */
> > +	lockdep_assert_in_rcu_reader();
> 
> This function can be called from process context outside an RCU read
> side section, I think.
> 
> Such as a packet socket sending and hitting the __skb_tstamp_tx
> SCM_TSTAMP_SCHED at the top of __dev_queue_xmit.

If that is the case, then we could create a RCU read section here.

> > +	sock = READ_ONCE(sk->sk_socket);
> > +	if (!sock)
> > +		return false;
> > +	file = READ_ONCE(sock->file);
> > +	if (!file)
> > +		return false;
> > +	return file_ns_capable(file, &init_user_ns, CAP_NET_RAW);
> >  }
> >  
> >  void skb_complete_tx_timestamp(struct sk_buff *skb,

Sebastian

  reply	other threads:[~2026-02-15 18:46 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-14 23:24 [PATCH net] net: Drop the lock in skb_may_tx_timestamp() Sebastian Andrzej Siewior
2026-02-15  4:05 ` Willem de Bruijn
2026-02-15 18:46   ` Sebastian Andrzej Siewior [this message]
2026-02-16 16:47     ` Vadim Fedorenko
2026-02-16 18:08       ` Willem de Bruijn
2026-02-17 13:28         ` Sebastian Andrzej Siewior
2026-02-17 14:14           ` Willem de Bruijn
2026-02-17 12:13       ` Sebastian Andrzej Siewior
2026-02-17  7:48     ` Jason Xing
2026-02-17 14:44       ` Sebastian Andrzej Siewior
2026-02-17  8:01 ` Eric Dumazet
2026-02-17 14:45   ` Sebastian Andrzej Siewior

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260215184608.8dk4hemG@linutronix.de \
    --to=bigeasy@linutronix.de \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=horms@kernel.org \
    --cc=kuba@kernel.org \
    --cc=kurt@linutronix.de \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=willemb@google.com \
    --cc=willemdebruijn.kernel@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox