From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
To: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: netdev@vger.kernel.org, "David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>,
Kurt Kanzenbach <kurt@linutronix.de>,
Paolo Abeni <pabeni@redhat.com>, Simon Horman <horms@kernel.org>,
Willem de Bruijn <willemb@google.com>
Subject: Re: [PATCH net] net: Drop the lock in skb_may_tx_timestamp()
Date: Sun, 15 Feb 2026 19:46:08 +0100 [thread overview]
Message-ID: <20260215184608.8dk4hemG@linutronix.de> (raw)
In-Reply-To: <willemdebruijn.kernel.1682a93525c87@gmail.com>
On 2026-02-14 23:05:27 [-0500], Willem de Bruijn wrote:
> > --- a/net/core/skbuff.c
> > +++ b/net/core/skbuff.c
> > @@ -5555,16 +5555,25 @@ static void __skb_complete_tx_timestamp(struct sk_buff *skb,
> >
> > static bool skb_may_tx_timestamp(struct sock *sk, bool tsonly)
> > {
> > - bool ret;
> > + struct socket *sock;
> > + struct file *file;
> >
> > if (likely(tsonly || READ_ONCE(sock_net(sk)->core.sysctl_tstamp_allow_data)))
> > return true;
> >
> > - read_lock_bh(&sk->sk_callback_lock);
> > - ret = sk->sk_socket && sk->sk_socket->file &&
> > - file_ns_capable(sk->sk_socket->file, &init_user_ns, CAP_NET_RAW);
> > - read_unlock_bh(&sk->sk_callback_lock);
> > - return ret;
> > + /* The sk pointer remains valid as long as the skb is. The sk_socket and
> > + * file pointer may become NULL if the socket is closed. Both structures
> > + * (including file->cred) are RCU freed which means they can be accessed
> > + * within a RCU read section.
>
> The fields are not __rcu annotated. Is this common knowledge or is
> there clear documentation of this behavior? Not disputing it, just not
> familiar with the rules around these fields.
The pointer is not RCU annotated because normally, within a syscall, the
file descriptor does not vanish and so the pointer remains valid. The
interrupt callback is an exception because we don't hold a reference on
the fd. But since the underlying data structures have a RCU lifetime we
could use it.
sock_alloc() allocates a struct socket_alloc which contains the struct
socket and struct inode (sock_alloc_inode()). This the sk_socket in
struct sock. Once you close the socket via the close syscall and the
last reference is gone then you end up in __fput(). The invoked
->release() callback is sock_close(). Here socket::file gets NULLed and
proto::release should sock_orphan() the sock where sock::sk_socket gets
NULLed (I don't see where else and packet_release() does so). The
"struct sock" remains around because the skb has a reference on.
Otherwise it would go, too.
Once that is done __fput() does dput() -> __dentry_kill() -> iput() ->
destroy_inode() which does RCU free the inode.
That means if we observe the pointer non-NULL we can use it under the
RCU rules because the inode is released via RCU.
->file. At the end of __fput() there is file_free() which releases the
pointer. file goes to filp_cachep which is SLAB_TYPESAFE_BY_RCU. There
is a put_cred for file::f_cred before that and it does call_rcu() at the
end, too.
This is me reading the code on FRI/ SAT and double checking it with
ptp4l. I might have missed something but so far, it makes sense.
> It does seem that updates to sk_socket, in sock_orphan and
> sock_graft, are protected by sk_callback_lock.
> > + */
> > + lockdep_assert_in_rcu_reader();
>
> This function can be called from process context outside an RCU read
> side section, I think.
>
> Such as a packet socket sending and hitting the __skb_tstamp_tx
> SCM_TSTAMP_SCHED at the top of __dev_queue_xmit.
If that is the case, then we could create a RCU read section here.
> > + sock = READ_ONCE(sk->sk_socket);
> > + if (!sock)
> > + return false;
> > + file = READ_ONCE(sock->file);
> > + if (!file)
> > + return false;
> > + return file_ns_capable(file, &init_user_ns, CAP_NET_RAW);
> > }
> >
> > void skb_complete_tx_timestamp(struct sk_buff *skb,
Sebastian
next prev parent reply other threads:[~2026-02-15 18:46 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-14 23:24 [PATCH net] net: Drop the lock in skb_may_tx_timestamp() Sebastian Andrzej Siewior
2026-02-15 4:05 ` Willem de Bruijn
2026-02-15 18:46 ` Sebastian Andrzej Siewior [this message]
2026-02-16 16:47 ` Vadim Fedorenko
2026-02-16 18:08 ` Willem de Bruijn
2026-02-17 13:28 ` Sebastian Andrzej Siewior
2026-02-17 14:14 ` Willem de Bruijn
2026-02-17 12:13 ` Sebastian Andrzej Siewior
2026-02-17 7:48 ` Jason Xing
2026-02-17 14:44 ` Sebastian Andrzej Siewior
2026-02-17 8:01 ` Eric Dumazet
2026-02-17 14:45 ` Sebastian Andrzej Siewior
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260215184608.8dk4hemG@linutronix.de \
--to=bigeasy@linutronix.de \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=horms@kernel.org \
--cc=kuba@kernel.org \
--cc=kurt@linutronix.de \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=willemb@google.com \
--cc=willemdebruijn.kernel@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox