From: Vadim Fedorenko <vadim.fedorenko@linux.dev>
To: Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: netdev@vger.kernel.org, "David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>,
Kurt Kanzenbach <kurt@linutronix.de>,
Paolo Abeni <pabeni@redhat.com>, Simon Horman <horms@kernel.org>,
Willem de Bruijn <willemb@google.com>
Subject: Re: [PATCH net] net: Drop the lock in skb_may_tx_timestamp()
Date: Mon, 16 Feb 2026 16:47:03 +0000 [thread overview]
Message-ID: <9130dfca-1e85-4975-8df9-4de8bd9e8f5e@linux.dev> (raw)
In-Reply-To: <20260215184608.8dk4hemG@linutronix.de>
On 15/02/2026 18:46, Sebastian Andrzej Siewior wrote:
> On 2026-02-14 23:05:27 [-0500], Willem de Bruijn wrote:
>>> --- a/net/core/skbuff.c
>>> +++ b/net/core/skbuff.c
>>> @@ -5555,16 +5555,25 @@ static void __skb_complete_tx_timestamp(struct sk_buff *skb,
>>>
>>> static bool skb_may_tx_timestamp(struct sock *sk, bool tsonly)
>>> {
>>> - bool ret;
>>> + struct socket *sock;
>>> + struct file *file;
>>>
>>> if (likely(tsonly || READ_ONCE(sock_net(sk)->core.sysctl_tstamp_allow_data)))
>>> return true;
>>>
>>> - read_lock_bh(&sk->sk_callback_lock);
>>> - ret = sk->sk_socket && sk->sk_socket->file &&
>>> - file_ns_capable(sk->sk_socket->file, &init_user_ns, CAP_NET_RAW);
>>> - read_unlock_bh(&sk->sk_callback_lock);
>>> - return ret;
>>> + /* The sk pointer remains valid as long as the skb is. The sk_socket and
>>> + * file pointer may become NULL if the socket is closed. Both structures
>>> + * (including file->cred) are RCU freed which means they can be accessed
>>> + * within a RCU read section.
>>
>> The fields are not __rcu annotated. Is this common knowledge or is
>> there clear documentation of this behavior? Not disputing it, just not
>> familiar with the rules around these fields.
>
> The pointer is not RCU annotated because normally, within a syscall, the
> file descriptor does not vanish and so the pointer remains valid. The
> interrupt callback is an exception because we don't hold a reference on
> the fd. But since the underlying data structures have a RCU lifetime we
> could use it.
>
> sock_alloc() allocates a struct socket_alloc which contains the struct
> socket and struct inode (sock_alloc_inode()). This the sk_socket in
> struct sock. Once you close the socket via the close syscall and the
> last reference is gone then you end up in __fput(). The invoked
> ->release() callback is sock_close(). Here socket::file gets NULLed and
> proto::release should sock_orphan() the sock where sock::sk_socket gets
> NULLed (I don't see where else and packet_release() does so). The
> "struct sock" remains around because the skb has a reference on.
> Otherwise it would go, too.
>
> Once that is done __fput() does dput() -> __dentry_kill() -> iput() ->
> destroy_inode() which does RCU free the inode.
> That means if we observe the pointer non-NULL we can use it under the
> RCU rules because the inode is released via RCU.
>
> ->file. At the end of __fput() there is file_free() which releases the
> pointer. file goes to filp_cachep which is SLAB_TYPESAFE_BY_RCU. There
> is a put_cred for file::f_cred before that and it does call_rcu() at the
> end, too.
>
> This is me reading the code on FRI/ SAT and double checking it with
> ptp4l. I might have missed something but so far, it makes sense.
>
>> It does seem that updates to sk_socket, in sock_orphan and
>> sock_graft, are protected by sk_callback_lock.
>>> + */
>>> + lockdep_assert_in_rcu_reader();
>>
>> This function can be called from process context outside an RCU read
>> side section, I think.
>>
>> Such as a packet socket sending and hitting the __skb_tstamp_tx
>> SCM_TSTAMP_SCHED at the top of __dev_queue_xmit.
>
> If that is the case, then we could create a RCU read section here.
well, __dev_queue_xmit() grabs rcu_read_lock_bh() already, we just have
to move it right before timestamping check:
iff --git a/net/core/dev.c b/net/core/dev.c
index ac6bcb2a0784..e9a72da5a214 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4744,15 +4744,15 @@ int __dev_queue_xmit(struct sk_buff *skb, struct
net_device *sb_dev)
skb_reset_mac_header(skb);
skb_assert_len(skb);
- if (unlikely(skb_shinfo(skb)->tx_flags &
- (SKBTX_SCHED_TSTAMP | SKBTX_BPF)))
- __skb_tstamp_tx(skb, NULL, NULL, skb->sk, SCM_TSTAMP_SCHED);
-
/* Disable soft irqs for various locks below. Also
* stops preemption for RCU.
*/
rcu_read_lock_bh();
+ if (unlikely(skb_shinfo(skb)->tx_flags &
+ (SKBTX_SCHED_TSTAMP | SKBTX_BPF)))
+ __skb_tstamp_tx(skb, NULL, NULL, skb->sk, SCM_TSTAMP_SCHED);
+
skb_update_prio(skb);
qdisc_pkt_len_segs_init(skb);
>
>>> + sock = READ_ONCE(sk->sk_socket);
>>> + if (!sock)
>>> + return false;
>>> + file = READ_ONCE(sock->file);
>>> + if (!file)
>>> + return false;
>>> + return file_ns_capable(file, &init_user_ns, CAP_NET_RAW);
>>> }
>>>
>>> void skb_complete_tx_timestamp(struct sk_buff *skb,
>
> Sebastian
next prev parent reply other threads:[~2026-02-16 16:47 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-14 23:24 [PATCH net] net: Drop the lock in skb_may_tx_timestamp() Sebastian Andrzej Siewior
2026-02-15 4:05 ` Willem de Bruijn
2026-02-15 18:46 ` Sebastian Andrzej Siewior
2026-02-16 16:47 ` Vadim Fedorenko [this message]
2026-02-16 18:08 ` Willem de Bruijn
2026-02-17 13:28 ` Sebastian Andrzej Siewior
2026-02-17 14:14 ` Willem de Bruijn
2026-02-17 12:13 ` Sebastian Andrzej Siewior
2026-02-17 7:48 ` Jason Xing
2026-02-17 14:44 ` Sebastian Andrzej Siewior
2026-02-17 8:01 ` Eric Dumazet
2026-02-17 14:45 ` Sebastian Andrzej Siewior
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=9130dfca-1e85-4975-8df9-4de8bd9e8f5e@linux.dev \
--to=vadim.fedorenko@linux.dev \
--cc=bigeasy@linutronix.de \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=horms@kernel.org \
--cc=kuba@kernel.org \
--cc=kurt@linutronix.de \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=willemb@google.com \
--cc=willemdebruijn.kernel@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox