From: David Laight <david.laight.linux@gmail.com>
To: Kuniyuki Iwashima <kuniyu@google.com>
Cc: davem@davemloft.net, dsahern@kernel.org, edumazet@google.com,
horms@kernel.org, idosch@nvidia.com, jianhao.xu@seu.edu.cn,
kuba@kernel.org, linux-kernel@vger.kernel.org,
netdev@vger.kernel.org, pabeni@redhat.com, runyu.xiao@seu.edu.cn,
stable@vger.kernel.org
Subject: Re: [PATCH net] ipv6: use READ_ONCE() in ipv6_flowlabel_get()
Date: Tue, 2 Jun 2026 09:00:34 +0100 [thread overview]
Message-ID: <20260602090034.7a5c243e@pumpkin> (raw)
In-Reply-To: <20260601231546.3407019-1-kuniyu@google.com>
On Mon, 1 Jun 2026 23:14:44 +0000
Kuniyuki Iwashima <kuniyu@google.com> wrote:
> From: David Laight <david.laight.linux@gmail.com>
> Date: Mon, 1 Jun 2026 22:31:22 +0100
> > On Mon, 1 Jun 2026 05:36:37 -0700
> > Eric Dumazet <edumazet@google.com> wrote:
> >
> > > On Mon, Jun 1, 2026 at 5:22 AM David Laight
> > > <david.laight.linux@gmail.com> wrote:
> > > >
> > > > On Sun, 31 May 2026 23:39:46 +0800
> > > > Runyu Xiao <runyu.xiao@seu.edu.cn> wrote:
> > > >
> > > > > ipv6_flowlabel_get() still reads the shared per-net sysctl fields
> > > > > flowlabel_consistency and flowlabel_state_ranges with plain loads,
> > > > > while writers update them through proc_dou8vec_minmax(). These checks
> > > > > run in the live IPV6_FLOWLABEL_MGR path, so lockless plain reads leave
> > > > > KCSAN-visible data races and can make the policy checks observe stale or
> > > > > inconsistent values.
> > > > >
> > > > > The race can be reached on a running system by toggling
> > > > > /proc/sys/net/ipv6/flowlabel_consistency and
> > > > > /proc/sys/net/ipv6/flowlabel_state_ranges while another task repeatedly
> > > > > issues IPV6_FLOWLABEL_MGR requests with IPV6_FL_F_REFLECT or a
> > > > > state-ranges flow label.
> > > > >
> > > > > This issue was first flagged by our static analysis tool while scanning
> > > > > lockless IPv6 sysctl readers, then manually audited on Linux v6.18.21.
> > > > > The IPV6_FLOWLABEL_MGR paths were runtime-reproduced with QEMU/KCSAN by
> > > > > concurrently flipping the two sysctls while TCP reflect and UDP
> > > > > state-ranges setsockopt actors exercised ipv6_flowlabel_get(). KCSAN
> > > > > reported races between proc_dou8vec_minmax() and the two plain-load
> > > > > sites in ipv6_flowlabel_get().
> > > > >
> > > > > A narrower second-round UDPv6 + IPV6_AUTOFLOWLABEL send-side reproducer
> > > > > also hit the inline ip6_make_flowlabel() reader through
> > > > > __ip6_make_skb() / proc_dou8vec_minmax(), but that site is already
> > > > > fixed in this tree by commit ded139b59b5d
> > > > > ("ipv6: annotate data-races from ip6_make_flowlabel()"). The remaining
> > > > > plain readers in this tree are both in ipv6_flowlabel_get().
> > > > >
> > > > > Use READ_ONCE() for those remaining sysctl reads so they follow the same
> > > > > lockless reader contract already used by other IPv6 sysctl readers.
> > > > >
> > > > > Build-tested by compiling net/ipv6/ip6_flowlabel.o on x86_64.
> > > > >
> > > > > Representative QEMU/KCSAN reports from the two target reader paths:
> > > > >
> > > > > BUG: KCSAN: data-race in ipv6_flowlabel_opt / proc_dou8vec_minmax
> > > > > write: proc_dou8vec_minmax+0x206/0x220
> > > > > read: ipv6_flowlabel_opt+0x6d8/0xd20
> > > > > do_ipv6_setsockopt+0x873/0x2220
> > > > > tcp_setsockopt+0x72/0xb0
> > > > >
> > > > > BUG: KCSAN: data-race in ipv6_flowlabel_opt / proc_dou8vec_minmax
> > > > > write: proc_dou8vec_minmax+0x206/0x220
> > > > > read: ipv6_flowlabel_opt+0x129/0xd20
> > > > > do_ipv6_setsockopt+0x873/0x2220
> > > > > udpv6_setsockopt+0x21/0x40
> > > > >
> > > > > Fixes: 6444f72b4b74 ("ipv6: add flowlabel_consistency sysctl")
> > > > > Fixes: 82a584b7cd36 ("ipv6: Flow label state ranges")
> > > > > Cc: stable@vger.kernel.org
> > > > > Signed-off-by: Runyu Xiao <runyu.xiao@seu.edu.cn>
> > > > > ---
> > > > > net/ipv6/ip6_flowlabel.c | 4 ++--
> > > > > 1 file changed, 2 insertions(+), 2 deletions(-)
> > > > >A
> > > > > diff --git a/net/ipv6/ip6_flowlabel.c b/net/ipv6/ip6_flowlabel.c
> > > > > index b1ccdf0dc646..1ab5ad0dcf24 100644
> > > > > --- a/net/ipv6/ip6_flowlabel.c
> > > > > +++ b/net/ipv6/ip6_flowlabel.c
> > > > > @@ -620,7 +620,7 @@ static int ipv6_flowlabel_get(struct sock *sk, struct in6_flowlabel_req *freq,
> > > > > int err;
> > > > >
> > > > > if (freq->flr_flags & IPV6_FL_F_REFLECT) {
> > > > > - if (net->ipv6.sysctl.flowlabel_consistency) {
> > > > > + if (READ_ONCE(net->ipv6.sysctl.flowlabel_consistency)) {
> > > >
> > > > That can't actually fix anything.
> > >
> > > It fixes a KCSAN splat.
> > >
> > > If you think you can fix KCSAN instead, please do so.
ipv6.h has:
u8 flowlabel_consistency;
KCSAN probably shouldn't care about byte reads.
> >
> > It is a false positive.
>
> It's not.
>
>
> > (Which I think you also said in a different email.
>
> I guess you meant this one ?
> https://lore.kernel.org/netdev/20260601074201.1186061-1-runyu.xiao@seu.edu.cn/
>
> This is different because, in addition to Eric's comment, IPv6
> address is 128-bit and data-race is inevitable without locking
> unless CPU supports native 128-bit read/write; we already do
> load/store-tearing of 128bit with u32/u64.
But the code isn't looking at a 128bit value, it is only doing a check
for zero (and READ_ONCE() doesn't support 128bit values).
If there is no locking the value can change just before/after the test.
Even if it were subject to read/write tearing absolutely the worst that
could happen is a zero being detected when the value changes between
two non-zero values.
That isn't relevant here - it is just a boolean.
-- David
next prev parent reply other threads:[~2026-06-02 8:00 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-31 15:39 [PATCH net] ipv6: use READ_ONCE() in ipv6_flowlabel_get() Runyu Xiao
2026-05-31 18:04 ` Eric Dumazet
2026-06-01 12:22 ` David Laight
2026-06-01 12:36 ` Eric Dumazet
2026-06-01 21:31 ` David Laight
2026-06-01 23:14 ` Kuniyuki Iwashima
2026-06-02 8:00 ` David Laight [this message]
2026-06-02 8:10 ` Eric Dumazet
2026-06-02 9:46 ` David Laight
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260602090034.7a5c243e@pumpkin \
--to=david.laight.linux@gmail.com \
--cc=davem@davemloft.net \
--cc=dsahern@kernel.org \
--cc=edumazet@google.com \
--cc=horms@kernel.org \
--cc=idosch@nvidia.com \
--cc=jianhao.xu@seu.edu.cn \
--cc=kuba@kernel.org \
--cc=kuniyu@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=runyu.xiao@seu.edu.cn \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.