public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
From: Sasha Levin <sashal@kernel.org>
To: patches@lists.linux.dev, stable@vger.kernel.org
Cc: Eric Dumazet <edumazet@google.com>,
	David Ahern <dsahern@kernel.org>,
	Jakub Kicinski <kuba@kernel.org>, Sasha Levin <sashal@kernel.org>,
	andrew+netdev@lunn.ch, davem@davemloft.net, pabeni@redhat.com,
	netdev@vger.kernel.org
Subject: [PATCH AUTOSEL 6.19-6.6] ipv4: igmp: annotate data-races around idev->mr_maxdelay
Date: Sat, 14 Feb 2026 16:23:26 -0500	[thread overview]
Message-ID: <20260214212452.782265-61-sashal@kernel.org> (raw)
In-Reply-To: <20260214212452.782265-1-sashal@kernel.org>

From: Eric Dumazet <edumazet@google.com>

[ Upstream commit e4faaf65a75f650ac4366ddff5dabb826029ca5a ]

idev->mr_maxdelay is read and written locklessly,
add READ_ONCE()/WRITE_ONCE() annotations.

While we are at it, make this field an u32.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20260122172247.2429403-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

## Analysis of commit: "ipv4: igmp: annotate data-races around
idev->mr_maxdelay"

### 1. COMMIT MESSAGE ANALYSIS

The commit message is straightforward: it adds READ_ONCE()/WRITE_ONCE()
annotations to `idev->mr_maxdelay` because the field is read and written
locklessly (i.e., without holding a lock that protects both the reader
and writer). Additionally, the field type is changed from `unsigned
long` to `u32`.

**Author**: Eric Dumazet — a highly prolific and respected networking
developer at Google, known for data-race annotation work across the
networking stack.

**Reviewed-by**: David Ahern — another senior networking developer.

### 2. CODE CHANGE ANALYSIS

The changes are minimal and contained:

**In `include/linux/inetdevice.h`:**
- `mr_maxdelay` field type changed from `unsigned long` to `u32`
- Field repositioned in the struct (moved after `mr_gq_running` which is
  `unsigned char`, before `mr_ifc_count` which is `u32`) — this is
  likely for better packing/alignment

**In `net/ipv4/igmp.c`:**
- **Line ~230 (igmp_gq_start_timer)**: `in_dev->mr_maxdelay` →
  `READ_ONCE(in_dev->mr_maxdelay)` — the reader side
- **Line ~1012 (igmp_heard_query)**: `in_dev->mr_maxdelay = max_delay` →
  `WRITE_ONCE(in_dev->mr_maxdelay, max_delay)` — the writer side

### 3. DATA RACE ANALYSIS

Let me examine the concurrency situation:

- **Writer**: `igmp_heard_query()` writes `mr_maxdelay` when processing
  an incoming IGMPv3 query. This runs in softirq/BH context when
  receiving network packets.
- **Reader**: `igmp_gq_start_timer()` reads `mr_maxdelay` to calculate a
  random timer delay. This is called from `igmp_heard_query()` itself,
  but also potentially from other contexts.

The key question: can the read and write happen concurrently? Looking at
the code, `igmp_heard_query()` writes `mr_maxdelay` and then calls
`igmp_gq_start_timer()` which reads it. However, on different CPUs
processing different network packets, one CPU could be writing while
another reads. Without proper annotations, the compiler could optimize
these accesses in ways that cause tearing or stale reads.

This is a real KCSAN-detectable data race. While the consequences may
not be catastrophic (the value is used for a random timer delay), it is
technically undefined behavior in the C memory model, and on
architectures where `unsigned long` is 64-bit but atomicity is only
guaranteed for 32-bit, there could be torn reads producing garbage
values that get passed to `get_random_u32_below()`.

### 4. TYPE CHANGE: unsigned long → u32

The change from `unsigned long` to `u32` is notable:
- `max_delay` in `igmp_heard_query()` is computed as
  `IGMPV3_MRC(ih3->code)*(HZ/IGMP_TIMER_SCALE)` — the MRC field is from
  a network packet and is bounded, so the value fits in u32.
- `get_random_u32_below()` takes a `u32` argument, so this makes the
  types consistent.
- On 64-bit systems, this also ensures atomic read/write since u32
  accesses are atomic on all Linux-supported architectures, which
  complements the READ_ONCE/WRITE_ONCE annotations.

### 5. RISK ASSESSMENT

**Risk**: Very low.
- The READ_ONCE/WRITE_ONCE annotations are purely compiler directives
  that don't change runtime behavior on most architectures
- The type change from `unsigned long` to `u32` is safe because the
  values stored are always small (timer delays in jiffies, derived from
  IGMP protocol fields)
- The struct field reordering doesn't affect functionality
- Only 2 files changed, only 3 lines of actual code logic changed

**Benefit**: Fixes a data race that could theoretically cause:
- Torn reads on 32-bit architectures (if `unsigned long` were involved
  in non-atomic access — though on 32-bit it's 32-bit anyway)
- Compiler-induced issues where the compiler might reload or optimize
  the value in unexpected ways
- Silences KCSAN warnings, which is important for finding real races

### 6. STABLE SUITABILITY

**Meets stable criteria?**
- **Obviously correct**: Yes — this is a textbook data-race annotation
  pattern from a top networking developer, reviewed by another expert
- **Fixes a real bug**: Yes — data races are real bugs (KCSAN reports
  them), even if the consequences are subtle
- **Small and contained**: Yes — minimal changes across 2 files
- **No new features**: Correct — purely a bug fix
- **No new APIs**: Correct

**Concerns:**
- The type change (`unsigned long` → `u32`) and struct field reordering
  are slightly beyond "pure annotation" and could theoretically conflict
  with other patches. However, the type change is necessary for
  correctness (ensuring atomicity on all architectures).
- The struct layout change might cause minor backport conflicts if other
  fields were added/modified in stable trees.

### 7. PRECEDENT

Eric Dumazet has authored hundreds of similar READ_ONCE/WRITE_ONCE
annotation patches in the networking stack, and many of them have been
backported to stable. These are considered important for correctness and
for enabling KCSAN to find real bugs by eliminating false positives.

### 8. CONCLUSION

This is a small, well-reviewed fix for a real data race in the IPv4 IGMP
code path. The IGMP code handles multicast group membership and is used
on any system with multicast networking. The fix is from a trusted
author, reviewed by a trusted reviewer, and follows established
patterns. The type change is safe and actually improves correctness. The
risk of regression is negligible.

**YES**

 include/linux/inetdevice.h | 2 +-
 net/ipv4/igmp.c            | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/linux/inetdevice.h b/include/linux/inetdevice.h
index 5730ba6b1cfaf..dccbeb25f7014 100644
--- a/include/linux/inetdevice.h
+++ b/include/linux/inetdevice.h
@@ -38,11 +38,11 @@ struct in_device {
 	struct ip_mc_list	*mc_tomb;
 	unsigned long		mr_v1_seen;
 	unsigned long		mr_v2_seen;
-	unsigned long		mr_maxdelay;
 	unsigned long		mr_qi;		/* Query Interval */
 	unsigned long		mr_qri;		/* Query Response Interval */
 	unsigned char		mr_qrv;		/* Query Robustness Variable */
 	unsigned char		mr_gq_running;
+	u32			mr_maxdelay;
 	u32			mr_ifc_count;
 	struct timer_list	mr_gq_timer;	/* general query timer */
 	struct timer_list	mr_ifc_timer;	/* interface change timer */
diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
index 7182f1419c2a4..0adc993c211d7 100644
--- a/net/ipv4/igmp.c
+++ b/net/ipv4/igmp.c
@@ -227,7 +227,7 @@ static void igmp_start_timer(struct ip_mc_list *im, int max_delay)
 
 static void igmp_gq_start_timer(struct in_device *in_dev)
 {
-	int tv = get_random_u32_below(in_dev->mr_maxdelay);
+	int tv = get_random_u32_below(READ_ONCE(in_dev->mr_maxdelay));
 	unsigned long exp = jiffies + tv + 2;
 
 	if (in_dev->mr_gq_running &&
@@ -1009,7 +1009,7 @@ static bool igmp_heard_query(struct in_device *in_dev, struct sk_buff *skb,
 		max_delay = IGMPV3_MRC(ih3->code)*(HZ/IGMP_TIMER_SCALE);
 		if (!max_delay)
 			max_delay = 1;	/* can't mod w/ 0 */
-		in_dev->mr_maxdelay = max_delay;
+		WRITE_ONCE(in_dev->mr_maxdelay, max_delay);
 
 		/* RFC3376, 4.1.6. QRV and 4.1.7. QQIC, when the most recently
 		 * received value was zero, use the default or statically
-- 
2.51.0


  parent reply	other threads:[~2026-02-14 21:26 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20260214212452.782265-1-sashal@kernel.org>
2026-02-14 21:22 ` [PATCH AUTOSEL 6.19-5.10] myri10ge: avoid uninitialized variable use Sasha Levin
2026-02-14 21:22 ` [PATCH AUTOSEL 6.19-6.1] net: mctp-i2c: fix duplicate reception of old data Sasha Levin
2026-02-14 21:22 ` [PATCH AUTOSEL 6.19-6.12] net: wwan: mhi: Add network support for Foxconn T99W760 Sasha Levin
2026-02-14 21:22 ` [PATCH AUTOSEL 6.19-5.10] net/rds: Clear reconnect pending bit Sasha Levin
2026-02-14 21:22 ` [PATCH AUTOSEL 6.19-6.12] ipv6: annotate data-races over sysctl.flowlabel_reflect Sasha Levin
2026-02-14 21:22 ` [PATCH AUTOSEL 6.19-5.15] ipv6: exthdrs: annotate data-race over multiple sysctl Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-5.10] octeontx2-af: Workaround SQM/PSE stalls by disabling sticky Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-5.10] vmw_vsock: bypass false-positive Wnonnull warning with gcc-16 Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-5.15] ipv6: annotate data-races in ip6_multipath_hash_{policy,fields}() Sasha Levin
2026-02-14 21:23 ` Sasha Levin [this message]
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-5.10] net/rds: No shortcut out of RDS_CONN_ERROR Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-6.18] ipv6: annotate data-races in net/ipv6/route.c Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-6.12] bnxt_en: Allow ntuple filters for drops Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-6.18] ptp: ptp_vmclock: add 'VMCLOCK' to ACPI device match Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-5.10] ipv4: fib: Annotate access to struct fib_alias.fa_state Sasha Levin
2026-02-14 21:23 ` [PATCH AUTOSEL 6.19-6.12] net: sfp: add quirk for Lantech 8330-265D Sasha Levin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260214212452.782265-61-sashal@kernel.org \
    --to=sashal@kernel.org \
    --cc=andrew+netdev@lunn.ch \
    --cc=davem@davemloft.net \
    --cc=dsahern@kernel.org \
    --cc=edumazet@google.com \
    --cc=kuba@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=patches@lists.linux.dev \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox