From: Sasha Levin
To: patches@lists.linux.dev, stable@vger.kernel.org
Cc: Fernando Fernandez Mancera, Jakub Kicinski, Sasha Levin, davem@davemloft.net, dsahern@kernel.org, edumazet@google.com, pabeni@redhat.com, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH AUTOSEL 7.0-6.18] ipv6: discard fragment queue earlier if there is malformed datagram
Date: Mon, 20 Apr 2026 09:18:37 -0400
Message-ID: <20260420132314.1023554-123-sashal@kernel.org>
X-Mailer: git-send-email 2.53.0
In-Reply-To: <20260420132314.1023554-1-sashal@kernel.org>
References: <20260420132314.1023554-1-sashal@kernel.org>
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org
MIME-Version: 1.0
X-stable: review
X-Patchwork-Hint: Ignore
X-stable-base: Linux 7.0
Content-Transfer-Encoding: 8bit

From: Fernando Fernandez Mancera

[ Upstream commit 9ff2d2a9837015ff4b3579b028aeae8c180aa8d3 ]

Currently the kernel IPv6 implementation is not discarding the fragment
queue upon receiving an IPv6 fragment that is not 8-byte aligned. It
relies on queue expiration to free the queue.

While RFC 8200 section 4.5 does not explicitly mention that the rest of
the fragments must be discarded, it does not make sense to keep them.
The parameter problem message is sent regardless. In addition, if the
sender is able to re-compose the datagram so it is 8-byte aligned, it
would qualify as a new whole datagram not fitting into the same fragment
queue.

The same situation happens if the segment end exceeds the IPv6 maximum
packet length. The sooner we can free resources during reassembly, the
better.

Signed-off-by: Fernando Fernandez Mancera
Link: https://patch.msgid.link/20260225133758.4553-1-fmancera@suse.de
Signed-off-by: Jakub Kicinski
Signed-off-by: Sasha Levin
---

LLM Generated explanations, may be completely bogus:

Now I have all the information I need.
Let me compile the complete analysis.

---

## PHASE 1: COMMIT MESSAGE FORENSICS

### Step 1.1: PARSE THE SUBJECT LINE

Record: [ipv6] [discard] - "discard fragment queue earlier if there is malformed datagram" - The word "discard" combined with "earlier" implies resource cleanup is being moved sooner, suggesting a resource leak or memory waste fix.

### Step 1.2: PARSE ALL COMMIT MESSAGE TAGS

- **Signed-off-by:** Fernando Fernandez Mancera (author)
- **Link:** https://patch.msgid.link/20260225133758.4553-1-fmancera@suse.de
- **Signed-off-by:** Jakub Kicinski (net maintainer - applied the patch)
- No Fixes: tag (expected for manual review candidates)
- No Reported-by: tag
- No Cc: stable tag (expected)

Record: Author is a SUSE contributor. Applied by Jakub Kicinski (net tree maintainer), which is a strong trust signal.

### Step 1.3: ANALYZE THE COMMIT BODY TEXT

The commit explains:

1. When receiving a non-8-byte-aligned IPv6 fragment, the kernel sends an ICMP parameter problem but does NOT discard the fragment queue
2. Same issue when the segment end exceeds IPV6_MAXPLEN
3. The queue sits idle until its timeout timer fires
4. RFC 8200 section 4.5 doesn't explicitly require discard, but keeping the queue is pointless
5. "The sooner we can free resources the better during reassembly"

Record: **Bug**: Fragment queues linger unnecessarily when malformed fragments are detected, consuming memory until timeout. **Failure mode**: Resource waste, potential DoS vector. **Root cause**: Two early return paths in `ip6_frag_queue()` don't call `inet_frag_kill()`.

### Step 1.4: DETECT HIDDEN BUG FIXES

Record: Yes - this is a resource leak fix disguised as "optimization." While framed as "discarding earlier," the real issue is that a fragment queue that receives a malformed fragment is not killed on the error path and is released only when it times out. This is a real resource leak in the networking hot path, exploitable for DoS by sending crafted malformed IPv6 fragments.
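To make the root cause from Step 1.3 concrete, here is a minimal userspace sketch in C. It is not the kernel code: `struct toy_fq`, `toy_frag_kill()`, `toy_frag_queue()` and the `patched` switch are hypothetical names invented for illustration. Only the two checks (`end > IPV6_MAXPLEN`, `end & 0x7`) and the `-1` return are taken from the patch. The sketch also assumes the alignment check applies only to fragments with the M flag set, as RFC 8200 section 4.5 describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define IPV6_MAXPLEN 65535u /* maximum IPv6 payload length */

/* Hypothetical stand-in for a fragment queue; NOT the kernel's struct frag_queue. */
struct toy_fq {
	bool alive;               /* still hashed, timer still pending */
	unsigned int reasm_fails; /* stand-in for IPSTATS_MIB_REASMFAILS */
};

/* Hypothetical stand-in for inet_frag_kill(): mark the queue dead right away. */
static void toy_frag_kill(struct toy_fq *fq)
{
	fq->alive = false;
}

/*
 * Toy version of the two error paths touched by the patch.
 * Returns 0 if the fragment passes both checks, -1 if it is malformed.
 * patched == false models the old behaviour (queue left to time out),
 * patched == true models the behaviour after the patch.
 */
static int toy_frag_queue(struct toy_fq *fq, unsigned int offset,
			  unsigned int len, bool more_frags, bool patched)
{
	unsigned int end = offset + len;

	assert(fq != NULL);

	/* Oversized datagram, or a non-final fragment whose end is not 8-byte aligned. */
	if (end > IPV6_MAXPLEN || (more_frags && (end & 0x7))) {
		if (patched) {
			toy_frag_kill(fq);
			fq->reasm_fails++;
		}
		return -1;
	}
	return 0;
}
```

With `patched` set to false, a malformed fragment returns -1 but leaves `alive` set, which models the queue waiting for its timeout. With `patched` set to true, the queue is marked dead at once and the failure counter is incremented.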
---

## PHASE 2: DIFF ANALYSIS

### Step 2.1: INVENTORY THE CHANGES

- **net/ipv6/reassembly.c**: +6 lines, 0 removed
- Function modified: `ip6_frag_queue()`
- Two hunks, each adding 3 lines (identical pattern) at two existing `return -1` sites
- Scope: single-file, surgical fix

### Step 2.2: UNDERSTAND THE CODE FLOW CHANGE

**Hunk 1** (end > IPV6_MAXPLEN check, ~line 130):

- BEFORE: Sets `*prob_offset` and returns -1, leaving fq alive in hash table
- AFTER: Calls `inet_frag_kill(&fq->q, refs)` + increments REASMFAILS stat, THEN returns -1

**Hunk 2** (end & 0x7 alignment check, ~line 161):

- BEFORE: Sets `*prob_offset` and returns -1, leaving fq alive in hash table
- AFTER: Calls `inet_frag_kill(&fq->q, refs)` + increments REASMFAILS stat, THEN returns -1

Both changes follow the exact same pattern as the existing `discard_fq` label at lines 241-244.

### Step 2.3: IDENTIFY THE BUG MECHANISM

Record: **Category**: Resource leak fix. The fragment queue (with all its previously received fragments, timer, hash entry) lingers until the 60-second timeout when it should be immediately cleaned up. `inet_frag_kill()` deletes the timer, sets INET_FRAG_COMPLETE, and removes the queue from the hash table.

### Step 2.4: ASSESS THE FIX QUALITY

- **Obviously correct**: Yes - mirrors the existing `discard_fq` pattern exactly
- **Minimal/surgical**: Yes - 6 lines total, 3 lines per error path
- **Regression risk**: Very low - these paths already return -1 (error). The only change is that the fragment queue is cleaned up sooner. The caller (`ipv6_frag_rcv`) already calls `inet_frag_putn()` to drop the refs
- **Red flags**: None

---

## PHASE 3: GIT HISTORY INVESTIGATION

### Step 3.1: BLAME THE CHANGED LINES

From git blame:

- The `if (end > IPV6_MAXPLEN)` check dates to the original kernel (`^1da177e4c3f41`, 2005)
- The `return -1` at line 135 was introduced by `f61944efdf0d25` (Herbert Xu, 2007)
- The `if (end & 0x7)` check dates to the original kernel (`^1da177e4c3f41`, 2005)
- The `return -1` at line 166 was introduced by `f61944efdf0d25` (Herbert Xu, 2007)

Record: **The buggy pattern has existed since 2005/2007** - present in ALL active stable trees.

### Step 3.2: RELATED HISTORICAL FIX

No explicit Fixes: tag, but the 2018 commit `2475f59c618ea` ("ipv6: discard IP frag queue on more errors") by Peter Oskolkov is highly relevant. That commit changed many error paths from `goto err` to `goto discard_fq` but **missed these two paths** because they use `*prob_offset` + `return -1` instead of `kfree_skb`. The IPv4 equivalent was `0ff89efb5246` ("ip: fail fast on IP defrag errors") from the same author, which described the motivation: "fail fast: corrupted frag queues are cleared immediately, instead of by timeout."

Record: This commit completes the work started in 2018 by catching the two remaining error paths.

### Step 3.3: FILE HISTORY

Recent changes to reassembly.c are mostly refactoring (`inet_frag_kill` signature change in `eb0dfc0ef195a`, SKB_DR addition, helpers). No conflicting fixes to the same two error paths.

Record: Standalone fix, no prerequisites beyond what's already in the file.

### Step 3.4: AUTHOR CONTEXT

Fernando Fernandez Mancera is a SUSE contributor with multiple networking commits (netfilter, IPv4/IPv6, xfrm). Patch was applied by Jakub Kicinski (net maintainer).

### Step 3.5: DEPENDENCIES

The fix uses `inet_frag_kill(&fq->q, refs)` with the `refs` parameter, which was introduced in `eb0dfc0ef195a` (March 2025, v6.15 cycle).
For older stable trees, the call would be `inet_frag_kill(&fq->q)` - a trivial backport adjustment.

Record: Clean apply on v6.15+. Minor adjustment needed for v6.12 and older.

---

## PHASE 4: MAILING LIST RESEARCH

### Step 4.1-4.5

Lore.kernel.org was not accessible (anti-scraping protection). However:

- The patch was applied by Jakub Kicinski (net maintainer), indicating it passed review
- The Link: tag confirms it went through the standard kernel mailing list process
- Single-patch submission (not part of a series)

Record: Could not access lore discussion directly. Applied by net maintainer.

---

## PHASE 5: CODE SEMANTIC ANALYSIS

### Step 5.1: FUNCTIONS MODIFIED

- `ip6_frag_queue()` - the IPv6 fragment queue insertion function

### Step 5.2: CALLERS

`ip6_frag_queue()` is called from `ipv6_frag_rcv()` (line 387), which is the main IPv6 fragment receive handler registered as `frag_protocol.handler`. This is called for **every IPv6 fragmented packet** received by the system.

### Step 5.3: INET_FRAG_KILL BEHAVIOR

`inet_frag_kill()` (net/ipv4/inet_fragment.c:263):

1. Deletes the expiration timer
2. Sets `INET_FRAG_COMPLETE` flag
3. Removes from the rhashtable (if not dead)
4. Accumulates ref drops into `*refs`

The caller `ipv6_frag_rcv()` then calls `inet_frag_putn(&fq->q, refs)` which handles the deferred refcount drops.

### Step 5.4: REACHABILITY

The buggy path is directly reachable from any incoming IPv6 fragmented packet. An attacker can craft packets that:

- Have `end > IPV6_MAXPLEN` (oversized fragment)
- Have non-8-byte-aligned fragment length

Both are trivially triggerable from the network.

Record: **Directly reachable from network input** - no special configuration needed.

---

## PHASE 6: STABLE TREE ANALYSIS

### Step 6.1: CODE EXISTS IN ALL STABLE TREES

The buggy code (`return -1` without `inet_frag_kill`) has existed since 2005/2007. All active stable trees (5.10.y, 5.15.y, 6.1.y, 6.6.y, 6.12.y) contain the buggy code.
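The four effects of `inet_frag_kill()` listed in Step 5.3, together with the deferred drop done by `inet_frag_putn()`, can be modeled with a small userspace sketch. This is a toy under stated assumptions, not the kernel implementation: `struct toy_queue`, `toy_kill()`, `toy_putn()` and `TOY_FRAG_COMPLETE` are hypothetical names, and the starting refcount of 2 (one reference for the timer, one for the hash table) is an assumption chosen to keep the example small.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_FRAG_COMPLETE 0x1u /* hypothetical stand-in for INET_FRAG_COMPLETE */

/* Hypothetical model of a fragment queue; NOT struct inet_frag_queue. */
struct toy_queue {
	unsigned int flags;
	bool timer_armed; /* expiration timer pending */
	bool hashed;      /* present in the lookup table */
	int refcnt;       /* assumed: one ref for the timer, one for the table */
};

/*
 * Toy kill: stop the timer, mark the queue complete, unhash it, and
 * count how many references the caller must drop later via *refs.
 * Calling it twice is harmless because the second call finds nothing to undo.
 */
static void toy_kill(struct toy_queue *q, int *refs)
{
	assert(q != NULL && refs != NULL);

	if (q->timer_armed) {
		q->timer_armed = false;
		(*refs)++; /* the timer's reference */
	}
	if (!(q->flags & TOY_FRAG_COMPLETE)) {
		q->flags |= TOY_FRAG_COMPLETE;
		if (q->hashed) {
			q->hashed = false;
			(*refs)++; /* the table's reference */
		}
	}
}

/* Toy deferred drop: returns true when the last reference is gone. */
static bool toy_putn(struct toy_queue *q, int refs)
{
	q->refcnt -= refs;
	return q->refcnt == 0;
}
```

The point of accumulating into `*refs` rather than dropping references inside the kill function is that the error path only has to call the kill helper, and the caller performs one combined drop afterwards. The pre-v6.15 form without the `refs` argument is not modeled here.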
### Step 6.2: BACKPORT COMPLICATIONS

- v6.15+: Clean apply (has `refs` parameter)
- v6.12 and older: `inet_frag_kill()` takes only `&fq->q` (no `refs`). Trivial adjustment: change `inet_frag_kill(&fq->q, refs)` to `inet_frag_kill(&fq->q)`.

### Step 6.3: RELATED FIXES IN STABLE

No other fix for these specific two paths found.

---

## PHASE 7: SUBSYSTEM AND MAINTAINER CONTEXT

### Step 7.1: SUBSYSTEM CRITICALITY

- **Subsystem**: net/ipv6 - IPv6 fragment reassembly
- **Criticality**: CORE - IPv6 networking affects virtually all modern systems
- Fragment reassembly is a critical network stack function

### Step 7.2: SUBSYSTEM ACTIVITY

The file sees regular activity, primarily from Eric Dumazet (Google) and other core net developers.

---

## PHASE 8: IMPACT AND RISK ASSESSMENT

### Step 8.1: AFFECTED POPULATION

**Universal** - any system receiving IPv6 fragmented traffic (which is any IPv6-enabled system).

### Step 8.2: TRIGGER CONDITIONS

- **Trivially triggerable**: Send a malformed IPv6 fragment from the network
- **No authentication required**: Raw network packets
- **Remote**: Attackable over the network without local access

### Step 8.3: FAILURE MODE SEVERITY

- Without fix: Fragment queues leak for up to 60 seconds per malformed fragment
- An attacker can exhaust `ip6frag_high_thresh` by sending many malformed fragment pairs (first valid fragment to create queue, then malformed to trigger the bug), causing **denial of service** for legitimate IPv6 fragment reassembly
- Severity: **HIGH** (remote DoS via resource exhaustion)

### Step 8.4: RISK-BENEFIT RATIO

- **Benefit**: HIGH - prevents remote resource exhaustion in core networking code
- **Risk**: VERY LOW - 6 lines, follows existing pattern exactly, only affects error paths for already-invalid packets
- **Ratio**: Strongly favorable

---

## PHASE 9: FINAL SYNTHESIS

### Step 9.1: EVIDENCE COMPILATION

**FOR backporting:**

- Fixes a real resource leak in IPv6 fragment reassembly (core networking)
- Remotely exploitable for DoS (no authentication needed)
- Bug exists in ALL stable trees (since 2005/2007)
- Tiny, surgical fix (6 lines) following existing code patterns
- Applied by net maintainer Jakub Kicinski
- Completes work started by 2018 fix (`2475f59c618ea`) that missed these paths
- The IPv4 equivalent was already fixed in 2018

**AGAINST backporting:**

- No explicit Cc: stable or Fixes: tag (expected - that's why we're reviewing)
- Older stable trees need trivial backport adjustment for `refs` parameter
- No syzbot report or user bug report cited

### Step 9.2: STABLE RULES CHECKLIST

1. Obviously correct and tested? **YES** - follows exact same pattern as `discard_fq` label
2. Fixes a real bug? **YES** - resource leak / potential DoS
3. Important issue? **YES** - remote resource exhaustion in core networking
4. Small and contained? **YES** - 6 lines in one file
5. No new features or APIs? **YES** - only adds cleanup to error paths
6. Can apply to stable? **YES** - clean apply on 6.15+, trivial adjustment for older

### Step 9.3: EXCEPTION CATEGORIES

Not applicable - this is a standard bug fix, not an exception category.
---

## Verification

- [Phase 1] Parsed tags: SOB from author (fmancera@suse.de), Link to patch.msgid.link, SOB from Jakub Kicinski (net maintainer)
- [Phase 2] Diff analysis: +6 lines in `ip6_frag_queue()`, adds `inet_frag_kill()` + stats at two early-return error paths
- [Phase 3] git blame: buggy `return -1` pattern introduced by `f61944efdf0d25` (v2.6.24, 2007), check code from `^1da177e4c3f41` (v2.6.12, 2005)
- [Phase 3] git show `2475f59c618ea`: confirmed 2018 fix missed these two paths specifically
- [Phase 3] git show `0ff89efb5246`: confirmed IPv4 equivalent "fail fast" approach
- [Phase 3] git show `eb0dfc0ef195a`: confirmed `refs` parameter was added in 2025 (v6.15 cycle)
- [Phase 4] Lore not accessible (anti-scraping); confirmed patch applied by Jakub Kicinski
- [Phase 5] Traced callers: `ipv6_frag_rcv()` -> `ip6_frag_queue()`, network input path
- [Phase 5] Read `inet_frag_kill()` implementation: kills timer, removes from hash, defers ref drops
- [Phase 5] Verified caller handles refs via `inet_frag_putn(&fq->q, refs)` at line 392
- [Phase 6] Code exists in all active stable trees (v5.10+)
- [Phase 6] Backport needs trivial adjustment for pre-v6.15 trees (no `refs` param)
- [Phase 8] Failure mode: remote resource exhaustion in IPv6 fragment reassembly, severity HIGH

**YES**

 net/ipv6/reassembly.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index 25ec8001898df..11f9144bebbe2 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -132,6 +132,9 @@ static int ip6_frag_queue(struct net *net,
 		/* note that if prob_offset is set, the skb is freed elsewhere,
 		 * we do not free it here.
 		 */
+		inet_frag_kill(&fq->q, refs);
+		__IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)),
+				IPSTATS_MIB_REASMFAILS);
 		return -1;
 	}
 
@@ -163,6 +166,9 @@ static int ip6_frag_queue(struct net *net,
 			 * this case. -DaveM
 			 */
 			*prob_offset = offsetof(struct ipv6hdr, payload_len);
+			inet_frag_kill(&fq->q, refs);
+			__IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)),
+					IPSTATS_MIB_REASMFAILS);
 			return -1;
 		}
 		if (end > fq->q.len) {
-- 
2.53.0