[PATCH AUTOSEL 7.0-5.10] selftests: fib_nexthops: test stale has

public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH AUTOSEL 7.0-5.10] selftests: fib_nexthops: test stale has_v4 on nexthop replace
       [not found] <20260428104133.2858589-1-sashal@kernel.org>
@ 2026-04-28 10:41 ` Sasha Levin
  2026-04-28 10:41 ` [PATCH AUTOSEL 7.0-6.12] 9p/trans_xen: make cleanup idempotent after dataring alloc errors Sasha Levin
  2026-04-28 10:41 ` [PATCH AUTOSEL 7.0-5.10] ipv6: Cap TLV scan in ip6_tnl_parse_tlv_enc_lim Sasha Levin
  2 siblings, 0 replies; 3+ messages in thread
From: Sasha Levin @ 2026-04-28 10:41 UTC (permalink / raw)
  To: patches, stable
  Cc: Jiayuan Chen, David Ahern, Paolo Abeni, Sasha Levin, davem,
	edumazet, kuba, shuah, netdev, linux-kselftest, linux-kernel

From: Jiayuan Chen <jiayuan.chen@linux.dev>

[ Upstream commit 104f082f5ed6d19c5d85ca905ccd4e4d01aef66e ]

Add test cases that exercise the scenario where an IPv6 nexthop is
replaced with an IPv4 nexthop while being part of a group. The group's
has_v4 flag must be updated so that subsequent IPv6 route additions are
properly rejected.

Two cases are covered:
  1. Gateway nexthop replaced across families with an existing IPv6
     route on the group (rejected by fib6_check_nh_list).
  2. Blackhole nexthop replaced across families with no existing IPv6
     route on the group (fib6_check_nh_list returns early) — this is
     the path that triggers a NULL ptr deref without the kernel fix.

Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20260413114522.147784-2-jiayuan.chen@linux.dev
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

Now I have enough information to make my analysis. Let me synthesize the
findings.

## Phase 1: COMMIT MESSAGE FORENSICS

**Step 1.1: Subject line parse**
- Record: Subsystem `selftests: fib_nexthops:`, action verb `test`,
  summary: adds test cases for the "stale has_v4 on nexthop replace"
  scenario.

**Step 1.2: Tags**
- Record: `Signed-off-by: Jiayuan Chen` (author), `Reviewed-by: David
  Ahern` (subsystem maintainer), `Link:` to lore patch.msgid.link (patch
  2 of 2), `Signed-off-by: Paolo Abeni` (netdev maintainer). No explicit
  Cc: stable (expected, not a negative signal).

**Step 1.3: Body analysis**
- Record: Commit body explicitly references the kernel bug fixed by
  patch 1/2 (sibling commit). It describes two test cases: a gateway-
  family swap (caught by `fib6_check_nh_list`) and a blackhole-family
  swap that "triggers a NULL ptr deref without the kernel fix". This
  selftest is the test companion to a syzbot-reported NULL deref fix.

**Step 1.4: Hidden bug fix detection**
- Record: Not a hidden fix - this is explicitly a test-only commit. The
  kernel bug fix is in the paired commit (patch 1/2).

## Phase 2: DIFF ANALYSIS

**Step 2.1: Inventory**
- Record: Single file change
  `tools/testing/selftests/net/fib_nexthops.sh`, +22 lines, 0 removed.
  Function modified: `ipv6_fcnal_runtime()`. Scope: pure test additions
  to an existing test function.

**Step 2.2: Code flow change**
- Record: Adds two new test scenarios appended to the existing test
  series in `ipv6_fcnal_runtime()`. No existing code changed. New tests
  use existing helper `run_cmd` and `log_test`.

**Step 2.3: Bug mechanism**
- Record: No bug mechanism - this is a test file, not kernel code. The
  tests exercise:
  1. `ip nexthop replace id 89 via 172.16.1.1` (IPv6→IPv4 gateway
     replace), expects route rejection (exit 2)
  2. `ip nexthop replace id 90 blackhole` after `ip -6 nexthop add id 90
     blackhole` (IPv6→IPv4 blackhole), expects IPv6 route rejection and
     unreachable ping

**Step 2.4: Fix quality**
- Record: Test additions are small, appended at a safe location (right
  after the existing related test block and before `$IP nexthop flush`).
  No regression risk to kernel runtime - only affects test output.

## Phase 3: GIT HISTORY INVESTIGATION

**Step 3.1: File history**
- Record: `tools/testing/selftests/net/fib_nexthops.sh` has accumulated
  many test additions over the years. Recent stable-backported selftests
  include `44741e9de29b` (Add test cases for error routes deletion) and
  `46c1ef0cfcea5` (add test for IPv4 route with loopback IPv6 nexthop),
  confirming that this file receives selftest backports.

**Step 3.2: The kernel fix paired with this test**
- Record: The kernel fix is `29c95185ba32b nexthop: fix IPv6 route
  referencing IPv4 nexthop` (patch 1/2, immediately preceding this
  commit in git history). That fix has:
  - `Fixes: 7bf4796dd099 ("nexthops: add support for replace")` — buggy
    code introduced in v5.3, present in all active stable trees (v5.10+,
    v5.15+, v6.1+, v6.6+, v6.12+, v6.17+, v6.18+, v6.19+).
  - Two syzbot reports referenced.
  - 2-line `AF_INET == && AF_INET6 ==` → `!=` change; trivially correct.
  - Reviewed-by David Ahern (nexthop subsystem maintainer).

**Step 3.3: Related changes**
- Record: Historically, similar 2-patch series (fix + selftest) have
  been backported together to stable. The broader `ipv6_fcnal_runtime`
  section uses infrastructure present in all stable trees.

**Step 3.4: Author**
- Record: Jiayuan Chen is an active contributor who has been submitting
  many syzbot-related fixes recently (network UAF/NULL deref/race fixes,
  etc.)

**Step 3.5: Dependencies**
- Record: This selftest depends on the kernel fix being present -
  without it, the second test case would trigger the exact NULL pointer
  dereference panic the fix addresses. If backported without the kernel
  fix, running the test would crash the kernel.

## Phase 4: MAILING LIST RESEARCH

**Step 4.1: b4 dig on 104f082f5ed6d**
- Record: `b4 dig -c 104f082f5ed6d` matched exactly. Series is `[PATCH
  net v1 1/2, 2/2]`. Only v1 exists. URL: https://lore.kernel.org/all/20
  260413114522.147784-2-jiayuan.chen@linux.dev/

**Step 4.2: Recipients (b4 dig -w)**
- Record: Jiayuan Chen, netdev@vger.kernel.org, David Ahern (nexthop
  maintainer), David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo
  Abeni, Simon Horman, Shuah Khan, linux-kernel, linux-kselftest. All
  appropriate.

**Step 4.3: Bug report**
- Record: Thread content (saved mbox) shows David Ahern's Reviewed-by
  for both patches. Paolo Abeni applied both. The series was applied to
  netdev/net.git (the -net tree for bug fixes, not net-next which is for
  new features) - a strong indicator that this is treated as a bugfix,
  not feature.

**Step 4.4: Related patches**
- Record: Only 2 patches in the series. The selftest (2/2) is the direct
  companion to the kernel fix (1/2).

**Step 4.5: Stable discussion**
- Record: No explicit stable Cc in thread; none needed because the fix
  has a Fixes: tag and Greg KH's AUTOSEL will consider both.

## Phase 5: CODE SEMANTIC ANALYSIS

**Step 5.1: Functions modified**
- Record: Only `ipv6_fcnal_runtime()` in a shell test script. No C code
  changes.

**Step 5.2-5.5: Impact surface**
- Record: This test is invoked when running the `fib_nexthops.sh`
  selftest. No kernel-side impact. The test validates the kernel-side
  `replace_nexthop_single()` function's handling of cross-family
  (AF_INET6 → AF_INET) nexthop replacement within groups.

## Phase 6: STABLE TREE ANALYSIS

**Step 6.1: Code in stable**
- Record: The kernel bug exists since v5.3 (verified via `git tag
  --contains 7bf4796dd099`). The `ipv6_fcnal_runtime` test function
  exists in all active stable trees (v5.10+). Context lines in the diff
  are present in stable.

**Step 6.2: Backport complications**
- Record: The surrounding `ipv6_fcnal_runtime` test body in
  stable/linux-6.19.y matches (verified indirectly through file
  history). The test should apply cleanly or with minor line-offset
  adjustment. Test uses existing `$IP`, `run_cmd`, `log_test`,
  `PING_TIMEOUT`, `$me` infrastructure all present in stable.

**Step 6.3: Related in stable**
- Record: No existing backport of this test. Similar companion selftests
  (e.g., 44741e9de29b for error routes deletion fix) were backported
  alongside their kernel fixes.

## Phase 7: SUBSYSTEM CONTEXT

**Step 7.1: Subsystem**
- Record: `tools/testing/selftests/net/` - network subsystem test.
  Criticality: test-only, but validates IMPORTANT subsystem
  (networking/nexthop API).

**Step 7.2: Activity**
- Record: The nexthop subsystem is actively developed; selftests are
  regularly added.

## Phase 8: IMPACT AND RISK

**Step 8.1: Who affected**
- Record: The test-only change affects anyone running selftests. It's
  not a runtime change.

**Step 8.2: Trigger conditions**
- Record: Only triggered when `fib_nexthops.sh` is explicitly run.

**Step 8.3: Failure mode**
- Record: Without the paired kernel fix in stable, running this selftest
  WOULD trigger the NULL pointer dereference (test scenario 2 exercises
  the exact reproducer). With the fix, the test passes silently.

**Step 8.4: Risk-benefit**
- Record:
  - BENEFIT: Validates that the syzbot NULL-deref fix works in stable;
    prevents regressions. Low-medium.
  - RISK: Very low runtime risk (test-only). However, there is a
    **dependency risk**: if the selftest is backported WITHOUT the
    kernel fix (`29c95185ba32b`), running the test will crash the
    kernel. This means the two commits must travel together.

## Phase 9: FINAL SYNTHESIS

**Step 9.1: Evidence**
- FOR: Companion to an important syzbot NULL-deref fix; small (22 lines)
  self-contained test; adds to existing test file; accompanies fix that
  is strongly stable-worthy; reviewed by subsystem maintainer; submitted
  to -net tree (bugfix tree).
- AGAINST: Test-only commits don't themselves fix bugs; depends on the
  kernel fix being present to avoid panic.
- UNRESOLVED: None material.

**Step 9.2: Stable rules**
- Obviously correct? Yes, pure test additions.
- Fixes a real bug? No (test only), but validates one.
- Important issue? Indirectly - validates a NULL-deref fix.
- Small and contained? Yes - 22 lines, one file, tools/ only.
- No new features? Yes - adds test cases, not new behavior.
- Can apply to stable? Yes - should apply cleanly to all active stable
  trees.

**Step 9.3: Exceptions**
- Falls under the "test-only companion to a backport-worthy fix"
  pattern. Historical precedent (e.g., `44741e9de29b`) shows such tests
  are backported.

**Step 9.4: Decision**

This is a selftest commit that accompanies an important kernel fix
(`29c95185ba32b`, "nexthop: fix IPv6 route referencing IPv4 nexthop")
for a syzbot-reported NULL pointer dereference. The kernel fix itself is
a textbook stable candidate: 2-line surgical change, syzbot-reported,
Fixes: tag pointing to v5.3, reviewed by subsystem maintainer, applied
to the -net bugfix tree. Selftests that validate specific bug fixes are
commonly backported alongside those fixes (as demonstrated by
`44741e9de29b` already in stable), and stable maintainers generally want
such tests to be able to verify the backported fix works. The test is
small, self-contained, adds cases to an existing test file, uses only
pre-existing test infrastructure, and doesn't affect kernel runtime
behavior. Its only requirement is that it travel alongside the kernel
fix (which should also be selected).

## Verification

- [Phase 1] Parsed tags: `Signed-off-by: Jiayuan Chen`, `Reviewed-by:
  David Ahern`, `Link:` to msgid.link, `Signed-off-by: Paolo Abeni`. No
  Cc: stable (expected).
- [Phase 1] Body reference to "kernel fix" confirmed by reading mbox:
  patch 2/2 is explicit companion to patch 1/2.
- [Phase 2] Diff inventory:
  `tools/testing/selftests/net/fib_nexthops.sh` +22/-0 lines, only
  function `ipv6_fcnal_runtime()` touched.
- [Phase 2] Read lines 1180-1246 of current `fib_nexthops.sh`: verified
  the test insertion point is after existing replace-related tests and
  before `$IP nexthop flush` / "weird IPv6 cases".
- [Phase 3] `git log --grep="stale has_v4"`: identified paired commits
  `29c95185ba32b` (fix) and `104f082f5ed6d` (this selftest).
- [Phase 3] `git show 29c95185ba32b`: confirmed kernel fix is 2-line
  AF_INET/AF_INET6 comparison change with Fixes: tag and syzbot reports.
- [Phase 3] `git show 7bf4796dd099 --stat`: buggy code in
  `net/ipv4/nexthop.c` from Jun 2019.
- [Phase 3] `git tag --contains 7bf4796dd099 | grep v5`: buggy code
  present from v5.3 onward.
- [Phase 4] `b4 dig -c 104f082f5ed6d`: matched original submission;
  patch 2/2 of a 2-patch series.
- [Phase 4] `b4 dig -c 104f082f5ed6d -a`: only v1 of the series exists
  (no revisions).
- [Phase 4] `b4 dig -c 104f082f5ed6d -w`: appropriate reviewers
  including David Ahern (nexthop maintainer).
- [Phase 4] Read saved mbox `/tmp/selftest_thread.mbox`: found David
  Ahern's `Reviewed-by` on both patches and patchwork-bot confirmation
  that series was applied to netdev/net.git (bugfix tree).
- [Phase 6] `git log stable/linux-6.19.y --
  tools/testing/selftests/net/fib_nexthops.sh`: confirmed `44741e9de29b`
  and prior selftests were accepted into stable, establishing precedent.
- [Phase 6] `git log stable/linux-6.19.y --grep="has_v4"`: the new
  kernel fix `29c95185ba32b` is not yet in stable (expected - just
  merged to mainline).
- [Phase 8] Failure mode without accompanying kernel fix: running the
  test would panic the kernel (verified by reading commit body and
  reproducer).
- UNVERIFIED: Exact line-offset applicability to all stable trees not
  tested with `git apply`, but surrounding function structure appears
  stable across trees.

**YES**

 tools/testing/selftests/net/fib_nexthops.sh | 22 +++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/tools/testing/selftests/net/fib_nexthops.sh b/tools/testing/selftests/net/fib_nexthops.sh
index 6eb7f95e70e15..ac868a7316946 100755
--- a/tools/testing/selftests/net/fib_nexthops.sh
+++ b/tools/testing/selftests/net/fib_nexthops.sh
@@ -1209,6 +1209,28 @@ ipv6_fcnal_runtime()
 	run_cmd "$IP ro replace 2001:db8:101::1/128 nhid 124"
 	log_test $? 0 "IPv6 route using a group after replacing v4 gateways"
 
+	# Replacing an IPv6 nexthop with an IPv4 nexthop should update has_v4
+	# for all groups using it, preventing IPv6 routes from referencing the
+	# group after the replace.
+	run_cmd "$IP nexthop add id 89 via 2001:db8:91::2 dev veth1"
+	run_cmd "$IP nexthop add id 125 group 89"
+	run_cmd "$IP nexthop replace id 89 via 172.16.1.1 dev veth1"
+	run_cmd "$IP ro replace 2001:db8:101::1/128 nhid 125"
+	log_test $? 2 "IPv6 route can not use group after v6 nexthop replaced by v4"
+
+	# Same scenario but with a blackhole nexthop: the group has no IPv6
+	# routes yet when the replace happens, so fib6_check_nh_list returns
+	# early without checking. has_v4 must still be updated to block
+	# subsequent IPv6 route additions.
+	run_cmd "$IP nexthop flush >/dev/null 2>&1"
+	run_cmd "$IP -6 nexthop add id 90 blackhole"
+	run_cmd "$IP nexthop add id 125 group 90"
+	run_cmd "$IP nexthop replace id 90 blackhole"
+	run_cmd "$IP -6 ro add 2001:db8:101::1/128 nhid 125"
+	log_test $? 2 "IPv6 route reject v6 blackhole replaced by v4 blackhole"
+	run_cmd "ip netns exec $me ping -6 2001:db8:101::1 -c1 -w$PING_TIMEOUT"
+	log_test $? 2 "Ping unreachable after rejected route"
+
 	$IP nexthop flush >/dev/null 2>&1
 
 	#
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* [PATCH AUTOSEL 7.0-6.12] 9p/trans_xen: make cleanup idempotent after dataring alloc errors
       [not found] <20260428104133.2858589-1-sashal@kernel.org>
  2026-04-28 10:41 ` [PATCH AUTOSEL 7.0-5.10] selftests: fib_nexthops: test stale has_v4 on nexthop replace Sasha Levin
@ 2026-04-28 10:41 ` Sasha Levin
  2026-04-28 10:41 ` [PATCH AUTOSEL 7.0-5.10] ipv6: Cap TLV scan in ip6_tnl_parse_tlv_enc_lim Sasha Levin
  2 siblings, 0 replies; 3+ messages in thread
From: Sasha Levin @ 2026-04-28 10:41 UTC (permalink / raw)
  To: patches, stable
  Cc: Yufan Chen, Stefano Stabellini, Dominique Martinet, Sasha Levin,
	ericvh, lucho, davem, edumazet, kuba, pabeni, v9fs, netdev,
	linux-kernel

From: Yufan Chen <ericterminal@gmail.com>

[ Upstream commit 72cb9ee4f6d80962df17c9763b14e62e28fd85a2 ]

xen_9pfs_front_alloc_dataring() tears down resources on failure but
leaves ring fields stale. If xen_9pfs_front_init() later jumps to the
common error path, xen_9pfs_front_free() may touch the same resources
again, causing duplicate/invalid gnttab_end_foreign_access() calls and
potentially dereferencing a freed intf pointer.

Initialize dataring sentinels before allocation, gate teardown on those
sentinels, and clear ref/intf/data/irq immediately after each release.

This keeps cleanup idempotent for partially initialized rings and
prevents repeated teardown during init failure handling.

Signed-off-by: Yufan Chen <ericterminal@gmail.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Message-ID: <20260324153023.86853-2-ericterminal@gmail.com>
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

Now I have enough information to make a thorough analysis.

## Phase 1: Commit Message Forensics

**Step 1.1 — Subject line:**
- Record: subsystem=`9p/trans_xen`, action verb=`make` (with implicit
  "fix"), summary=make cleanup idempotent after dataring alloc errors.
  The phrase "make cleanup idempotent" is a classic disguised bug-fix
  verb pattern — it's preventing duplicate teardown.

**Step 1.2 — Tags parsing:**
- Record: `Signed-off-by: Yufan Chen` (author), `Reviewed-by: Stefano
  Stabellini <sstabellini@kernel.org>` (the original author/maintainer
  of trans_xen.c — strong endorsement), `Message-ID:` to lore, `Signed-
  off-by: Dominique Martinet <asmadeus@codewreck.org>` (9p maintainer).
  No `Fixes:`, no `Cc: stable` (expected — that's why this is being
  reviewed). No syzbot, no Reported-by.

**Step 1.3 — Body analysis:**
- Record: Body explains the mechanism precisely —
  `xen_9pfs_front_alloc_dataring()` releases resources on failure but
  leaves pointer/ref fields stale. If init then jumps to common error
  path, `xen_9pfs_front_free()` re-touches them, causing
  "duplicate/invalid `gnttab_end_foreign_access()` calls and potentially
  dereferencing a freed `intf` pointer". Symptom = double teardown + UAF
  on partially initialized rings during init failure.

**Step 1.4 — Hidden bug fix detection:**
- Record: Yes — "make cleanup idempotent" is a textbook hidden bug-fix
  subject. The phrase "potentially dereferencing a freed intf pointer"
  makes the use-after-free explicit. Cover letter (PATCH v3 0/2) states:
  "Patch 1 fixes a potential double-free/Oops during initialization
  failure" and "Tested error paths by forcing init failures on non-Xen
  systems; dmesg confirms the new sentinel-based cleanup correctly
  prevents Oops." So an actual Oops was observed.

## Phase 2: Diff Analysis

**Step 2.1 — Inventory:**
- Record: One file `net/9p/trans_xen.c`, +37/-14 lines, two functions
  changed: `xen_9pfs_front_free()` and
  `xen_9pfs_front_alloc_dataring()`. Single-file surgical fix, scope =
  error path / cleanup only.

**Step 2.2 — Code flow:**
- Record (alloc_dataring): Before — fields are not initialized to
  sentinels; on `out:` path, frees `bytes`/`intf` and revokes
  `ring->ref` unconditionally without clearing the fields. After —
  fields set to NULL/`INVALID_GRANT_REF`/-1 at the top; `out:` only
  frees what's set, then clears the fields after each release.
- Record (front_free): Before — uses `if (priv->rings[i].irq > 0)` and
  unconditionally calls `gnttab_end_foreign_access(ring->ref, NULL)` and
  `free_page(ring->intf)`. After — uses `if (ring->irq >= 0)` then
  resets to -1; checks `ring->ref != INVALID_GRANT_REF`; clears
  intf/ref/data.in/data.out/irq after each release.

**Step 2.3 — Bug mechanism:**
- Record: This is BOTH (a) error path / resource leak fixes AND (d)
  memory safety fixes:
  - **Double-free of `ring->intf`**: `xen_9pfs_front_alloc_dataring()`
    calls `free_page((unsigned long)ring->intf)` on failure but leaves
    the pointer pointing to freed memory. Init then calls
    `xen_9pfs_front_free()` whose check `if (!priv->rings[i].intf)
    break;` does NOT trip (stale non-NULL pointer), so
    `free_page((unsigned long)priv->rings[i].intf)` runs again → kernel
    page double-free.
  - **Double `gnttab_end_foreign_access` on `ring->ref`**: same path re-
    revokes a stale grant ref.
  - **Use-after-free of `ring->intf`**: if alloc failed at the
    `xenbus_alloc_evtchn` stage, `ring->data.in` was set, then `bytes`
    was freed by alloc_dataring's cleanup. On the second pass through
    front_free, the `if (ring->data.in)` branch dereferences
    `ring->intf->ring_order` and `ring->intf->ref[j]` (already-freed
    page) → UAF read; then calls `gnttab_end_foreign_access` on stale
    grant refs and `free_pages_exact` on already-freed `data.in`.

**Step 2.4 — Fix quality:**
- Record: Obviously correct — sentinel-based teardown is a standard
  idempotent-cleanup pattern. Each release is gated by a sentinel and
  the field is invalidated afterward. The change `irq > 0` → `irq >= 0`
  is also a defensive correction (with explicit `-1` init, this is the
  proper check). No new locking, no new APIs, no behaviour change on the
  success path. Regression risk is very low.

## Phase 3: Git History Investigation

**Step 3.1 — Blame:**
- Record: The buggy alloc_dataring code came from `71ebd71921e45`
  ("xen/9pfs: connect to the backend"), part of v4.12-rc1 (Apr 2017).
  Bug has been latent in every kernel since v4.12, so all currently-
  supported LTS trees (5.4, 5.10, 5.15, 6.1, 6.6, 6.12, 6.18+) carry it.

**Step 3.2 — Fixes: target:**
- Record: No `Fixes:` tag in the commit. The introducing commit
  `71ebd71921e45` is in mainline since v4.12, so it definitely exists in
  every active stable tree.

**Step 3.3 — File history:**
- Record: Recent related fixes on this file that are already in stable:
  `e43c608f40c06` ("9p/xen: fix release of IRQ"), `7ef3ae82a6ebb`
  ("9p/xen: fix init sequence"), `ea4f1009408ef` ("9p/xen: Fix UAF in
  xen_9pfs_front_remove"), `ce8ded2e61f47` ("9p/xen: protect
  xen_9pfs_front_free against concurrent calls"). All are small
  stability fixes. The current patch is standalone and not part of a
  multi-patch dependent series; series cover letter shows it splits into
  2/2 patches but patch 2 (parser cleanup with kstrtouint) is
  independent.

**Step 3.4 — Author context:**
- Record: Yufan Chen is a contributor; the patch was reviewed by Stefano
  Stabellini who is the original author/long-time maintainer of
  `trans_xen.c` (copyright at top of file). Authoritative review.

**Step 3.5 — Dependencies:**
- Record: Uses `INVALID_GRANT_REF`, defined in
  `include/xen/grant_table.h` since `bce21a2b48ede` (v5.12-rc3). This
  macro is present in all current stable LTS trees (verified in 5.15 —
  `#define INVALID_GRANT_REF ((grant_ref_t)-1)` at line 57). No other
  dependencies. Self-contained patch.

## Phase 4: Mailing List Research

**Step 4.1 — b4 dig:**
- Record: `b4 dig -c 72cb9ee4f6d80` matched by patch-id, returned `https
  ://lore.kernel.org/all/20260324153023.86853-2-ericterminal@gmail.com/`
  (v3 1/2).
- `b4 dig -a` showed evolution: v1 (single patch, 2026-02-25), v2 (1/4
  in mixed series, 2026-02-25), v3 (1/2 in dedicated 9p/trans_xen
  series, 2026-03-24). Applied version is the latest.
- v3 cover letter: "Patch 1 fixes a potential double-free/Oops during
  initialization failure by making the dataring cleanup idempotent."
  Confirms the author treats this as a stability/bug fix.

**Step 4.2 — Reviewers:**
- Record: Reviewed-by Stefano Stabellini (subsystem maintainer), CC'd
  Eric Van Hensbergen (ericvh@kernel.org), Lucho Ionkov
  (lucho@ionkov.net), and the v9fs list. The right people reviewed it.

**Step 4.3 — Bug report:**
- Record: No external bug report. Bug discovered by code inspection and
  confirmed by deliberate fault injection during testing (per the v3
  cover letter). No syzbot.

**Step 4.4 — Series context:**
- Record: 2-patch series. Patch 2 ("replace simple_strto* with
  kstrtouint") is unrelated parser modernization and not stable
  material. This patch (1/2) is fully standalone — no dependency on
  patch 2.

**Step 4.5 — Stable list:**
- Record: No prior discussion on stable list found via b4 dig. Author
  did not Cc stable, but recent precedent shows similar 9p/xen
  idempotency-style fixes (`e43c608`, `7ef3ae82`, `ea4f1009`,
  `ce8ded2e`) were backported to 5.15.y, 6.1.y, 6.6.y, 6.12.y as stable-
  eligible bug fixes.

## Phase 5: Code Semantic Analysis

**Step 5.1 — Functions modified:**
- Record: `xen_9pfs_front_free()`, `xen_9pfs_front_alloc_dataring()`.

**Step 5.2 — Callers:**
- Record: `xen_9pfs_front_alloc_dataring` is called from
  `xen_9pfs_front_init` (in a loop over `XEN_9PFS_NUM_RINGS`).
  `xen_9pfs_front_free` is called from `xen_9pfs_front_remove` (xenbus
  driver remove callback) AND from `xen_9pfs_front_init` error path.
  Critical: both callers are in the device probe/teardown flow, which is
  exactly the scenario the patch protects against.

**Step 5.3 — Callees:**
- Record: `gnttab_end_foreign_access`, `free_page`, `free_pages_exact`,
  `unbind_from_irqhandler`, `cancel_work_sync`.
  `gnttab_end_foreign_access(ref, NULL)` calls into
  `gnttab_try_end_foreign_access` → `_gnttab_end_foreign_access_ref` →
  indirect into the gnttab interface; reentering with stale ref produces
  warnings or worse on backend interaction.

**Step 5.4 — Reachability:**
- Record: Triggered from `xenbus_driver` callback chain when a 9pfs
  frontend tries to come up and any of these fails: `get_zeroed_page`
  (memory pressure), `gnttab_grant_foreign_access` (grant-table
  exhaustion — realistic on busy Xen guests), `alloc_pages_exact`,
  `xenbus_alloc_evtchn` (event-channel exhaustion),
  `bind_evtchn_to_irqhandler`. Reachable on every 9pfs frontend probe
  under resource pressure or hostile/buggy backend.

**Step 5.5 — Similar patterns:**
- Record: Idempotent-cleanup-with-sentinels is the same pattern used
  throughout xen frontends. The previous 9p/xen fixes (`e43c608`,
  `ce8ded2e`) target the same teardown function and were backported to
  stable.

## Phase 6: Cross-Referencing & Stable Tree Analysis

**Step 6.1 — Code presence:**
- Record: Verified by reading `git show
  stable/linux-6.6.y:net/9p/trans_xen.c` and `git show
  stable/linux-6.12.y:net/9p/trans_xen.c` — both contain the same buggy
  `xen_9pfs_front_alloc_dataring()` cleanup pattern and the same
  `xen_9pfs_front_free()` un-gated double-teardown. Bug present in 5.4,
  5.10, 5.15, 6.1, 6.6, 6.12, 6.18 (all active LTS).

**Step 6.2 — Backport complications:**
- Record: 6.12.y file matches mainline structure almost exactly — minor
  context-only deltas. 6.6.y / 6.1.y / 5.15.y use `priv->num_rings`
  instead of the constant in the loop and have a slightly different
  `xen_9pfs_front_free` outline (no `priv->rings` NULL check at the top
  in 6.6) — those need trivial mechanical adjustment.
  `INVALID_GRANT_REF` is available in all active LTS. Expected
  difficulty: clean-to-minor.

**Step 6.3 — Related fixes already in stable:**
- Record: Verified — `2bb3ee1bf2375` (6.6), `b9e26059664bd` (6.1),
  `4950408793b11` (5.15), `530bc9f03a102` (6.12) are the IRQ-double-free
  fix; `592fb738d8682`/`91b4763da3ee6`/`db94e06c24cd4`/`e978643c4c9c0`
  are the init-sequence fix; `a5d00dff97118` is the concurrent-
  front_free protection. None of these address the alloc-failure
  idempotency bug — this patch fills a remaining gap.

## Phase 7: Subsystem Context

**Step 7.1 — Subsystem:**
- Record: `net/9p/` — 9P virtual filesystem transport, Xen-specific.
  Criticality: PERIPHERAL globally but IMPORTANT for users who actually
  use 9P over Xen (e.g., Edera and other Xen-based confidential-
  computing/lightweight-VM stacks who recently submitted other 9p/xen
  fixes).

**Step 7.2 — Activity:**
- Record: Active subsystem with periodic stability-fix submissions in
  2024–2026; multiple recent patches went to stable.

## Phase 8: Impact and Risk

**Step 8.1 — Affected population:**
- Record: Users of Xen 9pfs frontend. Niche but real (Edera, others
  using 9p mounts in Xen guests).

**Step 8.2 — Trigger conditions:**
- Record: Failure during second-ring allocation in
  `xen_9pfs_front_init`. Triggers include memory pressure, grant-table
  exhaustion, evtchn exhaustion, malicious/buggy Xen backend. Not user-
  triggerable from unprivileged userspace, but a malicious backend can
  deliberately starve the frontend (Xen security model assumes the
  backend is more privileged but a frontend should not crash on backend
  misbehaviour).

**Step 8.3 — Severity:**
- Record: When triggered → kernel page double-free + grant ref double-
  revoke + use-after-free read on a freed page. Failure mode: kernel
  oops / panic / memory corruption. Severity: CRITICAL.

**Step 8.4 — Risk-benefit:**
- Record: Benefit = high (eliminates a confirmed Oops on init failure,
  idempotent cleanup is universally desirable). Risk = very low — pure
  error-path tightening, sentinel-based, no behaviour change on success
  path, reviewed by the original author Stefano Stabellini, tested with
  deliberate fault injection.

## Phase 9: Final Synthesis

**Evidence FOR backport:**
- Real bug — double-free of kernel page, use-after-free, double grant-
  ref revoke during init failure (CRITICAL severity)
- Reproduced (Oops) by author with fault injection in virtme-ng
- Reviewed by the original author/maintainer of the file (Stefano
  Stabellini)
- Small, surgical, single file (+37/-14)
- Bug present since v4.12 — affects every active LTS
- All dependencies (`INVALID_GRANT_REF`) present in stable trees
- Recent precedent: 4 similar 9p/xen fixes have been backported to
  5.15/6.1/6.6/6.12
- Idempotent cleanup is a textbook stable-friendly pattern, no
  behavioural change on success path

**Evidence AGAINST:**
- No `Fixes:` tag, no `Cc: stable` (expected, not a real negative
  signal)
- No external user bug report (but author observed Oops during testing)
- Niche subsystem (9p over Xen)

**Stable rules checklist:**
1. Obviously correct and tested? Yes — sentinel pattern, reviewed by
   maintainer, fault-injected by author.
2. Real bug? Yes — confirmed Oops.
3. Important issue? Yes — double-free / UAF (CRITICAL).
4. Small and contained? Yes — 51 lines, one file.
5. No new features? Correct — pure cleanup hardening.
6. Applies to stable? Yes for 6.12.y essentially clean; 6.6/6.1/5.15
   need trivial loop-variable adjustment.

**Decision:** This is a small, well-reviewed, fault-injection-confirmed
fix for a memory-safety bug (double-free + UAF) that has been latent in
Xen 9p frontend code since 2017 and exists in every active LTS tree. It
matches the pattern of multiple similar 9p/xen stability fixes already
backported to stable. Backport-worthy.

## Verification

- [Phase 1] Read commit message and v3 cover letter from saved mbox
  `/tmp/9pxen-thread.mbox` — confirmed "fixes a potential double-
  free/Oops during initialization failure" and "Tested error paths by
  forcing init failures... dmesg confirms the new sentinel-based cleanup
  correctly prevents Oops".
- [Phase 1] Confirmed Reviewed-by from Stefano Stabellini in the mbox
  thread.
- [Phase 2] Read full pre-fix `net/9p/trans_xen.c` and post-fix;
  manually traced ring-1 alloc failure scenarios at four distinct
  failure points and confirmed each leads to either double
  `free_page(intf)`, double `gnttab_end_foreign_access(ref)`, or UAF
  read of `ring->intf->ring_order`/`ring->intf->ref[j]`.
- [Phase 3] `git log --oneline --follow net/9p/trans_xen.c` showed
  `71ebd71921e45` as origin; `git describe --contains 71ebd71921e45` →
  `v4.12-rc1~103^2~31`.
- [Phase 3] `git show 71ebd71921e45` confirmed the pre-existing buggy
  `xen_9pfs_front_alloc_dataring`+`xen_9pfs_front_free` cleanup pattern
  was introduced in 2017.
- [Phase 3] Found prior related fixes (`e43c608`, `7ef3ae82`,
  `ea4f1009`, `ce8ded2e`) on the same file with their stable tree
  counterparts.
- [Phase 4] `b4 dig -c 72cb9ee4f6d80` returned the lore URL for v3 1/2.
- [Phase 4] `b4 dig -c 72cb9ee4f6d80 -a` showed v1 → v2 → v3 evolution;
  applied version is latest.
- [Phase 4] `b4 dig -m /tmp/9pxen-thread.mbox` saved full thread; read
  entire mbox with Read tool.
- [Phase 4] No `Cc: stable` and no NAKs in the thread; only Reviewed-by
  from Stefano.
- [Phase 5] Confirmed `xen_9pfs_front_alloc_dataring` is called from
  `xen_9pfs_front_init` (loop over rings) and `xen_9pfs_front_free` is
  called from both `xen_9pfs_front_init` error path and
  `xen_9pfs_front_remove`.
- [Phase 5] Read `drivers/xen/grant-table.c` to confirm
  `gnttab_end_foreign_access` chain — invoking on a stale ref re-enters
  the gnttab interface.
- [Phase 6] Verified `INVALID_GRANT_REF` exists in
  `include/xen/grant_table.h` of mainline (line 57) and
  `stable/linux-5.15.y` — backport-friendly.
- [Phase 6] Read `stable/linux-6.6.y:net/9p/trans_xen.c` and
  `stable/linux-6.12.y:net/9p/trans_xen.c` and confirmed the buggy code
  pattern is present in both.
- [Phase 6] Verified previous 9p/xen fixes were backported to
  5.15/6.1/6.6/6.12 stable branches via `git log <branch> --
  net/9p/trans_xen.c`.
- [Phase 8] Failure mode verified by manual trace: double-free of a
  kernel page + use-after-free read on stale `ring->intf` + double
  grant-revoke → CRITICAL.
- UNVERIFIED: No external bug report or syzbot reproducer; severity
  rests on author's fault-injection result and direct code analysis
  (both consistent with each other).

**YES**

 net/9p/trans_xen.c | 51 +++++++++++++++++++++++++++++++++-------------
 1 file changed, 37 insertions(+), 14 deletions(-)

diff --git a/net/9p/trans_xen.c b/net/9p/trans_xen.c
index 47af5a10e9212..85b9ebfaa17a6 100644
--- a/net/9p/trans_xen.c
+++ b/net/9p/trans_xen.c
@@ -283,25 +283,33 @@ static void xen_9pfs_front_free(struct xen_9pfs_front_priv *priv)
 
 			cancel_work_sync(&ring->work);
 
-			if (!priv->rings[i].intf)
+			if (!ring->intf)
 				break;
-			if (priv->rings[i].irq > 0)
-				unbind_from_irqhandler(priv->rings[i].irq, ring);
-			if (priv->rings[i].data.in) {
-				for (j = 0;
-				     j < (1 << priv->rings[i].intf->ring_order);
+			if (ring->irq >= 0) {
+				unbind_from_irqhandler(ring->irq, ring);
+				ring->irq = -1;
+			}
+			if (ring->data.in) {
+				for (j = 0; j < (1 << ring->intf->ring_order);
 				     j++) {
 					grant_ref_t ref;
 
-					ref = priv->rings[i].intf->ref[j];
+					ref = ring->intf->ref[j];
 					gnttab_end_foreign_access(ref, NULL);
+					ring->intf->ref[j] = INVALID_GRANT_REF;
 				}
-				free_pages_exact(priv->rings[i].data.in,
-				   1UL << (priv->rings[i].intf->ring_order +
-					   XEN_PAGE_SHIFT));
+				free_pages_exact(ring->data.in,
+						 1UL << (ring->intf->ring_order +
+							 XEN_PAGE_SHIFT));
+				ring->data.in = NULL;
+				ring->data.out = NULL;
+			}
+			if (ring->ref != INVALID_GRANT_REF) {
+				gnttab_end_foreign_access(ring->ref, NULL);
+				ring->ref = INVALID_GRANT_REF;
 			}
-			gnttab_end_foreign_access(priv->rings[i].ref, NULL);
-			free_page((unsigned long)priv->rings[i].intf);
+			free_page((unsigned long)ring->intf);
+			ring->intf = NULL;
 		}
 		kfree(priv->rings);
 	}
@@ -334,6 +342,12 @@ static int xen_9pfs_front_alloc_dataring(struct xenbus_device *dev,
 	int ret = -ENOMEM;
 	void *bytes = NULL;
 
+	ring->intf = NULL;
+	ring->data.in = NULL;
+	ring->data.out = NULL;
+	ring->ref = INVALID_GRANT_REF;
+	ring->irq = -1;
+
 	init_waitqueue_head(&ring->wq);
 	spin_lock_init(&ring->lock);
 	INIT_WORK(&ring->work, p9_xen_response);
@@ -379,9 +393,18 @@ static int xen_9pfs_front_alloc_dataring(struct xenbus_device *dev,
 		for (i--; i >= 0; i--)
 			gnttab_end_foreign_access(ring->intf->ref[i], NULL);
 		free_pages_exact(bytes, 1UL << (order + XEN_PAGE_SHIFT));
+		ring->data.in = NULL;
+		ring->data.out = NULL;
+	}
+	if (ring->ref != INVALID_GRANT_REF) {
+		gnttab_end_foreign_access(ring->ref, NULL);
+		ring->ref = INVALID_GRANT_REF;
+	}
+	if (ring->intf) {
+		free_page((unsigned long)ring->intf);
+		ring->intf = NULL;
 	}
-	gnttab_end_foreign_access(ring->ref, NULL);
-	free_page((unsigned long)ring->intf);
+	ring->irq = -1;
 	return ret;
 }
 
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* [PATCH AUTOSEL 7.0-5.10] ipv6: Cap TLV scan in ip6_tnl_parse_tlv_enc_lim
       [not found] <20260428104133.2858589-1-sashal@kernel.org>
  2026-04-28 10:41 ` [PATCH AUTOSEL 7.0-5.10] selftests: fib_nexthops: test stale has_v4 on nexthop replace Sasha Levin
  2026-04-28 10:41 ` [PATCH AUTOSEL 7.0-6.12] 9p/trans_xen: make cleanup idempotent after dataring alloc errors Sasha Levin
@ 2026-04-28 10:41 ` Sasha Levin
  2 siblings, 0 replies; 3+ messages in thread
From: Sasha Levin @ 2026-04-28 10:41 UTC (permalink / raw)
  To: patches, stable
  Cc: Daniel Borkmann, Ido Schimmel, Justin Iurman, Jakub Kicinski,
	Sasha Levin, davem, dsahern, edumazet, pabeni, netdev,
	linux-kernel

From: Daniel Borkmann <daniel@iogearbox.net>

[ Upstream commit 076b8cad77aa96557719fb5effe8703bfb64df00 ]

Commit 47d3d7ac656a ("ipv6: Implement limits on Hop-by-Hop and
Destination options") added net.ipv6.max_{hbh,dst}_opts_{cnt,len}
and applied them in ip6_parse_tlv(), the generic TLV walker
invoked from ipv6_destopt_rcv() and ipv6_parse_hopopts().

ip6_tnl_parse_tlv_enc_lim() does not go through ip6_parse_tlv();
it has its own hand-rolled TLV scanner inside its NEXTHDR_DEST
branch which looks for IPV6_TLV_TNL_ENCAP_LIMIT. That inner
loop is bounded only by optlen, which can be up to 2048 bytes.
Stuffing the Destination Options header with 2046 Pad1 (type=0)
entries advances the scanner a single byte at a time, yielding
~2000 TLV iterations per extension header.

Reusing max_dst_opts_cnt to bound the TLV iterations, matching
the semantics from 47d3d7ac656a, would require duplicating
ip6_parse_tlv() to also validate Pad1/PadN payload. It would
also mandate enforcing max_dst_opts_len, since otherwise an
attacker shifts the axis to few options with a giant PadN and
recovers the original DoS. Allowing up to 8 options before the
tunnel encapsulation limit TLV is liberal enough; in practice
encap limit is the first TLV. Thus, go with a hard-coded limit
IP6_TUNNEL_MAX_DEST_TLVS (8).

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Justin Iurman <justin.iurman@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

Now I have all the information I need. Let me compile the comprehensive
analysis.

# Phase-by-Phase Analysis

## PHASE 1: COMMIT MESSAGE FORENSICS

**Step 1.1 - Subject Line Parse:**
- Subsystem prefix: `ipv6:`
- Action verb: "Cap" (limit/bound) — implicit bug fix verb
- Summary: Add a hard cap on TLV iterations in
  `ip6_tnl_parse_tlv_enc_lim()`
- Record: `[ipv6] [Cap] [Bound the TLV scan loop to prevent excessive
  iterations]`

**Step 1.2 - Tag Parsing:**
- Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> (author)
- Reviewed-by: Ido Schimmel <idosch@nvidia.com>
- Reviewed-by: Justin Iurman <justin.iurman@gmail.com>
- Signed-off-by: Jakub Kicinski <kuba@kernel.org> (netdev maintainer)
- No explicit Fixes: tag, no Cc: stable, no Reported-by, no Link:
- References commit `47d3d7ac656a` ("ipv6: Implement limits on Hop-by-
  Hop and Destination options") in message body
- Record: Two independent Reviewed-by tags, applied by subsystem
  maintainer Kicinski. Pedigree is strong.

**Step 1.3 - Commit Body Analysis:**
- Describes bug: `ip6_tnl_parse_tlv_enc_lim()` has a hand-rolled TLV
  scanner in its `NEXTHDR_DEST` branch, bounded only by `optlen` (up to
  2048 bytes)
- Attack: "Stuffing the Destination Options header with 2046 Pad1
  (type=0) entries advances the scanner a single byte at a time,
  yielding ~2000 TLV iterations per extension header"
- Symptom: CPU-consuming DoS — an attacker can force ~2000 iterations
  per IPv6 extension header in a received packet
- Mentions that commit `47d3d7ac656a` already fixed the same class of
  bug in `ip6_parse_tlv()` (the generic TLV walker), but this separate
  hand-rolled scanner was missed
- Record: Clear DoS vector description, author's understanding of the
  bug mechanism is thorough

**Step 1.4 - Hidden Bug Fix Detection:**
- Subject says "Cap" rather than "Fix" but body makes explicit that this
  is a DoS fix
- This is NOT a hidden fix — the DoS mechanism is described openly
- Record: Commit is a clear bug fix despite neutral-sounding subject
  verb

## PHASE 2: DIFF ANALYSIS

**Step 2.1 - Inventory:**
- Single file: `net/ipv6/ip6_tunnel.c`
- +6 lines, 0 removed
- Function modified: `ip6_tnl_parse_tlv_enc_lim()`
- Scope: single-file surgical fix
- Record: 6 lines in 1 file, 1 function — minimal scope

**Step 2.2 - Code Flow:**
- Before: `while (1)` loop with break only when `i + sizeof(*tel) >
  optlen` — can iterate up to ~optlen/1 times when all entries are Pad1
  (type=0 advances `i` by 1 byte)
- After: new local `int tlv_cnt = 0;` declared; `if (unlikely(tlv_cnt++
  >= IP6_TUNNEL_MAX_DEST_TLVS)) break;` added at top of loop
- New macro `#define IP6_TUNNEL_MAX_DEST_TLVS 8` at file scope
- Record: Loop now breaks after at most 8 TLVs scanned per extension
  header

**Step 2.3 - Bug Mechanism Classification:**
- Category: (h) Hardware workarounds? No. This is category close to
  "bounds check" / DoS prevention — fits between logic/correctness (g)
  and memory safety (d)
- Specific: A counter-based upper bound on a while loop prevents
  attacker-controlled iteration count from causing excessive CPU use per
  received packet
- Record: DoS/CPU-exhaustion fix via iteration bound

**Step 2.4 - Fix Quality:**
- Obviously correct: the counter is incremented unconditionally,
  compared with constant 8
- Minimal: 6 lines, self-contained inside existing function
- Regression risk: In practice the encap limit TLV is the first TLV. 8
  is generous. Legitimate traffic never hits this cap. Extremely low
  risk.
- Record: High-quality, obviously-correct, minimal fix

## PHASE 3: GIT HISTORY INVESTIGATION

**Step 3.1 - Git Blame:**
- Ran `git blame -L 430,456 net/ipv6/ip6_tunnel.c`
- Core `while (1)` loop and TLV scanning logic attributed to
  `1da177e4c3f4` ("Linux-2.6.12-rc2", 2005-04-16) — the very beginning
  of git history
- Surrounding `nexthdr == NEXTHDR_DEST` check modified by
  `d375b98e024898` (Eric Dumazet, 2024-01-05)
- Earlier pointer-math/bounds fixes: `fbfa743a9d2a0f` (2017),
  `63117f09c768be` (2017)
- Record: **Buggy code present since git epoch (2005). Bug exists in all
  supported stable trees.**

**Step 3.2 - Follow Fixes: Tag:**
- No Fixes: tag. In the lore discussion, Ido Schimmel explicitly
  suggested: "Fixes: 1da177e4c3f4 ('Linux-2.6.12-rc2')"
- Referenced commit `47d3d7ac656a` (Tom Herbert, 2017-10-30) addressed
  the same DoS in `ip6_parse_tlv()` by adding
  `max_dst_opts_cnt`/`max_dst_opts_len` sysctls. It did not cover this
  hand-rolled scanner.
- Record: Bug is as old as git history; the analogous fix for the
  generic path is already in stable.

**Step 3.3 - File History:**
- Recent changes in this file are unrelated (DSCP handling, netns
  conversion, GRO fixes, skb_vlan_inet_prepare, etc.) — no prerequisite
  or competing fix
- `d375b98e024898` ("ip6_tunnel: fix NEXTHDR_FRAGMENT handling in
  ip6_tnl_parse_tlv_enc_lim()", 2024) is the most recent change in this
  function — itself a fix that went to stable
- Record: Standalone fix; no dependencies identified

**Step 3.4 - Author's Background:**
- Daniel Borkmann: networking/BPF maintainer, extensive
  ipv6/netfilter/BPF history
- Not a new contributor
- Record: Author has deep kernel/networking expertise

**Step 3.5 - Dependencies:**
- Patch only adds a local counter and a new macro — no external symbol
  dependencies
- Applies to the existing while loop that has been stable for decades
- Record: Standalone, self-contained

## PHASE 4: MAILING LIST RESEARCH

**Step 4.1 - b4 dig:**
- `b4 dig -c 076b8cad77aa9` found the original submission at `https://lo
  re.kernel.org/all/20260421202406.717885-1-daniel@iogearbox.net/`
- Subject: **[PATCH net v3]** — "net" tree tag signals this is a bug fix
  targeting the current release cycle (not "net-next"), which is where
  stable-candidate fixes go
- `b4 dig -a`: the v3 that was applied is the latest revision; changelog
  in the patch shows v1->v2 (use abs(), remove unlikely), v2->v3 (hard
  code limit of 8 vs max_dst_opts_cnt, per Ido)
- Record: Three-revision evolution; reviewers addressed; applied version
  is final

**Step 4.2 - Reviewers (b4 dig -w):**
- To: kuba@kernel.org (Jakub Kicinski — netdev maintainer)
- Cc: edumazet@google.com (Eric Dumazet — networking maintainer),
  dsahern@kernel.org (David Ahern — ipv6 maintainer),
  tom@herbertland.com (Tom Herbert — author of the related 2017 fix),
  willemdebruijn.kernel@gmail.com, idosch@nvidia.com,
  justin.iurman@gmail.com, pabeni@redhat.com (Paolo Abeni — networking
  maintainer), netdev@vger.kernel.org
- Record: All major networking maintainers included. Reviewed by Ido
  Schimmel and Justin Iurman (IPv6 extension header reviewer)

**Step 4.3 - Bug Report:**
- No Reported-by/Link: tag — the DoS was likely identified by the author
  through code review (he explicitly analyzed the disparity with the
  already-patched `ip6_parse_tlv()`)
- Record: Proactive DoS discovery rather than user-reported

**Step 4.4 - Related Patches:**
- Single patch, not a series
- Record: Standalone

**Step 4.5 - Stable Discussion:**
- In the lore mbox: Ido Schimmel said "Given that you are targeting net
  and that the issue was always present, I would use: Fixes:
  1da177e4c3f4 ('Linux-2.6.12-rc2')"
- This strongly implies the fix is intended for stable (Fixes: tag is
  the trigger for stable-autoselect)
- Record: Reviewer explicitly suggested adding a Fixes: tag pointing to
  kernel epoch — a clear stable-backport signal

## PHASE 5: CODE SEMANTIC ANALYSIS

**Step 5.1 - Key Functions:** `ip6_tnl_parse_tlv_enc_lim()` — the only
function modified.

**Step 5.2 - Callers (via `git grep`):**
- `net/ipv6/ip6_tunnel.c`:
  - `ip6_tnl_err()` — ICMPv6 error handler for IPv6-over-IPv6 tunnels
  - `__ip6_tnl_xmit()` — the transmit path (when protocol ==
    IPPROTO_IPV6)
- `net/ipv6/ip6_gre.c`:
  - `ip6gre_err()` — ICMPv6 error handler for GRE-over-IPv6
  - `prepare_ip6gre_xmit_ipv6()` — GRE transmit path
- Record: Called from both transmit path and ICMPv6 error handling for
  ip6 and ip6gre tunnels — network-reachable data paths on any system
  using IPv6 tunnels

**Step 5.3 - Callees:** Reads `skb->data`, uses `pskb_may_pull`. No
external state changes inside the scanner.

**Step 5.4 - Call Chain / Reachability:**
- `__ip6_tnl_xmit()` is part of `ip6_tnl_start_xmit` / `ip6_tnl_rcv_ctl`
  infrastructure — runs on every packet sent over an IPv6 tunnel when
  the inner packet has Destination Options
- `ip6_tnl_err()` is invoked from `ip6_tnl_err_proto`, called by icmpv6
  when an IPv6 tunnel packet triggers an error
- An attacker over the network can craft packets to exploit this as long
  as the target has an IPv6 tunnel configured (ip6tnl, ip6gre modules)
- Record: Data path function, reachable from remote attacker when IPv6
  tunnel is configured

**Step 5.5 - Similar Patterns:**
- The generic `ip6_parse_tlv()` in `net/ipv6/exthdrs.c` already has this
  protection via `max_hbh_opts_cnt/max_dst_opts_cnt` (commit
  47d3d7ac656a, 2017)
- This commit closes the last remaining scanner that didn't have such a
  cap
- Record: This is the final instance; other instances already protected

## PHASE 6: CROSS-REFERENCING STABLE TREES

**Step 6.1 - Buggy code in stable trees?**
- The loop structure is in the codebase since `1da177e4c3f4`
  (2.6.12-rc2)
- Present in 5.4, 5.10, 5.15, 6.1, 6.6, 6.12 and every other supported
  stable tree
- Record: All supported stable trees contain the vulnerable code

**Step 6.2 - Backport Complications:**
- The function is modified by `d375b98e024898` (Jan 2024) — this is in
  6.7+; older stable trees (5.4, 5.10, 5.15, 6.1) may have a slightly
  different surrounding context (no `nexthdr ==
  NEXTHDR_FRAGMENT`/`NEXTHDR_AUTH` branching exactly as today)
- However, the key hunk — the `if (nexthdr == NEXTHDR_DEST) { ...
  while(1) { ... }}` block — is structurally unchanged since 2005
- The patch adds a new local variable and a new `if` inside the while
  loop; this should apply cleanly or with trivial offset fuzzing
- Record: Expected to apply cleanly to all active stable trees; at worst
  a trivial context adjustment

**Step 6.3 - Related fixes already in stable?**
- `47d3d7ac656a` is in stable trees (it was the original DoS hardening,
  merged 2017)
- No previous fix for this specific hand-rolled scanner exists
- Record: No overlap; this closes a gap left by the 2017 fix

## PHASE 7: SUBSYSTEM CONTEXT

**Step 7.1 - Subsystem:** `net/ipv6/` — core IPv6 networking. Affects
users of IPv6 tunnels (ip6tnl, ip6gre). IMPORTANT criticality.

**Step 7.2 - Activity:** Very active subsystem, but the specific scanner
has been stable for 20+ years. Record: Mature code, long-lived bug.

## PHASE 8: IMPACT AND RISK

**Step 8.1 - Affected Users:** All users running IPv6 tunnel drivers
(ip6tnl, ip6gre modules loaded) — common on IPv6 dual-stack routers,
tunnel endpoints, mobile backhauls, cloud overlay networks.

**Step 8.2 - Trigger Conditions:**
- Attacker sends IPv6 packet with Destination Options header containing
  2046 Pad1 entries
- Per extension header, ~2000 CPU iterations in the scanner
- Can be triggered remotely without authentication — any reachable IPv6
  tunnel endpoint
- Record: Unprivileged remote attacker can trigger; realistic DoS

**Step 8.3 - Failure Mode Severity:**
- CPU exhaustion in softirq context — affects packet processing
  throughput
- With pipelined attack traffic, can starve other network processing
- Not a crash but a performance DoS — **MEDIUM-HIGH** severity
- Record: Remote DoS / CPU exhaustion, medium-high severity

**Step 8.4 - Risk-Benefit:**
- Benefit: Closes a 20-year-old remote DoS vector on IPv6 tunnel
  endpoints; completes the hardening started by the 2017 fix
- Risk: Very low — 6-line cap at value 8, legitimate traffic never
  approaches this limit (encap limit is typically the first TLV)
- Record: Strongly favorable benefit/risk ratio

## PHASE 9: SYNTHESIS

**Step 9.1 - Evidence:**
- FOR backport:
  - Closes a known class of remote DoS (same class as 47d3d7ac656a,
    which is in stable)
  - Bug present since 2.6.12-rc2 (2005) — affects every supported stable
    tree
  - 6-line surgical fix, no new APIs, no functional change for
    legitimate traffic
  - Reviewed by two independent reviewers (Ido Schimmel, Justin Iurman),
    applied by netdev maintainer to the `net` tree
  - Reviewer explicitly suggested Fixes: 1da177e4c3f4 (signaling stable
    relevance)
  - Reachable from remote unauthenticated attacker on any IPv6 tunnel
    endpoint
  - Author (Borkmann) is a senior networking developer, patch went
    through 3 review iterations
- AGAINST backport:
  - No Fixes: tag in the applied commit (reviewer suggested one but it
    was not added)
  - No Cc: stable tag — but absence is expected per the prompt
  - No user-filed bug report / CVE — the DoS is based on code analysis
    of an analogous, already-fixed vector

**Step 9.2 - Stable Rules Checklist:**
1. Obviously correct and tested? YES — trivial counter, 2 Reviewed-by,
   merged to net
2. Fixes a real bug? YES — remote DoS via crafted IPv6 Destination
   Options
3. Important issue? YES — remote CPU exhaustion in softirq path
   (security-relevant)
4. Small and contained? YES — 6 lines in one function, one file
5. No new features/APIs? YES — purely defensive counter
6. Applies to stable? YES (likely clean; minor context fuzz possible on
   very old trees)

**Step 9.3 - Exception Category:** Not applicable by name, but fits the
spirit of "security hardening for known DoS class" — a strong stable
candidate on its own merits.

**Step 9.4 - Decision:** YES.

# Verification

- [Phase 1] Parsed tags: Reviewed-by: Ido Schimmel, Reviewed-by: Justin
  Iurman, Signed-off-by: Daniel Borkmann, Signed-off-by: Jakub Kicinski.
  No Fixes:, no Cc: stable, no Reported-by, no Link: — verified from
  commit message and `git show 076b8cad77aa9 --format=fuller`
- [Phase 2] Diff analysis: `git diff 076b8cad77aa9^ 076b8cad77aa9` —
  confirmed +6 lines (1 macro, 1 local var, 1 conditional break) in
  `net/ipv6/ip6_tunnel.c`
- [Phase 3] `git blame -L 430,456 net/ipv6/ip6_tunnel.c`: core loop
  attributed to `1da177e4c3f4` (2.6.12-rc2, 2005); surrounding context
  modified by `d375b98e024898` (2024)
- [Phase 3] `git show 47d3d7ac656a`: confirmed the referenced prior
  commit added `max_hbh/dst_opts_cnt/len` to `ip6_parse_tlv()` for an
  identical DoS class in 2017
- [Phase 3] `git log --oneline --author="Daniel Borkmann" -- net/ipv6/`:
  confirmed author has prior ipv6 work (e.g., `47e27d5e92c46`,
  `e41b0bedba029`, `a824d0b83109e`)
- [Phase 4] `b4 dig -c 076b8cad77aa9`: found lore thread `https://lore.k
  ernel.org/all/20260421202406.717885-1-daniel@iogearbox.net/`
- [Phase 4] `b4 dig -a`: confirmed v3 is the applied/final revision;
  changelog shows v1->v2 dropping unlikely and using abs(), v2->v3 hard-
  coding 8 instead of max_dst_opts_cnt per Ido's suggestion
- [Phase 4] `b4 dig -w`: confirmed Kicinski, Dumazet, Ahern, Herbert,
  Abeni and netdev were recipients — proper maintainer review
- [Phase 4] mbox content (`/tmp/tlv_thread.mbox`) lines 340-347: Ido
  Schimmel wrote "Given that you are targeting net and that the issue
  was always present, I would use: Fixes: 1da177e4c3f4
  ('Linux-2.6.12-rc2')" — confirms bug is as old as git history and
  stable-worthy
- [Phase 4] mbox lines 53-80: patchwork bot confirms patch was applied
  to netdev/net.git (main) by Kicinski
- [Phase 5] `git grep "ip6_tnl_parse_tlv_enc_lim"`: confirmed 4 call
  sites (ip6_tnl_err, __ip6_tnl_xmit, ip6gre_err,
  prepare_ip6gre_xmit_ipv6) — all in data/error paths
- [Phase 6] Code since 2.6.12-rc2 → present in all active stable trees
  (5.4, 5.10, 5.15, 6.1, 6.6, 6.12)
- [Phase 8] Failure mode: remote-triggerable CPU exhaustion (~2000
  scanner iterations per extension header) — severity medium-high (DoS),
  quoted directly from the commit message's documented attack
- UNVERIFIED: Exact patch-apply cleanliness on pre-6.7 stable branches
  (the function body was restructured by d375b98e024898 in early 2024).
  The hunk target is the inner `if (nexthdr == NEXTHDR_DEST)` block
  which has been structurally unchanged since 2005, so clean apply is
  expected, but I did not run `git apply` against the individual stable
  branches.
- UNVERIFIED: Whether a CVE has been assigned for this specific scanner
  (no public CVE reference in the commit message or lore thread).

The fix is small, surgical, fixes a real remote DoS, is well-reviewed,
affects a long-present bug in all stable trees, and was applied to the
`net` (bug-fix) tree by the netdev maintainer.

**YES**

 net/ipv6/ip6_tunnel.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 0b53488a92290..b9d41b5d1853b 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -62,6 +62,8 @@ MODULE_LICENSE("GPL");
 MODULE_ALIAS_RTNL_LINK("ip6tnl");
 MODULE_ALIAS_NETDEV("ip6tnl0");
 
+#define IP6_TUNNEL_MAX_DEST_TLVS    8
+
 #define IP6_TUNNEL_HASH_SIZE_SHIFT  5
 #define IP6_TUNNEL_HASH_SIZE (1 << IP6_TUNNEL_HASH_SIZE_SHIFT)
 
@@ -428,11 +430,15 @@ __u16 ip6_tnl_parse_tlv_enc_lim(struct sk_buff *skb, __u8 *raw)
 				break;
 		}
 		if (nexthdr == NEXTHDR_DEST) {
+			int tlv_cnt = 0;
 			u16 i = 2;
 
 			while (1) {
 				struct ipv6_tlv_tnl_enc_lim *tel;
 
+				if (unlikely(tlv_cnt++ >= IP6_TUNNEL_MAX_DEST_TLVS))
+					break;
+
 				/* No more room for encapsulation limit */
 				if (i + sizeof(*tel) > optlen)
 					break;
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2026-04-28 10:43 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20260428104133.2858589-1-sashal@kernel.org>
2026-04-28 10:41 ` [PATCH AUTOSEL 7.0-5.10] selftests: fib_nexthops: test stale has_v4 on nexthop replace Sasha Levin
2026-04-28 10:41 ` [PATCH AUTOSEL 7.0-6.12] 9p/trans_xen: make cleanup idempotent after dataring alloc errors Sasha Levin
2026-04-28 10:41 ` [PATCH AUTOSEL 7.0-5.10] ipv6: Cap TLV scan in ip6_tnl_parse_tlv_enc_lim Sasha Levin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox