public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
From: Sasha Levin <sashal@kernel.org>
To: patches@lists.linux.dev, stable@vger.kernel.org
Cc: Qingfang Deng <dqfext@gmail.com>, Paolo Abeni <pabeni@redhat.com>,
	Sasha Levin <sashal@kernel.org>,
	andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com,
	kuba@kernel.org, linux-ppp@vger.kernel.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH AUTOSEL 7.0-5.10] ppp: disconnect channel before nullifying pch->chan
Date: Mon, 20 Apr 2026 09:08:30 -0400	[thread overview]
Message-ID: <20260420131539.986432-44-sashal@kernel.org> (raw)
In-Reply-To: <20260420131539.986432-1-sashal@kernel.org>

From: Qingfang Deng <dqfext@gmail.com>

[ Upstream commit 6a196e83a1a7e50be93482d1cd4305641f1a9fb1 ]

In ppp_unregister_channel(), pch->chan is set to NULL before calling
ppp_disconnect_channel(), which removes the channel from ppp->channels
list using list_del_rcu() + synchronize_net(). This creates an
intermediate state where the channel is still connected (on the list)
but already unregistered (pch->chan == NULL).

Call ppp_disconnect_channel() before setting pch->chan to NULL. After
the synchronize_net(), no new reader on the transmit path will hold a
reference to the channel from the list.

This eliminates the problematic state, and prepares for removing the
pch->chan NULL checks from the transmit path in a subsequent patch.

Signed-off-by: Qingfang Deng <dqfext@gmail.com>
Link: https://patch.msgid.link/20260312093732.277254-1-dqfext@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

Now I have all the information needed. Let me compile the complete
analysis.

---

## Phase 1: Commit Message Forensics

### Step 1.1: Subject Line
- **Subsystem**: `ppp` (PPP networking subsystem)
- **Action verb**: "disconnect" / implied "fix" - reorder operations to
  eliminate a race
- **Summary**: Reorder `ppp_disconnect_channel()` and `pch->chan = NULL`
  to eliminate a problematic intermediate state

### Step 1.2: Tags
- **Signed-off-by**: Qingfang Deng (author, active PPP contributor with
  9+ commits to the subsystem)
- **Link**:
  `https://patch.msgid.link/20260312093732.277254-1-dqfext@gmail.com`
  (the `-1-` suggests a single patch, not a series)
- **Signed-off-by**: Paolo Abeni (netdev co-maintainer, merge authority)
- No Fixes: tag (expected - that's why it's under review)
- No Reported-by (no explicit crash report)

### Step 1.3: Body Analysis
The commit describes a concrete race condition: `pch->chan` is set to
NULL before `ppp_disconnect_channel()` removes the channel from
`ppp->channels`. This creates an intermediate state where the channel is
still on the list but already has `pch->chan == NULL`. The fix reorders
the operations: disconnect first (remove from list), then NULL-ify.

### Step 1.4: Hidden Bug Fix Detection
The phrase "This eliminates the problematic state" clearly describes
fixing a race condition. The mention of "prepares for removing the
pch->chan NULL checks from the transmit path in a subsequent patch"
suggests a follow-up cleanup, but the reordering itself has standalone
correctness value.

**Record**: This IS a race condition fix, disguised slightly by the
preparatory framing.

---

## Phase 2: Diff Analysis

### Step 2.1: Inventory
- **1 file changed**: `drivers/net/ppp/ppp_generic.c`
- **Net change**: 0 lines added, 0 removed - purely a reorder of one
  line
- **Function modified**: `ppp_unregister_channel()`
- **Scope**: single-file, single-function, surgical

### Step 2.2: Code Flow Change
**Before**: In `ppp_unregister_channel()`:
1. `down_write(&pch->chan_sem)` + `spin_lock_bh(&pch->downl)` +
   `WRITE_ONCE(pch->chan, NULL)` + unlock
2. `ppp_disconnect_channel(pch)` - removes from `ppp->channels` via
   `list_del_rcu()` + `synchronize_net()`

**After**:
1. `ppp_disconnect_channel(pch)` - removes from list first
2. `down_write(&pch->chan_sem)` + `spin_lock_bh(&pch->downl)` +
   `WRITE_ONCE(pch->chan, NULL)` + unlock

### Step 2.3: Bug Mechanism - RACE CONDITION / NULL POINTER DEREFERENCE

The race occurs in `ppp_mp_explode()` (multilink transmit path):

```1977:2001:drivers/net/ppp/ppp_generic.c
list_for_each_entry(pch, &ppp->channels, clist) {
    if (pch->chan) {          // Read #1 - no downl lock held
        pch->avail = 1;
        navail++;
        pch->speed = pch->chan->speed;  // Read #2 - TOCTOU: pch->chan
can be NULL here!
    }
```

**Race scenario**:
1. CPU A (`ppp_mp_explode`, under `ppp_xmit_lock`): reads `pch->chan !=
   NULL` at line 1978
2. CPU B (`ppp_unregister_channel`): does `WRITE_ONCE(pch->chan, NULL)`
   under `downl` (NOT under `ppp_xmit_lock`)
3. CPU A: dereferences `pch->chan->speed` at line 1981 - **NULL pointer
   dereference / kernel oops**

The channel is still on `ppp->channels` (hasn't been disconnected yet),
so `ppp_mp_explode` finds it while iterating.

With the fix, `ppp_disconnect_channel()` removes the channel from the
list (under `ppp_lock`, which includes `ppp_xmit_lock`) FIRST. After it
returns, no transmit code iterating `ppp->channels` will see this
channel. Then setting `pch->chan = NULL` is safe.

### Step 2.4: Fix Quality
- Obviously correct - eliminates the problematic state entirely
- Minimal/surgical - one line moved
- No regression risk - the semantic operations are the same, just
  reordered
- No API changes, no locking changes, no new features

---

## Phase 3: Git History Investigation

### Step 3.1: Blame
From git blame, the buggy code ordering dates back to `^1da177e4c3f41`
(Linus Torvalds, 2005-04-16) - the initial kernel tree import. The
`WRITE_ONCE` was added by `0417adf367a0af` (August 2025), but the
fundamental ordering issue exists since v2.6.12.

### Step 3.2: No Fixes: tag (expected)

### Step 3.3: Related Changes
The same author (Qingfang Deng) previously fixed a related race in
`0417adf367a0af` ("ppp: fix race conditions in ppp_fill_forward_path"),
which explicitly documented: "pch->chan can be NULL. When
ppp_unregister_channel() is called, pch->chan is set to NULL before pch
is removed from ppp->channels." That commit added workarounds (NULL
checks, RCU). This commit fixes the **root cause**.

### Step 3.4: Author
Qingfang Deng has 9+ PPP commits, including critical race fixes and
performance improvements. They are a de facto domain expert for this
code.

### Step 3.5: Dependencies
The diff references `WRITE_ONCE(pch->chan, NULL)` which was introduced
by `0417adf367a0af`. That commit has `Fixes: f6efc675c9dd` and was
likely already auto-selected for stable. For trees without that commit,
a trivial context adjustment (`pch->chan = NULL` instead of
`WRITE_ONCE`) would be needed. The reordering logic works correctly
regardless.

---

## Phase 4: Mailing List Research

### Step 4.1-4.2
Lore is currently blocked by Anubis anti-bot protection. b4 dig could
not find this specific commit by message-id (the commit doesn't exist in
this tree). However, the patch was accepted and merged by Paolo Abeni,
netdev co-maintainer, which provides strong confidence in correctness.

### Step 4.3-4.5
The earlier commit `0417adf367a0af` explicitly documents the bug that
this commit fixes at the root: "pch->chan can be NULL. When
ppp_unregister_channel() is called, pch->chan is set to NULL before pch
is removed from ppp->channels." This was a known problem.

---

## Phase 5: Code Semantic Analysis

### Step 5.1-5.4: Call Path Tracing
The vulnerable function `ppp_mp_explode()` is called from:
- `ppp_push()` -> `ppp_send_frame()` -> `__ppp_xmit_process()` ->
  `ppp_xmit_process()`

This is the **hot transmit path** for PPP multilink, triggered every
time a packet is sent on a multilink PPP connection. The race triggers
when a channel is unregistered while multilink transmission is active -
a common scenario during PPP session teardown or link failure.

`ppp_fill_forward_path()` (already patched with NULL checks by
`0417adf367a0af`) is also affected but has workarounds. This commit
fixes the root cause for all paths.

### Step 5.5: Similar Patterns
The same TOCTOU pattern (check `pch->chan`, then dereference) also
appears at:
- Line 1978/1981: `ppp_mp_explode()` - **vulnerable** (no `downl` lock)
- Line 1912-1914: `ppp_push()` - **safe** (holds `pch->downl`)
- Line 2059-2060: `ppp_mp_explode()` phase 2 - **safe** (holds
  `pch->downl`)
- Line 2185-2189: `__ppp_channel_push()` - **safe** (holds `pch->downl`)

---

## Phase 6: Stable Tree Analysis

### Step 6.1: Buggy Code in Stable
The buggy ordering exists since v2.6.12 (the very first git commit). It
is present in ALL active stable trees.

### Step 6.2: Backport Complications
For 7.0.y: applies cleanly (code matches exactly).
For older trees without `0417adf367a0af`: trivial context change needed
(`pch->chan = NULL` vs `WRITE_ONCE(pch->chan, NULL)`), and
`list_del_rcu`/`synchronize_net()` may not be present in
`ppp_disconnect_channel()`. However, the reordering is still correct for
the transmit path because `ppp_disconnect_channel()` takes `ppp_lock()`
(which includes `ppp_xmit_lock`), ensuring mutual exclusion with
transmit path iteration.

### Step 6.3: Related Fixes
`0417adf367a0af` added workarounds (NULL checks) for the same underlying
issue. This commit fixes the root cause.

---

## Phase 7: Subsystem Context

### Step 7.1: PPP networking - **IMPORTANT** subsystem
PPP is used by DSL/dial-up connections, VPN tunnels, and
embedded/routing devices. Multilink PPP aggregates multiple physical
links, common in WAN/enterprise networking.

### Step 7.2: Actively maintained by the author (9+ commits), merged by
netdev maintainers.

---

## Phase 8: Impact and Risk Assessment

### Step 8.1: Affected Users
Users running PPP with multilink (`SC_MULTILINK` flag set). This
includes enterprise WAN, embedded networking, and PPPoE configurations.

### Step 8.2: Trigger Conditions
- Multilink PPP transmitting while a channel is being unregistered
- Happens during link failure, session teardown, or module unload
- A timing-dependent race, but the window exists every time a channel is
  unregistered during active multilink transmission

### Step 8.3: Failure Mode
**NULL pointer dereference** at `pch->chan->speed` -> kernel oops/crash.
Severity: **HIGH** (kernel crash, potential data loss from interrupted
operations).

### Step 8.4: Risk-Benefit Ratio
- **Benefit**: HIGH - prevents kernel crash in PPP multilink transmit
  path
- **Risk**: VERY LOW - moving one line, no semantic change, obviously
  correct
- **Ratio**: Strongly favorable

---

## Phase 9: Final Synthesis

### Step 9.1: Evidence
**FOR backporting**:
- Fixes a real NULL pointer dereference race condition
- The race exists since v2.6.12 (all stable trees affected)
- Extremely small change (1 line moved)
- Obviously correct - eliminates invalid intermediate state
- Author is a domain expert (9+ PPP commits)
- Accepted by netdev co-maintainer
- The same root cause was already documented by the earlier fix
  (0417adf367a0af)
- PPP multilink is a commonly used feature

**AGAINST backporting**:
- No crash report or syzbot reproduction (theoretical race, but
  realistic trigger)
- Commit message mentions "prepares for" a subsequent patch (partially
  preparatory)
- No Fixes: tag (expected)

### Step 9.2: Stable Rules Checklist
1. Obviously correct and tested? **YES** - merged by Paolo Abeni
2. Fixes a real bug? **YES** - NULL deref race in multilink transmit
3. Important issue? **YES** - kernel crash (NULL pointer deref)
4. Small and contained? **YES** - 1 line moved in 1 function
5. No new features/APIs? **YES** - pure reorder
6. Applies to stable? **YES** - cleanly to 7.0.y

### Step 9.3: Exception Categories
Not an exception category - this is a standard race condition fix.

---

## Verification

- [Phase 1] Parsed tags: Signed-off-by author (Qingfang Deng) +
  maintainer (Paolo Abeni), Link to patch
- [Phase 2] Diff analysis: 1 line moved in `ppp_unregister_channel()`,
  reordering `ppp_disconnect_channel()` before `WRITE_ONCE(pch->chan,
  NULL)`
- [Phase 2] Verified TOCTOU in `ppp_mp_explode()` lines 1978/1981: reads
  `pch->chan` without `downl` lock
- [Phase 3] git blame: buggy ordering from `^1da177e4c3f41` (Linus
  Torvalds, 2005) - present in all stable trees
- [Phase 3] Related commit `0417adf367a0af` explicitly documents this
  same race condition as known bug
- [Phase 3] Author has 9 commits to drivers/net/ppp/ - domain expert
- [Phase 4] Lore blocked by Anubis; could not verify mailing list
  discussion directly
- [Phase 5] Traced call chain: `ppp_xmit_process` ->
  `__ppp_xmit_process` -> `ppp_push` -> `ppp_mp_explode` - hot transmit
  path under `ppp_xmit_lock`
- [Phase 5] Verified `ppp_disconnect_channel()` takes `ppp_lock()`
  (includes `ppp_xmit_lock`) - mutual exclusion with transmit path
- [Phase 5] Verified `WRITE_ONCE(pch->chan, NULL)` is under
  `chan_sem+downl` only, NOT `ppp_xmit_lock` - confirms race window
- [Phase 6] Code exists in all active stable trees since v2.6.12
- [Phase 6] Patch applies cleanly to 7.0.y; older trees need trivial
  context adjustment
- [Phase 8] Failure mode: NULL pointer dereference -> kernel oops,
  severity HIGH
- UNVERIFIED: Could not access lore.kernel.org to verify if stable was
  requested by a reviewer

The fix is a minimal, obviously correct reordering that eliminates a
real NULL pointer dereference race condition in the PPP multilink
transmit path. The bug has existed since the original kernel tree and
affects all stable trees. The risk is negligible (one line moved) and
the benefit is preventing a kernel crash.

**YES**

 drivers/net/ppp/ppp_generic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
index e9b41777be809..7cd936bc6a7ea 100644
--- a/drivers/net/ppp/ppp_generic.c
+++ b/drivers/net/ppp/ppp_generic.c
@@ -3023,12 +3023,12 @@ ppp_unregister_channel(struct ppp_channel *chan)
 	 * This ensures that we have returned from any calls into
 	 * the channel's start_xmit or ioctl routine before we proceed.
 	 */
+	ppp_disconnect_channel(pch);
 	down_write(&pch->chan_sem);
 	spin_lock_bh(&pch->downl);
 	WRITE_ONCE(pch->chan, NULL);
 	spin_unlock_bh(&pch->downl);
 	up_write(&pch->chan_sem);
-	ppp_disconnect_channel(pch);
 
 	pn = ppp_pernet(pch->chan_net);
 	spin_lock_bh(&pn->all_channels_lock);
-- 
2.53.0


  parent reply	other threads:[~2026-04-20 13:16 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20260420131539.986432-1-sashal@kernel.org>
2026-04-20 13:07 ` [PATCH AUTOSEL 6.18] net: stmmac: Fix PTP ref clock for Tegra234 Sasha Levin
2026-04-20 13:07 ` [PATCH AUTOSEL 7.0-6.12] wifi: mac80211: properly handle error in ieee80211_add_virtual_monitor Sasha Levin
2026-04-20 13:07 ` [PATCH AUTOSEL 7.0-5.10] net: qrtr: fix endian handling of confirm_rx field Sasha Levin
2026-04-20 13:08 ` [PATCH AUTOSEL 6.18] netfilter: xt_multiport: validate range encoding in checkentry Sasha Levin
2026-04-20 13:08 ` [PATCH AUTOSEL 6.18] ice: ptp: don't WARN when controlling PF is unavailable Sasha Levin
2026-04-20 13:08 ` [PATCH AUTOSEL 6.18] e1000: check return value of e1000_read_eeprom Sasha Levin
2026-04-20 13:08 ` [PATCH AUTOSEL 7.0-6.6] bpf, sockmap: Annotate af_unix sock:: Sk_state data-races Sasha Levin
2026-04-20 13:08 ` [PATCH AUTOSEL 7.0-6.18] net: wangxun: reorder timer and work sync cancellations Sasha Levin
2026-04-20 13:08 ` [PATCH AUTOSEL 7.0-5.15] net: mvneta: support EPROBE_DEFER when reading MAC address Sasha Levin
2026-04-20 13:08 ` [PATCH AUTOSEL 7.0-6.1] net/mlx5e: XSK, Increase size for chunk_size param Sasha Levin
2026-04-20 13:08 ` Sasha Levin [this message]
2026-04-20 13:08 ` [PATCH AUTOSEL 6.18] netfilter: nfnetlink_queue: make hash table per queue Sasha Levin
2026-04-20 13:08 ` [PATCH AUTOSEL 6.18] net: sfp: add quirks for Hisense and HSGQ GPON ONT SFP modules Sasha Levin
2026-04-20 13:08 ` [PATCH AUTOSEL 6.18] ixgbevf: add missing negotiate_features op to Hyper-V ops table Sasha Levin
2026-04-20 13:09 ` [PATCH AUTOSEL 7.0-6.18] wifi: ath12k: Fix the assignment of logical link index Sasha Levin
2026-04-20 13:09 ` [PATCH AUTOSEL 6.18] Bluetooth: hci_sync: annotate data-races around hdev->req_status Sasha Levin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260420131539.986432-44-sashal@kernel.org \
    --to=sashal@kernel.org \
    --cc=andrew+netdev@lunn.ch \
    --cc=davem@davemloft.net \
    --cc=dqfext@gmail.com \
    --cc=edumazet@google.com \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-ppp@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=patches@lists.linux.dev \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox