* [PATCH AUTOSEL 7.0-5.10] ppp: disconnect channel before nullifying pch->chan
[not found] <20260420131539.986432-1-sashal@kernel.org>
@ 2026-04-20 13:08 ` Sasha Levin
0 siblings, 0 replies; only message in thread
From: Sasha Levin @ 2026-04-20 13:08 UTC (permalink / raw)
To: patches, stable
Cc: Qingfang Deng, Paolo Abeni, Sasha Levin, andrew+netdev, davem,
edumazet, kuba, linux-ppp, netdev, linux-kernel
From: Qingfang Deng <dqfext@gmail.com>
[ Upstream commit 6a196e83a1a7e50be93482d1cd4305641f1a9fb1 ]
In ppp_unregister_channel(), pch->chan is set to NULL before calling
ppp_disconnect_channel(), which removes the channel from ppp->channels
list using list_del_rcu() + synchronize_net(). This creates an
intermediate state where the channel is still connected (on the list)
but already unregistered (pch->chan == NULL).
Call ppp_disconnect_channel() before setting pch->chan to NULL. After
the synchronize_net(), no new reader on the transmit path will hold a
reference to the channel from the list.
This eliminates the problematic state, and prepares for removing the
pch->chan NULL checks from the transmit path in a subsequent patch.
Signed-off-by: Qingfang Deng <dqfext@gmail.com>
Link: https://patch.msgid.link/20260312093732.277254-1-dqfext@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
Now I have all the information needed. Let me compile the complete
analysis.
---
## Phase 1: Commit Message Forensics
### Step 1.1: Subject Line
- **Subsystem**: `ppp` (PPP networking subsystem)
- **Action verb**: "disconnect" / implied "fix" - reorder operations to
eliminate a race
- **Summary**: Reorder `ppp_disconnect_channel()` and `pch->chan = NULL`
to eliminate a problematic intermediate state
### Step 1.2: Tags
- **Signed-off-by**: Qingfang Deng (author, active PPP contributor with
9+ commits to the subsystem)
- **Link**:
`https://patch.msgid.link/20260312093732.277254-1-dqfext@gmail.com`
(the `-1-` suggests a single patch, not a series)
- **Signed-off-by**: Paolo Abeni (netdev co-maintainer, merge authority)
- No Fixes: tag (expected - that's why it's under review)
- No Reported-by (no explicit crash report)
### Step 1.3: Body Analysis
The commit describes a concrete race condition: `pch->chan` is set to
NULL before `ppp_disconnect_channel()` removes the channel from
`ppp->channels`. This creates an intermediate state where the channel is
still on the list but already has `pch->chan == NULL`. The fix reorders
the operations: disconnect first (remove from list), then NULL-ify.
### Step 1.4: Hidden Bug Fix Detection
The phrase "This eliminates the problematic state" clearly describes
fixing a race condition. The mention of "prepares for removing the
pch->chan NULL checks from the transmit path in a subsequent patch"
suggests a follow-up cleanup, but the reordering itself has standalone
correctness value.
**Record**: This IS a race condition fix, disguised slightly by the
preparatory framing.
---
## Phase 2: Diff Analysis
### Step 2.1: Inventory
- **1 file changed**: `drivers/net/ppp/ppp_generic.c`
- **Net change**: 0 lines added, 0 removed - purely a reorder of one
line
- **Function modified**: `ppp_unregister_channel()`
- **Scope**: single-file, single-function, surgical
### Step 2.2: Code Flow Change
**Before**: In `ppp_unregister_channel()`:
1. `down_write(&pch->chan_sem)` + `spin_lock_bh(&pch->downl)` +
`WRITE_ONCE(pch->chan, NULL)` + unlock
2. `ppp_disconnect_channel(pch)` - removes from `ppp->channels` via
`list_del_rcu()` + `synchronize_net()`
**After**:
1. `ppp_disconnect_channel(pch)` - removes from list first
2. `down_write(&pch->chan_sem)` + `spin_lock_bh(&pch->downl)` +
`WRITE_ONCE(pch->chan, NULL)` + unlock
### Step 2.3: Bug Mechanism - RACE CONDITION / NULL POINTER DEREFERENCE
The race occurs in `ppp_mp_explode()` (multilink transmit path):
```1977:2001:drivers/net/ppp/ppp_generic.c
list_for_each_entry(pch, &ppp->channels, clist) {
if (pch->chan) { // Read #1 - no downl lock held
pch->avail = 1;
navail++;
pch->speed = pch->chan->speed; // Read #2 - TOCTOU: pch->chan
can be NULL here!
}
```
**Race scenario**:
1. CPU A (`ppp_mp_explode`, under `ppp_xmit_lock`): reads `pch->chan !=
NULL` at line 1978
2. CPU B (`ppp_unregister_channel`): does `WRITE_ONCE(pch->chan, NULL)`
under `downl` (NOT under `ppp_xmit_lock`)
3. CPU A: dereferences `pch->chan->speed` at line 1981 - **NULL pointer
dereference / kernel oops**
The channel is still on `ppp->channels` (hasn't been disconnected yet),
so `ppp_mp_explode` finds it while iterating.
With the fix, `ppp_disconnect_channel()` removes the channel from the
list (under `ppp_lock`, which includes `ppp_xmit_lock`) FIRST. After it
returns, no transmit code iterating `ppp->channels` will see this
channel. Then setting `pch->chan = NULL` is safe.
### Step 2.4: Fix Quality
- Obviously correct - eliminates the problematic state entirely
- Minimal/surgical - one line moved
- No regression risk - the semantic operations are the same, just
reordered
- No API changes, no locking changes, no new features
---
## Phase 3: Git History Investigation
### Step 3.1: Blame
From git blame, the buggy code ordering dates back to `^1da177e4c3f41`
(Linus Torvalds, 2005-04-16) - the initial kernel tree import. The
`WRITE_ONCE` was added by `0417adf367a0af` (August 2025), but the
fundamental ordering issue exists since v2.6.12.
### Step 3.2: No Fixes: tag (expected)
### Step 3.3: Related Changes
The same author (Qingfang Deng) previously fixed a related race in
`0417adf367a0af` ("ppp: fix race conditions in ppp_fill_forward_path"),
which explicitly documented: "pch->chan can be NULL. When
ppp_unregister_channel() is called, pch->chan is set to NULL before pch
is removed from ppp->channels." That commit added workarounds (NULL
checks, RCU). This commit fixes the **root cause**.
### Step 3.4: Author
Qingfang Deng has 9+ PPP commits, including critical race fixes and
performance improvements. They are a de facto domain expert for this
code.
### Step 3.5: Dependencies
The diff references `WRITE_ONCE(pch->chan, NULL)` which was introduced
by `0417adf367a0af`. That commit has `Fixes: f6efc675c9dd` and was
likely already auto-selected for stable. For trees without that commit,
a trivial context adjustment (`pch->chan = NULL` instead of
`WRITE_ONCE`) would be needed. The reordering logic works correctly
regardless.
---
## Phase 4: Mailing List Research
### Step 4.1-4.2
Lore is currently blocked by Anubis anti-bot protection. b4 dig could
not find this specific commit by message-id (the commit doesn't exist in
this tree). However, the patch was accepted and merged by Paolo Abeni,
netdev co-maintainer, which provides strong confidence in correctness.
### Step 4.3-4.5
The earlier commit `0417adf367a0af` explicitly documents the bug that
this commit fixes at the root: "pch->chan can be NULL. When
ppp_unregister_channel() is called, pch->chan is set to NULL before pch
is removed from ppp->channels." This was a known problem.
---
## Phase 5: Code Semantic Analysis
### Step 5.1-5.4: Call Path Tracing
The vulnerable function `ppp_mp_explode()` is called from:
- `ppp_push()` -> `ppp_send_frame()` -> `__ppp_xmit_process()` ->
`ppp_xmit_process()`
This is the **hot transmit path** for PPP multilink, triggered every
time a packet is sent on a multilink PPP connection. The race triggers
when a channel is unregistered while multilink transmission is active -
a common scenario during PPP session teardown or link failure.
`ppp_fill_forward_path()` (already patched with NULL checks by
`0417adf367a0af`) is also affected but has workarounds. This commit
fixes the root cause for all paths.
### Step 5.5: Similar Patterns
The same TOCTOU pattern (check `pch->chan`, then dereference) also
appears at:
- Line 1978/1981: `ppp_mp_explode()` - **vulnerable** (no `downl` lock)
- Line 1912-1914: `ppp_push()` - **safe** (holds `pch->downl`)
- Line 2059-2060: `ppp_mp_explode()` phase 2 - **safe** (holds
`pch->downl`)
- Line 2185-2189: `__ppp_channel_push()` - **safe** (holds `pch->downl`)
---
## Phase 6: Stable Tree Analysis
### Step 6.1: Buggy Code in Stable
The buggy ordering exists since v2.6.12 (the very first git commit). It
is present in ALL active stable trees.
### Step 6.2: Backport Complications
For 7.0.y: applies cleanly (code matches exactly).
For older trees without `0417adf367a0af`: trivial context change needed
(`pch->chan = NULL` vs `WRITE_ONCE(pch->chan, NULL)`), and
`list_del_rcu`/`synchronize_net()` may not be present in
`ppp_disconnect_channel()`. However, the reordering is still correct for
the transmit path because `ppp_disconnect_channel()` takes `ppp_lock()`
(which includes `ppp_xmit_lock`), ensuring mutual exclusion with
transmit path iteration.
### Step 6.3: Related Fixes
`0417adf367a0af` added workarounds (NULL checks) for the same underlying
issue. This commit fixes the root cause.
---
## Phase 7: Subsystem Context
### Step 7.1: PPP networking - **IMPORTANT** subsystem
PPP is used by DSL/dial-up connections, VPN tunnels, and
embedded/routing devices. Multilink PPP aggregates multiple physical
links, common in WAN/enterprise networking.
### Step 7.2: Actively maintained by the author (9+ commits), merged by
netdev maintainers.
---
## Phase 8: Impact and Risk Assessment
### Step 8.1: Affected Users
Users running PPP with multilink (`SC_MULTILINK` flag set). This
includes enterprise WAN, embedded networking, and PPPoE configurations.
### Step 8.2: Trigger Conditions
- Multilink PPP transmitting while a channel is being unregistered
- Happens during link failure, session teardown, or module unload
- A timing-dependent race, but the window exists every time a channel is
unregistered during active multilink transmission
### Step 8.3: Failure Mode
**NULL pointer dereference** at `pch->chan->speed` -> kernel oops/crash.
Severity: **HIGH** (kernel crash, potential data loss from interrupted
operations).
### Step 8.4: Risk-Benefit Ratio
- **Benefit**: HIGH - prevents kernel crash in PPP multilink transmit
path
- **Risk**: VERY LOW - moving one line, no semantic change, obviously
correct
- **Ratio**: Strongly favorable
---
## Phase 9: Final Synthesis
### Step 9.1: Evidence
**FOR backporting**:
- Fixes a real NULL pointer dereference race condition
- The race exists since v2.6.12 (all stable trees affected)
- Extremely small change (1 line moved)
- Obviously correct - eliminates invalid intermediate state
- Author is a domain expert (9+ PPP commits)
- Accepted by netdev co-maintainer
- The same root cause was already documented by the earlier fix
(0417adf367a0af)
- PPP multilink is a commonly used feature
**AGAINST backporting**:
- No crash report or syzbot reproduction (theoretical race, but
realistic trigger)
- Commit message mentions "prepares for" a subsequent patch (partially
preparatory)
- No Fixes: tag (expected)
### Step 9.2: Stable Rules Checklist
1. Obviously correct and tested? **YES** - merged by Paolo Abeni
2. Fixes a real bug? **YES** - NULL deref race in multilink transmit
3. Important issue? **YES** - kernel crash (NULL pointer deref)
4. Small and contained? **YES** - 1 line moved in 1 function
5. No new features/APIs? **YES** - pure reorder
6. Applies to stable? **YES** - cleanly to 7.0.y
### Step 9.3: Exception Categories
Not an exception category - this is a standard race condition fix.
---
## Verification
- [Phase 1] Parsed tags: Signed-off-by author (Qingfang Deng) +
maintainer (Paolo Abeni), Link to patch
- [Phase 2] Diff analysis: 1 line moved in `ppp_unregister_channel()`,
reordering `ppp_disconnect_channel()` before `WRITE_ONCE(pch->chan,
NULL)`
- [Phase 2] Verified TOCTOU in `ppp_mp_explode()` lines 1978/1981: reads
`pch->chan` without `downl` lock
- [Phase 3] git blame: buggy ordering from `^1da177e4c3f41` (Linus
Torvalds, 2005) - present in all stable trees
- [Phase 3] Related commit `0417adf367a0af` explicitly documents this
same race condition as known bug
- [Phase 3] Author has 9 commits to drivers/net/ppp/ - domain expert
- [Phase 4] Lore blocked by Anubis; could not verify mailing list
discussion directly
- [Phase 5] Traced call chain: `ppp_xmit_process` ->
`__ppp_xmit_process` -> `ppp_push` -> `ppp_mp_explode` - hot transmit
path under `ppp_xmit_lock`
- [Phase 5] Verified `ppp_disconnect_channel()` takes `ppp_lock()`
(includes `ppp_xmit_lock`) - mutual exclusion with transmit path
- [Phase 5] Verified `WRITE_ONCE(pch->chan, NULL)` is under
`chan_sem+downl` only, NOT `ppp_xmit_lock` - confirms race window
- [Phase 6] Code exists in all active stable trees since v2.6.12
- [Phase 6] Patch applies cleanly to 7.0.y; older trees need trivial
context adjustment
- [Phase 8] Failure mode: NULL pointer dereference -> kernel oops,
severity HIGH
- UNVERIFIED: Could not access lore.kernel.org to verify if stable was
requested by a reviewer
The fix is a minimal, obviously correct reordering that eliminates a
real NULL pointer dereference race condition in the PPP multilink
transmit path. The bug has existed since the original kernel tree and
affects all stable trees. The risk is negligible (one line moved) and
the benefit is preventing a kernel crash.
**YES**
drivers/net/ppp/ppp_generic.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
index e9b41777be809..7cd936bc6a7ea 100644
--- a/drivers/net/ppp/ppp_generic.c
+++ b/drivers/net/ppp/ppp_generic.c
@@ -3023,12 +3023,12 @@ ppp_unregister_channel(struct ppp_channel *chan)
* This ensures that we have returned from any calls into
* the channel's start_xmit or ioctl routine before we proceed.
*/
+ ppp_disconnect_channel(pch);
down_write(&pch->chan_sem);
spin_lock_bh(&pch->downl);
WRITE_ONCE(pch->chan, NULL);
spin_unlock_bh(&pch->downl);
up_write(&pch->chan_sem);
- ppp_disconnect_channel(pch);
pn = ppp_pernet(pch->chan_net);
spin_lock_bh(&pn->all_channels_lock);
--
2.53.0
^ permalink raw reply related [flat|nested] only message in thread