public inbox for bpf@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH AUTOSEL 6.18] xsk: fix XDP_UMEM_SG_FLAG issues
       [not found] <20260420132314.1023554-1-sashal@kernel.org>
@ 2026-04-20 13:16 ` Sasha Levin
  2026-04-20 13:18 ` [PATCH AUTOSEL 6.18] xsk: tighten UMEM headroom validation to account for tailroom and min frame Sasha Levin
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Sasha Levin @ 2026-04-20 13:16 UTC (permalink / raw)
  To: patches, stable
  Cc: Maciej Fijalkowski, Björn Töpel, Jakub Kicinski,
	Sasha Levin, magnus.karlsson, davem, edumazet, pabeni, ast,
	tirthendu.sarkar, netdev, bpf, linux-kernel

From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>

[ Upstream commit 93e84fe45b752d17a5a46b306ed78f0133bbc719 ]

Currently xp_assign_dev_shared() does not propagate XDP_USE_SG to
flags, so set it in order to preserve the MTU check that is supposed to
be done only when no multi-buffer setup is in the picture.

Also, this flag has the same value as XDP_UMEM_TX_SW_CSUM, so we could
get unexpected SG setups for software Tx checksums. Since the csum flag
is UAPI, change the value of XDP_UMEM_SG_FLAG instead.

Fixes: d609f3d228a8 ("xsk: add multi-buffer support for sockets sharing umem")
Reviewed-by: Björn Töpel <bjorn@kernel.org>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://patch.msgid.link/20260402154958.562179-4-maciej.fijalkowski@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

Error: Failed to generate final synthesis

 include/net/xdp_sock.h  | 2 +-
 net/xdp/xsk_buff_pool.c | 4 ++++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
index ce587a2256618..7c2bc46c67050 100644
--- a/include/net/xdp_sock.h
+++ b/include/net/xdp_sock.h
@@ -14,7 +14,7 @@
 #include <linux/mm.h>
 #include <net/sock.h>
 
-#define XDP_UMEM_SG_FLAG (1 << 1)
+#define XDP_UMEM_SG_FLAG BIT(3)
 
 struct net_device;
 struct xsk_queue;
diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
index aa9788f20d0db..677c7d00f8c32 100644
--- a/net/xdp/xsk_buff_pool.c
+++ b/net/xdp/xsk_buff_pool.c
@@ -259,6 +259,10 @@ int xp_assign_dev_shared(struct xsk_buff_pool *pool, struct xdp_sock *umem_xs,
 		return -EINVAL;
 
 	flags = umem->zc ? XDP_ZEROCOPY : XDP_COPY;
+
+	if (umem->flags & XDP_UMEM_SG_FLAG)
+		flags |= XDP_USE_SG;
+
 	if (umem_xs->pool->uses_need_wakeup)
 		flags |= XDP_USE_NEED_WAKEUP;
 
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH AUTOSEL 6.18] xsk: tighten UMEM headroom validation to account for tailroom and min frame
       [not found] <20260420132314.1023554-1-sashal@kernel.org>
  2026-04-20 13:16 ` [PATCH AUTOSEL 6.18] xsk: fix XDP_UMEM_SG_FLAG issues Sasha Levin
@ 2026-04-20 13:18 ` Sasha Levin
  2026-04-21 17:29   ` sashiko-bot
  2026-04-20 13:19 ` [PATCH AUTOSEL 7.0-5.10] bpf: Do not increment tailcall count when prog is NULL Sasha Levin
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 10+ messages in thread
From: Sasha Levin @ 2026-04-20 13:18 UTC (permalink / raw)
  To: patches, stable
  Cc: Maciej Fijalkowski, Björn Töpel, Stanislav Fomichev,
	Jakub Kicinski, Sasha Levin, magnus.karlsson, davem, edumazet,
	pabeni, daniel, netdev, bpf, linux-kernel

From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>

[ Upstream commit a315e022a72d95ef5f1d4e58e903cb492b0ad931 ]

The current headroom validation in xdp_umem_reg() could leave us with
insufficient space to receive even a minimum-sized Ethernet frame.
Furthermore, if multi-buffer came into play, the skb_shared_info stored
at the end of the XSK frame would be corrupted.

HW typically works with 128-byte-aligned sizes, so provide this value
as the bare minimum.

The multi-buffer setting is only known later in the configuration
process, so besides accounting for the 128 bytes, also take care of the
tailroom space upfront.

Reviewed-by: Björn Töpel <bjorn@kernel.org>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Fixes: 99e3a236dd43 ("xsk: Add missing check on user supplied headroom size")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://patch.msgid.link/20260402154958.562179-2-maciej.fijalkowski@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

Error: Failed to generate final synthesis

 net/xdp/xdp_umem.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index 9f76ca591d54f..9ec7bd948acc7 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -202,7 +202,8 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
 	if (!unaligned_chunks && chunks_rem)
 		return -EINVAL;
 
-	if (headroom >= chunk_size - XDP_PACKET_HEADROOM)
+	if (headroom > chunk_size - XDP_PACKET_HEADROOM -
+		       SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) - 128)
 		return -EINVAL;
 
 	if (mr->flags & XDP_UMEM_TX_METADATA_LEN) {
-- 
2.53.0



* [PATCH AUTOSEL 7.0-5.10] bpf: Do not increment tailcall count when prog is NULL
       [not found] <20260420132314.1023554-1-sashal@kernel.org>
  2026-04-20 13:16 ` [PATCH AUTOSEL 6.18] xsk: fix XDP_UMEM_SG_FLAG issues Sasha Levin
  2026-04-20 13:18 ` [PATCH AUTOSEL 6.18] xsk: tighten UMEM headroom validation to account for tailroom and min frame Sasha Levin
@ 2026-04-20 13:19 ` Sasha Levin
  2026-04-20 21:22   ` sashiko-bot
  2026-04-20 13:20 ` [PATCH AUTOSEL 6.18] xsk: respect tailroom for ZC setups Sasha Levin
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 10+ messages in thread
From: Sasha Levin @ 2026-04-20 13:19 UTC (permalink / raw)
  To: patches, stable
  Cc: Hari Bathini, Ilya Leoshkevich, Alexei Starovoitov, Sasha Levin,
	daniel, andrii, eddyz87, memxor, bpf, linux-kernel

From: Hari Bathini <hbathini@linux.ibm.com>

[ Upstream commit 3733f4be287029dad963534da3d91ac806df233d ]

Currently, the tailcall count is incremented in the interpreter even
when a tailcall fails due to a non-existent prog. Fix this by holding
off on the tailcall count increment until after the NULL check on the
prog.

Suggested-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Link: https://lore.kernel.org/r/20260220062959.195101-1-hbathini@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:


## PHASE 1: COMMIT MESSAGE FORENSICS

### Step 1.1: Subject Line
- **Subsystem**: `bpf` (BPF subsystem, kernel core)
- **Action verb**: "Do not increment" — corrective action, clearly a bug
  fix
- **Summary**: Stop incrementing tailcall count when the target prog is
  NULL (non-existent)

Record: [bpf] [Do not increment / fix] [Correct tailcall count behavior
when prog is NULL]

### Step 1.2: Tags
- **Suggested-by**: Ilya Leoshkevich (s390 BPF maintainer) — strong
  credibility signal
- **Signed-off-by**: Hari Bathini (author, IBM powerpc BPF contributor)
- **Link**: `https://lore.kernel.org/r/20260220062959.195101-1-hbathini@linux.ibm.com`
- **Signed-off-by**: Alexei Starovoitov (BPF co-maintainer applied it)
- No Fixes: tag, no Cc: stable — expected for this review pipeline

Notable: The **sibling powerpc commit** (521bd39d9d28c) by the SAME
author DOES have `Cc: stable@vger.kernel.org` and `Fixes:` tags. This
interpreter fix was submitted separately to the BPF tree.

Record: Suggested by s390 BPF maintainer, signed off by BPF
co-maintainer. Sibling powerpc commit explicitly CC'd stable.

### Step 1.3: Commit Body
The commit explains: the BPF interpreter increments `tail_call_cnt` even
when the tail call fails because the program at the requested index is
NULL. The fix moves the increment after the NULL check.

Record: Bug = premature tail_call_cnt increment. Symptom = tail call
budget consumed by failed (NULL prog) tail calls, causing later
legitimate tail calls to fail prematurely.

### Step 1.4: Hidden Bug Fix Detection
This is **not** hidden — it's an explicit correctness fix. The
interpreter's behavior diverges from the JIT implementations (x86 JIT
already only increments after verifying the prog is non-NULL, as
confirmed in the code comment: "Inc tail_call_cnt if the slot is
populated").

Record: Explicit correctness fix. Not hidden.

---

## PHASE 2: DIFF ANALYSIS

### Step 2.1: Inventory
- **Files changed**: 1 (`kernel/bpf/core.c`)
- **Lines**: 2 lines moved (net zero change in line count)
- **Functions modified**: `___bpf_prog_run()` — the BPF interpreter main
  loop
- **Scope**: Single-file, 2-line surgical fix

### Step 2.2: Code Flow Change
Before:
```c
tail_call_cnt++;
prog = READ_ONCE(array->ptrs[index]);
if (!prog)
    goto out;
```

After:
```c
prog = READ_ONCE(array->ptrs[index]);
if (!prog)
    goto out;
tail_call_cnt++;
```

The change simply reorders `tail_call_cnt++` to happen AFTER the NULL
check on `prog`. If `prog` is NULL, we now `goto out` WITHOUT
incrementing the count.

### Step 2.3: Bug Mechanism
**Category**: Logic/correctness bug.
When a BPF program attempts a tail call to an empty slot (NULL prog),
the tail_call_cnt was being incremented even though no actual tail call
occurred. This consumes the tail call budget for no-op operations,
potentially preventing later valid tail calls from succeeding.

### Step 2.4: Fix Quality
- **Obviously correct**: Yes — trivial reorder, clearly correct from
  both reading and comparison with JIT implementations
- **Minimal/surgical**: Yes — 2 lines moved, no other changes
- **Regression risk**: Extremely low — purely narrowing when the counter
  increments; the only behavior change is that failed tail calls no
  longer count against the budget

Record: Perfect fix quality. Minimal, obviously correct, zero regression
risk.

---

## PHASE 3: GIT HISTORY INVESTIGATION

### Step 3.1: Blame
The buggy code (`tail_call_cnt++` at line 2090) was introduced by commit
`04fd61ab36ec` ("bpf: allow bpf programs to tail-call other bpf
programs") by Alexei Starovoitov, dated 2015-05-19. This is from
**kernel v4.2**.

Record: Bug has been present since v4.2 (2015). Present in ALL active
stable trees.

### Step 3.2: Fixes Tag
No Fixes: tag in this commit. The sibling powerpc commit has `Fixes:
ce0761419fae ("powerpc/bpf: Implement support for tail calls")`.

### Step 3.3: File History
Recent changes to `kernel/bpf/core.c` (last 20 commits) show active
development but no modifications to the `JMP_TAIL_CALL` section. The
only changes to this section since v4.2 were:
- `2a36f0b92eb6` (Wang Nan, 2015): Added `READ_ONCE()` around
  `array->ptrs[index]`
- `ebf7f6f0a6cdc` (Tiezhu Yang, 2021): Changed `>` to `>=` for
  MAX_TAIL_CALL_CNT comparison

Record: The JMP_TAIL_CALL section is very stable. Fix will apply cleanly
to all stable trees.

### Step 3.4: Author
Hari Bathini is a regular powerpc/BPF contributor at IBM. The powerpc
sibling commit was accepted via the powerpc tree with Madhavan
Srinivasan's sign-off. The interpreter fix was accepted directly by
Alexei Starovoitov (BPF co-maintainer).

### Step 3.5: Dependencies
No dependencies. The fix is completely standalone — just a 2-line
reorder within the same block.

Record: Standalone, no dependencies. Will apply cleanly.

---

## PHASE 4: MAILING LIST RESEARCH

### Step 4.1: Original Discussion
b4 dig found the series went through **4 revisions** (v1 through v4).
The interpreter fix was submitted separately (Message-ID:
`20260220062959.195101-1-hbathini@linux.ibm.com`) and the powerpc fixes
were a 6-patch series. The powerpc v4 series explicitly CC'd
`stable@vger.kernel.org`.

### Step 4.2: Reviewers
From b4 dig -w: The patch was sent to BPF maintainers (Alexei
Starovoitov, Daniel Borkmann, Andrii Nakryiko), the linuxppc-dev list,
and bpf@vger.kernel.org. The right people reviewed it.

### Step 4.3: Bug Report
No syzbot or external bug report; this was found by the author during
code review while fixing the same issue in the powerpc64 JIT. Ilya
Leoshkevich (s390 BPF maintainer) suggested the fix.

### Step 4.4: Related Patches
Part of a broader effort to fix the "increment before NULL check"
pattern across BPF JIT backends. The x86 JIT already had this correct
since the tailcall hierarchy rework (commit `116e04ba1459f`).

### Step 4.5: Stable History
The sibling powerpc commit was explicitly sent to stable. Lore was not
accessible for deeper investigation (anti-bot protection).

Record: 4 revisions, reviewed by appropriate maintainers, sibling commit
CC'd stable.

---

## PHASE 5: CODE SEMANTIC ANALYSIS

### Step 5.1: Key Functions
- `___bpf_prog_run()` — the main BPF interpreter dispatch loop

### Step 5.2: Callers
`___bpf_prog_run()` is called via the `DEFINE_BPF_PROG_RUN()` and
`DEFINE_BPF_PROG_RUN_ARGS()` macros, which are the entry points for BPF
program execution in interpreter mode. This is a HOT path when JIT is
disabled.

### Step 5.3-5.4: Call Chain
Any BPF program using tail calls that runs in interpreter mode (JIT
disabled, or CONFIG_BPF_JIT_ALWAYS_ON not set) will hit this code path.
This includes:
- XDP programs doing tail calls
- TC classifier programs
- Tracing programs
- Any BPF program type using `bpf_tail_call()`

Record: Core interpreter path, reachable from any BPF tail call when JIT
is disabled.

### Step 5.5: Similar Patterns
The x86 JIT already has the correct pattern:

```775:776:arch/x86/net/bpf_jit_comp.c
        /* Inc tail_call_cnt if the slot is populated. */
        EMIT4(0x48, 0x83, 0x00, 0x01);            /* add qword ptr [rax], 1 */
```

This confirms the interpreter was the outlier with the incorrect
ordering.

---

## PHASE 6: STABLE TREE ANALYSIS

### Step 6.1: Buggy Code in Stable
The buggy code (from commit `04fd61ab36ec`, v4.2) exists in ALL active
stable trees (5.4, 5.10, 5.15, 6.1, 6.6, 6.12). The JMP_TAIL_CALL
section has been nearly unchanged since 2015.

### Step 6.2: Backport Complications
None. The code section is identical across all stable trees (only the
`>` vs `>=` comparison changed in 6.1+, which doesn't affect this fix).
The patch will apply cleanly.

### Step 6.3: Related Fixes in Stable
No similar fix for the interpreter has been applied to stable.

Record: Fix applies to all stable trees. Clean apply expected.

---

## PHASE 7: SUBSYSTEM CONTEXT

### Step 7.1: Subsystem Criticality
**BPF subsystem** (`kernel/bpf/`) — **CORE**. BPF is used extensively by
networking (XDP, TC), tracing, security (seccomp), and observability
tools (bpftrace, Cilium).

### Step 7.2: Activity
Very active subsystem with frequent changes, but the interpreter's tail
call section has been stable for years.

Record: CORE subsystem, very high user impact.

---

## PHASE 8: IMPACT AND RISK ASSESSMENT

### Step 8.1: Affected Population
All users running BPF programs with tail calls in interpreter mode.
While most modern systems enable JIT, the interpreter is the fallback
and is used on architectures without JIT support or when JIT is
explicitly disabled.

### Step 8.2: Trigger Conditions
A BPF program attempts a tail call to an index in a prog_array map that
has no program loaded (NULL slot). This is a normal and expected usage
pattern — programs often check multiple slots.

### Step 8.3: Failure Mode Severity
**MEDIUM**: The bug causes incorrect behavior (premature exhaustion of
tail call budget) but doesn't cause crashes. It can cause BPF programs
to behave incorrectly — legitimate tail calls silently fail when they
shouldn't. This is a correctness issue that can lead to subtle,
hard-to-debug BPF program misbehavior.

### Step 8.4: Risk-Benefit
- **BENEFIT**: Fixes correctness of BPF interpreter tail call counting,
  consistent with JIT behavior. Affects all stable trees.
- **RISK**: Extremely low — 2 lines reordered within a single code
  block, obviously correct, matches JIT behavior.
- **Ratio**: Very favorable — high benefit, near-zero risk.

---

## PHASE 9: FINAL SYNTHESIS

### Step 9.1: Evidence Summary

**FOR backporting:**
- Fixes a real correctness bug in the BPF interpreter present since v4.2
  (2015)
- Extremely small and obviously correct (2-line reorder)
- Makes interpreter consistent with JIT behavior (x86 JIT comment says
  "Inc tail_call_cnt if the slot is populated")
- Suggested by s390 BPF maintainer (Ilya Leoshkevich)
- Accepted by BPF co-maintainer (Alexei Starovoitov)
- Sibling powerpc commit explicitly CC'd stable@vger.kernel.org
- No dependencies, applies cleanly to all stable trees
- Zero regression risk
- Tested by Venkat Rao Bagalkote (sibling commit)

**AGAINST backporting:**
- No crash/security impact — correctness issue only
- Primarily affects interpreter mode (JIT is more commonly used)
- No Fixes: tag on this specific commit (though sibling has one)

### Step 9.2: Stable Rules Checklist
1. Obviously correct and tested? **YES** — trivially verifiable, tested
   in sibling
2. Fixes a real bug? **YES** — incorrect tail call count behavior
3. Important issue? **MEDIUM** — correctness bug causing silent
   misbehavior
4. Small and contained? **YES** — 2 lines in 1 file
5. No new features or APIs? **YES** — no new features
6. Can apply to stable trees? **YES** — clean apply expected

### Step 9.3: Exception Categories
Not an exception category — this is a standard correctness fix.

### Step 9.4: Decision
The fix is trivially small (2-line reorder), obviously correct, fixes a
real correctness bug that has existed since 2015, was suggested by a BPF
maintainer and accepted by the BPF co-maintainer, and the sibling commit
for the same bug in powerpc was explicitly CC'd to stable. The risk is
essentially zero.

---

## Verification

- [Phase 1] Parsed tags: Suggested-by Ilya Leoshkevich (s390 BPF
  maintainer), Signed-off-by Alexei Starovoitov (BPF co-maintainer)
- [Phase 1] Sibling powerpc commit 521bd39d9d28c has explicit `Cc:
  stable@vger.kernel.org` and `Fixes:` tag
- [Phase 2] Diff: 2 lines moved in `___bpf_prog_run()`,
  `tail_call_cnt++` moved after `if (!prog) goto out;`
- [Phase 3] git blame: buggy code from commit 04fd61ab36ec (v4.2,
  2015-05-19), present in all stable trees
- [Phase 3] git show ebf7f6f0a6cdc: confirmed only change to this
  section was `>` to `>=` in 2021
- [Phase 4] b4 dig -c 521bd39d9d28c: found series v1-v4, URL:
  `https://patch.msgid.link/20260303181031.390073-2-hbathini@linux.ibm.com`
- [Phase 4] b4 dig -w: confirmed patch CC'd to stable@vger.kernel.org,
  BPF maintainers, and linuxppc-dev
- [Phase 5] x86 JIT at line 775 has comment "Inc tail_call_cnt if the
  slot is populated" — confirming interpreter was the outlier
- [Phase 5] Interpreter function `___bpf_prog_run()` is the core BPF
  execution path when JIT is disabled
- [Phase 6] JMP_TAIL_CALL section unchanged since v4.2 except for `>=`
  fix — clean apply to all stable trees
- [Phase 8] Failure mode: silent premature tail call budget exhaustion,
  severity MEDIUM
- UNVERIFIED: Lore discussion content (anti-bot protection blocked
  WebFetch), but mbox was partially read confirming stable CC

**YES**

 kernel/bpf/core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 7b675a451ec8e..67eb12b637a5d 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -2087,12 +2087,12 @@ static u64 ___bpf_prog_run(u64 *regs, const struct bpf_insn *insn)
 		if (unlikely(tail_call_cnt >= MAX_TAIL_CALL_CNT))
 			goto out;
 
-		tail_call_cnt++;
-
 		prog = READ_ONCE(array->ptrs[index]);
 		if (!prog)
 			goto out;
 
+		tail_call_cnt++;
+
 		/* ARG1 at this point is guaranteed to point to CTX from
 		 * the verifier side due to the fact that the tail call is
 		 * handled like a helper, that is, bpf_tail_call_proto,
-- 
2.53.0



* [PATCH AUTOSEL 6.18] xsk: respect tailroom for ZC setups
       [not found] <20260420132314.1023554-1-sashal@kernel.org>
                   ` (2 preceding siblings ...)
  2026-04-20 13:19 ` [PATCH AUTOSEL 7.0-5.10] bpf: Do not increment tailcall count when prog is NULL Sasha Levin
@ 2026-04-20 13:20 ` Sasha Levin
  2026-04-20 13:20 ` [PATCH AUTOSEL 7.0-6.6] s390/bpf: Do not increment tailcall count when prog is NULL Sasha Levin
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Sasha Levin @ 2026-04-20 13:20 UTC (permalink / raw)
  To: patches, stable
  Cc: Maciej Fijalkowski, Björn Töpel, Stanislav Fomichev,
	Jakub Kicinski, Sasha Levin, magnus.karlsson, davem, edumazet,
	pabeni, ast, netdev, bpf, linux-kernel

From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>

[ Upstream commit 1ee1605138fc94cc8f8f273321dd2471c64977f9 ]

Multi-buffer XDP stores information about frags in skb_shared_info that
sits at the tailroom of a packet. The storage space is reserved via
xdp_data_hard_end():

	((xdp)->data_hard_start + (xdp)->frame_sz -	\
	 SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))

and then we refer to it via macro below:

static inline struct skb_shared_info *
xdp_get_shared_info_from_buff(const struct xdp_buff *xdp)
{
        return (struct skb_shared_info *)xdp_data_hard_end(xdp);
}

Currently we do not respect this tailroom space in the multi-buffer
AF_XDP ZC scenario. To address this, introduce xsk_pool_get_tailroom()
and use it within xsk_pool_get_rx_frame_size(), which ZC drivers use to
configure the length of the HW Rx buffer.

Drivers typically configure HW Rx buffers with 128-byte alignment, so
align the value returned by xsk_pool_get_rx_frame_size() in order to
avoid having to address this on the driver side. This also handles the
fact that idpf calls the mentioned function *before* pool->dev is set,
so we were at risk of handing HW a value that was not 128-byte aligned
after subtracting the tailroom.

Since xsk_pool_get_rx_frame_size() is actively used in xsk_rcv_check()
and __xsk_rcv(), add a variant of this routine that does not include
the 128-byte alignment, so the old behavior there is preserved.

Reviewed-by: Björn Töpel <bjorn@kernel.org>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Fixes: 24ea50127ecf ("xsk: support mbuf on ZC RX")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://patch.msgid.link/20260402154958.562179-3-maciej.fijalkowski@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

Error: Failed to generate final synthesis

 include/net/xdp_sock_drv.h | 23 ++++++++++++++++++++++-
 net/xdp/xsk.c              |  4 ++--
 2 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/include/net/xdp_sock_drv.h b/include/net/xdp_sock_drv.h
index 33e072768de9d..dd1d3a6e1b780 100644
--- a/include/net/xdp_sock_drv.h
+++ b/include/net/xdp_sock_drv.h
@@ -37,16 +37,37 @@ static inline u32 xsk_pool_get_headroom(struct xsk_buff_pool *pool)
 	return XDP_PACKET_HEADROOM + pool->headroom;
 }
 
+static inline u32 xsk_pool_get_tailroom(bool mbuf)
+{
+	return mbuf ? SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) : 0;
+}
+
 static inline u32 xsk_pool_get_chunk_size(struct xsk_buff_pool *pool)
 {
 	return pool->chunk_size;
 }
 
-static inline u32 xsk_pool_get_rx_frame_size(struct xsk_buff_pool *pool)
+static inline u32 __xsk_pool_get_rx_frame_size(struct xsk_buff_pool *pool)
 {
 	return xsk_pool_get_chunk_size(pool) - xsk_pool_get_headroom(pool);
 }
 
+static inline u32 xsk_pool_get_rx_frame_size(struct xsk_buff_pool *pool)
+{
+	u32 frame_size =  __xsk_pool_get_rx_frame_size(pool);
+	struct xdp_umem *umem = pool->umem;
+	bool mbuf;
+
+	/* Reserve tailroom only for zero-copy pools that opted into
+	 * multi-buffer. The reserved area is used for skb_shared_info,
+	 * matching the XDP core's xdp_data_hard_end() layout.
+	 */
+	mbuf = pool->dev && (umem->flags & XDP_UMEM_SG_FLAG);
+	frame_size -= xsk_pool_get_tailroom(mbuf);
+
+	return ALIGN_DOWN(frame_size, 128);
+}
+
 static inline u32 xsk_pool_get_rx_frag_step(struct xsk_buff_pool *pool)
 {
 	return pool->unaligned ? 0 : xsk_pool_get_chunk_size(pool);
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index a78cdc3356937..259ad9a3abcc4 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -239,7 +239,7 @@ static u32 xsk_copy_xdp(void *to, void **from, u32 to_len,
 
 static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
 {
-	u32 frame_size = xsk_pool_get_rx_frame_size(xs->pool);
+	u32 frame_size = __xsk_pool_get_rx_frame_size(xs->pool);
 	void *copy_from = xsk_copy_xdp_start(xdp), *copy_to;
 	u32 from_len, meta_len, rem, num_desc;
 	struct xdp_buff_xsk *xskb;
@@ -338,7 +338,7 @@ static int xsk_rcv_check(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
 	if (xs->dev != xdp->rxq->dev || xs->queue_id != xdp->rxq->queue_index)
 		return -EINVAL;
 
-	if (len > xsk_pool_get_rx_frame_size(xs->pool) && !xs->sg) {
+	if (len > __xsk_pool_get_rx_frame_size(xs->pool) && !xs->sg) {
 		xs->rx_dropped++;
 		return -ENOSPC;
 	}
-- 
2.53.0



* [PATCH AUTOSEL 7.0-6.6] s390/bpf: Do not increment tailcall count when prog is NULL
       [not found] <20260420132314.1023554-1-sashal@kernel.org>
                   ` (3 preceding siblings ...)
  2026-04-20 13:20 ` [PATCH AUTOSEL 6.18] xsk: respect tailroom for ZC setups Sasha Levin
@ 2026-04-20 13:20 ` Sasha Levin
  2026-04-20 13:21 ` [PATCH AUTOSEL 7.0-6.18] bpf: propagate kvmemdup_bpfptr errors from bpf_prog_verify_signature Sasha Levin
  2026-04-20 13:21 ` [PATCH AUTOSEL 6.18] xsk: validate MTU against usable frame size on bind Sasha Levin
  6 siblings, 0 replies; 10+ messages in thread
From: Sasha Levin @ 2026-04-20 13:20 UTC (permalink / raw)
  To: patches, stable
  Cc: Ilya Leoshkevich, Hari Bathini, Alexei Starovoitov, Sasha Levin,
	daniel, andrii, eddyz87, memxor, hca, gor, agordeev, bpf,
	linux-s390, linux-kernel

From: Ilya Leoshkevich <iii@linux.ibm.com>

[ Upstream commit e4094d56c5592dd90aa619f9480265b0689ed3d9 ]

Currently, tail calling a non-existent prog results in a tailcall count
increment. This matches what the interpreter does, but it is clearly
wrong, so replace load-and-increment and compare-and-jump with load and
compare-and-jump, conditionally followed by increment and store.

Reported-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20260217161058.101346-1-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:


## PHASE 1: COMMIT MESSAGE FORENSICS

### Step 1.1: Subject Line
- **Subsystem**: s390/bpf (s390 architecture BPF JIT compiler)
- **Action verb**: "Do not increment" (correction of incorrect behavior)
- **Summary**: Stop incrementing the tail call counter when the target
  program is NULL (non-existent)

Record: [s390/bpf] [Do not / correct] [Stop incorrect tail call counter
increment for NULL programs]

### Step 1.2: Tags
- **Reported-by**: Hari Bathini <hbathini@linux.ibm.com> - this is the
  same person who wrote the powerpc equivalent fix with `Cc: stable` and
  `Fixes:` tags
- **Signed-off-by**: Ilya Leoshkevich <iii@linux.ibm.com> (author, s390
  BPF JIT maintainer)
- **Link**:
  https://lore.kernel.org/r/20260217161058.101346-1-iii@linux.ibm.com
- **Signed-off-by**: Alexei Starovoitov <ast@kernel.org> (BPF maintainer
  applied the patch)
- No `Fixes:` tag or `Cc: stable` tag on this s390 variant (expected -
  that's why it needs review)
- The **powerpc equivalent** (521bd39d9d28c) has both `Fixes:
  ce0761419fae` and `Cc: stable@vger.kernel.org`

Record: Reported by Hari Bathini, authored by s390 BPF maintainer,
applied by BPF maintainer. Powerpc equivalent explicitly tagged for
stable.

### Step 1.3: Commit Body Analysis
- **Bug described**: Tail calling a non-existent program results in tail
  call count increment
- **Symptom**: Failed tail calls (target NULL) consume the tail call
  budget (MAX_TAIL_CALL_CNT=33), potentially preventing legitimate tail
  calls from succeeding
- **Root cause**: The `laal` (atomic load-and-add) instruction
  increments the counter before the NULL program check; the counter is
  incremented even when the tail call path branches out due to NULL prog
- **Author's explanation**: "replace load-and-increment and
  compare-and-jump with load and compare-and-jump, conditionally
  followed by increment and store"

Record: The bug causes incorrect accounting of tail calls - failed
attempts count against the limit.

### Step 1.4: Hidden Bug Fix Detection
This is NOT hidden - it's an explicit correctness fix. The commit
message says "this is clearly wrong."

Record: Explicit correctness bug fix, not disguised.

## PHASE 2: DIFF ANALYSIS

### Step 2.1: Inventory
- **Files changed**: 1 file (`arch/s390/net/bpf_jit_comp.c`)
- **Lines changed**: ~20 lines modified (net delta: +7 lines)
- **Functions modified**: `bpf_jit_insn()` in the `BPF_JMP |
  BPF_TAIL_CALL` case
- **Scope**: Single-file surgical fix in one function, one case label

Record: [1 file, ~20 lines, single function, surgical scope]

### Step 2.2: Code Flow Change

**Before**:
1. Load 1 into %w0
2. `laal %w1,%w0,off(%r15)` — atomically loads the counter into %w1 and
   adds 1 (counter is now incremented)
3. Compare %w1 (old value) against `MAX_TAIL_CALL_CNT-1`, jump out if
   exceeded
4. Load program pointer from array
5. Check if prog is NULL, branch out if so
6. (Counter is already incremented, even if prog was NULL and we
   branched out)

**After**:
1. `ly %w0,off(%r15)` — load the counter (non-atomic, no increment)
2. `clij %w0,MAX_TAIL_CALL_CNT,0xa,out` — compare against
   MAX_TAIL_CALL_CNT, jump out if >=
3. Load program pointer from array
4. Check if prog is NULL, branch out if so
5. `ahi %w0,1` — increment counter
6. `sty %w0,off(%r15)` — store incremented counter back

The increment now happens ONLY after confirming the program is non-NULL.

### Step 2.3: Bug Mechanism
**Category**: Logic/correctness fix
**Mechanism**: The counter was incremented unconditionally before
checking if the tail call target exists. This matches what x86 JIT
already does correctly (confirmed: x86 `emit_bpf_tail_call_indirect()`
line 775-776 increments after NULL check with comment "Inc tail_call_cnt
if the slot is populated").

### Step 2.4: Fix Quality
- **Obviously correct**: Yes - it matches x86 behavior and the comment
  in the code explains the intent
- **Minimal/surgical**: Yes - only reorders the JIT emissions for the
  tail call sequence
- **Regression risk**: Very low. The change from atomic `laal` to non-
  atomic `ly`/`ahi`/`sty` is safe because `tail_call_cnt` is on the
  stack frame, which is per-CPU/per-thread
- **Red flags**: None

Record: High quality fix, obviously correct, minimal regression risk.

## PHASE 3: GIT HISTORY INVESTIGATION

### Step 3.1: Blame
- The buggy atomic-increment-before-NULL-check pattern was introduced in
  6651ee070b3124 ("s390/bpf: implement bpf_tail_call() helper",
  2015-06-08, v4.2)
- The code has been present for ~11 years across all stable trees
- The `struct prog_frame` refactoring (e26d523edf2a6) changed how
  offsets are computed but didn't change the logic

Record: Bug introduced in v4.2 (2015), present in ALL stable trees.

### Step 3.2: Fixes Tag
No Fixes tag on this commit. The powerpc equivalent fixes ce0761419fae.

### Step 3.3: Related Changes
Key related commits, all in v7.0:
- eada40e057fc1: "Do not write tail call counter into helper/kfunc
  frames" (Fixes: dd691e847d28)
- c861a6b147137: "Write back tail call counter for BPF_PSEUDO_CALL"
  (Fixes: dd691e847d28)
- bc3905a71f025: "Write back tail call counter for
  BPF_TRAMP_F_CALL_ORIG" (Fixes: 528eb2cb87bc)
- e26d523edf2a6: "Describe the frame using a struct instead of
  constants"

These show active work on s390 tail call counter correctness.

Record: This is standalone - no other patches needed.

### Step 3.4: Author
Ilya Leoshkevich is the primary s390 BPF JIT developer/maintainer with
20+ commits to `arch/s390/net/`.

Record: Author is the subsystem maintainer.

### Step 3.5: Dependencies
- Requires `struct prog_frame` from e26d523edf2a6 (IN v7.0)
- For older trees (6.6 and earlier), the patch would need adaptation to
  use `STK_OFF_TCCNT` offsets instead

Record: Applies cleanly to v7.0; needs rework for 6.6 and older.

## PHASE 4: MAILING LIST

Lore was inaccessible due to bot protection. However, key facts
established:
- The powerpc equivalent (521bd39d9d28c) by the same reporter has `Cc:
  stable@vger.kernel.org`
- b4 dig found the related series at
  https://patch.msgid.link/20250813121016.163375-2-iii@linux.ibm.com

Record: Could not fetch lore discussion. Powerpc equivalent explicitly
CC'd stable.

## PHASE 5: CODE SEMANTIC ANALYSIS

### Step 5.1: Key Functions
- `bpf_jit_insn()` - the main JIT emission function, `BPF_JMP |
  BPF_TAIL_CALL` case

### Step 5.2-5.4: Impact Surface
- Every BPF program that uses tail calls on s390 is affected
- The tail call mechanism is a core BPF feature used in XDP, networking,
  and tracing
- s390 is used in enterprise environments (mainframes) where BPF is
  increasingly deployed

Record: Affects all BPF tail call users on s390.

### Step 5.5: Similar Patterns
- The **same bug** was fixed on powerpc in 521bd39d9d28c
- x86 already has the correct ordering (verified in
  `emit_bpf_tail_call_indirect()`)
- The BPF interpreter in `kernel/bpf/core.c` lines 2087-2094 actually
  has the same ordering issue (increments before NULL check), but the
  commit message acknowledges this and calls it "clearly wrong"

Record: Cross-architecture issue, x86 already fixed, powerpc fix
explicitly for stable.

## PHASE 6: STABLE TREE ANALYSIS

### Step 6.1: Buggy Code Existence
- Bug exists since v4.2 (6651ee070b3124) - present in ALL active stable
  trees
- However, the patch as-is only applies cleanly to trees with `struct
  prog_frame` (v7.0)

### Step 6.2: Backport Complications
- v7.0: Should apply cleanly (confirmed code matches the "before" side
  of diff)
- v6.6 and older: Would need rework due to different frame offset
  calculations (`STK_OFF_TCCNT` vs `struct prog_frame`)

Record: Clean apply for v7.0. Older trees need rework.

### Step 6.3: Related Fixes in Stable
No equivalent fix found in stable trees for s390.

## PHASE 7: SUBSYSTEM CONTEXT

### Step 7.1: Subsystem
- **Subsystem**: BPF JIT compiler (arch-specific, s390)
- **Criticality**: IMPORTANT - s390 is used in enterprise mainframe
  environments; BPF is critical for networking, security, and
  observability

### Step 7.2: Activity
The s390/bpf JIT has been actively developed with 20+ commits in v7.0.

## PHASE 8: IMPACT AND RISK ASSESSMENT

### Step 8.1: Who is Affected
- s390 users running BPF programs with tail calls
- Enterprise mainframe users using eBPF for networking, tracing, or
  security

### Step 8.2: Trigger Conditions
- Triggered when a BPF program does a tail call to an array index that
  has no program (NULL)
- This is a common scenario: BPF prog arrays are often sparse with some
  NULL slots
- Can be triggered from userspace (BPF programs are loaded by
  unprivileged users in some configs)

### Step 8.3: Failure Mode Severity
- **Functional failure**: Legitimate tail calls fail prematurely because
  the counter hits MAX_TAIL_CALL_CNT sooner than expected
- **Result**: BPF program behavior is incorrect; tail call chains are
  cut short
- **Severity**: MEDIUM-HIGH (incorrect behavior, program logic failure,
  potential security implications if BPF programs relied on tail call
  guarantees)

### Step 8.4: Risk-Benefit Ratio
- **Benefit**: HIGH - fixes incorrect BPF behavior affecting tail call
  chains
- **Risk**: LOW - ~20 lines, single function, obviously correct, matches
  x86 behavior, from subsystem maintainer
- **Ratio**: Favorable for backporting

## PHASE 9: FINAL SYNTHESIS

### Step 9.1: Evidence Summary

**FOR backporting:**
- Fixes a real correctness bug (tail call counter incorrectly
  incremented for NULL programs)
- Bug has existed since v4.2 (2015) - all stable trees affected
- Powerpc equivalent (521bd39d9d28c) has `Cc: stable@vger.kernel.org`
  and `Fixes:` tag
- x86 JIT already has the correct behavior
- Small, surgical, single-file fix (~20 lines)
- Author is the s390 BPF maintainer; applied by BPF maintainer
- Reported by someone who also fixed it on powerpc and tagged for stable

**AGAINST backporting:**
- No explicit `Fixes:` or `Cc: stable` on the s390 variant itself
- Needs rework for older stable trees (6.6 and earlier)

### Step 9.2: Stable Rules Checklist
1. **Obviously correct and tested?** YES - matches x86 behavior, from
   subsystem maintainer, applied by BPF maintainer
2. **Fixes a real bug?** YES - incorrect tail call counting causing
   premature tail call limit
3. **Important issue?** YES - program correctness (can make BPF programs
   fail)
4. **Small and contained?** YES - ~20 lines in one function in one file
5. **No new features/APIs?** CORRECT - pure bugfix
6. **Can apply to stable?** YES for v7.0 (verified code matches)

### Step 9.3: Exception Categories
Not an exception category - this is a standard bug fix.

### Step 9.4: Decision
This is a clear correctness fix for a long-standing bug in the s390 BPF
JIT. The equivalent powerpc fix was explicitly tagged for stable. The
fix is small, from the maintainer, and brings s390 in line with x86's
already-correct behavior.

## Verification

- [Phase 1] Parsed tags: Reported-by: Hari Bathini (also powerpc fix
  author), SOBs from s390/bpf and BPF maintainers
- [Phase 2] Diff analysis: ~20 lines changed, reorders JIT emissions to
  increment after NULL check
- [Phase 2] Verified x86 JIT (`emit_bpf_tail_call_indirect()` lines
  775-776) already increments after NULL check
- [Phase 2] Verified BPF interpreter (`kernel/bpf/core.c` lines
  2087-2094) has same buggy ordering
- [Phase 3] git blame: buggy code from 6651ee070b3124 (v4.2, 2015),
  present in all stable trees
- [Phase 3] git show 521bd39d9d28c: powerpc equivalent has `Fixes:` and
  `Cc: stable@vger.kernel.org`
- [Phase 3] git merge-base: all dependencies (e26d523edf2a6,
  eada40e057fc1, bc3905a71f025, c861a6b147137) are in v7.0
- [Phase 3] Author (Ilya Leoshkevich) confirmed as s390 BPF JIT
  maintainer via 20+ commits
- [Phase 4] Lore inaccessible (bot protection); b4 dig found related
  series URL
- [Phase 5] Verified callers: `bpf_jit_insn()` handles all BPF JIT
  emission, core function
- [Phase 6] Verified current v7.0 code (lines 1864-1895) matches
  "before" side of diff exactly
- [Phase 6] v6.6 confirmed to have same bug pattern but uses different
  frame offset calculations
- [Phase 8] Impact: all s390 BPF tail call users; trigger: tail call to
  sparse array; severity: MEDIUM-HIGH

**YES**

 arch/s390/net/bpf_jit_comp.c | 23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c
index bf92964246eb1..2112267486623 100644
--- a/arch/s390/net/bpf_jit_comp.c
+++ b/arch/s390/net/bpf_jit_comp.c
@@ -1862,20 +1862,21 @@ static noinline int bpf_jit_insn(struct bpf_jit *jit, struct bpf_prog *fp,
 				 jit->prg);
 
 		/*
-		 * if (tail_call_cnt++ >= MAX_TAIL_CALL_CNT)
+		 * if (tail_call_cnt >= MAX_TAIL_CALL_CNT)
 		 *         goto out;
+		 *
+		 * tail_call_cnt is read into %w0, which needs to be preserved
+		 * until it's incremented and flushed.
 		 */
 
 		off = jit->frame_off +
 		      offsetof(struct prog_frame, tail_call_cnt);
-		/* lhi %w0,1 */
-		EMIT4_IMM(0xa7080000, REG_W0, 1);
-		/* laal %w1,%w0,off(%r15) */
-		EMIT6_DISP_LH(0xeb000000, 0x00fa, REG_W1, REG_W0, REG_15, off);
-		/* clij %w1,MAX_TAIL_CALL_CNT-1,0x2,out */
+		/* ly %w0,off(%r15) */
+		EMIT6_DISP_LH(0xe3000000, 0x0058, REG_W0, REG_0, REG_15, off);
+		/* clij %w0,MAX_TAIL_CALL_CNT,0xa,out */
 		patch_2_clij = jit->prg;
-		EMIT6_PCREL_RIEC(0xec000000, 0x007f, REG_W1, MAX_TAIL_CALL_CNT - 1,
-				 2, jit->prg);
+		EMIT6_PCREL_RIEC(0xec000000, 0x007f, REG_W0, MAX_TAIL_CALL_CNT,
+				 0xa, jit->prg);
 
 		/*
 		 * prog = array->ptrs[index];
@@ -1894,6 +1895,12 @@ static noinline int bpf_jit_insn(struct bpf_jit *jit, struct bpf_prog *fp,
 		patch_3_brc = jit->prg;
 		EMIT4_PCREL_RIC(0xa7040000, 8, jit->prg);
 
+		/* tail_call_cnt++; */
+		/* ahi %w0,1 */
+		EMIT4_IMM(0xa70a0000, REG_W0, 1);
+		/* sty %w0,off(%r15) */
+		EMIT6_DISP_LH(0xe3000000, 0x0050, REG_W0, REG_0, REG_15, off);
+
 		/*
 		 * Restore registers before calling function
 		 */
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH AUTOSEL 7.0-6.18] bpf: propagate kvmemdup_bpfptr errors from bpf_prog_verify_signature
       [not found] <20260420132314.1023554-1-sashal@kernel.org>
                   ` (4 preceding siblings ...)
  2026-04-20 13:20 ` [PATCH AUTOSEL 7.0-6.6] s390/bpf: Do not increment tailcall count when prog is NULL Sasha Levin
@ 2026-04-20 13:21 ` Sasha Levin
  2026-04-20 13:21 ` [PATCH AUTOSEL 6.18] xsk: validate MTU against usable frame size on bind Sasha Levin
  6 siblings, 0 replies; 10+ messages in thread
From: Sasha Levin @ 2026-04-20 13:21 UTC (permalink / raw)
  To: patches, stable
  Cc: Weixie Cui, Jiri Olsa, Alexei Starovoitov, Sasha Levin, daniel,
	andrii, eddyz87, memxor, bpf, linux-kernel

From: Weixie Cui <cuiweixie@gmail.com>

[ Upstream commit ad2f7ed0ee91d63792cbe52f2b38325918ae3daa ]

kvmemdup_bpfptr() returns -EFAULT when the user pointer cannot be
copied, and -ENOMEM on allocation failure. The error path always
returned -ENOMEM, misreporting bad addresses as out-of-memory.

Return PTR_ERR(sig) so user space gets the correct errno.

Signed-off-by: Weixie Cui <cuiweixie@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/tencent_C9C5B2B28413D6303D505CD02BFEA4708C07@qq.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

## PHASE 1: COMMIT MESSAGE FORENSICS

**Step 1.1: Subject Line**
- Subsystem: `bpf`
- Action verb: "propagate" (indicating a correctness fix for error
  handling)
- Summary: Propagate correct errno from `kvmemdup_bpfptr()` in
  `bpf_prog_verify_signature()`

**Step 1.2: Tags**
- `Signed-off-by: Weixie Cui <cuiweixie@gmail.com>` - author
- `Acked-by: Jiri Olsa <jolsa@kernel.org>` - well-known BPF developer,
  regular contributor
- `Link:` to lore.kernel.org
- `Signed-off-by: Alexei Starovoitov <ast@kernel.org>` - BPF co-
  maintainer applied the patch
- No `Fixes:` tag, no `Cc: stable`, no `Reported-by:` (all expected for
  autosel review)

**Step 1.3: Body Text**
The commit message clearly describes: `kvmemdup_bpfptr()` can return
either `-EFAULT` (bad user pointer) or `-ENOMEM` (allocation failure),
but the error path always returned `-ENOMEM`, misreporting bad addresses
as out-of-memory. The fix returns `PTR_ERR(sig)` to propagate the
correct errno.

**Step 1.4: Hidden Bug Fix?**
This is an explicit correctness fix for error reporting. Not hidden.

Record: [bpf] [propagate/correct] [fixes incorrect errno returned to
userspace from kvmemdup_bpfptr failure]

## PHASE 2: DIFF ANALYSIS

**Step 2.1: Inventory**
- Single file: `kernel/bpf/syscall.c`
- 1 line changed: `-ENOMEM` → `PTR_ERR(sig)`
- Function modified: `bpf_prog_verify_signature()`
- Classification: single-file, single-line surgical fix

**Step 2.2: Code Flow Change**
Before: When `kvmemdup_bpfptr()` returns an error (either `-EFAULT` or
`-ENOMEM`), the function unconditionally returns `-ENOMEM`.
After: The function returns the actual error code from the failed call.

**Step 2.3: Bug Mechanism**
Category: Logic/correctness fix - incorrect error code returned to
userspace. When a user passes a bad pointer for the BPF program
signature, they get `ENOMEM` instead of `EFAULT`.

Verified from `include/linux/bpfptr.h`:

```68:79:include/linux/bpfptr.h
static inline void *kvmemdup_bpfptr_noprof(bpfptr_t src, size_t len)
{
        void *p = kvmalloc_node_align_noprof(len, 1,
                        GFP_USER | __GFP_NOWARN, NUMA_NO_NODE);

        if (!p)
                return ERR_PTR(-ENOMEM);
        if (copy_from_bpfptr(p, src, len)) {
                kvfree(p);
                return ERR_PTR(-EFAULT);
        }
        return p;
}
```

Other call sites in the same file (`___bpf_copy_key` at line 1700,
`map_update_elem` at lines 1806-1808) already correctly propagate
`PTR_ERR()`. This fix brings `bpf_prog_verify_signature()` into
consistency.

**Step 2.4: Fix Quality**
- Obviously correct: trivial `PTR_ERR()` idiom, standard kernel pattern
- Minimal: 1 line
- Zero regression risk: only changes which errno is returned, cannot
  break any functionality
- No API changes, no structure changes

## PHASE 3: GIT HISTORY INVESTIGATION

**Step 3.1: Blame**
The buggy line was introduced by commit `34927156830369` (KP Singh,
2025-09-21) - "bpf: Implement signature verification for BPF programs".
This was a v7 patch series for signed BPF programs. The incorrect
`-ENOMEM` has been present since the function's introduction.

**Step 3.2: Fixes tag**
No Fixes: tag present. The bug was introduced in `34927156830369`
(v6.18).

**Step 3.3: File History**
The file is actively maintained (many recent commits). No related fixes
for this specific issue found.

**Step 3.4: Author**
Weixie Cui is not a frequent BPF contributor (no other commits found in
the tree). However, the patch was acked by Jiri Olsa (major BPF
developer) and applied by Alexei Starovoitov (BPF co-maintainer).

**Step 3.5: Dependencies**
None. This is a completely standalone one-line fix.

## PHASE 4: MAILING LIST AND EXTERNAL RESEARCH

**Step 4.1-4.2:** Lore.kernel.org is behind Anubis PoW protection, so
WebFetch fails. b4 dig found the original series that introduced the
function (v1 through v7 of the "Signed BPF programs" series). The fix
commit itself is not yet in the tree being analyzed (it's a candidate
for backport).

**Step 4.3:** No Reported-by tag. This is a code-review-found bug
(author noticed incorrect error propagation).

**Step 4.4-4.5:** Could not access lore due to bot protection. No series
dependencies for the fix.

## PHASE 5: CODE SEMANTIC ANALYSIS

**Step 5.1-5.4: Call Chain**
`__sys_bpf()` (syscall handler) → `bpf_prog_load()` →
`bpf_prog_verify_signature()` → `kvmemdup_bpfptr()`

This is directly reachable from the BPF syscall (`BPF_PROG_LOAD`
command) when `attr->signature` is set. Any userspace program loading a
signed BPF program can trigger this code path.

**Step 5.5: Similar Patterns**
The other two `kvmemdup_bpfptr()` callsites in the same file correctly
use `PTR_ERR()`. This is the only inconsistent callsite.

## PHASE 6: STABLE TREE ANALYSIS

**Step 6.1:** The `bpf_prog_verify_signature` function was introduced in
commit `34927156830369` which first appeared in v6.18. Verified:
- NOT in v6.12, v6.13, v6.14, v6.15 (exit code 1 from `merge-base --is-
  ancestor`)
- IS in v6.18, v6.19, v7.0

So only 6.18.y, 6.19.y, and 7.0.y stable trees are affected (if active).

**Step 6.2:** The fix would apply cleanly to 7.0.y. For 6.18.y/6.19.y,
the context differs slightly (missing the `KMALLOC_MAX_CACHE_SIZE` check
from `ea1535e28bb377` which is only in v7.0), but the changed line
itself is identical.

**Step 6.3:** No related fixes already in stable.

## PHASE 7: SUBSYSTEM AND MAINTAINER CONTEXT

**Step 7.1:** BPF subsystem (`kernel/bpf/`) - IMPORTANT level. BPF is
widely used for networking, security, tracing, etc. The signature
verification feature specifically serves security use cases.

**Step 7.2:** Actively maintained subsystem with frequent commits.

## PHASE 8: IMPACT AND RISK ASSESSMENT

**Step 8.1:** Affected users: those loading signed BPF programs
(security-conscious environments using BPF program signature
verification).

**Step 8.2:** Trigger: pass an invalid signature pointer via
`BPF_PROG_LOAD`. Unprivileged users can trigger this (if they have BPF
access). Not a crash path - just wrong errno.

**Step 8.3:** Failure mode: Incorrect errno returned to userspace
(`ENOMEM` instead of `EFAULT`). Severity: **LOW**. No crash, no data
corruption, no security vulnerability. But misleading error codes can
cause tools to take incorrect recovery actions (e.g., backing off for
memory pressure instead of reporting a programming error).

**Step 8.4:**
- Benefit: LOW-MEDIUM (correct errno for BPF signature verification
  users)
- Risk: NEAR-ZERO (1 line, standard `PTR_ERR()` pattern, cannot regress)
- Ratio: Favorable - benefit > risk

## PHASE 9: FINAL SYNTHESIS

**Evidence FOR backporting:**
- Fixes a real bug (incorrect errno to userspace)
- Trivially correct one-line fix using standard kernel idiom
- Zero regression risk
- Acked by major BPF developer (Jiri Olsa), applied by BPF maintainer
  (Alexei Starovoitov)
- Brings code into consistency with other `kvmemdup_bpfptr()` callsites
- Directly reachable from syscall path
- Standalone, no dependencies

**Evidence AGAINST backporting:**
- Low severity: incorrect errno, not a crash/corruption/security issue
- Only affects relatively new code (v6.18+), limited to a few stable
  trees
- No Reported-by, no evidence users have been affected in practice
- Stable rules ask for "important issues" - wrong errno is debatable

**Stable Rules Checklist:**
1. Obviously correct and tested? YES
2. Fixes a real bug? YES (incorrect errno)
3. Important issue? BORDERLINE - incorrect errno is a real bug but low
   severity
4. Small and contained? YES (1 line)
5. No new features or APIs? YES
6. Can apply to stable trees? YES (clean apply to 7.0.y)

## Verification

- [Phase 1] Parsed tags: Acked-by Jiri Olsa, SOB by Alexei Starovoitov
- [Phase 2] Diff analysis: 1 line changed, `-ENOMEM` -> `PTR_ERR(sig)`
  in `bpf_prog_verify_signature()`
- [Phase 2] Verified `kvmemdup_bpfptr` returns `-EFAULT` or `-ENOMEM`
  from `include/linux/bpfptr.h:68-79`
- [Phase 2] Verified other callsites (lines 1700, 1806-1808) already use
  `PTR_ERR()` correctly
- [Phase 3] git blame: buggy code introduced in `34927156830369` (KP
  Singh, 2025-09-21, v6.18)
- [Phase 3] Verified `34927156830369` NOT in v6.12 (exit 1), IS in v6.18
  (exit 0)
- [Phase 4] b4 dig found original series at lore (v7, "Signed BPF
  programs" series)
- [Phase 4] lore.kernel.org inaccessible via WebFetch (Anubis
  protection)
- [Phase 5] Call chain: `__sys_bpf()` -> `bpf_prog_load()` (line 6249)
  -> `bpf_prog_verify_signature()` (line 3034) - verified via grep
- [Phase 6] Bug exists only in v6.18+, NOT in v6.12 or earlier LTS trees
- [Phase 8] Severity: LOW (incorrect errno, no crash/corruption)
- UNVERIFIED: Could not access lore discussion to check if stable was
  discussed by reviewers

This is a borderline case. The fix is a trivially correct one-line
change with zero regression risk, fixing a genuine incorrect errno bug
in the BPF syscall path. While the severity is low (wrong error code,
not a crash), the near-zero risk makes the risk-benefit ratio favorable.
The incorrect errno could mislead BPF tools into treating a bad pointer
as a memory pressure issue.

**YES**

 kernel/bpf/syscall.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 700938782bed2..cbb8696c04b59 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2832,7 +2832,7 @@ static int bpf_prog_verify_signature(struct bpf_prog *prog, union bpf_attr *attr
 	sig = kvmemdup_bpfptr(usig, attr->signature_size);
 	if (IS_ERR(sig)) {
 		bpf_key_put(key);
-		return -ENOMEM;
+		return PTR_ERR(sig);
 	}
 
 	bpf_dynptr_init(&sig_ptr, sig, BPF_DYNPTR_TYPE_LOCAL, 0,
-- 
2.53.0



* [PATCH AUTOSEL 6.18] xsk: validate MTU against usable frame size on bind
       [not found] <20260420132314.1023554-1-sashal@kernel.org>
                   ` (5 preceding siblings ...)
  2026-04-20 13:21 ` [PATCH AUTOSEL 7.0-6.18] bpf: propagate kvmemdup_bpfptr errors from bpf_prog_verify_signature Sasha Levin
@ 2026-04-20 13:21 ` Sasha Levin
  2026-04-21 18:02   ` sashiko-bot
  6 siblings, 1 reply; 10+ messages in thread
From: Sasha Levin @ 2026-04-20 13:21 UTC (permalink / raw)
  To: patches, stable
  Cc: Maciej Fijalkowski, Björn Töpel, Jakub Kicinski,
	Sasha Levin, magnus.karlsson, davem, edumazet, pabeni, ast,
	netdev, bpf, linux-kernel

From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>

[ Upstream commit 36ee60b569ba0dfb6f961333b90d19ab5b323fa9 ]

AF_XDP bind currently accepts zero-copy pool configurations without
verifying that the device MTU fits into the usable frame space provided
by the UMEM chunk.

This becomes a problem since we started to respect tailroom, which is
subtracted from chunk_size (along with headroom). A 2k chunk size might
not provide enough space for a standard 1500 MTU, so let us catch such
settings at bind time. Furthermore, validate whether underlying HW will
be able to satisfy configured MTU wrt XSK's frame size multiplied by
supported Rx buffer chain length (that is exposed via
net_device::xdp_zc_max_segs).

Fixes: 24ea50127ecf ("xsk: support mbuf on ZC RX")
Reviewed-by: Björn Töpel <bjorn@kernel.org>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://patch.msgid.link/20260402154958.562179-5-maciej.fijalkowski@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

Error: Failed to generate final synthesis
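
The validation the diff below adds can be exercised with plain arithmetic (a sketch: `ETH_PAD_LEN` and the 128-byte `ALIGN_DOWN` come from the patch, the 320-byte tailroom is an assumed `skb_shared_info` footprint that varies by architecture, and the usable-frame formula simplifies the pool helpers the patch calls):

```c
#include <assert.h>

#define ETH_HLEN		14
#define VLAN_HLEN		4
#define ETH_FCS_LEN		4
#define ETH_PAD_LEN		(ETH_HLEN + 2 * VLAN_HLEN + ETH_FCS_LEN)
#define XDP_PACKET_HEADROOM	256
#define ALIGN_DOWN(x, a)	((x) / (a) * (a))

/* Simplified model of the bind-time check: does mtu + L2 overhead fit
 * into (usable frame size) * (supported Rx buffer chain length)? */
static int mtu_fits(unsigned int mtu, unsigned int chunk_size,
		    unsigned int tailroom, unsigned int segs)
{
	unsigned int needed = mtu + ETH_PAD_LEN;
	unsigned int frame = ALIGN_DOWN(chunk_size - XDP_PACKET_HEADROOM -
					tailroom, 128);

	return needed <= frame * segs;
}
```

With these assumed values a 2048-byte chunk leaves 1408 usable bytes, which is indeed too small for a 1500-byte MTU (1526 bytes with L2 overhead) unless multi-buffer chaining is available.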

 net/xdp/xsk_buff_pool.c | 28 +++++++++++++++++++++++++---
 1 file changed, 25 insertions(+), 3 deletions(-)

diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
index 677c7d00f8c32..a129ce6f1c25f 100644
--- a/net/xdp/xsk_buff_pool.c
+++ b/net/xdp/xsk_buff_pool.c
@@ -10,6 +10,8 @@
 #include "xdp_umem.h"
 #include "xsk.h"
 
+#define ETH_PAD_LEN (ETH_HLEN + 2 * VLAN_HLEN  + ETH_FCS_LEN)
+
 void xp_add_xsk(struct xsk_buff_pool *pool, struct xdp_sock *xs)
 {
 	unsigned long flags;
@@ -165,8 +167,12 @@ static void xp_disable_drv_zc(struct xsk_buff_pool *pool)
 int xp_assign_dev(struct xsk_buff_pool *pool,
 		  struct net_device *netdev, u16 queue_id, u16 flags)
 {
+	u32 needed = netdev->mtu + ETH_PAD_LEN;
+	u32 segs = netdev->xdp_zc_max_segs;
+	bool mbuf = flags & XDP_USE_SG;
 	bool force_zc, force_copy;
 	struct netdev_bpf bpf;
+	u32 frame_size;
 	int err = 0;
 
 	ASSERT_RTNL();
@@ -186,7 +192,7 @@ int xp_assign_dev(struct xsk_buff_pool *pool,
 	if (err)
 		return err;
 
-	if (flags & XDP_USE_SG)
+	if (mbuf)
 		pool->umem->flags |= XDP_UMEM_SG_FLAG;
 
 	if (flags & XDP_USE_NEED_WAKEUP)
@@ -208,8 +214,24 @@ int xp_assign_dev(struct xsk_buff_pool *pool,
 		goto err_unreg_pool;
 	}
 
-	if (netdev->xdp_zc_max_segs == 1 && (flags & XDP_USE_SG)) {
-		err = -EOPNOTSUPP;
+	if (mbuf) {
+		if (segs == 1) {
+			err = -EOPNOTSUPP;
+			goto err_unreg_pool;
+		}
+	} else {
+		segs = 1;
+	}
+
+	/* open-code xsk_pool_get_rx_frame_size() as pool->dev is not
+	 * set yet at this point; we are before getting down to driver
+	 */
+	frame_size = __xsk_pool_get_rx_frame_size(pool) -
+		     xsk_pool_get_tailroom(mbuf);
+	frame_size = ALIGN_DOWN(frame_size, 128);
+
+	if (needed > frame_size * segs) {
+		err = -EINVAL;
 		goto err_unreg_pool;
 	}
 
-- 
2.53.0



* Re: [PATCH AUTOSEL 7.0-5.10] bpf: Do not increment tailcall count when prog is NULL
  2026-04-20 13:19 ` [PATCH AUTOSEL 7.0-5.10] bpf: Do not increment tailcall count when prog is NULL Sasha Levin
@ 2026-04-20 21:22   ` sashiko-bot
  0 siblings, 0 replies; 10+ messages in thread
From: sashiko-bot @ 2026-04-20 21:22 UTC (permalink / raw)
  To: Sasha Levin; +Cc: bpf

Sashiko AI review found 1 potential issue(s):
- [High] Incomplete fix creating BPF architectural divergence: the tail call counter is still incremented before the NULL check in multiple BPF JIT implementations.
--

commit df5323d5a23d7d76d6c427c88e1a292e4d37a90e
Author: Hari Bathini <hbathini@linux.ibm.com>

bpf: Do not increment tailcall count when prog is NULL

This commit updates the BPF interpreter to increment the tail call
counter only after verifying the target program exists. This prevents
failing tail calls from exhausting the limit.

Link: https://lore.kernel.org/r/20260220062959.195101-1-hbathini@linux.ibm.com

> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index 7b675a451ec8e..67eb12b637a5d 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -2087,12 +2087,12 @@ static u64 ___bpf_prog_run(u64 *regs, const struct bpf_insn *insn)
>  		if (unlikely(tail_call_cnt >= MAX_TAIL_CALL_CNT))
>  			goto out;
>  
> -		tail_call_cnt++;
> -
>  		prog = READ_ONCE(array->ptrs[index]);
>  		if (!prog)
>  			goto out;
>  
> +		tail_call_cnt++;
> +

Does this change introduce a semantic divergence between the BPF
interpreter and various BPF JIT implementations?

While this updates the interpreter, multiple JIT implementations (such as
x86-32, ARM32, s390x, and RISC-V 64) appear to still modify the tail call
counter before performing the prog == NULL check.

For example, if the JIT-generated code updates the counter before the
NULL check and does not revert it on the fall-through path, the counter
will remain modified.

Will a BPF program making many failed tail calls now exhaust its 33-call
limit when JIT-compiled on these architectures, but not when running under
the interpreter?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260420132314.1023554-191-sashal@kernel.org?part=1


* Re: [PATCH AUTOSEL 6.18] xsk: tighten UMEM headroom validation to account for tailroom and min frame
  2026-04-20 13:18 ` [PATCH AUTOSEL 6.18] xsk: tighten UMEM headroom validation to account for tailroom and min frame Sasha Levin
@ 2026-04-21 17:29   ` sashiko-bot
  0 siblings, 0 replies; 10+ messages in thread
From: sashiko-bot @ 2026-04-21 17:29 UTC (permalink / raw)
  To: Sasha Levin; +Cc: bpf

Sashiko AI review found 2 potential issue(s):
- [High] The commit introduces a UAPI backward compatibility breakage and makes UMEM headroom limits unpredictable for userspace.
- [High] Permanent `locked_vm` accounting leak during short pins in `xdp_umem_pin_pages()`.
--

commit 39b8ee9e55bd1dabb902c8f19155e836abf3b229
Author: Maciej Fijalkowski <maciej.fijalkowski@intel.com>

xsk: tighten UMEM headroom validation to account for tailroom and min frame

This commit modifies the headroom validation in xdp_umem_reg() to reserve
128 bytes for the minimum ethernet frame size and enough tailroom for
skb_shared_info. This prevents potential corruption at the end of
multi-buffer XSK frames.

Link: https://patch.msgid.link/20260402154958.562179-2-maciej.fijalkowski@intel.com

> diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
> index 9f76ca591d54f..9ec7bd948acc7 100644
> --- a/net/xdp/xdp_umem.c
> +++ b/net/xdp/xdp_umem.c

[ ... ]

> @@ -202,7 +202,8 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
>  	if (!unaligned_chunks && chunks_rem)
>  		return -EINVAL;
>  
> -	if (headroom >= chunk_size - XDP_PACKET_HEADROOM)
> +	if (headroom > chunk_size - XDP_PACKET_HEADROOM -
> +		       SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) - 128)
>  		return -EINVAL;
>  
>  	if (mr->flags & XDP_UMEM_TX_METADATA_LEN) {

Does this change make the maximum allowed headroom unpredictable for userspace?

Since sizeof(struct skb_shared_info) and SKB_DATA_ALIGN depend on kernel
internal structures and architecture-specific cache line sizes, userspace
applications do not have access to these values to deterministically
calculate the limit.

Could this also break existing userspace applications that legitimately
configured larger headrooms but do not enable multi-buffer later on?
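
The tightening can be made concrete with a worked example (assumed values: a 2048-byte chunk, 256-byte XDP_PACKET_HEADROOM, and a 320-byte `SKB_DATA_ALIGN(sizeof(struct skb_shared_info))`, which varies by architecture and config):

```c
#include <assert.h>

#define CHUNK		2048
#define XDP_HEADROOM	256
#define SHINFO		320	/* assumed; arch/config dependent */

/* Old bound: headroom merely had to leave XDP_PACKET_HEADROOM free. */
static int headroom_ok_old(unsigned int h)
{
	return h < CHUNK - XDP_HEADROOM;
}

/* New bound: also reserve tailroom plus 128 bytes of minimum frame. */
static int headroom_ok_new(unsigned int h)
{
	return h <= CHUNK - XDP_HEADROOM - SHINFO - 128;
}
```

Under these assumptions the maximum accepted headroom drops from 1791 to 1344 bytes, so a previously valid registration in that window now fails with -EINVAL.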

While looking at the surrounding code in xdp_umem_reg(), is there an existing
locked_vm accounting leak in the error paths?

When xdp_umem_reg() attempts to pin pages, it first accounts them:

	err = xdp_umem_account_pages(umem);
	if (err)
		return err;

	err = xdp_umem_pin_pages(umem, (unsigned long)addr);
	if (err)
		goto out_account;

If pin_user_pages() successfully pins some, but not all pages,
xdp_umem_pin_pages() updates umem->npgs to the smaller count before returning
an error:

	npgs = pin_user_pages(address, umem->npgs,
			      gup_flags | FOLL_LONGTERM, &umem->pgs[0]);
	...
	if (npgs != umem->npgs) {
		if (npgs >= 0) {
			umem->npgs = npgs;
			err = -ENOMEM;
			goto out_pin;
		}

Then xdp_umem_reg() jumps to out_account and calls
xdp_umem_unaccount_pages(umem). This subtracts the newly modified, smaller
umem->npgs from umem->user->locked_vm, which appears to permanently leak the
difference in the user's locked_vm counter.

Can this be repeatedly triggered by passing ranges with unmapped holes to
exhaust the memory lock limit?
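
The accounting arithmetic in that scenario can be modeled directly (a sketch with illustrative names, not the kernel's structures): account the requested page count up front, shrink `npgs` after a short pin, then unaccount the smaller count.

```c
#include <assert.h>

struct user_acct { unsigned long locked_vm; };

static void account(struct user_acct *u, unsigned long npgs)
{
	u->locked_vm += npgs;
}

static void unaccount(struct user_acct *u, unsigned long npgs)
{
	u->locked_vm -= npgs;
}

/* Model of the described error path: pin_user_pages() pins fewer pages
 * than requested, npgs is overwritten with the smaller count, and the
 * unwind unaccounts only that smaller count. */
static void umem_reg_short_pin(struct user_acct *u, unsigned long requested,
			       unsigned long pinned)
{
	unsigned long npgs = requested;

	account(u, npgs);
	npgs = pinned;		/* short pin shrinks the count */
	unaccount(u, npgs);	/* difference stays in locked_vm */
}
```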

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260420132314.1023554-143-sashal@kernel.org?part=1


* Re: [PATCH AUTOSEL 6.18] xsk: validate MTU against usable frame size on bind
  2026-04-20 13:21 ` [PATCH AUTOSEL 6.18] xsk: validate MTU against usable frame size on bind Sasha Levin
@ 2026-04-21 18:02   ` sashiko-bot
  0 siblings, 0 replies; 10+ messages in thread
From: sashiko-bot @ 2026-04-21 18:02 UTC (permalink / raw)
  To: Sasha Levin; +Cc: bpf

Sashiko AI review found 4 potential issue(s):
- [Critical] The patch references undefined functions `__xsk_pool_get_rx_frame_size()` and `xsk_pool_get_tailroom()`, which will cause a build failure.
- [High] Secondary sockets binding to a shared UMEM drop the `XDP_USE_SG` flag, causing spurious MTU validation failures.
- [High] The `XDP_UMEM_SG_FLAG` is set on the persistent UMEM object before the new MTU validation and is not reverted on failure, leading to socket state corruption.
- [High] An integer underflow in the calculation of `frame_size` bypasses the MTU validation, allowing out-of-bounds DMA writes.
--

commit f3a660b520a18a5f378e6506df30590629252c09
Author: Maciej Fijalkowski <maciej.fijalkowski@intel.com>

    xsk: validate MTU against usable frame size on bind

AF_XDP bind currently accepts zero-copy pool configurations without
verifying that the device MTU fits into the usable frame space provided
by the UMEM chunk. 2k chunk size might not provide enough space for
standard 1500 MTU, so catch such settings at bind time.

Link: https://patch.msgid.link/20260402154958.562179-5-maciej.fijalkowski@intel.com

> diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
> index aa9788f20d0db..5641de2ed60d1 100644
> --- a/net/xdp/xsk_buff_pool.c
> +++ b/net/xdp/xsk_buff_pool.c
> @@ -165,8 +167,12 @@ static void xp_disable_drv_zc(struct xsk_buff_pool *pool)
>  int xp_assign_dev(struct xsk_buff_pool *pool,
>  		  struct net_device *netdev, u16 queue_id, u16 flags)
>  {
> +	u32 needed = netdev->mtu + ETH_PAD_LEN;
> +	u32 segs = netdev->xdp_zc_max_segs;
> +	bool mbuf = flags & XDP_USE_SG;

Secondary sockets binding to a shared UMEM omit the XDP_USE_SG flag in
xp_assign_dev_shared(). Does this cause mbuf to evaluate to false and
force segs = 1, leading to spurious MTU validation failures for secondary
sockets on devices with jumbo frames?

>  	bool force_zc, force_copy;
>  	struct netdev_bpf bpf;
> +	u32 frame_size;
>  	int err = 0;
>  
>  	ASSERT_RTNL();

[ ... ]

> @@ -186,7 +192,7 @@ int xp_assign_dev(struct xsk_buff_pool *pool,
>  	if (err)
>  		return err;
>  
> -	if (flags & XDP_USE_SG)
> +	if (mbuf)
>  		pool->umem->flags |= XDP_UMEM_SG_FLAG;

If the MTU validation fails later in this function, the XDP_UMEM_SG_FLAG
will remain set on the persistent UMEM object. Can this cause a subsequent
bind on the same UMEM without XDP_USE_SG to bypass SG device setup but
still be placed in multi-buffer mode, leading to driver inconsistency?

[ ... ]

> @@ -208,8 +214,24 @@ int xp_assign_dev(struct xsk_buff_pool *pool,
>  		goto err_unreg_pool;
>  	}
>  
> -	if (netdev->xdp_zc_max_segs == 1 && (flags & XDP_USE_SG)) {
> -		err = -EOPNOTSUPP;
> +	if (mbuf) {
> +		if (segs == 1) {
> +			err = -EOPNOTSUPP;
> +			goto err_unreg_pool;
> +		}
> +	} else {
> +		segs = 1;
> +	}
> +
> +	/* open-code xsk_pool_get_rx_frame_size() as pool->dev is not
> +	 * set yet at this point; we are before getting down to driver
> +	 */
> +	frame_size = __xsk_pool_get_rx_frame_size(pool) -
> +		     xsk_pool_get_tailroom(mbuf);

Are __xsk_pool_get_rx_frame_size() and xsk_pool_get_tailroom() defined
in the kernel tree? It looks like this might cause an implicit function
declaration error and break the build. The comment mentions open-coding
these functions, but they appear to be called directly.

Also, if the XDP UMEM is configured with a large headroom, the available
rx frame size could be smaller than the tailroom. Would this subtraction
underflow the u32 frame_size, leading to a huge positive value that bypasses
the MTU validation and allows out-of-bounds DMA writes?

> +	frame_size = ALIGN_DOWN(frame_size, 128);
> +
> +	if (needed > frame_size * segs) {
> +		err = -EINVAL;
>  		goto err_unreg_pool;
>  	}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260420132314.1023554-312-sashal@kernel.org?part=1

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2026-04-21 18:02 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20260420132314.1023554-1-sashal@kernel.org>
2026-04-20 13:16 ` [PATCH AUTOSEL 6.18] xsk: fix XDP_UMEM_SG_FLAG issues Sasha Levin
2026-04-20 13:18 ` [PATCH AUTOSEL 6.18] xsk: tighten UMEM headroom validation to account for tailroom and min frame Sasha Levin
2026-04-21 17:29   ` sashiko-bot
2026-04-20 13:19 ` [PATCH AUTOSEL 7.0-5.10] bpf: Do not increment tailcall count when prog is NULL Sasha Levin
2026-04-20 21:22   ` sashiko-bot
2026-04-20 13:20 ` [PATCH AUTOSEL 6.18] xsk: respect tailroom for ZC setups Sasha Levin
2026-04-20 13:20 ` [PATCH AUTOSEL 7.0-6.6] s390/bpf: Do not increment tailcall count when prog is NULL Sasha Levin
2026-04-20 13:21 ` [PATCH AUTOSEL 7.0-6.18] bpf: propagate kvmemdup_bpfptr errors from bpf_prog_verify_signature Sasha Levin
2026-04-20 13:21 ` [PATCH AUTOSEL 6.18] xsk: validate MTU against usable frame size on bind Sasha Levin
2026-04-21 18:02   ` sashiko-bot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox