[PATCH AUTOSEL 6.1 04/46] bpf: Check rcu_read_lock_trace_held() in bpf_map_lookup_percpu

bpf.vger.kernel.org archive mirror

* [PATCH AUTOSEL 6.1 04/46] bpf: Check rcu_read_lock_trace_held() in bpf_map_lookup_percpu_elem()
       [not found] <20250604010404.5109-1-sashal@kernel.org>
@ 2025-06-04  1:03 ` Sasha Levin
  2025-06-04  1:03 ` [PATCH AUTOSEL 6.1 29/46] bpftool: Fix cgroup command to only show cgroup bpf programs Sasha Levin
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 4+ messages in thread
From: Sasha Levin @ 2025-06-04  1:03 UTC (permalink / raw)
  To: patches, stable
  Cc: Hou Tao, syzbot+dce5aae19ae4d6399986, Alexei Starovoitov,
	Sasha Levin, daniel, andrii, bpf

From: Hou Tao <houtao1@huawei.com>

[ Upstream commit d4965578267e2e81f67c86e2608481e77e9c8569 ]

The bpf_map_lookup_percpu_elem() helper is also available for sleepable bpf
programs. When BPF JIT is disabled or on a 32-bit host,
bpf_map_lookup_percpu_elem() will not be inlined. Using it in a
sleepable bpf program will trigger the warning in
bpf_map_lookup_percpu_elem(), because the bpf program only holds
rcu_read_lock_trace lock. Therefore, add the missing check.

Reported-by: syzbot+dce5aae19ae4d6399986@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/bpf/000000000000176a130617420310@google.com/
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20250526062534.1105938-1-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

**YES** This commit should be backported to stable kernel trees based on
my analysis:

## Analysis

**1. Clear Bug Fix:** This commit fixes a legitimate bug where
`bpf_map_lookup_percpu_elem()` triggers a warning in sleepable BPF
programs when the BPF JIT is disabled or on 32-bit hosts, where the
helper is not inlined. The warning fires because sleepable BPF programs
hold `rcu_read_lock_trace` rather than `rcu_read_lock`, but the helper
only checked for the latter (and for `rcu_read_lock_bh`).

**2. Minimal, Contained Change:** The fix is small and surgical: it only
adds `!rcu_read_lock_trace_held()` to the existing `WARN_ON_ONCE()`
condition in `bpf_map_lookup_percpu_elem()` in kernel/bpf/helpers.c. This
follows the same pattern as the earlier fix to the basic map helpers
(commit 169410eba271).

**3. Strong Historical Precedent:** Similar commits have been
successfully backported:
- **Commit 169410eba271** (Similar Commit #1, backport status: YES):
  added the same `rcu_read_lock_trace_held()` check to the
  `bpf_map_{lookup,update,delete}_elem()` helpers with identical
  reasoning.
- **Commit 29a7e00ffadd** (Similar Commit #4, backport status: YES):
  fixed a missed RCU read lock in `bpf_task_under_cgroup()` for sleepable
  programs.

**4. Clear User Impact:** The issue was reported by syzbot and affects
users running sleepable BPF programs. Without this fix, users see
spurious warnings that suggest RCU usage bugs where none exist.

**5. Low Regression Risk:** The change only widens the set of lock
states that satisfy the assertion. It does not change any functional
behavior; it just makes the assertion accurate for sleepable BPF
programs.

**6. Part of an Ongoing Pattern:** This is the missing piece in a series
of fixes that have systematically corrected RCU assertions for sleepable
BPF programs. Commit 169410eba271 fixed the basic map helpers but missed
this per-CPU variant.

**7. Stable Tree Criteria Alignment:**
- Fixes important functionality (eliminates false warnings)
- No architectural changes
- Minimal risk of regression
- Confined to the BPF subsystem
- Clear side effects (none beyond fixing the warning)

The commit matches the stable tree backporting criteria and follows the
established pattern of similar successful backports.
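The effect of the changed condition can be sketched as a small userspace model of the assertion. This is a hypothetical illustration, not kernel code: the three flags stand in for `rcu_read_lock_held()`, `rcu_read_lock_trace_held()` and `rcu_read_lock_bh_held()`, and the struct and function names are invented for this sketch. In the kernel the check is only reached when the helper is not inlined (JIT disabled or 32-bit host).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the RCU read-side locks a caller holds. In the
 * kernel these states are reported by rcu_read_lock_held(),
 * rcu_read_lock_trace_held() and rcu_read_lock_bh_held(). */
struct rcu_ctx {
	bool rcu;       /* rcu_read_lock(): non-sleepable programs */
	bool rcu_trace; /* rcu_read_lock_trace(): sleepable programs */
	bool rcu_bh;    /* rcu_read_lock_bh() */
};

/* WARN_ON_ONCE() condition before the patch. */
static bool old_check_warns(struct rcu_ctx c)
{
	return !c.rcu && !c.rcu_bh;
}

/* WARN_ON_ONCE() condition after the patch: the RCU Tasks Trace lock,
 * which is what a sleepable program holds, also satisfies it. */
static bool new_check_warns(struct rcu_ctx c)
{
	return !c.rcu && !c.rcu_trace && !c.rcu_bh;
}
```

For a sleepable program (`{ .rcu_trace = true }`), `old_check_warns()` returns true and `new_check_warns()` returns false. With no lock held at all, both still warn, so the assertion keeps catching real misuse.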

 kernel/bpf/helpers.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 4fef0a0155255..94e85d311641b 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -125,7 +125,8 @@ const struct bpf_func_proto bpf_map_peek_elem_proto = {
 
 BPF_CALL_3(bpf_map_lookup_percpu_elem, struct bpf_map *, map, void *, key, u32, cpu)
 {
-	WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_bh_held());
+	WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held() &&
+		     !rcu_read_lock_bh_held());
 	return (unsigned long) map->ops->map_lookup_percpu_elem(map, key, cpu);
 }
 
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH AUTOSEL 6.1 29/46] bpftool: Fix cgroup command to only show cgroup bpf programs
       [not found] <20250604010404.5109-1-sashal@kernel.org>
  2025-06-04  1:03 ` [PATCH AUTOSEL 6.1 04/46] bpf: Check rcu_read_lock_trace_held() in bpf_map_lookup_percpu_elem() Sasha Levin
@ 2025-06-04  1:03 ` Sasha Levin
  2025-06-04  1:03 ` [PATCH AUTOSEL 6.1 31/46] libbpf: Add identical pointer detection to btf_dedup_is_equiv() Sasha Levin
  2025-06-04  1:04 ` [PATCH AUTOSEL 6.1 44/46] bpf, sockmap: Fix data lost during EAGAIN retries Sasha Levin
  3 siblings, 0 replies; 4+ messages in thread
From: Sasha Levin @ 2025-06-04  1:03 UTC (permalink / raw)
  To: patches, stable
  Cc: Martin KaFai Lau, Quentin Monnet, Takshak Chahande,
	Daniel Borkmann, Alexei Starovoitov, Sasha Levin, bpf

From: Martin KaFai Lau <martin.lau@kernel.org>

[ Upstream commit b69d4413aa1961930fbf9ffad8376d577378daf9 ]

The netkit program is not a cgroup bpf program and should not be shown
in the output of the "bpftool cgroup show" command.

However, if the netkit device happens to have ifindex 3,
the "bpftool cgroup show" command will output the netkit
bpf program as well:

> ip -d link show dev nk1
3: nk1@if2: ...
    link/ether ...
    netkit mode ...

> bpftool net show
tc:
nk1(3) netkit/peer tw_ns_nk2phy prog_id 469447

> bpftool cgroup show /sys/fs/cgroup/...
ID       AttachType      AttachFlags     Name
...      ...                             ...
469447   netkit_peer                     tw_ns_nk2phy

The reason is that the target_fd (which is the cgroup_fd here) and
the target_ifindex are in a union in the uapi/linux/bpf.h. The bpftool
iterates all values in "enum bpf_attach_type" which includes
non-cgroup attach types like netkit. The cgroup_fd is usually 3 here,
so the bug is triggered when the netkit ifindex just happens
to be 3 as well.

The bpftool's cgroup.c already has a list of cgroup-only attach types
defined in "cgroup_attach_types[]". This patch fixes it by iterating
over "cgroup_attach_types[]" instead of "__MAX_BPF_ATTACH_TYPE".

Cc: Quentin Monnet <qmo@kernel.org>
Reported-by: Takshak Chahande <ctakshak@meta.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Quentin Monnet <qmo@kernel.org>
Link: https://lore.kernel.org/r/20250507203232.1420762-1-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

## YES

**Extensive explanation:** This commit should be backported to stable
kernel trees because it fixes a clear functional bug with specific,
user-visible incorrect behavior:

### 1. **Fixes a clear user-visible bug**
The commit fixes an issue where `bpftool cgroup show` incorrectly
displays non-cgroup BPF programs (specifically netkit programs) when the
netkit device's interface index happens to match the cgroup file
descriptor. The commit message documents the resulting confusing output,
in which netkit programs appear in cgroup listings where they don't
belong.

### 2. **Small, contained fix with minimal risk**
The changes in `tools/bpf/bpftool/cgroup.c` are:
- In `do_show()`: the declaration `enum bpf_attach_type type;` is
  replaced by `unsigned int i;`, and the loop `for (type = 0; type <
  __MAX_BPF_ATTACH_TYPE; type++)` becomes `for (i = 0; i <
  ARRAY_SIZE(cgroup_attach_types); i++)`, passing
  `cgroup_attach_types[i]` to `show_bpf_progs()` instead of `type`.
- In `do_show_tree_fn()`: the same declaration change and the same loop
  change.

Instead of iterating over every BPF attach type (up to
`__MAX_BPF_ATTACH_TYPE`), the code now iterates only over the predefined
`cgroup_attach_types[]` array, which contains only cgroup-specific
attach types.

### 3. **Root cause is clearly identified**
The bug occurs because:
- `target_fd` (the cgroup fd) and `target_ifindex` are in a union in the
  BPF uapi.
- When bpftool iterates over all attach types, including non-cgroup
  types such as `BPF_NETKIT_PRIMARY` and `BPF_NETKIT_PEER`, the kernel
  interprets the cgroup fd as an ifindex for those types.
- When the cgroup fd value (usually 3) matches a netkit device's ifindex,
  the query succeeds and the netkit program is shown as if it were a
  cgroup program.

### 4. **Existing infrastructure used correctly**
The fix reuses the `cgroup_attach_types[]` array that was already
defined in the file but not used by these iteration loops. The array
contains only legitimate cgroup attach types, so the fix is both correct
and consistent with the existing code.

### 5. **No functional changes to legitimate use cases**
The change only affects which attach types are queried: bpftool no
longer queries non-cgroup attach types, which were never relevant to
cgroup operations. All legitimate cgroup functionality is unchanged.

### 6. **Typical stable tree material**
This fits the stable tree criteria:
- **Important bug fix**: fixes incorrect, confusing output that
  misleads users
- **Small and obvious**: the change is minimal and the fix is clear
- **No architectural changes**: it only corrects the iteration logic
- **Low regression risk**: it only restricts queries to the appropriate
  attach types
- **Userspace tool fix**: although bpftool is a userspace tool, it lives
  in the kernel tree and such fixes are commonly backported

### 7. **Matches pattern of similar commits**
The similar commits considered were marked "NO" for backporting, but
those were mostly feature additions (adding netkit support, implementing
new functionality, etc.). This commit is fundamentally different: it is
a pure bug fix that corrects existing functionality rather than adding
new features.

The commit is explicitly labeled a "Fix" in its subject line and clearly
describes the incorrect behavior being corrected, making it a strong
candidate for stable backporting.
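The fd/ifindex collision and the effect of restricting the loop can be sketched with a mock. This is a hypothetical, simplified model, not bpftool or kernel code: `enum attach_type`, `union query_target`, `mock_query()` and the two counting functions are invented names. The union only mirrors the `target_fd`/`target_ifindex` sharing described in the commit message, and the mock "kernel" has a netkit program on ifindex 3 and nothing attached to the cgroup.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a few bpf_attach_type values. */
enum attach_type {
	ATTACH_CGROUP_INET_INGRESS,
	ATTACH_CGROUP_INET_EGRESS,
	ATTACH_NETKIT_PEER,   /* non-cgroup: the target is an ifindex */
	MAX_ATTACH_TYPE,
};

/* As in the uapi query attributes, the cgroup fd and the ifindex share
 * storage; which member is meaningful depends on the attach type. */
union query_target {
	int target_fd;               /* cgroup attach types */
	unsigned int target_ifindex; /* netkit attach types */
};

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Counterpart of bpftool's cgroup-only list. */
static const enum attach_type cgroup_attach_types[] = {
	ATTACH_CGROUP_INET_INGRESS,
	ATTACH_CGROUP_INET_EGRESS,
};

/* Mock kernel: returns how many programs the query reports. */
static int mock_query(enum attach_type type, union query_target t)
{
	if (type == ATTACH_NETKIT_PEER)
		return t.target_ifindex == 3 ? 1 : 0; /* fd read as ifindex */
	return 0; /* no cgroup programs attached */
}

/* Old behavior: query every attach type with the cgroup fd. */
static int count_all_types(int cgroup_fd)
{
	union query_target t = { .target_fd = cgroup_fd };
	int found = 0;

	for (int type = 0; type < MAX_ATTACH_TYPE; type++)
		found += mock_query((enum attach_type)type, t);
	return found;
}

/* Fixed behavior: query only the cgroup attach types. */
static int count_cgroup_types(int cgroup_fd)
{
	union query_target t = { .target_fd = cgroup_fd };
	int found = 0;

	for (size_t i = 0; i < ARRAY_SIZE(cgroup_attach_types); i++)
		found += mock_query(cgroup_attach_types[i], t);
	return found;
}
```

With a cgroup fd of 3, `count_all_types(3)` reports the netkit program as if it were attached to the cgroup, while `count_cgroup_types(3)` reports nothing; any other fd value hides the bug, which is why it only shows up by coincidence.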

 tools/bpf/bpftool/cgroup.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/tools/bpf/bpftool/cgroup.c b/tools/bpf/bpftool/cgroup.c
index b46a998d8f8df..d157f58ec7d5a 100644
--- a/tools/bpf/bpftool/cgroup.c
+++ b/tools/bpf/bpftool/cgroup.c
@@ -284,11 +284,11 @@ static int show_bpf_progs(int cgroup_fd, enum bpf_attach_type type,

 static int do_show(int argc, char **argv)
 {
-	enum bpf_attach_type type;
 	int has_attached_progs;
 	const char *path;
 	int cgroup_fd;
 	int ret = -1;
+	unsigned int i;

 	query_flags = 0;

@@ -336,14 +336,14 @@ static int do_show(int argc, char **argv)
 		       "AttachFlags", "Name");

 	btf_vmlinux = libbpf_find_kernel_btf();
-	for (type = 0; type < __MAX_BPF_ATTACH_TYPE; type++) {
+	for (i = 0; i < ARRAY_SIZE(cgroup_attach_types); i++) {
 		/*
 		 * Not all attach types may be supported, so it's expected,
 		 * that some requests will fail.
 		 * If we were able to get the show for at least one
 		 * attach type, let's return 0.
 		 */
-		if (show_bpf_progs(cgroup_fd, type, 0) == 0)
+		if (show_bpf_progs(cgroup_fd, cgroup_attach_types[i], 0) == 0)
 			ret = 0;
 	}

@@ -366,9 +366,9 @@ static int do_show(int argc, char **argv)
 static int do_show_tree_fn(const char *fpath, const struct stat *sb,
 			   int typeflag, struct FTW *ftw)
 {
-	enum bpf_attach_type type;
 	int has_attached_progs;
 	int cgroup_fd;
+	unsigned int i;

 	if (typeflag != FTW_D)
 		return 0;
@@ -400,8 +400,8 @@ static int do_show_tree_fn(const char *fpath, const struct stat *sb,
 	}

 	btf_vmlinux = libbpf_find_kernel_btf();
-	for (type = 0; type < __MAX_BPF_ATTACH_TYPE; type++)
-		show_bpf_progs(cgroup_fd, type, ftw->level);
+	for (i = 0; i < ARRAY_SIZE(cgroup_attach_types); i++)
+		show_bpf_progs(cgroup_fd, cgroup_attach_types[i], ftw->level);

 	if (errno == EINVAL)
 		/* Last attach type does not support query.
-- 
2.39.5

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH AUTOSEL 6.1 31/46] libbpf: Add identical pointer detection to btf_dedup_is_equiv()
       [not found] <20250604010404.5109-1-sashal@kernel.org>
  2025-06-04  1:03 ` [PATCH AUTOSEL 6.1 04/46] bpf: Check rcu_read_lock_trace_held() in bpf_map_lookup_percpu_elem() Sasha Levin
  2025-06-04  1:03 ` [PATCH AUTOSEL 6.1 29/46] bpftool: Fix cgroup command to only show cgroup bpf programs Sasha Levin
@ 2025-06-04  1:03 ` Sasha Levin
  2025-06-04  1:04 ` [PATCH AUTOSEL 6.1 44/46] bpf, sockmap: Fix data lost during EAGAIN retries Sasha Levin
  3 siblings, 0 replies; 4+ messages in thread
From: Sasha Levin @ 2025-06-04  1:03 UTC (permalink / raw)
  To: patches, stable
  Cc: Alan Maguire, Alexei Starovoitov, Andrii Nakryiko, Sasha Levin,
	eddyz87, daniel, bpf

From: Alan Maguire <alan.maguire@oracle.com>

[ Upstream commit 8e64c387c942229c551d0f23de4d9993d3a2acb6 ]

Recently as a side-effect of

commit ac053946f5c4 ("compiler.h: introduce TYPEOF_UNQUAL() macro")

issues were observed in deduplication between modules and kernel BTF
such that a large number of kernel types were not deduplicated so
were found in module BTF (task_struct, bpf_prog etc).  The root cause
appeared to be a failure to dedup struct types, specifically those
with members that were pointers with __percpu annotations.

The issue in dedup is at the point that we are deduplicating structures,
we have not yet deduplicated reference types like pointers.  If multiple
copies of a pointer point at the same (deduplicated) integer as in this
case, we do not see them as identical.  Special handling already exists
to deal with structures and arrays, so add pointer handling here too.

Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250429161042.2069678-1-alan.maguire@oracle.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

Based on my analysis of the commit message, code changes, historical
patterns, and kernel repository context, here is my recommendation:

**YES** This commit should be backported to stable kernel trees for the
following reasons:

## Critical Bug Fix Analysis

**1. Fixes a Real User-Impacting Regression:**
- The commit addresses a regression introduced by commit ac053946f5c4
  ("compiler.h: introduce TYPEOF_UNQUAL() macro").
- This regression causes "a large number of kernel types" (task_struct,
  bpf_prog, etc.) to fail deduplication, so copies of them end up in
  module BTF instead of being resolved against the kernel BTF.
- The resulting duplication is visible to anyone consuming module BTF.

**2. Follows Established Stable Tree Criteria:**
- **Important bugfix**: ✅ Fixes BTF deduplication failures affecting
  core BPF functionality
- **Minimal risk**: ✅ Small, targeted change following existing patterns
- **No architectural changes**: ✅ Adds a simple helper function without
  changing the core algorithm
- **Confined to subsystem**: ✅ Changes only affect BTF deduplication
  logic in libbpf

## Code Change Analysis

**3. Conservative and Safe Implementation:**
```c
+static bool btf_dedup_identical_ptrs(struct btf_dedup *d, __u32 id1, __u32 id2)
+{
+	struct btf_type *t1, *t2;
+
+	t1 = btf_type_by_id(d->btf, id1);
+	t2 = btf_type_by_id(d->btf, id2);
+
+	if (!btf_is_ptr(t1) || !btf_is_ptr(t2))
+		return false;
+
+	return t1->type == t2->type;
+}
```
- Simple helper that explicitly checks that both types are pointers
  before comparing their targets
- Mirrors the existing `btf_dedup_identical_arrays()` and
  `btf_dedup_identical_structs()` helpers
- No complex logic or state changes

**4. Integration Follows Existing Pattern:**
```c
+		/* A similar case is again observed for PTRs. */
+		if (btf_dedup_identical_ptrs(d, hypot_type_id, cand_id))
+			return 1;
```
- Added alongside the existing identical array/struct checks
- Same position in the control flow as the established workarounds
- Consistent with the documented compiler DWARF generation issues

## Historical Pattern Alignment

**5. Matches "YES" Backport Pattern:** Among similar commits marked for
backport:
- **Similar Commit #1** (YES): Adds identical struct checking for BTF
  dedup failures (same pattern)
- **Similar Commit #4** (YES): Fixes a memory leak in BTF dedup
  (critical subsystem fix)
- **Similar Commit #5** (YES): Handles DWARF/compiler bugs with
  duplicated structs (identical issue class)

This commit addresses the same class of problem (compiler-generated
identical types) that has consistently been backported.

## Risk Assessment

**6. Low Regression Risk:**
- The function only returns `true` when both types are pointers to the
  same type id (`t1->type == t2->type`)
- The early return rules out non-pointer types
- It cannot cause false positives that would incorrectly deduplicate
  different types, since a BTF pointer carries no information beyond its
  target type
- Follows defensive programming patterns used throughout the codebase

## Conclusion

This commit fixes a regression in critical BPF functionality, uses a
proven safe pattern, has a minimal code footprint, and addresses issues
that directly impact users. The fix quality is high and the risk is low,
making it an excellent candidate for stable tree backporting.
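The pointer-identity check can be tried out on a toy type table. This is a hypothetical sketch, not libbpf code: `enum toy_kind`, `struct toy_type`, the `types[]` table and `toy_identical_ptrs()` are invented stand-ins that keep only a type's kind and referenced id. The table reproduces the situation from the commit message: the pointed-to integer is already deduplicated, but two copies of the pointer still have distinct ids.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for BTF type records:
 * only the kind and the referenced type id are kept. */
enum toy_kind { TOY_VOID, TOY_INT, TOY_PTR };

struct toy_type {
	enum toy_kind kind;
	uint32_t type; /* TOY_PTR: id of the pointed-to type */
};

/* Type table indexed by id, as seen mid-dedup: the int is already
 * deduplicated, but the two pointer copies are not. */
static const struct toy_type types[] = {
	[0] = { TOY_VOID, 0 },
	[1] = { TOY_INT, 0 },  /* int */
	[2] = { TOY_PTR, 1 },  /* int * (copy A) */
	[3] = { TOY_PTR, 1 },  /* int * (copy B) */
	[4] = { TOY_PTR, 2 },  /* int ** */
};

/* Same test as btf_dedup_identical_ptrs(): both ids must be pointers
 * and must point at the very same type id. */
static bool toy_identical_ptrs(uint32_t id1, uint32_t id2)
{
	const struct toy_type *t1 = &types[id1];
	const struct toy_type *t2 = &types[id2];

	if (t1->kind != TOY_PTR || t2->kind != TOY_PTR)
		return false;
	return t1->type == t2->type;
}
```

Ids 2 and 3 compare as identical even though they are different ids, which is the case the new check lets `btf_dedup_is_equiv()` accept when it compares struct members; `int *` (id 2) and `int **` (id 4) are still kept apart, and non-pointer types are left to the other checks.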

 tools/lib/bpf/btf.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
index 8224a797c2da5..f7e3209d6c641 100644
--- a/tools/lib/bpf/btf.c
+++ b/tools/lib/bpf/btf.c
@@ -3939,6 +3939,19 @@ static bool btf_dedup_identical_structs(struct btf_dedup *d, __u32 id1, __u32 id
 	return true;
 }
 
+static bool btf_dedup_identical_ptrs(struct btf_dedup *d, __u32 id1, __u32 id2)
+{
+	struct btf_type *t1, *t2;
+
+	t1 = btf_type_by_id(d->btf, id1);
+	t2 = btf_type_by_id(d->btf, id2);
+
+	if (!btf_is_ptr(t1) || !btf_is_ptr(t2))
+		return false;
+
+	return t1->type == t2->type;
+}
+
 /*
  * Check equivalence of BTF type graph formed by candidate struct/union (we'll
  * call it "candidate graph" in this description for brevity) to a type graph
@@ -4071,6 +4084,9 @@ static int btf_dedup_is_equiv(struct btf_dedup *d, __u32 cand_id,
 		 */
 		if (btf_dedup_identical_structs(d, hypot_type_id, cand_id))
 			return 1;
+		/* A similar case is again observed for PTRs. */
+		if (btf_dedup_identical_ptrs(d, hypot_type_id, cand_id))
+			return 1;
 		return 0;
 	}
 
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH AUTOSEL 6.1 44/46] bpf, sockmap: Fix data lost during EAGAIN retries
       [not found] <20250604010404.5109-1-sashal@kernel.org>
                   ` (2 preceding siblings ...)
  2025-06-04  1:03 ` [PATCH AUTOSEL 6.1 31/46] libbpf: Add identical pointer detection to btf_dedup_is_equiv() Sasha Levin
@ 2025-06-04  1:04 ` Sasha Levin
  3 siblings, 0 replies; 4+ messages in thread
From: Sasha Levin @ 2025-06-04  1:04 UTC (permalink / raw)
  To: patches, stable
  Cc: Jiayuan Chen, Alexei Starovoitov, Sasha Levin, john.fastabend,
	jakub, netdev, bpf

From: Jiayuan Chen <jiayuan.chen@linux.dev>

[ Upstream commit 7683167196bd727ad5f3c3fc6a9ca70f54520a81 ]

We call skb_bpf_redirect_clear() to clean _sk_redir before handling skb in
backlog, but when sk_psock_handle_skb() returns EAGAIN due to the sk_rcvbuf
limit, the redirect info in _sk_redir is not recovered.

Fix skb redir loss during EAGAIN retries by restoring _sk_redir
information using skb_bpf_set_redir().

Before this patch:
'''
./bench sockmap -c 2 -p 1 -a --rx-verdict-ingress
Setting up benchmark 'sockmap'...
create socket fd c1:13 p1:14 c2:15 p2:16
Benchmark 'sockmap' started.
Send Speed 1343.172 MB/s, BPF Speed 1343.238 MB/s, Rcv Speed   65.271 MB/s
Send Speed 1352.022 MB/s, BPF Speed 1352.088 MB/s, Rcv Speed   0 MB/s
Send Speed 1354.105 MB/s, BPF Speed 1354.105 MB/s, Rcv Speed   0 MB/s
Send Speed 1355.018 MB/s, BPF Speed 1354.887 MB/s, Rcv Speed   0 MB/s
'''
Due to the high send rate, the RX processing path may frequently hit the
sk_rcvbuf limit. Once triggered, incorrect _sk_redir will cause the flow
to mistakenly enter the "!ingress" path, leading to send failures.
(The Rcv speed depends on tcp_rmem).

After this patch:
'''
./bench sockmap -c 2 -p 1 -a --rx-verdict-ingress
Setting up benchmark 'sockmap'...
create socket fd c1:13 p1:14 c2:15 p2:16
Benchmark 'sockmap' started.
Send Speed 1347.236 MB/s, BPF Speed 1347.367 MB/s, Rcv Speed   65.402 MB/s
Send Speed 1353.320 MB/s, BPF Speed 1353.320 MB/s, Rcv Speed   65.536 MB/s
Send Speed 1353.186 MB/s, BPF Speed 1353.121 MB/s, Rcv Speed   65.536 MB/s
'''

Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Link: https://lore.kernel.org/r/20250407142234.47591-2-jiayuan.chen@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

**YES** This commit should be backported to stable kernel trees.

## Extensive Analysis

### **Critical Bug Fix Nature**
This is a **data loss bug** in the BPF sockmap subsystem that breaks
delivery of redirected packets under load. The commit adds a single
function call (plus a comment) that restores essential redirect
information in the EAGAIN retry path.

### **Code Change Analysis**
The fix adds this call in the `ret == -EAGAIN` branch of
`sk_psock_backlog()` in net/core/skmsg.c:
```c
skb_bpf_set_redir(skb, psock->sk, ingress);
```
This restores the redirect information that `sk_psock_backlog()` cleared
with `skb_bpf_redirect_clear(skb)` before handling the skb. When
`sk_psock_handle_skb()` returns `-EAGAIN` because of the `sk_rcvbuf`
limit, the work is rescheduled, but without this fix the redirect
information (`skb->_sk_redir`) is lost for the retry.

### **Impact Without the Fix**
1. **Data Loss**: On the retry, the cleared `_sk_redir` no longer marks
   the skb for ingress, so the skb mistakenly takes the `!ingress` path
   and the send fails, as described in the commit message.
2. **Verdict Not Honored**: Packets that the BPF program redirected to
   the ingress queue are pushed down the egress path instead, contrary to
   the program's decision.
3. **Measurable Impact**: The benchmark in the commit message shows
   receive throughput dropping from about 65 MB/s to 0 MB/s under high
   send rates.

### **Backport Suitability Factors**
**✅ Minimal and Surgical:**
- One-line functional change plus a comment
- No API changes or structural modifications
- Uses the existing `skb_bpf_set_redir()` helper

**✅ Self-Contained:**
- Only affects the specific error path (`ret == -EAGAIN`)
- No dependencies on other concurrent changes
- Uses well-established APIs present across kernel versions

**✅ Clear Bug Fix Semantics:**
- Restores state that was previously cleared
- Follows the pattern: clear → try → restore on failure
- The added comment states "Restore redir info we cleared before"

**✅ Critical Subsystem:**
- Affects BPF sockmap, a core networking infrastructure component
- Used by service meshes, container networking, and load balancers
- The failure causes data loss that is difficult to debug

### **Comparison with Similar Commits**
Among the historical examples considered:
- Similar to commits #2 and #4 (both marked YES), which also fix sockmap
  data handling issues
- Unlike commits #1, #3, and #5 (marked NO), which involved more complex
  architectural changes
- This fix addresses a fundamental correctness issue rather than an
  optimization

### **Risk Assessment**
**Low Risk:**
- The fix is in an error recovery path, so it only executes after
  `-EAGAIN` has already occurred
- Restoring the redirect information puts the skb back into the state it
  had before the failed attempt, so it should not make the situation
  worse
- `skb_bpf_set_redir()` is a simple state restoration operation

### **Stable Tree Criteria Compliance**
1. **Important bugfix**: ✅ Fixes data loss
2. **Minimal risk**: ✅ Single call, error path only
3. **No new features**: ✅ Pure bug fix
4. **Confined scope**: ✅ Limited to sockmap redirect handling
5. **User-visible impact**: ✅ Prevents packet loss under load

This commit fits the stable tree criteria for important, low-risk bug
fixes that address user-visible problems in critical subsystems.
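The clear → try → restore sequence can be sketched as a userspace model. This is hypothetical and simplified, based on the mechanism described in the commit message rather than on the real skmsg code: `struct toy_skb`, `toy_handle()`, `toy_backlog_pass()` and `toy_deliver()` are invented names, `rcvbuf_full_attempts` is an assumed knob that makes the mock handler return `-EAGAIN` a given number of times, and the saved `len`/`off` state that the real backlog keeps via `sk_psock_skb_state()` is not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of skb->_sk_redir: is a redirect set, and does it
 * target the ingress queue? */
struct toy_skb {
	bool redir_set;
	bool ingress;
};

/* Which delivery path the skb finally took. */
enum path { PATH_NONE, PATH_INGRESS, PATH_EGRESS };

/* Number of upcoming attempts that hit the (modelled) sk_rcvbuf limit. */
static int rcvbuf_full_attempts;

/* Stand-in for sk_psock_handle_skb(): -EAGAIN while the receive buffer
 * is full, otherwise deliver on the path selected by 'ingress'. */
static int toy_handle(bool ingress, enum path *taken)
{
	if (rcvbuf_full_attempts > 0) {
		rcvbuf_full_attempts--;
		return -EAGAIN;
	}
	*taken = ingress ? PATH_INGRESS : PATH_EGRESS;
	return 1;
}

/* One backlog pass: read the ingress flag, clear the redirect info, try
 * once. With 'restore' set, -EAGAIN writes the info back, as the added
 * skb_bpf_set_redir() call does. */
static int toy_backlog_pass(struct toy_skb *skb, bool restore,
			    enum path *taken)
{
	bool ingress = skb->redir_set && skb->ingress;
	int ret;

	skb->redir_set = false; /* like skb_bpf_redirect_clear() */
	skb->ingress = false;
	ret = toy_handle(ingress, taken);
	if (ret == -EAGAIN && restore) {
		skb->redir_set = true;
		skb->ingress = ingress;
	}
	return ret;
}

/* Retry until handled, as the rescheduled backlog work would. */
static enum path toy_deliver(bool restore, int eagain_count)
{
	struct toy_skb skb = { .redir_set = true, .ingress = true };
	enum path taken = PATH_NONE;

	rcvbuf_full_attempts = eagain_count;
	while (toy_backlog_pass(&skb, restore, &taken) == -EAGAIN)
		;
	return taken;
}
```

Without the restore, a single `-EAGAIN` flips the outcome: `toy_deliver(false, 1)` returns `PATH_EGRESS`, while `toy_deliver(true, 1)` returns `PATH_INGRESS`. With no buffer pressure (`eagain_count` of 0) both variants deliver to ingress, which is why the bug only appears under load.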

 net/core/skmsg.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 5a790cd1121b1..0065a5a65afc0 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -678,7 +678,8 @@ static void sk_psock_backlog(struct work_struct *work)
 			if (ret <= 0) {
 				if (ret == -EAGAIN) {
 					sk_psock_skb_state(psock, state, len, off);
-
+					/* Restore redir info we cleared before */
+					skb_bpf_set_redir(skb, psock->sk, ingress);
 					/* Delay slightly to prioritize any
 					 * other work that might be here.
 					 */
-- 
2.39.5

^ permalink raw reply related	[flat|nested] 4+ messages in thread
