* [PATCH bpf v2 0/2] support SKF_NET_OFF and SKF_LL_OFF on skb frags @ 2025-04-04 14:23 Willem de Bruijn 2025-04-04 14:23 ` [PATCH bpf v2 1/2] bpf: " Willem de Bruijn 2025-04-04 14:23 ` [PATCH bpf v2 2/2] selftests/net: test sk_filter support for SKF_NET_OFF on frags Willem de Bruijn 0 siblings, 2 replies; 9+ messages in thread From: Willem de Bruijn @ 2025-04-04 14:23 UTC (permalink / raw) To: bpf; +Cc: netdev, ast, daniel, john.fastabend, Willem de Bruijn From: Willem de Bruijn <willemb@google.com> Address a longstanding issue that may lead to missed packets depending on system configuration. Ensure that reading from packet contents works regardless of skb geometry, also when using the special SKF_.. negative offsets to offset from L2 or L3 header. Patch 2 is the selftest for the fix. Willem de Bruijn (2): bpf: support SKF_NET_OFF and SKF_LL_OFF on skb frags selftests/net: test sk_filter support for SKF_NET_OFF on frags include/linux/filter.h | 3 - kernel/bpf/core.c | 21 -- net/core/filter.c | 80 ++++--- tools/testing/selftests/net/.gitignore | 1 + tools/testing/selftests/net/Makefile | 2 + tools/testing/selftests/net/skf_net_off.c | 244 +++++++++++++++++++++ tools/testing/selftests/net/skf_net_off.sh | 30 +++ 7 files changed, 321 insertions(+), 60 deletions(-) create mode 100644 tools/testing/selftests/net/skf_net_off.c create mode 100755 tools/testing/selftests/net/skf_net_off.sh -- 2.49.0.504.g3bcea36a83-goog ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH bpf v2 1/2] bpf: support SKF_NET_OFF and SKF_LL_OFF on skb frags 2025-04-04 14:23 [PATCH bpf v2 0/2] support SKF_NET_OFF and SKF_LL_OFF on skb frags Willem de Bruijn @ 2025-04-04 14:23 ` Willem de Bruijn 2025-04-04 15:08 ` Stanislav Fomichev 2025-04-04 16:11 ` Daniel Borkmann 2025-04-04 14:23 ` [PATCH bpf v2 2/2] selftests/net: test sk_filter support for SKF_NET_OFF on frags Willem de Bruijn 1 sibling, 2 replies; 9+ messages in thread From: Willem de Bruijn @ 2025-04-04 14:23 UTC (permalink / raw) To: bpf Cc: netdev, ast, daniel, john.fastabend, Willem de Bruijn, Matt Moeller, Maciej Żenczykowski From: Willem de Bruijn <willemb@google.com> Classic BPF socket filters with SKB_NET_OFF and SKB_LL_OFF fail to read when these offsets extend into frags. This has been observed with iwlwifi and reproduced with tun with IFF_NAPI_FRAGS. The below straightforward socket filter on UDP port, applied to a RAW socket, will silently miss matching packets. const int offset_proto = offsetof(struct ip6_hdr, ip6_nxt); const int offset_dport = sizeof(struct ip6_hdr) + offsetof(struct udphdr, dest); struct sock_filter filter_code[] = { BPF_STMT(BPF_LD + BPF_B + BPF_ABS, SKF_AD_OFF + SKF_AD_PKTTYPE), BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, PACKET_HOST, 0, 4), BPF_STMT(BPF_LD + BPF_B + BPF_ABS, SKF_NET_OFF + offset_proto), BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, IPPROTO_UDP, 0, 2), BPF_STMT(BPF_LD + BPF_H + BPF_ABS, SKF_NET_OFF + offset_dport), This is unexpected behavior. Socket filter programs should be consistent regardless of environment. Silent misses are particularly concerning as hard to detect. Use skb_copy_bits for offsets outside linear, same as done for non-SKF_(LL|NET) offsets. Offset is always positive after subtracting the reference threshold SKB_(LL|NET)_OFF, so is always >= skb_(mac|network)_offset. The sum of the two is an offset against skb->data, and may be negative, but it cannot point before skb->head, as skb_(mac|network)_offset would too. This appears to go back to when frag support was introduced to sk_run_filter in linux-2.4.4, before the introduction of git. The amount of code change and 8/16/32 bit duplication are unfortunate. But any attempt I made to be smarter saved very few LoC while complicating the code. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Link: https://lore.kernel.org/netdev/20250122200402.3461154-1-maze@google.com/ Link: https://elixir.bootlin.com/linux/2.4.4/source/net/core/filter.c#L244 Reported-by: Matt Moeller <moeller.matt@gmail.com> Co-developed-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> --- v1->v2 - introduce bfp_skb_load_helper_convert_offset to avoid open coding --- include/linux/filter.h | 3 -- kernel/bpf/core.c | 21 ----------- net/core/filter.c | 80 +++++++++++++++++++++++------------------- 3 files changed, 44 insertions(+), 60 deletions(-) diff --git a/include/linux/filter.h b/include/linux/filter.h index f5cf4d35d83e..708ac7e0cd36 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -1496,9 +1496,6 @@ static inline u16 bpf_anc_helper(const struct sock_filter *ftest) } } -void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb, - int k, unsigned int size); - static inline int bpf_tell_extensions(void) { return SKF_AD_MAX; diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index ba6b6118cf50..0e836b5ac9a0 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -68,27 +68,6 @@ struct bpf_mem_alloc bpf_global_ma; bool bpf_global_ma_set; -/* No hurry in this branch - * - * Exported for the bpf jit load helper. - */ -void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb, int k, unsigned int size) -{ - u8 *ptr = NULL; - - if (k >= SKF_NET_OFF) { - ptr = skb_network_header(skb) + k - SKF_NET_OFF; - } else if (k >= SKF_LL_OFF) { - if (unlikely(!skb_mac_header_was_set(skb))) - return NULL; - ptr = skb_mac_header(skb) + k - SKF_LL_OFF; - } - if (ptr >= skb->head && ptr + size <= skb_tail_pointer(skb)) - return ptr; - - return NULL; -} - /* tell bpf programs that include vmlinux.h kernel's PAGE_SIZE */ enum page_size_enum { __PAGE_SIZE = PAGE_SIZE diff --git a/net/core/filter.c b/net/core/filter.c index bc6828761a47..79cab4d78dc3 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -218,24 +218,36 @@ BPF_CALL_3(bpf_skb_get_nlattr_nest, struct sk_buff *, skb, u32, a, u32, x) return 0; } +static int bpf_skb_load_helper_convert_offset(const struct sk_buff *skb, int offset) +{ + if (likely(offset >= 0)) + return offset; + + if (offset >= SKF_NET_OFF) + return offset - SKF_NET_OFF + skb_network_offset(skb); + + if (offset >= SKF_LL_OFF && skb_mac_header_was_set(skb)) + return offset - SKF_LL_OFF + skb_mac_offset(skb); + + return INT_MIN; +} + BPF_CALL_4(bpf_skb_load_helper_8, const struct sk_buff *, skb, const void *, data, int, headlen, int, offset) { - u8 tmp, *ptr; + u8 tmp; const int len = sizeof(tmp); - if (offset >= 0) { - if (headlen - offset >= len) - return *(u8 *)(data + offset); - if (!skb_copy_bits(skb, offset, &tmp, sizeof(tmp))) - return tmp; - } else { - ptr = bpf_internal_load_pointer_neg_helper(skb, offset, len); - if (likely(ptr)) - return *(u8 *)ptr; - } + offset = bpf_skb_load_helper_convert_offset(skb, offset); + if (offset == INT_MIN) + return -EFAULT; - return -EFAULT; + if (headlen - offset >= len) + return *(u8 *)(data + offset); + if (!skb_copy_bits(skb, offset, &tmp, sizeof(tmp))) + return tmp; + else + return -EFAULT; } BPF_CALL_2(bpf_skb_load_helper_8_no_cache, const struct sk_buff *, skb, @@ -248,21 +260,19 @@ BPF_CALL_2(bpf_skb_load_helper_8_no_cache, const struct sk_buff *, skb, BPF_CALL_4(bpf_skb_load_helper_16, const struct sk_buff *, skb, const void *, data, int, headlen, int, offset) { - __be16 tmp, *ptr; + __be16 tmp; const int len = sizeof(tmp); - if (offset >= 0) { - if (headlen - offset >= len) - return get_unaligned_be16(data + offset); - if (!skb_copy_bits(skb, offset, &tmp, sizeof(tmp))) - return be16_to_cpu(tmp); - } else { - ptr = bpf_internal_load_pointer_neg_helper(skb, offset, len); - if (likely(ptr)) - return get_unaligned_be16(ptr); - } + offset = bpf_skb_load_helper_convert_offset(skb, offset); + if (offset == INT_MIN) + return -EFAULT; - return -EFAULT; + if (headlen - offset >= len) + return get_unaligned_be16(data + offset); + if (!skb_copy_bits(skb, offset, &tmp, sizeof(tmp))) + return be16_to_cpu(tmp); + else + return -EFAULT; } BPF_CALL_2(bpf_skb_load_helper_16_no_cache, const struct sk_buff *, skb, @@ -275,21 +285,19 @@ BPF_CALL_2(bpf_skb_load_helper_16_no_cache, const struct sk_buff *, skb, BPF_CALL_4(bpf_skb_load_helper_32, const struct sk_buff *, skb, const void *, data, int, headlen, int, offset) { - __be32 tmp, *ptr; + __be32 tmp; const int len = sizeof(tmp); - if (likely(offset >= 0)) { - if (headlen - offset >= len) - return get_unaligned_be32(data + offset); - if (!skb_copy_bits(skb, offset, &tmp, sizeof(tmp))) - return be32_to_cpu(tmp); - } else { - ptr = bpf_internal_load_pointer_neg_helper(skb, offset, len); - if (likely(ptr)) - return get_unaligned_be32(ptr); - } + offset = bpf_skb_load_helper_convert_offset(skb, offset); + if (offset == INT_MIN) + return -EFAULT; - return -EFAULT; + if (headlen - offset >= len) + return get_unaligned_be32(data + offset); + if (!skb_copy_bits(skb, offset, &tmp, sizeof(tmp))) + return be32_to_cpu(tmp); + else + return -EFAULT; } BPF_CALL_2(bpf_skb_load_helper_32_no_cache, const struct sk_buff *, skb, -- 2.49.0.504.g3bcea36a83-goog ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH bpf v2 1/2] bpf: support SKF_NET_OFF and SKF_LL_OFF on skb frags 2025-04-04 14:23 ` [PATCH bpf v2 1/2] bpf: " Willem de Bruijn @ 2025-04-04 15:08 ` Stanislav Fomichev 2025-04-04 16:11 ` Daniel Borkmann 1 sibling, 0 replies; 9+ messages in thread From: Stanislav Fomichev @ 2025-04-04 15:08 UTC (permalink / raw) To: Willem de Bruijn Cc: bpf, netdev, ast, daniel, john.fastabend, Willem de Bruijn, Matt Moeller, Maciej Żenczykowski On 04/04, Willem de Bruijn wrote: > From: Willem de Bruijn <willemb@google.com> > > Classic BPF socket filters with SKB_NET_OFF and SKB_LL_OFF fail to > read when these offsets extend into frags. > > This has been observed with iwlwifi and reproduced with tun with > IFF_NAPI_FRAGS. The below straightforward socket filter on UDP port, > applied to a RAW socket, will silently miss matching packets. > > const int offset_proto = offsetof(struct ip6_hdr, ip6_nxt); > const int offset_dport = sizeof(struct ip6_hdr) + offsetof(struct udphdr, dest); > struct sock_filter filter_code[] = { > BPF_STMT(BPF_LD + BPF_B + BPF_ABS, SKF_AD_OFF + SKF_AD_PKTTYPE), > BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, PACKET_HOST, 0, 4), > BPF_STMT(BPF_LD + BPF_B + BPF_ABS, SKF_NET_OFF + offset_proto), > BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, IPPROTO_UDP, 0, 2), > BPF_STMT(BPF_LD + BPF_H + BPF_ABS, SKF_NET_OFF + offset_dport), > > This is unexpected behavior. Socket filter programs should be > consistent regardless of environment. Silent misses are > particularly concerning as hard to detect. > > Use skb_copy_bits for offsets outside linear, same as done for > non-SKF_(LL|NET) offsets. > > Offset is always positive after subtracting the reference threshold > SKB_(LL|NET)_OFF, so is always >= skb_(mac|network)_offset. The sum of > the two is an offset against skb->data, and may be negative, but it > cannot point before skb->head, as skb_(mac|network)_offset would too. > > This appears to go back to when frag support was introduced to > sk_run_filter in linux-2.4.4, before the introduction of git. > > The amount of code change and 8/16/32 bit duplication are unfortunate. > But any attempt I made to be smarter saved very few LoC while > complicating the code. > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") > Link: https://lore.kernel.org/netdev/20250122200402.3461154-1-maze@google.com/ > Link: https://elixir.bootlin.com/linux/2.4.4/source/net/core/filter.c#L244 > Reported-by: Matt Moeller <moeller.matt@gmail.com> > Co-developed-by: Maciej Żenczykowski <maze@google.com> > Signed-off-by: Maciej Żenczykowski <maze@google.com> > Signed-off-by: Willem de Bruijn <willemb@google.com> > > --- > > v1->v2 > - introduce bfp_skb_load_helper_convert_offset to avoid open coding Thank you! Acked-by: Stanislav Fomichev <sdf@fomichev.me> ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH bpf v2 1/2] bpf: support SKF_NET_OFF and SKF_LL_OFF on skb frags 2025-04-04 14:23 ` [PATCH bpf v2 1/2] bpf: " Willem de Bruijn 2025-04-04 15:08 ` Stanislav Fomichev @ 2025-04-04 16:11 ` Daniel Borkmann 2025-04-04 16:33 ` Willem de Bruijn 1 sibling, 1 reply; 9+ messages in thread From: Daniel Borkmann @ 2025-04-04 16:11 UTC (permalink / raw) To: Willem de Bruijn, bpf Cc: netdev, ast, john.fastabend, Willem de Bruijn, Matt Moeller, Maciej Żenczykowski Hi Willem, On 4/4/25 4:23 PM, Willem de Bruijn wrote: [...] > v1->v2 > - introduce bfp_skb_load_helper_convert_offset to avoid open coding > --- > include/linux/filter.h | 3 -- > kernel/bpf/core.c | 21 ----------- > net/core/filter.c | 80 +++++++++++++++++++++++------------------- > 3 files changed, 44 insertions(+), 60 deletions(-) > > diff --git a/include/linux/filter.h b/include/linux/filter.h > index f5cf4d35d83e..708ac7e0cd36 100644 > --- a/include/linux/filter.h > +++ b/include/linux/filter.h > @@ -1496,9 +1496,6 @@ static inline u16 bpf_anc_helper(const struct sock_filter *ftest) > } > } > > -void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb, > - int k, unsigned int size); > - > static inline int bpf_tell_extensions(void) > { > return SKF_AD_MAX; > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c > index ba6b6118cf50..0e836b5ac9a0 100644 > --- a/kernel/bpf/core.c > +++ b/kernel/bpf/core.c > @@ -68,27 +68,6 @@ > struct bpf_mem_alloc bpf_global_ma; > bool bpf_global_ma_set; > > -/* No hurry in this branch > - * > - * Exported for the bpf jit load helper. > - */ > -void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb, int k, unsigned int size) > -{ > - u8 *ptr = NULL; > - > - if (k >= SKF_NET_OFF) { > - ptr = skb_network_header(skb) + k - SKF_NET_OFF; > - } else if (k >= SKF_LL_OFF) { > - if (unlikely(!skb_mac_header_was_set(skb))) > - return NULL; > - ptr = skb_mac_header(skb) + k - SKF_LL_OFF; > - } > - if (ptr >= skb->head && ptr + size <= skb_tail_pointer(skb)) > - return ptr; > - > - return NULL; > -} Wouldn't this break sparc 32bit JIT which still calls into this? arch/sparc/net/bpf_jit_asm_32.S : #define bpf_negative_common(LEN) \ save %sp, -SAVE_SZ, %sp; \ mov %i0, %o0; \ mov r_OFF, %o1; \ SIGN_EXTEND(%o1); \ call bpf_internal_load_pointer_neg_helper; \ mov (LEN), %o2; \ mov %o0, r_TMP; \ cmp %o0, 0; \ BE_PTR(bpf_error); \ restore; Thanks, Daniel ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH bpf v2 1/2] bpf: support SKF_NET_OFF and SKF_LL_OFF on skb frags 2025-04-04 16:11 ` Daniel Borkmann @ 2025-04-04 16:33 ` Willem de Bruijn 2025-04-04 17:56 ` Maciej Żenczykowski 0 siblings, 1 reply; 9+ messages in thread From: Willem de Bruijn @ 2025-04-04 16:33 UTC (permalink / raw) To: Daniel Borkmann Cc: bpf, netdev, ast, john.fastabend, Willem de Bruijn, Matt Moeller, Maciej Żenczykowski On Fri, Apr 4, 2025 at 12:11 PM Daniel Borkmann <daniel@iogearbox.net> wrote: > > Hi Willem, > > On 4/4/25 4:23 PM, Willem de Bruijn wrote: > [...] > > v1->v2 > > - introduce bfp_skb_load_helper_convert_offset to avoid open coding > > --- > > include/linux/filter.h | 3 -- > > kernel/bpf/core.c | 21 ----------- > > net/core/filter.c | 80 +++++++++++++++++++++++------------------- > > 3 files changed, 44 insertions(+), 60 deletions(-) > > > > diff --git a/include/linux/filter.h b/include/linux/filter.h > > index f5cf4d35d83e..708ac7e0cd36 100644 > > --- a/include/linux/filter.h > > +++ b/include/linux/filter.h > > @@ -1496,9 +1496,6 @@ static inline u16 bpf_anc_helper(const struct sock_filter *ftest) > > } > > } > > > > -void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb, > > - int k, unsigned int size); > > - > > static inline int bpf_tell_extensions(void) > > { > > return SKF_AD_MAX; > > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c > > index ba6b6118cf50..0e836b5ac9a0 100644 > > --- a/kernel/bpf/core.c > > +++ b/kernel/bpf/core.c > > @@ -68,27 +68,6 @@ > > struct bpf_mem_alloc bpf_global_ma; > > bool bpf_global_ma_set; > > > > -/* No hurry in this branch > > - * > > - * Exported for the bpf jit load helper. > > - */ > > -void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb, int k, unsigned int size) > > -{ > > - u8 *ptr = NULL; > > - > > - if (k >= SKF_NET_OFF) { > > - ptr = skb_network_header(skb) + k - SKF_NET_OFF; > > - } else if (k >= SKF_LL_OFF) { > > - if (unlikely(!skb_mac_header_was_set(skb))) > > - return NULL; > > - ptr = skb_mac_header(skb) + k - SKF_LL_OFF; > > - } > > - if (ptr >= skb->head && ptr + size <= skb_tail_pointer(skb)) > > - return ptr; > > - > > - return NULL; > > -} > > Wouldn't this break sparc 32bit JIT which still calls into this? > > arch/sparc/net/bpf_jit_asm_32.S : > > #define bpf_negative_common(LEN) \ > save %sp, -SAVE_SZ, %sp; \ > mov %i0, %o0; \ > mov r_OFF, %o1; \ > SIGN_EXTEND(%o1); \ > call bpf_internal_load_pointer_neg_helper; \ > mov (LEN), %o2; \ > mov %o0, r_TMP; \ > cmp %o0, 0; \ > BE_PTR(bpf_error); \ > restore; Argh, good catch. Thanks Daniel. I'll drop the removal of bpf_internal_load_pointer_neg_helper from the patch. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH bpf v2 1/2] bpf: support SKF_NET_OFF and SKF_LL_OFF on skb frags 2025-04-04 16:33 ` Willem de Bruijn @ 2025-04-04 17:56 ` Maciej Żenczykowski 2025-04-07 9:00 ` Daniel Borkmann 0 siblings, 1 reply; 9+ messages in thread From: Maciej Żenczykowski @ 2025-04-04 17:56 UTC (permalink / raw) To: Willem de Bruijn Cc: Daniel Borkmann, bpf, netdev, ast, john.fastabend, Willem de Bruijn, Matt Moeller On Fri, Apr 4, 2025 at 9:34 AM Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: > > On Fri, Apr 4, 2025 at 12:11 PM Daniel Borkmann <daniel@iogearbox.net> wrote: > > > > Hi Willem, > > > > On 4/4/25 4:23 PM, Willem de Bruijn wrote: > > [...] > > > v1->v2 > > > - introduce bfp_skb_load_helper_convert_offset to avoid open coding > > > --- > > > include/linux/filter.h | 3 -- > > > kernel/bpf/core.c | 21 ----------- > > > net/core/filter.c | 80 +++++++++++++++++++++++------------------- > > > 3 files changed, 44 insertions(+), 60 deletions(-) > > > > > > diff --git a/include/linux/filter.h b/include/linux/filter.h > > > index f5cf4d35d83e..708ac7e0cd36 100644 > > > --- a/include/linux/filter.h > > > +++ b/include/linux/filter.h > > > @@ -1496,9 +1496,6 @@ static inline u16 bpf_anc_helper(const struct sock_filter *ftest) > > > } > > > } > > > > > > -void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb, > > > - int k, unsigned int size); > > > - > > > static inline int bpf_tell_extensions(void) > > > { > > > return SKF_AD_MAX; > > > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c > > > index ba6b6118cf50..0e836b5ac9a0 100644 > > > --- a/kernel/bpf/core.c > > > +++ b/kernel/bpf/core.c > > > @@ -68,27 +68,6 @@ > > > struct bpf_mem_alloc bpf_global_ma; > > > bool bpf_global_ma_set; > > > > > > -/* No hurry in this branch > > > - * > > > - * Exported for the bpf jit load helper. > > > - */ > > > -void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb, int k, unsigned int size) > > > -{ > > > - u8 *ptr = NULL; > > > - > > > - if (k >= SKF_NET_OFF) { > > > - ptr = skb_network_header(skb) + k - SKF_NET_OFF; > > > - } else if (k >= SKF_LL_OFF) { > > > - if (unlikely(!skb_mac_header_was_set(skb))) > > > - return NULL; > > > - ptr = skb_mac_header(skb) + k - SKF_LL_OFF; > > > - } > > > - if (ptr >= skb->head && ptr + size <= skb_tail_pointer(skb)) > > > - return ptr; > > > - > > > - return NULL; > > > -} > > > > Wouldn't this break sparc 32bit JIT which still calls into this? > > > > arch/sparc/net/bpf_jit_asm_32.S : > > > > #define bpf_negative_common(LEN) \ > > save %sp, -SAVE_SZ, %sp; \ > > mov %i0, %o0; \ > > mov r_OFF, %o1; \ > > SIGN_EXTEND(%o1); \ > > call bpf_internal_load_pointer_neg_helper; \ > > mov (LEN), %o2; \ > > mov %o0, r_TMP; \ > > cmp %o0, 0; \ > > BE_PTR(bpf_error); \ > > restore; > > Argh, good catch. Thanks Daniel. > > I'll drop the removal of bpf_internal_load_pointer_neg_helper from the patch. add a 'deprecated only used by sparc32 comment' hopefully someone that knows sparc32 assembly can fix it -- Maciej Żenczykowski, Kernel Networking Developer @ Google ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH bpf v2 1/2] bpf: support SKF_NET_OFF and SKF_LL_OFF on skb frags 2025-04-04 17:56 ` Maciej Żenczykowski @ 2025-04-07 9:00 ` Daniel Borkmann 2025-04-07 14:47 ` Willem de Bruijn 0 siblings, 1 reply; 9+ messages in thread From: Daniel Borkmann @ 2025-04-07 9:00 UTC (permalink / raw) To: Maciej Żenczykowski, Willem de Bruijn Cc: bpf, netdev, ast, john.fastabend, Willem de Bruijn, Matt Moeller On 4/4/25 7:56 PM, Maciej Żenczykowski wrote: > On Fri, Apr 4, 2025 at 9:34 AM Willem de Bruijn > <willemdebruijn.kernel@gmail.com> wrote: >> On Fri, Apr 4, 2025 at 12:11 PM Daniel Borkmann <daniel@iogearbox.net> wrote: >>> >>> Hi Willem, >>> >>> On 4/4/25 4:23 PM, Willem de Bruijn wrote: >>> [...] >>>> v1->v2 >>>> - introduce bfp_skb_load_helper_convert_offset to avoid open coding >>>> --- >>>> include/linux/filter.h | 3 -- >>>> kernel/bpf/core.c | 21 ----------- >>>> net/core/filter.c | 80 +++++++++++++++++++++++------------------- >>>> 3 files changed, 44 insertions(+), 60 deletions(-) >>>> >>>> diff --git a/include/linux/filter.h b/include/linux/filter.h >>>> index f5cf4d35d83e..708ac7e0cd36 100644 >>>> --- a/include/linux/filter.h >>>> +++ b/include/linux/filter.h >>>> @@ -1496,9 +1496,6 @@ static inline u16 bpf_anc_helper(const struct sock_filter *ftest) >>>> } >>>> } >>>> >>>> -void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb, >>>> - int k, unsigned int size); >>>> - >>>> static inline int bpf_tell_extensions(void) >>>> { >>>> return SKF_AD_MAX; >>>> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c >>>> index ba6b6118cf50..0e836b5ac9a0 100644 >>>> --- a/kernel/bpf/core.c >>>> +++ b/kernel/bpf/core.c >>>> @@ -68,27 +68,6 @@ >>>> struct bpf_mem_alloc bpf_global_ma; >>>> bool bpf_global_ma_set; >>>> >>>> -/* No hurry in this branch >>>> - * >>>> - * Exported for the bpf jit load helper. >>>> - */ >>>> -void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb, int k, unsigned int size) >>>> -{ >>>> - u8 *ptr = NULL; >>>> - >>>> - if (k >= SKF_NET_OFF) { >>>> - ptr = skb_network_header(skb) + k - SKF_NET_OFF; >>>> - } else if (k >= SKF_LL_OFF) { >>>> - if (unlikely(!skb_mac_header_was_set(skb))) >>>> - return NULL; >>>> - ptr = skb_mac_header(skb) + k - SKF_LL_OFF; >>>> - } >>>> - if (ptr >= skb->head && ptr + size <= skb_tail_pointer(skb)) >>>> - return ptr; >>>> - >>>> - return NULL; >>>> -} >>> >>> Wouldn't this break sparc 32bit JIT which still calls into this? >>> >>> arch/sparc/net/bpf_jit_asm_32.S : >>> >>> #define bpf_negative_common(LEN) \ >>> save %sp, -SAVE_SZ, %sp; \ >>> mov %i0, %o0; \ >>> mov r_OFF, %o1; \ >>> SIGN_EXTEND(%o1); \ >>> call bpf_internal_load_pointer_neg_helper; \ >>> mov (LEN), %o2; \ >>> mov %o0, r_TMP; \ >>> cmp %o0, 0; \ >>> BE_PTR(bpf_error); \ >>> restore; >> >> Argh, good catch. Thanks Daniel. >> >> I'll drop the removal of bpf_internal_load_pointer_neg_helper from the patch. > > add a 'deprecated only used by sparc32 comment' > > hopefully someone that knows sparc32 assembly can fix it Alternatively, the bpf_internal_load_pointer_neg_helper() could be moved entirely over into arch/sparc/net/ so that others won't be tempted to reuse. Cheers, Daniel ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH bpf v2 1/2] bpf: support SKF_NET_OFF and SKF_LL_OFF on skb frags 2025-04-07 9:00 ` Daniel Borkmann @ 2025-04-07 14:47 ` Willem de Bruijn 0 siblings, 0 replies; 9+ messages in thread From: Willem de Bruijn @ 2025-04-07 14:47 UTC (permalink / raw) To: Daniel Borkmann, Maciej Żenczykowski, Willem de Bruijn Cc: bpf, netdev, ast, john.fastabend, Willem de Bruijn, Matt Moeller Daniel Borkmann wrote: > On 4/4/25 7:56 PM, Maciej Żenczykowski wrote: > > On Fri, Apr 4, 2025 at 9:34 AM Willem de Bruijn > > <willemdebruijn.kernel@gmail.com> wrote: > >> On Fri, Apr 4, 2025 at 12:11 PM Daniel Borkmann <daniel@iogearbox.net> wrote: > >>> > >>> Hi Willem, > >>> > >>> On 4/4/25 4:23 PM, Willem de Bruijn wrote: > >>> [...] > >>>> v1->v2 > >>>> - introduce bfp_skb_load_helper_convert_offset to avoid open coding > >>>> --- > >>>> include/linux/filter.h | 3 -- > >>>> kernel/bpf/core.c | 21 ----------- > >>>> net/core/filter.c | 80 +++++++++++++++++++++++------------------- > >>>> 3 files changed, 44 insertions(+), 60 deletions(-) > >>>> > >>>> diff --git a/include/linux/filter.h b/include/linux/filter.h > >>>> index f5cf4d35d83e..708ac7e0cd36 100644 > >>>> --- a/include/linux/filter.h > >>>> +++ b/include/linux/filter.h > >>>> @@ -1496,9 +1496,6 @@ static inline u16 bpf_anc_helper(const struct sock_filter *ftest) > >>>> } > >>>> } > >>>> > >>>> -void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb, > >>>> - int k, unsigned int size); > >>>> - > >>>> static inline int bpf_tell_extensions(void) > >>>> { > >>>> return SKF_AD_MAX; > >>>> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c > >>>> index ba6b6118cf50..0e836b5ac9a0 100644 > >>>> --- a/kernel/bpf/core.c > >>>> +++ b/kernel/bpf/core.c > >>>> @@ -68,27 +68,6 @@ > >>>> struct bpf_mem_alloc bpf_global_ma; > >>>> bool bpf_global_ma_set; > >>>> > >>>> -/* No hurry in this branch > >>>> - * > >>>> - * Exported for the bpf jit load helper. > >>>> - */ > >>>> -void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb, int k, unsigned int size) > >>>> -{ > >>>> - u8 *ptr = NULL; > >>>> - > >>>> - if (k >= SKF_NET_OFF) { > >>>> - ptr = skb_network_header(skb) + k - SKF_NET_OFF; > >>>> - } else if (k >= SKF_LL_OFF) { > >>>> - if (unlikely(!skb_mac_header_was_set(skb))) > >>>> - return NULL; > >>>> - ptr = skb_mac_header(skb) + k - SKF_LL_OFF; > >>>> - } > >>>> - if (ptr >= skb->head && ptr + size <= skb_tail_pointer(skb)) > >>>> - return ptr; > >>>> - > >>>> - return NULL; > >>>> -} > >>> > >>> Wouldn't this break sparc 32bit JIT which still calls into this? > >>> > >>> arch/sparc/net/bpf_jit_asm_32.S : > >>> > >>> #define bpf_negative_common(LEN) \ > >>> save %sp, -SAVE_SZ, %sp; \ > >>> mov %i0, %o0; \ > >>> mov r_OFF, %o1; \ > >>> SIGN_EXTEND(%o1); \ > >>> call bpf_internal_load_pointer_neg_helper; \ > >>> mov (LEN), %o2; \ > >>> mov %o0, r_TMP; \ > >>> cmp %o0, 0; \ > >>> BE_PTR(bpf_error); \ > >>> restore; > >> > >> Argh, good catch. Thanks Daniel. > >> > >> I'll drop the removal of bpf_internal_load_pointer_neg_helper from the patch. > > > > add a 'deprecated only used by sparc32 comment' > > > > hopefully someone that knows sparc32 assembly can fix it > > Alternatively, the bpf_internal_load_pointer_neg_helper() could be moved entirely > over into arch/sparc/net/ so that others won't be tempted to reuse. I'd prefer to keep it as is. I took a stab, but my Debian has no sparc32 gcc cross compiler anymore, and I was unable to cross compile with clang either. ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH bpf v2 2/2] selftests/net: test sk_filter support for SKF_NET_OFF on frags 2025-04-04 14:23 [PATCH bpf v2 0/2] support SKF_NET_OFF and SKF_LL_OFF on skb frags Willem de Bruijn 2025-04-04 14:23 ` [PATCH bpf v2 1/2] bpf: " Willem de Bruijn @ 2025-04-04 14:23 ` Willem de Bruijn 1 sibling, 0 replies; 9+ messages in thread From: Willem de Bruijn @ 2025-04-04 14:23 UTC (permalink / raw) To: bpf Cc: netdev, ast, daniel, john.fastabend, Willem de Bruijn, Stanislav Fomichev From: Willem de Bruijn <willemb@google.com> Verify that a classic BPF linux socket filter correctly matches packet contents. Including when accessing contents in an skb_frag. 1. Open a SOCK_RAW socket with a classic BPF filter on UDP dport 8000. 2. Open a tap device with IFF_NAPI_FRAGS to inject skbs with frags. 3. Send a packet for which the UDP header is in frag[0]. 4. Receive this packet to demonstrate that the socket accepted it. Acked-by: Stanislav Fomichev <sdf@fomichev.me> Signed-off-by: Willem de Bruijn <willemb@google.com> --- v1->v2 - add comment why early demux must be disabled --- tools/testing/selftests/net/.gitignore | 1 + tools/testing/selftests/net/Makefile | 2 + tools/testing/selftests/net/skf_net_off.c | 244 +++++++++++++++++++++ tools/testing/selftests/net/skf_net_off.sh | 30 +++ 4 files changed, 277 insertions(+) create mode 100644 tools/testing/selftests/net/skf_net_off.c create mode 100755 tools/testing/selftests/net/skf_net_off.sh diff --git a/tools/testing/selftests/net/.gitignore b/tools/testing/selftests/net/.gitignore index 679542f565a4..532bb732bc6d 100644 --- a/tools/testing/selftests/net/.gitignore +++ b/tools/testing/selftests/net/.gitignore @@ -39,6 +39,7 @@ scm_rights sk_bind_sendto_listen sk_connect_zero_addr sk_so_peek_off +skf_net_off socket so_incoming_cpu so_netns_cookie diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile index 6d718b478ed8..124078b56fa4 100644 --- a/tools/testing/selftests/net/Makefile +++ b/tools/testing/selftests/net/Makefile @@ -106,6 +106,8 @@ TEST_PROGS += ipv6_route_update_soft_lockup.sh TEST_PROGS += busy_poll_test.sh TEST_GEN_PROGS += proc_net_pktgen TEST_PROGS += lwt_dst_cache_ref_loop.sh +TEST_PROGS += skf_net_off.sh +TEST_GEN_FILES += skf_net_off # YNL files, must be before "include ..lib.mk" YNL_GEN_FILES := busy_poller netlink-dumps diff --git a/tools/testing/selftests/net/skf_net_off.c b/tools/testing/selftests/net/skf_net_off.c new file mode 100644 index 000000000000..1fdf61d6cd7f --- /dev/null +++ b/tools/testing/selftests/net/skf_net_off.c @@ -0,0 +1,244 @@ +// SPDX-License-Identifier: GPL-2.0 + +/* Open a tun device. + * + * [modifications: use IFF_NAPI_FRAGS, add sk filter] + * + * Expects the device to have been configured previously, e.g.: + * sudo ip tuntap add name tap1 mode tap + * sudo ip link set tap1 up + * sudo ip link set dev tap1 addr 02:00:00:00:00:01 + * sudo ip -6 addr add fdab::1 peer fdab::2 dev tap1 nodad + * + * And to avoid premature pskb_may_pull: + * + * sudo ethtool -K tap1 gro off + * sudo bash -c 'echo 0 > /proc/sys/net/ipv4/ip_early_demux' + */ + +#define _GNU_SOURCE + +#include <arpa/inet.h> +#include <errno.h> +#include <error.h> +#include <fcntl.h> +#include <getopt.h> +#include <linux/filter.h> +#include <linux/if.h> +#include <linux/if_packet.h> +#include <linux/if_tun.h> +#include <linux/ipv6.h> +#include <netinet/if_ether.h> +#include <netinet/in.h> +#include <netinet/ip.h> +#include <netinet/ip6.h> +#include <netinet/udp.h> +#include <poll.h> +#include <signal.h> +#include <stdbool.h> +#include <stddef.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <sys/ioctl.h> +#include <sys/socket.h> +#include <sys/poll.h> +#include <sys/types.h> +#include <sys/uio.h> +#include <unistd.h> + +static bool cfg_do_filter; +static bool cfg_do_frags; +static int cfg_dst_port = 8000; +static char *cfg_ifname; + +static int tun_open(const char *tun_name) +{ + struct ifreq ifr = {0}; + int fd, ret; + + fd = open("/dev/net/tun", O_RDWR); + if (fd == -1) + error(1, errno, "open /dev/net/tun"); + + ifr.ifr_flags = IFF_TAP; + if (cfg_do_frags) + ifr.ifr_flags |= IFF_NAPI | IFF_NAPI_FRAGS; + + strncpy(ifr.ifr_name, tun_name, IFNAMSIZ - 1); + + ret = ioctl(fd, TUNSETIFF, &ifr); + if (ret) + error(1, ret, "ioctl TUNSETIFF"); + + return fd; +} + +static void sk_set_filter(int fd) +{ + const int offset_proto = offsetof(struct ip6_hdr, ip6_nxt); + const int offset_dport = sizeof(struct ip6_hdr) + offsetof(struct udphdr, dest); + + /* Filter UDP packets with destination port cfg_dst_port */ + struct sock_filter filter_code[] = { + BPF_STMT(BPF_LD + BPF_B + BPF_ABS, SKF_AD_OFF + SKF_AD_PKTTYPE), + BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, PACKET_HOST, 0, 4), + BPF_STMT(BPF_LD + BPF_B + BPF_ABS, SKF_NET_OFF + offset_proto), + BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, IPPROTO_UDP, 0, 2), + BPF_STMT(BPF_LD + BPF_H + BPF_ABS, SKF_NET_OFF + offset_dport), + BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, cfg_dst_port, 1, 0), + BPF_STMT(BPF_RET + BPF_K, 0), + BPF_STMT(BPF_RET + BPF_K, 0xFFFF), + }; + + struct sock_fprog filter = { + sizeof(filter_code) / sizeof(filter_code[0]), + filter_code, + }; + + if (setsockopt(fd, SOL_SOCKET, SO_ATTACH_FILTER, &filter, sizeof(filter))) + error(1, errno, "setsockopt attach filter"); +} + +static int raw_open(void) +{ + int fd; + + fd = socket(PF_INET6, SOCK_RAW, IPPROTO_UDP); + if (fd == -1) + error(1, errno, "socket raw (udp)"); + + if (cfg_do_filter) + sk_set_filter(fd); + + return fd; +} + +static void tun_write(int fd) +{ + const char eth_src[] = { 0x02, 0x00, 0x00, 0x00, 0x00, 0x02 }; + const char eth_dst[] = { 0x02, 0x00, 0x00, 0x00, 0x00, 0x01 }; + struct tun_pi pi = {0}; + struct ipv6hdr ip6h = {0}; + struct udphdr uh = {0}; + struct ethhdr eth = {0}; + uint32_t payload; + struct iovec iov[5]; + int ret; + + pi.proto = htons(ETH_P_IPV6); + + memcpy(eth.h_source, eth_src, sizeof(eth_src)); + memcpy(eth.h_dest, eth_dst, sizeof(eth_dst)); + eth.h_proto = htons(ETH_P_IPV6); + + ip6h.version = 6; + ip6h.payload_len = htons(sizeof(uh) + sizeof(uint32_t)); + ip6h.nexthdr = IPPROTO_UDP; + ip6h.hop_limit = 8; + if (inet_pton(AF_INET6, "fdab::2", &ip6h.saddr) != 1) + error(1, errno, "inet_pton src"); + if (inet_pton(AF_INET6, "fdab::1", &ip6h.daddr) != 1) + error(1, errno, "inet_pton src"); + + uh.source = htons(8000); + uh.dest = htons(cfg_dst_port); + uh.len = ip6h.payload_len; + uh.check = 0; + + payload = htonl(0xABABABAB); /* Covered in IPv6 length */ + + iov[0].iov_base = π + iov[0].iov_len = sizeof(pi); + iov[1].iov_base = ð + iov[1].iov_len = sizeof(eth); + iov[2].iov_base = &ip6h; + iov[2].iov_len = sizeof(ip6h); + iov[3].iov_base = &uh; + iov[3].iov_len = sizeof(uh); + iov[4].iov_base = &payload; + iov[4].iov_len = sizeof(payload); + + ret = writev(fd, iov, sizeof(iov) / sizeof(iov[0])); + if (ret <= 0) + error(1, errno, "writev"); +} + +static void raw_read(int fd) +{ + struct timeval tv = { .tv_usec = 100 * 1000 }; + struct msghdr msg = {0}; + struct iovec iov[2]; + struct udphdr uh; + uint32_t payload[2]; + int ret; + + if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv))) + error(1, errno, "setsockopt rcvtimeo udp"); + + iov[0].iov_base = &uh; + iov[0].iov_len = sizeof(uh); + + iov[1].iov_base = payload; + iov[1].iov_len = sizeof(payload); + + msg.msg_iov = iov; + msg.msg_iovlen = sizeof(iov) / sizeof(iov[0]); + + ret = recvmsg(fd, &msg, 0); + if (ret <= 0) + error(1, errno, "read raw"); + if (ret != sizeof(uh) + sizeof(payload[0])) + error(1, errno, "read raw: len=%d\n", ret); + + fprintf(stderr, "raw recv: 0x%x\n", payload[0]); +} + +static void parse_opts(int argc, char **argv) +{ + int c; + + while ((c = getopt(argc, argv, "fFi:")) != -1) { + switch (c) { + case 'f': + cfg_do_filter = true; + printf("bpf filter enabled\n"); + break; + case 'F': + cfg_do_frags = true; + printf("napi frags mode enabled\n"); + break; + case 'i': + cfg_ifname = optarg; + break; + default: + error(1, 0, "unknown option %c", optopt); + break; + } + } + + if (!cfg_ifname) + error(1, 0, "must specify tap interface name (-i)"); +} + +int main(int argc, char **argv) +{ + int fdt, fdr; + + parse_opts(argc, argv); + + fdr = raw_open(); + fdt = tun_open(cfg_ifname); + + tun_write(fdt); + raw_read(fdr); + + if (close(fdt)) + error(1, errno, "close tun"); + if (close(fdr)) + error(1, errno, "close udp"); + + fprintf(stderr, "OK\n"); + return 0; +} + diff --git a/tools/testing/selftests/net/skf_net_off.sh b/tools/testing/selftests/net/skf_net_off.sh new file mode 100755 index 000000000000..5da5066fb465 --- /dev/null +++ b/tools/testing/selftests/net/skf_net_off.sh @@ -0,0 +1,30 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 + +readonly NS="ns-$(mktemp -u XXXXXX)" + +cleanup() { + ip netns del $NS +} + +ip netns add $NS +trap cleanup EXIT + +ip -netns $NS link set lo up +ip -netns $NS tuntap add name tap1 mode tap +ip -netns $NS link set tap1 up +ip -netns $NS link set dev tap1 addr 02:00:00:00:00:01 +ip -netns $NS -6 addr add fdab::1 peer fdab::2 dev tap1 nodad +ip netns exec $NS ethtool -K tap1 gro off + +# disable early demux, else udp_v6_early_demux pulls udp header into linear +ip netns exec $NS sysctl -w net.ipv4.ip_early_demux=0 + +echo "no filter" +ip netns exec $NS ./skf_net_off -i tap1 + +echo "filter, linear skb (-f)" +ip netns exec $NS ./skf_net_off -i tap1 -f + +echo "filter, fragmented skb (-f) (-F)" +ip netns exec $NS ./skf_net_off -i tap1 -f -F -- 2.49.0.504.g3bcea36a83-goog ^ permalink raw reply related [flat|nested] 9+ messages in thread
end of thread, other threads:[~2025-04-07 14:47 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-04-04 14:23 [PATCH bpf v2 0/2] support SKF_NET_OFF and SKF_LL_OFF on skb frags Willem de Bruijn 2025-04-04 14:23 ` [PATCH bpf v2 1/2] bpf: " Willem de Bruijn 2025-04-04 15:08 ` Stanislav Fomichev 2025-04-04 16:11 ` Daniel Borkmann 2025-04-04 16:33 ` Willem de Bruijn 2025-04-04 17:56 ` Maciej Żenczykowski 2025-04-07 9:00 ` Daniel Borkmann 2025-04-07 14:47 ` Willem de Bruijn 2025-04-04 14:23 ` [PATCH bpf v2 2/2] selftests/net: test sk_filter support for SKF_NET_OFF on frags Willem de Bruijn
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).