* [PATCH bpf v2 0/2] support SKF_NET_OFF and SKF_LL_OFF on skb frags @ 2025-04-04 14:23 Willem de Bruijn 2025-04-04 14:23 ` [PATCH bpf v2 1/2] bpf: " Willem de Bruijn 2025-04-04 14:23 ` [PATCH bpf v2 2/2] selftests/net: test sk_filter support for SKF_NET_OFF on frags Willem de Bruijn 0 siblings, 2 replies; 9+ messages in thread From: Willem de Bruijn @ 2025-04-04 14:23 UTC (permalink / raw) To: bpf; +Cc: netdev, ast, daniel, john.fastabend, Willem de Bruijn From: Willem de Bruijn <willemb@google.com> Address a longstanding issue that may lead to missed packets depending on system configuration. Ensure that reading from packet contents works regardless of skb geometry, also when using the special SKF_.. negative offsets to offset from L2 or L3 header. Patch 2 is the selftest for the fix. Willem de Bruijn (2): bpf: support SKF_NET_OFF and SKF_LL_OFF on skb frags selftests/net: test sk_filter support for SKF_NET_OFF on frags include/linux/filter.h | 3 - kernel/bpf/core.c | 21 -- net/core/filter.c | 80 ++++--- tools/testing/selftests/net/.gitignore | 1 + tools/testing/selftests/net/Makefile | 2 + tools/testing/selftests/net/skf_net_off.c | 244 +++++++++++++++++++++ tools/testing/selftests/net/skf_net_off.sh | 30 +++ 7 files changed, 321 insertions(+), 60 deletions(-) create mode 100644 tools/testing/selftests/net/skf_net_off.c create mode 100755 tools/testing/selftests/net/skf_net_off.sh -- 2.49.0.504.g3bcea36a83-goog ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH bpf v2 1/2] bpf: support SKF_NET_OFF and SKF_LL_OFF on skb frags 2025-04-04 14:23 [PATCH bpf v2 0/2] support SKF_NET_OFF and SKF_LL_OFF on skb frags Willem de Bruijn @ 2025-04-04 14:23 ` Willem de Bruijn 2025-04-04 15:08 ` Stanislav Fomichev 2025-04-04 16:11 ` Daniel Borkmann 2025-04-04 14:23 ` [PATCH bpf v2 2/2] selftests/net: test sk_filter support for SKF_NET_OFF on frags Willem de Bruijn 1 sibling, 2 replies; 9+ messages in thread From: Willem de Bruijn @ 2025-04-04 14:23 UTC (permalink / raw) To: bpf Cc: netdev, ast, daniel, john.fastabend, Willem de Bruijn, Matt Moeller, Maciej Żenczykowski From: Willem de Bruijn <willemb@google.com> Classic BPF socket filters with SKB_NET_OFF and SKB_LL_OFF fail to read when these offsets extend into frags. This has been observed with iwlwifi and reproduced with tun with IFF_NAPI_FRAGS. The below straightforward socket filter on UDP port, applied to a RAW socket, will silently miss matching packets. const int offset_proto = offsetof(struct ip6_hdr, ip6_nxt); const int offset_dport = sizeof(struct ip6_hdr) + offsetof(struct udphdr, dest); struct sock_filter filter_code[] = { BPF_STMT(BPF_LD + BPF_B + BPF_ABS, SKF_AD_OFF + SKF_AD_PKTTYPE), BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, PACKET_HOST, 0, 4), BPF_STMT(BPF_LD + BPF_B + BPF_ABS, SKF_NET_OFF + offset_proto), BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, IPPROTO_UDP, 0, 2), BPF_STMT(BPF_LD + BPF_H + BPF_ABS, SKF_NET_OFF + offset_dport), This is unexpected behavior. Socket filter programs should be consistent regardless of environment. Silent misses are particularly concerning as hard to detect. Use skb_copy_bits for offsets outside linear, same as done for non-SKF_(LL|NET) offsets. Offset is always positive after subtracting the reference threshold SKB_(LL|NET)_OFF, so is always >= skb_(mac|network)_offset. The sum of the two is an offset against skb->data, and may be negative, but it cannot point before skb->head, as skb_(mac|network)_offset would too. This appears to go back to when frag support was introduced to sk_run_filter in linux-2.4.4, before the introduction of git. The amount of code change and 8/16/32 bit duplication are unfortunate. But any attempt I made to be smarter saved very few LoC while complicating the code. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Link: https://lore.kernel.org/netdev/20250122200402.3461154-1-maze@google.com/ Link: https://elixir.bootlin.com/linux/2.4.4/source/net/core/filter.c#L244 Reported-by: Matt Moeller <moeller.matt@gmail.com> Co-developed-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> --- v1->v2 - introduce bfp_skb_load_helper_convert_offset to avoid open coding --- include/linux/filter.h | 3 -- kernel/bpf/core.c | 21 ----------- net/core/filter.c | 80 +++++++++++++++++++++++------------------- 3 files changed, 44 insertions(+), 60 deletions(-) diff --git a/include/linux/filter.h b/include/linux/filter.h index f5cf4d35d83e..708ac7e0cd36 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -1496,9 +1496,6 @@ static inline u16 bpf_anc_helper(const struct sock_filter *ftest) } } -void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb, - int k, unsigned int size); - static inline int bpf_tell_extensions(void) { return SKF_AD_MAX; diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index ba6b6118cf50..0e836b5ac9a0 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -68,27 +68,6 @@ struct bpf_mem_alloc bpf_global_ma; bool bpf_global_ma_set; -/* No hurry in this branch - * - * Exported for the bpf jit load helper. - */ -void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb, int k, unsigned int size) -{ - u8 *ptr = NULL; - - if (k >= SKF_NET_OFF) { - ptr = skb_network_header(skb) + k - SKF_NET_OFF; - } else if (k >= SKF_LL_OFF) { - if (unlikely(!skb_mac_header_was_set(skb))) - return NULL; - ptr = skb_mac_header(skb) + k - SKF_LL_OFF; - } - if (ptr >= skb->head && ptr + size <= skb_tail_pointer(skb)) - return ptr; - - return NULL; -} - /* tell bpf programs that include vmlinux.h kernel's PAGE_SIZE */ enum page_size_enum { __PAGE_SIZE = PAGE_SIZE diff --git a/net/core/filter.c b/net/core/filter.c index bc6828761a47..79cab4d78dc3 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -218,24 +218,36 @@ BPF_CALL_3(bpf_skb_get_nlattr_nest, struct sk_buff *, skb, u32, a, u32, x) return 0; } +static int bpf_skb_load_helper_convert_offset(const struct sk_buff *skb, int offset) +{ + if (likely(offset >= 0)) + return offset; + + if (offset >= SKF_NET_OFF) + return offset - SKF_NET_OFF + skb_network_offset(skb); + + if (offset >= SKF_LL_OFF && skb_mac_header_was_set(skb)) + return offset - SKF_LL_OFF + skb_mac_offset(skb); + + return INT_MIN; +} + BPF_CALL_4(bpf_skb_load_helper_8, const struct sk_buff *, skb, const void *, data, int, headlen, int, offset) { - u8 tmp, *ptr; + u8 tmp; const int len = sizeof(tmp); - if (offset >= 0) { - if (headlen - offset >= len) - return *(u8 *)(data + offset); - if (!skb_copy_bits(skb, offset, &tmp, sizeof(tmp))) - return tmp; - } else { - ptr = bpf_internal_load_pointer_neg_helper(skb, offset, len); - if (likely(ptr)) - return *(u8 *)ptr; - } + offset = bpf_skb_load_helper_convert_offset(skb, offset); + if (offset == INT_MIN) + return -EFAULT; - return -EFAULT; + if (headlen - offset >= len) + return *(u8 *)(data + offset); + if (!skb_copy_bits(skb, offset, &tmp, sizeof(tmp))) + return tmp; + else + return -EFAULT; } BPF_CALL_2(bpf_skb_load_helper_8_no_cache, const struct sk_buff *, skb, @@ -248,21 +260,19 @@ BPF_CALL_2(bpf_skb_load_helper_8_no_cache, const struct sk_buff *, skb, BPF_CALL_4(bpf_skb_load_helper_16, const struct sk_buff *, skb, const void *, data, int, headlen, int, offset) { - __be16 tmp, *ptr; + __be16 tmp; const int len = sizeof(tmp); - if (offset >= 0) { - if (headlen - offset >= len) - return get_unaligned_be16(data + offset); - if (!skb_copy_bits(skb, offset, &tmp, sizeof(tmp))) - return be16_to_cpu(tmp); - } else { - ptr = bpf_internal_load_pointer_neg_helper(skb, offset, len); - if (likely(ptr)) - return get_unaligned_be16(ptr); - } + offset = bpf_skb_load_helper_convert_offset(skb, offset); + if (offset == INT_MIN) + return -EFAULT; - return -EFAULT; + if (headlen - offset >= len) + return get_unaligned_be16(data + offset); + if (!skb_copy_bits(skb, offset, &tmp, sizeof(tmp))) + return be16_to_cpu(tmp); + else + return -EFAULT; } BPF_CALL_2(bpf_skb_load_helper_16_no_cache, const struct sk_buff *, skb, @@ -275,21 +285,19 @@ BPF_CALL_2(bpf_skb_load_helper_16_no_cache, const struct sk_buff *, skb, BPF_CALL_4(bpf_skb_load_helper_32, const struct sk_buff *, skb, const void *, data, int, headlen, int, offset) { - __be32 tmp, *ptr; + __be32 tmp; const int len = sizeof(tmp); - if (likely(offset >= 0)) { - if (headlen - offset >= len) - return get_unaligned_be32(data + offset); - if (!skb_copy_bits(skb, offset, &tmp, sizeof(tmp))) - return be32_to_cpu(tmp); - } else { - ptr = bpf_internal_load_pointer_neg_helper(skb, offset, len); - if (likely(ptr)) - return get_unaligned_be32(ptr); - } + offset = bpf_skb_load_helper_convert_offset(skb, offset); + if (offset == INT_MIN) + return -EFAULT; - return -EFAULT; + if (headlen - offset >= len) + return get_unaligned_be32(data + offset); + if (!skb_copy_bits(skb, offset, &tmp, sizeof(tmp))) + return be32_to_cpu(tmp); + else + return -EFAULT; } BPF_CALL_2(bpf_skb_load_helper_32_no_cache, const struct sk_buff *, skb, -- 2.49.0.504.g3bcea36a83-goog ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH bpf v2 1/2] bpf: support SKF_NET_OFF and SKF_LL_OFF on skb frags 2025-04-04 14:23 ` [PATCH bpf v2 1/2] bpf: " Willem de Bruijn @ 2025-04-04 15:08 ` Stanislav Fomichev 2025-04-04 16:11 ` Daniel Borkmann 1 sibling, 0 replies; 9+ messages in thread From: Stanislav Fomichev @ 2025-04-04 15:08 UTC (permalink / raw) To: Willem de Bruijn Cc: bpf, netdev, ast, daniel, john.fastabend, Willem de Bruijn, Matt Moeller, Maciej Żenczykowski On 04/04, Willem de Bruijn wrote: > From: Willem de Bruijn <willemb@google.com> > > Classic BPF socket filters with SKB_NET_OFF and SKB_LL_OFF fail to > read when these offsets extend into frags. > > This has been observed with iwlwifi and reproduced with tun with > IFF_NAPI_FRAGS. The below straightforward socket filter on UDP port, > applied to a RAW socket, will silently miss matching packets. > > const int offset_proto = offsetof(struct ip6_hdr, ip6_nxt); > const int offset_dport = sizeof(struct ip6_hdr) + offsetof(struct udphdr, dest); > struct sock_filter filter_code[] = { > BPF_STMT(BPF_LD + BPF_B + BPF_ABS, SKF_AD_OFF + SKF_AD_PKTTYPE), > BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, PACKET_HOST, 0, 4), > BPF_STMT(BPF_LD + BPF_B + BPF_ABS, SKF_NET_OFF + offset_proto), > BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, IPPROTO_UDP, 0, 2), > BPF_STMT(BPF_LD + BPF_H + BPF_ABS, SKF_NET_OFF + offset_dport), > > This is unexpected behavior. Socket filter programs should be > consistent regardless of environment. Silent misses are > particularly concerning as hard to detect. > > Use skb_copy_bits for offsets outside linear, same as done for > non-SKF_(LL|NET) offsets. > > Offset is always positive after subtracting the reference threshold > SKB_(LL|NET)_OFF, so is always >= skb_(mac|network)_offset. The sum of > the two is an offset against skb->data, and may be negative, but it > cannot point before skb->head, as skb_(mac|network)_offset would too. > > This appears to go back to when frag support was introduced to > sk_run_filter in linux-2.4.4, before the introduction of git. > > The amount of code change and 8/16/32 bit duplication are unfortunate. > But any attempt I made to be smarter saved very few LoC while > complicating the code. > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") > Link: https://lore.kernel.org/netdev/20250122200402.3461154-1-maze@google.com/ > Link: https://elixir.bootlin.com/linux/2.4.4/source/net/core/filter.c#L244 > Reported-by: Matt Moeller <moeller.matt@gmail.com> > Co-developed-by: Maciej Żenczykowski <maze@google.com> > Signed-off-by: Maciej Żenczykowski <maze@google.com> > Signed-off-by: Willem de Bruijn <willemb@google.com> > > --- > > v1->v2 > - introduce bfp_skb_load_helper_convert_offset to avoid open coding Thank you! Acked-by: Stanislav Fomichev <sdf@fomichev.me> ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH bpf v2 1/2] bpf: support SKF_NET_OFF and SKF_LL_OFF on skb frags 2025-04-04 14:23 ` [PATCH bpf v2 1/2] bpf: " Willem de Bruijn 2025-04-04 15:08 ` Stanislav Fomichev @ 2025-04-04 16:11 ` Daniel Borkmann 2025-04-04 16:33 ` Willem de Bruijn 1 sibling, 1 reply; 9+ messages in thread From: Daniel Borkmann @ 2025-04-04 16:11 UTC (permalink / raw) To: Willem de Bruijn, bpf Cc: netdev, ast, john.fastabend, Willem de Bruijn, Matt Moeller, Maciej Żenczykowski Hi Willem, On 4/4/25 4:23 PM, Willem de Bruijn wrote: [...] > v1->v2 > - introduce bfp_skb_load_helper_convert_offset to avoid open coding > --- > include/linux/filter.h | 3 -- > kernel/bpf/core.c | 21 ----------- > net/core/filter.c | 80 +++++++++++++++++++++++------------------- > 3 files changed, 44 insertions(+), 60 deletions(-) > > diff --git a/include/linux/filter.h b/include/linux/filter.h > index f5cf4d35d83e..708ac7e0cd36 100644 > --- a/include/linux/filter.h > +++ b/include/linux/filter.h > @@ -1496,9 +1496,6 @@ static inline u16 bpf_anc_helper(const struct sock_filter *ftest) > } > } > > -void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb, > - int k, unsigned int size); > - > static inline int bpf_tell_extensions(void) > { > return SKF_AD_MAX; > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c > index ba6b6118cf50..0e836b5ac9a0 100644 > --- a/kernel/bpf/core.c > +++ b/kernel/bpf/core.c > @@ -68,27 +68,6 @@ > struct bpf_mem_alloc bpf_global_ma; > bool bpf_global_ma_set; > > -/* No hurry in this branch > - * > - * Exported for the bpf jit load helper. > - */ > -void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb, int k, unsigned int size) > -{ > - u8 *ptr = NULL; > - > - if (k >= SKF_NET_OFF) { > - ptr = skb_network_header(skb) + k - SKF_NET_OFF; > - } else if (k >= SKF_LL_OFF) { > - if (unlikely(!skb_mac_header_was_set(skb))) > - return NULL; > - ptr = skb_mac_header(skb) + k - SKF_LL_OFF; > - } > - if (ptr >= skb->head && ptr + size <= skb_tail_pointer(skb)) > - return ptr; > - > - return NULL; > -} Wouldn't this break sparc 32bit JIT which still calls into this? arch/sparc/net/bpf_jit_asm_32.S : #define bpf_negative_common(LEN) \ save %sp, -SAVE_SZ, %sp; \ mov %i0, %o0; \ mov r_OFF, %o1; \ SIGN_EXTEND(%o1); \ call bpf_internal_load_pointer_neg_helper; \ mov (LEN), %o2; \ mov %o0, r_TMP; \ cmp %o0, 0; \ BE_PTR(bpf_error); \ restore; Thanks, Daniel ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH bpf v2 1/2] bpf: support SKF_NET_OFF and SKF_LL_OFF on skb frags 2025-04-04 16:11 ` Daniel Borkmann @ 2025-04-04 16:33 ` Willem de Bruijn 2025-04-04 17:56 ` Maciej Żenczykowski 0 siblings, 1 reply; 9+ messages in thread From: Willem de Bruijn @ 2025-04-04 16:33 UTC (permalink / raw) To: Daniel Borkmann Cc: bpf, netdev, ast, john.fastabend, Willem de Bruijn, Matt Moeller, Maciej Żenczykowski On Fri, Apr 4, 2025 at 12:11 PM Daniel Borkmann <daniel@iogearbox.net> wrote: > > Hi Willem, > > On 4/4/25 4:23 PM, Willem de Bruijn wrote: > [...] > > v1->v2 > > - introduce bfp_skb_load_helper_convert_offset to avoid open coding > > --- > > include/linux/filter.h | 3 -- > > kernel/bpf/core.c | 21 ----------- > > net/core/filter.c | 80 +++++++++++++++++++++++------------------- > > 3 files changed, 44 insertions(+), 60 deletions(-) > > > > diff --git a/include/linux/filter.h b/include/linux/filter.h > > index f5cf4d35d83e..708ac7e0cd36 100644 > > --- a/include/linux/filter.h > > +++ b/include/linux/filter.h > > @@ -1496,9 +1496,6 @@ static inline u16 bpf_anc_helper(const struct sock_filter *ftest) > > } > > } > > > > -void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb, > > - int k, unsigned int size); > > - > > static inline int bpf_tell_extensions(void) > > { > > return SKF_AD_MAX; > > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c > > index ba6b6118cf50..0e836b5ac9a0 100644 > > --- a/kernel/bpf/core.c > > +++ b/kernel/bpf/core.c > > @@ -68,27 +68,6 @@ > > struct bpf_mem_alloc bpf_global_ma; > > bool bpf_global_ma_set; > > > > -/* No hurry in this branch > > - * > > - * Exported for the bpf jit load helper. > > - */ > > -void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb, int k, unsigned int size) > > -{ > > - u8 *ptr = NULL; > > - > > - if (k >= SKF_NET_OFF) { > > - ptr = skb_network_header(skb) + k - SKF_NET_OFF; > > - } else if (k >= SKF_LL_OFF) { > > - if (unlikely(!skb_mac_header_was_set(skb))) > > - return NULL; > > - ptr = skb_mac_header(skb) + k - SKF_LL_OFF; > > - } > > - if (ptr >= skb->head && ptr + size <= skb_tail_pointer(skb)) > > - return ptr; > > - > > - return NULL; > > -} > > Wouldn't this break sparc 32bit JIT which still calls into this? > > arch/sparc/net/bpf_jit_asm_32.S : > > #define bpf_negative_common(LEN) \ > save %sp, -SAVE_SZ, %sp; \ > mov %i0, %o0; \ > mov r_OFF, %o1; \ > SIGN_EXTEND(%o1); \ > call bpf_internal_load_pointer_neg_helper; \ > mov (LEN), %o2; \ > mov %o0, r_TMP; \ > cmp %o0, 0; \ > BE_PTR(bpf_error); \ > restore; Argh, good catch. Thanks Daniel. I'll drop the removal of bpf_internal_load_pointer_neg_helper from the patch. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH bpf v2 1/2] bpf: support SKF_NET_OFF and SKF_LL_OFF on skb frags 2025-04-04 16:33 ` Willem de Bruijn @ 2025-04-04 17:56 ` Maciej Żenczykowski 2025-04-07 9:00 ` Daniel Borkmann 0 siblings, 1 reply; 9+ messages in thread From: Maciej Żenczykowski @ 2025-04-04 17:56 UTC (permalink / raw) To: Willem de Bruijn Cc: Daniel Borkmann, bpf, netdev, ast, john.fastabend, Willem de Bruijn, Matt Moeller On Fri, Apr 4, 2025 at 9:34 AM Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: > > On Fri, Apr 4, 2025 at 12:11 PM Daniel Borkmann <daniel@iogearbox.net> wrote: > > > > Hi Willem, > > > > On 4/4/25 4:23 PM, Willem de Bruijn wrote: > > [...] > > > v1->v2 > > > - introduce bfp_skb_load_helper_convert_offset to avoid open coding > > > --- > > > include/linux/filter.h | 3 -- > > > kernel/bpf/core.c | 21 ----------- > > > net/core/filter.c | 80 +++++++++++++++++++++++------------------- > > > 3 files changed, 44 insertions(+), 60 deletions(-) > > > > > > diff --git a/include/linux/filter.h b/include/linux/filter.h > > > index f5cf4d35d83e..708ac7e0cd36 100644 > > > --- a/include/linux/filter.h > > > +++ b/include/linux/filter.h > > > @@ -1496,9 +1496,6 @@ static inline u16 bpf_anc_helper(const struct sock_filter *ftest) > > > } > > > } > > > > > > -void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb, > > > - int k, unsigned int size); > > > - > > > static inline int bpf_tell_extensions(void) > > > { > > > return SKF_AD_MAX; > > > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c > > > index ba6b6118cf50..0e836b5ac9a0 100644 > > > --- a/kernel/bpf/core.c > > > +++ b/kernel/bpf/core.c > > > @@ -68,27 +68,6 @@ > > > struct bpf_mem_alloc bpf_global_ma; > > > bool bpf_global_ma_set; > > > > > > -/* No hurry in this branch > > > - * > > > - * Exported for the bpf jit load helper. > > > - */ > > > -void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb, int k, unsigned int size) > > > -{ > > > - u8 *ptr = NULL; > > > - > > > - if (k >= SKF_NET_OFF) { > > > - ptr = skb_network_header(skb) + k - SKF_NET_OFF; > > > - } else if (k >= SKF_LL_OFF) { > > > - if (unlikely(!skb_mac_header_was_set(skb))) > > > - return NULL; > > > - ptr = skb_mac_header(skb) + k - SKF_LL_OFF; > > > - } > > > - if (ptr >= skb->head && ptr + size <= skb_tail_pointer(skb)) > > > - return ptr; > > > - > > > - return NULL; > > > -} > > > > Wouldn't this break sparc 32bit JIT which still calls into this? > > > > arch/sparc/net/bpf_jit_asm_32.S : > > > > #define bpf_negative_common(LEN) \ > > save %sp, -SAVE_SZ, %sp; \ > > mov %i0, %o0; \ > > mov r_OFF, %o1; \ > > SIGN_EXTEND(%o1); \ > > call bpf_internal_load_pointer_neg_helper; \ > > mov (LEN), %o2; \ > > mov %o0, r_TMP; \ > > cmp %o0, 0; \ > > BE_PTR(bpf_error); \ > > restore; > > Argh, good catch. Thanks Daniel. > > I'll drop the removal of bpf_internal_load_pointer_neg_helper from the patch. add a 'deprecated only used by sparc32 comment' hopefully someone that knows sparc32 assembly can fix it -- Maciej Żenczykowski, Kernel Networking Developer @ Google ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH bpf v2 1/2] bpf: support SKF_NET_OFF and SKF_LL_OFF on skb frags 2025-04-04 17:56 ` Maciej Żenczykowski @ 2025-04-07 9:00 ` Daniel Borkmann 2025-04-07 14:47 ` Willem de Bruijn 0 siblings, 1 reply; 9+ messages in thread From: Daniel Borkmann @ 2025-04-07 9:00 UTC (permalink / raw) To: Maciej Żenczykowski, Willem de Bruijn Cc: bpf, netdev, ast, john.fastabend, Willem de Bruijn, Matt Moeller On 4/4/25 7:56 PM, Maciej Żenczykowski wrote: > On Fri, Apr 4, 2025 at 9:34 AM Willem de Bruijn > <willemdebruijn.kernel@gmail.com> wrote: >> On Fri, Apr 4, 2025 at 12:11 PM Daniel Borkmann <daniel@iogearbox.net> wrote: >>> >>> Hi Willem, >>> >>> On 4/4/25 4:23 PM, Willem de Bruijn wrote: >>> [...] >>>> v1->v2 >>>> - introduce bfp_skb_load_helper_convert_offset to avoid open coding >>>> --- >>>> include/linux/filter.h | 3 -- >>>> kernel/bpf/core.c | 21 ----------- >>>> net/core/filter.c | 80 +++++++++++++++++++++++------------------- >>>> 3 files changed, 44 insertions(+), 60 deletions(-) >>>> >>>> diff --git a/include/linux/filter.h b/include/linux/filter.h >>>> index f5cf4d35d83e..708ac7e0cd36 100644 >>>> --- a/include/linux/filter.h >>>> +++ b/include/linux/filter.h >>>> @@ -1496,9 +1496,6 @@ static inline u16 bpf_anc_helper(const struct sock_filter *ftest) >>>> } >>>> } >>>> >>>> -void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb, >>>> - int k, unsigned int size); >>>> - >>>> static inline int bpf_tell_extensions(void) >>>> { >>>> return SKF_AD_MAX; >>>> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c >>>> index ba6b6118cf50..0e836b5ac9a0 100644 >>>> --- a/kernel/bpf/core.c >>>> +++ b/kernel/bpf/core.c >>>> @@ -68,27 +68,6 @@ >>>> struct bpf_mem_alloc bpf_global_ma; >>>> bool bpf_global_ma_set; >>>> >>>> -/* No hurry in this branch >>>> - * >>>> - * Exported for the bpf jit load helper. >>>> - */ >>>> -void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb, int k, unsigned int size) >>>> -{ >>>> - u8 *ptr = NULL; >>>> - >>>> - if (k >= SKF_NET_OFF) { >>>> - ptr = skb_network_header(skb) + k - SKF_NET_OFF; >>>> - } else if (k >= SKF_LL_OFF) { >>>> - if (unlikely(!skb_mac_header_was_set(skb))) >>>> - return NULL; >>>> - ptr = skb_mac_header(skb) + k - SKF_LL_OFF; >>>> - } >>>> - if (ptr >= skb->head && ptr + size <= skb_tail_pointer(skb)) >>>> - return ptr; >>>> - >>>> - return NULL; >>>> -} >>> >>> Wouldn't this break sparc 32bit JIT which still calls into this? >>> >>> arch/sparc/net/bpf_jit_asm_32.S : >>> >>> #define bpf_negative_common(LEN) \ >>> save %sp, -SAVE_SZ, %sp; \ >>> mov %i0, %o0; \ >>> mov r_OFF, %o1; \ >>> SIGN_EXTEND(%o1); \ >>> call bpf_internal_load_pointer_neg_helper; \ >>> mov (LEN), %o2; \ >>> mov %o0, r_TMP; \ >>> cmp %o0, 0; \ >>> BE_PTR(bpf_error); \ >>> restore; >> >> Argh, good catch. Thanks Daniel. >> >> I'll drop the removal of bpf_internal_load_pointer_neg_helper from the patch. > > add a 'deprecated only used by sparc32 comment' > > hopefully someone that knows sparc32 assembly can fix it Alternatively, the bpf_internal_load_pointer_neg_helper() could be moved entirely over into arch/sparc/net/ so that others won't be tempted to reuse. Cheers, Daniel ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH bpf v2 1/2] bpf: support SKF_NET_OFF and SKF_LL_OFF on skb frags 2025-04-07 9:00 ` Daniel Borkmann @ 2025-04-07 14:47 ` Willem de Bruijn 0 siblings, 0 replies; 9+ messages in thread From: Willem de Bruijn @ 2025-04-07 14:47 UTC (permalink / raw) To: Daniel Borkmann, Maciej Żenczykowski, Willem de Bruijn Cc: bpf, netdev, ast, john.fastabend, Willem de Bruijn, Matt Moeller Daniel Borkmann wrote: > On 4/4/25 7:56 PM, Maciej Żenczykowski wrote: > > On Fri, Apr 4, 2025 at 9:34 AM Willem de Bruijn > > <willemdebruijn.kernel@gmail.com> wrote: > >> On Fri, Apr 4, 2025 at 12:11 PM Daniel Borkmann <daniel@iogearbox.net> wrote: > >>> > >>> Hi Willem, > >>> > >>> On 4/4/25 4:23 PM, Willem de Bruijn wrote: > >>> [...] > >>>> v1->v2 > >>>> - introduce bfp_skb_load_helper_convert_offset to avoid open coding > >>>> --- > >>>> include/linux/filter.h | 3 -- > >>>> kernel/bpf/core.c | 21 ----------- > >>>> net/core/filter.c | 80 +++++++++++++++++++++++------------------- > >>>> 3 files changed, 44 insertions(+), 60 deletions(-) > >>>> > >>>> diff --git a/include/linux/filter.h b/include/linux/filter.h > >>>> index f5cf4d35d83e..708ac7e0cd36 100644 > >>>> --- a/include/linux/filter.h > >>>> +++ b/include/linux/filter.h > >>>> @@ -1496,9 +1496,6 @@ static inline u16 bpf_anc_helper(const struct sock_filter *ftest) > >>>> } > >>>> } > >>>> > >>>> -void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb, > >>>> - int k, unsigned int size); > >>>> - > >>>> static inline int bpf_tell_extensions(void) > >>>> { > >>>> return SKF_AD_MAX; > >>>> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c > >>>> index ba6b6118cf50..0e836b5ac9a0 100644 > >>>> --- a/kernel/bpf/core.c > >>>> +++ b/kernel/bpf/core.c > >>>> @@ -68,27 +68,6 @@ > >>>> struct bpf_mem_alloc bpf_global_ma; > >>>> bool bpf_global_ma_set; > >>>> > >>>> -/* No hurry in this branch > >>>> - * > >>>> - * Exported for the bpf jit load helper. > >>>> - */ > >>>> -void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb, int k, unsigned int size) > >>>> -{ > >>>> - u8 *ptr = NULL; > >>>> - > >>>> - if (k >= SKF_NET_OFF) { > >>>> - ptr = skb_network_header(skb) + k - SKF_NET_OFF; > >>>> - } else if (k >= SKF_LL_OFF) { > >>>> - if (unlikely(!skb_mac_header_was_set(skb))) > >>>> - return NULL; > >>>> - ptr = skb_mac_header(skb) + k - SKF_LL_OFF; > >>>> - } > >>>> - if (ptr >= skb->head && ptr + size <= skb_tail_pointer(skb)) > >>>> - return ptr; > >>>> - > >>>> - return NULL; > >>>> -} > >>> > >>> Wouldn't this break sparc 32bit JIT which still calls into this? > >>> > >>> arch/sparc/net/bpf_jit_asm_32.S : > >>> > >>> #define bpf_negative_common(LEN) \ > >>> save %sp, -SAVE_SZ, %sp; \ > >>> mov %i0, %o0; \ > >>> mov r_OFF, %o1; \ > >>> SIGN_EXTEND(%o1); \ > >>> call bpf_internal_load_pointer_neg_helper; \ > >>> mov (LEN), %o2; \ > >>> mov %o0, r_TMP; \ > >>> cmp %o0, 0; \ > >>> BE_PTR(bpf_error); \ > >>> restore; > >> > >> Argh, good catch. Thanks Daniel. > >> > >> I'll drop the removal of bpf_internal_load_pointer_neg_helper from the patch. > > > > add a 'deprecated only used by sparc32 comment' > > > > hopefully someone that knows sparc32 assembly can fix it > > Alternatively, the bpf_internal_load_pointer_neg_helper() could be moved entirely > over into arch/sparc/net/ so that others won't be tempted to reuse. I'd prefer to keep it as is. I took a stab, but my Debian has no sparc32 gcc cross compiler anymore, and I was unable to cross compile with clang either. ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH bpf v2 2/2] selftests/net: test sk_filter support for SKF_NET_OFF on frags 2025-04-04 14:23 [PATCH bpf v2 0/2] support SKF_NET_OFF and SKF_LL_OFF on skb frags Willem de Bruijn 2025-04-04 14:23 ` [PATCH bpf v2 1/2] bpf: " Willem de Bruijn @ 2025-04-04 14:23 ` Willem de Bruijn 1 sibling, 0 replies; 9+ messages in thread From: Willem de Bruijn @ 2025-04-04 14:23 UTC (permalink / raw) To: bpf Cc: netdev, ast, daniel, john.fastabend, Willem de Bruijn, Stanislav Fomichev From: Willem de Bruijn <willemb@google.com> Verify that a classic BPF linux socket filter correctly matches packet contents. Including when accessing contents in an skb_frag. 1. Open a SOCK_RAW socket with a classic BPF filter on UDP dport 8000. 2. Open a tap device with IFF_NAPI_FRAGS to inject skbs with frags. 3. Send a packet for which the UDP header is in frag[0]. 4. Receive this packet to demonstrate that the socket accepted it. Acked-by: Stanislav Fomichev <sdf@fomichev.me> Signed-off-by: Willem de Bruijn <willemb@google.com> --- v1->v2 - add comment why early demux must be disabled --- tools/testing/selftests/net/.gitignore | 1 + tools/testing/selftests/net/Makefile | 2 + tools/testing/selftests/net/skf_net_off.c | 244 +++++++++++++++++++++ tools/testing/selftests/net/skf_net_off.sh | 30 +++ 4 files changed, 277 insertions(+) create mode 100644 tools/testing/selftests/net/skf_net_off.c create mode 100755 tools/testing/selftests/net/skf_net_off.sh diff --git a/tools/testing/selftests/net/.gitignore b/tools/testing/selftests/net/.gitignore index 679542f565a4..532bb732bc6d 100644 --- a/tools/testing/selftests/net/.gitignore +++ b/tools/testing/selftests/net/.gitignore @@ -39,6 +39,7 @@ scm_rights sk_bind_sendto_listen sk_connect_zero_addr sk_so_peek_off +skf_net_off socket so_incoming_cpu so_netns_cookie diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile index 6d718b478ed8..124078b56fa4 100644 --- a/tools/testing/selftests/net/Makefile +++ b/tools/testing/selftests/net/Makefile @@ -106,6 +106,8 @@ TEST_PROGS += ipv6_route_update_soft_lockup.sh TEST_PROGS += busy_poll_test.sh TEST_GEN_PROGS += proc_net_pktgen TEST_PROGS += lwt_dst_cache_ref_loop.sh +TEST_PROGS += skf_net_off.sh +TEST_GEN_FILES += skf_net_off # YNL files, must be before "include ..lib.mk" YNL_GEN_FILES := busy_poller netlink-dumps diff --git a/tools/testing/selftests/net/skf_net_off.c b/tools/testing/selftests/net/skf_net_off.c new file mode 100644 index 000000000000..1fdf61d6cd7f --- /dev/null +++ b/tools/testing/selftests/net/skf_net_off.c @@ -0,0 +1,244 @@ +// SPDX-License-Identifier: GPL-2.0 + +/* Open a tun device. + * + * [modifications: use IFF_NAPI_FRAGS, add sk filter] + * + * Expects the device to have been configured previously, e.g.: + * sudo ip tuntap add name tap1 mode tap + * sudo ip link set tap1 up + * sudo ip link set dev tap1 addr 02:00:00:00:00:01 + * sudo ip -6 addr add fdab::1 peer fdab::2 dev tap1 nodad + * + * And to avoid premature pskb_may_pull: + * + * sudo ethtool -K tap1 gro off + * sudo bash -c 'echo 0 > /proc/sys/net/ipv4/ip_early_demux' + */ + +#define _GNU_SOURCE + +#include <arpa/inet.h> +#include <errno.h> +#include <error.h> +#include <fcntl.h> +#include <getopt.h> +#include <linux/filter.h> +#include <linux/if.h> +#include <linux/if_packet.h> +#include <linux/if_tun.h> +#include <linux/ipv6.h> +#include <netinet/if_ether.h> +#include <netinet/in.h> +#include <netinet/ip.h> +#include <netinet/ip6.h> +#include <netinet/udp.h> +#include <poll.h> +#include <signal.h> +#include <stdbool.h> +#include <stddef.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <sys/ioctl.h> +#include <sys/socket.h> +#include <sys/poll.h> +#include <sys/types.h> +#include <sys/uio.h> +#include <unistd.h> + +static bool cfg_do_filter; +static bool cfg_do_frags; +static int cfg_dst_port = 8000; +static char *cfg_ifname; + +static int tun_open(const char *tun_name) +{ + struct ifreq ifr = {0}; + int fd, ret; + + fd = open("/dev/net/tun", O_RDWR); + if (fd == -1) + error(1, errno, "open /dev/net/tun"); + + ifr.ifr_flags = IFF_TAP; + if (cfg_do_frags) + ifr.ifr_flags |= IFF_NAPI | IFF_NAPI_FRAGS; + + strncpy(ifr.ifr_name, tun_name, IFNAMSIZ - 1); + + ret = ioctl(fd, TUNSETIFF, &ifr); + if (ret) + error(1, ret, "ioctl TUNSETIFF"); + + return fd; +} + +static void sk_set_filter(int fd) +{ + const int offset_proto = offsetof(struct ip6_hdr, ip6_nxt); + const int offset_dport = sizeof(struct ip6_hdr) + offsetof(struct udphdr, dest); + + /* Filter UDP packets with destination port cfg_dst_port */ + struct sock_filter filter_code[] = { + BPF_STMT(BPF_LD + BPF_B + BPF_ABS, SKF_AD_OFF + SKF_AD_PKTTYPE), + BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, PACKET_HOST, 0, 4), + BPF_STMT(BPF_LD + BPF_B + BPF_ABS, SKF_NET_OFF + offset_proto), + BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, IPPROTO_UDP, 0, 2), + BPF_STMT(BPF_LD + BPF_H + BPF_ABS, SKF_NET_OFF + offset_dport), + BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, cfg_dst_port, 1, 0), + BPF_STMT(BPF_RET + BPF_K, 0), + BPF_STMT(BPF_RET + BPF_K, 0xFFFF), + }; + + struct sock_fprog filter = { + sizeof(filter_code) / sizeof(filter_code[0]), + filter_code, + }; + + if (setsockopt(fd, SOL_SOCKET, SO_ATTACH_FILTER, &filter, sizeof(filter))) + error(1, errno, "setsockopt attach filter"); +} + +static int raw_open(void) +{ + int fd; + + fd = socket(PF_INET6, SOCK_RAW, IPPROTO_UDP); + if (fd == -1) + error(1, errno, "socket raw (udp)"); + + if (cfg_do_filter) + sk_set_filter(fd); + + return fd; +} + +static void tun_write(int fd) +{ + const char eth_src[] = { 0x02, 0x00, 0x00, 0x00, 0x00, 0x02 }; + const char eth_dst[] = { 0x02, 0x00, 0x00, 0x00, 0x00, 0x01 }; + struct tun_pi pi = {0}; + struct ipv6hdr ip6h = {0}; + struct udphdr uh = {0}; + struct ethhdr eth = {0}; + uint32_t payload; + struct iovec iov[5]; + int ret; + + pi.proto = htons(ETH_P_IPV6); + + memcpy(eth.h_source, eth_src, sizeof(eth_src)); + memcpy(eth.h_dest, eth_dst, sizeof(eth_dst)); + eth.h_proto = htons(ETH_P_IPV6); + + ip6h.version = 6; + ip6h.payload_len = htons(sizeof(uh) + sizeof(uint32_t)); + ip6h.nexthdr = IPPROTO_UDP; + ip6h.hop_limit = 8; + if (inet_pton(AF_INET6, "fdab::2", &ip6h.saddr) != 1) + error(1, errno, "inet_pton src"); + if (inet_pton(AF_INET6, "fdab::1", &ip6h.daddr) != 1) + error(1, errno, "inet_pton src"); + + uh.source = htons(8000); + uh.dest = htons(cfg_dst_port); + uh.len = ip6h.payload_len; + uh.check = 0; + + payload = htonl(0xABABABAB); /* Covered in IPv6 length */ + + iov[0].iov_base = π + iov[0].iov_len = sizeof(pi); + iov[1].iov_base = ð + iov[1].iov_len = sizeof(eth); + iov[2].iov_base = &ip6h; + iov[2].iov_len = sizeof(ip6h); + iov[3].iov_base = &uh; + iov[3].iov_len = sizeof(uh); + iov[4].iov_base = &payload; + iov[4].iov_len = sizeof(payload); + + ret = writev(fd, iov, sizeof(iov) / sizeof(iov[0])); + if (ret <= 0) + error(1, errno, "writev"); +} + +static void raw_read(int fd) +{ + struct timeval tv = { .tv_usec = 100 * 1000 }; + struct msghdr msg = {0}; + struct iovec iov[2]; + struct udphdr uh; + uint32_t payload[2]; + int ret; + + if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv))) + error(1, errno, "setsockopt rcvtimeo udp"); + + iov[0].iov_base = &uh; + iov[0].iov_len = sizeof(uh); + + iov[1].iov_base = payload; + iov[1].iov_len = sizeof(payload); + + msg.msg_iov = iov; + msg.msg_iovlen = sizeof(iov) / sizeof(iov[0]); + + ret = recvmsg(fd, &msg, 0); + if (ret <= 0) + error(1, errno, "read raw"); + if (ret != sizeof(uh) + sizeof(payload[0])) + error(1, errno, "read raw: len=%d\n", ret); + + fprintf(stderr, "raw recv: 0x%x\n", payload[0]); +} + +static void parse_opts(int argc, char **argv) +{ + int c; + + while ((c = getopt(argc, argv, "fFi:")) != -1) { + switch (c) { + case 'f': + cfg_do_filter = true; + printf("bpf filter enabled\n"); + break; + case 'F': + cfg_do_frags = true; + printf("napi frags mode enabled\n"); + break; + case 'i': + cfg_ifname = optarg; + break; + default: + error(1, 0, "unknown option %c", optopt); + break; + } + } + + if (!cfg_ifname) + error(1, 0, "must specify tap interface name (-i)"); +} + +int main(int argc, char **argv) +{ + int fdt, fdr; + + parse_opts(argc, argv); + + fdr = raw_open(); + fdt = tun_open(cfg_ifname); + + tun_write(fdt); + raw_read(fdr); + + if (close(fdt)) + error(1, errno, "close tun"); + if (close(fdr)) + error(1, errno, "close udp"); + + fprintf(stderr, "OK\n"); + return 0; +} + diff --git a/tools/testing/selftests/net/skf_net_off.sh b/tools/testing/selftests/net/skf_net_off.sh new file mode 100755 index 000000000000..5da5066fb465 --- /dev/null +++ b/tools/testing/selftests/net/skf_net_off.sh @@ -0,0 +1,30 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 + +readonly NS="ns-$(mktemp -u XXXXXX)" + +cleanup() { + ip netns del $NS +} + +ip netns add $NS +trap cleanup EXIT + +ip -netns $NS link set lo up +ip -netns $NS tuntap add name tap1 mode tap +ip -netns $NS link set tap1 up +ip -netns $NS link set dev tap1 addr 02:00:00:00:00:01 +ip -netns $NS -6 addr add fdab::1 peer fdab::2 dev tap1 nodad +ip netns exec $NS ethtool -K tap1 gro off + +# disable early demux, else udp_v6_early_demux pulls udp header into linear +ip netns exec $NS sysctl -w net.ipv4.ip_early_demux=0 + +echo "no filter" +ip netns exec $NS ./skf_net_off -i tap1 + +echo "filter, linear skb (-f)" +ip netns exec $NS ./skf_net_off -i tap1 -f + +echo "filter, fragmented skb (-f) (-F)" +ip netns exec $NS ./skf_net_off -i tap1 -f -F -- 2.49.0.504.g3bcea36a83-goog ^ permalink raw reply related [flat|nested] 9+ messages in thread
end of thread, other threads:[~2025-04-07 14:47 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-04-04 14:23 [PATCH bpf v2 0/2] support SKF_NET_OFF and SKF_LL_OFF on skb frags Willem de Bruijn 2025-04-04 14:23 ` [PATCH bpf v2 1/2] bpf: " Willem de Bruijn 2025-04-04 15:08 ` Stanislav Fomichev 2025-04-04 16:11 ` Daniel Borkmann 2025-04-04 16:33 ` Willem de Bruijn 2025-04-04 17:56 ` Maciej Żenczykowski 2025-04-07 9:00 ` Daniel Borkmann 2025-04-07 14:47 ` Willem de Bruijn 2025-04-04 14:23 ` [PATCH bpf v2 2/2] selftests/net: test sk_filter support for SKF_NET_OFF on frags Willem de Bruijn
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.