Re: [PATCH bpf 1/2] bpf: support SKF_NET_OFF and SKF_LL_OFF on skb frags

public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed

From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
To: Stanislav Fomichev <stfomichev@gmail.com>,
	 Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: bpf@vger.kernel.org, netdev@vger.kernel.org, ast@kernel.org,
	daniel@iogearbox.net, john.fastabend@gmail.com,
	"Willem de Bruijn" <willemb@google.com>,
	"Matt Moeller" <moeller.matt@gmail.com>,
	"Maciej Żenczykowski" <maze@google.com>
Subject: Re: [PATCH bpf 1/2] bpf: support SKF_NET_OFF and SKF_LL_OFF on skb frags
Date: Thu, 03 Apr 2025 13:27:29 -0400	[thread overview]
Message-ID: <67eec501d0d58_14b7b229490@willemb.c.googlers.com.notmuch> (raw)
In-Reply-To: <Z-7DiZWkOQ_n5aXw@mini-arch>

Stanislav Fomichev wrote:
> On 04/03, Willem de Bruijn wrote:
> > From: Willem de Bruijn <willemb@google.com>
> > 
> > Classic BPF socket filters with SKB_NET_OFF and SKB_LL_OFF fail to
> > read when these offsets extend into frags.
> > 
> > This has been observed with iwlwifi and reproduced with tun with
> > IFF_NAPI_FRAGS. The below straightforward socket filter on UDP port,
> > applied to a RAW socket, will silently miss matching packets.
> > 
> >     const int offset_proto = offsetof(struct ip6_hdr, ip6_nxt);
> >     const int offset_dport = sizeof(struct ip6_hdr) + offsetof(struct udphdr, dest);
> >     struct sock_filter filter_code[] = {
> >             BPF_STMT(BPF_LD  + BPF_B   + BPF_ABS, SKF_AD_OFF + SKF_AD_PKTTYPE),
> >             BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, PACKET_HOST, 0, 4),
> >             BPF_STMT(BPF_LD  + BPF_B   + BPF_ABS, SKF_NET_OFF + offset_proto),
> >             BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, IPPROTO_UDP, 0, 2),
> >             BPF_STMT(BPF_LD  + BPF_H   + BPF_ABS, SKF_NET_OFF + offset_dport),
> > 
> > This is unexpected behavior. Socket filter programs should be
> > consistent regardless of environment. Silent misses are
> > particularly concerning as hard to detect.
> > 
> > Use skb_copy_bits for offsets outside linear, same as done for
> > non-SKF_(LL|NET) offsets.
> > 
> > Offset is always positive after subtracting the reference threshold
> > SKB_(LL|NET)_OFF, so is always >= skb_(mac|network)_offset. The sum of
> > the two is an offset against skb->data, and may be negative, but it
> > cannot point before skb->head, as skb_(mac|network)_offset would too.
> > 
> > This appears to go back to when frag support was introduced to
> > sk_run_filter in linux-2.4.4, before the introduction of git.
> > 
> > The amount of code change and 8/16/32 bit duplication are unfortunate.
> > But any attempt I made to be smarter saved very few LoC while
> > complicating the code.
> > 
> > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> > Link: https://lore.kernel.org/netdev/20250122200402.3461154-1-maze@google.com/
> > Link: https://elixir.bootlin.com/linux/2.4.4/source/net/core/filter.c#L244
> > Reported-by: Matt Moeller <moeller.matt@gmail.com>
> > Co-developed-by: Maciej Żenczykowski <maze@google.com>
> > Signed-off-by: Maciej Żenczykowski <maze@google.com>
> > Signed-off-by: Willem de Bruijn <willemb@google.com>
> > ---
> >  include/linux/filter.h |  3 --
> >  kernel/bpf/core.c      | 21 ------------
> >  net/core/filter.c      | 75 +++++++++++++++++++++++-------------------
> >  3 files changed, 42 insertions(+), 57 deletions(-)
> > 
> > diff --git a/include/linux/filter.h b/include/linux/filter.h
> > index f5cf4d35d83e..708ac7e0cd36 100644
> > --- a/include/linux/filter.h
> > +++ b/include/linux/filter.h
> > @@ -1496,9 +1496,6 @@ static inline u16 bpf_anc_helper(const struct sock_filter *ftest)
> >  	}
> >  }
> >  
> > -void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb,
> > -					   int k, unsigned int size);
> > -
> >  static inline int bpf_tell_extensions(void)
> >  {
> >  	return SKF_AD_MAX;
> > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > index ba6b6118cf50..0e836b5ac9a0 100644
> > --- a/kernel/bpf/core.c
> > +++ b/kernel/bpf/core.c
> > @@ -68,27 +68,6 @@
> >  struct bpf_mem_alloc bpf_global_ma;
> >  bool bpf_global_ma_set;
> >  
> > -/* No hurry in this branch
> > - *
> > - * Exported for the bpf jit load helper.
> > - */
> > -void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb, int k, unsigned int size)
> > -{
> > -	u8 *ptr = NULL;
> > -
> > -	if (k >= SKF_NET_OFF) {
> > -		ptr = skb_network_header(skb) + k - SKF_NET_OFF;
> > -	} else if (k >= SKF_LL_OFF) {
> > -		if (unlikely(!skb_mac_header_was_set(skb)))
> > -			return NULL;
> > -		ptr = skb_mac_header(skb) + k - SKF_LL_OFF;
> > -	}
> > -	if (ptr >= skb->head && ptr + size <= skb_tail_pointer(skb))
> > -		return ptr;
> > -
> > -	return NULL;
> > -}
> > -
> >  /* tell bpf programs that include vmlinux.h kernel's PAGE_SIZE */
> >  enum page_size_enum {
> >  	__PAGE_SIZE = PAGE_SIZE
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index bc6828761a47..b232b70dd10d 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -221,21 +221,24 @@ BPF_CALL_3(bpf_skb_get_nlattr_nest, struct sk_buff *, skb, u32, a, u32, x)
> >  BPF_CALL_4(bpf_skb_load_helper_8, const struct sk_buff *, skb, const void *,
> >  	   data, int, headlen, int, offset)
> >  {
> > -	u8 tmp, *ptr;
> > +	u8 tmp;
> >  	const int len = sizeof(tmp);
> >  
> > -	if (offset >= 0) {
> > -		if (headlen - offset >= len)
> > -			return *(u8 *)(data + offset);
> > -		if (!skb_copy_bits(skb, offset, &tmp, sizeof(tmp)))
> > -			return tmp;
> > -	} else {
> > -		ptr = bpf_internal_load_pointer_neg_helper(skb, offset, len);
> > -		if (likely(ptr))
> > -			return *(u8 *)ptr;
> 
> [..]
> 
> > +	if (offset < 0) {
> > +		if (offset >= SKF_NET_OFF)
> > +			offset += skb_network_offset(skb) - SKF_NET_OFF;
> > +		else if (offset >= SKF_LL_OFF && skb_mac_header_was_set(skb))
> > +			offset += skb_mac_offset(skb) - SKF_LL_OFF;
> > +		else
> > +			return -EFAULT;
> >  	}
> 
> nit: we now repeat the same logic three times, maybe still worth it to put it
> into a helper? bpf_resolve_classic_offset or something. 

I definitely tried this in various ways. But since the core logic is
only four lines and there is an early return on error, no helper
really simplifies anything. It just adds a layer of indirection and
more code in the end.

next prev parent reply	other threads:[~2025-04-03 17:27 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-03 14:07 [PATCH bpf 0/2] support SKF_NET_OFF and SKF_LL_OFF on skb frags Willem de Bruijn
2025-04-03 14:07 ` [PATCH bpf 1/2] bpf: " Willem de Bruijn
2025-04-03 14:18   ` Willem de Bruijn
2025-04-03 17:21   ` Stanislav Fomichev
2025-04-03 17:27     ` Willem de Bruijn [this message]
2025-04-03 17:35       ` Stanislav Fomichev
2025-04-03 18:30         ` Maciej Żenczykowski
2025-04-03 18:54           ` Willem de Bruijn
2025-04-03 14:07 ` [PATCH bpf 2/2] selftests/net: test sk_filter support for SKF_NET_OFF on frags Willem de Bruijn
2025-04-03 17:29   ` Stanislav Fomichev
2025-04-03 18:56     ` Willem de Bruijn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=67eec501d0d58_14b7b229490@willemb.c.googlers.com.notmuch \
    --to=willemdebruijn.kernel@gmail.com \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=john.fastabend@gmail.com \
    --cc=maze@google.com \
    --cc=moeller.matt@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=stfomichev@gmail.com \
    --cc=willemb@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox