netdev.vger.kernel.org archive mirror
* Re: [BUG?] bpf_skb_net_shrink does not unset encapsulation flag
From: Stanislav Fomichev @ 2025-09-12 20:29 UTC (permalink / raw)
  To: Tobias Böhm
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, bpf,
	Marcus Wichelmann, netdev, willemdebruijn.kernel

On 09/10, Tobias Böhm wrote:
> Hi,
> 
> when decapsulating VXLAN packets with bpf_skb_adjust_room and redirecting to
> a tap device I observed unexpected segmentation.
> 
> In my setup, a sched_cls program is attached to the ingress path of a
> physical NIC with GRO enabled. Plain traffic is redirected directly, while
> VXLAN traffic is first decapsulated and then redirected. Decapsulation is
> done with bpf_skb_adjust_room and BPF_F_ADJ_ROOM_DECAP_L3_IPV4.
> 
> For both kinds of traffic, GRO on the physical NIC works as expected and
> produces merged packets.
> 
> Large non-decapsulated packets are transmitted on the tap interface without
> being segmented, as expected. Surprisingly, however, decapsulated packets
> are segmented again before transmission.
> 
> Comparing the call chains, I observed that netif_skb_features returns
> different values for the two kinds of traffic.
> 
> The tap devices have the following features set:
> 
>     dev->features        =   0x1558c9
>     dev->hw_enc_features = 0x10000001
> 
> For the non-decapsulated traffic, netif_skb_features returns 0x1558c9, but
> for the decapsulated traffic it returns 0x1. This is the same value as
> "dev->features & dev->hw_enc_features".
> 
> netif_skb_features applies exactly this mask when skb->encapsulation is
> set. Inspecting the skb in both cases showed that, after decapsulation, the
> skb->encapsulation flag was indeed still set.
> 
> Is there a reason why bpf_skb_net_shrink does not clear the
> skb->encapsulation flag when BPF_F_ADJ_ROOM_DECAP_* flags are present?
> Since bpf_skb_net_grow sets skb->encapsulation when adding room for
> encapsulation, I would expect the opposite operation to clear it as well.

+ Willem and netdev for visibility.


* Re: [BUG?] bpf_skb_net_shrink does not unset encapsulation flag
From: Willem de Bruijn @ 2025-09-12 22:47 UTC (permalink / raw)
  To: Stanislav Fomichev, Tobias Böhm
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, bpf,
	Marcus Wichelmann, netdev, willemdebruijn.kernel,
	william.xuanziyang

Stanislav Fomichev wrote:
> On 09/10, Tobias Böhm wrote:
> > I wonder if there is a reason that the skb->encapsulation flag is not unset
> > in bpf_skb_net_shrink when BPF_F_ADJ_ROOM_DECAP_* flags are present? Since
> > skb->encapsulation is set in bpf_skb_net_grow when adding space for
> > encapsulation my expectation would be that the flag is also unset when doing
> > the opposite operation.
> 
> + Willem and netdev for visibility.

I think it simply has not been implemented yet.

The encap path is stricter. Besides setting skb->encapsulation, it
also initializes the inner_* header fields.

The decap path does not do this: it expects IPIP packets to arrive
from the network without the stack having detected them as such or
having set skb->encapsulation.

We must preserve that behavior. But we can additionally detect skbs
that have the encapsulation fields configured, and convert those.

The encap path also has explicit L4 flags (BPF_F_ADJ_ROOM_ENCAP_L4_UDP and
BPF_F_ADJ_ROOM_ENCAP_L4_GRE) to update the gso_type of GSO packets.
Presumably VXLAN decap needs the same?


