* Re: [BUG?] bpf_skb_net_shrink does not unset encapsulation flag
[not found] <4bfab93d-f1ce-4aa7-82fe-16972b47972c@hetzner-cloud.de>
@ 2025-09-12 20:29 ` Stanislav Fomichev
2025-09-12 22:47 ` Willem de Bruijn
0 siblings, 1 reply; 2+ messages in thread
From: Stanislav Fomichev @ 2025-09-12 20:29 UTC (permalink / raw)
To: Tobias Böhm
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, bpf,
Marcus Wichelmann, netdev, willemdebruijn.kernel
On 09/10, Tobias Böhm wrote:
> Hi,
>
> when decapsulating VXLAN packets with bpf_skb_adjust_room and redirecting to
> a tap device I observed unexpected segmentation.
>
> In my setup there is a sched_cls program attached at the ingress path of a
> physical NIC with GRO enabled. Packets are redirected either directly for
> plain traffic, or decapsulated beforehand in case of VXLAN. Decapsulation is
> done by bpf_skb_adjust_room with BPF_F_ADJ_ROOM_DECAP_L3_IPV4.
>
> For both kinds of traffic GRO on the physical NIC works as expected
> resulting in merged packets.
>
> Large non-decapsulated packets are transmitted directly on the tap interface
> as expected. But surprisingly, decapsulated packets are being segmented
> again before transmission.
>
> When analyzing and comparing the call chains I observed that
> netif_skb_features returns different values for the different kinds of
> traffic.
>
> The tap devices have the following features set:
>
> dev->features = 0x1558c9
> dev->hw_enc_features = 0x10000001
>
> For the non-decapsulated traffic netif_skb_features returns 0x1558c9 but for
> the decapsulated traffic it returns 0x1. This is the same value as the result of
> "dev->features & dev->hw_enc_features".
>
> In netif_skb_features this operation effectively happens in case
> skb->encapsulation is set. Inspecting the skb in both cases showed that in
> case of decapsulation the skb->encapsulation flag was indeed still set.
>
> I wonder if there is a reason that the skb->encapsulation flag is not unset
> in bpf_skb_net_shrink when BPF_F_ADJ_ROOM_DECAP_* flags are present? Since
> skb->encapsulation is set in bpf_skb_net_grow when adding space for
> encapsulation my expectation would be that the flag is also unset when doing
> the opposite operation.
+ Willem and netdev for visibility.
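
The masking described in the report can be reproduced with a small userspace model. This is only a sketch of the one step under discussion, not the kernel's netif_skb_features: the struct and function names (model_dev, model_skb, model_skb_features) are hypothetical stand-ins, and only the feature values are taken from the report.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the relevant fields of struct net_device
 * and struct sk_buff. These are NOT the kernel definitions. */
struct model_dev {
	uint64_t features;
	uint64_t hw_enc_features;
};

struct model_skb {
	unsigned int encapsulation;
};

/* Models only the step described in the report: when the skb is marked
 * as encapsulated, the device features are restricted to those also
 * present in hw_enc_features. */
static uint64_t model_skb_features(const struct model_dev *dev,
				   const struct model_skb *skb)
{
	uint64_t features = dev->features;

	if (skb->encapsulation)
		features &= dev->hw_enc_features;

	return features;
}
```

With features = 0x1558c9 and hw_enc_features = 0x10000001, the model returns 0x1558c9 when encapsulation is 0 and 0x1 when it is 1, matching the two values observed for plain and decapsulated traffic.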
* Re: [BUG?] bpf_skb_net_shrink does not unset encapsulation flag
2025-09-12 20:29 ` [BUG?] bpf_skb_net_shrink does not unset encapsulation flag Stanislav Fomichev
@ 2025-09-12 22:47 ` Willem de Bruijn
0 siblings, 0 replies; 2+ messages in thread
From: Willem de Bruijn @ 2025-09-12 22:47 UTC (permalink / raw)
To: Stanislav Fomichev, Tobias Böhm
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, bpf,
Marcus Wichelmann, netdev, willemdebruijn.kernel,
william.xuanziyang
Stanislav Fomichev wrote:
> On 09/10, Tobias Böhm wrote:
> > Hi,
> >
> > when decapsulating VXLAN packets with bpf_skb_adjust_room and redirecting to
> > a tap device I observed unexpected segmentation.
> >
> > In my setup there is a sched_cls program attached at the ingress path of a
> > physical NIC with GRO enabled. Packets are redirected either directly for
> > plain traffic, or decapsulated beforehand in case of VXLAN. Decapsulation is
> > done by bpf_skb_adjust_room with BPF_F_ADJ_ROOM_DECAP_L3_IPV4.
> >
> > For both kinds of traffic GRO on the physical NIC works as expected
> > resulting in merged packets.
> >
> > Large non-decapsulated packets are transmitted directly on the tap interface
> > as expected. But surprisingly, decapsulated packets are being segmented
> > again before transmission.
> >
> > When analyzing and comparing the call chains I observed that
> > netif_skb_features returns different values for the different kinds of
> > traffic.
> >
> > The tap devices have the following features set:
> >
> > dev->features = 0x1558c9
> > dev->hw_enc_features = 0x10000001
> >
> > For the non-decapsulated traffic netif_skb_features returns 0x1558c9 but for
> > the decapsulated traffic it returns 0x1. This is the same value as the result of
> > "dev->features & dev->hw_enc_features".
> >
> > In netif_skb_features this operation effectively happens in case
> > skb->encapsulation is set. Inspecting the skb in both cases showed that in
> > case of decapsulation the skb->encapsulation flag was indeed still set.
> >
> > I wonder if there is a reason that the skb->encapsulation flag is not unset
> > in bpf_skb_net_shrink when BPF_F_ADJ_ROOM_DECAP_* flags are present? Since
> > skb->encapsulation is set in bpf_skb_net_grow when adding space for
> > encapsulation my expectation would be that the flag is also unset when doing
> > the opposite operation.
>
> + Willem and netdev for visibility.

I think it just has not been implemented before.

The encap path is stricter. Besides setting skb->encapsulation, it
also initializes the inner_.. helpers.

The decap path does not do this. It expects IPIP packets to arrive
from the network, without the stack detecting them as such or
setting skb->encapsulation.

We must preserve that behavior. But we can additionally detect skbs
with encapsulation fields configured, and convert those.

The encap path also has explicit UDP_L4 and GRE flags to update GSO
packets. For VXLAN decap, we probably need the same?
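
One possible shape of the conversion suggested above, as a userspace sketch rather than a patch. Everything here is an assumption made for illustration: decap_skb, model_decap_fixup and the MODEL_GSO_* bit values are hypothetical and do not match the kernel's struct sk_buff or SKB_GSO_* definitions. Whether the tunnel GSO bits should be cleared on decap is exactly the open question raised above.

```c
#include <stdint.h>

/* Stand-in GSO type bits, chosen for this sketch only. */
#define MODEL_GSO_TCPV4      (1u << 0)
#define MODEL_GSO_GRE        (1u << 1)
#define MODEL_GSO_UDP_TUNNEL (1u << 2)

/* Hypothetical stand-in for the fields of interest in struct sk_buff. */
struct decap_skb {
	unsigned int encapsulation;
	uint32_t gso_type;
};

/* Hypothetical fixup applied after the outer headers were removed. */
static void model_decap_fixup(struct decap_skb *skb)
{
	/* Packets that were never marked as encapsulated (for example
	 * IPIP arriving from the network) are left untouched, which
	 * preserves the existing behavior. */
	if (!skb->encapsulation)
		return;

	/* Convert an skb that had encapsulation state configured. */
	skb->encapsulation = 0;

	/* Assumption: drop the tunnel GSO bits, mirroring the bits the
	 * encap side would set for GRE and UDP tunnels. */
	skb->gso_type &= ~(uint32_t)(MODEL_GSO_GRE | MODEL_GSO_UDP_TUNNEL);
}
```

In this model an skb with encapsulation set and gso_type of TCPV4 plus UDP_TUNNEL comes out with encapsulation cleared and only TCPV4 left, while an skb without the flag is returned unchanged.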