All of lore.kernel.org
 help / color / mirror / Atom feed
* RX metadata kfuncs cause kernel panic with XDP generic mode
@ 2025-01-23 16:02 Marcus Wichelmann
  2025-01-23 16:38 ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 7+ messages in thread
From: Marcus Wichelmann @ 2025-01-23 16:02 UTC (permalink / raw)
  To: bpf; +Cc: Stanislav Fomichev, Toke Høiland-Jørgensen

Hey there,

while taking a closer look at how the RX metadata kfuncs are implemented in the mlx5 and ice drivers,
I suspected a bug and, after testing, could in fact produce a NULL pointer dereference.

The mlx5 driver implements the RX metadata kfuncs like, for example, bpf_xdp_metadata_rx_vlan_tag by
casting the xdp_md pointer from the function argument to an mlx5e_xdp_buff pointer. This is needed to
get access to the packet metadata. See mlx5e_xdp_rx_vlan_tag for example. The ice driver works similarly.

This is fine, because normally these drivers always create a full mlx5e_xdp_buff struct when allocating
the xdp_buff struct. But when a device-bound XDP program is attached to the mlx5 netdevice in generic mode,
the xdp_buff is not allocated by the mlx5 driver but as a part of the do_xdp_generic implementation.

Now, when a packet comes in and the XDP program tries to call one of these kfuncs, the kfunc implementation
will try to dereference pointers inside the mlx5e_xdp_buff struct which is not fully allocated, leading to a
NULL pointer dereference.

There is probably a check missing somewhere that prevents the use of these kfuncs in the scope of
do_xdp_generic? Or may there be another way to implement the RX metadata kfuncs in the driver that does not
involve casting the xdp_buff pointer?

Here is how this can be reproduced:


eBPF program:

#include <bpf.h>

extern int bpf_xdp_metadata_rx_vlan_tag(
     const struct xdp_md *ctx, __be16 *vlan_proto, __u16 *vlan_tci) __ksym;

SEC("xdp")
int ingress(struct xdp_md *ctx) {
   __be16 vlan_proto;
   __u16 vlan_tci;
   if (bpf_xdp_metadata_rx_vlan_tag(ctx, &vlan_proto, &vlan_tci) != 0) {
     return XDP_ABORTED;
   }

   return XDP_DROP;
}

char _license[] SEC("license") = "GPL";


Load and attach it as a device-bound program to a mlx5 NIC in XDP-generic mode:

# bpftool prog load crash.o /sys/fs/bpf/crash xdpmeta_dev mlx5-conx5-1
# bpftool net attach xdpgeneric pinned /sys/fs/bpf/crash dev mlx5-conx5-1

Then make sure a packet is coming in on that NIC port so the XDP program gets called:

# ping -I mlx5-conx5-2 1.1.1.1

In my testing environment, mlx5-conx5-2 and mlx5-conx5-1 are directly connected.

Kernel output:

Unable to handle kernel NULL pointer dereference at virtual address 000000000000001d
Mem abort info:
   ESR = 0x0000000096000004
   EC = 0x25: DABT (current EL), IL = 32 bits
   SET = 0, FnV = 0
   EA = 0, S1PTW = 0
   FSC = 0x04: level 0 translation fault
Data abort info:
   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
user pgtable: 4k pages, 48-bit VAs, pgdp=000008035e557000
[000000000000001d] pgd=0000000000000000, p4d=0000000000000000

This was reproduced with Linux 6.12 mainline (adc2186).

-- 
Best regards,
Marcus Wichelmann
Linux Networking Specialist
Team SDN

______________________________

Hetzner Cloud GmbH
Feringastraße 12A
85774 Unterföhring
Germany

Phone: +49 89 381690 150
E-Mail: marcus.wichelmann@hetzner-cloud.de

Handelsregister München HRB 226782
Geschäftsführer: Sebastian Färber, Markus Kalmuk

------------------
For information on the processing of your personal
data in the context of this contact, please see
https://hetzner-cloud.de/datenschutz
------------------


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: RX metadata kfuncs cause kernel panic with XDP generic mode
  2025-01-23 16:02 RX metadata kfuncs cause kernel panic with XDP generic mode Marcus Wichelmann
@ 2025-01-23 16:38 ` Toke Høiland-Jørgensen
  2025-01-23 17:59   ` Marcus Wichelmann
  2025-01-23 19:13   ` Stanislav Fomichev
  0 siblings, 2 replies; 7+ messages in thread
From: Toke Høiland-Jørgensen @ 2025-01-23 16:38 UTC (permalink / raw)
  To: Marcus Wichelmann, bpf; +Cc: Stanislav Fomichev

Marcus Wichelmann <marcus.wichelmann@hetzner-cloud.de> writes:

> There is probably a check missing somewhere that prevents the use of
> these kfuncs in the scope of do_xdp_generic?

Heh, yeah, we should definitely block device-bound programs from being
attached in generic mode. Something like the below, I guess. Care to
test that out?

-Toke

diff --git a/net/core/dev.c b/net/core/dev.c
index afa2282f2604..c1fa68264989 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -9924,6 +9924,10 @@ static int dev_xdp_attach(struct net_device *dev, struct netlink_ext_ack *extack
                        NL_SET_ERR_MSG(extack, "Program bound to different device");
                        return -EINVAL;
                }
+               if (bpf_prog_is_dev_bound(new_prog->aux) && mode == XDP_MODE_SKB) {
+                       NL_SET_ERR_MSG(extack, "Can't attach device-bound programs in generic mode");
+                       return -EINVAL;
+               }
                if (new_prog->expected_attach_type == BPF_XDP_DEVMAP) {
                        NL_SET_ERR_MSG(extack, "BPF_XDP_DEVMAP programs can not be attached to a device");
                        return -EINVAL;


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: RX metadata kfuncs cause kernel panic with XDP generic mode
  2025-01-23 16:38 ` Toke Høiland-Jørgensen
@ 2025-01-23 17:59   ` Marcus Wichelmann
  2025-01-23 19:51     ` Stanislav Fomichev
  2025-01-23 19:13   ` Stanislav Fomichev
  1 sibling, 1 reply; 7+ messages in thread
From: Marcus Wichelmann @ 2025-01-23 17:59 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, bpf; +Cc: stfomichev


Am 23.01.25 um 17:38 schrieb Toke Høiland-Jørgensen:
> Marcus Wichelmann <marcus.wichelmann@hetzner-cloud.de> writes:
> 
>> There is probably a check missing somewhere that prevents the use of
>> these kfuncs in the scope of do_xdp_generic?
> 
> Heh, yeah, we should definitely block device-bound programs from being
> attached in generic mode. Something like the below, I guess. Care to
> test that out?
> 
> -Toke
> 
Ah, thanks for the quick patch. ;)

I have tested your patch with the 6.12 branch I'm currently working with and this does the job.

   # bpftool prog load crash.o /sys/fs/bpf/crash xdpmeta_dev mlx5-conx5-1
   # bpftool net attach xdpgeneric pinned /sys/fs/bpf/crash dev mlx5-conx5-1
   libbpf: Kernel error message: Can't attach device-bound programs in generic mode
   Error: interface xdpgeneric attach failed: Invalid argument

The do_xdp_generic is also used by the tun driver as a fallback in some cases, so, to my understanding,
even programs attached in driver-mode may take the generic XDP path. How can this be handled there?
Currently, it's not an issue, because the tun driver does not implement the xdp_metadata_ops yet, but
it may become one in the future.

Marcus

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: RX metadata kfuncs cause kernel panic with XDP generic mode
  2025-01-23 16:38 ` Toke Høiland-Jørgensen
  2025-01-23 17:59   ` Marcus Wichelmann
@ 2025-01-23 19:13   ` Stanislav Fomichev
  2025-01-23 19:34     ` Toke Høiland-Jørgensen
  1 sibling, 1 reply; 7+ messages in thread
From: Stanislav Fomichev @ 2025-01-23 19:13 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Marcus Wichelmann, bpf, Stanislav Fomichev

On 01/23, Toke Høiland-Jørgensen wrote:
> Marcus Wichelmann <marcus.wichelmann@hetzner-cloud.de> writes:
> 
> > There is probably a check missing somewhere that prevents the use of
> > these kfuncs in the scope of do_xdp_generic?
> 
> Heh, yeah, we should definitely block device-bound programs from being
> attached in generic mode. Something like the below, I guess. Care to
> test that out?
> 
> -Toke
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index afa2282f2604..c1fa68264989 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -9924,6 +9924,10 @@ static int dev_xdp_attach(struct net_device *dev, struct netlink_ext_ack *extack
>                         NL_SET_ERR_MSG(extack, "Program bound to different device");
>                         return -EINVAL;
>                 }
> +               if (bpf_prog_is_dev_bound(new_prog->aux) && mode == XDP_MODE_SKB) {
> +                       NL_SET_ERR_MSG(extack, "Can't attach device-bound programs in generic mode");
> +                       return -EINVAL;
> +               }
>                 if (new_prog->expected_attach_type == BPF_XDP_DEVMAP) {
>                         NL_SET_ERR_MSG(extack, "BPF_XDP_DEVMAP programs can not be attached to a device");
>                         return -EINVAL;
> 

I'm assuming you'll properly post a patch at some point?

Acked-by: Stanislav Fomichev <sdf@fomichev.me>

Might be a good idea to extend bpf_offload.py with that condition.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: RX metadata kfuncs cause kernel panic with XDP generic mode
  2025-01-23 19:13   ` Stanislav Fomichev
@ 2025-01-23 19:34     ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 7+ messages in thread
From: Toke Høiland-Jørgensen @ 2025-01-23 19:34 UTC (permalink / raw)
  To: Stanislav Fomichev; +Cc: Marcus Wichelmann, bpf

Stanislav Fomichev <stfomichev@gmail.com> writes:

> On 01/23, Toke Høiland-Jørgensen wrote:
>> Marcus Wichelmann <marcus.wichelmann@hetzner-cloud.de> writes:
>> 
>> > There is probably a check missing somewhere that prevents the use of
>> > these kfuncs in the scope of do_xdp_generic?
>> 
>> Heh, yeah, we should definitely block device-bound programs from being
>> attached in generic mode. Something like the below, I guess. Care to
>> test that out?
>> 
>> -Toke
>> 
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index afa2282f2604..c1fa68264989 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -9924,6 +9924,10 @@ static int dev_xdp_attach(struct net_device *dev, struct netlink_ext_ack *extack
>>                         NL_SET_ERR_MSG(extack, "Program bound to different device");
>>                         return -EINVAL;
>>                 }
>> +               if (bpf_prog_is_dev_bound(new_prog->aux) && mode == XDP_MODE_SKB) {
>> +                       NL_SET_ERR_MSG(extack, "Can't attach device-bound programs in generic mode");
>> +                       return -EINVAL;
>> +               }
>>                 if (new_prog->expected_attach_type == BPF_XDP_DEVMAP) {
>>                         NL_SET_ERR_MSG(extack, "BPF_XDP_DEVMAP programs can not be attached to a device");
>>                         return -EINVAL;
>> 
>
> I'm assuming you'll properly post a patch at some point?
>
> Acked-by: Stanislav Fomichev <sdf@fomichev.me>

Yes, will do - thanks for the ACK!

> Might be a good idea to extend bpf_offload.py with that condition.

Right, I'll take a look.

-Toke


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: RX metadata kfuncs cause kernel panic with XDP generic mode
  2025-01-23 17:59   ` Marcus Wichelmann
@ 2025-01-23 19:51     ` Stanislav Fomichev
  2025-01-23 20:21       ` Marcus Wichelmann
  0 siblings, 1 reply; 7+ messages in thread
From: Stanislav Fomichev @ 2025-01-23 19:51 UTC (permalink / raw)
  To: Marcus Wichelmann; +Cc: Toke Høiland-Jørgensen, bpf

On 01/23, Marcus Wichelmann wrote:
> 
> Am 23.01.25 um 17:38 schrieb Toke Høiland-Jørgensen:
> > Marcus Wichelmann <marcus.wichelmann@hetzner-cloud.de> writes:
> > 
> > > There is probably a check missing somewhere that prevents the use of
> > > these kfuncs in the scope of do_xdp_generic?
> > 
> > Heh, yeah, we should definitely block device-bound programs from being
> > attached in generic mode. Something like the below, I guess. Care to
> > test that out?
> > 
> > -Toke
> > 
> Ah, thanks for the quick patch. ;)
> 
> I have tested your patch with the 6.12 branch I'm currently working with and this does the job.
> 
>   # bpftool prog load crash.o /sys/fs/bpf/crash xdpmeta_dev mlx5-conx5-1
>   # bpftool net attach xdpgeneric pinned /sys/fs/bpf/crash dev mlx5-conx5-1
>   libbpf: Kernel error message: Can't attach device-bound programs in generic mode
>   Error: interface xdpgeneric attach failed: Invalid argument
> 
> The do_xdp_generic is also used by the tun driver as a fallback in some cases, so, to my understanding,
> even programs attached in driver-mode may take the generic XDP path. How can this be handled there?

[..]

> Currently, it's not an issue, because the tun driver does not implement the xdp_metadata_ops yet, but
> it may become one in the future.

We can solve it if/when we add metadata_ops to the tun driver, right?
Not sure we need any immediate action right now.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: RX metadata kfuncs cause kernel panic with XDP generic mode
  2025-01-23 19:51     ` Stanislav Fomichev
@ 2025-01-23 20:21       ` Marcus Wichelmann
  0 siblings, 0 replies; 7+ messages in thread
From: Marcus Wichelmann @ 2025-01-23 20:21 UTC (permalink / raw)
  To: Stanislav Fomichev; +Cc: Toke Høiland-Jørgensen, bpf

Am 23.01.25 um 20:51 schrieb Stanislav Fomichev:
> On 01/23, Marcus Wichelmann wrote:
>>
>> Am 23.01.25 um 17:38 schrieb Toke Høiland-Jørgensen:
>>> Marcus Wichelmann <marcus.wichelmann@hetzner-cloud.de> writes:
>>>
>>>> There is probably a check missing somewhere that prevents the use of
>>>> these kfuncs in the scope of do_xdp_generic?
>>>
>>> Heh, yeah, we should definitely block device-bound programs from being
>>> attached in generic mode. Something like the below, I guess. Care to
>>> test that out?
>>>
>>> -Toke
>>>
>> Ah, thanks for the quick patch. ;)
>>
>> I have tested your patch with the 6.12 branch I'm currently working with and this does the job.
>>
>>    # bpftool prog load crash.o /sys/fs/bpf/crash xdpmeta_dev mlx5-conx5-1
>>    # bpftool net attach xdpgeneric pinned /sys/fs/bpf/crash dev mlx5-conx5-1
>>    libbpf: Kernel error message: Can't attach device-bound programs in generic mode
>>    Error: interface xdpgeneric attach failed: Invalid argument
>>
>> The do_xdp_generic is also used by the tun driver as a fallback in some cases, so, to my understanding,
>> even programs attached in driver-mode may take the generic XDP path. How can this be handled there?
> 
> [..]
> 
>> Currently, it's not an issue, because the tun driver does not implement the xdp_metadata_ops yet, but
>> it may become one in the future.
> 
> We can solve it if/when we add metadata_ops to the tun driver, right?
> Not sure we need any immediate action right now.

Yeah, that's not needed for now.

I'm currently playing around with implementing a "bpf_xdp_metadata_rx_vnethdr" RX metadata kfunc that allows
to retrieve the virtio_net_hdr from an XDP program attached to the tun driver (to trigger the TX offloads
requested by a VM). That's why I discovered this.
But that's probably a story for another day. ;) I'll submit a patch when I have something working so it can be
discussed and we can see if that has a chance to be upstreamable.

Marcus

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2025-01-23 20:21 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-01-23 16:02 RX metadata kfuncs cause kernel panic with XDP generic mode Marcus Wichelmann
2025-01-23 16:38 ` Toke Høiland-Jørgensen
2025-01-23 17:59   ` Marcus Wichelmann
2025-01-23 19:51     ` Stanislav Fomichev
2025-01-23 20:21       ` Marcus Wichelmann
2025-01-23 19:13   ` Stanislav Fomichev
2025-01-23 19:34     ` Toke Høiland-Jørgensen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.