* RX metadata kfuncs cause kernel panic with XDP generic mode
@ 2025-01-23 16:02 Marcus Wichelmann
2025-01-23 16:38 ` Toke Høiland-Jørgensen
0 siblings, 1 reply; 7+ messages in thread
From: Marcus Wichelmann @ 2025-01-23 16:02 UTC (permalink / raw)
To: bpf; +Cc: Stanislav Fomichev, Toke Høiland-Jørgensen
Hey there,
while taking a closer look at how the RX metadata kfuncs are implemented in the mlx5 and ice drivers,
I suspected a bug and, after testing, could in fact produce a NULL pointer dereference.
The mlx5 driver implements the RX metadata kfuncs like, for example, bpf_xdp_metadata_rx_vlan_tag by
casting the xdp_md pointer from the function argument to an mlx5e_xdp_buff pointer. This is needed to
get access to the packet metadata. See mlx5e_xdp_rx_vlan_tag for example. The ice driver works similarly.
This is fine, because normally these drivers always create a full mlx5e_xdp_buff struct when allocating
the xdp_buff struct. But when a device-bound XDP program is attached to the mlx5 netdevice in generic mode,
the xdp_buff is not allocated by the mlx5 driver but as a part of the do_xdp_generic implementation.
Now, when a packet comes in and the XDP program tries to call one of these kfuncs, the kfunc implementation
will try to dereference pointers inside the mlx5e_xdp_buff struct which is not fully allocated, leading to a
NULL pointer dereference.
There is probably a check missing somewhere that prevents the use of these kfuncs in the scope of
do_xdp_generic? Or may there be another way to implement the RX metadata kfuncs in the driver that does not
involve casting the xdp_buff pointer?
Here is how this can be reproduced:
eBPF program:
#include <bpf.h>
extern int bpf_xdp_metadata_rx_vlan_tag(
const struct xdp_md *ctx, __be16 *vlan_proto, __u16 *vlan_tci) __ksym;
SEC("xdp")
int ingress(struct xdp_md *ctx) {
__be16 vlan_proto;
__u16 vlan_tci;
if (bpf_xdp_metadata_rx_vlan_tag(ctx, &vlan_proto, &vlan_tci) != 0) {
return XDP_ABORTED;
}
return XDP_DROP;
}
char _license[] SEC("license") = "GPL";
Load and attach it as a device-bound program to a mlx5 NIC in XDP-generic mode:
# bpftool prog load crash.o /sys/fs/bpf/crash xdpmeta_dev mlx5-conx5-1
# bpftool net attach xdpgeneric pinned /sys/fs/bpf/crash dev mlx5-conx5-1
Then make sure a packet is coming in on that NIC port so the XDP program gets called:
# ping -I mlx5-conx5-2 1.1.1.1
In my testing environment, mlx5-conx5-2 and mlx5-conx5-1 are directly connected.
Kernel output:
Unable to handle kernel NULL pointer dereference at virtual address 000000000000001d
Mem abort info:
ESR = 0x0000000096000004
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
FSC = 0x04: level 0 translation fault
Data abort info:
ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
CM = 0, WnR = 0, TnD = 0, TagAccess = 0
GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
user pgtable: 4k pages, 48-bit VAs, pgdp=000008035e557000
[000000000000001d] pgd=0000000000000000, p4d=0000000000000000
This was reproduced with Linux 6.12 mainline (adc2186).
--
Best regards,
Marcus Wichelmann
Linux Networking Specialist
Team SDN
______________________________
Hetzner Cloud GmbH
Feringastraße 12A
85774 Unterföhring
Germany
Phone: +49 89 381690 150
E-Mail: marcus.wichelmann@hetzner-cloud.de
Handelsregister München HRB 226782
Geschäftsführer: Sebastian Färber, Markus Kalmuk
------------------
For information on the processing of your personal
data in the context of this contact, please see
https://hetzner-cloud.de/datenschutz
------------------
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: RX metadata kfuncs cause kernel panic with XDP generic mode
2025-01-23 16:02 RX metadata kfuncs cause kernel panic with XDP generic mode Marcus Wichelmann
@ 2025-01-23 16:38 ` Toke Høiland-Jørgensen
2025-01-23 17:59 ` Marcus Wichelmann
2025-01-23 19:13 ` Stanislav Fomichev
0 siblings, 2 replies; 7+ messages in thread
From: Toke Høiland-Jørgensen @ 2025-01-23 16:38 UTC (permalink / raw)
To: Marcus Wichelmann, bpf; +Cc: Stanislav Fomichev
Marcus Wichelmann <marcus.wichelmann@hetzner-cloud.de> writes:
> There is probably a check missing somewhere that prevents the use of
> these kfuncs in the scope of do_xdp_generic?
Heh, yeah, we should definitely block device-bound programs from being
attached in generic mode. Something like the below, I guess. Care to
test that out?
-Toke
diff --git a/net/core/dev.c b/net/core/dev.c
index afa2282f2604..c1fa68264989 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -9924,6 +9924,10 @@ static int dev_xdp_attach(struct net_device *dev, struct netlink_ext_ack *extack
NL_SET_ERR_MSG(extack, "Program bound to different device");
return -EINVAL;
}
+ if (bpf_prog_is_dev_bound(new_prog->aux) && mode == XDP_MODE_SKB) {
+ NL_SET_ERR_MSG(extack, "Can't attach device-bound programs in generic mode");
+ return -EINVAL;
+ }
if (new_prog->expected_attach_type == BPF_XDP_DEVMAP) {
NL_SET_ERR_MSG(extack, "BPF_XDP_DEVMAP programs can not be attached to a device");
return -EINVAL;
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: RX metadata kfuncs cause kernel panic with XDP generic mode
2025-01-23 16:38 ` Toke Høiland-Jørgensen
@ 2025-01-23 17:59 ` Marcus Wichelmann
2025-01-23 19:51 ` Stanislav Fomichev
2025-01-23 19:13 ` Stanislav Fomichev
1 sibling, 1 reply; 7+ messages in thread
From: Marcus Wichelmann @ 2025-01-23 17:59 UTC (permalink / raw)
To: Toke Høiland-Jørgensen, bpf; +Cc: stfomichev
Am 23.01.25 um 17:38 schrieb Toke Høiland-Jørgensen:
> Marcus Wichelmann <marcus.wichelmann@hetzner-cloud.de> writes:
>
>> There is probably a check missing somewhere that prevents the use of
>> these kfuncs in the scope of do_xdp_generic?
>
> Heh, yeah, we should definitely block device-bound programs from being
> attached in generic mode. Something like the below, I guess. Care to
> test that out?
>
> -Toke
>
Ah, thanks for the quick patch. ;)
I have tested your patch with the 6.12 branch I'm currently working with and this does the job.
# bpftool prog load crash.o /sys/fs/bpf/crash xdpmeta_dev mlx5-conx5-1
# bpftool net attach xdpgeneric pinned /sys/fs/bpf/crash dev mlx5-conx5-1
libbpf: Kernel error message: Can't attach device-bound programs in generic mode
Error: interface xdpgeneric attach failed: Invalid argument
The do_xdp_generic is also used by the tun driver as a fallback in some cases, so, to my understanding,
even programs attached in driver-mode may take the generic XDP path. How can this be handled there?
Currently, it's not an issue, because the tun driver does not implement the xdp_metadata_ops yet, but
it may become one in the future.
Marcus
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: RX metadata kfuncs cause kernel panic with XDP generic mode
2025-01-23 16:38 ` Toke Høiland-Jørgensen
2025-01-23 17:59 ` Marcus Wichelmann
@ 2025-01-23 19:13 ` Stanislav Fomichev
2025-01-23 19:34 ` Toke Høiland-Jørgensen
1 sibling, 1 reply; 7+ messages in thread
From: Stanislav Fomichev @ 2025-01-23 19:13 UTC (permalink / raw)
To: Toke Høiland-Jørgensen
Cc: Marcus Wichelmann, bpf, Stanislav Fomichev
On 01/23, Toke Høiland-Jørgensen wrote:
> Marcus Wichelmann <marcus.wichelmann@hetzner-cloud.de> writes:
>
> > There is probably a check missing somewhere that prevents the use of
> > these kfuncs in the scope of do_xdp_generic?
>
> Heh, yeah, we should definitely block device-bound programs from being
> attached in generic mode. Something like the below, I guess. Care to
> test that out?
>
> -Toke
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index afa2282f2604..c1fa68264989 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -9924,6 +9924,10 @@ static int dev_xdp_attach(struct net_device *dev, struct netlink_ext_ack *extack
> NL_SET_ERR_MSG(extack, "Program bound to different device");
> return -EINVAL;
> }
> + if (bpf_prog_is_dev_bound(new_prog->aux) && mode == XDP_MODE_SKB) {
> + NL_SET_ERR_MSG(extack, "Can't attach device-bound programs in generic mode");
> + return -EINVAL;
> + }
> if (new_prog->expected_attach_type == BPF_XDP_DEVMAP) {
> NL_SET_ERR_MSG(extack, "BPF_XDP_DEVMAP programs can not be attached to a device");
> return -EINVAL;
>
I'm assuming you'll properly post a patch at some point?
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Might be a good idea to extend bpf_offload.py with that condition.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: RX metadata kfuncs cause kernel panic with XDP generic mode
2025-01-23 19:13 ` Stanislav Fomichev
@ 2025-01-23 19:34 ` Toke Høiland-Jørgensen
0 siblings, 0 replies; 7+ messages in thread
From: Toke Høiland-Jørgensen @ 2025-01-23 19:34 UTC (permalink / raw)
To: Stanislav Fomichev; +Cc: Marcus Wichelmann, bpf
Stanislav Fomichev <stfomichev@gmail.com> writes:
> On 01/23, Toke Høiland-Jørgensen wrote:
>> Marcus Wichelmann <marcus.wichelmann@hetzner-cloud.de> writes:
>>
>> > There is probably a check missing somewhere that prevents the use of
>> > these kfuncs in the scope of do_xdp_generic?
>>
>> Heh, yeah, we should definitely block device-bound programs from being
>> attached in generic mode. Something like the below, I guess. Care to
>> test that out?
>>
>> -Toke
>>
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index afa2282f2604..c1fa68264989 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -9924,6 +9924,10 @@ static int dev_xdp_attach(struct net_device *dev, struct netlink_ext_ack *extack
>> NL_SET_ERR_MSG(extack, "Program bound to different device");
>> return -EINVAL;
>> }
>> + if (bpf_prog_is_dev_bound(new_prog->aux) && mode == XDP_MODE_SKB) {
>> + NL_SET_ERR_MSG(extack, "Can't attach device-bound programs in generic mode");
>> + return -EINVAL;
>> + }
>> if (new_prog->expected_attach_type == BPF_XDP_DEVMAP) {
>> NL_SET_ERR_MSG(extack, "BPF_XDP_DEVMAP programs can not be attached to a device");
>> return -EINVAL;
>>
>
> I'm assuming you'll properly post a patch at some point?
>
> Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Yes, will do - thanks for the ACK!
> Might be a good idea to extend bpf_offload.py with that condition.
Right, I'll take a look.
-Toke
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: RX metadata kfuncs cause kernel panic with XDP generic mode
2025-01-23 17:59 ` Marcus Wichelmann
@ 2025-01-23 19:51 ` Stanislav Fomichev
2025-01-23 20:21 ` Marcus Wichelmann
0 siblings, 1 reply; 7+ messages in thread
From: Stanislav Fomichev @ 2025-01-23 19:51 UTC (permalink / raw)
To: Marcus Wichelmann; +Cc: Toke Høiland-Jørgensen, bpf
On 01/23, Marcus Wichelmann wrote:
>
> Am 23.01.25 um 17:38 schrieb Toke Høiland-Jørgensen:
> > Marcus Wichelmann <marcus.wichelmann@hetzner-cloud.de> writes:
> >
> > > There is probably a check missing somewhere that prevents the use of
> > > these kfuncs in the scope of do_xdp_generic?
> >
> > Heh, yeah, we should definitely block device-bound programs from being
> > attached in generic mode. Something like the below, I guess. Care to
> > test that out?
> >
> > -Toke
> >
> Ah, thanks for the quick patch. ;)
>
> I have tested your patch with the 6.12 branch I'm currently working with and this does the job.
>
> # bpftool prog load crash.o /sys/fs/bpf/crash xdpmeta_dev mlx5-conx5-1
> # bpftool net attach xdpgeneric pinned /sys/fs/bpf/crash dev mlx5-conx5-1
> libbpf: Kernel error message: Can't attach device-bound programs in generic mode
> Error: interface xdpgeneric attach failed: Invalid argument
>
> The do_xdp_generic is also used by the tun driver as a fallback in some cases, so, to my understanding,
> even programs attached in driver-mode may take the generic XDP path. How can this be handled there?
[..]
> Currently, it's not an issue, because the tun driver does not implement the xdp_metadata_ops yet, but
> it may become one in the future.
We can solve it if/when we add metadata_ops to the tun driver, right?
Not sure we need any immediate action right now.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: RX metadata kfuncs cause kernel panic with XDP generic mode
2025-01-23 19:51 ` Stanislav Fomichev
@ 2025-01-23 20:21 ` Marcus Wichelmann
0 siblings, 0 replies; 7+ messages in thread
From: Marcus Wichelmann @ 2025-01-23 20:21 UTC (permalink / raw)
To: Stanislav Fomichev; +Cc: Toke Høiland-Jørgensen, bpf
Am 23.01.25 um 20:51 schrieb Stanislav Fomichev:
> On 01/23, Marcus Wichelmann wrote:
>>
>> Am 23.01.25 um 17:38 schrieb Toke Høiland-Jørgensen:
>>> Marcus Wichelmann <marcus.wichelmann@hetzner-cloud.de> writes:
>>>
>>>> There is probably a check missing somewhere that prevents the use of
>>>> these kfuncs in the scope of do_xdp_generic?
>>>
>>> Heh, yeah, we should definitely block device-bound programs from being
>>> attached in generic mode. Something like the below, I guess. Care to
>>> test that out?
>>>
>>> -Toke
>>>
>> Ah, thanks for the quick patch. ;)
>>
>> I have tested your patch with the 6.12 branch I'm currently working with and this does the job.
>>
>> # bpftool prog load crash.o /sys/fs/bpf/crash xdpmeta_dev mlx5-conx5-1
>> # bpftool net attach xdpgeneric pinned /sys/fs/bpf/crash dev mlx5-conx5-1
>> libbpf: Kernel error message: Can't attach device-bound programs in generic mode
>> Error: interface xdpgeneric attach failed: Invalid argument
>>
>> The do_xdp_generic is also used by the tun driver as a fallback in some cases, so, to my understanding,
>> even programs attached in driver-mode may take the generic XDP path. How can this be handled there?
>
> [..]
>
>> Currently, it's not an issue, because the tun driver does not implement the xdp_metadata_ops yet, but
>> it may become one in the future.
>
> We can solve it if/when we add metadata_ops to the tun driver, right?
> Not sure we need any immediate action right now.
Yeah, that's not needed for now.
I'm currently playing around with implementing a "bpf_xdp_metadata_rx_vnethdr" RX metadata kfunc that allows
to retrieve the virtio_net_hdr from an XDP program attached to the tun driver (to trigger the TX offloads
requested by a VM). That's why I discovered this.
But that's probably a story for another day. ;) I'll submit a patch when I have something working so it can be
discussed and we can see if that has a chance to be upstreamable.
Marcus
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2025-01-23 20:21 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-01-23 16:02 RX metadata kfuncs cause kernel panic with XDP generic mode Marcus Wichelmann
2025-01-23 16:38 ` Toke Høiland-Jørgensen
2025-01-23 17:59 ` Marcus Wichelmann
2025-01-23 19:51 ` Stanislav Fomichev
2025-01-23 20:21 ` Marcus Wichelmann
2025-01-23 19:13 ` Stanislav Fomichev
2025-01-23 19:34 ` Toke Høiland-Jørgensen
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.