* [BUG] FORTIFY: memcpy overflow in skb_tunnel_info_unclone() from geneve_xmit()
@ 2026-06-08 8:25 Johan Thomsen
2026-06-08 9:41 ` Ilya Maximets
0 siblings, 1 reply; 7+ messages in thread
From: Johan Thomsen @ 2026-06-08 8:25 UTC (permalink / raw)
To: netdev, dev
Hello,
I am seeing what looks like a kernel bug in the Geneve/OVS/vhost
transmit path on a Talos Linux node running Kube-ovn with Geneve
overlay and KubeVirt VM traffic.
Environment:
Kernel: 6.18.33-talos
Distro: Talos v1.13.3
Compiler/config:
CONFIG_CC_VERSION_TEXT="clang version 22.1.2"
CONFIG_CC_IS_CLANG=y
CONFIG_LTO=y
CONFIG_LTO_CLANG=y
CONFIG_LTO_CLANG_THIN=y
CONFIG_FORTIFY_SOURCE=y
Hardware: HPE ProLiant DL325 Gen11, AMD EPYC
NIC driver: bnxt_en
Workload/network:
Kube-OVN, Geneve overlay
Open vSwitch datapath
KubeVirt/QEMU VM traffic via vhost/tap
Relevant console output:
[ 648.742603] memcpy: detected buffer overflow: 104 byte write of
buffer size 96
[ 648.749907] WARNING: CPU: 61 PID: 27020 at
lib/string_helpers.c:1036 __fortify_report+0x45/0x60
[ 648.758689] Modules linked in: dm_round_robin dm_multipath lpfc
nvmet_fc nvmet intel_rapl_msr intel_rapl_common ahci nvme_auth bnxt_en
nvme hpilo hkdf libahci sp5100_tco watchdog k10temp
[ 648.775429] CPU: 61 UID: 107 PID: 27020 Comm: vhost-27002 Not
tainted 6.18.29-talos #1 PREEMPT(none)
[ 648.784735] Hardware name: HPE ProLiant DL325 Gen11/ProLiant DL325
Gen11, BIOS 2.84 11/05/2025
[ 648.890478] skb_tunnel_info_unclone+0x179/0x190
[ 648.895152] geneve_xmit+0x7fe/0xe00
[ 648.907240] dev_hard_start_xmit+0xa7/0x1f0
[ 648.911479] __dev_queue_xmit+0x864/0xf40
[ 648.919688] do_execute_actions+0x9b9/0x1be0
[ 648.927727] ovs_execute_actions+0x58/0x170
[ 648.931960] ovs_dp_process_packet+0xb1/0x1c0
[ 648.936370] ovs_vport_receive+0x90/0x100
[ 648.940428] netdev_frame_hook+0x146/0x1a0
[ 648.954093] __netif_receive_skb+0x3f/0x160
[ 648.958324] process_backlog+0x10c/0x210
[ 648.962295] __napi_poll+0x2f/0x190
[ 648.965832] net_rx_action+0x2e3/0x500
[ 648.969632] handle_softirqs+0xe7/0x310
[ 648.985387] tun_get_user+0x137e/0x1510
[ 649.005878] handle_tx+0x41f/0xd30
[ 649.029014] vhost_run_work_list+0x52/0x90
[ 649.033162] vhost_task_fn+0xc2/0x140
[ 649.064145] ---[ end trace 0000000000000000 ]---
[ 649.068820] ------------[ cut here ]------------
[ 649.073489] kernel BUG at lib/string_helpers.c:1043!
I don't know whether this is a real overflow or a FORTIFY false-positive.
I cannot reproduce the issue on Talos v1.12.X which uses a gcc built
kernel, whereas the affected kernel is built with clang. Don't know
whether this is relevant here.
I am currently trying to make a reliable reproducer, but I can almost
always trigger the issue when iperf stressing the VM-network.
Please let me know if this should go to a more specific
maintainer/list or further info is needed. I am able to test candidate
patches if provided.
Downstream bug reports:
https://github.com/siderolabs/talos/issues/13440
https://github.com/kubeovn/kube-ovn/issues/6767
Thanks,
Johan
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [BUG] FORTIFY: memcpy overflow in skb_tunnel_info_unclone() from geneve_xmit()
2026-06-08 8:25 [BUG] FORTIFY: memcpy overflow in skb_tunnel_info_unclone() from geneve_xmit() Johan Thomsen
@ 2026-06-08 9:41 ` Ilya Maximets
2026-06-10 13:10 ` Johan Thomsen
2026-06-10 19:51 ` Kees Cook
0 siblings, 2 replies; 7+ messages in thread
From: Ilya Maximets @ 2026-06-08 9:41 UTC (permalink / raw)
To: Johan Thomsen, netdev, dev; +Cc: i.maximets, Kees Cook
On 6/8/26 10:25 AM, Johan Thomsen wrote:
> Hello,
>
> I am seeing what looks like a kernel bug in the Geneve/OVS/vhost
> transmit path on a Talos Linux node running Kube-ovn with Geneve
> overlay and KubeVirt VM traffic.
>
> Environment:
>
> Kernel: 6.18.33-talos
> Distro: Talos v1.13.3
>
> Compiler/config:
>
> CONFIG_CC_VERSION_TEXT="clang version 22.1.2"
> CONFIG_CC_IS_CLANG=y
> CONFIG_LTO=y
> CONFIG_LTO_CLANG=y
> CONFIG_LTO_CLANG_THIN=y
> CONFIG_FORTIFY_SOURCE=y
>
> Hardware: HPE ProLiant DL325 Gen11, AMD EPYC
>
> NIC driver: bnxt_en
>
> Workload/network:
>
> Kube-OVN, Geneve overlay
> Open vSwitch datapath
> KubeVirt/QEMU VM traffic via vhost/tap
>
> Relevant console output:
>
> [ 648.742603] memcpy: detected buffer overflow: 104 byte write of
> buffer size 96
> [ 648.749907] WARNING: CPU: 61 PID: 27020 at
> lib/string_helpers.c:1036 __fortify_report+0x45/0x60
> [ 648.758689] Modules linked in: dm_round_robin dm_multipath lpfc
> nvmet_fc nvmet intel_rapl_msr intel_rapl_common ahci nvme_auth bnxt_en
> nvme hpilo hkdf libahci sp5100_tco watchdog k10temp
> [ 648.775429] CPU: 61 UID: 107 PID: 27020 Comm: vhost-27002 Not
> tainted 6.18.29-talos #1 PREEMPT(none)
> [ 648.784735] Hardware name: HPE ProLiant DL325 Gen11/ProLiant DL325
> Gen11, BIOS 2.84 11/05/2025
> [ 648.890478] skb_tunnel_info_unclone+0x179/0x190
> [ 648.895152] geneve_xmit+0x7fe/0xe00
> [ 648.907240] dev_hard_start_xmit+0xa7/0x1f0
> [ 648.911479] __dev_queue_xmit+0x864/0xf40
> [ 648.919688] do_execute_actions+0x9b9/0x1be0
> [ 648.927727] ovs_execute_actions+0x58/0x170
> [ 648.931960] ovs_dp_process_packet+0xb1/0x1c0
> [ 648.936370] ovs_vport_receive+0x90/0x100
> [ 648.940428] netdev_frame_hook+0x146/0x1a0
> [ 648.954093] __netif_receive_skb+0x3f/0x160
> [ 648.958324] process_backlog+0x10c/0x210
> [ 648.962295] __napi_poll+0x2f/0x190
> [ 648.965832] net_rx_action+0x2e3/0x500
> [ 648.969632] handle_softirqs+0xe7/0x310
> [ 648.985387] tun_get_user+0x137e/0x1510
> [ 649.005878] handle_tx+0x41f/0xd30
> [ 649.029014] vhost_run_work_list+0x52/0x90
> [ 649.033162] vhost_task_fn+0xc2/0x140
> [ 649.064145] ---[ end trace 0000000000000000 ]---
> [ 649.068820] ------------[ cut here ]------------
> [ 649.073489] kernel BUG at lib/string_helpers.c:1043!
>
> I don't know whether this is a real overflow or a FORTIFY false-positive.
Looks like a false-positive from the __counted_by fortification.
I'd guess something like this would fit it:
diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h
index 1fc2fb03ce3f9..e51c3795da474 100644
--- a/include/net/dst_metadata.h
+++ b/include/net/dst_metadata.h
@@ -164,6 +164,7 @@ static inline struct metadata_dst *tun_dst_unclone(struct sk_buff *skb)
if (!new_md)
return ERR_PTR(-ENOMEM);
+ new_md->u.tun_info.options_len = md_size;
memcpy(&new_md->u.tun_info, &md_dst->u.tun_info,
sizeof(struct ip_tunnel_info) + md_size);
#ifdef CONFIG_DST_CACHE
---
Johan, could you try this in your setup?
The memory was actually allocated for the options, but the structure
is zeroed out on allocation, so the __counted_by check doesn't work
properly for the initial initialization copy.
But the operation in the diff above is kind of pointless as the memcpy
itself will copy the value again. So, I'm not sure if that's the right
solution here.
Alternative might be to revert the kmalloc_flex back to the simple
kmalloc in metadata_dst_alloc.
CC: Kees
Best regards, Ilya Maximets.
>
> I cannot reproduce the issue on Talos v1.12.X which uses a gcc built
> kernel, whereas the affected kernel is built with clang. Don't know
> whether this is relevant here.
>
> I am currently trying to make a reliable reproducer, but I can almost
> always trigger the issue when iperf stressing the VM-network.
>
> Please let me know if this should go to a more specific
> maintainer/list or further info is needed. I am able to test candidate
> patches if provided.
>
> Downstream bug reports:
> https://github.com/siderolabs/talos/issues/13440
> https://github.com/kubeovn/kube-ovn/issues/6767
>
> Thanks,
> Johan
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [BUG] FORTIFY: memcpy overflow in skb_tunnel_info_unclone() from geneve_xmit()
2026-06-08 9:41 ` Ilya Maximets
@ 2026-06-10 13:10 ` Johan Thomsen
2026-06-10 18:00 ` Ilya Maximets
2026-06-10 19:51 ` Kees Cook
1 sibling, 1 reply; 7+ messages in thread
From: Johan Thomsen @ 2026-06-10 13:10 UTC (permalink / raw)
To: Ilya Maximets; +Cc: netdev, dev, Kees Cook
Hi Ilya,
Sorry for the late follow-up.
Your patch has now run without crashes in my setup for 36 hours and
I'm unable to re-trigger a panic.
> But the operation in the diff above is kind of pointless as the memcpy
> itself will copy the value again.
Right. So it would probably need at least a comment, if that ends up
becoming the final fix.
I'm not a kernel dev and I don't know what is the right thing to do here.
Happy to test alternative patches if needed.
BR
Johan
Den man. 8. jun. 2026 kl. 11.41 skrev Ilya Maximets <i.maximets@ovn.org>:
>
> On 6/8/26 10:25 AM, Johan Thomsen wrote:
> > Hello,
> >
> > I am seeing what looks like a kernel bug in the Geneve/OVS/vhost
> > transmit path on a Talos Linux node running Kube-ovn with Geneve
> > overlay and KubeVirt VM traffic.
> >
> > Environment:
> >
> > Kernel: 6.18.33-talos
> > Distro: Talos v1.13.3
> >
> > Compiler/config:
> >
> > CONFIG_CC_VERSION_TEXT="clang version 22.1.2"
> > CONFIG_CC_IS_CLANG=y
> > CONFIG_LTO=y
> > CONFIG_LTO_CLANG=y
> > CONFIG_LTO_CLANG_THIN=y
> > CONFIG_FORTIFY_SOURCE=y
> >
> > Hardware: HPE ProLiant DL325 Gen11, AMD EPYC
> >
> > NIC driver: bnxt_en
> >
> > Workload/network:
> >
> > Kube-OVN, Geneve overlay
> > Open vSwitch datapath
> > KubeVirt/QEMU VM traffic via vhost/tap
> >
> > Relevant console output:
> >
> > [ 648.742603] memcpy: detected buffer overflow: 104 byte write of
> > buffer size 96
> > [ 648.749907] WARNING: CPU: 61 PID: 27020 at
> > lib/string_helpers.c:1036 __fortify_report+0x45/0x60
> > [ 648.758689] Modules linked in: dm_round_robin dm_multipath lpfc
> > nvmet_fc nvmet intel_rapl_msr intel_rapl_common ahci nvme_auth bnxt_en
> > nvme hpilo hkdf libahci sp5100_tco watchdog k10temp
> > [ 648.775429] CPU: 61 UID: 107 PID: 27020 Comm: vhost-27002 Not
> > tainted 6.18.29-talos #1 PREEMPT(none)
> > [ 648.784735] Hardware name: HPE ProLiant DL325 Gen11/ProLiant DL325
> > Gen11, BIOS 2.84 11/05/2025
> > [ 648.890478] skb_tunnel_info_unclone+0x179/0x190
> > [ 648.895152] geneve_xmit+0x7fe/0xe00
> > [ 648.907240] dev_hard_start_xmit+0xa7/0x1f0
> > [ 648.911479] __dev_queue_xmit+0x864/0xf40
> > [ 648.919688] do_execute_actions+0x9b9/0x1be0
> > [ 648.927727] ovs_execute_actions+0x58/0x170
> > [ 648.931960] ovs_dp_process_packet+0xb1/0x1c0
> > [ 648.936370] ovs_vport_receive+0x90/0x100
> > [ 648.940428] netdev_frame_hook+0x146/0x1a0
> > [ 648.954093] __netif_receive_skb+0x3f/0x160
> > [ 648.958324] process_backlog+0x10c/0x210
> > [ 648.962295] __napi_poll+0x2f/0x190
> > [ 648.965832] net_rx_action+0x2e3/0x500
> > [ 648.969632] handle_softirqs+0xe7/0x310
> > [ 648.985387] tun_get_user+0x137e/0x1510
> > [ 649.005878] handle_tx+0x41f/0xd30
> > [ 649.029014] vhost_run_work_list+0x52/0x90
> > [ 649.033162] vhost_task_fn+0xc2/0x140
> > [ 649.064145] ---[ end trace 0000000000000000 ]---
> > [ 649.068820] ------------[ cut here ]------------
> > [ 649.073489] kernel BUG at lib/string_helpers.c:1043!
> >
> > I don't know whether this is a real overflow or a FORTIFY false-positive.
>
> Looks like a false-positive from the __counted_by fortification.
>
> I'd guess something like this would fit it:
>
> diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h
> index 1fc2fb03ce3f9..e51c3795da474 100644
> --- a/include/net/dst_metadata.h
> +++ b/include/net/dst_metadata.h
> @@ -164,6 +164,7 @@ static inline struct metadata_dst *tun_dst_unclone(struct sk_buff *skb)
> if (!new_md)
> return ERR_PTR(-ENOMEM);
>
> + new_md->u.tun_info.options_len = md_size;
> memcpy(&new_md->u.tun_info, &md_dst->u.tun_info,
> sizeof(struct ip_tunnel_info) + md_size);
> #ifdef CONFIG_DST_CACHE
> ---
>
> Johan, could you try this in your setup?
>
> The memory was actually allocated for the options, but the structure
> is zeroed out on allocation, so the __counted_by check doesn't work
> properly for the initial initialization copy.
>
> But the operation in the diff above is kind of pointless as the memcpy
> itself will copy the value again. So, I'm not sure if that's the right
> solution here.
>
> Alternative might be to revert the kmalloc_flex back to the simple
> kmalloc in metadata_dst_alloc.
>
> CC: Kees
>
> Best regards, Ilya Maximets.
>
> >
> > I cannot reproduce the issue on Talos v1.12.X which uses a gcc built
> > kernel, whereas the affected kernel is built with clang. Don't know
> > whether this is relevant here.
> >
> > I am currently trying to make a reliable reproducer, but I can almost
> > always trigger the issue when iperf stressing the VM-network.
> >
> > Please let me know if this should go to a more specific
> > maintainer/list or further info is needed. I am able to test candidate
> > patches if provided.
> >
> > Downstream bug reports:
> > https://github.com/siderolabs/talos/issues/13440
> > https://github.com/kubeovn/kube-ovn/issues/6767
> >
> > Thanks,
> > Johan
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [BUG] FORTIFY: memcpy overflow in skb_tunnel_info_unclone() from geneve_xmit()
2026-06-10 13:10 ` Johan Thomsen
@ 2026-06-10 18:00 ` Ilya Maximets
2026-06-10 18:21 ` Kuniyuki Iwashima
2026-06-10 19:41 ` Kees Cook
0 siblings, 2 replies; 7+ messages in thread
From: Ilya Maximets @ 2026-06-10 18:00 UTC (permalink / raw)
To: Johan Thomsen, Jakub Kicinski, Paolo Abeni, Eric Dumazet
Cc: i.maximets, netdev, dev, Kees Cook
On 6/10/26 3:10 PM, Johan Thomsen wrote:
> Hi Ilya,
>
> Sorry for the late follow-up.
> Your patch has now run without crashes in my setup for 36 hours and
> I'm unable to re-trigger a panic.
>
>> But the operation in the diff above is kind of pointless as the memcpy
>> itself will copy the value again.
>
> Right. So it would probably need at least a comment, if that ends up
> becoming the final fix.
> I'm not a kernel dev and I don't know what is the right thing to do here.
>
> Happy to test alternative patches if needed.
OK. Thanks for testing.
>
> BR
> Johan
>
> Den man. 8. jun. 2026 kl. 11.41 skrev Ilya Maximets <i.maximets@ovn.org>:
>>
>> On 6/8/26 10:25 AM, Johan Thomsen wrote:
>>> Hello,
>>>
>>> I am seeing what looks like a kernel bug in the Geneve/OVS/vhost
>>> transmit path on a Talos Linux node running Kube-ovn with Geneve
>>> overlay and KubeVirt VM traffic.
>>>
>>> Environment:
>>>
>>> Kernel: 6.18.33-talos
>>> Distro: Talos v1.13.3
>>>
>>> Compiler/config:
>>>
>>> CONFIG_CC_VERSION_TEXT="clang version 22.1.2"
>>> CONFIG_CC_IS_CLANG=y
>>> CONFIG_LTO=y
>>> CONFIG_LTO_CLANG=y
>>> CONFIG_LTO_CLANG_THIN=y
>>> CONFIG_FORTIFY_SOURCE=y
>>>
>>> Hardware: HPE ProLiant DL325 Gen11, AMD EPYC
>>>
>>> NIC driver: bnxt_en
>>>
>>> Workload/network:
>>>
>>> Kube-OVN, Geneve overlay
>>> Open vSwitch datapath
>>> KubeVirt/QEMU VM traffic via vhost/tap
>>>
>>> Relevant console output:
>>>
>>> [ 648.742603] memcpy: detected buffer overflow: 104 byte write of
>>> buffer size 96
>>> [ 648.749907] WARNING: CPU: 61 PID: 27020 at
>>> lib/string_helpers.c:1036 __fortify_report+0x45/0x60
>>> [ 648.758689] Modules linked in: dm_round_robin dm_multipath lpfc
>>> nvmet_fc nvmet intel_rapl_msr intel_rapl_common ahci nvme_auth bnxt_en
>>> nvme hpilo hkdf libahci sp5100_tco watchdog k10temp
>>> [ 648.775429] CPU: 61 UID: 107 PID: 27020 Comm: vhost-27002 Not
>>> tainted 6.18.29-talos #1 PREEMPT(none)
>>> [ 648.784735] Hardware name: HPE ProLiant DL325 Gen11/ProLiant DL325
>>> Gen11, BIOS 2.84 11/05/2025
>>> [ 648.890478] skb_tunnel_info_unclone+0x179/0x190
>>> [ 648.895152] geneve_xmit+0x7fe/0xe00
>>> [ 648.907240] dev_hard_start_xmit+0xa7/0x1f0
>>> [ 648.911479] __dev_queue_xmit+0x864/0xf40
>>> [ 648.919688] do_execute_actions+0x9b9/0x1be0
>>> [ 648.927727] ovs_execute_actions+0x58/0x170
>>> [ 648.931960] ovs_dp_process_packet+0xb1/0x1c0
>>> [ 648.936370] ovs_vport_receive+0x90/0x100
>>> [ 648.940428] netdev_frame_hook+0x146/0x1a0
>>> [ 648.954093] __netif_receive_skb+0x3f/0x160
>>> [ 648.958324] process_backlog+0x10c/0x210
>>> [ 648.962295] __napi_poll+0x2f/0x190
>>> [ 648.965832] net_rx_action+0x2e3/0x500
>>> [ 648.969632] handle_softirqs+0xe7/0x310
>>> [ 648.985387] tun_get_user+0x137e/0x1510
>>> [ 649.005878] handle_tx+0x41f/0xd30
>>> [ 649.029014] vhost_run_work_list+0x52/0x90
>>> [ 649.033162] vhost_task_fn+0xc2/0x140
>>> [ 649.064145] ---[ end trace 0000000000000000 ]---
>>> [ 649.068820] ------------[ cut here ]------------
>>> [ 649.073489] kernel BUG at lib/string_helpers.c:1043!
>>>
>>> I don't know whether this is a real overflow or a FORTIFY false-positive.
>>
>> Looks like a false-positive from the __counted_by fortification.
>>
>> I'd guess something like this would fit it:
>>
>> diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h
>> index 1fc2fb03ce3f9..e51c3795da474 100644
>> --- a/include/net/dst_metadata.h
>> +++ b/include/net/dst_metadata.h
>> @@ -164,6 +164,7 @@ static inline struct metadata_dst *tun_dst_unclone(struct sk_buff *skb)
>> if (!new_md)
>> return ERR_PTR(-ENOMEM);
>>
>> + new_md->u.tun_info.options_len = md_size;
>> memcpy(&new_md->u.tun_info, &md_dst->u.tun_info,
>> sizeof(struct ip_tunnel_info) + md_size);
>> #ifdef CONFIG_DST_CACHE
>> ---
>>
>> Johan, could you try this in your setup?
>>
>> The memory was actually allocated for the options, but the structure
>> is zeroed out on allocation, so the __counted_by check doesn't work
>> properly for the initial initialization copy.
>>
>> But the operation in the diff above is kind of pointless as the memcpy
>> itself will copy the value again. So, I'm not sure if that's the right
>> solution here.
>>
>> Alternative might be to revert the kmalloc_flex back to the simple
>> kmalloc in metadata_dst_alloc.
A little less icky alternative might be to just split the copy in two:
diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h
index 1fc2fb03ce3f9..996ae8350360a 100644
--- a/include/net/dst_metadata.h
+++ b/include/net/dst_metadata.h
@@ -164,8 +164,12 @@ static inline struct metadata_dst *tun_dst_unclone(struct sk_buff *skb)
if (!new_md)
return ERR_PTR(-ENOMEM);
+ /* Copy in two stages to keep the __counted_by happy. */
memcpy(&new_md->u.tun_info, &md_dst->u.tun_info,
- sizeof(struct ip_tunnel_info) + md_size);
+ sizeof(struct ip_tunnel_info));
+ memcpy(ip_tunnel_info_opts(&new_md->u.tun_info),
+ ip_tunnel_info_opts(&md_dst->u.tun_info),
+ md_size);
#ifdef CONFIG_DST_CACHE
/* Unclone the dst cache if there is one */
if (new_md->u.tun_info.dst_cache.cache) {
---
Adding netdev maintainers for more opinions.
>>
>> CC: Kees
>>
>> Best regards, Ilya Maximets.
>>
>>>
>>> I cannot reproduce the issue on Talos v1.12.X which uses a gcc built
>>> kernel, whereas the affected kernel is built with clang. Don't know
>>> whether this is relevant here.
>>>
>>> I am currently trying to make a reliable reproducer, but I can almost
>>> always trigger the issue when iperf stressing the VM-network.
>>>
>>> Please let me know if this should go to a more specific
>>> maintainer/list or further info is needed. I am able to test candidate
>>> patches if provided.
>>>
>>> Downstream bug reports:
>>> https://github.com/siderolabs/talos/issues/13440
>>> https://github.com/kubeovn/kube-ovn/issues/6767
>>>
>>> Thanks,
>>> Johan
>>
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [BUG] FORTIFY: memcpy overflow in skb_tunnel_info_unclone() from geneve_xmit()
2026-06-10 18:00 ` Ilya Maximets
@ 2026-06-10 18:21 ` Kuniyuki Iwashima
2026-06-10 19:41 ` Kees Cook
1 sibling, 0 replies; 7+ messages in thread
From: Kuniyuki Iwashima @ 2026-06-10 18:21 UTC (permalink / raw)
To: i.maximets; +Cc: dev, edumazet, kees, kuba, netdev, pabeni, write
From: Ilya Maximets <i.maximets@ovn.org>
Date: Wed, 10 Jun 2026 20:00:53 +0200
> On 6/10/26 3:10 PM, Johan Thomsen wrote:
> > Hi Ilya,
> >
> > Sorry for the late follow-up.
> > Your patch has now run without crashes in my setup for 36 hours and
> > I'm unable to re-trigger a panic.
> >
> >> But the operation in the diff above is kind of pointless as the memcpy
> >> itself will copy the value again.
> >
> > Right. So it would probably need at least a comment, if that ends up
> > becoming the final fix.
> > I'm not a kernel dev and I don't know what is the right thing to do here.
> >
> > Happy to test alternative patches if needed.
>
> OK. Thanks for testing.
>
> >
> > BR
> > Johan
> >
> > Den man. 8. jun. 2026 kl. 11.41 skrev Ilya Maximets <i.maximets@ovn.org>:
> >>
> >> On 6/8/26 10:25 AM, Johan Thomsen wrote:
> >>> Hello,
> >>>
> >>> I am seeing what looks like a kernel bug in the Geneve/OVS/vhost
> >>> transmit path on a Talos Linux node running Kube-ovn with Geneve
> >>> overlay and KubeVirt VM traffic.
> >>>
> >>> Environment:
> >>>
> >>> Kernel: 6.18.33-talos
> >>> Distro: Talos v1.13.3
> >>>
> >>> Compiler/config:
> >>>
> >>> CONFIG_CC_VERSION_TEXT="clang version 22.1.2"
> >>> CONFIG_CC_IS_CLANG=y
> >>> CONFIG_LTO=y
> >>> CONFIG_LTO_CLANG=y
> >>> CONFIG_LTO_CLANG_THIN=y
> >>> CONFIG_FORTIFY_SOURCE=y
> >>>
> >>> Hardware: HPE ProLiant DL325 Gen11, AMD EPYC
> >>>
> >>> NIC driver: bnxt_en
> >>>
> >>> Workload/network:
> >>>
> >>> Kube-OVN, Geneve overlay
> >>> Open vSwitch datapath
> >>> KubeVirt/QEMU VM traffic via vhost/tap
> >>>
> >>> Relevant console output:
> >>>
> >>> [ 648.742603] memcpy: detected buffer overflow: 104 byte write of
> >>> buffer size 96
> >>> [ 648.749907] WARNING: CPU: 61 PID: 27020 at
> >>> lib/string_helpers.c:1036 __fortify_report+0x45/0x60
> >>> [ 648.758689] Modules linked in: dm_round_robin dm_multipath lpfc
> >>> nvmet_fc nvmet intel_rapl_msr intel_rapl_common ahci nvme_auth bnxt_en
> >>> nvme hpilo hkdf libahci sp5100_tco watchdog k10temp
> >>> [ 648.775429] CPU: 61 UID: 107 PID: 27020 Comm: vhost-27002 Not
> >>> tainted 6.18.29-talos #1 PREEMPT(none)
> >>> [ 648.784735] Hardware name: HPE ProLiant DL325 Gen11/ProLiant DL325
> >>> Gen11, BIOS 2.84 11/05/2025
> >>> [ 648.890478] skb_tunnel_info_unclone+0x179/0x190
> >>> [ 648.895152] geneve_xmit+0x7fe/0xe00
> >>> [ 648.907240] dev_hard_start_xmit+0xa7/0x1f0
> >>> [ 648.911479] __dev_queue_xmit+0x864/0xf40
> >>> [ 648.919688] do_execute_actions+0x9b9/0x1be0
> >>> [ 648.927727] ovs_execute_actions+0x58/0x170
> >>> [ 648.931960] ovs_dp_process_packet+0xb1/0x1c0
> >>> [ 648.936370] ovs_vport_receive+0x90/0x100
> >>> [ 648.940428] netdev_frame_hook+0x146/0x1a0
> >>> [ 648.954093] __netif_receive_skb+0x3f/0x160
> >>> [ 648.958324] process_backlog+0x10c/0x210
> >>> [ 648.962295] __napi_poll+0x2f/0x190
> >>> [ 648.965832] net_rx_action+0x2e3/0x500
> >>> [ 648.969632] handle_softirqs+0xe7/0x310
> >>> [ 648.985387] tun_get_user+0x137e/0x1510
> >>> [ 649.005878] handle_tx+0x41f/0xd30
> >>> [ 649.029014] vhost_run_work_list+0x52/0x90
> >>> [ 649.033162] vhost_task_fn+0xc2/0x140
> >>> [ 649.064145] ---[ end trace 0000000000000000 ]---
> >>> [ 649.068820] ------------[ cut here ]------------
> >>> [ 649.073489] kernel BUG at lib/string_helpers.c:1043!
> >>>
> >>> I don't know whether this is a real overflow or a FORTIFY false-positive.
> >>
> >> Looks like a false-positive from the __counted_by fortification.
> >>
> >> I'd guess something like this would fit it:
> >>
> >> diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h
> >> index 1fc2fb03ce3f9..e51c3795da474 100644
> >> --- a/include/net/dst_metadata.h
> >> +++ b/include/net/dst_metadata.h
> >> @@ -164,6 +164,7 @@ static inline struct metadata_dst *tun_dst_unclone(struct sk_buff *skb)
> >> if (!new_md)
> >> return ERR_PTR(-ENOMEM);
> >>
> >> + new_md->u.tun_info.options_len = md_size;
> >> memcpy(&new_md->u.tun_info, &md_dst->u.tun_info,
> >> sizeof(struct ip_tunnel_info) + md_size);
> >> #ifdef CONFIG_DST_CACHE
> >> ---
> >>
> >> Johan, could you try this in your setup?
> >>
> >> The memory was actually allocated for the options, but the structure
> >> is zeroed out on allocation, so the __counted_by check doesn't work
> >> properly for the initial initialization copy.
> >>
> >> But the operation in the diff above is kind of pointless as the memcpy
> >> itself will copy the value again. So, I'm not sure if that's the right
> >> solution here.
> >>
> >> Alternative might be to revert the kmalloc_flex back to the simple
> >> kmalloc in metadata_dst_alloc.
>
> A little less icky alternative might be to just split the copy in two:
I'd simply use unsafe_memcpy().
>
> diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h
> index 1fc2fb03ce3f9..996ae8350360a 100644
> --- a/include/net/dst_metadata.h
> +++ b/include/net/dst_metadata.h
> @@ -164,8 +164,12 @@ static inline struct metadata_dst *tun_dst_unclone(struct sk_buff *skb)
> if (!new_md)
> return ERR_PTR(-ENOMEM);
>
> + /* Copy in two stages to keep the __counted_by happy. */
> memcpy(&new_md->u.tun_info, &md_dst->u.tun_info,
> - sizeof(struct ip_tunnel_info) + md_size);
> + sizeof(struct ip_tunnel_info));
> + memcpy(ip_tunnel_info_opts(&new_md->u.tun_info),
> + ip_tunnel_info_opts(&md_dst->u.tun_info),
> + md_size);
> #ifdef CONFIG_DST_CACHE
> /* Unclone the dst cache if there is one */
> if (new_md->u.tun_info.dst_cache.cache) {
> ---
>
> Adding netdev maintainers for more opinions.
>
> >>
> >> CC: Kees
> >>
> >> Best regards, Ilya Maximets.
> >>
> >>>
> >>> I cannot reproduce the issue on Talos v1.12.X which uses a gcc built
> >>> kernel, whereas the affected kernel is built with clang. Don't know
> >>> whether this is relevant here.
> >>>
> >>> I am currently trying to make a reliable reproducer, but I can almost
> >>> always trigger the issue when iperf stressing the VM-network.
> >>>
> >>> Please let me know if this should go to a more specific
> >>> maintainer/list or further info is needed. I am able to test candidate
> >>> patches if provided.
> >>>
> >>> Downstream bug reports:
> >>> https://github.com/siderolabs/talos/issues/13440
> >>> https://github.com/kubeovn/kube-ovn/issues/6767
> >>>
> >>> Thanks,
> >>> Johan
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [BUG] FORTIFY: memcpy overflow in skb_tunnel_info_unclone() from geneve_xmit()
2026-06-10 18:00 ` Ilya Maximets
2026-06-10 18:21 ` Kuniyuki Iwashima
@ 2026-06-10 19:41 ` Kees Cook
1 sibling, 0 replies; 7+ messages in thread
From: Kees Cook @ 2026-06-10 19:41 UTC (permalink / raw)
To: Ilya Maximets
Cc: Johan Thomsen, Jakub Kicinski, Paolo Abeni, Eric Dumazet, netdev,
dev
On Wed, Jun 10, 2026 at 08:00:53PM +0200, Ilya Maximets wrote:
> On 6/10/26 3:10 PM, Johan Thomsen wrote:
> > Hi Ilya,
> >
> > Sorry for the late follow-up.
> > Your patch has now run without crashes in my setup for 36 hours and
> > I'm unable to re-trigger a panic.
> >
> >> But the operation in the diff above is kind of pointless as the memcpy
> >> itself will copy the value again.
> >
> > Right. So it would probably need at least a comment, if that ends up
> > becoming the final fix.
> > I'm not a kernel dev and I don't know what is the right thing to do here.
> >
> > Happy to test alternative patches if needed.
>
> OK. Thanks for testing.
>
> >
> > BR
> > Johan
> >
> > Den man. 8. jun. 2026 kl. 11.41 skrev Ilya Maximets <i.maximets@ovn.org>:
> >>
> >> On 6/8/26 10:25 AM, Johan Thomsen wrote:
> >>> Hello,
> >>>
> >>> I am seeing what looks like a kernel bug in the Geneve/OVS/vhost
> >>> transmit path on a Talos Linux node running Kube-ovn with Geneve
> >>> overlay and KubeVirt VM traffic.
> >>>
> >>> Environment:
> >>>
> >>> Kernel: 6.18.33-talos
> >>> Distro: Talos v1.13.3
> >>>
> >>> Compiler/config:
> >>>
> >>> CONFIG_CC_VERSION_TEXT="clang version 22.1.2"
> >>> CONFIG_CC_IS_CLANG=y
> >>> CONFIG_LTO=y
> >>> CONFIG_LTO_CLANG=y
> >>> CONFIG_LTO_CLANG_THIN=y
> >>> CONFIG_FORTIFY_SOURCE=y
> >>>
> >>> Hardware: HPE ProLiant DL325 Gen11, AMD EPYC
> >>>
> >>> NIC driver: bnxt_en
> >>>
> >>> Workload/network:
> >>>
> >>> Kube-OVN, Geneve overlay
> >>> Open vSwitch datapath
> >>> KubeVirt/QEMU VM traffic via vhost/tap
> >>>
> >>> Relevant console output:
> >>>
> >>> [ 648.742603] memcpy: detected buffer overflow: 104 byte write of
> >>> buffer size 96
> >>> [ 648.749907] WARNING: CPU: 61 PID: 27020 at
> >>> lib/string_helpers.c:1036 __fortify_report+0x45/0x60
> >>> [ 648.758689] Modules linked in: dm_round_robin dm_multipath lpfc
> >>> nvmet_fc nvmet intel_rapl_msr intel_rapl_common ahci nvme_auth bnxt_en
> >>> nvme hpilo hkdf libahci sp5100_tco watchdog k10temp
> >>> [ 648.775429] CPU: 61 UID: 107 PID: 27020 Comm: vhost-27002 Not
> >>> tainted 6.18.29-talos #1 PREEMPT(none)
> >>> [ 648.784735] Hardware name: HPE ProLiant DL325 Gen11/ProLiant DL325
> >>> Gen11, BIOS 2.84 11/05/2025
> >>> [ 648.890478] skb_tunnel_info_unclone+0x179/0x190
> >>> [ 648.895152] geneve_xmit+0x7fe/0xe00
> >>> [ 648.907240] dev_hard_start_xmit+0xa7/0x1f0
> >>> [ 648.911479] __dev_queue_xmit+0x864/0xf40
> >>> [ 648.919688] do_execute_actions+0x9b9/0x1be0
> >>> [ 648.927727] ovs_execute_actions+0x58/0x170
> >>> [ 648.931960] ovs_dp_process_packet+0xb1/0x1c0
> >>> [ 648.936370] ovs_vport_receive+0x90/0x100
> >>> [ 648.940428] netdev_frame_hook+0x146/0x1a0
> >>> [ 648.954093] __netif_receive_skb+0x3f/0x160
> >>> [ 648.958324] process_backlog+0x10c/0x210
> >>> [ 648.962295] __napi_poll+0x2f/0x190
> >>> [ 648.965832] net_rx_action+0x2e3/0x500
> >>> [ 648.969632] handle_softirqs+0xe7/0x310
> >>> [ 648.985387] tun_get_user+0x137e/0x1510
> >>> [ 649.005878] handle_tx+0x41f/0xd30
> >>> [ 649.029014] vhost_run_work_list+0x52/0x90
> >>> [ 649.033162] vhost_task_fn+0xc2/0x140
> >>> [ 649.064145] ---[ end trace 0000000000000000 ]---
> >>> [ 649.068820] ------------[ cut here ]------------
> >>> [ 649.073489] kernel BUG at lib/string_helpers.c:1043!
> >>>
> >>> I don't know whether this is a real overflow or a FORTIFY false-positive.
> >>
> >> Looks like a false-positive from the __counted_by fortification.
> >>
> >> I'd guess something like this would fit it:
> >>
> >> diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h
> >> index 1fc2fb03ce3f9..e51c3795da474 100644
> >> --- a/include/net/dst_metadata.h
> >> +++ b/include/net/dst_metadata.h
> >> @@ -164,6 +164,7 @@ static inline struct metadata_dst *tun_dst_unclone(struct sk_buff *skb)
> >> if (!new_md)
> >> return ERR_PTR(-ENOMEM);
> >>
> >> + new_md->u.tun_info.options_len = md_size;
> >> memcpy(&new_md->u.tun_info, &md_dst->u.tun_info,
> >> sizeof(struct ip_tunnel_info) + md_size);
> >> #ifdef CONFIG_DST_CACHE
> >> ---
> >>
> >> Johan, could you try this in your setup?
> >>
> >> The memory was actually allocated for the options, but the structure
> >> is zeroed out on allocation, so the __counted_by check doesn't work
> >> properly for the initial initialization copy.
> >>
> >> But the operation in the diff above is kind of pointless as the memcpy
> >> itself will copy the value again. So, I'm not sure if that's the right
> >> solution here.
> >>
> >> Alternative might be to revert the kmalloc_flex back to the simple
> >> kmalloc in metadata_dst_alloc.
>
> A little less icky alternative might be to just split the copy in two:
>
> diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h
> index 1fc2fb03ce3f9..996ae8350360a 100644
> --- a/include/net/dst_metadata.h
> +++ b/include/net/dst_metadata.h
> @@ -164,8 +164,12 @@ static inline struct metadata_dst *tun_dst_unclone(struct sk_buff *skb)
> if (!new_md)
> return ERR_PTR(-ENOMEM);
>
> + /* Copy in two stages to keep the __counted_by happy. */
> memcpy(&new_md->u.tun_info, &md_dst->u.tun_info,
> - sizeof(struct ip_tunnel_info) + md_size);
> + sizeof(struct ip_tunnel_info));
> + memcpy(ip_tunnel_info_opts(&new_md->u.tun_info),
> + ip_tunnel_info_opts(&md_dst->u.tun_info),
> + md_size);
> #ifdef CONFIG_DST_CACHE
> /* Unclone the dst cache if there is one */
> if (new_md->u.tun_info.dst_cache.cache) {
> ---
>
> Adding netdev maintainers for more opinions.
>
> >>
> >> CC: Kees
Yeah, I think the split makes the most sense. This matches the proper
spans, and is what we've done in the past in several places. It'll all
get collapsed into the same binary output in most cases, so there's no
real down-side.
-Kees
--
Kees Cook
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [BUG] FORTIFY: memcpy overflow in skb_tunnel_info_unclone() from geneve_xmit()
2026-06-08 9:41 ` Ilya Maximets
2026-06-10 13:10 ` Johan Thomsen
@ 2026-06-10 19:51 ` Kees Cook
1 sibling, 0 replies; 7+ messages in thread
From: Kees Cook @ 2026-06-10 19:51 UTC (permalink / raw)
To: Ilya Maximets; +Cc: Johan Thomsen, netdev, dev
On Mon, Jun 08, 2026 at 11:41:37AM +0200, Ilya Maximets wrote:
> On 6/8/26 10:25 AM, Johan Thomsen wrote:
> > Hello,
> >
> > I am seeing what looks like a kernel bug in the Geneve/OVS/vhost
> > transmit path on a Talos Linux node running Kube-ovn with Geneve
> > overlay and KubeVirt VM traffic.
> >
> > Environment:
> >
> > Kernel: 6.18.33-talos
> > Distro: Talos v1.13.3
> >
> > Compiler/config:
> >
> > CONFIG_CC_VERSION_TEXT="clang version 22.1.2"
> > CONFIG_CC_IS_CLANG=y
> > CONFIG_LTO=y
> > CONFIG_LTO_CLANG=y
> > CONFIG_LTO_CLANG_THIN=y
> > CONFIG_FORTIFY_SOURCE=y
> >
> > Hardware: HPE ProLiant DL325 Gen11, AMD EPYC
> >
> > NIC driver: bnxt_en
> >
> > Workload/network:
> >
> > Kube-OVN, Geneve overlay
> > Open vSwitch datapath
> > KubeVirt/QEMU VM traffic via vhost/tap
> >
> > Relevant console output:
> >
> > [ 648.742603] memcpy: detected buffer overflow: 104 byte write of
> > buffer size 96
> > [ 648.749907] WARNING: CPU: 61 PID: 27020 at
> > lib/string_helpers.c:1036 __fortify_report+0x45/0x60
> > [ 648.758689] Modules linked in: dm_round_robin dm_multipath lpfc
> > nvmet_fc nvmet intel_rapl_msr intel_rapl_common ahci nvme_auth bnxt_en
> > nvme hpilo hkdf libahci sp5100_tco watchdog k10temp
> > [ 648.775429] CPU: 61 UID: 107 PID: 27020 Comm: vhost-27002 Not
> > tainted 6.18.29-talos #1 PREEMPT(none)
> > [ 648.784735] Hardware name: HPE ProLiant DL325 Gen11/ProLiant DL325
> > Gen11, BIOS 2.84 11/05/2025
> > [ 648.890478] skb_tunnel_info_unclone+0x179/0x190
> > [ 648.895152] geneve_xmit+0x7fe/0xe00
> > [ 648.907240] dev_hard_start_xmit+0xa7/0x1f0
> > [ 648.911479] __dev_queue_xmit+0x864/0xf40
> > [ 648.919688] do_execute_actions+0x9b9/0x1be0
> > [ 648.927727] ovs_execute_actions+0x58/0x170
> > [ 648.931960] ovs_dp_process_packet+0xb1/0x1c0
> > [ 648.936370] ovs_vport_receive+0x90/0x100
> > [ 648.940428] netdev_frame_hook+0x146/0x1a0
> > [ 648.954093] __netif_receive_skb+0x3f/0x160
> > [ 648.958324] process_backlog+0x10c/0x210
> > [ 648.962295] __napi_poll+0x2f/0x190
> > [ 648.965832] net_rx_action+0x2e3/0x500
> > [ 648.969632] handle_softirqs+0xe7/0x310
> > [ 648.985387] tun_get_user+0x137e/0x1510
> > [ 649.005878] handle_tx+0x41f/0xd30
> > [ 649.029014] vhost_run_work_list+0x52/0x90
> > [ 649.033162] vhost_task_fn+0xc2/0x140
> > [ 649.064145] ---[ end trace 0000000000000000 ]---
> > [ 649.068820] ------------[ cut here ]------------
> > [ 649.073489] kernel BUG at lib/string_helpers.c:1043!
> >
> > I don't know whether this is a real overflow or a FORTIFY false-positive.
>
> Looks like a false-positive from the __counted_by fortification.
>
> I'd guess something like this would fit it:
>
> diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h
> index 1fc2fb03ce3f9..e51c3795da474 100644
> --- a/include/net/dst_metadata.h
> +++ b/include/net/dst_metadata.h
> @@ -164,6 +164,7 @@ static inline struct metadata_dst *tun_dst_unclone(struct sk_buff *skb)
> if (!new_md)
> return ERR_PTR(-ENOMEM);
>
> + new_md->u.tun_info.options_len = md_size;
> memcpy(&new_md->u.tun_info, &md_dst->u.tun_info,
> sizeof(struct ip_tunnel_info) + md_size);
Speaking to this solution, it also makes sense, but does look redundant
to the memcpy that follows it. I wonder something more in between would
be better (the memcpy isn't needed to copy a struct, either):
new_md->u.tun_info = md_dst->u.tun_info;
memcpy(new_md->u.tun_info.options, md_dst->u.tun_info.options,
md_dst->u.tun_info.options_len);
Is this the only place in the kernel where a struct ip_tunnel_info is
being copied? The above really looks like an open-coded helper. :)
-Kees
--
Kees Cook
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2026-06-10 19:51 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-08 8:25 [BUG] FORTIFY: memcpy overflow in skb_tunnel_info_unclone() from geneve_xmit() Johan Thomsen
2026-06-08 9:41 ` Ilya Maximets
2026-06-10 13:10 ` Johan Thomsen
2026-06-10 18:00 ` Ilya Maximets
2026-06-10 18:21 ` Kuniyuki Iwashima
2026-06-10 19:41 ` Kees Cook
2026-06-10 19:51 ` Kees Cook
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox