* [BUG] FORTIFY: memcpy overflow in skb_tunnel_info_unclone() from geneve_xmit()
From: Johan Thomsen @ 2026-06-08 8:25 UTC
To: netdev, dev

Hello,

I am seeing what looks like a kernel bug in the Geneve/OVS/vhost
transmit path on a Talos Linux node running Kube-OVN with Geneve
overlay and KubeVirt VM traffic.

Environment:

Kernel: 6.18.33-talos
Distro: Talos v1.13.3

Compiler/config:

CONFIG_CC_VERSION_TEXT="clang version 22.1.2"
CONFIG_CC_IS_CLANG=y
CONFIG_LTO=y
CONFIG_LTO_CLANG=y
CONFIG_LTO_CLANG_THIN=y
CONFIG_FORTIFY_SOURCE=y

Hardware: HPE ProLiant DL325 Gen11, AMD EPYC

NIC driver: bnxt_en

Workload/network:

Kube-OVN, Geneve overlay
Open vSwitch datapath
KubeVirt/QEMU VM traffic via vhost/tap

Relevant console output:

[ 648.742603] memcpy: detected buffer overflow: 104 byte write of buffer size 96
[ 648.749907] WARNING: CPU: 61 PID: 27020 at lib/string_helpers.c:1036 __fortify_report+0x45/0x60
[ 648.758689] Modules linked in: dm_round_robin dm_multipath lpfc nvmet_fc nvmet intel_rapl_msr intel_rapl_common ahci nvme_auth bnxt_en nvme hpilo hkdf libahci sp5100_tco watchdog k10temp
[ 648.775429] CPU: 61 UID: 107 PID: 27020 Comm: vhost-27002 Not tainted 6.18.29-talos #1 PREEMPT(none)
[ 648.784735] Hardware name: HPE ProLiant DL325 Gen11/ProLiant DL325 Gen11, BIOS 2.84 11/05/2025
[ 648.890478] skb_tunnel_info_unclone+0x179/0x190
[ 648.895152] geneve_xmit+0x7fe/0xe00
[ 648.907240] dev_hard_start_xmit+0xa7/0x1f0
[ 648.911479] __dev_queue_xmit+0x864/0xf40
[ 648.919688] do_execute_actions+0x9b9/0x1be0
[ 648.927727] ovs_execute_actions+0x58/0x170
[ 648.931960] ovs_dp_process_packet+0xb1/0x1c0
[ 648.936370] ovs_vport_receive+0x90/0x100
[ 648.940428] netdev_frame_hook+0x146/0x1a0
[ 648.954093] __netif_receive_skb+0x3f/0x160
[ 648.958324] process_backlog+0x10c/0x210
[ 648.962295] __napi_poll+0x2f/0x190
[ 648.965832] net_rx_action+0x2e3/0x500
[ 648.969632] handle_softirqs+0xe7/0x310
[ 648.985387] tun_get_user+0x137e/0x1510
[ 649.005878] handle_tx+0x41f/0xd30
[ 649.029014] vhost_run_work_list+0x52/0x90
[ 649.033162] vhost_task_fn+0xc2/0x140
[ 649.064145] ---[ end trace 0000000000000000 ]---
[ 649.068820] ------------[ cut here ]------------
[ 649.073489] kernel BUG at lib/string_helpers.c:1043!

I don't know whether this is a real overflow or a FORTIFY false-positive.

I cannot reproduce the issue on Talos v1.12.X, which uses a gcc-built
kernel, whereas the affected kernel is built with clang. I don't know
whether this is relevant here.

I am currently trying to make a reliable reproducer, but I can almost
always trigger the issue when stressing the VM network with iperf.

Please let me know if this should go to a more specific
maintainer/list or if further info is needed. I am able to test candidate
patches if provided.

Downstream bug reports:
https://github.com/siderolabs/talos/issues/13440
https://github.com/kubeovn/kube-ovn/issues/6767

Thanks,
Johan

* Re: [BUG] FORTIFY: memcpy overflow in skb_tunnel_info_unclone() from geneve_xmit()
From: Ilya Maximets @ 2026-06-08 9:41 UTC
To: Johan Thomsen, netdev, dev
Cc: i.maximets, Kees Cook

On 6/8/26 10:25 AM, Johan Thomsen wrote:
> [ 648.742603] memcpy: detected buffer overflow: 104 byte write of buffer size 96
> [ 648.890478] skb_tunnel_info_unclone+0x179/0x190
> [ 648.895152] geneve_xmit+0x7fe/0xe00
> [ 649.073489] kernel BUG at lib/string_helpers.c:1043!
>
> I don't know whether this is a real overflow or a FORTIFY false-positive.

Looks like a false-positive from the __counted_by fortification.

I'd guess something like this would fix it:

diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h
index 1fc2fb03ce3f9..e51c3795da474 100644
--- a/include/net/dst_metadata.h
+++ b/include/net/dst_metadata.h
@@ -164,6 +164,7 @@ static inline struct metadata_dst *tun_dst_unclone(struct sk_buff *skb)
 	if (!new_md)
 		return ERR_PTR(-ENOMEM);
 
+	new_md->u.tun_info.options_len = md_size;
 	memcpy(&new_md->u.tun_info, &md_dst->u.tun_info,
 	       sizeof(struct ip_tunnel_info) + md_size);
 #ifdef CONFIG_DST_CACHE
---

Johan, could you try this in your setup?

The memory was actually allocated for the options, but the structure
is zeroed out on allocation, so the __counted_by check doesn't work
properly for the initial initialization copy.

But the operation in the diff above is kind of pointless as the memcpy
itself will copy the value again. So, I'm not sure if that's the right
solution here.

An alternative might be to revert the kmalloc_flex back to the simple
kmalloc in metadata_dst_alloc.

CC: Kees

Best regards, Ilya Maximets.
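
To make the 96 vs. 104 numbers in the report concrete, here is a minimal userspace sketch of the check as described in the reply above. It is a model, not kernel code: `struct tun_info_model`, `model_bound()`, `checked_copy()` and `model_alloc()` are hypothetical names, and the 95-byte `fixed` member exists only so that the fixed part is 96 bytes, which is an assumption read off the log line rather than the real layout of `struct ip_tunnel_info`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct ip_tunnel_info; the real layout differs. */
struct tun_info_model {
	uint8_t fixed[95];   /* placeholder so the fixed part is 96 bytes (assumed) */
	uint8_t options_len; /* count that bounds the flexible array */
	uint8_t options[];   /* kernel: annotated __counted_by(options_len) */
};

/* Size the check assumes for the destination: fixed part + counted options. */
static size_t model_bound(const struct tun_info_model *dst)
{
	return sizeof(*dst) + dst->options_len;
}

/* Copy that rejects writes past the counted bound instead of calling BUG(). */
static bool checked_copy(struct tun_info_model *dst,
			 const struct tun_info_model *src, size_t len)
{
	assert(dst && src);
	if (len > model_bound(dst))
		return false;
	memcpy(dst, src, len);
	return true;
}

/* Zeroed allocation with room for opts_len option bytes. */
static struct tun_info_model *model_alloc(uint8_t opts_len)
{
	return calloc(1, sizeof(struct tun_info_model) + opts_len);
}
```

In this model, a freshly zeroed destination has `options_len == 0`, so a copy of 96 + 8 bytes is rejected against a 96-byte bound even though the allocation is large enough. Setting `options_len` on the destination first, as the diff does, raises the bound to 104 and the same copy passes.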

* Re: [BUG] FORTIFY: memcpy overflow in skb_tunnel_info_unclone() from geneve_xmit()
From: Johan Thomsen @ 2026-06-10 13:10 UTC
To: Ilya Maximets
Cc: netdev, dev, Kees Cook

Hi Ilya,

Sorry for the late follow-up.
Your patch has now run without crashes in my setup for 36 hours and
I'm unable to re-trigger a panic.

> But the operation in the diff above is kind of pointless as the memcpy
> itself will copy the value again.

Right. So it would probably need at least a comment, if that ends up
becoming the final fix.
I'm not a kernel dev and I don't know what the right thing to do here is.

Happy to test alternative patches if needed.

BR
Johan

On Mon, 8 Jun 2026 at 11:41, Ilya Maximets <i.maximets@ovn.org> wrote:
> Johan, could you try this in your setup?

* Re: [BUG] FORTIFY: memcpy overflow in skb_tunnel_info_unclone() from geneve_xmit()
From: Ilya Maximets @ 2026-06-10 18:00 UTC
To: Johan Thomsen, Jakub Kicinski, Paolo Abeni, Eric Dumazet
Cc: i.maximets, netdev, dev, Kees Cook

On 6/10/26 3:10 PM, Johan Thomsen wrote:
> Hi Ilya,
>
> Sorry for the late follow-up.
> Your patch has now run without crashes in my setup for 36 hours and
> I'm unable to re-trigger a panic.
>
>> But the operation in the diff above is kind of pointless as the memcpy
>> itself will copy the value again.
>
> Right. So it would probably need at least a comment, if that ends up
> becoming the final fix.
> I'm not a kernel dev and I don't know what is the right thing to do here.
>
> Happy to test alternative patches if needed.

OK. Thanks for testing.

>> Alternative might be to revert the kmalloc_flex back to the simple
>> kmalloc in metadata_dst_alloc.

A little less icky alternative might be to just split the copy in two:

diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h
index 1fc2fb03ce3f9..996ae8350360a 100644
--- a/include/net/dst_metadata.h
+++ b/include/net/dst_metadata.h
@@ -164,8 +164,12 @@ static inline struct metadata_dst *tun_dst_unclone(struct sk_buff *skb)
 	if (!new_md)
 		return ERR_PTR(-ENOMEM);
 
+	/* Copy in two stages to keep the __counted_by happy. */
 	memcpy(&new_md->u.tun_info, &md_dst->u.tun_info,
-	       sizeof(struct ip_tunnel_info) + md_size);
+	       sizeof(struct ip_tunnel_info));
+	memcpy(ip_tunnel_info_opts(&new_md->u.tun_info),
+	       ip_tunnel_info_opts(&md_dst->u.tun_info),
+	       md_size);
 #ifdef CONFIG_DST_CACHE
 	/* Unclone the dst cache if there is one */
 	if (new_md->u.tun_info.dst_cache.cache) {
---

Adding netdev maintainers for more opinions.
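
The two-stage copy proposed in this message can be sketched in userspace as follows. This is only a model under stated assumptions: `struct tun_info_model`, `bounded_memcpy()` and `copy_two_stage()` are hypothetical names, the 96-byte fixed part is assumed from the log in the original report, and the explicit bound arguments stand in for whatever size the fortified memcpy derives on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct ip_tunnel_info; the real layout differs. */
struct tun_info_model {
	uint8_t fixed[95];   /* placeholder so the fixed part is 96 bytes (assumed) */
	uint8_t options_len; /* count that bounds the flexible array */
	uint8_t options[];   /* kernel: annotated __counted_by(options_len) */
};

/* memcpy that refuses to write more than the stated bound. */
static bool bounded_memcpy(void *dst, size_t bound, const void *src, size_t len)
{
	assert(dst && src);
	if (len > bound)
		return false;
	memcpy(dst, src, len);
	return true;
}

/* The caller must have allocated dst with room for src's options. */
static bool copy_two_stage(struct tun_info_model *dst,
			   const struct tun_info_model *src)
{
	/* Stage 1: fixed part only, which also brings options_len across. */
	if (!bounded_memcpy(dst, sizeof(*dst), src, sizeof(*src)))
		return false;

	/* Stage 2: options, bounded by the count that stage 1 just copied. */
	return bounded_memcpy(dst->options, dst->options_len,
			      src->options, src->options_len);
}
```

Neither stage depends on the destination's count being set in advance: the first write never extends past the fixed part, and by the time the options are written the destination's `options_len` already matches the source.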

* Re: [BUG] FORTIFY: memcpy overflow in skb_tunnel_info_unclone() from geneve_xmit()
From: Kuniyuki Iwashima @ 2026-06-10 18:21 UTC
To: i.maximets
Cc: dev, edumazet, kees, kuba, netdev, pabeni, write

From: Ilya Maximets <i.maximets@ovn.org>
Date: Wed, 10 Jun 2026 20:00:53 +0200
> A little less icky alternative might be to just split the copy in two:

I'd simply use unsafe_memcpy().

> +	/* Copy in two stages to keep the __counted_by happy. */
> 	memcpy(&new_md->u.tun_info, &md_dst->u.tun_info,
> -	       sizeof(struct ip_tunnel_info) + md_size);
> +	       sizeof(struct ip_tunnel_info));
> +	memcpy(ip_tunnel_info_opts(&new_md->u.tun_info),
> +	       ip_tunnel_info_opts(&md_dst->u.tun_info),
> +	       md_size);
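
For comparison, the single-copy alternative suggested here might look like the sketch below. As far as I know the kernel's `unsafe_memcpy(dst, src, bytes, justification)` takes a written justification as its last argument and skips the fortify bounds check; the macro in this block is only a userspace stand-in with that shape, and `struct tun_info_model` and `copy_unchecked()` are hypothetical names with an assumed 96-byte fixed part.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Userspace stand-in for the kernel macro of the same name: the last
 * argument carries only a comment explaining why the copy is safe.
 */
#define unsafe_memcpy(dst, src, bytes, justification) memcpy(dst, src, bytes)

/* Hypothetical stand-in for struct ip_tunnel_info; the real layout differs. */
struct tun_info_model {
	uint8_t fixed[95];   /* placeholder so the fixed part is 96 bytes (assumed) */
	uint8_t options_len; /* count that bounds the flexible array */
	uint8_t options[];   /* kernel: annotated __counted_by(options_len) */
};

/* One unchecked copy of fixed part plus options; dst must have room for both. */
static void copy_unchecked(struct tun_info_model *dst,
			   const struct tun_info_model *src)
{
	assert(dst && src);
	unsafe_memcpy(dst, src, sizeof(*src) + src->options_len,
		      /* dst was allocated with room for src's options */);
}
```

The trade-off against the split copy is that nothing verifies the length here; correctness rests entirely on the allocation and on the justification comment staying true.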

* Re: [BUG] FORTIFY: memcpy overflow in skb_tunnel_info_unclone() from geneve_xmit()
From: Kees Cook @ 2026-06-10 19:41 UTC
To: Ilya Maximets
Cc: Johan Thomsen, Jakub Kicinski, Paolo Abeni, Eric Dumazet, netdev, dev

On Wed, Jun 10, 2026 at 08:00:53PM +0200, Ilya Maximets wrote:
> A little less icky alternative might be to just split the copy in two:
>
> +	/* Copy in two stages to keep the __counted_by happy. */
> 	memcpy(&new_md->u.tun_info, &md_dst->u.tun_info,
> -	       sizeof(struct ip_tunnel_info) + md_size);
> +	       sizeof(struct ip_tunnel_info));
> +	memcpy(ip_tunnel_info_opts(&new_md->u.tun_info),
> +	       ip_tunnel_info_opts(&md_dst->u.tun_info),
> +	       md_size);
>
> Adding netdev maintainers for more opinions.

Yeah, I think the split makes the most sense. This matches the proper
spans, and is what we've done in the past in several places. It'll all
get collapsed into the same binary output in most cases, so there's no
real down-side.

-Kees

--
Kees Cook
* Re: [BUG] FORTIFY: memcpy overflow in skb_tunnel_info_unclone() from geneve_xmit()
  2026-06-08  9:41 ` Ilya Maximets
  2026-06-10 13:10   ` Johan Thomsen
@ 2026-06-10 19:51   ` Kees Cook
  1 sibling, 0 replies; 7+ messages in thread
From: Kees Cook @ 2026-06-10 19:51 UTC
To: Ilya Maximets; +Cc: Johan Thomsen, netdev, dev

On Mon, Jun 08, 2026 at 11:41:37AM +0200, Ilya Maximets wrote:
> On 6/8/26 10:25 AM, Johan Thomsen wrote:
> > Hello,
> >
> > I am seeing what looks like a kernel bug in the Geneve/OVS/vhost
> > transmit path on a Talos Linux node running Kube-ovn with Geneve
> > overlay and KubeVirt VM traffic.
> >
> > Environment:
> >
> > Kernel: 6.18.33-talos
> > Distro: Talos v1.13.3
> >
> > Compiler/config:
> >
> > CONFIG_CC_VERSION_TEXT="clang version 22.1.2"
> > CONFIG_CC_IS_CLANG=y
> > CONFIG_LTO=y
> > CONFIG_LTO_CLANG=y
> > CONFIG_LTO_CLANG_THIN=y
> > CONFIG_FORTIFY_SOURCE=y
> >
> > Hardware: HPE ProLiant DL325 Gen11, AMD EPYC
> >
> > NIC driver: bnxt_en
> >
> > Workload/network:
> >
> > Kube-OVN, Geneve overlay
> > Open vSwitch datapath
> > KubeVirt/QEMU VM traffic via vhost/tap
> >
> > Relevant console output:
> >
> > [ 648.742603] memcpy: detected buffer overflow: 104 byte write of
> > buffer size 96
> > [ 648.749907] WARNING: CPU: 61 PID: 27020 at
> > lib/string_helpers.c:1036 __fortify_report+0x45/0x60
> > [ 648.758689] Modules linked in: dm_round_robin dm_multipath lpfc
> > nvmet_fc nvmet intel_rapl_msr intel_rapl_common ahci nvme_auth bnxt_en
> > nvme hpilo hkdf libahci sp5100_tco watchdog k10temp
> > [ 648.775429] CPU: 61 UID: 107 PID: 27020 Comm: vhost-27002 Not
> > tainted 6.18.29-talos #1 PREEMPT(none)
> > [ 648.784735] Hardware name: HPE ProLiant DL325 Gen11/ProLiant DL325
> > Gen11, BIOS 2.84 11/05/2025
> > [ 648.890478] skb_tunnel_info_unclone+0x179/0x190
> > [ 648.895152] geneve_xmit+0x7fe/0xe00
> > [ 648.907240] dev_hard_start_xmit+0xa7/0x1f0
> > [ 648.911479] __dev_queue_xmit+0x864/0xf40
> > [ 648.919688] do_execute_actions+0x9b9/0x1be0
> > [ 648.927727] ovs_execute_actions+0x58/0x170
> > [ 648.931960] ovs_dp_process_packet+0xb1/0x1c0
> > [ 648.936370] ovs_vport_receive+0x90/0x100
> > [ 648.940428] netdev_frame_hook+0x146/0x1a0
> > [ 648.954093] __netif_receive_skb+0x3f/0x160
> > [ 648.958324] process_backlog+0x10c/0x210
> > [ 648.962295] __napi_poll+0x2f/0x190
> > [ 648.965832] net_rx_action+0x2e3/0x500
> > [ 648.969632] handle_softirqs+0xe7/0x310
> > [ 648.985387] tun_get_user+0x137e/0x1510
> > [ 649.005878] handle_tx+0x41f/0xd30
> > [ 649.029014] vhost_run_work_list+0x52/0x90
> > [ 649.033162] vhost_task_fn+0xc2/0x140
> > [ 649.064145] ---[ end trace 0000000000000000 ]---
> > [ 649.068820] ------------[ cut here ]------------
> > [ 649.073489] kernel BUG at lib/string_helpers.c:1043!
> >
> > I don't know whether this is a real overflow or a FORTIFY false-positive.
>
> Looks like a false-positive from the __counted_by fortification.
>
> I'd guess something like this would fit it:
>
> diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h
> index 1fc2fb03ce3f9..e51c3795da474 100644
> --- a/include/net/dst_metadata.h
> +++ b/include/net/dst_metadata.h
> @@ -164,6 +164,7 @@ static inline struct metadata_dst *tun_dst_unclone(struct sk_buff *skb)
>  	if (!new_md)
>  		return ERR_PTR(-ENOMEM);
>
> +	new_md->u.tun_info.options_len = md_size;
>  	memcpy(&new_md->u.tun_info, &md_dst->u.tun_info,
>  	       sizeof(struct ip_tunnel_info) + md_size);

Speaking to this solution, it also makes sense, but does look redundant
to the memcpy that follows it. I wonder if something more in between
would be better (the memcpy isn't needed to copy a struct, either):

	new_md->u.tun_info = md_dst->u.tun_info;
	memcpy(new_md->u.tun_info.options, md_dst->u.tun_info.options,
	       md_dst->u.tun_info.options_len);

Is this the only place in the kernel where a struct ip_tunnel_info is
being copied? The above really looks like an open-coded helper. :)

-Kees

-- 
Kees Cook
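As an illustration of the "open-coded helper" remark, the following is a rough userspace sketch of what such a helper could look like, not a proposed kernel patch. struct tun_opts_demo, tun_opts_demo_copy() and tun_opts_demo_selftest() are hypothetical names, the layout is not that of struct ip_tunnel_info, and the capacity argument is an assumption added for the sketch. It uses plain struct assignment for the fixed fields and one memcpy() for the trailing options.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct ip_tunnel_info (not the kernel layout). */
struct tun_opts_demo {
	unsigned int mode;
	size_t options_len;		/* number of valid bytes in options[] */
	unsigned char options[];	/* flexible array member */
};

/*
 * Hypothetical helper: copy src into dst, whose trailing storage can hold
 * dst_cap option bytes. Returns 0 on success, -EINVAL if src does not fit.
 */
static int tun_opts_demo_copy(struct tun_opts_demo *dst, size_t dst_cap,
			      const struct tun_opts_demo *src)
{
	if (!dst || !src || src->options_len > dst_cap)
		return -EINVAL;

	/* Struct assignment copies the fixed fields, including options_len. */
	*dst = *src;
	/* The flexible array is not part of the assignment; copy it explicitly. */
	memcpy(dst->options, src->options, src->options_len);
	return 0;
}

/* Small self-check: returns 0 if a copy round-trips, nonzero otherwise. */
static int tun_opts_demo_selftest(void)
{
	static const unsigned char opts[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
	struct tun_opts_demo *src = malloc(sizeof(*src) + sizeof(opts));
	struct tun_opts_demo *dst = malloc(sizeof(*dst) + sizeof(opts));
	int ret = 1;

	if (!src || !dst)
		goto out;

	src->mode = 1;
	src->options_len = sizeof(opts);
	memcpy(src->options, opts, sizeof(opts));

	if (tun_opts_demo_copy(dst, sizeof(opts), src) == 0 &&
	    dst->mode == 1 &&
	    dst->options_len == sizeof(opts) &&
	    memcmp(dst->options, opts, sizeof(opts)) == 0)
		ret = 0;
out:
	free(src);
	free(dst);
	return ret;
}
```

Keeping the two steps in one function means every caller copies the fixed fields and the options consistently, which is the motivation for a helper if the same pattern turns out to exist elsewhere.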
end of thread, other threads:[~2026-06-10 19:51 UTC | newest]

Thread overview: 7+ messages
2026-06-08  8:25 [BUG] FORTIFY: memcpy overflow in skb_tunnel_info_unclone() from geneve_xmit() Johan Thomsen
2026-06-08  9:41 ` Ilya Maximets
2026-06-10 13:10 ` Johan Thomsen
2026-06-10 18:00 ` Ilya Maximets
2026-06-10 18:21 ` Kuniyuki Iwashima
2026-06-10 19:41 ` Kees Cook
2026-06-10 19:51 ` Kees Cook