* [regression] Wireguard fragmentation fails with VXLAN since 8930424777e4 ("tunnels: Accept PACKET_HOST skb_tunnel_check_pmtu().") causing network timeouts
@ 2025-07-14 19:57 Salvatore Bonaccorso
2025-07-15 9:43 ` Guillaume Nault
0 siblings, 1 reply; 4+ messages in thread
From: Salvatore Bonaccorso @ 2025-07-14 19:57 UTC (permalink / raw)
To: Guillaume Nault, Stefano Brivio, Aaron Conole, Jakub Kicinski,
David S. Miller, David Ahern, Eric Dumazet, Simon Horman, netdev,
Paolo Abeni, Charles Bordet
Cc: linux-kernel, regressions, stable, 1108860
Hi,
Charles Bordet reported the following issue (full context in
https://bugs.debian.org/1108860)
> Dear Maintainer,
>
> What led up to the situation?
> We run a production environment using Debian 12 VMs, with a network
> topology involving VXLAN tunnels encapsulated inside Wireguard
> interfaces. This setup has worked reliably for over a year, with MTU set
> to 1500 on all interfaces except the Wireguard interface (set to 1420).
> Wireguard kernel fragmentation allowed this configuration to function
> without issues, even though the effective path MTU is lower than 1500.
>
> What exactly did you do (or not do) that was effective (or ineffective)?
> We performed a routine system upgrade, updating all packages including the
> kernel. After the upgrade, we observed severe network issues (timeouts,
> very slow HTTP/HTTPS, and apt update failures) on all VMs behind the
> router. SSH and small-packet traffic continued to work.
>
> To diagnose, we:
>
> * Restored a backup (with the previous kernel): the problem disappeared.
> * Repeated the upgrade, confirming the issue reappeared.
> * Systematically tested each kernel version from 6.1.124-1 up to
> 6.1.140-1. The problem first appears with kernel 6.1.135-1; all earlier
> versions work as expected.
> * Kernel version from the backports (6.12.32-1) did not resolve the
> problem.
>
> What was the outcome of this action?
>
> * With kernel 6.1.135-1 or later, network timeouts occur for
> large-packet protocols (HTTP, apt, etc.), while SSH and small-packet
> protocols work.
> * With kernel 6.1.133-1 or earlier, everything works as expected.
>
> What outcome did you expect instead?
> We expected the network to function as before, with Wireguard handling
> fragmentation transparently and no application-level timeouts,
> regardless of the kernel version.
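For readers following along, the symptom split in the report (small packets pass, large ones time out) matches the encapsulation arithmetic. The sketch below is only an illustration: it assumes the VXLAN underlay runs over IPv4, which the report does not state, and all constant and function names are made up for this example.

```python
# Standard header sizes in bytes.
IPV4_HEADER = 20
UDP_HEADER = 8
VXLAN_HEADER = 8
INNER_ETHERNET_HEADER = 14

# Assumption: VXLAN is carried over IPv4 (not confirmed by the report).
# Overhead added on top of the inner IP packet: 20 + 8 + 8 + 14 = 50 bytes.
VXLAN_OVERHEAD = IPV4_HEADER + UDP_HEADER + VXLAN_HEADER + INNER_ETHERNET_HEADER

# MTU values taken from the report.
INNER_MTU = 1500      # all interfaces except WireGuard
WIREGUARD_MTU = 1420  # WireGuard interface


def vxlan_outer_packet_size(inner_ip_packet: int) -> int:
    """Size of the IP packet handed to the underlay after VXLAN encapsulation."""
    return inner_ip_packet + VXLAN_OVERHEAD


def needs_fragmentation(inner_ip_packet: int, underlay_mtu: int) -> bool:
    """True if the encapsulated packet does not fit in the underlay MTU."""
    return vxlan_outer_packet_size(inner_ip_packet) > underlay_mtu


def largest_unfragmented_inner_packet(underlay_mtu: int) -> int:
    """Largest inner IP packet that still fits without fragmentation."""
    return underlay_mtu - VXLAN_OVERHEAD


if __name__ == "__main__":
    print(vxlan_outer_packet_size(INNER_MTU))                # 1550
    print(needs_fragmentation(INNER_MTU, WIREGUARD_MTU))     # True
    print(largest_unfragmented_inner_packet(WIREGUARD_MTU))  # 1370
```

Under these assumptions, a full-size 1500-byte inner packet becomes 1550 bytes once encapsulated and exceeds the 1420-byte WireGuard MTU by 130 bytes, while anything up to 1370 bytes passes untouched. That would be consistent with SSH working and bulk HTTP transfers stalling once the oversized packets are no longer fragmented.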
While triaging the issue we found that commit 8930424777e4
("tunnels: Accept PACKET_HOST in skb_tunnel_check_pmtu().") introduces
the issue, and Charles confirmed that the issue is present as well in
6.12.35 and 6.15.4 (newer versions could potentially still be
affected, but we wanted to check that it is not a 6.1.y-specific
regression).
Reverting the commit fixes Charles' issue.
Does that ring a bell?
Regards,
Salvatore
* Re: [regression] Wireguard fragmentation fails with VXLAN since 8930424777e4 ("tunnels: Accept PACKET_HOST skb_tunnel_check_pmtu().") causing network timeouts
From: Guillaume Nault @ 2025-07-15 9:43 UTC (permalink / raw)
To: Salvatore Bonaccorso
Cc: Stefano Brivio, Aaron Conole, Jakub Kicinski, David S. Miller,
David Ahern, Eric Dumazet, Simon Horman, netdev, Paolo Abeni,
Charles Bordet, linux-kernel, regressions, stable, 1108860
On Mon, Jul 14, 2025 at 09:57:52PM +0200, Salvatore Bonaccorso wrote:
> While triaging the issue we found that commit 8930424777e4
> ("tunnels: Accept PACKET_HOST in skb_tunnel_check_pmtu().") introduces
> the issue, and Charles confirmed that the issue is present as well in
> 6.12.35 and 6.15.4 (newer versions could potentially still be
> affected, but we wanted to check that it is not a 6.1.y-specific
> regression).
>
> Reverting the commit fixes Charles' issue.
>
> Does that ring a bell?
It doesn't ring a bell. Do you have more details on the setup that has
the problem? Or, ideally, a self-contained reproducer?
* Re: [regression] Wireguard fragmentation fails with VXLAN since 8930424777e4 ("tunnels: Accept PACKET_HOST skb_tunnel_check_pmtu().") causing network timeouts
From: Aaron Conole @ 2025-07-16 12:44 UTC (permalink / raw)
To: Guillaume Nault
Cc: Salvatore Bonaccorso, Stefano Brivio, Jakub Kicinski,
David S. Miller, David Ahern, Eric Dumazet, Simon Horman, netdev,
Paolo Abeni, Charles Bordet, linux-kernel, regressions, stable,
1108860
Guillaume Nault <gnault@redhat.com> writes:
> On Mon, Jul 14, 2025 at 09:57:52PM +0200, Salvatore Bonaccorso wrote:
>> While triaging the issue we found that commit 8930424777e4
>> ("tunnels: Accept PACKET_HOST in skb_tunnel_check_pmtu().") introduces
>> the issue, and Charles confirmed that the issue is present as well in
>> 6.12.35 and 6.15.4 (newer versions could potentially still be
>> affected, but we wanted to check that it is not a 6.1.y-specific
>> regression).
>>
>> Reverting the commit fixes Charles' issue.
>>
>> Does that ring a bell?
>
> It doesn't ring a bell. Do you have more details on the setup that has
> the problem? Or, ideally, a self-contained reproducer?
+1 - I tested this patch with an OVS setup using vxlan and geneve
tunnels. A reproducer or more details would help.
* Re: Bug#1108860: [regression] Wireguard fragmentation fails with VXLAN since 8930424777e4 ("tunnels: Accept PACKET_HOST skb_tunnel_check_pmtu().") causing network timeouts
From: Salvatore Bonaccorso @ 2025-08-30 19:03 UTC (permalink / raw)
To: Aaron Conole, 1108860, Charles Bordet
Cc: Guillaume Nault, Stefano Brivio, Jakub Kicinski, David S. Miller,
David Ahern, Eric Dumazet, Simon Horman, netdev, Paolo Abeni,
Charles Bordet, linux-kernel, regressions, stable
Hi,
On Wed, Jul 16, 2025 at 08:44:55AM -0400, Aaron Conole wrote:
> Guillaume Nault <gnault@redhat.com> writes:
>
> > On Mon, Jul 14, 2025 at 09:57:52PM +0200, Salvatore Bonaccorso wrote:
> >> While triaging the issue we found that commit 8930424777e4
> >> ("tunnels: Accept PACKET_HOST in skb_tunnel_check_pmtu().") introduces
> >> the issue, and Charles confirmed that the issue is present as well in
> >> 6.12.35 and 6.15.4 (newer versions could potentially still be
> >> affected, but we wanted to check that it is not a 6.1.y-specific
> >> regression).
> >>
> >> Reverting the commit fixes Charles' issue.
> >>
> >> Does that ring a bell?
> >
> > It doesn't ring a bell. Do you have more details on the setup that has
> > the problem? Or, ideally, a self-contained reproducer?
>
> +1 - I tested this patch with an OVS setup using vxlan and geneve
> tunnels. A reproducer or more details would help.
Charles, any news here? Did you find a way to provide a
self-contained reproducer for your issue?
Does the issue still reproduce for you on the most current version of
each of the affected stable series?
Regards,
Salvatore