* Namespaced network devices not cleaned up properly after execution of pmtu.sh kernel selftest
@ 2024-09-11 22:20 Mitchell Augustin
2024-09-13 2:13 ` Jakub Kicinski
0 siblings, 1 reply; 10+ messages in thread
From: Mitchell Augustin @ 2024-09-11 22:20 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Jiri Pirko, Sebastian Andrzej Siewior, Lorenzo Bianconi,
Daniel Borkmann, netdev, linux-kernel
Cc: Jacob Martin, dann frazier
Hello,
We recently identified a bug still impacting upstream, triggered
occasionally by one of the kernel selftests (net/pmtu.sh) that
sometimes causes the following behavior:
* One of this test's namespaced network devices does not get properly
cleaned up when the namespace is destroyed, evidenced by
`unregister_netdevice: waiting for veth_A-R1 to become free. Usage
count = 5` appearing in the dmesg output repeatedly
* Once we start to see the above `unregister_netdevice` message, an
un-cancelable hang will occur on subsequent attempts to run `modprobe
ip6_vti` or `rmmod ip6_vti`
Jacob and I have both investigated various conditions under which this
bug state does / does not occur, which is documented more thoroughly
in the following BugLink:
https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/2072501
We expect that veth_A-R1's refcount should be cleaned up by the time
execution of pmtu.sh finishes since the relevant namespaces are
deleted during cleanup of the test suite. We've observed this behavior
on several kernels, at least as old as stable branches like
linux-6.1.y and as recent as v6.11-rc6, so this does not seem like a
new regression. (I have not had a chance to test on rc7 yet.)
This issue also only occurs very infrequently, and reproducibility is
extremely susceptible to very minor timing variations in the pmtu.sh
test case. In fact, I was unable to reproduce the bug with the
version of pmtu.sh and lib.sh in v6.11-rc6. That is not because the
kernel is unaffected (it is affected, as confirmed by running an older
kernel's pmtu.sh on it), but because v6.11-rc6 introduces some unrelated
functional changes to the tests that cause a slightly longer test
execution time.
It is also difficult to reproduce the bug on slower CPUs, or even on
faster CPUs where the cpufreq scaling governor is not set to
`performance`.
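A quick way to confirm the governor setting before attempting a
reproduction might look like the sketch below. It assumes the standard
cpufreq sysfs layout (/sys/devices/system/cpu/cpuN/cpufreq/scaling_governor).
The function name `non_performance_cpus` and its optional root-directory
argument are illustrative only and are not part of the selftests.

```shell
#!/bin/sh
# Sketch: list every CPU whose cpufreq governor is not "performance".
# The optional first argument overrides the sysfs root; it exists only so
# the function can be pointed at a fake directory tree (an assumption made
# for illustration, not something the kernel selftests provide).
non_performance_cpus() {
    root="${1:-/sys/devices/system/cpu}"
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        # Skip unmatched globs and CPUs without cpufreq support.
        [ -r "$f" ] || continue
        gov=$(cat "$f")
        if [ "$gov" != "performance" ]; then
            cpu=${f#"$root"/}          # e.g. cpu3/cpufreq/scaling_governor
            echo "${cpu%%/*}: $gov"    # e.g. cpu3: schedutil
        fi
    done
}
```

Running `non_performance_cpus` with no argument prints nothing when every
CPU that exposes a governor is already set to `performance`.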
However, I can easily reproduce the issue on an Nvidia Grace/Hopper
machine (and other platforms with modern CPUs) with the performance
governor set by doing the following:
* Install/boot any affected kernel
* Clone the kernel tree just to get an older version of the test cases
without subtle timing changes that mask the issue (such as
https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/noble/tree/?h=Ubuntu-6.8.0-39.39)
* cd tools/testing/selftests/net
* while true; do sudo ./pmtu.sh pmtu_ipv6_ipv6_exception; done
If running on an appropriately fast CPU, you should start seeing
`unregister_netdevice: waiting for veth_A-R1 to become free. Usage
count = 5` in dmesg at some point. (On Grace/Hopper, it happens in
under a minute, reliably). After that point, attempts to interact with
ip6_vti will hang.
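The loop above never stops on its own. A minimal sketch of a variant that
stops once the message shows up is given below. It assumes it is run from
tools/testing/selftests/net with sudo available, and that the kernel log
did not already contain the message before the first run. The function
names `leak_reported` and `run_repro` and the run limit are illustrative
and not part of pmtu.sh.

```shell
#!/bin/sh
# Sketch: rerun one pmtu.sh sub-test until the kernel log reports a
# network device that cannot be freed. Names and limits are hypothetical.

PATTERN='unregister_netdevice: waiting for .* to become free'

# Succeeds if the text on stdin contains the stuck-refcount message.
leak_reported() {
    grep -q "$PATTERN"
}

# Run the sub-test up to $1 times (default 200), checking dmesg after each run.
run_repro() {
    max_runs="${1:-200}"
    i=0
    while [ "$i" -lt "$max_runs" ]; do
        i=$((i + 1))
        sudo ./pmtu.sh pmtu_ipv6_ipv6_exception >/dev/null 2>&1
        if sudo dmesg | leak_reported; then
            echo "message seen after $i runs"
            return 0
        fi
    done
    echo "message not seen in $max_runs runs"
    return 1
}
```

Calling `run_repro 500` would cap the attempt at 500 iterations. The kernel
prints the message some time after the device teardown stalls, so the
reported run count is an upper bound rather than the exact triggering run.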
Please let me know if there is any other info I can provide to assist
in debugging this.
Thanks,
Mitchell Augustin
Software Engineer - Ubuntu Partner Engineering
* Re: Namespaced network devices not cleaned up properly after execution of pmtu.sh kernel selftest
2024-09-11 22:20 Namespaced network devices not cleaned up properly after execution of pmtu.sh kernel selftest Mitchell Augustin
@ 2024-09-13 2:13 ` Jakub Kicinski
2024-09-13 13:45 ` Mitchell Augustin
0 siblings, 1 reply; 10+ messages in thread
From: Jakub Kicinski @ 2024-09-13 2:13 UTC (permalink / raw)
To: Mitchell Augustin
Cc: David S. Miller, Eric Dumazet, Paolo Abeni, Jiri Pirko,
Sebastian Andrzej Siewior, Lorenzo Bianconi, Daniel Borkmann,
netdev, linux-kernel, Jacob Martin, dann frazier
On Wed, 11 Sep 2024 17:20:29 -0500 Mitchell Augustin wrote:
> We recently identified a bug still impacting upstream, triggered
> occasionally by one of the kernel selftests (net/pmtu.sh) that
> sometimes causes the following behavior:
> * One of this tests's namespaced network devices does not get properly
> cleaned up when the namespace is destroyed, evidenced by
> `unregister_netdevice: waiting for veth_A-R1 to become free. Usage
> count = 5` appearing in the dmesg output repeatedly
> * Once we start to see the above `unregister_netdevice` message, an
> un-cancelable hang will occur on subsequent attempts to run `modprobe
> ip6_vti` or `rmmod ip6_vti`
Thanks for the report! We have seen it in our CI as well, it happens
maybe once a day. But as you say, on x86 it is quite hard to reproduce,
and nothing obvious stood out as a culprit.
> However, I can easily reproduce the issue on an Nvidia Grace/Hopper
> machine (and other platforms with modern CPUs) with the performance
> governor set by doing the following:
> * Install/boot any affected kernel
> * Clone the kernel tree just to get an older version of the test cases
> without subtle timing changes that mask the issue (such as
> https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/noble/tree/?h=Ubuntu-6.8.0-39.39)
> * cd tools/testing/selftests/net
> * while true; do sudo ./pmtu.sh pmtu_ipv6_ipv6_exception; done
That's exciting! Would you be able to try to cut down the test itself
(it is quite long and has a ton of sub-cases) and figure out which
sub-cases trigger this? Maybe with an even quicker repro we'll bisect or
someone will correctly guess the fix.
Somewhat tangentially, but if you'd be willing, I wouldn't mind if you
were to send patches to break this test up upstream, too. It takes
1h23m to run with various debug kernel options enabled. If we split
it into multiple smaller tests each running 10min or 20min we can
then spawn multiple VMs and get the results faster.
* Re: Namespaced network devices not cleaned up properly after execution of pmtu.sh kernel selftest
2024-09-13 2:13 ` Jakub Kicinski
@ 2024-09-13 13:45 ` Mitchell Augustin
2024-09-13 13:50 ` Eric Dumazet
2024-09-13 18:51 ` Jakub Kicinski
0 siblings, 2 replies; 10+ messages in thread
From: Mitchell Augustin @ 2024-09-13 13:45 UTC (permalink / raw)
To: Jakub Kicinski
Cc: David S. Miller, Eric Dumazet, Paolo Abeni, Jiri Pirko,
Sebastian Andrzej Siewior, Lorenzo Bianconi, Daniel Borkmann,
netdev, linux-kernel, Jacob Martin, dann frazier
Hi Jakub,
Executing ./pmtu.sh pmtu_ipv6_ipv6_exception manually will only
trigger the pmtu_ipv6_ipv6_exception sub-case, which only takes a
second to run on my machines, so you shouldn't need to run the
entirety of pmtu.sh to trigger the bug. It won't trigger on attempt
#1, but in my experience, when I do it in that while loop, it will
trigger in under a minute reliably.
> Somewhat tangentially but if you'd be willing I wouldn't mind if you
> were to send patches to break this test up upstream, too. It takes
> 1h23m to run with various debug kernel options enabled. If we split
> it into multiple smaller tests each running 10min or 20min we can
> then spawn multiple VMs and get the results faster.
This logical division of tests already exists in pmtu.sh if you pass a
sub-test name in as the first parameter like above, but if you think
there would be value in separating them out further or into different
files not all in pmtu.sh, I would be happy to help with that. Just let
me know.
Regardless, I will go ahead and work on a new regression test that
executes just our quick reproducer for this specific bug and will send
it to this list.
Thanks,
Mitchell Augustin
--
Mitchell Augustin
Software Engineer - Ubuntu Partner Engineering
* Re: Namespaced network devices not cleaned up properly after execution of pmtu.sh kernel selftest
2024-09-13 13:45 ` Mitchell Augustin
@ 2024-09-13 13:50 ` Eric Dumazet
2024-09-13 18:51 ` Jakub Kicinski
1 sibling, 0 replies; 10+ messages in thread
From: Eric Dumazet @ 2024-09-13 13:50 UTC (permalink / raw)
To: Mitchell Augustin
Cc: Jakub Kicinski, David S. Miller, Paolo Abeni, Jiri Pirko,
Sebastian Andrzej Siewior, Lorenzo Bianconi, Daniel Borkmann,
netdev, linux-kernel, Jacob Martin, dann frazier
On Fri, Sep 13, 2024 at 3:45 PM Mitchell Augustin
<mitchell.augustin@canonical.com> wrote:
>
> Hi Jakub,
> Executing ./pmtu.sh pmtu_ipv6_ipv6_exception manually will only
> trigger the pmtu_ipv6_ipv6_exception sub-case, which only takes a
> second to run on my machines, so you shouldn't need to run the
> entirety of pmtu.sh to trigger the bug. It won't trigger on attempt
> #1, but in my experience, when I do it in that while loop, it will
> trigger in under a minute reliably.
>
> > Somewhat tangentially but if you'd be willing I wouldn't mind if you
> > were to send patches to break this test up upstream, too. It takes
> > 1h23m to run with various debug kernel options enabled. If we split
> > it into multiple smaller tests each running 10min or 20min we can
> > then spawn multiple VMs and get the results faster.
>
> This logical division of tests already exists in pmtu.sh if you pass a
> sub-test name in as the first parameter like above, but if you think
> there would be value in separating them out further or into different
> files not all in pmtu.sh, I would be happy to help with that. Just let
> me know.
>
> Regardless, I will go ahead and work on a new regression test that
> executes just our quick reproducer for this specific bug and will send
> it to this list.
>
> Thanks,
> Mitchell Augustin
Note that this issue has been discussed already with Paolo Abeni.
The problem lies in dst_cache infrastructure.
* Re: Namespaced network devices not cleaned up properly after execution of pmtu.sh kernel selftest
2024-09-13 13:45 ` Mitchell Augustin
2024-09-13 13:50 ` Eric Dumazet
@ 2024-09-13 18:51 ` Jakub Kicinski
2024-09-13 18:59 ` Mitchell Augustin
1 sibling, 1 reply; 10+ messages in thread
From: Jakub Kicinski @ 2024-09-13 18:51 UTC (permalink / raw)
To: Mitchell Augustin
Cc: David S. Miller, Eric Dumazet, Paolo Abeni, Jiri Pirko,
Sebastian Andrzej Siewior, Lorenzo Bianconi, Daniel Borkmann,
netdev, linux-kernel, Jacob Martin, dann frazier
On Fri, 13 Sep 2024 08:45:22 -0500 Mitchell Augustin wrote:
> Executing ./pmtu.sh pmtu_ipv6_ipv6_exception manually will only
> trigger the pmtu_ipv6_ipv6_exception sub-case
Sorry, I missed that you identified the test case.
The split of the test is quite tangential, then.
* Re: Namespaced network devices not cleaned up properly after execution of pmtu.sh kernel selftest
2024-09-13 18:51 ` Jakub Kicinski
@ 2024-09-13 18:59 ` Mitchell Augustin
2024-09-16 19:25 ` Mitchell Augustin
0 siblings, 1 reply; 10+ messages in thread
From: Mitchell Augustin @ 2024-09-13 18:59 UTC (permalink / raw)
To: Jakub Kicinski
Cc: David S. Miller, Eric Dumazet, Paolo Abeni, Jiri Pirko,
Sebastian Andrzej Siewior, Lorenzo Bianconi, Daniel Borkmann,
netdev, linux-kernel, Jacob Martin, dann frazier
> Sorry, I missed that you identified the test case.
All good!
I will still plan to turn the reproducer for this bug into its own
regression test. I think there would still be value in having an
individual case that can more reliably trigger this specific issue.
Thanks,
--
Mitchell Augustin
Software Engineer - Ubuntu Partner Engineering
Email:mitchell.augustin@canonical.com
Location:Illinois, United States of America
canonical.com
ubuntu.com
* Re: Namespaced network devices not cleaned up properly after execution of pmtu.sh kernel selftest
2024-09-13 18:59 ` Mitchell Augustin
@ 2024-09-16 19:25 ` Mitchell Augustin
2024-09-23 20:01 ` Mitchell Augustin
0 siblings, 1 reply; 10+ messages in thread
From: Mitchell Augustin @ 2024-09-16 19:25 UTC (permalink / raw)
To: Jakub Kicinski
Cc: David S. Miller, Eric Dumazet, Paolo Abeni, Jiri Pirko,
Sebastian Andrzej Siewior, Lorenzo Bianconi, Daniel Borkmann,
netdev, linux-kernel, Jacob Martin, dann frazier
Linking in this thread as well - I submitted a patch to net-next with
a reproducer for just this bug. It works reliably on Grace/Grace on
v6.11 (and prior kernels already known to be affected), but I have not
had a chance to test it on other platforms yet. Let me know if I need
to adjust anything and whether it reproduces the bug on your machines.
Patch w/ reproducer:
https://lore.kernel.org/all/20240916191857.1082092-1-mitchell.augustin@canonical.com/
Thanks!
--
Mitchell Augustin
Software Engineer - Ubuntu Partner Engineering
* Re: Namespaced network devices not cleaned up properly after execution of pmtu.sh kernel selftest
2024-09-16 19:25 ` Mitchell Augustin
@ 2024-09-23 20:01 ` Mitchell Augustin
2024-09-23 20:14 ` Eric Dumazet
0 siblings, 1 reply; 10+ messages in thread
From: Mitchell Augustin @ 2024-09-23 20:01 UTC (permalink / raw)
To: Jakub Kicinski
Cc: David S. Miller, Eric Dumazet, Paolo Abeni, Jiri Pirko,
Sebastian Andrzej Siewior, Lorenzo Bianconi, Daniel Borkmann,
netdev, linux-kernel, Jacob Martin, dann frazier
Hi!
I'm wondering if anyone has taken a look at my reproducer yet. I'd
love to know if it has helped any of you reproduce the bug more
easily.
Patch w/ reproducer:
https://lore.kernel.org/all/20240916191857.1082092-1-mitchell.augustin@canonical.com/
Thanks,
Mitchell Augustin
--
Mitchell Augustin
Software Engineer - Ubuntu Partner Engineering
* Re: Namespaced network devices not cleaned up properly after execution of pmtu.sh kernel selftest
2024-09-23 20:01 ` Mitchell Augustin
@ 2024-09-23 20:14 ` Eric Dumazet
2024-09-24 16:21 ` Mitchell Augustin
0 siblings, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2024-09-23 20:14 UTC (permalink / raw)
To: Mitchell Augustin
Cc: Jakub Kicinski, David S. Miller, Paolo Abeni, Jiri Pirko,
Sebastian Andrzej Siewior, Lorenzo Bianconi, Daniel Borkmann,
netdev, linux-kernel, Jacob Martin, dann frazier
On Mon, Sep 23, 2024 at 10:01 PM Mitchell Augustin
<mitchell.augustin@canonical.com> wrote:
>
> Hi!
>
> I'm wondering if anyone has taken a look at my reproducer yet. I'd
> love to know if it has helped any of you reproduce the bug more
> easily.
>
> Patch w/ reproducer:
> https://lore.kernel.org/all/20240916191857.1082092-1-mitchell.augustin@canonical.com/
>
As I said before, we were aware of this issue, well before your report.
We have no efficient fix yet.
https://lore.kernel.org/netdev/202405311808.vqBTwxEf-lkp@intel.com/T/
You can disable dst_cache; this should remove the issue.
diff --git a/net/core/dst_cache.c b/net/core/dst_cache.c
index 70c634b9e7b02300188582a1634d5977838db132..53351ff58b35dbee37ff587f7ef8f72580d9e116 100644
--- a/net/core/dst_cache.c
+++ b/net/core/dst_cache.c
@@ -142,12 +142,7 @@ EXPORT_SYMBOL_GPL(dst_cache_get_ip6);
 
 int dst_cache_init(struct dst_cache *dst_cache, gfp_t gfp)
 {
-	dst_cache->cache = alloc_percpu_gfp(struct dst_cache_pcpu,
-					    gfp | __GFP_ZERO);
-	if (!dst_cache->cache)
-		return -ENOMEM;
-
-	dst_cache_reset(dst_cache);
+	dst_cache->cache = NULL;
 	return 0;
 }
 EXPORT_SYMBOL_GPL(dst_cache_init);
* Re: Namespaced network devices not cleaned up properly after execution of pmtu.sh kernel selftest
2024-09-23 20:14 ` Eric Dumazet
@ 2024-09-24 16:21 ` Mitchell Augustin
0 siblings, 0 replies; 10+ messages in thread
From: Mitchell Augustin @ 2024-09-24 16:21 UTC (permalink / raw)
To: Eric Dumazet
Cc: Jakub Kicinski, David S. Miller, Paolo Abeni, Jiri Pirko,
Sebastian Andrzej Siewior, Lorenzo Bianconi, Daniel Borkmann,
netdev, linux-kernel, Jacob Martin, dann frazier
> As I said before, we were aware of this issue, well before your report.
Yes, sorry - to clarify, I wasn't commenting on the state of the bug
itself, just asking whether my reproducer is helpful in exposing it
more easily/reliably and on more systems than other known methods, per
Jakub's original request.
In any case, thank you for the additional context on the bug itself.
-Mitchell Augustin
--
Mitchell Augustin
Software Engineer - Ubuntu Partner Engineering