netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [TEST] TCP MD5 vs kmemleak
@ 2024-06-17 14:24 Jakub Kicinski
  2024-06-18  3:24 ` Dmitry Safonov
  0 siblings, 1 reply; 14+ messages in thread
From: Jakub Kicinski @ 2024-06-17 14:24 UTC (permalink / raw)
  To: Dmitry Safonov; +Cc: netdev@vger.kernel.org

Hi Dmitry!

We added kmemleak checks to the selftest runners, TCP AO/MD5 tests seem
to trip it:

https://netdev-3.bots.linux.dev/vmksft-tcp-ao-dbg/results/643761/4-unsigned-md5-ipv6/stdout

Could you take a look? kmemleak is not infallible, it could be a false
positive.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [TEST] TCP MD5 vs kmemleak
  2024-06-17 14:24 [TEST] TCP MD5 vs kmemleak Jakub Kicinski
@ 2024-06-18  3:24 ` Dmitry Safonov
  2024-06-18 14:40   ` Jakub Kicinski
  0 siblings, 1 reply; 14+ messages in thread
From: Dmitry Safonov @ 2024-06-18  3:24 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: netdev@vger.kernel.org

Hi Jakub,

thanks for pinging,

On Mon, 17 Jun 2024 at 15:24, Jakub Kicinski <kuba@kernel.org> wrote:
>
> Hi Dmitry!
>
> We added kmemleak checks to the selftest runners, TCP AO/MD5 tests seem
> to trip it:
>
> https://netdev-3.bots.linux.dev/vmksft-tcp-ao-dbg/results/643761/4-unsigned-md5-ipv6/stdout
>
> Could you take a look? kmemleak is not infallible, it could be a false
> positive.

Sure, that seems somewhat interesting, albeit at this moment not from
the TCP side :D

There is some pre-history to the related issue here:
https://lore.kernel.org/lkml/0000000000004d83170605e16003@google.com/

Which I was quite sure being addressed with what now is
https://git.kernel.org/linus/5f98fd034ca6

But now that I look at that commit, I see that kvfree_call_rcu() is
defined to __kvfree_call_rcu() under CONFIG_KASAN_GENERIC=n. And I
don't see the same kmemleak_ignore() on that path.

To double-check, you don't have kasan enabled on netdev runners, right?

And then straight away to another thought: the leak-report that you
get currently is ao_info, which should not happen if kfree_rcu() is
properly fixed.
But I'd expect kmemleak to be unhappy with ao keys freeing as well:
they are currently released with call_rcu(&key->rcu,
tcp_ao_key_free_rcu), which doesn't have a hint for kmemleak, too.

I'm going to take a look at it this week. Just to let you know, I'm
also looking at fixing those somewhat rare flakes on tcp-ao counters
checks (but that may be net-next material together with tracepoint
selftests).

Thanks,
             Dmitry

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [TEST] TCP MD5 vs kmemleak
  2024-06-18  3:24 ` Dmitry Safonov
@ 2024-06-18 14:40   ` Jakub Kicinski
  2024-06-18 16:17     ` Paul E. McKenney
  2024-06-18 16:30     ` Paolo Abeni
  0 siblings, 2 replies; 14+ messages in thread
From: Jakub Kicinski @ 2024-06-18 14:40 UTC (permalink / raw)
  To: Dmitry Safonov; +Cc: netdev@vger.kernel.org, rcu

On Tue, 18 Jun 2024 04:24:08 +0100 Dmitry Safonov wrote:
> Hi Jakub,
> 
> thanks for pinging,
> 
> On Mon, 17 Jun 2024 at 15:24, Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > Hi Dmitry!
> >
> > We added kmemleak checks to the selftest runners, TCP AO/MD5 tests seem
> > to trip it:
> >
> > https://netdev-3.bots.linux.dev/vmksft-tcp-ao-dbg/results/643761/4-unsigned-md5-ipv6/stdout
> >
> > Could you take a look? kmemleak is not infallible, it could be a false
> > positive.  
> 
> Sure, that seems somewhat interesting, albeit at this moment not from
> the TCP side :D
> 
> There is some pre-history to the related issue here:
> https://lore.kernel.org/lkml/0000000000004d83170605e16003@google.com/
> 
> Which I was quite sure being addressed with what now is
> https://git.kernel.org/linus/5f98fd034ca6
> 
> But now that I look at that commit, I see that kvfree_call_rcu() is
> defined to __kvfree_call_rcu() under CONFIG_KASAN_GENERIC=n. And I
> don't see the same kmemleak_ignore() on that path.
> 
> To double-check, you don't have kasan enabled on netdev runners, right?

We do:

CONFIG_KASAN=y
CONFIG_KASAN_GENERIC=y

here's the full config:
https://netdev-3.bots.linux.dev/vmksft-tcp-ao-dbg/results/645202/config

> And then straight away to another thought: the leak-report that you
> get currently is ao_info, which should not happen if kfree_rcu() is
> properly fixed.
> But I'd expect kmemleak to be unhappy with ao keys freeing as well:
> they are currently released with call_rcu(&key->rcu,
> tcp_ao_key_free_rcu), which doesn't have a hint for kmemleak, too.
> 
> I'm going to take a look at it this week. Just to let you know, I'm
> also looking at fixing those somewhat rare flakes on tcp-ao counters
> checks (but that may be net-next material together with tracepoint
> selftests).

Let me add rcu@ to CC, perhaps folks there can guide us on known false
positives with KASAN + kmemleak?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [TEST] TCP MD5 vs kmemleak
  2024-06-18 14:40   ` Jakub Kicinski
@ 2024-06-18 16:17     ` Paul E. McKenney
  2024-06-18 16:30     ` Paolo Abeni
  1 sibling, 0 replies; 14+ messages in thread
From: Paul E. McKenney @ 2024-06-18 16:17 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Dmitry Safonov, netdev@vger.kernel.org, rcu

On Tue, Jun 18, 2024 at 07:40:37AM -0700, Jakub Kicinski wrote:
> On Tue, 18 Jun 2024 04:24:08 +0100 Dmitry Safonov wrote:
> > Hi Jakub,
> > 
> > thanks for pinging,
> > 
> > On Mon, 17 Jun 2024 at 15:24, Jakub Kicinski <kuba@kernel.org> wrote:
> > >
> > > Hi Dmitry!
> > >
> > > We added kmemleak checks to the selftest runners, TCP AO/MD5 tests seem
> > > to trip it:
> > >
> > > https://netdev-3.bots.linux.dev/vmksft-tcp-ao-dbg/results/643761/4-unsigned-md5-ipv6/stdout
> > >
> > > Could you take a look? kmemleak is not infallible, it could be a false
> > > positive.  
> > 
> > Sure, that seems somewhat interesting, albeit at this moment not from
> > the TCP side :D
> > 
> > There is some pre-history to the related issue here:
> > https://lore.kernel.org/lkml/0000000000004d83170605e16003@google.com/
> > 
> > Which I was quite sure being addressed with what now is
> > https://git.kernel.org/linus/5f98fd034ca6
> > 
> > But now that I look at that commit, I see that kvfree_call_rcu() is
> > defined to __kvfree_call_rcu() under CONFIG_KASAN_GENERIC=n. And I
> > don't see the same kmemleak_ignore() on that path.
> > 
> > To double-check, you don't have kasan enabled on netdev runners, right?
> 
> We do:
> 
> CONFIG_KASAN=y
> CONFIG_KASAN_GENERIC=y
> 
> here's the full config:
> https://netdev-3.bots.linux.dev/vmksft-tcp-ao-dbg/results/645202/config
> 
> > And then straight away to another thought: the leak-report that you
> > get currently is ao_info, which should not happen if kfree_rcu() is
> > properly fixed.
> > But I'd expect kmemleak to be unhappy with ao keys freeing as well:
> > they are currently released with call_rcu(&key->rcu,
> > tcp_ao_key_free_rcu), which doesn't have a hint for kmemleak, too.
> > 
> > I'm going to take a look at it this week. Just to let you know, I'm
> > also looking at fixing those somewhat rare flakes on tcp-ao counters
> > checks (but that may be net-next material together with tracepoint
> > selftests).
> 
> Let me add rcu@ to CC, perhaps folks there can guide us on known false
> positives with KASAN + kmemleak?

I myself don't run KASAN with kmemleak, but maybe some of the others
on this list do.

What sort of hint should be we add to call_rcu()?  The memory is still
reachable in the garbage-collector sense.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [TEST] TCP MD5 vs kmemleak
  2024-06-18 14:40   ` Jakub Kicinski
  2024-06-18 16:17     ` Paul E. McKenney
@ 2024-06-18 16:30     ` Paolo Abeni
  2024-06-18 16:42       ` Paul E. McKenney
  1 sibling, 1 reply; 14+ messages in thread
From: Paolo Abeni @ 2024-06-18 16:30 UTC (permalink / raw)
  To: Jakub Kicinski, Dmitry Safonov; +Cc: netdev@vger.kernel.org, rcu

On Tue, 2024-06-18 at 07:40 -0700, Jakub Kicinski wrote:
> On Tue, 18 Jun 2024 04:24:08 +0100 Dmitry Safonov wrote:
> > Hi Jakub,
> > 
> > thanks for pinging,
> > 
> > On Mon, 17 Jun 2024 at 15:24, Jakub Kicinski <kuba@kernel.org> wrote:
> > > 
> > > Hi Dmitry!
> > > 
> > > We added kmemleak checks to the selftest runners, TCP AO/MD5 tests seem
> > > to trip it:
> > > 
> > > https://netdev-3.bots.linux.dev/vmksft-tcp-ao-dbg/results/643761/4-unsigned-md5-ipv6/stdout
> > > 
> > > Could you take a look? kmemleak is not infallible, it could be a false
> > > positive.  
> > 
> > Sure, that seems somewhat interesting, albeit at this moment not from
> > the TCP side :D
> > 
> > There is some pre-history to the related issue here:
> > https://lore.kernel.org/lkml/0000000000004d83170605e16003@google.com/
> > 
> > Which I was quite sure being addressed with what now is
> > https://git.kernel.org/linus/5f98fd034ca6
> > 
> > But now that I look at that commit, I see that kvfree_call_rcu() is
> > defined to __kvfree_call_rcu() under CONFIG_KASAN_GENERIC=n. And I
> > don't see the same kmemleak_ignore() on that path.
> > 
> > To double-check, you don't have kasan enabled on netdev runners, right?
> 
> We do:
> 
> CONFIG_KASAN=y
> CONFIG_KASAN_GENERIC=y
> 
> here's the full config:
> https://netdev-3.bots.linux.dev/vmksft-tcp-ao-dbg/results/645202/config
> 
> > And then straight away to another thought: the leak-report that you
> > get currently is ao_info, which should not happen if kfree_rcu() is
> > properly fixed.
> > But I'd expect kmemleak to be unhappy with ao keys freeing as well:
> > they are currently released with call_rcu(&key->rcu,
> > tcp_ao_key_free_rcu), which doesn't have a hint for kmemleak, too.
> > 
> > I'm going to take a look at it this week. Just to let you know, I'm
> > also looking at fixing those somewhat rare flakes on tcp-ao counters
> > checks (but that may be net-next material together with tracepoint
> > selftests).
> 
> Let me add rcu@ to CC, perhaps folks there can guide us on known false
> positives with KASAN + kmemleak?

FTR, with mptcp self-tests we hit a few kmemleak false positive on RCU
freed pointers, that where addressed by to this patch:

commit 5f98fd034ca6fd1ab8c91a3488968a0e9caaabf6
Author: Catalin Marinas <catalin.marinas@arm.com>
Date:   Sat Sep 30 17:46:56 2023 +0000

    rcu: kmemleak: Ignore kmemleak false positives when RCU-freeing objects

I'm wondering if this is hitting something similar? Possibly due to
lazy RCU callbacks invoked after MSECS_MIN_AGE???

Cheers,

Paolo


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [TEST] TCP MD5 vs kmemleak
  2024-06-18 16:30     ` Paolo Abeni
@ 2024-06-18 16:42       ` Paul E. McKenney
  2024-06-18 17:02         ` Jakub Kicinski
  0 siblings, 1 reply; 14+ messages in thread
From: Paul E. McKenney @ 2024-06-18 16:42 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: Jakub Kicinski, Dmitry Safonov, netdev@vger.kernel.org, rcu

On Tue, Jun 18, 2024 at 06:30:59PM +0200, Paolo Abeni wrote:
> On Tue, 2024-06-18 at 07:40 -0700, Jakub Kicinski wrote:
> > On Tue, 18 Jun 2024 04:24:08 +0100 Dmitry Safonov wrote:
> > > Hi Jakub,
> > > 
> > > thanks for pinging,
> > > 
> > > On Mon, 17 Jun 2024 at 15:24, Jakub Kicinski <kuba@kernel.org> wrote:
> > > > 
> > > > Hi Dmitry!
> > > > 
> > > > We added kmemleak checks to the selftest runners, TCP AO/MD5 tests seem
> > > > to trip it:
> > > > 
> > > > https://netdev-3.bots.linux.dev/vmksft-tcp-ao-dbg/results/643761/4-unsigned-md5-ipv6/stdout
> > > > 
> > > > Could you take a look? kmemleak is not infallible, it could be a false
> > > > positive.  
> > > 
> > > Sure, that seems somewhat interesting, albeit at this moment not from
> > > the TCP side :D
> > > 
> > > There is some pre-history to the related issue here:
> > > https://lore.kernel.org/lkml/0000000000004d83170605e16003@google.com/
> > > 
> > > Which I was quite sure being addressed with what now is
> > > https://git.kernel.org/linus/5f98fd034ca6
> > > 
> > > But now that I look at that commit, I see that kvfree_call_rcu() is
> > > defined to __kvfree_call_rcu() under CONFIG_KASAN_GENERIC=n. And I
> > > don't see the same kmemleak_ignore() on that path.
> > > 
> > > To double-check, you don't have kasan enabled on netdev runners, right?
> > 
> > We do:
> > 
> > CONFIG_KASAN=y
> > CONFIG_KASAN_GENERIC=y
> > 
> > here's the full config:
> > https://netdev-3.bots.linux.dev/vmksft-tcp-ao-dbg/results/645202/config
> > 
> > > And then straight away to another thought: the leak-report that you
> > > get currently is ao_info, which should not happen if kfree_rcu() is
> > > properly fixed.
> > > But I'd expect kmemleak to be unhappy with ao keys freeing as well:
> > > they are currently released with call_rcu(&key->rcu,
> > > tcp_ao_key_free_rcu), which doesn't have a hint for kmemleak, too.
> > > 
> > > I'm going to take a look at it this week. Just to let you know, I'm
> > > also looking at fixing those somewhat rare flakes on tcp-ao counters
> > > checks (but that may be net-next material together with tracepoint
> > > selftests).
> > 
> > Let me add rcu@ to CC, perhaps folks there can guide us on known false
> > positives with KASAN + kmemleak?
> 
> FTR, with mptcp self-tests we hit a few kmemleak false positive on RCU
> freed pointers, that where addressed by to this patch:
> 
> commit 5f98fd034ca6fd1ab8c91a3488968a0e9caaabf6
> Author: Catalin Marinas <catalin.marinas@arm.com>
> Date:   Sat Sep 30 17:46:56 2023 +0000
> 
>     rcu: kmemleak: Ignore kmemleak false positives when RCU-freeing objects
> 
> I'm wondering if this is hitting something similar? Possibly due to
> lazy RCU callbacks invoked after MSECS_MIN_AGE???

Fun!  ;-)

This commit handles memory passed to kfree_rcu() and friends, but
not memory passed to call_rcu() and friends.  Of course, call_rcu()
does not necessarily know the full extent of the memory passed to it,
for example, if passed a linked list, call_rcu() will know only about
the head of that list.

There are similar challenges with synchronize_rcu() and friends.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [TEST] TCP MD5 vs kmemleak
  2024-06-18 16:42       ` Paul E. McKenney
@ 2024-06-18 17:02         ` Jakub Kicinski
  2024-06-18 17:04           ` Uladzislau Rezki
  2024-06-18 17:47           ` Paul E. McKenney
  0 siblings, 2 replies; 14+ messages in thread
From: Jakub Kicinski @ 2024-06-18 17:02 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Paolo Abeni, Dmitry Safonov, netdev@vger.kernel.org, rcu

On Tue, 18 Jun 2024 09:42:35 -0700 Paul E. McKenney wrote:
> > FTR, with mptcp self-tests we hit a few kmemleak false positive on RCU
> > freed pointers, that where addressed by to this patch:
> > 
> > commit 5f98fd034ca6fd1ab8c91a3488968a0e9caaabf6
> > Author: Catalin Marinas <catalin.marinas@arm.com>
> > Date:   Sat Sep 30 17:46:56 2023 +0000
> > 
> >     rcu: kmemleak: Ignore kmemleak false positives when RCU-freeing objects
> > 
> > I'm wondering if this is hitting something similar? Possibly due to
> > lazy RCU callbacks invoked after MSECS_MIN_AGE???  

Dmitry mentioned this commit, too, but we use the same config for MPTCP
tests, and while we repro TCP AO failures quite frequently, mptcp
doesn't seem to have failed once.

> Fun!  ;-)
> 
> This commit handles memory passed to kfree_rcu() and friends, but
> not memory passed to call_rcu() and friends.  Of course, call_rcu()
> does not necessarily know the full extent of the memory passed to it,
> for example, if passed a linked list, call_rcu() will know only about
> the head of that list.
> 
> There are similar challenges with synchronize_rcu() and friends.

To be clear I think Dmitry was suspecting kfree_rcu(), he mentioned
call_rcu() as something he was expecting to have a similar issue but 
it in fact appeared immune.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [TEST] TCP MD5 vs kmemleak
  2024-06-18 17:02         ` Jakub Kicinski
@ 2024-06-18 17:04           ` Uladzislau Rezki
  2024-06-18 17:47           ` Paul E. McKenney
  1 sibling, 0 replies; 14+ messages in thread
From: Uladzislau Rezki @ 2024-06-18 17:04 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Paul E. McKenney, Paolo Abeni, Dmitry Safonov,
	netdev@vger.kernel.org, rcu

On Tue, Jun 18, 2024 at 10:02:10AM -0700, Jakub Kicinski wrote:
> On Tue, 18 Jun 2024 09:42:35 -0700 Paul E. McKenney wrote:
> > > FTR, with mptcp self-tests we hit a few kmemleak false positive on RCU
> > > freed pointers, that where addressed by to this patch:
> > > 
> > > commit 5f98fd034ca6fd1ab8c91a3488968a0e9caaabf6
> > > Author: Catalin Marinas <catalin.marinas@arm.com>
> > > Date:   Sat Sep 30 17:46:56 2023 +0000
> > > 
> > >     rcu: kmemleak: Ignore kmemleak false positives when RCU-freeing objects
> > > 
> > > I'm wondering if this is hitting something similar? Possibly due to
> > > lazy RCU callbacks invoked after MSECS_MIN_AGE???  
> 
> Dmitry mentioned this commit, too, but we use the same config for MPTCP
> tests, and while we repro TCP AO failures quite frequently, mptcp
> doesn't seem to have failed once.
> 
> > Fun!  ;-)
> > 
> > This commit handles memory passed to kfree_rcu() and friends, but
> > not memory passed to call_rcu() and friends.  Of course, call_rcu()
> > does not necessarily know the full extent of the memory passed to it,
> > for example, if passed a linked list, call_rcu() will know only about
> > the head of that list.
> > 
> > There are similar challenges with synchronize_rcu() and friends.
> 
> To be clear I think Dmitry was suspecting kfree_rcu(), he mentioned
> call_rcu() as something he was expecting to have a similar issue but 
> it in fact appeared immune.
> 
In the kfree_rcu() there is "an ignore" injection:

<snip>
	/*
	 * The kvfree_rcu() caller considers the pointer freed at this point
	 * and likely removes any references to it. Since the actual slab
	 * freeing (and kmemleak_free()) is deferred, tell kmemleak to ignore
	 * this object (no scanning or false positives reporting).
	 */
	kmemleak_ignore(ptr);

	// Set timer to drain after KFREE_DRAIN_JIFFIES.
	if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING)
		schedule_delayed_monitor_work(krcp);
<snip>

--
Uladzislau Rezki

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [TEST] TCP MD5 vs kmemleak
  2024-06-18 17:02         ` Jakub Kicinski
  2024-06-18 17:04           ` Uladzislau Rezki
@ 2024-06-18 17:47           ` Paul E. McKenney
  2024-06-19  0:33             ` Dmitry Safonov
  1 sibling, 1 reply; 14+ messages in thread
From: Paul E. McKenney @ 2024-06-18 17:47 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Paolo Abeni, Dmitry Safonov, netdev@vger.kernel.org, rcu

On Tue, Jun 18, 2024 at 10:02:10AM -0700, Jakub Kicinski wrote:
> On Tue, 18 Jun 2024 09:42:35 -0700 Paul E. McKenney wrote:
> > > FTR, with mptcp self-tests we hit a few kmemleak false positive on RCU
> > > freed pointers, that where addressed by to this patch:
> > > 
> > > commit 5f98fd034ca6fd1ab8c91a3488968a0e9caaabf6
> > > Author: Catalin Marinas <catalin.marinas@arm.com>
> > > Date:   Sat Sep 30 17:46:56 2023 +0000
> > > 
> > >     rcu: kmemleak: Ignore kmemleak false positives when RCU-freeing objects
> > > 
> > > I'm wondering if this is hitting something similar? Possibly due to
> > > lazy RCU callbacks invoked after MSECS_MIN_AGE???  
> 
> Dmitry mentioned this commit, too, but we use the same config for MPTCP
> tests, and while we repro TCP AO failures quite frequently, mptcp
> doesn't seem to have failed once.
> 
> > Fun!  ;-)
> > 
> > This commit handles memory passed to kfree_rcu() and friends, but
> > not memory passed to call_rcu() and friends.  Of course, call_rcu()
> > does not necessarily know the full extent of the memory passed to it,
> > for example, if passed a linked list, call_rcu() will know only about
> > the head of that list.
> > 
> > There are similar challenges with synchronize_rcu() and friends.
> 
> To be clear I think Dmitry was suspecting kfree_rcu(), he mentioned
> call_rcu() as something he was expecting to have a similar issue but 
> it in fact appeared immune.

Whew!!!  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [TEST] TCP MD5 vs kmemleak
  2024-06-18 17:47           ` Paul E. McKenney
@ 2024-06-19  0:33             ` Dmitry Safonov
  2024-06-20 17:02               ` Paul E. McKenney
  0 siblings, 1 reply; 14+ messages in thread
From: Dmitry Safonov @ 2024-06-19  0:33 UTC (permalink / raw)
  To: paulmck, Jakub Kicinski, Paolo Abeni; +Cc: netdev@vger.kernel.org, rcu

On Tue, 18 Jun 2024 at 18:47, Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Tue, Jun 18, 2024 at 10:02:10AM -0700, Jakub Kicinski wrote:
> > On Tue, 18 Jun 2024 09:42:35 -0700 Paul E. McKenney wrote:
[..]
> >
> > Dmitry mentioned this commit, too, but we use the same config for MPTCP
> > tests, and while we repro TCP AO failures quite frequently, mptcp
> > doesn't seem to have failed once.
[..]
> >
> > To be clear I think Dmitry was suspecting kfree_rcu(), he mentioned
> > call_rcu() as something he was expecting to have a similar issue but
> > it in fact appeared immune.
>
> Whew!!!  ;-)

I'm sorry guys, that was me being inadequate.
That's a real issue, rather than a false-positive:
https://lore.kernel.org/netdev/20240619-tcp-ao-required-leak-v1-1-6408f3c94247@gmail.com/

Thanks,
             Dmitry

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [TEST] TCP MD5 vs kmemleak
  2024-06-19  0:33             ` Dmitry Safonov
@ 2024-06-20 17:02               ` Paul E. McKenney
  2024-07-02  2:08                 ` Dmitry Safonov
  0 siblings, 1 reply; 14+ messages in thread
From: Paul E. McKenney @ 2024-06-20 17:02 UTC (permalink / raw)
  To: Dmitry Safonov; +Cc: Jakub Kicinski, Paolo Abeni, netdev@vger.kernel.org, rcu

On Wed, Jun 19, 2024 at 01:33:36AM +0100, Dmitry Safonov wrote:
> On Tue, 18 Jun 2024 at 18:47, Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > On Tue, Jun 18, 2024 at 10:02:10AM -0700, Jakub Kicinski wrote:
> > > On Tue, 18 Jun 2024 09:42:35 -0700 Paul E. McKenney wrote:
> [..]
> > >
> > > Dmitry mentioned this commit, too, but we use the same config for MPTCP
> > > tests, and while we repro TCP AO failures quite frequently, mptcp
> > > doesn't seem to have failed once.
> [..]
> > >
> > > To be clear I think Dmitry was suspecting kfree_rcu(), he mentioned
> > > call_rcu() as something he was expecting to have a similar issue but
> > > it in fact appeared immune.
> >
> > Whew!!!  ;-)
> 
> I'm sorry guys, that was me being inadequate.

I know that feeling!

> That's a real issue, rather than a false-positive:
> https://lore.kernel.org/netdev/20240619-tcp-ao-required-leak-v1-1-6408f3c94247@gmail.com/

So we need call_rcu() to mark memory flowing through it?  If so, we
need help from callers of call_rcu() in the case where more than one
object is being freed.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [TEST] TCP MD5 vs kmemleak
  2024-06-20 17:02               ` Paul E. McKenney
@ 2024-07-02  2:08                 ` Dmitry Safonov
  2024-07-08 11:05                   ` Catalin Marinas
  2024-07-08 18:29                   ` Paul E. McKenney
  0 siblings, 2 replies; 14+ messages in thread
From: Dmitry Safonov @ 2024-07-02  2:08 UTC (permalink / raw)
  To: paulmck
  Cc: Jakub Kicinski, Paolo Abeni, netdev@vger.kernel.org, rcu,
	Catalin Marinas

Hi Paul,

Sorry for the delayed answer, I got a respiratory infection, so was in
bed last week,

On Thu, 20 Jun 2024 at 18:02, Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Wed, Jun 19, 2024 at 01:33:36AM +0100, Dmitry Safonov wrote:
[..]
> > I'm sorry guys, that was me being inadequate.
>
> I know that feeling!

Thanks :)

[adding/quoting back the context of the thread that I cut previously]

> On Tue, Jun 18, 2024 at 10:02:10AM -0700, Jakub Kicinski wrote:
> > On Tue, 18 Jun 2024 09:42:35 -0700 Paul E. McKenney wrote:
> > > > FTR, with mptcp self-tests we hit a few kmemleak false positive on RCU
> > > > freed pointers, that where addressed by to this patch:
> > > >
> > > > commit 5f98fd034ca6fd1ab8c91a3488968a0e9caaabf6
> > > > Author: Catalin Marinas <catalin.marinas@arm.com>
> > > > Date:   Sat Sep 30 17:46:56 2023 +0000
> > > >
> > > >     rcu: kmemleak: Ignore kmemleak false positives when RCU-freeing objects
> > > >
> > > > I'm wondering if this is hitting something similar? Possibly due to
> > > > lazy RCU callbacks invoked after MSECS_MIN_AGE???
> >
> > Dmitry mentioned this commit, too, but we use the same config for MPTCP
> > tests, and while we repro TCP AO failures quite frequently, mptcp
> > doesn't seem to have failed once.
> >
> > > Fun!  ;-)
> > >
> > > This commit handles memory passed to kfree_rcu() and friends, but
> > > not memory passed to call_rcu() and friends.  Of course, call_rcu()
> > > does not necessarily know the full extent of the memory passed to it,
> > > for example, if passed a linked list, call_rcu() will know only about
> > > the head of that list.
> > >
> > > There are similar challenges with synchronize_rcu() and friends.
> >
> > To be clear I think Dmitry was suspecting kfree_rcu(), he mentioned
> > call_rcu() as something he was expecting to have a similar issue but
> > it in fact appeared immune.

On Thu, 20 Jun 2024 at 18:02, Paul E. McKenney <paulmck@kernel.org> wrote:
> > That's a real issue, rather than a false-positive:
> > https://lore.kernel.org/netdev/20240619-tcp-ao-required-leak-v1-1-6408f3c94247@gmail.com/
>
> So we need call_rcu() to mark memory flowing through it?  If so, we
> need help from callers of call_rcu() in the case where more than one
> object is being freed.

Not sure, I think one way to avoid explicitly marking pointers for
call_rcu() or even avoiding the patch above would be a hackery like
this:

diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index d5b6fba44fc9..7a5eb55a155c 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -1587,6 +1587,7 @@ static void kmemleak_cond_resched(struct
kmemleak_object *object)
 static void kmemleak_scan(void)
 {
        struct kmemleak_object *object;
+       unsigned long gp_start_state;
        struct zone *zone;
        int __maybe_unused i;
        int new_leaks = 0;
@@ -1630,6 +1631,7 @@ static void kmemleak_scan(void)
                        kmemleak_cond_resched(object);
        }
        rcu_read_unlock();
+       gp_start_state = get_state_synchronize_rcu();

 #ifdef CONFIG_SMP
        /* per-cpu sections scanning */
@@ -1690,6 +1692,13 @@ static void kmemleak_scan(void)
         */
        scan_gray_list();

+       /*
+        * Wait for the greylist objects potentially migrating from
+        * RCU callbacks or maybe getting freed.
+        */
+       cond_synchronize_rcu(gp_start_state);
+       rcu_barrier();
+
        /*
         * Check for new or unreferenced objects modified since the previous
         * scan and color them gray until the next scan.


-->8--

Not quite sure if this makes sense, my first time at kmemleak code,
adding Catalin.
But then if I didn't mess up, it's going to work only for one RCU
period, so in case some object calls rcu/kfree_rcu() from the
callback, it's going to be yet a false-positive.

Some other options/ideas:
- more RCU-invasive option which would be adding a mapping function to
segmented lists of callbacks, which would allow kmemleak to request
from RCU if the pointer is yet referenced by RCU.
- the third option is for kmemleak to trust RCU and generalize the
commit above, adding kmemleak_object::flags of OBJECT_RCU, which will
be set by call_rcu()/kfree_rcu() and unset once the callback is
invoked for RCU_DONE_TAIL.
- add kmemleak_object::update_jiffies or just directly touch
kmemleak_object::jiffies whenever the object has been modified (see
update_checksum()), that will ignore recently changed objects. As
rcu_head should be updated, that is going to automagically ignore
those deferred to RCU objects.

Not sure if I mis-looked anything and it seems there is no hurry in
fixing anything yet, as this is a theoretical issue at this moment.

Hopefully, these ideas don't look like a nonsense,
             Dmitry

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [TEST] TCP MD5 vs kmemleak
  2024-07-02  2:08                 ` Dmitry Safonov
@ 2024-07-08 11:05                   ` Catalin Marinas
  2024-07-08 18:29                   ` Paul E. McKenney
  1 sibling, 0 replies; 14+ messages in thread
From: Catalin Marinas @ 2024-07-08 11:05 UTC (permalink / raw)
  To: Dmitry Safonov
  Cc: paulmck, Jakub Kicinski, Paolo Abeni, netdev@vger.kernel.org, rcu

On Tue, Jul 02, 2024 at 03:08:42AM +0100, Dmitry Safonov wrote:
> On Thu, 20 Jun 2024 at 18:02, Paul E. McKenney <paulmck@kernel.org> wrote:
> > So we need call_rcu() to mark memory flowing through it?  If so, we
> > need help from callers of call_rcu() in the case where more than one
> > object is being freed.
> 
> Not sure, I think one way to avoid explicitly marking pointers for
> call_rcu() or even avoiding the patch above would be a hackery like
> this:
> 
> diff --git a/mm/kmemleak.c b/mm/kmemleak.c
> index d5b6fba44fc9..7a5eb55a155c 100644
> --- a/mm/kmemleak.c
> +++ b/mm/kmemleak.c
> @@ -1587,6 +1587,7 @@ static void kmemleak_cond_resched(struct
> kmemleak_object *object)
>  static void kmemleak_scan(void)
>  {
>         struct kmemleak_object *object;
> +       unsigned long gp_start_state;
>         struct zone *zone;
>         int __maybe_unused i;
>         int new_leaks = 0;
> @@ -1630,6 +1631,7 @@ static void kmemleak_scan(void)
>                         kmemleak_cond_resched(object);
>         }
>         rcu_read_unlock();
> +       gp_start_state = get_state_synchronize_rcu();
> 
>  #ifdef CONFIG_SMP
>         /* per-cpu sections scanning */
> @@ -1690,6 +1692,13 @@ static void kmemleak_scan(void)
>          */
>         scan_gray_list();
> 
> +       /*
> +        * Wait for the greylist objects potentially migrating from
> +        * RCU callbacks or maybe getting freed.
> +        */
> +       cond_synchronize_rcu(gp_start_state);
> +       rcu_barrier();
> +
>         /*
>          * Check for new or unreferenced objects modified since the previous
>          * scan and color them gray until the next scan.
> 
> 
> -->8--
> 
> Not quite sure if this makes sense, my first time at kmemleak code,
> adding Catalin.
> But then if I didn't mess up, it's going to work only for one RCU
> period, so in case some object calls rcu/kfree_rcu() from the
> callback, it's going to be yet a false-positive.

Yeah, I don't think this fully solves the problem.

> Some other options/ideas:
> - more RCU-invasive option which would be adding a mapping function to
> segmented lists of callbacks, which would allow kmemleak to request
> from RCU if the pointer is yet referenced by RCU.

This looks too invasive to me plus additional scanning cost.

> - the third option is for kmemleak to trust RCU and generalize the
> commit above, adding kmemleak_object::flags of OBJECT_RCU, which will
> be set by call_rcu()/kfree_rcu() and unset once the callback is
> invoked for RCU_DONE_TAIL.

This could work, the downside being a kmemleak object lookup every time
call_rcu() is invoked. Well, we do this now for kfree_rcu() already,
though we just mark the object as ignored once.

> - add kmemleak_object::update_jiffies or just directly touch
> kmemleak_object::jiffies whenever the object has been modified (see
> update_checksum()), that will ignore recently changed objects. As
> rcu_head should be updated, that is going to automagically ignore
> those deferred to RCU objects.

This looks the simplest, reset the jiffies every time it detected the
object being touched. However, I wonder whether we should introduce a
new variable to track this. 'jiffies' currently stores the creation time
(or thereabouts). We can change the 'age' reported in the sysfs as long
as no user script parses that.

I added the min age to kmemleak to avoid the early object being added to
some lists and getting reported as false positive. It looks like we have
a similar scenario when the object is getting freed: it is added to RCU
lists, disappears temporarily from kmemleak's view. So it does make
sense to employ similar delay as for creation.

That said, for the default kmemleak operation with scanning every 10min,
this wouldn't be needed. During one scan it detects the checksum changed
due to the rcu_head update. 10min later it should have been freed
already.

-- 
Catalin

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [TEST] TCP MD5 vs kmemleak
  2024-07-02  2:08                 ` Dmitry Safonov
  2024-07-08 11:05                   ` Catalin Marinas
@ 2024-07-08 18:29                   ` Paul E. McKenney
  1 sibling, 0 replies; 14+ messages in thread
From: Paul E. McKenney @ 2024-07-08 18:29 UTC (permalink / raw)
  To: Dmitry Safonov
  Cc: Jakub Kicinski, Paolo Abeni, netdev@vger.kernel.org, rcu,
	Catalin Marinas

On Tue, Jul 02, 2024 at 03:08:42AM +0100, Dmitry Safonov wrote:
> Hi Paul,
> 
> Sorry for the delayed answer, I got a respiratory infection, so was in
> bed last week,
> 
> On Thu, 20 Jun 2024 at 18:02, Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > On Wed, Jun 19, 2024 at 01:33:36AM +0100, Dmitry Safonov wrote:
> [..]
> > > I'm sorry guys, that was me being inadequate.
> >
> > I know that feeling!
> 
> Thanks :)
> 
> [adding/quoting back the context of the thread that I cut previously]
> 
> > On Tue, Jun 18, 2024 at 10:02:10AM -0700, Jakub Kicinski wrote:
> > > On Tue, 18 Jun 2024 09:42:35 -0700 Paul E. McKenney wrote:
> > > > > FTR, with mptcp self-tests we hit a few kmemleak false positive on RCU
> > > > > freed pointers, that where addressed by to this patch:
> > > > >
> > > > > commit 5f98fd034ca6fd1ab8c91a3488968a0e9caaabf6
> > > > > Author: Catalin Marinas <catalin.marinas@arm.com>
> > > > > Date:   Sat Sep 30 17:46:56 2023 +0000
> > > > >
> > > > >     rcu: kmemleak: Ignore kmemleak false positives when RCU-freeing objects
> > > > >
> > > > > I'm wondering if this is hitting something similar? Possibly due to
> > > > > lazy RCU callbacks invoked after MSECS_MIN_AGE???
> > >
> > > Dmitry mentioned this commit, too, but we use the same config for MPTCP
> > > tests, and while we repro TCP AO failures quite frequently, mptcp
> > > doesn't seem to have failed once.
> > >
> > > > Fun!  ;-)
> > > >
> > > > This commit handles memory passed to kfree_rcu() and friends, but
> > > > not memory passed to call_rcu() and friends.  Of course, call_rcu()
> > > > does not necessarily know the full extent of the memory passed to it,
> > > > for example, if passed a linked list, call_rcu() will know only about
> > > > the head of that list.
> > > >
> > > > There are similar challenges with synchronize_rcu() and friends.
> > >
> > > To be clear I think Dmitry was suspecting kfree_rcu(), he mentioned
> > > call_rcu() as something he was expecting to have a similar issue but
> > > it in fact appeared immune.
> 
> On Thu, 20 Jun 2024 at 18:02, Paul E. McKenney <paulmck@kernel.org> wrote:
> > > That's a real issue, rather than a false-positive:
> > > https://lore.kernel.org/netdev/20240619-tcp-ao-required-leak-v1-1-6408f3c94247@gmail.com/
> >
> > So we need call_rcu() to mark memory flowing through it?  If so, we
> > need help from callers of call_rcu() in the case where more than one
> > object is being freed.
> 
> Not sure, I think one way to avoid explicitly marking pointers for
> call_rcu() or even avoiding the patch above would be a hackery like
> this:
> 
> diff --git a/mm/kmemleak.c b/mm/kmemleak.c
> index d5b6fba44fc9..7a5eb55a155c 100644
> --- a/mm/kmemleak.c
> +++ b/mm/kmemleak.c
> @@ -1587,6 +1587,7 @@ static void kmemleak_cond_resched(struct
> kmemleak_object *object)
>  static void kmemleak_scan(void)
>  {
>         struct kmemleak_object *object;
> +       unsigned long gp_start_state;
>         struct zone *zone;
>         int __maybe_unused i;
>         int new_leaks = 0;
> @@ -1630,6 +1631,7 @@ static void kmemleak_scan(void)
>                         kmemleak_cond_resched(object);
>         }
>         rcu_read_unlock();
> +       gp_start_state = get_state_synchronize_rcu();
> 
>  #ifdef CONFIG_SMP
>         /* per-cpu sections scanning */
> @@ -1690,6 +1692,13 @@ static void kmemleak_scan(void)
>          */
>         scan_gray_list();
> 
> +       /*
> +        * Wait for the greylist objects potentially migrating from
> +        * RCU callbacks or maybe getting freed.
> +        */
> +       cond_synchronize_rcu(gp_start_state);
> +       rcu_barrier();

You lost me on this one.  If you are waiting only on RCU callbacks,
the rcu_barrier() suffices.  If you are also waiting on objects to be
freed after a synchronize_rcu(), kfree_rcu() or similar, you are waiting
only for the corresponding grace period to end, not necessarily for the
freeing of objects following that synchronize_rcu().

So what exactly needs to be waited upon?

I should hasten to add that the fact that there is currently no way
to wait on kfree_rcu()'s frees is considered to be a bug, but that
bug is currently only known to be a problem in the case of a module
that uses kfree_rcu() on objects obtained from a kmem_cache structure,
and that also passes that kmem_cache structure to kmem_cache_free()
at module-unload time.

In the meantime, the workaround is to use call_rcu() instead of
kfree_rcu() in that case.  Which existing code does.

>         /*
>          * Check for new or unreferenced objects modified since the previous
>          * scan and color them gray until the next scan.
> 
> 
> -->8--
> 
> Not quite sure if this makes sense, my first time at kmemleak code,
> adding Catalin.
> But then if I didn't mess up, it's going to work only for one RCU
> period, so in case some object calls rcu/kfree_rcu() from the
> callback, it's going to be yet a false-positive.
> 
> Some other options/ideas:
> - more RCU-invasive option which would be adding a mapping function to
> segmented lists of callbacks, which would allow kmemleak to request
> from RCU if the pointer is yet referenced by RCU.
> - the third option is for kmemleak to trust RCU and generalize the
> commit above, adding kmemleak_object::flags of OBJECT_RCU, which will
> be set by call_rcu()/kfree_rcu() and unset once the callback is
> invoked for RCU_DONE_TAIL.
> - add kmemleak_object::update_jiffies or just directly touch
> kmemleak_object::jiffies whenever the object has been modified (see
> update_checksum()), that will ignore recently changed objects. As
> rcu_head should be updated, that is going to automagically ignore
> those deferred to RCU objects.

You could also make greater use of the polled RCU API, marking
an object (or, perhaps more space-efficiently, a group of
objects) using get_state_synchronize_rcu_full(), then using
poll_state_synchronize_rcu_full() to see when all readers should be done.
There are additional tricks that can be used if needed.

							Thanx, Paul

> Not sure if I mis-looked anything and it seems there is no hurry in
> fixing anything yet, as this is a theoretical issue at this moment.
> 
> Hopefully, these ideas don't look like a nonsense,
>              Dmitry

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2024-07-08 18:29 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-17 14:24 [TEST] TCP MD5 vs kmemleak Jakub Kicinski
2024-06-18  3:24 ` Dmitry Safonov
2024-06-18 14:40   ` Jakub Kicinski
2024-06-18 16:17     ` Paul E. McKenney
2024-06-18 16:30     ` Paolo Abeni
2024-06-18 16:42       ` Paul E. McKenney
2024-06-18 17:02         ` Jakub Kicinski
2024-06-18 17:04           ` Uladzislau Rezki
2024-06-18 17:47           ` Paul E. McKenney
2024-06-19  0:33             ` Dmitry Safonov
2024-06-20 17:02               ` Paul E. McKenney
2024-07-02  2:08                 ` Dmitry Safonov
2024-07-08 11:05                   ` Catalin Marinas
2024-07-08 18:29                   ` Paul E. McKenney

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).