From: Joel Fernandes <joel@joelfernandes.org>
To: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Paasch <cpaasch@apple.com>,
Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, MPTCP Upstream <mptcp@lists.linux.dev>,
rcu@vger.kernel.org
Subject: Re: kmemleak handling of kfree_rcu
Date: Wed, 6 Sep 2023 14:35:29 +0000 [thread overview]
Message-ID: <20230906143529.GB1127143@google.com> (raw)
In-Reply-To: <ZPc+HGjmY2ZC4WO3@arm.com>
Hi Catalin,
On Tue, Sep 05, 2023 at 03:41:32PM +0100, Catalin Marinas wrote:
> On Tue, Sep 05, 2023 at 11:17:25AM +0000, Joel Fernandes wrote:
> > On Mon, Sep 04, 2023 at 10:22:56PM +0100, Catalin Marinas wrote:
> > > On Wed, Aug 30, 2023 at 09:37:23AM -0700, Christoph Paasch wrote:
> > > > for the MPTCP upstream project, we are running syzkaller [1] continuously to
> > > > qualify our kernel changes.
> > > >
> > > > We found one issue with kmemleak and its handling of kfree_rcu.
> > > >
> > > > Specifically, it looks like kmemleak falsely reports memory-leaks when the
> > > > object is being freed by kfree_rcu after a certain grace-period.
> > > >
> > > > For example, https://github.com/multipath-tcp/mptcp_net-next/issues/398#
> > > > issuecomment-1584819482 shows how the syzkaller program reliably produces a
> > > > kmemleak report, although the object is not leaked (we confirmed that by simply
> > > > increasing MSECS_MIN_AGE to something larger than the grace-period).
> > >
> > > Not sure which RCU variant you are using but most likely the false
> > > positives are caused by the original reference to the object being lost
> > > and the pointer added to a new location that kmemleak does not track
> > > (e.g. bnode->records[] in the tree-based variant).
> > >
> > > A quick attempt (untested, not even compiled):
> > >
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> >
> > I am not sure if that works. Correct me if I'm wrong but the issue is that
> > the allocated pointer being RCU-freed is no longer reachable by kmemleak (it
> > is scanned but not reachable), however it might still be reachable via an
> > RCU reader. In such a situation, it is a false-positive.
>
> Yes, with a small correction that the object being RCU-freed is not
> scanned if it cannot be reached by kmemleak (unless explicitly marked as
> not a leak).
Yes I used wrong wording. I meant it is added to the kmemleak rbtree as
something that needs to be reachable, but cannot be reached. Hopefully that
makes sense.
> > The correct fix then should probably be to mark the object as
> > kmemleak_not_leak() until a grace period elapses. This will cause the object
> > to not be reported but still be scanned until eventually the lower layers
> > will remove the object from kmemleak-tracking after the grace period. Per the
> > docs also, that API is used to prevent false-positives.
>
> This should work as well but I'd use kmemleak_ignore() instead of
> kmemleak_not_leak(). The former, apart from masking the false positive,
> also tells kmemleak not to scan the object. After a kvfree_rcu(), the
> object shouldn't have any valid references to other objects, so not
> worth scanning.
Yes I am also OK with that, however to me I consider the object as alive as
long as the grace period does not end. But I agree with you and it may not be
worth tracking them or scanning them.
> > Instead what the below diff appears to do is to mark the bnode cache as a
> > kmemleak object itself, which smells a bit wrong. The bnode is not an
> > allocated object in the traditional sense, it is simple an internal data
> > structure.
>
> We do this in other places for objects containing pointers and which
> aren't tracked by kmemleak (it doesn't track page allocations as there
> would be too many leading to lots of false positives or overlapping
> objects - the slab itself uses the page allocator). An example where we
> do something similar is alloc_large_system_hash(). We could add a
> wrapper API if kmemleak_alloc() feels wrong but underlying in kmemleak
> it would use the same mechanism for recording and scanning the bnode.
Ah I see what you did, you basically made the bnode as a kmemleak object in
its own right, and perhaps the kmemleak detector can reach the object you
added. That actually (though sounding like a little of a hack) would work too.
I say hack because the object you added is not an allocated object in the
traditional sense :-D. But as you said, there is precendent for that.
> > That may not solve the issue as the false-positive unreachable
> > object is still unreachable.
>
> It should solve the issue as long as the bnode is reachable from the
> root objects (e.g. the data/bss sections; I think it is via the
> per_cpu_ptr(&krc, cpu)->bkvcache llist_head). When the bnode is scanned,
> it will find the pointer to the RCU-freed object and mark it as
> referenced (not a leak).
Right! That's what I was missing.
> Anyway, as we trust the RCU freeing implementation not to leak data, we
> can go with your simpler suggestion of a kmemleak_ignore() call on the
> kvfree_rcu() path. Probably even better as the object pending to be
> freed will no longer be scanned.
Sounds good and thanks a lot Catalin! Unless you beat me to it, I'll send a
patch out along those lines by the weekend and CC you with your suggested-by
and the reported-by from the reporter ;-). Let me know if you have a preference.
thanks,
- Joel
next prev parent reply other threads:[~2023-09-06 14:35 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-08-30 16:37 kmemleak handling of kfree_rcu Christoph Paasch
2023-09-04 21:22 ` Catalin Marinas
2023-09-05 11:17 ` Joel Fernandes
2023-09-05 14:41 ` Catalin Marinas
2023-09-06 14:35 ` Joel Fernandes [this message]
2023-09-06 17:15 ` Catalin Marinas
2023-09-06 19:11 ` Paul E. McKenney
2023-09-06 21:37 ` Catalin Marinas
2023-09-06 22:02 ` Paul E. McKenney
2023-09-06 23:10 ` Joel Fernandes
2023-09-12 13:15 ` Matthieu Baerts
2023-09-12 13:32 ` Joel Fernandes
2023-09-05 21:22 ` Christoph Paasch
2023-09-06 17:21 ` Catalin Marinas
2023-09-10 23:10 ` Joel Fernandes
2023-09-11 17:41 ` Christoph Paasch
2023-09-12 9:52 ` Catalin Marinas
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230906143529.GB1127143@google.com \
--to=joel@joelfernandes.org \
--cc=akpm@linux-foundation.org \
--cc=catalin.marinas@arm.com \
--cc=cpaasch@apple.com \
--cc=linux-mm@kvack.org \
--cc=mptcp@lists.linux.dev \
--cc=rcu@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).