Re: [PATCH 2.6.28-rc5 01/11] kmemleak: Add the base support

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Catalin Marinas <catalin.marinas@arm.com>
Cc: linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2.6.28-rc5 01/11] kmemleak: Add the base support
Date: Sun, 7 Dec 2008 15:19:59 -0800	[thread overview]
Message-ID: <20081207231959.GK6897@linux.vnet.ibm.com> (raw)
In-Reply-To: <1228604850.8621.42.camel@localhost.localdomain>

On Sat, Dec 06, 2008 at 11:07:30PM +0000, Catalin Marinas wrote:
> On Thu, 2008-12-04 at 08:55 -0800, Paul E. McKenney wrote:
> > On Thu, Dec 04, 2008 at 12:14:26PM +0000, Catalin Marinas wrote:
> > > On Wed, 2008-12-03 at 10:12 -0800, Paul E. McKenney wrote:
> > > > On Thu, Nov 20, 2008 at 11:30:34AM +0000, Catalin Marinas wrote:
> > > > > This patch adds the base support for the kernel memory leak
> > > > > detector. It traces the memory allocation/freeing in a way similar to
> > > > > the Boehm's conservative garbage collector, the difference being that
> > > > > the unreferenced objects are not freed but only shown in
> > > > > /sys/kernel/debug/memleak. Enabling this feature introduces an
> > > > > overhead to memory allocations.
> > > > 
> > > > I have some concerns about your locking/RCU design, please see
> > > > interspersed questions and comments.
> [...]
> > > I'll comment below as well but I'll first show my logic which I added as
> > > comment at the top of the memleak.c file:
> > 
> > This is very helpful, thank you!  (And please accept my apologies if I
> > missed seeing it in my earlier review.)
> 
> You haven't missed it, that's the first time I posted this text.

Whew!  ;-)

> > >  * The following locks are used by kmemleak:
> > >  * 
> > >  * - object_list_lock (spinlock): protects the object_list. This is the main
> > >  *   list holding the metadata (struct memleak_object) for the allocated
> > >  *   objects. The memleak_object structures are added to the list in the
> > >  *   create_object() function called from the memleak_alloc() callback. They
> > >  *   are removed from the list in put_object() if the object->use_count is 0
> > 
> > This part sounds good.  I would also add that once object->use_count is
> > zero, no one is allowed to increment it.  Also, attempts to increment
> > object->use_count must be under the protection of rcu_read_lock(),
> > correct?
> 
> So, to make sure I understand it correctly, the rcu_read_lock() is
> needed to protect between the point where the object pointer was
> obtained to the get_object() point.

More generally, it ensures that an RCU-protected object stays around
until the matching rcu_read_unlock() is reached.  The guarantee call_rcu()
provides is to invoke the specified function only after all pre-existing
RCU read-side critical sections have completed, in other words, only
after any task that previously executed rcu_read_lock() has executed
the matching rcu_read_unlock().

>                                     Would it also work if
> spin_lock_irqsave(object_list_lock) is used instead of rcu_read_lock()?
> The call_rcu() in put_object is bracketed with object_list_lock.

For some RCU implementations, but only by accident.

Just please don't do this!

Instead, consider wrapping rcu_read_lock() and rcu_read_unlock() around
the region of interest, if need be.  The RCU read-side primitives are
quite cheap, so you are losing almost nothing (and in some cases exactly
nothing) by using them.

> BTW, I'll have a look if I could remove an object from the object_list
> in delete_object() rather than waiting until put_object().

If by doing this, you exclude all readers while removing, you are set.

> > >  * - object_tree_lock (rwlock): protects the object_tree_root. When the
> > >  *   metadata is created in create_object(), it is added to the object prio
> > >  *   search tree and removed in delete_object() with this lock held
> > >  *   (write_lock). This lock is also acquired (read_lock) in
> > >  *   find_and_get_object() when an object needs to be looked up by a pointer
> > >  *   value (either during scanning or when changing its properties like
> > >  *   marking as false positive)
> > 
> > Looks OK.  I must confess that I am a bit fuzzy on the purpose of
> > object_tree_root vs. object_list.
> 
> object_list holds all the memleak_objects in the system and it is
> traversed when preparing the scanning and also when reporting the leaks.
> 
> object_tree_root is used to look-up memleak_objects by the allocated
> memory block. In the past, this used to be a radix tree (with some
> lockdep problems) and later a hash. I now use a prio tree because it
> allows pointer ranges.
> 
> Kmemleak could probably iterate over the object_tree_root when reporting
> but it is more convenient to report the leaks in the order they were
> allocated (preserved by object_list) since one leak may trigger many
> subsequent reports but they disappear once the first one is solved.

Very helpful, thank you!

> > >  * - memleak_object.lock (spinlock): protects a memleak_object. Modifications
> > >  *   of the metadata (e.g. count) are protected by this lock. Note that some
> > >  *   members of this structure may be protected by other means (atomic or
> > >  *   object_list lock). This lock is also held when scanning the corresponding
> > >  *   object to avoid the kernel freeing it via the memleak_free() callback.
> > >  *   This is less heavyweight than holding a global lock like object_list_lock
> > >  *   during scanning
> > 
> > OK, holding an object's lock can protect that object from deletion,
> > but only after you actually acquire the lock.  There must be some other
> > mechanism preventing the object from being freed during the actual
> > acquisition of the lock.
> > 
> > Now this might be the object_list_lock, object_tree_lock, RCU, or some
> > combination of the three, for example it might depend on how that object
> > is looked up.
> 
> Correct. I'll have to review this again.

Fair enough!

> > >  * Freeing a memleak_object is done via an RCU callback invoked from
> > >  * put_object() when its use_count is 0 and after removing it from the
> > >  * object_list. One of the reasons for the RCU is to delay the freeing and
> > >  * avoid a recursive call into the allocator via kmem_cache_free(). Another
> > >  * reason is to allow lock-less object_list traversal during memleak_scan().
> > 
> > I did figure out the lock-less object_list traversal, but totally missed
> > the fact that you were using RCU to prevent infinite recursion.  Cute!
> 
> It wasn't documented, so pretty hard to guess.

I guess I don't feel quite so bad, then.  ;-)

> > Also, the memleak_object must have been removed from object_tree before
> > its use_count can possibly go to 0, correct?
> 
> Yes.

Good!

> > > > > +static void free_object_rcu(struct rcu_head *rcu)
> > > > > +{
> > > > > +	struct hlist_node *elem, *tmp;
> > > > > +	struct memleak_scan_area *area;
> > > > > +	struct memleak_object *object =
> > > > > +		container_of(rcu, struct memleak_object, rcu);
> > > > > +
> > > > > +	/* once use_count is 0, there is no code accessing the object */
> > > > 
> > > > OK, so we won't pass free_object_rcu() to call_rcu() until use_count
> > > > is equal to zero.  And once use_count is zero, it is never incremented.
> > > > So the point of the RCU grace period is to ensure that all tasks
> > > > who didn't quite call get_object() soon enough get done failing
> > > > before we free up the object, correct?
> > > > 
> > > > Which means that get_object() needs to be under rcu_read_lock()...
> > > 
> > > My view here is that if use_count is 0, no other thread would be able to
> > > use this object. It will also be removed from the object_list and hence
> > > no other way to get this this object.
> > 
> > What if some other CPU picked up a pointer to the object just before it
> > was removed from the list?  If that CPU was not under the protection of
> > rcu_read_lock(), and if that CPU was delayed, then the object could be
> > freed (and possibly re-allocated as something else) before the CPU got
> > around to doing the atomic_inc_not_zero().
> 
> OK, I got it now.

It can indeed be a bit subtle, to be sure.

> > It looks to me that the code currently does the right thing here, just
> > want to make sure I understand the locking and that we don't end up
> > tempting someone later to break it.  ;-)
> 
> I'll document it better and make sure it's clear for me as well.

Sounds good, look forward to seeing the next version!

> > > > > +static void put_object(struct memleak_object *object)
> > > > > +{
> > > > > +	unsigned long flags;
> > > > > +
> > > > > +	if (!atomic_dec_and_test(&object->use_count))
> > > > > +		return;
> > > > > +
> > > > > +	/* should only get here after delete_object was called */
> > > > > +	BUG_ON(object->flags & OBJECT_ALLOCATED);
> > > > > +
> > > > > +	spin_lock_irqsave(&object_list_lock, flags);
> > > > 
> > > > We also need to write-hold the object_tree_lock, not?
> > > 
> > > Not here, the memleak_object is removed from the object_tree in the
> > > delete_object() function (via from the memleak_free callback). If it is
> > > in the object_tree, it should have a use_count >= 1.
> > 
> > So the code never calls the last put_object() without first having
> > called delete_object() to remove it from the object_tree?  The "last"
> > put_object() being the one that decrements object->use_count to zero.
> 
> Yes.

Good!

> > > > > +static void *memleak_seq_next(struct seq_file *seq, void *v, loff_t *pos)
> > > > > +{
> > > > > +	struct list_head *n;
> > > > > +	struct memleak_object *next = NULL;
> > > > > +	unsigned long flags;
> > > > > +
> > > > > +	++(*pos);
> > > > > +	if (reported_leaks >= REPORTS_NR)
> > > > > +		goto out;
> > > > > +
> > > > > +	spin_lock_irqsave(&object_list_lock, flags);
> > > > 
> > > > Using a spinlock instead of rcu_read_lock() is OK, but only if all
> > > > updates are also protected by this same spinlock.  Which means that,
> > > > given that find_and_get_object read-acquires object_tree_lock, deletions
> > > > must be projected both by object_list_lock and by write-acquiring
> > > > object_tree_lock.  Or all calls to memleak_seq_next need to be covered
> > > > by rcu_read_lock().
> > > 
> > > The spin_lock here is only to retrieve the next object in the list but I
> > > agree that even if the object_list modifications are protected by
> > > object_list_lock, put_object could actually set the use_count to 0 and
> > > get_object return in this function isn't checked. If get_object returns
> > > successfully, I don't think an rcu_read_lock() is needed since
> > > put_object() can no longer invoke free_object_rcu().
> > 
> > OK, so let me see if I understand:
> > 
> > 	The memleak_object passed in via the "v" argument to
> > 	memleak_seq_next() was get_object()-ed by some prior
> > 	call, either an earlier memleak_seq_next() or presumably
> > 	by memleak_seq_start().
> 
> Yes.

OK, good!  I might be (slowly) catching on here.

> > 	memleak_seq_start() -does- do its scan under RCU protection,
> > 	so looks OK.
> > 
> > 	I believe you also need RCU protection in memleak_seq_next()
> > 	to prevent the next memleak_object from disappearing
> > 	during the traversal.  Yes, you do greatly decrease the
> > 	odds of this happening by having irqs disabled, but the
> > 	fact is that RCU is within its rights to end a grace
> > 	period during this time.
> 
> I'll try to make the memleak_seq_next() function use rcu_dereference()
> and rcu_read_lock(). ATM, the "v" object has use_count >= 1 (from a
> previous get_object) and the next pointer is accessed under
> object_list_lock, so no modifications to the list (even put_object
> acquires this lock when invoking call_rcu). There is still the bug with
> not checking the get_object() return.

Fair enough!

> > Assuming that I do understand, as you say, if the get_object() in
> > memleak_seq_next() fails, we could end up accessing freed-up memory on
> > the next call to memleak_seq_next(), or even during the current one,
> > assuming an aggressive RCU or an extended NMI, SMI, burst of ECC errors
> > or some other delay.  So I agree that it is necessary to check the return
> > value of get_object().
> 
> Yes

Whew!

> > Also, I do see the put_object() call in memleak_seq_stop(), but it looks
> > to me that this only does a put_object() on the last memleak_object.
> > What does the put_object() on the earlier memleak_object structures that
> > were scanned by memleak_seq_next?  Or is there never more than one
> > such object in a given list?
> 
> The previous objects' use_count are decremented in memleak_seq_next()
> just before returning "next", so between seq_start-seq_next and
> seq_next-seq_stop, there is only one object with an incremented
> use_count. The memleak_seq_next() function may have two such objects for
> a small period of time.
> 
> I actually added a test for this in memleak_scan() (if DEBUG is defined)
> and I've never got any reports. There may be some situations when for
> very short periods of time the use_count is > 1 at the beginning of a
> scan, usually when one of the memleak_scan_area or memleak_ignore
> callbacks are invoked.
> 
> I'll revise the locking a bit and re-post the patches this week.

I look forward to seeing them!

							Thanx, Paul

next prev parent reply	other threads:[~2008-12-07 23:20 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-11-20 11:30 [PATCH 2.6.28-rc5 00/11] Kernel memory leak detector (updated) Catalin Marinas
2008-11-20 11:30 ` [PATCH 2.6.28-rc5 01/11] kmemleak: Add the base support Catalin Marinas
2008-11-20 11:58   ` Ingo Molnar
2008-11-20 19:35   ` Pekka Enberg
2008-11-21 12:07     ` Catalin Marinas
2008-11-24  8:16       ` Pekka Enberg
2008-11-24  8:19         ` Pekka Enberg
2008-12-03 18:12   ` Paul E. McKenney
2008-12-04 12:14     ` Catalin Marinas
2008-12-04 16:55       ` Paul E. McKenney
2008-12-06 23:07         ` Catalin Marinas
2008-12-07 23:19           ` Paul E. McKenney [this message]
2008-11-20 11:30 ` [PATCH 2.6.28-rc5 02/11] kmemleak: Add documentation on the memory leak detector Catalin Marinas
2008-11-20 11:30 ` [PATCH 2.6.28-rc5 03/11] kmemleak: Add the memory allocation/freeing hooks Catalin Marinas
2008-11-20 12:00   ` Ingo Molnar
2008-11-20 19:30   ` Pekka Enberg
2008-11-21 11:07     ` Catalin Marinas
2008-11-24  8:19       ` Pekka Enberg
2008-11-24 10:18         ` Catalin Marinas
2008-11-24 10:35           ` Pekka Enberg
2008-11-24 10:43             ` Catalin Marinas
2008-11-20 11:30 ` [PATCH 2.6.28-rc5 04/11] kmemleak: Add modules support Catalin Marinas
2008-11-20 12:03   ` Ingo Molnar
2008-11-20 11:30 ` [PATCH 2.6.28-rc5 05/11] kmemleak: Add support for i386 Catalin Marinas
2008-11-20 12:16   ` Ingo Molnar
2008-11-20 11:31 ` [PATCH 2.6.28-rc5 06/11] kmemleak: Add support for ARM Catalin Marinas
2008-11-20 11:31 ` [PATCH 2.6.28-rc5 07/11] kmemleak: Remove some of the kmemleak false positives Catalin Marinas
2008-11-20 12:09   ` Ingo Molnar
2008-11-20 11:31 ` [PATCH 2.6.28-rc5 08/11] kmemleak: Enable the building of the memory leak detector Catalin Marinas
2008-11-20 11:31 ` [PATCH 2.6.28-rc5 09/11] kmemleak: Keep the __init functions after initialization Catalin Marinas
2008-11-20 11:31 ` [PATCH 2.6.28-rc5 10/11] kmemleak: Simple testing module for kmemleak Catalin Marinas
2008-11-20 12:11   ` Ingo Molnar
2008-11-20 11:31 ` [PATCH 2.6.28-rc5 11/11] kmemleak: Add the corresponding MAINTAINERS entry Catalin Marinas
2008-11-20 12:10 ` [PATCH 2.6.28-rc5 00/11] Kernel memory leak detector (updated) Ingo Molnar
2008-11-20 17:54   ` Catalin Marinas
2008-11-20 12:22 ` Ingo Molnar
2008-11-20 18:10   ` Catalin Marinas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20081207231959.GK6897@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=catalin.marinas@arm.com \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.