All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>,
	linux-kernel@vger.kernel.org, paulmck@us.ibm.com
Subject: Re: [patch] radix-tree: fix small lockless radix-tree bug
Date: Thu, 12 Jun 2008 21:38:33 +0200	[thread overview]
Message-ID: <1213299513.31518.174.camel@twins> (raw)
In-Reply-To: <20080612123102.d8783b98.akpm@linux-foundation.org>

On Thu, 2008-06-12 at 12:31 -0700, Andrew Morton wrote:
> On Fri, 13 Jun 2008 05:03:45 +1000
> Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> 
> > Hi guys,
> > 
> > Although this doesn't seem like cause for alarm (as per the analysis),
> > it may still be a good 2.6.26 candidate as we should have a few more
> > weeks of testing left.
> > 
> > It should definitely go in -mm with the lockless pagecache patch.
> > 
> > When shrinking a radix-tree, we do it in a lockless manner by atomically
> > switching the root pointer away from the redundant node (one that only
> > has a single entry in the left most slot), and switching it over to its
> > lone child.
> > 
> > Because a lockless lookup may have got a reference to the parent and be
> > in the middle of deciding what to do with it while it is being swapped
> > away for its child. For this reason, we also have to keep it around and
> > in a valid state for the lookup to proceed and give a valid result, for
> > at least an RCU grace period. So we need to keep the child in the left
> > most slot there in case that is requested by the lookup.
> > 
> > This is all pretty standard RCU stuff. It is worth repeating because
> > in my eagerness to obey the radix tree node constructor scheme, I had
> > broken this by zeroing the radix tree node before the grace period.
> > 
> > Fix it by clearing those fields in the RCU callback. I would normally
> > want to rip out the constructor entirely, but radix tree nodes are one
> > of those places where they make sense (only few cachelines will be
> > touched soon after allocation).
> > 
> > 
> > This was never actually observed in any lockless pagecache testing or
> > using the test harness, but as a rare problem testing my scalable vmap
> > rewrite.
> > 
> > Fortunately, it is not a problem anywhere lockless pagecache is used in
> > mainline kernels (pagecache probe is not a guarantee, and brd does not
> > have concurrent lookups and deletes).
> > 
> > However, it would eventually pop up for someone using lockless pagecache :P
> > 
> 
> OK, I give up.  A cannot spot what you actually changed amongst all the
> code motion?

The two relevant hunks:

@@ -124,6 +175,17 @@ static void radix_tree_node_rcu_free(str
 {
        struct radix_tree_node *node =
                        container_of(head, struct radix_tree_node, rcu_head);
+
+       /*
+        * must only free zeroed nodes into the slab. radix_tree_shrink
+        * can leave us with a non-NULL entry in the first slot, so clear
+        * that here to make sure.
+        */
+       tag_clear(node, 0, 0);
+       tag_clear(node, 1, 0);
+       node->slots[0] = NULL;
+       node->count = 0;
+
        kmem_cache_free(radix_tree_node_cachep, node);
 }
 
@@ -930,11 +939,6 @@ static inline void radix_tree_shrink(str
                        newptr = radix_tree_ptr_to_indirect(newptr);
                root->rnode = newptr;
                root->height--;
-               /* must only free zeroed nodes into the slab */
-               tag_clear(to_free, 0, 0);
-               tag_clear(to_free, 1, 0);
-               to_free->slots[0] = NULL;
-               to_free->count = 0;
                radix_tree_node_free(to_free);
        }
 }

So instead of clearing the first slot on free, we delay it one grace
period and clear it later. So that anybody already having a pointer to
the node can correctly resolve the item.


  reply	other threads:[~2008-06-12 19:39 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-06-12 19:03 [patch] radix-tree: fix small lockless radix-tree bug Nick Piggin
2008-06-12 19:15 ` Peter Zijlstra
2008-06-12 19:31 ` Andrew Morton
2008-06-12 19:38   ` Peter Zijlstra [this message]
2008-06-12 19:34 ` Andrew Morton
2008-06-12 19:47   ` Nick Piggin
2008-06-13  7:53 ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1213299513.31518.174.camel@twins \
    --to=peterz@infradead.org \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=nickpiggin@yahoo.com.au \
    --cc=paulmck@us.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.