linux-fsdevel.vger.kernel.org archive mirror
From: Dave Chinner <david@fromorbit.com>
To: Jan Kara <jack@suse.cz>
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	npiggin@kernel.dk, a.p.zijlstra@chello.nl
Subject: Re: [bug] radix_tree_gang_lookup_tag_slot() looping endlessly
Date: Fri, 20 Aug 2010 12:04:07 +1000
Message-ID: <20100820020407.GX10429@dastard>
In-Reply-To: <20100819222559.GW10429@dastard>

On Fri, Aug 20, 2010 at 08:25:59AM +1000, Dave Chinner wrote:
> On Thu, Aug 19, 2010 at 05:58:39PM +0200, Jan Kara wrote:
> >   Hi Dave,
> > 
> > On Thu 19-08-10 23:25:52, Dave Chinner wrote:
> > > It looks to me like radix_tree_set_tag_if_tagged() is fundamentally
> > > broken.  All the tag set/clear code stores the tree path in a cursor
> > > and uses that to propagate the tags if and only if the full path
> > > from root to leaf is resolved. radix_tree_set_tag_if_tagged() sets
> > > tags on intermediate nodes before it has resolved the full path and
> > > hence can set tags when it should not. The "should not" cases occur
> > > when we have to tag sub-ranges or the scan aborts because it's
> > > reached the number to tag in a batch.
> >   Thanks for debugging this! You are right that the code can leave dangling
> > tag when we end the scan at the end of given range but the first tagged
> > leaf is after the end of the given range (there shouldn't be a problem with
> > the batches because there we can exit only just after we tag a leaf so that
> > should be OK).
> >   There are two possibilities how to fix the bug:
> > a) Always tag bottom up - i.e., when we see leaf that should be tagged, go
> > up and tag the parent as well if it is not already tagged.
> > b) When we exit the search and we did not set any leaf tag since last
> > time we went down, we walk up the tree and do an equivalent of
> > radix_tree_clear_tag().
> >   I'll probably go for a) since it looks more robust but b) would be
> > probably faster.
> 
> I think that when it comes to data integrity, more robust should
> win over speed every time. I think it can be done quite easily,
> though, having slept on it - we have the current path in the
> open_slots[] array, so we could just walk that when we set a leaf
> tag. That should be easy to optimise as well - just keep track of
> how high up the path we have set the tag and only walk that far
> when setting the tags. That way we don't continually set the tag on
> the root and higher-level slots. That shouldn't be any slower than the
> current code...
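
For illustration only, the bottom-up approach described above could be
sketched roughly as follows. This is a toy model, not the kernel's radix
tree: struct sketch_node, struct sketch_path_step and
sketch_tag_leaf_bottom_up() are hypothetical names, a single tag is
modelled as one bitmask per node, and the saved path stands in for the
open_slots[] array mentioned above. The walk stops at the first ancestor
slot that is already tagged, which assumes the usual invariant that a
tagged slot implies all of its ancestors are tagged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy node: one bit per slot for a single tag (assumption: <= 64 slots). */
struct sketch_node {
	unsigned long long tags;
};

/* One step of the saved root-to-leaf path: which node, which slot. */
struct sketch_path_step {
	struct sketch_node *node;
	unsigned int offset;
};

static bool sketch_tag_get(const struct sketch_node *node, unsigned int offset)
{
	return (node->tags >> offset) & 1ULL;
}

static void sketch_tag_set(struct sketch_node *node, unsigned int offset)
{
	node->tags |= 1ULL << offset;
}

/*
 * Tag the leaf slot and propagate the tag towards the root.
 * path[0] is the root-level step, path[depth - 1] the leaf-level step.
 * Nothing above the leaf is touched until a leaf is actually tagged,
 * so an aborted scan cannot leave a dangling intermediate tag.
 * Returns the number of levels that had to be written.
 */
static unsigned int sketch_tag_leaf_bottom_up(struct sketch_path_step *path,
					      unsigned int depth)
{
	unsigned int written = 0;
	unsigned int level = depth;

	while (level-- > 0) {
		/* Already tagged here, so every level above is tagged too. */
		if (sketch_tag_get(path[level].node, path[level].offset))
			break;
		sketch_tag_set(path[level].node, path[level].offset);
		written++;
	}
	return written;
}
```

Stopping at the first already-tagged ancestor has the same effect as
remembering how high up the path the tag has been set: repeated leaf
tagging under the same parent writes only the leaf level.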

Fixing this indicates that there is a second bug also corrupting the
PAGECACHE_TAG_TOWRITE tags - it takes quite a bit longer to hit, but
when it fails it is generally because the bit at slot offset zero in
a high-up intermediate node is incorrectly set. It appears that none
of the code is actually setting it, so it's been quite difficult to
track down.

Eventually I noticed through code inspection that
radix_tree_node_rcu_free() clears the tags at offset zero because
the radix_tree_shrink() implementation can leave the first slot
non-null. The addition of the third tag did not add a matching
clear for that tag in the zero slot.  Adding this:

 	 */
 	tag_clear(node, 0, 0);
 	tag_clear(node, 1, 0);
+	tag_clear(node, 2, 0);
 	node->slots[0] = NULL;
 	node->count = 0;
 
to radix_tree_node_rcu_free() appears to fix the problem. Whoever
failed to comment the definition of the number of tags the radix tree
supports left a really nasty landmine that Jan stepped on. Cleaning
up the mess hasn't been pretty, either.
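
To make the failure mode concrete, here is a minimal userspace sketch of
how a hard-coded per-tag reset goes stale when the tag count grows. It is
a toy model under stated assumptions, not the kernel code: struct
toy_node, TOY_MAX_TAGS, TOY_MAP_SIZE and the toy_* helpers are
hypothetical names, and only the (node, tag, offset) argument order is
taken from the hunk above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAP_SIZE 64	/* slots per node (assumed value) */
#define TOY_MAX_TAGS 3	/* number of tags after the third was added */

struct toy_node {
	unsigned int count;
	void *slots[TOY_MAP_SIZE];
	unsigned long long tags[TOY_MAX_TAGS];	/* one bit per slot, per tag */
};

static void toy_tag_set(struct toy_node *node, unsigned int tag,
			unsigned int offset)
{
	node->tags[tag] |= 1ULL << offset;
}

static void toy_tag_clear(struct toy_node *node, unsigned int tag,
			  unsigned int offset)
{
	node->tags[tag] &= ~(1ULL << offset);
}

static bool toy_node_any_tag_set(const struct toy_node *node)
{
	for (unsigned int tag = 0; tag < TOY_MAX_TAGS; tag++)
		if (node->tags[tag])
			return true;
	return false;
}

/* Reset written when only two tags existed: tag 2 is never cleared. */
static void toy_node_reset_hardcoded(struct toy_node *node)
{
	toy_tag_clear(node, 0, 0);
	toy_tag_clear(node, 1, 0);
	node->slots[0] = NULL;
	node->count = 0;
}

/* Reset that follows the tag count, so a new tag cannot be missed. */
static void toy_node_reset_all_tags(struct toy_node *node)
{
	for (unsigned int tag = 0; tag < TOY_MAX_TAGS; tag++)
		toy_tag_clear(node, tag, 0);
	node->slots[0] = NULL;
	node->count = 0;
}
```

With tag 2 set at offset zero, toy_node_reset_hardcoded() hands back a
node that still carries the bit, so whoever reuses it sees a tag nobody
set. Looping over the tag count is one way to avoid that class of
omission; whether the real fix should take that form is a separate
question.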

So, after a couple of days of debugging I finally have test
013 passing without failing. Now to clean up the mess I have and
test some proper patches....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


Thread overview: 8+ messages
2010-08-18 13:56 [bug] radix_tree_gang_lookup_tag_slot() looping endlessly Dave Chinner
2010-08-18 17:37 ` Jan Kara
2010-08-18 23:29   ` Dave Chinner
2010-08-19  7:25     ` Dave Chinner
2010-08-19 13:25       ` Dave Chinner
2010-08-19 15:58         ` Jan Kara
2010-08-19 22:25           ` Dave Chinner
2010-08-20  2:04             ` Dave Chinner [this message]
