From: Alex Elder <aelder@sgi.com>
To: Eric Sandeen <sandeen@sandeen.net>
Cc: Eric Sandeen <sandeen@redhat.com>, xfs-oss <xfs@oss.sgi.com>
Subject: Re: [PATCH] stable: restart busy extent search after node removal
Date: Fri, 15 Jul 2011 09:19:02 -0500 [thread overview]
Message-ID: <1310739542.2921.8.camel@doink> (raw)
In-Reply-To: <4E1CF47D.7080909@sandeen.net>
On Tue, 2011-07-12 at 20:27 -0500, Eric Sandeen wrote:
> On 7/12/11 7:20 PM, Dave Chinner wrote:
> > On Tue, Jul 12, 2011 at 07:14:19PM -0500, Eric Sandeen wrote:
> >> On 7/12/11 7:12 PM, Dave Chinner wrote:
> >>> On Tue, Jul 12, 2011 at 05:03:38PM -0500, Eric Sandeen wrote:
> >>>> Sending this for review prior to stable submission...
> >>>>
> >>>> A user on #xfs reported that a log replay was oopsing in
> >>>> __rb_rotate_left() with a null pointer deref.
> >>>>
> >>>> I traced this down to the fact that in xfs_alloc_busy_insert(),
> >>>> we erased a node with rb_erase() when the new node overlapped,
> >>>> but left it specified as the parent node for the new insertion.
> >>>>
> >>>> So when we try to insert a new node with an erased node as
> >>>> its parent, obviously things go very wrong.
> >>>>
> >>>> Upstream,
> >>>> 97d3ac75e5e0ebf7ca38ae74cebd201c09b97ab2 xfs: exact busy extent tracking
> >>>> actually fixed this, but as part of a much larger change. Here's
> >>>> the relevant bit:
> >>>>
> >>>> * We also need to restart the busy extent search from the
> >>>> * tree root, because erasing the node can rearrange the
> >>>> * tree topology.
> >>>> */
> >>>> rb_erase(&busyp->rb_node, &pag->pagb_tree);
> >>>> busyp->length = 0;
> >>>> return false;
> >>>>
> >>>> We can do essentially the same thing to older codebases by restarting
> >>>> the search after the erase.
> >>>>
> >>>> This should apply to .35 through .39, and was tested on .39
> >>>> with the oopsing replay reproducer.
> >>>>
> >>>> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
> >>>> ---
> >>>>
> >>>> Index: linux-2.6/fs/xfs/xfs_alloc.c
> >>>> ===================================================================
> >>>> --- linux-2.6.orig/fs/xfs/xfs_alloc.c
> >>>> +++ linux-2.6/fs/xfs/xfs_alloc.c
> >>>> @@ -2664,6 +2664,12 @@ restart:
> >>>> new->bno + new->length) -
> >>>> min(busyp->bno, new->bno);
> >>>> new->bno = min(busyp->bno, new->bno);
> >>>> + /*
> >>>> + * Start the search over from the tree root, because
> >>>> + * erasing the node can rearrange the tree topology.
> >>>> + */
> >>>> + spin_unlock(&pag->pagb_lock);
> >>>> + goto restart;
> >>>> } else
> >>>> busyp = NULL;
> >>>
> >>> Looks good.
> >>>
> >>> I'm guessing that the only case I was able to hit during testing of
> >>> this code originally was the "overlap with exact start block match",
> >>> otherwise I would have seen this. I'm not sure that there really is
> >>> much we can do to improve the test coverage of this code, though.
> >>> Hell, just measuring our test coverage so we know what we aren't
> >>> testing would probably be a good start. :/
> >>
> >> Apparently the original oops, and the subsequent replay oopses,
> >> were on a filesystem VERY busy with torrents.
> >>
> >> Might be a testcase ;)
So, would you mind trying to create this as a test?
Can you come up with a reliable way to create a
small but *very* fragmented filesystem to do stuff
with?
Maybe a function to do that would be useful (sort
of like the one Allison Henderson did for creating
full filesystem) for doing various tests, including
log replay, xfs_repair, and various operations while
"loaded" in that way.
-Alex
> >
> > That just means large files. And fragmentation levels are
> > effectively dependent on whether the torrent client uses
> > preallocation or not. Just creating a set of large fragmented file
> > using preallocation, shutting the filesystem down in the middle
> > of it and then doing log replay might do the trick...
>
> well yeah, my point was, it was in fact badly fragmented.
>
> To quote my favorite meaningless xfs_db statistic,
>
> actual 29700140, ideal 185230, fragmentation factor 99.38%
>
> I guess that's "only" 160 extents per file.
>
> But one of the 2.2G files had 44,000 extents, as an example.
> I am guessing the client did not preallocate. :)
>
> -Eric
>
> > Cheers,
> >
> > Dave.
>
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
next prev parent reply other threads:[~2011-07-15 14:19 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-07-12 22:03 [PATCH] stable: restart busy extent search after node removal Eric Sandeen
2011-07-13 0:12 ` Dave Chinner
2011-07-13 0:14 ` Eric Sandeen
2011-07-13 0:20 ` Dave Chinner
2011-07-13 1:27 ` Eric Sandeen
2011-07-15 14:19 ` Alex Elder [this message]
2011-07-16 1:20 ` Dave Chinner
2011-07-13 13:50 ` Alex Elder
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1310739542.2921.8.camel@doink \
--to=aelder@sgi.com \
--cc=sandeen@redhat.com \
--cc=sandeen@sandeen.net \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox