* [PATCH] xfs: fix attr tree double split corruption
@ 2012-11-07 6:06 Dave Chinner
0 siblings, 0 replies; 4+ messages in thread
From: Dave Chinner @ 2012-11-07 6:06 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
In certain circumstances, a double split of an attribute tree is
needed to insert or replace an attribute. In rare situations, this
can go wrong, leaving the attribute tree corrupted. In this case,
the attr being replaced is the last attr in a leaf node, and the
replacement is larger so doesn't fit in the same leaf node.
When we have the initial condition of a node format attribute
btree with two leaves at index 1 and 2. Call them L1 and L2. The
leaf L1 is completely full, there is not a single byte of free space
in it. L2 is mostly empty. The attribute being replaced - call it X
- is the last attribute in L1.
The way an attribute replace is executed is that the replacement
attribute - call it Y - is first inserted into the tree, but has an
INCOMPLETE flag set on it so that list traversals ignore it. Once
this transaction is committed, a second transaction it run to
atomically mark Y as COMPLETE and X as INCOMPLETE, so that a
traversal will now find Y and skip X. Once that transaction is
committed, attribute X is then removed.
So, the initial condition is:
+--------+ +--------+
| L1 | | L2 |
| fwd: 2 |---->| fwd: 0 |
| bwd: 0 |<----| bwd: 1 |
| fsp: 0 | | fsp: N |
|--------| |--------|
| attr A | | attr 1 |
|--------| |--------|
| attr B | | attr 2 |
|--------| |--------|
.......... ..........
|--------| |--------|
| attr X | | attr n |
+--------+ +--------+
So now we go to replace X, and see that L1:fsp = 0 - it is full so
we can't insert Y in the same leaf. So we record the the location of
attribute X so we can track it for later use, then we split L1 into
L1 and L3 and reblance across the two leafs. We end with:
+--------+ +--------+ +--------+
| L1 | | L3 | | L2 |
| fwd: 3 |---->| fwd: 2 |---->| fwd: 0 |
| bwd: 0 |<----| bwd: 1 |<----| bwd: 3 |
| fsp: M | | fsp: J | | fsp: N |
|--------| |--------| |--------|
| attr A | | attr X | | attr 1 |
|--------| +--------+ |--------|
| attr B | | attr 2 |
|--------| |--------|
.......... ..........
|--------| |--------|
| attr W | | attr n |
+--------+ +--------+
And we track that the original attribute is now at L3:0.
We then try to insert Y into L1 again, and find that there isn't
enough room because the new attribute is larger than the old one.
Hence we have to split again to make room for Y. We end up with
this:
+--------+ +--------+ +--------+ +--------+
| L1 | | L4 | | L3 | | L2 |
| fwd: 4 |---->| fwd: 3 |---->| fwd: 2 |---->| fwd: 0 |
| bwd: 0 |<----| bwd: 1 |<----| bwd: 4 |<----| bwd: 3 |
| fsp: M | | fsp: J | | fsp: J | | fsp: N |
|--------| |--------| |--------| |--------|
| attr A | | attr Y | | attr X | | attr 1 |
|--------| + INCOMP + +--------+ |--------|
| attr B | +--------+ | attr 2 |
|--------| |--------|
.......... ..........
|--------| |--------|
| attr W | | attr n |
+--------+ +--------+
And now we have the new (incomplete) attribute @ L4:0, and the
original attribute at L3:0. At this point, the first transaction is
committed, and we move to the flipping of the flags.
This is where we are supposed to end up with this:
+--------+ +--------+ +--------+ +--------+
| L1 | | L4 | | L3 | | L2 |
| fwd: 4 |---->| fwd: 3 |---->| fwd: 2 |---->| fwd: 0 |
| bwd: 0 |<----| bwd: 1 |<----| bwd: 4 |<----| bwd: 3 |
| fsp: M | | fsp: J | | fsp: J | | fsp: N |
|--------| |--------| |--------| |--------|
| attr A | | attr Y | | attr X | | attr 1 |
|--------| +--------+ + INCOMP + |--------|
| attr B | +--------+ | attr 2 |
|--------| |--------|
.......... ..........
|--------| |--------|
| attr W | | attr n |
+--------+ +--------+
But that doesn't happen properly - the attribute tracking indexes
are not pointing to the right locations. What we end up with is both
the old attribute to be removed pointing at L4:0 and the new
attribute at L4:1. On a debug kernel, this assert fails like so:
XFS: Assertion failed: args->index2 < be16_to_cpu(leaf2->hdr.count), file: fs/xfs/xfs_attr_leaf.c, line: 2725
because the new attribute location does not exist. On a production
kernel, this goes unnoticed and the code proceeds ahead merrily and
removes L4 because it thinks that is the block that is no longer
needed. This leaves the hash index node pointing to entries
L1, L4 and L2, but only blocks L1, L3 and L2 to exist. Further, the
leaf level sibling list is L1 <-> L4 <-> L2, but L4 is now free
space, and so everything is busted. This corruption is caused by the
removal of the old attribute triggering a join - it joins everything
correctly but then frees the wrong block.
xfs_repair will report something like:
bad sibling back pointer for block 4 in attribute fork for inode 131
problem with attribute contents in inode 131
would clear attr fork
bad nblocks 8 for inode 131, would reset to 3
bad anextents 4 for inode 131, would reset to 0
The problem lies in the assignment of the old/new blocks for
tracking purposes when the double leaf split occurs. The first split
tries to place the new attribute inside the current leaf (i.e.
"inleaf == true") and moves the old attribute (X) to the new block.
This sets up the old block/index to L1:X, and newly allocated
block to L3:0. It then moves attr X to the new block and tries to
insert attr Y at the old index. That fails, so it splits again.
With the second split, the rebalance ends up placing the new attr in
the second new block - L4:0 - and this is where the code goes wrong.
What is does is it sets both the new and old block index to the
second new block. Hence it inserts attr Y at the right place (L4:0)
but overwrites the current location of the attr to replace that is
held in the new block index (currently L3:0). It over writes it with
L4:1 - the index we later assert fail on.
Hopefully this table will show this in a foramt that is a bit easier
to understand:
Split old attr index new attr index
vanilla patched vanilla patched
before 1st L1:26 L1:26 N/A N/A
after 1st L3:0 L3:0 L1:26 L1:26
after 2nd L4:0 L3:0 L4:1 L4:0
^^^^ ^^^^
wrong wrong
The fix is surprisingly simple, for all this analysis - just stop
the rebalance on the out-of leaf case from overwriting the new attr
index - it's already correct for the double split case.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_attr_leaf.c | 20 ++++++++++++++++++--
1 file changed, 18 insertions(+), 2 deletions(-)
diff --git a/fs/xfs/xfs_attr_leaf.c b/fs/xfs/xfs_attr_leaf.c
index d790860..39eb8c6 100644
--- a/fs/xfs/xfs_attr_leaf.c
+++ b/fs/xfs/xfs_attr_leaf.c
@@ -1336,6 +1336,7 @@ xfs_attr_leaf_rebalance(xfs_da_state_t *state, xfs_da_state_blk_t *blk1,
leaf2 = blk2->bp->b_addr;
ASSERT(leaf1->hdr.info.magic == cpu_to_be16(XFS_ATTR_LEAF_MAGIC));
ASSERT(leaf2->hdr.info.magic == cpu_to_be16(XFS_ATTR_LEAF_MAGIC));
+ ASSERT(leaf2->hdr.count == 0);
args = state->args;
trace_xfs_attr_leaf_rebalance(args);
@@ -1406,6 +1407,7 @@ xfs_attr_leaf_rebalance(xfs_da_state_t *state, xfs_da_state_blk_t *blk1,
* I assert that since all callers pass in an empty
* second buffer, this code should never execute.
*/
+ ASSERT(0);
/*
* Figure the total bytes to be added to the destination leaf.
@@ -1467,10 +1469,24 @@ xfs_attr_leaf_rebalance(xfs_da_state_t *state, xfs_da_state_blk_t *blk1,
args->index2 = 0;
args->blkno2 = blk2->blkno;
} else {
+ /*
+ * On a double leaf split, the original attr location
+ * is already stored in blkno2/index2, so don't
+ * overwrite it overwise we corrupt the tree.
+ */
blk2->index = blk1->index
- be16_to_cpu(leaf1->hdr.count);
- args->index = args->index2 = blk2->index;
- args->blkno = args->blkno2 = blk2->blkno;
+ args->index = blk2->index;
+ args->blkno = blk2->blkno;
+ if (!state->extravalid) {
+ /*
+ * set the new attr location to match the old
+ * one and let the higher level split code
+ * decide where in the leaf to place it.
+ */
+ args->index2 = blk2->index;
+ args->blkno2 = blk2->blkno;
+ }
}
} else {
ASSERT(state->inleaf == 1);
--
1.7.10
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH] xfs: fix attr tree double split corruption
[not found] <1353625175-29479-1-git-send-email-peterhuewe@gmx.de>
@ 2012-11-23 0:49 ` Dave Chinner
2012-11-23 7:32 ` Peter Hüwe
0 siblings, 1 reply; 4+ messages in thread
From: Dave Chinner @ 2012-11-23 0:49 UTC (permalink / raw)
To: Peter Huewe; +Cc: Ben Myers, stable, xfs
[added xfs@oss.sgi.com]
On Thu, Nov 22, 2012 at 11:59:35PM +0100, Peter Huewe wrote:
> In certain circumstances, a double split of an attribute tree is
> needed to insert or replace an attribute. In rare situations, this
> can go wrong, leaving the attribute tree corrupted. In this case,
> the attr being replaced is the last attr in a leaf node, and the
> replacement is larger so doesn't fit in the same leaf node.
> When we have the initial condition of a node format attribute
> btree with two leaves at index 1 and 2. Call them L1 and L2. The
> leaf L1 is completely full, there is not a single byte of free space
> in it. L2 is mostly empty. The attribute being replaced - call it X
> - is the last attribute in L1.
Hi Peter,
While I appreciate what you are doing here, can you please send
stable backport patches to the xfs@oss.sgi.com list for review
before sending them to stable@vger.kernel.org? There may already be
someone doing this work, and we want to make sure that backports are
correct, tested and worth the risk of fixing before asking the
(already overworked) stable kernel maintainers to include it.
As such, how did you test that the fix works on the stable kernels
you are targetting? AFAIK, I'm the only person who has the
filesystem images that reproducably triggered this corruption
problem....
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Reviewed-by: Mark Tinguely <tinguely@sgi.com>
> Signed-off-by: Ben Myers <bpm@sgi.com>
>
> Cc: <stable@vger.kernel.org> # 3.6.x
> Cc: <stable@vger.kernel.org> # 3.4.x
> Cc: <stable@vger.kernel.org> # 3.2.x
If we are pushing this fix back to stable kernels, then it
should go back to 3.0.x as well.
Cheers,
Dave.
--
Dave Chinner
dchinner@redhat.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] xfs: fix attr tree double split corruption
2012-11-23 0:49 ` [PATCH] xfs: fix attr tree double split corruption Dave Chinner
@ 2012-11-23 7:32 ` Peter Hüwe
2012-11-23 8:55 ` Dave Chinner
0 siblings, 1 reply; 4+ messages in thread
From: Peter Hüwe @ 2012-11-23 7:32 UTC (permalink / raw)
To: Dave Chinner; +Cc: Ben Myers, stable, xfs
Hi David,
thanks for your response!
Am Freitag, 23. November 2012, 01:49:37 schrieb Dave Chinner:
> [added xfs@oss.sgi.com]
>
> On Thu, Nov 22, 2012 at 11:59:35PM +0100, Peter Huewe wrote:
>
> Hi Peter,
>
> While I appreciate what you are doing here, can you please send
> stable backport patches to the xfs@oss.sgi.com list for review
> before sending them to stable@vger.kernel.org? There may already be
> someone doing this work, and we want to make sure that backports are
> correct, tested and worth the risk of fixing before asking the
> (already overworked) stable kernel maintainers to include it.
Yeah you're right, especially with fs changes we should be more cautious,
sorry about the noise then.
I'm one of the few people who offered Greg KH a hand after his 'help wanted'
message/blog post and one of the 'todos' was to look through git-commits@ and
pick those who look worth of including into stable.
I was probably a bit too eager here, and skimming from top to bottom of git-
commits might also not have been a really good idea either ;)
> As such, how did you test that the fix works on the stable kernels
> you are targetting? AFAIK, I'm the only person who has the
> filesystem images that reproducably triggered this corruption
> problem....
I was mainly looking at:
- does it apply cleanly
- does it still boot
- basic functional test
- short review of the affected code after applying the patch i.e. does it still
look okay/as intended?
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > Reviewed-by: Mark Tinguely <tinguely@sgi.com>
> > Signed-off-by: Ben Myers <bpm@sgi.com>
> >
> > Cc: <stable@vger.kernel.org> # 3.6.x
> > Cc: <stable@vger.kernel.org> # 3.4.x
> > Cc: <stable@vger.kernel.org> # 3.2.x
>
> If we are pushing this fix back to stable kernels, then it
> should go back to 3.0.x as well.
3.0.x failed my first test (does it apply cleanly?) so I left it out, as a
dedicated backport is needed.
Thanks
PeterH
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] xfs: fix attr tree double split corruption
2012-11-23 7:32 ` Peter Hüwe
@ 2012-11-23 8:55 ` Dave Chinner
0 siblings, 0 replies; 4+ messages in thread
From: Dave Chinner @ 2012-11-23 8:55 UTC (permalink / raw)
To: Peter Hüwe; +Cc: Ben Myers, xfs, stable, Dave Chinner
On Fri, Nov 23, 2012 at 08:32:44AM +0100, Peter Hüwe wrote:
> Hi David,
>
> thanks for your response!
> Am Freitag, 23. November 2012, 01:49:37 schrieb Dave Chinner:
> > [added xfs@oss.sgi.com]
> >
> > On Thu, Nov 22, 2012 at 11:59:35PM +0100, Peter Huewe wrote:
> >
> > Hi Peter,
> >
> > While I appreciate what you are doing here, can you please send
> > stable backport patches to the xfs@oss.sgi.com list for review
> > before sending them to stable@vger.kernel.org? There may already be
> > someone doing this work, and we want to make sure that backports are
> > correct, tested and worth the risk of fixing before asking the
> > (already overworked) stable kernel maintainers to include it.
>
> Yeah you're right, especially with fs changes we should be more cautious,
> sorry about the noise then.
It's not noise, just an opportunity to sort a a good process ;)
> I'm one of the few people who offered Greg KH a hand after his 'help wanted'
> message/blog post and one of the 'todos' was to look through git-commits@ and
> pick those who look worth of including into stable.
That's great to hear.
> I was probably a bit too eager here, and skimming from top to bottom of git-
> commits might also not have been a really good idea either ;)
Probably not. But like I said, this is an opportunity....
> > As such, how did you test that the fix works on the stable kernels
> > you are targetting? AFAIK, I'm the only person who has the
> > filesystem images that reproducably triggered this corruption
> > problem....
>
> I was mainly looking at:
> - does it apply cleanly
> - does it still boot
> - basic functional test
> - short review of the affected code after applying the patch i.e. does it still
> look okay/as intended?
I can see that this is a good process for most changes, especially
for drivers and non-core functionality where mistakes won't cause
permanent or unrecoverable damage.
Filesystems, though, are a slightly different case - make a mistake,
you can corrupt and/or lose people's data. That's what we need to
avoid at all costs. Hence there's a higher bar for backports and
it pays to involve the guys in the trenches when deciding what to
backport.
I'll give you an idea of my normal process for doing a backport to
stable:
1. identify backport candidates based on how critical the bug is,
risk of regression, how certain I am that the problem properly
fixed, etc.
1a. let people know I'm doing this
2. do the set of backports
3. run xfstests several times, a couple of load tests (e.g. dbench,
compilebench, fsmark for a couple of hours each). Generally I get
24hrs of testing on backports before moving onwards...
4. post the backport for review. If there are multiple versions for
different stable kernels, they all get sent for review (rare). If
the code is already tested and very little change from upstream is
needed, then I'll cc stable@vger.kernel.org at this point too.
5. after review, send to stable@....
We do send some stuff to stable via CC's in commit messages - they
typically aren't the complex fixes that we are cautious about,
though.
Essentially, 1a and 3 are the main things I'm concerned about -
knowing what is going on and what testing has been done. If you are
unsure/scared by the testing requirements, I'm sure that if you ask
nicely one of us will be able to do a test cycle as part of the
review process. :)
> > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > > Reviewed-by: Mark Tinguely <tinguely@sgi.com>
> > > Signed-off-by: Ben Myers <bpm@sgi.com>
> > >
> > > Cc: <stable@vger.kernel.org> # 3.6.x
> > > Cc: <stable@vger.kernel.org> # 3.4.x
> > > Cc: <stable@vger.kernel.org> # 3.2.x
> >
> > If we are pushing this fix back to stable kernels, then it
> > should go back to 3.0.x as well.
>
> 3.0.x failed my first test (does it apply cleanly?) so I left it out, as a
> dedicated backport is needed.
And we can help with that. If you want learn a lot quickly, then
doing these sorts of backports is a fine way to acheive that.
I'd really, really like what you are doing to work - keeping track
of all the stable kernels and what is in them and what has/hasn't
been back ported is more that I can keep up with (and I've got
distro kernels to worry about, too). Any help we can get would be
greatly appreciated.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2012-11-23 8:53 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <1353625175-29479-1-git-send-email-peterhuewe@gmx.de>
2012-11-23 0:49 ` [PATCH] xfs: fix attr tree double split corruption Dave Chinner
2012-11-23 7:32 ` Peter Hüwe
2012-11-23 8:55 ` Dave Chinner
2012-11-07 6:06 Dave Chinner
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox