public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed
* [patch] btrfs: fix inode rbtree corruption
@ 2009-08-18 16:45 Nick Piggin
  2009-08-18 18:56 ` Yan, Zheng 
  0 siblings, 1 reply; 20+ messages in thread
From: Nick Piggin @ 2009-08-18 16:45 UTC (permalink / raw)
  To: Chris Mason, linux-btrfs

Hi,

Ran into a problem stress testing my btrfs truncate conversion attempt...
Unfortunately it was an existing btrfs problem. Fortunately I think I
was able to fix it.

Thanks,
Nick

--
btrfs: fix inode rbtree corruption

Node may not be inserted over existing node. This causes inode tree
corruption and I was seeing crashes in inode_tree_del which I can not
reproduce after this patch.

The other way to fix this would be to tie inode lifetime in the rbtree
with inode while not in freeing state. I had a look at this but it is
not so trivial at this point. At least this patch gets things working again.

Signed-off-by: Nick Piggin <npiggin@suse.de>
---
 fs/btrfs/inode.c |   21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

Index: linux-2.6/fs/btrfs/inode.c
===================================================================
--- linux-2.6.orig/fs/btrfs/inode.c
+++ linux-2.6/fs/btrfs/inode.c
@@ -3099,8 +3099,12 @@ static void inode_tree_add(struct inode
 {
 	struct btrfs_root *root = BTRFS_I(inode)->root;
 	struct btrfs_inode *entry;
-	struct rb_node **p = &root->inode_tree.rb_node;
-	struct rb_node *parent = NULL;
+	struct rb_node **p;
+	struct rb_node *parent;
+
+again:
+	p = &root->inode_tree.rb_node;
+	parent = NULL;
 
 	spin_lock(&root->inode_lock);
 	while (*p) {
@@ -3108,13 +3112,16 @@ static void inode_tree_add(struct inode
 		entry = rb_entry(parent, struct btrfs_inode, rb_node);
 
 		if (inode->i_ino < entry->vfs_inode.i_ino)
-			p = &(*p)->rb_left;
+			p = &parent->rb_left;
 		else if (inode->i_ino > entry->vfs_inode.i_ino)
-			p = &(*p)->rb_right;
+			p = &parent->rb_right;
 		else {
 			WARN_ON(!(entry->vfs_inode.i_state &
 				  (I_WILL_FREE | I_FREEING | I_CLEAR)));
-			break;
+			rb_erase(parent, &root->inode_tree);
+			RB_CLEAR_NODE(parent);
+			spin_unlock(&root->inode_lock);
+			goto again;
 		}
 	}
 	rb_link_node(&BTRFS_I(inode)->rb_node, parent, p);
@@ -3126,12 +3133,12 @@ static void inode_tree_del(struct inode
 {
 	struct btrfs_root *root = BTRFS_I(inode)->root;
 
+	spin_lock(&root->inode_lock);
 	if (!RB_EMPTY_NODE(&BTRFS_I(inode)->rb_node)) {
-		spin_lock(&root->inode_lock);
 		rb_erase(&BTRFS_I(inode)->rb_node, &root->inode_tree);
-		spin_unlock(&root->inode_lock);
 		RB_CLEAR_NODE(&BTRFS_I(inode)->rb_node);
 	}
+	spin_unlock(&root->inode_lock);
 }
 
 static noinline void init_btrfs_i(struct inode *inode)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch] btrfs: fix inode rbtree corruption
  2009-08-18 16:45 [patch] btrfs: fix inode rbtree corruption Nick Piggin
@ 2009-08-18 18:56 ` Yan, Zheng 
  2009-08-18 21:19   ` Jens Axboe
  2009-08-19  8:32   ` Nick Piggin
  0 siblings, 2 replies; 20+ messages in thread
From: Yan, Zheng  @ 2009-08-18 18:56 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Chris Mason, linux-btrfs

2009/8/19 Nick Piggin <npiggin@suse.de>:
> Hi,
>
> Ran into a problem stress testing my btrfs truncate conversion attempt...
> Unfortunately it was an existing btrfs problem. Fortunately I think I
> was able to fix it.
>
> Thanks,
> Nick
>
> --
> btrfs: fix inode rbtree corruption
>
> Node may not be inserted over existing node. This causes inode tree
> corruption and I was seeing crashes in inode_tree_del which I can not
> reproduce after this patch.
>
> The other way to fix this would be to tie inode lifetime in the rbtree
> with inode while not in freeing state. I had a look at this but it is
> not so trivial at this point. At least this patch gets things working again.
>

I'm not quite understand this. rbtree allows entries having the same keys.
I guess your problem is because of some nodes get inserted into the tree
twice. But I have no idea how can it happen.

Regards
Yan, Zheng

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch] btrfs: fix inode rbtree corruption
  2009-08-18 18:56 ` Yan, Zheng 
@ 2009-08-18 21:19   ` Jens Axboe
  2009-08-19  8:45     ` Nick Piggin
  2009-08-19  8:32   ` Nick Piggin
  1 sibling, 1 reply; 20+ messages in thread
From: Jens Axboe @ 2009-08-18 21:19 UTC (permalink / raw)
  To: Yan, Zheng ; +Cc: Nick Piggin, Chris Mason, linux-btrfs

On Wed, Aug 19 2009, Yan, Zheng  wrote:
> 2009/8/19 Nick Piggin <npiggin@suse.de>:
> > Hi,
> >
> > Ran into a problem stress testing my btrfs truncate conversion attempt...
> > Unfortunately it was an existing btrfs problem. Fortunately I think I
> > was able to fix it.
> >
> > Thanks,
> > Nick
> >
> > --
> > btrfs: fix inode rbtree corruption
> >
> > Node may not be inserted over existing node. This causes inode tree
> > corruption and I was seeing crashes in inode_tree_del which I can not
> > reproduce after this patch.
> >
> > The other way to fix this would be to tie inode lifetime in the rbtree
> > with inode while not in freeing state. I had a look at this but it is
> > not so trivial at this point. At least this patch gets things working again.
> >
> 
> I'm not quite understand this. rbtree allows entries having the same keys.
> I guess your problem is because of some nodes get inserted into the tree
> twice. But I have no idea how can it happen.

It can work with key aliases, if it's a problem then it's likely due to
another problem in related lookup code.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch] btrfs: fix inode rbtree corruption
  2009-08-18 18:56 ` Yan, Zheng 
  2009-08-18 21:19   ` Jens Axboe
@ 2009-08-19  8:32   ` Nick Piggin
  1 sibling, 0 replies; 20+ messages in thread
From: Nick Piggin @ 2009-08-19  8:32 UTC (permalink / raw)
  To: Yan, Zheng ; +Cc: Chris Mason, linux-btrfs

On Wed, Aug 19, 2009 at 02:56:54AM +0800, Yan, Zheng  wrote:
> 2009/8/19 Nick Piggin <npiggin@suse.de>:
> > Hi,
> >
> > Ran into a problem stress testing my btrfs truncate conversion attempt...
> > Unfortunately it was an existing btrfs problem. Fortunately I think I
> > was able to fix it.
> >
> > Thanks,
> > Nick
> >
> > --
> > btrfs: fix inode rbtree corruption
> >
> > Node may not be inserted over existing node. This causes inode tree
> > corruption and I was seeing crashes in inode_tree_del which I can not
> > reproduce after this patch.
> >
> > The other way to fix this would be to tie inode lifetime in the rbtree
> > with inode while not in freeing state. I had a look at this but it is
> > not so trivial at this point. At least this patch gets things working again.
> >
> 
> I'm not quite understand this. rbtree allows entries having the same keys.

No it doesn't. Well it *does* -- that's completely a choice of your
insert routine. But what you are not allowed to do is just pass in
a pointer to a non-null left/right pointer to link your new node into
because then the old node just gets lost.

So the tree is probably not corrupt as such, but when you try to delete
one of these old nodes then it crashes because it is not attached in
the tree.

> I guess your problem is because of some nodes get inserted into the tree
> twice. But I have no idea how can it happen.

No, the problem is 2 different nodes with the same key get inserted.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch] btrfs: fix inode rbtree corruption
  2009-08-18 21:19   ` Jens Axboe
@ 2009-08-19  8:45     ` Nick Piggin
  2009-08-19  8:46       ` Jens Axboe
  2009-08-19  8:56       ` Yan, Zheng 
  0 siblings, 2 replies; 20+ messages in thread
From: Nick Piggin @ 2009-08-19  8:45 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Yan, Zheng , Chris Mason, linux-btrfs

On Tue, Aug 18, 2009 at 11:19:10PM +0200, Jens Axboe wrote:
> On Wed, Aug 19 2009, Yan, Zheng  wrote:
> > 2009/8/19 Nick Piggin <npiggin@suse.de>:
> > > Hi,
> > >
> > > Ran into a problem stress testing my btrfs truncate conversion attempt...
> > > Unfortunately it was an existing btrfs problem. Fortunately I think I
> > > was able to fix it.
> > >
> > > Thanks,
> > > Nick
> > >
> > > --
> > > btrfs: fix inode rbtree corruption
> > >
> > > Node may not be inserted over existing node. This causes inode tree
> > > corruption and I was seeing crashes in inode_tree_del which I can not
> > > reproduce after this patch.
> > >
> > > The other way to fix this would be to tie inode lifetime in the rbtree
> > > with inode while not in freeing state. I had a look at this but it is
> > > not so trivial at this point. At least this patch gets things working again.
> > >
> > 
> > I'm not quite understand this. rbtree allows entries having the same keys.
> > I guess your problem is because of some nodes get inserted into the tree
> > twice. But I have no idea how can it happen.
> 
> It can work with key aliases, if it's a problem then it's likely due to
> another problem in related lookup code.

See my other reply. It *can* work with key aliases, but this particular
code does not.

It is pretty easy obviously to put in duplicates because the rbtree
code doesn't know about keys, but if we do this then it looks like
it might cause the search code to miss some valid inodes and instead
return freeing inodes -- so you'd also have to look at that and update
it which is why I didn't go down this route..

---
 fs/btrfs/inode.c |   13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

Index: linux-2.6/fs/btrfs/inode.c
===================================================================
--- linux-2.6.orig/fs/btrfs/inode.c
+++ linux-2.6/fs/btrfs/inode.c
@@ -3108,13 +3108,14 @@ static void inode_tree_add(struct inode
 		entry = rb_entry(parent, struct btrfs_inode, rb_node);
 
 		if (inode->i_ino < entry->vfs_inode.i_ino)
-			p = &(*p)->rb_left;
-		else if (inode->i_ino > entry->vfs_inode.i_ino)
-			p = &(*p)->rb_right;
+			p = &parent->rb_left;
 		else {
-			WARN_ON(!(entry->vfs_inode.i_state &
-				  (I_WILL_FREE | I_FREEING | I_CLEAR)));
-			break;
+			p = &parent->rb_right;
+			if (inode->i_ino == entry->vfs_inode.i_ino) {
+				/* tolerate duplicates */
+				WARN_ON(!(entry->vfs_inode.i_state &
+					  (I_WILL_FREE | I_FREEING | I_CLEAR)));
+			}
 		}
 	}
 	rb_link_node(&BTRFS_I(inode)->rb_node, parent, p);

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch] btrfs: fix inode rbtree corruption
  2009-08-19  8:45     ` Nick Piggin
@ 2009-08-19  8:46       ` Jens Axboe
  2009-08-19  8:52         ` Nick Piggin
  2009-08-19  8:56       ` Yan, Zheng 
  1 sibling, 1 reply; 20+ messages in thread
From: Jens Axboe @ 2009-08-19  8:46 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Yan, Zheng , Chris Mason, linux-btrfs

On Wed, Aug 19 2009, Nick Piggin wrote:
> On Tue, Aug 18, 2009 at 11:19:10PM +0200, Jens Axboe wrote:
> > On Wed, Aug 19 2009, Yan, Zheng  wrote:
> > > 2009/8/19 Nick Piggin <npiggin@suse.de>:
> > > > Hi,
> > > >
> > > > Ran into a problem stress testing my btrfs truncate conversion attempt...
> > > > Unfortunately it was an existing btrfs problem. Fortunately I think I
> > > > was able to fix it.
> > > >
> > > > Thanks,
> > > > Nick
> > > >
> > > > --
> > > > btrfs: fix inode rbtree corruption
> > > >
> > > > Node may not be inserted over existing node. This causes inode tree
> > > > corruption and I was seeing crashes in inode_tree_del which I can not
> > > > reproduce after this patch.
> > > >
> > > > The other way to fix this would be to tie inode lifetime in the rbtree
> > > > with inode while not in freeing state. I had a look at this but it is
> > > > not so trivial at this point. At least this patch gets things working again.
> > > >
> > > 
> > > I'm not quite understand this. rbtree allows entries having the same keys.
> > > I guess your problem is because of some nodes get inserted into the tree
> > > twice. But I have no idea how can it happen.
> > 
> > It can work with key aliases, if it's a problem then it's likely due to
> > another problem in related lookup code.
> 
> See my other reply. It *can* work with key aliases, but this particular
> code does not.
> 
> It is pretty easy obviously to put in duplicates because the rbtree
> code doesn't know about keys, but if we do this then it looks like
> it might cause the search code to miss some valid inodes and instead
> return freeing inodes -- so you'd also have to look at that and update
> it which is why I didn't go down this route..

Mine was just a generic statement, I didn't read the btrfs code (hence
my comment about potential lookup bug, if you allow aliases you have to
be careful).

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch] btrfs: fix inode rbtree corruption
  2009-08-19  8:46       ` Jens Axboe
@ 2009-08-19  8:52         ` Nick Piggin
  2009-08-19  8:59           ` Jens Axboe
  0 siblings, 1 reply; 20+ messages in thread
From: Nick Piggin @ 2009-08-19  8:52 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Yan, Zheng , Chris Mason, linux-btrfs

On Wed, Aug 19, 2009 at 10:46:59AM +0200, Jens Axboe wrote:
> On Wed, Aug 19 2009, Nick Piggin wrote:
> > See my other reply. It *can* work with key aliases, but this particular
> > code does not.
> > 
> > It is pretty easy obviously to put in duplicates because the rbtree
> > code doesn't know about keys, but if we do this then it looks like
> > it might cause the search code to miss some valid inodes and instead
> > return freeing inodes -- so you'd also have to look at that and update
> > it which is why I didn't go down this route..
> 
> Mine was just a generic statement, I didn't read the btrfs code (hence
> my comment about potential lookup bug, if you allow aliases you have to
> be careful).

Ah ok. Well yeah in this case btrfs is definitely wrong in the way it
tried to insert aliases.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch] btrfs: fix inode rbtree corruption
  2009-08-19  8:45     ` Nick Piggin
  2009-08-19  8:46       ` Jens Axboe
@ 2009-08-19  8:56       ` Yan, Zheng 
  2009-08-19  9:04         ` Nick Piggin
  1 sibling, 1 reply; 20+ messages in thread
From: Yan, Zheng  @ 2009-08-19  8:56 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Jens Axboe, Chris Mason, linux-btrfs

2009/8/19 Nick Piggin <npiggin@suse.de>:
> On Tue, Aug 18, 2009 at 11:19:10PM +0200, Jens Axboe wrote:
>> On Wed, Aug 19 2009, Yan, Zheng =A0wrote:
>> > 2009/8/19 Nick Piggin <npiggin@suse.de>:
>> > > Hi,
>> > >
>> > > Ran into a problem stress testing my btrfs truncate conversion a=
ttempt...
>> > > Unfortunately it was an existing btrfs problem. Fortunately I th=
ink I
>> > > was able to fix it.
>> > >
>> > > Thanks,
>> > > Nick
>> > >
>> > > --
>> > > btrfs: fix inode rbtree corruption
>> > >
>> > > Node may not be inserted over existing node. This causes inode t=
ree
>> > > corruption and I was seeing crashes in inode_tree_del which I ca=
n not
>> > > reproduce after this patch.
>> > >
>> > > The other way to fix this would be to tie inode lifetime in the =
rbtree
>> > > with inode while not in freeing state. I had a look at this but =
it is
>> > > not so trivial at this point. At least this patch gets things wo=
rking again.
>> > >
>> >
>> > I'm not quite understand this. rbtree allows entries having the sa=
me keys.
>> > I guess your problem is because of some nodes get inserted into th=
e tree
>> > twice. But I have no idea how can it happen.
>>
>> It can work with key aliases, if it's a problem then it's likely due=
 to
>> another problem in related lookup code.
> See my other reply. It *can* work with key aliases, but this particul=
ar
> code does not.
> It is pretty easy obviously to put in duplicates because the rbtree
> code doesn't know about keys, but if we do this then it looks like
> it might cause the search code to miss some valid inodes and instead
> return freeing inodes -- so you'd also have to look at that and updat=
e
> it which is why I didn't go down this route..

There is no search code. The only place uses the inode tree is
the relocation code, it traverses the tree and uses igrab to guarantee
freeing inodes are not touched. I'm still confused :(

Regards
Yan, Zheng
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" =
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch] btrfs: fix inode rbtree corruption
  2009-08-19  8:52         ` Nick Piggin
@ 2009-08-19  8:59           ` Jens Axboe
  2009-08-20 13:23             ` Nick Piggin
  0 siblings, 1 reply; 20+ messages in thread
From: Jens Axboe @ 2009-08-19  8:59 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Yan, Zheng , Chris Mason, linux-btrfs

On Wed, Aug 19 2009, Nick Piggin wrote:
> On Wed, Aug 19, 2009 at 10:46:59AM +0200, Jens Axboe wrote:
> > On Wed, Aug 19 2009, Nick Piggin wrote:
> > > See my other reply. It *can* work with key aliases, but this particular
> > > code does not.
> > > 
> > > It is pretty easy obviously to put in duplicates because the rbtree
> > > code doesn't know about keys, but if we do this then it looks like
> > > it might cause the search code to miss some valid inodes and instead
> > > return freeing inodes -- so you'd also have to look at that and update
> > > it which is why I didn't go down this route..
> > 
> > Mine was just a generic statement, I didn't read the btrfs code (hence
> > my comment about potential lookup bug, if you allow aliases you have to
> > be careful).
> 
> Ah ok. Well yeah in this case btrfs is definitely wrong in the way it
> tried to insert aliases.

I looked at the actual problem now and I agree, it cannot work that way.
I don't know if Linus is planning another -rc, we should probably get
this upstream sooner rather than later. Chris is away this week, so if
we can get Yan to agree on this patch as well, I'll submit it.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch] btrfs: fix inode rbtree corruption
  2009-08-19  8:56       ` Yan, Zheng 
@ 2009-08-19  9:04         ` Nick Piggin
  2009-08-19  9:34           ` Yan, Zheng 
  0 siblings, 1 reply; 20+ messages in thread
From: Nick Piggin @ 2009-08-19  9:04 UTC (permalink / raw)
  To: Yan, Zheng ; +Cc: Jens Axboe, Chris Mason, linux-btrfs

On Wed, Aug 19, 2009 at 04:56:12PM +0800, Yan, Zheng  wrote:
> 2009/8/19 Nick Piggin <npiggin@suse.de>:
> > On Tue, Aug 18, 2009 at 11:19:10PM +0200, Jens Axboe wrote:
> >> On Wed, Aug 19 2009, Yan, Zheng =A0wrote:
> >> > 2009/8/19 Nick Piggin <npiggin@suse.de>:
> >> It can work with key aliases, if it's a problem then it's likely d=
ue to
> >> another problem in related lookup code.
> > See my other reply. It *can* work with key aliases, but this partic=
ular
> > code does not.
> > It is pretty easy obviously to put in duplicates because the rbtree
> > code doesn't know about keys, but if we do this then it looks like
> > it might cause the search code to miss some valid inodes and instea=
d
> > return freeing inodes -- so you'd also have to look at that and upd=
ate
> > it which is why I didn't go down this route..
>=20
> There is no search code. The only place uses the inode tree is
> the relocation code, it traverses the tree and uses igrab to guarante=
e
> freeing inodes are not touched. I'm still confused :(

=46irstly, the insert/delete code is wrong for duplicates and it will c=
rash in
the absense of any search activity. Agree?

Secondly, OK now if we did allow duplicates in the tree as-per my last
patch to Jens, then look what happens with igrab: it will correctly
prevent us from getting a freeing inode, but then it will set the next
inode to search at ino+1 -- ie. it will not correctly traverse duplicat=
es
without modifications. Agree?

So with that in mind -- the fact that you don't want to see freeing
inodes in your search code, then there is no point to handle duplicates
at all; simply remove freeing inodes from the tree.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" =
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch] btrfs: fix inode rbtree corruption
  2009-08-19  9:04         ` Nick Piggin
@ 2009-08-19  9:34           ` Yan, Zheng 
  2009-08-19 10:47             ` Nick Piggin
  0 siblings, 1 reply; 20+ messages in thread
From: Yan, Zheng  @ 2009-08-19  9:34 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Jens Axboe, Chris Mason, linux-btrfs

2009/8/19 Nick Piggin <npiggin@suse.de>:
> On Wed, Aug 19, 2009 at 04:56:12PM +0800, Yan, Zheng =A0wrote:
>> 2009/8/19 Nick Piggin <npiggin@suse.de>:
>> > On Tue, Aug 18, 2009 at 11:19:10PM +0200, Jens Axboe wrote:
>> >> On Wed, Aug 19 2009, Yan, Zheng =A0wrote:
>> >> > 2009/8/19 Nick Piggin <npiggin@suse.de>:
>> >> It can work with key aliases, if it's a problem then it's likely =
due to
>> >> another problem in related lookup code.
>> > See my other reply. It *can* work with key aliases, but this parti=
cular
>> > code does not.
>> > It is pretty easy obviously to put in duplicates because the rbtre=
e
>> > code doesn't know about keys, but if we do this then it looks like
>> > it might cause the search code to miss some valid inodes and inste=
ad
>> > return freeing inodes -- so you'd also have to look at that and up=
date
>> > it which is why I didn't go down this route..
>>
>> There is no search code. The only place uses the inode tree is
>> the relocation code, it traverses the tree and uses igrab to guarant=
ee
>> freeing inodes are not touched. I'm still confused :(
> Firstly, the insert/delete code is wrong for duplicates and it will c=
rash in
> the absense of any search activity. Agree?
> Secondly, OK now if we did allow duplicates in the tree as-per my las=
t
> patch to Jens, then look what happens with igrab: it will correctly
> prevent us from getting a freeing inode, but then it will set the nex=
t
> inode to search at ino+1 -- ie. it will not correctly traverse duplic=
ates
> without modifications. Agree?
> So with that in mind -- the fact that you don't want to see freeing
> inodes in your search code, then there is no point to handle duplicat=
es
> at all; simply remove freeing inodes from the tree.
>

I agree all of this. Thing confuses me is you saw crashes in
inode_tree_del.  It's unlikely you were playing btrfsctl -b when you
encountered the problem. So no search code got involved, only
inode_tree_add/del modified the tree. I don't think the crash was
caused by duplicates in the tree.

I have no objection to take this patch.

Regards
Yan, Zheng
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" =
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch] btrfs: fix inode rbtree corruption
  2009-08-19  9:34           ` Yan, Zheng 
@ 2009-08-19 10:47             ` Nick Piggin
  2009-08-19 12:00               ` Yan, Zheng 
  0 siblings, 1 reply; 20+ messages in thread
From: Nick Piggin @ 2009-08-19 10:47 UTC (permalink / raw)
  To: Yan, Zheng ; +Cc: Jens Axboe, Chris Mason, linux-btrfs

On Wed, Aug 19, 2009 at 05:34:08PM +0800, Yan, Zheng  wrote:
> 2009/8/19 Nick Piggin <npiggin@suse.de>:
> > On Wed, Aug 19, 2009 at 04:56:12PM +0800, Yan, Zheng =A0wrote:
> > Firstly, the insert/delete code is wrong for duplicates and it will=
 crash in
> > the absense of any search activity. Agree?
> > Secondly, OK now if we did allow duplicates in the tree as-per my l=
ast
> > patch to Jens, then look what happens with igrab: it will correctly
> > prevent us from getting a freeing inode, but then it will set the n=
ext
> > inode to search at ino+1 -- ie. it will not correctly traverse dupl=
icates
> > without modifications. Agree?
> > So with that in mind -- the fact that you don't want to see freeing
> > inodes in your search code, then there is no point to handle duplic=
ates
> > at all; simply remove freeing inodes from the tree.
> >
>=20
> I agree all of this. Thing confuses me is you saw crashes in
> inode_tree_del.  It's unlikely you were playing btrfsctl -b when you
> encountered the problem. So no search code got involved, only
> inode_tree_add/del modified the tree. I don't think the crash was
> caused by duplicates in the tree.

Oh, it is because inodes get reclaimed then presumably the inode number
gets reused while an inode of the same number is being freed. I'm not
exactly sure how the inode and inode number lifetimes work in btrfs...

Duplicate inodes are definitely being added because I put a message in
there and it is being triggered. What happens is that the duplicate
insertion code is wrong, and it kicks the old inode out of the tree
with the inode still marked as being in the tree.  So when you try to
delete that inode, it crashes.

I'm not quite sure how to explain it any better. It is pretty easy to
reproduce (it is the same workload as I describe in fsx-linux failures
mail) if you would like to verify what is happening.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" =
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch] btrfs: fix inode rbtree corruption
  2009-08-19 10:47             ` Nick Piggin
@ 2009-08-19 12:00               ` Yan, Zheng 
  0 siblings, 0 replies; 20+ messages in thread
From: Yan, Zheng  @ 2009-08-19 12:00 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linux-btrfs

2009/8/19 Nick Piggin <npiggin@suse.de>:
> On Wed, Aug 19, 2009 at 05:34:08PM +0800, Yan, Zheng =A0wrote:
>> 2009/8/19 Nick Piggin <npiggin@suse.de>:
>> > On Wed, Aug 19, 2009 at 04:56:12PM +0800, Yan, Zheng =A0wrote:
>> > Firstly, the insert/delete code is wrong for duplicates and it wil=
l crash in
>> > the absense of any search activity. Agree?
>> > Secondly, OK now if we did allow duplicates in the tree as-per my =
last
>> > patch to Jens, then look what happens with igrab: it will correctl=
y
>> > prevent us from getting a freeing inode, but then it will set the =
next
>> > inode to search at ino+1 -- ie. it will not correctly traverse dup=
licates
>> > without modifications. Agree?
>> > So with that in mind -- the fact that you don't want to see freein=
g
>> > inodes in your search code, then there is no point to handle dupli=
cates
>> > at all; simply remove freeing inodes from the tree.
>> >
>>
>> I agree all of this. Thing confuses me is you saw crashes in
>> inode_tree_del. =A0It's unlikely you were playing btrfsctl -b when y=
ou
>> encountered the problem. So no search code got involved, only
>> inode_tree_add/del modified the tree. I don't think the crash was
>> caused by duplicates in the tree.
>
> Oh, it is because inodes get reclaimed then presumably the inode numb=
er
> gets reused while an inode of the same number is being freed. I'm not
> exactly sure how the inode and inode number lifetimes work in btrfs..=
=2E
>
> Duplicate inodes are definitely being added because I put a message i=
n
> there and it is being triggered. What happens is that the duplicate
> insertion code is wrong, and it kicks the old inode out of the tree
> with the inode still marked as being in the tree. =A0So when you try =
to
> delete that inode, it crashes.
>
> I'm not quite sure how to explain it any better. It is pretty easy to
> reproduce (it is the same workload as I describe in fsx-linux failure=
s
> mail) if you would like to verify what is happening.
>

I finally understand, thank you very much.
I didn't read your reply contains duplicates toleration fix carefully, =
sorry.

Yan, Zheng
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" =
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch] btrfs: fix inode rbtree corruption
  2009-08-19  8:59           ` Jens Axboe
@ 2009-08-20 13:23             ` Nick Piggin
  2009-08-20 13:51               ` Yan, Zheng 
  0 siblings, 1 reply; 20+ messages in thread
From: Nick Piggin @ 2009-08-20 13:23 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Yan, Zheng , Chris Mason, linux-btrfs

On Wed, Aug 19, 2009 at 10:59:07AM +0200, Jens Axboe wrote:
> On Wed, Aug 19 2009, Nick Piggin wrote:
> > On Wed, Aug 19, 2009 at 10:46:59AM +0200, Jens Axboe wrote:
> > > On Wed, Aug 19 2009, Nick Piggin wrote:
> > > > See my other reply. It *can* work with key aliases, but this particular
> > > > code does not.
> > > > 
> > > > It is pretty easy obviously to put in duplicates because the rbtree
> > > > code doesn't know about keys, but if we do this then it looks like
> > > > it might cause the search code to miss some valid inodes and instead
> > > > return freeing inodes -- so you'd also have to look at that and update
> > > > it which is why I didn't go down this route..
> > > 
> > > Mine was just a generic statement, I didn't read the btrfs code (hence
> > > my comment about potential lookup bug, if you allow aliases you have to
> > > be careful).
> > 
> > Ah ok. Well yeah in this case btrfs is definitely wrong in the way it
> > tried to insert aliases.
> 
> I looked at the actual problem now and I agree, it cannot work that way.
> I don't know if Linus is planning another -rc, we should probably get
> this upstream sooner rather than later. Chris is away this week, so if
> we can get Yan to agree on this patch as well, I'll submit it.

I think the first patch I submitted was agreed?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch] btrfs: fix inode rbtree corruption
  2009-08-20 13:23             ` Nick Piggin
@ 2009-08-20 13:51               ` Yan, Zheng 
  2009-08-20 22:07                 ` Jens Axboe
  0 siblings, 1 reply; 20+ messages in thread
From: Yan, Zheng  @ 2009-08-20 13:51 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Jens Axboe, Chris Mason, linux-btrfs

2009/8/20 Nick Piggin <npiggin@suse.de>:
> On Wed, Aug 19, 2009 at 10:59:07AM +0200, Jens Axboe wrote:
>> On Wed, Aug 19 2009, Nick Piggin wrote:
>> > On Wed, Aug 19, 2009 at 10:46:59AM +0200, Jens Axboe wrote:
>> > > On Wed, Aug 19 2009, Nick Piggin wrote:
>> > > > See my other reply. It *can* work with key aliases, but this particular
>> > > > code does not.
>> > > >
>> > > > It is pretty easy obviously to put in duplicates because the rbtree
>> > > > code doesn't know about keys, but if we do this then it looks like
>> > > > it might cause the search code to miss some valid inodes and instead
>> > > > return freeing inodes -- so you'd also have to look at that and update
>> > > > it which is why I didn't go down this route..
>> > >
>> > > Mine was just a generic statement, I didn't read the btrfs code (hence
>> > > my comment about potential lookup bug, if you allow aliases you have to
>> > > be careful).
>> >
>> > Ah ok. Well yeah in this case btrfs is definitely wrong in the way it
>> > tried to insert aliases.
>>
>> I looked at the actual problem now and I agree, it cannot work that way.
>> I don't know if Linus is planning another -rc, we should probably get
>> this upstream sooner rather than later. Chris is away this week, so if
>> we can get Yan to agree on this patch as well, I'll submit it.
> I think the first patch I submitted was agreed?
>

Of course, thank you.

Yan, Zheng

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch] btrfs: fix inode rbtree corruption
  2009-08-20 13:51               ` Yan, Zheng 
@ 2009-08-20 22:07                 ` Jens Axboe
  2009-08-21  0:55                   ` Yan, Zheng 
  0 siblings, 1 reply; 20+ messages in thread
From: Jens Axboe @ 2009-08-20 22:07 UTC (permalink / raw)
  To: Yan, Zheng ; +Cc: Nick Piggin, Chris Mason, linux-btrfs

On Thu, Aug 20 2009, Yan, Zheng  wrote:
> 2009/8/20 Nick Piggin <npiggin@suse.de>:
> > On Wed, Aug 19, 2009 at 10:59:07AM +0200, Jens Axboe wrote:
> >> On Wed, Aug 19 2009, Nick Piggin wrote:
> >> > On Wed, Aug 19, 2009 at 10:46:59AM +0200, Jens Axboe wrote:
> >> > > On Wed, Aug 19 2009, Nick Piggin wrote:
> >> > > > See my other reply. It *can* work with key aliases, but this particular
> >> > > > code does not.
> >> > > >
> >> > > > It is pretty easy obviously to put in duplicates because the rbtree
> >> > > > code doesn't know about keys, but if we do this then it looks like
> >> > > > it might cause the search code to miss some valid inodes and instead
> >> > > > return freeing inodes -- so you'd also have to look at that and update
> >> > > > it which is why I didn't go down this route..
> >> > >
> >> > > Mine was just a generic statement, I didn't read the btrfs code (hence
> >> > > my comment about potential lookup bug, if you allow aliases you have to
> >> > > be careful).
> >> >
> >> > Ah ok. Well yeah in this case btrfs is definitely wrong in the way it
> >> > tried to insert aliases.
> >>
> >> I looked at the actual problem now and I agree, it cannot work that way.
> >> I don't know if Linus is planning another -rc, we should probably get
> >> this upstream sooner rather than later. Chris is away this week, so if
> >> we can get Yan to agree on this patch as well, I'll submit it.
> > I think the first patch I submitted was agreed?
> >
> 
> Of course, thank you.

Yan, are you sending this upstream?

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch] btrfs: fix inode rbtree corruption
  2009-08-20 22:07                 ` Jens Axboe
@ 2009-08-21  0:55                   ` Yan, Zheng 
  2009-08-21  6:20                     ` Jens Axboe
  0 siblings, 1 reply; 20+ messages in thread
From: Yan, Zheng  @ 2009-08-21  0:55 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Nick Piggin, Chris Mason, linux-btrfs

2009/8/21 Jens Axboe <jens.axboe@oracle.com>:
> On Thu, Aug 20 2009, Yan, Zheng =A0wrote:
>> 2009/8/20 Nick Piggin <npiggin@suse.de>:
>> > On Wed, Aug 19, 2009 at 10:59:07AM +0200, Jens Axboe wrote:
>> >> On Wed, Aug 19 2009, Nick Piggin wrote:
>> >> > On Wed, Aug 19, 2009 at 10:46:59AM +0200, Jens Axboe wrote:
>> >> > > On Wed, Aug 19 2009, Nick Piggin wrote:
>> >> > > > See my other reply. It *can* work with key aliases, but thi=
s particular
>> >> > > > code does not.
>> >> > > >
>> >> > > > It is pretty easy obviously to put in duplicates because th=
e rbtree
>> >> > > > code doesn't know about keys, but if we do this then it loo=
ks like
>> >> > > > it might cause the search code to miss some valid inodes an=
d instead
>> >> > > > return freeing inodes -- so you'd also have to look at that=
 and update
>> >> > > > it which is why I didn't go down this route..
>> >> > >
>> >> > > Mine was just a generic statement, I didn't read the btrfs co=
de (hence
>> >> > > my comment about potential lookup bug, if you allow aliases y=
ou have to
>> >> > > be careful).
>> >> >
>> >> > Ah ok. Well yeah in this case btrfs is definitely wrong in the =
way it
>> >> > tried to insert aliases.
>> >>
>> >> I looked at the actual problem now and I agree, it cannot work th=
at way.
>> >> I don't know if Linus is planning another -rc, we should probably=
 get
>> >> this upstream sooner rather than later. Chris is away this week, =
so if
>> >> we can get Yan to agree on this patch as well, I'll submit it.
>> > I think the first patch I submitted was agreed?
>> >
>>
>> Of course, thank you.
> Yan, are you sending this upstream?

No.  Jens, please submit it. Thanks

Yan, Zheng
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" =
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch] btrfs: fix inode rbtree corruption
  2009-08-21  0:55                   ` Yan, Zheng 
@ 2009-08-21  6:20                     ` Jens Axboe
  2009-08-21  8:06                       ` Yan, Zheng 
  0 siblings, 1 reply; 20+ messages in thread
From: Jens Axboe @ 2009-08-21  6:20 UTC (permalink / raw)
  To: Yan, Zheng ; +Cc: Nick Piggin, Chris Mason, linux-btrfs

On Fri, Aug 21 2009, Yan, Zheng  wrote:
> 2009/8/21 Jens Axboe <jens.axboe@oracle.com>:
> > On Thu, Aug 20 2009, Yan, Zheng =A0wrote:
> >> 2009/8/20 Nick Piggin <npiggin@suse.de>:
> >> > On Wed, Aug 19, 2009 at 10:59:07AM +0200, Jens Axboe wrote:
> >> >> On Wed, Aug 19 2009, Nick Piggin wrote:
> >> >> > On Wed, Aug 19, 2009 at 10:46:59AM +0200, Jens Axboe wrote:
> >> >> > > On Wed, Aug 19 2009, Nick Piggin wrote:
> >> >> > > > See my other reply. It *can* work with key aliases, but t=
his particular
> >> >> > > > code does not.
> >> >> > > >
> >> >> > > > It is pretty easy obviously to put in duplicates because =
the rbtree
> >> >> > > > code doesn't know about keys, but if we do this then it l=
ooks like
> >> >> > > > it might cause the search code to miss some valid inodes =
and instead
> >> >> > > > return freeing inodes -- so you'd also have to look at th=
at and update
> >> >> > > > it which is why I didn't go down this route..
> >> >> > >
> >> >> > > Mine was just a generic statement, I didn't read the btrfs =
code (hence
> >> >> > > my comment about potential lookup bug, if you allow aliases=
 you have to
> >> >> > > be careful).
> >> >> >
> >> >> > Ah ok. Well yeah in this case btrfs is definitely wrong in th=
e way it
> >> >> > tried to insert aliases.
> >> >>
> >> >> I looked at the actual problem now and I agree, it cannot work =
that way.
> >> >> I don't know if Linus is planning another -rc, we should probab=
ly get
> >> >> this upstream sooner rather than later. Chris is away this week=
, so if
> >> >> we can get Yan to agree on this patch as well, I'll submit it.
> >> > I think the first patch I submitted was agreed?
> >> >
> >>
> >> Of course, thank you.
> > Yan, are you sending this upstream?
>=20
> No.  Jens, please submit it. Thanks

Will do. Can I add your acked-by?

--=20
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" =
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch] btrfs: fix inode rbtree corruption
  2009-08-21  6:20                     ` Jens Axboe
@ 2009-08-21  8:06                       ` Yan, Zheng 
  2009-08-21  8:10                         ` Jens Axboe
  0 siblings, 1 reply; 20+ messages in thread
From: Yan, Zheng  @ 2009-08-21  8:06 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Nick Piggin, Chris Mason, linux-btrfs

2009/8/21 Jens Axboe <jens.axboe@oracle.com>:
> On Fri, Aug 21 2009, Yan, Zheng =A0wrote:
>> 2009/8/21 Jens Axboe <jens.axboe@oracle.com>:
>> > On Thu, Aug 20 2009, Yan, Zheng =A0wrote:
>> >> 2009/8/20 Nick Piggin <npiggin@suse.de>:
>> >> > On Wed, Aug 19, 2009 at 10:59:07AM +0200, Jens Axboe wrote:
>> >> >> On Wed, Aug 19 2009, Nick Piggin wrote:
>> >> >> > On Wed, Aug 19, 2009 at 10:46:59AM +0200, Jens Axboe wrote:
>> >> >> > > On Wed, Aug 19 2009, Nick Piggin wrote:
>> >> >> > > > See my other reply. It *can* work with key aliases, but =
this particular
>> >> >> > > > code does not.
>> >> >> > > >
>> >> >> > > > It is pretty easy obviously to put in duplicates because=
 the rbtree
>> >> >> > > > code doesn't know about keys, but if we do this then it =
looks like
>> >> >> > > > it might cause the search code to miss some valid inodes=
 and instead
>> >> >> > > > return freeing inodes -- so you'd also have to look at t=
hat and update
>> >> >> > > > it which is why I didn't go down this route..
>> >> >> > >
>> >> >> > > Mine was just a generic statement, I didn't read the btrfs=
 code (hence
>> >> >> > > my comment about potential lookup bug, if you allow aliase=
s you have to
>> >> >> > > be careful).
>> >> >> >
>> >> >> > Ah ok. Well yeah in this case btrfs is definitely wrong in t=
he way it
>> >> >> > tried to insert aliases.
>> >> >>
>> >> >> I looked at the actual problem now and I agree, it cannot work=
 that way.
>> >> >> I don't know if Linus is planning another -rc, we should proba=
bly get
>> >> >> this upstream sooner rather than later. Chris is away this wee=
k, so if
>> >> >> we can get Yan to agree on this patch as well, I'll submit it.
>> >> > I think the first patch I submitted was agreed?
>> >> >
>> >>
>> >> Of course, thank you.
>> > Yan, are you sending this upstream?
>>
>> No. =A0Jens, please submit it. Thanks
> Will do. Can I add your acked-by?

Acked-by: Yan Zheng <zheng.yan@oracle.com>

Thanks
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" =
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch] btrfs: fix inode rbtree corruption
  2009-08-21  8:06                       ` Yan, Zheng 
@ 2009-08-21  8:10                         ` Jens Axboe
  0 siblings, 0 replies; 20+ messages in thread
From: Jens Axboe @ 2009-08-21  8:10 UTC (permalink / raw)
  To: Yan, Zheng ; +Cc: Nick Piggin, Chris Mason, linux-btrfs

On Fri, Aug 21 2009, Yan, Zheng  wrote:
> 2009/8/21 Jens Axboe <jens.axboe@oracle.com>:
> > On Fri, Aug 21 2009, Yan, Zheng =A0wrote:
> >> 2009/8/21 Jens Axboe <jens.axboe@oracle.com>:
> >> > On Thu, Aug 20 2009, Yan, Zheng =A0wrote:
> >> >> 2009/8/20 Nick Piggin <npiggin@suse.de>:
> >> >> > On Wed, Aug 19, 2009 at 10:59:07AM +0200, Jens Axboe wrote:
> >> >> >> On Wed, Aug 19 2009, Nick Piggin wrote:
> >> >> >> > On Wed, Aug 19, 2009 at 10:46:59AM +0200, Jens Axboe wrote=
:
> >> >> >> > > On Wed, Aug 19 2009, Nick Piggin wrote:
> >> >> >> > > > See my other reply. It *can* work with key aliases, bu=
t this particular
> >> >> >> > > > code does not.
> >> >> >> > > >
> >> >> >> > > > It is pretty easy obviously to put in duplicates becau=
se the rbtree
> >> >> >> > > > code doesn't know about keys, but if we do this then i=
t looks like
> >> >> >> > > > it might cause the search code to miss some valid inod=
es and instead
> >> >> >> > > > return freeing inodes -- so you'd also have to look at=
 that and update
> >> >> >> > > > it which is why I didn't go down this route..
> >> >> >> > >
> >> >> >> > > Mine was just a generic statement, I didn't read the btr=
fs code (hence
> >> >> >> > > my comment about potential lookup bug, if you allow alia=
ses you have to
> >> >> >> > > be careful).
> >> >> >> >
> >> >> >> > Ah ok. Well yeah in this case btrfs is definitely wrong in=
 the way it
> >> >> >> > tried to insert aliases.
> >> >> >>
> >> >> >> I looked at the actual problem now and I agree, it cannot wo=
rk that way.
> >> >> >> I don't know if Linus is planning another -rc, we should pro=
bably get
> >> >> >> this upstream sooner rather than later. Chris is away this w=
eek, so if
> >> >> >> we can get Yan to agree on this patch as well, I'll submit i=
t.
> >> >> > I think the first patch I submitted was agreed?
> >> >> >
> >> >>
> >> >> Of course, thank you.
> >> > Yan, are you sending this upstream?
> >>
> >> No. =A0Jens, please submit it. Thanks
> > Will do. Can I add your acked-by?
>=20
> Acked-by: Yan Zheng <zheng.yan@oracle.com>

Thanks Yan, going upstream now.

--=20
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" =
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2009-08-21  8:10 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-08-18 16:45 [patch] btrfs: fix inode rbtree corruption Nick Piggin
2009-08-18 18:56 ` Yan, Zheng 
2009-08-18 21:19   ` Jens Axboe
2009-08-19  8:45     ` Nick Piggin
2009-08-19  8:46       ` Jens Axboe
2009-08-19  8:52         ` Nick Piggin
2009-08-19  8:59           ` Jens Axboe
2009-08-20 13:23             ` Nick Piggin
2009-08-20 13:51               ` Yan, Zheng 
2009-08-20 22:07                 ` Jens Axboe
2009-08-21  0:55                   ` Yan, Zheng 
2009-08-21  6:20                     ` Jens Axboe
2009-08-21  8:06                       ` Yan, Zheng 
2009-08-21  8:10                         ` Jens Axboe
2009-08-19  8:56       ` Yan, Zheng 
2009-08-19  9:04         ` Nick Piggin
2009-08-19  9:34           ` Yan, Zheng 
2009-08-19 10:47             ` Nick Piggin
2009-08-19 12:00               ` Yan, Zheng 
2009-08-19  8:32   ` Nick Piggin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox