public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* bug in ext3 htree rename: doesn't delete old name, leaves ino with bad nlink
@ 2002-11-05  4:47 Jeremy Fitzhardinge
  2002-11-05 21:24 ` [Ext2-devel] " chrisl
  0 siblings, 1 reply; 13+ messages in thread
From: Jeremy Fitzhardinge @ 2002-11-05  4:47 UTC (permalink / raw)
  To: Ext2 devel; +Cc: Linux Kernel List

I've isolated the problem to rename not removing the old name under some
circumstances, leaving two names for a file with an nlink of 1.  This
will reliably reproduce the problem for me, under 2.4.19-ac4 and 2.4.19
(stock) w/ patch-ext3-dxdir-2.4.19-4.

Generate a new filesystem: this will create htree-bug.fs
$ sh genfs

# mkdir m
# mount -o loop htree-bug.fs m

$ gcc -o tickle tickle.c
$ ./tickle m/test
*** rename("drivers/scsi/psi240i.h", "drivers/scsi/psi240i.h.orig") failure:
stating drivers/scsi/psi240i.h
  ino=294
  nlink=1
stating drivers/scsi/psi240i.h.orig
  ino=294
  nlink=1
*** rename("drivers/scsi/sun3_scsi.h", "drivers/scsi/sun3_scsi.h.orig") failure:
stating drivers/scsi/sun3_scsi.h
  ino=350
  nlink=1
stating drivers/scsi/sun3_scsi.h.orig
  ino=350
  nlink=1

# umount m

$ e2fsck -f htree-bug.fs
e2fsck 1.30-WIP (30-Sep-2002)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Inode 294 ref count is 1, should be 2.  Fix<y>? yes

Inode 350 ref count is 1, should be 2.  Fix<y>? yes

Pass 5: Checking group summary information

htree-bug.fs: ***** FILE SYSTEM WAS MODIFIED *****
htree-bug.fs: 541/10240 files (0.2% non-contiguous), 1369/10240 blocks
exit status 1
$ debugfs htree-bug.fs 
debugfs 1.30-WIP (30-Sep-2002)
debugfs:  ncheck 294
Inode   Pathname
294     /test/drivers/scsi/psi240i.h
debugfs:  ncheck 350
Inode   Pathname
350     /test/drivers/scsi/sun3_scsi.h


I've put all the bits needed to reproduce the bug (genfs, tickle) at
http://www.goop.org/~jeremy/htree/


        J


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Ext2-devel] bug in ext3 htree rename: doesn't delete old name, leaves ino with bad nlink
  2002-11-05  4:47 bug in ext3 htree rename: doesn't delete old name, leaves ino with bad nlink Jeremy Fitzhardinge
@ 2002-11-05 21:24 ` chrisl
  2002-11-06  5:02   ` Jeremy Fitzhardinge
  2002-11-06  8:25   ` [PATCH] Fix " chrisl
  0 siblings, 2 replies; 13+ messages in thread
From: chrisl @ 2002-11-05 21:24 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: Ext2 devel, Linux Kernel List

Thanks again for the nice bug report.

I think I understand the problem now. What happen was, in ext3_rename,
it will first add the new entry to directory and then remove the
old entry. In this case, when add a new entry to the directory
cause a leaf node split. And the old entry is in the very same
leaf node. After split, the old entry have been move to another
leaf node. But ext3_rename still holding the old pointer and bh
to the old leaf node.

It will try to delete the entry from old leaf node, and of course,
it can't find it.

You know the rest of the story then.

The old ext3 code is OK because it only append new blocks, it
will not change the previous block.

This also raise an interesting question, after split leave node,
do we need to update the dentry cache for the change?

I will try to post a patch tonight (pacific time) to fix this problem.

Cheers.

Chris

On Mon, Nov 04, 2002 at 08:47:50PM -0800, Jeremy Fitzhardinge wrote:
> I've isolated the problem to rename not removing the old name under some
> circumstances, leaving two names for a file with an nlink of 1.  This
> will reliably reproduce the problem for me, under 2.4.19-ac4 and 2.4.19
> (stock) w/ patch-ext3-dxdir-2.4.19-4.
> 
> Generate a new filesystem: this will create htree-bug.fs
> $ sh genfs
> 
> # mkdir m
> # mount -o loop htree-bug.fs m
> 
> $ gcc -o tickle tickle.c
> $ ./tickle m/test
> *** rename("drivers/scsi/psi240i.h", "drivers/scsi/psi240i.h.orig") failure:
> stating drivers/scsi/psi240i.h
>   ino=294
>   nlink=1
> stating drivers/scsi/psi240i.h.orig
>   ino=294
>   nlink=1
> *** rename("drivers/scsi/sun3_scsi.h", "drivers/scsi/sun3_scsi.h.orig") failure:
> stating drivers/scsi/sun3_scsi.h
>   ino=350
>   nlink=1
> stating drivers/scsi/sun3_scsi.h.orig
>   ino=350
>   nlink=1
> 
> # umount m
> 
> $ e2fsck -f htree-bug.fs
> e2fsck 1.30-WIP (30-Sep-2002)
> Pass 1: Checking inodes, blocks, and sizes
> Pass 2: Checking directory structure
> Pass 3: Checking directory connectivity
> Pass 4: Checking reference counts
> Inode 294 ref count is 1, should be 2.  Fix<y>? yes
> 
> Inode 350 ref count is 1, should be 2.  Fix<y>? yes
> 
> Pass 5: Checking group summary information
> 
> htree-bug.fs: ***** FILE SYSTEM WAS MODIFIED *****
> htree-bug.fs: 541/10240 files (0.2% non-contiguous), 1369/10240 blocks
> exit status 1
> $ debugfs htree-bug.fs 
> debugfs 1.30-WIP (30-Sep-2002)
> debugfs:  ncheck 294
> Inode   Pathname
> 294     /test/drivers/scsi/psi240i.h
> debugfs:  ncheck 350
> Inode   Pathname
> 350     /test/drivers/scsi/sun3_scsi.h
> 
> 
> I've put all the bits needed to reproduce the bug (genfs, tickle) at
> http://www.goop.org/~jeremy/htree/
> 
> 
>         J
> 
> 
> 
> -------------------------------------------------------
> This SF.net email is sponsored by: ApacheCon, November 18-21 in
> Las Vegas (supported by COMDEX), the only Apache event to be
> fully supported by the ASF. http://www.apachecon.com
> _______________________________________________
> Ext2-devel mailing list
> Ext2-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/ext2-devel



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Ext2-devel] bug in ext3 htree rename: doesn't delete old name, leaves ino with bad nlink
  2002-11-05 21:24 ` [Ext2-devel] " chrisl
@ 2002-11-06  5:02   ` Jeremy Fitzhardinge
  2002-11-06  8:25   ` [PATCH] Fix " chrisl
  1 sibling, 0 replies; 13+ messages in thread
From: Jeremy Fitzhardinge @ 2002-11-06  5:02 UTC (permalink / raw)
  To: chrisl; +Cc: Ext2 devel, Linux Kernel List

On Tue, 2002-11-05 at 13:24, chrisl@vmware.com wrote:
> Thanks again for the nice bug report.

Thanks for looking at it so quickly.  I want ext3+htree to stabilise as
quickly as possible, and since I had a nice reproducible bug, I may as
well make the most of it.

> I think I understand the problem now. What happen was, in ext3_rename,
> it will first add the new entry to directory and then remove the
> old entry. In this case, when add a new entry to the directory
> cause a leaf node split. And the old entry is in the very same
> leaf node. After split, the old entry have been move to another
> leaf node. But ext3_rename still holding the old pointer and bh
> to the old leaf node.

Yes, I would have guessed that it was related to a tree split.  The
interesting thing to me is that it happened twice in this particular
run, and yet it must be a fairly uncommon occurrence overall (otherwise
it would have been reported before).  I wonder if it really is a rare
event, or it has just gone unnoticed?

> This also raise an interesting question, after split leave node,
> do we need to update the dentry cache for the change?

Update it in what way?  In principle a rename is an atomic operation, so
other things shouldn't be able to observe the directory in an
intermediate state.

	J



^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH] Fix bug in ext3 htree rename: doesn't delete old name, leaves ino with bad nlink
  2002-11-05 21:24 ` [Ext2-devel] " chrisl
  2002-11-06  5:02   ` Jeremy Fitzhardinge
@ 2002-11-06  8:25   ` chrisl
  2002-11-06  8:44     ` Andrew Morton
  2002-11-06 21:40     ` Theodore Ts'o
  1 sibling, 2 replies; 13+ messages in thread
From: chrisl @ 2002-11-06  8:25 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: Ext2 devel, Linux Kernel List

This should fix the ext3 htree rename problem. Please try it again.

Thanks

Chris

--- namei.c     2002/11/06 07:19:11     1.2
+++ namei.c     2002/11/06 08:17:31
@@ -2091,7 +2091,7 @@
        old_bh = new_bh = dir_bh = NULL;
 
        handle = ext3_journal_start(old_dir, 2 * EXT3_DATA_TRANS_BLOCKS +
-                                       EXT3_INDEX_EXTRA_TRANS_BLOCKS + 2);
+                                       EXT3_INDEX_EXTRA_TRANS_BLOCKS + 3);
        if (IS_ERR(handle)) {
                return PTR_ERR(handle);
        }
@@ -2167,8 +2167,30 @@
        /*
         * ok, that's it
         */
-       ext3_delete_entry(handle, old_dir, old_de, old_bh);
+       retval = ext3_delete_entry(handle, old_dir, old_de, old_bh);
+       if (retval == -ENOENT) {
+               /*
+                * old_de can be moved during ext3_add_entry.
+                */
+               struct buffer_head * old_bh2;
+               struct ext3_dir_entry_2 * old_de2;
+               old_bh2 = ext3_find_entry (old_dentry, &old_de2);
+               if (old_bh2) {
+                       retval = ext3_delete_entry(handle, old_dir, old_de2,
+                                                  old_bh2);
+                       brelse(old_bh2);
+               } else {
+                       ext3_warning(old_dir->i_sb, "ext3_rename",
+                                    "Deleting old file not found (%lu), %d",
+                                    old_dir->i_ino, old_dir->i_nlink);
+               }
 
+       }
+       if (retval) {
+               ext3_warning (old_dir->i_sb, "ext3_rename",
+                             "Deleting old file (%lu), %d, error=%d",
+                             old_dir->i_ino, old_dir->i_nlink, retval);
+       }
        if (new_inode) {
                new_inode->i_nlink--;
                new_inode->i_ctime = CURRENT_TIME;



On Tue, Nov 05, 2002 at 01:24:16PM -0800, chrisl@vmware.com wrote:
> Thanks again for the nice bug report.
> 
> I think I understand the problem now. What happen was, in ext3_rename,
> it will first add the new entry to directory and then remove the
> old entry. In this case, when add a new entry to the directory
> cause a leaf node split. And the old entry is in the very same
> leaf node. After split, the old entry have been move to another
> leaf node. But ext3_rename still holding the old pointer and bh
> to the old leaf node.
> 
> It will try to delete the entry from old leaf node, and of course,
> it can't find it.
> 
> You know the rest of the story then.
> 
> The old ext3 code is OK because it only append new blocks, it
> will not change the previous block.
> 
> This also raise an interesting question, after split leave node,
> do we need to update the dentry cache for the change?
> 
> I will try to post a patch tonight (pacific time) to fix this problem.
> 
> Cheers.
> 
> Chris
> 
> On Mon, Nov 04, 2002 at 08:47:50PM -0800, Jeremy Fitzhardinge wrote:
> > I've isolated the problem to rename not removing the old name under some
> > circumstances, leaving two names for a file with an nlink of 1.  This
> > will reliably reproduce the problem for me, under 2.4.19-ac4 and 2.4.19
> > (stock) w/ patch-ext3-dxdir-2.4.19-4.
> > 
> > Generate a new filesystem: this will create htree-bug.fs
> > $ sh genfs
> > 
> > # mkdir m
> > # mount -o loop htree-bug.fs m
> > 
> > $ gcc -o tickle tickle.c
> > $ ./tickle m/test
> > *** rename("drivers/scsi/psi240i.h", "drivers/scsi/psi240i.h.orig") failure:
> > stating drivers/scsi/psi240i.h
> >   ino=294
> >   nlink=1
> > stating drivers/scsi/psi240i.h.orig
> >   ino=294
> >   nlink=1
> > *** rename("drivers/scsi/sun3_scsi.h", "drivers/scsi/sun3_scsi.h.orig") failure:
> > stating drivers/scsi/sun3_scsi.h
> >   ino=350
> >   nlink=1
> > stating drivers/scsi/sun3_scsi.h.orig
> >   ino=350
> >   nlink=1
> > 
> > # umount m
> > 
> > $ e2fsck -f htree-bug.fs
> > e2fsck 1.30-WIP (30-Sep-2002)
> > Pass 1: Checking inodes, blocks, and sizes
> > Pass 2: Checking directory structure
> > Pass 3: Checking directory connectivity
> > Pass 4: Checking reference counts
> > Inode 294 ref count is 1, should be 2.  Fix<y>? yes
> > 
> > Inode 350 ref count is 1, should be 2.  Fix<y>? yes
> > 
> > Pass 5: Checking group summary information
> > 
> > htree-bug.fs: ***** FILE SYSTEM WAS MODIFIED *****
> > htree-bug.fs: 541/10240 files (0.2% non-contiguous), 1369/10240 blocks
> > exit status 1
> > $ debugfs htree-bug.fs 
> > debugfs 1.30-WIP (30-Sep-2002)
> > debugfs:  ncheck 294
> > Inode   Pathname
> > 294     /test/drivers/scsi/psi240i.h
> > debugfs:  ncheck 350
> > Inode   Pathname
> > 350     /test/drivers/scsi/sun3_scsi.h
> > 
> > 
> > I've put all the bits needed to reproduce the bug (genfs, tickle) at
> > http://www.goop.org/~jeremy/htree/
> > 
> > 
> >         J
> > 
> > 
> > 
> > -------------------------------------------------------
> > This SF.net email is sponsored by: ApacheCon, November 18-21 in
> > Las Vegas (supported by COMDEX), the only Apache event to be
> > fully supported by the ASF. http://www.apachecon.com
> > _______________________________________________
> > Ext2-devel mailing list
> > Ext2-devel@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/ext2-devel
> 
> 
> 
> 
> -------------------------------------------------------
> This sf.net email is sponsored by: See the NEW Palm 
> Tungsten T handheld. Power & Color in a compact size!
> http://ads.sourceforge.net/cgi-bin/redirect.pl?palm0001en
> _______________________________________________
> Ext2-devel mailing list
> Ext2-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/ext2-devel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Fix bug in ext3 htree rename: doesn't delete old name,  leaves ino with bad nlink
  2002-11-06  8:25   ` [PATCH] Fix " chrisl
@ 2002-11-06  8:44     ` Andrew Morton
  2002-11-06 21:40     ` Theodore Ts'o
  1 sibling, 0 replies; 13+ messages in thread
From: Andrew Morton @ 2002-11-06  8:44 UTC (permalink / raw)
  To: chrisl; +Cc: Jeremy Fitzhardinge, Ext2 devel, Linux Kernel List

chrisl@vmware.com wrote:
> 
> This should fix the ext3 htree rename problem. Please try it again.
> 

The 2.5 version....


--- 25/fs/ext3/namei.c~ext3-rename-fix	Wed Nov  6 00:36:43 2002
+++ 25-akpm/fs/ext3/namei.c	Wed Nov  6 00:41:20 2002
@@ -2160,7 +2160,7 @@ static int ext3_rename (struct inode * o
 
 	lock_kernel();
 	handle = ext3_journal_start(old_dir, 2 * EXT3_DATA_TRANS_BLOCKS +
-			 		EXT3_INDEX_EXTRA_TRANS_BLOCKS + 2);
+			 		EXT3_INDEX_EXTRA_TRANS_BLOCKS + 3);
 	if (IS_ERR(handle)) {
 		unlock_kernel();
 		return PTR_ERR(handle);
@@ -2237,7 +2237,30 @@ static int ext3_rename (struct inode * o
 	/*
 	 * ok, that's it
 	 */
-	ext3_delete_entry(handle, old_dir, old_de, old_bh);
+	retval = ext3_delete_entry(handle, old_dir, old_de, old_bh);
+	if (retval == -ENOENT) {
+		/*
+		 * old_de can be moved during ext3_add_entry.
+		 */
+		struct buffer_head *old_bh2;
+		struct ext3_dir_entry_2 *old_de2;
+		old_bh2 = ext3_find_entry(old_dentry, &old_de2);
+		if (old_bh2) {
+			retval = ext3_delete_entry(handle, old_dir,
+						old_de2, old_bh2);
+			brelse(old_bh2);
+		} else {
+			ext3_warning(old_dir->i_sb, "ext3_rename",
+				"Deleting old file not found (%lu), %d",
+				old_dir->i_ino, old_dir->i_nlink);
+		}
+	}
+
+	if (retval) {
+		ext3_warning(old_dir->i_sb, "ext3_rename",
+				"Deleting old file (%lu), %d, error=%d",
+				old_dir->i_ino, old_dir->i_nlink, retval);
+	}
 
 	if (new_inode) {
 		new_inode->i_nlink--;

_

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Fix bug in ext3 htree rename: doesn't delete old name, leaves ino with bad nlink
  2002-11-06  8:25   ` [PATCH] Fix " chrisl
  2002-11-06  8:44     ` Andrew Morton
@ 2002-11-06 21:40     ` Theodore Ts'o
  2002-11-07  1:24       ` Christopher Li
  1 sibling, 1 reply; 13+ messages in thread
From: Theodore Ts'o @ 2002-11-06 21:40 UTC (permalink / raw)
  To: chrisl; +Cc: Jeremy Fitzhardinge, Ext2 devel, Linux Kernel List

On Wed, Nov 06, 2002 at 12:25:00AM -0800, chrisl@vmware.com wrote:
> This should fix the ext3 htree rename problem. Please try it again.

I've looked over the patch, and I've got some comments....

>         handle = ext3_journal_start(old_dir, 2 * EXT3_DATA_TRANS_BLOCKS +
> -                                       EXT3_INDEX_EXTRA_TRANS_BLOCKS + 2);
> +                                       EXT3_INDEX_EXTRA_TRANS_BLOCKS + 3);

There's no need to increase the number of blocks that might need to be
dirtied; if ext3_delete_entry() can't find the missing entry, it won't
dirty the block, so the number of blocks that might need to modified
remains constant.

> -       ext3_delete_entry(handle, old_dir, old_de, old_bh);
> +       retval = ext3_delete_entry(handle, old_dir, old_de, old_bh);
> +       if (retval == -ENOENT) {
> +               /*
> +                * old_de can be moved during ext3_add_entry.
> +                */
> +               struct buffer_head * old_bh2;
> +               struct ext3_dir_entry_2 * old_de2;
> +               old_bh2 = ext3_find_entry (old_dentry, &old_de2);
> +               if (old_bh2) {
> +                       retval = ext3_delete_entry(handle, old_dir, old_de2,
> +                                                  old_bh2);
> +                       brelse(old_bh2);
> +               } else {
> +                       ext3_warning(old_dir->i_sb, "ext3_rename",
> +                                    "Deleting old file not found (%lu), %d",
> +                                    old_dir->i_ino, old_dir->i_nlink);
> +               }
>  
> +       }

Simply retrying the ext3_delete_entry() isn't sufficient, since
another ext3_add_entry() could move the directory entry again while
you're reading in the blocks as part of ext3_find_entry().  OK, that
would be pretty rare, since enough other directory adds would have
to fill up enough that another split could happen, but *is* possible.
(Surely our scheduler isn't that unfair....)

Probably a better thing to do is use a while loop, and retry as long
(a) ext3_delete_entry fails, and (b) old_dir->i_version has changed.
In practice this will probably never happen, but I'll feel better with
that change.

Anyway, I plan to make these two changes to your patch, and then
submit it to Linus.

						- Ted

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Fix bug in ext3 htree rename: doesn't delete old name, leaves ino with bad nlink
  2002-11-07  1:24       ` Christopher Li
@ 2002-11-06 22:41         ` Theodore Ts'o
  2002-11-06 22:47           ` Alexander Viro
  2002-11-07  1:58           ` Christopher Li
  2002-11-06 23:27         ` [Ext2-devel] " Andreas Dilger
  1 sibling, 2 replies; 13+ messages in thread
From: Theodore Ts'o @ 2002-11-06 22:41 UTC (permalink / raw)
  To: Christopher Li; +Cc: Jeremy Fitzhardinge, Ext2 devel, Linux Kernel List

On Wed, Nov 06, 2002 at 05:24:55PM -0800, Christopher Li wrote:
> > Simply retrying the ext3_delete_entry() isn't sufficient, since
> > another ext3_add_entry() could move the directory entry again while
> > you're reading in the blocks as part of ext3_find_entry().  OK, that
> > would be pretty rare, since enough other directory adds would have
> > to fill up enough that another split could happen, but *is* possible.
> > (Surely our scheduler isn't that unfair....)
> 
> I think we have the lock on ext3_rename. I might be wrong.
> If other process can change the dir between the add new entry
> and delete old entry. We should also need to check the entry have
> been delete from other process in case fall into dead loop? 

We take the BKL, yes; but if we need to sleep waiting for a block to
be read in, that's when another process can run.  Yes, that means
another process could end up deleting the entry out from under us ---
or make some other change to the directory.  I was actually quite
nervous about this, so I spent some time auditing the code paths of
when do_split() might sleep, to make sure it would never leave the
directory in an unstable condition.

Things will actually get easier once we fine-grain lock ext3 (and
remove the BKL), since we'll very likely end up locking the directory
during an insert, rename, or delete, and so we don't have to worry
about things happening in an interleaved fashion.

> Do you have time to look at the missing "." and ".." entry patch?

Yup, that's going to Linus as well.

> Can you please change my email address in source file to vmware
> if you submit it?

I wasn't able to find your e-mail address in the source file.... 
grep -i chrisl fs/ext3/*.c didn't turn up anything?

						- Ted

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Fix bug in ext3 htree rename: doesn't delete old name, leaves ino with bad nlink
  2002-11-06 22:41         ` Theodore Ts'o
@ 2002-11-06 22:47           ` Alexander Viro
  2002-11-07  2:44             ` Theodore Ts'o
  2002-11-07  1:58           ` Christopher Li
  1 sibling, 1 reply; 13+ messages in thread
From: Alexander Viro @ 2002-11-06 22:47 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Christopher Li, Jeremy Fitzhardinge, Ext2 devel,
	Linux Kernel List



On Wed, 6 Nov 2002, Theodore Ts'o wrote:

> We take the BKL, yes; but if we need to sleep waiting for a block to
> be read in, that's when another process can run.  Yes, that means
> another process could end up deleting the entry out from under us ---
> or make some other change to the directory.  I was actually quite
> nervous about this, so I spent some time auditing the code paths of
> when do_split() might sleep, to make sure it would never leave the
> directory in an unstable condition.

HUH?

->rename() holds ->i_sem on both directories.  So do all other directory
methods.  What the hell is going on there?


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Ext2-devel] Re: [PATCH] Fix bug in ext3 htree rename: doesn't delete old name, leaves ino with bad nlink
  2002-11-07  1:24       ` Christopher Li
  2002-11-06 22:41         ` Theodore Ts'o
@ 2002-11-06 23:27         ` Andreas Dilger
  1 sibling, 0 replies; 13+ messages in thread
From: Andreas Dilger @ 2002-11-06 23:27 UTC (permalink / raw)
  To: Christopher Li
  Cc: Theodore Ts'o, Jeremy Fitzhardinge, Ext2 devel,
	Linux Kernel List

On Nov 06, 2002  17:24 -0800, Christopher Li wrote:
> On Wed, Nov 06, 2002 at 04:40:27PM -0500, Theodore Ts'o wrote:
> > On Wed, Nov 06, 2002 at 12:25:00AM -0800, chrisl@vmware.com wrote:
> > > This should fix the ext3 htree rename problem. Please try it again.
> > 
> > I've looked over the patch, and I've got some comments....
> > 
> > >      handle = ext3_journal_start(old_dir, 2 * EXT3_DATA_TRANS_BLOCKS +
> > > -                                    EXT3_INDEX_EXTRA_TRANS_BLOCKS + 2);
> > > +                                    EXT3_INDEX_EXTRA_TRANS_BLOCKS + 3);
> > 
> > There's no need to increase the number of blocks that might need to be
> > dirtied; if ext3_delete_entry() can't find the missing entry, it won't
> > dirty the block, so the number of blocks that might need to modified
> > remains constant.
> 
> Even for the same block dirty twice? I am not sure about that case
> so I put it there. I got a lots of thing to do tonight.

Ted is correct on this one - if we hit this bug because both the source
and target names were in the same block before the split, then after the
split we will still need to dirty only 2 blocks because of the rename
(the split blocks are accounted for in EXT3_INDEX_EXTRA_TRANS_BLOCKS).

> > Simply retrying the ext3_delete_entry() isn't sufficient, since
> > another ext3_add_entry() could move the directory entry again while
> > you're reading in the blocks as part of ext3_find_entry().  OK, that
> > would be pretty rare, since enough other directory adds would have
> > to fill up enough that another split could happen, but *is* possible.
> > (Surely our scheduler isn't that unfair....)
> 
> I think we have the lock on ext3_rename. I might be wrong.
> If other process can change the dir between the add new entry
> and delete old entry. We should also need to check the entry have
> been delete from other process in case fall into dead loop? 

Chris is right on this one.  Like Al said, the VFS holds i_sem on
"both" directories (or the single directory if src_dir & tgt_dir are
the same).  We don't need additional locking within ext3...(yet)

<aside>
For _real_ scaling of large directories, it would be good to allow
locking just individual leaf blocks of the directories instead of the
entire directory.  Since the source and target directory leaf blocks
are the only possible locations for those filenames, we do not have
any races w.r.t. other threads adding/deleting files of the same name
if we lock those directory blocks.

We will probably need to do this in the next year or so for Lustre where
we have a requirement for millions of files being created/renamed/deleted
in the same directory by thousands of clients, and tens of metadata
servers load-balancing those requests.
</aside>

Cheers, Andreas
--
Andreas Dilger
http://www-mddsp.enel.ucalgary.ca/People/adilger/
http://sourceforge.net/projects/ext2resize/


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Fix bug in ext3 htree rename: doesn't delete old name, leaves ino with bad nlink
  2002-11-06 21:40     ` Theodore Ts'o
@ 2002-11-07  1:24       ` Christopher Li
  2002-11-06 22:41         ` Theodore Ts'o
  2002-11-06 23:27         ` [Ext2-devel] " Andreas Dilger
  0 siblings, 2 replies; 13+ messages in thread
From: Christopher Li @ 2002-11-07  1:24 UTC (permalink / raw)
  To: Theodore Ts'o, Jeremy Fitzhardinge, Ext2 devel,
	Linux Kernel List

On Wed, Nov 06, 2002 at 04:40:27PM -0500, Theodore Ts'o wrote:
> On Wed, Nov 06, 2002 at 12:25:00AM -0800, chrisl@vmware.com wrote:
> > This should fix the ext3 htree rename problem. Please try it again.
> 
> I've looked over the patch, and I've got some comments....
> 
> >         handle = ext3_journal_start(old_dir, 2 * EXT3_DATA_TRANS_BLOCKS +
> > -                                       EXT3_INDEX_EXTRA_TRANS_BLOCKS + 2);
> > +                                       EXT3_INDEX_EXTRA_TRANS_BLOCKS + 3);
> 
> There's no need to increase the number of blocks that might need to be
> dirtied; if ext3_delete_entry() can't find the missing entry, it won't
> dirty the block, so the number of blocks that might need to modified
> remains constant.

Even for the same block dirty twice? I am not sure about that case
so I put it there. I got a lots of thing to do tonight.

> 
> > -       ext3_delete_entry(handle, old_dir, old_de, old_bh);
> > +       retval = ext3_delete_entry(handle, old_dir, old_de, old_bh);
> > +       if (retval == -ENOENT) {
> > +               /*
> > +                * old_de can be moved during ext3_add_entry.
> > +                */
> > +               struct buffer_head * old_bh2;
> > +               struct ext3_dir_entry_2 * old_de2;
> > +               old_bh2 = ext3_find_entry (old_dentry, &old_de2);
> > +               if (old_bh2) {
> > +                       retval = ext3_delete_entry(handle, old_dir, old_de2,
> > +                                                  old_bh2);
> > +                       brelse(old_bh2);
> > +               } else {
> > +                       ext3_warning(old_dir->i_sb, "ext3_rename",
> > +                                    "Deleting old file not found (%lu), %d",
> > +                                    old_dir->i_ino, old_dir->i_nlink);
> > +               }
> >  
> > +       }
> 
> Simply retrying the ext3_delete_entry() isn't sufficient, since
> another ext3_add_entry() could move the directory entry again while
> you're reading in the blocks as part of ext3_find_entry().  OK, that
> would be pretty rare, since enough other directory adds would have
> to fill up enough that another split could happen, but *is* possible.
> (Surely our scheduler isn't that unfair....)

I think we have the lock on ext3_rename. I might be wrong.
If other process can change the dir between the add new entry
and delete old entry. We should also need to check the entry have
been delete from other process in case fall into dead loop? 

Too bad I don't have time to look at those until tonight.

> 
> Probably a better thing to do is use a while loop, and retry as long
> (a) ext3_delete_entry fails, and (b) old_dir->i_version has changed.
> In practice this will probably never happen, but I'll feel better with
> that change.
> 
> Anyway, I plan to make these two changes to your patch, and then
> submit it to Linus.

Do you have time to look at the missing "." and ".." entry patch?

Can you please change my email address in source file to vmware
if you submit it?

Thanks

Chris




^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Fix bug in ext3 htree rename: doesn't delete old name, leaves ino with bad nlink
  2002-11-06 22:41         ` Theodore Ts'o
  2002-11-06 22:47           ` Alexander Viro
@ 2002-11-07  1:58           ` Christopher Li
  1 sibling, 0 replies; 13+ messages in thread
From: Christopher Li @ 2002-11-07  1:58 UTC (permalink / raw)
  To: Theodore Ts'o, Jeremy Fitzhardinge, Ext2 devel,
	Linux Kernel List

On Wed, Nov 06, 2002 at 05:41:12PM -0500, Theodore Ts'o wrote:
> On Wed, Nov 06, 2002 at 05:24:55PM -0800, Christopher Li wrote:
> > 
> > I think we have the lock on ext3_rename. I might be wrong.
> > If other process can change the dir between the add new entry
> > and delete old entry. We should also need to check the entry have
> > been delete from other process in case fall into dead loop? 
> 
> We take the BKL, yes; but if we need to sleep waiting for a block to
> be read in, that's when another process can run.  Yes, that means
> another process could end up deleting the entry out from under us ---
> or make some other change to the directory.  I was actually quite

Then the correct fix for the rename problem is very nasty then.
I thought about remove entry first then add new entry. Then if
add new entry fail abort the whole transaction. It looks nasty
also if it sleep in between, the file goes nowhere.

> nervous about this, so I spent some time auditing the code paths of
> when do_split() might sleep, to make sure it would never leave the
> directory in an unstable condition.
> 
> Things will actually get easier once we fine-grain lock ext3 (and
> remove the BKL), since we'll very likely end up locking the directory
> during an insert, rename, or delete, and so we don't have to worry
> about things happening in an interleaved fashion.

Agree.

>
> I wasn't able to find your e-mail address in the source file.... 
> grep -i chrisl fs/ext3/*.c didn't turn up anything?
Sorry, indeed I did not put my email address there. Never mind.

Chris



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Fix bug in ext3 htree rename: doesn't delete old name, leaves ino with bad nlink
  2002-11-06 22:47           ` Alexander Viro
@ 2002-11-07  2:44             ` Theodore Ts'o
  0 siblings, 0 replies; 13+ messages in thread
From: Theodore Ts'o @ 2002-11-07  2:44 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Christopher Li, Jeremy Fitzhardinge, Ext2 devel,
	Linux Kernel List

On Wed, Nov 06, 2002 at 05:47:40PM -0500, Alexander Viro wrote:
> 
> HUH?
> 
> ->rename() holds ->i_sem on both directories.  So do all other directory
> methods.  What the hell is going on there?

What's going on is that had a brain-fart, and forgot about inode
semaphore that's held by the VFS layer.  Chris, sorry about that;
there is no need retry multiple times.  We only need to retry once, in
the case where the node gets split.

						- Ted


^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [PATCH] Fix bug in ext3 htree rename: doesn't delete old name , leaves ino with bad nlink
@ 2002-11-07  5:40 Christopher Li
  0 siblings, 0 replies; 13+ messages in thread
From: Christopher Li @ 2002-11-07  5:40 UTC (permalink / raw)
  To: 'Theodore Ts'o ', 'Alexander Viro '
  Cc: Christopher Li, 'Jeremy Fitzhardinge ',
	'Ext2 devel ', 'Linux Kernel List '

The revised patch:

--- namei.c     2002/11/06 07:19:11     1.2
+++ namei.c     2002/11/07 05:24:46
@@ -2167,8 +2167,25 @@
        /*
         * ok, that's it
         */
-       ext3_delete_entry(handle, old_dir, old_de, old_bh);
-
+       retval = ext3_delete_entry(handle, old_dir, old_de, old_bh);
+       if (retval == -ENOENT) {
+               /*
+                * old_de can be moved during ext3_add_entry.
+                */
+               struct buffer_head * old_bh2;
+               struct ext3_dir_entry_2 * old_de2;
+               old_bh2 = ext3_find_entry (old_dentry, &old_de2);
+               if (old_bh2) {
+                       retval = ext3_delete_entry(handle, old_dir, old_de2,
+                                                  old_bh2);
+                       brelse(old_bh2);
+               }
+       }
+       if (retval) {
+               ext3_warning (old_dir->i_sb, "ext3_rename",
+                             "Deleting old file (%lu), %d, error=%d",
+                             old_dir->i_ino, old_dir->i_nlink, retval);
+       }
        if (new_inode) {
                new_inode->i_nlink--;
                new_inode->i_ctime = CURRENT_TIME;

Chris

-----Original Message-----
From: Theodore Ts'o
To: Alexander Viro
Cc: Christopher Li; Jeremy Fitzhardinge; Ext2 devel; Linux Kernel List
Sent: 11/6/02 6:44 PM
Subject: Re: [PATCH] Fix bug in ext3 htree rename: doesn't delete old name,
leaves ino with bad nlink

On Wed, Nov 06, 2002 at 05:47:40PM -0500, Alexander Viro wrote:
> 
> HUH?
> 
> ->rename() holds ->i_sem on both directories.  So do all other
directory
> methods.  What the hell is going on there?

What's going on is that had a brain-fart, and forgot about inode
semaphore that's held by the VFS layer.  Chris, sorry about that;
there is no need retry multiple times.  We only need to retry once, in
the case where the node gets split.

						- Ted



^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2002-11-07  5:34 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2002-11-05  4:47 bug in ext3 htree rename: doesn't delete old name, leaves ino with bad nlink Jeremy Fitzhardinge
2002-11-05 21:24 ` [Ext2-devel] " chrisl
2002-11-06  5:02   ` Jeremy Fitzhardinge
2002-11-06  8:25   ` [PATCH] Fix " chrisl
2002-11-06  8:44     ` Andrew Morton
2002-11-06 21:40     ` Theodore Ts'o
2002-11-07  1:24       ` Christopher Li
2002-11-06 22:41         ` Theodore Ts'o
2002-11-06 22:47           ` Alexander Viro
2002-11-07  2:44             ` Theodore Ts'o
2002-11-07  1:58           ` Christopher Li
2002-11-06 23:27         ` [Ext2-devel] " Andreas Dilger
  -- strict thread matches above, loose matches on Subject: below --
2002-11-07  5:40 [PATCH] Fix bug in ext3 htree rename: doesn't delete old name , " Christopher Li

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox