linux-ext4.vger.kernel.org archive mirror
* ext2 hang on (intentionally) corrupted filesystem
@ 2012-05-05  1:38 Sami Liedes
  2012-05-09 21:12 ` Jan Kara
  0 siblings, 1 reply; 4+ messages in thread
From: Sami Liedes @ 2012-05-05  1:38 UTC
  To: linux-ext4, linux-fsdev

Hi,

There seems to be a bug in the ext2 implementation (in vanilla 3.3.4)
where operations on a corrupted ext2 filesystem cause a hung task:

1. wget http://sli.dy.fi/~sliedes/berserker/testcases/ext2.110.min.bz2
2. mount ... /mnt -t ext2 -o errors=continue
3. Do some operations; what I do (it's the rm that crashes):

  timeout 30 cp -r doc doc2 >&/dev/null
  timeout 30 find -xdev >&/dev/null
  timeout 30 find -xdev -print0 2>/dev/null |xargs -0 touch -- 2>/dev/null
  timeout 30 mkdir tmp >&/dev/null
  timeout 30 echo whoah >tmp/filu 2>/dev/null
  timeout 30 rm -rf /mnt/* >&/dev/null

4. The rm task hangs

The filesystem in fact differs from a pristine, fully working ext2
filesystem by only one bit:

------------------------------------------------------------
$ diff -u <(hd testimg.ext2) <(hd testimg.ext2.110.min)
--- /dev/fd/63 2012-05-05 04:26:49.972546154 +0300
+++ /dev/fd/62 2012-05-05 04:26:49.972546154 +0300
@@ -13520,7 +13520,7 @@
 00902c90  73 64 65 31 00 00 00 00  00 00 00 00 00 00 00 00  |sde1............|
 00902ca0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
 *
-00903000  1d 05 00 00 0c 00 01 02  2e 00 00 00 29 00 00 00  |............)...|
+00903000  1d 05 00 00 0c 00 01 02  6e 00 00 00 29 00 00 00  |........n...)...|
 00903010  0c 00 02 02 2e 2e 00 00  1e 05 00 00 e8 03 26 01  |..............&.|
 00903020  5c 78 32 66 64 65 76 69  63 65 73 5c 78 32 66 76  |\x2fdevices\x2fv|
 00903030  69 72 74 75 61 6c 5c 78  32 66 74 74 79 5c 78 32  |irtual\x2ftty\x2|
------------------------------------------------------------
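
To make the one-bit change concrete, here is a small userspace decoder for the first two ext2 directory entries in the block at 0x903000. It is a sketch that assumes the standard ext2_dir_entry_2 on-disk layout (little-endian 32-bit inode, 16-bit rec_len, 8-bit name_len and file_type, then the name); struct dirent_view, decode_dirent() and bit_distance() are illustrative names, not kernel functions. In both images the first entry points at inode 0x51d with file type 2 (directory); only its name changes from '.' to 'n', because 0x2e and 0x6e differ in exactly one bit (0x40).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decoded view of an ext2 directory entry (assumed on-disk layout:
 * le32 inode, le16 rec_len, u8 name_len, u8 file_type, then the name). */
struct dirent_view {
	uint32_t inode;     /* inode the entry points to */
	uint16_t rec_len;   /* length of this record in bytes */
	uint8_t name_len;   /* length of the name */
	uint8_t file_type;  /* 2 == directory */
	char name[256];     /* NUL-terminated copy of the name */
};

static uint32_t le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint16_t le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | p[1] << 8);
}

/* Decode the directory entry that starts at buf[off]. */
static struct dirent_view decode_dirent(const uint8_t *buf, size_t len,
					size_t off)
{
	struct dirent_view d;

	assert(off + 8 <= len);
	d.inode = le32(buf + off);
	d.rec_len = le16(buf + off + 4);
	d.name_len = buf[off + 6];
	d.file_type = buf[off + 7];
	assert(off + 8 + d.name_len <= len);
	memcpy(d.name, buf + off + 8, d.name_len);
	d.name[d.name_len] = '\0';
	return d;
}

/* Count the bits that differ between two bytes. */
static int bit_distance(uint8_t a, uint8_t b)
{
	int n = 0;

	for (uint8_t x = a ^ b; x != 0; x &= (uint8_t)(x - 1))
		n++;
	return n;
}

/* First 24 bytes of the block at offset 0x903000, copied from the diff. */
static const uint8_t pristine_blk[24] = {
	0x1d, 0x05, 0x00, 0x00, 0x0c, 0x00, 0x01, 0x02, 0x2e, 0x00, 0x00, 0x00,
	0x29, 0x00, 0x00, 0x00, 0x0c, 0x00, 0x02, 0x02, 0x2e, 0x2e, 0x00, 0x00,
};
static const uint8_t corrupt_blk[24] = {
	0x1d, 0x05, 0x00, 0x00, 0x0c, 0x00, 0x01, 0x02, 0x6e, 0x00, 0x00, 0x00,
	0x29, 0x00, 0x00, 0x00, 0x0c, 0x00, 0x02, 0x02, 0x2e, 0x2e, 0x00, 0x00,
};
```

For example, decode_dirent(corrupt_blk, sizeof corrupt_blk, 0) yields inode 0x51d, name "n" and file type 2, while decoding at offset 12 yields the ".." entry pointing at inode 0x29. The entry named "n" therefore refers to the very directory that contains it.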

The buggy filesystem (10 MiB uncompressed) can be downloaded from

   http://sli.dy.fi/~sliedes/berserker/testcases/ext2.110.min.bz2

and the pristine filesystem from

   http://sli.dy.fi/~sliedes/berserker/testcases/pristine.ext2.bz2

See the dmesg output below.

	Sami


------------------------------------------------------------
INFO: task rm:1549 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
rm              D ffff880006f0cd40     0  1549   1548 0x00020004
 ffff88000560ddc8 0000000000000046 ffff8800068b2040 ffff88000560dfd8
 ffff88000560dfd8 ffff88000560dfd8 ffff880007852040 ffff8800068b2040
 ffff88000560de08 ffff880006f0cd00 ffff8800068b2040 0000000000000246
Call Trace:
 [<ffffffff8171d609>] schedule+0x39/0x50
 [<ffffffff8171baa0>] mutex_lock_nested+0x130/0x2f0
 [<ffffffff810fb467>] ? vfs_rmdir+0x67/0x120
 [<ffffffff810fb467>] vfs_rmdir+0x67/0x120
 [<ffffffff810fb62b>] do_rmdir+0x10b/0x120
 [<ffffffff81556e5d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
 [<ffffffff810fb94d>] sys_unlinkat+0x2d/0x40
 [<ffffffff817204b1>] sysenter_dispatch+0x7/0x2a
 [<ffffffff81556e1e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
2 locks held by rm/1549:
 #0:  (&type->i_mutex_dir_key#4/1){+.+.+.}, at: [<ffffffff810fb58c>] do_rmdir+0x6c/0x120
 #1:  (&type->i_mutex_dir_key#4){+.+.+.}, at: [<ffffffff810fb467>] vfs_rmdir+0x67/0x120
Kernel panic - not syncing: hung_task: blocked tasks
Pid: 361, comm: khungtaskd Not tainted 3.3.4 #3
Call Trace:
 [<ffffffff81713aff>] panic+0xb5/0x1be
 [<ffffffff8108a017>] watchdog+0x2b7/0x2c0
 [<ffffffff81089dc6>] ? watchdog+0x66/0x2c0
 [<ffffffff81089d60>] ? hung_task_panic+0x20/0x20
 [<ffffffff810525cd>] kthread+0x8d/0xa0
 [<ffffffff81720304>] kernel_thread_helper+0x4/0x10
 [<ffffffff8171ec30>] ? retint_restore_args+0x13/0x13
 [<ffffffff81052540>] ? kthread_flush_work_fn+0x10/0x10
 [<ffffffff81720300>] ? gs_change+0x13/0x13
Rebooting in 1 seconds..
------------------------------------------------------------

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: ext2 hang on (intentionally) corrupted filesystem
  2012-05-05  1:38 ext2 hang on (intentionally) corrupted filesystem Sami Liedes
@ 2012-05-09 21:12 ` Jan Kara
  2012-05-28 17:31   ` Ted Ts'o
  0 siblings, 1 reply; 4+ messages in thread
From: Jan Kara @ 2012-05-09 21:12 UTC
  To: Sami Liedes; +Cc: linux-ext4, linux-fsdev

  Hello,

  Thanks for the report!

On Sat 05-05-12 04:38:41, Sami Liedes wrote:
> There seems to be a bug in the ext2 implementation (in vanilla 3.3.4)
> where operations on a corrupted ext2 filesystem cause a hung task:
> 
> 1. wget http://sli.dy.fi/~sliedes/berserker/testcases/ext2.110.min.bz2
> 2. mount ... /mnt -t ext2 -o errors=continue
> 3. Do some operations; what I do (it's the rm that crashes):
> 
>   timeout 30 cp -r doc doc2 >&/dev/null
>   timeout 30 find -xdev >&/dev/null
>   timeout 30 find -xdev -print0 2>/dev/null |xargs -0 touch -- 2>/dev/null
>   timeout 30 mkdir tmp >&/dev/null
>   timeout 30 echo whoah >tmp/filu 2>/dev/null
>   timeout 30 rm -rf /mnt/* >&/dev/null
                      ^^^ Should /mnt really be here? I guess a change of
directory is missing...

> 4. The rm task hangs
> 
> The filesystem in fact differs from a pristine, fully working ext2
> filesystem by only one bit:
> 
> ------------------------------------------------------------
> $ diff -u <(hd testimg.ext2) <(hd testimg.ext2.110.min)
> --- /dev/fd/63 2012-05-05 04:26:49.972546154 +0300
> +++ /dev/fd/62 2012-05-05 04:26:49.972546154 +0300
> @@ -13520,7 +13520,7 @@
>  00902c90  73 64 65 31 00 00 00 00  00 00 00 00 00 00 00 00  |sde1............|
>  00902ca0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
>  *
> -00903000  1d 05 00 00 0c 00 01 02  2e 00 00 00 29 00 00 00  |............)...|
> +00903000  1d 05 00 00 0c 00 01 02  6e 00 00 00 29 00 00 00  |........n...)...|
>  00903010  0c 00 02 02 2e 2e 00 00  1e 05 00 00 e8 03 26 01  |..............&.|
>  00903020  5c 78 32 66 64 65 76 69  63 65 73 5c 78 32 66 76  |\x2fdevices\x2fv|
>  00903030  69 72 74 75 61 6c 5c 78  32 66 74 74 79 5c 78 32  |irtual\x2ftty\x2|
> ------------------------------------------------------------
  OK, you've changed the '.' directory entry into a normal directory entry
named 'n' (0x6e). I guess that has some potential for confusing things.
Actually, rm -rf does not reproduce the problem for me (it just complains
about a cyclic directory hierarchy), but trying to rmdir the bad entry hangs
the system: we try to grab i_mutex for the directory twice because the
directory is its own parent... That would be kind of hard to fix in the VFS,
since once our directory structure contains a cycle, our locking protocol is
no longer deadlock-free. I'll see what we can do...
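
The hang described here comes down to one task taking the same non-recursive lock twice. A minimal userspace sketch, assuming POSIX threads as a stand-in for the kernel's i_mutex (the kernel does not use pthreads; struct dir_model, dir_model_init() and rmdir_model() are hypothetical names): rmdir_model() locks the parent and then the victim, in the same order as the lock list in the trace (do_rmdir() first, then vfs_rmdir()). The mutexes are created as PTHREAD_MUTEX_ERRORCHECK so that a relock by the owner reports EDEADLK instead of blocking.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for a directory inode and its i_mutex. */
struct dir_model {
	pthread_mutex_t lock;
};

/* Error-checking mutexes return EDEADLK when the owner relocks them,
 * which makes the self-deadlock observable instead of hanging. */
static int dir_model_init(struct dir_model *d)
{
	pthread_mutexattr_t attr;
	int err = pthread_mutexattr_init(&attr);

	if (err)
		return err;
	err = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	if (!err)
		err = pthread_mutex_init(&d->lock, &attr);
	pthread_mutexattr_destroy(&attr);
	return err;
}

/* Lock the parent, then the victim, as rmdir does. Returns 0 on success
 * or the error reported when locking the victim. */
static int rmdir_model(struct dir_model *parent, struct dir_model *victim)
{
	int err = pthread_mutex_lock(&parent->lock);

	if (err)
		return err;
	assert(victim != NULL);
	err = pthread_mutex_lock(&victim->lock);
	if (!err)
		pthread_mutex_unlock(&victim->lock);
	pthread_mutex_unlock(&parent->lock);
	return err;
}
```

With a PTHREAD_MUTEX_NORMAL mutex instead, rmdir_model(&d, &d) would deadlock on the second lock, which is the state the hung rm task was in.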

> The buggy filesystem (10 MiB uncompressed) can be downloaded from
> 
>    http://sli.dy.fi/~sliedes/berserker/testcases/ext2.110.min.bz2
> 
> and the pristine filesystem from
> 
>    http://sli.dy.fi/~sliedes/berserker/testcases/pristine.ext2.bz2
> 
> See the dmesg output below.

								Honza

> ------------------------------------------------------------
> INFO: task rm:1549 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> rm              D ffff880006f0cd40     0  1549   1548 0x00020004
>  ffff88000560ddc8 0000000000000046 ffff8800068b2040 ffff88000560dfd8
>  ffff88000560dfd8 ffff88000560dfd8 ffff880007852040 ffff8800068b2040
>  ffff88000560de08 ffff880006f0cd00 ffff8800068b2040 0000000000000246
> Call Trace:
>  [<ffffffff8171d609>] schedule+0x39/0x50
>  [<ffffffff8171baa0>] mutex_lock_nested+0x130/0x2f0
>  [<ffffffff810fb467>] ? vfs_rmdir+0x67/0x120
>  [<ffffffff810fb467>] vfs_rmdir+0x67/0x120
>  [<ffffffff810fb62b>] do_rmdir+0x10b/0x120
>  [<ffffffff81556e5d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
>  [<ffffffff810fb94d>] sys_unlinkat+0x2d/0x40
>  [<ffffffff817204b1>] sysenter_dispatch+0x7/0x2a
>  [<ffffffff81556e1e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> 2 locks held by rm/1549:
>  #0:  (&type->i_mutex_dir_key#4/1){+.+.+.}, at: [<ffffffff810fb58c>] do_rmdir+0x6c/0x120
>  #1:  (&type->i_mutex_dir_key#4){+.+.+.}, at: [<ffffffff810fb467>] vfs_rmdir+0x67/0x120
> Kernel panic - not syncing: hung_task: blocked tasks
> Pid: 361, comm: khungtaskd Not tainted 3.3.4 #3
> Call Trace:
>  [<ffffffff81713aff>] panic+0xb5/0x1be
>  [<ffffffff8108a017>] watchdog+0x2b7/0x2c0
>  [<ffffffff81089dc6>] ? watchdog+0x66/0x2c0
>  [<ffffffff81089d60>] ? hung_task_panic+0x20/0x20
>  [<ffffffff810525cd>] kthread+0x8d/0xa0
>  [<ffffffff81720304>] kernel_thread_helper+0x4/0x10
>  [<ffffffff8171ec30>] ? retint_restore_args+0x13/0x13
>  [<ffffffff81052540>] ? kthread_flush_work_fn+0x10/0x10
>  [<ffffffff81720300>] ? gs_change+0x13/0x13
> Rebooting in 1 seconds..
> ------------------------------------------------------------


-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

* Re: ext2 hang on (intentionally) corrupted filesystem
  2012-05-09 21:12 ` Jan Kara
@ 2012-05-28 17:31   ` Ted Ts'o
  2012-05-28 17:34     ` [PATCH] vfs: avoid hang caused by attempting to rmdir an invalid file system Theodore Ts'o
  0 siblings, 1 reply; 4+ messages in thread
From: Ted Ts'o @ 2012-05-28 17:31 UTC
  To: Jan Kara; +Cc: Sami Liedes, linux-ext4, linux-fsdev, Al Viro

On Wed, May 09, 2012 at 11:12:36PM +0200, Jan Kara wrote:
> > 1. wget http://sli.dy.fi/~sliedes/berserker/testcases/ext2.110.min.bz2
> > 2. mount ... /mnt -t ext2 -o errors=continue
> > 3. Do some operations; what I do (it's the rm that crashes):
> >   timeout 30 rm -rf /mnt/* >&/dev/null
> > 4. The rm task hangs
> > 
>   OK, you've changed the '.' directory entry into a normal directory entry
> named 'n' (0x6e). I guess that has some potential for confusing things.
> Actually, rm -rf does not reproduce the problem for me (it just complains
> about a cyclic directory hierarchy), but trying to rmdir the bad entry hangs
> the system: we try to grab i_mutex for the directory twice because the
> directory is its own parent... That would be kind of hard to fix in the VFS,
> since once our directory structure contains a cycle, our locking protocol is
> no longer deadlock-free. I'll see what we can do...

Just wanted to chime in that this crashes when the file system is
mounted using ext4; not surprising, since it's clearly a VFS issue.

The following proof-of-concept patch (see reply chained to this mail
message) fixes the problem for your test file system.  Al, what do you
think?  Is it worth it to define a new mechanism where we can pass
VFS-detected corruption down to the low-level file system?

						- Ted
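
The question above is left open in the thread. Purely as an illustration of one possible shape, the sketch below models it in userspace: a generic check that mirrors the proof-of-concept patch (comparing inode numbers where the patch compares d_inode pointers) plus an optional per-filesystem callback. struct fs_ops_model, struct sb_model, generic_rmdir_check() and example_report() are hypothetical names; none of this is an existing kernel interface.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct sb_model;

/* Hypothetical per-filesystem operations; the hook is optional. */
struct fs_ops_model {
	void (*report_corruption)(struct sb_model *sb, unsigned long ino,
				  const char *why);
};

/* Hypothetical stand-in for a superblock. */
struct sb_model {
	const struct fs_ops_model *ops;
	int error_count;
	unsigned long last_bad_ino;
};

/* Generic check modeled on the patch: refuse to rmdir an entry that
 * resolves to its own parent, and tell the filesystem if it asked. */
static int generic_rmdir_check(struct sb_model *sb, unsigned long parent_ino,
			       unsigned long victim_ino)
{
	assert(sb != NULL);
	if (parent_ino != victim_ino)
		return 0;
	if (sb->ops && sb->ops->report_corruption)
		sb->ops->report_corruption(sb, victim_ino,
					   "directory entry links to its parent");
	return -EINVAL;
}

/* Example hook that only records the event; a real filesystem would more
 * likely mark itself as having errors and honor its errors= option. */
static void example_report(struct sb_model *sb, unsigned long ino,
			   const char *why)
{
	(void)why;
	sb->error_count++;
	sb->last_bad_ino = ino;
}

static const struct fs_ops_model example_ops = {
	.report_corruption = example_report,
};
```

Keeping the hook optional means a filesystem that does not implement it still gets the plain -EINVAL behavior of the patch, while one that does can record the corruption through its own error path.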

* [PATCH] vfs: avoid hang caused by attempting to rmdir an invalid file system
  2012-05-28 17:31   ` Ted Ts'o
@ 2012-05-28 17:34     ` Theodore Ts'o
  0 siblings, 0 replies; 4+ messages in thread
From: Theodore Ts'o @ 2012-05-28 17:34 UTC
  To: linux-fsdevel; +Cc: Ext4 Developers List, viro, sami.liedes, Theodore Ts'o

If we rmdir a directory which is a hard link to '.', we will deadlock
trying to grab the directory's i_mutex.  Check for this condition and
return EINVAL, which is what we return if the user attempts to rmdir
"/foo/bar/."

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
---
 fs/namei.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/fs/namei.c b/fs/namei.c
index 0062dd1..081f872 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2774,6 +2774,17 @@ static long do_rmdir(int dfd, const char __user *pathname)
 		error = -ENOENT;
 		goto exit3;
 	}
+	if (nd.path.dentry->d_inode == dentry->d_inode) {
+		/*
+		 * Corrupt file system where there is a hard link to
+		 * '.'; treat it as if we are trying to rmdir '.'
+		 *
+		 * XXX Should we call into the low-level file system
+		 * to request that the file system be marked corrupt?
+		 */
+		error = -EINVAL;
+		goto exit3;
+	}
 	error = mnt_want_write(nd.path.mnt);
 	if (error)
 		goto exit3;
-- 
1.7.10.2.552.gaa3bb87
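
From userspace, the condition the patch tests (the entry being removed resolves to the same inode as its parent directory) can be approximated by comparing st_dev and st_ino of the directory and of the entry. This is an illustrative sketch only: entry_is_self_link() and self_link_demo() are hypothetical helpers, and the scratch directory under /tmp is an assumption of the demo. A dev/inode comparison of this kind is also, roughly, how tools such as rm notice the cyclic hierarchy mentioned earlier in the thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if `name` inside the open directory `dirfd` is the same inode
 * as the directory itself, 0 if not, or -errno if a stat call fails. */
static int entry_is_self_link(int dirfd, const char *name)
{
	struct stat dir_st, ent_st;

	assert(name != NULL);
	if (fstat(dirfd, &dir_st) != 0)
		return -errno;
	if (fstatat(dirfd, name, &ent_st, AT_SYMLINK_NOFOLLOW) != 0)
		return -errno;
	return dir_st.st_dev == ent_st.st_dev &&
	       dir_st.st_ino == ent_st.st_ino;
}

/* Self-check on a scratch directory: "." must look like a self link,
 * an ordinary subdirectory must not. Returns 0 on success. */
static int self_link_demo(void)
{
	char tmpl[] = "/tmp/selflinkXXXXXX";
	int ok = 0;

	if (!mkdtemp(tmpl))
		return -1;
	int fd = open(tmpl, O_RDONLY | O_DIRECTORY);
	if (fd >= 0) {
		if (mkdirat(fd, "sub", 0700) == 0) {
			ok = entry_is_self_link(fd, ".") == 1 &&
			     entry_is_self_link(fd, "sub") == 0 &&
			     entry_is_self_link(fd, "missing") == -ENOENT;
			unlinkat(fd, "sub", AT_REMOVEDIR);
		}
		close(fd);
	}
	rmdir(tmpl);
	return ok ? 0 : -1;
}
```

self_link_demo() returns 0 when "." is flagged as a self link, an ordinary subdirectory is not, and a missing name reports -ENOENT.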


Thread overview: 4+ messages
2012-05-05  1:38 ext2 hang on (intentionally) corrupted filesystem Sami Liedes
2012-05-09 21:12 ` Jan Kara
2012-05-28 17:31   ` Ted Ts'o
2012-05-28 17:34     ` [PATCH] vfs: avoid hang caused by attempting to rmdir an invalid file system Theodore Ts'o
