ext2 hang on (intentionally) corrupted filesystem

* ext2 hang on (intentionally) corrupted filesystem
@ 2012-05-05  1:38 Sami Liedes
  2012-05-09 21:12 ` Jan Kara
From: Sami Liedes @ 2012-05-05  1:38 UTC
  To: linux-ext4, linux-fsdev

Hi,

There seems to be a bug in the ext2 implementation (in vanilla 3.3.4)
where operations on a corrupted ext2 filesystem cause a hung task:

1. wget http://sli.dy.fi/~sliedes/berserker/testcases/ext2.110.min.bz2
2. mount ... /mnt -t ext2 -o errors=continue
3. Do some operations; what I do (it's the rm that crashes):

  timeout 30 cp -r doc doc2 >&/dev/null
  timeout 30 find -xdev >&/dev/null
  timeout 30 find -xdev -print0 2>/dev/null |xargs -0 touch -- 2>/dev/null
  timeout 30 mkdir tmp >&/dev/null
  timeout 30 echo whoah >tmp/filu 2>/dev/null
  timeout 30 rm -rf /mnt/* >&/dev/null

4. The rm task hangs

The filesystem in fact differs from a pristine, fully working ext2
filesystem by only one bit:

------------------------------------------------------------
$ diff -u <(hd testimg.ext2) <(hd testimg.ext2.110.min)
--- /dev/fd/63 2012-05-05 04:26:49.972546154 +0300
+++ /dev/fd/62 2012-05-05 04:26:49.972546154 +0300
@@ -13520,7 +13520,7 @@
 00902c90  73 64 65 31 00 00 00 00  00 00 00 00 00 00 00 00  |sde1............|
 00902ca0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
 *
-00903000  1d 05 00 00 0c 00 01 02  2e 00 00 00 29 00 00 00  |............)...|
+00903000  1d 05 00 00 0c 00 01 02  6e 00 00 00 29 00 00 00  |........n...)...|
 00903010  0c 00 02 02 2e 2e 00 00  1e 05 00 00 e8 03 26 01  |..............&.|
 00903020  5c 78 32 66 64 65 76 69  63 65 73 5c 78 32 66 76  |\x2fdevices\x2fv|
 00903030  69 72 74 75 61 6c 5c 78  32 66 74 74 79 5c 78 32  |irtual\x2ftty\x2|
------------------------------------------------------------
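
As a reading aid for the hexdump above, here is a minimal user-space sketch
that decodes the first 12 bytes at offset 0x00903000 as an ext2 directory
entry (little-endian inode, rec_len, name_len, file_type, then the name).
The names dirent_fields, decode_dirent, bits_differing, pristine_entry and
corrupt_entry are made up for this illustration; the byte values are copied
from the diff.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical holder for the decoded fields of one ext2 directory entry. */
struct dirent_fields {
    uint32_t inode;      /* inode number the entry points to */
    uint16_t rec_len;    /* length of this record in bytes */
    uint8_t  name_len;   /* length of the name in bytes */
    uint8_t  file_type;  /* 2 means "directory" in ext2 */
    char     name[256];  /* NUL-terminated copy of the name */
};

/* Decode one entry from raw little-endian on-disk bytes. */
static struct dirent_fields decode_dirent(const uint8_t *p)
{
    struct dirent_fields d;

    assert(p != NULL);
    d.inode = (uint32_t)p[0] | (uint32_t)p[1] << 8 |
              (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
    d.rec_len = (uint16_t)(p[4] | p[5] << 8);
    d.name_len = p[6];
    d.file_type = p[7];
    memcpy(d.name, p + 8, d.name_len);
    d.name[d.name_len] = '\0';
    return d;
}

/* Count the bits in which two bytes differ. */
static int bits_differing(uint8_t a, uint8_t b)
{
    uint8_t x = (uint8_t)(a ^ b);
    int n = 0;

    while (x) {
        n += x & 1;
        x >>= 1;
    }
    return n;
}

/* First 12 bytes at 0x00903000, copied from the diff above. */
static const uint8_t pristine_entry[12] = {
    0x1d, 0x05, 0x00, 0x00, 0x0c, 0x00, 0x01, 0x02, 0x2e, 0x00, 0x00, 0x00
};
static const uint8_t corrupt_entry[12] = {
    0x1d, 0x05, 0x00, 0x00, 0x0c, 0x00, 0x01, 0x02, 0x6e, 0x00, 0x00, 0x00
};
```

Decoded this way, both images hold an entry for inode 0x51d with rec_len 12,
name_len 1 and file type 2 (directory); only the name byte changes, from
0x2e ('.') to 0x6e ('n'), and those two bytes differ in exactly one bit.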

The buggy filesystem (10 MiB uncompressed) can be downloaded from

   http://sli.dy.fi/~sliedes/berserker/testcases/ext2.110.min.bz2

and the pristine filesystem from

   http://sli.dy.fi/~sliedes/berserker/testcases/pristine.ext2.bz2

See the dmesg output below.

	Sami


------------------------------------------------------------
INFO: task rm:1549 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
rm              D ffff880006f0cd40     0  1549   1548 0x00020004
 ffff88000560ddc8 0000000000000046 ffff8800068b2040 ffff88000560dfd8
 ffff88000560dfd8 ffff88000560dfd8 ffff880007852040 ffff8800068b2040
 ffff88000560de08 ffff880006f0cd00 ffff8800068b2040 0000000000000246
Call Trace:
 [<ffffffff8171d609>] schedule+0x39/0x50
 [<ffffffff8171baa0>] mutex_lock_nested+0x130/0x2f0
 [<ffffffff810fb467>] ? vfs_rmdir+0x67/0x120
 [<ffffffff810fb467>] vfs_rmdir+0x67/0x120
 [<ffffffff810fb62b>] do_rmdir+0x10b/0x120
 [<ffffffff81556e5d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
 [<ffffffff810fb94d>] sys_unlinkat+0x2d/0x40
 [<ffffffff817204b1>] sysenter_dispatch+0x7/0x2a
 [<ffffffff81556e1e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
2 locks held by rm/1549:
 #0:  (&type->i_mutex_dir_key#4/1){+.+.+.}, at: [<ffffffff810fb58c>] do_rmdir+0x6c/0x120
 #1:  (&type->i_mutex_dir_key#4){+.+.+.}, at: [<ffffffff810fb467>] vfs_rmdir+0x67/0x120
Kernel panic - not syncing: hung_task: blocked tasks
Pid: 361, comm: khungtaskd Not tainted 3.3.4 #3
Call Trace:
 [<ffffffff81713aff>] panic+0xb5/0x1be
 [<ffffffff8108a017>] watchdog+0x2b7/0x2c0
 [<ffffffff81089dc6>] ? watchdog+0x66/0x2c0
 [<ffffffff81089d60>] ? hung_task_panic+0x20/0x20
 [<ffffffff810525cd>] kthread+0x8d/0xa0
 [<ffffffff81720304>] kernel_thread_helper+0x4/0x10
 [<ffffffff8171ec30>] ? retint_restore_args+0x13/0x13
 [<ffffffff81052540>] ? kthread_flush_work_fn+0x10/0x10
 [<ffffffff81720300>] ? gs_change+0x13/0x13
Rebooting in 1 seconds..
------------------------------------------------------------


* Re: ext2 hang on (intentionally) corrupted filesystem
  2012-05-05  1:38 ext2 hang on (intentionally) corrupted filesystem Sami Liedes
@ 2012-05-09 21:12 ` Jan Kara
  2012-05-28 17:31   ` Ted Ts'o
From: Jan Kara @ 2012-05-09 21:12 UTC
  To: Sami Liedes; +Cc: linux-ext4, linux-fsdev

  Hello,

  Thanks for the report!

On Sat 05-05-12 04:38:41, Sami Liedes wrote:
> There seems to be a bug in the ext2 implementation (in vanilla 3.3.4)
> where operations on a corrupted ext2 filesystem cause a hung task:
> 
> 1. wget http://sli.dy.fi/~sliedes/berserker/testcases/ext2.110.min.bz2
> 2. mount ... /mnt -t ext2 -o errors=continue
> 3. Do some operations; what I do (it's the rm that crashes):
> 
>   timeout 30 cp -r doc doc2 >&/dev/null
>   timeout 30 find -xdev >&/dev/null
>   timeout 30 find -xdev -print0 2>/dev/null |xargs -0 touch -- 2>/dev/null
>   timeout 30 mkdir tmp >&/dev/null
>   timeout 30 echo whoah >tmp/filu 2>/dev/null
>   timeout 30 rm -rf /mnt/* >&/dev/null
                      ^^^ Should /mnt really be here? I guess a change of
directory (cd) is missing...

> 4. The rm task hangs
> 
> The filesystem in fact differs from a pristine, fully working ext2
> filesystem by only one bit:
> 
> ------------------------------------------------------------
> $ diff -u <(hd testimg.ext2) <(hd testimg.ext2.110.min)
> --- /dev/fd/63 2012-05-05 04:26:49.972546154 +0300
> +++ /dev/fd/62 2012-05-05 04:26:49.972546154 +0300
> @@ -13520,7 +13520,7 @@
>  00902c90  73 64 65 31 00 00 00 00  00 00 00 00 00 00 00 00  |sde1............|
>  00902ca0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
>  *
> -00903000  1d 05 00 00 0c 00 01 02  2e 00 00 00 29 00 00 00  |............)...|
> +00903000  1d 05 00 00 0c 00 01 02  6e 00 00 00 29 00 00 00  |........n...)...|
>  00903010  0c 00 02 02 2e 2e 00 00  1e 05 00 00 e8 03 26 01  |..............&.|
>  00903020  5c 78 32 66 64 65 76 69  63 65 73 5c 78 32 66 76  |\x2fdevices\x2fv|
>  00903030  69 72 74 75 61 6c 5c 78  32 66 74 74 79 5c 78 32  |irtual\x2ftty\x2|
> ------------------------------------------------------------
  OK, you've changed the '.' directory entry into a normal directory entry
named 0x6e ('n'). I guess that has some potential to confuse things. Actually,
rm -rf does not reproduce the problem for me (it just complains about a
cyclic directory hierarchy), but trying to rmdir the bad entry hangs the
system: we try to grab i_mutex for the directory twice because the directory
is its own parent... That would be kind of hard to fix in VFS since once our
directory structure contains a cycle, our locking protocol is no longer
deadlock-free. I'll see what we could do...
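
To illustrate the double locking described above, here is a deliberately
simplified single-threaded model. It is not kernel code: toy_inode,
toy_lock, toy_unlock and toy_rmdir_locking are hypothetical names, and the
"mutex" is just a flag that reports where a real non-recursive mutex would
block forever. The lock order (parent first, then the victim directory) is
taken from the "2 locks held" lines in the trace.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an inode carrying a non-recursive lock (models i_mutex). */
struct toy_inode {
    unsigned long ino;
    int locked;          /* 1 while the toy mutex is held */
};

/*
 * Try to take the toy mutex. Returns 0 on success, or -1 in the situation
 * where a real non-recursive mutex would sleep forever. The model is
 * single-threaded, so "already locked" always means "locked by us".
 */
static int toy_lock(struct toy_inode *inode)
{
    assert(inode != NULL);
    if (inode->locked)
        return -1;
    inode->locked = 1;
    return 0;
}

static void toy_unlock(struct toy_inode *inode)
{
    assert(inode != NULL && inode->locked);
    inode->locked = 0;
}

/*
 * Mimic the rmdir lock order seen in the trace: lock the parent directory
 * first, then the directory being removed. Returns 0 if both locks could
 * be taken, -1 if the second acquisition would self-deadlock.
 */
static int toy_rmdir_locking(struct toy_inode *parent, struct toy_inode *victim)
{
    int ret = 0;

    if (toy_lock(parent) != 0)
        return -1;
    if (toy_lock(victim) != 0)
        ret = -1;            /* victim is the parent itself: deadlock */
    else
        toy_unlock(victim);
    toy_unlock(parent);
    return ret;
}
```

With two distinct toy inodes the function returns 0. Passing the same toy
inode as both parent and victim, which is what the corrupted 'n' entry
amounts to, returns -1 at the second lock attempt.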

> The buggy filesystem (10 MiB uncompressed) can be downloaded from
> 
>    http://sli.dy.fi/~sliedes/berserker/testcases/ext2.110.min.bz2
> 
> and the pristine filesystem from
> 
>    http://sli.dy.fi/~sliedes/berserker/testcases/pristine.ext2.bz2
> 
> See the dmesg output below.

								Honza


-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

* Re: ext2 hang on (intentionally) corrupted filesystem
  2012-05-09 21:12 ` Jan Kara
@ 2012-05-28 17:31   ` Ted Ts'o
  2012-05-28 17:34     ` [PATCH] vfs: avoid hang caused by attempting to rmdir an invalid file system Theodore Ts'o
From: Ted Ts'o @ 2012-05-28 17:31 UTC
  To: Jan Kara; +Cc: Sami Liedes, linux-ext4, linux-fsdev, Al Viro

On Wed, May 09, 2012 at 11:12:36PM +0200, Jan Kara wrote:
> > 1. wget http://sli.dy.fi/~sliedes/berserker/testcases/ext2.110.min.bz2
> > 2. mount ... /mnt -t ext2 -o errors=continue
> > 3. Do some operations; what I do (it's the rm that crashes):
> >   timeout 30 rm -rf /mnt/* >&/dev/null
> > 4. The rm task hangs
> > 
>   OK, you've changed '.' directory entry to a normal directory entry with a
> name 0x6e. I guess that has some potential in confusing something. Actually
> rm -rf does not reproduce the problem for me (it just complains about
> cyclic directory hierarchy) but trying to rmdir bad entry hangs the system
> - we try to grab i_mutex for the directory twice because the directory is
> its own parent... That would be kind of hard to fix in VFS since once our
> directory structure contains a cycle, our locking protocol is no longer
> deadlock free. I'll see what we could do...

Just wanted to chime in that this also crashes when the file system is
mounted using ext4; not surprising, since it's clearly a VFS issue.

The following proof-of-concept patch (see reply chained to this mail
message) fixes the problem for your test file system.  Al, what do you
think?  Is it worth it to define a new mechanism where we can pass
VFS-detected corruption down to the low-level file system?

						- Ted


* [PATCH] vfs: avoid hang caused by attempting to rmdir an invalid file system
  2012-05-28 17:31   ` Ted Ts'o
@ 2012-05-28 17:34     ` Theodore Ts'o
From: Theodore Ts'o @ 2012-05-28 17:34 UTC
  To: linux-fsdevel; +Cc: Ext4 Developers List, viro, sami.liedes, Theodore Ts'o

If we rmdir a directory which is a hard link to '.', we will deadlock
trying to grab the directory's i_mutex.  Check for this condition and
return EINVAL, which is what we return if the user attempts to rmdir
"/foo/bar/."

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
---
 fs/namei.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/fs/namei.c b/fs/namei.c
index 0062dd1..081f872 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2774,6 +2774,17 @@ static long do_rmdir(int dfd, const char __user *pathname)
 		error = -ENOENT;
 		goto exit3;
 	}
+	if (nd.path.dentry->d_inode == dentry->d_inode) {
+		/*
+		 * Corrupt file system where there is a hard link to
+		 * '.'; treat it as if we are trying to rmdir '.'
+		 *
+		 * XXX Should we call into the low-level file system
+		 * to request that the file system be marked corrupt?
+		 */
+		error = -EINVAL;
+		goto exit3;
+	}
 	error = mnt_want_write(nd.path.mnt);
 	if (error)
 		goto exit3;
-- 
1.7.10.2.552.gaa3bb87
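
For readers who want to see the idea behind the new check outside the
kernel, here is a rough user-space analogue. It is only a sketch under
stated assumptions: same_inode is a hypothetical helper, it compares device
and inode numbers via stat(2) rather than comparing dentry->d_inode
pointers, and stat() follows symlinks, which the kernel's rmdir lookup does
not do for the last path component.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: returns 1 if both paths resolve to the same inode
 * on the same device, 0 if they do not, and -1 if either stat() fails.
 * This mirrors the spirit of the patch's comparison
 * nd.path.dentry->d_inode == dentry->d_inode, but from user space.
 */
static int same_inode(const char *parent_path, const char *child_path)
{
    struct stat parent, child;

    assert(parent_path != NULL && child_path != NULL);
    if (stat(parent_path, &parent) != 0 || stat(child_path, &child) != 0)
        return -1;
    return parent.st_dev == child.st_dev && parent.st_ino == child.st_ino;
}
```

On a healthy file system this returns 1 for a directory and its own "."
entry, for example same_inode("/", "/."). On the corrupted test image, the
entry named 'n' would presumably compare equal to its parent directory in
the same way, which is the condition the patch rejects with -EINVAL.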

