Re: [Bugme-new] [Bug 11266] New: unable to handle kernel paging request in ext2_free

* Re: [Bugme-new] [Bug 11266] New: unable to handle kernel paging request in ext2_free_blocks
       [not found] <bug-11266-10286@http.bugzilla.kernel.org/>
@ 2008-08-07 17:52 ` Andrew Morton
       [not found] ` <0K5800031SEDU2@smtp02.hut-mail>
  1 sibling, 0 replies; 13+ messages in thread
From: Andrew Morton @ 2008-08-07 17:52 UTC
  To: sliedes; +Cc: bugme-daemon, linux-ext4, Aneesh Kumar K.V


(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Thu,  7 Aug 2008 05:53:37 -0700 (PDT) bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=11266
> 
>            Summary: unable to handle kernel paging request in
>                     ext2_free_blocks
>            Product: File System
>            Version: 2.5
>      KernelVersion: 2.6.27-rc2 + patch for #10976 (now in -mm)
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: ext2
>         AssignedTo: akpm@osdl.org
>         ReportedBy: sliedes@cc.hut.fi
> 
> 
> Latest working kernel version: (I think at least 2.6.25.4 works)
> Earliest failing kernel version:
> Distribution: Minimal Debian sid (unstable)
> Hardware Environment: qemu x86
> Software Environment:
> Problem Description:
> 
> Mere rm -rf after mounting on an intentionally corrupted partition occasionally
> causes "BUG: unable to handle kernel paging request" in ext2_free_blocks. 
> 
> Unfortunately the issue seems to be timing sensitive (or something), doing it
> on the same filesystem only sometimes results in the crash :( But I have
> reproduced it something like 6 times now with brief testing.
> 
> If you wish, I can attach some filesystems with which I have been able to
> reproduce this at least once.
> 
> Another thing I could do is take a look at it with the new kernel debugger
> (which I haven't tried yet) if none of you are able to figure out this from the
> traces. Is there something you would specifically want me to take a look at?
> The local and referenced variables at ext2_free_blocks(), I guess?
> 
> I think I ran quite extensive tests on 2.6.25.4 & ext2, so I suspect (but am
> not sure, I've made some changes to the way I test) this bug is newer than
> 2.6.25.4. I could do some bisecting too, but I haven't managed to automate the
> thing yet.
> 
> Here's a script I run under qemu, google for zzuf (it's a fuzzer), and timeout
> is from the Debian package `timeout':
> 
> ----------
> #!/bin/sh
> 
> if [ "`hostname`" != "fstest" ]; then
>    echo "This is a dangerous script."
>    echo "Set your hostname to \`fstest\' if you want to use it."
>    exit 1
> fi
> 
> umount /dev/hdb
> umount /dev/hdc
> /etc/init.d/sysklogd stop
> /etc/init.d/klogd stop
> /etc/init.d/cron stop
> mount /dev/hda / -t ext3 -o remount,ro || exit 1
> 
> ulimit -t 20
> 
> for ((s=$1; s<1000000000; s++)); do
>   umount /mnt
>   echo '***** zzuffing *****' seed $s
>   zzuf -r 0:0.03 -s $s </dev/hdc >/dev/hdb || exit
>   mount /dev/hdb /mnt -o errors=continue || continue
>   cd /mnt || continue
>   cp -r doc doc2 >&/dev/null
>   find -xdev >&/dev/null
>   find -xdev -print0 2>/dev/null |xargs -0 touch -- 2>/dev/null
>   mkdir tmp >&/dev/null
>   echo whoah >tmp/filu 2>/dev/null
>   rm -rf /mnt/* >&/dev/null
>   cd /
> done
> ----------
> 
> The attached backtraces all start from the time of mounting the filesystem.
> 

Yes, please do test 2.6.26.

Aneesh, your recent changes to the ext2 block allocator would have to
be prime suspects here.



* Re: [Bugme-new] [Bug 11266] New: unable to handle kernel paging request in ext2_free_blocks
       [not found] ` <0K5800031SEDU2@smtp02.hut-mail>
@ 2008-08-07 20:07   ` Sami Liedes
  2008-08-07 20:28     ` Sami Liedes
  0 siblings, 1 reply; 13+ messages in thread
From: Sami Liedes @ 2008-08-07 20:07 UTC
  To: Andrew Morton; +Cc: bugme-daemon, linux-ext4, Aneesh Kumar K.V

On Thu, Aug 07, 2008 at 10:52:51AM -0700, Andrew Morton wrote:
> Yes, please do test 2.6.26.

Did that. I can reproduce the same crash on 2.6.26 and 2.6.26.2.

	Sami


* Re: [Bugme-new] [Bug 11266] New: unable to handle kernel paging request in ext2_free_blocks
  2008-08-07 20:07   ` Sami Liedes
@ 2008-08-07 20:28     ` Sami Liedes
  2008-08-18 14:58       ` Jan Kara
  0 siblings, 1 reply; 13+ messages in thread
From: Sami Liedes @ 2008-08-07 20:28 UTC
  To: Andrew Morton; +Cc: bugme-daemon, linux-ext4, Aneesh Kumar K.V

On Thu, Aug 07, 2008 at 11:07:17PM +0300, Sami Liedes wrote:
> On Thu, Aug 07, 2008 at 10:52:51AM -0700, Andrew Morton wrote:
> > Yes, please do test 2.6.26.
> 
> Did that. I can reproduce the same crash on 2.6.26 and 2.6.26.2.

2.6.25.15 crashes too, so I might have been wrong about 2.6.25.4
working (unless something changed between those two versions).

	Sami


* Re: [Bugme-new] [Bug 11266] New: unable to handle kernel paging request in ext2_free_blocks
  2008-08-07 20:28     ` Sami Liedes
@ 2008-08-18 14:58       ` Jan Kara
  2008-08-18 16:51         ` Aneesh Kumar K.V
  2008-08-19 21:43         ` Sami Liedes
  0 siblings, 2 replies; 13+ messages in thread
From: Jan Kara @ 2008-08-18 14:58 UTC
  To: Sami Liedes; +Cc: Andrew Morton, bugme-daemon, linux-ext4, Aneesh Kumar K.V

> On Thu, Aug 07, 2008 at 11:07:17PM +0300, Sami Liedes wrote:
> > On Thu, Aug 07, 2008 at 10:52:51AM -0700, Andrew Morton wrote:
> > > Yes, please do test 2.6.26.
> > 
> > Did that. I can reproduce the same crash on 2.6.26 and 2.6.26.2.
> 
> 2.6.25.15 crashes too, so I might have been wrong about 2.6.25.4
> working (unless something changed between those two versions).
  I think this is the same problem Vegard reported in
http://marc.info/?l=linux-ext4&m=121637999611618&w=2.
  The problem seems to be in ext2_valid_block_bitmap() which does

  bitmap_blk = le32_to_cpu(desc->bg_block_bitmap);
  offset = bitmap_blk - group_first_block;
  if (!ext2_test_bit(offset, bh->b_data))

  (and similarly for the inode bitmap). Now when the group descriptor is
corrupted, this simply reads beyond the end of bh->b_data...
  The patch below should hopefully fix the issue. Can you test it
please?
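
  A minimal user-space sketch of the arithmetic above, assuming 32-bit
block numbers. The names bitmap_bit_offset() and offset_inside_bitmap()
are hypothetical and are not functions from fs/ext2; the sketch only
shows how a descriptor value below the group's first block wraps to an
offset far outside a block-sized bitmap.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-space model; not kernel code. */

/*
 * Bit offset of the bitmap block within its group, computed the same
 * way as in the snippet above.  With unsigned 32-bit arithmetic, a
 * descriptor value below group_first_block wraps to a huge offset.
 */
static uint32_t bitmap_bit_offset(uint32_t bitmap_blk,
				  uint32_t group_first_block)
{
	return bitmap_blk - group_first_block;
}

/*
 * A bitmap block of block_size bytes holds block_size * 8 bits; any
 * offset at or beyond that lies outside the buffer.
 */
static int offset_inside_bitmap(uint32_t offset, uint32_t block_size)
{
	assert(block_size > 0);
	return offset < block_size * 8u;
}
```

  With illustrative values group_first_block = 8193 and a corrupted
bg_block_bitmap of 0, the offset wraps to 0xffffdfff, which a 1024-byte
bitmap (8192 bits) cannot hold.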

								Honza
-- 
Jan Kara <jack@suse.cz>
SuSE CR Labs
---

>From 06953717138efe3ad535e78343beb7204ac0d274 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Mon, 18 Aug 2008 16:45:11 +0200
Subject: [PATCH] ext2: Check for corrupted group descriptor before using data in it

We have to check whether a group descriptor isn't corrupted in
read_block_bitmap(). Otherwise ext2_valid_block_bitmap() will try
to access bits outside of bitmap and Oops happens.

CC: Vegard Nossum <vegard.nossum@gmail.com>
CC: Sami Liedes <sliedes@cc.hut.fi>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext2/balloc.c |   29 +++++++++++++++++++++++++++++
 1 files changed, 29 insertions(+), 0 deletions(-)

diff --git a/fs/ext2/balloc.c b/fs/ext2/balloc.c
index 10bb02c..9104712 100644
--- a/fs/ext2/balloc.c
+++ b/fs/ext2/balloc.c
@@ -113,6 +113,17 @@ err_out:
 	return 0;
 }
 
+static int ext2_block_in_group(struct super_block *sb,
+			unsigned int block_group, ext2_fsblk_t block)
+{
+	if (block < ext2_group_first_block_no(sb, block_group))
+		return 0;
+	if (block >= ext2_group_first_block_no(sb, block_group) +
+	    EXT2_BLOCKS_PER_GROUP(sb))
+		return 0;
+	return 1;
+}
+
 /*
  * Read the bitmap for a given block_group,and validate the
  * bits for block/inode/inode tables are set in the bitmaps
@@ -129,6 +140,24 @@ read_block_bitmap(struct super_block *sb, unsigned int block_group)
 	desc = ext2_get_group_desc(sb, block_group, NULL);
 	if (!desc)
 		return NULL;
+	if (!ext2_block_in_group(sb, block_group,
+				le32_to_cpu(desc->bg_block_bitmap)) ||
+	    !ext2_block_in_group(sb, block_group,
+				le32_to_cpu(desc->bg_inode_bitmap)) ||
+	    !ext2_block_in_group(sb, block_group,
+				le32_to_cpu(desc->bg_inode_table)) ||
+	    !ext2_block_in_group(sb, block_group,
+				le32_to_cpu(desc->bg_inode_table) +
+				EXT2_SB(sb)->s_itb_per_group - 1)) {
+		ext2_error(sb, __func__, "Corrupted group descriptor - "
+				"block_group = %u, block_bitmap = %u, "
+				"inode_bitmap = %u, inode_table = %u",
+				block_group,
+				le32_to_cpu(desc->bg_block_bitmap),
+				le32_to_cpu(desc->bg_inode_bitmap),
+				le32_to_cpu(desc->bg_inode_table));
+		return NULL;
+	}
 	bitmap_blk = le32_to_cpu(desc->bg_block_bitmap);
 	bh = sb_getblk(sb, bitmap_blk);
 	if (unlikely(!bh)) {
-- 
1.5.2.4



* Re: [Bugme-new] [Bug 11266] New: unable to handle kernel paging request in ext2_free_blocks
  2008-08-18 14:58       ` Jan Kara
@ 2008-08-18 16:51         ` Aneesh Kumar K.V
  2008-08-19  3:24           ` Andreas Dilger
  2008-08-19 21:43         ` Sami Liedes
  1 sibling, 1 reply; 13+ messages in thread
From: Aneesh Kumar K.V @ 2008-08-18 16:51 UTC
  To: Jan Kara; +Cc: Sami Liedes, Andrew Morton, bugme-daemon, linux-ext4

On Mon, Aug 18, 2008 at 04:58:41PM +0200, Jan Kara wrote:
> 
> From 06953717138efe3ad535e78343beb7204ac0d274 Mon Sep 17 00:00:00 2001
> From: Jan Kara <jack@suse.cz>
> Date: Mon, 18 Aug 2008 16:45:11 +0200
> Subject: [PATCH] ext2: Check for corrupted group descriptor before using data in it
> 
> We have to check whether a group descriptor isn't corrupted in
> read_block_bitmap(). Otherwise ext2_valid_block_bitmap() will try
> to access bits outside of bitmap and Oops happens.
> 
> CC: Vegard Nossum <vegard.nossum@gmail.com>
> CC: Sami Liedes <sliedes@cc.hut.fi>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/ext2/balloc.c |   29 +++++++++++++++++++++++++++++
>  1 files changed, 29 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/ext2/balloc.c b/fs/ext2/balloc.c
> index 10bb02c..9104712 100644
> --- a/fs/ext2/balloc.c
> +++ b/fs/ext2/balloc.c
> @@ -113,6 +113,17 @@ err_out:
>  	return 0;
>  }
> 
> +static int ext2_block_in_group(struct super_block *sb,
> +			unsigned int block_group, ext2_fsblk_t block)
> +{
> +	if (block < ext2_group_first_block_no(sb, block_group))
> +		return 0;
> +	if (block >= ext2_group_first_block_no(sb, block_group) +
> +	    EXT2_BLOCKS_PER_GROUP(sb))
> +		return 0;
> +	return 1;
> +}
> +
>  /*
>   * Read the bitmap for a given block_group,and validate the
>   * bits for block/inode/inode tables are set in the bitmaps
> @@ -129,6 +140,24 @@ read_block_bitmap(struct super_block *sb, unsigned int block_group)
>  	desc = ext2_get_group_desc(sb, block_group, NULL);
>  	if (!desc)
>  		return NULL;
> +	if (!ext2_block_in_group(sb, block_group,
> +				le32_to_cpu(desc->bg_block_bitmap)) ||
> +	    !ext2_block_in_group(sb, block_group,
> +				le32_to_cpu(desc->bg_inode_bitmap)) ||
> +	    !ext2_block_in_group(sb, block_group,
> +				le32_to_cpu(desc->bg_inode_table)) ||
> +	    !ext2_block_in_group(sb, block_group,
> +				le32_to_cpu(desc->bg_inode_table) +
> +				EXT2_SB(sb)->s_itb_per_group - 1)) {
> +		ext2_error(sb, __func__, "Corrupted group descriptor - "
> +				"block_group = %u, block_bitmap = %u, "
> +				"inode_bitmap = %u, inode_table = %u",
> +				block_group,
> +				le32_to_cpu(desc->bg_block_bitmap),
> +				le32_to_cpu(desc->bg_inode_bitmap),
> +				le32_to_cpu(desc->bg_inode_table));
> +		return NULL;
> +	}
>  	bitmap_blk = le32_to_cpu(desc->bg_block_bitmap);
>  	bh = sb_getblk(sb, bitmap_blk);
>  	if (unlikely(!bh)) {

Do we need to do this validation every time we call read_block_bitmap()?
I guess we should move the validation to where we read the descriptor
blocks from disk.

-aneesh


* Re: [Bugme-new] [Bug 11266] New: unable to handle kernel paging request in ext2_free_blocks
  2008-08-18 16:51         ` Aneesh Kumar K.V
@ 2008-08-19  3:24           ` Andreas Dilger
  2008-08-19  9:13             ` Jan Kara
  0 siblings, 1 reply; 13+ messages in thread
From: Andreas Dilger @ 2008-08-19  3:24 UTC
  To: Aneesh Kumar K.V
  Cc: Jan Kara, Sami Liedes, Andrew Morton, bugme-daemon, linux-ext4

On Aug 18, 2008  22:21 +0530, Aneesh Kumar wrote:
> > +static int ext2_block_in_group(struct super_block *sb,
> > +			unsigned int block_group, ext2_fsblk_t block)
> > +{
> > +	if (block < ext2_group_first_block_no(sb, block_group))
> > +		return 0;
> > +	if (block >= ext2_group_first_block_no(sb, block_group) +
> > +	    EXT2_BLOCKS_PER_GROUP(sb))
> > +		return 0;
> > +	return 1;
> > +}
> > +
> >  /*
> >   * Read the bitmap for a given block_group,and validate the
> >   * bits for block/inode/inode tables are set in the bitmaps
> > @@ -129,6 +140,24 @@ read_block_bitmap(struct super_block *sb, unsigned int block_group)
> >  	desc = ext2_get_group_desc(sb, block_group, NULL);
> >  	if (!desc)
> >  		return NULL;
> > +	if (!ext2_block_in_group(sb, block_group,
> > +				le32_to_cpu(desc->bg_block_bitmap)) ||
> > +	    !ext2_block_in_group(sb, block_group,
> > +				le32_to_cpu(desc->bg_inode_bitmap)) ||
> > +	    !ext2_block_in_group(sb, block_group,
> > +				le32_to_cpu(desc->bg_inode_table)) ||
> > +	    !ext2_block_in_group(sb, block_group,
> > +				le32_to_cpu(desc->bg_inode_table) +
> > +				EXT2_SB(sb)->s_itb_per_group - 1)) {

Isn't equivalent checking done in ext2_check_descriptors()?  It would make
sense to abstract out the "check one group and return error" code and use
it in both places.
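
One possible shape for such a shared helper, as a user-space sketch
only. The structs and the names check_one_group() and check_all_groups()
are hypothetical simplifications, not the kernel's definitions, and the
sketch ignores the possibly shorter last group.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the real structures. */
struct fs_geom {
	uint32_t first_data_block;
	uint32_t blocks_per_group;
	uint32_t itb_per_group;	/* inode table blocks per group */
};

struct group_desc {
	uint32_t block_bitmap;
	uint32_t inode_bitmap;
	uint32_t inode_table;
};

static int block_in_group(const struct fs_geom *g, uint32_t group,
			  uint32_t block)
{
	uint32_t first = g->first_data_block + group * g->blocks_per_group;

	return block >= first && block - first < g->blocks_per_group;
}

/* Check one descriptor: 0 if it looks sane, -1 if it is corrupted. */
static int check_one_group(const struct fs_geom *g, uint32_t group,
			   const struct group_desc *d)
{
	assert(g->itb_per_group > 0);
	if (!block_in_group(g, group, d->block_bitmap) ||
	    !block_in_group(g, group, d->inode_bitmap) ||
	    !block_in_group(g, group, d->inode_table) ||
	    !block_in_group(g, group,
			    d->inode_table + g->itb_per_group - 1))
		return -1;
	return 0;
}

/* Mount-time caller: walk every group with the same helper. */
static int check_all_groups(const struct fs_geom *g,
			    const struct group_desc *descs,
			    uint32_t ngroups)
{
	for (uint32_t i = 0; i < ngroups; i++)
		if (check_one_group(g, i, &descs[i]) != 0)
			return -1;
	return 0;
}
```

A bitmap-read path could then call check_one_group() for just the group
it is about to touch, while mount keeps using check_all_groups().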

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



* Re: [Bugme-new] [Bug 11266] New: unable to handle kernel paging request in ext2_free_blocks
  2008-08-19  3:24           ` Andreas Dilger
@ 2008-08-19  9:13             ` Jan Kara
  2008-08-19 10:51               ` Sami Liedes
  0 siblings, 1 reply; 13+ messages in thread
From: Jan Kara @ 2008-08-19  9:13 UTC
  To: Andreas Dilger
  Cc: Aneesh Kumar K.V, Sami Liedes, Andrew Morton, bugme-daemon,
	linux-ext4

On Mon 18-08-08 21:24:10, Andreas Dilger wrote:
> On Aug 18, 2008  22:21 +0530, Aneesh Kumar wrote:
> > > +static int ext2_block_in_group(struct super_block *sb,
> > > +			unsigned int block_group, ext2_fsblk_t block)
> > > +{
> > > +	if (block < ext2_group_first_block_no(sb, block_group))
> > > +		return 0;
> > > +	if (block >= ext2_group_first_block_no(sb, block_group) +
> > > +	    EXT2_BLOCKS_PER_GROUP(sb))
> > > +		return 0;
> > > +	return 1;
> > > +}
> > > +
> > >  /*
> > >   * Read the bitmap for a given block_group,and validate the
> > >   * bits for block/inode/inode tables are set in the bitmaps
> > > @@ -129,6 +140,24 @@ read_block_bitmap(struct super_block *sb, unsigned int block_group)
> > >  	desc = ext2_get_group_desc(sb, block_group, NULL);
> > >  	if (!desc)
> > >  		return NULL;
> > > +	if (!ext2_block_in_group(sb, block_group,
> > > +				le32_to_cpu(desc->bg_block_bitmap)) ||
> > > +	    !ext2_block_in_group(sb, block_group,
> > > +				le32_to_cpu(desc->bg_inode_bitmap)) ||
> > > +	    !ext2_block_in_group(sb, block_group,
> > > +				le32_to_cpu(desc->bg_inode_table)) ||
> > > +	    !ext2_block_in_group(sb, block_group,
> > > +				le32_to_cpu(desc->bg_inode_table) +
> > > +				EXT2_SB(sb)->s_itb_per_group - 1)) {
> 
> Isn't equivalent checking done in ext2_check_descriptors()?  It would make
> sense to abstract out the "check one group and return error" code and use
> it in both places.
  Actually yes, it is. Good point. Sami, is it the case that you have
mounted the filesystem, then intentionally corrupted it and after that
the kernel oopsed (as opposed to first corrupting the filesystem image and
mounting it after that)? That would explain how corrupted values could get
to read_block_bitmap() even though ext2_check_descriptors() checked them.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


* Re: [Bugme-new] [Bug 11266] New: unable to handle kernel paging request in ext2_free_blocks
  2008-08-19  9:13             ` Jan Kara
@ 2008-08-19 10:51               ` Sami Liedes
  2008-08-20 10:25                 ` Jan Kara
  0 siblings, 1 reply; 13+ messages in thread
From: Sami Liedes @ 2008-08-19 10:51 UTC
  To: Jan Kara
  Cc: Andreas Dilger, Aneesh Kumar K.V, Andrew Morton, bugme-daemon,
	linux-ext4

On Tue, Aug 19, 2008 at 11:13:39AM +0200, Jan Kara wrote:
> > Isn't equivalent checking done in ext2_check_descriptors()?  It would make
> > sense to abstract out the "check one group and return error" code and use
> > it in both places.
>   Actually yes, it is. Good point. Sami, is it the case that you have
> mounted the filesystem, then intentionally corrupted it and after that
> the kernel oopsed (as opposed to first corrupting the filesystem image and
> mounting it after that)? That would explain how corrupted values could get
> to read_block_bitmap() even though ext2_check_descriptors() checked them.

No, that's not what I do. I corrupt the fs before mounting it, then
mount it, perform normal filesystem operations on it and unmount it.

Here's the most current script I use (zzuf is the fuzzer):

------------------------------------------------------------
#!/bin/sh

if [ "`hostname`" != "fstest" ]; then
   echo "This is a dangerous script."
   echo "Set your hostname to \`fstest\' if you want to use it."
   exit 1
fi

umount /dev/hdb
umount /dev/hdc
/etc/init.d/sysklogd stop
/etc/init.d/klogd stop
/etc/init.d/cron stop
mount /dev/hda / -t ext3 -o remount,ro || exit 1

#ulimit -t 20

for ((s=$1; s<1000000000; s++)); do
  umount /mnt
  echo '***** zzuffing *****' seed $s
  zzuf -r 0:0.03 -s $s </dev/hdc >/dev/hdb || exit
  mount /dev/hdb /mnt -t ext2 -o errors=continue || continue
  cd /mnt || continue
  timeout 30 cp -r doc doc2 >&/dev/null
  timeout 30 find -xdev >&/dev/null
  timeout 30 find -xdev -print0 2>/dev/null |xargs -0 touch -- 2>/dev/null
  timeout 30 mkdir tmp >&/dev/null
  timeout 30 echo whoah >tmp/filu 2>/dev/null
  timeout 30 rm -rf /mnt/* >&/dev/null
  cd /
done
------------------------------------------------------------

	Sami


* Re: [Bugme-new] [Bug 11266] New: unable to handle kernel paging request in ext2_free_blocks
  2008-08-19 10:51               ` Sami Liedes
@ 2008-08-20 10:25                 ` Jan Kara
  2008-08-20 13:29                   ` Sami Liedes
  2008-08-20 19:07                   ` Andreas Dilger
  0 siblings, 2 replies; 13+ messages in thread
From: Jan Kara @ 2008-08-20 10:25 UTC
  To: Sami Liedes
  Cc: Andreas Dilger, Aneesh Kumar K.V, Andrew Morton, bugme-daemon,
	linux-ext4

> On Tue, Aug 19, 2008 at 11:13:39AM +0200, Jan Kara wrote:
> > > Isn't equivalent checking done in ext2_check_descriptors()?  It would make
> > > sense to abstract out the "check one group and return error" code and use
> > > it in both places.
> >   Actually yes, it is. Good point. Sami, is it the case that you have
> > mounted the filesystem, then intentionally corrupted it and after that
> > the kernel oopsed (as opposed to first corrupting the filesystem image and
> > mounting it after that)? That would explain how corrupted values could get
> > to read_block_bitmap() even though ext2_check_descriptors() checked them.
> 
> No, that's not what I do. I corrupt the fs before mounting it, then
> mount it, perform normal filesystem operations on it and unmount it.
  OK, thanks. Then we must somehow corrupt group descriptor block during
the operation. Because I'm pretty sure it *is* corrupted - the oops
is: unable to handle kernel paging request at c7e95ffc. If we look into
registers, we see ECX has c7e96000 (which is probably bh->b_data). In
the second oops it's exactly the same - ECX has c11e4000, the oops is at
address c11e3ffc. So in both cases it is ECX-4. So somehow we managed to
pass a negative offset into ext2_test_bit(). But as Andreas pointed out,
when we load descriptors into memory, we check in ext2_check_descriptors()
that both bitmaps and the inode table lie within the group... The other
possibility would be that we managed to corrupt s_first_data_block in the
superblock. Anyway, neither possibility looks very likely. I'll try
to reproduce the problem and maybe get more insight... How large is your
filesystem BTW?
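
  The ECX-4 observation can be checked with a small user-space sketch.
It assumes the 32-bit word holding bit nr sits at base + 4 * floor(nr / 32)
and that ECX really holds bh->b_data (a guess, as said above);
word_addr_for_bit() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Address of the 32-bit word holding bit `nr` of a bitmap that starts
 * at `base`, assuming the word index is floor(nr / 32).  Illustrative
 * model only, not the kernel's test_bit() implementation.
 */
static uint32_t word_addr_for_bit(uint32_t base, int32_t nr)
{
	int32_t word = nr / 32;

	assert(base % 4 == 0);	/* bitmap assumed word-aligned */

	/* C division truncates toward zero; adjust to get the floor. */
	if (nr % 32 < 0)
		word--;

	/* Unsigned wrap-around turns a negative word index into base - n. */
	return base + (uint32_t)(word * 4);
}
```

  Under these assumptions any bit offset from -32 to -1 lands in the
word just below the buffer, e.g. word_addr_for_bit(0xc7e96000, -1) gives
0xc7e95ffc, which matches the faulting address in the first oops.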

> Here's the most current script I use (zzuf is the fuzzer):
> 
> ------------------------------------------------------------
> #!/bin/sh
> 
> if [ "`hostname`" != "fstest" ]; then
>    echo "This is a dangerous script."
>    echo "Set your hostname to \`fstest\' if you want to use it."
>    exit 1
> fi
> 
> umount /dev/hdb
> umount /dev/hdc
> /etc/init.d/sysklogd stop
> /etc/init.d/klogd stop
> /etc/init.d/cron stop
> mount /dev/hda / -t ext3 -o remount,ro || exit 1
> 
> #ulimit -t 20
> 
> for ((s=$1; s<1000000000; s++)); do
>   umount /mnt
>   echo '***** zzuffing *****' seed $s
>   zzuf -r 0:0.03 -s $s </dev/hdc >/dev/hdb || exit
>   mount /dev/hdb /mnt -t ext2 -o errors=continue || continue
>   cd /mnt || continue
>   timeout 30 cp -r doc doc2 >&/dev/null
>   timeout 30 find -xdev >&/dev/null
>   timeout 30 find -xdev -print0 2>/dev/null |xargs -0 touch -- 2>/dev/null
>   timeout 30 mkdir tmp >&/dev/null
>   timeout 30 echo whoah >tmp/filu 2>/dev/null
>   timeout 30 rm -rf /mnt/* >&/dev/null
>   cd /
> done
> ------------------------------------------------------------

								Honza
-- 
Jan Kara <jack@suse.cz>
SuSE CR Labs


* Re: [Bugme-new] [Bug 11266] New: unable to handle kernel paging request in ext2_free_blocks
  2008-08-20 10:25                 ` Jan Kara
@ 2008-08-20 13:29                   ` Sami Liedes
  2008-08-20 19:07                   ` Andreas Dilger
  1 sibling, 0 replies; 13+ messages in thread
From: Sami Liedes @ 2008-08-20 13:29 UTC
  To: Jan Kara
  Cc: Andreas Dilger, Aneesh Kumar K.V, Andrew Morton, bugme-daemon,
	linux-ext4

On Wed, Aug 20, 2008 at 12:25:33PM +0200, Jan Kara wrote:
>   OK, thanks. Then we must somehow corrupt group descriptor block during
> the operation. Because I'm pretty sure it *is* corrupted - the oops
> is: unable to handle kernel paging request at c7e95ffc. If we look into
> registers, we see ECX has c7e96000 (which is probably bh->b_data). In
> the second oops it's exactly the same - ECX has c11e4000, the oops is at
> address c11e3ffc. So in both cases it is ECX-4. So somehow we managed to
> pass a negative offset into ext2_test_bit(). But as Andreas pointed out,
> when we load descriptors into memory, we check in ext2_check_descriptors()
> that both bitmaps and the inode table lie within the group... The other
> possibility would be that we managed to corrupt s_first_data_block in the
> superblock. Anyway, neither possibility looks very likely. I'll try
> to reproduce the problem and maybe get more insight... How large is your
> filesystem BTW?

My FS is 10 MiB and tries to be diverse in its contents. It has a copy
of my /dev and a small partial copy of /usr/share/doc.

I put the pristine (non-corrupted) filesystem at

   http://www.hut.fi/~sliedes/fsdebug-hdc-ext2.bz2

(520k compressed).

I've been thinking I should write a script to prepare the root
filesystem for the tests, but haven't got that far yet. Basically
(unless I forget some step) I use debootstrap to bootstrap a minimal
Debian system, create some needed devices in it (hd[abc], ttyS0 at
least), set the hostname to fstest, configure getty to listen to
ttyS0, copy the script to /root/runtest (the script's first parameter
is the seed) and install some Debian packages (zzuf and timeout at
least).

Then I make four copies of the images and run four qemus in parallel
since I have four cpus, modifying the first parameter (initial seed)
of the runtest script, e.g. 0, 10M, 20M, 30M.

I guess the approach might be useful for those who write the code too
(or people closer to them than me), since I've already found a fair
number of bugs with it in a fairly short period of time (#10871,
#10882, #10976, #11250, #11253, #11266 for ext[23] bugs, also one ext4
bug I hit when an ext3 fs was detected as ext4; search bugzilla for my
email to see the rest of the bugs).

The current root filesystem is 144M compressed (yeah, there's a lot of
stuff irrelevant to the tests there), I could upload it somewhere if
that helps. After that running the tests is a matter of running
something like

   qemu -kernel bzImage -append 'root=/dev/hda console=ttyS0,115200n8' \
       -hda hda -hdb hdb -hdc hdc -nographic -serial pty

, attaching a screen session to the allocated pty, logging in as root
and running ./runtest $seed.

Also the tests are not as comprehensive as I'd like. As an example,
some years ago I stress tested reiser4 (it was already "ready") with
pretty mundane operations (without corrupting the fs) and it worked,
but I've got it to break badly on three separate occasions in separate
ways just by normally using Debian's aptitude (the breakage was in
flock(), and the current tests don't test flock()). Other things to
test would be at least hard links and fifos...

The level of automation isn't quite what I'd like either, optimally
there would just be a single script that takes the kernel image,
filesystem type and number of parallel instances as arguments and runs
the tests.

	Sami


* Re: [Bugme-new] [Bug 11266] New: unable to handle kernel paging request in ext2_free_blocks
  2008-08-20 10:25                 ` Jan Kara
  2008-08-20 13:29                   ` Sami Liedes
@ 2008-08-20 19:07                   ` Andreas Dilger
  2008-11-02  5:27                     ` Sami Liedes
  1 sibling, 1 reply; 13+ messages in thread
From: Andreas Dilger @ 2008-08-20 19:07 UTC
  To: Jan Kara
  Cc: Sami Liedes, Aneesh Kumar K.V, Andrew Morton, bugme-daemon,
	linux-ext4

On Aug 20, 2008  12:25 +0200, Jan Kara wrote:
> > On Tue, Aug 19, 2008 at 11:13:39AM +0200, Jan Kara wrote:
> > > > Isn't equivalent checking done in ext2_check_descriptors()?  It would make
> > > > sense to abstract out the "check one group and return error" code and use
> > > > it in both places.
> > >   Actually yes, it is. Good point. Sami, is it the case that you have
> > > mounted the filesystem, then intentionally corrupted it and after that
> > > the kernel oopsed (as opposed to first corrupting the filesystem image and
> > > mounting it after that)? That would explain how corrupted values could get
> > > to read_block_bitmap() even though ext2_check_descriptors() checked them.
> > 
> > No, that's not what I do. I corrupt the fs before mounting it, then
> > mount it, perform normal filesystem operations on it and unmount it.

>   OK, thanks. Then we must somehow corrupt group descriptor block during
> the operation.

Oh, interesting...  The data in the journal is probably corrupt, but all
of the superblock/gdt sanity checks are done BEFORE the journal is replayed.

It would seem that the ext*_fill_super() code should do the sanity checks,
and then recheck the superblock and group descriptors after the journal
is replayed.  The superblock checking code can be moved out of
ext*_fill_super() into a helper function like ext*_check_super(), and then
ext*_check_super() and ext*_check_descriptors() can be called again after
journal replay.

Having journal checksums enabled (ext4) would also detect this problem
before the journal replay corrupts the filesystem metadata.

It doesn't look possible that we can do journal recovery before loading
the GDT because ext*_load_journal()->ext*_get_journal() is doing iget()
and this needs the GDT to read the journal inode.

It might also make sense to just clean up the superblock and group descriptor
table and goto the beginning of fill_super() because in some cases the
superblock contents may have changed in important ways (e.g. crash after
resize of the filesystem which is only in the journal).
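
The check, replay, recheck ordering suggested above can be sketched as a
toy user-space model. Everything here (struct toy_fs, toy_mount() and
friends) is hypothetical and stands in for the real ext3 mount path; it
only illustrates why a second check after replay catches metadata that
the journal itself made bad.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model; names are hypothetical, not the ext3 functions. */
struct toy_fs {
	uint32_t blocks_count;
	uint32_t bitmap_blk;	/* stands in for one descriptor field */
	uint32_t journal_value;	/* what replay would write to bitmap_blk */
	int needs_recovery;
};

/* 0 if the metadata looks sane, -1 otherwise. */
static int toy_check_metadata(const struct toy_fs *fs)
{
	return (fs->bitmap_blk > 0 && fs->bitmap_blk < fs->blocks_count)
		? 0 : -1;
}

/* Replay overwrites on-disk metadata with whatever the journal holds. */
static void toy_replay_journal(struct toy_fs *fs)
{
	if (fs->needs_recovery) {
		fs->bitmap_blk = fs->journal_value;
		fs->needs_recovery = 0;
	}
}

/* Mount: check, replay, then check again. */
static int toy_mount(struct toy_fs *fs)
{
	assert(fs->blocks_count > 0);
	if (toy_check_metadata(fs) != 0)
		return -1;
	toy_replay_journal(fs);
	return toy_check_metadata(fs);
}
```

Without the second toy_check_metadata() call, a corrupt journal_value
would pass through mount unnoticed in this model.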


> Because I'm pretty sure it *is* corrupted - the oops
> is: unable to handle kernel paging request at c7e95ffc. If we look into
> registers, we see ECX has c7e96000 (which is probably bh->b_data). In
> the second oops it's exactly the same - ECX has c11e4000, the oops is at
> address c11e3ffc. So in both cases it is ECX-4. So somehow we managed to
> pass a negative offset into ext2_test_bit(). But as Andreas pointed out,
> when we load descriptors into memory, we check in ext2_check_descriptors()
> that both bitmaps and the inode table lie within the group... The other
> possibility would be that we managed to corrupt s_first_data_block in the
> superblock. Anyway, neither possibility looks very likely. I'll try
> to reproduce the problem and maybe get more insight... How large is your
> filesystem BTW?
> 
> > Here's the most current script I use (zzuf is the fuzzer):
> > 
> > ------------------------------------------------------------
> > #!/bin/sh
> > 
> > if [ "`hostname`" != "fstest" ]; then
> >    echo "This is a dangerous script."
> >    echo "Set your hostname to \`fstest\' if you want to use it."
> >    exit 1
> > fi
> > 
> > umount /dev/hdb
> > umount /dev/hdc
> > /etc/init.d/sysklogd stop
> > /etc/init.d/klogd stop
> > /etc/init.d/cron stop
> > mount /dev/hda / -t ext3 -o remount,ro || exit 1
> > 
> > #ulimit -t 20
> > 
> > for ((s=$1; s<1000000000; s++)); do
> >   umount /mnt
> >   echo '***** zzuffing *****' seed $s
> >   zzuf -r 0:0.03 -s $s </dev/hdc >/dev/hdb || exit
> >   mount /dev/hdb /mnt -t ext2 -o errors=continue || continue
> >   cd /mnt || continue
> >   timeout 30 cp -r doc doc2 >&/dev/null
> >   timeout 30 find -xdev >&/dev/null
> >   timeout 30 find -xdev -print0 2>/dev/null |xargs -0 touch -- 2>/dev/null
> >   timeout 30 mkdir tmp >&/dev/null
> >   timeout 30 echo whoah >tmp/filu 2>/dev/null
> >   timeout 30 rm -rf /mnt/* >&/dev/null
> >   cd /
> > done
> > ------------------------------------------------------------

Oh, hmm, this is ext2 and not ext3, so no journal...  I guess my bug is
still valid, but just not this one?

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



* Re: [Bugme-new] [Bug 11266] New: unable to handle kernel paging request in ext2_free_blocks
  2008-08-20 19:07                   ` Andreas Dilger
@ 2008-11-02  5:27                     ` Sami Liedes
  0 siblings, 0 replies; 13+ messages in thread
From: Sami Liedes @ 2008-11-02  5:27 UTC
  To: Andreas Dilger
  Cc: Jan Kara, Aneesh Kumar K.V, Andrew Morton, bugme-daemon,
	linux-ext4

[Sorry for duplicates, forgot to use email instead of bugzilla web
interface.]

I now have found an ext3 filesystem for which this bug happens pretty
reproducibly on 2.6.27.4. Increasing commit interval seems to help it happen,
otherwise the journal can be aborted and then the bug no longer happens. I do
realize that this report is for the ext2 bug, but I hope finding a similar bug
on ext3 might help (and even if this is a separate bug, this information should
help resolve it).

Here's how to do it:

1. bunzip2 the attached filesystem image hdb.10000097.bz2

(I did the following inside qemu, hence /dev/hdb)

2. mount /dev/hdb /mnt -t ext3 -o errors=continue,commit=300
3. cd /mnt
4. timeout 30 cp -r doc doc2 >&/dev/null (or manually break cp after 30
seconds, it's jammed anyway)
6. find -xdev -print0 2>/dev/null |xargs -0 touch -- 2>/dev/null
7. mkdir tmp >&/dev/null
8. echo whoah >tmp/filu 2>/dev/null
9. rm -rf /mnt/* >&/dev/null
10. while completing rm -rf, the following oops occurs:

------------------------------------------------------------
EXT3-fs error (device hdb): ext3_free_blocks: Freeing blocks not in datazone -
block = 4294967295, count = 1
EXT3-fs error (device hdb): ext3_free_blocks: Freeing blocks not in datazone -
block = 4294967295, count = 1
EXT3-fs error (device hdb): ext3_free_blocks: Freeing blocks not in datazone -
block = 4294967295, count = 1
EXT3-fs error (device hdb): ext3_free_blocks: Freeing blocks not in datazone -
block = 4294967295, count = 1
EXT3-fs error (device hdb): ext3_free_blocks: Freeing blocks not in datazone -
block = 4294967295, count = 1
EXT3-fs unexpected failure: !jh->b_committed_data;
inconsistent data on disk
EXT3-fs error (device hdb): ext3_free_blocks: Freeing blocks in system zones -
Block = 8234, count = 1
EXT3-fs unexpected failure: !jh->b_committed_data;
inconsistent data on disk
ext3_forget: aborting transaction: IO failure in __ext3_journal_forget
EXT3-fs error (device hdb): ext3_free_blocks: Freeing blocks in system zones -
Block = 42, count = 3
EXT3-fs error (device hdb): ext3_free_blocks: Freeing blocks not in datazone -
block = 25630524, count = 1
EXT3-fs error (device hdb) in ext3_free_blocks_sb: Readonly filesystem
EXT3-fs unexpected failure: !jh->b_committed_data;
inconsistent data on disk
BUG: unable to handle kernel paging request at c13fbbfc
IP: [<c02de4f9>] read_block_bitmap+0xa3/0x147
*pde = 07886163 *pte = 013fb160
Oops: 0000 [#1] DEBUG_PAGEALLOC

Pid: 817, comm: rm Not tainted (2.6.27.4 #1)
EIP: 0060:[<c02de4f9>] EFLAGS: 00000206 CPU: 0
EIP is at read_block_bitmap+0xa3/0x147
EAX: ffffdfff EBX: c13fc820 ECX: c13fc000 EDX: 00002001
ESI: c74b15b0 EDI: c7aae400 EBP: c7b7acd0 ESP: c7b7aca0
 DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
Process rm (pid: 817, ti=c7b7a000 task=c78a1ce0 task.ti=c7b7a000)
Stack: 00000001 00000000 00000000 c7aaf1c0 00000246 c79cdc00 00000001 00000000
       c13fc000 00000000 00000001 c163b37c c7b7ad28 c02de66f c0315003 c740aadc
       c7b7ad10 c7440000 c7aaf1c0 00000029 0000202a c7aae400 c7440000 c79cdcac
Call Trace:
 [<c02de66f>] ? ext3_free_blocks_sb+0x93/0x3d6
 [<c0315003>] ? journal_forget+0xff/0x1aa
 [<c02edd83>] ? __ext3_journal_forget+0x19/0x3f
 [<c02de9dd>] ? ext3_free_blocks+0x2b/0x7f
 [<c02e3f8c>] ? ext3_clear_blocks+0x137/0x159
 [<c02e4072>] ? ext3_free_data+0xc4/0x133
 [<c02e4320>] ? ext3_free_branches+0x23f/0x247
 [<c02e4189>] ? ext3_free_branches+0xa8/0x247
 [<c02e4189>] ? ext3_free_branches+0xa8/0x247
 [<c02e498d>] ? ext3_truncate+0x665/0x8ad
 [<c0316062>] ? journal_start+0xb2/0x112
 [<c031608d>] ? journal_start+0xdd/0x112
 [<c0316062>] ? journal_start+0xb2/0x112
 [<c02ebb53>] ? ext3_journal_start_sb+0x29/0x4a
 [<c02e4ca4>] ? ext3_delete_inode+0xcf/0xdb
 [<c02e4bd5>] ? ext3_delete_inode+0x0/0xdb
 [<c02774b3>] ? generic_delete_inode+0x62/0xd5
 [<c0277639>] ? generic_drop_inode+0x113/0x16a
 [<c02765ac>] ? iput+0x47/0x4e
 [<c026d9f4>] ? do_unlinkat+0xc3/0x13d
 [<c054484f>] ? mutex_unlock+0x8/0xa
 [<c026fb0b>] ? vfs_readdir+0x60/0x85
 [<c026f84c>] ? filldir64+0x0/0xd7
 [<c026fbc7>] ? sys_getdents64+0x97/0xa1
 [<c026db66>] ? sys_unlinkat+0x23/0x36
 [<c0202f1e>] ? syscall_call+0x7/0xb
 =======================
Code: 26 00 0f 88 94 00 00 00 8b 87 8c 02 00 00 89 45 e4 8b 55 e8 0f af 50 10
8b 40 34 03 50 14 8b 03 89 45 ec 8b 4e 14 89 4d f0 29 d0 <0f> a3 01 19 c0 85 c0
74 11 8b 43 04 89 45 ec 29 d0 0f a3 01 19
EIP: [<c02de4f9>] read_block_bitmap+0xa3/0x147 SS:ESP 0068:c7b7aca0
---[ end trace 780108b88e07a03e ]---
------------------------------------------------------------
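
As a rough cross-check of the register dump above: the faulting bytes
<0f> a3 01 decode to `bt %eax,(%ecx)`, and with a register bit offset that
instruction reads the 32-bit word at base + 4 * floor(offset / 32), treating
the offset as signed. The sketch below redoes that arithmetic in user space.
The function name is made up for illustration; this is not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: address of the 32-bit word that a 32-bit x86
 * `bt %eax,(%ecx)` would read, given the raw register values.
 * EAX is interpreted as a signed bit offset relative to the base in ECX.
 */
static uint32_t bt_word_address(uint32_t ecx_base, uint32_t eax_offset)
{
    /* Reinterpret the raw 32-bit register value as a signed offset. */
    int64_t off = (eax_offset & 0x80000000u)
                      ? (int64_t)eax_offset - 4294967296LL
                      : (int64_t)eax_offset;

    /* Floor division by 32 (C's `/` truncates toward zero instead). */
    int64_t word = off >= 0 ? off / 32 : -((-off + 31) / 32);

    /* Each word is 4 bytes; the result wraps to 32 bits like the CPU. */
    return (uint32_t)((int64_t)ecx_base + word * 4);
}
```

With ECX = c13fc000 and EAX = ffffdfff (that is, -8193) from the dump,
bt_word_address(0xc13fc000u, 0xffffdfffu) gives c13fbbfc, which matches the
reported fault address. A negative offset like this is consistent with a
bitmap block number that lies below the first block of its group.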

	Sami


* Re: [Bugme-new] [Bug 11266] New: unable to handle kernel paging request in ext2_free_blocks
  2008-08-18 14:58       ` Jan Kara
  2008-08-18 16:51         ` Aneesh Kumar K.V
@ 2008-08-19 21:43         ` Sami Liedes
  1 sibling, 0 replies; 13+ messages in thread
From: Sami Liedes @ 2008-08-19 21:43 UTC (permalink / raw)
  To: Jan Kara; +Cc: Andrew Morton, bugme-daemon, linux-ext4, Aneesh Kumar K.V

On Mon, Aug 18, 2008 at 04:58:41PM +0200, Jan Kara wrote:
> From 06953717138efe3ad535e78343beb7204ac0d274 Mon Sep 17 00:00:00 2001
> From: Jan Kara <jack@suse.cz>
> Date: Mon, 18 Aug 2008 16:45:11 +0200
> Subject: [PATCH] ext2: Check for corrupted group descriptor before using data in it
> 
> We have to check whether a group descriptor isn't corrupted in
> read_block_bitmap(). Otherwise ext2_valid_block_bitmap() will try
> to access bits outside of bitmap and Oops happens.

I think something similar is needed for ext3, or at least the
backtrace looks similar to me (tell me if you want me to file a
separate bug for it):

------------------------------------------------------------
[ 1303.485714] EXT3-fs unexpected failure: !jh->b_committed_data;
[ 1303.485714] inconsistent data on disk
[ 1303.485714] BUG: unable to handle kernel paging request at c7edfffc
[ 1303.485714] IP: [<c02ddca9>] read_block_bitmap+0xa3/0x147
[ 1303.485714] *pde = 00007067 *pte = 07edf160
[ 1303.485714] Oops: 0000 [#1] DEBUG_PAGEALLOC
[ 1303.485714]
[ 1303.485714] Pid: 17001, comm: rm Not tainted (2.6.27-rc3 #2)
[ 1303.485714] EIP: 0060:[<c02ddca9>] EFLAGS: 00000246 CPU: 0
[ 1303.485714] EIP is at read_block_bitmap+0xa3/0x147
[ 1303.485714] EAX: ffffffff EBX: c7ee0800 ECX: c7ee0000 EDX: 00000001
[ 1303.485714] ESI: c3c40690 EDI: c7abd000 EBP: c79c4c9c ESP: c79c4c6c
[ 1303.485714]  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
[ 1303.485714] Process rm (pid: 17001, ti=c79c4000 task=c79189a0 task.ti=c79c4000)
[ 1303.485714] Stack: 00000246 00000001 00000246 c7abda3c c7413aa0 c5d7f800 00000000 00000000
[ 1303.485714]        c7ee0000 00000000 00000000 c3c25064 c79c4cf4 c02dde1f c3c405b0 c79c4ccc
[ 1303.485714]        c0317987 00000001 c0314a9b 00000029 0000002a c7abd000 c7440000 c5d7f8ac
[ 1303.485714] Call Trace:
[ 1303.485714]  [<c02dde1f>] ? ext3_free_blocks_sb+0x93/0x3d6
[ 1303.485714]  [<c0317987>] ? journal_revoke+0x81/0xe3
[ 1303.485714]  [<c0314a9b>] ? do_get_write_access+0x381/0x49c
[ 1303.485714]  [<c02ed428>] ? __ext3_journal_revoke+0x1e/0x44
[ 1303.485714]  [<c02de18d>] ? ext3_free_blocks+0x2b/0x7f
[ 1303.485714]  [<c02e3694>] ? ext3_clear_blocks+0x11f/0x141
[ 1303.485714]  [<c02e377a>] ? ext3_free_data+0xc4/0x133
[ 1303.485714]  [<c02e3a0e>] ? ext3_free_branches+0x225/0x22d
[ 1303.485714]  [<c02e3891>] ? ext3_free_branches+0xa8/0x22d
[ 1303.485714]  [<c02e3891>] ? ext3_free_branches+0xa8/0x22d
[ 1303.485714]  [<c02e407d>] ? ext3_truncate+0x667/0x8af
[ 1303.485714]  [<c03153e2>] ? journal_start+0xb2/0x112
[ 1303.485714]  [<c031540d>] ? journal_start+0xdd/0x112
[ 1303.485714]  [<c03153e2>] ? journal_start+0xb2/0x112
[ 1303.485714]  [<c02eb243>] ? ext3_journal_start_sb+0x29/0x4a
[ 1303.485714]  [<c02e4389>] ? ext3_delete_inode+0xc4/0xdb
[ 1303.485714]  [<c02e42c5>] ? ext3_delete_inode+0x0/0xdb
[ 1303.485714]  [<c0276c2b>] ? generic_delete_inode+0x62/0xd5
[ 1303.485714]  [<c0276db1>] ? generic_drop_inode+0x113/0x162
[ 1303.485714]  [<c0275d3c>] ? iput+0x47/0x4e
[ 1303.485714]  [<c02737a7>] ? dentry_iput+0x6b/0xb1
[ 1303.485714]  [<c0273859>] ? d_kill+0x1d/0x37
[ 1303.485714]  [<c027519b>] ? dput+0x58/0x10a
[ 1303.485714]  [<c026d2a4>] ? do_rmdir+0xa4/0xc3
[ 1303.485714]  [<c026d2f4>] ? sys_unlinkat+0x31/0x36
[ 1303.485714]  [<c0202f3e>] ? syscall_call+0x7/0xb
[ 1303.485714]  =======================
[ 1303.485714] Code: 26 00 0f 88 94 00 00 00 8b 87 8c 02 00 00 89 45 e4 8b 55 e8 0f af 50 10 8b 40 34 03 50 14 8b 03 89 45 ec 8b 4e 14 89 4d f0 29 d0 <0f> a3 01 19 c0 85 c0 74 11 8b 43 04 89 45 ec 29 d0 0f a3 01 19
[ 1303.485714] EIP: [<c02ddca9>] read_block_bitmap+0xa3/0x147 SS:ESP 0068:c79c4c6c
[ 1303.485714] ---[ end trace ba199677255b7e73 ]---
------------------------------------------------------------
$ addr2line -e vmlinux -i 0xc02ddca9
include/asm/bitops.h:305
fs/ext3/balloc.c:98
fs/ext3/balloc.c:167

     98         if (!ext3_test_bit(offset, bh->b_data))
     99                 /* bad block bitmap */
    100                 goto err_out;
------------------------------------------------------------
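
A minimal sketch of the kind of sanity check the quoted patch description
asks for, written as a user-space helper rather than kernel code: before
testing the bit at (bitmap block - first block of the group), make sure that
difference actually falls inside the group. The function and parameter names
are hypothetical, and this is not the actual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true only if the bitmap block recorded in
 * a group descriptor lies inside the block range of that group, so that
 * (bitmap_blk - group_first_block) is a valid bit index into the bitmap.
 */
static bool bitmap_block_in_group(uint32_t bitmap_blk,
                                  uint32_t group_first_block,
                                  uint32_t blocks_per_group)
{
    assert(blocks_per_group > 0);

    /* A block below the group start would give a wrapped, huge offset. */
    if (bitmap_blk < group_first_block)
        return false;

    /* The subtraction cannot wrap here; reject offsets past the group. */
    return bitmap_blk - group_first_block < blocks_per_group;
}
```

For example, with a bitmap block of 0 and a group starting at block 1,
bitmap_block_in_group(0, 1, 8192) returns false, whereas the unchecked
subtraction 0 - 1 wraps to ffffffff, the same value seen in EAX above.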

	Sami


Thread overview: 13+ messages
     [not found] <bug-11266-10286@http.bugzilla.kernel.org/>
2008-08-07 17:52 ` [Bugme-new] [Bug 11266] New: unable to handle kernel paging request in ext2_free_blocks Andrew Morton
     [not found] ` <0K5800031SEDU2@smtp02.hut-mail>
2008-08-07 20:07   ` Sami Liedes
2008-08-07 20:28     ` Sami Liedes
2008-08-18 14:58       ` Jan Kara
2008-08-18 16:51         ` Aneesh Kumar K.V
2008-08-19  3:24           ` Andreas Dilger
2008-08-19  9:13             ` Jan Kara
2008-08-19 10:51               ` Sami Liedes
2008-08-20 10:25                 ` Jan Kara
2008-08-20 13:29                   ` Sami Liedes
2008-08-20 19:07                   ` Andreas Dilger
2008-11-02  5:27                     ` Sami Liedes
2008-08-19 21:43         ` Sami Liedes
