* Re: [Bugme-new] [Bug 11266] New: unable to handle kernel paging request in ext2_free_blocks
From: Andrew Morton @ 2008-08-07 17:52 UTC
To: sliedes; +Cc: bugme-daemon, linux-ext4, Aneesh Kumar K.V

(switched to email. Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Thu, 7 Aug 2008 05:53:37 -0700 (PDT) bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=11266
>
>            Summary: unable to handle kernel paging request in
>                     ext2_free_blocks
>            Product: File System
>            Version: 2.5
>      KernelVersion: 2.6.27-rc2 + patch for #10976 (now in -mm)
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: ext2
>         AssignedTo: akpm@osdl.org
>         ReportedBy: sliedes@cc.hut.fi
>
>
> Latest working kernel version: (I think at least 2.6.25.4 works)
> Earliest failing kernel version:
> Distribution: Minimal Debian sid (unstable)
> Hardware Environment: qemu x86
> Software Environment:
> Problem Description:
>
> Mere rm -rf after mounting on an intentionally corrupted partition occasionally
> causes "BUG: unable to handle kernel paging request" in ext2_free_blocks.
>
> Unfortunately the issue seems to be timing sensitive (or something), doing it
> on the same filesystem only sometimes results in the crash :( But I have
> reproduced it something like 6 times now with brief testing.
>
> If you wish, I can attach some filesystems with which I have been able to
> reproduce this at least once.
>
> Another thing I could do is take a look at it with the new kernel debugger
> (which I haven't tried yet) if none of you are able to figure out this from the
> traces. Is there something you would specifically want me to take a look at?
> The local and referenced variables at ext2_free_blocks(), I guess?
>
> I think I ran quite extensive tests on 2.6.25.4 & ext2, so I suspect (but am
> not sure, I've made some changes to the way I test) this bug is newer than
> 2.6.25.4. I could do some bisecting too, but I haven't managed to automate the
> thing yet.
>
> Here's a script I run under qemu, google for zzuf (it's a fuzzer), and timeout
> is from the Debian package `timeout':
>
> ----------
> #!/bin/sh
>
> if [ "`hostname`" != "fstest" ]; then
>     echo "This is a dangerous script."
>     echo "Set your hostname to \`fstest\' if you want to use it."
>     exit 1
> fi
>
> umount /dev/hdb
> umount /dev/hdc
> /etc/init.d/sysklogd stop
> /etc/init.d/klogd stop
> /etc/init.d/cron stop
> mount /dev/hda / -t ext3 -o remount,ro || exit 1
>
> ulimit -t 20
>
> for ((s=$1; s<1000000000; s++)); do
>     umount /mnt
>     echo '***** zzuffing *****' seed $s
>     zzuf -r 0:0.03 -s $s </dev/hdc >/dev/hdb || exit
>     mount /dev/hdb /mnt -o errors=continue || continue
>     cd /mnt || continue
>     cp -r doc doc2 >&/dev/null
>     find -xdev >&/dev/null
>     find -xdev -print0 2>/dev/null |xargs -0 touch -- 2>/dev/null
>     mkdir tmp >&/dev/null
>     echo whoah >tmp/filu 2>/dev/null
>     rm -rf /mnt/* >&/dev/null
>     cd /
> done
> ----------
>
> The attached backtraces all start from the time of mounting the filesystem.
>

Yes, please do test 2.6.26.

Aneesh, your recent changes to the ext2 block allocator would have to be
prime suspects here.

* Re: [Bugme-new] [Bug 11266] New: unable to handle kernel paging request in ext2_free_blocks
From: Sami Liedes @ 2008-08-07 20:07 UTC
To: Andrew Morton; +Cc: bugme-daemon, linux-ext4, Aneesh Kumar K.V

On Thu, Aug 07, 2008 at 10:52:51AM -0700, Andrew Morton wrote:
> Yes, please do test 2.6.26.

Did that. I can reproduce the same crash on 2.6.26 and 2.6.26.2.

	Sami

* Re: [Bugme-new] [Bug 11266] New: unable to handle kernel paging request in ext2_free_blocks
From: Sami Liedes @ 2008-08-07 20:28 UTC
To: Andrew Morton; +Cc: bugme-daemon, linux-ext4, Aneesh Kumar K.V

On Thu, Aug 07, 2008 at 11:07:17PM +0300, Sami Liedes wrote:
> On Thu, Aug 07, 2008 at 10:52:51AM -0700, Andrew Morton wrote:
> > Yes, please do test 2.6.26.
>
> Did that. I can reproduce the same crash on 2.6.26 and 2.6.26.2.

2.6.25.15 crashes too, so I might have been wrong about 2.6.25.4
working (unless something changed between those two versions).

	Sami

* Re: [Bugme-new] [Bug 11266] New: unable to handle kernel paging request in ext2_free_blocks
From: Jan Kara @ 2008-08-18 14:58 UTC
To: Sami Liedes; +Cc: Andrew Morton, bugme-daemon, linux-ext4, Aneesh Kumar K.V

> On Thu, Aug 07, 2008 at 11:07:17PM +0300, Sami Liedes wrote:
> > On Thu, Aug 07, 2008 at 10:52:51AM -0700, Andrew Morton wrote:
> > > Yes, please do test 2.6.26.
> >
> > Did that. I can reproduce the same crash on 2.6.26 and 2.6.26.2.
>
> 2.6.25.15 crashes too, so I might have been wrong about 2.6.25.4
> working (unless something changed between those two versions).
  I think this is the same problem Vegard reported in
http://marc.info/?l=linux-ext4&m=121637999611618&w=2. The problem seems to
be in ext2_valid_block_bitmap() which does

	bitmap_blk = le32_to_cpu(desc->bg_block_bitmap);
	offset = bitmap_blk - group_first_block;
	if (!ext2_test_bit(offset, bh->b_data))

(and similarly for inode bitmap). Now when the group descriptor is
corrupted, this simply accesses beyond the bh->b_data... The patch below
should hopefully fix the issue. Can you test it please?

								Honza
-- 
Jan Kara <jack@suse.cz>
SuSE CR Labs

---
From 06953717138efe3ad535e78343beb7204ac0d274 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Mon, 18 Aug 2008 16:45:11 +0200
Subject: [PATCH] ext2: Check for corrupted group descriptor before using data in it

We have to check whether a group descriptor isn't corrupted in
read_block_bitmap(). Otherwise ext2_valid_block_bitmap() will try
to access bits outside of bitmap and Oops happens.

CC: Vegard Nossum <vegard.nossum@gmail.com>
CC: Sami Liedes <sliedes@cc.hut.fi>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext2/balloc.c |   29 +++++++++++++++++++++++++++++
 1 files changed, 29 insertions(+), 0 deletions(-)

diff --git a/fs/ext2/balloc.c b/fs/ext2/balloc.c
index 10bb02c..9104712 100644
--- a/fs/ext2/balloc.c
+++ b/fs/ext2/balloc.c
@@ -113,6 +113,17 @@ err_out:
 	return 0;
 }
 
+static int ext2_block_in_group(struct super_block *sb,
+			unsigned int block_group, ext2_fsblk_t block)
+{
+	if (block < ext2_group_first_block_no(sb, block_group))
+		return 0;
+	if (block >= ext2_group_first_block_no(sb, block_group) +
+		     EXT2_BLOCKS_PER_GROUP(sb))
+		return 0;
+	return 1;
+}
+
 /*
  * Read the bitmap for a given block_group,and validate the
  * bits for block/inode/inode tables are set in the bitmaps
@@ -129,6 +140,24 @@ read_block_bitmap(struct super_block *sb, unsigned int block_group)
 	desc = ext2_get_group_desc(sb, block_group, NULL);
 	if (!desc)
 		return NULL;
+	if (!ext2_block_in_group(sb, block_group,
+				 le32_to_cpu(desc->bg_block_bitmap)) ||
+	    !ext2_block_in_group(sb, block_group,
+				 le32_to_cpu(desc->bg_inode_bitmap)) ||
+	    !ext2_block_in_group(sb, block_group,
+				 le32_to_cpu(desc->bg_inode_table)) ||
+	    !ext2_block_in_group(sb, block_group,
+				 le32_to_cpu(desc->bg_inode_table) +
+				 EXT2_SB(sb)->s_itb_per_group - 1)) {
+		ext2_error(sb, __func__, "Corrupted group descriptor - "
+			   "block_group = %u, block_bitmap = %u, "
+			   "inode_bitmap = %u, inode_table = %u",
+			   block_group,
+			   le32_to_cpu(desc->bg_block_bitmap),
+			   le32_to_cpu(desc->bg_inode_bitmap),
+			   le32_to_cpu(desc->bg_inode_table));
+		return NULL;
+	}
 	bitmap_blk = le32_to_cpu(desc->bg_block_bitmap);
 	bh = sb_getblk(sb, bitmap_blk);
 	if (unlikely(!bh)) {
-- 
1.5.2.4
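
To make the failure mode described in the message above concrete, here is a minimal user-space sketch (not kernel code) of the offset computation and of a range check shaped like the one the patch adds. The helper names `group_first_block`, `bitmap_offset` and `block_in_group` are hypothetical, and the geometry used in the usage note (first data block 1, 8192 blocks per group) is an assumption chosen for illustration only.

```c
#include <stdint.h>

/* Hypothetical user-space stand-ins; these are not the kernel's ext2 helpers. */

/* First block of a group, assuming every group holds blocks_per_group blocks. */
static uint32_t group_first_block(uint32_t first_data_block,
				  uint32_t blocks_per_group,
				  uint32_t group)
{
	return first_data_block + group * blocks_per_group;
}

/*
 * Mirrors "offset = bitmap_blk - group_first_block" in 32-bit arithmetic.
 * If bitmap_blk is below the group's first block the result wraps around;
 * read as a signed value, the same bit pattern is a negative offset.
 */
static uint32_t bitmap_offset(uint32_t bitmap_blk, uint32_t first_block)
{
	return bitmap_blk - first_block;
}

/* Same shape as the patch's check: 1 if block lies inside the group, else 0. */
static int block_in_group(uint32_t first_block, uint32_t blocks_per_group,
			  uint32_t block)
{
	if (block < first_block)
		return 0;
	if (block >= first_block + blocks_per_group)
		return 0;
	return 1;
}
```

Under these assumptions, a descriptor whose bitmap pointer is 0 in a group that starts at block 1 gives `bitmap_offset(0, 1) == 0xffffffff`, which is far outside any one-block bitmap, while `block_in_group(1, 8192, 0)` returns 0 and so would reject the descriptor before the bit test is attempted.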

* Re: [Bugme-new] [Bug 11266] New: unable to handle kernel paging request in ext2_free_blocks
From: Aneesh Kumar K.V @ 2008-08-18 16:51 UTC
To: Jan Kara; +Cc: Sami Liedes, Andrew Morton, bugme-daemon, linux-ext4

On Mon, Aug 18, 2008 at 04:58:41PM +0200, Jan Kara wrote:
>
> From 06953717138efe3ad535e78343beb7204ac0d274 Mon Sep 17 00:00:00 2001
> From: Jan Kara <jack@suse.cz>
> Date: Mon, 18 Aug 2008 16:45:11 +0200
> Subject: [PATCH] ext2: Check for corrupted group descriptor before using data in it
>
> We have to check whether a group descriptor isn't corrupted in
> read_block_bitmap(). Otherwise ext2_valid_block_bitmap() will try
> to access bits outside of bitmap and Oops happens.
>
> CC: Vegard Nossum <vegard.nossum@gmail.com>
> CC: Sami Liedes <sliedes@cc.hut.fi>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/ext2/balloc.c |   29 +++++++++++++++++++++++++++++
>  1 files changed, 29 insertions(+), 0 deletions(-)
>
> diff --git a/fs/ext2/balloc.c b/fs/ext2/balloc.c
> index 10bb02c..9104712 100644
> --- a/fs/ext2/balloc.c
> +++ b/fs/ext2/balloc.c
> @@ -113,6 +113,17 @@ err_out:
>  	return 0;
>  }
>
> +static int ext2_block_in_group(struct super_block *sb,
> +			unsigned int block_group, ext2_fsblk_t block)
> +{
> +	if (block < ext2_group_first_block_no(sb, block_group))
> +		return 0;
> +	if (block >= ext2_group_first_block_no(sb, block_group) +
> +		     EXT2_BLOCKS_PER_GROUP(sb))
> +		return 0;
> +	return 1;
> +}
> +
>  /*
>   * Read the bitmap for a given block_group,and validate the
>   * bits for block/inode/inode tables are set in the bitmaps
> @@ -129,6 +140,24 @@ read_block_bitmap(struct super_block *sb, unsigned int block_group)
>  	desc = ext2_get_group_desc(sb, block_group, NULL);
>  	if (!desc)
>  		return NULL;
> +	if (!ext2_block_in_group(sb, block_group,
> +				 le32_to_cpu(desc->bg_block_bitmap)) ||
> +	    !ext2_block_in_group(sb, block_group,
> +				 le32_to_cpu(desc->bg_inode_bitmap)) ||
> +	    !ext2_block_in_group(sb, block_group,
> +				 le32_to_cpu(desc->bg_inode_table)) ||
> +	    !ext2_block_in_group(sb, block_group,
> +				 le32_to_cpu(desc->bg_inode_table) +
> +				 EXT2_SB(sb)->s_itb_per_group - 1)) {
> +		ext2_error(sb, __func__, "Corrupted group descriptor - "
> +			   "block_group = %u, block_bitmap = %u, "
> +			   "inode_bitmap = %u, inode_table = %u",
> +			   block_group,
> +			   le32_to_cpu(desc->bg_block_bitmap),
> +			   le32_to_cpu(desc->bg_inode_bitmap),
> +			   le32_to_cpu(desc->bg_inode_table));
> +		return NULL;
> +	}
>  	bitmap_blk = le32_to_cpu(desc->bg_block_bitmap);
>  	bh = sb_getblk(sb, bitmap_blk);
>  	if (unlikely(!bh)) {

Do we need to do this validation every time we do a read_block_bitmap ?
I guess we need to move the validation where we read the desc blocks
from the disk.

-aneesh

* Re: [Bugme-new] [Bug 11266] New: unable to handle kernel paging request in ext2_free_blocks
From: Andreas Dilger @ 2008-08-19 3:24 UTC
To: Aneesh Kumar K.V
Cc: Jan Kara, Sami Liedes, Andrew Morton, bugme-daemon, linux-ext4

On Aug 18, 2008  22:21 +0530, Aneesh Kumar wrote:
> > +static int ext2_block_in_group(struct super_block *sb,
> > +			unsigned int block_group, ext2_fsblk_t block)
> > +{
> > +	if (block < ext2_group_first_block_no(sb, block_group))
> > +		return 0;
> > +	if (block >= ext2_group_first_block_no(sb, block_group) +
> > +		     EXT2_BLOCKS_PER_GROUP(sb))
> > +		return 0;
> > +	return 1;
> > +}
> > +
> >  /*
> >   * Read the bitmap for a given block_group,and validate the
> >   * bits for block/inode/inode tables are set in the bitmaps
> > @@ -129,6 +140,24 @@ read_block_bitmap(struct super_block *sb, unsigned int block_group)
> >  	desc = ext2_get_group_desc(sb, block_group, NULL);
> >  	if (!desc)
> >  		return NULL;
> > +	if (!ext2_block_in_group(sb, block_group,
> > +				 le32_to_cpu(desc->bg_block_bitmap)) ||
> > +	    !ext2_block_in_group(sb, block_group,
> > +				 le32_to_cpu(desc->bg_inode_bitmap)) ||
> > +	    !ext2_block_in_group(sb, block_group,
> > +				 le32_to_cpu(desc->bg_inode_table)) ||
> > +	    !ext2_block_in_group(sb, block_group,
> > +				 le32_to_cpu(desc->bg_inode_table) +
> > +				 EXT2_SB(sb)->s_itb_per_group - 1)) {

Isn't equivalent checking done in ext2_check_descriptors()? It would make
sense to abstract out the "check one group and return error" code and use
it in both places.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

* Re: [Bugme-new] [Bug 11266] New: unable to handle kernel paging request in ext2_free_blocks
From: Jan Kara @ 2008-08-19 9:13 UTC
To: Andreas Dilger
Cc: Aneesh Kumar K.V, Sami Liedes, Andrew Morton, bugme-daemon, linux-ext4

On Mon 18-08-08 21:24:10, Andreas Dilger wrote:
> On Aug 18, 2008  22:21 +0530, Aneesh Kumar wrote:
> > > +static int ext2_block_in_group(struct super_block *sb,
> > > +			unsigned int block_group, ext2_fsblk_t block)
> > > +{
> > > +	if (block < ext2_group_first_block_no(sb, block_group))
> > > +		return 0;
> > > +	if (block >= ext2_group_first_block_no(sb, block_group) +
> > > +		     EXT2_BLOCKS_PER_GROUP(sb))
> > > +		return 0;
> > > +	return 1;
> > > +}
> > > +
> > >  /*
> > >   * Read the bitmap for a given block_group,and validate the
> > >   * bits for block/inode/inode tables are set in the bitmaps
> > > @@ -129,6 +140,24 @@ read_block_bitmap(struct super_block *sb, unsigned int block_group)
> > >  	desc = ext2_get_group_desc(sb, block_group, NULL);
> > >  	if (!desc)
> > >  		return NULL;
> > > +	if (!ext2_block_in_group(sb, block_group,
> > > +				 le32_to_cpu(desc->bg_block_bitmap)) ||
> > > +	    !ext2_block_in_group(sb, block_group,
> > > +				 le32_to_cpu(desc->bg_inode_bitmap)) ||
> > > +	    !ext2_block_in_group(sb, block_group,
> > > +				 le32_to_cpu(desc->bg_inode_table)) ||
> > > +	    !ext2_block_in_group(sb, block_group,
> > > +				 le32_to_cpu(desc->bg_inode_table) +
> > > +				 EXT2_SB(sb)->s_itb_per_group - 1)) {
>
> Isn't equivalent checking done in ext2_check_descriptors()? It would make
> sense to abstract out the "check one group and return error" code and use
> it in both places.
  Actually yes, it is. Good point. Sami, is it the case that you have
mounted the filesystem, then intentionally corrupted it and after that
the kernel oopsed (as opposed to first corrupting the filesystem image and
mounting it after that)? That would explain how corrupted values could get
to read_block_bitmap() even though ext2_check_descriptors() checked them.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

* Re: [Bugme-new] [Bug 11266] New: unable to handle kernel paging request in ext2_free_blocks
From: Sami Liedes @ 2008-08-19 10:51 UTC
To: Jan Kara
Cc: Andreas Dilger, Aneesh Kumar K.V, Andrew Morton, bugme-daemon, linux-ext4

On Tue, Aug 19, 2008 at 11:13:39AM +0200, Jan Kara wrote:
> > Isn't equivalent checking done in ext2_check_descriptors()? It would make
> > sense to abstract out the "check one group and return error" code and use
> > it in both places.
>   Actually yes, it is. Good point. Sami, is it the case that you have
> mounted the filesystem, then intentionally corrupted it and after that
> the kernel oopsed (as opposed to first corrupting the filesystem image and
> mounting it after that)? That would explain how corrupted values could get
> to read_block_bitmap() even though ext2_check_descriptors() checked them.

No, that's not what I do. I corrupt the fs before mounting it, then
mount it, perform normal filesystem operations on it and unmount it.

Here's the most current script I use (zzuf is the fuzzer):

------------------------------------------------------------
#!/bin/sh

if [ "`hostname`" != "fstest" ]; then
    echo "This is a dangerous script."
    echo "Set your hostname to \`fstest\' if you want to use it."
    exit 1
fi

umount /dev/hdb
umount /dev/hdc
/etc/init.d/sysklogd stop
/etc/init.d/klogd stop
/etc/init.d/cron stop
mount /dev/hda / -t ext3 -o remount,ro || exit 1

#ulimit -t 20

for ((s=$1; s<1000000000; s++)); do
    umount /mnt
    echo '***** zzuffing *****' seed $s
    zzuf -r 0:0.03 -s $s </dev/hdc >/dev/hdb || exit
    mount /dev/hdb /mnt -t ext2 -o errors=continue || continue
    cd /mnt || continue
    timeout 30 cp -r doc doc2 >&/dev/null
    timeout 30 find -xdev >&/dev/null
    timeout 30 find -xdev -print0 2>/dev/null |xargs -0 touch -- 2>/dev/null
    timeout 30 mkdir tmp >&/dev/null
    timeout 30 echo whoah >tmp/filu 2>/dev/null
    timeout 30 rm -rf /mnt/* >&/dev/null
    cd /
done
------------------------------------------------------------

	Sami

* Re: [Bugme-new] [Bug 11266] New: unable to handle kernel paging request in ext2_free_blocks
From: Jan Kara @ 2008-08-20 10:25 UTC
To: Sami Liedes
Cc: Andreas Dilger, Aneesh Kumar K.V, Andrew Morton, bugme-daemon, linux-ext4

> On Tue, Aug 19, 2008 at 11:13:39AM +0200, Jan Kara wrote:
> > > Isn't equivalent checking done in ext2_check_descriptors()? It would make
> > > sense to abstract out the "check one group and return error" code and use
> > > it in both places.
> >   Actually yes, it is. Good point. Sami, is it the case that you have
> > mounted the filesystem, then intentionally corrupted it and after that
> > the kernel oopsed (as opposed to first corrupting the filesystem image and
> > mounting it after that)? That would explain how corrupted values could get
> > to read_block_bitmap() even though ext2_check_descriptors() checked them.
>
> No, that's not what I do. I corrupt the fs before mounting it, then
> mount it, perform normal filesystem operations on it and unmount it.
  OK, thanks. Then we must somehow corrupt group descriptor block during
the operation. Because I'm pretty sure it *is* corrupted - the oops
is: unable to handle kernel paging request at c7e95ffc. If we look into
registers, we see ECX has c7e96000 (which is probably bh->b_data). In
the second oops it's exactly the same - ECX has c11e4000, the oops is at
address c11e3ffc. So in both cases it is ECX-4. So somehow we managed to
pass negative offset into ext2_test_bit(). But as Andreas pointed out,
when we load descriptors into memory, we check both bitmaps and the
inode table in ext2_check_descriptors()... The other possibility
would be that we managed to corrupt s_first_data_block in the
superblock. Anyway, both possibilities don't look very likely. I'll try
to reproduce the problem and maybe get more insight... How large is your
filesystem BTW?

> Here's the most current script I use (zzuf is the fuzzer):
>
> ------------------------------------------------------------
> #!/bin/sh
>
> if [ "`hostname`" != "fstest" ]; then
>     echo "This is a dangerous script."
>     echo "Set your hostname to \`fstest\' if you want to use it."
>     exit 1
> fi
>
> umount /dev/hdb
> umount /dev/hdc
> /etc/init.d/sysklogd stop
> /etc/init.d/klogd stop
> /etc/init.d/cron stop
> mount /dev/hda / -t ext3 -o remount,ro || exit 1
>
> #ulimit -t 20
>
> for ((s=$1; s<1000000000; s++)); do
>     umount /mnt
>     echo '***** zzuffing *****' seed $s
>     zzuf -r 0:0.03 -s $s </dev/hdc >/dev/hdb || exit
>     mount /dev/hdb /mnt -t ext2 -o errors=continue || continue
>     cd /mnt || continue
>     timeout 30 cp -r doc doc2 >&/dev/null
>     timeout 30 find -xdev >&/dev/null
>     timeout 30 find -xdev -print0 2>/dev/null |xargs -0 touch -- 2>/dev/null
>     timeout 30 mkdir tmp >&/dev/null
>     timeout 30 echo whoah >tmp/filu 2>/dev/null
>     timeout 30 rm -rf /mnt/* >&/dev/null
>     cd /
> done
> ------------------------------------------------------------

								Honza
-- 
Jan Kara <jack@suse.cz>
SuSE CR Labs
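
The "ECX-4" observation in the message above matches how an x86 bit-test with a register bit offset addresses memory: the offset is treated as signed, and the 32-bit word accessed is at base + 4 * floor(offset / 32), so any offset from -32 to -1 touches the word just below the buffer. The sketch below is a user-space model of that address arithmetic only. `bit_test_word_addr` is a hypothetical helper name, and that ext2_test_bit() compiled down to such an instruction in the reported kernel is an assumption.

```c
#include <stdint.h>

/*
 * Model of the address touched by a 32-bit bit test with a signed bit
 * offset: the word at base + 4 * floor(bit_offset / 32).
 * Hypothetical helper for illustration; not a kernel function.
 */
static uint32_t bit_test_word_addr(uint32_t base, int32_t bit_offset)
{
	/* C division truncates toward zero, so adjust to get a floor. */
	int32_t word = bit_offset / 32;

	if (bit_offset % 32 < 0)
		word -= 1;

	/* Unsigned addition wraps, which models a 32-bit address space. */
	return base + (uint32_t)(word * 4);
}
```

With the register values quoted above, `bit_test_word_addr(0xc7e96000, -1)` evaluates to 0xc7e95ffc and `bit_test_word_addr(0xc11e4000, -1)` to 0xc11e3ffc, the two faulting addresses from the reports. The exact negative offset cannot be recovered from the address alone.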

* Re: [Bugme-new] [Bug 11266] New: unable to handle kernel paging request in ext2_free_blocks
From: Sami Liedes @ 2008-08-20 13:29 UTC
To: Jan Kara
Cc: Andreas Dilger, Aneesh Kumar K.V, Andrew Morton, bugme-daemon, linux-ext4

On Wed, Aug 20, 2008 at 12:25:33PM +0200, Jan Kara wrote:
>   OK, thanks. Then we must somehow corrupt group descriptor block during
> the operation. Because I'm pretty sure it *is* corrupted - the oops
> is: unable to handle kernel paging request at c7e95ffc. If we look into
> registers, we see ECX has c7e96000 (which is probably bh->b_data). In
> the second oops it's exactly the same - ECX has c11e4000, the oops is at
> address c11e3ffc. So in both cases it is ECX-4. So somehow we managed to
> pass negative offset into ext2_test_bit(). But as Andreas pointed out,
> when we load descriptors into memory, we check both bitmaps and the
> inode table in ext2_check_descriptors()... The other possibility
> would be that we managed to corrupt s_first_data_block in the
> superblock. Anyway, both possibilities don't look very likely. I'll try
> to reproduce the problem and maybe get more insight... How large is your
> filesystem BTW?

My FS is 10 MiB and tries to be diverse in its contents. It has a copy
of my /dev and a small partial copy of /usr/share/doc. I put the
pristine (non-corrupted) filesystem at

http://www.hut.fi/~sliedes/fsdebug-hdc-ext2.bz2

(520k compressed).

I've been thinking I should write a script to prepare the root
filesystem for the tests, but haven't got that far yet. Basically
(unless I forget some step) I use debootstrap to bootstrap a minimal
Debian system, create some needed devices in it (hd[abc], ttyS0 at
least), set the hostname to fstest, configure getty to listen to
ttyS0, copy the script to /root/runtest (the script's first parameter
is the seed) and install some Debian packages (zzuf and timeout at
least). Then I make four copies of the images and run four qemus in
parallel since I have four cpus, modifying the first parameter
(initial seed) of the runtest script, e.g. 0, 10M, 20M, 30M.

I guess the approach might be useful for those who write the code too
(or people closer to them than me), since I've already found a fair
number of bugs with it in a fairly short period of time (#10871,
#10882, #10976, #11250, #11253, #11266 for ext[23] bugs, also one ext4
bug I hit when an ext3 fs was detected as ext4; search bugzilla for my
email to see the rest of the bugs).

The current root filesystem is 144M compressed (yeah, there's a lot of
stuff irrelevant to the tests there), I could upload it somewhere if
that helps. After that running the tests is a matter of running
something like

qemu -kernel bzImage -append 'root=/dev/hda console=ttyS0,115200n8' \
  -hda hda -hdb hdb -hdc hdc -nographic -serial pty

, attaching a screen session to the allocated pty, logging in as root
and running ./runtest $seed.

Also the tests are not as comprehensive as I'd like. As an example,
some years ago I stress tested reiser4 (it was already "ready") with
pretty mundane operations (without corrupting the fs) and it worked,
but I've got it to break badly at three separate times in separate
ways just by normally using Debian's aptitude - the breakage was in
flock(), and the current tests don't test flock(). Other things to
test would be at least hard links and fifos...

The level of automation isn't quite what I'd like either, optimally
there would just be a single script that takes the kernel image,
filesystem type and number of parallel instances as arguments and runs
the tests.

	Sami

* Re: [Bugme-new] [Bug 11266] New: unable to handle kernel paging request in ext2_free_blocks
From: Andreas Dilger @ 2008-08-20 19:07 UTC
To: Jan Kara
Cc: Sami Liedes, Aneesh Kumar K.V, Andrew Morton, bugme-daemon, linux-ext4

On Aug 20, 2008  12:25 +0200, Jan Kara wrote:
> > On Tue, Aug 19, 2008 at 11:13:39AM +0200, Jan Kara wrote:
> > > > Isn't equivalent checking done in ext2_check_descriptors()? It would make
> > > > sense to abstract out the "check one group and return error" code and use
> > > > it in both places.
> > >   Actually yes, it is. Good point. Sami, is it the case that you have
> > > mounted the filesystem, then intentionally corrupted it and after that
> > > the kernel oopsed (as opposed to first corrupting the filesystem image and
> > > mounting it after that)? That would explain how corrupted values could get
> > > to read_block_bitmap() even though ext2_check_descriptors() checked them.
> >
> > No, that's not what I do. I corrupt the fs before mounting it, then
> > mount it, perform normal filesystem operations on it and unmount it.
>   OK, thanks. Then we must somehow corrupt group descriptor block during
> the operation.

Oh, interesting... The data in the journal is probably corrupt, but
all of the superblock/gdt sanity checks are done BEFORE the journal is
replayed.

It would seem that the ext*_fill_super() code should do the sanity checks,
and then recheck the superblock and group descriptors after the journal
is replayed. The superblock checking code can be moved out of
ext*_fill_super() into a helper function like ext*_check_super(), with
ext*_check_super() and ext*_check_descriptors() then called again
after journal replay.

Having journal checksums enabled (ext4) would also detect this problem
before the journal replay corrupts the filesystem metadata.

It doesn't look possible that we can do journal recovery before loading
the GDT because ext*_load_journal()->ext*_get_journal() is doing iget()
and this needs the GDT to read the journal inode.

It might also make sense to just clean up the superblock and group
descriptor table and goto the beginning of fill_super() because in
some cases the superblock contents may have changed in important ways
(e.g. crash after resize of the filesystem which is only in the journal).

> Because I'm pretty sure it *is* corrupted - the oops
> is: unable to handle kernel paging request at c7e95ffc. If we look into
> registers, we see ECX has c7e96000 (which is probably bh->b_data). In
> the second oops it's exactly the same - ECX has c11e4000, the oops is at
> address c11e3ffc. So in both cases it is ECX-4. So somehow we managed to
> pass negative offset into ext2_test_bit(). But as Andreas pointed out,
> when we load descriptors into memory, we check both bitmaps and the
> inode table in ext2_check_descriptors()... The other possibility
> would be that we managed to corrupt s_first_data_block in the
> superblock. Anyway, both possibilities don't look very likely. I'll try
> to reproduce the problem and maybe get more insight... How large is your
> filesystem BTW?
>
> > Here's the most current script I use (zzuf is the fuzzer):
> >
> > ------------------------------------------------------------
> > #!/bin/sh
> >
> > if [ "`hostname`" != "fstest" ]; then
> >     echo "This is a dangerous script."
> >     echo "Set your hostname to \`fstest\' if you want to use it."
> >     exit 1
> > fi
> >
> > umount /dev/hdb
> > umount /dev/hdc
> > /etc/init.d/sysklogd stop
> > /etc/init.d/klogd stop
> > /etc/init.d/cron stop
> > mount /dev/hda / -t ext3 -o remount,ro || exit 1
> >
> > #ulimit -t 20
> >
> > for ((s=$1; s<1000000000; s++)); do
> >     umount /mnt
> >     echo '***** zzuffing *****' seed $s
> >     zzuf -r 0:0.03 -s $s </dev/hdc >/dev/hdb || exit
> >     mount /dev/hdb /mnt -t ext2 -o errors=continue || continue
> >     cd /mnt || continue
> >     timeout 30 cp -r doc doc2 >&/dev/null
> >     timeout 30 find -xdev >&/dev/null
> >     timeout 30 find -xdev -print0 2>/dev/null |xargs -0 touch -- 2>/dev/null
> >     timeout 30 mkdir tmp >&/dev/null
> >     timeout 30 echo whoah >tmp/filu 2>/dev/null
> >     timeout 30 rm -rf /mnt/* >&/dev/null
> >     cd /
> > done
> > ------------------------------------------------------------

Oh, hmm, this is ext2 and not ext3, so no journal... I guess my bug is
still valid, but just not this one?

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
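
The recheck-after-replay ordering suggested earlier in the message above (sanity check, replay the journal, sanity check again) could be sketched in user space roughly as follows. Everything here is hypothetical scaffolding: `struct fs_image`, `check_descriptors`, `mount_with_recheck` and `corrupting_replay` are illustrative names, not kernel interfaces, and the descriptor check is only modelled on the range check from the patch earlier in the thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for on-disk state; not kernel types. */
struct group_desc {
	uint32_t block_bitmap;
	uint32_t inode_bitmap;
	uint32_t inode_table;
};

struct fs_image {
	uint32_t first_data_block;
	uint32_t blocks_per_group;
	uint32_t itb_per_group;		/* inode table blocks per group */
	size_t ngroups;
	struct group_desc *gdt;
};

/* 1 if blk lies inside group g, else 0. */
static int in_group(const struct fs_image *fs, size_t g, uint32_t blk)
{
	uint32_t first = fs->first_data_block +
			 (uint32_t)g * fs->blocks_per_group;

	return blk >= first && blk < first + fs->blocks_per_group;
}

/* 1 if every descriptor points inside its own group, else 0. */
static int check_descriptors(const struct fs_image *fs)
{
	for (size_t g = 0; g < fs->ngroups; g++) {
		const struct group_desc *d = &fs->gdt[g];

		if (!in_group(fs, g, d->block_bitmap) ||
		    !in_group(fs, g, d->inode_bitmap) ||
		    !in_group(fs, g, d->inode_table) ||
		    !in_group(fs, g, d->inode_table + fs->itb_per_group - 1))
			return 0;
	}
	return 1;
}

/* Validate, replay, validate again. Returns 0 on success, -1 on error. */
static int mount_with_recheck(struct fs_image *fs,
			      void (*replay_journal)(struct fs_image *))
{
	if (!check_descriptors(fs))
		return -1;
	if (replay_journal)
		replay_journal(fs);	/* may rewrite descriptor blocks */
	if (!check_descriptors(fs))
		return -1;
	return 0;
}

/* Example replay that models a corrupt journal zeroing a bitmap pointer. */
static void corrupting_replay(struct fs_image *fs)
{
	fs->gdt[0].block_bitmap = 0;
}
```

In this model, an image that passes the first check but is handed `corrupting_replay` fails the second check, so the mount is refused instead of carrying the bad descriptor into later bitmap reads. Whether a real implementation should recheck in place or restart the whole fill_super() path, as the message also suggests, is left open here.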
* Re: [Bugme-new] [Bug 11266] New: unable to handle kernel paging request in ext2_free_blocks
  2008-08-20 19:07 ` Andreas Dilger
@ 2008-11-02  5:27 ` Sami Liedes
  0 siblings, 0 replies; 13+ messages in thread
From: Sami Liedes @ 2008-11-02  5:27 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Jan Kara, Aneesh Kumar K.V, Andrew Morton, bugme-daemon, linux-ext4

[Sorry for duplicates, forgot to use email instead of bugzilla web
interface.]

I now have found an ext3 filesystem for which this bug happens pretty
reproducibly on 2.6.27.4. Increasing commit interval seems to help it
happen, otherwise the journal can be aborted and then the bug no
longer happens.

I do realize that this report is for the ext2 bug, but I hope finding
a similar bug on ext3 might help (and even if this is a separate bug,
this information should help resolve it).

Here's how to do it:

1. bunzip2 the attached filesystem image hdb.10000097.bz2
   (I did the following inside qemu, hence /dev/hdb)
2. mount /dev/hdb /mnt -t ext3 -o errors=continue,commit=300
3. cd /mnt
4. timeout 30 cp -r doc doc2 >&/dev/null
   (or manually break cp after 30 seconds, it's jammed anyway)
6. find -xdev -print0 2>/dev/null |xargs -0 touch -- 2>/dev/null
7. mkdir tmp >&/dev/null
8. echo whoah >tmp/filu 2>/dev/null
9. rm -rf /mnt/* >&/dev/null
10. while completing rm -rf, the following oops occurs:

------------------------------------------------------------
EXT3-fs error (device hdb): ext3_free_blocks: Freeing blocks not in datazone - block = 4294967295, count = 1
EXT3-fs error (device hdb): ext3_free_blocks: Freeing blocks not in datazone - block = 4294967295, count = 1
EXT3-fs error (device hdb): ext3_free_blocks: Freeing blocks not in datazone - block = 4294967295, count = 1
EXT3-fs error (device hdb): ext3_free_blocks: Freeing blocks not in datazone - block = 4294967295, count = 1
EXT3-fs error (device hdb): ext3_free_blocks: Freeing blocks not in datazone - block = 4294967295, count = 1
EXT3-fs unexpected failure: !jh->b_committed_data; inconsistent data on disk
EXT3-fs error (device hdb): ext3_free_blocks: Freeing blocks in system zones - Block = 8234, count = 1
EXT3-fs unexpected failure: !jh->b_committed_data; inconsistent data on disk
ext3_forget: aborting transaction: IO failure in __ext3_journal_forget
EXT3-fs error (device hdb): ext3_free_blocks: Freeing blocks in system zones - Block = 42, count = 3
EXT3-fs error (device hdb): ext3_free_blocks: Freeing blocks not in datazone - block = 25630524, count = 1
EXT3-fs error (device hdb) in ext3_free_blocks_sb: Readonly filesystem
EXT3-fs unexpected failure: !jh->b_committed_data; inconsistent data on disk
BUG: unable to handle kernel paging request at c13fbbfc
IP: [<c02de4f9>] read_block_bitmap+0xa3/0x147
*pde = 07886163 *pte = 013fb160
Oops: 0000 [#1] DEBUG_PAGEALLOC

Pid: 817, comm: rm Not tainted (2.6.27.4 #1)
EIP: 0060:[<c02de4f9>] EFLAGS: 00000206 CPU: 0
EIP is at read_block_bitmap+0xa3/0x147
EAX: ffffdfff EBX: c13fc820 ECX: c13fc000 EDX: 00002001
ESI: c74b15b0 EDI: c7aae400 EBP: c7b7acd0 ESP: c7b7aca0
 DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
Process rm (pid: 817, ti=c7b7a000 task=c78a1ce0 task.ti=c7b7a000)
Stack: 00000001 00000000 00000000 c7aaf1c0 00000246 c79cdc00 00000001 00000000
       c13fc000 00000000 00000001 c163b37c c7b7ad28 c02de66f c0315003 c740aadc
       c7b7ad10 c7440000 c7aaf1c0 00000029 0000202a c7aae400 c7440000 c79cdcac
Call Trace:
 [<c02de66f>] ? ext3_free_blocks_sb+0x93/0x3d6
 [<c0315003>] ? journal_forget+0xff/0x1aa
 [<c02edd83>] ? __ext3_journal_forget+0x19/0x3f
 [<c02de9dd>] ? ext3_free_blocks+0x2b/0x7f
 [<c02e3f8c>] ? ext3_clear_blocks+0x137/0x159
 [<c02e4072>] ? ext3_free_data+0xc4/0x133
 [<c02e4320>] ? ext3_free_branches+0x23f/0x247
 [<c02e4189>] ? ext3_free_branches+0xa8/0x247
 [<c02e4189>] ? ext3_free_branches+0xa8/0x247
 [<c02e498d>] ? ext3_truncate+0x665/0x8ad
 [<c0316062>] ? journal_start+0xb2/0x112
 [<c031608d>] ? journal_start+0xdd/0x112
 [<c0316062>] ? journal_start+0xb2/0x112
 [<c02ebb53>] ? ext3_journal_start_sb+0x29/0x4a
 [<c02e4ca4>] ? ext3_delete_inode+0xcf/0xdb
 [<c02e4bd5>] ? ext3_delete_inode+0x0/0xdb
 [<c02774b3>] ? generic_delete_inode+0x62/0xd5
 [<c0277639>] ? generic_drop_inode+0x113/0x16a
 [<c02765ac>] ? iput+0x47/0x4e
 [<c026d9f4>] ? do_unlinkat+0xc3/0x13d
 [<c054484f>] ? mutex_unlock+0x8/0xa
 [<c026fb0b>] ? vfs_readdir+0x60/0x85
 [<c026f84c>] ? filldir64+0x0/0xd7
 [<c026fbc7>] ? sys_getdents64+0x97/0xa1
 [<c026db66>] ? sys_unlinkat+0x23/0x36
 [<c0202f1e>] ? syscall_call+0x7/0xb
 =======================
Code: 26 00 0f 88 94 00 00 00 8b 87 8c 02 00 00 89 45 e4 8b 55 e8 0f af 50 10 8b 40 34 03 50 14 8b 03 89 45 ec 8b 4e 14 89 4d f0 29 d0 <0f> a3 01 19 c0 85 c0 74 11 8b 43 04 89 45 ec 29 d0 0f a3 01 19
EIP: [<c02de4f9>] read_block_bitmap+0xa3/0x147 SS:ESP 0068:c7b7aca0
---[ end trace 780108b88e07a03e ]---
------------------------------------------------------------

	Sami

^ permalink raw reply	[flat|nested] 13+ messages in thread
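A possible cross-check of the register dump in the oops above, offered as a sketch rather than a confirmed diagnosis. It assumes that the faulting bytes `<0f> a3 01` in the Code: line decode to `bt %eax,(%ecx)` (a reading of the dump, not something stated in the report), in which case the CPU touches the 32-bit word at ECX + 4 * floor(EAX / 32), with EAX treated as a signed bit offset. The helper names `bt_word_address` and `check_reported_fault` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: address of the 32-bit word that a register-offset
 * bit test would touch, given the bitmap base (ECX) and the raw 32-bit
 * register value used as a signed bit offset (EAX).
 */
static uint32_t bt_word_address(uint32_t base, uint32_t bit_offset)
{
    /* Logical shift first, then sign-extend by hand so the result is
     * floor(offset / 32) even when the offset is negative. */
    uint32_t word = bit_offset >> 5;
    if (bit_offset & 0x80000000u)
        word |= 0xF8000000u;

    /* Unsigned arithmetic wraps modulo 2^32, like the address calculation. */
    return base + word * 4u;
}

/* Usage: plug in ECX and EAX from the register dump quoted above. */
static void check_reported_fault(void)
{
    assert(bt_word_address(0xc13fc000u, 0xffffdfffu) == 0xc13fbbfcu);
}
```

With ECX = c13fc000 and EAX = ffffdfff from the dump, the result is c13fbbfc, the address reported in the BUG line. That is consistent with a negative bit offset reaching the bit test, so the read lands just below the bitmap buffer.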
* Re: [Bugme-new] [Bug 11266] New: unable to handle kernel paging request in ext2_free_blocks
  2008-08-18 14:58 ` Jan Kara
  2008-08-18 16:51 ` Aneesh Kumar K.V
@ 2008-08-19 21:43 ` Sami Liedes
  1 sibling, 0 replies; 13+ messages in thread
From: Sami Liedes @ 2008-08-19 21:43 UTC (permalink / raw)
  To: Jan Kara; +Cc: Andrew Morton, bugme-daemon, linux-ext4, Aneesh Kumar K.V

On Mon, Aug 18, 2008 at 04:58:41PM +0200, Jan Kara wrote:
> From 06953717138efe3ad535e78343beb7204ac0d274 Mon Sep 17 00:00:00 2001
> From: Jan Kara <jack@suse.cz>
> Date: Mon, 18 Aug 2008 16:45:11 +0200
> Subject: [PATCH] ext2: Check for corrupted group descriptor before using data in it
>
> We have to check whether a group descriptor isn't corrupted in
> read_block_bitmap(). Otherwise ext2_valid_block_bitmap() will try
> to access bits outside of bitmap and Oops happens.

I think something similar is needed for ext3, or at least the
backtrace looks similar to me (tell me if you want me to file a
separate bug for it):

------------------------------------------------------------
[ 1303.485714] EXT3-fs unexpected failure: !jh->b_committed_data;
[ 1303.485714] inconsistent data on disk
[ 1303.485714] BUG: unable to handle kernel paging request at c7edfffc
[ 1303.485714] IP: [<c02ddca9>] read_block_bitmap+0xa3/0x147
[ 1303.485714] *pde = 00007067 *pte = 07edf160
[ 1303.485714] Oops: 0000 [#1] DEBUG_PAGEALLOC
[ 1303.485714]
[ 1303.485714] Pid: 17001, comm: rm Not tainted (2.6.27-rc3 #2)
[ 1303.485714] EIP: 0060:[<c02ddca9>] EFLAGS: 00000246 CPU: 0
[ 1303.485714] EIP is at read_block_bitmap+0xa3/0x147
[ 1303.485714] EAX: ffffffff EBX: c7ee0800 ECX: c7ee0000 EDX: 00000001
[ 1303.485714] ESI: c3c40690 EDI: c7abd000 EBP: c79c4c9c ESP: c79c4c6c
[ 1303.485714]  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
[ 1303.485714] Process rm (pid: 17001, ti=c79c4000 task=c79189a0 task.ti=c79c4000)
[ 1303.485714] Stack: 00000246 00000001 00000246 c7abda3c c7413aa0 c5d7f800 00000000 00000000
[ 1303.485714]        c7ee0000 00000000 00000000 c3c25064 c79c4cf4 c02dde1f c3c405b0 c79c4ccc
[ 1303.485714]        c0317987 00000001 c0314a9b 00000029 0000002a c7abd000 c7440000 c5d7f8ac
[ 1303.485714] Call Trace:
[ 1303.485714]  [<c02dde1f>] ? ext3_free_blocks_sb+0x93/0x3d6
[ 1303.485714]  [<c0317987>] ? journal_revoke+0x81/0xe3
[ 1303.485714]  [<c0314a9b>] ? do_get_write_access+0x381/0x49c
[ 1303.485714]  [<c02ed428>] ? __ext3_journal_revoke+0x1e/0x44
[ 1303.485714]  [<c02de18d>] ? ext3_free_blocks+0x2b/0x7f
[ 1303.485714]  [<c02e3694>] ? ext3_clear_blocks+0x11f/0x141
[ 1303.485714]  [<c02e377a>] ? ext3_free_data+0xc4/0x133
[ 1303.485714]  [<c02e3a0e>] ? ext3_free_branches+0x225/0x22d
[ 1303.485714]  [<c02e3891>] ? ext3_free_branches+0xa8/0x22d
[ 1303.485714]  [<c02e3891>] ? ext3_free_branches+0xa8/0x22d
[ 1303.485714]  [<c02e407d>] ? ext3_truncate+0x667/0x8af
[ 1303.485714]  [<c03153e2>] ? journal_start+0xb2/0x112
[ 1303.485714]  [<c031540d>] ? journal_start+0xdd/0x112
[ 1303.485714]  [<c03153e2>] ? journal_start+0xb2/0x112
[ 1303.485714]  [<c02eb243>] ? ext3_journal_start_sb+0x29/0x4a
[ 1303.485714]  [<c02e4389>] ? ext3_delete_inode+0xc4/0xdb
[ 1303.485714]  [<c02e42c5>] ? ext3_delete_inode+0x0/0xdb
[ 1303.485714]  [<c0276c2b>] ? generic_delete_inode+0x62/0xd5
[ 1303.485714]  [<c0276db1>] ? generic_drop_inode+0x113/0x162
[ 1303.485714]  [<c0275d3c>] ? iput+0x47/0x4e
[ 1303.485714]  [<c02737a7>] ? dentry_iput+0x6b/0xb1
[ 1303.485714]  [<c0273859>] ? d_kill+0x1d/0x37
[ 1303.485714]  [<c027519b>] ? dput+0x58/0x10a
[ 1303.485714]  [<c026d2a4>] ? do_rmdir+0xa4/0xc3
[ 1303.485714]  [<c026d2f4>] ? sys_unlinkat+0x31/0x36
[ 1303.485714]  [<c0202f3e>] ? syscall_call+0x7/0xb
[ 1303.485714]  =======================
[ 1303.485714] Code: 26 00 0f 88 94 00 00 00 8b 87 8c 02 00 00 89 45 e4 8b 55 e8 0f af 50 10 8b 40 34 03 50 14 8b 03 89 45 ec 8b 4e 14 89 4d f0 29 d0 <0f> a3 01 19 c0 85 c0 74 11 8b 43 04 89 45 ec 29 d0 0f a3 01 19
[ 1303.485714] EIP: [<c02ddca9>] read_block_bitmap+0xa3/0x147 SS:ESP 0068:c79c4c6c
[ 1303.485714] ---[ end trace ba199677255b7e73 ]---
------------------------------------------------------------
$ addr2line -e vmlinux -i 0xc02ddca9
include/asm/bitops.h:305
fs/ext3/balloc.c:98
fs/ext3/balloc.c:167

 98		if (!ext3_test_bit(offset, bh->b_data))
 99			/* bad block bitmap */
100			goto err_out;
------------------------------------------------------------

	Sami

^ permalink raw reply	[flat|nested] 13+ messages in thread
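To illustrate the kind of check the quoted patch description asks for, here is a minimal userspace sketch. It is not the actual ext2 or ext3 change. It assumes that the bit offset tested at fs/ext3/balloc.c:98 is derived from a block number stored in the group descriptor, taken relative to the group's first block. The function name, its parameters, and the numbers in the note below are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical userspace model, not kernel code: decide whether a block
 * number read from a possibly corrupted group descriptor lies inside the
 * group, and only then turn it into a bit offset within the group's bitmap.
 * Returns false and leaves *offset untouched when the number is out of range.
 */
static bool bitmap_bit_offset(uint32_t bitmap_blk, uint32_t group_first_block,
                              uint32_t blocks_per_group, uint32_t *offset)
{
    assert(blocks_per_group > 0);

    /* Reject block numbers below the group's first block; subtracting
     * here would give a negative (wrapped) offset. */
    if (bitmap_blk < group_first_block)
        return false;

    /* Reject block numbers at or beyond the end of the group. */
    if (bitmap_blk - group_first_block >= blocks_per_group)
        return false;

    *offset = bitmap_blk - group_first_block;
    return true;
}
```

For example, with a group starting at block 8193 and 8192 blocks per group (illustrative values, not read from the attached image), a descriptor claiming bitmap block 0 is rejected instead of producing a negative offset, while block 8194 maps to offset 1.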
end of thread, other threads:[~2008-11-02 5:48 UTC | newest]
Thread overview: 13+ messages
[not found] <bug-11266-10286@http.bugzilla.kernel.org/>
2008-08-07 17:52 ` [Bugme-new] [Bug 11266] New: unable to handle kernel paging request in ext2_free_blocks Andrew Morton
[not found] ` <0K5800031SEDU2@smtp02.hut-mail>
2008-08-07 20:07 ` Sami Liedes
2008-08-07 20:28 ` Sami Liedes
2008-08-18 14:58 ` Jan Kara
2008-08-18 16:51 ` Aneesh Kumar K.V
2008-08-19 3:24 ` Andreas Dilger
2008-08-19 9:13 ` Jan Kara
2008-08-19 10:51 ` Sami Liedes
2008-08-20 10:25 ` Jan Kara
2008-08-20 13:29 ` Sami Liedes
2008-08-20 19:07 ` Andreas Dilger
2008-11-02 5:27 ` Sami Liedes
2008-08-19 21:43 ` Sami Liedes