Re: [Bug 11525] New: Unable to handle paging request at ext3_rmdir() and ext4

public inbox for linux-ext4@vger.kernel.org

* Re: [Bug 11525] New: Unable to handle paging request at ext3_rmdir() and ext4_rmdir() on intentionally corrupted fs
From: Andrew Morton @ 2008-09-09 20:46 UTC
  To: linux-ext4; +Cc: bugme-daemon, sliedes

(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).
On Tue,  9 Sep 2008 11:27:52 -0700 (PDT)
bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=11525
> 
>            Summary: Unable to handle paging request at ext3_rmdir() and
>                     ext4_rmdir() on intentionally corrupted fs
>            Product: File System
>            Version: 2.5
>      KernelVersion: 2.6.27-rc5 (ext4), 2.6.27-rc3 (ext3)
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: ext3
>         AssignedTo: akpm@osdl.org
>         ReportedBy: sliedes@cc.hut.fi
> 
> 
> Hardware Environment: qemu x86
> Software Environment: Minimal Debian sid (unstable)
> Problem Description:
> 
> [I really thought I had already reported this, but since I can't find it either
> via bugzilla or google, I assume I haven't.]
> 
> Hi,
> 
> Unfortunately this is one of those bugs that I can't find a way to reproduce
> except by randomly breaking one fs after another. This happens with ext3 and
> ext4, but so far I haven't seen it happen with ext2.
> 
> On doing rm -rf on an intentionally corrupted ext3/ext4 filesystem, I
> occasionally hit bugs like this (ext3 backtrace from -rc3, two ext4 traces from
> -rc5). If you want me to try to reproduce the ext3 crash on latest -rc, just
> mention.
> 
> ----------
> *** seed 270, ext3, 2.6.27-rc3 ***
> EXT3-fs error (device hdb): ext3_free_blocks: Freeing blocks not in datazone -
> block = 1479317508, count = 1
> EXT3-fs error (device hdb): ext3_free_blocks: Freeing blocks not in datazone -
> block = 4718764, count = 1
> attempt to access beyond end of device
> hdb: rw=0, want=1048578, limit=20480
> EXT3-fs error (device hdb): ext3_free_branches: Read failure, inode=1428,
> block=524288
> EXT3-fs warning (device hdb): empty_dir: bad directory (dir #1360) - no `.' or
> `..'
> EXT3-fs error (device hdb): htree_dirblock_to_tree: bad entry in directory
> #1332: directory entry across blocks - offset=0, inode=1332, rec_len=
> BUG: unable to handle kernel paging request at c7c3240c
> IP: [<c02e4be6>] empty_dir+0xe1/0x305
> *pde = 00007067 *pte = 07c32160
> Oops: 0000 [#1] DEBUG_PAGEALLOC
> [ 1306.100454]
> Pid: 24302, comm: rm Not tainted (2.6.27-rc3 #2)
> EIP: 0060:[<c02e4be6>] EFLAGS: 00000246 CPU: 0
> EIP is at empty_dir+0xe1/0x305
> EAX: c7c3240c EBX: c3fa7cc4 ECX: 00000534 EDX: 00000534
> ESI: c7c2a400 EDI: c74d4888 EBP: c1e6cef4 ESP: c1e6cec0
>  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
> Process rm (pid: 24302, ti=c1e6c000 task=c5664d00 task.ti=c1e6c000)
> Stack: 00000000 c1e6cee4 c7aab400 00000058 38583e14 72b9e783 00000002 c7c3240c
>        c7aaa800 00000000 c7440000 c744471c fffffffb c1e6cf28 c02e7910 00000246
>        c0620de0 c3c67690 c0620de0 c3c67688 c3fa7cc4 c3f6e230 c7cab9a0 00000000
> Call Trace:
>  [<c02e7910>] ? ext3_rmdir+0xb7/0x18f
>  [<c026ba2d>] ? vfs_rmdir+0x7e/0xb3
>  [<c026d2b7>] ? do_rmdir+0xb7/0xc3
>  [<c026d2f4>] ? sys_unlinkat+0x31/0x36
>  [<c0202f3e>] ? syscall_call+0x7/0xb
>  =======================
> Code: 08 5c b4 5d c0 c7 44 24 04 a4 26 55 c0 8b 45 ec 89 04 24 e8 47 45 00 00
> b8 01 00 00 00 83 c4 28 5b 5e 5f 5d c3 8d 04 06 89 45 e8 <8b> 00 85 c0 74 86 8d
> 56 08 b8 6c cb 5f c0 e8 a8 9d 17 00 85 c0
> EIP: [<c02e4be6>] empty_dir+0xe1/0x305 SS:ESP 0068:c1e6cec0
> ---[ end trace 3a33b21de407e362 ]---
> ----------
> *** seed 451, ext4, 2.6.27-rc5 ***
> attempt to access beyond end of device
> hdb: rw=0, want=268435458, limit=20480
> EXT4-fs error (device hdb): ext4_xattr_delete_inode: inode 507: block 134217728
> read error
> EXT4-fs error (device hdb): htree_dirblock_to_tree: bad entry in directory
> #653: directory entry across blocks - offset=0, inode=653, rec_len=16
> BUG: unable to handle kernel paging request at c7d2540c
> IP: [<c02fb496>] empty_dir+0xe1/0x305
> *pde = 00007067 *pte = 07d25160
> Oops: 0000 [#1] DEBUG_PAGEALLOC
> [ 2151.877484]
> Pid: 20705, comm: rm Not tainted (2.6.27-rc5 #2)
> EIP: 0060:[<c02fb496>] EFLAGS: 00000246 CPU: 0
> EIP is at empty_dir+0xe1/0x305
> EAX: c7d2540c EBX: c48440e0 ECX: 0000028d EDX: 0000028d
> ESI: c7d21400 EDI: c1b99428 EBP: c1bd7ef4 ESP: c1bd7ec0
>  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
> Process rm (pid: 20705, ti=c1bd7000 task=c1a38000 task.ti=c1bd7000)
> Stack: 00000000 c1bd7ee4 c6169800 0000007e e18fea3c 54ed2757 00000001 c7d2540c
>        c6169400 00000000 c4a35020 c4982138 fffffffb c1bd7f28 c02fe5ef 00000246
>        c0620de0 c485bbe8 c0620de0 c485bbe0 c48440e0 c4a15dc8 c2b7a5c8 00000000
> Call Trace:
>  [<c02fe5ef>] ? ext4_rmdir+0xd5/0x1e8
>  [<c026bd5d>] ? vfs_rmdir+0x7e/0xb3
>  [<c026d5e7>] ? do_rmdir+0xb7/0xc3
>  [<c026d624>] ? sys_unlinkat+0x31/0x36
>  [<c0202f3e>] ? syscall_call+0x7/0xb
>  =======================
> Code: 08 54 b4 5d c0 c7 44 24 04 a4 34 55 c0 8b 45 ec 89 04 24 e8 73 4b 00 00
> b8 01 00 00 00 83 c4 28 5b 5e 5f 5d c3 8d 04 06 89 45 e8 <8b> 00 8
> EIP: [<c02fb496>] empty_dir+0xe1/0x305 SS:ESP 0068:c1bd7ec0
> ---[ end trace 79e4e3dfd3fb9e7d ]---
> umount: /mnt: device is busy
> ----------
> *** seed 10000193, ext4, 2.6.27-rc5 ***
> EXT4-fs warning (device hdb): empty_dir: bad directory (dir #733) - no `.' or
> `..'
> EXT4-fs error (device hdb): htree_dirblock_to_tree: bad entry in directory
> #461: directory entry across blocks - offset=0, inode=461, rec_len=82
> BUG: unable to handle kernel paging request at c769940c
> IP: [<c02fb496>] empty_dir+0xe1/0x305
> *pde = 079e7163 *pte = 07699160
> Oops: 0000 [#1] DEBUG_PAGEALLOC
> [  961.774442]
> Pid: 4518, comm: rm Not tainted (2.6.27-rc5 #2)
> EIP: 0060:[<c02fb496>] EFLAGS: 00000246 CPU: 0
> EIP is at empty_dir+0xe1/0x305
> EAX: c769940c EBX: c3fc36c8 ECX: 000001cd EDX: 000001cd
> ESI: c7697400 EDI: c3fc8380 EBP: c7a6cef4 ESP: c7a6cec0
>  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
> Process rm (pid: 4518, ti=c7a6c000 task=c78bc360 task.ti=c7a6c000)
> Stack: 00000000 c7a6cee4 c532ec00 0000007e 1da9562e eb3f2f99 00000001 c769940c
>        c532e000 00000000 c3ee0020 c3eada08 fffffffb c7a6cf28 c02fe5ef 00000246
>        c0620de0 c747c560 c0620de0 c747c558 c3fc36c8 c3fc8d90 c76965f0 00000000
> Call Trace:
>  [<c02fe5ef>] ? ext4_rmdir+0xd5/0x1e8
>  [<c026bd5d>] ? vfs_rmdir+0x7e/0xb3
>  [<c026d5e7>] ? do_rmdir+0xb7/0xc3
>  [<c026d624>] ? sys_unlinkat+0x31/0x36
>  [<c0202f3e>] ? syscall_call+0x7/0xb
>  =======================
> Code: 08 54 b4 5d c0 c7 44 24 04 a4 34 55 c0 8b 45 ec 89 04 24 e8 73 4b 00 00
> b8 01 00 00 00 83 c4 28 5b 5e 5f 5d c3 8d 04 06 89 45 e8 <8b> 00 8
> EIP: [<c02fb496>] empty_dir+0xe1/0x305 SS:ESP 0068:c7a6cec0
> ---[ end trace 7aaee6ca8f8adc20 ]---
> ----------
> 
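
One thing that can be read straight off the three traces above is how far the faulting address (EAX) lies from the address in ESI. The sketch below does that subtraction. It assumes ESI holds the base that the faulting pointer was computed from, which the `8d 04 06` (lea) bytes just before the faulting instruction suggest but which nobody in this thread has confirmed; `fault_offset` is a made-up helper name.

```shell
# Hypothetical helper: distance between a faulting address and a base
# pointer, printed in hex. Both arguments are hex constants (0x...).
fault_offset() {
    printf '0x%x\n' "$(( $1 - $2 ))"
}

# EAX and ESI values copied from the register dumps in the three oopses.
fault_offset 0xc7c3240c 0xc7c2a400   # seed 270, ext3
fault_offset 0xc7d2540c 0xc7d21400   # seed 451, ext4
fault_offset 0xc769940c 0xc7697400   # seed 10000193, ext4
```

This prints 0x800c, 0x400c and 0x200c. All three are larger than 4096, so under the assumption above the pointer had been advanced past the end of a single directory block, which would fit the "directory entry across blocks" errors logged just before each oops.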



* Re: [Bug 11525] New: Unable to handle paging request at ext3_rmdir() and ext4_rmdir() on intentionally corrupted fs
From: Theodore Tso @ 2008-09-09 21:55 UTC
  To: Andrew Morton; +Cc: linux-ext4, bugme-daemon, sliedes

> > Unfortunately this is one of those bugs that I can't find a way to
> > reproduce except by randomly breaking one fs after another. This
> > happens with ext3 and ext4, but so far I haven't seen it happen
> > with ext2.
> > 
> >
> > *** seed 270, ext3, 2.6.27-rc3 ***
> > *** seed 451, ext4, 2.6.27-rc5 ***

Given these seed numbers, I assume this was generated using some tool
like fsfuzzer?  Would it be possible to generate a filesystem image
*before* that triggers the problem case, before trying to execute the
rm -rf?  

That would be the fastest way to try to track the problem down.

							- Ted


* Re: [Bug 11525] New: Unable to handle paging request at ext3_rmdir() and ext4_rmdir() on intentionally corrupted fs
From: Sami Liedes @ 2008-09-10  3:26 UTC
  To: Theodore Tso; +Cc: Andrew Morton, linux-ext4, bugme-daemon

On Tue, Sep 09, 2008 at 05:55:31PM -0400, Theodore Tso wrote:
> > > Unfortunately this is one of those bugs that I can't find a way to
> > > reproduce except by randomly breaking one fs after another. This
> > > happens with ext3 and ext4, but so far I haven't seen it happen
> > > with ext2.
> > > 
> > >
> > > *** seed 270, ext3, 2.6.27-rc3 ***
> > > *** seed 451, ext4, 2.6.27-rc5 ***
> 
> Given these seed numbers, I assume this was generated using some tool
> like fsfuzzer?  Would it be possible to generate a filesystem image
> *before* that triggers the problem case, before trying to execute the
> rm -rf?  
> 
> That would be the fastest way to try to track the problem down.

Yes, I can generate those filesystems. However the problem seems to be
elusive in that I haven't yet been able to reproduce it twice with the
same filesystem (and even with random filesystems, it only occurs
once in a while). I'll do some more testing and try to figure out if
it can be reproduced more easily. Still I can give you some
filesystems that crashed once, if you wish. They are typically
something like 600 KiB compressed, and I guess that could be made less
by zeroing all regular files in the pristine fs before doing the
fuzzing.

Here's a script I use to do the testing ($1 is the initial seed). The
filesystem is a 10 MiB pristine ext[34] image with a copy of my
workstation's /dev and a partial copy of /usr/share/doc (I tried to be
diverse in what I put there).

------------------------------------------------------------
#!/bin/bash
# bash is needed: the loop below uses the C-style for ((...)) syntax and >&

if [ "`hostname`" != "fstest" ]; then
   echo "This is a dangerous script."
   echo "Set your hostname to \`fstest' if you want to use it."
   exit 1
fi

umount /dev/hdb
umount /dev/hdc
/etc/init.d/sysklogd stop
/etc/init.d/klogd stop
/etc/init.d/cron stop
mount /dev/hda / -t ext3 -o remount,ro || exit 1

#ulimit -t 20

for ((s=$1; s<1000000000; s++)); do
  umount /mnt
  echo '***** zzuffing *****' seed $s
  zzuf -r 0:0.03 -s $s </dev/hdc >/dev/hdb || exit
  mount /dev/hdb /mnt -t ext2 -o errors=continue || continue
  cd /mnt || continue
  timeout 30 cp -r doc doc2 >&/dev/null
  timeout 30 find -xdev >&/dev/null
  timeout 30 find -xdev -print0 2>/dev/null |xargs -0 touch -- 2>/dev/null
  timeout 30 mkdir tmp >&/dev/null
  timeout 30 echo whoah >tmp/filu 2>/dev/null
  timeout 30 rm -rf /mnt/* >&/dev/null
  cd /
done
------------------------------------------------------------

	Sami


* Re: [Bug 11525] New: Unable to handle paging request at ext3_rmdir() and ext4_rmdir() on intentionally corrupted fs
From: Theodore Tso @ 2008-09-10 12:58 UTC
  To: Sami Liedes; +Cc: Andrew Morton, linux-ext4, bugme-daemon

On Wed, Sep 10, 2008 at 06:26:34AM +0300, Sami Liedes wrote:
> 
> Yes, I can generate those filesystems. However the problem seems to be
> elusive in that I haven't yet been able to reproduce it twice with the
> same filesystem (and even with random filesystems, it only occurs
> once in a while). I'll do some more testing and try to figure out if
> it can be reproduced more easily. Still I can give you some
> filesystems that crashed once, if you wish. They are typically
> something like 600 KiB compressed, and I guess that could be made less
> by zeroing all regular files in the pristine fs before doing the
> fuzzing.

One easy way of saving a filesystem's metadata and restoring it later
is the following:

    e2image -r /dev/hdXX /var/tmp/hdXX.e2i
    dd if=/var/tmp/hdXX.e2i of=/dev/hdXX

Another thing you can do is change your script to add the following
line before the filesystem is mounted:

     e2image -r /dev/hdXX - | bzip2 > /var/tmp/hdXX.e2i.bz2

and then if the filesystem fails (i.e., the system oopses),
/var/tmp/hdXX.e2i.bz2 will have all of the filesystem metadata
(including directories).  You can then decompress it and write the
filesystem back out, or do what I do when given one of these to examine:

   bunzip2 < hdXX.e2i.bz2 | make-sparse hdXX.e2i

Said sparse file can now be checked via e2fsck, or mounted using a
loopback mount, etc.
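
A sketch of how that snapshot step might slot into the fuzzing loop, keeping one image per seed so that an earlier crash is not overwritten by a later iteration. `snapshot_path` and `snapshot_image` are made-up names and the seed-based file name is an assumption; the `e2image -r ... - | bzip2` pipeline is the one given above.

```shell
# Hypothetical helper: where the compressed metadata image for one seed goes.
#   $1 = destination directory, $2 = seed
snapshot_path() {
    printf '%s/seed-%s.e2i.bz2\n' "$1" "$2"
}

# Hypothetical helper: save the metadata of a device before it is mounted.
#   $1 = block device, $2 = seed, $3 = destination directory
snapshot_image() {
    e2image -r "$1" - | bzip2 > "$(snapshot_path "$3" "$2")"
}
```

In the loop this would be called as `snapshot_image /dev/hdb "$s" /var/tmp` just before the `mount` line. Note that the test script remounts / read-only, so the destination directory has to live on some other, writable filesystem.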

Even if it's not reliably reproducible, if I can get a series of
filesystems which show the problem, using "e2fsck -nf" we can see a
pattern of how the filesystems are corrupted, and that can help narrow
down what might be going on that causes the kernel oops.
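
One possible way to look for such a pattern, as a rough sketch: save the output of "e2fsck -nf" for each image to a text file, then count which messages recur once the inode and block numbers are masked out. `tally_messages` and the `.log` naming are assumptions made for illustration, not an existing tool.

```shell
# Hypothetical helper: tally recurring messages across several e2fsck logs.
# Each argument is a text file holding the output of one "e2fsck -nf" run.
tally_messages() {
    # Replace every run of digits with N so that the same complaint about
    # different inode or block numbers collapses into a single line, then
    # count identical lines and list the most frequent first.
    cat "$@" | sed 's/[0-9][0-9]*/N/g' | sort | uniq -c | sort -rn
}

# Possible usage, assuming the decompressed images are named *.e2i:
#   for img in *.e2i; do e2fsck -nf "$img" > "$img.log" 2>&1; done
#   tally_messages *.e2i.log | head
```

Masking the numbers loses detail, so the per-image logs are still worth keeping next to the tally.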

Thanks, regards,

     	  	   	    	 	    	   - Ted

/*
 * make-sparse.c --- make a sparse file from stdin
 * 
 * Copyright 2004 by Theodore Ts'o.
 *
 * %Begin-Header%
 * This file may be redistributed under the terms of the GNU Public
 * License.
 * %End-Header%
 */

#define _LARGEFILE_SOURCE
#define _LARGEFILE64_SOURCE

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <errno.h>

int full_read(int fd, char *buf, size_t count)
{
	int got, total = 0;
	int pass = 0;

	while (count > 0) {
		got = read(fd, buf, count);
		if (got == -1) {
			if ((errno == EINTR) || (errno == EAGAIN)) 
				continue;
			return total ? total : -1;
		}
		if (got == 0) {
			if (pass++ >= 3)
				return total;
			continue;
		}
		pass = 0;
		buf += got;
		total += got;
		count -= got;
	}
	return total;
}

int main(int argc, char **argv)
{
	int fd, got, i;
	char buf[1024];

	if (argc != 2) {
		fprintf(stderr, "Usage: make-sparse out-file\n");
		exit(1);
	}
	fd = open(argv[1], O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0777);
	if (fd < 0) {
		perror(argv[1]);
		exit(1);
	}
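	while (1) {
		got = full_read(0, buf, sizeof(buf));
		if (got <= 0)	/* end of input, or read error */
			break;
		if (got == sizeof(buf)) {
			for (i=0; i < (int) sizeof(buf); i++) 
				if (buf[i])
					break;
			if (i == (int) sizeof(buf)) {
				/* all-zero block: seek over it to leave a hole */
				lseek(fd, sizeof(buf), SEEK_CUR);
				continue;
			}
		}
		if (write(fd, buf, got) != got) {
			perror("write");
			exit(1);
		}
	}
	/*
	 * If the input ended in zero blocks, the last lseek() did not
	 * extend the file, so set its length explicitly.
	 */
	if (ftruncate(fd, lseek(fd, 0, SEEK_CUR)) < 0) {
		perror("ftruncate");
		exit(1);
	}
	return 0;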
}
		
