* Re: [Bug 11525] New: Unable to handle paging request at ext3_rmdir() and ext4_rmdir() on intentionally corrupted fs
From: Andrew Morton @ 2008-09-09 20:46 UTC
To: linux-ext4; +Cc: bugme-daemon, sliedes

(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Tue, 9 Sep 2008 11:27:52 -0700 (PDT) bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=11525
>
> Summary: Unable to handle paging request at ext3_rmdir() and
> ext4_rmdir() on intentionally corrupted fs
> Product: File System
> Version: 2.5
> KernelVersion: 2.6.27-rc5 (ext4), 2.6.27-rc3 (ext3)
> Platform: All
> OS/Version: Linux
> Tree: Mainline
> Status: NEW
> Severity: normal
> Priority: P1
> Component: ext3
> AssignedTo: akpm@osdl.org
> ReportedBy: sliedes@cc.hut.fi
>
>
> Hardware Environment: qemu x86
> Software Environment: Minimal Debian sid (unstable)
> Problem Description:
>
> [I really thought I had already reported this, but since I can't find it either
> via bugzilla or google, I assume I haven't.]
>
> Hi,
>
> Unfortunately this is one of those bugs that I can't find a way to reproduce
> except by randomly breaking one fs after another. This happens with ext3 and
> ext4, but so far I haven't seen it happen with ext2.
>
> On doing rm -rf on an intentionally corrupted ext3/ext4 filesystem, I
> occasionally hit bugs like this (ext3 backtrace from -rc3, two ext4 traces from
> -rc5). If you want me to try to reproduce the ext3 crash on latest -rc, just
> mention.
>
> ----------
> *** seed 270, ext3, 2.6.27-rc3 ***
> EXT3-fs error (device hdb): ext3_free_blocks: Freeing blocks not in datazone -
> block = 1479317508, count = 1
> EXT3-fs error (device hdb): ext3_free_blocks: Freeing blocks not in datazone -
> block = 4718764, count = 1
> attempt to access beyond end of device
> hdb: rw=0, want=1048578, limit=20480
> EXT3-fs error (device hdb): ext3_free_branches: Read failure, inode=1428,
> block=524288
> EXT3-fs warning (device hdb): empty_dir: bad directory (dir #1360) - no `.' or
> `..'
> EXT3-fs error (device hdb): htree_dirblock_to_tree: bad entry in directory
> #1332: directory entry across blocks - offset=0, inode=1332, rec_len=
> BUG: unable to handle kernel paging request at c7c3240c
> IP: [<c02e4be6>] empty_dir+0xe1/0x305
> *pde = 00007067 *pte = 07c32160
> Oops: 0000 [#1] DEBUG_PAGEALLOC
> [ 1306.100454]
> Pid: 24302, comm: rm Not tainted (2.6.27-rc3 #2)
> EIP: 0060:[<c02e4be6>] EFLAGS: 00000246 CPU: 0
> EIP is at empty_dir+0xe1/0x305
> EAX: c7c3240c EBX: c3fa7cc4 ECX: 00000534 EDX: 00000534
> ESI: c7c2a400 EDI: c74d4888 EBP: c1e6cef4 ESP: c1e6cec0
> DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
> Process rm (pid: 24302, ti=c1e6c000 task=c5664d00 task.ti=c1e6c000)
> Stack: 00000000 c1e6cee4 c7aab400 00000058 38583e14 72b9e783 00000002 c7c3240c
> c7aaa800 00000000 c7440000 c744471c fffffffb c1e6cf28 c02e7910 00000246
> c0620de0 c3c67690 c0620de0 c3c67688 c3fa7cc4 c3f6e230 c7cab9a0 00000000
> Call Trace:
> [<c02e7910>] ? ext3_rmdir+0xb7/0x18f
> [<c026ba2d>] ? vfs_rmdir+0x7e/0xb3
> [<c026d2b7>] ? do_rmdir+0xb7/0xc3
> [<c026d2f4>] ? sys_unlinkat+0x31/0x36
> [<c0202f3e>] ? syscall_call+0x7/0xb
> =======================
> Code: 08 5c b4 5d c0 c7 44 24 04 a4 26 55 c0 8b 45 ec 89 04 24 e8 47 45 00 00
> b8 01 00 00 00 83 c4 28 5b 5e 5f 5d c3 8d 04 06 89 45 e8 <8b> 00 85 c0 74 86 8d
> 56 08 b8 6c cb 5f c0 e8 a8 9d 17 00 85 c0
> EIP: [<c02e4be6>] empty_dir+0xe1/0x305 SS:ESP 0068:c1e6cec0
> ---[ end trace 3a33b21de407e362 ]---
> ----------
> *** seed 451, ext4, 2.6.27-rc5 ***
> attempt to access beyond end of device
> hdb: rw=0, want=268435458, limit=20480
> EXT4-fs error (device hdb): ext4_xattr_delete_inode: inode 507: block 134217728
> read error
> EXT4-fs error (device hdb): htree_dirblock_to_tree: bad entry in directory
> #653: directory entry across blocks - offset=0, inode=653, rec_len=16
> BUG: unable to handle kernel paging request at c7d2540c
> IP: [<c02fb496>] empty_dir+0xe1/0x305
> *pde = 00007067 *pte = 07d25160
> Oops: 0000 [#1] DEBUG_PAGEALLOC
> [ 2151.877484]
> Pid: 20705, comm: rm Not tainted (2.6.27-rc5 #2)
> EIP: 0060:[<c02fb496>] EFLAGS: 00000246 CPU: 0
> EIP is at empty_dir+0xe1/0x305
> EAX: c7d2540c EBX: c48440e0 ECX: 0000028d EDX: 0000028d
> ESI: c7d21400 EDI: c1b99428 EBP: c1bd7ef4 ESP: c1bd7ec0
> DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
> Process rm (pid: 20705, ti=c1bd7000 task=c1a38000 task.ti=c1bd7000)
> Stack: 00000000 c1bd7ee4 c6169800 0000007e e18fea3c 54ed2757 00000001 c7d2540c
> c6169400 00000000 c4a35020 c4982138 fffffffb c1bd7f28 c02fe5ef 00000246
> c0620de0 c485bbe8 c0620de0 c485bbe0 c48440e0 c4a15dc8 c2b7a5c8 00000000
> Call Trace:
> [<c02fe5ef>] ? ext4_rmdir+0xd5/0x1e8
> [<c026bd5d>] ? vfs_rmdir+0x7e/0xb3
> [<c026d5e7>] ? do_rmdir+0xb7/0xc3
> [<c026d624>] ? sys_unlinkat+0x31/0x36
> [<c0202f3e>] ? syscall_call+0x7/0xb
> =======================
> Code: 08 54 b4 5d c0 c7 44 24 04 a4 34 55 c0 8b 45 ec 89 04 24 e8 73 4b 00 00
> b8 01 00 00 00 83 c4 28 5b 5e 5f 5d c3 8d 04 06 89 45 e8 <8b> 00 8
> EIP: [<c02fb496>] empty_dir+0xe1/0x305 SS:ESP 0068:c1bd7ec0
> ---[ end trace 79e4e3dfd3fb9e7d ]---
> umount: /mnt: device is busy
> ----------
> *** seed 10000193, ext4, 2.6.27-rc5 ***
> EXT4-fs warning (device hdb): empty_dir: bad directory (dir #733) - no `.' or
> `..'
> EXT4-fs error (device hdb): htree_dirblock_to_tree: bad entry in directory
> #461: directory entry across blocks - offset=0, inode=461, rec_len=82
> BUG: unable to handle kernel paging request at c769940c
> IP: [<c02fb496>] empty_dir+0xe1/0x305
> *pde = 079e7163 *pte = 07699160
> Oops: 0000 [#1] DEBUG_PAGEALLOC
> [ 961.774442]
> Pid: 4518, comm: rm Not tainted (2.6.27-rc5 #2)
> EIP: 0060:[<c02fb496>] EFLAGS: 00000246 CPU: 0
> EIP is at empty_dir+0xe1/0x305
> EAX: c769940c EBX: c3fc36c8 ECX: 000001cd EDX: 000001cd
> ESI: c7697400 EDI: c3fc8380 EBP: c7a6cef4 ESP: c7a6cec0
> DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
> Process rm (pid: 4518, ti=c7a6c000 task=c78bc360 task.ti=c7a6c000)
> Stack: 00000000 c7a6cee4 c532ec00 0000007e 1da9562e eb3f2f99 00000001 c769940c
> c532e000 00000000 c3ee0020 c3eada08 fffffffb c7a6cf28 c02fe5ef 00000246
> c0620de0 c747c560 c0620de0 c747c558 c3fc36c8 c3fc8d90 c76965f0 00000000
> Call Trace:
> [<c02fe5ef>] ? ext4_rmdir+0xd5/0x1e8
> [<c026bd5d>] ? vfs_rmdir+0x7e/0xb3
> [<c026d5e7>] ? do_rmdir+0xb7/0xc3
> [<c026d624>] ? sys_unlinkat+0x31/0x36
> [<c0202f3e>] ? syscall_call+0x7/0xb
> =======================
> Code: 08 54 b4 5d c0 c7 44 24 04 a4 34 55 c0 8b 45 ec 89 04 24 e8 73 4b 00 00
> b8 01 00 00 00 83 c4 28 5b 5e 5f 5d c3 8d 04 06 89 45 e8 <8b> 00 8
> EIP: [<c02fb496>] empty_dir+0xe1/0x305 SS:ESP 0068:c7a6cec0
> ---[ end trace 7aaee6ca8f8adc20 ]---
> ----------
>
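A note on the three traces: in each one the faulting address in EAX lies a power of two plus 0xc bytes past the value in ESI (0x800c, 0x400c and 0x200c respectively), and the bytes `8d 04 06 89 45 e8 <8b> 00` in the Code line decode to `lea (%esi,%eax,1),%eax; mov %eax,-0x18(%ebp); mov (%eax),%eax`. One possible reading, which nothing in this thread confirms, is that empty_dir() adds an on-disk rec_len to a block buffer pointer and dereferences the result without checking that it still lies inside the block. The sketch below is not the ext3/ext4 code. It is a small userspace illustration of the kind of bounds check that keeps such a walk inside one block, assuming the classic 8-byte directory entry header (32-bit inode, 16-bit rec_len, 8-bit name_len, 8-bit file type, all little-endian). The names `dirent_in_block` and `count_entries` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Size of the fixed directory entry header assumed by this sketch. */
#define DIRENT_HDR 8u

/* Read a little-endian 16-bit value without relying on host byte order. */
static uint16_t get_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Hypothetical helper: return 1 if the entry starting at byte `off`
 * lies entirely inside a block of `blksz` bytes, 0 otherwise.
 */
static int dirent_in_block(const unsigned char *blk, size_t blksz, size_t off)
{
    assert(blk != NULL);

    /* The header itself must fit before rec_len can be trusted. */
    if (off > blksz || blksz - off < DIRENT_HDR)
        return 0;

    uint16_t rec_len = get_le16(blk + off + 4);
    uint8_t name_len = blk[off + 6];

    if (rec_len < DIRENT_HDR + name_len)   /* too short for header + name */
        return 0;
    if (rec_len % 4 != 0)                  /* entries are 4-byte aligned */
        return 0;
    if (rec_len > blksz - off)             /* would cross the block end */
        return 0;
    return 1;
}

/*
 * Hypothetical helper: walk every entry in one block. Returns the number
 * of entries, or -1 as soon as an entry fails the bounds check, so the
 * walk never reads past blk + blksz.
 */
static int count_entries(const unsigned char *blk, size_t blksz)
{
    size_t off = 0;
    int n = 0;

    while (off < blksz) {
        if (!dirent_in_block(blk, blksz, off))
            return -1;
        off += get_le16(blk + off + 4);    /* rec_len >= 8, so this advances */
        n++;
    }
    return n;
}
```

With a 1024-byte block holding a 12-byte `.` entry, a `..` entry whose rec_len is 1012 passes the check, while a fuzzed rec_len of 0x8000 is rejected before any pointer is advanced.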
* Re: [Bug 11525] New: Unable to handle paging request at ext3_rmdir() and ext4_rmdir() on intentionally corrupted fs
From: Theodore Tso @ 2008-09-09 21:55 UTC
To: Andrew Morton; +Cc: linux-ext4, bugme-daemon, sliedes

> > Unfortunately this is one of those bugs that I can't find a way to
> > reproduce except by randomly breaking one fs after another. This
> > happens with ext3 and ext4, but so far I haven't seen it happen
> > with ext2.
> >
> >
> > *** seed 270, ext3, 2.6.27-rc3 ***
> > *** seed 451, ext4, 2.6.27-rc5 ***

Given these seed numbers, I assume this was generated using some tool
like fsfuzzer? Would it be possible to generate a filesystem image
that triggers the problem case, *before* trying to execute the
rm -rf?

That would be the fastest way to try to track the problem down.

- Ted
* Re: [Bug 11525] New: Unable to handle paging request at ext3_rmdir() and ext4_rmdir() on intentionally corrupted fs
From: Sami Liedes @ 2008-09-10 3:26 UTC
To: Theodore Tso; +Cc: Andrew Morton, linux-ext4, bugme-daemon

On Tue, Sep 09, 2008 at 05:55:31PM -0400, Theodore Tso wrote:
> > > Unfortunately this is one of those bugs that I can't find a way to
> > > reproduce except by randomly breaking one fs after another. This
> > > happens with ext3 and ext4, but so far I haven't seen it happen
> > > with ext2.
> > >
> > >
> > > *** seed 270, ext3, 2.6.27-rc3 ***
> > > *** seed 451, ext4, 2.6.27-rc5 ***
>
> Given these seed numbers, I assume this was generated using some tool
> like fsfuzzer? Would it be possible to generate a filesystem image
> *before* that triggers the problem case, before trying to execute the
> rm -rf?
>
> That would be the fastest way to try to track the problem down.

Yes, I can generate those filesystems. However, the problem seems to be
elusive in that I haven't yet been able to reproduce it twice with the
same filesystem (and even with random filesystems, it only occurs
once in a while). I'll do some more testing and try to figure out if
it can be reproduced more easily. Still, I can give you some
filesystems that crashed once, if you wish. They are typically
something like 600 KiB compressed, and I guess that could be made smaller
by zeroing all regular files in the pristine fs before doing the
fuzzing.

Here's a script I use to do the testing ($1 is the initial seed). The
filesystem is a 10 MiB pristine ext[34] image with a copy of my
workstation's /dev and a partial copy of /usr/share/doc (I tried to be
diverse in what I put there).
------------------------------------------------------------
#!/bin/sh

if [ "`hostname`" != "fstest" ]; then
    echo "This is a dangerous script."
    echo "Set your hostname to \`fstest\' if you want to use it."
    exit 1
fi

umount /dev/hdb
umount /dev/hdc

/etc/init.d/sysklogd stop
/etc/init.d/klogd stop
/etc/init.d/cron stop
mount /dev/hda / -t ext3 -o remount,ro || exit 1

#ulimit -t 20

for ((s=$1; s<1000000000; s++)); do
    umount /mnt
    echo '***** zzuffing *****' seed $s
    zzuf -r 0:0.03 -s $s </dev/hdc >/dev/hdb || exit
    mount /dev/hdb /mnt -t ext2 -o errors=continue || continue
    cd /mnt || continue
    timeout 30 cp -r doc doc2 >&/dev/null
    timeout 30 find -xdev >&/dev/null
    timeout 30 find -xdev -print0 2>/dev/null |xargs -0 touch -- 2>/dev/null
    timeout 30 mkdir tmp >&/dev/null
    timeout 30 echo whoah >tmp/filu 2>/dev/null
    timeout 30 rm -rf /mnt/* >&/dev/null
    cd /
done
------------------------------------------------------------

Sami
* Re: [Bug 11525] New: Unable to handle paging request at ext3_rmdir() and ext4_rmdir() on intentionally corrupted fs
From: Theodore Tso @ 2008-09-10 12:58 UTC
To: Sami Liedes; +Cc: Andrew Morton, linux-ext4, bugme-daemon

On Wed, Sep 10, 2008 at 06:26:34AM +0300, Sami Liedes wrote:
>
> Yes, I can generate those filesystems. However, the problem seems to be
> elusive in that I haven't yet been able to reproduce it twice with the
> same filesystem (and even with random filesystems, it only occurs
> once in a while). I'll do some more testing and try to figure out if
> it can be reproduced more easily. Still, I can give you some
> filesystems that crashed once, if you wish. They are typically
> something like 600 KiB compressed, and I guess that could be made smaller
> by zeroing all regular files in the pristine fs before doing the
> fuzzing.

One easy way of doing this is the following:

    e2image -r /dev/hdXX /var/tmp/hdXX.e2i
    dd if=/var/tmp/hdXX.e2i of=/dev/hdXX

Another thing you can do is change your script to add the following
line before the filesystem is mounted:

    e2image -r /dev/hdXX - | bzip2 > /var/tmp/hdXX.e2i.bz2

and then if the filesystem fails (i.e., the system oopses),
/var/tmp/hdXX.e2i.bz2 will have all of the filesystem metadata
(including directories). You can then decompress it and write out the
filesystem, or do what I do when given one of these to examine:

    bunzip2 < hdXX.e2i.bz2 | make-sparse hdXX.e2i

Said sparse file can now be checked via e2fsck, or mounted using a
loopback mount, etc.

Even if it's not reliably reproducible, if I can get a series of
filesystems which show the problem, using "e2fsck -nf" we can see a
pattern of how the filesystems are corrupted, and that can help narrow
down what might be going on that causes the kernel oops.
Thanks, regards,

- Ted

/*
 * make-sparse.c --- make a sparse file from stdin
 *
 * Copyright 2004 by Theodore Ts'o.
 *
 * %Begin-Header%
 * This file may be redistributed under the terms of the GNU Public
 * License.
 * %End-Header%
 */

#define _LARGEFILE_SOURCE
#define _LARGEFILE64_SOURCE

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <errno.h>

int full_read(int fd, char *buf, size_t count)
{
	int got, total = 0;
	int pass = 0;

	while (count > 0) {
		got = read(fd, buf, count);
		if (got == -1) {
			if ((errno == EINTR) || (errno == EAGAIN))
				continue;
			return total ? total : -1;
		}
		if (got == 0) {
			if (pass++ >= 3)
				return total;
			continue;
		}
		pass = 0;
		buf += got;
		total += got;
		count -= got;
	}
	return total;
}

int main(int argc, char **argv)
{
	int fd, got, i;
	char buf[1024];

	if (argc != 2) {
		fprintf(stderr, "Usage: make-sparse out-file\n");
		exit(1);
	}
	fd = open(argv[1], O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0777);
	if (fd < 0) {
		perror(argv[1]);
		exit(1);
	}
	while (1) {
		got = full_read(0, buf, sizeof(buf));
		if (got == 0)
			break;
		if (got == sizeof(buf)) {
			for (i=0; i < sizeof(buf); i++)
				if (buf[i])
					break;
			if (i == sizeof(buf)) {
				lseek(fd, sizeof(buf), SEEK_CUR);
				continue;
			}
		}
		write(fd, buf, got);
	}
	return 0;
}
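One detail of the approach above is worth spelling out: lseek() past the end of a file does not by itself extend the file, so if the input ends in one or more all-zero 1024-byte blocks, the output ends up shorter than the input unless its size is set explicitly. The following is a minimal sketch of the same skip-zero-blocks technique with that case handled through ftruncate(). It is not a replacement for make-sparse.c: it copies from a memory buffer instead of stdin so that it is easy to exercise, and `write_sparse`, `all_zero` and `BLK` are hypothetical names chosen for this illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define BLK 1024   /* same block granularity as the 1024-byte buffer above */

/* Return 1 if the first `len` bytes of `buf` are all zero. */
static int all_zero(const char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (buf[i])
            return 0;
    return 1;
}

/*
 * Copy `len` bytes from `src` into the file `path`, seeking over full
 * all-zero blocks instead of writing them. Returns 0 on success and
 * -1 on error.
 */
static int write_sparse(const char *path, const char *src, size_t len)
{
    assert(path != NULL && src != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    size_t off = 0;
    while (off < len) {
        size_t n = len - off < BLK ? len - off : BLK;

        if (n == BLK && all_zero(src + off, n)) {
            /* Leave a hole: move the file offset without writing. */
            if (lseek(fd, (off_t)n, SEEK_CUR) == (off_t)-1)
                goto fail;
        } else {
            /* write() may be partial, so loop until the block is out. */
            size_t done = 0;
            while (done < n) {
                ssize_t w = write(fd, src + off + done, n - done);
                if (w < 0)
                    goto fail;
                done += (size_t)w;
            }
        }
        off += n;
    }

    /* A trailing hole only exists once the file size covers it. */
    if (ftruncate(fd, (off_t)len) < 0)
        goto fail;
    return close(fd);

fail:
    close(fd);
    return -1;
}
```

For a filesystem image this matters because tools that read the image generally expect the file to be as large as the original device, even when the last blocks are empty.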