* ext3 / reiserfs data corruption, 2.5-bk
From: Dave Jones @ 2003-06-09 19:35 UTC
To: Linux Kernel

2.5 Bitkeeper tree as of last 24 hrs. Running a lot
of disk IO stress (multiple fsstress, over 100 fsx instances,
and random sync calling) produced failures on both reiserfs
and ext3.
Tests were done on separate disks, but concurrently.

fsx logs at
http://www.codemonkey.org.uk/cruft/reiserfs.fsxlog
http://www.codemonkey.org.uk/cruft/ext3.fsxlog

Dave
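For readers who have not used this kind of test, the idea behind an fsx-style checker can be sketched roughly as follows. This is an illustrative Python sketch only, not the fsx tool and not the setup used in the report; the function name fsx_like_check and its parameters are hypothetical. It performs random writes to one file, mirrors every write in memory, and compares the file against the mirror after each step.

```python
import os
import random
import tempfile


def fsx_like_check(path, ops=200, max_size=1 << 16, seed=0):
    """Do random writes to `path`, mirror them in memory, and return
    the number of read-back mismatches (0 means no corruption seen).

    Hypothetical helper for illustration; not part of fsx.
    """
    rng = random.Random(seed)   # fixed seed so a failing run can be replayed
    shadow = bytearray()        # what the file is expected to contain
    mismatches = 0
    with open(path, "w+b") as f:
        for _ in range(ops):
            offset = rng.randrange(max_size)
            length = rng.randrange(1, 4096)
            data = bytes(rng.getrandbits(8) for _ in range(length))
            # Writing past EOF leaves a hole that reads back as zeros,
            # so pad the in-memory copy the same way.
            end = offset + length
            if len(shadow) < end:
                shadow.extend(b"\0" * (end - len(shadow)))
            shadow[offset:end] = data
            f.seek(offset)
            f.write(data)
            if rng.random() < 0.1:
                # Occasionally push the data to disk, loosely imitating
                # the random sync calls in the report.
                f.flush()
                os.fsync(f.fileno())
            f.seek(0)
            if f.read() != bytes(shadow):
                mismatches += 1
    return mismatches


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(fsx_like_check(os.path.join(tmp, "testfile")))
```

A single instance like this is unlikely to trigger anything; the report describes many instances running concurrently on each filesystem.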
* Re: ext3 / reiserfs data corruption, 2.5-bk
From: Oleg Drokin @ 2003-06-10 8:43 UTC
To: Dave Jones, Linux Kernel

Hello!

On Mon, Jun 09, 2003 at 08:35:55PM +0100, Dave Jones wrote:

> 2.5 Bitkeeper tree as of last 24 hrs. Running a lot
> of disk IO stress (multiple fsstress, over 100 fsx instances,
> and random sync calling) produced failures on both reiserfs
> and ext3.
> Tests were done on separate disks, but concurrently.

Do you have smp or preempt enabled?

Bye,
Oleg
* Re: ext3 / reiserfs data corruption, 2.5-bk
From: Dave Jones @ 2003-06-10 9:20 UTC
To: Oleg Drokin; +Cc: Linux Kernel

On Tue, Jun 10, 2003 at 12:43:23PM +0400, Oleg Drokin wrote:

> > 2.5 Bitkeeper tree as of last 24 hrs. Running a lot
> > of disk IO stress (multiple fsstress, over 100 fsx instances,
> > and random sync calling) produced failures on both reiserfs
> > and ext3.
> > Tests were done on separate disks, but concurrently.
>
> Do you have smp or preempt enabled?

# CONFIG_SMP is not set
CONFIG_PREEMPT=y

Dave
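The two lines in the reply above are in the usual kernel .config format: a disabled option appears as a comment ending in "is not set", an enabled one as NAME=value. A minimal sketch of reading that format, assuming only those two line shapes matter (the helper name parse_kconfig is made up for this example):

```python
def parse_kconfig(text):
    """Map option names to values; options marked 'is not set' map to None."""
    suffix = " is not set"
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            # "# CONFIG_SMP is not set" -> CONFIG_SMP is disabled
            options[line[2:-len(suffix)]] = None
        elif line.startswith("CONFIG_") and "=" in line:
            # "CONFIG_PREEMPT=y" -> CONFIG_PREEMPT has value "y"
            name, value = line.split("=", 1)
            options[name] = value
    return options


config = parse_kconfig("# CONFIG_SMP is not set\nCONFIG_PREEMPT=y\n")
print(config)  # {'CONFIG_SMP': None, 'CONFIG_PREEMPT': 'y'}
```

So the failing machine is a uniprocessor build with kernel preemption turned on.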
* Re: ext3 / reiserfs data corruption, 2.5-bk
From: Nathan Conrad @ 2003-06-10 21:44 UTC
To: Oleg Drokin; +Cc: Dave Jones, Linux Kernel

I've been noticing a similar problem on my laptop. This may, or may
not be related, but it did start somewhere within the past week (maybe
the IDE taskfile conversion???, to throw out a guess). I wonder if
Dave Jones is using IDE or SCSI. CONFIG_SMP and CONFIG_PREEMPT are
disabled on my machine (Sony Vaio PCG-FXA49 laptop, Athlon4). I'm
compiling the kernel with gcc 3.3 (Debian version).

Anyway, certain directories get locked up on occasion and when I try
to execute 'ls' or read from the directory, the process gets into a
locked up state; ^C does not work to kill the process. The only way to
make a directory "readable" is to restart the machine. I have not
noticed any FS corruption, just the lack of being able to enter the
directory.

At the same time, a kernel bug will be displayed:

Unable to handle kernel NULL pointer dereference at virtual address 00000000
 printing eip:
c016781a
*pde = 00000000
Oops: 0000 [#1]
CPU: 0
EIP: 0060:[find_inode_fast+26/96] Not tainted
EFLAGS: 00010286
EIP is at find_inode_fast+0x1a/0x60
eax: db0355c4 ebx: 0001859f ecx: c3a69844 edx: 00000000
esi: dfd60c00 edi: dff99340 ebp: dff99340 esp: cc6dde50
ds: 007b es: 007b ss: 0068
Process emacs20 (pid: 16508, threadinfo=cc6dc000 task=c6d0adc0)
Stack: c4bca5b8 0001859f 0001859f dfd60c00 c0167d2e dfd60c00 dff99340 0001859f
       0001859f da191d40 dfd60c00 da191d40 c018e45b dfd60c00 0001859f db666130
       fffffff4 dca22aac dca22a44 c015cd60 dca22a44 da191d40 00000000 cc6ddf48
Call Trace:
 [iget_locked+78/160] iget_locked+0x4e/0xa0
 [ext3_lookup+107/208] ext3_lookup+0x6b/0xd0
 [real_lookup+192/240] real_lookup+0xc0/0xf0
 [do_lookup+158/176] do_lookup+0x9e/0xb0
 [link_path_walk+1066/2000] link_path_walk+0x42a/0x7d0
 [__user_walk+73/96] __user_walk+0x49/0x60
 [vfs_stat+31/96] vfs_stat+0x1f/0x60
 [sys_stat64+27/64] sys_stat64+0x1b/0x40
 [syscall_call+7/11] syscall_call+0x7/0xb

Code: 0f 18 02 90 39 59 18 89 c8 74 0f 85 d2 89 d1 75 ed 31 c0 83

On Tue, Jun 10, 2003 at 12:43:23PM +0400, Oleg Drokin wrote:
> Hello!
>
> On Mon, Jun 09, 2003 at 08:35:55PM +0100, Dave Jones wrote:
>
> > 2.5 Bitkeeper tree as of last 24 hrs. Running a lot
> > of disk IO stress (multiple fsstress, over 100 fsx instances,
> > and random sync calling) produced failures on both reiserfs
> > and ext3.
> > Tests were done on separate disks, but concurrently.
>
> Do you have smp or preempt enabled?
>
> Bye,
> Oleg

-Nathan Conrad

-- 
Nathan J. Conrad
GPG: F4FC 7E25 9308 ECE1 735C 0798 CE86 DA45 9170 3112
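Each call trace entry in the oops above appears twice, once as [symbol+offset/size] in decimal and once as symbol+0xoffset/0xsize in hex, where offset is the position inside the function and size is the function's length in bytes. The snippet below is a small sketch that parses the entries and checks that the two notations agree; the helper name check_trace and the regular expression are assumptions made for this example, not part of any kernel tooling.

```python
import re

# One entry looks like: [iget_locked+78/160] iget_locked+0x4e/0xa0
ENTRY = re.compile(
    r"\[(\w+)\+(\d+)/(\d+)\]\s+(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
)


def check_trace(trace):
    """Return (symbol, offset, size) tuples, verifying both notations agree."""
    frames = []
    for m in ENTRY.finditer(trace):
        sym_dec, off_dec, size_dec, sym_hex, off_hex, size_hex = m.groups()
        offset, size = int(off_hex, 16), int(size_hex, 16)
        if (sym_dec, int(off_dec), int(size_dec)) != (sym_hex, offset, size):
            raise ValueError(f"inconsistent entry: {m.group(0)}")
        frames.append((sym_hex, offset, size))
    return frames


TRACE = """
 [iget_locked+78/160] iget_locked+0x4e/0xa0
 [ext3_lookup+107/208] ext3_lookup+0x6b/0xd0
 [real_lookup+192/240] real_lookup+0xc0/0xf0
 [do_lookup+158/176] do_lookup+0x9e/0xb0
 [link_path_walk+1066/2000] link_path_walk+0x42a/0x7d0
 [__user_walk+73/96] __user_walk+0x49/0x60
 [vfs_stat+31/96] vfs_stat+0x1f/0x60
 [sys_stat64+27/64] sys_stat64+0x1b/0x40
 [syscall_call+7/11] syscall_call+0x7/0xb
"""

frames = check_trace(TRACE)
for symbol, offset, size in frames:
    print(f"{symbol}: byte {offset} of {size}")
```

Read from the bottom up, the entries give the path into the fault: a stat64 system call walks the path, ext3_lookup asks for the inode via iget_locked, and the fault itself is reported in find_inode_fast.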
* Re: ext3 / reiserfs data corruption, 2.5-bk
From: Bartlomiej Zolnierkiewicz @ 2003-06-10 18:11 UTC
To: Nathan Conrad; +Cc: Oleg Drokin, Dave Jones, Linux Kernel

On Tue, 10 Jun 2003, Nathan Conrad wrote:

> I've been noticing a similar problem on my laptop. This may, or may
> not be related, but it did start somewhere within the past week (maybe
> the IDE taskfile conversion???, to throw out a guess). I wonder if

wrt taskfile conversion, if you are using DMA on your IDE disks,
there shouldn't be any change in behaviour.

I will prepare a patch adding old crap and making it selectable
(default will be taskfile, if you run into problems you can check
with old code) to ease spotting possible taskfile problems
and allow quick judging - taskfile guilty/not guilty.

--
Bartlomiej

> Dave Jones is using IDE or SCSI. CONFIG_SMP and CONFIG_PREEMPT are
> disabled on my machine (Sony Vaio PCG-FXA49 laptop, Athlon4). I'm
> compiling the kernel with gcc 3.3 (Debian version).
>
> Anyway, certain directories get locked up on occasion and when I try
> to execute 'ls' or read from the directory, the process gets into a
> locked up state; ^C does not work to kill the process. The only way to
> make a directory "readable" is to restart the machine. I have not
> noticed any FS corruption, just the lack of being able to enter the
> directory.
>
> At the same time, a kernel bug will be displayed:

<...>
* Re: ext3 / reiserfs data corruption, 2.5-bk
From: Nathan Conrad @ 2003-06-10 22:18 UTC
To: Bartlomiej Zolnierkiewicz; +Cc: Oleg Drokin, Dave Jones, Linux Kernel

Oh, ok. I am using DMA on my drives. The problem with this bug is that
it is fairly hard to observe; I've only seen it about once every other
day. I should have also pointed out that I am using ext3.

I thought that it might be taskfile stuff because that was the major
change in the kernel right before I started to notice these problems.
There likely is some other source of problems because you say that
there should be no change in behaviour.

-Nathan

On Tue, Jun 10, 2003 at 08:11:22PM +0200, Bartlomiej Zolnierkiewicz wrote:
>
> On Tue, 10 Jun 2003, Nathan Conrad wrote:
>
> > I've been noticing a similar problem on my laptop. This may, or may
> > not be related, but it did start somewhere within the past week (maybe
> > the IDE taskfile conversion???, to throw out a guess). I wonder if
>
> wrt taskfile conversion, if you are using DMA on your IDE disks,
> there shouldn't be any change in behaviour.
>
> I will prepare a patch adding old crap and making it selectable
> (default will be taskfile, if you run into problems you can check
> with old code) to ease spotting possible taskfile problems
> and allow quick judging - taskfile guilty/not guilty.
>
> --
> Bartlomiej
>
> > Dave Jones is using IDE or SCSI. CONFIG_SMP and CONFIG_PREEMPT are
> > disabled on my machine (Sony Vaio PCG-FXA49 laptop, Athlon4). I'm
> > compiling the kernel with gcc 3.3 (Debian version).
> >
> > Anyway, certain directories get locked up on occasion and when I try
> > to execute 'ls' or read from the directory, the process gets into a
> > locked up state; ^C does not work to kill the process. The only way to
> > make a directory "readable" is to restart the machine. I have not
> > noticed any FS corruption, just the lack of being able to enter the
> > directory.
> >
> > At the same time, a kernel bug will be displayed:
>
> <...>
* Re: ext3 / reiserfs data corruption, 2.5-bk
From: Andrew Morton @ 2003-06-10 20:59 UTC
To: Nathan Conrad; +Cc: green, davej, linux-kernel

Nathan Conrad <conrad@bungled.net> wrote:
>
> Unable to handle kernel NULL pointer dereference at virtual address 00000000
>  printing eip:
> c016781a
> *pde = 00000000
> Oops: 0000 [#1]
> CPU: 0
> EIP: 0060:[find_inode_fast+26/96] Not tainted

Something scribbled on your inode hash chains. Please make sure that
you're building the kernel with all the memory debug options enabled, and
run memtest86 on that machine for 12 hours or so.
* Re: ext3 / reiserfs data corruption, 2.5-bk; NULL pointer dereference bug
From: Nathan Conrad @ 2003-06-12 5:20 UTC
To: Andrew Morton; +Cc: green, davej, linux-kernel

I just saw another one of these NULL pointer dereference oopses on my
laptop:

Unable to handle kernel NULL pointer dereference at virtual address 00000000
 printing eip:
c01665f3
*pde = 00000000
Oops: 0000 [#1]
CPU: 0
EIP: 0060:[__d_lookup+99/256] Not tainted
EFLAGS: 00210282
EIP is at __d_lookup+0x63/0x100
eax: 00000000 ebx: c06ef980 ecx: 00000010 edx: dfe80000
esi: dfe8da40 edi: 00000000 ebp: df85be70 esp: db047ec8
ds: 007b es: 007b ss: 0068
Process gcc (pid: 4738, threadinfo=db046000 task=c22198c0)
Stack: dcfcc014 c012a225 00000000 00000000 dfe8da40 db047f48 00000000 dcfcc001
       0029e101 00000003 dcfcc001 db047f90 dfff4fc0 db047f3c c015cf80 dfd50e00
       db047f44 c015cb64 dcfcc001 dcfcc005 db047f3c db047f44 c015d129 db047f90
Call Trace:
 [in_group_p+37/48] in_group_p+0x25/0x30
 [do_lookup+48/176] do_lookup+0x30/0xb0
 [permission+84/112] permission+0x54/0x70
 [link_path_walk+297/2000] link_path_walk+0x129/0x7d0
 [__user_walk+73/96] __user_walk+0x49/0x60
 [sys_access+129/320] sys_access+0x81/0x140
 [syscall_call+7/11] syscall_call+0x7/0xb

Code: 0f 18 00 90 8b 74 24 10 8d 5d 90 39 73 78 75 17 8b 7b 58 89

I ran memtest86 for about 14 hours and it passed all of its tests. I
enabled the memory debugging options (under the kernel hacking
section) and I did not notice any errors from them in my syslog.
I'm not sure what else to try... The backtrace is significantly
different from the last one...

On Tue, Jun 10, 2003 at 01:59:35PM -0700, Andrew Morton wrote:
> Nathan Conrad <conrad@bungled.net> wrote:
> >
> > Unable to handle kernel NULL pointer dereference at virtual address 00000000
> >  printing eip:
> > c016781a
> > *pde = 00000000
> > Oops: 0000 [#1]
> > CPU: 0
> > EIP: 0060:[find_inode_fast+26/96] Not tainted
>
> Something scribbled on your inode hash chains. Please make sure that
> you're building the kernel with all the memory debug options enabled, and
> run memtest86 on that machine for 12 hours or so.
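Both oopses in this thread report "Oops: 0000". On 32-bit x86 that number is the page-fault error code, and its low three bits say what kind of access faulted. A minimal sketch of decoding them (the function name decode_fault_code is hypothetical, and only bits 0 to 2 are handled here):

```python
def decode_fault_code(code):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        # bit 0: set if the page was present (protection violation)
        "cause": "protection violation" if code & 1 else "page not present",
        # bit 1: set if the faulting access was a write
        "access": "write" if code & 2 else "read",
        # bit 2: set if the CPU was in user mode at the time
        "mode": "user" if code & 4 else "kernel",
    }


result = decode_fault_code(int("0000", 16))
print(result)
# {'cause': 'page not present', 'access': 'read', 'mode': 'kernel'}
```

A code of 0000 is therefore a kernel-mode read of an unmapped page, which fits the "NULL pointer dereference at virtual address 00000000" line in both reports.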
* Re: ext3 / reiserfs data corruption, 2.5-bk
From: Dave Jones @ 2003-06-10 22:49 UTC
To: Nathan Conrad; +Cc: Oleg Drokin, Linux Kernel

On Tue, Jun 10, 2003 at 05:44:36PM -0400, Nathan Conrad wrote:

> I've been noticing a similar problem on my laptop. This may, or may
> not be related, but it did start somewhere within the past week (maybe
> the IDE taskfile conversion???, to throw out a guess). I wonder if
> Dave Jones is using IDE or SCSI.

IDE. I'm too cheap to buy SCSI.

Dave
end of thread (~2003-06-12 5:07 UTC)

Thread overview: 9+ messages
2003-06-09 19:35 ext3 / reiserfs data corruption, 2.5-bk Dave Jones
2003-06-10  8:43 ` Oleg Drokin
2003-06-10  9:20   ` Dave Jones
2003-06-10 21:44   ` Nathan Conrad
2003-06-10 18:11     ` Bartlomiej Zolnierkiewicz
2003-06-10 22:18       ` Nathan Conrad
2003-06-10 20:59     ` Andrew Morton
2003-06-12  5:20       ` ext3 / reiserfs data corruption, 2.5-bk; NULL pointer dereference bug Nathan Conrad
2003-06-10 22:49     ` ext3 / reiserfs data corruption, 2.5-bk Dave Jones