* Re: corruption
From: Andries.Brouwer @ 2000-11-29 21:54 UTC
To: torvalds, viro; +Cc: linux-kernel, tigran
> ISTR bug reports looking like that and IIRC they were never resolved.
Have you looked at the report by Daniel Phillips?
http://marc.theaimsgroup.com/?l=linux-kernel&m=95162877201890&w=2
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/

* Re: corruption
From: Alexander Viro @ 2000-11-29 22:18 UTC
To: Andries.Brouwer; +Cc: torvalds, linux-kernel, tigran

On Wed, 29 Nov 2000 Andries.Brouwer@cwi.nl wrote:
> > ISTR bug reports looking like that and IIRC they were never resolved.
>
> Have you looked at the report by Daniel Phillips?

Yes. The problem is real, but the fix... I'm doing a cleanup there and
I'll post the patch once I've given it some testing.

* Re: corruption
From: Andrew Morton @ 2000-11-30 14:21 UTC
To: Alexander Viro
Cc: Andries.Brouwer, torvalds, linux-kernel, tigran, Stephen C. Tweedie, Lawrence Walton

In thread "File corruption part deux", Lawrence Walton wrote:
>
> my system has been acting slightly odd on all the pre 12 kernels
> with the fs going read only without any messages until now.
> no oops or anything like that, but I did get this just now.
>
> EXT2-fs error (device sd(8,2)): ext2_readdir:
> bad entry in directory #458430: directory entry
> across blocks - offset=152, inode=3393794200,
> rec_len=12440, name_len=73
>

3393794200 == 0xca493098. A kernel address. And 152 is 0x98,
which is equal to N * 0x20 + 0x18. Read on...

I am somewhat reluctant to report this problem because I always run
kernels with the lowish latency patch. However, having reviewed the effects
of that patch on fs/*.c, I don't think it's to blame. Plus it's been
100% stable for months.

I believe that the problem I've observed is caused or exposed by the
O_SYNC changes. Or maybe not.

Running test11-ac4 on *very* vanilla machines: x86 UP, IDE, 3c905 and
really nothing else. No APM, fat, vfat, isofs, USB, audio, etc. It has
happened on two different machines which have been 100% reliable for
a year.

The problem is corruption of in-core files. It only started happening
in the past few days, and it happened after two days of uptime. In the
most recent case my /bin/ls went bad. I took a copy and rebooted. After
the reboot /bin/ls had a correct MD5 sum.
Here's the diff:

--- ls.good	Thu Nov 30 15:07:11 2000
+++ ls.bad	Thu Nov 30 15:07:04 2000
@@ -1589,7 +1589,7 @@
 006340: C7 85 F8 BF FF FF 00 00 00 00 E9 EA 02 00 00 90 >@@@@@@@@@@@@@@@@<
 006350: 8B BD FC BF FF FF 8D B5 00 E0 FF FF 57 68 00 20 >@@@@@@@@@@@@Wh@ <
 006360: 00 00 56 E8 3C B2 FF FF 83 C4 0C 85 C0 0F 84 DD >@@V@<@@@@@@@@@@@<
-006370: 02 00 00 6A 0A 56 E8 49 B0 FF FF 83 C4 08 85 C0 >@@@j@V@I@@@@@@@@<
+006370: 02 00 00 6A 0A 56 E8 49 78 73 62 C6 78 73 62 C6 >@@@j@V@Ixsb@xsb@<
 006380: 75 2E 8D 9D 00 C0 FF FF 8B BD FC BF FF FF 57 68 >u.@@@@@@@@@@@@Wh<
 006390: 00 20 00 00 53 E8 0A B2 FF FF 83 C4 0C 85 C0 74 >@ @@S@@@@@@@@@@t<
 0063A0: 0F 6A 0A 53 E8 1B B0 FF FF 83 C4 08 85 C0 74 D8 >@j@S@@@@@@@@@@t@<
@@ -1709,7 +1709,7 @@
 006AC0: 00 00 00 FF 75 DF 83 E8 03 40 40 2B 44 24 58 83 >@@@@u@@@@@@+D$X@<
 006AD0: C0 02 89 44 24 14 EB 08 C7 44 24 14 01 00 00 00 >@@@D$@@@@D$@@@@@<
 006AE0: 8B 4C 24 3C F6 C1 01 74 5B 8B 44 24 5C 8B 74 24 >@L$<@@@t[@D$\@t$<
-006AF0: 14 89 C2 83 E0 03 74 16 7A 0F 83 F8 02 74 05 38 >@@@@@@t@z@@@@t@8<
+006AF0: 14 89 C2 83 E0 03 74 16 F8 7A 62 C6 F8 7A 62 C6 >@@@@@@t@@zb@@zb@<
 006B00: 22 74 2F 42 38 22 74 2A 42 38 22 74 25 42 8B 02 >"t/B8"t*B8"t%B@@<
 006B10: 84 E0 75 08 84 C0 74 1A 84 E4 74 15 A9 00 00 FF >@@u@@@t@@@t@@@@@<
 006B20: 00 74 0D 83 C2 04 A9 00 00 00 FF 75 E1 83 EA 03 >@t@@@@@@@@@u@@@@<
@@ -1733,7 +1733,7 @@
 006C40: 4C 24 54 40 51 50 E8 C9 A7 FF FF 83 C4 08 83 7C >L$T@QP@@@@@@@@@|<
 006C50: 24 1C 00 74 38 C6 00 2C 8B 5C 24 3C 40 8B 4C 24 >$@@t8@@,@\$<@@L$<
 006C60: 3C 83 E3 01 F6 C1 02 74 0E 8B 74 24 58 56 50 E8 ><@@@@@@t@@t$XVP@<
-006C70: A0 A7 FF FF 83 C4 08 85 DB 74 12 C6 00 5F 8B 4C >@@@@@@@@@t@@@_@L<
+006C70: A0 A7 FF FF 83 C4 08 85 78 7C 62 C6 78 7C 62 C6 >@@@@@@@@x|b@x|b@<
 006C80: 24 5C 40 51 50 E8 8A A7 FF FF 83 C4 08 C6 00 2F >$\@QP@@@@@@@@@@/<
 006C90: 31 FF 8B 74 24 60 40 56 50 E8 76 A7 FF FF 83 C4 >1@@t$`@VP@v@@@@@<
 006CA0: 08 8B 4C 24 30 8B 29 85 ED 74 31 90 8D 74 26 00 >@@L$0@)@@t1@@t&@<

Note that in both my cases (and, apparently, Lawrence's) the corrupted
data consists of two identical kernel addresses of the form
N * 0x20 + 0x18. They occur at a file offset that is also of the form
N * 0x20 + 0x18.

This suggests that someone somewhere is doing an INIT_LIST_HEAD() on a
wild pointer. Or, more likely, someone is doing a list_del() on a
list_head which points at recycled memory, and that list_head resides
within a structure at offset 0x18.

That description perfectly matches the new i_dirty_buffers field in
struct inode. It would suggest that one of the following has picked up
a wild pointer:

- the list_del in buffer_insert_inode_queue(),
- the list_del in __remove_inode_queue(), or
- the list_del in fsync_inode_buffers().

Apart from i_dirty_buffers, other candidates that have a list_head at
offset 0x18, and whose list_dels should be reviewed, are:

- request_queue.elevator.queue
- dentry.d_hash
- anything which has a timer_list at offset 0x18
- anything which has a waitqueue at offset 0x14

There may be others which have list_heads at 0x38, 0x58, ...

This doesn't just happen a single time. The first time, it happened
during a CVS commit: at least eight files on the server ended up with
this corruption, as did /usr/lib/netscape/netscape-communicator. So we
had a whole bunch of corruptions happening in a short period of time.
It takes a very bad kernel bug to be able to crash netscape.

Anyway, something to be thinking about. I've written the canonical
list_head debugging code. I'll run that overnight and finish it off
tomorrow.
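The "list_del() on a list_head which points at recycled memory" hypothesis can be modelled in user space. The sketch below is illustrative, not kernel code. It re-declares a minimal `struct list_head` with the same `next`/`prev` order as the 2.4 `include/linux/list.h`. It then places a list head at an assumed offset of 0x18 inside a page-aligned buffer that stands in for a freed-and-reused object, and links one entry onto it. Next it overwrites the buffer as if the page had been reused for file data, and finally unlinks the entry.

Because that entry's `prev` and `next` both still name the old head, the unlink writes two identical copies of the head's own address into the reused page. The low bits of that address equal the offset. On a 64-bit host the words are 8 bytes rather than the 4-byte x86 addresses in the hexdump, but the pattern is the same. `stale_unlink_signature()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Local copy of the 2.4 list layout: next first, then prev. */
struct list_head {
	struct list_head *next, *prev;
};

static inline void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static inline void list_add(struct list_head *item, struct list_head *head)
{
	item->next = head->next;
	item->prev = head;
	head->next->prev = item;
	head->next = item;
}

static inline void list_del(struct list_head *entry)
{
	entry->next->prev = entry->prev;
	entry->prev->next = entry->next;
}

#define PAGE_BYTES  4096
#define HEAD_OFFSET 0x18	/* assumed offset of the list_head in the dead object */

/* Returns 1 if unlinking a stale entry left two identical self-pointers
 * at HEAD_OFFSET inside a page that had already been reused. */
int stale_unlink_signature(void)
{
	unsigned char *page = aligned_alloc(PAGE_BYTES, PAGE_BYTES);
	assert(page != NULL);

	struct list_head *head = (struct list_head *)(page + HEAD_OFFSET);
	struct list_head entry;	/* e.g. a buffer still queued on the object */

	INIT_LIST_HEAD(head);
	list_add(&entry, head);	/* now entry.prev == entry.next == head */

	/* The object is freed and its page is reused for file data. */
	memset(page, 0xAA, PAGE_BYTES);

	/* Someone still unlinks the entry: both writes land in the reused page. */
	list_del(&entry);

	uintptr_t words[2];
	memcpy(words, page + HEAD_OFFSET, sizeof words);
	int hit = words[0] == (uintptr_t)head && words[1] == (uintptr_t)head &&
		  ((uintptr_t)head & (PAGE_BYTES - 1)) == HEAD_OFFSET;

	free(page);
	return hit;
}
```

The same two-identical-words pattern would also come from an INIT_LIST_HEAD() on a wild pointer. The model only shows that a stale list_del() of the last entry on a list is enough to produce it.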

* Re: corruption
From: Jonathan Hudson @ 2000-11-30 18:39 UTC
To: linux-kernel

In article <3A26625E.446AE3D@uow.edu.au>,
	Andrew Morton <andrewm@uow.edu.au> writes:
AM> In thread "File corruption part deux", Lawrence Walton wrote:
>>
>> my system has been acting slightly odd on all the pre 12 kernels
>> with the fs going read only without any messages until now.
>> no oops or anything like that, but I did get this just now.
>>
>> EXT2-fs error (device sd(8,2)): ext2_readdir:
>> bad entry in directory #458430: directory entry
>> across blocks - offset=152, inode=3393794200,
>> rec_len=12440, name_len=73
>>
AM>
AM> 3393794200 == 0xca493098. A kernel address. And 152 is 0x98,
AM> which is equal to N * 0x20 + 0x18. Read on...
AM>

Don't know what these do for your analysis, observed on
2.4.0test12pre2, compiling mozilla.

EXT2-fs error (device ide0(3,11)):
ext2_readdir: bad entry in directory #409870: directory entry across blocks
- offset=88, inode=3284439128, rec_len=36952, name_len=196
EXT2-fs error (device ide0(3,11)):
ext2_add_entry: bad entry in directory #344273: rec_len % 4 != 0 - offset=0,
inode=1769234798, rec_len=28271, name_len=85

Recompiling it with 2.4.0test12pre3 last night did not cause any fs
problems, at least that I've noticed.

* Re: corruption
From: Alexander Viro @ 2000-11-30 19:07 UTC
To: Jonathan Hudson; +Cc: linux-kernel

On Thu, 30 Nov 2000, Jonathan Hudson wrote:

>
> In article <3A26625E.446AE3D@uow.edu.au>,
> 	Andrew Morton <andrewm@uow.edu.au> writes:
> AM> In thread "File corruption part deux", Lawrence Walton wrote:
> >>
> >> my system has been acting slightly odd on all the pre 12 kernels
> >> with the fs going read only without any messages until now.
> >> no oops or anything like that, but I did get this just now.
> >>
> >> EXT2-fs error (device sd(8,2)): ext2_readdir:
> >> bad entry in directory #458430: directory entry
> >> across blocks - offset=152, inode=3393794200,
> >> rec_len=12440, name_len=73
> >>
> AM>
> AM> 3393794200 == 0xca493098. A kernel address. And 152 is 0x98,
> AM> which is equal to N * 0x20 + 0x18. Read on...
> AM>
>
> Don't know what these do for your analysis, observed on
> 2.4.0test12pre2, compiling mozilla.
>
> EXT2-fs error (device ide0(3,11)):
> ext2_readdir: bad entry in directory #409870: directory entry across blocks
> - offset=88, inode=3284439128, rec_len=36952, name_len=196

	offset 0x58, data: 58 90 c4 c3 58 90 c4

Confirmed. That's definitely an empty list_head at address 0xc3c49058,
and -pre2 has the O_SYNC patches.

> EXT2-fs error (device ide0(3,11)):
> ext2_add_entry: bad entry in directory #344273: rec_len % 4 != 0 - offset=0,
> inode=1769234798, rec_len=28271, name_len=85

	data: 6e 61 74 69 6f 6e 55, i.e. "nationU"

... and that one looks like a duplicated-blocks effect, but there may be
a lot of other reasons for that.

> Recompiling it with 2.4.0test12pre3 last night did not cause any fs
> problems, at least that I've noticed.
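The byte readings above can be checked by decoding the start of each bogus directory entry the way ext2 lays it out on disk. The sketch assumes the 2.4 `ext2_dir_entry_2` header: little-endian `__u32 inode`, `__u16 rec_len`, `__u8 name_len`, followed by a `__u8 file_type` that the error messages do not print. The struct and function names are made up for illustration. The inputs are the seven bytes quoted in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decoder for the first seven bytes of an on-disk
 * ext2_dir_entry_2: __u32 inode, __u16 rec_len, __u8 name_len,
 * all little-endian (file_type, the eighth byte, is ignored). */
struct dirent_hdr {
	uint32_t inode;
	uint16_t rec_len;
	uint8_t  name_len;
};

struct dirent_hdr decode_dirent(const unsigned char *b)
{
	struct dirent_hdr d;

	d.inode = (uint32_t)b[0] | (uint32_t)b[1] << 8 |
		  (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
	d.rec_len = (uint16_t)(b[4] | b[5] << 8);
	d.name_len = b[6];
	return d;
}
```

With the empty-list_head bytes `58 90 c4 c3 58 90 c4`, the decoder yields inode=3284439128, rec_len=36952, name_len=196, which is exactly Jonathan's first message. The "nationU" bytes reproduce the second. Lawrence's entry, read as the address 0xca493098 written twice (`98 30 49 ca 98 30 49`), gives inode=3393794200, rec_len=12440, name_len=73. In the list_head cases, rec_len and name_len are simply the low bytes of the second copy of the address.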

* Re: corruption
From: Andrew Morton @ 2000-11-30 21:35 UTC
To: Alexander Viro; +Cc: Jonathan Hudson, linux-kernel

Alexander Viro wrote:
>
> Confirms. That's definitely an empty list_head at address 0xc3c49058 and -pre2
> has O_SYNC patches.

foo.

The overnight run wedged tight in mmap002. No progress.

I bet this'll catch it:

--- include/linux/list.h.orig	Fri Dec 1 08:33:36 2000
+++ include/linux/list.h	Fri Dec 1 08:33:55 2000
@@ -90,6 +90,7 @@
 static __inline__ void list_del(struct list_head *entry)
 {
 	__list_del(entry->prev, entry->next);
+	entry->next = entry->prev = 0;
 }
 
 /**

First person to send a ksymoops trace gets a cookie.
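The one-line patch above works by poisoning. Once an entry has been unlinked its pointers are NULL, so a later list_del() through the same stale entry dereferences NULL and oopses at the culprit. Without the patch, that list_del() would silently write into someone else's memory.

A user-space sketch of the same idea follows. It adds the kind of neighbour check a list-debugging patch might perform. `list_del_checked()` and its return codes are hypothetical and are not part of the 2.4 list API.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

static inline void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static inline void list_add(struct list_head *item, struct list_head *head)
{
	item->next = head->next;
	item->prev = head;
	head->next->prev = item;
	head->next = item;
}

/* Hypothetical debugging variant of list_del(): refuse to unlink an entry
 * that was already poisoned, or whose neighbours do not point back at it,
 * then poison it the same way the patch does. */
int list_del_checked(struct list_head *entry)
{
	if (entry->next == NULL || entry->prev == NULL)
		return -1;	/* already deleted */
	if (entry->next->prev != entry || entry->prev->next != entry)
		return -2;	/* neighbours disagree: list is corrupt */
	entry->next->prev = entry->prev;
	entry->prev->next = entry->next;
	entry->next = entry->prev = NULL;	/* poison */
	return 0;
}
```

The plain patch detects a double delete only when the stale pointers are actually followed. An explicit check like this one also reports the case where the entry's neighbours have been recycled and no longer point back at it.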

* Re: corruption
From: Andrew Morton @ 2000-12-01 0:57 UTC
To: Andrew Morton; +Cc: Alexander Viro, linux-kernel

Andrew Morton wrote:
>
> I bet this'll catch it:
>
> --- include/linux/list.h.orig	Fri Dec 1 08:33:36 2000
> +++ include/linux/list.h	Fri Dec 1 08:33:55 2000
> @@ -90,6 +90,7 @@
>  static __inline__ void list_del(struct list_head *entry)
>  {
>  	__list_del(entry->prev, entry->next);
> +	entry->next = entry->prev = 0;
>  }
>  
>  /**
>
> First person to send a ksymoops trace gets a cookie.

mmmm... choc-chip.

With the above patch applied the machine crashed after an hour. It
crashed a second time during the e2fsck. gdb backtrace:

(gdb) l *0xc01755d4
0xc01755d4 is in __make_request (ll_rw_blk.c:744).
739		 * skip first entry, for devices with active queue head
740		 */
741		if (q->head_active && !q->plugged)
742			head = head->next;
743	
744		if (list_empty(head)) {
745			q->plug_device_fn(q, bh->b_rdev); /* is atomic */
746			goto get_rq;
747		}
748	
(gdb) l *0xc0175c3d
0xc0175c3d is in generic_make_request (ll_rw_blk.c:882).
877					buffer_IO_error(bh);
878					break;
879				}
880	
881		}
882		while (q->make_request_fn(q, rw, bh));
883	}
884	
885	/* This function can be used to request a number of buffers from a block
886	   device. Currently the only restriction is that all buffers must belong to
(gdb) l *0xc0175da1
0xc0175da1 is in ll_rw_block (ll_rw_blk.c:924).
919			printk(KERN_NOTICE "Can't write to read-only device %s\n",
920			       kdevname(bhs[0]->b_dev));
921			goto sorry;
922		}
923	
924		for (i = 0; i < nr; i++) {
925			bh = bhs[i];
926	
927			/* Only one thread can actually submit the I/O. */
928			if (test_and_set_bit(BH_Lock, &bh->b_state))
(gdb) l *0xc01309e9
0xc01309e9 is in sync_buffers (/uhome/morton/akpm/linux/include/asm/atomic.h:65).
60			:"=m" (v->counter)
61			:"m" (v->counter));
62	}
63	
64	static __inline__ void atomic_dec(atomic_t *v)
65	{
66		__asm__ __volatile__(
67			LOCK "decl %0"
68			:"=m" (v->counter)
69			:"m" (v->counter));
(gdb) l *0xc0130af7
0xc0130af7 is in fsync_dev (/uhome/morton/akpm/linux/include/asm/current.h:9).
4	struct task_struct;
5	
6	static inline struct task_struct * get_current(void)
7	{
8		struct task_struct *current;
9		__asm__("andl %%esp,%0; ":"=r" (current) : "0" (~8191UL));
10		return current;
11	}
12	
13	#define current get_current()
(gdb) l *0xc0137339
0xc0137339 is in block_fsync (block_dev.c:353).
348	 * since the vma has no handle.
349	 */
350	
351	static int block_fsync(struct file *filp, struct dentry *dentry, int datasync)
352	{
353		return fsync_dev(dentry->d_inode->i_rdev);
354	}
355	
356	/*
357	 * bdev cache handling - shamelessly stolen from inode.c
(gdb) l *0xc0130c76
0xc0130c76 is in sys_fsync (buffer.c:373).
368		if (!file->f_op || !file->f_op->fsync)
369			goto out_putf;
370	
371		/* We need to protect against concurrent writers.. */
372		down(&inode->i_sem);
373		err = file->f_op->fsync(file, dentry, 0);
374		up(&inode->i_sem);
375	
376	out_putf:
377		fput(file);

* Re: corruption
From: Jens Axboe @ 2000-12-01 12:18 UTC
To: Andrew Morton; +Cc: Andrew Morton, Alexander Viro, linux-kernel

On Fri, Dec 01 2000, Andrew Morton wrote:
> Andrew Morton wrote:
> >
> > I bet this'll catch it:
> >
> > --- include/linux/list.h.orig	Fri Dec 1 08:33:36 2000
> > +++ include/linux/list.h	Fri Dec 1 08:33:55 2000
> > @@ -90,6 +90,7 @@
> >  static __inline__ void list_del(struct list_head *entry)
> >  {
> >  	__list_del(entry->prev, entry->next);
> > +	entry->next = entry->prev = 0;
> >  }
> >  
> >  /**
> >
> > First person to send a ksymoops trace gets a cookie.
>
> mmmm... choc-chip.
>
> With the above patch applied the machine crashed after an hour. Crashed
> a second time during the e2fsck. gdb backtrace:

Very interesting. IDE / SCSI?

-- 
* Jens Axboe <axboe@suse.de>
* SuSE Labs

* Re: corruption
From: Andrew Morton @ 2000-12-01 12:34 UTC
To: Jens Axboe; +Cc: linux-kernel

Jens Axboe wrote:
>
> On Fri, Dec 01 2000, Andrew Morton wrote:
> > Andrew Morton wrote:
> > >
> > > I bet this'll catch it:
> > >
> > > --- include/linux/list.h.orig	Fri Dec 1 08:33:36 2000
> > > +++ include/linux/list.h	Fri Dec 1 08:33:55 2000
> > > @@ -90,6 +90,7 @@
> > >  static __inline__ void list_del(struct list_head *entry)
> > >  {
> > >  	__list_del(entry->prev, entry->next);
> > > +	entry->next = entry->prev = 0;
> > >  }
> > >  
> > >  /**
> > >
> > > First person to send a ksymoops trace gets a cookie.
> >
> > mmmm... choc-chip.
> >
> > With the above patch applied the machine crashed after an hour. Crashed
> > a second time during the e2fsck. gdb backtrace:
>
> Very interesting. IDE / SCSI?

hmm.. Overlapping emails.

The crash with e2fsck was easily repeatable with the above patch. Just
dirty a few buffers and run /sbin/sync. It's due to the __make_request
queue_head thing which you fixed in test12-pre3. Yes, this was IDE.

However, the original problem of a list_del being performed on a wild
pointer is being seen on SCSI systems. I expect the above patch will
catch it if it's still happening.

* Re: corruption
From: Jens Axboe @ 2000-12-01 12:37 UTC
To: Andrew Morton; +Cc: linux-kernel

On Fri, Dec 01 2000, Andrew Morton wrote:
> > > mmmm... choc-chip.
> > >
> > > With the above patch applied the machine crashed after an hour. Crashed
> > > a second time during the e2fsck. gdb backtrace:
> >
> > Very interesting. IDE / SCSI?
>
> hmm.. Overlapping emails.
>
> The crash with e2fsck was easily repeatable with the above patch. Just
> dirty a few buffers and run /sbin/sync. It's due to the __make_request
> queue_head thing which you fixed in test12-pre3. Yes, this was IDE.

Ah ok, I thought this was on test12-pre3.

> However the original problem of a list_del being performed on a wild
> pointer is being seen on SCSI systems. I expect the above patch will
> catch it if it's still happening.

Indeed, and I don't think it's request queue_head related anymore. I
will look forward to seeing a trace, though :-)

-- 
* Jens Axboe <axboe@suse.de>
* SuSE Labs

* Re: corruption
From: Andrew Morton @ 2000-12-01 12:23 UTC
To: Lawrence Walton; +Cc: Alexander Viro, linux-kernel

Andrew Morton wrote:
>
> Andrew Morton wrote:
> >
> > I bet this'll catch it:
> >
> > --- include/linux/list.h.orig	Fri Dec 1 08:33:36 2000
> > +++ include/linux/list.h	Fri Dec 1 08:33:55 2000
> > @@ -90,6 +90,7 @@
> >  static __inline__ void list_del(struct list_head *entry)
> >  {
> >  	__list_del(entry->prev, entry->next);
> > +	entry->next = entry->prev = 0;
> >  }
> >  
> >  /**
> >
> > First person to send a ksymoops trace gets a cookie.
>
> mmmm... choc-chip.
>
> With the above patch applied the machine crashed after an hour. Crashed
> a second time during the e2fsck. gdb backtrace:
>

This sync_buffers corruption is fixed by Jens' patch, which is present
in test12-pre3.

However, it is possible that the original list_head-based corruption
which I reported will not be fixed by this patch. This is because none
of the structure offsets match up with what I observed, and because
Lawrence's system is "SCSI-only" - no SCSI drivers are headactive.

Lawrence, did you see this problem with test12-pre3?

* Re: corruption
From: Lawrence Walton @ 2000-12-01 15:04 UTC
To: Andrew Morton; +Cc: Alexander Viro, linux-kernel

Andrew Morton [andrewm@uow.edu.au] wrote:
> Andrew Morton wrote:
> >
> > Andrew Morton wrote:
> > >
> > > I bet this'll catch it:
> > >
> > > --- include/linux/list.h.orig	Fri Dec 1 08:33:36 2000
> > > +++ include/linux/list.h	Fri Dec 1 08:33:55 2000
> > > @@ -90,6 +90,7 @@
> > >  static __inline__ void list_del(struct list_head *entry)
> > >  {
> > >  	__list_del(entry->prev, entry->next);
> > > +	entry->next = entry->prev = 0;
> > >  }
> > >  
> > >  /**
> > >
> > > First person to send a ksymoops trace gets a cookie.
> >
> > mmmm... choc-chip.
> >
> > With the above patch applied the machine crashed after an hour. Crashed
> > a second time during the e2fsck. gdb backtrace:
> >
>
> This sync_buffers corruption is fixed by Jens' patch, which is present
> in test12-pre3.
>
> However it is possible that the original list_head-based corruption which
> I reported will not be fixed by this patch. This is because none of the
> structure offsets match up with what I observed, and because Lawrence's
> system is "SCSI-only" - no SCSI drivers are headactive.
>
> Lawrence, did you see this problem with test12-pre3?

Yes, all of my current problems are with pre3. Is there anything
specific you would like me to test? The best way to cause problems so
far is compiling mozilla.

-- 
*--* Mail: lawrence@otak.com
*--* Voice: 425.739.4247
*--* Fax: 425.827.9577
*--* HTTP://www.otak-k.com/~lawrence/
--------------------------------------
- - - - - - O t a k i n c . - - - - -

* Re: corruption
From: Stephen C. Tweedie @ 2000-12-01 14:16 UTC
To: Andrew Morton
Cc: Alexander Viro, Jonathan Hudson, linux-kernel, Stephen Tweedie

Hi,

On Fri, Dec 01, 2000 at 08:35:41AM +1100, Andrew Morton wrote:
>
> I bet this'll catch it:
>
>  static __inline__ void list_del(struct list_head *entry)
>  {
>  	__list_del(entry->prev, entry->next);
> +	entry->next = entry->prev = 0;
>  }

No, because the buffer hash list is never referenced unless
buffer->b_inode is non-null, so we can't ever do a double-list_del on
the buffer.

The patch below should fix it. It has been sent to Linus. The
important part is the first hunk of the inode.c diff.

Cheers,
 Stephen

fsync-fix2.diff:

--- fs/buffer.c.~1~	Wed Nov 29 15:16:43 2000
+++ fs/buffer.c	Fri Dec 1 00:41:28 2000
@@ -871,10 +871,11 @@
 		else {
 			bh->b_inode = &tmp;
 			list_add(&bh->b_inode_buffers, &tmp.i_dirty_buffers);
-			atomic_inc(&bh->b_count);
 			if (buffer_dirty(bh)) {
+				atomic_inc(&bh->b_count);
 				spin_unlock(&lru_list_lock);
 				ll_rw_block(WRITE, 1, &bh);
+				brelse(bh);
 				spin_lock(&lru_list_lock);
 			}
 		}
@@ -883,6 +884,7 @@
 	while (!list_empty(&tmp.i_dirty_buffers)) {
 		bh = BH_ENTRY(tmp.i_dirty_buffers.prev);
 		remove_inode_queue(bh);
+		atomic_inc(&bh->b_count);
 		spin_unlock(&lru_list_lock);
 		wait_on_buffer(bh);
 		if (!buffer_uptodate(bh))
@@ -929,9 +931,9 @@
 			atomic_inc(&bh->b_count);
 			spin_unlock(&lru_list_lock);
 			wait_on_buffer(bh);
-			brelse(bh);
 			if (!buffer_uptodate(bh))
 				err = -EIO;
+			brelse(bh);
 			spin_lock(&lru_list_lock);
 			goto repeat;
 		}
--- fs/inode.c.~1~	Wed Nov 29 15:16:43 2000
+++ fs/inode.c	Fri Dec 1 00:40:26 2000
@@ -77,7 +77,13 @@
 #define alloc_inode() \
 	 ((struct inode *) kmem_cache_alloc(inode_cachep, SLAB_KERNEL))
-#define destroy_inode(inode) kmem_cache_free(inode_cachep, (inode))
+static void destroy_inode(struct inode *inode)
+{
+	if (!list_empty(&inode->i_dirty_buffers))
+		BUG();
+	kmem_cache_free(inode_cachep, (inode));
+}
+
 
 /*
  * These are initializations that only need to be done
@@ -348,6 +354,12 @@
 
 void clear_inode(struct inode *inode)
 {
+	if (!list_empty(&inode->i_dirty_buffers)) {
+		if (inode->i_nlink)
+			BUG();
+		invalidate_inode_buffers(inode);
+	}
+
 	if (inode->i_data.nrpages)
 		BUG();
 	if (!(inode->i_state & I_FREEING))
@@ -407,6 +419,7 @@
 		inode = list_entry(tmp, struct inode, i_list);
 		if (inode->i_sb != sb)
 			continue;
+		invalidate_inode_buffers(inode);
 		if (!atomic_read(&inode->i_count)) {
 			list_del(&inode->i_hash);
 			INIT_LIST_HEAD(&inode->i_hash);
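The inode.c part of the patch closes the window in which an inode can be freed while buffers are still queued on its i_dirty_buffers list. clear_inode() now detaches any remaining buffers first, and destroy_inode() BUG()s if the list is still non-empty.

The user-space model below shows that ordering. `struct obj`, `struct buf`, `queue_buf()`, `unqueue_buf()`, `invalidate_bufs()` and `destroy_obj()` are hypothetical stand-ins for struct inode, struct buffer_head, the buffer queueing code, invalidate_inode_buffers() and destroy_inode(). They are not the kernel implementations. The `owner` back-pointer plays the role of b_inode: once it is cleared, a later unqueue touches nothing but the buffer itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static inline void INIT_LIST_HEAD(struct list_head *h) { h->next = h->prev = h; }
static inline int list_empty(const struct list_head *h) { return h->next == h; }

static inline void list_add(struct list_head *item, struct list_head *head)
{
	item->next = head->next;
	item->prev = head;
	head->next->prev = item;
	head->next = item;
}

static inline void list_del(struct list_head *e)
{
	e->next->prev = e->prev;
	e->prev->next = e->next;
}

/* Hypothetical stand-ins: obj ~ struct inode, buf ~ struct buffer_head. */
struct obj {
	int id;
	struct list_head dirty;		/* ~ i_dirty_buffers */
};

struct buf {
	struct list_head link;		/* ~ b_inode_buffers */
	struct obj *owner;		/* ~ b_inode */
};

void queue_buf(struct buf *b, struct obj *o)
{
	b->owner = o;
	list_add(&b->link, &o->dirty);
}

/* Only touch the owner's list while the buffer still has an owner. */
void unqueue_buf(struct buf *b)
{
	if (b->owner != NULL) {
		list_del(&b->link);
		b->owner = NULL;
	}
}

/* ~ invalidate_inode_buffers(): detach every buffer still queued. */
void invalidate_bufs(struct obj *o)
{
	while (!list_empty(&o->dirty))
		unqueue_buf(container_of(o->dirty.next, struct buf, link));
}

/* ~ the patched destroy_inode(): refuse to free an object with queued buffers. */
int destroy_obj(struct obj *o)
{
	if (!list_empty(&o->dirty))
		return -1;	/* the kernel patch calls BUG() here instead */
	free(o);
	return 0;
}
```

Calling invalidate_bufs() before destroy_obj() means that a late unqueue_buf() on an already-detached buffer is a no-op. Without that step, the late unlink is exactly the stale list_del() into freed memory discussed earlier in the thread.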

* Re: corruption
From: Andrew Morton @ 2000-12-01 23:28 UTC
To: Stephen C. Tweedie; +Cc: Alexander Viro, Jonathan Hudson, linux-kernel

"Stephen C. Tweedie" wrote:
>
> Hi,
>
> On Fri, Dec 01, 2000 at 08:35:41AM +1100, Andrew Morton wrote:
> >
> > I bet this'll catch it:
> >
> >  static __inline__ void list_del(struct list_head *entry)
> >  {
> >  	__list_del(entry->prev, entry->next);
> > +	entry->next = entry->prev = 0;
> >  }
>
> No, because the buffer hash list is never referenced unless
> buffer->b_inode is non-null, so we can't ever do a double-list_del on
> the buffer.

You are right. An overnight diskthrash with the above patch didn't
oops, but it turned up three instances of the EXT2 directory warnings.

Incidentally, there's something wrong in blockdev/VM land. The
overnight run consisted of:

- one looping instance of `dbench 4 ; sleep 120'
- one looping instance of `lmbench ; sleep 120'
- one looping instance of `bonnie++ ; sleep 120'
- one looping instance of `mmap001;mmap002;misc001;sleep 120'

Things which went wrong (after ten hours):

- the first dbench run never completed.
- the first bonnie++ run never completed.
- I then killed everything with ALT-SYSRQ-T. It's been 20 minutes
  now and the disk LED is *still* hard on. This machine has 256 megs
  and the hdparm disk bandwidth is 20 megs/sec.

You can observe the latter problem pretty easily by running `misc001'
on its own. Kill it after 20 seconds and the disk remains active for
*ages*, usually ninety seconds. That is long enough to write all of
physical memory out ten times...

> The patch below should fix it. It has been sent to Linus. The
> important part is the first hunk of the inode.c diff.

Okay, will test.

(25 minutes now. It's fsck time...)

* Re: corruption
From: kumon @ 2000-12-02 0:30 UTC
To: Andrew Morton
Cc: Stephen C. Tweedie, Alexander Viro, Jonathan Hudson, linux-kernel, kumon

Hi, Andrew,

Andrew Morton writes:
> - I then killed everything with ALT-SYSRQ-T. It's been 20 minutes
>   now and the disk LED is *still* hard on. This machine has 256 megs
>   and the hdparm disk bandwidth is 20 megs/sec.
>
> You can observe the latter problem pretty easily by running `misc001' on
> its own. Kill it after 20 seconds and the disk remains active for *ages*.
> Usually ninety seconds. Long enough to write all physical memory out
> ten times...

If the benchmark writes blocks completely at random, the random-write
rate should be used instead of the sequential bandwidth. It is typically
30 blk/s to 200 blk/s, depending on the seek speed. Ninety seconds of
writing then corresponds to 10MB to 72MB of dirty buffers (4K fs blocks),
which is not so crazy.

But 30 minutes is still too long for 256MB of physical memory. I think
these are different problems, aren't they?

-- 
Computer Systems Laboratory, Fujitsu Labs.
kumon@flab.fujitsu.co.jp
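kumon's estimate is simply elapsed seconds × random-write rate × block size. The helper below is a hypothetical worked check. It assumes 4 KiB blocks and the 30 to 200 blocks/s range quoted in the message.

```c
#include <assert.h>

/* Amount of dirty data (KiB) a disk can flush in `seconds` when it
 * sustains `blocks_per_sec` random writes of `block_kib` KiB each. */
unsigned long flushable_kib(unsigned long seconds, unsigned long blocks_per_sec,
			    unsigned long block_kib)
{
	return seconds * blocks_per_sec * block_kib;
}
```

flushable_kib(90, 30, 4) is 10800 KiB, about 10.5 MiB. flushable_kib(90, 200, 4) is 72000 KiB, about 70 MiB. Together these give the "10MB to 72MB" range in the message.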

* Re: corruption
From: Andrew Morton @ 2000-12-02 3:59 UTC
To: Stephen C. Tweedie; +Cc: Alexander Viro, Jonathan Hudson, linux-kernel

"Stephen C. Tweedie" wrote:
>
> Hi,
>
> On Fri, Dec 01, 2000 at 08:35:41AM +1100, Andrew Morton wrote:
> >
> > I bet this'll catch it:
> >
> >  static __inline__ void list_del(struct list_head *entry)
> >  {
> >  	__list_del(entry->prev, entry->next);
> > +	entry->next = entry->prev = 0;
> >  }
>
> No, because the buffer hash list is never referenced unless
> buffer->b_inode is non-null, so we can't ever do a double-list_del on
> the buffer.
>
> The patch below should fix it. It has been sent to Linus. The
> important part is the first hunk of the inode.c diff.

Testing test12-pre3 with your inode.c patch and the above list_del
patch. x86 dual processor, IDE. Same workload as before, except I cut
out misc001 (and the machine recovered almost immediately when I killed
everything! Need more testing to characterise this).

kernel BUG at inode.c:83! The trace is below. Now, this was probably
triggered by my list_del change. If so, it means that we're doing a
list_empty() test on a list_head which has actually been deleted from a
list. So it's possibly the actual assertion in destroy_inode() which is
a little bogus.

But the weird thing is that this BUG only hit a single time, after
three hours of intensive testing. If my theory is right, the BUG should
hit every time. Will investigate further...

ksymoops 0.7c on i686 2.4.0-test12-pre3. Options used
 -V (default)
 -k /proc/ksyms (default)
 -l /proc/modules (default)
 -o /lib/modules/2.4.0-test12-pre3/ (default)
 -m /usr/src/linux/System.map (default)

Warning: You did not tell me where to find symbol information. I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc. ksymoops -h explains the options.

Reading Oops report from the terminal
Dec 2 13:15:45 mnm kernel: invalid operand: 0000
Dec 2 13:15:45 mnm kernel: CPU: 0
Dec 2 13:15:45 mnm kernel: EIP: 0010:[<c014570e>]
Using defaults from ksymoops -t elf32-i386 -a i386
Dec 2 13:15:45 mnm kernel: EFLAGS: 00010282
Dec 2 13:15:45 mnm kernel: eax: 0000001a ebx: c78686e0 ecx: 00000000 edx: 0000002f
Dec 2 13:15:45 mnm kernel: esi: c025b800 edi: cd950960 ebp: c7869160 esp: ce611f3c
Dec 2 13:15:45 mnm kernel: ds: 0018 es: 0018 ss: 0018
Dec 2 13:15:45 mnm kernel: Process dbench (pid: 13094, stackpage=ce611000)
Dec 2 13:15:45 mnm kernel: Stack: c021b7e5 c021b8a5 00000053 c78686e0 c0146916 c78686e0 c7869160 c78686e0
Dec 2 13:15:45 mnm kernel:        c0145096 c78686e0 00000000 ce610000 c013de4d c7869160 c7869160 c9b1e000
Dec 2 13:15:45 mnm kernel:        ce611fa4 c7869160 cd9509d0 c013df25 cd950960 c7869160 ce610000 bffff5ca
Dec 2 13:15:45 mnm kernel: Call Trace: [<c021b7e5>] [<c021b8a5>] [<c0146916>] [<c0145096>] [<c013de4d>] [<c013df25>
Dec 2 13:15:45 mnm kernel:        [<c010002b>]
Dec 2 13:15:45 mnm kernel: Code: 0f 0b 83 c4 0c 53 a1 10 d1 2a c0 50 e8 cd 3d fe ff 83 c4 08

>>EIP; c014570e <destroy_inode+1e/34>   <=====
Trace; c021b7e5 <tvecs+5a3d/1a3d8>
Trace; c021b8a5 <tvecs+5afd/1a3d8>
Trace; c0146916 <iput+18e/194>
Trace; c0145096 <d_delete+66/ac>
Trace; c013de4d <vfs_unlink+18d/1c0>
Trace; c013df25 <sys_unlink+a5/118>
Trace; c010002b <startup_32+2b/cb>
Code;  c014570e <destroy_inode+1e/34>
00000000 <_EIP>:
Code;  c014570e <destroy_inode+1e/34>   <=====
   0:   0f 0b                ud2a      <=====
Code;  c0145710 <destroy_inode+20/34>
   2:   83 c4 0c             add    $0xc,%esp
Code;  c0145713 <destroy_inode+23/34>
   5:   53                   push   %ebx
Code;  c0145714 <destroy_inode+24/34>
   6:   a1 10 d1 2a c0       mov    0xc02ad110,%eax
Code;  c0145719 <destroy_inode+29/34>
   b:   50                   push   %eax
Code;  c014571a <destroy_inode+2a/34>
   c:   e8 cd 3d fe ff       call   fffe3dde <_EIP+0xfffe3dde> c01294ec <kmem_cache_free+0/7c>
Code;  c014571f <destroy_inode+2f/34>
  11:   83 c4 08             add    $0x8,%esp

1 warning issued. Results may not be reliable.
* Re: corruption
2000-12-02 3:59 ` corruption Andrew Morton
@ 2000-12-02 14:00 ` Andrew Morton
2000-12-02 15:33 ` corruption Alexander Viro
0 siblings, 1 reply; 55+ messages in thread
From: Andrew Morton @ 2000-12-02 14:00 UTC
To: Stephen C. Tweedie, Alexander Viro, Jonathan Hudson, linux-kernel

Andrew Morton wrote:
>
> "Stephen C. Tweedie" wrote:
> >
> > Hi,
> >
> > On Fri, Dec 01, 2000 at 08:35:41AM +1100, Andrew Morton wrote:
> > >
> > > I bet this'll catch it:
> > >
> > > static __inline__ void list_del(struct list_head *entry)
> > > {
> > > 	__list_del(entry->prev, entry->next);
> > > +	entry->next = entry->prev = 0;
> > > }
> >
> > No, because the buffer hash list is never referenced unless
> > buffer->b_inode is non-null, so we can't ever do a double-list_del on
> > the buffer.
> >
> > The patch below should fix it. It has been sent to Linus. The
> > important part is the first hunk of the inode.c diff.
>
> Testing test11-pre3 with your inode.c patch and the above list_del
> patch. x86 dual processor, IDE. Same workload as before, except
> I cut out misc001 (and the machine recovered almost immediately
> when I killed everything! Need more testing to characterise this).
>
> kernel BUG at inode.c:83! The trace is below. Now, this was
> probably triggered by my list_del change. If so it means
> that we're doing a list_empty() test on a list_head which
> has actually been deleted from a list. So it's possibly the
> actual assertion in destroy_inode() which is a little bogus.
>
> But the weird thing is that this BUG only hit a single time,
> after three hours of intensive testing. If my theory is
> right, the BUG should hit every time. Will investigate further...
>

It appears that this problem is not fixed.
My destroy_inode() is now

static void destroy_inode(struct inode *inode)
{
	if (!list_empty(&inode->i_dirty_buffers)) {
		printk("&inode->i_dirty_buffers=0x%p\n", &inode->i_dirty_buffers);
		printk("next=0x%p\n", inode->i_dirty_buffers.next);
		printk("prev=0x%p\n", inode->i_dirty_buffers.prev);
		BUG();
	}
	kmem_cache_free(inode_cachep, (inode));
}

After 45 minutes of running the previously described tests:

Dec 2 16:14:30 mnm kernel: &inode->i_dirty_buffers=0xcfe16878
Dec 2 16:14:30 mnm kernel: next=0xc16f3678
Dec 2 16:14:30 mnm kernel: prev=0xc16f3678
Dec 2 16:14:30 mnm kernel: kernel BUG at inode.c:86!

We're throwing away an inode which has live data on its dirty buffers list.

This is 2.4.0-test11-pre3 + list_del patch + sct's inode patch
(buffer.c, inode.c). x86 dual processor. gcc 2.91.66.

I rediffed my tree. No rogue patches.

It's not an SMP thing: the address patterns match up with the
previously reported uniprocessor corruption.
* Re: corruption
2000-12-02 14:00 ` corruption Andrew Morton
@ 2000-12-02 15:33 ` Alexander Viro
2000-12-02 16:39 ` corruption Petr Vandrovec ` (2 more replies)
0 siblings, 3 replies; 55+ messages in thread
From: Alexander Viro @ 2000-12-02 15:33 UTC
To: Linus Torvalds
Cc: Stephen C. Tweedie, Andrew Morton, Jonathan Hudson, linux-kernel

On Sun, 3 Dec 2000, Andrew Morton wrote:

> It appears that this problem is not fixed.

Sure, it isn't. Place where the shit hits the fan: fs/buffer.c::unmap_buffer().
Add the call of remove_inode_queue(bh) there and see if it helps. I.e.

ed fs/buffer.c <<EOF
/unmap_buffer/
/}/i
	remove_inode_queue(bh);
.
wq
EOF

Linus, could you apply that? We are leaving the unmapped buffers on the
inode queue. I.e. every truncate_inode_pages() leaves a lot of junk around.
Now, guess what happens when we destroy the last link to inode that nobody
keeps open...
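Where exactly the ed script puts the new line (a question Petr raises in the next message) can be checked with a small model of ed's addressing. This is a hypothetical helper, not part of any tool: it treats `/re/` as a plain substring search starting after the current line (no regex, no wrap-around), and the body of unmap_buffer() is an approximation of the 2.4.0-test12 code, reconstructed from the fs/buffer.c hunk Andrew posts later in the thread.

```c
#include <assert.h>
#include <string.h>

/*
 * Approximate body of unmap_buffer() in 2.4.0-test12-pre3 (an assumption,
 * reconstructed from the fs/buffer.c hunk quoted later in the thread).
 */
static const char *unmap_buffer_src[] = {
	"static void unmap_buffer(struct buffer_head * bh)",
	"{",
	"\tif (buffer_mapped(bh)) {",
	"\t\tmark_buffer_clean(bh);",
	"\t\twait_on_buffer(bh);",
	"\t\tclear_bit(BH_Uptodate, &bh->b_state);",
	"\t\tclear_bit(BH_Mapped, &bh->b_state);",
	"\t\tclear_bit(BH_Req, &bh->b_state);",
	"\t\tclear_bit(BH_New, &bh->b_state);",
	"\t}",
	"}",
};

/*
 * Like ed's "/re/" address, but with a substring instead of a regex:
 * index of the first line after `cur` containing `needle`, or -1.
 * Wrap-around past the last line is not modeled.
 */
static int ed_search(const char **lines, int n, int cur, const char *needle)
{
	for (int i = cur + 1; i < n; i++)
		if (strstr(lines[i], needle) != NULL)
			return i;
	return -1;
}

/*
 * Index of the line before which "/unmap_buffer/" then "/}/i" inserts:
 * the first line containing '}' after the line matching unmap_buffer.
 */
static int ed_insert_point(const char **lines, int n)
{
	int fn = ed_search(lines, n, -1, "unmap_buffer");	/* search from the top */

	if (fn < 0)
		return -1;
	return ed_search(lines, n, fn, "}");
}
```

In this model the insertion point is index 9, the `}` that closes `if (buffer_mapped(bh))`, so the new call runs only for mapped buffers. That matches Petr's reading and the placement in Andrew's later diff.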
* Re: corruption
2000-12-02 15:33 ` corruption Alexander Viro
@ 2000-12-02 16:39 ` Petr Vandrovec
2000-12-02 17:50 ` corruption Alexander Viro
2000-12-02 17:59 ` corruption Alexander Viro
2000-12-03 21:44 ` corruption Andrew Morton
2000-12-04 15:00 ` corruption Stephen C. Tweedie
2 siblings, 2 replies; 55+ messages in thread
From: Petr Vandrovec @ 2000-12-02 16:39 UTC
To: Alexander Viro
Cc: Linus Torvalds, Stephen C. Tweedie, Andrew Morton, Jonathan Hudson, linux-kernel

On Sat, Dec 02, 2000 at 10:33:36AM -0500, Alexander Viro wrote:
> On Sun, 3 Dec 2000, Andrew Morton wrote:
>
> > It appears that this problem is not fixed.
>
> Sure, it isn't. Place where the shit hits the fan: fs/buffer.c::unmap_buffer().
> Add the call of remove_inode_queue(bh) there and see if it helps. I.e.
>
> ed fs/buffer.c <<EOF
> /unmap_buffer/
> /}/i
> 	remove_inode_queue(bh);
> .
> wq
> EOF
>
> Linus, could you apply that? We are leaving the unmapped buffers on the
> inode queue. I.e. every truncate_inode_pages() leaves a lot of junk around.
> Now, guess what happens when we destroy the last link to inode that nobody
> keeps open...

Nothing new (was it meant to run remove_inode_queue() conditionally inside
the buffer_mapped() branch? ed did it that way). First is the list of buffers
at the time of destroy_inode, then the process. If you want the full oops
trace, it is available at http://platan.vc.cvut.cz/oops3.txt, but the last
part is always iput. For now I'm back on test9, as I lost inetd.conf again :-(
Someone should shoot the sendmail Debian maintainer... Running update-inetd
at startup is a really bad idea, as fsck then usually removes both the old
and new inetd.conf, so I'm back on inetd.conf from 25 Aug 1999 :-(((

Fields printed from buffer_head are b_next, b_blocknr, b_size, b_list,
b_count, b_state and b_inode. (Oops, now I see that I left
remove_inode_queue(bh) in the printing loop (I copied it from
invalidate_inode_buffers()), but it should not hurt, I believe. Dirty buffers
should find their way to disk anyway, or not?)

Best regards,
Petr Vandrovec
vandrove@vc.cvut.cz

BTW, running 'ksymoops < oops > oops.txt' is a great source of the errors
below, as it (probably) uses a couple of unlinked temporary files...

next=00000000, nr=2457654, size=4096, list=2 count=0 state=0000001B inode=c7bf3ce0
Process mount (pid: 30, stackpage=c7df3000)
next=00000000, nr=2457654, size=4096, list=2 count=0 state=0000001B inode=c7bf3ce0
Process mount (pid: 31, stackpage=c7df3000)
next=00000000, nr=2457654, size=4096, list=2 count=0 state=0000001B inode=c7c1a860
Process mount (pid: 68, stackpage=c7997000)
next=00000000, nr=2457654, size=4096, list=2 count=0 state=0000001B inode=c7b54c80
Process mount (pid: 70, stackpage=c7997000)
next=00000000, nr=2457654, size=4096, list=2 count=0 state=0000001B inode=c7b54c80
Process mount (pid: 70, stackpage=c7997000)
next=00000000, nr=3375109, size=4096, list=2 count=0 state=0000001B inode=c77e2260
Process rm (pid: 82, stackpage=c7b35000)
next=00000000, nr=3506180, size=4096, list=2 count=0 state=0000001B inode=c77eb9c0
Process rm (pid: 121, stackpage=c776d000)
next=00000000, nr=3506180, size=4096, list=2 count=0 state=0000001B inode=c77ebb40
next=00000000, nr=3507147, size=4096, list=2 count=0 state=0000001B inode=c77ebb40
Process rm (pid: 122, stackpage=c776d000)
next=00000000, nr=1179657, size=4096, list=2 count=0 state=0000001B inode=c77ebb40
Process rm (pid: 123, stackpage=c776d000)
next=00000000, nr=294919, size=4096, list=2 count=0 state=0000001B inode=c77eb240
Process rm (pid: 129, stackpage=c775d000)
next=00000000, nr=294919, size=4096, list=2 count=0 state=0000001B inode=c77eb0c0
Process rm (pid: 130, stackpage=c775d000)
next=00000000, nr=2457654, size=4096, list=2 count=0 state=0000001B inode=c7889c60
Process mv (pid: 138, stackpage=c7985000)
next=c796fce0, nr=294916, size=4096, list=2 count=0 state=0000001B inode=c77eb6c0
Process rm (pid: 142, stackpage=c7721000)
next=00000000, nr=2457654, size=4096, list=2 count=0 state=0000001B inode=c75e83a0
Process update-inetd (pid: 273, stackpage=c715d000)
next=00000000, nr=2457654, size=4096, list=2 count=0 state=0000001B inode=c7b359a0
Process rm (pid: 305, stackpage=c6b33000)
next=00000000, nr=2457654, size=4096, list=2 count=0 state=0000001B inode=c7b356a0
Process rm (pid: 305, stackpage=c6b33000)
next=c6ed8980, nr=753674, size=4096, list=2 count=0 state=0000001B inode=c655a3a0
Process mc (pid: 330, stackpage=c6703000)
next=00000000, nr=196617, size=4096, list=2 count=0 state=0000001B inode=c656c4e0
Process mc (pid: 330, stackpage=c6703000)
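Every buffer in the dump above shows state=0000001B. Assuming the 2.4-era ordering of `enum bh_state_bits` (BH_Uptodate, BH_Dirty, BH_Lock, BH_Req, BH_Mapped, BH_New, BH_Protected; worth checking against include/linux/fs.h of the tree in question), a small decoder reads that value as Uptodate, Dirty, Req and Mapped: the buffers are still dirty and mapped when destroy_inode() sees them. The decoder below is an illustrative sketch, not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed bit order of the 2.4-era enum bh_state_bits. */
static const char *const bh_state_names[] = {
	"Uptodate", "Dirty", "Lock", "Req", "Mapped", "New", "Protected",
};

/*
 * Write the names of the bits set in `state` into `out` (len >= 1),
 * separated by single spaces; output is truncated to fit `len`.
 */
static void decode_bh_state(unsigned long state, char *out, size_t len)
{
	size_t nbits = sizeof(bh_state_names) / sizeof(bh_state_names[0]);

	out[0] = '\0';
	for (size_t bit = 0; bit < nbits; bit++) {
		if (!(state & (1UL << bit)))
			continue;
		if (out[0] != '\0')
			strncat(out, " ", len - strlen(out) - 1);
		strncat(out, bh_state_names[bit], len - strlen(out) - 1);
	}
}
```

For example, `decode_bh_state(0x1B, buf, sizeof buf)` leaves "Uptodate Dirty Req Mapped" in `buf`.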
* Re: corruption
2000-12-02 16:39 ` corruption Petr Vandrovec
@ 2000-12-02 17:50 ` Alexander Viro
2000-12-02 17:59 ` corruption Alexander Viro
1 sibling, 0 replies; 55+ messages in thread
From: Alexander Viro @ 2000-12-02 17:50 UTC
To: Petr Vandrovec
Cc: Linus Torvalds, Stephen C. Tweedie, Andrew Morton, Jonathan Hudson, linux-kernel

On Sat, 2 Dec 2000, Petr Vandrovec wrote:

> Nothing new (was it meant to run remove_inode_queue() conditionally inside
> buffer_mapped() branch? ed did it that way). First is list of buffers at
> time of destroy_inode, then process. If you want full oops trace, it is

What list of buffers? The ->i_dirty_buffers one?

> available at http://platan.vc.cvut.cz/oops3.txt, but last part is always
> iput. For now I'm back on test9, as I lost inetd.conf again :-( Someone
> should shoot sendmail Debian maintainer... Running update-inetd at startup
> is really bad idea, as fsck then usually removes both old and new inetd.conf,
> so I'm back on inetd.conf from 25 Aug 1999 :-(((
>
> Fields printed from buffer_head are b_next, b_blocknr, b_size, b_list,
> b_count, b_state and b_inode. (oops now I see that I left
> remove_inode_queue(bh) in printing loop (I copied it from
> invalidate_inode_buffers()), but it should not hurt, I believe. Dirty buffers
> should find its way to disk anyway, or not?)

When you delete the inode? Why would they?

Petr, could you send me the diff between the variant you've run and
pristine 12-pre3? I'd really like to see what exactly was doing the printks...
* Re: corruption
2000-12-02 16:39 ` corruption Petr Vandrovec
2000-12-02 17:50 ` corruption Alexander Viro
@ 2000-12-02 17:59 ` Alexander Viro
2000-12-03 20:24 ` corruption Jonathan Hudson
1 sibling, 1 reply; 55+ messages in thread
From: Alexander Viro @ 2000-12-02 17:59 UTC
To: Petr Vandrovec
Cc: Linus Torvalds, Stephen C. Tweedie, Andrew Morton, Jonathan Hudson, linux-kernel

On Sat, 2 Dec 2000, Petr Vandrovec wrote:

[I wrote:]
> > ed fs/buffer.c <<EOF
> > /unmap_buffer/
> > /}/i
	spin_lock(&lru_list_lock);
> > 	remove_inode_queue(bh);
	spin_unlock(&lru_list_lock);
> > .
> > wq
> > EOF

Damn. I claim the sudden idiocy attack - didn't look at the locking rules
for the ->b_inode_buffers. My apologies.
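Al's correction is that remove_inode_queue() touches ->b_inode_buffers, which is protected by lru_list_lock, so the call has to sit between spin_lock() and spin_unlock(). The userspace model below is only a sketch of that rule: the lock is a plain flag checked with assert() rather than a spinlock, and struct inode and struct buffer_head are cut down to the fields involved (all simplified stand-ins, not the kernel definitions). It shows the corrected unmap_buffer() leaving the inode's dirty list empty, which is the condition Andrew's debugging destroy_inode() checks.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h->prev = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }

static void list_add(struct list_head *n, struct list_head *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	INIT_LIST_HEAD(e);
}

/* Stand-in for lru_list_lock: a flag, so the locking rule can be asserted. */
static int lru_list_lock_held;
static void lock_lru(void)   { assert(!lru_list_lock_held); lru_list_lock_held = 1; }
static void unlock_lru(void) { assert(lru_list_lock_held);  lru_list_lock_held = 0; }

struct inode { struct list_head i_dirty_buffers; };
struct buffer_head {
	struct list_head b_inode_buffers;
	struct inode *b_inode;
	int mapped;
};

/* Touches ->b_inode_buffers, so the caller must hold the (model) lock. */
static void remove_inode_queue(struct buffer_head *bh)
{
	assert(lru_list_lock_held);
	if (bh->b_inode) {
		list_del_init(&bh->b_inode_buffers);
		bh->b_inode = NULL;
	}
}

static void queue_on_inode(struct buffer_head *bh, struct inode *inode)
{
	lock_lru();
	bh->b_inode = inode;
	list_add(&bh->b_inode_buffers, &inode->i_dirty_buffers);
	unlock_lru();
}

/* Model of the corrected unmap_buffer(): unlink from the inode queue under the lock. */
static void unmap_buffer(struct buffer_head *bh)
{
	if (bh->mapped) {
		bh->mapped = 0;
		lock_lru();
		remove_inode_queue(bh);
		unlock_lru();
	}
}
```

Calling remove_inode_queue() without lock_lru() trips the assert, which is the model's version of the mistake Al owns up to.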
* Re: corruption
2000-12-02 17:59 ` corruption Alexander Viro
@ 2000-12-03 20:24 ` Jonathan Hudson
0 siblings, 0 replies; 55+ messages in thread
From: Jonathan Hudson @ 2000-12-03 20:24 UTC
To: linux-kernel

In article <Pine.GSO.4.21.0012021255330.28923-100000@weyl.math.psu.edu>,
	Alexander Viro <viro@math.psu.edu> writes:
AV>
>> > ed fs/buffer.c <<EOF
>> > /unmap_buffer/
>> > /}/i
AV> 	spin_lock(&lru_list_lock);
>> > 	remove_inode_queue(bh);
AV> 	spin_unlock(&lru_list_lock);
>> > .
>> > wq
>> > EOF
AV>

I applied this on top of the previous SCT patch, and have thrashed the
system harder than I would have dared previously. It's still running. I
feel very comfortable with this, much more so than with any prior 2.4.0t*.
* Re: corruption
2000-12-02 15:33 ` corruption Alexander Viro
2000-12-02 16:39 ` corruption Petr Vandrovec
@ 2000-12-03 21:44 ` Andrew Morton
2000-12-03 22:45 ` [resync?] corruption Alexander Viro
2000-12-04 15:00 ` corruption Stephen C. Tweedie
2 siblings, 1 reply; 55+ messages in thread
From: Andrew Morton @ 2000-12-03 21:44 UTC
To: Alexander Viro, Petr Vandrovec
Cc: Linus Torvalds, Stephen C. Tweedie, Jonathan Hudson, linux-kernel

Alexander Viro wrote:
>
> On Sun, 3 Dec 2000, Andrew Morton wrote:
>
> > It appears that this problem is not fixed.
>
> Sure, it isn't. Place where the shit hits the fan: fs/buffer.c::unmap_buffer().
> Add the call of remove_inode_queue(bh) there and see if it helps. I.e.
>
> ed fs/buffer.c <<EOF
> /unmap_buffer/
> /}/i
> 	remove_inode_queue(bh);
> .
> wq
> EOF
>

Sorry, it's still failing. It took three hours.

&inode->i_dirty_buffers=0xca9e63f8
next=0xc30a2598
prev=0xc30a2598
kernel BUG at inode.c:86!

The ksymoops output is here, as is my current diff wrt test12-pre3.

ksymoops 0.7c on i686 2.4.0-test12-pre3. Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.0-test12-pre3/ (default)
     -m /usr/src/linux/System.map (default)

Warning: You did not tell me where to find symbol information. I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc. ksymoops -h explains the options.
Reading Oops report from the terminal
Dec 4 01:56:02 mnm kernel: EIP: 0010:[<c0145758>]
Using defaults from ksymoops -t elf32-i386 -a i386
Dec 4 01:56:02 mnm kernel: EFLAGS: 00010282
Dec 4 01:56:02 mnm kernel: eax: 0000001a ebx: ca9e63e0 ecx: 00000001 edx: 00000000
Dec 4 01:56:02 mnm kernel: esi: c025b8a0 edi: cfbcb040 ebp: ce2b0260 esp: ced4df3c
Dec 4 01:56:02 mnm kernel: ds: 0018 es: 0018 ss: 0018
Dec 4 01:56:02 mnm kernel: Process lat_fs (pid: 17559, stackpage=ced4d000)
Dec 4 01:56:02 mnm kernel: Stack: c021b845 c021b939 00000056 ca9e63e0 c0146966 ca9e63e0 ce2b0260 ca9e63e0
Dec 4 01:56:02 mnm kernel: Call Trace: [<c021b845>] [<c021b939>] [<c0146966>] [<c01450b6>] [<c013de6d>] [<c013df45>] [<c0108fdf>]
Dec 4 01:56:02 mnm kernel: Code: 0f 0b 83 c4 0c 8d 76 00 53 a1 10 d1 2a c0 50 e8 80 3d fe ff

>>EIP; c0145758 <destroy_inode+48/64> <=====
Trace; c021b845 <tvecs+5a3d/1a418>
Trace; c021b939 <tvecs+5b31/1a418>
Trace; c0146966 <iput+18e/194>
Trace; c01450b6 <d_delete+66/ac>
Trace; c013de6d <vfs_unlink+18d/1c0>
Trace; c013df45 <sys_unlink+a5/118>
Trace; c0108fdf <system_call+33/38>
Code; c0145758 <destroy_inode+48/64>
00000000 <_EIP>:
Code; c0145758 <destroy_inode+48/64> <=====
   0: 0f 0b ud2a <=====
Code; c014575a <destroy_inode+4a/64>
   2: 83 c4 0c add $0xc,%esp
Code; c014575d <destroy_inode+4d/64>
   5: 8d 76 00 lea 0x0(%esi),%esi
Code; c0145760 <destroy_inode+50/64>
   8: 53 push %ebx
Code; c0145761 <destroy_inode+51/64>
   9: a1 10 d1 2a c0 mov 0xc02ad110,%eax
Code; c0145766 <destroy_inode+56/64>
   e: 50 push %eax
Code; c0145767 <destroy_inode+57/64>
   f: e8 80 3d fe ff call fffe3d94 <_EIP+0xfffe3d94> c01294ec <kmem_cache_free+0/7c>

1 warning issued. Results may not be reliable.
--- linux-2.4.0-test12-pre3/include/linux/list.h	Fri Aug 11 19:06:12 2000
+++ linux-akpm/include/linux/list.h	Fri Dec  1 17:31:35 2000
@@ -90,6 +90,7 @@
 static __inline__ void list_del(struct list_head *entry)
 {
 	__list_del(entry->prev, entry->next);
+	entry->next = entry->prev = 0;
 }
 
 /**
--- linux-2.4.0-test12-pre3/fs/buffer.c	Wed Nov 29 18:23:19 2000
+++ linux-akpm/fs/buffer.c	Sun Dec  3 22:36:18 2000
@@ -871,10 +871,11 @@
 		else {
 			bh->b_inode = &tmp;
 			list_add(&bh->b_inode_buffers, &tmp.i_dirty_buffers);
-			atomic_inc(&bh->b_count);
 			if (buffer_dirty(bh)) {
+				atomic_inc(&bh->b_count);
 				spin_unlock(&lru_list_lock);
 				ll_rw_block(WRITE, 1, &bh);
+				brelse(bh);
 				spin_lock(&lru_list_lock);
 			}
 		}
@@ -883,6 +884,7 @@
 	while (!list_empty(&tmp.i_dirty_buffers)) {
 		bh = BH_ENTRY(tmp.i_dirty_buffers.prev);
 		remove_inode_queue(bh);
+		atomic_inc(&bh->b_count);
 		spin_unlock(&lru_list_lock);
 		wait_on_buffer(bh);
 		if (!buffer_uptodate(bh))
@@ -929,9 +931,9 @@
 			atomic_inc(&bh->b_count);
 			spin_unlock(&lru_list_lock);
 			wait_on_buffer(bh);
-			brelse(bh);
 			if (!buffer_uptodate(bh))
 				err = -EIO;
+			brelse(bh);
 			spin_lock(&lru_list_lock);
 			goto repeat;
 		}
@@ -1459,6 +1461,9 @@
 		clear_bit(BH_Mapped, &bh->b_state);
 		clear_bit(BH_Req, &bh->b_state);
 		clear_bit(BH_New, &bh->b_state);
+		spin_lock(&lru_list_lock);
+		remove_inode_queue(bh);
+		spin_unlock(&lru_list_lock);
 	}
 }
 
--- linux-2.4.0-test12-pre3/fs/inode.c	Wed Nov 29 18:23:19 2000
+++ linux-akpm/fs/inode.c	Sat Dec  2 15:34:51 2000
@@ -77,7 +77,17 @@
 
 #define alloc_inode() \
 	 ((struct inode *) kmem_cache_alloc(inode_cachep, SLAB_KERNEL))
-#define destroy_inode(inode) kmem_cache_free(inode_cachep, (inode))
+static void destroy_inode(struct inode *inode)
+{
+	if (!list_empty(&inode->i_dirty_buffers)) {
+		printk("&inode->i_dirty_buffers=0x%p\n", &inode->i_dirty_buffers);
+		printk("next=0x%p\n", inode->i_dirty_buffers.next);
+		printk("prev=0x%p\n", inode->i_dirty_buffers.prev);
+		BUG();
+	}
+	kmem_cache_free(inode_cachep, (inode));
+}
+
 
 /*
  * These are initializations that only need to be done
@@ -348,6 +358,12 @@
 
 void clear_inode(struct inode *inode)
 {
+	if (!list_empty(&inode->i_dirty_buffers)) {
+		if (inode->i_nlink)
+			BUG();
+		invalidate_inode_buffers(inode);
+	}
+
 	if (inode->i_data.nrpages)
 		BUG();
 	if (!(inode->i_state & I_FREEING))
@@ -407,6 +423,7 @@
 		inode = list_entry(tmp, struct inode, i_list);
 		if (inode->i_sb != sb)
 			continue;
+		invalidate_inode_buffers(inode);
 		if (!atomic_read(&inode->i_count)) {
 			list_del(&inode->i_hash);
 			INIT_LIST_HEAD(&inode->i_hash);
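The last fs/buffer.c hunk above moves brelse(bh) below the buffer_uptodate() test, so the buffer is examined while the reference is still held. The model below makes the hazard deterministic by assuming, purely for illustration, that a buffer whose count drops to zero is reused at once and loses its uptodate state; in the kernel any reuse would happen asynchronously, if at all. The structure and the `model_` / `check_` names are hypothetical.

```c
#include <assert.h>
#include <errno.h>

struct buffer_head {
	int b_count;	/* references held */
	int uptodate;	/* stands in for buffer_uptodate() */
	int reused;	/* set when the model recycles the buffer */
};

/*
 * Model of brelse(): drop a reference. The model assumes a buffer with no
 * references left is reused immediately and loses its uptodate state.
 */
static void model_brelse(struct buffer_head *bh)
{
	if (--bh->b_count == 0) {
		bh->uptodate = 0;
		bh->reused = 1;
	}
}

/* Order before the patch: release the buffer, then test it. */
static int check_after_release(struct buffer_head *bh)
{
	int err = 0;

	bh->b_count++;			/* atomic_inc(&bh->b_count) */
	/* wait_on_buffer(bh) would sleep here */
	model_brelse(bh);
	if (!bh->uptodate)		/* reads a buffer we no longer own */
		err = -EIO;
	return err;
}

/* Order after the patch: test the buffer while the reference is held. */
static int check_before_release(struct buffer_head *bh)
{
	int err = 0;

	bh->b_count++;
	if (!bh->uptodate)
		err = -EIO;
	model_brelse(bh);
	return err;
}
```

With the old ordering the model reports a spurious -EIO for a buffer that was uptodate when the I/O finished; with the new ordering it does not.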
* [resync?] Re: corruption
2000-12-03 21:44 ` corruption Andrew Morton
@ 2000-12-03 22:45 ` Alexander Viro
2000-12-04 0:56 ` Jeff V. Merkey
0 siblings, 1 reply; 55+ messages in thread
From: Alexander Viro @ 2000-12-03 22:45 UTC
To: Andrew Morton
Cc: Petr Vandrovec, Linus Torvalds, Stephen C. Tweedie, Jonathan Hudson, linux-kernel

On Mon, 4 Dec 2000, Andrew Morton wrote:

> Sorry, it's still failing. It took three hours.

Yes. For one thing, original was plain wrong wrt locking (lru_list_lock
should be held). For another, it does not take care of metadata. And
that's way more serious. What really happens:

ext2_truncate() got a buffer_head of indirect block that is going to
die. Fine, we release the blocks referred from it and... do bforget()
on our block. Notice that we are not guaranteed that bh will actually
die here. buffer.c code might bump its ->b_count for a while, it might
be written out right now, etc. As the result, bforget() leaves the
sucker alive. It's not a big deal, since we will do unmap_underlying_metadata()
before we write anything there (if it will be reused for data) or we'll
just pick the bh and zero the buffer out (if it will be reused for metadata).

Unfortunately, we also leave it on the per-inode dirty blocks list. Guess
what happens if inode is destroyed, page that used to hold it gets reused
and bh gets finally written? Exactly.

Suggested fix: void bforget_inode(struct buffer_head *bh) that would
be a copy of __bforget(), except that it would call remove_inode_queue(bh)
unconditionally. And replace bforget() with bforget_inode() in those places
of ext2/inode.c that are followed by freeing the block.

Comments? I'll do a patch, but I'd really like to know what had already
gone into the main tree. Linus, could you put the 12-pre4-dont-use on
ftp.kernel.org?
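Al's metadata scenario can be modeled in a few lines. In this sketch, model_bforget() follows his account of bforget(): if someone else still holds a reference (or I/O is in flight), the buffer survives and nothing takes it off the inode's dirty list. model_bforget_inode() models his proposed bforget_inode(), which calls remove_inode_queue() unconditionally. Every `model_` name and the cut-down structures are hypothetical; hashing, LRU lists and locking are omitted.

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h->prev = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }

static void list_add(struct list_head *n, struct list_head *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	INIT_LIST_HEAD(e);
}

struct inode { struct list_head i_dirty_buffers; };
struct buffer_head {
	int b_count;			/* references held */
	int locked;			/* I/O in flight */
	int freed;			/* set once the model frees the buffer */
	struct inode *b_inode;
	struct list_head b_inode_buffers;
};

static void queue_on_inode(struct buffer_head *bh, struct inode *inode)
{
	bh->b_inode = inode;
	list_add(&bh->b_inode_buffers, &inode->i_dirty_buffers);
}

static void remove_inode_queue(struct buffer_head *bh)
{
	if (bh->b_inode) {
		list_del_init(&bh->b_inode_buffers);
		bh->b_inode = NULL;
	}
}

/*
 * Model of bforget(): drop our reference. Only if it was the last one and
 * the buffer is idle does the buffer die and leave the inode queue;
 * otherwise it stays alive and still linked to the inode.
 */
static void model_bforget(struct buffer_head *bh)
{
	if (--bh->b_count > 0 || bh->locked)
		return;
	remove_inode_queue(bh);
	bh->freed = 1;
}

/* Model of the proposed bforget_inode(): leave the inode queue unconditionally. */
static void model_bforget_inode(struct buffer_head *bh)
{
	remove_inode_queue(bh);
	model_bforget(bh);
}
```

With `b_count` at 2 (another holder), model_bforget() leaves the buffer on the inode's list, which is exactly the stale entry that later trips destroy_inode(); model_bforget_inode() leaves the list empty even though the buffer itself stays alive.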
* Re: [resync?] Re: corruption
2000-12-03 22:45 ` [resync?] corruption Alexander Viro
@ 2000-12-04 0:56 ` Jeff V. Merkey
0 siblings, 0 replies; 55+ messages in thread
From: Jeff V. Merkey @ 2000-12-04 0:56 UTC
To: Alexander Viro
Cc: Andrew Morton, Petr Vandrovec, Linus Torvalds, Stephen C. Tweedie, Jonathan Hudson, linux-kernel

On Sun, Dec 03, 2000 at 05:45:57PM -0500, Alexander Viro wrote:
>
>
> On Mon, 4 Dec 2000, Andrew Morton wrote:
>
> > Sorry, it's still failing. It took three hours.
>
> Yes. For one thing, original was plain wrong wrt locking (lru_list_lock
> should be held). For another, it does not take care of metadata. And
> that's way more serious. What really happens:
>
> ext2_truncate() got a buffer_head of indirect block that is going to
> die. Fine, we release the blocks referred from it and... do bforget()
> on our block. Notice that we are not guaranteed that bh will actually
> die here. buffer.c code might bump its ->b_count for a while, it might
> be written out right now, etc. As the result, bforget() leaves the
> sucker alive. It's not a big deal, since we will do unmap_underlying_metadata()
> before we write anything there (if it will be reused for data) or we'll
> just pick the bh and zero the buffer out (if it will be reused for metadata).
>
> Unfortunately, we also leave it on the per-inode dirty blocks list. Guess
> what happens if inode is destroyed, page that used to hold it gets reused
> and bh gets finally written? Exactly.
>
> Suggested fix: void bforget_inode(struct buffer_head *bh) that would
> be a copy of __bforget(), except that it would call remove_inode_queue(bh)
> unconditionally. And replace bforget() with bforget_inode() in those places
> of ext2/inode.c that are followed by freeing the block.
>
> Comments? I'll do a patch, but I'd really like to know what had already
> gone into the main tree. Linus, could you put the 12-pre4-dont-use on
> ftp.kernel.org?
Al,

I am always amazed at how rapidly you seem to be able to run down some of
these file system corruption problems. You seem to understand the
interaction of this layer extremely well. :-)

Jeff
* Re: corruption
2000-12-02 15:33 ` corruption Alexander Viro
2000-12-02 16:39 ` corruption Petr Vandrovec
2000-12-03 21:44 ` corruption Andrew Morton
@ 2000-12-04 15:00 ` Stephen C. Tweedie
2000-12-04 15:19 ` corruption Alexander Viro
2 siblings, 1 reply; 55+ messages in thread
From: Stephen C. Tweedie @ 2000-12-04 15:00 UTC
To: Alexander Viro
Cc: Linus Torvalds, Stephen C. Tweedie, Andrew Morton, Jonathan Hudson, linux-kernel

Hi,

On Sat, Dec 02, 2000 at 10:33:36AM -0500, Alexander Viro wrote:
>
> On Sun, 3 Dec 2000, Andrew Morton wrote:
>
> > It appears that this problem is not fixed.
> Sure, it isn't. Place where the shit hits the fan: fs/buffer.c::unmap_buffer().
> Add the call of remove_inode_queue(bh) there and see if it helps. I.e.

unmap_buffer() calls mark_buffer_clean() calls refile_buffer() calls
remove_inode_queue(), which is why we don't see this all the time.

However, refile_buffer() is only calling the remove_inode_queue() if
the buffer disposition changes. I'm looking to see where we may be
going wrong here --- the refile_buffer() is not atomic wrt. the
bh->b_inode structures.

--Stephen
* Re: corruption
2000-12-04 15:00 ` corruption Stephen C. Tweedie
@ 2000-12-04 15:19 ` Alexander Viro
0 siblings, 0 replies; 55+ messages in thread
From: Alexander Viro @ 2000-12-04 15:19 UTC
To: Stephen C. Tweedie
Cc: Linus Torvalds, Andrew Morton, Jonathan Hudson, linux-kernel

On Mon, 4 Dec 2000, Stephen C. Tweedie wrote:

> unmap_buffer() calls mark_buffer_clean() calls refile_buffer() calls
> remove_inode_queue(), which is why we don't see this all the time.

Not enough, since you can hit the window between the request completion
(bh is marked clean) and getting it picked up by flush_dirty_buffers() et al.
If you get destroy_inode() before that window closes...

> However, refile_buffer() is only calling the remove_inode_queue() if
> the buffer disposition changes. I'm looking to see where we may be
> going wrong here --- the refile_buffer() is not atomic wrt. the
> bh->b_inode structures.

See above. The point about the metadata (bforget() is not enough) also
stands, ditto for the ext2_update_inode() one.
* Re: corruption
2000-11-30 21:35 ` corruption Andrew Morton
2000-12-01 0:57 ` corruption Andrew Morton
2000-12-01 14:16 ` corruption Stephen C. Tweedie
@ 2000-12-01 17:29 ` Jeff Garzik
2 siblings, 0 replies; 55+ messages in thread
From: Jeff Garzik @ 2000-12-01 17:29 UTC
To: Andrew Morton; +Cc: Alexander Viro, Jonathan Hudson, linux-kernel

On Fri, 1 Dec 2000, Andrew Morton wrote:
> Alexander Viro wrote:
> >
> > Confirms. That's definitely an empty list_head at address 0xc3c49058 and -pre2
> > has O_SYNC patches.
>
> foo. The overnight run wedged tight in mmap002. No progress.
>
> I bet this'll catch it:
>
> --- include/linux/list.h.orig	Fri Dec  1 08:33:36 2000
> +++ include/linux/list.h	Fri Dec  1 08:33:55 2000
> @@ -90,6 +90,7 @@
>  static __inline__ void list_del(struct list_head *entry)
>  {
>  	__list_del(entry->prev, entry->next);
> +	entry->next = entry->prev = 0;
>  }
>

Or just call list_del_init instead...
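Jeff's suggestion, and Andrew's later remark that a list_empty() test on a deleted entry makes the destroy_inode() assertion look bogus under the poisoning patch, both come down to what an unlinked entry looks like afterwards. The sketch re-creates simplified versions of the 2.4 list primitives in userspace (the real ones live in include/linux/list.h; `list_unlink` here plays the role of the kernel's `__list_del`, and `list_del_poisoned` is a hypothetical name for Andrew's debugging variant).

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h->prev = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }

static void list_add(struct list_head *n, struct list_head *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Join prev and next around the entry being removed (__list_del() in the kernel). */
static void list_unlink(struct list_head *prev, struct list_head *next)
{
	next->prev = prev;
	prev->next = next;
}

/* Andrew's debugging list_del(): unlink, then poison the entry with NULLs. */
static void list_del_poisoned(struct list_head *entry)
{
	list_unlink(entry->prev, entry->next);
	entry->next = entry->prev = NULL;
}

/* list_del_init(): unlink, then turn the entry into an empty list of its own. */
static void list_del_init(struct list_head *entry)
{
	list_unlink(entry->prev, entry->next);
	INIT_LIST_HEAD(entry);
}
```

After the poisoning delete, list_empty() on the entry returns false (its `next` is NULL, not the entry itself), and a second delete would dereference NULL at once, which is the point of the debugging patch. After list_del_init() the entry is a valid empty list, so list_empty() stays meaningful and a repeated delete is harmless, at the cost of hiding a double deletion.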
* Re: corruption
[not found] <20001202161158.A475@ppc.vc.cvut.cz>
@ 2000-12-02 15:35 ` Petr Vandrovec
0 siblings, 0 replies; 55+ messages in thread
From: Petr Vandrovec @ 2000-12-02 15:35 UTC
To: andrewm; +Cc: sct, viro, jonathan, linux-kernel

Andrew Morton wrote:
> Andrew Morton wrote:
> >
> > actual assertion in destroy_inode() which is a little bogus.
> >
> > But the weird thing is that this BUG only hit a single time,
> > after three hours of intensive testing. If my theory is
> > right, the BUG should hit every time. Will investigate further...
> >
>
> It appears that this problem is not fixed.
>
> My destroy_inode() is now
>
> static void destroy_inode(struct inode *inode)
> {
> 	if (!list_empty(&inode->i_dirty_buffers)) {
> 		printk("&inode->i_dirty_buffers=0x%p\n", &inode->i_dirty_buffers);
> 		printk("next=0x%p\n", inode->i_dirty_buffers.next);
> 		printk("prev=0x%p\n", inode->i_dirty_buffers.prev);
> 		BUG();
> 	}
> 	kmem_cache_free(inode_cachep, (inode));
> }

I used

	do {
		if (inode_has_buffers(inode)) {
			printstate();
		}
		kmem_cache_free....
	} while (0)

and the machine complained very loudly during boot...

> This is 2.4.0-test11-pre3 + list_del patch + sct's inode
> patch (buffer.c, inode.c). x86 dual processor. gcc 2.91.66.
> I rediffed my tree. No rogue patches.

test12-pre3, NULL in list_del, destroy_inode as above, UP, 2.95.2

So I thought that adding fsync_inode_buffers() into iput() just below
atomic_dec_and_lock(&inode->i_count...) would be a good idea. It is not:
the bug was still triggered. So here are the oopses... I removed the
disassembled code, as it is the same for all oopses (as my printstate
dumps itself).

Before fsync_inode_buffers() it was almost the same; there were also
traces through sys_close() in addition to this. But maybe I just did not
trigger this code path during testing.

I think that buffer_insert_inode_queue and __remove_inode_queue should also
do iget() and iput(), but maybe I'm wrong.
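Petr's suggestion amounts to letting the per-inode buffer queue hold its own reference on the inode, so the inode cannot reach destroy_inode() while buffers are still queued. The following is a speculative userspace model of that idea, not a patch anyone in the thread posted: model_iget()/model_iput() are hypothetical, and buffer_insert_inode_queue()/remove_inode_queue() reuse the kernel names Petr mentions but have model-only bodies.

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h->prev = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }

static void list_add(struct list_head *n, struct list_head *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	INIT_LIST_HEAD(e);
}

struct inode {
	int i_count;			/* references */
	int destroyed;			/* set when the model frees the inode */
	struct list_head i_dirty_buffers;
};
struct buffer_head {
	struct inode *b_inode;
	struct list_head b_inode_buffers;
};

static void model_iget(struct inode *inode) { inode->i_count++; }

static void model_iput(struct inode *inode)
{
	assert(inode->i_count > 0);
	if (--inode->i_count == 0) {
		/* The same condition Andrew's debugging destroy_inode() checks. */
		assert(list_empty(&inode->i_dirty_buffers));
		inode->destroyed = 1;
	}
}

/* Queueing a buffer takes a reference on the inode... */
static void buffer_insert_inode_queue(struct buffer_head *bh, struct inode *inode)
{
	model_iget(inode);
	bh->b_inode = inode;
	list_add(&bh->b_inode_buffers, &inode->i_dirty_buffers);
}

/* ...and unqueueing it drops that reference, possibly freeing the inode. */
static void remove_inode_queue(struct buffer_head *bh)
{
	struct inode *inode = bh->b_inode;

	if (inode == NULL)
		return;
	list_del_init(&bh->b_inode_buffers);
	bh->b_inode = NULL;
	model_iput(inode);
}
```

One consequence, visible even in the model, is that an inode whose last user reference is gone stays allocated until its last queued buffer is removed.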
And I have no idea why fsync_inode_buffers() does not work. I thought that
the inode should not have any buffers attached after this function returns
if the inode use count was zero... Maybe it is a bit complicated when the
inode is going to cease...

Best regards,
Petr Vandrovec
vandrove@vc.cvut.cz

ksymoops 2.3.4 on i686 2.4.0-test9-smp. Options used
     -v /usr/src/linux/vmlinux (specified)
     -k none (specified)
     -l none (specified)
     -o /lib/modules/2.4.0-test12-pre3-smp/ (specified)
     -m /boot/System.map (specified)

Error (regular_file): read_ksyms stat none failed
No modules in ksyms, skipping objects
No ksyms, skipping lsmod
CPU: 0
EIP: 0010:[<c010b9b5>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00000202
eax: 00000001 ebx: c13fe9a0 ecx: c022ce90 edx: c13fe9b8
esi: c022f500 edi: c7d940a0 ebp: c7d69fbc esp: c7d69f38
ds: 0018 es: 0018 ss: 0018
Process mount (pid: 34, stackpage=c7d69000)
Stack: c7d69f3c c0140018 c01475b5 c13ffd40 c13fe9a0 c0145296 c13fe9a0 c7d68000
       00000000 c01400b2 c13ffd40 c7d68000 08058fd0 08059930 c123d920 c13ffd40
       c123d920 c7d12000 c7f45000 c123d920 c7f7b8c0 c7d12005 00000004 0006a272
Call Trace: [<c0140018>] [<c01475b5>] [<c0145296>] [<c01400b2>] [<c010b513>]
Code: 50 1e 06 50 55 57 56 52 51 53 89 e0 50 e8 a9 fe ff ff 83 c4

>>EIP; c010b9b5 <printstate+9/2c> <=====
Trace; c0140018 <sys_rename+154/270>
Trace; c01475b5 <iput+1a1/1b4>
Trace; c0145296 <dput+116/174>
Trace; c01400b2 <sys_rename+1ee/270>
Trace; c010b513 <system_call+33/38>
Code; c010b9b5 <printstate+9/2c>
0000000000000000 <_EIP>:
Code; c010b9b5 <printstate+9/2c> <=====
   0: 50 push %eax <=====
Code; c010b9b6 <printstate+a/2c>
   1: 1e push %ds
Code; c010b9b7 <printstate+b/2c>
   2: 06 push %es
Code; c010b9b8 <printstate+c/2c>
   3: 50 push %eax
Code; c010b9b9 <printstate+d/2c>
   4: 55 push %ebp
Code; c010b9ba <printstate+e/2c>
   5: 57 push %edi
Code; c010b9bb <printstate+f/2c>
   6: 56 push %esi
Code; c010b9bc <printstate+10/2c>
   7: 52 push %edx
Code; c010b9bd <printstate+11/2c>
   8: 51 push %ecx
Code; c010b9be <printstate+12/2c>
   9: 53 push %ebx
Code; c010b9bf <printstate+13/2c>
   a: 89 e0 mov %esp,%eax
Code; c010b9c1 <printstate+15/2c>
   c: 50 push %eax
Code; c010b9c2 <printstate+16/2c>
   d: e8 a9 fe ff ff call fffffebb <_EIP+0xfffffebb> c010b870 <show_registers+0/13c>
Code; c010b9c7 <printstate+1b/2c>
  12: 83 c4 00 add $0x0,%esp

CPU: 0
EIP: 0010:[<c010b9b5>]
EFLAGS: 00000202
eax: 00000001 ebx: c7c5cb40 ecx: c022ce90 edx: c7c5cb58
esi: c022f500 edi: c7f3d4e0 ebp: c132a460 esp: c7d69f4c
ds: 0018 es: 0018 ss: 0018
Process mount (pid: 34, stackpage=c7d69000)
Stack: c7d69f50 c0140018 c01475b5 c132a460 c7c5cb40 c0145df8 c7c5cb40 c7d68000
       00000000 c013f110 c132a460 c132a460 c132a460 c7f45000 c7d69fa4 c013f1e7
       c7f3d4e0 c132a460 c7d68000 08058fd0 08059930 bffffb8c c123d920 c7f7b8c0
Call Trace: [<c0140018>] [<c01475b5>] [<c0145df8>] [<c013f110>] [<c013f1e7>] [<c010b513>]
Code: 50 1e 06 50 55 57 56 52 51 53 89 e0 50 e8 a9 fe ff ff 83 c4

>>EIP; c010b9b5 <printstate+9/2c> <=====
Trace; c0140018 <sys_rename+154/270>
Trace; c01475b5 <iput+1a1/1b4>
Trace; c0145df8 <d_delete+60/a8>
Trace; c013f110 <vfs_unlink+160/190>
Trace; c013f1e7 <sys_unlink+a7/120>
Trace; c010b513 <system_call+33/38>
Code; c010b9b5 <printstate+9/2c>

CPU: 0
EIP: 0010:[<c010b9b5>]
EFLAGS: 00000202
eax: 00000001 ebx: c7c5cb40 ecx: c022ce90 edx: c7c5cb58
esi: c022f500 edi: c7f3d4e0 ebp: c132a460 esp: c7d69f4c
ds: 0018 es: 0018 ss: 0018
Process mount (pid: 35, stackpage=c7d69000)
Stack: c7d69f50 c0140018 c01475b5 c132a460 c7c5cb40 c0145df8 c7c5cb40 c7d68000
       00000000 c013f110 c132a460 c132a460 c132a460 c7f45000 c7d69fa4 c013f1e7
       c7f3d4e0 c132a460 c7d68000 08058f60 08052c71 00000000 c123d920 c7f7b8c0
Call Trace: [<c0140018>] [<c01475b5>] [<c0145df8>] [<c013f110>] [<c013f1e7>] [<c010b513>]
Code: 50 1e 06 50 55 57 56 52 51 53 89 e0 50 e8 a9 fe ff ff 83 c4

>>EIP; c010b9b5 <printstate+9/2c> <=====
Trace; c0140018 <sys_rename+154/270>
Trace; c01475b5 <iput+1a1/1b4>
Trace; c0145df8 <d_delete+60/a8>
Trace; c013f110 <vfs_unlink+160/190>
Trace; c013f1e7 <sys_unlink+a7/120>
Trace; c010b513 <system_call+33/38>
Code; c010b9b5 <printstate+9/2c>

CPU: 0
EIP: 0010:[<c010b9b5>]
EFLAGS: 00000202
eax: 00000001 ebx: c7ad0ce0 ecx: c022ce90 edx: c7ad0cf8
esi: c022f500 edi: c7ad0e60 ebp: c7d6c3e0 esp: c7c07f4c
ds: 0018 es: 0018 ss: 0018
Process ksymoops (pid: 41, stackpage=c7c07000)
Stack: c7c07f50 c0140018 c01475b5 c7d6c3e0 c7ad0ce0 c0145df8 c7ad0ce0 c7c06000
       00000000 c013f110 c7d6c3e0 c7d6c3e0 c7d6c3e0 c7f45000 c7c07fa4 c013f1e7
       c7ad0e60 c7d6c3e0 c7c06000 400165a4 bffffd4c bffffc54 c7d84420 c7f7b8c0
Call Trace: [<c0140018>] [<c01475b5>] [<c0145df8>] [<c013f110>] [<c013f1e7>] [<c010b513>]
Code: 50 1e 06 50 55 57 56 52 51 53 89 e0 50 e8 a9 fe ff ff 83 c4

>>EIP; c010b9b5 <printstate+9/2c> <=====
Trace; c0140018 <sys_rename+154/270>
Trace; c01475b5 <iput+1a1/1b4>
Trace; c0145df8 <d_delete+60/a8>
Trace; c013f110 <vfs_unlink+160/190>
Trace; c013f1e7 <sys_unlink+a7/120>
Trace; c010b513 <system_call+33/38>
Code; c010b9b5 <printstate+9/2c>

CPU: 0
EIP: 0010:[<c010b9b5>]
EFLAGS: 00000202
eax: 00000001 ebx: c7ad0ce0 ecx: c022ce90 edx: c7ad0cf8
esi: c022f500 edi: c7ad0e60 ebp: c7d844a0 esp: c7c07f4c
ds: 0018 es: 0018 ss: 0018
Process ksymoops (pid: 41, stackpage=c7c07000)
Stack: c7c07f50 c0140018 c01475b5 c7d844a0 c7ad0ce0 c0145df8 c7ad0ce0 c7c06000
       00000000 c013f110 c7d844a0 c7d844a0 c7d844a0 c7f45000 c7c07fa4 c013f1e7
       c7ad0e60 c7d844a0 c7c06000 400165a4 bffffd4c bffffc54 c7d84420 c7f7b8c0
Call Trace: [<c0140018>] [<c01475b5>] [<c0145df8>] [<c013f110>] [<c013f1e7>] [<c010b513>]
Code: 50 1e 06 50 55 57 56 52 51 53 89 e0 50 e8 a9 fe ff ff 83 c4

>>EIP; c010b9b5 <printstate+9/2c> <=====
Trace; c0140018 <sys_rename+154/270>
Trace; c01475b5 <iput+1a1/1b4>
Trace; c0145df8 <d_delete+60/a8>
Trace; c013f110 <vfs_unlink+160/190>
Trace; c013f1e7 <sys_unlink+a7/120>
Trace; c010b513 <system_call+33/38>
Code; c010b9b5 <printstate+9/2c>

CPU: 0
EIP: 0010:[<c010b9b5>]
EFLAGS: 00000202
eax: 00000001 ebx: c7ad0ce0 ecx: c022ce90 edx: c7ad0cf8
esi: c022f500 edi: c7ad0e60 ebp: c7d6c560 esp: c7c07f4c
ds: 0018 es: 0018 ss: 0018
Process ksymoops (pid: 41, stackpage=c7c07000)
Stack: c7c07f50 c0140018 c01475b5 c7d6c560 c7ad0ce0 c0145df8 c7ad0ce0 c7c06000
       00000000 c013f110 c7d6c560 c7d6c560 c7d6c560 c7f45000 c7c07fa4 c013f1e7
       c7ad0e60 c7d6c560 c7c06000 400165a4 bffffd4c bffffc54 c7d84420 c7f7b8c0
Call Trace: [<c0140018>] [<c01475b5>] [<c0145df8>] [<c013f110>] [<c013f1e7>] [<c010b513>]
Code: 50 1e 06 50 55 57 56 52 51 53 89 e0 50 e8 a9 fe ff ff 83 c4

>>EIP; c010b9b5 <printstate+9/2c> <=====
Trace; c0140018 <sys_rename+154/270>
Trace; c01475b5 <iput+1a1/1b4>
Trace; c0145df8 <d_delete+60/a8>
Trace; c013f110 <vfs_unlink+160/190>
Trace; c013f1e7 <sys_unlink+a7/120>
Trace; c010b513 <system_call+33/38>
Code; c010b9b5 <printstate+9/2c>

CPU: 0
EIP: 0010:[<c010b9b5>]
EFLAGS: 00000202
eax: 00000001 ebx: c7ad0b60 ecx: c022ce90 edx: c7ad0b78
esi: c022f500 edi: c7ad0e60 ebp: c7d6c660 esp: c7c07f4c
ds: 0018 es: 0018 ss: 0018
Process ksymoops (pid: 47, stackpage=c7c07000)
Stack: c7c07f50 c0140018 c01475b5 c7d6c660 c7ad0b60 c0145df8 c7ad0b60 c7c06000
       00000000 c013f110 c7d6c660 c7d6c660 c7d6c660 c7f45000 c7c07fa4 c013f1e7
       c7ad0e60 c7d6c660 c7c06000 400165a4 bffffd2c bffffc34 c7d84420 c7f7b8c0
Call Trace: [<c0140018>] [<c01475b5>] [<c0145df8>] [<c013f110>] [<c013f1e7>] [<c010b513>]
Code: 50 1e 06 50 55 57 56 52 51 53 89 e0 50 e8 a9 fe ff ff 83 c4

>>EIP; c010b9b5 <printstate+9/2c> <=====
Trace; c0140018 <sys_rename+154/270>
Trace; c01475b5 <iput+1a1/1b4>
Trace; c0145df8 <d_delete+60/a8>
Trace; c013f110 <vfs_unlink+160/190>
Trace; c013f1e7 <sys_unlink+a7/120>
Trace; c010b513 <system_call+33/38>
Code; c010b9b5 <printstate+9/2c>

CPU: 0
EIP: 0010:[<c010b9b5>]
EFLAGS: 00000202
eax: 00000001 ebx: c7ad0b60 ecx: c022ce90 edx: c7ad0b78
esi: c022f500 edi: c7ad0e60 ebp: c7d846a0 esp: c7c07f4c
ds: 0018 es: 0018 ss: 0018
Process ksymoops (pid: 47, stackpage=c7c07000)
Stack: c7c07f50 c0140018 c01475b5 c7d846a0 c7ad0b60
c0145df8 c7ad0b60 c7c06000 00000000 c013f110 c7d846a0 c7d846a0 c7d846a0 c7f45000 c7c07fa4 c013f1e7 c7ad0e60 c7d846a0 c7c06000 400165a4 bffffd2c bffffc34 c7d84420 c7f7b8c0 Call Trace: [<c0140018>] [<c01475b5>] [<c0145df8>] [<c013f110>] [<c013f1e7>] [<c010b513>] Code: 50 1e 06 50 55 57 56 52 51 53 89 e0 50 e8 a9 fe ff ff 83 c4 >>EIP; c010b9b5 <printstate+9/2c> <===== Trace; c0140018 <sys_rename+154/270> Trace; c01475b5 <iput+1a1/1b4> Trace; c0145df8 <d_delete+60/a8> Trace; c013f110 <vfs_unlink+160/190> Trace; c013f1e7 <sys_unlink+a7/120> Trace; c010b513 <system_call+33/38> Code; c010b9b5 <printstate+9/2c> CPU: 0 EIP: 0010:[<c010b9b5>] EFLAGS: 00000202 eax: 00000001 ebx: c7ad0b60 ecx: c022ce90 edx: c7ad0b78 esi: c022f500 edi: c7ad0e60 ebp: c7df0140 esp: c7c07f4c ds: 0018 es: 0018 ss: 0018 Process ksymoops (pid: 47, stackpage=c7c07000) Stack: c7c07f50 c0140018 c01475b5 c7df0140 c7ad0b60 c0145df8 c7ad0b60 c7c06000 00000000 c013f110 c7df0140 c7df0140 c7df0140 c7f45000 c7c07fa4 c013f1e7 c7ad0e60 c7df0140 c7c06000 400165a4 bffffd2c bffffc34 c7d84420 c7f7b8c0 Call Trace: [<c0140018>] [<c01475b5>] [<c0145df8>] [<c013f110>] [<c013f1e7>] [<c010b513>] Code: 50 1e 06 50 55 57 56 52 51 53 89 e0 50 e8 a9 fe ff ff 83 c4 >>EIP; c010b9b5 <printstate+9/2c> <===== Trace; c0140018 <sys_rename+154/270> Trace; c01475b5 <iput+1a1/1b4> Trace; c0145df8 <d_delete+60/a8> Trace; c013f110 <vfs_unlink+160/190> Trace; c013f1e7 <sys_unlink+a7/120> Trace; c010b513 <system_call+33/38> Code; c010b9b5 <printstate+9/2c> CPU: 0 EIP: 0010:[<c010b9b5>] EFLAGS: 00000202 eax: 00000001 ebx: c7b279c0 ecx: c022ce90 edx: c7b279d8 esi: c022f500 edi: c7f3d4e0 ebp: c132a460 esp: c77abf4c ds: 0018 es: 0018 ss: 0018 Process mount (pid: 89, stackpage=c77ab000) Stack: c77abf50 c0140018 c01475b5 c132a460 c7b279c0 c0145df8 c7b279c0 c77aa000 00000000 c013f110 c132a460 c132a460 c132a460 c7f45000 c77abfa4 c013f1e7 c7f3d4e0 c132a460 c77aa000 08057f60 08057fc8 00000000 c123d920 c7f7b8c0 Call Trace: [<c0140018>] [<c01475b5>] 
[<c0145df8>] [<c013f110>] [<c013f1e7>] [<c010b513>] Code: 50 1e 06 50 55 57 56 52 51 53 89 e0 50 e8 a9 fe ff ff 83 c4 >>EIP; c010b9b5 <printstate+9/2c> <===== Trace; c0140018 <sys_rename+154/270> Trace; c01475b5 <iput+1a1/1b4> Trace; c0145df8 <d_delete+60/a8> Trace; c013f110 <vfs_unlink+160/190> Trace; c013f1e7 <sys_unlink+a7/120> Trace; c010b513 <system_call+33/38> Code; c010b9b5 <printstate+9/2c> CPU: 0 EIP: 0010:[<c010b9b5>] EFLAGS: 00000202 eax: 00000001 ebx: c78681c0 ecx: c022ce90 edx: c78681d8 esi: c022f500 edi: c7f3d4e0 ebp: c132a460 esp: c7a41f4c ds: 0018 es: 0018 ss: 0018 Process mount (pid: 92, stackpage=c7a41000) Stack: c7a41f50 c0140018 c01475b5 c132a460 c78681c0 c0145df8 c78681c0 c7a40000 00000000 c013f110 c132a460 c132a460 c132a460 c7f45000 c7a41fa4 c013f1e7 c7f3d4e0 c132a460 c7a40000 08073d98 08052c71 00000000 c123d920 c7f7b8c0 Call Trace: [<c0140018>] [<c01475b5>] [<c0145df8>] [<c013f110>] [<c013f1e7>] [<c010b513>] Code: 50 1e 06 50 55 57 56 52 51 53 89 e0 50 e8 a9 fe ff ff 83 c4 >>EIP; c010b9b5 <printstate+9/2c> <===== Trace; c0140018 <sys_rename+154/270> Trace; c01475b5 <iput+1a1/1b4> Trace; c0145df8 <d_delete+60/a8> Trace; c013f110 <vfs_unlink+160/190> Trace; c013f1e7 <sys_unlink+a7/120> Trace; c010b513 <system_call+33/38> Code; c010b9b5 <printstate+9/2c> CPU: 0 EIP: 0010:[<c010b9b5>] EFLAGS: 00000202 eax: 00000001 ebx: c78681c0 ecx: c022ce90 edx: c78681d8 esi: c022f500 edi: c7f3d4e0 ebp: c132a460 esp: c7a41f4c ds: 0018 es: 0018 ss: 0018 Process mount (pid: 92, stackpage=c7a41000) Stack: c7a41f50 c0140018 c01475b5 c132a460 c78681c0 c0145df8 c78681c0 c7a40000 00000000 c013f110 c132a460 c132a460 c132a460 c7f45000 c7a41fa4 c013f1e7 c7f3d4e0 c132a460 c7a40000 080762e0 08052c71 00000000 c123d920 c7f7b8c0 Call Trace: [<c0140018>] [<c01475b5>] [<c0145df8>] [<c013f110>] [<c013f1e7>] [<c010b513>] Code: 50 1e 06 50 55 57 56 52 51 53 89 e0 50 e8 a9 fe ff ff 83 c4 >>EIP; c010b9b5 <printstate+9/2c> <===== Trace; c0140018 <sys_rename+154/270> Trace; c01475b5 
<iput+1a1/1b4> Trace; c0145df8 <d_delete+60/a8> Trace; c013f110 <vfs_unlink+160/190> Trace; c013f1e7 <sys_unlink+a7/120> Trace; c010b513 <system_call+33/38> Code; c010b9b5 <printstate+9/2c> CPU: 0 EIP: 0010:[<c010b9b5>] EFLAGS: 00000202 eax: 00000001 ebx: c76e03a0 ecx: c022ce90 edx: c76e03b8 esi: c022f500 edi: c76e0520 ebp: c76c0840 esp: c7a6ff4c ds: 0018 es: 0018 ss: 0018 Process rm (pid: 105, stackpage=c7a6f000) Stack: c7a6ff50 c0140018 c01475b5 c76c0840 c76e03a0 c0145df8 c76e03a0 c7a6e000 00000000 c013f110 c76c0840 c76c0840 c76c0840 c7f45000 c7a6ffa4 c013f1e7 c76e0520 c76c0840 c7a6e000 bffffec1 00000000 bffffc44 c76c08c0 c7f7b8c0 Call Trace: [<c0140018>] [<c01475b5>] [<c0145df8>] [<c013f110>] [<c013f1e7>] [<c010b513>] Code: 50 1e 06 50 55 57 56 52 51 53 89 e0 50 e8 a9 fe ff ff 83 c4 >>EIP; c010b9b5 <printstate+9/2c> <===== Trace; c0140018 <sys_rename+154/270> Trace; c01475b5 <iput+1a1/1b4> Trace; c0145df8 <d_delete+60/a8> Trace; c013f110 <vfs_unlink+160/190> Trace; c013f1e7 <sys_unlink+a7/120> Trace; c010b513 <system_call+33/38> Code; c010b9b5 <printstate+9/2c> CPU: 0 EIP: 0010:[<c010b9b5>] EFLAGS: 00000202 eax: 00000001 ebx: c76154c0 ecx: c022ce90 edx: c76154d8 esi: c022f500 edi: c7f45000 ebp: c751bfa4 esp: c751bf60 ds: 0018 es: 0018 ss: 0018 Process rm (pid: 143, stackpage=c751b000) Stack: c751bf64 c0140018 c01475b5 c77fe220 c76154c0 c0145296 c76154c0 c77fe220 00000000 c013ef77 c77fe220 c7ad0e60 c77fe220 c751a000 0804ec80 bffffbcc bffffc34 c7d84420 c7f7b8c0 c7f45002 00000009 b23a38c3 00000010 00000000 Call Trace: [<c0140018>] [<c01475b5>] [<c0145296>] [<c013ef77>] [<c010b513>] Code: 50 1e 06 50 55 57 56 52 51 53 89 e0 50 e8 a9 fe ff ff 83 c4 >>EIP; c010b9b5 <printstate+9/2c> <===== Trace; c0140018 <sys_rename+154/270> Trace; c01475b5 <iput+1a1/1b4> Trace; c0145296 <dput+116/174> Trace; c013ef77 <sys_rmdir+cb/104> Trace; c010b513 <system_call+33/38> Code; c010b9b5 <printstate+9/2c> CPU: 0 EIP: 0010:[<c010b9b5>] EFLAGS: 00000202 eax: 00000001 ebx: c752fae0 ecx: 
c022ce90 edx: c752faf8 esi: c022f500 edi: c13fe520 ebp: c7684dc0 esp: c7529f4c ds: 0018 es: 0018 ss: 0018 Process rm (pid: 149, stackpage=c7529000) Stack: c7529f50 c0140018 c01475b5 c7684dc0 c752fae0 c0145df8 c752fae0 c7528000 00000000 c013f110 c7684dc0 c7684dc0 c7684dc0 c7f45000 c7529fa4 c013f1e7 c13fe520 c7684dc0 c7528000 bffffeb4 00000000 bffffc34 c7d6cf60 c7f7b8c0 Call Trace: [<c0140018>] [<c01475b5>] [<c0145df8>] [<c013f110>] [<c013f1e7>] [<c010b513>] Code: 50 1e 06 50 55 57 56 52 51 53 89 e0 50 e8 a9 fe ff ff 83 c4 >>EIP; c010b9b5 <printstate+9/2c> <===== Trace; c0140018 <sys_rename+154/270> Trace; c01475b5 <iput+1a1/1b4> Trace; c0145df8 <d_delete+60/a8> Trace; c013f110 <vfs_unlink+160/190> Trace; c013f1e7 <sys_unlink+a7/120> Trace; c010b513 <system_call+33/38> Code; c010b9b5 <printstate+9/2c> CPU: 0 EIP: 0010:[<c010b9b5>] EFLAGS: 00000202 eax: 00000001 ebx: c752f7e0 ecx: c022ce90 edx: c752f7f8 esi: c022f500 edi: c13fe520 ebp: c7684c40 esp: c7529f4c ds: 0018 es: 0018 ss: 0018 Process rm (pid: 150, stackpage=c7529000) Stack: c7529f50 c0140018 c01475b5 c7684c40 c752f7e0 c0145df8 c752f7e0 c7528000 00000000 c013f110 c7684c40 c7684c40 c7684c40 c7f45000 c7529fa4 c013f1e7 c13fe520 c7684c40 c7528000 bffffea7 00000000 bffffc24 c7d6cf60 c7f7b8c0 Call Trace: [<c0140018>] [<c01475b5>] [<c0145df8>] [<c013f110>] [<c013f1e7>] [<c010b513>] Code: 50 1e 06 50 55 57 56 52 51 53 89 e0 50 e8 a9 fe ff ff 83 c4 >>EIP; c010b9b5 <printstate+9/2c> <===== Trace; c0140018 <sys_rename+154/270> Trace; c01475b5 <iput+1a1/1b4> Trace; c0145df8 <d_delete+60/a8> Trace; c013f110 <vfs_unlink+160/190> Trace; c013f1e7 <sys_unlink+a7/120> Trace; c010b513 <system_call+33/38> Code; c010b9b5 <printstate+9/2c> CPU: 0 EIP: 0010:[<c010b9b5>] EFLAGS: 00000202 eax: 00000001 ebx: c752fde0 ecx: c022ce90 edx: c752fdf8 esi: c022f500 edi: c13fe520 ebp: c7684ec0 esp: c74edf4c ds: 0018 es: 0018 ss: 0018 Process rm (pid: 164, stackpage=c74ed000) Stack: c74edf50 c0140018 c01475b5 c7684ec0 c752fde0 c0145df8 c752fde0 
c74ec000 00000000 c013f110 c7684ec0 c7684ec0 c7684ec0 c7f45000 c74edfa4 c013f1e7 c13fe520 c7684ec0 c74ec000 bffffec4 00000000 bffffc44 c7d6cf60 c7f7b8c0
Call Trace: [<c0140018>] [<c01475b5>] [<c0145df8>] [<c013f110>] [<c013f1e7>] [<c010b513>]
Code: 50 1e 06 50 55 57 56 52 51 53 89 e0 50 e8 a9 fe ff ff 83 c4
>>EIP; c010b9b5 <printstate+9/2c> <=====
Trace; c0140018 <sys_rename+154/270>
Trace; c01475b5 <iput+1a1/1b4>
Trace; c0145df8 <d_delete+60/a8>
Trace; c013f110 <vfs_unlink+160/190>
Trace; c013f1e7 <sys_unlink+a7/120>
Trace; c010b513 <system_call+33/38>
Code; c010b9b5 <printstate+9/2c>

1 error issued. Results may not be reliable.
* Re: corruption
@ 2000-11-29 13:44 Andries.Brouwer
2000-11-29 14:10 ` corruption Tigran Aivazian
0 siblings, 1 reply; 55+ messages in thread
From: Andries.Brouwer @ 2000-11-29 13:44 UTC (permalink / raw)
To: Andries.Brouwer, torvalds; +Cc: linux-kernel
I just tried 2.4.0test12pre3, which has Jens' fix,
and no corruption to be seen. Will test a bit more,
but perhaps this did it.
Andries
* Re: corruption
2000-11-29 13:44 corruption Andries.Brouwer
@ 2000-11-29 14:10 ` Tigran Aivazian
2000-11-29 14:16 ` corruption Alexander Viro
2000-11-29 14:26 ` corruption Jens Axboe
0 siblings, 2 replies; 55+ messages in thread
From: Tigran Aivazian @ 2000-11-29 14:10 UTC (permalink / raw)
To: Andries.Brouwer; +Cc: torvalds, linux-kernel
On Wed, 29 Nov 2000 Andries.Brouwer@cwi.nl wrote:

> I just tried 2.4.0test12pre3, which has Jens' fix,
> and no corruption to be seen. Will test a bit more,
> but perhaps this did it.
>

I have also been testing very hard on the SMP (4xXeon/6G) machine with
test12-pre3 and also cannot reproduce the problem. This is a SCSI-only
machine and I don't know what Jens' fix is and whether it is applicable or
not.

Regards,
Tigran
* Re: corruption
2000-11-29 14:10 ` corruption Tigran Aivazian
@ 2000-11-29 14:16 ` Alexander Viro
2000-11-29 14:26 ` corruption Jens Axboe
1 sibling, 0 replies; 55+ messages in thread
From: Alexander Viro @ 2000-11-29 14:16 UTC (permalink / raw)
To: Tigran Aivazian; +Cc: Andries.Brouwer, torvalds, linux-kernel
On Wed, 29 Nov 2000, Tigran Aivazian wrote:

> On Wed, 29 Nov 2000 Andries.Brouwer@cwi.nl wrote:
>
> > I just tried 2.4.0test12pre3, which has Jens' fix,
> > and no corruption to be seen. Will test a bit more,
> > but perhaps this did it.
> >
>
> I have also been testing very hard on the SMP (4xXeon/6G) machine with
> test12-pre3 and also cannot reproduce the problem. This is a SCSI-only
> machine and I don't know what Jens' fix is and whether it is applicable or
> not.

It's a change in __make_request(), and no, it shouldn't change the
behaviour on SCSI.
* Re: corruption
2000-11-29 14:10 ` corruption Tigran Aivazian
2000-11-29 14:16 ` corruption Alexander Viro
@ 2000-11-29 14:26 ` Jens Axboe
1 sibling, 0 replies; 55+ messages in thread
From: Jens Axboe @ 2000-11-29 14:26 UTC (permalink / raw)
To: Tigran Aivazian; +Cc: Andries.Brouwer, torvalds, linux-kernel
On Wed, Nov 29 2000, Tigran Aivazian wrote:
> On Wed, 29 Nov 2000 Andries.Brouwer@cwi.nl wrote:
>
> > I just tried 2.4.0test12pre3, which has Jens' fix,
> > and no corruption to be seen. Will test a bit more,
> > but perhaps this did it.
> >
>
> I have also been testing very hard on the SMP (4xXeon/6G) machine with
> test12-pre3 and also cannot reproduce the problem. This is a SCSI-only
> machine and I don't know what Jens' fix is and whether it is applicable or
> not.

No, the fix could only really make a difference on IDE. So it can't
possibly account for all the corruption issues reported, but I'm hoping
at least some of them... The fix was posted in the other corruption
thread, and it's in test12-pre3 too.

-- 
* Jens Axboe <axboe@suse.de>
* SuSE Labs
* Re: corruption
@ 2000-11-29 11:16 Andries.Brouwer
2000-11-29 17:47 ` corruption Linus Torvalds
0 siblings, 1 reply; 55+ messages in thread
From: Andries.Brouwer @ 2000-11-29 11:16 UTC (permalink / raw)
To: Andries.Brouwer, torvalds; +Cc: linux-kernel
> can you give a rough estimate on when you suspect you started seeing it?
I reported both cases. That is, I started seeing it a few days ago.
(But there is no problem during daily work. Also for example a
diff between two kernel trees never gave corruption so far.
It was only with diff between trees several GB in size, and I
don't do that very often.)
Andries
* Re: corruption
2000-11-29 11:16 corruption Andries.Brouwer
@ 2000-11-29 17:47 ` Linus Torvalds
2000-11-29 17:57 ` corruption Tigran Aivazian
2000-11-29 18:07 ` corruption Zdenek Kabelac
0 siblings, 2 replies; 55+ messages in thread
From: Linus Torvalds @ 2000-11-29 17:47 UTC (permalink / raw)
To: Andries.Brouwer; +Cc: linux-kernel
On Wed, 29 Nov 2000 Andries.Brouwer@cwi.nl wrote:
>
> > can you give a rough estimate on when you suspect you started seeing it?
>
> I reported both cases. That is, I started seeing it a few days ago.

I wasn't trying to imply that you hadn't reported them well. It's just
that I was born with a highly developed case of Alzheimer's, and I have
trouble keeping details around in my head for more than about five
minutes.

I'm half serious, btw. It's not that I don't have a good memory, but I
tend to remember patterns and how things work, and I'm _really_ bad at
keeping track of details. This is why I absolutely depend on people like
Alan Cox etc who maintain lists of problems, and who are good at gathering
reports on what kinds of machines see it etc. I just suck at it. I really
do.

Anyway, it tentatively sounds like it might have been request corruption
by the new re-merge code. It fits the details, you having IDE and all. I
see that you can't at least easily reproduce it in pre3 any more, but if
it turns out later that you still can, please holler. Loudly.

That still leaves the SCSI corruption, which could not have been due to
the request issue. What's the pattern there for people?

Linus
* Re: corruption
2000-11-29 17:47 ` corruption Linus Torvalds
@ 2000-11-29 17:57 ` Tigran Aivazian
2000-11-29 18:08 ` corruption Tigran Aivazian
2000-11-29 18:07 ` corruption Zdenek Kabelac
1 sibling, 1 reply; 55+ messages in thread
From: Tigran Aivazian @ 2000-11-29 17:57 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Andries.Brouwer, linux-kernel
On Wed, 29 Nov 2000, Linus Torvalds wrote:
> That still leaves the SCSI corruption, which could not have been due to
> the request issue. What's the pattern there for people?

Linus,

I confess that at the time (when I reproduced this problem on my SCSI-only
4way/6G machine) I did not realize the importance of observing the pattern
or even just saving the log. No, I was _not_ just being stupid, but rather
it was _so_ easy to panic Linux at the time (for various reasons) that this
one looked like just "yet another panic" somewhere.

Now I am trying hard (lots of kernel compiles, bonnies, diff -urN between
linux trees, cp -a linuxA linuxB etc etc) to reproduce it and I can't.

All I remember were those messages about "freeing stuff not in datazone"
etc. They were the same messages as I had on an IDE system and the same as
Mohammad and others reported on the list recently.

Regards,
Tigran
* Re: corruption
2000-11-29 17:57 ` corruption Tigran Aivazian
@ 2000-11-29 18:08 ` Tigran Aivazian
2000-11-29 18:14 ` corruption Tigran Aivazian
` (2 more replies)
0 siblings, 3 replies; 55+ messages in thread
From: Tigran Aivazian @ 2000-11-29 18:08 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Andries.Brouwer, linux-kernel
On Wed, 29 Nov 2000, Tigran Aivazian wrote:

> On Wed, 29 Nov 2000, Linus Torvalds wrote:
> > That still leaves the SCSI corruption, which could not have been due to
> > the request issue. What's the pattern there for people?

one more thing I remember when this happened:

a) lots of ld processes from kernel compilation were failing with ENOSPC
although df(1) was showing plenty of memory and I could manually "touch
ok" in the same filesystem just fine.

b) immediately restarting "make -j4 bzImage" would go on for quite a bit
and then hit the same set of .c files and "run out of space" again.

Regards,
Tigran
* Re: corruption
2000-11-29 18:08 ` corruption Tigran Aivazian
@ 2000-11-29 18:14 ` Tigran Aivazian
2000-11-29 18:17 ` corruption Alexander Viro
2000-11-29 18:38 ` corruption Linus Torvalds
2 siblings, 0 replies; 55+ messages in thread
From: Tigran Aivazian @ 2000-11-29 18:14 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Andries.Brouwer, linux-kernel
On Wed, 29 Nov 2000, Tigran Aivazian wrote:

> although df(1) was showing plenty of memory and I could manually "touch
                                       ~~~~~~
space, I meant. *blush*

Tigran
* Re: corruption
2000-11-29 18:08 ` corruption Tigran Aivazian
2000-11-29 18:14 ` corruption Tigran Aivazian
@ 2000-11-29 18:17 ` Alexander Viro
2000-11-29 18:38 ` corruption Linus Torvalds
2 siblings, 0 replies; 55+ messages in thread
From: Alexander Viro @ 2000-11-29 18:17 UTC (permalink / raw)
To: Tigran Aivazian; +Cc: Linus Torvalds, Andries.Brouwer, linux-kernel
On Wed, 29 Nov 2000, Tigran Aivazian wrote:

> On Wed, 29 Nov 2000, Tigran Aivazian wrote:
>
> > On Wed, 29 Nov 2000, Linus Torvalds wrote:
> > > That still leaves the SCSI corruption, which could not have been due to
> > > the request issue. What's the pattern there for people?
>
> one more thing I remember when this happened:
>
> a) lots of ld processes from kernel compilation were failing with ENOSPC
> although df(1) was showing plenty of memory and I could manually "touch
> ok" in the same filesystem just fine.

That is consistent with bitmap corruption: the counters are out of sync
with the block/inode bitmaps.
* Re: corruption
2000-11-29 18:08 ` corruption Tigran Aivazian
2000-11-29 18:14 ` corruption Tigran Aivazian
2000-11-29 18:17 ` corruption Alexander Viro
@ 2000-11-29 18:38 ` Linus Torvalds
2000-11-29 18:47 ` corruption Tigran Aivazian
2 siblings, 1 reply; 55+ messages in thread
From: Linus Torvalds @ 2000-11-29 18:38 UTC (permalink / raw)
To: Tigran Aivazian; +Cc: Andries.Brouwer, linux-kernel
On Wed, 29 Nov 2000, Tigran Aivazian wrote:
> On Wed, 29 Nov 2000, Tigran Aivazian wrote:
>
> > On Wed, 29 Nov 2000, Linus Torvalds wrote:
> > > That still leaves the SCSI corruption, which could not have been due to
> > > the request issue. What's the pattern there for people?
>
> one more thing I remember when this happened:
>
> a) lots of ld processes from kernel compilation were failing with ENOSPC
> although df(1) was showing plenty of memory and I could manually "touch
> ok" in the same filesystem just fine.
>
> b) immediately restarting "make -j4 bzImage" would go on for quite a bit
> and then hit the same set of .c files and "run out of space" again.

Ehh, this is a stupid question, but I've had that happen too, and it
turned out my /tmp filesystem was full, and it runs out of space only with
certain large link cases (never anything else, because all the other
stages of compilation are done with -pipe and do not use /tmp files).

I'm embarrassed to even mention this, but I've been confused myself.

Linus
* Re: corruption
2000-11-29 18:38 ` corruption Linus Torvalds
@ 2000-11-29 18:47 ` Tigran Aivazian
0 siblings, 0 replies; 55+ messages in thread
From: Tigran Aivazian @ 2000-11-29 18:47 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Andries.Brouwer, linux-kernel
On Wed, 29 Nov 2000, Linus Torvalds wrote:

> Ehh, this is a stupid question, but I've had that happen too, and it
> turned out my /tmp filesystem was full, and it runs out of space only with
> certain large link cases (never anything else, because all the other
> stages of compilation are done with -pipe and do not use /tmp files).
>
> I'm embarrassed to even mention this, but I've been confused myself.
>

No, I did check my /tmp, I can assure you. Well, no, I did _not_, but I do
not have a separate /tmp. I like huge root filesystems (until recently,
when I realized that I most often corrupt/work in /usr/src, so it's now
separate and there are multiple roots for disaster recovery), so checking
that root was OK implied /tmp was OK.

Also, running out of space in /tmp can hardly cause a bunch of ext2
messages about freeing blocks not in datazone etc.

Regards,
Tigran

PS. I do have another filesystem which is separate from root, i.e. /boot,
again shared amongst all roots, but I doubt ld(1) is storing anything in
/boot :)
* Re: corruption
2000-11-29 17:47 ` corruption Linus Torvalds
2000-11-29 17:57 ` corruption Tigran Aivazian
@ 2000-11-29 18:07 ` Zdenek Kabelac
1 sibling, 0 replies; 55+ messages in thread
From: Zdenek Kabelac @ 2000-11-29 18:07 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Andries.Brouwer, linux-kernel
Linus Torvalds wrote:
>
> On Wed, 29 Nov 2000 Andries.Brouwer@cwi.nl wrote:
> >
> > > can you give a rough estimate on when you suspect you started seeing it?
> >
> > I reported both cases. That is, I started seeing it a few days ago.

I'm seeing this kind of corruption during tar zxf:

$ tar zxf linux-2.4.0-test11.tar.gz
gzip: stdin: invalid compressed data--format violated
tar: Unexpected EOF in archive
tar: Child returned status 1
tar: Error exit delayed from previous errors

The currently running kernel is 2.4.0-test11-ac4, and the file is correct.
After a reboot there is no problem uncompressing this file.

It's possible that this problem is fixed in test12-pre3, but in case it's
not, I'm reporting it now (this has already happened to me at least three
times with this kernel; also, there are no messages in the kernel log).

-- 
There are three types of people in the world:
those who can count, and those who can't.
Zdenek Kabelac http://i.am/kabi/ kabi@i.am {debian.org; fi.muni.cz}
* corruption
@ 2000-11-29 4:08 Andries.Brouwer
2000-11-29 5:09 ` corruption Linus Torvalds
0 siblings, 1 reply; 55+ messages in thread
From: Andries.Brouwer @ 2000-11-29 4:08 UTC (permalink / raw)
To: linux-kernel, torvalds
I did again a large test comparing two identical trees.
Found again corruption, and, upon inspection, the disk
files did not differ - this is in-core corruption only.
A few days ago:
diff -r /c2/linux/linux-2.4.0-test10/linux/include/asm-sparc/ecc.h /g1/linux/linux-2.4.0-test10/linux/include/asm-sparc/ecc.h
80,83c80,95
< #define ECC_FADDR0_CACHE 0x00000800
< #define ECC_FADDR0_SIZE 0x00000700
< #define ECC_FADDR0_TYPE 0x000000f0
< #define ECC_FADDR0_PADDR 0x0000000f
---
> #define ECC_FADDR0_Ccount << RATIO_SCALE_LOG;
> if (db->bytes_out != 0)
> {
> new_ratio /= db->bytes_out;
> }
>
> if (new_ratio < db->ratio || new_ratio < 1 * RATIO_SCALE)
> {
> bsd_clear (db);
> return 1;
> }
> db->ratio = new_ratio;
> }
> }
> return 0;
> }
Here the corruption starts precisely 3072 bytes into the file
(which lives on a filesystem with 1024-byte blocks).
But the tail is a fragment of drivers/isdn/isdn_bsdcomp.c
starting at an offset of 7168 bytes.
The former lives on blocks 6373895 6373896 6373897 6373898 6373899,
the other on blocks 2475568...2475579,2475616...2475628.
Today:
diff -r /g1/linux/linux-2.4.0-test11vanilla/linux/net/sched/sch_cbq.c /c2/linux/linux-2.4.0-test11vanilla/linux/net/sched/sch_cbq.c
2000c2000,2115
< cbq_destr\201XM^@\202XM^@^@^@^@^@^@^@^@^@^@^@...
(lots of nulls)
with corruption starting at an offset of 47104=46*1024 bytes.
Don't know where the corruption part is from.
Andries
* Re: corruption
2000-11-29 4:08 corruption Andries.Brouwer
@ 2000-11-29 5:09 ` Linus Torvalds
2000-11-29 9:08 ` corruption Alexander Viro
0 siblings, 1 reply; 55+ messages in thread
From: Linus Torvalds @ 2000-11-29 5:09 UTC (permalink / raw)
To: Andries.Brouwer; +Cc: linux-kernel
On Wed, 29 Nov 2000 Andries.Brouwer@cwi.nl wrote:
>
> I did again a large test comparing two identical trees.
> Found again corruption, and, upon inspection, the disk
> files did not differ - this is in-core corruption only.

Ok. It definitely looks like the 1kB thing has become broken somehow.

The fact that it is in-core only doesn't mean that much - it could still
easily be just problems at read-time, and if you have an IDE disk I would
strongly suggest you try out the patch that Jens Axboe posted,
re-initializing the "head" pointer when doing a re-merge.

That said, the VM/ext2 angle should definitely be looked at too. Nothing
has really changed there in some time - can you give a rough estimate on
when you suspect you started seeing it? Ie is it new to one of the test11
pre-kernels, or does it happen so occasionally that you can't tell whether
it happened much earlier too?

Linus
* Re: corruption
From: Alexander Viro @ 2000-11-29 9:08 UTC
To: Linus Torvalds; +Cc: Andries.Brouwer, Tigran Aivazian, linux-kernel

On Tue, 28 Nov 2000, Linus Torvalds wrote:

> The fact that it is in-core only doesn't mean that much - it could still
> easily be just problems at read-time, and if you have an IDE disk I would
> strongly suggest you try out the patch that Jens Axboe posted,
> re-initializing the "head" pointer when doing a re-merge.
>
> That said, the VM/ext2 angle should definitely be looked at too. Nothing
> has really changed there in some time - can you give a rough estimate on
> when you suspect you started seeing it? Ie is it new to one of the test11
> pre-kernels, or does it happen so occasionally that you can't tell whether
> it happened much earlier too?

Problem fixed by Jens' patch had been there since March, so if it's a
mix of __make_request() screwing up and something else... Urgh.

I'd really like to see details on the box with ext2 corruption on SCSI.
Tigran, IIRC you had it on SCSI boxen, right? Could you send me relevant
part of logs?
* Re: corruption
From: Tigran Aivazian @ 2000-11-29 9:20 UTC
To: Alexander Viro; +Cc: linux-kernel

On Wed, 29 Nov 2000, Alexander Viro wrote:
>
> I'd really like to see details on the box with ext2 corruption on SCSI.
> Tigran, IIRC you had it on SCSI boxen, right? Could you send me relevant
> part of logs?
>

I definitely did have this very corruption on a 4xXeon SCSI-only box. But
the bad news is that I reinstalled redhat7 on it immediately after this
happened so I don't have the logs. _However_, I don't need that particular
root filesystem there anymore (since more disks arrive today and I'm
rearranging stuff) so I'll try and corrupt it for you right now. Using
test12-pre3, unless you have better suggestions on what to do to help.

Regards,
Tigran
* Re: corruption
From: Alexander Viro @ 2000-11-29 9:26 UTC
To: Tigran Aivazian; +Cc: linux-kernel

On Wed, 29 Nov 2000, Tigran Aivazian wrote:

> On Wed, 29 Nov 2000, Alexander Viro wrote:
> >
> > I'd really like to see details on the box with ext2 corruption on SCSI.
> > Tigran, IIRC you had it on SCSI boxen, right? Could you send me relevant
> > part of logs?
> >
>
> I definitely did have this very corruption on a 4xXeon SCSI-only box. But

"This" as in "range of blocks duplicated onto another range", "random crap
in indirect blocks" or both?

> the bad news is that I reinstalled redhat7 on it immediately after this
> happened so I don't have the logs. _However_, I don't need that particular
> root filesystem there anymore (since more disks arrive today and I'm
> rearranging stuff) so I'll try and corrupt it for you right now. Using
> test12-pre3, unless you have better suggestions on what to do to help.

Could you look for duplicates too?
TIA,
							Al
* Re: corruption
From: Tigran Aivazian @ 2000-11-29 10:52 UTC
To: Alexander Viro; +Cc: linux-kernel

On Wed, 29 Nov 2000, Alexander Viro wrote:
> Could you look for duplicates too?

Will do. One useful finding so far -- trying simultaneous

	mke2fs /dev/sdX1

for X = {b,c,d,e,f} deadlocks the machine dead (and without kdb such death
was in vain). (each disk is 37G, RAM is 6G)

I know this is off-topic for this thread but not for this list. I am
continuing to pursue the corruption. Nothing yet.

Regards,
Tigran
* Re: corruption
From: Andrea Arcangeli @ 2000-11-29 18:56 UTC
To: Alexander Viro
Cc: Linus Torvalds, Andries.Brouwer, Tigran Aivazian, linux-kernel

On Wed, Nov 29, 2000 at 04:08:26AM -0500, Alexander Viro wrote:
> Problem fixed by Jens' patch had been there since March, so if it's a

No, it's there only since Jens fixed the request merging bug in
test11 or so.

With previous kernel the head pointer couldn't change so that
change was unnecessary and initializing it outside the critical
section was a micro scalability optimization :).

Andrea
* Re: corruption
From: Rik van Riel @ 2000-11-29 19:05 UTC
To: Andrea Arcangeli
Cc: Alexander Viro, Linus Torvalds, Andries.Brouwer, Tigran Aivazian, linux-kernel

On Wed, 29 Nov 2000, Andrea Arcangeli wrote:
> On Wed, Nov 29, 2000 at 04:08:26AM -0500, Alexander Viro wrote:
> > Problem fixed by Jens' patch had been there since March, so if it's a
>
> No, it's there only since Jens fixed the request merging bug in
> test11 or so.
>
> With previous kernel the head pointer couldn't change so that
> change was unnecessary and initializing it outside the critical
> section was a micro scalability optimization :).

To be honest, I have a big problem with micro optimisations
that prevent the big optimisations from happening.

Would it be an idea to explicitly comment such dangerous
micro optimisations so people implementing the big optimisations
later on won't run into nasty surprises?

regards,

Rik
--
Hollywood goes for world dumbination,
	Trailer at 11.

http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com.br/
* Re: corruption
From: Andrea Arcangeli @ 2000-11-29 19:27 UTC
To: Rik van Riel
Cc: Alexander Viro, Linus Torvalds, Andries.Brouwer, Tigran Aivazian, linux-kernel

On Wed, Nov 29, 2000 at 05:05:20PM -0200, Rik van Riel wrote:
> To be honest, I have a big problem with micro optimisations
> that prevent the big optimisations from happening.
>
> Would it be an idea to explicitly comment such dangerous
> micro optimisations so people implementing the big optimisations
> later on won't run into nasty surprises?

Did you read the code we're talking about?

Andrea
* Re: corruption
From: Rik van Riel @ 2000-11-29 20:02 UTC
To: Andrea Arcangeli
Cc: Alexander Viro, Linus Torvalds, Andries.Brouwer, Tigran Aivazian, linux-kernel

On Wed, 29 Nov 2000, Andrea Arcangeli wrote:
> On Wed, Nov 29, 2000 at 05:05:20PM -0200, Rik van Riel wrote:
> > To be honest, I have a big problem with micro optimisations
> > that prevent the big optimisations from happening.
> >
> > Would it be an idea to explicitly comment such dangerous
> > micro optimisations so people implementing the big optimisations
> > later on won't run into nasty surprises?
>
> Did you read the code we're talking about?

This particular piece of code may be a bad example of my
"complaint", but I guess we can just as easily take something
like shrink_mmap() as our example ...

regards,

Rik
* Re: corruption
From: Linus Torvalds @ 2000-11-29 19:25 UTC
To: Alexander Viro; +Cc: Andries.Brouwer, Tigran Aivazian, linux-kernel

On Wed, 29 Nov 2000, Alexander Viro wrote:
>
> Problem fixed by Jens' patch had been there since March, so if it's a
> mix of __make_request() screwing up and something else... Urgh.

No, the bug really got introduced in test11 due to the request merging
stuff.

The patch may _look_ like it fixed a generic problem that has been there
forever, but we didn't actually need the spinlock for initializing "head"
at all. It's initialized to a constant offset within the unchanging request
queue, so we can happily do it outside the spinlock.

The reason the initialization was moved inside the spinlock was really
just that it had to be re-initialized for the case where we re-did the
merge, so it had to be moved down to inside the loop - and it just happens
to happen inside the spinlock now.

So the spinlock protection was never relevant to the bug - forgetting to
re-initialize a variable when straight-line code was turned into a loop
was the bug.

> I'd really like to see details on the box with ext2 corruption on SCSI.
> Tigran, IIRC you had it on SCSI boxen, right? Could you send me relevant
> part of logs?

I suspect that Tigran may have seen other instability (of which we had
lots back when he saw it), and that the current rash is for the IDE
problem only.

Which is not to say that there might not be SCSI issues or other issues
too, but I'm also not convinced that the SCSI thing might not just be a
red herring at this point.

		Linus
* Re: corruption
From: Alexander Viro @ 2000-11-29 19:57 UTC
To: Linus Torvalds; +Cc: Andries.Brouwer, Tigran Aivazian, linux-kernel

On Wed, 29 Nov 2000, Linus Torvalds wrote:

> On Wed, 29 Nov 2000, Alexander Viro wrote:
> >
> > Problem fixed by Jens' patch had been there since March, so if it's a
> > mix of __make_request() screwing up and something else... Urgh.
>
> No, the bug really got introduced in test11 due to the request merging
> stuff.
>
> The patch may _look_ like it fixed a generic problem that has been there
> forever, but we didn't actually need the spinlock for initializing "head"

Sure.

> at all. It's initialized to a constant offset within the unchanging request
> queue, so we can happily do it outside the spinlock.

Actually, I was not thinking about spinlock. What I missed was the fact
that "again:" was quite recent. My apologies...

> > I'd really like to see details on the box with ext2 corruption on SCSI.
> > Tigran, IIRC you had it on SCSI boxen, right? Could you send me relevant
> > part of logs?
>
> I suspect that Tigran may have seen other instability (of which we had
> lots back when he saw it), and that the current rash is for the IDE
> problem only.
>
> Which is not to say that there might not be SCSI issues or other issues
> too, but I'm also not convinced that the SCSI thing might not just be a
> red herring at this point.

There are two quite distinct patterns: duplicated range vs. crap in
metadata. The former looks like a bug caught by Jens. The latter
(especially in bitmaps) seems to be older[1] and independent from elevator
stuff. _That_ may be a fs/buffer.c or fs/ext2/* bug. The former definitely
lives below the fs/buffer.c level.

[1] "older" may mean "shared with 2.2" here - ISTR bug reports looking like
that and IIRC they were never resolved. BTW, if you know some searchable
l-k archive... DN sucks coprolites through the straw these days ;-/
* Re: corruption
From: Andrea Arcangeli @ 2000-11-29 20:36 UTC
To: Alexander Viro
Cc: Linus Torvalds, Andries.Brouwer, Tigran Aivazian, linux-kernel

On Wed, Nov 29, 2000 at 02:57:11PM -0500, Alexander Viro wrote:
> that "again:" was quite recent. My apologies...

Never mind, strict patch reading was obviously misleading in this case.

> [1] "older" may mean "shared with 2.2" here - ISTR bug reports looking like
> that and IIRC they were never resolved. [..]

I don't recall any report after the 2.2.1x series, but I might have missed
them. Previous reports may be due to the bugs that got fixed there.

Andrea
Thread overview: 55+ messages
2000-11-29 21:54 corruption Andries.Brouwer
2000-11-29 22:18 ` corruption Alexander Viro
2000-11-30 14:21 ` corruption Andrew Morton
2000-11-30 18:39 ` corruption Jonathan Hudson
2000-11-30 19:07 ` corruption Alexander Viro
2000-11-30 21:35 ` corruption Andrew Morton
2000-12-01 0:57 ` corruption Andrew Morton
2000-12-01 12:18 ` corruption Jens Axboe
2000-12-01 12:34 ` corruption Andrew Morton
2000-12-01 12:37 ` corruption Jens Axboe
2000-12-01 12:23 ` corruption Andrew Morton
2000-12-01 15:04 ` corruption Lawrence Walton
2000-12-01 14:16 ` corruption Stephen C. Tweedie
2000-12-01 23:28 ` corruption Andrew Morton
2000-12-02 0:30 ` corruption kumon
2000-12-02 3:59 ` corruption Andrew Morton
2000-12-02 14:00 ` corruption Andrew Morton
2000-12-02 15:33 ` corruption Alexander Viro
2000-12-02 16:39 ` corruption Petr Vandrovec
2000-12-02 17:50 ` corruption Alexander Viro
2000-12-02 17:59 ` corruption Alexander Viro
2000-12-03 20:24 ` corruption Jonathan Hudson
2000-12-03 21:44 ` corruption Andrew Morton
2000-12-03 22:45 ` [resync?] corruption Alexander Viro
2000-12-04 0:56 ` Jeff V. Merkey
2000-12-04 15:00 ` corruption Stephen C. Tweedie
2000-12-04 15:19 ` corruption Alexander Viro
2000-12-01 17:29 ` corruption Jeff Garzik
[not found] <20001202161158.A475@ppc.vc.cvut.cz>
2000-12-02 15:35 ` corruption Petr Vandrovec
-- strict thread matches above, loose matches on Subject: below --
2000-11-29 13:44 corruption Andries.Brouwer
2000-11-29 14:10 ` corruption Tigran Aivazian
2000-11-29 14:16 ` corruption Alexander Viro
2000-11-29 14:26 ` corruption Jens Axboe
2000-11-29 11:16 corruption Andries.Brouwer
2000-11-29 17:47 ` corruption Linus Torvalds
2000-11-29 17:57 ` corruption Tigran Aivazian
2000-11-29 18:08 ` corruption Tigran Aivazian
2000-11-29 18:14 ` corruption Tigran Aivazian
2000-11-29 18:17 ` corruption Alexander Viro
2000-11-29 18:38 ` corruption Linus Torvalds
2000-11-29 18:47 ` corruption Tigran Aivazian
2000-11-29 18:07 ` corruption Zdenek Kabelac
2000-11-29 4:08 corruption Andries.Brouwer
2000-11-29 5:09 ` corruption Linus Torvalds
2000-11-29 9:08 ` corruption Alexander Viro
2000-11-29 9:20 ` corruption Tigran Aivazian
2000-11-29 9:26 ` corruption Alexander Viro
2000-11-29 10:52 ` corruption Tigran Aivazian
2000-11-29 18:56 ` corruption Andrea Arcangeli
2000-11-29 19:05 ` corruption Rik van Riel
2000-11-29 19:27 ` corruption Andrea Arcangeli
2000-11-29 20:02 ` corruption Rik van Riel
2000-11-29 19:25 ` corruption Linus Torvalds
2000-11-29 19:57 ` corruption Alexander Viro
2000-11-29 20:36 ` corruption Andrea Arcangeli