* PROBLEM: Oops when doing disk heavy disk I/O @ 2006-10-24 15:56 Michael Sallaway 2006-10-24 17:26 ` Sergio Monteiro Basto 2006-10-24 17:43 ` Badari Pulavarty 0 siblings, 2 replies; 11+ messages in thread From: Michael Sallaway @ 2006-10-24 15:56 UTC (permalink / raw) To: linux-kernel [1.] One line summary of the problem: Kernel Oopses when doing I/O to a disk (using dd). [2.] Full description of the problem/report: When writing to [any] disk (IDE or SCSI), the kernel will Oops after short periods of time ranging from 30 seconds to 5-10 minutes. Sometimes this is a complete crash (with "Aieee, killing interrupt handler!"), sometimes it's just an oops but the system doesn't crash comepletely (isn't very usable, though), and sometimes it just gives a "general protection fault: 0000 [1] SMP". It's worth mentioning that I've managed to set up the entire system without incident -- a debian netinstall, downloading new packages, changing things, etc. The only reason I discovered this was when copying large amounts of data off another machine, and it died reproducably after a few gigabytes of data. (I originally thought it was an XFS issue, but had the same problem with EXT3, and all other combinations I tried - I can now reproduce it by using dd if=/dev/zero of=/dev/hda6.) Ultimately, I've tried it with different (known good) devices and hard drives. The only common things between all failures are the CPU (Athlon 64 3200), Motherboard (Asus M2N-E), and RAM (2GB DDR2-533). (Memtest x86 shows the memory to be fine.) As such, I'm suspecting it's something to do with the motherboard -- It's using an x86_64 kernel (although it does the same with an i386), on an nforce 570 motherboard. Other things I have tried: - SATA, SCSI and IDE drives -- all do the same thing - removing *all* drives and cards and devices -- it does it with a single IDE drive connected and no PCI cards - kernels 2.6.16, 18, 18.1, 19-rc3. - the patch suggested in http://marc.theaimsgroup.com/?l=linux-kernel&m=115545986619977&w=2 - booting with noapic and/or acpi=off (as suggested http://tinyurl.com/yn97woby) - with and without md devices [3.] Keywords (i.e., modules, networking, kernel): kernel disk I/O nforce [4.] Kernel version (from /proc/version): Linux version 2.6.19-rc3 (root@barbossa) (gcc version 4.1.2 20061007 (prerelease) (Debian 4.1.1-16)) #1 SMP Tue Oct 24 22:23:07 EST 2006 [5.] Output of Oops.. message (if applicable) with symbolic information resolved (see Documentation/oops-tracing.txt) (as I understand it from Documentation/oops-tracing.txt, ksymoops doesn't apply anymore? If that's not the case, I apologise -- could someone tell me what I need to do with the below?) Oct 25 00:23:11 barbossa kernel: Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP: Oct 25 00:23:11 barbossa kernel: [<ffffffff8021a6c5>] __block_write_full_page+0xa4/0x2df Oct 25 00:23:11 barbossa kernel: PGD 2b93a067 PUD 7c7cc067 PMD 0 Oct 25 00:23:11 barbossa kernel: Oops: 0000 [1] SMP Oct 25 00:23:11 barbossa kernel: CPU 0 Oct 25 00:23:11 barbossa kernel: Modules linked in: Oct 25 00:23:11 barbossa kernel: Pid: 2442, comm: dd Not tainted 2.6.19-rc3 #1 Oct 25 00:23:11 barbossa kernel: RIP: 0010:[<ffffffff8021a6c5>] [<ffffffff8021a6c5>] __block_write_full_page+0xa4/0x2df Oct 25 00:23:11 barbossa kernel: RSP: 0018:ffff81003fa358f8 EFLAGS: 00010283 Oct 25 00:23:11 barbossa kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002 Oct 25 00:23:11 barbossa kernel: RDX: 000000000000000a RSI: 000000000019eea9 RDI: ffff810037cc8440 Oct 25 00:23:11 barbossa kernel: RBP: ffff810001602550 R08: ffff810037cc8440 R09: ffff81003fa35b48 Oct 25 00:23:11 barbossa kernel: R10: ffffffff802bee4c R11: ffffffff80440b8b R12: ffff81001b3426e0 Oct 25 00:23:11 barbossa kernel: R13: 000000000067baa6 R14: ffff810037cc8440 R15: 0000000001c42574 Oct 25 00:23:11 barbossa kernel: FS: 00002b64b70eb6d0(0000) GS:ffffffff807d6000(0000) knlGS:0000000000000000 Oct 25 00:23:11 barbossa kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b Oct 25 00:23:11 barbossa kernel: CR2: 0000000000000000 CR3: 000000007b6b7000 CR4: 00000000000006e0 Oct 25 00:23:11 barbossa kernel: Process dd (pid: 2442, threadinfo ffff81003fa34000, task ffff810037e44140) Oct 25 00:23:11 barbossa kernel: Stack: ffff81003fa35b48 ffffffff802bee4c 0000040001602550 ffff810001602550 Oct 25 00:23:11 barbossa kernel: ffff81003fa35b48 ffff810037cc8550 000000000000000b ffff81007efe9b08 Oct 25 00:23:11 barbossa kernel: 0000000000000000 ffffffff8029db1e 000000000000000e ffffffff802bdff3 Oct 25 00:23:11 barbossa kernel: Call Trace: Oct 25 00:23:11 barbossa kernel: [<ffffffff802bee4c>] blkdev_get_block+0x0/0x46 Oct 25 00:23:11 barbossa kernel: [<ffffffff8029db1e>] generic_writepages+0x18e/0x2d8 Oct 25 00:23:11 barbossa kernel: [<ffffffff802bdff3>] blkdev_writepage+0x0/0xf Oct 25 00:23:11 barbossa kernel: [<ffffffff8025591f>] do_writepages+0x20/0x2d Oct 25 00:23:11 barbossa kernel: [<ffffffff8022cf8c>] __writeback_single_inode+0x1b4/0x38b Oct 25 00:23:11 barbossa kernel: [<ffffffff8021f44f>] sync_sb_inodes+0x1d1/0x2b5 Oct 25 00:23:11 barbossa kernel: [<ffffffff8024b3b7>] writeback_inodes+0x82/0xd8 Oct 25 00:23:11 barbossa kernel: [<ffffffff8029de51>] balance_dirty_pages_ratelimited_nr+0x115/0x1f6 Oct 25 00:23:11 barbossa kernel: [<ffffffff8020f4b1>] generic_file_buffered_write+0x516/0x64b Oct 25 00:23:11 barbossa kernel: [<ffffffff802295b6>] remove_suid+0x1/0x1c Oct 25 00:23:11 barbossa kernel: [<ffffffff80215018>] __generic_file_aio_write_nolock+0x375/0x3e8 Oct 25 00:23:11 barbossa kernel: [<ffffffff802078bb>] unmap_vmas+0x372/0x716 Oct 25 00:23:11 barbossa kernel: [<ffffffff8029bdea>] generic_file_aio_write_nolock+0x3a/0x86 Oct 25 00:23:11 barbossa kernel: [<ffffffff802166a9>] do_sync_write+0xc9/0x10c Oct 25 00:23:11 barbossa kernel: [<ffffffff802899a3>] autoremove_wake_function+0x0/0x2e Oct 25 00:23:11 barbossa kernel: [<ffffffff8022c1e9>] __clear_user+0x12/0x50 Oct 25 00:23:11 barbossa kernel: [<ffffffff8048c584>] read_zero+0x1d1/0x23c Oct 25 00:23:11 barbossa kernel: [<ffffffff802153ee>] vfs_write+0xce/0x174 Oct 25 00:23:11 barbossa kernel: [<ffffffff8020afee>] fget_light+0x18/0x7c Oct 25 00:23:11 barbossa kernel: [<ffffffff80215d2f>] sys_write+0x45/0x6e Oct 25 00:23:11 barbossa kernel: [<ffffffff8025811e>] system_call+0x7e/0x83 Oct 25 00:23:11 barbossa kernel: Oct 25 00:23:11 barbossa kernel: Oct 25 00:23:11 barbossa kernel: Code: 8b 03 a8 20 75 6c 8b 03 a8 02 74 66 8b 44 24 14 48 39 43 20 Oct 25 00:23:11 barbossa kernel: RIP [<ffffffff8021a6c5>] __block_write_full_page+0xa4/0x2df Oct 25 00:23:11 barbossa kernel: RSP <ffff81003fa358f8> Oct 25 00:23:11 barbossa kernel: CR2: 0000000000000000 [6.] A small shell script or example program which triggers the problem (if possible) dd if=/dev/zero of=/dev/hda6 bs=512K (note that it will also happen without the bs argument, however usually takes longer. It's not related to a particular point in the disk, or anything, though, just seems to last longer.) [7.] Environment Please see the below output for more environment information -- I didn't want to dump too much info in here. :-) http://sallaway.org/lkml/output.txt also, I've seen mention of this, but I'm not sure if it would be useful: http://sallaway.org/lkml/System.map-2.6.19-rc3 Please let me know if you need any more info. This is my first bug report, so apologies if this should have gone elsewhere. :-) Cheers, Michael ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: PROBLEM: Oops when doing disk heavy disk I/O 2006-10-24 15:56 PROBLEM: Oops when doing disk heavy disk I/O Michael Sallaway @ 2006-10-24 17:26 ` Sergio Monteiro Basto 2006-10-25 0:21 ` Michael 2006-10-24 17:43 ` Badari Pulavarty 1 sibling, 1 reply; 11+ messages in thread From: Sergio Monteiro Basto @ 2006-10-24 17:26 UTC (permalink / raw) To: Michael Sallaway; +Cc: linux-kernel Hi, please boot with "report_lost_ticks" like http://marc.theaimsgroup.com/?l=linux-kernel&m=115545986619977&w=2 and see if you have time.c: Lost n timer tick(s)! and cat /sys/devices/system/clocksource/clocksource0/ cat /sys/devices/system/clocksource/clocksource0/current_clocksource Thanks, On Wed, 2006-10-25 at 01:56 +1000, Michael Sallaway wrote: > [1.] One line summary of the problem: > Kernel Oopses when doing I/O to a disk (using dd). > > [2.] Full description of the problem/report: > When writing to [any] disk (IDE or SCSI), the kernel will Oops after > short periods of time ranging from 30 seconds to 5-10 minutes. > Sometimes this is a complete crash (with "Aieee, killing interrupt > handler!"), sometimes it's just an oops but the system doesn't crash > comepletely (isn't very usable, though), and sometimes it just gives a > "general protection fault: 0000 [1] SMP". > > It's worth mentioning that I've managed to set up the entire system > without incident -- a debian netinstall, downloading new packages, > changing things, etc. The only reason I discovered this was when > copying large amounts of data off another machine, and it died > reproducably after a few gigabytes of data. (I originally thought it > was an XFS issue, but had the same problem with EXT3, and all other > combinations I tried - I can now reproduce it by using dd if=/dev/zero > of=/dev/hda6.) > > Ultimately, I've tried it with different (known good) devices and hard > drives. The only common things between all failures are the CPU > (Athlon 64 3200), Motherboard (Asus M2N-E), and RAM (2GB DDR2-533). > (Memtest x86 shows the memory to be fine.) As such, I'm suspecting > it's something to do with the motherboard -- It's using an x86_64 > kernel (although it does the same with an i386), on an nforce 570 > motherboard. > > Other things I have tried: > - SATA, SCSI and IDE drives -- all do the same thing > - removing *all* drives and cards and devices -- it does it with a > single IDE drive connected and no PCI cards > - kernels 2.6.16, 18, 18.1, 19-rc3. > - the patch suggested in > http://marc.theaimsgroup.com/?l=linux-kernel&m=115545986619977&w=2 > - booting with noapic and/or acpi=off (as suggested http://tinyurl.com/yn97woby) > - with and without md devices > > > [3.] Keywords (i.e., modules, networking, kernel): > kernel disk I/O nforce > > [4.] Kernel version (from /proc/version): > Linux version 2.6.19-rc3 (root@barbossa) (gcc version 4.1.2 20061007 > (prerelease) (Debian 4.1.1-16)) #1 SMP Tue Oct 24 22:23:07 EST 2006 > > [5.] Output of Oops.. message (if applicable) with symbolic > information resolved (see Documentation/oops-tracing.txt) > (as I understand it from Documentation/oops-tracing.txt, ksymoops > doesn't apply anymore? If that's not the case, I apologise -- could > someone tell me what I need to do with the below?) > > Oct 25 00:23:11 barbossa kernel: Unable to handle kernel NULL pointer > dereference at 0000000000000000 RIP: > Oct 25 00:23:11 barbossa kernel: [<ffffffff8021a6c5>] > __block_write_full_page+0xa4/0x2df > Oct 25 00:23:11 barbossa kernel: PGD 2b93a067 PUD 7c7cc067 PMD 0 > Oct 25 00:23:11 barbossa kernel: Oops: 0000 [1] SMP > Oct 25 00:23:11 barbossa kernel: CPU 0 > Oct 25 00:23:11 barbossa kernel: Modules linked in: > Oct 25 00:23:11 barbossa kernel: Pid: 2442, comm: dd Not tainted 2.6.19-rc3 #1 > Oct 25 00:23:11 barbossa kernel: RIP: 0010:[<ffffffff8021a6c5>] > [<ffffffff8021a6c5>] __block_write_full_page+0xa4/0x2df > Oct 25 00:23:11 barbossa kernel: RSP: 0018:ffff81003fa358f8 EFLAGS: 00010283 > Oct 25 00:23:11 barbossa kernel: RAX: 0000000000000000 RBX: > 0000000000000000 RCX: 0000000000000002 > Oct 25 00:23:11 barbossa kernel: RDX: 000000000000000a RSI: > 000000000019eea9 RDI: ffff810037cc8440 > Oct 25 00:23:11 barbossa kernel: RBP: ffff810001602550 R08: > ffff810037cc8440 R09: ffff81003fa35b48 > Oct 25 00:23:11 barbossa kernel: R10: ffffffff802bee4c R11: > ffffffff80440b8b R12: ffff81001b3426e0 > Oct 25 00:23:11 barbossa kernel: R13: 000000000067baa6 R14: > ffff810037cc8440 R15: 0000000001c42574 > Oct 25 00:23:11 barbossa kernel: FS: 00002b64b70eb6d0(0000) > GS:ffffffff807d6000(0000) knlGS:0000000000000000 > Oct 25 00:23:11 barbossa kernel: CS: 0010 DS: 0000 ES: 0000 CR0: > 000000008005003b > Oct 25 00:23:11 barbossa kernel: CR2: 0000000000000000 CR3: > 000000007b6b7000 CR4: 00000000000006e0 > Oct 25 00:23:11 barbossa kernel: Process dd (pid: 2442, threadinfo > ffff81003fa34000, task ffff810037e44140) > Oct 25 00:23:11 barbossa kernel: Stack: ffff81003fa35b48 > ffffffff802bee4c 0000040001602550 ffff810001602550 > Oct 25 00:23:11 barbossa kernel: ffff81003fa35b48 ffff810037cc8550 > 000000000000000b ffff81007efe9b08 > Oct 25 00:23:11 barbossa kernel: 0000000000000000 ffffffff8029db1e > 000000000000000e ffffffff802bdff3 > Oct 25 00:23:11 barbossa kernel: Call Trace: > Oct 25 00:23:11 barbossa kernel: [<ffffffff802bee4c>] blkdev_get_block+0x0/0x46 > Oct 25 00:23:11 barbossa kernel: [<ffffffff8029db1e>] > generic_writepages+0x18e/0x2d8 > Oct 25 00:23:11 barbossa kernel: [<ffffffff802bdff3>] blkdev_writepage+0x0/0xf > Oct 25 00:23:11 barbossa kernel: [<ffffffff8025591f>] do_writepages+0x20/0x2d > Oct 25 00:23:11 barbossa kernel: [<ffffffff8022cf8c>] > __writeback_single_inode+0x1b4/0x38b > Oct 25 00:23:11 barbossa kernel: [<ffffffff8021f44f>] > sync_sb_inodes+0x1d1/0x2b5 > Oct 25 00:23:11 barbossa kernel: [<ffffffff8024b3b7>] > writeback_inodes+0x82/0xd8 > Oct 25 00:23:11 barbossa kernel: [<ffffffff8029de51>] > balance_dirty_pages_ratelimited_nr+0x115/0x1f6 > Oct 25 00:23:11 barbossa kernel: [<ffffffff8020f4b1>] > generic_file_buffered_write+0x516/0x64b > Oct 25 00:23:11 barbossa kernel: [<ffffffff802295b6>] remove_suid+0x1/0x1c > Oct 25 00:23:11 barbossa kernel: [<ffffffff80215018>] > __generic_file_aio_write_nolock+0x375/0x3e8 > Oct 25 00:23:11 barbossa kernel: [<ffffffff802078bb>] unmap_vmas+0x372/0x716 > Oct 25 00:23:11 barbossa kernel: [<ffffffff8029bdea>] > generic_file_aio_write_nolock+0x3a/0x86 > Oct 25 00:23:11 barbossa kernel: [<ffffffff802166a9>] do_sync_write+0xc9/0x10c > Oct 25 00:23:11 barbossa kernel: [<ffffffff802899a3>] > autoremove_wake_function+0x0/0x2e > Oct 25 00:23:11 barbossa kernel: [<ffffffff8022c1e9>] __clear_user+0x12/0x50 > Oct 25 00:23:11 barbossa kernel: [<ffffffff8048c584>] read_zero+0x1d1/0x23c > Oct 25 00:23:11 barbossa kernel: [<ffffffff802153ee>] vfs_write+0xce/0x174 > Oct 25 00:23:11 barbossa kernel: [<ffffffff8020afee>] fget_light+0x18/0x7c > Oct 25 00:23:11 barbossa kernel: [<ffffffff80215d2f>] sys_write+0x45/0x6e > Oct 25 00:23:11 barbossa kernel: [<ffffffff8025811e>] system_call+0x7e/0x83 > Oct 25 00:23:11 barbossa kernel: > Oct 25 00:23:11 barbossa kernel: > Oct 25 00:23:11 barbossa kernel: Code: 8b 03 a8 20 75 6c 8b 03 a8 02 > 74 66 8b 44 24 14 48 39 43 20 > Oct 25 00:23:11 barbossa kernel: RIP [<ffffffff8021a6c5>] > __block_write_full_page+0xa4/0x2df > Oct 25 00:23:11 barbossa kernel: RSP <ffff81003fa358f8> > Oct 25 00:23:11 barbossa kernel: CR2: 0000000000000000 > > > [6.] A small shell script or example program which triggers the > problem (if possible) > dd if=/dev/zero of=/dev/hda6 bs=512K > > (note that it will also happen without the bs argument, however > usually takes longer. It's not related to a particular point in the > disk, or anything, though, just seems to last longer.) > > > [7.] Environment > > Please see the below output for more environment information -- I > didn't want to dump too much info in here. :-) > > http://sallaway.org/lkml/output.txt > > also, I've seen mention of this, but I'm not sure if it would be useful: > > http://sallaway.org/lkml/System.map-2.6.19-rc3 > > > Please let me know if you need any more info. This is my first bug > report, so apologies if this should have gone elsewhere. :-) > > Cheers, > Michael > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ ^ permalink raw reply [flat|nested] 11+ messages in thread
* RE: PROBLEM: Oops when doing disk heavy disk I/O 2006-10-24 17:26 ` Sergio Monteiro Basto @ 2006-10-25 0:21 ` Michael 2006-10-25 0:41 ` Sergio Monteiro Basto 0 siblings, 1 reply; 11+ messages in thread From: Michael @ 2006-10-25 0:21 UTC (permalink / raw) To: 'Sergio Monteiro Basto'; +Cc: linux-kernel > -----Original Message----- > From: Sergio Monteiro Basto [mailto:sergio@sergiomb.no-ip.org] > Sent: Wednesday, 25 October 2006 3:27 AM > > Hi, > please boot with "report_lost_ticks" like > http://marc.theaimsgroup.com/?l=linux-kernel&m=115545986619977&w=2 > and see if you have > time.c: Lost n timer tick(s)! > > and > cat /sys/devices/system/clocksource/clocksource0/ > > cat /sys/devices/system/clocksource/clocksource0/current_clocksource > > Hi, Yep, seems to be losing some ticks -- I'm not sure what that means, but is it bad? :-) [ root @ barbossa : ~ ] # dmesg | grep tick Command line: root=/dev/md1 ro report_lost_ticks Kernel command line: root=/dev/md1 ro report_lost_ticks time.c: Lost 1 timer tick(s)! rip release_console_sem+0x196/0x20c) time.c: Lost 12 timer tick(s)! rip setup_boot_APIC_clock+0x11f/0x121) time.c: Lost 2 timer tick(s)! rip __do_softirq+0x4a/0xc4) (I've put a full dmesg below). Also, for the clocksource: [ root @ barbossa : ~ ] # ls /sys/devices/system/clocksource/clocksource0/ available_clocksource current_clocksource [ root @ barbossa : ~ ] # cat /sys/devices/system/clocksource/clocksource0/ cat: /sys/devices/system/clocksource/clocksource0/: Is a directory [ root @ barbossa : ~ ] # cat /sys/devices/system/clocksource/clocksource0/available_clocksource jiffies [ root @ barbossa : ~ ] # cat /sys/devices/system/clocksource/clocksource0/current_clocksource jiffies Thanks for your help. Cheers, Michael Full dmesg: [ root @ barbossa : ~ ] # dmesg Linux version 2.6.19-rc3 (root@barbossa) (gcc version 4.1.2 20061007 (prerelease) (Debian 4.1.1-16)) #1 SMP Tue Oct 24 22:23:07 EST 2006 Command line: root=/dev/md1 ro report_lost_ticks BIOS-provided physical RAM map: BIOS-e820: 0000000000000000 - 000000000009f800 (usable) BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved) BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved) BIOS-e820: 0000000000100000 - 000000007fee0000 (usable) BIOS-e820: 000000007fee0000 - 000000007fee3000 (ACPI NVS) BIOS-e820: 000000007fee3000 - 000000007fef0000 (ACPI data) BIOS-e820: 000000007fef0000 - 000000007ff00000 (reserved) BIOS-e820: 00000000f0000000 - 00000000f4000000 (reserved) BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved) Entering add_active_range(0, 0, 159) 0 entries of 256 used Entering add_active_range(0, 256, 524000) 1 entries of 256 used end_pfn_map = 1048576 DMI 2.4 present. ACPI: RSDP (v002 Nvidia ) @ 0x00000000000f7760 ACPI: XSDT (v001 Nvidia AWRDACPI 0x42302e31 AWRD 0x00000000) @ 0x000000007fee30c0 ACPI: FADT (v003 Nvidia AWRDACPI 0x42302e31 AWRD 0x00000000) @ 0x000000007feec380 ACPI: HPET (v001 Nvidia AWRDACPI 0x42302e31 AWRD 0x00000098) @ 0x000000007feec580 ACPI: MCFG (v001 Nvidia AWRDACPI 0x42302e31 AWRD 0x00000000) @ 0x000000007feec600 ACPI: MADT (v001 Nvidia AWRDACPI 0x42302e31 AWRD 0x00000000) @ 0x000000007feec4c0 ACPI: DSDT (v001 NVIDIA AWRDACPI 0x00001000 MSFT 0x0100000e) @ 0x0000000000000000 Entering add_active_range(0, 0, 159) 0 entries of 256 used Entering add_active_range(0, 256, 524000) 1 entries of 256 used Zone PFN ranges: DMA 0 -> 4096 DMA32 4096 -> 1048576 Normal 1048576 -> 1048576 early_node_map[2] active PFN ranges 0: 0 -> 159 0: 256 -> 524000 On node 0 totalpages: 523903 DMA zone: 56 pages used for memmap DMA zone: 1737 pages reserved DMA zone: 2206 pages, LIFO batch:0 DMA32 zone: 7108 pages used for memmap DMA32 zone: 512796 pages, LIFO batch:31 Normal zone: 0 pages used for memmap ACPI: PM-Timer IO Port: 0x1008 ACPI: Local APIC address 0xfee00000 ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled) Processor #0 (Bootup-CPU) ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] disabled) ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1]) ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1]) ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0]) IOAPIC[0]: apic_id 2, address 0xfec00000, GSI 0-23 ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level) ACPI: INT_SRC_OVR (bus 0 bus_irq 14 global_irq 14 high edge) ACPI: INT_SRC_OVR (bus 0 bus_irq 15 global_irq 15 high edge) ACPI: IRQ0 used by override. ACPI: IRQ2 used by override. ACPI: IRQ9 used by override. ACPI: IRQ14 used by override. ACPI: IRQ15 used by override. Setting APIC routing to flat ACPI: HPET id: 0x10de8201 base: 0xfefff000 Using ACPI (MADT) for SMP configuration information Nosave address range: 000000000009f000 - 00000000000a0000 Nosave address range: 00000000000a0000 - 00000000000f0000 Nosave address range: 00000000000f0000 - 0000000000100000 Allocating PCI resources starting at 80000000 (gap: 7ff00000:70100000) PERCPU: Allocating 33408 bytes of per cpu data Built 1 zonelists. Total pages: 515002 Kernel command line: root=/dev/md1 ro report_lost_ticks Initializing CPU#0 PID hash table entries: 4096 (order: 12, 32768 bytes) Console: colour VGA+ 80x25 time.c: Lost 1 timer tick(s)! rip release_console_sem+0x196/0x20c) Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes) Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes) Checking aperture... CPU 0: aperture @ c8f4000000 size 32 MB Aperture too small (32 MB) No AGP bridge found Memory: 2045020k/2096000k available (4241k kernel code, 50228k reserved, 1733k data, 272k init) Calibrating delay using timer specific routine.. 4022.98 BogoMIPS (lpj=8045974) Security Framework v1.0.0 initialized SELinux: Disabled at boot. Capability LSM initialized Mount-cache hash table entries: 256 CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line) CPU: L2 Cache: 512K (64 bytes/line) SMP alternatives: switching to UP code Freeing SMP alternatives: 48k freed ACPI: Core revision 20060707 Using local APIC timer interrupts. result 12559460 Detected 12.559 MHz APIC timer. time.c: Lost 12 timer tick(s)! rip setup_boot_APIC_clock+0x11f/0x121) Brought up 1 CPUs testing NMI watchdog ... OK. time.c: Using 25.000000 MHz WALL HPET GTOD HPET/TSC timer. time.c: Detected 2009.511 MHz processor. checking if image is initramfs... it is Freeing initrd memory: 3210k freed NET: Registered protocol family 16 ACPI: bus type pci registered PCI: Using configuration type 1 ACPI: Interpreter enabled ACPI: Using IOAPIC for interrupt routing ACPI: PCI Root Bridge [PCI0] (0000:00) PCI: Probing PCI hardware (bus 00) Boot video device is 0000:01:07.0 PCI: Transparent bridge - 0000:00:06.0 ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.HUB0._PRT] ACPI: PCI Interrupt Link [LNK1] (IRQs 5 7 9 10 11 14 15) *0, disabled. ACPI: PCI Interrupt Link [LNK2] (IRQs 5 7 9 *10 11 14 15) ACPI: PCI Interrupt Link [LNK3] (IRQs 5 7 9 10 *11 14 15) ACPI: PCI Interrupt Link [LNK4] (IRQs 5 7 9 10 11 14 15) *0, disabled. ACPI: PCI Interrupt Link [LNK5] (IRQs 5 7 9 10 11 14 15) *0, disabled. ACPI: PCI Interrupt Link [LNK6] (IRQs 5 7 9 10 11 14 15) *0, disabled. ACPI: PCI Interrupt Link [LNK7] (IRQs 5 7 9 10 11 14 15) *0, disabled. ACPI: PCI Interrupt Link [LNK8] (IRQs 5 7 9 10 11 14 15) *0, disabled. ACPI: PCI Interrupt Link [LP2P] (IRQs 5 7 9 10 11 14 15) *0, disabled. ACPI: PCI Interrupt Link [LUBA] (IRQs 5 7 9 *10 11 14 15) ACPI: PCI Interrupt Link [LMAC] (IRQs 5 7 9 *10 11 14 15) ACPI: PCI Interrupt Link [LAZA] (IRQs 5 7 9 10 11 14 15) *0, disabled. ACPI: PCI Interrupt Link [LPMU] (IRQs 5 7 9 10 11 14 15) *0, disabled. ACPI: PCI Interrupt Link [LSMB] (IRQs *5 7 9 10 11 14 15) ACPI: PCI Interrupt Link [LUB2] (IRQs 5 7 9 10 *11 14 15) ACPI: PCI Interrupt Link [LIDE] (IRQs 5 7 9 10 11 14 15) *0, disabled. ACPI: PCI Interrupt Link [LSID] (IRQs 5 7 9 10 11 14 15) *0, disabled. ACPI: PCI Interrupt Link [LFID] (IRQs 5 7 9 10 11 14 15) *0, disabled. ACPI: PCI Interrupt Link [LSA2] (IRQs 5 7 9 10 11 14 15) *0, disabled. ACPI: PCI Interrupt Link [APC1] (IRQs 16) *0, disabled. ACPI: PCI Interrupt Link [APC2] (IRQs 17) *0, disabled. ACPI: PCI Interrupt Link [APC3] (IRQs 18) *0, disabled. ACPI: PCI Interrupt Link [APC4] (IRQs 19) *0, disabled. ACPI: PCI Interrupt Link [APC5] (IRQs 16) *0, disabled. ACPI: PCI Interrupt Link [APC6] (IRQs 16) *0, disabled. ACPI: PCI Interrupt Link [APC7] (IRQs 16) *0, disabled. ACPI: PCI Interrupt Link [APC8] (IRQs 16) *0, disabled. ACPI: PCI Interrupt Link [APCF] (IRQs 20 21 22 23) *0, disabled. ACPI: PCI Interrupt Link [APCH] (IRQs 20 21 22 23) *0, disabled. ACPI: PCI Interrupt Link [APMU] (IRQs 20 21 22 23) *0, disabled. ACPI: PCI Interrupt Link [AAZA] (IRQs 20 21 22 23) *0, disabled. ACPI: PCI Interrupt Link [APCS] (IRQs 20 21 22 23) *0, disabled. ACPI: PCI Interrupt Link [APCL] (IRQs 20 21 22 23) *0, disabled. ACPI: PCI Interrupt Link [APCM] (IRQs 20 21 22 23) *0, disabled. ACPI: PCI Interrupt Link [APCZ] (IRQs 20 21 22 23) *0, disabled. ACPI: PCI Interrupt Link [APSI] (IRQs 20 21 22 23) *0, disabled. ACPI: PCI Interrupt Link [APSJ] (IRQs 20 21 22 23) *0, disabled. ACPI: PCI Interrupt Link [ASA2] (IRQs 20 21 22 23) *0, disabled. SCSI subsystem initialized usbcore: registered new interface driver usbfs usbcore: registered new interface driver hub usbcore: registered new device driver usb PCI: Using ACPI for IRQ routing PCI: If a device doesn't work, try "pci=routeirq". If it helps, post a report PCI-DMA: Disabling IOMMU. PCI: Bridge: 0000:00:06.0 IO window: d000-dfff MEM window: fc800000-fd7fffff PREFETCH window: 80000000-800fffff PCI: Bridge: 0000:00:0a.0 IO window: disabled. MEM window: disabled. PREFETCH window: disabled. PCI: Bridge: 0000:00:0b.0 IO window: disabled. MEM window: disabled. PREFETCH window: disabled. PCI: Bridge: 0000:00:0c.0 IO window: disabled. MEM window: disabled. PREFETCH window: disabled. PCI: Bridge: 0000:00:0d.0 IO window: disabled. MEM window: disabled. PREFETCH window: disabled. PCI: Bridge: 0000:00:0e.0 IO window: disabled. MEM window: disabled. PREFETCH window: disabled. PCI: Bridge: 0000:00:0f.0 IO window: disabled. MEM window: disabled. PREFETCH window: disabled. PCI: Setting latency timer of device 0000:00:06.0 to 64 PCI: Setting latency timer of device 0000:00:0a.0 to 64 PCI: Setting latency timer of device 0000:00:0b.0 to 64 PCI: Setting latency timer of device 0000:00:0c.0 to 64 PCI: Setting latency timer of device 0000:00:0d.0 to 64 PCI: Setting latency timer of device 0000:00:0e.0 to 64 PCI: Setting latency timer of device 0000:00:0f.0 to 64 NET: Registered protocol family 2 IP route cache hash table entries: 65536 (order: 7, 524288 bytes) TCP established hash table entries: 262144 (order: 10, 4194304 bytes) TCP bind hash table entries: 65536 (order: 8, 1048576 bytes) TCP: Hash tables configured (established 262144 bind 65536) TCP reno registered audit: initializing netlink socket (disabled) audit(1161771160.632:1): initialized VFS: Disk quotas dquot_6.5.1 Dquot-cache hash table entries: 512 (order 0, 4096 bytes) Installing knfsd (copyright (C) 1996 okir@monad.swb.de). fuse init (API version 7.7) JFS: nTxBlock = 8192, nTxLock = 65536 SGI XFS with ACLs, security attributes, realtime, large block/inode numbers, no debug enabled SGI XFS Quota Management subsystem io scheduler noop registered io scheduler anticipatory registered io scheduler deadline registered io scheduler cfq registered (default) time.c: Lost 2 timer tick(s)! rip __do_softirq+0x4a/0xc4) PCI: Setting latency timer of device 0000:00:0a.0 to 64 pcie_portdrv_probe->Dev[0376:10de] has invalid IRQ. Check vendor BIOS assign_interrupt_mode Found MSI capability Allocate Port Service[0000:00:0a.0:pcie00] PCI: Setting latency timer of device 0000:00:0b.0 to 64 pcie_portdrv_probe->Dev[0374:10de] has invalid IRQ. Check vendor BIOS assign_interrupt_mode Found MSI capability Allocate Port Service[0000:00:0b.0:pcie00] PCI: Setting latency timer of device 0000:00:0c.0 to 64 pcie_portdrv_probe->Dev[0374:10de] has invalid IRQ. Check vendor BIOS assign_interrupt_mode Found MSI capability Allocate Port Service[0000:00:0c.0:pcie00] PCI: Setting latency timer of device 0000:00:0d.0 to 64 pcie_portdrv_probe->Dev[0378:10de] has invalid IRQ. Check vendor BIOS assign_interrupt_mode Found MSI capability Allocate Port Service[0000:00:0d.0:pcie00] PCI: Setting latency timer of device 0000:00:0e.0 to 64 pcie_portdrv_probe->Dev[0375:10de] has invalid IRQ. Check vendor BIOS assign_interrupt_mode Found MSI capability Allocate Port Service[0000:00:0e.0:pcie00] PCI: Setting latency timer of device 0000:00:0f.0 to 64 pcie_portdrv_probe->Dev[0377:10de] has invalid IRQ. Check vendor BIOS assign_interrupt_mode Found MSI capability Allocate Port Service[0000:00:0f.0:pcie00] vga16fb: initializing vga16fb: mapped to 0xffff8100000a0000 fb0: VGA16 VGA frame buffer device ACPI: Power Button (FF) [PWRF] ACPI: Power Button (CM) [PWRB] ACPI: Fan [FAN] (on) ACPI: Getting cpuindex for acpiid 0x1 ACPI: Thermal Zone [THRM] (40 C) Real Time Clock Driver v1.12ac Linux agpgart interface v0.101 (c) Dave Jones Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing enabled serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A Floppy drive(s): fd0 is 1.44M FDC 0 is a post-1991 82077 RAMDISK driver initialized: 16 RAM disks of 65536K size 1024 blocksize loop: loaded (max 8 devices) forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.57. ACPI: PCI Interrupt Link [APCH] enabled at IRQ 23 ACPI: PCI Interrupt 0000:00:08.0[A] -> Link [APCH] -> GSI 23 (level, low) -> IRQ 23 PCI: Setting latency timer of device 0000:00:08.0 to 64 forcedeth: using HIGHDMA eth0: forcedeth.c: subsystem: 01043:8239 bound to 0000:00:08.0 Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2 ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx NFORCE-MCP55: IDE controller at PCI slot 0000:00:04.0 NFORCE-MCP55: chipset revision 161 NFORCE-MCP55: not 100% native mode: will probe irqs later NFORCE-MCP55: BIOS didn't set cable bits correctly. Enabling workaround. NFORCE-MCP55: 0000:00:04.0 (rev a1) UDMA133 controller ide0: BM-DMA at 0xf000-0xf007, BIOS settings: hda:DMA, hdb:DMA Probing IDE interface ide0... hda: IC35L040AVVN07-0, ATA DISK drive ide0 at 0x1f0-0x1f7,0x3f6 on irq 14 Probing IDE interface ide1... hda: max request size: 128KiB hda: 80418240 sectors (41174 MB) w/1863KiB Cache, CHS=65535/16/63, UDMA(100) hda: cache flushes supported hda: hda1 hda2 hda3 hda4 < hda5 hda6 > ACPI: PCI Interrupt Link [APC3] enabled at IRQ 18 ACPI: PCI Interrupt 0000:01:08.0[A] -> Link [APC3] -> GSI 18 (level, low) -> IRQ 18 scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 7.0 <Adaptec 29160B Ultra160 SCSI adapter> aic7892: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs usbmon: debugfs is not available ACPI: PCI Interrupt Link [APCL] enabled at IRQ 22 ACPI: PCI Interrupt 0000:00:02.1[B] -> Link [APCL] -> GSI 22 (level, low) -> IRQ 22 PCI: Setting latency timer of device 0000:00:02.1 to 64 ehci_hcd 0000:00:02.1: EHCI Host Controller ehci_hcd 0000:00:02.1: new USB bus registered, assigned bus number 1 ehci_hcd 0000:00:02.1: debug port 1 PCI: cache line size of 64 is not supported by device 0000:00:02.1 ehci_hcd 0000:00:02.1: irq 22, io mem 0xfe02e000 ehci_hcd 0000:00:02.1: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004 usb usb1: configuration #1 chosen from 1 choice hub 1-0:1.0: USB hub found hub 1-0:1.0: 10 ports detected 116x: driver isp116x-hcd, 03 Nov 2005 ohci_hcd: 2006 August 04 USB 1.1 'Open' Host Controller (OHCI) Driver (PCI) ACPI: PCI Interrupt Link [APCF] enabled at IRQ 21 ACPI: PCI Interrupt 0000:00:02.0[A] -> Link [APCF] -> GSI 21 (level, low) -> IRQ 21 PCI: Setting latency timer of device 0000:00:02.0 to 64 ohci_hcd 0000:00:02.0: OHCI Host Controller ohci_hcd 0000:00:02.0: new USB bus registered, assigned bus number 2 ohci_hcd 0000:00:02.0: irq 21, io mem 0xfe02f000 usb usb2: configuration #1 chosen from 1 choice hub 2-0:1.0: USB hub found hub 2-0:1.0: 10 ports detected USB Universal Host Controller Interface driver v3.0 sl811: driver sl811-hcd, 19 May 2005 usbcore: registered new interface driver cdc_acm drivers/usb/class/cdc-acm.c: v0.25:USB Abstract Control Model driver for USB modems and ISDN adapters Initializing USB Mass Storage driver... usbcore: registered new interface driver usb-storage USB Mass Storage support registered. usbcore: registered new interface driver hiddev usbcore: registered new interface driver usbhid drivers/usb/input/hid-core.c: v2.6:USB HID core driver serio: i8042 KBD port at 0x60,0x64 irq 1 serio: i8042 AUX port at 0x60,0x64 irq 12 mice: PS/2 mouse device common for all mice input: PC Speaker as /class/input/input0 input: AT Translated Set 2 keyboard as /class/input/input1 I2O subsystem v1.325 i2o: max drivers = 8 I2O Configuration OSM v1.323 I2O Bus Adapter OSM v1.317 I2O Block Device OSM v1.325 I2O SCSI Peripheral OSM v1.316 I2O ProcFS OSM v1.316 i2c /dev entries driver i2c_adapter i2c-0: nForce2 SMBus adapter at 0x1c00 i2c_adapter i2c-1: nForce2 SMBus adapter at 0x1c40 md: linear personality registered for level -1 md: raid0 personality registered for level 0 md: raid1 personality registered for level 1 md: raid10 personality registered for level 10 raid6: int64x1 1847 MB/s raid6: int64x2 2357 MB/s raid6: int64x4 2379 MB/s raid6: int64x8 1679 MB/s raid6: sse2x1 2697 MB/s raid6: sse2x2 3699 MB/s raid6: sse2x4 3815 MB/s raid6: using algorithm sse2x4 (3815 MB/s) md: raid6 personality registered for level 6 md: raid5 personality registered for level 5 md: raid4 personality registered for level 4 raid5: automatically using best checksumming function: generic_sse generic_sse: 6186.000 MB/sec raid5: using function: generic_sse (6186.000 MB/sec) md: multipath personality registered for level -4 md: faulty personality registered for level -5 device-mapper: ioctl: 4.10.0-ioctl (2006-09-14) initialised: dm-devel@redhat.com device-mapper: multipath: version 1.0.5 loaded device-mapper: multipath round-robin: version 1.0.0 loaded device-mapper: multipath emc: version 0.0.3 loaded Netfilter messages via NETLINK v0.30. ip_conntrack version 2.4 (8192 buckets, 65536 max) - 312 bytes per conntrack ctnetlink v0.90: registering with nfnetlink. ip_conntrack_pptp version 3.1 loaded ip_nat_pptp version 3.0 loaded ip_tables: (C) 2000-2006 Netfilter Core Team ClusterIP Version 0.8 loaded successfully arp_tables: (C) 2002 David S. Miller TCP bic registered Initializing XFRM netlink socket NET: Registered protocol family 1 NET: Registered protocol family 10 lo: Disabled Privacy Extensions ip6_tables: (C) 2000-2006 Netfilter Core Team IPv6 over IPv4 tunneling driver NET: Registered protocol family 17 NET: Registered protocol family 15 NET: Registered protocol family 8 NET: Registered protocol family 20 Freeing unused kernel memory: 272k freed md: md0 stopped. md: bind<hda1> raid1: raid set md0 active with 1 out of 2 mirrors md: md1 stopped. md: bind<hda2> raid1: raid set md1 active with 1 out of 2 mirrors EXT3-fs: INFO: recovery required on readonly filesystem. EXT3-fs: write access will be enabled during recovery. (fs/jbd/recovery.c, 255): journal_recover: JBD: recovery, exit status 0, recovered transactions 15992 to 16718 (fs/jbd/recovery.c, 257): journal_recover: JBD: Replayed 6324 and revoked 1/4 blocks kjournald starting. Commit interval 5 seconds EXT3-fs: recovery complete. EXT3-fs: mounted filesystem with ordered data mode. Adding 1052248k swap on /dev/hda3. Priority:-1 extents:1 across:1052248k EXT3 FS on md1, internal journal kjournald starting. Commit interval 5 seconds EXT3 FS on md0, internal journal EXT3-fs: mounted filesystem with ordered data mode. eth0: no IPv6 routers present ^ permalink raw reply [flat|nested] 11+ messages in thread
* RE: PROBLEM: Oops when doing disk heavy disk I/O 2006-10-25 0:21 ` Michael @ 2006-10-25 0:41 ` Sergio Monteiro Basto 2006-10-25 1:25 ` Michael 0 siblings, 1 reply; 11+ messages in thread From: Sergio Monteiro Basto @ 2006-10-25 0:41 UTC (permalink / raw) To: Michael; +Cc: linux-kernel [-- Attachment #1: Type: text/plain, Size: 1514 bytes --] On Wed, 2006-10-25 at 10:21 +1000, Michael wrote: > > > -----Original Message----- > > From: Sergio Monteiro Basto [mailto:sergio@sergiomb.no-ip.org] > > Sent: Wednesday, 25 October 2006 3:27 AM > > > > Hi, > > please boot with "report_lost_ticks" like > > http://marc.theaimsgroup.com/?l=linux-kernel&m=115545986619977&w=2 > > and see if you have > > time.c: Lost n timer tick(s)! > > > > and > > cat /sys/devices/system/clocksource/clocksource0/ > > > > cat /sys/devices/system/clocksource/clocksource0/current_clocksource > > > > > > Hi, > > Yep, seems to be losing some ticks -- I'm not sure what that means, but is > it bad? :-) > well I don't know for sure , but after this initials Lost Timer tick , it appears more after a few time of uptime ? > [ root @ barbossa : ~ ] # dmesg | grep tick > Command line: root=/dev/md1 ro report_lost_ticks > Kernel command line: root=/dev/md1 ro report_lost_ticks > time.c: Lost 1 timer tick(s)! rip release_console_sem+0x196/0x20c) > time.c: Lost 12 timer tick(s)! rip setup_boot_APIC_clock+0x11f/0x121) > time.c: Lost 2 timer tick(s)! rip __do_softirq+0x4a/0xc4) > > > Also, for the clocksource: > [ root @ barbossa : ~ ] # cat > /sys/devices/system/clocksource/clocksource0/available_clocksource > jiffies > > [ root @ barbossa : ~ ] # cat > /sys/devices/system/clocksource/clocksource0/current_clocksource > jiffies > > > Thanks for your help. > > Cheers, > Michael > -- Sérgio M.B. [-- Attachment #2: smime.p7s --] [-- Type: application/x-pkcs7-signature, Size: 2166 bytes --] ^ permalink raw reply [flat|nested] 11+ messages in thread
* RE: PROBLEM: Oops when doing disk heavy disk I/O 2006-10-25 0:41 ` Sergio Monteiro Basto @ 2006-10-25 1:25 ` Michael 2006-10-25 4:12 ` Ray Lee 0 siblings, 1 reply; 11+ messages in thread From: Michael @ 2006-10-25 1:25 UTC (permalink / raw) To: sergio; +Cc: linux-kernel > -----Original Message----- > From: Sergio Monteiro Basto [mailto:sergio@sergiomb.no-ip.org] > Sent: Wednesday, 25 October 2006 10:42 AM > > well I don't know for sure , but after this initials Lost Timer tick , > it appears more after a few time of uptime ? > Well, when the machine was just sitting there, I don't get any lost timer ticks -- it's perfectly happy. However, I started to `dd if=/dev/zero of=/dev/hda6` again, and after a few minutes, it oops'ed again (below), losing 2 timer ticks: <4>time.c: Lost 2 timer tick(s)! rip __do_softirq+0x4a/0xc4) Also, I rebooted and did the same command, but with bs=512K. It Oops'ed again (after about 30 seconds?), but this time, with: <4>time.c: Lost 1 timer tick(s)! rip _spin_unlock_irqrestore+0x8/0x9) Kernel panic - not syncing: Aiee, killing interrupt handler! It seems to be related to interrupts, somehow... since I'm not too familiar with what's happening, would anyone have any suggestions about what I could do next to troubleshoot? Thanks! Michael Full Oops: Unable to handle kernel NULL pointer dereference at 000000000000002b RIP: [<ffffffff8021a6c5>] __block_write_full_page+0xa4/0x2df PGD 7a312067 PUD 7a374067 PMD 0 Oops: 0000 [1] SMP CPU 0 Modules linked in: Pid: 263, comm: pdflush Not tainted 2.6.19-rc3 #1 RIP: 0010:[<ffffffff8021a6c5>] [<ffffffff8021a6c5>] __block_write_full_page+0xa4/0x2df RSP: 0018:ffff810037ebbc30 EFLAGS: 00010287 RAX: 0000000000000000 RBX: 000000000000002b RCX: 0000000000000002 RDX: 000000000000000a RSI: 00000000000554a9 RDI: ffff810037cc8440 RBP: ffff8100015f1c30 R08: ffff810037cc8440 R09: ffff810037ebbe70 R10: ffffffff802bee4c R11: ffffffff80440b8b R12: ffff81001b2c9dc8 R13: 00000000001552a7 R14: ffff810037cc8440 R15: 0000000001c42574 FS: 00002b1c6f9536d0(0000) GS:ffffffff807d6000(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 000000000000002b CR3: 000000007a315000 CR4: 00000000000006e0 Process pdflush (pid: 263, threadinfo ffff810037eba000, task ffff810037f250e0) Stack: ffff810037ebbe70 ffffffff802bee4c 00000400015f1c30 ffff8100015f1c30 ffff810037ebbe70 ffff810037cc8550 000000000000000b ffff81007efe9b08 0000000000000000 ffffffff8029db1e 000000000000000e ffffffff802bdff3 Call Trace: [<ffffffff802bee4c>] blkdev_get_block+0x0/0x46 [<ffffffff8029db1e>] generic_writepages+0x18e/0x2d8 [<ffffffff802bdff3>] blkdev_writepage+0x0/0xf [<ffffffff8025591f>] do_writepages+0x20/0x2d [<ffffffff8022cf8c>] __writeback_single_inode+0x1b4/0x38b [<ffffffff8021f44f>] sync_sb_inodes+0x1d1/0x2b5 [<ffffffff802897e0>] keventd_create_kthread+0x0/0x61 [<ffffffff8024b3b7>] writeback_inodes+0x82/0xd8 [<ffffffff8029dfb4>] background_writeout+0x82/0xb5 [<ffffffff8025108f>] pdflush+0x0/0x1e3 [<ffffffff802511c8>] pdflush+0x139/0x1e3 [<ffffffff8029df32>] background_writeout+0x0/0xb5 [<ffffffff8022f78a>] kthread+0xd1/0x101 [<ffffffff80258ed8>] child_rip+0xa/0x12 [<ffffffff802897e0>] keventd_create_kthread+0x0/0x61 [<ffffffff8022f6b9>] kthread+0x0/0x101 [<ffffffff80258ece>] child_rip+0x0/0x12 Code: 8b 03 a8 20 75 6c 8b 03 a8 02 74 66 8b 44 24 14 48 39 43 20 RIP [<ffffffff8021a6c5>] __block_write_full_page+0xa4/0x2df RSP <ffff810037ebbc30> CR2: 000000000000002b <4>time.c: Lost 2 timer tick(s)! rip __do_softirq+0x4a/0xc4) ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: PROBLEM: Oops when doing disk heavy disk I/O 2006-10-25 1:25 ` Michael @ 2006-10-25 4:12 ` Ray Lee 2006-10-25 12:28 ` Michael 0 siblings, 1 reply; 11+ messages in thread From: Ray Lee @ 2006-10-25 4:12 UTC (permalink / raw) To: Michael; +Cc: sergio, linux-kernel > It seems to be related to interrupts, somehow... since I'm not too familiar > with what's happening, would anyone have any suggestions about what I could > do next to troubleshoot? Try swapping out the RAM (or getting it down to 1Gig). Try a really old kernel, such as debian's 2.6.8 package. Ray ^ permalink raw reply [flat|nested] 11+ messages in thread
* RE: PROBLEM: Oops when doing disk heavy disk I/O 2006-10-25 4:12 ` Ray Lee @ 2006-10-25 12:28 ` Michael 2006-10-25 15:08 ` Ray Lee 2006-10-25 15:24 ` Paulo Marques 0 siblings, 2 replies; 11+ messages in thread From: Michael @ 2006-10-25 12:28 UTC (permalink / raw) To: ray-gmail; +Cc: linux-kernel > -----Original Message----- > From: Ray Lee [mailto:madrabbit@gmail.com] > Sent: Wednesday, 25 October 2006 2:13 PM > > Try swapping out the RAM (or getting it down to 1Gig). Try a really > old kernel, such as debian's 2.6.8 package. > > Ray Well, what do you know -- that seems to have fixed it! I took out one stick of RAM (so it's down to 1 gig) and it seems to work fine, now, without any boot parameters or anything. (mind you, murphy's law will dictate that it'll crash about 30 seconds after I send this...) I'm amazed at that -- but I'm not going to look a gift horse in the mouth, this has been frustrating me for far too long. :-) Although, having said that, I'm curious... It is working because there's only 1 gig of RAM in there, or because it's only a single stick (ie. not dual-channel)? It works fine with both sticks, individually, just not both together... I wonder what the cause of it actually is... Thanks heaps for the suggestion! Cheers, Michael ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: PROBLEM: Oops when doing disk heavy disk I/O 2006-10-25 12:28 ` Michael @ 2006-10-25 15:08 ` Ray Lee 2006-10-25 15:24 ` Paulo Marques 1 sibling, 0 replies; 11+ messages in thread From: Ray Lee @ 2006-10-25 15:08 UTC (permalink / raw) To: Michael; +Cc: linux-kernel On 10/25/06, Michael <michael.sallaway@gmail.com> wrote: > > Try swapping out the RAM (or getting it down to 1Gig). Try a really > > old kernel, such as debian's 2.6.8 package. > > Well, what do you know -- that seems to have fixed it! I took out one stick > of RAM (so it's down to 1 gig) and it seems to work fine, now, without any > boot parameters or anything. (mind you, murphy's law will dictate that it'll > crash about 30 seconds after I send this...) > > I'm amazed at that -- but I'm not going to look a gift horse in the mouth, > this has been frustrating me for far too long. :-) > > Although, having said that, I'm curious... It is working because there's > only 1 gig of RAM in there, or because it's only a single stick (ie. not > dual-channel)? It works fine with both sticks, individually, just not both > together... I wonder what the cause of it actually is... That I don't know. Marginal power supplies can cause very strange problems, or there can be motherboard/memory timing issues with lots of RAM, or just plain old more-than-one-gig-of-RAM-tickles-software bugs. Though the latter, especially on an x86-64 kernel, really shouldn't be very common if present at all. What I'd suggest is resending to the list the first oops after boot (several of them would be good) while using 2 gigs of RAM, with a new subject line to attract those who are especially good at reading these things. (I can barely find the stack trace.) Ray ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: PROBLEM: Oops when doing disk heavy disk I/O 2006-10-25 12:28 ` Michael 2006-10-25 15:08 ` Ray Lee @ 2006-10-25 15:24 ` Paulo Marques 1 sibling, 0 replies; 11+ messages in thread From: Paulo Marques @ 2006-10-25 15:24 UTC (permalink / raw) To: Michael; +Cc: ray-gmail, linux-kernel Michael wrote: >> From: Ray Lee [mailto:madrabbit@gmail.com] >> Try swapping out the RAM (or getting it down to 1Gig). Try a really >> old kernel, such as debian's 2.6.8 package. > > [...] > Although, having said that, I'm curious... It is working because there's > only 1 gig of RAM in there, or because it's only a single stick (ie. not > dual-channel)? It works fine with both sticks, individually, just not both > together... I wonder what the cause of it actually is... Another thing that I would try is to tweak the /proc/sys/vm/dirty_ratio and /proc/sys/vm/dirty_background_ratio settings. pdflush appears in your trace and with twice the RAM there is twice the dirty data to write out. Maybe choosing half the ratio in both settings with 2Gb of RAM would produce the same amount of dirty data as using just 1Gb of RAM with the original settings. This is still a bug, though. This test would just give more debug information. -- Paulo "grasping at straws" Marques - www.grupopie.com "The face of a child can say it all, especially the mouth part of the face." ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: PROBLEM: Oops when doing disk heavy disk I/O 2006-10-24 15:56 PROBLEM: Oops when doing disk heavy disk I/O Michael Sallaway 2006-10-24 17:26 ` Sergio Monteiro Basto @ 2006-10-24 17:43 ` Badari Pulavarty 2006-10-25 0:09 ` Michael 1 sibling, 1 reply; 11+ messages in thread From: Badari Pulavarty @ 2006-10-24 17:43 UTC (permalink / raw) To: Michael Sallaway; +Cc: lkml On Wed, 2006-10-25 at 01:56 +1000, Michael Sallaway wrote: > [1.] One line summary of the problem: > Kernel Oopses when doing I/O to a disk (using dd). > > [2.] Full description of the problem/report: > When writing to [any] disk (IDE or SCSI), the kernel will Oops after > short periods of time ranging from 30 seconds to 5-10 minutes. > Sometimes this is a complete crash (with "Aieee, killing interrupt > handler!"), sometimes it's just an oops but the system doesn't crash > comepletely (isn't very usable, though), and sometimes it just gives a > "general protection fault: 0000 [1] SMP". > > It's worth mentioning that I've managed to set up the entire system > without incident -- a debian netinstall, downloading new packages, > changing things, etc. The only reason I discovered this was when > copying large amounts of data off another machine, and it died > reproducably after a few gigabytes of data. (I originally thought it > was an XFS issue, but had the same problem with EXT3, and all other > combinations I tried - I can now reproduce it by using dd if=/dev/zero > of=/dev/hda6.) > > Ultimately, I've tried it with different (known good) devices and hard > drives. The only common things between all failures are the CPU > (Athlon 64 3200), Motherboard (Asus M2N-E), and RAM (2GB DDR2-533). > (Memtest x86 shows the memory to be fine.) As such, I'm suspecting > it's something to do with the motherboard -- It's using an x86_64 > kernel (although it does the same with an i386), on an nforce 570 > motherboard. > > Other things I have tried: > - SATA, SCSI and IDE drives -- all do the same thing > - removing *all* drives and cards and devices -- it does it with a > single IDE drive connected and no PCI cards > - kernels 2.6.16, 18, 18.1, 19-rc3. All of these kernels are having the same problem ? Or just noticed it only in 19-rc3 ? Thanks, Badari ^ permalink raw reply [flat|nested] 11+ messages in thread
* RE: PROBLEM: Oops when doing disk heavy disk I/O 2006-10-24 17:43 ` Badari Pulavarty @ 2006-10-25 0:09 ` Michael 0 siblings, 0 replies; 11+ messages in thread From: Michael @ 2006-10-25 0:09 UTC (permalink / raw) To: 'Badari Pulavarty'; +Cc: 'lkml' > From: Badari Pulavarty [mailto:pbadari@gmail.com] > Sent: Wednesday, 25 October 2006 3:43 AM > > > Other things I have tried: > > - SATA, SCSI and IDE drives -- all do the same thing > > - removing *all* drives and cards and devices -- it does it with a > > single IDE drive connected and no PCI cards > > - kernels 2.6.16, 18, 18.1, 19-rc3. > > All of these kernels are having the same problem ? Or just noticed > it only in 19-rc3 ? > All of them. I was going to try earlier kernels, but from my googling it sounds like nforce 5xx support wasn't incorporated until recently (could be wrong, though.) Thanks, Michael ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2006-10-25 15:24 UTC | newest] Thread overview: 11+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2006-10-24 15:56 PROBLEM: Oops when doing disk heavy disk I/O Michael Sallaway 2006-10-24 17:26 ` Sergio Monteiro Basto 2006-10-25 0:21 ` Michael 2006-10-25 0:41 ` Sergio Monteiro Basto 2006-10-25 1:25 ` Michael 2006-10-25 4:12 ` Ray Lee 2006-10-25 12:28 ` Michael 2006-10-25 15:08 ` Ray Lee 2006-10-25 15:24 ` Paulo Marques 2006-10-24 17:43 ` Badari Pulavarty 2006-10-25 0:09 ` Michael
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox