* Auto Checking Raid 6 crashes my system
@ 2010-02-16  5:04 Dawning Sky
  2010-02-16 13:37 ` Kristleifur Daðason
  0 siblings, 1 reply; 5+ messages in thread
From: Dawning Sky @ 2010-02-16  5:04 UTC
  To: linux-raid

Hi,

I just built a brand new md raid-6 with 5 disks. On Fedora 12,
automatic checking of md devices via a weekly cron job is enabled by
default. It performs the check by echoing "check" into
/sys/block/mdX/md/sync_action. But after a while the kernel just
crashes, without finishing the check of the raid device or leaving
anything in the messages file.

For now, I've disabled the raid-check cron job. The kernel version I'm
running is 2.6.31.12-174.2.3.fc12.x86_64.

Thanks,

DS
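[Editorial sketch] The check described above is started by writing the
string "check" to the array's sync_action file in sysfs. The following is a
minimal Python sketch of that step, not the Fedora raid-check script itself.
The function names and the sysfs_root parameter are hypothetical and exist
only so the logic can be exercised against a scratch directory; on a real
system the write needs root.

```python
from pathlib import Path


def sync_action_path(md_name: str, sysfs_root: str = "/sys/block") -> Path:
    """Build the path to an md array's sync_action file (e.g. md127)."""
    return Path(sysfs_root) / md_name / "md" / "sync_action"


def request_md_check(md_name: str, sysfs_root: str = "/sys/block") -> Path:
    """Ask md to start a check, the same way `echo check > .../sync_action` does."""
    action_file = sync_action_path(md_name, sysfs_root)
    # Refuse to create a stray file if the array (or sysfs) is not there.
    if not action_file.is_file():
        raise FileNotFoundError(f"no sync_action file at {action_file}")
    action_file.write_text("check\n")
    return action_file


def read_sync_action(md_name: str, sysfs_root: str = "/sys/block") -> str:
    """Return the current contents of sync_action with whitespace stripped."""
    return sync_action_path(md_name, sysfs_root).read_text().strip()
```

With the default sysfs_root, request_md_check("md127") would target
/sys/block/md127/md/sync_action, the file named later in this thread.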
* Re: Auto Checking Raid 6 crashes my system
  2010-02-16  5:04 Auto Checking Raid 6 crashes my system Dawning Sky
@ 2010-02-16 13:37 ` Kristleifur Daðason
  2010-02-16 17:16   ` Dawning Sky
  0 siblings, 1 reply; 5+ messages in thread
From: Kristleifur Daðason @ 2010-02-16 13:37 UTC
  To: Dawning Sky; +Cc: linux-raid

On Tue, Feb 16, 2010 at 5:04 AM, Dawning Sky <the.dawning.sky@gmail.com> wrote:
> [...]
> But after a while, the kernel just crashes, without finishing
> checking the raid device or leaving anything in the messages file.
> [...]

Hi,

I ran into similar crashes caused by a faulty mvsas driver that
flaked out under load. What hardware do you have?

-- Kristleifur
* Re: Auto Checking Raid 6 crashes my system
  2010-02-16 13:37 ` Kristleifur Daðason
@ 2010-02-16 17:16   ` Dawning Sky
  2010-02-18  8:28     ` Dawning Sky
  0 siblings, 1 reply; 5+ messages in thread
From: Dawning Sky @ 2010-02-16 17:16 UTC
  To: Kristleifur Daðason; +Cc: linux-raid

On Tue, Feb 16, 2010 at 5:37 AM, Kristleifur Daðason <kristleifur@gmail.com> wrote:
> I encountered similar issues because of a faulty mvsas driver that
> flaked out under the load. What hardware do you have?

I'm using the onboard SATA controller, which I believe is nVidia, and
the driver is sata_nv.

DS
* Re: Auto Checking Raid 6 crashes my system
  2010-02-16 17:16   ` Dawning Sky
@ 2010-02-18  8:28     ` Dawning Sky
  2010-03-03  5:32       ` Neil Brown
  0 siblings, 1 reply; 5+ messages in thread
From: Dawning Sky @ 2010-02-18  8:28 UTC
  To: Kristleifur Daðason; +Cc: linux-raid

I tried to reproduce the crash. First I echoed "check" into
/sys/block/md127/md/sync_action in single-user mode, and the array
finished checking. Then I booted into runlevel 3 (init 3) and repeated
the same thing; this time I got the following error on the screen and
the computer hung. I had to type the error in by hand since I had no
way to copy/paste, so there might be a few typos.

BUG: unable to handle kernel paging request at 000000006a312c50
IP: [<ffffffff810402a8>] task_rq_lock+0x3c/0x7e
PGD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/virtual/block/md127/md/sync_speed
CPU 1
Modules linked: fuse ipt_MASQUERADE iptable_nat nf_nat bridge stp llc
nfsd lockd nfs_acl auth_rpcgss exportfs autofs4 hwmon_vid sunrpc
cpufreq_ondemand powernow_k8 freq_table ipv6 nf_conntrack_netbios_ns
ext2 kvm_amd kvm uinput snd_hda_codec_analog usblp snd_hda_intel
snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd
forcedeth amd64_edac_mod ppdev soundcore edac_core i2c_nforce2
snd_page_alloc parport_pc k8temp serio_raw parport asusatk0110 raid456
raid6_pq async_xor async_tx xor dm_multipath ata_generic pata_jmicron
firewire_ohci firewire_core crc_itu_t pata_amd pata_acpi sata_nv
usb_storage nouveau ttm drm_kms_helper drm i2c_algo_bit i2c_core [last
unloaded: scsi_wait_scan]
Pid: 523, comm: md127_raid5 Not tainted 2.6.31.12-174.2.3.fc12.x86_64 #1 System Product Name
RIP: 0010:[<ffffffff810402a8>] [<ffffffff810402a8>] task_rq_lock+0x3c/0x7e
RSP: 0010:ffff8801155d5c10 EFLAGS: 00010046
RAX: 000000001d1836b0 RBX: 0000000000015600 RCX: 0000000000000000
RDX: 0000000000000046 RSI: ffff8801155d5c58 RDI: ffff8800dc4c2f00
RBP: ffff8801155d5c30 R08: 0000000000000000 R09: 0000000000000001
R10: ffff880115cb64e8 R11: 6db6db6db6db6db7 R12: ffff8801155d5c58
R13: ffff8800dc4c2f00 R14: 0000000000015600 R15: 0000000000000000
FS: 00007f66cc00c780(0000) GS: ffff88002803c000(0000) knlGS: 0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 000000006a312c50 CR3: 0000000001001000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process md127_raid5 (pid: 523, threadinfo ffff8801155d4000, task ffff880115daaf00)
Stack:
 ffff8800dc4c2f00 0000000000000000 0000000000000003 ffff8801155d5c58
<0> ffff8801155d5c90 ffffffff8104ae7d
[drm] nouveau 0000:07:00.0: GPU lockup - switching to software fbcon
ffff880115fc4c00 ffff880115a45400
<0> ffff88011809e400 00000000000000046 ffff888115cb64e8 0000000000000001
Call Trace:
 [<ffffffff8104ae7d>] try_to_wake_up+0x9a/0x2de
 [<ffffffff8104b0d3>] default_wake_function+0x12/0x14
 [<ffffffff8103c237>] __wake_up_common+0x4e/0x84
 [<ffffffff810400cd>] __wake_up+0x39/0x4d
 [<ffffffffa017ce72>] __release_stripe+0x115/0x147 [raid456]
 [<ffffffffa017ced9>] release_stripe+0x35/0x49 [raid456]
 [<ffffffffa0182cd8>] raid5d+0x44e/0x563 [raid456]
 [<ffffffff8141c285>] ? schedule_timeout+0xb3/0xe3
 [<ffffffff8105c236>] ? process_timeout+0x0/0x10
 [<ffffffff8133deef>] md_thread+0xf1/0x10f
 [<ffffffff81067b37>] ? autoremove_wake_function+0x0/0x39
 [<ffffffff8133ddfe>] ? md_thread+0x0/0x10f
 [<ffffffff810677b5>] kthread+0x91/0x99
 [<ffffffff81022daa>] child_rip+0xa/0x20
 [<ffffffff81067724>] ? kthread+0x0/0x99
 [<ffffffff81012da0>] ? child_rip+0x0/0x20
Code: c7 c3 00 56 01 00 49 89 fd 49 89 f4 9c 58 66 66 90 66 90 48 89
c2 fa 66 66 90 66 66 90 49 89 14 24 49 8b 45 08 49 89 de 8b 40 18 <4c>
03 34 c5 d0 76 6f 81 4c 89 f7 e8 ac d3 3d 00 49 8b 45 08 8b
RIP [<ffffffff810402a8>] task_rq_lock+0x3c/0x7e
 RSP <ffff8801155d5c10>
CR2: 000000006a312c50
---[ end trace 6c5abd1701cc36a0 ]---
BUG: unable to handle kernel
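[Editorial sketch] The hand-typed oops can be sanity-checked with a little
arithmetic. The bytes starting at the <4c> marker in the Code: line decode
as an x86-64 "add" whose memory operand is RAX*8 plus a 32-bit
displacement. The Python sketch below, using only values copied from the
oops, decodes that addressing form and recomputes the faulting address. The
helper name is hypothetical. The check shows only that the typed RAX, Code
and CR2 values agree with each other; what the indexed table is, and why
RAX held that value, is not established by the thread.

```python
MASK64 = (1 << 64) - 1

# Values copied from the oops (typed in by hand by the reporter).
RAX = 0x000000001D1836B0  # index register at the time of the fault
CR2 = 0x000000006A312C50  # faulting address reported by the CPU

# Instruction bytes from the "Code:" line, starting at the <4c> marker.
INSN = bytes.fromhex("4c 03 34 c5 d0 76 6f 81")


def decode_scaled_index_disp32(insn: bytes):
    """Decode 'op reg, [rax*scale + disp32]' and return (scale, disp32).

    Only the one addressing form seen in this oops is handled; anything
    else trips an assertion.
    """
    rex, _opcode, modrm, sib = insn[0], insn[1], insn[2], insn[3]
    assert rex & 0xF0 == 0x40                  # REX prefix present
    assert modrm >> 6 == 0 and modrm & 7 == 4  # mod=00, rm=100: SIB follows
    assert sib & 7 == 5                        # base=101, mod=00: disp32, no base
    assert (sib >> 3) & 7 == 0 and not rex & 2  # index register is RAX
    scale = 1 << (sib >> 6)
    disp = int.from_bytes(insn[4:8], "little", signed=True)  # sign-extended
    return scale, disp


scale, disp = decode_scaled_index_disp32(INSN)
table_base = disp & MASK64
fault_address = (RAX * scale + disp) & MASK64

print(hex(table_base), scale, hex(fault_address))
# prints: 0xffffffff816f76d0 8 0x6a312c50
```

The recomputed address equals the CR2 value in the oops, so those three
hand-typed fields are at least mutually consistent.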
* Re: Auto Checking Raid 6 crashes my system
  2010-02-18  8:28     ` Dawning Sky
@ 2010-03-03  5:32       ` Neil Brown
  0 siblings, 0 replies; 5+ messages in thread
From: Neil Brown @ 2010-03-03  5:32 UTC
  To: Dawning Sky; +Cc: Kristleifur Daðason, linux-raid

On Thu, 18 Feb 2010 00:28:05 -0800
Dawning Sky <the.dawning.sky@gmail.com> wrote:

> I tried to reproduce the crash. First I echoed "check" into
> /sys/block/md127/md/sync_action under the single mode and the array
> finished checking. Then I booted into init 3 and repeated the same
> thing and got the following error on the screen and the computer hung.
> I had to manually type the error since I had no way to copy/paste, so
> there might be a few typos.

That is quite an effort, typing all that in!!
I wish I could say it was really helpful and that I can see exactly
what the problem is, but unfortunately I cannot.
I have never seen any bugs in this part of the code, and I cannot see
how such a BUG could occur unless memory has become corrupted somehow.

I think this is very likely to be caused by a hardware problem of some
sort. Maybe try running memtest86, just in case. Maybe try a different
controller card or something.

Sorry I cannot be more helpful.

NeilBrown