* bonnie++ on md device causes reboot on new motherboard
From: Matt Garman @ 2009-02-03 4:14 UTC
To: linux-raid

I have two four-disk raid5 arrays on Ubuntu Linux 8.04 (AMD64). Both use XFS for the filesystem.

$ uname -a
Linux septictank 2.6.24-19-generic #1 SMP Wed Aug 20 17:53:40 UTC 2008 x86_64 GNU/Linux

I recently replaced the motherboard and processor: I switched from an Intel Q35 motherboard + E5200 CPU to a Gigabyte GA-MA74GM-S2 motherboard (AMD 740G/SB700) with a 4850e CPU.

I ran a ton of benchmarks under the old configuration, and intend to do the same with the new hardware. (See my previous posts regarding SATA/southbridge performance.)

Anyway, four times now, when I run bonnie++ (with my current working directory on one of the md arrays), the computer immediately reboots.

I saw this happen twice while I was logged in remotely; I did it again from the console to see if there were any useful error messages.

The screen flashed by too quickly to read, but the reboot looked like it happened immediately after bonnie++ started "Writing intelligently...".

There are no useful indications in the system logs.

Every time this reboot happens, it forces a rebuild of the md array. I have only tried it on the one array (which is empty, so I can afford to have the rebuild fail). I'm afraid to try it on the other one until I figure this out.

For what it's worth, the md device in question is four Western Digital 7500AAKS 750 GB 7200 rpm SATA drives. The controller is not the onboard SATA, but actually two 2-port PCIe SATA cards.

Has anyone seen anything like this, or have any ideas where I can start looking for more information?

Thanks!
Matt
* Re: bonnie++ on md device causes reboot on new motherboard
From: David Greaves @ 2009-02-03 8:53 UTC
To: Matt Garman; +Cc: linux-raid

Matt Garman wrote:
> Anyone seen anything like this or have any ideas where I can start
> looking for more information?

netconsole?
http://www.mjmwired.net/kernel/Documentation/networking/netconsole.txt

At least then you may see what the error is. And for a crash like this I'd contact your distro kernel team too (not sure about lkml with 2.6.24, but probably).

David

--
"Don't worry, you'll be fine; I saw it work in a cartoon once..."
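For anyone following along: netconsole sends kernel log lines as UDP datagrams to a remote host, so the receiving side can be as simple as a UDP listener. The sketch below is a minimal Python receiver and is not part of the original suggestion. The function names `open_listener` and `receive_datagrams` are hypothetical, and port 6666 is an assumption that has to match whatever target port the sending machine is configured with.

```python
import socket


def open_listener(host="0.0.0.0", port=6666):
    """Bind a UDP socket for incoming log datagrams.

    The port is an assumption; use the target port configured on the sender.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_datagrams(sock, count, timeout=5.0):
    """Collect up to `count` datagrams, stopping early if `timeout` expires."""
    sock.settimeout(timeout)
    messages = []
    for _ in range(count):
        try:
            data, _sender = sock.recvfrom(65535)
        except socket.timeout:
            break
        # Kernel messages are normally ASCII; replace anything undecodable.
        messages.append(data.decode("utf-8", errors="replace"))
    return messages
```

On the logging host, one would call `open_listener()` once and then call `receive_datagrams(sock, 1)` in a loop, printing or appending each message to a file so the last lines before a reset are preserved.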
* Re: bonnie++ on md device causes reboot on new motherboard
From: John Stoffel @ 2009-02-03 15:22 UTC
To: David Greaves; +Cc: Matt Garman, linux-raid

David> Matt Garman wrote:
>> Anyone seen anything like this or have any ideas where I can start
>> looking for more information?

David> netconsole?
David> http://www.mjmwired.net/kernel/Documentation/networking/netconsole.txt

Or a serial console...

David> At least then you may see what the error is. And for a crash
David> like this I'd contact your distro kernel team too (not sure
David> about lkml with 2.6.24 but probably)

From the sounds of it, it's a hardware problem of some sort. I'd run a full memtest86 on the box, as well as some sort of CPU torture test. Check all your cables, possibly remove two of the four disks, etc.

Remove as much memory as possible, re-seat the memory, etc. Have you checked the BIOS version? Have you reset the BIOS to the 'safe' or 'default' settings? Don't bother tweaking stuff to get more speed; go for stability. The second you have problems with stability, you've lost all the time you saved by tweaking things. :]

John
* Re: bonnie++ on md device causes reboot on new motherboard
From: Roger Heflin @ 2009-02-04 1:02 UTC
To: John Stoffel; +Cc: David Greaves, Matt Garman, linux-raid

John Stoffel wrote:
> David> Matt Garman wrote:
>>> Anyone seen anything like this or have any ideas where I can start
>>> looking for more information?
>
> David> netconsole?
> David> http://www.mjmwired.net/kernel/Documentation/networking/netconsole.txt
>
> Or a serial console...
>
> David> At least then you may see what the error is. And for a crash
> David> like this I'd contact your distro kernel team too (not sure
> David> about lkml with 2.6.24 but probably)
>
> From the sounds of it, it's a Hardware problem of some sort. I'd run
> a full memtest86 on the box, as well as some sort of CPU torture.
> Check all your cables, possibly remove two of the four disks, etc.
>
> Remove as much memory as possible, re-seat memory board, etc. Have
> you checked the BIOS version? Have you reset the BIOS defaults to the
> 'safe' or 'default' settings? Don't bother tweaking stuff to get more
> speed, go for stability. The second you have problems with stability,
> you've lost all that time you saved by tweaking things. :]
>

I would second the HW issue. If the machine is doing a full reset with no printout of any type, I would suspect the power supply (PS) or some other serious HW issue; Linux generally does not crash without some error message.

How big a PS do you have?

I would try just dd'ing the 4 disks at the same time and see if that also crashes.

And then, if you can, remove 2 disks from the machine and retest.
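As a rough illustration of the "dd all four disks at the same time" test, here is a minimal Python sketch that reads several block devices (or ordinary files) concurrently and discards the data, much like running several `dd if=... of=/dev/null` jobs in parallel. The function names `read_device` and `parallel_read` are hypothetical, and any device paths passed in are placeholders for whatever the member disks are called on the machine under test. It only reads, so it does not touch the array contents.

```python
from concurrent.futures import ThreadPoolExecutor


def read_device(path, block_size=1 << 20, max_bytes=None):
    """Read `path` sequentially in `block_size` chunks and return bytes read.

    Stops at end of device, or once at least `max_bytes` have been read.
    """
    total = 0
    with open(path, "rb", buffering=0) as dev:
        while max_bytes is None or total < max_bytes:
            chunk = dev.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    return total


def parallel_read(paths, block_size=1 << 20, max_bytes=None):
    """Read all `paths` at the same time, one thread per path."""
    with ThreadPoolExecutor(max_workers=max(1, len(paths))) as pool:
        futures = {
            p: pool.submit(read_device, p, block_size, max_bytes) for p in paths
        }
        return {p: f.result() for p, f in futures.items()}
```

Run as root against the raw member disks, this loads all drives and both controllers at once. If the box stays up, that points away from a simple "too many disks drawing power" explanation, at least for reads.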
* Re: bonnie++ on md device causes reboot on new motherboard
From: Matt Garman @ 2009-02-05 15:57 UTC
To: Roger Heflin; +Cc: John Stoffel, David Greaves, linux-raid

On Tue, Feb 03, 2009 at 07:02:58PM -0600, Roger Heflin wrote:
> John Stoffel wrote:
>> David> Matt Garman wrote:
>>>> Anyone seen anything like this or have any ideas where I can start
>>>> looking for more information?
>> David> netconsole?
>> David> http://www.mjmwired.net/kernel/Documentation/networking/netconsole.txt
>> Or a serial console...
>> David> At least then you may see what the error is. And for a crash
>> David> like this I'd contact your distro kernel team too (not sure
>> David> about lkml with 2.6.24 but probably)
>> From the sounds of it, it's a Hardware problem of some sort. I'd run
>> a full memtest86 on the box, as well as some sort of CPU torture.
>> Check all your cables, possibly remove two of the four disks, etc.
>> Remove as much memory as possible, re-seat memory board, etc. Have
>> you checked the BIOS version? Have you reset the BIOS defaults to the
>> 'safe' or 'default' settings? Don't bother tweaking stuff to get more
>> speed, go for stability. The second you have problems with stability,
>> you've lost all that time you saved by tweaking things. :]
>>
>
> I would second the HW issue, if the machine is doing a full reset
> with no printout out of any type I would think PS, or some other
> serious HW issue, Linux generally does not crash without some
> error message.
>
> How big of PS do you have?
>
> I would try just dding the 4 disks at the same time and see if
> that also crashes.
>
> And then if you can remove 2 disks from the machine and retest.

Netconsole is a great idea, thanks for that!

I'm going to keep testing, but here are some answers to the above and general notes. Maybe these will generate ideas...
- Power supply is a Seasonic 450 Watt. I doubt this is the problem, as I've been using this same power supply---as well as all other hardware (except mobo and cpu)---without any stability problems for several months. This MB/CPU actually uses less power than the previous. Plus I have a Kill-A-Watt electricity meter hooked up; I have yet to see the machine pull more than 200 W AC (even at boot, md resync, cpuburn, memtest, etc).

- I did a dd *read* test from the four drives in parallel numerous times without causing a crash.

- I ran 24 hours of memtest86 without a single error.

- BIOS settings are all set to stable/conservative values. (There is a newer BIOS, but no changelog---just says "updated CPU support". I'll try it anyway.)

- It's not just bonnie++, it appears to be any bulk write to the filesystem. I tried to do a bulk copy (locally, using rsync) from the other md array, and that also caused a reset (unfortunately, I didn't have netconsole running when it happened).

- One thing that's interesting is that every time this machine has rebooted itself, it has to resync the md array. The rsync process itself has never caused a reboot.

- I got brave and both ran bonnie++ and wrote a bunch of data (via NFS) to the other md array on the integrated (SB700) SATA controller. No problems.

My hunch is that the board doesn't like one or both of those SiI 2-port PCIe SATA cards. The motherboard has a single PCIe 1x slot and a 16x; the SATA cards are both PCIe 1x. Maybe the board doesn't like having a 1x device in the 16x slot? Although, my understanding is that PCI express is smart enough to handle this kind of thing.
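Since the notes above point at bulk *writes* rather than reads as the trigger, a small write-only reproducer can be handy for comparing one array against the other without a full bonnie++ run. The following Python sketch is only an illustration under stated assumptions: the function name `bulk_write`, the file naming, and the default sizes are all made up here, and the target directory is whatever scratch directory exists on the filesystem being tested. It should only be pointed at an array whose contents are expendable, since the failure mode described is a hard reset.

```python
import os


def bulk_write(directory, file_count=4, file_size=64 * 1024 * 1024,
               block_size=1024 * 1024):
    """Write `file_count` files of `file_size` bytes each into `directory`.

    Each file is flushed and fsync'ed so the data is actually pushed to the
    disks instead of sitting in the page cache. Returns the paths written.
    """
    block = b"\0" * block_size
    written = []
    for i in range(file_count):
        path = os.path.join(directory, f"bulkwrite_{i}.dat")
        remaining = file_size
        with open(path, "wb") as out:
            while remaining > 0:
                n = min(block_size, remaining)
                out.write(block[:n])
                remaining -= n
            out.flush()
            os.fsync(out.fileno())
        written.append(path)
    return written
```

Running it first against a directory on the array behind the onboard controller and then against the array behind the add-in cards gives a like-for-like comparison of the two write paths.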
* Re: bonnie++ on md device causes reboot on new motherboard
From: Matt Garman @ 2009-02-23 14:45 UTC
To: linux-raid

A few weeks ago I posted about a problem I was having with my machine rebooting whenever I wrote to an md array (details below).

The problem turned out to be using the SiI 3132 2-port SATA card in the PCIe x16 slot of the GA-MA74GM-S2 motherboard. I contacted Gigabyte and they said that the x16 slot is designed for *video only*.

So just a heads-up if you plan to use this board for a low-power file server.

-Matt

On Mon, Feb 02, 2009 at 10:14:50PM -0600, Matt Garman wrote:
>
> I have two four disk raid5 arrays on Ubuntu Linux 8.04 (AMD64).
> Both are using XFS for the filesystem.
>
> $ uname -a
> Linux septictank 2.6.24-19-generic #1 SMP Wed Aug 20 17:53:40 UTC
> 2008 x86_64 GNU/Linux
>
> I recently replaced the motherboard and processor: switched from an
> Intel Q35 motherboard + E5200 CPU to a Gigabyte GA-MA74GM-S2
> motherboard (AMD 740G/SB700) with a 4850e CPU.
>
> I ran a ton of benchmarks under the old configuration, and intend to
> do the same with the new hardware. (See my previous posts regarding
> SATA/southbridge performance.)
>
> Anyway, four times now, when I run bonnie++ (with my current working
> directory on one of the md arrays), the computer immediately
> reboots.
>
> I saw this happen twice while I was logged in remotely; I did it again
> from the console to see if there are any useful error messages.
>
> It flashed too quickly, but the reboot looked like it happened
> immediately after bonnie++ started "Writing intelligently...".
>
> There are no useful indications in the system logs.
>
> Every time this reboot happens, it forces a rebuild of the md array.
> I have only tried it on the one array (which is empty, so I can
> afford to have the rebuild fail). I'm afraid to try it on the other
> one until I figure this out.
>
> For what it's worth the md device in question is four Western
> Digital 7500AAKS 750 GB 7200 rpm SATA drives. The controller is not
> the onboard SATA, but actually two 2-port PCIe SATA cards.
>
> Anyone seen anything like this or have any ideas where I can start
> looking for more information?
>
> Thanks!
> Matt
>