From mboxrd@z Thu Jan 1 00:00:00 1970
From: Bill Davidsen
Subject: Re: very odd iowait problem
Date: Sun, 04 Jul 2010 20:02:56 -0400
Message-ID: <4C312130.5040505@tmr.com>
References: <4C1C7F2F.3040604@meetinghouse.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <4C1C7F2F.3040604@meetinghouse.net>
Sender: linux-raid-owner@vger.kernel.org
To: Miles Fidelman
Cc: "xen-users@lists.xensource.com", General Linux-HA mailing list, linux-raid@vger.kernel.org
List-Id: linux-raid.ids

Miles Fidelman wrote:
> Hi Folks,
>
> I'm experiencing a very odd, daily, high-load situation - that seems
> to localize in my disk stack. I direct this to the xen-users,
> linux-raid and linux-ha lists as I expect there's a pretty high degree
> of experience on these lists with complicated disk driver stacks.
>
> I recently virtualized a production system, and have been slowly
> wringing out the bugs that have shown up. This seems to be the last
> one, and it's a doozie.
>
> Basic setup: Two identical machines except for the DomUs they're
> running.
>
> Two machines, slightly older Pentium 4 processors, 4meg RAM each
> (max), 2 CPUs each, 4 SATA Drives each.
> Debian Lenny Install for Dom0 and DomUs (2.6.26-2-xen-686)
>
> Disk setup on each:
> - 4 partitions on each drive
> - 3 RAID-1s set up across the 4 drives (4 drives in each - yes it's
> silly, but easy) - for Dom0 /boot / swap
> - 1 RAID-6 set up across the 4 drives - set up as an LVM PV - underlies
> all my DomUs
> note: all the RAIDs are set up with internal metadata, chunk size of
> 131072KB - per advice here - works like a charm
> - pairs of LVs - / and swap per VM
> - each LV is linked with its counterpart on the other machine, using
> DRBD
> - LVs are specified as drbd: devices in DomU .cfg files
> - LVs are mounted with noatime option inside production DomU - makes a
> big difference
>
> A few DomUs - currently started and stopped either via links in
> /etc/xen/auto or manually - I've temporarily turned off heartbeat and
> pacemaker until I get the underlying stuff stable.
>
> ------
> now to the problem:
>
> for several days in a row, at 2:05am, iowait on the production DomU
> went from averaging 10% or so to 100% (I've been running vmstat 1 in a
> window and watching the iowait column)
>
> the past two days, this has happened at 2:26am instead of 2:05
>
> rebooting the VM fixes the problem, though it has occurred again within
> 20 minutes of the reboot, and then another reboot fixes the problem
> until 2am the next day
>
> killing a bunch of processes also fixed things, but at that point so
> little was running that I just rebooted the DomU - unfortunately, one
> night it looked like lwresd was eating up resources, the next night it
> was something else.
>
> ------
> ok...
> so I'm thinking there's a cron job that's doing something that
> eats up all my i/o - I traced a couple of other issues back to cron
> jobs - I can't seem to find either a cron job that runs around this
> time, or anything in my logs
>
> so, now I set up a bunch of things to watch what's going on - copies of
> atop running in Dom0 on both servers, and in the production DomU
> (note: I caught a couple of more bugs by running top in a window, and
> seeing what was frozen in the window, after the machine crashed)
>
> ok - so I'm up at 2am for the 4th day in a row (along with a couple of
> proposals I'm writing during the day, and a couple of fires with my
> kids' computers - I've discovered that Mozy is perhaps the world's
> worst backup service - it's impossible to restore things) - anyway....
> 2:26 rolls around, the iowait goes to 100%, and I start looking using
> ps, and iostat, and lsof and such to try to locate whatever process is
> locking up my DomU, when I notice:
>
> --- on one server, atop is showing one drive (/dev/sdb) maxing out at
> 98% busy - sort of suggestive of a drive failure, and something that
> would certainly ripple through all the layers of RAID, LVM, DRBD to
> slow down everything on top of it (which is everything)
>
> Now this is pretty weird - given the way my system is set up, I'd
> expect a dying disk to show up as very high iowaits, but....
> - it's a relatively new drive
> - I've been running smartd, and smartctl doesn't yield any results
> suggesting a drive problem
> - the problem goes away when I reboot the DomU
>
> One more symptom: I migrated the DomU to my other server, and there's
> still a correlation between seeing the 98% busy on /dev/sdb, and
> seeing iowait of 100% on the DomU - even though we're now talking a
> disk on one machine dragging down a VM on the other machine.
> (Presumably it's impacting DRBD replication.)
>
> So....
> - on the one hand, the 98% busy on /dev/sdb is rippling up through md,
> lvm, drbd, dom0 - and causing 100% iowait in the production DomU -
> which is to be expected in a raided, drbd'd environment - a low level
> delay ripples all the way up
> - on the other hand, it's only affecting the one DomU, and it's not
> affecting the Dom0 on that machine
> - there seems to be something going on at 2:25am, give or take a few,
> that kicks everything into the high iowait state (but I can't find a
> job running at that time - though I guess someone could be hitting me
> with some spam that's kicking amavisd or clam into a high-resource mode)
>
> All of which leads to two questions:
> - if it's a disk going bad, why does this manifest nightly, at roughly
> the same time, and affect only one DomU?
> - if it's something in the DomU, by what mechanism is that rippling
> all the way down to a component of a raid array, hidden below several
> layers of stuff that's supposed to isolate virtual volumes from hardware?
>
> The only thought that occurs to me is that perhaps there's a bad
> record or block on that one drive, that only gets exercised when one
> particular process runs.
> - is that a possibility?
> - if yes, why isn't drbd or md or something catching it and fixing it
> (or adding the block to the bad block table)?
> - any suggestions on diagnostic or drive rebuilding steps to take
> next? (including things that I can do before staying up until 2am tomorrow!)
>
> If it weren't hitting me, I'd be intrigued by this one.
> Unfortunately, it IS hitting me, and I'm getting tireder and crankier
> by the minute, hour, and day. And it's now 4:26am. Sigh...
>
> Thanks very much for any ideas or suggestions.

Get some sleep, for one. I would install and enable process accounting,
turn it on at midnight and let it run until morning (unless you feel
like staying up to reboot). That's at a low enough level that I would
expect it to have information as to what's running, at least.
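
In the same spirit of logging what is running around 2am, a rough Python
sketch follows. It is not process accounting itself (that would be the
acct tools: accton, lastcomm, sa). It assumes the DomU kernel exposes
per-process I/O counters in /proc/<pid>/io, which may not be the case on
every kernel build, and that it runs as root so other users' processes
are readable. The function names (parse_proc_io, snapshot, top_io, watch)
are made up for illustration.

```python
import os
import time


def parse_proc_io(text):
    """Parse the 'key: value' lines of /proc/<pid>/io into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            counters[key.strip()] = int(value)
    return counters


def snapshot(proc_root="/proc"):
    """Return {pid: (command, read_bytes + write_bytes)} for readable processes."""
    result = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "io")) as f:
                io = parse_proc_io(f.read())
            with open(os.path.join(base, "cmdline"), errors="replace") as f:
                name = f.read().replace("\0", " ").strip()
        except OSError:
            # Process exited, file not provided by this kernel, or no permission.
            continue
        total = io.get("read_bytes", 0) + io.get("write_bytes", 0)
        # Kernel threads have an empty cmdline; fall back to a placeholder.
        result[int(entry)] = (name or "[pid %s]" % entry, total)
    return result


def top_io(before, after, limit=5):
    """Rank processes by bytes of disk I/O done between two snapshots."""
    deltas = []
    for pid, (name, total) in after.items():
        if pid in before:
            deltas.append((total - before[pid][1], pid, name))
    deltas.sort(reverse=True)
    return deltas[:limit]


def watch(interval=10, rounds=6):
    """Print the busiest processes once per interval, with a timestamp."""
    previous = snapshot()
    for _ in range(rounds):
        time.sleep(interval)
        current = snapshot()
        stamp = time.strftime("%H:%M:%S")
        for delta, pid, name in top_io(previous, current):
            print(f"{stamp} pid={pid} bytes={delta} {name}")
        previous = current
```

Calling watch() from a script started a little before 2am, with its
output redirected to a file, would leave a timestamped record to compare
against the moment iowait jumps. Processes that start and exit between
two snapshots are missed, which is one reason real process accounting is
still the better tool.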

-- 
Bill Davidsen
  "We can't solve today's problems by using the same thinking
   we used in creating them." - Einstein