From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jeremy Fitzhardinge
Subject: Re: High Net and Disk Use == stuck domain
Date: Mon, 01 Dec 2008 12:19:50 -0800
Message-ID: <493446E6.2060706@goop.org>
References: <4926E7DD.8040603@theshore.net> <4933FD44.7050101@theshore.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
In-Reply-To: <4933FD44.7050101@theshore.net>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: "Christopher S. Aker"
Cc: xen devel
List-Id: xen-devel@lists.xenproject.org

Christopher S. Aker wrote:
> Christopher S. Aker wrote:
>> For the past year or so we've been seeing a bug whereby a domU's CPU
>> would spin up to a steady 100, 200, 300 or 400% (4 vcpus), console
>> would freeze, and some or all of the network-facing services within
>> the domU would connect but block without any output. Disk IO would
>> flatline. The domU would never recover and required rebooting.
>>
>> Since pv_ops hasn't always been around, we previously had only seen
>> this behavior with xen-patched domUs (2.6.18.x), but now we're seeing
>> it with pv_ops. Identical symptoms. And, I have a user that is able
>> to reliably reproduce it on 2.6.27.4!
>>
>> His recipe is downloading an ISO from a very fast and close-by news
>> server using nzbget. The trigger appears to be a combination of high
>> network use and high disk use (like download from a very fast mirror)
>> -- because we weren't able to reproduce the problem when saving to a
>> tmpfs mount.
>>
>> I was able to grab the output of sysrq t while it was in the bad state:
>>
>> http://theshore.net/~caker/xen/BUGS/D-state/console.log
>>
>> The number of processes in D state (39) is quite suspicious.
>>
>> Let me know if there's anything else I can provide.
>>
>> -Chris
>
> Jeremy,
>
> Did this one slip by you? I figured a reproducible bug would be just
> too tantalizing to resist.

Hoping it would go away by itself? ;)

I'm trying to repro it now, copying ISOs at 25 Mbytes/sec. How long
does it take to happen?

> What's the correct venue for these issues that overlap xen-devel,
> lkml, and virtualization/pv_ops stuff -- should I be blasting these to
> everybody?

Me and xen-devel are a good start, and posting in a bugzilla cc:ing me
if it looks like it's been dropped on the floor.

    J