From mboxrd@z Thu Jan 1 00:00:00 1970
From: "dwight at supercomputer.org"
Subject: Re: XCP - FYI - An easy way to wedge (and fix) a Cloud
Date: Wed, 9 Jun 2010 09:58:35 -0700
Message-ID: <201006090958.35275.dwight@supercomputer.org>
References: <201006080904.31362.dwight@supercomputer.org> <1276029413.2939.186.camel@agari.van.xensource.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
In-Reply-To: <1276029413.2939.186.camel@agari.van.xensource.com>
Content-Disposition: inline
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Daniel Stodden
Cc: "xen-devel@lists.xensource.com"
List-Id: xen-devel@lists.xenproject.org

On Tuesday 08 June 2010 01:36:53 pm Daniel Stodden wrote:
> On Tue, 2010-06-08 at 12:04 -0400, dwight at supercomputer.org wrote:
> > It turns out that /var/log had filled up the root filesystem on
> > the master. 500M+ worth of messages in there. After I tracked
> > down the problem, and freed this space up, everything started
> > working again.
>
> Which ones were the files growing too big? I recently caused
> potential trouble with blktap. But there may be more. Both xapi
> and storage management can get quite chatty, although I think this
> improved with xs5.x.
>
> Daniel

I'm going from memory here, as the main focus was on triage, and not
on proper debug/fix/testing. But if memory serves, it was
xensource.log. It's unlikely that any recent change was the culprit,
as this was stock XCP 0.1.1.

I have to say that it's something else to reboot and debug an entire
Cloud. I've dealt with wedged/crashed systems before on
microcontrollers, small embedded devices, PCs, servers, mainframes
and supercomputers, including virtualized systems. This is the first
time I've had to debug and reboot an entire Cloud.

The main lesson for me is that the debugging interface could be improved.
This is one of the most critical aspects of any development
environment. Being able to get to a single-user shell prompt easily
from the "boot:" prompt would go a long way here.

-dwight-