From mboxrd@z Thu Jan 1 00:00:00 1970
From: "dwight at supercomputer.org"
Subject: XCP - FYI - An easy way to wedge (and fix) a Cloud
Date: Tue, 8 Jun 2010 09:04:31 -0700
Message-ID: <201006080904.31362.dwight@supercomputer.org>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: xen-devel@lists.xensource.com
List-Id: xen-devel@lists.xenproject.org

This is mostly FYI. I know someone else is going to run into this.

It turns out that it's really easy to wedge an entire Cloud with the
default configurations in XCP 0.1.1.

We saw this recently with our Development Cloud. The cause was that
/var/log had filled up the root filesystem on the master, with 500M+
worth of messages in there. After I tracked down the problem and
freed this space up, everything started working again.

When this happens, various things fail mysteriously: the slaves and
master fail to reboot, xsconsole wedges (on the master and slaves),
OpenXenCenter is unable to connect, and at best you get messages
that aren't helpful.

I would recommend, at the very least, that compression of the logs
be turned on in logrotate.conf. I'd also strongly recommend that
this be the default in release 0.5.

Myself, I've taken this further by putting logrotate into the hourly
cronjob. We're also going to change our automatic installation scripts
to put /var on a separate, large disk volume, not on the root
filesystem. Having /var separate from the root filesystem is generally
a wise move for servers, so that a full /var doesn't impact the root.

I'd also add that having grub available would've been helpful.

-dwight-
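
A check along the following lines could flag the condition described
above before the root filesystem actually fills. This is only a minimal
sketch: the function names and the 90% threshold are assumptions made
for illustration, not anything that ships with XCP, and it would need
to be run from cron or similar to be useful.

```python
import os
import shutil


def dir_size_bytes(path):
    """Sum the sizes of regular files under path (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:
                # The file was rotated or removed while we were walking.
                pass
    return total


def filesystem_is_nearly_full(path, max_used_fraction=0.90):
    """Return True if the filesystem holding path is at or above the threshold.

    max_used_fraction is a hypothetical default; pick what suits the host.
    """
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return False
    return usage.used / usage.total >= max_used_fraction


if __name__ == "__main__":
    log_dir = "/var/log"
    if os.path.isdir(log_dir):
        size_mb = dir_size_bytes(log_dir) / (1024 * 1024)
        print(f"{log_dir} holds {size_mb:.1f} MB of files")
        if filesystem_is_nearly_full(log_dir):
            print(f"WARNING: filesystem holding {log_dir} is nearly full")
```

Files the invoking user cannot read are silently skipped, so the size
reported is a lower bound unless the script runs as root.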