From mboxrd@z Thu Jan 1 00:00:00 1970
From: Konrad Rzeszutek Wilk
Subject: Re: xenbus and the message of doom
Date: Fri, 16 Dec 2011 10:25:33 -0500
Message-ID: <20111216152533.GE31755@phenom.dumpdata.com>
References: <4EEA4877.8010307@canonical.com> <20111215193942.GA7640@andromeda.dapyr.net> <20111216113300.GA4854@aepfle.de>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
In-Reply-To: <20111216113300.GA4854@aepfle.de>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Olaf Hering, pradeepv@amazon.com, scott.moser@canonical.com
Cc: Konrad Rzeszutek Wilk, "xen-devel@lists.xensource.com", Stefan Bader
List-Id: xen-devel@lists.xenproject.org

On Fri, Dec 16, 2011 at 12:33:00PM +0100, Olaf Hering wrote:
> On Thu, Dec 15, Konrad Rzeszutek Wilk wrote:
>
> > On Thu, Dec 15, 2011 at 08:20:23PM +0100, Stefan Bader wrote:
> > > I was investigating a bug report[1] about newer kernels (>3.1) not booting as
> > > HVM guests on Amazon EC2. For some reason git bisect did give me some pain, but
> > > it led me at least close and with some crash dump data I think I figured out the
> > > problem.
> >
> > Stefan, thanks for finding this.
> >
> > Olaf, what are your thoughts? Should I prep a patch to revert the patch
> > below and then we can work on 3.3 and rethink this in 3.3? The clock is
> > ticking for 3.2 and there is not much runway to fix stuff.
>
> Sometimes guest changes expose bugs in the host. It's my understanding
> that hosts should be kept up to date so that they can serve both old and new
> guests well.
>
> In my testing with Xen4 based hosts their xenstored did properly ignore
> the new command.
>
> I proposed several ways to get rid of existing watches, but finally we
> came to the conclusion that a new xenstored command would be the
> cleanest way.
>
> Whether adding a timeout is a good idea has to be decided. I can imagine
> that a busy host may take some time to respond to guest commands.
>
>
> Perhaps we should figure out what exactly EC2 is using as host and why
> it only breaks with upstream kernels. So far I haven't received reports

Good point. Stefan, were you able to provide Scott with a kernel without
the git commit mentioned, to see if that fixed the issue?

CC-ing Vincent here in hopes of getting some hints.

> for SLES11 guests. SP1 got an update recently, so their HVM guests would
> have seen the hang as well. The not yet released SP2 has also been sending
> XS_RESET_WATCHES for quite some time.
>
>
> Olaf