From mboxrd@z Thu Jan 1 00:00:00 1970
From: Stefan Bader
Subject: Re: xenbus and the message of doom
Date: Mon, 02 Jan 2012 10:32:42 +0100
Message-ID: <4F0179BA.7090909@canonical.com>
References: <4EEA4877.8010307@canonical.com>
 <20111215193942.GA7640@andromeda.dapyr.net>
 <20111216113300.GA4854@aepfle.de>
 <20111216152533.GE31755@phenom.dumpdata.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
In-Reply-To: <20111216152533.GE31755@phenom.dumpdata.com>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Konrad Rzeszutek Wilk
Cc: Konrad Rzeszutek Wilk, pradeepv@amazon.com, Olaf Hering,
 "xen-devel@lists.xensource.com", scott.moser@canonical.com
List-Id: xen-devel@lists.xenproject.org

On 16.12.2011 16:25, Konrad Rzeszutek Wilk wrote:
> On Fri, Dec 16, 2011 at 12:33:00PM +0100, Olaf Hering wrote:
>> On Thu, Dec 15, Konrad Rzeszutek Wilk wrote:
>>
>>> On Thu, Dec 15, 2011 at 08:20:23PM +0100, Stefan Bader wrote:
>>>> I was investigating a bug report[1] about newer kernels (>3.1) not booting as
>>>> HVM guests on Amazon EC2. For some reason git bisect did give me some pain, but
>>>> it led me at least close and with some crash dump data I think I figured out
>>>> the problem.
>>>
>>> Stefan, thanks for finding this.
>>>
>>> Olaf, what are your thoughts? Should I prep a patch to revert the patch
>>> below and then we can work on 3.3 and rethink this in 3.3? The clock is
>>> ticking for 3.2 and there is not much runway to fix stuff.
>>
>> Sometimes guest changes expose bugs in the host. It's my understanding
>> that hosts should be kept up to date so that they can serve both old and
>> new guests well.
>>
>> In my testing with Xen4 based hosts their xenstored did properly ignore
>> the new command.
>>
>> I proposed several ways to get rid of existing watches, but finally we
>> came to the conclusion that a new xenstored command would be the
>> cleanest way.
>>
>> Whether adding a timeout is a good idea has to be decided. I can imagine
>> that a busy host may take some time to respond to guest commands.
>>
>>
>> Perhaps we should figure out what exactly EC2 is using as host and why
>> it only breaks with upstream kernels. So far I haven't received reports
>
> Good point. Stefan, were you able to provide to Scott a kernel without the
> git commit mentioned to see if that fixed the issue?

Sorry, I have been off over the end of the year and I try to be seriously off
when I am off. ;) I am working my way through email now, and maybe this is
already obsolete, as I see some submissions which I have not read yet. But
doing that was on my list.

-Stefan

> CC-ing here Vincent in hopes of getting some hints..
>
>> for SLES11 guests. SP1 got an update recently, so their HVM guests would
>> have seen the hang as well. The not-yet-released SP2 has been sending
>> XS_RESET_WATCHES as well for quite some time.
>>
>>
>> Olaf