From mboxrd@z Thu Jan  1 00:00:00 1970
From: Stefan Bader <stefan.bader@canonical.com>
Subject: Re: xenbus and the message of doom
Date: Mon, 02 Jan 2012 10:32:42 +0100
Message-ID: <4F0179BA.7090909@canonical.com>
References: <4EEA4877.8010307@canonical.com>
	<20111215193942.GA7640@andromeda.dapyr.net>
	<20111216113300.GA4854@aepfle.de>
	<20111216152533.GE31755@phenom.dumpdata.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path: <xen-devel-bounces@lists.xensource.com>
In-Reply-To: <20111216152533.GE31755@phenom.dumpdata.com>
List-Unsubscribe: <http://lists.xensource.com/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xensource.com>
List-Help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-Subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Konrad Rzeszutek Wilk <konrad@darnok.org>, pradeepv@amazon.com, Olaf Hering <olaf@aepfle.de>, "xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>, scott.moser@canonical.com
List-Id: xen-devel@lists.xenproject.org

On 16.12.2011 16:25, Konrad Rzeszutek Wilk wrote:
> On Fri, Dec 16, 2011 at 12:33:00PM +0100, Olaf Hering wrote:
>> On Thu, Dec 15, Konrad Rzeszutek Wilk wrote:
>>
>>> On Thu, Dec 15, 2011 at 08:20:23PM +0100, Stefan Bader wrote:
>>>> I was investigating a bug report[1] about newer kernels (>3.1) not booting as
>>>> HVM guests on Amazon EC2. For some reason git bisect did give me some pain, but
>>>> it led me at least close, and with some crash dump data I think I figured out the
>>>> problem.
>>>
>>> Stefan, thanks for finding this.
>>>
>>> Olaf, what are your thoughts? Should I prep a patch to revert the patch
>>> below and then we can work on 3.3 and rethink this in 3.3? The clock is
>>> ticking for 3.2 and there is not much runway to fix stuff.
>>
>> Sometimes guest changes expose bugs in the host. It's my understanding
>> that hosts should be kept up to date so that they can serve both old and new
>> guests well.
>>
>> In my testing with Xen4 based hosts their xenstored did properly ignore
>> the new command.
>>
>> I proposed several ways to get rid of existing watches, but finally we
>> came to the conclusion that a new xenstored command would be the
>> cleanest way.
>>
>> Whether adding a timeout is a good idea has to be decided. I can imagine
>> that a busy host may take some time to respond to guest commands.
>>
>>
>> Perhaps we should figure out what exactly EC2 is using as host and why
>> it only breaks with upstream kernels. So far I haven't received reports
> 
> Good point. Stefan, were you able to provide Scott with a kernel without the
> git commit mentioned, to see if that fixed the issue?

Sorry, I have been off over the end of the year, and I try to be seriously off
when I am off. ;)
I am working my way through email now, and maybe this is already obsolete, as I
see some submissions which I have not read yet. But doing that was on my list.

-Stefan

> CC-ing here Vincent in hopes of getting some hints..
> 
>> for SLES11 guests. SP1 got an update recently, so their HVM guests would
>> have seen the hang as well. The not-yet-released SP2 has also been sending
>> XS_RESET_WATCHES for quite some time.
>>
>>
>> Olaf