From mboxrd@z Thu Jan  1 00:00:00 1970
From: Konrad Rzeszutek Wilk <konrad@darnok.org>
Subject: Re: xenbus and the message of doom
Date: Tue, 20 Dec 2011 09:16:13 -0500
Message-ID: <20111220141612.GA25139@konrad-lan>
References: <4EEA4877.8010307@canonical.com>
	<20111215193942.GA7640@andromeda.dapyr.net>
	<20111216113300.GA4854@aepfle.de>
	<1324375910.23729.31.camel@zakaz.uk.xensource.com>
	<20111220131533.GA7800@aepfle.de>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path: <xen-devel-bounces@lists.xensource.com>
Content-Disposition: inline
In-Reply-To: <20111220131533.GA7800@aepfle.de>
List-Unsubscribe: <http://lists.xensource.com/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xensource.com>
List-Help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-Subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Olaf Hering <olaf@aepfle.de>
Cc: "xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>, Ian Campbell <Ian.Campbell@citrix.com>, Stefan Bader <stefan.bader@canonical.com>, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
List-Id: xen-devel@lists.xenproject.org

On Tue, Dec 20, 2011 at 02:15:33PM +0100, Olaf Hering wrote:
> On Tue, Dec 20, Ian Campbell wrote:
> 
> > What's wrong with only doing this reset if we know we are kexec'd? If
> > that can't be automatically detected then e.g. using an explicit
> > reset_watches command line option. You could even make a tenuous
> > argument for hanging this off reset_devices?
> 
> The kexec kernel does not know that it was loaded via kexec.
> We could make the reset_devices option mandatory for kexec in PVonHVM
> guests, so the change to drivers/xen/xenbus/xenbus_xs.c would be very
> small, like "if (hvm && reset_devices) xs_reset_watches();"

<nods> OK that would be one way. Granted if one tried to kexec under
Amazon EC2 a PVonHVM domain we would hit this bug again. But then
I don't think kexecing without this patch works, so that scenario is
probably moot.
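
For reference, the guard Olaf describes could be sketched roughly as
below. This is a userspace mock, not the actual patch: hvm_domain is a
hypothetical stand-in for the kernel's xen_hvm_domain() check,
reset_devices mirrors the boot parameter of the same name, and
xs_reset_watches() is stubbed out to count calls instead of talking to
xenstored. maybe_reset_watches() is a name made up for illustration.

```c
#include <stdbool.h>

/* Stand-ins for kernel state. In the kernel these would be the
 * xen_hvm_domain() check and the reset_devices boot parameter. */
static bool hvm_domain;     /* hypothetical stub for xen_hvm_domain() */
static int reset_devices;   /* mirrors the kernel's reset_devices flag */
static int watches_reset;   /* call counter, for illustration only */

/* Stub: the real xs_reset_watches() sends a request to xenstored. */
static void xs_reset_watches(void)
{
	watches_reset++;
}

/* Reset watches only for an HVM guest booted with reset_devices,
 * i.e. the case where the admin declared this to be a kexec boot. */
static void maybe_reset_watches(void)
{
	if (hvm_domain && reset_devices)
		xs_reset_watches();
}
```

With this shape a normal boot never issues the reset, so a host that
chokes on it is only exposed when reset_devices is passed explicitly.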

> 
> > > Perhaps we should figure out what exactly EC2 is using as host and why
> > > it only breaks with upstream kernels.
> > 
> > and in the meantime we leave upstream (and any distros which picks up a
> > new enough kernel) on EC2? I think at this stage in the rc cycle we'd be
> > better off reverting and trying again for 3.3.
> 
> If EC2 is unable to fix it in time (or provide info what exactly they
> use), I'm ok with reverting/disabling the call to xs_reset_watches().

By my reckoning 3.2 is going to come out Dec 29th (60 days after 3.1
was released) or it might slip. With folks buying presents online (and
potentially using Amazon) they [Amazon] are not going to fix anything -
they are in "must work now to sell stuff" mode - which means fixing only
those $1M bugs. With the craze of purchases stopping around January I
think they could start addressing this sometime in January - which would
be past the 3.2 release date.

Sorry Olaf, have to revert that commit.

> I can continue to work on this next year.

Ok. I seriously need to get a free Amazon EC2 instance to run nightly tests.
This is the third breakage this year.