From mboxrd@z Thu Jan 1 00:00:00 1970
From: Konrad Rzeszutek Wilk
Subject: Re: xenbus and the message of doom
Date: Tue, 20 Dec 2011 09:16:13 -0500
Message-ID: <20111220141612.GA25139@konrad-lan>
References: <4EEA4877.8010307@canonical.com> <20111215193942.GA7640@andromeda.dapyr.net> <20111216113300.GA4854@aepfle.de> <1324375910.23729.31.camel@zakaz.uk.xensource.com> <20111220131533.GA7800@aepfle.de>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
In-Reply-To: <20111220131533.GA7800@aepfle.de>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Olaf Hering
Cc: "xen-devel@lists.xensource.com" , Ian Campbell , Stefan Bader , Konrad Rzeszutek Wilk
List-Id: xen-devel@lists.xenproject.org

On Tue, Dec 20, 2011 at 02:15:33PM +0100, Olaf Hering wrote:
> On Tue, Dec 20, Ian Campbell wrote:
>
> > What's wrong with only doing this reset if we know we are kexec'd? If
> > that can't be automatically detected then e.g. using an explicit
> > reset_watches command line option. You could even make a tenuous
> > argument for hanging this off reset_devices?
>
> The kexec kernel does not know that it was loaded via kexec.
> We could make the reset_devices option mandatory for kexec in PVonHVM
> guests, so the change to drivers/xen/xenbus/xenbus_xs.c would be very
> small, like "if (hvm && reset_devices) xs_reset_watches();"

OK, that would be one way. Granted, if one tried to kexec a PVonHVM
domain under Amazon EC2 we would hit this bug again. But then I don't
think kexecing without this patch works, so that scenario is probably moot.

>
> > > Perhaps we should figure out what exactly EC2 is using as host and why
> > > it only breaks with upstream kernels.
> >
> > and in the meantime we leave upstream (and any distros which picks up a
> > new enough kernel) on EC2? I think at this stage in the rc cycle we'd be
> > better off reverting and trying again for 3.3.
>
> If EC2 is unable to fix it in time (or provide info what exactly they
> use), I'm ok with reverting/disabling the call to xs_reset_watches().

By my reckoning 3.2 is going to come out Dec 29th (60 days after 3.1 was
released) or it might slip. With folks buying presents online (and
potentially using Amazon) they [Amazon] are not going to fix anything - they
are in "must work now to sell stuff" mode - which means fix only those
$1M bugs. With the craze of purchases stopping around January I think they
could start addressing this sometime in January - which would be past
the 3.2 release date.

Sorry Olaf, have to revert that commit.

> I can continue to work on this next year.

Ok. I seriously need to get a free Amazon EC2 instance to run nightly tests.
This is the third breakage this year.
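
For reference, the gating Olaf suggests above ("if (hvm && reset_devices)
xs_reset_watches();") could be sketched roughly as below. This is only a
user-space illustration, not the actual change to xenbus_xs.c: the helper
maybe_reset_watches(), the stub xs_reset_watches_stub() and the counter
watches_reset_count are hypothetical names made up for the sketch, and the
two bool parameters stand in for whatever the kernel would use to detect an
HVM guest and the reset_devices boot option.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-ins so the sketch compiles on its own.
 * In the kernel the call would go to xs_reset_watches() in
 * drivers/xen/xenbus/xenbus_xs.c; here we only count invocations.
 */
static int watches_reset_count;

static void xs_reset_watches_stub(void)
{
	watches_reset_count++;
}

/*
 * Reset stale watches only when running as an HVM guest AND the
 * reset_devices option was given (which, under the proposal, would be
 * made mandatory for kexec in PVonHVM guests).
 * Returns true if the reset was issued.
 */
static bool maybe_reset_watches(bool hvm, bool reset_devices)
{
	if (hvm && reset_devices) {
		xs_reset_watches_stub();
		return true;
	}
	return false;
}
```

With this shape a normal boot (no reset_devices) never issues the reset, so
a host that chokes on it would only be affected when the option is passed
explicitly.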