From mboxrd@z Thu Jan  1 00:00:00 1970
From: Ian Campbell
Subject: Re: [xen-unstable test] 56759: regressions - FAIL
Date: Fri, 29 May 2015 11:50:49 +0100
Message-ID: <1432896649.15036.16.camel@citrix.com>
References: <1432115769.12989.219.camel@citrix.com>
 <1432646989.14664.112.camel@citrix.com>
 <1432742677.14664.270.camel@citrix.com>
 <5566F2E9020000780007E6B2@mail.emea.novell.com>
 <1432805194.14664.286.camel@citrix.com>
 <556705CE020000780007E76C@mail.emea.novell.com>
 <556837E7.3090400@citrix.com>
In-Reply-To: <556837E7.3090400@citrix.com>
To: Andrew Cooper
Cc: Wei Liu, Stefano Stabellini, ian.jackson@eu.citrix.com, Tim Deegan,
 Julien Grall, David Vrabel, Jan Beulich, xen-devel
List-Id: xen-devel@lists.xenproject.org

On Fri, 2015-05-29 at 10:56 +0100, Andrew Cooper wrote:
> On 28/05/15 11:10, Jan Beulich wrote:
> >>>> On 28.05.15 at 11:26, wrote:
> >> On Thu, 2015-05-28 at 09:50 +0100, Jan Beulich wrote:
> >>>>>> On 27.05.15 at 18:04, wrote:
> >>>> On Tue, 2015-05-26 at 14:29 +0100, Ian Campbell wrote:
> >>>>> I've now managed to reproduce using the arndale on my desk.
> >>>> ... and now I've confirmed that reverting the spin lock change causes
> >>>> the issue to not happen any more.
> >>> Considering that this issue has prevented a push for almost
> >>> two weeks, I think we ought to consider reverting the two
> >>> offending commits until the problem got sorted out.
> >> I think that would probably be wise. I'll try and figure out exactly
> >> what is going on and propose some patches ASAP.
> > Now done and pushed.
>
> Wait what?  This failure is not related to spinlocks; It is a networking
> behavioural bug (hardware specific, even) which has been uncovered,
> showing that there is a preexisting race condition.

That's the current _hypothesis_, but it hasn't been confirmed what is
actually happening here.

So far, doing the apparently obvious fix in netback (deferring the state
change to closed until after the uevent is generated) doesn't seem to
have fixed the issue. So either the hypothesis is wrong or there is
something more subtle going on.

We don't know what is causing this issue yet, and therefore neither
holding up the push gate nor force pushing seems appropriate under the
circumstances.

> It is not reasonable to revert a correct change because it has exposed
> an existing race condition elsewhere.  IMO, this should have been a
> force push to mark the test as non-blocking.
>
> ~Andrew