From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ian Campbell
Subject: Re: [xen-unstable test] 65141: regressions - FAIL
Date: Mon, 7 Dec 2015 16:28:36 +0000
Message-ID: <1449505716.29724.66.camel@citrix.com>
References: <22104.31917.532615.949661@mariner.uk.xensource.com>
 <9E79D1C9A97CFD4097BCE431828FDD31023BAE97@SHSMSX103.ccr.corp.intel.com>
 <1449052492.4424.30.camel@citrix.com>
 <1449064269.4424.73.camel@citrix.com>
 <1449302981.3451.3.camel@citrix.com>
 <5665BF5202000078000BCC0E@prv-mh.provo.novell.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <5665BF5202000078000BCC0E@prv-mh.provo.novell.com>
List-Unsubscribe: ,
List-Post:
List-Help:
List-Subscribe: ,
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Jan Beulich , Ian Jackson , Jun Nakajima , KevinTian , Robert Hu
Cc: Andrew Cooper , "xen-devel@lists.xensource.com" , osstestservice owner
List-Id: xen-devel@lists.xenproject.org

On Mon, 2015-12-07 at 09:18 -0700, Jan Beulich wrote:
> > > > On 05.12.15 at 09:09, wrote:
> > On Wed, 2015-12-02 at 13:51 +0000, Ian Campbell wrote:
> >
> > > http://osstest.test-lab.xenproject.org/~osstest/pub/logs/65301/
> > >
> > > I think that ought to give a baseline for the bisector to work with.
> > > I'll
> > > prod it to do so.
> >
> > Results are below. TL;DR: d02e84b9d9d "vVMX: use latched VMCS machine
> > address" is somehow at fault.
> >
> > It appears to be somewhat machine specific, the one this has been
> > failing on is godello* which says "CPU0: Intel(R) Xeon(R) CPU E3-1220
> > v3 @ 3.10GHz stepping 03" in its serial log.
> >
> > Andy suggested this might be related to cpu_has_vmx_vmcs_shadowing
> > so Haswell and newer vs IvyBridge and older.
>
> Yeah, but on irc it was also made clear that the regression is on a
> system without that capability.

What I was trying to say he said was that the difference between working
and broken hosts might be spread along the lines of >=Haswell vs
<=IvyBridge. How that maps onto E3-1220, which is what is exhibiting the
issue, I leave to you guys.

> At this point we certainly need to seriously consider reverting the
> whole change. The reason I continue to be hesitant is that I'm
> afraid this may result in no-one trying to find out what the problem
> here is. While I could certainly try to, I'm sure I won't find time to
> do so within the foreseeable future. And since we didn't get any
> real feedback from Intel so far, I thought I'd ping them to at least
> share some status before we decide. That pinging has happened
> a few minutes ago. I'd therefore like to give it, say, another day,
> and if by then we don't have an estimate for when a fix might
> become available, I'd do the revert. Unless of course somebody
> feels strongly about doing the revert immediately.

I don't mind waiting.

One approach to fixing might be to disentangle the various things which
this patch did, such that the actual culprit is a smaller thing to
analyse.

Ian.