From mboxrd@z Thu Jan  1 00:00:00 1970
From: Juergen Gross <juergen.gross@ts.fujitsu.com>
Subject: Re: Dom0 losing interrupts???
Date: Mon, 14 Feb 2011 12:46:56 +0100
Message-ID: <4D591630.1090302@ts.fujitsu.com>
References: <4D58D2D7.9010803@ts.fujitsu.com>	<4D59034A0200007800031B7A@vpn.id2.novell.com>	<4D58F820.80401@ts.fujitsu.com>	<4D590AE70200007800031BC1@vpn.id2.novell.com>
	<AANLkTimT2td-zPxzXuJ7psi_hTRj7Ryv8-4hfa3ieDR4@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Return-path: <xen-devel-bounces@lists.xensource.com>
In-Reply-To: <AANLkTimT2td-zPxzXuJ7psi_hTRj7Ryv8-4hfa3ieDR4@mail.gmail.com>
List-Unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xensource.com>
List-Help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-Subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: "xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>, Jan Beulich <JBeulich@novell.com>
List-Id: xen-devel@lists.xenproject.org

On 02/14/11 12:21, George Dunlap wrote:
> My sense is that:
> * Pinning N vcpus to N-M pcpus (where M is a significant fraction of
> N) is just a really bad idea; it would be better just not to do that.

I just wanted to make sure the interrupts are not lost due to the cpupool
operation itself.
So I tried with an extreme configuration and was proved right :-)

> It would be ideal if somehow when dom0's cpu pool shrinks, it
> automatically offlines an appropriate number of vcpus; but it
> shouldn't be difficult for an administrator to do that themselves.

I've sent a patch for the cpupool-numa-split case, which always removes a
significant number of physical cpus from dom0's pool.

> * On average, a vcpu shouldn't have to wait more than 60ms or so for
> an interrupt.  It seems like there's a non-negligible possibility that
> there's some kind of bug in the interrupt delivery and handling,
> either on the Xen side or the Linux side (or as Jan pointed out, a bug
> in the driver).  In that case, doing something in the scheduler isn't
> actually fixing the problem, it's just making it less likely to
> happen.  (NB that we've had intermittent failures in the xen.org
> testing infrastructure with what looks like might be missed interrupts
> as well -- and those weren't on heavily loaded boxes.)

Any idea what I could do to help? Our larger test machines are not just
idling, but I could use one from time to time without much trouble.
It's rather easy for me to reproduce the problem; OTOH it should be easy for
others with a reasonably large machine, too.

> * Even if it is ultimately a scheduler bug, understanding exactly what
> the scheduler is doing and why is key to making a proper fix.  It's
> possible that there's just a simple quirk in the algorithm, such that
> a general fix will make everything work better without needing to
> introduce a special case for hardware interrupts.
> * I'm not opposed in principle to a mechanism which will prioritize
> vcpus awaiting hardware interrupts.  But I am wary of guessing what
> the problem is and then introducing a patch without proper root-cause
> analysis.  Even if it seems to fix the immediate problem, it may
> simply be masking the real problem, and may also cause problems of its
> own.  Behavior of the scheduler is hard enough to understand already,
> and every special case makes it even harder.

I absolutely agree!


Juergen

-- 
Juergen Gross                 Principal Developer Operating Systems
TSP ES&S SWE OS6                       Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions              e-mail: juergen.gross@ts.fujitsu.com
Domagkstr. 28                           Internet: ts.fujitsu.com
D-80807 Muenchen                 Company details: ts.fujitsu.com/imprint.html