From mboxrd@z Thu Jan 1 00:00:00 1970
From: David Vrabel
Subject: Re: POD: soft lockups in dom0 kernel
Date: Fri, 6 Dec 2013 11:07:23 +0000
Message-ID: <52A1AFEB.3050308@citrix.com>
References: <1538524.5AKIkpF9LB@amur> <52A1AE3E020000780010AC8E@nat28.tlf.novell.com>
In-Reply-To: <52A1AE3E020000780010AC8E@nat28.tlf.novell.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Received: from mail6.bemta14.messagelabs.com ([193.109.254.103])
	by lists.xen.org with esmtp (Exim 4.72)
	id 1VotFo-0000Vh-2J
	for xen-devel@lists.xenproject.org; Fri, 06 Dec 2013 11:07:28 +0000
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Jan Beulich
Cc: xen-devel@lists.xenproject.org, Boris Ostrovsky, Dietmar Hahn
List-Id: xen-devel@lists.xenproject.org

On 06/12/13 10:00, Jan Beulich wrote:
>>>> On 05.12.13 at 14:55, Dietmar Hahn wrote:
>> When creating a bigger (> 50 GB) HVM guest with maxmem > memory we get
>> soft lockups from time to time.
>>
>> kernel: [ 802.084335] BUG: soft lockup - CPU#1 stuck for 22s! [xend:31351]
>>
>> I tracked this down to the call of xc_domain_set_pod_target() and
>> further p2m_pod_set_mem_target().
>>
>> Unfortunately I can check this only with xen-4.2.2 as I don't have a
>> machine with enough memory for current hypervisors.  But it seems the
>> code is nearly the same.
>>
>> My suggestion would be to do the 'pod set target' in the function
>> xc_domain_set_pod_target() in chunks of maybe 1GB to give the dom0
>> scheduler a chance to run.  As this is not performance critical it
>> should not be a problem.
>
> This is a broader problem: there are more long-running hypercalls
> than just the one setting the POD target.  While a kernel built with
> CONFIG_PREEMPT ought to have no issue with this (as the hypervisor-
> internal preemption will always exit back to the guest, thus allowing
> interrupts to be processed) as long as such hypercalls aren't invoked
> with preemption disabled, non-preemptable kernels (the suggested
> default for servers) have - afaict - no way to deal with this.
>
> However, as long as interrupts and softirqs can get serviced by the
> kernel (which they can as long as they weren't disabled upon
> invocation of the hypercall), that may also be a mostly cosmetic
> problem (in that the soft lockup is being reported) as long as no
> real-time-like guarantees are required (which, if they were, would be
> sort of contradictory to the kernel being non-preemptable); i.e.
> other tasks may get starved for some time, but OS health shouldn't
> be impacted.
>
> Hence I wonder whether it wouldn't make sense to simply suppress the
> soft lockup detection at least across privcmd-invoked hypercalls -
> Cc-ing upstream Linux maintainers to see if they have an opinion or
> thoughts towards a proper solution.

We do not want to disable the soft lockup detection here as it has
found a bug.  We can't have tasks that are unschedulable for minutes;
it would only take a handful of such tasks to hose the system.

We should put an explicit preemption point in.  This will fix it for
the CONFIG_PREEMPT_VOLUNTARY case, which I think is the most common
configuration.

Or perhaps this should even be a cond_resched() call, to fix it for
fully non-preemptible kernels as well.

David
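
For reference, the chunked variant Dietmar suggests could look roughly
like the sketch below at the libxc level.  This is untested and purely
illustrative: it assumes the xen-4.2-era signature of
xc_domain_set_pod_target(), a 1 GiB step size, and a target that is
only being raised; the helper name set_pod_target_chunked() is made up
for the example.

/*
 * Illustrative only: raise the PoD target in 1 GiB steps so that each
 * XENMEM_set_pod_target hypercall (and hence each privcmd ioctl) stays
 * short.  The calling task returns to user space between steps, where
 * the scheduler can run even on a non-preemptible dom0 kernel.
 */
#include <xenctrl.h>

/* 1 GiB expressed in guest pages (XC_PAGE_SHIFT is 12 in xenctrl.h). */
#define POD_CHUNK_PAGES ((uint64_t)1 << (30 - XC_PAGE_SHIFT))

static int set_pod_target_chunked(xc_interface *xch, uint32_t domid,
                                  uint64_t cur_pages, uint64_t target_pages)
{
    uint64_t t = cur_pages;
    int rc = 0;

    while (t < target_pages && rc == 0) {
        /* Step towards the final target, at most 1 GiB at a time. */
        if (target_pages - t > POD_CHUNK_PAGES)
            t += POD_CHUNK_PAGES;
        else
            t = target_pages;

        rc = xc_domain_set_pod_target(xch, domid, t, NULL, NULL, NULL);
    }

    return rc;
}

Each step then completes within a single privcmd ioctl, so the soft
lockup watchdog never sees the task stuck for longer than one chunk
takes.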