From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754755AbbCFPCX (ORCPT ); Fri, 6 Mar 2015 10:02:23 -0500
Received: from userp1040.oracle.com ([156.151.31.81]:41114 "EHLO
	userp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1754212AbbCFPCW (ORCPT );
	Fri, 6 Mar 2015 10:02:22 -0500
Message-ID: <54F9C155.3050309@oracle.com>
Date: Fri, 06 Mar 2015 08:01:41 -0700
From: David Ahern
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:31.0)
	Gecko/20100101 Thunderbird/31.5.0
MIME-Version: 1.0
To: Mike Galbraith
CC: Peter Zijlstra , Ingo Molnar , LKML
Subject: Re: NMI watchdog triggering during load_balance
References: <54F92788.6010007@oracle.com> <1425617559.16821.36.camel@gmx.de>
In-Reply-To: <1425617559.16821.36.camel@gmx.de>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Source-IP: aserv0021.oracle.com [141.146.126.233]
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 3/5/15 9:52 PM, Mike Galbraith wrote:
>> CPU970 attaching sched-domain:
>> domain 0: span 968-975 level SIBLING
>> groups: 8 single CPU groups
>> domain 1: span 968-975 level MC
>> groups: 1 group with 8 cpus
>> domain 2: span 768-1023 level CPU
>> groups: 4 groups with 256 cpus per group
>
> Wow, that topology is horrid. I'm not surprised that your box is
> writhing in agony. Can you twiddle that?
>

Twiddle that how? The system has 4 physical CPUs (sockets). Each CPU
has 32 cores with 8 threads per core, and each CPU has 4 memory
controllers.

If I disable SCHED_MC and CGROUP_SCHED (group scheduling) there is a
noticeable improvement -- the watchdog does not trigger and I no longer
see the rq locks held for 2-3 seconds. But CPU usage is still fairly
high for an idle system.

Perhaps I should leave SCHED_MC on and disable SCHED_SMT instead; I'll
try that today.

Thanks,
David
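
The socket/core/thread counts given above can be cross-checked against the quoted sched-domain spans with a little arithmetic. This is only a minimal sketch: it assumes CPUs are numbered contiguously (the threads of a core adjacent, the cores of a socket adjacent), which is what the quoted spans suggest but is not stated in the message. The constant and function names are illustrative and do not come from the kernel.

```python
# Topology as described in the message: 4 sockets, 32 cores per socket,
# 8 hardware threads per core.
SOCKETS = 4
CORES_PER_SOCKET = 32
THREADS_PER_CORE = 8


def span(cpu, width):
    """Return the inclusive (first, last) range of the width-sized block holding cpu.

    Assumes contiguous CPU numbering, so blocks start at multiples of width.
    """
    first = (cpu // width) * width
    return first, first + width - 1


def domain_spans(cpu):
    """Compute the core-level and socket-level spans for one logical CPU."""
    cpus_per_socket = CORES_PER_SOCKET * THREADS_PER_CORE  # 32 * 8 = 256
    return {
        "core": span(cpu, THREADS_PER_CORE),    # threads sharing one core
        "socket": span(cpu, cpus_per_socket),   # all threads on one socket
        "total_cpus": SOCKETS * cpus_per_socket,
    }


if __name__ == "__main__":
    print(domain_spans(970))
```

Under that numbering assumption, CPU 970 lands in the core span 968-975 and the socket span 768-1023, matching domains 0/1 and domain 2 in the quoted output, and the box has 1024 logical CPUs in total.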