From: Rainer Koenig
Subject: sched: How does the scheduler determine which CPU core gets the job?
Message-ID: <57078F83.20405@ts.fujitsu.com>
Date: Fri, 8 Apr 2016 13:01:23 +0200
X-Mailing-List: linux-kernel@vger.kernel.org

Short summary:
==============
I am investigating an issue where parallel tasks are spread differently
over the available CPU cores, depending on whether the machine was cold
booted from power off or warm booted by init 6.

On a cold boot the parallel processes are spread as expected, so that
with "N" cores and "N" tasks every core gets one task. The same test
after a warm boot shows the tasks spread differently, which results in
lousy performance.

More details:
=============
I have a workstation here with 2 physical CPUs, Intel(R) Xeon(R) CPU
E5-2680 v3 @ 2.50GHz, which adds up to 48 logical cores (including
hyperthreading).

The test sample is an example from the LIGGGHTS tutorial files. The test
is invoked like this:

    mpirun -np 48 liggghts < in.chutewear

Performance and CPU load are monitored with htop.

If I run the test after a cold boot, everything is as I expected it to
be. 48 parallel processes are started and distributed over 48 cores, and
I see that every CPU core is working at around 100% load.

Same hardware, same test; the only difference is that I rebooted in the
meantime. The behaviour is totally different. This time only a few CPU
cores get the processes, so many cores are just idling.

Question that comes to my mind:
===============================
What can cause such behaviour? OK, the simple answer would be "talk to
your machine vendor and ask them what they have done wrong during
initialization when the system is rebooted".
The bad news is that I'm working for that vendor, and we need an idea of
what to look for. After discussing this on the OpenMPI list I have now
decided to ask here for help.

What we tried out so far:
=========================
- Compared the dmesg output between cold and warm boot. Nothing special,
  just a few different numbers for computed performance and different
  timestamps.

- Compared the output of lstopo from hwloc, but nothing special here
  either.

- Wrote a script that takes a snapshot of all /proc/<pid>/status files
  for the liggghts jobs and compared the snapshots. Now it's clear that
  we still launch 48 processes, but they are distributed differently.

- Tried a newer kernel (the test is running on Ubuntu 14.04.4).
  Performance got a bit better, but the problem still exists.

- Took snapshots of /proc/sched_debug while the test is running after a
  cold or warm boot. The problem is that to interpret this output I
  would need the details of how the scheduler works. But that's why I'm
  asking here.

So, if anyone has an idea what to look for, please post it here and add
me to Cc:

TIA
Rainer
-- 
Dipl.-Inf. (FH) Rainer Koenig
Project Manager Linux Clients
FJ EMEIA PR PSO PM&D CCD ENG SW OSS&C

Fujitsu Technology Solutions
Bürgermeister-Ullrich-Str. 100
86199 Augsburg
Germany

Telephone: +49-821-804-3321
Telefax:   +49-821-804-2131
Mail:      mailto:Rainer.Koenig@ts.fujitsu.com

Internet         ts.fujtsu.com
Company Details  ts.fujitsu.com/imprint.html
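
For anyone who wants to repeat the /proc snapshot step from the "What we
tried out so far" list, a minimal Python sketch of such a script might
look like the following. It is an illustration only, not the script used
for the tests above. It assumes the processes show up under the name
"liggghts", and all function names (parse_status, parse_last_cpu,
snapshot, tasks_per_cpu) are made up for this example. It reads the
"Cpus_allowed_list" entry of /proc/<pid>/status and field 39
("processor", the CPU the task last ran on) of /proc/<pid>/stat.

```python
#!/usr/bin/env python3
"""Snapshot which CPU each process of a given name last ran on (Linux only)."""
import os
import sys
from collections import Counter


def parse_status(text):
    """Turn the 'Key:<tab>value' lines of /proc/<pid>/status into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def parse_last_cpu(stat_text):
    """Return field 39 ('processor') of /proc/<pid>/stat: the CPU last run on."""
    # The command name (field 2) is wrapped in parentheses and may itself
    # contain spaces or ')', so split only after the last ')'.
    rest = stat_text.rsplit(")", 1)[1].split()
    return int(rest[36])  # rest[0] is field 3, so field 39 is rest[36]


def snapshot(name, proc_root="/proc"):
    """Collect pid, last CPU and allowed CPUs for every process called `name`."""
    rows = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                status = parse_status(f.read())
            if status.get("Name") != name:
                continue
            with open(os.path.join(proc_root, entry, "stat")) as f:
                cpu = parse_last_cpu(f.read())
        except (OSError, IndexError, ValueError):
            continue  # process exited or entry unreadable while we were looking
        rows.append({"pid": int(entry), "cpu": cpu,
                     "allowed": status.get("Cpus_allowed_list", "?")})
    return rows


def tasks_per_cpu(rows):
    """Count how many of the sampled processes last ran on each CPU."""
    return Counter(row["cpu"] for row in rows)


if __name__ == "__main__":
    # Assumption: the MPI ranks appear under the process name "liggghts".
    name = sys.argv[1] if len(sys.argv) > 1 else "liggghts"
    rows = snapshot(name)
    for row in sorted(rows, key=lambda r: r["pid"]):
        print("pid %(pid)6d  last cpu %(cpu)3d  allowed %(allowed)s" % row)
    crowded = {cpu: n for cpu, n in tasks_per_cpu(rows).items() if n > 1}
    print("CPUs that last ran more than one task:", crowded or "none")
```

Running it once after a cold boot and once after a warm boot while the
test is active, then diffing the two outputs, should show whether the
allowed CPU list differs between the two cases or whether only the
actual placement does. Note that the "last cpu" value is a single
sample, so a few repeated runs give a more reliable picture.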