From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1752854AbcGCRYR (ORCPT ); Sun, 3 Jul 2016 13:24:17 -0400
Received: from mail-wm0-f67.google.com ([74.125.82.67]:35473 "EHLO
        mail-wm0-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1752818AbcGCRYQ (ORCPT );
        Sun, 3 Jul 2016 13:24:16 -0400
Date: Sun, 3 Jul 2016 17:24:11 +0000
From: Vladimir Panteleev
X-Priority: 3 (Normal)
Message-ID: <169481048.20160703172411@gmail.com>
To: Thomas Gleixner, Ingo Molnar, "H. Peter Anvin", x86@kernel.org
CC: linux-kernel@vger.kernel.org
Subject: PROBLEM: CPU accounting/scheduling regression in v4.6 CPU
        scheduling patchset?
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Hi,

Since updating my PC to Linux 4.6, I have noticed the following
problems:

1. CPU-bound tasks which use all CPU cores have a severe impact on
   responsiveness. For example, the following bash command (which
   simply starts one busyloop per core) is enough to make the machine
   almost completely unresponsive:

   for N in $(seq $(nproc)) ; do while true ; do : ; done & done

2. Nearly all tasks in the process listing are shown with 0% CPU usage,
   even when they're CPU-bound. The only exceptions are the kernel
   migration and kthreadd tasks, and occasionally the init process.

I have bisected the problem to commit
1cf4f629d9d246519a1e76c021806f2a51ddba4d ("cpu/hotplug: Move online
calls to hotplugged cpu"), which is part of Thomas Gleixner's CPU
hotplug refactoring patchset [1]. It introduces both problems described
above.

My system is a GIGABYTE X79S-UP5-WIFI motherboard (F5f BIOS) with an
i7-4960X CPU, running Arch Linux. I've reproduced the problems with both
the distro's kernel config [2] and a minimal config for my system.

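For anyone who wants to check both symptoms without relying on a process
listing, here is a minimal sketch. It starts one busy loop per core, as
in the command above, and then reads the CPU time the kernel has
accounted to the first loop from /proc/PID/stat. The helper name
cpu_ticks and the 2-second sampling interval are my own choices for
illustration, not something from the report.

```shell
#!/bin/sh
# Print utime + stime (in clock ticks) for a PID, read from /proc/PID/stat.
# The comm field may contain spaces, so everything up to the last ") " is
# stripped first; what remains starts at field 3 (state), which puts
# utime (field 14) and stime (field 15) at positions 12 and 13.
cpu_ticks() {
    read -r stat < "/proc/$1/stat" || return 1
    set -- ${stat##*) }
    echo $(( ${12} + ${13} ))
}

# Start one busy loop per core and remember the PIDs.
pids=""
for _n in $(seq "$(nproc)"); do
    while true; do :; done &
    pids="$pids $!"
done
# Stop the busy loops when the script exits.
trap 'kill $pids 2>/dev/null' EXIT

# Sample the accounted CPU time of the first loop over 2 seconds.
first=${pids# }
first=${first%% *}
before=$(cpu_ticks "$first")
sleep 2
after=$(cpu_ticks "$first")
echo "ticks accounted to PID $first in 2s: $(( after - before ))"
```

On a kernel that accounts CPU time normally, the difference should be
well above zero (the tick rate is reported by "getconf CLK_TCK"). A
value at or near zero for a loop that is clearly running would match
symptom 2.
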
I can reproduce the problems on the latest rc at the moment, v4.7-rc5.

Comparing dmesg output before and after 1cf4f629, I see no notable
differences.

I noticed an existing thread "S3 resume regression" [3] referencing this
commit, however it describes a different problem. I also found a
Bugzilla issue for the zero CPU usage problem [4], however it has no
replies.

  [1]: https://lkml.org/lkml/2016/2/26/806
  [2]: https://aur.archlinux.org/cgit/aur.git/tree/config.x86_64?h=linux-git
  [3]: https://lkml.org/lkml/2016/5/11/238
  [4]: https://bugzilla.kernel.org/show_bug.cgi?id=120151

Stuff REPORTING-BUGS told me to include:

ver_linux output:
  https://dump.v.panteleev.md/616390d43a4c6a3d085acc5eaa390c82/16%3A58%3A08-stdin.txt
/proc/cpuinfo:
  https://dump.v.panteleev.md/5dfeba5d7c64028de51d50559b566088/16%3A58%3A49-stdin.txt
/proc/modules:
  https://dump.v.panteleev.md/868c0f2b23651be8164975fa5d7e7aab/16%3A59%3A18-stdin.txt
/proc/ioports:
  https://dump.v.panteleev.md/5e44aa12cc403dbd783b0273bd3edab4/17%3A01%3A33-stdin.txt
/proc/iomem:
  https://dump.v.panteleev.md/110a8fdd0f647fd8d729c54f4f01a3d0/17%3A01%3A49-stdin.txt
"lspci -vvv" output:
  https://dump.v.panteleev.md/0c2448fa8a872e34c4555d876b656013/17%3A02%3A18-stdin.txt
/proc/scsi/scsi:
  https://dump.v.panteleev.md/6efa007ce74f0bf4ce10ae56690c63de/17%3A02%3A54-stdin.txt
dmesg output:
  https://dump.v.panteleev.md/b8a3ba608a914a3d70667dad697dddfb/1467563818.log
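
In case someone wants to compare the same information from another
affected machine, the items listed above can be gathered with a rough
sketch like the one below. The temporary output directory and the file
names are arbitrary choices of mine. ver_linux is left out because it
lives in the kernel source tree, and the location of that tree is not
something the script can assume.

```shell
#!/bin/sh
# Collect the /proc files, lspci output, and dmesg into one directory.
out=$(mktemp -d) || exit 1

# Copy each /proc file that exists and is readable on this machine.
for f in cpuinfo modules ioports iomem scsi/scsi; do
    if [ -r "/proc/$f" ]; then
        cat "/proc/$f" > "$out/$(echo "$f" | tr / _).txt"
    fi
done

# lspci may not be installed; skip it in that case.
if command -v lspci >/dev/null 2>&1; then
    lspci -vvv > "$out/lspci.txt" 2>&1
fi

# Reading the kernel log may require root on some systems.
dmesg > "$out/dmesg.txt" 2>/dev/null || echo "dmesg not readable as this user" >&2

echo "collected into $out"
```

Run as an unprivileged user, /proc/ioports and /proc/iomem typically show
zeroed addresses, so running it as root gives more useful output.
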