All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jan Stancek <jstancek@redhat.com>
To: alex.shi@intel.com, guz.fnst@cn.fujitsu.com,
	peterz@infradead.org, mingo@redhat.com, jolsa@redhat.com,
	riel@redhat.com, linux-kernel@vger.kernel.org
Subject: [BUG] scheduler doesn't balance thread to idle cpu for 3 seconds
Date: Wed, 27 Jan 2016 15:52:04 +0100	[thread overview]
Message-ID: <56A8D994.6050205@redhat.com> (raw)

[-- Attachment #1: Type: text/plain, Size: 1490 bytes --]

Hello,

pthread_cond_wait_1/2 [1] is rarely failing for me on 4.5.0-rc1,
on x86_64 KVM guest with 2 CPUs.

This test [1]:
- spawns 2 SCHED_RR threads
- first thread with higher priority sets alarm for 2 seconds and blocks on condition
- second thread with lower priority is busy looping for 5 seconds
- after 2 seconds alarm signal arrives and handler signals condition
- high priority thread should resume running

But rarely I see that high priority thread doesn't resume running until
low priority thread completes its 5 second busy loop.

Looking at traces (short version attached, long version at [2]),
I see that after 2 seconds scheduler tries to wake up main thread, but it
appears to do that on same CPU where SCHED_RR low prio thread is running,
so nothing happens. Then scheduler makes numerous balance attempts,
but main thread is not balanced to idle CPU.

My guess is this started with following commit, which changed weighted_cpuload():
  commit b92486cbf2aa230d00f160664858495c81d2b37b
  Author: Alex Shi <alex.shi@intel.com>
  Date:   Thu Jun 20 10:18:50 2013 +0800
    sched: Compute runnable load avg in cpu_load and cpu_avg_load_per_task

I could reproduce it with HEAD set at above commit, I couldn't reproduce it
with 3.10 kernel so far.

Regards,
Jan

[1] https://github.com/linux-test-project/ltp/blob/master/testcases/open_posix_testsuite/functional/threads/condvar/pthread_cond_wait_1.c
[2] http://jan.stancek.eu/tmp/pthread_cond_wait_failure/sched-trace1.tar.bz2

[-- Attachment #2: sched-trace1-short-v4.5-rc1 --]
[-- Type: text/plain, Size: 5410 bytes --]

# test starts running
            bash-5626  [001] ....   294.609021: sched_process_fork: comm=bash pid=5626 child_comm=bash child_pid=10005
            bash-5626  [001] d...   294.609023: sched_migrate_task: comm=bash pid=10005 prio=120 orig_cpu=1 dest_cpu=0
            bash-5626  [001] d...   294.609029: sched_wakeup_new: comm=bash pid=10005 prio=120 target_cpu=000
          <idle>-0     [000] d...   294.609073: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=10005 next_prio=120
 condvar_pthread-10005 [000] d.h.   294.609355: sched_stat_runtime: comm=bash pid=10005 runtime=330015 [ns] vruntime=937586860 [ns]
 condvar_pthread-10005 [000] d...   294.610222: sched_stat_runtime: comm=condvar_pthread pid=10005 runtime=866662 [ns] vruntime=938453522 [ns]
 condvar_pthread-10005 [000] d...   294.610224: sched_migrate_task: comm=condvar_pthread pid=10005 prio=120 orig_cpu=0 dest_cpu=0
 condvar_pthread-10005 [000] ....   294.610231: sched_process_fork: comm=condvar_pthread pid=10005 child_comm=condvar_pthread child_pid=10006
# high prio thread (10006) is started
 condvar_pthread-10005 [000] d...   294.610232: sched_migrate_task: comm=condvar_pthread pid=10006 prio=120 orig_cpu=0 dest_cpu=0
 condvar_pthread-10005 [000] d...   294.610233: sched_stat_runtime: comm=condvar_pthread pid=10005 runtime=11929 [ns] vruntime=938465451 [ns]
 condvar_pthread-10005 [000] d...   294.610238: sched_wakeup_new: comm=condvar_pthread pid=10006 prio=120 target_cpu=000
 condvar_pthread-10005 [000] d...   294.610267: sched_stat_runtime: comm=condvar_pthread pid=10005 runtime=34190 [ns] vruntime=938499641 [ns]
 condvar_pthread-10005 [000] d...   294.610268: sched_migrate_task: comm=condvar_pthread pid=10005 prio=120 orig_cpu=0 dest_cpu=0
 condvar_pthread-10005 [000] ....   294.610270: sched_process_fork: comm=condvar_pthread pid=10005 child_comm=condvar_pthread child_pid=10007
# low prio thread (10007) is started
 condvar_pthread-10005 [000] d...   294.610271: sched_migrate_task: comm=condvar_pthread pid=10007 prio=120 orig_cpu=0 dest_cpu=1
 condvar_pthread-10005 [000] d...   294.610277: sched_wakeup_new: comm=condvar_pthread pid=10007 prio=120 target_cpu=001
 condvar_pthread-10005 [000] d...   294.610298: sched_stat_runtime: comm=condvar_pthread pid=10005 runtime=31353 [ns] vruntime=938530994 [ns]
 condvar_pthread-10005 [000] d...   294.610302: sched_switch: prev_comm=condvar_pthread prev_pid=10005 prev_prio=120 prev_state=S ==> next_comm=condvar_pthread next_pid=10006 next_prio=120
 condvar_pthread-10006 [000] d...   294.610311: sched_stat_runtime: comm=condvar_pthread pid=10006 runtime=12355 [ns] vruntime=944465872 [ns]
          <idle>-0     [001] d...   294.610318: sched_switch: prev_comm=swapper/1 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=condvar_pthread next_pid=10007 next_prio=120
 condvar_pthread-10007 [001] d...   294.610332: sched_stat_runtime: comm=condvar_pthread pid=10007 runtime=59910 [ns] vruntime=581696119 [ns]
 condvar_pthread-10006 [000] d...   294.610340: sched_switch: prev_comm=condvar_pthread prev_pid=10006 prev_prio=89 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120
 condvar_pthread-10007 [001] d...   296.571389: sched_switch: prev_comm=condvar_pthread prev_pid=10007 prev_prio=94 prev_state=R ==> next_comm=watchdog/1 next_pid=13 next_prio=0
      watchdog/1-13    [001] d...   296.571393: sched_migrate_task: comm=condvar_pthread pid=10007 prio=94 orig_cpu=1 dest_cpu=0
          <idle>-0     [000] d...   296.571413: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=condvar_pthread next_pid=10007 next_prio=94
# after 2 seconds, alarm signal tries to wake up main thread on same cpu where low prio thread is running
 condvar_pthread-10007 [000] d.h.   296.610352: sched_wakeup: comm=condvar_pthread pid=10005 prio=120 target_cpu=000
 condvar_pthread-10007 [000] d...   299.610446: sched_switch: prev_comm=condvar_pthread prev_pid=10007 prev_prio=94 prev_state=x ==> next_comm=tuned next_pid=1183 next_prio=120
    xfsaild/dm-0-424   [000] d...   299.610582: sched_switch: prev_comm=xfsaild/dm-0 prev_pid=424 prev_prio=120 prev_state=S ==> next_comm=condvar_pthread next_pid=10005 next_prio=120
# main thread runs only after low prio thread is done
 condvar_pthread-10005 [000] dN..   299.610596: sched_wakeup: comm=condvar_pthread pid=10006 prio=89 target_cpu=000
 condvar_pthread-10005 [000] dN..   299.610597: sched_stat_runtime: comm=condvar_pthread pid=10005 runtime=13944 [ns] vruntime=938544938 [ns]
 condvar_pthread-10005 [000] d...   299.610598: sched_switch: prev_comm=condvar_pthread prev_pid=10005 prev_prio=120 prev_state=R ==> next_comm=condvar_pthread next_pid=10006 next_prio=89
 condvar_pthread-10006 [000] d...   299.610614: sched_switch: prev_comm=condvar_pthread prev_pid=10006 prev_prio=89 prev_state=x ==> next_comm=condvar_pthread next_pid=10005 next_prio=120
 condvar_pthread-10005 [000] d.h.   299.611591: sched_stat_runtime: comm=condvar_pthread pid=10005 runtime=976745 [ns] vruntime=939521683 [ns]
 condvar_pthread-10005 [000] d...   299.612029: sched_stat_runtime: comm=condvar_pthread pid=10005 runtime=438310 [ns] vruntime=939959993 [ns]
 condvar_pthread-10005 [000] d...   299.612031: sched_switch: prev_comm=condvar_pthread prev_pid=10005 prev_prio=120 prev_state=x ==> next_comm=rcuos/0 next_pid=9 next_prio=120

             reply	other threads:[~2016-01-27 14:52 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-01-27 14:52 Jan Stancek [this message]
2016-01-28 15:55 ` [BUG] scheduler doesn't balance thread to idle cpu for 3 seconds Jan Stancek
2016-01-28 17:49   ` Peter Zijlstra
2016-01-28 18:43     ` Jan Stancek
2016-01-29 10:15       ` Peter Zijlstra
2016-01-29 10:33         ` Jan Stancek
2016-02-08 13:40           ` Jan Stancek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=56A8D994.6050205@redhat.com \
    --to=jstancek@redhat.com \
    --cc=alex.shi@intel.com \
    --cc=guz.fnst@cn.fujitsu.com \
    --cc=jolsa@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=riel@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.