[BUG 2.6.25-rc3] scheduler/hotplug: some processes are dealocked when cpu is set to offline

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Yi Yang <yi.y.yang@intel.com>
To: Ingo Molnar <mingo@elte.hu>
Cc: akpm@linux-foundation.org, linux-kernel@vger.kernel.org
Subject: [BUG 2.6.25-rc3] scheduler/hotplug: some processes are dealocked when cpu is set to offline
Date: Mon, 03 Mar 2008 02:42:09 +0800	[thread overview]
Message-ID: <1204483329.3607.8.camel@yangyi-dev.bj.intel.com> (raw)

When i run cpu hotplug stress test on a system with 16 logical cpus,
some processes are deadlocked, it can be regenerated on 2.6.25-rc2
and 2.6.25-rc3.

The kernel thread [watchdog/1] isn't reaped by its parent when cpu #1
is set to offline, so that [kstopmachine] sleeps for ever.

After i investigated, debugged and analyzed it, i think it is a scheduler
specific issue, i noticed someone submitted a patch which mentioned
kthread_should_stop calling rule:

"In order to safely use kthread_stop() for kthread, there is a requirement
on how its main loop has to be orginized. Namely, the sequence of
events that lead to kthread being blocked (schedule()) has to be
ordered as follows:

    set_current_state(TASK_INTERRUPTIBLE);
    if (kthread_should_stop()) break;
    schedule() or similar.

set_current_state() implies a full memory barrier. kthread_stop()
has a matching barrier right after an update of kthread_stop_info.k
and before kthread's wakeup."

I don't think it is a good programming paradigm because it will result
in some issues very easily, why can't we provide a simple wrapper for it?

This issue seems such one, but i tried to change it to follow this rule but
the issue is still there.

Why isn't the kernel thread [watchdog/1] reaped by its parent? its state
is TASK_RUNNING with high priority (R< means this), why it isn't done?

Anyone ever met such a problem? Your thought?


Here is my stress test script, you may try it on SMP platforms (the more
number of cpu is ,  the easier it can be regenerated.).

########################################################################

####### stresscpus.sh #######

#!/bin/sh

for i in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
do
        ./bugexposure.sh $i &
done


######## bugexposure.sh ###########

#!/bin/bash

main(){

    typeset -i CPU=$1
    ./task.sh > /dev/null&
    PID=$!

    if [ `cat /sys/devices/system/cpu/cpu${CPU}/online` = "0" ]; then
        echo "1" > /sys/devices/system/cpu/cpu${CPU}/online
    fi

    MASK=$((1<<${CPU}))

    `taskset -p ${MASK} ${PID} > /dev/null 2>&1`

    echo "0" > /sys/devices/system/cpu/cpu${CPU}/online

    echo "1" > /sys/devices/system/cpu/cpu${CPU}/online

    disown $PID
    kill -9 $PID > /dev/null 2>&1

    #echo "PASS\n"

}

typeset -i TEST_CPU=$1
while true
do
        main $TEST_CPU
done


###### task.sh ######
#!/bin/bash

while :
do
  NOOP=1
done

########################################################################

My tracing and analysis info is below:

# ps aux | grep watchdog
root         5  0.0  0.0      0     0 ?        S<   22:26   0:00 [watchdog/0]
root         8  0.0  0.0      0     0 ?        R<   22:26   0:00 [watchdog/1]
root        11  0.0  0.0      0     0 ?        S<   22:26   0:00 [watchdog/2]
root        14  0.0  0.0      0     0 ?        S<   22:26   0:00 [watchdog/3]
root        17  0.0  0.0      0     0 ?        S<   22:26   0:00 [watchdog/4]
root        20  0.0  0.0      0     0 ?        S<   22:26   0:00 [watchdog/5]
root        23  0.0  0.0      0     0 ?        S<   22:26   0:00 [watchdog/6]
root        26  0.0  0.0      0     0 ?        S<   22:26   0:00 [watchdog/7]
root        29  0.0  0.0      0     0 ?        S<   22:26   0:00 [watchdog/8]
root        32  0.0  0.0      0     0 ?        S<   22:26   0:00 [watchdog/9]
root        35  0.0  0.0      0     0 ?        S<   22:26   0:00 [watchdog/10]
root        38  0.0  0.0      0     0 ?        S<   22:26   0:00 [watchdog/11]
root        41  0.0  0.0      0     0 ?        S<   22:26   0:00 [watchdog/12]
root        44  0.0  0.0      0     0 ?        S<   22:26   0:00 [watchdog/13]
root        50  0.0  0.0      0     0 ?        S<   22:26   0:00 [watchdog/15]
root      5641  0.0  0.0  61144   708 pts/0    R+   22:58   0:00 grep watchdog
#

# ps aux | grep ksoftirqd
root         4  0.0  0.0      0     0 ?        S<   22:26   0:00 [ksoftirqd/0]
root        10  0.0  0.0      0     0 ?        S<   22:26   0:00 [ksoftirqd/2]
root        13  0.0  0.0      0     0 ?        S<   22:26   0:00 [ksoftirqd/3]
root        16  0.0  0.0      0     0 ?        S<   22:26   0:00 [ksoftirqd/4]
root        19  0.0  0.0      0     0 ?        S<   22:26   0:00 [ksoftirqd/5]
root        22  0.0  0.0      0     0 ?        S<   22:26   0:00 [ksoftirqd/6]
root        25  0.0  0.0      0     0 ?        S<   22:26   0:00 [ksoftirqd/7]
root        28  0.0  0.0      0     0 ?        S<   22:26   0:00 [ksoftirqd/8]
root        31  0.0  0.0      0     0 ?        S<   22:26   0:00 [ksoftirqd/9]
root        34  0.0  0.0      0     0 ?        S<   22:26   0:00 [ksoftirqd/10]
root        37  0.0  0.0      0     0 ?        S<   22:26   0:00 [ksoftirqd/11]
root        40  0.0  0.0      0     0 ?        S<   22:26   0:00 [ksoftirqd/12]
root        43  0.0  0.0      0     0 ?        S<   22:26   0:00 [ksoftirqd/13]
root        49  0.0  0.0      0     0 ?        S<   22:26   0:00 [ksoftirqd/15]
root      5647  0.0  0.0  61144   712 pts/0    R+   23:00   0:00 grep ksoftirqd
#

#ps aux
...
root      5554  0.0  0.0      0     0 ?        S<   22:42   0:00 [kstopmachine]
root      5556  0.0  0.0      0     0 ?        Z<   22:42   0:00 [kstopmachine] <defunct>
root      5557  0.0  0.0      0     0 ?        Z<   22:42   0:00 [kstopmachine] <defunct>
root      5558  0.0  0.0      0     0 ?        Z<   22:42   0:00 [kstopmachine] <defunct>
root      5559  0.0  0.0      0     0 ?        Z<   22:42   0:00 [kstopmachine] <defunct>
root      5560  0.0  0.0      0     0 ?        Z<   22:42   0:00 [kstopmachine] <defunct>
root      5561  0.0  0.0      0     0 ?        Z<   22:42   0:00 [kstopmachine] <defunct>
root      5562  0.0  0.0      0     0 ?        Z<   22:42   0:00 [kstopmachine] <defunct>
root      5563  0.0  0.0      0     0 ?        Z<   22:42   0:00 [kstopmachine] <defunct>
root      5564  0.0  0.0      0     0 ?        Z<   22:42   0:00 [kstopmachine] <defunct>
root      5565  0.0  0.0      0     0 ?        Z<   22:42   0:00 [kstopmachine] <defunct>
root      5566  0.0  0.0      0     0 ?        Z<   22:42   0:00 [kstopmachine] <defunct>
root      5567  0.0  0.0      0     0 ?        Z<   22:42   0:00 [kstopmachine] <defunct>
root      5568  0.0  0.0      0     0 ?        Z<   22:42   0:00 [kstopmachine] <defunct>
root      5569  0.0  0.0      0     0 ?        Z<   22:42   0:00 [kstopmachine] <defunct>
...


[watchdog/1] isn't terminated so that [kstopmachine] will waiting for it
for ever, so that "echo 0 > /sys/devices/syste/cpu/cpu1/online" will hang
up, so that hotplug spinlock will be holden by it, so that other spinlock
contenter will wait for locking it for ever, so that...

These are some softlock detection output:

process 5471 (task.sh) no longer affine to cpu1
INFO: task group_balance:51 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
group_balance D ffff810001011580     0    51      2
 ffff81027e97fdc0 0000000000000046 ffff81027e97fdd0 ffffffff80472b15
 0000000000000000 ffff81027e97d280 ffff81027c8bd300 ffff81027e97d5d0
 0000000080569da0 ffff81027e97d5d0 000000007e97d280 000000010009d366
Call Trace:
 [<ffffffff80472b15>] thread_return+0x3d/0xa2
 [<ffffffff8023cce5>] lock_timer_base+0x26/0x4b
 [<ffffffff8047338b>] __mutex_lock_slowpath+0x5c/0x93
 [<ffffffff80473221>] mutex_lock+0x1a/0x1e
 [<ffffffff8024f8fd>] get_online_cpus+0x27/0x3e
 [<ffffffff8022fd04>] load_balance_monitor+0x60/0x375
 [<ffffffff8022fca4>] load_balance_monitor+0x0/0x375
 [<ffffffff80245ff6>] kthread+0x47/0x75
 [<ffffffff802301f0>] schedule_tail+0x28/0x5c
 [<ffffffff8020cc78>] child_rip+0xa/0x12
 [<ffffffff80245faf>] kthread+0x0/0x75
 [<ffffffff8020cc6e>] child_rip+0x0/0x12

INFO: task bugexposure.sh:5462 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
bugexposure.s D ffff8100010e1580     0  5462      1
 ffff81026a981c58 0000000000000082 ffff81026a981c68 ffffffff80472b15
 00000000000004d6 ffff81027c83c920 ffff81027e8bd680 ffff81027c83cc70
 0000000d7e8bc4c0 ffff81027c83cc70 0000000d00000002 000000010009d809
Call Trace:
 [<ffffffff80472b15>] thread_return+0x3d/0xa2
 [<ffffffff80314b25>] __next_cpu+0x19/0x28
 [<ffffffff8022a5d4>] enqueue_rt_entity+0x4e/0xb1
 [<ffffffff80472f34>] schedule_timeout+0x1e/0xad
 [<ffffffff802280bf>] enqueue_task+0x4d/0x58
 [<ffffffff80472ccf>] wait_for_common+0xd5/0x118
 [<ffffffff8022c31b>] default_wake_function+0x0/0xe
 [<ffffffff80245c93>] kthread_stop+0x58/0x79
 [<ffffffff8046fdcf>] cpu_callback+0x189/0x197
 [<ffffffff8046f59e>] cpu_callback+0x1cf/0x274
 [<ffffffff8026c07e>] writeback_set_ratelimit+0x19/0x68
 [<ffffffff8047658b>] notifier_call_chain+0x29/0x4c
 [<ffffffff8024f74d>] _cpu_down+0x1d6/0x2aa
 [<ffffffff8024f845>] cpu_down+0x24/0x31
 [<ffffffff8038fcf4>] store_online+0x29/0x67
 [<ffffffff802d23e1>] sysfs_write_file+0xd2/0x110
 [<ffffffff8028db49>] vfs_write+0xad/0x136
 [<ffffffff8028e086>] sys_write+0x45/0x6e
 [<ffffffff8020bfd9>] tracesys+0xdc/0xe1

next             reply	other threads:[~2008-03-03  6:24 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-03-02 18:42 Yi Yang [this message]
2008-03-03 11:54 ` [BUG 2.6.25-rc3] scheduler/hotplug: some processes are dealocked when cpu is set to offline Dmitry Adamushko
2008-03-03 11:56   ` Ingo Molnar
2008-03-03 12:02     ` Dmitry Adamushko
2008-03-03 14:53       ` Yi Yang
2008-03-03 17:37         ` Yi Yang
2008-03-03 15:31 ` Gautham R Shenoy
2008-03-03 14:45   ` Yi Yang
2008-03-04  5:26     ` Gautham R Shenoy
2008-03-04  9:09       ` Gautham R Shenoy
2008-03-03 21:56         ` Yi Yang
2008-03-04 15:01       ` Oleg Nesterov
2008-03-04 14:37         ` Yi Yang
2008-03-06 20:05           ` Yi Yang
2008-03-05 10:05         ` Gautham R Shenoy
2008-03-05 13:53           ` Oleg Nesterov
2008-03-06 11:15             ` Gautham R Shenoy
2008-03-06 12:22               ` Gautham R Shenoy
2008-03-06 13:44         ` Gautham R Shenoy
2008-03-07  2:54           ` Oleg Nesterov
2008-03-07  9:10             ` Gautham R Shenoy
2008-03-07 10:51               ` Gautham R Shenoy
2008-03-06 23:20                 ` Yi Yang
2008-03-07 13:02                 ` Dmitry Adamushko
2008-03-07 13:55                   ` Gautham R Shenoy
2008-03-07 15:50                     ` Gautham R Shenoy
2008-03-07 19:14                       ` [BUG 2.6.25-rc3] scheduler/hotplug: some processes aredealocked " Suresh Siddha
2008-03-07 20:18                   ` [BUG 2.6.25-rc3] scheduler/hotplug: some processes are dealocked " Andrew Morton
2008-03-07 21:36                     ` Rafael J. Wysocki
2008-03-07 23:01                       ` Suresh Siddha
2008-03-07 23:29                         ` Andrew Morton
2008-03-07 23:43                           ` Rafael J. Wysocki
2008-03-08  1:50                             ` Suresh Siddha
2008-03-08  2:09                               ` Andrew Morton
2008-03-08  5:10                               ` [PATCH] adjust root-domain->online span in response to hotplug event Gregory Haskins
2008-03-08  8:41                                 ` Ingo Molnar
2008-03-08 17:50                                   ` [PATCH] adjust root-domain->online span in response to hotplugevent Gregory Haskins
2008-03-09  0:31                                     ` Dmitry Adamushko
2008-03-10 14:12                                       ` Gregory Haskins
2008-03-09  2:35                                 ` [PATCH] adjust root-domain->online span in response to hotplug event Suresh Siddha
2008-03-10 12:41                                   ` Gregory Haskins
2008-03-10  8:14                                 ` Gautham R Shenoy
2008-03-10 13:13                                   ` [PATCH] cpu-hotplug: Register update_sched_domains() notifier with higher prio Gautham R Shenoy
2008-03-10 22:25                                     ` Andrew Morton
2008-03-10 13:39                                   ` [PATCH] keep rd->online and cpu_online_map in sync Gregory Haskins
2008-03-10 14:21                                     ` Gautham R Shenoy
2008-03-10 18:12                                     ` Suresh Siddha
2008-03-10 22:03                                       ` Rafael J. Wysocki
2008-03-10 22:00                                         ` Gregory Haskins
2008-03-10 22:10                                           ` Suresh Siddha
2008-03-10 21:59                                             ` [PATCH v2] " Gregory Haskins
2008-03-10 23:36                                               ` Andrew Morton
2008-03-11  1:34                                                 ` Suresh Siddha
2008-03-11  4:39                                                   ` Gautham R Shenoy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1204483329.3607.8.camel@yangyi-dev.bj.intel.com \
    --to=yi.y.yang@intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).