linux-rt-users.vger.kernel.org archive mirror
* cgroup_fj tests will stick the nort kernel
@ 2013-04-19  7:30 Qiang Huang
  2013-04-20  2:00 ` Qiang Huang
  2013-04-22  9:39 ` Li Zefan
  0 siblings, 2 replies; 12+ messages in thread
From: Qiang Huang @ 2013-04-19  7:30 UTC (permalink / raw)
  To: linux-rt-users; +Cc: Li Zefan, zhangwei

Hi,

I ran the cgroup_fj tests on an RT kernel with PREEMPT_RT_FULL disabled. The
system gets stuck when the cpuset stress tests run; it happens every time.

By "stuck" I mean the system barely responds and we can hardly do anything
on the terminal, but the kernel has neither crashed nor deadlocked
(according to the lockdep messages), and it does respond occasionally.

The problem exists on all RT versions from 3.4.18-rt29 to 3.4.37-rt51 AFAIK,
but without the RT patches, or with PREEMPT_RT_FULL enabled, it does not occur.

When the system is stuck, we get the following messages:
# dmesg
...
[96967.772181] NOHZ: local_softirq_pending 200
[96967.776398] NOHZ: local_softirq_pending 200
[96967.780212] NOHZ: local_softirq_pending 200
[96967.781215] NOHZ: local_softirq_pending 200
[96967.784152] NOHZ: local_softirq_pending 200
[96967.784310] NOHZ: local_softirq_pending 200
[96967.788239] NOHZ: local_softirq_pending 200
[96967.796092] NOHZ: local_softirq_pending 200
[96967.800089] NOHZ: local_softirq_pending 200
[96967.800225] NOHZ: local_softirq_pending 200
[97112.950055] ------------[ cut here ]------------
[97112.950068] WARNING: at /usr/src/packages/BUILD/kernel-default-3.4.24.03/linux-3.4/kernel/workqueue.c:1208 worker_enter_idle+0x1d3/0x200()
[97112.950073] Hardware name: Tecal RH2285
[97112.950076] Modules linked in: reiserfs minix hfs vfat fat tun xt_limit xt_tcpudp nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 x_tables dummy edd cpufreq_conservative cpufreq_userspace
cpufreq_powersave acpi_cpufreq mperf loop dm_mod coretemp crc32c_intel igb ghash_clmulni_intel aesni_intel cryptd aes_x86_64 aes_generic iTCO_wdt bnx2 iTCO_vendor_support i7core_edac pcspkr i2c_i801
dca edac_core button rtc_cmos microcode serio_raw i2c_core ses enclosure sg mptctl ext3 jbd mbcache usbhid hid uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif processor thermal_sys hwmon
scsi_dh_alua scsi_dh_emc scsi_dh_hp_sw scsi_dh_rdac scsi_dh ata_generic ata_piix libata mptsas mptscsih mptbase scsi_transport_sas scsi_mod [last unloaded: ip_tables]
[97112.950178] Pid: 5331, comm: kworker/0:2 Tainted: GF       WC   3.4.24.03-0.1.2-default #1
[97112.950182] Call Trace:
[97112.950191]  [<ffffffff8105e2d2>] warn_slowpath_common+0xb2/0x120
[97112.950196]  [<ffffffff8105e365>] warn_slowpath_null+0x25/0x30
[97112.950202]  [<ffffffff81085593>] worker_enter_idle+0x1d3/0x200
[97112.950207]  [<ffffffff81084a95>] ? need_to_create_worker+0x15/0x50
[97112.950213]  [<ffffffff8108a308>] worker_thread+0x2a8/0x4f0
[97112.950218]  [<ffffffff8108a060>] ? rescuer_thread+0x320/0x320
[97112.950226]  [<ffffffff81091d86>] kthread+0xc6/0xe0
[97112.950233]  [<ffffffff81720454>] kernel_thread_helper+0x4/0x10
[97112.950239]  [<ffffffff81091cc0>] ? __init_kthread_worker+0x50/0x50
[97112.950244]  [<ffffffff81720450>] ? gs_change+0x13/0x13
[97112.950248] ---[ end trace 61f48fadbd018007 ]---
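
A note for reading the log above: the value printed after
local_softirq_pending is a hex bitmask of pending softirqs. Below is a minimal
sketch for decoding it, assuming the softirq numbering found in
include/linux/interrupt.h of a 3.4-era kernel (other versions may number the
bits differently). The helper name decode_softirq_pending is made up for
illustration.

```shell
# Decode the hex mask printed by "NOHZ: local_softirq_pending <mask>".
# Assumption: bit order matches the softirq enum of a 3.4-era kernel.
decode_softirq_pending() {
    mask=$((0x$1))
    bit=0
    out=""
    for name in HI TIMER NET_TX NET_RX BLOCK BLOCK_IOPOLL TASKLET SCHED HRTIMER RCU; do
        # Test one bit of the mask per softirq name, lowest bit first.
        if [ $(( (mask >> bit) & 1 )) -eq 1 ]; then
            out="${out:+$out }$name"
        fi
        bit=$((bit + 1))
    done
    echo "${out:-none}"
}

decode_softirq_pending 200    # prints: RCU
```

Under that assumption, mask 200 means the RCU softirq was still pending when
the CPU tried to go idle.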



Here is a simplified version of cgroup_fj which triggers this problem every time:
(make sure CONFIG_CGROUPS and CONFIG_CPUSETS are enabled :))
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
# cat cgroup_fj.sh
#!/bin/bash
# bash is required: the script uses arrays and the "let" builtin

LOGFILE=./cgroup_fj-output.txt
TMPFILE=/tmp/cgroup_fj_tempfile.txt

subsystem=2
subsystem_name="cpuset"

subgroup_num=100

cur_subgroup_path1=""

get_subgroup_path1()
{
        cur_subgroup_path1=""
        if [ "$#" -ne 1 ] || [ "$1" -lt 1 ] ; then
                return;
        fi

        cur_subgroup_path1="/dev/cgroup/subgroup_$1/"
}

cleanup()
{
        mount_str="`mount -l | grep /dev/cgroup`"
        if [ "$mount_str" != "" ]; then
                umount /dev/cgroup
        fi

        if [ -e /dev/cgroup ]; then
                rmdir /dev/cgroup
        fi
}

setup()
{
        mkdir /dev/cgroup
        mount -t cgroup -o $subsystem_name cgroup /dev/cgroup
}

reclaim_foundling()
{
        cat `find /dev/cgroup/subgroup_* -name "tasks"` > $TMPFILE
        nlines=`cat "$TMPFILE" | wc -l`
        for k in `seq 1 $nlines`
        do
                cur_pid=`sed -n "$k""p" $TMPFILE`
                if [ -e /proc/$cur_pid/ ];then
                        echo "pid $cur_pid reclaimed"
                        echo "$cur_pid" > "/dev/cgroup/tasks"
                fi
        done
}

##########################  main   #######################
echo "-------------------------------------------------------------------------" >> $LOGFILE

cleanup;

setup;

if [ $subsystem -eq 2 ]; then
        cpus=`cat /dev/cgroup/cpuset.cpus`
        mems=`cat /dev/cgroup/cpuset.mems`
fi

count=0
pathes[1]=""
for i in `seq 1 $subgroup_num`
do
        get_subgroup_path1 $i
        mkdir $cur_subgroup_path1

        if [ $subsystem -eq 2 ]; then
                echo "$cpus" > "$cur_subgroup_path1""cpuset.cpus"
                echo "$mems" > "$cur_subgroup_path1""cpuset.mems"
        fi

        let "count = $count + 1"
        pathes[$count]="$cur_subgroup_path1"
done

echo "...mkdired $count times" >> $LOGFILE

sleep 1

count2=$count
let "count2 = $count2 + 1"
pathes[0]="/dev/cgroup/"
pathes[$count2]="/dev/cgroup/"
for i in `seq 0 $count`
do
        j=$i
        let "j = $j + 1"
        cat "${pathes[$i]}tasks" > $TMPFILE
        nlines=`cat "$TMPFILE" | wc -l`
        for k in `seq 1 $nlines`
        do
                cur_pid=`sed -n "$k""p" $TMPFILE`
                if [ -e /proc/$cur_pid/ ];then
                        echo "$cur_pid" > "${pathes[$j]}tasks"
                        echo "task: $cur_pid" >> $LOGFILE
                        echo "target: ${pathes[$j]}tasks" >> $LOGFILE
                fi
        done
done

reclaim_foundling;

for i in `seq 1 $count`
do
        j=$i
        let "j = $count - $j + 1"
        rmdir ${pathes[$j]}
done

sleep 1

cleanup;

exit 0;
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<






* Re: cgroup_fj tests will stick the nort kernel
  2013-04-19  7:30 cgroup_fj tests will stick the nort kernel Qiang Huang
@ 2013-04-20  2:00 ` Qiang Huang
  2013-04-20  7:21   ` Li Zefan
  2013-04-22  9:39 ` Li Zefan
  1 sibling, 1 reply; 12+ messages in thread
From: Qiang Huang @ 2013-04-20  2:00 UTC (permalink / raw)
  To: linux-rt-users; +Cc: Li Zefan, zhangwei

On 2013/4/19 15:30, Qiang Huang wrote:
> Hi,
> 
> I ran the cgroup_fj tests on an RT kernel with PREEMPT_RT_FULL disabled. The
> system gets stuck when the cpuset stress tests run; it happens every time.

Let me explain: cgroup_fj is a test suite in LTP that runs functional and
stress tests on cgroups.

The script I posted is a much simplified version of cgroup_fj that runs only
one type of stress test, on the cpuset subsystem.
What it does is:
1. Create /dev/cgroup and mount the cpuset subsystem on it.
2. Create 100 directories under /dev/cgroup named subgroup_1..subgroup_100.
3. Attach all tasks in /dev/cgroup/tasks to /dev/cgroup/subgroup_1/tasks, then
from /dev/cgroup/subgroup_1/tasks to /dev/cgroup/subgroup_2/tasks and so on,
finally from /dev/cgroup/subgroup_100/tasks to /dev/cgroup/tasks, then exit.

The system gets stuck in step 3.
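
To make the order of moves in step 3 explicit, here is a dry-run sketch that
only prints the source and destination of each hop. It mounts nothing and
moves no task; print_migration_chain and its arguments are illustrative names,
not part of cgroup_fj.

```shell
# Print the "source -> destination" tasks files that step 3 walks through.
# Dry run only: no cgroup is mounted and no task is moved.
print_migration_chain() {
    root=$1    # cgroup mount point, e.g. /dev/cgroup
    n=$2       # number of subgroups
    src="$root/"
    i=1
    while [ "$i" -le "$n" ]; do
        dst="$root/subgroup_$i/"
        echo "${src}tasks -> ${dst}tasks"
        src=$dst
        i=$((i + 1))
    done
    # The last hop returns every task to the root cgroup.
    echo "${src}tasks -> $root/tasks"
}

print_migration_chain /dev/cgroup 3
```

With 100 subgroups this gives 101 hops, and every task in the system is
rewritten into a tasks file on each of them.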





* Re: cgroup_fj tests will stick the nort kernel
  2013-04-20  2:00 ` Qiang Huang
@ 2013-04-20  7:21   ` Li Zefan
  0 siblings, 0 replies; 12+ messages in thread
From: Li Zefan @ 2013-04-20  7:21 UTC (permalink / raw)
  To: Qiang Huang; +Cc: linux-rt-users, zhangwei

On 2013/4/20 10:00, Qiang Huang wrote:
> On 2013/4/19 15:30, Qiang Huang wrote:
>> Hi,
>>
>> I ran the cgroup_fj tests on an RT kernel with PREEMPT_RT_FULL disabled. The
>> system gets stuck when the cpuset stress tests run; it happens every time.
> 
> Let me explain: cgroup_fj is a test suite in LTP that runs functional and
> stress tests on cgroups.
> 
> The script I posted is a much simplified version of cgroup_fj that runs only
> one type of stress test, on the cpuset subsystem.
> What it does is:
> 1. Create /dev/cgroup and mount the cpuset subsystem on it.
> 2. Create 100 directories under /dev/cgroup named subgroup_1..subgroup_100.
> 3. Attach all tasks in /dev/cgroup/tasks to /dev/cgroup/subgroup_1/tasks, then
> from /dev/cgroup/subgroup_1/tasks to /dev/cgroup/subgroup_2/tasks and so on,
> finally from /dev/cgroup/subgroup_100/tasks to /dev/cgroup/tasks, then exit.
> 
> The system gets stuck in step 3.
> 

This is strange. When tasks are moved from one cpuset to another, their cpumask
and nodemask will be updated and memory will be migrated, but only if the source
cpuset and dest cpuset have different masks. In this test case, all cpusets have
the same configs.
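
One way to confirm that two cpusets really have the same configuration is to
compare their mask files directly. This is only a sketch: same_cpuset_masks is
an illustrative helper, and it compares the text of cpuset.cpus and cpuset.mems
as the kernel prints them.

```shell
# Report whether two cpuset directories expose identical cpus and mems masks.
# Usage: same_cpuset_masks /dev/cgroup /dev/cgroup/subgroup_1
same_cpuset_masks() {
    a=$1
    b=$2
    for f in cpuset.cpus cpuset.mems; do
        # Textual comparison of the mask files as read from each cpuset.
        if [ "$(cat "$a/$f")" != "$(cat "$b/$f")" ]; then
            echo "$f differs between $a and $b"
            return 1
        fi
    done
    echo "masks identical"
}
```

If this reports identical masks for every subgroup, the mask update and memory
migration in the attach path should have nothing to change.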

Could you try commenting out some lines in cpuset_attach() and see if the
problem still exists?

diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 5fc1570..ea430e7 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -1422,16 +1422,17 @@ static void cpuset_attach(struct cgroup *cgrp, struct cgroup_taskset *tset)
                 * can_attach beforehand should guarantee that this doesn't
                 * fail.  TODO: have a better way to handle failure here
                 */
-               WARN_ON_ONCE(set_cpus_allowed_ptr(task, cpus_attach));
+//             WARN_ON_ONCE(set_cpus_allowed_ptr(task, cpus_attach));

-               cpuset_change_task_nodemask(task, &cpuset_attach_nodemask_to);
-               cpuset_update_task_spread_flag(cs, task);
+//             cpuset_change_task_nodemask(task, &cpuset_attach_nodemask_to);
+//             cpuset_update_task_spread_flag(cs, task);
        }

        /*
         * Change mm, possibly for multiple threads in a threadgroup. This is
         * expensive and may sleep.
         */
+       /*
        cpuset_attach_nodemask_from = oldcs->mems_allowed;
        cpuset_attach_nodemask_to = cs->mems_allowed;
        mm = get_task_mm(leader);
@@ -1442,6 +1443,7 @@ static void cpuset_attach(struct cgroup *cgrp, struct cgroup_taskset *tset)
                                          &cpuset_attach_nodemask_to);
                mmput(mm);
        }
+       */
 }




* Re: cgroup_fj tests will stick the nort kernel
  2013-04-19  7:30 cgroup_fj tests will stick the nort kernel Qiang Huang
  2013-04-20  2:00 ` Qiang Huang
@ 2013-04-22  9:39 ` Li Zefan
  2013-04-22 16:00   ` Steven Rostedt
  1 sibling, 1 reply; 12+ messages in thread
From: Li Zefan @ 2013-04-22  9:39 UTC (permalink / raw)
  To: Steven Rostedt, Thomas Gleixner; +Cc: Qiang Huang, linux-rt-users, zhangwei

On 2013/4/19 15:30, Qiang Huang wrote:
> Hi,
> 
> I ran the cgroup_fj tests on an RT kernel with PREEMPT_RT_FULL disabled. The
> system gets stuck when the cpuset stress tests run; it happens every time.
> 
> By "stuck" I mean the system barely responds and we can hardly do anything
> on the terminal, but the kernel has neither crashed nor deadlocked
> (according to the lockdep messages), and it does respond occasionally.
> 
> The problem exists on all RT versions from 3.4.18-rt29 to 3.4.37-rt51 AFAIK,
> but without the RT patches, or with PREEMPT_RT_FULL enabled, it does not occur.
> 
> When the system is stuck, we get the following messages:
> # dmesg
> ...

I've found the culprit after some investigation:

From: Thomas Gleixner <tglx@linutronix.de>
Date: Fri, 04 Nov 2011 19:48:36 +0000
Subject: sched-clear-pf-thread-bound-on-fallback-rq.patch

At system boot, when some CPUs are not yet up, the scheduler calls
select_fallback_rq() and schedules tasks on other CPUs, which ends up clearing
the PF_THREAD_BOUND flag of some kernel threads...



* Re: cgroup_fj tests will stick the nort kernel
  2013-04-22  9:39 ` Li Zefan
@ 2013-04-22 16:00   ` Steven Rostedt
  2013-04-23  5:51     ` Li Zefan
  2013-04-30 14:21     ` Luis Claudio R. Goncalves
  0 siblings, 2 replies; 12+ messages in thread
From: Steven Rostedt @ 2013-04-22 16:00 UTC (permalink / raw)
  To: Li Zefan; +Cc: Thomas Gleixner, Qiang Huang, linux-rt-users, zhangwei

On Mon, 2013-04-22 at 17:39 +0800, Li Zefan wrote:
> On 2013/4/19 15:30, Qiang Huang wrote:
> > Hi,
> > 
> > I ran the cgroup_fj tests on an RT kernel with PREEMPT_RT_FULL disabled. The
> > system gets stuck when the cpuset stress tests run; it happens every time.
> > 
> > By "stuck" I mean the system barely responds and we can hardly do anything
> > on the terminal, but the kernel has neither crashed nor deadlocked
> > (according to the lockdep messages), and it does respond occasionally.
> > 
> > The problem exists on all RT versions from 3.4.18-rt29 to 3.4.37-rt51 AFAIK,
> > but without the RT patches, or with PREEMPT_RT_FULL enabled, it does not occur.
> > 
> > When the system is stuck, we get the following messages:
> > # dmesg
> > ...
> 
> I've found the culprit after some investigation:
> 
> From: Thomas Gleixner <tglx@linutronix.de>
> Date: Fri, 04 Nov 2011 19:48:36 +0000
> Subject: sched-clear-pf-thread-bound-on-fallback-rq.patch
> 
> At system boot, when some CPUs are not yet up, the scheduler calls
> select_fallback_rq() and schedules tasks on other CPUs, which ends up clearing
> the PF_THREAD_BOUND flag of some kernel threads...

I'm curious as to why this doesn't break when PREEMPT_RT_FULL is enabled. I
would think it would cause issues there too.

-- Steve




* Re: cgroup_fj tests will stick the nort kernel
  2013-04-22 16:00   ` Steven Rostedt
@ 2013-04-23  5:51     ` Li Zefan
  2013-04-23 10:46       ` Li Zefan
  2013-04-25  6:11       ` Qiang Huang
  2013-04-30 14:21     ` Luis Claudio R. Goncalves
  1 sibling, 2 replies; 12+ messages in thread
From: Li Zefan @ 2013-04-23  5:51 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Thomas Gleixner, Qiang Huang, linux-rt-users, zhangwei

On 2013/4/23 0:00, Steven Rostedt wrote:
> On Mon, 2013-04-22 at 17:39 +0800, Li Zefan wrote:
>> On 2013/4/19 15:30, Qiang Huang wrote:
>>> Hi,
>>>
>>> I ran the cgroup_fj tests on an RT kernel with PREEMPT_RT_FULL disabled. The
>>> system gets stuck when the cpuset stress tests run; it happens every time.
>>>
>>> By "stuck" I mean the system barely responds and we can hardly do anything
>>> on the terminal, but the kernel has neither crashed nor deadlocked
>>> (according to the lockdep messages), and it does respond occasionally.
>>>
>>> The problem exists on all RT versions from 3.4.18-rt29 to 3.4.37-rt51 AFAIK,
>>> but without the RT patches, or with PREEMPT_RT_FULL enabled, it does not occur.
>>>
>>> When the system is stuck, we get the following messages:
>>> # dmesg
>>> ...
>>
>> I've found the culprit after some investigation:
>>
>> From: Thomas Gleixner <tglx@linutronix.de>
>> Date: Fri, 04 Nov 2011 19:48:36 +0000
>> Subject: sched-clear-pf-thread-bound-on-fallback-rq.patch
>>
>> At system boot, when some CPUs are not yet up, the scheduler calls
>> select_fallback_rq() and schedules tasks on other CPUs, which ends up clearing
>> the PF_THREAD_BOUND flag of some kernel threads...
> 
> I'm curious as to why this doesn't break when PREEMPT_RT_FULL is enabled. I
> would think it would cause issues there too.
> 

I was wrong in saying that PF_THREAD_BOUND is cleared because some CPUs are not
online yet. The real reason is that select_task_rq_fair() just returns prev_cpu,
i.e. task_cpu(p), which is 0 during system boot (or some other CPU after boot)
and is not in tsk_cpus_allowed, so select_fallback_rq() is called and clears
PF_THREAD_BOUND.
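
To see from user space which kernel threads have lost the flag, the flags word
in /proc/<pid>/stat can be tested. This is a rough sketch: the mask value
0x04000000 is what PF_THREAD_BOUND is defined as in 3.4-era include/linux/sched.h
(an assumption for any other version), and is_thread_bound and task_flags are
names I made up.

```shell
# PF_THREAD_BOUND as defined in 3.4-era kernels (assumption: the value may
# differ, or the flag may be renamed, on other kernel versions).
PF_THREAD_BOUND=$((0x04000000))

# Succeeds if the decimal flags word $1 has PF_THREAD_BOUND set.
is_thread_bound() {
    [ $(( $1 & PF_THREAD_BOUND )) -ne 0 ]
}

# Print the flags field of /proc/<pid>/stat. comm may contain spaces, so
# everything up to the last ") " is dropped first; flags is then the 7th
# remaining field (state ppid pgrp session tty_nr tpgid flags).
task_flags() {
    sed 's/.*) //' "/proc/$1/stat" | cut -d' ' -f7
}

# Usage: list kworker threads without the flag.
# for pid in $(pgrep kworker); do
#     is_thread_bound "$(task_flags "$pid")" || echo "pid $pid: not bound"
# done
```

Note that unbound workers are expected to show up in such a list; only per-CPU
kernel threads without the flag are suspicious.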

I don't know why it didn't cause trouble for Huang Qiang when RT_FULL is enabled,
but I did encounter problems when testing on my box.

I can trigger the bug with cgroup_fj.sh, or with taskset:

  # for pid in `ps -e -o pid`; do taskset -p -c 0-15 $pid; done

The system hang or task hangs may not happen during the test itself, but will
show up later after some random operations (try compiling a kernel).

While running the test I saw lots of warnings like this:

[  146.702056] BUG: using smp_processor_id() in preemptible [00000000 00000000] code: kworker/4:0/23
[  146.702069] caller is vmstat_update+0x22/0x60
[  146.702075] Pid: 23, comm: kworker/4:0 Not tainted 3.4.24.05+ #49
[  146.702077] Call Trace:
[  146.702087]  [<ffffffff8125f685>] debug_smp_processor_id+0x145/0x150
[  146.702091]  [<ffffffff8113c872>] vmstat_update+0x22/0x60
[  146.702097]  [<ffffffff81061033>] process_one_work+0x203/0x610
[  146.702101]  [<ffffffff81060f70>] ? process_one_work+0x140/0x610
[  146.702105]  [<ffffffff81061fdd>] ? worker_thread+0x6d/0x450
[  146.702109]  [<ffffffff8113c850>] ? refresh_cpu_vm_stats+0x1d0/0x1d0
[  146.702114]  [<ffffffff81062116>] worker_thread+0x1a6/0x450
[  146.702118]  [<ffffffff81061f70>] ? manage_workers+0x250/0x250
[  146.702122]  [<ffffffff810680f6>] kthread+0xb6/0xc0
[  146.702130]  [<ffffffff81474ab4>] kernel_thread_helper+0x4/0x10
[  146.702137]  [<ffffffff81076930>] ? finish_task_switch+0x90/0x100
[  146.702142]  [<ffffffff8146bb34>] ? retint_restore_args+0x13/0x13
[  146.702145]  [<ffffffff81068040>] ? kthreadd+0x310/0x310
[  146.702149]  [<ffffffff81474ab0>] ? gs_change+0x13/0x13

After a while those warnings stopped, and warnings like this popped up instead,
even after I stopped the test:

[  252.896103] ------------[ cut here ]------------
[  252.896107] WARNING: at kernel/cpu.c:157 unpin_current_cpu+0x7d/0x90()
[  252.896110] Hardware name: Tecal RH2285
[  252.896112] Modules linked in: iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi bridge
ipv6 stp llc cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf
binfmt_misc fuse loop dm_mod tpm_tis tpm coretemp crc32c_intel ghash_clmulni_intel aesni_intel sg
serio_raw cryptd aes_x86_64 tpm_bios microcode i2c_i801 iTCO_wdt i2c_core bnx2
iTCO_vendor_support mptctl button usbhid hid uhci_hcd ehci_hcd usbcore usb_common sd_mod
crc_t10dif edd ext3 mbcache jbd fan processor ide_pci_generic ide_core ata_generic ata_piix
libata mptsas mptscsih mptbase scsi_transport_sas scsi_mod thermal thermal_sys hwmon
[  252.896201] Pid: 9893, comm: dmesg Tainted: G        W    3.4.24.05+ #49
[  252.896203] Call Trace:
[  252.896208]  [<ffffffff810404ed>] ? unpin_current_cpu+0x7d/0x90
[  252.896212]  [<ffffffff810404ed>] ? unpin_current_cpu+0x7d/0x90
[  252.896217]  [<ffffffff8103d83f>] warn_slowpath_common+0x7f/0xc0
[  252.896221]  [<ffffffff8103d89a>] warn_slowpath_null+0x1a/0x20
[  252.896226]  [<ffffffff810404ed>] unpin_current_cpu+0x7d/0x90
[  252.896231]  [<ffffffff81078ddb>] migrate_enable+0xeb/0x1e0
[  252.896235]  [<ffffffff81146b7b>] handle_pte_fault+0x34b/0x980
[  252.896240]  [<ffffffff81076431>] ? get_parent_ip+0x11/0x50
[  252.896244]  [<ffffffff81076431>] ? get_parent_ip+0x11/0x50
[  252.896250]  [<ffffffff811472fc>] handle_mm_fault+0x14c/0x1e0
[  252.896254]  [<ffffffff8146ef47>] do_page_fault+0x257/0x550
[  252.896260]  [<ffffffff8114c995>] ? do_mmap_pgoff+0x375/0x3a0
[  252.896264]  [<ffffffff8146bfb6>] ? error_sti+0x5/0x6
[  252.896269]  [<ffffffff81259175>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[  252.896274]  [<ffffffff8146bd75>] page_fault+0x25/0x30
[  252.896277] ---[ end trace 000000000000ae6e ]---

I didn't see those warnings if !RT_FULL.



* Re: cgroup_fj tests will stick the nort kernel
  2013-04-23  5:51     ` Li Zefan
@ 2013-04-23 10:46       ` Li Zefan
  2013-04-25  6:11       ` Qiang Huang
  1 sibling, 0 replies; 12+ messages in thread
From: Li Zefan @ 2013-04-23 10:46 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Thomas Gleixner, Qiang Huang, linux-rt-users, zhangwei

> [  146.702056] BUG: using smp_processor_id() in preemptible [00000000 00000000] code: kworker/4:0/23
> [  146.702069] caller is vmstat_update+0x22/0x60
> [  146.702075] Pid: 23, comm: kworker/4:0 Not tainted 3.4.24.05+ #49

Oh, I should make clear that I first tested an older kernel and then tested
the latest 3.4.39-rt53.

> [  146.702077] Call Trace:



* Re: cgroup_fj tests will stick the nort kernel
  2013-04-23  5:51     ` Li Zefan
  2013-04-23 10:46       ` Li Zefan
@ 2013-04-25  6:11       ` Qiang Huang
  2013-04-25  8:44         ` Li Zefan
  2013-04-25 12:53         ` Steven Rostedt
  1 sibling, 2 replies; 12+ messages in thread
From: Qiang Huang @ 2013-04-25  6:11 UTC (permalink / raw)
  To: Li Zefan; +Cc: Steven Rostedt, Thomas Gleixner, linux-rt-users, zhangwei

Hi Steven,

A patch follows the comments below; could you take a look?

On 2013/4/23 13:51, Li Zefan wrote:
> On 2013/4/23 0:00, Steven Rostedt wrote:
>> On Mon, 2013-04-22 at 17:39 +0800, Li Zefan wrote:
>>> On 2013/4/19 15:30, Qiang Huang wrote:
>>>> Hi,
>>>>
>>>> I ran the cgroup_fj tests on an RT kernel with PREEMPT_RT_FULL disabled. The
>>>> system gets stuck when the cpuset stress tests run; it happens every time.
>>>>
>>>> By "stuck" I mean the system barely responds and we can hardly do anything
>>>> on the terminal, but the kernel has neither crashed nor deadlocked
>>>> (according to the lockdep messages), and it does respond occasionally.
>>>>
>>>> The problem exists on all RT versions from 3.4.18-rt29 to 3.4.37-rt51 AFAIK,
>>>> but without the RT patches, or with PREEMPT_RT_FULL enabled, it does not occur.
>>>>
>>>> When the system is stuck, we get the following messages:
>>>> # dmesg
>>>> ...
>>>
>>> I've found the culprit after some investigation:
>>>
>>> From: Thomas Gleixner <tglx@linutronix.de>
>>> Date: Fri, 04 Nov 2011 19:48:36 +0000
>>> Subject: sched-clear-pf-thread-bound-on-fallback-rq.patch
>>>
>>> At system boot, when some CPUs are not yet up, the scheduler calls
>>> select_fallback_rq() and schedules tasks on other CPUs, which ends up clearing
>>> the PF_THREAD_BOUND flag of some kernel threads...
>>
>> I'm curious as to why this doesn't break when PREEMPT_RT_FULL is enabled. I
>> would think it would cause issues there too.
>>
> 
> I was wrong in saying that PF_THREAD_BOUND is cleared because some CPUs are not
> online yet. The real reason is that select_task_rq_fair() just returns prev_cpu,
> i.e. task_cpu(p), which is 0 during system boot (or some other CPU after boot)
> and is not in tsk_cpus_allowed, so select_fallback_rq() is called and clears
> PF_THREAD_BOUND.
> 
> I don't know why it didn't cause trouble for Huang Qiang when RT_FULL is enabled,

I retested it, and we do have similar trouble when RT is enabled. I may have
missed some config option, which is why I did not see these warnings before.

As for the patch below, I added your Signed-off-by, if it looks good to you.

> but I did encounter problems when testing on my box.
> 
> I can trigger the bug with cgroup_fj.sh, or with taskset:
> 
>   # for pid in `ps -e -o pid`; do taskset -p -c 0-15 $pid; done
> 
> The system hang or task hangs may not happen during the test itself, but will
> show up later after some random operations (try compiling a kernel).
> 
> While running the test I saw lots of warnings like this:
> 
> [  146.702056] BUG: using smp_processor_id() in preemptible [00000000 00000000] code: kworker/4:0/23
> [  146.702069] caller is vmstat_update+0x22/0x60
> [  146.702075] Pid: 23, comm: kworker/4:0 Not tainted 3.4.24.05+ #49
> [  146.702077] Call Trace:
> [  146.702087]  [<ffffffff8125f685>] debug_smp_processor_id+0x145/0x150
> [  146.702091]  [<ffffffff8113c872>] vmstat_update+0x22/0x60
> [  146.702097]  [<ffffffff81061033>] process_one_work+0x203/0x610
> [  146.702101]  [<ffffffff81060f70>] ? process_one_work+0x140/0x610
> [  146.702105]  [<ffffffff81061fdd>] ? worker_thread+0x6d/0x450
> [  146.702109]  [<ffffffff8113c850>] ? refresh_cpu_vm_stats+0x1d0/0x1d0
> [  146.702114]  [<ffffffff81062116>] worker_thread+0x1a6/0x450
> [  146.702118]  [<ffffffff81061f70>] ? manage_workers+0x250/0x250
> [  146.702122]  [<ffffffff810680f6>] kthread+0xb6/0xc0
> [  146.702130]  [<ffffffff81474ab4>] kernel_thread_helper+0x4/0x10
> [  146.702137]  [<ffffffff81076930>] ? finish_task_switch+0x90/0x100
> [  146.702142]  [<ffffffff8146bb34>] ? retint_restore_args+0x13/0x13
> [  146.702145]  [<ffffffff81068040>] ? kthreadd+0x310/0x310
> [  146.702149]  [<ffffffff81474ab0>] ? gs_change+0x13/0x13
> 
> After a while those warnings stopped, and warnings like this popped up instead,
> even after I stopped the test:
> 
> [  252.896103] ------------[ cut here ]------------
> [  252.896107] WARNING: at kernel/cpu.c:157 unpin_current_cpu+0x7d/0x90()
> [  252.896110] Hardware name: Tecal RH2285
> [  252.896112] Modules linked in: iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi bridge
> ipv6 stp llc cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf
> binfmt_misc fuse loop dm_mod tpm_tis tpm coretemp crc32c_intel ghash_clmulni_intel aesni_intel sg
> serio_raw cryptd aes_x86_64 tpm_bios microcode i2c_i801 iTCO_wdt i2c_core bnx2
> iTCO_vendor_support mptctl button usbhid hid uhci_hcd ehci_hcd usbcore usb_common sd_mod
> crc_t10dif edd ext3 mbcache jbd fan processor ide_pci_generic ide_core ata_generic ata_piix
> libata mptsas mptscsih mptbase scsi_transport_sas scsi_mod thermal thermal_sys hwmon
> [  252.896201] Pid: 9893, comm: dmesg Tainted: G        W    3.4.24.05+ #49
> [  252.896203] Call Trace:
> [  252.896208]  [<ffffffff810404ed>] ? unpin_current_cpu+0x7d/0x90
> [  252.896212]  [<ffffffff810404ed>] ? unpin_current_cpu+0x7d/0x90
> [  252.896217]  [<ffffffff8103d83f>] warn_slowpath_common+0x7f/0xc0
> [  252.896221]  [<ffffffff8103d89a>] warn_slowpath_null+0x1a/0x20
> [  252.896226]  [<ffffffff810404ed>] unpin_current_cpu+0x7d/0x90
> [  252.896231]  [<ffffffff81078ddb>] migrate_enable+0xeb/0x1e0
> [  252.896235]  [<ffffffff81146b7b>] handle_pte_fault+0x34b/0x980
> [  252.896240]  [<ffffffff81076431>] ? get_parent_ip+0x11/0x50
> [  252.896244]  [<ffffffff81076431>] ? get_parent_ip+0x11/0x50
> [  252.896250]  [<ffffffff811472fc>] handle_mm_fault+0x14c/0x1e0
> [  252.896254]  [<ffffffff8146ef47>] do_page_fault+0x257/0x550
> [  252.896260]  [<ffffffff8114c995>] ? do_mmap_pgoff+0x375/0x3a0
> [  252.896264]  [<ffffffff8146bfb6>] ? error_sti+0x5/0x6
> [  252.896269]  [<ffffffff81259175>] ? trace_hardirqs_off_thunk+0x3a/0x3c
> [  252.896274]  [<ffffffff8146bd75>] page_fault+0x25/0x30
> [  252.896277] ---[ end trace 000000000000ae6e ]---
> 
> I didn't see those warnings if !RT_FULL.
> 
> 

Here is a patch that seems to solve the problem. It looks good on my box; my
only concern is how this will affect our RT code.


From 8e4fa4e9a7b510bdaf90b8140ce1e847375abccf Mon Sep 17 00:00:00 2001
From: Qiang Huang <h.huangqiang@huawei.com>
Date: Thu, 25 Apr 2013 10:22:01 +0800
Subject: [PATCH] sched: don't clear PF_THREAD_BOUND in select_fallback_rq

This is a revert of "sched-clear-pf-thread-bound-on-fallback-rq.patch"
(commit 0d939066acdcb in v3.4-rt).

select_fallback_rq() can easily be called during system boot, because
select_task_rq_fair() just returns task_cpu(p) for bound kernel threads,
which is 0 during system boot and not in tsk_cpus_allowed, so
select_fallback_rq() is called and PF_THREAD_BOUND is cleared. On my
box, about 1/3 of the bound kernel threads have that flag cleared after boot.

This causes problems. For example:
# for pid in `ps -e -o pid`; do taskset -p -c 0-15 $pid; done
This command hangs the system.

What's more, I don't see why we need to clear this flag any more,
because "cpu/rt: Rework cpu down for PREEMPT_RT" already removed the
optimization for PF_THREAD_BOUND in migrate_disable/enable.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
Signed-off-by: Li Zefan <lizefan@huawei.com>
---
 kernel/sched/core.c |    6 ------
 1 files changed, 0 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 751ec60..8db6e3b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1327,12 +1327,6 @@ out:
 		}
 	}

-	/*
-	 * Clear PF_THREAD_BOUND, otherwise we wreckage
-	 * migrate_disable/enable. See optimization for
-	 * PF_THREAD_BOUND tasks there.
-	 */
-	p->flags &= ~PF_THREAD_BOUND;
 	return dest_cpu;
 }

-- 
1.7.1






* Re: cgroup_fj tests will stick the nort kernel
  2013-04-25  6:11       ` Qiang Huang
@ 2013-04-25  8:44         ` Li Zefan
  2013-04-25  8:56           ` Qiang Huang
  2013-04-25 12:53         ` Steven Rostedt
  1 sibling, 1 reply; 12+ messages in thread
From: Li Zefan @ 2013-04-25  8:44 UTC (permalink / raw)
  To: Qiang Huang; +Cc: Steven Rostedt, Thomas Gleixner, linux-rt-users, zhangwei

> And the patch below, I added your signed-off-by if it looks good to you.

Don't...

> Here is a patch that seems to solve the problem. It looks good on my box; my
> only concern is how this will affect our RT code.
> 
> 
> From 8e4fa4e9a7b510bdaf90b8140ce1e847375abccf Mon Sep 17 00:00:00 2001
> From: Qiang Huang <h.huangqiang@huawei.com>
> Date: Thu, 25 Apr 2013 10:22:01 +0800
> Subject: [PATCH] sched: don't clear PF_THREAD_BOUND in select_fallback_rq
> 
> This is a revert of "sched-clear-pf-thread-bound-on-fallback-rq.patch"
> (commit 0d939066acdcb in v3.4-rt).
> 
> select_fallback_rq() can easily be called during system boot, because
> select_task_rq_fair() just returns task_cpu(p) for bound kernel threads,
> which is 0 during system boot and not in tsk_cpus_allowed, so
> select_fallback_rq() is called and PF_THREAD_BOUND is cleared. On my
> box, about 1/3 of the bound kernel threads have that flag cleared after boot.
> 
> This causes problems. For example:
> # for pid in `ps -e -o pid`; do taskset -p -c 0-15 $pid; done
> This command hangs the system.
> 
> What's more, I don't see why we need to clear this flag any more,
> because "cpu/rt: Rework cpu down for PREEMPT_RT" already removed the
> optimization for PF_THREAD_BOUND in migrate_disable/enable.
> 
> Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
> Signed-off-by: Li Zefan <lizefan@huawei.com>

You shouldn't have added my SOB... I didn't write this patch, and I didn't
even suggest this fix or check whether it is a correct fix.

Please read Documentation/SubmittingPatches to learn how SOB should
be used.

> ---
>  kernel/sched/core.c |    6 ------
>  1 files changed, 0 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 751ec60..8db6e3b 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1327,12 +1327,6 @@ out:
>  		}
>  	}
> 
> -	/*
> -	 * Clear PF_THREAD_BOUND, otherwise we wreckage
> -	 * migrate_disable/enable. See optimization for
> -	 * PF_THREAD_BOUND tasks there.
> -	 */
> -	p->flags &= ~PF_THREAD_BOUND;
>  	return dest_cpu;
>  }
> 



* Re: cgroup_fj tests will stick the nort kernel
  2013-04-25  8:44         ` Li Zefan
@ 2013-04-25  8:56           ` Qiang Huang
  0 siblings, 0 replies; 12+ messages in thread
From: Qiang Huang @ 2013-04-25  8:56 UTC (permalink / raw)
  To: Li Zefan; +Cc: Steven Rostedt, Thomas Gleixner, linux-rt-users, zhangwei

On 2013/4/25 16:44, Li Zefan wrote:
>> And the patch below, I added your signed-off-by if it looks good to you.
> 
> Don't...
> 
>> Here is a patch that seems to solve the problem. It looks good on my box; my
>> only concern is how this will affect our RT code.
>>
>>
>> From 8e4fa4e9a7b510bdaf90b8140ce1e847375abccf Mon Sep 17 00:00:00 2001
>> From: Qiang Huang <h.huangqiang@huawei.com>
>> Date: Thu, 25 Apr 2013 10:22:01 +0800
>> Subject: [PATCH] sched: don't clear PF_THREAD_BOUND in select_fallback_rq
>>
>> This is a revert of "sched-clear-pf-thread-bound-on-fallback-rq.patch"
>> (commit 0d939066acdcb in v3.4-rt).
>>
>> select_fallback_rq() can easily be called during system boot, because
>> select_task_rq_fair() just returns task_cpu(p) for bound kernel threads,
>> which is 0 during system boot and not in tsk_cpus_allowed, so
>> select_fallback_rq() is called and PF_THREAD_BOUND is cleared. On my
>> box, about 1/3 of the bound kernel threads have that flag cleared after boot.
>>
>> This causes problems. For example:
>> # for pid in `ps -e -o pid`; do taskset -p -c 0-15 $pid; done
>> This command hangs the system.
>>
>> What's more, I don't see why we need to clear this flag any more,
>> because "cpu/rt: Rework cpu down for PREEMPT_RT" already removed the
>> optimization for PF_THREAD_BOUND in migrate_disable/enable.
>>
>> Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
>> Signed-off-by: Li Zefan <lizefan@huawei.com>
> 
> You shouldn't have added my SOB... I didn't write this patch, and I didn't
> even suggest this fix or check whether it is a correct fix.
> 
> Please read Documentation/SubmittingPatches to learn how SOB should
> be used.
> 

OK, I'll resend it.




* Re: cgroup_fj tests will stick the nort kernel
  2013-04-25  6:11       ` Qiang Huang
  2013-04-25  8:44         ` Li Zefan
@ 2013-04-25 12:53         ` Steven Rostedt
  1 sibling, 0 replies; 12+ messages in thread
From: Steven Rostedt @ 2013-04-25 12:53 UTC (permalink / raw)
  To: Qiang Huang; +Cc: Li Zefan, Thomas Gleixner, linux-rt-users, zhangwei

On Thu, 2013-04-25 at 14:11 +0800, Qiang Huang wrote:

> What's more, I don't see why we need to clear this flag any more,
> because "cpu/rt: Rework cpu down for PREEMPT_RT" already removed the
> optimization for PF_THREAD_BOUND in migrate_disable/enable.

I'll have to think about this. That is, take a deeper look.

Thanks,

-- Steve




* Re: cgroup_fj tests will stick the nort kernel
  2013-04-22 16:00   ` Steven Rostedt
  2013-04-23  5:51     ` Li Zefan
@ 2013-04-30 14:21     ` Luis Claudio R. Goncalves
  1 sibling, 0 replies; 12+ messages in thread
From: Luis Claudio R. Goncalves @ 2013-04-30 14:21 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Li Zefan, Thomas Gleixner, Qiang Huang, linux-rt-users, zhangwei

On Mon, Apr 22, 2013 at 12:00:47PM -0400, Steven Rostedt wrote:
| On Mon, 2013-04-22 at 17:39 +0800, Li Zefan wrote:
| > On 2013/4/19 15:30, Qiang Huang wrote:
| > > Hi,
| > > 
| > > I ran the cgroup_fj tests on an RT kernel with PREEMPT_RT_FULL disabled. The
| > > system gets stuck when running the cpuset stress tests; it happens every time.
| > > 
| > > Here "stuck" means there is almost no response from the system and
| > > we can hardly do anything on the terminal, but the kernel has neither crashed nor
| > > deadlocked (according to the lockdep messages), and it still responds occasionally.
| > > 
| > > The problem exists on all RT versions from 3.4.18-rt29 to 3.4.37-rt51 AFAIK, but
| > > without the RT patches or with PREEMPT_RT_FULL enabled, the problem does not occur.
| > > 
| > > When the system is stuck, we get the following messages:
| > > # dmesg
| > > ...
| > 
| > I've found the culprit after some investigation:
| > 
| > From: Thomas Gleixner <tglx@linutronix.de>
| > Date: Fri, 04 Nov 2011 19:48:36 +0000
| > Subject: sched-clear-pf-thread-bound-on-fallback-rq.patch
| > 
| > At system boot, when some CPUs have not come up yet, the scheduler calls
| > select_fallback_rq() and schedules tasks on other CPUs, which ends up clearing
| > the PF_THREAD_BOUND flag of some kernel threads...
| 
| I'm curious as to why this doesn't break when PREEMPT_RT_FULL is enabled. I
| would think it would cause issues there too.

It does break when PREEMPT_RT_FULL is enabled :)

I was able to consistently reproduce the issue on the latest 3.6-rt kernel
this weekend. And I was also able to confirm that the patch in this thread
did mitigate the issue.

Cheers,
Luis


Thread overview: 12+ messages
2013-04-19  7:30 cgroup_fj tests will stick the nort kernel Qiang Huang
2013-04-20  2:00 ` Qiang Huang
2013-04-20  7:21   ` Li Zefan
2013-04-22  9:39 ` Li Zefan
2013-04-22 16:00   ` Steven Rostedt
2013-04-23  5:51     ` Li Zefan
2013-04-23 10:46       ` Li Zefan
2013-04-25  6:11       ` Qiang Huang
2013-04-25  8:44         ` Li Zefan
2013-04-25  8:56           ` Qiang Huang
2013-04-25 12:53         ` Steven Rostedt
2013-04-30 14:21     ` Luis Claudio R. Goncalves
