Re: cgroup_fj tests will stick the nort kernel


From: Qiang Huang <h.huangqiang@huawei.com>
To: Li Zefan <lizefan@huawei.com>
Cc: Steven Rostedt <rostedt@goodmis.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	linux-rt-users <linux-rt-users@vger.kernel.org>,
	zhangwei <jovi.zhangwei@huawei.com>
Subject: Re: cgroup_fj tests will stick the nort kernel
Date: Thu, 25 Apr 2013 14:11:46 +0800
Message-ID: <5178C922.3060506@huawei.com>
In-Reply-To: <5176217E.8030008@huawei.com>

Hi Steven,

A patch follows the comments below; could you take a look?

On 2013/4/23 13:51, Li Zefan wrote:
> On 2013/4/23 0:00, Steven Rostedt wrote:
>> On Mon, 2013-04-22 at 17:39 +0800, Li Zefan wrote:
>>> On 2013/4/19 15:30, Qiang Huang wrote:
>>>> Hi,
>>>>
>>>> I ran the cgroup_fj tests on an RT kernel with PREEMPT_RT_FULL disabled. The
>>>> system gets stuck when running the cpuset stress tests, and it happens every time.
>>>>
>>>> Here "stuck" means there is almost no response from the system and
>>>> we can hardly do anything on the terminal, but the kernel has neither crashed nor
>>>> deadlocked (according to the lockdep messages), and it still responds occasionally.
>>>>
>>>> The problem exists on all RT versions from 3.4.18-rt29 to 3.4.37-rt51 AFAIK, but
>>>> without the RT patches or with PREEMPT_RT_FULL enabled, the problem does not occur.
>>>>
>>>> When the system is stuck, we will get the following message:
>>>> # dmesg
>>>> ...
>>>
>>> I've found the culprit after some investigation:
>>>
>>> From: Thomas Gleixner <tglx@linutronix.de>
>>> Date: Fri, 04 Nov 2011 19:48:36 +0000
>>> Subject: sched-clear-pf-thread-bound-on-fallback-rq.patch
>>>
>>> At system boot, when some CPUs are not up yet, the scheduler calls select_fallback_rq()
>>> and schedules tasks on other CPUs, which ends up clearing some kernel threads'
>>> PF_THREAD_BOUND flag...
>>
>> I'm curious as to why this doesn't break when PREEMPT_RT_FULL is enabled. I
>> would think it would cause issues there too.
>>
> 
> I was wrong in saying that PF_THREAD_BOUND is cleared because some cpus are not
> online yet. It's because select_task_rq_fair() just returns prev_cpu, which is
> task_cpu(p), which is 0 during system boot or some other cpu after boot, which
> is not in tsk_cpus_allowed, so select_fallback_rq() is called and it clears
> PF_THREAD_BOUND.
> 
> I don't know why it didn't cause trouble when RT_FULL is enabled for Huang Qiang,

I retested it, and we do see similar trouble with RT enabled. I might have
missed some config option that avoided these warnings.

As for the patch below, I added your Signed-off-by, if that looks good to you.

> but I did encounter problems when testing on my box.
> 
> I can trigger the bug with cgroup_fj.sh, or with taskset:
> 
>   # for pid in `ps -e -o pid`; do taskset -p -c 0-15 $pid; done
> 
> But a system hang or task hangs may not happen during the test itself; they can
> happen later, after some random operations (try compiling a kernel).
> 
> And while running the test I saw lots of warnings like this:
> 
> [  146.702056] BUG: using smp_processor_id() in preemptible [00000000 00000000] code: kworker/4:0/23
> [  146.702069] caller is vmstat_update+0x22/0x60
> [  146.702075] Pid: 23, comm: kworker/4:0 Not tainted 3.4.24.05+ #49
> [  146.702077] Call Trace:
> [  146.702087]  [<ffffffff8125f685>] debug_smp_processor_id+0x145/0x150
> [  146.702091]  [<ffffffff8113c872>] vmstat_update+0x22/0x60
> [  146.702097]  [<ffffffff81061033>] process_one_work+0x203/0x610
> [  146.702101]  [<ffffffff81060f70>] ? process_one_work+0x140/0x610
> [  146.702105]  [<ffffffff81061fdd>] ? worker_thread+0x6d/0x450
> [  146.702109]  [<ffffffff8113c850>] ? refresh_cpu_vm_stats+0x1d0/0x1d0
> [  146.702114]  [<ffffffff81062116>] worker_thread+0x1a6/0x450
> [  146.702118]  [<ffffffff81061f70>] ? manage_workers+0x250/0x250
> [  146.702122]  [<ffffffff810680f6>] kthread+0xb6/0xc0
> [  146.702130]  [<ffffffff81474ab4>] kernel_thread_helper+0x4/0x10
> [  146.702137]  [<ffffffff81076930>] ? finish_task_switch+0x90/0x100
> [  146.702142]  [<ffffffff8146bb34>] ? retint_restore_args+0x13/0x13
> [  146.702145]  [<ffffffff81068040>] ? kthreadd+0x310/0x310
> [  146.702149]  [<ffffffff81474ab0>] ? gs_change+0x13/0x13
> 
> and after a while those warnings stopped, instead warnings like this popped up,
> even after I stopped the test:
> 
> [  252.896103] ------------[ cut here ]------------
> [  252.896107] WARNING: at kernel/cpu.c:157 unpin_current_cpu+0x7d/0x90()
> [  252.896110] Hardware name: Tecal RH2285
> [  252.896112] Modules linked in: iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi bridge ipv6 stp llc cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf binfmt_misc fuse loop dm_mod tpm_tis tpm coretemp crc32c_intel ghash_clmulni_intel aesni_intel sg serio_raw cryptd aes_x86_64 tpm_bios microcode i2c_i801 iTCO_wdt i2c_core bnx2 iTCO_vendor_support mptctl button usbhid hid uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif edd ext3 mbcache jbd fan processor ide_pci_generic ide_core ata_generic ata_piix libata mptsas mptscsih mptbase scsi_transport_sas scsi_mod thermal thermal_sys hwmon
> [  252.896201] Pid: 9893, comm: dmesg Tainted: G        W    3.4.24.05+ #49
> [  252.896203] Call Trace:
> [  252.896208]  [<ffffffff810404ed>] ? unpin_current_cpu+0x7d/0x90
> [  252.896212]  [<ffffffff810404ed>] ? unpin_current_cpu+0x7d/0x90
> [  252.896217]  [<ffffffff8103d83f>] warn_slowpath_common+0x7f/0xc0
> [  252.896221]  [<ffffffff8103d89a>] warn_slowpath_null+0x1a/0x20
> [  252.896226]  [<ffffffff810404ed>] unpin_current_cpu+0x7d/0x90
> [  252.896231]  [<ffffffff81078ddb>] migrate_enable+0xeb/0x1e0
> [  252.896235]  [<ffffffff81146b7b>] handle_pte_fault+0x34b/0x980
> [  252.896240]  [<ffffffff81076431>] ? get_parent_ip+0x11/0x50
> [  252.896244]  [<ffffffff81076431>] ? get_parent_ip+0x11/0x50
> [  252.896250]  [<ffffffff811472fc>] handle_mm_fault+0x14c/0x1e0
> [  252.896254]  [<ffffffff8146ef47>] do_page_fault+0x257/0x550
> [  252.896260]  [<ffffffff8114c995>] ? do_mmap_pgoff+0x375/0x3a0
> [  252.896264]  [<ffffffff8146bfb6>] ? error_sti+0x5/0x6
> [  252.896269]  [<ffffffff81259175>] ? trace_hardirqs_off_thunk+0x3a/0x3c
> [  252.896274]  [<ffffffff8146bd75>] page_fault+0x25/0x30
> [  252.896277] ---[ end trace 000000000000ae6e ]---
> 
> I didn't see those warnings if !RT_FULL.
> 
> 

Here is the patch that seems to solve the problem. It all looks good on my box;
my only concern is how it will affect the RT code.


From 8e4fa4e9a7b510bdaf90b8140ce1e847375abccf Mon Sep 17 00:00:00 2001
From: Qiang Huang <h.huangqiang@huawei.com>
Date: Thu, 25 Apr 2013 10:22:01 +0800
Subject: [PATCH] sched: don't clear PF_THREAD_BOUND in select_fallback_rq

This is a revert of "sched-clear-pf-thread-bound-on-fallback-rq.patch"
(commit 0d939066acdcb in v3.4-rt).

select_fallback_rq() can easily be called during system boot, because
select_task_rq_fair() just returns task_cpu(p) for bound kernel threads,
which is 0 during system boot and not in tsk_cpus_allowed, so
select_fallback_rq() is called and PF_THREAD_BOUND is cleared. On my
box, about 1/3 of the bound kernel threads lose that flag after boot.

This causes problems. For example, the following command hangs the system:
# for pid in `ps -e -o pid`; do taskset -p -c 0-15 $pid; done

What's more, I don't see why we need to clear this flag any more,
because "cpu/rt: Rework cpu down for PREEMPT_RT" already removed the
optimization for PF_THREAD_BOUND in migrate_disable/enable.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
Signed-off-by: Li Zefan <lizefan@huawei.com>
---
 kernel/sched/core.c |    6 ------
 1 files changed, 0 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 751ec60..8db6e3b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1327,12 +1327,6 @@ out:
 		}
 	}

-	/*
-	 * Clear PF_THREAD_BOUND, otherwise we wreckage
-	 * migrate_disable/enable. See optimization for
-	 * PF_THREAD_BOUND tasks there.
-	 */
-	p->flags &= ~PF_THREAD_BOUND;
 	return dest_cpu;
 }

-- 
1.7.1
