From mboxrd@z Thu Jan 1 00:00:00 1970
From: Qiang Huang
Subject: Re: cgroup_fj tests will stick the nort kernel
Date: Thu, 25 Apr 2013 14:11:46 +0800
Message-ID: <5178C922.3060506@huawei.com>
References: <5170F28F.3060002@huawei.com> <51750563.8050301@huawei.com> <1366646447.9609.131.camel@gandalf.local.home> <5176217E.8030008@huawei.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
Cc: Steven Rostedt, Thomas Gleixner, linux-rt-users, zhangwei
To: Li Zefan
Return-path:
Received: from szxga01-in.huawei.com ([119.145.14.64]:59648 "EHLO szxga01-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751332Ab3DYGTk (ORCPT ); Thu, 25 Apr 2013 02:19:40 -0400
In-Reply-To: <5176217E.8030008@huawei.com>
Sender: linux-rt-users-owner@vger.kernel.org
List-ID:

Hi Steven,

A patch follows the comment, could you take a look?

On 2013/4/23 13:51, Li Zefan wrote:
> On 2013/4/23 0:00, Steven Rostedt wrote:
>> On Mon, 2013-04-22 at 17:39 +0800, Li Zefan wrote:
>>> On 2013/4/19 15:30, Qiang Huang wrote:
>>>> Hi,
>>>>
>>>> I ran cgroup_fj tests on RT kernel with PREEMPT_RT_FULL disabled, it will
>>>> stick the system when ran cpuset stress tests, it happens everytime.
>>>>
>>>> Here stick the system means there are almost no response from the system and
>>>> we can hardly do anything on the terminal, but kernel isn't crash nor deadlocked
>>>> (according to the lockdep message), and it may do some response sometimes.
>>>>
>>>> The problem exists on all RT versions from 3.4.18-rt29 to 3.4.37-rt51 AFAIK, but
>>>> without RT patches or with PREEMPT_RT_FULL enabled, the problem isn't exists.
>>>>
>>>> When the system is stuck, we will get the following message:
>>>> # dmesg
>>>> ...
>>>
>>> I've found the culprit after some investigation:
>>>
>>> From: Thomas Gleixner
>>> Date: Fri, 04 Nov 2011 19:48:36 +0000
>>> Subject: sched-clear-pf-thread-bound-on-fallback-rq.patch
>>>
>>> At system boot when some cpus haven't been up, the scheduler calls select_fallback_rq()
>>> and schedules tasks in other cpus, which ends up clearing some kernel threads'
>>> PF_THREAD_BOUND flag...
>>
>> I'm curious to why this doesn't break when PREEMPT_RT_FULL is enabled. I
>> would think it would also cause issues there too.
>>
>
> I was wrong in saying that PF_THREAD_BOUND is cleared because some cpus are not
> online yet. It's because select_task_rq_fair() just returns prev_cpu, which is
> task_cpu(p), which is 0 during system boot or some other cpu after boot, which
> is not in tsk_cpus_allowed, so select_fallback_rq() is called and it clears
> PF_THREAD_BOUND.
>
> I don't know why it didn't cause trouble when RT_FULL is enabled for Huang Qiang,

I retested it, and we do have similar trouble with RT_FULL enabled; I might
have missed some config that avoided these warnings.
In the patch below, I added your Signed-off-by, if that looks good to you.

> but I did encoutner problems when testing in my box.
>
> I can trigger the bug with cgroup_fj.sh, or with taskset:
>
> # for pid in `ps -e -o pid`; do taskset -p -c 0-15 $pid; done
>
> But system hung or tasks hung may not happen right in the test, but will happen
> after some random operations (try compile kernel).
>
> And while running test I saw lots of warnings like this:
>
> [ 146.702056] BUG: using smp_processor_id() in preemptible [00000000 00000000] code: kworker/4:0/23
> [ 146.702069] caller is vmstat_update+0x22/0x60
> [ 146.702075] Pid: 23, comm: kworker/4:0 Not tainted 3.4.24.05+ #49
> [ 146.702077] Call Trace:
> [ 146.702087] [] debug_smp_processor_id+0x145/0x150
> [ 146.702091] [] vmstat_update+0x22/0x60
> [ 146.702097] [] process_one_work+0x203/0x610
> [ 146.702101] [] ? process_one_work+0x140/0x610
> [ 146.702105] [] ? worker_thread+0x6d/0x450
> [ 146.702109] [] ? refresh_cpu_vm_stats+0x1d0/0x1d0
> [ 146.702114] [] worker_thread+0x1a6/0x450
> [ 146.702118] [] ? manage_workers+0x250/0x250
> [ 146.702122] [] kthread+0xb6/0xc0
> [ 146.702130] [] kernel_thread_helper+0x4/0x10
> [ 146.702137] [] ? finish_task_switch+0x90/0x100
> [ 146.702142] [] ? retint_restore_args+0x13/0x13
> [ 146.702145] [] ? kthreadd+0x310/0x310
> [ 146.702149] [] ? gs_change+0x13/0x13
>
> and after a while those warnings stopped, instead warnings like this popped up,
> even after I stopped the test:
>
> [ 252.896103] ------------[ cut here ]------------
> [ 252.896107] WARNING: at kernel/cpu.c:157 unpin_current_cpu+0x7d/0x90()
> [ 252.896110] Hardware name: Tecal RH2285
> [ 252.896112] Modules linked in: iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi bridge ipv6 stp llc cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf binfmt_misc fuse loop dm_mod tpm_tis tpm coretemp crc32c_intel ghash_clmulni_intel aesni_intel sg serio_raw cryptd aes_x86_64 tpm_bios microcode i2c_i801 iTCO_wdt i2c_core bnx2 iTCO_vendor_support mptctl button usbhid hid uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif edd ext3 mbcache jbd fan processor ide_pci_generic ide_core ata_generic ata_piix libata mptsas mptscsih mptbase scsi_transport_sas scsi_mod thermal thermal_sys hwmon
> [ 252.896201] Pid: 9893, comm: dmesg Tainted: G W 3.4.24.05+ #49
> [ 252.896203] Call Trace:
> [ 252.896208] [] ? unpin_current_cpu+0x7d/0x90
> [ 252.896212] [] ? unpin_current_cpu+0x7d/0x90
> [ 252.896217] [] warn_slowpath_common+0x7f/0xc0
> [ 252.896221] [] warn_slowpath_null+0x1a/0x20
> [ 252.896226] [] unpin_current_cpu+0x7d/0x90
> [ 252.896231] [] migrate_enable+0xeb/0x1e0
> [ 252.896235] [] handle_pte_fault+0x34b/0x980
> [ 252.896240] [] ? get_parent_ip+0x11/0x50
> [ 252.896244] [] ? get_parent_ip+0x11/0x50
> [ 252.896250] [] handle_mm_fault+0x14c/0x1e0
> [ 252.896254] [] do_page_fault+0x257/0x550
> [ 252.896260] [] ? do_mmap_pgoff+0x375/0x3a0
> [ 252.896264] [] ? error_sti+0x5/0x6
> [ 252.896269] [] ? trace_hardirqs_off_thunk+0x3a/0x3c
> [ 252.896274] [] page_fault+0x25/0x30
> [ 252.896277] ---[ end trace 000000000000ae6e ]---
>
> I didn't see those warnings if !RT_FULL.
>
>

Here is the patch that seems to solve the problem. It looks all good on my box;
my only concern is how this will affect our RT code.

>>From 8e4fa4e9a7b510bdaf90b8140ce1e847375abccf Mon Sep 17 00:00:00 2001
From: Qiang Huang
Date: Thu, 25 Apr 2013 10:22:01 +0800
Subject: [PATCH] sched: don't clear PF_THREAD_BOUND in select_fallback_rq

This is a revert of "sched-clear-pf-thread-bound-on-fallback-rq.patch"
(commit 0d939066acdcb in v3.4-rt).

select_fallback_rq() can easily be called during system boot, because
select_task_rq_fair() just returns task_cpu(p) for bound kernel threads,
which is 0 during system boot and not in tsk_cpus_allowed, so
select_fallback_rq() is called and PF_THREAD_BOUND is cleared.

On my box, 1/3 of the bound kernel threads have that flag cleared after
boot. This causes problems. For example:

  # for pid in `ps -e -o pid`; do taskset -p -c 0-15 $pid; done

This command causes the system to hang.

What's more, I don't see why we need to clear this flag any more, because
"cpu/rt: Rework cpu down for PREEMPT_RT" already removed the optimization
for PF_THREAD_BOUND in migrate_disable/enable.

Signed-off-by: Qiang Huang
Signed-off-by: Li Zefan
---
 kernel/sched/core.c |    6 ------
 1 files changed, 0 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 751ec60..8db6e3b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1327,12 +1327,6 @@ out:
 		}
 	}
 
-	/*
-	 * Clear PF_THREAD_BOUND, otherwise we wreckage
-	 * migrate_disable/enable. See optimization for
-	 * PF_THREAD_BOUND tasks there.
-	 */
-	p->flags &= ~PF_THREAD_BOUND;
 	return dest_cpu;
 }
-- 
1.7.1
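
To make the failure mode from the commit message easier to follow, here is a
minimal userspace sketch; it is not kernel code. The struct model_task, every
model_* helper, MODEL_NR_CPUS and the bitmask used for the allowed CPUs are
assumptions made up for illustration, and the PF_THREAD_BOUND value is only a
placeholder. The refusal to change the affinity of a bound thread is modelled
on the check in set_cpus_allowed_ptr(), simplified (the real check also
exempts current).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define PF_THREAD_BOUND 0x04000000u	/* placeholder value for this model */
#define MODEL_NR_CPUS 16		/* hypothetical CPU count */

/* Hypothetical stand-in for the few task_struct fields that matter here. */
struct model_task {
	unsigned int flags;
	unsigned long cpus_allowed;	/* bit n set: CPU n is allowed */
	int cpu;			/* what task_cpu(p) would return */
};

/*
 * Models select_fallback_rq(): pick the lowest allowed CPU. When
 * clear_bound is true it also drops PF_THREAD_BOUND, like the lines
 * the patch removes.
 */
static int model_select_fallback_rq(struct model_task *p, bool clear_bound)
{
	int dest_cpu = -1;
	int cpu;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		if (p->cpus_allowed & (1UL << cpu)) {
			dest_cpu = cpu;
			break;
		}
	}
	if (clear_bound)
		p->flags &= ~PF_THREAD_BOUND;
	return dest_cpu;
}

/*
 * Models a wakeup: the first choice is simply the previous CPU (as
 * select_task_rq_fair() does for these threads); if that CPU is not
 * allowed, the fallback path runs.
 */
static int model_wake_up(struct model_task *p, bool clear_bound)
{
	int cpu = p->cpu;

	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	if (!(p->cpus_allowed & (1UL << cpu)))
		cpu = model_select_fallback_rq(p, clear_bound);
	p->cpu = cpu;
	return cpu;
}

/* Models the affinity change requested by taskset. */
static int model_set_cpus_allowed(struct model_task *p, unsigned long new_mask)
{
	if (p->flags & PF_THREAD_BOUND)
		return -EINVAL;		/* bound threads must not be moved */
	p->cpus_allowed = new_mask;
	return 0;
}
```

In this model, a thread bound to CPU 4 whose task_cpu() is still 0 goes
through the fallback path on its first wakeup. With clear_bound set to true it
loses PF_THREAD_BOUND, so a later model_set_cpus_allowed() call succeeds and
widens its mask, which corresponds to the taskset loop moving per-cpu
kworkers. With clear_bound set to false the flag survives and the call returns
-EINVAL, leaving the mask untouched.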