From mboxrd@z Thu Jan 1 00:00:00 1970
From: Li Zefan
Subject: Re: cgroup_fj tests will stick the nort kernel
Date: Tue, 23 Apr 2013 13:51:58 +0800
Message-ID: <5176217E.8030008@huawei.com>
References: <5170F28F.3060002@huawei.com> <51750563.8050301@huawei.com>
 <1366646447.9609.131.camel@gandalf.local.home>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
Cc: Thomas Gleixner, Qiang Huang, linux-rt-users, zhangwei
To: Steven Rostedt
Return-path:
Received: from szxga02-in.huawei.com ([119.145.14.65]:52924 "EHLO
 szxga02-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1754963Ab3DWFwY (ORCPT );
 Tue, 23 Apr 2013 01:52:24 -0400
In-Reply-To: <1366646447.9609.131.camel@gandalf.local.home>
Sender: linux-rt-users-owner@vger.kernel.org
List-ID:

On 2013/4/23 0:00, Steven Rostedt wrote:
> On Mon, 2013-04-22 at 17:39 +0800, Li Zefan wrote:
>> On 2013/4/19 15:30, Qiang Huang wrote:
>>> Hi,
>>>
>>> I ran cgroup_fj tests on RT kernel with PREEMPT_RT_FULL disabled, it will
>>> stick the system when ran cpuset stress tests, it happens everytime.
>>>
>>> Here stick the system means there are almost no response from the system and
>>> we can hardly do anything on the terminal, but kernel isn't crash nor deadlocked
>>> (according to the lockdep message), and it may do some response sometimes.
>>>
>>> The problem exists on all RT versions from 3.4.18-rt29 to 3.4.37-rt51 AFAIK, but
>>> without RT patches or with PREEMPT_RT_FULL enabled, the problem isn't exists.
>>>
>>> When the system is stuck, we will get the following message:
>>> # dmesg
>>> ...
>>
>> I've found the culprit after some investigation:
>>
>> From: Thomas Gleixner
>> Date: Fri, 04 Nov 2011 19:48:36 +0000
>> Subject: sched-clear-pf-thread-bound-on-fallback-rq.patch
>>
>> At system boot when some cpus haven't been up, the scheduler calls select_fallback_rq()
>> and schedules tasks in other cpus, which ends up clearing some kernel threads'
>> PF_THREAD_BOUND flag...
>
> I'm curious to why this doesn't break when PREEMPT_RT_FULL is enabled. I
> would think it would also cause issues there too.
>

I was wrong in saying that PF_THREAD_BOUND is cleared because some cpus are
not online yet.

The real cause is that select_task_rq_fair() just returns prev_cpu, i.e.
task_cpu(p). That is 0 during system boot, or some other cpu after boot, and
it is not in tsk_cpus_allowed, so select_fallback_rq() is called and clears
PF_THREAD_BOUND.

I don't know why it didn't cause trouble for Huang Qiang when RT_FULL is
enabled, but I did encounter problems when testing on my box.

I can trigger the bug with cgroup_fj.sh, or with taskset:

# for pid in `ps -e -o pid`; do taskset -p -c 0-15 $pid; done

The system hang or task hangs may not happen during the test itself, but they
do show up after some random operations afterwards (try compiling a kernel).

While running the test I saw lots of warnings like this:

[ 146.702056] BUG: using smp_processor_id() in preemptible [00000000 00000000] code: kworker/4:0/23
[ 146.702069] caller is vmstat_update+0x22/0x60
[ 146.702075] Pid: 23, comm: kworker/4:0 Not tainted 3.4.24.05+ #49
[ 146.702077] Call Trace:
[ 146.702087] [] debug_smp_processor_id+0x145/0x150
[ 146.702091] [] vmstat_update+0x22/0x60
[ 146.702097] [] process_one_work+0x203/0x610
[ 146.702101] [] ? process_one_work+0x140/0x610
[ 146.702105] [] ? worker_thread+0x6d/0x450
[ 146.702109] [] ? refresh_cpu_vm_stats+0x1d0/0x1d0
[ 146.702114] [] worker_thread+0x1a6/0x450
[ 146.702118] [] ? manage_workers+0x250/0x250
[ 146.702122] [] kthread+0xb6/0xc0
[ 146.702130] [] kernel_thread_helper+0x4/0x10
[ 146.702137] [] ? finish_task_switch+0x90/0x100
[ 146.702142] [] ? retint_restore_args+0x13/0x13
[ 146.702145] [] ? kthreadd+0x310/0x310
[ 146.702149] [] ? gs_change+0x13/0x13

After a while those warnings stopped, and warnings like this popped up
instead, even after I stopped the test:

[ 252.896103] ------------[ cut here ]------------
[ 252.896107] WARNING: at kernel/cpu.c:157 unpin_current_cpu+0x7d/0x90()
[ 252.896110] Hardware name: Tecal RH2285
[ 252.896112] Modules linked in: iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi bridge ipv6 stp llc cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf binfmt_misc fuse loop dm_mod tpm_tis tpm coretemp crc32c_intel ghash_clmulni_intel aesni_intel sg serio_raw cryptd aes_x86_64 tpm_bios microcode i2c_i801 iTCO_wdt i2c_core bnx2 iTCO_vendor_support mptctl button usbhid hid uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif edd ext3 mbcache jbd fan processor ide_pci_generic ide_core ata_generic ata_piix libata mptsas mptscsih mptbase scsi_transport_sas scsi_mod thermal thermal_sys hwmon
[ 252.896201] Pid: 9893, comm: dmesg Tainted: G W 3.4.24.05+ #49
[ 252.896203] Call Trace:
[ 252.896208] [] ? unpin_current_cpu+0x7d/0x90
[ 252.896212] [] ? unpin_current_cpu+0x7d/0x90
[ 252.896217] [] warn_slowpath_common+0x7f/0xc0
[ 252.896221] [] warn_slowpath_null+0x1a/0x20
[ 252.896226] [] unpin_current_cpu+0x7d/0x90
[ 252.896231] [] migrate_enable+0xeb/0x1e0
[ 252.896235] [] handle_pte_fault+0x34b/0x980
[ 252.896240] [] ? get_parent_ip+0x11/0x50
[ 252.896244] [] ? get_parent_ip+0x11/0x50
[ 252.896250] [] handle_mm_fault+0x14c/0x1e0
[ 252.896254] [] do_page_fault+0x257/0x550
[ 252.896260] [] ? do_mmap_pgoff+0x375/0x3a0
[ 252.896264] [] ? error_sti+0x5/0x6
[ 252.896269] [] ? trace_hardirqs_off_thunk+0x3a/0x3c
[ 252.896274] [] page_fault+0x25/0x30
[ 252.896277] ---[ end trace 000000000000ae6e ]---

I didn't see those warnings if !RT_FULL.
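
For readers who want the sequence above spelled out, here is a toy Python model of it. This is only a sketch, not kernel code: the function names mirror the kernel ones mentioned in this mail, but the bodies are drastic simplifications, the flag bit value is illustrative, and the `Task` class and the `online` parameter are hypothetical helpers. Where exactly the real select_fallback_rq() clears the flag is not modelled; the sketch just follows the description given here.

```python
from dataclasses import dataclass

# Illustrative bit only; not the value used by the kernel.
PF_THREAD_BOUND = 0x1


@dataclass
class Task:
    """Hypothetical stand-in for the few task_struct fields discussed."""
    cpu: int            # task_cpu(p): the CPU the task was last assigned to
    cpus_allowed: set   # tsk_cpus_allowed(p)
    flags: int = 0


def select_task_rq_fair(p):
    # Simplified: in the case described, the fair class just returns prev_cpu.
    return p.cpu


def select_fallback_rq(p, online):
    # Per the description in this mail: with the RT patch applied, taking
    # the fallback path clears PF_THREAD_BOUND on the task.
    p.flags &= ~PF_THREAD_BOUND
    candidates = sorted(p.cpus_allowed & online)
    if not candidates:
        raise RuntimeError("no allowed online CPU in this toy model")
    return candidates[0]


def select_task_rq(p, online):
    cpu = select_task_rq_fair(p)
    # prev_cpu is outside the allowed mask (or offline): use the fallback.
    if cpu not in p.cpus_allowed or cpu not in online:
        cpu = select_fallback_rq(p, online)
    return cpu


# A per-cpu kworker bound to CPU 4 whose task_cpu() is still 0.
online = set(range(16))
kworker = Task(cpu=0, cpus_allowed={4}, flags=PF_THREAD_BOUND)
chosen = select_task_rq(kworker, online)

# A bound task whose prev_cpu is already allowed never hits the fallback.
bound_ok = Task(cpu=4, cpus_allowed={4}, flags=PF_THREAD_BOUND)
chosen_ok = select_task_rq(bound_ok, online)
```

In this model the first task ends up on an allowed CPU but without PF_THREAD_BOUND, while the second keeps the flag because the fallback path is never taken. That matches the claim that the flag is lost whenever prev_cpu falls outside tsk_cpus_allowed, regardless of whether all cpus are online.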