From mboxrd@z Thu Jan 1 00:00:00 1970
From: Sebastian Andrzej Siewior
Subject: Re: [PATCH] sched: don't clear PF_THREAD_BOUND in select_fallback_rq
Date: Fri, 3 May 2013 22:46:10 +0200
Message-ID: <20130503204610.GG8230@linutronix.de>
References: <5178F0DE.8030808@huawei.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Cc: Steven Rostedt, Thomas Gleixner, linux-rt-users, Li Zefan, zhangwei
To: Qiang Huang
Return-path:
Received: from www.linutronix.de ([62.245.132.108]:46770 "EHLO Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1763522Ab3ECUqM (ORCPT ); Fri, 3 May 2013 16:46:12 -0400
Content-Disposition: inline
In-Reply-To: <5178F0DE.8030808@huawei.com>
Sender: linux-rt-users-owner@vger.kernel.org
List-ID:

* Qiang Huang | 2013-04-25 17:01:18 [+0800]:

>This is revert of "sched-clear-pf-thread-bound-on-fallback-rq.patch"
>(commit 0d939066acdcb in v3.4-rt).
>
>Select_fallback_rq() can be easily called during system boot, because
>select_task_rq_fair() just return task_cpu(p) for bounded kernel threads,
>which is 0 during system boot and not in tsk_cpus_allowed, so
>select_fallback_rq() is called and PF_THREAD_BOUND is cleared. In my
>box, 1/3 bounded kernel threads will clear that flag after boot.
>
>And it will cause problems, for example:
># for pid in `ps -e -o pid`; do taskset -p -c 0-15 $pid; done
>this command will cause system hung.
>
>What's more, I don't see why we need to clear this flag any more,
>because "cpu/rt: Rework cpu down for PREEMPT_RT" already remove the
>optimization for PF_THREAD_BOUND on migrate_disable/enable.
>
>Signed-off-by: Qiang Huang

I can execute the command you mention above on v3.4 and v3.8 with no
hangs. Can you give me the number of CPUs you have and maybe the config
or some other detail? I played a little with it on v3.8.
The code you asked to remove triggers only on CPU down, and only for
kernel threads which do not use the park/unpark infrastructure. That
means "posixcputmr" and "migration", both of which get removed later.
The only reason why "migration" pops up is so it can leave.

I managed to trigger it for worker threads as well. The threads which
were bound to the CPU that went down are marked DISASSOCIATED in
gcwq_unbind_fn(), and we lose the PF_THREAD_BOUND flag once such a
thread is used. After the CPU comes back, the thread is assigned to the
"old" CPU via worker_maybe_bind_and_lock(), but the PF_THREAD_BOUND
flag is still missing. So that is not looking that good. Will look at
this later.

Sebastian
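
For illustration only, the worker-thread sequence described above can be
modelled in a few lines of userspace C. This is a toy sketch, not kernel
code: struct toy_thread, TOY_THREAD_BOUND and all toy_*() functions are
made-up names, the flag value is arbitrary, and the real scheduler and
workqueue paths are far more involved. It only assumes what the mail
states: the fallback path clears the bound flag, and rebinding to the
old CPU does not set it again.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bit; the value is arbitrary, not taken from kernel headers. */
#define TOY_THREAD_BOUND 0x1u

struct toy_thread {
    unsigned int flags; /* stands in for the task flags */
    int cpu;            /* CPU the thread currently runs on */
    int bound_cpu;      /* CPU the thread is meant to be bound to */
};

/* Fallback path: pick the first online CPU and drop the bound flag. */
static void toy_select_fallback(struct toy_thread *t,
                                const bool *cpu_online, int ncpus)
{
    for (int cpu = 0; cpu < ncpus; cpu++) {
        if (cpu_online[cpu]) {
            t->cpu = cpu;
            t->flags &= ~TOY_THREAD_BOUND;
            return;
        }
    }
}

/* Using the thread: stay on the bound CPU if it is online, else fall back. */
static void toy_wake(struct toy_thread *t, const bool *cpu_online, int ncpus)
{
    if (cpu_online[t->bound_cpu])
        t->cpu = t->bound_cpu;
    else
        toy_select_fallback(t, cpu_online, ncpus);
}

/* Rebinding after the CPU returns: moves the thread, leaves the flags alone. */
static void toy_rebind(struct toy_thread *t)
{
    t->cpu = t->bound_cpu;
}

/* CPU 1 goes down, the worker is used, CPU 1 returns, the worker is rebound. */
static struct toy_thread toy_scenario(void)
{
    bool online[2] = { true, true };
    struct toy_thread w = {
        .flags = TOY_THREAD_BOUND, .cpu = 1, .bound_cpu = 1
    };

    online[1] = false;       /* CPU 1 goes down */
    toy_wake(&w, online, 2); /* worker is used while CPU 1 is offline */
    assert(w.cpu == 0);      /* it had to run somewhere else */
    online[1] = true;        /* CPU 1 comes back */
    toy_rebind(&w);          /* worker returns to its "old" CPU */
    return w;
}
```

Calling toy_scenario() returns a thread that is back on CPU 1 but no
longer has TOY_THREAD_BOUND set, which is the inconsistent state the
mail describes for rebound workers.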