* Re: race condition in schedule_on_each_cpu()
  [not found] <tencent_0777D84B54B4163A3B85255A@qq.com>

From: Tejun Heo @ 2013-06-06 21:23 UTC
To: 韦奇; +Cc: torvalds, linux-kernel

Hello,

On Thu, Jun 06, 2013 at 06:14:46PM +0800, 韦奇 wrote:
> Hello, Tejun Heo,
>
> Thanks for your help.
>
> 1) I've tested two kernel versions against this problem:
>
>    latest 3.10-rc3 (https://www.kernel.org/pub/linux/kernel/v3.x/testing/linux-3.10-rc3.tar.xz)
>    latest 3.0 branch, 3.0.80 (https://www.kernel.org/pub/linux/kernel/v3.x/linux-3.0.80.tar.xz)
>
> Both work fine when hot-removing a raid disk.

Thanks for verifying.

> | preemption          | machine 1 | machine 2 | kversion    |
> -------------------------------------------------------------
> | Fully Preemptible   | stuck     | not stuck | 3.0.30-rt50 |
> | Low-Latency Desktop | not stuck | not stuck | 3.0.30-rt50 |
> | Low-Latency Desktop | not stuck | --        | 3.0.30      |
> | default             | not stuck | --        | 3.0.80      |
> | default             | not stuck | --        | 3.10-rc3    |
>
> Could you suggest some way to debug this problem? For example, how
> do I debug a workqueue deadlock? I want to find the deadlock point.

I looked through the logs but the only worker depletion related
patches which pop up are around CPU hotplugs, so I don't think they
apply here.

If the problem is relatively easy to reproduce && you can't move onto
a newer kernel, I'm afraid bisection probably is the best option.

Thanks!

--
tejun
* Re: race condition in schedule_on_each_cpu()

From: weiqi @ 2013-06-07  1:34 UTC
To: Tejun Heo; +Cc: 韦奇, torvalds, linux-kernel

It's preemption-mode related. On 3.0.30-rt50, only configuring the
kernel with the highest preemption level ("Fully Preemptible Kernel
(RT)") in the preemption model causes the problem.

Even when I use "Preemptible Kernel" or "Preemptible Kernel
(Low-Latency Desktop)", the problem does not happen.

> I looked through the logs but the only worker depletion related
> patches which pop up are around CPU hotplugs, so I don't think they
> apply here. If the problem is relatively easy to reproduce && you
> can't move onto a newer kernel, I'm afraid bisection probably is the
> best option. Thanks!
* Re: race condition in schedule_on_each_cpu()

From: weiqi @ 2013-06-07  2:24 UTC
To: Tejun Heo; +Cc: weiqi@kylinos.com.cn, torvalds, linux-kernel

In a previous message, you mentioned:

>> By the way, I'm wondering what the race condition was before, which
>> doesn't exist now.
>
> Before the commit you originally quoted, the calling thread could be
> preempted and migrated to another CPU before get_online_cpus(), thus
> ending up executing the function twice on the new CPU but skipping
> the old one.

Will this situation happen with the "Full preemption" config on
3.0.30-rt50?

On 2013-06-07 09:34, weiqi@kylinos.com.cn wrote:
> It's preemption-mode related. On 3.0.30-rt50, only configuring the
> kernel with the highest preemption level ("Fully Preemptible Kernel
> (RT)") in the preemption model causes the problem.
>
> Even when I use "Preemptible Kernel" or "Preemptible Kernel
> (Low-Latency Desktop)", the problem does not happen.
>
>> I looked through the logs but the only worker depletion related
>> patches which pop up are around CPU hotplugs, so I don't think they
>> apply here. If the problem is relatively easy to reproduce && you
>> can't move onto a newer kernel, I'm afraid bisection probably is
>> the best option. Thanks!
* Re: race condition in schedule_on_each_cpu()

From: Tejun Heo @ 2013-06-07 23:22 UTC
To: weiqi@kylinos.com.cn; +Cc: torvalds, linux-kernel

On Fri, Jun 07, 2013 at 10:24:50AM +0800, weiqi@kylinos.com.cn wrote:
>> Before the commit you originally quoted, the calling thread could
>> be preempted and migrated to another CPU before get_online_cpus(),
>> thus ending up executing the function twice on the new CPU but
>> skipping the old one.
>
> Will this situation happen with the "Full preemption" config on
> 3.0.30-rt50?

Sure, if the fix hasn't been backported.

Thanks.

--
tejun
[parent not found: <51B27744.6090507@kylinos.com.cn>]
* Re: race condition in schedule_on_each_cpu()

From: weiqi @ 2013-06-08 11:30 UTC
To: Tejun Heo; +Cc: weiqi@kylinos.com.cn, torvalds, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 403 bytes --]

Hello Tejun Heo,

I've backported the schedule_on_each_cpu() "direct execution" patch
onto 3.0.30-rt50, and it fixed my problem. The attachment is the
effective patch.

However, I do not understand why machine 1 exposes the problem while
machine 2 does not. My guess: since this is related to the rt-kernel's
preemption level, is the difference due to CPU performance? What do
you think?

Thank you~

[-- Attachment #2: direct_execution.patch --]
[-- Type: text/plain, Size: 1170 bytes --]

diff -up linux-3.0.30-rt50/kernel/workqueue.c.bak linux-3.0.30-rt50/kernel/workqueue.c
--- linux-3.0.30-rt50/kernel/workqueue.c.bak	2013-06-08 19:09:06.801059232 +0800
+++ linux-3.0.30-rt50/kernel/workqueue.c	2013-06-08 19:09:15.680069626 +0800
@@ -1922,6 +1922,7 @@ static int worker_thread(void *__worker)
 
 	/* tell the scheduler that this is a workqueue worker */
 	worker->task->flags |= PF_WQ_WORKER;
+	smp_mb();
 woke_up:
 	spin_lock_irq(&gcwq->lock);
 
@@ -2736,6 +2737,7 @@ EXPORT_SYMBOL(schedule_delayed_work_on);
 int schedule_on_each_cpu(work_func_t func)
 {
 	int cpu;
+	int orig = -1;
 	struct work_struct __percpu *works;
 
 	works = alloc_percpu(struct work_struct);
@@ -2744,13 +2746,20 @@ int schedule_on_each_cpu(work_func_t func)
 
 	get_online_cpus();
 
+	if (current->flags & PF_WQ_WORKER)
+		orig = raw_smp_processor_id();
+
 	for_each_online_cpu(cpu) {
 		struct work_struct *work = per_cpu_ptr(works, cpu);
 
 		INIT_WORK(work, func);
-		schedule_work_on(cpu, work);
+		if (cpu != orig)
+			schedule_work_on(cpu, work);
 	}
 
+	if (orig >= 0)
+		func(per_cpu_ptr(works, orig));
+
 	for_each_online_cpu(cpu)
 		flush_work(per_cpu_ptr(works, cpu));
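For reference, here is roughly how schedule_on_each_cpu() reads once
the patch above is applied. This is a sketch reconstructed from the
diff plus the mainline function of that era (the epilogue with
put_online_cpus()/free_percpu() is assumed, since the diff is cut
short); it has not been verified against an actual 3.0.30-rt50 tree:

int schedule_on_each_cpu(work_func_t func)
{
	int cpu;
	int orig = -1;
	struct work_struct __percpu *works;

	works = alloc_percpu(struct work_struct);
	if (!works)
		return -ENOMEM;

	get_online_cpus();

	/*
	 * When called from a workqueue worker, remember the local CPU
	 * so func() can be run directly instead of being queued on the
	 * worker's own pool; waiting on a work item queued behind
	 * ourselves is presumably what deadlocks once the pool runs
	 * out of spare workers.
	 */
	if (current->flags & PF_WQ_WORKER)
		orig = raw_smp_processor_id();

	for_each_online_cpu(cpu) {
		struct work_struct *work = per_cpu_ptr(works, cpu);

		INIT_WORK(work, func);
		if (cpu != orig)
			schedule_work_on(cpu, work);
	}

	/* run the local CPU's work item directly, off the workqueue path */
	if (orig >= 0)
		func(per_cpu_ptr(works, orig));

	for_each_online_cpu(cpu)
		flush_work(per_cpu_ptr(works, cpu));

	put_online_cpus();
	free_percpu(works);
	return 0;
}

Note that orig is sampled only after get_online_cpus(), and a
workqueue worker is normally bound to its CPU, so the direct call
should stay on the sampled CPU rather than reintroducing the old
migration race discussed later on this page.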
[parent not found: <51A7FFE8.6060204@kylinos.com.cn>]
[parent not found: <20130531023246.GD30479@mtj.dyndns.org>]
[parent not found: <51A821F3.1000605@kylinos.com.cn>]
* Re: race condition in schedule_on_each_cpu()

From: Tejun Heo @ 2013-05-31  5:03 UTC
To: weiqi@kylinos.com.cn; +Cc: torvalds, linux-kernel

On Fri, May 31, 2013 at 12:07:15PM +0800, weiqi@kylinos.com.cn wrote:
>> The only way for them to get stuck is if there aren't enough
>> execution resources (ie. if a new thread can't be created) but OOM
>> killers would have been activated if that were the case.
>
> The following is a detailed description of our scenario:
>
> 1. After turning off the disk array, the ps results are shown in
>    *ps*, which indicates that kworker/1:0 and kworker/1:2 are stuck.
>
> 2. The call stacks for the kworkers are shown in *stack_xxx.txt*.
>
> 3. The workqueue operations during that period are shown in
>    *out.txt*, captured with ftrace. (We added a new tracepoint,
>    workqueue_queue_work_insert, immediately before insert_wq_barrier
>    in the function start_flush_work; its implementation is shown in
>    *trace_insert_wq_barrier.txt*.)
>    From the results in *grep_kwork1:0_from_out.txt* we can see that
>    kworker/1:0 is stuck after starting the fc_starget_delete work at
>    time 360.801271, and we catch the insert_wq_barrier trace info
>    right behind it.
>
> 4. From out.txt we can see that altogether three fc_starget_delete
>    works were enqueued. After the point of deadlock, kworker/1:1 and
>    kworker/1:3 are still executing ...
>
> 5. If we let scsi_transport_fc use only one worker thread, i.e.,
>    change scsi_transport_fc.c:fc_host_setup() from
>       alloc_workqueue(fc_host->work_q_name, 0, 0)
>    to
>       alloc_workqueue(fc_host->work_q_name, WQ_UNBOUND, 1)
>    and from
>       alloc_workqueue(fc_host->devloss_work_q_name, 0, 0)
>    to
>       alloc_workqueue(fc_host->devloss_work_q_name, WQ_UNBOUND, 1),
>    the deadlock won't occur.
>
>> Can you please test a recent kernel? How easily can you reproduce
>> the issue?
>
> It occurs every time we hot-remove the disk array.
>
> I'll test a recent kernel after a while, but this problem in 3.0.30
> really confuses me.

Yeah, it definitely sounds like concurrency depletion. There have
been some fixes and substantial changes in the area, so I really
wanna find out whether the problem is reproducible in a recent
vanilla kernel - say, v3.9 or, even better, v3.10-rc2. Can you please
try to reproduce the problem with a newer kernel?

> By the way, I'm wondering what the race condition was before, which
> doesn't exist now.

Before the commit you originally quoted, the calling thread could be
preempted and migrated to another CPU before get_online_cpus(), thus
ending up executing the function twice on the new CPU but skipping
the old one.

Thanks.

--
tejun
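To make the race concrete, here is an illustrative sketch of the
pre-fix control flow, reconstructed from Tejun's description above
rather than from the exact historical source (in the real code of
that era, only keventd/workqueue workers took the direct-execution
path, but the shape of the race is the same):

int schedule_on_each_cpu(work_func_t func)	/* pre-fix sketch */
{
	int cpu;
	int orig;
	struct work_struct __percpu *works;

	works = alloc_percpu(struct work_struct);
	if (!works)
		return -ENOMEM;

	/*
	 * The local CPU is sampled *before* get_online_cpus().  Race
	 * window: nothing pins the thread to this CPU here, so on a
	 * fully preemptible kernel it can be preempted and migrated
	 * to another CPU right after this line.
	 */
	orig = raw_smp_processor_id();

	get_online_cpus();
	for_each_online_cpu(cpu) {
		struct work_struct *work = per_cpu_ptr(works, cpu);

		INIT_WORK(work, func);
		if (cpu != orig)	/* skips the CPU we *were* on... */
			schedule_work_on(cpu, work);
	}

	/*
	 * ...but the direct call below runs on whichever CPU we are
	 * on *now*.  After a migration, func() runs twice on the new
	 * CPU (once via schedule_work_on() above, once here) and
	 * never on the skipped CPU "orig" - exactly the
	 * double-execution/skip Tejun describes.
	 */
	func(per_cpu_ptr(works, orig));

	for_each_online_cpu(cpu)
		flush_work(per_cpu_ptr(works, cpu));
	put_online_cpus();
	free_percpu(works);
	return 0;
}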
Thread overview: 6+ messages
[not found] <tencent_0777D84B54B4163A3B85255A@qq.com>
2013-06-06 21:23 ` race condition in schedule_on_each_cpu() Tejun Heo
2013-06-07 1:34 ` weiqi
2013-06-07 2:24 ` weiqi
2013-06-07 23:22 ` Tejun Heo
[not found] ` <51B27744.6090507@kylinos.com.cn>
2013-06-08 11:30 ` weiqi
[not found] <51A7FFE8.6060204@kylinos.com.cn>
[not found] ` <20130531023246.GD30479@mtj.dyndns.org>
[not found] ` <51A821F3.1000605@kylinos.com.cn>
2013-05-31 5:03 ` Tejun Heo