* Re: race condition in schedule_on_each_cpu()
  [not found] <tencent_0777D84B54B4163A3B85255A@qq.com>

From: Tejun Heo @ 2013-06-06 21:23 UTC
To: 韦奇; +Cc: torvalds, linux-kernel

Hello,

On Thu, Jun 06, 2013 at 06:14:46PM +0800, 韦奇 wrote:
> Hello, Tejun Heo,
>
> Thanks for your help.
>
> 1) I've tested two kernel versions against this problem:
>
>    latest 3.10-rc3 (https://www.kernel.org/pub/linux/kernel/v3.x/testing/linux-3.10-rc3.tar.xz)
>    latest 3.0 branch, 3.0.80 (https://www.kernel.org/pub/linux/kernel/v3.x/linux-3.0.80.tar.xz)
>
> Both work fine when hot-removing a raid disk.

Thanks for verifying.

> | preemption          | machine 1 | machine 2 | kversion    |
> -------------------------------------------------------------
> | Fully Preemptible   | stuck     | not stuck | 3.0.30-rt50 |
> | Low-Latency Desktop | not stuck | not stuck | 3.0.30-rt50 |
> | Low-Latency Desktop | not stuck | --        | 3.0.30      |
> | default             | not stuck | --        | 3.0.80      |
> | default             | not stuck | --        | 3.10-rc3    |
>
> Could you suggest some way to debug this problem? For example, how
> do I debug a workqueue deadlock? I want to find the deadlock point.

I looked through the logs but the only worker depletion related
patches which pop up are around CPU hotplugs, so I don't think they
apply here.

If the problem is relatively easy to reproduce && you can't move onto
a newer kernel, I'm afraid bisection probably is the best option.

Thanks!

--
tejun
* Re: race condition in schedule_on_each_cpu()

From: weiqi @ 2013-06-07  1:34 UTC
To: Tejun Heo; +Cc: 韦奇, torvalds, linux-kernel

It's preemption-mode related. On 3.0.30-rt50, only configuring the
kernel with the highest preemption level ("Fully Preemptible Kernel
(RT)") in the preemption model causes the problem.

Even when I use "Preemptible Kernel" or "Preemptible Kernel
(Low-Latency Desktop)", the problem does not happen.

> I looked through the logs but the only worker depletion related
> patches which pop up are around CPU hotplugs, so I don't think they
> apply here. If the problem is relatively easy to reproduce && you
> can't move onto a newer kernel, I'm afraid bisection probably is the
> best option. Thanks!
* Re: race condition in schedule_on_each_cpu()

From: weiqi @ 2013-06-07  2:24 UTC
To: Tejun Heo; +Cc: weiqi@kylinos.com.cn, torvalds, linux-kernel

In a previous message, you mentioned:

>> By the way, I'm wondering what the race condition was before, which
>> doesn't exist now.
>
> Before the commit you originally quoted, the calling thread could be
> preempted and migrated to another CPU before get_online_cpus(), thus
> ending up executing the function twice on the new CPU but skipping
> the old one.

Will this situation happen with the "Full preemption" config on
3.0.30-rt50?

On 2013-06-07 09:34, weiqi@kylinos.com.cn wrote:
> It's preemption-mode related. On 3.0.30-rt50, only configuring the
> kernel with the highest preemption level ("Fully Preemptible Kernel
> (RT)") in the preemption model causes the problem.
>
> Even when I use "Preemptible Kernel" or "Preemptible Kernel
> (Low-Latency Desktop)", the problem does not happen.
>
>> I looked through the logs but the only worker depletion related
>> patches which pop up are around CPU hotplugs, so I don't think they
>> apply here. If the problem is relatively easy to reproduce && you
>> can't move onto a newer kernel, I'm afraid bisection probably is
>> the best option. Thanks!
* Re: race condition in schedule_on_each_cpu()

From: Tejun Heo @ 2013-06-07 23:22 UTC
To: weiqi@kylinos.com.cn; +Cc: torvalds, linux-kernel

On Fri, Jun 07, 2013 at 10:24:50AM +0800, weiqi@kylinos.com.cn wrote:
>> Before the commit you originally quoted, the calling thread could
>> be preempted and migrated to another CPU before get_online_cpus(),
>> thus ending up executing the function twice on the new CPU but
>> skipping the old one.
>
> Will this situation happen with the "Full preemption" config on
> 3.0.30-rt50?

Sure, if the fix hasn't been backported.

Thanks.

--
tejun
[parent not found: <51B27744.6090507@kylinos.com.cn>]
* Re: race condition in schedule_on_each_cpu()

From: weiqi @ 2013-06-08 11:30 UTC
To: Tejun Heo; +Cc: weiqi@kylinos.com.cn, torvalds, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 403 bytes --]

Hello Tejun Heo,

I've backported the schedule_on_each_cpu() "direct execution" patch
onto 3.0.30-rt50, and it fixed my problem. The attachment is the
effective patch.

However, I do not understand why machine 1 exposes the problem while
machine 2 does not. My guess: since this is related to the rt-kernel's
preemption level, is the difference due to CPU performance? What do
you think?

Thank you~

[-- Attachment #2: direct_execution.patch --]
[-- Type: text/plain, Size: 1170 bytes --]

diff -up linux-3.0.30-rt50/kernel/workqueue.c.bak linux-3.0.30-rt50/kernel/workqueue.c
--- linux-3.0.30-rt50/kernel/workqueue.c.bak	2013-06-08 19:09:06.801059232 +0800
+++ linux-3.0.30-rt50/kernel/workqueue.c	2013-06-08 19:09:15.680069626 +0800
@@ -1922,6 +1922,7 @@ static int worker_thread(void *__worker)
 
 	/* tell the scheduler that this is a workqueue worker */
 	worker->task->flags |= PF_WQ_WORKER;
+	smp_mb();
 woke_up:
 	spin_lock_irq(&gcwq->lock);
 
@@ -2736,6 +2737,7 @@ EXPORT_SYMBOL(schedule_delayed_work_on);
 int schedule_on_each_cpu(work_func_t func)
 {
 	int cpu;
+	int orig = -1;
 	struct work_struct __percpu *works;
 
 	works = alloc_percpu(struct work_struct);
@@ -2744,13 +2746,20 @@ int schedule_on_each_cpu(work_func_t func)
 
 	get_online_cpus();
 
+	if (current->flags & PF_WQ_WORKER)
+		orig = raw_smp_processor_id();
+
 	for_each_online_cpu(cpu) {
 		struct work_struct *work = per_cpu_ptr(works, cpu);
 
 		INIT_WORK(work, func);
-		schedule_work_on(cpu, work);
+		if (cpu != orig)
+			schedule_work_on(cpu, work);
 	}
 
+	if (orig >= 0)
+		func(per_cpu_ptr(works, orig));
+
 	for_each_online_cpu(cpu)
 		flush_work(per_cpu_ptr(works, cpu));
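For reference, here is roughly how schedule_on_each_cpu() reads once
the patch above is applied. This is a sketch reconstructed from the
diff plus the mainline function of that era (the epilogue with
put_online_cpus()/free_percpu() is assumed, since the diff is cut
short); it has not been verified against an actual 3.0.30-rt50 tree:

int schedule_on_each_cpu(work_func_t func)
{
	int cpu;
	int orig = -1;
	struct work_struct __percpu *works;

	works = alloc_percpu(struct work_struct);
	if (!works)
		return -ENOMEM;

	get_online_cpus();

	/*
	 * When called from a workqueue worker, remember the local CPU
	 * so func() can be run directly instead of being queued on the
	 * worker's own pool; waiting on a work item queued behind
	 * ourselves is presumably what deadlocks once the pool runs
	 * out of spare workers.
	 */
	if (current->flags & PF_WQ_WORKER)
		orig = raw_smp_processor_id();

	for_each_online_cpu(cpu) {
		struct work_struct *work = per_cpu_ptr(works, cpu);

		INIT_WORK(work, func);
		if (cpu != orig)
			schedule_work_on(cpu, work);
	}

	/* run the local CPU's work item directly, off the workqueue path */
	if (orig >= 0)
		func(per_cpu_ptr(works, orig));

	for_each_online_cpu(cpu)
		flush_work(per_cpu_ptr(works, cpu));

	put_online_cpus();
	free_percpu(works);
	return 0;
}

Note that orig is sampled only after get_online_cpus(), and a
workqueue worker is normally bound to its CPU, so the direct call
should stay on the sampled CPU rather than reintroducing the old
migration race discussed later on this page.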
[parent not found: <51A7FFE8.6060204@kylinos.com.cn>]
[parent not found: <20130531023246.GD30479@mtj.dyndns.org>]
[parent not found: <51A821F3.1000605@kylinos.com.cn>]
* Re: race condition in schedule_on_each_cpu()

From: Tejun Heo @ 2013-05-31  5:03 UTC
To: weiqi@kylinos.com.cn; +Cc: torvalds, linux-kernel

On Fri, May 31, 2013 at 12:07:15PM +0800, weiqi@kylinos.com.cn wrote:
>> The only way for them to get stuck is if there aren't enough
>> execution resources (ie. if a new thread can't be created) but OOM
>> killers would have been activated if that were the case.
>
> The following is a detailed description of our scenario:
>
> 1. After turning off the disk array, the ps results are shown in
>    *ps*, which indicates that kworker/1:0 and kworker/1:2 are stuck.
>
> 2. The call stacks for the kworkers are shown in *stack_xxx.txt*.
>
> 3. The workqueue operations during that period are shown in
>    *out.txt*, captured with ftrace. (We added a new tracepoint,
>    workqueue_queue_work_insert, immediately before insert_wq_barrier
>    in the function start_flush_work; its implementation is shown in
>    *trace_insert_wq_barrier.txt*.)
>    From the results in *grep_kwork1:0_from_out.txt* we can see that
>    kworker/1:0 is stuck after starting the fc_starget_delete work at
>    time 360.801271, and we catch the insert_wq_barrier trace info
>    right behind it.
>
> 4. From out.txt we can see that altogether three fc_starget_delete
>    works were enqueued. After the point of deadlock, kworker/1:1 and
>    kworker/1:3 are still executing ...
>
> 5. If we let scsi_transport_fc use only one worker thread, i.e.,
>    change scsi_transport_fc.c:fc_host_setup() from
>       alloc_workqueue(fc_host->work_q_name, 0, 0)
>    to
>       alloc_workqueue(fc_host->work_q_name, WQ_UNBOUND, 1)
>    and from
>       alloc_workqueue(fc_host->devloss_work_q_name, 0, 0)
>    to
>       alloc_workqueue(fc_host->devloss_work_q_name, WQ_UNBOUND, 1),
>    the deadlock won't occur.
>
>> Can you please test a recent kernel? How easily can you reproduce
>> the issue?
>
> It occurs every time we hot-remove the disk array.
>
> I'll test a recent kernel after a while, but this problem in 3.0.30
> really confuses me.

Yeah, it definitely sounds like concurrency depletion. There have
been some fixes and substantial changes in the area, so I really
wanna find out whether the problem is reproducible in a recent
vanilla kernel - say, v3.9 or, even better, v3.10-rc2. Can you please
try to reproduce the problem with a newer kernel?

> By the way, I'm wondering what the race condition was before, which
> doesn't exist now.

Before the commit you originally quoted, the calling thread could be
preempted and migrated to another CPU before get_online_cpus(), thus
ending up executing the function twice on the new CPU but skipping
the old one.

Thanks.

--
tejun
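To make the race concrete, here is an illustrative sketch of the
pre-fix control flow, reconstructed from Tejun's description above
rather than from the exact historical source (in the real code of
that era, only keventd/workqueue workers took the direct-execution
path, but the shape of the race is the same):

int schedule_on_each_cpu(work_func_t func)	/* pre-fix sketch */
{
	int cpu;
	int orig;
	struct work_struct __percpu *works;

	works = alloc_percpu(struct work_struct);
	if (!works)
		return -ENOMEM;

	/*
	 * The local CPU is sampled *before* get_online_cpus().  Race
	 * window: nothing pins the thread to this CPU here, so on a
	 * fully preemptible kernel it can be preempted and migrated
	 * to another CPU right after this line.
	 */
	orig = raw_smp_processor_id();

	get_online_cpus();
	for_each_online_cpu(cpu) {
		struct work_struct *work = per_cpu_ptr(works, cpu);

		INIT_WORK(work, func);
		if (cpu != orig)	/* skips the CPU we *were* on... */
			schedule_work_on(cpu, work);
	}

	/*
	 * ...but the direct call below runs on whichever CPU we are
	 * on *now*.  After a migration, func() runs twice on the new
	 * CPU (once via schedule_work_on() above, once here) and
	 * never on the skipped CPU "orig" - exactly the
	 * double-execution/skip Tejun describes.
	 */
	func(per_cpu_ptr(works, orig));

	for_each_online_cpu(cpu)
		flush_work(per_cpu_ptr(works, cpu));
	put_online_cpus();
	free_percpu(works);
	return 0;
}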
Thread overview: 6+ messages
[not found] <tencent_0777D84B54B4163A3B85255A@qq.com>
2013-06-06 21:23 ` race condition in schedule_on_each_cpu() Tejun Heo
2013-06-07 1:34 ` weiqi
2013-06-07 2:24 ` weiqi
2013-06-07 23:22 ` Tejun Heo
[not found] ` <51B27744.6090507@kylinos.com.cn>
2013-06-08 11:30 ` weiqi
[not found] <51A7FFE8.6060204@kylinos.com.cn>
[not found] ` <20130531023246.GD30479@mtj.dyndns.org>
[not found] ` <51A821F3.1000605@kylinos.com.cn>
2013-05-31 5:03 ` Tejun Heo