From: Tejun Heo <tj@kernel.org>
To: "weiqi@kylinos.com.cn" <weiqi@kylinos.com.cn>
Cc: torvalds@linux-foundation.org, linux-kernel@vger.kernel.org
Subject: Re: race condition in schedule_on_each_cpu()
Date: Fri, 31 May 2013 14:03:16 +0900
Message-ID: <20130531050316.GC7720@mtj.dyndns.org>
In-Reply-To: <51A821F3.1000605@kylinos.com.cn>
On Fri, May 31, 2013 at 12:07:15PM +0800, weiqi@kylinos.com.cn wrote:
>
> >the only way for them to get stuck is if there aren't enough execution
> >resources (ie. if a new thread can't be created) but OOM killers would
> >have been activated if that were the case.
>
> The following is a detailed description of our scenario:
>
> 1. After turning off the disk array, the ps output is shown
> in *ps*, which indicates that kworker/1:0 and kworker/1:2 are stuck.
>
> 2. The call stacks of the kworkers are shown in *stack_xxx.txt*.
>
> 3. The workqueue operations during that period, captured with
> ftrace, are shown in *out.txt*.
> (We added a new trace point, /workqueue_queue_work_insert/,
> immediately before insert_wq_barrier in the function
> start_flush_work. Its implementation is shown in
> *trace_insert_wq_barrier.txt*.)
> From the results in *grep_kwork1:0_from_out.txt*, we can see that
> kworker/1:0 is stuck after starting the work
> /fc_starget_delete/ at time 360.801271, and the
> insert_wq_barrier trace info is caught after this.
>
>
> 4. From out.txt, we can see that altogether three
> /fc_starget_delete/ work items are enqueued.
> After the point of deadlock, kworker/1:1 and kworker/1:3 are
> executing ...
>
>
> 5. If we make scsi_transport_fc use only one worker thread,
> i.e., change scsi_transport_fc.c : fc_host_setup()
> alloc_workqueue(fc_host->work_q_name, 0, 0) to
> alloc_workqueue(fc_host->work_q_name, WQ_UNBOUND, 1)
>
> alloc_workqueue(fc_host->devloss_work_q_name, 0, 0) to
> alloc_workqueue(fc_host->devloss_work_q_name, WQ_UNBOUND, 1)
>
> the deadlock won't occur.
> >Can you please test a recent kernel? How easily can you reproduce the
> >issue?
> >
> It occurs every time we hot-remove the disk array.
>
> I'll test a recent kernel after a while, but this problem in 3.0.30
> really confuses me.
Yeah, it definitely sounds like concurrency depletion. There have
been some fixes and substantial changes in the area, so I really wanna
find out whether the problem is reproducible on a recent vanilla kernel
- say, v3.9 or, even better, v3.10-rc2. Can you please try to
reproduce the problem with a newer kernel?
> By the way, I'm wondering what the race condition was that
> existed before and doesn't exist now.
Before the commit you originally quoted, the calling thread could be
preempted and migrated to another CPU before get_online_cpus(), thus
ending up executing the function twice on the new CPU but skipping the
old one.
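
The ordering problem described above can be sketched as a small
userspace model. This is not the kernel code: struct sim, the two
functions and the migrate_to parameter are hypothetical names made up
for illustration, and the "fixed" variant only assumes that the CPU is
sampled after the point where the migration could have happened.

```c
#include <assert.h>

#define NR_CPUS 4

/* Hypothetical model, not kernel code. */
struct sim {
	int current_cpu;	/* CPU the calling thread is on right now */
	int runs[NR_CPUS];	/* how many times the function ran per CPU */
};

/*
 * Old ordering: the caller's CPU is sampled first, the thread may then
 * be migrated (migrate_to >= 0), and only afterwards are the online
 * CPUs walked.
 */
static void racy_schedule_on_each_cpu(struct sim *s, int migrate_to)
{
	int cpu;
	int orig;

	assert(migrate_to < NR_CPUS);

	orig = s->current_cpu;		/* sampled too early */

	if (migrate_to >= 0)
		s->current_cpu = migrate_to;	/* preempted and migrated */

	/* get_online_cpus() would be taken here in this model */
	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (cpu != orig)
			s->runs[cpu]++;	/* work item queued on cpu */

	s->runs[s->current_cpu]++;	/* direct call on the CPU we are on now */
}

/*
 * Assumed fixed ordering: any migration happens before the CPU is
 * sampled, so the skipped CPU and the direct-call CPU always match.
 */
static void fixed_schedule_on_each_cpu(struct sim *s, int migrate_to)
{
	int cpu;
	int orig;

	assert(migrate_to < NR_CPUS);

	if (migrate_to >= 0)
		s->current_cpu = migrate_to;

	/* get_online_cpus() would be taken here in this model */
	orig = s->current_cpu;		/* sampled after the migration window */

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (cpu != orig)
			s->runs[cpu]++;

	s->runs[orig]++;
}
```

Starting on CPU 0 and migrating to CPU 1, the racy variant leaves
runs[] at {0, 2, 1, 1}: twice on the new CPU, never on the old one.
The fixed variant gives {1, 1, 1, 1}.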
Thanks.
--
tejun