* [PATCH] core: workqueue: BUG_ON on workqueue recursion
@ 2010-02-03 11:27 Simon Kagstrom
2010-02-03 19:43 ` Oleg Nesterov
2010-02-04 2:00 ` [PATCH] core: workqueue: BUG_ON " Lai Jiangshan
0 siblings, 2 replies; 7+ messages in thread
From: Simon Kagstrom @ 2010-02-03 11:27 UTC (permalink / raw)
To: linux-kernel, laijs; +Cc: oleg, rusty, tj, akpm, mingo
When the workqueue is flushed from workqueue context (recursively), the
system enters a strange state where things at random (dependent on the
global workqueue) start misbehaving. For example, for us the console and
logins lock up while the web server continues running.
Since the system becomes unstable, change this to a BUG_ON instead.
Signed-off-by: Simon Kagstrom <simon.kagstrom@netinsight.net>
---
kernel/workqueue.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index dee4865..e617d29 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -482,7 +482,7 @@ static int flush_cpu_workqueue(struct cpu_workqueue_struct *cwq)
 	int active = 0;
 	struct wq_barrier barr;
 
-	WARN_ON(cwq->thread == current);
+	BUG_ON(cwq->thread == current);
 
 	spin_lock_irq(&cwq->lock);
 	if (!list_empty(&cwq->worklist) || cwq->current_work != NULL) {
--
1.6.0.4
* Re: [PATCH] core: workqueue: BUG_ON on workqueue recursion
2010-02-03 11:27 [PATCH] core: workqueue: BUG_ON on workqueue recursion Simon Kagstrom
@ 2010-02-03 19:43 ` Oleg Nesterov
2010-02-04 2:12 ` Tejun Heo
2010-02-04 2:00 ` [PATCH] core: workqueue: BUG_ON " Lai Jiangshan
1 sibling, 1 reply; 7+ messages in thread
From: Oleg Nesterov @ 2010-02-03 19:43 UTC (permalink / raw)
To: Simon Kagstrom; +Cc: linux-kernel, laijs, rusty, tj, akpm, mingo
On 02/03, Simon Kagstrom wrote:
>
> When the workqueue is flushed from workqueue context (recursively), the
> system enters a strange state where things at random (dependent on the
> global workqueue) start misbehaving. For example, for us the console and
> logins lock up while the web server continues running.
>
> Since the system becomes unstable, change this to a BUG_ON instead.
I agree with this patch. We are going to deadlock anyway: if the
condition is true, the caller is cwq->current_work, which means
flush_cpu_workqueue() will insert the barrier and hang.
However,
> @@ -482,7 +482,7 @@ static int flush_cpu_workqueue(struct cpu_workqueue_struct *cwq)
> int active = 0;
> struct wq_barrier barr;
>
> - WARN_ON(cwq->thread == current);
> + BUG_ON(cwq->thread == current);
Another option is to change the code to do

	if (WARN_ON(cwq->thread == current))
		return;

This gives the kernel a chance to survive after the warning.
What do you think?
Oleg.
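
The deadlock described above can be illustrated with a small userspace
model. This is only a sketch under simplifying assumptions, not kernel
code: the toy_* names and the step-based "scheduler" are hypothetical and
exist only for illustration. The model assumes one worker thread that must
finish its current work item before it can dequeue the flush barrier.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy state of one single-threaded workqueue (hypothetical model). */
struct toy_cwq {
	bool work_running;   /* the worker is inside a work function */
	bool work_waits;     /* that work function sleeps until barrier_done */
	bool barrier_queued; /* a flush barrier sits on the worklist */
	bool barrier_done;   /* the worker has executed the barrier */
};

/* One scheduling step of the worker thread. */
void toy_worker_step(struct toy_cwq *cwq)
{
	if (cwq->work_running) {
		/* A running item must return before anything else is dequeued. */
		if (cwq->work_waits && !cwq->barrier_done)
			return;                /* still asleep inside the flush */
		cwq->work_running = false; /* the item finished */
		return;
	}
	if (cwq->barrier_queued) {
		cwq->barrier_queued = false;
		cwq->barrier_done = true;  /* would wake whoever is flushing */
	}
}

/* Returns true if a flush finishes within max_steps worker steps. */
bool toy_flush_completes(bool caller_is_worker, int max_steps)
{
	struct toy_cwq cwq = {
		.work_running = true,           /* some item is executing */
		.work_waits = caller_is_worker, /* and it is the flusher itself */
		.barrier_queued = true,         /* the flush inserted its barrier */
		.barrier_done = false,
	};

	assert(max_steps >= 0);
	for (int i = 0; i < max_steps && !cwq.barrier_done; i++)
		toy_worker_step(&cwq);
	return cwq.barrier_done;
}
```

In this model, toy_flush_completes(false, 10) returns true after two steps,
while toy_flush_completes(true, n) stays false for any n: the barrier sits
behind a work item that is itself waiting for the barrier.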
* Re: [PATCH] core: workqueue: BUG_ON on workqueue recursion
2010-02-03 19:43 ` Oleg Nesterov
@ 2010-02-04 2:12 ` Tejun Heo
2010-02-04 8:02 ` [PATCH v2] core: workqueue: return " Simon Kagstrom
0 siblings, 1 reply; 7+ messages in thread
From: Tejun Heo @ 2010-02-04 2:12 UTC (permalink / raw)
To: Oleg Nesterov; +Cc: Simon Kagstrom, linux-kernel, laijs, rusty, akpm, mingo
Hello,
On 02/04/2010 04:43 AM, Oleg Nesterov wrote:
> On 02/03, Simon Kagstrom wrote:
>>
>> When the workqueue is flushed from workqueue context (recursively), the
>> system enters a strange state where things at random (dependent on the
>> global workqueue) start misbehaving. For example, for us the console and
>> logins lock up while the web server continues running.
>>
>> Since the system becomes unstable, change this to a BUG_ON instead.
>
> I agree with this patch. We are going to deadlock anyway: if the
> condition is true, the caller is cwq->current_work, which means
> flush_cpu_workqueue() will insert the barrier and hang.
>
> However,
>
>> @@ -482,7 +482,7 @@ static int flush_cpu_workqueue(struct cpu_workqueue_struct *cwq)
>> int active = 0;
>> struct wq_barrier barr;
>>
>> - WARN_ON(cwq->thread == current);
>> + BUG_ON(cwq->thread == current);
>
> Another option is to change the code to do
>
> if (WARN_ON(cwq->thread == current))
> return;
>
> This gives the kernel a chance to survive after the warning.
>
> What do you think?
Yeah, I like this one better too. Even solely for debugging,
WARN_ON() is better as often users don't have reliable ways to gather
kernel log after a BUG_ON().
Thanks.
--
tejun
* [PATCH v2] core: workqueue: return on workqueue recursion
2010-02-04 2:12 ` Tejun Heo
@ 2010-02-04 8:02 ` Simon Kagstrom
2010-02-04 10:52 ` Oleg Nesterov
2010-02-12 8:47 ` Tejun Heo
0 siblings, 2 replies; 7+ messages in thread
From: Simon Kagstrom @ 2010-02-04 8:02 UTC (permalink / raw)
To: Tejun Heo, Oleg Nesterov, linux-kernel; +Cc: laijs, rusty, akpm, mingo
When the workqueue is flushed from workqueue context (recursively), the
system enters a strange state where things at random (dependent on the
global workqueue) start misbehaving. For example, for us the console and
logins lock up while the web server continues running.
The system becomes unstable since the workqueue barrier locks the
workqueue. This patch instead returns if the workqueue is flushed
recursively, which keeps the workqueue alive but warns.
Signed-off-by: Simon Kagstrom <simon.kagstrom@netinsight.net>
---
ChangeLog:
* Instead of BUG_ON, warn and return on recursive calls as suggested
by Oleg Nesterov and Tejun Heo
kernel/workqueue.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index dee4865..49f8fa7 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -482,7 +482,8 @@ static int flush_cpu_workqueue(struct cpu_workqueue_struct *cwq)
 	int active = 0;
 	struct wq_barrier barr;
 
-	WARN_ON(cwq->thread == current);
+	if (WARN_ON(cwq->thread == current))
+		return 1;
 
 	spin_lock_irq(&cwq->lock);
 	if (!list_empty(&cwq->worklist) || cwq->current_work != NULL) {
--
1.6.0.4
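
The effect of the warn-and-return guard can be sketched in userspace as
follows. This is a toy model under stated assumptions, not the kernel
implementation: toy_wq, toy_current (a stand-in for the kernel's
"current") and the other toy_* helpers are hypothetical names, and the
"flush" simply drains the queue synchronously. The return value 1 mirrors
the patch above.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_WORK 8

struct toy_wq;
typedef void (*toy_work_fn)(struct toy_wq *wq);

struct toy_wq {
	int thread_id;                   /* id of the queue's only worker */
	toy_work_fn items[TOY_MAX_WORK]; /* pending work, FIFO */
	size_t head, tail;
	int warnings;                    /* recursive flushes detected */
	int completed;                   /* items that ran to completion */
};

/* Stand-in for the kernel's "current": id of the running thread. */
int toy_current = 0;

/* Append a work item; returns 1 on success, 0 if the queue is full. */
int toy_queue_work(struct toy_wq *wq, toy_work_fn fn)
{
	assert(wq != NULL && fn != NULL);
	if (wq->tail == TOY_MAX_WORK)
		return 0;
	wq->items[wq->tail++] = fn;
	return 1;
}

/* Run all pending items as the queue's worker thread. */
void toy_run_worker(struct toy_wq *wq)
{
	int saved = toy_current;

	toy_current = wq->thread_id;
	while (wq->head < wq->tail) {
		toy_work_fn fn = wq->items[wq->head++];

		fn(wq);
		wq->completed++;
	}
	toy_current = saved;
}

/* Warn and return instead of waiting on a barrier that can never run. */
int toy_flush(struct toy_wq *wq)
{
	if (toy_current == wq->thread_id) {
		wq->warnings++; /* models WARN_ON() firing */
		return 1;
	}
	toy_run_worker(wq); /* a non-worker caller waits for the queue to drain */
	return 1;
}

/* A misbehaving work item that flushes its own queue. */
void toy_recursive_work(struct toy_wq *wq) { toy_flush(wq); }

/* A well-behaved work item. */
void toy_plain_work(struct toy_wq *wq) { (void)wq; }
```

Queueing toy_recursive_work followed by toy_plain_work and flushing from
another thread id records one warning, and both items still complete, which
is the "keeps the workqueue alive but warns" behaviour the changelog
describes.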
* Re: [PATCH v2] core: workqueue: return on workqueue recursion
2010-02-04 8:02 ` [PATCH v2] core: workqueue: return " Simon Kagstrom
@ 2010-02-04 10:52 ` Oleg Nesterov
2010-02-12 8:47 ` Tejun Heo
1 sibling, 0 replies; 7+ messages in thread
From: Oleg Nesterov @ 2010-02-04 10:52 UTC (permalink / raw)
To: Simon Kagstrom; +Cc: Tejun Heo, linux-kernel, laijs, rusty, akpm, mingo
On 02/04, Simon Kagstrom wrote:
>
> When the workqueue is flushed from workqueue context (recursively), the
> system enters a strange state where things at random (dependent on the
> global workqueue) start misbehaving. For example, for us the console and
> logins lock up while the web server continues running.
>
> The system becomes unstable since the workqueue barrier locks the
> workqueue. This patch instead returns if the workqueue is flushed
> recursively, which keeps the workqueue alive but warns.
>
> Signed-off-by: Simon Kagstrom <simon.kagstrom@netinsight.net>
Acked-by: Oleg Nesterov <oleg@redhat.com>
> ---
> ChangeLog:
> * Instead of BUG_ON, warn and return on recursive calls as suggested
> by Oleg Nesterov and Tejun Heo
>
> kernel/workqueue.c | 3 ++-
> 1 files changed, 2 insertions(+), 1 deletions(-)
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index dee4865..49f8fa7 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -482,7 +482,8 @@ static int flush_cpu_workqueue(struct cpu_workqueue_struct *cwq)
> int active = 0;
> struct wq_barrier barr;
>
> - WARN_ON(cwq->thread == current);
> + if (WARN_ON(cwq->thread == current))
> + return 1;
>
> spin_lock_irq(&cwq->lock);
> if (!list_empty(&cwq->worklist) || cwq->current_work != NULL) {
> --
> 1.6.0.4
>
* Re: [PATCH v2] core: workqueue: return on workqueue recursion
2010-02-04 8:02 ` [PATCH v2] core: workqueue: return " Simon Kagstrom
2010-02-04 10:52 ` Oleg Nesterov
@ 2010-02-12 8:47 ` Tejun Heo
1 sibling, 0 replies; 7+ messages in thread
From: Tejun Heo @ 2010-02-12 8:47 UTC (permalink / raw)
To: Simon Kagstrom; +Cc: Oleg Nesterov, linux-kernel, laijs, rusty, akpm, mingo
On 02/04/2010 05:02 PM, Simon Kagstrom wrote:
> When the workqueue is flushed from workqueue context (recursively), the
> system enters a strange state where things at random (dependent on the
> global workqueue) start misbehaving. For example, for us the console and
> logins lock up while the web server continues running.
>
> The system becomes unstable since the workqueue barrier locks the
> workqueue. This patch instead returns if the workqueue is flushed
> recursively, which keeps the workqueue alive but warns.
>
> Signed-off-by: Simon Kagstrom <simon.kagstrom@netinsight.net>
applied to wq tree. Will push out when the merge window opens.
Thanks.
--
tejun
* Re: [PATCH] core: workqueue: BUG_ON on workqueue recursion
2010-02-03 11:27 [PATCH] core: workqueue: BUG_ON on workqueue recursion Simon Kagstrom
2010-02-03 19:43 ` Oleg Nesterov
@ 2010-02-04 2:00 ` Lai Jiangshan
1 sibling, 0 replies; 7+ messages in thread
From: Lai Jiangshan @ 2010-02-04 2:00 UTC (permalink / raw)
To: Simon Kagstrom; +Cc: linux-kernel, oleg, rusty, tj, akpm, mingo
Simon Kagstrom wrote:
> When the workqueue is flushed from workqueue context (recursively), the
> system enters a strange state where things at random (dependent on the
> global workqueue) start misbehaving. For example, for us the console and
> logins lock up while the web server continues running.
>
> Since the system becomes unstable, change this to a BUG_ON instead.
From a design point of view, we should disallow this recursion when using
workqueues. I like BUG_ON. But the recursion is usually not fatal when it
happens, and most developers would like to let the system go on.
Acked-by: Lai Jiangshan <laijs@cn.fujitsu.com>
>
> Signed-off-by: Simon Kagstrom <simon.kagstrom@netinsight.net>
> ---
> kernel/workqueue.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index dee4865..e617d29 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -482,7 +482,7 @@ static int flush_cpu_workqueue(struct cpu_workqueue_struct *cwq)
> int active = 0;
> struct wq_barrier barr;
>
> - WARN_ON(cwq->thread == current);
> + BUG_ON(cwq->thread == current);
>
> spin_lock_irq(&cwq->lock);
> if (!list_empty(&cwq->worklist) || cwq->current_work != NULL) {