* [BUG] 2.6.21 hang in cancel_rearming_delayed_workqueue()
@ 2007-05-25 3:21 Jason Wessel
2007-05-25 8:16 ` Jarek Poplawski
0 siblings, 1 reply; 3+ messages in thread
From: Jason Wessel @ 2007-05-25 3:21 UTC (permalink / raw)
To: linux-kernel
There is a problem with the calling cancel_rearming_delayed_work if the
timer was not yet active.
I see this problem when netpoll_cleanup() is called without having done
any work because it had not processed any packets yet. The problem
appears to be a result of the loop check
while(!cancel_delayed_work(dwork)). This endlessly loops because
del_timer_sync() can return 0 or 1 for success which is passed back as a
result to the final invariant check for the loop. In this particular
case zero will always be returned because the timer is not active.
It is possible that the problem exists else where, but I thought I would
ask if this is expected?
#0 del_timer_sync (timer=0xc7ed90f8) at kernel/timer.c:530
#1 0xc012f08e in cancel_rearming_delayed_workqueue (wq=0xc7fee800,
dwork=0xc7ed90e8) at include/linux/workqueue.h:201
#2 0xc012f0af in cancel_rearming_delayed_work (dwork=0x20)
at kernel/workqueue.c:680
#3 0xc0312f78 in netpoll_cleanup (np=0xc880bf40) at net/core/netpoll.c:784
Possible fix.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Index: linux-2.6.21/kernel/workqueue.c
===================================================================
--- linux-2.6.21.orig/kernel/workqueue.c
+++ linux-2.6.21/kernel/workqueue.c
@@ -666,7 +666,7 @@ EXPORT_SYMBOL(flush_scheduled_work);
void cancel_rearming_delayed_workqueue(struct workqueue_struct *wq,
struct delayed_work *dwork)
{
- while (!cancel_delayed_work(dwork))
+ while (cancel_delayed_work(dwork) > 0)
flush_workqueue(wq);
}
EXPORT_SYMBOL(cancel_rearming_delayed_workqueue);
Thanks,
Jason.
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [BUG] 2.6.21 hang in cancel_rearming_delayed_workqueue()
2007-05-25 3:21 [BUG] 2.6.21 hang in cancel_rearming_delayed_workqueue() Jason Wessel
@ 2007-05-25 8:16 ` Jarek Poplawski
2007-05-25 13:21 ` Jason Wessel
0 siblings, 1 reply; 3+ messages in thread
From: Jarek Poplawski @ 2007-05-25 8:16 UTC (permalink / raw)
To: Jason Wessel; +Cc: linux-kernel
On 25-05-2007 05:21, Jason Wessel wrote:
> There is a problem with the calling cancel_rearming_delayed_work if the
> timer was not yet active.
>
> I see this problem when netpoll_cleanup() is called without having done
> any work because it had not processed any packets yet. The problem
> appears to be a result of the loop check
> while(!cancel_delayed_work(dwork)). This endlessly loops because
> del_timer_sync() can return 0 or 1 for success which is passed back as a
> result to the final invariant check for the loop. In this particular
> case zero will always be returned because the timer is not active.
>
> It is possible that the problem exists else where, but I thought I would
> ask if this is expected?
>
> #0 del_timer_sync (timer=0xc7ed90f8) at kernel/timer.c:530
> #1 0xc012f08e in cancel_rearming_delayed_workqueue (wq=0xc7fee800,
> dwork=0xc7ed90e8) at include/linux/workqueue.h:201
> #2 0xc012f0af in cancel_rearming_delayed_work (dwork=0x20)
> at kernel/workqueue.c:680
> #3 0xc0312f78 in netpoll_cleanup (np=0xc880bf40) at net/core/netpoll.c:784
>
> Possible fix.
>
> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
>
> Index: linux-2.6.21/kernel/workqueue.c
> ===================================================================
> --- linux-2.6.21.orig/kernel/workqueue.c
> +++ linux-2.6.21/kernel/workqueue.c
> @@ -666,7 +666,7 @@ EXPORT_SYMBOL(flush_scheduled_work);
> void cancel_rearming_delayed_workqueue(struct workqueue_struct *wq,
> struct delayed_work *dwork)
> {
> - while (!cancel_delayed_work(dwork))
> + while (cancel_delayed_work(dwork) > 0)
> flush_workqueue(wq);
> }
> EXPORT_SYMBOL(cancel_rearming_delayed_workqueue);
It's very optimistic change...
I wonder, how this all could work so long (or how it is supposed
to work now without breaking other callers) with (almost) reversed
condition?
According to this comment:
" * cancel_rearming_delayed_workqueue - reliably kill off a delayed
work whose handler rearms the delayed work."
So, it cannot be used in netpoll_cleanup() if there is no rearming
during this cancel at all. This is a tricky behaviour of course,
and is changed in 2.6.22-rc.
Regards,
Jarek P.
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [BUG] 2.6.21 hang in cancel_rearming_delayed_workqueue()
2007-05-25 8:16 ` Jarek Poplawski
@ 2007-05-25 13:21 ` Jason Wessel
0 siblings, 0 replies; 3+ messages in thread
From: Jason Wessel @ 2007-05-25 13:21 UTC (permalink / raw)
To: Jarek Poplawski; +Cc: linux-kernel
[-- Attachment #1: Type: text/plain, Size: 890 bytes --]
Jarek Poplawski wrote:
> On 25-05-2007 05:21, Jason Wessel wrote:
>
>> There is a problem with the calling cancel_rearming_delayed_work if the
>> timer was not yet active.
>>
>>
>> It is possible that the problem exists else where, but I thought I would
>> ask if this is expected?
>>
>>
>
>
> " * cancel_rearming_delayed_workqueue - reliably kill off a delayed
> work whose handler rearms the delayed work."
>
> So, it cannot be used in netpoll_cleanup() if there is no rearming
> during this cancel at all. This is a tricky behaviour of course,
> and is changed in 2.6.22-rc.
>
>
I had read that as well. It seems this the API for the cancel is
restricted such that you cannot call it if there is nothing to cancel.
This means the netpoll code is incorrect and the simple fix is
attached. With the change the netpoll_cleanup works correctly in 2.6.21.
Thanks,
Jason.
[-- Attachment #2: netpoll_workqueue_fix.patch --]
[-- Type: text/plain, Size: 841 bytes --]
Do not call cancel_rearming_delayed_work() if there is no
pending work.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
---
net/core/netpoll.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
Index: linux-2.6.21/net/core/netpoll.c
===================================================================
--- linux-2.6.21.orig/net/core/netpoll.c
+++ linux-2.6.21/net/core/netpoll.c
@@ -781,8 +781,10 @@ void netpoll_cleanup(struct netpoll *np)
if (atomic_dec_and_test(&npinfo->refcnt)) {
skb_queue_purge(&npinfo->arp_tx);
skb_queue_purge(&npinfo->txq);
- cancel_rearming_delayed_work(&npinfo->tx_work);
- flush_scheduled_work();
+ if (delayed_work_pending(&npinfo->tx_work)) {
+ cancel_rearming_delayed_work(&npinfo->tx_work);
+ flush_scheduled_work();
+ }
kfree(npinfo);
}
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2007-05-25 13:21 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-05-25 3:21 [BUG] 2.6.21 hang in cancel_rearming_delayed_workqueue() Jason Wessel
2007-05-25 8:16 ` Jarek Poplawski
2007-05-25 13:21 ` Jason Wessel
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox