* Wireless regression in workqueue: use mod_delayed_work() instead of __cancel + queue @ 2012-11-28 15:13 Anders Kaseorg 2012-11-28 15:17 ` Anders Kaseorg 0 siblings, 1 reply; 8+ messages in thread From: Anders Kaseorg @ 2012-11-28 15:13 UTC (permalink / raw) To: Tejun Heo, Herbert Xu, John W. Linville Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-wireless-u79uwXL29TY76Z2rM5mHXA, linux-kernel-OBmyj6YDURU My Intel 6250 wireless card (iwldvm) can no longer associate with a WPA-Enterprise network (PEAP-MSCHAPv2). To my surprise, I bisected this regression to commit e7c2f967445dd2041f0f8e3179cca22bb8bb7f79, workqueue: use mod_delayed_work() instead of __cancel + queue. A bunch of logs collected by Ubuntu apport are in this bug report: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1083980 How can I help to debug this? I see that someone else reported another regression with the same commit last week, although this looks unrelated at first glance: http://thread.gmane.org/gmane.linux.kernel/1395938 Anders -- To unsubscribe from this list: send the line "unsubscribe linux-wireless" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Wireless regression in workqueue: use mod_delayed_work() instead of __cancel + queue 2012-11-28 15:13 Wireless regression in workqueue: use mod_delayed_work() instead of __cancel + queue Anders Kaseorg @ 2012-11-28 15:17 ` Anders Kaseorg 2012-11-30 21:14 ` Tejun Heo 0 siblings, 1 reply; 8+ messages in thread From: Anders Kaseorg @ 2012-11-28 15:17 UTC (permalink / raw) To: Tejun Heo, Herbert Xu, John W. Linville Cc: netdev, linux-wireless, linux-kernel On Wed, 28 Nov 2012, Anders Kaseorg wrote: > My Intel 6250 wireless card (iwldvm) can no longer associate with a > WPA-Enterprise network (PEAP-MSCHAPv2). To my surprise, I bisected this > regression to commit e7c2f967445dd2041f0f8e3179cca22bb8bb7f79, > workqueue: use mod_delayed_work() instead of __cancel + queue. > > A bunch of logs collected by Ubuntu apport are in this bug report: > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1083980 > > How can I help to debug this? > > I see that someone else reported another regression with the same commit > last week, although this looks unrelated at first glance: > http://thread.gmane.org/gmane.linux.kernel/1395938 > > Anders (Oops, re-sending to fix a typo in the LKML address.) Anders ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Wireless regression in workqueue: use mod_delayed_work() instead of __cancel + queue 2012-11-28 15:17 ` Anders Kaseorg @ 2012-11-30 21:14 ` Tejun Heo 2012-11-30 22:56 ` Tejun Heo 0 siblings, 1 reply; 8+ messages in thread From: Tejun Heo @ 2012-11-30 21:14 UTC (permalink / raw) To: Anders Kaseorg Cc: Herbert Xu, John W. Linville, netdev, linux-wireless, linux-kernel Hello, Anders. Sorry about the delay. On Wed, Nov 28, 2012 at 10:17:28AM -0500, Anders Kaseorg wrote: > On Wed, 28 Nov 2012, Anders Kaseorg wrote: > > My Intel 6250 wireless card (iwldvm) can no longer associate with a > > WPA-Enterprise network (PEAP-MSCHAPv2). To my surprise, I bisected this > > regression to commit e7c2f967445dd2041f0f8e3179cca22bb8bb7f79, > > workqueue: use mod_delayed_work() instead of __cancel + queue. I see. > > A bunch of logs collected by Ubuntu apport are in this bug report: > > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1083980 > > > > How can I help to debug this? > > > > I see that someone else reported another regression with the same commit > > last week, although this looks unrelated at first glance: > > http://thread.gmane.org/gmane.linux.kernel/1395938 Urgh... that one was in my spam folder probably due to the mimed content. Nothing rings a bell yet. Will keep looking into it. Thanks. -- tejun ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Wireless regression in workqueue: use mod_delayed_work() instead of __cancel + queue 2012-11-30 21:14 ` Tejun Heo @ 2012-11-30 22:56 ` Tejun Heo 2012-12-01 4:15 ` Anders Kaseorg 0 siblings, 1 reply; 8+ messages in thread From: Tejun Heo @ 2012-11-30 22:56 UTC (permalink / raw) To: Anders Kaseorg Cc: Herbert Xu, John W. Linville, netdev, linux-wireless, linux-kernel Hey, again. Can you please test whether the following patch makes any difference? Thanks! diff --git a/kernel/workqueue.c b/kernel/workqueue.c index 042d221..26368ef 100644 --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -1477,7 +1477,10 @@ bool mod_delayed_work_on(int cpu, struct workqueue_struct *wq, } while (unlikely(ret == -EAGAIN)); if (likely(ret >= 0)) { - __queue_delayed_work(cpu, wq, dwork, delay); + if (!delay) + __queue_work(cpu, wq, &dwork->work); + else + __queue_delayed_work(cpu, wq, dwork, delay); local_irq_restore(flags); } ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: Wireless regression in workqueue: use mod_delayed_work() instead of __cancel + queue 2012-11-30 22:56 ` Tejun Heo @ 2012-12-01 4:15 ` Anders Kaseorg 2012-12-01 14:39 ` Tejun Heo 0 siblings, 1 reply; 8+ messages in thread From: Anders Kaseorg @ 2012-12-01 4:15 UTC (permalink / raw) To: Tejun Heo Cc: Herbert Xu, John W. Linville, netdev, linux-wireless, linux-kernel On Fri, 30 Nov 2012, Tejun Heo wrote: > Hey, again. > > Can you please test whether the following patch makes any difference? > > Thanks! > > diff --git a/kernel/workqueue.c b/kernel/workqueue.c > index 042d221..26368ef 100644 > --- a/kernel/workqueue.c > +++ b/kernel/workqueue.c > @@ -1477,7 +1477,10 @@ bool mod_delayed_work_on(int cpu, struct workqueue_struct *wq, > } while (unlikely(ret == -EAGAIN)); > > if (likely(ret >= 0)) { > - __queue_delayed_work(cpu, wq, dwork, delay); > + if (!delay) > + __queue_work(cpu, wq, &dwork->work); > + else > + __queue_delayed_work(cpu, wq, dwork, delay); > local_irq_restore(flags); > } > Yes. I tested that both directly on top of the bad commit, and on v3.7-rc7, and it fixes the bug in both cases. Thanks, Anders ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Wireless regression in workqueue: use mod_delayed_work() instead of __cancel + queue 2012-12-01 4:15 ` Anders Kaseorg @ 2012-12-01 14:39 ` Tejun Heo 2012-12-01 23:53 ` Anders Kaseorg 0 siblings, 1 reply; 8+ messages in thread From: Tejun Heo @ 2012-12-01 14:39 UTC (permalink / raw) To: Anders Kaseorg Cc: Herbert Xu, John W. Linville, netdev, linux-wireless, linux-kernel Hey, Anders. On Fri, Nov 30, 2012 at 11:15:50PM -0500, Anders Kaseorg wrote: > Yes. I tested that both directly on top of the bad commit, and on > v3.7-rc7, and it fixes the bug in both cases. Can you please test this one too? Thanks! diff --git a/kernel/workqueue.c b/kernel/workqueue.c index 042d221..94964d1 100644 --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -1364,6 +1364,11 @@ static void __queue_delayed_work(int cpu, struct workqueue_struct *wq, BUG_ON(timer_pending(timer)); BUG_ON(!list_empty(&work->entry)); + if (!delay) { + __queue_work(cpu, wq, &dwork->work); + return; + } + timer_stats_timer_set_start_info(&dwork->timer); /* @@ -1417,9 +1422,6 @@ bool queue_delayed_work_on(int cpu, struct workqueue_struct *wq, bool ret = false; unsigned long flags; - if (!delay) - return queue_work_on(cpu, wq, &dwork->work); - /* read the comment in __queue_work() */ local_irq_save(flags); ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: Wireless regression in workqueue: use mod_delayed_work() instead of __cancel + queue 2012-12-01 14:39 ` Tejun Heo @ 2012-12-01 23:53 ` Anders Kaseorg 2012-12-02 1:11 ` [PATCH] workqueue: mod_delayed_work_on() shouldn't queue timer on 0 delay Tejun Heo 0 siblings, 1 reply; 8+ messages in thread From: Anders Kaseorg @ 2012-12-01 23:53 UTC (permalink / raw) To: Tejun Heo Cc: Herbert Xu, John W. Linville, netdev, linux-wireless, linux-kernel On Sat, 1 Dec 2012, Tejun Heo wrote: > Can you please test this one too? Thanks! > > […] > + if (!delay) { > + __queue_work(cpu, wq, &dwork->work); > + return; > + } > + > […] > - if (!delay) > - return queue_work_on(cpu, wq, &dwork->work); > - Yes, this one fixes the bug too (on v3.7.0-rc7). Anders ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH] workqueue: mod_delayed_work_on() shouldn't queue timer on 0 delay 2012-12-01 23:53 ` Anders Kaseorg @ 2012-12-02 1:11 ` Tejun Heo 0 siblings, 0 replies; 8+ messages in thread From: Tejun Heo @ 2012-12-02 1:11 UTC (permalink / raw) To: Anders Kaseorg Cc: Herbert Xu, John W. Linville, netdev, linux-wireless, linux-kernel, Zlatko Calusic, Joonsoo Kim >From 8852aac25e79e38cc6529f20298eed154f60b574 Mon Sep 17 00:00:00 2001 From: Tejun Heo <tj@kernel.org> Date: Sat, 1 Dec 2012 16:23:42 -0800 8376fe22c7 ("workqueue: implement mod_delayed_work[_on]()") implemented mod_delayed_work[_on]() using the improved try_to_grab_pending(). The function is later used, among others, to replace [__]candel_delayed_work() + queue_delayed_work() combinations. Unfortunately, a delayed_work item w/ zero @delay is handled slightly differently by mod_delayed_work_on() compared to queue_delayed_work_on(). The latter skips timer altogether and directly queues it using queue_work_on() while the former schedules timer which will expire on the closest tick. This means, when @delay is zero, that [__]cancel_delayed_work() + queue_delayed_work_on() makes the target item immediately executable while mod_delayed_work_on() may induce delay of upto a full tick. This somewhat subtle difference breaks some of the converted users. e.g. block queue plugging uses delayed_work for deferred processing and uses mod_delayed_work_on() when the queue needs to be immediately unplugged. The above problem manifested as noticeably higher number of context switches under certain circumstances. The difference in behavior was caused by missing special case handling for 0 delay in mod_delayed_work_on() compared to queue_delayed_work_on(). Joonsoo Kim posted a patch to add it - ("workqueue: optimize mod_delayed_work_on() when @delay == 0")[1]. The patch was queued for 3.8 but it was described as optimization and I missed that it was a correctness issue. As both queue_delayed_work_on() and mod_delayed_work_on() use __queue_delayed_work() for queueing, it seems that the better approach is to move the 0 delay special handling to the function instead of duplicating it in mod_delayed_work_on(). Fix the problem by moving 0 delay special case handling from queue_delayed_work_on() to __queue_delayed_work(). This replaces Joonsoo's patch. [1] http://thread.gmane.org/gmane.linux.kernel/1379011/focus=1379012 Signed-off-by: Tejun Heo <tj@kernel.org> Reported-and-tested-by: Anders Kaseorg <andersk@MIT.EDU> Reported-and-tested-by: Zlatko Calusic <zlatko.calusic@iskon.hr> LKML-Reference: <alpine.DEB.2.00.1211280953350.26602@dr-wily.mit.edu> LKML-Reference: <50A78AA9.5040904@iskon.hr> Cc: Joonsoo Kim <js1304@gmail.com> --- Applied to wq/for-3.7-fixes. Pull request sent. Thanks! kernel/workqueue.c | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/kernel/workqueue.c b/kernel/workqueue.c index ac25db1..084aa47 100644 --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -1364,6 +1364,17 @@ static void __queue_delayed_work(int cpu, struct workqueue_struct *wq, BUG_ON(timer_pending(timer)); BUG_ON(!list_empty(&work->entry)); + /* + * If @delay is 0, queue @dwork->work immediately. This is for + * both optimization and correctness. The earliest @timer can + * expire is on the closest next tick and delayed_work users depend + * on that there's no such delay when @delay is 0. + */ + if (!delay) { + __queue_work(cpu, wq, &dwork->work); + return; + } + timer_stats_timer_set_start_info(&dwork->timer); /* @@ -1417,9 +1428,6 @@ bool queue_delayed_work_on(int cpu, struct workqueue_struct *wq, bool ret = false; unsigned long flags; - if (!delay) - return queue_work_on(cpu, wq, &dwork->work); - /* read the comment in __queue_work() */ local_irq_save(flags); -- 1.7.11.7 ^ permalink raw reply related [flat|nested] 8+ messages in thread
end of thread, other threads:[~2012-12-02 1:11 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2012-11-28 15:13 Wireless regression in workqueue: use mod_delayed_work() instead of __cancel + queue Anders Kaseorg 2012-11-28 15:17 ` Anders Kaseorg 2012-11-30 21:14 ` Tejun Heo 2012-11-30 22:56 ` Tejun Heo 2012-12-01 4:15 ` Anders Kaseorg 2012-12-01 14:39 ` Tejun Heo 2012-12-01 23:53 ` Anders Kaseorg 2012-12-02 1:11 ` [PATCH] workqueue: mod_delayed_work_on() shouldn't queue timer on 0 delay Tejun Heo
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).