Re: Boot time regression in 2.6.38 after initial wq merge

public inbox for linux-kernel@vger.kernel.org

* Re: Boot time regression in 2.6.38 after initial wq merge
       [not found] <4D62CE9C.7090806@versanet.de>
@ 2011-02-22  8:17 ` Tejun Heo
  2011-02-22  8:52   ` Dmitry Torokhov
  0 siblings, 1 reply; 13+ messages in thread
From: Tejun Heo @ 2011-02-22  8:17 UTC (permalink / raw)
  To: pantherchen@versanet.de; +Cc: Dmitry Torokhov, linux-kernel

(cc'ing Dmitry and lkml)

Hello,

On Mon, Feb 21, 2011 at 09:44:12PM +0100, pantherchen@versanet.de wrote:
> I'm experiencing a significant boot time regression (almost +50%)
> starting with 2.6.38-rc1 (see boot charts before [0], and after [1]
> - note the long uninterruptible sleep of kworker/0:1).

Comparing the two boot charts, the uninterruptible sleeps in
kworker/0:1 seem to be the same ones that kseriod showed.  They're about
the same duration.  kseriod was replaced with a work item, so that
explains it.

> After doing some bisecting and thorough testing this merge seems to
> be the first one showing the regression: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=23d69b09b78c4876e134f104a3814c30747c53f1
> ("Merge branch 'for-2.6.38' of
> git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq").
> 
> However, a kernel built from the branch 'for-2.6.38' at
> git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq doesn't show the
> behavior.
> 
> I also tried to cherry-pick those 33 commits, applying them to the
> last good commit from Linus' tree (http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=4ead36407b41eae942c8c9f70ef963cd369c90e2)
> and it also doesn't show the regression.
> 
> Do you know what's going on here? I'm just an end user and I've
> definitely run out of ideas.
> 
> Thanks in advance for your help,
> 
> Hernando Torque
> Thanks
> 
> [0]: http://launchpadlibrarian.net/64639927/27-12-intel.png
> [1]: http://launchpadlibrarian.net/64640030/38-4-intel.png

From the boot chart, two things are noticeable.

1. kworker/0:1's uninterruptible sleeps start later than kseriod's.

   It could be that cpu 0 was busy running other stuff and thus cmwq
   delayed executing serio_event_work; however, if we look at the CPU
   usage, that doesn't seem likely.  The CPU is not busy at all and if
   the CPU isn't busy, cmwq wouldn't introduce any noticeable delay in
   work item execution.

   Another possibility is the rescuer concurrency depletion bug is
   delaying execution of queued work items early during boot.  This
   was fixed recently.  Can you please give a shot at 2.6.38-rc6 and
   see whether anything is different?

2. Most of the delay is caused by xorg starting up much later.  xorg
   seems to start up in parallel with the kseriod sleeps in 2.6.37 but
   on 2.6.38 it seems to wait for the serio_event_work to finish.

   I have no idea what xorg is waiting for.  Dmitry, any clue?

Thanks.

-- 
tejun

* Re: Boot time regression in 2.6.38 after initial wq merge
  2011-02-22  8:17 ` Boot time regression in 2.6.38 after initial wq merge Tejun Heo
@ 2011-02-22  8:52   ` Dmitry Torokhov
  2011-02-22  9:02     ` Tejun Heo
  0 siblings, 1 reply; 13+ messages in thread
From: Dmitry Torokhov @ 2011-02-22  8:52 UTC (permalink / raw)
  To: Tejun Heo; +Cc: pantherchen@versanet.de, linux-kernel

On Tue, Feb 22, 2011 at 09:17:52AM +0100, Tejun Heo wrote:
> (cc'ing Dmitry and lkml)
> 
> Hello,
> 
> On Mon, Feb 21, 2011 at 09:44:12PM +0100, pantherchen@versanet.de wrote:
> > I'm experiencing a significant boot time regression (almost +50%)
> > starting with 2.6.38-rc1 (see boot charts before [0], and after [1]
> > - note the long uninterruptible sleep of kworker/0:1).
> 
> Comparing the two boot charts, the uninterruptible sleeps in
> kworker/0:1 seem to be the same ones that kseriod showed.  They're about
> the same duration.  kseriod was replaced with a work item, so that
> explains it.
> 
> > After doing some bisecting and thorough testing this merge seems to
> > be the first one showing the regression: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=23d69b09b78c4876e134f104a3814c30747c53f1
> > ("Merge branch 'for-2.6.38' of
> > git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq").
> > 
> > However, a kernel built from the branch 'for-2.6.38' at
> > git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq doesn't show the
> > behavior.
> > 
> > I also tried to cherry-pick those 33 commits, applying them to the
> > last good commit from Linus' tree (http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=4ead36407b41eae942c8c9f70ef963cd369c90e2)
> > and it also doesn't show the regression.
> > 
> > Do you know what's going on here? I'm just an end user and I've
> > definitely run out of ideas.
> > 
> > Thanks in advance for your help,
> > 
> > Hernando Torque
> > Thanks
> > 
> > [0]: http://launchpadlibrarian.net/64639927/27-12-intel.png
> > [1]: http://launchpadlibrarian.net/64640030/38-4-intel.png
> 
> From the boot chart, two things are noticeable.
> 
> 1. kworker/0:1's uninterruptible sleeps start later than kseriod's.
> 
>    It could be that cpu 0 was busy running other stuff and thus cmwq
>    delayed executing serio_event_work; however, if we look at the CPU
>    usage, that doesn't seem likely.  The CPU is not busy at all and if
>    the CPU isn't busy, cmwq wouldn't introduce any noticeable delay in
>    work item execution.
> 
>    Another possibility is the rescuer concurrency depletion bug is
>    delaying execution of queued work items early during boot.  This
>    was fixed recently.  Can you please give a shot at 2.6.38-rc6 and
>    see whether anything is different?
> 
> 2. Most of the delay is caused by xorg starting up much later.  xorg
>    seems to start up in parallel with the kseriod sleeps in 2.6.37 but
>    on 2.6.38 it seems to wait for the serio_event_work to finish.
> 
>    I have no idea what xorg is waiting for.  Dmitry, any clue?
> 

It looks like it is not X that is waiting but plymouth not being told to
quit... I will have to look at what triggers the plymouth->X/GDM transition.

Also, the serio jobs (mouse probe) are quite lengthy. Should they be using
an unbound workqueue instead?

Thanks.

-- 
Dmitry

* Re: Boot time regression in 2.6.38 after initial wq merge
  2011-02-22  8:52   ` Dmitry Torokhov
@ 2011-02-22  9:02     ` Tejun Heo
  2011-02-22  9:15       ` Dmitry Torokhov
  0 siblings, 1 reply; 13+ messages in thread
From: Tejun Heo @ 2011-02-22  9:02 UTC (permalink / raw)
  To: Dmitry Torokhov; +Cc: pantherchen@versanet.de, linux-kernel

On Tue, Feb 22, 2011 at 12:52:23AM -0800, Dmitry Torokhov wrote:
> > 1. kworker/0:1's uninterruptible sleeps start later than kseriod's.
> > 
> >    It could be that cpu 0 was busy running other stuff and thus cmwq
> >    delayed executing serio_event_work; however, if we look at the CPU
> >    usage, that doesn't seem likely.  The CPU is not busy at all and if
> >    the CPU isn't busy, cmwq wouldn't introduce any noticeable delay in
> >    work item execution.
> > 
> >    Another possibility is the rescuer concurrency depletion bug is
> >    delaying execution of queued work items early during boot.  This
> >    was fixed recently.  Can you please give a shot at 2.6.38-rc6 and
> >    see whether anything is different?
> > 
> > 2. Most of the delay is caused by xorg starting up much later.  xorg
> >    seems to start up in parallel with the kseriod sleeps in 2.6.37 but
> >    on 2.6.38 it seems to wait for the serio_event_work to finish.
> > 
> >    I have no idea what xorg is waiting for.  Dmitry, any clue?
> > 
> 
> It looks like it is not X that is waiting but plymouth not being told to
> quit... I will have to look at what triggers the plymouth->X/GDM transition.
> 
> Also, the serio jobs (mouse probe) are quite lengthy. Should they be using
> an unbound workqueue instead?

How long it runs doesn't matter at all.  If you look at the boot
chart, as soon as those uninterruptible sleeps start, kworker/0:2 is
created to serve other work items, so it doesn't really affect anyone
else.  Unbound ones are mostly helpful for cases where the work items
involved may consume a large amount of CPU cycles (not true here) over
a long period of time.  That said, something definitely seems wrong
here.  Eh well, let's find out. :-)

Thanks.

-- 
tejun

* Re: Boot time regression in 2.6.38 after initial wq merge
  2011-02-22  9:02     ` Tejun Heo
@ 2011-02-22  9:15       ` Dmitry Torokhov
  2011-02-22 15:04         ` pantherchen
  0 siblings, 1 reply; 13+ messages in thread
From: Dmitry Torokhov @ 2011-02-22  9:15 UTC (permalink / raw)
  To: Tejun Heo; +Cc: pantherchen@versanet.de, linux-kernel@vger.kernel.org

On Feb 22, 2011, at 1:02 AM, Tejun Heo <tj@kernel.org> wrote:

> On Tue, Feb 22, 2011 at 12:52:23AM -0800, Dmitry Torokhov wrote:
>>> 1. kworker/0:1's uninterruptible sleeps start later than kseriod's.
>>> 
>>>   It could be that cpu 0 was busy running other stuff and thus cmwq
>>>   delayed executing serio_event_work; however, if we look at the CPU
>>>   usage, that doesn't seem likely.  The CPU is not busy at all and if
>>>   the CPU isn't busy, cmwq wouldn't introduce any noticeable delay in
>>>   work item execution.
>>> 
>>>   Another possibility is the rescuer concurrency depletion bug is
>>>   delaying execution of queued work items early during boot.  This
>>>   was fixed recently.  Can you please give a shot at 2.6.38-rc6 and
>>>   see whether anything is different?
>>> 
>>> 2. Most of the delay is caused by xorg starting up much later.  xorg
>>>   seems to start up in parallel with the kseriod sleeps in 2.6.37 but
>>>   on 2.6.38 it seems to wait for the serio_event_work to finish.
>>> 
>>>   I have no idea what xorg is waiting for.  Dmitry, any clue?
>>> 
>> 
>> It looks like it is not X that is waiting but plymouth not being told to
>> quit... I will have to look at what triggers the plymouth->X/GDM transition.
>> 
>> Also, the serio jobs (mouse probe) are quite lengthy. Should they be using
>> an unbound workqueue instead?
> 
> How long it runs doesn't matter at all.  If you look at the boot
> chart, as soon as those uninterruptible sleeps start, kworker/0:2 is
> created to serve other work items, so it doesn't really affect anyone
> else.  Unbound ones are mostly helpful for cases where the work items
> involved may consume a large amount of CPU cycles (not true here) over
> a long period of time.  That said, something definitely seems wrong
> here.  Eh well, let's find out. :-)

OK, so here on Fedora 14, Plymouth is told to exit once all rc scripts
have finished running, as part of the prefdm task. If serio hogs kworkers
and for some reason mount or similar has to wait, that would explain the
boot delay. I'd like to see rc6 tried.

Thanks. 

-- 
Dmitry


* Re: Boot time regression in 2.6.38 after initial wq merge
  2011-02-22  9:15       ` Dmitry Torokhov
@ 2011-02-22 15:04         ` pantherchen
  2011-02-22 17:22           ` Dmitry Torokhov
  0 siblings, 1 reply; 13+ messages in thread
From: pantherchen @ 2011-02-22 15:04 UTC (permalink / raw)
  To: Dmitry Torokhov; +Cc: Tejun Heo, linux-kernel@vger.kernel.org

On 02/22/2011 10:15 AM, Dmitry Torokhov wrote:
> I'd like to see rc6 tried.

Unfortunately it shows the same behavior [0]. (To speed things up, I 
used a stripped down kernel config [1], while the first two posted boot 
charts used Ubuntu's stock kernels.)

I was surprised that the kernel directly built from the wq branch and 
the last "good" kernel from Linus' tree with the 33 patches applied 
don't show the increased boot time - shouldn't they all be the same?

About the system: Lenovo ThinkPad T510 with a Core i5-560M, 6 GB RAM, an 
SSD, and hybrid graphics (can boot from either the Intel HD or the 
Nvidia NVS 3100M - regression shows up with both).

The system is running an up-to-date Ubuntu dev release (Natty) with Xorg 
7.6(~3ubuntu7), Xserver 1.9.99.901+git20110131.be3be758(-0ubuntu6), and 
Plymouth 0.8.2(-2ubuntu17) FWIW.

Thanks,

Hernando Torque

[0]: http://img.xrmb2.net/images/576186.png
[1]: http://paste.ubuntu.com/570595/

* Re: Boot time regression in 2.6.38 after initial wq merge
  2011-02-22 15:04         ` pantherchen
@ 2011-02-22 17:22           ` Dmitry Torokhov
  2011-02-22 17:59             ` Dmitry Torokhov
  0 siblings, 1 reply; 13+ messages in thread
From: Dmitry Torokhov @ 2011-02-22 17:22 UTC (permalink / raw)
  To: pantherchen@versanet.de; +Cc: Tejun Heo, linux-kernel@vger.kernel.org

On Tue, Feb 22, 2011 at 04:04:02PM +0100, pantherchen@versanet.de wrote:
> On 02/22/2011 10:15 AM, Dmitry Torokhov wrote:
> >I'd like to see rc6 tried.
> 
> Unfortunately it shows the same behavior [0]. (To speed things up, I
> used a stripped down kernel config [1], while the first two posted
> boot charts used Ubuntu's stock kernels.)
> 
> I was surprised that the kernel directly built from the wq branch
> and the last "good" kernel from Linus' tree with the 33 patches
> applied don't show the increased boot time - shouldn't they all be
> the same?
> 

No, because there were more merges between your last-known-good (which
is somewhere in the middle of Jiri's HID merge) and Tejun's workqueue
pull. Namely, there was the merge of my tree that changed serio from
using kseriod to the common workqueue.

Just to confirm, if you revert commit

	8ee294cd9def0004887da7f44b80563493b0a097

from 2.6.38-rc6, does this restore boot time?

-- 
Dmitry

* Re: Boot time regression in 2.6.38 after initial wq merge
  2011-02-22 17:22           ` Dmitry Torokhov
@ 2011-02-22 17:59             ` Dmitry Torokhov
  2011-02-22 19:32               ` pantherchen
  0 siblings, 1 reply; 13+ messages in thread
From: Dmitry Torokhov @ 2011-02-22 17:59 UTC (permalink / raw)
  To: pantherchen@versanet.de; +Cc: Tejun Heo, linux-kernel@vger.kernel.org

On Tue, Feb 22, 2011 at 09:22:54AM -0800, Dmitry Torokhov wrote:
> On Tue, Feb 22, 2011 at 04:04:02PM +0100, pantherchen@versanet.de wrote:
> > On 02/22/2011 10:15 AM, Dmitry Torokhov wrote:
> > >I'd like to see rc6 tried.
> > 
> > Unfortunately it shows the same behavior [0]. (To speed things up, I
> > used a stripped down kernel config [1], while the first two posted
> > boot charts used Ubuntu's stock kernels.)
> > 
> > I was surprised that the kernel directly built from the wq branch
> > and the last "good" kernel from Linus' tree with the 33 patches
> > applied don't show the increased boot time - shouldn't they all be
> > the same?
> > 
> 
> No, because there were more merges between your last-known-good (which
> is somewhere in the middle of Jiri's HID merge) and Tejun's workqueue
> pull. Namely, there was the merge of my tree that changed serio from
> using kseriod to the common workqueue.
> 
> Just to confirm, if you revert commit
> 
> 	8ee294cd9def0004887da7f44b80563493b0a097
> 
> from 2.6.38-rc6, does this restore boot time?
> 

And if that indeed fixes the issue I wonder if the reason for the stall
is that we trip on flush_scheduled_work() somewhere. If you could stick
dump_stack() into flush_scheduled_work() that might give us some clues.

Thanks.

-- 
Dmitry

* Re: Boot time regression in 2.6.38 after initial wq merge
  2011-02-22 17:59             ` Dmitry Torokhov
@ 2011-02-22 19:32               ` pantherchen
  2011-02-22 19:52                 ` Dmitry Torokhov
  0 siblings, 1 reply; 13+ messages in thread
From: pantherchen @ 2011-02-22 19:32 UTC (permalink / raw)
  To: Dmitry Torokhov; +Cc: Tejun Heo, linux-kernel@vger.kernel.org

On 02/22/2011 06:59 PM, Dmitry Torokhov wrote:
> On Tue, Feb 22, 2011 at 09:22:54AM -0800, Dmitry Torokhov wrote:
>> Just to confirm, if you revert commit
>>
>> 	8ee294cd9def0004887da7f44b80563493b0a097
>>
>> from 2.6.38-rc6, does this restore boot time?

Yes, it's booting fine with that commit reverted.

> And if that indeed fixes the issue I wonder if the reason for the stall
> is that we trip on flush_scheduled_work() somewhere. If you could stick
> dump_stack() into flush_scheduled_work() that might give us some clues.

I wasn't sure where to put the dump_stack() call, so I placed one before 
and one after the flush_workqueue() call.

Here's the part that turned up in dmesg: http://paste.ubuntu.com/570729/

Regards,

Hernando Torque

* Re: Boot time regression in 2.6.38 after initial wq merge
  2011-02-22 19:32               ` pantherchen
@ 2011-02-22 19:52                 ` Dmitry Torokhov
  2011-02-22 20:14                   ` pantherchen
  0 siblings, 1 reply; 13+ messages in thread
From: Dmitry Torokhov @ 2011-02-22 19:52 UTC (permalink / raw)
  To: pantherchen@versanet.de; +Cc: Tejun Heo, linux-kernel@vger.kernel.org

On Tue, Feb 22, 2011 at 08:32:55PM +0100, pantherchen@versanet.de wrote:
> On 02/22/2011 06:59 PM, Dmitry Torokhov wrote:
> >On Tue, Feb 22, 2011 at 09:22:54AM -0800, Dmitry Torokhov wrote:
> >>Just to confirm, if you revert commit
> >>
> >>	8ee294cd9def0004887da7f44b80563493b0a097
> >>
> >>from 2.6.38-rc6, does this restore boot time?
> 
> Yes, it's booting fine with that commit reverted.
> 
> >And if that indeed fixes the issue I wonder if the reason for the stall
> >is that we trip on flush_scheduled_work() somewhere. If you could stick
> >dump_stack() into flush_scheduled_work() that might give us some clues.
> 
> I wasn't sure where to put the dump_stack() call, so I placed one
> before and one after the flush_workqueue() call.
> 
> Here's the part that turned up in dmesg: http://paste.ubuntu.com/570729/
> 

Ewww... tty/ldisc...

Does it help if you change drivers/input/serio/serio.c::serio_queue_event()
from calling

	schedule_work(&serio_event_work);

to call

	queue_work(system_long_wq, &serio_event_work);

?

Thanks.

-- 
Dmitry

* Re: Boot time regression in 2.6.38 after initial wq merge
  2011-02-22 19:52                 ` Dmitry Torokhov
@ 2011-02-22 20:14                   ` pantherchen
  2011-02-23  5:20                     ` Dmitry Torokhov
  0 siblings, 1 reply; 13+ messages in thread
From: pantherchen @ 2011-02-22 20:14 UTC (permalink / raw)
  To: Dmitry Torokhov; +Cc: Tejun Heo, linux-kernel@vger.kernel.org

On 02/22/2011 08:52 PM, Dmitry Torokhov wrote:
> Ewww... tty/ldisc...
>
> Does it help if you change drivers/input/serio/serio.c::serio_queue_event()
> from calling
>
> 	schedule_work(&serio_event_work);
>
> to call
>
> 	queue_work(system_long_wq,&serio_event_work);
>
> ?

Yes, that works: http://img.xrmb2.net/images/499337.png

Regards,

Hernando Torque

* Re: Boot time regression in 2.6.38 after initial wq merge
  2011-02-22 20:14                   ` pantherchen
@ 2011-02-23  5:20                     ` Dmitry Torokhov
  2011-02-23  9:38                       ` Tejun Heo
  0 siblings, 1 reply; 13+ messages in thread
From: Dmitry Torokhov @ 2011-02-23  5:20 UTC (permalink / raw)
  To: pantherchen@versanet.de; +Cc: Tejun Heo, linux-kernel@vger.kernel.org

On Tue, Feb 22, 2011 at 09:14:46PM +0100, pantherchen@versanet.de wrote:
> On 02/22/2011 08:52 PM, Dmitry Torokhov wrote:
> >Ewww... tty/ldisc...
> >
> >Does it help if you change drivers/input/serio/serio.c::serio_queue_event()
> >from calling
> >
> >	schedule_work(&serio_event_work);
> >
> >to call
> >
> >	queue_work(system_long_wq,&serio_event_work);
> >
> >?
> 
> Yes, that works: http://img.xrmb2.net/images/499337.png
> 

Great!

OK, so below is a properly formatted patch. I am going to send it upstream
unless somebody objects.

Thanks.

-- 
Dmitry


Input: serio/gameport - use 'long' system workqueue

From: Dmitry Torokhov <dmitry.torokhov@gmail.com>

Commit 8ee294cd9def0004887da7f44b80563493b0a097 converted serio
subsystem event handling from using a dedicated thread to using the
common workqueue. Unfortunately, this regressed our boot times,
because serio jobs take a long time to execute. While the new
concurrency managed workqueue code manages long-playing works just
fine and schedules additional workers as needed, such works wreak
havoc among the remaining users of flush_scheduled_work().

To solve this problem let's move serio/gameport works from system_wq
to system_long_wq which nobody tries to flush.

Reported-and-tested-by: Hernando Torque <pantherchen@versanet.de>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
---

 drivers/input/gameport/gameport.c |    2 +-
 drivers/input/serio/serio.c       |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)


diff --git a/drivers/input/gameport/gameport.c b/drivers/input/gameport/gameport.c
index 23cf8fc..5b8f59d 100644
--- a/drivers/input/gameport/gameport.c
+++ b/drivers/input/gameport/gameport.c
@@ -360,7 +360,7 @@ static int gameport_queue_event(void *object, struct module *owner,
 	event->owner = owner;
 
 	list_add_tail(&event->node, &gameport_event_list);
-	schedule_work(&gameport_event_work);
+	queue_work(system_long_wq, &gameport_event_work);
 
 out:
 	spin_unlock_irqrestore(&gameport_event_lock, flags);
diff --git a/drivers/input/serio/serio.c b/drivers/input/serio/serio.c
index 7c38d1f..ba70058 100644
--- a/drivers/input/serio/serio.c
+++ b/drivers/input/serio/serio.c
@@ -299,7 +299,7 @@ static int serio_queue_event(void *object, struct module *owner,
 	event->owner = owner;
 
 	list_add_tail(&event->node, &serio_event_list);
-	schedule_work(&serio_event_work);
+	queue_work(system_long_wq, &serio_event_work);
 
 out:
 	spin_unlock_irqrestore(&serio_event_lock, flags);

* Re: Boot time regression in 2.6.38 after initial wq merge
  2011-02-23  5:20                     ` Dmitry Torokhov
@ 2011-02-23  9:38                       ` Tejun Heo
  2011-02-23 10:22                         ` Dmitry Torokhov
  0 siblings, 1 reply; 13+ messages in thread
From: Tejun Heo @ 2011-02-23  9:38 UTC (permalink / raw)
  To: Dmitry Torokhov; +Cc: pantherchen@versanet.de, linux-kernel@vger.kernel.org

Hello,

On Tue, Feb 22, 2011 at 09:20:42PM -0800, Dmitry Torokhov wrote:
> On Tue, Feb 22, 2011 at 09:14:46PM +0100, pantherchen@versanet.de wrote:
> > On 02/22/2011 08:52 PM, Dmitry Torokhov wrote:
> > >Ewww... tty/ldisc...
> > >
> > >Does it help if you change drivers/input/serio/serio.c::serio_queue_event()
> > >from calling
> > >
> > >	schedule_work(&serio_event_work);
> > >
> > >to call
> > >
> > >	queue_work(system_long_wq,&serio_event_work);
> > >
> > >?
> > 
> > Yes, that works: http://img.xrmb2.net/images/499337.png
> > 
> 
> Great!
> 
> OK, so below is a properly formatted patch. I am going to send it upstream
> unless somebody objects.

Ah, okay, thanks a lot for tracking this down, Dmitry.  Didn't expect
flush_scheduled_work() to bite back this way.  :-)

> Input: serio/gameport - use 'long' system workqueue
> 
> From: Dmitry Torokhov <dmitry.torokhov@gmail.com>
> 
> Commit 8ee294cd9def0004887da7f44b80563493b0a097 converted serio
> subsystem event handling from using a dedicated thread to using the
> common workqueue. Unfortunately, this regressed our boot times,
> because serio jobs take a long time to execute. While the new
> concurrency managed workqueue code manages long-playing works just
> fine and schedules additional workers as needed, such works wreak
> havoc among the remaining users of flush_scheduled_work().
> 
> To solve this problem let's move serio/gameport works from system_wq
> to system_long_wq which nobody tries to flush.
> 
> Reported-and-tested-by: Hernando Torque <pantherchen@versanet.de>
> Signed-off-by: Dmitry Torokhov <dtor@mail.ru>

Acked-by: Tejun Heo <tj@kernel.org>

Most flush_scheduled_work() users are already gone and once all of
them are gone, system_long_wq can be removed without causing any
difference.

Thank you.

-- 
tejun

* Re: Boot time regression in 2.6.38 after initial wq merge
  2011-02-23  9:38                       ` Tejun Heo
@ 2011-02-23 10:22                         ` Dmitry Torokhov
  0 siblings, 0 replies; 13+ messages in thread
From: Dmitry Torokhov @ 2011-02-23 10:22 UTC (permalink / raw)
  To: Tejun Heo; +Cc: pantherchen@versanet.de, linux-kernel@vger.kernel.org

On Wed, Feb 23, 2011 at 10:38:54AM +0100, Tejun Heo wrote:
> Hello,
> 
> On Tue, Feb 22, 2011 at 09:20:42PM -0800, Dmitry Torokhov wrote:
> > On Tue, Feb 22, 2011 at 09:14:46PM +0100, pantherchen@versanet.de wrote:
> > > On 02/22/2011 08:52 PM, Dmitry Torokhov wrote:
> > > >Ewww... tty/ldisc...
> > > >
> > > >Does it help if you change drivers/input/serio/serio.c::serio_queue_event()
> > > >from calling
> > > >
> > > >	schedule_work(&serio_event_work);
> > > >
> > > >to call
> > > >
> > > >	queue_work(system_long_wq,&serio_event_work);
> > > >
> > > >?
> > > 
> > > Yes, that works: http://img.xrmb2.net/images/499337.png
> > > 
> > 
> > Great!
> > 
> > OK, so below is a properly formatted patch. I am going to send it upstream
> > unless somebody objects.
> 
> Ah, okay, thanks a lot for tracking this down, Dmitry.  Didn't expect
> flush_scheduled_work() to bite back this way.  :-)
> 
> > Input: serio/gameport - use 'long' system workqueue
> > 
> > From: Dmitry Torokhov <dmitry.torokhov@gmail.com>
> > 
> > Commit 8ee294cd9def0004887da7f44b80563493b0a097 converted serio
> > subsystem event handling from using a dedicated thread to using the
> > common workqueue. Unfortunately, this regressed our boot times,
> > because serio jobs take a long time to execute. While the new
> > concurrency managed workqueue code manages long-playing works just
> > fine and schedules additional workers as needed, such works wreak
> > havoc among the remaining users of flush_scheduled_work().
> > 
> > To solve this problem let's move serio/gameport works from system_wq
> > to system_long_wq which nobody tries to flush.
> > 
> > Reported-and-tested-by: Hernando Torque <pantherchen@versanet.de>
> > Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
> 
> Acked-by: Tejun Heo <tj@kernel.org>
> 
> Most flush_scheduled_work() users are already gone and once all of
> them are gone, system_long_wq can be removed without causing any
> difference.

Right. When you are ready to remove system_long_wq, simply revert this
patch and all should be set.

Thanks.

-- 
Dmitry
