linux-rt-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Crap, ksoftirqd/0 looping forever in softirq
@ 2012-10-11  0:16 Paul Gortmaker
  2012-10-11  2:38 ` Mike Galbraith
  2012-10-11 14:21 ` Paul Gortmaker
  0 siblings, 2 replies; 5+ messages in thread
From: Paul Gortmaker @ 2012-10-11  0:16 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-rt-users

Seeing the above message, I added a trivial change to print out the
Mask and Pending, and I consistently saw it was (both p/m) in every
case `was __IRQF_TIMER (0x200).

Looking at do_current_softirqs, it seems that it implicitly expects
two entrance paths:

   /*
    * Called with interrupts disabled. Process softirqs which were raised
    * in current context (or on behalf of ksoftirqd).
    */
   static void do_current_softirqs(int need_rcu_bh_qs)

In there, we loop on current->softirqs_raised, but there is no "filter"
for the case where we spawn from ksoftirqd with __IRQF_TIMER, or more
specifically IRQF_NO_THREAD.  Asking ksoftirqd to handle IRQF_NO_THREAD?

So based on my dain-bramaged and limited understanding, it seems that
we are waiting on completion from ksoftirqd on something that we've
asked to be not run from a thread.  I'd propose a patch but I'd probably
only manage to hurt myself, so I'll stop at reporting the above data,
and leave it to the folks more in the know to suggest a proper fix...

Paul.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Crap, ksoftirqd/0 looping forever in softirq
  2012-10-11  0:16 Crap, ksoftirqd/0 looping forever in softirq Paul Gortmaker
@ 2012-10-11  2:38 ` Mike Galbraith
  2012-10-11 14:21 ` Paul Gortmaker
  1 sibling, 0 replies; 5+ messages in thread
From: Mike Galbraith @ 2012-10-11  2:38 UTC (permalink / raw)
  To: Paul Gortmaker; +Cc: Thomas Gleixner, linux-rt-users

On Wed, 2012-10-10 at 20:16 -0400, Paul Gortmaker wrote: 
> Seeing the above message, I added a trivial change to print out the
> Mask and Pending, and I consistently saw it was (both p/m) in every
> case `was __IRQF_TIMER (0x200).
> 
> Looking at do_current_softirqs, it seems that it implicitly expects
> two entrance paths:
> 
>    /*
>     * Called with interrupts disabled. Process softirqs which were raised
>     * in current context (or on behalf of ksoftirqd).
>     */
>    static void do_current_softirqs(int need_rcu_bh_qs)
> 
> In there, we loop on current->softirqs_raised, but there is no "filter"
> for the case where we spawn from ksoftirqd with __IRQF_TIMER, or more
> specifically IRQF_NO_THREAD.  Asking ksoftirqd to handle IRQF_NO_THREAD?
> 
> So based on my dain-bramaged and limited understanding, it seems that
> we are waiting on completion from ksoftirqd on something that we've
> asked to be not run from a thread.  I'd propose a patch but I'd probably
> only manage to hurt myself, so I'll stop at reporting the above data,
> and leave it to the folks more in the know to suggest a proper fix...

FWIW, I saw the same trying to get rt1 running an my 64 core test box.
Box has other seemingly unrelated issues, so haven't poked at it yet,
only saw it say oh crap while fighting for its life (R.I.Pieces).

-Mike


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Crap, ksoftirqd/0 looping forever in softirq
  2012-10-11  0:16 Crap, ksoftirqd/0 looping forever in softirq Paul Gortmaker
  2012-10-11  2:38 ` Mike Galbraith
@ 2012-10-11 14:21 ` Paul Gortmaker
  2012-10-11 15:34   ` Paul Gortmaker
  1 sibling, 1 reply; 5+ messages in thread
From: Paul Gortmaker @ 2012-10-11 14:21 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-rt-users

[Crap, ksoftirqd/0 looping forever in softirq] On 10/10/2012 (Wed 20:16) Paul Gortmaker wrote:

> Seeing the above message, I added a trivial change to print out the
> Mask and Pending, and I consistently saw it was (both p/m) in every
> case `was __IRQF_TIMER (0x200).

Of cource it helps if I look up the value in the right list.
I guess the ffs had me thinking "bits", instead of looking up
the value in the softirq enum list.

When I add a change to record the last vecs used and dump them
out when the Crap message is triggered (using softirq_to_name
instead of looking things up in random tables!)  I get this:

[   20.887137] Crap, ksoftirqd/0 looping forever in softirq
[   20.887150] Last: RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU
[   20.920168] Crap, ksoftirqd/0 looping forever in softirq
[   20.920179] Last: RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU
[   20.954200] Crap, ksoftirqd/0 looping forever in softirq
[   20.954213] Last: RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU

Which makes my previous guess complete garbage...

Paul.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Crap, ksoftirqd/0 looping forever in softirq
  2012-10-11 14:21 ` Paul Gortmaker
@ 2012-10-11 15:34   ` Paul Gortmaker
  2012-10-15 13:36     ` Paul E. McKenney
  0 siblings, 1 reply; 5+ messages in thread
From: Paul Gortmaker @ 2012-10-11 15:34 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-rt-users

[Re: Crap, ksoftirqd/0 looping forever in softirq] On 11/10/2012 (Thu 10:21) Paul Gortmaker wrote:

> [Crap, ksoftirqd/0 looping forever in softirq] On 10/10/2012 (Wed 20:16) Paul Gortmaker wrote:
> 
> > Seeing the above message, I added a trivial change to print out the
> > Mask and Pending, and I consistently saw it was (both p/m) in every
> > case `was __IRQF_TIMER (0x200).
> 
> Of cource it helps if I look up the value in the right list.
> I guess the ffs had me thinking "bits", instead of looking up
> the value in the softirq enum list.
> 
> When I add a change to record the last vecs used and dump them
> out when the Crap message is triggered (using softirq_to_name
> instead of looking things up in random tables!)  I get this:
> 
> [   20.887137] Crap, ksoftirqd/0 looping forever in softirq
> [   20.887150] Last: RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU
> [   20.920168] Crap, ksoftirqd/0 looping forever in softirq
> [   20.920179] Last: RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU
> [   20.954200] Crap, ksoftirqd/0 looping forever in softirq
> [   20.954213] Last: RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU

Here is a trace, which I think shows us that we call the
RCU softirq, which eventually calls invoke_rcu_core, which
is just "raise_softirq(RCU_SOFTIRQ);" so the loop which is
testing on current->softirqs_raised will never see it zero.

We clear it on entry with:
	 current->softirqs_raised &= ~mask;
but after we call:
	 do_single_softirq(i, need_rcu_bh_qs);
it will be re-armed by invoke_rcu_core.

  0xffffffff81044d9d in __raise_softirq_irqoff (nr=9) at kernel/softirq.c:608
  0xffffffff81044e29 in raise_softirq_irqoff (nr=<optimized out>) at kernel/softirq.c:618
  0xffffffff810452e1 in raise_softirq (nr=<optimized out>) at kernel/softirq.c:744
  0xffffffff810bd4e1 in invoke_rcu_core () at kernel/rcutree.c:1878
  rcu_do_batch (rdp=0xffff880017c0bf80, rsp=<optimized out>) at kernel/rcutree.c:1635
  invoke_rcu_callbacks (rdp=0xffff880017c0bf80, rsp=0xffffffff81c2b140) at kernel/rcutree.c:1870
  __rcu_process_callbacks (rsp=0xffffffff81c2b140) at kernel/rcutree.c:1842
  rcu_process_callbacks (unused=<optimized out>) at kernel/rcutree.c:1854
  0xffffffff8104466b in handle_softirq (need_rcu_bh_qs=1, cpu=0, vec_nr=9) at kernel/softirq.c:149
  do_single_softirq (need_rcu_bh_qs=1, which=9) at kernel/softirq.c:457
  do_current_softirqs (need_rcu_bh_qs=1) at kernel/softirq.c:500

Does that sound right?

Paul.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Crap, ksoftirqd/0 looping forever in softirq
  2012-10-11 15:34   ` Paul Gortmaker
@ 2012-10-15 13:36     ` Paul E. McKenney
  0 siblings, 0 replies; 5+ messages in thread
From: Paul E. McKenney @ 2012-10-15 13:36 UTC (permalink / raw)
  To: Paul Gortmaker; +Cc: Thomas Gleixner, linux-rt-users

On Thu, Oct 11, 2012 at 11:34:14AM -0400, Paul Gortmaker wrote:
> [Re: Crap, ksoftirqd/0 looping forever in softirq] On 11/10/2012 (Thu 10:21) Paul Gortmaker wrote:
> 
> > [Crap, ksoftirqd/0 looping forever in softirq] On 10/10/2012 (Wed 20:16) Paul Gortmaker wrote:
> > 
> > > Seeing the above message, I added a trivial change to print out the
> > > Mask and Pending, and I consistently saw it was (both p/m) in every
> > > case `was __IRQF_TIMER (0x200).
> > 
> > Of cource it helps if I look up the value in the right list.
> > I guess the ffs had me thinking "bits", instead of looking up
> > the value in the softirq enum list.
> > 
> > When I add a change to record the last vecs used and dump them
> > out when the Crap message is triggered (using softirq_to_name
> > instead of looking things up in random tables!)  I get this:
> > 
> > [   20.887137] Crap, ksoftirqd/0 looping forever in softirq
> > [   20.887150] Last: RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU
> > [   20.920168] Crap, ksoftirqd/0 looping forever in softirq
> > [   20.920179] Last: RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU
> > [   20.954200] Crap, ksoftirqd/0 looping forever in softirq
> > [   20.954213] Last: RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU RCU
> 
> Here is a trace, which I think shows us that we call the
> RCU softirq, which eventually calls invoke_rcu_core, which
> is just "raise_softirq(RCU_SOFTIRQ);" so the loop which is
> testing on current->softirqs_raised will never see it zero.
> 
> We clear it on entry with:
> 	 current->softirqs_raised &= ~mask;
> but after we call:
> 	 do_single_softirq(i, need_rcu_bh_qs);
> it will be re-armed by invoke_rcu_core.
> 
>   0xffffffff81044d9d in __raise_softirq_irqoff (nr=9) at kernel/softirq.c:608
>   0xffffffff81044e29 in raise_softirq_irqoff (nr=<optimized out>) at kernel/softirq.c:618
>   0xffffffff810452e1 in raise_softirq (nr=<optimized out>) at kernel/softirq.c:744
>   0xffffffff810bd4e1 in invoke_rcu_core () at kernel/rcutree.c:1878
>   rcu_do_batch (rdp=0xffff880017c0bf80, rsp=<optimized out>) at kernel/rcutree.c:1635
>   invoke_rcu_callbacks (rdp=0xffff880017c0bf80, rsp=0xffffffff81c2b140) at kernel/rcutree.c:1870
>   __rcu_process_callbacks (rsp=0xffffffff81c2b140) at kernel/rcutree.c:1842
>   rcu_process_callbacks (unused=<optimized out>) at kernel/rcutree.c:1854
>   0xffffffff8104466b in handle_softirq (need_rcu_bh_qs=1, cpu=0, vec_nr=9) at kernel/softirq.c:149
>   do_single_softirq (need_rcu_bh_qs=1, which=9) at kernel/softirq.c:457
>   do_current_softirqs (need_rcu_bh_qs=1) at kernel/softirq.c:500
> 
> Does that sound right?

(Following up on IRC discussion)

Yep, if a given CPU has a large number of callbacks, it will execute
them in batches of 10 (controlled by rcutree.blimit module parameter).
It will re-raise RCU_SOFTIRQ at the end of each batch.  The point is
to allow other softirq handlers to run.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2012-10-15 13:39 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-10-11  0:16 Crap, ksoftirqd/0 looping forever in softirq Paul Gortmaker
2012-10-11  2:38 ` Mike Galbraith
2012-10-11 14:21 ` Paul Gortmaker
2012-10-11 15:34   ` Paul Gortmaker
2012-10-15 13:36     ` Paul E. McKenney

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).