From: "Tim Sander" <tim.sander@hbm.com>
To: "Mike Galbraith" <efault@gmx.de>
Cc: "Tim Sander" <tstone@iss.tu-darmstadt.de>,
"Steven Rostedt" <rostedt@goodmis.org>,
"LKML" <linux-kernel@vger.kernel.org>,
"RT" <linux-rt-users@vger.kernel.org>,
"Thomas Gleixner" <tglx@linutronix.de>,
"Clark Williams" <williams@redhat.com>,
"John Kacur" <jkacur@redhat.com>
Subject: Re: [ANNOUNCE] 3.0.14-rt31 - ksoftirq running wild - FEC ethernet driver to blame? Yep
Date: Wed, 18 Jan 2012 12:11:44 +0100
Message-ID: <201201181211.45011.tim.sander@hbm.com>
In-Reply-To: <1326822011.7386.40.camel@marge.simson.net>
Hi Mike and others
Thanks for your reply Mike.
On Tuesday, 17 January 2012, 18:40:11, Mike Galbraith wrote:
> I have a patchlet lying about that will show the likely culprit, but if
> ksoftirqd is eating CPU, someone has to be raising softirqs at a frightful
> rate, and the culprit it shows would almost certainly be ksoftirqd. I
> mean, what else is running during boot that is RT other than kernel
> threads. Nada.
Well, thanks for your patch. It didn't apply cleanly due to some moved lines,
but nothing too serious. I now have a machine where top just shows me the
culprit:
sirq-net-tx/0
It seems to be triggered less often than on the mainline rt kernel, though. But
after some starts and stops of "connmand" and "ifconfig eth0 down" I got this
erroneous behaviour back. The only question is: what next? Still, I have some
more observations which might help to nail down this bug:
* ifconfig does not return when sirq-net-tx/0 eats all cpu
* sometimes sirq-net-tx/0 sits on the cpu for a couple of seconds and goes
away, sometimes it just stays there when "ifconfig eth0 up" is issued.
* There are suspicious "FEC: MDIO read timeout" kernel log messages from the
ethernet driver.
* The ethernet phy uses polling since I do not know how to set the phy irq in
the board definition. I tried using "phy_register_fixup_for_uid" and then
setting phy_dev->irq in the fixup routine, but that seems to be too late: when
the network device is shut down, the interrupt is deregistered although it was
never registered.
I also didn't find an example in the source, and there is no word about it in
the phy.txt documentation. So input on how to set the phy irq in the
board config of the pcm043 would be really nice.
> You can find out easily enough, just edit kernel/softirq.c, comment
> out ksoftirqd_set_sched_params() in run_ksoftirqd(). If the throttle
> doesn't kick in (because ksoftirqd is now not RT), box boots but
> ksoftirqd still chewing up a CPU, you have the same info the throttle
> hacklet would show.
>
> If that's it, you can apply the below, do the same edit, and see which
> thread is grinding away. From there, I'd set a trap. Let sirq threads
> detect that they are being awakened too fast (hey, I can't go to sleep,
> the sirq I just processed is busy again, N times in a row) and leave a
> note for wakeup_softirqd(). There, WARN_ON(ksoftirqd[i].help_me) or
> such, to see who is flogging which softirq mercilessly.
I didn't use these tricks, since top was already doing its job well enough :-).
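For anyone who does want to try the trap, the bookkeeping Mike describes could
be sketched roughly as below. This is only a userspace illustration of the
counting logic, not kernel code: the struct, the function names and the
threshold are hypothetical, and only the help_me flag name is taken from the
quoted mail. In the kernel, the check on the wakeup path would be a WARN_ON()
rather than a return value.

```c
#include <stdbool.h>

/* Hypothetical threshold: the "N times in a row" from the quoted mail. */
#define SIRQ_STORM_THRESHOLD 3

/* Hypothetical per-softirq-thread state; only help_me is named in the mail. */
struct sirq_trap {
    unsigned int busy_streak; /* consecutive passes that found the sirq raised again */
    bool help_me;             /* the "note" left for the wakeup path */
};

/*
 * Called by the (simulated) softirq thread after each processing pass.
 * pending_again is true when the softirq it just handled is already
 * raised again, i.e. the thread cannot go to sleep.
 */
static void sirq_trap_note_pass(struct sirq_trap *t, bool pending_again)
{
    if (!pending_again) {
        /* The thread got to sleep: reset the streak and clear the note. */
        t->busy_streak = 0;
        t->help_me = false;
        return;
    }
    if (++t->busy_streak >= SIRQ_STORM_THRESHOLD)
        t->help_me = true;
}

/*
 * Called on the (simulated) wakeup path. Returns true at the point where
 * a kernel version would warn and dump the stack of whoever raised the
 * softirq.
 */
static bool sirq_trap_check_wakeup(const struct sirq_trap *t)
{
    return t->help_me;
}
```

With the threshold of 3 assumed here, the check only fires once three passes
in a row found the softirq pending again, and a single quiet pass clears it.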
Best regards
Tim