* Re: warn: Turn the netdev timeout WARN_ON() into a WARN()
[not found] <200809170259.m8H2xClI020907@hera.kernel.org>
@ 2008-09-17 3:27 ` Jeff Garzik
2008-09-17 13:41 ` Arjan van de Ven
0 siblings, 1 reply; 5+ messages in thread
From: Jeff Garzik @ 2008-09-17 3:27 UTC (permalink / raw)
To: Linux Kernel Mailing List, torvalds, davem, arjan
On Wed, Sep 17, 2008 at 02:59:12AM +0000, Linux Kernel Mailing List wrote:
>
> this patch turns the netdev timeout WARN_ON_ONCE() into a WARN_ONCE(),
> so that the device and driver names are inside the warning message.
> This helps automated tools like kerneloops.org to collect the data
> and do statistics, as well as making it more likely that humans
> cut-n-paste the important message as part of a bugreport.
>
> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
>
> +#define WARN_ONCE(condition, format...) ({ \
> + static int __warned; \
> + int __ret_warn_once = !!(condition); \
> + \
> + if (unlikely(__ret_warn_once)) \
> + if (WARN(!__warned, format)) \
> + __warned = 1; \
> + unlikely(__ret_warn_once); \
> +})
> +
> #define WARN_ON_RATELIMIT(condition, state) \
> WARN_ON((condition) && __ratelimit(state))
>
> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index 9634091..ec0a083 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -215,10 +215,9 @@ static void dev_watchdog(unsigned long arg)
> time_after(jiffies, (dev->trans_start +
> dev->watchdog_timeo))) {
> char drivername[64];
> - printk(KERN_INFO "NETDEV WATCHDOG: %s (%s): transmit timed out\n",
> + WARN_ONCE(1, KERN_INFO "NETDEV WATCHDOG: %s (%s): transmit timed out\n",
> dev->name, netdev_drivername(dev, drivername, 64));
> dev->tx_timeout(dev);
> - WARN_ON_ONCE(1);
hrm, am I misunderstanding?
AFAICS, this change means the user is no longer notified [after
the first time] of a condition they really need to know about --
a hardware or driver bug.
These conditions can occur many hours or days apart, and the admin
needs to know EACH time it occurs, because it is a major networking
event, generally leading to a complete reset of the entire hardware.
And quite honestly, the backtrace is not useful (yes, even the one
that existing previously)... THINK for a second. The backtrace
is going to look exactly the same, since it is a timer-triggered
dev_watchdog() call.
NETDEV WATCHDOG timeouts are not easily fixable errors like lockdep
warnings, and the admin really does need to see each one.
Unless I am missing something, (1) this patch should be reverted,
and in additional, (2) I recommend removing the WARN_ON_ONCE()
because the backtrace is not helpful.
Jeff
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: warn: Turn the netdev timeout WARN_ON() into a WARN()
2008-09-17 3:27 ` warn: Turn the netdev timeout WARN_ON() into a WARN() Jeff Garzik
@ 2008-09-17 13:41 ` Arjan van de Ven
2008-09-17 21:39 ` Jeff Garzik
0 siblings, 1 reply; 5+ messages in thread
From: Arjan van de Ven @ 2008-09-17 13:41 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Linux Kernel Mailing List, torvalds, davem
On Tue, 16 Sep 2008 23:27:08 -0400
Jeff Garzik <jeff@garzik.org> wrote:
> On Wed, Sep 17, 2008 at 02:59:12AM +0000, Linux Kernel Mailing List
> wrote:
>
> hrm, am I misunderstanding?
>
> AFAICS, this change means the user is no longer notified [after
> the first time] of a condition they really need to know about --
> a hardware or driver bug.
>
> These conditions can occur many hours or days apart, and the admin
> needs to know EACH time it occurs, because it is a major networking
> event, generally leading to a complete reset of the entire hardware.
ok fair enough; the patch below drops the _ONCE
>
> And quite honestly, the backtrace is not useful (yes, even the one
> that existing previously)... THINK for a second. The backtrace
> is going to look exactly the same, since it is a timer-triggered
> dev_watchdog() call.
>
> NETDEV WATCHDOG timeouts are not easily fixable errors like lockdep
> warnings, and the admin really does need to see each one.
>
> Unless I am missing something, (1) this patch should be reverted,
.. or rather changed as below ..
> and in additional, (2) I recommend removing the WARN_ON_ONCE()
> because the backtrace is not helpful.
I disagree on your (2):
First of all, you say that these messages are really critical for the
admin; well, a WARN() is a much more critical message than a printk, and
tends to get a lot more attention to it.
Second, a WARN() is about giving a set of standard information so that
the bugreports we get are more useful the first time. Yes a backtrace
is part of that, and yes you're right just the backtrace won't have new
information here, but a WARN() message also contains things such as the
kernel version, the loaded modules and is generally easier for a user
to make a good bugreport from. That one component of it isn't going to
contain new information so be it.
Third, by using this standardized format, computer programs such as
the kerneloops tools can automatically collect these reports and do
statistics on them.
Your argument that these are not easily fixable may be valid, but
they're as you say serious issues, and at least some portion can be
fixed by doing driver work; I just got an email from an ATL1E developer
as a result of my oopsreport who said he was fixing such issues the
last week for example. If all the networking guys agree that the report
has no value for developers because they're all unfixable hardware bugs,
I'm happy to stop tracking it in kerneloops.org, but I get the impression
so far that that is not the case.
From: Arjan van de Ven <arjan@linux.intel.com>
Subject: [PATCH] net: WARN_ONCE -> WARN as requested by Jeff Garzik
Jeff points out that you want to see each and every
timeout message
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
net/sched/sch_generic.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index ec0a083..8772642 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -215,7 +215,7 @@ static void dev_watchdog(unsigned long arg)
time_after(jiffies, (dev->trans_start +
dev->watchdog_timeo))) {
char drivername[64];
- WARN_ONCE(1, KERN_INFO "NETDEV WATCHDOG: %s (%s): transmit timed out\n",
+ WARN(1, KERN_INFO "NETDEV WATCHDOG: %s (%s): transmit timed out\n",
dev->name, netdev_drivername(dev, drivername, 64));
dev->tx_timeout(dev);
}
--
1.5.5.1
--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: warn: Turn the netdev timeout WARN_ON() into a WARN()
2008-09-17 13:41 ` Arjan van de Ven
@ 2008-09-17 21:39 ` Jeff Garzik
2008-09-17 21:45 ` Arjan van de Ven
0 siblings, 1 reply; 5+ messages in thread
From: Jeff Garzik @ 2008-09-17 21:39 UTC (permalink / raw)
To: Arjan van de Ven; +Cc: Linux Kernel Mailing List, torvalds, davem
Arjan van de Ven wrote:
> From: Arjan van de Ven <arjan@linux.intel.com>
> Subject: [PATCH] net: WARN_ONCE -> WARN as requested by Jeff Garzik
>
> Jeff points out that you want to see each and every
> timeout message
>
> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
> ---
> net/sched/sch_generic.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index ec0a083..8772642 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -215,7 +215,7 @@ static void dev_watchdog(unsigned long arg)
> time_after(jiffies, (dev->trans_start +
> dev->watchdog_timeo))) {
> char drivername[64];
> - WARN_ONCE(1, KERN_INFO "NETDEV WATCHDOG: %s (%s): transmit timed out\n",
> + WARN(1, KERN_INFO "NETDEV WATCHDOG: %s (%s): transmit timed out\n",
> dev->name, netdev_drivername(dev, drivername, 64));
> dev->tx_timeout(dev);
ACK, this fixes my objection, thanks
It still seems a bit burdensome to print out a huge, redundant backtrace
and kernel info each time, though.
The most important information is (a) a timeout occurred and (b)
hopefully some hardware register dump or similar driver-specific output.
> If all the networking guys agree that the report
> has no value for developers because they're all unfixable hardware bugs,
You misunderstand; reporting and tracking these issues have value, but
the information you are dumping (specifically the backtrace) does not.
You say your goal here mainly to standardize reporting to permit
automated collection -- I support that goal too. But is there a better
way -- say perhaps just printing the header you need, and _not_ the
useless backtrace?
I think it would benefit Linux if our bugs announce themselves in a
standard way, but the backtrace is not a key part of that, and IMO
should be optional.
Jeff
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: warn: Turn the netdev timeout WARN_ON() into a WARN()
2008-09-17 21:39 ` Jeff Garzik
@ 2008-09-17 21:45 ` Arjan van de Ven
2008-09-17 21:53 ` Jeff Garzik
0 siblings, 1 reply; 5+ messages in thread
From: Arjan van de Ven @ 2008-09-17 21:45 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Linux Kernel Mailing List, torvalds, davem
On Wed, 17 Sep 2008 17:39:07 -0400
Jeff Garzik <jeff@garzik.org> wrote:
\
>
> ACK, this fixes my objection, thanks
ok
> > If all the networking guys agree that the report
> > has no value for developers because they're all unfixable hardware
> > bugs,
>
> You misunderstand; reporting and tracking these issues have value,
> but the information you are dumping (specifically the backtrace) does
> not.
WARN() is about a lot more than a backtrace, but point taken, see below
>
> I think it would benefit Linux if our bugs announce themselves in a
> standard way, but the backtrace is not a key part of that, and IMO
> should be optional.
I cannot disagree with the "the backtrace is not useful" statement for
this case. I don't even mind making a WARN() variant that doesn't
backtrace; I'll look into that. I don't consider it "2.6.27 urgent"
though; a backtrace is pretty light to do for such an exceptional
slowpath case, and the only real effect is a few lines
in /var/log/messages that are somewhat redundant.
--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: warn: Turn the netdev timeout WARN_ON() into a WARN()
2008-09-17 21:45 ` Arjan van de Ven
@ 2008-09-17 21:53 ` Jeff Garzik
0 siblings, 0 replies; 5+ messages in thread
From: Jeff Garzik @ 2008-09-17 21:53 UTC (permalink / raw)
To: Arjan van de Ven; +Cc: Linux Kernel Mailing List, torvalds, davem
Arjan van de Ven wrote:
> I cannot disagree with the "the backtrace is not useful" statement for
> this case. I don't even mind making a WARN() variant that doesn't
> backtrace; I'll look into that. I don't consider it "2.6.27 urgent"
> though; a backtrace is pretty light to do for such an exceptional
> slowpath case, and the only real effect is a few lines
> in /var/log/messages that are somewhat redundant.
Quite agreed -- I never meant to imply that [second half of my reply]
was urgent, just giving some food for thought about improving the system.
Jeff
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2008-09-17 21:53 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <200809170259.m8H2xClI020907@hera.kernel.org>
2008-09-17 3:27 ` warn: Turn the netdev timeout WARN_ON() into a WARN() Jeff Garzik
2008-09-17 13:41 ` Arjan van de Ven
2008-09-17 21:39 ` Jeff Garzik
2008-09-17 21:45 ` Arjan van de Ven
2008-09-17 21:53 ` Jeff Garzik
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox