* Re: warn: Turn the netdev timeout WARN_ON() into a WARN() [not found] <200809170259.m8H2xClI020907@hera.kernel.org> @ 2008-09-17 3:27 ` Jeff Garzik 2008-09-17 13:41 ` Arjan van de Ven 0 siblings, 1 reply; 5+ messages in thread From: Jeff Garzik @ 2008-09-17 3:27 UTC (permalink / raw) To: Linux Kernel Mailing List, torvalds, davem, arjan On Wed, Sep 17, 2008 at 02:59:12AM +0000, Linux Kernel Mailing List wrote: > > this patch turns the netdev timeout WARN_ON_ONCE() into a WARN_ONCE(), > so that the device and driver names are inside the warning message. > This helps automated tools like kerneloops.org to collect the data > and do statistics, as well as making it more likely that humans > cut-n-paste the important message as part of a bugreport. > > Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> > Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> > > +#define WARN_ONCE(condition, format...) ({ \ > + static int __warned; \ > + int __ret_warn_once = !!(condition); \ > + \ > + if (unlikely(__ret_warn_once)) \ > + if (WARN(!__warned, format)) \ > + __warned = 1; \ > + unlikely(__ret_warn_once); \ > +}) > + > #define WARN_ON_RATELIMIT(condition, state) \ > WARN_ON((condition) && __ratelimit(state)) > > diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c > index 9634091..ec0a083 100644 > --- a/net/sched/sch_generic.c > +++ b/net/sched/sch_generic.c > @@ -215,10 +215,9 @@ static void dev_watchdog(unsigned long arg) > time_after(jiffies, (dev->trans_start + > dev->watchdog_timeo))) { > char drivername[64]; > - printk(KERN_INFO "NETDEV WATCHDOG: %s (%s): transmit timed out\n", > + WARN_ONCE(1, KERN_INFO "NETDEV WATCHDOG: %s (%s): transmit timed out\n", > dev->name, netdev_drivername(dev, drivername, 64)); > dev->tx_timeout(dev); > - WARN_ON_ONCE(1); hrm, am I misunderstanding? AFAICS, this change means the user is no longer notified [after the first time] of a condition they really need to know about -- a hardware or driver bug. These conditions can occur many hours or days apart, and the admin needs to know EACH time it occurs, because it is a major networking event, generally leading to a complete reset of the entire hardware. And quite honestly, the backtrace is not useful (yes, even the one that existing previously)... THINK for a second. The backtrace is going to look exactly the same, since it is a timer-triggered dev_watchdog() call. NETDEV WATCHDOG timeouts are not easily fixable errors like lockdep warnings, and the admin really does need to see each one. Unless I am missing something, (1) this patch should be reverted, and in additional, (2) I recommend removing the WARN_ON_ONCE() because the backtrace is not helpful. Jeff ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: warn: Turn the netdev timeout WARN_ON() into a WARN() 2008-09-17 3:27 ` warn: Turn the netdev timeout WARN_ON() into a WARN() Jeff Garzik @ 2008-09-17 13:41 ` Arjan van de Ven 2008-09-17 21:39 ` Jeff Garzik 0 siblings, 1 reply; 5+ messages in thread From: Arjan van de Ven @ 2008-09-17 13:41 UTC (permalink / raw) To: Jeff Garzik; +Cc: Linux Kernel Mailing List, torvalds, davem On Tue, 16 Sep 2008 23:27:08 -0400 Jeff Garzik <jeff@garzik.org> wrote: > On Wed, Sep 17, 2008 at 02:59:12AM +0000, Linux Kernel Mailing List > wrote: > > hrm, am I misunderstanding? > > AFAICS, this change means the user is no longer notified [after > the first time] of a condition they really need to know about -- > a hardware or driver bug. > > These conditions can occur many hours or days apart, and the admin > needs to know EACH time it occurs, because it is a major networking > event, generally leading to a complete reset of the entire hardware. ok fair enough; the patch below drops the _ONCE > > And quite honestly, the backtrace is not useful (yes, even the one > that existing previously)... THINK for a second. The backtrace > is going to look exactly the same, since it is a timer-triggered > dev_watchdog() call. > > NETDEV WATCHDOG timeouts are not easily fixable errors like lockdep > warnings, and the admin really does need to see each one. > > Unless I am missing something, (1) this patch should be reverted, .. or rather changed as below .. > and in additional, (2) I recommend removing the WARN_ON_ONCE() > because the backtrace is not helpful. I disagree on your (2): First of all, you say that these messages are really critical for the admin; well, a WARN() is a much more critical message than a printk, and tends to get a lot more attention to it. Second, a WARN() is about giving a set of standard information so that the bugreports we get are more useful the first time. Yes a backtrace is part of that, and yes you're right just the backtrace won't have new information here, but a WARN() message also contains things such as the kernel version, the loaded modules and is generally easier for a user to make a good bugreport from. That one component of it isn't going to contain new information so be it. Third, by using this standardized format, computer programs such as the kerneloops tools can automatically collect these reports and do statistics on them. Your argument that these are not easily fixable may be valid, but they're as you say serious issues, and at least some portion can be fixed by doing driver work; I just got an email from an ATL1E developer as a result of my oopsreport who said he was fixing such issues the last week for example. If all the networking guys agree that the report has no value for developers because they're all unfixable hardware bugs, I'm happy to stop tracking it in kerneloops.org, but I get the impression so far that that is not the case. From: Arjan van de Ven <arjan@linux.intel.com> Subject: [PATCH] net: WARN_ONCE -> WARN as requested by Jeff Garzik Jeff points out that you want to see each and every timeout message Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> --- net/sched/sch_generic.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c index ec0a083..8772642 100644 --- a/net/sched/sch_generic.c +++ b/net/sched/sch_generic.c @@ -215,7 +215,7 @@ static void dev_watchdog(unsigned long arg) time_after(jiffies, (dev->trans_start + dev->watchdog_timeo))) { char drivername[64]; - WARN_ONCE(1, KERN_INFO "NETDEV WATCHDOG: %s (%s): transmit timed out\n", + WARN(1, KERN_INFO "NETDEV WATCHDOG: %s (%s): transmit timed out\n", dev->name, netdev_drivername(dev, drivername, 64)); dev->tx_timeout(dev); } -- 1.5.5.1 -- Arjan van de Ven Intel Open Source Technology Centre For development, discussion and tips for power savings, visit http://www.lesswatts.org ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: warn: Turn the netdev timeout WARN_ON() into a WARN() 2008-09-17 13:41 ` Arjan van de Ven @ 2008-09-17 21:39 ` Jeff Garzik 2008-09-17 21:45 ` Arjan van de Ven 0 siblings, 1 reply; 5+ messages in thread From: Jeff Garzik @ 2008-09-17 21:39 UTC (permalink / raw) To: Arjan van de Ven; +Cc: Linux Kernel Mailing List, torvalds, davem Arjan van de Ven wrote: > From: Arjan van de Ven <arjan@linux.intel.com> > Subject: [PATCH] net: WARN_ONCE -> WARN as requested by Jeff Garzik > > Jeff points out that you want to see each and every > timeout message > > Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> > --- > net/sched/sch_generic.c | 2 +- > 1 files changed, 1 insertions(+), 1 deletions(-) > > diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c > index ec0a083..8772642 100644 > --- a/net/sched/sch_generic.c > +++ b/net/sched/sch_generic.c > @@ -215,7 +215,7 @@ static void dev_watchdog(unsigned long arg) > time_after(jiffies, (dev->trans_start + > dev->watchdog_timeo))) { > char drivername[64]; > - WARN_ONCE(1, KERN_INFO "NETDEV WATCHDOG: %s (%s): transmit timed out\n", > + WARN(1, KERN_INFO "NETDEV WATCHDOG: %s (%s): transmit timed out\n", > dev->name, netdev_drivername(dev, drivername, 64)); > dev->tx_timeout(dev); ACK, this fixes my objection, thanks It still seems a bit burdensome to print out a huge, redundant backtrace and kernel info each time, though. The most important information is (a) a timeout occurred and (b) hopefully some hardware register dump or similar driver-specific output. > If all the networking guys agree that the report > has no value for developers because they're all unfixable hardware bugs, You misunderstand; reporting and tracking these issues have value, but the information you are dumping (specifically the backtrace) does not. You say your goal here mainly to standardize reporting to permit automated collection -- I support that goal too. But is there a better way -- say perhaps just printing the header you need, and _not_ the useless backtrace? I think it would benefit Linux if our bugs announce themselves in a standard way, but the backtrace is not a key part of that, and IMO should be optional. Jeff ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: warn: Turn the netdev timeout WARN_ON() into a WARN() 2008-09-17 21:39 ` Jeff Garzik @ 2008-09-17 21:45 ` Arjan van de Ven 2008-09-17 21:53 ` Jeff Garzik 0 siblings, 1 reply; 5+ messages in thread From: Arjan van de Ven @ 2008-09-17 21:45 UTC (permalink / raw) To: Jeff Garzik; +Cc: Linux Kernel Mailing List, torvalds, davem On Wed, 17 Sep 2008 17:39:07 -0400 Jeff Garzik <jeff@garzik.org> wrote: \ > > ACK, this fixes my objection, thanks ok > > If all the networking guys agree that the report > > has no value for developers because they're all unfixable hardware > > bugs, > > You misunderstand; reporting and tracking these issues have value, > but the information you are dumping (specifically the backtrace) does > not. WARN() is about a lot more than a backtrace, but point taken, see below > > I think it would benefit Linux if our bugs announce themselves in a > standard way, but the backtrace is not a key part of that, and IMO > should be optional. I cannot disagree with the "the backtrace is not useful" statement for this case. I don't even mind making a WARN() variant that doesn't backtrace; I'll look into that. I don't consider it "2.6.27 urgent" though; a backtrace is pretty light to do for such an exceptional slowpath case, and the only real effect is a few lines in /var/log/messages that are somewhat redundant. -- Arjan van de Ven Intel Open Source Technology Centre For development, discussion and tips for power savings, visit http://www.lesswatts.org ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: warn: Turn the netdev timeout WARN_ON() into a WARN() 2008-09-17 21:45 ` Arjan van de Ven @ 2008-09-17 21:53 ` Jeff Garzik 0 siblings, 0 replies; 5+ messages in thread From: Jeff Garzik @ 2008-09-17 21:53 UTC (permalink / raw) To: Arjan van de Ven; +Cc: Linux Kernel Mailing List, torvalds, davem Arjan van de Ven wrote: > I cannot disagree with the "the backtrace is not useful" statement for > this case. I don't even mind making a WARN() variant that doesn't > backtrace; I'll look into that. I don't consider it "2.6.27 urgent" > though; a backtrace is pretty light to do for such an exceptional > slowpath case, and the only real effect is a few lines > in /var/log/messages that are somewhat redundant. Quite agreed -- I never meant to imply that [second half of my reply] was urgent, just giving some food for thought about improving the system. Jeff ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2008-09-17 21:53 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <200809170259.m8H2xClI020907@hera.kernel.org>
2008-09-17 3:27 ` warn: Turn the netdev timeout WARN_ON() into a WARN() Jeff Garzik
2008-09-17 13:41 ` Arjan van de Ven
2008-09-17 21:39 ` Jeff Garzik
2008-09-17 21:45 ` Arjan van de Ven
2008-09-17 21:53 ` Jeff Garzik
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox