From: "David S. Miller" <davem@redhat.com>
To: kuznet@ms2.inr.ac.ru
Cc: anton@samba.org, jes@trained-monkey.org, netdev@oss.sgi.com,
    jgarzik@redhat.com
Subject: Re: acenic lockup
Date: Thu, 08 May 2003 09:00:49 -0700 (PDT)
Message-ID: <20030508.090049.48375778.davem@redhat.com>
In-Reply-To: <200305072221.CAA00973@mops.inr.ac.ru>

   From: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
   Date: Thu, 8 May 2003 02:21:11 +0400 (MSD)

   > However, what does kick device back into working state?

   Usually it is a link problem and device will recover after
   link is restored. This happens here.

I fully recognize this.

   If it is some PCI failure or something went wrong in hardware,
   the device will stop forever, I guess. And I guess this happens
   with the same frequency as memory parity errors i.e. not so much. :-)

Sometimes it is not cosmic bit-flip which causes this, but rather
bit-flip caused by programmer of some unrelated area of kernel :-)))
I am happy that hard-hang part is probably gone now. But I also
desire real resiliency in this area of drivers.

   > Do we make shamans dance when this message hits the logs
   > and pray for the best? :-)

   Sort of. I was about to dance for a while when saw creepy
   "ethX: BUG, tx ring is full" from tulip, which has the same bogus
   netif_wake_queue(). :-)

Note to Jeff, independent of what is being discussed here, a real
audit of drivers that blindly invoke netif_wake_queue() from transmit
timeout watchdog routine is in order at some point. This is what
Alexey is referring to as "same bogus netif_wake_queue()".
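
To make the problem concrete, here is a minimal userspace sketch of why a blind wake from the timeout path can end in "tx ring is full". It is not kernel code and is not taken from tulip or acenic: struct wake_nic, TX_RING_SIZE and every wake_* function are hypothetical stand-ins, and the "guarded" variant is only an assumption about what a less bogus handler might check. A guard alone does not revive a wedged device.

```c
#include <stdbool.h>

#define TX_RING_SIZE 4   /* hypothetical ring size, for illustration only */

/* Hypothetical model of a driver's transmit state; not a kernel structure. */
struct wake_nic {
    unsigned int tx_pending;  /* descriptors given to hardware, not yet completed */
    bool queue_stopped;       /* stands in for the stopped/awake state of the netif queue */
};

/* Stands in for the driver's transmit routine.
 * Returns 0 on success, -1 on the "tx ring is full" bug path. */
int wake_xmit(struct wake_nic *nic)
{
    if (nic->tx_pending == TX_RING_SIZE)
        return -1;            /* the stack should never get here with a stopped queue */
    nic->tx_pending++;
    if (nic->tx_pending == TX_RING_SIZE)
        nic->queue_stopped = true;   /* ring just became full: stop the queue */
    return 0;
}

/* Stands in for the stack: it only calls the driver while the queue is awake.
 * Returns 1 when the packet is held back because the queue is stopped. */
int wake_stack_send(struct wake_nic *nic)
{
    if (nic->queue_stopped)
        return 1;
    return wake_xmit(nic);
}

/* Blind timeout handler: wakes the queue although nothing was reclaimed. */
void wake_tx_timeout_blind(struct wake_nic *nic)
{
    nic->queue_stopped = false;
}

/* Guarded timeout handler: wakes the queue only if the ring has room. */
void wake_tx_timeout_guarded(struct wake_nic *nic)
{
    if (nic->tx_pending < TX_RING_SIZE)
        nic->queue_stopped = false;
}
```

In this model, filling the ring and then running the blind handler makes the next wake_stack_send() return -1, which is the modelled form of the log message above. With the guarded handler the queue stays stopped and the packet is held back instead.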

   Well, full reset is difficult thing with lock-free acenic.
   Seems, it has to throttle card, wake up something at process context,
   to disable irq there and to reset nic like it happens at ifconfig down
   (or even module unload in face of hard hardware failure?)

It is simple, use schedule_work(), tg3.c does exactly this.
Only difference in acenic is need to use disable_irq(), that is all.
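
A rough userspace model of that flow, for illustration only. It is not the tg3.c or acenic code: struct rst_nic, struct rst_work and all rst_* functions are hypothetical, and each one merely stands in for the kernel facility named in its comment. The point is the ordering: the timeout handler only throttles and schedules, and the reset itself runs later in process context with the interrupt masked.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a deferred work item. */
struct rst_work {
    void (*func)(void *data);
    void *data;
    bool pending;
};

/* Hypothetical model of the NIC state touched by a reset. */
struct rst_nic {
    bool irq_enabled;
    bool queue_stopped;
    bool hw_wedged;
    unsigned int tx_pending;
    unsigned int resets;        /* how many times the reset task ran */
    struct rst_work reset_work;
};

/* Stands in for schedule_work(): queue the item unless it is already pending. */
bool rst_schedule_work(struct rst_work *w)
{
    if (w->pending)
        return false;
    w->pending = true;
    return true;
}

/* Stands in for the worker thread that later runs the item in process context. */
void rst_run_pending(struct rst_work *w)
{
    if (!w->pending)
        return;
    w->pending = false;
    w->func(w->data);
}

/* Reset task: runs in (modelled) process context, so it may take its time. */
void rst_reset_task(void *data)
{
    struct rst_nic *nic = data;

    nic->irq_enabled = false;    /* disable_irq() stand-in: keep the lock-free irq path out */
    nic->tx_pending = 0;         /* drop whatever was stuck in the ring */
    nic->hw_wedged = false;      /* reinitialise the hardware */
    nic->resets++;
    nic->irq_enabled = true;     /* enable_irq() stand-in */
    nic->queue_stopped = false;  /* wake the queue only now, with an empty ring */
}

/* Timeout handler: throttle and defer, do not touch the hardware here. */
void rst_tx_timeout(struct rst_nic *nic)
{
    nic->queue_stopped = true;
    rst_schedule_work(&nic->reset_work);
}

void rst_nic_init(struct rst_nic *nic)
{
    nic->irq_enabled = true;
    nic->queue_stopped = false;
    nic->hw_wedged = false;
    nic->tx_pending = 0;
    nic->resets = 0;
    nic->reset_work.func = rst_reset_task;
    nic->reset_work.data = nic;
    nic->reset_work.pending = false;
}
```

Calling rst_tx_timeout() twice before the worker runs still yields a single reset in this model, because the work item is already pending the second time.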