netdev.vger.kernel.org archive mirror
From: Bernhard Schmidt <berni@birkenwald.de>
To: Matt Carlson <mcarlson@broadcom.com>
Cc: Michael Chan <mchan@broadcom.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"bugme-daemon@bugzilla.kernel.org"
	<bugme-daemon@bugzilla.kernel.org>
Subject: Re: [Bugme-new] [Bug 12877] New: tg3: eth0 transit timed out, resetting -> dead NIC
Date: Tue, 24 Mar 2009 01:35:46 +0100
Message-ID: <49C82AE2.3080206@birkenwald.de>
In-Reply-To: <20090323181859.GA5473@xw6200.broadcom.net>

On 23.03.2009 19:18, Matt Carlson wrote:

Hello Matt,

>> Mar 22 04:06:46 svr02 kernel: [1392136.468921] PCI Memory Mapped IO Disabled!!!!
[...]
>> Mar 22 04:07:14 svr02 kernel: [1392164.768266] PCI Memory Mapped IO Disabled!!!!
>> at this point the "watchdog" kicked in and did rmmod/modprobe, so I
>> think the only thing you can read out of this debugging log is that
>> there was no kernel message right before MMIO got disabled and it takes
>> quite a while to fire the Tx timeout.
> So traffic on this box must be pretty light for the watchdog to fire off
> 30 seconds after the MMIO problem was detected, right?  Interesting.

Just to make sure I didn't confuse you: the "watchdog" I was talking
about here is a shell script like this one, run from cron every minute:

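---
#!/bin/sh
# Replace <defaultgw> with the default gateway's address.
# If the gateway stops answering, reload tg3 and bring eth0 back up.
/bin/ping -q -c 5 <defaultgw> > /dev/null
RC=$?
if [ ${RC} -ne 0 ]; then
	rmmod tg3; sleep 5; modprobe tg3; sleep 5; ifup --force eth0
fi
---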

At 04:06:46 MMIO was disabled; at 04:07:00 the cron job started, and it
took until 04:07:15 to detect that the network was dead and reload the
tg3 module.
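So the ~30 seconds break down roughly as follows. This is a minimal sketch using only the wall-clock times from the syslog excerpt; the 04:07:00 cron start is inferred from the once-a-minute schedule, not from a log line. The ~29 s total also matches the gap between the kernel timestamps [1392136.468921] and [1392165.540078].

```shell
#!/bin/sh
# Wall-clock times from the syslog excerpt, as seconds past 04:00:00.
mmio_off=$((6 * 60 + 46))   # 04:06:46  "PCI Memory Mapped IO Disabled!!!!"
cron_start=$((7 * 60))      # 04:07:00  cron starts the ping check (inferred)
rmmod_at=$((7 * 60 + 15))   # 04:07:15  first message from "rmmod tg3"

# POSIX shell integer arithmetic is enough for whole seconds.
echo "waiting for cron: $((cron_start - mmio_off)) s"
echo "ping check:       $((rmmod_at - cron_start)) s"
echo "total:            $((rmmod_at - mmio_off)) s"
```

This prints 14 s, 15 s and 29 s: about half the delay is waiting for the next cron run, the other half is the ping check itself.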

>> Mar 22 04:07:15 svr02 kernel: [1392165.540078] tg3 0000:03:04.1: PCI INT B disabled
>> Mar 22 04:07:16 svr02 kernel: [1392166.817125] tg3: tg3_abort_hw timed out for eth0, TX_MODE_ENABLE will not clear MAC_TX_MODE=ffffffff
>> Mar 22 04:07:18 svr02 kernel: [1392168.398844] tg3: eth0: No firmware running.
>> Mar 22 04:07:29 svr02 kernel: [1392179.793309] tg3: eth0: Link is down.
>> Mar 22 04:07:31 svr02 kernel: [1392181.896030] tg3 0000:03:04.0: PCI INT A disabled
>> Mar 22 04:07:33 svr02 kernel: [1392183.957132] tg3.c:v3.94 (August 14, 2008)
>> Mar 22 04:07:33 svr02 kernel: [1392184.020034] tg3 0000:03:04.0: enabling device (0000 ->  0002)
>> Mar 22 04:07:33 svr02 kernel: [1392184.086083] tg3 0000:03:04.0: PCI INT A ->  GSI 16 (level, low) ->  IRQ 16

The tg3 watchdog message (tg3: eth0: transmit timed out, resetting) did
not appear at all in this cycle, so I guess the check script unloaded
the module before the Tx timeout could fire.

Yes, the NIC is very lightly loaded: around 100 kbit/s / 70 packets/s in
each direction, with occasional spikes.

>> I'm now switching to eth1.
> O.K.  I eagerly await your results.

So far so good, but it has only been running for ~36 hours, so that's
not really a stability streak yet :-)

I'll keep you updated.

Bernhard


Thread overview: 16+ messages
     [not found] <bug-12877-10286@http.bugzilla.kernel.org/>
2009-03-15 21:32 ` [Bugme-new] [Bug 12877] New: tg3: eth0 transit timed out, resetting -> dead NIC Andrew Morton
2009-03-16 21:23   ` Michael Chan
2009-03-16 22:46     ` Bernhard Schmidt
2009-03-17 22:09     ` Bernhard Schmidt
2009-03-17 23:30       ` Michael Chan
2009-03-19 16:58       ` Matt Carlson
2009-03-19 18:06         ` Bernhard Schmidt
2009-03-19 18:15           ` Matt Carlson
2009-03-19 18:19             ` Bernhard Schmidt
2009-03-22 13:21         ` Bernhard Schmidt
2009-03-23 18:18           ` Matt Carlson
2009-03-24  0:35             ` Bernhard Schmidt [this message]
2009-03-31 16:26           ` Matt Carlson
2009-03-31 22:16             ` Bernhard Schmidt
2009-04-13 21:54               ` Bernhard Schmidt
2009-04-14 18:29                 ` Matt Carlson
