From: Bernhard Schmidt <berni@birkenwald.de>
To: Matt Carlson <mcarlson@broadcom.com>
Cc: Michael Chan <mchan@broadcom.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"bugme-daemon@bugzilla.kernel.org"
	<bugme-daemon@bugzilla.kernel.org>
Subject: Re: [Bugme-new] [Bug 12877] New: tg3: eth0 transit timed out, resetting -> dead NIC
Date: Tue, 24 Mar 2009 01:35:46 +0100
Message-ID: <49C82AE2.3080206@birkenwald.de>
In-Reply-To: <20090323181859.GA5473@xw6200.broadcom.net>

On 23.03.2009 19:18, Matt Carlson wrote:

Hello Matt,

>> Mar 22 04:06:46 svr02 kernel: [1392136.468921] PCI Memory Mapped IO Disabled!!!!
[...]
>> Mar 22 04:07:14 svr02 kernel: [1392164.768266] PCI Memory Mapped IO Disabled!!!!
>> at this point the "watchdog" kicked in and did rmmod/modprobe, so I
>> think the only thing you can read out of this debugging log is that
>> there was no kernel message right before MMIO got disabled and it takes
>> quite a while to fire the Tx timeout.
> So traffic on this box must be pretty light for the watchdog to fire off
> 30 seconds after the MMIO problem was detected, right?  Interesting.

Just to make sure I didn't confuse you: the "watchdog" I was talking
about here is a shell script like this, executed every minute:

---
/bin/ping -q -c 5 <defaultgw> > /dev/null
RC=$?
if [ ${RC} -ne 0 ]; then
	rmmod tg3; sleep 5; modprobe tg3; sleep 5; ifup --force eth0
fi
---

At :46 MMIO was disabled; at :00 the cron job started, and it took until
:15 to detect that the network was dead and reload the module.
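
For anyone who wants to line up such reloads with the kernel log, here is a
rough sketch of the same check split into functions, with a syslog entry
written before the module is reloaded. This is not the script that actually
ran on the box: the function names, the GATEWAY/IFACE/MODULE variables, the
"nic-watchdog" tag and the 192.0.2.1 placeholder address are all assumptions
made for illustration.

```shell
#!/bin/sh
# Hypothetical variant of the watchdog check; all names are illustrative.
GATEWAY="${GATEWAY:-192.0.2.1}"   # placeholder, replace with the default gateway
IFACE="${IFACE:-eth0}"
MODULE="${MODULE:-tg3}"

# Succeeds if the gateway answers at least one of five pings.
check_link() {
    /bin/ping -q -c 5 "$1" > /dev/null 2>&1
}

# Leaves a syslog marker, then reloads the driver and brings the interface up.
reload_nic() {
    logger -t nic-watchdog "ping to $GATEWAY failed, reloading $MODULE"
    rmmod "$MODULE"; sleep 5
    modprobe "$MODULE"; sleep 5
    ifup --force "$IFACE"
}

# Entry point: reload only when the link check fails.
watchdog_run() {
    if ! check_link "$GATEWAY"; then
        reload_nic
    fi
}
```

A cron job would source this and call watchdog_run once a minute; the logger
line should then appear in syslog just before the "PCI INT B disabled"
message from the rmmod.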

>> Mar 22 04:07:15 svr02 kernel: [1392165.540078] tg3 0000:03:04.1: PCI INT B disabled
>> Mar 22 04:07:16 svr02 kernel: [1392166.817125] tg3: tg3_abort_hw timed out for eth0, TX_MODE_ENABLE will not clear MAC_TX_MODE=ffffffff
>> Mar 22 04:07:18 svr02 kernel: [1392168.398844] tg3: eth0: No firmware running.
>> Mar 22 04:07:29 svr02 kernel: [1392179.793309] tg3: eth0: Link is down.
>> Mar 22 04:07:31 svr02 kernel: [1392181.896030] tg3 0000:03:04.0: PCI INT A disabled
>> Mar 22 04:07:33 svr02 kernel: [1392183.957132] tg3.c:v3.94 (August 14, 2008)
>> Mar 22 04:07:33 svr02 kernel: [1392184.020034] tg3 0000:03:04.0: enabling device (0000 ->  0002)
>> Mar 22 04:07:33 svr02 kernel: [1392184.086083] tg3 0000:03:04.0: PCI INT A ->  GSI 16 (level, low) ->  IRQ 16

The tg3 watchdog (tg3: eth0: transmit timed out, resetting) did not
appear at all in this cycle, so I guess the check script killed the
module before it could fire.

Yes, the NIC is very lightly loaded, around 100kbps / 70pps in each 
direction with a few occasional spikes.

>> I'm now switching to eth1.
> O.K.  I eagerly await your results.

So far so good, but it has only been running ~36 hours, so that's not
really a stability streak yet :-)

I'll keep you updated.

Bernhard

