netdev.vger.kernel.org archive mirror
From: Willy Tarreau <w@1wt.eu>
To: Matt Carlson <mcarlson@broadcom.com>
Cc: Roger Heflin <rogerheflin@gmail.com>,
	Peter Zijlstra <peterz@infradead.org>,
	LKML <linux-kernel@vger.kernel.org>,
	netdev <netdev@vger.kernel.org>
Subject: Re: WARNING: at net/sched/sch_generic.c:219 dev_watchdog+0xfe/0x17e() with tg3 network
Date: Tue, 2 Dec 2008 23:55:33 +0100
Message-ID: <20081202225533.GA28767@1wt.eu>
In-Reply-To: <20081126225421.GA8906@xw6200.broadcom.net>

Hi Matt,

I ran a lot of tests last night and have some more information.
The issue sometimes takes longer to reproduce, which led me
to identify the wrong culprits among the 29 patches affecting tg3
between 2.6.25 and 2.6.27.7. I was finally able to reproduce
the issue by running the plain 2.6.25 driver (v3.90) on 2.6.27.7,
but not at all when running on 2.6.25, even after ten minutes
(on 2.6.27.7, it takes between 5 s and 1 min to get a tx timeout).

Later, I noticed that 2.6.27's driver uses libphy, which was
never removed between tests. I wonder whether it can interfere with
my tests. Maybe it initializes the PHY differently from plain
2.6.25, causing delayed issues; I don't know. Unfortunately,
I cannot run 2.6.27's driver on 2.6.25 because of the libphy
dependency (that's how I discovered it).

I'm also now 100% certain that enabling or disabling flow control
(FC) does not change anything with either kernel. So unless the
hardware still interprets pause frames when FC is disabled, the
problem should not come from there.

I suspect that the switch is failing: the problem happens
more often when it has been transferring at full speed for some
time. Since it's a cheap one lying on a desk, it might have
burnt-out capacitors causing some randomly corrupted
frames to go out from time to time (maybe even pause frames
preventing the NIC from sending). That was also a problem
for my tests, because after the patching/unpatching and compilation
phases, it had had some time to rest and took longer to reproduce
the issue.

I will re-run some tests on 2.6.27 + tg3 v3.90 (from 2.6.25)
without ever loading libphy after power-up, in order to
clearly identify whether the problem is caused by the driver or
by something else in the kernel. If it's something else, the
bisect will take a few weeks, since I'm not there long enough
to run about 15 full builds and wait long enough for the
problem to (not) occur.

But I remain hopeful; there's no reason we can't find it!

Regards,
Willy


Thread overview: 24+ messages
     [not found] <491954E1.2050002@gmail.com>
2008-11-11 11:31 ` WARNING: at net/sched/sch_generic.c:219 dev_watchdog+0xfe/0x17e() with tg3 network Peter Zijlstra
2008-11-15  4:01   ` Roger Heflin
2008-11-18  6:50     ` Willy Tarreau
2008-11-20  3:11       ` Matt Carlson
2008-11-20  5:37         ` Willy Tarreau
2008-11-20 18:43           ` Matt Carlson
2008-11-20 21:26             ` Willy Tarreau
2008-11-20 21:53               ` Matt Carlson
2008-11-21 17:55                 ` Willy Tarreau
2008-11-24 13:27                 ` Willy Tarreau
2008-11-24 21:52                   ` Willy Tarreau
2008-11-25  1:52                     ` Matt Carlson
2008-11-25  5:31                       ` Willy Tarreau
2008-11-25 17:54                         ` Matt Carlson
2008-11-26 21:12                           ` Willy Tarreau
2008-11-26 22:54                             ` Matt Carlson
2008-11-27  5:16                               ` Willy Tarreau
2008-11-27 10:06                                 ` Frantisek Hanzlik
2008-11-27 20:33                                   ` Willy Tarreau
2008-12-02 22:55                               ` Willy Tarreau [this message]
2008-11-20  3:00     ` Matt Carlson
2008-11-20 10:07       ` Roger Heflin
2008-11-20 17:11         ` Matt Carlson
2008-11-21  9:34           ` Roger Heflin
