netdev.vger.kernel.org archive mirror
* tg3 broken in 3.18.0?
@ 2014-12-10 23:06 Nils Holland
  2014-12-11 16:45 ` Marcelo Ricardo Leitner
  0 siblings, 1 reply; 28+ messages in thread
From: Nils Holland @ 2014-12-10 23:06 UTC
  To: netdev

Hi everyone,

I just upgraded a machine from 3.17.3 to 3.18.0 and noticed that after
the upgrade, the machine's network interface (which is a tg3) would no
longer run correctly (or, for that matter, run at all). During the
upgrade, I didn't change any kernel config options or any other parts
of the system.

Since the machine is remote and I don't have direct access to it, it's
kind of hard currently to give more details, but here's what I'm
seeing in the logs:

[Booting 3.17.3:]
[    1.383151] tg3.c:v3.137 (May 11, 2014)
[    1.387296] libphy: tg3 mdio bus: probed
[    1.452600] tg3 0000:02:00.0 eth0:
        Tigon3 [partno(BCM57780) rev 57780001] (PCI Express) MAC address
        00:19:99:ce:13:a6
[    1.454660] tg3 0000:02:00.0 eth0:
        attached PHY driver [Broadcom BCM57780] (mii_bus:phy_addr=200:01)
[    1.456764] tg3 0000:02:00.0 eth0:
        RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] TSOcap[1]
[    1.458911] tg3 0000:02:00.0 eth0:
        dma_rwctrl[76180000] dma_mask[64-bit]
[...]
[    6.602608] tg3 0000:02:00.0
        enp2s0: renamed from eth0
[    9.865638] tg3 0000:02:00.0: irq 25 for MSI/MSI-X
[    9.887584] IPv6:
        ADDRCONF(NETDEV_UP): enp2s0: link is not ready
[   10.469819] tg3 0000:02:00.0
        enp2s0: Link is down
[   12.477396] tg3 0000:02:00.0
        enp2s0: Link is up at 100 Mbps, full duplex
[   12.477404] tg3 0000:02:00.0
        enp2s0: Flow control is off for TX and off for RX

[Booting 3.18.0:]
[    2.192915] tg3.c:v3.137 (May 11, 2014)
[    2.196767] libphy: tg3 mdio bus: probed
[    2.256294] tg3 0000:02:00.0 eth0:
        Tigon3 [partno(BCM57780) rev 57780001] (PCI Express) MAC address
        00:19:99:ce:13:a6
[    2.258387] tg3 0000:02:00.0 eth0:
        attached PHY driver [Broadcom BCM57780] (mii_bus:phy_addr=200:01)
[    2.260530] tg3 0000:02:00.0 eth0:
        RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] TSOcap[1]
[    2.262679] tg3 0000:02:00.0 eth0:
        dma_rwctrl[76180000] dma_mask[64-bit]
[...]
[    7.431176] tg3 0000:02:00.0
        enp2s0: renamed from eth0
[   10.422839] tg3 0000:02:00.0: irq 25 for MSI/MSI-X
[   12.530363] tg3 0000:02:00.0
        enp2s0: No firmware running

That's the last thing I find about the card in the logs. The machine
then just sits there, working normally but unreachable from the
network.
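
The comparison of the two boot logs above can be mechanized. What follows
is only a rough userspace sketch, assuming each log entry has already been
joined onto one line; skip_timestamp() and first_divergence() are names
made up for this illustration and are not part of any kernel tool:

```c
#include <stddef.h>
#include <string.h>

/*
 * Return the message part of a dmesg line, skipping a leading
 * "[   seconds.micros] " stamp if one is present.
 */
const char *skip_timestamp(const char *line)
{
    const char *end;

    if (line[0] != '[')
        return line;
    end = strchr(line, ']');
    if (end == NULL)
        return line;
    end++;
    while (*end == ' ')
        end++;
    return end;
}

/*
 * Index of the first entry whose message text differs between the two
 * logs. If one log is a prefix of the other, the shorter length is
 * returned.
 */
size_t first_divergence(const char *const *a, size_t na,
                        const char *const *b, size_t nb)
{
    size_t n = na < nb ? na : nb;
    size_t i;

    for (i = 0; i < n; i++) {
        if (strcmp(skip_timestamp(a[i]), skip_timestamp(b[i])) != 0)
            return i;
    }
    return n;
}
```

Fed the entries from "renamed from eth0" onward, with the wrapped lines
joined, the two logs agree on the rename and the MSI line and first
differ at the entry where 3.17.3 reports the IPv6 link state and 3.18.0
reports "No firmware running".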

If I see things correctly, there were only two patches affecting tg3
between 3.17(.3) and 3.18:

2c7c9ea429ba30fe506747b7da110e2212d8fefa
a620a6bc1c94c22d6c312892be1e0ae171523125

Since the affected machine is, as I said, remote, I've not yet been
able to do more thorough tests. So I thought I'd report the issue and
see if someone else has already seen it, or can test things with
a more easily accessible machine. Otherwise, I might start digging
deeper.

Greetings,
Nils

* Re: [bisected] tg3 broken in 3.18.0?
@ 2014-12-13 21:02 Nils Holland
  2014-12-15 15:06 ` Marcelo Ricardo Leitner
  2014-12-16  0:31 ` Bjorn Helgaas
  0 siblings, 2 replies; 28+ messages in thread
From: Nils Holland @ 2014-12-13 21:02 UTC
  To: David Miller; +Cc: netdev, linux-pci, rajatxjain

On Fri, Dec 12, 2014 at 08:18:31PM -0500, David Miller wrote:
> From: Nils Holland <nholland@tisys.org>
> Date: Sat, 13 Dec 2014 02:14:08 +0100
> 
> > 
> > My bisect exercise suggests that the following commit is the culprit:
> > 
> > 89665a6a71408796565bfd29cfa6a7877b17a667 (PCI: Check only the Vendor
> > ID to identify Configuration Request Retry)
> 
> You definitely need to bring this up with the author of that change
> and the relevant list for the PCI subsystem and/or linux-kernel.

I've now already sent an inquiry to Rajat Jain, the author of the
patch in question, and this message here is now also CC'd to
linux-pci@.

With this message, I'd like to add one last result of investigation
I've done today, in the hope that it will aid the folks with more
knowledge to go after the issue.

Basically, I've added a little debug output to tg3.c in the function
tg3_poll_fw(), as that function contained the code that would print
out the "No firmware running" line that was visible in dmesg on those
kernels where tg3 would not work for me. So, I basically had this:

static int tg3_poll_fw(struct tg3 *tp)
{
        int i;
        u32 val;

        netdev_info(tp->dev, "XX: Boom!\n");
        [...]
}

Now, I searched dmesg for occurrences of this debug output, using a
standard 3.18.0 kernel (where my tg3 doesn't work) as well as a
3.18.0 kernel with 89665a6a71408796565bfd29cfa6a7877b17a667 reverted
(where my tg3 works). Here are the results:

[standard 3.18.0 (=problematic)]:
[    2.197653] libphy: tg3 mdio bus: probed
[    2.257488] tg3 0000:02:00.0 eth0:
        Tigon3 [partno(BCM57780) rev 57780001] (PCI Express) MAC address
        00:19:99:ce:13:a6
[    2.259589] tg3 0000:02:00.0 eth0:
        attached PHY driver [Broadcom BCM57780] (mii_bus:phy_addr=200:01)
[    2.261740] tg3 0000:02:00.0 eth0:
        RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] TSOcap[1]
[    2.263912] tg3 0000:02:00.0 eth0:
        dma_rwctrl[76180000] dma_mask[64-bit]
[...]
[   10.028002] tg3 0000:02:00.0: irq 25 for MSI/MSI-X
[   10.028247] tg3 0000:02:00.0 enp2s0: XX: Boom!
[   12.157034] tg3 0000:02:00.0 enp2s0: No firmware running


[3.18.0 without above mentioned patch, 3.17.3 is the same, both result
in a working tg3]:
[    1.397167] libphy: tg3 mdio bus: probed
[    1.456473] tg3 0000:02:00.0
        (unnamed net_device) (uninitialized): XX: Boom!
[    1.464987] tg3 0000:02:00.0 eth0:
        Tigon3 [partno(BCM57780) rev 57780001] (PCI Express) MAC address
        00:19:99:ce:13:a6
[    1.467118] tg3 0000:02:00.0 eth0:
        attached PHY driver [Broadcom BCM57780] (mii_bus:phy_addr=200:01)
[    1.469311] tg3 0000:02:00.0 eth0:
        RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] TSOcap[1]
[    1.471500] tg3 0000:02:00.0 eth0:
        dma_rwctrl[76180000] dma_mask[64-bit]
[...]
[    9.631629] tg3 0000:02:00.0: irq 25 for MSI/MSI-X
[    9.631962] tg3 0000:02:00.0 enp2s0: XX: Boom!
[    9.634339] tg3 0000:02:00.0 enp2s0: XX: Boom!
[    9.642741] IPv6:
        ADDRCONF(NETDEV_UP): enp2s0: link is not ready
[   10.479636] tg3 0000:02:00.0
        enp2s0: Link is down
[   11.484498] tg3 0000:02:00.0
        enp2s0: Link is up at 100 Mbps, full duplex

As can be seen, there are two tg3-related sections in my dmesg in both
the working and non-working scenarios: At about 1 - 2 secs, the card
seems to begin initializing, and at about 9 - 10 seconds it is (or
should be) ready to establish a network connection.

My debug section, or tg3.c's tg3_poll_fw(), seems to be called thrice
in the working situation: The first hit occurs at 1.456473 where the tg3
device is still reported as "(unnamed net_device) (uninitialized)".
Then, the section gets hit twice again at around 9.63 - at this point
the driver already reports the card as initialized / by its real name.

In the non-working situation, the debug section seems to be hit only
once, at 10.028247. At this point, the tg3 is already reported as
initialized - just like when it's hit the second and third time in the
working situation.
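
The tally described above can be reproduced with a few lines of
userspace C. This is a minimal sketch, assuming one log entry per string
(wrapped dmesg lines joined beforehand); struct hit_count and
count_marker_hits() are hypothetical names used only for this
illustration:

```c
#include <stddef.h>
#include <string.h>

/* Tally of debug-marker hits in a dmesg excerpt. */
struct hit_count {
    size_t before_registration; /* line still says "(unnamed net_device)" */
    size_t after_registration;  /* device already has a name, e.g. enp2s0 */
};

/*
 * Count the entries that contain `marker`, split by whether the same
 * entry still carries the "(unnamed net_device)" placeholder.
 */
struct hit_count count_marker_hits(const char *const *lines, size_t n,
                                   const char *marker)
{
    struct hit_count hc = {0, 0};
    size_t i;

    for (i = 0; i < n; i++) {
        if (strstr(lines[i], marker) == NULL)
            continue;
        if (strstr(lines[i], "(unnamed net_device)") != NULL)
            hc.before_registration++;
        else
            hc.after_registration++;
    }
    return hc;
}
```

Run with the marker "XX: Boom!" over the two excerpts, this gives one
early hit plus two later hits for the working kernel, and no early hit
plus one later hit for the problematic one, which matches the manual
reading.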

Bottom line is that commit 89665a6a71408796565bfd29cfa6a7877b17a667
really makes a difference regarding the way the tg3 card is
initialized, which seems to cause the problem.
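
For readers who have not looked at that commit: going only by its title,
it appears to narrow the test for a Configuration Request Retry Status
(CRS) response from the whole 32-bit Vendor/Device ID dword to the
Vendor ID half alone. As far as I understand the PCIe CRS Software
Visibility mechanism, a retried Vendor ID read comes back as Vendor ID
0x0001 with all remaining bits set, i.e. 0xffff0001. The sketch below
only contrasts the two predicates in userspace; it is not the kernel
diff, the function names are invented for this illustration, and it
makes no claim about why tg3 in particular is affected:

```c
#include <stdbool.h>
#include <stdint.h>

/* Vendor ID value reserved to signal CRS to software (assumption: per PCIe spec). */
#define CRS_VENDOR_ID 0x0001u

/* Stricter form: the whole Vendor/Device ID dword must read 0xffff0001. */
bool crs_match_full_dword(uint32_t id_dword)
{
    return id_dword == 0xffff0001u;
}

/* Looser form: only the low 16 bits (the Vendor ID) are compared. */
bool crs_match_vendor_only(uint32_t id_dword)
{
    return (id_dword & 0xffffu) == CRS_VENDOR_ID;
}
```

The two forms agree on 0xffff0001 and differ only for values whose low
half is 0x0001 while the upper half is something other than 0xffff; such
a value is treated as "retry" by the looser check but not by the
stricter one.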

Greetings,
Nils

end of thread, other threads:[~2014-12-19 19:37 UTC | newest]

Thread overview: 28+ messages
2014-12-10 23:06 tg3 broken in 3.18.0? Nils Holland
2014-12-11 16:45 ` Marcelo Ricardo Leitner
2014-12-12 14:50   ` Jonathan Bither
2014-12-12 20:31     ` Nils Holland
2014-12-13  1:14       ` [bisected] " Nils Holland
2014-12-13  1:18         ` David Miller
  -- strict thread matches above, loose matches on Subject: below --
2014-12-13 21:02 Nils Holland
2014-12-15 15:06 ` Marcelo Ricardo Leitner
2014-12-16 16:04   ` Rajat Jain
2014-12-16 16:20     ` Bjorn Helgaas
2014-12-16 17:15       ` Michael Chan
2014-12-16 17:59         ` Marcelo Ricardo Leitner
2014-12-16 19:54           ` Michael Chan
2014-12-16 20:02             ` Marcelo Ricardo Leitner
2014-12-18 19:15             ` Bjorn Helgaas
2014-12-18 19:28               ` Prashant Sreedharan
2014-12-18 20:09                 ` Marcelo Ricardo Leitner
2014-12-18 20:33                   ` Marcelo Ricardo Leitner
2014-12-18 20:26                 ` Nils Holland
2014-12-19  2:10                   ` Prashant Sreedharan
2014-12-19 17:09                     ` Bjorn Helgaas
2014-12-19 17:16                       ` Marcelo Ricardo Leitner
2014-12-19 18:24                         ` Rajat Jain
2014-12-19 18:53                           ` Prashant Sreedharan
2014-12-19 19:37                             ` Rajat Jain
2014-12-16 18:00     ` Marcelo Ricardo Leitner
2014-12-16 20:38       ` Nils Holland
2014-12-16  0:31 ` Bjorn Helgaas