netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
To: Bjorn Helgaas <bhelgaas@google.com>,
	Prashant Sreedharan <prashant@broadcom.com>
Cc: Nils Holland <nholland@tisys.org>,
	Michael Chan <mchan@broadcom.com>,
	Rajat Jain <rajatxjain@gmail.com>,
	David Miller <davem@davemloft.net>,
	netdev <netdev@vger.kernel.org>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	Rafael Wysocki <rjw@rjwysocki.net>
Subject: Re: [bisected] tg3 broken in 3.18.0?
Date: Fri, 19 Dec 2014 15:16:41 -0200	[thread overview]
Message-ID: <54945D79.2070008@gmail.com> (raw)
In-Reply-To: <CAErSpo4UdxwL=wA8cBJ7_DAdTb2xNeg6600qB0=Jdxv80aVmcg@mail.gmail.com>

On 19-12-2014 15:09, Bjorn Helgaas wrote:
> On Thu, Dec 18, 2014 at 7:10 PM, Prashant Sreedharan
> <prashant@broadcom.com> wrote:
>> On Thu, 2014-12-18 at 21:26 +0100, Nils Holland wrote:
>>> On Thu, Dec 18, 2014 at 11:28:09AM -0800, Prashant Sreedharan wrote:
>>>> On Thu, 2014-12-18 at 12:15 -0700, Bjorn Helgaas wrote:
>>>>>
>>>>> Any updates from the hardware team?
>>>>>
>>>>> This is a pretty serious regression, but as far as I can tell, it is
>>>>> not a PCI bug.  The device should respond to a config read of vendor
>>>>> ID.  If the driver does something that make the read return CRS
>>>>> status, I think the driver is responsible for doing whatever delay or
>>>>> other fixup is required.
>>>>>
>>>>> I'm inclined to reassign this bug to the tg3 driver unless you think
>>>>> the PCI core is doing something wrong here.
>>>>>
>>>>> Bjorn
>>>>
>>>> We were not able to reproduce this issue, could you please check what is
>>>> the value of reg 0x70, before the pci_device_is_present call is made ?
>>>> if bit 15 is set config access will be retried.
>>>>
>>>> --- a/drivers/net/ethernet/broadcom/tg3.c
>>>> +++ b/drivers/net/ethernet/broadcom/tg3.c
>>>> @@ -9025,6 +9025,7 @@ static int tg3_chip_reset(struct tg3 *tp)
>>>>          void (*write_op)(struct tg3 *, u32, u32);
>>>>          int i, err;
>>>>
>>>> +       printk(KERN_ERR "config state: %x\n", tr32(TG3PCI_PCISTATE));
>>>>          if (!pci_device_is_present(tp->pdev))
>>>>                  return -ENODEV;
>>>
>>> No problem, I gave this a try and here is what I get:
>>>
>>> [    2.185190] libphy: tg3 mdio bus: probed
>>> [    2.229357] tsc: Refined TSC clocksource calibration: 2399.999 MHz
>>> [    2.244993] config state: 1292
>>> [    2.247136] tg3 0000:02:00.0 eth0: Tigon3 [partno(BCM57780) rev 57780001]
>>>          (PCI Express) MAC address 00:19:99:ce:13:a6
>>> [    2.249279] tg3 0000:02:00.0 eth0: attached PHY driver [Broadcom BCM57780]
>>>          (mii_bus:phy_addr=200:01)
>>> [    2.251460] tg3 0000:02:00.0 eth0: RXcsums[1] LinkChgREG[0]
>>>          MIirq[0] ASF[0] TSOcap[1]
>>> [    2.253672] tg3 0000:02:00.0 eth0: dma_rwctrl[76180000] dma_mask[64-bit]
>>> [...]
>>> [   12.204692] tg3 0000:02:00.0
>>>          enp2s0: No firmware running
>>> [   12.206653] config state: 1292
>>> [   12.208655] config state: 1292
>>>
>>> That's all of the three times the new debugging line gets hit when I
>>> boot my system using the supplied diagnostic patch.
>>>
>>> Hope that helps - of course, I'd gladly test any further
>>> (diagnostic) patches if required! Also, if I can provide any
>>> additional information that might be of value, just ask:-)
>>>
>> Nils/Marcelo thanks for inputs, since reg 0x70 bit 15 is clear it
>> indicates the chip is not setting the config retry bit. We were hoping
>> this bit is causing the config access to return CRS but looks like it is
>> not.
>>
>> Btw after forcing the error path (tg3_init_one -> tg3_halt) in the
>> driver now we are able to reproduce the problem on 5722 in house. We are
>> working with the HW team to narrow this down.
>>
>> Also it is not clear to me how reverting commit cfa6a7877b17a667 fixes
>> the problem.
>
> The full commit is 89665a6a71408796565bfd29cfa6a7877b17a667, and git
> works with any unique *prefix* of that.  The current convention is to
> use the first 12 characters (I have "[core] abbrev = 12" in my
> .git/config).  Unfortunately, suffixes don't work at all.
>
> Anyway, here's why I think 89665a6a7140 makes a difference.  We're in this path:
>
>    pci_device_is_present
>      pci_bus_read_dev_vendor_id(..., crs_timeout = 0)
>        pci_bus_read_config_dword(bus, devfn, PCI_VENDOR_ID, l)
>
> and for some reason the chip returns 0x00010001 for that 32-bit read.

Actually it returns just 0x00000001, but yeah, that's my understanding too.

   Marcelo

> Before 89665a6a7140, we compared all 32 bits with "*l == 0xffff0001".
> This is false, so pci_bus_read_dev_vendor_id() returns true, which
> means pci_device_is_present() is also true.
>
> After 89665a6a7140, we compare only the low 16 bits with ((*l &
> 0xffff) == 0x0001), which is true, so pci_bus_read_dev_vendor_id()
> returns false, and pci_device_is_present() is false.
>
> Bjorn
>

  reply	other threads:[~2014-12-19 17:16 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-12-13 21:02 [bisected] tg3 broken in 3.18.0? Nils Holland
2014-12-15 15:06 ` Marcelo Ricardo Leitner
2014-12-16 16:04   ` Rajat Jain
2014-12-16 16:20     ` Bjorn Helgaas
2014-12-16 17:15       ` Michael Chan
2014-12-16 17:59         ` Marcelo Ricardo Leitner
2014-12-16 19:54           ` Michael Chan
2014-12-16 20:02             ` Marcelo Ricardo Leitner
2014-12-18 19:15             ` Bjorn Helgaas
2014-12-18 19:28               ` Prashant Sreedharan
2014-12-18 20:09                 ` Marcelo Ricardo Leitner
2014-12-18 20:33                   ` Marcelo Ricardo Leitner
2014-12-18 20:26                 ` Nils Holland
2014-12-19  2:10                   ` Prashant Sreedharan
2014-12-19 17:09                     ` Bjorn Helgaas
2014-12-19 17:16                       ` Marcelo Ricardo Leitner [this message]
2014-12-19 18:24                         ` Rajat Jain
2014-12-19 18:53                           ` Prashant Sreedharan
2014-12-19 19:37                             ` Rajat Jain
2014-12-16 18:00     ` Marcelo Ricardo Leitner
2014-12-16 20:38       ` Nils Holland
2014-12-16  0:31 ` Bjorn Helgaas
  -- strict thread matches above, loose matches on Subject: below --
2014-12-10 23:06 Nils Holland
2014-12-11 16:45 ` Marcelo Ricardo Leitner
2014-12-12 14:50   ` Jonathan Bither
2014-12-12 20:31     ` Nils Holland
2014-12-13  1:14       ` [bisected] " Nils Holland
2014-12-13  1:18         ` David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=54945D79.2070008@gmail.com \
    --to=marcelo.leitner@gmail.com \
    --cc=bhelgaas@google.com \
    --cc=davem@davemloft.net \
    --cc=linux-pci@vger.kernel.org \
    --cc=mchan@broadcom.com \
    --cc=netdev@vger.kernel.org \
    --cc=nholland@tisys.org \
    --cc=prashant@broadcom.com \
    --cc=rajatxjain@gmail.com \
    --cc=rjw@rjwysocki.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).