From: "Måns Rullgård" <mans@mansr.com>
To: Mason <slash.tmp@free.fr>
Cc: Florian Fainelli <f.fainelli@gmail.com>,
	Marc Gonzalez <marc_gonzalez@sigmadesigns.com>,
	netdev <netdev@vger.kernel.org>,
	Linux ARM <linux-arm-kernel@lists.infradead.org>
Subject: Re: [RFC PATCH v1] net: ethernet: nb8800: Reset HW block in ndo_open
Date: Mon, 31 Jul 2017 16:18:42 +0100
Message-ID: <yw1xshhciqi5.fsf@mansr.com>
In-Reply-To: <31792688-7720-1d1e-4fd4-a0f96ac8af59@free.fr> (Mason's message of "Mon, 31 Jul 2017 16:08:31 +0200")

Mason <slash.tmp@free.fr> writes:

> On 31/07/2017 13:59, Måns Rullgård wrote:
>
>> Mason writes:
>> 
>>> On 29/07/2017 17:18, Florian Fainelli wrote:
>>>
>>>> On 07/29/2017 05:02 AM, Mason wrote:
>>>>
>>>>> I have identified a 100% reproducible flaw.
>>>>> I have proposed a work-around that brings this down to 0
>>>>> (tested 1000 cycles of link up / ping / link down).
>>>>
>>>> Can you also try to get help from your HW resources to eventually help
>>>> you find out what is going on here?
>>>
>>> The patch I proposed /is/ based on the feedback from the HW team :-(
>>> "Just reset the HW block, and everything will work as expected."
>> 
>> Nobody is saying a reset won't recover the lockup.  The problem is that
>> we don't know what caused it to lock up in the first place.  How do we
>> know it can't happen during normal operation?  If we knew the cause, it
>> might also be possible to avoid the situation entirely.
>
> How does one prove that something "can't happen during normal operation"?

One figures out what conditions cause the something and ensures they
never arise.

> The "put adapter in loop-back mode so we can send ourselves fake packets"
> shenanigans seem completely insane, if you ask me.

Blame the hardware designers.  The *only* way to stop the rx dma is to
have it receive a packet into a descriptor with the end of chain flag
set.  Thankfully the loopback mode means this can be made to happen at
will rather than waiting for actual network traffic.
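
As a rough illustration of that stop mechanism, here is a toy software
model, not the driver code. Every name in it (struct rx_dma, DESC_EOC,
rx_dma_receive, rx_dma_stop_via_loopback) is made up for the sketch, and
the "hardware" is just a function that consumes one descriptor per
received frame.

```c
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 4
#define DESC_EOC  0x1u   /* hypothetical "end of chain" flag bit */

struct rx_desc {
	unsigned int flags;
};

struct rx_dma {
	struct rx_desc ring[RING_SIZE];
	size_t next;     /* descriptor the next frame will land in */
	bool enabled;    /* stands in for the RCR_EN bit */
};

static void rx_dma_init(struct rx_dma *dma)
{
	for (size_t i = 0; i < RING_SIZE; i++)
		dma->ring[i].flags = 0;
	dma->next = 0;
	dma->enabled = true;
}

/* Model of the engine: one received frame fills one descriptor. */
static void rx_dma_receive(struct rx_dma *dma)
{
	if (!dma->enabled)
		return;

	struct rx_desc *d = &dma->ring[dma->next];
	dma->next = (dma->next + 1) % RING_SIZE;

	/* The engine only stops after using an end-of-chain descriptor. */
	if (d->flags & DESC_EOC)
		dma->enabled = false;
}

/*
 * Stop sequence: flag every descriptor as end of chain, then loop one
 * dummy frame back to ourselves so the engine is guaranteed to hit a
 * flagged descriptor without waiting for real traffic.
 */
static bool rx_dma_stop_via_loopback(struct rx_dma *dma)
{
	for (size_t i = 0; i < RING_SIZE; i++)
		dma->ring[i].flags |= DESC_EOC;

	rx_dma_receive(dma);   /* the looped-back dummy frame */

	return !dma->enabled;
}
```

In this model an ordinary frame leaves the engine running, and a single
looped-back frame after the flags are set stops it.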

> Other things make no sense to me, for example in nb8800_dma_stop()
> there is a polling loop:
>
> 	do {
> 		mdelay(100);
> 		nb8800_writel(priv, NB8800_TX_DESC_ADDR, txb->dma_desc);
> 		wmb();
> 		mdelay(100);
> 		nb8800_writel(priv, NB8800_TXC_CR, txcr | TCR_EN);
>
> 		mdelay(5500);
>
> 		err = readl_poll_timeout_atomic(priv->base + NB8800_RXC_CR,
> 						rxcr, !(rxcr & RCR_EN),
> 						1000, 100000);
> 		printk("err=%d retry=%d\n", err, retry);
> 	} while (err && --retry);
>
> (It was me who added the delays.)
>
> *Whatever* delays I insert, it always goes 3 times through the loop.
>
> [   29.654492] ++ETH++ gw32 reg=f002610c val=9ecc8000
> [   29.759320] ++ETH++ gw32 reg=f0026100 val=005c0aff
> [   35.364705] err=-110 retry=5
> [   35.467609] ++ETH++ gw32 reg=f002610c val=9ecc8000
> [   35.572436] ++ETH++ gw32 reg=f0026100 val=005c0aff
> [   41.177822] err=-110 retry=4
> [   41.280726] ++ETH++ gw32 reg=f002610c val=9ecc8000
> [   41.385553] ++ETH++ gw32 reg=f0026100 val=005c0aff
> [   46.890907] err=0 retry=3
>
> How is that possible?

One possibility is that the hardware loads three descriptors in advance
and doesn't see the newly set end of chain flag until its internal queue
has been used up.
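
To make that guess concrete, here is a toy model of an engine that caches
descriptor copies ahead of time. It is only a sketch of the hypothesis,
not a description of the real hardware: the names (struct prefetch_dma,
prefetch_dma_stop_attempts) and the rule "one cached copy is used per
frame" are assumptions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define RING_SIZE 8
#define DESC_EOC  0x1u   /* hypothetical "end of chain" flag bit */

struct prefetch_dma {
	unsigned int ring[RING_SIZE];    /* descriptor flags in memory */
	unsigned int cached[RING_SIZE];  /* copies the engine loaded early */
	size_t stale;                    /* cached copies not yet used */
	size_t head;                     /* next descriptor to use */
	bool enabled;
};

/* The engine loads 'depth' descriptors before the driver sets any flag. */
static void prefetch_dma_init(struct prefetch_dma *dma, size_t depth)
{
	memset(dma, 0, sizeof(*dma));
	dma->stale = depth < RING_SIZE ? depth : RING_SIZE;
	dma->enabled = true;
}

/* One received frame: use a cached copy if any remain, else read memory. */
static void prefetch_dma_receive(struct prefetch_dma *dma)
{
	unsigned int flags;

	if (!dma->enabled)
		return;

	if (dma->stale > 0) {
		flags = dma->cached[dma->head];
		dma->stale--;
	} else {
		flags = dma->ring[dma->head];
	}
	dma->head = (dma->head + 1) % RING_SIZE;

	if (flags & DESC_EOC)
		dma->enabled = false;
}

/*
 * Set end of chain on every descriptor in memory, then send one looped
 * frame per attempt. Returns the number of attempts needed, or 0 if the
 * engine was still running after max_retry attempts.
 */
static int prefetch_dma_stop_attempts(struct prefetch_dma *dma, int max_retry)
{
	for (size_t i = 0; i < RING_SIZE; i++)
		dma->ring[i] |= DESC_EOC;

	for (int attempt = 1; attempt <= max_retry; attempt++) {
		prefetch_dma_receive(dma);
		if (!dma->enabled)
			return attempt;
	}
	return 0;
}
```

In this model the number of attempts is the number of stale cached copies
plus one, whatever delays are inserted between attempts. How the real
queue depth maps onto the three passes in your log depends on details of
the hardware that I don't know.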

> I've tried using spinlocks and delays to get parallel execution
> down to a minimum, and have the same logs on both boards.
>
> Regards.

-- 
Måns Rullgård

