Re: Problem with PHY state machine when using interrupts

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Florian Fainelli <f.fainelli@gmail.com>
To: Mason <slash.tmp@free.fr>, Andrew Lunn <andrew@lunn.ch>,
	Mans Rullgard <mans@mansr.com>
Cc: netdev <netdev@vger.kernel.org>,
	Linux ARM <linux-arm-kernel@lists.infradead.org>
Subject: Re: Problem with PHY state machine when using interrupts
Date: Mon, 24 Jul 2017 12:32:10 -0700	[thread overview]
Message-ID: <b0410a73-ff2d-ca06-93fa-17af2040c41c@gmail.com> (raw)
In-Reply-To: <e6a7a5df-b735-6f9c-6a2b-cded86ec6525@free.fr>

On 07/24/2017 12:13 PM, Mason wrote:
> On 24/07/2017 18:49, Florian Fainelli wrote:
> 
>> On 07/24/2017 08:01 AM, Mason wrote:
>>
>>>> When I set the link down via 'ip link set eth0 down'
>>>> (as opposed to pulling the Ethernet cable) things don't happen as expected:
>>>>
>>>> The driver's adjust_link() callback is never called, and doesn't
>>>> get a chance make some required changes. And when I set the link
>>>> up again, there is no network connectivity.
>>>>
>>>> I get this problem only if I enable interrupts on my PHY.
>>>> If I use polling, things work as expected.
>>>>
>>>>
>>>> When I set the link down, devinet_ioctl() eventually calls
>>>> ndo_set_rx_mode() and ndo_stop()
>>>>
>>>> In ndo_stop() the driver calls
>>>> phy_stop(phydev);
>>>> which disables interrupts and sets the state to HALTED.
>>>>
>>>> In phy_state_machine()
>>>> the PHY_HALTED case does call the adjust_link() callback:
>>>>
>>>> 		if (phydev->link) {
>>>> 			phydev->link = 0;
>>>> 			netif_carrier_off(phydev->attached_dev);
>>>> 			phy_adjust_link(phydev);
>>>> 			do_suspend = true;
>>>> 		}
>>>>
>>>> But it's not called when I use interrupts...
>>>>
>>>> Perhaps because there are no interrupts generated?
>>>> Or even if there were, they have been turned off by phy_stop?
>>>>
>>>> Basically, it seems like when I use interrupts,
>>>> the phy_state_machine() is not called on link down,
>>>> which breaks the MAC driver's expectations.
>>>>
>>>> Am I barking up the wrong tree?
>>>
>>> FWIW, the patch below solves my issue.
>>> Basically, we reset the MAC in open(), instead of probe().
>>>
>>> I also had to solve the issue of adjust_link() not being
>>> called by calling it explicitly in stop() instead of
>>> relying on phy_stop() to do it indirectly.
>>
>> Which is of course absolutely not how it is intended to be used.
>> phy_stop() does the following:
>>
>> - if the PHY was already HALTED do nothing and exit
>> - if it was not and an interrupt is valid for this PHY:  disable and
>> clear these interrupts
>> - set state to PHY_HALTED
>>
>> somehow an interrupt should be generated from doing this such that
>> phy_change(), invoked from phy_interrupt() should have a chance to run
>> and make the PHY state machine transition properly to PHY_HALTED.
> 
> I'm totally confused. Are you saying that phy_stop itself
> should trigger an interrupt, or that the process of setting
> the link down should generate an interrupt *before* we reach
> phy_stop?

My reading of the code, and because I don't actually have a system where
PHY interrupts proper are used (only polling or PHY_IGNORE INTERRUPT) is
that, yes, somehow calling phy_stop() should result in a PHY interrupt
to be generated making the state machine move to PHY_HALTED.

> 
> I'm also perplex over this synchronous IRQ business.
> Should I be looking for a way to trigger an IRQ in
> software in the Atheros PHY?

No, first understand the problem and what is going on before trying to
workaround things in the PHY driver, there were questions for you as to
what state the PHY state machine is left in we need to see that to
understand how to possibly fix what you are seeing.

> 
> Before I forget: there is also an issue when using the PHY
> in polling mode. The ndo_stop callback runs through phy_stop
> and phy_disconnect too fast for the adjust_link() callback
> to be called. My patch fixed that too, by calling
> nb8800_link_reconfigure() explicitly.

Most, if not all drivers should have this:

ndo_open() calls phy_connect() or phy_attach() + phy_start() because
that allows you to properly manage the PHY's power state and the state
machine, the reciprocal is to have ndo_stop() call phy_disconnect() (and
just that) which properly waits for the PHY state machine to be fully
stopped.

phy_stop() returns immediately but the PHY state machine only gets
stopped asynchronously at a later time, either with an interrupt or with
an explicit work queue scheduling. If you call phy_disconnect() right
after, this cancels the work queue and it may not have run the
adjust_link callback yet.

> 
> 
>> So from there can you check a few things:
>>
>> - is such an interrupt actually generated?
>> - if you turn on dynamic debug prints for drivers/net/phy/phy.c where do
>> we leave the PHY state machine and what state is it in when you call
>> ifconfig up again?
> 
> The only interrupts I've ever seen the PHY generate are
> on plugging/unplugging the Ethernet cable.
> 
> Looking at the driver and datasheet...
> http://elixir.free-electrons.com/linux/v4.13-rc2/source/drivers/net/phy/at803x.c#L312
> 		value |= AT803X_INTR_ENABLE_AUTONEG_ERR;
> 		value |= AT803X_INTR_ENABLE_SPEED_CHANGED;
> 		value |= AT803X_INTR_ENABLE_DUPLEX_CHANGED;
> 		value |= AT803X_INTR_ENABLE_LINK_FAIL;
> 		value |= AT803X_INTR_ENABLE_LINK_SUCCESS;
> 
> And the interrupts reasons supported by the PHY are:
> #define AT803X_INTR_ENABLE_AUTONEG_ERR		BIT(15)
> #define AT803X_INTR_ENABLE_SPEED_CHANGED	BIT(14)
> #define AT803X_INTR_ENABLE_DUPLEX_CHANGED	BIT(13)
> #define AT803X_INTR_ENABLE_PAGE_RECEIVED	BIT(12)
> #define AT803X_INTR_ENABLE_LINK_FAIL		BIT(11)
> #define AT803X_INTR_ENABLE_LINK_SUCCESS		BIT(10)
> #define AT803X_INTR_ENABLE_WIRESPEED_DOWNGRADE	BIT(5)
> #define AT803X_INTR_ENABLE_POLARITY_CHANGED	BIT(1)
> #define AT803X_INTR_ENABLE_WOL			BIT(0)
> 
> These all seem to be external reasons (from the peer).
> 
> I did enable debug logs in drivers/net/phy/phy.c
> to trace the state machine, and it is not called
> at all on set link down, so it remains in state
> RUNNING (both in polling and interrupt modes).
> 
> IIRC, this used to work on the 2-core board, but breaks
> on the 4-core board. Could this be some kind of race?

See what I just replied, phy_stop() then phy_disconnect() is "racy"
because there is a work queue involved. If I can read the code so can you.
-- 
Florian

next prev parent reply	other threads:[~2017-07-24 19:32 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-07-24 11:07 Problem with PHY state machine when using interrupts Mason
2017-07-24 15:01 ` Mason
2017-07-24 16:49   ` Florian Fainelli
2017-07-24 19:13     ` Mason
2017-07-24 19:32       ` Florian Fainelli [this message]
2017-07-24 19:53         ` Florian Fainelli
2017-07-24 21:20           ` Mason
2017-07-24 22:36             ` Florian Fainelli
2017-07-24 22:39               ` Florian Fainelli
2017-07-25 10:51                 ` Mason
2017-07-25 11:41                   ` Mason
2017-07-25 17:55                     ` Florian Fainelli
2017-07-24 22:53               ` Mason
2017-07-24 22:59                 ` Florian Fainelli
2017-07-25  0:30                   ` Florian Fainelli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b0410a73-ff2d-ca06-93fa-17af2040c41c@gmail.com \
    --to=f.fainelli@gmail.com \
    --cc=andrew@lunn.ch \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=mans@mansr.com \
    --cc=netdev@vger.kernel.org \
    --cc=slash.tmp@free.fr \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).