All of lore.kernel.org
 help / color / mirror / Atom feed
From: Heiner Kallweit <hkallweit1@gmail.com>
To: Erico Nunes <nunes.erico@gmail.com>
Cc: Jerome Brunet <jbrunet@baylibre.com>,
	Martin Blumenstingl <martin.blumenstingl@googlemail.com>,
	Alexandre Torgue <alexandre.torgue@foss.st.com>,
	Giuseppe Cavallaro <peppe.cavallaro@st.com>,
	Jose Abreu <joabreu@synopsys.com>,
	Kevin Hilman <khilman@baylibre.com>,
	Neil Armstrong <narmstrong@baylibre.com>,
	linux-amlogic@lists.infradead.org, netdev@vger.kernel.org,
	"open list:ARM/Rockchip SoC..."
	<linux-rockchip@lists.infradead.org>,
	linux-sunxi@lists.linux.dev
Subject: Re: net: stmmac: dwmac-meson8b: interface sometimes does not come up at boot
Date: Sun, 6 Mar 2022 13:56:18 +0100	[thread overview]
Message-ID: <435b2a9d-c3c6-a162-331f-9f47f69be5ac@gmail.com> (raw)
In-Reply-To: <CAK4VdL08sdZV7o7Bw=cutdmoCEi1NYB-yisstLqRuH7QcHOHvA@mail.gmail.com>

On 06.03.2022 10:40, Erico Nunes wrote:
> On Wed, Mar 2, 2022 at 5:35 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
>> When using polling the time difference between aneg complete and
>> PHY state machine run is random in the interval 0 .. 1s.
>> Hence there's a certain chance that the difference is too small
>> to avoid the issue.
>>
>>> If I understand the proposed patch correctly, it is mostly about the phy
>>> IRQ. Since I reproduce without the IRQ, I suppose it is not the
>>> problem we where looking for (might still be a problem worth fixing -
>>> the phy is not "rock-solid" when it comes to aneg - I already tried
>>> stabilising it a few years ago)
>>
>> Below is a slightly improved version of the test patch. It doesn't sleep
>> in the (threaded) interrupt handler and lets the workqueue do it.
>>
>> Maybe Amlogic is aware of a potentially related silicon issue?
>>
>>>
>>> TBH, It bothers me that I reproduced w/o the IRQ. The idea makes
>>> sense :/
>>>
>>>>
>> [...]
>>>
>>
>>
>> diff --git a/drivers/net/phy/meson-gxl.c b/drivers/net/phy/meson-gxl.c
>> index 7e7904fee..a3318ae01 100644
>> --- a/drivers/net/phy/meson-gxl.c
>> +++ b/drivers/net/phy/meson-gxl.c
>> @@ -209,12 +209,7 @@ static int meson_gxl_config_intr(struct phy_device *phydev)
>>                 if (ret)
>>                         return ret;
>>
>> -               val = INTSRC_ANEG_PR
>> -                       | INTSRC_PARALLEL_FAULT
>> -                       | INTSRC_ANEG_LP_ACK
>> -                       | INTSRC_LINK_DOWN
>> -                       | INTSRC_REMOTE_FAULT
>> -                       | INTSRC_ANEG_COMPLETE;
>> +               val = INTSRC_LINK_DOWN | INTSRC_ANEG_COMPLETE;
>>                 ret = phy_write(phydev, INTSRC_MASK, val);
>>         } else {
>>                 val = 0;
>> @@ -240,7 +235,10 @@ static irqreturn_t meson_gxl_handle_interrupt(struct phy_device *phydev)
>>         if (irq_status == 0)
>>                 return IRQ_NONE;
>>
>> -       phy_trigger_machine(phydev);
>> +       if (irq_status & INTSRC_ANEG_COMPLETE)
>> +               phy_queue_state_machine(phydev, msecs_to_jiffies(100));
>> +       else
>> +               phy_trigger_machine(phydev);
>>
>>         return IRQ_HANDLED;
>>  }
>> --
>> 2.35.1
> 
> I did a lot of testing with this patch, and it seems to improve things.
> To me it completely resolves the original issue which was more easily
> reproducible where I would see "Link is Up" but the interface did not
> really work.
> At least in over a thousand jobs, that never reproduced again with this patch.
> 
> I do see a different issue now, but it is even less frequent and
> harder to reproduce. In those over a thousand jobs, I have seen it
> only about 4 times.
> The difference is that now when the issue happens, the link is not
> even reported as Up. The output is a bit different than the original
> one, but it is consistently the same output in all instances where it
> reproduced. Looks like this (note that there is no longer Link is
> Down/Link is Up):
> 
> [    2.186151] meson8b-dwmac c9410000.ethernet eth0: PHY
> [0.e40908ff:08] driver [Meson GXL Internal PHY] (irq=48)
> [    2.191582] meson8b-dwmac c9410000.ethernet eth0: Register
> MEM_TYPE_PAGE_POOL RxQ-0
> [    2.208713] meson8b-dwmac c9410000.ethernet eth0: No Safety
> Features support found
> [    2.210673] meson8b-dwmac c9410000.ethernet eth0: PTP not supported by HW
> [    2.218083] meson8b-dwmac c9410000.ethernet eth0: configuring for
> phy/rmii link mode
> [   22.227444] Waiting up to 100 more seconds for network.
> [   42.231440] Waiting up to 80 more seconds for network.
> [   62.235437] Waiting up to 60 more seconds for network.
> [   82.239437] Waiting up to 40 more seconds for network.
> [  102.243439] Waiting up to 20 more seconds for network.
> [  122.243446] Sending DHCP requests ...
> [  130.113944] random: fast init done
> [  134.219441] ... timed out!
> [  194.559562] IP-Config: Retrying forever (NFS root)...
> [  194.624630] meson8b-dwmac c9410000.ethernet eth0: PHY
> [0.e40908ff:08] driver [Meson GXL Internal PHY] (irq=48)
> [  194.630739] meson8b-dwmac c9410000.ethernet eth0: Register
> MEM_TYPE_PAGE_POOL RxQ-0
> [  194.649138] meson8b-dwmac c9410000.ethernet eth0: No Safety
> Features support found
> [  194.651113] meson8b-dwmac c9410000.ethernet eth0: PTP not supported by HW
> [  194.657931] meson8b-dwmac c9410000.ethernet eth0: configuring for
> phy/rmii link mode
> [  196.313602] meson8b-dwmac c9410000.ethernet eth0: Link is Up -
> 100Mbps/Full - flow control off
> [  196.339463] Sending DHCP requests ., OK
> ...
> 
> 
> I don't remember seeing an output like this one in the previous tests.
> Is there any further improvement we can do to the patch based on this?
> 
> Thanks
> 
> Erico

Thanks a lot for your testing efforts, much appreciated.
You could try the following (quick and dirty) test patch that fully mimics
the vendor driver as found here:
https://github.com/khadas/linux/blob/buildroot-aml-4.9/drivers/amlogic/ethernet/phy/amlogic.c

First apply
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=a502a8f04097e038c3daa16c5202a9538116d563
This patch is in the net tree currently and should show up in linux-next
beginning of the week.

On top please apply the following (it includes the test patch your working with).


diff --git a/drivers/net/phy/meson-gxl.c b/drivers/net/phy/meson-gxl.c
index c49062ad7..92f94c8be 100644
--- a/drivers/net/phy/meson-gxl.c
+++ b/drivers/net/phy/meson-gxl.c
@@ -68,32 +68,19 @@ static int meson_gxl_open_banks(struct phy_device *phydev)
 	return phy_write(phydev, TSTCNTL, TSTCNTL_TEST_MODE);
 }
 
-static void meson_gxl_close_banks(struct phy_device *phydev)
-{
-	phy_write(phydev, TSTCNTL, 0);
-}
-
 static int meson_gxl_read_reg(struct phy_device *phydev,
 			      unsigned int bank, unsigned int reg)
 {
 	int ret;
 
-	ret = meson_gxl_open_banks(phydev);
-	if (ret)
-		goto out;
-
 	ret = phy_write(phydev, TSTCNTL, TSTCNTL_READ |
 			FIELD_PREP(TSTCNTL_REG_BANK_SEL, bank) |
 			TSTCNTL_TEST_MODE |
 			FIELD_PREP(TSTCNTL_READ_ADDRESS, reg));
 	if (ret)
-		goto out;
+		return ret;
 
-	ret = phy_read(phydev, TSTREAD1);
-out:
-	/* Close the bank access on our way out */
-	meson_gxl_close_banks(phydev);
-	return ret;
+	return phy_read(phydev, TSTREAD1);
 }
 
 static int meson_gxl_write_reg(struct phy_device *phydev,
@@ -102,29 +89,28 @@ static int meson_gxl_write_reg(struct phy_device *phydev,
 {
 	int ret;
 
-	ret = meson_gxl_open_banks(phydev);
-	if (ret)
-		goto out;
-
 	ret = phy_write(phydev, TSTWRITE, value);
 	if (ret)
-		goto out;
+		return ret;
 
-	ret = phy_write(phydev, TSTCNTL, TSTCNTL_WRITE |
-			FIELD_PREP(TSTCNTL_REG_BANK_SEL, bank) |
-			TSTCNTL_TEST_MODE |
-			FIELD_PREP(TSTCNTL_WRITE_ADDRESS, reg));
+	return phy_write(phydev, TSTCNTL, TSTCNTL_WRITE |
+			 FIELD_PREP(TSTCNTL_REG_BANK_SEL, bank) |
+			 TSTCNTL_TEST_MODE |
+			 FIELD_PREP(TSTCNTL_WRITE_ADDRESS, reg));
 
-out:
-	/* Close the bank access on our way out */
-	meson_gxl_close_banks(phydev);
-	return ret;
 }
 
 static int meson_gxl_config_init(struct phy_device *phydev)
 {
 	int ret;
 
+	phy_set_bits(phydev, 0x1b, BIT(12));
+	phy_write(phydev, 0x11, 0x0080);
+
+	meson_gxl_open_banks(phydev);
+
+	ret = meson_gxl_write_reg(phydev, BANK_ANALOG_DSP, 0x17, 0x8e0d);
+
 	/* Enable fractional PLL */
 	ret = meson_gxl_write_reg(phydev, BANK_BIST, FR_PLL_CONTROL, 0x5);
 	if (ret)
@@ -140,6 +126,10 @@ static int meson_gxl_config_init(struct phy_device *phydev)
 	if (ret)
 		return ret;
 
+	ret = meson_gxl_write_reg(phydev, BANK_ANALOG_DSP, 0x18, 0x000c);
+	ret = meson_gxl_write_reg(phydev, BANK_ANALOG_DSP, 0x17, 0x1a0c);
+	ret = meson_gxl_write_reg(phydev, BANK_ANALOG_DSP, 0x1a, 0x6400);
+
 	return 0;
 }
 
@@ -186,7 +176,7 @@ static int meson_gxl_read_status(struct phy_device *phydev)
 		if (!(wol & LPI_STATUS_RSV12) ||
 		    ((exp & EXPANSION_NWAY) && !(lpa & LPA_LPACK))) {
 			/* Looks like aneg failed after all */
-			phydev_dbg(phydev, "LPA corruption - aneg restart\n");
+			phydev_warn(phydev, "LPA corruption - aneg restart\n");
 			return genphy_restart_aneg(phydev);
 		}
 	}
@@ -243,11 +233,23 @@ static irqreturn_t meson_gxl_handle_interrupt(struct phy_device *phydev)
 	    irq_status == INTSRC_ENERGY_DETECT)
 		return IRQ_HANDLED;
 
-	phy_trigger_machine(phydev);
+	/* Give PHY some time before MAC starts sending data. This works
+	 * around an issue where network doesn't come up properly.
+	 */
+	if (irq_status & INTSRC_ANEG_COMPLETE)
+		phy_queue_state_machine(phydev, msecs_to_jiffies(100));
+	else
+		phy_trigger_machine(phydev);
 
 	return IRQ_HANDLED;
 }
 
+static void meson_gxl_link_change_notify(struct phy_device *phydev)
+{
+	if (phydev->state == PHY_RUNNING && phydev->speed == SPEED_100)
+		meson_gxl_write_reg(phydev, BANK_ANALOG_DSP, 0x14, 0xa900);
+}
+
 static struct phy_driver meson_gxl_phy[] = {
 	{
 		PHY_ID_MATCH_EXACT(0x01814400),
@@ -259,6 +261,7 @@ static struct phy_driver meson_gxl_phy[] = {
 		.read_status	= meson_gxl_read_status,
 		.config_intr	= meson_gxl_config_intr,
 		.handle_interrupt = meson_gxl_handle_interrupt,
+		.link_change_notify = meson_gxl_link_change_notify,
 		.suspend        = genphy_suspend,
 		.resume         = genphy_resume,
 	}, {
-- 
2.35.1



_______________________________________________
linux-amlogic mailing list
linux-amlogic@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-amlogic

WARNING: multiple messages have this Message-ID (diff)
From: Heiner Kallweit <hkallweit1@gmail.com>
To: Erico Nunes <nunes.erico@gmail.com>
Cc: Jerome Brunet <jbrunet@baylibre.com>,
	Martin Blumenstingl <martin.blumenstingl@googlemail.com>,
	Alexandre Torgue <alexandre.torgue@foss.st.com>,
	Giuseppe Cavallaro <peppe.cavallaro@st.com>,
	Jose Abreu <joabreu@synopsys.com>,
	Kevin Hilman <khilman@baylibre.com>,
	Neil Armstrong <narmstrong@baylibre.com>,
	linux-amlogic@lists.infradead.org, netdev@vger.kernel.org,
	"open list:ARM/Rockchip SoC..."
	<linux-rockchip@lists.infradead.org>,
	linux-sunxi@lists.linux.dev
Subject: Re: net: stmmac: dwmac-meson8b: interface sometimes does not come up at boot
Date: Sun, 6 Mar 2022 13:56:18 +0100	[thread overview]
Message-ID: <435b2a9d-c3c6-a162-331f-9f47f69be5ac@gmail.com> (raw)
In-Reply-To: <CAK4VdL08sdZV7o7Bw=cutdmoCEi1NYB-yisstLqRuH7QcHOHvA@mail.gmail.com>

On 06.03.2022 10:40, Erico Nunes wrote:
> On Wed, Mar 2, 2022 at 5:35 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
>> When using polling the time difference between aneg complete and
>> PHY state machine run is random in the interval 0 .. 1s.
>> Hence there's a certain chance that the difference is too small
>> to avoid the issue.
>>
>>> If I understand the proposed patch correctly, it is mostly about the phy
>>> IRQ. Since I reproduce without the IRQ, I suppose it is not the
>>> problem we where looking for (might still be a problem worth fixing -
>>> the phy is not "rock-solid" when it comes to aneg - I already tried
>>> stabilising it a few years ago)
>>
>> Below is a slightly improved version of the test patch. It doesn't sleep
>> in the (threaded) interrupt handler and lets the workqueue do it.
>>
>> Maybe Amlogic is aware of a potentially related silicon issue?
>>
>>>
>>> TBH, It bothers me that I reproduced w/o the IRQ. The idea makes
>>> sense :/
>>>
>>>>
>> [...]
>>>
>>
>>
>> diff --git a/drivers/net/phy/meson-gxl.c b/drivers/net/phy/meson-gxl.c
>> index 7e7904fee..a3318ae01 100644
>> --- a/drivers/net/phy/meson-gxl.c
>> +++ b/drivers/net/phy/meson-gxl.c
>> @@ -209,12 +209,7 @@ static int meson_gxl_config_intr(struct phy_device *phydev)
>>                 if (ret)
>>                         return ret;
>>
>> -               val = INTSRC_ANEG_PR
>> -                       | INTSRC_PARALLEL_FAULT
>> -                       | INTSRC_ANEG_LP_ACK
>> -                       | INTSRC_LINK_DOWN
>> -                       | INTSRC_REMOTE_FAULT
>> -                       | INTSRC_ANEG_COMPLETE;
>> +               val = INTSRC_LINK_DOWN | INTSRC_ANEG_COMPLETE;
>>                 ret = phy_write(phydev, INTSRC_MASK, val);
>>         } else {
>>                 val = 0;
>> @@ -240,7 +235,10 @@ static irqreturn_t meson_gxl_handle_interrupt(struct phy_device *phydev)
>>         if (irq_status == 0)
>>                 return IRQ_NONE;
>>
>> -       phy_trigger_machine(phydev);
>> +       if (irq_status & INTSRC_ANEG_COMPLETE)
>> +               phy_queue_state_machine(phydev, msecs_to_jiffies(100));
>> +       else
>> +               phy_trigger_machine(phydev);
>>
>>         return IRQ_HANDLED;
>>  }
>> --
>> 2.35.1
> 
> I did a lot of testing with this patch, and it seems to improve things.
> To me it completely resolves the original issue which was more easily
> reproducible where I would see "Link is Up" but the interface did not
> really work.
> At least in over a thousand jobs, that never reproduced again with this patch.
> 
> I do see a different issue now, but it is even less frequent and
> harder to reproduce. In those over a thousand jobs, I have seen it
> only about 4 times.
> The difference is that now when the issue happens, the link is not
> even reported as Up. The output is a bit different than the original
> one, but it is consistently the same output in all instances where it
> reproduced. Looks like this (note that there is no longer Link is
> Down/Link is Up):
> 
> [    2.186151] meson8b-dwmac c9410000.ethernet eth0: PHY
> [0.e40908ff:08] driver [Meson GXL Internal PHY] (irq=48)
> [    2.191582] meson8b-dwmac c9410000.ethernet eth0: Register
> MEM_TYPE_PAGE_POOL RxQ-0
> [    2.208713] meson8b-dwmac c9410000.ethernet eth0: No Safety
> Features support found
> [    2.210673] meson8b-dwmac c9410000.ethernet eth0: PTP not supported by HW
> [    2.218083] meson8b-dwmac c9410000.ethernet eth0: configuring for
> phy/rmii link mode
> [   22.227444] Waiting up to 100 more seconds for network.
> [   42.231440] Waiting up to 80 more seconds for network.
> [   62.235437] Waiting up to 60 more seconds for network.
> [   82.239437] Waiting up to 40 more seconds for network.
> [  102.243439] Waiting up to 20 more seconds for network.
> [  122.243446] Sending DHCP requests ...
> [  130.113944] random: fast init done
> [  134.219441] ... timed out!
> [  194.559562] IP-Config: Retrying forever (NFS root)...
> [  194.624630] meson8b-dwmac c9410000.ethernet eth0: PHY
> [0.e40908ff:08] driver [Meson GXL Internal PHY] (irq=48)
> [  194.630739] meson8b-dwmac c9410000.ethernet eth0: Register
> MEM_TYPE_PAGE_POOL RxQ-0
> [  194.649138] meson8b-dwmac c9410000.ethernet eth0: No Safety
> Features support found
> [  194.651113] meson8b-dwmac c9410000.ethernet eth0: PTP not supported by HW
> [  194.657931] meson8b-dwmac c9410000.ethernet eth0: configuring for
> phy/rmii link mode
> [  196.313602] meson8b-dwmac c9410000.ethernet eth0: Link is Up -
> 100Mbps/Full - flow control off
> [  196.339463] Sending DHCP requests ., OK
> ...
> 
> 
> I don't remember seeing an output like this one in the previous tests.
> Is there any further improvement we can do to the patch based on this?
> 
> Thanks
> 
> Erico

Thanks a lot for your testing efforts, much appreciated.
You could try the following (quick and dirty) test patch that fully mimics
the vendor driver as found here:
https://github.com/khadas/linux/blob/buildroot-aml-4.9/drivers/amlogic/ethernet/phy/amlogic.c

First apply
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=a502a8f04097e038c3daa16c5202a9538116d563
This patch is in the net tree currently and should show up in linux-next
beginning of the week.

On top please apply the following (it includes the test patch your working with).


diff --git a/drivers/net/phy/meson-gxl.c b/drivers/net/phy/meson-gxl.c
index c49062ad7..92f94c8be 100644
--- a/drivers/net/phy/meson-gxl.c
+++ b/drivers/net/phy/meson-gxl.c
@@ -68,32 +68,19 @@ static int meson_gxl_open_banks(struct phy_device *phydev)
 	return phy_write(phydev, TSTCNTL, TSTCNTL_TEST_MODE);
 }
 
-static void meson_gxl_close_banks(struct phy_device *phydev)
-{
-	phy_write(phydev, TSTCNTL, 0);
-}
-
 static int meson_gxl_read_reg(struct phy_device *phydev,
 			      unsigned int bank, unsigned int reg)
 {
 	int ret;
 
-	ret = meson_gxl_open_banks(phydev);
-	if (ret)
-		goto out;
-
 	ret = phy_write(phydev, TSTCNTL, TSTCNTL_READ |
 			FIELD_PREP(TSTCNTL_REG_BANK_SEL, bank) |
 			TSTCNTL_TEST_MODE |
 			FIELD_PREP(TSTCNTL_READ_ADDRESS, reg));
 	if (ret)
-		goto out;
+		return ret;
 
-	ret = phy_read(phydev, TSTREAD1);
-out:
-	/* Close the bank access on our way out */
-	meson_gxl_close_banks(phydev);
-	return ret;
+	return phy_read(phydev, TSTREAD1);
 }
 
 static int meson_gxl_write_reg(struct phy_device *phydev,
@@ -102,29 +89,28 @@ static int meson_gxl_write_reg(struct phy_device *phydev,
 {
 	int ret;
 
-	ret = meson_gxl_open_banks(phydev);
-	if (ret)
-		goto out;
-
 	ret = phy_write(phydev, TSTWRITE, value);
 	if (ret)
-		goto out;
+		return ret;
 
-	ret = phy_write(phydev, TSTCNTL, TSTCNTL_WRITE |
-			FIELD_PREP(TSTCNTL_REG_BANK_SEL, bank) |
-			TSTCNTL_TEST_MODE |
-			FIELD_PREP(TSTCNTL_WRITE_ADDRESS, reg));
+	return phy_write(phydev, TSTCNTL, TSTCNTL_WRITE |
+			 FIELD_PREP(TSTCNTL_REG_BANK_SEL, bank) |
+			 TSTCNTL_TEST_MODE |
+			 FIELD_PREP(TSTCNTL_WRITE_ADDRESS, reg));
 
-out:
-	/* Close the bank access on our way out */
-	meson_gxl_close_banks(phydev);
-	return ret;
 }
 
 static int meson_gxl_config_init(struct phy_device *phydev)
 {
 	int ret;
 
+	phy_set_bits(phydev, 0x1b, BIT(12));
+	phy_write(phydev, 0x11, 0x0080);
+
+	meson_gxl_open_banks(phydev);
+
+	ret = meson_gxl_write_reg(phydev, BANK_ANALOG_DSP, 0x17, 0x8e0d);
+
 	/* Enable fractional PLL */
 	ret = meson_gxl_write_reg(phydev, BANK_BIST, FR_PLL_CONTROL, 0x5);
 	if (ret)
@@ -140,6 +126,10 @@ static int meson_gxl_config_init(struct phy_device *phydev)
 	if (ret)
 		return ret;
 
+	ret = meson_gxl_write_reg(phydev, BANK_ANALOG_DSP, 0x18, 0x000c);
+	ret = meson_gxl_write_reg(phydev, BANK_ANALOG_DSP, 0x17, 0x1a0c);
+	ret = meson_gxl_write_reg(phydev, BANK_ANALOG_DSP, 0x1a, 0x6400);
+
 	return 0;
 }
 
@@ -186,7 +176,7 @@ static int meson_gxl_read_status(struct phy_device *phydev)
 		if (!(wol & LPI_STATUS_RSV12) ||
 		    ((exp & EXPANSION_NWAY) && !(lpa & LPA_LPACK))) {
 			/* Looks like aneg failed after all */
-			phydev_dbg(phydev, "LPA corruption - aneg restart\n");
+			phydev_warn(phydev, "LPA corruption - aneg restart\n");
 			return genphy_restart_aneg(phydev);
 		}
 	}
@@ -243,11 +233,23 @@ static irqreturn_t meson_gxl_handle_interrupt(struct phy_device *phydev)
 	    irq_status == INTSRC_ENERGY_DETECT)
 		return IRQ_HANDLED;
 
-	phy_trigger_machine(phydev);
+	/* Give PHY some time before MAC starts sending data. This works
+	 * around an issue where network doesn't come up properly.
+	 */
+	if (irq_status & INTSRC_ANEG_COMPLETE)
+		phy_queue_state_machine(phydev, msecs_to_jiffies(100));
+	else
+		phy_trigger_machine(phydev);
 
 	return IRQ_HANDLED;
 }
 
+static void meson_gxl_link_change_notify(struct phy_device *phydev)
+{
+	if (phydev->state == PHY_RUNNING && phydev->speed == SPEED_100)
+		meson_gxl_write_reg(phydev, BANK_ANALOG_DSP, 0x14, 0xa900);
+}
+
 static struct phy_driver meson_gxl_phy[] = {
 	{
 		PHY_ID_MATCH_EXACT(0x01814400),
@@ -259,6 +261,7 @@ static struct phy_driver meson_gxl_phy[] = {
 		.read_status	= meson_gxl_read_status,
 		.config_intr	= meson_gxl_config_intr,
 		.handle_interrupt = meson_gxl_handle_interrupt,
+		.link_change_notify = meson_gxl_link_change_notify,
 		.suspend        = genphy_suspend,
 		.resume         = genphy_resume,
 	}, {
-- 
2.35.1



_______________________________________________
Linux-rockchip mailing list
Linux-rockchip@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-rockchip

WARNING: multiple messages have this Message-ID (diff)
From: Heiner Kallweit <hkallweit1@gmail.com>
To: Erico Nunes <nunes.erico@gmail.com>
Cc: Jerome Brunet <jbrunet@baylibre.com>,
	Martin Blumenstingl <martin.blumenstingl@googlemail.com>,
	Alexandre Torgue <alexandre.torgue@foss.st.com>,
	Giuseppe Cavallaro <peppe.cavallaro@st.com>,
	Jose Abreu <joabreu@synopsys.com>,
	Kevin Hilman <khilman@baylibre.com>,
	Neil Armstrong <narmstrong@baylibre.com>,
	linux-amlogic@lists.infradead.org, netdev@vger.kernel.org,
	"open list:ARM/Rockchip SoC..."
	<linux-rockchip@lists.infradead.org>,
	linux-sunxi@lists.linux.dev
Subject: Re: net: stmmac: dwmac-meson8b: interface sometimes does not come up at boot
Date: Sun, 6 Mar 2022 13:56:18 +0100	[thread overview]
Message-ID: <435b2a9d-c3c6-a162-331f-9f47f69be5ac@gmail.com> (raw)
In-Reply-To: <CAK4VdL08sdZV7o7Bw=cutdmoCEi1NYB-yisstLqRuH7QcHOHvA@mail.gmail.com>

On 06.03.2022 10:40, Erico Nunes wrote:
> On Wed, Mar 2, 2022 at 5:35 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
>> When using polling the time difference between aneg complete and
>> PHY state machine run is random in the interval 0 .. 1s.
>> Hence there's a certain chance that the difference is too small
>> to avoid the issue.
>>
>>> If I understand the proposed patch correctly, it is mostly about the phy
>>> IRQ. Since I reproduce without the IRQ, I suppose it is not the
>>> problem we where looking for (might still be a problem worth fixing -
>>> the phy is not "rock-solid" when it comes to aneg - I already tried
>>> stabilising it a few years ago)
>>
>> Below is a slightly improved version of the test patch. It doesn't sleep
>> in the (threaded) interrupt handler and lets the workqueue do it.
>>
>> Maybe Amlogic is aware of a potentially related silicon issue?
>>
>>>
>>> TBH, It bothers me that I reproduced w/o the IRQ. The idea makes
>>> sense :/
>>>
>>>>
>> [...]
>>>
>>
>>
>> diff --git a/drivers/net/phy/meson-gxl.c b/drivers/net/phy/meson-gxl.c
>> index 7e7904fee..a3318ae01 100644
>> --- a/drivers/net/phy/meson-gxl.c
>> +++ b/drivers/net/phy/meson-gxl.c
>> @@ -209,12 +209,7 @@ static int meson_gxl_config_intr(struct phy_device *phydev)
>>                 if (ret)
>>                         return ret;
>>
>> -               val = INTSRC_ANEG_PR
>> -                       | INTSRC_PARALLEL_FAULT
>> -                       | INTSRC_ANEG_LP_ACK
>> -                       | INTSRC_LINK_DOWN
>> -                       | INTSRC_REMOTE_FAULT
>> -                       | INTSRC_ANEG_COMPLETE;
>> +               val = INTSRC_LINK_DOWN | INTSRC_ANEG_COMPLETE;
>>                 ret = phy_write(phydev, INTSRC_MASK, val);
>>         } else {
>>                 val = 0;
>> @@ -240,7 +235,10 @@ static irqreturn_t meson_gxl_handle_interrupt(struct phy_device *phydev)
>>         if (irq_status == 0)
>>                 return IRQ_NONE;
>>
>> -       phy_trigger_machine(phydev);
>> +       if (irq_status & INTSRC_ANEG_COMPLETE)
>> +               phy_queue_state_machine(phydev, msecs_to_jiffies(100));
>> +       else
>> +               phy_trigger_machine(phydev);
>>
>>         return IRQ_HANDLED;
>>  }
>> --
>> 2.35.1
> 
> I did a lot of testing with this patch, and it seems to improve things.
> To me it completely resolves the original issue which was more easily
> reproducible where I would see "Link is Up" but the interface did not
> really work.
> At least in over a thousand jobs, that never reproduced again with this patch.
> 
> I do see a different issue now, but it is even less frequent and
> harder to reproduce. In those over a thousand jobs, I have seen it
> only about 4 times.
> The difference is that now when the issue happens, the link is not
> even reported as Up. The output is a bit different than the original
> one, but it is consistently the same output in all instances where it
> reproduced. Looks like this (note that there is no longer Link is
> Down/Link is Up):
> 
> [    2.186151] meson8b-dwmac c9410000.ethernet eth0: PHY
> [0.e40908ff:08] driver [Meson GXL Internal PHY] (irq=48)
> [    2.191582] meson8b-dwmac c9410000.ethernet eth0: Register
> MEM_TYPE_PAGE_POOL RxQ-0
> [    2.208713] meson8b-dwmac c9410000.ethernet eth0: No Safety
> Features support found
> [    2.210673] meson8b-dwmac c9410000.ethernet eth0: PTP not supported by HW
> [    2.218083] meson8b-dwmac c9410000.ethernet eth0: configuring for
> phy/rmii link mode
> [   22.227444] Waiting up to 100 more seconds for network.
> [   42.231440] Waiting up to 80 more seconds for network.
> [   62.235437] Waiting up to 60 more seconds for network.
> [   82.239437] Waiting up to 40 more seconds for network.
> [  102.243439] Waiting up to 20 more seconds for network.
> [  122.243446] Sending DHCP requests ...
> [  130.113944] random: fast init done
> [  134.219441] ... timed out!
> [  194.559562] IP-Config: Retrying forever (NFS root)...
> [  194.624630] meson8b-dwmac c9410000.ethernet eth0: PHY
> [0.e40908ff:08] driver [Meson GXL Internal PHY] (irq=48)
> [  194.630739] meson8b-dwmac c9410000.ethernet eth0: Register
> MEM_TYPE_PAGE_POOL RxQ-0
> [  194.649138] meson8b-dwmac c9410000.ethernet eth0: No Safety
> Features support found
> [  194.651113] meson8b-dwmac c9410000.ethernet eth0: PTP not supported by HW
> [  194.657931] meson8b-dwmac c9410000.ethernet eth0: configuring for
> phy/rmii link mode
> [  196.313602] meson8b-dwmac c9410000.ethernet eth0: Link is Up -
> 100Mbps/Full - flow control off
> [  196.339463] Sending DHCP requests ., OK
> ...
> 
> 
> I don't remember seeing an output like this one in the previous tests.
> Is there any further improvement we can do to the patch based on this?
> 
> Thanks
> 
> Erico

Thanks a lot for your testing efforts, much appreciated.
You could try the following (quick and dirty) test patch that fully mimics
the vendor driver as found here:
https://github.com/khadas/linux/blob/buildroot-aml-4.9/drivers/amlogic/ethernet/phy/amlogic.c

First apply
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=a502a8f04097e038c3daa16c5202a9538116d563
This patch is in the net tree currently and should show up in linux-next
beginning of the week.

On top please apply the following (it includes the test patch your working with).


diff --git a/drivers/net/phy/meson-gxl.c b/drivers/net/phy/meson-gxl.c
index c49062ad7..92f94c8be 100644
--- a/drivers/net/phy/meson-gxl.c
+++ b/drivers/net/phy/meson-gxl.c
@@ -68,32 +68,19 @@ static int meson_gxl_open_banks(struct phy_device *phydev)
 	return phy_write(phydev, TSTCNTL, TSTCNTL_TEST_MODE);
 }
 
-static void meson_gxl_close_banks(struct phy_device *phydev)
-{
-	phy_write(phydev, TSTCNTL, 0);
-}
-
 static int meson_gxl_read_reg(struct phy_device *phydev,
 			      unsigned int bank, unsigned int reg)
 {
 	int ret;
 
-	ret = meson_gxl_open_banks(phydev);
-	if (ret)
-		goto out;
-
 	ret = phy_write(phydev, TSTCNTL, TSTCNTL_READ |
 			FIELD_PREP(TSTCNTL_REG_BANK_SEL, bank) |
 			TSTCNTL_TEST_MODE |
 			FIELD_PREP(TSTCNTL_READ_ADDRESS, reg));
 	if (ret)
-		goto out;
+		return ret;
 
-	ret = phy_read(phydev, TSTREAD1);
-out:
-	/* Close the bank access on our way out */
-	meson_gxl_close_banks(phydev);
-	return ret;
+	return phy_read(phydev, TSTREAD1);
 }
 
 static int meson_gxl_write_reg(struct phy_device *phydev,
@@ -102,29 +89,28 @@ static int meson_gxl_write_reg(struct phy_device *phydev,
 {
 	int ret;
 
-	ret = meson_gxl_open_banks(phydev);
-	if (ret)
-		goto out;
-
 	ret = phy_write(phydev, TSTWRITE, value);
 	if (ret)
-		goto out;
+		return ret;
 
-	ret = phy_write(phydev, TSTCNTL, TSTCNTL_WRITE |
-			FIELD_PREP(TSTCNTL_REG_BANK_SEL, bank) |
-			TSTCNTL_TEST_MODE |
-			FIELD_PREP(TSTCNTL_WRITE_ADDRESS, reg));
+	return phy_write(phydev, TSTCNTL, TSTCNTL_WRITE |
+			 FIELD_PREP(TSTCNTL_REG_BANK_SEL, bank) |
+			 TSTCNTL_TEST_MODE |
+			 FIELD_PREP(TSTCNTL_WRITE_ADDRESS, reg));
 
-out:
-	/* Close the bank access on our way out */
-	meson_gxl_close_banks(phydev);
-	return ret;
 }
 
 static int meson_gxl_config_init(struct phy_device *phydev)
 {
 	int ret;
 
+	phy_set_bits(phydev, 0x1b, BIT(12));
+	phy_write(phydev, 0x11, 0x0080);
+
+	meson_gxl_open_banks(phydev);
+
+	ret = meson_gxl_write_reg(phydev, BANK_ANALOG_DSP, 0x17, 0x8e0d);
+
 	/* Enable fractional PLL */
 	ret = meson_gxl_write_reg(phydev, BANK_BIST, FR_PLL_CONTROL, 0x5);
 	if (ret)
@@ -140,6 +126,10 @@ static int meson_gxl_config_init(struct phy_device *phydev)
 	if (ret)
 		return ret;
 
+	ret = meson_gxl_write_reg(phydev, BANK_ANALOG_DSP, 0x18, 0x000c);
+	ret = meson_gxl_write_reg(phydev, BANK_ANALOG_DSP, 0x17, 0x1a0c);
+	ret = meson_gxl_write_reg(phydev, BANK_ANALOG_DSP, 0x1a, 0x6400);
+
 	return 0;
 }
 
@@ -186,7 +176,7 @@ static int meson_gxl_read_status(struct phy_device *phydev)
 		if (!(wol & LPI_STATUS_RSV12) ||
 		    ((exp & EXPANSION_NWAY) && !(lpa & LPA_LPACK))) {
 			/* Looks like aneg failed after all */
-			phydev_dbg(phydev, "LPA corruption - aneg restart\n");
+			phydev_warn(phydev, "LPA corruption - aneg restart\n");
 			return genphy_restart_aneg(phydev);
 		}
 	}
@@ -243,11 +233,23 @@ static irqreturn_t meson_gxl_handle_interrupt(struct phy_device *phydev)
 	    irq_status == INTSRC_ENERGY_DETECT)
 		return IRQ_HANDLED;
 
-	phy_trigger_machine(phydev);
+	/* Give PHY some time before MAC starts sending data. This works
+	 * around an issue where network doesn't come up properly.
+	 */
+	if (irq_status & INTSRC_ANEG_COMPLETE)
+		phy_queue_state_machine(phydev, msecs_to_jiffies(100));
+	else
+		phy_trigger_machine(phydev);
 
 	return IRQ_HANDLED;
 }
 
+static void meson_gxl_link_change_notify(struct phy_device *phydev)
+{
+	if (phydev->state == PHY_RUNNING && phydev->speed == SPEED_100)
+		meson_gxl_write_reg(phydev, BANK_ANALOG_DSP, 0x14, 0xa900);
+}
+
 static struct phy_driver meson_gxl_phy[] = {
 	{
 		PHY_ID_MATCH_EXACT(0x01814400),
@@ -259,6 +261,7 @@ static struct phy_driver meson_gxl_phy[] = {
 		.read_status	= meson_gxl_read_status,
 		.config_intr	= meson_gxl_config_intr,
 		.handle_interrupt = meson_gxl_handle_interrupt,
+		.link_change_notify = meson_gxl_link_change_notify,
 		.suspend        = genphy_suspend,
 		.resume         = genphy_resume,
 	}, {
-- 
2.35.1



  reply	other threads:[~2022-03-06 12:56 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-02 20:18 net: stmmac: dwmac-meson8b: interface sometimes does not come up at boot Erico Nunes
2022-02-02 20:18 ` Erico Nunes
2022-02-03 13:53 ` Vyacheslav
2022-02-03 13:53   ` Vyacheslav
2022-02-07 10:41 ` Jerome Brunet
2022-02-07 10:41   ` Jerome Brunet
2022-02-07 10:41   ` Jerome Brunet
2022-02-20 16:51   ` Erico Nunes
2022-02-20 16:51     ` Erico Nunes
2022-02-20 16:51     ` Erico Nunes
2022-02-22  2:30     ` Samuel Holland
2022-02-22  2:30       ` Samuel Holland
2022-02-22  2:30       ` Samuel Holland
2022-02-26 13:53     ` Heiner Kallweit
2022-02-26 13:53       ` Heiner Kallweit
2022-02-26 13:53       ` Heiner Kallweit
2022-03-02 10:33       ` Erico Nunes
2022-03-02 10:33         ` Erico Nunes
2022-03-02 10:33         ` Erico Nunes
2022-03-02 11:01         ` Heiner Kallweit
2022-03-02 11:01           ` Heiner Kallweit
2022-03-02 11:01           ` Heiner Kallweit
2022-03-02 13:39           ` Jerome Brunet
2022-03-02 13:39             ` Jerome Brunet
2022-03-02 13:39             ` Jerome Brunet
2022-03-02 16:34             ` Heiner Kallweit
2022-03-02 16:34               ` Heiner Kallweit
2022-03-02 16:34               ` Heiner Kallweit
2022-03-06  9:40               ` Erico Nunes
2022-03-06  9:40                 ` Erico Nunes
2022-03-06  9:40                 ` Erico Nunes
2022-03-06 12:56                 ` Heiner Kallweit [this message]
2022-03-06 12:56                   ` Heiner Kallweit
2022-03-06 12:56                   ` Heiner Kallweit
2022-03-09 14:45                   ` Erico Nunes
2022-03-09 14:45                     ` Erico Nunes
2022-03-09 14:45                     ` Erico Nunes
2022-03-09 14:57                     ` Jerome Brunet
2022-03-09 14:57                       ` Jerome Brunet
2022-03-09 14:57                       ` Jerome Brunet
2022-03-09 20:42                       ` Heiner Kallweit
2022-03-09 20:42                         ` Heiner Kallweit
2022-03-09 20:42                         ` Heiner Kallweit
     [not found]                         ` <CACdvmAhcyNXViJgk6o6oAoYvAjAg-NFD74Eym_nGHJx3YAqjzw@mail.gmail.com>
2022-06-13  9:10                           ` Jerome Brunet
2022-06-13  9:10                             ` Jerome Brunet
2022-06-13  9:10                             ` Jerome Brunet
2022-07-15  5:35                             ` Anand Moon
2022-07-15  5:35                               ` Anand Moon
2022-07-15  5:35                               ` Anand Moon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=435b2a9d-c3c6-a162-331f-9f47f69be5ac@gmail.com \
    --to=hkallweit1@gmail.com \
    --cc=alexandre.torgue@foss.st.com \
    --cc=jbrunet@baylibre.com \
    --cc=joabreu@synopsys.com \
    --cc=khilman@baylibre.com \
    --cc=linux-amlogic@lists.infradead.org \
    --cc=linux-rockchip@lists.infradead.org \
    --cc=linux-sunxi@lists.linux.dev \
    --cc=martin.blumenstingl@googlemail.com \
    --cc=narmstrong@baylibre.com \
    --cc=netdev@vger.kernel.org \
    --cc=nunes.erico@gmail.com \
    --cc=peppe.cavallaro@st.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.