Experimental fix for MSI-X issue on r8169

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Experimental fix for MSI-X issue on r8169
@ 2018-08-19 20:34 Heiner Kallweit
  2018-08-20  3:47 ` Jian-Hong Pan
  0 siblings, 1 reply; 10+ messages in thread
From: Heiner Kallweit @ 2018-08-19 20:34 UTC (permalink / raw)
  To: Steve Dodd, Lou Reed, Jian-Hong Pan; +Cc: netdev@vger.kernel.org

The three of you reported an MSI-X-related error when the system
resumes from suspend. This has been fixed for now by disabling MSI-X
on certain chip versions. However more versions may be affected.

I checked with Realtek and they confirmed that on certain chip
versions a MSIX-related value in PCI config space is reset when
resuming from S3.

I would appreciate if you could test the following experimental patch
and whether warning "MSIX address lost, re-configuring" appears in
your dmesg output after resume from suspend.

Thanks a lot for your efforts.

Heiner

---
 drivers/net/ethernet/realtek/r8169.c | 27 ++++++++++++++++++++++-----
 1 file changed, 22 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 0d9c38318..56b4bdff9 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -690,6 +690,8 @@ struct rtl8169_private {
 	struct rtl8169_counters *counters;
 	struct rtl8169_tc_offsets tc_offset;
 	u32 saved_wolopts;
+	u32 saved_msix_addr_lo;
+	u32 saved_msix_addr_hi;
 
 	struct rtl_fw {
 		const struct firmware *fw;
@@ -6876,6 +6878,19 @@ static int rtl8169_resume(struct device *device)
 {
 	struct pci_dev *pdev = to_pci_dev(device);
 	struct net_device *dev = pci_get_drvdata(pdev);
+	struct rtl8169_private *tp = netdev_priv(dev);
+	u32 val;
+
+	/* Some chip versions loose these values when resuming */
+	if (pdev->msix_enabled) {
+		pci_read_config_dword(pdev, PCI_BASE_ADDRESS_4, &val);
+		if (!val)
+			dev_warn(device, "MSIX address lost, re-configuring\n");
+		pci_write_config_dword(pdev, PCI_BASE_ADDRESS_4,
+				       tp->saved_msix_addr_lo);
+		pci_write_config_dword(pdev, PCI_BASE_ADDRESS_5,
+				       tp->saved_msix_addr_hi);
+	}
 
 	if (netif_running(dev))
 		__rtl8169_resume(dev);
@@ -7076,11 +7091,6 @@ static int rtl_alloc_irq(struct rtl8169_private *tp)
 		RTL_W8(tp, Config2, RTL_R8(tp, Config2) & ~MSIEnable);
 		RTL_W8(tp, Cfg9346, Cfg9346_Lock);
 		flags = PCI_IRQ_LEGACY;
-	} else if (tp->mac_version == RTL_GIGA_MAC_VER_40) {
-		/* This version was reported to have issues with resume
-		 * from suspend when using MSI-X
-		 */
-		flags = PCI_IRQ_LEGACY | PCI_IRQ_MSI;
 	} else {
 		flags = PCI_IRQ_ALL_TYPES;
 	}
@@ -7355,6 +7365,13 @@ static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 		return rc;
 	}
 
+	if (pdev->msix_enabled) {
+		pci_read_config_dword(pdev, PCI_BASE_ADDRESS_4,
+				      &tp->saved_msix_addr_lo);
+		pci_read_config_dword(pdev, PCI_BASE_ADDRESS_5,
+				      &tp->saved_msix_addr_hi);
+	}
+
 	tp->saved_wolopts = __rtl8169_get_wol(tp);
 
 	mutex_init(&tp->wk.mutex);
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: Experimental fix for MSI-X issue on r8169
  2018-08-19 20:34 Experimental fix for MSI-X issue on r8169 Heiner Kallweit
@ 2018-08-20  3:47 ` Jian-Hong Pan
  2018-08-21 17:57   ` Steve Dodd
  2018-08-21 21:19   ` Heiner Kallweit
  0 siblings, 2 replies; 10+ messages in thread
From: Jian-Hong Pan @ 2018-08-20  3:47 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: Steve Dodd, Lou Reed, netdev@vger.kernel.org

2018-08-20 4:34 GMT+08:00 Heiner Kallweit <hkallweit1@gmail.com>:
> The three of you reported an MSI-X-related error when the system
> resumes from suspend. This has been fixed for now by disabling MSI-X
> on certain chip versions. However more versions may be affected.
>
> I checked with Realtek and they confirmed that on certain chip
> versions a MSIX-related value in PCI config space is reset when
> resuming from S3.
>
> I would appreciate if you could test the following experimental patch
> and whether warning "MSIX address lost, re-configuring" appears in
> your dmesg output after resume from suspend.
>
> Thanks a lot for your efforts.

Tested with the experiment patch on ASUS X441UAR.

This is the information before suspend:

dev@endless:~$ dmesg | grep r8169
[   10.279565] libphy: r8169: probed
[   10.279947] r8169 0000:02:00.0 eth0: RTL8106e, 0c:9d:92:32:67:b4,
XID 44900000, IRQ 127
[   10.445952] r8169 0000:02:00.0 enp2s0: renamed from eth0
[   15.676229] Generic PHY r8169-200:00: attached PHY driver [Generic
PHY] (mii_bus:phy_addr=r8169-200:00, irq=IGNORE)
[   17.455392] r8169 0000:02:00.0 enp2s0: Link is Up - 100Mbps/Full -
flow control off

dev@endless:~$ ip addr show enp2s0
4: enp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast
state UP group default qlen 1000
    link/ether 0c:9d:92:32:67:b4 brd ff:ff:ff:ff:ff:ff
    inet 10.100.13.152/24 brd 10.100.13.255 scope global noprefixroute
dynamic enp2s0
       valid_lft 86347sec preferred_lft 86347sec
    inet6 fe80::2873:a2a9:6ca1:c79d/64 scope link noprefixroute
       valid_lft forever preferred_lft forever

This is the information after resume:

dev@endless:~$ dmesg | grep r8169
[   10.279565] libphy: r8169: probed
[   10.279947] r8169 0000:02:00.0 eth0: RTL8106e, 0c:9d:92:32:67:b4,
XID 44900000, IRQ 127
[   10.445952] r8169 0000:02:00.0 enp2s0: renamed from eth0
[   15.676229] Generic PHY r8169-200:00: attached PHY driver [Generic
PHY] (mii_bus:phy_addr=r8169-200:00, irq=IGNORE)
[   17.455392] r8169 0000:02:00.0 enp2s0: Link is Up - 100Mbps/Full -
flow control off
[   95.594265] r8169 0000:02:00.0 enp2s0: Link is Down
[   96.242074] Generic PHY r8169-200:00: attached PHY driver [Generic
PHY] (mii_bus:phy_addr=r8169-200:00, irq=IGNORE)

dev@endless:~$ ip addr show enp2s0
4: enp2s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc
pfifo_fast state DOWN group default qlen 1000
    link/ether 0c:9d:92:32:67:b4 brd ff:ff:ff:ff:ff:ff

There is no "MSIX address lost, re-configuring" in dmesg.
The ethernet interface is still down after resume.

This is the ethernet controller in detail:

02:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd.
RTL8101/2/6E PCI Express Fast/Gigabit Ethernet controller [10ec:8136]
(rev 07)
Subsystem: ASUSTeK Computer Inc. RTL810xE PCI Express Fast Ethernet
controller [1043:200f]
Flags: bus master, fast devsel, latency 0, IRQ 16
I/O ports at e000 [size=256]
Memory at ef100000 (64-bit, non-prefetchable) [size=4K]
Memory at e0000000 (64-bit, prefetchable) [size=16K]
Capabilities: [40] Power Management version 3
Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [70] Express Endpoint, MSI 01
Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
Capabilities: [d0] Vital Product Data
Capabilities: [100] Advanced Error Reporting
Capabilities: [140] Virtual Channel
Capabilities: [160] Device Serial Number 01-00-00-00-36-4c-e0-00
Capabilities: [170] Latency Tolerance Reporting
Kernel driver in use: r8169
Kernel modules: r8169

Regards,
Jian-Hong Pan

>
> ---
>  drivers/net/ethernet/realtek/r8169.c | 27 ++++++++++++++++++++++-----
>  1 file changed, 22 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
> index 0d9c38318..56b4bdff9 100644
> --- a/drivers/net/ethernet/realtek/r8169.c
> +++ b/drivers/net/ethernet/realtek/r8169.c
> @@ -690,6 +690,8 @@ struct rtl8169_private {
>         struct rtl8169_counters *counters;
>         struct rtl8169_tc_offsets tc_offset;
>         u32 saved_wolopts;
> +       u32 saved_msix_addr_lo;
> +       u32 saved_msix_addr_hi;
>
>         struct rtl_fw {
>                 const struct firmware *fw;
> @@ -6876,6 +6878,19 @@ static int rtl8169_resume(struct device *device)
>  {
>         struct pci_dev *pdev = to_pci_dev(device);
>         struct net_device *dev = pci_get_drvdata(pdev);
> +       struct rtl8169_private *tp = netdev_priv(dev);
> +       u32 val;
> +
> +       /* Some chip versions loose these values when resuming */
> +       if (pdev->msix_enabled) {
> +               pci_read_config_dword(pdev, PCI_BASE_ADDRESS_4, &val);
> +               if (!val)
> +                       dev_warn(device, "MSIX address lost, re-configuring\n");
> +               pci_write_config_dword(pdev, PCI_BASE_ADDRESS_4,
> +                                      tp->saved_msix_addr_lo);
> +               pci_write_config_dword(pdev, PCI_BASE_ADDRESS_5,
> +                                      tp->saved_msix_addr_hi);
> +       }
>
>         if (netif_running(dev))
>                 __rtl8169_resume(dev);
> @@ -7076,11 +7091,6 @@ static int rtl_alloc_irq(struct rtl8169_private *tp)
>                 RTL_W8(tp, Config2, RTL_R8(tp, Config2) & ~MSIEnable);
>                 RTL_W8(tp, Cfg9346, Cfg9346_Lock);
>                 flags = PCI_IRQ_LEGACY;
> -       } else if (tp->mac_version == RTL_GIGA_MAC_VER_40) {
> -               /* This version was reported to have issues with resume
> -                * from suspend when using MSI-X
> -                */
> -               flags = PCI_IRQ_LEGACY | PCI_IRQ_MSI;
>         } else {
>                 flags = PCI_IRQ_ALL_TYPES;
>         }
> @@ -7355,6 +7365,13 @@ static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
>                 return rc;
>         }
>
> +       if (pdev->msix_enabled) {
> +               pci_read_config_dword(pdev, PCI_BASE_ADDRESS_4,
> +                                     &tp->saved_msix_addr_lo);
> +               pci_read_config_dword(pdev, PCI_BASE_ADDRESS_5,
> +                                     &tp->saved_msix_addr_hi);
> +       }
> +
>         tp->saved_wolopts = __rtl8169_get_wol(tp);
>
>         mutex_init(&tp->wk.mutex);
> --
> 2.18.0
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Experimental fix for MSI-X issue on r8169
  2018-08-20  3:47 ` Jian-Hong Pan
@ 2018-08-21 17:57   ` Steve Dodd
  2018-08-21 21:19   ` Heiner Kallweit
  1 sibling, 0 replies; 10+ messages in thread
From: Steve Dodd @ 2018-08-21 17:57 UTC (permalink / raw)
  To: Jian-Hong Pan; +Cc: Heiner Kallweit, Lou Reed, netdev@vger.kernel.org

On 20 August 2018 at 04:47, Jian-Hong Pan <jian-hong@endlessm.com> wrote:

> There is no "MSIX address lost, re-configuring" in dmesg.
> The ethernet interface is still down after resume.

Sorry, only just seen this thread. I can confirm Jian-Jong's report --
this patch doesn't help (applied to 4.18.3); no message is output.
Thanks for the investigatory effort though!

Steve

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Experimental fix for MSI-X issue on r8169
  2018-08-20  3:47 ` Jian-Hong Pan
  2018-08-21 17:57   ` Steve Dodd
@ 2018-08-21 21:19   ` Heiner Kallweit
  2018-08-21 21:48     ` Steve Dodd
                       ` (2 more replies)
  1 sibling, 3 replies; 10+ messages in thread
From: Heiner Kallweit @ 2018-08-21 21:19 UTC (permalink / raw)
  To: Jian-Hong Pan; +Cc: Steve Dodd, Lou Reed, netdev@vger.kernel.org

On 20.08.2018 05:47, Jian-Hong Pan wrote:
> 2018-08-20 4:34 GMT+08:00 Heiner Kallweit <hkallweit1@gmail.com>:
>> The three of you reported an MSI-X-related error when the system
>> resumes from suspend. This has been fixed for now by disabling MSI-X
>> on certain chip versions. However more versions may be affected.
>>
>> I checked with Realtek and they confirmed that on certain chip
>> versions a MSIX-related value in PCI config space is reset when
>> resuming from S3.
>>
>> I would appreciate if you could test the following experimental patch
>> and whether warning "MSIX address lost, re-configuring" appears in
>> your dmesg output after resume from suspend.
>>
>> Thanks a lot for your efforts.
> 
> Tested with the experiment patch on ASUS X441UAR.
> 
> This is the information before suspend:
> 
> dev@endless:~$ dmesg | grep r8169
> [   10.279565] libphy: r8169: probed
> [   10.279947] r8169 0000:02:00.0 eth0: RTL8106e, 0c:9d:92:32:67:b4,
> XID 44900000, IRQ 127
> [   10.445952] r8169 0000:02:00.0 enp2s0: renamed from eth0
> [   15.676229] Generic PHY r8169-200:00: attached PHY driver [Generic
> PHY] (mii_bus:phy_addr=r8169-200:00, irq=IGNORE)
> [   17.455392] r8169 0000:02:00.0 enp2s0: Link is Up - 100Mbps/Full -
> flow control off
> 
> dev@endless:~$ ip addr show enp2s0
> 4: enp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast
> state UP group default qlen 1000
>     link/ether 0c:9d:92:32:67:b4 brd ff:ff:ff:ff:ff:ff
>     inet 10.100.13.152/24 brd 10.100.13.255 scope global noprefixroute
> dynamic enp2s0
>        valid_lft 86347sec preferred_lft 86347sec
>     inet6 fe80::2873:a2a9:6ca1:c79d/64 scope link noprefixroute
>        valid_lft forever preferred_lft forever
> 
> This is the information after resume:
> 
> dev@endless:~$ dmesg | grep r8169
> [   10.279565] libphy: r8169: probed
> [   10.279947] r8169 0000:02:00.0 eth0: RTL8106e, 0c:9d:92:32:67:b4,
> XID 44900000, IRQ 127
> [   10.445952] r8169 0000:02:00.0 enp2s0: renamed from eth0
> [   15.676229] Generic PHY r8169-200:00: attached PHY driver [Generic
> PHY] (mii_bus:phy_addr=r8169-200:00, irq=IGNORE)
> [   17.455392] r8169 0000:02:00.0 enp2s0: Link is Up - 100Mbps/Full -
> flow control off
> [   95.594265] r8169 0000:02:00.0 enp2s0: Link is Down
> [   96.242074] Generic PHY r8169-200:00: attached PHY driver [Generic
> PHY] (mii_bus:phy_addr=r8169-200:00, irq=IGNORE)
> 
> dev@endless:~$ ip addr show enp2s0
> 4: enp2s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc
> pfifo_fast state DOWN group default qlen 1000
>     link/ether 0c:9d:92:32:67:b4 brd ff:ff:ff:ff:ff:ff
> 
> There is no "MSIX address lost, re-configuring" in dmesg.
> The ethernet interface is still down after resume.
> 

Thanks a lot for testing. Unfortunately I don't have test hardware
affected by this MSI-X issue, so maybe you can help me to understand
the issue a little better.

Below is a patch printing the MSI-X table entry in different contexts,
it's not supposed to fix anything. Could you please let me know
what the output is on your system?
I want to get an idea whether the issue clears the complete entry or
just corrupts certain parts.

That's what I get on my system (RTL8168E-VL). In your case you'll come
only till the first suspend.

[    3.743404] r8169 0000:03:00.0: MSI-X entry: context probe: fee01004 0 40ef 1
[   29.539250] r8169 0000:03:00.0: MSI-X entry: context suspend: fee02004 0 4028 0
[   29.837457] r8169 0000:03:00.0: MSI-X entry: context resume: fee01004 0 402b 0
[   36.921370] r8169 0000:03:00.0: MSI-X entry: context suspend: fee01004 0 402b 0
[   37.239407] r8169 0000:03:00.0: MSI-X entry: context resume: fee01004 0 402b 0

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 54f53c8c0..f32645119 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -11,6 +11,7 @@
 #include <linux/module.h>
 #include <linux/moduleparam.h>
 #include <linux/pci.h>
+#include <linux/msi.h>
 #include <linux/netdevice.h>
 #include <linux/etherdevice.h>
 #include <linux/delay.h>
@@ -6822,6 +6823,20 @@ rtl8169_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats)
 	pm_runtime_put_noidle(&pdev->dev);
 }
 
+static void rtl_print_msix_entry(struct rtl8169_private *tp, const char *context)
+{
+	struct msi_desc *desc = first_pci_msi_entry(tp->pci_dev);
+	u32 data[4];
+
+	data[0] = readl(desc->mask_base + PCI_MSIX_ENTRY_LOWER_ADDR);
+	data[1] = readl(desc->mask_base + PCI_MSIX_ENTRY_UPPER_ADDR);
+	data[2] = readl(desc->mask_base + PCI_MSIX_ENTRY_DATA);
+	data[3] = readl(desc->mask_base + PCI_MSIX_ENTRY_VECTOR_CTRL);
+
+	dev_info(tp_to_dev(tp), "MSI-X entry: context %s: %x %x %x %x\n",
+		 context, data[0], data[1], data[2], data[3]);
+}
+
 static void rtl8169_net_suspend(struct net_device *dev)
 {
 	struct rtl8169_private *tp = netdev_priv(dev);
@@ -6846,9 +6861,12 @@ static int rtl8169_suspend(struct device *device)
 {
 	struct pci_dev *pdev = to_pci_dev(device);
 	struct net_device *dev = pci_get_drvdata(pdev);
+	struct rtl8169_private *tp = netdev_priv(dev);
 
 	rtl8169_net_suspend(dev);
 
+	rtl_print_msix_entry(tp, "suspend");
+
 	return 0;
 }
 
@@ -6875,6 +6893,9 @@ static int rtl8169_resume(struct device *device)
 {
 	struct pci_dev *pdev = to_pci_dev(device);
 	struct net_device *dev = pci_get_drvdata(pdev);
+	struct rtl8169_private *tp = netdev_priv(dev);
+
+	rtl_print_msix_entry(tp, "resume");
 
 	if (netif_running(dev))
 		__rtl8169_resume(dev);
@@ -7075,11 +7096,6 @@ static int rtl_alloc_irq(struct rtl8169_private *tp)
 		RTL_W8(tp, Config2, RTL_R8(tp, Config2) & ~MSIEnable);
 		RTL_W8(tp, Cfg9346, Cfg9346_Lock);
 		flags = PCI_IRQ_LEGACY;
-	} else if (tp->mac_version == RTL_GIGA_MAC_VER_40) {
-		/* This version was reported to have issues with resume
-		 * from suspend when using MSI-X
-		 */
-		flags = PCI_IRQ_LEGACY | PCI_IRQ_MSI;
 	} else {
 		flags = PCI_IRQ_ALL_TYPES;
 	}
@@ -7354,6 +7370,8 @@ static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 		return rc;
 	}
 
+	rtl_print_msix_entry(tp, "probe");
+
 	tp->saved_wolopts = __rtl8169_get_wol(tp);
 
 	mutex_init(&tp->wk.mutex);
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: Experimental fix for MSI-X issue on r8169
  2018-08-21 21:19   ` Heiner Kallweit
@ 2018-08-21 21:48     ` Steve Dodd
  2018-08-21 23:32     ` David Miller
  2018-08-22  3:01     ` Jian-Hong Pan
  2 siblings, 0 replies; 10+ messages in thread
From: Steve Dodd @ 2018-08-21 21:48 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: Jian-Hong Pan, Lou Reed, netdev@vger.kernel.org

On 21 August 2018 at 22:19, Heiner Kallweit <hkallweit1@gmail.com> wrote:

> Below is a patch printing the MSI-X table entry in different contexts,
> it's not supposed to fix anything. Could you please let me know
> what the output is on your system?
> I want to get an idea whether the issue clears the complete entry or
> just corrupts certain parts.

Here we go..

[    2.865434] r8169 0000:01:00.0: MSI-X entry: context probe: fee01004 0 40ef 1
[  269.747608] r8169 0000:01:00.0: MSI-X entry: context probe: fee01004 0 40ef 1
[  295.454099] r8169 0000:01:00.0: MSI-X entry: context suspend:
fee08004 0 4028 0

S.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Experimental fix for MSI-X issue on r8169
  2018-08-21 21:19   ` Heiner Kallweit
  2018-08-21 21:48     ` Steve Dodd
@ 2018-08-21 23:32     ` David Miller
  2018-08-22  3:01     ` Jian-Hong Pan
  2 siblings, 0 replies; 10+ messages in thread
From: David Miller @ 2018-08-21 23:32 UTC (permalink / raw)
  To: hkallweit1; +Cc: jian-hong, steved424, gogen, netdev

From: Heiner Kallweit <hkallweit1@gmail.com>
Date: Tue, 21 Aug 2018 23:19:04 +0200

> That's what I get on my system (RTL8168E-VL). In your case you'll come
> only till the first suspend.
> 
> [    3.743404] r8169 0000:03:00.0: MSI-X entry: context probe: fee01004 0 40ef 1

On probe, MSI-X is masked (ie. disabled) and is configured to use:

	address: 0x00000000fee01004
	data:    0x00000000000040ef

> [   29.539250] r8169 0000:03:00.0: MSI-X entry: context suspend: fee02004 0 4028 0

At suspend time, MSI-X is unmasked (ie. enabled) and is configured to use:

	address: 0x00000000fee01004
	data:    0x0000000000004028

> [   29.837457] r8169 0000:03:00.0: MSI-X entry: context resume: fee01004 0 402b 0

At reume time, MSI-X is unmasked (ie. enabled) and is configured to use:

	address: 0x00000000fee01004
	data:    0x000000000000402b

> [   36.921370] r8169 0000:03:00.0: MSI-X entry: context suspend: fee01004 0 402b 0

Second suspend:

	address: 0x00000000fee01004
	data:    0x000000000000402b

> [   37.239407] r8169 0000:03:00.0: MSI-X entry: context resume: fee01004 0 402b 0

Second resume:

	address: 0x00000000fee01004
	data:    0x000000000000402b

And this all looks normal.  The data field is changing when you first up the device
and interrupts are enabled.  This is where the request_irq happens, the MSI
vector is allocated, and that vector number is written to the data field of the
MSI-X entry.

It looks like this (re-)allocation of MSI vectors happens on every resume as well.

And that's why the data field changes each resume.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Experimental fix for MSI-X issue on r8169
  2018-08-21 21:19   ` Heiner Kallweit
  2018-08-21 21:48     ` Steve Dodd
  2018-08-21 23:32     ` David Miller
@ 2018-08-22  3:01     ` Jian-Hong Pan
  2018-08-22  4:24       ` David Miller
  2 siblings, 1 reply; 10+ messages in thread
From: Jian-Hong Pan @ 2018-08-22  3:01 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Steve Dodd, Lou Reed, netdev@vger.kernel.org,
	Linux Upstreaming Team

2018-08-22 5:19 GMT+08:00 Heiner Kallweit <hkallweit1@gmail.com>:
> On 20.08.2018 05:47, Jian-Hong Pan wrote:
>> 2018-08-20 4:34 GMT+08:00 Heiner Kallweit <hkallweit1@gmail.com>:
>>> The three of you reported an MSI-X-related error when the system
>>> resumes from suspend. This has been fixed for now by disabling MSI-X
>>> on certain chip versions. However more versions may be affected.
>>>
>>> I checked with Realtek and they confirmed that on certain chip
>>> versions a MSIX-related value in PCI config space is reset when
>>> resuming from S3.
>>>
>>> I would appreciate if you could test the following experimental patch
>>> and whether warning "MSIX address lost, re-configuring" appears in
>>> your dmesg output after resume from suspend.
>>>
>>> Thanks a lot for your efforts.
>>
>> Tested with the experiment patch on ASUS X441UAR.
>>
>> This is the information before suspend:
>>
>> dev@endless:~$ dmesg | grep r8169
>> [   10.279565] libphy: r8169: probed
>> [   10.279947] r8169 0000:02:00.0 eth0: RTL8106e, 0c:9d:92:32:67:b4,
>> XID 44900000, IRQ 127
>> [   10.445952] r8169 0000:02:00.0 enp2s0: renamed from eth0
>> [   15.676229] Generic PHY r8169-200:00: attached PHY driver [Generic
>> PHY] (mii_bus:phy_addr=r8169-200:00, irq=IGNORE)
>> [   17.455392] r8169 0000:02:00.0 enp2s0: Link is Up - 100Mbps/Full -
>> flow control off
>>
>> dev@endless:~$ ip addr show enp2s0
>> 4: enp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast
>> state UP group default qlen 1000
>>     link/ether 0c:9d:92:32:67:b4 brd ff:ff:ff:ff:ff:ff
>>     inet 10.100.13.152/24 brd 10.100.13.255 scope global noprefixroute
>> dynamic enp2s0
>>        valid_lft 86347sec preferred_lft 86347sec
>>     inet6 fe80::2873:a2a9:6ca1:c79d/64 scope link noprefixroute
>>        valid_lft forever preferred_lft forever
>>
>> This is the information after resume:
>>
>> dev@endless:~$ dmesg | grep r8169
>> [   10.279565] libphy: r8169: probed
>> [   10.279947] r8169 0000:02:00.0 eth0: RTL8106e, 0c:9d:92:32:67:b4,
>> XID 44900000, IRQ 127
>> [   10.445952] r8169 0000:02:00.0 enp2s0: renamed from eth0
>> [   15.676229] Generic PHY r8169-200:00: attached PHY driver [Generic
>> PHY] (mii_bus:phy_addr=r8169-200:00, irq=IGNORE)
>> [   17.455392] r8169 0000:02:00.0 enp2s0: Link is Up - 100Mbps/Full -
>> flow control off
>> [   95.594265] r8169 0000:02:00.0 enp2s0: Link is Down
>> [   96.242074] Generic PHY r8169-200:00: attached PHY driver [Generic
>> PHY] (mii_bus:phy_addr=r8169-200:00, irq=IGNORE)
>>
>> dev@endless:~$ ip addr show enp2s0
>> 4: enp2s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc
>> pfifo_fast state DOWN group default qlen 1000
>>     link/ether 0c:9d:92:32:67:b4 brd ff:ff:ff:ff:ff:ff
>>
>> There is no "MSIX address lost, re-configuring" in dmesg.
>> The ethernet interface is still down after resume.
>>
>
> Thanks a lot for testing. Unfortunately I don't have test hardware
> affected by this MSI-X issue, so maybe you can help me to understand
> the issue a little better.
>
> Below is a patch printing the MSI-X table entry in different contexts,
> it's not supposed to fix anything. Could you please let me know
> what the output is on your system?
> I want to get an idea whether the issue clears the complete entry or
> just corrupts certain parts.

Here is the test result on ASUS X441UAR with this patch:

dev@endless:~$ dmesg | grep -E "(r8169|enp2s0)"
[    8.980001] r8169 0000:02:00.0: MSI-X entry: context probe: fee01004 0 40ef 1
[    8.981594] libphy: r8169: probed
[    8.981769] r8169 0000:02:00.0 eth0: RTL8106e, 0c:9d:92:32:67:b4,
XID 44900000, IRQ 127
[    9.479848] r8169 0000:02:00.0 enp2s0: renamed from eth0
[   11.332834] IPv6: ADDRCONF(NETDEV_UP): enp2s0: link is not ready
[   11.336350] Generic PHY r8169-200:00: attached PHY driver [Generic
PHY] (mii_bus:phy_addr=r8169-200:00, irq=IGNORE)
[   11.574892] IPv6: ADDRCONF(NETDEV_UP): enp2s0: link is not ready
[   11.581816] r8169 0000:02:00.0 enp2s0: Link is Down
[   13.190535] r8169 0000:02:00.0 enp2s0: Link is Up - 100Mbps/Full -
flow control off
[   13.190548] IPv6: ADDRCONF(NETDEV_CHANGE): enp2s0: link becomes ready
[   56.227974] r8169 0000:02:00.0: MSI-X entry: context suspend:
fee04004 0 4024 0
[   56.462464] r8169 0000:02:00.0: MSI-X entry: context resume:
ffffffff ffffffff ffffffff ffffffff
[   58.406713] r8169 0000:02:00.0 enp2s0: Link is Up - 100Mbps/Full -
flow control off
[   58.766740] IPv6: ADDRCONF(NETDEV_UP): enp2s0: link is not ready
[   58.767331] Generic PHY r8169-200:00: attached PHY driver [Generic
PHY] (mii_bus:phy_addr=r8169-200:00, irq=IGNORE)
[   59.003660] IPv6: ADDRCONF(NETDEV_UP): enp2s0: link is not ready

uh!  The MSI-X entry seems missed after resume on this laptop!

Ethernet interface status after resume:
dev@endless:~$ ip addr show enp2s0
3: enp2s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc
pfifo_fast state DOWN group default qlen 1000
    link/ether 0c:9d:92:32:67:b4 brd ff:ff:ff:ff:ff:ff

Regards,
Jian-Hong Pan

> That's what I get on my system (RTL8168E-VL). In your case you'll come
> only till the first suspend.
>
> [    3.743404] r8169 0000:03:00.0: MSI-X entry: context probe: fee01004 0 40ef 1
> [   29.539250] r8169 0000:03:00.0: MSI-X entry: context suspend: fee02004 0 4028 0
> [   29.837457] r8169 0000:03:00.0: MSI-X entry: context resume: fee01004 0 402b 0
> [   36.921370] r8169 0000:03:00.0: MSI-X entry: context suspend: fee01004 0 402b 0
> [   37.239407] r8169 0000:03:00.0: MSI-X entry: context resume: fee01004 0 402b 0
>
> diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
> index 54f53c8c0..f32645119 100644
> --- a/drivers/net/ethernet/realtek/r8169.c
> +++ b/drivers/net/ethernet/realtek/r8169.c
> @@ -11,6 +11,7 @@
>  #include <linux/module.h>
>  #include <linux/moduleparam.h>
>  #include <linux/pci.h>
> +#include <linux/msi.h>
>  #include <linux/netdevice.h>
>  #include <linux/etherdevice.h>
>  #include <linux/delay.h>
> @@ -6822,6 +6823,20 @@ rtl8169_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats)
>         pm_runtime_put_noidle(&pdev->dev);
>  }
>
> +static void rtl_print_msix_entry(struct rtl8169_private *tp, const char *context)
> +{
> +       struct msi_desc *desc = first_pci_msi_entry(tp->pci_dev);
> +       u32 data[4];
> +
> +       data[0] = readl(desc->mask_base + PCI_MSIX_ENTRY_LOWER_ADDR);
> +       data[1] = readl(desc->mask_base + PCI_MSIX_ENTRY_UPPER_ADDR);
> +       data[2] = readl(desc->mask_base + PCI_MSIX_ENTRY_DATA);
> +       data[3] = readl(desc->mask_base + PCI_MSIX_ENTRY_VECTOR_CTRL);
> +
> +       dev_info(tp_to_dev(tp), "MSI-X entry: context %s: %x %x %x %x\n",
> +                context, data[0], data[1], data[2], data[3]);
> +}
> +
>  static void rtl8169_net_suspend(struct net_device *dev)
>  {
>         struct rtl8169_private *tp = netdev_priv(dev);
> @@ -6846,9 +6861,12 @@ static int rtl8169_suspend(struct device *device)
>  {
>         struct pci_dev *pdev = to_pci_dev(device);
>         struct net_device *dev = pci_get_drvdata(pdev);
> +       struct rtl8169_private *tp = netdev_priv(dev);
>
>         rtl8169_net_suspend(dev);
>
> +       rtl_print_msix_entry(tp, "suspend");
> +
>         return 0;
>  }
>
> @@ -6875,6 +6893,9 @@ static int rtl8169_resume(struct device *device)
>  {
>         struct pci_dev *pdev = to_pci_dev(device);
>         struct net_device *dev = pci_get_drvdata(pdev);
> +       struct rtl8169_private *tp = netdev_priv(dev);
> +
> +       rtl_print_msix_entry(tp, "resume");
>
>         if (netif_running(dev))
>                 __rtl8169_resume(dev);
> @@ -7075,11 +7096,6 @@ static int rtl_alloc_irq(struct rtl8169_private *tp)
>                 RTL_W8(tp, Config2, RTL_R8(tp, Config2) & ~MSIEnable);
>                 RTL_W8(tp, Cfg9346, Cfg9346_Lock);
>                 flags = PCI_IRQ_LEGACY;
> -       } else if (tp->mac_version == RTL_GIGA_MAC_VER_40) {
> -               /* This version was reported to have issues with resume
> -                * from suspend when using MSI-X
> -                */
> -               flags = PCI_IRQ_LEGACY | PCI_IRQ_MSI;
>         } else {
>                 flags = PCI_IRQ_ALL_TYPES;
>         }
> @@ -7354,6 +7370,8 @@ static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
>                 return rc;
>         }
>
> +       rtl_print_msix_entry(tp, "probe");
> +
>         tp->saved_wolopts = __rtl8169_get_wol(tp);
>
>         mutex_init(&tp->wk.mutex);
> --
> 2.18.0
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Experimental fix for MSI-X issue on r8169
  2018-08-22  3:01     ` Jian-Hong Pan
@ 2018-08-22  4:24       ` David Miller
  2018-08-22  5:56         ` Heiner Kallweit
  2018-08-22  6:41         ` Steve Dodd
  0 siblings, 2 replies; 10+ messages in thread
From: David Miller @ 2018-08-22  4:24 UTC (permalink / raw)
  To: jian-hong; +Cc: hkallweit1, steved424, gogen, netdev, linux

From: Jian-Hong Pan <jian-hong@endlessm.com>
Date: Wed, 22 Aug 2018 11:01:02 +0800

 ...
> [   56.462464] r8169 0000:02:00.0: MSI-X entry: context resume:
> ffffffff ffffffff ffffffff ffffffff
 ...
> uh!  The MSI-X entry seems missed after resume on this laptop!

Yeah, having all of the MSI-X entry values be all-1's is not a good
sign.

But this is quite a curious set of debugging traces we now have.

In the working case, the vector number in the DATA field seems
to change, which suggests that something is assigning new values
and programming them into these fields at resume time.

But in the failing cases, all of the values are garbage.

I would expect, given what the working trace looks like, that in the
failing case some values would be wrong and the DATA value would have
some new yet valid value.  But that is not what we are seeing here.

Weird.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Experimental fix for MSI-X issue on r8169
  2018-08-22  4:24       ` David Miller
@ 2018-08-22  5:56         ` Heiner Kallweit
  2018-08-22  6:41         ` Steve Dodd
  1 sibling, 0 replies; 10+ messages in thread
From: Heiner Kallweit @ 2018-08-22  5:56 UTC (permalink / raw)
  To: David Miller, jian-hong; +Cc: steved424, gogen, netdev, linux

On 22.08.2018 06:24, David Miller wrote:
> From: Jian-Hong Pan <jian-hong@endlessm.com>
> Date: Wed, 22 Aug 2018 11:01:02 +0800
> 
>  ...
>> [   56.462464] r8169 0000:02:00.0: MSI-X entry: context resume:
>> ffffffff ffffffff ffffffff ffffffff
>  ...
>> uh!  The MSI-X entry seems missed after resume on this laptop!
> 
> Yeah, having all of the MSI-X entry values be all-1's is not a good
> sign.
> 
all-1's seems to indicate that PCI access to the MSI-X table
BAR/region fails. Because falling back to MSI helps, accessing
the other BAR/region with the memory-mapped registers works.
I'll check with Realtek whether this symptom rings any bell.


> But this is quite a curious set of debugging traces we now have.
> 
> In the working case, the vector number in the DATA field seems
> to change, which suggests that something is assigning new values
> and programming them into these fields at resume time.
> 
> But in the failing cases, all of the values are garbage.
> 
> I would expect, given what the working trace looks like, that in the
> failing case some values would be wrong and the DATA value would have
> some new yet valid value.  But that is not what we are seeing here.
> 
> Weird.
> 
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Experimental fix for MSI-X issue on r8169
  2018-08-22  4:24       ` David Miller
  2018-08-22  5:56         ` Heiner Kallweit
@ 2018-08-22  6:41         ` Steve Dodd
  1 sibling, 0 replies; 10+ messages in thread
From: Steve Dodd @ 2018-08-22  6:41 UTC (permalink / raw)
  To: David Miller; +Cc: Jian-Hong Pan, Heiner Kallweit, Lou Reed, netdev, linux

On 22 August 2018 at 05:24, David Miller <davem@davemloft.net> wrote:
> From: Jian-Hong Pan <jian-hong@endlessm.com>

>> [   56.462464] r8169 0000:02:00.0: MSI-X entry: context resume:
>> ffffffff ffffffff ffffffff ffffffff
>  ...
>> uh!  The MSI-X entry seems missed after resume on this laptop!
>
> Yeah, having all of the MSI-X entry values be all-1's is not a good
> sign.

I'm slightly confused as to why my machine doesn't even get to
printing the debugging message on resume..?

S.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2018-08-22 10:04 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-08-19 20:34 Experimental fix for MSI-X issue on r8169 Heiner Kallweit
2018-08-20  3:47 ` Jian-Hong Pan
2018-08-21 17:57   ` Steve Dodd
2018-08-21 21:19   ` Heiner Kallweit
2018-08-21 21:48     ` Steve Dodd
2018-08-21 23:32     ` David Miller
2018-08-22  3:01     ` Jian-Hong Pan
2018-08-22  4:24       ` David Miller
2018-08-22  5:56         ` Heiner Kallweit
2018-08-22  6:41         ` Steve Dodd

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).