* Experimental fix for MSI-X issue on r8169 @ 2018-08-19 20:34 Heiner Kallweit 2018-08-20 3:47 ` Jian-Hong Pan 0 siblings, 1 reply; 10+ messages in thread From: Heiner Kallweit @ 2018-08-19 20:34 UTC (permalink / raw) To: Steve Dodd, Lou Reed, Jian-Hong Pan; +Cc: netdev@vger.kernel.org The three of you reported an MSI-X-related error when the system resumes from suspend. This has been fixed for now by disabling MSI-X on certain chip versions. However more versions may be affected. I checked with Realtek and they confirmed that on certain chip versions a MSIX-related value in PCI config space is reset when resuming from S3. I would appreciate if you could test the following experimental patch and whether warning "MSIX address lost, re-configuring" appears in your dmesg output after resume from suspend. Thanks a lot for your efforts. Heiner --- drivers/net/ethernet/realtek/r8169.c | 27 ++++++++++++++++++++++----- 1 file changed, 22 insertions(+), 5 deletions(-) diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c index 0d9c38318..56b4bdff9 100644 --- a/drivers/net/ethernet/realtek/r8169.c +++ b/drivers/net/ethernet/realtek/r8169.c @@ -690,6 +690,8 @@ struct rtl8169_private { struct rtl8169_counters *counters; struct rtl8169_tc_offsets tc_offset; u32 saved_wolopts; + u32 saved_msix_addr_lo; + u32 saved_msix_addr_hi; struct rtl_fw { const struct firmware *fw; @@ -6876,6 +6878,19 @@ static int rtl8169_resume(struct device *device) { struct pci_dev *pdev = to_pci_dev(device); struct net_device *dev = pci_get_drvdata(pdev); + struct rtl8169_private *tp = netdev_priv(dev); + u32 val; + + /* Some chip versions loose these values when resuming */ + if (pdev->msix_enabled) { + pci_read_config_dword(pdev, PCI_BASE_ADDRESS_4, &val); + if (!val) + dev_warn(device, "MSIX address lost, re-configuring\n"); + pci_write_config_dword(pdev, PCI_BASE_ADDRESS_4, + tp->saved_msix_addr_lo); + pci_write_config_dword(pdev, PCI_BASE_ADDRESS_5, + tp->saved_msix_addr_hi); + } if (netif_running(dev)) __rtl8169_resume(dev); @@ -7076,11 +7091,6 @@ static int rtl_alloc_irq(struct rtl8169_private *tp) RTL_W8(tp, Config2, RTL_R8(tp, Config2) & ~MSIEnable); RTL_W8(tp, Cfg9346, Cfg9346_Lock); flags = PCI_IRQ_LEGACY; - } else if (tp->mac_version == RTL_GIGA_MAC_VER_40) { - /* This version was reported to have issues with resume - * from suspend when using MSI-X - */ - flags = PCI_IRQ_LEGACY | PCI_IRQ_MSI; } else { flags = PCI_IRQ_ALL_TYPES; } @@ -7355,6 +7365,13 @@ static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent) return rc; } + if (pdev->msix_enabled) { + pci_read_config_dword(pdev, PCI_BASE_ADDRESS_4, + &tp->saved_msix_addr_lo); + pci_read_config_dword(pdev, PCI_BASE_ADDRESS_5, + &tp->saved_msix_addr_hi); + } + tp->saved_wolopts = __rtl8169_get_wol(tp); mutex_init(&tp->wk.mutex); -- 2.18.0 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: Experimental fix for MSI-X issue on r8169 2018-08-19 20:34 Experimental fix for MSI-X issue on r8169 Heiner Kallweit @ 2018-08-20 3:47 ` Jian-Hong Pan 2018-08-21 17:57 ` Steve Dodd 2018-08-21 21:19 ` Heiner Kallweit 0 siblings, 2 replies; 10+ messages in thread From: Jian-Hong Pan @ 2018-08-20 3:47 UTC (permalink / raw) To: Heiner Kallweit; +Cc: Steve Dodd, Lou Reed, netdev@vger.kernel.org 2018-08-20 4:34 GMT+08:00 Heiner Kallweit <hkallweit1@gmail.com>: > The three of you reported an MSI-X-related error when the system > resumes from suspend. This has been fixed for now by disabling MSI-X > on certain chip versions. However more versions may be affected. > > I checked with Realtek and they confirmed that on certain chip > versions a MSIX-related value in PCI config space is reset when > resuming from S3. > > I would appreciate if you could test the following experimental patch > and whether warning "MSIX address lost, re-configuring" appears in > your dmesg output after resume from suspend. > > Thanks a lot for your efforts. Tested with the experiment patch on ASUS X441UAR. This is the information before suspend: dev@endless:~$ dmesg | grep r8169 [ 10.279565] libphy: r8169: probed [ 10.279947] r8169 0000:02:00.0 eth0: RTL8106e, 0c:9d:92:32:67:b4, XID 44900000, IRQ 127 [ 10.445952] r8169 0000:02:00.0 enp2s0: renamed from eth0 [ 15.676229] Generic PHY r8169-200:00: attached PHY driver [Generic PHY] (mii_bus:phy_addr=r8169-200:00, irq=IGNORE) [ 17.455392] r8169 0000:02:00.0 enp2s0: Link is Up - 100Mbps/Full - flow control off dev@endless:~$ ip addr show enp2s0 4: enp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000 link/ether 0c:9d:92:32:67:b4 brd ff:ff:ff:ff:ff:ff inet 10.100.13.152/24 brd 10.100.13.255 scope global noprefixroute dynamic enp2s0 valid_lft 86347sec preferred_lft 86347sec inet6 fe80::2873:a2a9:6ca1:c79d/64 scope link noprefixroute valid_lft forever preferred_lft forever This is the information after resume: dev@endless:~$ dmesg | grep r8169 [ 10.279565] libphy: r8169: probed [ 10.279947] r8169 0000:02:00.0 eth0: RTL8106e, 0c:9d:92:32:67:b4, XID 44900000, IRQ 127 [ 10.445952] r8169 0000:02:00.0 enp2s0: renamed from eth0 [ 15.676229] Generic PHY r8169-200:00: attached PHY driver [Generic PHY] (mii_bus:phy_addr=r8169-200:00, irq=IGNORE) [ 17.455392] r8169 0000:02:00.0 enp2s0: Link is Up - 100Mbps/Full - flow control off [ 95.594265] r8169 0000:02:00.0 enp2s0: Link is Down [ 96.242074] Generic PHY r8169-200:00: attached PHY driver [Generic PHY] (mii_bus:phy_addr=r8169-200:00, irq=IGNORE) dev@endless:~$ ip addr show enp2s0 4: enp2s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN group default qlen 1000 link/ether 0c:9d:92:32:67:b4 brd ff:ff:ff:ff:ff:ff There is no "MSIX address lost, re-configuring" in dmesg. The ethernet interface is still down after resume. This is the ethernet controller in detail: 02:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8101/2/6E PCI Express Fast/Gigabit Ethernet controller [10ec:8136] (rev 07) Subsystem: ASUSTeK Computer Inc. RTL810xE PCI Express Fast Ethernet controller [1043:200f] Flags: bus master, fast devsel, latency 0, IRQ 16 I/O ports at e000 [size=256] Memory at ef100000 (64-bit, non-prefetchable) [size=4K] Memory at e0000000 (64-bit, prefetchable) [size=16K] Capabilities: [40] Power Management version 3 Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+ Capabilities: [70] Express Endpoint, MSI 01 Capabilities: [b0] MSI-X: Enable+ Count=4 Masked- Capabilities: [d0] Vital Product Data Capabilities: [100] Advanced Error Reporting Capabilities: [140] Virtual Channel Capabilities: [160] Device Serial Number 01-00-00-00-36-4c-e0-00 Capabilities: [170] Latency Tolerance Reporting Kernel driver in use: r8169 Kernel modules: r8169 Regards, Jian-Hong Pan > > --- > drivers/net/ethernet/realtek/r8169.c | 27 ++++++++++++++++++++++----- > 1 file changed, 22 insertions(+), 5 deletions(-) > > diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c > index 0d9c38318..56b4bdff9 100644 > --- a/drivers/net/ethernet/realtek/r8169.c > +++ b/drivers/net/ethernet/realtek/r8169.c > @@ -690,6 +690,8 @@ struct rtl8169_private { > struct rtl8169_counters *counters; > struct rtl8169_tc_offsets tc_offset; > u32 saved_wolopts; > + u32 saved_msix_addr_lo; > + u32 saved_msix_addr_hi; > > struct rtl_fw { > const struct firmware *fw; > @@ -6876,6 +6878,19 @@ static int rtl8169_resume(struct device *device) > { > struct pci_dev *pdev = to_pci_dev(device); > struct net_device *dev = pci_get_drvdata(pdev); > + struct rtl8169_private *tp = netdev_priv(dev); > + u32 val; > + > + /* Some chip versions loose these values when resuming */ > + if (pdev->msix_enabled) { > + pci_read_config_dword(pdev, PCI_BASE_ADDRESS_4, &val); > + if (!val) > + dev_warn(device, "MSIX address lost, re-configuring\n"); > + pci_write_config_dword(pdev, PCI_BASE_ADDRESS_4, > + tp->saved_msix_addr_lo); > + pci_write_config_dword(pdev, PCI_BASE_ADDRESS_5, > + tp->saved_msix_addr_hi); > + } > > if (netif_running(dev)) > __rtl8169_resume(dev); > @@ -7076,11 +7091,6 @@ static int rtl_alloc_irq(struct rtl8169_private *tp) > RTL_W8(tp, Config2, RTL_R8(tp, Config2) & ~MSIEnable); > RTL_W8(tp, Cfg9346, Cfg9346_Lock); > flags = PCI_IRQ_LEGACY; > - } else if (tp->mac_version == RTL_GIGA_MAC_VER_40) { > - /* This version was reported to have issues with resume > - * from suspend when using MSI-X > - */ > - flags = PCI_IRQ_LEGACY | PCI_IRQ_MSI; > } else { > flags = PCI_IRQ_ALL_TYPES; > } > @@ -7355,6 +7365,13 @@ static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent) > return rc; > } > > + if (pdev->msix_enabled) { > + pci_read_config_dword(pdev, PCI_BASE_ADDRESS_4, > + &tp->saved_msix_addr_lo); > + pci_read_config_dword(pdev, PCI_BASE_ADDRESS_5, > + &tp->saved_msix_addr_hi); > + } > + > tp->saved_wolopts = __rtl8169_get_wol(tp); > > mutex_init(&tp->wk.mutex); > -- > 2.18.0 > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Experimental fix for MSI-X issue on r8169 2018-08-20 3:47 ` Jian-Hong Pan @ 2018-08-21 17:57 ` Steve Dodd 2018-08-21 21:19 ` Heiner Kallweit 1 sibling, 0 replies; 10+ messages in thread From: Steve Dodd @ 2018-08-21 17:57 UTC (permalink / raw) To: Jian-Hong Pan; +Cc: Heiner Kallweit, Lou Reed, netdev@vger.kernel.org On 20 August 2018 at 04:47, Jian-Hong Pan <jian-hong@endlessm.com> wrote: > There is no "MSIX address lost, re-configuring" in dmesg. > The ethernet interface is still down after resume. Sorry, only just seen this thread. I can confirm Jian-Jong's report -- this patch doesn't help (applied to 4.18.3); no message is output. Thanks for the investigatory effort though! Steve ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Experimental fix for MSI-X issue on r8169 2018-08-20 3:47 ` Jian-Hong Pan 2018-08-21 17:57 ` Steve Dodd @ 2018-08-21 21:19 ` Heiner Kallweit 2018-08-21 21:48 ` Steve Dodd ` (2 more replies) 1 sibling, 3 replies; 10+ messages in thread From: Heiner Kallweit @ 2018-08-21 21:19 UTC (permalink / raw) To: Jian-Hong Pan; +Cc: Steve Dodd, Lou Reed, netdev@vger.kernel.org On 20.08.2018 05:47, Jian-Hong Pan wrote: > 2018-08-20 4:34 GMT+08:00 Heiner Kallweit <hkallweit1@gmail.com>: >> The three of you reported an MSI-X-related error when the system >> resumes from suspend. This has been fixed for now by disabling MSI-X >> on certain chip versions. However more versions may be affected. >> >> I checked with Realtek and they confirmed that on certain chip >> versions a MSIX-related value in PCI config space is reset when >> resuming from S3. >> >> I would appreciate if you could test the following experimental patch >> and whether warning "MSIX address lost, re-configuring" appears in >> your dmesg output after resume from suspend. >> >> Thanks a lot for your efforts. > > Tested with the experiment patch on ASUS X441UAR. > > This is the information before suspend: > > dev@endless:~$ dmesg | grep r8169 > [ 10.279565] libphy: r8169: probed > [ 10.279947] r8169 0000:02:00.0 eth0: RTL8106e, 0c:9d:92:32:67:b4, > XID 44900000, IRQ 127 > [ 10.445952] r8169 0000:02:00.0 enp2s0: renamed from eth0 > [ 15.676229] Generic PHY r8169-200:00: attached PHY driver [Generic > PHY] (mii_bus:phy_addr=r8169-200:00, irq=IGNORE) > [ 17.455392] r8169 0000:02:00.0 enp2s0: Link is Up - 100Mbps/Full - > flow control off > > dev@endless:~$ ip addr show enp2s0 > 4: enp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast > state UP group default qlen 1000 > link/ether 0c:9d:92:32:67:b4 brd ff:ff:ff:ff:ff:ff > inet 10.100.13.152/24 brd 10.100.13.255 scope global noprefixroute > dynamic enp2s0 > valid_lft 86347sec preferred_lft 86347sec > inet6 fe80::2873:a2a9:6ca1:c79d/64 scope link noprefixroute > valid_lft forever preferred_lft forever > > This is the information after resume: > > dev@endless:~$ dmesg | grep r8169 > [ 10.279565] libphy: r8169: probed > [ 10.279947] r8169 0000:02:00.0 eth0: RTL8106e, 0c:9d:92:32:67:b4, > XID 44900000, IRQ 127 > [ 10.445952] r8169 0000:02:00.0 enp2s0: renamed from eth0 > [ 15.676229] Generic PHY r8169-200:00: attached PHY driver [Generic > PHY] (mii_bus:phy_addr=r8169-200:00, irq=IGNORE) > [ 17.455392] r8169 0000:02:00.0 enp2s0: Link is Up - 100Mbps/Full - > flow control off > [ 95.594265] r8169 0000:02:00.0 enp2s0: Link is Down > [ 96.242074] Generic PHY r8169-200:00: attached PHY driver [Generic > PHY] (mii_bus:phy_addr=r8169-200:00, irq=IGNORE) > > dev@endless:~$ ip addr show enp2s0 > 4: enp2s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc > pfifo_fast state DOWN group default qlen 1000 > link/ether 0c:9d:92:32:67:b4 brd ff:ff:ff:ff:ff:ff > > There is no "MSIX address lost, re-configuring" in dmesg. > The ethernet interface is still down after resume. > Thanks a lot for testing. Unfortunately I don't have test hardware affected by this MSI-X issue, so maybe you can help me to understand the issue a little better. Below is a patch printing the MSI-X table entry in different contexts, it's not supposed to fix anything. Could you please let me know what the output is on your system? I want to get an idea whether the issue clears the complete entry or just corrupts certain parts. That's what I get on my system (RTL8168E-VL). In your case you'll come only till the first suspend. [ 3.743404] r8169 0000:03:00.0: MSI-X entry: context probe: fee01004 0 40ef 1 [ 29.539250] r8169 0000:03:00.0: MSI-X entry: context suspend: fee02004 0 4028 0 [ 29.837457] r8169 0000:03:00.0: MSI-X entry: context resume: fee01004 0 402b 0 [ 36.921370] r8169 0000:03:00.0: MSI-X entry: context suspend: fee01004 0 402b 0 [ 37.239407] r8169 0000:03:00.0: MSI-X entry: context resume: fee01004 0 402b 0 diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c index 54f53c8c0..f32645119 100644 --- a/drivers/net/ethernet/realtek/r8169.c +++ b/drivers/net/ethernet/realtek/r8169.c @@ -11,6 +11,7 @@ #include <linux/module.h> #include <linux/moduleparam.h> #include <linux/pci.h> +#include <linux/msi.h> #include <linux/netdevice.h> #include <linux/etherdevice.h> #include <linux/delay.h> @@ -6822,6 +6823,20 @@ rtl8169_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats) pm_runtime_put_noidle(&pdev->dev); } +static void rtl_print_msix_entry(struct rtl8169_private *tp, const char *context) +{ + struct msi_desc *desc = first_pci_msi_entry(tp->pci_dev); + u32 data[4]; + + data[0] = readl(desc->mask_base + PCI_MSIX_ENTRY_LOWER_ADDR); + data[1] = readl(desc->mask_base + PCI_MSIX_ENTRY_UPPER_ADDR); + data[2] = readl(desc->mask_base + PCI_MSIX_ENTRY_DATA); + data[3] = readl(desc->mask_base + PCI_MSIX_ENTRY_VECTOR_CTRL); + + dev_info(tp_to_dev(tp), "MSI-X entry: context %s: %x %x %x %x\n", + context, data[0], data[1], data[2], data[3]); +} + static void rtl8169_net_suspend(struct net_device *dev) { struct rtl8169_private *tp = netdev_priv(dev); @@ -6846,9 +6861,12 @@ static int rtl8169_suspend(struct device *device) { struct pci_dev *pdev = to_pci_dev(device); struct net_device *dev = pci_get_drvdata(pdev); + struct rtl8169_private *tp = netdev_priv(dev); rtl8169_net_suspend(dev); + rtl_print_msix_entry(tp, "suspend"); + return 0; } @@ -6875,6 +6893,9 @@ static int rtl8169_resume(struct device *device) { struct pci_dev *pdev = to_pci_dev(device); struct net_device *dev = pci_get_drvdata(pdev); + struct rtl8169_private *tp = netdev_priv(dev); + + rtl_print_msix_entry(tp, "resume"); if (netif_running(dev)) __rtl8169_resume(dev); @@ -7075,11 +7096,6 @@ static int rtl_alloc_irq(struct rtl8169_private *tp) RTL_W8(tp, Config2, RTL_R8(tp, Config2) & ~MSIEnable); RTL_W8(tp, Cfg9346, Cfg9346_Lock); flags = PCI_IRQ_LEGACY; - } else if (tp->mac_version == RTL_GIGA_MAC_VER_40) { - /* This version was reported to have issues with resume - * from suspend when using MSI-X - */ - flags = PCI_IRQ_LEGACY | PCI_IRQ_MSI; } else { flags = PCI_IRQ_ALL_TYPES; } @@ -7354,6 +7370,8 @@ static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent) return rc; } + rtl_print_msix_entry(tp, "probe"); + tp->saved_wolopts = __rtl8169_get_wol(tp); mutex_init(&tp->wk.mutex); -- 2.18.0 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: Experimental fix for MSI-X issue on r8169 2018-08-21 21:19 ` Heiner Kallweit @ 2018-08-21 21:48 ` Steve Dodd 2018-08-21 23:32 ` David Miller 2018-08-22 3:01 ` Jian-Hong Pan 2 siblings, 0 replies; 10+ messages in thread From: Steve Dodd @ 2018-08-21 21:48 UTC (permalink / raw) To: Heiner Kallweit; +Cc: Jian-Hong Pan, Lou Reed, netdev@vger.kernel.org On 21 August 2018 at 22:19, Heiner Kallweit <hkallweit1@gmail.com> wrote: > Below is a patch printing the MSI-X table entry in different contexts, > it's not supposed to fix anything. Could you please let me know > what the output is on your system? > I want to get an idea whether the issue clears the complete entry or > just corrupts certain parts. Here we go.. [ 2.865434] r8169 0000:01:00.0: MSI-X entry: context probe: fee01004 0 40ef 1 [ 269.747608] r8169 0000:01:00.0: MSI-X entry: context probe: fee01004 0 40ef 1 [ 295.454099] r8169 0000:01:00.0: MSI-X entry: context suspend: fee08004 0 4028 0 S. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Experimental fix for MSI-X issue on r8169 2018-08-21 21:19 ` Heiner Kallweit 2018-08-21 21:48 ` Steve Dodd @ 2018-08-21 23:32 ` David Miller 2018-08-22 3:01 ` Jian-Hong Pan 2 siblings, 0 replies; 10+ messages in thread From: David Miller @ 2018-08-21 23:32 UTC (permalink / raw) To: hkallweit1; +Cc: jian-hong, steved424, gogen, netdev From: Heiner Kallweit <hkallweit1@gmail.com> Date: Tue, 21 Aug 2018 23:19:04 +0200 > That's what I get on my system (RTL8168E-VL). In your case you'll come > only till the first suspend. > > [ 3.743404] r8169 0000:03:00.0: MSI-X entry: context probe: fee01004 0 40ef 1 On probe, MSI-X is masked (ie. disabled) and is configured to use: address: 0x00000000fee01004 data: 0x00000000000040ef > [ 29.539250] r8169 0000:03:00.0: MSI-X entry: context suspend: fee02004 0 4028 0 At suspend time, MSI-X is unmasked (ie. enabled) and is configured to use: address: 0x00000000fee01004 data: 0x0000000000004028 > [ 29.837457] r8169 0000:03:00.0: MSI-X entry: context resume: fee01004 0 402b 0 At reume time, MSI-X is unmasked (ie. enabled) and is configured to use: address: 0x00000000fee01004 data: 0x000000000000402b > [ 36.921370] r8169 0000:03:00.0: MSI-X entry: context suspend: fee01004 0 402b 0 Second suspend: address: 0x00000000fee01004 data: 0x000000000000402b > [ 37.239407] r8169 0000:03:00.0: MSI-X entry: context resume: fee01004 0 402b 0 Second resume: address: 0x00000000fee01004 data: 0x000000000000402b And this all looks normal. The data field is changing when you first up the device and interrupts are enabled. This is where the request_irq happens, the MSI vector is allocated, and that vector number is written to the data field of the MSI-X entry. It looks like this (re-)allocation of MSI vectors happens on every resume as well. And that's why the data field changes each resume. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Experimental fix for MSI-X issue on r8169 2018-08-21 21:19 ` Heiner Kallweit 2018-08-21 21:48 ` Steve Dodd 2018-08-21 23:32 ` David Miller @ 2018-08-22 3:01 ` Jian-Hong Pan 2018-08-22 4:24 ` David Miller 2 siblings, 1 reply; 10+ messages in thread From: Jian-Hong Pan @ 2018-08-22 3:01 UTC (permalink / raw) To: Heiner Kallweit Cc: Steve Dodd, Lou Reed, netdev@vger.kernel.org, Linux Upstreaming Team 2018-08-22 5:19 GMT+08:00 Heiner Kallweit <hkallweit1@gmail.com>: > On 20.08.2018 05:47, Jian-Hong Pan wrote: >> 2018-08-20 4:34 GMT+08:00 Heiner Kallweit <hkallweit1@gmail.com>: >>> The three of you reported an MSI-X-related error when the system >>> resumes from suspend. This has been fixed for now by disabling MSI-X >>> on certain chip versions. However more versions may be affected. >>> >>> I checked with Realtek and they confirmed that on certain chip >>> versions a MSIX-related value in PCI config space is reset when >>> resuming from S3. >>> >>> I would appreciate if you could test the following experimental patch >>> and whether warning "MSIX address lost, re-configuring" appears in >>> your dmesg output after resume from suspend. >>> >>> Thanks a lot for your efforts. >> >> Tested with the experiment patch on ASUS X441UAR. >> >> This is the information before suspend: >> >> dev@endless:~$ dmesg | grep r8169 >> [ 10.279565] libphy: r8169: probed >> [ 10.279947] r8169 0000:02:00.0 eth0: RTL8106e, 0c:9d:92:32:67:b4, >> XID 44900000, IRQ 127 >> [ 10.445952] r8169 0000:02:00.0 enp2s0: renamed from eth0 >> [ 15.676229] Generic PHY r8169-200:00: attached PHY driver [Generic >> PHY] (mii_bus:phy_addr=r8169-200:00, irq=IGNORE) >> [ 17.455392] r8169 0000:02:00.0 enp2s0: Link is Up - 100Mbps/Full - >> flow control off >> >> dev@endless:~$ ip addr show enp2s0 >> 4: enp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast >> state UP group default qlen 1000 >> link/ether 0c:9d:92:32:67:b4 brd ff:ff:ff:ff:ff:ff >> inet 10.100.13.152/24 brd 10.100.13.255 scope global noprefixroute >> dynamic enp2s0 >> valid_lft 86347sec preferred_lft 86347sec >> inet6 fe80::2873:a2a9:6ca1:c79d/64 scope link noprefixroute >> valid_lft forever preferred_lft forever >> >> This is the information after resume: >> >> dev@endless:~$ dmesg | grep r8169 >> [ 10.279565] libphy: r8169: probed >> [ 10.279947] r8169 0000:02:00.0 eth0: RTL8106e, 0c:9d:92:32:67:b4, >> XID 44900000, IRQ 127 >> [ 10.445952] r8169 0000:02:00.0 enp2s0: renamed from eth0 >> [ 15.676229] Generic PHY r8169-200:00: attached PHY driver [Generic >> PHY] (mii_bus:phy_addr=r8169-200:00, irq=IGNORE) >> [ 17.455392] r8169 0000:02:00.0 enp2s0: Link is Up - 100Mbps/Full - >> flow control off >> [ 95.594265] r8169 0000:02:00.0 enp2s0: Link is Down >> [ 96.242074] Generic PHY r8169-200:00: attached PHY driver [Generic >> PHY] (mii_bus:phy_addr=r8169-200:00, irq=IGNORE) >> >> dev@endless:~$ ip addr show enp2s0 >> 4: enp2s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc >> pfifo_fast state DOWN group default qlen 1000 >> link/ether 0c:9d:92:32:67:b4 brd ff:ff:ff:ff:ff:ff >> >> There is no "MSIX address lost, re-configuring" in dmesg. >> The ethernet interface is still down after resume. >> > > Thanks a lot for testing. Unfortunately I don't have test hardware > affected by this MSI-X issue, so maybe you can help me to understand > the issue a little better. > > Below is a patch printing the MSI-X table entry in different contexts, > it's not supposed to fix anything. Could you please let me know > what the output is on your system? > I want to get an idea whether the issue clears the complete entry or > just corrupts certain parts. Here is the test result on ASUS X441UAR with this patch: dev@endless:~$ dmesg | grep -E "(r8169|enp2s0)" [ 8.980001] r8169 0000:02:00.0: MSI-X entry: context probe: fee01004 0 40ef 1 [ 8.981594] libphy: r8169: probed [ 8.981769] r8169 0000:02:00.0 eth0: RTL8106e, 0c:9d:92:32:67:b4, XID 44900000, IRQ 127 [ 9.479848] r8169 0000:02:00.0 enp2s0: renamed from eth0 [ 11.332834] IPv6: ADDRCONF(NETDEV_UP): enp2s0: link is not ready [ 11.336350] Generic PHY r8169-200:00: attached PHY driver [Generic PHY] (mii_bus:phy_addr=r8169-200:00, irq=IGNORE) [ 11.574892] IPv6: ADDRCONF(NETDEV_UP): enp2s0: link is not ready [ 11.581816] r8169 0000:02:00.0 enp2s0: Link is Down [ 13.190535] r8169 0000:02:00.0 enp2s0: Link is Up - 100Mbps/Full - flow control off [ 13.190548] IPv6: ADDRCONF(NETDEV_CHANGE): enp2s0: link becomes ready [ 56.227974] r8169 0000:02:00.0: MSI-X entry: context suspend: fee04004 0 4024 0 [ 56.462464] r8169 0000:02:00.0: MSI-X entry: context resume: ffffffff ffffffff ffffffff ffffffff [ 58.406713] r8169 0000:02:00.0 enp2s0: Link is Up - 100Mbps/Full - flow control off [ 58.766740] IPv6: ADDRCONF(NETDEV_UP): enp2s0: link is not ready [ 58.767331] Generic PHY r8169-200:00: attached PHY driver [Generic PHY] (mii_bus:phy_addr=r8169-200:00, irq=IGNORE) [ 59.003660] IPv6: ADDRCONF(NETDEV_UP): enp2s0: link is not ready uh! The MSI-X entry seems missed after resume on this laptop! Ethernet interface status after resume: dev@endless:~$ ip addr show enp2s0 3: enp2s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN group default qlen 1000 link/ether 0c:9d:92:32:67:b4 brd ff:ff:ff:ff:ff:ff Regards, Jian-Hong Pan > That's what I get on my system (RTL8168E-VL). In your case you'll come > only till the first suspend. > > [ 3.743404] r8169 0000:03:00.0: MSI-X entry: context probe: fee01004 0 40ef 1 > [ 29.539250] r8169 0000:03:00.0: MSI-X entry: context suspend: fee02004 0 4028 0 > [ 29.837457] r8169 0000:03:00.0: MSI-X entry: context resume: fee01004 0 402b 0 > [ 36.921370] r8169 0000:03:00.0: MSI-X entry: context suspend: fee01004 0 402b 0 > [ 37.239407] r8169 0000:03:00.0: MSI-X entry: context resume: fee01004 0 402b 0 > > diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c > index 54f53c8c0..f32645119 100644 > --- a/drivers/net/ethernet/realtek/r8169.c > +++ b/drivers/net/ethernet/realtek/r8169.c > @@ -11,6 +11,7 @@ > #include <linux/module.h> > #include <linux/moduleparam.h> > #include <linux/pci.h> > +#include <linux/msi.h> > #include <linux/netdevice.h> > #include <linux/etherdevice.h> > #include <linux/delay.h> > @@ -6822,6 +6823,20 @@ rtl8169_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats) > pm_runtime_put_noidle(&pdev->dev); > } > > +static void rtl_print_msix_entry(struct rtl8169_private *tp, const char *context) > +{ > + struct msi_desc *desc = first_pci_msi_entry(tp->pci_dev); > + u32 data[4]; > + > + data[0] = readl(desc->mask_base + PCI_MSIX_ENTRY_LOWER_ADDR); > + data[1] = readl(desc->mask_base + PCI_MSIX_ENTRY_UPPER_ADDR); > + data[2] = readl(desc->mask_base + PCI_MSIX_ENTRY_DATA); > + data[3] = readl(desc->mask_base + PCI_MSIX_ENTRY_VECTOR_CTRL); > + > + dev_info(tp_to_dev(tp), "MSI-X entry: context %s: %x %x %x %x\n", > + context, data[0], data[1], data[2], data[3]); > +} > + > static void rtl8169_net_suspend(struct net_device *dev) > { > struct rtl8169_private *tp = netdev_priv(dev); > @@ -6846,9 +6861,12 @@ static int rtl8169_suspend(struct device *device) > { > struct pci_dev *pdev = to_pci_dev(device); > struct net_device *dev = pci_get_drvdata(pdev); > + struct rtl8169_private *tp = netdev_priv(dev); > > rtl8169_net_suspend(dev); > > + rtl_print_msix_entry(tp, "suspend"); > + > return 0; > } > > @@ -6875,6 +6893,9 @@ static int rtl8169_resume(struct device *device) > { > struct pci_dev *pdev = to_pci_dev(device); > struct net_device *dev = pci_get_drvdata(pdev); > + struct rtl8169_private *tp = netdev_priv(dev); > + > + rtl_print_msix_entry(tp, "resume"); > > if (netif_running(dev)) > __rtl8169_resume(dev); > @@ -7075,11 +7096,6 @@ static int rtl_alloc_irq(struct rtl8169_private *tp) > RTL_W8(tp, Config2, RTL_R8(tp, Config2) & ~MSIEnable); > RTL_W8(tp, Cfg9346, Cfg9346_Lock); > flags = PCI_IRQ_LEGACY; > - } else if (tp->mac_version == RTL_GIGA_MAC_VER_40) { > - /* This version was reported to have issues with resume > - * from suspend when using MSI-X > - */ > - flags = PCI_IRQ_LEGACY | PCI_IRQ_MSI; > } else { > flags = PCI_IRQ_ALL_TYPES; > } > @@ -7354,6 +7370,8 @@ static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent) > return rc; > } > > + rtl_print_msix_entry(tp, "probe"); > + > tp->saved_wolopts = __rtl8169_get_wol(tp); > > mutex_init(&tp->wk.mutex); > -- > 2.18.0 > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Experimental fix for MSI-X issue on r8169 2018-08-22 3:01 ` Jian-Hong Pan @ 2018-08-22 4:24 ` David Miller 2018-08-22 5:56 ` Heiner Kallweit 2018-08-22 6:41 ` Steve Dodd 0 siblings, 2 replies; 10+ messages in thread From: David Miller @ 2018-08-22 4:24 UTC (permalink / raw) To: jian-hong; +Cc: hkallweit1, steved424, gogen, netdev, linux From: Jian-Hong Pan <jian-hong@endlessm.com> Date: Wed, 22 Aug 2018 11:01:02 +0800 ... > [ 56.462464] r8169 0000:02:00.0: MSI-X entry: context resume: > ffffffff ffffffff ffffffff ffffffff ... > uh! The MSI-X entry seems missed after resume on this laptop! Yeah, having all of the MSI-X entry values be all-1's is not a good sign. But this is quite a curious set of debugging traces we now have. In the working case, the vector number in the DATA field seems to change, which suggests that something is assigning new values and programming them into these fields at resume time. But in the failing cases, all of the values are garbage. I would expect, given what the working trace looks like, that in the failing case some values would be wrong and the DATA value would have some new yet valid value. But that is not what we are seeing here. Weird. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Experimental fix for MSI-X issue on r8169 2018-08-22 4:24 ` David Miller @ 2018-08-22 5:56 ` Heiner Kallweit 2018-08-22 6:41 ` Steve Dodd 1 sibling, 0 replies; 10+ messages in thread From: Heiner Kallweit @ 2018-08-22 5:56 UTC (permalink / raw) To: David Miller, jian-hong; +Cc: steved424, gogen, netdev, linux On 22.08.2018 06:24, David Miller wrote: > From: Jian-Hong Pan <jian-hong@endlessm.com> > Date: Wed, 22 Aug 2018 11:01:02 +0800 > > ... >> [ 56.462464] r8169 0000:02:00.0: MSI-X entry: context resume: >> ffffffff ffffffff ffffffff ffffffff > ... >> uh! The MSI-X entry seems missed after resume on this laptop! > > Yeah, having all of the MSI-X entry values be all-1's is not a good > sign. > all-1's seems to indicate that PCI access to the MSI-X table BAR/region fails. Because falling back to MSI helps, accessing the other BAR/region with the memory-mapped registers works. I'll check with Realtek whether this symptom rings any bell. > But this is quite a curious set of debugging traces we now have. > > In the working case, the vector number in the DATA field seems > to change, which suggests that something is assigning new values > and programming them into these fields at resume time. > > But in the failing cases, all of the values are garbage. > > I would expect, given what the working trace looks like, that in the > failing case some values would be wrong and the DATA value would have > some new yet valid value. But that is not what we are seeing here. > > Weird. > > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Experimental fix for MSI-X issue on r8169 2018-08-22 4:24 ` David Miller 2018-08-22 5:56 ` Heiner Kallweit @ 2018-08-22 6:41 ` Steve Dodd 1 sibling, 0 replies; 10+ messages in thread From: Steve Dodd @ 2018-08-22 6:41 UTC (permalink / raw) To: David Miller; +Cc: Jian-Hong Pan, Heiner Kallweit, Lou Reed, netdev, linux On 22 August 2018 at 05:24, David Miller <davem@davemloft.net> wrote: > From: Jian-Hong Pan <jian-hong@endlessm.com> >> [ 56.462464] r8169 0000:02:00.0: MSI-X entry: context resume: >> ffffffff ffffffff ffffffff ffffffff > ... >> uh! The MSI-X entry seems missed after resume on this laptop! > > Yeah, having all of the MSI-X entry values be all-1's is not a good > sign. I'm slightly confused as to why my machine doesn't even get to printing the debugging message on resume..? S. ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2018-08-22 10:04 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2018-08-19 20:34 Experimental fix for MSI-X issue on r8169 Heiner Kallweit 2018-08-20 3:47 ` Jian-Hong Pan 2018-08-21 17:57 ` Steve Dodd 2018-08-21 21:19 ` Heiner Kallweit 2018-08-21 21:48 ` Steve Dodd 2018-08-21 23:32 ` David Miller 2018-08-22 3:01 ` Jian-Hong Pan 2018-08-22 4:24 ` David Miller 2018-08-22 5:56 ` Heiner Kallweit 2018-08-22 6:41 ` Steve Dodd
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).