All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Martin Kjær Jørgensen" <me@lagy.org>
To: Heiner Kallweit <hkallweit1@gmail.com>
Cc: Jakub Kicinski <kuba@kernel.org>,
	netdev@vger.kernel.org, nic_swsd@realtek.com
Subject: Re: r8169 link up but no traffic, and watchdog error
Date: Fri, 22 Mar 2024 12:28:54 +0100	[thread overview]
Message-ID: <871q82fqpk.fsf@lagy.org> (raw)
In-Reply-To: <b0e2f6fb-2a1b-4452-bf49-739a30925fde@gmail.com>


On Mon, Sep 25 2023, Heiner Kallweit <hkallweit1@gmail.com> wrote:

> On 25.09.2023 17:41, Martin Kjær Jørgensen wrote:
>>
>> On Mon, Sep 25 2023, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>
>>> On 25.09.2023 13:30, Martin Kjær Jørgensen wrote:
>>>>
>>>> On Mon, Sep 25 2023, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>>
>>>>
>>>> There are no PCI extension cards.
>>>>
>>>
>>> Your BIOS signature indicates that the system is a Thinkstation P350.
>>> According to the Lenovo website it comes with one Intel-based network port.
>>> However you have additional 4 Realtek-based network ports on the mainboard?
>>>
>>
>> Yes. 2 PCIE cards with two Realtek ethernet controllers each.
>>
>>>>> And does the problem occur with all of your NICs?
>>>>
>>>> No, only the Realtek ones.
>>>>
>>>>> The exact NIC type might provide a hint, best provide a full dmesg log.
>>>> [ 1512.295490] RSP: 0018:ffffbc0240193e88 EFLAGS: 00000246
>>>> [ 1512.295492] RAX: ffff998935680000 RBX: ffffdc023faa8e00 RCX: 000000000000001f
>>>> [ 1512.295493] RDX: 0000000000000002 RSI: ffffffffb544f718 RDI: ffffffffb543bc32
>>>> [ 1512.295494] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000018
>>>> [ 1512.295495] R10: ffff9989356b1dc4 R11: 00000000000058a8 R12: ffffffffb5d981a0
>>>> [ 1512.295496] R13: 000001601bd198ef R14: 0000000000000003 R15: 0000000000000000
>>>> [ 1512.295497]  ? cpuidle_enter_state+0xbd/0x440
>>>> [ 1512.295499]  cpuidle_enter+0x2d/0x40
>>>> [ 1512.295501]  do_idle+0x217/0x270
>>>> [ 1512.295503]  cpu_startup_entry+0x1d/0x20
>>>> [ 1512.295505]  start_secondary+0x11a/0x140
>>>> [ 1512.295508]  secondary_startup_64_no_verify+0x17e/0x18b
>>>> [ 1512.295510]  </TASK>
>>>> [ 1512.295511] ---[ end trace 0000000000000000 ]---
>>>> [ 1512.295526] r8169 0000:03:00.0: can't disable ASPM; OS doesn't have ASPM control
>>>> [ 1531.322039] r8169 0000:03:00.0 enp3s0: Link is Down
>>>> [ 1534.138489] r8169 0000:03:00.0 enp3s0: Link is Up - 1Gbps/Full - flow control rx/tx
>>>> [ 1538.177385] r8169 0000:03:00.0 enp3s0: Link is Down
>>>> [ 1566.174660] r8169 0000:03:00.0 enp3s0: Link is Up - 1Gbps/Full - flow control rx/tx
>>>> [ 1567.839082] r8169 0000:03:00.0 enp3s0: Link is Down
>>>> [ 1570.621088] r8169 0000:03:00.0 enp3s0: Link is Up - 1Gbps/Full - flow control rx/tx
>>>> [ 1576.294267] r8169 0000:03:00.0: can't disable ASPM; OS doesn't have ASPM control
>>>
>>> Regarding the following: Issue occurs after few seconds of link-loss.
>>> Was this an intentional link-down event?
>>
>> Yes, I intentionally unplug the cable at the other end for the link to go down.
>>
>>> And is issue always related to link-up after a link-loss period?
>>>
>>
>> Yes, it happends after cable is plugged in again, so after a link-loss period.
>>
> Good to know. I heard this before, under unknown circumstances (Realtek doesn't publish
> errata information) the NIC (unclear whether MAC or PHY) seems to hang up after link-loss
> in rare cases. Vendor driver does a full hw init on each link-up, maybe this is to work
> around the issue we talk about here.
>
>>
>>>> [ 1488.643231] r8169 0000:03:00.0 enp3s0: Link is Down
>>>> [ 1506.576941] r8169 0000:03:00.0 enp3s0: Link is Up - 1Gbps/Full - flow control rx/tx
>>>> [ 1512.295215] ------------[ cut here ]------------
>>>> [ 1512.295219] NETDEV WATCHDOG: enp3s0 (r8169): transmit queue 0 timed out 5368 ms

I am seeing the behavior again with latest 6.6.21 kernel. Like last time, it
helps to manually shutdown and bring up the interface with 'ip link enp4s0
down/up'


[243277.859725] r8169 0000:04:00.0 enp4s0: Link is Up - 1Gbps/Full - flow control off
[243283.400061] ------------[ cut here ]------------
[243283.400063] NETDEV WATCHDOG: enp4s0 (r8169): transmit queue 0 timed out 5537 ms
[243283.400070] WARNING: CPU: 3 PID: 2909804 at net/sched/sch_generic.c:525 dev_watchdog+0x225/0x230
[243283.400073] Modules linked in: tls nfnetlink_queue nfnetlink_log bluetooth ecdh_generic ecc xt_nat xt_tcpudp veth xt_conntrack xt_MASQUERADE nf_conntrack_netlink iptable_nat xt_addrtype iptable_filter ip_tables x_tables bpfilter br_netfilter overlay authenc echainiv geniv crypto_null esp4 xfrm_interface xfrm6_tunnel tunnel4 tunnel6 cmac xfrm_user xfrm_algo nls_utf8 cifs cifs_arc4 nls_ucs2_utils rdma_cm iw_cm ib_cm ib_core cifs_md4 dns_resolver fscache netfs snd_seq_dummy snd_hrtimer snd_seq af_packet bridge stp llc cfg80211 rfkill nft_fib_ipv6 nft_nat nft_fib_ipv4 nft_fib nft_masq nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables libcrc32c nls_ascii nls_cp437 vfat fat intel_rapl_msr coretemp intel_rapl_common x86_pkg_temp_thermal intel_powerclamp ofpart cmdlinepart snd_usb_audio spi_nor iTCO_wdt intel_pmc_bxt kvm_intel mei_wdt snd_usbmidi_lib iTCO_vendor_support mei_pxp mei_hdcp ee1004 mtd watchdog snd_hwdep r8169 snd_ump kvm snd_rawmidi realtek uvcvideo snd_seq_device snd_pcm mdio_devres
[243283.400109]  videobuf2_vmalloc irqbypass of_mdio uvc videobuf2_memops fixed_phy snd_timer videobuf2_v4l2 think_lmi intel_cstate rtsx_usb_ms fwnode_mdio intel_uncore memstick e1000e rtc_cmos videodev mei_me ftdi_sio firmware_attributes_class joydev snd videobuf2_common i2c_i801 ptp spi_intel_pci libphy intel_wmi_thunderbolt wmi_bmof mei mc tiny_power_button pps_core spi_intel i2c_smbus usbserial soundcore mousedev thermal fan input_leds int3400_thermal acpi_thermal_rel intel_pmc_core acpi_tad acpi_pad evdev button mac_hid sch_fq_codel msr loop fuse efi_pstore nfnetlink efivarfs dmi_sysfs dm_crypt trusted asn1_encoder tee ext4 crc32c_generic crc16 mbcache jbd2 hid_logitech_hidpp hid_logitech_dj hid_jabra hid_generic rtsx_usb_sdmmc mmc_core led_class usbhid hid rtsx_usb crc32_pclmul crc32c_intel polyval_clmulni polyval_generic gf128mul ghash_clmulni_intel i915 sha512_ssse3 sha256_ssse3 sha1_ssse3 xhci_pci ahci xhci_pci_renesas nvme libahci xhci_hcd nvme_core libata nvme_common i2c_algo_bit drm_buddy t10_pi ttm usbcore
[243283.400142]  crc64_rocksoft_generic aesni_intel drm_display_helper scsi_mod crc64_rocksoft crc_t10dif crct10dif_generic crct10dif_pclmul cec hwmon crypto_simd crc64 rc_core cryptd crct10dif_common usb_common scsi_common 8250 8250_base video serial_mctrl_gpio serial_base wmi backlight dm_mod dax
[243283.400150] CPU: 3 PID: 2909804 Comm: python3.11 Not tainted 6.6.21-gentoo-desktop-r1 #1
[243283.400152] Hardware name: LENOVO 30E30051UK/1052, BIOS S0AKT3EA 09/22/2023
[243283.400153] RIP: 0010:dev_watchdog+0x225/0x230
[243283.400154] Code: ff ff ff 48 89 ef c6 05 08 6c ee 00 01 e8 03 94 fa ff 45 89 f8 44 89 f1 48 89 ee 48 89 c2 48 c7 c7 98 c4 37 9a e8 8b 5c 79 ff <0f> 0b e9 27 ff ff ff 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90
[243283.400155] RSP: 0000:ffffc9000f3e3df8 EFLAGS: 00010292
[243283.400156] RAX: 0000000000000043 RBX: ffff88810b6f841c RCX: 0000000000000027
[243283.400157] RDX: ffff8890356e04c8 RSI: 0000000000000001 RDI: ffff8890356e04c0
[243283.400158] RBP: ffff88810b6f8000 R08: 0000000000000000 R09: ffffffff9a646ce0
[243283.400158] R10: ffffc9000f3e3cb0 R11: ffffffff9a726d28 R12: ffff88810b6f84c8
[243283.400159] R13: ffff88810b6e6800 R14: 0000000000000000 R15: 00000000000015a1
[243283.400159] FS:  00007fb111f89740(0000) GS:ffff8890356c0000(0000) knlGS:0000000000000000
[243283.400160] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[243283.400161] CR2: 00007fb1041fb030 CR3: 000000044b576002 CR4: 0000000000770ee0
[243283.400162] PKRU: 55555554
[243283.400162] Call Trace:
[243283.400163]  <TASK>
[243283.400164]  ? dev_watchdog+0x225/0x230
[243283.400165]  ? __warn+0x7c/0x130
[243283.400168]  ? dev_watchdog+0x225/0x230
[243283.400169]  ? report_bug+0x171/0x1a0
[243283.400172]  ? handle_bug+0x3a/0x70
[243283.400174]  ? exc_invalid_op+0x17/0x70
[243283.400175]  ? asm_exc_invalid_op+0x1a/0x20
[243283.400178]  ? dev_watchdog+0x225/0x230
[243283.400179]  ? dev_watchdog+0x225/0x230
[243283.400180]  ? __pfx_dev_watchdog+0x10/0x10
[243283.400181]  ? __pfx_dev_watchdog+0x10/0x10
[243283.400182]  call_timer_fn+0x1f/0x130
[243283.400184]  __run_timers.part.0+0x1bc/0x250
[243283.400186]  ? ktime_get+0x34/0xa0
[243283.400187]  run_timer_softirq+0x25/0x50
[243283.400188]  __do_softirq+0xbd/0x296
[243283.400190]  irq_exit_rcu+0x65/0x80
[243283.400191]  sysvec_apic_timer_interrupt+0x3e/0x90
[243283.400192]  asm_sysvec_apic_timer_interrupt+0x1a/0x20
[243283.400194] RIP: 0033:0x7fb111bd3cd5
[243283.400195] Code: c8 48 8b 56 08 48 83 c6 08 48 85 d2 75 a7 48 85 c0 74 05 48 39 c3 75 09 49 8b 47 10 48 89 44 24 10 48 85 ed 0f 85 8e 00 00 00 <49> 8b 40 28 49 8b 4f 18 48 39 48 18 75 5d 49 8b 85 58 01 00 00 49
[243283.400196] RSP: 002b:00007ffd3916a9b0 EFLAGS: 00000246
[243283.400196] RAX: 00007fb111bd73c0 RBX: 000056339f603f78 RCX: 00007fb111f66aa8
[243283.400197] RDX: 0000000000000000 RSI: 00007fb111f04360 RDI: 00007fb111f66aa8
[243283.400197] RBP: 0000000000000000 R08: 00007fb10410c270 R09: 00007fb111f66aa0
[243283.400198] R10: 8d3a98eb5e44a685 R11: 1ffffffffffffffe R12: 00007ffd3916a9d4
[243283.400198] R13: 000056339f603eb0 R14: 00000000000000c8 R15: 00007fb111e0dcc0
[243283.400199]  </TASK>
[243283.400200] ---[ end trace 0000000000000000 ]---
[243283.400216] r8169 0000:04:00.0: can't disable ASPM; OS doesn't have ASPM control
[243295.067251] r8169 0000:04:00.0 enp4s0: Link is Down
[243297.620960] RTL8226 2.5Gbps PHY r8169-0-400:00: attached PHY driver (mii_bus:phy_addr=r8169-0-400:00, irq=MAC)
[243297.752106] r8169 0000:04:00.0 enp4s0: Link is Down
[243300.656125] r8169 0000:04:00.0 enp4s0: Link is Up - 1Gbps/Full - flow control off

  reply	other threads:[~2024-03-22 11:37 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-09 11:50 r8169 link up but no traffic, and watchdog error Martin Kjær Jørgensen
2023-08-09 19:58 ` Jakub Kicinski
2023-08-18 11:49   ` Martin Kjær Jørgensen
2023-08-24  7:52     ` Heiner Kallweit
2023-08-24  8:01       ` Martin Kjær Jørgensen
2023-08-24  8:21         ` Heiner Kallweit
2023-08-24  9:22       ` Martin Kjær Jørgensen
2023-08-25 16:55         ` Heiner Kallweit
2023-09-25  8:36   ` Martin Kjær Jørgensen
2023-09-25 10:37     ` Heiner Kallweit
2023-09-25 11:30       ` Martin Kjær Jørgensen
2023-09-25 15:38         ` Heiner Kallweit
2023-09-25 15:41           ` Martin Kjær Jørgensen
2023-09-25 15:59             ` Heiner Kallweit
2024-03-22 11:28               ` Martin Kjær Jørgensen [this message]
2024-03-22 12:26                 ` Heiner Kallweit
2024-03-23 11:18                   ` Heiner Kallweit
2023-09-25 20:38             ` Heiner Kallweit
2023-10-02  8:28               ` Martin Kjær Jørgensen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=871q82fqpk.fsf@lagy.org \
    --to=me@lagy.org \
    --cc=hkallweit1@gmail.com \
    --cc=kuba@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=nic_swsd@realtek.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.