netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Heiner Kallweit <hkallweit1@gmail.com>
To: Alexander Duyck <alexander.duyck@gmail.com>
Cc: VDR User <user.vdr@gmail.com>,
	Alexander Duyck <alexander.h.duyck@linux.intel.com>,
	Netdev <netdev@vger.kernel.org>
Subject: Re: r8169 driver from kernel 5.0 crashing - napi_consume_skb
Date: Sun, 17 Mar 2019 09:25:56 +0100	[thread overview]
Message-ID: <dca8471d-d599-4318-96fe-6cffb1cdcd0a@gmail.com> (raw)
In-Reply-To: <CAKgT0UcS2+HHTnRyVe=f_tJZ07MKV7+XVLJtND2UDkzUbEP52Q@mail.gmail.com>

On 16.03.2019 18:08, Alexander Duyck wrote:
> On Sat, Mar 16, 2019 at 8:12 AM Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>
>> On 16.03.2019 15:38, VDR User wrote:
>>>> Part of the issue though is that we don't know how reliable that test
>>>> was. I believe Derek he hasn't had any crashes, but he wasn't confident
>>>> that it had actually resolved the issue.
>>>
>>> Previously I thought I could easily & consistently reproduce the crash
>>> but the more testing I did, the more I realized that wasn't the case.
>>> That's why my confidence was low in that reversing commit 5317d5c6d47e
>>> ("r8169: use napi_consume_skb where possible") fixed it. I felt like I
>>> needed to do a lot more testing over the weekend to be sure. But, I
>>> can now confirm that reversing that commit did not solve the problem.
>>> I didn't ifdown/ifup after the crash so the nic eventually recovered
>>> on its own I guess. The `ethtool -S` output is:
>>>
>>> NIC statistics:
>>>      tx_packets: 5370650
>>>      rx_packets: 57340787
>>>      tx_errors: 0
>>>      rx_errors: 0
>>>      rx_missed: 26
>>>      align_errors: 0
>>>      tx_single_collisions: 0
>>>      tx_multi_collisions: 0
>>>      unicast: 57332905
>>>      broadcast: 6409
>>>      multicast: 1473
>>>      tx_aborted: 0
>>>      tx_underrun: 0
>>>
>>> The dmesg log looks the same as before:
>>>
>>> [95579.984062] ------------[ cut here ]------------
>>> [95579.984142] NETDEV WATCHDOG: enp4s0 (r8169): transmit queue 0 timed out
>>> [95579.984224] WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:461
>>> dev_watchdog+0x1bb/0x1e0
>>> [95579.984276] Modules linked in: snd_hda_codec_hdmi
>>> snd_hda_codec_realtek snd_hda_codec_generic ohci_pci snd_hda_intel
>>> snd_hda_codec snd_hwdep xhci_pci ohci_hcd ehci_pci xhci_hcd ehci_hcd
>>> usbcore snd_hda_core usb_common snd_pcm snd_timer snd soundcore nfsd
>>> auth_rpcgss oid_registry lockd grace sunrpc ip_tables x_tables ipv6
>>> [95579.984354] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.0.0-amd #1
>>> [95579.984387] Hardware name: ECS A75F-A/A75F-A, BIOS 4.6.5 09/14/2011
>>> [95579.984422] EIP: dev_watchdog+0x1bb/0x1e0
>>> [95579.984454] Code: 8b 50 3c 89 f8 e8 3d aa 0a 00 8b 7e f4 eb a4 89
>>> f8 c6 05 e7 1c 6d c1 01 e8 72 4f fd ff 53 50 57 68 78 05 66 c1 e8 25
>>> ad ba ff <0f> 0b 83 c4 10 eb c9 eb 1c 8d b4 26 00 00 00 00 8d b4 26 00
>>> 00 00
>>> [95579.986189] EAX: 0000003b EBX: 00000000 ECX: 00000800 EDX: 00000103
>>> [95579.986224] ESI: f4cc2264 EDI: f4cc2000 EBP: f4c99f74 ESP: f4c99f4c
>>> [95579.986259] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210296
>>> [95579.986292] CR0: 80050033 CR2: b7c644f0 CR3: 0dfd2000 CR4: 00000690
>>> [95579.986325] Call Trace:
>>> [95579.986356]  <SOFTIRQ>
>>> [95579.986389]  ? qdisc_put_unlocked+0x40/0x40
>>> [95579.986423]  call_timer_fn+0x19/0xa0
>>> [95579.986456]  run_timer_softirq+0x337/0x380
>>> [95579.986488]  ? qdisc_put_unlocked+0x40/0x40
>>> [95579.986521]  ? rcu_process_callbacks+0xcb/0x380
>>> [95579.986555]  __do_softirq+0xd6/0x21c
>>> [95579.986586]  ? __irqentry_text_end+0x18/0x18
>>> [95579.986619]  call_on_stack+0x10/0x60
>>> [95579.986646]  </SOFTIRQ>
>>> [95579.986674]  ? irq_exit+0x91/0xc0
>>> [95579.986701]  ? smp_apic_timer_interrupt+0x56/0xa0
>>> [95579.986731]  ? apic_timer_interrupt+0xd5/0xdc
>>> [95579.986761]  ? acpi_idle_enter_s2idle+0x60/0x60
>>> [95579.986790]  ? cpuidle_enter_state+0x122/0x360
>>> [95579.986821]  ? cpuidle_enter+0xf/0x20
>>> [95579.986850]  ? call_cpuidle+0x1c/0x40
>>> [95579.986878]  ? do_idle+0x1e6/0x220
>>> [95579.986906]  ? cpu_startup_entry+0x25/0x40
>>> [95579.986934]  ? start_secondary+0x1a5/0x220
>>> [95579.986963]  ? startup_32_smp+0x15f/0x164
>>> [95579.986991] ---[ end trace 2e8d77bb3c9d2d45 ]---
>>>
>>> Please let me know if there's anything I can do to help.
>>> Derek
>>>
>> The other changes between 4.20 and 5.0 don't look like they could cause
>> the issue. And two critical ones have been reverted already.
>> So what would need to be done is bisecting the issue.
>>
>> Heiner
> 
> So I took a quick look through the git log. Have you tried taking a
> look at the following commit as a possible culprit?
> commit e782410ed2378a2ddac58d944c3cf0c6f96b1ff3
> Author: Heiner Kallweit <hkallweit1@gmail.com>
> Date:   Sat Dec 15 16:25:05 2018 +0100
> 
>     r8169: improve spurious interrupt detection
> 
>     Improve detection of spurious interrupts by checking against the
>     interrupt mask as currently set in the chip.
> 
>     Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
>     Signed-off-by: David S. Miller <davem@davemloft.net>
> 
> This patch seems problematic for a few different reasons. The first
> being that adding a register read to the interrupt handler will
> actually increase the latency for your interrupt handler as a MMIO
> read is not a fast operation compared to a memory read. You already
> had to read status, and adding a read for the interrupt mask will just
> double the overhead as the reads have to be performed sequentially.
> Secondly, are we guaranteed that the read of the register will always
> be reliable? I know I have seen plenty of hardware issues in the past
> where reading a register in the middle of an event can sometimes lead
> to the value read being incorrect.
> 
> The only other patch in the set between 4.20 and 5.0 that catches my eye is:
> commit 76085c9e7eaebf0caa2476f77380efc3d55b050c
> Author: Heiner Kallweit <hkallweit1@gmail.com>
> Date:   Thu Nov 22 22:03:08 2018 +0100
> 
>     r8169: replace macro TX_FRAGS_READY_FOR with a function
> 
>     Replace macro TX_FRAGS_READY_FOR with function rtl_tx_slots_avail
>     to make code cleaner and type-safe.
> 
>     Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
>     Signed-off-by: David S. Miller <davem@davemloft.net>
> 
> However I don't see anything that should be wrong there that is
> causing this, but instrumenting the points where we are taking action
> based on that with a trace_printk probably wouldn't hurt as it would
> help us to verify that we are stopping things in xmit and not getting
> around to waking them up for whatever reason. Also a minor
> optimization you could do on the patch is to replace the call to
> netif_wake_queue in rtl8169_start_xmit with netif_start_queue as you
> don't need to actually wake up the queue since you are still in mid
> transmit so you just need to reset the bit so it doesn't prevent the
> next transmit.
> 
> Hope that helps.
> 
Definitely. Thanks a lot for the valuable feedback.

> - Alex
> 
Heiner

  reply	other threads:[~2019-03-17  8:29 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-03-10 19:02 r8169 driver from kernel 5.0 crashing VDR User
2019-03-10 19:35 ` Heiner Kallweit
2019-03-14  3:04   ` VDR User
2019-03-14  6:43     ` Heiner Kallweit
2019-03-14 15:10       ` VDR User
2019-03-14 18:08         ` Heiner Kallweit
2019-03-14 21:36           ` VDR User
2019-03-15 20:09             ` VDR User
2019-03-15 20:26               ` r8169 driver from kernel 5.0 crashing - napi_consume_skb Heiner Kallweit
2019-03-15 20:40                 ` Alexander Duyck
2019-03-15 20:46                   ` Heiner Kallweit
2019-03-15 22:09                     ` Alexander Duyck
2019-03-15 22:27                       ` Heiner Kallweit
     [not found]                         ` <b470f0ee6071bded8e78e15e18df14ffbc32cd18.camel@linux.intel.com>
2019-03-16  9:07                           ` Heiner Kallweit
2019-03-16 14:38                           ` VDR User
2019-03-16 15:10                             ` Heiner Kallweit
2019-03-16 17:08                               ` Alexander Duyck
2019-03-17  8:25                                 ` Heiner Kallweit [this message]
2019-03-17 11:35                             ` Heiner Kallweit
2019-03-17 15:23                               ` VDR User
2019-03-17 15:41                               ` VDR User
2019-03-17 15:49                                 ` Heiner Kallweit
2019-03-18 16:17                                   ` VDR User
2019-03-18 18:36                                     ` Heiner Kallweit
2019-03-21 19:35                                       ` VDR User
2019-03-22 19:03                                         ` Heiner Kallweit
2019-03-22 20:44                                           ` VDR User
2019-04-04 14:28                                             ` VDR User
2019-04-04 19:41                                               ` Heiner Kallweit
2019-04-04 22:11                                                 ` VDR User

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=dca8471d-d599-4318-96fe-6cffb1cdcd0a@gmail.com \
    --to=hkallweit1@gmail.com \
    --cc=alexander.duyck@gmail.com \
    --cc=alexander.h.duyck@linux.intel.com \
    --cc=netdev@vger.kernel.org \
    --cc=user.vdr@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).