Subject: Re: r8169 driver from kernel 5.0 crashing - napi_consume_skb
To: VDR User
Cc: Alexander Duyck, netdev@vger.kernel.org
From: Heiner Kallweit
Date: Sun, 17 Mar 2019 12:35:45 +0100

On 16.03.2019 15:38, VDR User wrote:
>> Part of the issue though is that we don't know how reliable that test
>> was. I believe Derek hasn't had any crashes, but he wasn't confident
>> that it had actually resolved the issue.
>
> Previously I thought I could easily & consistently reproduce the crash
> but the more testing I did, the more I realized that wasn't the case.
> That's why my confidence was low in that reversing commit 5317d5c6d47e
> ("r8169: use napi_consume_skb where possible") fixed it. I felt like I
> needed to do a lot more testing over the weekend to be sure. But, I
> can now confirm that reversing that commit did not solve the problem.
> I didn't ifdown/ifup after the crash so the nic eventually recovered
> on its own I guess.
> The `ethtool -S` output is:
>
> NIC statistics:
> tx_packets: 5370650
> rx_packets: 57340787
> tx_errors: 0
> rx_errors: 0
> rx_missed: 26
> align_errors: 0
> tx_single_collisions: 0
> tx_multi_collisions: 0
> unicast: 57332905
> broadcast: 6409
> multicast: 1473
> tx_aborted: 0
> tx_underrun: 0
> [...]
>
> Please let me know if there's anything I can do to help.
> Derek
>

Below are two patches. The first removes an extra PCI register read from
the interrupt handler; the second just adds some tracing for debugging.
It would be interesting to know whether patch 1 has an impact on the
issue, and to see the trace output of patch 2 after the issue has
occurred (with or without patch 1). You can find the trace output in
/sys/kernel/debug/tracing/trace.

Heiner

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 761097710..46a4dc888 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -679,6 +679,7 @@ struct rtl8169_private {
 		struct work_struct work;
 	} wk;
 
+	unsigned irq_enabled:1;
 	unsigned supports_gmii:1;
 	dma_addr_t counters_phys_addr;
 	struct rtl8169_counters *counters;
@@ -1294,6 +1295,7 @@ static void rtl_ack_events(struct rtl8169_private *tp, u16 bits)
 static void rtl_irq_disable(struct rtl8169_private *tp)
 {
 	RTL_W16(tp, IntrMask, 0);
+	tp->irq_enabled = 0;
 }
 
 #define RTL_EVENT_NAPI_RX (RxOK | RxErr)
@@ -1302,6 +1304,7 @@ static void rtl_irq_disable(struct rtl8169_private *tp)
 
 static void rtl_irq_enable(struct rtl8169_private *tp)
 {
+	tp->irq_enabled = 1;
 	RTL_W16(tp, IntrMask, tp->irq_mask);
 }
 
@@ -6521,9 +6524,8 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
 {
 	struct rtl8169_private *tp = dev_instance;
 	u16 status = RTL_R16(tp, IntrStatus);
-	u16 irq_mask = RTL_R16(tp, IntrMask);
 
-	if (status == 0xffff || !(status & irq_mask))
+	if (!tp->irq_enabled || status == 0xffff || !(status & tp->irq_mask))
 		return IRQ_NONE;
 
 	if (unlikely(status & SYSErr)) {
-- 
2.21.0

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 46a4dc888..5a40fa6f8 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -6258,6 +6258,7 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
 		 * not miss a ring update when it notices a stopped queue.
 		 */
 		smp_wmb();
+		trace_printk("stopping tx queue\n");
 		netif_stop_queue(dev);
 		/* Sync with rtl_tx:
 		 * - publish queue status and cur_tx ring index (write barrier)
@@ -6267,8 +6268,10 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
 		 * can't.
 		 */
 		smp_mb();
-		if (rtl_tx_slots_avail(tp, MAX_SKB_FRAGS))
+		if (rtl_tx_slots_avail(tp, MAX_SKB_FRAGS)) {
+			trace_printk("waking tx queue\n");
 			netif_wake_queue(dev);
+		}
 	}
 
 	return NETDEV_TX_OK;
@@ -6376,6 +6379,7 @@ static void rtl_tx(struct net_device *dev, struct rtl8169_private *tp,
 		smp_mb();
 		if (netif_queue_stopped(dev) &&
 		    rtl_tx_slots_avail(tp, MAX_SKB_FRAGS)) {
+			trace_printk("waking tx queue\n");
 			netif_wake_queue(dev);
 		}
 		/*
-- 
2.21.0
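The first patch replaces the IntrMask read in rtl8169_interrupt() with state the driver already keeps in software: the new irq_enabled bit (set in rtl_irq_enable(), cleared in rtl_irq_disable()) plus the cached tp->irq_mask. An interrupt arriving while interrupts are disabled is therefore still reported as IRQ_NONE, just as it was when the hardware mask read back as 0. The following is a minimal userspace sketch of that pattern only; struct fake_nic, mmio_read16() and the other helpers are hypothetical stand-ins for the hardware and for RTL_R16/RTL_W16, not driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the NIC registers plus the driver's software state.
 * Register reads are counted to show which accesses the handler performs. */
struct fake_nic {
	uint16_t reg_intr_status; /* models the IntrStatus register */
	uint16_t reg_intr_mask;   /* models the IntrMask register */
	unsigned int mmio_reads;  /* number of (slow) register reads so far */
	uint16_t irq_mask;        /* software copy, like tp->irq_mask */
	bool irq_enabled;         /* software flag, like tp->irq_enabled */
};

static uint16_t mmio_read16(struct fake_nic *nic, const uint16_t *reg)
{
	nic->mmio_reads++;
	return *reg;
}

static void irq_disable(struct fake_nic *nic)
{
	nic->reg_intr_mask = 0;
	nic->irq_enabled = false;
}

static void irq_enable(struct fake_nic *nic)
{
	/* Assumption of this sketch: a mask must be configured first. */
	assert(nic->irq_mask != 0);
	nic->irq_enabled = true;
	nic->reg_intr_mask = nic->irq_mask;
}

/* Returns true for "handled", false for "not ours" (IRQ_NONE).
 * Only the status register is read; the mask check uses the software copy. */
static bool handle_irq(struct fake_nic *nic)
{
	uint16_t status = mmio_read16(nic, &nic->reg_intr_status);

	if (!nic->irq_enabled || status == 0xffff || !(status & nic->irq_mask))
		return false;
	/* ... acknowledge events and schedule NAPI here ... */
	return true;
}
```

With this structure each interrupt costs one register read instead of two, which is the saving the patch is after; whether that matters for the crash is exactly what the test should show.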
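The second patch only traces where the tx queue is stopped and woken. The flow control around those trace points can be modelled as a ring with two free-running counters: the transmit path advances cur_tx, the completion path (rtl_tx) advances dirty_tx, and a packet with nr_frags fragments needs nr_frags + 1 descriptors. The sketch below is a simplified, single-threaded model under those assumptions; the ring size, MAX_FRAGS, struct tx_ring and the helper names are hypothetical, and the memory barriers the driver needs are deliberately left out. It prints the same messages the patch logs, at the points where the driver would log them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define NUM_TX_DESC 64u /* ring size chosen for this model */
#define MAX_FRAGS   17u /* stand-in for MAX_SKB_FRAGS */

struct tx_ring {
	unsigned int cur_tx;   /* next descriptor the transmit path will use */
	unsigned int dirty_tx; /* next descriptor to reclaim on completion */
	bool stopped;          /* models netif_queue_stopped() */
};

/* Free descriptors must exceed nr_frags, since a packet needs nr_frags + 1.
 * Unsigned arithmetic keeps this correct when the counters wrap around. */
static bool tx_slots_avail(const struct tx_ring *r, unsigned int nr_frags)
{
	unsigned int slots_avail = r->dirty_tx + NUM_TX_DESC - r->cur_tx;

	return slots_avail > nr_frags;
}

/* Transmit side: queue one packet, then stop the queue if a worst-case
 * packet might no longer fit, and re-check once (as start_xmit does). */
static void xmit(struct tx_ring *r, unsigned int nr_frags)
{
	/* The stack never transmits on a stopped queue; a bug in this model. */
	assert(!r->stopped && tx_slots_avail(r, nr_frags));
	r->cur_tx += nr_frags + 1;

	if (!tx_slots_avail(r, MAX_FRAGS)) {
		puts("stopping tx queue");
		r->stopped = true;
		if (tx_slots_avail(r, MAX_FRAGS)) {
			puts("waking tx queue");
			r->stopped = false;
		}
	}
}

/* Completion side: reclaim n descriptors and wake a stopped queue once a
 * worst-case packet fits again (as rtl_tx does). */
static void tx_complete(struct tx_ring *r, unsigned int n)
{
	assert(n <= r->cur_tx - r->dirty_tx);
	r->dirty_tx += n;

	if (r->stopped && tx_slots_avail(r, MAX_FRAGS)) {
		puts("waking tx queue");
		r->stopped = false;
	}
}
```

In this model, 47 single-descriptor packets on an empty 64-entry ring leave 17 free slots, which triggers "stopping tx queue"; reclaiming a single descriptor then logs "waking tx queue". In the trace from the real driver, a "stopping tx queue" line with no later "waking tx queue" would point at a missed wake-up rather than at the interrupt handler.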