Subject: Re: r8169 driver from kernel 5.0 crashing - napi_consume_skb
To: VDR User, Alexander Duyck
Cc: netdev@vger.kernel.org
References: <753b56b8-f1ab-82f5-f9b5-089fbb638989@gmail.com> <02388deb-0a06-95ae-1aac-b39c108fc2e7@gmail.com>
From: Heiner Kallweit
Message-ID: <9b34d60d-8de7-5384-3822-98ec79d53e04@gmail.com>
Date: Fri, 15 Mar 2019 21:26:08 +0100
X-Mailing-List: netdev@vger.kernel.org

On 15.03.2019 21:09, VDR User wrote:
>>>>> Thanks for the additional info and for testing 4.20.15.
>>>>> To rule out that the issue is caused by a regression in network or
>>>>> some other subsystem: Can you take the r8169.c from 4.20.15 and test
>>>>> it on top of 5.0?
>>>>> Meanwhile I'll look at the changes in the driver between 4.20 and 5.0.
>>>>
>>>> Sure, no problem! I'll copy the driver & recompile now actually.
>>>> Hopefully there aren't a ton of changes to r8169.c to sift through and
>>>> the cause isn't good at hiding itself!
>>>>
>>> I checked the driver changes new in 5.0 and there are very few
>>> functional changes. You could try to revert the following:
>>>
>>> 5317d5c6d47e ("r8169: use napi_consume_skb where possible")
>>
>> Will do, and fwiw, while I haven't been able to do tons of testing
>> today, I haven't been able to trigger the crash after replacing
>> 5.0.0's r8169.c with 4.20.15's r8169.c this morning. I'll restore the
>> file and revert the change you mentioned, and report back my findings.
>
> Heiner,
>
> After going back to vanilla kernel 5.0 and then reverting 5317d5c6d47e
> ("r8169: use napi_consume_skb where possible"), I so far have not had
> any crashes after transferring roughly 30GB back & forth. I'm not
> completely confident yet the crash is resolved with that revert and
> will continue to do further testing throughout the weekend as well.
> What confidence level do you have that 5317d5c6d47e is the culprit at
> this point?
>
Good, thanks for testing. I simply see no other change since 4.20 that
could cause these symptoms. Using napi_consume_skb() at this place in
r8169.c looks safe to me. Option 1 is that I'm missing something, option 2
is that there's an issue in the NAPI subsystem. However, in the latter
case I assume at least the Mellanox and/or Intel guys would have observed
the same issue on their respective CI systems.
Let me add Alexander, maybe he can provide a hint before we go and revert
the change.

> Thanks,
> Derek
>
Heiner