From: Vinicius Costa Gomes
To: Stefan Dietrich
Cc: kuba@kernel.org, greg@kroah.com, netdev@vger.kernel.org,
	intel-wired-lan@lists.osuosl.org, regressions@lists.linux.dev
Subject: Re: [PATCH] igc: Avoid possible deadlock during suspend/resume
In-Reply-To: <5a4b31d43d9bf32e518188f3ef84c433df3a18b1.camel@gmx.de>
References: <87r1awtdx3.fsf@intel.com>
	<20211201185731.236130-1-vinicius.gomes@intel.com>
	<5a4b31d43d9bf32e518188f3ef84c433df3a18b1.camel@gmx.de>
Date: Thu, 02 Dec 2021 14:34:21 -0800
Message-ID: <87o85yljpu.fsf@intel.com>
X-Mailing-List: netdev@vger.kernel.org

Hi Stefan,

Stefan Dietrich writes:

> Hi Vinicius,
>
> thanks for the patch - unfortunately it did not solve the issue and I
> am still getting reboots/lockups.
>

Thanks for the test. We learned something, not a lot, but something: the
problem you are facing is PTM related and it's not the same bug as that
PM deadlock.

I am still trying to understand what's going on.

Are you able to send me the 'dmesg' output for the two kernel configs
(CONFIG_PCIE_PTM enabled and disabled)? (no need to bring the network
interface up or down). Your kernel .config would be useful as well.

>
> Cheers,
> Stefan
>
> On Wed, 2021-12-01 at 10:57 -0800, Vinicius Costa Gomes wrote:
>> Inspired by:
>> https://bugzilla.kernel.org/show_bug.cgi?id=215129
>>
>> Signed-off-by: Vinicius Costa Gomes
>> ---
>> Just to see if it's indeed the same problem as the bug report above.
>>
>>  drivers/net/ethernet/intel/igc/igc_main.c | 19 +++++++++++++------
>>  1 file changed, 13 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c
>> b/drivers/net/ethernet/intel/igc/igc_main.c
>> index 0e19b4d02e62..c58bf557a2a1 100644
>> --- a/drivers/net/ethernet/intel/igc/igc_main.c
>> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
>> @@ -6619,7 +6619,7 @@ static void igc_deliver_wake_packet(struct
>> net_device *netdev)
>>  	netif_rx(skb);
>>  }
>>
>> -static int __maybe_unused igc_resume(struct device *dev)
>> +static int __maybe_unused __igc_resume(struct device *dev, bool rpm)
>>  {
>>  	struct pci_dev *pdev = to_pci_dev(dev);
>>  	struct net_device *netdev = pci_get_drvdata(pdev);
>> @@ -6661,20 +6661,27 @@ static int __maybe_unused igc_resume(struct
>> device *dev)
>>
>>  	wr32(IGC_WUS, ~0);
>>
>> -	rtnl_lock();
>> +	if (!rpm)
>> +		rtnl_lock();
>>  	if (!err && netif_running(netdev))
>>  		err = __igc_open(netdev, true);
>>
>>  	if (!err)
>>  		netif_device_attach(netdev);
>> -	rtnl_unlock();
>> +	if (!rpm)
>> +		rtnl_unlock();
>>
>>  	return err;
>>  }
>>
>>  static int __maybe_unused igc_runtime_resume(struct device *dev)
>>  {
>> -	return igc_resume(dev);
>> +	return __igc_resume(dev, true);
>> +}
>> +
>> +static int __maybe_unused igc_resume(struct device *dev)
>> +{
>> +	return __igc_resume(dev, false);
>>  }
>>
>>  static int __maybe_unused igc_suspend(struct device *dev)
>> @@ -6738,7 +6745,7 @@ static pci_ers_result_t
>> igc_io_error_detected(struct pci_dev *pdev,
>>   * @pdev: Pointer to PCI device
>>   *
>>   * Restart the card from scratch, as if from a cold-boot.
>> Implementation
>> - * resembles the first-half of the igc_resume routine.
>> + * resembles the first-half of the __igc_resume routine.
>>   **/
>>  static pci_ers_result_t igc_io_slot_reset(struct pci_dev *pdev)
>>  {
>> @@ -6777,7 +6784,7 @@ static pci_ers_result_t
>> igc_io_slot_reset(struct pci_dev *pdev)
>>   *
>>   * This callback is called when the error recovery driver tells us
>> that
>>   * its OK to resume normal operation. Implementation resembles the
>> - * second-half of the igc_resume routine.
>> + * second-half of the __igc_resume routine.
>>   */
>>  static void igc_io_resume(struct pci_dev *pdev)
>>  {
>

Cheers,
-- 
Vinicius
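
For readers following the quoted patch, the locking change can be sketched in user space. This is only an illustration, not driver code: the fake_rtnl_* helpers and model_* functions are hypothetical names, the lock is a plain flag rather than the kernel's RTNL mutex, and the sketch assumes the runtime-resume path is entered with the lock already held by the caller, which is what the rpm flag in the patch appears to presume.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a non-recursive lock such as RTNL. */
static bool fake_rtnl_held;

static void fake_rtnl_lock(void)
{
	/* A real non-recursive mutex would block forever on re-entry;
	 * the sketch turns that self-deadlock into an assertion failure.
	 */
	assert(!fake_rtnl_held && "self-deadlock: lock already held");
	fake_rtnl_held = true;
}

static void fake_rtnl_unlock(void)
{
	assert(fake_rtnl_held);
	fake_rtnl_held = false;
}

/* Same shape as __igc_resume(dev, rpm) in the quoted patch:
 * take the lock only when the caller does not already hold it.
 */
static int model_resume(bool rpm)
{
	int err;

	if (!rpm)
		fake_rtnl_lock();

	/* Placeholder for the work that must run under the lock. */
	err = fake_rtnl_held ? 0 : -1;

	if (!rpm)
		fake_rtnl_unlock();

	return err;
}

/* System resume: nobody holds the lock yet, so the helper takes it. */
static int model_system_resume(void)
{
	return model_resume(false);
}

/* Runtime resume: assumed to be called with the lock already held. */
static int model_runtime_resume(void)
{
	return model_resume(true);
}
```

Calling model_system_resume() with the lock free, or model_runtime_resume() after fake_rtnl_lock(), returns 0 in this model. Calling model_resume(false) while the lock is held trips the assertion, which is the user-space analogue of the deadlock the patch tries to avoid.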