From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.13]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E528623BCF4; Wed, 21 May 2025 10:04:57 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.13 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747821899; cv=none; b=ZKX7hd3D/IV3xt2V9HKSKCat/OLNWDRTA7mqsDtAQXz/nXomrm/Zon+cfg6ZNN9K+QVNCUcD/VdhtBbbquciEsIew2vofCBzVCcygz+WhLbbDStWci0ASwXYXGl21ZmgdkcCHHL2Hc9N6i/kW6uQuNymTFIydEdv3rSNVq/r8ho= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747821899; c=relaxed/simple; bh=V+k7h6+Q0/ddKSMSP5VzOiJCtuV4GZRXu/fTW8/R69M=; h=From:Date:To:cc:Subject:In-Reply-To:Message-ID:References: MIME-Version:Content-Type; b=X831k0S1LMzN/ROcc+LC6e21OsitOF1VFYjxgP9//HiVrH1YSdNVjHg1dRVhq0qz/aDogySaWVNf8bVu2QtVedql25bOvG6wCfPaFaQ85fUQHgPoUIfM8eO4us5W4UUhdd3g+McZokxx5Lwvix/YVgt94/5gvDySY3HLrcpFlEY= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=bgln4jpi; arc=none smtp.client-ip=198.175.65.13 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="bgln4jpi" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1747821898; x=1779357898; h=from:date:to:cc:subject:in-reply-to:message-id: references:mime-version; bh=V+k7h6+Q0/ddKSMSP5VzOiJCtuV4GZRXu/fTW8/R69M=; b=bgln4jpi3ZWd2d2AIFuXOCVL2Gv1XOHyQZNosvt95ArqI+Hq8Qaqn5Xz LRO/Frow7LxdrBT3fWBeFrv+bv/6Ka+pZddP1uRUCxtHuOn7+aPIsP6tS hlmgJNfkzG6l7L+ZidVMD0cnD7t5TKygrpu+MH6EMQhxl4taSMrY/MZ1O N7UV4vI6Y1U8mt98oFqwksaar33bkO9IHHoWL5no+2kNz2yGL0CIyosss 2MhHCQyrBzdE+hbAkQo/eJnapM3qSJmA3DHhYT00smSf5Dk/sMAtZs+nx TpJPg18DUQ2MaF2Tsg5nLwDJNO3BHTNSDbgl6Zia8wVf4G2uiAbvpfYez g==; X-CSE-ConnectionGUID: SEaxTx6dRnGtJSaaM6Ye5Q== X-CSE-MsgGUID: vPVeEeuWRLqm9sc+B6S4Tg== X-IronPort-AV: E=McAfee;i="6700,10204,11439"; a="60835090" X-IronPort-AV: E=Sophos;i="6.15,303,1739865600"; d="scan'208";a="60835090" Received: from fmviesa004.fm.intel.com ([10.60.135.144]) by orvoesa105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 May 2025 02:57:22 -0700 X-CSE-ConnectionGUID: mPSgTPnCRMymVa6225jeoQ== X-CSE-MsgGUID: aZustmdkRT2F/53vhlKE7g== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.15,303,1739865600"; d="scan'208";a="145242519" Received: from dalessan-mobl3.ger.corp.intel.com (HELO localhost) ([10.245.244.221]) by fmviesa004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 May 2025 02:57:16 -0700 From: =?UTF-8?q?Ilpo=20J=C3=A4rvinen?= Date: Wed, 21 May 2025 12:57:12 +0300 (EEST) To: Bjorn Helgaas cc: linux-pci@vger.kernel.org, Jon Pan-Doh , Karolina Stolarek , Martin Petersen , Ben Fuller , Drew Walton , Anil Agrawal , Tony Luck , Sathyanarayanan Kuppuswamy , Lukas Wunner , Jonathan Cameron , Sargun Dhillon , "Paul E . McKenney" , Mahesh J Salgaonkar , Oliver O'Halloran , Kai-Heng Feng , Keith Busch , Robert Richter , Terry Bowman , Shiju Jose , Dave Jiang , LKML , linuxppc-dev@lists.ozlabs.org, Bjorn Helgaas Subject: Re: [PATCH v6 14/16] PCI/AER: Introduce ratelimit for error logs In-Reply-To: <20250520193819.GA1318016@bhelgaas> Message-ID: <741055cc-42bd-2e84-20eb-a4345753fcbc@linux.intel.com> References: <20250520193819.GA1318016@bhelgaas> Precedence: bulk X-Mailing-List: linux-pci@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="8323328-714307709-1747821432=:946" This message is in MIME format. The first part should be readable text, while the remaining parts are likely unreadable without MIME-aware tools. --8323328-714307709-1747821432=:946 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: QUOTED-PRINTABLE On Tue, 20 May 2025, Bjorn Helgaas wrote: > On Tue, May 20, 2025 at 02:55:32PM +0300, Ilpo J=C3=A4rvinen wrote: > > On Mon, 19 May 2025, Bjorn Helgaas wrote: > >=20 > > > From: Jon Pan-Doh > > >=20 > > > Spammy devices can flood kernel logs with AER errors and slow/stall > > > execution. Add per-device ratelimits for AER correctable and uncorrec= table > > > errors that use the kernel defaults (10 per 5s). > > >=20 > > > There are two AER logging entry points: > > >=20 > > > - aer_print_error() is used by DPC and native AER > > >=20 > > > - pci_print_aer() is used by GHES and CXL > > >=20 > > > The native AER aer_print_error() case includes a loop that may log de= tails > > > from multiple devices. This is ratelimited by the union of ratelimit= s for > > > these devices, set by add_error_device(), which collects the devices.= If > > > no such device is found, the Error Source message is ratelimited by t= he > > > Root Port or RCEC that received the ERR_* message. > > >=20 > > > The DPC aer_print_error() case is currently not ratelimited. > > >=20 > > > The GHES and CXL pci_print_aer() cases are ratelimited by the Error S= ource > > > device. >=20 > > > static int add_error_device(struct aer_err_info *e_info, struct pci_= dev *dev) > > > { > > > +=09/* > > > +=09 * Ratelimit AER log messages. Generally we add the Error Source > > > +=09 * device, but there are is_error_source() cases that can result = in > > > +=09 * multiple devices being added here, so we OR them all together. > >=20 > > I can see the code uses OR ;-) but I wasn't helpful because this commen= t=20 > > didn't explain why at all. As this ratelimit thing is using reverse log= ic=20 > > to begin with, this is a very tricky bit. > >=20 > > Perhaps something less vague like: > >=20 > > ... we ratelimit if all devices have reached their ratelimit. > >=20 > > Assuming that was the intention here? (I'm not sure.) >=20 > My intention was that if there's any downstream device that has an > unmasked error logged and it has not reached its ratelimit, we should > log messages for all devices with errors logged. Does something like > this help? >=20 > /* > * Ratelimit AER log messages. "dev" is either the source > * identified by the root's Error Source ID or it has an unmasked > * error logged in its own AER Capability. If any of these devices > * has not reached its ratelimit, log messages for all of them. > * Messages are emitted when e_info->ratelimit is non-zero. > * > * Note that e_info->ratelimit was already initialized to 1 for the > * ERR_FATAL case. > */ Yes, this is much clearer of intent, thanks. > The ERR_FATAL case is from this post-v6 change that I haven't posted > yet: >=20 > aer_isr_one_error(...) > { > ... > if (status & PCI_ERR_ROOT_UNCOR_RCV) { > int fatal =3D status & PCI_ERR_ROOT_FATAL_RCV; > struct aer_err_info e_info =3D { > ... > + .ratelimit =3D fatal ? 1 : 0; >=20 >=20 > > > +=09 */ > > > =09if (e_info->error_dev_num < AER_MAX_MULTI_ERR_DEVICES) { > > > =09=09e_info->dev[e_info->error_dev_num] =3D pci_dev_get(dev); > > > +=09=09e_info->ratelimit |=3D aer_ratelimit(dev, e_info->severity); > > > =09=09e_info->error_dev_num++; > > > =09=09return 0; > > > =09} >=20 > > > @@ -1147,9 +1183,10 @@ static void aer_recover_work_func(struct work_= struct *work) > > > =09=09pdev =3D pci_get_domain_bus_and_slot(entry.domain, entry.bus, > > > =09=09=09=09=09=09 entry.devfn); > > > =09=09if (!pdev) { > > > -=09=09=09pr_err("no pci_dev for %04x:%02x:%02x.%x\n", > > > -=09=09=09 entry.domain, entry.bus, > > > -=09=09=09 PCI_SLOT(entry.devfn), PCI_FUNC(entry.devfn)); > > > +=09=09=09pr_err_ratelimited("%04x:%02x:%02x.%x: no pci_dev found\n", > >=20 > > This case was not mentioned in the changelog. >=20 > Sharp eyes! What do you think of this commit log text? >=20 > The CXL pci_print_aer() case is ratelimited by the Error Source device. >=20 > The GHES pci_print_aer() case is via aer_recover_work_func(), which > searches for the Error Source device. If the device is not found, ther= e's > no per-device ratelimit, so we use a system-wide ratelimit that covers = all > error types (correctable, non-fatal, and fatal). Works for me as long as it is mentioned. > This isn't really ideal because in pci_print_aer(), the struct > aer_capability_regs has already been filled by firmware and the > logging doesn't read any registers from the device at all. >=20 > However, pci_print_aer() *does* want the pci_dev for statistics and > tracing (pci_dev_aer_stats_incr()) and, of course, for the aer_printks > themselves. While not a perfect solution, this looks yet another case where it would=20 help to create a dummy pci_dev struct with minimal setup which allows=20 calling functions that input a pci_dev. That solution is not perfect because it arms a trap. Downstream=20 functions could get changed and if the developer assumes they have a full= =20 pci_dev at hand, it could cause issues with the dummy pci_dev. How likely it happens is debatable but for many cases where the call-chain isn't=20 overly complex such as here, dummy pci_dev seems helpful. > We could leave this pr_err() completely alone; hopefully it's a rare > case. I think the CXL path just silently skips pci_print_aer() if > this happens. >=20 > Eventually I would really like the native AER path to start by doing > whatever firmware is doing, e.g., fill in struct aer_capability_regs, > so the core of the AER handling could be identical between native AER > and GHES/CXL. If we could do that, maybe we could figure out a > cleaner way to handle this corner case. --=20 i. --8323328-714307709-1747821432=:946--