Date: Wed, 19 Feb 2025 11:06:40 -0600
From: Bjorn Helgaas
To: Naveen Kumar P
Cc: linux-pci@vger.kernel.org, linux-acpi@vger.kernel.org,
 linux-kernel@vger.kernel.org, kernelnewbies
Subject: Re: PCI: hotplug_event: PCIe PLDA Device BAR Reset
Message-ID: <20250219170640.GA219612@bhelgaas>

[+cc linux-acpi]

On Wed, Feb 19, 2025 at 05:52:47PM +0530, Naveen Kumar P wrote:
> Hi all,
>
> I am writing to seek assistance with an issue we are experiencing with
> a PCIe device (PLDA Device 5555) connected through PCI Express Root
> Port 1 to the host bridge.
>
> We have observed that after booting the system, the Base Address
> Register (BAR0) memory of this device gets reset to 0x0 after
> approximately one hour or more (the timing is inconsistent). This was
> verified using the lspci output and the setpci -s 01:00.0
> BASE_ADDRESS_0 command.
>
> To diagnose the issue, we checked the dmesg log, but it did not
> provide any relevant information. I then enabled dynamic debugging for
> the PCI subsystem (drivers/pci/*) and noticed the following messages
> related to ACPI hotplug in the dmesg log:
>
> [ 0.465144] pci 0000:01:00.0: reg 0x10: [mem 0xb0400000-0xb07fffff]
> ...
> [ 6710.000355] ACPI: \_SB_.PCI0.RP01: acpiphp_glue: Bus check in hotplug_event()
> [ 7916.250868] perf: interrupt took too long (4072 > 3601), lowering kernel.perf_event_max_sample_rate to 49000
> [ 7984.719647] perf: interrupt took too long (5378 > 5090), lowering kernel.perf_event_max_sample_rate to 37000
> [11051.409115] ACPI: \_SB_.PCI0.RP01: acpiphp_glue: Bus check in hotplug_event()
> [11755.388727] ACPI: \_SB_.PCI0.RP01: acpiphp_glue: Bus check in hotplug_event()
> [12223.885715] ACPI: \_SB_.PCI0.RP01: acpiphp_glue: Bus check in hotplug_event()
> [14303.465636] ACPI: \_SB_.PCI0.RP01: acpiphp_glue: Bus check in hotplug_event()
>
> After these messages appear, reading the device BAR memory results in
> 0x0 instead of the expected value.
>
> I would like to understand the following:
>
> 1. What could be causing these hotplug_event debug messages?

This is an ACPI Notify event.  Basically the platform is telling us to
re-enumerate the hierarchy below RP01 because a device might have been
added or removed.

Unfortunately the only real information we get is the ACPI device
(RP01) and the notification value (ACPI_NOTIFY_BUS_CHECK).

You could instrument acpiphp_check_bridge() to see what path we take.
The main paths look like enable_slot() or disable_slot(), but those
both include a pr_debug() that you apparently don't see.

A remove followed by add would definitely reset the device, including
its BARs.  But you would normally see some messages related to
enumerating a new device.

If this doesn't help, try to reproduce the problem with a recent
kernel, e.g., v6.13, and post the complete dmesg log.

> 2. Why does this result in the BAR memory being reset?
> 3. How can we resolve this issue?
>
> I have verified that the issue occurs even without loading the driver
> for the PLDA Device 5555, so it does not appear to be related to the
> device driver.
>
> Any help or guidance on debugging this issue would be greatly appreciated.
>
> Thank you for your assistance.
>
> Best regards,
> Naveen
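
One way to narrow down which "Bus check in hotplug_event()" message
coincides with the BAR reset is to log BAR0 changes with a timestamp and
compare them against dmesg. The following is only a sketch, not something
proposed in this thread: it assumes the device sits at 0000:01:00.0 (the
01:00.0 address from the report, in PCI domain 0000) and reads BAR0 from
the sysfs config file at offset 0x10, the same register the "reg 0x10"
dmesg line refers to. The function names are made up for illustration,
and time.monotonic() is usually close to the dmesg time base on Linux
but is not guaranteed to match it exactly.

```python
import struct
import time
from pathlib import Path

# Assumption: 01:00.0 from the report, in PCI domain 0000.
CONFIG_PATH = Path("/sys/bus/pci/devices/0000:01:00.0/config")
BAR0_OFFSET = 0x10  # BASE_ADDRESS_0 in a type 0 configuration header


def read_bar0(config: bytes) -> int:
    """Return the raw 32-bit BAR0 value from a config-space image."""
    if len(config) < BAR0_OFFSET + 4:
        raise ValueError("config space image too short to contain BAR0")
    (value,) = struct.unpack_from("<I", config, BAR0_OFFSET)
    return value


def bar0_base(raw: int) -> int:
    """Mask off the low four flag bits of a memory BAR."""
    return raw & 0xFFFFFFF0


def watch(path: Path = CONFIG_PATH, interval: float = 10.0) -> None:
    """Poll BAR0 forever and print a line each time its value changes."""
    last = None
    while True:
        raw = read_bar0(path.read_bytes())
        if raw != last:
            stamp = time.monotonic()
            print(f"[{stamp:12.6f}] BAR0 raw=0x{raw:08x} "
                  f"base=0x{bar0_base(raw):08x}")
            last = raw
        time.sleep(interval)
```

Calling watch() in a terminal and leaving it running until the BAR reads
0x0 gives a timestamp that can be lined up with the nearest bus check
message in the complete dmesg log.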