From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from out30-98.freemail.mail.aliyun.com (out30-98.freemail.mail.aliyun.com [115.124.30.98]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 95EA51A0B15; Mon, 20 Oct 2025 15:21:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=115.124.30.98 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1760973677; cv=none; b=NkCs9lJw9zdMR0/TE30cKRDQEGQh4bus/sqAUqxYg7vuIvpi43sbilojVjOZy3sKbI/me+SEN99cFzxmmdPrh58hZL3nqkGLEewVztjnglisU9L4ZL/JAt1uo3ce8i1hjSdrbDZLWZ8Giwm8Qv4Ivsg9NZHkHAirY39yPEzYuNc= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1760973677; c=relaxed/simple; bh=sySpu83x3D2k+B0Y3xJGro9AoTYQQDg2MjIJhobhCIA=; h=Message-ID:Date:MIME-Version:From:Subject:To:Cc:References: In-Reply-To:Content-Type; b=MtXUmbJCB5YxFs4G3cRFlxtKlWG3nEGQYGA5FCDlCKse5IyK34XP1OXNGcRAVGWquXfV0IieVYeCdlREIb4fIlPjeEg1TpDxZUOIUxlC+sepTG75ALfAD36ISBkqBJpV3rBmuMntGy7XSYptACs8A1gWhd+ZvjBZbqu4FPz3CB0= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com; spf=pass smtp.mailfrom=linux.alibaba.com; dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b=dSf/WOh1; arc=none smtp.client-ip=115.124.30.98 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.alibaba.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b="dSf/WOh1" DKIM-Signature:v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.alibaba.com; s=default; t=1760973664; h=Message-ID:Date:MIME-Version:From:Subject:To:Content-Type; bh=Vou/dzf4KP++tes6RLuekqGJp6PWAI7RQxEaTe2LEz4=; b=dSf/WOh15QX4onLkMBaXmVSYu0N6hCGfPAX+X9CSaLLjb3SAlaM89BuCB/QiBIdpmSmQwPpXFJuetHYs2jhInJBwqD+N3eS+R6qSgCSU10JMRwc/SPLkULK01UjEKLp/xIq2MEZQc7oSQIMEk1vPVuwa+efUAHSAz+KYupzfk1E= Received: from 30.246.161.241(mailfrom:xueshuai@linux.alibaba.com fp:SMTPD_---0WqdMvSw_1760973662 cluster:ay36) by smtp.aliyun-inc.com; Mon, 20 Oct 2025 23:21:03 +0800 Message-ID: <30fe11dd-3f21-4a61-adb0-74e39087c84c@linux.alibaba.com> Date: Mon, 20 Oct 2025 23:20:58 +0800 Precedence: bulk X-Mailing-List: linux-pci@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 User-Agent: Mozilla Thunderbird From: Shuai Xue Subject: Re: [PATCH v6 3/5] PCI/AER: Report fatal errors of RCiEP and EP if link recoverd To: Lukas Wunner Cc: linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, bhelgaas@google.com, kbusch@kernel.org, sathyanarayanan.kuppuswamy@linux.intel.com, mahesh@linux.ibm.com, oohall@gmail.com, Jonathan.Cameron@huawei.com, terry.bowman@amd.com, tianruidong@linux.alibaba.com References: <20251015024159.56414-1-xueshuai@linux.alibaba.com> <20251015024159.56414-4-xueshuai@linux.alibaba.com> <6d7143a3-196f-49f8-8e71-a5abc81ae84b@linux.alibaba.com> <43390d36-147f-482c-b31a-d02c2624061f@linux.alibaba.com> In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit 在 2025/10/20 22:24, Lukas Wunner 写道: > On Mon, Oct 20, 2025 at 10:17:10PM +0800, Shuai Xue wrote: >> void aer_report_frozen_error(struct pci_dev *dev) >> { >> struct aer_err_info info; >> >> if (dev->pci_type != PCI_EXP_TYPE_ENDPOINT && >> dev->pci_type != PCI_EXP_TYPE_RC_END) >> return; >> >> aer_info_init(&info); >> aer_add_error_device(&info, dev); >> info.severity = AER_FATAL; >> if (aer_get_device_error_info(&info, 0, true)) >> aer_print_error(&info, 0); >> >> /* pci_dev_put() pairs with pci_dev_get() in aer_add_error_device() */ >> pci_dev_put(dev); >> } Hi Lukas, > > Much better. Again, I think you don't need to rename add_error_device() > and then the code comment even fits on the same line: Good point. I'll keep the original function name and use the one-line comment format. > > pci_dev_put(dev); /* pairs with pci_dev_get() in add_error_device() */ > >>>> .slot_reset() >>>> => pci_restore_state() >>>> => pci_aer_clear_status() >>> >>> This was added in 2015 by b07461a8e45b. The commit claims that >>> the errors are stale and can be ignored. It turns out they cannot. >>> >>> So maybe pci_restore_state() should print information about the >>> errors before clearing them? >> >> While that could work, we would lose the error severity information at > > Wait, we've got that saved in pci_cap_saved_state, so we could restore > the severity register, report leftover errors, then clear those errors? You're right that the severity register is also sticky, so we could retrieve error severity directly from AER registers. However, I have concerns about implementing this approach: 1. Scope creep: The current implementation is explicit - we know we're in an AER error recovery path and intentionally report the missed fatal errors. Your suggestion would make error reporting implicit during any device state restoration. 2. Wider impact: pci_save_state() and pci_restore_state() are used extensively beyond AER error handling - in power management, reset operations, and various driver flows. Adding error reporting to pci_restore_state() could generate unexpected error messages in non-error scenarios. 3. Architectural consistency: As you noted earlier, "pci_restore_state() is only supposed to restore state, as the name implies, and not clear errors." Adding error reporting to this function would further violate this principle - we'd be making it do even more than just restore state. Would you prefer I implement this broader change, or shall we proceed with the targeted helper function approach for now? The helper function solves the immediate problem while keeping the changes focused on the AER recovery path. > Thanks, > > Lukas Thanks. Shuai