public inbox for linux-kernel@vger.kernel.org
From: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
To: Lukas Wunner <lukas@wunner.de>
Cc: linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
	Bjorn Helgaas <bhelgaas@google.com>,
	oohall@gmail.com, Mahesh J Salgaonkar <mahesh@linux.ibm.com>,
	Kuppuswamy Sathyanarayanan 
	<sathyanarayanan.kuppuswamy@linux.intel.com>,
	Yazen Ghannam <yazen.ghannam@amd.com>,
	Fontenot Nathan <Nathan.Fontenot@amd.com>
Subject: Re: [PATCH v2 1/2] PCI: pciehp: Add support for async hotplug with native AER and DPC/EDR
Date: Mon, 22 May 2023 15:23:57 -0700
Message-ID: <8ab986f2-6aa5-401a-aa21-e8b21f68eaad@amd.com>
In-Reply-To: <20230516101001.GA18952@wunner.de>

Hi,

On 5/16/2023 3:10 AM, Lukas Wunner wrote:
> On Tue, Apr 18, 2023 at 09:05:25PM +0000, Smita Koralahalli wrote:
>> According to Section 6.7.6 of PCIe Base Specification [1], async removal
>> with DPC and EDR may be unexpected and may result in a Surprise Down error.
>> This error is just a side effect of hot remove. Most of the time, these
>> errors will be abstract to the OS as current systems rely on Firmware-First
>> model for AER and DPC, where the error handling (side effects of async
>> remove) and other necessary HW sequencing actions is taken care by the FW
>> and is abstract to the OS. However, FW-First model poses issues while
>> rolling out updates or fixing bugs as the servers need to be brought down
>> for firmware updates.
>>
>> Add support for async hot-plug with native AER and DPC/EDR. Here, OS is
>> responsible for handling async add and remove along with handling of AER
>> and DPC events which are generated as a side-effect of async remove.
> 
> I think you can probably leave out the details about Firmware-First.
> Pointing to 6.7.6 and the fact that Surprise Down Errors may occur as
> an expected side-effect of surprise removal is sufficient.  They should
> be treated as such.
> 
> You also want to point out what you're trying to achieve here, i.e. get
> rid of irritating log messages and a 1 sec delay while pciehp waits for
> DPC recovery.

Okay.

> 
> 
>> Please note that, I have provided explanation why I'm not setting the
>> Surprise Down bit in uncorrectable error mask register in AER.
>> https://lore.kernel.org/all/fba22d6b-c225-4b44-674b-2c62306135ed@amd.com/
> 
> Add an explanation to the commit message that masking Surprise Down Errors
> was explored as an alternative approach, but scrapped due to the odd
> behavior that masking only avoids the interrupt, but still records an
> error per PCIe r6.0.1 sec 6.2.3.2.2.  That stale error is going to be
> reported the next time some error other than Surprise Down is handled.
> 
> 

Okay.

>> Also, while testing I noticed PCI_STATUS and PCI_EXP_DEVSTA will be set
>> on an async remove and will not be cleared while the device is brought
>> down. I have included clearing them here in order to mask any kind of
>> appearance that there was an error and as well duplicating our BIOS
>> functionality. I can remove if its not necessary.
> 
> Which bits are set exactly?  Can you constrain the register write to
> those bits?
>
Hmm, I was mostly trying to follow the approach taken for AER in
pci_aer_raw_clear_status(), which clears the status registers
unconditionally. I also thought that might be better if we are dealing
with legacy endpoints or so.

I see these bits set in status registers:
PCI_ERR_UNCOR_STATUS 0x20 (Surprise Down)
PCI_EXP_DPC_RP_PIO_STATUS 0x10000 (Memory Request received UR Completion)
PCI_EXP_DEVSTA 0x604 (fatal error detected)
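
To illustrate what constraining the write to specific bits would look
like: the AER uncorrectable status and Device Status error bits are
write-1-to-clear (RW1C), so writing back only the masked value leaves
every other bit untouched. Below is a minimal sketch in plain C with the
register modeled as a variable. rw1c_write(), clear_only() and example()
are hypothetical helpers for illustration only, not kernel functions; the
two constants are copied from include/uapi/linux/pci_regs.h.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from include/uapi/linux/pci_regs.h */
#define PCI_ERR_UNC_SURPDN	0x00000020	/* Surprise Down */
#define PCI_EXP_DEVSTA_FED	0x0004		/* Fatal Error Detected */

/* Hypothetical model of an RW1C register: 1 bits written clear, 0 bits are kept. */
static uint32_t rw1c_write(uint32_t reg, uint32_t val)
{
	return reg & ~val;
}

/* Hypothetical helper: clear only the bits selected by mask, leave the rest. */
static uint32_t clear_only(uint32_t reg, uint32_t mask)
{
	return rw1c_write(reg, reg & mask);
}

/* Usage with the values reported above (register contents are simulated). */
static void example(void)
{
	/* Only Surprise Down is cleared from the uncorrectable status. */
	assert(clear_only(0x20, PCI_ERR_UNC_SURPDN) == 0);
	/* Only Fatal Error Detected is cleared; the other DEVSTA bits stay. */
	assert(clear_only(0x604, PCI_EXP_DEVSTA_FED) == 0x600);
}
```

In the driver itself the same idea would presumably be a config-space
read followed by a write of the masked value, rather than an
unconditional write of the whole register.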

> 
>> +static void dpc_handle_surprise_removal(struct pci_dev *pdev)
>> +{
>> +	if (pdev->dpc_rp_extensions && dpc_wait_rp_inactive(pdev))
>> +		return;
> 
> Emit an error message here?

Okay

> 
> 
>> +	/*
>> +	 * According to Section 6.7.6 of the PCIe Base Spec 6.0, since async
>> +	 * removal might be unexpected, errors might be reported as a side
>> +	 * effect of the event and software should handle them as an expected
>> +	 * part of this event.
>> +	 */
> 
> I'd move that code comment into dpc_handler() above the
> "if (dpc_is_surprise_removal(pdev))" check.

Okay.

> 
> 
>> +	pci_aer_raw_clear_status(pdev);
>> +	pci_clear_surpdn_errors(pdev);
>> +
>> +	pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_STATUS,
>> +			      PCI_EXP_DPC_STATUS_TRIGGER);
>> +}
> 
> Do you need a "wake_up_all(&dpc_completed_waitqueue);" at the end
> of the function to wake up a pciehp handler waiting for DPC recovery?

I don't think so. The pciehp handler is getting invoked simultaneously
anyway due to the PDSC or DLLSC event, right? Let me know if I'm missing
anything here.

> 
> 
>> +static bool dpc_is_surprise_removal(struct pci_dev *pdev)
>> +{
>> +	u16 status;
>> +
>> +	pci_read_config_word(pdev, pdev->aer_cap + PCI_ERR_UNCOR_STATUS, &status);
>> +
>> +	if (!(status & PCI_ERR_UNC_SURPDN))
>> +		return false;
>> +
> 
> You need an additional check for pdev->is_hotplug_bridge here.
> 
> And you need to read PCI_EXP_SLTCAP and check for PCI_EXP_SLTCAP_HPS.
> 
> Return false if either of them isn't set.

Return false only if PCI_EXP_SLTCAP isn't set, correct?
PCI_EXP_SLTCAP_HPS should be disabled if DPC is enabled.

Implementation notes in 6.7.6 says that:
"The Hot-Plug Surprise (HPS) mechanism, as indicated by the Hot-Plug
Surprise bit in the Slot Capabilities Register being Set, is deprecated
for use with async hot-plug. DPC is the recommended mechanism for 
supporting async hot-plug."

Platform FW will disable the SLTCAP_HPS bit at boot time to enable async 
hotplug on AMD devices.

Should we instead check whether the SLTCAP_HPS bit is set and return
false in that case?
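
One possible reading of that alternative, as a side-effect-free
predicate, is sketched below. This is only an illustration of the
question being asked, not a settled design: struct port_snapshot and
looks_like_surprise_removal() are hypothetical names, the port state is
passed in as plain values instead of being read from config space, and
the HPS polarity (return false when HPS is set) is exactly the point
still under discussion. The two constants are copied from
include/uapi/linux/pci_regs.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values copied from include/uapi/linux/pci_regs.h */
#define PCI_ERR_UNC_SURPDN	0x00000020	/* Surprise Down */
#define PCI_EXP_SLTCAP_HPS	0x00000020	/* Hot-Plug Surprise */

/* Hypothetical snapshot of the state the check would read from the port. */
struct port_snapshot {
	bool is_hotplug_bridge;		/* port has a hot-plug capable slot */
	uint32_t uncor_status;		/* AER Uncorrectable Error Status */
	uint32_t sltcap;		/* Slot Capabilities */
};

/* Pure predicate: inspects state only, clears nothing. */
static bool looks_like_surprise_removal(const struct port_snapshot *p)
{
	assert(p != NULL);

	if (!p->is_hotplug_bridge)
		return false;

	/* Assumption: HPS set means DPC is not the async hot-plug mechanism. */
	if (p->sltcap & PCI_EXP_SLTCAP_HPS)
		return false;

	return (p->uncor_status & PCI_ERR_UNC_SURPDN) != 0;
}
```

Keeping the predicate free of side effects also matches the later review
comment that an "..._is_..." function should not clear status itself.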

> 
> 
>> +	dpc_handle_surprise_removal(pdev);
>> +
>> +	return true;
>> +}
> 
> A function named "..._is_..." should not have any side effects.
> So move the dpc_handle_surprise_removal() call down into dpc_handler()
> before the "return IRQ_HANDLED;" statement.

Okay.

> 
> 
> What about ports which support AER but not DPC?  Don't you need to
> amend aer.c in that case?  I suppose you don't have hardware without
> DPC to test that?

Yeah, right. Also, if DPC isn't supported, we leave the HPS bit set and
won't see a DPC event in that case.

Thanks,
Smita

> 
> Thanks,
> 
> Lukas

