public inbox for linuxppc-dev@ozlabs.org
From: LeoLiu-oc <LeoLiu-oc@zhaoxin.com>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: <mahesh@linux.ibm.com>, <oohall@gmail.com>, <bhelgaas@google.com>,
	<linuxppc-dev@lists.ozlabs.org>, <linux-pci@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>, <CobeChen@zhaoxin.com>,
	<TonyWWang@zhaoxin.com>, <ErosZhang@zhaoxin.com>,
	Lukas Wunner <lukas@wunner.de>
Subject: Re: [PATCH] PCI: dpc: Increase pciehp waiting time for DPC recovery
Date: Wed, 28 Jan 2026 18:07:51 +0800	[thread overview]
Message-ID: <3af9f754-d282-485c-a3f2-49a230bfe143@zhaoxin.com> (raw)
In-Reply-To: <20260123202140.GA84703@bhelgaas>



On 2026/1/24 4:21, Bjorn Helgaas wrote:
> [+cc Lukas, pciehp expert and author of a97396c6eb13]
> 
> On Fri, Jan 23, 2026 at 06:40:34PM +0800, LeoLiu-oc wrote:
>> Commit a97396c6eb13 ("PCI: pciehp: Ignore Link Down/Up caused by DPC")
>> amended PCIe hotplug to not bring down the slot upon Data Link Layer State
>> Changed events caused by Downstream Port Containment.
>>
>> However, PCIe hotplug (pciehp) waits up to 4 seconds before assuming that
>> DPC recovery has failed and disabling the slot. This timeout period is
>> insufficient for some PCIe devices.
>> For example, the E810 dual-port network card driver needs to take over
>> 10 seconds to execute its err_detected() callback.
>> Since this exceeds the maximum wait time allowed for DPC recovery by the
>> hotplug IRQ threads, a race condition occurs between the hotplug thread and
>> the dpc_handler() thread.
> 
> Add blank lines between paragraphs.
> 
> Include the name of the E810 driver so we can easily find the
> .err_detected() callback in question.  Actually, including the *name*
> of that callback would be a very direct way of doing this :)
> 
> I guess the problem this fixes is that there was a PCIe error that
> triggered DPC, and the E810 .err_detected() works but takes longer
> than expected, which results in pciehp disabling the slot when it
> doesn't need to?  So the user basically sees a dead E810 device?
> 
Yes, that is the problem this patch solves.

> It seems unfortunate that we have this dependency on the time allowed
> for .err_detected() to execute.  It's nice if adding arbitrary delay
> doesn't break things, but maybe we can't always achieve that.
> 
I think this is a feasible solution. For PCIe devices whose
.err_detected() callback completes within 4 seconds, the longer timeout
has no impact. For the few PCIe devices whose callback takes longer, it
might increase the execution time of pciehp_ist(). Without this patch,
such devices may become unusable and can even trigger more serious
errors, such as a kernel panic. For example, the following log was
encountered in hardware testing:

list_del corruption, ffff8881418b79e8->next is LIST_POISON1 (dead000000000100)
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:56!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
...
Kernel panic - not syncing: Fatal exception

> I see that pci_dpc_recovered() is called from pciehp_ist().  Are we
> prepared for long delays there?
> 
This patch may lengthen the execution time of hotplug IRQ threads that
are triggered by DPC, but it has no effect on normal hotplug operations,
e.g. Attention Button Pressed or Power Fault Detected. If you have a
better suggestion, I will incorporate it in the next version.
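
As an illustration only, the wait can be modeled in userspace. This is
a sketch, not kernel code: it assumes the waiter behaves like
wait_event_timeout(), i.e. it returns as soon as the condition becomes
true, so the timeout is only an upper bound. The names recovery_model,
recovery_thread() and wait_for_recovery() are hypothetical, and seconds
are scaled down to milliseconds so the model runs quickly.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct recovery_model {
	pthread_mutex_t lock;
	pthread_cond_t done;
	bool completed;		/* stands in for dpc_completed(pdev) */
	long recovery_ms;	/* how long the modeled error handler runs */
};

/* Add ms milliseconds to *ts, keeping tv_nsec normalized. */
static void add_ms(struct timespec *ts, long ms)
{
	ts->tv_sec += ms / 1000;
	ts->tv_nsec += (ms % 1000) * 1000000L;
	if (ts->tv_nsec >= 1000000000L) {
		ts->tv_sec++;
		ts->tv_nsec -= 1000000000L;
	}
}

/* Models the recovery side: a slow callback, then wake the waiter. */
static void *recovery_thread(void *arg)
{
	struct recovery_model *m = arg;
	struct timespec delay = { 0, 0 };

	add_ms(&delay, m->recovery_ms);
	nanosleep(&delay, NULL);

	pthread_mutex_lock(&m->lock);
	m->completed = true;
	pthread_cond_broadcast(&m->done);
	pthread_mutex_unlock(&m->lock);
	return NULL;
}

/* Returns true if recovery completed before the waiter gave up. */
static bool wait_for_recovery(long recovery_ms, long timeout_ms)
{
	struct recovery_model m = { .completed = false,
				    .recovery_ms = recovery_ms };
	struct timespec deadline;
	pthread_t tid;
	bool seen;
	int rc = 0;

	assert(recovery_ms >= 0 && timeout_ms >= 0);
	pthread_mutex_init(&m.lock, NULL);
	pthread_cond_init(&m.done, NULL);
	rc = pthread_create(&tid, NULL, recovery_thread, &m);
	assert(rc == 0);

	clock_gettime(CLOCK_REALTIME, &deadline);
	add_ms(&deadline, timeout_ms);

	/* Wait until the condition is true or the deadline passes. */
	pthread_mutex_lock(&m.lock);
	while (!m.completed && rc != ETIMEDOUT)
		rc = pthread_cond_timedwait(&m.done, &m.lock, &deadline);
	seen = m.completed;
	pthread_mutex_unlock(&m.lock);

	/* The model joins for cleanup; the real waiter would simply return. */
	pthread_join(tid, NULL);
	pthread_cond_destroy(&m.done);
	pthread_mutex_destroy(&m.lock);
	return seen;
}
```

With 1 second scaled to 25 ms, wait_for_recovery(250, 100) models a
10-second handler against the 4-second timeout and reports failure,
while wait_for_recovery(250, 400) models the 16-second timeout and
reports success. A fast handler, e.g. wait_for_recovery(10, 400),
returns shortly after the handler finishes and does not wait for the
full timeout.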

>> Signed-off-by: LeoLiu-oc <LeoLiu-oc@zhaoxin.com>
>> ---
>>  drivers/pci/pcie/dpc.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
>> index fc18349614d7..08b5f275699a 100644
>> --- a/drivers/pci/pcie/dpc.c
>> +++ b/drivers/pci/pcie/dpc.c
>> @@ -121,7 +121,7 @@ bool pci_dpc_recovered(struct pci_dev *pdev)
>>        * but reports indicate that DPC completes within 4 seconds.
>>        */
>>       wait_event_timeout(dpc_completed_waitqueue, dpc_completed(pdev),
>> -                        msecs_to_jiffies(4000));
>> +                        msecs_to_jiffies(16000));
> 
> It looks like this breaks the connection between the "completes within
> 4 seconds" comment and the 4000ms wait_event timeout.
> 
Thanks for the suggestion. I will fix this in the next version.
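
One possible way to keep the comment and the timeout tied together is a
named constant that carries the explanation. The sketch below is a
userspace illustration under stated assumptions:
DPC_RECOVERY_TIMEOUT_MS, MODEL_HZ, ms_to_ticks() and
dpc_recovery_timeout_ticks() are hypothetical names, not existing
kernel symbols, and 16000 is simply the value proposed in this patch.

```c
#include <assert.h>

/*
 * Hypothetical name, not an existing kernel symbol.  16000 ms is the
 * value proposed in this patch.  Documenting the rationale next to the
 * constant keeps the comment and the wait from drifting apart.
 */
#define DPC_RECOVERY_TIMEOUT_MS 16000UL

/* Assumed tick rate for this model only; the real HZ is configurable. */
#define MODEL_HZ 250UL

/* Userspace stand-in for a milliseconds-to-ticks conversion, rounding up. */
static unsigned long ms_to_ticks(unsigned long ms)
{
	return (ms * MODEL_HZ + 999UL) / 1000UL;
}

/* The single place where the timeout is converted for the waiter. */
static unsigned long dpc_recovery_timeout_ticks(void)
{
	return ms_to_ticks(DPC_RECOVERY_TIMEOUT_MS);
}
```

Any later change to the timeout then touches only the definition and
the comment directly above it.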

Yours sincerely,
LeoLiu-oc

>>       return test_and_clear_bit(PCI_DPC_RECOVERED, &pdev->priv_flags);
>>  }
>> --
>> 2.43.0
>>



Thread overview: 11+ messages
2026-01-23 10:40 [PATCH] PCI: dpc: Increase pciehp waiting time for DPC recovery LeoLiu-oc
2026-01-23 20:21 ` Bjorn Helgaas
2026-01-28 10:07   ` LeoLiu-oc [this message]
2026-01-28 19:48     ` Bjorn Helgaas
2026-02-02  5:55       ` LeoLiu-oc
2026-01-30 11:59     ` Lukas Wunner
2026-02-02  6:00       ` LeoLiu-oc
2026-02-02  9:02         ` Lukas Wunner
2026-02-03 11:23           ` Przemek Kitszel
2026-02-04  2:10           ` LeoLiu-oc
2026-02-06  8:13             ` LeoLiu-oc
