Linux PCI subsystem development
From: ALOK TIWARI <alok.a.tiwari@oracle.com>
To: Matthew W Carlis <mattc@purestorage.com>
Cc: ahuang12@lenovo.com, ashishk@purestorage.com,
	"Bjorn Helgaas" <bhelgaas@google.com>,
	guojinhui.liam@bytedance.com,
	"Bjorn Helgaas" <helgaas@kernel.org>,
	"Ilpo Järvinen" <ilpo.jarvinen@linux.intel.com>,
	jiwei.sun.bj@qq.com, linux-kernel@vger.kernel.org,
	linux-pci@vger.kernel.org, "Lukas Wunner" <lukas@wunner.de>,
	msaggi@purestorage.com, sconnor@purestorage.com,
	sunjw10@lenovo.com, "Maciej W. Rozycki" <macro@orcam.me.uk>
Subject: Re: [PATCH] PCI: Always lift 2.5GT/s restriction in PCIe failed link retraining
Date: Thu, 30 Apr 2026 20:57:18 +0530
Message-ID: <781bd57a-00ad-43d8-9f23-23b5f64f77a5@oracle.com>
In-Reply-To: <b1645e3b-21f9-4d12-bcc1-27740e2231d1@oracle.com>



On 3/27/2026 1:22 AM, ALOK TIWARI wrote:
> 
> 
> On 2/27/2026 3:32 AM, Maciej W. Rozycki wrote:
>> On Tue, 24 Feb 2026, Matthew W Carlis wrote:
>>
>>>>   I argue that by applying this change the issues with NVMe hot-plug
>>>> will be sorted while keeping the configuration working that
>>>> pcie_failed_link_retrain() is needed for.  Win-win.
>>>
>>> I don't think that what you are saying is true; there is invariably
>>> going to be some other consequence of this change. It's hard to believe
>>> there can be any changes to the PCI drivers that won't break something.
>>
>>   You're being sarcastic, aren't you?
>>
>>   While I sympathise with your feeling, may I pretty please ask you to at
>> the very least give my fix a try in your test environment?
>>
>>>>   I note that active links are unaffected, so to say it's meddling
>>>> with the link on every device is I think a bit of an overstatement,
>>>> and reports of issues are from a few people only...
>>>
>>> There is no discrimination about which device it can be invoked on.
>>> I'm looking at a fleet of millions of hot-pluggable devices. I don't
>>> really know if it matters how many people report an issue; I think what
>>> probably matters is making the right change. Initially, were there any
>>> other reports of the quirk helping with devices besides the Delock 41433?
>>
>>   No reports that I know of.  Please bear in mind that the failure mode
>> is such that you need enough knowledge of PCIe internals and the spec to
>> actually realise there is periodic link training activity taking place.
>>
>>   In the absence of the quirk for the average user there's just no
>> communication, as with a dead downstream device (and the upstream device
>> is sound as anything else plugged in, including but not limited to NVMe
>> storage, works just fine).  In the presence of the quirk the downstream
>> device just works and I expect hardly anyone can be bothered to report
>> seeing "broken device, retraining non-functional downstream link at
>> 2.5GT/s" in the log.  It's only cases like yours that bring attention to
>> the message.
>>
>>>>   What outcome would you envisage had I taken the approach from this
>>>> update right away with the original change?  My only fault was I have
>>>> no use(*) for PCIe hot-plug and did not predict the impact there.
>>>
>>> What I'm seeing now is an overall confusion about whether a link failed
>>> to train to gen 1 or was recovered by the quirk or recovered on its own
>>> etc... In my systems I would prefer to NEVER invoke the quirk under any
>>> circumstances because I expect my devices to work. With the quirk it
>>> becomes more unclear about what the cause of a link issue might have
>>> been or whether it was even a real link issue in the first place or
>>> some weird timing..
>>
>>   I can see your point.
>>
>>   However from your description I infer this is about a test
>> environment, a development lab so to speak.  And you are a highly skilled
>> professional who has access to measurement, test, and hardware debug
>> equipment, and are therefore able to figure out stuff.  Conversely, the
>> vast majority of Linux deployments is in the field, where no
>> sophisticated equipment is available and the operator, if any, may have
>> basic technical skills only.
>>
>>   I have been taught that in the field it is more desirable for equipment
>> to operate according to expectations rather than to strictly follow the
>> relevant specifications and consequently fail operating.  And the quirk I
>> have come up with just follows this principle, letting unqualified people
>> use their equipment (this is similar to Postel's law if you know what I
>> mean).
>>
>>   I realise that in the lab you want strict compliance as this will
>> verify interoperation of the devices you design.
>>
>>   So I think we have conflicting objectives here and I can only offer a
>> sysfs setting that will switch between the modes according to the
>> specific user's needs, as the intent is not something the kernel can
>> figure out by itself.
>>
>>   Please mind however that throughout this week and the next I'm away on
>> holiday (a proper one, as in alpine skiing), so my availability to
>> respond or work on stuff is limited.  I'll appreciate if you give my fix
>> a try meanwhile.
>>
>>    Maciej
> 
>  From my perspective, the current patch looks like a positive step and 
> seems to address the NVMe hot-plug concerns without regressing the 
> original use case.
> 
> That said, I’d really appreciate input from other maintainers on whether 
> this approach strikes the right balance between field robustness,
> 
> Given the concerns around hot-plug behavior and diagnosability, do you
> think reverting (or partially reverting) the behavioral impact of the
> original commit "PCI: Work around PCIe link training failures" would be
> the right direction?
> 
> What is the best way to conclude this issue?
> 
> Thanks,
> Alok
> 

Hi Matthew,

Given your earlier concerns, do you think the right direction would be 
to (logically or partially) revert commit a89c82249c37 ("PCI: Work 
around PCIe link training failures"), or would you suggest refining the 
current approach instead?

I’d appreciate your guidance on what you think is the best way to move 
this forward.


Thanks,
Alok
