* Removing a misleading warning message?
@ 2024-10-14 7:12 Coly Li
0 siblings, 0 replies; 8+ messages in thread
From: Coly Li @ 2024-10-14 7:12 UTC (permalink / raw)
To: linux-cxl
Hi list,
I recently received a report about a warning message from the CXL subsystem:
[ 48.142342] cxl_port port2: Couldn't locate the CXL.cache and CXL.mem capability array header.
[ 48.144690] cxl_port port3: Couldn't locate the CXL.cache and CXL.mem capability array header.
[ 48.144704] cxl_port port3: HDM decoder capability not found
[ 48.144850] cxl_port port4: Couldn't locate the CXL.cache and CXL.mem capability array header.
[ 48.144859] cxl_port port4: HDM decoder capability not found
[ 48.170374] cxl_port port6: Couldn't locate the CXL.cache and CXL.mem capability array header.
[ 48.172893] cxl_port port7: Couldn't locate the CXL.cache and CXL.mem capability array header.
[ 48.174689] cxl_port port7: HDM decoder capability not found
[ 48.175091] cxl_port port8: Couldn't locate the CXL.cache and CXL.mem capability array header.
[ 48.175105] cxl_port port8: HDM decoder capability not found
After checking the source code, I realized this is not a real bug, just a warning that an expected device was not detected.
But judging from the warning text alone, users/customers worry that something is wrong (IMHO it is not).
Is there any chance we can improve the code logic so that the warning is printed only when there is a real problem worth noticing?
Thanks in advance.
Coly Li
* Re: Removing a misleading warning message?
[not found] <15237B14-B55B-4737-9A98-D76AEDB4AEAD@suse.de>
@ 2024-10-17 14:56 ` Alison Schofield
2024-10-18 10:22 ` Jonathan Cameron
2024-10-18 19:32 ` Dan Williams
0 siblings, 2 replies; 8+ messages in thread
From: Alison Schofield @ 2024-10-17 14:56 UTC (permalink / raw)
To: Coly Li; +Cc: nvdimm, linux-cxl
+ linux-cxl mailing list
On Fri, Oct 11, 2024 at 05:58:52PM +0800, Coly Li wrote:
> Hi list,
>
> Recently I have a report for a warning message from CXL subsystem,
> [ 48.142342] cxl_port port2: Couldn't locate the CXL.cache and CXL.mem capability array header.
> [ 48.144690] cxl_port port3: Couldn't locate the CXL.cache and CXL.mem capability array header.
> [ 48.144704] cxl_port port3: HDM decoder capability not found
> [ 48.144850] cxl_port port4: Couldn't locate the CXL.cache and CXL.mem capability array header.
> [ 48.144859] cxl_port port4: HDM decoder capability not found
> [ 48.170374] cxl_port port6: Couldn't locate the CXL.cache and CXL.mem capability array header.
> [ 48.172893] cxl_port port7: Couldn't locate the CXL.cache and CXL.mem capability array header.
> [ 48.174689] cxl_port port7: HDM decoder capability not found
> [ 48.175091] cxl_port port8: Couldn't locate the CXL.cache and CXL.mem capability array header.
> [ 48.175105] cxl_port port8: HDM decoder capability not found
>
> After checking the source code I realize this is not a real bug, just a warning message that expected device was not detected.
> But from the above warning information itself, users/customers are worried there is something wrong (IMHO indeed not).
>
> Is there any chance that we can improve the code logic that only printing out the warning message when it is really a problem to be noticed?
>
> Thanks in advance.
>
> Coly Li
* Re: Removing a misleading warning message?
2024-10-17 14:56 ` Removing a misleading warning message? Alison Schofield
@ 2024-10-18 10:22 ` Jonathan Cameron
2024-10-18 19:32 ` Dan Williams
1 sibling, 0 replies; 8+ messages in thread
From: Jonathan Cameron @ 2024-10-18 10:22 UTC (permalink / raw)
To: Alison Schofield; +Cc: Coly Li, nvdimm, linux-cxl, Bowman, Terry
On Thu, 17 Oct 2024 07:56:03 -0700
Alison Schofield <alison.schofield@intel.com> wrote:
> + linux-cxl mailing list
>
> On Fri, Oct 11, 2024 at 05:58:52PM +0800, Coly Li wrote:
> > Hi list,
> >
> > Recently I have a report for a warning message from CXL subsystem,
> > [ 48.142342] cxl_port port2: Couldn't locate the CXL.cache and CXL.mem capability array header.
> > [ 48.144690] cxl_port port3: Couldn't locate the CXL.cache and CXL.mem capability array header.
> > [ 48.144704] cxl_port port3: HDM decoder capability not found
> > [ 48.144850] cxl_port port4: Couldn't locate the CXL.cache and CXL.mem capability array header.
> > [ 48.144859] cxl_port port4: HDM decoder capability not found
> > [ 48.170374] cxl_port port6: Couldn't locate the CXL.cache and CXL.mem capability array header.
> > [ 48.172893] cxl_port port7: Couldn't locate the CXL.cache and CXL.mem capability array header.
> > [ 48.174689] cxl_port port7: HDM decoder capability not found
> > [ 48.175091] cxl_port port8: Couldn't locate the CXL.cache and CXL.mem capability array header.
> > [ 48.175105] cxl_port port8: HDM decoder capability not found
> >
> > After checking the source code I realize this is not a real bug, just a warning message that expected device was not detected.
I'd like to understand a little more about the hardware.
Superficially this looks like it is screaming about non-spec-compliant (i.e. broken) ports.
Are these on a switch, or are they all root ports? What is connected to the ports?
Understanding exactly what is going on may affect how this is suppressed (which seems
reasonable to do if this is production hardware).
There have been a few mentions of 'late' register validity on specific root ports. Maybe
that is what is going on here? +CC Terry.
Jonathan
> > But from the above warning information itself, users/customers are worried there is something wrong (IMHO indeed not).
> >
> > Is there any chance that we can improve the code logic that only printing out the warning message when it is really a problem to be noticed?
> >
> > Thanks in advance.
> >
> > Coly Li
>
* Re: Removing a misleading warning message?
2024-10-17 14:56 ` Removing a misleading warning message? Alison Schofield
2024-10-18 10:22 ` Jonathan Cameron
@ 2024-10-18 19:32 ` Dan Williams
2024-10-21 5:05 ` Coly Li
2024-12-09 4:02 ` Li Ming
1 sibling, 2 replies; 8+ messages in thread
From: Dan Williams @ 2024-10-18 19:32 UTC (permalink / raw)
To: Alison Schofield, Coly Li; +Cc: nvdimm, linux-cxl
Alison Schofield wrote:
>
> + linux-cxl mailing list
Thanks for forwarding...
> On Fri, Oct 11, 2024 at 05:58:52PM +0800, Coly Li wrote:
> > Hi list,
> >
> > Recently I have a report for a warning message from CXL subsystem,
> > [ 48.142342] cxl_port port2: Couldn't locate the CXL.cache and CXL.mem capability array header.
> > [ 48.144690] cxl_port port3: Couldn't locate the CXL.cache and CXL.mem capability array header.
> > [ 48.144704] cxl_port port3: HDM decoder capability not found
> > [ 48.144850] cxl_port port4: Couldn't locate the CXL.cache and CXL.mem capability array header.
> > [ 48.144859] cxl_port port4: HDM decoder capability not found
> > [ 48.170374] cxl_port port6: Couldn't locate the CXL.cache and CXL.mem capability array header.
> > [ 48.172893] cxl_port port7: Couldn't locate the CXL.cache and CXL.mem capability array header.
> > [ 48.174689] cxl_port port7: HDM decoder capability not found
> > [ 48.175091] cxl_port port8: Couldn't locate the CXL.cache and CXL.mem capability array header.
> > [ 48.175105] cxl_port port8: HDM decoder capability not found
> >
> > After checking the source code I realize this is not a real bug,
> > just a warning message that expected device was not detected. But
> > from the above warning information itself, users/customers are
> > worried there is something wrong (IMHO indeed not).
> >
> > Is there any chance that we can improve the code logic that only
> > printing out the warning message when it is really a problem to be
> > noticed?
There is a short term fix and a long term fix. The short term fix could
be to just delete the warning message, or downgrade it to dev_dbg(), for
now since it is more often a false positive than not. The long term fix,
and the logic needed to resolve false-positive reports, is to defer the
capability discovery until *after* it is clear that there is a
downstream endpoint capable of CXL.cachemem.
Without an endpoint there is no point in reporting that a potentially
CXL capable port is missing cachemem registers.
So, if you want to send a patch changing that warning to dev_dbg() for
now I would support that.
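The two fixes described above can be sketched as a small userspace C model. This is only an illustration under stated assumptions, not the kernel code: struct port_model, its fields, enum report_level, and both functions are hypothetical names invented for this sketch, and the real driver would use dev_dbg()/dev_warn() on the actual port device rather than return an enum.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a port; this is NOT the kernel's struct cxl_port. */
struct port_model {
	const char *name;
	bool has_cachemem_regs;     /* CXL.cache/CXL.mem capability array was found */
	bool has_cachemem_endpoint; /* a CXL.cachemem-capable endpoint sits below the port */
};

/* How loudly a missing capability array would be reported. */
enum report_level { REPORT_NONE, REPORT_DEBUG, REPORT_WARN };

/*
 * Short-term fix: a missing capability array is reported at debug level
 * only, regardless of what is attached below the port.
 */
static enum report_level report_short_term(const struct port_model *p)
{
	assert(p != NULL);
	return p->has_cachemem_regs ? REPORT_NONE : REPORT_DEBUG;
}

/*
 * Long-term idea: judge the missing registers only after it is known that
 * a CXL.cachemem-capable endpoint is downstream. Without such an endpoint
 * nothing is reported; with one, missing registers are a real warning.
 */
static enum report_level report_long_term(const struct port_model *p)
{
	assert(p != NULL);
	if (p->has_cachemem_regs)
		return REPORT_NONE;
	return p->has_cachemem_endpoint ? REPORT_WARN : REPORT_NONE;
}
```

In this model, a port with no endpoint and no registers (like the ports in the log above, assuming nothing CXL.cachemem-capable is attached) yields REPORT_DEBUG under the short-term fix and REPORT_NONE under the long-term one, while a port with such an endpoint but no registers yields REPORT_WARN under the long-term one.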
* Re: Removing a misleading warning message?
2024-10-18 19:32 ` Dan Williams
@ 2024-10-21 5:05 ` Coly Li
2024-12-09 4:02 ` Li Ming
1 sibling, 0 replies; 8+ messages in thread
From: Coly Li @ 2024-10-21 5:05 UTC (permalink / raw)
To: Dan Williams; +Cc: Alison Schofield, nvdimm, linux-cxl
> On Oct 19, 2024, at 03:32, Dan Williams <dan.j.williams@intel.com> wrote:
>
> Alison Schofield wrote:
>>
>> + linux-cxl mailing list
>
> Thanks for forwarding...
>
>> On Fri, Oct 11, 2024 at 05:58:52PM +0800, Coly Li wrote:
>>> Hi list,
>>>
>>> Recently I have a report for a warning message from CXL subsystem,
>>> [ 48.142342] cxl_port port2: Couldn't locate the CXL.cache and CXL.mem capability array header.
>>> [ 48.144690] cxl_port port3: Couldn't locate the CXL.cache and CXL.mem capability array header.
>>> [ 48.144704] cxl_port port3: HDM decoder capability not found
>>> [ 48.144850] cxl_port port4: Couldn't locate the CXL.cache and CXL.mem capability array header.
>>> [ 48.144859] cxl_port port4: HDM decoder capability not found
>>> [ 48.170374] cxl_port port6: Couldn't locate the CXL.cache and CXL.mem capability array header.
>>> [ 48.172893] cxl_port port7: Couldn't locate the CXL.cache and CXL.mem capability array header.
>>> [ 48.174689] cxl_port port7: HDM decoder capability not found
>>> [ 48.175091] cxl_port port8: Couldn't locate the CXL.cache and CXL.mem capability array header.
>>> [ 48.175105] cxl_port port8: HDM decoder capability not found
>>>
>>> After checking the source code I realize this is not a real bug,
>>> just a warning message that expected device was not detected. But
>>> from the above warning information itself, users/customers are
>>> worried there is something wrong (IMHO indeed not).
>>>
>>> Is there any chance that we can improve the code logic that only
>>> printing out the warning message when it is really a problem to be
>>> noticed?
>
> There is a short term fix and a long term fix. The short term fix could
> be to just delete the warning message, or downgrade it to dev_dbg(), for
> now since it is more often a false positive than not. The long term fix,
> and the logic needed to resolve false-positive reports, is to flip the
> capability discovery until *after* it is clear that there is a
> downstream endpoint capable of CXL.cachemem.
>
> Without an endpoint there is no point in reporting that a potentially
> CXL capable port is missing cachemem registers.
>
> So, if you want to send a patch changing that warning to dev_dbg() for
> now I would support that.
I have posted a patch following the above suggestion. Thanks in advance for reviewing.
Coly Li
* Re: Removing a misleading warning message?
2024-10-18 19:32 ` Dan Williams
2024-10-21 5:05 ` Coly Li
@ 2024-12-09 4:02 ` Li Ming
2024-12-10 20:30 ` Dan Williams
1 sibling, 1 reply; 8+ messages in thread
From: Li Ming @ 2024-12-09 4:02 UTC (permalink / raw)
To: Dan Williams, Alison Schofield; +Cc: nvdimm, linux-cxl, Coly Li
On 10/19/2024 3:32 AM, Dan Williams wrote:
> Alison Schofield wrote:
>>
>> + linux-cxl mailing list
>
> Thanks for forwarding...
>
>> On Fri, Oct 11, 2024 at 05:58:52PM +0800, Coly Li wrote:
>>> Hi list,
>>>
>>> Recently I have a report for a warning message from CXL subsystem,
>>> [ 48.142342] cxl_port port2: Couldn't locate the CXL.cache and CXL.mem capability array header.
>>> [ 48.144690] cxl_port port3: Couldn't locate the CXL.cache and CXL.mem capability array header.
>>> [ 48.144704] cxl_port port3: HDM decoder capability not found
>>> [ 48.144850] cxl_port port4: Couldn't locate the CXL.cache and CXL.mem capability array header.
>>> [ 48.144859] cxl_port port4: HDM decoder capability not found
>>> [ 48.170374] cxl_port port6: Couldn't locate the CXL.cache and CXL.mem capability array header.
>>> [ 48.172893] cxl_port port7: Couldn't locate the CXL.cache and CXL.mem capability array header.
>>> [ 48.174689] cxl_port port7: HDM decoder capability not found
>>> [ 48.175091] cxl_port port8: Couldn't locate the CXL.cache and CXL.mem capability array header.
>>> [ 48.175105] cxl_port port8: HDM decoder capability not found
>>>
>>> After checking the source code I realize this is not a real bug,
>>> just a warning message that expected device was not detected. But
>>> from the above warning information itself, users/customers are
>>> worried there is something wrong (IMHO indeed not).
>>>
>>> Is there any chance that we can improve the code logic that only
>>> printing out the warning message when it is really a problem to be
>>> noticed?
>
> There is a short term fix and a long term fix. The short term fix could
> be to just delete the warning message, or downgrade it to dev_dbg(), for
> now since it is more often a false positive than not. The long term fix,
> and the logic needed to resolve false-positive reports, is to flip the
> capability discovery until *after* it is clear that there is a
> downstream endpoint capable of CXL.cachemem.
>
> Without an endpoint there is no point in reporting that a potentially
> CXL capable port is missing cachemem registers.
>
> So, if you want to send a patch changing that warning to dev_dbg() for
> now I would support that.
>
I noticed the short-term solution has been merged. May I know if anyone is working on the long-term solution? If not, I can work on it.
Thanks
Ming
* Re: Removing a misleading warning message?
2024-12-09 4:02 ` Li Ming
@ 2024-12-10 20:30 ` Dan Williams
2024-12-11 14:48 ` Li Ming
0 siblings, 1 reply; 8+ messages in thread
From: Dan Williams @ 2024-12-10 20:30 UTC (permalink / raw)
To: Li Ming, Dan Williams, Alison Schofield; +Cc: nvdimm, linux-cxl, Coly Li
Li Ming wrote:
[..]
> > There is a short term fix and a long term fix. The short term fix could
> > be to just delete the warning message, or downgrade it to dev_dbg(), for
> > now since it is more often a false positive than not. The long term fix,
> > and the logic needed to resolve false-positive reports, is to flip the
> > capability discovery until *after* it is clear that there is a
> > downstream endpoint capable of CXL.cachemem.
> >
> > Without an endpoint there is no point in reporting that a potentially
> > CXL capable port is missing cachemem registers.
> >
> > So, if you want to send a patch changing that warning to dev_dbg() for
> > now I would support that.
> >
>
> I noticed the short term solution been merged, may I know if anyone is
> working on the long term solution? If not, I can work on it.
Hi Ming,
To my knowledge nobody is working on it, so feel free to take a look.
Just note though that if this gets in someone else's critical path they
could also produce some patches. I.e. typical Linux kernel task
wrangling where the first to post a workable solution usually gets to
drive the discussion.
* Re: Removing a misleading warning message?
2024-12-10 20:30 ` Dan Williams
@ 2024-12-11 14:48 ` Li Ming
0 siblings, 0 replies; 8+ messages in thread
From: Li Ming @ 2024-12-11 14:48 UTC (permalink / raw)
To: Dan Williams, Alison Schofield; +Cc: nvdimm, linux-cxl, Coly Li
On 12/11/2024 4:30 AM, Dan Williams wrote:
> Li Ming wrote:
> [..]
>>> There is a short term fix and a long term fix. The short term fix could
>>> be to just delete the warning message, or downgrade it to dev_dbg(), for
>>> now since it is more often a false positive than not. The long term fix,
>>> and the logic needed to resolve false-positive reports, is to flip the
>>> capability discovery until *after* it is clear that there is a
>>> downstream endpoint capable of CXL.cachemem.
>>>
>>> Without an endpoint there is no point in reporting that a potentially
>>> CXL capable port is missing cachemem registers.
>>>
>>> So, if you want to send a patch changing that warning to dev_dbg() for
>>> now I would support that.
>>>
>> I noticed the short term solution been merged, may I know if anyone is
>> working on the long term solution? If not, I can work on it.
> Hi Ming,
>
> To my knowledge nobody is working on it, so feel free to take a look.
> Just note though that if this gets in someone else's critical path they
> could also produce some patches. I.e. typical Linux kernel task
> wrangling where the first to post a workable solution usually gets to
> drive the discussion.
>
Hi Dan,
Understood, thanks for the information. I am also willing to review those patches if that happens.
Ming