* [PATCH V2 0/2] firmware: arm_scmi: Misc Fixes
@ 2024-09-04 3:13 Sibi Sankar
2024-09-04 3:13 ` [PATCH V2 1/2] firmware: arm_scmi: Ensure that the message-id supports fastchannel Sibi Sankar
2024-09-04 3:13 ` [PATCH V2 2/2] firmware: arm_scmi: Skip adding bad duplicates Sibi Sankar
0 siblings, 2 replies; 22+ messages in thread
From: Sibi Sankar @ 2024-09-04 3:13 UTC (permalink / raw)
To: sudeep.holla, cristian.marussi
Cc: linux-kernel, arm-scmi, linux-arm-kernel, linux-arm-msm,
quic_sibis, johan, konradybcio
The series addresses a couple of kernel warnings that are required to [1]
to land.
[1] - https://lore.kernel.org/lkml/20240612124056.39230-1-quic_sibis@quicinc.com/
V1:
* add missing MSG_SUPPORTS_FASTCHANNEL definition.
Base branch: next-20240903
Sibi Sankar (2):
firmware: arm_scmi: Ensure that the message-id supports fastchannel
firmware: arm_scmi: Skip adding bad duplicates
drivers/firmware/arm_scmi/driver.c | 9 +++++++++
drivers/firmware/arm_scmi/perf.c | 13 +++++++++++--
drivers/firmware/arm_scmi/protocols.h | 2 ++
3 files changed, 22 insertions(+), 2 deletions(-)
--
2.34.1
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH V2 1/2] firmware: arm_scmi: Ensure that the message-id supports fastchannel
2024-09-04 3:13 [PATCH V2 0/2] firmware: arm_scmi: Misc Fixes Sibi Sankar
@ 2024-09-04 3:13 ` Sibi Sankar
2024-09-04 7:00 ` Johan Hovold
2024-09-04 3:13 ` [PATCH V2 2/2] firmware: arm_scmi: Skip adding bad duplicates Sibi Sankar
1 sibling, 1 reply; 22+ messages in thread
From: Sibi Sankar @ 2024-09-04 3:13 UTC (permalink / raw)
To: sudeep.holla, cristian.marussi
Cc: linux-kernel, arm-scmi, linux-arm-kernel, linux-arm-msm,
quic_sibis, johan, konradybcio
Currently the perf and powercap protocol relies on the protocol domain
attributes, which just ensures that one fastchannel per domain, before
instantiating fastchannels for all possible message-ids. Fix this by
ensuring that each message-id supports fastchannel before initialization.
Fixes: 6f9ea4dabd2d ("firmware: arm_scmi: Generalize the fast channel support")
Signed-off-by: Sibi Sankar <quic_sibis@quicinc.com>
---
v1:
* add missing MSG_SUPPORTS_FASTCHANNEL definition.
drivers/firmware/arm_scmi/driver.c | 9 +++++++++
drivers/firmware/arm_scmi/protocols.h | 2 ++
2 files changed, 11 insertions(+)
diff --git a/drivers/firmware/arm_scmi/driver.c b/drivers/firmware/arm_scmi/driver.c
index 69c15135371c..95d039152f79 100644
--- a/drivers/firmware/arm_scmi/driver.c
+++ b/drivers/firmware/arm_scmi/driver.c
@@ -56,6 +56,9 @@ static atomic_t transfer_last_id;
static struct dentry *scmi_top_dentry;
+static int scmi_protocol_msg_check(const struct scmi_protocol_handle *ph,
+ u32 message_id, u32 *attributes);
+
/**
* struct scmi_xfers_info - Structure to manage transfer information
*
@@ -1841,6 +1844,7 @@ scmi_common_fastchannel_init(const struct scmi_protocol_handle *ph,
int ret;
u32 flags;
u64 phys_addr;
+ u32 attributes;
u8 size;
void __iomem *addr;
struct scmi_xfer *t;
@@ -1849,6 +1853,11 @@ scmi_common_fastchannel_init(const struct scmi_protocol_handle *ph,
struct scmi_msg_resp_desc_fc *resp;
const struct scmi_protocol_instance *pi = ph_to_pi(ph);
+ /* Check if the MSG_ID supports fastchannel */
+ ret = scmi_protocol_msg_check(ph, message_id, &attributes);
+ if (!ret && !MSG_SUPPORTS_FASTCHANNEL(attributes))
+ return;
+
if (!p_addr) {
ret = -EINVAL;
goto err_out;
diff --git a/drivers/firmware/arm_scmi/protocols.h b/drivers/firmware/arm_scmi/protocols.h
index 8e95f53bd7b7..c1c5260f71c9 100644
--- a/drivers/firmware/arm_scmi/protocols.h
+++ b/drivers/firmware/arm_scmi/protocols.h
@@ -31,6 +31,8 @@
#define SCMI_PROTOCOL_VENDOR_BASE 0x80
+#define MSG_SUPPORTS_FASTCHANNEL(x) ((x) & BIT(0))
+
enum scmi_common_cmd {
PROTOCOL_VERSION = 0x0,
PROTOCOL_ATTRIBUTES = 0x1,
--
2.34.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH V2 2/2] firmware: arm_scmi: Skip adding bad duplicates
2024-09-04 3:13 [PATCH V2 0/2] firmware: arm_scmi: Misc Fixes Sibi Sankar
2024-09-04 3:13 ` [PATCH V2 1/2] firmware: arm_scmi: Ensure that the message-id supports fastchannel Sibi Sankar
@ 2024-09-04 3:13 ` Sibi Sankar
2024-09-04 7:09 ` Johan Hovold
` (3 more replies)
1 sibling, 4 replies; 22+ messages in thread
From: Sibi Sankar @ 2024-09-04 3:13 UTC (permalink / raw)
To: sudeep.holla, cristian.marussi
Cc: linux-kernel, arm-scmi, linux-arm-kernel, linux-arm-msm,
quic_sibis, johan, konradybcio
Ensure that the bad duplicates reported by the platform firmware doesn't
get added to the opp-tables.
Signed-off-by: Sibi Sankar <quic_sibis@quicinc.com>
---
drivers/firmware/arm_scmi/perf.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/drivers/firmware/arm_scmi/perf.c b/drivers/firmware/arm_scmi/perf.c
index 2d77b5f40ca7..114c3dd70ede 100644
--- a/drivers/firmware/arm_scmi/perf.c
+++ b/drivers/firmware/arm_scmi/perf.c
@@ -386,9 +386,11 @@ process_response_opp(struct device *dev, struct perf_dom_info *dom,
le16_to_cpu(r->opp[loop_idx].transition_latency_us);
ret = xa_insert(&dom->opps_by_lvl, opp->perf, opp, GFP_KERNEL);
- if (ret)
+ if (ret) {
dev_warn(dev, "Failed to add opps_by_lvl at %d for %s - ret:%d\n",
opp->perf, dom->info.name, ret);
+ opp->perf = 0;
+ }
}
static inline void
@@ -404,9 +406,12 @@ process_response_opp_v4(struct device *dev, struct perf_dom_info *dom,
le16_to_cpu(r->opp[loop_idx].transition_latency_us);
ret = xa_insert(&dom->opps_by_lvl, opp->perf, opp, GFP_KERNEL);
- if (ret)
+ if (ret) {
dev_warn(dev, "Failed to add opps_by_lvl at %d for %s - ret:%d\n",
opp->perf, dom->info.name, ret);
+ opp->perf = 0;
+ return;
+ }
/* Note that PERF v4 reports always five 32-bit words */
opp->indicative_freq = le32_to_cpu(r->opp[loop_idx].indicative_freq);
@@ -871,6 +876,10 @@ static int scmi_dvfs_device_opps_add(const struct scmi_protocol_handle *ph,
else
freq = dom->opp[idx].indicative_freq * dom->mult_factor;
+ /* Skip all invalid frequencies reported by the firmware */
+ if (!freq)
+ continue;
+
/* All OPPs above the sustained frequency are treated as turbo */
data.turbo = freq > dom->sustained_freq_khz * 1000;
--
2.34.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH V2 1/2] firmware: arm_scmi: Ensure that the message-id supports fastchannel
2024-09-04 3:13 ` [PATCH V2 1/2] firmware: arm_scmi: Ensure that the message-id supports fastchannel Sibi Sankar
@ 2024-09-04 7:00 ` Johan Hovold
2024-09-04 11:29 ` Konrad Dybcio
0 siblings, 1 reply; 22+ messages in thread
From: Johan Hovold @ 2024-09-04 7:00 UTC (permalink / raw)
To: Sibi Sankar
Cc: sudeep.holla, cristian.marussi, linux-kernel, arm-scmi,
linux-arm-kernel, linux-arm-msm, konradybcio
On Wed, Sep 04, 2024 at 08:43:23AM +0530, Sibi Sankar wrote:
> Currently the perf and powercap protocol relies on the protocol domain
> attributes, which just ensures that one fastchannel per domain, before
> instantiating fastchannels for all possible message-ids. Fix this by
> ensuring that each message-id supports fastchannel before initialization.
Please include the warnings that I reported seeing on x1e80100 and that
this patch suppresses to the commit message:
arm-scmi firmware:scmi: Failed to get FC for protocol 13 [MSG_ID:6 / RES_ID:0] - ret:-95. Using regular messaging.
arm-scmi firmware:scmi: Failed to get FC for protocol 13 [MSG_ID:6 / RES_ID:1] - ret:-95. Using regular messaging.
arm-scmi firmware:scmi: Failed to get FC for protocol 13 [MSG_ID:6 / RES_ID:2] - ret:-95. Using regular messaging.
> Fixes: 6f9ea4dabd2d ("firmware: arm_scmi: Generalize the fast channel support")
And add:
Reported-by: Johan Hovold <johan+linaro@kernel.org>
Link: https://lore.kernel.org/lkml/ZoQjAWse2YxwyRJv@hovoldconsulting.com/
(or use Closes: if you prefer).
> Signed-off-by: Sibi Sankar <quic_sibis@quicinc.com>
> ---
>
> v1:
> * add missing MSG_SUPPORTS_FASTCHANNEL definition.
Unfortunately, this patch breaks resume from suspend on the x1e80100 crd:
[ 26.919676] CPU4: Booted secondary processor 0x0000010000 [0x511f0011]
[ 26.960607] arm-scmi firmware:scmi: timed out in resp(caller: do_xfer+0x164/0x568)
[ 26.987142] cpufreq: cpufreq_online: ->get() failed
and then the machine hangs (mostly, I saw an nvme timeout message after a
while).
Make sure you test suspend as well as some of the warnings I reported
only show up during suspend.
Johan
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH V2 2/2] firmware: arm_scmi: Skip adding bad duplicates
2024-09-04 3:13 ` [PATCH V2 2/2] firmware: arm_scmi: Skip adding bad duplicates Sibi Sankar
@ 2024-09-04 7:09 ` Johan Hovold
2024-09-04 8:05 ` Johan Hovold
2024-09-04 13:56 ` Konrad Dybcio
` (2 subsequent siblings)
3 siblings, 1 reply; 22+ messages in thread
From: Johan Hovold @ 2024-09-04 7:09 UTC (permalink / raw)
To: Sibi Sankar
Cc: sudeep.holla, cristian.marussi, linux-kernel, arm-scmi,
linux-arm-kernel, linux-arm-msm, konradybcio
On Wed, Sep 04, 2024 at 08:43:24AM +0530, Sibi Sankar wrote:
> Ensure that the bad duplicates reported by the platform firmware doesn't
> get added to the opp-tables.
Please expand on why this is an issue on Qualcomm platforms, these
entries aren't just "bad duplicates" if IIUC.
Also here, please add (examples of) the warnings I reported. During boot
of the x1e80100 crd, I see:
[ 8.992956] cpu cpu4: _opp_is_duplicate: duplicate OPPs detected. Existing: freq: 3417600000, volt: 0, enabled: 1. New: freq: 3417600000, volt: 0, enabled: 1
[ 9.021940] cpu cpu4: _opp_is_duplicate: duplicate OPPs detected. Existing: freq: 3417600000, volt: 0, enabled: 1. New: freq: 3417600000, volt: 0, enabled: 1
[ 9.036171] cpu cpu8: _opp_is_duplicate: duplicate OPPs detected. Existing: freq: 3417600000, volt: 0, enabled: 1. New: freq: 3417600000, volt: 0, enabled: 1
[ 9.036177] cpu cpu8: _opp_is_duplicate: duplicate OPPs detected. Existing: freq: 3417600000, volt: 0, enabled: 1. New: freq: 3417600000, volt: 0, enabled: 1
and during resume:
[ 85.286615] cpu cpu4: _opp_is_duplicate: duplicate OPPs detected. Existing: freq: 3417600000, volt: 0, enabled: 1. N
ew: freq: 3417600000, volt: 0, enabled: 1
[ 85.319849] cpu cpu4: _opp_is_duplicate: duplicate OPPs detected. Existing: freq: 3417600000, volt: 0, enabled: 1. N
ew: freq: 3417600000, volt: 0, enabled: 1
[ 85.334686] debugfs: File 'cpu5' in directory 'opp' already present!
[ 85.341399] debugfs: File 'cpu6' in directory 'opp' already present!
[ 85.348016] debugfs: File 'cpu7' in directory 'opp' already present!
[ 85.443093] cpu cpu8: _opp_is_duplicate: duplicate OPPs detected. Existing: freq: 3417600000, volt: 0, enabled: 1. N
ew: freq: 3417600000, volt: 0, enabled: 1
[ 85.476595] cpu cpu8: _opp_is_duplicate: duplicate OPPs detected. Existing: freq: 3417600000, volt: 0, enabled: 1. N
ew: freq: 3417600000, volt: 0, enabled: 1
[ 85.491645] debugfs: File 'cpu9' in directory 'opp' already present!
[ 85.498409] debugfs: File 'cpu10' in directory 'opp' already present!
[ 85.505187] debugfs: File 'cpu11' in directory 'opp' already present!
Please also add:
Reported-by: Johan Hovold <johan+linaro@kernel.org>
Link: https://lore.kernel.org/lkml/ZoQjAWse2YxwyRJv@hovoldconsulting.com/
> Signed-off-by: Sibi Sankar <quic_sibis@quicinc.com>
But with this patch applied, instead of the above warnings I now get two
*errors* at boot:
[ 8.952173] cpu cpu4: EM: non-increasing freq: 0
[ 8.979460] cpu cpu8: EM: non-increasing freq: 0
Can you do something about that as well? At least make sure to highlight
this in the commit message as this is information that is needed to be
able to evaluate the patch.
Johan
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH V2 2/2] firmware: arm_scmi: Skip adding bad duplicates
2024-09-04 7:09 ` Johan Hovold
@ 2024-09-04 8:05 ` Johan Hovold
0 siblings, 0 replies; 22+ messages in thread
From: Johan Hovold @ 2024-09-04 8:05 UTC (permalink / raw)
To: Sibi Sankar
Cc: sudeep.holla, cristian.marussi, linux-kernel, arm-scmi,
linux-arm-kernel, linux-arm-msm, konradybcio
On Wed, Sep 04, 2024 at 09:09:56AM +0200, Johan Hovold wrote:
> On Wed, Sep 04, 2024 at 08:43:24AM +0530, Sibi Sankar wrote:
> > Ensure that the bad duplicates reported by the platform firmware doesn't
> > get added to the opp-tables.
> But with this patch applied, instead of the above warnings I now get two
> *errors* at boot:
>
> [ 8.952173] cpu cpu4: EM: non-increasing freq: 0
> [ 8.979460] cpu cpu8: EM: non-increasing freq: 0
>
> Can you do something about that as well? At least make sure to highlight
> this in the commit message as this is information that is needed to be
> able to evaluate the patch.
I tried running with just this patch (i.e. without patch 1/2), but then
cpufreq fails already during boot with eight of these errors:
[ 9.258577] cpufreq: cpufreq_online: ->get() failed
...
[ 9.405603] cpufreq: cpufreq_online: ->get() failed
and seven of these messages:
[ 12.322987] energy_model: Accessing cpu4 policy failed
...
[ 18.462780] energy_model: Accessing cpu4 policy failed
Johan
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH V2 1/2] firmware: arm_scmi: Ensure that the message-id supports fastchannel
2024-09-04 7:00 ` Johan Hovold
@ 2024-09-04 11:29 ` Konrad Dybcio
2024-09-04 12:38 ` Sudeep Holla
0 siblings, 1 reply; 22+ messages in thread
From: Konrad Dybcio @ 2024-09-04 11:29 UTC (permalink / raw)
To: Johan Hovold, Sibi Sankar
Cc: sudeep.holla, cristian.marussi, linux-kernel, arm-scmi,
linux-arm-kernel, linux-arm-msm, konradybcio
On 4.09.2024 9:00 AM, Johan Hovold wrote:
> On Wed, Sep 04, 2024 at 08:43:23AM +0530, Sibi Sankar wrote:
>> Currently the perf and powercap protocol relies on the protocol domain
>> attributes, which just ensures that one fastchannel per domain, before
>> instantiating fastchannels for all possible message-ids. Fix this by
>> ensuring that each message-id supports fastchannel before initialization.
>
> Please include the warnings that I reported seeing on x1e80100 and that
> this patch suppresses to the commit message:
>
> arm-scmi firmware:scmi: Failed to get FC for protocol 13 [MSG_ID:6 / RES_ID:0] - ret:-95. Using regular messaging.
> arm-scmi firmware:scmi: Failed to get FC for protocol 13 [MSG_ID:6 / RES_ID:1] - ret:-95. Using regular messaging.
> arm-scmi firmware:scmi: Failed to get FC for protocol 13 [MSG_ID:6 / RES_ID:2] - ret:-95. Using regular messaging.
>
>> Fixes: 6f9ea4dabd2d ("firmware: arm_scmi: Generalize the fast channel support")
>
> And add:
>
> Reported-by: Johan Hovold <johan+linaro@kernel.org>
> Link: https://lore.kernel.org/lkml/ZoQjAWse2YxwyRJv@hovoldconsulting.com/
>
> (or use Closes: if you prefer).
>
>> Signed-off-by: Sibi Sankar <quic_sibis@quicinc.com>
>> ---
>>
>> v1:
>> * add missing MSG_SUPPORTS_FASTCHANNEL definition.
>
> Unfortunately, this patch breaks resume from suspend on the x1e80100 crd:
>
> [ 26.919676] CPU4: Booted secondary processor 0x0000010000 [0x511f0011]
> [ 26.960607] arm-scmi firmware:scmi: timed out in resp(caller: do_xfer+0x164/0x568)
> [ 26.987142] cpufreq: cpufreq_online: ->get() failed
>
> and then the machine hangs (mostly, I saw an nvme timeout message after a
> while).
>
> Make sure you test suspend as well as some of the warnings I reported
> only show up during suspend.
Eh it looks like PERF_LEVEL_GET (msgid 8) requires the use of FC, but
the firmware fails to inform us about it through BIT(0) in attrs..
Konrad
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH V2 1/2] firmware: arm_scmi: Ensure that the message-id supports fastchannel
2024-09-04 11:29 ` Konrad Dybcio
@ 2024-09-04 12:38 ` Sudeep Holla
2024-09-04 14:20 ` Cristian Marussi
0 siblings, 1 reply; 22+ messages in thread
From: Sudeep Holla @ 2024-09-04 12:38 UTC (permalink / raw)
To: Konrad Dybcio
Cc: Johan Hovold, Sibi Sankar, cristian.marussi, linux-kernel,
arm-scmi, linux-arm-kernel, linux-arm-msm, Sudeep Holla
On Wed, Sep 04, 2024 at 01:29:29PM +0200, Konrad Dybcio wrote:
> On 4.09.2024 9:00 AM, Johan Hovold wrote:
[...]
> >
> > Unfortunately, this patch breaks resume from suspend on the x1e80100 crd:
> >
> > [ 26.919676] CPU4: Booted secondary processor 0x0000010000 [0x511f0011]
> > [ 26.960607] arm-scmi firmware:scmi: timed out in resp(caller: do_xfer+0x164/0x568)
> > [ 26.987142] cpufreq: cpufreq_online: ->get() failed
> >
> > and then the machine hangs (mostly, I saw an nvme timeout message after a
> > while).
> >
> > Make sure you test suspend as well as some of the warnings I reported
> > only show up during suspend.
>
> Eh it looks like PERF_LEVEL_GET (msgid 8) requires the use of FC, but
> the firmware fails to inform us about it through BIT(0) in attrs..
>
Just trying to understand things better here. So the firmware expects OSPM
to just use FC only for PERF_LEVEL_GET and hence doesn't implement the
default/normal channel for PERF_LEVEL_GET(I assume it returns error ?)
but fails to set the attribute indicating FC is available for the domain.
I am not sure if that is stupid choice or there is some cost benefit in
not implementing PERF_LEVEL_GET via normal channel if that is a fact. I
am very much interested to know the reason either way especially if it
is latter.
--
Regards,
Sudeep
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH V2 2/2] firmware: arm_scmi: Skip adding bad duplicates
2024-09-04 3:13 ` [PATCH V2 2/2] firmware: arm_scmi: Skip adding bad duplicates Sibi Sankar
2024-09-04 7:09 ` Johan Hovold
@ 2024-09-04 13:56 ` Konrad Dybcio
2024-09-04 14:00 ` Konrad Dybcio
2024-09-04 15:21 ` Cristian Marussi
2024-09-04 16:12 ` Sudeep Holla
3 siblings, 1 reply; 22+ messages in thread
From: Konrad Dybcio @ 2024-09-04 13:56 UTC (permalink / raw)
To: Sibi Sankar, sudeep.holla, cristian.marussi
Cc: linux-kernel, arm-scmi, linux-arm-kernel, linux-arm-msm, johan,
konradybcio
On 4.09.2024 5:13 AM, Sibi Sankar wrote:
> Ensure that the bad duplicates reported by the platform firmware doesn't
> get added to the opp-tables.
>
> Signed-off-by: Sibi Sankar <quic_sibis@quicinc.com>
> ---
> drivers/firmware/arm_scmi/perf.c | 13 +++++++++++--
> 1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/firmware/arm_scmi/perf.c b/drivers/firmware/arm_scmi/perf.c
> index 2d77b5f40ca7..114c3dd70ede 100644
> --- a/drivers/firmware/arm_scmi/perf.c
> +++ b/drivers/firmware/arm_scmi/perf.c
> @@ -386,9 +386,11 @@ process_response_opp(struct device *dev, struct perf_dom_info *dom,
> le16_to_cpu(r->opp[loop_idx].transition_latency_us);
>
> ret = xa_insert(&dom->opps_by_lvl, opp->perf, opp, GFP_KERNEL);
> - if (ret)
> + if (ret) {
> dev_warn(dev, "Failed to add opps_by_lvl at %d for %s - ret:%d\n",
> opp->perf, dom->info.name, ret);
> + opp->perf = 0;
> + }
> }
>
> static inline void
> @@ -404,9 +406,12 @@ process_response_opp_v4(struct device *dev, struct perf_dom_info *dom,
> le16_to_cpu(r->opp[loop_idx].transition_latency_us);
>
> ret = xa_insert(&dom->opps_by_lvl, opp->perf, opp, GFP_KERNEL);
> - if (ret)
> + if (ret) {
> dev_warn(dev, "Failed to add opps_by_lvl at %d for %s - ret:%d\n",
> opp->perf, dom->info.name, ret);
> + opp->perf = 0;
> + return;
> + }
>
> /* Note that PERF v4 reports always five 32-bit words */
> opp->indicative_freq = le32_to_cpu(r->opp[loop_idx].indicative_freq);
> @@ -871,6 +876,10 @@ static int scmi_dvfs_device_opps_add(const struct scmi_protocol_handle *ph,
> else
> freq = dom->opp[idx].indicative_freq * dom->mult_factor;
>
> + /* Skip all invalid frequencies reported by the firmware */
> + if (!freq)
> + continue;
Maybe something like this instead? (not tested)
diff --git a/drivers/firmware/arm_scmi/perf.c b/drivers/firmware/arm_scmi/perf.c
index 2d77b5f40ca7..530692119c79 100644
--- a/drivers/firmware/arm_scmi/perf.c
+++ b/drivers/firmware/arm_scmi/perf.c
@@ -431,8 +431,14 @@ iter_perf_levels_process_response(const struct scmi_protocol_handle *ph,
{
struct scmi_opp *opp;
struct scmi_perf_ipriv *p = priv;
+ unsigned int idx = st->desc_index + st->loop_idx;
+
+ opp = &p->perf_dom->opp[idx];
+
+ /* Avoid duplicate entries coming from buggy firmware */
+ if (idx > 0 && opp->perf && p->perf_dom->opp[idx - 1].perf)
+ return 0;
- opp = &p->perf_dom->opp[st->desc_index + st->loop_idx];
if (PROTOCOL_REV_MAJOR(p->version) <= 0x3)
process_response_opp(ph->dev, p->perf_dom, opp, st->loop_idx,
response);
Konrad
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH V2 2/2] firmware: arm_scmi: Skip adding bad duplicates
2024-09-04 13:56 ` Konrad Dybcio
@ 2024-09-04 14:00 ` Konrad Dybcio
0 siblings, 0 replies; 22+ messages in thread
From: Konrad Dybcio @ 2024-09-04 14:00 UTC (permalink / raw)
To: Konrad Dybcio, Sibi Sankar, sudeep.holla, cristian.marussi
Cc: linux-kernel, arm-scmi, linux-arm-kernel, linux-arm-msm, johan
On 4.09.2024 3:56 PM, Konrad Dybcio wrote:
> On 4.09.2024 5:13 AM, Sibi Sankar wrote:
>> Ensure that the bad duplicates reported by the platform firmware doesn't
>> get added to the opp-tables.
>>
>> Signed-off-by: Sibi Sankar <quic_sibis@quicinc.com>
>> ---
>> drivers/firmware/arm_scmi/perf.c | 13 +++++++++++--
>> 1 file changed, 11 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/firmware/arm_scmi/perf.c b/drivers/firmware/arm_scmi/perf.c
>> index 2d77b5f40ca7..114c3dd70ede 100644
>> --- a/drivers/firmware/arm_scmi/perf.c
>> +++ b/drivers/firmware/arm_scmi/perf.c
>> @@ -386,9 +386,11 @@ process_response_opp(struct device *dev, struct perf_dom_info *dom,
>> le16_to_cpu(r->opp[loop_idx].transition_latency_us);
>>
>> ret = xa_insert(&dom->opps_by_lvl, opp->perf, opp, GFP_KERNEL);
>> - if (ret)
>> + if (ret) {
>> dev_warn(dev, "Failed to add opps_by_lvl at %d for %s - ret:%d\n",
>> opp->perf, dom->info.name, ret);
>> + opp->perf = 0;
>> + }
>> }
>>
>> static inline void
>> @@ -404,9 +406,12 @@ process_response_opp_v4(struct device *dev, struct perf_dom_info *dom,
>> le16_to_cpu(r->opp[loop_idx].transition_latency_us);
>>
>> ret = xa_insert(&dom->opps_by_lvl, opp->perf, opp, GFP_KERNEL);
>> - if (ret)
>> + if (ret) {
>> dev_warn(dev, "Failed to add opps_by_lvl at %d for %s - ret:%d\n",
>> opp->perf, dom->info.name, ret);
>> + opp->perf = 0;
>> + return;
>> + }
>>
>> /* Note that PERF v4 reports always five 32-bit words */
>> opp->indicative_freq = le32_to_cpu(r->opp[loop_idx].indicative_freq);
>> @@ -871,6 +876,10 @@ static int scmi_dvfs_device_opps_add(const struct scmi_protocol_handle *ph,
>> else
>> freq = dom->opp[idx].indicative_freq * dom->mult_factor;
>>
>> + /* Skip all invalid frequencies reported by the firmware */
>> + if (!freq)
>> + continue;
>
> Maybe something like this instead? (not tested)
>
> diff --git a/drivers/firmware/arm_scmi/perf.c b/drivers/firmware/arm_scmi/perf.c
> index 2d77b5f40ca7..530692119c79 100644
> --- a/drivers/firmware/arm_scmi/perf.c
> +++ b/drivers/firmware/arm_scmi/perf.c
> @@ -431,8 +431,14 @@ iter_perf_levels_process_response(const struct scmi_protocol_handle *ph,
> {
> struct scmi_opp *opp;
> struct scmi_perf_ipriv *p = priv;
> + unsigned int idx = st->desc_index + st->loop_idx;
> +
> + opp = &p->perf_dom->opp[idx];
> +
> + /* Avoid duplicate entries coming from buggy firmware */
> + if (idx > 0 && opp->perf && p->perf_dom->opp[idx - 1].perf)
> + return 0;
>
> - opp = &p->perf_dom->opp[st->desc_index + st->loop_idx];
> if (PROTOCOL_REV_MAJOR(p->version) <= 0x3)
> process_response_opp(ph->dev, p->perf_dom, opp, st->loop_idx,
> response);
No that won't work, perf_dom->opp has all the entries and that's used
in e.g. scmi_dvfs_device_opps_add :/
Konrad
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH V2 1/2] firmware: arm_scmi: Ensure that the message-id supports fastchannel
2024-09-04 12:38 ` Sudeep Holla
@ 2024-09-04 14:20 ` Cristian Marussi
2024-09-05 12:54 ` Konrad Dybcio
0 siblings, 1 reply; 22+ messages in thread
From: Cristian Marussi @ 2024-09-04 14:20 UTC (permalink / raw)
To: Sudeep Holla
Cc: Konrad Dybcio, Johan Hovold, Sibi Sankar, cristian.marussi,
linux-kernel, arm-scmi, linux-arm-kernel, linux-arm-msm
On Wed, Sep 04, 2024 at 01:38:55PM +0100, Sudeep Holla wrote:
> On Wed, Sep 04, 2024 at 01:29:29PM +0200, Konrad Dybcio wrote:
> > On 4.09.2024 9:00 AM, Johan Hovold wrote:
>
> [...]
>
> > >
> > > Unfortunately, this patch breaks resume from suspend on the x1e80100 crd:
> > >
> > > [ 26.919676] CPU4: Booted secondary processor 0x0000010000 [0x511f0011]
> > > [ 26.960607] arm-scmi firmware:scmi: timed out in resp(caller: do_xfer+0x164/0x568)
> > > [ 26.987142] cpufreq: cpufreq_online: ->get() failed
> > >
> > > and then the machine hangs (mostly, I saw an nvme timeout message after a
> > > while).
> > >
> > > Make sure you test suspend as well as some of the warnings I reported
> > > only show up during suspend.
> >
> > Eh it looks like PERF_LEVEL_GET (msgid 8) requires the use of FC, but
> > the firmware fails to inform us about it through BIT(0) in attrs..
> >
>
> Just trying to understand things better here. So the firmware expects OSPM
> to just use FC only for PERF_LEVEL_GET and hence doesn't implement the
> default/normal channel for PERF_LEVEL_GET(I assume it returns error ?)
> but fails to set the attribute indicating FC is available for the domain.
>
Is not that FCs are optional BUT PERF_LEVEL_GET standard messages is
support is mandatory by the spec anyway ?
Thanks,
Cristian
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH V2 2/2] firmware: arm_scmi: Skip adding bad duplicates
2024-09-04 3:13 ` [PATCH V2 2/2] firmware: arm_scmi: Skip adding bad duplicates Sibi Sankar
2024-09-04 7:09 ` Johan Hovold
2024-09-04 13:56 ` Konrad Dybcio
@ 2024-09-04 15:21 ` Cristian Marussi
2024-09-04 15:30 ` Cristian Marussi
2024-09-04 16:12 ` Sudeep Holla
3 siblings, 1 reply; 22+ messages in thread
From: Cristian Marussi @ 2024-09-04 15:21 UTC (permalink / raw)
To: Sibi Sankar
Cc: sudeep.holla, cristian.marussi, linux-kernel, arm-scmi,
linux-arm-kernel, linux-arm-msm, johan, konradybcio
On Wed, Sep 04, 2024 at 08:43:24AM +0530, Sibi Sankar wrote:
> Ensure that the bad duplicates reported by the platform firmware doesn't
> get added to the opp-tables.
>
Hi Sibi,
so if the idea is to make the code more robust when FW sends BAD
duplicates, you necessarily need to properly drop opps in opp_count too.
One other option would be to just loop with xa_for_each BUT opp_count is
used in a number of places...so first of all let's try drop count properly.
Can you try this patch down below, instead of your patch.
If it solves, I will send a patch (after testing it a bit more :D)
Thanks,
Cristian
P.S.: thanks for spotting this, I forgot NOT to trust FW replies as usual :P
--->8---
diff --git a/drivers/firmware/arm_scmi/perf.c b/drivers/firmware/arm_scmi/perf.c
index 397a39729e29..cbac29792d1e 100644
--- a/drivers/firmware/arm_scmi/perf.c
+++ b/drivers/firmware/arm_scmi/perf.c
@@ -340,7 +340,7 @@ static int iter_perf_levels_update_state(struct scmi_iterator_state *st,
return 0;
}
-static inline void
+static inline int
process_response_opp(struct device *dev, struct scmi_perf_domain_info *dom,
struct scmi_opp *opp, unsigned int loop_idx,
const struct scmi_msg_resp_perf_describe_levels *r)
@@ -353,12 +353,16 @@ process_response_opp(struct device *dev, struct scmi_perf_domain_info *dom,
le16_to_cpu(r->opp[loop_idx].transition_latency_us);
ret = xa_insert(&dom->opps_by_lvl, opp->perf, opp, GFP_KERNEL);
- if (ret)
+ if (ret) {
dev_warn(dev, "Failed to add opps_by_lvl at %d for %s - ret:%d\n",
opp->perf, dom->name, ret);
+ return ret;
+ }
+
+ return 0;
}
-static inline void
+static inline int
process_response_opp_v4(struct device *dev, struct scmi_perf_domain_info *dom,
struct scmi_opp *opp, unsigned int loop_idx,
const struct scmi_msg_resp_perf_describe_levels_v4 *r)
@@ -371,9 +375,11 @@ process_response_opp_v4(struct device *dev, struct scmi_perf_domain_info *dom,
le16_to_cpu(r->opp[loop_idx].transition_latency_us);
ret = xa_insert(&dom->opps_by_lvl, opp->perf, opp, GFP_KERNEL);
- if (ret)
+ if (ret) {
dev_warn(dev, "Failed to add opps_by_lvl at %d for %s - ret:%d\n",
opp->perf, dom->name, ret);
+ return ret;
+ }
/* Note that PERF v4 reports always five 32-bit words */
opp->indicative_freq = le32_to_cpu(r->opp[loop_idx].indicative_freq);
@@ -382,13 +388,21 @@ process_response_opp_v4(struct device *dev, struct scmi_perf_domain_info *dom,
ret = xa_insert(&dom->opps_by_idx, opp->level_index, opp,
GFP_KERNEL);
- if (ret)
+ if (ret) {
dev_warn(dev,
"Failed to add opps_by_idx at %d for %s - ret:%d\n",
opp->level_index, dom->name, ret);
+ /* Cleanup by_lvl too */
+ xa_erase(&dom->opps_by_lvl, opp->perf);
+
+ return ret;
+ }
+
hash_add(dom->opps_by_freq, &opp->hash, opp->indicative_freq);
}
+
+ return 0;
}
static int
@@ -396,16 +410,22 @@ iter_perf_levels_process_response(const struct scmi_protocol_handle *ph,
const void *response,
struct scmi_iterator_state *st, void *priv)
{
+ int ret;
struct scmi_opp *opp;
struct scmi_perf_ipriv *p = priv;
opp = &p->perf_dom->opp[st->desc_index + st->loop_idx];
if (PROTOCOL_REV_MAJOR(p->version) <= 0x3)
- process_response_opp(ph->dev, p->perf_dom, opp, st->loop_idx,
- response);
+ ret = process_response_opp(ph->dev, p->perf_dom, opp,
+ st->loop_idx, response);
else
- process_response_opp_v4(ph->dev, p->perf_dom, opp, st->loop_idx,
- response);
+ ret = process_response_opp_v4(ph->dev, p->perf_dom, opp,
+ st->loop_idx, response);
+
+ /* Skip BAD duplicates received from firmware */
+ if (ret)
+ return ret == -EBUSY ? 0 : ret;
+
p->perf_dom->opp_count++;
dev_dbg(ph->dev, "Level %d Power %d Latency %dus Ifreq %d Index %d\n",
---8<---
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH V2 2/2] firmware: arm_scmi: Skip adding bad duplicates
2024-09-04 15:21 ` Cristian Marussi
@ 2024-09-04 15:30 ` Cristian Marussi
2024-09-04 15:46 ` Cristian Marussi
2024-10-07 7:00 ` Sibi Sankar
0 siblings, 2 replies; 22+ messages in thread
From: Cristian Marussi @ 2024-09-04 15:30 UTC (permalink / raw)
To: Sibi Sankar
Cc: sudeep.holla, cristian.marussi, linux-kernel, arm-scmi,
linux-arm-kernel, linux-arm-msm, johan, konradybcio
On Wed, Sep 04, 2024 at 04:21:49PM +0100, Cristian Marussi wrote:
> On Wed, Sep 04, 2024 at 08:43:24AM +0530, Sibi Sankar wrote:
> > Ensure that the bad duplicates reported by the platform firmware doesn't
> > get added to the opp-tables.
> >
>
> Hi Sibi,
>
> so if the idea is to make the code more robust when FW sends BAD
> duplicates, you necessarily need to properly drop opps in opp_count too.
>
> One other option would be to just loop with xa_for_each BUT opp_count is
> used in a number of places...so first of all let's try drop count properly.
>
> Can you try this patch down below, instead of your patch.
> If it solves, I will send a patch (after testing it a bit more :D)
Hold on... I sent you a diff that does not apply probably on your tree due
to some uncomitted local work of mine...my bad...let me resend.
Thanks,
Cristian
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH V2 2/2] firmware: arm_scmi: Skip adding bad duplicates
2024-09-04 15:30 ` Cristian Marussi
@ 2024-09-04 15:46 ` Cristian Marussi
2024-09-05 12:43 ` Cristian Marussi
2024-10-07 7:00 ` Sibi Sankar
1 sibling, 1 reply; 22+ messages in thread
From: Cristian Marussi @ 2024-09-04 15:46 UTC (permalink / raw)
To: Sibi Sankar
Cc: sudeep.holla, cristian.marussi, linux-kernel, arm-scmi,
linux-arm-kernel, linux-arm-msm, johan, konradybcio
On Wed, Sep 04, 2024 at 04:30:24PM +0100, Cristian Marussi wrote:
> On Wed, Sep 04, 2024 at 04:21:49PM +0100, Cristian Marussi wrote:
> > On Wed, Sep 04, 2024 at 08:43:24AM +0530, Sibi Sankar wrote:
> > > Ensure that the bad duplicates reported by the platform firmware doesn't
> > > get added to the opp-tables.
> > >
> >
> > Hi Sibi,
> >
> > so if the idea is to make the code more robust when FW sends BAD
> > duplicates, you necessarily need to properly drop opps in opp_count too.
> >
> > One other option would be to just loop with xa_for_each BUT opp_count is
> > used in a number of places...so first of all let's try drop count properly.
> >
> > Can you try this patch down below, instead of your patch.
> > If it solves, I will send a patch (after testing it a bit more :D)
>
> Hold on... I sent you a diff that does not apply probably on your tree due
> to some uncomitted local work of mine...my bad...let me resend.
>
This one should be good...apologies
---8<---
diff --git a/drivers/firmware/arm_scmi/perf.c b/drivers/firmware/arm_scmi/perf.c
index 4b7f1cbb9b04..bca9c6e4a3ab 100644
--- a/drivers/firmware/arm_scmi/perf.c
+++ b/drivers/firmware/arm_scmi/perf.c
@@ -373,7 +373,7 @@ static int iter_perf_levels_update_state(struct scmi_iterator_state *st,
return 0;
}
-static inline void
+static inline int
process_response_opp(struct device *dev, struct perf_dom_info *dom,
struct scmi_opp *opp, unsigned int loop_idx,
const struct scmi_msg_resp_perf_describe_levels *r)
@@ -386,12 +386,16 @@ process_response_opp(struct device *dev, struct perf_dom_info *dom,
le16_to_cpu(r->opp[loop_idx].transition_latency_us);
ret = xa_insert(&dom->opps_by_lvl, opp->perf, opp, GFP_KERNEL);
- if (ret)
+ if (ret) {
dev_warn(dev, "Failed to add opps_by_lvl at %d for %s - ret:%d\n",
opp->perf, dom->info.name, ret);
+ return ret;
+ }
+
+ return 0;
}
-static inline void
+static inline int
process_response_opp_v4(struct device *dev, struct perf_dom_info *dom,
struct scmi_opp *opp, unsigned int loop_idx,
const struct scmi_msg_resp_perf_describe_levels_v4 *r)
@@ -404,9 +408,11 @@ process_response_opp_v4(struct device *dev, struct perf_dom_info *dom,
le16_to_cpu(r->opp[loop_idx].transition_latency_us);
ret = xa_insert(&dom->opps_by_lvl, opp->perf, opp, GFP_KERNEL);
- if (ret)
+ if (ret) {
dev_warn(dev, "Failed to add opps_by_lvl at %d for %s - ret:%d\n",
opp->perf, dom->info.name, ret);
+ return ret;
+ }
/* Note that PERF v4 reports always five 32-bit words */
opp->indicative_freq = le32_to_cpu(r->opp[loop_idx].indicative_freq);
@@ -415,13 +421,21 @@ process_response_opp_v4(struct device *dev, struct perf_dom_info *dom,
ret = xa_insert(&dom->opps_by_idx, opp->level_index, opp,
GFP_KERNEL);
- if (ret)
+ if (ret) {
dev_warn(dev,
"Failed to add opps_by_idx at %d for %s - ret:%d\n",
opp->level_index, dom->info.name, ret);
+ /* Cleanup by_lvl too */
+ xa_erase(&dom->opps_by_lvl, opp->perf);
+
+ return ret;
+ }
+
hash_add(dom->opps_by_freq, &opp->hash, opp->indicative_freq);
}
+
+ return 0;
}
static int
@@ -429,16 +443,22 @@ iter_perf_levels_process_response(const struct scmi_protocol_handle *ph,
const void *response,
struct scmi_iterator_state *st, void *priv)
{
+ int ret;
struct scmi_opp *opp;
struct scmi_perf_ipriv *p = priv;
opp = &p->perf_dom->opp[st->desc_index + st->loop_idx];
if (PROTOCOL_REV_MAJOR(p->version) <= 0x3)
- process_response_opp(ph->dev, p->perf_dom, opp, st->loop_idx,
- response);
+ ret = process_response_opp(ph->dev, p->perf_dom, opp,
+ st->loop_idx, response);
else
- process_response_opp_v4(ph->dev, p->perf_dom, opp, st->loop_idx,
- response);
+ ret = process_response_opp_v4(ph->dev, p->perf_dom, opp,
+ st->loop_idx, response);
+
+ /* Skip BAD duplicates received from firmware */
+ if (ret)
+ return ret == -EBUSY ? 0 : ret;
+
p->perf_dom->opp_count++;
dev_dbg(ph->dev, "Level %d Power %d Latency %dus Ifreq %d Index %d\n",
--->8----
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH V2 2/2] firmware: arm_scmi: Skip adding bad duplicates
2024-09-04 3:13 ` [PATCH V2 2/2] firmware: arm_scmi: Skip adding bad duplicates Sibi Sankar
` (2 preceding siblings ...)
2024-09-04 15:21 ` Cristian Marussi
@ 2024-09-04 16:12 ` Sudeep Holla
2024-09-04 16:20 ` Cristian Marussi
3 siblings, 1 reply; 22+ messages in thread
From: Sudeep Holla @ 2024-09-04 16:12 UTC (permalink / raw)
To: Sibi Sankar
Cc: cristian.marussi, linux-kernel, arm-scmi, linux-arm-kernel,
linux-arm-msm, johan, konradybcio, Sudeep Holla
On Wed, Sep 04, 2024 at 08:43:24AM +0530, Sibi Sankar wrote:
> Ensure that the bad duplicates reported by the platform firmware doesn't
> get added to the opp-tables.
>
I am really interested to know if the platform firmware is presenting
duplicates intentionally for some unknown reasons and we are just speculating
it to be broken firmware or is it really broken firmware.
For me, it is very hard to digest something like OPP tables which is there
for a very long time now is not very well understood by firmware authors.
How many duplicates are we seeing on this platform really ? If it is
just one I can understand. More than one is hard to miss from the OPP
tables in the firmware.
While I am not opposing to make the driver handle these duplicates,
I am just worried if they are put there intentionally for reasons we
don't understand yet or not published.
--
Regards,
Sudeep
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH V2 2/2] firmware: arm_scmi: Skip adding bad duplicates
2024-09-04 16:12 ` Sudeep Holla
@ 2024-09-04 16:20 ` Cristian Marussi
2024-10-07 6:51 ` Sibi Sankar
0 siblings, 1 reply; 22+ messages in thread
From: Cristian Marussi @ 2024-09-04 16:20 UTC (permalink / raw)
To: Sudeep Holla
Cc: Sibi Sankar, cristian.marussi, linux-kernel, arm-scmi,
linux-arm-kernel, linux-arm-msm, johan, konradybcio
On Wed, Sep 04, 2024 at 05:12:29PM +0100, Sudeep Holla wrote:
> On Wed, Sep 04, 2024 at 08:43:24AM +0530, Sibi Sankar wrote:
> > Ensure that the bad duplicates reported by the platform firmware doesn't
> > get added to the opp-tables.
> >
>
> I am really interested to know if the platform firmware is presenting
> duplicates intentionally for some unknown reasons and we are just speculating
> it to be broken firmware or is it really broken firmware.
>
> For me, it is very hard to digest something like OPP tables which is there
> for a very long time now is not very well understood by firmware authors.
> How many duplicates are we seeing on this platform really ? If it is
> just one I can understand. More than one is hard to miss from the OPP
> tables in the firmware.
>
> While I am not opposing to make the driver handle these duplicates,
> I am just worried if they are put there intentionally for reasons we
> don't understand yet or not published.
>
The number of duplicates reported in logs makes me suspect the same...seems
like intentional/by_design .... but at first I stick to the general issue
of handling bad fw replies and how to survive kernel side at first...but I
indeed share your same concerns...
Thanks,
Cristian
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH V2 2/2] firmware: arm_scmi: Skip adding bad duplicates
2024-09-04 15:46 ` Cristian Marussi
@ 2024-09-05 12:43 ` Cristian Marussi
0 siblings, 0 replies; 22+ messages in thread
From: Cristian Marussi @ 2024-09-05 12:43 UTC (permalink / raw)
To: Sibi Sankar
Cc: sudeep.holla, cristian.marussi, linux-kernel, arm-scmi,
linux-arm-kernel, linux-arm-msm, johan, konradybcio
On Wed, Sep 04, 2024 at 04:46:08PM +0100, Cristian Marussi wrote:
> On Wed, Sep 04, 2024 at 04:30:24PM +0100, Cristian Marussi wrote:
> > On Wed, Sep 04, 2024 at 04:21:49PM +0100, Cristian Marussi wrote:
> > > On Wed, Sep 04, 2024 at 08:43:24AM +0530, Sibi Sankar wrote:
> > > > Ensure that the bad duplicates reported by the platform firmware doesn't
> > > > get added to the opp-tables.
> > > >
> > >
> > > Hi Sibi,
> > >
> > > so if the idea is to make the code more robust when FW sends BAD
> > > duplicates, you necessarily need to properly drop opps in opp_count too.
> > >
> > > One other option would be to just loop with xa_for_each BUT opp_count is
> > > used in a number of places...so first of all let's try drop count properly.
> > >
> > > Can you try this patch down below, instead of your patch.
> > > If it solves, I will send a patch (after testing it a bit more :D)
> >
> > Hold on... I sent you a diff that does not apply probably on your tree due
> > to some uncomitted local work of mine...my bad...let me resend.
> >
>
> This one should be good...apologies
>
As a side comment about this, even though we certain can and should make
the Kernel more robust to skip possible bad replies from FW (like with this
or similar patches), if the bad replies comes by design since, as I have
grasped from the other thread, the FW just ALSO exposes resources/OPPs that
are just NOT usable by the Kernel OSPM/agent (but maybe by other agents),
you should fix your FW to fully avoid the warnings...since the warnings in
SCMI/perf are there exactly to catch this kind of situations...
(even though we can deal better with the replies as with this patch)
IOW, the SCMI server should expose to its agents only the subset of
resources and characteristics that are supposed to be used by those
respective agents (server can expose a per-agent view of the system)...
...because, even if innocuous and even though most of the time we can cope
with such "alien" resources (with the FW simply replying with a DENY to any
request not meant to be touched or the Kernel spotting such anomalies and
ignoring them) all of these "alien" resources will generate some sort of
uneeded background SCMI traffic (of DENY replies)
Thanks,
Cristian
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH V2 1/2] firmware: arm_scmi: Ensure that the message-id supports fastchannel
2024-09-04 14:20 ` Cristian Marussi
@ 2024-09-05 12:54 ` Konrad Dybcio
2024-10-07 6:46 ` Sibi Sankar
0 siblings, 1 reply; 22+ messages in thread
From: Konrad Dybcio @ 2024-09-05 12:54 UTC (permalink / raw)
To: Cristian Marussi, Sudeep Holla
Cc: Konrad Dybcio, Johan Hovold, Sibi Sankar, linux-kernel, arm-scmi,
linux-arm-kernel, linux-arm-msm
On 4.09.2024 4:20 PM, Cristian Marussi wrote:
> On Wed, Sep 04, 2024 at 01:38:55PM +0100, Sudeep Holla wrote:
>> On Wed, Sep 04, 2024 at 01:29:29PM +0200, Konrad Dybcio wrote:
>>> On 4.09.2024 9:00 AM, Johan Hovold wrote:
>>
>> [...]
>>
>>>>
>>>> Unfortunately, this patch breaks resume from suspend on the x1e80100 crd:
>>>>
>>>> [ 26.919676] CPU4: Booted secondary processor 0x0000010000 [0x511f0011]
>>>> [ 26.960607] arm-scmi firmware:scmi: timed out in resp(caller: do_xfer+0x164/0x568)
>>>> [ 26.987142] cpufreq: cpufreq_online: ->get() failed
>>>>
>>>> and then the machine hangs (mostly, I saw an nvme timeout message after a
>>>> while).
>>>>
>>>> Make sure you test suspend as well as some of the warnings I reported
>>>> only show up during suspend.
>>>
>>> Eh it looks like PERF_LEVEL_GET (msgid 8) requires the use of FC, but
>>> the firmware fails to inform us about it through BIT(0) in attrs..
>>>
>>
>> Just trying to understand things better here. So the firmware expects OSPM
>> to just use FC only for PERF_LEVEL_GET and hence doesn't implement the
>> default/normal channel for PERF_LEVEL_GET(I assume it returns error ?)
>> but fails to set the attribute indicating FC is available for the domain.
>>
>
> Is not that FCs are optional BUT PERF_LEVEL_GET standard messages is
> support is mandatory by the spec anyway ?
So doing a bit of poking I think it's that FC is not marked as supported,
but we need to read out the frequency from the .get_addr.. which is only
populated if we go through fastchannel_init
Konrad
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH V2 1/2] firmware: arm_scmi: Ensure that the message-id supports fastchannel
2024-09-05 12:54 ` Konrad Dybcio
@ 2024-10-07 6:46 ` Sibi Sankar
0 siblings, 0 replies; 22+ messages in thread
From: Sibi Sankar @ 2024-10-07 6:46 UTC (permalink / raw)
To: Konrad Dybcio, Cristian Marussi, Sudeep Holla
Cc: Johan Hovold, linux-kernel, arm-scmi, linux-arm-kernel,
linux-arm-msm
On 9/5/24 18:24, Konrad Dybcio wrote:
> On 4.09.2024 4:20 PM, Cristian Marussi wrote:
>> On Wed, Sep 04, 2024 at 01:38:55PM +0100, Sudeep Holla wrote:
>>> On Wed, Sep 04, 2024 at 01:29:29PM +0200, Konrad Dybcio wrote:
>>>> On 4.09.2024 9:00 AM, Johan Hovold wrote:
>>>
>>> [...]
>>>
>>>>>
>>>>> Unfortunately, this patch breaks resume from suspend on the x1e80100 crd:
>>>>>
>>>>> [ 26.919676] CPU4: Booted secondary processor 0x0000010000 [0x511f0011]
>>>>> [ 26.960607] arm-scmi firmware:scmi: timed out in resp(caller: do_xfer+0x164/0x568)
>>>>> [ 26.987142] cpufreq: cpufreq_online: ->get() failed
>>>>>
>>>>> and then the machine hangs (mostly, I saw an nvme timeout message after a
>>>>> while).
>>>>>
>>>>> Make sure you test suspend as well as some of the warnings I reported
>>>>> only show up during suspend.
>>>>
>>>> Eh it looks like PERF_LEVEL_GET (msgid 8) requires the use of FC, but
>>>> the firmware fails to inform us about it through BIT(0) in attrs..
>>>>
>>>
>>> Just trying to understand things better here. So the firmware expects OSPM
>>> to just use FC only for PERF_LEVEL_GET and hence doesn't implement the
>>> default/normal channel for PERF_LEVEL_GET(I assume it returns error ?)
>>> but fails to set the attribute indicating FC is available for the domain.
>>>
>>
>> Is not that FCs are optional BUT PERF_LEVEL_GET standard messages is
>> support is mandatory by the spec anyway ?
>
> So doing a bit of poking I think it's that FC is not marked as supported,
> but we need to read out the frequency from the .get_addr.. which is only
> populated if we go through fastchannel_init
On further debug it was found that the SCP was servicing the request
but mailbox had the interrupt disabled during suspend which caused the
timeout. I just re-spun the series wit hte fix. So yeah PERF_LEVEL_GET
is expectedto work without any problems. There is no dependence on EC as
Konrad speculated. Just a straight forward case of interrupt being
disabled in the resume path.
-Sibi
>
> Konrad
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH V2 2/2] firmware: arm_scmi: Skip adding bad duplicates
2024-09-04 16:20 ` Cristian Marussi
@ 2024-10-07 6:51 ` Sibi Sankar
0 siblings, 0 replies; 22+ messages in thread
From: Sibi Sankar @ 2024-10-07 6:51 UTC (permalink / raw)
To: Cristian Marussi, Sudeep Holla
Cc: linux-kernel, arm-scmi, linux-arm-kernel, linux-arm-msm, johan,
konradybcio
On 9/4/24 21:50, Cristian Marussi wrote:
> On Wed, Sep 04, 2024 at 05:12:29PM +0100, Sudeep Holla wrote:
>> On Wed, Sep 04, 2024 at 08:43:24AM +0530, Sibi Sankar wrote:
>>> Ensure that the bad duplicates reported by the platform firmware doesn't
>>> get added to the opp-tables.
>>>
>>
>> I am really interested to know if the platform firmware is presenting
>> duplicates intentionally for some unknown reasons and we are just speculating
>> it to be broken firmware or is it really broken firmware.
>>
>> For me, it is very hard to digest something like OPP tables which is there
>> for a very long time now is not very well understood by firmware authors.
>> How many duplicates are we seeing on this platform really ? If it is
>> just one I can understand. More than one is hard to miss from the OPP
>> tables in the firmware.
>>
>> While I am not opposing to make the driver handle these duplicates,
>> I am just worried if they are put there intentionally for reasons we
>> don't understand yet or not published.
>>
>
> The number of duplicates reported in logs makes me suspect the same...seems
> like intentional/by_design .... but at first I stick to the general issue
> of handling bad fw replies and how to survive kernel side at first...but I
> indeed share your same concerns...
Hey Cristian/Sudeep,
The number of opps being duplicated is limited to the max
sustainable frequency before we see the turbo frequency. This
was pretty much the case in older non scmi perf qc cpufreq
drivers. They just filter it there, but I've gotten word that
this will get fixed in firmware for this SoC and any future ones
planning to use scmi-perf for cpufreq.
-Sibi
>
> Thanks,
> Cristian
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH V2 2/2] firmware: arm_scmi: Skip adding bad duplicates
2024-09-04 15:30 ` Cristian Marussi
2024-09-04 15:46 ` Cristian Marussi
@ 2024-10-07 7:00 ` Sibi Sankar
2024-10-09 14:20 ` Cristian Marussi
1 sibling, 1 reply; 22+ messages in thread
From: Sibi Sankar @ 2024-10-07 7:00 UTC (permalink / raw)
To: Cristian Marussi
Cc: sudeep.holla, linux-kernel, arm-scmi, linux-arm-kernel,
linux-arm-msm, johan, konradybcio
On 9/4/24 21:00, Cristian Marussi wrote:
> On Wed, Sep 04, 2024 at 04:21:49PM +0100, Cristian Marussi wrote:
>> On Wed, Sep 04, 2024 at 08:43:24AM +0530, Sibi Sankar wrote:
>>> Ensure that the bad duplicates reported by the platform firmware doesn't
>>> get added to the opp-tables.
>>>
>>
>> Hi Sibi,
>>
>> so if the idea is to make the code more robust when FW sends BAD
>> duplicates, you necessarily need to properly drop opps in opp_count too.
>>
>> One other option would be to just loop with xa_for_each BUT opp_count is
>> used in a number of places...so first of all let's try drop count properly.
>>
>> Can you try this patch down below, instead of your patch.
>> If it solves, I will send a patch (after testing it a bit more :D)
>
> Hold on... I sent you a diff that does not apply probably on your tree due
> to some uncomitted local work of mine...my bad...let me resend.
Hey Cristian,
Thanks for taking time to send out the diff. I thought this would be
enough but there will still be a disconnect between opp_count and idx
of the opp we populate. Consider a case were we get to have a valid
opp just after duplicate opp. The opp_count will still limit us on what
levels we are allowed to see.
-Sibi
>
> Thanks,
> Cristian
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH V2 2/2] firmware: arm_scmi: Skip adding bad duplicates
2024-10-07 7:00 ` Sibi Sankar
@ 2024-10-09 14:20 ` Cristian Marussi
0 siblings, 0 replies; 22+ messages in thread
From: Cristian Marussi @ 2024-10-09 14:20 UTC (permalink / raw)
To: Sibi Sankar
Cc: Cristian Marussi, sudeep.holla, linux-kernel, arm-scmi,
linux-arm-kernel, linux-arm-msm, johan, konradybcio
On Mon, Oct 07, 2024 at 12:30:14PM +0530, Sibi Sankar wrote:
>
>
> On 9/4/24 21:00, Cristian Marussi wrote:
> > On Wed, Sep 04, 2024 at 04:21:49PM +0100, Cristian Marussi wrote:
> > > On Wed, Sep 04, 2024 at 08:43:24AM +0530, Sibi Sankar wrote:
> > > > Ensure that the bad duplicates reported by the platform firmware doesn't
> > > > get added to the opp-tables.
> > > >
> > >
> > > Hi Sibi,
> > >
> > > so if the idea is to make the code more robust when FW sends BAD
> > > duplicates, you necessarily need to properly drop opps in opp_count too.
> > >
> > > One other option would be to just loop with xa_for_each BUT opp_count is
> > > used in a number of places...so first of all let's try drop count properly.
> > >
> > > Can you try this patch down below, instead of your patch.
> > > If it solves, I will send a patch (after testing it a bit more :D)
> >
> > Hold on... I sent you a diff that does not apply probably on your tree due
> > to some uncomitted local work of mine...my bad...let me resend.
>
> Hey Cristian,
> Thanks for taking time to send out the diff. I thought this would be
> enough but there will still be a disconnect between opp_count and idx
> of the opp we populate. Consider a case were we get to have a valid
> opp just after duplicate opp. The opp_count will still limit us on what
> levels we are allowed to see.
>
Ah right...indeed... I missed that the opp_count is used also to loop on the
opps arrays and OPPs are not only accessed by xa_load....
...anyway the index in the dom->opp arrauy is NOT related to index/level
indexing, so we just have to have the bad oop dupicates also in the
array and NOT only in the XArray...
I am sending you, as a reply to this patch, a new version of my fix
with just a one-line difference tthat should solve completely the issue
also in the usecase that you describe.
Thanks,
Cristian
^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2024-10-09 14:22 UTC | newest]
Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-09-04 3:13 [PATCH V2 0/2] firmware: arm_scmi: Misc Fixes Sibi Sankar
2024-09-04 3:13 ` [PATCH V2 1/2] firmware: arm_scmi: Ensure that the message-id supports fastchannel Sibi Sankar
2024-09-04 7:00 ` Johan Hovold
2024-09-04 11:29 ` Konrad Dybcio
2024-09-04 12:38 ` Sudeep Holla
2024-09-04 14:20 ` Cristian Marussi
2024-09-05 12:54 ` Konrad Dybcio
2024-10-07 6:46 ` Sibi Sankar
2024-09-04 3:13 ` [PATCH V2 2/2] firmware: arm_scmi: Skip adding bad duplicates Sibi Sankar
2024-09-04 7:09 ` Johan Hovold
2024-09-04 8:05 ` Johan Hovold
2024-09-04 13:56 ` Konrad Dybcio
2024-09-04 14:00 ` Konrad Dybcio
2024-09-04 15:21 ` Cristian Marussi
2024-09-04 15:30 ` Cristian Marussi
2024-09-04 15:46 ` Cristian Marussi
2024-09-05 12:43 ` Cristian Marussi
2024-10-07 7:00 ` Sibi Sankar
2024-10-09 14:20 ` Cristian Marussi
2024-09-04 16:12 ` Sudeep Holla
2024-09-04 16:20 ` Cristian Marussi
2024-10-07 6:51 ` Sibi Sankar
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).