linux-pm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH V3 0/4] firmware: arm_scmi: Misc Fixes
@ 2024-10-07  6:06 Sibi Sankar
  2024-10-07  6:06 ` [PATCH V3 1/4] firmware: arm_scmi: Ensure that the message-id supports fastchannel Sibi Sankar
                   ` (5 more replies)
  0 siblings, 6 replies; 26+ messages in thread
From: Sibi Sankar @ 2024-10-07  6:06 UTC (permalink / raw)
  To: sudeep.holla, cristian.marussi, ulf.hansson, jassisinghbrar
  Cc: linux-kernel, arm-scmi, linux-arm-kernel, linux-arm-msm,
	quic_sibis, johan, konradybcio, linux-pm, tstrudel, rafael

The series addresses the kernel warnings reported by Johan at [1] and are
are required to X1E cpufreq device tree changes [2] to land.

[1] - https://lore.kernel.org/lkml/ZoQjAWse2YxwyRJv@hovoldconsulting.com/
[2] - https://lore.kernel.org/lkml/20240612124056.39230-1-quic_sibis@quicinc.com/

The following warnings remain unadressed:
arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16

Duplicate levels:
arm-scmi arm-scmi.0.auto: Level 2976000 Power 218062 Latency 30us Ifreq 2976000 Index 10
arm-scmi arm-scmi.0.auto: Level 3206400 Power 264356 Latency 30us Ifreq 3206400 Index 11
arm-scmi arm-scmi.0.auto: Level 3417600 Power 314966 Latency 30us Ifreq 3417600 Index 12
arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
arm-scmi arm-scmi.0.auto: Level 4012800 Power 528848 Latency 30us Ifreq 4012800 Index 15

^^ exist because SCP reports duplicate values for the highest sustainable
freq for perf domains 1 and 2. These are the only freqs that appear as
duplicates and will be fixed with a firmware update. FWIW the warnings
that we are addressing in this series will also get fixed by a firmware
update but they still have to land for devices already out in the wild.

V2:
* Include the fix for do_xfer timeout
* Include the fix debugfs node creation failure
* Include Cristian's fix for skipping opp duplication

V1:
* add missing MSG_SUPPORTS_FASTCHANNEL definition.

Base branch: next-20241004

Cristian Marussi (1):
  firmware: arm_scmi: Skip adding bad duplicates

Sibi Sankar (3):
  firmware: arm_scmi: Ensure that the message-id supports fastchannel
  pmdomain: core: Fix debugfs node creation failure
  mailbox: qcom-cpucp: Mark the irq with IRQF_NO_SUSPEND flag

 drivers/firmware/arm_scmi/driver.c    |  9 ++++++
 drivers/firmware/arm_scmi/perf.c      | 37 +++++++++++++++++++------
 drivers/firmware/arm_scmi/protocols.h |  2 ++
 drivers/mailbox/qcom-cpucp-mbox.c     |  2 +-
 drivers/pmdomain/core.c               | 40 +++++++++++++++++----------
 include/linux/pm_domain.h             |  1 +
 6 files changed, 66 insertions(+), 25 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH V3 1/4] firmware: arm_scmi: Ensure that the message-id supports fastchannel
  2024-10-07  6:06 [PATCH V3 0/4] firmware: arm_scmi: Misc Fixes Sibi Sankar
@ 2024-10-07  6:06 ` Sibi Sankar
  2024-10-09 13:46   ` Sudeep Holla
  2024-10-10 14:55   ` Johan Hovold
  2024-10-07  6:06 ` [PATCH V3 2/4] firmware: arm_scmi: Skip adding bad duplicates Sibi Sankar
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 26+ messages in thread
From: Sibi Sankar @ 2024-10-07  6:06 UTC (permalink / raw)
  To: sudeep.holla, cristian.marussi, ulf.hansson, jassisinghbrar
  Cc: linux-kernel, arm-scmi, linux-arm-kernel, linux-arm-msm,
	quic_sibis, johan, konradybcio, linux-pm, tstrudel, rafael,
	Johan Hovold

Currently the perf and powercap protocol relies on the protocol domain
attributes, which just ensures that one fastchannel per domain, before
instantiating fastchannels for all possible message-ids. Fix this by
ensuring that each message-id supports fastchannel before initialization.

Reported-by: Johan Hovold <johan+linaro@kernel.org>
Closes: https://lore.kernel.org/lkml/ZoQjAWse2YxwyRJv@hovoldconsulting.com/
Fixes: 6f9ea4dabd2d ("firmware: arm_scmi: Generalize the fast channel support")
Signed-off-by: Sibi Sankar <quic_sibis@quicinc.com>
---
 drivers/firmware/arm_scmi/driver.c    | 9 +++++++++
 drivers/firmware/arm_scmi/protocols.h | 2 ++
 2 files changed, 11 insertions(+)

diff --git a/drivers/firmware/arm_scmi/driver.c b/drivers/firmware/arm_scmi/driver.c
index 88c5c4ff4bb6..80a9a615672a 100644
--- a/drivers/firmware/arm_scmi/driver.c
+++ b/drivers/firmware/arm_scmi/driver.c
@@ -56,6 +56,9 @@ static atomic_t transfer_last_id;
 
 static struct dentry *scmi_top_dentry;
 
+static int scmi_protocol_msg_check(const struct scmi_protocol_handle *ph,
+				   u32 message_id, u32 *attributes);
+
 /**
  * struct scmi_xfers_info - Structure to manage transfer information
  *
@@ -1841,6 +1844,7 @@ scmi_common_fastchannel_init(const struct scmi_protocol_handle *ph,
 	int ret;
 	u32 flags;
 	u64 phys_addr;
+	u32 attributes;
 	u8 size;
 	void __iomem *addr;
 	struct scmi_xfer *t;
@@ -1849,6 +1853,11 @@ scmi_common_fastchannel_init(const struct scmi_protocol_handle *ph,
 	struct scmi_msg_resp_desc_fc *resp;
 	const struct scmi_protocol_instance *pi = ph_to_pi(ph);
 
+	/* Check if the MSG_ID supports fastchannel */
+	ret = scmi_protocol_msg_check(ph, message_id, &attributes);
+	if (!ret && !MSG_SUPPORTS_FASTCHANNEL(attributes))
+		return;
+
 	if (!p_addr) {
 		ret = -EINVAL;
 		goto err_out;
diff --git a/drivers/firmware/arm_scmi/protocols.h b/drivers/firmware/arm_scmi/protocols.h
index aaee57cdcd55..d62c4469d1fd 100644
--- a/drivers/firmware/arm_scmi/protocols.h
+++ b/drivers/firmware/arm_scmi/protocols.h
@@ -31,6 +31,8 @@
 
 #define SCMI_PROTOCOL_VENDOR_BASE	0x80
 
+#define MSG_SUPPORTS_FASTCHANNEL(x)	((x) & BIT(0))
+
 enum scmi_common_cmd {
 	PROTOCOL_VERSION = 0x0,
 	PROTOCOL_ATTRIBUTES = 0x1,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH V3 2/4] firmware: arm_scmi: Skip adding bad duplicates
  2024-10-07  6:06 [PATCH V3 0/4] firmware: arm_scmi: Misc Fixes Sibi Sankar
  2024-10-07  6:06 ` [PATCH V3 1/4] firmware: arm_scmi: Ensure that the message-id supports fastchannel Sibi Sankar
@ 2024-10-07  6:06 ` Sibi Sankar
  2024-10-07  6:06 ` [PATCH V3 3/4] pmdomain: core: Fix debugfs node creation failure Sibi Sankar
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 26+ messages in thread
From: Sibi Sankar @ 2024-10-07  6:06 UTC (permalink / raw)
  To: sudeep.holla, cristian.marussi, ulf.hansson, jassisinghbrar
  Cc: linux-kernel, arm-scmi, linux-arm-kernel, linux-arm-msm,
	quic_sibis, johan, konradybcio, linux-pm, tstrudel, rafael,
	Johan Hovold

From: Cristian Marussi <cristian.marussi@arm.com>

Ensure that the bad duplicates reported by the platform firmware
doesn't get added to the opp-tables.

Reported-by: Johan Hovold <johan+linaro@kernel.org>
Closes: https://lore.kernel.org/lkml/ZoQjAWse2YxwyRJv@hovoldconsulting.com/
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Signed-off-by: Sibi Sankar <quic_sibis@quicinc.com>
---
 drivers/firmware/arm_scmi/perf.c | 37 ++++++++++++++++++++++++--------
 1 file changed, 28 insertions(+), 9 deletions(-)

diff --git a/drivers/firmware/arm_scmi/perf.c b/drivers/firmware/arm_scmi/perf.c
index 2d77b5f40ca7..38893cc44423 100644
--- a/drivers/firmware/arm_scmi/perf.c
+++ b/drivers/firmware/arm_scmi/perf.c
@@ -373,7 +373,7 @@ static int iter_perf_levels_update_state(struct scmi_iterator_state *st,
 	return 0;
 }
 
-static inline void
+static inline int
 process_response_opp(struct device *dev, struct perf_dom_info *dom,
 		     struct scmi_opp *opp, unsigned int loop_idx,
 		     const struct scmi_msg_resp_perf_describe_levels *r)
@@ -386,12 +386,16 @@ process_response_opp(struct device *dev, struct perf_dom_info *dom,
 		le16_to_cpu(r->opp[loop_idx].transition_latency_us);
 
 	ret = xa_insert(&dom->opps_by_lvl, opp->perf, opp, GFP_KERNEL);
-	if (ret)
+	if (ret) {
 		dev_warn(dev, "Failed to add opps_by_lvl at %d for %s - ret:%d\n",
 			 opp->perf, dom->info.name, ret);
+		return ret;
+	}
+
+	return 0;
 }
 
-static inline void
+static inline int
 process_response_opp_v4(struct device *dev, struct perf_dom_info *dom,
 			struct scmi_opp *opp, unsigned int loop_idx,
 			const struct scmi_msg_resp_perf_describe_levels_v4 *r)
@@ -404,10 +408,13 @@ process_response_opp_v4(struct device *dev, struct perf_dom_info *dom,
 		le16_to_cpu(r->opp[loop_idx].transition_latency_us);
 
 	ret = xa_insert(&dom->opps_by_lvl, opp->perf, opp, GFP_KERNEL);
-	if (ret)
+	if (ret) {
 		dev_warn(dev, "Failed to add opps_by_lvl at %d for %s - ret:%d\n",
 			 opp->perf, dom->info.name, ret);
 
+		return ret;
+	}
+
 	/* Note that PERF v4 reports always five 32-bit words */
 	opp->indicative_freq = le32_to_cpu(r->opp[loop_idx].indicative_freq);
 	if (dom->level_indexing_mode) {
@@ -415,13 +422,21 @@ process_response_opp_v4(struct device *dev, struct perf_dom_info *dom,
 
 		ret = xa_insert(&dom->opps_by_idx, opp->level_index, opp,
 				GFP_KERNEL);
-		if (ret)
+		if (ret) {
 			dev_warn(dev,
 				 "Failed to add opps_by_idx at %d for %s - ret:%d\n",
 				 opp->level_index, dom->info.name, ret);
 
+			/* Cleanup by_lvl too */
+			xa_erase(&dom->opps_by_lvl, opp->perf);
+
+			return ret;
+		}
+
 		hash_add(dom->opps_by_freq, &opp->hash, opp->indicative_freq);
 	}
+
+	return 0;
 }
 
 static int
@@ -429,16 +444,20 @@ iter_perf_levels_process_response(const struct scmi_protocol_handle *ph,
 				  const void *response,
 				  struct scmi_iterator_state *st, void *priv)
 {
+	int ret;
 	struct scmi_opp *opp;
 	struct scmi_perf_ipriv *p = priv;
 
 	opp = &p->perf_dom->opp[st->desc_index + st->loop_idx];
 	if (PROTOCOL_REV_MAJOR(p->version) <= 0x3)
-		process_response_opp(ph->dev, p->perf_dom, opp, st->loop_idx,
-				     response);
+		ret = process_response_opp(ph->dev, p->perf_dom, opp, st->loop_idx, response);
 	else
-		process_response_opp_v4(ph->dev, p->perf_dom, opp, st->loop_idx,
-					response);
+		ret = process_response_opp_v4(ph->dev, p->perf_dom, opp, st->loop_idx, response);
+
+	/* Skip BAD duplicates received from firmware */
+	if (ret)
+		return ret == -EBUSY ? 0 : ret;
+
 	p->perf_dom->opp_count++;
 
 	dev_dbg(ph->dev, "Level %d Power %d Latency %dus Ifreq %d Index %d\n",
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH V3 3/4] pmdomain: core: Fix debugfs node creation failure
  2024-10-07  6:06 [PATCH V3 0/4] firmware: arm_scmi: Misc Fixes Sibi Sankar
  2024-10-07  6:06 ` [PATCH V3 1/4] firmware: arm_scmi: Ensure that the message-id supports fastchannel Sibi Sankar
  2024-10-07  6:06 ` [PATCH V3 2/4] firmware: arm_scmi: Skip adding bad duplicates Sibi Sankar
@ 2024-10-07  6:06 ` Sibi Sankar
  2024-10-07 17:33   ` Dmitry Baryshkov
  2024-10-07  6:06 ` [PATCH V3 4/4] mailbox: qcom-cpucp: Mark the irq with IRQF_NO_SUSPEND flag Sibi Sankar
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 26+ messages in thread
From: Sibi Sankar @ 2024-10-07  6:06 UTC (permalink / raw)
  To: sudeep.holla, cristian.marussi, ulf.hansson, jassisinghbrar
  Cc: linux-kernel, arm-scmi, linux-arm-kernel, linux-arm-msm,
	quic_sibis, johan, konradybcio, linux-pm, tstrudel, rafael,
	Johan Hovold

The domain attributes returned by the perf protocol can end up
reporting identical names across domains, resulting in debugfs
node creation failure. Fix this failure by ensuring that pm domains
get a unique name using ida in pm_genpd_init.

Logs: [X1E reports 'NCC' for all its scmi perf domains]
debugfs: Directory 'NCC' with parent 'pm_genpd' already present!
debugfs: Directory 'NCC' with parent 'pm_genpd' already present!

Reported-by: Johan Hovold <johan+linaro@kernel.org>
Closes: https://lore.kernel.org/lkml/ZoQjAWse2YxwyRJv@hovoldconsulting.com/
Fixes: 718072ceb211 ("PM: domains: create debugfs nodes when adding power domains")
Fix-suggested-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Sibi Sankar <quic_sibis@quicinc.com>
---

genpd names with ida appended:
power-domain-cpu0_0
power-domain-cpu1_1
....
ebi_18
gfx_19
...
NCC_56
NCC_57
NCC_58

genpd summary with ida appended:
domain                          status          children        performance
    /device                         runtime status                  managed by
    ------------------------------------------------------------------------------
    NCC_58                          on                                                 0
    NCC_57                          on                                                 0
    NCC_56                          on                                                 0
    ...
    gfx_19                          off-0                                              0
    ebi_18                          off-0                                              0
    ...
    power-domain-cpu1_1             off-0                                              0
	genpd:0:cpu1                    suspended                   0           SW
    power-domain-cpu0_0             off-0                                              0
	genpd:0:cpu0                    suspended                   0           SW

 drivers/pmdomain/core.c   | 40 ++++++++++++++++++++++++---------------
 include/linux/pm_domain.h |  1 +
 2 files changed, 26 insertions(+), 15 deletions(-)

diff --git a/drivers/pmdomain/core.c b/drivers/pmdomain/core.c
index 5ede0f7eda09..631cb732bb39 100644
--- a/drivers/pmdomain/core.c
+++ b/drivers/pmdomain/core.c
@@ -7,6 +7,7 @@
 #define pr_fmt(fmt) "PM: " fmt
 
 #include <linux/delay.h>
+#include <linux/idr.h>
 #include <linux/kernel.h>
 #include <linux/io.h>
 #include <linux/platform_device.h>
@@ -23,6 +24,9 @@
 #include <linux/cpu.h>
 #include <linux/debugfs.h>
 
+/* Provides a unique ID for each genpd device */
+static DEFINE_IDA(genpd_ida);
+
 #define GENPD_RETRY_MAX_MS	250		/* Approximate */
 
 #define GENPD_DEV_CALLBACK(genpd, type, callback, dev)		\
@@ -189,7 +193,7 @@ static inline bool irq_safe_dev_in_sleep_domain(struct device *dev,
 
 	if (ret)
 		dev_warn_once(dev, "PM domain %s will not be powered off\n",
-				genpd->name);
+			      dev_name(&genpd->dev));
 
 	return ret;
 }
@@ -274,7 +278,7 @@ static void genpd_debug_remove(struct generic_pm_domain *genpd)
 	if (!genpd_debugfs_dir)
 		return;
 
-	debugfs_lookup_and_remove(genpd->name, genpd_debugfs_dir);
+	debugfs_lookup_and_remove(dev_name(&genpd->dev), genpd_debugfs_dir);
 }
 
 static void genpd_update_accounting(struct generic_pm_domain *genpd)
@@ -731,7 +735,7 @@ static int _genpd_power_on(struct generic_pm_domain *genpd, bool timed)
 	genpd->states[state_idx].power_on_latency_ns = elapsed_ns;
 	genpd->gd->max_off_time_changed = true;
 	pr_debug("%s: Power-%s latency exceeded, new value %lld ns\n",
-		 genpd->name, "on", elapsed_ns);
+		 dev_name(&genpd->dev), "on", elapsed_ns);
 
 out:
 	raw_notifier_call_chain(&genpd->power_notifiers, GENPD_NOTIFY_ON, NULL);
@@ -782,7 +786,7 @@ static int _genpd_power_off(struct generic_pm_domain *genpd, bool timed)
 	genpd->states[state_idx].power_off_latency_ns = elapsed_ns;
 	genpd->gd->max_off_time_changed = true;
 	pr_debug("%s: Power-%s latency exceeded, new value %lld ns\n",
-		 genpd->name, "off", elapsed_ns);
+		 dev_name(&genpd->dev), "off", elapsed_ns);
 
 out:
 	raw_notifier_call_chain(&genpd->power_notifiers, GENPD_NOTIFY_OFF,
@@ -1940,7 +1944,7 @@ int dev_pm_genpd_add_notifier(struct device *dev, struct notifier_block *nb)
 
 	if (ret) {
 		dev_warn(dev, "failed to add notifier for PM domain %s\n",
-			 genpd->name);
+			 dev_name(&genpd->dev));
 		return ret;
 	}
 
@@ -1987,7 +1991,7 @@ int dev_pm_genpd_remove_notifier(struct device *dev)
 
 	if (ret) {
 		dev_warn(dev, "failed to remove notifier for PM domain %s\n",
-			 genpd->name);
+			 dev_name(&genpd->dev));
 		return ret;
 	}
 
@@ -2013,7 +2017,7 @@ static int genpd_add_subdomain(struct generic_pm_domain *genpd,
 	 */
 	if (!genpd_is_irq_safe(genpd) && genpd_is_irq_safe(subdomain)) {
 		WARN(1, "Parent %s of subdomain %s must be IRQ safe\n",
-				genpd->name, subdomain->name);
+		     dev_name(&genpd->dev), subdomain->name);
 		return -EINVAL;
 	}
 
@@ -2088,7 +2092,7 @@ int pm_genpd_remove_subdomain(struct generic_pm_domain *genpd,
 
 	if (!list_empty(&subdomain->parent_links) || subdomain->device_count) {
 		pr_warn("%s: unable to remove subdomain %s\n",
-			genpd->name, subdomain->name);
+			dev_name(&genpd->dev), subdomain->name);
 		ret = -EBUSY;
 		goto out;
 	}
@@ -2264,8 +2268,13 @@ int pm_genpd_init(struct generic_pm_domain *genpd,
 	if (ret)
 		return ret;
 
+	ret = ida_alloc(&genpd_ida, GFP_KERNEL);
+	if (ret < 0)
+		return ret;
+	genpd->device_id = ret;
+
 	device_initialize(&genpd->dev);
-	dev_set_name(&genpd->dev, "%s", genpd->name);
+	dev_set_name(&genpd->dev, "%s_%u", genpd->name, genpd->device_id);
 
 	mutex_lock(&gpd_list_lock);
 	list_add(&genpd->gpd_list_node, &gpd_list);
@@ -2287,13 +2296,13 @@ static int genpd_remove(struct generic_pm_domain *genpd)
 
 	if (genpd->has_provider) {
 		genpd_unlock(genpd);
-		pr_err("Provider present, unable to remove %s\n", genpd->name);
+		pr_err("Provider present, unable to remove %s\n", dev_name(&genpd->dev));
 		return -EBUSY;
 	}
 
 	if (!list_empty(&genpd->parent_links) || genpd->device_count) {
 		genpd_unlock(genpd);
-		pr_err("%s: unable to remove %s\n", __func__, genpd->name);
+		pr_err("%s: unable to remove %s\n", __func__, dev_name(&genpd->dev));
 		return -EBUSY;
 	}
 
@@ -2307,9 +2316,10 @@ static int genpd_remove(struct generic_pm_domain *genpd)
 	genpd_unlock(genpd);
 	genpd_debug_remove(genpd);
 	cancel_work_sync(&genpd->power_off_work);
+	ida_free(&genpd_ida, genpd->device_id);
 	genpd_free_data(genpd);
 
-	pr_debug("%s: removed %s\n", __func__, genpd->name);
+	pr_debug("%s: removed %s\n", __func__, dev_name(&genpd->dev));
 
 	return 0;
 }
@@ -3272,12 +3282,12 @@ static int genpd_summary_one(struct seq_file *s,
 	else
 		snprintf(state, sizeof(state), "%s",
 			 status_lookup[genpd->status]);
-	seq_printf(s, "%-30s  %-30s  %u", genpd->name, state, genpd->performance_state);
+	seq_printf(s, "%-30s  %-30s  %u", dev_name(&genpd->dev), state, genpd->performance_state);
 
 	/*
 	 * Modifications on the list require holding locks on both
 	 * parent and child, so we are safe.
-	 * Also genpd->name is immutable.
+	 * Also the device name is immutable.
 	 */
 	list_for_each_entry(link, &genpd->parent_links, parent_node) {
 		if (list_is_first(&link->parent_node, &genpd->parent_links))
@@ -3502,7 +3512,7 @@ static void genpd_debug_add(struct generic_pm_domain *genpd)
 	if (!genpd_debugfs_dir)
 		return;
 
-	d = debugfs_create_dir(genpd->name, genpd_debugfs_dir);
+	d = debugfs_create_dir(dev_name(&genpd->dev), genpd_debugfs_dir);
 
 	debugfs_create_file("current_state", 0444,
 			    d, genpd, &status_fops);
diff --git a/include/linux/pm_domain.h b/include/linux/pm_domain.h
index b637ec14025f..738df5296ec7 100644
--- a/include/linux/pm_domain.h
+++ b/include/linux/pm_domain.h
@@ -163,6 +163,7 @@ struct generic_pm_domain {
 	atomic_t sd_count;	/* Number of subdomains with power "on" */
 	enum gpd_status status;	/* Current state of the domain */
 	unsigned int device_count;	/* Number of devices */
+	unsigned int device_id;		/* unique device id */
 	unsigned int suspended_count;	/* System suspend device counter */
 	unsigned int prepared_count;	/* Suspend counter of prepared devices */
 	unsigned int performance_state;	/* Aggregated max performance state */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH V3 4/4] mailbox: qcom-cpucp: Mark the irq with IRQF_NO_SUSPEND flag
  2024-10-07  6:06 [PATCH V3 0/4] firmware: arm_scmi: Misc Fixes Sibi Sankar
                   ` (2 preceding siblings ...)
  2024-10-07  6:06 ` [PATCH V3 3/4] pmdomain: core: Fix debugfs node creation failure Sibi Sankar
@ 2024-10-07  6:06 ` Sibi Sankar
  2024-10-07 13:14   ` Konrad Dybcio
  2024-10-10 14:58   ` Johan Hovold
  2024-10-09 11:14 ` [PATCH V3 0/4] firmware: arm_scmi: Misc Fixes Ulf Hansson
  2024-10-10 15:02 ` Johan Hovold
  5 siblings, 2 replies; 26+ messages in thread
From: Sibi Sankar @ 2024-10-07  6:06 UTC (permalink / raw)
  To: sudeep.holla, cristian.marussi, ulf.hansson, jassisinghbrar
  Cc: linux-kernel, arm-scmi, linux-arm-kernel, linux-arm-msm,
	quic_sibis, johan, konradybcio, linux-pm, tstrudel, rafael,
	Johan Hovold

The qcom-cpucp mailbox irq is expected to function during suspend-resume
cycle particularly when the scmi cpufreq driver can query the current
frequency using the get_level message after the cpus are brought up during
resume. Hence mark the irq with IRQF_NO_SUSPEND flag to fix the do_xfer
failures we see during resume.

Err Logs:
arm-scmi firmware:scmi: timed out in resp(caller:do_xfer+0x164/0x568)
cpufreq: cpufreq_online: ->get() failed

Reported-by: Johan Hovold <johan+linaro@kernel.org>
Closes: https://lore.kernel.org/lkml/ZtgFj1y5ggipgEOS@hovoldconsulting.com/
Fixes: 0e2a9a03106c ("mailbox: Add support for QTI CPUCP mailbox controller")
Signed-off-by: Sibi Sankar <quic_sibis@quicinc.com>
---
 drivers/mailbox/qcom-cpucp-mbox.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/mailbox/qcom-cpucp-mbox.c b/drivers/mailbox/qcom-cpucp-mbox.c
index e5437c294803..44f4ed15f818 100644
--- a/drivers/mailbox/qcom-cpucp-mbox.c
+++ b/drivers/mailbox/qcom-cpucp-mbox.c
@@ -138,7 +138,7 @@ static int qcom_cpucp_mbox_probe(struct platform_device *pdev)
 		return irq;
 
 	ret = devm_request_irq(dev, irq, qcom_cpucp_mbox_irq_fn,
-			       IRQF_TRIGGER_HIGH, "apss_cpucp_mbox", cpucp);
+			       IRQF_TRIGGER_HIGH | IRQF_NO_SUSPEND, "apss_cpucp_mbox", cpucp);
 	if (ret < 0)
 		return dev_err_probe(dev, ret, "Failed to register irq: %d\n", irq);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH V3 4/4] mailbox: qcom-cpucp: Mark the irq with IRQF_NO_SUSPEND flag
  2024-10-07  6:06 ` [PATCH V3 4/4] mailbox: qcom-cpucp: Mark the irq with IRQF_NO_SUSPEND flag Sibi Sankar
@ 2024-10-07 13:14   ` Konrad Dybcio
  2024-10-10 14:58   ` Johan Hovold
  1 sibling, 0 replies; 26+ messages in thread
From: Konrad Dybcio @ 2024-10-07 13:14 UTC (permalink / raw)
  To: Sibi Sankar, sudeep.holla, cristian.marussi, ulf.hansson,
	jassisinghbrar
  Cc: linux-kernel, arm-scmi, linux-arm-kernel, linux-arm-msm, johan,
	konradybcio, linux-pm, tstrudel, rafael, Johan Hovold

On 7.10.2024 8:06 AM, Sibi Sankar wrote:
> The qcom-cpucp mailbox irq is expected to function during suspend-resume
> cycle particularly when the scmi cpufreq driver can query the current
> frequency using the get_level message after the cpus are brought up during
> resume. Hence mark the irq with IRQF_NO_SUSPEND flag to fix the do_xfer
> failures we see during resume.
> 
> Err Logs:
> arm-scmi firmware:scmi: timed out in resp(caller:do_xfer+0x164/0x568)
> cpufreq: cpufreq_online: ->get() failed
> 
> Reported-by: Johan Hovold <johan+linaro@kernel.org>
> Closes: https://lore.kernel.org/lkml/ZtgFj1y5ggipgEOS@hovoldconsulting.com/
> Fixes: 0e2a9a03106c ("mailbox: Add support for QTI CPUCP mailbox controller")
> Signed-off-by: Sibi Sankar <quic_sibis@quicinc.com>
> ---

Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>

Konrad

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V3 3/4] pmdomain: core: Fix debugfs node creation failure
  2024-10-07  6:06 ` [PATCH V3 3/4] pmdomain: core: Fix debugfs node creation failure Sibi Sankar
@ 2024-10-07 17:33   ` Dmitry Baryshkov
  2024-10-09 11:11     ` Ulf Hansson
  0 siblings, 1 reply; 26+ messages in thread
From: Dmitry Baryshkov @ 2024-10-07 17:33 UTC (permalink / raw)
  To: Sibi Sankar
  Cc: sudeep.holla, cristian.marussi, ulf.hansson, jassisinghbrar,
	linux-kernel, arm-scmi, linux-arm-kernel, linux-arm-msm, johan,
	konradybcio, linux-pm, tstrudel, rafael, Johan Hovold

On Mon, Oct 07, 2024 at 11:36:41AM GMT, Sibi Sankar wrote:
> The domain attributes returned by the perf protocol can end up
> reporting identical names across domains, resulting in debugfs
> node creation failure. Fix this failure by ensuring that pm domains
> get a unique name using ida in pm_genpd_init.

Can we make this opt-in or opt-out? Seeing numeric suffixes next to
well-known power domain names (e.g. those comin from RPMh or the CPU
domains) is a bit strange. Or maybe you can limit the IDA suffix just to
the SCMI / perf domains?

> 
> Logs: [X1E reports 'NCC' for all its scmi perf domains]
> debugfs: Directory 'NCC' with parent 'pm_genpd' already present!
> debugfs: Directory 'NCC' with parent 'pm_genpd' already present!
> 
> Reported-by: Johan Hovold <johan+linaro@kernel.org>
> Closes: https://lore.kernel.org/lkml/ZoQjAWse2YxwyRJv@hovoldconsulting.com/
> Fixes: 718072ceb211 ("PM: domains: create debugfs nodes when adding power domains")
> Fix-suggested-by: Ulf Hansson <ulf.hansson@linaro.org>

Just "Suggested-by: ..."

> Signed-off-by: Sibi Sankar <quic_sibis@quicinc.com>
> ---
> 
> genpd names with ida appended:
> power-domain-cpu0_0
> power-domain-cpu1_1
> ....
> ebi_18
> gfx_19
> ...
> NCC_56
> NCC_57
> NCC_58
> 
> genpd summary with ida appended:
> domain                          status          children        performance
>     /device                         runtime status                  managed by
>     ------------------------------------------------------------------------------
>     NCC_58                          on                                                 0
>     NCC_57                          on                                                 0
>     NCC_56                          on                                                 0
>     ...
>     gfx_19                          off-0                                              0
>     ebi_18                          off-0                                              0
>     ...
>     power-domain-cpu1_1             off-0                                              0
> 	genpd:0:cpu1                    suspended                   0           SW
>     power-domain-cpu0_0             off-0                                              0
> 	genpd:0:cpu0                    suspended                   0           SW
> 
>  drivers/pmdomain/core.c   | 40 ++++++++++++++++++++++++---------------
>  include/linux/pm_domain.h |  1 +
>  2 files changed, 26 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/pmdomain/core.c b/drivers/pmdomain/core.c
> index 5ede0f7eda09..631cb732bb39 100644
> --- a/drivers/pmdomain/core.c
> +++ b/drivers/pmdomain/core.c
> @@ -7,6 +7,7 @@
>  #define pr_fmt(fmt) "PM: " fmt
>  
>  #include <linux/delay.h>
> +#include <linux/idr.h>
>  #include <linux/kernel.h>
>  #include <linux/io.h>
>  #include <linux/platform_device.h>
> @@ -23,6 +24,9 @@
>  #include <linux/cpu.h>
>  #include <linux/debugfs.h>
>  
> +/* Provides a unique ID for each genpd device */
> +static DEFINE_IDA(genpd_ida);
> +
>  #define GENPD_RETRY_MAX_MS	250		/* Approximate */
>  
>  #define GENPD_DEV_CALLBACK(genpd, type, callback, dev)		\
> @@ -189,7 +193,7 @@ static inline bool irq_safe_dev_in_sleep_domain(struct device *dev,
>  
>  	if (ret)
>  		dev_warn_once(dev, "PM domain %s will not be powered off\n",
> -				genpd->name);
> +			      dev_name(&genpd->dev));
>  
>  	return ret;
>  }
> @@ -274,7 +278,7 @@ static void genpd_debug_remove(struct generic_pm_domain *genpd)
>  	if (!genpd_debugfs_dir)
>  		return;
>  
> -	debugfs_lookup_and_remove(genpd->name, genpd_debugfs_dir);
> +	debugfs_lookup_and_remove(dev_name(&genpd->dev), genpd_debugfs_dir);
>  }
>  
>  static void genpd_update_accounting(struct generic_pm_domain *genpd)
> @@ -731,7 +735,7 @@ static int _genpd_power_on(struct generic_pm_domain *genpd, bool timed)
>  	genpd->states[state_idx].power_on_latency_ns = elapsed_ns;
>  	genpd->gd->max_off_time_changed = true;
>  	pr_debug("%s: Power-%s latency exceeded, new value %lld ns\n",
> -		 genpd->name, "on", elapsed_ns);
> +		 dev_name(&genpd->dev), "on", elapsed_ns);
>  
>  out:
>  	raw_notifier_call_chain(&genpd->power_notifiers, GENPD_NOTIFY_ON, NULL);
> @@ -782,7 +786,7 @@ static int _genpd_power_off(struct generic_pm_domain *genpd, bool timed)
>  	genpd->states[state_idx].power_off_latency_ns = elapsed_ns;
>  	genpd->gd->max_off_time_changed = true;
>  	pr_debug("%s: Power-%s latency exceeded, new value %lld ns\n",
> -		 genpd->name, "off", elapsed_ns);
> +		 dev_name(&genpd->dev), "off", elapsed_ns);
>  
>  out:
>  	raw_notifier_call_chain(&genpd->power_notifiers, GENPD_NOTIFY_OFF,
> @@ -1940,7 +1944,7 @@ int dev_pm_genpd_add_notifier(struct device *dev, struct notifier_block *nb)
>  
>  	if (ret) {
>  		dev_warn(dev, "failed to add notifier for PM domain %s\n",
> -			 genpd->name);
> +			 dev_name(&genpd->dev));
>  		return ret;
>  	}
>  
> @@ -1987,7 +1991,7 @@ int dev_pm_genpd_remove_notifier(struct device *dev)
>  
>  	if (ret) {
>  		dev_warn(dev, "failed to remove notifier for PM domain %s\n",
> -			 genpd->name);
> +			 dev_name(&genpd->dev));
>  		return ret;
>  	}
>  
> @@ -2013,7 +2017,7 @@ static int genpd_add_subdomain(struct generic_pm_domain *genpd,
>  	 */
>  	if (!genpd_is_irq_safe(genpd) && genpd_is_irq_safe(subdomain)) {
>  		WARN(1, "Parent %s of subdomain %s must be IRQ safe\n",
> -				genpd->name, subdomain->name);
> +		     dev_name(&genpd->dev), subdomain->name);
>  		return -EINVAL;
>  	}
>  
> @@ -2088,7 +2092,7 @@ int pm_genpd_remove_subdomain(struct generic_pm_domain *genpd,
>  
>  	if (!list_empty(&subdomain->parent_links) || subdomain->device_count) {
>  		pr_warn("%s: unable to remove subdomain %s\n",
> -			genpd->name, subdomain->name);
> +			dev_name(&genpd->dev), subdomain->name);
>  		ret = -EBUSY;
>  		goto out;
>  	}
> @@ -2264,8 +2268,13 @@ int pm_genpd_init(struct generic_pm_domain *genpd,
>  	if (ret)
>  		return ret;
>  
> +	ret = ida_alloc(&genpd_ida, GFP_KERNEL);
> +	if (ret < 0)
> +		return ret;
> +	genpd->device_id = ret;
> +
>  	device_initialize(&genpd->dev);
> -	dev_set_name(&genpd->dev, "%s", genpd->name);
> +	dev_set_name(&genpd->dev, "%s_%u", genpd->name, genpd->device_id);
>  
>  	mutex_lock(&gpd_list_lock);
>  	list_add(&genpd->gpd_list_node, &gpd_list);
> @@ -2287,13 +2296,13 @@ static int genpd_remove(struct generic_pm_domain *genpd)
>  
>  	if (genpd->has_provider) {
>  		genpd_unlock(genpd);
> -		pr_err("Provider present, unable to remove %s\n", genpd->name);
> +		pr_err("Provider present, unable to remove %s\n", dev_name(&genpd->dev));
>  		return -EBUSY;
>  	}
>  
>  	if (!list_empty(&genpd->parent_links) || genpd->device_count) {
>  		genpd_unlock(genpd);
> -		pr_err("%s: unable to remove %s\n", __func__, genpd->name);
> +		pr_err("%s: unable to remove %s\n", __func__, dev_name(&genpd->dev));
>  		return -EBUSY;
>  	}
>  
> @@ -2307,9 +2316,10 @@ static int genpd_remove(struct generic_pm_domain *genpd)
>  	genpd_unlock(genpd);
>  	genpd_debug_remove(genpd);
>  	cancel_work_sync(&genpd->power_off_work);
> +	ida_free(&genpd_ida, genpd->device_id);
>  	genpd_free_data(genpd);
>  
> -	pr_debug("%s: removed %s\n", __func__, genpd->name);
> +	pr_debug("%s: removed %s\n", __func__, dev_name(&genpd->dev));
>  
>  	return 0;
>  }
> @@ -3272,12 +3282,12 @@ static int genpd_summary_one(struct seq_file *s,
>  	else
>  		snprintf(state, sizeof(state), "%s",
>  			 status_lookup[genpd->status]);
> -	seq_printf(s, "%-30s  %-30s  %u", genpd->name, state, genpd->performance_state);
> +	seq_printf(s, "%-30s  %-30s  %u", dev_name(&genpd->dev), state, genpd->performance_state);
>  
>  	/*
>  	 * Modifications on the list require holding locks on both
>  	 * parent and child, so we are safe.
> -	 * Also genpd->name is immutable.
> +	 * Also the device name is immutable.
>  	 */
>  	list_for_each_entry(link, &genpd->parent_links, parent_node) {
>  		if (list_is_first(&link->parent_node, &genpd->parent_links))
> @@ -3502,7 +3512,7 @@ static void genpd_debug_add(struct generic_pm_domain *genpd)
>  	if (!genpd_debugfs_dir)
>  		return;
>  
> -	d = debugfs_create_dir(genpd->name, genpd_debugfs_dir);
> +	d = debugfs_create_dir(dev_name(&genpd->dev), genpd_debugfs_dir);
>  
>  	debugfs_create_file("current_state", 0444,
>  			    d, genpd, &status_fops);
> diff --git a/include/linux/pm_domain.h b/include/linux/pm_domain.h
> index b637ec14025f..738df5296ec7 100644
> --- a/include/linux/pm_domain.h
> +++ b/include/linux/pm_domain.h
> @@ -163,6 +163,7 @@ struct generic_pm_domain {
>  	atomic_t sd_count;	/* Number of subdomains with power "on" */
>  	enum gpd_status status;	/* Current state of the domain */
>  	unsigned int device_count;	/* Number of devices */
> +	unsigned int device_id;		/* unique device id */
>  	unsigned int suspended_count;	/* System suspend device counter */
>  	unsigned int prepared_count;	/* Suspend counter of prepared devices */
>  	unsigned int performance_state;	/* Aggregated max performance state */
> -- 
> 2.34.1
> 

-- 
With best wishes
Dmitry

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V3 3/4] pmdomain: core: Fix debugfs node creation failure
  2024-10-07 17:33   ` Dmitry Baryshkov
@ 2024-10-09 11:11     ` Ulf Hansson
  2024-10-10 12:47       ` Dmitry Baryshkov
  0 siblings, 1 reply; 26+ messages in thread
From: Ulf Hansson @ 2024-10-09 11:11 UTC (permalink / raw)
  To: Sibi Sankar, Dmitry Baryshkov
  Cc: sudeep.holla, cristian.marussi, jassisinghbrar, linux-kernel,
	arm-scmi, linux-arm-kernel, linux-arm-msm, johan, konradybcio,
	linux-pm, tstrudel, rafael, Johan Hovold

On Mon, 7 Oct 2024 at 19:33, Dmitry Baryshkov
<dmitry.baryshkov@linaro.org> wrote:
>
> On Mon, Oct 07, 2024 at 11:36:41AM GMT, Sibi Sankar wrote:
> > The domain attributes returned by the perf protocol can end up
> > reporting identical names across domains, resulting in debugfs
> > node creation failure. Fix this failure by ensuring that pm domains
> > get a unique name using ida in pm_genpd_init.

Thanks for working on this!

>
> Can we make this opt-in or opt-out? Seeing numeric suffixes next to
> well-known power domain names (e.g. those comin from RPMh or the CPU
> domains) is a bit strange. Or maybe you can limit the IDA suffix just to
> the SCMI / perf domains?

I was also thinking something along the lines of this.

Another thing on top of what Dmitry suggests, could be to iterate
through the &gpd_list and compare the existing genpd->names with the
one that we are adding in pm_genpd_init(). In this way, we don't need
to add the IDA to more than those that really need it.

What do you think?

[...]

Kind regards
Uffe

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V3 0/4] firmware: arm_scmi: Misc Fixes
  2024-10-07  6:06 [PATCH V3 0/4] firmware: arm_scmi: Misc Fixes Sibi Sankar
                   ` (3 preceding siblings ...)
  2024-10-07  6:06 ` [PATCH V3 4/4] mailbox: qcom-cpucp: Mark the irq with IRQF_NO_SUSPEND flag Sibi Sankar
@ 2024-10-09 11:14 ` Ulf Hansson
  2024-10-10 15:02 ` Johan Hovold
  5 siblings, 0 replies; 26+ messages in thread
From: Ulf Hansson @ 2024-10-09 11:14 UTC (permalink / raw)
  To: Sibi Sankar
  Cc: sudeep.holla, cristian.marussi, jassisinghbrar, linux-kernel,
	arm-scmi, linux-arm-kernel, linux-arm-msm, johan, konradybcio,
	linux-pm, tstrudel, rafael

On Mon, 7 Oct 2024 at 08:07, Sibi Sankar <quic_sibis@quicinc.com> wrote:
>
> The series addresses the kernel warnings reported by Johan at [1] and are
> are required to X1E cpufreq device tree changes [2] to land.
>
> [1] - https://lore.kernel.org/lkml/ZoQjAWse2YxwyRJv@hovoldconsulting.com/
> [2] - https://lore.kernel.org/lkml/20240612124056.39230-1-quic_sibis@quicinc.com/
>
> The following warnings remain unadressed:
> arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
>
> Duplicate levels:
> arm-scmi arm-scmi.0.auto: Level 2976000 Power 218062 Latency 30us Ifreq 2976000 Index 10
> arm-scmi arm-scmi.0.auto: Level 3206400 Power 264356 Latency 30us Ifreq 3206400 Index 11
> arm-scmi arm-scmi.0.auto: Level 3417600 Power 314966 Latency 30us Ifreq 3417600 Index 12
> arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> arm-scmi arm-scmi.0.auto: Level 4012800 Power 528848 Latency 30us Ifreq 4012800 Index 15
>
> ^^ exist because SCP reports duplicate values for the highest sustainable
> freq for perf domains 1 and 2. These are the only freqs that appear as
> duplicates and will be fixed with a firmware update. FWIW the warnings
> that we are addressing in this series will also get fixed by a firmware
> update but they still have to land for devices already out in the wild.
>
> V2:
> * Include the fix for do_xfer timeout
> * Include the fix debugfs node creation failure
> * Include Cristian's fix for skipping opp duplication
>
> V1:
> * add missing MSG_SUPPORTS_FASTCHANNEL definition.
>
> Base branch: next-20241004
>
> Cristian Marussi (1):
>   firmware: arm_scmi: Skip adding bad duplicates
>
> Sibi Sankar (3):
>   firmware: arm_scmi: Ensure that the message-id supports fastchannel
>   pmdomain: core: Fix debugfs node creation failure
>   mailbox: qcom-cpucp: Mark the irq with IRQF_NO_SUSPEND flag
>

Is there a preferred way to merge this?

I can certainly pick the pmdomain patch, as it looks independent for
the other. Or let me know if you prefer that I take them all?

Kind regards
Uffe

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V3 1/4] firmware: arm_scmi: Ensure that the message-id supports fastchannel
  2024-10-07  6:06 ` [PATCH V3 1/4] firmware: arm_scmi: Ensure that the message-id supports fastchannel Sibi Sankar
@ 2024-10-09 13:46   ` Sudeep Holla
  2024-10-10 14:55   ` Johan Hovold
  1 sibling, 0 replies; 26+ messages in thread
From: Sudeep Holla @ 2024-10-09 13:46 UTC (permalink / raw)
  To: Sibi Sankar
  Cc: cristian.marussi, ulf.hansson, jassisinghbrar, linux-kernel,
	arm-scmi, linux-arm-kernel, linux-arm-msm, johan, konradybcio,
	linux-pm, tstrudel, rafael, Johan Hovold

On Mon, Oct 07, 2024 at 11:36:39AM +0530, Sibi Sankar wrote:
> Currently the perf and powercap protocol relies on the protocol domain
> attributes, which just ensures that one fastchannel per domain, before
> instantiating fastchannels for all possible message-ids. Fix this by
> ensuring that each message-id supports fastchannel before initialization.
>

Looks good to me. With the minor nit below addressed,

Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>

(assuming you will take this all via pmdomain or qcom soc tree)

> Reported-by: Johan Hovold <johan+linaro@kernel.org>
> Closes: https://lore.kernel.org/lkml/ZoQjAWse2YxwyRJv@hovoldconsulting.com/
> Fixes: 6f9ea4dabd2d ("firmware: arm_scmi: Generalize the fast channel support")
> Signed-off-by: Sibi Sankar <quic_sibis@quicinc.com>
> ---
>  drivers/firmware/arm_scmi/driver.c    | 9 +++++++++
>  drivers/firmware/arm_scmi/protocols.h | 2 ++
>  2 files changed, 11 insertions(+)
>
> diff --git a/drivers/firmware/arm_scmi/driver.c b/drivers/firmware/arm_scmi/driver.c
> index 88c5c4ff4bb6..80a9a615672a 100644
> --- a/drivers/firmware/arm_scmi/driver.c
> +++ b/drivers/firmware/arm_scmi/driver.c
> @@ -56,6 +56,9 @@ static atomic_t transfer_last_id;
>
>  static struct dentry *scmi_top_dentry;
>
> +static int scmi_protocol_msg_check(const struct scmi_protocol_handle *ph,
> +				   u32 message_id, u32 *attributes);
>

I prefer to just move the function above if possible to avoid this extra
declaration just keep keep it consistent with other such internal/static
function calls within this file. No hard opinion, just preference to avoid
me thinking(or scratching my head) why only this is done different few
months down the line.

--
Regards,
Sudeep

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V3 3/4] pmdomain: core: Fix debugfs node creation failure
  2024-10-09 11:11     ` Ulf Hansson
@ 2024-10-10 12:47       ` Dmitry Baryshkov
  0 siblings, 0 replies; 26+ messages in thread
From: Dmitry Baryshkov @ 2024-10-10 12:47 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Sibi Sankar, sudeep.holla, cristian.marussi, jassisinghbrar,
	linux-kernel, arm-scmi, linux-arm-kernel, linux-arm-msm, johan,
	konradybcio, linux-pm, tstrudel, rafael, Johan Hovold

On Wed, Oct 09, 2024 at 01:11:14PM GMT, Ulf Hansson wrote:
> On Mon, 7 Oct 2024 at 19:33, Dmitry Baryshkov
> <dmitry.baryshkov@linaro.org> wrote:
> >
> > On Mon, Oct 07, 2024 at 11:36:41AM GMT, Sibi Sankar wrote:
> > > The domain attributes returned by the perf protocol can end up
> > > reporting identical names across domains, resulting in debugfs
> > > node creation failure. Fix this failure by ensuring that pm domains
> > > get a unique name using ida in pm_genpd_init.
> 
> Thanks for working on this!
> 
> >
> > Can we make this opt-in or opt-out? Seeing numeric suffixes next to
> > well-known power domain names (e.g. those comin from RPMh or the CPU
> > domains) is a bit strange. Or maybe you can limit the IDA suffix just to
> > the SCMI / perf domains?
> 
> I was also thinking something along the lines of this.
> 
> Another thing on top of what Dmitry suggests, could be to iterate
> through the &gpd_list and compare the existing genpd->names with the
> one that we are adding in pm_genpd_init(). In this way, we don't need
> to add the IDA to more than those that really need it.
> 
> What do you think?

I have no strong preference. Your proposal sounds good to me too.

> 
> [...]
> 
> Kind regards
> Uffe

-- 
With best wishes
Dmitry

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V3 1/4] firmware: arm_scmi: Ensure that the message-id supports fastchannel
  2024-10-07  6:06 ` [PATCH V3 1/4] firmware: arm_scmi: Ensure that the message-id supports fastchannel Sibi Sankar
  2024-10-09 13:46   ` Sudeep Holla
@ 2024-10-10 14:55   ` Johan Hovold
  1 sibling, 0 replies; 26+ messages in thread
From: Johan Hovold @ 2024-10-10 14:55 UTC (permalink / raw)
  To: Sibi Sankar
  Cc: sudeep.holla, cristian.marussi, ulf.hansson, jassisinghbrar,
	linux-kernel, arm-scmi, linux-arm-kernel, linux-arm-msm,
	konradybcio, linux-pm, tstrudel, rafael, Johan Hovold

On Mon, Oct 07, 2024 at 11:36:39AM +0530, Sibi Sankar wrote:
> Currently the perf and powercap protocol relies on the protocol domain
> attributes, which just ensures that one fastchannel per domain, before
> instantiating fastchannels for all possible message-ids. Fix this by
> ensuring that each message-id supports fastchannel before initialization.

Perhaps you could include the error message I reported here so that
anyone searching for that error will find this fix more easily.

> Reported-by: Johan Hovold <johan+linaro@kernel.org>
> Closes: https://lore.kernel.org/lkml/ZoQjAWse2YxwyRJv@hovoldconsulting.com/
> Fixes: 6f9ea4dabd2d ("firmware: arm_scmi: Generalize the fast channel support")
> Signed-off-by: Sibi Sankar <quic_sibis@quicinc.com>

Do we want this to be backported as well so that you should add a
CC-stable tag?

Either way, this does seem to address the FC errors I reported:

Tested-by: Johan Hovold <johan+linaro@kernel.org>

Johan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V3 4/4] mailbox: qcom-cpucp: Mark the irq with IRQF_NO_SUSPEND flag
  2024-10-07  6:06 ` [PATCH V3 4/4] mailbox: qcom-cpucp: Mark the irq with IRQF_NO_SUSPEND flag Sibi Sankar
  2024-10-07 13:14   ` Konrad Dybcio
@ 2024-10-10 14:58   ` Johan Hovold
  1 sibling, 0 replies; 26+ messages in thread
From: Johan Hovold @ 2024-10-10 14:58 UTC (permalink / raw)
  To: Sibi Sankar
  Cc: sudeep.holla, cristian.marussi, ulf.hansson, jassisinghbrar,
	linux-kernel, arm-scmi, linux-arm-kernel, linux-arm-msm,
	konradybcio, linux-pm, tstrudel, rafael, Johan Hovold

On Mon, Oct 07, 2024 at 11:36:42AM +0530, Sibi Sankar wrote:
> The qcom-cpucp mailbox irq is expected to function during suspend-resume
> cycle particularly when the scmi cpufreq driver can query the current
> frequency using the get_level message after the cpus are brought up during
> resume. Hence mark the irq with IRQF_NO_SUSPEND flag to fix the do_xfer
> failures we see during resume.
> 
> Err Logs:
> arm-scmi firmware:scmi: timed out in resp(caller:do_xfer+0x164/0x568)
> cpufreq: cpufreq_online: ->get() failed
> 
> Reported-by: Johan Hovold <johan+linaro@kernel.org>
> Closes: https://lore.kernel.org/lkml/ZtgFj1y5ggipgEOS@hovoldconsulting.com/
> Fixes: 0e2a9a03106c ("mailbox: Add support for QTI CPUCP mailbox controller")
> Signed-off-by: Sibi Sankar <quic_sibis@quicinc.com>

Seems to work as intended:

Tested-by: Johan Hovold <johan+linaro@kernel.org>

Johan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V3 0/4] firmware: arm_scmi: Misc Fixes
  2024-10-07  6:06 [PATCH V3 0/4] firmware: arm_scmi: Misc Fixes Sibi Sankar
                   ` (4 preceding siblings ...)
  2024-10-09 11:14 ` [PATCH V3 0/4] firmware: arm_scmi: Misc Fixes Ulf Hansson
@ 2024-10-10 15:02 ` Johan Hovold
  2024-10-23  7:46   ` Sibi Sankar
  5 siblings, 1 reply; 26+ messages in thread
From: Johan Hovold @ 2024-10-10 15:02 UTC (permalink / raw)
  To: Sibi Sankar
  Cc: sudeep.holla, cristian.marussi, ulf.hansson, jassisinghbrar,
	linux-kernel, arm-scmi, linux-arm-kernel, linux-arm-msm,
	konradybcio, linux-pm, tstrudel, rafael

On Mon, Oct 07, 2024 at 11:36:38AM +0530, Sibi Sankar wrote:
> The series addresses the kernel warnings reported by Johan at [1] and are
> are required to X1E cpufreq device tree changes [2] to land.
> 
> [1] - https://lore.kernel.org/lkml/ZoQjAWse2YxwyRJv@hovoldconsulting.com/
> [2] - https://lore.kernel.org/lkml/20240612124056.39230-1-quic_sibis@quicinc.com/
> 
> The following warnings remain unadressed:
> arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16

Are there any plans for how to address these?

Johan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V3 0/4] firmware: arm_scmi: Misc Fixes
  2024-10-10 15:02 ` Johan Hovold
@ 2024-10-23  7:46   ` Sibi Sankar
  2024-10-23 16:26     ` Johan Hovold
  0 siblings, 1 reply; 26+ messages in thread
From: Sibi Sankar @ 2024-10-23  7:46 UTC (permalink / raw)
  To: Johan Hovold
  Cc: sudeep.holla, cristian.marussi, ulf.hansson, jassisinghbrar,
	linux-kernel, arm-scmi, linux-arm-kernel, linux-arm-msm,
	konradybcio, linux-pm, tstrudel, rafael



On 10/10/24 20:32, Johan Hovold wrote:
> On Mon, Oct 07, 2024 at 11:36:38AM +0530, Sibi Sankar wrote:
>> The series addresses the kernel warnings reported by Johan at [1] and are
>> are required to X1E cpufreq device tree changes [2] to land.
>>
>> [1] - https://lore.kernel.org/lkml/ZoQjAWse2YxwyRJv@hovoldconsulting.com/
>> [2] - https://lore.kernel.org/lkml/20240612124056.39230-1-quic_sibis@quicinc.com/
>>
>> The following warnings remain unadressed:
>> arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
>> arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> 
> Are there any plans for how to address these?

Hey Johan,
Sorry missed replying to this. The error implies that duplicate
opps are reported by the SCP firmware and appear once during probe.
This particular error can be fixed only by a firmware update and you
should be able to test it out soon on the CRD first.

"FWIW the warnings that we are addressing in this series will also get
fixed by a firmware update but they still have to land for devices
already out in the wild."


> 
> Johan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V3 0/4] firmware: arm_scmi: Misc Fixes
  2024-10-23  7:46   ` Sibi Sankar
@ 2024-10-23 16:26     ` Johan Hovold
  2024-10-25  6:08       ` Sibi Sankar
  0 siblings, 1 reply; 26+ messages in thread
From: Johan Hovold @ 2024-10-23 16:26 UTC (permalink / raw)
  To: Sibi Sankar
  Cc: sudeep.holla, cristian.marussi, ulf.hansson, jassisinghbrar,
	linux-kernel, arm-scmi, linux-arm-kernel, linux-arm-msm,
	konradybcio, linux-pm, tstrudel, rafael

On Wed, Oct 23, 2024 at 01:16:47PM +0530, Sibi Sankar wrote:
> On 10/10/24 20:32, Johan Hovold wrote:
> > On Mon, Oct 07, 2024 at 11:36:38AM +0530, Sibi Sankar wrote:
> >> The series addresses the kernel warnings reported by Johan at [1] and are
> >> are required to X1E cpufreq device tree changes [2] to land.
> >>
> >> [1] - https://lore.kernel.org/lkml/ZoQjAWse2YxwyRJv@hovoldconsulting.com/
> >> [2] - https://lore.kernel.org/lkml/20240612124056.39230-1-quic_sibis@quicinc.com/
> >>
> >> The following warnings remain unadressed:
> >> arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> >> arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > 
> > Are there any plans for how to address these?

> Sorry missed replying to this. The error implies that duplicate
> opps are reported by the SCP firmware and appear once during probe.

I only see it at boot, but it shows up four times here with the CRD:

[    8.098452] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
[    8.109647] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
[    8.128970] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
[    8.142455] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16

> This particular error can be fixed only by a firmware update and you
> should be able to test it out soon on the CRD first.

Can you explain why this can only be fixed by a firmware update? Why
can't we suppress these warnings as well, like we did for the other
warnings related to the duplicate entries?

IIUC the firmware is not really broken, but rather describes a feature
that Linux does not (yet) support, right?

Johan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V3 0/4] firmware: arm_scmi: Misc Fixes
  2024-10-23 16:26     ` Johan Hovold
@ 2024-10-25  6:08       ` Sibi Sankar
  2024-10-25  6:14         ` Dmitry Baryshkov
  2024-10-25 13:23         ` Johan Hovold
  0 siblings, 2 replies; 26+ messages in thread
From: Sibi Sankar @ 2024-10-25  6:08 UTC (permalink / raw)
  To: Johan Hovold
  Cc: sudeep.holla, cristian.marussi, ulf.hansson, jassisinghbrar,
	linux-kernel, arm-scmi, linux-arm-kernel, linux-arm-msm,
	konradybcio, linux-pm, tstrudel, rafael



On 10/23/24 21:56, Johan Hovold wrote:
> On Wed, Oct 23, 2024 at 01:16:47PM +0530, Sibi Sankar wrote:
>> On 10/10/24 20:32, Johan Hovold wrote:
>>> On Mon, Oct 07, 2024 at 11:36:38AM +0530, Sibi Sankar wrote:
>>>> The series addresses the kernel warnings reported by Johan at [1] and are
>>>> are required to X1E cpufreq device tree changes [2] to land.
>>>>
>>>> [1] - https://lore.kernel.org/lkml/ZoQjAWse2YxwyRJv@hovoldconsulting.com/
>>>> [2] - https://lore.kernel.org/lkml/20240612124056.39230-1-quic_sibis@quicinc.com/
>>>>
>>>> The following warnings remain unadressed:
>>>> arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
>>>> arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
>>>
>>> Are there any plans for how to address these?
> 
>> Sorry missed replying to this. The error implies that duplicate
>> opps are reported by the SCP firmware and appear once during probe.
> 
> I only see it at boot, but it shows up four times here with the CRD:

https://lore.kernel.org/lkml/d54f6851-d479-a136-f747-4c0180904a5e@quicinc.com/

As explained ^^, we see duplicates for max sustainable performance twice
for each domain.

> 
> [    8.098452] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> [    8.109647] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> [    8.128970] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> [    8.142455] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> 
>> This particular error can be fixed only by a firmware update and you
>> should be able to test it out soon on the CRD first.
> 
> Can you explain why this can only be fixed by a firmware update? Why
> can't we suppress these warnings as well, like we did for the other
> warnings related to the duplicate entries?
> 
> IIUC the firmware is not really broken, but rather describes a feature
> that Linux does not (yet) support, right?

We keep saying it's a buggy firmware because the SCP firmware reports
identical perf and power levels for the additional two opps and the
kernel has no way of treating it otherwise and we shouldn't suppress
them. Out of the two duplicate opps reported one is a artifact from how
Qualcomm usually show a transition to boost frequencies. The second opp
which you say is a feature should be treated as a boost opp i.e. one
core can run at max at a lower power when other cores are at idle but
we can start marking them as such once they start advertising their
correct power requirements. So I maintain that this is the best we
can do and need a firmware update for us to address anything more.

-Sibi


> 
> Johan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V3 0/4] firmware: arm_scmi: Misc Fixes
  2024-10-25  6:08       ` Sibi Sankar
@ 2024-10-25  6:14         ` Dmitry Baryshkov
  2024-10-25  6:45           ` Sibi Sankar
  2024-10-25 13:23         ` Johan Hovold
  1 sibling, 1 reply; 26+ messages in thread
From: Dmitry Baryshkov @ 2024-10-25  6:14 UTC (permalink / raw)
  To: Sibi Sankar
  Cc: Johan Hovold, sudeep.holla, cristian.marussi, ulf.hansson,
	jassisinghbrar, linux-kernel, arm-scmi, linux-arm-kernel,
	linux-arm-msm, konradybcio, linux-pm, tstrudel, rafael

On Fri, Oct 25, 2024 at 11:38:36AM +0530, Sibi Sankar wrote:
> 
> 
> On 10/23/24 21:56, Johan Hovold wrote:
> > On Wed, Oct 23, 2024 at 01:16:47PM +0530, Sibi Sankar wrote:
> > > On 10/10/24 20:32, Johan Hovold wrote:
> > > > On Mon, Oct 07, 2024 at 11:36:38AM +0530, Sibi Sankar wrote:
> > > > > The series addresses the kernel warnings reported by Johan at [1] and are
> > > > > are required to X1E cpufreq device tree changes [2] to land.
> > > > > 
> > > > > [1] - https://lore.kernel.org/lkml/ZoQjAWse2YxwyRJv@hovoldconsulting.com/
> > > > > [2] - https://lore.kernel.org/lkml/20240612124056.39230-1-quic_sibis@quicinc.com/
> > > > > 
> > > > > The following warnings remain unadressed:
> > > > > arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > > > > arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > > > 
> > > > Are there any plans for how to address these?
> > 
> > > Sorry missed replying to this. The error implies that duplicate
> > > opps are reported by the SCP firmware and appear once during probe.
> > 
> > I only see it at boot, but it shows up four times here with the CRD:
> 
> https://lore.kernel.org/lkml/d54f6851-d479-a136-f747-4c0180904a5e@quicinc.com/
> 
> As explained ^^, we see duplicates for max sustainable performance twice
> for each domain.

If existing products were shipped with the firmware that lists single
freq twice, please filter the frequencies like qcom-cpufreq-hw does.

> 
> > 
> > [    8.098452] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > [    8.109647] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > [    8.128970] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > [    8.142455] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > 
> > > This particular error can be fixed only by a firmware update and you
> > > should be able to test it out soon on the CRD first.
> > 
> > Can you explain why this can only be fixed by a firmware update? Why
> > can't we suppress these warnings as well, like we did for the other
> > warnings related to the duplicate entries?
> > 
> > IIUC the firmware is not really broken, but rather describes a feature
> > that Linux does not (yet) support, right?
> 
> We keep saying it's a buggy firmware because the SCP firmware reports
> identical perf and power levels for the additional two opps and the
> kernel has no way of treating it otherwise and we shouldn't suppress
> them. Out of the two duplicate opps reported one is a artifact from how
> Qualcomm usually show a transition to boost frequencies. The second opp
> which you say is a feature should be treated as a boost opp i.e. one
> core can run at max at a lower power when other cores are at idle but
> we can start marking them as such once they start advertising their
> correct power requirements. So I maintain that this is the best we
> can do and need a firmware update for us to address anything more.

Will existing shipping products get these firmware updates?

-- 
With best wishes
Dmitry

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V3 0/4] firmware: arm_scmi: Misc Fixes
  2024-10-25  6:14         ` Dmitry Baryshkov
@ 2024-10-25  6:45           ` Sibi Sankar
  2024-10-25  8:28             ` Cristian Marussi
  2024-10-25 10:11             ` Dmitry Baryshkov
  0 siblings, 2 replies; 26+ messages in thread
From: Sibi Sankar @ 2024-10-25  6:45 UTC (permalink / raw)
  To: Dmitry Baryshkov
  Cc: Johan Hovold, sudeep.holla, cristian.marussi, ulf.hansson,
	jassisinghbrar, linux-kernel, arm-scmi, linux-arm-kernel,
	linux-arm-msm, konradybcio, linux-pm, tstrudel, rafael



On 10/25/24 11:44, Dmitry Baryshkov wrote:
> On Fri, Oct 25, 2024 at 11:38:36AM +0530, Sibi Sankar wrote:
>>
>>
>> On 10/23/24 21:56, Johan Hovold wrote:
>>> On Wed, Oct 23, 2024 at 01:16:47PM +0530, Sibi Sankar wrote:
>>>> On 10/10/24 20:32, Johan Hovold wrote:
>>>>> On Mon, Oct 07, 2024 at 11:36:38AM +0530, Sibi Sankar wrote:
>>>>>> The series addresses the kernel warnings reported by Johan at [1] and are
>>>>>> are required to X1E cpufreq device tree changes [2] to land.
>>>>>>
>>>>>> [1] - https://lore.kernel.org/lkml/ZoQjAWse2YxwyRJv@hovoldconsulting.com/
>>>>>> [2] - https://lore.kernel.org/lkml/20240612124056.39230-1-quic_sibis@quicinc.com/
>>>>>>
>>>>>> The following warnings remain unadressed:
>>>>>> arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
>>>>>> arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
>>>>>
>>>>> Are there any plans for how to address these?
>>>
>>>> Sorry missed replying to this. The error implies that duplicate
>>>> opps are reported by the SCP firmware and appear once during probe.
>>>
>>> I only see it at boot, but it shows up four times here with the CRD:
>>
>> https://lore.kernel.org/lkml/d54f6851-d479-a136-f747-4c0180904a5e@quicinc.com/
>>
>> As explained ^^, we see duplicates for max sustainable performance twice
>> for each domain.
> 
> If existing products were shipped with the firmware that lists single
> freq twice, please filter the frequencies like qcom-cpufreq-hw does.

That was a qualcomm specific driver and hence we could do such
kind of filtering. This however is the generic scmi perf protocol
and it's not something we should ever consider introducing :/

> 
>>
>>>
>>> [    8.098452] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
>>> [    8.109647] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
>>> [    8.128970] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
>>> [    8.142455] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
>>>
>>>> This particular error can be fixed only by a firmware update and you
>>>> should be able to test it out soon on the CRD first.
>>>
>>> Can you explain why this can only be fixed by a firmware update? Why
>>> can't we suppress these warnings as well, like we did for the other
>>> warnings related to the duplicate entries?
>>>
>>> IIUC the firmware is not really broken, but rather describes a feature
>>> that Linux does not (yet) support, right?
>>
>> We keep saying it's a buggy firmware because the SCP firmware reports
>> identical perf and power levels for the additional two opps and the
>> kernel has no way of treating it otherwise and we shouldn't suppress
>> them. Out of the two duplicate opps reported one is a artifact from how
>> Qualcomm usually show a transition to boost frequencies. The second opp
>> which you say is a feature should be treated as a boost opp i.e. one
>> core can run at max at a lower power when other cores are at idle but
>> we can start marking them as such once they start advertising their
>> correct power requirements. So I maintain that this is the best we
>> can do and need a firmware update for us to address anything more.
> 
> Will existing shipping products get these firmware updates?

They are sure to trickle out but I guess it's upto the oem
to decide if they do want to pick these up like some of the
other firmware updates being tested only on CRD. Not sure why
warnings duplicates should block cpufreq from landing for x1e
but if that's what the community wants I can drop reposting
this series!

-Sibi

> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V3 0/4] firmware: arm_scmi: Misc Fixes
  2024-10-25  6:45           ` Sibi Sankar
@ 2024-10-25  8:28             ` Cristian Marussi
  2024-10-25 10:11             ` Dmitry Baryshkov
  1 sibling, 0 replies; 26+ messages in thread
From: Cristian Marussi @ 2024-10-25  8:28 UTC (permalink / raw)
  To: Sibi Sankar
  Cc: Dmitry Baryshkov, Johan Hovold, sudeep.holla, cristian.marussi,
	ulf.hansson, jassisinghbrar, linux-kernel, arm-scmi,
	linux-arm-kernel, linux-arm-msm, konradybcio, linux-pm, tstrudel,
	rafael

On Fri, Oct 25, 2024 at 12:15:59PM +0530, Sibi Sankar wrote:
> 
> 
> On 10/25/24 11:44, Dmitry Baryshkov wrote:
> > On Fri, Oct 25, 2024 at 11:38:36AM +0530, Sibi Sankar wrote:
> > > 

Hi,

> > > 
> > > On 10/23/24 21:56, Johan Hovold wrote:
> > > > On Wed, Oct 23, 2024 at 01:16:47PM +0530, Sibi Sankar wrote:
> > > > > On 10/10/24 20:32, Johan Hovold wrote:
> > > > > > On Mon, Oct 07, 2024 at 11:36:38AM +0530, Sibi Sankar wrote:
> > > > > > > The series addresses the kernel warnings reported by Johan at [1] and are
> > > > > > > are required to X1E cpufreq device tree changes [2] to land.
> > > > > > > 
> > > > > > > [1] - https://lore.kernel.org/lkml/ZoQjAWse2YxwyRJv@hovoldconsulting.com/
> > > > > > > [2] - https://lore.kernel.org/lkml/20240612124056.39230-1-quic_sibis@quicinc.com/
> > > > > > > 
> > > > > > > The following warnings remain unadressed:
> > > > > > > arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > > > > > > arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > > > > > 
> > > > > > Are there any plans for how to address these?
> > > > 
> > > > > Sorry missed replying to this. The error implies that duplicate
> > > > > opps are reported by the SCP firmware and appear once during probe.
> > > > 
> > > > I only see it at boot, but it shows up four times here with the CRD:
> > > 
> > > https://lore.kernel.org/lkml/d54f6851-d479-a136-f747-4c0180904a5e@quicinc.com/
> > > 
> > > As explained ^^, we see duplicates for max sustainable performance twice
> > > for each domain.
> > 
> > If existing products were shipped with the firmware that lists single
> > freq twice, please filter the frequencies like qcom-cpufreq-hw does.
> 
> That was a qualcomm specific driver and hence we could do such
> kind of filtering. This however is the generic scmi perf protocol
> and it's not something we should ever consider introducing :/
> 

+1

In the case of the other warnings, those were similarly related to
duplicates, but the warns themselves were genereated by the OPP
subsystem when trying to register a duplicate...it was indeed a bug
at the SCMI layer to try registering a well-known duplicate with
the OPP subsytem, so it was fixed in the SCMI stack...avoiding to
propagate it to the OPP layer...but the duplicate error condition
indeed still exist (since dependent on what the fw spits out) and they
are trapped at the SCMI level and those noisy warning are just meant
to trap this kind of firmware anomalies...

...IOW who would have ever spotted this thing and considered to fix the
firmware in future releases without the warnings :P ?

> > 
> > > 
> > > > 
> > > > [    8.098452] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > > > [    8.109647] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > > > [    8.128970] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > > > [    8.142455] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > > > 
> > > > > This particular error can be fixed only by a firmware update and you
> > > > > should be able to test it out soon on the CRD first.
> > > > 
> > > > Can you explain why this can only be fixed by a firmware update? Why
> > > > can't we suppress these warnings as well, like we did for the other
> > > > warnings related to the duplicate entries?
> > > > 
> > > > IIUC the firmware is not really broken, but rather describes a feature
> > > > that Linux does not (yet) support, right?
> > > 
> > > We keep saying it's a buggy firmware because the SCP firmware reports
> > > identical perf and power levels for the additional two opps and the
> > > kernel has no way of treating it otherwise and we shouldn't suppress
> > > them. Out of the two duplicate opps reported one is a artifact from how
> > > Qualcomm usually show a transition to boost frequencies. The second opp
> > > which you say is a feature should be treated as a boost opp i.e. one
> > > core can run at max at a lower power when other cores are at idle but
> > > we can start marking them as such once they start advertising their
> > > correct power requirements. So I maintain that this is the best we
> > > can do and need a firmware update for us to address anything more.
> > 
> > Will existing shipping products get these firmware updates?
> 
> They are sure to trickle out but I guess it's upto the oem
> to decide if they do want to pick these up like some of the
> other firmware updates being tested only on CRD. Not sure why
> warnings duplicates should block cpufreq from landing for x1e
> but if that's what the community wants I can drop reposting
> this series!

Not sure indeed which is the problem with such warnings since they are
just doing their job...in general we tend not to disrupt operation even
when the firmware is buggy (if possible) BUT we definitely try to be
noisy about that to have firmware fixed (...not that fw guys seems so
scared usually about warnings though :P)

Anyway, I'm totally with Sibi here unless there is an impact at the
functional level...Sudeep may think otherwise of course ...

Thanks
Cristian


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V3 0/4] firmware: arm_scmi: Misc Fixes
  2024-10-25  6:45           ` Sibi Sankar
  2024-10-25  8:28             ` Cristian Marussi
@ 2024-10-25 10:11             ` Dmitry Baryshkov
  2024-10-25 10:29               ` Cristian Marussi
  1 sibling, 1 reply; 26+ messages in thread
From: Dmitry Baryshkov @ 2024-10-25 10:11 UTC (permalink / raw)
  To: Sibi Sankar
  Cc: Johan Hovold, sudeep.holla, cristian.marussi, ulf.hansson,
	jassisinghbrar, linux-kernel, arm-scmi, linux-arm-kernel,
	linux-arm-msm, konradybcio, linux-pm, tstrudel, rafael

On Fri, 25 Oct 2024 at 09:46, Sibi Sankar <quic_sibis@quicinc.com> wrote:
>
>
>
> On 10/25/24 11:44, Dmitry Baryshkov wrote:
> > On Fri, Oct 25, 2024 at 11:38:36AM +0530, Sibi Sankar wrote:
> >>
> >>
> >> On 10/23/24 21:56, Johan Hovold wrote:
> >>> On Wed, Oct 23, 2024 at 01:16:47PM +0530, Sibi Sankar wrote:
> >>>> On 10/10/24 20:32, Johan Hovold wrote:
> >>>>> On Mon, Oct 07, 2024 at 11:36:38AM +0530, Sibi Sankar wrote:
> >>>>>> The series addresses the kernel warnings reported by Johan at [1] and are
> >>>>>> are required to X1E cpufreq device tree changes [2] to land.
> >>>>>>
> >>>>>> [1] - https://lore.kernel.org/lkml/ZoQjAWse2YxwyRJv@hovoldconsulting.com/
> >>>>>> [2] - https://lore.kernel.org/lkml/20240612124056.39230-1-quic_sibis@quicinc.com/
> >>>>>>
> >>>>>> The following warnings remain unadressed:
> >>>>>> arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> >>>>>> arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> >>>>>
> >>>>> Are there any plans for how to address these?
> >>>
> >>>> Sorry missed replying to this. The error implies that duplicate
> >>>> opps are reported by the SCP firmware and appear once during probe.
> >>>
> >>> I only see it at boot, but it shows up four times here with the CRD:
> >>
> >> https://lore.kernel.org/lkml/d54f6851-d479-a136-f747-4c0180904a5e@quicinc.com/
> >>
> >> As explained ^^, we see duplicates for max sustainable performance twice
> >> for each domain.
> >
> > If existing products were shipped with the firmware that lists single
> > freq twice, please filter the frequencies like qcom-cpufreq-hw does.
>
> That was a qualcomm specific driver and hence we could do such
> kind of filtering. This however is the generic scmi perf protocol
> and it's not something we should ever consider introducing :/

This depends on the maintainer's discretion.

>
> >
> >>
> >>>
> >>> [    8.098452] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> >>> [    8.109647] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> >>> [    8.128970] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> >>> [    8.142455] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> >>>
> >>>> This particular error can be fixed only by a firmware update and you
> >>>> should be able to test it out soon on the CRD first.
> >>>
> >>> Can you explain why this can only be fixed by a firmware update? Why
> >>> can't we suppress these warnings as well, like we did for the other
> >>> warnings related to the duplicate entries?
> >>>
> >>> IIUC the firmware is not really broken, but rather describes a feature
> >>> that Linux does not (yet) support, right?
> >>
> >> We keep saying it's a buggy firmware because the SCP firmware reports
> >> identical perf and power levels for the additional two opps and the
> >> kernel has no way of treating it otherwise and we shouldn't suppress
> >> them. Out of the two duplicate opps reported one is a artifact from how
> >> Qualcomm usually show a transition to boost frequencies. The second opp
> >> which you say is a feature should be treated as a boost opp i.e. one
> >> core can run at max at a lower power when other cores are at idle but
> >> we can start marking them as such once they start advertising their
> >> correct power requirements. So I maintain that this is the best we
> >> can do and need a firmware update for us to address anything more.
> >
> > Will existing shipping products get these firmware updates?
>
> They are sure to trickle out but I guess it's upto the oem
> to decide if they do want to pick these up like some of the
> other firmware updates being tested only on CRD. Not sure why
> warnings duplicates should block cpufreq from landing for x1e
> but if that's what the community wants I can drop reposting
> this series!

No, the community definitely wants to have cpufreq for X1E.
But at the same time some reviewers prefer to have a warning-free boot
if those warnings can't be really fixed. I don't have such a strict
position, but I'd prefer to see those messages at dev_info or dev_dbg
level.

Also, can we please have some plan or idea if somebody is actually
working with laptop vendors to get corresponding firmware updates onto
their hardware?

-- 
With best wishes
Dmitry

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V3 0/4] firmware: arm_scmi: Misc Fixes
  2024-10-25 10:11             ` Dmitry Baryshkov
@ 2024-10-25 10:29               ` Cristian Marussi
  2024-10-25 11:37                 ` Dmitry Baryshkov
  2024-10-25 13:32                 ` Johan Hovold
  0 siblings, 2 replies; 26+ messages in thread
From: Cristian Marussi @ 2024-10-25 10:29 UTC (permalink / raw)
  To: Dmitry Baryshkov
  Cc: Sibi Sankar, Johan Hovold, sudeep.holla, cristian.marussi,
	ulf.hansson, jassisinghbrar, linux-kernel, arm-scmi,
	linux-arm-kernel, linux-arm-msm, konradybcio, linux-pm, tstrudel,
	rafael

On Fri, Oct 25, 2024 at 01:11:37PM +0300, Dmitry Baryshkov wrote:
> On Fri, 25 Oct 2024 at 09:46, Sibi Sankar <quic_sibis@quicinc.com> wrote:
> >
> >
> >
> > On 10/25/24 11:44, Dmitry Baryshkov wrote:
> > > On Fri, Oct 25, 2024 at 11:38:36AM +0530, Sibi Sankar wrote:
> > >>
> > >>
> > >> On 10/23/24 21:56, Johan Hovold wrote:
> > >>> On Wed, Oct 23, 2024 at 01:16:47PM +0530, Sibi Sankar wrote:
> > >>>> On 10/10/24 20:32, Johan Hovold wrote:
> > >>>>> On Mon, Oct 07, 2024 at 11:36:38AM +0530, Sibi Sankar wrote:
> > >>>>>> The series addresses the kernel warnings reported by Johan at [1] and are
> > >>>>>> are required to X1E cpufreq device tree changes [2] to land.
> > >>>>>>
> > >>>>>> [1] - https://lore.kernel.org/lkml/ZoQjAWse2YxwyRJv@hovoldconsulting.com/
> > >>>>>> [2] - https://lore.kernel.org/lkml/20240612124056.39230-1-quic_sibis@quicinc.com/
> > >>>>>>
> > >>>>>> The following warnings remain unadressed:
> > >>>>>> arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > >>>>>> arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > >>>>>
> > >>>>> Are there any plans for how to address these?
> > >>>
> > >>>> Sorry missed replying to this. The error implies that duplicate
> > >>>> opps are reported by the SCP firmware and appear once during probe.
> > >>>
> > >>> I only see it at boot, but it shows up four times here with the CRD:
> > >>
> > >> https://lore.kernel.org/lkml/d54f6851-d479-a136-f747-4c0180904a5e@quicinc.com/
> > >>
> > >> As explained ^^, we see duplicates for max sustainable performance twice
> > >> for each domain.
> > >
> > > If existing products were shipped with the firmware that lists single
> > > freq twice, please filter the frequencies like qcom-cpufreq-hw does.
> >
> > That was a qualcomm specific driver and hence we could do such
> > kind of filtering. This however is the generic scmi perf protocol
> > and it's not something we should ever consider introducing :/
> 
> This depends on the maintainer's discretion.
> 
> >
> > >
> > >>
> > >>>
> > >>> [    8.098452] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > >>> [    8.109647] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > >>> [    8.128970] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > >>> [    8.142455] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > >>>
> > >>>> This particular error can be fixed only by a firmware update and you
> > >>>> should be able to test it out soon on the CRD first.
> > >>>
> > >>> Can you explain why this can only be fixed by a firmware update? Why
> > >>> can't we suppress these warnings as well, like we did for the other
> > >>> warnings related to the duplicate entries?
> > >>>
> > >>> IIUC the firmware is not really broken, but rather describes a feature
> > >>> that Linux does not (yet) support, right?
> > >>
> > >> We keep saying it's a buggy firmware because the SCP firmware reports
> > >> identical perf and power levels for the additional two opps and the
> > >> kernel has no way of treating it otherwise and we shouldn't suppress
> > >> them. Out of the two duplicate opps reported one is a artifact from how
> > >> Qualcomm usually show a transition to boost frequencies. The second opp
> > >> which you say is a feature should be treated as a boost opp i.e. one
> > >> core can run at max at a lower power when other cores are at idle but
> > >> we can start marking them as such once they start advertising their
> > >> correct power requirements. So I maintain that this is the best we
> > >> can do and need a firmware update for us to address anything more.
> > >
> > > Will existing shipping products get these firmware updates?
> >
> > They are sure to trickle out but I guess it's upto the oem
> > to decide if they do want to pick these up like some of the
> > other firmware updates being tested only on CRD. Not sure why
> > warnings duplicates should block cpufreq from landing for x1e
> > but if that's what the community wants I can drop reposting
> > this series!
> 
> No, the community definitely wants to have cpufreq for X1E.
> But at the same time some reviewers prefer to have a warning-free boot
> if those warnings can't be really fixed. I don't have such a strict
> position, but I'd prefer to see those messages at dev_info or dev_dbg
> level.

I think dev_info could be an option from the SCMI perspective (as per my
other mail), the important thing in these regards is to NOT go
completely silent against fw anomalies...to avoid the the risk of being
silently ignored .... I'll see what Sudeep thinks about...

Thanks,
Cristian

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V3 0/4] firmware: arm_scmi: Misc Fixes
  2024-10-25 10:29               ` Cristian Marussi
@ 2024-10-25 11:37                 ` Dmitry Baryshkov
  2024-10-25 13:32                 ` Johan Hovold
  1 sibling, 0 replies; 26+ messages in thread
From: Dmitry Baryshkov @ 2024-10-25 11:37 UTC (permalink / raw)
  To: Cristian Marussi
  Cc: Sibi Sankar, Johan Hovold, sudeep.holla, ulf.hansson,
	jassisinghbrar, linux-kernel, arm-scmi, linux-arm-kernel,
	linux-arm-msm, konradybcio, linux-pm, tstrudel, rafael

On Fri, 25 Oct 2024 at 13:29, Cristian Marussi <cristian.marussi@arm.com> wrote:
>
> On Fri, Oct 25, 2024 at 01:11:37PM +0300, Dmitry Baryshkov wrote:
> > On Fri, 25 Oct 2024 at 09:46, Sibi Sankar <quic_sibis@quicinc.com> wrote:
> > >
> > >
> > >
> > > On 10/25/24 11:44, Dmitry Baryshkov wrote:
> > > > On Fri, Oct 25, 2024 at 11:38:36AM +0530, Sibi Sankar wrote:
> > > >>
> > > >>
> > > >> On 10/23/24 21:56, Johan Hovold wrote:
> > > >>> On Wed, Oct 23, 2024 at 01:16:47PM +0530, Sibi Sankar wrote:
> > > >>>> On 10/10/24 20:32, Johan Hovold wrote:
> > > >>>>> On Mon, Oct 07, 2024 at 11:36:38AM +0530, Sibi Sankar wrote:
> > > >>>>>> The series addresses the kernel warnings reported by Johan at [1] and are
> > > >>>>>> are required to X1E cpufreq device tree changes [2] to land.
> > > >>>>>>
> > > >>>>>> [1] - https://lore.kernel.org/lkml/ZoQjAWse2YxwyRJv@hovoldconsulting.com/
> > > >>>>>> [2] - https://lore.kernel.org/lkml/20240612124056.39230-1-quic_sibis@quicinc.com/
> > > >>>>>>
> > > >>>>>> The following warnings remain unadressed:
> > > >>>>>> arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > > >>>>>> arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > > >>>>>
> > > >>>>> Are there any plans for how to address these?
> > > >>>
> > > >>>> Sorry missed replying to this. The error implies that duplicate
> > > >>>> opps are reported by the SCP firmware and appear once during probe.
> > > >>>
> > > >>> I only see it at boot, but it shows up four times here with the CRD:
> > > >>
> > > >> https://lore.kernel.org/lkml/d54f6851-d479-a136-f747-4c0180904a5e@quicinc.com/
> > > >>
> > > >> As explained ^^, we see duplicates for max sustainable performance twice
> > > >> for each domain.
> > > >
> > > > If existing products were shipped with the firmware that lists single
> > > > freq twice, please filter the frequencies like qcom-cpufreq-hw does.
> > >
> > > That was a qualcomm specific driver and hence we could do such
> > > kind of filtering. This however is the generic scmi perf protocol
> > > and it's not something we should ever consider introducing :/
> >
> > This depends on the maintainer's discretion.
> >
> > >
> > > >
> > > >>
> > > >>>
> > > >>> [    8.098452] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > > >>> [    8.109647] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > > >>> [    8.128970] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > > >>> [    8.142455] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > > >>>
> > > >>>> This particular error can be fixed only by a firmware update and you
> > > >>>> should be able to test it out soon on the CRD first.
> > > >>>
> > > >>> Can you explain why this can only be fixed by a firmware update? Why
> > > >>> can't we suppress these warnings as well, like we did for the other
> > > >>> warnings related to the duplicate entries?
> > > >>>
> > > >>> IIUC the firmware is not really broken, but rather describes a feature
> > > >>> that Linux does not (yet) support, right?
> > > >>
> > > >> We keep saying it's a buggy firmware because the SCP firmware reports
> > > >> identical perf and power levels for the additional two opps and the
> > > >> kernel has no way of treating it otherwise and we shouldn't suppress
> > > >> them. Out of the two duplicate opps reported one is a artifact from how
> > > >> Qualcomm usually show a transition to boost frequencies. The second opp
> > > >> which you say is a feature should be treated as a boost opp i.e. one
> > > >> core can run at max at a lower power when other cores are at idle but
> > > >> we can start marking them as such once they start advertising their
> > > >> correct power requirements. So I maintain that this is the best we
> > > >> can do and need a firmware update for us to address anything more.
> > > >
> > > > Will existing shipping products get these firmware updates?
> > >
> > > They are sure to trickle out but I guess it's upto the oem
> > > to decide if they do want to pick these up like some of the
> > > other firmware updates being tested only on CRD. Not sure why
> > > warnings duplicates should block cpufreq from landing for x1e
> > > but if that's what the community wants I can drop reposting
> > > this series!
> >
> > No, the community definitely wants to have cpufreq for X1E.
> > But at the same time some reviewers prefer to have a warning-free boot
> > if those warnings can't be really fixed. I don't have such a strict
> > position, but I'd prefer to see those messages at dev_info or dev_dbg
> > level.
>
> I think dev_info could be an option from the SCMI perspective (as per my
> other mail), the important thing in these regards is to NOT go
> completely silent against fw anomalies...to avoid the the risk of being
> silently ignored .... I'll see what Sudeep thinks about...

Absolutely. SCMI layer knows that such a problem might exist and knows
how to handle it, so it should bug the user in a polite way.

-- 
With best wishes
Dmitry

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V3 0/4] firmware: arm_scmi: Misc Fixes
  2024-10-25  6:08       ` Sibi Sankar
  2024-10-25  6:14         ` Dmitry Baryshkov
@ 2024-10-25 13:23         ` Johan Hovold
  1 sibling, 0 replies; 26+ messages in thread
From: Johan Hovold @ 2024-10-25 13:23 UTC (permalink / raw)
  To: Sibi Sankar
  Cc: sudeep.holla, cristian.marussi, ulf.hansson, jassisinghbrar,
	linux-kernel, arm-scmi, linux-arm-kernel, linux-arm-msm,
	konradybcio, linux-pm, tstrudel, rafael

On Fri, Oct 25, 2024 at 11:38:36AM +0530, Sibi Sankar wrote:
> On 10/23/24 21:56, Johan Hovold wrote:
> > On Wed, Oct 23, 2024 at 01:16:47PM +0530, Sibi Sankar wrote:
> >> On 10/10/24 20:32, Johan Hovold wrote:
> >>> On Mon, Oct 07, 2024 at 11:36:38AM +0530, Sibi Sankar wrote:
> >>>> The series addresses the kernel warnings reported by Johan at [1] and are
> >>>> are required to X1E cpufreq device tree changes [2] to land.
> >>>>
> >>>> [1] - https://lore.kernel.org/lkml/ZoQjAWse2YxwyRJv@hovoldconsulting.com/
> >>>> [2] - https://lore.kernel.org/lkml/20240612124056.39230-1-quic_sibis@quicinc.com/
> >>>>
> >>>> The following warnings remain unadressed:
> >>>> arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> >>>> arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> >>>
> >>> Are there any plans for how to address these?

> >> This particular error can be fixed only by a firmware update and you
> >> should be able to test it out soon on the CRD first.
> > 
> > Can you explain why this can only be fixed by a firmware update? Why
> > can't we suppress these warnings as well, like we did for the other
> > warnings related to the duplicate entries?
> > 
> > IIUC the firmware is not really broken, but rather describes a feature
> > that Linux does not (yet) support, right?
> 
> We keep saying it's a buggy firmware because the SCP firmware reports
> identical perf and power levels for the additional two opps and the
> kernel has no way of treating it otherwise and we shouldn't suppress
> them. Out of the two duplicate opps reported one is a artifact from how
> Qualcomm usually show a transition to boost frequencies. The second opp
> which you say is a feature should be treated as a boost opp i.e. one
> core can run at max at a lower power when other cores are at idle but
> we can start marking them as such once they start advertising their
> correct power requirements. So I maintain that this is the best we
> can do and need a firmware update for us to address anything more.

Fair enough, but if you end up respinning the series, please say
something about this in the cover letter so that we know why those
warnings are (rightly) left in place.

Johan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V3 0/4] firmware: arm_scmi: Misc Fixes
  2024-10-25 10:29               ` Cristian Marussi
  2024-10-25 11:37                 ` Dmitry Baryshkov
@ 2024-10-25 13:32                 ` Johan Hovold
  2024-10-25 13:48                   ` Cristian Marussi
  1 sibling, 1 reply; 26+ messages in thread
From: Johan Hovold @ 2024-10-25 13:32 UTC (permalink / raw)
  To: Cristian Marussi
  Cc: Dmitry Baryshkov, Sibi Sankar, sudeep.holla, ulf.hansson,
	jassisinghbrar, linux-kernel, arm-scmi, linux-arm-kernel,
	linux-arm-msm, konradybcio, linux-pm, tstrudel, rafael

On Fri, Oct 25, 2024 at 11:29:05AM +0100, Cristian Marussi wrote:

> > > >>> [    8.098452] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > > >>> [    8.109647] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > > >>> [    8.128970] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > > >>> [    8.142455] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16

> I think dev_info could be an option from the SCMI perspective (as per my
> other mail), the important thing in these regards is to NOT go
> completely silent against fw anomalies...to avoid the the risk of being
> silently ignored .... I'll see what Sudeep thinks about...

I agree.

But could the error handling be improved to look less scary for an end
user by saying something about duplicate entries being ignored instead
perhaps?

Printing something at info level and with a FW_BUG ("[Firmware Bug]: ")
prefix as was done here:

	https://lore.kernel.org/all/20230414084619.31524-1-johan+linaro@kernel.org/

should make it clear that this is not something for end users to worry
(too much) about.

Johan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH V3 0/4] firmware: arm_scmi: Misc Fixes
  2024-10-25 13:32                 ` Johan Hovold
@ 2024-10-25 13:48                   ` Cristian Marussi
  0 siblings, 0 replies; 26+ messages in thread
From: Cristian Marussi @ 2024-10-25 13:48 UTC (permalink / raw)
  To: Johan Hovold
  Cc: Cristian Marussi, Dmitry Baryshkov, Sibi Sankar, sudeep.holla,
	ulf.hansson, jassisinghbrar, linux-kernel, arm-scmi,
	linux-arm-kernel, linux-arm-msm, konradybcio, linux-pm, tstrudel,
	rafael

On Fri, Oct 25, 2024 at 03:32:53PM +0200, Johan Hovold wrote:
> On Fri, Oct 25, 2024 at 11:29:05AM +0100, Cristian Marussi wrote:
> 
> > > > >>> [    8.098452] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > > > >>> [    8.109647] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > > > >>> [    8.128970] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> > > > >>> [    8.142455] arm-scmi arm-scmi.0.auto: Failed to add opps_by_lvl at 3417600 for NCC - ret:-16
> 
> > I think dev_info could be an option from the SCMI perspective (as per my
> > other mail), the important thing in these regards is to NOT go
> > completely silent against fw anomalies...to avoid the the risk of being
> > silently ignored .... I'll see what Sudeep thinks about...
> 
> I agree.
> 
> But could the error handling be improved to look less scary for an end
> user by saying something about duplicate entries being ignored instead
> perhaps?
> 
> Printing something at info level and with a FW_BUG ("[Firmware Bug]: ")
> prefix as was done here:
> 
> 	https://lore.kernel.org/all/20230414084619.31524-1-johan+linaro@kernel.org/
> 
> should make it clear that this is not something for end users to worry
> (too much) about.

Sure...and thanks for the suggestion....I will cook something up around
this....

(I am probably too used to try to scary the FW guys that I forgot there
are also innocent bystanders like final users :P)

Thanks,
Cristian

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2024-10-25 13:48 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-07  6:06 [PATCH V3 0/4] firmware: arm_scmi: Misc Fixes Sibi Sankar
2024-10-07  6:06 ` [PATCH V3 1/4] firmware: arm_scmi: Ensure that the message-id supports fastchannel Sibi Sankar
2024-10-09 13:46   ` Sudeep Holla
2024-10-10 14:55   ` Johan Hovold
2024-10-07  6:06 ` [PATCH V3 2/4] firmware: arm_scmi: Skip adding bad duplicates Sibi Sankar
2024-10-07  6:06 ` [PATCH V3 3/4] pmdomain: core: Fix debugfs node creation failure Sibi Sankar
2024-10-07 17:33   ` Dmitry Baryshkov
2024-10-09 11:11     ` Ulf Hansson
2024-10-10 12:47       ` Dmitry Baryshkov
2024-10-07  6:06 ` [PATCH V3 4/4] mailbox: qcom-cpucp: Mark the irq with IRQF_NO_SUSPEND flag Sibi Sankar
2024-10-07 13:14   ` Konrad Dybcio
2024-10-10 14:58   ` Johan Hovold
2024-10-09 11:14 ` [PATCH V3 0/4] firmware: arm_scmi: Misc Fixes Ulf Hansson
2024-10-10 15:02 ` Johan Hovold
2024-10-23  7:46   ` Sibi Sankar
2024-10-23 16:26     ` Johan Hovold
2024-10-25  6:08       ` Sibi Sankar
2024-10-25  6:14         ` Dmitry Baryshkov
2024-10-25  6:45           ` Sibi Sankar
2024-10-25  8:28             ` Cristian Marussi
2024-10-25 10:11             ` Dmitry Baryshkov
2024-10-25 10:29               ` Cristian Marussi
2024-10-25 11:37                 ` Dmitry Baryshkov
2024-10-25 13:32                 ` Johan Hovold
2024-10-25 13:48                   ` Cristian Marussi
2024-10-25 13:23         ` Johan Hovold

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).