* [Patch v2 0/2] Improvements to the Tegra CPUFREQ driver
@ 2023-09-01 16:41 Sumit Gupta
From: Sumit Gupta @ 2023-09-01 16:41 UTC
To: rafael, viresh.kumar, linux-pm, linux-tegra, linux-kernel
Cc: treding, jonathanh, bbasu, sumitg, amiettinen
This patch set adds the following improvements to the Tegra194 CPUFREQ
driver. They are applicable to all the Tegra SoCs supported by the driver.

1) Patch 1: Avoid making an SMP call on every frequency request to reduce
   the time taken by the frequency set and get calls.

2) Patch 2: Use a reference clock count based loop instead of udelay()
   to improve the accuracy of the re-generated CPU frequency.

The patches are not related but have a minor conflict, so they need to be
applied in patch-number order. If 'Patch 2' is to be applied first, I will
rebase it and send it separately.
Sumit Gupta (2):
  cpufreq: tegra194: save CPU data to avoid repeated SMP calls
  cpufreq: tegra194: use refclk delta based loop instead of udelay

 drivers/cpufreq/tegra194-cpufreq.c | 151 ++++++++++++++++++++---------
 1 file changed, 106 insertions(+), 45 deletions(-)

--
2.17.1

* [Patch v2 1/2] cpufreq: tegra194: save CPU data to avoid repeated SMP calls
From: Sumit Gupta @ 2023-09-01 16:41 UTC
To: rafael, viresh.kumar, linux-pm, linux-tegra, linux-kernel
Cc: treding, jonathanh, bbasu, sumitg, amiettinen

Currently, we make SMP call on every frequency set request to get the
physical 'CPU ID' and 'CLUSTER ID' for the target CPU. This change
optimizes the repeated calls by storing the physical IDs and the per
core frequency register offset for all CPUs during boot. Later this
info is used directly when required to set the frequency or read it
from ACTMON counters.

Signed-off-by: Sumit Gupta <sumitg@nvidia.com>
---
 drivers/cpufreq/tegra194-cpufreq.c | 79 +++++++++++++++++++-----------
 1 file changed, 51 insertions(+), 28 deletions(-)

diff --git a/drivers/cpufreq/tegra194-cpufreq.c b/drivers/cpufreq/tegra194-cpufreq.c
index 386aed3637b4..0d0893cf7f18 100644
--- a/drivers/cpufreq/tegra194-cpufreq.c
+++ b/drivers/cpufreq/tegra194-cpufreq.c
@@ -39,6 +39,12 @@
 /* cpufreq transisition latency */
 #define TEGRA_CPUFREQ_TRANSITION_LATENCY (300 * 1000) /* unit in nanoseconds */
 
+struct tegra_cpu_data {
+        u32 cpuid;
+        u32 clusterid;
+        void __iomem *freq_core_reg;
+};
+
 struct tegra_cpu_ctr {
         u32 cpu;
         u32 coreclk_cnt, last_coreclk_cnt;
@@ -69,6 +75,7 @@ struct tegra194_cpufreq_data {
         struct cpufreq_frequency_table **bpmp_luts;
         const struct tegra_cpufreq_soc *soc;
         bool icc_dram_bw_scaling;
+        struct tegra_cpu_data *cpu_data;
 };
 
 static struct workqueue_struct *read_counters_wq;
@@ -116,14 +123,8 @@ static void tegra234_get_cpu_cluster_id(u32 cpu, u32 *cpuid, u32 *clusterid)
 static int tegra234_get_cpu_ndiv(u32 cpu, u32 cpuid, u32 clusterid, u64 *ndiv)
 {
         struct tegra194_cpufreq_data *data = cpufreq_get_driver_data();
-        void __iomem *freq_core_reg;
-        u64 mpidr_id;
-
-        /* use physical id to get address of per core frequency register */
-        mpidr_id = (clusterid * data->soc->maxcpus_per_cluster) + cpuid;
-        freq_core_reg = SCRATCH_FREQ_CORE_REG(data, mpidr_id);
 
-        *ndiv = readl(freq_core_reg) & NDIV_MASK;
+        *ndiv = readl(data->cpu_data[cpu].freq_core_reg) & NDIV_MASK;
 
         return 0;
 }
@@ -131,19 +132,10 @@ static int tegra234_get_cpu_ndiv(u32 cpu, u32 cpuid, u32 clusterid, u64 *ndiv)
 static void tegra234_set_cpu_ndiv(struct cpufreq_policy *policy, u64 ndiv)
 {
         struct tegra194_cpufreq_data *data = cpufreq_get_driver_data();
-        void __iomem *freq_core_reg;
-        u32 cpu, cpuid, clusterid;
-        u64 mpidr_id;
-
-        for_each_cpu_and(cpu, policy->cpus, cpu_online_mask) {
-                data->soc->ops->get_cpu_cluster_id(cpu, &cpuid, &clusterid);
-
-                /* use physical id to get address of per core frequency register */
-                mpidr_id = (clusterid * data->soc->maxcpus_per_cluster) + cpuid;
-                freq_core_reg = SCRATCH_FREQ_CORE_REG(data, mpidr_id);
+        u32 cpu;
 
-                writel(ndiv, freq_core_reg);
-        }
+        for_each_cpu_and(cpu, policy->cpus, cpu_online_mask)
+                writel(ndiv, data->cpu_data[cpu].freq_core_reg);
 }
 
 /*
@@ -157,11 +149,10 @@ static void tegra234_read_counters(struct tegra_cpu_ctr *c)
 {
         struct tegra194_cpufreq_data *data = cpufreq_get_driver_data();
         void __iomem *actmon_reg;
-        u32 cpuid, clusterid;
         u64 val;
 
-        data->soc->ops->get_cpu_cluster_id(c->cpu, &cpuid, &clusterid);
-        actmon_reg = CORE_ACTMON_CNTR_REG(data, clusterid, cpuid);
+        actmon_reg = CORE_ACTMON_CNTR_REG(data, data->cpu_data[c->cpu].clusterid,
+                                          data->cpu_data[c->cpu].cpuid);
 
         val = readq(actmon_reg);
         c->last_refclk_cnt = upper_32_bits(val);
@@ -357,19 +348,17 @@ static void tegra194_set_cpu_ndiv(struct cpufreq_policy *policy, u64 ndiv)
 
 static unsigned int tegra194_get_speed(u32 cpu)
 {
         struct tegra194_cpufreq_data *data = cpufreq_get_driver_data();
+        u32 clusterid = data->cpu_data[cpu].clusterid;
         struct cpufreq_frequency_table *pos;
-        u32 cpuid, clusterid;
         unsigned int rate;
         u64 ndiv;
         int ret;
 
-        data->soc->ops->get_cpu_cluster_id(cpu, &cpuid, &clusterid);
-
         /* reconstruct actual cpu freq using counters */
         rate = tegra194_calculate_speed(cpu);
 
         /* get last written ndiv value */
-        ret = data->soc->ops->get_cpu_ndiv(cpu, cpuid, clusterid, &ndiv);
+        ret = data->soc->ops->get_cpu_ndiv(cpu, data->cpu_data[cpu].cpuid, clusterid, &ndiv);
         if (WARN_ON_ONCE(ret))
                 return rate;
 
@@ -475,13 +464,12 @@ static int tegra194_cpufreq_init(struct cpufreq_policy *policy)
 {
         struct tegra194_cpufreq_data *data = cpufreq_get_driver_data();
         int maxcpus_per_cluster = data->soc->maxcpus_per_cluster;
+        u32 clusterid = data->cpu_data[policy->cpu].clusterid;
         struct cpufreq_frequency_table *freq_table;
         struct cpufreq_frequency_table *bpmp_lut;
         u32 start_cpu, cpu;
-        u32 clusterid;
         int ret;
 
-        data->soc->ops->get_cpu_cluster_id(policy->cpu, NULL, &clusterid);
         if (clusterid >= data->soc->num_clusters || !data->bpmp_luts[clusterid])
                 return -EINVAL;
 
@@ -659,6 +647,28 @@ tegra_cpufreq_bpmp_read_lut(struct platform_device *pdev, struct tegra_bpmp *bpm
         return freq_table;
 }
 
+static int tegra194_cpufreq_store_physids(unsigned int cpu, struct tegra194_cpufreq_data *data)
+{
+        int num_cpus = data->soc->maxcpus_per_cluster * data->soc->num_clusters;
+        u32 cpuid, clusterid;
+        u64 mpidr_id;
+
+        if (cpu > (num_cpus - 1)) {
+                pr_err("cpufreq: wrong num of cpus or clusters in soc data\n");
+                return -EINVAL;
+        }
+
+        data->soc->ops->get_cpu_cluster_id(cpu, &cpuid, &clusterid);
+
+        mpidr_id = (clusterid * data->soc->maxcpus_per_cluster) + cpuid;
+
+        data->cpu_data[cpu].cpuid = cpuid;
+        data->cpu_data[cpu].clusterid = clusterid;
+        data->cpu_data[cpu].freq_core_reg = SCRATCH_FREQ_CORE_REG(data, mpidr_id);
+
+        return 0;
+}
+
 static int tegra194_cpufreq_probe(struct platform_device *pdev)
 {
         const struct tegra_cpufreq_soc *soc;
@@ -666,6 +676,7 @@ static int tegra194_cpufreq_probe(struct platform_device *pdev)
         struct tegra_bpmp *bpmp;
         struct device *cpu_dev;
         int err, i;
+        u32 cpu;
 
         data = devm_kzalloc(&pdev->dev, sizeof(*data), GFP_KERNEL);
         if (!data)
@@ -692,6 +703,12 @@ static int tegra194_cpufreq_probe(struct platform_device *pdev)
                 return PTR_ERR(data->regs);
         }
 
+        data->cpu_data = devm_kcalloc(&pdev->dev, data->soc->num_clusters *
+                                      data->soc->maxcpus_per_cluster,
+                                      sizeof(struct tegra_cpu_data), GFP_KERNEL);
+        if (!data->cpu_data)
+                return -ENOMEM;
+
         platform_set_drvdata(pdev, data);
 
         bpmp = tegra_bpmp_get(&pdev->dev);
@@ -713,6 +730,12 @@ static int tegra194_cpufreq_probe(struct platform_device *pdev)
                 }
         }
 
+        for_each_possible_cpu(cpu) {
+                err = tegra194_cpufreq_store_physids(cpu, data);
+                if (err)
+                        goto err_free_res;
+        }
+
         tegra194_cpufreq_driver.driver_data = data;
 
         /* Check for optional OPPv2 and interconnect paths on CPU0 to enable ICC scaling */
-- 
2.17.1

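The gist of the patch above can be illustrated outside the kernel with a
small, self-contained C sketch: resolve each CPU's physical IDs once at
init time, cache them in a table, and turn the hot set/get paths into
plain table lookups instead of cross-CPU (SMP) calls. query_physical_ids()
and the 4-cores-per-cluster layout below are made-up stand-ins for the
MPIDR-based lookup the driver performs; this is an illustration of the
idea, not the driver code.

#include <stdio.h>

#define NUM_CPUS 12

struct cpu_phys_info {
        unsigned int cpuid;       /* core index within its cluster */
        unsigned int clusterid;   /* cluster index */
};

static struct cpu_phys_info cpu_table[NUM_CPUS];

/* Made-up stand-in for the per-CPU physical-ID query (4 cores per cluster). */
static void query_physical_ids(unsigned int cpu, unsigned int *cpuid,
                               unsigned int *clusterid)
{
        *cpuid = cpu % 4;
        *clusterid = cpu / 4;
}

/* Done once, e.g. at probe time, so later lookups are cheap. */
static void cache_physical_ids(void)
{
        unsigned int cpu, cpuid, clusterid;

        for (cpu = 0; cpu < NUM_CPUS; cpu++) {
                query_physical_ids(cpu, &cpuid, &clusterid);
                cpu_table[cpu].cpuid = cpuid;
                cpu_table[cpu].clusterid = clusterid;
        }
}

int main(void)
{
        cache_physical_ids();

        /* Hot path: a plain table lookup, no cross-CPU call per request. */
        printf("cpu 5 -> cluster %u, core %u\n",
               cpu_table[5].clusterid, cpu_table[5].cpuid);
        return 0;
}
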
* Re: [Patch v2 1/2] cpufreq: tegra194: save CPU data to avoid repeated SMP calls
From: Jon Hunter @ 2023-09-01 17:58 UTC
To: Sumit Gupta, rafael, viresh.kumar, linux-pm, linux-tegra, linux-kernel
Cc: treding, bbasu, amiettinen

On 01/09/2023 17:41, Sumit Gupta wrote:
> Currently, we make SMP call on every frequency set request to get the
> physical 'CPU ID' and 'CLUSTER ID' for the target CPU. This change
> optimizes the repeated calls by storing the physical IDs and the per
> core frequency register offset for all CPUs during boot. Later this
> info is used directly when required to set the frequency or read it
> from ACTMON counters.
>
> Signed-off-by: Sumit Gupta <sumitg@nvidia.com>
> ---

You should list the changes you made in V2 under the above '---' so it
is clear to the reviewers what has changed. I believe in this case you
simply changed the subject, but please clarify.

Jon

-- 
nvpublic

* Re: [Patch v2 1/2] cpufreq: tegra194: save CPU data to avoid repeated SMP calls
From: Sumit Gupta @ 2023-09-01 19:51 UTC
To: Jon Hunter, rafael, viresh.kumar, linux-pm, linux-tegra, linux-kernel
Cc: treding, bbasu, amiettinen, Sumit Gupta

On 01/09/23 23:28, Jon Hunter wrote:
>
>
> On 01/09/2023 17:41, Sumit Gupta wrote:
>> Currently, we make SMP call on every frequency set request to get the
>> physical 'CPU ID' and 'CLUSTER ID' for the target CPU. This change
>> optimizes the repeated calls by storing the physical IDs and the per
>> core frequency register offset for all CPUs during boot. Later this
>> info is used directly when required to set the frequency or read it
>> from ACTMON counters.
>>
>> Signed-off-by: Sumit Gupta <sumitg@nvidia.com>
>> ---
>
> You should list the changes you made in V2 under the above '---' so it
> is clear to the reviewers what has changed. I believe in this case you
> simply changed the subject, but please clarify.
>
> Jon
>

Yes, only the subject changed; there is no other change. Sorry, I forgot
to mention that.

Thank you,
Sumit Gupta

* Re: [Patch v2 1/2] cpufreq: tegra194: save CPU data to avoid repeated SMP calls
From: Viresh Kumar @ 2023-09-28 7:05 UTC
To: Sumit Gupta
Cc: rafael, linux-pm, linux-tegra, linux-kernel, treding, jonathanh, bbasu, amiettinen

On 01-09-23, 22:11, Sumit Gupta wrote:
> @@ -131,19 +132,10 @@ static int tegra234_get_cpu_ndiv(u32 cpu, u32 cpuid, u32 clusterid, u64 *ndiv)
>  static void tegra234_set_cpu_ndiv(struct cpufreq_policy *policy, u64 ndiv)
>  {

> +        for_each_cpu_and(cpu, policy->cpus, cpu_online_mask)

(Yes this is existing code, but ..) you don't need to perform AND with
cpu_online_mask as policy->cpus should only contain currently online CPUs.

Please check if you ever see it differently.

> +        data->cpu_data = devm_kcalloc(&pdev->dev, data->soc->num_clusters *
> +                                      data->soc->maxcpus_per_cluster,
> +                                      sizeof(struct tegra_cpu_data), GFP_KERNEL);

This should be: sizeof(*data->cpu_data) instead. Didn't checkpatch complain
about it ?

-- 
viresh

* Re: [Patch v2 1/2] cpufreq: tegra194: save CPU data to avoid repeated SMP calls
From: Sumit Gupta @ 2023-09-29 14:17 UTC
To: Viresh Kumar
Cc: rafael, linux-pm, linux-tegra, linux-kernel, treding, jonathanh, bbasu, amiettinen, Sumit Gupta

On 28/09/23 12:35, Viresh Kumar wrote:
> External email: Use caution opening links or attachments
>
>
> On 01-09-23, 22:11, Sumit Gupta wrote:
>> @@ -131,19 +132,10 @@ static int tegra234_get_cpu_ndiv(u32 cpu, u32 cpuid, u32 clusterid, u64 *ndiv)
>>  static void tegra234_set_cpu_ndiv(struct cpufreq_policy *policy, u64 ndiv)
>>  {
>
>> +        for_each_cpu_and(cpu, policy->cpus, cpu_online_mask)
>
> (Yes this is existing code, but ..) you don't need to perform AND with
> cpu_online_mask as policy->cpus should only contain currently online CPUs.
>
> Please check if you ever see it differently.
>

I think this was kept to be safe.
Should I remove the AND in v3 or send a separate patch?

>> +        data->cpu_data = devm_kcalloc(&pdev->dev, data->soc->num_clusters *
>> +                                      data->soc->maxcpus_per_cluster,
>> +                                      sizeof(struct tegra_cpu_data), GFP_KERNEL);
>
> This should be: sizeof(*data->cpu_data) instead. Didn't checkpatch complain
> about it ?
>
> --
> viresh

Checkpatch didn't highlight it. Will do the change in v3.

Thank you,
Sumit Gupta

* Re: [Patch v2 1/2] cpufreq: tegra194: save CPU data to avoid repeated SMP calls
From: Viresh Kumar @ 2023-10-03 5:00 UTC
To: Sumit Gupta
Cc: rafael, linux-pm, linux-tegra, linux-kernel, treding, jonathanh, bbasu, amiettinen

On 29-09-23, 19:47, Sumit Gupta wrote:
>
>
> On 28/09/23 12:35, Viresh Kumar wrote:
> > External email: Use caution opening links or attachments
> >
> >
> > On 01-09-23, 22:11, Sumit Gupta wrote:
> > > @@ -131,19 +132,10 @@ static int tegra234_get_cpu_ndiv(u32 cpu, u32 cpuid, u32 clusterid, u64 *ndiv)
> > >  static void tegra234_set_cpu_ndiv(struct cpufreq_policy *policy, u64 ndiv)
> > >  {
> >
> > > +        for_each_cpu_and(cpu, policy->cpus, cpu_online_mask)
> >
> > (Yes this is existing code, but ..) you don't need to perform AND with
> > cpu_online_mask as policy->cpus should only contain currently online CPUs.
> >
> > Please check if you ever see it differently.
> >
>
> I think this was kept to be safe.
> Should I remove the AND in v3 or send a separate patch?

Sending it separately would be ideal.

-- 
viresh

* Re: [Patch v2 1/2] cpufreq: tegra194: save CPU data to avoid repeated SMP calls
From: Sumit Gupta @ 2023-10-04 14:11 UTC
To: Viresh Kumar
Cc: rafael, linux-pm, linux-tegra, linux-kernel, treding, jonathanh, bbasu, amiettinen, Sumit Gupta

On 03/10/23 10:30, Viresh Kumar wrote:
> External email: Use caution opening links or attachments
>
>
> On 29-09-23, 19:47, Sumit Gupta wrote:
>>
>>
>> On 28/09/23 12:35, Viresh Kumar wrote:
>>> External email: Use caution opening links or attachments
>>>
>>>
>>> On 01-09-23, 22:11, Sumit Gupta wrote:
>>>> @@ -131,19 +132,10 @@ static int tegra234_get_cpu_ndiv(u32 cpu, u32 cpuid, u32 clusterid, u64 *ndiv)
>>>>  static void tegra234_set_cpu_ndiv(struct cpufreq_policy *policy, u64 ndiv)
>>>>  {
>>>
>>>> +        for_each_cpu_and(cpu, policy->cpus, cpu_online_mask)
>>>
>>> (Yes this is existing code, but ..) you don't need to perform AND with
>>> cpu_online_mask as policy->cpus should only contain currently online CPUs.
>>>
>>> Please check if you ever see it differently.
>>>
>>
>> I think this was kept to be safe.
>> Should I remove the AND in v3 or send a separate patch?
>
> Sending it separately would be ideal.
>
> --
> viresh

Sent v3 with the "sizeof(*data->cpu_data)" change. Will send a separate
patch with the change that removes the AND with the mask, as suggested.

Thank you,
Sumit Gupta

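As an aside, the allocation idiom requested above is easy to demonstrate in
a standalone program: sizeof(*ptr) always tracks the pointee type, so the
element size cannot go stale if the structure is later renamed or reshaped,
whereas sizeof(struct tegra_cpu_data) would have to be updated by hand. The
structure and sizes below only mimic the driver's layout and are
illustrative, not the actual v3 change.

#include <stdio.h>
#include <stdlib.h>

/* Only mimics the driver's per-CPU bookkeeping; not the real structure. */
struct cpu_data {
        unsigned int cpuid;
        unsigned int clusterid;
};

int main(void)
{
        unsigned int num_clusters = 3, maxcpus_per_cluster = 4;
        struct cpu_data *cpu_data;

        /* Preferred: derive the element size from the pointer being assigned,
         * so the call stays correct even if the element type changes later. */
        cpu_data = calloc(num_clusters * maxcpus_per_cluster, sizeof(*cpu_data));
        if (!cpu_data)
                return 1;

        printf("allocated %u entries of %zu bytes each\n",
               num_clusters * maxcpus_per_cluster, sizeof(*cpu_data));

        free(cpu_data);
        return 0;
}
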
* [Patch v2 2/2] cpufreq: tegra194: use refclk delta based loop instead of udelay
From: Sumit Gupta @ 2023-09-01 16:41 UTC
To: rafael, viresh.kumar, linux-pm, linux-tegra, linux-kernel
Cc: treding, jonathanh, bbasu, sumitg, amiettinen

Use reference clock count based loop instead of "udelay()" for
sampling of counters to improve the accuracy of re-generated CPU
frequency. "udelay()" internally calls "WFE" which stops the
counters and results in bigger delta between the last set freq
and the re-generated value from counters. The counter sampling
window used in loop is the minimum number of reference clock
cycles which is known to give a stable value of CPU frequency.
The change also helps to reduce the sampling window from "500us"
to "<50us".

Suggested-by: Antti Miettinen <amiettinen@nvidia.com>
Signed-off-by: Sumit Gupta <sumitg@nvidia.com>
---
 drivers/cpufreq/tegra194-cpufreq.c | 72 +++++++++++++++++++++++-------
 1 file changed, 55 insertions(+), 17 deletions(-)

diff --git a/drivers/cpufreq/tegra194-cpufreq.c b/drivers/cpufreq/tegra194-cpufreq.c
index 0d0893cf7f18..cc84c752fff8 100644
--- a/drivers/cpufreq/tegra194-cpufreq.c
+++ b/drivers/cpufreq/tegra194-cpufreq.c
@@ -5,7 +5,6 @@
 
 #include <linux/cpu.h>
 #include <linux/cpufreq.h>
-#include <linux/delay.h>
 #include <linux/dma-mapping.h>
 #include <linux/module.h>
 #include <linux/of.h>
@@ -21,10 +20,11 @@
 
 #define KHZ                     1000
 #define REF_CLK_MHZ             408 /* 408 MHz */
-#define US_DELAY                500
 #define CPUFREQ_TBL_STEP_HZ     (50 * KHZ * KHZ)
 #define MAX_CNT                 ~0U
 
+#define MAX_DELTA_KHZ           115200
+
 #define NDIV_MASK              0x1FF
 
 #define CORE_OFFSET(cpu)                        (cpu * 8)
@@ -68,6 +68,7 @@ struct tegra_cpufreq_soc {
         int maxcpus_per_cluster;
         unsigned int num_clusters;
         phys_addr_t actmon_cntr_base;
+        u32 refclk_delta_min;
 };
 
 struct tegra194_cpufreq_data {
@@ -149,6 +150,8 @@ static void tegra234_read_counters(struct tegra_cpu_ctr *c)
 {
         struct tegra194_cpufreq_data *data = cpufreq_get_driver_data();
         void __iomem *actmon_reg;
+        u32 delta_refcnt;
+        int cnt = 0;
         u64 val;
 
         actmon_reg = CORE_ACTMON_CNTR_REG(data, data->cpu_data[c->cpu].clusterid,
@@ -157,10 +160,25 @@ static void tegra234_read_counters(struct tegra_cpu_ctr *c)
         val = readq(actmon_reg);
         c->last_refclk_cnt = upper_32_bits(val);
         c->last_coreclk_cnt = lower_32_bits(val);
-        udelay(US_DELAY);
-        val = readq(actmon_reg);
-        c->refclk_cnt = upper_32_bits(val);
-        c->coreclk_cnt = lower_32_bits(val);
+
+        /*
+         * The sampling window is based on the minimum number of reference
+         * clock cycles which is known to give a stable value of CPU frequency.
+         */
+        do {
+                val = readq(actmon_reg);
+                c->refclk_cnt = upper_32_bits(val);
+                c->coreclk_cnt = lower_32_bits(val);
+                if (c->refclk_cnt < c->last_refclk_cnt)
+                        delta_refcnt = c->refclk_cnt + (MAX_CNT - c->last_refclk_cnt);
+                else
+                        delta_refcnt = c->refclk_cnt - c->last_refclk_cnt;
+                if (++cnt >= 0xFFFF) {
+                        pr_warn("cpufreq: problem with refclk on cpu:%d, delta_refcnt:%u, cnt:%d\n",
+                                c->cpu, delta_refcnt, cnt);
+                        break;
+                }
+        } while (delta_refcnt < data->soc->refclk_delta_min);
 }
 
 static struct tegra_cpufreq_ops tegra234_cpufreq_ops = {
@@ -175,6 +193,7 @@ static const struct tegra_cpufreq_soc tegra234_cpufreq_soc = {
         .actmon_cntr_base = 0x9000,
         .maxcpus_per_cluster = 4,
         .num_clusters = 3,
+        .refclk_delta_min = 16000,
 };
 
 static const struct tegra_cpufreq_soc tegra239_cpufreq_soc = {
@@ -182,6 +201,7 @@ static const struct tegra_cpufreq_soc tegra239_cpufreq_soc = {
         .actmon_cntr_base = 0x4000,
         .maxcpus_per_cluster = 8,
         .num_clusters = 1,
+        .refclk_delta_min = 16000,
 };
 
 static void tegra194_get_cpu_cluster_id(u32 cpu, u32 *cpuid, u32 *clusterid)
@@ -222,15 +242,33 @@ static inline u32 map_ndiv_to_freq(struct mrq_cpu_ndiv_limits_response
 
 static void tegra194_read_counters(struct tegra_cpu_ctr *c)
 {
+        struct tegra194_cpufreq_data *data = cpufreq_get_driver_data();
+        u32 delta_refcnt;
+        int cnt = 0;
         u64 val;
 
         val = read_freq_feedback();
         c->last_refclk_cnt = lower_32_bits(val);
         c->last_coreclk_cnt = upper_32_bits(val);
-        udelay(US_DELAY);
-        val = read_freq_feedback();
-        c->refclk_cnt = lower_32_bits(val);
-        c->coreclk_cnt = upper_32_bits(val);
+
+        /*
+         * The sampling window is based on the minimum number of reference
+         * clock cycles which is known to give a stable value of CPU frequency.
+         */
+        do {
+                val = read_freq_feedback();
+                c->refclk_cnt = lower_32_bits(val);
+                c->coreclk_cnt = upper_32_bits(val);
+                if (c->refclk_cnt < c->last_refclk_cnt)
+                        delta_refcnt = c->refclk_cnt + (MAX_CNT - c->last_refclk_cnt);
+                else
+                        delta_refcnt = c->refclk_cnt - c->last_refclk_cnt;
+                if (++cnt >= 0xFFFF) {
+                        pr_warn("cpufreq: problem with refclk on cpu:%d, delta_refcnt:%u, cnt:%d\n",
+                                c->cpu, delta_refcnt, cnt);
+                        break;
+                }
+        } while (delta_refcnt < data->soc->refclk_delta_min);
 }
 
 static void tegra_read_counters(struct work_struct *work)
@@ -288,9 +326,8 @@ static unsigned int tegra194_calculate_speed(u32 cpu)
         u32 rate_mhz;
 
         /*
-         * udelay() is required to reconstruct cpu frequency over an
-         * observation window. Using workqueue to call udelay() with
-         * interrupts enabled.
+         * Reconstruct cpu frequency over an observation/sampling window.
+         * Using workqueue to keep interrupts enabled during the interval.
          */
         read_counters_work.c.cpu = cpu;
         INIT_WORK_ONSTACK(&read_counters_work.work, tegra_read_counters);
@@ -372,9 +409,9 @@ static unsigned int tegra194_get_speed(u32 cpu)
                 if (pos->driver_data != ndiv)
                         continue;
 
-                if (abs(pos->frequency - rate) > 115200) {
-                        pr_warn("cpufreq: cpu%d,cur:%u,set:%u,set ndiv:%llu\n",
-                                cpu, rate, pos->frequency, ndiv);
+                if (abs(pos->frequency - rate) > MAX_DELTA_KHZ) {
+                        pr_warn("cpufreq: cpu%d,cur:%u,set:%u,delta:%d,set ndiv:%llu\n",
+                                cpu, rate, pos->frequency, abs(rate - pos->frequency), ndiv);
                 } else {
                         rate = pos->frequency;
                 }
@@ -568,6 +605,7 @@ static const struct tegra_cpufreq_soc tegra194_cpufreq_soc = {
         .ops = &tegra194_cpufreq_ops,
         .maxcpus_per_cluster = 2,
         .num_clusters = 4,
+        .refclk_delta_min = 16000,
 };
 
 static void tegra194_cpufreq_free_resources(void)
@@ -684,7 +722,7 @@ static int tegra194_cpufreq_probe(struct platform_device *pdev)
 
         soc = of_device_get_match_data(&pdev->dev);
 
-        if (soc->ops && soc->maxcpus_per_cluster && soc->num_clusters) {
+        if (soc->ops && soc->maxcpus_per_cluster && soc->num_clusters && soc->refclk_delta_min) {
                 data->soc = soc;
         } else {
                 dev_err(&pdev->dev, "soc data missing\n");
-- 
2.17.1

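The sampling scheme in the patch above can be tried in isolation. Below is a
small, self-contained C sketch of the same idea in userspace: poll a packed
64-bit counter word (reference-clock count in one half, core-clock count in
the other, roughly like the counters the driver reads) until the reference
count has advanced by a minimum delta, handle 32-bit wrap-around, bail out
if the counter looks stuck, and then reconstruct the CPU rate from the two
deltas. The counter read is simulated so the example compiles and runs on
its own; the constants echo the patch, but none of this is the driver code.

#include <stdint.h>
#include <stdio.h>

#define MAX_CNT           UINT32_MAX
#define REF_CLK_MHZ       408u          /* same reference clock as the driver */
#define REFCLK_DELTA_MIN  16000u        /* echoes refclk_delta_min in the patch */
#define MAX_POLLS         0xFFFF

/* Simulated counter word: refclk advances by 800 and coreclk by 4000 per poll. */
static uint64_t read_counters(void)
{
        static uint32_t refclk = UINT32_MAX - 4000;     /* start near wrap-around */
        static uint32_t coreclk;

        refclk += 800;
        coreclk += 4000;
        return ((uint64_t)refclk << 32) | coreclk;
}

/* Delta between two samples of a free-running 32-bit counter. */
static uint32_t wrap_delta(uint32_t now, uint32_t last)
{
        return (now < last) ? now + (MAX_CNT - last) : now - last;
}

int main(void)
{
        uint64_t val = read_counters();
        uint32_t last_ref = (uint32_t)(val >> 32), last_core = (uint32_t)val;
        uint32_t ref, core, delta_ref, delta_core;
        int polls = 0;

        /* Sample until the reference counter has moved by the minimum delta. */
        do {
                val = read_counters();
                ref = (uint32_t)(val >> 32);
                core = (uint32_t)val;
                delta_ref = wrap_delta(ref, last_ref);
                if (++polls >= MAX_POLLS)       /* guard against a stuck counter */
                        break;
        } while (delta_ref < REFCLK_DELTA_MIN);

        delta_core = wrap_delta(core, last_core);

        /* cpu_rate_khz = (coreclk delta / refclk delta) * refclk rate in kHz */
        printf("reconstructed CPU rate: %llu kHz after %d polls\n",
               (unsigned long long)delta_core * REF_CLK_MHZ * 1000 / delta_ref,
               polls);
        return 0;
}
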
* Re: [Patch v2 2/2] cpufreq: tegra194: use refclk delta based loop instead of udelay
From: Jon Hunter @ 2023-09-01 17:59 UTC
To: Sumit Gupta, rafael, viresh.kumar, linux-pm, linux-tegra, linux-kernel
Cc: treding, bbasu, amiettinen

On 01/09/2023 17:41, Sumit Gupta wrote:
> Use reference clock count based loop instead of "udelay()" for
> sampling of counters to improve the accuracy of re-generated CPU
> frequency. "udelay()" internally calls "WFE" which stops the
> counters and results in bigger delta between the last set freq
> and the re-generated value from counters. The counter sampling
> window used in loop is the minimum number of reference clock
> cycles which is known to give a stable value of CPU frequency.
> The change also helps to reduce the sampling window from "500us"
> to "<50us".
>
> Suggested-by: Antti Miettinen <amiettinen@nvidia.com>
> Signed-off-by: Sumit Gupta <sumitg@nvidia.com>
> ---

Same here.

Jon

-- 
nvpublic

* Re: [Patch v2 2/2] cpufreq: tegra194: use refclk delta based loop instead of udelay
From: Sumit Gupta @ 2023-09-01 19:53 UTC
To: Jon Hunter, rafael, viresh.kumar, linux-pm, linux-tegra, linux-kernel
Cc: treding, bbasu, amiettinen, Sumit Gupta

On 01/09/23 23:29, Jon Hunter wrote:
>
> On 01/09/2023 17:41, Sumit Gupta wrote:
>> Use reference clock count based loop instead of "udelay()" for
>> sampling of counters to improve the accuracy of re-generated CPU
>> frequency. "udelay()" internally calls "WFE" which stops the
>> counters and results in bigger delta between the last set freq
>> and the re-generated value from counters. The counter sampling
>> window used in loop is the minimum number of reference clock
>> cycles which is known to give a stable value of CPU frequency.
>> The change also helps to reduce the sampling window from "500us"
>> to "<50us".
>>
>> Suggested-by: Antti Miettinen <amiettinen@nvidia.com>
>> Signed-off-by: Sumit Gupta <sumitg@nvidia.com>
>> ---
>
> Same here.
>
> Jon
>

This patch is newly added in v2.

Thank you,
Sumit Gupta

* Re: [Patch v2 0/2] Improvements to the Tegra CPUFREQ driver
From: Sumit Gupta @ 2023-09-19 11:28 UTC
To: rafael, viresh.kumar, linux-pm, linux-tegra, linux-kernel
Cc: treding, jonathanh, bbasu, amiettinen, Sumit Gupta

Gentle ping.

Thank you,
Sumit Gupta

On 01/09/23 22:11, Sumit Gupta wrote:
> This patch set adds the following improvements to the Tegra194 CPUFREQ
> driver. They are applicable to all the Tegra SoCs supported by the driver.
>
> 1) Patch 1: Avoid making an SMP call on every frequency request to reduce
>    the time taken by the frequency set and get calls.
>
> 2) Patch 2: Use a reference clock count based loop instead of udelay()
>    to improve the accuracy of the re-generated CPU frequency.
>
> The patches are not related but have a minor conflict, so they need to be
> applied in patch-number order. If 'Patch 2' is to be applied first, I will
> rebase it and send it separately.
>
> Sumit Gupta (2):
>   cpufreq: tegra194: save CPU data to avoid repeated SMP calls
>   cpufreq: tegra194: use refclk delta based loop instead of udelay
>
>  drivers/cpufreq/tegra194-cpufreq.c | 151 ++++++++++++++++++++---------
>  1 file changed, 106 insertions(+), 45 deletions(-)
>

Thread overview: 12+ messages (end of thread, newest: 2023-10-04 14:11 UTC)

2023-09-01 16:41 [Patch v2 0/2] Improvements to the Tegra CPUFREQ driver Sumit Gupta
2023-09-01 16:41 ` [Patch v2 1/2] cpufreq: tegra194: save CPU data to avoid repeated SMP calls Sumit Gupta
2023-09-01 17:58   ` Jon Hunter
2023-09-01 19:51     ` Sumit Gupta
2023-09-28  7:05   ` Viresh Kumar
2023-09-29 14:17     ` Sumit Gupta
2023-10-03  5:00       ` Viresh Kumar
2023-10-04 14:11         ` Sumit Gupta
2023-09-01 16:41 ` [Patch v2 2/2] cpufreq: tegra194: use refclk delta based loop instead of udelay Sumit Gupta
2023-09-01 17:59   ` Jon Hunter
2023-09-01 19:53     ` Sumit Gupta
2023-09-19 11:28 ` [Patch v2 0/2] Improvements to the Tegra CPUFREQ driver Sumit Gupta