Linux Power Management development

Linux Power Management development
 help / color / mirror / Atom feed

* [PATCH v12 0/6] Add utilization clamping support (CGroups API)
From: Patrick Bellasi @ 2019-07-18 18:17 UTC (permalink / raw)
  To: linux-kernel, linux-pm, linux-api, cgroups
  Cc: Ingo Molnar, Peter Zijlstra, Tejun Heo, Rafael J . Wysocki,
	Vincent Guittot, Viresh Kumar, Paul Turner, Michal Koutny,
	Quentin Perret, Dietmar Eggemann, Morten Rasmussen, Juri Lelli,
	Todd Kjos, Joel Fernandes, Steve Muckle, Suren Baghdasaryan,
	Alessio Balsini

Hi all, this is a respin of:

  https://lore.kernel.org/lkml/20190708084357.12944-1-patrick.bellasi@arm.com/

which addresses all the comments collected so far:

- track requested cgroup's percentage to mask conversion rounding to userspace
- use a dedicated variable for parent restrictions
- make more explicit in the documentation that the requested "protection" is
  always capped by the requested "limit"
- use the newly added uclamp_mutex to serialize the sysfs write callback
- add missing RCU read locks across cpu_util_update_eff() call from
  uclamp_update_root_tg()
- remove not required and confusing sentence from the above changelog
- add a new patch to always use enum uclamp_id for clamp_id values
- fix percentage's decimals format string

as well as adds some small modifications:

- introduce UCLAMP_PERCENT_{SHIFT,SCALE} to avoid hardcoded constants
- s/uclamp_scale_from_percent()/capacity_from_percent()/
- move range check from cpu_uclamp_{min,max}_write() to capacity_from_percent()

The series is based on top of today's Linus master branch (wip for 5.3-rc1):

  commit 22051d9c4a57 ("Merge tag 'platform-drivers-x86-v5.3-2' of git://git.infradead.org/linux-platform-drivers-x86")

Thanks Quentin, Michal and Tejun for your review comments!

This has been the first code review targeting specifically the cgroups bits and
the series is now hopefully in a better shape.

Looking forward for any additional comments! ;)

Cheers,
Patrick

Series Organization
===================

The full tree is available here:

   git://linux-arm.org/linux-pb.git   lkml/utilclamp_v12
   http://www.linux-arm.org/git?p=linux-pb.git;a=shortlog;h=refs/heads/lkml/utilclamp_v12

Newcomer's Short Abstract
=========================

The Linux scheduler tracks a "utilization" signal for each scheduling entity
(SE), e.g. tasks, to know how much CPU time they use. This signal allows the
scheduler to know how "big" a task is and, in principle, it can support
advanced task placement strategies by selecting the best CPU to run a task.
Some of these strategies are represented by the Energy Aware Scheduler [1].

When the schedutil cpufreq governor is in use, the utilization signal allows
the Linux scheduler to also drive frequency selection. The CPU utilization
signal, which represents the aggregated utilization of tasks scheduled on that
CPU, is used to select the frequency which best fits the workload generated by
the tasks.

The current translation of utilization values into a frequency selection is
simple: we go to max for RT tasks or to the minimum frequency which can
accommodate the utilization of DL+FAIR tasks.
However, utilization values by themselves cannot convey the desired
power/performance behaviors of each task as intended by user-space.
As such they are not ideally suited for task placement decisions.

Task placement and frequency selection policies in the kernel can be improved
by taking into consideration hints coming from authorized user-space elements,
like for example the Android middleware or more generally any "System
Management Software" (SMS) framework.

Utilization clamping is a mechanism which allows to "clamp" (i.e. filter) the
utilization generated by RT and FAIR tasks within a range defined by user-space.
The clamped utilization value can then be used, for example, to enforce a
minimum and/or maximum frequency depending on which tasks are active on a CPU.

The main use-cases for utilization clamping are:

 - boosting: better interactive response for small tasks which
   are affecting the user experience.

   Consider for example the case of a small control thread for an external
   accelerator (e.g. GPU, DSP, other devices). Here, from the task utilization
   the scheduler does not have a complete view of what the task's requirements
   are and, if it's a small utilization task, it keeps selecting a more energy
   efficient CPU, with smaller capacity and lower frequency, thus negatively
   impacting the overall time required to complete task activations.

 - capping: increase energy efficiency for background tasks not affecting the
   user experience.

   Since running on a lower capacity CPU at a lower frequency is more energy
   efficient, when the completion time is not a main goal, then capping the
   utilization considered for certain (maybe big) tasks can have positive
   effects, both on energy consumption and thermal headroom.
   This feature allows also to make RT tasks more energy friendly on mobile
   systems where running them on high capacity CPUs and at the maximum
   frequency is not required.

From these two use-cases, it's worth noticing that frequency selection
biasing, introduced by patches 9 and 10 of this series, is just one possible
usage of utilization clamping. Another compelling extension of utilization
clamping is in helping the scheduler in making tasks placement decisions.

Utilization is (also) a task specific property the scheduler uses to know
how much CPU bandwidth a task requires, at least as long as there is idle time.
Thus, the utilization clamp values, defined either per-task or per-task_group,
can represent tasks to the scheduler as being bigger (or smaller) than what
they actually are.

Utilization clamping thus enables interesting additional optimizations, for
example on asymmetric capacity systems like Arm big.LITTLE and DynamIQ CPUs,
where:

 - boosting: try to run small/foreground tasks on higher-capacity CPUs to
   complete them faster despite being less energy efficient.

 - capping: try to run big/background tasks on low-capacity CPUs to save power
   and thermal headroom for more important tasks

This series does not present this additional usage of utilization clamping but
it's an integral part of the EAS feature set, where [2] is one of its main
components.

Android kernels use SchedTune, a solution similar to utilization clamping, to
bias both 'frequency selection' and 'task placement'. This series provides the
foundation to add similar features to mainline while focusing, for the
time being, just on schedutil integration.

References
==========

[1] Energy Aware Scheduling
    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/scheduler/sched-energy.txt?h=v5.1

[2] Expressing per-task/per-cgroup performance hints
    Linux Plumbers Conference 2018
    https://linuxplumbersconf.org/event/2/contributions/128/

Patrick Bellasi (6):
  sched/core: uclamp: Extend CPU's cgroup controller
  sched/core: uclamp: Propagate parent clamps
  sched/core: uclamp: Propagate system defaults to root group
  sched/core: uclamp: Use TG's clamps to restrict TASK's clamps
  sched/core: uclamp: Update CPU's refcount on TG's clamp changes
  sched/core: uclamp: always use enum uclamp_id for clamp_id values

 Documentation/admin-guide/cgroup-v2.rst |  34 +++
 init/Kconfig                            |  22 ++
 kernel/sched/core.c                     | 382 ++++++++++++++++++++++--
 kernel/sched/sched.h                    |  12 +-
 4 files changed, 430 insertions(+), 20 deletions(-)

-- 
2.22.0

^ permalink raw reply

* [PATCH v12 1/6] sched/core: uclamp: Extend CPU's cgroup controller
From: Patrick Bellasi @ 2019-07-18 18:17 UTC (permalink / raw)
  To: linux-kernel, linux-pm, linux-api, cgroups
  Cc: Ingo Molnar, Peter Zijlstra, Tejun Heo, Rafael J . Wysocki,
	Vincent Guittot, Viresh Kumar, Paul Turner, Michal Koutny,
	Quentin Perret, Dietmar Eggemann, Morten Rasmussen, Juri Lelli,
	Todd Kjos, Joel Fernandes, Steve Muckle, Suren Baghdasaryan,
	Alessio Balsini
In-Reply-To: <20190718181748.28446-1-patrick.bellasi@arm.com>

The cgroup CPU bandwidth controller allows to assign a specified
(maximum) bandwidth to the tasks of a group. However this bandwidth is
defined and enforced only on a temporal base, without considering the
actual frequency a CPU is running on. Thus, the amount of computation
completed by a task within an allocated bandwidth can be very different
depending on the actual frequency the CPU is running that task.
The amount of computation can be affected also by the specific CPU a
task is running on, especially when running on asymmetric capacity
systems like Arm's big.LITTLE.

With the availability of schedutil, the scheduler is now able
to drive frequency selections based on actual task utilization.
Moreover, the utilization clamping support provides a mechanism to
bias the frequency selection operated by schedutil depending on
constraints assigned to the tasks currently RUNNABLE on a CPU.

Giving the mechanisms described above, it is now possible to extend the
cpu controller to specify the minimum (or maximum) utilization which
should be considered for tasks RUNNABLE on a cpu.
This makes it possible to better defined the actual computational
power assigned to task groups, thus improving the cgroup CPU bandwidth
controller which is currently based just on time constraints.

Extend the CPU controller with a couple of new attributes uclamp.{min,max}
which allow to enforce utilization boosting and capping for all the
tasks in a group.

Specifically:

- uclamp.min: defines the minimum utilization which should be considered
	      i.e. the RUNNABLE tasks of this group will run at least at a
	      	 minimum frequency which corresponds to the uclamp.min
	      	 utilization

- uclamp.max: defines the maximum utilization which should be considered
	      i.e. the RUNNABLE tasks of this group will run up to a
	      	 maximum frequency which corresponds to the uclamp.max
	      	 utilization

These attributes:

a) are available only for non-root nodes, both on default and legacy
   hierarchies, while system wide clamps are defined by a generic
   interface which does not depends on cgroups. This system wide
   interface enforces constraints on tasks in the root node.

b) enforce effective constraints at each level of the hierarchy which
   are a restriction of the group requests considering its parent's
   effective constraints. Root group effective constraints are defined
   by the system wide interface.
   This mechanism allows each (non-root) level of the hierarchy to:
   - request whatever clamp values it would like to get
   - effectively get only up to the maximum amount allowed by its parent

c) have higher priority than task-specific clamps, defined via
   sched_setattr(), thus allowing to control and restrict task requests.

Add two new attributes to the cpu controller to collect "requested"
clamp values. Allow that at each non-root level of the hierarchy.
Validate local consistency by enforcing uclamp.min < uclamp.max.
Keep it simple by not caring now about "effective" values computation
and propagation along the hierarchy.

Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>

---
Changes in v12:
 Message-ID: <20190715133801.yohhd2hywzsv3uyf@e110439-lin>
 - track requested cgroup's percentage to mask conversion rounding to userspace
 - introduce UCLAMP_PERCENT_{SHIFT,SCALE} to avoid hardcoded constants
 - s/uclamp_scale_from_percent()/capacity_from_percent()/
 - move range check from cpu_uclamp_{min,max}_write() to capacity_from_percent()
 Message-ID: <20190718152327.vmnds3kpagh2xz2r@e110439-lin>
 - fix percentage's decimals format string
---
 Documentation/admin-guide/cgroup-v2.rst |  34 +++++
 init/Kconfig                            |  22 +++
 kernel/sched/core.c                     | 175 +++++++++++++++++++++++-
 kernel/sched/sched.h                    |   8 ++
 4 files changed, 238 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 3b29005aa981..5f1c266131b0 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -951,6 +951,13 @@ controller implements weight and absolute bandwidth limit models for
 normal scheduling policy and absolute bandwidth allocation model for
 realtime scheduling policy.
 
+In all the above models, cycles distribution is defined only on a temporal
+base and it does not account for the frequency at which tasks are executed.
+The (optional) utilization clamping support allows to hint the schedutil
+cpufreq governor about the minimum desired frequency which should always be
+provided by a CPU, as well as the maximum desired frequency, which should not
+be exceeded by a CPU.
+
 WARNING: cgroup2 doesn't yet support control of realtime processes and
 the cpu controller can only be enabled when all RT processes are in
 the root cgroup.  Be aware that system management software may already
@@ -1016,6 +1023,33 @@ All time durations are in microseconds.
 	Shows pressure stall information for CPU. See
 	Documentation/accounting/psi.rst for details.
 
+  cpu.uclamp.min
+        A read-write single value file which exists on non-root cgroups.
+        The default is "0", i.e. no utilization boosting.
+
+        The requested minimum utilization (protection) as a percentage
+        rational number, e.g. 12.34 for 12.34%.
+
+        This interface allows reading and setting minimum utilization clamp
+        values similar to the sched_setattr(2). This minimum utilization
+        value is used to clamp the task specific minimum utilization clamp.
+
+        The requested minimum utilization (protection) is always capped by
+        the current value for the maximum utilization (limit), i.e.
+        `cpu.uclamp.max`.
+
+  cpu.uclamp.max
+        A read-write single value file which exists on non-root cgroups.
+        The default is "max". i.e. no utilization capping
+
+        The requested maximum utilization (limit) as a percentage rational
+        number, e.g. 98.76 for 98.76%.
+
+        This interface allows reading and setting maximum utilization clamp
+        values similar to the sched_setattr(2). This maximum utilization
+        value is used to clamp the task specific maximum utilization clamp.
+
+
 
 Memory
 ------
diff --git a/init/Kconfig b/init/Kconfig
index bd7d650d4a99..ac285cfa78b6 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -928,6 +928,28 @@ config RT_GROUP_SCHED
 
 endif #CGROUP_SCHED
 
+config UCLAMP_TASK_GROUP
+	bool "Utilization clamping per group of tasks"
+	depends on CGROUP_SCHED
+	depends on UCLAMP_TASK
+	default n
+	help
+	  This feature enables the scheduler to track the clamped utilization
+	  of each CPU based on RUNNABLE tasks currently scheduled on that CPU.
+
+	  When this option is enabled, the user can specify a min and max
+	  CPU bandwidth which is allowed for each single task in a group.
+	  The max bandwidth allows to clamp the maximum frequency a task
+	  can use, while the min bandwidth allows to define a minimum
+	  frequency a task will always use.
+
+	  When task group based utilization clamping is enabled, an eventually
+	  specified task-specific clamp value is constrained by the cgroup
+	  specified clamp value. Both minimum and maximum task clamping cannot
+	  be bigger than the corresponding clamping defined at task group level.
+
+	  If in doubt, say N.
+
 config CGROUP_PIDS
 	bool "PIDs controller"
 	help
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2b037f195473..fcc32afe53cb 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1149,8 +1149,12 @@ static void __init init_uclamp(void)
 
 	/* System defaults allow max clamp values for both indexes */
 	uclamp_se_set(&uc_max, uclamp_none(UCLAMP_MAX), false);
-	for_each_clamp_id(clamp_id)
+	for_each_clamp_id(clamp_id) {
 		uclamp_default[clamp_id] = uc_max;
+#ifdef CONFIG_UCLAMP_TASK_GROUP
+		root_task_group.uclamp_req[clamp_id] = uc_max;
+#endif
+	}
 }
 
 #else /* CONFIG_UCLAMP_TASK */
@@ -6727,6 +6731,19 @@ void ia64_set_curr_task(int cpu, struct task_struct *p)
 /* task_group_lock serializes the addition/removal of task groups */
 static DEFINE_SPINLOCK(task_group_lock);
 
+static inline void alloc_uclamp_sched_group(struct task_group *tg,
+					    struct task_group *parent)
+{
+#ifdef CONFIG_UCLAMP_TASK_GROUP
+	int clamp_id;
+
+	for_each_clamp_id(clamp_id) {
+		uclamp_se_set(&tg->uclamp_req[clamp_id],
+			      uclamp_none(clamp_id), false);
+	}
+#endif
+}
+
 static void sched_free_group(struct task_group *tg)
 {
 	free_fair_sched_group(tg);
@@ -6750,6 +6767,8 @@ struct task_group *sched_create_group(struct task_group *parent)
 	if (!alloc_rt_sched_group(tg, parent))
 		goto err;
 
+	alloc_uclamp_sched_group(tg, parent);
+
 	return tg;
 
 err:
@@ -6970,6 +6989,132 @@ static void cpu_cgroup_attach(struct cgroup_taskset *tset)
 		sched_move_task(task);
 }
 
+#ifdef CONFIG_UCLAMP_TASK_GROUP
+
+#define _POW10(exp) ((unsigned int)1e##exp)
+#define POW10(exp) _POW10(exp)
+
+struct uclamp_request {
+#define UCLAMP_PERCENT_SHIFT	2
+#define UCLAMP_PERCENT_SCALE	(100 * POW10(UCLAMP_PERCENT_SHIFT))
+	s64 percent;
+	u64 util;
+	int ret;
+};
+
+static inline struct uclamp_request
+capacity_from_percent(char *buf)
+{
+	struct uclamp_request req = {
+		.percent = UCLAMP_PERCENT_SCALE,
+		.util = SCHED_CAPACITY_SCALE,
+		.ret = 0,
+	};
+
+	buf = strim(buf);
+	if (strncmp("max", buf, 4)) {
+		req.ret = cgroup_parse_float(buf, UCLAMP_PERCENT_SHIFT,
+					     &req.percent);
+		if (req.ret)
+			return req;
+		if (req.percent > UCLAMP_PERCENT_SCALE) {
+			req.ret = -ERANGE;
+			return req;
+		}
+
+		req.util = req.percent << SCHED_CAPACITY_SHIFT;
+		req.util = DIV_ROUND_CLOSEST_ULL(req.util, UCLAMP_PERCENT_SCALE);
+	}
+
+	return req;
+}
+
+static ssize_t cpu_uclamp_min_write(struct kernfs_open_file *of,
+				    char *buf, size_t nbytes,
+				    loff_t off)
+{
+	struct uclamp_request req;
+	struct task_group *tg;
+
+	req = capacity_from_percent(buf);
+	if (req.ret)
+		return req.ret;
+
+	rcu_read_lock();
+
+	tg = css_tg(of_css(of));
+	if (tg->uclamp_req[UCLAMP_MIN].value != req.util)
+		uclamp_se_set(&tg->uclamp_req[UCLAMP_MIN], req.util, false);
+
+	/* Keep track of the actual requested value */
+	tg->uclamp_pct[UCLAMP_MIN] = req.percent;
+
+	rcu_read_unlock();
+
+	return nbytes;
+}
+
+static ssize_t cpu_uclamp_max_write(struct kernfs_open_file *of,
+				    char *buf, size_t nbytes,
+				    loff_t off)
+{
+	struct uclamp_request req;
+	struct task_group *tg;
+
+	req = capacity_from_percent(buf);
+	if (req.ret)
+		return req.ret;
+
+	rcu_read_lock();
+
+	tg = css_tg(of_css(of));
+	if (tg->uclamp_req[UCLAMP_MAX].value != req.util)
+		uclamp_se_set(&tg->uclamp_req[UCLAMP_MAX], req.util, false);
+
+	/* Keep track of the actual requested value */
+	tg->uclamp_pct[UCLAMP_MAX] = req.percent;
+
+	rcu_read_unlock();
+
+	return nbytes;
+}
+
+static inline void cpu_uclamp_print(struct seq_file *sf,
+				    enum uclamp_id clamp_id)
+{
+	struct task_group *tg;
+	u64 util_clamp;
+	u64 percent;
+	u32 rem;
+
+	rcu_read_lock();
+	tg = css_tg(seq_css(sf));
+	util_clamp = tg->uclamp_req[clamp_id].value;
+	rcu_read_unlock();
+
+	if (util_clamp == SCHED_CAPACITY_SCALE) {
+		seq_puts(sf, "max\n");
+		return;
+	}
+
+	percent = tg->uclamp_pct[clamp_id];
+	percent = div_u64_rem(percent, POW10(UCLAMP_PERCENT_SHIFT), &rem);
+	seq_printf(sf, "%llu.%0*u\n", percent, UCLAMP_PERCENT_SHIFT, rem);
+}
+
+static int cpu_uclamp_min_show(struct seq_file *sf, void *v)
+{
+	cpu_uclamp_print(sf, UCLAMP_MIN);
+	return 0;
+}
+
+static int cpu_uclamp_max_show(struct seq_file *sf, void *v)
+{
+	cpu_uclamp_print(sf, UCLAMP_MAX);
+	return 0;
+}
+#endif /* CONFIG_UCLAMP_TASK_GROUP */
+
 #ifdef CONFIG_FAIR_GROUP_SCHED
 static int cpu_shares_write_u64(struct cgroup_subsys_state *css,
 				struct cftype *cftype, u64 shareval)
@@ -7314,6 +7459,20 @@ static struct cftype cpu_legacy_files[] = {
 		.read_u64 = cpu_rt_period_read_uint,
 		.write_u64 = cpu_rt_period_write_uint,
 	},
+#endif
+#ifdef CONFIG_UCLAMP_TASK_GROUP
+	{
+		.name = "uclamp.min",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.seq_show = cpu_uclamp_min_show,
+		.write = cpu_uclamp_min_write,
+	},
+	{
+		.name = "uclamp.max",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.seq_show = cpu_uclamp_max_show,
+		.write = cpu_uclamp_max_write,
+	},
 #endif
 	{ }	/* Terminate */
 };
@@ -7481,6 +7640,20 @@ static struct cftype cpu_files[] = {
 		.seq_show = cpu_max_show,
 		.write = cpu_max_write,
 	},
+#endif
+#ifdef CONFIG_UCLAMP_TASK_GROUP
+	{
+		.name = "uclamp.min",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.seq_show = cpu_uclamp_min_show,
+		.write = cpu_uclamp_min_write,
+	},
+	{
+		.name = "uclamp.max",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.seq_show = cpu_uclamp_max_show,
+		.write = cpu_uclamp_max_write,
+	},
 #endif
 	{ }	/* terminate */
 };
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 802b1f3405f2..f10557a2dea7 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -393,6 +393,14 @@ struct task_group {
 #endif
 
 	struct cfs_bandwidth	cfs_bandwidth;
+
+#ifdef CONFIG_UCLAMP_TASK_GROUP
+	/* The two decimal precision [%] value requested from user-space */
+	unsigned int		uclamp_pct[UCLAMP_CNT];
+	/* Clamp values requested for a task group */
+	struct uclamp_se	uclamp_req[UCLAMP_CNT];
+#endif
+
 };
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
-- 
2.22.0


^ permalink raw reply related

* [PATCH v12 2/6] sched/core: uclamp: Propagate parent clamps
From: Patrick Bellasi @ 2019-07-18 18:17 UTC (permalink / raw)
  To: linux-kernel, linux-pm, linux-api, cgroups
  Cc: Ingo Molnar, Peter Zijlstra, Tejun Heo, Rafael J . Wysocki,
	Vincent Guittot, Viresh Kumar, Paul Turner, Michal Koutny,
	Quentin Perret, Dietmar Eggemann, Morten Rasmussen, Juri Lelli,
	Todd Kjos, Joel Fernandes, Steve Muckle, Suren Baghdasaryan,
	Alessio Balsini
In-Reply-To: <20190718181748.28446-1-patrick.bellasi@arm.com>

In order to properly support hierarchical resources control, the cgroup
delegation model requires that attribute writes from a child group never
fail but still are locally consistent and constrained based on parent's
assigned resources. This requires to properly propagate and aggregate
parent attributes down to its descendants.

Implement this mechanism by adding a new "effective" clamp value for each
task group. The effective clamp value is defined as the smaller value
between the clamp value of a group and the effective clamp value of its
parent. This is the actual clamp value enforced on tasks in a task group.

Since it's possible for a cpu.uclamp.min value to be bigger than the
cpu.uclamp.max value, ensure local consistency by restricting each
"protection" (i.e. min utilization) with the corresponding "limit"
(i.e. max utilization).

Do that at effective clamps propagation to ensure all user-space write
never fails while still always tracking the most restrictive values.

Update sysctl_sched_uclamp_handler() to use the newly introduced
uclamp_mutex so that we serialize system default updates with cgroup
relate updates.

Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>

---
Changes in v12:
 Message-ID: <20190716140706.vuggfigjlys44lkp@e110439-lin>
 - use a dedicated variable for parent restrictions
 - make more explicit in the documentation that the requested "protection" is
   always capped by the requested "limit"
 Message-ID: <20190716175542.p7vs2muslyuez6lq@e110439-lin>
 - use the newly added uclamp_mutex to serialize the sysfs write callback
---
 kernel/sched/core.c  | 70 ++++++++++++++++++++++++++++++++++++++++++--
 kernel/sched/sched.h |  2 ++
 2 files changed, 69 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index fcc32afe53cb..08f5a0c205c6 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -773,6 +773,18 @@ static void set_load_weight(struct task_struct *p, bool update_load)
 }
 
 #ifdef CONFIG_UCLAMP_TASK
+/*
+ * Serializes updates of utilization clamp values
+ *
+ * The (slow-path) user-space triggers utilization clamp value updates which
+ * can require updates on (fast-path) scheduler's data structures used to
+ * support enqueue/dequeue operations.
+ * While the per-CPU rq lock protects fast-path update operations, user-space
+ * requests are serialized using a mutex to reduce the risk of conflicting
+ * updates or API abuses.
+ */
+static DEFINE_MUTEX(uclamp_mutex);
+
 /* Max allowed minimum utilization */
 unsigned int sysctl_sched_uclamp_util_min = SCHED_CAPACITY_SCALE;
 
@@ -1010,10 +1022,9 @@ int sysctl_sched_uclamp_handler(struct ctl_table *table, int write,
 				loff_t *ppos)
 {
 	int old_min, old_max;
-	static DEFINE_MUTEX(mutex);
 	int result;
 
-	mutex_lock(&mutex);
+	mutex_lock(&uclamp_mutex);
 	old_min = sysctl_sched_uclamp_util_min;
 	old_max = sysctl_sched_uclamp_util_max;
 
@@ -1048,7 +1059,7 @@ int sysctl_sched_uclamp_handler(struct ctl_table *table, int write,
 	sysctl_sched_uclamp_util_min = old_min;
 	sysctl_sched_uclamp_util_max = old_max;
 done:
-	mutex_unlock(&mutex);
+	mutex_unlock(&uclamp_mutex);
 
 	return result;
 }
@@ -1137,6 +1148,8 @@ static void __init init_uclamp(void)
 	unsigned int clamp_id;
 	int cpu;
 
+	mutex_init(&uclamp_mutex);
+
 	for_each_possible_cpu(cpu) {
 		memset(&cpu_rq(cpu)->uclamp, 0, sizeof(struct uclamp_rq));
 		cpu_rq(cpu)->uclamp_flags = 0;
@@ -1153,6 +1166,7 @@ static void __init init_uclamp(void)
 		uclamp_default[clamp_id] = uc_max;
 #ifdef CONFIG_UCLAMP_TASK_GROUP
 		root_task_group.uclamp_req[clamp_id] = uc_max;
+		root_task_group.uclamp[clamp_id] = uc_max;
 #endif
 	}
 }
@@ -6740,6 +6754,7 @@ static inline void alloc_uclamp_sched_group(struct task_group *tg,
 	for_each_clamp_id(clamp_id) {
 		uclamp_se_set(&tg->uclamp_req[clamp_id],
 			      uclamp_none(clamp_id), false);
+		tg->uclamp[clamp_id] = parent->uclamp[clamp_id];
 	}
 #endif
 }
@@ -6990,6 +7005,45 @@ static void cpu_cgroup_attach(struct cgroup_taskset *tset)
 }
 
 #ifdef CONFIG_UCLAMP_TASK_GROUP
+static void cpu_util_update_eff(struct cgroup_subsys_state *css)
+{
+	struct cgroup_subsys_state *top_css = css;
+	struct uclamp_se *uc_parent = NULL;
+	struct uclamp_se *uc_se = NULL;
+	unsigned int eff[UCLAMP_CNT];
+	unsigned int clamp_id;
+	unsigned int clamps;
+
+	css_for_each_descendant_pre(css, top_css) {
+		uc_parent = css_tg(css)->parent
+			? css_tg(css)->parent->uclamp : NULL;
+
+		for_each_clamp_id(clamp_id) {
+			/* Assume effective clamps matches requested clamps */
+			eff[clamp_id] = css_tg(css)->uclamp_req[clamp_id].value;
+			/* Cap effective clamps with parent's effective clamps */
+			if (uc_parent &&
+			    eff[clamp_id] > uc_parent[clamp_id].value) {
+				eff[clamp_id] = uc_parent[clamp_id].value;
+			}
+		}
+		/* Ensure protection is always capped by limit */
+		eff[UCLAMP_MIN] = min(eff[UCLAMP_MIN], eff[UCLAMP_MAX]);
+
+		/* Propagate most restrictive effective clamps */
+		clamps = 0x0;
+		uc_se = css_tg(css)->uclamp;
+		for_each_clamp_id(clamp_id) {
+			if (eff[clamp_id] == uc_se[clamp_id].value)
+				continue;
+			uc_se[clamp_id].value = eff[clamp_id];
+			uc_se[clamp_id].bucket_id = uclamp_bucket_id(eff[clamp_id]);
+			clamps |= (0x1 << clamp_id);
+		}
+		if (!clamps)
+			css = css_rightmost_descendant(css);
+	}
+}
 
 #define _POW10(exp) ((unsigned int)1e##exp)
 #define POW10(exp) _POW10(exp)
@@ -7040,6 +7094,7 @@ static ssize_t cpu_uclamp_min_write(struct kernfs_open_file *of,
 	if (req.ret)
 		return req.ret;
 
+	mutex_lock(&uclamp_mutex);
 	rcu_read_lock();
 
 	tg = css_tg(of_css(of));
@@ -7049,7 +7104,11 @@ static ssize_t cpu_uclamp_min_write(struct kernfs_open_file *of,
 	/* Keep track of the actual requested value */
 	tg->uclamp_pct[UCLAMP_MIN] = req.percent;
 
+	/* Update effective clamps to track the most restrictive value */
+	cpu_util_update_eff(of_css(of));
+
 	rcu_read_unlock();
+	mutex_unlock(&uclamp_mutex);
 
 	return nbytes;
 }
@@ -7065,6 +7124,7 @@ static ssize_t cpu_uclamp_max_write(struct kernfs_open_file *of,
 	if (req.ret)
 		return req.ret;
 
+	mutex_lock(&uclamp_mutex);
 	rcu_read_lock();
 
 	tg = css_tg(of_css(of));
@@ -7074,7 +7134,11 @@ static ssize_t cpu_uclamp_max_write(struct kernfs_open_file *of,
 	/* Keep track of the actual requested value */
 	tg->uclamp_pct[UCLAMP_MAX] = req.percent;
 
+	/* Update effective clamps to track the most restrictive value */
+	cpu_util_update_eff(of_css(of));
+
 	rcu_read_unlock();
+	mutex_unlock(&uclamp_mutex);
 
 	return nbytes;
 }
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index f10557a2dea7..93a030321210 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -399,6 +399,8 @@ struct task_group {
 	unsigned int		uclamp_pct[UCLAMP_CNT];
 	/* Clamp values requested for a task group */
 	struct uclamp_se	uclamp_req[UCLAMP_CNT];
+	/* Effective clamp values used for a task group */
+	struct uclamp_se	uclamp[UCLAMP_CNT];
 #endif
 
 };
-- 
2.22.0


^ permalink raw reply related

* [PATCH v12 3/6] sched/core: uclamp: Propagate system defaults to root group
From: Patrick Bellasi @ 2019-07-18 18:17 UTC (permalink / raw)
  To: linux-kernel, linux-pm, linux-api, cgroups
  Cc: Ingo Molnar, Peter Zijlstra, Tejun Heo, Rafael J . Wysocki,
	Vincent Guittot, Viresh Kumar, Paul Turner, Michal Koutny,
	Quentin Perret, Dietmar Eggemann, Morten Rasmussen, Juri Lelli,
	Todd Kjos, Joel Fernandes, Steve Muckle, Suren Baghdasaryan,
	Alessio Balsini
In-Reply-To: <20190718181748.28446-1-patrick.bellasi@arm.com>

The clamp values are not tunable at the level of the root task group.
That's for two main reasons:

 - the root group represents "system resources" which are always
   entirely available from the cgroup standpoint.

 - when tuning/restricting "system resources" makes sense, tuning must
   be done using a system wide API which should also be available when
   control groups are not.

When a system wide restriction is available, cgroups should be aware of
its value in order to know exactly how much "system resources" are
available for the subgroups.

Utilization clamping supports already the concepts of:

 - system defaults: which define the maximum possible clamp values
   usable by tasks.

 - effective clamps: which allows a parent cgroup to constraint (maybe
   temporarily) its descendants without losing the information related
   to the values "requested" from them.

Exploit these two concepts and bind them together in such a way that,
whenever system default are tuned, the new values are propagated to
(possibly) restrict or relax the "effective" value of nested cgroups.

Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>

---
Changes in v12:
 Message-ID: <20190716143417.us3xhksrsaxsl2ok@e110439-lin>
 - add missing RCU read locks across cpu_util_update_eff() call from
   uclamp_update_root_tg()
---
 kernel/sched/core.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 08f5a0c205c6..e9231b089d5c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1017,10 +1017,30 @@ static inline void uclamp_rq_dec(struct rq *rq, struct task_struct *p)
 		uclamp_rq_dec_id(rq, p, clamp_id);
 }
 
+#ifdef CONFIG_UCLAMP_TASK_GROUP
+static void cpu_util_update_eff(struct cgroup_subsys_state *css);
+static void uclamp_update_root_tg(void)
+{
+	struct task_group *tg = &root_task_group;
+
+	uclamp_se_set(&tg->uclamp_req[UCLAMP_MIN],
+		      sysctl_sched_uclamp_util_min, false);
+	uclamp_se_set(&tg->uclamp_req[UCLAMP_MAX],
+		      sysctl_sched_uclamp_util_max, false);
+
+	rcu_read_lock();
+	cpu_util_update_eff(&root_task_group.css);
+	rcu_read_unlock();
+}
+#else
+static void uclamp_update_root_tg(void) { }
+#endif
+
 int sysctl_sched_uclamp_handler(struct ctl_table *table, int write,
 				void __user *buffer, size_t *lenp,
 				loff_t *ppos)
 {
+	bool update_root_tg = false;
 	int old_min, old_max;
 	int result;
 
@@ -1043,12 +1063,17 @@ int sysctl_sched_uclamp_handler(struct ctl_table *table, int write,
 	if (old_min != sysctl_sched_uclamp_util_min) {
 		uclamp_se_set(&uclamp_default[UCLAMP_MIN],
 			      sysctl_sched_uclamp_util_min, false);
+		update_root_tg = true;
 	}
 	if (old_max != sysctl_sched_uclamp_util_max) {
 		uclamp_se_set(&uclamp_default[UCLAMP_MAX],
 			      sysctl_sched_uclamp_util_max, false);
+		update_root_tg = true;
 	}
 
+	if (update_root_tg)
+		uclamp_update_root_tg();
+
 	/*
 	 * Updating all the RUNNABLE task is expensive, keep it simple and do
 	 * just a lazy update at each next enqueue time.
-- 
2.22.0


^ permalink raw reply related

* [PATCH v12 5/6] sched/core: uclamp: Update CPU's refcount on TG's clamp changes
From: Patrick Bellasi @ 2019-07-18 18:17 UTC (permalink / raw)
  To: linux-kernel, linux-pm, linux-api, cgroups
  Cc: Ingo Molnar, Peter Zijlstra, Tejun Heo, Rafael J . Wysocki,
	Vincent Guittot, Viresh Kumar, Paul Turner, Michal Koutny,
	Quentin Perret, Dietmar Eggemann, Morten Rasmussen, Juri Lelli,
	Todd Kjos, Joel Fernandes, Steve Muckle, Suren Baghdasaryan,
	Alessio Balsini
In-Reply-To: <20190718181748.28446-1-patrick.bellasi@arm.com>

On updates of task group (TG) clamp values, ensure that these new values
are enforced on all RUNNABLE tasks of the task group, i.e. all RUNNABLE
tasks are immediately boosted and/or capped as requested.

Do that each time we update effective clamps from cpu_util_update_eff().
Use the *cgroup_subsys_state (css) to walk the list of tasks in each
affected TG and update their RUNNABLE tasks.
Update each task by using the same mechanism used for cpu affinity masks
updates, i.e. by taking the rq lock.

Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
---
 kernel/sched/core.c | 58 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 57 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 426736b2c4d7..26ac1cbec0be 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1043,6 +1043,57 @@ static inline void uclamp_rq_dec(struct rq *rq, struct task_struct *p)
 		uclamp_rq_dec_id(rq, p, clamp_id);
 }
 
+static inline void
+uclamp_update_active(struct task_struct *p, unsigned int clamp_id)
+{
+	struct rq_flags rf;
+	struct rq *rq;
+
+	/*
+	 * Lock the task and the rq where the task is (or was) queued.
+	 *
+	 * We might lock the (previous) rq of a !RUNNABLE task, but that's the
+	 * price to pay to safely serialize util_{min,max} updates with
+	 * enqueues, dequeues and migration operations.
+	 * This is the same locking schema used by __set_cpus_allowed_ptr().
+	 */
+	rq = task_rq_lock(p, &rf);
+
+	/*
+	 * Setting the clamp bucket is serialized by task_rq_lock().
+	 * If the task is not yet RUNNABLE and its task_struct is not
+	 * affecting a valid clamp bucket, the next time it's enqueued,
+	 * it will already see the updated clamp bucket value.
+	 */
+	if (!p->uclamp[clamp_id].active)
+		goto done;
+
+	uclamp_rq_dec_id(rq, p, clamp_id);
+	uclamp_rq_inc_id(rq, p, clamp_id);
+
+done:
+
+	task_rq_unlock(rq, p, &rf);
+}
+
+static inline void
+uclamp_update_active_tasks(struct cgroup_subsys_state *css,
+			   unsigned int clamps)
+{
+	struct css_task_iter it;
+	struct task_struct *p;
+	unsigned int clamp_id;
+
+	css_task_iter_start(css, 0, &it);
+	while ((p = css_task_iter_next(&it))) {
+		for_each_clamp_id(clamp_id) {
+			if ((0x1 << clamp_id) & clamps)
+				uclamp_update_active(p, clamp_id);
+		}
+	}
+	css_task_iter_end(&it);
+}
+
 #ifdef CONFIG_UCLAMP_TASK_GROUP
 static void cpu_util_update_eff(struct cgroup_subsys_state *css);
 static void uclamp_update_root_tg(void)
@@ -7091,8 +7142,13 @@ static void cpu_util_update_eff(struct cgroup_subsys_state *css)
 			uc_se[clamp_id].bucket_id = uclamp_bucket_id(eff[clamp_id]);
 			clamps |= (0x1 << clamp_id);
 		}
-		if (!clamps)
+		if (!clamps) {
 			css = css_rightmost_descendant(css);
+			continue;
+		}
+
+		/* Immediately update descendants RUNNABLE tasks */
+		uclamp_update_active_tasks(css, clamps);
 	}
 }
 
-- 
2.22.0


^ permalink raw reply related

* [PATCH v12 6/6] sched/core: uclamp: always use enum uclamp_id for clamp_id values
From: Patrick Bellasi @ 2019-07-18 18:17 UTC (permalink / raw)
  To: linux-kernel, linux-pm, linux-api, cgroups
  Cc: Ingo Molnar, Peter Zijlstra, Tejun Heo, Rafael J . Wysocki,
	Vincent Guittot, Viresh Kumar, Paul Turner, Michal Koutny,
	Quentin Perret, Dietmar Eggemann, Morten Rasmussen, Juri Lelli,
	Todd Kjos, Joel Fernandes, Steve Muckle, Suren Baghdasaryan,
	Alessio Balsini
In-Reply-To: <20190718181748.28446-1-patrick.bellasi@arm.com>

The supported clamp indexes are defined in enum clamp_id however, because
of the code logic in some of the first utilization clamping series version,
sometimes we needed to use unsigned int to represent indexes.

This is not more required since the final version of the uclamp_* APIs can
always use the proper enum uclamp_id type.

Fix it with a bulk rename now that we have all the bits merged.

Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>

---
Changes in v12:
 Message-ID: <20190716140319.hdmgcuevnpwdqobl@e110439-lin>
 - added in this series
---
 kernel/sched/core.c  | 38 +++++++++++++++++++-------------------
 kernel/sched/sched.h |  2 +-
 2 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 26ac1cbec0be..1e6cf9499d49 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -810,7 +810,7 @@ static inline unsigned int uclamp_bucket_base_value(unsigned int clamp_value)
 	return UCLAMP_BUCKET_DELTA * uclamp_bucket_id(clamp_value);
 }
 
-static inline unsigned int uclamp_none(int clamp_id)
+static inline enum uclamp_id uclamp_none(enum uclamp_id clamp_id)
 {
 	if (clamp_id == UCLAMP_MIN)
 		return 0;
@@ -826,7 +826,7 @@ static inline void uclamp_se_set(struct uclamp_se *uc_se,
 }
 
 static inline unsigned int
-uclamp_idle_value(struct rq *rq, unsigned int clamp_id,
+uclamp_idle_value(struct rq *rq, enum uclamp_id clamp_id,
 		  unsigned int clamp_value)
 {
 	/*
@@ -842,7 +842,7 @@ uclamp_idle_value(struct rq *rq, unsigned int clamp_id,
 	return uclamp_none(UCLAMP_MIN);
 }
 
-static inline void uclamp_idle_reset(struct rq *rq, unsigned int clamp_id,
+static inline void uclamp_idle_reset(struct rq *rq, enum uclamp_id clamp_id,
 				     unsigned int clamp_value)
 {
 	/* Reset max-clamp retention only on idle exit */
@@ -853,8 +853,8 @@ static inline void uclamp_idle_reset(struct rq *rq, unsigned int clamp_id,
 }
 
 static inline
-unsigned int uclamp_rq_max_value(struct rq *rq, unsigned int clamp_id,
-				 unsigned int clamp_value)
+enum uclamp_id uclamp_rq_max_value(struct rq *rq, enum uclamp_id clamp_id,
+				   unsigned int clamp_value)
 {
 	struct uclamp_bucket *bucket = rq->uclamp[clamp_id].bucket;
 	int bucket_id = UCLAMP_BUCKETS - 1;
@@ -874,7 +874,7 @@ unsigned int uclamp_rq_max_value(struct rq *rq, unsigned int clamp_id,
 }
 
 static inline struct uclamp_se
-uclamp_tg_restrict(struct task_struct *p, unsigned int clamp_id)
+uclamp_tg_restrict(struct task_struct *p, enum uclamp_id clamp_id)
 {
 	struct uclamp_se uc_req = p->uclamp_req[clamp_id];
 #ifdef CONFIG_UCLAMP_TASK_GROUP
@@ -906,7 +906,7 @@ uclamp_tg_restrict(struct task_struct *p, unsigned int clamp_id)
  * - the system default clamp value, defined by the sysadmin
  */
 static inline struct uclamp_se
-uclamp_eff_get(struct task_struct *p, unsigned int clamp_id)
+uclamp_eff_get(struct task_struct *p, enum uclamp_id clamp_id)
 {
 	struct uclamp_se uc_req = uclamp_tg_restrict(p, clamp_id);
 	struct uclamp_se uc_max = uclamp_default[clamp_id];
@@ -918,7 +918,7 @@ uclamp_eff_get(struct task_struct *p, unsigned int clamp_id)
 	return uc_req;
 }
 
-unsigned int uclamp_eff_value(struct task_struct *p, unsigned int clamp_id)
+enum uclamp_id uclamp_eff_value(struct task_struct *p, enum uclamp_id clamp_id)
 {
 	struct uclamp_se uc_eff;
 
@@ -942,7 +942,7 @@ unsigned int uclamp_eff_value(struct task_struct *p, unsigned int clamp_id)
  * for each bucket when all its RUNNABLE tasks require the same clamp.
  */
 static inline void uclamp_rq_inc_id(struct rq *rq, struct task_struct *p,
-				    unsigned int clamp_id)
+				    enum uclamp_id clamp_id)
 {
 	struct uclamp_rq *uc_rq = &rq->uclamp[clamp_id];
 	struct uclamp_se *uc_se = &p->uclamp[clamp_id];
@@ -980,7 +980,7 @@ static inline void uclamp_rq_inc_id(struct rq *rq, struct task_struct *p,
  * enforce the expected state and warn.
  */
 static inline void uclamp_rq_dec_id(struct rq *rq, struct task_struct *p,
-				    unsigned int clamp_id)
+				    enum uclamp_id clamp_id)
 {
 	struct uclamp_rq *uc_rq = &rq->uclamp[clamp_id];
 	struct uclamp_se *uc_se = &p->uclamp[clamp_id];
@@ -1019,7 +1019,7 @@ static inline void uclamp_rq_dec_id(struct rq *rq, struct task_struct *p,
 
 static inline void uclamp_rq_inc(struct rq *rq, struct task_struct *p)
 {
-	unsigned int clamp_id;
+	enum uclamp_id clamp_id;
 
 	if (unlikely(!p->sched_class->uclamp_enabled))
 		return;
@@ -1034,7 +1034,7 @@ static inline void uclamp_rq_inc(struct rq *rq, struct task_struct *p)
 
 static inline void uclamp_rq_dec(struct rq *rq, struct task_struct *p)
 {
-	unsigned int clamp_id;
+	enum uclamp_id clamp_id;
 
 	if (unlikely(!p->sched_class->uclamp_enabled))
 		return;
@@ -1044,7 +1044,7 @@ static inline void uclamp_rq_dec(struct rq *rq, struct task_struct *p)
 }
 
 static inline void
-uclamp_update_active(struct task_struct *p, unsigned int clamp_id)
+uclamp_update_active(struct task_struct *p, enum uclamp_id clamp_id)
 {
 	struct rq_flags rf;
 	struct rq *rq;
@@ -1080,9 +1080,9 @@ static inline void
 uclamp_update_active_tasks(struct cgroup_subsys_state *css,
 			   unsigned int clamps)
 {
+	enum uclamp_id clamp_id;
 	struct css_task_iter it;
 	struct task_struct *p;
-	unsigned int clamp_id;
 
 	css_task_iter_start(css, 0, &it);
 	while ((p = css_task_iter_next(&it))) {
@@ -1188,7 +1188,7 @@ static int uclamp_validate(struct task_struct *p,
 static void __setscheduler_uclamp(struct task_struct *p,
 				  const struct sched_attr *attr)
 {
-	unsigned int clamp_id;
+	enum uclamp_id clamp_id;
 
 	/*
 	 * On scheduling class change, reset to default clamps for tasks
@@ -1225,7 +1225,7 @@ static void __setscheduler_uclamp(struct task_struct *p,
 
 static void uclamp_fork(struct task_struct *p)
 {
-	unsigned int clamp_id;
+	enum uclamp_id clamp_id;
 
 	for_each_clamp_id(clamp_id)
 		p->uclamp[clamp_id].active = false;
@@ -1247,7 +1247,7 @@ static void uclamp_fork(struct task_struct *p)
 static void __init init_uclamp(void)
 {
 	struct uclamp_se uc_max = {};
-	unsigned int clamp_id;
+	enum uclamp_id clamp_id;
 	int cpu;
 
 	mutex_init(&uclamp_mutex);
@@ -6851,7 +6851,7 @@ static inline void alloc_uclamp_sched_group(struct task_group *tg,
 					    struct task_group *parent)
 {
 #ifdef CONFIG_UCLAMP_TASK_GROUP
-	int clamp_id;
+	enum uclamp_id clamp_id;
 
 	for_each_clamp_id(clamp_id) {
 		uclamp_se_set(&tg->uclamp_req[clamp_id],
@@ -7113,7 +7113,7 @@ static void cpu_util_update_eff(struct cgroup_subsys_state *css)
 	struct uclamp_se *uc_parent = NULL;
 	struct uclamp_se *uc_se = NULL;
 	unsigned int eff[UCLAMP_CNT];
-	unsigned int clamp_id;
+	enum uclamp_id clamp_id;
 	unsigned int clamps;
 
 	css_for_each_descendant_pre(css, top_css) {
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 93a030321210..e230474c1548 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2276,7 +2276,7 @@ static inline void cpufreq_update_util(struct rq *rq, unsigned int flags) {}
 #endif /* CONFIG_CPU_FREQ */
 
 #ifdef CONFIG_UCLAMP_TASK
-unsigned int uclamp_eff_value(struct task_struct *p, unsigned int clamp_id);
+enum uclamp_id uclamp_eff_value(struct task_struct *p, enum uclamp_id clamp_id);
 
 static __always_inline
 unsigned int uclamp_util_with(struct rq *rq, unsigned int util,
-- 
2.22.0


^ permalink raw reply related

* [PATCH v12 4/6] sched/core: uclamp: Use TG's clamps to restrict TASK's clamps
From: Patrick Bellasi @ 2019-07-18 18:17 UTC (permalink / raw)
  To: linux-kernel, linux-pm, linux-api, cgroups
  Cc: Ingo Molnar, Peter Zijlstra, Tejun Heo, Rafael J . Wysocki,
	Vincent Guittot, Viresh Kumar, Paul Turner, Michal Koutny,
	Quentin Perret, Dietmar Eggemann, Morten Rasmussen, Juri Lelli,
	Todd Kjos, Joel Fernandes, Steve Muckle, Suren Baghdasaryan,
	Alessio Balsini
In-Reply-To: <20190718181748.28446-1-patrick.bellasi@arm.com>

When a task specific clamp value is configured via sched_setattr(2), this
value is accounted in the corresponding clamp bucket every time the task is
{en,de}qeued. However, when cgroups are also in use, the task specific
clamp values could be restricted by the task_group (TG) clamp values.

Update uclamp_cpu_inc() to aggregate task and TG clamp values. Every time a
task is enqueued, it's accounted in the clamp bucket tracking the smaller
clamp between the task specific value and its TG effective value. This
allows to:

1. ensure cgroup clamps are always used to restrict task specific requests,
   i.e. boosted not more than its TG effective protection and capped at
   least as its TG effective limit.

2. implement a "nice-like" policy, where tasks are still allowed to request
   less than what enforced by their TG effective limits and protections

Do this by exploiting the concept of "effective" clamp, which is already
used by a TG to track parent enforced restrictions.

Apply task group clamp restrictions only to tasks belonging to a child
group. While, for tasks in the root group or in an autogroup, system
defaults are still enforced.

Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>

---
Changes in v12:
 Message-ID: <20190716143435.iwwd6fjr3udlqol4@e110439-lin>
 - remove not required and confusing sentence from the above changelog
---
 kernel/sched/core.c | 28 +++++++++++++++++++++++++++-
 1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e9231b089d5c..426736b2c4d7 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -873,16 +873,42 @@ unsigned int uclamp_rq_max_value(struct rq *rq, unsigned int clamp_id,
 	return uclamp_idle_value(rq, clamp_id, clamp_value);
 }

+static inline struct uclamp_se
+uclamp_tg_restrict(struct task_struct *p, unsigned int clamp_id)
+{
+	struct uclamp_se uc_req = p->uclamp_req[clamp_id];
+#ifdef CONFIG_UCLAMP_TASK_GROUP
+	struct uclamp_se uc_max;
+
+	/*
+	 * Tasks in autogroups or root task group will be
+	 * restricted by system defaults.
+	 */
+	if (task_group_is_autogroup(task_group(p)))
+		return uc_req;
+	if (task_group(p) == &root_task_group)
+		return uc_req;
+
+	uc_max = task_group(p)->uclamp[clamp_id];
+	if (uc_req.value > uc_max.value || !uc_req.user_defined)
+		return uc_max;
+#endif
+
+	return uc_req;
+}
+
 /*
  * The effective clamp bucket index of a task depends on, by increasing
  * priority:
  * - the task specific clamp value, when explicitly requested from userspace
+ * - the task group effective clamp value, for tasks not either in the root
+ *   group or in an autogroup
  * - the system default clamp value, defined by the sysadmin
  */
 static inline struct uclamp_se
 uclamp_eff_get(struct task_struct *p, unsigned int clamp_id)
 {
-	struct uclamp_se uc_req = p->uclamp_req[clamp_id];
+	struct uclamp_se uc_req = uclamp_tg_restrict(p, clamp_id);
 	struct uclamp_se uc_max = uclamp_default[clamp_id];

 	/* System default restrictions always apply */
-- 
2.22.0

^ permalink raw reply related

* Re: [PATCH v2 2/2] interconnect: qcom: Add tagging and wake/sleep support for sdm845
From: David Dai @ 2019-07-18 17:59 UTC (permalink / raw)
  To: Evan Green
  Cc: Georgi Djakov, linux-pm, Vincent Guittot, Bjorn Andersson,
	amit.kucheria, Doug Anderson, Sean Sweeney, LKML,
	linux-arm Mailing List, linux-arm-msm
In-Reply-To: <CAE=gft4pQXyCdRsMkN7Xs-R5HU=2baYhCPqSsw=uqOfT+hTJDg@mail.gmail.com>

On 7/16/2019 1:15 PM, Evan Green wrote:
> On Mon, Jul 15, 2019 at 4:34 PM David Dai <daidavid1@codeaurora.org> wrote:
>> Hi Evan,
>>
>> Thanks for the continued help in reviewing these patches!
> No problem. I want to do more, but haven't found time to do the
> prerequisite research before jumping into some of the other
> discussions yet.
>
>> On 7/11/2019 10:06 AM, Evan Green wrote:
>>> Hi Georgi and David,
>>>
>>> On Tue, Jun 18, 2019 at 2:17 AM Georgi Djakov <georgi.djakov@linaro.org> wrote:
>>>> From: David Dai <daidavid1@codeaurora.org>
>>>>
>>>> Add support for wake and sleep commands by using a tag to indicate
>>>> whether or not the aggregate and set requests fall into execution
>>>> state specific bucket.
>>>>
>>>> Signed-off-by: David Dai <daidavid1@codeaurora.org>
>>>> Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
>>>> ---
>>>>    drivers/interconnect/qcom/sdm845.c | 129 ++++++++++++++++++++++-------
>>>>    1 file changed, 98 insertions(+), 31 deletions(-)
>>>>
>>>> diff --git a/drivers/interconnect/qcom/sdm845.c b/drivers/interconnect/qcom/sdm845.c
>>>> index fb526004c82e..c100aab39415 100644
>>>> --- a/drivers/interconnect/qcom/sdm845.c
>>>> +++ b/drivers/interconnect/qcom/sdm845.c
>>>> @@ -66,6 +66,17 @@ struct bcm_db {
>>>>    #define SDM845_MAX_BCM_PER_NODE        2
>>>>    #define SDM845_MAX_VCD         10
>>>>
>>>> +#define QCOM_ICC_BUCKET_AMC            0
>>> What is AMC again? Is it the "right now" bucket? Maybe a comment on
>>> the meaning of this bucket would be helpful.
>> That's correct. Will add a comment for this.
>>>> +#define QCOM_ICC_BUCKET_WAKE           1
>>>> +#define QCOM_ICC_BUCKET_SLEEP          2
>>>> +#define QCOM_ICC_NUM_BUCKETS           3
>>>> +#define QCOM_ICC_TAG_AMC               BIT(QCOM_ICC_BUCKET_AMC)
>>>> +#define QCOM_ICC_TAG_WAKE              BIT(QCOM_ICC_BUCKET_WAKE)
>>>> +#define QCOM_ICC_TAG_SLEEP             BIT(QCOM_ICC_BUCKET_SLEEP)
>>>> +#define QCOM_ICC_TAG_ACTIVE_ONLY       (QCOM_ICC_TAG_AMC | QCOM_ICC_TAG_WAKE)
>>>> +#define QCOM_ICC_TAG_ALWAYS            (QCOM_ICC_TAG_AMC | QCOM_ICC_TAG_WAKE |\
>>>> +                                        QCOM_ICC_TAG_SLEEP)
>>>> +
>>>>    /**
>>>>     * struct qcom_icc_node - Qualcomm specific interconnect nodes
>>>>     * @name: the node name used in debugfs
>>>> @@ -75,7 +86,9 @@ struct bcm_db {
>>>>     * @channels: num of channels at this node
>>>>     * @buswidth: width of the interconnect between a node and the bus
>>>>     * @sum_avg: current sum aggregate value of all avg bw requests
>>>> + * @sum_avg_cached: previous sum aggregate value of all avg bw requests
>>>>     * @max_peak: current max aggregate value of all peak bw requests
>>>> + * @max_peak_cached: previous max aggregate value of all peak bw requests
>>>>     * @bcms: list of bcms associated with this logical node
>>>>     * @num_bcms: num of @bcms
>>>>     */
>>>> @@ -86,8 +99,10 @@ struct qcom_icc_node {
>>>>           u16 num_links;
>>>>           u16 channels;
>>>>           u16 buswidth;
>>>> -       u64 sum_avg;
>>>> -       u64 max_peak;
>>>> +       u64 sum_avg[QCOM_ICC_NUM_BUCKETS];
>>>> +       u64 sum_avg_cached[QCOM_ICC_NUM_BUCKETS];
>>>> +       u64 max_peak[QCOM_ICC_NUM_BUCKETS];
>>>> +       u64 max_peak_cached[QCOM_ICC_NUM_BUCKETS];
>>>>           struct qcom_icc_bcm *bcms[SDM845_MAX_BCM_PER_NODE];
>>>>           size_t num_bcms;
>>>>    };
>>>> @@ -112,8 +127,8 @@ struct qcom_icc_bcm {
>>>>           const char *name;
>>>>           u32 type;
>>>>           u32 addr;
>>>> -       u64 vote_x;
>>>> -       u64 vote_y;
>>>> +       u64 vote_x[QCOM_ICC_NUM_BUCKETS];
>>>> +       u64 vote_y[QCOM_ICC_NUM_BUCKETS];
>>>>           bool dirty;
>>>>           bool keepalive;
>>>>           struct bcm_db aux_data;
>>>> @@ -555,7 +570,7 @@ inline void tcs_cmd_gen(struct tcs_cmd *cmd, u64 vote_x, u64 vote_y,
>>>>                   cmd->wait = true;
>>>>    }
>>>>
>>>> -static void tcs_list_gen(struct list_head *bcm_list,
>>>> +static void tcs_list_gen(struct list_head *bcm_list, int bucket,
>>>>                            struct tcs_cmd tcs_list[SDM845_MAX_VCD],
>>>>                            int n[SDM845_MAX_VCD])
>>>>    {
>>>> @@ -573,8 +588,8 @@ static void tcs_list_gen(struct list_head *bcm_list,
>>>>                           commit = true;
>>>>                           cur_vcd_size = 0;
>>>>                   }
>>>> -               tcs_cmd_gen(&tcs_list[idx], bcm->vote_x, bcm->vote_y,
>>>> -                           bcm->addr, commit);
>>>> +               tcs_cmd_gen(&tcs_list[idx], bcm->vote_x[bucket],
>>>> +                           bcm->vote_y[bucket], bcm->addr, commit);
>>>>                   idx++;
>>>>                   n[batch]++;
>>>>                   /*
>>>> @@ -595,32 +610,39 @@ static void tcs_list_gen(struct list_head *bcm_list,
>>>>
>>>>    static void bcm_aggregate(struct qcom_icc_bcm *bcm)
>>>>    {
>>>> -       size_t i;
>>>> -       u64 agg_avg = 0;
>>>> -       u64 agg_peak = 0;
>>>> +       size_t i, bucket;
>>>> +       u64 agg_avg[QCOM_ICC_NUM_BUCKETS] = {0};
>>>> +       u64 agg_peak[QCOM_ICC_NUM_BUCKETS] = {0};
>>>>           u64 temp;
>>>>
>>>> -       for (i = 0; i < bcm->num_nodes; i++) {
>>>> -               temp = bcm->nodes[i]->sum_avg * bcm->aux_data.width;
>>>> -               do_div(temp, bcm->nodes[i]->buswidth * bcm->nodes[i]->channels);
>>>> -               agg_avg = max(agg_avg, temp);
>>>> +       for (bucket = 0; bucket < QCOM_ICC_NUM_BUCKETS; bucket++) {
>>>> +               for (i = 0; i < bcm->num_nodes; i++) {
>>>> +                       temp = bcm->nodes[i]->sum_avg_cached[bucket] * bcm->aux_data.width;
>>>> +                       do_div(temp, bcm->nodes[i]->buswidth * bcm->nodes[i]->channels);
>>>> +                       agg_avg[bucket] = max(agg_avg[bucket], temp);
>>>>
>>>> -               temp = bcm->nodes[i]->max_peak * bcm->aux_data.width;
>>>> -               do_div(temp, bcm->nodes[i]->buswidth);
>>> Why is it that this one doesn't have the multiply by
>>> bcm->nodes[i]->channels again? I can't recall if there was a reason.
>>> If it's correct maybe it deserves a comment.
>> I think the rationale behind this is generally for consumers to target a
>> certain minimum threshold to satisfy some structural latency
>> requirements as opposed to strictly throughput, and it may be easier for
>> consumers to reuse certain values to support hitting some minimum NoC
>> frequencies without having to be concerned with the number of channels
>> that may change from platform to platform.
> I was mostly pointing out that sum_avg seems to have the multiply, but
> max_peak does not. I would have expected those two things to be of the
> same units, and get the same treatment. Maybe the hardware is taking
> in different final units for that field, one that is per-channel and
> one that isn't?

The hardware isn't treating the values differently. I couldn't find any 
justification other than the intuition mentioned above for the ease of 
voting from the consumer perspective. The consumer would know that this 
peak_bw value results in some floor performance from the system to 
satisfy its latency requirements. The same approach would work if we 
accounted for the number of channels as well, but given that channels 
may vary from platform to platform or even on the same platform that 
shares multiple channel configurations(DDR), it can be difficult for 
consumers to keep track of and have to adjust their votes constantly(to 
try to hit some frequency/latency requirement, this intuition doesn't 
apply for avg_bw since we're concerned with throughput in that case).

>>>> -               agg_peak = max(agg_peak, temp);
>>>> -       }
>>>> +                       temp = bcm->nodes[i]->max_peak_cached[bucket] * bcm->aux_data.width;
>>>> +                       do_div(temp, bcm->nodes[i]->buswidth);
>>>> +                       agg_peak[bucket] = max(agg_peak[bucket], temp);
>>>>
>>>> -       temp = agg_avg * 1000ULL;
>>>> -       do_div(temp, bcm->aux_data.unit);
>>>> -       bcm->vote_x = temp;
>>>> +                       bcm->nodes[i]->sum_avg[bucket] = 0;
>>>> +                       bcm->nodes[i]->max_peak[bucket] = 0;
>>> I don't understand the sum_avg vs sum_avg_cached. Here's what I understand:
>>> 1. qcom_icc_aggregate() does the math from the incoming values on
>>> sum_avg, and then clobbers sum_avg_cached with those values.
>>> 2. bcm_aggregate() uses sum_avg_cached in its calculations, then clears sum_avg.
>>>
>>> But I don't get why that's needed. Why not just have sum_avg? Wouldn't
>>> it work the same? Ok, it wouldn't if you ended up calling
>>> bcm_aggregate() multiple times on the same bcm. But you have a dirty
>>> flag that prevents this from happening. So I think it's safe to remove
>>> the cached arrays, and just clear out the sum_avg when you aggregate.
>> You are correct in that the dirty flag would prevent another repeat of
>> the bcm_aggregate() call in the same icc_set request. But consider a
>> following icc_set request on a different node that shares the same BCM,
>> the next bcm_aggregate() would result in an incorrect aggregate sum_avg
>> for the BCM since the avg_sum from the previous node(from the previous
>> icc_set) was cleared out. We need a way to retain the current state of
>> all nodes to accurately aggregate the bw values for the BCM.
> I don't get it. qcom_icc_aggregate() clobbers sum_avg_cached. So
> they're only ever a) equal, like after qcom_icc_aggregate(), or b)
> sum_avg is zeroed, and sum_avg_cached is its old value. A new
> icc_set_bw() would call aggregate_requests(), which would clobber
> sum_avg_cached to sum_avg for every BCM involved. Then the core would
> call apply_constraints(), then qcom_icc_set(), which would use
> sum_avg_cached, and clear out sum_avg, being sure with the dirty flag
> that bcm_aggregate() is only called once per BCM. This all happens
> under the mutex held in the core. A new request would start the whole
> thing over, since sum_avg is cleared. It seems to me that flow would
> work the same with one array as it does with two. Maybe you can walk
> me through a scenario?
> -Evan

Let's walk through the scenario you've just described with the 
assumption that there's only one avg_sum value per node with two 
icc_set_bw() requests on two different nodes(say 2MB for node 1 and 1MB 
for node 2) under the same BCM(say BCM A). The first 
qcom_icc_aggregate() aggregates to a 2MB avg_sum at the node1 followed 
by apply_constraints(), qcom_icc_set(), bcm_aggregate() which causes BCM 
A to aggregate to max(node1->avg_sum, node2->avg_sum) and reach a vote_x 
of 2MB(for simplicity let's ignore unit). We then clear out 
node1->avg_sum before we start the next icc_set_bw(). In the following 
icc_set_bw(), the qcom_icc_aggregate() aggregates to 1MB in node2 
followed by apply_constraints(), qcom_icc_set(), bcm_aggregate(), but 
now incorrectly aggregates BCM A to 1MB by looking at 
max(node1->avg_sum, node2->avg_sum) because node1->avg_sum was cleared 
out when in reality BCM A should have a vote_x value of 2MB at this 
point. The subsequent bcm_aggregate do not re-aggregate all of the 
requests for each of its nodes, but assumes that the aggregated results 
at the nodes are correct.

>
>>>> +               }
>>>>
>>>> -       temp = agg_peak * 1000ULL;
>>>> -       do_div(temp, bcm->aux_data.unit);
>>>> -       bcm->vote_y = temp;
>>>> +               temp = agg_avg[bucket] * 1000ULL;
>>>> +               do_div(temp, bcm->aux_data.unit);
>>>> +               bcm->vote_x[bucket] = temp;
>>>>
>>>> -       if (bcm->keepalive && bcm->vote_x == 0 && bcm->vote_y == 0) {
>>>> -               bcm->vote_x = 1;
>>>> -               bcm->vote_y = 1;
>>>> +               temp = agg_peak[bucket] * 1000ULL;
>>>> +               do_div(temp, bcm->aux_data.unit);
>>>> +               bcm->vote_y[bucket] = temp;
>>>> +       }
>>>> +
>>>> +       if (bcm->keepalive && bcm->vote_x[0] == 0 && bcm->vote_y[0] == 0) {
>>>> +               bcm->vote_x[QCOM_ICC_BUCKET_AMC] = 1;
>>>> +               bcm->vote_x[QCOM_ICC_BUCKET_WAKE] = 1;
>>>> +               bcm->vote_y[QCOM_ICC_BUCKET_AMC] = 1;
>>>> +               bcm->vote_y[QCOM_ICC_BUCKET_WAKE] = 1;
>>>>           }
>>>>
>>>>           bcm->dirty = false;
>>>> @@ -631,15 +653,25 @@ static int qcom_icc_aggregate(struct icc_node *node, u32 tag, u32 avg_bw,
>>>>    {
>>>>           size_t i;
>>>>           struct qcom_icc_node *qn;
>>>> +       unsigned long tag_word = (unsigned long)tag;
>>>>
>>>>           qn = node->data;
>>>>
>>>> +       if (!tag)
>>>> +               tag_word = QCOM_ICC_TAG_ALWAYS;
>>>> +
>>>> +       for (i = 0; i < QCOM_ICC_NUM_BUCKETS; i++) {
>>>> +               if (test_bit(i, &tag_word)) {
>>> I guess all this extra business with tag_word and casting is so that
>>> you can use test_bit, which is presumably a tiny bit faster? Does this
>>> actually make a measurable difference? Maybe in the name of simplicity
>>> we just do if (tag & BIT(i)), and then optimize if we find that
>>> conditional to be a hotspot?
>> Using (tag & BIT(i)) as opposed to test_bit seems reasonable to me.
>>>> +                       qn->sum_avg[i] += avg_bw;
>>>> +                       qn->max_peak[i] = max_t(u32, qn->max_peak[i], peak_bw);
>>>> +                       qn->sum_avg_cached[i] = qn->sum_avg[i];
>>>> +                       qn->max_peak_cached[i] = qn->max_peak[i];
>>>> +               }
>>>> +       }
>>>> +
>>>>           *agg_avg += avg_bw;
>>>>           *agg_peak = max_t(u32, *agg_peak, peak_bw);
>>>>
>>>> -       qn->sum_avg = *agg_avg;
>>>> -       qn->max_peak = *agg_peak;
>>>> -
>>>>           for (i = 0; i < qn->num_bcms; i++)
>>>>                   qn->bcms[i]->dirty = true;
>>>>
>>>> @@ -675,7 +707,7 @@ static int qcom_icc_set(struct icc_node *src, struct icc_node *dst)
>>>>            * Construct the command list based on a pre ordered list of BCMs
>>>>            * based on VCD.
>>>>            */
>>>> -       tcs_list_gen(&commit_list, cmds, commit_idx);
>>>> +       tcs_list_gen(&commit_list, QCOM_ICC_BUCKET_AMC, cmds, commit_idx);
>>>>
>>>>           if (!commit_idx[0])
>>>>                   return ret;
>>>> @@ -693,6 +725,41 @@ static int qcom_icc_set(struct icc_node *src, struct icc_node *dst)
>>>>                   return ret;
>>>>           }
>>>>
>>>> +       INIT_LIST_HEAD(&commit_list);
>>>> +
>>>> +       for (i = 0; i < qp->num_bcms; i++) {
>>>> +               /*
>>>> +                * Only generate WAKE and SLEEP commands if a resource's
>>>> +                * requirements change as the execution environment transitions
>>>> +                * between different power states.
>>>> +                */
>>>> +               if (qp->bcms[i]->vote_x[QCOM_ICC_BUCKET_WAKE] !=
>>>> +                   qp->bcms[i]->vote_x[QCOM_ICC_BUCKET_SLEEP] ||
>>>> +                   qp->bcms[i]->vote_y[QCOM_ICC_BUCKET_WAKE] !=
>>>> +                   qp->bcms[i]->vote_y[QCOM_ICC_BUCKET_SLEEP]) {
>>>> +                       list_add_tail(&qp->bcms[i]->list, &commit_list);
>>>> +               }
>>>> +       }
>>>> +
>>>> +       if (list_empty(&commit_list))
>>>> +               return ret;
>>>> +
>>>> +       tcs_list_gen(&commit_list, QCOM_ICC_BUCKET_WAKE, cmds, commit_idx);
>>>> +
>>>> +       ret = rpmh_write_batch(qp->dev, RPMH_WAKE_ONLY_STATE, cmds, commit_idx);
>>>> +       if (ret) {
>>>> +               pr_err("Error sending WAKE RPMH requests (%d)\n", ret);
>>>> +               return ret;
>>>> +       }
>>>> +
>>>> +       tcs_list_gen(&commit_list, QCOM_ICC_BUCKET_SLEEP, cmds, commit_idx);
>>>> +
>>>> +       ret = rpmh_write_batch(qp->dev, RPMH_SLEEP_STATE, cmds, commit_idx);
>>>> +       if (ret) {
>>>> +               pr_err("Error sending SLEEP RPMH requests (%d)\n", ret);
>>>> +               return ret;
>>>> +       }
>>>> +
>>>>           return ret;
>>>>    }
>>>>
>> --
>> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
>> a Linux Foundation Collaborative Project
>>
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project


^ permalink raw reply

* Re: [PATCH 09/18] drivers: firmware: psci: Add support for PM domains using genpd
From: Lina Iyer @ 2019-07-18 17:57 UTC (permalink / raw)
  To: Sudeep Holla
  Cc: Ulf Hansson, Lorenzo Pieralisi, Mark Rutland, Linux ARM,
	Rafael J . Wysocki, Daniel Lezcano, Raju P . L . S . S . S . N,
	Amit Kucheria, Bjorn Andersson, Stephen Boyd, Niklas Cassel,
	Tony Lindgren, Kevin Hilman, Viresh Kumar, Vincent Guittot,
	Geert Uytterhoeven, Souvik Chakravarty, Linux PM, linux-arm-msm,
	Linux Kernel Mailing List, Lina Iyer
In-Reply-To: <20190718131913.GB28633@e107155-lin>

On Thu, Jul 18 2019 at 07:19 -0600, Sudeep Holla wrote:
>On Thu, Jul 18, 2019 at 01:04:03PM +0200, Ulf Hansson wrote:
>> On Tue, 16 Jul 2019 at 17:05, Sudeep Holla <sudeep.holla@arm.com> wrote:
>> >
>> > On Mon, May 13, 2019 at 09:22:51PM +0200, Ulf Hansson wrote:
>> > > When the hierarchical CPU topology layout is used in DT, we need to setup
>> > > the corresponding PM domain data structures, as to allow a CPU and a group
>> > > of CPUs to be power managed accordingly. Let's enable this by deploying
>> > > support through the genpd interface.
>> > >
>> > > Additionally, when the OS initiated mode is supported by the PSCI FW, let's
>> > > also parse the domain idle states DT bindings as to make genpd responsible
>> > > for the state selection, when the states are compatible with
>> > > "domain-idle-state". Otherwise, when only Platform Coordinated mode is
>> > > supported, we rely solely on the state selection to be managed through the
>> > > regular cpuidle framework.
>> > >
>> > > If the initialization of the PM domain data structures succeeds and the OS
>> > > initiated mode is supported, we try to switch to it. In case it fails,
>> > > let's fall back into a degraded mode, rather than bailing out and returning
>> > > an error code.
>> > >
>> > > Due to that the OS initiated mode may become enabled, we need to adjust to
>> > > maintain backwards compatibility for a kernel started through a kexec call.
>> > > Do this by explicitly switch to Platform Coordinated mode during boot.
>> > >
>> > > Finally, the actual initialization of the PM domain data structures, is
>> > > done via calling the new shared function, psci_dt_init_pm_domains().
>> > > However, this is implemented by subsequent changes.
>> > >
>> > > Co-developed-by: Lina Iyer <lina.iyer@linaro.org>
>> > > Signed-off-by: Lina Iyer <lina.iyer@linaro.org>
>> > > Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
>> > > ---
>> > >
>> > > Changes:
>> > >       - Simplify code setting domain_state at power off.
>> > >       - Use the genpd ->free_state() callback to manage freeing of states.
>> > >       - Fixup a bogus while loop.
>> > >
>> > > ---
>> > >  drivers/firmware/psci/Makefile         |   2 +-
>> > >  drivers/firmware/psci/psci.c           |   7 +-
>> > >  drivers/firmware/psci/psci.h           |   5 +
>> > >  drivers/firmware/psci/psci_pm_domain.c | 268 +++++++++++++++++++++++++
>> > >  4 files changed, 280 insertions(+), 2 deletions(-)
>> > >  create mode 100644 drivers/firmware/psci/psci_pm_domain.c
>> > >
>> >
>> > [...]
>> >
>> > >  #endif /* __PSCI_H */
>> > > diff --git a/drivers/firmware/psci/psci_pm_domain.c b/drivers/firmware/psci/psci_pm_domain.c
>> > > new file mode 100644
>> > > index 000000000000..3c6ca846caf4
>> > > --- /dev/null
>> > > +++ b/drivers/firmware/psci/psci_pm_domain.c
>> > > @@ -0,0 +1,268 @@
>> > > +// SPDX-License-Identifier: GPL-2.0
>> > > +/*
>> > > + * PM domains for CPUs via genpd - managed by PSCI.
>> > > + *
>> > > + * Copyright (C) 2019 Linaro Ltd.
>> > > + * Author: Ulf Hansson <ulf.hansson@linaro.org>
>> > > + *
>> > > + */
>> > > +
>> >
>> > [...]
>> >
>> > > +static int psci_pd_power_off(struct generic_pm_domain *pd)
>> > > +{
>> > > +     struct genpd_power_state *state = &pd->states[pd->state_idx];
>> > > +     u32 *pd_state;
>> > > +
>> > > +     /* If we have failed to enable OSI mode, then abort power off. */
>> > > +     if (psci_has_osi_support() && !osi_mode_enabled)
>> > > +             return -EBUSY;
>> > > +
>> > > +     if (!state->data)
>> > > +             return 0;
>> > > +
>> > > +     /* When OSI mode is enabled, set the corresponding domain state. */
>> > > +     pd_state = state->data;
>> > > +     psci_set_domain_state(*pd_state);
>> >
>> > I trying to understand how would this scale to level 2(cluster of
>> > clusters or for simply system). The current code for psci_set_domain_state
>> > just stores the value @pd_state into per-cpu domain_state. E.g.: Now if
>> > the system level pd is getting called after cluster PD, it will set the
>> > domain state to system level PD state. It won't work with original
>> > format and it may work with extended format if it's carefully crafted.
>> > In short, the point is just over-writing domain_state is asking for
>> > troubles IMO.
>>
>> Thanks for spotting this!
>>
>> While walking upwards in the PM domain topology, I thought I was ORing
>> the domain states, but clearly the code isn't doing that.
>>
>> In principle we need to do the below instead.
>>
>> pd_state = state->data;
>> composite_pd_state = *pd_state | psci_get_domain_state();
>> psci_set_domain_state(composite_pd_state);
>>
>
>Yes 2 different issues here:
>1. The direct assignment overwriting the value is problem which you agree.
>2. The OR logic on it's own is bit not so clear from the specification.
>   Since firmware authors need to be aware of this to make all these
>   work. So it's not implicit, either we set this requirement in form
>   of binding. We were also thinking of stating composite state in the
>   DT, still just a thought, need to come up with examples/illustrations.
>
It is generally very obvious to firmware authors to map hardware
definitions to specific bits in the state param. If a cluster component
has more than on/off state, more bits are assigned to the define the
idle states of the component.
Addition is also an option, but there are enough bits compared to the
hardware components that we have in each state, that it hasn't been a
problem.

--Lina


^ permalink raw reply

* Re: [PATCH 14/18] drivers: firmware: psci: Manage runtime PM in the idle path for CPUs
From: Lina Iyer @ 2019-07-18 17:41 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Lorenzo Pieralisi, Sudeep Holla, Mark Rutland, Linux ARM,
	Rafael J . Wysocki, Daniel Lezcano, Raju P . L . S . S . S . N,
	Amit Kucheria, Bjorn Andersson, Stephen Boyd, Niklas Cassel,
	Tony Lindgren, Kevin Hilman, Viresh Kumar, Vincent Guittot,
	Geert Uytterhoeven, Souvik Chakravarty, Linux PM, linux-arm-msm,
	Linux Kernel Mailing List
In-Reply-To: <CAPDyKFr4NmichQk4uf+Wgbanh=5idKYY=37WCb6U_hNFDVYg=w@mail.gmail.com>

On Thu, Jul 18 2019 at 10:55 -0600, Ulf Hansson wrote:
>On Thu, 18 Jul 2019 at 15:31, Lorenzo Pieralisi
><lorenzo.pieralisi@arm.com> wrote:
>>
>> On Thu, Jul 18, 2019 at 12:35:07PM +0200, Ulf Hansson wrote:
>> > On Tue, 16 Jul 2019 at 17:53, Lorenzo Pieralisi
>> > <lorenzo.pieralisi@arm.com> wrote:
>> > >
>> > > On Mon, May 13, 2019 at 09:22:56PM +0200, Ulf Hansson wrote:
>> > > > When the hierarchical CPU topology layout is used in DT, let's allow the
>> > > > CPU to be power managed through its PM domain, via deploying runtime PM
>> > > > support.
>> > > >
>> > > > To know for which idle states runtime PM reference counting is needed,
>> > > > let's store the index of deepest idle state for the CPU, in a per CPU
>> > > > variable. This allows psci_cpu_suspend_enter() to compare this index with
>> > > > the requested idle state index and then act accordingly.
>> > >
>> > > I do not see why a system with two CPU CPUidle states, say CPU retention
>> > > and CPU shutdown, should not be calling runtime PM on CPU retention
>> > > entry.
>> >
>> > If the CPU idle governor did select the CPU retention for the CPU, it
>> > was probably because the target residency for the CPU shutdown state
>> > could not be met.
>>
>> The kernel does not know what those cpu states represent, so, this is an
>> assumption you are making and it must be made clear that this code works
>> as long as your assumption is valid.
>>
>> If eg a "cluster" retention state has lower target_residency than
>> the deepest CPU idle state this assumption is wrong.
>
>Good point, you are right. I try to find a place to document this assumption.
>
>>
>> And CPUidle and genPD governor decisions are not synced anyway so,
>> again, this is an assumption, not a certainty.
>>
>> > In this case, there is no point in allowing any other deeper idle
>> > states for cluster/package/system, since those have even greater
>> > residencies, hence calling runtime PM doesn't make sense.
>>
>> On the systems you are testing on.
>
>So what you are saying typically means, that if all CPUs in the same
>cluster have entered the CPU retention state, on some system the
>cluster may also put into a cluster retention state (assuming the
>target residency is met)?
>
>Do you know of any systems that has these characteristics?
>
Many QCOM SoCs can do that. But with the hardware improving, the
power-performance benefits skew the results in favor of powering off
the cluster than keeping the CPU and cluster in retention.

Kevin H and I thought of this problem earlier on. But that is a second
level problem to solve and definitely to be thought of after we have the
support for the deepest states in the kernel. We left that out for a
later date. The idea would have been to setup the allowable state(s) in
the DT for CPU and cluster state definitions and have the genpd take
that into consideration when deciding the idle state for the domain.

Thanks,
Lina


^ permalink raw reply

* Re: [PATCH v3 1/6] dt-bindings: opp: Introduce opp-peak-KBps and opp-avg-KBps bindings
From: Saravana Kannan @ 2019-07-18 17:26 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Georgi Djakov, Rob Herring, Mark Rutland, Viresh Kumar,
	Nishanth Menon, Stephen Boyd, Rafael J. Wysocki, Vincent Guittot,
	Sweeney, Sean, daidavid1, Rajendra Nayak, Sibi Sankar,
	Bjorn Andersson, Evan Green, Android Kernel Team, Linux PM,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS, LKML
In-Reply-To: <20190718043558.roi4j6jw5n4zkwky@vireshk-i7>

On Wed, Jul 17, 2019 at 9:36 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 17-07-19, 13:29, Saravana Kannan wrote:
> > On Wed, Jul 17, 2019 at 12:54 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > >
> > > On 02-07-19, 18:10, Saravana Kannan wrote:
> > > > Interconnects often quantify their performance points in terms of
> > > > bandwidth. So, add opp-peak-KBps (required) and opp-avg-KBps (optional) to
> > > > allow specifying Bandwidth OPP tables in DT.
> > > >
> > > > opp-peak-KBps is a required property that replace opp-hz for Bandwidth OPP
> > > > tables.
> > > >
> > > > opp-avg-KBps is an optional property that can be used in Bandwidth OPP
> > > > tables.
> > > >
> > > > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > > > ---
> > > >  Documentation/devicetree/bindings/opp/opp.txt | 15 ++++++++++++---
> > > >  1 file changed, 12 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/Documentation/devicetree/bindings/opp/opp.txt b/Documentation/devicetree/bindings/opp/opp.txt
> > > > index 76b6c79604a5..c869e87caa2a 100644
> > > > --- a/Documentation/devicetree/bindings/opp/opp.txt
> > > > +++ b/Documentation/devicetree/bindings/opp/opp.txt
> > > > @@ -83,9 +83,14 @@ properties.
> > > >
> > > >  Required properties:
> > > >  - opp-hz: Frequency in Hz, expressed as a 64-bit big-endian integer. This is a
> > > > -  required property for all device nodes but devices like power domains. The
> > > > -  power domain nodes must have another (implementation dependent) property which
> > > > -  uniquely identifies the OPP nodes.
> > > > +  required property for all device nodes but for devices like power domains or
> > > > +  bandwidth opp tables. The power domain nodes must have another (implementation
> > > > +  dependent) property which uniquely identifies the OPP nodes. The interconnect
> > > > +  opps are required to have the opp-peak-bw property.
> > >
> > >                                    ??
> >
> > Sorry, what's the question? Was this an accidental email?
>
> Too much smartness is too bad sometimes, sorry about that :)
>
> I placed the ?? right below "opp-peak-bw", there is no property like
> that. You failed to update it :)

Ah, "typo". I'll fix it.

-Saravana

^ permalink raw reply

* Re: [PATCH 14/18] drivers: firmware: psci: Manage runtime PM in the idle path for CPUs
From: Ulf Hansson @ 2019-07-18 16:54 UTC (permalink / raw)
  To: Lorenzo Pieralisi
  Cc: Sudeep Holla, Mark Rutland, Linux ARM, Rafael J . Wysocki,
	Daniel Lezcano, Raju P . L . S . S . S . N, Amit Kucheria,
	Bjorn Andersson, Stephen Boyd, Niklas Cassel, Tony Lindgren,
	Kevin Hilman, Viresh Kumar, Lina Iyer, Vincent Guittot,
	Geert Uytterhoeven, Souvik Chakravarty, Linux PM, linux-arm-msm,
	Linux Kernel Mailing List
In-Reply-To: <20190718133053.GA27222@e121166-lin.cambridge.arm.com>

On Thu, 18 Jul 2019 at 15:31, Lorenzo Pieralisi
<lorenzo.pieralisi@arm.com> wrote:
>
> On Thu, Jul 18, 2019 at 12:35:07PM +0200, Ulf Hansson wrote:
> > On Tue, 16 Jul 2019 at 17:53, Lorenzo Pieralisi
> > <lorenzo.pieralisi@arm.com> wrote:
> > >
> > > On Mon, May 13, 2019 at 09:22:56PM +0200, Ulf Hansson wrote:
> > > > When the hierarchical CPU topology layout is used in DT, let's allow the
> > > > CPU to be power managed through its PM domain, via deploying runtime PM
> > > > support.
> > > >
> > > > To know for which idle states runtime PM reference counting is needed,
> > > > let's store the index of deepest idle state for the CPU, in a per CPU
> > > > variable. This allows psci_cpu_suspend_enter() to compare this index with
> > > > the requested idle state index and then act accordingly.
> > >
> > > I do not see why a system with two CPU CPUidle states, say CPU retention
> > > and CPU shutdown, should not be calling runtime PM on CPU retention
> > > entry.
> >
> > If the CPU idle governor did select the CPU retention for the CPU, it
> > was probably because the target residency for the CPU shutdown state
> > could not be met.
>
> The kernel does not know what those cpu states represent, so, this is an
> assumption you are making and it must be made clear that this code works
> as long as your assumption is valid.
>
> If eg a "cluster" retention state has lower target_residency than
> the deepest CPU idle state this assumption is wrong.

Good point, you are right. I try to find a place to document this assumption.

>
> And CPUidle and genPD governor decisions are not synced anyway so,
> again, this is an assumption, not a certainty.
>
> > In this case, there is no point in allowing any other deeper idle
> > states for cluster/package/system, since those have even greater
> > residencies, hence calling runtime PM doesn't make sense.
>
> On the systems you are testing on.

So what you are saying typically means, that if all CPUs in the same
cluster have entered the CPU retention state, on some system the
cluster may also put into a cluster retention state (assuming the
target residency is met)?

Do you know of any systems that has these characteristics?

[...]

Kind regards
Uffe

^ permalink raw reply

* Re: [GIT PULL] More power management updates for v5.3-rc1
From: pr-tracker-bot @ 2019-07-18 16:50 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linus Torvalds, Linux PM, ACPI Devel Maling List,
	Linux Kernel Mailing List
In-Reply-To: <CAJZ5v0irbFa5E=UZ+1XiiuoXC9zD0qc7vx3NtTCmB88h3_U4hw@mail.gmail.com>

The pull request you sent on Thu, 18 Jul 2019 11:06:18 +0200:

> git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git pm-5.3-rc1-2

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/d0411ec8ca6b98061023873e334323ef102100cc

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker

^ permalink raw reply

* RE: [PATCH] Revert "cpufreq: schedutil: Don't set next_freq to UINT_MAX"
From: Doug Smythies @ 2019-07-18 15:46 UTC (permalink / raw)
  To: 'Viresh Kumar'; +Cc: rjw, joel, linux-kernel, linux-pm, Doug Smythies
In-Reply-To: <20190718102815.utl3hanfc7fpf2i6@vireshk-i7>

On 2019.07.18 03:28 Viresh Kumar wrote:
> On 17-07-19, 23:26, Doug Smythies wrote:
>> This reverts commit ecd2884291261e3fddbc7651ee11a20d596bb514.
>> 
>> The commit caused a regression whereby reducing the maximum
>> CPU clock frequency is ineffective while busy, and the CPU
>> clock remains unchanged. Once the system has experienced
>> some idle time, the new limit is implemented.
>
> Can you explain why this patch caused that issue ? I am sorry but I couldn't
> understand it from your email. How are we trying to reduce the frequency? Is
> clk_set_rate() getting called with that finally and not working ?

The patch eliminates the "flag", UNIT_MAX, and it's related comment,
that was used to indicate if it was a limit change that causes the
subsequent execution of sugov_update_single.

As for "clk_set_rate()", I don't know. I bisected the kernel
and got here. I also looked at reverting B7EAF1AAB9F8, but
it seemed to have some purpose which I don't know of more
important than this one or not.

I'll reinsert the first test below with more detail:

On 2019.07.17 23:27 Doug wrote:

> Driver: intel_cpufreq ; Governor: Schedutil
> Kernel 5.2:

$ uname -a
Linux s15 5.2.0-stock #608 SMP PREEMPT Mon Jul 8 08:21:56 PDT 2019 x86_64 x86_64 x86_64 GNU/Linux

doug@s15:~$ cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_driver
intel_cpufreq
intel_cpufreq
intel_cpufreq
intel_cpufreq
intel_cpufreq
intel_cpufreq
intel_cpufreq
intel_cpufreq
doug@s15:~$ cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
schedutil
schedutil
schedutil
schedutil
schedutil
schedutil
schedutil
schedutil

> Test 1: No thermal throttling involved (I only ever use thermal, if anything):

doug@s15:~$ sudo systemctl status thermald.service
. thermald.service - Thermal Daemon Service
   Loaded: loaded (/lib/systemd/system/thermald.service; disabled; vendor preset: enabled)
   Active: inactive (dead)

> - load the system (some sort of CPU stress test).

Use whatever you want. I use my own, only because I always have and it prints something
every so often which gives an indication of actual clock speed.

doug@s15:~$ ~/c/test1 & ~/c/test1 & ~/c/test1 & ~/c/test1 & ~/c/test1 & ~/c/test1 & ~/c/test1 & ~/c/test1 & ~/c/test1 &
[1] 3000
[2] 3001
[3] 3002
[4] 3003
[5] 3004
[6] 3005
[7] 3006
[8] 3007
[9] 3008

Watch everything with turobstat:

doug@s15:~$ sudo turbostat --quiet --Summary --show Busy%,Bzy_MHz,PkgTmp,PkgWatt,IRQ --interval 5
Busy%   Bzy_MHz IRQ     PkgTmp  PkgWatt
0.03    1600    269     25      3.69
0.07    1600    390     25      3.72
0.06    1600    368     25      3.72
0.06    1600    343     25      3.71
0.04    1600    269     26      3.70
74.72   3474    30230   46      43.95  <<< Add load
100.00  3500    40228   50      58.27
100.00  3500    40196   52      58.59
100.00  3500    40215   55      58.91
100.00  3500    40211   56      59.12
100.00  3500    40291   58      59.33  <<< Try to set 60% max
100.00  3500    40278   59      59.55
100.00  3500    40591   61      59.73
100.00  3500    40279   61      60.10
100.00  3500    40292   63      60.35
100.00  3500    40401   64      60.85
99.99   3500    40352   65      61.12
100.00  3500    40230   67      61.32
100.00  3500    40228   67      61.52
100.00  3500    40400   68      61.60  <<< Try to set 42% max
100.00  3500    40279   69      61.74
100.00  3500    40258   68      61.85
100.00  3500    40226   70      62.01
100.00  3500    40220   70      62.10
100.00  3500    40211   71      62.25

> - adjust downwards the maximum clock frequency
>   echo 60 | sudo tee /sys/devices/system/cpu/intel_pstate/max_perf_pct

doug@s15:~/temp$ echo 60 | sudo tee /sys/devices/system/cpu/intel_pstate/max_perf_pct
60
doug@s15:~/temp$ cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
60
... wait awhile ...
doug@s15:~/temp$ echo 42 | sudo tee /sys/devices/system/cpu/intel_pstate/max_perf_pct
42
doug@s15:~/temp$ cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
42

> - observe the Bzy_MHz does not change on the turbostat output.

See annotated turbostat output above.

> - reduce the system load, such that there is some idle time.
> - observe the Bzy_MHz now does change.
> - increase the load again.
> - observe the CPU frequency is now pinned to the new limit.

Go back to 60% first:

doug@s15:~/temp$ echo 60 | sudo tee /sys/devices/system/cpu/intel_pstate/max_perf_pct
60
killall test1
... wait awhile ...
doug@s15:~$ ~/c/test1 & ~/c/test1 & ~/c/test1 & ~/c/test1 & ~/c/test1 & ~/c/test1 & ~/c/test1 & ~/c/test1 & ~/c/test1 &
[1] 3040
[2] 3041
[3] 3042
[4] 3043
[5] 3044
[6] 3045
[7] 3046
[8] 3047
[9] 3048

doug@s15:~$ sudo turbostat --quiet --Summary --show Busy%,Bzy_MHz,PkgTmp,PkgWatt,IRQ --interval 5
Busy%   Bzy_MHz IRQ     PkgTmp  PkgWatt
100.00  3500    40289   80      64.32
100.00  3500    40223   79      64.16
100.00  3500    40212   79      64.14
100.00  3500    40269   79      64.17
100.00  3500    41330   79      64.19  <<< Freq is above the 60% limit
14.10   3489    6260    55      12.63  <<< Load removed, now not "busy"
0.03    1600    263     53      4.13   <<< see sugov_update_single
23.22   2293    9582    65      10.72  <<< Load applied
100.00  2300    40229   66      35.09  <<< 3.8 GHz * 0.6 = 2.3 GHz
100.00  2300    40209   66      35.21
100.00  2300    40211   65      35.20
100.00  2300    40219   65      35.16
100.00  2300    40224   65      35.14

The only procedure changes when using the acpi-cpufreq CPU scaling driver and
the schedutil governor are for setting and monitoring the max freq settings,
as described in the test write-ups.

Hope this helps.

... Doug



^ permalink raw reply

* Re: [PATCH v11 1/5] sched/core: uclamp: Extend CPU's cgroup controller
From: Patrick Bellasi @ 2019-07-18 15:26 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-kernel, linux-pm, Ingo Molnar, Peter Zijlstra,
	Rafael J . Wysocki, Vincent Guittot, Viresh Kumar, Paul Turner,
	Quentin Perret, Dietmar Eggemann, Morten Rasmussen, Juri Lelli,
	Todd Kjos, Joel Fernandes, Steve Muckle, Suren Baghdasaryan,
	Alessio Balsini
In-Reply-To: <20190718145238.GD696309@devbig004.ftw2.facebook.com>

On 18-Jul 07:52, Tejun Heo wrote:
> Hello, Patrick.
> 
> On Mon, Jul 08, 2019 at 09:43:53AM +0100, Patrick Bellasi wrote:
> > +static inline void cpu_uclamp_print(struct seq_file *sf,
> > +				    enum uclamp_id clamp_id)
> > +{
> > +	struct task_group *tg;
> > +	u64 util_clamp;
> > +	u64 percent;
> > +	u32 rem;
> > +
> > +	rcu_read_lock();
> > +	tg = css_tg(seq_css(sf));
> > +	util_clamp = tg->uclamp_req[clamp_id].value;
> > +	rcu_read_unlock();
> > +
> > +	if (util_clamp == SCHED_CAPACITY_SCALE) {
> > +		seq_puts(sf, "max\n");
> > +		return;
> > +	}
> > +
> > +	percent = uclamp_percent_from_scale(util_clamp);
> > +	percent = div_u64_rem(percent, 100, &rem);
> > +	seq_printf(sf, "%llu.%u\n", percent, rem);
> 
> "%llu.%02u" otherwise 20.01 gets printed as 20.1

Yup!... good point! :)

Since we already collected many feedbacks, I've got a v12 ready for posting.
Maybe you better wait for that before going on with the review.

Thanks,
Patrick

-- 
#include <best/regards.h>

Patrick Bellasi

^ permalink raw reply

* Re: [PATCH v11 1/5] sched/core: uclamp: Extend CPU's cgroup controller
From: Tejun Heo @ 2019-07-18 14:52 UTC (permalink / raw)
  To: Patrick Bellasi
  Cc: linux-kernel, linux-pm, Ingo Molnar, Peter Zijlstra,
	Rafael J . Wysocki, Vincent Guittot, Viresh Kumar, Paul Turner,
	Quentin Perret, Dietmar Eggemann, Morten Rasmussen, Juri Lelli,
	Todd Kjos, Joel Fernandes, Steve Muckle, Suren Baghdasaryan,
	Alessio Balsini
In-Reply-To: <20190708084357.12944-2-patrick.bellasi@arm.com>

Hello, Patrick.

On Mon, Jul 08, 2019 at 09:43:53AM +0100, Patrick Bellasi wrote:
> +static inline void cpu_uclamp_print(struct seq_file *sf,
> +				    enum uclamp_id clamp_id)
> +{
> +	struct task_group *tg;
> +	u64 util_clamp;
> +	u64 percent;
> +	u32 rem;
> +
> +	rcu_read_lock();
> +	tg = css_tg(seq_css(sf));
> +	util_clamp = tg->uclamp_req[clamp_id].value;
> +	rcu_read_unlock();
> +
> +	if (util_clamp == SCHED_CAPACITY_SCALE) {
> +		seq_puts(sf, "max\n");
> +		return;
> +	}
> +
> +	percent = uclamp_percent_from_scale(util_clamp);
> +	percent = div_u64_rem(percent, 100, &rem);
> +	seq_printf(sf, "%llu.%u\n", percent, rem);

"%llu.%02u" otherwise 20.01 gets printed as 20.1

Thanks.

-- 
tejun

^ permalink raw reply

* [PATCH v2 2/9] soc: samsung: Convert exynos-chipid driver to use the regmap API
From: Sylwester Nawrocki @ 2019-07-18 14:30 UTC (permalink / raw)
  To: krzk
  Cc: robh+dt, vireshk, devicetree, kgene, pankaj.dubey,
	linux-samsung-soc, linux-arm-kernel, linux-kernel, linux-pm,
	b.zolnierkie, m.szyprowski, Sylwester Nawrocki
In-Reply-To: <20190718143044.25066-1-s.nawrocki@samsung.com>

Convert the driver to use regmap API in order to allow other
drivers, like ASV, to access the CHIPID registers.

This patch adds definition of selected CHIPID register offsets
and register bit fields for Exynos5422 SoC.

Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
---
Changes since v1 (RFC):
 - new patch
---
 drivers/soc/samsung/exynos-chipid.c       | 33 ++++++----------
 include/linux/soc/samsung/exynos-chipid.h | 48 +++++++++++++++++++++++
 2 files changed, 61 insertions(+), 20 deletions(-)
 create mode 100644 include/linux/soc/samsung/exynos-chipid.h

diff --git a/drivers/soc/samsung/exynos-chipid.c b/drivers/soc/samsung/exynos-chipid.c
index 78b123ee60c0..594b00488013 100644
--- a/drivers/soc/samsung/exynos-chipid.c
+++ b/drivers/soc/samsung/exynos-chipid.c
@@ -9,18 +9,16 @@
  */
 
 #include <linux/io.h>
+#include <linux/mfd/syscon.h>
 #include <linux/of.h>
 #include <linux/of_address.h>
 #include <linux/of_platform.h>
 #include <linux/platform_device.h>
+#include <linux/regmap.h>
 #include <linux/slab.h>
+#include <linux/soc/samsung/exynos-chipid.h>
 #include <linux/sys_soc.h>
 
-#define EXYNOS_SUBREV_MASK	(0xF << 4)
-#define EXYNOS_MAINREV_MASK	(0xF << 0)
-#define EXYNOS_REV_MASK		(EXYNOS_SUBREV_MASK | EXYNOS_MAINREV_MASK)
-#define EXYNOS_MASK		0xFFFFF000
-
 static const struct exynos_soc_id {
 	const char *name;
 	unsigned int id;
@@ -53,29 +51,24 @@ static const char * __init product_id_to_soc_id(unsigned int product_id)
 int __init exynos_chipid_early_init(void)
 {
 	struct soc_device_attribute *soc_dev_attr;
-	void __iomem *exynos_chipid_base;
 	struct soc_device *soc_dev;
 	struct device_node *root;
-	struct device_node *np;
+	struct regmap *regmap;
 	u32 product_id;
 	u32 revision;
+	int ret;
 
-	/* look up for chipid node */
-	np = of_find_compatible_node(NULL, NULL, "samsung,exynos4210-chipid");
-	if (!np)
-		return -ENODEV;
-
-	exynos_chipid_base = of_iomap(np, 0);
-	of_node_put(np);
-
-	if (!exynos_chipid_base) {
-		pr_err("Failed to map SoC chipid\n");
-		return -ENXIO;
+	regmap = syscon_regmap_lookup_by_compatible("samsung,exynos4210-chipid");
+	if (IS_ERR(regmap)) {
+		pr_err("%s: failed to get regmap\n", __func__);
+		return PTR_ERR(regmap);
 	}
 
-	product_id = readl_relaxed(exynos_chipid_base);
+	ret = regmap_read(regmap, EXYNOS_CHIPID_REG_PRO_ID, &product_id);
+	if (ret < 0)
+		return ret;
+
 	revision = product_id & EXYNOS_REV_MASK;
-	iounmap(exynos_chipid_base);
 
 	soc_dev_attr = kzalloc(sizeof(*soc_dev_attr), GFP_KERNEL);
 	if (!soc_dev_attr)
diff --git a/include/linux/soc/samsung/exynos-chipid.h b/include/linux/soc/samsung/exynos-chipid.h
new file mode 100644
index 000000000000..25359d70d617
--- /dev/null
+++ b/include/linux/soc/samsung/exynos-chipid.h
@@ -0,0 +1,48 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2018 Samsung Electronics Co., Ltd.
+ *	      http://www.samsung.com/
+ *
+ * Exynos - CHIPID support
+ */
+
+#define EXYNOS_CHIPID_REG_PRO_ID	0x00
+ #define EXYNOS_SUBREV_MASK		(0xf << 4)
+ #define EXYNOS_MAINREV_MASK		(0xf << 0)
+ #define EXYNOS_REV_MASK		(EXYNOS_SUBREV_MASK | \
+					 EXYNOS_MAINREV_MASK)
+ #define EXYNOS_MASK			0xfffff000
+
+#define EXYNOS_CHIPID_REG_PKG_ID	0x04
+/* Bit field definitions for EXYNOS_CHIPID_REG_PKG_ID register */
+ #define EXYNOS5422_IDS_OFFSET		24
+ #define EXYNOS5422_IDS_MASK		0xff
+ #define EXYNOS5422_USESG_OFFSET	3
+ #define EXYNOS5422_USESG_MASK		0x01
+ #define EXYNOS5422_SG_OFFSET		0
+ #define EXYNOS5422_SG_MASK		0x07
+ #define EXYNOS5422_TABLE_OFFSET	8
+ #define EXYNOS5422_TABLE_MASK		0x03
+ #define EXYNOS5422_SG_A_OFFSET		17
+ #define EXYNOS5422_SG_A_MASK		0x0f
+ #define EXYNOS5422_SG_B_OFFSET		21
+ #define EXYNOS5422_SG_B_MASK		0x03
+ #define EXYNOS5422_SG_BSIGN_OFFSET	23
+ #define EXYNOS5422_SG_BSIGN_MASK	0x01
+ #define EXYNOS5422_BIN2_OFFSET		12
+ #define EXYNOS5422_BIN2_MASK		0x01
+
+#define EXYNOS_CHIPID_REG_LOT_ID	0x14
+
+#define EXYNOS_CHIPID_REG_AUX_INFO	0x1c
+/* Bit field definitions for EXYNOS_CHIPID_REG_AUX_INFO register */
+ #define EXYNOS5422_TMCB_OFFSET		0
+ #define EXYNOS5422_TMCB_MASK		0x7f
+ #define EXYNOS5422_ARM_UP_OFFSET	8
+ #define EXYNOS5422_ARM_UP_MASK		0x03
+ #define EXYNOS5422_ARM_DN_OFFSET	10
+ #define EXYNOS5422_ARM_DN_MASK		0x03
+ #define EXYNOS5422_KFC_UP_OFFSET	12
+ #define EXYNOS5422_KFC_UP_MASK		0x03
+ #define EXYNOS5422_KFC_DN_OFFSET	14
+ #define EXYNOS5422_KFC_DN_MASK		0x03
-- 
2.17.1


^ permalink raw reply related

* [PATCH v2 4/9] ARM: EXYNOS: enable exynos_chipid for ARCH_EXYNOS
From: Sylwester Nawrocki @ 2019-07-18 14:30 UTC (permalink / raw)
  To: krzk
  Cc: robh+dt, vireshk, devicetree, kgene, pankaj.dubey,
	linux-samsung-soc, linux-arm-kernel, linux-kernel, linux-pm,
	b.zolnierkie, m.szyprowski, Sylwester Nawrocki
In-Reply-To: <20190718143044.25066-1-s.nawrocki@samsung.com>

From: Pankaj Dubey <pankaj.dubey@samsung.com>

As now we have chipid driver to initialize SoC related information
let's include it in build by default.

Signed-off-by: Pankaj Dubey <pankaj.dubey@samsung.com>
Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
---
Changes since v1 (RFC):
 - none
---
 arch/arm/mach-exynos/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm/mach-exynos/Kconfig b/arch/arm/mach-exynos/Kconfig
index 1c518b8ee520..6fc4af312361 100644
--- a/arch/arm/mach-exynos/Kconfig
+++ b/arch/arm/mach-exynos/Kconfig
@@ -13,6 +13,7 @@ menuconfig ARCH_EXYNOS
 	select ARM_AMBA
 	select ARM_GIC
 	select COMMON_CLK_SAMSUNG
+	select EXYNOS_CHIPID
 	select EXYNOS_THERMAL
 	select EXYNOS_PMU
 	select EXYNOS_SROM
-- 
2.17.1


^ permalink raw reply related

* [PATCH v2 3/9] soc: samsung: Add Exynos Adaptive Supply Voltage driver
From: Sylwester Nawrocki @ 2019-07-18 14:30 UTC (permalink / raw)
  To: krzk
  Cc: robh+dt, vireshk, devicetree, kgene, pankaj.dubey,
	linux-samsung-soc, linux-arm-kernel, linux-kernel, linux-pm,
	b.zolnierkie, m.szyprowski, Sylwester Nawrocki
In-Reply-To: <20190718143044.25066-1-s.nawrocki@samsung.com>

The Adaptive Supply Voltage (ASV) driver adjusts CPU cluster operating
points depending on exact revision of an SoC retrieved from the CHIPID
block or the OTP memory.  This allows for some power saving as for some
CPU clock frequencies we can lower CPU cluster supply voltage comparing
to safe values common to the all chip revisions.

This patch adds support for Exynos5422/5800 SoC, it is partially based
on code from https://github.com/hardkernel/linux repository,
branch odroidxu4-4.14.y, files: arch/arm/mach-exynos/exynos5422-asv.[ch].

Tested on Odroid XU3, XU4, XU3 Lite.

Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
---
Changes since v1 (RFC):
 - removed code for parsing the ASV OPP tables from DT, the ASV OPP tables
   moved to the driver;
 - converted to use the regmap API;
 - converted to normal platform driver.
---
 drivers/soc/samsung/Kconfig          |  11 +
 drivers/soc/samsung/Makefile         |   3 +
 drivers/soc/samsung/exynos-asv.c     | 185 ++++++++++
 drivers/soc/samsung/exynos-asv.h     |  82 +++++
 drivers/soc/samsung/exynos5422-asv.c | 499 +++++++++++++++++++++++++++
 drivers/soc/samsung/exynos5422-asv.h |  25 ++
 6 files changed, 805 insertions(+)
 create mode 100644 drivers/soc/samsung/exynos-asv.c
 create mode 100644 drivers/soc/samsung/exynos-asv.h
 create mode 100644 drivers/soc/samsung/exynos5422-asv.c
 create mode 100644 drivers/soc/samsung/exynos5422-asv.h

diff --git a/drivers/soc/samsung/Kconfig b/drivers/soc/samsung/Kconfig
index 2905f5262197..539cd95dd176 100644
--- a/drivers/soc/samsung/Kconfig
+++ b/drivers/soc/samsung/Kconfig
@@ -7,6 +7,17 @@ menuconfig SOC_SAMSUNG
 
 if SOC_SAMSUNG
 
+config EXYNOS_ASV
+	bool "Exynos Adaptive Supply Voltage support" if COMPILE_TEST
+	depends on ARCH_EXYNOS || COMPILE_TEST
+	depends on EXYNOS_CHIPID
+	select EXYNOS_ASV_ARM if ARM && ARCH_EXYNOS
+
+# There is no need to enable these drivers for ARMv8
+config EXYNOS_ASV_ARM
+	bool "Exynos ASV ARMv7-specific driver extensions" if COMPILE_TEST
+	depends on EXYNOS_ASV
+
 config EXYNOS_CHIPID
 	bool "Exynos Chipid controller driver" if COMPILE_TEST
 	depends on ARCH_EXYNOS || COMPILE_TEST
diff --git a/drivers/soc/samsung/Makefile b/drivers/soc/samsung/Makefile
index 3b6a8797416c..edd1d6ea064d 100644
--- a/drivers/soc/samsung/Makefile
+++ b/drivers/soc/samsung/Makefile
@@ -1,5 +1,8 @@
 # SPDX-License-Identifier: GPL-2.0
 
+obj-$(CONFIG_EXYNOS_ASV)	+= exynos-asv.o
+obj-$(CONFIG_EXYNOS_ASV_ARM)	+= exynos5422-asv.o
+
 obj-$(CONFIG_EXYNOS_CHIPID)	+= exynos-chipid.o
 obj-$(CONFIG_EXYNOS_PMU)	+= exynos-pmu.o
 
diff --git a/drivers/soc/samsung/exynos-asv.c b/drivers/soc/samsung/exynos-asv.c
new file mode 100644
index 000000000000..b1a7e0ba8870
--- /dev/null
+++ b/drivers/soc/samsung/exynos-asv.c
@@ -0,0 +1,185 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2019 Samsung Electronics Co., Ltd.
+ *	      http://www.samsung.com/
+ * Author: Sylwester Nawrocki <s.nawrocki@samsung.com>
+ *
+ * Samsung Exynos SoC Adaptive Supply Voltage support
+ */
+
+#include <linux/cpu.h>
+#include <linux/delay.h>
+#include <linux/device.h>
+#include <linux/errno.h>
+#include <linux/init.h>
+#include <linux/mfd/syscon.h>
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/of_device.h>
+#include <linux/platform_device.h>
+#include <linux/pm_opp.h>
+#include <linux/regmap.h>
+#include <linux/slab.h>
+#include <linux/soc/samsung/exynos-chipid.h>
+
+#include "exynos-asv.h"
+#include "exynos5422-asv.h"
+
+#define MHZ 1000000U
+
+static int exynos_asv_update_cpu_opps(struct exynos_asv *asv,
+				      struct device *cpu)
+{
+	struct exynos_asv_subsys *subsys = NULL;
+	struct dev_pm_opp *opp;
+	unsigned int opp_freq;
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(asv->subsys); i++) {
+		if (of_device_is_compatible(cpu->of_node,
+					    asv->subsys[i].cpu_dt_compat)) {
+			subsys = &asv->subsys[i];
+			break;
+		}
+	}
+	if (!subsys)
+		return -EINVAL;
+
+	for (i = 0; i < subsys->table.num_rows; i++) {
+		unsigned int new_voltage;
+		unsigned int voltage;
+		int timeout = 1000;
+		int err;
+
+		opp_freq = exynos_asv_opp_get_frequency(subsys, i);
+
+		opp = dev_pm_opp_find_freq_exact(cpu, opp_freq * MHZ, true);
+		if (IS_ERR(opp)) {
+			dev_info(asv->dev, "cpu%d opp%d, freq: %u missing\n",
+				 cpu->id, i, opp_freq);
+
+			continue;
+		}
+
+		voltage = dev_pm_opp_get_voltage(opp);
+		new_voltage = asv->opp_get_voltage(subsys, i, voltage);
+		dev_pm_opp_put(opp);
+
+		opp_freq *= MHZ;
+		dev_pm_opp_remove(cpu, opp_freq);
+
+		while (--timeout) {
+			opp = dev_pm_opp_find_freq_exact(cpu, opp_freq, true);
+			if (IS_ERR(opp))
+				break;
+			dev_pm_opp_put(opp);
+			msleep(1);
+		}
+
+		err = dev_pm_opp_add(cpu, opp_freq, new_voltage);
+		if (err < 0)
+			dev_err(asv->dev,
+				"Failed to add OPP %u Hz/%u uV for cpu%d\n",
+				opp_freq, new_voltage, cpu->id);
+	}
+
+	return 0;
+}
+
+static int exynos_asv_update_opps(struct exynos_asv *asv)
+{
+	struct opp_table *last_opp_table = NULL;
+	struct device *cpu;
+	int ret, cpuid;
+
+	for_each_possible_cpu(cpuid) {
+		struct opp_table *opp_table;
+
+		cpu = get_cpu_device(cpuid);
+		if (!cpu)
+			continue;
+
+		opp_table = dev_pm_opp_get_opp_table(cpu);
+		if (IS_ERR(opp_table))
+			continue;
+
+		if (!last_opp_table || opp_table != last_opp_table) {
+			last_opp_table = opp_table;
+
+			ret = exynos_asv_update_cpu_opps(asv, cpu);
+			if (ret < 0)
+				dev_err(asv->dev, "Couldn't udate OPPs for cpu%d\n",
+					cpuid);
+		}
+
+		dev_pm_opp_put_opp_table(opp_table);
+	}
+
+	return	0;
+}
+
+static int exynos_asv_probe(struct platform_device *pdev)
+{
+	int (*probe_func)(struct exynos_asv *asv);
+	struct exynos_asv *asv;
+	struct device *cpu_dev;
+	u32 product_id = 0;
+	int ret, i;
+
+	cpu_dev = get_cpu_device(0);
+	ret = dev_pm_opp_get_opp_count(cpu_dev);
+	if (ret < 0)
+		return -EPROBE_DEFER;
+
+	asv = kcalloc(1, sizeof(*asv), GFP_KERNEL);
+	if (!asv)
+		return -ENOMEM;
+
+	asv->chipid_regmap = syscon_node_to_regmap(pdev->dev.of_node);
+	if (IS_ERR(asv->chipid_regmap)) {
+		dev_err(&pdev->dev, "Could not find syscon regmap\n");
+		return PTR_ERR(asv->chipid_regmap);
+	}
+
+	regmap_read(asv->chipid_regmap, EXYNOS_CHIPID_REG_PRO_ID, &product_id);
+
+	switch (product_id & EXYNOS_MASK) {
+	case 0xE5422000:
+		probe_func = exynos5422_asv_init;
+		break;
+	default:
+		dev_err(&pdev->dev, "Unsupported product ID: %#x", product_id);
+		return -ENODEV;
+	}
+
+	ret = of_property_read_u32(pdev->dev.of_node, "samsung,asv-bin",
+				   &asv->of_bin);
+	if (ret < 0)
+		asv->of_bin = -EINVAL;
+
+	asv->dev = &pdev->dev;
+	dev_set_drvdata(&pdev->dev, asv);
+
+	for (i = 0; i < ARRAY_SIZE(asv->subsys); i++)
+		asv->subsys[i].asv = asv;
+
+	ret = probe_func(asv);
+	if (ret < 0)
+		return ret;
+
+	return exynos_asv_update_opps(asv);
+}
+
+static const struct of_device_id exynos_asv_of_device_ids[] = {
+	{ .compatible = "samsung,exynos4210-chipid" },
+	{}
+};
+
+static struct platform_driver exynos_asv_driver = {
+	.driver = {
+		.name = "exynos-asv",
+		.of_match_table = exynos_asv_of_device_ids,
+	},
+	.probe	= exynos_asv_probe,
+};
+module_platform_driver(exynos_asv_driver);
diff --git a/drivers/soc/samsung/exynos-asv.h b/drivers/soc/samsung/exynos-asv.h
new file mode 100644
index 000000000000..d0a5d603093d
--- /dev/null
+++ b/drivers/soc/samsung/exynos-asv.h
@@ -0,0 +1,82 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2019 Samsung Electronics Co., Ltd.
+ *	      http://www.samsung.com/
+ * Author: Sylwester Nawrocki <s.nawrocki@samsung.com>
+ *
+ * Samsung Exynos SoC Adaptive Supply Voltage support
+ */
+#ifndef _EXYNOS_ASV_H
+#define _EXYNOS_ASV_H
+
+enum {
+	EXYNOS_ASV_SUBSYS_ID_ARM,
+	EXYNOS_ASV_SUBSYS_ID_EGL = EXYNOS_ASV_SUBSYS_ID_ARM,
+	EXYNOS_ASV_SUBSYS_ID_KFC,
+	EXYNOS_ASV_SUBSYS_ID_INT,
+	EXYNOS_ASV_SUBSYS_ID_MIF,
+	EXYNOS_ASV_SUBSYS_ID_G3D,
+	EXYNOS_ASV_SUBSYS_ID_CAM,
+	EXYNOS_ASV_SUBSYS_ID_MAX
+};
+
+struct regmap;
+
+/* HPM, IDS values to select target group */
+struct asv_limit_entry {
+	unsigned int hpm;
+	unsigned int ids;
+};
+
+struct exynos_asv_table {
+	unsigned int num_rows;
+	unsigned int num_cols;
+	u32 *buf;
+};
+
+struct exynos_asv_subsys {
+	struct exynos_asv *asv;
+	char *cpu_dt_compat;
+	int id;
+	struct exynos_asv_table table;
+
+	unsigned int base_volt;
+	unsigned int offset_volt_h;
+	unsigned int offset_volt_l;
+};
+
+struct exynos_asv {
+	struct device *dev;
+	struct regmap *chipid_regmap;
+	struct exynos_asv_subsys subsys[2];
+
+	int (*opp_get_voltage)(struct exynos_asv_subsys *subs, int level,
+			       unsigned int voltage);
+	unsigned int group;
+	unsigned int table;
+
+	/* True if SG fields from PKG_ID register should be used */
+	bool use_sg;
+	/* ASV bin read from DT */
+	int of_bin;
+};
+
+static inline u32 __asv_get_table_entry(struct exynos_asv_table *table,
+					unsigned int row, unsigned int col)
+{
+	return table->buf[row * (table->num_cols) + col];
+}
+
+static inline u32 exynos_asv_opp_get_voltage(struct exynos_asv_subsys *subsys,
+					unsigned int level, unsigned int group)
+{
+	return __asv_get_table_entry(&subsys->table, level, group + 1);
+}
+
+static inline u32 exynos_asv_opp_get_frequency(struct exynos_asv_subsys *subsys,
+					unsigned int level)
+{
+	return __asv_get_table_entry(&subsys->table, level, 0);
+}
+
+#endif /* _EXYNOS_ASV_H */
diff --git a/drivers/soc/samsung/exynos5422-asv.c b/drivers/soc/samsung/exynos5422-asv.c
new file mode 100644
index 000000000000..5fd673a6a733
--- /dev/null
+++ b/drivers/soc/samsung/exynos5422-asv.c
@@ -0,0 +1,499 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2019 Samsung Electronics Co., Ltd.
+ *	      http://www.samsung.com/
+ *
+ * Samsung Exynos 5422 SoC Adaptive Supply Voltage support
+ */
+
+#include <linux/bitrev.h>
+#include <linux/device.h>
+#include <linux/errno.h>
+#include <linux/init.h>
+#include <linux/of.h>
+#include <linux/regmap.h>
+#include <linux/soc/samsung/exynos-chipid.h>
+#include <linux/slab.h>
+
+#include "exynos-asv.h"
+
+#define ASV_GROUPS_NUM		14
+#define ASV_ARM_DVFS_NUM	20
+#define ASV_ARM_BIN2_DVFS_NUM	17
+#define ASV_KFC_DVFS_NUM	14
+#define ASV_KFC_BIN2_DVFS_NUM	12
+
+static const u32 asv_arm_table[][ASV_ARM_DVFS_NUM][ASV_GROUPS_NUM + 1] = {
+{
+	/* ARM 0, 1 */
+	{ 2100,    1362500, 1362500, 1350000, 1337500, 1325000, 1312500, 1300000,
+	  1275000, 1262500, 1250000, 1237500, 1225000, 1212500, 1200000 },
+	{ 2000,    1312500, 1312500, 1300000, 1287500, 1275000, 1262500, 1250000,
+	  1237500, 1225000, 1237500, 1225000, 1212500, 1200000, 1187500 },
+	{ 1900,    1250000, 1237500, 1225000, 1212500, 1200000, 1187500, 1175000,
+	  1162500, 1150000, 1162500, 1150000, 1137500, 1125000, 1112500 },
+	{ 1800,    1200000, 1187500, 1175000, 1162500, 1150000, 1137500, 1125000,
+	  1112500, 1100000, 1112500, 1100000, 1087500, 1075000, 1062500 },
+	{ 1700,    1162500, 1150000, 1137500, 1125000, 1112500, 1100000, 1087500,
+	  1075000, 1062500, 1075000, 1062500, 1050000, 1037500, 1025000 },
+	{ 1600,    1125000, 1112500, 1100000, 1087500, 1075000, 1062500, 1050000,
+	  1037500, 1025000, 1037500, 1025000, 1012500, 1000000, 987500 },
+	{ 1500,    1087500, 1075000, 1062500, 1050000, 1037500, 1025000, 1012500,
+	  1000000, 987500,  1000000, 987500,  975000,  962500,  950000 },
+	{ 1400,    1062500, 1050000, 1037500, 1025000, 1012500, 1000000, 987500,
+	  975000,  962500,  975000,  962500,  950000,  937500,  925000 },
+	{ 1300,    1050000, 1037500, 1025000, 1012500, 1000000, 987500,  975000,
+	  962500,  950000,  962500,  950000,  937500,  925000,  912500 },
+	{ 1200,    1025000, 1012500, 1000000, 987500,  975000,  962500,  950000,
+	  937500,  925000,  937500,  925000,  912500,  900000,  900000 },
+	{ 1100,    1000000, 987500,  975000,  962500,  950000,  937500,  925000,
+	  912500,  900000,  900000,  900000,  900000,  900000,  900000 },
+	{ 1000,    975000,  962500,  950000,  937500,  925000,  912500,  900000,
+	  900000,  900000,  900000,  900000,  900000,  900000,  900000 },
+	{ 900,     950000,  937500,  925000,  912500,  900000,  900000,  900000,
+	  900000,  900000,  900000,  900000,  900000,  900000,  900000 },
+	{ 800,     925000,  912500,  900000,  900000,  900000,  900000,  900000,
+	  900000,  900000,  900000,  900000,  900000,  900000,  900000 },
+	{ 700,     900000,  900000,  900000,  900000,  900000,  900000,  900000,
+	  900000,  900000,  900000,  900000,  900000,  900000,  900000 },
+	{ 600,     900000,  900000,  900000,  900000,  900000,  900000,  900000,
+	  900000,  900000,  900000,  900000,  900000,  900000,  900000 },
+	{ 500,     900000,  900000,  900000,  900000,  900000,  900000,  900000,
+	  900000,  900000,  900000,  900000,  900000,  900000,  900000 },
+	{ 400,     900000,  900000,  900000,  900000,  900000,  900000,  900000,
+	  900000,  900000,  900000,  900000,  900000,  900000,  900000 },
+	{ 300,     900000,  900000,  900000,  900000,  900000,  900000,  900000,
+	  900000,  900000,  900000,  900000,  900000,  900000,  900000 },
+	{ 200,     900000,  900000,  900000,  900000,  900000,  900000,  900000,
+	  900000,  900000,  900000,  900000,  900000,  900000,  900000 },
+}, {
+	/* ARM 2 */
+	{ 2100,    1362500, 1362500, 1350000, 1337500, 1325000, 1312500, 1300000,
+	  1275000, 1262500, 1250000, 1237500, 1225000, 1212500, 1200000 },
+	{ 2000,    1312500, 1312500, 1312500, 1300000, 1275000, 1262500, 1250000,
+	  1237500, 1225000, 1237500, 1225000, 1212500, 1200000, 1187500 },
+	{ 1900,    1262500, 1250000, 1250000, 1237500, 1212500, 1200000, 1187500,
+	  1175000, 1162500, 1175000, 1162500, 1150000, 1137500, 1125000 },
+	{ 1800,    1212500, 1200000, 1187500, 1175000, 1162500, 1150000, 1137500,
+	  1125000, 1112500, 1125000, 1112500, 1100000, 1087500, 1075000 },
+	{ 1700,    1175000, 1162500, 1150000, 1137500, 1125000, 1112500, 1100000,
+	  1087500, 1075000, 1087500, 1075000, 1062500, 1050000, 1037500 },
+	{ 1600,    1137500, 1125000, 1112500, 1100000, 1087500, 1075000, 1062500,
+	  1050000, 1037500, 1050000, 1037500, 1025000, 1012500, 1000000 },
+	{ 1500,    1100000, 1087500, 1075000, 1062500, 1050000, 1037500, 1025000,
+	  1012500, 1000000, 1012500, 1000000, 987500,  975000,  962500 },
+	{ 1400,    1075000, 1062500, 1050000, 1037500, 1025000, 1012500, 1000000,
+	  987500,  975000,  987500,  975000,  962500,  950000,  937500 },
+	{ 1300,    1050000, 1037500, 1025000, 1012500, 1000000, 987500,  975000,
+	  962500,  950000,  962500,  950000,  937500,  925000,  912500 },
+	{ 1200,    1025000, 1012500, 1000000, 987500,  975000,  962500,  950000,
+	  937500,  925000,  937500,  925000,  912500,  900000,  900000 },
+	{ 1100,    1000000, 987500,  975000,  962500,  950000,  937500,  925000,
+	  912500,  900000,  900000,  900000,  900000,  900000,  900000 },
+	{ 1000,    975000,  962500,  950000,  937500,  925000,  912500,  900000,
+	  900000,  900000,  900000,  900000,  900000,  900000,  900000 },
+	{ 900,     950000,  937500,  925000,  912500,  900000,  900000,  900000,
+	  900000,  900000,  900000,  900000,  900000,  900000,  900000 },
+	{ 800,     925000,  912500,  900000,  900000,  900000,  900000,  900000,
+	  900000,  900000,  900000,  900000,  900000,  900000,  900000 },
+	{ 700,     900000,  900000,  900000,  900000,  900000,  900000,  900000,
+	  900000,  900000,  900000,  900000,  900000,  900000,  900000 },
+	{ 600,     900000,  900000,  900000,  900000,  900000,  900000,  900000,
+	  900000,  900000,  900000,  900000,  900000,  900000,  900000 },
+	{ 500,     900000,  900000,  900000,  900000,  900000,  900000,  900000,
+	  900000,  900000,  900000,  900000,  900000,  900000,  900000 },
+	{ 400,     900000,  900000,  900000,  900000,  900000,  900000,  900000,
+	  900000,  900000,  900000,  900000,  900000,  900000,  900000 },
+	{ 300,     900000,  900000,  900000,  900000,  900000,  900000,  900000,
+	  900000,  900000,  900000,  900000,  900000,  900000,  900000 },
+	{ 200,     900000,  900000,  900000,  900000,  900000,  900000,  900000,
+	  900000,  900000,  900000,  900000,  900000,  900000,  900000 },
+}, {
+	/* ARM 3 */
+	{ 2100,    1362500, 1362500, 1350000, 1337500, 1325000, 1312500, 1300000,
+	  1275000, 1262500, 1250000, 1237500, 1225000, 1212500, 1200000 },
+	{ 2000,    1312500, 1312500, 1300000, 1287500, 1275000, 1262500, 1250000,
+	  1237500, 1225000, 1237500, 1225000, 1212500, 1200000, 1187500 },
+	{ 1900,    1262500, 1250000, 1237500, 1225000, 1212500, 1200000, 1187500,
+	  1175000, 1162500, 1175000, 1162500, 1150000, 1137500, 1125000 },
+	{ 1800,    1212500, 1200000, 1187500, 1175000, 1162500, 1150000, 1137500,
+	  1125000, 1112500, 1125000, 1112500, 1100000, 1087500, 1075000 },
+	{ 1700,    1175000, 1162500, 1150000, 1137500, 1125000, 1112500, 1100000,
+	  1087500, 1075000, 1087500, 1075000, 1062500, 1050000, 1037500 },
+	{ 1600,    1137500, 1125000, 1112500, 1100000, 1087500, 1075000, 1062500,
+	  1050000, 1037500, 1050000, 1037500, 1025000, 1012500, 1000000 },
+	{ 1500,    1100000, 1087500, 1075000, 1062500, 1050000, 1037500, 1025000,
+	  1012500, 1000000, 1012500, 1000000, 987500,  975000,  962500 },
+	{ 1400,    1075000, 1062500, 1050000, 1037500, 1025000, 1012500, 1000000,
+	  987500,  975000,  987500,  975000,  962500,  950000,  937500 },
+	{ 1300,    1050000, 1037500, 1025000, 1012500, 1000000, 987500,  975000,
+	  962500,  950000,  962500,  950000,  937500,  925000,  912500 },
+	{ 1200,    1025000, 1012500, 1000000, 987500,  975000,  962500,  950000,
+	  937500,  925000,  937500,  925000,  912500,  900000,  900000 },
+	{ 1100,    1000000, 987500,  975000,  962500,  950000,  937500,  925000,
+	  912500,  900000,  900000,  900000,  900000,  900000,  900000 },
+	{ 1000,    975000,  962500,  950000,  937500,  925000,  912500,  900000,
+	  900000,  900000,  900000,  900000,  900000,  900000,  900000 },
+	{ 900,     950000,  937500,  925000,  912500,  900000,  900000,  900000,
+	  900000,  900000,  900000,  900000,  900000,  900000,  900000 },
+	{ 800,     925000,  912500,  900000,  900000,  900000,  900000,  900000,
+	  900000,  900000,  900000,  900000,  900000,  900000,  900000 },
+	{ 700,     900000,  900000,  900000,  900000,  900000,  900000,  900000,
+	  900000,  900000,  900000,  900000,  900000,  900000,  900000 },
+	{ 600,     900000,  900000,  900000,  900000,  900000,  900000,  900000,
+	  900000,  900000,  900000,  900000,  900000,  900000,  900000 },
+	{ 500,     900000,  900000,  900000,  900000,  900000,  900000,  900000,
+	  900000,  900000,  900000,  900000,  900000,  900000,  900000 },
+	{ 400,     900000,  900000,  900000,  900000,  900000,  900000,  900000,
+	  900000,  900000,  900000,  900000,  900000,  900000,  900000 },
+	{ 300,     900000,  900000,  900000,  900000,  900000,  900000,  900000,
+	  900000,  900000,  900000,  900000,  900000,  900000,  900000 },
+	{ 200,     900000,  900000,  900000,  900000,  900000,  900000,  900000,
+	  900000,  900000,  900000,  900000,  900000,  900000,  900000 },
+}, {
+	/* ARM bin 2 */
+	{ 1800,    1237500, 1225000, 1212500, 1200000, 1187500, 1175000, 1162500,
+	  1150000, 1137500, 1150000, 1137500, 1125000, 1112500, 1100000 },
+	{ 1700,    1200000, 1187500, 1175000, 1162500, 1150000, 1137500, 1125000,
+	  1112500, 1100000, 1112500, 1100000, 1087500, 1075000, 1062500 },
+	{ 1600,    1162500, 1150000, 1137500, 1125000, 1112500, 1100000, 1087500,
+	  1075000, 1062500, 1075000, 1062500, 1050000, 1037500, 1025000 },
+	{ 1500,    1125000, 1112500, 1100000, 1087500, 1075000, 1062500, 1050000,
+	  1037500, 1025000, 1037500, 1025000, 1012500, 1000000, 987500 },
+	{ 1400,    1100000, 1087500, 1075000, 1062500, 1050000, 1037500, 1025000,
+	  1012500, 1000000, 1012500, 1000000, 987500,  975000,  962500 },
+	{ 1300,    1087500, 1075000, 1062500, 1050000, 1037500, 1025000, 1012500,
+	  1000000, 987500,  1000000, 987500,  975000,  962500,  950000 },
+	{ 1200,    1062500, 1050000, 1037500, 1025000, 1012500, 1000000, 987500,
+	  975000,  962500,  975000,  962500,  950000,  937500,  925000 },
+	{ 1100,    1037500, 1025000, 1012500, 1000000, 987500,  975000,  962500,
+	  950000,  937500,  950000,  937500,  925000,  912500,  900000 },
+	{ 1000,    1012500, 1000000, 987500,  975000,  962500,  950000,  937500,
+	  925000,  912500,  925000,  912500,  900000,  900000,  900000 },
+	{ 900,     987500,  975000,  962500,  950000,  937500,  925000,  912500,
+	  900000,  900000,  900000,  900000,  900000,  900000,  900000 },
+	{ 800,     962500,  950000,  937500,  925000,  912500,  900000,  900000,
+	  900000,  900000,  900000,  900000,  900000,  900000,  900000 },
+	{ 700,     937500,  925000,  912500,  900000,  900000,  900000,  900000,
+	  900000,  900000,  900000,  900000,  900000,  900000,  900000 },
+	{ 600,     900000,  900000,  900000,  900000,  900000,  900000,  900000,
+	  900000,  900000,  900000,  900000,  900000,  900000,  900000 },
+	{ 500,     900000,  900000,  900000,  900000,  900000,  900000,  900000,
+	  900000,  900000,  900000,  900000,  900000,  900000,  900000 },
+	{ 400,     900000,  900000,  900000,  900000,  900000,  900000,  900000,
+	  900000,  900000,  900000,  900000,  900000,  900000,  900000 },
+	{ 300,     900000,  900000,  900000,  900000,  900000,  900000,  900000,
+	  900000,  900000,  900000,  900000,  900000,  900000,  900000 },
+	{ 200,     900000,  900000,  900000,  900000,  900000,  900000,  900000,
+	  900000,  900000,  900000,  900000,  900000,  900000,  900000 },
+}
+};
+
+static const u32 asv_kfc_table[][ASV_KFC_DVFS_NUM][ASV_GROUPS_NUM + 1] = {
+{
+	/* KFC 0, 1 */
+	{ 1500000, 1300000, 1300000, 1300000, 1287500, 1287500, 1287500, 1275000,
+	  1262500, 1250000, 1237500, 1225000, 1212500, 1200000, 1187500 },
+	{ 1400000, 1275000, 1262500, 1250000, 1237500, 1225000, 1212500, 1200000,
+	  1187500, 1175000, 1162500, 1150000, 1137500, 1125000, 1112500 },
+	{ 1300000, 1225000, 1212500, 1200000, 1187500, 1175000, 1162500, 1150000,
+	  1137500, 1125000, 1112500, 1100000, 1087500, 1075000, 1062500 },
+	{ 1200000, 1175000, 1162500, 1150000, 1137500, 1125000, 1112500, 1100000,
+	  1087500, 1075000, 1062500, 1050000, 1037500, 1025000, 1012500 },
+	{ 1100000, 1137500, 1125000, 1112500, 1100000, 1087500, 1075000, 1062500,
+	  1050000, 1037500, 1025000, 1012500, 1000000, 987500,  975000 },
+	{ 1000000, 1100000, 1087500, 1075000, 1062500, 1050000, 1037500, 1025000,
+	  1012500, 1000000, 987500,  975000,  962500,  950000,  937500 },
+	{ 900000,  1062500, 1050000, 1037500, 1025000, 1012500, 1000000, 987500,
+	  975000,  962500,  950000,  937500,  925000,  912500,  900000 },
+	{ 800000,  1025000, 1012500, 1000000, 987500,  975000,  962500,  950000,
+	  937500,  925000,  912500,  900000,  900000,  900000,  900000 },
+	{ 700000,  987500,  975000,  962500,  950000,  937500,  925000,  912500,
+	  900000,  900000,  900000,  900000,  900000,  900000,  900000 },
+	{ 600000,  950000,  937500,  925000,  912500,  900000,  900000,  900000,
+	  900000,  900000,  900000,  900000,  900000,  900000,  900000 },
+	{ 500000,  912500,  900000,  900000,  900000,  900000,  900000,  900000,
+	  900000,  900000,  900000,  900000,  900000,  900000,  900000 },
+	{ 400000,  900000,  900000,  900000,  900000,  900000,  900000,  900000,
+	  900000,  900000,  900000,  900000,  900000,  900000,  900000 },
+	{ 300000,  900000,  900000,  900000,  900000,  900000,  900000,  900000,
+	  900000,  900000,  900000,  900000,  900000,  900000,  900000 },
+	{ 200000,  900000,  900000,  900000,  900000,  900000,  900000,  900000,
+	  900000,  900000,  900000,  900000,  900000,  900000,  900000 },
+}, {
+	/* KFC 2 */
+	{ 1500,    1300000, 1300000, 1300000, 1287500, 1287500, 1287500, 1275000,
+	  1262500, 1250000, 1237500, 1225000, 1212500, 1200000, 1187500 },
+	{ 1400,    1275000, 1262500, 1250000, 1237500, 1225000, 1212500, 1200000,
+	  1187500, 1175000, 1162500, 1150000, 1137500, 1125000, 1112500 },
+	{ 1300,    1225000, 1212500, 1200000, 1187500, 1175000, 1162500, 1150000,
+	  1137500, 1125000, 1112500, 1100000, 1087500, 1075000, 1062500 },
+	{ 1200,    1175000, 1162500, 1150000, 1137500, 1125000, 1112500, 1100000,
+	  1087500, 1075000, 1062500, 1050000, 1037500, 1025000, 1012500 },
+	{ 1100,    1137500, 1125000, 1112500, 1100000, 1087500, 1075000, 1062500,
+	  1050000, 1037500, 1025000, 1012500, 1000000, 987500,  975000 },
+	{ 1000,    1100000, 1087500, 1075000, 1062500, 1050000, 1037500, 1025000,
+	  1012500, 1000000, 987500,  975000,  962500,  950000,  937500 },
+	{ 900,     1062500, 1050000, 1037500, 1025000, 1012500, 1000000, 987500,
+	  975000,  962500,  950000,  937500,  925000,  912500,  900000 },
+	{ 800,     1025000, 1012500, 1000000, 987500,  975000,  962500,  950000,
+	  937500,  925000,  912500,  900000,  900000,  900000,  900000 },
+	{ 700,     987500,  975000,  962500,  950000,  937500,  925000,  912500,
+	  900000,  900000,  900000,  900000,  900000,  900000,  900000 },
+	{ 600,     950000,  937500,  925000,  912500,  900000,  900000,  900000,
+	  900000,  900000,  900000,  900000,  900000,  900000,  900000 },
+	{ 500,     912500,  900000,  900000,  900000,  900000,  900000,  900000,
+	  900000,  900000,  900000,  900000,  900000,  900000,  900000 },
+	{ 400,     900000,  900000,  900000,  900000,  900000,  900000,  900000,
+	  900000,  900000,  900000,  900000,  900000,  900000,  900000 },
+	{ 300,     900000,  900000,  900000,  900000,  900000,  900000,  900000,
+	  900000,  900000,  900000,  900000,  900000,  900000,  900000 },
+	{ 200,     900000,  900000,  900000,  900000,  900000,  900000,  900000,
+	  900000,  900000,  900000,  900000,  900000,  900000,  900000 },
+}, {
+	/* KFC 3 */
+	{ 1500,    1300000, 1300000, 1300000, 1287500, 1287500, 1287500, 1275000,
+	  1262500, 1250000, 1237500, 1225000, 1212500, 1200000, 1187500 },
+	{ 1400,    1275000, 1262500, 1250000, 1237500, 1225000, 1212500, 1200000,
+	  1187500, 1175000, 1162500, 1150000, 1137500, 1125000, 1112500 },
+	{ 1300,    1225000, 1212500, 1200000, 1187500, 1175000, 1162500, 1150000,
+	  1137500, 1125000, 1112500, 1100000, 1087500, 1075000, 1062500 },
+	{ 1200,    1175000, 1162500, 1150000, 1137500, 1125000, 1112500, 1100000,
+	  1087500, 1075000, 1062500, 1050000, 1037500, 1025000, 1012500 },
+	{ 1100,    1137500, 1125000, 1112500, 1100000, 1087500, 1075000, 1062500,
+	  1050000, 1037500, 1025000, 1012500, 1000000, 987500,  975000 },
+	{ 1000,    1100000, 1087500, 1075000, 1062500, 1050000, 1037500, 1025000,
+	  1012500, 1000000, 987500,  975000,  962500,  950000,  937500 },
+	{ 900,     1062500, 1050000, 1037500, 1025000, 1012500, 1000000, 987500,
+	  975000,  962500,  950000,  937500,  925000,  912500,  900000 },
+	{ 800,     1025000, 1012500, 1000000, 987500,  975000,  962500,  950000,
+	  937500,  925000,  912500,  900000,  900000,  900000,  900000 },
+	{ 700,     987500,  975000,  962500,  950000,  937500,  925000,  912500,
+	  900000,  900000,  900000,  900000,  900000,  900000,  900000 },
+	{ 600,     950000,  937500,  925000,  912500,  900000,  900000,  900000,
+	  900000,  900000,  900000,  900000,  900000,  900000,  900000 },
+	{ 500,     912500,  900000,  900000,  900000,  900000,  900000,  900000,
+	  900000,  900000,  900000,  900000,  900000,  900000,  900000 },
+	{ 400,     900000,  900000,  900000,  900000,  900000,  900000,  900000,
+	  900000,  900000,  900000,  900000,  900000,  900000,  900000 },
+	{ 300,     900000,  900000,  900000,  900000,  900000,  900000,  900000,
+	  900000,  900000,  900000,  900000,  900000,  900000,  900000 },
+	{ 200,     900000,  900000,  900000,  900000,  900000,  900000,  900000,
+	  900000,  900000,  900000,  900000,  900000,  900000,  900000 },
+}, {
+	/* KFC bin 2 */
+	{ 1300,    1250000, 1237500, 1225000, 1212500, 1200000, 1187500, 1175000,
+	  1162500, 1150000, 1137500, 1125000, 1112500, 1100000, 1087500 },
+	{ 1200,    1200000, 1187500, 1175000, 1162500, 1150000, 1137500, 1125000,
+	  1112500, 1100000, 1087500, 1075000, 1062500, 1050000, 1037500 },
+	{ 1100,    1162500, 1150000, 1137500, 1125000, 1112500, 1100000, 1087500,
+	  1075000, 1062500, 1050000, 1037500, 1025000, 1012500, 1000000 },
+	{ 1000,    1125000, 1112500, 1100000, 1087500, 1075000, 1062500, 1050000,
+	  1037500, 1025000, 1012500, 1000000, 987500,  975000,  962500 },
+	{ 900,     1087500, 1075000, 1062500, 1050000, 1037500, 1025000, 1012500,
+	  1000000, 987500,  975000,  962500,  950000,  937500,  925000 },
+	{ 800,     1050000, 1037500, 1025000, 1012500, 1000000, 987500,  975000,
+	  962500,  950000,  937500,  925000,  912500,  900000,  900000 },
+	{ 700,     1012500, 1000000, 987500,  975000,  962500,  950000,  937500,
+	  925000,  912500,  900000,  900000,  900000,  900000,  900000 },
+	{ 600,     975000,  962500,  950000,  937500,  925000,  912500,  900000,
+	  900000,  900000,  900000,  900000,  900000,  900000,  900000 },
+	{ 500,     937500,  925000,  912500,  900000,  900000,  900000,  900000,
+	  900000,  900000,  900000,  900000,  900000,  900000,  900000 },
+	{ 400,     925000,  912500,  900000,  900000,  900000,  900000,  900000,
+	  900000,  900000,  900000,  900000,  900000,  900000,  900000 },
+	{ 300,     900000,  900000,  900000,  900000,  900000,  900000,  900000,
+	  900000,  900000,  900000,  900000,  900000,  900000,  900000 },
+	{ 200,     900000,  900000,  900000,  900000,  900000,  900000,  900000,
+	  900000,  900000,  900000,  900000,  900000,  900000,  900000 },
+}
+};
+
+static const struct asv_limit_entry __asv_limits[ASV_GROUPS_NUM] = {
+	{ 13, 55 },
+	{ 21, 65 },
+	{ 25, 69 },
+	{ 30, 72 },
+	{ 36, 74 },
+	{ 43, 76 },
+	{ 51, 78 },
+	{ 65, 80 },
+	{ 81, 82 },
+	{ 98, 84 },
+	{ 119, 87 },
+	{ 135, 89 },
+	{ 150, 92 },
+	{ 999, 999 },
+};
+
+static int exynos5422_asv_get_group(struct exynos_asv *asv)
+{
+	unsigned int pkgid_reg, auxi_reg;
+	int hpm, ids, i;
+
+	regmap_read(asv->chipid_regmap, EXYNOS_CHIPID_REG_PKG_ID, &pkgid_reg);
+	regmap_read(asv->chipid_regmap, EXYNOS_CHIPID_REG_AUX_INFO, &auxi_reg);
+
+	if (asv->use_sg) {
+		u32 sga = (pkgid_reg >> EXYNOS5422_SG_A_OFFSET) &
+			   EXYNOS5422_SG_A_MASK;
+
+		u32 sgb = (pkgid_reg >> EXYNOS5422_SG_B_OFFSET) &
+			   EXYNOS5422_SG_B_MASK;
+
+		if ((pkgid_reg >> EXYNOS5422_SG_BSIGN_OFFSET) &
+		     EXYNOS5422_SG_BSIGN_MASK)
+			return sga + sgb;
+		else
+			return sga - sgb;
+	}
+
+	hpm = (auxi_reg >> EXYNOS5422_TMCB_OFFSET) & EXYNOS5422_TMCB_MASK;
+	ids = (pkgid_reg >> EXYNOS5422_IDS_OFFSET) & EXYNOS5422_IDS_MASK;
+
+	for (i = 0; i < ASV_GROUPS_NUM; i++) {
+		if (ids <= __asv_limits[i].ids)
+			break;
+		if (hpm <= __asv_limits[i].hpm)
+			break;
+	}
+	if (i < ASV_GROUPS_NUM)
+		return i;
+
+	return 0;
+}
+
+static int __asv_offset_voltage(unsigned int index)
+{
+	switch (index) {
+	case 1:
+		return 12500;
+	case 2:
+		return 50000;
+	case 3:
+		return 25000;
+	default:
+		return 0;
+	};
+}
+
+static void exynos5422_asv_offset_voltage_setup(struct exynos_asv *asv)
+{
+	struct exynos_asv_subsys *subsys;
+	unsigned int reg, value;
+
+	regmap_read(asv->chipid_regmap, EXYNOS_CHIPID_REG_AUX_INFO, &reg);
+
+	/* ARM offset voltage setup */
+	subsys = &asv->subsys[EXYNOS_ASV_SUBSYS_ID_ARM];
+
+	subsys->base_volt = 1000000;
+
+	value = (reg >> EXYNOS5422_ARM_UP_OFFSET) & EXYNOS5422_ARM_UP_MASK;
+	subsys->offset_volt_h = __asv_offset_voltage(value);
+
+	value = (reg >> EXYNOS5422_ARM_DN_OFFSET) & EXYNOS5422_ARM_DN_MASK;
+	subsys->offset_volt_l = __asv_offset_voltage(value);
+
+	/* KFC offset voltage setup */
+	subsys = &asv->subsys[EXYNOS_ASV_SUBSYS_ID_KFC];
+
+	subsys->base_volt = 1000000;
+
+	value = (reg >> EXYNOS5422_KFC_UP_OFFSET) & EXYNOS5422_KFC_UP_MASK;
+	subsys->offset_volt_h = __asv_offset_voltage(value);
+
+	value = (reg >> EXYNOS5422_KFC_DN_OFFSET) & EXYNOS5422_KFC_DN_MASK;
+	subsys->offset_volt_l = __asv_offset_voltage(value);
+}
+
+static int exynos5422_asv_opp_get_voltage(struct exynos_asv_subsys *subsys,
+				int level, unsigned int volt)
+{
+	unsigned int asv_volt;
+
+	if (level >= subsys->table.num_rows)
+		return volt;
+
+	asv_volt = exynos_asv_opp_get_voltage(subsys, level,
+					      subsys->asv->group);
+
+	if (volt > subsys->base_volt)
+		asv_volt += subsys->offset_volt_h;
+	else
+		asv_volt += subsys->offset_volt_l;
+
+	return asv_volt;
+}
+
+static unsigned int exynos5422_asv_parse_table(struct exynos_asv *asv,
+				      unsigned int pkg_id)
+{
+	return (pkg_id >> EXYNOS5422_TABLE_OFFSET) & EXYNOS5422_TABLE_MASK;
+}
+
+static bool exynos5422_asv_parse_bin2(struct exynos_asv *asv,
+				     unsigned int pkg_id)
+{
+	return (pkg_id >> EXYNOS5422_BIN2_OFFSET) & EXYNOS5422_BIN2_MASK;
+}
+
+static bool exynos5422_asv_parse_sg(struct exynos_asv *asv,
+					unsigned int pkg_id)
+{
+	return ((pkg_id >> EXYNOS5422_USESG_OFFSET) & EXYNOS5422_USESG_MASK);
+}
+
+int exynos5422_asv_init(struct exynos_asv *asv)
+{
+	struct exynos_asv_subsys *subsys;
+	unsigned int table_index;
+	unsigned int pkg_id;
+	bool bin2;
+
+	regmap_read(asv->chipid_regmap, EXYNOS_CHIPID_REG_PKG_ID, &pkg_id);
+
+	if (asv->of_bin == 2) {
+		bin2 = true;
+		asv->use_sg = false;
+	} else {
+		asv->use_sg = exynos5422_asv_parse_sg(asv, pkg_id);
+		bin2 = exynos5422_asv_parse_bin2(asv, pkg_id);
+	}
+
+	asv->group = exynos5422_asv_get_group(asv);
+	asv->table = exynos5422_asv_parse_table(asv, pkg_id);
+
+	exynos5422_asv_offset_voltage_setup(asv);
+
+	if (bin2) {
+		table_index = 3;
+	} else {
+		if (asv->table == 2 || asv->table == 3)
+			table_index = asv->table - 1;
+		else
+			table_index = 0;
+	}
+
+	subsys = &asv->subsys[EXYNOS_ASV_SUBSYS_ID_ARM];
+	subsys->cpu_dt_compat = "arm,cortex-a15";
+	if (bin2)
+		subsys->table.num_rows = ASV_ARM_BIN2_DVFS_NUM;
+	else
+		subsys->table.num_rows = ASV_ARM_DVFS_NUM;
+	subsys->table.num_cols = ASV_GROUPS_NUM + 1;
+	subsys->table.buf = (u32 *)asv_arm_table[table_index];
+
+	subsys = &asv->subsys[EXYNOS_ASV_SUBSYS_ID_KFC];
+	subsys->cpu_dt_compat = "arm,cortex-a7";
+	if (bin2)
+		subsys->table.num_rows = ASV_KFC_BIN2_DVFS_NUM;
+	else
+		subsys->table.num_rows = ASV_KFC_DVFS_NUM;
+	subsys->table.num_cols = ASV_GROUPS_NUM + 1;
+	subsys->table.buf = (u32 *)asv_kfc_table[table_index];
+
+	asv->opp_get_voltage = exynos5422_asv_opp_get_voltage;
+
+	return 0;
+}
diff --git a/drivers/soc/samsung/exynos5422-asv.h b/drivers/soc/samsung/exynos5422-asv.h
new file mode 100644
index 000000000000..d8f108fcc39b
--- /dev/null
+++ b/drivers/soc/samsung/exynos5422-asv.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2019 Samsung Electronics Co., Ltd.
+ *	      http://www.samsung.com/
+ *
+ * Samsung Exynos 5422 SoC Adaptive Supply Voltage support
+ */
+
+#ifndef __EXYNOS5422_ASV_H
+#define __EXYNOS5422_ASV_H
+
+#include <linux/errno.h>
+
+struct exynos_asv;
+
+#ifdef CONFIG_EXYNOS_ASV_ARM
+int exynos5422_asv_init(struct exynos_asv *asv);
+#else
+static inline int exynos5422_asv_init(struct exynos_asv *asv)
+{
+	return -ENOTSUPP;
+}
+#endif
+
+#endif /* __EXYNOS5422_ASV_H */
-- 
2.17.1


^ permalink raw reply related

* [PATCH v2 5/9] ARM64: EXYNOS: enable exynos_chipid for ARCH_EXYNOS
From: Sylwester Nawrocki @ 2019-07-18 14:30 UTC (permalink / raw)
  To: krzk
  Cc: robh+dt, vireshk, devicetree, kgene, pankaj.dubey,
	linux-samsung-soc, linux-arm-kernel, linux-kernel, linux-pm,
	b.zolnierkie, m.szyprowski, Sylwester Nawrocki
In-Reply-To: <20190718143044.25066-1-s.nawrocki@samsung.com>

From: Pankaj Dubey <pankaj.dubey@samsung.com>

This patch enables exynos_chipid driver for ARCH_EXYNOS
based SoC.

Signed-off-by: Pankaj Dubey <pankaj.dubey@samsung.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
---
Changes since v1 (RFC):
 - none
---
 arch/arm64/Kconfig.platforms | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/Kconfig.platforms b/arch/arm64/Kconfig.platforms
index d07fc063c930..e432d26f37d3 100644
--- a/arch/arm64/Kconfig.platforms
+++ b/arch/arm64/Kconfig.platforms
@@ -74,6 +74,7 @@ config ARCH_BRCMSTB
 config ARCH_EXYNOS
 	bool "ARMv8 based Samsung Exynos SoC family"
 	select COMMON_CLK_SAMSUNG
+	select EXYNOS_CHIPID
 	select EXYNOS_PM_DOMAINS if PM_GENERIC_DOMAINS
 	select EXYNOS_PMU
 	select HAVE_S3C2410_WATCHDOG if WATCHDOG
-- 
2.17.1


^ permalink raw reply related

* [PATCH v2 6/9] ARM: EXYNOS: Enable exynos-asv driver for ARCH_EXYNOS
From: Sylwester Nawrocki @ 2019-07-18 14:30 UTC (permalink / raw)
  To: krzk
  Cc: robh+dt, vireshk, devicetree, kgene, pankaj.dubey,
	linux-samsung-soc, linux-arm-kernel, linux-kernel, linux-pm,
	b.zolnierkie, m.szyprowski, Sylwester Nawrocki
In-Reply-To: <20190718143044.25066-1-s.nawrocki@samsung.com>

Enable exynos-asv driver for Exynos 32-bit SoCs.

Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
---
Changes since v1 (RFC):
 - none
---
 arch/arm/mach-exynos/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm/mach-exynos/Kconfig b/arch/arm/mach-exynos/Kconfig
index 6fc4af312361..c6ed153c3151 100644
--- a/arch/arm/mach-exynos/Kconfig
+++ b/arch/arm/mach-exynos/Kconfig
@@ -13,6 +13,7 @@ menuconfig ARCH_EXYNOS
 	select ARM_AMBA
 	select ARM_GIC
 	select COMMON_CLK_SAMSUNG
+	select EXYNOS_ASV
 	select EXYNOS_CHIPID
 	select EXYNOS_THERMAL
 	select EXYNOS_PMU
-- 
2.17.1


^ permalink raw reply related

* [PATCH v2 8/9] ARM: dts: Add "syscon" compatible string to chipid node
From: Sylwester Nawrocki @ 2019-07-18 14:30 UTC (permalink / raw)
  To: krzk
  Cc: robh+dt, vireshk, devicetree, kgene, pankaj.dubey,
	linux-samsung-soc, linux-arm-kernel, linux-kernel, linux-pm,
	b.zolnierkie, m.szyprowski, Sylwester Nawrocki
In-Reply-To: <20190718143044.25066-1-s.nawrocki@samsung.com>

The CHIP ID block in addition to exact chip revision information
contains data and control registers for ASV (Adaptive Supply Voltage)
and ABB (Adaptive Body Bias). Add "syscon" compatible so the CHIPID
block can be shared by respective drivers.

Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
---
Changes since v1 (RFC):
 - new patch
---
 arch/arm/boot/dts/exynos5.dtsi | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm/boot/dts/exynos5.dtsi b/arch/arm/boot/dts/exynos5.dtsi
index 67f9b4504a42..4801ca759feb 100644
--- a/arch/arm/boot/dts/exynos5.dtsi
+++ b/arch/arm/boot/dts/exynos5.dtsi
@@ -35,8 +35,8 @@
 		#size-cells = <1>;
 		ranges;
 
-		chipid@10000000 {
-			compatible = "samsung,exynos4210-chipid";
+		chipid: chipid@10000000 {
+			compatible = "samsung,exynos4210-chipid", "syscon";
 			reg = <0x10000000 0x100>;
 		};
 
-- 
2.17.1


^ permalink raw reply related

* [PATCH v2 7/9] soc: samsung: Update the CHIP ID DT binding documentation
From: Sylwester Nawrocki @ 2019-07-18 14:30 UTC (permalink / raw)
  To: krzk
  Cc: robh+dt, vireshk, devicetree, kgene, pankaj.dubey,
	linux-samsung-soc, linux-arm-kernel, linux-kernel, linux-pm,
	b.zolnierkie, m.szyprowski, Sylwester Nawrocki
In-Reply-To: <20190718143044.25066-1-s.nawrocki@samsung.com>

This patch adds documentation of a new optional "samsung,asv-bin"
property in the chipid device node and documents requirement of
"syscon" compatible string.  These additions are needed to support
Exynos ASV (Adaptive Supply Voltage) feature.

Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
---
Changes since v1 (RFC):
 - new patch
---
 .../devicetree/bindings/arm/samsung/exynos-chipid.txt  | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/Documentation/devicetree/bindings/arm/samsung/exynos-chipid.txt b/Documentation/devicetree/bindings/arm/samsung/exynos-chipid.txt
index 85c5dfd4a720..be3657e6c00c 100644
--- a/Documentation/devicetree/bindings/arm/samsung/exynos-chipid.txt
+++ b/Documentation/devicetree/bindings/arm/samsung/exynos-chipid.txt
@@ -1,12 +1,18 @@
-SAMSUNG Exynos SoCs Chipid driver.
+SAMSUNG Exynos SoC series CHIPID subsystem
 
 Required properties:
-- compatible : Should at least contain "samsung,exynos4210-chipid".
+- compatible : Should at least contain "samsung,exynos4210-chipid", "syscon".
 
 - reg: offset and length of the register set
 
+Optional properties:
+ - samsung,asv-bin : Adaptive Supply Voltage bin selection. This can be used
+   to determine the ASV bin of an SoC if respective information is missing
+   in the CHIPID registers or in the OTP memory. Possible values: 0...3.
+
 Example:
 	chipid@10000000 {
 		compatible = "samsung,exynos4210-chipid";
 		reg = <0x10000000 0x100>;
+		samsung,asv-bin = <2>;
 	};
-- 
2.17.1


^ permalink raw reply related

* [PATCH v2 9/9] ARM: dts: Add samsung,asv-bin property for odroidxu3-lite
From: Sylwester Nawrocki @ 2019-07-18 14:30 UTC (permalink / raw)
  To: krzk
  Cc: robh+dt, vireshk, devicetree, kgene, pankaj.dubey,
	linux-samsung-soc, linux-arm-kernel, linux-kernel, linux-pm,
	b.zolnierkie, m.szyprowski, Sylwester Nawrocki
In-Reply-To: <20190718143044.25066-1-s.nawrocki@samsung.com>

The Exynos5422 SoC used on Odroid XU3 Lite boards belongs to
a special ASV bin but this information cannot be read from the
CHIPID block registers. Add samsung,asv-bin property for XU3
Lite to ensure the ASV bin is properly determined.

Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
---
Changes since v1 (RFC):
 - new patch
---
 arch/arm/boot/dts/exynos5422-odroidxu3-lite.dts | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/arm/boot/dts/exynos5422-odroidxu3-lite.dts b/arch/arm/boot/dts/exynos5422-odroidxu3-lite.dts
index c19b5a51ca44..a31ca2ef750f 100644
--- a/arch/arm/boot/dts/exynos5422-odroidxu3-lite.dts
+++ b/arch/arm/boot/dts/exynos5422-odroidxu3-lite.dts
@@ -26,6 +26,10 @@
 	status = "disabled";
 };
 
+&chipid {
+	samsung,asv-bin = <2>;
+};
+
 &pwm {
 	/*
 	 * PWM 0 -- fan
-- 
2.17.1


^ permalink raw reply related

* [PATCH v2 1/9] soc: samsung: Add exynos chipid driver support
From: Sylwester Nawrocki @ 2019-07-18 14:30 UTC (permalink / raw)
  To: krzk
  Cc: robh+dt, vireshk, devicetree, kgene, pankaj.dubey,
	linux-samsung-soc, linux-arm-kernel, linux-kernel, linux-pm,
	b.zolnierkie, m.szyprowski, Sylwester Nawrocki
In-Reply-To: <20190718143044.25066-1-s.nawrocki@samsung.com>

From: Pankaj Dubey <pankaj.dubey@samsung.com>

Exynos SoCs have Chipid, for identification of product IDs and SoC
revisions. This patch intends to provide initialization code for all
these functionalities, at the same time it provides some sysfs entries
for accessing these information to user-space.

This driver uses existing binding for exynos-chipid.

Changes by Bartlomiej:
- fixed return values on errors
- removed bogus kfree_const()
- added missing Exynos4210 EVT0 id
- converted code to use EXYNOS_MASK define
- fixed np use after of_node_put()
- fixed too early use of dev_info()
- made driver fail for unknown SoC-s
- added SPDX tag
- updated Copyrights

Signed-off-by: Pankaj Dubey <pankaj.dubey@samsung.com>
[m.szyprowski: for suggestion and code snippet of product_id_to_soc_id]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
[s.nawrocki: updated copyright date]
Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
---
 drivers/soc/samsung/Kconfig         |   5 ++
 drivers/soc/samsung/Makefile        |   2 +
 drivers/soc/samsung/exynos-chipid.c | 111 ++++++++++++++++++++++++++++
 3 files changed, 118 insertions(+)
 create mode 100644 drivers/soc/samsung/exynos-chipid.c

diff --git a/drivers/soc/samsung/Kconfig b/drivers/soc/samsung/Kconfig
index 2186285fda92..2905f5262197 100644
--- a/drivers/soc/samsung/Kconfig
+++ b/drivers/soc/samsung/Kconfig
@@ -7,6 +7,11 @@ menuconfig SOC_SAMSUNG
 
 if SOC_SAMSUNG
 
+config EXYNOS_CHIPID
+	bool "Exynos Chipid controller driver" if COMPILE_TEST
+	depends on ARCH_EXYNOS || COMPILE_TEST
+	select SOC_BUS
+
 config EXYNOS_PMU
 	bool "Exynos PMU controller driver" if COMPILE_TEST
 	depends on ARCH_EXYNOS || ((ARM || ARM64) && COMPILE_TEST)
diff --git a/drivers/soc/samsung/Makefile b/drivers/soc/samsung/Makefile
index 29f294baac6e..3b6a8797416c 100644
--- a/drivers/soc/samsung/Makefile
+++ b/drivers/soc/samsung/Makefile
@@ -1,4 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
+
+obj-$(CONFIG_EXYNOS_CHIPID)	+= exynos-chipid.o
 obj-$(CONFIG_EXYNOS_PMU)	+= exynos-pmu.o
 
 obj-$(CONFIG_EXYNOS_PMU_ARM_DRIVERS)	+= exynos3250-pmu.o exynos4-pmu.o \
diff --git a/drivers/soc/samsung/exynos-chipid.c b/drivers/soc/samsung/exynos-chipid.c
new file mode 100644
index 000000000000..78b123ee60c0
--- /dev/null
+++ b/drivers/soc/samsung/exynos-chipid.c
@@ -0,0 +1,111 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2019 Samsung Electronics Co., Ltd.
+ *	      http://www.samsung.com/
+ *
+ * EXYNOS - CHIP ID support
+ * Author: Pankaj Dubey <pankaj.dubey@samsung.com>
+ * Author: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
+ */
+
+#include <linux/io.h>
+#include <linux/of.h>
+#include <linux/of_address.h>
+#include <linux/of_platform.h>
+#include <linux/platform_device.h>
+#include <linux/slab.h>
+#include <linux/sys_soc.h>
+
+#define EXYNOS_SUBREV_MASK	(0xF << 4)
+#define EXYNOS_MAINREV_MASK	(0xF << 0)
+#define EXYNOS_REV_MASK		(EXYNOS_SUBREV_MASK | EXYNOS_MAINREV_MASK)
+#define EXYNOS_MASK		0xFFFFF000
+
+static const struct exynos_soc_id {
+	const char *name;
+	unsigned int id;
+} soc_ids[] = {
+	{ "EXYNOS3250", 0xE3472000 },
+	{ "EXYNOS4210", 0x43200000 },	/* EVT0 revision */
+	{ "EXYNOS4210", 0x43210000 },
+	{ "EXYNOS4212", 0x43220000 },
+	{ "EXYNOS4412", 0xE4412000 },
+	{ "EXYNOS5250", 0x43520000 },
+	{ "EXYNOS5260", 0xE5260000 },
+	{ "EXYNOS5410", 0xE5410000 },
+	{ "EXYNOS5420", 0xE5420000 },
+	{ "EXYNOS5440", 0xE5440000 },
+	{ "EXYNOS5800", 0xE5422000 },
+	{ "EXYNOS7420", 0xE7420000 },
+	{ "EXYNOS5433", 0xE5433000 },
+};
+
+static const char * __init product_id_to_soc_id(unsigned int product_id)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(soc_ids); i++)
+		if ((product_id & EXYNOS_MASK) == soc_ids[i].id)
+			return soc_ids[i].name;
+	return NULL;
+}
+
+int __init exynos_chipid_early_init(void)
+{
+	struct soc_device_attribute *soc_dev_attr;
+	void __iomem *exynos_chipid_base;
+	struct soc_device *soc_dev;
+	struct device_node *root;
+	struct device_node *np;
+	u32 product_id;
+	u32 revision;
+
+	/* look up for chipid node */
+	np = of_find_compatible_node(NULL, NULL, "samsung,exynos4210-chipid");
+	if (!np)
+		return -ENODEV;
+
+	exynos_chipid_base = of_iomap(np, 0);
+	of_node_put(np);
+
+	if (!exynos_chipid_base) {
+		pr_err("Failed to map SoC chipid\n");
+		return -ENXIO;
+	}
+
+	product_id = readl_relaxed(exynos_chipid_base);
+	revision = product_id & EXYNOS_REV_MASK;
+	iounmap(exynos_chipid_base);
+
+	soc_dev_attr = kzalloc(sizeof(*soc_dev_attr), GFP_KERNEL);
+	if (!soc_dev_attr)
+		return -ENOMEM;
+
+	soc_dev_attr->family = "Samsung Exynos";
+
+	root = of_find_node_by_path("/");
+	of_property_read_string(root, "model", &soc_dev_attr->machine);
+	of_node_put(root);
+
+	soc_dev_attr->revision = kasprintf(GFP_KERNEL, "%x", revision);
+	soc_dev_attr->soc_id = product_id_to_soc_id(product_id);
+	if (!soc_dev_attr->soc_id) {
+		pr_err("Unknown SoC\n");
+		return -ENODEV;
+	}
+
+	/* please note that the actual registration will be deferred */
+	soc_dev = soc_device_register(soc_dev_attr);
+	if (IS_ERR(soc_dev)) {
+		kfree(soc_dev_attr->revision);
+		kfree(soc_dev_attr);
+		return PTR_ERR(soc_dev);
+	}
+
+	/* it is too early to use dev_info() here (soc_dev is NULL) */
+	pr_info("soc soc0: Exynos: CPU[%s] PRO_ID[0x%x] REV[0x%x] Detected\n",
+		soc_dev_attr->soc_id, product_id, revision);
+
+	return 0;
+}
+early_initcall(exynos_chipid_early_init);
-- 
2.17.1


^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox