Linux Power Management development
 help / color / mirror / Atom feed
* Re: [PATCH 9/9 v2] cgroup_freezer: implement proper hierarchy support
From: Michal Hocko @ 2012-11-08 14:08 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-pm-u79uwXL29TY76Z2rM5mHXA, fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, rjw-KKrjLPT3xs0,
	cgroups-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20121107163919.GC2660-9pTldWuhBndy/B6EtB590w@public.gmane.org>

On Wed 07-11-12 08:39:19, Tejun Heo wrote:
[...]
> --- a/kernel/cgroup_freezer.c
> +++ b/kernel/cgroup_freezer.c
[...]
> @@ -320,14 +388,39 @@ static void freezer_apply_state(struct f
>   * @freezer: freezer of interest
>   * @freeze: whether to freeze or thaw
>   *
> - * Freeze or thaw @cgroup according to @freeze.
> + * Freeze or thaw @freezer according to @freeze.  The operations are
> + * recursive - all descendants of @freezer will be affected.
>   */
>  static void freezer_change_state(struct freezer *freezer, bool freeze)
>  {
> +	struct cgroup *pos;
> +
>  	/* update @freezer */
>  	spin_lock_irq(&freezer->lock);
>  	freezer_apply_state(freezer, freeze, CGROUP_FREEZING_SELF);
>  	spin_unlock_irq(&freezer->lock);
> +
> +	/*
> +	 * Update all its descendants in pre-order traversal.  Each
> +	 * descendant will try to inherit its parent's FREEZING state as
> +	 * CGROUP_FREEZING_PARENT.
> +	 */
> +	rcu_read_lock();
> +	cgroup_for_each_descendant_pre(pos, freezer->css.cgroup) {
> +		struct freezer *pos_f = cgroup_freezer(pos);
> +		struct freezer *parent = parent_freezer(pos_f);
> +
> +		/*
> +		 * Our update to @parent->state is already visible which is
> +		 * all we need.  No need to lock @parent.  For more info on
> +		 * synchronization, see freezer_post_create().
> +		 */
> +		spin_lock_irq(&pos_f->lock);
> +		freezer_apply_state(pos_f, parent->state & CGROUP_FREEZING,
> +				    CGROUP_FREEZING_PARENT);
> +		spin_unlock_irq(&pos_f->lock);
> +	}
> +	rcu_read_unlock();
>  }

This seems to be racy because parent->state access is not linearized.
Say we have parallel freeze and thawing on a tree like the following:
	A
	|
	B
        |
        C 

pre_order will visit them in B, C order.
	CPU1						CPU2
freezer_apply_state(A, true)
A->state & FREEZING == true		freezer_apply_state(A, false)
					A->state & FREEZING == false
					freezer_apply_state(B, false)
					B->state & FREEZING == false
freezer_apply_state(B, true)

B->state & FREEZING == true
freezer_apply_state(C, true)
					freezer_apply_state(C, false)

So A, C are thawed while B is frozen. Or am I missing something which
would prevent from this kind of race?

The easy fix is to move spin_unlock_irq(&freezer->lock); after
rcu_read_unlock.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply

* Re: [PATCH 8/9] cgroup_freezer: add ->post_create() and ->pre_destroy() and track online state
From: Michal Hocko @ 2012-11-08 13:23 UTC (permalink / raw)
  To: Tejun Heo
  Cc: lizefan, rjw, containers, cgroups, linux-kernel, linux-pm,
	fweisbec
In-Reply-To: <1351931915-1701-9-git-send-email-tj@kernel.org>

On Sat 03-11-12 01:38:34, Tejun Heo wrote:
> A cgroup is online and visible to iteration between ->post_create()
> and ->pre_destroy().  This patch introduces CGROUP_FREEZER_ONLINE and
> toggles it from the newly added freezer_post_create() and
> freezer_pre_destroy() while holding freezer->lock such that a
> cgroup_freezer can be reilably distinguished to be online.  This will
> be used by full hierarchy support.

I am thinking whether freezer_pre_destroy is really needed. Once we
reach pre_destroy then there are no tasks nor any children in the group
so there is nobody to wake up if the group was frozen and the destroy
callback is called after synchronize_rcu so the traversing should be
safe.

> ONLINE test is added to freezer_apply_state() but it currently doesn't
> make any difference as freezer_write() can only be called for an
> online cgroup.
> 
> Adjusting system_freezing_cnt on destruction is moved from
> freezer_destroy() to the new freezer_pre_destroy() for consistency.
> 
> This patch doesn't introduce any noticeable behavior change.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>

Reviewed-by: Michal Hocko <mhocko@suse.cz>

> ---
>  kernel/cgroup_freezer.c | 42 ++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 40 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
> index b8ad93c..4f12d31 100644
> --- a/kernel/cgroup_freezer.c
> +++ b/kernel/cgroup_freezer.c
> @@ -23,6 +23,7 @@
>  #include <linux/seq_file.h>
>  
>  enum freezer_state_flags {
> +	CGROUP_FREEZER_ONLINE	= (1 << 0), /* freezer is fully online */
>  	CGROUP_FREEZING_SELF	= (1 << 1), /* this freezer is freezing */
>  	CGROUP_FREEZING_PARENT	= (1 << 2), /* the parent freezer is freezing */
>  	CGROUP_FROZEN		= (1 << 3), /* this and its descendants frozen */
> @@ -98,13 +99,45 @@ static struct cgroup_subsys_state *freezer_create(struct cgroup *cgroup)
>  	return &freezer->css;
>  }
>  
> -static void freezer_destroy(struct cgroup *cgroup)
> +/**
> + * freezer_post_create - commit creation of a freezer cgroup
> + * @cgroup: cgroup being created
> + *
> + * We're committing to creation of @cgroup.  Mark it online.
> + */
> +static void freezer_post_create(struct cgroup *cgroup)
>  {
>  	struct freezer *freezer = cgroup_freezer(cgroup);
>  
> +	spin_lock_irq(&freezer->lock);
> +	freezer->state |= CGROUP_FREEZER_ONLINE;
> +	spin_unlock_irq(&freezer->lock);
> +}
> +
> +/**
> + * freezer_pre_destroy - initiate destruction of @cgroup
> + * @cgroup: cgroup being destroyed
> + *
> + * @cgroup is going away.  Mark it dead and decrement system_freezing_count
> + * if it was holding one.
> + */
> +static void freezer_pre_destroy(struct cgroup *cgroup)
> +{
> +	struct freezer *freezer = cgroup_freezer(cgroup);
> +
> +	spin_lock_irq(&freezer->lock);
> +
>  	if (freezer->state & CGROUP_FREEZING)
>  		atomic_dec(&system_freezing_cnt);
> -	kfree(freezer);
> +
> +	freezer->state = 0;
> +
> +	spin_unlock_irq(&freezer->lock);
> +}
> +
> +static void freezer_destroy(struct cgroup *cgroup)
> +{
> +	kfree(cgroup_freezer(cgroup));
>  }
>  
>  /*
> @@ -260,6 +293,9 @@ static void freezer_apply_state(struct freezer *freezer, bool freeze,
>  	/* also synchronizes against task migration, see freezer_attach() */
>  	lockdep_assert_held(&freezer->lock);
>  
> +	if (!(freezer->state & CGROUP_FREEZER_ONLINE))
> +		return;
> +
>  	if (freeze) {
>  		if (!(freezer->state & CGROUP_FREEZING))
>  			atomic_inc(&system_freezing_cnt);
> @@ -347,6 +383,8 @@ static struct cftype files[] = {
>  struct cgroup_subsys freezer_subsys = {
>  	.name		= "freezer",
>  	.create		= freezer_create,
> +	.post_create	= freezer_post_create,
> +	.pre_destroy	= freezer_pre_destroy,
>  	.destroy	= freezer_destroy,
>  	.subsys_id	= freezer_subsys_id,
>  	.attach		= freezer_attach,
> -- 
> 1.7.11.7
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply

* Re: [PATCH 7/9] cgroup_freezer: introduce CGROUP_FREEZING_[SELF|PARENT]
From: Michal Hocko @ 2012-11-08 12:47 UTC (permalink / raw)
  To: Tejun Heo
  Cc: lizefan, rjw, containers, cgroups, linux-kernel, linux-pm,
	fweisbec
In-Reply-To: <1351931915-1701-8-git-send-email-tj@kernel.org>

On Sat 03-11-12 01:38:33, Tejun Heo wrote:
> Introduce FREEZING_SELF and FREEZING_PARENT and make FREEZING OR of
> the two flags.  This is to prepare for full hierarchy support.
> 
> freezer_apply_date() is updated such that it can handle setting and
> clearing of both flags.  The two flags are also exposed to userland
> via read-only files self_freezing and parent_freezing.
> 
> Other than the added cgroupfs files, this patch doesn't introduce any
> behavior change.

OK, I can imagine that this might be useful to find the first parent
group that needs to be thawed to unfreeze the group I am interested
in. Could you also update Documentation/cgroups/freezer-subsystem.txt to
clarify the intended usage?

Minor nit. Same as mentioned in the previous patch freezer_apply_state
should get enum freezer_state_flags state parameter.

> Signed-off-by: Tejun Heo <tj@kernel.org>

Reviewed-by: Michal Hocko <mhocko@suse.cz>

> ---
>  kernel/cgroup_freezer.c | 55 ++++++++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 47 insertions(+), 8 deletions(-)
> 
> diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
> index e76aa9f..b8ad93c 100644
> --- a/kernel/cgroup_freezer.c
> +++ b/kernel/cgroup_freezer.c
> @@ -23,8 +23,12 @@
>  #include <linux/seq_file.h>
>  
>  enum freezer_state_flags {
> -	CGROUP_FREEZING		= (1 << 1), /* this freezer is freezing */
> +	CGROUP_FREEZING_SELF	= (1 << 1), /* this freezer is freezing */
> +	CGROUP_FREEZING_PARENT	= (1 << 2), /* the parent freezer is freezing */
>  	CGROUP_FROZEN		= (1 << 3), /* this and its descendants frozen */
> +
> +	/* mask for all FREEZING flags */
> +	CGROUP_FREEZING		= CGROUP_FREEZING_SELF | CGROUP_FREEZING_PARENT,
>  };
>  
>  struct freezer {
> @@ -245,8 +249,13 @@ static void unfreeze_cgroup(struct freezer *freezer)
>   * freezer_apply_state - apply state change to a single cgroup_freezer
>   * @freezer: freezer to apply state change to
>   * @freeze: whether to freeze or unfreeze
> + * @state: CGROUP_FREEZING_* flag to set or clear
> + *
> + * Set or clear @state on @cgroup according to @freeze, and perform
> + * freezing or thawing as necessary.
>   */
> -static void freezer_apply_state(struct freezer *freezer, bool freeze)
> +static void freezer_apply_state(struct freezer *freezer, bool freeze,
> +				unsigned int state)
>  {
>  	/* also synchronizes against task migration, see freezer_attach() */
>  	lockdep_assert_held(&freezer->lock);
> @@ -254,13 +263,19 @@ static void freezer_apply_state(struct freezer *freezer, bool freeze)
>  	if (freeze) {
>  		if (!(freezer->state & CGROUP_FREEZING))
>  			atomic_inc(&system_freezing_cnt);
> -		freezer->state |= CGROUP_FREEZING;
> +		freezer->state |= state;
>  		freeze_cgroup(freezer);
>  	} else {
> -		if (freezer->state & CGROUP_FREEZING)
> -			atomic_dec(&system_freezing_cnt);
> -		freezer->state &= ~(CGROUP_FREEZING | CGROUP_FROZEN);
> -		unfreeze_cgroup(freezer);
> +		bool was_freezing = freezer->state & CGROUP_FREEZING;
> +
> +		freezer->state &= ~state;
> +
> +		if (!(freezer->state & CGROUP_FREEZING)) {
> +			if (was_freezing)
> +				atomic_dec(&system_freezing_cnt);
> +			freezer->state &= ~CGROUP_FROZEN;
> +			unfreeze_cgroup(freezer);
> +		}
>  	}
>  }
>  
> @@ -275,7 +290,7 @@ static void freezer_change_state(struct freezer *freezer, bool freeze)
>  {
>  	/* update @freezer */
>  	spin_lock_irq(&freezer->lock);
> -	freezer_apply_state(freezer, freeze);
> +	freezer_apply_state(freezer, freeze, CGROUP_FREEZING_SELF);
>  	spin_unlock_irq(&freezer->lock);
>  }
>  
> @@ -295,6 +310,20 @@ static int freezer_write(struct cgroup *cgroup, struct cftype *cft,
>  	return 0;
>  }
>  
> +static u64 freezer_self_freezing_read(struct cgroup *cgroup, struct cftype *cft)
> +{
> +	struct freezer *freezer = cgroup_freezer(cgroup);
> +
> +	return (bool)(freezer->state & CGROUP_FREEZING_SELF);
> +}
> +
> +static u64 freezer_parent_freezing_read(struct cgroup *cgroup, struct cftype *cft)
> +{
> +	struct freezer *freezer = cgroup_freezer(cgroup);
> +
> +	return (bool)(freezer->state & CGROUP_FREEZING_PARENT);
> +}
> +
>  static struct cftype files[] = {
>  	{
>  		.name = "state",
> @@ -302,6 +331,16 @@ static struct cftype files[] = {
>  		.read_seq_string = freezer_read,
>  		.write_string = freezer_write,
>  	},
> +	{
> +		.name = "self_freezing",
> +		.flags = CFTYPE_NOT_ON_ROOT,
> +		.read_u64 = freezer_self_freezing_read,
> +	},
> +	{
> +		.name = "parent_freezing",
> +		.flags = CFTYPE_NOT_ON_ROOT,
> +		.read_u64 = freezer_parent_freezing_read,
> +	},
>  	{ }	/* terminate */
>  };
>  
> -- 
> 1.7.11.7
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply

* Re: [PATCH 6/9] cgroup_freezer: make freezer->state mask of flags
From: Michal Hocko @ 2012-11-08 10:39 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-pm-u79uwXL29TY76Z2rM5mHXA, fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, rjw-KKrjLPT3xs0,
	cgroups-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1351931915-1701-7-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

On Sat 03-11-12 01:38:32, Tejun Heo wrote:
> freezer->state was an enum value - one of THAWED, FREEZING and FROZEN.
> As the scheduled full hierarchy support requires more than one
> freezing condition, switch it to mask of flags.  If FREEZING is not
> set, it's thawed.  FREEZING is set if freezing or frozen.  If frozen,
> both FREEZING and FROZEN are set.  Now that tasks can be attached to
> an already frozen cgroup, this also makes freezing condition checks
> more natural.
> 
> This patch doesn't introduce any behavior change.
> 
> Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

I think it would be nicer to use freezer_state_flags enum rather than
unsigned int for the state. I would even expect gcc to complain about
that but it looks like -fstrict-enums is c++ specific (so long enum
safety).

Anyway
Reviewed-by: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>

> ---
>  kernel/cgroup_freezer.c | 60 ++++++++++++++++++++++---------------------------
>  1 file changed, 27 insertions(+), 33 deletions(-)
> 
> diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
> index 2690830..e76aa9f 100644
> --- a/kernel/cgroup_freezer.c
> +++ b/kernel/cgroup_freezer.c
> @@ -22,15 +22,14 @@
>  #include <linux/freezer.h>
>  #include <linux/seq_file.h>
>  
> -enum freezer_state {
> -	CGROUP_THAWED = 0,
> -	CGROUP_FREEZING,
> -	CGROUP_FROZEN,
> +enum freezer_state_flags {
> +	CGROUP_FREEZING		= (1 << 1), /* this freezer is freezing */
> +	CGROUP_FROZEN		= (1 << 3), /* this and its descendants frozen */
>  };
>  
>  struct freezer {
>  	struct cgroup_subsys_state	css;
> -	enum freezer_state		state;
> +	unsigned int			state;
>  	spinlock_t			lock;
>  };
>  
> @@ -48,12 +47,10 @@ static inline struct freezer *task_freezer(struct task_struct *task)
>  
>  bool cgroup_freezing(struct task_struct *task)
>  {
> -	enum freezer_state state;
>  	bool ret;
>  
>  	rcu_read_lock();
> -	state = task_freezer(task)->state;
> -	ret = state == CGROUP_FREEZING || state == CGROUP_FROZEN;
> +	ret = task_freezer(task)->state & CGROUP_FREEZING;
>  	rcu_read_unlock();
>  
>  	return ret;
> @@ -63,10 +60,13 @@ bool cgroup_freezing(struct task_struct *task)
>   * cgroups_write_string() limits the size of freezer state strings to
>   * CGROUP_LOCAL_BUFFER_SIZE
>   */
> -static const char *freezer_state_strs[] = {
> -	"THAWED",
> -	"FREEZING",
> -	"FROZEN",
> +static const char *freezer_state_strs(unsigned int state)
> +{
> +	if (state & CGROUP_FROZEN)
> +		return "FROZEN";
> +	if (state & CGROUP_FREEZING)
> +		return "FREEZING";
> +	return "THAWED";
>  };
>  
>  /*
> @@ -91,7 +91,6 @@ static struct cgroup_subsys_state *freezer_create(struct cgroup *cgroup)
>  		return ERR_PTR(-ENOMEM);
>  
>  	spin_lock_init(&freezer->lock);
> -	freezer->state = CGROUP_THAWED;
>  	return &freezer->css;
>  }
>  
> @@ -99,7 +98,7 @@ static void freezer_destroy(struct cgroup *cgroup)
>  {
>  	struct freezer *freezer = cgroup_freezer(cgroup);
>  
> -	if (freezer->state != CGROUP_THAWED)
> +	if (freezer->state & CGROUP_FREEZING)
>  		atomic_dec(&system_freezing_cnt);
>  	kfree(freezer);
>  }
> @@ -129,15 +128,13 @@ static void freezer_attach(struct cgroup *new_cgrp, struct cgroup_taskset *tset)
>  	 * Tasks in @tset are on @new_cgrp but may not conform to its
>  	 * current state before executing the following - !frozen tasks may
>  	 * be visible in a FROZEN cgroup and frozen tasks in a THAWED one.
> -	 * This means that, to determine whether to freeze, one should test
> -	 * whether the state equals THAWED.
>  	 */
>  	cgroup_taskset_for_each(task, new_cgrp, tset) {
> -		if (freezer->state == CGROUP_THAWED) {
> +		if (!(freezer->state & CGROUP_FREEZING)) {
>  			__thaw_task(task);
>  		} else {
>  			freeze_task(task);
> -			freezer->state = CGROUP_FREEZING;
> +			freezer->state &= ~CGROUP_FROZEN;
>  		}
>  	}
>  
> @@ -159,11 +156,7 @@ static void freezer_fork(struct task_struct *task)
>  		goto out;
>  
>  	spin_lock_irq(&freezer->lock);
> -	/*
> -	 * @task might have been just migrated into a FROZEN cgroup.  Test
> -	 * equality with THAWED.  Read the comment in freezer_attach().
> -	 */
> -	if (freezer->state != CGROUP_THAWED)
> +	if (freezer->state & CGROUP_FREEZING)
>  		freeze_task(task);
>  	spin_unlock_irq(&freezer->lock);
>  out:
> @@ -184,7 +177,8 @@ static void update_if_frozen(struct freezer *freezer)
>  	struct cgroup_iter it;
>  	struct task_struct *task;
>  
> -	if (freezer->state != CGROUP_FREEZING)
> +	if (!(freezer->state & CGROUP_FREEZING) ||
> +	    (freezer->state & CGROUP_FROZEN))
>  		return;
>  
>  	cgroup_iter_start(cgroup, &it);
> @@ -202,7 +196,7 @@ static void update_if_frozen(struct freezer *freezer)
>  		}
>  	}
>  
> -	freezer->state = CGROUP_FROZEN;
> +	freezer->state |= CGROUP_FROZEN;
>  notyet:
>  	cgroup_iter_end(cgroup, &it);
>  }
> @@ -211,14 +205,14 @@ static int freezer_read(struct cgroup *cgroup, struct cftype *cft,
>  			struct seq_file *m)
>  {
>  	struct freezer *freezer = cgroup_freezer(cgroup);
> -	enum freezer_state state;
> +	unsigned int state;
>  
>  	spin_lock_irq(&freezer->lock);
>  	update_if_frozen(freezer);
>  	state = freezer->state;
>  	spin_unlock_irq(&freezer->lock);
>  
> -	seq_puts(m, freezer_state_strs[state]);
> +	seq_puts(m, freezer_state_strs(state));
>  	seq_putc(m, '\n');
>  	return 0;
>  }
> @@ -258,14 +252,14 @@ static void freezer_apply_state(struct freezer *freezer, bool freeze)
>  	lockdep_assert_held(&freezer->lock);
>  
>  	if (freeze) {
> -		if (freezer->state == CGROUP_THAWED)
> +		if (!(freezer->state & CGROUP_FREEZING))
>  			atomic_inc(&system_freezing_cnt);
> -		freezer->state = CGROUP_FREEZING;
> +		freezer->state |= CGROUP_FREEZING;
>  		freeze_cgroup(freezer);
>  	} else {
> -		if (freezer->state != CGROUP_THAWED)
> +		if (freezer->state & CGROUP_FREEZING)
>  			atomic_dec(&system_freezing_cnt);
> -		freezer->state = CGROUP_THAWED;
> +		freezer->state &= ~(CGROUP_FREEZING | CGROUP_FROZEN);
>  		unfreeze_cgroup(freezer);
>  	}
>  }
> @@ -290,9 +284,9 @@ static int freezer_write(struct cgroup *cgroup, struct cftype *cft,
>  {
>  	bool freeze;
>  
> -	if (strcmp(buffer, freezer_state_strs[CGROUP_THAWED]) == 0)
> +	if (strcmp(buffer, freezer_state_strs(0)) == 0)
>  		freeze = false;
> -	else if (strcmp(buffer, freezer_state_strs[CGROUP_FROZEN]) == 0)
> +	else if (strcmp(buffer, freezer_state_strs(CGROUP_FROZEN)) == 0)
>  		freeze = true;
>  	else
>  		return -EINVAL;
> -- 
> 1.7.11.7
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply

* Re: [PATCH 5/9] cgroup_freezer: prepare freezer_change_state() for full hierarchy support
From: Michal Hocko @ 2012-11-08  9:56 UTC (permalink / raw)
  To: Tejun Heo
  Cc: lizefan, rjw, containers, cgroups, linux-kernel, linux-pm,
	fweisbec
In-Reply-To: <1351931915-1701-6-git-send-email-tj@kernel.org>

On Sat 03-11-12 01:38:31, Tejun Heo wrote:
> * Make freezer_change_state() take bool @freeze instead of enum
>   freezer_state.
> 
> * Separate out freezer_apply_state() out of freezer_change_state().
>   This makes freezer_change_state() a rather silly thin wrapper.  It
>   will be filled with hierarchy handling later on.
> 
> This patch doesn't introduce any behavior change.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>

Makes sense
Reviewed-by: Michal Hocko <mhocko@suse.cz>

> ---
>  kernel/cgroup_freezer.c | 48 ++++++++++++++++++++++++++++++------------------
>  1 file changed, 30 insertions(+), 18 deletions(-)
> 
> diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
> index 975b3d8..2690830 100644
> --- a/kernel/cgroup_freezer.c
> +++ b/kernel/cgroup_freezer.c
> @@ -247,45 +247,57 @@ static void unfreeze_cgroup(struct freezer *freezer)
>  	cgroup_iter_end(cgroup, &it);
>  }
>  
> -static void freezer_change_state(struct freezer *freezer,
> -				 enum freezer_state goal_state)
> +/**
> + * freezer_apply_state - apply state change to a single cgroup_freezer
> + * @freezer: freezer to apply state change to
> + * @freeze: whether to freeze or unfreeze
> + */
> +static void freezer_apply_state(struct freezer *freezer, bool freeze)
>  {
>  	/* also synchronizes against task migration, see freezer_attach() */
> -	spin_lock_irq(&freezer->lock);
> +	lockdep_assert_held(&freezer->lock);
>  
> -	switch (goal_state) {
> -	case CGROUP_THAWED:
> -		if (freezer->state != CGROUP_THAWED)
> -			atomic_dec(&system_freezing_cnt);
> -		freezer->state = CGROUP_THAWED;
> -		unfreeze_cgroup(freezer);
> -		break;
> -	case CGROUP_FROZEN:
> +	if (freeze) {
>  		if (freezer->state == CGROUP_THAWED)
>  			atomic_inc(&system_freezing_cnt);
>  		freezer->state = CGROUP_FREEZING;
>  		freeze_cgroup(freezer);
> -		break;
> -	default:
> -		BUG();
> +	} else {
> +		if (freezer->state != CGROUP_THAWED)
> +			atomic_dec(&system_freezing_cnt);
> +		freezer->state = CGROUP_THAWED;
> +		unfreeze_cgroup(freezer);
>  	}
> +}
>  
> +/**
> + * freezer_change_state - change the freezing state of a cgroup_freezer
> + * @freezer: freezer of interest
> + * @freeze: whether to freeze or thaw
> + *
> + * Freeze or thaw @cgroup according to @freeze.
> + */
> +static void freezer_change_state(struct freezer *freezer, bool freeze)
> +{
> +	/* update @freezer */
> +	spin_lock_irq(&freezer->lock);
> +	freezer_apply_state(freezer, freeze);
>  	spin_unlock_irq(&freezer->lock);
>  }
>  
>  static int freezer_write(struct cgroup *cgroup, struct cftype *cft,
>  			 const char *buffer)
>  {
> -	enum freezer_state goal_state;
> +	bool freeze;
>  
>  	if (strcmp(buffer, freezer_state_strs[CGROUP_THAWED]) == 0)
> -		goal_state = CGROUP_THAWED;
> +		freeze = false;
>  	else if (strcmp(buffer, freezer_state_strs[CGROUP_FROZEN]) == 0)
> -		goal_state = CGROUP_FROZEN;
> +		freeze = true;
>  	else
>  		return -EINVAL;
>  
> -	freezer_change_state(cgroup_freezer(cgroup), goal_state);
> +	freezer_change_state(cgroup_freezer(cgroup), freeze);
>  	return 0;
>  }
>  
> -- 
> 1.7.11.7
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply

* Re: [PATCH 4/9] cgroup_freezer: trivial cleanups
From: Michal Hocko @ 2012-11-08  9:53 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-pm-u79uwXL29TY76Z2rM5mHXA, fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, rjw-KKrjLPT3xs0,
	cgroups-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1351931915-1701-5-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

On Sat 03-11-12 01:38:30, Tejun Heo wrote:
> * Clean-up indentation and line-breaks.  Drop the invalid comment
>   about freezer->lock.
> 
> * Make all internal functions take @freezer instead of both @cgroup
>   and @freezer.
> 
> Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

Looks reasonable
Reviewed-by: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>

> ---
>  kernel/cgroup_freezer.c | 41 +++++++++++++++++++----------------------
>  1 file changed, 19 insertions(+), 22 deletions(-)
> 
> diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
> index bedefd9..975b3d8 100644
> --- a/kernel/cgroup_freezer.c
> +++ b/kernel/cgroup_freezer.c
> @@ -29,17 +29,15 @@ enum freezer_state {
>  };
>  
>  struct freezer {
> -	struct cgroup_subsys_state css;
> -	enum freezer_state state;
> -	spinlock_t lock; /* protects _writes_ to state */
> +	struct cgroup_subsys_state	css;
> +	enum freezer_state		state;
> +	spinlock_t			lock;
>  };
>  
> -static inline struct freezer *cgroup_freezer(
> -		struct cgroup *cgroup)
> +static inline struct freezer *cgroup_freezer(struct cgroup *cgroup)
>  {
> -	return container_of(
> -		cgroup_subsys_state(cgroup, freezer_subsys_id),
> -		struct freezer, css);
> +	return container_of(cgroup_subsys_state(cgroup, freezer_subsys_id),
> +			    struct freezer, css);
>  }
>  
>  static inline struct freezer *task_freezer(struct task_struct *task)
> @@ -180,8 +178,9 @@ out:
>   * migrated into or out of @cgroup, so we can't verify task states against
>   * @freezer state here.  See freezer_attach() for details.
>   */
> -static void update_if_frozen(struct cgroup *cgroup, struct freezer *freezer)
> +static void update_if_frozen(struct freezer *freezer)
>  {
> +	struct cgroup *cgroup = freezer->css.cgroup;
>  	struct cgroup_iter it;
>  	struct task_struct *task;
>  
> @@ -211,12 +210,11 @@ notyet:
>  static int freezer_read(struct cgroup *cgroup, struct cftype *cft,
>  			struct seq_file *m)
>  {
> -	struct freezer *freezer;
> +	struct freezer *freezer = cgroup_freezer(cgroup);
>  	enum freezer_state state;
>  
> -	freezer = cgroup_freezer(cgroup);
>  	spin_lock_irq(&freezer->lock);
> -	update_if_frozen(cgroup, freezer);
> +	update_if_frozen(freezer);
>  	state = freezer->state;
>  	spin_unlock_irq(&freezer->lock);
>  
> @@ -225,8 +223,9 @@ static int freezer_read(struct cgroup *cgroup, struct cftype *cft,
>  	return 0;
>  }
>  
> -static void freeze_cgroup(struct cgroup *cgroup, struct freezer *freezer)
> +static void freeze_cgroup(struct freezer *freezer)
>  {
> +	struct cgroup *cgroup = freezer->css.cgroup;
>  	struct cgroup_iter it;
>  	struct task_struct *task;
>  
> @@ -236,8 +235,9 @@ static void freeze_cgroup(struct cgroup *cgroup, struct freezer *freezer)
>  	cgroup_iter_end(cgroup, &it);
>  }
>  
> -static void unfreeze_cgroup(struct cgroup *cgroup, struct freezer *freezer)
> +static void unfreeze_cgroup(struct freezer *freezer)
>  {
> +	struct cgroup *cgroup = freezer->css.cgroup;
>  	struct cgroup_iter it;
>  	struct task_struct *task;
>  
> @@ -247,11 +247,9 @@ static void unfreeze_cgroup(struct cgroup *cgroup, struct freezer *freezer)
>  	cgroup_iter_end(cgroup, &it);
>  }
>  
> -static void freezer_change_state(struct cgroup *cgroup,
> +static void freezer_change_state(struct freezer *freezer,
>  				 enum freezer_state goal_state)
>  {
> -	struct freezer *freezer = cgroup_freezer(cgroup);
> -
>  	/* also synchronizes against task migration, see freezer_attach() */
>  	spin_lock_irq(&freezer->lock);
>  
> @@ -260,13 +258,13 @@ static void freezer_change_state(struct cgroup *cgroup,
>  		if (freezer->state != CGROUP_THAWED)
>  			atomic_dec(&system_freezing_cnt);
>  		freezer->state = CGROUP_THAWED;
> -		unfreeze_cgroup(cgroup, freezer);
> +		unfreeze_cgroup(freezer);
>  		break;
>  	case CGROUP_FROZEN:
>  		if (freezer->state == CGROUP_THAWED)
>  			atomic_inc(&system_freezing_cnt);
>  		freezer->state = CGROUP_FREEZING;
> -		freeze_cgroup(cgroup, freezer);
> +		freeze_cgroup(freezer);
>  		break;
>  	default:
>  		BUG();
> @@ -275,8 +273,7 @@ static void freezer_change_state(struct cgroup *cgroup,
>  	spin_unlock_irq(&freezer->lock);
>  }
>  
> -static int freezer_write(struct cgroup *cgroup,
> -			 struct cftype *cft,
> +static int freezer_write(struct cgroup *cgroup, struct cftype *cft,
>  			 const char *buffer)
>  {
>  	enum freezer_state goal_state;
> @@ -288,7 +285,7 @@ static int freezer_write(struct cgroup *cgroup,
>  	else
>  		return -EINVAL;
>  
> -	freezer_change_state(cgroup, goal_state);
> +	freezer_change_state(cgroup_freezer(cgroup), goal_state);
>  	return 0;
>  }
>  
> -- 
> 1.7.11.7
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply

* Re: [BUGFIX] PM: Fix active child counting when disabled and forbidden
From: Rafael J. Wysocki @ 2012-11-08  9:56 UTC (permalink / raw)
  To: Huang Ying; +Cc: Alan Stern, linux-kernel, linux-pm
In-Reply-To: <1352340276.7176.37.camel@yhuang-dev>

On Thursday, November 08, 2012 10:04:36 AM Huang Ying wrote:
> On Thu, 2012-11-08 at 02:35 +0100, Rafael J. Wysocki wrote:
> > On Thursday, November 08, 2012 09:15:08 AM Huang Ying wrote:
> > > On Thu, 2012-11-08 at 00:09 +0100, Rafael J. Wysocki wrote:

[...]

> > > I think the patch can fix the issue in a better way.
> > 
> > I'm not sure what you mean.
> 
> I mean your patch can fix the driver-less VGA issue.  And it is better
> than my original patch.  The following discussion is not about this
> specific issue.  Just about PM core logic.

OK

> > > Do we still need to clarify state about disabled and forbidden?  When a
> > > device is forbidden and the usage_count > 0,
> > 
> > "Forbidden" always means usage_count > 0.
> 
> Yes.
> 
> > > is it a good idea to allow to set device state to SUSPENDED if the device
> > > is disabled?
> > 
> > No, it is not.  The status should always be ACTIVE as long as usage_count > 0.
> > However, in some cases we actually would like to change the status to
> > SUSPENDED when usage_count becomes equal to 0, because that means we can
> > suspend (I mean really suspend) the parents of the devices in question
> > (and we want to notify the parents in those cases).
> 
> So do you think Alan Stern's suggestion about forbidden and disabled is
> the right way to go?

I'm not really sure about that.

My original idea was that the runtime PM status and usage counter would
only matter when runtime PM of a device was enabled.  That leads to
problems, though, when we enable runtime PM of a device whose usage
counter is greater from zero and status is SUSPENDED.  Also when the
device's status is ACTIVE, but its parent's child count is 0.

It's not very easy to fix this at the core level, though, because we
depend on the current behavior in some places.  I'm thinking that
perhaps pm_runtime_enable() should just WARN() if things are obviously
inconsistent (although there still may be problems, for example, if the
parent's child count is 2 when we enable runtime PM for its child, but that
child is the only one it actually has).

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply

* Re: [PATCH 3/9] cgroup: implement generic child / descendant walk macros
From: Michal Hocko @ 2012-11-08  9:50 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-pm-u79uwXL29TY76Z2rM5mHXA, fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, rjw-KKrjLPT3xs0,
	cgroups-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1351931915-1701-4-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

On Sat 03-11-12 01:38:29, Tejun Heo wrote:
> Currently, cgroup doesn't provide any generic helper for walking a
> given cgroup's children or descendants.  This patch adds the following
> three macros.
> 
> * cgroup_for_each_child() - walk immediate children of a cgroup.
> 
> * cgroup_for_each_descendant_pre() - visit all descendants of a cgroup
>   in pre-order tree traversal.
> 
> * cgroup_for_each_descendant_post() - visit all descendants of a
>   cgroup in post-order tree traversal.
> 
> All three only require the user to hold RCU read lock during
> traversal.  Verifying that each iterated cgroup is online is the
> responsibility of the user.  When used with proper synchronization,
> cgroup_for_each_descendant_pre() can be used to propagate config
> updates to descendants in reliable way.  See comments for details.
> 
> Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

I will convert mem_cgroup_iter to use this rather than css_get_next
after this gets into the next tree so that it can fly via Andrew.

Reviewed-by: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>

Just a minor knit. You are talking about a config propagation while I
would consider state propagation more clear and less confusing. Config
is usually stable enough so that post_create is not necessary for
syncing (e.g. memcg.swappiness). It is a state which must be consistent
throughout the hierarchy which matters here.

Thanks!

> ---
>  include/linux/cgroup.h | 82 +++++++++++++++++++++++++++++++++++++++++++++++
>  kernel/cgroup.c        | 86 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 168 insertions(+)
> 
> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index 90c33eb..0020329 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -534,6 +534,88 @@ static inline struct cgroup* task_cgroup(struct task_struct *task,
>  	return task_subsys_state(task, subsys_id)->cgroup;
>  }
>  
> +/**
> + * cgroup_for_each_child - iterate through children of a cgroup
> + * @pos: the cgroup * to use as the loop cursor
> + * @cgroup: cgroup whose children to walk
> + *
> + * Walk @cgroup's children.  Must be called under rcu_read_lock().  A child
> + * cgroup which hasn't finished ->post_create() or already has finished
> + * ->pre_destroy() may show up during traversal and it's each subsystem's
> + * responsibility to verify that each @pos is alive.
> + *
> + * If a subsystem synchronizes against the parent in its ->post_create()
> + * and before starting iterating, a cgroup which finished ->post_create()
> + * is guaranteed to be visible in the future iterations.
> + */
> +#define cgroup_for_each_child(pos, cgroup)				\
> +	list_for_each_entry_rcu(pos, &(cgroup)->children, sibling)
> +
> +struct cgroup *cgroup_next_descendant_pre(struct cgroup *pos,
> +					  struct cgroup *cgroup);
> +
> +/**
> + * cgroup_for_each_descendant_pre - pre-order walk of a cgroup's descendants
> + * @pos: the cgroup * to use as the loop cursor
> + * @cgroup: cgroup whose descendants to walk
> + *
> + * Walk @cgroup's descendants.  Must be called under rcu_read_lock().  A
> + * descendant cgroup which hasn't finished ->post_create() or already has
> + * finished ->pre_destroy() may show up during traversal and it's each
> + * subsystem's responsibility to verify that each @pos is alive.
> + *
> + * If a subsystem synchronizes against the parent in its ->post_create()
> + * and before starting iterating, and synchronizes against @pos on each
> + * iteration, any descendant cgroup which finished ->post_create() is
> + * guaranteed to be visible in the future iterations.
> + *
> + * In other words, the following guarantees that a descendant can't escape
> + * configuration of its ancestors.
> + *
> + * my_post_create(@cgrp)
> + * {
> + *	Lock @cgrp->parent and @cgrp;
> + *	Inherit config from @cgrp->parent;
> + *	Unlock both.
> + * }
> + *
> + * my_update_config(@cgrp)
> + * {
> + *	Lock @cgrp;
> + *	Update @cgrp's config;
> + *	Unlock @cgrp;
> + *
> + *	cgroup_for_each_descendant_pre(@pos, @cgrp) {
> + *		Lock @pos;
> + *		Verify @pos is alive and inherit config from @pos->parent;
> + *		Unlock @pos;
> + *	}
> + * }
> + *
> + * Alternatively, a subsystem may choose to use a single global lock to
> + * synchronize ->post_create() and ->pre_destroy() against tree-walking
> + * operations.
> + */
> +#define cgroup_for_each_descendant_pre(pos, cgroup)			\
> +	for (pos = cgroup_next_descendant_pre(NULL, (cgroup)); (pos);	\
> +	     pos = cgroup_next_descendant_pre((pos), (cgroup)))
> +
> +struct cgroup *cgroup_next_descendant_post(struct cgroup *pos,
> +					   struct cgroup *cgroup);
> +
> +/**
> + * cgroup_for_each_descendant_post - post-order walk of a cgroup's descendants
> + * @pos: the cgroup * to use as the loop cursor
> + * @cgroup: cgroup whose descendants to walk
> + *
> + * Similar to cgroup_for_each_descendant_pre() but performs post-order
> + * traversal instead.  Note that the walk visibility guarantee described in
> + * pre-order walk doesn't apply the same to post-order walks.
> + */
> +#define cgroup_for_each_descendant_post(pos, cgroup)			\
> +	for (pos = cgroup_next_descendant_post(NULL, (cgroup)); (pos);	\
> +	     pos = cgroup_next_descendant_post((pos), (cgroup)))
> +
>  /* A cgroup_iter should be treated as an opaque object */
>  struct cgroup_iter {
>  	struct list_head *cg_link;
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index cc5d2a0..8bd662c 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -2985,6 +2985,92 @@ static void cgroup_enable_task_cg_lists(void)
>  	write_unlock(&css_set_lock);
>  }
>  
> +/**
> + * cgroup_next_descendant_pre - find the next descendant for pre-order walk
> + * @pos: the current position (%NULL to initiate traversal)
> + * @cgroup: cgroup whose descendants to walk
> + *
> + * To be used by cgroup_for_each_descendant_pre().  Find the next
> + * descendant to visit for pre-order traversal of @cgroup's descendants.
> + */
> +struct cgroup *cgroup_next_descendant_pre(struct cgroup *pos,
> +					  struct cgroup *cgroup)
> +{
> +	struct cgroup *next;
> +
> +	WARN_ON_ONCE(!rcu_read_lock_held());
> +
> +	/* if first iteration, pretend we just visited @cgroup */
> +	if (!pos) {
> +		if (list_empty(&cgroup->children))
> +			return NULL;
> +		pos = cgroup;
> +	}
> +
> +	/* visit the first child if exists */
> +	next = list_first_or_null_rcu(&pos->children, struct cgroup, sibling);
> +	if (next)
> +		return next;
> +
> +	/* no child, visit my or the closest ancestor's next sibling */
> +	do {
> +		next = list_entry_rcu(pos->sibling.next, struct cgroup,
> +				      sibling);
> +		if (&next->sibling != &pos->parent->children)
> +			return next;
> +
> +		pos = pos->parent;
> +	} while (pos != cgroup);
> +
> +	return NULL;
> +}
> +EXPORT_SYMBOL_GPL(cgroup_next_descendant_pre);
> +
> +static struct cgroup *cgroup_leftmost_descendant(struct cgroup *pos)
> +{
> +	struct cgroup *last;
> +
> +	do {
> +		last = pos;
> +		pos = list_first_or_null_rcu(&pos->children, struct cgroup,
> +					     sibling);
> +	} while (pos);
> +
> +	return last;
> +}
> +
> +/**
> + * cgroup_next_descendant_post - find the next descendant for post-order walk
> + * @pos: the current position (%NULL to initiate traversal)
> + * @cgroup: cgroup whose descendants to walk
> + *
> + * To be used by cgroup_for_each_descendant_post().  Find the next
> + * descendant to visit for post-order traversal of @cgroup's descendants.
> + */
> +struct cgroup *cgroup_next_descendant_post(struct cgroup *pos,
> +					   struct cgroup *cgroup)
> +{
> +	struct cgroup *next;
> +
> +	WARN_ON_ONCE(!rcu_read_lock_held());
> +
> +	/* if first iteration, visit the leftmost descendant */
> +	if (!pos) {
> +		next = cgroup_leftmost_descendant(cgroup);
> +		return next != cgroup ? next : NULL;
> +	}
> +
> +	/* if there's an unvisited sibling, visit its leftmost descendant */
> +	next = list_entry_rcu(pos->sibling.next, struct cgroup, sibling);
> +	if (&next->sibling != &pos->parent->children)
> +		return cgroup_leftmost_descendant(next);
> +
> +	/* no sibling left, visit parent */
> +	next = pos->parent;
> +	return next != cgroup ? next : NULL;
> +}
> +EXPORT_SYMBOL_GPL(cgroup_next_descendant_post);
> +
>  void cgroup_iter_start(struct cgroup *cgrp, struct cgroup_iter *it)
>  	__acquires(css_set_lock)
>  {
> -- 
> 1.7.11.7
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply

* Re: [PATCH v4] Thermal: exynos: Add sysfs node supporting exynos's emulation mode.
From: Amit Kachhap @ 2012-11-08  9:24 UTC (permalink / raw)
  To: Jonghwa Lee
  Cc: linux-pm, linux-kernel, Len Brown, Durgadoss R, Rafael J. Wysocki,
	MyungJoo Ham, Kyungmin Park, Zhang Rui
In-Reply-To: <1351823091-2540-1-git-send-email-jonghwa3.lee@samsung.com>

Hi Jonghwa Lee,

I tested this patch and it looks good. I have some minor comments below,

Reviewed-by: Amit Daniel Kachhap <amit.kachhap@linaro.org>

Thanks,
Amit Daniel
On 2 November 2012 07:54, Jonghwa Lee <jonghwa3.lee@samsung.com> wrote:
> This patch supports exynos's emulation mode with newly created sysfs node.
> Exynos 4x12 (4212, 4412) and 5 series provide emulation mode for thermal
> management unit. Thermal emulation mode supports software debug for TMU's
> operation. User can set temperature manually with software code and TMU
> will read current temperature from user value not from sensor's value.
> This patch includes also documentary placed under Documentation/thermal/.
>
> Signed-off-by: Jonghwa Lee <jonghwa3.lee@samsung.com>
> ---
> v4
>  - Fix Typo.
>  - Remove unnecessary codes.
>  - Add comments about feature of exynos emulation operation to the document.
>
> v3
>  - Remove unnecessay variables.
>  - Do some code clean in exynos_tmu_emulation_store().
>  - Make wrapping function of sysfs node creation function to use
>    #ifdefs in minimum.
>
> v2
>  exynos_thermal.c
>  - Fix build error occured by wrong emulation control register name.
>  - Remove exynos5410 dependent codes.
>  exynos_thermal_emulation
>  - Align indentation.
>
>  Documentation/thermal/exynos_thermal_emulation |   56 +++++++++++++++
>  drivers/thermal/Kconfig                        |    9 +++
>  drivers/thermal/exynos_thermal.c               |   91 ++++++++++++++++++++++++
>  3 files changed, 156 insertions(+), 0 deletions(-)
>  create mode 100644 Documentation/thermal/exynos_thermal_emulation
>
> diff --git a/Documentation/thermal/exynos_thermal_emulation b/Documentation/thermal/exynos_thermal_emulation
> new file mode 100644
> index 0000000..a6ea06f
> --- /dev/null
> +++ b/Documentation/thermal/exynos_thermal_emulation
> @@ -0,0 +1,56 @@
> +EXYNOS EMULATION MODE
> +========================
> +
> +Copyright (C) 2012 Samsung Electronics
> +
> +Written by Jonghwa Lee <jonghwa3.lee@samsung.com>
> +
> +Description
> +-----------
> +
> +Exynos 4x12 (4212, 4412) and 5 series provide emulation mode for thermal management unit.
> +Thermal emulation mode supports software debug for TMU's operation. User can set temperature
> +manually with software code and TMU will read current temperature from user value not from
> +sensor's value.
> +
> +Enabling CONFIG_EXYNOS_THERMAL_EMUL option will make this support in available.
> +When it's enabled, sysfs node will be created under
> +/sys/bus/platform/devices/'exynos device name'/ with name of 'emulation'.
> +
> +The sysfs node, 'emulation', will contain value 0 for the initial state. When you input any
> +temperature you want to update to sysfs node, it automatically enable emulation mode and
> +current temperature will be changed into it.
> +(Exynos also supports user changable delay time which would be used to delay of
> + changing temperature. However, this node only uses same delay of real sensing time, 938us.)
> +
> +Exynos emulation mode requires synchronous of value changing and enabling. It means when you
> +want to update the any value of delay or next temperature, then you have to enable emulation
> +mode at the same time. (Or you have to keep the mode enabling.) If you don't, it fails to
> +change the value to updated one and just use last succeessful value repeatedly. That's why
> +this node gives users the right to change termerpature only. Just one interface makes it more
> +simply to use.
> +
> +Disabling emulation mode only requires writing value 0 to sysfs node.
> +
> +
> +TEMP   120 |
> +           |
> +       100 |
> +           |
> +        80 |
> +           |                            +-----------
> +        60 |                            |          |
> +           |              +-------------|          |
> +        40 |              |             |          |
> +           |              |             |          |
> +        20 |              |             |          +----------
> +           |              |             |          |          |
> +         0 |______________|_____________|__________|__________|_________
> +                  A             A          A                  A     TIME
> +                  |<----->|     |<----->|  |<----->|          |
> +                  | 938us |     |       |  |       |          |
> +emulation    :  0  50     |     70      |  20      |          0
> +current temp :   sensor   50            70         20        sensor
> +
> +
> +
> diff --git a/drivers/thermal/Kconfig b/drivers/thermal/Kconfig
> index e1cb6bd..c02a66c 100644
> --- a/drivers/thermal/Kconfig
> +++ b/drivers/thermal/Kconfig
> @@ -55,3 +55,12 @@ config EXYNOS_THERMAL
>         help
>           If you say yes here you get support for TMU (Thermal Managment
>           Unit) on SAMSUNG EXYNOS series of SoC.
> +
> +config EXYNOS_THERMAL_EMUL
> +       bool "EXYNOS TMU emulation mode support"
> +       depends on !CPU_EXYNOS4210 && EXYNOS_THERMAL
Instead of using CPU_EXYNOS4210 here it is better to use data->soc ==
SOC_ARCH_EXYNOS4210 inside the emulation show/store functions.
> +       help
> +         Exynos 4412 and 4414 and 5 series has emulation mode on TMU.
> +         Enable this option will be make sysfs node in exynos thermal platform
> +         device directory to support emulation mode. With emulation mode sysfs
> +         node, you can manually input temperature to TMU for simulation purpose.
> diff --git a/drivers/thermal/exynos_thermal.c b/drivers/thermal/exynos_thermal.c
> index fd03e85..eebd4e5 100644
> --- a/drivers/thermal/exynos_thermal.c
> +++ b/drivers/thermal/exynos_thermal.c
> @@ -99,6 +99,14 @@
>  #define IDLE_INTERVAL 10000
>  #define MCELSIUS       1000
>
> +#ifdef CONFIG_EXYNOS_THERMAL_EMUL
> +#define EXYNOS_EMUL_TIME       0x57F0
> +#define EXYNOS_EMUL_TIME_SHIFT 16
> +#define EXYNOS_EMUL_DATA_SHIFT 8
> +#define EXYNOS_EMUL_DATA_MASK  0xFF
> +#define EXYNOS_EMUL_ENABLE     0x1
> +#endif /* CONFIG_EXYNOS_THERMAL_EMUL */
> +
>  /* CPU Zone information */
>  #define PANIC_ZONE      4
>  #define WARN_ZONE       3
> @@ -832,6 +840,82 @@ static inline struct  exynos_tmu_platform_data *exynos_get_driver_data(
>         return (struct exynos_tmu_platform_data *)
>                         platform_get_device_id(pdev)->driver_data;
>  }
> +
> +#ifdef CONFIG_EXYNOS_THERMAL_EMUL
> +static ssize_t exynos_tmu_emulation_show(struct device *dev,
> +                                        struct device_attribute *attr,
> +                                        char *buf)
> +{
> +       struct platform_device *pdev = container_of(dev,
> +                                       struct platform_device, dev);
> +       struct exynos_tmu_data *data = platform_get_drvdata(pdev);
> +       unsigned int reg;
> +       u8 temp_code;
> +       int temp = 0;
> +
> +       mutex_lock(&data->lock);
> +       clk_enable(data->clk);
> +       reg = readl(data->base + EXYNOS_EMUL_CON);
> +       clk_disable(data->clk);
> +       mutex_unlock(&data->lock);
> +
> +       if (reg & EXYNOS_EMUL_ENABLE) {
> +               reg >>= EXYNOS_EMUL_DATA_SHIFT;
> +               temp_code = reg & EXYNOS_EMUL_DATA_MASK;
> +               temp = code_to_temp(data, temp_code);
> +       }
> +
> +       return sprintf(buf, "%d\n", temp);
Currently in  /sys/devices/virtual/thermal/thermal_zone0/ all
temperatures are shown in millicelsius so it is better show this in
millicelsius also.
> +}
> +
> +static ssize_t exynos_tmu_emulation_store(struct device *dev,
> +                                       struct device_attribute *attr,
> +                                       const char *buf, size_t count)
> +{
> +       struct platform_device *pdev = container_of(dev,
> +                                       struct platform_device, dev);
> +       struct exynos_tmu_data *data = platform_get_drvdata(pdev);
> +       unsigned int reg;
> +       int temp;
> +
> +       if (!sscanf(buf, "%d\n", &temp) || temp < 0)
> +               return -EINVAL;
> +
> +       mutex_lock(&data->lock);
> +       clk_enable(data->clk);
> +
> +       reg = readl(data->base + EXYNOS_EMUL_CON);
> +
> +       if (temp)
> +               reg = (EXYNOS_EMUL_TIME << EXYNOS_EMUL_TIME_SHIFT) |
> +                       (temp_to_code(data, temp) << EXYNOS_EMUL_DATA_SHIFT) |
Same as above.
> +                       EXYNOS_EMUL_ENABLE;
> +       else
> +               reg &= ~EXYNOS_EMUL_ENABLE;
> +
> +       writel(reg, data->base + EXYNOS_EMUL_CON);
> +
> +       clk_disable(data->clk);
> +       mutex_unlock(&data->lock);
> +
> +       return count;
> +}
> +
> +static DEVICE_ATTR(emulation, 0644, exynos_tmu_emulation_show,
> +                                       exynos_tmu_emulation_store);
> +static int create_emulation_sysfs(struct device *dev)
> +{
> +       return device_create_file(dev, &dev_attr_emulation);
> +}
> +static void remove_emulation_sysfs(struct device *dev)
> +{
> +       device_remove_file(dev, &dev_attr_emulation);
> +}
> +#else
> +static inline int create_emulation_sysfs(struct device *dev) {return 0;}
> +static inline void remove_emulation_sysfs(struct device *dev){}
> +#endif
> +
>  static int __devinit exynos_tmu_probe(struct platform_device *pdev)
>  {
>         struct exynos_tmu_data *data;
> @@ -930,6 +1014,11 @@ static int __devinit exynos_tmu_probe(struct platform_device *pdev)
>                 dev_err(&pdev->dev, "Failed to register thermal interface\n");
>                 goto err_clk;
>         }
> +
> +       ret = create_emulation_sysfs(&pdev->dev);
> +       if (ret)
> +               dev_err(&pdev->dev, "Failed to create emulation mode sysfs node\n");
> +
>         return 0;
>  err_clk:
>         platform_set_drvdata(pdev, NULL);
> @@ -941,6 +1030,8 @@ static int __devexit exynos_tmu_remove(struct platform_device *pdev)
>  {
>         struct exynos_tmu_data *data = platform_get_drvdata(pdev);
>
> +       remove_emulation_sysfs(&pdev->dev);
> +
>         exynos_tmu_control(pdev, false);
>
>         exynos_unregister_thermal();
> --
> 1.7.4.1
>

^ permalink raw reply

* Re: [PATCH 6/9] cgroup_freezer: make freezer->state mask of flags
From: Kamezawa Hiroyuki @ 2012-11-08  5:00 UTC (permalink / raw)
  To: Tejun Heo
  Cc: lizefan-hv44wF8Li93QT0dZR+AlfA, mhocko-AlSwsSmVLrQ,
	rjw-KKrjLPT3xs0, linux-pm-u79uwXL29TY76Z2rM5mHXA,
	fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20121108044255.GG2660-9pTldWuhBndy/B6EtB590w@public.gmane.org>

(2012/11/08 13:42), Tejun Heo wrote:
> Hello, Kame.
>
> On Thu, Nov 08, 2012 at 01:37:50PM +0900, Kamezawa Hiroyuki wrote:
>> How about
>> enum {
>>     __CGROUP_FREEZING,
>>     __CGROUP_FROZEN,
>> };
>>
>> #define CGROUP_FREEZER_STATE_MASK  0x3
>> #define CGROUP_FREEZER_STATE(state)	((state) & CGROUP_FREEZER_STATE_MASK)
>> #define CGROUP_THAW(state)	(CGROUP_FREEZER_STATE(state) == 0)
>> #define CGROUP_FREEZING(state)	(CGROUP_FREEZER_STATE(state) == __CGROUP_FREEZING)
>> #define CGROUP_FROZEN(state)\
>> 	(CGROUP_FREEZER_STATE(state) == (__CGROUP_FREEZING | __CGROUP_FROZEN))
>
> I think it's a bit overdone and we have cases where we test for
> FREEZING regardless of FROZEN and cases where test for FREEZING &&
> !FROZEN.  We can have, say, CGROUP_FREZING() and then
> CGROUP_FREEZING_BUT_NOT_FROZEN(), but it feels more like obfuscation
> than anything else.
>

Hm, then, I'm glad if I can see what combinations of flags are valid and
meanings of them in source code comments.

Anyway,
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>

^ permalink raw reply

* Re: [PATCH 7/9] cgroup_freezer: introduce CGROUP_FREEZING_[SELF|PARENT]
From: Kamezawa Hiroyuki @ 2012-11-08  4:56 UTC (permalink / raw)
  To: Tejun Heo
  Cc: lizefan, mhocko, rjw, linux-pm, fweisbec, containers,
	linux-kernel, cgroups
In-Reply-To: <20121108044521.GH2660@mtj.dyndns.org>

(2012/11/08 13:45), Tejun Heo wrote:
> Hello,
>
> On Thu, Nov 08, 2012 at 01:42:22PM +0900, Kamezawa Hiroyuki wrote:
>> (2012/11/03 17:38), Tejun Heo wrote:
>>> Introduce FREEZING_SELF and FREEZING_PARENT and make FREEZING OR of
>>> the two flags.  This is to prepare for full hierarchy support.
>>>
>>> freezer_apply_date() is updated such that it can handle setting and
>>> clearing of both flags.  The two flags are also exposed to userland
>>> via read-only files self_freezing and parent_freezing.
>>>
>>> Other than the added cgroupfs files, this patch doesn't introduce any
>>> behavior change.
>>>
>>> Signed-off-by: Tejun Heo <tj@kernel.org>
>>
>> What is the purpose of FREEZING_SELG/PARENT, having 2 flags and
>> make it visible to users ?
>
> SELF is the configuration of the current cgroup, PARENT is freezing
> state inherited from an ancestor.  If either is set, the cgroup is
> freezing.  Without the two conditions visible to userland, it can get
> a bit confusing as echoing "THAWED" to state may not do anything to
> the cgroup.
>
Got it, thank you. I'm glad if you add some more explanation in comments
and considered use-case in changelog.

Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Thanks,
-Kame






^ permalink raw reply

* Re: [PATCH 3/6] ARM: OMAP: voltage: move voltdm_reset to platform_data header
From: Nishanth Menon @ 2012-11-08  4:52 UTC (permalink / raw)
  To: Tony Lindgren; +Cc: linux-omap, Rafael, Kevin Hilman, Anton Vorontsov, linux-pm
In-Reply-To: <20121107014608.GB6801@atomide.com>

On 17:46-20121106, Tony Lindgren wrote:
> * Nishanth Menon <nm@ti.com> [121106 17:30]:
> > On 17:18-20121106, Tony Lindgren wrote:
> > > * Nishanth Menon <nm@ti.com> [121106 13:50]:
> > > > On 10:49-20121106, Tony Lindgren wrote:
> > > > > 
> > > > > Looks like there are other things there too that's not platform data:
> > > > > 
> > > > > struct voltagedomain *voltdm_lookup(const char *name);
> > > > > int voltdm_scale(struct voltagedomain *voltdm, unsigned long target_volt);
> > > > > unsigned long voltdm_get_voltage(struct voltagedomain *voltdm);
> > > > > struct omap_volt_data *omap_voltage_get_voltdata(struct voltagedomain *voltdm,
> > > > > 		unsigned long volt);
> > > > > 
> > > > > Can you please add a patch fixing that ASAP?
> > > > 
> > > > Agreed include/linux/platform_data/voltage-omap.h has more functions as well.
> > > > Considering it did:
> > > > rename arch/arm/plat-omap/include/plat/voltage.h =>
> > > > include/linux/platform_data/voltage-omap.h
> > > > 
> > > > Where do we move these functions to?
> > > > 
> > > > drivers/power/avs/smartreflex.c needs:
> > > > omap_voltage_get_voltdata
> > > > and
> > > > drivers/power/avs/smartreflex-class3.c
> > > > will need voltdm_reset and voltdm_get_voltage
> > > 
> > > How about something local drivers/power/avs/smartreflex.h?
> > These APIs are exposed by voltage layer, not smartreflex :(
> > stuff like voltdm_scale will have to be used by regulator logic
> > eventually, so moving to AVS driver header is probably not right.
> 
> Well ideally you'd have some generic API doing it rather than
> these omap specifc exported functions.
> 
> Meanwhile, I guess you need to find some suitable location
> for the header file that works for Rafael.
I wonder if including mach/voltage.h is acceptable here? if Kevin could
suggest an option as well, it will be great.

-- 
Regards,
Nishanth Menon

^ permalink raw reply

* Re: [PATCH 8/9] cgroup_freezer: add ->post_create() and ->pre_destroy() and track online state
From: Kamezawa Hiroyuki @ 2012-11-08  4:48 UTC (permalink / raw)
  To: Tejun Heo
  Cc: lizefan-hv44wF8Li93QT0dZR+AlfA, mhocko-AlSwsSmVLrQ,
	rjw-KKrjLPT3xs0,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-pm-u79uwXL29TY76Z2rM5mHXA, fweisbec-Re5JQEeQqe8AvxtiuMwx3w
In-Reply-To: <1351931915-1701-9-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

(2012/11/03 17:38), Tejun Heo wrote:
> A cgroup is online and visible to iteration between ->post_create()
> and ->pre_destroy().  This patch introduces CGROUP_FREEZER_ONLINE and
> toggles it from the newly added freezer_post_create() and
> freezer_pre_destroy() while holding freezer->lock such that a
> cgroup_freezer can be reilably distinguished to be online.  This will
> be used by full hierarchy support.
> 
> ONLINE test is added to freezer_apply_state() but it currently doesn't
> make any difference as freezer_write() can only be called for an
> online cgroup.
> 
> Adjusting system_freezing_cnt on destruction is moved from
> freezer_destroy() to the new freezer_pre_destroy() for consistency.
> 
> This patch doesn't introduce any noticeable behavior change.
> 
> Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> ---
>   kernel/cgroup_freezer.c | 42 ++++++++++++++++++++++++++++++++++++++++--
>   1 file changed, 40 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
> index b8ad93c..4f12d31 100644
> --- a/kernel/cgroup_freezer.c
> +++ b/kernel/cgroup_freezer.c
> @@ -23,6 +23,7 @@
>   #include <linux/seq_file.h>
>   
>   enum freezer_state_flags {
> +	CGROUP_FREEZER_ONLINE	= (1 << 0), /* freezer is fully online */

Could you explain what 'online' means here again, rather than changelog ?
BTW, 'online' is a shared concept, between post_create() and pre_destroy(), among
developpers ? Is it new ?

Anyway, the patch itself is simple.

Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>

^ permalink raw reply

* Re: [PATCH 7/9] cgroup_freezer: introduce CGROUP_FREEZING_[SELF|PARENT]
From: Tejun Heo @ 2012-11-08  4:45 UTC (permalink / raw)
  To: Kamezawa Hiroyuki
  Cc: rjw-KKrjLPT3xs0, linux-pm-u79uwXL29TY76Z2rM5mHXA,
	fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, mhocko-AlSwsSmVLrQ,
	cgroups-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <509B382E.4030707-+CUm20s59erQFUHtdCDX3A@public.gmane.org>

Hello,

On Thu, Nov 08, 2012 at 01:42:22PM +0900, Kamezawa Hiroyuki wrote:
> (2012/11/03 17:38), Tejun Heo wrote:
> >Introduce FREEZING_SELF and FREEZING_PARENT and make FREEZING OR of
> >the two flags.  This is to prepare for full hierarchy support.
> >
> >freezer_apply_date() is updated such that it can handle setting and
> >clearing of both flags.  The two flags are also exposed to userland
> >via read-only files self_freezing and parent_freezing.
> >
> >Other than the added cgroupfs files, this patch doesn't introduce any
> >behavior change.
> >
> >Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> 
> What is the purpose of FREEZING_SELG/PARENT, having 2 flags and
> make it visible to users ?

SELF is the configuration of the current cgroup, PARENT is freezing
state inherited from an ancestor.  If either is set, the cgroup is
freezing.  Without the two conditions visible to userland, it can get
a bit confusing as echoing "THAWED" to state may not do anything to
the cgroup.

Thanks.

-- 
tejun

^ permalink raw reply

* Re: [PATCH] Thermal: exynos: Add support for temperature falling interrupt.
From: Amit Kachhap @ 2012-11-08  4:43 UTC (permalink / raw)
  To: Jonghwa Lee
  Cc: linux-pm, linux-kernel, Len Brown, Durgadoss R, Rafael J. Wysocki,
	MyungJoo Ham, Kyungmin Park, Zhang Rui
In-Reply-To: <1351756075-1467-1-git-send-email-jonghwa3.lee@samsung.com>

Hi Jonghwa Lee,

Adding Zhang Rui and Durgadoss.

I reviewed and tested this patch. This is a nice feature to have but
this will not call the cpufreq cooling device properly as thermal
framework calls the frequency states in a step wise fashion which is
not possible if we disable polling completely. I submitted a patch to
fix this issue.
Also some minor review comments below which I have already fixed in my
patchset.

Thanks,
Amit

On 1 November 2012 13:17, Jonghwa Lee <jonghwa3.lee@samsung.com> wrote:
> This patch introduces using temperature falling interrupt in exynos
> thermal driver. Former patch, it only use polling way to check
> whether if system themperature is fallen. However, exynos SOC also
> provides temperature falling interrupt way to do same things by hw.
> This feature is not supported in exynos4210.
>
> Signed-off-by: Jonghwa Lee <jonghwa3.lee@samsung.com>
> ---
>  drivers/thermal/exynos_thermal.c             |   81 +++++++++++++++-----------
>  include/linux/platform_data/exynos_thermal.h |    3 +
>  2 files changed, 49 insertions(+), 35 deletions(-)
>
> diff --git a/drivers/thermal/exynos_thermal.c b/drivers/thermal/exynos_thermal.c
> index 4bdd1eb..f91a699 100644
> --- a/drivers/thermal/exynos_thermal.c
> +++ b/drivers/thermal/exynos_thermal.c
> @@ -94,6 +94,7 @@
>  #define SENSOR_NAME_LEN        16
>  #define MAX_TRIP_COUNT 8
>  #define MAX_COOLING_DEVICE 4
> +#define MAX_THRESHOLD_LEVS 4
>
>  #define ACTIVE_INTERVAL 500
>  #define IDLE_INTERVAL 10000
> @@ -133,6 +134,7 @@ struct exynos_tmu_data {
>  struct thermal_trip_point_conf {
>         int trip_val[MAX_TRIP_COUNT];
>         int trip_count;
> +       u8 trigger_falling;
>  };
>
>  struct thermal_cooling_conf {
> @@ -182,7 +184,8 @@ static int exynos_set_mode(struct thermal_zone_device *thermal,
>
>         mutex_lock(&th_zone->therm_dev->lock);
>
> -       if (mode == THERMAL_DEVICE_ENABLED)
> +       if (mode == THERMAL_DEVICE_ENABLED &&
> +               !th_zone->sensor_conf->trip_data.trigger_falling)
>                 th_zone->therm_dev->polling_delay = IDLE_INTERVAL;
>         else
>                 th_zone->therm_dev->polling_delay = 0;
> @@ -421,7 +424,8 @@ static void exynos_report_trigger(void)
>                         break;
>         }
>
> -       if (th_zone->mode == THERMAL_DEVICE_ENABLED) {
> +       if (th_zone->mode == THERMAL_DEVICE_ENABLED &&
> +               !th_zone->sensor_conf->trip_data.trigger_falling) {
>                 if (i > 0)
>                         th_zone->therm_dev->polling_delay = ACTIVE_INTERVAL;
>                 else
> @@ -460,7 +464,8 @@ static int exynos_register_thermal(struct thermal_sensor_conf *sensor_conf)
>
>         th_zone->therm_dev = thermal_zone_device_register(sensor_conf->name,
>                         EXYNOS_ZONE_COUNT, 0, NULL, &exynos_dev_ops, 0,
> -                       IDLE_INTERVAL);
> +                       sensor_conf->trip_data.trigger_falling ?
> +                       0 : IDLE_INTERVAL);
>
>         if (IS_ERR(th_zone->therm_dev)) {
>                 pr_err("Failed to register thermal zone device\n");
> @@ -567,8 +572,9 @@ static int exynos_tmu_initialize(struct platform_device *pdev)
>  {
>         struct exynos_tmu_data *data = platform_get_drvdata(pdev);
>         struct exynos_tmu_platform_data *pdata = data->pdata;
> -       unsigned int status, trim_info, rising_threshold;
> -       int ret = 0, threshold_code;
> +       unsigned int status, trim_info;
> +       unsigned int rising_threshold = 0, falling_threshold = 0;
> +       int ret = 0, threshold_code, i, trigger_levs = 0;
>
>         mutex_lock(&data->lock);
>         clk_enable(data->clk);
> @@ -593,6 +599,11 @@ static int exynos_tmu_initialize(struct platform_device *pdev)
>                         (data->temp_error2 != 0))
>                 data->temp_error1 = pdata->efuse_value;
>
> +       /* Count trigger levels to be enabled */
> +       for (i = 0; i < MAX_THRESHOLD_LEVS; i++)
> +               if (pdata->trigger_levels[i])
> +                       trigger_levs++;
> +
>         if (data->soc == SOC_ARCH_EXYNOS4210) {
>                 /* Write temperature code for threshold */
>                 threshold_code = temp_to_code(data, pdata->threshold);
> @@ -602,44 +613,38 @@ static int exynos_tmu_initialize(struct platform_device *pdev)
>                 }
>                 writeb(threshold_code,
>                         data->base + EXYNOS4210_TMU_REG_THRESHOLD_TEMP);
> -
> -               writeb(pdata->trigger_levels[0],
> -                       data->base + EXYNOS4210_TMU_REG_TRIG_LEVEL0);
> -               writeb(pdata->trigger_levels[1],
> -                       data->base + EXYNOS4210_TMU_REG_TRIG_LEVEL1);
> -               writeb(pdata->trigger_levels[2],
> -                       data->base + EXYNOS4210_TMU_REG_TRIG_LEVEL2);
> -               writeb(pdata->trigger_levels[3],
> -                       data->base + EXYNOS4210_TMU_REG_TRIG_LEVEL3);
> +               for (i = 0; i < trigger_levs; i++)
> +                       writeb(pdata->trigger_levels[i],
> +                       data->base + EXYNOS4210_TMU_REG_TRIG_LEVEL0 + i * 4);
>
>                 writel(EXYNOS4210_TMU_INTCLEAR_VAL,
>                         data->base + EXYNOS_TMU_REG_INTCLEAR);
>         } else if (data->soc == SOC_ARCH_EXYNOS) {
> -               /* Write temperature code for threshold */
> -               threshold_code = temp_to_code(data, pdata->trigger_levels[0]);
> -               if (threshold_code < 0) {
> -                       ret = threshold_code;
> -                       goto out;
> -               }
> -               rising_threshold = threshold_code;
> -               threshold_code = temp_to_code(data, pdata->trigger_levels[1]);
> -               if (threshold_code < 0) {
> -                       ret = threshold_code;
> -                       goto out;
> -               }
> -               rising_threshold |= (threshold_code << 8);
> -               threshold_code = temp_to_code(data, pdata->trigger_levels[2]);
> -               if (threshold_code < 0) {
> -                       ret = threshold_code;
> -                       goto out;
> +               /* Write temperature code for rising and falling threshold */
> +               for (i = 0; i < trigger_levs; i++) {
> +                       threshold_code = temp_to_code(data,
> +                                               pdata->trigger_levels[i]);
> +                       if (threshold_code < 0) {
> +                               ret = threshold_code;
> +                               goto out;
> +                       }
> +                       rising_threshold |= threshold_code;

This should be rising_threshold |= threshold_code << 8 * i;

> +                       if (pdata->threshold_falling) {
> +                               threshold_code = temp_to_code(data,
> +                                               pdata->trigger_levels[i] -
> +                                               pdata->threshold_falling);
> +                               if (threshold_code > 0)
> +                                       falling_threshold |=
> +                                               threshold_code << 8 * i;
> +                       }
>                 }
> -               rising_threshold |= (threshold_code << 16);
>
>                 writel(rising_threshold,
>                                 data->base + EXYNOS_THD_TEMP_RISE);
> -               writel(0, data->base + EXYNOS_THD_TEMP_FALL);
> +               writel(falling_threshold,
> +                               data->base + EXYNOS_THD_TEMP_FALL);
>
> -               writel(EXYNOS_TMU_CLEAR_RISE_INT|EXYNOS_TMU_CLEAR_FALL_INT,
> +               writel(EXYNOS_TMU_CLEAR_RISE_INT | EXYNOS_TMU_CLEAR_FALL_INT,
>                                 data->base + EXYNOS_TMU_REG_INTCLEAR);
>         }
>  out:
> @@ -672,6 +677,8 @@ static void exynos_tmu_control(struct platform_device *pdev, bool on)
>                         pdata->trigger_level2_en << 8 |
>                         pdata->trigger_level1_en << 4 |
>                         pdata->trigger_level0_en;
> +               if (pdata->threshold_falling)
> +                       interrupt_en |= interrupt_en << 16;
>         } else {
>                 con |= EXYNOS_TMU_CORE_OFF;
>                 interrupt_en = 0; /* Disable all interrupts */
> @@ -710,7 +717,8 @@ static void exynos_tmu_work(struct work_struct *work)
>
>
>         if (data->soc == SOC_ARCH_EXYNOS)
> -               writel(EXYNOS_TMU_CLEAR_RISE_INT,
> +               writel(EXYNOS_TMU_CLEAR_RISE_INT |
> +                               EXYNOS_TMU_CLEAR_FALL_INT,
>                                 data->base + EXYNOS_TMU_REG_INTCLEAR);
>         else
>                 writel(EXYNOS4210_TMU_INTCLEAR_VAL,
> @@ -767,6 +775,7 @@ static struct exynos_tmu_platform_data const exynos4210_default_tmu_data = {
>
>  #if defined(CONFIG_SOC_EXYNOS5250) || defined(CONFIG_SOC_EXYNOS4412)
>  static struct exynos_tmu_platform_data const exynos_default_tmu_data = {
> +       .threshold_falling = 10,
>         .trigger_levels[0] = 85,
>         .trigger_levels[1] = 103,
>         .trigger_levels[2] = 110,
> @@ -1002,6 +1011,8 @@ static int __devinit exynos_tmu_probe(struct platform_device *pdev)
>                 exynos_sensor_conf.trip_data.trip_val[i] =
>                         pdata->threshold + pdata->trigger_levels[i];
>
> +       exynos_sensor_conf.trip_data.trigger_falling = pdata->threshold_falling;
> +
>         exynos_sensor_conf.cooling_data.freq_clip_count =
>                                                 pdata->freq_tab_count;
>         for (i = 0; i < pdata->freq_tab_count; i++) {
> diff --git a/include/linux/platform_data/exynos_thermal.h b/include/linux/platform_data/exynos_thermal.h
> index a7bdb2f..da7e627 100644
> --- a/include/linux/platform_data/exynos_thermal.h
> +++ b/include/linux/platform_data/exynos_thermal.h
> @@ -53,6 +53,8 @@ struct freq_clip_table {
>   * struct exynos_tmu_platform_data
>   * @threshold: basic temperature for generating interrupt
>   *            25 <= threshold <= 125 [unit: degree Celsius]
> + * @threshold_falling: differntial value for setting threshold
> + *                    of temperature falling interrupt.
>   * @trigger_levels: array for each interrupt levels
>   *     [unit: degree Celsius]
>   *     0: temperature for trigger_level0 interrupt
> @@ -97,6 +99,7 @@ struct freq_clip_table {
>   */
>  struct exynos_tmu_platform_data {
>         u8 threshold;
> +       u8 threshold_falling;
>         u8 trigger_levels[4];
>         bool trigger_level0_en;
>         bool trigger_level1_en;
> --
> 1.7.4.1
>

^ permalink raw reply

* Re: [PATCH 6/9] cgroup_freezer: make freezer->state mask of flags
From: Tejun Heo @ 2012-11-08  4:42 UTC (permalink / raw)
  To: Kamezawa Hiroyuki
  Cc: rjw-KKrjLPT3xs0, linux-pm-u79uwXL29TY76Z2rM5mHXA,
	fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, mhocko-AlSwsSmVLrQ,
	cgroups-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <509B371E.9050005-+CUm20s59erQFUHtdCDX3A@public.gmane.org>

Hello, Kame.

On Thu, Nov 08, 2012 at 01:37:50PM +0900, Kamezawa Hiroyuki wrote:
> How about
> enum {
>    __CGROUP_FREEZING,
>    __CGROUP_FROZEN,
> };
> 
> #define CGROUP_FREEZER_STATE_MASK  0x3
> #define CGROUP_FREEZER_STATE(state)	((state) & CGROUP_FREEZER_STATE_MASK)
> #define CGROUP_THAW(state)	(CGROUP_FREEZER_STATE(state) == 0)
> #define CGROUP_FREEZING(state)	(CGROUP_FREEZER_STATE(state) == __CGROUP_FREEZING)
> #define CGROUP_FROZEN(state)\
> 	(CGROUP_FREEZER_STATE(state) == (__CGROUP_FREEZING | __CGROUP_FROZEN))

I think it's a bit overdone and we have cases where we test for
FREEZING regardless of FROZEN and cases where test for FREEZING &&
!FROZEN.  We can have, say, CGROUP_FREZING() and then
CGROUP_FREEZING_BUT_NOT_FROZEN(), but it feels more like obfuscation
than anything else.

> >@@ -290,9 +284,9 @@ static int freezer_write(struct cgroup *cgroup, struct cftype *cft,
> >  {
> >  	bool freeze;
> >
> >-	if (strcmp(buffer, freezer_state_strs[CGROUP_THAWED]) == 0)
> >+	if (strcmp(buffer, freezer_state_strs(0)) == 0)
> 
> Can't we have a name for "0" ?

We can define CGROUP_THAWED to be zero.  I'd rather keep it explicitly
zero tho.  freezer state is mask of freezing and frozen flags and no
flag set means thawed.

Thanks.

-- 
tejun

^ permalink raw reply

* Re: [PATCH 7/9] cgroup_freezer: introduce CGROUP_FREEZING_[SELF|PARENT]
From: Kamezawa Hiroyuki @ 2012-11-08  4:42 UTC (permalink / raw)
  To: Tejun Heo
  Cc: lizefan-hv44wF8Li93QT0dZR+AlfA, mhocko-AlSwsSmVLrQ,
	rjw-KKrjLPT3xs0, linux-pm-u79uwXL29TY76Z2rM5mHXA,
	fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1351931915-1701-8-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

(2012/11/03 17:38), Tejun Heo wrote:
> Introduce FREEZING_SELF and FREEZING_PARENT and make FREEZING OR of
> the two flags.  This is to prepare for full hierarchy support.
>
> freezer_apply_date() is updated such that it can handle setting and
> clearing of both flags.  The two flags are also exposed to userland
> via read-only files self_freezing and parent_freezing.
>
> Other than the added cgroupfs files, this patch doesn't introduce any
> behavior change.
>
> Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

What is the purpose of FREEZING_SELG/PARENT, having 2 flags and
make it visible to users ?

Thanks,
-Kame



> ---
>   kernel/cgroup_freezer.c | 55 ++++++++++++++++++++++++++++++++++++++++++-------
>   1 file changed, 47 insertions(+), 8 deletions(-)
>
> diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
> index e76aa9f..b8ad93c 100644
> --- a/kernel/cgroup_freezer.c
> +++ b/kernel/cgroup_freezer.c
> @@ -23,8 +23,12 @@
>   #include <linux/seq_file.h>
>
>   enum freezer_state_flags {
> -	CGROUP_FREEZING		= (1 << 1), /* this freezer is freezing */
> +	CGROUP_FREEZING_SELF	= (1 << 1), /* this freezer is freezing */
> +	CGROUP_FREEZING_PARENT	= (1 << 2), /* the parent freezer is freezing */
>   	CGROUP_FROZEN		= (1 << 3), /* this and its descendants frozen */
> +
> +	/* mask for all FREEZING flags */
> +	CGROUP_FREEZING		= CGROUP_FREEZING_SELF | CGROUP_FREEZING_PARENT,
>   };
>
>   struct freezer {
> @@ -245,8 +249,13 @@ static void unfreeze_cgroup(struct freezer *freezer)
>    * freezer_apply_state - apply state change to a single cgroup_freezer
>    * @freezer: freezer to apply state change to
>    * @freeze: whether to freeze or unfreeze
> + * @state: CGROUP_FREEZING_* flag to set or clear
> + *
> + * Set or clear @state on @cgroup according to @freeze, and perform
> + * freezing or thawing as necessary.
>    */
> -static void freezer_apply_state(struct freezer *freezer, bool freeze)
> +static void freezer_apply_state(struct freezer *freezer, bool freeze,
> +				unsigned int state)
>   {
>   	/* also synchronizes against task migration, see freezer_attach() */
>   	lockdep_assert_held(&freezer->lock);
> @@ -254,13 +263,19 @@ static void freezer_apply_state(struct freezer *freezer, bool freeze)
>   	if (freeze) {
>   		if (!(freezer->state & CGROUP_FREEZING))
>   			atomic_inc(&system_freezing_cnt);
> -		freezer->state |= CGROUP_FREEZING;
> +		freezer->state |= state;
>   		freeze_cgroup(freezer);
>   	} else {
> -		if (freezer->state & CGROUP_FREEZING)
> -			atomic_dec(&system_freezing_cnt);
> -		freezer->state &= ~(CGROUP_FREEZING | CGROUP_FROZEN);
> -		unfreeze_cgroup(freezer);
> +		bool was_freezing = freezer->state & CGROUP_FREEZING;
> +
> +		freezer->state &= ~state;
> +
> +		if (!(freezer->state & CGROUP_FREEZING)) {
> +			if (was_freezing)
> +				atomic_dec(&system_freezing_cnt);
> +			freezer->state &= ~CGROUP_FROZEN;
> +			unfreeze_cgroup(freezer);
> +		}
>   	}
>   }
>
> @@ -275,7 +290,7 @@ static void freezer_change_state(struct freezer *freezer, bool freeze)
>   {
>   	/* update @freezer */
>   	spin_lock_irq(&freezer->lock);
> -	freezer_apply_state(freezer, freeze);
> +	freezer_apply_state(freezer, freeze, CGROUP_FREEZING_SELF);
>   	spin_unlock_irq(&freezer->lock);
>   }
>
> @@ -295,6 +310,20 @@ static int freezer_write(struct cgroup *cgroup, struct cftype *cft,
>   	return 0;
>   }
>
> +static u64 freezer_self_freezing_read(struct cgroup *cgroup, struct cftype *cft)
> +{
> +	struct freezer *freezer = cgroup_freezer(cgroup);
> +
> +	return (bool)(freezer->state & CGROUP_FREEZING_SELF);
> +}
> +
> +static u64 freezer_parent_freezing_read(struct cgroup *cgroup, struct cftype *cft)
> +{
> +	struct freezer *freezer = cgroup_freezer(cgroup);
> +
> +	return (bool)(freezer->state & CGROUP_FREEZING_PARENT);
> +}
> +
>   static struct cftype files[] = {
>   	{
>   		.name = "state",
> @@ -302,6 +331,16 @@ static struct cftype files[] = {
>   		.read_seq_string = freezer_read,
>   		.write_string = freezer_write,
>   	},
> +	{
> +		.name = "self_freezing",
> +		.flags = CFTYPE_NOT_ON_ROOT,
> +		.read_u64 = freezer_self_freezing_read,
> +	},
> +	{
> +		.name = "parent_freezing",
> +		.flags = CFTYPE_NOT_ON_ROOT,
> +		.read_u64 = freezer_parent_freezing_read,
> +	},
>   	{ }	/* terminate */
>   };
>
>

^ permalink raw reply

* Re: [PATCH 6/9] cgroup_freezer: make freezer->state mask of flags
From: Kamezawa Hiroyuki @ 2012-11-08  4:37 UTC (permalink / raw)
  To: Tejun Heo
  Cc: lizefan-hv44wF8Li93QT0dZR+AlfA, mhocko-AlSwsSmVLrQ,
	rjw-KKrjLPT3xs0, linux-pm-u79uwXL29TY76Z2rM5mHXA,
	fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1351931915-1701-7-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

(2012/11/03 17:38), Tejun Heo wrote:
> freezer->state was an enum value - one of THAWED, FREEZING and FROZEN.
> As the scheduled full hierarchy support requires more than one
> freezing condition, switch it to mask of flags.  If FREEZING is not
> set, it's thawed.  FREEZING is set if freezing or frozen.  If frozen,
> both FREEZING and FROZEN are set.  Now that tasks can be attached to
> an already frozen cgroup, this also makes freezing condition checks
> more natural.
>
> This patch doesn't introduce any behavior change.
>
> Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> ---
>   kernel/cgroup_freezer.c | 60 ++++++++++++++++++++++---------------------------
>   1 file changed, 27 insertions(+), 33 deletions(-)
>
> diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
> index 2690830..e76aa9f 100644
> --- a/kernel/cgroup_freezer.c
> +++ b/kernel/cgroup_freezer.c
> @@ -22,15 +22,14 @@
>   #include <linux/freezer.h>
>   #include <linux/seq_file.h>
>
> -enum freezer_state {
> -	CGROUP_THAWED = 0,
> -	CGROUP_FREEZING,
> -	CGROUP_FROZEN,
> +enum freezer_state_flags {
> +	CGROUP_FREEZING		= (1 << 1), /* this freezer is freezing */
> +	CGROUP_FROZEN		= (1 << 3), /* this and its descendants frozen */
>   };
>
>   struct freezer {
>   	struct cgroup_subsys_state	css;
> -	enum freezer_state		state;
> +	unsigned int			state;
>   	spinlock_t			lock;
>   };
>
> @@ -48,12 +47,10 @@ static inline struct freezer *task_freezer(struct task_struct *task)
>
>   bool cgroup_freezing(struct task_struct *task)
>   {
> -	enum freezer_state state;
>   	bool ret;
>
>   	rcu_read_lock();
> -	state = task_freezer(task)->state;
> -	ret = state == CGROUP_FREEZING || state == CGROUP_FROZEN;
> +	ret = task_freezer(task)->state & CGROUP_FREEZING;
>   	rcu_read_unlock();
>
>   	return ret;
> @@ -63,10 +60,13 @@ bool cgroup_freezing(struct task_struct *task)
>    * cgroups_write_string() limits the size of freezer state strings to
>    * CGROUP_LOCAL_BUFFER_SIZE
>    */
> -static const char *freezer_state_strs[] = {
> -	"THAWED",
> -	"FREEZING",
> -	"FROZEN",
> +static const char *freezer_state_strs(unsigned int state)
> +{
> +	if (state & CGROUP_FROZEN)
> +		return "FROZEN";
> +	if (state & CGROUP_FREEZING)
> +		return "FREEZING";
> +	return "THAWED";
>   };
>
>   /*
> @@ -91,7 +91,6 @@ static struct cgroup_subsys_state *freezer_create(struct cgroup *cgroup)
>   		return ERR_PTR(-ENOMEM);
>
>   	spin_lock_init(&freezer->lock);
> -	freezer->state = CGROUP_THAWED;
>   	return &freezer->css;
>   }
>
> @@ -99,7 +98,7 @@ static void freezer_destroy(struct cgroup *cgroup)
>   {
>   	struct freezer *freezer = cgroup_freezer(cgroup);
>
> -	if (freezer->state != CGROUP_THAWED)
> +	if (freezer->state & CGROUP_FREEZING)
>   		atomic_dec(&system_freezing_cnt);
>   	kfree(freezer);
>   }
> @@ -129,15 +128,13 @@ static void freezer_attach(struct cgroup *new_cgrp, struct cgroup_taskset *tset)
>   	 * Tasks in @tset are on @new_cgrp but may not conform to its
>   	 * current state before executing the following - !frozen tasks may
>   	 * be visible in a FROZEN cgroup and frozen tasks in a THAWED one.
> -	 * This means that, to determine whether to freeze, one should test
> -	 * whether the state equals THAWED.
>   	 */
>   	cgroup_taskset_for_each(task, new_cgrp, tset) {
> -		if (freezer->state == CGROUP_THAWED) {
> +		if (!(freezer->state & CGROUP_FREEZING)) {
>   			__thaw_task(task);
>   		} else {
>   			freeze_task(task);
> -			freezer->state = CGROUP_FREEZING;
> +			freezer->state &= ~CGROUP_FROZEN;
>   		}
>   	}
>
> @@ -159,11 +156,7 @@ static void freezer_fork(struct task_struct *task)
>   		goto out;
>
>   	spin_lock_irq(&freezer->lock);
> -	/*
> -	 * @task might have been just migrated into a FROZEN cgroup.  Test
> -	 * equality with THAWED.  Read the comment in freezer_attach().
> -	 */
> -	if (freezer->state != CGROUP_THAWED)
> +	if (freezer->state & CGROUP_FREEZING)
>   		freeze_task(task);
>   	spin_unlock_irq(&freezer->lock);
>   out:
> @@ -184,7 +177,8 @@ static void update_if_frozen(struct freezer *freezer)
>   	struct cgroup_iter it;
>   	struct task_struct *task;
>
> -	if (freezer->state != CGROUP_FREEZING)
> +	if (!(freezer->state & CGROUP_FREEZING) ||
> +	    (freezer->state & CGROUP_FROZEN))
>   		return;

Hmm....

How about
enum {
    __CGROUP_FREEZING,
    __CGROUP_FROZEN,
};

#define CGROUP_FREEZER_STATE_MASK  0x3
#define CGROUP_FREEZER_STATE(state)	((state) & CGROUP_FREEZER_STATE_MASK)
#define CGROUP_THAW(state)	(CGROUP_FREEZER_STATE(state) == 0)
#define CGROUP_FREEZING(state)	(CGROUP_FREEZER_STATE(state) == __CGROUP_FREEZING)
#define CGROUP_FROZEN(state)\
	(CGROUP_FREEZER_STATE(state) == (__CGROUP_FREEZING | __CGROUP_FROZEN))
>
>   	cgroup_iter_start(cgroup, &it);
> @@ -202,7 +196,7 @@ static void update_if_frozen(struct freezer *freezer)
>   		}
>   	}
>
> -	freezer->state = CGROUP_FROZEN;
> +	freezer->state |= CGROUP_FROZEN;
>   notyet:
>   	cgroup_iter_end(cgroup, &it);
>   }
> @@ -211,14 +205,14 @@ static int freezer_read(struct cgroup *cgroup, struct cftype *cft,
>   			struct seq_file *m)
>   {
>   	struct freezer *freezer = cgroup_freezer(cgroup);
> -	enum freezer_state state;
> +	unsigned int state;
>
>   	spin_lock_irq(&freezer->lock);
>   	update_if_frozen(freezer);
>   	state = freezer->state;
>   	spin_unlock_irq(&freezer->lock);
>
> -	seq_puts(m, freezer_state_strs[state]);
> +	seq_puts(m, freezer_state_strs(state));
>   	seq_putc(m, '\n');
>   	return 0;
>   }
> @@ -258,14 +252,14 @@ static void freezer_apply_state(struct freezer *freezer, bool freeze)
>   	lockdep_assert_held(&freezer->lock);
>
>   	if (freeze) {
> -		if (freezer->state == CGROUP_THAWED)
> +		if (!(freezer->state & CGROUP_FREEZING))
>   			atomic_inc(&system_freezing_cnt);
> -		freezer->state = CGROUP_FREEZING;
> +		freezer->state |= CGROUP_FREEZING;
>   		freeze_cgroup(freezer);
>   	} else {
> -		if (freezer->state != CGROUP_THAWED)
> +		if (freezer->state & CGROUP_FREEZING)
>   			atomic_dec(&system_freezing_cnt);
> -		freezer->state = CGROUP_THAWED;
> +		freezer->state &= ~(CGROUP_FREEZING | CGROUP_FROZEN);
>   		unfreeze_cgroup(freezer);
>   	}
>   }
> @@ -290,9 +284,9 @@ static int freezer_write(struct cgroup *cgroup, struct cftype *cft,
>   {
>   	bool freeze;
>
> -	if (strcmp(buffer, freezer_state_strs[CGROUP_THAWED]) == 0)
> +	if (strcmp(buffer, freezer_state_strs(0)) == 0)

Can't we have a name for "0" ?

>   		freeze = false;
> -	else if (strcmp(buffer, freezer_state_strs[CGROUP_FROZEN]) == 0)
> +	else if (strcmp(buffer, freezer_state_strs(CGROUP_FROZEN)) == 0)
>   		freeze = true;
>   	else
>   		return -EINVAL;
>


Thanks,
-Kame

^ permalink raw reply

* Re: [PATCH 5/9] cgroup_freezer: prepare freezer_change_state() for full hierarchy support
From: Kamezawa Hiroyuki @ 2012-11-08  4:25 UTC (permalink / raw)
  To: Tejun Heo
  Cc: lizefan-hv44wF8Li93QT0dZR+AlfA, mhocko-AlSwsSmVLrQ,
	rjw-KKrjLPT3xs0, linux-pm-u79uwXL29TY76Z2rM5mHXA,
	fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1351931915-1701-6-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

(2012/11/03 17:38), Tejun Heo wrote:
> * Make freezer_change_state() take bool @freeze instead of enum
>    freezer_state.
>
> * Separate out freezer_apply_state() out of freezer_change_state().
>    This makes freezer_change_state() a rather silly thin wrapper.  It
>    will be filled with hierarchy handling later on.
>
> This patch doesn't introduce any behavior change.
>
> Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>

^ permalink raw reply

* Re: [PATCH 4/9] cgroup_freezer: trivial cleanups
From: Kamezawa Hiroyuki @ 2012-11-08  3:24 UTC (permalink / raw)
  To: Tejun Heo
  Cc: lizefan-hv44wF8Li93QT0dZR+AlfA, mhocko-AlSwsSmVLrQ,
	rjw-KKrjLPT3xs0, linux-pm-u79uwXL29TY76Z2rM5mHXA,
	fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1351931915-1701-5-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

(2012/11/03 17:38), Tejun Heo wrote:
> * Clean-up indentation and line-breaks.  Drop the invalid comment
>    about freezer->lock.
>
> * Make all internal functions take @freezer instead of both @cgroup
>    and @freezer.
>
> Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>

^ permalink raw reply

* Re: [PATCH 3/9] cgroup: implement generic child / descendant walk macros
From: Kamezawa Hiroyuki @ 2012-11-08  3:21 UTC (permalink / raw)
  To: Tejun Heo
  Cc: rjw-KKrjLPT3xs0, linux-pm-u79uwXL29TY76Z2rM5mHXA,
	fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, mhocko-AlSwsSmVLrQ,
	cgroups-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1351931915-1701-4-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

(2012/11/03 17:38), Tejun Heo wrote:
> Currently, cgroup doesn't provide any generic helper for walking a
> given cgroup's children or descendants.  This patch adds the following
> three macros.
> 
> * cgroup_for_each_child() - walk immediate children of a cgroup.
> 
> * cgroup_for_each_descendant_pre() - visit all descendants of a cgroup
>    in pre-order tree traversal.
> 
> * cgroup_for_each_descendant_post() - visit all descendants of a
>    cgroup in post-order tree traversal.
> 
> All three only require the user to hold RCU read lock during
> traversal.  Verifying that each iterated cgroup is online is the
> responsibility of the user.  When used with proper synchronization,
> cgroup_for_each_descendant_pre() can be used to propagate config
> updates to descendants in reliable way.  See comments for details.
> 
> Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

maybe better than using css->id in some(many) case.

Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu-c2VE3kR2zFovtab9mdV7tw@public.gmane.org>

^ permalink raw reply

* [PATCH V2] PM / Qos: Ensure device not in PRM_SUSPENDED when pm qos flags request functions are invoked in the pm core.
From: Lan Tianyu @ 2012-11-08  3:14 UTC (permalink / raw)
  To: rjw; +Cc: Lan Tianyu, stern, linux-pm, linux-acpi

Since dev_pm_qos_add_request(), dev_pm_qos_update_request() and
dev_pm_qos_remove_request() for pm qos flags should not be invoked
when device in RPM_SUSPENDED. Add pm_runtime_get_sync() and pm_runtime_put()
around these functions in the pm core to ensure device not in RPM_SUSPENDED.

Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
---
Change since v1:
	Remove unnecessary pm_runtime_get_sync() and pm_runtime_put()
around dev_pm_qos_update_flags().
---
 drivers/base/power/qos.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/base/power/qos.c b/drivers/base/power/qos.c
index 081db2d..fdc3894 100644
--- a/drivers/base/power/qos.c
+++ b/drivers/base/power/qos.c
@@ -633,15 +633,18 @@ int dev_pm_qos_expose_flags(struct device *dev, s32 val)
 	if (!req)
 		return -ENOMEM;
 
+	pm_runtime_get_sync(dev);
 	ret = dev_pm_qos_add_request(dev, req, DEV_PM_QOS_FLAGS, val);
 	if (ret < 0)
-		return ret;
+		goto fail;
 
 	dev->power.qos->flags_req = req;
 	ret = pm_qos_sysfs_add_flags(dev);
 	if (ret)
 		__dev_pm_qos_drop_user_request(dev, DEV_PM_QOS_FLAGS);
 
+fail:
+	pm_runtime_put(dev);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(dev_pm_qos_expose_flags);
@@ -654,7 +657,9 @@ void dev_pm_qos_hide_flags(struct device *dev)
 {
 	if (dev->power.qos && dev->power.qos->flags_req) {
 		pm_qos_sysfs_remove_flags(dev);
+		pm_runtime_get_sync(dev);
 		__dev_pm_qos_drop_user_request(dev, DEV_PM_QOS_FLAGS);
+		pm_runtime_put(dev);
 	}
 }
 EXPORT_SYMBOL_GPL(dev_pm_qos_hide_flags);
-- 
1.7.9.5


^ permalink raw reply related

* Re: [PATCH 2/9] cgroup: Use rculist ops for cgroup->children
From: Kamezawa Hiroyuki @ 2012-11-08  3:01 UTC (permalink / raw)
  To: Tejun Heo
  Cc: lizefan-hv44wF8Li93QT0dZR+AlfA, mhocko-AlSwsSmVLrQ,
	rjw-KKrjLPT3xs0, linux-pm-u79uwXL29TY76Z2rM5mHXA,
	fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1351931915-1701-3-git-send-email-tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

(2012/11/03 17:38), Tejun Heo wrote:
> Use RCU safe list operations for cgroup->children.  This will be used
> to implement cgroup children / descendant walking which can be used by
> controllers.
>
> Note that cgroup_create() now puts a new cgroup at the end of the
> ->children list instead of head.  This isn't strictly necessary but is
> done so that the iteration order is more conventional.
>
> Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>

^ permalink raw reply

* Re: [PATCH 1/9 v2] cgroup: add cgroup_subsys->post_create()
From: Kamezawa Hiroyuki @ 2012-11-08  2:59 UTC (permalink / raw)
  To: Tejun Heo
  Cc: lizefan-hv44wF8Li93QT0dZR+AlfA, mhocko-AlSwsSmVLrQ,
	rjw-KKrjLPT3xs0,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-pm-u79uwXL29TY76Z2rM5mHXA, fweisbec-Re5JQEeQqe8AvxtiuMwx3w,
	Glauber Costa
In-Reply-To: <20121107171508.GF2660-9pTldWuhBndy/B6EtB590w@public.gmane.org>

(2012/11/08 2:15), Tejun Heo wrote:
> Currently, there's no way for a controller to find out whether a new
> cgroup finished all ->create() allocatinos successfully and is
> considered "live" by cgroup.
>
> This becomes a problem later when we add generic descendants walking
> to cgroup which can be used by controllers as controllers don't have a
> synchronization point where it can synchronize against new cgroups
> appearing in such walks.
>
> This patch adds ->post_create().  It's called after all ->create()
> succeeded and the cgroup is linked into the generic cgroup hierarchy.
> This plays the counterpart of ->pre_destroy().
>
> When used in combination with the to-be-added generic descendant
> iterators, ->post_create() can be used to implement reliable state
> inheritance.  It will be explained with the descendant iterators.
>
> v2: Added a paragraph about its future use w/ descendant iterators per
>      Michal.
>
> Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> Acked-by: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>
> Cc: Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>


Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>

^ permalink raw reply

* Re: [BUGFIX] PM: Fix active child counting when disabled and forbidden
From: Huang Ying @ 2012-11-08  2:04 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Alan Stern, linux-kernel, linux-pm
In-Reply-To: <5312356.pH16Qh7ECV@vostro.rjw.lan>

On Thu, 2012-11-08 at 02:35 +0100, Rafael J. Wysocki wrote:
> On Thursday, November 08, 2012 09:15:08 AM Huang Ying wrote:
> > On Thu, 2012-11-08 at 00:09 +0100, Rafael J. Wysocki wrote:
> > > On Wednesday, November 07, 2012 11:51:15 PM Rafael J. Wysocki wrote:
> > > > On Wednesday, November 07, 2012 04:56:49 PM Alan Stern wrote:
> > > > > On Wed, 7 Nov 2012, Rafael J. Wysocki wrote:
> > > > > 
> > > > > > > Right.  The reasoning behind my proposal goes like this: When there's
> > > > > > > no driver, the subsystem can let userspace directly control the
> > > > > > > device's power level through the power/control attribute.
> > > > > > 
> > > > > > Well, we might as well just leave the runtime PM of PCI devices enabled, even
> > > > > > if they have no drivers, but modify the PCI bus type's runtime PM callbacks
> > > > > > to ignore devices with no drivers.
> > > > > > 
> > > > > > IIRC the reason why we decided to disable runtime PM for PCI device with no
> > > > > > drivers was that some of them refused to work again after being put by the
> > > > > > core into D3.  By making the PCI bus type's runtime PM callbacks ignore them
> > > > > > we'd avoid this problem without modifying the core's behavior.
> > > > > 
> > > > > It comes down to a question of the parent.  If a driverless PCI device
> > > > > isn't being used, shouldn't its parent be allowed to go into runtime
> > > > > suspend?  As things stand now, we do allow it.
> > > > > 
> > > > > The problem is that we don't disallow it when the driverless device
> > > > > _is_ being used.
> > > > 
> > > > We can make it depend on what's there in the control file.  Let's say if that's
> > > > "on" (ie. the devices usage counter is not zero), we won't allow the parent
> > > > to be suspended.
> > > > 
> > > > So, as I said, why don't we keep the runtime PM of PCI devices always enabled,
> > > > regardless of whether or not there is a driver, and arrange things in such a
> > > > way that the device is automatically "suspended" if user space writes "auto"
> > > > to the control file.  IOW, I suppose we can do something like this:
> > > 
> > > It probably is better to treat the "no driver" case in a special way, though:
> > > 
> > > ---
> > >  drivers/pci/pci-driver.c |   45 +++++++++++++++++++++++++--------------------
> > >  drivers/pci/pci.c        |    2 ++
> > >  2 files changed, 27 insertions(+), 20 deletions(-)
> > > 
> > > Index: linux-pm/drivers/pci/pci-driver.c
> > > ===================================================================
> > > --- linux-pm.orig/drivers/pci/pci-driver.c
> > > +++ linux-pm/drivers/pci/pci-driver.c
> > > @@ -263,22 +263,17 @@ static long local_pci_probe(void *_ddi)
> > >  	/* The parent bridge must be in active state when probing */
> > >  	if (parent)
> > >  		pm_runtime_get_sync(parent);
> > > -	/* Unbound PCI devices are always set to disabled and suspended.
> > > -	 * During probe, the device is set to enabled and active and the
> > > -	 * usage count is incremented.  If the driver supports runtime PM,
> > > -	 * it should call pm_runtime_put_noidle() in its probe routine and
> > > +	/*
> > > +	 * During probe, the device is set to active and the usage count is
> > > +	 * incremented.  If the driver supports runtime PM, it should call
> > > +	 * pm_runtime_put_noidle() in its probe routine and
> > >  	 * pm_runtime_get_noresume() in its remove routine.
> > >  	 */
> > > -	pm_runtime_get_noresume(dev);
> > > -	pm_runtime_set_active(dev);
> > > -	pm_runtime_enable(dev);
> > > -
> > > +	pm_runtime_get_sync(dev);
> > >  	rc = ddi->drv->probe(ddi->dev, ddi->id);
> > > -	if (rc) {
> > > -		pm_runtime_disable(dev);
> > > -		pm_runtime_set_suspended(dev);
> > > -		pm_runtime_put_noidle(dev);
> > > -	}
> > > +	if (rc)
> > > +		pm_runtime_put_sync(dev);
> > > +
> > >  	if (parent)
> > >  		pm_runtime_put(parent);
> > >  	return rc;
> > > @@ -369,9 +364,7 @@ static int pci_device_remove(struct devi
> > >  	}
> > >  
> > >  	/* Undo the runtime PM settings in local_pci_probe() */
> > > -	pm_runtime_disable(dev);
> > > -	pm_runtime_set_suspended(dev);
> > > -	pm_runtime_put_noidle(dev);
> > > +	pm_runtime_put_sync(dev);
> > >  
> > >  	/*
> > >  	 * If the device is still on, set the power state as "unknown",
> > > @@ -998,10 +991,14 @@ static int pci_pm_restore(struct device
> > >  static int pci_pm_runtime_suspend(struct device *dev)
> > >  {
> > >  	struct pci_dev *pci_dev = to_pci_dev(dev);
> > > -	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
> > > +	const struct dev_pm_ops *pm;
> > >  	pci_power_t prev = pci_dev->current_state;
> > >  	int error;
> > >  
> > > +	if (!dev->driver)
> > > +		return 0;
> > > +
> > > +	pm = dev->driver->pm;
> > >  	if (!pm || !pm->runtime_suspend)
> > >  		return -ENOSYS;
> > >  
> > > @@ -1035,8 +1032,12 @@ static int pci_pm_runtime_resume(struct
> > >  {
> > >  	int rc;
> > >  	struct pci_dev *pci_dev = to_pci_dev(dev);
> > > -	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
> > > +	const struct dev_pm_ops *pm;
> > > +
> > > +	if (!dev->driver)
> > > +		return 0;
> > >  
> > > +	pm = dev->driver->pm;
> > >  	if (!pm || !pm->runtime_resume)
> > >  		return -ENOSYS;
> > >  
> > > @@ -1054,8 +1055,12 @@ static int pci_pm_runtime_resume(struct
> > >  
> > >  static int pci_pm_runtime_idle(struct device *dev)
> > >  {
> > > -	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
> > > +	const struct dev_pm_ops *pm;
> > > +
> > > +	if (!dev->driver)
> > > +		goto out:
> > >  
> > > +	pm = dev->driver->pm;
> > >  	if (!pm)
> > >  		return -ENOSYS;
> > >  
> > > @@ -1065,8 +1070,8 @@ static int pci_pm_runtime_idle(struct de
> > >  			return ret;
> > >  	}
> > >  
> > > + out:
> > >  	pm_runtime_suspend(dev);
> > > -
> > >  	return 0;
> > >  }
> > >  
> > > Index: linux-pm/drivers/pci/pci.c
> > > ===================================================================
> > > --- linux-pm.orig/drivers/pci/pci.c
> > > +++ linux-pm/drivers/pci/pci.c
> > > @@ -1868,6 +1868,8 @@ void pci_pm_init(struct pci_dev *dev)
> > >  	u16 pmc;
> > >  
> > >  	pm_runtime_forbid(&dev->dev);
> > > +	pm_runtime_set_active(dev);
> > > +	pm_runtime_enable(&dev->dev);
> > >  	device_enable_async_suspend(&dev->dev);
> > >  	dev->wakeup_prepared = false;
> > >  
> > 
> > I think the patch can fix the issue in a better way.
> 
> I'm not sure what you mean.

I mean your patch can fix the driver-less VGA issue.  And it is better
than my original patch.  The following discussion is not about this
specific issue.  Just about PM core logic.

> > Do we still need to clarify state about disabled and forbidden?  When a
> > device is forbidden and the usage_count > 0,
> 
> "Forbidden" always means usage_count > 0.

Yes.

> > is it a good idea to allow to set device state to SUSPENDED if the device
> > is disabled?
> 
> No, it is not.  The status should always be ACTIVE as long as usage_count > 0.
> However, in some cases we actually would like to change the status to
> SUSPENDED when usage_count becomes equal to 0, because that means we can
> suspend (I mean really suspend) the parents of the devices in question
> (and we want to notify the parents in those cases).

So do you think Alan Stern's suggestion about forbidden and disabled is
the right way to go?

>	Make pm_runtime_set_suspended() fail if runtime PM is 
> 	forbidden.
> 
> 	Make pm_runtime_forbid() call pm_runtime_set_active()
> 	(and do a runtime resume of the parent) if disable_depth > 0.
>
>	Make the PCI runtime-idle routine call 
> 	pm_runtime_set_suspended() if disable_depth > 0.  Or maybe
> 	do this for all devices, in the runtime PM core.

In this way, we do not really need to call pm_runtime_set_suspended() in
fact.  Because if disabled and usage_count=0, device will be set to
SUSPENDED state automatically.

Best Regards,
Huang Ying



^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox