Linux Power Management development
 help / color / mirror / Atom feed
* Re: [PATCH 8/8] OMAP3: powerdomain data: add wake-up latency figures
From: Paul Walmsley @ 2011-10-10 22:52 UTC (permalink / raw)
  To: Jean Pihet; +Cc: Kevin Hilman, Linux PM mailing list, linux-omap, Jean Pihet
In-Reply-To: <alpine.DEB.2.00.1110101636470.5205@utopia.booyaka.com>

On Mon, 10 Oct 2011, Paul Walmsley wrote:

> OK.  sys_offmode only applies to OFF mode.  Voltage scaling can also occur 
> during RETENTION and INACTIVE ("sleep").  So were these results with 
> retention and voltage scaling disabled?

This last sentence should have read:

So were these results with retention and sleep voltage scaling disabled?

- Paul

^ permalink raw reply

* Re: [PATCH 8/8] OMAP3: powerdomain data: add wake-up latency figures
From: Paul Walmsley @ 2011-10-10 22:44 UTC (permalink / raw)
  To: Jean Pihet; +Cc: Kevin Hilman, Linux PM mailing list, linux-omap, Jean Pihet
In-Reply-To: <CAORVsuX2PUubPzO3mWTaO0Lg6Mdog8wjDY6hm93qheMwoz27CA@mail.gmail.com>

On Mon, 10 Oct 2011, Jean Pihet wrote:

> On Fri, Oct 7, 2011 at 6:17 AM, Paul Walmsley <paul@pwsan.com> wrote:
> > On Wed, 21 Sep 2011, jean.pihet@newoldbits.com wrote:
> >
> >> From: Jean Pihet <j-pihet@ti.com>
> >>
> >> Figures are added to the power domains structs for RET and OFF modes.
> >>
> >> Note: the latency figures for MPU, PER, CORE, NEON have been obtained
> >> from actual measurements.
> >> The latency figures for the other power domains are preliminary and
> >> shall be added.
> >>
> >> Cf. http://www.omappedia.org/wiki/Power_Management_Device_Latencies_Measurement
> >> for a detailed explanation on where are the numbers magically coming from.
> >>
> >> Tested on OMAP3 Beagleboard in RET/OFF using wake-up latency constraints
> >> on MPU, CORE and PER.
> >
> > Do the CSWR measurements include the time for the PMIC to scale the
> > voltage, or do they just represent the time to enter and exit clock stop
> > (presumably with DPLL idling)?
> 
> As described at [1] the measurements have not been performed with
> sys_offmode and sys_clkreq signals toggling correctly, because of the
> lack of support for it in the kernel.
> However the results are including a correction for the sys_offmode
> transition time (11.5ms), but no correction for the sys_clkreq signal
> (which should be 1ms for sysclk on, 2.5ms for sysclk off).
> 
> [1] http://www.omappedia.org/wiki/Power_Management_Device_Latencies_Measurement#cpuidle_results

OK.  sys_offmode only applies to OFF mode.  Voltage scaling can also occur 
during RETENTION and INACTIVE ("sleep").  So were these results with 
retention and voltage scaling disabled?


- Paul

^ permalink raw reply

* Re: [PATCH 8/8] OMAP3: powerdomain data: add wake-up latency figures
From: Paul Walmsley @ 2011-10-07 15:26 UTC (permalink / raw)
  To: jean.pihet; +Cc: Kevin Hilman, Linux PM mailing list, linux-omap, Jean Pihet
In-Reply-To: <1316622258-3892-9-git-send-email-j-pihet@ti.com>

Hi Jean

On Wed, 21 Sep 2011, jean.pihet@newoldbits.com wrote:

> Figures are added to the power domains structs for RET and OFF modes.
> 
> Note: the latency figures for MPU, PER, CORE, NEON have been obtained
> from actual measurements.
> The latency figures for the other power domains are preliminary and
> shall be added.
> 
> Cf. http://www.omappedia.org/wiki/Power_Management_Device_Latencies_Measurement
> for a detailed explanation on where are the numbers magically coming from.
> 
> Tested on OMAP3 Beagleboard in RET/OFF using wake-up latency constraints
> on MPU, CORE and PER.

A few more question about these numbers.

- Are these numbers the ones right off the hardware, or rounded, or 
worst-case plus a certain percentage?

- What frequency/voltage were these measured at?

- Also, you mention OMAP3 Beagleboard.  Is that 34xx or 36xx Beagleboard?

> Signed-off-by: Jean Pihet <j-pihet@ti.com>
> ---
>  arch/arm/mach-omap2/powerdomains3xxx_data.c |   78 +++++++++++++++++++++++++++
>  1 files changed, 78 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/arm/mach-omap2/powerdomains3xxx_data.c b/arch/arm/mach-omap2/powerdomains3xxx_data.c
> index 469a920..32922bb 100644
> --- a/arch/arm/mach-omap2/powerdomains3xxx_data.c
> +++ b/arch/arm/mach-omap2/powerdomains3xxx_data.c
> @@ -31,6 +31,14 @@
>  
>  /*
>   * Powerdomains
> + *
> + * The wakeup_lat values are derived from measurements on
> + * the actual target.
> + *
> + * Note: the latency figures for MPU, PER, CORE, NEON have been obtained
> + * from actual measurements.
> + * The latency figures for the other power domains are preliminary and
> + * shall be added.
>   */
>  
>  static struct powerdomain iva2_pwrdm = {
> @@ -52,6 +60,13 @@ static struct powerdomain iva2_pwrdm = {
>  		[2] = PWRSTS_OFF_ON,
>  		[3] = PWRSTS_ON,
>  	},
> +	.wakeup_lat = {
> +		[PWRDM_FUNC_PWRST_OFF] = 1100,
> +		[PWRDM_FUNC_PWRST_OSWR] = UNSUP_STATE,
> +		[PWRDM_FUNC_PWRST_CSWR] = 350,
> +		[PWRDM_FUNC_PWRST_INACTIVE] = UNSUP_STATE,
> +		[PWRDM_FUNC_PWRST_ON] = 0,
> +	},
>  };
>  
>  static struct powerdomain mpu_3xxx_pwrdm = {
> @@ -68,6 +83,13 @@ static struct powerdomain mpu_3xxx_pwrdm = {
>  	.pwrsts_mem_on	  = {
>  		[0] = PWRSTS_OFF_ON,
>  	},
> +	.wakeup_lat = {
> +		[PWRDM_FUNC_PWRST_OFF] = 1830,
> +		[PWRDM_FUNC_PWRST_OSWR] = UNSUP_STATE,
> +		[PWRDM_FUNC_PWRST_CSWR] = 121,
> +		[PWRDM_FUNC_PWRST_INACTIVE] = UNSUP_STATE,
> +		[PWRDM_FUNC_PWRST_ON] = 0,
> +	},
>  };
>  
>  /*
> @@ -98,6 +120,13 @@ static struct powerdomain core_3xxx_pre_es3_1_pwrdm = {
>  		[0] = PWRSTS_OFF_RET_ON, /* MEM1ONSTATE */
>  		[1] = PWRSTS_OFF_RET_ON, /* MEM2ONSTATE */
>  	},
> +	.wakeup_lat = {
> +		[PWRDM_FUNC_PWRST_OFF] = 3082,
> +		[PWRDM_FUNC_PWRST_OSWR] = UNSUP_STATE,
> +		[PWRDM_FUNC_PWRST_CSWR] = 153,
> +		[PWRDM_FUNC_PWRST_INACTIVE] = UNSUP_STATE,
> +		[PWRDM_FUNC_PWRST_ON] = 0,
> +	},
>  };
>  
>  static struct powerdomain core_3xxx_es3_1_pwrdm = {
> @@ -121,6 +150,13 @@ static struct powerdomain core_3xxx_es3_1_pwrdm = {
>  		[0] = PWRSTS_OFF_RET_ON, /* MEM1ONSTATE */
>  		[1] = PWRSTS_OFF_RET_ON, /* MEM2ONSTATE */
>  	},
> +	.wakeup_lat = {
> +		[PWRDM_FUNC_PWRST_OFF] = 3082,
> +		[PWRDM_FUNC_PWRST_OSWR] = UNSUP_STATE,
> +		[PWRDM_FUNC_PWRST_CSWR] = 153,
> +		[PWRDM_FUNC_PWRST_INACTIVE] = UNSUP_STATE,
> +		[PWRDM_FUNC_PWRST_ON] = 0,
> +	},
>  };
>  
>  static struct powerdomain dss_pwrdm = {
> @@ -136,6 +172,13 @@ static struct powerdomain dss_pwrdm = {
>  	.pwrsts_mem_on	  = {
>  		[0] = PWRSTS_ON,  /* MEMONSTATE */
>  	},
> +	.wakeup_lat = {
> +		[PWRDM_FUNC_PWRST_OFF] = 70,
> +		[PWRDM_FUNC_PWRST_OSWR] = UNSUP_STATE,
> +		[PWRDM_FUNC_PWRST_CSWR] = 20,
> +		[PWRDM_FUNC_PWRST_INACTIVE] = UNSUP_STATE,
> +		[PWRDM_FUNC_PWRST_ON] = 0,
> +	},
>  };
>  
>  /*
> @@ -157,6 +200,13 @@ static struct powerdomain sgx_pwrdm = {
>  	.pwrsts_mem_on	  = {
>  		[0] = PWRSTS_ON,  /* MEMONSTATE */
>  	},
> +	.wakeup_lat = {
> +		[PWRDM_FUNC_PWRST_OFF] = 1000,
> +		[PWRDM_FUNC_PWRST_OSWR] = UNSUP_STATE,
> +		[PWRDM_FUNC_PWRST_CSWR] = UNSUP_STATE,
> +		[PWRDM_FUNC_PWRST_INACTIVE] = UNSUP_STATE,
> +		[PWRDM_FUNC_PWRST_ON] = 0,
> +	},
>  };
>  
>  static struct powerdomain cam_pwrdm = {
> @@ -172,6 +222,13 @@ static struct powerdomain cam_pwrdm = {
>  	.pwrsts_mem_on	  = {
>  		[0] = PWRSTS_ON,  /* MEMONSTATE */
>  	},
> +	.wakeup_lat = {
> +		[PWRDM_FUNC_PWRST_OFF] = 850,
> +		[PWRDM_FUNC_PWRST_OSWR] = UNSUP_STATE,
> +		[PWRDM_FUNC_PWRST_CSWR] = 35,
> +		[PWRDM_FUNC_PWRST_INACTIVE] = UNSUP_STATE,
> +		[PWRDM_FUNC_PWRST_ON] = 0,
> +	},
>  };
>  
>  static struct powerdomain per_pwrdm = {
> @@ -187,6 +244,13 @@ static struct powerdomain per_pwrdm = {
>  	.pwrsts_mem_on	  = {
>  		[0] = PWRSTS_ON,  /* MEMONSTATE */
>  	},
> +	.wakeup_lat = {
> +		[PWRDM_FUNC_PWRST_OFF] = 671,
> +		[PWRDM_FUNC_PWRST_OSWR] = UNSUP_STATE,
> +		[PWRDM_FUNC_PWRST_CSWR] = 31,
> +		[PWRDM_FUNC_PWRST_INACTIVE] = UNSUP_STATE,
> +		[PWRDM_FUNC_PWRST_ON] = 0,
> +	},
>  };
>  
>  static struct powerdomain emu_pwrdm = {
> @@ -201,6 +265,13 @@ static struct powerdomain neon_pwrdm = {
>  	.omap_chip	  = OMAP_CHIP_INIT(CHIP_IS_OMAP3430),
>  	.pwrsts		  = PWRSTS_OFF_RET_ON,
>  	.pwrsts_logic_ret = PWRSTS_RET,
> +	.wakeup_lat = {
> +		[PWRDM_FUNC_PWRST_OFF] = 0,
> +		[PWRDM_FUNC_PWRST_OSWR] = UNSUP_STATE,
> +		[PWRDM_FUNC_PWRST_CSWR] = 0,
> +		[PWRDM_FUNC_PWRST_INACTIVE] = UNSUP_STATE,
> +		[PWRDM_FUNC_PWRST_ON] = 0,
> +	},
>  };
>  
>  static struct powerdomain usbhost_pwrdm = {
> @@ -223,6 +294,13 @@ static struct powerdomain usbhost_pwrdm = {
>  	.pwrsts_mem_on	  = {
>  		[0] = PWRSTS_ON,  /* MEMONSTATE */
>  	},
> +	.wakeup_lat = {
> +		[PWRDM_FUNC_PWRST_OFF] = 800,
> +		[PWRDM_FUNC_PWRST_OSWR] = UNSUP_STATE,
> +		[PWRDM_FUNC_PWRST_CSWR] = 150,
> +		[PWRDM_FUNC_PWRST_INACTIVE] = UNSUP_STATE,
> +		[PWRDM_FUNC_PWRST_ON] = 0,
> +	},
>  };
>  
>  static struct powerdomain dpll1_pwrdm = {
> -- 
> 1.7.4.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-omap" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


- Paul

^ permalink raw reply

* Re: [PATCH 8/8] OMAP3: powerdomain data: add wake-up latency figures
From: Paul Walmsley @ 2011-10-07  4:17 UTC (permalink / raw)
  To: jean.pihet; +Cc: Kevin Hilman, Linux PM mailing list, linux-omap, Jean Pihet
In-Reply-To: <1316622258-3892-9-git-send-email-j-pihet@ti.com>

Hi Jean

On Wed, 21 Sep 2011, jean.pihet@newoldbits.com wrote:

> From: Jean Pihet <j-pihet@ti.com>
> 
> Figures are added to the power domains structs for RET and OFF modes.
> 
> Note: the latency figures for MPU, PER, CORE, NEON have been obtained
> from actual measurements.
> The latency figures for the other power domains are preliminary and
> shall be added.
> 
> Cf. http://www.omappedia.org/wiki/Power_Management_Device_Latencies_Measurement
> for a detailed explanation on where are the numbers magically coming from.
> 
> Tested on OMAP3 Beagleboard in RET/OFF using wake-up latency constraints
> on MPU, CORE and PER.

Do the CSWR measurements include the time for the PMIC to scale the 
voltage, or do they just represent the time to enter and exit clock stop 
(presumably with DPLL idling)? 


- Paul

^ permalink raw reply

* Re: [PATCH 4/8] OMAP2+: omap_hwmod: manage the wake-up latency constraints
From: Paul Walmsley @ 2011-10-07  2:53 UTC (permalink / raw)
  To: jean.pihet; +Cc: Kevin Hilman, Linux PM mailing list, linux-omap, Jean Pihet
In-Reply-To: <1316622258-3892-5-git-send-email-j-pihet@ti.com>

Hi

a comment:

On Wed, 21 Sep 2011, jean.pihet@newoldbits.com wrote:

> From: Jean Pihet <j-pihet@ti.com>
> 
> Hwmod is queried from the OMAP_PM layer to manage the power domains
> wake-up latency constraints. Hwmod retrieves the correct power domain
> and if it exists it calls the corresponding power domain function.
> 
> Tested on OMAP3 Beagleboard and OMAP4 Pandaboard in RET/OFF using wake-up
> latency constraints on MPU, CORE and PER.
> 
> Signed-off-by: Jean Pihet <j-pihet@ti.com>
> ---
>  arch/arm/mach-omap2/omap_hwmod.c             |   26 +++++++++++++++++++++++++-
>  arch/arm/plat-omap/include/plat/omap_hwmod.h |    2 ++
>  2 files changed, 27 insertions(+), 1 deletions(-)
> 
> diff --git a/arch/arm/mach-omap2/omap_hwmod.c b/arch/arm/mach-omap2/omap_hwmod.c
> index 84cc0bd..c6b1cc9 100644
> --- a/arch/arm/mach-omap2/omap_hwmod.c
> +++ b/arch/arm/mach-omap2/omap_hwmod.c
> @@ -2618,11 +2619,34 @@ ohsps_unlock:
>  	return ret;
>  }
>  
> +/*
> + * omap_hwmod_set_wkup_constraint- set/release a wake-up latency constraint
> + *
> + * @oh: struct omap_hwmod* to which the target device belongs to.
> + * @cookie: identifier of the constraints list for @oh.
> + * @min_latency: the minimum allowed wake-up latency for @oh.
> + *
> + * Returns 0 upon success.
> + */

It's good that there is some documentation here, but it would be better if 
it were kerneldoc-style documentation.  Please see 
Documentation/kernel-doc-nano-HOWTO.txt.

Also it would be good to have a bit more documentation here beyond simply 
"Returns 0 upon success."  For example, how should a caller remove a 
wakeup latency constraint?  Also, this function can return -EINVAL, not 
counting whatever pwrdm_set_wkup_lat_constraint() can return, so that 
should be documented above also.

This applies to the function comments in the rest of the patches too.  
Some of them have quite good kerneldoc comments, such as 
pwrdm_wakeuplat_update_pwrst(); others need some work, like 
pwrdm_set_wkup_lat_constraint().

> +int omap_hwmod_set_wkup_lat_constraint(struct omap_hwmod *oh,
> +				       void *cookie, long min_latency)
> +{
> +	struct powerdomain *pwrdm = omap_hwmod_get_pwrdm(oh);
> +
> +	if (!pwrdm) {
> +		pr_err("%s: Error: could not find powerdomain "
> +		       "for %s\n", __func__, oh->name);
> +		return -EINVAL;
> +	}
> +
> +	return pwrdm_set_wkup_lat_constraint(pwrdm, cookie, min_latency);
> +}
> +
>  /**
>   * omap_hwmod_get_context_loss_count - get lost context count
>   * @oh: struct omap_hwmod *
>   *
> - * Query the powerdomain of of @oh to get the context loss
> + * Query the powerdomain of @oh to get the context loss
>   * count for this device.
>   *
>   * Returns the context loss count of the powerdomain assocated with @oh
> diff --git a/arch/arm/plat-omap/include/plat/omap_hwmod.h b/arch/arm/plat-omap/include/plat/omap_hwmod.h
> index 0e329ca..75e0e7a 100644
> --- a/arch/arm/plat-omap/include/plat/omap_hwmod.h
> +++ b/arch/arm/plat-omap/include/plat/omap_hwmod.h
> @@ -603,6 +603,8 @@ int omap_hwmod_for_each_by_class(const char *classname,
>  				 void *user);
>  
>  int omap_hwmod_set_postsetup_state(struct omap_hwmod *oh, u8 state);
> +int omap_hwmod_set_wkup_lat_constraint(struct omap_hwmod *oh, void *cookie,
> +				       long min_latency);
>  u32 omap_hwmod_get_context_loss_count(struct omap_hwmod *oh);
>  
>  int omap_hwmod_no_setup_reset(struct omap_hwmod *oh);
> -- 
> 1.7.4.1
> 


- Paul

^ permalink raw reply

* Re: [PATCHSET cgroup] extend threadgroup locking
From: Tejun Heo @ 2011-09-11  3:35 UTC (permalink / raw)
  To: rjw, paul, lizf
  Cc: fweisbec, containers, linux-kernel, Oleg Nesterov, linux-pm, akpm
In-Reply-To: <1315159280-25032-1-git-send-email-htejun@gmail.com>

On Mon, Sep 05, 2011 at 03:01:16AM +0900, Tejun Heo wrote:
> Hello,
> 
> cgroup currently only blocks new threads from joining the target
> threadgroup during migration, and on-going migration could race
> against exec and exit leading to interesting problems - the symmetry
> between various attach methods, task exiting during method execution,
> ->exit() racing against attach methods, migrating task switching basic
> properties during exec and so on.
> 
> This patchset extends threadgroup locking such that it covers all
> operations which can alter the threadgroup - fork, exit and exec, and
> update cgroup to take advantage of it.  rwsem read ops are added to
> exit path but exec is excluded by grabbing the existing
> cred_guard_mutex from threadgroup locking helper.
> 
> This makes threadgroup locking complete and resolves cgroup issues
> stemming from the target taskset being unstable.
> 
> This patchset is on top of the current pm-freezer + "freezer: fixes &
> simplifications" patchset and contains the following four patches.
> Patch list and diffstat follow.
> 
> Thanks.
> 
>  [PATCH 1/4] cgroup: change locking order in attach_task_by_pid()
>  [PATCH 2/4] threadgroup: rename signal->threadgroup_fork_lock to
>  [PATCH 3/4] threadgroup: extend threadgroup_lock() to cover exit and
>  [PATCH 4/4] cgroup: always lock threadgroup during migration
> 
>  include/linux/init_task.h |    9 ++----
>  include/linux/sched.h     |   58 ++++++++++++++++++++++++++++---------------
>  kernel/cgroup.c           |   62 +++++++++++++++++++++-------------------------
>  kernel/exit.c             |   16 ++++++++---
>  kernel/fork.c             |    8 ++---
>  5 files changed, 88 insertions(+), 65 deletions(-)

Oops, forgot to cc Oleg on this series.

Oleg, this one definitely needs your review.  The original thread is...

  http://thread.gmane.org/gmane.linux.kernel/1187853/focus=1188347

If you want the series in mbox format, please let me know.

Thanks.

-- 
tejun

^ permalink raw reply

* Re: [PATCH 1/1] freezer: fix wait_event_freezable/__thaw_task races
From: Tejun Heo @ 2011-09-11  1:54 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: containers, linux-kernel, linux-pm, paul
In-Reply-To: <20110908175926.GA26986@redhat.com>

Hello,

On Thu, Sep 08, 2011 at 07:59:26PM +0200, Oleg Nesterov wrote:
> > Indeed, nice catch.  This one actually is pretty dangerous.  We
> > probably want to make a separate fix for this and backport it to
> > -stable?
> 
> And I forgot to mention that wait_event_freezable_timeout() doesn't
> handle -ERESTARTSYS at all.
> 
> But I don't think -stable needs this fix. According to grep, nobody
> check the returned value, and none of the callers plays with signals.

Ah, no user, okay.

> > Yeap, with freezable_with_signal gone, this looks correct & simpler to
> > me
> 
> I don't really understand this... set_freezable_with_signal() has a
> lot of problems, yes... But even if it wasn't removed this fix makes
> sense anyway, afaics.
> 
> If freezable_with_signal caller does wait_event_freezable_timeout(),
> __retval becomes -ERESTARTSYS after freeze_task(). The next iteration
> will return 0 with the KERN_ERR message from schedule_timeout().

Hmmm... but with the change, if a kthread gets TIF_SIGPENDING from
freezer and then thawed, it would not enter try_to_freeze() and thus
won't clear TIF_SIGPENDING.  The original code was racy too but the
window would be much larger afterwards.  Anyways, this doesn't matter
anymore.

> > but it would be nice if you can sprinkle some documentation while
> > at it. :)
> 
> But they already have the comment ;) What can I add?

Proper /** - */ comment w/ return value documentation? :P

Thanks.

-- 
tejun

^ permalink raw reply

* Re: [PATCH 6/6] freezer: kill unused set_freezable_with_signal()
From: Tejun Heo @ 2011-09-11  1:37 UTC (permalink / raw)
  To: Matt Helsley; +Cc: containers, Oleg Nesterov, linux-kernel, linux-pm, paul
In-Reply-To: <20110908180159.GA4197@count0.beaverton.ibm.com>

Hello, Matt.

On Thu, Sep 08, 2011 at 11:01:59AM -0700, Matt Helsley wrote:
> Looking at the code and docs I realize I didn't explicitly mention that
> kthreads could not be frozen by the cgroup freezer. However, the code did
> not allow it. When freezing tasks the cgroup freezer always did:
> 
> 	freeze_task(task, true)
> 
> which would only freeze tasks without PF_FREEZER_NOSIG due to the second
> "sig_only" parameter. I believe this means it could not be used to
> freeze kthreads.
> 
> My feeling is kthreads should not be "managed" directly by userspace.
> Especially if that includes the ability to arbitrarily stop or freeze them.
> So it seems more appropriate to explicitly disallow freezing of kthreads
> via the cgroup freezer.

Oh yeah, that's what we should do and maybe that's what the current
code intends to achieve too but currently it ends up triggering
BUG_ON().

Thanks.

-- 
tejun

^ permalink raw reply

* [PATCH 2/2] PM/runtime: handle ->runtime_suspend failure correctly
From: tom.leiming @ 2011-09-10 14:37 UTC (permalink / raw)
  To: rjw, stern; +Cc: linux-pm, linux-kernel
In-Reply-To: <1315665442-4194-1-git-send-email-tom.leiming@gmail.com>

From: Ming Lei <tom.leiming@gmail.com>

If ->runtime_suspend returns -EAGAIN or -EBUSY, the device should
still be in ACTIVE state, so it is not needed to handle defered
resume and idle notification to its parent; if ->runtime_suspend
returns other fatal failure, it doesn't make sense to process defered
resume and send idle notification to its parent.

So skip these when failure is returned from ->runtime_suspend, also add
comments for this handling in rpm_suspend.

This patch also updates comments for rpm_suspend:

- 'Cancel a pending idle notification' should be put before, also
should be changed as 'Cancel a pending idle notification or
autosuspend/suspend'

- idle notification for suspend failure has been removed, so update
comments for it

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 drivers/base/power/runtime.c |   22 ++++++++++++++--------
 1 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
index 441b5a3..6fa3241 100644
--- a/drivers/base/power/runtime.c
+++ b/drivers/base/power/runtime.c
@@ -284,14 +284,17 @@ static int rpm_callback(int (*cb)(struct device *), struct device *dev)
  * @dev: Device to suspend.
  * @rpmflags: Flag bits.
  *
- * Check if the device's runtime PM status allows it to be suspended.  If
- * another suspend has been started earlier, either return immediately or wait
- * for it to finish, depending on the RPM_NOWAIT and RPM_ASYNC flags.  Cancel a
- * pending idle notification.  If the RPM_ASYNC flag is set then queue a
- * suspend request; otherwise run the ->runtime_suspend() callback directly.
- * If a deferred resume was requested while the callback was running then carry
- * it out; otherwise send an idle notification for the device (if the suspend
- * failed) or for its parent (if the suspend succeeded).
+ * Check if the device's runtime PM status allows it to be suspended. Cancel
+ * a pending idle notification or autosuspend/suspend. If another suspend has
+ * been started earlier, either return immediately or wait for it to finish,
+ * depending on the RPM_NOWAIT and RPM_ASYNC flags. If the RPM_ASYNC flag is
+ * set then queue a suspend request; otherwise run the ->runtime_suspend()
+ * callback directly. If ->runtime_suspend returns failure, just cancel
+ * pending request and wake up waited tasks, then return immediatelly.
+ * After ->runtime_suspend succeeded, if a deferred resume was requested
+ * while the callback was running then carry it out; otherwise send an idle
+ * notification for its parent (if both ignore_children and irq_safe
+ * are not set).
  *
  * This function must be called under dev->power.lock with interrupts disabled.
  */
@@ -422,6 +425,9 @@ static int rpm_suspend(struct device *dev, int rpmflags)
 	}
 	wake_up_all(&dev->power.wait_queue);
 
+	if (retval)
+		goto out;
+
 	if (dev->power.deferred_resume) {
 		rpm_resume(dev, 0);
 		retval = -EAGAIN;
-- 
1.7.4.1

^ permalink raw reply related

* [PATCH 1/2] PM/runtime: update document about callbacks
From: tom.leiming @ 2011-09-10 14:37 UTC (permalink / raw)
  To: rjw, stern; +Cc: linux-pm, linux-kernel

From: Ming Lei <tom.leiming@gmail.com>

Support for device power domains has been introduced in
commit 9659cc0678b954f187290c6e8b247a673c5d37e1 (PM: Make
system-wide PM and runtime PM treat subsystems consistently),
also power domain callbacks will take precedence over subsystem ones
from commit 4d27e9dcff00a6425d779b065ec8892e4f391661(PM: Make
power domain callbacks take precedence over subsystem ones).

So update part of "Device Runtime PM Callbacks" in
Documentation/power/runtime_pm.txt.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 Documentation/power/runtime_pm.txt |   19 ++++++++++++-------
 1 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/Documentation/power/runtime_pm.txt b/Documentation/power/runtime_pm.txt
index 1f05404..0e85608 100644
--- a/Documentation/power/runtime_pm.txt
+++ b/Documentation/power/runtime_pm.txt
@@ -43,13 +43,18 @@ struct dev_pm_ops {
 	...
 };
 
-The ->runtime_suspend(), ->runtime_resume() and ->runtime_idle() callbacks are
-executed by the PM core for either the device type, or the class (if the device
-type's struct dev_pm_ops object does not exist), or the bus type (if the
-device type's and class' struct dev_pm_ops objects do not exist) of the given
-device (this allows device types to override callbacks provided by bus types or
-classes if necessary).  The bus type, device type and class callbacks are
-referred to as subsystem-level callbacks in what follows.
+The ->runtime_suspend(), ->runtime_resume() and ->runtime_idle() callbacks
+are executed by the PM core for either the power domain, or the device type
+(if the device power domain's struct dev_pm_ops does not exist), or the class
+(if the device power domain's and type's struct dev_pm_ops object does not
+exist), or the bus type (if the device power domain's, type's and class'
+struct dev_pm_ops objects do not exist) of the given device, so the priority
+order of callbacks from high to low is that power domain callbacks, device
+type callbacks, class callbacks and bus type callbacks, and the high priority
+one will take precedence over low priority one. The bus type, device type and
+class callbacks are referred to as subsystem-level callbacks in what follows,
+and generally speaking, the power domain callbacks are used for representing
+power domains within a SoC.
 
 By default, the callbacks are always invoked in process context with interrupts
 enabled.  However, subsystems can use the pm_runtime_irq_safe() helper function
-- 
1.7.4.1

^ permalink raw reply related

* Re: LPC 2011 Power Management Session notes
From: mark gross @ 2011-09-09 22:50 UTC (permalink / raw)
  To: Jean Pihet; +Cc: linux-pm
In-Reply-To: <CAORVsuWKd9uvTTNCr_k9W6DeA+z3ZEaD99bqit++4xUy9zqKSQ@mail.gmail.com>

On Fri, Sep 09, 2011 at 10:09:48AM +0200, Jean Pihet wrote:
> Hi,
> 
> On Fri, Sep 9, 2011 at 2:28 AM, Paul Walmsley <paul@pwsan.com> wrote:
> >
> > Available at
> >
> > http://pad.ubuntu.com/plumbers-2011-pwr-mgt
> >
> > in real time if anyone's interested
> Thanks for the doc!
> 
> 
> About the Device PM QoS user space interface:
> 1. Proposal: For every device, there will be a sysfs wakeup latency attribute
> 2. Proposal: Userspace processes should communicate their needs via sysfs
> 
> I assume there can be only one user app that can open a given sysfs
> entry at a time, is that correct?
> 
> Let me think about it and propose a patch next week.


My recollection of the evening was that the exposing of per device qos
latencies was not considered a good idea without better articulation on
how it would be used.  (or misused) 

FWIW a few folks (ok maybe just me) are not comfortable with exposing
raw values without a good enough abstraction for them to user mode such
that the values used through that user mode ABI would be somewhat
portable between ISA's or even versions of the same chips / platforms.

--mark
 
> Regards,
> Jean
> 
> >
> >
> > - Paul
> > _______________________________________________
> > linux-pm mailing list
> > linux-pm@lists.linux-foundation.org
> > https://lists.linux-foundation.org/mailman/listinfo/linux-pm
> >
> _______________________________________________
> linux-pm mailing list
> linux-pm@lists.linux-foundation.org
> https://lists.linux-foundation.org/mailman/listinfo/linux-pm

^ permalink raw reply

* Re: LPC 2011 Power Management Session notes
From: Mark Brown @ 2011-09-09 16:50 UTC (permalink / raw)
  To: Jean Pihet; +Cc: linux-pm
In-Reply-To: <CAORVsuWKd9uvTTNCr_k9W6DeA+z3ZEaD99bqit++4xUy9zqKSQ@mail.gmail.com>

On Fri, Sep 09, 2011 at 10:09:48AM +0200, Jean Pihet wrote:
> On Fri, Sep 9, 2011 at 2:28 AM, Paul Walmsley <paul@pwsan.com> wrote:

> About the Device PM QoS user space interface:
> 1. Proposal: For every device, there will be a sysfs wakeup latency attribute
> 2. Proposal: Userspace processes should communicate their needs via sysfs

> I assume there can be only one user app that can open a given sysfs
> entry at a time, is that correct?

No, that's not the case.  The file implementation doesn't get to see
opens and closes at all, it just gets passed buffers to fill or read
from.

^ permalink raw reply

* Re: LPC 2011 Power Management Session notes
From: Jean Pihet @ 2011-09-09  8:09 UTC (permalink / raw)
  To: Paul Walmsley; +Cc: linux-pm
In-Reply-To: <alpine.DEB.2.00.1109081827360.17022@utopia.booyaka.com>

Hi,

On Fri, Sep 9, 2011 at 2:28 AM, Paul Walmsley <paul@pwsan.com> wrote:
>
> Available at
>
> http://pad.ubuntu.com/plumbers-2011-pwr-mgt
>
> in real time if anyone's interested
Thanks for the doc!


About the Device PM QoS user space interface:
1. Proposal: For every device, there will be a sysfs wakeup latency attribute
2. Proposal: Userspace processes should communicate their needs via sysfs

I assume there can be only one user app that can open a given sysfs
entry at a time, is that correct?

Let me think about it and propose a patch next week.

Regards,
Jean

>
>
> - Paul
> _______________________________________________
> linux-pm mailing list
> linux-pm@lists.linux-foundation.org
> https://lists.linux-foundation.org/mailman/listinfo/linux-pm
>

^ permalink raw reply

* LPC 2011 Power Management Session notes
From: Paul Walmsley @ 2011-09-09  0:28 UTC (permalink / raw)
  To: linux-pm


Available at 

http://pad.ubuntu.com/plumbers-2011-pwr-mgt

in real time if anyone's interested


- Paul

^ permalink raw reply

* Re: [PATCH 6/6] freezer: kill unused set_freezable_with_signal()
From: Matt Helsley @ 2011-09-08 18:01 UTC (permalink / raw)
  To: Tejun Heo; +Cc: containers, Oleg Nesterov, linux-kernel, linux-pm, paul
In-Reply-To: <20110906032846.GA18425@mtj.dyndns.org>

On Tue, Sep 06, 2011 at 12:28:55PM +0900, Tejun Heo wrote:
> Hello,
> 
> On Mon, Sep 05, 2011 at 06:20:12PM +0200, Oleg Nesterov wrote:
> > Perhaps it is correct... Just I do not understand what it should do.
> > I thought it is "wait_for_event && do_not_block_freezer". And at first
> > glance the code looks as if it tries to do this. Say, in the "likely"
> > case we restart wait_event_interruptible() after refrigerator().
> > 
> > But this looks racy. Suppose that freezing() is already false when
> > try_to_freeze() or __refrigerator() is called. Say, cgroup_freezer does
> > freeze_task() + __thaw_task(). Why it returns -ERESTARTSYS in this case?
> 
> It may return -ERESTARTSYS when not strictly necessary but given that
> it's supposed to trigger restart anyway I don't think it's actually
> broken (it doesn't break the contract of the wait).  Another thing to
> note is that kthread freezing via cgroup is almost fundamentally
> broken.  With the removal of freezable_with_signal, this shouldn't be
> an issue anymore although cgroup_freezer still needs to be fixed to
> account for kthreads for xstate transitions (it currently triggers
> BUG_ON).

Looking at the code and docs I realize I didn't explicitly mention that
kthreads could not be frozen by the cgroup freezer. However, the code did
not allow it. When freezing tasks the cgroup freezer always did:

	freeze_task(task, true)

which would only freeze tasks without PF_FREEZER_NOSIG due to the second
"sig_only" parameter. I believe this means it could not be used to
freeze kthreads.

My feeling is kthreads should not be "managed" directly by userspace.
Especially if that includes the ability to arbitrarily stop or freeze them.
So it seems more appropriate to explicitly disallow freezing of kthreads
via the cgroup freezer.

Cheers,
	-Matt Helsley

^ permalink raw reply

* Re: [PATCH 1/1] freezer: fix wait_event_freezable/__thaw_task races
From: Oleg Nesterov @ 2011-09-08 17:59 UTC (permalink / raw)
  To: Tejun Heo; +Cc: containers, linux-kernel, linux-pm, paul
In-Reply-To: <20110908010530.GD3987@mtj.dyndns.org>

Hi,

On 09/08, Tejun Heo wrote:
>
> Hello,
>
> On Wed, Sep 07, 2011 at 08:22:17PM +0200, Oleg Nesterov wrote:
> > wait_event_freezable() and wait_event_freezable_timeout() stop
> > the waiting if try_to_freeze() fails. This is not right, we can
> > race with __thaw_task() and in this case
> >
> > 	- wait_event_freezable() returns the wrong ERESTARTSYS
> >
> > 	- wait_event_freezable_timeout() can return the positive
> > 	  value while condition == F
>
> Indeed, nice catch.  This one actually is pretty dangerous.  We
> probably want to make a separate fix for this and backport it to
> -stable?

And I forgot to mention that wait_event_freezable_timeout() doesn't
handle -ERESTARTSYS at all.

But I don't think -stable needs this fix. According to grep, nobody
check the returned value, and none of the callers plays with signals.

> > Change the code to always check __retval/condition before return.
> >
> > Note: with or without this patch the timeout logic looks strange,
> > probably we should recalc timeout if try_to_freeze() returns T.
> >
> > Signed-off-by: Oleg Nesterov <oleg@redhat.com>
>
> Yeap, with freezable_with_signal gone, this looks correct & simpler to
> me

I don't really understand this... set_freezable_with_signal() has a
lot of problems, yes... But even if it wasn't removed this fix makes
sense anyway, afaics.

If freezable_with_signal caller does wait_event_freezable_timeout(),
__retval becomes -ERESTARTSYS after freeze_task(). The next iteration
will return 0 with the KERN_ERR message from schedule_timeout().

> but it would be nice if you can sprinkle some documentation while
> at it. :)

But they already have the comment ;) What can I add?

Oleg.

^ permalink raw reply

* [PATCH] PM / Runtime: pm_runtime_idle can be called in atomic context
From: tom.leiming @ 2011-09-08 16:06 UTC (permalink / raw)
  To: rjw, stern; +Cc: linux-pm

From: Ming Lei <tom.leiming@gmail.com>

Add to pm_runtime_idle the list of functions that can be called
in atomic context if pm_runtime_irq_safe() has been called for the
device.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 Documentation/power/runtime_pm.txt |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/Documentation/power/runtime_pm.txt b/Documentation/power/runtime_pm.txt
index 08d70e4..1f05404 100644
--- a/Documentation/power/runtime_pm.txt
+++ b/Documentation/power/runtime_pm.txt
@@ -477,6 +477,7 @@ pm_runtime_autosuspend_expiration()
 If pm_runtime_irq_safe() has been called for a device then the following helper
 functions may also be used in interrupt context:
 
+pm_runtime_idle()
 pm_runtime_suspend()
 pm_runtime_autosuspend()
 pm_runtime_resume()
-- 
1.7.4.1

^ permalink raw reply related

* Re: Issue: Runtime API usage in wake-up device irq_handler during wakeup from system-wide-suspend.
From: Kevin Hilman @ 2011-09-08 13:51 UTC (permalink / raw)
  To: t-kristo
  Cc: Balbi, Felipe, Basak, Partha, R, Govindraj, Munegowda, Keshava,
	linux-pm, linux-omap
In-Reply-To: <1315457911.2679.19.camel@sokoban>

Hi Tero,

Tero Kristo <t-kristo@ti.com> writes:

> On Wed, 2011-09-07 at 19:59 +0200, Hilman, Kevin wrote:
>> Tero Kristo <t-kristo@ti.com> writes:

[...]

>> > After thinking about this problem and looking at possible ways to fix
>> > it, I am planning to change the PRCM chain handler to be a driver, which
>> > gets suspended along with the rest of the system. This means the PRCM
>> > interrupt would get disabled also during this time, and it would be
>> > enabled in the driver->complete() call, which should happen after rest
>> > of the drivers have been able to enable their PM runtime in the
>> > driver->resume() call chain. Do you see any problems with this approach?
>> 
>> How will the system wakeup from retention or off-mode if the PRCM IRQ is
>> disabled?
>
> This is actually some sort of an issue with retention if we disable PRCM
> irq completely, off works purely with wakeup signals as we come out of
> reset. Anyway, I did some experimentation with this, and OMAP3 is able
> to wake up if we leave WKUP irq up, but disable IO chain irq. IO chain
> events seem to trigger a WKUP event also always, we just postpone the
> processing of IO chain until later. I had to also split the wakeup
> clearing for OMAP3 into 2 parts, one part handles wakeups and another IO
> chain. Currently both IO chain and WKUP are acked by the same handler.

Here's another option, which is kind of a hybrid of what's been
discussed so far.

The PRCM driver would leave the IRQs enabled during suspend, but would
just delay delivering them to the drivers.  IOW, handle/clear the PRCM
IRQ normally, but delay delivery of the *device* IRQ until the
->complete callback of the PRCM driver.

Doing this ensures all the drivers are resumed before any device IRQ is
delivered, and doesn't require any additional queuing of events in the
drivers.

Kevin

^ permalink raw reply

* Re: [Xen-devel] RE: [PATCH 5/7] xen/acpi: Domain0 acpi parser related platform hypercall
From: Konrad Rzeszutek Wilk @ 2011-09-08 13:38 UTC (permalink / raw)
  To: Cihula, Joseph
  Cc: Brown, Len, Jeremy Fitzhardinge, xen-devel@lists.xensource.com,
	keir@xen.org, x86@kernel.org, linux-acpi@vger.kernel.org,
	tboot-devel@lists.sourceforge.net, Fitzhardinge,
	tglx@linutronix.de, hpa@zytor.com, liang.tang@oracle.com,
	linux-pm@lists.linux-foundation.org
In-Reply-To: <20110907190659.GH7074@dumpdata.com>

On Wed, Sep 07, 2011 at 03:06:59PM -0400, Konrad Rzeszutek Wilk wrote:
> > > >> +#define XENPF_enter_acpi_sleep    51
> > > >> +struct xenpf_enter_acpi_sleep {
> > > >> +	/* IN variables */
> > > >> +	uint16_t pm1a_cnt_val;      /* PM1a control value. */
> > > >> +	uint16_t pm1b_cnt_val;      /* PM1b control value. */
> > > > These are uint32_t in native Linux--why truncate in the API and not at use?
> > > 
> > > Does ACPI define them as 32 or 16 bit?
> > 
> > The spec indicates that the length is variable and could be up to 32 bits (AFAICT).  And Linux uses 32b, which your other patch is truncating for this call.
> 
> Yikes! Well, looks like we need to fix the Xen ABI too. Lets get that fixed
> and also address all the other comments (thanks for looking at it) you pointed
> out.

So read up the ACPI spec and it says that the minimum is 2 bytes and does not
say anything about the maximum. The list of what the bits do stops at 16-bits
(the last two are reserved) so I think we are actually OK.

Albeit if the spec starts using more of them - then yes we will need to revist
this Xen ABI and potentially add a new call.

^ permalink raw reply

* Re: Issue: Runtime API usage in wake-up device irq_handler during wakeup from system-wide-suspend.
From: Tero Kristo @ 2011-09-08  4:58 UTC (permalink / raw)
  To: Hilman, Kevin
  Cc: Balbi, Felipe, Basak, Partha, R, Govindraj, Munegowda, Keshava,
	linux-pm, linux-omap
In-Reply-To: <87fwk8qllx.fsf@ti.com>

On Wed, 2011-09-07 at 19:59 +0200, Hilman, Kevin wrote:
> Tero Kristo <t-kristo@ti.com> writes:
> 
> > Hi,
> >
> > On Sat, 2011-08-27 at 21:42 +0200, Alan Stern wrote:
> >> On Sat, 27 Aug 2011, Santosh wrote:
> >> 
> >> > On Saturday 27 August 2011 07:31 PM, Alan Stern wrote:
> >> > > On Sat, 27 Aug 2011, Santosh wrote:
> >> > >
> >> > >> I might be wrong here, but after discussion with Govindraj on this
> >> > >> issue, it seems there is a flaw in the way OMAP chain handler
> >> > >> handling the child interrupts.
> >> > >>
> >> > >> On OMAP, we have special interrupt wakeup source at PRCM level and
> >> > >> many devices can trigger wakeup from low power via this common
> >> > >> interrupt source. The common interrupt source upon wakeup from low
> >> > >> power state, decodes the source of interrupt and based on that
> >> > >> source, calls the respective device ISR directly.
> >> > >>
> >> > >> The issue I see here is, the ISR on _a_ device (UART in this case)
> >> > >> is happening even before UART resume and DPM resume has been completed.
> >> > >> If this is the case, then it is surely asking for trouble. Because not
> >> > >> just clocks, but even driver state machine is already in suspend state
> >> > >> when the ISR is called.
> >> > >
> >> > > If the driver state machine is in the suspend state when the ISR is
> >> > > called, then the ISR should realize it is handling a wakeup event
> >> > > instead of a normal I/O event.  All it needs to do is turn off the
> >> > > interrupt source; the event itself will be taken care of during the
> >> > > device's resume callback.
> >> > >
> >> > Good point but the ISR is called as a function call and not real
> >> > hardware event so no need to turn-off the source in the child
> >> > ISR. Parent ISR will clear the source anyways.
> >> > 
> >> > But the intention here is to record the event for the child.
> >> 
> >> In that case the ISR only has to record the event.
> >> 
> >> > I mean for UART wakeup, the received character should be
> >> > extracted. If not done, UART will loose that character because
> >> > the event is lost. So not sure how the event will be taken
> >> > care during resume callback. Could you elaborate bit more on
> >> > last comment please?
> >> 
> >> The resume callback routine must check to see if an event was recorded.
> >> If one was, the routine must see whether a character was received while 
> >> the system was asleep and extract the character from the UART.  (It 
> >> probably won't hurt to do this even if an event wasn't recorded.)
> >> 
> >> Alan Stern
> >> 
> >
> > After thinking about this problem and looking at possible ways to fix
> > it, I am planning to change the PRCM chain handler to be a driver, which
> > gets suspended along with the rest of the system. This means the PRCM
> > interrupt would get disabled also during this time, and it would be
> > enabled in the driver->complete() call, which should happen after rest
> > of the drivers have been able to enable their PM runtime in the
> > driver->resume() call chain. Do you see any problems with this approach?
> 
> How will the system wakeup from retention or off-mode if the PRCM IRQ is
> disabled?

This is actually some sort of an issue with retention if we disable PRCM
irq completely, off works purely with wakeup signals as we come out of
reset. Anyway, I did some experimentation with this, and OMAP3 is able
to wake up if we leave WKUP irq up, but disable IO chain irq. IO chain
events seem to trigger a WKUP event also always, we just postpone the
processing of IO chain until later. I had to also split the wakeup
clearing for OMAP3 into 2 parts, one part handles wakeups and another IO
chain. Currently both IO chain and WKUP are acked by the same handler.

> Besides that, I don't like this approach because it leaves a rather long
> window during which no PRCM-triggered wakeup events can happen.  This is
> not good since we also want potential wakeup events that happen *during*
> system suspend to be able to cancel the suspend.

This should also be taken care of by the approach described above.

> 
> > The only issue I am seeing myself is if some driver decides to do
> > resume() in the complete() pm-op and potentially screwing the ordering
> > here...
> 
> Personally, I think Alan's approach is the only scalable approach.  This
> has to be handled by the drivers.
> 
> If the driver's ISR does a pm_runtime_get_sync() and it fails (in this
> case, with -EACCESS since runtime PM is disabled), the driver should
> handle that handle as Alan described above.

Yea I think this probably needs to be done anyway, at least on some
cases. The PRCM chain handler approach might be able to solve most of
the problems though. I think I will post what I have anyway for comments
hopefully later today, so you can have a look and say what you think.

-Tero



Texas Instruments Oy, Tekniikantie 12, 02150 Espoo. Y-tunnus: 0115040-6. Kotipaikka: Helsinki
 

^ permalink raw reply

* Re: [PATCH 1/1] freezer: fix wait_event_freezable/__thaw_task races
From: Tejun Heo @ 2011-09-08  1:05 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: containers, linux-kernel, linux-pm, paul
In-Reply-To: <20110907182217.GB13909@redhat.com>

Hello,

On Wed, Sep 07, 2011 at 08:22:17PM +0200, Oleg Nesterov wrote:
> wait_event_freezable() and wait_event_freezable_timeout() stop
> the waiting if try_to_freeze() fails. This is not right, we can
> race with __thaw_task() and in this case
> 
> 	- wait_event_freezable() returns the wrong ERESTARTSYS
> 
> 	- wait_event_freezable_timeout() can return the positive
> 	  value while condition == F

Indeed, nice catch.  This one actually is pretty dangerous.  We
probably want to make a separate fix for this and backport it to
-stable?

> Change the code to always check __retval/condition before return.
> 
> Note: with or without this patch the timeout logic looks strange,
> probably we should recalc timeout if try_to_freeze() returns T.
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>

Yeap, with freezable_with_signal gone, this looks correct & simpler to
me but it would be nice if you can sprinkle some documentation while
at it. :)

Thanks.

-- 
tejun

^ permalink raw reply

* Re: [Xen-devel] RE: [PATCH 5/7] xen/acpi: Domain0 acpi parser related platform hypercall
From: Konrad Rzeszutek Wilk @ 2011-09-07 19:06 UTC (permalink / raw)
  To: Cihula, Joseph
  Cc: Brown, Len, Jeremy Fitzhardinge, xen-devel@lists.xensource.com,
	keir@xen.org, x86@kernel.org, linux-acpi@vger.kernel.org,
	tboot-devel@lists.sourceforge.net, Fitzhardinge,
	tglx@linutronix.de, hpa@zytor.com, liang.tang@oracle.com,
	linux-pm@lists.linux-foundation.org
In-Reply-To: <9F57BF860713DF4BA3EFA4F8C6DFEDAC16F40F0B@ORSMSX101.amr.corp.intel.com>

> > >> +#define XENPF_enter_acpi_sleep    51
> > >> +struct xenpf_enter_acpi_sleep {
> > >> +	/* IN variables */
> > >> +	uint16_t pm1a_cnt_val;      /* PM1a control value. */
> > >> +	uint16_t pm1b_cnt_val;      /* PM1b control value. */
> > > These are uint32_t in native Linux--why truncate in the API and not at use?
> > 
> > Does ACPI define them as 32 or 16 bit?
> 
> The spec indicates that the length is variable and could be up to 32 bits (AFAICT).  And Linux uses 32b, which your other patch is truncating for this call.

Yikes! Well, looks like we need to fix the Xen ABI too. Lets get that fixed
and also address all the other comments (thanks for looking at it) you pointed
out.

^ permalink raw reply

* [PATCH 1/1] freezer: fix wait_event_freezable/__thaw_task races
From: Oleg Nesterov @ 2011-09-07 18:22 UTC (permalink / raw)
  To: Tejun Heo, Rafael J. Wysocki; +Cc: containers, paul, linux-kernel, linux-pm
In-Reply-To: <20110907182156.GA13909@redhat.com>

wait_event_freezable() and wait_event_freezable_timeout() stop
the waiting if try_to_freeze() fails. This is not right, we can
race with __thaw_task() and in this case

	- wait_event_freezable() returns the wrong ERESTARTSYS

	- wait_event_freezable_timeout() can return the positive
	  value while condition == F

Change the code to always check __retval/condition before return.

Note: with or without this patch the timeout logic looks strange,
probably we should recalc timeout if try_to_freeze() returns T.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---

 include/linux/freezer.h |   19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

--- 3.1/include/linux/freezer.h~w_e_f	2011-09-04 20:23:30.000000000 +0200
+++ 3.1/include/linux/freezer.h	2011-09-07 20:00:27.000000000 +0200
@@ -107,32 +107,33 @@ static inline int freezer_should_skip(st
  * Freezer-friendly wrappers around wait_event_interruptible() and
  * wait_event_interruptible_timeout(), originally defined in <linux/wait.h>
  */
-
 #define wait_event_freezable(wq, condition)				\
 ({									\
 	int __retval;							\
-	do {								\
+	for (;;) {							\
 		__retval = wait_event_interruptible(wq, 		\
 				(condition) || freezing(current));	\
-		if (__retval && !freezing(current))			\
+		if (__retval || (condition))				\
 			break;						\
-		else if (!(condition))					\
-			__retval = -ERESTARTSYS;			\
-	} while (try_to_freeze());					\
+		try_to_freeze();					\
+	}								\
 	__retval;							\
 })
 
-
 #define wait_event_freezable_timeout(wq, condition, timeout)		\
 ({									\
 	long __retval = timeout;					\
-	do {								\
+	for (;;) {							\
 		__retval = wait_event_interruptible_timeout(wq,		\
 				(condition) || freezing(current),	\
 				__retval); 				\
-	} while (try_to_freeze());					\
+		if (__retval <= 0 || (condition))			\
+			break;						\
+		try_to_freeze();					\
+	}								\
 	__retval;							\
 })
+
 #else /* !CONFIG_FREEZER */
 static inline bool frozen(struct task_struct *p) { return false; }
 static inline bool freezing(struct task_struct *p) { return false; }

^ permalink raw reply

* [PATCH 0/1] freezer: fix wait_event_freezable/__thaw_task races
From: Oleg Nesterov @ 2011-09-07 18:21 UTC (permalink / raw)
  To: Tejun Heo, Rafael J. Wysocki; +Cc: containers, paul, linux-kernel, linux-pm
In-Reply-To: <20110906155332.GF18425@mtj.dyndns.org>

Hi,

On 09/07, Tejun Heo wrote:
> >
> > I meant, unless the caller plays with allow_signal(), it has all rights to do
> >
> > 	if (wait_event_freezable(...))
> > 		BUG();
> >
> > This becomes correct with the code above.
>
> Yeap, sure, w/ freezable_with_signal gone, the above should work fine.
> Care to create a patch?

Sure.

But. Tejun, Rafael, I have to rely on your review. I have no idea how
can I test this change. Hopfully it is simple enough, though.

And. wait_event_freezable_timeout() obviously has another problem,
although I hope this is fine. The caller can spend a lot of time in
refrigerator, shouldn't we update "timeout" in this case? I hope we
shouldn't.

Oleg.

^ permalink raw reply

* Re: Issue: Runtime API usage in wake-up device irq_handler during wakeup from system-wide-suspend.
From: Kevin Hilman @ 2011-09-07 17:59 UTC (permalink / raw)
  To: t-kristo
  Cc: Balbi, Felipe, Basak, Partha, R, Govindraj, Munegowda, Keshava,
	linux-pm, linux-omap
In-Reply-To: <1315410513.2679.9.camel@sokoban>

Tero Kristo <t-kristo@ti.com> writes:

> Hi,
>
> On Sat, 2011-08-27 at 21:42 +0200, Alan Stern wrote:
>> On Sat, 27 Aug 2011, Santosh wrote:
>> 
>> > On Saturday 27 August 2011 07:31 PM, Alan Stern wrote:
>> > > On Sat, 27 Aug 2011, Santosh wrote:
>> > >
>> > >> I might be wrong here, but after discussion with Govindraj on this
>> > >> issue, it seems there is a flaw in the way OMAP chain handler
>> > >> handling the child interrupts.
>> > >>
>> > >> On OMAP, we have special interrupt wakeup source at PRCM level and
>> > >> many devices can trigger wakeup from low power via this common
>> > >> interrupt source. The common interrupt source upon wakeup from low
>> > >> power state, decodes the source of interrupt and based on that
>> > >> source, calls the respective device ISR directly.
>> > >>
>> > >> The issue I see here is, the ISR on _a_ device (UART in this case)
>> > >> is happening even before UART resume and DPM resume has been completed.
>> > >> If this is the case, then it is surely asking for trouble. Because not
>> > >> just clocks, but even driver state machine is already in suspend state
>> > >> when the ISR is called.
>> > >
>> > > If the driver state machine is in the suspend state when the ISR is
>> > > called, then the ISR should realize it is handling a wakeup event
>> > > instead of a normal I/O event.  All it needs to do is turn off the
>> > > interrupt source; the event itself will be taken care of during the
>> > > device's resume callback.
>> > >
>> > Good point but the ISR is called as a function call and not real
>> > hardware event so no need to turn-off the source in the child
>> > ISR. Parent ISR will clear the source anyways.
>> > 
>> > But the intention here is to record the event for the child.
>> 
>> In that case the ISR only has to record the event.
>> 
>> > I mean for UART wakeup, the received character should be
>> > extracted. If not done, UART will loose that character because
>> > the event is lost. So not sure how the event will be taken
>> > care during resume callback. Could you elaborate bit more on
>> > last comment please?
>> 
>> The resume callback routine must check to see if an event was recorded.
>> If one was, the routine must see whether a character was received while 
>> the system was asleep and extract the character from the UART.  (It 
>> probably won't hurt to do this even if an event wasn't recorded.)
>> 
>> Alan Stern
>> 
>
> After thinking about this problem and looking at possible ways to fix
> it, I am planning to change the PRCM chain handler to be a driver, which
> gets suspended along with the rest of the system. This means the PRCM
> interrupt would get disabled also during this time, and it would be
> enabled in the driver->complete() call, which should happen after rest
> of the drivers have been able to enable their PM runtime in the
> driver->resume() call chain. Do you see any problems with this approach?

How will the system wakeup from retention or off-mode if the PRCM IRQ is
disabled?

Besides that, I don't like this approach because it leaves a rather long
window during which no PRCM-triggered wakeup events can happen.  This is
not good since we also want potential wakeup events that happen *during*
system suspend to be able to cancel the suspend.

> The only issue I am seeing myself is if some driver decides to do
> resume() in the complete() pm-op and potentially screwing the ordering
> here...

Personally, I think Alan's approach is the only scalable approach.  This
has to be handled by the drivers.

If the driver's ISR does a pm_runtime_get_sync() and it fails (in this
case, with -EACCESS since runtime PM is disabled), the driver should
handle that handle as Alan described above.

Kevin

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox