* Re: [PATCH 4/6 v2] PM: Limit race conditions between runtime PM and system sleep (v2)
From: Linus Walleij @ 2011-10-27 13:54 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Kevin Hilman, linux-scsi, Greg KH, LKML, Jesse Barnes, Tejun Heo,
Linux PM mailing list, stable
In-Reply-To: <201106292334.24518.rjw@sisk.pl>
2011/6/29 Rafael J. Wysocki <rjw@sisk.pl>:
> One of the roles of the PM core is to prevent different PM callbacks
> executed for the same device object from racing with each other.
> Unfortunately, after commit e8665002477f0278f84f898145b1f141ba26ee26
> (PM: Allow pm_runtime_suspend() to succeed during system suspend)
> runtime PM callbacks may be executed concurrently with system
> suspend/resume callbacks for the same device.
(...)
> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
A quick question: is there some specific reason why this patch should
not go into the 3.0.y stable releases? We are trying to produce
a runtime PM system of product quality based on 3.0.y and we've
already had to backport this patch ourselves to get things stable.
We have also backported:
PM: Introduce generic "noirq" callback routines for subsystems (v2)
PM / Runtime: Update documentation of interactions with system sleep
PM / Runtime: Add new helper function: pm_runtime_status_suspended()
And now it seems to be sufficient to get this thing going.
Yours,
Linus Walleij
^ permalink raw reply
* Re: Problem statement: Opportunistic suspend and i8042 wakeups
From: Dmitry Torokhov @ 2011-10-27 5:47 UTC (permalink / raw)
To: Daniel Drake; +Cc: linux-pm, linux-input
In-Reply-To: <CAMLZHHTjZdxiRsEP=tQyQafkPHftOtq6XRgd09MdCJowSxKCvA@mail.gmail.com>
On Thu, Oct 13, 2011 at 03:25:04PM +0100, Daniel Drake wrote:
> Hi,
>
> Thanks for not getting bored of this :)
>
> On Mon, Oct 10, 2011 at 9:38 PM, Dmitry Torokhov
> <dmitry.torokhov@gmail.com> wrote:
> > I think input layer releasing keys at resume time is actually in the
> > wrong here, as it meddles with device state after physical device
> > (i8042, atkbd, etc) are fully woken up.
>
> Yes, that's whats happening here.
>
> > IIRC I put releasing originally in resume because I was concerned
> > with events getting discarded after S2D because we thew away the state
> > that happens after taking snapshot and there weren't enough granularity
> > in PM callbacks to differentiate between S2D and S2R.
> >
> > I believe if we move release of the keys into suspend handlers then
> > input core should not get into the middle of things for your case as you
> > said that the OLPC device will not suspend with a key still pressed and
> > upon release we would not be doing anything with key state (but
> > restore LED/SOUND only) which you do not care about.
>
> Not quite.
>
> The system will suspend, but will resume immediately (because of an
> incoming key press interrupt, which are repeated when the key is held
> down).
Ah, I thought it won't even attempt to suspend while key is depressed.
> So, if Linux fabricates a key release event in the suspend
> handler, userspace would see two key release events: the fabricated
> one, and the one that comes from when the user later releases the key
> resulting in a break interrupt. This would cause two characters to be
> printed on screen when the user only pressed the key once.
Input core will suppress the 2nd release though so I do not think that
userspace will see it.
Thanks.
--
Dmitry
^ permalink raw reply
* Linux Foundation mailing list services restored
From: Mailman Admin @ 2011-10-27 0:47 UTC (permalink / raw)
Hello,
Mailing list services have been restored and we are in the testing
phase, so please let us know if you see any issues.
Most importantly, we want to thank you for your patience and
understanding as we worked around the clock to restore services. For
web sites associated with particular projects that have not yet been
restored, we are reaching out this week and next to the project leads to
begin restoring the web services. Thanks for your continued patience.
The Linux Foundation
^ permalink raw reply
* Re: [markgross@thengar.org: [RFC] wake up notifications and suspend blocking (aka more wakelock stuff)]
From: Alan Stern @ 2011-10-25 15:14 UTC (permalink / raw)
To: mark gross
Cc: khilman, Dmitry Fink (Palm GBU), NeilBrown, linux-kernel,
amit.kucheria, peterz, John Stultz, farrowg, Magnus Damm,
linux-pm, mjg
In-Reply-To: <20111025045046.GA8383@mgross-G62>
On Mon, 24 Oct 2011, mark gross wrote:
> > > Timeout? why can't we define a proper notification handshake for such
> > > things? Timeouts are never right IMO.
> > >
> >
> > I thought that before, but I have come around to the opposite way of thinking
> > thanks to some instructive examples from Alan and Rafael.
> >
> > Some things are simply not visible to the OS. We can expect them to be
> > happening but we cannot be sure and there is no clear signal that they aren't
> > actually happening. So we need a timeout.
>
> um
> > - USB cannot (I think) know which USB device triggered a wake-from-suspend,
> > and in any case cannot directly ask the device why it woke from suspend.
> > It has to wait for the device to tell it (in response to a regular
> > 'interrupt' poll I assume - but it isn't guaranteed to be reported on the
> > first poll) - or timeout and give up waiting.
> maybe if you are unwilling to change the user mode stack that is reading
> these events. But if you where they you shouldn't need time outs.
Remember, here we are talking about timeouts in the kernel stack, not
in the user-mode stack.
So consider this theoretical situation (which is not very different
from reality): The system gets a wakeup signal. Sometime in the next
30 ms or less, there may or may not be an input event -- the kernel has
no way to tell other than wait and see.
The kernel could simply go right back to sleep without waiting, but if
it does and there is a pending input event, then very quickly it will
get another wakeup signal, and it'll be right back where it started --
trying to decide whether to stay awake for the next 30 ms.
Can you suggest a way to handle this other than using a timeout?
Alan Stern
^ permalink raw reply
* Re: [PATCH 8/8] OMAP3: powerdomain data: add wake-up latency figures
From: Paul Walmsley @ 2011-10-10 22:52 UTC (permalink / raw)
To: Jean Pihet; +Cc: Kevin Hilman, Linux PM mailing list, linux-omap, Jean Pihet
In-Reply-To: <alpine.DEB.2.00.1110101636470.5205@utopia.booyaka.com>
On Mon, 10 Oct 2011, Paul Walmsley wrote:
> OK. sys_offmode only applies to OFF mode. Voltage scaling can also occur
> during RETENTION and INACTIVE ("sleep"). So were these results with
> retention and voltage scaling disabled?
This last sentence should have read:
So were these results with retention and sleep voltage scaling disabled?
- Paul
^ permalink raw reply
* Re: [PATCH 8/8] OMAP3: powerdomain data: add wake-up latency figures
From: Paul Walmsley @ 2011-10-10 22:44 UTC (permalink / raw)
To: Jean Pihet; +Cc: Kevin Hilman, Linux PM mailing list, linux-omap, Jean Pihet
In-Reply-To: <CAORVsuX2PUubPzO3mWTaO0Lg6Mdog8wjDY6hm93qheMwoz27CA@mail.gmail.com>
On Mon, 10 Oct 2011, Jean Pihet wrote:
> On Fri, Oct 7, 2011 at 6:17 AM, Paul Walmsley <paul@pwsan.com> wrote:
> > On Wed, 21 Sep 2011, jean.pihet@newoldbits.com wrote:
> >
> >> From: Jean Pihet <j-pihet@ti.com>
> >>
> >> Figures are added to the power domains structs for RET and OFF modes.
> >>
> >> Note: the latency figures for MPU, PER, CORE, NEON have been obtained
> >> from actual measurements.
> >> The latency figures for the other power domains are preliminary and
> >> shall be added.
> >>
> >> Cf. http://www.omappedia.org/wiki/Power_Management_Device_Latencies_Measurement
> >> for a detailed explanation on where are the numbers magically coming from.
> >>
> >> Tested on OMAP3 Beagleboard in RET/OFF using wake-up latency constraints
> >> on MPU, CORE and PER.
> >
> > Do the CSWR measurements include the time for the PMIC to scale the
> > voltage, or do they just represent the time to enter and exit clock stop
> > (presumably with DPLL idling)?
>
> As described at [1] the measurements have not been performed with
> sys_offmode and sys_clkreq signals toggling correctly, because of the
> lack of support for it in the kernel.
> However the results are including a correction for the sys_offmode
> transition time (11.5ms), but no correction for the sys_clkreq signal
> (which should be 1ms for sysclk on, 2.5ms for sysclk off).
>
> [1] http://www.omappedia.org/wiki/Power_Management_Device_Latencies_Measurement#cpuidle_results
OK. sys_offmode only applies to OFF mode. Voltage scaling can also occur
during RETENTION and INACTIVE ("sleep"). So were these results with
retention and voltage scaling disabled?
- Paul
^ permalink raw reply
* Re: [PATCH 8/8] OMAP3: powerdomain data: add wake-up latency figures
From: Paul Walmsley @ 2011-10-07 15:26 UTC (permalink / raw)
To: jean.pihet; +Cc: Kevin Hilman, Linux PM mailing list, linux-omap, Jean Pihet
In-Reply-To: <1316622258-3892-9-git-send-email-j-pihet@ti.com>
Hi Jean
On Wed, 21 Sep 2011, jean.pihet@newoldbits.com wrote:
> Figures are added to the power domains structs for RET and OFF modes.
>
> Note: the latency figures for MPU, PER, CORE, NEON have been obtained
> from actual measurements.
> The latency figures for the other power domains are preliminary and
> shall be added.
>
> Cf. http://www.omappedia.org/wiki/Power_Management_Device_Latencies_Measurement
> for a detailed explanation on where are the numbers magically coming from.
>
> Tested on OMAP3 Beagleboard in RET/OFF using wake-up latency constraints
> on MPU, CORE and PER.
A few more question about these numbers.
- Are these numbers the ones right off the hardware, or rounded, or
worst-case plus a certain percentage?
- What frequency/voltage were these measured at?
- Also, you mention OMAP3 Beagleboard. Is that 34xx or 36xx Beagleboard?
> Signed-off-by: Jean Pihet <j-pihet@ti.com>
> ---
> arch/arm/mach-omap2/powerdomains3xxx_data.c | 78 +++++++++++++++++++++++++++
> 1 files changed, 78 insertions(+), 0 deletions(-)
>
> diff --git a/arch/arm/mach-omap2/powerdomains3xxx_data.c b/arch/arm/mach-omap2/powerdomains3xxx_data.c
> index 469a920..32922bb 100644
> --- a/arch/arm/mach-omap2/powerdomains3xxx_data.c
> +++ b/arch/arm/mach-omap2/powerdomains3xxx_data.c
> @@ -31,6 +31,14 @@
>
> /*
> * Powerdomains
> + *
> + * The wakeup_lat values are derived from measurements on
> + * the actual target.
> + *
> + * Note: the latency figures for MPU, PER, CORE, NEON have been obtained
> + * from actual measurements.
> + * The latency figures for the other power domains are preliminary and
> + * shall be added.
> */
>
> static struct powerdomain iva2_pwrdm = {
> @@ -52,6 +60,13 @@ static struct powerdomain iva2_pwrdm = {
> [2] = PWRSTS_OFF_ON,
> [3] = PWRSTS_ON,
> },
> + .wakeup_lat = {
> + [PWRDM_FUNC_PWRST_OFF] = 1100,
> + [PWRDM_FUNC_PWRST_OSWR] = UNSUP_STATE,
> + [PWRDM_FUNC_PWRST_CSWR] = 350,
> + [PWRDM_FUNC_PWRST_INACTIVE] = UNSUP_STATE,
> + [PWRDM_FUNC_PWRST_ON] = 0,
> + },
> };
>
> static struct powerdomain mpu_3xxx_pwrdm = {
> @@ -68,6 +83,13 @@ static struct powerdomain mpu_3xxx_pwrdm = {
> .pwrsts_mem_on = {
> [0] = PWRSTS_OFF_ON,
> },
> + .wakeup_lat = {
> + [PWRDM_FUNC_PWRST_OFF] = 1830,
> + [PWRDM_FUNC_PWRST_OSWR] = UNSUP_STATE,
> + [PWRDM_FUNC_PWRST_CSWR] = 121,
> + [PWRDM_FUNC_PWRST_INACTIVE] = UNSUP_STATE,
> + [PWRDM_FUNC_PWRST_ON] = 0,
> + },
> };
>
> /*
> @@ -98,6 +120,13 @@ static struct powerdomain core_3xxx_pre_es3_1_pwrdm = {
> [0] = PWRSTS_OFF_RET_ON, /* MEM1ONSTATE */
> [1] = PWRSTS_OFF_RET_ON, /* MEM2ONSTATE */
> },
> + .wakeup_lat = {
> + [PWRDM_FUNC_PWRST_OFF] = 3082,
> + [PWRDM_FUNC_PWRST_OSWR] = UNSUP_STATE,
> + [PWRDM_FUNC_PWRST_CSWR] = 153,
> + [PWRDM_FUNC_PWRST_INACTIVE] = UNSUP_STATE,
> + [PWRDM_FUNC_PWRST_ON] = 0,
> + },
> };
>
> static struct powerdomain core_3xxx_es3_1_pwrdm = {
> @@ -121,6 +150,13 @@ static struct powerdomain core_3xxx_es3_1_pwrdm = {
> [0] = PWRSTS_OFF_RET_ON, /* MEM1ONSTATE */
> [1] = PWRSTS_OFF_RET_ON, /* MEM2ONSTATE */
> },
> + .wakeup_lat = {
> + [PWRDM_FUNC_PWRST_OFF] = 3082,
> + [PWRDM_FUNC_PWRST_OSWR] = UNSUP_STATE,
> + [PWRDM_FUNC_PWRST_CSWR] = 153,
> + [PWRDM_FUNC_PWRST_INACTIVE] = UNSUP_STATE,
> + [PWRDM_FUNC_PWRST_ON] = 0,
> + },
> };
>
> static struct powerdomain dss_pwrdm = {
> @@ -136,6 +172,13 @@ static struct powerdomain dss_pwrdm = {
> .pwrsts_mem_on = {
> [0] = PWRSTS_ON, /* MEMONSTATE */
> },
> + .wakeup_lat = {
> + [PWRDM_FUNC_PWRST_OFF] = 70,
> + [PWRDM_FUNC_PWRST_OSWR] = UNSUP_STATE,
> + [PWRDM_FUNC_PWRST_CSWR] = 20,
> + [PWRDM_FUNC_PWRST_INACTIVE] = UNSUP_STATE,
> + [PWRDM_FUNC_PWRST_ON] = 0,
> + },
> };
>
> /*
> @@ -157,6 +200,13 @@ static struct powerdomain sgx_pwrdm = {
> .pwrsts_mem_on = {
> [0] = PWRSTS_ON, /* MEMONSTATE */
> },
> + .wakeup_lat = {
> + [PWRDM_FUNC_PWRST_OFF] = 1000,
> + [PWRDM_FUNC_PWRST_OSWR] = UNSUP_STATE,
> + [PWRDM_FUNC_PWRST_CSWR] = UNSUP_STATE,
> + [PWRDM_FUNC_PWRST_INACTIVE] = UNSUP_STATE,
> + [PWRDM_FUNC_PWRST_ON] = 0,
> + },
> };
>
> static struct powerdomain cam_pwrdm = {
> @@ -172,6 +222,13 @@ static struct powerdomain cam_pwrdm = {
> .pwrsts_mem_on = {
> [0] = PWRSTS_ON, /* MEMONSTATE */
> },
> + .wakeup_lat = {
> + [PWRDM_FUNC_PWRST_OFF] = 850,
> + [PWRDM_FUNC_PWRST_OSWR] = UNSUP_STATE,
> + [PWRDM_FUNC_PWRST_CSWR] = 35,
> + [PWRDM_FUNC_PWRST_INACTIVE] = UNSUP_STATE,
> + [PWRDM_FUNC_PWRST_ON] = 0,
> + },
> };
>
> static struct powerdomain per_pwrdm = {
> @@ -187,6 +244,13 @@ static struct powerdomain per_pwrdm = {
> .pwrsts_mem_on = {
> [0] = PWRSTS_ON, /* MEMONSTATE */
> },
> + .wakeup_lat = {
> + [PWRDM_FUNC_PWRST_OFF] = 671,
> + [PWRDM_FUNC_PWRST_OSWR] = UNSUP_STATE,
> + [PWRDM_FUNC_PWRST_CSWR] = 31,
> + [PWRDM_FUNC_PWRST_INACTIVE] = UNSUP_STATE,
> + [PWRDM_FUNC_PWRST_ON] = 0,
> + },
> };
>
> static struct powerdomain emu_pwrdm = {
> @@ -201,6 +265,13 @@ static struct powerdomain neon_pwrdm = {
> .omap_chip = OMAP_CHIP_INIT(CHIP_IS_OMAP3430),
> .pwrsts = PWRSTS_OFF_RET_ON,
> .pwrsts_logic_ret = PWRSTS_RET,
> + .wakeup_lat = {
> + [PWRDM_FUNC_PWRST_OFF] = 0,
> + [PWRDM_FUNC_PWRST_OSWR] = UNSUP_STATE,
> + [PWRDM_FUNC_PWRST_CSWR] = 0,
> + [PWRDM_FUNC_PWRST_INACTIVE] = UNSUP_STATE,
> + [PWRDM_FUNC_PWRST_ON] = 0,
> + },
> };
>
> static struct powerdomain usbhost_pwrdm = {
> @@ -223,6 +294,13 @@ static struct powerdomain usbhost_pwrdm = {
> .pwrsts_mem_on = {
> [0] = PWRSTS_ON, /* MEMONSTATE */
> },
> + .wakeup_lat = {
> + [PWRDM_FUNC_PWRST_OFF] = 800,
> + [PWRDM_FUNC_PWRST_OSWR] = UNSUP_STATE,
> + [PWRDM_FUNC_PWRST_CSWR] = 150,
> + [PWRDM_FUNC_PWRST_INACTIVE] = UNSUP_STATE,
> + [PWRDM_FUNC_PWRST_ON] = 0,
> + },
> };
>
> static struct powerdomain dpll1_pwrdm = {
> --
> 1.7.4.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-omap" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
- Paul
^ permalink raw reply
* Re: [PATCH 8/8] OMAP3: powerdomain data: add wake-up latency figures
From: Paul Walmsley @ 2011-10-07 4:17 UTC (permalink / raw)
To: jean.pihet; +Cc: Kevin Hilman, Linux PM mailing list, linux-omap, Jean Pihet
In-Reply-To: <1316622258-3892-9-git-send-email-j-pihet@ti.com>
Hi Jean
On Wed, 21 Sep 2011, jean.pihet@newoldbits.com wrote:
> From: Jean Pihet <j-pihet@ti.com>
>
> Figures are added to the power domains structs for RET and OFF modes.
>
> Note: the latency figures for MPU, PER, CORE, NEON have been obtained
> from actual measurements.
> The latency figures for the other power domains are preliminary and
> shall be added.
>
> Cf. http://www.omappedia.org/wiki/Power_Management_Device_Latencies_Measurement
> for a detailed explanation on where are the numbers magically coming from.
>
> Tested on OMAP3 Beagleboard in RET/OFF using wake-up latency constraints
> on MPU, CORE and PER.
Do the CSWR measurements include the time for the PMIC to scale the
voltage, or do they just represent the time to enter and exit clock stop
(presumably with DPLL idling)?
- Paul
^ permalink raw reply
* Re: [PATCH 4/8] OMAP2+: omap_hwmod: manage the wake-up latency constraints
From: Paul Walmsley @ 2011-10-07 2:53 UTC (permalink / raw)
To: jean.pihet; +Cc: Kevin Hilman, Linux PM mailing list, linux-omap, Jean Pihet
In-Reply-To: <1316622258-3892-5-git-send-email-j-pihet@ti.com>
Hi
a comment:
On Wed, 21 Sep 2011, jean.pihet@newoldbits.com wrote:
> From: Jean Pihet <j-pihet@ti.com>
>
> Hwmod is queried from the OMAP_PM layer to manage the power domains
> wake-up latency constraints. Hwmod retrieves the correct power domain
> and if it exists it calls the corresponding power domain function.
>
> Tested on OMAP3 Beagleboard and OMAP4 Pandaboard in RET/OFF using wake-up
> latency constraints on MPU, CORE and PER.
>
> Signed-off-by: Jean Pihet <j-pihet@ti.com>
> ---
> arch/arm/mach-omap2/omap_hwmod.c | 26 +++++++++++++++++++++++++-
> arch/arm/plat-omap/include/plat/omap_hwmod.h | 2 ++
> 2 files changed, 27 insertions(+), 1 deletions(-)
>
> diff --git a/arch/arm/mach-omap2/omap_hwmod.c b/arch/arm/mach-omap2/omap_hwmod.c
> index 84cc0bd..c6b1cc9 100644
> --- a/arch/arm/mach-omap2/omap_hwmod.c
> +++ b/arch/arm/mach-omap2/omap_hwmod.c
> @@ -2618,11 +2619,34 @@ ohsps_unlock:
> return ret;
> }
>
> +/*
> + * omap_hwmod_set_wkup_constraint- set/release a wake-up latency constraint
> + *
> + * @oh: struct omap_hwmod* to which the target device belongs to.
> + * @cookie: identifier of the constraints list for @oh.
> + * @min_latency: the minimum allowed wake-up latency for @oh.
> + *
> + * Returns 0 upon success.
> + */
It's good that there is some documentation here, but it would be better if
it were kerneldoc-style documentation. Please see
Documentation/kernel-doc-nano-HOWTO.txt.
Also it would be good to have a bit more documentation here beyond simply
"Returns 0 upon success." For example, how should a caller remove a
wakeup latency constraint? Also, this function can return -EINVAL, not
counting whatever pwrdm_set_wkup_lat_constraint() can return, so that
should be documented above also.
This applies to the function comments in the rest of the patches too.
Some of them have quite good kerneldoc comments, such as
pwrdm_wakeuplat_update_pwrst(); others need some work, like
pwrdm_set_wkup_lat_constraint().
> +int omap_hwmod_set_wkup_lat_constraint(struct omap_hwmod *oh,
> + void *cookie, long min_latency)
> +{
> + struct powerdomain *pwrdm = omap_hwmod_get_pwrdm(oh);
> +
> + if (!pwrdm) {
> + pr_err("%s: Error: could not find powerdomain "
> + "for %s\n", __func__, oh->name);
> + return -EINVAL;
> + }
> +
> + return pwrdm_set_wkup_lat_constraint(pwrdm, cookie, min_latency);
> +}
> +
> /**
> * omap_hwmod_get_context_loss_count - get lost context count
> * @oh: struct omap_hwmod *
> *
> - * Query the powerdomain of of @oh to get the context loss
> + * Query the powerdomain of @oh to get the context loss
> * count for this device.
> *
> * Returns the context loss count of the powerdomain assocated with @oh
> diff --git a/arch/arm/plat-omap/include/plat/omap_hwmod.h b/arch/arm/plat-omap/include/plat/omap_hwmod.h
> index 0e329ca..75e0e7a 100644
> --- a/arch/arm/plat-omap/include/plat/omap_hwmod.h
> +++ b/arch/arm/plat-omap/include/plat/omap_hwmod.h
> @@ -603,6 +603,8 @@ int omap_hwmod_for_each_by_class(const char *classname,
> void *user);
>
> int omap_hwmod_set_postsetup_state(struct omap_hwmod *oh, u8 state);
> +int omap_hwmod_set_wkup_lat_constraint(struct omap_hwmod *oh, void *cookie,
> + long min_latency);
> u32 omap_hwmod_get_context_loss_count(struct omap_hwmod *oh);
>
> int omap_hwmod_no_setup_reset(struct omap_hwmod *oh);
> --
> 1.7.4.1
>
- Paul
^ permalink raw reply
* Re: [PATCHSET cgroup] extend threadgroup locking
From: Tejun Heo @ 2011-09-11 3:35 UTC (permalink / raw)
To: rjw, paul, lizf
Cc: fweisbec, containers, linux-kernel, Oleg Nesterov, linux-pm, akpm
In-Reply-To: <1315159280-25032-1-git-send-email-htejun@gmail.com>
On Mon, Sep 05, 2011 at 03:01:16AM +0900, Tejun Heo wrote:
> Hello,
>
> cgroup currently only blocks new threads from joining the target
> threadgroup during migration, and on-going migration could race
> against exec and exit leading to interesting problems - the symmetry
> between various attach methods, task exiting during method execution,
> ->exit() racing against attach methods, migrating task switching basic
> properties during exec and so on.
>
> This patchset extends threadgroup locking such that it covers all
> operations which can alter the threadgroup - fork, exit and exec, and
> update cgroup to take advantage of it. rwsem read ops are added to
> exit path but exec is excluded by grabbing the existing
> cred_guard_mutex from threadgroup locking helper.
>
> This makes threadgroup locking complete and resolves cgroup issues
> stemming from the target taskset being unstable.
>
> This patchset is on top of the current pm-freezer + "freezer: fixes &
> simplifications" patchset and contains the following four patches.
> Patch list and diffstat follow.
>
> Thanks.
>
> [PATCH 1/4] cgroup: change locking order in attach_task_by_pid()
> [PATCH 2/4] threadgroup: rename signal->threadgroup_fork_lock to
> [PATCH 3/4] threadgroup: extend threadgroup_lock() to cover exit and
> [PATCH 4/4] cgroup: always lock threadgroup during migration
>
> include/linux/init_task.h | 9 ++----
> include/linux/sched.h | 58 ++++++++++++++++++++++++++++---------------
> kernel/cgroup.c | 62 +++++++++++++++++++++-------------------------
> kernel/exit.c | 16 ++++++++---
> kernel/fork.c | 8 ++---
> 5 files changed, 88 insertions(+), 65 deletions(-)
Oops, forgot to cc Oleg on this series.
Oleg, this one definitely needs your review. The original thread is...
http://thread.gmane.org/gmane.linux.kernel/1187853/focus=1188347
If you want the series in mbox format, please let me know.
Thanks.
--
tejun
^ permalink raw reply
* Re: [PATCH 1/1] freezer: fix wait_event_freezable/__thaw_task races
From: Tejun Heo @ 2011-09-11 1:54 UTC (permalink / raw)
To: Oleg Nesterov; +Cc: containers, linux-kernel, linux-pm, paul
In-Reply-To: <20110908175926.GA26986@redhat.com>
Hello,
On Thu, Sep 08, 2011 at 07:59:26PM +0200, Oleg Nesterov wrote:
> > Indeed, nice catch. This one actually is pretty dangerous. We
> > probably want to make a separate fix for this and backport it to
> > -stable?
>
> And I forgot to mention that wait_event_freezable_timeout() doesn't
> handle -ERESTARTSYS at all.
>
> But I don't think -stable needs this fix. According to grep, nobody
> check the returned value, and none of the callers plays with signals.
Ah, no user, okay.
> > Yeap, with freezable_with_signal gone, this looks correct & simpler to
> > me
>
> I don't really understand this... set_freezable_with_signal() has a
> lot of problems, yes... But even if it wasn't removed this fix makes
> sense anyway, afaics.
>
> If freezable_with_signal caller does wait_event_freezable_timeout(),
> __retval becomes -ERESTARTSYS after freeze_task(). The next iteration
> will return 0 with the KERN_ERR message from schedule_timeout().
Hmmm... but with the change, if a kthread gets TIF_SIGPENDING from
freezer and then thawed, it would not enter try_to_freeze() and thus
won't clear TIF_SIGPENDING. The original code was racy too but the
window would be much larger afterwards. Anyways, this doesn't matter
anymore.
> > but it would be nice if you can sprinkle some documentation while
> > at it. :)
>
> But they already have the comment ;) What can I add?
Proper /** - */ comment w/ return value documentation? :P
Thanks.
--
tejun
^ permalink raw reply
* Re: [PATCH 6/6] freezer: kill unused set_freezable_with_signal()
From: Tejun Heo @ 2011-09-11 1:37 UTC (permalink / raw)
To: Matt Helsley; +Cc: containers, Oleg Nesterov, linux-kernel, linux-pm, paul
In-Reply-To: <20110908180159.GA4197@count0.beaverton.ibm.com>
Hello, Matt.
On Thu, Sep 08, 2011 at 11:01:59AM -0700, Matt Helsley wrote:
> Looking at the code and docs I realize I didn't explicitly mention that
> kthreads could not be frozen by the cgroup freezer. However, the code did
> not allow it. When freezing tasks the cgroup freezer always did:
>
> freeze_task(task, true)
>
> which would only freeze tasks without PF_FREEZER_NOSIG due to the second
> "sig_only" parameter. I believe this means it could not be used to
> freeze kthreads.
>
> My feeling is kthreads should not be "managed" directly by userspace.
> Especially if that includes the ability to arbitrarily stop or freeze them.
> So it seems more appropriate to explicitly disallow freezing of kthreads
> via the cgroup freezer.
Oh yeah, that's what we should do and maybe that's what the current
code intends to achieve too but currently it ends up triggering
BUG_ON().
Thanks.
--
tejun
^ permalink raw reply
* [PATCH 2/2] PM/runtime: handle ->runtime_suspend failure correctly
From: tom.leiming @ 2011-09-10 14:37 UTC (permalink / raw)
To: rjw, stern; +Cc: linux-pm, linux-kernel
In-Reply-To: <1315665442-4194-1-git-send-email-tom.leiming@gmail.com>
From: Ming Lei <tom.leiming@gmail.com>
If ->runtime_suspend returns -EAGAIN or -EBUSY, the device should
still be in ACTIVE state, so it is not needed to handle defered
resume and idle notification to its parent; if ->runtime_suspend
returns other fatal failure, it doesn't make sense to process defered
resume and send idle notification to its parent.
So skip these when failure is returned from ->runtime_suspend, also add
comments for this handling in rpm_suspend.
This patch also updates comments for rpm_suspend:
- 'Cancel a pending idle notification' should be put before, also
should be changed as 'Cancel a pending idle notification or
autosuspend/suspend'
- idle notification for suspend failure has been removed, so update
comments for it
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
drivers/base/power/runtime.c | 22 ++++++++++++++--------
1 files changed, 14 insertions(+), 8 deletions(-)
diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
index 441b5a3..6fa3241 100644
--- a/drivers/base/power/runtime.c
+++ b/drivers/base/power/runtime.c
@@ -284,14 +284,17 @@ static int rpm_callback(int (*cb)(struct device *), struct device *dev)
* @dev: Device to suspend.
* @rpmflags: Flag bits.
*
- * Check if the device's runtime PM status allows it to be suspended. If
- * another suspend has been started earlier, either return immediately or wait
- * for it to finish, depending on the RPM_NOWAIT and RPM_ASYNC flags. Cancel a
- * pending idle notification. If the RPM_ASYNC flag is set then queue a
- * suspend request; otherwise run the ->runtime_suspend() callback directly.
- * If a deferred resume was requested while the callback was running then carry
- * it out; otherwise send an idle notification for the device (if the suspend
- * failed) or for its parent (if the suspend succeeded).
+ * Check if the device's runtime PM status allows it to be suspended. Cancel
+ * a pending idle notification or autosuspend/suspend. If another suspend has
+ * been started earlier, either return immediately or wait for it to finish,
+ * depending on the RPM_NOWAIT and RPM_ASYNC flags. If the RPM_ASYNC flag is
+ * set then queue a suspend request; otherwise run the ->runtime_suspend()
+ * callback directly. If ->runtime_suspend returns failure, just cancel
+ * pending request and wake up waited tasks, then return immediatelly.
+ * After ->runtime_suspend succeeded, if a deferred resume was requested
+ * while the callback was running then carry it out; otherwise send an idle
+ * notification for its parent (if both ignore_children and irq_safe
+ * are not set).
*
* This function must be called under dev->power.lock with interrupts disabled.
*/
@@ -422,6 +425,9 @@ static int rpm_suspend(struct device *dev, int rpmflags)
}
wake_up_all(&dev->power.wait_queue);
+ if (retval)
+ goto out;
+
if (dev->power.deferred_resume) {
rpm_resume(dev, 0);
retval = -EAGAIN;
--
1.7.4.1
^ permalink raw reply related
* [PATCH 1/2] PM/runtime: update document about callbacks
From: tom.leiming @ 2011-09-10 14:37 UTC (permalink / raw)
To: rjw, stern; +Cc: linux-pm, linux-kernel
From: Ming Lei <tom.leiming@gmail.com>
Support for device power domains has been introduced in
commit 9659cc0678b954f187290c6e8b247a673c5d37e1 (PM: Make
system-wide PM and runtime PM treat subsystems consistently),
also power domain callbacks will take precedence over subsystem ones
from commit 4d27e9dcff00a6425d779b065ec8892e4f391661(PM: Make
power domain callbacks take precedence over subsystem ones).
So update part of "Device Runtime PM Callbacks" in
Documentation/power/runtime_pm.txt.
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
Documentation/power/runtime_pm.txt | 19 ++++++++++++-------
1 files changed, 12 insertions(+), 7 deletions(-)
diff --git a/Documentation/power/runtime_pm.txt b/Documentation/power/runtime_pm.txt
index 1f05404..0e85608 100644
--- a/Documentation/power/runtime_pm.txt
+++ b/Documentation/power/runtime_pm.txt
@@ -43,13 +43,18 @@ struct dev_pm_ops {
...
};
-The ->runtime_suspend(), ->runtime_resume() and ->runtime_idle() callbacks are
-executed by the PM core for either the device type, or the class (if the device
-type's struct dev_pm_ops object does not exist), or the bus type (if the
-device type's and class' struct dev_pm_ops objects do not exist) of the given
-device (this allows device types to override callbacks provided by bus types or
-classes if necessary). The bus type, device type and class callbacks are
-referred to as subsystem-level callbacks in what follows.
+The ->runtime_suspend(), ->runtime_resume() and ->runtime_idle() callbacks
+are executed by the PM core for either the power domain, or the device type
+(if the device power domain's struct dev_pm_ops does not exist), or the class
+(if the device power domain's and type's struct dev_pm_ops object does not
+exist), or the bus type (if the device power domain's, type's and class'
+struct dev_pm_ops objects do not exist) of the given device, so the priority
+order of callbacks from high to low is that power domain callbacks, device
+type callbacks, class callbacks and bus type callbacks, and the high priority
+one will take precedence over low priority one. The bus type, device type and
+class callbacks are referred to as subsystem-level callbacks in what follows,
+and generally speaking, the power domain callbacks are used for representing
+power domains within a SoC.
By default, the callbacks are always invoked in process context with interrupts
enabled. However, subsystems can use the pm_runtime_irq_safe() helper function
--
1.7.4.1
^ permalink raw reply related
* Re: LPC 2011 Power Management Session notes
From: mark gross @ 2011-09-09 22:50 UTC (permalink / raw)
To: Jean Pihet; +Cc: linux-pm
In-Reply-To: <CAORVsuWKd9uvTTNCr_k9W6DeA+z3ZEaD99bqit++4xUy9zqKSQ@mail.gmail.com>
On Fri, Sep 09, 2011 at 10:09:48AM +0200, Jean Pihet wrote:
> Hi,
>
> On Fri, Sep 9, 2011 at 2:28 AM, Paul Walmsley <paul@pwsan.com> wrote:
> >
> > Available at
> >
> > http://pad.ubuntu.com/plumbers-2011-pwr-mgt
> >
> > in real time if anyone's interested
> Thanks for the doc!
>
>
> About the Device PM QoS user space interface:
> 1. Proposal: For every device, there will be a sysfs wakeup latency attribute
> 2. Proposal: Userspace processes should communicate their needs via sysfs
>
> I assume there can be only one user app that can open a given sysfs
> entry at a time, is that correct?
>
> Let me think about it and propose a patch next week.
My recollection of the evening was that the exposing of per device qos
latencies was not considered a good idea without better articulation on
how it would be used. (or misused)
FWIW a few folks (ok maybe just me) are not comfortable with exposing
raw values without a good enough abstraction for them to user mode such
that the values used through that user mode ABI would be somewhat
portable between ISA's or even versions of the same chips / platforms.
--mark
> Regards,
> Jean
>
> >
> >
> > - Paul
> > _______________________________________________
> > linux-pm mailing list
> > linux-pm@lists.linux-foundation.org
> > https://lists.linux-foundation.org/mailman/listinfo/linux-pm
> >
> _______________________________________________
> linux-pm mailing list
> linux-pm@lists.linux-foundation.org
> https://lists.linux-foundation.org/mailman/listinfo/linux-pm
^ permalink raw reply
* Re: LPC 2011 Power Management Session notes
From: Mark Brown @ 2011-09-09 16:50 UTC (permalink / raw)
To: Jean Pihet; +Cc: linux-pm
In-Reply-To: <CAORVsuWKd9uvTTNCr_k9W6DeA+z3ZEaD99bqit++4xUy9zqKSQ@mail.gmail.com>
On Fri, Sep 09, 2011 at 10:09:48AM +0200, Jean Pihet wrote:
> On Fri, Sep 9, 2011 at 2:28 AM, Paul Walmsley <paul@pwsan.com> wrote:
> About the Device PM QoS user space interface:
> 1. Proposal: For every device, there will be a sysfs wakeup latency attribute
> 2. Proposal: Userspace processes should communicate their needs via sysfs
> I assume there can be only one user app that can open a given sysfs
> entry at a time, is that correct?
No, that's not the case. The file implementation doesn't get to see
opens and closes at all, it just gets passed buffers to fill or read
from.
^ permalink raw reply
* Re: LPC 2011 Power Management Session notes
From: Jean Pihet @ 2011-09-09 8:09 UTC (permalink / raw)
To: Paul Walmsley; +Cc: linux-pm
In-Reply-To: <alpine.DEB.2.00.1109081827360.17022@utopia.booyaka.com>
Hi,
On Fri, Sep 9, 2011 at 2:28 AM, Paul Walmsley <paul@pwsan.com> wrote:
>
> Available at
>
> http://pad.ubuntu.com/plumbers-2011-pwr-mgt
>
> in real time if anyone's interested
Thanks for the doc!
About the Device PM QoS user space interface:
1. Proposal: For every device, there will be a sysfs wakeup latency attribute
2. Proposal: Userspace processes should communicate their needs via sysfs
I assume there can be only one user app that can open a given sysfs
entry at a time, is that correct?
Let me think about it and propose a patch next week.
Regards,
Jean
>
>
> - Paul
> _______________________________________________
> linux-pm mailing list
> linux-pm@lists.linux-foundation.org
> https://lists.linux-foundation.org/mailman/listinfo/linux-pm
>
^ permalink raw reply
* LPC 2011 Power Management Session notes
From: Paul Walmsley @ 2011-09-09 0:28 UTC (permalink / raw)
To: linux-pm
Available at
http://pad.ubuntu.com/plumbers-2011-pwr-mgt
in real time if anyone's interested
- Paul
^ permalink raw reply
* Re: [PATCH 6/6] freezer: kill unused set_freezable_with_signal()
From: Matt Helsley @ 2011-09-08 18:01 UTC (permalink / raw)
To: Tejun Heo; +Cc: containers, Oleg Nesterov, linux-kernel, linux-pm, paul
In-Reply-To: <20110906032846.GA18425@mtj.dyndns.org>
On Tue, Sep 06, 2011 at 12:28:55PM +0900, Tejun Heo wrote:
> Hello,
>
> On Mon, Sep 05, 2011 at 06:20:12PM +0200, Oleg Nesterov wrote:
> > Perhaps it is correct... Just I do not understand what it should do.
> > I thought it is "wait_for_event && do_not_block_freezer". And at first
> > glance the code looks as if it tries to do this. Say, in the "likely"
> > case we restart wait_event_interruptible() after refrigerator().
> >
> > But this looks racy. Suppose that freezing() is already false when
> > try_to_freeze() or __refrigerator() is called. Say, cgroup_freezer does
> > freeze_task() + __thaw_task(). Why it returns -ERESTARTSYS in this case?
>
> It may return -ERESTARTSYS when not strictly necessary but given that
> it's supposed to trigger restart anyway I don't think it's actually
> broken (it doesn't break the contract of the wait). Another thing to
> note is that kthread freezing via cgroup is almost fundamentally
> broken. With the removal of freezable_with_signal, this shouldn't be
> an issue anymore although cgroup_freezer still needs to be fixed to
> account for kthreads for xstate transitions (it currently triggers
> BUG_ON).
Looking at the code and docs I realize I didn't explicitly mention that
kthreads could not be frozen by the cgroup freezer. However, the code did
not allow it. When freezing tasks the cgroup freezer always did:
freeze_task(task, true)
which would only freeze tasks without PF_FREEZER_NOSIG due to the second
"sig_only" parameter. I believe this means it could not be used to
freeze kthreads.
My feeling is kthreads should not be "managed" directly by userspace.
Especially if that includes the ability to arbitrarily stop or freeze them.
So it seems more appropriate to explicitly disallow freezing of kthreads
via the cgroup freezer.
Cheers,
-Matt Helsley
^ permalink raw reply
* Re: [PATCH 1/1] freezer: fix wait_event_freezable/__thaw_task races
From: Oleg Nesterov @ 2011-09-08 17:59 UTC (permalink / raw)
To: Tejun Heo; +Cc: containers, linux-kernel, linux-pm, paul
In-Reply-To: <20110908010530.GD3987@mtj.dyndns.org>
Hi,
On 09/08, Tejun Heo wrote:
>
> Hello,
>
> On Wed, Sep 07, 2011 at 08:22:17PM +0200, Oleg Nesterov wrote:
> > wait_event_freezable() and wait_event_freezable_timeout() stop
> > the waiting if try_to_freeze() fails. This is not right, we can
> > race with __thaw_task() and in this case
> >
> > - wait_event_freezable() returns the wrong ERESTARTSYS
> >
> > - wait_event_freezable_timeout() can return the positive
> > value while condition == F
>
> Indeed, nice catch. This one actually is pretty dangerous. We
> probably want to make a separate fix for this and backport it to
> -stable?
And I forgot to mention that wait_event_freezable_timeout() doesn't
handle -ERESTARTSYS at all.
But I don't think -stable needs this fix. According to grep, nobody
check the returned value, and none of the callers plays with signals.
> > Change the code to always check __retval/condition before return.
> >
> > Note: with or without this patch the timeout logic looks strange,
> > probably we should recalc timeout if try_to_freeze() returns T.
> >
> > Signed-off-by: Oleg Nesterov <oleg@redhat.com>
>
> Yeap, with freezable_with_signal gone, this looks correct & simpler to
> me
I don't really understand this... set_freezable_with_signal() has a
lot of problems, yes... But even if it wasn't removed this fix makes
sense anyway, afaics.
If freezable_with_signal caller does wait_event_freezable_timeout(),
__retval becomes -ERESTARTSYS after freeze_task(). The next iteration
will return 0 with the KERN_ERR message from schedule_timeout().
> but it would be nice if you can sprinkle some documentation while
> at it. :)
But they already have the comment ;) What can I add?
Oleg.
^ permalink raw reply
* [PATCH] PM / Runtime: pm_runtime_idle can be called in atomic context
From: tom.leiming @ 2011-09-08 16:06 UTC (permalink / raw)
To: rjw, stern; +Cc: linux-pm
From: Ming Lei <tom.leiming@gmail.com>
Add to pm_runtime_idle the list of functions that can be called
in atomic context if pm_runtime_irq_safe() has been called for the
device.
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
Documentation/power/runtime_pm.txt | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/Documentation/power/runtime_pm.txt b/Documentation/power/runtime_pm.txt
index 08d70e4..1f05404 100644
--- a/Documentation/power/runtime_pm.txt
+++ b/Documentation/power/runtime_pm.txt
@@ -477,6 +477,7 @@ pm_runtime_autosuspend_expiration()
If pm_runtime_irq_safe() has been called for a device then the following helper
functions may also be used in interrupt context:
+pm_runtime_idle()
pm_runtime_suspend()
pm_runtime_autosuspend()
pm_runtime_resume()
--
1.7.4.1
^ permalink raw reply related
* Re: Issue: Runtime API usage in wake-up device irq_handler during wakeup from system-wide-suspend.
From: Kevin Hilman @ 2011-09-08 13:51 UTC (permalink / raw)
To: t-kristo
Cc: Balbi, Felipe, Basak, Partha, R, Govindraj, Munegowda, Keshava,
linux-pm, linux-omap
In-Reply-To: <1315457911.2679.19.camel@sokoban>
Hi Tero,
Tero Kristo <t-kristo@ti.com> writes:
> On Wed, 2011-09-07 at 19:59 +0200, Hilman, Kevin wrote:
>> Tero Kristo <t-kristo@ti.com> writes:
[...]
>> > After thinking about this problem and looking at possible ways to fix
>> > it, I am planning to change the PRCM chain handler to be a driver, which
>> > gets suspended along with the rest of the system. This means the PRCM
>> > interrupt would get disabled also during this time, and it would be
>> > enabled in the driver->complete() call, which should happen after rest
>> > of the drivers have been able to enable their PM runtime in the
>> > driver->resume() call chain. Do you see any problems with this approach?
>>
>> How will the system wakeup from retention or off-mode if the PRCM IRQ is
>> disabled?
>
> This is actually some sort of an issue with retention if we disable PRCM
> irq completely, off works purely with wakeup signals as we come out of
> reset. Anyway, I did some experimentation with this, and OMAP3 is able
> to wake up if we leave WKUP irq up, but disable IO chain irq. IO chain
> events seem to trigger a WKUP event also always, we just postpone the
> processing of IO chain until later. I had to also split the wakeup
> clearing for OMAP3 into 2 parts, one part handles wakeups and another IO
> chain. Currently both IO chain and WKUP are acked by the same handler.
Here's another option, which is kind of a hybrid of what's been
discussed so far.
The PRCM driver would leave the IRQs enabled during suspend, but would
just delay delivering them to the drivers. IOW, handle/clear the PRCM
IRQ normally, but delay delivery of the *device* IRQ until the
->complete callback of the PRCM driver.
Doing this ensures all the drivers are resumed before any device IRQ is
delivered, and doesn't require any additional queuing of events in the
drivers.
Kevin
^ permalink raw reply
* Re: [Xen-devel] RE: [PATCH 5/7] xen/acpi: Domain0 acpi parser related platform hypercall
From: Konrad Rzeszutek Wilk @ 2011-09-08 13:38 UTC (permalink / raw)
To: Cihula, Joseph
Cc: Brown, Len, Jeremy Fitzhardinge, xen-devel@lists.xensource.com,
keir@xen.org, x86@kernel.org, linux-acpi@vger.kernel.org,
tboot-devel@lists.sourceforge.net, Fitzhardinge,
tglx@linutronix.de, hpa@zytor.com, liang.tang@oracle.com,
linux-pm@lists.linux-foundation.org
In-Reply-To: <20110907190659.GH7074@dumpdata.com>
On Wed, Sep 07, 2011 at 03:06:59PM -0400, Konrad Rzeszutek Wilk wrote:
> > > >> +#define XENPF_enter_acpi_sleep 51
> > > >> +struct xenpf_enter_acpi_sleep {
> > > >> + /* IN variables */
> > > >> + uint16_t pm1a_cnt_val; /* PM1a control value. */
> > > >> + uint16_t pm1b_cnt_val; /* PM1b control value. */
> > > > These are uint32_t in native Linux--why truncate in the API and not at use?
> > >
> > > Does ACPI define them as 32 or 16 bit?
> >
> > The spec indicates that the length is variable and could be up to 32 bits (AFAICT). And Linux uses 32b, which your other patch is truncating for this call.
>
> Yikes! Well, looks like we need to fix the Xen ABI too. Lets get that fixed
> and also address all the other comments (thanks for looking at it) you pointed
> out.
So read up the ACPI spec and it says that the minimum is 2 bytes and does not
say anything about the maximum. The list of what the bits do stops at 16-bits
(the last two are reserved) so I think we are actually OK.
Albeit if the spec starts using more of them - then yes we will need to revist
this Xen ABI and potentially add a new call.
^ permalink raw reply
* Re: Issue: Runtime API usage in wake-up device irq_handler during wakeup from system-wide-suspend.
From: Tero Kristo @ 2011-09-08 4:58 UTC (permalink / raw)
To: Hilman, Kevin
Cc: Balbi, Felipe, Basak, Partha, R, Govindraj, Munegowda, Keshava,
linux-pm, linux-omap
In-Reply-To: <87fwk8qllx.fsf@ti.com>
On Wed, 2011-09-07 at 19:59 +0200, Hilman, Kevin wrote:
> Tero Kristo <t-kristo@ti.com> writes:
>
> > Hi,
> >
> > On Sat, 2011-08-27 at 21:42 +0200, Alan Stern wrote:
> >> On Sat, 27 Aug 2011, Santosh wrote:
> >>
> >> > On Saturday 27 August 2011 07:31 PM, Alan Stern wrote:
> >> > > On Sat, 27 Aug 2011, Santosh wrote:
> >> > >
> >> > >> I might be wrong here, but after discussion with Govindraj on this
> >> > >> issue, it seems there is a flaw in the way OMAP chain handler
> >> > >> handling the child interrupts.
> >> > >>
> >> > >> On OMAP, we have special interrupt wakeup source at PRCM level and
> >> > >> many devices can trigger wakeup from low power via this common
> >> > >> interrupt source. The common interrupt source upon wakeup from low
> >> > >> power state, decodes the source of interrupt and based on that
> >> > >> source, calls the respective device ISR directly.
> >> > >>
> >> > >> The issue I see here is, the ISR on _a_ device (UART in this case)
> >> > >> is happening even before UART resume and DPM resume has been completed.
> >> > >> If this is the case, then it is surely asking for trouble. Because not
> >> > >> just clocks, but even driver state machine is already in suspend state
> >> > >> when the ISR is called.
> >> > >
> >> > > If the driver state machine is in the suspend state when the ISR is
> >> > > called, then the ISR should realize it is handling a wakeup event
> >> > > instead of a normal I/O event. All it needs to do is turn off the
> >> > > interrupt source; the event itself will be taken care of during the
> >> > > device's resume callback.
> >> > >
> >> > Good point but the ISR is called as a function call and not real
> >> > hardware event so no need to turn-off the source in the child
> >> > ISR. Parent ISR will clear the source anyways.
> >> >
> >> > But the intention here is to record the event for the child.
> >>
> >> In that case the ISR only has to record the event.
> >>
> >> > I mean for UART wakeup, the received character should be
> >> > extracted. If not done, UART will loose that character because
> >> > the event is lost. So not sure how the event will be taken
> >> > care during resume callback. Could you elaborate bit more on
> >> > last comment please?
> >>
> >> The resume callback routine must check to see if an event was recorded.
> >> If one was, the routine must see whether a character was received while
> >> the system was asleep and extract the character from the UART. (It
> >> probably won't hurt to do this even if an event wasn't recorded.)
> >>
> >> Alan Stern
> >>
> >
> > After thinking about this problem and looking at possible ways to fix
> > it, I am planning to change the PRCM chain handler to be a driver, which
> > gets suspended along with the rest of the system. This means the PRCM
> > interrupt would get disabled also during this time, and it would be
> > enabled in the driver->complete() call, which should happen after rest
> > of the drivers have been able to enable their PM runtime in the
> > driver->resume() call chain. Do you see any problems with this approach?
>
> How will the system wakeup from retention or off-mode if the PRCM IRQ is
> disabled?
This is actually some sort of an issue with retention if we disable PRCM
irq completely, off works purely with wakeup signals as we come out of
reset. Anyway, I did some experimentation with this, and OMAP3 is able
to wake up if we leave WKUP irq up, but disable IO chain irq. IO chain
events seem to trigger a WKUP event also always, we just postpone the
processing of IO chain until later. I had to also split the wakeup
clearing for OMAP3 into 2 parts, one part handles wakeups and another IO
chain. Currently both IO chain and WKUP are acked by the same handler.
> Besides that, I don't like this approach because it leaves a rather long
> window during which no PRCM-triggered wakeup events can happen. This is
> not good since we also want potential wakeup events that happen *during*
> system suspend to be able to cancel the suspend.
This should also be taken care of by the approach described above.
>
> > The only issue I am seeing myself is if some driver decides to do
> > resume() in the complete() pm-op and potentially screwing the ordering
> > here...
>
> Personally, I think Alan's approach is the only scalable approach. This
> has to be handled by the drivers.
>
> If the driver's ISR does a pm_runtime_get_sync() and it fails (in this
> case, with -EACCESS since runtime PM is disabled), the driver should
> handle that handle as Alan described above.
Yea I think this probably needs to be done anyway, at least on some
cases. The PRCM chain handler approach might be able to solve most of
the problems though. I think I will post what I have anyway for comments
hopefully later today, so you can have a look and say what you think.
-Tero
Texas Instruments Oy, Tekniikantie 12, 02150 Espoo. Y-tunnus: 0115040-6. Kotipaikka: Helsinki
^ permalink raw reply
* Re: [PATCH 1/1] freezer: fix wait_event_freezable/__thaw_task races
From: Tejun Heo @ 2011-09-08 1:05 UTC (permalink / raw)
To: Oleg Nesterov; +Cc: containers, linux-kernel, linux-pm, paul
In-Reply-To: <20110907182217.GB13909@redhat.com>
Hello,
On Wed, Sep 07, 2011 at 08:22:17PM +0200, Oleg Nesterov wrote:
> wait_event_freezable() and wait_event_freezable_timeout() stop
> the waiting if try_to_freeze() fails. This is not right, we can
> race with __thaw_task() and in this case
>
> - wait_event_freezable() returns the wrong ERESTARTSYS
>
> - wait_event_freezable_timeout() can return the positive
> value while condition == F
Indeed, nice catch. This one actually is pretty dangerous. We
probably want to make a separate fix for this and backport it to
-stable?
> Change the code to always check __retval/condition before return.
>
> Note: with or without this patch the timeout logic looks strange,
> probably we should recalc timeout if try_to_freeze() returns T.
>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Yeap, with freezable_with_signal gone, this looks correct & simpler to
me but it would be nice if you can sprinkle some documentation while
at it. :)
Thanks.
--
tejun
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox