Linux Power Management development
 help / color / mirror / Atom feed
* Re: Issue: Runtime API usage in wake-up device irq_handler during wakeup from system-wide-suspend.
From: Kevin Hilman @ 2011-09-07 17:59 UTC (permalink / raw)
  To: t-kristo
  Cc: Balbi, Felipe, Basak, Partha, R, Govindraj, Munegowda, Keshava,
	linux-pm, linux-omap
In-Reply-To: <1315410513.2679.9.camel@sokoban>

Tero Kristo <t-kristo@ti.com> writes:

> Hi,
>
> On Sat, 2011-08-27 at 21:42 +0200, Alan Stern wrote:
>> On Sat, 27 Aug 2011, Santosh wrote:
>> 
>> > On Saturday 27 August 2011 07:31 PM, Alan Stern wrote:
>> > > On Sat, 27 Aug 2011, Santosh wrote:
>> > >
>> > >> I might be wrong here, but after discussion with Govindraj on this
>> > >> issue, it seems there is a flaw in the way OMAP chain handler
>> > >> handling the child interrupts.
>> > >>
>> > >> On OMAP, we have special interrupt wakeup source at PRCM level and
>> > >> many devices can trigger wakeup from low power via this common
>> > >> interrupt source. The common interrupt source upon wakeup from low
>> > >> power state, decodes the source of interrupt and based on that
>> > >> source, calls the respective device ISR directly.
>> > >>
>> > >> The issue I see here is, the ISR on _a_ device (UART in this case)
>> > >> is happening even before UART resume and DPM resume has been completed.
>> > >> If this is the case, then it is surely asking for trouble. Because not
>> > >> just clocks, but even driver state machine is already in suspend state
>> > >> when the ISR is called.
>> > >
>> > > If the driver state machine is in the suspend state when the ISR is
>> > > called, then the ISR should realize it is handling a wakeup event
>> > > instead of a normal I/O event.  All it needs to do is turn off the
>> > > interrupt source; the event itself will be taken care of during the
>> > > device's resume callback.
>> > >
>> > Good point but the ISR is called as a function call and not real
>> > hardware event so no need to turn-off the source in the child
>> > ISR. Parent ISR will clear the source anyways.
>> > 
>> > But the intention here is to record the event for the child.
>> 
>> In that case the ISR only has to record the event.
>> 
>> > I mean for UART wakeup, the received character should be
>> > extracted. If not done, UART will loose that character because
>> > the event is lost. So not sure how the event will be taken
>> > care during resume callback. Could you elaborate bit more on
>> > last comment please?
>> 
>> The resume callback routine must check to see if an event was recorded.
>> If one was, the routine must see whether a character was received while 
>> the system was asleep and extract the character from the UART.  (It 
>> probably won't hurt to do this even if an event wasn't recorded.)
>> 
>> Alan Stern
>> 
>
> After thinking about this problem and looking at possible ways to fix
> it, I am planning to change the PRCM chain handler to be a driver, which
> gets suspended along with the rest of the system. This means the PRCM
> interrupt would get disabled also during this time, and it would be
> enabled in the driver->complete() call, which should happen after rest
> of the drivers have been able to enable their PM runtime in the
> driver->resume() call chain. Do you see any problems with this approach?

How will the system wakeup from retention or off-mode if the PRCM IRQ is
disabled?

Besides that, I don't like this approach because it leaves a rather long
window during which no PRCM-triggered wakeup events can happen.  This is
not good since we also want potential wakeup events that happen *during*
system suspend to be able to cancel the suspend.

> The only issue I am seeing myself is if some driver decides to do
> resume() in the complete() pm-op and potentially screwing the ordering
> here...

Personally, I think Alan's approach is the only scalable approach.  This
has to be handled by the drivers.

If the driver's ISR does a pm_runtime_get_sync() and it fails (in this
case, with -EACCESS since runtime PM is disabled), the driver should
handle that handle as Alan described above.

Kevin

^ permalink raw reply

* Re: [Xen-devel] RE: [PATCH 2/7] x86, acpi, tboot: Have a ACPI sleep override instead of calling tboot_sleep.
From: Cihula, Joseph @ 2011-09-07 17:55 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Brown, Len, xen-devel@lists.xensource.com, keir@xen.org,
	Konrad Rzeszutek Wilk, Wang, Shane, x86@kernel.org,
	linux-acpi@vger.kernel.org, tboot-devel@lists.sourceforge.net,
	liang.tang@oracle.com, hpa@zytor.com, tglx@linutronix.de,
	linux-pm@lists.linux-foundation.org
In-Reply-To: <4E67A972.6030909@goop.org>

> From: Jeremy Fitzhardinge [mailto:jeremy@goop.org]
> Sent: Wednesday, September 07, 2011 10:27 AM
> 
> On 09/06/2011 09:20 PM, Cihula, Joseph wrote:
> >> From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com]
> >> Sent: Wednesday, August 31, 2011 11:31 AM
> >>
> >> The ACPI suspend path makes a call to tboot_sleep right before it
> >> writes the PM1A, PM1B values. We replace the direct call to tboot via an registration callback
> similar to __acpi_register_gsi.
> >>
> >> CC: Thomas Gleixner <tglx@linutronix.de>
> >> CC: "H. Peter Anvin" <hpa@zytor.com>
> >> CC: x86@kernel.org
> >> CC: Len Brown <len.brown@intel.com>
> >> CC: Joseph Cihula <joseph.cihula@intel.com>
> >> CC: Shane Wang <shane.wang@intel.com>
> >> CC: xen-devel@lists.xensource.com
> >> CC: linux-pm@lists.linux-foundation.org
> >> CC: tboot-devel@lists.sourceforge.net
> >> CC: linux-acpi@vger.kernel.org
> >> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> >> ---
> >>  arch/x86/include/asm/acpi.h   |    3 +++
> >>  arch/x86/kernel/acpi/boot.c   |    3 +++
> >>  arch/x86/kernel/tboot.c       |   13 +++++++++----
> >>  drivers/acpi/acpica/hwsleep.c |   12 ++++++++++--
> >>  include/linux/tboot.h         |    3 ++-
> >>  5 files changed, 27 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/arch/x86/include/asm/acpi.h
> >> b/arch/x86/include/asm/acpi.h index 610001d..49864a1
> >> 100644
> >> --- a/arch/x86/include/asm/acpi.h
> >> +++ b/arch/x86/include/asm/acpi.h
> >> @@ -98,6 +98,9 @@ void acpi_pic_sci_set_trigger(unsigned int, u16);
> >> extern int (*__acpi_register_gsi)(struct device *dev, u32 gsi,
> >>  				  int trigger, int polarity);
> >>
> >> +extern int (*__acpi_override_sleep)(u8 sleep_state, u32 pm1a_ctrl,
> >> +				    u32 pm1b_ctrl, bool *skip_rest);
> >> +
> >>  static inline void disable_acpi(void)  {
> >>  	acpi_disabled = 1;
> >> diff --git a/arch/x86/kernel/acpi/boot.c
> >> b/arch/x86/kernel/acpi/boot.c index 4558f0d..d191b4c
> >> 100644
> >> --- a/arch/x86/kernel/acpi/boot.c
> >> +++ b/arch/x86/kernel/acpi/boot.c
> >> @@ -552,6 +552,9 @@ static int acpi_register_gsi_ioapic(struct device
> >> *dev, u32 gsi,  int (*__acpi_register_gsi)(struct device *dev, u32 gsi,
> >>  			   int trigger, int polarity) = acpi_register_gsi_pic;
> >>
> >> +int (*__acpi_override_sleep)(u8 sleep_state, u32 pm1a_ctrl,
> >> +			     u32 pm1b_ctrl, bool *skip_rest) = NULL;
> >> +
> >>  /*
> >>   * success: return IRQ number (>=0)
> >>   * failure: return < 0
> >> diff --git a/arch/x86/kernel/tboot.c b/arch/x86/kernel/tboot.c index
> >> 30ac65d..a18070c 100644
> >> --- a/arch/x86/kernel/tboot.c
> >> +++ b/arch/x86/kernel/tboot.c
> >> @@ -41,7 +41,7 @@
> >>  #include <asm/setup.h>
> >>  #include <asm/e820.h>
> >>  #include <asm/io.h>
> >> -
> >> +#include <linux/acpi.h>
> >>  #include "acpi/realmode/wakeup.h"
> >>
> >>  /* Global pointer to shared data; NULL means no measured launch. */
> >> @@ -270,7 +270,8 @@ static void tboot_copy_fadt(const struct acpi_table_fadt *fadt)
> >>  		offsetof(struct acpi_table_facs, firmware_waking_vector);  }
> >>
> >> -void tboot_sleep(u8 sleep_state, u32 pm1a_control, u32 pm1b_control)
> >> +int tboot_sleep(u8 sleep_state, u32 pm1a_control, u32 pm1b_control,
> >> +		 bool *skip_rest)
> > Don't you need to use the 'unused' attrib on skip_rest in order to prevent compiler warnings?
> 
> No, gcc doesn't warn about unused parameters.

-Wunused-parameter

While the kernel may not be compiled with this flag, it wouldn't hurt to specify it anyway; but it's not a big issue.

Joe

^ permalink raw reply

* Re: [Xen-devel] RE: [PATCH 5/7] xen/acpi: Domain0 acpi parser related platform hypercall
From: Cihula, Joseph @ 2011-09-07 17:43 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Brown, Len, xen-devel@lists.xensource.com, hpa@zytor.com,
	Konrad Rzeszutek Wilk, x86@kernel.org, Fitzhardinge,
	linux-acpi@vger.kernel.org, tboot-devel@lists.sourceforge.net,
	liang.tang@oracle.com, linux-pm@lists.linux-foundation.org,
	keir@xen.org, tglx@linutronix.de
In-Reply-To: <4E67A9E7.2020802@goop.org>

> From: Jeremy Fitzhardinge [mailto:jeremy@goop.org]
> Sent: Wednesday, September 07, 2011 10:29 AM
> 
> On 09/06/2011 10:50 PM, Cihula, Joseph wrote:
> >> From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com]
> >> Sent: Wednesday, August 31, 2011 11:31 AM
> >>
> >> From: Yu Ke <ke.yu@intel.com>
> >>
> >> This patches implements the xen_platform_op hypercall, to pass the parsed ACPI info to
> hypervisor.
> >>
> >> Signed-off-by: Yu Ke <ke.yu@intel.com>
> >> Signed-off-by: Tian Kevin <kevin.tian@intel.com>
> >> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> >> [v1: Added DEFINE_GUEST.. in appropiate headers]
> >> [v2: Ripped out typedefs]
> >> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> >> ---
> >>  arch/ia64/include/asm/xen/interface.h |    1 +
> >>  arch/x86/include/asm/xen/interface.h  |    1 +
> >>  include/xen/interface/platform.h      |  320 +++++++++++++++++++++++++++++++++
> >>  include/xen/interface/xen.h           |    1 +
> >>  4 files changed, 323 insertions(+), 0 deletions(-)  create mode
> >> 100644 include/xen/interface/platform.h
> >>
> >> diff --git a/arch/ia64/include/asm/xen/interface.h
> >> b/arch/ia64/include/asm/xen/interface.h
> >> index e951e74..1d2427d 100644
> >> --- a/arch/ia64/include/asm/xen/interface.h
> >> +++ b/arch/ia64/include/asm/xen/interface.h
> >> @@ -76,6 +76,7 @@ DEFINE_GUEST_HANDLE(char);
> >> DEFINE_GUEST_HANDLE(int); DEFINE_GUEST_HANDLE(long);
> >> DEFINE_GUEST_HANDLE(void);
> >> +DEFINE_GUEST_HANDLE(uint64_t);
> >>
> >>  typedef unsigned long xen_pfn_t;
> >>  DEFINE_GUEST_HANDLE(xen_pfn_t);
> >> diff --git a/arch/x86/include/asm/xen/interface.h
> >> b/arch/x86/include/asm/xen/interface.h
> >> index 5d4922a..a1f2db5 100644
> >> --- a/arch/x86/include/asm/xen/interface.h
> >> +++ b/arch/x86/include/asm/xen/interface.h
> >> @@ -55,6 +55,7 @@ DEFINE_GUEST_HANDLE(char);
> >> DEFINE_GUEST_HANDLE(int); DEFINE_GUEST_HANDLE(long);
> >> DEFINE_GUEST_HANDLE(void);
> >> +DEFINE_GUEST_HANDLE(uint64_t);
> >>  #endif
> >>
> >>  #ifndef HYPERVISOR_VIRT_START
> >> diff --git a/include/xen/interface/platform.h
> >> b/include/xen/interface/platform.h
> >> new file mode 100644
> >> index 0000000..c168468
> >> --- /dev/null
> >> +++ b/include/xen/interface/platform.h
> > Why are you adding so many new hypercalls that aren't being used in this patchset?
> 
> May as well bring in all the platform-related hypercall definitions at once rather than be
> piecemeal.
> 
> >> @@ -0,0 +1,320 @@
> >> +/*******************************************************************
> >> +***
> >> +********
> >> + * platform.h
> >> + *
> >> + * Hardware platform operations. Intended for use by domain-0 kernel.
> >> + *
> >> + * Permission is hereby granted, free of charge, to any person
> >> +obtaining a copy
> >> + * of this software and associated documentation files (the
> >> +"Software"), to
> >> + * deal in the Software without restriction, including without
> >> +limitation the
> >> + * rights to use, copy, modify, merge, publish, distribute,
> >> +sublicense, and/or
> >> + * sell copies of the Software, and to permit persons to whom the
> >> +Software is
> >> + * furnished to do so, subject to the following conditions:
> >> + *
> >> + * The above copyright notice and this permission notice shall be
> >> +included in
> >> + * all copies or substantial portions of the Software.
> >> + *
> >> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> >> +EXPRESS OR
> >> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> >> +MERCHANTABILITY,
> >> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT
> >> +SHALL THE
> >> + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
> >> +OTHER
> >> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> >> +ARISING
> >> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> >> +OTHER
> >> + * DEALINGS IN THE SOFTWARE.
> >> + *
> >> + * Copyright (c) 2002-2006, K Fraser  */
> >> +
> >> +#ifndef __XEN_PUBLIC_PLATFORM_H__
> >> +#define __XEN_PUBLIC_PLATFORM_H__
> >> +
> >> +#include "xen.h"
> >> +
> >> +#define XENPF_INTERFACE_VERSION 0x03000001
> >> +
> >> +/*
> >> + * Set clock such that it would read <secs,nsecs> after 00:00:00
> >> +UTC,
> >> + * 1 January, 1970 if the current system time was <system_time>.
> >> + */
> >> +#define XENPF_settime             17
> >> +struct xenpf_settime {
> >> +	/* IN variables. */
> >> +	uint32_t secs;
> >> +	uint32_t nsecs;
> >> +	uint64_t system_time;
> >> +};
> >> +DEFINE_GUEST_HANDLE_STRUCT(xenpf_settime_t);
> >> +
> >> +/*
> >> + * Request memory range (@mfn, @mfn+@nr_mfns-1) to have type @type.
> >> + * On x86, @type is an architecture-defined MTRR memory type.
> >> + * On success, returns the MTRR that was used (@reg) and a handle
> >> +that can
> >> + * be passed to XENPF_DEL_MEMTYPE to accurately tear down the new setting.
> >> + * (x86-specific).
> >> + */
> >> +#define XENPF_add_memtype         31
> >> +struct xenpf_add_memtype {
> >> +	/* IN variables. */
> >> +	unsigned long mfn;
> >> +	uint64_t nr_mfns;
> >> +	uint32_t type;
> >> +	/* OUT variables. */
> >> +	uint32_t handle;
> >> +	uint32_t reg;
> >> +};
> >> +DEFINE_GUEST_HANDLE_STRUCT(xenpf_add_memtype_t);
> >> +
> >> +/*
> >> + * Tear down an existing memory-range type. If @handle is remembered
> >> +then it
> >> + * should be passed in to accurately tear down the correct setting
> >> +(in case
> >> + * of overlapping memory regions with differing types). If it is not
> >> +known
> >> + * then @handle should be set to zero. In all cases @reg must be set.
> >> + * (x86-specific).
> >> + */
> >> +#define XENPF_del_memtype         32
> >> +struct xenpf_del_memtype {
> >> +	/* IN variables. */
> >> +	uint32_t handle;
> >> +	uint32_t reg;
> >> +};
> >> +DEFINE_GUEST_HANDLE_STRUCT(xenpf_del_memtype_t);
> >> +
> >> +/* Read current type of an MTRR (x86-specific). */
> >> +#define XENPF_read_memtype        33
> >> +struct xenpf_read_memtype {
> >> +	/* IN variables. */
> >> +	uint32_t reg;
> >> +	/* OUT variables. */
> >> +	unsigned long mfn;
> >> +	uint64_t nr_mfns;
> >> +	uint32_t type;
> >> +};
> >> +DEFINE_GUEST_HANDLE_STRUCT(xenpf_read_memtype_t);
> >> +
> >> +#define XENPF_microcode_update    35
> >> +struct xenpf_microcode_update {
> >> +	/* IN variables. */
> >> +	GUEST_HANDLE(void) data;          /* Pointer to microcode data */
> >> +	uint32_t length;                  /* Length of microcode data. */
> >> +};
> >> +DEFINE_GUEST_HANDLE_STRUCT(xenpf_microcode_update_t);
> >> +
> >> +#define XENPF_platform_quirk      39
> >> +#define QUIRK_NOIRQBALANCING      1 /* Do not restrict IO-APIC RTE targets */
> >> +#define QUIRK_IOAPIC_BAD_REGSEL   2 /* IO-APIC REGSEL forgets its value    */
> >> +#define QUIRK_IOAPIC_GOOD_REGSEL  3 /* IO-APIC REGSEL behaves properly     */
> >> +struct xenpf_platform_quirk {
> >> +	/* IN variables. */
> >> +	uint32_t quirk_id;
> >> +};
> >> +DEFINE_GUEST_HANDLE_STRUCT(xenpf_platform_quirk_t);
> >> +
> >> +#define XENPF_firmware_info       50
> >> +#define XEN_FW_DISK_INFO          1 /* from int 13 AH=08/41/48 */
> >> +#define XEN_FW_DISK_MBR_SIGNATURE 2 /* from MBR offset 0x1b8 */
> >> +#define XEN_FW_VBEDDC_INFO        3 /* from int 10 AX=4f15 */
> >> +struct xenpf_firmware_info {
> >> +	/* IN variables. */
> >> +	uint32_t type;
> >> +	uint32_t index;
> >> +	/* OUT variables. */
> >> +	union {
> >> +		struct {
> >> +			/* Int13, Fn48: Check Extensions Present. */
> >> +			uint8_t device;                   /* %dl: bios device number */
> >> +			uint8_t version;                  /* %ah: major version      */
> >> +			uint16_t interface_support;       /* %cx: support bitmap     */
> >> +			/* Int13, Fn08: Legacy Get Device Parameters. */
> >> +			uint16_t legacy_max_cylinder;     /* %cl[7:6]:%ch: max cyl # */
> >> +			uint8_t legacy_max_head;          /* %dh: max head #         */
> >> +			uint8_t legacy_sectors_per_track; /* %cl[5:0]: max sector #  */
> >> +			/* Int13, Fn41: Get Device Parameters (as filled into %ds:%esi). */
> >> +			/* NB. First uint16_t of buffer must be set to buffer size.      */
> >> +			GUEST_HANDLE(void) edd_params;
> >> +		} disk_info; /* XEN_FW_DISK_INFO */
> >> +		struct {
> >> +			uint8_t device;                   /* bios device number  */
> >> +			uint32_t mbr_signature;           /* offset 0x1b8 in mbr */
> >> +		} disk_mbr_signature; /* XEN_FW_DISK_MBR_SIGNATURE */
> >> +		struct {
> >> +			/* Int10, AX=4F15: Get EDID info. */
> >> +			uint8_t capabilities;
> >> +			uint8_t edid_transfer_time;
> >> +			/* must refer to 128-byte buffer */
> >> +			GUEST_HANDLE(uchar) edid;
> >> +		} vbeddc_info; /* XEN_FW_VBEDDC_INFO */
> >> +	} u;
> >> +};
> >> +DEFINE_GUEST_HANDLE_STRUCT(xenpf_firmware_info_t);
> >> +
> >> +#define XENPF_enter_acpi_sleep    51
> >> +struct xenpf_enter_acpi_sleep {
> >> +	/* IN variables */
> >> +	uint16_t pm1a_cnt_val;      /* PM1a control value. */
> >> +	uint16_t pm1b_cnt_val;      /* PM1b control value. */
> > These are uint32_t in native Linux--why truncate in the API and not at use?
> 
> Does ACPI define them as 32 or 16 bit?

The spec indicates that the length is variable and could be up to 32 bits (AFAICT).  And Linux uses 32b, which your other patch is truncating for this call.

Joe

^ permalink raw reply

* Re: [Xen-devel] RE: [PATCH 5/7] xen/acpi: Domain0 acpi parser related platform hypercall
From: Jeremy Fitzhardinge @ 2011-09-07 17:29 UTC (permalink / raw)
  To: Cihula, Joseph
  Cc: Brown, Len, xen-devel@lists.xensource.com, keir@xen.org,
	Fitzhardinge, Konrad Rzeszutek Wilk, Wang, Shane, x86@kernel.org,
	Jeremy, linux-acpi@vger.kernel.org,
	tboot-devel@lists.sourceforge.net, liang.tang@oracle.com,
	hpa@zytor.com, tglx@linutronix.de,
	linux-pm@lists.linux-foundation.org
In-Reply-To: <9F57BF860713DF4BA3EFA4F8C6DFEDAC16F3F665@ORSMSX101.amr.corp.intel.com>

On 09/06/2011 10:50 PM, Cihula, Joseph wrote:
>> From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com]
>> Sent: Wednesday, August 31, 2011 11:31 AM
>>
>> From: Yu Ke <ke.yu@intel.com>
>>
>> This patches implements the xen_platform_op hypercall, to pass the parsed ACPI info to hypervisor.
>>
>> Signed-off-by: Yu Ke <ke.yu@intel.com>
>> Signed-off-by: Tian Kevin <kevin.tian@intel.com>
>> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
>> [v1: Added DEFINE_GUEST.. in appropiate headers]
>> [v2: Ripped out typedefs]
>> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>> ---
>>  arch/ia64/include/asm/xen/interface.h |    1 +
>>  arch/x86/include/asm/xen/interface.h  |    1 +
>>  include/xen/interface/platform.h      |  320 +++++++++++++++++++++++++++++++++
>>  include/xen/interface/xen.h           |    1 +
>>  4 files changed, 323 insertions(+), 0 deletions(-)  create mode 100644
>> include/xen/interface/platform.h
>>
>> diff --git a/arch/ia64/include/asm/xen/interface.h b/arch/ia64/include/asm/xen/interface.h
>> index e951e74..1d2427d 100644
>> --- a/arch/ia64/include/asm/xen/interface.h
>> +++ b/arch/ia64/include/asm/xen/interface.h
>> @@ -76,6 +76,7 @@ DEFINE_GUEST_HANDLE(char);  DEFINE_GUEST_HANDLE(int);
>> DEFINE_GUEST_HANDLE(long);  DEFINE_GUEST_HANDLE(void);
>> +DEFINE_GUEST_HANDLE(uint64_t);
>>
>>  typedef unsigned long xen_pfn_t;
>>  DEFINE_GUEST_HANDLE(xen_pfn_t);
>> diff --git a/arch/x86/include/asm/xen/interface.h b/arch/x86/include/asm/xen/interface.h
>> index 5d4922a..a1f2db5 100644
>> --- a/arch/x86/include/asm/xen/interface.h
>> +++ b/arch/x86/include/asm/xen/interface.h
>> @@ -55,6 +55,7 @@ DEFINE_GUEST_HANDLE(char);  DEFINE_GUEST_HANDLE(int);
>> DEFINE_GUEST_HANDLE(long);  DEFINE_GUEST_HANDLE(void);
>> +DEFINE_GUEST_HANDLE(uint64_t);
>>  #endif
>>
>>  #ifndef HYPERVISOR_VIRT_START
>> diff --git a/include/xen/interface/platform.h b/include/xen/interface/platform.h
>> new file mode 100644
>> index 0000000..c168468
>> --- /dev/null
>> +++ b/include/xen/interface/platform.h
> Why are you adding so many new hypercalls that aren't being used in this patchset?

May as well bring in all the platform-related hypercall definitions at
once rather than be piecemeal.

>> @@ -0,0 +1,320 @@
>> +/**********************************************************************
>> +********
>> + * platform.h
>> + *
>> + * Hardware platform operations. Intended for use by domain-0 kernel.
>> + *
>> + * Permission is hereby granted, free of charge, to any person
>> +obtaining a copy
>> + * of this software and associated documentation files (the
>> +"Software"), to
>> + * deal in the Software without restriction, including without
>> +limitation the
>> + * rights to use, copy, modify, merge, publish, distribute, sublicense,
>> +and/or
>> + * sell copies of the Software, and to permit persons to whom the
>> +Software is
>> + * furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be
>> +included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
>> +EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
>> +MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT
>> +SHALL THE
>> + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
>> +OTHER
>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> +ARISING
>> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
>> + * DEALINGS IN THE SOFTWARE.
>> + *
>> + * Copyright (c) 2002-2006, K Fraser
>> + */
>> +
>> +#ifndef __XEN_PUBLIC_PLATFORM_H__
>> +#define __XEN_PUBLIC_PLATFORM_H__
>> +
>> +#include "xen.h"
>> +
>> +#define XENPF_INTERFACE_VERSION 0x03000001
>> +
>> +/*
>> + * Set clock such that it would read <secs,nsecs> after 00:00:00 UTC,
>> + * 1 January, 1970 if the current system time was <system_time>.
>> + */
>> +#define XENPF_settime             17
>> +struct xenpf_settime {
>> +	/* IN variables. */
>> +	uint32_t secs;
>> +	uint32_t nsecs;
>> +	uint64_t system_time;
>> +};
>> +DEFINE_GUEST_HANDLE_STRUCT(xenpf_settime_t);
>> +
>> +/*
>> + * Request memory range (@mfn, @mfn+@nr_mfns-1) to have type @type.
>> + * On x86, @type is an architecture-defined MTRR memory type.
>> + * On success, returns the MTRR that was used (@reg) and a handle that
>> +can
>> + * be passed to XENPF_DEL_MEMTYPE to accurately tear down the new setting.
>> + * (x86-specific).
>> + */
>> +#define XENPF_add_memtype         31
>> +struct xenpf_add_memtype {
>> +	/* IN variables. */
>> +	unsigned long mfn;
>> +	uint64_t nr_mfns;
>> +	uint32_t type;
>> +	/* OUT variables. */
>> +	uint32_t handle;
>> +	uint32_t reg;
>> +};
>> +DEFINE_GUEST_HANDLE_STRUCT(xenpf_add_memtype_t);
>> +
>> +/*
>> + * Tear down an existing memory-range type. If @handle is remembered
>> +then it
>> + * should be passed in to accurately tear down the correct setting (in
>> +case
>> + * of overlapping memory regions with differing types). If it is not
>> +known
>> + * then @handle should be set to zero. In all cases @reg must be set.
>> + * (x86-specific).
>> + */
>> +#define XENPF_del_memtype         32
>> +struct xenpf_del_memtype {
>> +	/* IN variables. */
>> +	uint32_t handle;
>> +	uint32_t reg;
>> +};
>> +DEFINE_GUEST_HANDLE_STRUCT(xenpf_del_memtype_t);
>> +
>> +/* Read current type of an MTRR (x86-specific). */
>> +#define XENPF_read_memtype        33
>> +struct xenpf_read_memtype {
>> +	/* IN variables. */
>> +	uint32_t reg;
>> +	/* OUT variables. */
>> +	unsigned long mfn;
>> +	uint64_t nr_mfns;
>> +	uint32_t type;
>> +};
>> +DEFINE_GUEST_HANDLE_STRUCT(xenpf_read_memtype_t);
>> +
>> +#define XENPF_microcode_update    35
>> +struct xenpf_microcode_update {
>> +	/* IN variables. */
>> +	GUEST_HANDLE(void) data;          /* Pointer to microcode data */
>> +	uint32_t length;                  /* Length of microcode data. */
>> +};
>> +DEFINE_GUEST_HANDLE_STRUCT(xenpf_microcode_update_t);
>> +
>> +#define XENPF_platform_quirk      39
>> +#define QUIRK_NOIRQBALANCING      1 /* Do not restrict IO-APIC RTE targets */
>> +#define QUIRK_IOAPIC_BAD_REGSEL   2 /* IO-APIC REGSEL forgets its value    */
>> +#define QUIRK_IOAPIC_GOOD_REGSEL  3 /* IO-APIC REGSEL behaves properly     */
>> +struct xenpf_platform_quirk {
>> +	/* IN variables. */
>> +	uint32_t quirk_id;
>> +};
>> +DEFINE_GUEST_HANDLE_STRUCT(xenpf_platform_quirk_t);
>> +
>> +#define XENPF_firmware_info       50
>> +#define XEN_FW_DISK_INFO          1 /* from int 13 AH=08/41/48 */
>> +#define XEN_FW_DISK_MBR_SIGNATURE 2 /* from MBR offset 0x1b8 */
>> +#define XEN_FW_VBEDDC_INFO        3 /* from int 10 AX=4f15 */
>> +struct xenpf_firmware_info {
>> +	/* IN variables. */
>> +	uint32_t type;
>> +	uint32_t index;
>> +	/* OUT variables. */
>> +	union {
>> +		struct {
>> +			/* Int13, Fn48: Check Extensions Present. */
>> +			uint8_t device;                   /* %dl: bios device number */
>> +			uint8_t version;                  /* %ah: major version      */
>> +			uint16_t interface_support;       /* %cx: support bitmap     */
>> +			/* Int13, Fn08: Legacy Get Device Parameters. */
>> +			uint16_t legacy_max_cylinder;     /* %cl[7:6]:%ch: max cyl # */
>> +			uint8_t legacy_max_head;          /* %dh: max head #         */
>> +			uint8_t legacy_sectors_per_track; /* %cl[5:0]: max sector #  */
>> +			/* Int13, Fn41: Get Device Parameters (as filled into %ds:%esi). */
>> +			/* NB. First uint16_t of buffer must be set to buffer size.      */
>> +			GUEST_HANDLE(void) edd_params;
>> +		} disk_info; /* XEN_FW_DISK_INFO */
>> +		struct {
>> +			uint8_t device;                   /* bios device number  */
>> +			uint32_t mbr_signature;           /* offset 0x1b8 in mbr */
>> +		} disk_mbr_signature; /* XEN_FW_DISK_MBR_SIGNATURE */
>> +		struct {
>> +			/* Int10, AX=4F15: Get EDID info. */
>> +			uint8_t capabilities;
>> +			uint8_t edid_transfer_time;
>> +			/* must refer to 128-byte buffer */
>> +			GUEST_HANDLE(uchar) edid;
>> +		} vbeddc_info; /* XEN_FW_VBEDDC_INFO */
>> +	} u;
>> +};
>> +DEFINE_GUEST_HANDLE_STRUCT(xenpf_firmware_info_t);
>> +
>> +#define XENPF_enter_acpi_sleep    51
>> +struct xenpf_enter_acpi_sleep {
>> +	/* IN variables */
>> +	uint16_t pm1a_cnt_val;      /* PM1a control value. */
>> +	uint16_t pm1b_cnt_val;      /* PM1b control value. */
> These are uint32_t in native Linux--why truncate in the API and not at use?

Does ACPI define them as 32 or 16 bit?

    J

^ permalink raw reply

* Re: [Xen-devel] RE: [PATCH 2/7] x86, acpi, tboot: Have a ACPI sleep override instead of calling tboot_sleep.
From: Jeremy Fitzhardinge @ 2011-09-07 17:27 UTC (permalink / raw)
  To: Cihula, Joseph
  Cc: Brown, Len, xen-devel@lists.xensource.com, keir@xen.org,
	Konrad Rzeszutek Wilk, Wang, Shane, x86@kernel.org,
	linux-acpi@vger.kernel.org, tboot-devel@lists.sourceforge.net,
	liang.tang@oracle.com, hpa@zytor.com, tglx@linutronix.de,
	linux-pm@lists.linux-foundation.org
In-Reply-To: <9F57BF860713DF4BA3EFA4F8C6DFEDAC16F3F524@ORSMSX101.amr.corp.intel.com>

On 09/06/2011 09:20 PM, Cihula, Joseph wrote:
>> From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com]
>> Sent: Wednesday, August 31, 2011 11:31 AM
>>
>> The ACPI suspend path makes a call to tboot_sleep right before it writes the PM1A, PM1B values. We
>> replace the direct call to tboot via an registration callback similar to __acpi_register_gsi.
>>
>> CC: Thomas Gleixner <tglx@linutronix.de>
>> CC: "H. Peter Anvin" <hpa@zytor.com>
>> CC: x86@kernel.org
>> CC: Len Brown <len.brown@intel.com>
>> CC: Joseph Cihula <joseph.cihula@intel.com>
>> CC: Shane Wang <shane.wang@intel.com>
>> CC: xen-devel@lists.xensource.com
>> CC: linux-pm@lists.linux-foundation.org
>> CC: tboot-devel@lists.sourceforge.net
>> CC: linux-acpi@vger.kernel.org
>> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>> ---
>>  arch/x86/include/asm/acpi.h   |    3 +++
>>  arch/x86/kernel/acpi/boot.c   |    3 +++
>>  arch/x86/kernel/tboot.c       |   13 +++++++++----
>>  drivers/acpi/acpica/hwsleep.c |   12 ++++++++++--
>>  include/linux/tboot.h         |    3 ++-
>>  5 files changed, 27 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/acpi.h b/arch/x86/include/asm/acpi.h index 610001d..49864a1
>> 100644
>> --- a/arch/x86/include/asm/acpi.h
>> +++ b/arch/x86/include/asm/acpi.h
>> @@ -98,6 +98,9 @@ void acpi_pic_sci_set_trigger(unsigned int, u16);  extern int
>> (*__acpi_register_gsi)(struct device *dev, u32 gsi,
>>  				  int trigger, int polarity);
>>
>> +extern int (*__acpi_override_sleep)(u8 sleep_state, u32 pm1a_ctrl,
>> +				    u32 pm1b_ctrl, bool *skip_rest);
>> +
>>  static inline void disable_acpi(void)
>>  {
>>  	acpi_disabled = 1;
>> diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c index 4558f0d..d191b4c
>> 100644
>> --- a/arch/x86/kernel/acpi/boot.c
>> +++ b/arch/x86/kernel/acpi/boot.c
>> @@ -552,6 +552,9 @@ static int acpi_register_gsi_ioapic(struct device *dev, u32 gsi,  int
>> (*__acpi_register_gsi)(struct device *dev, u32 gsi,
>>  			   int trigger, int polarity) = acpi_register_gsi_pic;
>>
>> +int (*__acpi_override_sleep)(u8 sleep_state, u32 pm1a_ctrl,
>> +			     u32 pm1b_ctrl, bool *skip_rest) = NULL;
>> +
>>  /*
>>   * success: return IRQ number (>=0)
>>   * failure: return < 0
>> diff --git a/arch/x86/kernel/tboot.c b/arch/x86/kernel/tboot.c index 30ac65d..a18070c 100644
>> --- a/arch/x86/kernel/tboot.c
>> +++ b/arch/x86/kernel/tboot.c
>> @@ -41,7 +41,7 @@
>>  #include <asm/setup.h>
>>  #include <asm/e820.h>
>>  #include <asm/io.h>
>> -
>> +#include <linux/acpi.h>
>>  #include "acpi/realmode/wakeup.h"
>>
>>  /* Global pointer to shared data; NULL means no measured launch. */ @@ -270,7 +270,8 @@ static
>> void tboot_copy_fadt(const struct acpi_table_fadt *fadt)
>>  		offsetof(struct acpi_table_facs, firmware_waking_vector);  }
>>
>> -void tboot_sleep(u8 sleep_state, u32 pm1a_control, u32 pm1b_control)
>> +int tboot_sleep(u8 sleep_state, u32 pm1a_control, u32 pm1b_control,
>> +		 bool *skip_rest)
> Don't you need to use the 'unused' attrib on skip_rest in order to prevent compiler warnings?

No, gcc doesn't warn about unused parameters.

    J

^ permalink raw reply

* Re: [PATCH] vt/suspend: cleanup #if defined uglyness and fix compile error
From: Arnd Bergmann @ 2011-09-07 16:49 UTC (permalink / raw)
  To: H Hartley Sweeten
  Cc: len.brown, art, gregkh, Linux Kernel, linux-pm, jslaby, akpm
In-Reply-To: <201109061404.10301.hartleys@visionengravers.com>

On Tuesday 06 September 2011, H Hartley Sweeten wrote:
> rder to cleanup
> the #if defined ugliness for the vt suspend support functions. Note that
> CONFIG_VT_CONSOLE is already dependant on CONFIG_VT.
> 
> The function pm_set_vt_switch is actually dependant on CONFIG_VT and not
> CONFIG_PM_SLEEP. This fixes a compile error when CONFIG_PM_SLEEP is
> not set:
> 
> drivers/tty/vt/vt_ioctl.c:1794: error: redefinition of 'pm_set_vt_switch'
> include/linux/suspend.h:17: error: previous definition of 'pm_set_vt_switch' was here
> 
> Also, remove the incorrect path from the comment in console.c.
> 
> Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
> Cc: Arnd Bergmann <arnd@arnd.de>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Greg Kroah-Hartman <gregkh@suse.de>
> Cc: Pavel Machek <pavel@ucw.cz>
> Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
> Cc: Len Brown <len.brown@intel.com>
> Cc: Arthur Taylor <art@ified.ca>
> Cc: Jiri Slaby <jslaby@suse.cz>

Acked-by: Arnd Bergmann <arnd@arndb.de>

^ permalink raw reply

* Re: Issue: Runtime API usage in wake-up device irq_handler during wakeup from system-wide-suspend.
From: Tero Kristo @ 2011-09-07 15:48 UTC (permalink / raw)
  To: Alan Stern
  Cc: Balbi, Felipe, Basak, Partha, R, Govindraj, Munegowda, Keshava,
	linux-pm, linux-omap
In-Reply-To: <Pine.LNX.4.44L0.1108271537300.21403-100000@netrider.rowland.org>

Hi,

On Sat, 2011-08-27 at 21:42 +0200, Alan Stern wrote:
> On Sat, 27 Aug 2011, Santosh wrote:
> 
> > On Saturday 27 August 2011 07:31 PM, Alan Stern wrote:
> > > On Sat, 27 Aug 2011, Santosh wrote:
> > >
> > >> I might be wrong here, but after discussion with Govindraj on this
> > >> issue, it seems there is a flaw in the way OMAP chain handler
> > >> handling the child interrupts.
> > >>
> > >> On OMAP, we have special interrupt wakeup source at PRCM level and
> > >> many devices can trigger wakeup from low power via this common
> > >> interrupt source. The common interrupt source upon wakeup from low
> > >> power state, decodes the source of interrupt and based on that
> > >> source, calls the respective device ISR directly.
> > >>
> > >> The issue I see here is, the ISR on _a_ device (UART in this case)
> > >> is happening even before UART resume and DPM resume has been completed.
> > >> If this is the case, then it is surely asking for trouble. Because not
> > >> just clocks, but even driver state machine is already in suspend state
> > >> when the ISR is called.
> > >
> > > If the driver state machine is in the suspend state when the ISR is
> > > called, then the ISR should realize it is handling a wakeup event
> > > instead of a normal I/O event.  All it needs to do is turn off the
> > > interrupt source; the event itself will be taken care of during the
> > > device's resume callback.
> > >
> > Good point but the ISR is called as a function call and not real
> > hardware event so no need to turn-off the source in the child
> > ISR. Parent ISR will clear the source anyways.
> > 
> > But the intention here is to record the event for the child.
> 
> In that case the ISR only has to record the event.
> 
> > I mean for UART wakeup, the received character should be
> > extracted. If not done, UART will loose that character because
> > the event is lost. So not sure how the event will be taken
> > care during resume callback. Could you elaborate bit more on
> > last comment please?
> 
> The resume callback routine must check to see if an event was recorded.
> If one was, the routine must see whether a character was received while 
> the system was asleep and extract the character from the UART.  (It 
> probably won't hurt to do this even if an event wasn't recorded.)
> 
> Alan Stern
> 

After thinking about this problem and looking at possible ways to fix
it, I am planning to change the PRCM chain handler to be a driver, which
gets suspended along with the rest of the system. This means the PRCM
interrupt would get disabled also during this time, and it would be
enabled in the driver->complete() call, which should happen after rest
of the drivers have been able to enable their PM runtime in the
driver->resume() call chain. Do you see any problems with this approach?
The only issue I am seeing myself is if some driver decides to do
resume() in the complete() pm-op and potentially screwing the ordering
here...

-Tero



Texas Instruments Oy, Tekniikantie 12, 02150 Espoo. Y-tunnus: 0115040-6. Kotipaikka: Helsinki
 

^ permalink raw reply

* Re: [PATCH 5/7] xen/acpi: Domain0 acpi parser related platform hypercall
From: Cihula, Joseph @ 2011-09-07  5:50 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, x86@kernel.org, tglx@linutronix.de,
	tboot-devel
  Cc: xen-devel@lists.xensource.com, Jeremy Fitzhardinge
In-Reply-To: <1314815484-4668-6-git-send-email-konrad.wilk@oracle.com>

> From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com]
> Sent: Wednesday, August 31, 2011 11:31 AM
> 
> From: Yu Ke <ke.yu@intel.com>
> 
> This patches implements the xen_platform_op hypercall, to pass the parsed ACPI info to hypervisor.
> 
> Signed-off-by: Yu Ke <ke.yu@intel.com>
> Signed-off-by: Tian Kevin <kevin.tian@intel.com>
> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> [v1: Added DEFINE_GUEST.. in appropiate headers]
> [v2: Ripped out typedefs]
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>  arch/ia64/include/asm/xen/interface.h |    1 +
>  arch/x86/include/asm/xen/interface.h  |    1 +
>  include/xen/interface/platform.h      |  320 +++++++++++++++++++++++++++++++++
>  include/xen/interface/xen.h           |    1 +
>  4 files changed, 323 insertions(+), 0 deletions(-)  create mode 100644
> include/xen/interface/platform.h
> 
> diff --git a/arch/ia64/include/asm/xen/interface.h b/arch/ia64/include/asm/xen/interface.h
> index e951e74..1d2427d 100644
> --- a/arch/ia64/include/asm/xen/interface.h
> +++ b/arch/ia64/include/asm/xen/interface.h
> @@ -76,6 +76,7 @@ DEFINE_GUEST_HANDLE(char);  DEFINE_GUEST_HANDLE(int);
> DEFINE_GUEST_HANDLE(long);  DEFINE_GUEST_HANDLE(void);
> +DEFINE_GUEST_HANDLE(uint64_t);
> 
>  typedef unsigned long xen_pfn_t;
>  DEFINE_GUEST_HANDLE(xen_pfn_t);
> diff --git a/arch/x86/include/asm/xen/interface.h b/arch/x86/include/asm/xen/interface.h
> index 5d4922a..a1f2db5 100644
> --- a/arch/x86/include/asm/xen/interface.h
> +++ b/arch/x86/include/asm/xen/interface.h
> @@ -55,6 +55,7 @@ DEFINE_GUEST_HANDLE(char);  DEFINE_GUEST_HANDLE(int);
> DEFINE_GUEST_HANDLE(long);  DEFINE_GUEST_HANDLE(void);
> +DEFINE_GUEST_HANDLE(uint64_t);
>  #endif
> 
>  #ifndef HYPERVISOR_VIRT_START
> diff --git a/include/xen/interface/platform.h b/include/xen/interface/platform.h
> new file mode 100644
> index 0000000..c168468
> --- /dev/null
> +++ b/include/xen/interface/platform.h

Why are you adding so many new hypercalls that aren't being used in this patchset?

> @@ -0,0 +1,320 @@
> +/**********************************************************************
> +********
> + * platform.h
> + *
> + * Hardware platform operations. Intended for use by domain-0 kernel.
> + *
> + * Permission is hereby granted, free of charge, to any person
> +obtaining a copy
> + * of this software and associated documentation files (the
> +"Software"), to
> + * deal in the Software without restriction, including without
> +limitation the
> + * rights to use, copy, modify, merge, publish, distribute, sublicense,
> +and/or
> + * sell copies of the Software, and to permit persons to whom the
> +Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be
> +included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> +EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> +MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT
> +SHALL THE
> + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
> +OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> +ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> + * DEALINGS IN THE SOFTWARE.
> + *
> + * Copyright (c) 2002-2006, K Fraser
> + */
> +
> +#ifndef __XEN_PUBLIC_PLATFORM_H__
> +#define __XEN_PUBLIC_PLATFORM_H__
> +
> +#include "xen.h"
> +
> +#define XENPF_INTERFACE_VERSION 0x03000001
> +
> +/*
> + * Set clock such that it would read <secs,nsecs> after 00:00:00 UTC,
> + * 1 January, 1970 if the current system time was <system_time>.
> + */
> +#define XENPF_settime             17
> +struct xenpf_settime {
> +	/* IN variables. */
> +	uint32_t secs;
> +	uint32_t nsecs;
> +	uint64_t system_time;
> +};
> +DEFINE_GUEST_HANDLE_STRUCT(xenpf_settime_t);
> +
> +/*
> + * Request memory range (@mfn, @mfn+@nr_mfns-1) to have type @type.
> + * On x86, @type is an architecture-defined MTRR memory type.
> + * On success, returns the MTRR that was used (@reg) and a handle that
> +can
> + * be passed to XENPF_DEL_MEMTYPE to accurately tear down the new setting.
> + * (x86-specific).
> + */
> +#define XENPF_add_memtype         31
> +struct xenpf_add_memtype {
> +	/* IN variables. */
> +	unsigned long mfn;
> +	uint64_t nr_mfns;
> +	uint32_t type;
> +	/* OUT variables. */
> +	uint32_t handle;
> +	uint32_t reg;
> +};
> +DEFINE_GUEST_HANDLE_STRUCT(xenpf_add_memtype_t);
> +
> +/*
> + * Tear down an existing memory-range type. If @handle is remembered
> +then it
> + * should be passed in to accurately tear down the correct setting (in
> +case
> + * of overlapping memory regions with differing types). If it is not
> +known
> + * then @handle should be set to zero. In all cases @reg must be set.
> + * (x86-specific).
> + */
> +#define XENPF_del_memtype         32
> +struct xenpf_del_memtype {
> +	/* IN variables. */
> +	uint32_t handle;
> +	uint32_t reg;
> +};
> +DEFINE_GUEST_HANDLE_STRUCT(xenpf_del_memtype_t);
> +
> +/* Read current type of an MTRR (x86-specific). */
> +#define XENPF_read_memtype        33
> +struct xenpf_read_memtype {
> +	/* IN variables. */
> +	uint32_t reg;
> +	/* OUT variables. */
> +	unsigned long mfn;
> +	uint64_t nr_mfns;
> +	uint32_t type;
> +};
> +DEFINE_GUEST_HANDLE_STRUCT(xenpf_read_memtype_t);
> +
> +#define XENPF_microcode_update    35
> +struct xenpf_microcode_update {
> +	/* IN variables. */
> +	GUEST_HANDLE(void) data;          /* Pointer to microcode data */
> +	uint32_t length;                  /* Length of microcode data. */
> +};
> +DEFINE_GUEST_HANDLE_STRUCT(xenpf_microcode_update_t);
> +
> +#define XENPF_platform_quirk      39
> +#define QUIRK_NOIRQBALANCING      1 /* Do not restrict IO-APIC RTE targets */
> +#define QUIRK_IOAPIC_BAD_REGSEL   2 /* IO-APIC REGSEL forgets its value    */
> +#define QUIRK_IOAPIC_GOOD_REGSEL  3 /* IO-APIC REGSEL behaves properly     */
> +struct xenpf_platform_quirk {
> +	/* IN variables. */
> +	uint32_t quirk_id;
> +};
> +DEFINE_GUEST_HANDLE_STRUCT(xenpf_platform_quirk_t);
> +
> +#define XENPF_firmware_info       50
> +#define XEN_FW_DISK_INFO          1 /* from int 13 AH=08/41/48 */
> +#define XEN_FW_DISK_MBR_SIGNATURE 2 /* from MBR offset 0x1b8 */
> +#define XEN_FW_VBEDDC_INFO        3 /* from int 10 AX=4f15 */
> +struct xenpf_firmware_info {
> +	/* IN variables. */
> +	uint32_t type;
> +	uint32_t index;
> +	/* OUT variables. */
> +	union {
> +		struct {
> +			/* Int13, Fn48: Check Extensions Present. */
> +			uint8_t device;                   /* %dl: bios device number */
> +			uint8_t version;                  /* %ah: major version      */
> +			uint16_t interface_support;       /* %cx: support bitmap     */
> +			/* Int13, Fn08: Legacy Get Device Parameters. */
> +			uint16_t legacy_max_cylinder;     /* %cl[7:6]:%ch: max cyl # */
> +			uint8_t legacy_max_head;          /* %dh: max head #         */
> +			uint8_t legacy_sectors_per_track; /* %cl[5:0]: max sector #  */
> +			/* Int13, Fn41: Get Device Parameters (as filled into %ds:%esi). */
> +			/* NB. First uint16_t of buffer must be set to buffer size.      */
> +			GUEST_HANDLE(void) edd_params;
> +		} disk_info; /* XEN_FW_DISK_INFO */
> +		struct {
> +			uint8_t device;                   /* bios device number  */
> +			uint32_t mbr_signature;           /* offset 0x1b8 in mbr */
> +		} disk_mbr_signature; /* XEN_FW_DISK_MBR_SIGNATURE */
> +		struct {
> +			/* Int10, AX=4F15: Get EDID info. */
> +			uint8_t capabilities;
> +			uint8_t edid_transfer_time;
> +			/* must refer to 128-byte buffer */
> +			GUEST_HANDLE(uchar) edid;
> +		} vbeddc_info; /* XEN_FW_VBEDDC_INFO */
> +	} u;
> +};
> +DEFINE_GUEST_HANDLE_STRUCT(xenpf_firmware_info_t);
> +
> +#define XENPF_enter_acpi_sleep    51
> +struct xenpf_enter_acpi_sleep {
> +	/* IN variables */
> +	uint16_t pm1a_cnt_val;      /* PM1a control value. */
> +	uint16_t pm1b_cnt_val;      /* PM1b control value. */

These are uint32_t in native Linux--why truncate in the API and not at use?

> +	uint32_t sleep_state;       /* Which state to enter (Sn). */
> +	uint32_t flags;             /* Must be zero. */
> +};
> +DEFINE_GUEST_HANDLE_STRUCT(xenpf_enter_acpi_sleep_t);
> +
> +#define XENPF_change_freq         52
> +struct xenpf_change_freq {
> +	/* IN variables */
> +	uint32_t flags; /* Must be zero. */
> +	uint32_t cpu;   /* Physical cpu. */
> +	uint64_t freq;  /* New frequency (Hz). */ };
> +DEFINE_GUEST_HANDLE_STRUCT(xenpf_change_freq_t);
> +
> +/*
> + * Get idle times (nanoseconds since boot) for physical CPUs specified
> +in the
> + * @cpumap_bitmap with range [0..@cpumap_nr_cpus-1]. The @idletime
> +array is
> + * indexed by CPU number; only entries with the corresponding
> +@cpumap_bitmap
> + * bit set are written to. On return, @cpumap_bitmap is modified so
> +that any
> + * non-existent CPUs are cleared. Such CPUs have their @idletime array
> +entry
> + * cleared.
> + */
> +#define XENPF_getidletime         53
> +struct xenpf_getidletime {
> +	/* IN/OUT variables */
> +	/* IN: CPUs to interrogate; OUT: subset of IN which are present */
> +	GUEST_HANDLE(uchar) cpumap_bitmap;
> +	/* IN variables */
> +	/* Size of cpumap bitmap. */
> +	uint32_t cpumap_nr_cpus;
> +	/* Must be indexable for every cpu in cpumap_bitmap. */
> +	GUEST_HANDLE(uint64_t) idletime;
> +	/* OUT variables */
> +	/* System time when the idletime snapshots were taken. */
> +	uint64_t now;
> +};
> +DEFINE_GUEST_HANDLE_STRUCT(xenpf_getidletime_t);
> +
> +#define XENPF_set_processor_pminfo      54
> +
> +/* ability bits */
> +#define XEN_PROCESSOR_PM_CX	1
> +#define XEN_PROCESSOR_PM_PX	2
> +#define XEN_PROCESSOR_PM_TX	4
> +
> +/* cmd type */
> +#define XEN_PM_CX   0
> +#define XEN_PM_PX   1
> +#define XEN_PM_TX   2
> +
> +/* Px sub info type */
> +#define XEN_PX_PCT   1
> +#define XEN_PX_PSS   2
> +#define XEN_PX_PPC   4
> +#define XEN_PX_PSD   8
> +
> +struct xen_power_register {
> +	uint32_t     space_id;
> +	uint32_t     bit_width;
> +	uint32_t     bit_offset;
> +	uint32_t     access_size;
> +	uint64_t     address;
> +};
> +
> +struct xen_processor_csd {
> +	uint32_t    domain;      /* domain number of one dependent group */
> +	uint32_t    coord_type;  /* coordination type */
> +	uint32_t    num;         /* number of processors in same domain */
> +};
> +DEFINE_GUEST_HANDLE_STRUCT(xen_processor_csd);
> +
> +struct xen_processor_cx {
> +	struct xen_power_register  reg; /* GAS for Cx trigger register */
> +	uint8_t     type;     /* cstate value, c0: 0, c1: 1, ... */
> +	uint32_t    latency;  /* worst latency (ms) to enter/exit this cstate */
> +	uint32_t    power;    /* average power consumption(mW) */
> +	uint32_t    dpcnt;    /* number of dependency entries */
> +	GUEST_HANDLE(xen_processor_csd) dp; /* NULL if no dependency */ };
> +DEFINE_GUEST_HANDLE_STRUCT(xen_processor_cx);
> +
> +struct xen_processor_flags {
> +	uint32_t bm_control:1;
> +	uint32_t bm_check:1;
> +	uint32_t has_cst:1;
> +	uint32_t power_setup_done:1;
> +	uint32_t bm_rld_set:1;
> +};
> +
> +struct xen_processor_power {
> +	uint32_t count;  /* number of C state entries in array below */
> +	struct xen_processor_flags flags;  /* global flags of this processor */
> +	GUEST_HANDLE(xen_processor_cx) states; /* supported c states */ };
> +
> +struct xen_pct_register {
> +	uint8_t  descriptor;
> +	uint16_t length;
> +	uint8_t  space_id;
> +	uint8_t  bit_width;
> +	uint8_t  bit_offset;
> +	uint8_t  reserved;
> +	uint64_t address;
> +};
> +
> +struct xen_processor_px {
> +	uint64_t core_frequency; /* megahertz */
> +	uint64_t power;      /* milliWatts */
> +	uint64_t transition_latency; /* microseconds */
> +	uint64_t bus_master_latency; /* microseconds */
> +	uint64_t control;        /* control value */
> +	uint64_t status;     /* success indicator */
> +};
> +DEFINE_GUEST_HANDLE_STRUCT(xen_processor_px);
> +
> +struct xen_psd_package {
> +	uint64_t num_entries;
> +	uint64_t revision;
> +	uint64_t domain;
> +	uint64_t coord_type;
> +	uint64_t num_processors;
> +};
> +
> +struct xen_processor_performance {
> +	uint32_t flags;     /* flag for Px sub info type */
> +	uint32_t platform_limit;  /* Platform limitation on freq usage */
> +	struct xen_pct_register control_register;
> +	struct xen_pct_register status_register;
> +	uint32_t state_count;     /* total available performance states */
> +	GUEST_HANDLE(xen_processor_px) states;
> +	struct xen_psd_package domain_info;
> +	uint32_t shared_type;     /* coordination type of this processor */
> +};
> +DEFINE_GUEST_HANDLE_STRUCT(xen_processor_performance);
> +
> +struct xenpf_set_processor_pminfo {
> +	/* IN variables */
> +	uint32_t id;    /* ACPI CPU ID */
> +	uint32_t type;  /* {XEN_PM_CX, XEN_PM_PX} */
> +	union {
> +		struct xen_processor_power          power;/* Cx: _CST/_CSD */
> +		struct xen_processor_performance    perf; /* Px: _PPC/_PCT/_PSS/_PSD */
> +	};
> +};
> +DEFINE_GUEST_HANDLE_STRUCT(xenpf_set_processor_pminfo);
> +
> +struct xen_platform_op {
> +	uint32_t cmd;
> +	uint32_t interface_version; /* XENPF_INTERFACE_VERSION */
> +	union {
> +		struct xenpf_settime           settime;
> +		struct xenpf_add_memtype       add_memtype;
> +		struct xenpf_del_memtype       del_memtype;
> +		struct xenpf_read_memtype      read_memtype;
> +		struct xenpf_microcode_update  microcode;
> +		struct xenpf_platform_quirk    platform_quirk;
> +		struct xenpf_firmware_info     firmware_info;
> +		struct xenpf_enter_acpi_sleep  enter_acpi_sleep;
> +		struct xenpf_change_freq       change_freq;
> +		struct xenpf_getidletime       getidletime;
> +		struct xenpf_set_processor_pminfo set_pminfo;
> +		uint8_t                        pad[128];
> +	} u;
> +};
> +DEFINE_GUEST_HANDLE_STRUCT(xen_platform_op_t);
> +
> +#endif /* __XEN_PUBLIC_PLATFORM_H__ */
> diff --git a/include/xen/interface/xen.h b/include/xen/interface/xen.h index 70213b4..d83cc08
> 100644
> --- a/include/xen/interface/xen.h
> +++ b/include/xen/interface/xen.h
> @@ -453,6 +453,7 @@ struct start_info {
>  /* These flags are passed in the 'flags' field of start_info_t. */
>  #define SIF_PRIVILEGED    (1<<0)  /* Is the domain privileged? */
>  #define SIF_INITDOMAIN    (1<<1)  /* Is this the initial control domain? */
> +#define SIF_PM_MASK       (0xFF<<8) /* reserve 1 byte for xen-pm options */
> 
>  typedef uint64_t cpumap_t;
> 
> --
> 1.7.4.1

^ permalink raw reply

* Re: [PATCH 6/7] xen/acpi/sleep: Enable ACPI sleep via the __acpi_override_sleep
From: Cihula, Joseph @ 2011-09-07  4:36 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, x86@kernel.org, tglx@linutronix.de,
	tboot-devel
  Cc: xen-devel@lists.xensource.com
In-Reply-To: <1314815484-4668-7-git-send-email-konrad.wilk@oracle.com>

> From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com]
> Sent: Wednesday, August 31, 2011 11:31 AM
> 
> Provide the registration callback to call in the Xen's ACPI sleep functionality. This means that
> during S3/S5 we make a hypercall XENPF_enter_acpi_sleep with the proper PM1A/PM1B registers.
> 
> Based of Ke Yu's <ke.yu@intel.com> initial idea.
> [ From http://xenbits.xensource.com/linux-2.6.18-xen.hg
> change c68699484a65 ]
> 
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>  arch/x86/include/asm/xen/hypercall.h |    8 ++++++++
>  arch/x86/xen/enlighten.c             |    3 +++
>  drivers/xen/Makefile                 |    2 +-
>  drivers/xen/acpi.c                   |   25 +++++++++++++++++++++++++
>  include/xen/acpi.h                   |   26 ++++++++++++++++++++++++++
>  5 files changed, 63 insertions(+), 1 deletions(-)  create mode 100644 drivers/xen/acpi.c  create
> mode 100644 include/xen/acpi.h
> 
> diff --git a/arch/x86/include/asm/xen/hypercall.h b/arch/x86/include/asm/xen/hypercall.h
> index d240ea9..0c9894e 100644
> --- a/arch/x86/include/asm/xen/hypercall.h
> +++ b/arch/x86/include/asm/xen/hypercall.h
> @@ -45,6 +45,7 @@
>  #include <xen/interface/xen.h>
>  #include <xen/interface/sched.h>
>  #include <xen/interface/physdev.h>
> +#include <xen/interface/platform.h>
> 
>  /*
>   * The hypercall asms have to meet several constraints:
> @@ -299,6 +300,13 @@ HYPERVISOR_set_timer_op(u64 timeout)  }
> 
>  static inline int
> +HYPERVISOR_dom0_op(struct xen_platform_op *platform_op) {
> +	platform_op->interface_version = XENPF_INTERFACE_VERSION;
> +	return _hypercall1(int, dom0_op, platform_op); }
> +
> +static inline int
>  HYPERVISOR_set_debugreg(int reg, unsigned long value)  {
>  	return _hypercall2(int, set_debugreg, reg, value); diff --git a/arch/x86/xen/enlighten.c
> b/arch/x86/xen/enlighten.c index 5525163..6962653 100644
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -42,6 +42,7 @@
>  #include <xen/page.h>
>  #include <xen/hvm.h>
>  #include <xen/hvc-console.h>
> +#include <xen/acpi.h>
> 
>  #include <asm/paravirt.h>
>  #include <asm/apic.h>
> @@ -1250,6 +1251,8 @@ asmlinkage void __init xen_start_kernel(void)
>  	} else {
>  		/* Make sure ACS will be enabled */
>  		pci_request_acs();
> +
> +		xen_acpi_sleep_register();
>  	}
> 
> 
> diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile index bbc1825..370552d 100644
> --- a/drivers/xen/Makefile
> +++ b/drivers/xen/Makefile
> @@ -16,7 +16,7 @@ obj-$(CONFIG_XENFS)			+= xenfs/
>  obj-$(CONFIG_XEN_SYS_HYPERVISOR)	+= sys-hypervisor.o
>  obj-$(CONFIG_XEN_PLATFORM_PCI)		+= xen-platform-pci.o
>  obj-$(CONFIG_SWIOTLB_XEN)		+= swiotlb-xen.o
> -obj-$(CONFIG_XEN_DOM0)			+= pci.o
> +obj-$(CONFIG_XEN_DOM0)			+= pci.o acpi.o
> 
>  xen-evtchn-y				:= evtchn.o
>  xen-gntdev-y				:= gntdev.o
> diff --git a/drivers/xen/acpi.c b/drivers/xen/acpi.c new file mode 100644 index 0000000..c0f829f
> --- /dev/null
> +++ b/drivers/xen/acpi.c
> @@ -0,0 +1,25 @@

Copyright and license?

> +#include <xen/acpi.h>
> +#include <xen/interface/platform.h>
> +#include <asm/xen/hypercall.h>
> +#include <asm/xen/hypervisor.h>
> +
> +int xen_acpi_notify_hypervisor_state(u8 sleep_state,
> +				     u32 pm1a_cnt, u32 pm1b_cnt,
> +				     bool *skip_rest)
> +{
> +	struct xen_platform_op op = {
> +		.cmd = XENPF_enter_acpi_sleep,
> +		.interface_version = XENPF_INTERFACE_VERSION,
> +		.u = {
> +			.enter_acpi_sleep = {
> +				.pm1a_cnt_val = (u16)pm1a_cnt,
> +				.pm1b_cnt_val = (u16)pm1b_cnt,
> +				.sleep_state = sleep_state,
> +			},
> +		},
> +	};
> +	if (skip_rest)
> +		*skip_rest = true;
> +
> +	return HYPERVISOR_dom0_op(&op);
> +}
> diff --git a/include/xen/acpi.h b/include/xen/acpi.h new file mode 100644 index 0000000..e414f14
> --- /dev/null
> +++ b/include/xen/acpi.h
> @@ -0,0 +1,26 @@

Copyright and license?

> +#ifndef _XEN_ACPI_H
> +#define _XEN_ACPI_H
> +
> +#include <linux/types.h>
> +
> +#ifdef CONFIG_XEN_DOM0
> +#include <asm/xen/hypervisor.h>
> +#include <xen/xen.h>
> +#include <linux/acpi.h>
> +
> +int xen_acpi_notify_hypervisor_state(u8 sleep_state,
> +				     u32 pm1a_cnt, u32 pm1b_cnd,
> +				     bool *skip_rest);
> +
> +static inline void xen_acpi_sleep_register(void) {
> +	if (xen_initial_domain())
> +		__acpi_override_sleep = xen_acpi_notify_hypervisor_state; } #else
> +static inline void xen_acpi_sleep_register(void) { } #endif
> +
> +#endif	/* _XEN_ACPI_H */
> --
> 1.7.4.1

^ permalink raw reply

* Re: [PATCH 2/7] x86, acpi, tboot: Have a ACPI sleep override instead of calling tboot_sleep.
From: Cihula, Joseph @ 2011-09-07  4:20 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, x86@kernel.org, tglx@linutronix.de,
	tboot-devel
  Cc: xen-devel@lists.xensource.com
In-Reply-To: <1314815484-4668-3-git-send-email-konrad.wilk@oracle.com>

> From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com]
> Sent: Wednesday, August 31, 2011 11:31 AM
> 
> The ACPI suspend path makes a call to tboot_sleep right before it writes the PM1A, PM1B values. We
> replace the direct call to tboot via an registration callback similar to __acpi_register_gsi.
> 
> CC: Thomas Gleixner <tglx@linutronix.de>
> CC: "H. Peter Anvin" <hpa@zytor.com>
> CC: x86@kernel.org
> CC: Len Brown <len.brown@intel.com>
> CC: Joseph Cihula <joseph.cihula@intel.com>
> CC: Shane Wang <shane.wang@intel.com>
> CC: xen-devel@lists.xensource.com
> CC: linux-pm@lists.linux-foundation.org
> CC: tboot-devel@lists.sourceforge.net
> CC: linux-acpi@vger.kernel.org
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>  arch/x86/include/asm/acpi.h   |    3 +++
>  arch/x86/kernel/acpi/boot.c   |    3 +++
>  arch/x86/kernel/tboot.c       |   13 +++++++++----
>  drivers/acpi/acpica/hwsleep.c |   12 ++++++++++--
>  include/linux/tboot.h         |    3 ++-
>  5 files changed, 27 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/include/asm/acpi.h b/arch/x86/include/asm/acpi.h index 610001d..49864a1
> 100644
> --- a/arch/x86/include/asm/acpi.h
> +++ b/arch/x86/include/asm/acpi.h
> @@ -98,6 +98,9 @@ void acpi_pic_sci_set_trigger(unsigned int, u16);  extern int
> (*__acpi_register_gsi)(struct device *dev, u32 gsi,
>  				  int trigger, int polarity);
> 
> +extern int (*__acpi_override_sleep)(u8 sleep_state, u32 pm1a_ctrl,
> +				    u32 pm1b_ctrl, bool *skip_rest);
> +
>  static inline void disable_acpi(void)
>  {
>  	acpi_disabled = 1;
> diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c index 4558f0d..d191b4c
> 100644
> --- a/arch/x86/kernel/acpi/boot.c
> +++ b/arch/x86/kernel/acpi/boot.c
> @@ -552,6 +552,9 @@ static int acpi_register_gsi_ioapic(struct device *dev, u32 gsi,  int
> (*__acpi_register_gsi)(struct device *dev, u32 gsi,
>  			   int trigger, int polarity) = acpi_register_gsi_pic;
> 
> +int (*__acpi_override_sleep)(u8 sleep_state, u32 pm1a_ctrl,
> +			     u32 pm1b_ctrl, bool *skip_rest) = NULL;
> +
>  /*
>   * success: return IRQ number (>=0)
>   * failure: return < 0
> diff --git a/arch/x86/kernel/tboot.c b/arch/x86/kernel/tboot.c index 30ac65d..a18070c 100644
> --- a/arch/x86/kernel/tboot.c
> +++ b/arch/x86/kernel/tboot.c
> @@ -41,7 +41,7 @@
>  #include <asm/setup.h>
>  #include <asm/e820.h>
>  #include <asm/io.h>
> -
> +#include <linux/acpi.h>
>  #include "acpi/realmode/wakeup.h"
> 
>  /* Global pointer to shared data; NULL means no measured launch. */ @@ -270,7 +270,8 @@ static
> void tboot_copy_fadt(const struct acpi_table_fadt *fadt)
>  		offsetof(struct acpi_table_facs, firmware_waking_vector);  }
> 
> -void tboot_sleep(u8 sleep_state, u32 pm1a_control, u32 pm1b_control)
> +int tboot_sleep(u8 sleep_state, u32 pm1a_control, u32 pm1b_control,
> +		 bool *skip_rest)

Don't you need to use the 'unused' attrib on skip_rest in order to prevent compiler warnings?

>  {
>  	static u32 acpi_shutdown_map[ACPI_S_STATE_COUNT] = {
>  		/* S0,1,2: */ -1, -1, -1,
> @@ -279,7 +280,7 @@ void tboot_sleep(u8 sleep_state, u32 pm1a_control, u32 pm1b_control)
>  		/* S5: */ TB_SHUTDOWN_S5 };
> 
>  	if (!tboot_enabled())
> -		return;
> +		return AE_OK;
> 
>  	tboot_copy_fadt(&acpi_gbl_FADT);
>  	tboot->acpi_sinfo.pm1a_cnt_val = pm1a_control; @@ -290,10 +291,12 @@ void tboot_sleep(u8
> sleep_state, u32 pm1a_control, u32 pm1b_control)
>  	if (sleep_state >= ACPI_S_STATE_COUNT ||
>  	    acpi_shutdown_map[sleep_state] == -1) {
>  		pr_warning("unsupported sleep state 0x%x\n", sleep_state);
> -		return;
> +		return AE_ERROR;
>  	}
> 
>  	tboot_shutdown(acpi_shutdown_map[sleep_state]);
> +
> +	return AE_OK;
>  }
> 
>  static atomic_t ap_wfs_count;
> @@ -343,6 +346,8 @@ static __init int tboot_late_init(void)
> 
>  	atomic_set(&ap_wfs_count, 0);
>  	register_hotcpu_notifier(&tboot_cpu_notifier);
> +
> +	__acpi_override_sleep = tboot_sleep;
>  	return 0;
>  }
> 
> diff --git a/drivers/acpi/acpica/hwsleep.c b/drivers/acpi/acpica/hwsleep.c index 2ac28bb..31d1198
> 100644
> --- a/drivers/acpi/acpica/hwsleep.c
> +++ b/drivers/acpi/acpica/hwsleep.c
> @@ -45,7 +45,6 @@
>  #include <acpi/acpi.h>
>  #include "accommon.h"
>  #include "actables.h"
> -#include <linux/tboot.h>
> 
>  #define _COMPONENT          ACPI_HARDWARE
>  ACPI_MODULE_NAME("hwsleep")
> @@ -343,8 +342,17 @@ acpi_status asmlinkage acpi_enter_sleep_state(u8 sleep_state)
> 
>  	ACPI_FLUSH_CPU_CACHE();
> 
> -	tboot_sleep(sleep_state, pm1a_control, pm1b_control);
> +	if (__acpi_override_sleep) {
> +		bool skip_rest = false;
> 
> +		status = __acpi_override_sleep(sleep_state, pm1a_control,
> +					       pm1b_control, &skip_rest);
> +
> +		if (ACPI_FAILURE(status))
> +			return_ACPI_STATUS(status);
> +		if (skip_rest)
> +			return_ACPI_STATUS(AE_OK);
> +	}
>  	/* Write #2: Write both SLP_TYP + SLP_EN */
> 
>  	status = acpi_hw_write_pm1_control(pm1a_control, pm1b_control); diff --git
> a/include/linux/tboot.h b/include/linux/tboot.h index 1dba6ee..19badbd 100644
> --- a/include/linux/tboot.h
> +++ b/include/linux/tboot.h
> @@ -143,7 +143,8 @@ static inline int tboot_enabled(void)
> 
>  extern void tboot_probe(void);
>  extern void tboot_shutdown(u32 shutdown_type); -extern void tboot_sleep(u8 sleep_state, u32
> pm1a_control, u32 pm1b_control);
> +extern int tboot_sleep(u8 sleep_state, u32 pm1a_control, u32 pm1b_control,
> +		       bool *skip);
>  extern struct acpi_table_header *tboot_get_dmar_table(
>  				      struct acpi_table_header *dmar_tbl);  extern int
> tboot_force_iommu(void);
> --
> 1.7.4.1

^ permalink raw reply

* [PATCH] vt/suspend: cleanup #if defined uglyness and fix compile error
From: H Hartley Sweeten @ 2011-09-06 21:04 UTC (permalink / raw)
  To: Linux Kernel; +Cc: len.brown, art, arnd, gregkh, linux-pm, jslaby, akpm

Introduce the config option CONFIG_VT_CONSOLE_SLEEP in order to cleanup
the #if defined ugliness for the vt suspend support functions. Note that
CONFIG_VT_CONSOLE is already dependant on CONFIG_VT.

The function pm_set_vt_switch is actually dependant on CONFIG_VT and not
CONFIG_PM_SLEEP. This fixes a compile error when CONFIG_PM_SLEEP is
not set:

drivers/tty/vt/vt_ioctl.c:1794: error: redefinition of 'pm_set_vt_switch'
include/linux/suspend.h:17: error: previous definition of 'pm_set_vt_switch' was here

Also, remove the incorrect path from the comment in console.c.

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Cc: Arnd Bergmann <arnd@arnd.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Len Brown <len.brown@intel.com>
Cc: Arthur Taylor <art@ified.ca>
Cc: Jiri Slaby <jslaby@suse.cz>

---

This is a revised version of the patch "vt_ioctl.c: fix compile error with
pm_set_vt_switch()" that I sent last week.  Alan Cox asked if the #if uglies
could be moved into the header to keep the code clean. Arnd Bergmann then
submitted a revised patch that did this.  This now includes Arnd's fix and along
with the addition of CONFIG_VT_CONSOLE_SLEEP.

diff --git a/drivers/tty/Kconfig b/drivers/tty/Kconfig
index f1ea59b..565fe0f 100644
--- a/drivers/tty/Kconfig
+++ b/drivers/tty/Kconfig
@@ -60,6 +60,10 @@ config VT_CONSOLE
 
 	  If unsure, say Y.
 
+config VT_CONSOLE_SLEEP
+	def_bool y
+	depends on VT_CONSOLE && PM_SLEEP
+
 config HW_CONSOLE
 	bool
 	depends on VT && !S390 && !UML
diff --git a/include/linux/suspend.h b/include/linux/suspend.h
index 46f3548..c3da030 100644
--- a/include/linux/suspend.h
+++ b/include/linux/suspend.h
@@ -8,15 +8,18 @@
 #include <linux/mm.h>
 #include <asm/errno.h>
 
-#if defined(CONFIG_PM_SLEEP) && defined(CONFIG_VT) && defined(CONFIG_VT_CONSOLE)
+#if defined(CONFIG_VT)
 extern void pm_set_vt_switch(int);
-extern int pm_prepare_console(void);
-extern void pm_restore_console(void);
 #else
 static inline void pm_set_vt_switch(int do_switch)
 {
 }
+#endif
 
+#if defined(CONFIG_VT_CONSOLE_SLEEP)
+extern int pm_prepare_console(void);
+extern void pm_restore_console(void);
+#else
 static inline int pm_prepare_console(void)
 {
 	return 0;
diff --git a/kernel/power/Makefile b/kernel/power/Makefile
index ad6bdd8..07e0e28 100644
--- a/kernel/power/Makefile
+++ b/kernel/power/Makefile
@@ -2,7 +2,7 @@
 ccflags-$(CONFIG_PM_DEBUG)	:= -DDEBUG
 
 obj-$(CONFIG_PM)		+= main.o qos.o
-obj-$(CONFIG_PM_SLEEP)		+= console.o
+obj-$(CONFIG_VT_CONSOLE_SLEEP)	+= console.o
 obj-$(CONFIG_FREEZER)		+= process.o
 obj-$(CONFIG_SUSPEND)		+= suspend.o
 obj-$(CONFIG_PM_TEST_SUSPEND)	+= suspend_test.o
diff --git a/kernel/power/console.c b/kernel/power/console.c
index 218e5af..b1dc456 100644
--- a/kernel/power/console.c
+++ b/kernel/power/console.c
@@ -1,5 +1,5 @@
 /*
- * drivers/power/process.c - Functions for saving/restoring console.
+ * Functions for saving/restoring console.
  *
  * Originally from swsusp.
  */
@@ -10,7 +10,6 @@
 #include <linux/module.h>
 #include "power.h"
 
-#if defined(CONFIG_VT) && defined(CONFIG_VT_CONSOLE)
 #define SUSPEND_CONSOLE	(MAX_NR_CONSOLES-1)
 
 static int orig_fgconsole, orig_kmsg;
@@ -32,4 +31,3 @@ void pm_restore_console(void)
 		vt_kmsg_redirect(orig_kmsg);
 	}
 }
-#endif

^ permalink raw reply related

* Re: [PATCH] OMAP: omap_device: fix !CONFIG_SUSPEND case in _noirq handlers
From: Kevin Hilman @ 2011-09-06 20:20 UTC (permalink / raw)
  To: Tony Lindgren; +Cc: linux-pm, linux-omap, linux-arm-kernel, Arnd Bergmann
In-Reply-To: <4E667EE1.1020808@ti.com>

On 09/06/2011 01:13 PM, Kevin Hilman wrote:
> On 09/01/2011 02:57 PM, Rafael J. Wysocki wrote:
>> On Thursday, September 01, 2011, Arnd Bergmann wrote:
>>> On Thursday 01 September 2011 11:12:02 Kevin Hilman wrote:
>>>> The suspend/resume _noirq handlers were #ifdef'd out in the
>>>> !CONFIG_SUSPEND case, but were still assigned to the dev_pm_ops
>>>> struct. Fix by defining them to NULL in the !CONFIG_SUSPEND case.
>>>>
>>>> Reported-by: Arnd Bergmann<arnd@arndb.de>
>>>> Signed-off-by: Kevin Hilman<khilman@ti.com>
>>>
>>> Acked-by: Arnd Bergmann<arnd@arndb.de>
>>>
>>> Thansk for the fast response!
>>
>> I'll apply the patch when kernel.org is back in order.
>>
>
> Tony,
>
> I spoke w/Rafael and due to hera being down and Rafael traveling to LPC
> he wont be able to queue this for v3.1. Can you queue up this fix? This
> needs to go in to v3.1-rc.

For convenience, this patch is in a branch in my backup gitorious repo 
with Arnd's ack added:

	git://gitorious.org/khilman/linux-omap-pm.git for_3.1/pm-fixes-2

Kevin

^ permalink raw reply

* Re: [PATCH] OMAP: omap_device: fix !CONFIG_SUSPEND case in _noirq handlers
From: Kevin Hilman @ 2011-09-06 20:13 UTC (permalink / raw)
  To: Tony Lindgren; +Cc: linux-pm, linux-omap, linux-arm-kernel, Arnd Bergmann
In-Reply-To: <201109012357.39873.rjw@sisk.pl>

On 09/01/2011 02:57 PM, Rafael J. Wysocki wrote:
> On Thursday, September 01, 2011, Arnd Bergmann wrote:
>> On Thursday 01 September 2011 11:12:02 Kevin Hilman wrote:
>>> The suspend/resume _noirq handlers were #ifdef'd out in the
>>> !CONFIG_SUSPEND case, but were still assigned to the dev_pm_ops
>>> struct.  Fix by defining them to NULL in the !CONFIG_SUSPEND case.
>>>
>>> Reported-by: Arnd Bergmann<arnd@arndb.de>
>>> Signed-off-by: Kevin Hilman<khilman@ti.com>
>>
>> Acked-by: Arnd Bergmann<arnd@arndb.de>
>>
>> Thansk for the fast response!
>
> I'll apply the patch when kernel.org is back in order.
>

Tony,

I spoke w/Rafael and due to hera being down and Rafael traveling to LPC 
he wont be able to queue this for v3.1.  Can you queue up this fix? 
This needs to go in to v3.1-rc.

Thanks,

Kevin

^ permalink raw reply

* Re: [PATCH 6/6] freezer: kill unused set_freezable_with_signal()
From: Tejun Heo @ 2011-09-06 15:53 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: containers, linux-kernel, linux-pm, paul
In-Reply-To: <20110906152539.GA16899@redhat.com>

Hello,

On Tue, Sep 06, 2011 at 05:25:39PM +0200, Oleg Nesterov wrote:
> On 09/06, Oleg Nesterov wrote:
> >
> > Yes, agreed. In this case I think it should be
> >
> > 	#define wait_event_freezable(wq, condition)				\
> > 	({									\
> > 		int __retval;							\
> > 		for (;;) {							\
> > 			__retval = wait_event_interruptible(wq, 		\
> > 					(condition) || freezing(current));	\
> > 			if (__retval || (condition))				\
> > 				break;						\
> > 			try_to_freeze();					\
> > 		}								\
> > 		__retval;							\
> > 	})
> >
> > __retval/ERESTARTSYS is only needed for kthreads which play with allow_signal(),
> > probably nobody should do this.
> 
> I meant, unless the caller plays with allow_signal(), it has all rights to do
> 
> 	if (wait_event_freezable(...))
> 		BUG();
> 
> This becomes correct with the code above.

Yeap, sure, w/ freezable_with_signal gone, the above should work fine.
Care to create a patch?

Thanks.

-- 
tejun

^ permalink raw reply

* Re: [PATCH 6/6] freezer: kill unused set_freezable_with_signal()
From: Oleg Nesterov @ 2011-09-06 15:25 UTC (permalink / raw)
  To: Tejun Heo; +Cc: containers, linux-kernel, linux-pm, paul
In-Reply-To: <20110906151836.GA15568@redhat.com>

On 09/06, Oleg Nesterov wrote:
>
> Yes, agreed. In this case I think it should be
>
> 	#define wait_event_freezable(wq, condition)				\
> 	({									\
> 		int __retval;							\
> 		for (;;) {							\
> 			__retval = wait_event_interruptible(wq, 		\
> 					(condition) || freezing(current));	\
> 			if (__retval || (condition))				\
> 				break;						\
> 			try_to_freeze();					\
> 		}								\
> 		__retval;							\
> 	})
>
> __retval/ERESTARTSYS is only needed for kthreads which play with allow_signal(),
> probably nobody should do this.

I meant, unless the caller plays with allow_signal(), it has all rights to do

	if (wait_event_freezable(...))
		BUG();

This becomes correct with the code above.

Oleg.

^ permalink raw reply

* Re: [PATCH 6/6] freezer: kill unused set_freezable_with_signal()
From: Oleg Nesterov @ 2011-09-06 15:18 UTC (permalink / raw)
  To: Tejun Heo; +Cc: containers, linux-kernel, linux-pm, paul
In-Reply-To: <20110906032846.GA18425@mtj.dyndns.org>

Hi,

On 09/06, Tejun Heo wrote:
>
> Hello,
>
> On Mon, Sep 05, 2011 at 06:20:12PM +0200, Oleg Nesterov wrote:
> > Perhaps it is correct... Just I do not understand what it should do.
> > I thought it is "wait_for_event && do_not_block_freezer". And at first
> > glance the code looks as if it tries to do this. Say, in the "likely"
> > case we restart wait_event_interruptible() after refrigerator().
> >
> > But this looks racy. Suppose that freezing() is already false when
> > try_to_freeze() or __refrigerator() is called. Say, cgroup_freezer does
> > freeze_task() + __thaw_task(). Why it returns -ERESTARTSYS in this case?
>
> It may return -ERESTARTSYS when not strictly necessary but given that
> it's supposed to trigger restart anyway I don't think it's actually
> broken (it doesn't break the contract of the wait).

OK, but still this doesn't look right.

> Another thing to
> note is that kthread freezing via cgroup is almost fundamentally
> broken.

OK, let it be freeze_processes()+thaw_tasks(). Of course this is mostly
theoretical.

> > And if it can be used by the userspace thread, then we should probably
> > do recalc_sigpending() somewhere, otherwise wait_event_freezable() will
> > always return -ERESTARTSYS after __refrigerator().
>
> Thankfully, currently, all the few users are kthreads.  I don't think
> it makes sense for userland tasks at all.

Yes, agreed. In this case I think it should be

	#define wait_event_freezable(wq, condition)				\
	({									\
		int __retval;							\
		for (;;) {							\
			__retval = wait_event_interruptible(wq, 		\
					(condition) || freezing(current));	\
			if (__retval || (condition))				\
				break;						\
			try_to_freeze();					\
		}								\
		__retval;							\
	})

__retval/ERESTARTSYS is only needed for kthreads which play with allow_signal(),
probably nobody should do this.

Oleg.

^ permalink raw reply

* Re: [PATCHSET cgroup] extend threadgroup locking
From: Li Zefan @ 2011-09-06  9:00 UTC (permalink / raw)
  To: Tejun Heo; +Cc: akpm, fweisbec, containers, linux-kernel, linux-pm, paul
In-Reply-To: <1315159280-25032-1-git-send-email-htejun@gmail.com>

Tejun Heo wrote:
> Hello,
> 
> cgroup currently only blocks new threads from joining the target
> threadgroup during migration, and on-going migration could race
> against exec and exit leading to interesting problems - the symmetry
> between various attach methods, task exiting during method execution,
> ->exit() racing against attach methods, migrating task switching basic
> properties during exec and so on.
> 
> This patchset extends threadgroup locking such that it covers all
> operations which can alter the threadgroup - fork, exit and exec, and
> update cgroup to take advantage of it.  rwsem read ops are added to
> exit path but exec is excluded by grabbing the existing
> cred_guard_mutex from threadgroup locking helper.
> 
> This makes threadgroup locking complete and resolves cgroup issues
> stemming from the target taskset being unstable.
> 
> This patchset is on top of the current pm-freezer + "freezer: fixes &
> simplifications" patchset and contains the following four patches.
> Patch list and diffstat follow.
> 
> Thanks.
> 
>  [PATCH 1/4] cgroup: change locking order in attach_task_by_pid()
>  [PATCH 2/4] threadgroup: rename signal->threadgroup_fork_lock to
>  [PATCH 3/4] threadgroup: extend threadgroup_lock() to cover exit and
>  [PATCH 4/4] cgroup: always lock threadgroup during migration
> 

I've read through the whole patchset, and it looks good to me.

Acked-by: Li Zefan <lizf@cn.fujitsu.com>

>  include/linux/init_task.h |    9 ++----
>  include/linux/sched.h     |   58 ++++++++++++++++++++++++++++---------------
>  kernel/cgroup.c           |   62 +++++++++++++++++++++-------------------------
>  kernel/exit.c             |   16 ++++++++---
>  kernel/fork.c             |    8 ++---
>  5 files changed, 88 insertions(+), 65 deletions(-)
> 
> --
> tejun
> 
> [1] http://thread.gmane.org/gmane.linux.kernel/1187553
> 

^ permalink raw reply

* Re: [BUG] CPU hotplug, freezer: Freezing of tasks failed after 20.00 seconds
From: Rafael J. Wysocki @ 2011-09-06  6:01 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Christoph Lameter, arnd, linux-kernel, oleg, Pekka Enberg,
	Srivatsa S. Bhat, Linux PM mailing list
In-Reply-To: <20110906050831.GA16976@htj.dyndns.org>

On Tuesday, September 06, 2011, Tejun Heo wrote:
> Hello, again.
> 
> On Mon, Sep 05, 2011 at 11:15:12PM +0900, Tejun Heo wrote:
> > >  Freezing of tasks failed after 20.01 seconds (2 tasks refusing to freeze, wq_busy=0):
> > >  invert_cpu_stat D 0000000000000000  5304 20435  17329 0x00000084
> > >   ffff8801f367bab8 0000000000000046 ffff8801f367bfd8 00000000001d3a00
> > >   ffff8801f367a010 00000000001d3a00 00000000001d3a00 00000000001d3a00
> > >   ffff8801f367bfd8 00000000001d3a00 ffff880414cc6840 ffff8801f36783c0
> > >  Call Trace:
> > >   [<ffffffff81532de5>] schedule_timeout+0x235/0x320
> > >   [<ffffffff81532a0b>] wait_for_common+0x11b/0x170
> > >   [<ffffffff81532b3d>] wait_for_completion+0x1d/0x20
> > >   [<ffffffff81364486>] _request_firmware+0x156/0x2c0
> > >   [<ffffffff81364686>] request_firmware+0x16/0x20
> > >   [<ffffffffa01f0da0>] request_microcode_fw+0x70/0xf0 [microcode]
> > >   [<ffffffffa01f0390>] microcode_init_cpu+0xc0/0x100 [microcode]
> > >   [<ffffffffa01f14b4>] mc_cpu_callback+0x7c/0x11f [microcode]
> > >   [<ffffffff815393a4>] notifier_call_chain+0x94/0xd0
> > >   [<ffffffff8109770e>] __raw_notifier_call_chain+0xe/0x10
> > >   [<ffffffff8106d000>] __cpu_notify+0x20/0x40
> > >   [<ffffffff8152cf5b>] _cpu_up+0xc7/0x10e
> > >   [<ffffffff8152d07b>] cpu_up+0xd9/0xec
> > >   [<ffffffff8151e599>] store_online+0x99/0xd0
> > >   [<ffffffff81355eb0>] sysdev_store+0x20/0x30
> > >   [<ffffffff811f3096>] sysfs_write_file+0xe6/0x170
> > >   [<ffffffff8117ee50>] vfs_write+0xd0/0x1a0
> > >   [<ffffffff8117f024>] sys_write+0x54/0xa0
> > >   [<ffffffff8153df02>] system_call_fastpath+0x16/0x1b
> > 
> > So, this task is trying to bring a CPU up, which triggers firmware
> > helper to load microcode.  Firmware class currently sleeps
> > non-interruptibly to wait for firmware load to complete, which is
> > performed by another userland task.  Now, the PM freezer doesn't
> > assume that there will be non-freezable wait dependencies among
> > userland tasks.  It only knows two levels - userland and kernel tasks
> > - and assumes that the former group may have non-freezable wait
> > dependency on the latter but there's no such dependency among each
> > group itself.  If there's such dependency, PM freezer may fail, which
> > is what happened here.
> > 
> > ie. the firmware loader userland process got frozen first.
> > invert_cpu_stat trying to bring up CPU was waiting for the firmware
> > loader to finish in non-interruptible sleep, so the freezer couldn't
> > proceed.
> 
> Hmmm... I went through the code again and usermodehelper_disable()
> seems to be there to prevent deadlocks like this.  usermode helpers
> are drained & plugged before freezing is tried.  Rafael, the above
> shouldn't be happening, right?

No, it shouldn't in theory, but I'm not sure any more after the recent
modifications of firmware loading related to the initialization.  I'll have
a closer look tomorrow.

Thanks,
Rafael

^ permalink raw reply

* Re: [BUG] CPU hotplug, freezer: Freezing of tasks failed after 20.00 seconds
From: Tejun Heo @ 2011-09-06  5:08 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Pekka Enberg, Christoph Lameter, arnd, linux-kernel, oleg,
	Linux PM mailing list
In-Reply-To: <20110905141512.GE9807@htj.dyndns.org>

Hello, again.

On Mon, Sep 05, 2011 at 11:15:12PM +0900, Tejun Heo wrote:
> >  Freezing of tasks failed after 20.01 seconds (2 tasks refusing to freeze, wq_busy=0):
> >  invert_cpu_stat D 0000000000000000  5304 20435  17329 0x00000084
> >   ffff8801f367bab8 0000000000000046 ffff8801f367bfd8 00000000001d3a00
> >   ffff8801f367a010 00000000001d3a00 00000000001d3a00 00000000001d3a00
> >   ffff8801f367bfd8 00000000001d3a00 ffff880414cc6840 ffff8801f36783c0
> >  Call Trace:
> >   [<ffffffff81532de5>] schedule_timeout+0x235/0x320
> >   [<ffffffff81532a0b>] wait_for_common+0x11b/0x170
> >   [<ffffffff81532b3d>] wait_for_completion+0x1d/0x20
> >   [<ffffffff81364486>] _request_firmware+0x156/0x2c0
> >   [<ffffffff81364686>] request_firmware+0x16/0x20
> >   [<ffffffffa01f0da0>] request_microcode_fw+0x70/0xf0 [microcode]
> >   [<ffffffffa01f0390>] microcode_init_cpu+0xc0/0x100 [microcode]
> >   [<ffffffffa01f14b4>] mc_cpu_callback+0x7c/0x11f [microcode]
> >   [<ffffffff815393a4>] notifier_call_chain+0x94/0xd0
> >   [<ffffffff8109770e>] __raw_notifier_call_chain+0xe/0x10
> >   [<ffffffff8106d000>] __cpu_notify+0x20/0x40
> >   [<ffffffff8152cf5b>] _cpu_up+0xc7/0x10e
> >   [<ffffffff8152d07b>] cpu_up+0xd9/0xec
> >   [<ffffffff8151e599>] store_online+0x99/0xd0
> >   [<ffffffff81355eb0>] sysdev_store+0x20/0x30
> >   [<ffffffff811f3096>] sysfs_write_file+0xe6/0x170
> >   [<ffffffff8117ee50>] vfs_write+0xd0/0x1a0
> >   [<ffffffff8117f024>] sys_write+0x54/0xa0
> >   [<ffffffff8153df02>] system_call_fastpath+0x16/0x1b
> 
> So, this task is trying to bring a CPU up, which triggers firmware
> helper to load microcode.  Firmware class currently sleeps
> non-interruptibly to wait for firmware load to complete, which is
> performed by another userland task.  Now, the PM freezer doesn't
> assume that there will be non-freezable wait dependencies among
> userland tasks.  It only knows two levels - userland and kernel tasks
> - and assumes that the former group may have non-freezable wait
> dependency on the latter but there's no such dependency among each
> group itself.  If there's such dependency, PM freezer may fail, which
> is what happened here.
> 
> ie. the firmware loader userland process got frozen first.
> invert_cpu_stat trying to bring up CPU was waiting for the firmware
> loader to finish in non-interruptible sleep, so the freezer couldn't
> proceed.

Hmmm... I went through the code again and usermodehelper_disable()
seems to be there to prevent deadlocks like this.  usermode helpers
are drained & plugged before freezing is tried.  Rafael, the above
shouldn't be happening, right?

Thanks.

-- 
tejun

^ permalink raw reply

* Re: [BUG] Soft-lockup during cpu-hotplug in VFS callpaths
From: Srivatsa S. Bhat @ 2011-09-06  5:03 UTC (permalink / raw)
  To: maciej.rutecki; +Cc: linux-fsdevel, linux-pm, linux-kernel
In-Reply-To: <201109052118.49293.maciej.rutecki@gmail.com>

On 09/06/2011 12:48 AM, Maciej Rutecki wrote:
> On poniedziałek, 5 września 2011 o 11:07:55 Srivatsa S. Bhat wrote:
>> On 09/01/2011 12:10 AM, Maciej Rutecki wrote:
>>> On środa, 24 sierpnia 2011 o 15:44:55 Srivatsa S. Bhat wrote:
>>>> Hi,
>>>>
>>>> While running stressful cpu hotplug tests along with kernel compilation
>>>> running in background, soft-lockups are detected on multiple CPUs.
>>>> Sometimes this also leads to hard lockups and kernel panic.
>>>> All the soft-lockups seem to occur at vfsmount_lock_local_cpu() or other
>>>> VFS callpaths.
>>>>
>>>>
>>>> [37108.410813] BUG: soft lockup - CPU#5 stuck for 22s! [cc1:29669]
>>>> <snip>
>>>> [37108.694781] Call Trace:
>>>> [37108.697306]  [<ffffffff81199e70>] ?
>>>> vfsmount_lock_local_lock_cpu+0x70/0x70 [37108.704258]
>>>> [<ffffffff81187cb5>] path_init+0x315/0x400
>>>> [37108.709558]  [<ffffffff8127c398>] ? __raw_spin_lock_init+0x38/0x70
>>>> [37108.715812]  [<ffffffff8118961c>] path_openat+0x8c/0x3f0
>>>> [37108.721203]  [<ffffffff81012129>] ? sched_clock+0x9/0x10
>>>> [37108.726597]  [<ffffffff8109416d>] ? sched_clock_cpu+0xcd/0x110
>>>> [37108.732508]  [<ffffffff810a178d>] ? trace_hardirqs_off+0xd/0x10
>>>> [37108.738498]  [<ffffffff8109421f>] ? local_clock+0x6f/0x80
>>>> [37108.743970]  [<ffffffff81189a99>] do_filp_open+0x49/0xa0
>>>> [37108.749362]  [<ffffffff811982f3>] ? alloc_fd+0xc3/0x210
>>>> [37108.754665]  [<ffffffff8152584b>] ? _raw_spin_unlock+0x2b/0x40
>>>> [37108.760575]  [<ffffffff811982f3>] ? alloc_fd+0xc3/0x210
>>>> [37108.765875]  [<ffffffff81179607>] do_sys_open+0x107/0x1e0
>>>> [37108.771352]  [<ffffffff810d610f>] ? audit_syscall_entry+0x1bf/0x1f0
>>>> [37108.777695]  [<ffffffff81179720>] sys_open+0x20/0x30
>>>> [37108.782741]  [<ffffffff8152e202>] system_call_fastpath+0x16/0x1b
>>>>
>>>> Kernel version: 3.0.1, 3.0.3
>>>> Hardware: Dual socket quad-core hyper-threaded Intel x86 machine
>>>> Scenario:
>>>> (a) Stressful cpu hotplug tests + kernel compilation
>>>>
>>>> (b) IRQ balancing had been disabled and all the IRQs  were made to be
>>>>
>>>>     routed to CPU 0 (except the ones that couldn't be routed).
>>>>
>>>> (c) Lockdep was enabled during kernel configuration.
>>>>
>>>> Steps (b) and (c) were done to dig deeper into the issue. However the
>>>> same issue was observed by just doing step (a).
>>>>
>>>> Definitely there seems to be a race condition occurring here, because
>>>> this issue is hit after sometime, after starting the tests. And the
>>>> time it takes to hit the issue increases as we increase the number of
>>>> debug print statements. In some cases (especially when the number of
>>>> debug print statements were quite high), the stress on the machine had
>>>> to be increased in order to hit the issue within measurable time. In my
>>>> tests, a maximum of about 2 to 2.5 hours was sufficient, to hit this
>>>> bug.
>>>>
>>>> Please find the console log attached with this mail.
>>>>
>>>> Any ideas on how to go about fixing this bug?
>>>
>>> It is a regression?
>>
>> Hi Maciej,
>>
>> Thank you for taking a look.
>> Yes, it seems to be a regression. I tested out kernel 2.6.39.3 with similar
>> test cases for quite a long time, and it did not hit any soft-lockup
>> issues.
> 
> Thanks for the answer. I create bug entry:
> https://bugzilla.kernel.org/show_bug.cgi?id=42402

Oh thank you. But I had created an entry myself in bugzilla, immediately after I posted on the
mailing list. (https://bugzilla.kernel.org/show_bug.cgi?id=42382)
I will however delete my entry since we don't want duplicates and moreover the 'Product'
and 'Component' fields in your entry seems more appropriate with respect to the bug.

Thanks again.

-- 
Regards,
Srivatsa S. Bhat  <srivatsa.bhat@linux.vnet.ibm.com>
Linux Technology Center,
IBM India Systems and Technology Lab

_______________________________________________
linux-pm mailing list
linux-pm@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/linux-pm

^ permalink raw reply

* Re: [PATCH 6/6] freezer: kill unused set_freezable_with_signal()
From: Tejun Heo @ 2011-09-06  3:28 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: containers, linux-kernel, linux-pm, paul
In-Reply-To: <20110905162012.GA4445@redhat.com>

Hello,

On Mon, Sep 05, 2011 at 06:20:12PM +0200, Oleg Nesterov wrote:
> Perhaps it is correct... Just I do not understand what it should do.
> I thought it is "wait_for_event && do_not_block_freezer". And at first
> glance the code looks as if it tries to do this. Say, in the "likely"
> case we restart wait_event_interruptible() after refrigerator().
> 
> But this looks racy. Suppose that freezing() is already false when
> try_to_freeze() or __refrigerator() is called. Say, cgroup_freezer does
> freeze_task() + __thaw_task(). Why it returns -ERESTARTSYS in this case?

It may return -ERESTARTSYS when not strictly necessary but given that
it's supposed to trigger restart anyway I don't think it's actually
broken (it doesn't break the contract of the wait).  Another thing to
note is that kthread freezing via cgroup is almost fundamentally
broken.  With the removal of freezable_with_signal, this shouldn't be
an issue anymore although cgroup_freezer still needs to be fixed to
account for kthreads for xstate transitions (it currently triggers
BUG_ON).

> And if it can be used by the userspace thread, then we should probably
> do recalc_sigpending() somewhere, otherwise wait_event_freezable() will
> always return -ERESTARTSYS after __refrigerator().

Thankfully, currently, all the few users are kthreads.  I don't think
it makes sense for userland tasks at all.  Interruptible sleeps for
userland tasks necessiate the ability to return to signal delivery
path and restart or abort the current operation.  There's no point in
using wait_event_freezable() instead of wait_event_interruptible().

Thanks.

-- 
tejun

^ permalink raw reply

* Re: [BUG] Soft-lockup during cpu-hotplug in VFS callpaths
From: Maciej Rutecki @ 2011-09-05 19:18 UTC (permalink / raw)
  To: Srivatsa S. Bhat; +Cc: linux-fsdevel, linux-pm, linux-kernel
In-Reply-To: <4E64916B.4090205@linux.vnet.ibm.com>

On poniedziałek, 5 września 2011 o 11:07:55 Srivatsa S. Bhat wrote:
> On 09/01/2011 12:10 AM, Maciej Rutecki wrote:
> > On środa, 24 sierpnia 2011 o 15:44:55 Srivatsa S. Bhat wrote:
> >> Hi,
> >> 
> >> While running stressful cpu hotplug tests along with kernel compilation
> >> running in background, soft-lockups are detected on multiple CPUs.
> >> Sometimes this also leads to hard lockups and kernel panic.
> >> All the soft-lockups seem to occur at vfsmount_lock_local_cpu() or other
> >> VFS callpaths.
> >> 
> >> 
> >> [37108.410813] BUG: soft lockup - CPU#5 stuck for 22s! [cc1:29669]
> >> <snip>
> >> [37108.694781] Call Trace:
> >> [37108.697306]  [<ffffffff81199e70>] ?
> >> vfsmount_lock_local_lock_cpu+0x70/0x70 [37108.704258]
> >> [<ffffffff81187cb5>] path_init+0x315/0x400
> >> [37108.709558]  [<ffffffff8127c398>] ? __raw_spin_lock_init+0x38/0x70
> >> [37108.715812]  [<ffffffff8118961c>] path_openat+0x8c/0x3f0
> >> [37108.721203]  [<ffffffff81012129>] ? sched_clock+0x9/0x10
> >> [37108.726597]  [<ffffffff8109416d>] ? sched_clock_cpu+0xcd/0x110
> >> [37108.732508]  [<ffffffff810a178d>] ? trace_hardirqs_off+0xd/0x10
> >> [37108.738498]  [<ffffffff8109421f>] ? local_clock+0x6f/0x80
> >> [37108.743970]  [<ffffffff81189a99>] do_filp_open+0x49/0xa0
> >> [37108.749362]  [<ffffffff811982f3>] ? alloc_fd+0xc3/0x210
> >> [37108.754665]  [<ffffffff8152584b>] ? _raw_spin_unlock+0x2b/0x40
> >> [37108.760575]  [<ffffffff811982f3>] ? alloc_fd+0xc3/0x210
> >> [37108.765875]  [<ffffffff81179607>] do_sys_open+0x107/0x1e0
> >> [37108.771352]  [<ffffffff810d610f>] ? audit_syscall_entry+0x1bf/0x1f0
> >> [37108.777695]  [<ffffffff81179720>] sys_open+0x20/0x30
> >> [37108.782741]  [<ffffffff8152e202>] system_call_fastpath+0x16/0x1b
> >> 
> >> Kernel version: 3.0.1, 3.0.3
> >> Hardware: Dual socket quad-core hyper-threaded Intel x86 machine
> >> Scenario:
> >> (a) Stressful cpu hotplug tests + kernel compilation
> >> 
> >> (b) IRQ balancing had been disabled and all the IRQs  were made to be
> >> 
> >>     routed to CPU 0 (except the ones that couldn't be routed).
> >> 
> >> (c) Lockdep was enabled during kernel configuration.
> >> 
> >> Steps (b) and (c) were done to dig deeper into the issue. However the
> >> same issue was observed by just doing step (a).
> >> 
> >> Definitely there seems to be a race condition occurring here, because
> >> this issue is hit after sometime, after starting the tests. And the
> >> time it takes to hit the issue increases as we increase the number of
> >> debug print statements. In some cases (especially when the number of
> >> debug print statements were quite high), the stress on the machine had
> >> to be increased in order to hit the issue within measurable time. In my
> >> tests, a maximum of about 2 to 2.5 hours was sufficient, to hit this
> >> bug.
> >> 
> >> Please find the console log attached with this mail.
> >> 
> >> Any ideas on how to go about fixing this bug?
> > 
> > It is a regression?
> 
> Hi Maciej,
> 
> Thank you for taking a look.
> Yes, it seems to be a regression. I tested out kernel 2.6.39.3 with similar
> test cases for quite a long time, and it did not hit any soft-lockup
> issues.

Thanks for the answer. I create bug entry:
https://bugzilla.kernel.org/show_bug.cgi?id=42402

Regards
-- 
Maciej Rutecki
http://www.maciek.unixy.pl
_______________________________________________
linux-pm mailing list
linux-pm@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/linux-pm

^ permalink raw reply

* Re: [PATCH 6/6] freezer: kill unused set_freezable_with_signal()
From: Oleg Nesterov @ 2011-09-05 16:21 UTC (permalink / raw)
  To: Tejun Heo; +Cc: containers, linux-kernel, linux-pm, paul
In-Reply-To: <20110905023505.GC9807@htj.dyndns.org>

On 09/05, Tejun Heo wrote:
>
> Ooh, can I add your Acked-by before sending pull request to Rafael?

Yes, thanks, please feel free.

Oleg.

^ permalink raw reply

* Re: [PATCH 6/6] freezer: kill unused set_freezable_with_signal()
From: Oleg Nesterov @ 2011-09-05 16:20 UTC (permalink / raw)
  To: Tejun Heo; +Cc: containers, linux-kernel, linux-pm, paul
In-Reply-To: <20110905023315.GB9807@htj.dyndns.org>

On 09/05, Tejun Heo wrote:
>
> Hello, Oleg.
>
> On Sun, Sep 04, 2011 at 08:46:26PM +0200, Oleg Nesterov wrote:
> > > @@ -72,10 +72,6 @@ bool __refrigerator(bool check_kthr_stop)
> > >  		schedule();
> > >  	}
> > >
> > > -	spin_lock_irq(&current->sighand->siglock);
> > > -	recalc_sigpending(); /* We sent fake signal, clean it up */
> > > -	spin_unlock_irq(&current->sighand->siglock);
> > > -
> >
> > Why? This recalc_sigpending() makes sense imho. Otherwise the user-space
> > tasks (almost) always return with TIF_SIGPENDING. May be we can do this
> > under "if (PF_KTRHREAD)".
>
> PF_KTHREAD no longer gets TIF_SIGPENDING set, so...

Yes,

> > For example. Suppose the user-space task does wait_event_freezable()...
> >
> > Hmm. OTOH, wait_event_freezable() looks wrong anyway... So probably
> > this doesn't matter. ptrace_stop/get_signal_to_deliver doesn't need
> > this, probably we do not care about the other callers.
>
> Can you elaborate on it being wrong?  Do you mean the possibility of
> leaking spurious TIF_SIGPENDING?

Perhaps it is correct... Just I do not understand what it should do.
I thought it is "wait_for_event && do_not_block_freezer". And at first
glance the code looks as if it tries to do this. Say, in the "likely"
case we restart wait_event_interruptible() after refrigerator().

But this looks racy. Suppose that freezing() is already false when
try_to_freeze() or __refrigerator() is called. Say, cgroup_freezer does
freeze_task() + __thaw_task(). Why it returns -ERESTARTSYS in this case?

And if it can be used by the userspace thread, then we should probably
do recalc_sigpending() somewhere, otherwise wait_event_freezable() will
always return -ERESTARTSYS after __refrigerator().

Oleg.

^ permalink raw reply

* Re: [PATCH 3/5] PM / QoS: Add function dev_pm_qos_read_value()
From: Rafael J. Wysocki @ 2011-09-05 15:30 UTC (permalink / raw)
  To: Jean Pihet; +Cc: Linux PM mailing list, LKML, Linux-sh list
In-Reply-To: <CAORVsuXtgd6hTvpkoHobSGKrXgMB1L8dSVbkkt071useNRdngg@mail.gmail.com>

On Monday, September 05, 2011, Jean Pihet wrote:
> Hi,
> 
> On Sat, Sep 3, 2011 at 10:02 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > Hi,
> >
> > On Saturday, September 03, 2011, Rafael J. Wysocki wrote:
> >> On Friday, September 02, 2011, Jean Pihet wrote:
> >> > On Fri, Sep 2, 2011 at 12:07 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > ...
> >> > >> >
> >> > >> > -       mutex_lock(&dev_pm_qos_mtx);
> >> > >> >        req->dev = dev;
> >> > >> >
> >> > >> > -       /* Return if the device has been removed */
> >> > >> > -       if (req->dev->power.constraints_state == DEV_PM_QOS_NO_DEVICE) {
> >> > >> > -               ret = -ENODEV;
> >> > >> > -               goto out;
> >> > >> > -       }
> >> > >> > +       device_pm_lock();
> >> > >> > +       mutex_lock(&dev_pm_qos_mtx);
> >> > >> >
> >> > >> > -       /*
> >> > >> > -        * Allocate the constraints data on the first call to add_request,
> >> > >> > -        * i.e. only if the data is not already allocated and if the device has
> >> > >> > -        * not been removed
> >> > >> > -        */
> >> > >> > -       if (dev->power.constraints_state == DEV_PM_QOS_DEVICE_PRESENT)
> >> > >> > -               ret = dev_pm_qos_constraints_allocate(dev);
> >> > >> > +       if (dev->power.constraints) {
> >> > >> > +               device_pm_unlock();
> >> > >> > +       } else {
> >> > >> > +               if (list_empty(&dev->power.entry)) {
> >> > >> > +                       /* The device has been removed from the system. */
> >> > >> > +                       device_pm_unlock();
> >> > >> > +                       goto out;
> >> > >> 0 is silently returned in case the device has been removed. Is that
> >> > >> the intention?
> >> > >
> >> > > Pretty much it is.  Is that a problem?
> >> > I think the caller needs to know if the constraint has been applied
> >> > correctly or not.
> >>
> >> However, the removal of the device is a special case.  What would the caller
> >> be supposed to do when it learned that the device had been removed while it had
> >> been trying to add a QoS constraing for it?  Not much I guess.
> >
> > Anyway, since returning an error code in that case won't hurt either,
> > below is a v2 of the patch doing that and resetting the dev field in the
> > req structure.
> Ok thanks for the update. The comments are inlined below.
> 

...
> >
> > -       if (req->dev->power.constraints_state == DEV_PM_QOS_ALLOCATED) {
> > +       if (req->dev->power.constraints) {
> >                if (new_value != req->node.prio)
> >                        ret = apply_constraint(req, PM_QOS_UPDATE_REQ,
> >                                               new_value);
> >        } else {
> >                /* Return if the device has been removed */
> > -               ret = -ENODEV;
> > +               ret = -EINVAL;
> The retcode should be -ENODEV.

Well, both -EINVAL and -ENODEV seem somewhat suitable here and I don't
have a strong preference, so I very well may leave the latter as it.

> Is the kerneldoc still OK?

It will be if the error code isn't changed.

Updated patch follows.

Thanks,
Rafael

---
From: Rafael J. Wysocki <rjw@sisk.pl>
Subject: PM / QoS: Add function dev_pm_qos_read_value() (v2)

To read the current PM QoS value for a given device we need to
make sure that the device's power.constraints object won't be
removed while we're doing that.  For this reason, put the
operation under dev->power.lock and acquire the lock
around the initialization and removal of power.constraints.

Moreover, since we're using the value of power.constraints to
determine whether or not the object is present, the
power.constraints_state field isn't necessary any more and may be
removed.  However, dev_pm_qos_add_request() needs to check if the
device is being removed from the system before allocating a new
PM QoS constraints object for it, so it has to use device_pm_lock()
and the device PM QoS initialization and destruction should be done
under device_pm_lock() as well.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 drivers/base/power/main.c |    4 -
 drivers/base/power/qos.c  |  168 ++++++++++++++++++++++++++--------------------
 include/linux/pm.h        |    8 --
 include/linux/pm_qos.h    |    3 
 4 files changed, 103 insertions(+), 80 deletions(-)

Index: linux/drivers/base/power/qos.c
===================================================================
--- linux.orig/drivers/base/power/qos.c
+++ linux/drivers/base/power/qos.c
@@ -30,15 +30,6 @@
  * . To minimize the data usage by the per-device constraints, the data struct
  *   is only allocated at the first call to dev_pm_qos_add_request.
  * . The data is later free'd when the device is removed from the system.
- * . The constraints_state variable from dev_pm_info tracks the data struct
- *    allocation state:
- *    DEV_PM_QOS_NO_DEVICE: No device present or device removed, no data
- *     allocated,
- *    DEV_PM_QOS_DEVICE_PRESENT: Device present, data not allocated and will be
- *     allocated at the first call to dev_pm_qos_add_request,
- *    DEV_PM_QOS_ALLOCATED: Device present, data allocated. The per-device
- *     PM QoS constraints framework is operational and constraints can be
- *     added, updated or removed using the dev_pm_qos_* API.
  *  . A global mutex protects the constraints users from the data being
  *     allocated and free'd.
  */
@@ -51,8 +42,30 @@
 
 
 static DEFINE_MUTEX(dev_pm_qos_mtx);
+
 static BLOCKING_NOTIFIER_HEAD(dev_pm_notifiers);
 
+/**
+ * dev_pm_qos_read_value - Get PM QoS constraint for a given device.
+ * @dev: Device to get the PM QoS constraint value for.
+ */
+s32 dev_pm_qos_read_value(struct device *dev)
+{
+	struct pm_qos_constraints *c;
+	unsigned long flags;
+	s32 ret = 0;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
+
+	c = dev->power.constraints;
+	if (c)
+		ret = pm_qos_read_value(c);
+
+	spin_unlock_irqrestore(&dev->power.lock, flags);
+
+	return ret;
+}
+
 /*
  * apply_constraint
  * @req: constraint request to apply
@@ -105,27 +118,37 @@ static int dev_pm_qos_constraints_alloca
 	}
 	BLOCKING_INIT_NOTIFIER_HEAD(n);
 
+	plist_head_init(&c->list);
+	c->target_value = PM_QOS_DEV_LAT_DEFAULT_VALUE;
+	c->default_value = PM_QOS_DEV_LAT_DEFAULT_VALUE;
+	c->type = PM_QOS_MIN;
+	c->notifiers = n;
+
+	spin_lock_irq(&dev->power.lock);
 	dev->power.constraints = c;
-	plist_head_init(&dev->power.constraints->list);
-	dev->power.constraints->target_value = PM_QOS_DEV_LAT_DEFAULT_VALUE;
-	dev->power.constraints->default_value =	PM_QOS_DEV_LAT_DEFAULT_VALUE;
-	dev->power.constraints->type = PM_QOS_MIN;
-	dev->power.constraints->notifiers = n;
-	dev->power.constraints_state = DEV_PM_QOS_ALLOCATED;
+	spin_unlock_irq(&dev->power.lock);
 
 	return 0;
 }
 
+static void __dev_pm_qos_constraints_init(struct device *dev)
+{
+	spin_lock_irq(&dev->power.lock);
+	dev->power.constraints = NULL;
+	spin_unlock_irq(&dev->power.lock);
+}
+
 /**
- * dev_pm_qos_constraints_init
+ * dev_pm_qos_constraints_init - Initalize device's PM QoS constraints pointer.
  * @dev: target device
  *
- * Called from the device PM subsystem at device insertion
+ * Called from the device PM subsystem at device insertion under
+ * device_pm_lock().
  */
 void dev_pm_qos_constraints_init(struct device *dev)
 {
 	mutex_lock(&dev_pm_qos_mtx);
-	dev->power.constraints_state = DEV_PM_QOS_DEVICE_PRESENT;
+	dev->power.constraints = NULL;
 	mutex_unlock(&dev_pm_qos_mtx);
 }
 
@@ -133,34 +156,35 @@ void dev_pm_qos_constraints_init(struct
  * dev_pm_qos_constraints_destroy
  * @dev: target device
  *
- * Called from the device PM subsystem at device removal
+ * Called from the device PM subsystem at device removal under device_pm_lock().
  */
 void dev_pm_qos_constraints_destroy(struct device *dev)
 {
 	struct dev_pm_qos_request *req, *tmp;
+	struct pm_qos_constraints *c;
 
 	mutex_lock(&dev_pm_qos_mtx);
 
-	if (dev->power.constraints_state == DEV_PM_QOS_ALLOCATED) {
-		/* Flush the constraints list for the device */
-		plist_for_each_entry_safe(req, tmp,
-					  &dev->power.constraints->list,
-					  node) {
-			/*
-			 * Update constraints list and call the notification
-			 * callbacks if needed
-			 */
-			apply_constraint(req, PM_QOS_REMOVE_REQ,
-					 PM_QOS_DEFAULT_VALUE);
-			memset(req, 0, sizeof(*req));
-		}
+	c = dev->power.constraints;
+	if (!c)
+		goto out;
 
-		kfree(dev->power.constraints->notifiers);
-		kfree(dev->power.constraints);
-		dev->power.constraints = NULL;
+	/* Flush the constraints list for the device */
+	plist_for_each_entry_safe(req, tmp, &c->list, node) {
+		/*
+		 * Update constraints list and call the notification
+		 * callbacks if needed
+		 */
+		apply_constraint(req, PM_QOS_REMOVE_REQ, PM_QOS_DEFAULT_VALUE);
+		memset(req, 0, sizeof(*req));
 	}
-	dev->power.constraints_state = DEV_PM_QOS_NO_DEVICE;
 
+	__dev_pm_qos_constraints_init(dev);
+
+	kfree(c->notifiers);
+	kfree(c);
+
+ out:
 	mutex_unlock(&dev_pm_qos_mtx);
 }
 
@@ -178,8 +202,9 @@ void dev_pm_qos_constraints_destroy(stru
  *
  * Returns 1 if the aggregated constraint value has changed,
  * 0 if the aggregated constraint value has not changed,
- * -EINVAL in case of wrong parameters, -ENODEV if the device has been
- * removed from the system
+ * -EINVAL in case of wrong parameters, -ENOMEM if there's not enough memory
+ * to allocate for data structures, -ENODEV if the device has just been removed
+ * from the system.
  */
 int dev_pm_qos_add_request(struct device *dev, struct dev_pm_qos_request *req,
 			   s32 value)
@@ -195,28 +220,37 @@ int dev_pm_qos_add_request(struct device
 		return -EINVAL;
 	}
 
-	mutex_lock(&dev_pm_qos_mtx);
 	req->dev = dev;
 
-	/* Return if the device has been removed */
-	if (req->dev->power.constraints_state == DEV_PM_QOS_NO_DEVICE) {
-		ret = -ENODEV;
-		goto out;
-	}
+	device_pm_lock();
+	mutex_lock(&dev_pm_qos_mtx);
 
-	/*
-	 * Allocate the constraints data on the first call to add_request,
-	 * i.e. only if the data is not already allocated and if the device has
-	 * not been removed
-	 */
-	if (dev->power.constraints_state == DEV_PM_QOS_DEVICE_PRESENT)
-		ret = dev_pm_qos_constraints_allocate(dev);
+	if (dev->power.constraints) {
+		device_pm_unlock();
+	} else {
+		if (list_empty(&dev->power.entry)) {
+			/* The device has been removed from the system. */
+			device_pm_unlock();
+			req->dev = NULL;
+			ret = -ENODEV;
+			goto out;
+		} else {
+			device_pm_unlock();
+			/*
+			 * Allocate the constraints data on the first call to
+			 * add_request, i.e. only if the data is not already
+			 * allocated and if the device has not been removed.
+			 */
+			ret = dev_pm_qos_constraints_allocate(dev);
+		}
+	}
 
 	if (!ret)
 		ret = apply_constraint(req, PM_QOS_ADD_REQ, value);
 
-out:
+ out:
 	mutex_unlock(&dev_pm_qos_mtx);
+
 	return ret;
 }
 EXPORT_SYMBOL_GPL(dev_pm_qos_add_request);
@@ -252,7 +286,7 @@ int dev_pm_qos_update_request(struct dev
 
 	mutex_lock(&dev_pm_qos_mtx);
 
-	if (req->dev->power.constraints_state == DEV_PM_QOS_ALLOCATED) {
+	if (req->dev->power.constraints) {
 		if (new_value != req->node.prio)
 			ret = apply_constraint(req, PM_QOS_UPDATE_REQ,
 					       new_value);
@@ -293,7 +327,7 @@ int dev_pm_qos_remove_request(struct dev
 
 	mutex_lock(&dev_pm_qos_mtx);
 
-	if (req->dev->power.constraints_state == DEV_PM_QOS_ALLOCATED) {
+	if (req->dev->power.constraints) {
 		ret = apply_constraint(req, PM_QOS_REMOVE_REQ,
 				       PM_QOS_DEFAULT_VALUE);
 		memset(req, 0, sizeof(*req));
@@ -323,15 +357,12 @@ int dev_pm_qos_add_notifier(struct devic
 
 	mutex_lock(&dev_pm_qos_mtx);
 
-	/* Silently return if the device has been removed */
-	if (dev->power.constraints_state != DEV_PM_QOS_ALLOCATED)
-		goto out;
-
-	retval = blocking_notifier_chain_register(
-			dev->power.constraints->notifiers,
-			notifier);
+	/* Silently return if the constraints object is not present. */
+	if (dev->power.constraints)
+		retval = blocking_notifier_chain_register(
+				dev->power.constraints->notifiers,
+				notifier);
 
-out:
 	mutex_unlock(&dev_pm_qos_mtx);
 	return retval;
 }
@@ -354,15 +385,12 @@ int dev_pm_qos_remove_notifier(struct de
 
 	mutex_lock(&dev_pm_qos_mtx);
 
-	/* Silently return if the device has been removed */
-	if (dev->power.constraints_state != DEV_PM_QOS_ALLOCATED)
-		goto out;
-
-	retval = blocking_notifier_chain_unregister(
-			dev->power.constraints->notifiers,
-			notifier);
+	/* Silently return if the constraints object is not present. */
+	if (dev->power.constraints)
+		retval = blocking_notifier_chain_unregister(
+				dev->power.constraints->notifiers,
+				notifier);
 
-out:
 	mutex_unlock(&dev_pm_qos_mtx);
 	return retval;
 }
Index: linux/include/linux/pm_qos.h
===================================================================
--- linux.orig/include/linux/pm_qos.h
+++ linux/include/linux/pm_qos.h
@@ -77,6 +77,7 @@ int pm_qos_remove_notifier(int pm_qos_cl
 int pm_qos_request_active(struct pm_qos_request *req);
 s32 pm_qos_read_value(struct pm_qos_constraints *c);
 
+s32 dev_pm_qos_read_value(struct device *dev);
 int dev_pm_qos_add_request(struct device *dev, struct dev_pm_qos_request *req,
 			   s32 value);
 int dev_pm_qos_update_request(struct dev_pm_qos_request *req, s32 new_value);
@@ -117,6 +118,8 @@ static inline int pm_qos_request_active(
 static inline s32 pm_qos_read_value(struct pm_qos_constraints *c)
 			{ return 0; }
 
+static inline s32 dev_pm_qos_read_value(struct device *dev)
+			{ return 0; }
 static inline int dev_pm_qos_add_request(struct device *dev,
 					 struct dev_pm_qos_request *req,
 					 s32 value)
Index: linux/drivers/base/power/main.c
===================================================================
--- linux.orig/drivers/base/power/main.c
+++ linux/drivers/base/power/main.c
@@ -98,8 +98,8 @@ void device_pm_add(struct device *dev)
 		dev_warn(dev, "parent %s should not be sleeping\n",
 			dev_name(dev->parent));
 	list_add_tail(&dev->power.entry, &dpm_list);
-	mutex_unlock(&dpm_list_mtx);
 	dev_pm_qos_constraints_init(dev);
+	mutex_unlock(&dpm_list_mtx);
 }
 
 /**
@@ -110,9 +110,9 @@ void device_pm_remove(struct device *dev
 {
 	pr_debug("PM: Removing info for %s:%s\n",
 		 dev->bus ? dev->bus->name : "No Bus", dev_name(dev));
-	dev_pm_qos_constraints_destroy(dev);
 	complete_all(&dev->power.completion);
 	mutex_lock(&dpm_list_mtx);
+	dev_pm_qos_constraints_destroy(dev);
 	list_del_init(&dev->power.entry);
 	mutex_unlock(&dpm_list_mtx);
 	device_wakeup_disable(dev);
Index: linux/include/linux/pm.h
===================================================================
--- linux.orig/include/linux/pm.h
+++ linux/include/linux/pm.h
@@ -421,13 +421,6 @@ enum rpm_request {
 	RPM_REQ_RESUME,
 };
 
-/* Per-device PM QoS constraints data struct state */
-enum dev_pm_qos_state {
-	DEV_PM_QOS_NO_DEVICE,		/* No device present */
-	DEV_PM_QOS_DEVICE_PRESENT,	/* Device present, data not allocated */
-	DEV_PM_QOS_ALLOCATED,		/* Device present, data allocated */
-};
-
 struct wakeup_source;
 
 struct pm_domain_data {
@@ -489,7 +482,6 @@ struct dev_pm_info {
 #endif
 	struct pm_subsys_data	*subsys_data;  /* Owned by the subsystem. */
 	struct pm_qos_constraints *constraints;
-	enum dev_pm_qos_state	constraints_state;
 };
 
 extern void update_pm_runtime_accounting(struct device *dev);

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox