Linux Power Management development
 help / color / mirror / Atom feed
* Re: [PATCH 3/3] PM: Introduce Intel PowerClamp Driver
From: Jacob Pan @ 2012-11-14  1:14 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: paulmck, Linux PM, LKML, Rafael Wysocki, Len Brown,
	Thomas Gleixner, H. Peter Anvin, Ingo Molnar, Zhang Rui,
	Rob Landley
In-Reply-To: <50A2E116.8000400@linux.intel.com>

On Tue, 13 Nov 2012 16:08:54 -0800
Arjan van de Ven <arjan@linux.intel.com> wrote:

> > I think I know, but I feel the need to ask anyway.  Why not tell
> > RCU about the clamping?  
> 
> I don't mind telling RCU, but what cannot happen is a bunch of CPU
> time suddenly getting used (since that is the opposite of what is
> needed at the specific point in time of going idle)
Another reason is my observation that there are some assumptions/checks
to make sure only idle thread can tell rcu it is idle. Is it ok to
extend that to other kthreads?

-- 
Thanks,

Jacob

^ permalink raw reply

* Re: [PATCH 3/3] PM: Introduce Intel PowerClamp Driver
From: Jacob Pan @ 2012-11-14  1:24 UTC (permalink / raw)
  To: paulmck
  Cc: Arjan van de Ven, Linux PM, LKML, Rafael Wysocki, Len Brown,
	Thomas Gleixner, H. Peter Anvin, Ingo Molnar, Zhang Rui,
	Rob Landley
In-Reply-To: <20121114000259.GK2489@linux.vnet.ibm.com>

On Tue, 13 Nov 2012 16:03:00 -0800
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> Just to make sure I am really understanding what is happening, let's
> suppose we have a HZ=1000 system that has a few tasks that
> occasionally run at prio 99.  These tasks would run during the clamp
> interval, but would (for example) see the jiffies counter remaining
> at the value at the beginning of the clamp interval until the end of
> that interval, when the jiffies counter would suddenly jump by roughly
> six counts, right?

Yes, if there is no interrupts disturb the clamping duration.
We do not mask interrupts which will update jiffies.

-- 
Thanks,

Jacob

^ permalink raw reply

* Re: [PATCH v9 03/10] ata: zpodd: identify and init ZPODD devices
From: Aaron Lu @ 2012-11-14  1:32 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Jeff Garzik, James Bottomley, Rafael J. Wysocki, Alan Stern,
	Jeff Wu, Aaron Lu, linux-ide, linux-pm, linux-scsi, linux-acpi
In-Reply-To: <20121112185301.GB5560@mtj.dyndns.org>

On 11/13/2012 02:53 AM, Tejun Heo wrote:
> Hello,
> 
> On Fri, Nov 09, 2012 at 02:51:55PM +0800, Aaron Lu wrote:
>>  void ata_acpi_unbind(struct ata_device *dev)
>>  {
>> +	if (zpodd_dev_enabled(dev))
>> +		zpodd_deinit(dev);
> 
> Maybe zpodd_exit() instead?

OK.

> 
>> +void zpodd_init(struct ata_device *dev)
>> +{
>> +	int ret;
>> +	struct zpodd *zpodd;
>> +
>> +	if (dev->private_data)
>> +		return;
>> +
>> +	if (!device_can_poweroff(dev))
>> +		return;
>> +
>> +	if ((ret = check_loading_mechanism(dev)) == -ENODEV)
>> +		return;
>> +
>> +	zpodd = kzalloc(sizeof(struct zpodd), GFP_KERNEL);
>> +	if (!zpodd)
>> +		return;
>> +
>> +	if (ret)
>> +		zpodd->drawer = true;
>> +	else
>> +		zpodd->slot = true;
>> +
>> +	zpodd->dev = dev;
>> +	dev->private_data = zpodd;
> 
> I don't think you're supposed to use dev->private_data from libata
> core layer.  Just add a new field.  Nobody cares about adding 8 more
> bytes to struct ata_device and spending 8 more bytes is way better
> than muddying the ownership of ->private_data.

OK.
And just out of curiosity, who's supposed to use device's private_data?
I didn't find any user for ata_device's private_data in libata.

> 
>> +/* libata-zpodd.c */
>> +#ifdef CONFIG_SATA_ZPODD
>> +void zpodd_init(struct ata_device *dev);
>> +void zpodd_deinit(struct ata_device *dev);
>> +static inline bool zpodd_dev_enabled(struct ata_device *dev)
>> +{
>> +	if (dev->flags & ATA_DFLAG_DA && dev->private_data)
>> +		return true;
>> +	else
>> +		return false;
>> +}
> 
> And this gets completely wrong.  What if the device supports DA and
> low level driver makes use of ->private_data?

I didn't find any user of ata_device's private_data, so I used it for
ZPODD. But if this is dangerous, I'll use a new field.

Thanks,
Aaron


^ permalink raw reply

* Re: [PATCH 3/3] PM: Introduce Intel PowerClamp Driver
From: Paul E. McKenney @ 2012-11-14  1:34 UTC (permalink / raw)
  To: Jacob Pan
  Cc: Arjan van de Ven, Linux PM, LKML, Rafael Wysocki, Len Brown,
	Thomas Gleixner, H. Peter Anvin, Ingo Molnar, Zhang Rui,
	Rob Landley
In-Reply-To: <20121113171450.3657290c@chromoly>

On Tue, Nov 13, 2012 at 05:14:50PM -0800, Jacob Pan wrote:
> On Tue, 13 Nov 2012 16:08:54 -0800
> Arjan van de Ven <arjan@linux.intel.com> wrote:
> 
> > > I think I know, but I feel the need to ask anyway.  Why not tell
> > > RCU about the clamping?  
> > 
> > I don't mind telling RCU, but what cannot happen is a bunch of CPU
> > time suddenly getting used (since that is the opposite of what is
> > needed at the specific point in time of going idle)

Another round of RCU_FAST_NO_HZ rework, you are asking for?  ;-)

> Another reason is my observation that there are some assumptions/checks
> to make sure only idle thread can tell rcu it is idle. Is it ok to
> extend that to other kthreads?

If you are only having the system take 6-millisecond "vacations", probably
best to try it as it is and fix specific problems if/when they arise.

							Thanx, Paul


^ permalink raw reply

* Re: [PATCH v9 04/10] libata: acpi: move acpi notification code to zpodd
From: Aaron Lu @ 2012-11-14  1:36 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Jeff Garzik, James Bottomley, Rafael J. Wysocki, Alan Stern,
	Jeff Wu, Aaron Lu, linux-ide, linux-pm, linux-scsi, linux-acpi
In-Reply-To: <20121112185544.GC5560@mtj.dyndns.org>

On 11/13/2012 02:55 AM, Tejun Heo wrote:
> On Fri, Nov 09, 2012 at 02:51:56PM +0800, Aaron Lu wrote:
>> Since the ata acpi notification code introduced in commit
>> 3bd46600a7a7e938c54df8cdbac9910668c7dfb0 is solely for ZPODD, and we
>> now have a dedicated place for it, move these code there.
>>
>> And the add/remove_pm_notifier code is simplified a little bit that it
>> does not check things like if the handle is NULL and if a corresponding
>> acpi_device is there for the handle as they are guaranteed by the
>> device_can_poweroff already.
> 
> Please don't mix code movement with actual changes.  It makes it
> difficult to track what's going on.

Oh, yes.
But since add_pm_notifier code happens during ZPODD init time, and init
now happens during first time probe instead of after SCSI device has
been created, some changes are necessary when moving these code.

Sorry for not describing these things clear, I'll update the changelog
to reflect this next time.

Thanks,
Aaron


^ permalink raw reply

* Re: [PATCH v9 06/10] ata: zpodd: check zero power ready status
From: Aaron Lu @ 2012-11-14  2:18 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Jeff Garzik, James Bottomley, Rafael J. Wysocki, Alan Stern,
	Jeff Wu, Aaron Lu, linux-ide, linux-pm, linux-scsi, linux-acpi
In-Reply-To: <20121112191303.GE5560@mtj.dyndns.org>

On 11/13/2012 03:13 AM, Tejun Heo wrote:
> Hello,
> 
> On Fri, Nov 09, 2012 at 02:51:58PM +0800, Aaron Lu wrote:
>> +#define POWEROFF_DELAY  (30 * 1000)     /* 30 seconds for power off delay */
>> +
>>  struct zpodd {
>>  	bool slot:1;
>>  	bool drawer:1;
>>  	bool from_notify:1;	/* resumed as a result of acpi notification */
>> +	bool status_ready:1;	/* ready status derived from media event poll,
>> +				   it is not accurate, but serves as a hint */
>> +	bool zp_ready:1;	/* zero power ready state */
>> +
>> +	unsigned long last_ready; /* last zero power ready timestamp */
>>  
>>  	struct ata_device *dev;
>>  };
> 
> How are accesses to the bit fields synchronized?

They are synchronized by PM core.
PM core ensures that no two suspend or resume callback run concurrently.
And when ODD is executing a command, it is in active state, no PM
callback will run.

> 
>> +/*
>> + * Snoop the result of GET_STATUS_NOTIFICATION_EVENT, the media
>> + * status byte has information on media present/door closed.
>> + *
>> + * This information serves only as a hint, as it is not accurate.
>> + * The sense code method will be used when deciding if the ODD is
>> + * really zero power ready.
>> + */
> 
> It would be great if you can make the above a proper dockbook function
> comment.  Also, as the snooping for hint thing isn't too obvious it
> would be great if the comment contains more info which is explained in
> the commit message.

OK.

> 
>> +/*
>> + * Check ODD's zero power ready status.
>> + *
>> + * This function is called during ATA port's suspend path,
>> + * when the port is not frozen yet, so that we can still make
>> + * some IO to the ODD to decide if it is zero power ready.
>> + *
>> + * The ODD is regarded as zero power ready when it is in zero
>> + * power ready state for some time(defined by POWEROFF_DELAY).
>> + */
>> +void zpodd_check_zpready(struct ata_device *dev)
>> +{
>> +	bool zp_ready;
>> +	unsigned long expires;
>> +	struct zpodd *zpodd = dev->private_data;
>> +
>> +	if (!zpodd->status_ready) {
>> +		zpodd->last_ready = 0;
>> +		return;
>> +	}
>> +
>> +	if (!zpodd->last_ready) {
>> +		zp_ready = zpready(dev);
>> +		if (zp_ready)
>> +			zpodd->last_ready = jiffies;
>> +		return;
>> +	}
>> +
>> +	expires = zpodd->last_ready + msecs_to_jiffies(POWEROFF_DELAY);
>> +	if (time_before(jiffies, expires))
>> +		return;
>> +
>> +	zpodd->zp_ready = zpready(dev);
>> +	if (!zpodd->zp_ready)
>> +		zpodd->last_ready = 0;
>> +}
> 
> Hmmm... so, the "full" check only happens when autopm kicks in, right?
> Is it really worth avoiding an extra TUR on autopm events?  That's not
> really a hot path.  It seems a bit over-engineered to me.

A little more information about this:
When there is disc inside and no program mounted the drive, the ODD will
be runtime suspended/resumed every 2 seconds along with the event poll.

I'm not sure if the above situation can happen often. Normal desktop
environment should automatically mount the ODD once they got uevent, and
for console users, they will probably manually mount the drive after
they have inserted a disc. The way I did it this way is to deal with the
worst possible case. But if this is deemed as not necessary, I think I
can remove the snoop hint thing and use TUR directly.


Thanks,
Aaron


^ permalink raw reply

* [PATCH v10 0/14] x86: Arbitrary CPU hot(un)plug support
From: Fenghua Yu @ 2012-11-13 19:32 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H Peter Anvin, Linus Torvalds,
	Andrew Morton, Asit K Mallick, Tony Luck, Arjan Dan De Ven,
	Suresh B Siddha, Len Brown, Srivatssa S. Bhat, Randy Dunlap,
	Rafael J. Wysocki, Chen Gong, linux-kernel, linux-pm, x86
  Cc: Fenghua Yu

From: Fenghua Yu <fenghua.yu@intel.com>

CPU0 or BSP (Bootstrap Processor) has been the last processor that can not be
hot removed on x86. This patchset implements CPU0 or BSP online and offline
and removes this obstacle to CPU hotplug.

RAS needs the feature. If socket0 needs to be hotplugged for any reason (any
thread on socket0 is bad, shared cache issue, uncore issue, etc), CPU0 is
required to be offline or hot replaced to keep the system run. For example,
starting with Core Duo, if you have a system that is reporting cache problems
via the "yellow" status in the MCi_STATUS msr, then there is benefit in simply
soft off-lining the cores that share that cache - assuming that leaves you at
least one online core. A single socket system with L3 cache troubles is not
helped - but problems in L1/L2 cache, or on multi-socket systems can be avoided.
They are already being avoided for the cases where CPU0 is not involved.
This patchset can help L1/L2 cache problem in CPU0 or L3 cache problem on the
socket with CPU0 in a multi-socket system.

v10: Use first online CPU to handle retriggered irq in ioapic. Correct
cpu0_logical_apicid for x2apic. Add wakeup_cpu_via_init_nmi() to make code more
readable. Add ENDPROC(start_cpu0). Add comment for pm_notifier priority in
cpu_hotplug_pm_callback.

v9: Add Intel vendor check to support the feature on Intel platforms only.

v8: Add smp_store_boot_cpu_info() and change smp_store_cpu_info() back to avoid
a compilation error. Fix a few messy subject issues.

v7: Change smp_store_cpu_info() to allow store cpuinfo for CPU0 when online.
Change PIC irq detection in native_cpu_disable().

v6: If CPU0 is offlined during boot time in CPU0 hotplug debug mode, put CPU0
online again before resuming from hibernation and disable x2apic and xapic.Don't
set __CPUINIT for start_cpu0() in head_32.S. Clean up CPU0 wake up nmi handler
after callin and callout sync. In a period (3 seconds), check if CPU0 wake up
NMI is handled after offlined CPU0 exits from mwait.

v5: Add CONFIG_BOOTPARAM_HOTPLUG_CPU0 and CONFIG_DEBUG_HOTPLUG_CPU0. Simplify
duplicate xstate_size init check. Wakeup CPU0 via nmi instead INITs. Add
mcheck_cpu_init when CPU0 online. Change variable bsp_hotpluggable to
cpu0_hotpluggable with __initdata qualifier.

v4: Add __read_mostly for internal bsp_hotpluggable variable. Add my email
address in cpu-hotplug.txt document. A wording change in comment.

v3: Register a pm notifier to check if CPU0 is online before hibernate/suspend.
Small wording changes in document and print info.

v2: Add locking changes between cpu hotplug and hibernate/suspend. Change PIC
irq bound to CPU0 detection.

Fenghua Yu (14):
  doc: Add x86 CPU0 online/offline feature
  x86, Kconfig: Add config switch for CPU0 hotplug
  x86, topology: Don't offline CPU0 if any PIC irq can not be migrated
    out of it
  x86, hotplug: Support functions for CPU0 online/offline
  x86, hotplug, suspend: Online CPU0 for suspend or hibernate
  kernel/cpu.c: Add comment for priority in cpu_hotplug_pm_callback
  x86-64, hotplug: Add start_cpu0() entry point to head_64.S
  x86-32, hotplug: Add start_cpu0() entry point to head_32.S
  x86, hotplug: Wake up CPU0 via NMI instead of INIT, SIPI, SIPI
  x86, hotplug: During CPU0 online, enable x2apic, set_numa_node.
  x86, hotplug: The first online processor saves the MTRR state
  x86, hotplug: Handle retrigger irq by the first available CPU
  x86/i387.c: Initialize thread xstate only on CPU0 only once
  x86, topology: Debug CPU00 hotplug

 Documentation/cpu-hotplug.txt       |   24 ++++++
 Documentation/kernel-parameters.txt |   14 +++
 arch/x86/Kconfig                    |   44 ++++++++++
 arch/x86/include/asm/cpu.h          |    4 +
 arch/x86/include/asm/smp.h          |    1 +
 arch/x86/kernel/apic/io_apic.c      |    4 +-
 arch/x86/kernel/cpu/common.c        |    5 +-
 arch/x86/kernel/cpu/mtrr/main.c     |    9 ++-
 arch/x86/kernel/head_32.S           |   13 +++
 arch/x86/kernel/head_64.S           |   16 ++++
 arch/x86/kernel/i387.c              |    6 +-
 arch/x86/kernel/smpboot.c           |  151 ++++++++++++++++++++++++++++------
 arch/x86/kernel/topology.c          |  101 ++++++++++++++++++++++--
 arch/x86/power/cpu.c                |   82 +++++++++++++++++++
 kernel/cpu.c                        |    5 +
 15 files changed, 438 insertions(+), 41 deletions(-)

-- 
1.7.2


^ permalink raw reply

* [PATCH v10 04/14] x86, hotplug: Support functions for CPU0 online/offline
From: Fenghua Yu @ 2012-11-13 19:32 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H Peter Anvin, Linus Torvalds,
	Andrew Morton, Asit K Mallick, Tony Luck, Arjan Dan De Ven,
	Suresh B Siddha, Len Brown, Srivatssa S. Bhat, Randy Dunlap,
	Rafael J. Wysocki, Chen Gong, linux-kernel, linux-pm, x86
  Cc: Fenghua Yu
In-Reply-To: <1352835171-3958-1-git-send-email-fenghua.yu@intel.com>

From: Fenghua Yu <fenghua.yu@intel.com>

Add smp_store_boot_cpu_info() to store cpu info for BSP during boot time.

Now smp_store_cpu_info() stores cpu info for bringing up BSP or AP after
it's offline.

Continue to online CPU0 in native_cpu_up().

Continue to offline CPU0 in native_cpu_disable().

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/include/asm/smp.h |    1 +
 arch/x86/kernel/smpboot.c  |   38 ++++++++++++++++++--------------------
 2 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
index 4f19a15..b073aae 100644
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -166,6 +166,7 @@ void native_send_call_func_ipi(const struct cpumask *mask);
 void native_send_call_func_single_ipi(int cpu);
 void x86_idle_thread_init(unsigned int cpu, struct task_struct *idle);
 
+void smp_store_boot_cpu_info(void);
 void smp_store_cpu_info(int id);
 #define cpu_physical_id(cpu)	per_cpu(x86_cpu_to_apicid, cpu)
 
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index c80a33b..c297907 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -125,8 +125,8 @@ EXPORT_PER_CPU_SYMBOL(cpu_info);
 atomic_t init_deasserted;
 
 /*
- * Report back to the Boot Processor.
- * Running on AP.
+ * Report back to the Boot Processor during boot time or to the caller processor
+ * during CPU online.
  */
 static void __cpuinit smp_callin(void)
 {
@@ -279,19 +279,30 @@ notrace static void __cpuinit start_secondary(void *unused)
 	cpu_idle();
 }
 
+void __init smp_store_boot_cpu_info(void)
+{
+	int id = 0; /* CPU 0 */
+	struct cpuinfo_x86 *c = &cpu_data(id);
+
+	*c = boot_cpu_data;
+	c->cpu_index = id;
+}
+
 /*
  * The bootstrap kernel entry code has set these up. Save them for
  * a given CPU
  */
-
 void __cpuinit smp_store_cpu_info(int id)
 {
 	struct cpuinfo_x86 *c = &cpu_data(id);
 
 	*c = boot_cpu_data;
 	c->cpu_index = id;
-	if (id != 0)
-		identify_secondary_cpu(c);
+	/*
+	 * During boot time, CPU0 has this setup already. Save the info when
+	 * bringing up AP or offlined CPU0.
+	 */
+	identify_secondary_cpu(c);
 }
 
 static bool __cpuinit
@@ -795,7 +806,7 @@ int __cpuinit native_cpu_up(unsigned int cpu, struct task_struct *tidle)
 
 	pr_debug("++++++++++++++++++++=_---CPU UP  %u\n", cpu);
 
-	if (apicid == BAD_APICID || apicid == boot_cpu_physical_apicid ||
+	if (apicid == BAD_APICID ||
 	    !physid_isset(apicid, phys_cpu_present_map) ||
 	    !apic->apic_id_valid(apicid)) {
 		pr_err("%s: bad cpu %d\n", __func__, cpu);
@@ -990,7 +1001,7 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
 	/*
 	 * Setup boot CPU information
 	 */
-	smp_store_cpu_info(0); /* Final full version of the data */
+	smp_store_boot_cpu_info(); /* Final full version of the data */
 	cpumask_copy(cpu_callin_mask, cpumask_of(0));
 	mb();
 
@@ -1214,19 +1225,6 @@ void cpu_disable_common(void)
 
 int native_cpu_disable(void)
 {
-	int cpu = smp_processor_id();
-
-	/*
-	 * Perhaps use cpufreq to drop frequency, but that could go
-	 * into generic code.
-	 *
-	 * We won't take down the boot processor on i386 due to some
-	 * interrupts only being able to be serviced by the BSP.
-	 * Especially so if we're not using an IOAPIC	-zwane
-	 */
-	if (cpu == 0)
-		return -EBUSY;
-
 	clear_local_APIC();
 
 	cpu_disable_common();
-- 
1.7.2


^ permalink raw reply related

* [PATCH v10 05/14] x86, hotplug, suspend: Online CPU0 for suspend or hibernate
From: Fenghua Yu @ 2012-11-13 19:32 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H Peter Anvin, Linus Torvalds,
	Andrew Morton, Asit K Mallick, Tony Luck, Arjan Dan De Ven,
	Suresh B Siddha, Len Brown, Srivatssa S. Bhat, Randy Dunlap,
	Rafael J. Wysocki, Chen Gong, linux-kernel, linux-pm, x86
  Cc: Fenghua Yu
In-Reply-To: <1352835171-3958-1-git-send-email-fenghua.yu@intel.com>

From: Fenghua Yu <fenghua.yu@intel.com>

Because x86 BIOS requires CPU0 to resume from sleep, suspend or hibernate can't
be executed if CPU0 is detected offline. To make suspend or hibernate and
further resume succeed, CPU0 must be online.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/power/cpu.c |   44 ++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 44 insertions(+), 0 deletions(-)

diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index 218cdb1..adde775 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -237,3 +237,47 @@ void restore_processor_state(void)
 #ifdef CONFIG_X86_32
 EXPORT_SYMBOL(restore_processor_state);
 #endif
+
+/*
+ * When bsp_check() is called in hibernate and suspend, cpu hotplug
+ * is disabled already. So it's unnessary to handle race condition between
+ * cpumask query and cpu hotplug.
+ */
+static int bsp_check(void)
+{
+	if (cpumask_first(cpu_online_mask) != 0) {
+		pr_warn("CPU0 is offline.\n");
+		return -ENODEV;
+	}
+
+	return 0;
+}
+
+static int bsp_pm_callback(struct notifier_block *nb, unsigned long action,
+			   void *ptr)
+{
+	int ret = 0;
+
+	switch (action) {
+	case PM_SUSPEND_PREPARE:
+	case PM_HIBERNATION_PREPARE:
+		ret = bsp_check();
+		break;
+	default:
+		break;
+	}
+	return notifier_from_errno(ret);
+}
+
+static int __init bsp_pm_check_init(void)
+{
+	/*
+	 * Set this bsp_pm_callback as lower priority than
+	 * cpu_hotplug_pm_callback. So cpu_hotplug_pm_callback will be called
+	 * earlier to disable cpu hotplug before bsp online check.
+	 */
+	pm_notifier(bsp_pm_callback, -INT_MAX);
+	return 0;
+}
+
+core_initcall(bsp_pm_check_init);
-- 
1.7.2


^ permalink raw reply related

* [PATCH v10 09/14] x86, hotplug: Wake up CPU0 via NMI instead of INIT, SIPI, SIPI
From: Fenghua Yu @ 2012-11-13 19:32 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, H Peter Anvin, Linus Torvalds,
	Andrew Morton, Asit K Mallick, Tony Luck, Arjan Dan De Ven,
	Suresh B Siddha, Len Brown, Srivatssa S. Bhat, Randy Dunlap,
	Rafael J. Wysocki, Chen Gong, linux-kernel, linux-pm, x86
  Cc: Fenghua Yu
In-Reply-To: <1352835171-3958-1-git-send-email-fenghua.yu@intel.com>

From: Fenghua Yu <fenghua.yu@intel.com>

Instead of waiting for STARTUP after INITs, BSP will execute the BIOS boot-strap
code which is not a desired behavior for waking up BSP. To avoid the boot-strap
code, wake up CPU0 by NMI instead.

This works to wake up soft offlined CPU0 only. If CPU0 is hard offlined (i.e.
physically hot removed and then hot added), NMI won't wake it up. We'll change
this code in the future to wake up hard offlined CPU0 if real platform and
request are available.

AP is still waken up as before by INIT, SIPI, SIPI sequence.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/include/asm/cpu.h |    1 +
 arch/x86/kernel/smpboot.c  |  113 +++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 107 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index 4564c8e..a119572 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -28,6 +28,7 @@ struct x86_cpu {
 #ifdef CONFIG_HOTPLUG_CPU
 extern int arch_register_cpu(int num);
 extern void arch_unregister_cpu(int);
+extern void __cpuinit start_cpu0(void);
 #endif
 
 DECLARE_PER_CPU(int, cpu_state);
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index c297907..31854bc 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -138,15 +138,17 @@ static void __cpuinit smp_callin(void)
 	 * we may get here before an INIT-deassert IPI reaches
 	 * our local APIC.  We have to wait for the IPI or we'll
 	 * lock up on an APIC access.
+	 *
+	 * Since CPU0 is not wakened up by INIT, it doesn't wait for the IPI.
 	 */
-	if (apic->wait_for_init_deassert)
+	cpuid = smp_processor_id();
+	if (apic->wait_for_init_deassert && cpuid != 0)
 		apic->wait_for_init_deassert(&init_deasserted);
 
 	/*
 	 * (This works even if the APIC is not enabled.)
 	 */
 	phys_id = read_apic_id();
-	cpuid = smp_processor_id();
 	if (cpumask_test_cpu(cpuid, cpu_callin_mask)) {
 		panic("%s: phys CPU#%d, CPU#%d already present??\n", __func__,
 					phys_id, cpuid);
@@ -228,6 +230,8 @@ static void __cpuinit smp_callin(void)
 	cpumask_set_cpu(cpuid, cpu_callin_mask);
 }
 
+static int cpu0_logical_apicid;
+static int enable_start_cpu0;
 /*
  * Activate a secondary processor.
  */
@@ -243,6 +247,8 @@ notrace static void __cpuinit start_secondary(void *unused)
 	preempt_disable();
 	smp_callin();
 
+	enable_start_cpu0 = 0;
+
 #ifdef CONFIG_X86_32
 	/* switch away from the initial page table */
 	load_cr3(swapper_pg_dir);
@@ -492,7 +498,7 @@ void __inquire_remote_apic(int apicid)
  * won't ... remember to clear down the APIC, etc later.
  */
 int __cpuinit
-wakeup_secondary_cpu_via_nmi(int logical_apicid, unsigned long start_eip)
+wakeup_secondary_cpu_via_nmi(int apicid, unsigned long start_eip)
 {
 	unsigned long send_status, accept_status = 0;
 	int maxlvt;
@@ -500,7 +506,7 @@ wakeup_secondary_cpu_via_nmi(int logical_apicid, unsigned long start_eip)
 	/* Target chip */
 	/* Boot on the stack */
 	/* Kick the second */
-	apic_icr_write(APIC_DM_NMI | apic->dest_logical, logical_apicid);
+	apic_icr_write(APIC_DM_NMI | apic->dest_logical, apicid);
 
 	pr_debug("Waiting for send to finish...\n");
 	send_status = safe_apic_wait_icr_idle();
@@ -660,6 +666,63 @@ static void __cpuinit announce_cpu(int cpu, int apicid)
 			node, cpu, apicid);
 }
 
+static int wakeup_cpu0_nmi(unsigned int cmd, struct pt_regs *regs)
+{
+	int cpu;
+
+	cpu = smp_processor_id();
+	if (cpu == 0 && !cpu_online(cpu) && enable_start_cpu0)
+		return NMI_HANDLED;
+
+	return NMI_DONE;
+}
+
+/*
+ * Wake up AP by INIT, INIT, STARTUP sequence.
+ *
+ * Instead of waiting for STARTUP after INITs, BSP will execute the BIOS
+ * boot-strap code which is not a desired behavior for waking up BSP. To
+ * void the boot-strap code, wake up CPU0 by NMI instead.
+ *
+ * This works to wake up soft offlined CPU0 only. If CPU0 is hard offlined
+ * (i.e. physically hot removed and then hot added), NMI won't wake it up.
+ * We'll change this code in the future to wake up hard offlined CPU0 if
+ * real platform and request are available.
+ */
+static int __cpuinit
+wakeup_cpu_via_init_nmi(int cpu, unsigned long start_ip, int apicid,
+	       int *cpu0_nmi_registered)
+{
+	int id;
+	int boot_error;
+
+	/*
+	 * Wake up AP by INIT, INIT, STARTUP sequence.
+	 */
+	if (cpu)
+		return wakeup_secondary_cpu_via_init(apicid, start_ip);
+
+	/*
+	 * Wake up BSP by nmi.
+	 *
+	 * Register a NMI handler to help wake up CPU0.
+	 */
+	boot_error = register_nmi_handler(NMI_LOCAL,
+					  wakeup_cpu0_nmi, 0, "wake_cpu0");
+
+	if (!boot_error) {
+		enable_start_cpu0 = 1;
+		*cpu0_nmi_registered = 1;
+		if (apic->dest_logical == APIC_DEST_LOGICAL)
+			id = cpu0_logical_apicid;
+		else
+			id = apicid;
+		boot_error = wakeup_secondary_cpu_via_nmi(id, start_ip);
+	}
+
+	return boot_error;
+}
+
 /*
  * NOTE - on most systems this is a PHYSICAL apic ID, but on multiquad
  * (ie clustered apic addressing mode), this is a LOGICAL apic ID.
@@ -675,6 +738,7 @@ static int __cpuinit do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
 
 	unsigned long boot_error = 0;
 	int timeout;
+	int cpu0_nmi_registered = 0;
 
 	/* Just in case we booted with a single CPU. */
 	alternatives_enable_smp();
@@ -722,13 +786,16 @@ static int __cpuinit do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
 	}
 
 	/*
-	 * Kick the secondary CPU. Use the method in the APIC driver
-	 * if it's defined - or use an INIT boot APIC message otherwise:
+	 * Wake up a CPU in difference cases:
+	 * - Use the method in the APIC driver if it's defined
+	 * Otherwise,
+	 * - Use an INIT boot APIC message for APs or NMI for BSP.
 	 */
 	if (apic->wakeup_secondary_cpu)
 		boot_error = apic->wakeup_secondary_cpu(apicid, start_ip);
 	else
-		boot_error = wakeup_secondary_cpu_via_init(apicid, start_ip);
+		boot_error = wakeup_cpu_via_init_nmi(cpu, start_ip, apicid,
+						     &cpu0_nmi_registered);
 
 	if (!boot_error) {
 		/*
@@ -793,6 +860,13 @@ static int __cpuinit do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
 		 */
 		smpboot_restore_warm_reset_vector();
 	}
+	/*
+	 * Clean up the nmi handler. Do this after the callin and callout sync
+	 * to avoid impact of possible long unregister time.
+	 */
+	if (cpu0_nmi_registered)
+		unregister_nmi_handler(NMI_LOCAL, "wake_cpu0");
+
 	return boot_error;
 }
 
@@ -1037,6 +1111,11 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
 	 */
 	setup_local_APIC();
 
+	if (x2apic_mode)
+		cpu0_logical_apicid = apic_read(APIC_LDR);
+	else
+		cpu0_logical_apicid = GET_APIC_LOGICAL_ID(apic_read(APIC_LDR));
+
 	/*
 	 * Enable IO APIC before setting up error vector
 	 */
@@ -1264,6 +1343,16 @@ void play_dead_common(void)
 	local_irq_disable();
 }
 
+static bool wakeup_cpu0(void)
+{
+	unsigned int timeout;
+
+	if (smp_processor_id() == 0 && enable_start_cpu0)
+		return true;
+
+	return false;
+}
+
 /*
  * We need to flush the caches before going to sleep, lest we have
  * dirty data in our caches when we come back up.
@@ -1327,6 +1416,11 @@ static inline void mwait_play_dead(void)
 		__monitor(mwait_ptr, 0, 0);
 		mb();
 		__mwait(eax, 0);
+		/*
+		 * If NMI wants to wake up CPU0, start CPU0.
+		 */
+		if (wakeup_cpu0())
+			start_cpu0();
 	}
 }
 
@@ -1337,6 +1431,11 @@ static inline void hlt_play_dead(void)
 
 	while (1) {
 		native_halt();
+		/*
+		 * If NMI wants to wake up CPU0, start CPU0.
+		 */
+		if (wakeup_cpu0())
+			start_cpu0();
 	}
 }
 
-- 
1.7.2


^ permalink raw reply related

* Re: [RFC PATCH v2 1/6] usb: Register usb port's acpi power resources
From: Lan Tianyu @ 2012-11-14  2:46 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Lan Tianyu, Sergei Shtylyov, gregkh, sarah.a.sharp, stern,
	oneukum, linux-usb, Linux PM list
In-Reply-To: <46669688.k9rXBaWOjk@vostro.rjw.lan>

On 2012年11月14日 07:56, Rafael J. Wysocki wrote:
> On Tuesday, November 13, 2012 08:36:15 PM Lan Tianyu wrote:
>> 于 2012/11/13 19:07, Sergei Shtylyov 写道:
>>> Hello.
>>>
>>> On 13-11-2012 12:00, Lan Tianyu wrote:
>>>
>>>> This patch is to register usb port's acpi power resources. Create
>>>> link between usb port device and its acpi power resource.
>>>
>>>> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
>>> [...]
>>>
>>>> diff --git a/drivers/usb/core/usb-acpi.c b/drivers/usb/core/usb-acpi.c
>>>> index cef4252..c58ebc0 100644
>>>> --- a/drivers/usb/core/usb-acpi.c
>>>> +++ b/drivers/usb/core/usb-acpi.c
>>>> @@ -216,6 +216,28 @@ static struct acpi_bus_type usb_acpi_bus = {
>>>>       .find_device = usb_acpi_find_device,
>>>>   };
>>>>
>>>> +int usb_acpi_register_power_resources(struct device *dev)
>>>> +{
>>>> +    acpi_handle port_handle = DEVICE_ACPI_HANDLE(dev);
>>>> +
>>>> +    if (!port_handle)
>>>> +        return -ENODEV;
>>>> +
>>>> +    if (acpi_power_resource_register_device(dev, port_handle) < 0)
>>>> +        return -ENODEV;
>>>> +    return 0;
>>>> +}
>>>> +
>>>> +void usb_acpi_unregister_power_resources(struct device *dev)
>>>> +{
>>>> +    acpi_handle port_handle = DEVICE_ACPI_HANDLE(dev);
>>>> +
>>>> +    if (!port_handle)
>>>> +        return;
>>>> +
>>>> +    acpi_power_resource_register_device(dev, port_handle);
>>>
>>>     I thinbk you have been askied already, but shouldn't it be
>>> acpi_power_resource_unregister_device()?
>>>
>> Oh. Sorry. Too focus on the other modification. Thanks for your reminder.
> 
> Besides, it would be a bit more natural to do
> 
> if (port_handle)
> 	acpi_power_resource_unregister_device(dev, port_handle);
> 
> instead of doing that return when port_handle is NULL.
> 
Hi Rafael:
	Great thanks for your review. :)
	I get it and will update it at next version.
> Thanks,
> Rafael
> 
> 


-- 
Best regards
Tianyu Lan

^ permalink raw reply

* Re: [PATCH 3/3] PM: Introduce Intel PowerClamp Driver
From: Arjan van de Ven @ 2012-11-14  2:59 UTC (permalink / raw)
  To: paulmck
  Cc: Jacob Pan, Linux PM, LKML, Rafael Wysocki, Len Brown,
	Thomas Gleixner, H. Peter Anvin, Ingo Molnar, Zhang Rui,
	Rob Landley
In-Reply-To: <20121114013459.GS2489@linux.vnet.ibm.com>

On 11/13/2012 5:34 PM, Paul E. McKenney wrote:
> On Tue, Nov 13, 2012 at 05:14:50PM -0800, Jacob Pan wrote:
>> On Tue, 13 Nov 2012 16:08:54 -0800
>> Arjan van de Ven <arjan@linux.intel.com> wrote:
>>
>>>> I think I know, but I feel the need to ask anyway.  Why not tell
>>>> RCU about the clamping?  
>>>
>>> I don't mind telling RCU, but what cannot happen is a bunch of CPU
>>> time suddenly getting used (since that is the opposite of what is
>>> needed at the specific point in time of going idle)
> 
> Another round of RCU_FAST_NO_HZ rework, you are asking for?  ;-)

well
we can tell you we're about to mwait
and we can tell you when we're done being idle.
you could just do the actual work at that point, we don't care anymore ;-)
just at the start of the mandated idle period we can't afford to have more
jitter than we already have (which is more than I'd like, but it's manageable.
More jitter means more performance hit, since during the time of the jitter, some cpus
are forced idle, e.g. costing performance, without the actual big-step power savings
kicking in yet....)

> If you are only having the system take 6-millisecond "vacations", probably

it's not all that different from running a while (1) loop for 6 msec inside
a kernel thread.... other than the power level of course...


^ permalink raw reply

* Re: [RFC PATCH v2 2/6] usb: Add driver/usb/core/port.c file to fill usb port related code.
From: Lan Tianyu @ 2012-11-14  3:17 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r,
	sarah.a.sharp-VuQAYsv1563Yd54FQh9/CA,
	stern-nwvwT67g6+6dFdvTe/nMLpVzexx5G7lz, oneukum-l3A5Bk7waGM,
	linux-usb-u79uwXL29TY76Z2rM5mHXA, Linux PM list
In-Reply-To: <2444119.E7TGPxnJj8-sKB8Sp2ER+y1GS7QM15AGw@public.gmane.org>

On 2012年11月14日 08:04, Rafael J. Wysocki wrote:
> On Tuesday, November 13, 2012 04:00:01 PM Lan Tianyu wrote:
>> This patch is to create driver/usb/core/port.c and move usb port related
>> code into it.
> 
> It does seem to make functional changes in addition to that, however.
> 
No functional change. But change the usb_hub_create_port_device() param.
which original used struct usb_hub as param. Because struct usb_hub is
private struct in the hub.c and now move usb_hub_create_port_device() to
port.c. Will add changelog about this next version.
> If I'm not mistaken and that really is the case, can you (briefly)
> describe those changes here too?
> 
> Rafael
> 
> 
>> Signed-off-by: Lan Tianyu <tianyu.lan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
>> ---
>>  drivers/usb/core/Makefile |    1 +
>>  drivers/usb/core/hub.c    |  113 +++++++--------------------------------------
>>  drivers/usb/core/port.c   |   82 ++++++++++++++++++++++++++++++++
>>  drivers/usb/core/usb.h    |    3 ++
>>  include/linux/usb.h       |   16 +++++++
>>  5 files changed, 118 insertions(+), 97 deletions(-)
>>  create mode 100644 drivers/usb/core/port.c
>>
>> diff --git a/drivers/usb/core/Makefile b/drivers/usb/core/Makefile
>> index 26059b9..5e847ad 100644
>> --- a/drivers/usb/core/Makefile
>> +++ b/drivers/usb/core/Makefile
>> @@ -7,6 +7,7 @@ ccflags-$(CONFIG_USB_DEBUG) := -DDEBUG
>>  usbcore-y := usb.o hub.o hcd.o urb.o message.o driver.o
>>  usbcore-y += config.o file.o buffer.o sysfs.o endpoint.o
>>  usbcore-y += devio.o notify.o generic.o quirks.o devices.o
>> +usbcore-y += port.o
>>  
>>  usbcore-$(CONFIG_PCI)		+= hcd-pci.o
>>  usbcore-$(CONFIG_ACPI)		+= usb-acpi.o
>> diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
>> index 3c85fe1c..60746aa 100644
>> --- a/drivers/usb/core/hub.c
>> +++ b/drivers/usb/core/hub.c
>> @@ -42,13 +42,6 @@
>>  #define USB_VENDOR_GENESYS_LOGIC		0x05e3
>>  #define HUB_QUIRK_CHECK_PORT_AUTOSUSPEND	0x01
>>  
>> -struct usb_port {
>> -	struct usb_device *child;
>> -	struct device dev;
>> -	struct dev_state *port_owner;
>> -	enum usb_port_connect_type connect_type;
>> -};
>> -
>>  struct usb_hub {
>>  	struct device		*intfdev;	/* the "interface" device */
>>  	struct usb_device	*hdev;
>> @@ -170,9 +163,6 @@ EXPORT_SYMBOL_GPL(ehci_cf_port_reset_rwsem);
>>  #define HUB_DEBOUNCE_STEP	  25
>>  #define HUB_DEBOUNCE_STABLE	 100
>>  
>> -#define to_usb_port(_dev) \
>> -	container_of(_dev, struct usb_port, dev)
>> -
>>  static int usb_reset_and_verify_device(struct usb_device *udev);
>>  
>>  static inline char *portspeed(struct usb_hub *hub, int portstatus)
>> @@ -1237,57 +1227,12 @@ static int hub_post_reset(struct usb_interface *intf)
>>  	return 0;
>>  }
>>  
>> -static void usb_port_device_release(struct device *dev)
>> -{
>> -	struct usb_port *port_dev = to_usb_port(dev);
>> -
>> -	usb_acpi_unregister_power_resources(dev);
>> -	kfree(port_dev);
>> -}
>> -
>>  static void usb_hub_remove_port_device(struct usb_hub *hub,
>>  				       int port1)
>>  {
>>  	device_unregister(&hub->ports[port1 - 1]->dev);
>>  }
>>  
>> -struct device_type usb_port_device_type = {
>> -	.name =		"usb_port",
>> -	.release =	usb_port_device_release,
>> -};
>> -
>> -static int usb_hub_create_port_device(struct usb_hub *hub,
>> -				      int port1)
>> -{
>> -	struct usb_port *port_dev = NULL;
>> -	int retval;
>> -
>> -	port_dev = kzalloc(sizeof(*port_dev), GFP_KERNEL);
>> -	if (!port_dev) {
>> -		retval = -ENOMEM;
>> -		goto exit;
>> -	}
>> -
>> -	hub->ports[port1 - 1] = port_dev;
>> -	port_dev->dev.parent = hub->intfdev;
>> -	port_dev->dev.groups = port_dev_group;
>> -	port_dev->dev.type = &usb_port_device_type;
>> -	dev_set_name(&port_dev->dev, "port%d", port1);
>> -
>> -	retval = device_register(&port_dev->dev);
>> -	if (retval)
>> -		goto error_register;
>> -
>> -	usb_acpi_register_power_resources(&port_dev->dev);
>> -
>> -	return 0;
>> -
>> -error_register:
>> -	put_device(&port_dev->dev);
>> -exit:
>> -	return retval;
>> -}
>> -
>>  static int hub_configure(struct usb_hub *hub,
>>  	struct usb_endpoint_descriptor *endpoint)
>>  {
>> @@ -1548,10 +1493,24 @@ static int hub_configure(struct usb_hub *hub,
>>  	if (hub->has_indicators && blinkenlights)
>>  		hub->indicator [0] = INDICATOR_CYCLE;
>>  
>> -	for (i = 0; i < hdev->maxchild; i++)
>> -		if (usb_hub_create_port_device(hub, i + 1) < 0)
>> +	for (i = 0; i < hdev->maxchild; i++) {
>> +		struct usb_port *port_dev = NULL;
>> +
>> +		port_dev = kzalloc(sizeof(*port_dev), GFP_KERNEL);
>> +		if (!port_dev) {
>> +			dev_err(hub->intfdev,
>> +				"couldn't create port%d device due to lack mem.\n", i + 1);
>> +			continue;
>> +		}
>> +
>> +		hub->ports[i] = port_dev;
>> +
>> +		if (usb_hub_create_port_device(hub->intfdev, i + 1, port_dev) < 0) {
>>  			dev_err(hub->intfdev,
>>  				"couldn't create port%d device.\n", i + 1);
>> +			hub->ports[i] = NULL;
>> +		}
>> +	}
>>  
>>  	if (!hub_is_superspeed(hdev)) {
>>  		for (i = 1; i <= hdev->maxchild; i++)
>> @@ -4765,46 +4724,6 @@ static int hub_thread(void *__unused)
>>  	return 0;
>>  }
>>  
>> -static ssize_t show_port_connect_type(struct device *dev,
>> -	struct device_attribute *attr, char *buf)
>> -{
>> -	struct usb_port *port_dev = to_usb_port(dev);
>> -	char *result;
>> -
>> -	switch (port_dev->connect_type) {
>> -	case USB_PORT_CONNECT_TYPE_HOT_PLUG:
>> -		result = "hotplug";
>> -		break;
>> -	case USB_PORT_CONNECT_TYPE_HARD_WIRED:
>> -		result = "hardwired";
>> -		break;
>> -	case USB_PORT_NOT_USED:
>> -		result = "not used";
>> -		break;
>> -	default:
>> -		result = "unknown";
>> -		break;
>> -	}
>> -
>> -	return sprintf(buf, "%s\n", result);
>> -}
>> -static DEVICE_ATTR(connect_type, S_IRUGO, show_port_connect_type,
>> -		NULL);
>> -
>> -static struct attribute *port_dev_attrs[] = {
>> -	&dev_attr_connect_type.attr,
>> -	NULL,
>> -};
>> -
>> -static struct attribute_group port_dev_attr_grp = {
>> -	.attrs = port_dev_attrs,
>> -};
>> -
>> -static const struct attribute_group *port_dev_group[] = {
>> -	&port_dev_attr_grp,
>> -	NULL,
>> -};
>> -
>>  static const struct usb_device_id hub_id_table[] = {
>>      { .match_flags = USB_DEVICE_ID_MATCH_VENDOR
>>  	           | USB_DEVICE_ID_MATCH_INT_CLASS,
>> diff --git a/drivers/usb/core/port.c b/drivers/usb/core/port.c
>> new file mode 100644
>> index 0000000..6523a03
>> --- /dev/null
>> +++ b/drivers/usb/core/port.c
>> @@ -0,0 +1,82 @@
>> +#include <linux/kernel.h>
>> +#include <linux/errno.h>
>> +#include <linux/module.h>
>> +#include <linux/usb.h>
>> +
>> +#include "usb.h"
>> +
>> +static ssize_t show_port_connect_type(struct device *dev,
>> +	struct device_attribute *attr, char *buf)
>> +{
>> +	struct usb_port *port_dev = to_usb_port(dev);
>> +	char *result;
>> +
>> +	switch (port_dev->connect_type) {
>> +	case USB_PORT_CONNECT_TYPE_HOT_PLUG:
>> +		result = "hotplug";
>> +		break;
>> +	case USB_PORT_CONNECT_TYPE_HARD_WIRED:
>> +		result = "hardwired";
>> +		break;
>> +	case USB_PORT_NOT_USED:
>> +		result = "not used";
>> +		break;
>> +	default:
>> +		result = "unknown";
>> +		break;
>> +	}
>> +
>> +	return sprintf(buf, "%s\n", result);
>> +}
>> +static DEVICE_ATTR(connect_type, S_IRUGO, show_port_connect_type,
>> +		NULL);
>> +
>> +static struct attribute *port_dev_attrs[] = {
>> +	&dev_attr_connect_type.attr,
>> +	NULL,
>> +};
>> +
>> +static struct attribute_group port_dev_attr_grp = {
>> +	.attrs = port_dev_attrs,
>> +};
>> +
>> +static const struct attribute_group *port_dev_group[] = {
>> +	&port_dev_attr_grp,
>> +	NULL,
>> +};
>> +
>> +static void usb_port_device_release(struct device *dev)
>> +{
>> +	struct usb_port *port_dev = to_usb_port(dev);
>> +
>> +	usb_acpi_unregister_power_resources(dev);
>> +	kfree(port_dev);
>> +}
>> +
>> +struct device_type usb_port_device_type = {
>> +	.name =		"usb_port",
>> +	.release =	usb_port_device_release,
>> +};
>> +
>> +int usb_hub_create_port_device(struct device *intfdev,
>> +		int port1, struct usb_port *port_dev)
>> +{
>> +	int retval;
>> +
>> +	port_dev->dev.parent = intfdev;
>> +	port_dev->dev.groups = port_dev_group;
>> +	port_dev->dev.type = &usb_port_device_type;
>> +	dev_set_name(&port_dev->dev, "port%d", port1);
>> +
>> +	retval = device_register(&port_dev->dev);
>> +	if (retval)
>> +		goto error_register;
>> +
>> +	usb_acpi_register_power_resources(&port_dev->dev);
>> +
>> +	return 0;
>> +
>> +error_register:
>> +	put_device(&port_dev->dev);
>> +	return retval;
>> +}
>> diff --git a/drivers/usb/core/usb.h b/drivers/usb/core/usb.h
>> index 38958fb..de7434b 100644
>> --- a/drivers/usb/core/usb.h
>> +++ b/drivers/usb/core/usb.h
>> @@ -60,6 +60,9 @@ extern void usb_hub_cleanup(void);
>>  extern int usb_major_init(void);
>>  extern void usb_major_cleanup(void);
>>  
>> +extern int usb_hub_create_port_device(struct device *intfdev,
>> +		int port1, struct usb_port *port_dev);
>> +
>>  #ifdef	CONFIG_PM
>>  
>>  extern int usb_suspend(struct device *dev, pm_message_t msg);
>> diff --git a/include/linux/usb.h b/include/linux/usb.h
>> index e996bb6..c1f1346 100644
>> --- a/include/linux/usb.h
>> +++ b/include/linux/usb.h
>> @@ -572,6 +572,22 @@ struct usb_device {
>>  };
>>  #define	to_usb_device(d) container_of(d, struct usb_device, dev)
>>  
>> +/**
>> + * struct usb port - kernel's representation of a usb port
>> + * @child: usb device attatched to the port
>> + * @dev: generic device interface
>> + * @port_owner: port's owner
>> + * @connect_type: port's connect type
>> + */
>> +struct usb_port {
>> +	struct usb_device *child;
>> +	struct device dev;
>> +	struct dev_state *port_owner;
>> +	enum usb_port_connect_type connect_type;
>> +};
>> +#define to_usb_port(_dev) \
>> +	container_of(_dev, struct usb_port, dev)
>> +
>>  static inline struct usb_device *interface_to_usbdev(struct usb_interface *intf)
>>  {
>>  	return to_usb_device(intf->dev.parent);
>>


-- 
Best regards
Tianyu Lan
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH] Thermal: Add Linux/Thermal subsystem info in MAINTAINER file
From: Zhang Rui @ 2012-11-14  6:38 UTC (permalink / raw)
  To: Linux PM list, ACPI Devel Maling List, LKML
  Cc: Rafael J. Wysocki, Len Brown, Zhang, Rui, durga, Amit Kachhap,
	jhbird.choi, kuninori.morimoto.gx, eduardo.valentin, viresh.kumar,
	hongbo.zhang

Add Linux/Thermal subsystem info in MAINTAINER file.

All the changes made to the generic thermal layer, or
platform thermal drivers that make use of the thermal layer,
should be sent to linux-pm@vger.kernel.org for discussion.

And as the maintainer, I will only apply the patches that have been
sent to linux-pm@vger.kernel.org.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
---
 MAINTAINERS |    8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 59203e7..2d8512b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7210,6 +7210,14 @@ L:	linux-xtensa@linux-xtensa.org
 S:	Maintained
 F:	arch/xtensa/
 
+THERMAL
+M:      Zhang Rui <rui.zhang@intel.com>
+L:      linux-pm@vger.kernel.org
+T:      git git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux.git
+S:      Supported
+F:      drivers/thermal/
+F:      include/linux/thermal.h
+
 THINKPAD ACPI EXTRAS DRIVER
 M:	Henrique de Moraes Holschuh <ibm-acpi@hmh.eng.br>
 L:	ibm-acpi-devel@lists.sourceforge.net
-- 
1.7.9.5




^ permalink raw reply related

* Re: [RFC PATCH v2 3/6] usb: add runtime pm support for usb port device
From: Lan Tianyu @ 2012-11-14  6:34 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: gregkh, sarah.a.sharp, stern, oneukum, linux-usb, Linux PM list
In-Reply-To: <5876155.rAsY9PFrXq@vostro.rjw.lan>

On 2012年11月14日 08:08, Rafael J. Wysocki wrote:
> On Tuesday, November 13, 2012 04:00:02 PM Lan Tianyu wrote:
>> This patch is to add runtime pm callback for usb port device.
>> Set/clear PORT_POWER feature in the resume/suspend callbak.
>> Add portnum for struct usb_port to record port number.
>>
>> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
> 
> This one looks reasonable to me.  From the PM side
> 
> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Hi Rafael and Alan:
	This patch has a collaboration problem with pm qos. Since pm core would
pm_runtime_get_sync/put(port_dev) if pm qos flags was changed and port's
suspend call_back() clear PORT_POWER feature without any check. This
will cause PORT_POWER feather being cleared every time after pm qos
flags being changed.

	I have an idea that add check in the port's runtime idle callback.
Check NO_POWER_OFF flag firstly. If set return. Second, for port without
device, suspend port directly and for port with device, increase child
device's runtime usage without resume and do barrier to ensure all
suspend process finish, at last check the child runtime status. If it
was suspended, suspend port and if do nothing.

static int usb_port_runtime_idle(struct device *dev)
{
	struct usb_port *port_dev = to_usb_port(dev);
	int retval = 0;

	if (dev_pm_qos_flags(&port_dev->dev, PM_QOS_FLAG_NO_POWER_OFF)
			== PM_QOS_FLAGS_ALL)
		return 0;

	if (!port_dev->child) {
		retval = pm_runtime_suspend(&port_dev->dev);
		if (!retval)
			port_dev->power_on =false;
	}
	else {
		pm_runtime_get_noresume(&port_dev->child->dev);
		pm_runtime_barrier(&port_dev->child->dev);
		if (port_dev->child->dev.power.runtime_status
				== RPM_SUSPENDED) {
			retval = pm_runtime_suspend(&port_dev->dev);
			if (!retval)
				port_dev->power_on =false;
		}	
		pm_runtime_put_noidle(&port_dev->child->dev);
	}
	
	return retval;
}

I'd like to see your opinion :) thanks.

> 
>> ---
>>  drivers/usb/core/hub.c  |   14 ++++++++++++++
>>  drivers/usb/core/port.c |   39 +++++++++++++++++++++++++++++++++++++++
>>  drivers/usb/core/usb.h  |    2 ++
>>  include/linux/usb.h     |    2 ++
>>  4 files changed, 57 insertions(+)
>>
>> diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
>> index 60746aa..2cb414e 100644
>> --- a/drivers/usb/core/hub.c
>> +++ b/drivers/usb/core/hub.c
>> @@ -765,6 +765,20 @@ static void hub_tt_work(struct work_struct *work)
>>  	spin_unlock_irqrestore (&hub->tt.lock, flags);
>>  }
>>  
>> +int usb_hub_set_port_power(struct usb_device *hdev, int port1,
>> +		bool set)
>> +{
>> +	int ret;
>> +
>> +	if (set)
>> +		ret = set_port_feature(hdev, port1,
>> +				USB_PORT_FEAT_POWER);
>> +	else
>> +		ret = clear_port_feature(hdev, port1,
>> +				USB_PORT_FEAT_POWER);
>> +	return ret;
>> +}
>> +
>>  /**
>>   * usb_hub_clear_tt_buffer - clear control/bulk TT state in high speed hub
>>   * @urb: an URB associated with the failed or incomplete split transaction
>> diff --git a/drivers/usb/core/port.c b/drivers/usb/core/port.c
>> index 6523a03..1cda766 100644
>> --- a/drivers/usb/core/port.c
>> +++ b/drivers/usb/core/port.c
>> @@ -53,9 +53,45 @@ static void usb_port_device_release(struct device *dev)
>>  	kfree(port_dev);
>>  }
>>  
>> +static int usb_port_runtime_resume(struct device *dev)
>> +{
>> +	struct usb_port *port_dev = to_usb_port(dev);
>> +	struct usb_device *hdev =
>> +		to_usb_device(dev->parent->parent);
>> +
>> +	return usb_hub_set_port_power(hdev, port_dev->portnum,
>> +			true);
>> +}
>> +
>> +static int usb_port_runtime_suspend(struct device *dev)
>> +{
>> +	struct usb_port *port_dev = to_usb_port(dev);
>> +	struct usb_device *hdev =
>> +		to_usb_device(dev->parent->parent);
>> +
>> +	return usb_hub_set_port_power(hdev, port_dev->portnum,
>> +			false);
>> +}
>> +
>> +static int usb_port_runtime_idle(struct device *dev)
>> +{
>> +	struct usb_port *port_dev = to_usb_port(dev);
>> +
>> +	return pm_runtime_suspend(&port_dev->dev);
>> +}
>> +
>> +static const struct dev_pm_ops usb_port_pm_ops = {
>> +#ifdef CONFIG_USB_SUSPEND
>> +.runtime_suspend =	usb_port_runtime_suspend,
>> +.runtime_resume =	usb_port_runtime_resume,
>> +.runtime_idle =		usb_port_runtime_idle,
>> +#endif
>> +};
>> +
>>  struct device_type usb_port_device_type = {
>>  	.name =		"usb_port",
>>  	.release =	usb_port_device_release,
>> +	.pm = &usb_port_pm_ops,
>>  };
>>  
>>  int usb_hub_create_port_device(struct device *intfdev,
>> @@ -63,6 +99,7 @@ int usb_hub_create_port_device(struct device *intfdev,
>>  {
>>  	int retval;
>>  
>> +	port_dev->portnum = port1;
>>  	port_dev->dev.parent = intfdev;
>>  	port_dev->dev.groups = port_dev_group;
>>  	port_dev->dev.type = &usb_port_device_type;
>> @@ -72,6 +109,8 @@ int usb_hub_create_port_device(struct device *intfdev,
>>  	if (retval)
>>  		goto error_register;
>>  
>> +	pm_runtime_set_active(&port_dev->dev);
>> +	pm_runtime_enable(&port_dev->dev);
>>  	usb_acpi_register_power_resources(&port_dev->dev);
>>  
>>  	return 0;
>> diff --git a/drivers/usb/core/usb.h b/drivers/usb/core/usb.h
>> index de7434b..47d77d4 100644
>> --- a/drivers/usb/core/usb.h
>> +++ b/drivers/usb/core/usb.h
>> @@ -62,6 +62,8 @@ extern void usb_major_cleanup(void);
>>  
>>  extern int usb_hub_create_port_device(struct device *intfdev,
>>  		int port1, struct usb_port *port_dev);
>> +extern int usb_hub_set_port_power(struct usb_device *hdev,
>> +		int port1, bool set);
>>  
>>  #ifdef	CONFIG_PM
>>  
>> diff --git a/include/linux/usb.h b/include/linux/usb.h
>> index c1f1346..71510bf 100644
>> --- a/include/linux/usb.h
>> +++ b/include/linux/usb.h
>> @@ -578,12 +578,14 @@ struct usb_device {
>>   * @dev: generic device interface
>>   * @port_owner: port's owner
>>   * @connect_type: port's connect type
>> + * @portnum: port index num based one
>>   */
>>  struct usb_port {
>>  	struct usb_device *child;
>>  	struct device dev;
>>  	struct dev_state *port_owner;
>>  	enum usb_port_connect_type connect_type;
>> +	u8 portnum;
>>  };
>>  #define to_usb_port(_dev) \
>>  	container_of(_dev, struct usb_port, dev)
>>


-- 
Best regards
Tianyu Lan

^ permalink raw reply

* [PATCH 1/1] thermal: Exynos: Add missing dependency
From: Sachin Kamat @ 2012-11-14  6:48 UTC (permalink / raw)
  To: linux-pm
  Cc: rui.zhang, durgadoss.r, sachin.kamat, patches, akpm,
	Amit Daniel Kachhap

CPU_FREQ_TABLE depends on CPU_FREQ. Selecting CPU_FREQ_TABLE without checking
for dependencies gives the following compilation warnings:
warning: (ARCH_TEGRA_2x_SOC && ARCH_TEGRA_3x_SOC && UX500_SOC_DB8500 &&
CPU_THERMAL && EXYNOS_THERMAL) selects CPU_FREQ_TABLE which has unmet
direct dependencies (ARCH_HAS_CPUFREQ && CPU_FREQ)

Cc: Amit Daniel Kachhap <amit.kachhap@linaro.org>
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
---
Build tested using exynos4_defconfig on linux-next tree of 20121114.
---
 drivers/thermal/Kconfig |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/thermal/Kconfig b/drivers/thermal/Kconfig
index 266c15e..197b7db 100644
--- a/drivers/thermal/Kconfig
+++ b/drivers/thermal/Kconfig
@@ -50,7 +50,7 @@ config RCAR_THERMAL
 
 config EXYNOS_THERMAL
 	tristate "Temperature sensor on Samsung EXYNOS"
-	depends on (ARCH_EXYNOS4 || ARCH_EXYNOS5) && THERMAL
+	depends on (ARCH_EXYNOS4 || ARCH_EXYNOS5) && THERMAL && CPU_FREQ
 	select CPU_FREQ_TABLE
 	help
 	  If you say yes here you get support for TMU (Thermal Managment
-- 
1.7.4.1


^ permalink raw reply related

* Re: [PATCH] cpuidle: Measure idle state durations with monotonic clock
From: Deepthi Dharwar @ 2012-11-14  9:06 UTC (permalink / raw)
  To: Julius Werner
  Cc: linux-kernel, Kevin Hilman, Trinabh Gupta, linux-pm,
	Daniel Lezcano, Rafael J. Wysocki, linux-acpi, Srivatsa S. Bhat,
	Andrew Morton, linuxppc-dev, Sameer Nanda, Len Brown
In-Reply-To: <1352843563-16392-1-git-send-email-jwerner@chromium.org>

On 11/14/2012 03:22 AM, Julius Werner wrote:
> Many cpuidle drivers measure their time spent in an idle state by
> reading the wallclock time before and after idling and calculating the
> difference. This leads to erroneous results when the wallclock time gets
> updated by another processor in the meantime, adding that clock
> adjustment to the idle state's time counter.
> 
> If the clock adjustment was negative, the result is even worse due to an
> erroneous cast from int to unsigned long long of the last_residency
> variable. The negative 32 bit integer will zero-extend and result in a
> forward time jump of roughly four billion milliseconds or 1.3 hours on
> the idle state residency counter.
> 
> This patch changes all affected cpuidle drivers to use the monotonic
> clock for their measurements instead. It also removes the erroneous
> cast, making sure that negative residency values are applied correctly
> even though they should not appear anymore.

Currently tegra/cpuidle uses ktime_get(). Good to have it for all
the other arch idle residency time logging too.
Tested patch on pseries.

Reviewed-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>

Cheers,
Deepthi

> 
> Signed-off-by: Julius Werner <jwerner@chromium.org>
> ---
>  arch/powerpc/platforms/pseries/processor_idle.c |    4 ++--
>  drivers/acpi/processor_idle.c                   |   12 ++++++------
>  drivers/cpuidle/cpuidle.c                       |    3 +--
>  drivers/idle/intel_idle.c                       |   13 ++++---------
>  4 files changed, 13 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/processor_idle.c b/arch/powerpc/platforms/pseries/processor_idle.c
> index 45d00e5..4d806b4 100644
> --- a/arch/powerpc/platforms/pseries/processor_idle.c
> +++ b/arch/powerpc/platforms/pseries/processor_idle.c
> @@ -36,7 +36,7 @@ static struct cpuidle_state *cpuidle_state_table;
>  static inline void idle_loop_prolog(unsigned long *in_purr, ktime_t *kt_before)
>  {
> 
> -	*kt_before = ktime_get_real();
> +	*kt_before = ktime_get();
>  	*in_purr = mfspr(SPRN_PURR);
>  	/*
>  	 * Indicate to the HV that we are idle. Now would be
> @@ -50,7 +50,7 @@ static inline  s64 idle_loop_epilog(unsigned long in_purr, ktime_t kt_before)
>  	get_lppaca()->wait_state_cycles += mfspr(SPRN_PURR) - in_purr;
>  	get_lppaca()->idle = 0;
> 
> -	return ktime_to_us(ktime_sub(ktime_get_real(), kt_before));
> +	return ktime_to_us(ktime_sub(ktime_get(), kt_before));
>  }
> 
>  static int snooze_loop(struct cpuidle_device *dev,
> diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
> index e8086c7..8c98d73 100644
> --- a/drivers/acpi/processor_idle.c
> +++ b/drivers/acpi/processor_idle.c
> @@ -751,9 +751,9 @@ static int acpi_idle_enter_c1(struct cpuidle_device *dev,
> 
> 
>  	lapic_timer_state_broadcast(pr, cx, 1);
> -	kt1 = ktime_get_real();
> +	kt1 = ktime_get();
>  	acpi_idle_do_entry(cx);
> -	kt2 = ktime_get_real();
> +	kt2 = ktime_get();
>  	idle_time =  ktime_to_us(ktime_sub(kt2, kt1));
> 
>  	/* Update device last_residency*/
> @@ -843,11 +843,11 @@ static int acpi_idle_enter_simple(struct cpuidle_device *dev,
>  	if (cx->type == ACPI_STATE_C3)
>  		ACPI_FLUSH_CPU_CACHE();
> 
> -	kt1 = ktime_get_real();
> +	kt1 = ktime_get();
>  	/* Tell the scheduler that we are going deep-idle: */
>  	sched_clock_idle_sleep_event();
>  	acpi_idle_do_entry(cx);
> -	kt2 = ktime_get_real();
> +	kt2 = ktime_get();
>  	idle_time_ns = ktime_to_ns(ktime_sub(kt2, kt1));
>  	idle_time = idle_time_ns;
>  	do_div(idle_time, NSEC_PER_USEC);
> @@ -934,7 +934,7 @@ static int acpi_idle_enter_bm(struct cpuidle_device *dev,
>  	 */
>  	lapic_timer_state_broadcast(pr, cx, 1);
> 
> -	kt1 = ktime_get_real();
> +	kt1 = ktime_get();
>  	/*
>  	 * disable bus master
>  	 * bm_check implies we need ARB_DIS
> @@ -965,7 +965,7 @@ static int acpi_idle_enter_bm(struct cpuidle_device *dev,
>  		c3_cpu_count--;
>  		raw_spin_unlock(&c3_lock);
>  	}
> -	kt2 = ktime_get_real();
> +	kt2 = ktime_get();
>  	idle_time_ns = ktime_to_ns(ktime_sub(kt2, kt1));
>  	idle_time = idle_time_ns;
>  	do_div(idle_time, NSEC_PER_USEC);
> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
> index 7f15b85..1536edd 100644
> --- a/drivers/cpuidle/cpuidle.c
> +++ b/drivers/cpuidle/cpuidle.c
> @@ -109,8 +109,7 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
>  		/* This can be moved to within driver enter routine
>  		 * but that results in multiple copies of same code.
>  		 */
> -		dev->states_usage[entered_state].time +=
> -				(unsigned long long)dev->last_residency;
> +		dev->states_usage[entered_state].time += dev->last_residency;
>  		dev->states_usage[entered_state].usage++;
>  	} else {
>  		dev->last_residency = 0;
> diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
> index b0f6b4c..6329a97 100644
> --- a/drivers/idle/intel_idle.c
> +++ b/drivers/idle/intel_idle.c
> @@ -56,7 +56,7 @@
>  #include <linux/kernel.h>
>  #include <linux/cpuidle.h>
>  #include <linux/clockchips.h>
> -#include <linux/hrtimer.h>	/* ktime_get_real() */
> +#include <linux/hrtimer.h>	/* ktime_get() */
>  #include <trace/events/power.h>
>  #include <linux/sched.h>
>  #include <linux/notifier.h>
> @@ -281,8 +281,7 @@ static int intel_idle(struct cpuidle_device *dev,
>  	struct cpuidle_state_usage *state_usage = &dev->states_usage[index];
>  	unsigned long eax = (unsigned long)cpuidle_get_statedata(state_usage);
>  	unsigned int cstate;
> -	ktime_t kt_before, kt_after;
> -	s64 usec_delta;
> +	ktime_t kt_before;
>  	int cpu = smp_processor_id();
> 
>  	cstate = (((eax) >> MWAIT_SUBSTATE_SIZE) & MWAIT_CSTATE_MASK) + 1;
> @@ -297,7 +296,7 @@ static int intel_idle(struct cpuidle_device *dev,
>  	if (!(lapic_timer_reliable_states & (1 << (cstate))))
>  		clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu);
> 
> -	kt_before = ktime_get_real();
> +	kt_before = ktime_get();
> 
>  	stop_critical_timings();
>  	if (!need_resched()) {
> @@ -310,17 +309,13 @@ static int intel_idle(struct cpuidle_device *dev,
> 
>  	start_critical_timings();
> 
> -	kt_after = ktime_get_real();
> -	usec_delta = ktime_to_us(ktime_sub(kt_after, kt_before));
> +	dev->last_residency = ktime_to_us(ktime_sub(ktime_get(), kt_before));
> 
>  	local_irq_enable();
> 
>  	if (!(lapic_timer_reliable_states & (1 << (cstate))))
>  		clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu);
> 
> -	/* Update cpuidle counters */
> -	dev->last_residency = (int)usec_delta;
> -
>  	return index;
>  }
> 


^ permalink raw reply

* Re: [RFC PATCH v2 3/6] usb: add runtime pm support for usb port device
From: Rafael J. Wysocki @ 2012-11-14  9:49 UTC (permalink / raw)
  To: Lan Tianyu
  Cc: gregkh, sarah.a.sharp, stern, oneukum, linux-usb, Linux PM list
In-Reply-To: <50A33B7D.2000100@intel.com>

On Wednesday, November 14, 2012 02:34:37 PM Lan Tianyu wrote:
> On 2012年11月14日 08:08, Rafael J. Wysocki wrote:
> > On Tuesday, November 13, 2012 04:00:02 PM Lan Tianyu wrote:
> >> This patch is to add runtime pm callback for usb port device.
> >> Set/clear PORT_POWER feature in the resume/suspend callbak.
> >> Add portnum for struct usb_port to record port number.
> >>
> >> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
> > 
> > This one looks reasonable to me.  From the PM side
> > 
> > Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Hi Rafael and Alan:
> 	This patch has a collaboration problem with pm qos. Since pm core would
> pm_runtime_get_sync/put(port_dev) if pm qos flags was changed and port's
> suspend call_back() clear PORT_POWER feature without any check. This
> will cause PORT_POWER feather being cleared every time after pm qos
> flags being changed.
> 
> 	I have an idea that add check in the port's runtime idle callback.
> Check NO_POWER_OFF flag firstly. If set return. Second, for port without
> device, suspend port directly and for port with device, increase child
> device's runtime usage without resume and do barrier to ensure all
> suspend process finish, at last check the child runtime status. If it
> was suspended, suspend port and if do nothing.
> 
> static int usb_port_runtime_idle(struct device *dev)
> {
> 	struct usb_port *port_dev = to_usb_port(dev);
> 	int retval = 0;
> 
> 	if (dev_pm_qos_flags(&port_dev->dev, PM_QOS_FLAG_NO_POWER_OFF)
> 			== PM_QOS_FLAGS_ALL)
> 		return 0;
> 
> 	if (!port_dev->child) {
> 		retval = pm_runtime_suspend(&port_dev->dev);
> 		if (!retval)
> 			port_dev->power_on =false;
> 	}
> 	else {

This usually is written as "} else {" in the kernel code.

> 		pm_runtime_get_noresume(&port_dev->child->dev);
> 		pm_runtime_barrier(&port_dev->child->dev);
> 		if (port_dev->child->dev.power.runtime_status
> 				== RPM_SUSPENDED) {
> 			retval = pm_runtime_suspend(&port_dev->dev);
> 			if (!retval)
> 				port_dev->power_on =false;
> 		}	
> 		pm_runtime_put_noidle(&port_dev->child->dev);
> 	}

Hmm.  If child->dev is not suspended, then our usage_count should be
at least 1, so pm_runtime_suspend(&port_dev->dev) shouldn't actually
suspend us.  Isn't that the case?

Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply

* Re: [BUGFIX] PM: Fix active child counting when disabled and forbidden
From: Rafael J. Wysocki @ 2012-11-14  9:52 UTC (permalink / raw)
  To: Huang Ying; +Cc: Alan Stern, linux-kernel, linux-pm
In-Reply-To: <1352855308.7176.232.camel@yhuang-dev>

On Wednesday, November 14, 2012 09:08:28 AM Huang Ying wrote:
> On Tue, 2012-11-13 at 11:10 -0500, Alan Stern wrote:
> > On Tue, 13 Nov 2012, Huang Ying wrote:
> > 
> > > > This is not quite right.  Consider a device that is in runtime suspend 
> > > > when a system sleep starts.  When the system sleep ends, the device 
> > > > will be resumed but the PM core will still think its state is 
> > > > SUSPENDED.  The subsystem has to tell the PM core that the device is 
> > > > now ACTIVE.  Currently, subsystems do this by calling 
> > > > pm_runtime_disable, pm_runtime_set_active, pm_runtime_enable.  Under 
> > > > your scheme this wouldn't work; the pm_runtime_set_active call would 
> > > > fail because the device was !forbidden.
> > > 
> > > Thanks for your information.  For this specific situation, is it
> > > possible to call pm_runtime_resume() or pm_request_resume() for the
> > > device?
> > 
> > No, because the device already is at full power.  The subsystem just
> > needs to tell the PM core that it is.
> > 
> > > > > PM.  Device can always work with full power.
> > > > 
> > > > It can't if the parent is in SUSPEND.  If necessary, the user can write 
> > > > "on" to the parent's power/control attribute first.
> > > 
> > > Is it possible to call pm_runtime_set_active() for the parent if the
> > > parent is disabled and SUSPENDED.
> > 
> > Doing that is possible, but it might not work.  The parent might
> > actually be at low power; calling pm_runtime_set_active wouldn't change
> > the physical power level.  Basically, it's not safe to assume anything
> > about devices that are disabled for runtime PM.
> > 
> > > It appears that there is race condition between this and the
> > > pm_runtime_disable, pm_runtime_set_active, pm_runtime_enable sequence
> > > you mentioned ealier.
> > > 
> > > thread 1			thread 2
> > > pm_runtime_disable
> > > pm_runtime_set_active
> > > 				pm_runtime_allow
> > > 				  pm_runtime_set_suspended
> > > pm_runtime_enable
> > 
> > This can't happen in the situation I described earlier because during
> > system sleep transitions, no other user threads are allowed to run.  
> > All of them except the one actually carrying out the transition are
> > frozen.
> 
> Thanks for your kind explanation.
> 
> After talking with you, my feeling is that the disabled state is obscure
> and error-prone.  So I suggest not to use it if possible.  Maybe we can
> 
>  - make changes suggested by Alan to make disabled state better.

What changes specifically do you mean to be precise?

>  - use Rafael's solution to solve this specific issue, and avoid the
> usage of disabled state here.

Well, I think that the PCI subsystem should just enable runtime PM for
all devices upfront and keep it enabled going forward.

My patch is incomplete, however, because it doesn't deal with probe/remove
correctly at this point (which Alan pointed out earlier in the thread).

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply

* Re: [BUGFIX] PM: Fix active child counting when disabled and forbidden
From: Rafael J. Wysocki @ 2012-11-14 10:05 UTC (permalink / raw)
  To: Alan Stern; +Cc: Huang Ying, linux-kernel, linux-pm
In-Reply-To: <Pine.LNX.4.44L0.1211081125470.1280-100000@iolanthe.rowland.org>

On Thursday, November 08, 2012 12:07:54 PM Alan Stern wrote:
> On Thu, 8 Nov 2012, Rafael J. Wysocki wrote:

[...]

I'd like to revisit this for a while if you don't mind.

> Your revised patch does do the job, except for a few problems.  
> Namely, while local_pci_probe() and pci_device_remove() are running,
> the device _does_ have a driver.

Right.

> This means that local_pci_probe() should not call pm_runtime_get_sync(),
> for example.  Doing so would invoke the driver's runtime_resume routine
> before calling the driver's probe routine!
> 
> The USB subsystem solves this problem by carefully keeping track of the 
> state of the device-driver binding:
> 
> 	Originally the device is UNBOUND.
> 
> 	At the start of the subsystem's probe routine, the state
> 	changes to BINDING.
> 
> 	If the probe succeeds then it changes to BOUND; otherwise
> 	it goes back to UNBOUND.
> 
> 	At the start of the subsystem's remove routine, the state
> 	changes to UNBINDING.  At the end it goes to UNBOUND.
> 
> When the state is anything other than BOUND, the subsystem's runtime PM 
> routines act as though there is no driver.

Well, that wouldn't help PCI, because some drivers want to use the
pm_runtime_* stuff in their .probe() routines and actually expect it to
work. :-)

Perhaps we can introduce something like

pm_runtime_get[_put]_skip_callbacks()

that would treat the device as though it had the power.no_callbacks flag
set and use that around the driver's .probe() in the PCI core?

Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply

* Re: [PATCH] cpuidle: Measure idle state durations with monotonic clock
From: Daniel Lezcano @ 2012-11-14 10:57 UTC (permalink / raw)
  To: Deepthi Dharwar
  Cc: Julius Werner, linux-kernel, Kevin Hilman, Trinabh Gupta,
	linux-pm, Rafael J. Wysocki, linux-acpi, Srivatsa S. Bhat,
	Andrew Morton, linuxppc-dev, Sameer Nanda, Len Brown
In-Reply-To: <50A35F21.9040003@linux.vnet.ibm.com>

On 11/14/2012 10:06 AM, Deepthi Dharwar wrote:
> On 11/14/2012 03:22 AM, Julius Werner wrote:
>> Many cpuidle drivers measure their time spent in an idle state by
>> reading the wallclock time before and after idling and calculating the
>> difference. This leads to erroneous results when the wallclock time gets
>> updated by another processor in the meantime, adding that clock
>> adjustment to the idle state's time counter.
>>
>> If the clock adjustment was negative, the result is even worse due to an
>> erroneous cast from int to unsigned long long of the last_residency
>> variable. The negative 32 bit integer will zero-extend and result in a
>> forward time jump of roughly four billion milliseconds or 1.3 hours on
>> the idle state residency counter.
>>
>> This patch changes all affected cpuidle drivers to use the monotonic
>> clock for their measurements instead. It also removes the erroneous
>> cast, making sure that negative residency values are applied correctly
>> even though they should not appear anymore.
> 
> Currently tegra/cpuidle uses ktime_get(). Good to have it for all
> the other arch idle residency time logging too.

Actually it is used by all arm cpuidle drivers through the wrapper
"cpuidle_wrap_enter" and the en_core_tk_irqen flag.

> Tested patch on pseries.
> 
> Reviewed-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
> 
> Cheers,
> Deepthi
> 
>>
>> Signed-off-by: Julius Werner <jwerner@chromium.org>
>> ---
>>  arch/powerpc/platforms/pseries/processor_idle.c |    4 ++--
>>  drivers/acpi/processor_idle.c                   |   12 ++++++------
>>  drivers/cpuidle/cpuidle.c                       |    3 +--
>>  drivers/idle/intel_idle.c                       |   13 ++++---------
>>  4 files changed, 13 insertions(+), 19 deletions(-)
>>
>> diff --git a/arch/powerpc/platforms/pseries/processor_idle.c b/arch/powerpc/platforms/pseries/processor_idle.c
>> index 45d00e5..4d806b4 100644
>> --- a/arch/powerpc/platforms/pseries/processor_idle.c
>> +++ b/arch/powerpc/platforms/pseries/processor_idle.c
>> @@ -36,7 +36,7 @@ static struct cpuidle_state *cpuidle_state_table;
>>  static inline void idle_loop_prolog(unsigned long *in_purr, ktime_t *kt_before)
>>  {
>>
>> -	*kt_before = ktime_get_real();
>> +	*kt_before = ktime_get();
>>  	*in_purr = mfspr(SPRN_PURR);
>>  	/*
>>  	 * Indicate to the HV that we are idle. Now would be
>> @@ -50,7 +50,7 @@ static inline  s64 idle_loop_epilog(unsigned long in_purr, ktime_t kt_before)
>>  	get_lppaca()->wait_state_cycles += mfspr(SPRN_PURR) - in_purr;
>>  	get_lppaca()->idle = 0;
>>
>> -	return ktime_to_us(ktime_sub(ktime_get_real(), kt_before));
>> +	return ktime_to_us(ktime_sub(ktime_get(), kt_before));
>>  }
>>
>>  static int snooze_loop(struct cpuidle_device *dev,
>> diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
>> index e8086c7..8c98d73 100644
>> --- a/drivers/acpi/processor_idle.c
>> +++ b/drivers/acpi/processor_idle.c
>> @@ -751,9 +751,9 @@ static int acpi_idle_enter_c1(struct cpuidle_device *dev,
>>
>>
>>  	lapic_timer_state_broadcast(pr, cx, 1);
>> -	kt1 = ktime_get_real();
>> +	kt1 = ktime_get();
>>  	acpi_idle_do_entry(cx);
>> -	kt2 = ktime_get_real();
>> +	kt2 = ktime_get();
>>  	idle_time =  ktime_to_us(ktime_sub(kt2, kt1));
>>
>>  	/* Update device last_residency*/
>> @@ -843,11 +843,11 @@ static int acpi_idle_enter_simple(struct cpuidle_device *dev,
>>  	if (cx->type == ACPI_STATE_C3)
>>  		ACPI_FLUSH_CPU_CACHE();
>>
>> -	kt1 = ktime_get_real();
>> +	kt1 = ktime_get();
>>  	/* Tell the scheduler that we are going deep-idle: */
>>  	sched_clock_idle_sleep_event();
>>  	acpi_idle_do_entry(cx);
>> -	kt2 = ktime_get_real();
>> +	kt2 = ktime_get();
>>  	idle_time_ns = ktime_to_ns(ktime_sub(kt2, kt1));
>>  	idle_time = idle_time_ns;
>>  	do_div(idle_time, NSEC_PER_USEC);
>> @@ -934,7 +934,7 @@ static int acpi_idle_enter_bm(struct cpuidle_device *dev,
>>  	 */
>>  	lapic_timer_state_broadcast(pr, cx, 1);
>>
>> -	kt1 = ktime_get_real();
>> +	kt1 = ktime_get();
>>  	/*
>>  	 * disable bus master
>>  	 * bm_check implies we need ARB_DIS
>> @@ -965,7 +965,7 @@ static int acpi_idle_enter_bm(struct cpuidle_device *dev,
>>  		c3_cpu_count--;
>>  		raw_spin_unlock(&c3_lock);
>>  	}
>> -	kt2 = ktime_get_real();
>> +	kt2 = ktime_get();
>>  	idle_time_ns = ktime_to_ns(ktime_sub(kt2, kt1));
>>  	idle_time = idle_time_ns;
>>  	do_div(idle_time, NSEC_PER_USEC);
>> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
>> index 7f15b85..1536edd 100644
>> --- a/drivers/cpuidle/cpuidle.c
>> +++ b/drivers/cpuidle/cpuidle.c
>> @@ -109,8 +109,7 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
>>  		/* This can be moved to within driver enter routine
>>  		 * but that results in multiple copies of same code.
>>  		 */
>> -		dev->states_usage[entered_state].time +=
>> -				(unsigned long long)dev->last_residency;
>> +		dev->states_usage[entered_state].time += dev->last_residency;
>>  		dev->states_usage[entered_state].usage++;
>>  	} else {
>>  		dev->last_residency = 0;
>> diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
>> index b0f6b4c..6329a97 100644
>> --- a/drivers/idle/intel_idle.c
>> +++ b/drivers/idle/intel_idle.c
>> @@ -56,7 +56,7 @@
>>  #include <linux/kernel.h>
>>  #include <linux/cpuidle.h>
>>  #include <linux/clockchips.h>
>> -#include <linux/hrtimer.h>	/* ktime_get_real() */
>> +#include <linux/hrtimer.h>	/* ktime_get() */
>>  #include <trace/events/power.h>
>>  #include <linux/sched.h>
>>  #include <linux/notifier.h>
>> @@ -281,8 +281,7 @@ static int intel_idle(struct cpuidle_device *dev,
>>  	struct cpuidle_state_usage *state_usage = &dev->states_usage[index];
>>  	unsigned long eax = (unsigned long)cpuidle_get_statedata(state_usage);
>>  	unsigned int cstate;
>> -	ktime_t kt_before, kt_after;
>> -	s64 usec_delta;
>> +	ktime_t kt_before;
>>  	int cpu = smp_processor_id();
>>
>>  	cstate = (((eax) >> MWAIT_SUBSTATE_SIZE) & MWAIT_CSTATE_MASK) + 1;
>> @@ -297,7 +296,7 @@ static int intel_idle(struct cpuidle_device *dev,
>>  	if (!(lapic_timer_reliable_states & (1 << (cstate))))
>>  		clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu);
>>
>> -	kt_before = ktime_get_real();
>> +	kt_before = ktime_get();
>>
>>  	stop_critical_timings();
>>  	if (!need_resched()) {
>> @@ -310,17 +309,13 @@ static int intel_idle(struct cpuidle_device *dev,
>>
>>  	start_critical_timings();
>>
>> -	kt_after = ktime_get_real();
>> -	usec_delta = ktime_to_us(ktime_sub(kt_after, kt_before));
>> +	dev->last_residency = ktime_to_us(ktime_sub(ktime_get(), kt_before));
>>
>>  	local_irq_enable();
>>
>>  	if (!(lapic_timer_reliable_states & (1 << (cstate))))
>>  		clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &cpu);
>>
>> -	/* Update cpuidle counters */
>> -	dev->last_residency = (int)usec_delta;
>> -
>>  	return index;
>>  }
>>
> 


-- 
 <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] cpuidle: Measure idle state durations with monotonic clock
From: Daniel Lezcano @ 2012-11-14 11:05 UTC (permalink / raw)
  To: Julius Werner
  Cc: linux-kernel, Len Brown, Rafael J. Wysocki, Kevin Hilman,
	Andrew Morton, Srivatsa S. Bhat, linux-acpi, linux-pm,
	linuxppc-dev, Deepthi Dharwar, Trinabh Gupta, Sameer Nanda,
	Lists Linaro-dev
In-Reply-To: <1352843563-16392-1-git-send-email-jwerner@chromium.org>

On 11/13/2012 10:52 PM, Julius Werner wrote:
> Many cpuidle drivers measure their time spent in an idle state by
> reading the wallclock time before and after idling and calculating the
> difference. This leads to erroneous results when the wallclock time gets
> updated by another processor in the meantime, adding that clock
> adjustment to the idle state's time counter.
> 
> If the clock adjustment was negative, the result is even worse due to an
> erroneous cast from int to unsigned long long of the last_residency
> variable. The negative 32 bit integer will zero-extend and result in a
> forward time jump of roughly four billion milliseconds or 1.3 hours on
> the idle state residency counter.
> 
> This patch changes all affected cpuidle drivers to use the monotonic
> clock for their measurements instead. It also removes the erroneous
> cast, making sure that negative residency values are applied correctly
> even though they should not appear anymore.
> 
> Signed-off-by: Julius Werner <jwerner@chromium.org>
> ---
>  arch/powerpc/platforms/pseries/processor_idle.c |    4 ++--
>  drivers/acpi/processor_idle.c                   |   12 ++++++------
>  drivers/cpuidle/cpuidle.c                       |    3 +--
>  drivers/idle/intel_idle.c                       |   13 ++++---------
>  4 files changed, 13 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/processor_idle.c b/arch/powerpc/platforms/pseries/processor_idle.c
> index 45d00e5..4d806b4 100644
> --- a/arch/powerpc/platforms/pseries/processor_idle.c
> +++ b/arch/powerpc/platforms/pseries/processor_idle.c
> @@ -36,7 +36,7 @@ static struct cpuidle_state *cpuidle_state_table;
>  static inline void idle_loop_prolog(unsigned long *in_purr, ktime_t *kt_before)
>  {
>  
> -	*kt_before = ktime_get_real();
> +	*kt_before = ktime_get();
>  	*in_purr = mfspr(SPRN_PURR);
>  	/*
>  	 * Indicate to the HV that we are idle. Now would be
> @@ -50,7 +50,7 @@ static inline  s64 idle_loop_epilog(unsigned long in_purr, ktime_t kt_before)
>  	get_lppaca()->wait_state_cycles += mfspr(SPRN_PURR) - in_purr;
>  	get_lppaca()->idle = 0;
>  
> -	return ktime_to_us(ktime_sub(ktime_get_real(), kt_before));
> +	return ktime_to_us(ktime_sub(ktime_get(), kt_before));
>  }
>  
>  static int snooze_loop(struct cpuidle_device *dev,
> diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
> index e8086c7..8c98d73 100644
> --- a/drivers/acpi/processor_idle.c
> +++ b/drivers/acpi/processor_idle.c
> @@ -751,9 +751,9 @@ static int acpi_idle_enter_c1(struct cpuidle_device *dev,
>  
>  
>  	lapic_timer_state_broadcast(pr, cx, 1);
> -	kt1 = ktime_get_real();
> +	kt1 = ktime_get();
>  	acpi_idle_do_entry(cx);
> -	kt2 = ktime_get_real();
> +	kt2 = ktime_get();
>  	idle_time =  ktime_to_us(ktime_sub(kt2, kt1));
>  
>  	/* Update device last_residency*/
> @@ -843,11 +843,11 @@ static int acpi_idle_enter_simple(struct cpuidle_device *dev,
>  	if (cx->type == ACPI_STATE_C3)
>  		ACPI_FLUSH_CPU_CACHE();
>  
> -	kt1 = ktime_get_real();
> +	kt1 = ktime_get();
>  	/* Tell the scheduler that we are going deep-idle: */
>  	sched_clock_idle_sleep_event();
>  	acpi_idle_do_entry(cx);
> -	kt2 = ktime_get_real();
> +	kt2 = ktime_get();
>  	idle_time_ns = ktime_to_ns(ktime_sub(kt2, kt1));
>  	idle_time = idle_time_ns;
>  	do_div(idle_time, NSEC_PER_USEC);
> @@ -934,7 +934,7 @@ static int acpi_idle_enter_bm(struct cpuidle_device *dev,
>  	 */
>  	lapic_timer_state_broadcast(pr, cx, 1);
>  
> -	kt1 = ktime_get_real();
> +	kt1 = ktime_get();
>  	/*
>  	 * disable bus master
>  	 * bm_check implies we need ARB_DIS
> @@ -965,7 +965,7 @@ static int acpi_idle_enter_bm(struct cpuidle_device *dev,
>  		c3_cpu_count--;
>  		raw_spin_unlock(&c3_lock);
>  	}
> -	kt2 = ktime_get_real();
> +	kt2 = ktime_get();
>  	idle_time_ns = ktime_to_ns(ktime_sub(kt2, kt1));
>  	idle_time = idle_time_ns;
>  	do_div(idle_time, NSEC_PER_USEC);


Maybe you can remove all these computations and set the flag
en_core_tk_irqen for the driver ? That will be handled by the cpuidle
framework, no ?

Same comment for the intel_idle driver.

Thanks
  -- Daniel

-- 
 <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog


^ permalink raw reply

* Re: [RFC PATCH v2 3/6] usb: add runtime pm support for usb port device
From: Lan Tianyu @ 2012-11-14 12:45 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Lan Tianyu, gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r,
	sarah.a.sharp-VuQAYsv1563Yd54FQh9/CA,
	stern-nwvwT67g6+6dFdvTe/nMLpVzexx5G7lz, oneukum-l3A5Bk7waGM,
	linux-usb-u79uwXL29TY76Z2rM5mHXA, Linux PM list
In-Reply-To: <2700525.6XmcYg8quR-sKB8Sp2ER+y1GS7QM15AGw@public.gmane.org>

于 2012/11/14 17:49, Rafael J. Wysocki 写道:
> On Wednesday, November 14, 2012 02:34:37 PM Lan Tianyu wrote:
>> On 2012年11月14日 08:08, Rafael J. Wysocki wrote:
>>> On Tuesday, November 13, 2012 04:00:02 PM Lan Tianyu wrote:
>>>> This patch is to add runtime pm callback for usb port device.
>>>> Set/clear PORT_POWER feature in the resume/suspend callbak.
>>>> Add portnum for struct usb_port to record port number.
>>>>
>>>> Signed-off-by: Lan Tianyu <tianyu.lan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
>>>
>>> This one looks reasonable to me.  From the PM side
>>>
>>> Acked-by: Rafael J. Wysocki <rafael.j.wysocki-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
>> Hi Rafael and Alan:
>> 	This patch has a collaboration problem with pm qos. Since pm core would
>> pm_runtime_get_sync/put(port_dev) if pm qos flags was changed and port's
>> suspend call_back() clear PORT_POWER feature without any check. This
>> will cause PORT_POWER feather being cleared every time after pm qos
>> flags being changed.
>>
>> 	I have an idea that add check in the port's runtime idle callback.
>> Check NO_POWER_OFF flag firstly. If set return. Second, for port without
>> device, suspend port directly and for port with device, increase child
>> device's runtime usage without resume and do barrier to ensure all
>> suspend process finish, at last check the child runtime status. If it
>> was suspended, suspend port and if do nothing.
>>
>> static int usb_port_runtime_idle(struct device *dev)
>> {
>> 	struct usb_port *port_dev = to_usb_port(dev);
>> 	int retval = 0;
>>
>> 	if (dev_pm_qos_flags(&port_dev->dev, PM_QOS_FLAG_NO_POWER_OFF)
>> 			== PM_QOS_FLAGS_ALL)
>> 		return 0;
>>
>> 	if (!port_dev->child) {
>> 		retval = pm_runtime_suspend(&port_dev->dev);
>> 		if (!retval)
>> 			port_dev->power_on =false;
>> 	}
>> 	else {
>
> This usually is written as "} else {" in the kernel code.
>
>> 		pm_runtime_get_noresume(&port_dev->child->dev);
>> 		pm_runtime_barrier(&port_dev->child->dev);
>> 		if (port_dev->child->dev.power.runtime_status
>> 				== RPM_SUSPENDED) {
>> 			retval = pm_runtime_suspend(&port_dev->dev);
>> 			if (!retval)
>> 				port_dev->power_on =false;
>> 		}	
>> 		pm_runtime_put_noidle(&port_dev->child->dev);
>> 	}
>
> Hmm.  If child->dev is not suspended, then our usage_count should be
> at least 1, so pm_runtime_suspend(&port_dev->dev) shouldn't actually
> suspend us.  Isn't that the case?
No, because the child device is not under port device and so even if
child->dev is not suspended, port device's usage still can be 0 and
power off the port.
>
> Rafael
>
>

-- 
Best regards
Tianyu Lan
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [BUGFIX] PM: Fix active child counting when disabled and forbidden
From: Huang Ying @ 2012-11-14 13:35 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Alan Stern, linux-kernel, linux-pm
In-Reply-To: <2028148.7b9pFIymgn@vostro.rjw.lan>

On Wed, 2012-11-14 at 10:52 +0100, Rafael J. Wysocki wrote:
> On Wednesday, November 14, 2012 09:08:28 AM Huang Ying wrote:
> > On Tue, 2012-11-13 at 11:10 -0500, Alan Stern wrote:
> > > On Tue, 13 Nov 2012, Huang Ying wrote:
> > > 
> > > > > This is not quite right.  Consider a device that is in runtime suspend 
> > > > > when a system sleep starts.  When the system sleep ends, the device 
> > > > > will be resumed but the PM core will still think its state is 
> > > > > SUSPENDED.  The subsystem has to tell the PM core that the device is 
> > > > > now ACTIVE.  Currently, subsystems do this by calling 
> > > > > pm_runtime_disable, pm_runtime_set_active, pm_runtime_enable.  Under 
> > > > > your scheme this wouldn't work; the pm_runtime_set_active call would 
> > > > > fail because the device was !forbidden.
> > > > 
> > > > Thanks for your information.  For this specific situation, is it
> > > > possible to call pm_runtime_resume() or pm_request_resume() for the
> > > > device?
> > > 
> > > No, because the device already is at full power.  The subsystem just
> > > needs to tell the PM core that it is.
> > > 
> > > > > > PM.  Device can always work with full power.
> > > > > 
> > > > > It can't if the parent is in SUSPEND.  If necessary, the user can write 
> > > > > "on" to the parent's power/control attribute first.
> > > > 
> > > > Is it possible to call pm_runtime_set_active() for the parent if the
> > > > parent is disabled and SUSPENDED.
> > > 
> > > Doing that is possible, but it might not work.  The parent might
> > > actually be at low power; calling pm_runtime_set_active wouldn't change
> > > the physical power level.  Basically, it's not safe to assume anything
> > > about devices that are disabled for runtime PM.
> > > 
> > > > It appears that there is race condition between this and the
> > > > pm_runtime_disable, pm_runtime_set_active, pm_runtime_enable sequence
> > > > you mentioned ealier.
> > > > 
> > > > thread 1			thread 2
> > > > pm_runtime_disable
> > > > pm_runtime_set_active
> > > > 				pm_runtime_allow
> > > > 				  pm_runtime_set_suspended
> > > > pm_runtime_enable
> > > 
> > > This can't happen in the situation I described earlier because during
> > > system sleep transitions, no other user threads are allowed to run.  
> > > All of them except the one actually carrying out the transition are
> > > frozen.
> > 
> > Thanks for your kind explanation.
> > 
> > After talking with you, my feeling is that the disabled state is obscure
> > and error-prone.  So I suggest not to use it if possible.  Maybe we can
> > 
> >  - make changes suggested by Alan to make disabled state better.
> 
> What changes specifically do you mean to be precise?

I mean the following changes from Alan's email.

        pm_runtime_set_suspended should fail if dev->power.runtime_auto
        is clear.

        pm_runtime_forbid should call pm_runtime_set_active if
        dev->power.disable_depth > 0.  (This would run into a problem
        if the parent is suspended and disabled.  Maybe 
        pm_runtime_forbid should fail when this happens.)

For the second one, is it possible that the device is really in low
power state when pm_runtime_forbid is called?  That situation is hard to
deal with too.

> >  - use Rafael's solution to solve this specific issue, and avoid the
> > usage of disabled state here.
> 
> Well, I think that the PCI subsystem should just enable runtime PM for
> all devices upfront and keep it enabled going forward.
> 
> My patch is incomplete, however, because it doesn't deal with probe/remove
> correctly at this point (which Alan pointed out earlier in the thread).

Yes.

Best Regards,
Huang Ying



^ permalink raw reply

* Re: [PATCH 4/6 v4] arm highbank: add support for pl320 IPC
From: Rob Herring @ 2012-11-14 14:03 UTC (permalink / raw)
  To: Mark Langsdorf; +Cc: linux-kernel, cpufreq, linux-pm
In-Reply-To: <1352313166-28980-5-git-send-email-mark.langsdorf@calxeda.com>

On 11/07/2012 12:32 PM, Mark Langsdorf wrote:
> From: Rob Herring <rob.herring@calxeda.com>
> 
> The pl320 IPC allows for interprocessor communication between the highbank A9
> and the EnergyCore Management Engine. The pl320 implements a straightforward
> mailbox protocol.
> 
> Signed-off-by: Mark Langsdorf <mark.langsdorf@calxeda.com>
> Signed-off-by: Rob Herring <rob.herring@calxeda.com>
> ---
> Changes from v3, v2
> 	None
> Changes from v1
>         Removed erroneous changes for cpufreq Kconfig
> 
>  arch/arm/include/asm/pl320-ipc.h                |  20 ++

asm/hardware/ is probably more appropriate.

>  arch/arm/mach-highbank/Makefile                 |   2 +
>  arch/arm/mach-highbank/include/mach/pl320-ipc.h |  20 ++

Need to delete this file.

>  arch/arm/mach-highbank/pl320-ipc.c              | 232 ++++++++++++++++++++++++
>  4 files changed, 274 insertions(+)
>  create mode 100644 arch/arm/include/asm/pl320-ipc.h
>  create mode 100644 arch/arm/mach-highbank/include/mach/pl320-ipc.h
>  create mode 100644 arch/arm/mach-highbank/pl320-ipc.c
> 
> diff --git a/arch/arm/include/asm/pl320-ipc.h b/arch/arm/include/asm/pl320-ipc.h
> new file mode 100644
> index 0000000..a0e58ee
> --- /dev/null
> +++ b/arch/arm/include/asm/pl320-ipc.h
> @@ -0,0 +1,20 @@
> +/*
> + * Copyright 2010 Calxeda, Inc.

Update copyright.

> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +int ipc_call_fast(u32 *data);

We should get rid of fast and slow channels and just have a single tx
channel as it is all the same and we don't use the fast channel.

> +int ipc_call_slow(u32 *data);
> +
> +extern int pl320_ipc_register_notifier(struct notifier_block *nb);
> +extern int pl320_ipc_unregister_notifier(struct notifier_block *nb);
> diff --git a/arch/arm/mach-highbank/Makefile b/arch/arm/mach-highbank/Makefile
> index 3ec8bdd..b894708 100644
> --- a/arch/arm/mach-highbank/Makefile
> +++ b/arch/arm/mach-highbank/Makefile
> @@ -7,3 +7,5 @@ obj-$(CONFIG_DEBUG_HIGHBANK_UART)	+= lluart.o
>  obj-$(CONFIG_SMP)			+= platsmp.o
>  obj-$(CONFIG_HOTPLUG_CPU)		+= hotplug.o
>  obj-$(CONFIG_PM_SLEEP)			+= pm.o
> +
> +obj-y					+= pl320-ipc.o
> diff --git a/arch/arm/mach-highbank/include/mach/pl320-ipc.h b/arch/arm/mach-highbank/include/mach/pl320-ipc.h
> new file mode 100644
> index 0000000..a0e58ee
> --- /dev/null
> +++ b/arch/arm/mach-highbank/include/mach/pl320-ipc.h
> @@ -0,0 +1,20 @@
> +/*
> + * Copyright 2010 Calxeda, Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +int ipc_call_fast(u32 *data);
> +int ipc_call_slow(u32 *data);
> +
> +extern int pl320_ipc_register_notifier(struct notifier_block *nb);
> +extern int pl320_ipc_unregister_notifier(struct notifier_block *nb);
> diff --git a/arch/arm/mach-highbank/pl320-ipc.c b/arch/arm/mach-highbank/pl320-ipc.c
> new file mode 100644
> index 0000000..0eb92e4
> --- /dev/null
> +++ b/arch/arm/mach-highbank/pl320-ipc.c
> @@ -0,0 +1,232 @@
> +/*
> + * Copyright 2012 Calxeda, Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +#include <linux/types.h>
> +#include <linux/err.h>
> +#include <linux/delay.h>
> +#include <linux/export.h>
> +#include <linux/io.h>
> +#include <linux/interrupt.h>
> +#include <linux/completion.h>
> +#include <linux/mutex.h>
> +#include <linux/notifier.h>
> +#include <linux/spinlock.h>
> +#include <linux/device.h>
> +#include <linux/amba/bus.h>
> +
> +#include <asm/pl320-ipc.h>
> +
> +#define IPCMxSOURCE(m)		((m) * 0x40)
> +#define IPCMxDSET(m)		(((m) * 0x40) + 0x004)
> +#define IPCMxDCLEAR(m)		(((m) * 0x40) + 0x008)
> +#define IPCMxDSTATUS(m)		(((m) * 0x40) + 0x00C)
> +#define IPCMxMODE(m)		(((m) * 0x40) + 0x010)
> +#define IPCMxMSET(m)		(((m) * 0x40) + 0x014)
> +#define IPCMxMCLEAR(m)		(((m) * 0x40) + 0x018)
> +#define IPCMxMSTATUS(m)		(((m) * 0x40) + 0x01C)
> +#define IPCMxSEND(m)		(((m) * 0x40) + 0x020)
> +#define IPCMxDR(m, dr)		(((m) * 0x40) + ((dr) * 4) + 0x024)
> +
> +#define IPCMMIS(irq)		(((irq) * 8) + 0x800)
> +#define IPCMRIS(irq)		(((irq) * 8) + 0x804)
> +
> +#define MBOX_MASK(n)		(1 << (n))
> +#define IPC_FAST_MBOX		0
> +#define IPC_SLOW_MBOX		1
> +#define IPC_RX_MBOX		2
> +
> +#define CHAN_MASK(n)		(1 << (n))
> +#define A9_SOURCE		1
> +#define M3_SOURCE		0
> +
> +static void __iomem *ipc_base;
> +static int ipc_irq;
> +static DEFINE_SPINLOCK(ipc_m0_lock);
> +static DEFINE_MUTEX(ipc_m1_lock);
> +static DECLARE_COMPLETION(ipc_completion);
> +static ATOMIC_NOTIFIER_HEAD(ipc_notifier);
> +
> +static inline void set_destination(int source, int mbox)
> +{
> +	__raw_writel(CHAN_MASK(source), ipc_base + IPCMxDSET(mbox));
> +	__raw_writel(CHAN_MASK(source), ipc_base + IPCMxMSET(mbox));
> +}
> +
> +static inline void clear_destination(int source, int mbox)
> +{
> +	__raw_writel(CHAN_MASK(source), ipc_base + IPCMxDCLEAR(mbox));
> +	__raw_writel(CHAN_MASK(source), ipc_base + IPCMxMCLEAR(mbox));
> +}
> +
> +static void __ipc_send(int mbox, u32 *data)
> +{
> +	int i;
> +	for (i = 0; i < 7; i++)
> +		__raw_writel(data[i], ipc_base + IPCMxDR(mbox, i));
> +	__raw_writel(0x1, ipc_base + IPCMxSEND(mbox));
> +}
> +
> +static u32 __ipc_rcv(int mbox, u32 *data)
> +{
> +	int i;
> +	for (i = 0; i < 7; i++)
> +		data[i] = __raw_readl(ipc_base + IPCMxDR(mbox, i));
> +	return data[1];
> +}
> +
> +/* non-blocking implementation from the A9 side, interrupt safe in theory */
> +int ipc_call_fast(u32 *data)
> +{
> +	int timeout, ret;
> +
> +	spin_lock(&ipc_m0_lock);
> +
> +	__ipc_send(IPC_FAST_MBOX, data);
> +
> +	for (timeout = 500; timeout > 0; timeout--) {
> +		if (__raw_readl(ipc_base + IPCMxSEND(IPC_FAST_MBOX)) == 0x2)
> +			break;
> +		udelay(100);
> +	}
> +	if (timeout == 0) {
> +		ret = -ETIMEDOUT;
> +		goto out;
> +	}
> +
> +	ret = __ipc_rcv(IPC_FAST_MBOX, data);
> +out:
> +	__raw_writel(0, ipc_base + IPCMxSEND(IPC_FAST_MBOX));
> +	spin_unlock(&ipc_m0_lock);
> +	return ret;
> +}
> +EXPORT_SYMBOL(ipc_call_fast);
> +
> +/* blocking implmentation from the A9 side, not usuable in interrupts! */
> +int ipc_call_slow(u32 *data)
> +{
> +	int ret;
> +
> +	mutex_lock(&ipc_m1_lock);
> +
> +	init_completion(&ipc_completion);
> +	__ipc_send(IPC_SLOW_MBOX, data);
> +	ret = wait_for_completion_timeout(&ipc_completion,
> +					  msecs_to_jiffies(1000));
> +	if (ret == 0) {
> +		ret = -ETIMEDOUT;
> +		goto out;
> +	}
> +
> +	ret = __ipc_rcv(IPC_SLOW_MBOX, data);
> +out:
> +	mutex_unlock(&ipc_m1_lock);
> +	return ret;
> +}
> +EXPORT_SYMBOL(ipc_call_slow);
> +
> +irqreturn_t ipc_handler(int irq, void *dev)
> +{
> +	u32 irq_stat;
> +	u32 data[7];
> +
> +	irq_stat = __raw_readl(ipc_base + IPCMMIS(1));
> +	if (irq_stat & MBOX_MASK(IPC_SLOW_MBOX)) {
> +		__raw_writel(0, ipc_base + IPCMxSEND(IPC_SLOW_MBOX));
> +		complete(&ipc_completion);
> +	}
> +	if (irq_stat & MBOX_MASK(IPC_RX_MBOX)) {
> +		__ipc_rcv(IPC_RX_MBOX, data);
> +		atomic_notifier_call_chain(&ipc_notifier, data[0], data + 1);
> +		__raw_writel(2, ipc_base + IPCMxSEND(IPC_RX_MBOX));
> +	}
> +
> +	return IRQ_HANDLED;
> +}
> +
> +int pl320_ipc_register_notifier(struct notifier_block *nb)
> +{
> +	return atomic_notifier_chain_register(&ipc_notifier, nb);
> +}
> +
> +int pl320_ipc_unregister_notifier(struct notifier_block *nb)
> +{
> +	return atomic_notifier_chain_unregister(&ipc_notifier, nb);
> +}
> +
> +static int __devinit pl320_probe(struct amba_device *adev,
> +				const struct amba_id *id)
> +{
> +	int ret;
> +
> +	ipc_base = ioremap(adev->res.start, resource_size(&adev->res));
> +	if (ipc_base == NULL)
> +		return -ENOMEM;
> +
> +	__raw_writel(0, ipc_base + IPCMxSEND(IPC_FAST_MBOX));
> +	__raw_writel(0, ipc_base + IPCMxSEND(IPC_SLOW_MBOX));
> +
> +	ipc_irq = adev->irq[0];
> +	ret = request_irq(ipc_irq, ipc_handler, 0, dev_name(&adev->dev), NULL);
> +	if (ret < 0)
> +		goto err;
> +
> +	/* Init fast mailbox */
> +	__raw_writel(CHAN_MASK(A9_SOURCE),
> +			ipc_base + IPCMxSOURCE(IPC_FAST_MBOX));
> +	set_destination(M3_SOURCE, IPC_FAST_MBOX);
> +
> +	/* Init slow mailbox */
> +	__raw_writel(CHAN_MASK(A9_SOURCE),
> +			ipc_base + IPCMxSOURCE(IPC_SLOW_MBOX));
> +	__raw_writel(CHAN_MASK(M3_SOURCE),
> +			ipc_base + IPCMxDSET(IPC_SLOW_MBOX));
> +	__raw_writel(CHAN_MASK(M3_SOURCE) | CHAN_MASK(A9_SOURCE),
> +		     ipc_base + IPCMxMSET(IPC_SLOW_MBOX));
> +
> +	/* Init receive mailbox */
> +	__raw_writel(CHAN_MASK(M3_SOURCE),
> +			ipc_base + IPCMxSOURCE(IPC_RX_MBOX));
> +	__raw_writel(CHAN_MASK(A9_SOURCE),
> +			ipc_base + IPCMxDSET(IPC_RX_MBOX));
> +	__raw_writel(CHAN_MASK(M3_SOURCE) | CHAN_MASK(A9_SOURCE),
> +		     ipc_base + IPCMxMSET(IPC_RX_MBOX));
> +
> +	return 0;
> +err:
> +	iounmap(ipc_base);
> +	return ret;
> +}
> +
> +static struct amba_id pl320_ids[] = {
> +	{
> +		.id	= 0x00041320,
> +		.mask	= 0x000fffff,
> +	},
> +	{ 0, 0 },
> +};
> +
> +static struct amba_driver pl320_driver = {
> +	.drv = {
> +		.name	= "pl320",
> +	},
> +	.id_table	= pl320_ids,
> +	.probe		= pl320_probe,
> +};
> +
> +static int __init ipc_init(void)
> +{
> +	return amba_driver_register(&pl320_driver);
> +}
> +module_init(ipc_init);
> 

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox