Linux Power Management development

Linux Power Management development
 help / color / mirror / Atom feed

* RE: [RFC PATCH 2/7] Thermal: Create zone level APIs
From: R, Durgadoss @ 2012-12-14  5:10 UTC (permalink / raw)
  To: Hongbo Zhang
  Cc: Zhang, Rui, linux-pm@vger.kernel.org, wni@nvidia.com,
	eduardo.valentin@ti.com, amit.kachhap@linaro.org,
	sachin.kamat@linaro.org
In-Reply-To: <CAJLyvQxp9A_PT4HDhV9NUizDS9Ax+ibAb8q-j48pUOt8221zOQ@mail.gmail.com>

Hi,


> -----Original Message-----
> From: Hongbo Zhang [mailto:hongbo.zhang@linaro.org]
> Sent: Friday, December 14, 2012 9:41 AM
> To: R, Durgadoss
> Cc: Zhang, Rui; linux-pm@vger.kernel.org; wni@nvidia.com;
> eduardo.valentin@ti.com; amit.kachhap@linaro.org;
> sachin.kamat@linaro.org
> Subject: Re: [RFC PATCH 2/7] Thermal: Create zone level APIs
> 
> On 13 December 2012 23:00, R, Durgadoss <durgadoss.r@intel.com> wrote:
> > Hi,
> >
> >
> >> -----Original Message-----
> >> From: linux-pm-owner@vger.kernel.org [mailto:linux-pm-
> >> owner@vger.kernel.org] On Behalf Of Hongbo Zhang
> >> Sent: Thursday, December 13, 2012 11:53 AM
> >> To: R, Durgadoss
> >> Cc: Zhang, Rui; linux-pm@vger.kernel.org; wni@nvidia.com;
> >> eduardo.valentin@ti.com; amit.kachhap@linaro.org;
> >> sachin.kamat@linaro.org
> >> Subject: Re: [RFC PATCH 2/7] Thermal: Create zone level APIs
> >>
> >> On 17 November 2012 18:45, Durgadoss R <durgadoss.r@intel.com>
> wrote:
> >> > This patch adds a new thermal_zone structure to
> >> > thermal.h. Also, adds zone level APIs to the thermal
> >> > framework.
> >> >
> >> > Signed-off-by: Durgadoss R <durgadoss.r@intel.com>
> >
> > [A big cut.]
> >
> >> >  #define to_cooling_device(_dev)        \
> >> >         container_of(_dev, struct thermal_cooling_device, device)
> >> > @@ -1557,6 +1615,146 @@ static void remove_trip_attrs(struct
> >> thermal_zone_device *tz)
> >> >         kfree(tz->trip_hyst_attrs);
> >> >  }
> >> >
> >> > +struct thermal_zone *create_thermal_zone(const char *name, void
> >> *devdata)
> >> Durgaross,
> >> Since more sensors can be added into one thermal zone, think about
> >> this situation that every sensor is in itself file separately:
> >
> > Yes, we thought about this scenario.
> > The way the new framework is designed is like below:
> >
> > The thermal sensor source file can be generic (can be any sensor driver, in
> any
> > subsystem). This driver will use the sensor APIs and register with thermal
> > framework to participate in platform Thermal management. This does not
> > (and should not) know about which zone it belongs to, or any other
> information
> > about platform thermals. A sensor driver is a standalone piece of code,
> which
> > can optionally register with thermal framework.
> >
> > However, for any platform, there should be a platformX_thermal.c file,
> > which will know about the platform thermal characteristics (like how many
> sensors,
> > zones, cooling devices, etc.. And how they are related to each other i.e the
> > mapping information). Only in this file, the zone level APIs should be used,
> in
> > which case the file will have all information required to attach various
> sensors
> > to a particular zone.
> I guess you will remove the bind callback, I suggest to remove it if
> you don't have this plan.

Yes, We will remove this.

> A match callback or/and mapping data can be used to identify the
> binding relationship, the current binding process is abundant and
> strange a bit.

Yes, the mapping data can be used.

> >
> > This way, we can have one platform level thermal file, which can support
> > multiple platforms (may be)using the same set of sensors (but)binded in a
> different way.
> > This file can get the platform thermal information through Firmware,
> > ACPI tables, device tree etc.
> >
> > Unfortunately, today we don't have many drivers that can be clearly
> differentiated
> > as 'sensor_file.c' and 'platform_thermal_file.c'. But very soon we will
> need/have.
> > The reason I am saying this is because we are seeing a lot of chip drivers,
> > starting to use thermal framework, and we should keep it really light-
> weight
> > for them to do so.
> >
> > An Example: drivers/hwmon/emc1403.c - a generic thermal chip driver
> > In one platform this sensor can belong to 'ZoneA' and in another the
> > same can belong to 'ZoneB'. But, emc1403.c does not really care about
> > where does it belong. It just reports temperature.
> > [Okay, I just picked an example, it doesn't register with framework,
> > as of today :-)]

> Clear about the above information.

Thank you :-)

> This is a good design, I believe more and more drivers will use this.
> But much efforts are needed.

Yes, we need quite some code modifications.

> >
> >>
> >> sensorA_file.c:
> >> if common_zone_for_AB not there;
> >> create common_zone_for_AB;
> >
> > This create_zone should not even be in sensorA_file.c.
> > It should be moved to platform level file.
> >
> >> add sensorA into common_zone_for_AB;
> >>
> >> sensorB_file.c:
> >> if common_zone_for_AB not there;
> >> create common_zone_for_AB;
> >> add sensorB into common_zone_for_AB;
> >>
> >
> > Same here..
> >
> >> But how to check dedicate thermal zone is created or not? we are
> >> lacking of this interface.
> >> Here are two ways to achieve this from my point of view:
> >> a)
> >> add interface get_thermal_zone_byname(const char *name) which will
> >> walk through the thermal_zone_list.
> >
> > However, (despite this problem we are talking about)
> > we can introduce this API, if needed.
> >
> >> b)
> >> add one more parameter for current interface, like this
> >> create_thermal_zone(const char *name, void *devdata, bool reuse)
> >> if reuse==ture {
> >> if thermal zone already created, return it;
> >> else create thermal zone;
> >> } else {
> >> if thermal zone already created, return error;
> >> else create thermal zone;
> >> }
> >>
> >> I prefer a), how do you think about it?
> >
> > I know I wrote a lengthy explanation. Let me know if it makes sense.
> > If it does, then probably I will add it to our Documentation file.
> Yes, It really makes sense.
> Thanks.

Okay, Then I am adding this to our Documentation file.

Thanks,
Durga

^ permalink raw reply

* Re: [RFC PATCH 2/7] Thermal: Create zone level APIs
From: Hongbo Zhang @ 2012-12-14  4:10 UTC (permalink / raw)
  To: R, Durgadoss
  Cc: Zhang, Rui, linux-pm@vger.kernel.org, wni@nvidia.com,
	eduardo.valentin@ti.com, amit.kachhap@linaro.org,
	sachin.kamat@linaro.org
In-Reply-To: <4D68720C2E767A4AA6A8796D42C8EB592487C8@BGSMSX101.gar.corp.intel.com>

On 13 December 2012 23:00, R, Durgadoss <durgadoss.r@intel.com> wrote:
> Hi,
>
>
>> -----Original Message-----
>> From: linux-pm-owner@vger.kernel.org [mailto:linux-pm-
>> owner@vger.kernel.org] On Behalf Of Hongbo Zhang
>> Sent: Thursday, December 13, 2012 11:53 AM
>> To: R, Durgadoss
>> Cc: Zhang, Rui; linux-pm@vger.kernel.org; wni@nvidia.com;
>> eduardo.valentin@ti.com; amit.kachhap@linaro.org;
>> sachin.kamat@linaro.org
>> Subject: Re: [RFC PATCH 2/7] Thermal: Create zone level APIs
>>
>> On 17 November 2012 18:45, Durgadoss R <durgadoss.r@intel.com> wrote:
>> > This patch adds a new thermal_zone structure to
>> > thermal.h. Also, adds zone level APIs to the thermal
>> > framework.
>> >
>> > Signed-off-by: Durgadoss R <durgadoss.r@intel.com>
>
> [A big cut.]
>
>> >  #define to_cooling_device(_dev)        \
>> >         container_of(_dev, struct thermal_cooling_device, device)
>> > @@ -1557,6 +1615,146 @@ static void remove_trip_attrs(struct
>> thermal_zone_device *tz)
>> >         kfree(tz->trip_hyst_attrs);
>> >  }
>> >
>> > +struct thermal_zone *create_thermal_zone(const char *name, void
>> *devdata)
>> Durgaross,
>> Since more sensors can be added into one thermal zone, think about
>> this situation that every sensor is in itself file separately:
>
> Yes, we thought about this scenario.
> The way the new framework is designed is like below:
>
> The thermal sensor source file can be generic (can be any sensor driver, in any
> subsystem). This driver will use the sensor APIs and register with thermal
> framework to participate in platform Thermal management. This does not
> (and should not) know about which zone it belongs to, or any other information
> about platform thermals. A sensor driver is a standalone piece of code, which
> can optionally register with thermal framework.
>
> However, for any platform, there should be a platformX_thermal.c file,
> which will know about the platform thermal characteristics (like how many sensors,
> zones, cooling devices, etc.. And how they are related to each other i.e the
> mapping information). Only in this file, the zone level APIs should be used, in
> which case the file will have all information required to attach various sensors
> to a particular zone.
I guess you will remove the bind callback, I suggest to remove it if
you don't have this plan.
A match callback or/and mapping data can be used to identify the
binding relationship, the current binding process is abundant and
strange a bit.
>
> This way, we can have one platform level thermal file, which can support
> multiple platforms (may be)using the same set of sensors (but)binded in a different way.
> This file can get the platform thermal information through Firmware,
> ACPI tables, device tree etc.
>
> Unfortunately, today we don't have many drivers that can be clearly differentiated
> as 'sensor_file.c' and 'platform_thermal_file.c'. But very soon we will need/have.
> The reason I am saying this is because we are seeing a lot of chip drivers,
> starting to use thermal framework, and we should keep it really light-weight
> for them to do so.
>
> An Example: drivers/hwmon/emc1403.c - a generic thermal chip driver
> In one platform this sensor can belong to 'ZoneA' and in another the
> same can belong to 'ZoneB'. But, emc1403.c does not really care about
> where does it belong. It just reports temperature.
> [Okay, I just picked an example, it doesn't register with framework,
> as of today :-)]
Clear about the above information.
This is a good design, I believe more and more drivers will use this.
But much efforts are needed.
>
>>
>> sensorA_file.c:
>> if common_zone_for_AB not there;
>> create common_zone_for_AB;
>
> This create_zone should not even be in sensorA_file.c.
> It should be moved to platform level file.
>
>> add sensorA into common_zone_for_AB;
>>
>> sensorB_file.c:
>> if common_zone_for_AB not there;
>> create common_zone_for_AB;
>> add sensorB into common_zone_for_AB;
>>
>
> Same here..
>
>> But how to check dedicate thermal zone is created or not? we are
>> lacking of this interface.
>> Here are two ways to achieve this from my point of view:
>> a)
>> add interface get_thermal_zone_byname(const char *name) which will
>> walk through the thermal_zone_list.
>
> However, (despite this problem we are talking about)
> we can introduce this API, if needed.
>
>> b)
>> add one more parameter for current interface, like this
>> create_thermal_zone(const char *name, void *devdata, bool reuse)
>> if reuse==ture {
>> if thermal zone already created, return it;
>> else create thermal zone;
>> } else {
>> if thermal zone already created, return error;
>> else create thermal zone;
>> }
>>
>> I prefer a), how do you think about it?
>
> I know I wrote a lengthy explanation. Let me know if it makes sense.
> If it does, then probably I will add it to our Documentation file.
Yes, It really makes sense.
Thanks.
>
> Thanks,
> Durga

^ permalink raw reply

* [PATCH] PCIe/PM: Do not suspend port if any subordinate device need PME polling
From: Huang Ying @ 2012-12-14  2:52 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-kernel, linux-pci, linux-pm, Rafael J. Wysocki, Huang Ying

In

  http://www.mail-archive.com/linux-usb@vger.kernel.org/msg07976.html

Ulrich reported that his USB3 cardreader does not work reliably when
connected to the USB3 port.  It turns out that USB3 controller failed
to be waken up when plugging in the USB3 cardreader.  Further
experiment found that the USB3 host controller can only be waken up
via polling, while not via PME interrupt.  But if the PCIe port that
the USB3 host controller is connected is suspended, we can not poll
the USB3 host controller because its config space is not accessible if
the PCIe port is put into low power state.

To solve the issue, the PCIe port will not be suspended if any
subordinate device need PME polling.

Reported-by: Ulrich Eckhardt <usb@uli-eckhardt.de>
Signed-off-by: Huang Ying <ying.huang@intel.com>
Tested-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Cc: stable@vger.kernel.org	# 3.6+
---
 drivers/pci/pcie/portdrv_pci.c |   18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

--- a/drivers/pci/pcie/portdrv_pci.c
+++ b/drivers/pci/pcie/portdrv_pci.c
@@ -134,10 +134,26 @@ static int pcie_port_runtime_resume(stru
 	return 0;
 }

+static int pci_dev_pme_poll(struct pci_dev *pdev, void *data)
+{
+	int *pme_poll = data;
+	*pme_poll = *pme_poll || pdev->pme_poll;
+	return 0;
+}
+
 static int pcie_port_runtime_idle(struct device *dev)
 {
+	struct pci_dev *pdev = to_pci_dev(dev);
+	int pme_poll = false;
+
+	/*
+	 * If any subordinate device needs pme poll, we should keep
+	 * the port in D0, because we need port in D0 to poll it.
+	 */
+	pci_walk_bus(pdev->subordinate, pci_dev_pme_poll, &pme_poll);
 	/* Delay for a short while to prevent too frequent suspend/resume */
-	pm_schedule_suspend(dev, 10);
+	if (!pme_poll)
+		pm_schedule_suspend(dev, 10);
 	return -EBUSY;
 }
 #else

^ permalink raw reply

* Re: [RFC PATCH v4 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
From: Srivatsa S. Bhat @ 2012-12-13 16:32 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung,
	vincent.guittot, tj, sbw, amit.kucheria, rostedt, rjw, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel
In-Reply-To: <20121213161709.GA19125@redhat.com>

On 12/13/2012 09:47 PM, Oleg Nesterov wrote:
> On 12/13, Srivatsa S. Bhat wrote:
>>
>> On 12/13/2012 12:42 AM, Srivatsa S. Bhat wrote:
>>>
>>> Even I don't spot anything wrong with it. But I'll give it some more
>>> thought..
>>
>> Since an interrupt handler can also run get_online_cpus_atomic(), we
>> cannot use the __this_cpu_* versions for modifying reader_percpu_refcnt,
>> right?
> 
> Hmm. I thought that __this_cpu_* must be safe under preempt_disable().
> IOW, I thought that, say, this_cpu_inc() is "equal" to preempt_disable +
> __this_cpu_inc() correctness-wise.
>
> And. I thought that this_cpu_inc() is safe wrt interrupt, like local_t.
> 
> But when I try to read the comments percpu.h, I am starting to think that
> even this_cpu_inc() is not safe if irq handler can do the same?
> 

The comment seems to say that its not safe wrt interrupts. But looking at
the code in include/linux/percpu.h, IIUC, that is true only about
this_cpu_read() because it only disables preemption.

However, this_cpu_inc() looks safe wrt interrupts because it wraps the
increment within raw_local_irqsave()/restore().

> Confused...
> 
> I am shy to ask... will, say, DEFINE_PER_CPU(local_t) and
> local_inc(__this_cpu_ptr(...)) work??
> 
>> But still, this scheme is better, because the reader doesn't have to spin
>> on the read_lock() with interrupts disabled.
> 
> Yes, but my main concern is that irq_disable/enable itself is not that cheap.
> 

Regards,
Srivatsa S. Bhat


^ permalink raw reply

* Re: [RFC PATCH v4 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
From: Tejun Heo @ 2012-12-13 16:32 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Srivatsa S. Bhat, tglx, peterz, paulmck, rusty, mingo, akpm,
	namhyung, vincent.guittot, sbw, amit.kucheria, rostedt, rjw,
	wangyun, xiaoguangrong, nikunj, linux-pm, linux-kernel
In-Reply-To: <20121213161709.GA19125@redhat.com>

Hello, Oleg.

On Thu, Dec 13, 2012 at 05:17:09PM +0100, Oleg Nesterov wrote:
> Hmm. I thought that __this_cpu_* must be safe under preempt_disable().
> IOW, I thought that, say, this_cpu_inc() is "equal" to preempt_disable +
> __this_cpu_inc() correctness-wise.

this_cpu_inc() equals local_irq_save() + __this_cpu_inc().

> And. I thought that this_cpu_inc() is safe wrt interrupt, like local_t.

Yes, it is safe.

> But when I try to read the comments percpu.h, I am starting to think that
> even this_cpu_inc() is not safe if irq handler can do the same?
> 
> Confused...

Yeah, the comment is confusing and the way these macros are defined
doesn't help.  There used to be three variants and it looks like we
didn't update the comment while removing the preempt safe ones.  Gotta
clean those up.  Anyways, yes, this_cpu_*() are safe against irqs.

Thanks.

-- 
tejun

^ permalink raw reply

* Re: [RFC PATCH v4 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
From: Oleg Nesterov @ 2012-12-13 16:17 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung,
	vincent.guittot, tj, sbw, amit.kucheria, rostedt, rjw, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel
In-Reply-To: <50C9F38F.3020005@linux.vnet.ibm.com>

On 12/13, Srivatsa S. Bhat wrote:
>
> On 12/13/2012 12:42 AM, Srivatsa S. Bhat wrote:
> >
> > Even I don't spot anything wrong with it. But I'll give it some more
> > thought..
>
> Since an interrupt handler can also run get_online_cpus_atomic(), we
> cannot use the __this_cpu_* versions for modifying reader_percpu_refcnt,
> right?

Hmm. I thought that __this_cpu_* must be safe under preempt_disable().
IOW, I thought that, say, this_cpu_inc() is "equal" to preempt_disable +
__this_cpu_inc() correctness-wise.

And. I thought that this_cpu_inc() is safe wrt interrupt, like local_t.

But when I try to read the comments percpu.h, I am starting to think that
even this_cpu_inc() is not safe if irq handler can do the same?

Confused...

I am shy to ask... will, say, DEFINE_PER_CPU(local_t) and
local_inc(__this_cpu_ptr(...)) work??

> But still, this scheme is better, because the reader doesn't have to spin
> on the read_lock() with interrupts disabled.

Yes, but my main concern is that irq_disable/enable itself is not that cheap.

Oleg.

^ permalink raw reply

* Re: [RFC PATCH v4 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
From: Srivatsa S. Bhat @ 2012-12-13 15:26 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung,
	vincent.guittot, tj, sbw, amit.kucheria, rostedt, rjw, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel
In-Reply-To: <50C8D739.6030903@linux.vnet.ibm.com>

On 12/13/2012 12:42 AM, Srivatsa S. Bhat wrote:
> On 12/13/2012 12:18 AM, Oleg Nesterov wrote:
>> On 12/13, Srivatsa S. Bhat wrote:
>>>
>>> On 12/12/2012 11:32 PM, Oleg Nesterov wrote:
>>>> And _perhaps_ get_ can avoid it too?
>>>>
>>>> I didn't really try to think, probably this is not right, but can't
>>>> something like this work?
>>>>
>>>> 	#define XXXX	(1 << 16)
>>>> 	#define MASK	(XXXX -1)
>>>>
>>>> 	void get_online_cpus_atomic(void)
>>>> 	{
>>>> 		preempt_disable();
>>>>
>>>> 		// only for writer
>>>> 		__this_cpu_add(reader_percpu_refcnt, XXXX);
>>>>
>>>> 		if (__this_cpu_read(reader_percpu_refcnt) & MASK) {
>>>> 			__this_cpu_inc(reader_percpu_refcnt);
>>>> 		} else {
>>>> 			smp_wmb();
>>>> 			if (writer_active()) {
>>>> 				...
>>>> 			}
>>>> 		}
>>>>
>>>> 		__this_cpu_dec(reader_percpu_refcnt, XXXX);
>>>> 	}
>>>>
>>>
>>> Sorry, may be I'm too blind to see, but I didn't understand the logic
>>> of how the mask helps us avoid disabling interrupts..
>>
>> Why do we need cli/sti at all? We should prevent the following race:
>>
>> 	- the writer already holds hotplug_rwlock, so get_ must not
>> 	  succeed.
>>
>> 	- the new reader comes, it increments reader_percpu_refcnt,
>> 	  but before it checks writer_active() ...
>>
>> 	- irq handler does get_online_cpus_atomic() and sees
>> 	  reader_nested_percpu() == T, so it simply increments
>> 	  reader_percpu_refcnt and succeeds.
>>
>> OTOH, why do we need to increment reader_percpu_refcnt the counter
>> in advance? To ensure that either we see writer_active() or the
>> writer should see reader_percpu_refcnt != 0 (and that is why they
>> should write/read in reverse order).
>>
>> The code above tries to avoid this race using the lower 16 bits
>> as a "nested-counter", and the upper bits to avoid the race with
>> the writer.
>>
>> 	// only for writer
>> 	__this_cpu_add(reader_percpu_refcnt, XXXX);
>>
>> If irq comes and does get_online_cpus_atomic(), it won't be confused
>> by __this_cpu_add(XXXX), it will check the lower bits and switch to
>> the "slow path".
>>
> 
> This is a very clever scheme indeed! :-) Thanks a lot for explaining
> it in detail.
> 
>>
>> But once again, so far I didn't really try to think. It is quite
>> possible I missed something.
>>
> 
> Even I don't spot anything wrong with it. But I'll give it some more
> thought..

Since an interrupt handler can also run get_online_cpus_atomic(), we
cannot use the __this_cpu_* versions for modifying reader_percpu_refcnt,
right?

To maintain the integrity of the update itself, we will have to use the
this_cpu_* variant, which basically plays spoil-sport on this whole
scheme... :-(

But still, this scheme is better, because the reader doesn't have to spin
on the read_lock() with interrupts disabled. That way, interrupt handlers
that are not hotplug readers can continue to run on this CPU while taking
another CPU offline.

Regards,
Srivatsa S. Bhat


^ permalink raw reply

* RE: [RFC PATCH 2/7] Thermal: Create zone level APIs
From: R, Durgadoss @ 2012-12-13 15:00 UTC (permalink / raw)
  To: Hongbo Zhang
  Cc: Zhang, Rui, linux-pm@vger.kernel.org, wni@nvidia.com,
	eduardo.valentin@ti.com, amit.kachhap@linaro.org,
	sachin.kamat@linaro.org
In-Reply-To: <CAJLyvQzB6nGayefJM5-Qak7XxG_yPGFEPc6gv=j8cPrtUjP=Pg@mail.gmail.com>

Hi,

> -----Original Message-----
> From: linux-pm-owner@vger.kernel.org [mailto:linux-pm-
> owner@vger.kernel.org] On Behalf Of Hongbo Zhang
> Sent: Thursday, December 13, 2012 11:53 AM
> To: R, Durgadoss
> Cc: Zhang, Rui; linux-pm@vger.kernel.org; wni@nvidia.com;
> eduardo.valentin@ti.com; amit.kachhap@linaro.org;
> sachin.kamat@linaro.org
> Subject: Re: [RFC PATCH 2/7] Thermal: Create zone level APIs
> 
> On 17 November 2012 18:45, Durgadoss R <durgadoss.r@intel.com> wrote:
> > This patch adds a new thermal_zone structure to
> > thermal.h. Also, adds zone level APIs to the thermal
> > framework.
> >
> > Signed-off-by: Durgadoss R <durgadoss.r@intel.com>

[A big cut.]

> >  #define to_cooling_device(_dev)        \
> >         container_of(_dev, struct thermal_cooling_device, device)
> > @@ -1557,6 +1615,146 @@ static void remove_trip_attrs(struct
> thermal_zone_device *tz)
> >         kfree(tz->trip_hyst_attrs);
> >  }
> >
> > +struct thermal_zone *create_thermal_zone(const char *name, void
> *devdata)
> Durgaross,
> Since more sensors can be added into one thermal zone, think about
> this situation that every sensor is in itself file separately:

Yes, we thought about this scenario.
The way the new framework is designed is like below:

The thermal sensor source file can be generic (can be any sensor driver, in any
subsystem). This driver will use the sensor APIs and register with thermal
framework to participate in platform Thermal management. This does not
(and should not) know about which zone it belongs to, or any other information
about platform thermals. A sensor driver is a standalone piece of code, which
can optionally register with thermal framework.

However, for any platform, there should be a platformX_thermal.c file,
which will know about the platform thermal characteristics (like how many sensors,
zones, cooling devices, etc.. And how they are related to each other i.e the
mapping information). Only in this file, the zone level APIs should be used, in
which case the file will have all information required to attach various sensors
to a particular zone.

This way, we can have one platform level thermal file, which can support
multiple platforms (may be)using the same set of sensors (but)binded in a different way.
This file can get the platform thermal information through Firmware,
ACPI tables, device tree etc.

Unfortunately, today we don't have many drivers that can be clearly differentiated
as 'sensor_file.c' and 'platform_thermal_file.c'. But very soon we will need/have.
The reason I am saying this is because we are seeing a lot of chip drivers,
starting to use thermal framework, and we should keep it really light-weight
for them to do so.

An Example: drivers/hwmon/emc1403.c - a generic thermal chip driver
In one platform this sensor can belong to 'ZoneA' and in another the
same can belong to 'ZoneB'. But, emc1403.c does not really care about
where does it belong. It just reports temperature.
[Okay, I just picked an example, it doesn't register with framework,
as of today :-)]

> 
> sensorA_file.c:
> if common_zone_for_AB not there;
> create common_zone_for_AB;

This create_zone should not even be in sensorA_file.c.
It should be moved to platform level file.

> add sensorA into common_zone_for_AB;
> 
> sensorB_file.c:
> if common_zone_for_AB not there;
> create common_zone_for_AB;
> add sensorB into common_zone_for_AB;
> 

Same here..

> But how to check dedicate thermal zone is created or not? we are
> lacking of this interface.
> Here are two ways to achieve this from my point of view:
> a)
> add interface get_thermal_zone_byname(const char *name) which will
> walk through the thermal_zone_list.

However, (despite this problem we are talking about)
we can introduce this API, if needed.

> b)
> add one more parameter for current interface, like this
> create_thermal_zone(const char *name, void *devdata, bool reuse)
> if reuse==ture {
> if thermal zone already created, return it;
> else create thermal zone;
> } else {
> if thermal zone already created, return error;
> else create thermal zone;
> }
> 
> I prefer a), how do you think about it?

I know I wrote a lengthy explanation. Let me know if it makes sense.
If it does, then probably I will add it to our Documentation file.

Thanks,
Durga

^ permalink raw reply

* Re: [RESEND PATCH 0/5] cpufreq: db8500: Rename driver and update some parts
From: Ulf Hansson @ 2012-12-13  9:20 UTC (permalink / raw)
  To: Mike Turquette
  Cc: Ulf Hansson, Rafael J. Wysocki, cpufreq, linux-pm@vger.kernel.org,
	linux-arm-kernel, Lee Jones, Linus Walleij, Rickard Andersson,
	Jonas Aberg, Vincent Guittot, Philippe Begnic
In-Reply-To: <CAPtuhTgcmdxtG9B6L5NMyLCrhQh2o0DFkq47e_YCMoBzVhaF=g@mail.gmail.com>

On 12 December 2012 21:56, Mike Turquette <mturquette@linaro.org> wrote:
> On Mon, Dec 10, 2012 at 7:25 AM, Ulf Hansson <ulf.hansson@stericsson.com> wrote:
>> From: Ulf Hansson <ulf.hansson@linaro.org>
>>
>> This patchset starts by renaming the db8500 cpufreq driver to a more generic
>> name. There are new variants which rely on it too, so instead we give it a
>> family name of dbx500.
>>
>> On top of that a fixup patch for initialization of the driver and some minor
>> cleanup patches are included as well.
>>
>> These patches can be material for 3.8.
>>
>> Since some patches for the db8500 cpufreq driver has recently been merged
>> through Mike Turquette's clk tree, I decided to base these patches on top
>> of the clk-next branch. So I hope Mike is fine by merging these patches
>> through his tree.
>>
>> So I will try to collect ACKs from Rafael for the cpufreq patches. The
>> mfd patch has already been ACKED by Samuel.
>>
>
> Ulf,
>
> Just FYI clk-next has been pulled by Linus for 3.8.  I can still take
> this branch once Rafael ACKs the CPUfreq bits if you want, but I think
> you have more options now that clk-next is merged in.  It's up to you.

I think the easiest way forward (at least for me :-) ) would be if we
could send them as for 3.8 rcs. Is that possible for you Mike?

>
> Regards,
> Mike

Thanks!

Ulf Hansson

^ permalink raw reply

* Re: [PATCH] cpuidle - remove the power_specified field in the driver
From: Daniel Lezcano @ 2012-12-13  7:30 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Julius Werner, Francesco Lavra, linux-pm, Deepthi Dharwar,
	Trinabh Gupta, Lists Linaro-dev, len.brown, linux-kernel,
	Andrew Morton, Sameer Nanda
In-Reply-To: <2077776.UzqC1zVIVr@vostro.rjw.lan>

On 12/12/2012 11:42 PM, Rafael J. Wysocki wrote:
> On Wednesday, December 12, 2012 09:00:53 PM Daniel Lezcano wrote:
>> On 12/12/2012 07:50 PM, Julius Werner wrote:
>>> Thanks again for making this happen, Daniel. I like this version,
>>> except for the small nitpick that I still think it would make sense to
>>> also turn the loop in menu.c around. How about something like this:
>>>
>>>        for (i = drv->state_count - 1; i >= CPUIDLE_DRIVER_STATE_START; i++) {
>>>                 struct cpuidle_state *s = &drv->states[i];
>>>                 if (!s->disable && s->exit_latency <= latency_req &&
>>>                     s->target_residency <= data->predicted_us &&
>>>                     s->exit_latency * multiplier <= data->predicted_us) {
>>>                         data->last_state_idx = i;
>>>                         data->exit_us = s->exit_latency;
>>>                         break;
>>>                 }
>>>         }
>>
>> Actually I was planning to do that in a separate patch.
> 
> Can you submit that second patch too, please, so that people don't have to
> wonder?

Sure.


-- 
 <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

^ permalink raw reply

* Re: [RFC PATCH 2/7] Thermal: Create zone level APIs
From: Hongbo Zhang @ 2012-12-13  6:23 UTC (permalink / raw)
  To: Durgadoss R
  Cc: rui.zhang, linux-pm, wni, eduardo.valentin, amit.kachhap,
	sachin.kamat
In-Reply-To: <1353149158-19102-3-git-send-email-durgadoss.r@intel.com>

On 17 November 2012 18:45, Durgadoss R <durgadoss.r@intel.com> wrote:
> This patch adds a new thermal_zone structure to
> thermal.h. Also, adds zone level APIs to the thermal
> framework.
>
> Signed-off-by: Durgadoss R <durgadoss.r@intel.com>
> ---
>  drivers/thermal/thermal_sys.c |  206 +++++++++++++++++++++++++++++++++++++++++
>  include/linux/thermal.h       |   25 +++++
>  2 files changed, 231 insertions(+)
>
> diff --git a/drivers/thermal/thermal_sys.c b/drivers/thermal/thermal_sys.c
> index e726c8b..9d386d7 100644
> --- a/drivers/thermal/thermal_sys.c
> +++ b/drivers/thermal/thermal_sys.c
> @@ -44,19 +44,28 @@ MODULE_DESCRIPTION("Generic thermal management sysfs support");
>  MODULE_LICENSE("GPL");
>
>  static DEFINE_IDR(thermal_tz_idr);
> +static DEFINE_IDR(thermal_zone_idr);
>  static DEFINE_IDR(thermal_cdev_idr);
>  static DEFINE_IDR(thermal_sensor_idr);
>  static DEFINE_MUTEX(thermal_idr_lock);
>
>  static LIST_HEAD(thermal_tz_list);
>  static LIST_HEAD(thermal_sensor_list);
> +static LIST_HEAD(thermal_zone_list);
>  static LIST_HEAD(thermal_cdev_list);
>  static LIST_HEAD(thermal_governor_list);
>
>  static DEFINE_MUTEX(thermal_list_lock);
>  static DEFINE_MUTEX(ts_list_lock);
> +static DEFINE_MUTEX(tz_list_lock);
>  static DEFINE_MUTEX(thermal_governor_lock);
>
> +#define for_each_thermal_sensor(pos) \
> +       list_for_each_entry(pos, &thermal_sensor_list, node)
> +
> +#define for_each_thermal_zone(pos) \
> +       list_for_each_entry(pos, &thermal_zone_list, node)
> +
>  static struct thermal_governor *__find_governor(const char *name)
>  {
>         struct thermal_governor *pos;
> @@ -419,15 +428,62 @@ static void thermal_zone_device_check(struct work_struct *work)
>         thermal_zone_device_update(tz);
>  }
>
> +static int get_sensor_indx(struct thermal_zone *tz, struct thermal_sensor *ts)
> +{
> +       int i, indx = -EINVAL;
> +
> +       if (!tz || !ts)
> +               return -EINVAL;
> +
> +       mutex_lock(&ts_list_lock);
> +       for (i = 0; i < tz->sensor_indx; i++) {
> +               if (tz->sensors[i] == ts) {
> +                       indx = i;
> +                       break;
> +               }
> +       }
> +       mutex_unlock(&ts_list_lock);
> +       return indx;
> +}
> +
> +static void remove_sensor_from_zone(struct thermal_zone *tz,
> +                               struct thermal_sensor *ts)
> +{
> +       int indx, j;
> +
> +       indx = get_sensor_indx(tz, ts);
> +       if (indx < 0)
> +               return;
> +
> +       sysfs_remove_link(&tz->device.kobj, kobject_name(&ts->device.kobj));
> +
> +       /* Shift the entries in the tz->sensors array */
> +       for (j = indx; j < MAX_SENSORS_PER_ZONE - 1; j++)
> +               tz->sensors[j] = tz->sensors[j + 1];
> +
> +       tz->sensor_indx--;
> +}
> +
>  /* sys I/F for thermal zone */
>
>  #define to_thermal_zone(_dev) \
>         container_of(_dev, struct thermal_zone_device, device)
>
> +#define to_zone(_dev) \
> +       container_of(_dev, struct thermal_zone, device)
> +
>  #define to_thermal_sensor(_dev) \
>         container_of(_dev, struct thermal_sensor, device)
>
>  static ssize_t
> +zone_name_show(struct device *dev, struct device_attribute *attr, char *buf)
> +{
> +       struct thermal_zone *tz = to_zone(dev);
> +
> +       return sprintf(buf, "%s\n", tz->name);
> +}
> +
> +static ssize_t
>  sensor_name_show(struct device *dev, struct device_attribute *attr, char *buf)
>  {
>         struct thermal_sensor *ts = to_thermal_sensor(dev);
> @@ -772,6 +828,8 @@ static DEVICE_ATTR(policy, S_IRUGO | S_IWUSR, policy_show, policy_store);
>  static DEVICE_ATTR(sensor_name, 0444, sensor_name_show, NULL);
>  static DEVICE_ATTR(temp_input, 0444, sensor_temp_show, NULL);
>
> +static DEVICE_ATTR(zone_name, 0444, zone_name_show, NULL);
> +
>  /* sys I/F for cooling device */
>  #define to_cooling_device(_dev)        \
>         container_of(_dev, struct thermal_cooling_device, device)
> @@ -1557,6 +1615,146 @@ static void remove_trip_attrs(struct thermal_zone_device *tz)
>         kfree(tz->trip_hyst_attrs);
>  }
>
> +struct thermal_zone *create_thermal_zone(const char *name, void *devdata)
Durgaross,
Since more sensors can be added into one thermal zone, think about
this situation that every sensor is in itself file separately:

sensorA_file.c:
if common_zone_for_AB not there;
create common_zone_for_AB;
add sensorA into common_zone_for_AB;

sensorB_file.c:
if common_zone_for_AB not there;
create common_zone_for_AB;
add sensorB into common_zone_for_AB;

But how to check dedicate thermal zone is created or not? we are
lacking of this interface.
Here are two ways to achieve this from my point of view:
a)
add interface get_thermal_zone_byname(const char *name) which will
walk through the thermal_zone_list.
b)
add one more parameter for current interface, like this
create_thermal_zone(const char *name, void *devdata, bool reuse)
if reuse==ture {
if thermal zone already created, return it;
else create thermal zone;
} else {
if thermal zone already created, return error;
else create thermal zone;
}

I prefer a), how do you think about it?

> +{
> +       struct thermal_zone *tz;
> +       int ret;
> +
> +       if (!name || (name && strlen(name) >= THERMAL_NAME_LENGTH))
> +               return ERR_PTR(-EINVAL);
> +
> +       tz = kzalloc(sizeof(*tz), GFP_KERNEL);
> +       if (!tz)
> +               return ERR_PTR(-ENOMEM);
> +
> +       idr_init(&tz->idr);
> +       ret = get_idr(&thermal_zone_idr, &thermal_idr_lock, &tz->id);
> +       if (ret)
> +               goto exit_free;
> +
> +       strcpy(tz->name, name);
> +       tz->devdata = devdata;
> +       tz->device.class = &thermal_class;
> +
> +       dev_set_name(&tz->device, "zone%d", tz->id);
> +       ret = device_register(&tz->device);
> +       if (ret)
> +               goto exit_idr;
> +
> +       ret = device_create_file(&tz->device, &dev_attr_zone_name);
> +       if (ret)
> +               goto exit_unregister;
> +
> +       /* Add this zone to the global list of thermal zones */
> +       mutex_lock(&tz_list_lock);
> +       list_add_tail(&tz->node, &thermal_zone_list);
> +       mutex_unlock(&tz_list_lock);
> +       return tz;
> +
> +exit_unregister:
> +       device_unregister(&tz->device);
> +exit_idr:
> +       release_idr(&thermal_zone_idr, &thermal_idr_lock, tz->id);
> +exit_free:
> +       kfree(tz);
> +       return ERR_PTR(ret);
> +}
> +EXPORT_SYMBOL(create_thermal_zone);
> +
> +void remove_thermal_zone(struct thermal_zone *tz)
> +{
> +       struct thermal_zone *pos, *next;
> +       bool found = false;
> +
> +       if (!tz)
> +               return;
> +
> +       mutex_lock(&tz_list_lock);
> +
> +       list_for_each_entry_safe(pos, next, &thermal_zone_list, node) {
> +               if (pos == tz) {
> +                       list_del(&tz->node);
> +                       found = true;
> +                       break;
> +               }
> +       }
> +
> +       if (!found)
> +               goto exit;
> +
> +       device_remove_file(&tz->device, &dev_attr_zone_name);
> +
> +       release_idr(&thermal_zone_idr, &thermal_idr_lock, tz->id);
> +       idr_destroy(&tz->idr);
> +
> +       device_unregister(&tz->device);
> +       kfree(tz);
> +exit:
> +       mutex_unlock(&tz_list_lock);
> +       return;
> +}
> +EXPORT_SYMBOL(remove_thermal_zone);
> +
> +struct thermal_sensor *get_sensor_by_name(const char *name)
> +{
> +       struct thermal_sensor *pos;
> +       struct thermal_sensor *ts = NULL;
> +
> +       mutex_lock(&ts_list_lock);
> +       for_each_thermal_sensor(pos) {
> +               if (!strnicmp(pos->name, name, THERMAL_NAME_LENGTH)) {
> +                       ts = pos;
> +                       break;
> +               }
> +       }
> +       mutex_unlock(&ts_list_lock);
> +       return ts;
> +}
> +EXPORT_SYMBOL(get_sensor_by_name);
> +
> +int add_sensor_to_zone_by_name(struct thermal_zone *tz, const char *name)
> +{
> +       int ret;
> +       struct thermal_sensor *ts = get_sensor_by_name(name);
> +
> +       if (!tz || !ts)
> +               return -EINVAL;
> +
> +       mutex_lock(&tz_list_lock);
> +
> +       /* Ensure we are not adding the same sensor again!! */
> +       ret = get_sensor_indx(tz, ts);
> +       if (ret >= 0) {
> +               ret = -EEXIST;
> +               goto exit_zone;
> +       }
> +
> +       mutex_lock(&ts_list_lock);
> +
> +       ret = sysfs_create_link(&tz->device.kobj, &ts->device.kobj,
> +                               kobject_name(&ts->device.kobj));
> +       if (ret)
> +               goto exit_sensor;
> +
> +       tz->sensors[tz->sensor_indx++] = ts;
> +
> +exit_sensor:
> +       mutex_unlock(&ts_list_lock);
> +exit_zone:
> +       mutex_unlock(&tz_list_lock);
> +       return ret;
> +}
> +EXPORT_SYMBOL(add_sensor_to_zone_by_name);
> +
> +int add_sensor_to_zone(struct thermal_zone *tz, struct thermal_sensor *ts)
> +{
> +       if (!tz || !ts)
> +               return -EINVAL;
> +
> +       return add_sensor_to_zone_by_name(tz, ts->name);
> +}
> +EXPORT_SYMBOL(add_sensor_to_zone);
> +
>  /**
>   * thermal_sensor_register - register a new thermal sensor
>   * @name:      name of the thermal sensor
> @@ -1624,6 +1822,7 @@ EXPORT_SYMBOL(thermal_sensor_register);
>  void thermal_sensor_unregister(struct thermal_sensor *ts)
>  {
>         int i;
> +       struct thermal_zone *tz;
>         struct thermal_sensor *pos, *next;
>         bool found = false;
>
> @@ -1643,6 +1842,13 @@ void thermal_sensor_unregister(struct thermal_sensor *ts)
>         if (!found)
>                 return;
>
> +       mutex_lock(&tz_list_lock);
> +
> +       for_each_thermal_zone(tz)
> +               remove_sensor_from_zone(tz, ts);
> +
> +       mutex_unlock(&tz_list_lock);
> +
>         for (i = 0; i < ts->thresholds; i++)
>                 device_remove_file(&ts->device, &ts->thresh_attrs[i].attr);
>
> diff --git a/include/linux/thermal.h b/include/linux/thermal.h
> index e2f85ec..38438be 100644
> --- a/include/linux/thermal.h
> +++ b/include/linux/thermal.h
> @@ -49,6 +49,11 @@
>  /* Default Thermal Governor: Does Linear Throttling */
>  #define DEFAULT_THERMAL_GOVERNOR       "step_wise"
>
> +/* Maximum number of sensors, allowed in a thermal zone
> + * We will start with 5, and increase if needed.
> + */
> +#define MAX_SENSORS_PER_ZONE           5
> +
>  struct thermal_sensor;
>  struct thermal_zone_device;
>  struct thermal_cooling_device;
> @@ -191,6 +196,21 @@ struct thermal_zone_device {
>         struct delayed_work poll_queue;
>  };
>
> +struct thermal_zone {
> +       char name[THERMAL_NAME_LENGTH];
> +       int id;
> +       void *devdata;
> +       struct thermal_zone *ops;
> +       struct thermal_governor *governor;
> +       struct idr idr;
> +       struct device device;
> +       struct list_head node;
> +
> +       /* Sensor level information */
> +       int sensor_indx; /* index into 'sensors' array */
> +       struct thermal_sensor *sensors[MAX_SENSORS_PER_ZONE];
> +};
> +
>  /* Structure that holds thermal governor information */
>  struct thermal_governor {
>         char name[THERMAL_NAME_LENGTH];
> @@ -264,6 +284,11 @@ struct thermal_sensor *thermal_sensor_register(const char *,
>  void thermal_sensor_unregister(struct thermal_sensor *);
>  int enable_sensor_thresholds(struct thermal_sensor *, int);
>
> +struct thermal_zone *create_thermal_zone(const char *, void *);
> +void remove_thermal_zone(struct thermal_zone *);
> +int add_sensor_to_zone(struct thermal_zone *, struct thermal_sensor *);
> +int add_sensor_to_zone_by_name(struct thermal_zone *, const char *);
> +
>  #ifdef CONFIG_NET
>  extern int thermal_generate_netlink_event(u32 orig, enum events event);
>  #else
> --
> 1.7.9.5
>

^ permalink raw reply

* Re: [RESEND PATCH 0/5] cpufreq: db8500: Rename driver and update some parts
From: Rafael J. Wysocki @ 2012-12-12 22:43 UTC (permalink / raw)
  To: Mike Turquette
  Cc: Ulf Hansson, cpufreq, linux-pm@vger.kernel.org, linux-arm-kernel,
	Lee Jones, Linus Walleij, Rickard Andersson, Jonas Aberg,
	Vincent Guittot, Philippe Begnic, Ulf Hansson
In-Reply-To: <CAPtuhTgcmdxtG9B6L5NMyLCrhQh2o0DFkq47e_YCMoBzVhaF=g@mail.gmail.com>

On Wednesday, December 12, 2012 12:56:48 PM Mike Turquette wrote:
> On Mon, Dec 10, 2012 at 7:25 AM, Ulf Hansson <ulf.hansson@stericsson.com> wrote:
> > From: Ulf Hansson <ulf.hansson@linaro.org>
> >
> > This patchset starts by renaming the db8500 cpufreq driver to a more generic
> > name. There are new variants which rely on it too, so instead we give it a
> > family name of dbx500.
> >
> > On top of that a fixup patch for initialization of the driver and some minor
> > cleanup patches are included as well.
> >
> > These patches can be material for 3.8.
> >
> > Since some patches for the db8500 cpufreq driver has recently been merged
> > through Mike Turquette's clk tree, I decided to base these patches on top
> > of the clk-next branch. So I hope Mike is fine by merging these patches
> > through his tree.
> >
> > So I will try to collect ACKs from Rafael for the cpufreq patches. The
> > mfd patch has already been ACKED by Samuel.
> >
> 
> Ulf,
> 
> Just FYI clk-next has been pulled by Linus for 3.8.  I can still take
> this branch once Rafael ACKs the CPUfreq bits if you want, but I think
> you have more options now that clk-next is merged in.  It's up to you.

Please regard them as acked.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply

* Re: [PATCH] cpuidle - remove the power_specified field in the driver
From: Rafael J. Wysocki @ 2012-12-12 22:42 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: Julius Werner, Francesco Lavra, linux-pm, Deepthi Dharwar,
	Trinabh Gupta, Lists Linaro-dev, len.brown, linux-kernel,
	Andrew Morton, Sameer Nanda
In-Reply-To: <50C8E275.9050308@linaro.org>

On Wednesday, December 12, 2012 09:00:53 PM Daniel Lezcano wrote:
> On 12/12/2012 07:50 PM, Julius Werner wrote:
> > Thanks again for making this happen, Daniel. I like this version,
> > except for the small nitpick that I still think it would make sense to
> > also turn the loop in menu.c around. How about something like this:
> > 
> >        for (i = drv->state_count - 1; i >= CPUIDLE_DRIVER_STATE_START; i++) {
> >                 struct cpuidle_state *s = &drv->states[i];
> >                 if (!s->disable && s->exit_latency <= latency_req &&
> >                     s->target_residency <= data->predicted_us &&
> >                     s->exit_latency * multiplier <= data->predicted_us) {
> >                         data->last_state_idx = i;
> >                         data->exit_us = s->exit_latency;
> >                         break;
> >                 }
> >         }
> 
> Actually I was planning to do that in a separate patch.

Can you submit that second patch too, please, so that people don't have to
wonder?

Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply

* Re: [RFC PATCH v4 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
From: Oleg Nesterov @ 2012-12-12 21:10 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung,
	vincent.guittot, tj, sbw, amit.kucheria, rostedt, rjw, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel
In-Reply-To: <50C8DE7B.8080708@linux.vnet.ibm.com>

On 12/13, Srivatsa S. Bhat wrote:
>
> On 12/13/2012 01:06 AM, Oleg Nesterov wrote:
> >
> > But perhaps there is another reason to make it per-cpu...

Actually this is not the reason, please see below. But let me repeat,
it is not that I suggest to remove "per-cpu".

> > It seems we can avoid cpu_hotplug.active_writer == current check in
> > get/put.
> >
> > take_cpu_down() can clear this_cpu(writer_signal) right after it takes
> > hotplug_rwlock for writing. It runs with irqs and preemption disabled,
> > nobody else will ever look at writer_signal on its CPU.
> >
>
> Hmm.. And then the get/put_ on that CPU will increment/decrement the per-cpu
> refcount, but we don't care.. because we only need to ensure that they don't
> deadlock by taking the rwlock for read.

Yes, but...

Probably it would be more clean to simply do this_cpu_inc(reader_percpu_refcnt)
after write_lock(hotplug_rwlock). This will have the same effect for get/put,
and we still can make writer_signal global (if we want).

And note that this will also simplify the lockdep annotations which we (imho)
should add later.

Ignoring all complications get_online_cpus_atomic() does:

	if (this_cpu_read(reader_percpu_refcnt))
		this_cpu_inc(reader_percpu_refcnt);
	else if (!writer_signal)
		this_cpu_inc(reader_percpu_refcnt);	// same as above
	else
		read_lock(&hotplug_rwlock);

But for lockdep it should do:

	if (this_cpu_read(reader_percpu_refcnt))
		this_cpu_inc(reader_percpu_refcnt);
	else if (!writer_signal) {
		this_cpu_inc(reader_percpu_refcnt);
		// pretend we take hotplug_rwlock for lockdep
		rwlock_acquire_read(&hotplug_rwlock.dep_map, 0, 0);
	}
	else
		read_lock(&hotplug_rwlock);

And we need to ensure that rwlock_acquire_read() is not called under
write_lock(hotplug_rwlock).

If we use reader_percpu_refcnt to fool get/put, we should not worry.

Oleg.


^ permalink raw reply

* Re: [RESEND PATCH 0/5] cpufreq: db8500: Rename driver and update some parts
From: Mike Turquette @ 2012-12-12 20:56 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Rafael J. Wysocki, cpufreq, linux-pm@vger.kernel.org,
	linux-arm-kernel, Lee Jones, Linus Walleij, Rickard Andersson,
	Jonas Aberg, Vincent Guittot, Philippe Begnic, Ulf Hansson
In-Reply-To: <1355153142-9534-1-git-send-email-ulf.hansson@stericsson.com>

On Mon, Dec 10, 2012 at 7:25 AM, Ulf Hansson <ulf.hansson@stericsson.com> wrote:
> From: Ulf Hansson <ulf.hansson@linaro.org>
>
> This patchset starts by renaming the db8500 cpufreq driver to a more generic
> name. There are new variants which rely on it too, so instead we give it a
> family name of dbx500.
>
> On top of that a fixup patch for initialization of the driver and some minor
> cleanup patches are included as well.
>
> These patches can be material for 3.8.
>
> Since some patches for the db8500 cpufreq driver has recently been merged
> through Mike Turquette's clk tree, I decided to base these patches on top
> of the clk-next branch. So I hope Mike is fine by merging these patches
> through his tree.
>
> So I will try to collect ACKs from Rafael for the cpufreq patches. The
> mfd patch has already been ACKED by Samuel.
>

Ulf,

Just FYI clk-next has been pulled by Linus for 3.8.  I can still take
this branch once Rafael ACKs the CPUfreq bits if you want, but I think
you have more options now that clk-next is merged in.  It's up to you.

Regards,
Mike

^ permalink raw reply

* Re: [PATCH] cpuidle - remove the power_specified field in the driver
From: Daniel Lezcano @ 2012-12-12 20:00 UTC (permalink / raw)
  To: Julius Werner
  Cc: Rafael J. Wysocki, Francesco Lavra, linux-pm, Deepthi Dharwar,
	Trinabh Gupta, Lists Linaro-dev, len.brown, linux-kernel,
	Andrew Morton, Sameer Nanda
In-Reply-To: <CAODwPW81fXDNMTvG8juR6iAGFeDV9PXE-65Wp+dAE-mcKYU2nw@mail.gmail.com>

On 12/12/2012 07:50 PM, Julius Werner wrote:
> Thanks again for making this happen, Daniel. I like this version,
> except for the small nitpick that I still think it would make sense to
> also turn the loop in menu.c around. How about something like this:
> 
>        for (i = drv->state_count - 1; i >= CPUIDLE_DRIVER_STATE_START; i++) {
>                 struct cpuidle_state *s = &drv->states[i];
>                 if (!s->disable && s->exit_latency <= latency_req &&
>                     s->target_residency <= data->predicted_us &&
>                     s->exit_latency * multiplier <= data->predicted_us) {
>                         data->last_state_idx = i;
>                         data->exit_us = s->exit_latency;
>                         break;
>                 }
>         }

Actually I was planning to do that in a separate patch.


-- 
 <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog


^ permalink raw reply

* Re: [RFC PATCH v4 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
From: Srivatsa S. Bhat @ 2012-12-12 19:43 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung,
	vincent.guittot, tj, sbw, amit.kucheria, rostedt, rjw, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel
In-Reply-To: <20121212193646.GA29395@redhat.com>

On 12/13/2012 01:06 AM, Oleg Nesterov wrote:
> On 12/12, Oleg Nesterov wrote:
>>
>> On 12/12, Srivatsa S. Bhat wrote:
>>>
>>> On 12/12/2012 10:47 PM, Oleg Nesterov wrote:
>>>>
>>>> Why it needs to be per-cpu? It can be global and __read_mostly to avoid
>>>> the false-sharing. OK, perhaps to put reader_percpu_refcnt/writer_signal
>>>> into a single cacheline...
>>>
>>> Even I realized this (that we could use a global) after posting out the
>>> series.. But do you think that it would be better to retain the per-cpu
>>> variant itself, due to the cache effects?
>>
>> I don't really know, up to you. This was the question ;)
> 
> But perhaps there is another reason to make it per-cpu...
> 
> It seems we can avoid cpu_hotplug.active_writer == current check in
> get/put.
> 
> take_cpu_down() can clear this_cpu(writer_signal) right after it takes
> hotplug_rwlock for writing. It runs with irqs and preemption disabled,
> nobody else will ever look at writer_signal on its CPU.
> 

Hmm.. And then the get/put_ on that CPU will increment/decrement the per-cpu
refcount, but we don't care.. because we only need to ensure that they don't
deadlock by taking the rwlock for read.

This sounds great!

Regards,
Srivatsa S. Bhat


^ permalink raw reply

* Re: [RFC PATCH v4 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
From: Oleg Nesterov @ 2012-12-12 19:36 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung,
	vincent.guittot, tj, sbw, amit.kucheria, rostedt, rjw, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel
In-Reply-To: <20121212180248.GA24882@redhat.com>

On 12/12, Oleg Nesterov wrote:
>
> On 12/12, Srivatsa S. Bhat wrote:
> >
> > On 12/12/2012 10:47 PM, Oleg Nesterov wrote:
> > >
> > > Why it needs to be per-cpu? It can be global and __read_mostly to avoid
> > > the false-sharing. OK, perhaps to put reader_percpu_refcnt/writer_signal
> > > into a single cacheline...
> >
> > Even I realized this (that we could use a global) after posting out the
> > series.. But do you think that it would be better to retain the per-cpu
> > variant itself, due to the cache effects?
>
> I don't really know, up to you. This was the question ;)

But perhaps there is another reason to make it per-cpu...

It seems we can avoid cpu_hotplug.active_writer == current check in
get/put.

take_cpu_down() can clear this_cpu(writer_signal) right after it takes
hotplug_rwlock for writing. It runs with irqs and preemption disabled,
nobody else will ever look at writer_signal on its CPU.

Oleg.


^ permalink raw reply

* Re: [RFC PATCH v4 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
From: Srivatsa S. Bhat @ 2012-12-12 19:12 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung,
	vincent.guittot, tj, sbw, amit.kucheria, rostedt, rjw, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel
In-Reply-To: <20121212184849.GA26784@redhat.com>

On 12/13/2012 12:18 AM, Oleg Nesterov wrote:
> On 12/13, Srivatsa S. Bhat wrote:
>>
>> On 12/12/2012 11:32 PM, Oleg Nesterov wrote:
>>> And _perhaps_ get_ can avoid it too?
>>>
>>> I didn't really try to think, probably this is not right, but can't
>>> something like this work?
>>>
>>> 	#define XXXX	(1 << 16)
>>> 	#define MASK	(XXXX -1)
>>>
>>> 	void get_online_cpus_atomic(void)
>>> 	{
>>> 		preempt_disable();
>>>
>>> 		// only for writer
>>> 		__this_cpu_add(reader_percpu_refcnt, XXXX);
>>>
>>> 		if (__this_cpu_read(reader_percpu_refcnt) & MASK) {
>>> 			__this_cpu_inc(reader_percpu_refcnt);
>>> 		} else {
>>> 			smp_wmb();
>>> 			if (writer_active()) {
>>> 				...
>>> 			}
>>> 		}
>>>
>>> 		__this_cpu_dec(reader_percpu_refcnt, XXXX);
>>> 	}
>>>
>>
>> Sorry, may be I'm too blind to see, but I didn't understand the logic
>> of how the mask helps us avoid disabling interrupts..
> 
> Why do we need cli/sti at all? We should prevent the following race:
> 
> 	- the writer already holds hotplug_rwlock, so get_ must not
> 	  succeed.
> 
> 	- the new reader comes, it increments reader_percpu_refcnt,
> 	  but before it checks writer_active() ...
> 
> 	- irq handler does get_online_cpus_atomic() and sees
> 	  reader_nested_percpu() == T, so it simply increments
> 	  reader_percpu_refcnt and succeeds.
> 
> OTOH, why do we need to increment reader_percpu_refcnt the counter
> in advance? To ensure that either we see writer_active() or the
> writer should see reader_percpu_refcnt != 0 (and that is why they
> should write/read in reverse order).
> 
> The code above tries to avoid this race using the lower 16 bits
> as a "nested-counter", and the upper bits to avoid the race with
> the writer.
> 
> 	// only for writer
> 	__this_cpu_add(reader_percpu_refcnt, XXXX);
> 
> If irq comes and does get_online_cpus_atomic(), it won't be confused
> by __this_cpu_add(XXXX), it will check the lower bits and switch to
> the "slow path".
> 

This is a very clever scheme indeed! :-) Thanks a lot for explaining
it in detail.

> 
> But once again, so far I didn't really try to think. It is quite
> possible I missed something.
> 

Even I don't spot anything wrong with it. But I'll give it some more
thought..

Regards,
Srivatsa S. Bhat


^ permalink raw reply

* Re: [PATCH] cpuidle - remove the power_specified field in the driver
From: Julius Werner @ 2012-12-12 18:50 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: Rafael J. Wysocki, Francesco Lavra, linux-pm, Deepthi Dharwar,
	Trinabh Gupta, Lists Linaro-dev, len.brown, linux-kernel,
	Andrew Morton, Sameer Nanda
In-Reply-To: <1355325801-1317-1-git-send-email-daniel.lezcano@linaro.org>

Thanks again for making this happen, Daniel. I like this version,
except for the small nitpick that I still think it would make sense to
also turn the loop in menu.c around. How about something like this:

       for (i = drv->state_count - 1; i >= CPUIDLE_DRIVER_STATE_START; i++) {
                struct cpuidle_state *s = &drv->states[i];
                if (!s->disable && s->exit_latency <= latency_req &&
                    s->target_residency <= data->predicted_us &&
                    s->exit_latency * multiplier <= data->predicted_us) {
                        data->last_state_idx = i;
                        data->exit_us = s->exit_latency;
                        break;
                }
        }

^ permalink raw reply

* Re: [RFC PATCH v4 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
From: Oleg Nesterov @ 2012-12-12 18:48 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung,
	vincent.guittot, tj, sbw, amit.kucheria, rostedt, rjw, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel
In-Reply-To: <50C8CD52.8040808@linux.vnet.ibm.com>

On 12/13, Srivatsa S. Bhat wrote:
>
> On 12/12/2012 11:32 PM, Oleg Nesterov wrote:
> > And _perhaps_ get_ can avoid it too?
> >
> > I didn't really try to think, probably this is not right, but can't
> > something like this work?
> >
> > 	#define XXXX	(1 << 16)
> > 	#define MASK	(XXXX -1)
> >
> > 	void get_online_cpus_atomic(void)
> > 	{
> > 		preempt_disable();
> >
> > 		// only for writer
> > 		__this_cpu_add(reader_percpu_refcnt, XXXX);
> >
> > 		if (__this_cpu_read(reader_percpu_refcnt) & MASK) {
> > 			__this_cpu_inc(reader_percpu_refcnt);
> > 		} else {
> > 			smp_wmb();
> > 			if (writer_active()) {
> > 				...
> > 			}
> > 		}
> >
> > 		__this_cpu_dec(reader_percpu_refcnt, XXXX);
> > 	}
> >
>
> Sorry, may be I'm too blind to see, but I didn't understand the logic
> of how the mask helps us avoid disabling interrupts..

Why do we need cli/sti at all? We should prevent the following race:

	- the writer already holds hotplug_rwlock, so get_ must not
	  succeed.

	- the new reader comes, it increments reader_percpu_refcnt,
	  but before it checks writer_active() ...

	- irq handler does get_online_cpus_atomic() and sees
	  reader_nested_percpu() == T, so it simply increments
	  reader_percpu_refcnt and succeeds.

OTOH, why do we need to increment reader_percpu_refcnt the counter
in advance? To ensure that either we see writer_active() or the
writer should see reader_percpu_refcnt != 0 (and that is why they
should write/read in reverse order).

The code above tries to avoid this race using the lower 16 bits
as a "nested-counter", and the upper bits to avoid the race with
the writer.

	// only for writer
	__this_cpu_add(reader_percpu_refcnt, XXXX);

If irq comes and does get_online_cpus_atomic(), it won't be confused
by __this_cpu_add(XXXX), it will check the lower bits and switch to
the "slow path".


But once again, so far I didn't really try to think. It is quite
possible I missed something.

Oleg.

^ permalink raw reply

* Re: [RFC PATCH v4 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
From: Srivatsa S. Bhat @ 2012-12-12 18:42 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung,
	vincent.guittot, tj, sbw, amit.kucheria, rostedt, rjw, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel
In-Reply-To: <20121212182308.GA26094@redhat.com>

On 12/12/2012 11:53 PM, Oleg Nesterov wrote:
> On 12/12, Srivatsa S. Bhat wrote:
>>
>> On 12/12/2012 10:54 PM, Oleg Nesterov wrote:
>>
>>> And when I look at get_online_cpus_atomic() again it uses rmb(). This
>>> doesn't look correct, we need the full barrier between this_cpu_inc()
>>> and writer_active().
>>
>> Hmm..
>>
>>> At the same time reader_nested_percpu() can be checked before mb().
>>
>> I thought that since the increment and the check (reader_nested_percpu)
>> act on the same memory location, they will naturally be run in the given
>> order, without any need for barriers. Am I wrong?
> 
> And this is what I meant, you do not need a barrier before
> reader_nested_percpu().
> 

Ah, ok!

> But you need to ensure that WRITE(reader_percpu_refcnt) and READ(writer_signal)
> can't be reordered, so you need mb() in between. rmb() can serialize LOADs and
> STOREs.
> 

OK, got it. (I know you meant s/can/can't).

I'm trying to see if we can somehow exploit the fact that the writer can
potentially tolerate if a reader ignores his signal (to switch to rwlocks)
for a while... and use this to get rid of barriers in the reader path (without
using synchronize_sched() at the writer, of course). And perhaps also take advantage
of the fact that the read_lock() acts as a one-way barrier..

I don't know, maybe its not possible after all.. :-/
 
Regards,
Srivatsa S. Bhat


^ permalink raw reply

* Re: [RFC PATCH v4 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
From: Srivatsa S. Bhat @ 2012-12-12 18:30 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung,
	vincent.guittot, tj, sbw, amit.kucheria, rostedt, rjw, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel
In-Reply-To: <20121212180248.GA24882@redhat.com>

On 12/12/2012 11:32 PM, Oleg Nesterov wrote:
> On 12/12, Srivatsa S. Bhat wrote:
>>
>> On 12/12/2012 10:47 PM, Oleg Nesterov wrote:
>>>
>>> Why it needs to be per-cpu? It can be global and __read_mostly to avoid
>>> the false-sharing. OK, perhaps to put reader_percpu_refcnt/writer_signal
>>> into a single cacheline...
>>
>> Even I realized this (that we could use a global) after posting out the
>> series.. But do you think that it would be better to retain the per-cpu
>> variant itself, due to the cache effects?
> 
> I don't really know, up to you. This was the question ;)

OK :-)

> 
>>> Do we really need local_irq_save/restore in put_ ?
>>>
>>
>> Hmm.. good point! I don't think we need it.
> 
> And _perhaps_ get_ can avoid it too?
> 
> I didn't really try to think, probably this is not right, but can't
> something like this work?
> 
> 	#define XXXX	(1 << 16)
> 	#define MASK	(XXXX -1)
> 
> 	void get_online_cpus_atomic(void)
> 	{
> 		preempt_disable();
> 
> 		// only for writer
> 		__this_cpu_add(reader_percpu_refcnt, XXXX);
> 
> 		if (__this_cpu_read(reader_percpu_refcnt) & MASK) {
> 			__this_cpu_inc(reader_percpu_refcnt);
> 		} else {
> 			smp_wmb();
> 			if (writer_active()) {
> 				...
> 			}
> 		}
> 
> 		__this_cpu_dec(reader_percpu_refcnt, XXXX);
> 	}
> 

Sorry, may be I'm too blind to see, but I didn't understand the logic
of how the mask helps us avoid disabling interrupts.. Can you kindly
elaborate?

> 	void put_online_cpus_atomic(void)
> 	{
> 		if (__this_cpu_read(reader_percpu_refcnt) & MASK)
> 			__this_cpu_dec(reader_percpu_refcnt);
> 		else
> 			read_unlock(&hotplug_rwlock);
> 		preempt_enable();
> 	}
> 

Regards,
Srivatsa S. Bhat


^ permalink raw reply

* Re: [RFC PATCH v4 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
From: Oleg Nesterov @ 2012-12-12 18:23 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung,
	vincent.guittot, tj, sbw, amit.kucheria, rostedt, rjw, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel
In-Reply-To: <50C8C8C9.2070605@linux.vnet.ibm.com>

On 12/12, Srivatsa S. Bhat wrote:
>
> On 12/12/2012 10:54 PM, Oleg Nesterov wrote:
>
> > And when I look at get_online_cpus_atomic() again it uses rmb(). This
> > doesn't look correct, we need the full barrier between this_cpu_inc()
> > and writer_active().
>
> Hmm..
>
> > At the same time reader_nested_percpu() can be checked before mb().
>
> I thought that since the increment and the check (reader_nested_percpu)
> act on the same memory location, they will naturally be run in the given
> order, without any need for barriers. Am I wrong?

And this is what I meant, you do not need a barrier before
reader_nested_percpu().

But you need to ensure that WRITE(reader_percpu_refcnt) and READ(writer_signal)
can't be reordered, so you need mb() in between. rmb() can serialize LOADs and
STOREs.

Or I misunderstood?

Oleg.

^ permalink raw reply

* Re: [RFC PATCH v4 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
From: Srivatsa S. Bhat @ 2012-12-12 18:11 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung,
	vincent.guittot, tj, sbw, amit.kucheria, rostedt, rjw, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel
In-Reply-To: <20121212172431.GA23328@redhat.com>

On 12/12/2012 10:54 PM, Oleg Nesterov wrote:
> On 12/12, Oleg Nesterov wrote:
>>
>> On 12/11, Srivatsa S. Bhat wrote:
>>>
>>> IOW, the hotplug readers just increment/decrement their per-cpu refcounts
>>> when no writer is active.
>>
>> plus cli/sti ;) and increment/decrement are atomic.
>                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> OOPS, sorry I was going to say "adds mb()".
> 

Ok, got it :)

> And when I look at get_online_cpus_atomic() again it uses rmb(). This
> doesn't look correct, we need the full barrier between this_cpu_inc()
> and writer_active().
> 

Hmm..

> At the same time reader_nested_percpu() can be checked before mb().
> 

I thought that since the increment and the check (reader_nested_percpu)
act on the same memory location, they will naturally be run in the given
order, without any need for barriers. Am I wrong?

(I referred Documentation/memory-barriers.txt again to verify this, and
the second point under the "Guarantees" section looked like it said the
same thing : "Overlapping loads and stores within a particular CPU will
appear to be ordered within that CPU").

Regards,
Srivatsa S. Bhat

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox