* RE: [PATCH 4/8] Thermal: Add Thermal_trip sysfs node
From: R, Durgadoss @ 2012-12-20 7:52 UTC (permalink / raw)
To: Greg KH
Cc: Zhang, Rui, linux-pm@vger.kernel.org,
linux-kernel@vger.kernel.org, hongbo.zhang@linaro.org,
wni@nvidia.com
In-Reply-To: <20121220054201.GE28007@kroah.com>
Hi Greg,
Thank you for looking at this.
> -----Original Message-----
> From: Greg KH [mailto:gregkh@linuxfoundation.org]
> Sent: Thursday, December 20, 2012 11:12 AM
> To: R, Durgadoss
> Cc: Zhang, Rui; linux-pm@vger.kernel.org; linux-kernel@vger.kernel.org;
> hongbo.zhang@linaro.org; wni@nvidia.com
> Subject: Re: [PATCH 4/8] Thermal: Add Thermal_trip sysfs node
>
> On Tue, Dec 18, 2012 at 02:59:33PM +0530, Durgadoss R wrote:
> > This patch adds a thermal_trip directory under
> > /sys/class/thermal/zoneX. This directory contains
> > the trip point values for sensors bound to this
> > zone.
>
> Eeek, you just broke userspace tools that now can no longer see these
> entries :(
>
> Why do you need to create a subdirectory? As you found out, doing so
> isn't the easiest, right? That is on purpose.
Yes, I observed the complexity.
>
> I really wouldn't recommend doing this at all, please stick within the
> 'struct device' framework here, don't create new kobjects and hang sysfs
> files off of them.
But, we cannot put all _trip directly under ZoneX directory. We can remove the
thermal_trip directory, and put sensorY_trip under /sys/class/thermal/zoneX/.
But this sensorY_trip needs to be a directory which has four sysfs nodes named,
active, passive, crit, hot.
Rui, What do you think about this ?
The only other way I see, is directly put sensorY_trip_[active/passive/hot/crit]
which will create way too many nodes, under /sys/class/thermal/zoneX/.
Thanks,
Durga
>
> thanks,
>
> greg k-h
^ permalink raw reply
* RE: [PATCH 0/8] Thermal Framework Enhancements
From: R, Durgadoss @ 2012-12-20 6:16 UTC (permalink / raw)
To: Greg KH
Cc: Zhang, Rui, linux-pm@vger.kernel.org,
linux-kernel@vger.kernel.org, hongbo.zhang@linaro.org,
wni@nvidia.com
In-Reply-To: <20121220053757.GD28007@kroah.com>
Hi Greg,
> -----Original Message-----
> From: Greg KH [mailto:gregkh@linuxfoundation.org]
> Sent: Thursday, December 20, 2012 11:08 AM
> To: R, Durgadoss
> Cc: Zhang, Rui; linux-pm@vger.kernel.org; linux-kernel@vger.kernel.org;
> hongbo.zhang@linaro.org; wni@nvidia.com
> Subject: Re: [PATCH 0/8] Thermal Framework Enhancements
>
> On Tue, Dec 18, 2012 at 02:59:29PM +0530, Durgadoss R wrote:
> > This patch is a v1 based on the RFC submitted here:
> > https://patchwork.kernel.org/patch/1758921/
> >
> > This patch set is based on Rui's -thermal tree, and is
> > tested on a Core-i5 and an Atom netbook.
> >
> > This series contains 8 patches:
> > Patch 1/8: Creates new sensor level APIs
> > Patch 2/8: Creates new zone level APIs. The existing tzd structure is
> > kept as such for clarity and compatibility purposes.
> > Patch 3/8: Creates functions to add/remove a cdev to/from a zone. The
> > existing tcd structure need not be modified.
> > Patch 4/8: Adds a thermal_trip sysfs node, which exposes various trip
> > points for all sensors present in a zone.
> > Patch 5/8: Adds a thermal_map sysfs node. It is a compact representation
> > of the binding relationship between a sensor and a cdev,
> > within a zone.
> > Patch 6/8: Creates Documentation for the new APIs. A new file is
> > created for clarity. Final goal is to merge with the existing
> > file or refactor the files, as whatever seems appropriate.
>
> The documentation for sysfs files needs to be in Documentation/ABI,
> please add new entries there if you are creating new sysfs files.
This patch was meant to add Documentation in the standard .txt under
Documentation/thermal/.
So, will keep this patch, and also add one more patch to document these
ABI under Documentation/ABI/testing/.
Thanks,
Durga
>
> thanks,
>
> greg k-h
^ permalink raw reply
* Re: [PATCH v9 06/10] ata: zpodd: check zero power ready status
From: Aaron Lu @ 2012-12-20 6:07 UTC (permalink / raw)
To: Tejun Heo
Cc: James Bottomley, Rafael J. Wysocki, linux-pm, Jeff Garzik,
Alan Stern, Jeff Wu, Aaron Lu, linux-ide, linux-scsi, linux-acpi
In-Reply-To: <20121203162323.GB19802@htj.dyndns.org>
Hi Tejun,
On 12/04/2012 12:23 AM, Tejun Heo wrote:
> Hello, James.
>
> On Mon, Dec 03, 2012 at 08:25:43AM +0000, James Bottomley wrote:
>>> diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
>>> index e65c62e..1756151 100644
>>> --- a/include/scsi/scsi_device.h
>>> +++ b/include/scsi/scsi_device.h
>>> @@ -160,6 +160,7 @@ struct scsi_device {
>>> unsigned can_power_off:1; /* Device supports runtime power off */
>>> unsigned wce_default_on:1; /* Cache is ON by default */
>>> unsigned no_dif:1; /* T10 PI (DIF) should be disabled */
>>> + unsigned event_driven:1; /* No need to poll the device */
>>>
>>> DECLARE_BITMAP(supported_events, SDEV_EVT_MAXBITS); /* supported events */
>>> struct list_head event_list; /* asserted events */
>>
>> Yes, but if we can get away with doing that, it should be in genhd
>> because it's completely generic.
>>
>> I was imagining we'd have to fake the reply to the test unit ready or
>> some other commands, which is why it would need to be in sr.c
>>
>> The check events code is Tejun's baby, if he's OK with it then just do
>> it in genhd.c
>
> The problem here is there's no easy to reach genhd from libata (or the
> other way around) without going through sr. I think we're gonna have
> to have something in sr one way or the other.
Do you think we can do something like the following to get the gendisk
pointer in libata? - We have ata_dev->sdev->sdev_gendev, and the
gendisk->part0.__dev is a child of it, so we can loop scan sdev_gendev's
children to see which one is of block_class and disk_type. Assuming one
device will alloc only one disk(correct?), we can get the gendisk from
libata layer. Code like this:
diff --git a/block/genhd.c b/block/genhd.c
index 9a289d7..448b201 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -1619,6 +1619,38 @@ static void disk_events_workfn(struct work_struct *work)
kobject_uevent_env(&disk_to_dev(disk)->kobj, KOBJ_CHANGE, envp);
}
+static int is_gendisk_part0(struct device *dev, void *data)
+{
+ struct device **child = data;
+
+ if (dev->class == &block_class && dev->type == &disk_type) {
+ *child = dev;
+ return 1;
+ } else
+ return 0;
+}
+
+/**
+ * disk_from_device - Get the gendisk pointer for this device.
+ * @dev: the device this gendisk is created for, i.e. gendisk->driverfs_dev
+ *
+ * LLD sometimes need to play with the gendisk without HLD's aware,
+ * this routine gives LLD the required access to gendisk.
+ *
+ * CONTEXT:
+ * Don't care.
+ */
+struct gendisk *disk_from_device(struct device *dev)
+{
+ struct device *child;
+
+ if (device_for_each_child(dev, &child, is_gendisk_part0))
+ return dev_to_disk(child);
+ else
+ return NULL;
+}
+EXPORT_SYMBOL(disk_from_device);
+
/*
* A disk events enabled device has the following sysfs nodes under
* its /sys/block/X/ directory.
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index 79b8bba..e8002c0 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -427,6 +427,7 @@ extern void disk_block_events(struct gendisk *disk);
extern void disk_unblock_events(struct gendisk *disk);
extern void disk_flush_events(struct gendisk *disk, unsigned int mask);
extern unsigned int disk_clear_events(struct gendisk *disk, unsigned int mask);
+extern struct gendisk *disk_from_device(struct device *dev);
/* drivers/char/random.c */
extern void add_disk_randomness(struct gendisk *disk);
Then together with disk_try_block_events and disk_unblock_events, we can
avoid touching SCSI layer to let ODD stay in zero power state.
So what do you think?
Thanks,
Aaron
^ permalink raw reply related
* RE: [PATCH 2/8] Thermal: Create zone level APIs
From: R, Durgadoss @ 2012-12-20 6:02 UTC (permalink / raw)
To: Joe Perches
Cc: Zhang, Rui, linux-pm@vger.kernel.org,
linux-kernel@vger.kernel.org, hongbo.zhang@linaro.org,
wni@nvidia.com
In-Reply-To: <1355830214.19706.39.camel@joe-AO722>
Hi Joe,
> -----Original Message-----
> From: Joe Perches [mailto:joe@perches.com]
> Sent: Tuesday, December 18, 2012 5:00 PM
> To: R, Durgadoss
> Cc: Zhang, Rui; linux-pm@vger.kernel.org; linux-kernel@vger.kernel.org;
> hongbo.zhang@linaro.org; wni@nvidia.com
> Subject: Re: [PATCH 2/8] Thermal: Create zone level APIs
>
> On Tue, 2012-12-18 at 14:59 +0530, Durgadoss R wrote:
> > This patch adds a new thermal_zone structure to
> > thermal.h. Also, adds zone level APIs to the thermal
> > framework.
>
> []
>
> > diff --git a/drivers/thermal/thermal_sys.c b/drivers/thermal/thermal_sys.c
>
> > +#define GET_INDEX(tz, ptr, indx, type) \
> > + do { \
> > + int i; \
> > + indx = -EINVAL; \
> > + if (!tz || !ptr) \
> > + break; \
> > + mutex_lock(&type##_list_lock); \
> > + for (i = 0; i < tz->type##_indx; i++) { \
> > + if (tz->type##s[i] == ptr) { \
> > + indx = i; \
> > + break; \
> > + } \
> > + } \
> > + mutex_unlock(&type##_list_lock); \
> > + } while (0)
>
> A statement expression macro returning int would be
> more kernel style like and better to use.
>
Yes, makes sense. Will fix this in next rev.
Thanks,
Durga
> (sorry about the whitespace, evolution 3.6 is crappy)
>
> #define GET_INDEX(tx, ptr, type) \
> ({ \
> int rtn = -EINVAL; \
> do { \
> int i; \
> if (!tz || !ptr) \
> break; \
> mutex_lock(&type##_list_lock); \
> for (i = 0; i < tz->type##_indx; i++) { \
> if (tz->type##s[i] == ptr) { \
> rtn = i; \
> break; \
> } \
> } \
> mutex_unlock(&type##_list_lock); \
> } while (0); \
> rtn; \
> })
>
>
> > +static void remove_sensor_from_zone(struct thermal_zone *tz,
> > + struct thermal_sensor *ts)
> > +{
> > + int j, indx;
> > +
> > + GET_INDEX(tz, ts, indx, sensor);
>
> This becomes
>
> indx = GET_INDEX(tx, ts, sensor);
>
> > + if (indx < 0)
> > + return;
>
^ permalink raw reply
* Re: [PATCH 4/8] Thermal: Add Thermal_trip sysfs node
From: Greg KH @ 2012-12-20 5:42 UTC (permalink / raw)
To: Durgadoss R; +Cc: rui.zhang, linux-pm, linux-kernel, hongbo.zhang, wni
In-Reply-To: <1355822977-4804-5-git-send-email-durgadoss.r@intel.com>
On Tue, Dec 18, 2012 at 02:59:33PM +0530, Durgadoss R wrote:
> This patch adds a thermal_trip directory under
> /sys/class/thermal/zoneX. This directory contains
> the trip point values for sensors bound to this
> zone.
Eeek, you just broke userspace tools that now can no longer see these
entries :(
Why do you need to create a subdirectory? As you found out, doing so
isn't the easiest, right? That is on purpose.
I really wouldn't recommend doing this at all, please stick within the
'struct device' framework here, don't create new kobjects and hang sysfs
files off of them.
thanks,
greg k-h
^ permalink raw reply
* Re: [PATCH 0/8] Thermal Framework Enhancements
From: Greg KH @ 2012-12-20 5:37 UTC (permalink / raw)
To: Durgadoss R; +Cc: rui.zhang, linux-pm, linux-kernel, hongbo.zhang, wni
In-Reply-To: <1355822977-4804-1-git-send-email-durgadoss.r@intel.com>
On Tue, Dec 18, 2012 at 02:59:29PM +0530, Durgadoss R wrote:
> This patch is a v1 based on the RFC submitted here:
> https://patchwork.kernel.org/patch/1758921/
>
> This patch set is based on Rui's -thermal tree, and is
> tested on a Core-i5 and an Atom netbook.
>
> This series contains 8 patches:
> Patch 1/8: Creates new sensor level APIs
> Patch 2/8: Creates new zone level APIs. The existing tzd structure is
> kept as such for clarity and compatibility purposes.
> Patch 3/8: Creates functions to add/remove a cdev to/from a zone. The
> existing tcd structure need not be modified.
> Patch 4/8: Adds a thermal_trip sysfs node, which exposes various trip
> points for all sensors present in a zone.
> Patch 5/8: Adds a thermal_map sysfs node. It is a compact representation
> of the binding relationship between a sensor and a cdev,
> within a zone.
> Patch 6/8: Creates Documentation for the new APIs. A new file is
> created for clarity. Final goal is to merge with the existing
> file or refactor the files, as whatever seems appropriate.
The documentation for sysfs files needs to be in Documentation/ABI,
please add new entries there if you are creating new sysfs files.
thanks,
greg k-h
^ permalink raw reply
* [pm:acpi-scan-temp-new 15/17] include/linux/acpi.h:518: warning: 'acpi_dev_pm_get_node' defined but not used
From: kbuild test robot @ 2012-12-20 2:28 UTC (permalink / raw)
To: Rafael J. Wysocki; +Cc: linux-pm
tree: git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git acpi-scan-temp-new
head: 9970b0ad27dc7970cdb101c79f9c99d699263744
commit: 44374289a89061d604187d122e70722b5625208b [15/17] ACPI / PCI: Rework the setup and cleanup of device wakeup
config: make ARCH=avr32 atngw100_defconfig
All warnings:
In file included from include/uapi/linux/param.h:4,
from include/linux/timex.h:63,
from include/linux/jiffies.h:8,
from include/linux/ktime.h:25,
from include/linux/timer.h:5,
from include/linux/workqueue.h:8,
from include/linux/srcu.h:32,
from include/linux/notifier.h:15,
from include/linux/memory_hotplug.h:6,
from include/linux/mmzone.h:738,
from include/linux/gfp.h:4,
from include/linux/kmod.h:22,
from include/linux/module.h:13,
from init/main.c:13:
arch/avr32/include/asm/param.h:6:1: warning: "HZ" redefined
In file included from arch/avr32/include/asm/param.h:4,
from include/uapi/linux/param.h:4,
from include/linux/timex.h:63,
from include/linux/jiffies.h:8,
from include/linux/ktime.h:25,
from include/linux/timer.h:5,
from include/linux/workqueue.h:8,
from include/linux/srcu.h:32,
from include/linux/notifier.h:15,
from include/linux/memory_hotplug.h:6,
from include/linux/mmzone.h:738,
from include/linux/gfp.h:4,
from include/linux/kmod.h:22,
from include/linux/module.h:13,
from init/main.c:13:
arch/avr32/include/uapi/asm/param.h:6:1: warning: this is the location of the previous definition
include/linux/acpi.h:518: warning: 'acpi_dev_pm_get_node' defined but not used
vim +/acpi_dev_pm_get_node +518 include/linux/acpi.h
e5cc8ef3 Rafael J. Wysocki 2012-11-02 502 int acpi_subsys_suspend_late(struct device *dev);
e5cc8ef3 Rafael J. Wysocki 2012-11-02 503 int acpi_subsys_resume_early(struct device *dev);
e5cc8ef3 Rafael J. Wysocki 2012-11-02 504 #else
e5cc8ef3 Rafael J. Wysocki 2012-11-02 505 static inline int acpi_dev_suspend_late(struct device *dev) { return 0; }
e5cc8ef3 Rafael J. Wysocki 2012-11-02 506 static inline int acpi_dev_resume_early(struct device *dev) { return 0; }
e5cc8ef3 Rafael J. Wysocki 2012-11-02 507 static inline int acpi_subsys_prepare(struct device *dev) { return 0; }
e5cc8ef3 Rafael J. Wysocki 2012-11-02 508 static inline int acpi_subsys_suspend_late(struct device *dev) { return 0; }
e5cc8ef3 Rafael J. Wysocki 2012-11-02 509 static inline int acpi_subsys_resume_early(struct device *dev) { return 0; }
e5cc8ef3 Rafael J. Wysocki 2012-11-02 510 #endif
e5cc8ef3 Rafael J. Wysocki 2012-11-02 511
e5cc8ef3 Rafael J. Wysocki 2012-11-02 512 #if defined(CONFIG_ACPI) && defined(CONFIG_PM)
44374289 Rafael J. Wysocki 2012-12-20 513 struct acpi_device *acpi_dev_pm_get_node(struct device *dev);
b88ce2a4 Rafael J. Wysocki 2012-11-26 514 int acpi_dev_pm_attach(struct device *dev, bool power_on);
d71f2f88 Rafael J. Wysocki 2012-12-05 515 void acpi_dev_pm_detach(struct device *dev, bool power_off);
e5cc8ef3 Rafael J. Wysocki 2012-11-02 516 #else
44374289 Rafael J. Wysocki 2012-12-20 517 static struct acpi_device *acpi_dev_pm_get_node(struct device *dev)
44374289 Rafael J. Wysocki 2012-12-20 @518 {
44374289 Rafael J. Wysocki 2012-12-20 519 return NULL;
44374289 Rafael J. Wysocki 2012-12-20 520 }
b88ce2a4 Rafael J. Wysocki 2012-11-26 521 static inline int acpi_dev_pm_attach(struct device *dev, bool power_on)
b88ce2a4 Rafael J. Wysocki 2012-11-26 522 {
b88ce2a4 Rafael J. Wysocki 2012-11-26 523 return -ENODEV;
b88ce2a4 Rafael J. Wysocki 2012-11-26 524 }
b88ce2a4 Rafael J. Wysocki 2012-11-26 525 static inline void acpi_dev_pm_detach(struct device *dev, bool power_off) {}
e5cc8ef3 Rafael J. Wysocki 2012-11-02 526 #endif
---
0-DAY kernel build testing backend Open Source Technology Center
Fengguang Wu, Yuanhan Liu Intel Corporation
^ permalink raw reply
* Re: [RFC PATCH v4 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
From: Srivatsa S. Bhat @ 2012-12-19 19:49 UTC (permalink / raw)
To: Oleg Nesterov
Cc: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung,
vincent.guittot, tj, sbw, amit.kucheria, rostedt, rjw, wangyun,
xiaoguangrong, nikunj, linux-pm, linux-kernel
In-Reply-To: <20121219191436.GA25829@redhat.com>
On 12/20/2012 12:44 AM, Oleg Nesterov wrote:
> On 12/19, Srivatsa S. Bhat wrote:
>>
>> BTW, there is a small problem with the synchronize_sched() approach:
>> We can't generalize the synchronization scheme as a generic percpu rwlock
>> because the writer's role is split into 2, the first part (the one having
>> synchronize_sched()) which should be run in process context, and the
>> second part (the rest of it), which can be run in atomic context.
>
> Yes,
>
>> That is, needing the writer to be able to sleep (due to synchronize_sched())
>> will make the locking scheme unusable (in other usecases) I think. And
>> the split (blocking and non-blocking part) itself seems hard to express
>> as a single writer API.
>
> I do not think this is the problem.
>
> We need 2 helpers for writer, the 1st one does synchronize_sched() and the
> 2nd one takes rwlock. A generic percpu_write_lock() simply calls them both.
>
Ah, that's the problem no? Users of reader-writer locks expect to run in
atomic context (ie., they don't want to sleep). We can't expose an API that
can make the task go to sleep under the covers!
(And that too, definitely not with a name such as "percpu_write_lock_irqsave()"
... because we are going to be interrupt-safe as well, in the second part...).
> In fact I think that the 1st helper should not do synchronize_sched(),
> and (say) cpu_hotplug_begin() should call it itself. Because if we also
> change cpu_hotplug_begin() to use percpu_rw_sem we do not want to do
> synchronize_sched() twice. But lets ignore this for now.
>
Yeah, let's ignore this for now, else I'll get all confused ;-)
> But,
>
>> Hmmm.. so what do we do? Shall we say "We anyway have to do smp_rmb() in the
>> reader in the fast path. Instead let's do a full smp_mb() and get rid of
>> the synchronize_sched() in the writer, and thereby expose this synchronization
>> scheme as generic percpu rwlocks?" Or should we rather use synchronize_sched()
>> and keep this synchronization scheme restricted to CPU hotplug only?
>
> Oh, I do not know ;)
>
> To me, the main question is: can we use synchronize_sched() in cpu_down?
> It is slow.
>
Haha :-) So we don't want smp_mb() in the reader, *and* also don't want
synchronize_sched() in the writer! Sounds like saying we want to have the cake
and eat it too ;-) :P
But seriously speaking, I understand that its a bit of a hard choice to make.
On one side, we can avoid synchronize_sched() at the writer, but have to bear
the burden of smp_mb() at the reader even in the fastpath, when no writer is active.
On the other side, we can make the reader avoid smp_mb(), but the writer has to
suffer a full synchronize_sched(). But an important point is that, at the writer
side, we already wait for so many mutex locks and stuff, like in the CPU_DOWN_PREPARE
stage (as determined by the callbacks). So adding a synchronize_sched() to the mix
shouldn't make it noticeably bad, isn't it?
(I'm not arguing in favor of using synchronize_sched(). I'm just trying to point
out that using it might not be as bad as we think it is, in the CPU hotplug case.
And moreover, since I'm still not convinced about the writer API part if use
synchronize_sched(), I'd rather avoid synchronize_sched().)
Regards,
Srivatsa S. Bhat
^ permalink raw reply
* Re: [RFC PATCH v4 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
From: Oleg Nesterov @ 2012-12-19 19:14 UTC (permalink / raw)
To: Srivatsa S. Bhat
Cc: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung,
vincent.guittot, tj, sbw, amit.kucheria, rostedt, rjw, wangyun,
xiaoguangrong, nikunj, linux-pm, linux-kernel
In-Reply-To: <50D2047A.1040606@linux.vnet.ibm.com>
On 12/19, Srivatsa S. Bhat wrote:
>
> BTW, there is a small problem with the synchronize_sched() approach:
> We can't generalize the synchronization scheme as a generic percpu rwlock
> because the writer's role is split into 2, the first part (the one having
> synchronize_sched()) which should be run in process context, and the
> second part (the rest of it), which can be run in atomic context.
Yes,
> That is, needing the writer to be able to sleep (due to synchronize_sched())
> will make the locking scheme unusable (in other usecases) I think. And
> the split (blocking and non-blocking part) itself seems hard to express
> as a single writer API.
I do not think this is the problem.
We need 2 helpers for writer, the 1st one does synchronize_sched() and the
2nd one takes rwlock. A generic percpu_write_lock() simply calls them both.
In fact I think that the 1st helper should not do synchronize_sched(),
and (say) cpu_hotplug_begin() should call it itself. Because if we also
change cpu_hotplug_begin() to use percpu_rw_sem we do not want to do
synchronize_sched() twice. But lets ignore this for now.
But,
> Hmmm.. so what do we do? Shall we say "We anyway have to do smp_rmb() in the
> reader in the fast path. Instead let's do a full smp_mb() and get rid of
> the synchronize_sched() in the writer, and thereby expose this synchronization
> scheme as generic percpu rwlocks?" Or should we rather use synchronize_sched()
> and keep this synchronization scheme restricted to CPU hotplug only?
Oh, I do not know ;)
To me, the main question is: can we use synchronize_sched() in cpu_down?
It is slow.
Oleg.
^ permalink raw reply
* Re: [RFC PATCH v4 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
From: Srivatsa S. Bhat @ 2012-12-19 18:16 UTC (permalink / raw)
To: Oleg Nesterov
Cc: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung,
vincent.guittot, tj, sbw, amit.kucheria, rostedt, rjw, wangyun,
xiaoguangrong, nikunj, linux-pm, linux-kernel
In-Reply-To: <20121219163900.GA18516@redhat.com>
On 12/19/2012 10:09 PM, Oleg Nesterov wrote:
> (sorry if you see this email twice)
>
> On 12/19, Srivatsa S. Bhat wrote:
>>
>> On 12/19/2012 01:13 AM, Oleg Nesterov wrote:
>>
>>>> I tried thinking about other ways to avoid that smp_mb() in the reader,
>>>
>>> Just in case, I think there is no way to avoid mb() in _get (although
>>> perhaps it can be "implicit").
>>>
>>
>> Actually, I was trying to avoid mb() in the _fastpath_, when there is no
>> active writer. I missed stating that clearly, sorry.
>
> Yes, I meant the fastpath too,
>
>>> The writer changes cpu_online_mask and drops the lock. We need to ensure
>>> that the reader sees the change in cpu_online_mask after _get returns.
>>>
>>
>> The write_unlock() will ensure the completion of the update to cpu_online_mask,
>> using the same semi-permeable logic that you pointed above. So readers will
>> see the update as soon as the writer releases the lock, right?
>
> Why?
>
> Once again, the writer (say) removes CPU_1 from cpu_online_mask, and
> sets writer_signal[0] = 0, this "enables" fast path on CPU_0.
>
> But, without a barrier, there is no guarantee that CPU_0 will see the
> change in cpu_online_mask after get_online_cpus_atomic() checks
> writer_signal[0] and returns.
>
Hmm, because a reader's code can get reordered to something like:
read cpu_online_mask
get_online_cpus_atomic()
You are right, I missed that.
> Hmm. And this means that the last version lacks the additional rmb()
> (at least) if writer_active() == F.
>
Yes, the fast path needs that smp_rmb(), whereas the slow-path is fine,
because it takes the read_lock().
BTW, there is a small problem with the synchronize_sched() approach:
We can't generalize the synchronization scheme as a generic percpu rwlock
because the writer's role is split into 2, the first part (the one having
synchronize_sched()) which should be run in process context, and the
second part (the rest of it), which can be run in atomic context.
That is, needing the writer to be able to sleep (due to synchronize_sched())
will make the locking scheme unusable (in other usecases) I think. And
the split (blocking and non-blocking part) itself seems hard to express
as a single writer API.
Hmmm.. so what do we do? Shall we say "We anyway have to do smp_rmb() in the
reader in the fast path. Instead let's do a full smp_mb() and get rid of
the synchronize_sched() in the writer, and thereby expose this synchronization
scheme as generic percpu rwlocks?" Or should we rather use synchronize_sched()
and keep this synchronization scheme restricted to CPU hotplug only?
Regards,
Srivatsa S. Bhat
^ permalink raw reply
* Re: [RFC PATCH v4 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
From: Oleg Nesterov @ 2012-12-19 16:39 UTC (permalink / raw)
To: Srivatsa S. Bhat
Cc: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung,
vincent.guittot, tj, sbw, amit.kucheria, rostedt, rjw, wangyun,
xiaoguangrong, nikunj, linux-pm, linux-kernel
In-Reply-To: <50D0CCB3.10105@linux.vnet.ibm.com>
(sorry if you see this email twice)
On 12/19, Srivatsa S. Bhat wrote:
>
> On 12/19/2012 01:13 AM, Oleg Nesterov wrote:
>
> >> I tried thinking about other ways to avoid that smp_mb() in the reader,
> >
> > Just in case, I think there is no way to avoid mb() in _get (although
> > perhaps it can be "implicit").
> >
>
> Actually, I was trying to avoid mb() in the _fastpath_, when there is no
> active writer. I missed stating that clearly, sorry.
Yes, I meant the fastpath too,
> > The writer changes cpu_online_mask and drops the lock. We need to ensure
> > that the reader sees the change in cpu_online_mask after _get returns.
> >
>
> The write_unlock() will ensure the completion of the update to cpu_online_mask,
> using the same semi-permeable logic that you pointed above. So readers will
> see the update as soon as the writer releases the lock, right?
Why?
Once again, the writer (say) removes CPU_1 from cpu_online_mask, and
sets writer_signal[0] = 0, this "enables" fast path on CPU_0.
But, without a barrier, there is no guarantee that CPU_0 will see the
change in cpu_online_mask after get_online_cpus_atomic() checks
writer_signal[0] and returns.
Hmm. And this means that the last version lacks the additional rmb()
(at least) if writer_active() == F.
Oleg.
^ permalink raw reply
* Re: [PATCH 0/2] clocksource: nomadik-mtu: support timer-based delay
From: John Stultz @ 2012-12-19 16:18 UTC (permalink / raw)
To: Linus Walleij
Cc: Arnd Bergmann, linux-pm, Fabio Baltieri, linux-kernel, cpufreq,
Rafael J. Wysocki, Thomas Gleixner, linux-arm-kernel
In-Reply-To: <CACRpkdZe4eJ_0ND1gTPFKbXVSXaiCTeLW-m4YjZ87ZoDTiFxhA@mail.gmail.com>
[-- Attachment #1.1: Type: text/plain, Size: 514 bytes --]
On Dec 19, 2012 7:34 AM, "Linus Walleij" <linus.walleij@linaro.org> wrote:
>
> On Mon, Dec 17, 2012 at 7:59 PM, John Stultz <john.stultz@linaro.org>
wrote:
> > On 12/17/2012 10:38 AM, Linus Walleij wrote:
> >> If no clocksource maintainer steps up in a week then notify me and
> >> I'll simply stick these into the ux500 tree.
> >
> > Yea, I'd actually prefer these go through the arch tree where they can
get
> > testing.
>
> OK, can I have your ACK?
Acked-by: John Stultz <john.stultz@linaro.org>
thanks
-john
[-- Attachment #1.2: Type: text/html, Size: 836 bytes --]
[-- Attachment #2: Type: text/plain, Size: 176 bytes --]
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply
* Re: [PATCH 0/2] clocksource: nomadik-mtu: support timer-based delay
From: Linus Walleij @ 2012-12-19 15:34 UTC (permalink / raw)
To: John Stultz
Cc: Fabio Baltieri, Arnd Bergmann, Thomas Gleixner, Rafael J. Wysocki,
cpufreq, linux-pm, linux-arm-kernel, linux-kernel
In-Reply-To: <50CF6B77.3000000@linaro.org>
On Mon, Dec 17, 2012 at 7:59 PM, John Stultz <john.stultz@linaro.org> wrote:
> On 12/17/2012 10:38 AM, Linus Walleij wrote:
>>
>> On Mon, Dec 17, 2012 at 12:36 PM, Fabio Baltieri
>> <fabio.baltieri@linaro.org> wrote:
>>>
>>> On Tue, Dec 04, 2012 at 11:10:43AM +0100, Fabio Baltieri wrote:
>>>>
>>>> this implements timer-based delay support for nomadik and ux500
>>>> platforms, using the MTU as time source, and marks the u8500 cpufreq
>>>> driver as CPUFREQ_CONST_LOOPS accordingly.
>>>>
>>>> The patches are based on Arnd's arm-soc/ux500/mtu-clk branch, as that
>>>> contains latest MTU driver developments, including a driver move/rename,
>>>> but I can rebase if necessary.
>>>
>>> The patches applies cleanly on current mainline now, as all dependencies
>>> has already been merged, can this be considered for for merging too?
>>
>> If no clocksource maintainer steps up in a week then notify me and
>> I'll simply stick these into the ux500 tree.
>
> Yea, I'd actually prefer these go through the arch tree where they can get
> testing.
OK, can I have your ACK?
Yours,
Linus Walleij
^ permalink raw reply
* [PATCH 2/2] thermal: db8500: Use of_match_ptr() macro in db8500_cpufreq_cooling.c
From: Sachin Kamat @ 2012-12-19 8:50 UTC (permalink / raw)
To: linux-pm; +Cc: rui.zhang, hongbo.zhang, sachin.kamat, patches, Hongbo Zhang
In-Reply-To: <1355907059-28720-1-git-send-email-sachin.kamat@linaro.org>
This eliminates having an #ifdef returning NULL for the case
when OF is disabled.
Cc: Hongbo Zhang <hongbo.zhang@stericsson.com>
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
---
Compile tested on linux-next.
---
drivers/thermal/db8500_cpufreq_cooling.c | 5 ++---
1 files changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/thermal/db8500_cpufreq_cooling.c b/drivers/thermal/db8500_cpufreq_cooling.c
index 4cf8e72..2141985 100644
--- a/drivers/thermal/db8500_cpufreq_cooling.c
+++ b/drivers/thermal/db8500_cpufreq_cooling.c
@@ -21,6 +21,7 @@
#include <linux/cpufreq.h>
#include <linux/err.h>
#include <linux/module.h>
+#include <linux/of.h>
#include <linux/platform_device.h>
#include <linux/slab.h>
@@ -73,15 +74,13 @@ static const struct of_device_id db8500_cpufreq_cooling_match[] = {
{ .compatible = "stericsson,db8500-cpufreq-cooling" },
{},
};
-#else
-#define db8500_cpufreq_cooling_match NULL
#endif
static struct platform_driver db8500_cpufreq_cooling_driver = {
.driver = {
.owner = THIS_MODULE,
.name = "db8500-cpufreq-cooling",
- .of_match_table = db8500_cpufreq_cooling_match,
+ .of_match_table = of_match_ptr(db8500_cpufreq_cooling_match),
},
.probe = db8500_cpufreq_cooling_probe,
.suspend = db8500_cpufreq_cooling_suspend,
--
1.7.4.1
^ permalink raw reply related
* [PATCH 1/2] thermal: db8500: Use of_match_ptr() macro in db8500_thermal.c
From: Sachin Kamat @ 2012-12-19 8:50 UTC (permalink / raw)
To: linux-pm; +Cc: rui.zhang, hongbo.zhang, sachin.kamat, patches, Hongbo Zhang
This eliminates having an #ifdef returning NULL for the case
when OF is disabled.
Cc: Hongbo Zhang <hongbo.zhang@stericsson.com>
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
---
Compile tested on linux-next.
---
drivers/thermal/db8500_thermal.c | 4 +---
1 files changed, 1 insertions(+), 3 deletions(-)
diff --git a/drivers/thermal/db8500_thermal.c b/drivers/thermal/db8500_thermal.c
index ec71ade..61ce60a 100644
--- a/drivers/thermal/db8500_thermal.c
+++ b/drivers/thermal/db8500_thermal.c
@@ -508,15 +508,13 @@ static const struct of_device_id db8500_thermal_match[] = {
{ .compatible = "stericsson,db8500-thermal" },
{},
};
-#else
-#define db8500_thermal_match NULL
#endif
static struct platform_driver db8500_thermal_driver = {
.driver = {
.owner = THIS_MODULE,
.name = "db8500-thermal",
- .of_match_table = db8500_thermal_match,
+ .of_match_table = of_match_ptr(db8500_thermal_match),
},
.probe = db8500_thermal_probe,
.suspend = db8500_thermal_suspend,
--
1.7.4.1
^ permalink raw reply related
* Re: [PATCH] [RFC] cpufreq: can't raise max frequency with cpu_thermal
From: Doug Anderson @ 2012-12-19 5:45 UTC (permalink / raw)
To: amit daniel kachhap
Cc: Sonny Rao, linux-pm, linux-kernel@vger.kernel.org, Zhang Rui,
Sameer Nanda
In-Reply-To: <CADGdYn6P8T9AKBpHV+uz6avh0+8uWQ+qnKyCjYsYdHQRpPriRg@mail.gmail.com>
Amit,
On Tue, Dec 18, 2012 at 8:17 PM, amit daniel kachhap
<amit.daniel@samsung.com> wrote:
> On Tue, Dec 18, 2012 at 12:29 AM, Sonny Rao <sonnyrao@chromium.org> wrote:
>> The cpu_thermal generic thermal management code has a bug where once
>> max cpu frequency has been lowered in sysfs (scaling_max_freq) it is
>> not possible to raise the max back up later. The bug is that the
>> notifer gets called by __cpufreq_set_policy() before the user policy
>> max is raised, and is incorrectly trying to enforce the max frequency
>> policy even when we are trying to change the policy. It is also not
>> clear why this driver is looking at the user policy since it is
>> primarily supposed to enforce thermal policy, not user set policy.
>
> Hi Sunny,
>
> I am not sure if this change is needed.
Do you have a machine that's running with your code? Can you go into
sysfs (/sys/devices/system/cpu/cpu0/cpufreq/) and try lowering then
raising the max frequency by doing something like this (assumes that
you can scale down to 200MHz):
cd /sys/devices/system/cpu/cpu0/cpufreq/
OLD_VAL=$(cat scaling_max_freq)
cat scaling_min_freq > scaling_max_freq
echo ${OLD_VAL} > scaling_max_freq
echo "$(cat scaling_max_freq) should be ${OLD_VAL}. Is it?"
...when I run the above without Sonny's patch on my system I see:
200000 should be 1700000. Is it?
...after Sonny's patch then the above works.
> There is a check in cpufreq_thermal_notifier function to return 0 if
> notify_device == NOTIFY_INVALID. So the user will be always able to
> change the max frequency in normal situation. Did you tested this for
> some corner cases?
> The reason behind putting this check is that I don't want to override
> the user constraints.
>
> Thanks,
> Amit Daniel
>
>>
>> Signed-off-by: Sonny Rao <sonnyrao@chromium.org>
>> ---
>> drivers/thermal/cpu_cooling.c | 4 ----
>> 1 files changed, 0 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
>> index 836828e..63bc708 100644
>> --- a/drivers/thermal/cpu_cooling.c
>> +++ b/drivers/thermal/cpu_cooling.c
>> @@ -219,10 +219,6 @@ static int cpufreq_thermal_notifier(struct notifier_block *nb,
>> if (cpumask_test_cpu(policy->cpu, ¬ify_device->allowed_cpus))
>> max_freq = notify_device->cpufreq_val;
>>
>> - /* Never exceed user_policy.max*/
>> - if (max_freq > policy->user_policy.max)
>> - max_freq = policy->user_policy.max;
>> -
>> if (policy->max != max_freq)
>> cpufreq_verify_within_limits(policy, 0, max_freq);
>>
>> --
>> 1.7.7.3
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at http://www.tux.org/lkml/
-Doug
^ permalink raw reply
* Re: [PATCH] [RFC] cpufreq: can't raise max frequency with cpu_thermal
From: amit daniel kachhap @ 2012-12-19 4:17 UTC (permalink / raw)
To: Sonny Rao; +Cc: linux-pm, linux-kernel, Zhang Rui, Doug Anderson, Sameer Nanda
In-Reply-To: <1355819384-25983-1-git-send-email-sonnyrao@chromium.org>
On Tue, Dec 18, 2012 at 12:29 AM, Sonny Rao <sonnyrao@chromium.org> wrote:
> The cpu_thermal generic thermal management code has a bug where once
> max cpu frequency has been lowered in sysfs (scaling_max_freq) it is
> not possible to raise the max back up later. The bug is that the
> notifer gets called by __cpufreq_set_policy() before the user policy
> max is raised, and is incorrectly trying to enforce the max frequency
> policy even when we are trying to change the policy. It is also not
> clear why this driver is looking at the user policy since it is
> primarily supposed to enforce thermal policy, not user set policy.
Hi Sunny,
I am not sure if this change is needed.
There is a check in cpufreq_thermal_notifier function to return 0 if
notify_device == NOTIFY_INVALID. So the user will be always able to
change the max frequency in normal situation. Did you tested this for
some corner cases?
The reason behind putting this check is that I don't want to override
the user constraints.
Thanks,
Amit Daniel
>
> Signed-off-by: Sonny Rao <sonnyrao@chromium.org>
> ---
> drivers/thermal/cpu_cooling.c | 4 ----
> 1 files changed, 0 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
> index 836828e..63bc708 100644
> --- a/drivers/thermal/cpu_cooling.c
> +++ b/drivers/thermal/cpu_cooling.c
> @@ -219,10 +219,6 @@ static int cpufreq_thermal_notifier(struct notifier_block *nb,
> if (cpumask_test_cpu(policy->cpu, ¬ify_device->allowed_cpus))
> max_freq = notify_device->cpufreq_val;
>
> - /* Never exceed user_policy.max*/
> - if (max_freq > policy->user_policy.max)
> - max_freq = policy->user_policy.max;
> -
> if (policy->max != max_freq)
> cpufreq_verify_within_limits(policy, 0, max_freq);
>
> --
> 1.7.7.3
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply
* Re: [GIT PULL] power tools for Linux-3.8
From: Linus Torvalds @ 2012-12-18 20:38 UTC (permalink / raw)
To: Len Brown; +Cc: Linux PM list, linux-kernel
In-Reply-To: <50CF6FC5.20005@kernel.org>
On Mon, Dec 17, 2012 at 11:17 AM, Len Brown <lenb@kernel.org> wrote:
>
> There will be a logical conflict with the uapi changes
> which breaks the turbostat build. It can be fixed this way:
I *think* you meant to point me at the 'release' branch, so that's
what I pulled.
What you actually pointed me at was the 'next' branch:
> git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux.git next
which had that fixed up as a separate commit (which breaks bisecting
the build, but I guess that doesn't much matter for something small
like turbostat).
Linus
^ permalink raw reply
* Re: [RFC PATCH v4 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
From: Srivatsa S. Bhat @ 2012-12-18 20:06 UTC (permalink / raw)
To: Oleg Nesterov
Cc: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung,
vincent.guittot, tj, sbw, amit.kucheria, rostedt, rjw, wangyun,
xiaoguangrong, nikunj, linux-pm, linux-kernel
In-Reply-To: <20121218194357.GA27972@redhat.com>
On 12/19/2012 01:13 AM, Oleg Nesterov wrote:
> On 12/18, Srivatsa S. Bhat wrote:
>>
>> So now that we can't avoid disabling and enabling interrupts,
>
> Still I think it would be better to not use local_irq_save/restore
> directly.
Sure, we can use this_cpu_add() itself. I explicitly used
local_irq_save/restore here just to explain my question.
> And,
>
>> I was
>> wondering if we could exploit this to avoid the smp_mb()..
>>
>> Maybe this is a stupid question, but I'll shoot it anyway...
>> Does local_irq_disable()/enable provide any ordering guarantees by any chance?
>
> Oh, I do not know.
>
> But please look at the comment above prepare_to_wait(). It assumes
> that even spin_unlock_irqrestore() is not the full barrier.
>
Semi-permeable barrier.. Hmm..
> In any case. get_online_cpus_atomic() has to use irq_restore, not
> irq_enable. And _restore does nothing "special" if irqs were already
> disabled, so I think we can't rely on sti.
>
Right, I forgot about the _restore part.
>> I tried thinking about other ways to avoid that smp_mb() in the reader,
>
> Just in case, I think there is no way to avoid mb() in _get (although
> perhaps it can be "implicit").
>
Actually, I was trying to avoid mb() in the _fastpath_, when there is no
active writer. I missed stating that clearly, sorry.
> The writer changes cpu_online_mask and drops the lock. We need to ensure
> that the reader sees the change in cpu_online_mask after _get returns.
>
The write_unlock() will ensure the completion of the update to cpu_online_mask,
using the same semi-permeable logic that you pointed above. So readers will
see the update as soon as the writer releases the lock, right?
>> but was unsuccessful. So if the above assumption is wrong, I guess we'll
>> just have to go with the version that uses synchronize_sched() at the
>> writer-side.
>
> In this case we can also convert get_online_cpus() to use percpu_rwsem
> and avoid mutex_lock(&cpu_hotplug.lock), but this is minor I guess.
> I do not think get_online_cpus() is called too often.
>
Yes, we could do that as well. I remember you saying that you had some
patches for percpu_rwsem to help use it in cpu hotplug code (to make it
recursive, IIRC).
So, I guess we'll go with the synchronize_sched() approach for percpu rwlocks
then. Tejun, it is still worthwhile to expose this as a generic percpu rwlock
and then use it inside cpu hotplug code, right?
Regards,
Srivatsa S. Bhat
^ permalink raw reply
* Re: [RFC PATCH v4 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
From: Oleg Nesterov @ 2012-12-18 19:43 UTC (permalink / raw)
To: Srivatsa S. Bhat
Cc: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung,
vincent.guittot, tj, sbw, amit.kucheria, rostedt, rjw, wangyun,
xiaoguangrong, nikunj, linux-pm, linux-kernel
In-Reply-To: <50D09180.4080703@linux.vnet.ibm.com>
On 12/18, Srivatsa S. Bhat wrote:
>
> So now that we can't avoid disabling and enabling interrupts,
Still I think it would be better to not use local_irq_save/restore
directly. And,
> I was
> wondering if we could exploit this to avoid the smp_mb()..
>
> Maybe this is a stupid question, but I'll shoot it anyway...
> Does local_irq_disable()/enable provide any ordering guarantees by any chance?
Oh, I do not know.
But please look at the comment above prepare_to_wait(). It assumes
that even spin_unlock_irqrestore() is not the full barrier.
In any case. get_online_cpus_atomic() has to use irq_restore, not
irq_enable. And _restore does nothing "special" if irqs were already
disabled, so I think we can't rely on sti.
> I tried thinking about other ways to avoid that smp_mb() in the reader,
Just in case, I think there is no way to avoid mb() in _get (although
perhaps it can be "implicit").
The writer changes cpu_online_mask and drops the lock. We need to ensure
that the reader sees the change in cpu_online_mask after _get returns.
> but was unsuccessful. So if the above assumption is wrong, I guess we'll
> just have to go with the version that uses synchronize_sched() at the
> writer-side.
In this case we can also convert get_online_cpus() to use percpu_rwsem
and avoid mutex_lock(&cpu_hotplug.lock), but this is minor I guess.
I do not think get_online_cpus() is called too often.
Oleg.
^ permalink raw reply
* Re: [PATCH] [RFC] cpufreq: can't raise max frequency with cpu_thermal
From: Doug Anderson @ 2012-12-18 16:03 UTC (permalink / raw)
To: Sonny Rao
Cc: linux-pm, linux-kernel@vger.kernel.org, Zhang Rui,
Amit Daniel Kachhap, Sameer Nanda, Subash
In-Reply-To: <1355819384-25983-1-git-send-email-sonnyrao@chromium.org>
On Tue, Dec 18, 2012 at 12:29 AM, Sonny Rao <sonnyrao@chromium.org> wrote:
> The cpu_thermal generic thermal management code has a bug where once
> max cpu frequency has been lowered in sysfs (scaling_max_freq) it is
> not possible to raise the max back up later. The bug is that the
> notifer gets called by __cpufreq_set_policy() before the user policy
> max is raised, and is incorrectly trying to enforce the max frequency
> policy even when we are trying to change the policy. It is also not
> clear why this driver is looking at the user policy since it is
> primarily supposed to enforce thermal policy, not user set policy.
>
> Signed-off-by: Sonny Rao <sonnyrao@chromium.org>
> ---
> drivers/thermal/cpu_cooling.c | 4 ----
> 1 files changed, 0 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
> index 836828e..63bc708 100644
> --- a/drivers/thermal/cpu_cooling.c
> +++ b/drivers/thermal/cpu_cooling.c
> @@ -219,10 +219,6 @@ static int cpufreq_thermal_notifier(struct notifier_block *nb,
> if (cpumask_test_cpu(policy->cpu, ¬ify_device->allowed_cpus))
> max_freq = notify_device->cpufreq_val;
>
> - /* Never exceed user_policy.max*/
> - if (max_freq > policy->user_policy.max)
> - max_freq = policy->user_policy.max;
> -
> if (policy->max != max_freq)
> cpufreq_verify_within_limits(policy, 0, max_freq);
>
> --
> 1.7.7.3
>
Sonny's change matches what the "ACPI version" of this code
(drivers/acpi/processor_thermal.c) does as well. I would certainly be
interested to know why the code was added here in the first place.
Amit: do you know?
Reviewed-by: Doug Anderson <dianders@chromium.org>
^ permalink raw reply
* Re: [RFC PATCH v4 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
From: Srivatsa S. Bhat @ 2012-12-18 15:53 UTC (permalink / raw)
To: Oleg Nesterov
Cc: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung,
vincent.guittot, tj, sbw, amit.kucheria, rostedt, rjw, wangyun,
xiaoguangrong, nikunj, linux-pm, linux-kernel
In-Reply-To: <20121214180345.GA22024@redhat.com>
On 12/14/2012 11:33 PM, Oleg Nesterov wrote:
> On 12/13, Srivatsa S. Bhat wrote:
>>
>> On 12/13/2012 09:47 PM, Oleg Nesterov wrote:
>>> On 12/13, Srivatsa S. Bhat wrote:
>>>>
>>>> On 12/13/2012 12:42 AM, Srivatsa S. Bhat wrote:
>>>>>
>>>>> Even I don't spot anything wrong with it. But I'll give it some more
>>>>> thought..
>>>>
>>>> Since an interrupt handler can also run get_online_cpus_atomic(), we
>>>> cannot use the __this_cpu_* versions for modifying reader_percpu_refcnt,
>>>> right?
>>>
>>> Hmm. I thought that __this_cpu_* must be safe under preempt_disable().
>>> IOW, I thought that, say, this_cpu_inc() is "equal" to preempt_disable +
>>> __this_cpu_inc() correctness-wise.
>>>
>>> And. I thought that this_cpu_inc() is safe wrt interrupt, like local_t.
>>>
>>> But when I try to read the comments percpu.h, I am starting to think that
>>> even this_cpu_inc() is not safe if irq handler can do the same?
>>>
>>
>> The comment seems to say that its not safe wrt interrupts. But looking at
>> the code in include/linux/percpu.h, IIUC, that is true only about
>> this_cpu_read() because it only disables preemption.
>>
>> However, this_cpu_inc() looks safe wrt interrupts because it wraps the
>> increment within raw_local_irqsave()/restore().
>
> You mean _this_cpu_generic_to_op() I guess. So yes, I think you are right,
> this_cpu_* should be irq-safe, but __this_cpu_* is not.
>
Yes.
> Thanks.
>
> At least on x86 there is no difference between this_ and __this_, both do
> percpu_add_op() without local_irq_disable/enable. But it seems that most
> of architectures use generic code.
>
So now that we can't avoid disabling and enabling interrupts, I was
wondering if we could exploit this to avoid the smp_mb()..
Maybe this is a stupid question, but I'll shoot it anyway...
Does local_irq_disable()/enable provide any ordering guarantees by any chance?
I think the answer is no, but if it is yes, I guess we can do as shown
below to ensure that STORE(reader_percpu_refcnt) happens before
LOAD(writer_signal).
void get_online_cpus_atomic(void)
{
unsigned long flags;
preempt_disable();
//only for writer
local_irq_save(flags);
__this_cpu_add(reader_percpu_refcnt, XXXX);
local_irq_restore(flags);
//no need of an explicit smp_mb()
if (__this_cpu_read(reader_percpu_refcnt) & MASK) {
this_cpu_inc(reader_percpu_refcnt);
} else if (writer_active()) {
...
}
this_cpu_dec(reader_percpu_refcnt, XXXX);
}
I tried thinking about other ways to avoid that smp_mb() in the reader,
but was unsuccessful. So if the above assumption is wrong, I guess we'll
just have to go with the version that uses synchronize_sched() at the
writer-side.
Regards,
Srivatsa S. Bhat
^ permalink raw reply
* Re: [PATCH 2/8] Thermal: Create zone level APIs
From: Joe Perches @ 2012-12-18 11:30 UTC (permalink / raw)
To: Durgadoss R; +Cc: rui.zhang, linux-pm, linux-kernel, hongbo.zhang, wni
In-Reply-To: <1355822977-4804-3-git-send-email-durgadoss.r@intel.com>
On Tue, 2012-12-18 at 14:59 +0530, Durgadoss R wrote:
> This patch adds a new thermal_zone structure to
> thermal.h. Also, adds zone level APIs to the thermal
> framework.
[]
> diff --git a/drivers/thermal/thermal_sys.c b/drivers/thermal/thermal_sys.c
> +#define GET_INDEX(tz, ptr, indx, type) \
> + do { \
> + int i; \
> + indx = -EINVAL; \
> + if (!tz || !ptr) \
> + break; \
> + mutex_lock(&type##_list_lock); \
> + for (i = 0; i < tz->type##_indx; i++) { \
> + if (tz->type##s[i] == ptr) { \
> + indx = i; \
> + break; \
> + } \
> + } \
> + mutex_unlock(&type##_list_lock); \
> + } while (0)
A statement expression macro returning int would be
more kernel style like and better to use.
(sorry about the whitespace, evolution 3.6 is crappy)
#define GET_INDEX(tx, ptr, type) \
({ \
int rtn = -EINVAL; \
do { \
int i; \
if (!tz || !ptr) \
break; \
mutex_lock(&type##_list_lock); \
for (i = 0; i < tz->type##_indx; i++) { \
if (tz->type##s[i] == ptr) { \
rtn = i; \
break; \
} \
} \
mutex_unlock(&type##_list_lock); \
} while (0); \
rtn; \
})
> +static void remove_sensor_from_zone(struct thermal_zone *tz,
> + struct thermal_sensor *ts)
> +{
> + int j, indx;
> +
> + GET_INDEX(tz, ts, indx, sensor);
This becomes
indx = GET_INDEX(tx, ts, sensor);
> + if (indx < 0)
> + return;
^ permalink raw reply
* Re: [PATCH 1/8] Thermal: Create sensor level APIs
From: Joe Perches @ 2012-12-18 11:13 UTC (permalink / raw)
To: Durgadoss R; +Cc: rui.zhang, linux-pm, linux-kernel, hongbo.zhang, wni
In-Reply-To: <1355822977-4804-2-git-send-email-durgadoss.r@intel.com>
On Tue, 2012-12-18 at 14:59 +0530, Durgadoss R wrote:
> This patch creates sensor level APIs, in the
> generic thermal framework.
Just some trivial notes.
> diff --git a/drivers/thermal/thermal_sys.c b/drivers/thermal/thermal_sys.c
[]
> +static ssize_t
> +sensor_temp_show(struct device *dev, struct device_attribute *attr, char *buf)
> +{
> + int ret;
> + long val;
> + struct thermal_sensor *ts = to_thermal_sensor(dev);
> +
> + ret = ts->ops->get_temp(ts, &val);
> +
> + return ret ? ret : sprintf(buf, "%ld\n", val);
I'd much prefer the form
ret = ts->ops...
if (ret)
return ret;
return sprintf(buf, "%ld\n", val);
Otherwise, maybe use gcc's pretty common ?: extension
return ret ?: sprintf(...)
[]
> +static int enable_sensor_thresholds(struct thermal_sensor *ts, int count)
> +{
> + int i;
> + int size = sizeof(struct thermal_attr) * count;
> +
> + ts->thresh_attrs = kzalloc(size, GFP_KERNEL);
kcalloc
> + if (!ts->thresh_attrs)
> + return -ENOMEM;
> +
> + if (ts->ops->get_hyst) {
> + ts->hyst_attrs = kzalloc(size, GFP_KERNEL);
kcalloc here too
^ permalink raw reply
* [PATCH 4/8] Thermal: Add Thermal_trip sysfs node
From: Durgadoss R @ 2012-12-18 9:29 UTC (permalink / raw)
To: rui.zhang, linux-pm; +Cc: linux-kernel, hongbo.zhang, wni, Durgadoss R
In-Reply-To: <1355822977-4804-1-git-send-email-durgadoss.r@intel.com>
This patch adds a thermal_trip directory under
/sys/class/thermal/zoneX. This directory contains
the trip point values for sensors bound to this
zone.
Signed-off-by: Durgadoss R <durgadoss.r@intel.com>
---
drivers/thermal/thermal_sys.c | 237 ++++++++++++++++++++++++++++++++++++++++-
include/linux/thermal.h | 37 +++++++
2 files changed, 272 insertions(+), 2 deletions(-)
diff --git a/drivers/thermal/thermal_sys.c b/drivers/thermal/thermal_sys.c
index b39bf97..29ec073 100644
--- a/drivers/thermal/thermal_sys.c
+++ b/drivers/thermal/thermal_sys.c
@@ -448,6 +448,22 @@ static void thermal_zone_device_check(struct work_struct *work)
thermal_zone_device_update(tz);
}
+static int get_sensor_indx_by_kobj(struct thermal_zone *tz, const char *name)
+{
+ int i, indx = -EINVAL;
+
+ mutex_lock(&sensor_list_lock);
+ for (i = 0; i < tz->sensor_indx; i++) {
+ if (!strnicmp(name, kobject_name(tz->kobj_trip[i]),
+ THERMAL_NAME_LENGTH)) {
+ indx = i;
+ break;
+ }
+ }
+ mutex_unlock(&sensor_list_lock);
+ return indx;
+}
+
static void remove_sensor_from_zone(struct thermal_zone *tz,
struct thermal_sensor *ts)
{
@@ -459,9 +475,15 @@ static void remove_sensor_from_zone(struct thermal_zone *tz,
sysfs_remove_link(&tz->device.kobj, kobject_name(&ts->device.kobj));
+ /* Delete this sensor's trip Kobject */
+ kobject_del(tz->kobj_trip[indx]);
+
/* Shift the entries in the tz->sensors array */
- for (j = indx; j < MAX_SENSORS_PER_ZONE - 1; j++)
+ for (j = indx; j < MAX_SENSORS_PER_ZONE - 1; j++) {
tz->sensors[j] = tz->sensors[j + 1];
+ tz->sensor_trip[j] = tz->sensor_trip[j + 1];
+ tz->kobj_trip[j] = tz->kobj_trip[j + 1];
+ }
tz->sensor_indx--;
}
@@ -875,6 +897,120 @@ policy_show(struct device *dev, struct device_attribute *devattr, char *buf)
return sprintf(buf, "%s\n", tz->governor->name);
}
+static ssize_t
+active_show(struct kobject *kobj, struct kobj_attribute *attr, char *buf)
+{
+ int i, indx, ret = 0;
+ struct thermal_zone *tz;
+ struct device *dev;
+
+ /* In this function, for
+ * /sys/class/thermal/zoneX/thermal_trip/sensorY:
+ * attr points to sysfs node 'active'
+ * kobj points to sensorY
+ * kobj->parent points to thermal_trip
+ * kobj->parent->parent points to zoneX
+ */
+
+ /* Get the zone pointer */
+ dev = container_of(kobj->parent->parent, struct device, kobj);
+ tz = to_zone(dev);
+ if (!tz)
+ return -EINVAL;
+
+ /*
+ * We need this because in the sysfs tree, 'sensorY' is
+ * not really the sensor pointer. It just has the name
+ * 'sensorY'; whereas 'zoneX' is actually the zone pointer.
+ * This means container_of(kobj, struct device, kobj) will not
+ * provide the actual sensor pointer.
+ */
+ indx = get_sensor_indx_by_kobj(tz, kobject_name(kobj));
+ if (indx < 0)
+ return indx;
+
+ if (tz->sensor_trip[indx]->num_active_trips <= 0)
+ return sprintf(buf, "<Not available>\n");
+
+ ret += sprintf(buf, "0x%x", tz->sensor_trip[indx]->active_trip_mask);
+ for (i = 0; i < tz->sensor_trip[indx]->num_active_trips; i++) {
+ ret += sprintf(buf + ret, " %d",
+ tz->sensor_trip[indx]->active_trips[i]);
+ }
+
+ ret += sprintf(buf + ret, "\n");
+ return ret;
+}
+
+static ssize_t
+ptrip_show(struct kobject *kobj, struct kobj_attribute *attr, char *buf)
+{
+ int i, indx, ret = 0;
+ struct thermal_zone *tz;
+ struct device *dev;
+
+ /* Get the zone pointer */
+ dev = container_of(kobj->parent->parent, struct device, kobj);
+ tz = to_zone(dev);
+ if (!tz)
+ return -EINVAL;
+
+ indx = get_sensor_indx_by_kobj(tz, kobject_name(kobj));
+ if (indx < 0)
+ return indx;
+
+ if (tz->sensor_trip[indx]->num_passive_trips <= 0)
+ return sprintf(buf, "<Not available>\n");
+
+ for (i = 0; i < tz->sensor_trip[indx]->num_passive_trips; i++) {
+ ret += sprintf(buf + ret, "%d ",
+ tz->sensor_trip[indx]->passive_trips[i]);
+ }
+
+ ret += sprintf(buf + ret, "\n");
+ return ret;
+}
+
+static ssize_t
+hot_show(struct kobject *kobj, struct kobj_attribute *attr, char *buf)
+{
+ int indx;
+ struct thermal_zone *tz;
+ struct device *dev;
+
+ /* Get the zone pointer */
+ dev = container_of(kobj->parent->parent, struct device, kobj);
+ tz = to_zone(dev);
+ if (!tz)
+ return -EINVAL;
+
+ indx = get_sensor_indx_by_kobj(tz, kobject_name(kobj));
+ if (indx < 0)
+ return indx;
+
+ return sprintf(buf, "%d\n", tz->sensor_trip[indx]->hot);
+}
+
+static ssize_t
+critical_show(struct kobject *kobj, struct kobj_attribute *attr, char *buf)
+{
+ int indx;
+ struct thermal_zone *tz;
+ struct device *dev;
+
+ /* Get the zone pointer */
+ dev = container_of(kobj->parent->parent, struct device, kobj);
+ tz = to_zone(dev);
+ if (!tz)
+ return -EINVAL;
+
+ indx = get_sensor_indx_by_kobj(tz, kobject_name(kobj));
+ if (indx < 0)
+ return indx;
+
+ return sprintf(buf, "%d\n", tz->sensor_trip[indx]->crit);
+}
+
static DEVICE_ATTR(type, 0444, type_show, NULL);
static DEVICE_ATTR(temp, 0444, temp_show, NULL);
static DEVICE_ATTR(mode, 0644, mode_show, mode_store);
@@ -885,7 +1021,27 @@ static DEVICE_ATTR(policy, S_IRUGO | S_IWUSR, policy_show, policy_store);
static DEVICE_ATTR(sensor_name, 0444, sensor_name_show, NULL);
static DEVICE_ATTR(temp_input, 0444, sensor_temp_show, NULL);
-static DEVICE_ATTR(zone_name, 0444, zone_name_show, NULL);
+/* Thermal zone attributes */
+static DEVICE_ATTR(zone_name, S_IRUGO, zone_name_show, NULL);
+
+/* Thermal trip attributes */
+static struct kobj_attribute active_attr = __ATTR_RO(active);
+/* TODO: rename this to passive while removing old code */
+static struct kobj_attribute passive_attr = __ATTR_RO(ptrip);
+static struct kobj_attribute hot_attr = __ATTR_RO(hot);
+static struct kobj_attribute crit_attr = __ATTR_RO(critical);
+
+static struct attribute *trip_attrs[] = {
+ &active_attr.attr,
+ &passive_attr.attr,
+ &hot_attr.attr,
+ &crit_attr.attr,
+ NULL,
+ };
+
+static struct attribute_group trip_attr_group = {
+ .attrs = trip_attrs,
+ };
/* sys I/F for cooling device */
#define to_cooling_device(_dev) \
@@ -1770,12 +1926,19 @@ struct thermal_zone *create_thermal_zone(const char *name, void *devdata)
if (ret)
goto exit_unregister;
+ tz->kobj_thermal_trip = kobject_create_and_add("thermal_trip",
+ &tz->device.kobj);
+ if (!tz->kobj_thermal_trip)
+ goto exit_name;
+
/* Add this zone to the global list of thermal zones */
mutex_lock(&zone_list_lock);
list_add_tail(&tz->node, &thermal_zone_list);
mutex_unlock(&zone_list_lock);
return tz;
+exit_name:
+ device_remove_file(&tz->device, &dev_attr_zone_name);
exit_unregister:
device_unregister(&tz->device);
exit_idr:
@@ -1789,6 +1952,7 @@ EXPORT_SYMBOL(create_thermal_zone);
void remove_thermal_zone(struct thermal_zone *tz)
{
struct thermal_zone *pos, *next;
+ int i;
bool found = false;
if (!tz)
@@ -1809,6 +1973,33 @@ void remove_thermal_zone(struct thermal_zone *tz)
device_remove_file(&tz->device, &dev_attr_zone_name);
+ /* Just for ease of usage */
+ i = tz->sensor_indx;
+
+ while (--i >= 0) {
+ /* Remove /sys/class/thermal/zoneX/sensorY */
+ sysfs_remove_link(&tz->device.kobj,
+ kobject_name(&tz->sensors[i]->device.kobj));
+
+ /* Remove /sys/class/thermal/zoneX/thermal_trip/sensorY */
+ if (tz->kobj_trip[i]) {
+ sysfs_remove_group(tz->kobj_trip[i], &trip_attr_group);
+ kobject_del(tz->kobj_trip[i]);
+ }
+ }
+
+ /* Remove /sys/class/thermal/zoneX/thermal_trip */
+ kobject_del(tz->kobj_thermal_trip);
+
+ /* Release the cdevs attached to this zone */
+ i = tz->cdev_indx;
+
+ while (--i >= 0) {
+ /* Remove /sys/class/thermal/zoneX/cooling_deviceY */
+ sysfs_remove_link(&tz->device.kobj,
+ kobject_name(&tz->cdevs[i]->device.kobj));
+ }
+
release_idr(&thermal_zone_idr, &thermal_idr_lock, tz->id);
idr_destroy(&tz->idr);
@@ -1920,6 +2111,48 @@ exit_zone:
}
EXPORT_SYMBOL(add_cdev_to_zone);
+int add_sensor_trip_info(struct thermal_zone *tz, struct thermal_sensor *ts,
+ struct thermal_trip_point *trip)
+{
+ int indx, ret = -EINVAL;
+
+ if (!tz || !ts || !trip)
+ return ret;
+
+ mutex_lock(&zone_list_lock);
+
+ GET_INDEX(tz, ts, indx, sensor);
+ if (indx < 0)
+ goto exit_indx;
+
+ /* Create kobj for /sys/class/thermal/zoneX/thermal_trip/sensorY */
+ tz->kobj_trip[indx] = kobject_create_and_add(
+ kobject_name(&ts->device.kobj),
+ tz->kobj_thermal_trip);
+ if (!tz->kobj_trip[indx]) {
+ ret = -ENOMEM;
+ goto exit_indx;
+ }
+
+ ret = sysfs_create_group(tz->kobj_trip[indx], &trip_attr_group);
+ if (ret) {
+ dev_err(&tz->device, "sysfs_create_group failed:%d\n", ret);
+ goto exit_kobj;
+ }
+
+ tz->sensor_trip[indx] = trip;
+ mutex_unlock(&zone_list_lock);
+ return 0;
+
+exit_kobj:
+ kobject_del(tz->kobj_trip[indx]);
+ tz->kobj_trip[indx] = NULL;
+exit_indx:
+ mutex_unlock(&zone_list_lock);
+ return ret;
+}
+EXPORT_SYMBOL(add_sensor_trip_info);
+
/**
* thermal_sensor_register - register a new thermal sensor
* @name: name of the thermal sensor
diff --git a/include/linux/thermal.h b/include/linux/thermal.h
index c4e45c7..8372f05 100644
--- a/include/linux/thermal.h
+++ b/include/linux/thermal.h
@@ -158,6 +158,30 @@ struct thermal_attr {
char name[THERMAL_NAME_LENGTH];
};
+/*
+ * This structure defines the trip points for a sensor.
+ * The actual values for these trip points come from
+ * platform characterization. The thermal governors
+ * (either kernel or user space) may take appropriate
+ * actions when the sensors reach these trip points.
+ * See Documentation/thermal/sysfs-api2.txt for more details.
+ *
+ * As of now, For a particular sensor, we support:
+ * a) 1 hot trip point
+ * b) 1 critical trip point
+ * c) 'n' passive trip points
+ * d) 'm' active trip points
+ */
+struct thermal_trip_point {
+ int hot;
+ int crit;
+ int num_passive_trips;
+ int *passive_trips;
+ int num_active_trips;
+ int *active_trips;
+ int active_trip_mask;
+};
+
struct thermal_sensor {
char name[THERMAL_NAME_LENGTH];
int id;
@@ -215,6 +239,16 @@ struct thermal_zone {
/* cdev level information */
int cdev_indx; /* index into 'cdevs' array */
struct thermal_cooling_device *cdevs[MAX_CDEVS_PER_ZONE];
+
+ /*
+ * Thermal sensors trip information:
+ * kobj_thermal_trip: /sys/class/thermal/zoneX/thermal_trip
+ * kobj_trip: /sys/class/thermal/zoneX/thermal_trip/sensorY
+ * sensor_trip: trip point information for each sensor
+ */
+ struct kobject *kobj_thermal_trip;
+ struct kobject *kobj_trip[MAX_SENSORS_PER_ZONE];
+ struct thermal_trip_point *sensor_trip[MAX_SENSORS_PER_ZONE];
};
/* Structure that holds thermal governor information */
@@ -295,6 +329,9 @@ int add_sensor_to_zone(struct thermal_zone *, struct thermal_sensor *);
int add_cdev_to_zone(struct thermal_zone *, struct thermal_cooling_device *);
+int add_sensor_trip_info(struct thermal_zone *, struct thermal_sensor *,
+ struct thermal_trip_point *);
+
#ifdef CONFIG_NET
extern int thermal_generate_netlink_event(u32 orig, enum events event);
#else
--
1.7.9.5
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox