linux-acpi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] Fix polling policy in the absence of _TZP
@ 2008-05-14  1:00 Matthew Garrett
  2008-05-14  1:03 ` [PATCH 1/2] Add a list of processor objects to the ACPI core Matthew Garrett
  2008-05-14  3:19 ` [PATCH] Fix polling policy in the absence of _TZP Len Brown
  0 siblings, 2 replies; 17+ messages in thread
From: Matthew Garrett @ 2008-05-14  1:00 UTC (permalink / raw)
  To: lenb; +Cc: linux-acpi

Fix Linux to conform to 11.3.18 of the ACPI spec 3.0. In the absence of 
a _TZP method, we should be polling at a default frequency. While it can 
be argued that "0" is a default frequency, I don't think that's what the 
spec authors had in mind.

Signed-off-by: Matthew Garrett <mjg@redhat.com>

---

Arguably this should be implemented in such a way that the polling rate 
increases as the temperature rises and drops if the trend is downwards. 
That would give higher resolution when we're heading towards a trip 
point, without any significant increase in power consumption since 
increased temperatures imply that the CPU is probably not idle.

diff --git a/drivers/acpi/thermal.c b/drivers/acpi/thermal.c
index 504385b..93cb3e8 100644
--- a/drivers/acpi/thermal.c
+++ b/drivers/acpi/thermal.c
@@ -846,10 +846,11 @@ static void acpi_thermal_check(void *data)
 	 * Calculate Sleep Time
 	 * --------------------
 	 * If we're in the passive state, use _TSP's value.  Otherwise
-	 * use the default polling frequency (e.g. _TZP).  If no polling
-	 * frequency is specified then we'll wait forever (at least until
-	 * a thermal event occurs).  Note that _TSP and _TZD values are
-	 * given in 1/10th seconds (we must covert to milliseconds).
+	 * use the default polling frequency (e.g. _TZP).  If no
+	 * polling frequency is specified then we'll wait 10 seconds
+	 * (or until a thermal event occurs).  Note that _TSP and _TZD
+	 * values are given in 1/10th seconds (we must covert to
+	 * milliseconds).
 	 */
 	if (tz->state.passive) {
 		sleep_time = tz->trips.passive.tsp * 100;
@@ -1575,8 +1576,9 @@ static int acpi_thermal_get_info(struct acpi_thermal *tz)
 	/* Get default polling frequency [_TZP] (optional) */
 	if (tzp)
 		tz->polling_frequency = tzp;
-	else
-		acpi_thermal_get_polling_frequency(tz);
+	else if (acpi_thermal_get_polling_frequency(tz) == -ENODEV)
+		/* If no _TZP, default to polling every 10 seconds */
+		tz->polling_frequency = 100;
 
 	return 0;
 }

-- 
Matthew Garrett | mjg59@srcf.ucam.org

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 1/2] Add a list of processor objects to the ACPI core
  2008-05-14  1:00 [PATCH] Fix polling policy in the absence of _TZP Matthew Garrett
@ 2008-05-14  1:03 ` Matthew Garrett
  2008-05-14  1:08   ` [PATCH 2/2] Add a passive cooling trip point if the firmware doesn't define one Matthew Garrett
  2008-05-14  3:19 ` [PATCH] Fix polling policy in the absence of _TZP Len Brown
  1 sibling, 1 reply; 17+ messages in thread
From: Matthew Garrett @ 2008-05-14  1:03 UTC (permalink / raw)
  To: lenb; +Cc: linux-acpi

Add a convenience structure to the ACPI core that allows drivers to 
obtain the list of CPU devices. This is left in the core since the 
scanning is performed at boot time and the drivers may be built as 
modules.

Signed-off-by: Matthew Garrett <mjg@redhat.com>

---

Am I missing a straightforward function to call from a driver that gives 
me the handles of all the objects of a specific type? There must be a 
better way of doing it than this :)

diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
index 6d85289..3b8d036 100644
--- a/drivers/acpi/scan.c
+++ b/drivers/acpi/scan.c
@@ -30,6 +30,9 @@ struct acpi_device_bus_id{
 	struct list_head node;
 };
 
+struct acpi_handle_list acpi_processor_list;
+EXPORT_SYMBOL(processor_list);
+
 /*
  * Creates hid/cid(s) string needed for modalias and uevent
  * e.g. on a device with hid:IBM0001 and cid:ACPI0001 you get:
@@ -1044,6 +1047,11 @@ static void acpi_device_set_id(struct acpi_device *device,
 		break;
 	case ACPI_BUS_TYPE_PROCESSOR:
 		hid = ACPI_PROCESSOR_HID;
+		if (acpi_processor_list.count < ACPI_MAX_HANDLES) {
+			acpi_processor_list.handles[acpi_processor_list.count]
+				=handle;
+			acpi_processor_list.count++;
+		}
 		break;
 	case ACPI_BUS_TYPE_SYSTEM:
 		hid = ACPI_SYSTEM_HID;

-- 
Matthew Garrett | mjg59@srcf.ucam.org

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/2] Add a passive cooling trip point if the firmware doesn't define one
  2008-05-14  1:03 ` [PATCH 1/2] Add a list of processor objects to the ACPI core Matthew Garrett
@ 2008-05-14  1:08   ` Matthew Garrett
  2008-05-14  3:30     ` Len Brown
  0 siblings, 1 reply; 17+ messages in thread
From: Matthew Garrett @ 2008-05-14  1:08 UTC (permalink / raw)
  To: lenb; +Cc: linux-acpi

If a thermal zone is provided with a critical temperature, then there is 
obviously a concern on the part of the vendor that it may overheat. 
Currently Linux will only attempt to do something about that if the 
vendor has explicitly added a passive cooling trip point. However, it's 
clear that allowing the system to hit the critical trip point is far 
from ideal - the system will immediately shut down, and data will almost 
certainly be lost. This patch adds a default passive cooling zone if the 
platform does not provide its own, with the default being to have it be 
5 degrees below the critical shutoff temperature. This should avoid the 
kernel limiting performance unless it's genuinely likely that the 
hardware is about to overheat and shut down. The default temperature 
value can be overridden by passing the thermal.psv argument at boot or 
module load time.

Signed-off-by: Matthew Garrett <mjg@redhat.com>

---

While this is clearly something of a hack, I'd argue that it's the right 
thing to do. In the real world, it's highly unlikely that a piece of 
ahrdware is going to reach equilibrium at 5 degrees below the critical 
temperature. If we've reached that temperature, the machine is in 
serious danger of powering down in the near future and we really ought 
to do something about it.

This patch associates the CPUs with the zone even if the zone may be 
relating to an entirely different part of the hardware. This is a 
pragmatic decision - right now the CPUs are the only hardware we really 
have any thermal control over, and even if the thermal zone is covering 
the GPU (for instance) then the only thing we can do to reduce the heat 
is to reduce the load on the CPU. I think this is certainly better than 
letting the machine power down.

diff --git a/drivers/acpi/thermal.c b/drivers/acpi/thermal.c
index 93cb3e8..19acfd3 100644
--- a/drivers/acpi/thermal.c
+++ b/drivers/acpi/thermal.c
@@ -116,6 +116,8 @@ static const struct acpi_device_id  thermal_device_ids[] = {
 };
 MODULE_DEVICE_TABLE(acpi, thermal_device_ids);
 
+extern struct acpi_handle_list acpi_processor_list;
+
 static struct acpi_driver acpi_thermal_driver = {
 	.name = "thermal",
 	.class = ACPI_THERMAL_CLASS,
@@ -418,9 +420,7 @@ static int acpi_thermal_trips_update(struct acpi_thermal *tz, int flag)
 				"_PSV", NULL, &tz->trips.passive.temperature);
 		}
 
-		if (ACPI_FAILURE(status))
-			tz->trips.passive.flags.valid = 0;
-		else {
+		if (ACPI_SUCCESS(status)) {
 			tz->trips.passive.flags.valid = 1;
 			if (flag == ACPI_TRIPS_INIT) {
 				status = acpi_evaluate_integer(
@@ -440,20 +440,48 @@ static int acpi_thermal_trips_update(struct acpi_thermal *tz, int flag)
 					tz->trips.passive.flags.valid = 0;
 			}
 		}
+
+		if (!tz->trips.passive.flags.valid) {
+			/* If there's no valid passive zone, add a fake
+			   one in order to ensure that we don't hit the
+			   critical temperature limit */
+
+			tz->trips.passive.flags.valid = 1;
+			tz->trips.passive.tc1 = 1;
+			tz->trips.passive.tc2 = 1;
+
+			/* A high rate of polling here is acceptable -
+			   if we're hitting this limit, then the
+			   system is clearly under load. A higher
+			   polling frequency means that we can weigh
+			   the load against the temperature more
+			   effeciently and overall reduce power
+			   consumption */
+
+			tz->trips.passive.tsp = 10;
+
+			/* Set the passive trip temperature to be either
+			   the option passed by the user or 5 degrees below the
+			   critical temperature. That should give us enough
+			   head room without limiting performance */
+
+			if (!psv)
+				tz->trips.passive.temperature =
+					tz->trips.critical.temperature - 50;
+		}
 	}
 	if ((flag & ACPI_TRIPS_DEVICES) && tz->trips.passive.flags.valid) {
 		memset(&devices, 0, sizeof(struct acpi_handle_list));
 		status = acpi_evaluate_reference(tz->device->handle, "_PSL",
 							NULL, &devices);
-		if (ACPI_FAILURE(status))
-			tz->trips.passive.flags.valid = 0;
-		else
-			tz->trips.passive.flags.valid = 1;
-
-		if (memcmp(&tz->trips.passive.devices, &devices,
+		if (ACPI_FAILURE(status)) {
+			memcpy(&tz->trips.passive.devices,
+			       &acpi_processor_list,
+			       sizeof (struct acpi_handle_list));
+		} else if (memcmp(&tz->trips.passive.devices, &devices,
 				sizeof(struct acpi_handle_list))) {
 			memcpy(&tz->trips.passive.devices, &devices,
-				sizeof(struct acpi_handle_list));
+			       sizeof(struct acpi_handle_list));
 			ACPI_THERMAL_TRIPS_EXCEPTION(flag, "device");
 		}
 	}

-- 
Matthew Garrett | mjg59@srcf.ucam.org

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] Fix polling policy in the absence of _TZP
  2008-05-14  1:00 [PATCH] Fix polling policy in the absence of _TZP Matthew Garrett
  2008-05-14  1:03 ` [PATCH 1/2] Add a list of processor objects to the ACPI core Matthew Garrett
@ 2008-05-14  3:19 ` Len Brown
  2008-05-14 10:14   ` Matthew Garrett
  1 sibling, 1 reply; 17+ messages in thread
From: Len Brown @ 2008-05-14  3:19 UTC (permalink / raw)
  To: Matthew Garrett; +Cc: linux-acpi

On Tuesday 13 May 2008, Matthew Garrett wrote:
> Fix Linux to conform to 11.3.18 of the ACPI spec 3.0. In the absence of 
> a _TZP method, we should be polling at a default frequency. While it can 
> be argued that "0" is a default frequency, I don't think that's what the 
> spec authors had in mind.

Actually it is what the spec authors had in mind,
and the spec is wrong.  Yes, I can provide documentation
supporting this statement.

NAK

thanks,
-Len

> Signed-off-by: Matthew Garrett <mjg@redhat.com>
> 
> ---
> 
> Arguably this should be implemented in such a way that the polling rate 
> increases as the temperature rises and drops if the trend is downwards. 
> That would give higher resolution when we're heading towards a trip 
> point, without any significant increase in power consumption since 
> increased temperatures imply that the CPU is probably not idle.
> 
> diff --git a/drivers/acpi/thermal.c b/drivers/acpi/thermal.c
> index 504385b..93cb3e8 100644
> --- a/drivers/acpi/thermal.c
> +++ b/drivers/acpi/thermal.c
> @@ -846,10 +846,11 @@ static void acpi_thermal_check(void *data)
>  	 * Calculate Sleep Time
>  	 * --------------------
>  	 * If we're in the passive state, use _TSP's value.  Otherwise
> -	 * use the default polling frequency (e.g. _TZP).  If no polling
> -	 * frequency is specified then we'll wait forever (at least until
> -	 * a thermal event occurs).  Note that _TSP and _TZD values are
> -	 * given in 1/10th seconds (we must covert to milliseconds).
> +	 * use the default polling frequency (e.g. _TZP).  If no
> +	 * polling frequency is specified then we'll wait 10 seconds
> +	 * (or until a thermal event occurs).  Note that _TSP and _TZD
> +	 * values are given in 1/10th seconds (we must covert to
> +	 * milliseconds).
>  	 */
>  	if (tz->state.passive) {
>  		sleep_time = tz->trips.passive.tsp * 100;
> @@ -1575,8 +1576,9 @@ static int acpi_thermal_get_info(struct acpi_thermal *tz)
>  	/* Get default polling frequency [_TZP] (optional) */
>  	if (tzp)
>  		tz->polling_frequency = tzp;
> -	else
> -		acpi_thermal_get_polling_frequency(tz);
> +	else if (acpi_thermal_get_polling_frequency(tz) == -ENODEV)
> +		/* If no _TZP, default to polling every 10 seconds */
> +		tz->polling_frequency = 100;
>  
>  	return 0;
>  }
> 



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] Add a passive cooling trip point if the firmware doesn't define one
  2008-05-14  1:08   ` [PATCH 2/2] Add a passive cooling trip point if the firmware doesn't define one Matthew Garrett
@ 2008-05-14  3:30     ` Len Brown
  2008-05-14  9:33       ` Matthew Garrett
  2008-05-15 11:47       ` Andi Kleen
  0 siblings, 2 replies; 17+ messages in thread
From: Len Brown @ 2008-05-14  3:30 UTC (permalink / raw)
  To: Matthew Garrett; +Cc: linux-acpi

On Tuesday 13 May 2008, Matthew Garrett wrote:
> If a thermal zone is provided with a critical temperature, then there is 
> obviously a concern on the part of the vendor that it may overheat. 
> Currently Linux will only attempt to do something about that if the 
> vendor has explicitly added a passive cooling trip point. However, it's 
> clear that allowing the system to hit the critical trip point is far 
> from ideal - the system will immediately shut down, and data will almost 
> certainly be lost. This patch adds a default passive cooling zone if the 
> platform does not provide its own, with the default being to have it be 
> 5 degrees below the critical shutoff temperature. This should avoid the 
> kernel limiting performance unless it's genuinely likely that the 
> hardware is about to overheat and shut down. The default temperature 
> value can be overridden by passing the thermal.psv argument at boot or 
> module load time.
> 
> Signed-off-by: Matthew Garrett <mjg@redhat.com>
> 
> ---
> 
> While this is clearly something of a hack,


Yes, I'm aware of the Dell D610, and I'm truly sorry
that Dell built a machine with a poor thermal solution,
didn't supply _PSV, and somehow looked after Windows
without looking after Linux.

But I'm not willing to screw up Linux for the
benefit of an obsolete screwed up Dell laptop.

An entire new handheld product line will ship this year
where the assumption that the processor is the primary
contributor to heat may simply be _false_, thermal
zones may be totally unrelated to the processor,
and throttling the processor may be exactly the
wrong way to handle a thermal issue.

I'm sorry for D610 owners running Linux, really I am.

-Len

> I'd argue that it's the right  
> thing to do. In the real world, it's highly unlikely that a piece of 
> ahrdware is going to reach equilibrium at 5 degrees below the critical 
> temperature. If we've reached that temperature, the machine is in 
> serious danger of powering down in the near future and we really ought 
> to do something about it.
> 
> This patch associates the CPUs with the zone even if the zone may be 
> relating to an entirely different part of the hardware. This is a 
> pragmatic decision - right now the CPUs are the only hardware we really 
> have any thermal control over, and even if the thermal zone is covering 
> the GPU (for instance) then the only thing we can do to reduce the heat 
> is to reduce the load on the CPU. I think this is certainly better than 
> letting the machine power down.
> 
> diff --git a/drivers/acpi/thermal.c b/drivers/acpi/thermal.c
> index 93cb3e8..19acfd3 100644
> --- a/drivers/acpi/thermal.c
> +++ b/drivers/acpi/thermal.c
> @@ -116,6 +116,8 @@ static const struct acpi_device_id  thermal_device_ids[] = {
>  };
>  MODULE_DEVICE_TABLE(acpi, thermal_device_ids);
>  
> +extern struct acpi_handle_list acpi_processor_list;
> +
>  static struct acpi_driver acpi_thermal_driver = {
>  	.name = "thermal",
>  	.class = ACPI_THERMAL_CLASS,
> @@ -418,9 +420,7 @@ static int acpi_thermal_trips_update(struct acpi_thermal *tz, int flag)
>  				"_PSV", NULL, &tz->trips.passive.temperature);
>  		}
>  
> -		if (ACPI_FAILURE(status))
> -			tz->trips.passive.flags.valid = 0;
> -		else {
> +		if (ACPI_SUCCESS(status)) {
>  			tz->trips.passive.flags.valid = 1;
>  			if (flag == ACPI_TRIPS_INIT) {
>  				status = acpi_evaluate_integer(
> @@ -440,20 +440,48 @@ static int acpi_thermal_trips_update(struct acpi_thermal *tz, int flag)
>  					tz->trips.passive.flags.valid = 0;
>  			}
>  		}
> +
> +		if (!tz->trips.passive.flags.valid) {
> +			/* If there's no valid passive zone, add a fake
> +			   one in order to ensure that we don't hit the
> +			   critical temperature limit */
> +
> +			tz->trips.passive.flags.valid = 1;
> +			tz->trips.passive.tc1 = 1;
> +			tz->trips.passive.tc2 = 1;
> +
> +			/* A high rate of polling here is acceptable -
> +			   if we're hitting this limit, then the
> +			   system is clearly under load. A higher
> +			   polling frequency means that we can weigh
> +			   the load against the temperature more
> +			   effeciently and overall reduce power
> +			   consumption */
> +
> +			tz->trips.passive.tsp = 10;
> +
> +			/* Set the passive trip temperature to be either
> +			   the option passed by the user or 5 degrees below the
> +			   critical temperature. That should give us enough
> +			   head room without limiting performance */
> +
> +			if (!psv)
> +				tz->trips.passive.temperature =
> +					tz->trips.critical.temperature - 50;
> +		}
>  	}
>  	if ((flag & ACPI_TRIPS_DEVICES) && tz->trips.passive.flags.valid) {
>  		memset(&devices, 0, sizeof(struct acpi_handle_list));
>  		status = acpi_evaluate_reference(tz->device->handle, "_PSL",
>  							NULL, &devices);
> -		if (ACPI_FAILURE(status))
> -			tz->trips.passive.flags.valid = 0;
> -		else
> -			tz->trips.passive.flags.valid = 1;
> -
> -		if (memcmp(&tz->trips.passive.devices, &devices,
> +		if (ACPI_FAILURE(status)) {
> +			memcpy(&tz->trips.passive.devices,
> +			       &acpi_processor_list,
> +			       sizeof (struct acpi_handle_list));
> +		} else if (memcmp(&tz->trips.passive.devices, &devices,
>  				sizeof(struct acpi_handle_list))) {
>  			memcpy(&tz->trips.passive.devices, &devices,
> -				sizeof(struct acpi_handle_list));
> +			       sizeof(struct acpi_handle_list));
>  			ACPI_THERMAL_TRIPS_EXCEPTION(flag, "device");
>  		}
>  	}
> 



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] Add a passive cooling trip point if the firmware doesn't define one
  2008-05-14  3:30     ` Len Brown
@ 2008-05-14  9:33       ` Matthew Garrett
  2008-05-14 15:37         ` Len Brown
  2008-05-15  0:55         ` Zhang Rui
  2008-05-15 11:47       ` Andi Kleen
  1 sibling, 2 replies; 17+ messages in thread
From: Matthew Garrett @ 2008-05-14  9:33 UTC (permalink / raw)
  To: Len Brown; +Cc: linux-acpi

On Tue, May 13, 2008 at 11:30:20PM -0400, Len Brown wrote:

> An entire new handheld product line will ship this year
> where the assumption that the processor is the primary
> contributor to heat may simply be _false_, thermal
> zones may be totally unrelated to the processor,
> and throttling the processor may be exactly the
> wrong way to handle a thermal issue.

That's fine. In the ACPI code, we currently have no support for managing 
thermal conditions through anything other than the processor. Once that 
changes, it's easy to fix this up so it doesn't interfere with thermal 
zones that are handled by throttling other devices.

Menlow has a more complex thermal model than machines we've seen up to 
now, but I was under the impression that this was to be handled by the 
new generic thermal class rather than the existing ACPI code. I don't 
see this patch as being in conflict with supporting that.

-- 
Matthew Garrett | mjg59@srcf.ucam.org

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Fix polling policy in the absence of _TZP
  2008-05-14  3:19 ` [PATCH] Fix polling policy in the absence of _TZP Len Brown
@ 2008-05-14 10:14   ` Matthew Garrett
  0 siblings, 0 replies; 17+ messages in thread
From: Matthew Garrett @ 2008-05-14 10:14 UTC (permalink / raw)
  To: Len Brown; +Cc: linux-acpi

On Tue, May 13, 2008 at 11:19:39PM -0400, Len Brown wrote:
> On Tuesday 13 May 2008, Matthew Garrett wrote:
> > Fix Linux to conform to 11.3.18 of the ACPI spec 3.0. In the absence of 
> > a _TZP method, we should be polling at a default frequency. While it can 
> > be argued that "0" is a default frequency, I don't think that's what the 
> > spec authors had in mind.
> 
> Actually it is what the spec authors had in mind,
> and the spec is wrong.  Yes, I can provide documentation

Hmm. Ok, I'll handle this differently.

-- 
Matthew Garrett | mjg59@srcf.ucam.org

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] Add a passive cooling trip point if the firmware doesn't define one
  2008-05-14  9:33       ` Matthew Garrett
@ 2008-05-14 15:37         ` Len Brown
  2008-05-14 23:01           ` Matthew Garrett
  2008-05-15  0:55         ` Zhang Rui
  1 sibling, 1 reply; 17+ messages in thread
From: Len Brown @ 2008-05-14 15:37 UTC (permalink / raw)
  To: Matthew Garrett; +Cc: linux-acpi

On Wednesday 14 May 2008, Matthew Garrett wrote:
> On Tue, May 13, 2008 at 11:30:20PM -0400, Len Brown wrote:
> 
> > An entire new handheld product line will ship this year
> > where the assumption that the processor is the primary
> > contributor to heat may simply be _false_, thermal
> > zones may be totally unrelated to the processor,
> > and throttling the processor may be exactly the
> > wrong way to handle a thermal issue.
> 
> That's fine. In the ACPI code, we currently have no support for managing 
> thermal conditions through anything other than the processor. Once that 
> changes, it's easy to fix this up so it doesn't interfere with thermal 
> zones that are handled by throttling other devices.

No.
ACPI thermal zones with no passive trip points can be used
to simply report interesting temperature events to user-space
(via the spiffy new generic thermal class).
User-space then invokes corrective action -- which may be
something _other_ than throttling the CPU.

> Menlow has a more complex thermal model than machines we've seen up to 
> now, but I was under the impression that this was to be handled by the 
> new generic thermal class rather than the existing ACPI code. I don't 
> see this patch as being in conflict with supporting that.

No.
It is a direct conflict.

Say the thermal zone is related to the communications chip.
If we create a phantom passive trip point for that zone
and throttle the CPU when it fires, we're shooting ourselves
in the head (when we should be poking ourselves in the ear, or some such:-)

-Len

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] Add a passive cooling trip point if the firmware doesn't define one
  2008-05-14 15:37         ` Len Brown
@ 2008-05-14 23:01           ` Matthew Garrett
  0 siblings, 0 replies; 17+ messages in thread
From: Matthew Garrett @ 2008-05-14 23:01 UTC (permalink / raw)
  To: Len Brown; +Cc: linux-acpi

On Wed, May 14, 2008 at 11:37:22AM -0400, Len Brown wrote:

> No.
> ACPI thermal zones with no passive trip points can be used
> to simply report interesting temperature events to user-space
> (via the spiffy new generic thermal class).
> User-space then invokes corrective action -- which may be
> something _other_ than throttling the CPU.

If user space wants to decouple ACPI from the generic thermal model, 
then it can do so by using the mode file in the sysfs node. That doesn't 
change.

> > Menlow has a more complex thermal model than machines we've seen up to 
> > now, but I was under the impression that this was to be handled by the 
> > new generic thermal class rather than the existing ACPI code. I don't 
> > see this patch as being in conflict with supporting that.
> 
> No.
> It is a direct conflict.
> 
> Say the thermal zone is related to the communications chip.
> If we create a phantom passive trip point for that zone
> and throttle the CPU when it fires, we're shooting ourselves
> in the head (when we should be poking ourselves in the ear, or some such:-)

As I said, if userspace is managing the thermal envelope then it has to 
indicate this by writing a value into sysfs (see the get_mode and 
set_mode calls in the thermal_zone_device_ops structure). If that's 
done then it indicates to the kernel that it should leave control up to 
the userspace application. acpi_thermal_check will then return without 
handling the trip points (including the critical one, which I'm not sure 
sounds like a good idea...)

As a result, this patch doesn't conflict with the userspace thermal 
management at all. The moment the userspace policy daemon starts up 
it'll disable the code that would fire on the passive trip point. If 
you're not using a userspace policy daemon, then the only hardware that 
can be used to drop the temperature is the processor. Which is what this 
patch does.

-- 
Matthew Garrett | mjg59@srcf.ucam.org

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] Add a passive cooling trip point if the firmware doesn't define one
  2008-05-14  9:33       ` Matthew Garrett
  2008-05-14 15:37         ` Len Brown
@ 2008-05-15  0:55         ` Zhang Rui
  2008-05-15  1:04           ` Matthew Garrett
  1 sibling, 1 reply; 17+ messages in thread
From: Zhang Rui @ 2008-05-15  0:55 UTC (permalink / raw)
  To: Matthew Garrett; +Cc: Len Brown, linux-acpi


On Wed, 2008-05-14 at 10:33 +0100, Matthew Garrett wrote:
> On Tue, May 13, 2008 at 11:30:20PM -0400, Len Brown wrote:
> 
> > An entire new handheld product line will ship this year
> > where the assumption that the processor is the primary
> > contributor to heat may simply be _false_, thermal
> > zones may be totally unrelated to the processor,
> > and throttling the processor may be exactly the
> > wrong way to handle a thermal issue.
> 
> That's fine. In the ACPI code, we currently have no support for managing 
> thermal conditions through anything other than the processor. Once that 
> changes, it's easy to fix this up so it doesn't interfere with thermal 
> zones that are handled by throttling other devices.
> 
> Menlow has a more complex thermal model than machines we've seen up to 
> now, but I was under the impression that this was to be handled by the 
> new generic thermal class rather than the existing ACPI code. I don't 
> see this patch as being in conflict with supporting that.
> 
Menlow also defines ACPI thermal zones and all these thermal zones are
registered with the generic thermal class.
The difference is that menlow uses _TZD to list many other devices for
throttling, and all these devices are registered with the generic
thermal class as well.
If this patch is applied, processors may be throttled unexcepted.

thanks,
rui


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] Add a passive cooling trip point if the firmware doesn't define one
  2008-05-15  0:55         ` Zhang Rui
@ 2008-05-15  1:04           ` Matthew Garrett
       [not found]             ` <1210814699.2929.10.camel@rzhang-crestline.sh.intel.com>
  0 siblings, 1 reply; 17+ messages in thread
From: Matthew Garrett @ 2008-05-15  1:04 UTC (permalink / raw)
  To: Zhang Rui; +Cc: Len Brown, linux-acpi

On Thu, May 15, 2008 at 08:55:16AM +0800, Zhang Rui wrote:

> Menlow also defines ACPI thermal zones and all these thermal zones are
> registered with the generic thermal class.
> The difference is that menlow uses _TZD to list many other devices for
> throttling, and all these devices are registered with the generic
> thermal class as well.
> If this patch is applied, processors may be throttled unexcepted.

Hm. If there's a TZD defined for the thermal zone, then it makes sense 
to use that rather than using the processor. I didn't bother since I 
haven't yet been able to find a machine that used TZD for anything other 
than the CPU :) I'll implement that tomorrow.

-- 
Matthew Garrett | mjg59@srcf.ucam.org

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] Add a passive cooling trip point if the firmware doesn't define one
       [not found]               ` <20080515015831.GA14829@srcf.ucam.org>
@ 2008-05-15  2:15                 ` Zhang Rui
  2008-05-15  2:24                   ` Matthew Garrett
  0 siblings, 1 reply; 17+ messages in thread
From: Zhang Rui @ 2008-05-15  2:15 UTC (permalink / raw)
  To: Matthew Garrett; +Cc: linux-acpi, Len Brown


On Thu, 2008-05-15 at 02:58 +0100, Matthew Garrett wrote:
> On Thu, May 15, 2008 at 09:24:59AM +0800, Zhang Rui wrote:
> > On Thu, 2008-05-15 at 09:04 +0800, Matthew Garrett wrote:
> > >  I'll implement that tomorrow.
> > sorry I don't understand this. what will you implement?
> 
> I'll modify my patch so it uses the devices in a _TZD package if they're 
> present.
> 
Hmm, ACPI thermal zone doesn't know how to throttle the devices in a
_TZD package, so I don't think we can use the _TZD devices.

thanks,
rui


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] Add a passive cooling trip point if the firmware doesn't define one
  2008-05-15  2:15                 ` Zhang Rui
@ 2008-05-15  2:24                   ` Matthew Garrett
  2008-05-15  3:06                     ` Zhang Rui
  0 siblings, 1 reply; 17+ messages in thread
From: Matthew Garrett @ 2008-05-15  2:24 UTC (permalink / raw)
  To: Zhang Rui; +Cc: linux-acpi, Len Brown

On Thu, May 15, 2008 at 10:15:39AM +0800, Zhang Rui wrote:
> On Thu, 2008-05-15 at 02:58 +0100, Matthew Garrett wrote:
> > I'll modify my patch so it uses the devices in a _TZD package if they're 
> > present.
> > 
> Hmm, ACPI thermal zone doesn't know how to throttle the devices in a
> _TZD package, so I don't think we can use the _TZD devices.

If they have a _TZD, it's presumably because they need cooling. If the 
kernel isn't going to be able to do that, what happens if the userspace 
daemon crashes or fails to start for some reason? 

The alternative is to leave it as it is currently in my patch. If the 
userspace daemon starts, it disables the ACPI control and will handle it 
itself. If it doesn't start, the kernel does the only thing it can - 
reduce the speed of the CPU. If there are no plans to add hooks for the 
drivers to be throttled by the kernel, then I'd argue that this is the 
right thing to do.

-- 
Matthew Garrett | mjg59@srcf.ucam.org

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] Add a passive cooling trip point if the firmware doesn't define one
  2008-05-15  2:24                   ` Matthew Garrett
@ 2008-05-15  3:06                     ` Zhang Rui
  2008-05-15 11:40                       ` Matthew Garrett
  0 siblings, 1 reply; 17+ messages in thread
From: Zhang Rui @ 2008-05-15  3:06 UTC (permalink / raw)
  To: Matthew Garrett; +Cc: linux-acpi, Len Brown


On Thu, 2008-05-15 at 03:24 +0100, Matthew Garrett wrote:
> On Thu, May 15, 2008 at 10:15:39AM +0800, Zhang Rui wrote:
> > On Thu, 2008-05-15 at 02:58 +0100, Matthew Garrett wrote:
> > > I'll modify my patch so it uses the devices in a _TZD package if they're 
> > > present.
> > > 
> > Hmm, ACPI thermal zone doesn't know how to throttle the devices in a
> > _TZD package, so I don't think we can use the _TZD devices.
> 
> If they have a _TZD, it's presumably because they need cooling.
Agree.

>  If the 
> kernel isn't going to be able to do that,
It is complicated to throttle devices in kernel.
Take this for example:
            Method (_TZD, 0, Serialized)
            {
                If (MPEN)
                {
                    Return (Package (0x04)
                    {
                        \_PR.P001,
                        \_PR.P002,
                        \_SB.PCI0.GFX0.DD04,
                        MEM0
                    })
                }

                Return (Package (0x04)
                {
                    \_SB.PCI0.GFX0.DD03,
                    \_SB.PCI0.GFX0.DD04,
                    \_PR.P001,
                    MEM0
                })
            }
ACPI thermal zone knows how to throttle a processor,
but it never knows how to throttle the LCD and memory controller.

Device throttling is implemented in its native driver, like ACPI
processor, ACPI video and intel_menlow etc, and all the driver may be
loaded as a module.
I don't like the idea of adding hooks like we do for processors.
what do you think?

>  what happens if the userspace 
> daemon crashes or fails to start for some reason? 
Well, the current solution depends on the userspace daemon.

A possible solution is to throttle the devices in the generic thermal
class, as only the generic thermal driver knows the temperature and trip
point info and know hows to throttle all the cooling devices in a
thermal zone. But I haven't thought about it carefully.

> 
> The alternative is to leave it as it is currently in my patch. If the 
> userspace daemon starts, it disables the ACPI control and will handle it 
> itself. If it doesn't start, the kernel does the only thing it can - 
> reduce the speed of the CPU.

it can work and it doesn't break the current flowchart on menlow.
but I still doubt if this is the right way to go.
As you said, a _TZD method means these devices need to be throttled when
overheating, it's also presumable that BIOS doesn't want to throttle the
processor for a thermal zone without _PSV, e.g. the thermal zone is for
the skin temperature and throttling the processor doesn't help.
anyway, a fake passive trip point is a little tricky IMO. :)

thanks,
rui



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] Add a passive cooling trip point if the firmware doesn't define one
  2008-05-15  3:06                     ` Zhang Rui
@ 2008-05-15 11:40                       ` Matthew Garrett
  0 siblings, 0 replies; 17+ messages in thread
From: Matthew Garrett @ 2008-05-15 11:40 UTC (permalink / raw)
  To: Zhang Rui; +Cc: linux-acpi, Len Brown

On Thu, May 15, 2008 at 11:06:23AM +0800, Zhang Rui wrote:
> On Thu, 2008-05-15 at 03:24 +0100, Matthew Garrett wrote:

> ACPI thermal zone knows how to throttle a processor,
> but it never knows how to throttle the LCD and memory controller.
>
> Device throttling is implemented in its native driver, like ACPI
> processor, ACPI video and intel_menlow etc, and all the driver may be
> loaded as a module.
> I don't like the idea of adding hooks like we do for processors.
> what do you think?

I think it would be beneficial if the ACPI thermal code could call into 
the native device code, kind of like the processor thermal code does 
with cpufreq. But that's for the future, rather than something I'm 
worried about now. Doing that through the generic thermal code rather 
than the ACPI layer itself seems like the right way to do it, assuming 
that we can tie all the physical devices to their ACPI handles.
 
> > The alternative is to leave it as it is currently in my patch. If the 
> > userspace daemon starts, it disables the ACPI control and will handle it 
> > itself. If it doesn't start, the kernel does the only thing it can - 
> > reduce the speed of the CPU.
> 
> it can work and it doesn't break the current flowchart on menlow.
> but I still doubt if this is the right way to go.
> As you said, a _TZD method means these devices need to be throttled when
> overheating, it's also presumable that BIOS doesn't want to throttle the
> processor for a thermal zone without _PSV, e.g. the thermal zone is for
> the skin temperature and throttling the processor doesn't help.
> anyway, a fake passive trip point is a little tricky IMO. :)

Like I said, if the device is within 5 degrees of a critical shutdown 
then it seems likely that it's going to power off in the near future 
unless we do something about it. If the BIOS doesn't provide a passive 
zone or a _TZD package, then the only thing we can do is throttle the 
CPU. It might help, or it might not - but it's certainly not going to 
icnrease the instantaneous heat load on the device.

-- 
Matthew Garrett | mjg59@srcf.ucam.org

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] Add a passive cooling trip point if the firmware doesn't define one
  2008-05-14  3:30     ` Len Brown
  2008-05-14  9:33       ` Matthew Garrett
@ 2008-05-15 11:47       ` Andi Kleen
  2008-05-15 12:02         ` Matthew Garrett
  1 sibling, 1 reply; 17+ messages in thread
From: Andi Kleen @ 2008-05-15 11:47 UTC (permalink / raw)
  To: Len Brown; +Cc: Matthew Garrett, linux-acpi

Len Brown <lenb@kernel.org> writes:
>
> Yes, I'm aware of the Dell D610, and I'm truly sorry
> that Dell built a machine with a poor thermal solution,
> didn't supply _PSV, and somehow looked after Windows
> without looking after Linux.

Do you know how it works in Windows?

> But I'm not willing to screw up Linux for the
> benefit of an obsolete screwed up Dell laptop.

Wouldn't it be possible to add a D610 specific quirk that 
simply adds a trip point for that machine only?

As long as the problem is not wide-spread over many models
a quirk is a reasonable solution.

-Andi

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] Add a passive cooling trip point if the firmware doesn't define one
  2008-05-15 11:47       ` Andi Kleen
@ 2008-05-15 12:02         ` Matthew Garrett
  0 siblings, 0 replies; 17+ messages in thread
From: Matthew Garrett @ 2008-05-15 12:02 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Len Brown, linux-acpi

On Thu, May 15, 2008 at 01:47:28PM +0200, Andi Kleen wrote:

> Wouldn't it be possible to add a D610 specific quirk that 
> simply adds a trip point for that machine only?
> 
> As long as the problem is not wide-spread over many models
> a quirk is a reasonable solution.

There are plenty of other machines that have this issue, but there 
doesn't seem to be any especially good pattern to them. In marginal 
cases it may even depend on the manufacturing batch.

-- 
Matthew Garrett | mjg59@srcf.ucam.org

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2008-05-15 12:03 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-05-14  1:00 [PATCH] Fix polling policy in the absence of _TZP Matthew Garrett
2008-05-14  1:03 ` [PATCH 1/2] Add a list of processor objects to the ACPI core Matthew Garrett
2008-05-14  1:08   ` [PATCH 2/2] Add a passive cooling trip point if the firmware doesn't define one Matthew Garrett
2008-05-14  3:30     ` Len Brown
2008-05-14  9:33       ` Matthew Garrett
2008-05-14 15:37         ` Len Brown
2008-05-14 23:01           ` Matthew Garrett
2008-05-15  0:55         ` Zhang Rui
2008-05-15  1:04           ` Matthew Garrett
     [not found]             ` <1210814699.2929.10.camel@rzhang-crestline.sh.intel.com>
     [not found]               ` <20080515015831.GA14829@srcf.ucam.org>
2008-05-15  2:15                 ` Zhang Rui
2008-05-15  2:24                   ` Matthew Garrett
2008-05-15  3:06                     ` Zhang Rui
2008-05-15 11:40                       ` Matthew Garrett
2008-05-15 11:47       ` Andi Kleen
2008-05-15 12:02         ` Matthew Garrett
2008-05-14  3:19 ` [PATCH] Fix polling policy in the absence of _TZP Len Brown
2008-05-14 10:14   ` Matthew Garrett

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).