Linux Power Management development
 help / color / mirror / Atom feed
* [PATCH] power: max17040: use devm_kzalloc
From: Devendra Naga @ 2013-01-06  4:53 UTC (permalink / raw)
  To: Anton Vorontsov, David Woodhouse, linux-pm; +Cc: Devendra Naga

use devm_kzalloc and no need of error path and unload frees

Signed-off-by: Devendra Naga <devendra.aaru@gmail.com>
---
 drivers/power/max17040_battery.c |    4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/power/max17040_battery.c b/drivers/power/max17040_battery.c
index 22cfe9c..74a0bd9 100644
--- a/drivers/power/max17040_battery.c
+++ b/drivers/power/max17040_battery.c
@@ -207,7 +207,7 @@ static int max17040_probe(struct i2c_client *client,
 	if (!i2c_check_functionality(adapter, I2C_FUNC_SMBUS_BYTE))
 		return -EIO;
 
-	chip = kzalloc(sizeof(*chip), GFP_KERNEL);
+	chip = devm_kzalloc(&client->dev, sizeof(*chip), GFP_KERNEL);
 	if (!chip)
 		return -ENOMEM;
 
@@ -225,7 +225,6 @@ static int max17040_probe(struct i2c_client *client,
 	ret = power_supply_register(&client->dev, &chip->battery);
 	if (ret) {
 		dev_err(&client->dev, "failed: power supply register\n");
-		kfree(chip);
 		return ret;
 	}
 
@@ -244,7 +243,6 @@ static int max17040_remove(struct i2c_client *client)
 
 	power_supply_unregister(&chip->battery);
 	cancel_delayed_work(&chip->work);
-	kfree(chip);
 	return 0;
 }
 
-- 
1.7.10.4


^ permalink raw reply related

* Re: power: max17040: use devm_kzalloc
From: Anton Vorontsov @ 2013-01-06  3:04 UTC (permalink / raw)
  To: Devendra Naga; +Cc: David Woodhouse, linux-pm
In-Reply-To: <CA+C2MxQVdgWn_fjrbiHBDeENHce5FT2xYxYxom84XncQ3bB1nQ@mail.gmail.com>

On Fri, Dec 07, 2012 at 01:01:06PM +0530, Devendra Naga wrote:
> use devm_kzalloc and no need of error path and unload frees
> 
> Signed-off-by: Devendra Naga <develkernel412222@gmail.com>
> ---

The patch doesn't apply, it is whitespace-damaged...

>  drivers/power/max17040_battery.c |    4 +---
>  1 files changed, 1 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/power/max17040_battery.c
> b/drivers/power/max17040_battery.c
> index 22cfe9c..74a0bd9 100644
> --- a/drivers/power/max17040_battery.c
> +++ b/drivers/power/max17040_battery.c
> @@ -207,7 +207,7 @@ static int max17040_probe(struct i2c_client *client,
>         if (!i2c_check_functionality(adapter, I2C_FUNC_SMBUS_BYTE))
>                 return -EIO;
> 
> -       chip = kzalloc(sizeof(*chip), GFP_KERNEL);
> +       chip = devm_kzalloc(&client->dev, sizeof(*chip), GFP_KERNEL);
>         if (!chip)
>                 return -ENOMEM;
> 
> @@ -225,7 +225,6 @@ static int max17040_probe(struct i2c_client *client,
>         ret = power_supply_register(&client->dev, &chip->battery);
>         if (ret) {
>                 dev_err(&client->dev, "failed: power supply register\n");
> -               kfree(chip);
>                 return ret;
>         }
> 
> @@ -244,7 +243,6 @@ static int max17040_remove(struct i2c_client *client)
> 
>         power_supply_unregister(&chip->battery);
>         cancel_delayed_work(&chip->work);
> -       kfree(chip);
>         return 0;
>  }
> 
> -- 
> 1.7.5.4

^ permalink raw reply

* [PATCH v11 8/9] libata: no poll when ODD is powered off
From: Aaron Lu @ 2013-01-06  2:48 UTC (permalink / raw)
  To: Jeff Garzik, James Bottomley, Rafael J. Wysocki, Alan Stern,
	Tejun Heo
  Cc: Aaron Lu, Jeff Wu, linux-ide, linux-pm, linux-scsi, linux-acpi,
	Aaron Lu
In-Reply-To: <1357440509-28108-1-git-send-email-aaron.lu@intel.com>

When the ODD is powered off, any action the user did to the ODD that
would generate a media event will trigger an ACPI interrupt, so the
poll for media event is no longer necessary. And the poll will also
cause a runtime status change, which will stop the ODD from staying in
powered off state, so the poll should better be stopped.

But since we don't have access to the gendisk structure in LLDs, here
comes the event_driven flag for scsi device. This flag serves as a
capability of the device, conveyed by the LLDs to upper layer. It is set
when LLDs know that this device will no longer generate any media
related events, so that the check_events can simply return 0 without
bothering the device, effectively silence the poll.

Signed-off-by: Aaron Lu <aaron.lu@intel.com>
---
 drivers/ata/libata-zpodd.c | 8 ++++++++
 drivers/scsi/sr.c          | 6 +++++-
 include/scsi/scsi_device.h | 1 +
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/ata/libata-zpodd.c b/drivers/ata/libata-zpodd.c
index 3c39987..5843a18 100644
--- a/drivers/ata/libata-zpodd.c
+++ b/drivers/ata/libata-zpodd.c
@@ -177,6 +177,13 @@ void zpodd_pre_poweroff(struct ata_device *dev)
 {
 	struct zpodd *zpodd = dev->zpodd;
 
+	/*
+	 * Silence the media change poll, as we will be notified when
+	 * user wants to use the ODD so there is no meaning to poll
+	 * it when it is powered off
+	 */
+	dev->sdev->event_driven = true;
+
 	zpodd->powered_off = true;
 	device_set_run_wake(&dev->sdev->sdev_gendev, true);
 	acpi_pm_device_run_wake(&dev->sdev->sdev_gendev, true);
@@ -209,6 +216,7 @@ void zpodd_post_resume(struct ata_device *dev)
 
 	zpodd->last_ready = 0;
 	zpodd->zp_ready = false;
+	dev->sdev->event_driven = false;
 }
 
 static void zpodd_wake_dev(acpi_handle handle, u32 event, void *context)
diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c
index 4d1a610..b034274 100644
--- a/drivers/scsi/sr.c
+++ b/drivers/scsi/sr.c
@@ -594,7 +594,11 @@ static unsigned int sr_block_check_events(struct gendisk *disk,
 					  unsigned int clearing)
 {
 	struct scsi_cd *cd = scsi_cd(disk);
-	return cdrom_check_events(&cd->cdi, clearing);
+
+	if (!cd->device->event_driven)
+		return cdrom_check_events(&cd->cdi, clearing);
+	else
+		return 0;
 }
 
 static int sr_block_revalidate_disk(struct gendisk *disk)
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index e65c62e..1756151 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -160,6 +160,7 @@ struct scsi_device {
 	unsigned can_power_off:1; /* Device supports runtime power off */
 	unsigned wce_default_on:1;	/* Cache is ON by default */
 	unsigned no_dif:1;	/* T10 PI (DIF) should be disabled */
+	unsigned event_driven:1; /* No need to poll the device */
 
 	DECLARE_BITMAP(supported_events, SDEV_EVT_MAXBITS); /* supported events */
 	struct list_head event_list;	/* asserted events */
-- 
1.7.11.7


^ permalink raw reply related

* [PATCH v11 9/9] libata: do not suspend port if normal ODD is attached
From: Aaron Lu @ 2013-01-06  2:48 UTC (permalink / raw)
  To: Jeff Garzik, James Bottomley, Rafael J. Wysocki, Alan Stern,
	Tejun Heo
  Cc: Aaron Lu, Jeff Wu, linux-ide, linux-pm, linux-scsi, linux-acpi,
	Aaron Lu
In-Reply-To: <1357440509-28108-1-git-send-email-aaron.lu@intel.com>

For ODDs, the upper layer will poll for media change every a few
seconds, which will make it enter and leave suspend state very
oftern. And as each suspend will also cause a hard/soft reset,
the gain of runtime suspend is very little while the ODD may
mis-function after constantly being reset. So the idle callback
here will not proceed to suspend if a non-ZPODD capable ODD is
attached to the port.

Signed-off-by: Aaron Lu <aaron.lu@intel.com>
---
 drivers/ata/libata-core.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 65a362e..f664a90 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -5408,8 +5408,27 @@ static int ata_port_resume(struct device *dev)
 	return rc;
 }
 
+/*
+ * For ODDs, the upper layer will poll for media change every a few seconds,
+ * which will make it enter and leave suspend state every a few seconds. And
+ * as each suspend will cause a hard/soft reset, the gain of runtime suspend
+ * is very little and the ODD may mis-function after constantly being reset.
+ * So the idle callback here will not proceed to suspend if a non-ZPODD capable
+ * ODD is attached to the port.
+ */
 static int ata_port_runtime_idle(struct device *dev)
 {
+	struct ata_port *ap = to_ata_port(dev);
+	struct ata_link *link;
+	struct ata_device *adev;
+
+	ata_for_each_link(link, ap, HOST_FIRST) {
+		ata_for_each_dev(adev, link, ENABLED)
+			if (adev->class == ATA_DEV_ATAPI &&
+			    !zpodd_dev_enabled(adev))
+				return -EBUSY;
+	}
+
 	return pm_runtime_suspend(dev);
 }
 
-- 
1.7.11.7


^ permalink raw reply related

* [PATCH v11 7/9] libata: expose pm qos flags for ata device
From: Aaron Lu @ 2013-01-06  2:48 UTC (permalink / raw)
  To: Jeff Garzik, James Bottomley, Rafael J. Wysocki, Alan Stern,
	Tejun Heo
  Cc: Aaron Lu, Jeff Wu, linux-ide, linux-pm, linux-scsi, linux-acpi,
	Aaron Lu
In-Reply-To: <1357440509-28108-1-git-send-email-aaron.lu@intel.com>

Expose pm qos flags to user space so that user has a chance to disable
pm features like power off, if he/she has a broken platform or devices
or simply does not like this pm feature.

This flag is exposed to user space only for ata device or atapi device
that is zero power capable. For normal atapi device, it will never be
powered off.

Signed-off-by: Aaron Lu <aaron.lu@intel.com>
---
 drivers/ata/libata-acpi.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/ata/libata-acpi.c b/drivers/ata/libata-acpi.c
index e832bf6..5b67be3 100644
--- a/drivers/ata/libata-acpi.c
+++ b/drivers/ata/libata-acpi.c
@@ -17,6 +17,7 @@
 #include <linux/pci.h>
 #include <linux/slab.h>
 #include <linux/pm_runtime.h>
+#include <linux/pm_qos.h>
 #include <scsi/scsi_device.h>
 #include "libata.h"
 
@@ -1022,6 +1023,8 @@ static void ata_acpi_unregister_power_resource(struct ata_device *dev)
 void ata_acpi_bind(struct ata_device *dev)
 {
 	ata_acpi_register_power_resource(dev);
+	if (dev->class == ATA_DEV_ATA || zpodd_dev_enabled(dev))
+		dev_pm_qos_expose_flags(&dev->sdev->sdev_gendev, 0);
 }
 
 void ata_acpi_unbind(struct ata_device *dev)
-- 
1.7.11.7


^ permalink raw reply related

* [PATCH v11 6/9] libata: handle power transition of ODD
From: Aaron Lu @ 2013-01-06  2:48 UTC (permalink / raw)
  To: Jeff Garzik, James Bottomley, Rafael J. Wysocki, Alan Stern,
	Tejun Heo
  Cc: Aaron Lu, Jeff Wu, linux-ide, linux-pm, linux-scsi, linux-acpi,
	Aaron Lu
In-Reply-To: <1357440509-28108-1-git-send-email-aaron.lu@intel.com>

When ata port is runtime suspended, it will check if the ODD attched to
it is a zero power(ZP) capable ODD and if the ZP capable ODD is in zero
power ready state. And if this is not the case, the highest acpi state
will be limited to ACPI_STATE_D3_HOT to avoid powering off the ODD.

And on resume, it will re-gain power and go through the recovery
process. When reset for the ata port is done, the ODD is considered
functional, and post processing like eject tray if the ODD is drawer
type is done there.

Signed-off-by: Aaron Lu <aaron.lu@intel.com>
---
 drivers/ata/libata-acpi.c  | 35 ++++++++++++++++++++---------
 drivers/ata/libata-eh.c    |  2 ++
 drivers/ata/libata-zpodd.c | 56 ++++++++++++++++++++++++++++++++++++++++++++++
 drivers/ata/libata.h       |  8 +++++++
 4 files changed, 91 insertions(+), 10 deletions(-)

diff --git a/drivers/ata/libata-acpi.c b/drivers/ata/libata-acpi.c
index b4788dd..e832bf6 100644
--- a/drivers/ata/libata-acpi.c
+++ b/drivers/ata/libata-acpi.c
@@ -835,6 +835,22 @@ void ata_acpi_on_resume(struct ata_port *ap)
 	}
 }
 
+static int ata_acpi_choose_suspend_state(struct ata_device *dev)
+{
+	int d_max_in = ACPI_STATE_D3_COLD;
+
+	/*
+	 * For ATAPI, runtime D3 cold is only allowed
+	 * for ZPODD in zero power ready state
+	 */
+	if (dev->class == ATA_DEV_ATAPI &&
+	    !(zpodd_dev_enabled(dev) && zpodd_zpready(dev)))
+		d_max_in = ACPI_STATE_D3_HOT;
+
+	return acpi_pm_device_sleep_state(&dev->sdev->sdev_gendev,
+					  NULL, d_max_in);
+}
+
 /**
  * ata_acpi_set_state - set the port power state
  * @ap: target ATA port
@@ -861,17 +877,16 @@ void ata_acpi_set_state(struct ata_port *ap, pm_message_t state)
 			continue;
 
 		if (state.event != PM_EVENT_ON) {
-			acpi_state = acpi_pm_device_sleep_state(
-				&dev->sdev->sdev_gendev, NULL, ACPI_STATE_D3);
-			if (acpi_state > 0)
-				acpi_bus_set_power(handle, acpi_state);
-			/* TBD: need to check if it's runtime pm request */
-			acpi_pm_device_run_wake(
-				&dev->sdev->sdev_gendev, true);
+			acpi_state = ata_acpi_choose_suspend_state(dev);
+			if (acpi_state == ACPI_STATE_D0)
+				continue;
+			if (zpodd_dev_enabled(dev) &&
+			    acpi_state == ACPI_STATE_D3_COLD)
+				zpodd_pre_poweroff(dev);
+			acpi_bus_set_power(handle, acpi_state);
 		} else {
-			/* Ditto */
-			acpi_pm_device_run_wake(
-				&dev->sdev->sdev_gendev, false);
+			if (zpodd_dev_enabled(dev))
+				zpodd_pre_poweron(dev);
 			acpi_bus_set_power(handle, ACPI_STATE_D0);
 		}
 	}
diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index d85934c..d772d66 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -3857,6 +3857,8 @@ int ata_eh_recover(struct ata_port *ap, ata_prereset_fn_t prereset,
 				rc = atapi_eh_clear_ua(dev);
 				if (rc)
 					goto rest_fail;
+				if (zpodd_dev_enabled(dev))
+					zpodd_post_resume(dev);
 			}
 		}
 
diff --git a/drivers/ata/libata-zpodd.c b/drivers/ata/libata-zpodd.c
index 7ee49bb..3c39987 100644
--- a/drivers/ata/libata-zpodd.c
+++ b/drivers/ata/libata-zpodd.c
@@ -17,6 +17,7 @@ struct zpodd {
 	/* The following bits are synchronized by PM core */
 	bool from_notify:1; /* resumed as a result of acpi notification */
 	bool zp_ready:1;	/* zero power ready state */
+	bool powered_off:1;	/* ODD is powered off */
 
 	unsigned long last_ready; /* last zero power ready timestamp */
 
@@ -42,6 +43,17 @@ static int run_atapi_cmd(struct ata_device *dev, const char *cdb,
 			buf ? DMA_FROM_DEVICE : DMA_NONE, buf, buf_len, 0);
 }
 
+static int eject_tray(struct ata_device *dev)
+{
+	const char cdb[] = {  GPCMD_START_STOP_UNIT,
+			      0, 0, 0,
+			      0x02,     /* LoEj */
+			      0, 0, 0, 0, 0, 0, 0,
+	};
+
+	return run_atapi_cmd(dev, cdb, sizeof(cdb), NULL, 0);
+}
+
 /*
  * Per the spec, only slot type and drawer type ODD can be supported
  *
@@ -155,6 +167,50 @@ void zpodd_on_suspend(struct ata_device *dev)
 	zpodd->zp_ready = true;
 }
 
+bool zpodd_zpready(struct ata_device *dev)
+{
+	struct zpodd *zpodd = dev->zpodd;
+	return zpodd->zp_ready;
+}
+
+void zpodd_pre_poweroff(struct ata_device *dev)
+{
+	struct zpodd *zpodd = dev->zpodd;
+
+	zpodd->powered_off = true;
+	device_set_run_wake(&dev->sdev->sdev_gendev, true);
+	acpi_pm_device_run_wake(&dev->sdev->sdev_gendev, true);
+}
+
+void zpodd_pre_poweron(struct ata_device *dev)
+{
+	struct zpodd *zpodd = dev->zpodd;
+
+	if (zpodd->powered_off) {
+		acpi_pm_device_run_wake(&dev->sdev->sdev_gendev, false);
+		device_set_run_wake(&dev->sdev->sdev_gendev, false);
+	}
+}
+
+void zpodd_post_resume(struct ata_device *dev)
+{
+	struct zpodd *zpodd = dev->zpodd;
+
+	if (!zpodd->powered_off)
+		return;
+
+	zpodd->powered_off = false;
+
+	if (zpodd->from_notify) {
+		zpodd->from_notify = false;
+		if (zpodd->drawer)
+			eject_tray(dev);
+	}
+
+	zpodd->last_ready = 0;
+	zpodd->zp_ready = false;
+}
+
 static void zpodd_wake_dev(acpi_handle handle, u32 event, void *context)
 {
 	struct ata_device *ata_dev = context;
diff --git a/drivers/ata/libata.h b/drivers/ata/libata.h
index f9235dc..4b20e71 100644
--- a/drivers/ata/libata.h
+++ b/drivers/ata/libata.h
@@ -242,11 +242,19 @@ static inline bool zpodd_dev_enabled(struct ata_device *dev)
 	return dev->zpodd ? true : false;
 }
 void zpodd_on_suspend(struct ata_device *dev);
+bool zpodd_zpready(struct ata_device *dev);
+void zpodd_pre_poweroff(struct ata_device *dev);
+void zpodd_pre_poweron(struct ata_device *dev);
+void zpodd_post_resume(struct ata_device *dev);
 #else /* CONFIG_SATA_ZPODD */
 static inline void zpodd_init(struct ata_device *dev) {}
 static inline void zpodd_exit(struct ata_device *dev) {}
 static inline bool zpodd_dev_enabled(struct ata_device *dev) { return false; }
 static inline void zpodd_on_suspend(struct ata_device *dev) {}
+static inline bool zpodd_zpready(struct ata_device *dev) { return false; }
+static inline void zpodd_pre_poweroff(struct ata_device *dev) {}
+static inline void zpodd_pre_poweron(struct ata_device *dev) {}
+static inline void zpodd_post_resume(struct ata_device *dev) {}
 #endif /* CONFIG_SATA_ZPODD */
 
 #endif /* __LIBATA_H__ */
-- 
1.7.11.7


^ permalink raw reply related

* [PATCH v11 5/9] libata: check zero power ready status for ZPODD
From: Aaron Lu @ 2013-01-06  2:48 UTC (permalink / raw)
  To: Jeff Garzik, James Bottomley, Rafael J. Wysocki, Alan Stern,
	Tejun Heo
  Cc: Aaron Lu, Jeff Wu, linux-ide, linux-pm, linux-scsi, linux-acpi,
	Aaron Lu
In-Reply-To: <1357440509-28108-1-git-send-email-aaron.lu@intel.com>

Per the Mount Fuji spec, the ODD is considered zero power ready when:
  - For slot type ODD, no media inside;
  - For tray type ODD, no media inside and tray closed.

The information can be retrieved by either the returned information of
command GET_EVENT_STATUS_NOTIFICATION(the command is used to poll for
media event) or sense code.

The information provided by the media status byte is not accurate, it
is possible that after a new disc is just inserted, the status byte
still returns media not present. So this information can not be used as
the deciding factor, we use sense code to decide if zpready status is
true.

When we first sensed the ODD in the zero power ready state, the
timestamp will be recoreded. And after ODD stayed in this state for
some pre-defined period, the ODD is considered as power off ready and
the zp_ready flag will be set. The zp_ready flag serves as the deciding
factor other code will use to see if power off is OK for the ODD.

The Mount Fuji spec suggests a delay should be used here, to avoid the
case user ejects the ODD and then instantly inserts a new one again, so
that we can avoid a power transition. And some ODDs may be slow to place
its head to the home position after disc is ejected, so a delay here is
generally a good idea. And the delay time can be changed via the module
param zpodd_poweroff_delay.

The zero power ready status check is performed in the ata port's runtime
suspend code path, when port is not frozen yet, as we need to issue some
IOs to the ODD.

Signed-off-by: Aaron Lu <aaron.lu@intel.com>
---
 drivers/ata/libata-eh.c    | 10 +++++--
 drivers/ata/libata-zpodd.c | 70 ++++++++++++++++++++++++++++++++++++++++++++++
 drivers/ata/libata.h       |  5 ++++
 3 files changed, 83 insertions(+), 2 deletions(-)

diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index bf039b0..d85934c 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -1591,7 +1591,7 @@ static int ata_eh_read_log_10h(struct ata_device *dev,
  *	RETURNS:
  *	0 on success, AC_ERR_* mask on failure.
  */
-static unsigned int atapi_eh_tur(struct ata_device *dev, u8 *r_sense_key)
+unsigned int atapi_eh_tur(struct ata_device *dev, u8 *r_sense_key)
 {
 	u8 cdb[ATAPI_CDB_LEN] = { TEST_UNIT_READY, 0, 0, 0, 0, 0 };
 	struct ata_taskfile tf;
@@ -1624,7 +1624,7 @@ static unsigned int atapi_eh_tur(struct ata_device *dev, u8 *r_sense_key)
  *	RETURNS:
  *	0 on success, AC_ERR_* mask on failure
  */
-static unsigned int atapi_eh_request_sense(struct ata_device *dev,
+unsigned int atapi_eh_request_sense(struct ata_device *dev,
 					   u8 *sense_buf, u8 dfl_sense_key)
 {
 	u8 cdb[ATAPI_CDB_LEN] =
@@ -4022,6 +4022,7 @@ static void ata_eh_handle_port_suspend(struct ata_port *ap)
 {
 	unsigned long flags;
 	int rc = 0;
+	struct ata_device *dev;
 
 	/* are we suspending? */
 	spin_lock_irqsave(ap->lock, flags);
@@ -4034,6 +4035,11 @@ static void ata_eh_handle_port_suspend(struct ata_port *ap)
 
 	WARN_ON(ap->pflags & ATA_PFLAG_SUSPENDED);
 
+	ata_for_each_dev(dev, &ap->link, ENABLED) {
+		if (zpodd_dev_enabled(dev))
+			zpodd_on_suspend(dev);
+	}
+
 	/* tell ACPI we're suspending */
 	rc = ata_acpi_on_suspend(ap);
 	if (rc)
diff --git a/drivers/ata/libata-zpodd.c b/drivers/ata/libata-zpodd.c
index c8c7b74..7ee49bb 100644
--- a/drivers/ata/libata-zpodd.c
+++ b/drivers/ata/libata-zpodd.c
@@ -1,16 +1,24 @@
 #include <linux/libata.h>
 #include <linux/cdrom.h>
 #include <linux/pm_runtime.h>
+#include <linux/module.h>
 #include <scsi/scsi_device.h>
 
 #include "libata.h"
 
+static int zpodd_poweroff_delay = 30; /* 30 seconds for power off delay */
+module_param(zpodd_poweroff_delay, int, 0644);
+MODULE_PARM_DESC(zpodd_poweroff_delay, "Poweroff delay for ZPODD in seconds");
+
 struct zpodd {
 	bool slot:1;
 	bool drawer:1;
 
 	/* The following bits are synchronized by PM core */
 	bool from_notify:1; /* resumed as a result of acpi notification */
+	bool zp_ready:1;	/* zero power ready state */
+
+	unsigned long last_ready; /* last zero power ready timestamp */
 
 	struct ata_device *dev;
 };
@@ -85,6 +93,68 @@ static bool odd_can_poweroff(struct ata_device *ata_dev)
 	return acpi_device_can_poweroff(acpi_dev);
 }
 
+/* Test if ODD is zero power ready by sense code */
+static bool zpready(struct ata_device *dev)
+{
+	u8 sense_key, *sense_buf;
+	unsigned int ret, asc, ascq, add_len;
+	struct zpodd *zpodd = dev->zpodd;
+
+	ret = atapi_eh_tur(dev, &sense_key);
+
+	if (!ret || sense_key != NOT_READY)
+		return false;
+
+	sense_buf = dev->link->ap->sector_buf;
+	ret = atapi_eh_request_sense(dev, sense_buf, sense_key);
+	if (ret)
+		return false;
+
+	/* sense valid */
+	if ((sense_buf[0] & 0x7f) != 0x70)
+		return false;
+
+	add_len = sense_buf[7];
+	/* has asc and ascq */
+	if (add_len < 6)
+		return false;
+
+	asc = sense_buf[12];
+	ascq = sense_buf[13];
+
+	if (zpodd->slot)
+		/* no media inside */
+		return asc == 0x3a;
+	else
+		/* no media inside and door closed */
+		return asc == 0x3a && ascq == 0x01;
+}
+
+/* Check zero power ready status */
+void zpodd_on_suspend(struct ata_device *dev)
+{
+	struct zpodd *zpodd = dev->zpodd;
+	unsigned long expires;
+
+	if (!zpready(dev)) {
+		zpodd->zp_ready = false;
+		zpodd->last_ready = 0;
+		return;
+	}
+
+	if (!zpodd->last_ready) {
+		zpodd->last_ready = jiffies;
+		return;
+	}
+
+	expires = zpodd->last_ready +
+		  msecs_to_jiffies(zpodd_poweroff_delay * 1000);
+	if (time_before(jiffies, expires))
+		return;
+
+	zpodd->zp_ready = true;
+}
+
 static void zpodd_wake_dev(acpi_handle handle, u32 event, void *context)
 {
 	struct ata_device *ata_dev = context;
diff --git a/drivers/ata/libata.h b/drivers/ata/libata.h
index 8cb4372..f9235dc 100644
--- a/drivers/ata/libata.h
+++ b/drivers/ata/libata.h
@@ -182,6 +182,9 @@ extern void ata_eh_finish(struct ata_port *ap);
 extern int ata_ering_map(struct ata_ering *ering,
 			 int (*map_fn)(struct ata_ering_entry *, void *),
 		  	 void *arg);
+extern unsigned int atapi_eh_tur(struct ata_device *dev, u8 *r_sense_key);
+extern unsigned int atapi_eh_request_sense(struct ata_device *dev,
+					   u8 *sense_buf, u8 dfl_sense_key);
 
 /* libata-pmp.c */
 #ifdef CONFIG_SATA_PMP
@@ -238,10 +241,12 @@ static inline bool zpodd_dev_enabled(struct ata_device *dev)
 {
 	return dev->zpodd ? true : false;
 }
+void zpodd_on_suspend(struct ata_device *dev);
 #else /* CONFIG_SATA_ZPODD */
 static inline void zpodd_init(struct ata_device *dev) {}
 static inline void zpodd_exit(struct ata_device *dev) {}
 static inline bool zpodd_dev_enabled(struct ata_device *dev) { return false; }
+static inline void zpodd_on_suspend(struct ata_device *dev) {}
 #endif /* CONFIG_SATA_ZPODD */
 
 #endif /* __LIBATA_H__ */
-- 
1.7.11.7


^ permalink raw reply related

* [PATCH v11 3/9] libata: identify and init ZPODD devices
From: Aaron Lu @ 2013-01-06  2:48 UTC (permalink / raw)
  To: Jeff Garzik, James Bottomley, Rafael J. Wysocki, Alan Stern,
	Tejun Heo
  Cc: Aaron Lu, Jeff Wu, linux-ide, linux-pm, linux-scsi, linux-acpi,
	Aaron Lu
In-Reply-To: <1357440509-28108-1-git-send-email-aaron.lu@intel.com>

The ODD can be enabled for ZPODD if the following three conditions are
satisfied:
1 The ODD supports device attention;
2 The platform can runtime power off the ODD through ACPI;
3 The ODD is either slot type or drawer type.
For such ODDs, zpodd_init is called and a new structure is allocated for
it to store ZPODD related stuffs.

And the zpodd_dev_enabled function is used to test if ZPODD is currently
enabled for this ODD.

Signed-off-by: Aaron Lu <aaron.lu@intel.com>
---
 drivers/ata/libata-acpi.c  |   2 +
 drivers/ata/libata-core.c  |   4 +-
 drivers/ata/libata-zpodd.c | 114 +++++++++++++++++++++++++++++++++++++++++++++
 drivers/ata/libata.h       |  14 ++++++
 include/linux/libata.h     |   3 ++
 include/uapi/linux/cdrom.h |  34 ++++++++++++++
 6 files changed, 170 insertions(+), 1 deletion(-)
 create mode 100644 drivers/ata/libata-zpodd.c

diff --git a/drivers/ata/libata-acpi.c b/drivers/ata/libata-acpi.c
index ef01ac0..5aa7322 100644
--- a/drivers/ata/libata-acpi.c
+++ b/drivers/ata/libata-acpi.c
@@ -1063,6 +1063,8 @@ void ata_acpi_bind(struct ata_device *dev)
 
 void ata_acpi_unbind(struct ata_device *dev)
 {
+	if (zpodd_dev_enabled(dev))
+		zpodd_exit(dev);
 	ata_acpi_remove_pm_notifier(dev);
 	ata_acpi_unregister_power_resource(dev);
 }
diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 9e8b99a..65a362e 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -2396,8 +2396,10 @@ int ata_dev_configure(struct ata_device *dev)
 			dma_dir_string = ", DMADIR";
 		}
 
-		if (ata_id_has_da(dev->id))
+		if (ata_id_has_da(dev->id)) {
 			dev->flags |= ATA_DFLAG_DA;
+			zpodd_init(dev);
+		}
 
 		/* print device info to dmesg */
 		if (ata_msg_drv(ap) && print_info)
diff --git a/drivers/ata/libata-zpodd.c b/drivers/ata/libata-zpodd.c
new file mode 100644
index 0000000..58bf891
--- /dev/null
+++ b/drivers/ata/libata-zpodd.c
@@ -0,0 +1,114 @@
+#include <linux/libata.h>
+#include <linux/cdrom.h>
+
+#include "libata.h"
+
+struct zpodd {
+	bool slot:1;
+	bool drawer:1;
+
+	struct ata_device *dev;
+};
+
+static int run_atapi_cmd(struct ata_device *dev, const char *cdb,
+		unsigned short cdb_len, char *buf, unsigned int buf_len)
+{
+	struct ata_taskfile tf = {0};
+
+	tf.flags |= ATA_TFLAG_ISADDR | ATA_TFLAG_DEVICE;
+	tf.command = ATA_CMD_PACKET;
+
+	if (buf) {
+		tf.protocol = ATAPI_PROT_PIO;
+		tf.lbam = buf_len;
+	} else {
+		tf.protocol = ATAPI_PROT_NODATA;
+	}
+
+	return ata_exec_internal(dev, &tf, cdb,
+			buf ? DMA_FROM_DEVICE : DMA_NONE, buf, buf_len, 0);
+}
+
+/*
+ * Per the spec, only slot type and drawer type ODD can be supported
+ *
+ * Return 0 for slot type, 1 for drawer, -ENODEV for other types or on error.
+ */
+static int check_loading_mechanism(struct ata_device *dev)
+{
+	char buf[16];
+	unsigned int ret;
+	struct rm_feature_desc *desc = (void *)(buf + 8);
+
+	char cdb[] = {  GPCMD_GET_CONFIGURATION,
+			2,      /* only 1 feature descriptor requested */
+			0, 3,   /* 3, removable medium feature */
+			0, 0, 0,/* reserved */
+			0, sizeof(buf),
+			0, 0, 0,
+	};
+
+	ret = run_atapi_cmd(dev, cdb, sizeof(cdb), buf, sizeof(buf));
+	if (ret)
+		return -ENODEV;
+
+	if (be16_to_cpu(desc->feature_code) != 3)
+		return -ENODEV;
+
+	if (desc->mech_type == 0 && desc->load == 0 && desc->eject == 1)
+		return 0; /* slot */
+	else if (desc->mech_type == 1 && desc->load == 0 && desc->eject == 1)
+		return 1; /* drawer */
+	else
+		return -ENODEV;
+}
+
+static bool odd_can_poweroff(struct ata_device *ata_dev)
+{
+	acpi_handle handle;
+	acpi_status status;
+	struct acpi_device *acpi_dev;
+
+	handle = ata_dev_acpi_handle(ata_dev);
+	if (!handle)
+		return false;
+
+	status = acpi_bus_get_device(handle, &acpi_dev);
+	if (ACPI_FAILURE(status))
+		return false;
+
+	return acpi_device_can_poweroff(acpi_dev);
+}
+
+void zpodd_init(struct ata_device *dev)
+{
+	int ret;
+	struct zpodd *zpodd;
+
+	if (dev->zpodd)
+		return;
+
+	if (!odd_can_poweroff(dev))
+		return;
+
+	if ((ret = check_loading_mechanism(dev)) == -ENODEV)
+		return;
+
+	zpodd = kzalloc(sizeof(struct zpodd), GFP_KERNEL);
+	if (!zpodd)
+		return;
+
+	if (ret)
+		zpodd->drawer = true;
+	else
+		zpodd->slot = true;
+
+	zpodd->dev = dev;
+	dev->zpodd = zpodd;
+}
+
+void zpodd_exit(struct ata_device *dev)
+{
+	kfree(dev->zpodd);
+	dev->zpodd = NULL;
+}
diff --git a/drivers/ata/libata.h b/drivers/ata/libata.h
index 7148a58..8cb4372 100644
--- a/drivers/ata/libata.h
+++ b/drivers/ata/libata.h
@@ -230,4 +230,18 @@ static inline void ata_sff_exit(void)
 { }
 #endif /* CONFIG_ATA_SFF */
 
+/* libata-zpodd.c */
+#ifdef CONFIG_SATA_ZPODD
+void zpodd_init(struct ata_device *dev);
+void zpodd_exit(struct ata_device *dev);
+static inline bool zpodd_dev_enabled(struct ata_device *dev)
+{
+	return dev->zpodd ? true : false;
+}
+#else /* CONFIG_SATA_ZPODD */
+static inline void zpodd_init(struct ata_device *dev) {}
+static inline void zpodd_exit(struct ata_device *dev) {}
+static inline bool zpodd_dev_enabled(struct ata_device *dev) { return false; }
+#endif /* CONFIG_SATA_ZPODD */
+
 #endif /* __LIBATA_H__ */
diff --git a/include/linux/libata.h b/include/linux/libata.h
index 83ba0ab..f88f909 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -620,6 +620,9 @@ struct ata_device {
 	union acpi_object	*gtf_cache;
 	unsigned int		gtf_filter;
 #endif
+#ifdef CONFIG_SATA_ZPODD
+	void			*zpodd;
+#endif
 	struct device		tdev;
 	/* n_sector is CLEAR_BEGIN, read comment above CLEAR_BEGIN */
 	u64			n_sectors;	/* size of device, if ATA */
diff --git a/include/uapi/linux/cdrom.h b/include/uapi/linux/cdrom.h
index 898b866..bd17ad5 100644
--- a/include/uapi/linux/cdrom.h
+++ b/include/uapi/linux/cdrom.h
@@ -908,5 +908,39 @@ struct mode_page_header {
 	__be16 desc_length;
 };
 
+/* removable medium feature descriptor */
+struct rm_feature_desc {
+	__be16 feature_code;
+#if defined(__BIG_ENDIAN_BITFIELD)
+	__u8 reserved1:2;
+	__u8 feature_version:4;
+	__u8 persistent:1;
+	__u8 curr:1;
+#elif defined(__LITTLE_ENDIAN_BITFIELD)
+	__u8 curr:1;
+	__u8 persistent:1;
+	__u8 feature_version:4;
+	__u8 reserved1:2;
+#endif
+	__u8 add_len;
+#if defined(__BIG_ENDIAN_BITFIELD)
+	__u8 mech_type:3;
+	__u8 load:1;
+	__u8 eject:1;
+	__u8 pvnt_jmpr:1;
+	__u8 dbml:1;
+	__u8 lock:1;
+#elif defined(__LITTLE_ENDIAN_BITFIELD)
+	__u8 lock:1;
+	__u8 dbml:1;
+	__u8 pvnt_jmpr:1;
+	__u8 eject:1;
+	__u8 load:1;
+	__u8 mech_type:3;
+#endif
+	__u8 reserved2;
+	__u8 reserved3;
+	__u8 reserved4;
+};
 
 #endif /* _UAPI_LINUX_CDROM_H */
-- 
1.7.11.7


^ permalink raw reply related

* [PATCH v11 4/9] libata: move acpi notification code to zpodd
From: Aaron Lu @ 2013-01-06  2:48 UTC (permalink / raw)
  To: Jeff Garzik, James Bottomley, Rafael J. Wysocki, Alan Stern,
	Tejun Heo
  Cc: Aaron Lu, Jeff Wu, linux-ide, linux-pm, linux-scsi, linux-acpi,
	Aaron Lu
In-Reply-To: <1357440509-28108-1-git-send-email-aaron.lu@intel.com>

Since the ata acpi notification code introduced in commit
3bd46600a7a7e938c54df8cdbac9910668c7dfb0 is solely for ZPODD, and we
now have a dedicated place for it, move these code there.

And the ata_acpi_add_pm_notifier code is changed a little bit in that it
is now invoked when scsi device is not bound with ACPI yet, so the way
to get the acpi handle is different with the previous version. And the
ata_acpi_add/remove_pm_notifier is also simplified a little bit in that
it doesn't check if the acpi_device for the handle exists or not as the
odd_can_poweroff function already checked that.

Signed-off-by: Aaron Lu <aaron.lu@intel.com>
---
 drivers/ata/libata-acpi.c  | 71 ----------------------------------------------
 drivers/ata/libata-zpodd.c | 33 +++++++++++++++++++++
 2 files changed, 33 insertions(+), 71 deletions(-)

diff --git a/drivers/ata/libata-acpi.c b/drivers/ata/libata-acpi.c
index 5aa7322..b4788dd 100644
--- a/drivers/ata/libata-acpi.c
+++ b/drivers/ata/libata-acpi.c
@@ -974,57 +974,6 @@ void ata_acpi_on_disable(struct ata_device *dev)
 	ata_acpi_clear_gtf(dev);
 }
 
-static void ata_acpi_wake_dev(acpi_handle handle, u32 event, void *context)
-{
-	struct ata_device *ata_dev = context;
-
-	if (event == ACPI_NOTIFY_DEVICE_WAKE && ata_dev &&
-			pm_runtime_suspended(&ata_dev->sdev->sdev_gendev))
-		scsi_autopm_get_device(ata_dev->sdev);
-}
-
-static void ata_acpi_add_pm_notifier(struct ata_device *dev)
-{
-	struct acpi_device *acpi_dev;
-	acpi_handle handle;
-	acpi_status status;
-
-	handle = ata_dev_acpi_handle(dev);
-	if (!handle)
-		return;
-
-	status = acpi_bus_get_device(handle, &acpi_dev);
-	if (ACPI_FAILURE(status))
-		return;
-
-	if (dev->sdev->can_power_off) {
-		acpi_install_notify_handler(handle, ACPI_SYSTEM_NOTIFY,
-			ata_acpi_wake_dev, dev);
-		device_set_run_wake(&dev->sdev->sdev_gendev, true);
-	}
-}
-
-static void ata_acpi_remove_pm_notifier(struct ata_device *dev)
-{
-	struct acpi_device *acpi_dev;
-	acpi_handle handle;
-	acpi_status status;
-
-	handle = ata_dev_acpi_handle(dev);
-	if (!handle)
-		return;
-
-	status = acpi_bus_get_device(handle, &acpi_dev);
-	if (ACPI_FAILURE(status))
-		return;
-
-	if (dev->sdev->can_power_off) {
-		device_set_run_wake(&dev->sdev->sdev_gendev, false);
-		acpi_remove_notify_handler(handle, ACPI_SYSTEM_NOTIFY,
-			ata_acpi_wake_dev);
-	}
-}
-
 static void ata_acpi_register_power_resource(struct ata_device *dev)
 {
 	struct scsi_device *sdev = dev->sdev;
@@ -1057,7 +1006,6 @@ static void ata_acpi_unregister_power_resource(struct ata_device *dev)
 
 void ata_acpi_bind(struct ata_device *dev)
 {
-	ata_acpi_add_pm_notifier(dev);
 	ata_acpi_register_power_resource(dev);
 }
 
@@ -1065,7 +1013,6 @@ void ata_acpi_unbind(struct ata_device *dev)
 {
 	if (zpodd_dev_enabled(dev))
 		zpodd_exit(dev);
-	ata_acpi_remove_pm_notifier(dev);
 	ata_acpi_unregister_power_resource(dev);
 }
 
@@ -1107,9 +1054,6 @@ static int ata_acpi_bind_device(struct ata_port *ap, struct scsi_device *sdev,
 				acpi_handle *handle)
 {
 	struct ata_device *ata_dev;
-	acpi_status status;
-	struct acpi_device *acpi_dev;
-	struct acpi_device_power_state *states;
 
 	if (ap->flags & ATA_FLAG_ACPI_SATA) {
 		if (!sata_pmp_attached(ap))
@@ -1126,21 +1070,6 @@ static int ata_acpi_bind_device(struct ata_port *ap, struct scsi_device *sdev,
 	if (!*handle)
 		return -ENODEV;
 
-	status = acpi_bus_get_device(*handle, &acpi_dev);
-	if (ACPI_FAILURE(status))
-		return 0;
-
-	/*
-	 * If firmware has _PS3 or _PR3 for this device,
-	 * and this ata ODD device support device attention,
-	 * it means this device can be powered off
-	 */
-	states = acpi_dev->power.states;
-	if ((states[ACPI_STATE_D3_HOT].flags.valid ||
-			states[ACPI_STATE_D3_COLD].flags.explicit_set) &&
-			ata_dev->flags & ATA_DFLAG_DA)
-		sdev->can_power_off = 1;
-
 	return 0;
 }
 
diff --git a/drivers/ata/libata-zpodd.c b/drivers/ata/libata-zpodd.c
index 58bf891..c8c7b74 100644
--- a/drivers/ata/libata-zpodd.c
+++ b/drivers/ata/libata-zpodd.c
@@ -1,5 +1,7 @@
 #include <linux/libata.h>
 #include <linux/cdrom.h>
+#include <linux/pm_runtime.h>
+#include <scsi/scsi_device.h>
 
 #include "libata.h"
 
@@ -7,6 +9,9 @@ struct zpodd {
 	bool slot:1;
 	bool drawer:1;
 
+	/* The following bits are synchronized by PM core */
+	bool from_notify:1; /* resumed as a result of acpi notification */
+
 	struct ata_device *dev;
 };
 
@@ -80,6 +85,32 @@ static bool odd_can_poweroff(struct ata_device *ata_dev)
 	return acpi_device_can_poweroff(acpi_dev);
 }
 
+static void zpodd_wake_dev(acpi_handle handle, u32 event, void *context)
+{
+	struct ata_device *ata_dev = context;
+	struct zpodd *zpodd = ata_dev->zpodd;
+	struct device *dev = &ata_dev->sdev->sdev_gendev;
+
+	if (event == ACPI_NOTIFY_DEVICE_WAKE && ata_dev &&
+			pm_runtime_suspended(dev)) {
+		zpodd->from_notify = true;
+		pm_runtime_resume(dev);
+	}
+}
+
+static void ata_acpi_add_pm_notifier(struct ata_device *dev)
+{
+	acpi_handle handle = ata_dev_acpi_handle(dev);
+	acpi_install_notify_handler(handle, ACPI_SYSTEM_NOTIFY,
+				    zpodd_wake_dev, dev);
+}
+
+static void ata_acpi_remove_pm_notifier(struct ata_device *dev)
+{
+	acpi_handle handle = DEVICE_ACPI_HANDLE(&dev->sdev->sdev_gendev);
+	acpi_remove_notify_handler(handle, ACPI_SYSTEM_NOTIFY, zpodd_wake_dev);
+}
+
 void zpodd_init(struct ata_device *dev)
 {
 	int ret;
@@ -103,12 +134,14 @@ void zpodd_init(struct ata_device *dev)
 	else
 		zpodd->slot = true;
 
+	ata_acpi_add_pm_notifier(dev);
 	zpodd->dev = dev;
 	dev->zpodd = zpodd;
 }
 
 void zpodd_exit(struct ata_device *dev)
 {
+	ata_acpi_remove_pm_notifier(dev);
 	kfree(dev->zpodd);
 	dev->zpodd = NULL;
 }
-- 
1.7.11.7


^ permalink raw reply related

* [PATCH v11 2/9] libata: Add CONFIG_SATA_ZPODD
From: Aaron Lu @ 2013-01-06  2:48 UTC (permalink / raw)
  To: Jeff Garzik, James Bottomley, Rafael J. Wysocki, Alan Stern,
	Tejun Heo
  Cc: Aaron Lu, Jeff Wu, linux-ide, linux-pm, linux-scsi, linux-acpi,
	Aaron Lu
In-Reply-To: <1357440509-28108-1-git-send-email-aaron.lu@intel.com>

Added a new config CONFIG_SATA_ZPODD, which is used to support
SATA based zero power ODD(ZPODD), and depends on ATA_ACPI.

Signed-off-by: Aaron Lu <aaron.lu@intel.com>
---
 drivers/ata/Kconfig  | 13 +++++++++++++
 drivers/ata/Makefile |  1 +
 2 files changed, 14 insertions(+)

diff --git a/drivers/ata/Kconfig b/drivers/ata/Kconfig
index e08d322..996d16c 100644
--- a/drivers/ata/Kconfig
+++ b/drivers/ata/Kconfig
@@ -58,6 +58,19 @@ config ATA_ACPI
 	  You can disable this at kernel boot time by using the
 	  option libata.noacpi=1
 
+config SATA_ZPODD
+	bool "SATA Zero Power ODD Support"
+	depends on ATA_ACPI
+	default n
+	help
+	  This option adds support for SATA ZPODD. It requires both
+	  ODD and the platform support, and if enabled, will automatically
+	  power on/off the ODD when certain condition is satisfied. This
+	  does not impact user's experience of the ODD, only power is saved
+	  when ODD is not in use(i.e. no disc inside).
+
+	  If unsure, say N.
+
 config SATA_PMP
 	bool "SATA Port Multiplier support"
 	default y
diff --git a/drivers/ata/Makefile b/drivers/ata/Makefile
index 9329daf..85e3de4 100644
--- a/drivers/ata/Makefile
+++ b/drivers/ata/Makefile
@@ -107,3 +107,4 @@ libata-y	:= libata-core.o libata-scsi.o libata-eh.o libata-transport.o
 libata-$(CONFIG_ATA_SFF)	+= libata-sff.o
 libata-$(CONFIG_SATA_PMP)	+= libata-pmp.o
 libata-$(CONFIG_ATA_ACPI)	+= libata-acpi.o
+libata-$(CONFIG_SATA_ZPODD)	+= libata-zpodd.o
-- 
1.7.11.7


^ permalink raw reply related

* [PATCH v11 1/9] scsi: sr: support runtime pm
From: Aaron Lu @ 2013-01-06  2:48 UTC (permalink / raw)
  To: Jeff Garzik, James Bottomley, Rafael J. Wysocki, Alan Stern,
	Tejun Heo
  Cc: Aaron Lu, Jeff Wu, linux-ide, linux-pm, linux-scsi, linux-acpi,
	Aaron Lu
In-Reply-To: <1357440509-28108-1-git-send-email-aaron.lu@intel.com>

This patch adds runtime pm support for sr.

It did this by increasing the runtime usage_count of the device when:
- its block device is opened;
- the events checking is to run.

And decreasing the runtime usage_count of the device when:
- its block device is closed;
- After the events checking is done.

The idea is discussed in this mail thread:
http://thread.gmane.org/gmane.linux.acpi.devel/55243/focus=52703

Signed-off-by: Aaron Lu <aaron.lu@intel.com>
---
 drivers/scsi/sr.c | 30 +++++++++++++++++++++++++-----
 1 file changed, 25 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c
index 5fc97d2..4d1a610 100644
--- a/drivers/scsi/sr.c
+++ b/drivers/scsi/sr.c
@@ -45,6 +45,7 @@
 #include <linux/blkdev.h>
 #include <linux/mutex.h>
 #include <linux/slab.h>
+#include <linux/pm_runtime.h>
 #include <asm/uaccess.h>
 
 #include <scsi/scsi.h>
@@ -146,7 +147,8 @@ static inline struct scsi_cd *scsi_cd_get(struct gendisk *disk)
 	kref_get(&cd->kref);
 	if (scsi_device_get(cd->device))
 		goto out_put;
-	goto out;
+	if (!scsi_autopm_get_device(cd->device))
+		goto out;
 
  out_put:
 	kref_put(&cd->kref, sr_kref_release);
@@ -162,6 +164,7 @@ static void scsi_cd_put(struct scsi_cd *cd)
 
 	mutex_lock(&sr_ref_mutex);
 	kref_put(&cd->kref, sr_kref_release);
+	scsi_autopm_put_device(sdev);
 	scsi_device_put(sdev);
 	mutex_unlock(&sr_ref_mutex);
 }
@@ -211,7 +214,7 @@ static unsigned int sr_check_events(struct cdrom_device_info *cdi,
 				    unsigned int clearing, int slot)
 {
 	struct scsi_cd *cd = cdi->handle;
-	bool last_present;
+	bool last_present = cd->media_present;
 	struct scsi_sense_hdr sshdr;
 	unsigned int events;
 	int ret;
@@ -220,6 +223,8 @@ static unsigned int sr_check_events(struct cdrom_device_info *cdi,
 	if (CDSL_CURRENT != slot)
 		return 0;
 
+	scsi_autopm_get_device(cd->device);
+
 	events = sr_get_events(cd->device);
 	cd->get_event_changed |= events & DISK_EVENT_MEDIA_CHANGE;
 
@@ -246,10 +251,9 @@ static unsigned int sr_check_events(struct cdrom_device_info *cdi,
 	}
 
 	if (!(clearing & DISK_EVENT_MEDIA_CHANGE))
-		return events;
+		goto out;
 do_tur:
 	/* let's see whether the media is there with TUR */
-	last_present = cd->media_present;
 	ret = scsi_test_unit_ready(cd->device, SR_TIMEOUT, MAX_RETRIES, &sshdr);
 
 	/*
@@ -270,7 +274,7 @@ do_tur:
 	}
 
 	if (cd->ignore_get_event)
-		return events;
+		goto out;
 
 	/* check whether GET_EVENT is reporting spurious MEDIA_CHANGE */
 	if (!cd->tur_changed) {
@@ -287,6 +291,18 @@ do_tur:
 	cd->tur_changed = false;
 	cd->get_event_changed = false;
 
+out:
+	/*
+	 * If there is no medium detected or the medium has been there
+	 * since last poll, try to suspend the device. Otherwise, keep
+	 * it active for one more poll interval so that if user space
+	 * application opens the block device, we can avoid a runtime
+	 * status change.
+	 */
+	pm_runtime_put_noidle(&cd->device->sdev_gendev);
+	if (!cd->media_present || last_present)
+		pm_runtime_suspend(&cd->device->sdev_gendev);
+
 	return events;
 }
 
@@ -718,6 +734,8 @@ static int sr_probe(struct device *dev)
 
 	sdev_printk(KERN_DEBUG, sdev,
 		    "Attached scsi CD-ROM %s\n", cd->cdi.name);
+	scsi_autopm_put_device(cd->device);
+
 	return 0;
 
 fail_put:
@@ -965,6 +983,8 @@ static int sr_remove(struct device *dev)
 {
 	struct scsi_cd *cd = dev_get_drvdata(dev);
 
+	scsi_autopm_get_device(cd->device);
+
 	blk_queue_prep_rq(cd->device->request_queue, scsi_prep_fn);
 	del_gendisk(cd->disk);
 
-- 
1.7.11.7


^ permalink raw reply related

* [PATCH v11 0/9] ZPODD Patches
From: Aaron Lu @ 2013-01-06  2:48 UTC (permalink / raw)
  To: Jeff Garzik, James Bottomley, Rafael J. Wysocki, Alan Stern,
	Tejun Heo
  Cc: Aaron Lu, Jeff Wu, linux-ide, linux-pm, linux-scsi, linux-acpi,
	Aaron Lu

v11:
Introduce event_driven flag in scsi_device to silence the media event
poll after ODD is powered off;
Removed ata layer PM QOS control, instead, simply limit ACPI state to
D3_HOT when choosing state;
Make the power off delay a module param named zpodd_poweroff_delay,
defaults to 30 seconds.

v10:
Introduce PM_QOS_NO_POLL flag to skip calling disk's events_check
callback;
Do not use zero power ready hint information from event poll;
Check attached device in port's runtime idle callback to decide if
suspend is desired;
Address various comments from Tejun Heo.

v9:
Build ZPODD as part of libata instead of another standalone module
as it is tightly related to other libata files.
Identify and init ZPODD during probe time instead of after SCSI
device is created as suggested by Tejun Heo.
Make use of pm qos flag to give ACPI hint when choosing ACPI state.
Expose qos flag to give user control of whether power off is allowed.

This patchset used Rafael's pm-qos work:
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git pm-qos

v8:
This version is a redesign, it doesn't have much to do with previous
versions. The ZPODD implementation is done almost entirely in ATA layer
now, except 2 helper functions from SCSI sr driver to block disk events.

The basic idea is that, when ata port is runtime suspended, it will
check if the ODD is ready to be powered off. And if yes, events is
blocked and power omitted; if not, ODD's power supply remains unchanged
by keeping ACPI state at D0.

Some background knowledge about ZPODD is added below v1 history log.

v7:
Re work of runtime pm of sr driver, based on ideas of Alan Stern and
Oliver Neukum.

Jeff, due to the ready_to_power_off flag added, there is a small
change in [PATCH v7 6/6] libata: acpi: respect may_power_off flag,
please check if I can still get your ack, thanks.

v6:
When user changes may_power_off flag through sysfs entry and if device
is already runtime suspended, resume resume it so that it can respect
this flag next time it is runtime suspended as suggested by Alan Stern.
Call scsi_autopm_get/put_device once in sr_check_events as suggested by
Alan Stern.

v5:
Add may_power_off flag to scsi device.
Alan Stern suggested that I should not mess runtime suspend with
runtime power off, but the current zpodd implementation made it not
easy to seperate. So I re-wrote the zpodd implementation, the end
result is, normal ODD can also enter runtime suspended state, but
their power won't be removed.

v4:
Rebase on top of Linus' tree, due to this, the problem of a missing
flag in v3 is gone;
Add a new function scsi_autopm_put_device_autosuspend to first mark
last busy for the device and then put autosuspend it as suggested by
Oliver Neukum.
Typo fix as pointed by Sergei Shtylyov.
Check can_power_off flag before any runtime pm operations in sr.

v3:
Rebase on top of scsi-misc tree;
Add the sr related patches previously in Jeff's libata tree;
Re-organize the sr patches.
A problem for now: for patch
scsi: sr: support zero power ODD(ZPODD)
I can't set a flag in libata-acpi.c since a related function is
missing in scsi-misc tree. Will fix this when 3.6-rc1 released.

v2:
Bug fix for v1;
Use scsi_autopm_* in sr driver instead of pm_runtime_*;

v1:
Here are some patches to make ZPODD easier to use for end users and
a fix for using ZPODD with system suspend.

Some background knowledge about ZPODD:
ODD means Optical Disc Drive.
ZPODD means Zero Power ODD, it is a mechanism to place the ODD into
zero power state when the system is running at S0 system state without
user's awareness.
It achieved this by ACPI and SATA device attention pin. For power off,
normal ACPI control method is used to place the device into D3 cold
ACPI device state, aka. device power supply omitted. For power on, when
user press the eject button of a drawer type ODD or when user inserts
an ODD into a slot type ODD, the device attention pin will trigger. In
the current x86 implementation, this pin will connect to a GPE, and the
GPE will trigger an ACPI interrupt. With our pre-registered ACPI
notification code, the device can be runtime resumed, and we place the
device back to full power state by setting its ACPI state to D0. The
whole process is transparent to the end user.

Aaron Lu (9):
  scsi: sr: support runtime pm
  libata: Add CONFIG_SATA_ZPODD
  libata: identify and init ZPODD devices
  libata: move acpi notification code to zpodd
  libata: check zero power ready status for ZPODD
  libata: handle power transition of ODD
  libata: expose pm qos flags for ata device
  libata: no poll when ODD is powered off
  libata: do not suspend port if normal ODD is attached

 drivers/ata/Kconfig        |  13 +++
 drivers/ata/Makefile       |   1 +
 drivers/ata/libata-acpi.c  | 111 +++++-------------
 drivers/ata/libata-core.c  |  23 +++-
 drivers/ata/libata-eh.c    |  12 +-
 drivers/ata/libata-zpodd.c | 281 +++++++++++++++++++++++++++++++++++++++++++++
 drivers/ata/libata.h       |  27 +++++
 drivers/scsi/sr.c          |  36 +++++-
 include/linux/libata.h     |   3 +
 include/scsi/scsi_device.h |   1 +
 include/uapi/linux/cdrom.h |  34 ++++++
 11 files changed, 452 insertions(+), 90 deletions(-)
 create mode 100644 drivers/ata/libata-zpodd.c

-- 
1.7.11.7


^ permalink raw reply

* Re: [PATCH/RFC 0/1] Delete legacy power trace API
From: Paul Gortmaker @ 2013-01-06  0:17 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-pm, Arjan van de Ven, Frederic Weisbecker, Steven Rostedt,
	linux-kernel
In-Reply-To: <5190666.Mu6UtpjCCL@vostro.rjw.lan>

[Re: [PATCH/RFC 0/1] Delete legacy power trace API] On 05/01/2013 (Sat 23:10) Rafael J. Wysocki wrote:

> On Friday, January 04, 2013 08:49:03 PM Paul Gortmaker wrote:
> > The actual deletion is mind-numbingly simple; and if you go by the
> > comments in the code, it is well overdue.  However, in discussions
> > with Frederic, he suggested to me that those comments might have
> > been overly optimistic, and that there may still be people out
> > there who are still unknowingly using this dead API.
> > 
> > So, that is the crux of the RFC component -- to check whether the
> > comments saying "delete by v3.1" can be taken at face value, or
> > whether they were overly optimistic, and hence this stuff is still
> > actively used even though it is overdue for deletion.
> 
> Do you want me or the tracing maintainers to handle this?

I have no particular preference as to what path it takes in
getting merged to mainline - just so long as we don't hear anyone
requesting it to _not_ be removed in the next few days.

Paul.
--


> 
> Rafael
> 
> 
> > ---
> > 
> > Paul Gortmaker (1):
> >   tracing: remove deprecated power trace API
> > 
> >  Documentation/trace/events-power.txt | 27 +----------
> >  arch/arm/mach-omap2/pm34xx.c         |  2 -
> >  arch/x86/kernel/process.c            |  6 ---
> >  drivers/cpufreq/cpufreq.c            |  1 -
> >  drivers/cpuidle/cpuidle.c            |  2 -
> >  include/trace/events/power.h         | 92 ------------------------------------
> >  kernel/trace/Kconfig                 | 15 ------
> >  kernel/trace/power-traces.c          |  3 --
> >  8 files changed, 1 insertion(+), 147 deletions(-)
> > 
> > 
> -- 
> I speak only for myself.
> Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply

* Re: [PATCH/RFC 0/1] Delete legacy power trace API
From: Rafael J. Wysocki @ 2013-01-05 22:10 UTC (permalink / raw)
  To: Paul Gortmaker
  Cc: linux-pm, Arjan van de Ven, Frederic Weisbecker, Steven Rostedt,
	linux-kernel
In-Reply-To: <1357350544-32704-1-git-send-email-paul.gortmaker@windriver.com>

On Friday, January 04, 2013 08:49:03 PM Paul Gortmaker wrote:
> The actual deletion is mind-numbingly simple; and if you go by the
> comments in the code, it is well overdue.  However, in discussions
> with Frederic, he suggested to me that those comments might have
> been overly optimistic, and that there may still be people out
> there who are still unknowingly using this dead API.
> 
> So, that is the crux of the RFC component -- to check whether the
> comments saying "delete by v3.1" can be taken at face value, or
> whether they were overly optimistic, and hence this stuff is still
> actively used even though it is overdue for deletion.

Do you want me or the tracing maintainers to handle this?

Rafael


> ---
> 
> Paul Gortmaker (1):
>   tracing: remove deprecated power trace API
> 
>  Documentation/trace/events-power.txt | 27 +----------
>  arch/arm/mach-omap2/pm34xx.c         |  2 -
>  arch/x86/kernel/process.c            |  6 ---
>  drivers/cpufreq/cpufreq.c            |  1 -
>  drivers/cpuidle/cpuidle.c            |  2 -
>  include/trace/events/power.h         | 92 ------------------------------------
>  kernel/trace/Kconfig                 | 15 ------
>  kernel/trace/power-traces.c          |  3 --
>  8 files changed, 1 insertion(+), 147 deletions(-)
> 
> 
-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply

* Re: 3.8-rc[12] cpufreq build errors...
From: Rafael J. Wysocki @ 2013-01-05 21:53 UTC (permalink / raw)
  To: Woody Suwalski; +Cc: LKML, Linux PM list
In-Reply-To: <50E83B2D.9000609@gmail.com>

On Saturday, January 05, 2013 09:39:41 AM Woody Suwalski wrote:
> Rafael, in 3.8 kernel part of the common logic has been moved to a 
> separate cpufreq_governor.c file.
> According to the Makefile, the cpufreq_governor.o is to be linked to 
> other cpufreq  modules.
> 
> However I see that a separate malformed cpufreq_governor.ko is created, 
> and then the real modules can not work without the common logic either.
> 
> The build config is a simple 32-bit config, on top of vanilla source.
> 
> I submit a verbose build log with "ls *.ko" and some "modinfo" outputs 
> attached to the bottom.
> Clearly the build subsystem is misinterpreting the build intentions.
> 
> Could you please check if you see the same issue in your builds?
> I am using a 32-bit October-ish Debian Testing as a build machine (Eeepc 
> netbook).

This issue has already been reported and there's a fix scheduled for
inclusion in the master branch of the linux-pm.git tree.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply

* [PATCH v7 2/6] PM / Runtime: introduce pm_runtime_set_memalloc_noio()
From: Ming Lei @ 2013-01-05  2:25 UTC (permalink / raw)
  To: Andrew Morton, linux-kernel
  Cc: Greg Kroah-Hartman, linux-usb, linux-pm, linux-mm, Alan Stern,
	Oliver Neukum, Minchan Kim, Rafael J. Wysocki, Jens Axboe,
	David S. Miller, Ming Lei
In-Reply-To: <1357352744-8138-1-git-send-email-ming.lei@canonical.com>

The patch introduces the flag of memalloc_noio in 'struct dev_pm_info'
to help PM core to teach mm not allocating memory with GFP_KERNEL
flag for avoiding probable deadlock.

As explained in the comment, any GFP_KERNEL allocation inside
runtime_resume() or runtime_suspend() on any one of device in
the path from one block or network device to the root device
in the device tree may cause deadlock, the introduced
pm_runtime_set_memalloc_noio() sets or clears the flag on
device in the path recursively.

Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
--
v7:
	- optimize on pm_runtime_set_memalloc_noio(true)

v5:
	- fix code style error
	- add comment on clear the device memalloc_noio flag
v4:
	- rename memalloc_noio_resume as memalloc_noio
	- remove pm_runtime_get_memalloc_noio()
	- add comments on pm_runtime_set_memalloc_noio
v3:
	- introduce pm_runtime_get_memalloc_noio()
	- hold one global lock on pm_runtime_set_memalloc_noio
	- hold device power lock when accessing memalloc_noio_resume
	  flag suggested by Alan Stern
	- implement pm_runtime_set_memalloc_noio without recursion
	  suggested by Alan Stern
v2:
	- introduce pm_runtime_set_memalloc_noio()
---
 drivers/base/power/runtime.c |   70 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/pm.h           |    1 +
 include/linux/pm_runtime.h   |    3 ++
 3 files changed, 74 insertions(+)

diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
index 3148b10..cd92e1c 100644
--- a/drivers/base/power/runtime.c
+++ b/drivers/base/power/runtime.c
@@ -124,6 +124,76 @@ unsigned long pm_runtime_autosuspend_expiration(struct device *dev)
 }
 EXPORT_SYMBOL_GPL(pm_runtime_autosuspend_expiration);
 
+static int dev_memalloc_noio(struct device *dev, void *data)
+{
+	return dev->power.memalloc_noio;
+}
+
+/*
+ * pm_runtime_set_memalloc_noio - Set a device's memalloc_noio flag.
+ * @dev: Device to handle.
+ * @enable: True for setting the flag and False for clearing the flag.
+ *
+ * Set the flag for all devices in the path from the device to the
+ * root device in the device tree if @enable is true, otherwise clear
+ * the flag for devices in the path whose siblings don't set the flag.
+ *
+ * The function should only be called by block device, or network
+ * device driver for solving the deadlock problem during runtime
+ * resume/suspend:
+ *
+ *     If memory allocation with GFP_KERNEL is called inside runtime
+ *     resume/suspend callback of any one of its ancestors(or the
+ *     block device itself), the deadlock may be triggered inside the
+ *     memory allocation since it might not complete until the block
+ *     device becomes active and the involed page I/O finishes. The
+ *     situation is pointed out first by Alan Stern. Network device
+ *     are involved in iSCSI kind of situation.
+ *
+ * The lock of dev_hotplug_mutex is held in the function for handling
+ * hotplug race because pm_runtime_set_memalloc_noio() may be called
+ * in async probe().
+ *
+ * The function should be called between device_add() and device_del()
+ * on the affected device(block/network device).
+ */
+void pm_runtime_set_memalloc_noio(struct device *dev, bool enable)
+{
+	static DEFINE_MUTEX(dev_hotplug_mutex);
+
+	mutex_lock(&dev_hotplug_mutex);
+	for (;;) {
+		bool enabled;
+
+		/* hold power lock since bitfield is not SMP-safe. */
+		spin_lock_irq(&dev->power.lock);
+		enabled = dev->power.memalloc_noio;
+		dev->power.memalloc_noio = enable;
+		spin_unlock_irq(&dev->power.lock);
+
+		/*
+		 * not need to enable ancestors any more if the device
+		 * has been enabled.
+		 */
+		if (enabled && enable)
+			break;
+
+		dev = dev->parent;
+
+		/*
+		 * clear flag of the parent device only if all the
+		 * children don't set the flag because ancestor's
+		 * flag was set by any one of the descendants.
+		 */
+		if (!dev || (!enable &&
+			     device_for_each_child(dev, NULL,
+						   dev_memalloc_noio)))
+			break;
+	}
+	mutex_unlock(&dev_hotplug_mutex);
+}
+EXPORT_SYMBOL_GPL(pm_runtime_set_memalloc_noio);
+
 /**
  * rpm_check_suspend_allowed - Test whether a device may be suspended.
  * @dev: Device to test.
diff --git a/include/linux/pm.h b/include/linux/pm.h
index 03d7bb1..1a8a69d 100644
--- a/include/linux/pm.h
+++ b/include/linux/pm.h
@@ -538,6 +538,7 @@ struct dev_pm_info {
 	unsigned int		irq_safe:1;
 	unsigned int		use_autosuspend:1;
 	unsigned int		timer_autosuspends:1;
+	unsigned int		memalloc_noio:1;
 	enum rpm_request	request;
 	enum rpm_status		runtime_status;
 	int			runtime_error;
diff --git a/include/linux/pm_runtime.h b/include/linux/pm_runtime.h
index f271860..775e063 100644
--- a/include/linux/pm_runtime.h
+++ b/include/linux/pm_runtime.h
@@ -47,6 +47,7 @@ extern void pm_runtime_set_autosuspend_delay(struct device *dev, int delay);
 extern unsigned long pm_runtime_autosuspend_expiration(struct device *dev);
 extern void pm_runtime_update_max_time_suspended(struct device *dev,
 						 s64 delta_ns);
+extern void pm_runtime_set_memalloc_noio(struct device *dev, bool enable);
 
 static inline bool pm_children_suspended(struct device *dev)
 {
@@ -149,6 +150,8 @@ static inline void pm_runtime_set_autosuspend_delay(struct device *dev,
 						int delay) {}
 static inline unsigned long pm_runtime_autosuspend_expiration(
 				struct device *dev) { return 0; }
+static inline void pm_runtime_set_memalloc_noio(struct device *dev,
+						bool enable){}
 
 #endif /* !CONFIG_PM_RUNTIME */
 
-- 
1.7.9.5


^ permalink raw reply related

* [PATCH v7 1/6] mm: teach mm by current context info to not do I/O during memory allocation
From: Ming Lei @ 2013-01-05  2:25 UTC (permalink / raw)
  To: Andrew Morton, linux-kernel
  Cc: Greg Kroah-Hartman, linux-usb, linux-pm, linux-mm, Alan Stern,
	Oliver Neukum, Minchan Kim, Rafael J. Wysocki, Jens Axboe,
	David S. Miller, Ming Lei, Jiri Kosina, Mel Gorman,
	KAMEZAWA Hiroyuki, Michal Hocko, Ingo Molnar, Peter Zijlstra
In-Reply-To: <1357352744-8138-1-git-send-email-ming.lei@canonical.com>

This patch introduces PF_MEMALLOC_NOIO on process flag('flags' field of
'struct task_struct'), so that the flag can be set by one task
to avoid doing I/O inside memory allocation in the task's context.

The patch trys to solve one deadlock problem caused by block device,
and the problem may happen at least in the below situations:

- during block device runtime resume, if memory allocation with
GFP_KERNEL is called inside runtime resume callback of any one
of its ancestors(or the block device itself), the deadlock may be
triggered inside the memory allocation since it might not complete
until the block device becomes active and the involed page I/O finishes.
The situation is pointed out first by Alan Stern. It is not a good
approach to convert all GFP_KERNEL[1] in the path into GFP_NOIO because
several subsystems may be involved(for example, PCI, USB and SCSI may
be involved for usb mass stoarage device, network devices involved too
in the iSCSI case)

- during block device runtime suspend, because runtime resume need
to wait for completion of concurrent runtime suspend.

- during error handling of usb mass storage deivce, USB bus reset
will be put on the device, so there shouldn't have any
memory allocation with GFP_KERNEL during USB bus reset, otherwise
the deadlock similar with above may be triggered. Unfortunately, any
usb device may include one mass storage interface in theory, so it
requires all usb interface drivers to handle the situation. In fact,
most usb drivers don't know how to handle bus reset on the device
and don't provide .pre_set() and .post_reset() callback at all, so
USB core has to unbind and bind driver for these devices. So it
is still not practical to resort to GFP_NOIO for solving the problem.

Also the introduced solution can be used by block subsystem or block
drivers too, for example, set the PF_MEMALLOC_NOIO flag before doing
actual I/O transfer.

It is not a good idea to convert all these GFP_KERNEL in the
affected path into GFP_NOIO because these functions doing that may be
implemented as library and will be called in many other contexts.

In fact, memalloc_noio_flags() can convert some of current static GFP_NOIO
allocation into GFP_KERNEL back in other non-affected contexts, at least
almost all GFP_NOIO in USB subsystem can be converted into GFP_KERNEL
after applying the approach and make allocation with GFP_NOIO
only happen in runtime resume/bus reset/block I/O transfer contexts
generally.

[1], several GFP_KERNEL allocation examples in runtime resume path

- pci subsystem
acpi_os_allocate
	<-acpi_ut_allocate
		<-ACPI_ALLOCATE_ZEROED
			<-acpi_evaluate_object
				<-__acpi_bus_set_power
					<-acpi_bus_set_power
						<-acpi_pci_set_power_state
							<-platform_pci_set_power_state
								<-pci_platform_power_transition
									<-__pci_complete_power_transition
										<-pci_set_power_state
											<-pci_restore_standard_config
												<-pci_pm_runtime_resume
- usb subsystem
usb_get_status
	<-finish_port_resume
		<-usb_port_resume
			<-generic_resume
				<-usb_resume_device
					<-usb_resume_both
						<-usb_runtime_resume

- some individual usb drivers
usblp, uvc, gspca, most of dvb-usb-v2 media drivers, cpia2, az6007, ....

That is just what I have found.  Unfortunately, this allocation can
only be found by human being now, and there should be many not found
since any function in the resume path(call tree) may allocate memory
with GFP_KERNEL.

Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Oliver Neukum <oneukum@suse.de>
Cc: Jiri Kosina <jiri.kosina@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
--
v7:
	- fix type of 'flags' in memalloc_noio_save()/memalloc_noio_restore()
	- rebase on v3.8-rc2-next-20130104

v6:
	- replace GFP_IO with __GFP_IO to fix compile failure

v5:
	- use inline instead of macro to define memalloc_noio_*
	- replace memalloc_noio() with memalloc_noio_flags() to
	make code neater
	- don't clear GFP_FS because no GFP_IO means
	that allocation won't enter device driver as pointed by
	Andrew Morton

v4:
	- fix comment
v3:
	- no change
v2:
        - remove changes on 'may_writepage' and 'may_swap' because that
          isn't related with the patchset, and can't introduce I/O in
          allocation path if GFP_IOFS is unset, so handing 'may_swap'
          and may_writepage on GFP_NOIO or GFP_NOFS  should be a
          mm internal thing, and let mm guys deal with that, :-).

          Looks clearing the two may_XXX flag only excludes dirty pages
	  	  and anon pages for relaiming, and the behaviour should be decided
          by GFP FLAG, IMO.

        - unset GFP_IOFS in try_to_free_pages() path since
          alloc_page_buffers()
          and dma_alloc_from_contiguous may drop into the path, as
          pointed by KAMEZAWA Hiroyuki
v1:
        - take Minchan's change to avoid the check in alloc_page hot
          path

        - change the helpers' style into save/restore as suggested by
          Alan Stern
---
 include/linux/sched.h |   22 ++++++++++++++++++++++
 mm/page_alloc.c       |    9 ++++++++-
 mm/vmscan.c           |    4 ++--
 3 files changed, 32 insertions(+), 3 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 0df4a9d..b35ae0e 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -51,6 +51,7 @@ struct sched_param {
 #include <linux/cred.h>
 #include <linux/llist.h>
 #include <linux/uidgid.h>
+#include <linux/gfp.h>
 
 #include <asm/processor.h>
 
@@ -1814,6 +1815,7 @@ extern void thread_group_cputime_adjusted(struct task_struct *p, cputime_t *ut,
 #define PF_FROZEN	0x00010000	/* frozen for system suspend */
 #define PF_FSTRANS	0x00020000	/* inside a filesystem transaction */
 #define PF_KSWAPD	0x00040000	/* I am kswapd */
+#define PF_MEMALLOC_NOIO 0x00080000	/* Allocating memory without IO involved */
 #define PF_LESS_THROTTLE 0x00100000	/* Throttle me less: I clean memory */
 #define PF_KTHREAD	0x00200000	/* I am a kernel thread */
 #define PF_RANDOMIZE	0x00400000	/* randomize virtual address space */
@@ -1851,6 +1853,26 @@ extern void thread_group_cputime_adjusted(struct task_struct *p, cputime_t *ut,
 #define tsk_used_math(p) ((p)->flags & PF_USED_MATH)
 #define used_math() tsk_used_math(current)
 
+/* __GFP_IO isn't allowed if PF_MEMALLOC_NOIO is set in current->flags */
+static inline gfp_t memalloc_noio_flags(gfp_t flags)
+{
+	if (unlikely(current->flags & PF_MEMALLOC_NOIO))
+		flags &= ~__GFP_IO;
+	return flags;
+}
+
+static inline unsigned int memalloc_noio_save(void)
+{
+	unsigned int flags = current->flags & PF_MEMALLOC_NOIO;
+	current->flags |= PF_MEMALLOC_NOIO;
+	return flags;
+}
+
+static inline void memalloc_noio_restore(unsigned int flags)
+{
+	current->flags = (current->flags & ~PF_MEMALLOC_NOIO) | flags;
+}
+
 /*
  * task->jobctl flags
  */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 500dc7a..b86f27e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2629,10 +2629,17 @@ retry_cpuset:
 	page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, nodemask, order,
 			zonelist, high_zoneidx, alloc_flags,
 			preferred_zone, migratetype);
-	if (unlikely(!page))
+	if (unlikely(!page)) {
+		/*
+		 * Runtime PM, block IO and its error handling path
+		 * can deadlock because I/O on the device might not
+		 * complete.
+		 */
+		gfp_mask = memalloc_noio_flags(gfp_mask);
 		page = __alloc_pages_slowpath(gfp_mask, order,
 				zonelist, high_zoneidx, nodemask,
 				preferred_zone, migratetype);
+	}
 
 	trace_mm_page_alloc(page, order, gfp_mask, migratetype);
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 292f50a..9f97f44 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2353,7 +2353,7 @@ unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
 {
 	unsigned long nr_reclaimed;
 	struct scan_control sc = {
-		.gfp_mask = gfp_mask,
+		.gfp_mask = (gfp_mask = memalloc_noio_flags(gfp_mask)),
 		.may_writepage = !laptop_mode,
 		.nr_to_reclaim = SWAP_CLUSTER_MAX,
 		.may_unmap = 1,
@@ -3334,7 +3334,7 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
 		.may_unmap = !!(zone_reclaim_mode & RECLAIM_SWAP),
 		.may_swap = 1,
 		.nr_to_reclaim = max(nr_pages, SWAP_CLUSTER_MAX),
-		.gfp_mask = gfp_mask,
+		.gfp_mask = (gfp_mask = memalloc_noio_flags(gfp_mask)),
 		.order = order,
 		.priority = ZONE_RECLAIM_PRIORITY,
 	};
-- 
1.7.9.5


^ permalink raw reply related

* [PATCH v7 6/6] USB: forbid memory allocation with I/O during bus reset
From: Ming Lei @ 2013-01-05  2:25 UTC (permalink / raw)
  To: Andrew Morton, linux-kernel
  Cc: Greg Kroah-Hartman, linux-usb, linux-pm, linux-mm, Alan Stern,
	Oliver Neukum, Minchan Kim, Rafael J. Wysocki, Jens Axboe,
	David S. Miller, Ming Lei
In-Reply-To: <1357352744-8138-1-git-send-email-ming.lei@canonical.com>

If one storage interface or usb network interface(iSCSI case)
exists in current configuration, memory allocation with
GFP_KERNEL during usb_device_reset() might trigger I/O transfer
on the storage interface itself and cause deadlock because
the 'us->dev_mutex' is held in .pre_reset() and the storage
interface can't do I/O transfer when the reset is triggered
by other interface, or the error handling can't be completed
if the reset is triggered by the storage itself(error handling path).

Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Oliver Neukum <oneukum@suse.de>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
--
v5:
	- use inline memalloc_noio_save()
v4:
	- mark current memalloc_noio for every usb device reset
---
 drivers/usb/core/hub.c |   13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index a815fd2..698922e 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -5040,6 +5040,7 @@ int usb_reset_device(struct usb_device *udev)
 {
 	int ret;
 	int i;
+	unsigned int noio_flag;
 	struct usb_host_config *config = udev->actconfig;
 
 	if (udev->state == USB_STATE_NOTATTACHED ||
@@ -5049,6 +5050,17 @@ int usb_reset_device(struct usb_device *udev)
 		return -EINVAL;
 	}
 
+	/*
+	 * Don't allocate memory with GFP_KERNEL in current
+	 * context to avoid possible deadlock if usb mass
+	 * storage interface or usbnet interface(iSCSI case)
+	 * is included in current configuration. The easist
+	 * approach is to do it for every device reset,
+	 * because the device 'memalloc_noio' flag may have
+	 * not been set before reseting the usb device.
+	 */
+	noio_flag = memalloc_noio_save();
+
 	/* Prevent autosuspend during the reset */
 	usb_autoresume_device(udev);
 
@@ -5093,6 +5105,7 @@ int usb_reset_device(struct usb_device *udev)
 	}
 
 	usb_autosuspend_device(udev);
+	memalloc_noio_restore(noio_flag);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(usb_reset_device);
-- 
1.7.9.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related

* [PATCH v7 5/6] PM / Runtime: force memory allocation with no I/O during Runtime PM callbcack
From: Ming Lei @ 2013-01-05  2:25 UTC (permalink / raw)
  To: Andrew Morton, linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Greg Kroah-Hartman, linux-usb-u79uwXL29TY76Z2rM5mHXA,
	linux-pm-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	Alan Stern, Oliver Neukum, Minchan Kim, Rafael J. Wysocki,
	Jens Axboe, David S. Miller, Ming Lei
In-Reply-To: <1357352744-8138-1-git-send-email-ming.lei-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>

This patch applies the introduced memalloc_noio_save() and
memalloc_noio_restore() to force memory allocation with no I/O
during runtime_resume/runtime_suspend callback on device with
the flag of 'memalloc_noio' set.

Cc: Alan Stern <stern-nwvwT67g6+6dFdvTe/nMLpVzexx5G7lz@public.gmane.org>
Cc: Oliver Neukum <oneukum-l3A5Bk7waGM@public.gmane.org>
Cc: Rafael J. Wysocki <rjw-KKrjLPT3xs0@public.gmane.org>
Signed-off-by: Ming Lei <ming.lei-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
--
v7:
	- move memalloc_noio_save/memalloc_noio_restore into
	rpm_callback to avoid code duplication, as suggested
	by Rafael
v5:
	- use inline memalloc_noio_save()
v4:
	- runtime_suspend need this too because rpm_resume may wait for
	completion of concurrent runtime_suspend, so deadlock still may
	be triggered in runtime_suspend path.
---
 drivers/base/power/runtime.c |   19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
index cd92e1c..1244930 100644
--- a/drivers/base/power/runtime.c
+++ b/drivers/base/power/runtime.c
@@ -348,7 +348,24 @@ static int rpm_callback(int (*cb)(struct device *), struct device *dev)
 	if (!cb)
 		return -ENOSYS;
 
-	retval = __rpm_callback(cb, dev);
+	if (dev->power.memalloc_noio) {
+		unsigned int noio_flag;
+
+		/*
+		 * Deadlock might be caused if memory allocation with
+		 * GFP_KERNEL happens inside runtime_suspend and
+		 * runtime_resume callbacks of one block device's
+		 * ancestor or the block device itself. Network
+		 * device might be thought as part of iSCSI block
+		 * device, so network device and its ancestor should
+		 * be marked as memalloc_noio too.
+		 */
+		noio_flag = memalloc_noio_save();
+		retval = __rpm_callback(cb, dev);
+		memalloc_noio_restore(noio_flag);
+	} else {
+		retval = __rpm_callback(cb, dev);
+	}
 
 	dev->power.runtime_error = retval;
 	return retval != -EACCES ? retval : -EIO;
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH v7 4/6] net/core: apply pm_runtime_set_memalloc_noio on network devices
From: Ming Lei @ 2013-01-05  2:25 UTC (permalink / raw)
  To: Andrew Morton, linux-kernel
  Cc: Greg Kroah-Hartman, linux-usb, linux-pm, linux-mm, Alan Stern,
	Oliver Neukum, Minchan Kim, Rafael J. Wysocki, Jens Axboe,
	David S. Miller, Ming Lei, Eric Dumazet, David Decotigny,
	Tom Herbert, Ingo Molnar
In-Reply-To: <1357352744-8138-1-git-send-email-ming.lei@canonical.com>

Deadlock might be caused by allocating memory with GFP_KERNEL in
runtime_resume and runtime_suspend callback of network devices in
iSCSI situation, so mark network devices and its ancestor as
'memalloc_noio' with the introduced pm_runtime_set_memalloc_noio().

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Decotigny <david.decotigny@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
--
v7:
	- rebase on v3.8-rc2-next-20130104

v4:
	- call pm_runtime_set_memalloc_noio(ddev, true) after
	device_add
---
 net/core/net-sysfs.c |    5 +++++
 1 file changed, 5 insertions(+)

diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 29c884a..67e00b2 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -21,6 +21,7 @@
 #include <linux/vmalloc.h>
 #include <linux/export.h>
 #include <linux/jiffies.h>
+#include <linux/pm_runtime.h>
 
 #include "net-sysfs.h"
 
@@ -1409,6 +1410,8 @@ void netdev_unregister_kobject(struct net_device * net)
 
 	remove_queue_kobjects(net);
 
+	pm_runtime_set_memalloc_noio(dev, false);
+
 	device_del(dev);
 }
 
@@ -1453,6 +1456,8 @@ int netdev_register_kobject(struct net_device *net)
 		return error;
 	}
 
+	pm_runtime_set_memalloc_noio(dev, true);
+
 	return error;
 }
 
-- 
1.7.9.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related

* [PATCH v7 3/6] block/genhd.c: apply pm_runtime_set_memalloc_noio on block devices
From: Ming Lei @ 2013-01-05  2:25 UTC (permalink / raw)
  To: Andrew Morton, linux-kernel
  Cc: Greg Kroah-Hartman, linux-usb, linux-pm, linux-mm, Alan Stern,
	Oliver Neukum, Minchan Kim, Rafael J. Wysocki, Jens Axboe,
	David S. Miller, Ming Lei
In-Reply-To: <1357352744-8138-1-git-send-email-ming.lei@canonical.com>

This patch applyes the introduced pm_runtime_set_memalloc_noio on
block device so that PM core will teach mm to not allocate memory with
GFP_IOFS when calling the runtime_resume and runtime_suspend callback
for block devices and its ancestors.

Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
--
v5:
	- fix code style and one typo
v4:
	- call pm_runtime_set_memalloc_noio(ddev, true) after device_add
---
 block/genhd.c |   10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/block/genhd.c b/block/genhd.c
index 4125beb..2eb64a3 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -18,6 +18,7 @@
 #include <linux/mutex.h>
 #include <linux/idr.h>
 #include <linux/log2.h>
+#include <linux/pm_runtime.h>
 
 #include "blk.h"
 
@@ -534,6 +535,14 @@ static void register_disk(struct gendisk *disk)
 			return;
 		}
 	}
+
+	/*
+	 * avoid probable deadlock caused by allocating memory with
+	 * GFP_KERNEL in runtime_resume callback of its all ancestor
+	 * devices
+	 */
+	pm_runtime_set_memalloc_noio(ddev, true);
+
 	disk->part0.holder_dir = kobject_create_and_add("holders", &ddev->kobj);
 	disk->slave_dir = kobject_create_and_add("slaves", &ddev->kobj);
 
@@ -663,6 +672,7 @@ void del_gendisk(struct gendisk *disk)
 	disk->driverfs_dev = NULL;
 	if (!sysfs_deprecated)
 		sysfs_remove_link(block_depr, dev_name(disk_to_dev(disk)));
+	pm_runtime_set_memalloc_noio(disk_to_dev(disk), false);
 	device_del(disk_to_dev(disk));
 }
 EXPORT_SYMBOL(del_gendisk);
-- 
1.7.9.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related

* [PATCH v7 0/6] solve deadlock caused by memory allocation with I/O
From: Ming Lei @ 2013-01-05  2:25 UTC (permalink / raw)
  To: Andrew Morton, linux-kernel
  Cc: Greg Kroah-Hartman, linux-usb, linux-pm, linux-mm, Alan Stern,
	Oliver Neukum, Minchan Kim, Rafael J. Wysocki, Jens Axboe,
	David S. Miller

Hi,

This patchset try to solve one deadlock problem which might be caused
by memory allocation with block I/O during runtime PM and block device
error handling path. Traditionly, the problem is addressed by passing
GFP_NOIO statically to mm, but that is not a effective solution, see
detailed description in patch 1's commit log.

This patch set introduces one process flag and trys to fix the deadlock
problem on block device/network device during runtime PM or usb bus reset.

The 1st one is the change on include/sched.h and mm.

The 2nd patch introduces the flag of memalloc_noio on 'dev_pm_info',
and pm_runtime_set_memalloc_noio(), so that PM Core can teach mm to not
allocate mm with GFP_IO during the runtime_resume callback only on
device with the flag set.

The following 2 patches apply the introduced pm_runtime_set_memalloc_noio()
to mark all devices as memalloc_noio_resume in the path from the block or
network device to the root device in device tree.

The last 2 patches are applied again PM and USB subsystem to demonstrate
how to use the introduced mechanism to fix the deadlock problem.

Andrew, could you queue these patches into your tree since V6 fixes all
your concerns and looks no one objects these patches?

Change logs:
V7:
	- rebase on v3.8-rc2-next-20130104
	- move memalloc_noio_save/memalloc_noio_restore into
        rpm_callback to avoid code duplication, as suggested
        by Rafael
	- optimize on pm_runtime_set_memalloc_noio(true)
	- fix type of 'flags' in memalloc_noio_save()/memalloc_noio_restore()
V6:
        - fix one compile failure(1/6), and only one line change
V5:
        - don't clear GFP_FS
        - coding style fix
        - add comments
        - see details in individual change logs
V4:
        - patches from the 2nd to the 6th changed
        - call pm_runtime_set_memalloc_noio() after device_add() as pointed
        by Alan
        - set PF_MEMALLOC_NOIO during runtime_suspend()
V3:
        - patch 2/6 and 5/6 changed, see their commit log
        - remove RFC from title since several guys have expressed that
        it is a reasonable solution
V2:
        - remove changes on 'may_writepage' and 'may_swap'(1/6)
        - unset GFP_IOFS in try_to_free_pages() path(1/6)
        - introduce pm_runtime_set_memalloc_noio()
        - only apply the meachnism on block/network device and its ancestors
        for runtime resume context
V1:
        - take Minchan's change to avoid the check in alloc_page hot path
        - change the helpers' style into save/restore as suggested by Alan
        - memory allocation with no io in usb bus reset path for all devices
        as suggested by Greg and Oliver

 block/genhd.c                |   10 +++++
 drivers/base/power/runtime.c |   89 +++++++++++++++++++++++++++++++++++++++++-
 drivers/usb/core/hub.c       |   13 ++++++
 include/linux/pm.h           |    1 +
 include/linux/pm_runtime.h   |    3 ++
 include/linux/sched.h        |   22 +++++++++++
 mm/page_alloc.c              |    9 ++++-
 mm/vmscan.c                  |    4 +-
 net/core/net-sysfs.c         |    5 +++
 9 files changed, 152 insertions(+), 4 deletions(-)


Thanks,
--
Ming Lei

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* [PATCH 1/1] tracing: remove deprecated power trace API
From: Paul Gortmaker @ 2013-01-05  1:49 UTC (permalink / raw)
  To: linux-pm
  Cc: Arjan van de Ven, Frederic Weisbecker, Steven Rostedt,
	linux-kernel, Paul Gortmaker
In-Reply-To: <1357350544-32704-1-git-send-email-paul.gortmaker@windriver.com>

The text in Documentation said it would be removed in 2.6.41;
the text in the Kconfig said removal in the 3.1 release.  Either
way you look at it, we are well past both, so push it off a cliff.

Note that the POWER_CSTATE and the POWER_PSTATE are part of the
legacy tracing API.  Remove all tracepoints which use these flags.
As can be seen from context, most already have a trace entry via
trace_cpu_idle anyways.

Also, the cpufreq/cpufreq.c PSTATE one is actually unpaired, as
compared to the CSTATE ones which all have a clear start/stop.
As part of this, the trace_power_frequency also becomes orphaned,
so it too is deleted.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 Documentation/trace/events-power.txt | 27 +----------
 arch/arm/mach-omap2/pm34xx.c         |  2 -
 arch/x86/kernel/process.c            |  6 ---
 drivers/cpufreq/cpufreq.c            |  1 -
 drivers/cpuidle/cpuidle.c            |  2 -
 include/trace/events/power.h         | 92 ------------------------------------
 kernel/trace/Kconfig                 | 15 ------
 kernel/trace/power-traces.c          |  3 --
 8 files changed, 1 insertion(+), 147 deletions(-)

diff --git a/Documentation/trace/events-power.txt b/Documentation/trace/events-power.txt
index cf794af..e1498ff 100644
--- a/Documentation/trace/events-power.txt
+++ b/Documentation/trace/events-power.txt
@@ -17,7 +17,7 @@ Cf. include/trace/events/power.h for the events definitions.
 1. Power state switch events
 ============================
 
-1.1 New trace API
+1.1 Trace API
 -----------------
 
 A 'cpu' event class gathers the CPU-related events: cpuidle and
@@ -41,31 +41,6 @@ The event which has 'state=4294967295' in the trace is very important to the use
 space tools which are using it to detect the end of the current state, and so to
 correctly draw the states diagrams and to calculate accurate statistics etc.
 
-1.2 DEPRECATED trace API
-------------------------
-
-A new Kconfig option CONFIG_EVENT_POWER_TRACING_DEPRECATED with the default value of
-'y' has been created. This allows the legacy trace power API to be used conjointly
-with the new trace API.
-The Kconfig option, the old trace API (in include/trace/events/power.h) and the
-old trace points will disappear in a future release (namely 2.6.41).
-
-power_start		"type=%lu state=%lu cpu_id=%lu"
-power_frequency		"type=%lu state=%lu cpu_id=%lu"
-power_end		"cpu_id=%lu"
-
-The 'type' parameter takes one of those macros:
- . POWER_NONE	= 0,
- . POWER_CSTATE	= 1,	/* C-State */
- . POWER_PSTATE	= 2,	/* Frequency change or DVFS */
-
-The 'state' parameter is set depending on the type:
- . Target C-state for type=POWER_CSTATE,
- . Target frequency for type=POWER_PSTATE,
-
-power_end is used to indicate the exit of a state, corresponding to the latest
-power_start event.
-
 2. Clocks events
 ================
 The clock events are used for clock enable/disable and for
diff --git a/arch/arm/mach-omap2/pm34xx.c b/arch/arm/mach-omap2/pm34xx.c
index 7be3622..2d93d8b 100644
--- a/arch/arm/mach-omap2/pm34xx.c
+++ b/arch/arm/mach-omap2/pm34xx.c
@@ -351,12 +351,10 @@ static void omap3_pm_idle(void)
 	if (omap_irq_pending())
 		goto out;
 
-	trace_power_start(POWER_CSTATE, 1, smp_processor_id());
 	trace_cpu_idle(1, smp_processor_id());
 
 	omap_sram_idle();
 
-	trace_power_end(smp_processor_id());
 	trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
 
 out:
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 2ed787f..dcfc1f4 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -375,7 +375,6 @@ void cpu_idle(void)
  */
 void default_idle(void)
 {
-	trace_power_start_rcuidle(POWER_CSTATE, 1, smp_processor_id());
 	trace_cpu_idle_rcuidle(1, smp_processor_id());
 	current_thread_info()->status &= ~TS_POLLING;
 	/*
@@ -389,7 +388,6 @@ void default_idle(void)
 	else
 		local_irq_enable();
 	current_thread_info()->status |= TS_POLLING;
-	trace_power_end_rcuidle(smp_processor_id());
 	trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
 }
 #ifdef CONFIG_APM_MODULE
@@ -423,7 +421,6 @@ void stop_this_cpu(void *dummy)
 static void mwait_idle(void)
 {
 	if (!need_resched()) {
-		trace_power_start_rcuidle(POWER_CSTATE, 1, smp_processor_id());
 		trace_cpu_idle_rcuidle(1, smp_processor_id());
 		if (this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR))
 			clflush((void *)&current_thread_info()->flags);
@@ -434,7 +431,6 @@ static void mwait_idle(void)
 			__sti_mwait(0, 0);
 		else
 			local_irq_enable();
-		trace_power_end_rcuidle(smp_processor_id());
 		trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
 	} else
 		local_irq_enable();
@@ -447,12 +443,10 @@ static void mwait_idle(void)
  */
 static void poll_idle(void)
 {
-	trace_power_start_rcuidle(POWER_CSTATE, 0, smp_processor_id());
 	trace_cpu_idle_rcuidle(0, smp_processor_id());
 	local_irq_enable();
 	while (!need_resched())
 		cpu_relax();
-	trace_power_end_rcuidle(smp_processor_id());
 	trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
 }
 
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 1f93dbd..99faadf 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -294,7 +294,6 @@ void cpufreq_notify_transition(struct cpufreq_freqs *freqs, unsigned int state)
 		adjust_jiffies(CPUFREQ_POSTCHANGE, freqs);
 		pr_debug("FREQ: %lu - CPU: %lu", (unsigned long)freqs->new,
 			(unsigned long)freqs->cpu);
-		trace_power_frequency(POWER_PSTATE, freqs->new, freqs->cpu);
 		trace_cpu_frequency(freqs->new, freqs->cpu);
 		srcu_notifier_call_chain(&cpufreq_transition_notifier_list,
 				CPUFREQ_POSTCHANGE, freqs);
diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index 8df53dd..d4713fe 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -153,7 +153,6 @@ int cpuidle_idle_call(void)
 		return 0;
 	}
 
-	trace_power_start_rcuidle(POWER_CSTATE, next_state, dev->cpu);
 	trace_cpu_idle_rcuidle(next_state, dev->cpu);
 
 	if (cpuidle_state_is_coupled(dev, drv, next_state))
@@ -162,7 +161,6 @@ int cpuidle_idle_call(void)
 	else
 		entered_state = cpuidle_enter_state(dev, drv, next_state);
 
-	trace_power_end_rcuidle(dev->cpu);
 	trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, dev->cpu);
 
 	/* give the governor an opportunity to reflect on the outcome */
diff --git a/include/trace/events/power.h b/include/trace/events/power.h
index 0c97838..427acab 100644
--- a/include/trace/events/power.h
+++ b/include/trace/events/power.h
@@ -99,98 +99,6 @@ DEFINE_EVENT(wakeup_source, wakeup_source_deactivate,
 	TP_ARGS(name, state)
 );
 
-#ifdef CONFIG_EVENT_POWER_TRACING_DEPRECATED
-
-/*
- * The power events are used for cpuidle & suspend (power_start, power_end)
- *  and for cpufreq (power_frequency)
- */
-DECLARE_EVENT_CLASS(power,
-
-	TP_PROTO(unsigned int type, unsigned int state, unsigned int cpu_id),
-
-	TP_ARGS(type, state, cpu_id),
-
-	TP_STRUCT__entry(
-		__field(	u64,		type		)
-		__field(	u64,		state		)
-		__field(	u64,		cpu_id		)
-	),
-
-	TP_fast_assign(
-		__entry->type = type;
-		__entry->state = state;
-		__entry->cpu_id = cpu_id;
-	),
-
-	TP_printk("type=%lu state=%lu cpu_id=%lu", (unsigned long)__entry->type,
-		(unsigned long)__entry->state, (unsigned long)__entry->cpu_id)
-);
-
-DEFINE_EVENT(power, power_start,
-
-	TP_PROTO(unsigned int type, unsigned int state, unsigned int cpu_id),
-
-	TP_ARGS(type, state, cpu_id)
-);
-
-DEFINE_EVENT(power, power_frequency,
-
-	TP_PROTO(unsigned int type, unsigned int state, unsigned int cpu_id),
-
-	TP_ARGS(type, state, cpu_id)
-);
-
-TRACE_EVENT(power_end,
-
-	TP_PROTO(unsigned int cpu_id),
-
-	TP_ARGS(cpu_id),
-
-	TP_STRUCT__entry(
-		__field(	u64,		cpu_id		)
-	),
-
-	TP_fast_assign(
-		__entry->cpu_id = cpu_id;
-	),
-
-	TP_printk("cpu_id=%lu", (unsigned long)__entry->cpu_id)
-
-);
-
-/* Deprecated dummy functions must be protected against multi-declartion */
-#ifndef _PWR_EVENT_AVOID_DOUBLE_DEFINING_DEPRECATED
-#define _PWR_EVENT_AVOID_DOUBLE_DEFINING_DEPRECATED
-
-enum {
-	POWER_NONE = 0,
-	POWER_CSTATE = 1,
-	POWER_PSTATE = 2,
-};
-#endif /* _PWR_EVENT_AVOID_DOUBLE_DEFINING_DEPRECATED */
-
-#else /* CONFIG_EVENT_POWER_TRACING_DEPRECATED */
-
-#ifndef _PWR_EVENT_AVOID_DOUBLE_DEFINING_DEPRECATED
-#define _PWR_EVENT_AVOID_DOUBLE_DEFINING_DEPRECATED
-enum {
-       POWER_NONE = 0,
-       POWER_CSTATE = 1,
-       POWER_PSTATE = 2,
-};
-
-/* These dummy declaration have to be ripped out when the deprecated
-   events get removed */
-static inline void trace_power_start(u64 type, u64 state, u64 cpuid) {};
-static inline void trace_power_end(u64 cpuid) {};
-static inline void trace_power_start_rcuidle(u64 type, u64 state, u64 cpuid) {};
-static inline void trace_power_end_rcuidle(u64 cpuid) {};
-static inline void trace_power_frequency(u64 type, u64 state, u64 cpuid) {};
-#endif /* _PWR_EVENT_AVOID_DOUBLE_DEFINING_DEPRECATED */
-
-#endif /* CONFIG_EVENT_POWER_TRACING_DEPRECATED */
-
 /*
  * The clock events are used for clock enable/disable and for
  *  clock rate change
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 5d89335..ad0a067 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -78,21 +78,6 @@ config EVENT_TRACING
 	select CONTEXT_SWITCH_TRACER
 	bool
 
-config EVENT_POWER_TRACING_DEPRECATED
-	depends on EVENT_TRACING
-	bool "Deprecated power event trace API, to be removed"
-	default y
-	help
-	  Provides old power event types:
-	  C-state/idle accounting events:
-	  power:power_start
-	  power:power_end
-	  and old cpufreq accounting event:
-	  power:power_frequency
-	  This is for userspace compatibility
-	  and will vanish after 5 kernel iterations,
-	  namely 3.1.
-
 config CONTEXT_SWITCH_TRACER
 	bool
 
diff --git a/kernel/trace/power-traces.c b/kernel/trace/power-traces.c
index f55fcf6..1c71382 100644
--- a/kernel/trace/power-traces.c
+++ b/kernel/trace/power-traces.c
@@ -13,8 +13,5 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/power.h>
 
-#ifdef EVENT_POWER_TRACING_DEPRECATED
-EXPORT_TRACEPOINT_SYMBOL_GPL(power_start);
-#endif
 EXPORT_TRACEPOINT_SYMBOL_GPL(cpu_idle);
 
-- 
1.8.1


^ permalink raw reply related

* [PATCH/RFC 0/1] Delete legacy power trace API
From: Paul Gortmaker @ 2013-01-05  1:49 UTC (permalink / raw)
  To: linux-pm
  Cc: Arjan van de Ven, Frederic Weisbecker, Steven Rostedt,
	linux-kernel, Paul Gortmaker

The actual deletion is mind-numbingly simple; and if you go by the
comments in the code, it is well overdue.  However, in discussions
with Frederic, he suggested to me that those comments might have
been overly optimistic, and that there may still be people out
there who are still unknowingly using this dead API.

So, that is the crux of the RFC component -- to check whether the
comments saying "delete by v3.1" can be taken at face value, or
whether they were overly optimistic, and hence this stuff is still
actively used even though it is overdue for deletion.

Thanks,
Paul.
---

Paul Gortmaker (1):
  tracing: remove deprecated power trace API

 Documentation/trace/events-power.txt | 27 +----------
 arch/arm/mach-omap2/pm34xx.c         |  2 -
 arch/x86/kernel/process.c            |  6 ---
 drivers/cpufreq/cpufreq.c            |  1 -
 drivers/cpuidle/cpuidle.c            |  2 -
 include/trace/events/power.h         | 92 ------------------------------------
 kernel/trace/Kconfig                 | 15 ------
 kernel/trace/power-traces.c          |  3 --
 8 files changed, 1 insertion(+), 147 deletions(-)

-- 
1.8.1


^ permalink raw reply

* Re: [PATCH] cpuidle - fix lock contention in the idle path
From: Rafael J. Wysocki @ 2013-01-05  0:03 UTC (permalink / raw)
  To: Daniel Lezcano, Russ Anderson
  Cc: linux-pm, pdeschrijver, akpm, linux-kernel, rja
In-Reply-To: <50E6764C.9060608@linaro.org>

On Friday, January 04, 2013 07:27:24 AM Daniel Lezcano wrote:
> On 01/02/2013 10:13 PM, Russ Anderson wrote:
> > On Wed, Dec 26, 2012 at 11:01:48AM +0100, Daniel Lezcano wrote:
> >> The commit bf4d1b5ddb78f86078ac6ae0415802d5f0c68f92 introduces
> >> a lock in the cpuidle_get_cpu_driver function. This function
> >> is used in the idle_call function.
> >>
> >> The problem is the contention with a large number of cpus because
> >> they try to access the idle routine at the same time.
> >>
> >> The lock could be safely removed because of how is used the
> >> cpuidle api. The cpuidle_register_driver is called first but
> >> until the cpuidle_register_device is not called we don't
> >> enter in the cpuidle idle call function because the device
> >> is not enabled.
> >>
> >> The cpuidle_unregister_driver function, leading the a NULL driver,
> >> is not called before the cpuidle_unregister_device.
> >>
> >> This is how is used the cpuidle api from the different drivers.
> >>
> >> However, a cleanup around the lock and a proper refcounting
> >> mechanism should be used to ensure the consistency in the api,
> >> like cpuidle_unregister_driver should failed if its refcounting
> >> is not 0.
> >>
> >> These modifications will need some code reorganization and rewrite
> >> which does not fit with a fix.
> > 
> > I agree.
> > 
> >> The following patch is a hot fix by returning to the initial behavior
> >> by removing the lock when getting the driver.
> > 
> > The patch fixes the problem.  Verified on a system with 1024 cpus.
> > Thanks.
> > 
> >> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> > 
> > Reported-by: Russ Anderson <rja@sgi.com>
> > Acked-by: Russ Anderson <rja@sgi.com>
> 
> Hi Rafael,
> 
> could you consider this patch for merging ?

Yes, I've taken it already.

I'll include it into the next pull request.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox