* Re: [PATCH 0/5] PM: Generic PM domains and device PM QoS
From: Rafael J. Wysocki @ 2011-09-01 22:14 UTC (permalink / raw)
To: Jean Pihet; +Cc: Linux PM mailing list, LKML, Linux-sh list
In-Reply-To: <CAORVsuVJEbLRXLuQFUh3_rDf+DfNFpOhEuBr33SRKqxBos=6Bw@mail.gmail.com>
Hi,
On Thursday, September 01, 2011, Jean Pihet wrote:
> Rafael,
>
> On Wed, Aug 31, 2011 at 12:17 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > Hi,
> >
> > This patchset illustrates how device PM QoS may be used along with
> > PM domains in my view.
> >
> > Actually, it consists of two parts. Namely, patches [1-3/5] seem to be
> > suitable for 3.2, unless somebody hates them,
> The patches [1-3/5] are ok (reviewed only) excepted some remarks I have.
OK, thanks for the comments.
> > but patches [4-5/5] are
> > total RFC. They haven't been tested, only compiled, so the use of them
> > is not encouraged (they may kill your dog or make your cat go wild, or
> > do something equally nasty, so beware).
> That looks like a disclaimer ;p
>
> > Their purpose is to illustrate
> > an idea that I'd like to discuss at the PM miniconference during the
> > LPC.
> There is some code for OMAP that dynamically updates the worst case
> values for devices activation and de-activation;
> cf._omap_device_activate and _omap_device_deactivate in
> arch/arm/plat-omap/omap_device.c. The idea is to start with reference
> figures (worst case measured on board) at boot and then update the
> worst case values at runtime.
> Based on the PM QoS values and the worst case latency values the next
> power domains states can be determined. Unfortunately this is not
> (yet) implemented.
I thought about that too, but I'd like to discuss the basic idea first.
> I am wondering if the patches [4-5/5] are meant to replace the OMAP
> code, which would be really nice.
I certainly hope they will be useful for multiple platforms. Whether
or not OMAP turns out to be one of them I can't tell at the moment.
Thanks,
Rafael
^ permalink raw reply
* Re: [RFC][PATCH 5/5] PM / Domains: Add default power off governor function
From: Rafael J. Wysocki @ 2011-09-01 22:11 UTC (permalink / raw)
To: Jean Pihet; +Cc: Linux PM mailing list, LKML, Linux-sh list
In-Reply-To: <CAORVsuW7rFEN8u7RCpmWJamtG2iUbA_zz3ax4hgBjGAZW9vU=A@mail.gmail.com>
Hi,
On Thursday, September 01, 2011, Jean Pihet wrote:
> Rafael,
>
> On Wed, Aug 31, 2011 at 12:22 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > From: Rafael J. Wysocki <rjw@sisk.pl>
> >
> > Add a function deciding whether or not a given PM domain should
> > be powered off on the basis of that domain's devices' PM QoS
> > constraints.
> >
> > ---
> > drivers/base/power/domain_governor.c | 96 +++++++++++++++++++++++++++++++++++
> > include/linux/pm_domain.h | 7 ++
> > 2 files changed, 103 insertions(+)
> >
> > Index: linux/include/linux/pm_domain.h
> > ===================================================================
> > --- linux.orig/include/linux/pm_domain.h
> > +++ linux/include/linux/pm_domain.h
> > @@ -49,6 +49,10 @@ struct generic_pm_domain {
> > int (*start_device)(struct device *dev);
> > int (*stop_device)(struct device *dev);
> > bool (*active_wakeup)(struct device *dev);
> > + ktime_t power_off_latency;
> > + ktime_t power_on_latency;
> > + s64 break_even_ns;
> > + s64 min_delta_ns;
> How are those values populated?
I'm not sure. There are a few possible ways to do that, but I simply
don't know which one is going to be the most useful. That's one of
the reasons why patches [4-5/5] are RFCs.
> Is there a mechanism that dynamically updates the values?
Obviously not at the moment.
Thanks,
Rafael
^ permalink raw reply
* Re: [PATCH 3/5] PM / QoS: Add function dev_pm_qos_read_value()
From: Rafael J. Wysocki @ 2011-09-01 22:07 UTC (permalink / raw)
To: Jean Pihet; +Cc: Linux PM mailing list, LKML, Linux-sh list
In-Reply-To: <CAORVsuX=vKiGxHrR4hZjeT4DV0vhUh+MXLwNjoOwgBDu1HRR5A@mail.gmail.com>
Hi,
On Thursday, September 01, 2011, Jean Pihet wrote:
> Hi Rafael,
>
> On Wed, Aug 31, 2011 at 12:21 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > From: Rafael J. Wysocki <rjw@sisk.pl>
> >
> > To read the current PM QoS value for a given device we need to
> > make sure that the device's power.constraints object won't be
> > removed while we're doing that. For this reason, put the
> > operation under dev->power.lock and acquire the lock
> > around the initialization and removal of power.constraints.
> Ok.
>
> > Moreover, since we're using the value of power.constraints to
> > determine whether or not the object is present, the
> > power.constraints_state field isn't necessary any more and may be
> > removed. However, dev_pm_qos_add_request() needs to check if the
> > device is being removed from the system before allocating a new
> > PM QoS constraints object for it, so it has to use device_pm_lock()
> > and the device PM QoS initialization and destruction should be done
> > under device_pm_lock() as well.
> Ok that makes sense.
> The constraints_state field can be replaced by a combination of
> dev->power.constraints and list_empty(&dev->power.entry), which makes
> the code more compact and less redundant.
>
> >
> > Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> > ---
> > drivers/base/power/main.c | 4 -
> > drivers/base/power/qos.c | 167 ++++++++++++++++++++++++++--------------------
> > include/linux/pm.h | 8 --
> > include/linux/pm_qos.h | 3
> > 4 files changed, 101 insertions(+), 81 deletions(-)
> >
> > Index: linux/drivers/base/power/qos.c
> > ===================================================================
> > --- linux.orig/drivers/base/power/qos.c
> > +++ linux/drivers/base/power/qos.c
> > @@ -30,15 +30,6 @@
> ...
>
> >
> > @@ -178,8 +202,8 @@ void dev_pm_qos_constraints_destroy(stru
> > *
> > * Returns 1 if the aggregated constraint value has changed,
> > * 0 if the aggregated constraint value has not changed,
> > - * -EINVAL in case of wrong parameters, -ENODEV if the device has been
> > - * removed from the system
> > + * -EINVAL in case of wrong parameters, -ENOMEM if there's not enough memory
> > + * to allocate for data structures.
> Why not use -ENODEV in case there is no device?
I don't think it's useful for the caller. If the device is gone, the
constraing simply doesn't matter, so there's no error to handle.
> > */
> > int dev_pm_qos_add_request(struct device *dev, struct dev_pm_qos_request *req,
> > s32 value)
> > @@ -195,28 +219,35 @@ int dev_pm_qos_add_request(struct device
> > return -EINVAL;
> > }
> >
> > - mutex_lock(&dev_pm_qos_mtx);
> > req->dev = dev;
> >
> > - /* Return if the device has been removed */
> > - if (req->dev->power.constraints_state == DEV_PM_QOS_NO_DEVICE) {
> > - ret = -ENODEV;
> > - goto out;
> > - }
> > + device_pm_lock();
> > + mutex_lock(&dev_pm_qos_mtx);
> >
> > - /*
> > - * Allocate the constraints data on the first call to add_request,
> > - * i.e. only if the data is not already allocated and if the device has
> > - * not been removed
> > - */
> > - if (dev->power.constraints_state == DEV_PM_QOS_DEVICE_PRESENT)
> > - ret = dev_pm_qos_constraints_allocate(dev);
> > + if (dev->power.constraints) {
> > + device_pm_unlock();
> > + } else {
> > + if (list_empty(&dev->power.entry)) {
> > + /* The device has been removed from the system. */
> > + device_pm_unlock();
> > + goto out;
> 0 is silently returned in case the device has been removed. Is that
> the intention?
Pretty much it is. Is that a problem?
Rafael
^ permalink raw reply
* Re: [PATCH] OMAP: omap_device: fix !CONFIG_SUSPEND case in _noirq handlers
From: Rafael J. Wysocki @ 2011-09-01 21:57 UTC (permalink / raw)
To: Arnd Bergmann; +Cc: linux-pm, linux-omap, linux-arm-kernel
In-Reply-To: <1876183.c78P8V3ZbL@wuerfel>
On Thursday, September 01, 2011, Arnd Bergmann wrote:
> On Thursday 01 September 2011 11:12:02 Kevin Hilman wrote:
> > The suspend/resume _noirq handlers were #ifdef'd out in the
> > !CONFIG_SUSPEND case, but were still assigned to the dev_pm_ops
> > struct. Fix by defining them to NULL in the !CONFIG_SUSPEND case.
> >
> > Reported-by: Arnd Bergmann <arnd@arndb.de>
> > Signed-off-by: Kevin Hilman <khilman@ti.com>
>
> Acked-by: Arnd Bergmann <arnd@arndb.de>
>
> Thansk for the fast response!
I'll apply the patch when kernel.org is back in order.
Thanks,
Rafael
^ permalink raw reply
* Re: [PATCH] OMAP: omap_device: fix !CONFIG_SUSPEND case in _noirq handlers
From: Arnd Bergmann @ 2011-09-01 18:22 UTC (permalink / raw)
To: linux-arm-kernel; +Cc: linux-pm, linux-omap
In-Reply-To: <1314900722-10252-1-git-send-email-khilman@ti.com>
On Thursday 01 September 2011 11:12:02 Kevin Hilman wrote:
> The suspend/resume _noirq handlers were #ifdef'd out in the
> !CONFIG_SUSPEND case, but were still assigned to the dev_pm_ops
> struct. Fix by defining them to NULL in the !CONFIG_SUSPEND case.
>
> Reported-by: Arnd Bergmann <arnd@arndb.de>
> Signed-off-by: Kevin Hilman <khilman@ti.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Thansk for the fast response!
^ permalink raw reply
* [PATCH] OMAP: omap_device: fix !CONFIG_SUSPEND case in _noirq handlers
From: Kevin Hilman @ 2011-09-01 18:12 UTC (permalink / raw)
To: Rafael J. Wysocki; +Cc: linux-pm, linux-omap, linux-arm-kernel, Arnd Bergmann
The suspend/resume _noirq handlers were #ifdef'd out in the
!CONFIG_SUSPEND case, but were still assigned to the dev_pm_ops
struct. Fix by defining them to NULL in the !CONFIG_SUSPEND case.
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Kevin Hilman <khilman@ti.com>
---
Applies to v3.1-rc4.
arch/arm/plat-omap/omap_device.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)
diff --git a/arch/arm/plat-omap/omap_device.c b/arch/arm/plat-omap/omap_device.c
index 9a6a538..02609ee 100644
--- a/arch/arm/plat-omap/omap_device.c
+++ b/arch/arm/plat-omap/omap_device.c
@@ -615,6 +615,9 @@ static int _od_resume_noirq(struct device *dev)
return pm_generic_resume_noirq(dev);
}
+#else
+#define _od_suspend_noirq NULL
+#define _od_resume_noirq NULL
#endif
static struct dev_pm_domain omap_device_pm_domain = {
--
1.7.6
^ permalink raw reply related
* Re: [PATCH 1/1] OMAP: omap_device: only override _noirq methods, not normal suspend/resume
From: Kevin Hilman @ 2011-09-01 17:50 UTC (permalink / raw)
To: Arnd Bergmann; +Cc: linux-pm, linux-omap, linux-arm-kernel
In-Reply-To: <201109011800.55516.arnd@arndb.de>
Arnd Bergmann <arnd@arndb.de> writes:
> On Thursday 25 August 2011, Kevin Hilman wrote:
>> --- a/arch/arm/plat-omap/omap_device.c
>> +++ b/arch/arm/plat-omap/omap_device.c
>> @@ -622,7 +622,8 @@ static struct dev_pm_domain omap_device_pm_domain = {
>> SET_RUNTIME_PM_OPS(_od_runtime_suspend, _od_runtime_resume,
>> _od_runtime_idle)
>> USE_PLATFORM_PM_SLEEP_OPS
>> - SET_SYSTEM_SLEEP_PM_OPS(_od_suspend_noirq, _od_resume_noirq)
>> + .suspend_noirq = _od_suspend_noirq,
>> + .resume_noirq = _od_resume_noirq,
>> }
>> };
>
> This breaks if CONFIG_SUSPEND is not set and the _od_suspend_noirq/_od_resume_noirq
> functions are not defined.
Indeed.
Will post a fix shortly.
Kevin
^ permalink raw reply
* Re: [PATCH v9 2/4] PM: Introduce devfreq: generic DVFS framework with device-specific OPPs
From: Turquette, Mike @ 2011-09-01 16:57 UTC (permalink / raw)
To: myungjoo.ham
Cc: Len Brown, Greg Kroah-Hartman, Kyungmin Park, linux-pm,
Thomas Gleixner
In-Reply-To: <CAJ0PZbQiv0eKcTu4gzdtvXGAm9MbETeHQ7AgOcXA4gYCF8H9-g@mail.gmail.com>
On Wed, Aug 31, 2011 at 9:51 PM, MyungJoo Ham <myungjoo.ham@samsung.com> wrote:
> On Thu, Sep 1, 2011 at 5:05 AM, Turquette, Mike <mturquette@ti.com> wrote:
>> On Wed, Aug 31, 2011 at 12:29 AM, MyungJoo Ham <myungjoo.ham@samsung.com> wrote:
[snip]
>>> +static void devfreq_monitor(struct work_struct *work)
>>> +{
>>> + static unsigned long last_polled_at;
>>> + struct devfreq *devfreq, *tmp;
>>> + int error;
>>> + unsigned long jiffies_passed;
>>> + unsigned long next_jiffies = ULONG_MAX, now = jiffies;
>>> +
>>> + /* Initially last_polled_at = 0, polling every device at bootup */
>>> + jiffies_passed = now - last_polled_at;
>>> + last_polled_at = now;
>>> + if (jiffies_passed == 0)
>>> + jiffies_passed = 1;
>>> +
>>> + mutex_lock(&devfreq_list_lock);
>>
>> Should not lock the list here. If we lock the list for all major
>> operations, it nullifies the performance benefit of having a mutex in
>> struct devfreq.
>>
>
> Ok... then.. how about locking like this? :
>
> mutex_lock(&devfreq_list_lock);
> list_for_each_entry_safe(devfreq, tmp, &devfreq_list, node) {
> mutex_lock(&devfreq->lock);
> mutex_unlock(&devfreq_list_lock);
>
> blahblah
>
> mutex_unlock(&devfreq->lock);
> mutex_lock(&devfreq_list_lock);
> }
> mutex_unlock(&devfreq_list_lock);
I took a step back from the code to rethink the big picture, and I've
come back to the same conclusion conclusion that I had in the V5
patchset. (https://patchwork.kernel.org/patch/1043442/)
Firstly, there is no reason to walk the list here. All of the locking
discussion here could just go away if we didn't walk the list of
devfreq devices every time the delay_work gets fired. If each device
programmed it's own delayed work then this problem simply goes away.
If you don't mind I'd like to post an RFC of devfreq with these
changes implemented so we can review them and discuss the ideas around
some concrete code.
Best regards,
Mike
^ permalink raw reply
* Re: [PATCH 1/1] OMAP: omap_device: only override _noirq methods, not normal suspend/resume
From: Arnd Bergmann @ 2011-09-01 16:00 UTC (permalink / raw)
To: linux-arm-kernel; +Cc: linux-pm, linux-omap
In-Reply-To: <1314229437-31836-2-git-send-email-khilman@ti.com>
On Thursday 25 August 2011, Kevin Hilman wrote:
> --- a/arch/arm/plat-omap/omap_device.c
> +++ b/arch/arm/plat-omap/omap_device.c
> @@ -622,7 +622,8 @@ static struct dev_pm_domain omap_device_pm_domain = {
> SET_RUNTIME_PM_OPS(_od_runtime_suspend, _od_runtime_resume,
> _od_runtime_idle)
> USE_PLATFORM_PM_SLEEP_OPS
> - SET_SYSTEM_SLEEP_PM_OPS(_od_suspend_noirq, _od_resume_noirq)
> + .suspend_noirq = _od_suspend_noirq,
> + .resume_noirq = _od_resume_noirq,
> }
> };
This breaks if CONFIG_SUSPEND is not set and the _od_suspend_noirq/_od_resume_noirq
functions are not defined.
Arnd
^ permalink raw reply
* Re: [PATCH 0/5] PM: Generic PM domains and device PM QoS
From: Jean Pihet @ 2011-09-01 15:28 UTC (permalink / raw)
To: Rafael J. Wysocki; +Cc: Linux PM mailing list, LKML, Linux-sh list
In-Reply-To: <201108310017.03103.rjw@sisk.pl>
Rafael,
On Wed, Aug 31, 2011 at 12:17 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> Hi,
>
> This patchset illustrates how device PM QoS may be used along with
> PM domains in my view.
>
> Actually, it consists of two parts. Namely, patches [1-3/5] seem to be
> suitable for 3.2, unless somebody hates them,
The patches [1-3/5] are ok (reviewed only) excepted some remarks I have.
> but patches [4-5/5] are
> total RFC. They haven't been tested, only compiled, so the use of them
> is not encouraged (they may kill your dog or make your cat go wild, or
> do something equally nasty, so beware).
That looks like a disclaimer ;p
> Their purpose is to illustrate
> an idea that I'd like to discuss at the PM miniconference during the
> LPC.
There is some code for OMAP that dynamically updates the worst case
values for devices activation and de-activation;
cf._omap_device_activate and _omap_device_deactivate in
arch/arm/plat-omap/omap_device.c. The idea is to start with reference
figures (worst case measured on board) at boot and then update the
worst case values at runtime.
Based on the PM QoS values and the worst case latency values the next
power domains states can be determined. Unfortunately this is not
(yet) implemented.
I am wondering if the patches [4-5/5] are meant to replace the OMAP
code, which would be really nice.
>
> [1/5] - Split device PM domain data into a "base" and additional fields
> (one need_restore field at the moment, but the subsequent patches
> add more fields).
>
> [2/5] - Make runtime PM always release power.lock if power.irq_safe is set.
>
> [3/5] - Add function for reading device PM QoS values safely.
>
> [4/5] - Add governor function for stopping devices.
>
> [5/5] - Add generic PM domain power off governor function.
>
> Thanks,
> Rafael
>
>
Regards,
Jean
^ permalink raw reply
* Re: [RFC][PATCH 5/5] PM / Domains: Add default power off governor function
From: Jean Pihet @ 2011-09-01 15:17 UTC (permalink / raw)
To: Rafael J. Wysocki; +Cc: Linux PM mailing list, LKML, Linux-sh list
In-Reply-To: <201108310022.43850.rjw@sisk.pl>
Rafael,
On Wed, Aug 31, 2011 at 12:22 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> From: Rafael J. Wysocki <rjw@sisk.pl>
>
> Add a function deciding whether or not a given PM domain should
> be powered off on the basis of that domain's devices' PM QoS
> constraints.
>
> ---
> drivers/base/power/domain_governor.c | 96 +++++++++++++++++++++++++++++++++++
> include/linux/pm_domain.h | 7 ++
> 2 files changed, 103 insertions(+)
>
> Index: linux/include/linux/pm_domain.h
> ===================================================================
> --- linux.orig/include/linux/pm_domain.h
> +++ linux/include/linux/pm_domain.h
> @@ -49,6 +49,10 @@ struct generic_pm_domain {
> int (*start_device)(struct device *dev);
> int (*stop_device)(struct device *dev);
> bool (*active_wakeup)(struct device *dev);
> + ktime_t power_off_latency;
> + ktime_t power_on_latency;
> + s64 break_even_ns;
> + s64 min_delta_ns;
How are those values populated? Is there a mechanism that dynamically
updates the values?
> };
>
> static inline struct generic_pm_domain *pd_to_genpd(struct dev_pm_domain *pd)
> @@ -64,6 +68,9 @@ struct gpd_link {
> };
>
> struct gpd_gov_dev_data {
> + ktime_t start_latency;
> + ktime_t suspend_latency;
> + ktime_t resume_latency;
Same question here.
> s64 break_even_ns;
> };
>
...
Regards,
Jean
^ permalink raw reply
* Re: [PATCH 3/5] PM / QoS: Add function dev_pm_qos_read_value()
From: Jean Pihet @ 2011-09-01 15:13 UTC (permalink / raw)
To: Rafael J. Wysocki; +Cc: Linux PM mailing list, LKML, Linux-sh list
In-Reply-To: <201108310021.13800.rjw@sisk.pl>
Hi Rafael,
On Wed, Aug 31, 2011 at 12:21 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> From: Rafael J. Wysocki <rjw@sisk.pl>
>
> To read the current PM QoS value for a given device we need to
> make sure that the device's power.constraints object won't be
> removed while we're doing that. For this reason, put the
> operation under dev->power.lock and acquire the lock
> around the initialization and removal of power.constraints.
Ok.
> Moreover, since we're using the value of power.constraints to
> determine whether or not the object is present, the
> power.constraints_state field isn't necessary any more and may be
> removed. However, dev_pm_qos_add_request() needs to check if the
> device is being removed from the system before allocating a new
> PM QoS constraints object for it, so it has to use device_pm_lock()
> and the device PM QoS initialization and destruction should be done
> under device_pm_lock() as well.
Ok that makes sense.
The constraints_state field can be replaced by a combination of
dev->power.constraints and list_empty(&dev->power.entry), which makes
the code more compact and less redundant.
>
> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> ---
> drivers/base/power/main.c | 4 -
> drivers/base/power/qos.c | 167 ++++++++++++++++++++++++++--------------------
> include/linux/pm.h | 8 --
> include/linux/pm_qos.h | 3
> 4 files changed, 101 insertions(+), 81 deletions(-)
>
> Index: linux/drivers/base/power/qos.c
> ===================================================================
> --- linux.orig/drivers/base/power/qos.c
> +++ linux/drivers/base/power/qos.c
> @@ -30,15 +30,6 @@
...
>
> @@ -178,8 +202,8 @@ void dev_pm_qos_constraints_destroy(stru
> *
> * Returns 1 if the aggregated constraint value has changed,
> * 0 if the aggregated constraint value has not changed,
> - * -EINVAL in case of wrong parameters, -ENODEV if the device has been
> - * removed from the system
> + * -EINVAL in case of wrong parameters, -ENOMEM if there's not enough memory
> + * to allocate for data structures.
Why not use -ENODEV in case there is no device?
> */
> int dev_pm_qos_add_request(struct device *dev, struct dev_pm_qos_request *req,
> s32 value)
> @@ -195,28 +219,35 @@ int dev_pm_qos_add_request(struct device
> return -EINVAL;
> }
>
> - mutex_lock(&dev_pm_qos_mtx);
> req->dev = dev;
>
> - /* Return if the device has been removed */
> - if (req->dev->power.constraints_state == DEV_PM_QOS_NO_DEVICE) {
> - ret = -ENODEV;
> - goto out;
> - }
> + device_pm_lock();
> + mutex_lock(&dev_pm_qos_mtx);
>
> - /*
> - * Allocate the constraints data on the first call to add_request,
> - * i.e. only if the data is not already allocated and if the device has
> - * not been removed
> - */
> - if (dev->power.constraints_state == DEV_PM_QOS_DEVICE_PRESENT)
> - ret = dev_pm_qos_constraints_allocate(dev);
> + if (dev->power.constraints) {
> + device_pm_unlock();
> + } else {
> + if (list_empty(&dev->power.entry)) {
> + /* The device has been removed from the system. */
> + device_pm_unlock();
> + goto out;
0 is silently returned in case the device has been removed. Is that
the intention?
> + } else {
> + device_pm_unlock();
> + /*
> + * Allocate the constraints data on the first call to
> + * add_request, i.e. only if the data is not already
> + * allocated and if the device has not been removed.
> + */
> + ret = dev_pm_qos_constraints_allocate(dev);
> + }
> + }
>
> if (!ret)
> ret = apply_constraint(req, PM_QOS_ADD_REQ, value);
>
> -out:
> + out:
> mutex_unlock(&dev_pm_qos_mtx);
> +
> return ret;
> }
> EXPORT_SYMBOL_GPL(dev_pm_qos_add_request);
...
Thanks,
Jean
^ permalink raw reply
* Re: [PATCH 6/6] cgroup: kill subsys->can_attach_task(), pre_attach() and attach_task()
From: Frederic Weisbecker @ 2011-09-01 12:58 UTC (permalink / raw)
To: Tejun Heo
Cc: containers, lizf, linux-kernel, Oleg Nesterov, linux-pm, paul,
kamezawa.hiroyu
In-Reply-To: <20110901112221.GA2752@htj.dyndns.org>
On Thu, Sep 01, 2011 at 08:22:21PM +0900, Tejun Heo wrote:
> Hello,
>
> On Wed, Aug 31, 2011 at 03:42:24PM +0200, Frederic Weisbecker wrote:
> > My task counter subsystem patchset brings a cancel_attach_task() callback
> > that cancels can_attach_task() effects.
> >
> > I thought that rebased on top of your patch it's going to be merged inside
> > cancel_attach() but OTOH we can't cancel the effect of failed migration
> > on a thread that way.
> >
> > May be we need to keep a cancel_attach_task() just for that purpose?
>
> We can do that but I think that becomes a bit too complex and fragile.
> That path won't be traveled unless it races against exit. Bugs will
> be difficult to detect and reproduce. In this respect, the current
> code already seems racy. ->can_attach (or other methods in the attach
> path) and ->exit can race each other and I don't think all subsystems
> handle that properly.
I guess subsystems don't care about that currently, although I haven't
checked. But the task counter will need this per thread cancellation in
migration failure, without the need for any synchronization between
can_attach, attach and exit.
Now if we want to solve this with a synchronization there, either
we task_lock() every tasks in the group in the beginning of cgroup_attach_proc().
But that's not very nice. Or we use cgroup_mutex on cgroup_exit() exit,
but that's even worse.
I guess we need the leader->sighand->siglock on both cgroup_attach_proc()
and cgroup_exit(). Besides we may have more reasons to have that:
https://lkml.org/lkml/2011/8/15/342
>
> IMHO the right thing to do here is simplifying synchronization rules
> so that nothing else happens while migration is in progress.
>
> Thanks.
>
> --
> tejun
^ permalink raw reply
* Re: [PATCH 6/6] cgroup: kill subsys->can_attach_task(), pre_attach() and attach_task()
From: Tejun Heo @ 2011-09-01 11:22 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: containers, lizf, linux-kernel, linux-pm, paul, kamezawa.hiroyu
In-Reply-To: <20110831134220.GB20598@somewhere>
Hello,
On Wed, Aug 31, 2011 at 03:42:24PM +0200, Frederic Weisbecker wrote:
> My task counter subsystem patchset brings a cancel_attach_task() callback
> that cancels can_attach_task() effects.
>
> I thought that rebased on top of your patch it's going to be merged inside
> cancel_attach() but OTOH we can't cancel the effect of failed migration
> on a thread that way.
>
> May be we need to keep a cancel_attach_task() just for that purpose?
We can do that but I think that becomes a bit too complex and fragile.
That path won't be traveled unless it races against exit. Bugs will
be difficult to detect and reproduce. In this respect, the current
code already seems racy. ->can_attach (or other methods in the attach
path) and ->exit can race each other and I don't think all subsystems
handle that properly.
IMHO the right thing to do here is simplifying synchronization rules
so that nothing else happens while migration is in progress.
Thanks.
--
tejun
^ permalink raw reply
* Re: [RFC PATCH v1] ACPI S3 to work under Xen.
From: Tian, Kevin @ 2011-09-01 6:58 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk, x86@kernel.org, tglx@linutronix.de,
tboot-devel
Cc: xen-devel@lists.xensource.com
In-Reply-To: <1314815484-4668-1-git-send-email-konrad.wilk@oracle.com>
> From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com]
> Sent: Thursday, September 01, 2011 2:31 AM
>
> Attached is an RFC set of patches to enable S3 to work with the Xen hypervisor.
>
> The relationship that Xen has with Linux kernel is symbiotic. The Linux
> kernel does the ACPI "stuff" and tells the hypervisor to do the low-level
> stuff (such as program the IOAPIC, setup vectors, etc). The realm of
> ACPI S3 is more complex as we need to save the CPU state (and Intel TXT
> values - which the hypervisor has to do) and then restore them.
>
> The major difficulties we hit was with 'acpi_suspend_lowlevel' - which tweaks
> a lot of lowlevel values and some of them are not properly handled by Xen.
> Liang Tang has figured which ones of them we trip over (read below) - and he
> suggested that perhaps we can provide a registration mechanism to abstract
> this away.
that's not a problem for Xen. On bare metal the processor state handled off
from BIOS after resume (e.g. real mode in many cases) doesn't match the
execution environment expected by Linux. So Linux has to save original context
and then restore it after resume.
However for Xen, Linux doesn't talk to BIOS directly. Instead it simply walks
through necessary ACPI methods, and then issues a hypercall to Xen which
then further completes the remaining suspend steps. So here ACPI S3 is
exactly same as any other hypercall path, where processor context will be
saved/restored by Xen automatically.
>
> So the attached patches do exactly that - there are two entry points
> in the ACPI.
>
> 1). For S3: acpi_suspend_lowlevel -> .. lots of code -> acpi_enter_sleep_state
> 2). For S1/S4/S5: acpi_enter_sleep_state
>
> The first naive idea was of abstracting away in the 'acpi_enter_sleep_state'
> function the tboot_sleep code so that we can use it too. And low-behold - it
> worked splendidly for powering off (S5 I believe)
>
> For S3 that did not work - during suspend the hypervisor tripped over when
> saving cr8. During resume it tripped over at restoring the cr3, cr8, idt,
> and gdt values.
>
> What do you guys think? One thought is to use the paravirt interface to
> deal with cr3, cr8, idt, gdt for suspend/resume case.. But that is a lot
> of extra 'if' in the paravirt code - which the callback registration would
> effectively do the same thing as the paravirt - except at a higher level.
>
So tweaking at a higher level is a right thing to do.
Thanks
Kevin
^ permalink raw reply
* Re: [PATCH v9 2/4] PM: Introduce devfreq: generic DVFS framework with device-specific OPPs
From: MyungJoo Ham @ 2011-09-01 4:51 UTC (permalink / raw)
To: Turquette, Mike
Cc: Len Brown, Greg Kroah-Hartman, Kyungmin Park, linux-pm,
Thomas Gleixner
In-Reply-To: <CAJOA=zM+MOAd9cCo04EzsegvkbOHRykZpiyappPptpC95b18=w@mail.gmail.com>
On Thu, Sep 1, 2011 at 5:05 AM, Turquette, Mike <mturquette@ti.com> wrote:
> On Wed, Aug 31, 2011 at 12:29 AM, MyungJoo Ham <myungjoo.ham@samsung.com> wrote:
> [snip]
>> +/**
>> + * get_devfreq() - find devfreq struct. a wrapped find_device_devfreq()
>> + * with mutex protection. exported for governors
>> + * @dev: device pointer used to lookup device devfreq.
>> + */
>> +struct devfreq *get_devfreq(struct device *dev)
>> +{
>> + struct devfreq *ret;
>> +
>> + mutex_lock(&devfreq_list_lock);
>> + ret = find_device_devfreq(dev);
>> + mutex_unlock(&devfreq_list_lock);
>
> You prevent changes to the devfreq list while searching (good) but
> after returning the pointer there is no protection from that item
> being removed from the list. Generally "get" and "put" functions do
> more than just return a pointer: get functions often increment a
> refcount, or hold a lock. The put function will decrement the
> refcount or release the lock. Maybe you want something like the
> following:
>
> mutex_lock(&devfreq_list_lock);
> ret = find_device_devfreq(dev);
> mutex_lock(&devfreq->lock);
> mutex_unlock(&devfreq_list_lock);
>
> Then you need a corresponding put which does the mutex_unlock(&devfreq->lock).
>
> It looks like the only consumers of get_devfreq are the sysfs
> show/store interfaces, which immediately hold devfreq->lock, so the
> above proposal certainly makes fits the existing use case.
>
> Also CPUfreq's "cpufreq_get" function does a nice job of protecting
> the object from getting freed with a rw_semaphore. It is a bit more
> "complicated" but makes for good reading.
>
Thank you. The possibility that someone may do something on struct
devfreq after mutex_unlock(&devfreq_list_lock) and before
mutex_lock(&devfreq->lock) bothered me and it appears that I need to
add devfreq_put() anyway. I hesitated it because users might forget
using devfreq_put() after calling devfreq_get(); however, it is just
same as forgetting mutex_unlock after mutex_lock. So I wouldn't mind
that much.
The next devfreq_get() will do
> mutex_lock(&devfreq_list_lock);
> ret = find_device_devfreq(dev);
> mutex_lock(&devfreq->lock);
> mutex_unlock(&devfreq_list_lock);
and devfreq_put() will do
> mutex_unlock(&devfreq->lock);
as you've suggested.
>> +static int devfreq_do(struct devfreq *devfreq)
>> +{
>> + struct opp *opp;
>> + unsigned long freq;
>> + int err;
>> +
>
> Put the mutex_is_locked check here? See more below.
>
[]
>> +static int devfreq_update(struct notifier_block *nb, unsigned long type,
>> + void *devp)
>> +{
>> + struct devfreq *devfreq = container_of(nb, struct devfreq, nb);
>> + int ret;
>> +
>> + mutex_lock(&devfreq->lock);
>> + ret = update_devfreq(devfreq);
>> + mutex_unlock(&devfreq->lock);
>
> The whole devfreq_update/update_devfreq pairing is redundant.
> update_devfreq's purpose is to make sure the lock is held before going
> further, and the only caller of update_devfreq is devfreq_update which
> always holds the lock.
>
> This still doesn't stop a bad driver writer from just calling
> devfreq_do with an extern. Perhaps the lock detection should be moved
> into devfreq_do and update_devfreq should go away?
>
- update_devfreq: extern for governors
- devfreq_update: notifier callback for OPP
- devfreq_do: internal function of devfreq.
Anyway, as you've mentioned, it seems I'd better rename devfreq_do as
update_devfreq and make it exported with mutex check.
[]
>> +static void devfreq_monitor(struct work_struct *work)
>> +{
>> + static unsigned long last_polled_at;
>> + struct devfreq *devfreq, *tmp;
>> + int error;
>> + unsigned long jiffies_passed;
>> + unsigned long next_jiffies = ULONG_MAX, now = jiffies;
>> +
>> + /* Initially last_polled_at = 0, polling every device at bootup */
>> + jiffies_passed = now - last_polled_at;
>> + last_polled_at = now;
>> + if (jiffies_passed == 0)
>> + jiffies_passed = 1;
>> +
>> + mutex_lock(&devfreq_list_lock);
>
> Should not lock the list here. If we lock the list for all major
> operations, it nullifies the performance benefit of having a mutex in
> struct devfreq.
>
Ok... then.. how about locking like this? :
mutex_lock(&devfreq_list_lock);
list_for_each_entry_safe(devfreq, tmp, &devfreq_list, node) {
mutex_lock(&devfreq->lock);
mutex_unlock(&devfreq_list_lock);
blahblah
mutex_unlock(&devfreq->lock);
mutex_lock(&devfreq_list_lock);
}
mutex_unlock(&devfreq_list_lock);
Anyway, there is one more problem with allowing
unlocked-devfreq_list_lock in the loop.
list_for_each_entry_safe(devfreq, tmp, &devfreq_list, node) is safe
for the removal of devfreq from the list in the loop.
However, it is not safe against the removal of devfreq's next member
in the loop and while devfreq_list_lock is unlocked,
devfreq_remove_device may remove that one; thus, breaking the
list_for_each_entry_safe loop.
Such break is prevent by adding one more mutex_lock/mutex_unlock to
the loop for tmp->lock. However, if we do mutex_unlock(&tmp->lock)
before mutex_lock(&devfreq_list_lock), we still have the same
breaking-the-loop issue and if we do it after
mutex_lock(&devfreq_list_lock), we have a deadlock issue (someone
might have locked devfreq_list_lock and waiting to lock tmp->lock).
Thus, we will need to block devfreq_remove_device at devfreq_monitor
whlie unlocking devfreq_list in the loop. Other operations (add /
list) on the list are fine for it.
So, the loop will be:
mutex_lock(&devfreq_list_lock);
prohibit_devfreq_remove = true;
list_for_each_entry_safe(devfreq, tmp, &devfreq_list, node) {
mutex_lock(&devfreq->lock);
mutex_unlock(&devfreq_list_lock);
blahblah
mutex_unlock(&devfreq->lock);
mutex_lock(&devfreq_list_lock);
}
prohibit_devfreq_remove = false;
mutex_unlock(&devfreq_list_lock);
[]
>> + /* Remove a devfreq with an error. */
>> + if (error && error != -EAGAIN) {
>> + dev_err(devfreq->dev, "Due to devfreq_do error(%d), devfreq(%s) is removed from the device\n",
>> + error, devfreq->governor->name);
>> +
>> + list_del(&devfreq->node);
>> + mutex_unlock(&devfreq->lock);
>> + kfree(devfreq);
>> + continue;
>
> Should this error handling also unregister the OPP notifier? This
> code duplicates portions of devfreq_remove_device. I propose instead
> here we do the following:
>
> mutex_unlock(&devfreq->lock);
> _devfreq_remove_lock(devfreq);
> /* this locks the list first,
> * then locks devfreq->lock,
> * then does the house cleaning
> */
> continue;
Ah.. that was missing. Thanks!
I'll make a _devfreq_remove_device(struct device *dev, bool
call_by_monitor) and let devfreq_remove_device use it.
>
>> + }
>> + devfreq->next_polling = devfreq->polling_jiffies;
>> +
>> + /* No more polling required (polling_ms changed) */
>> + if (devfreq->next_polling == 0) {
>> + mutex_unlock(&devfreq->lock);
>> + continue;
>> + }
>> + } else {
>> + devfreq->next_polling -= jiffies_passed;
>> + }
>> +
>> + next_jiffies = (next_jiffies > devfreq->next_polling) ?
>> + devfreq->next_polling : next_jiffies;
>> +
>> + mutex_unlock(&devfreq->lock);
>> + }
>> +
>> + if (next_jiffies > 0 && next_jiffies < ULONG_MAX) {
>> + polling = true;
>> + queue_delayed_work(devfreq_wq, &devfreq_work, next_jiffies);
>> + } else {
>> + polling = false;
>> + }
>> +
>> + mutex_unlock(&devfreq_list_lock);
>
> Again, list should not be locked over this whole function. It blocks
> other unrelated devfreq devices from scaling.
>
>> +}
>
> [snip]
>
>> +/**
>> + * devfreq_remove_device() - Remove devfreq feature from a device.
>> + * @device: the device to remove devfreq feature.
>> + */
>> +int devfreq_remove_device(struct device *dev)
>
> Why does this take a struct device*? Shouldn't it be a struct devfreq*?
It is because devfreq_add_device() does not return struct devfreq.
struct devfreq is not visible to device drivers. It is visible to
governors.
>
> If there is a case for removing a devfreq device with only struct
> device* as input, how about:
>
> int devfreq_remove_device(struct device *dev)
> {
> struct devfreq *devfreq;
> mutex_lock(&devfreq_list_lock);
> devfreq = find_device_devfreq(dev);
> mutex_unlock(&devfreq_list_lock);
>
> return _devfreq_remove_device(struct devfreq *df);
> /* _devfreq_remove_device does the real work and can also be
> called from devfreq_monitor */
> }
Sure, I'll do so and reduce the redundancy.
>
> Regards,
> Mike
>
[]
Thank you so much!
Cheers,
MyungJoo
--
MyungJoo Ham (함명주), Ph.D.
Mobile Software Platform Lab,
Digital Media and Communications (DMC) Business
Samsung Electronics
cell: 82-10-6714-2858
_______________________________________________
linux-pm mailing list
linux-pm@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/linux-pm
^ permalink raw reply
* Re: [PATCH v9 3/4] PM / devfreq: add common sysfs interfaces
From: Turquette, Mike @ 2011-08-31 21:27 UTC (permalink / raw)
To: MyungJoo Ham
Cc: Len Brown, Greg Kroah-Hartman, Kyungmin Park, linux-pm,
Thomas Gleixner
In-Reply-To: <1314775779-21399-4-git-send-email-myungjoo.ham@samsung.com>
On Wed, Aug 31, 2011 at 12:29 AM, MyungJoo Ham <myungjoo.ham@samsung.com> wrote:
> Device specific sysfs interface /sys/devices/.../power/devfreq_*
> - governor R: name of governor
> - cur_freq R: current frequency
> - max_freq R: maximum operable frequency
> - min_freq R: minimum operable frequency
> - polling_interval R: polling interval in ms given with devfreq profile
> W: update polling interval.
>
> Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
>
> --
> Changes from v8
> - applied per-devfreq locking mechanism
>
> Changes from v7
> - removed set_freq from the common devfreq interface
> - added get_devfreq, a mutex-protected wrapper for find_device_devfreq
> (for sysfs interfaces and later with governor-support)
> - corrected ABI documentation.
>
> Changes from v6
> - poling_interval is writable.
>
> Changes from v5
> - updated devferq_update usage.
>
> Changes from v4
> - removed system-wide sysfs interface
> - removed tickling sysfs interface
> - added set_freq for userspace governor (and any other governors that
> require user input)
>
> Changes from v3
> - corrected sysfs API usage
> - corrected error messages
> - moved sysfs entry location
> - added sysfs entries
>
> Changes from v2
> - add ABI entries for devfreq sysfs interface
> ---
> Documentation/ABI/testing/sysfs-devices-power | 37 +++++
> drivers/devfreq/devfreq.c | 203 +++++++++++++++++++++++++
> 2 files changed, 240 insertions(+), 0 deletions(-)
>
> diff --git a/Documentation/ABI/testing/sysfs-devices-power b/Documentation/ABI/testing/sysfs-devices-power
> index 8ffbc25..57f4591 100644
> --- a/Documentation/ABI/testing/sysfs-devices-power
> +++ b/Documentation/ABI/testing/sysfs-devices-power
> @@ -165,3 +165,40 @@ Description:
>
> Not all drivers support this attribute. If it isn't supported,
> attempts to read or write it will yield I/O errors.
> +
> +What: /sys/devices/.../power/devfreq_governor
> +Date: July 2011
> +Contact: MyungJoo Ham <myungjoo.ham@samsung.com>
> +Description:
> + The /sys/devices/.../power/devfreq_governor shows the name
> + of the governor used by the corresponding device.
> +
> +What: /sys/devices/.../power/devfreq_cur_freq
> +Date: July 2011
> +Contact: MyungJoo Ham <myungjoo.ham@samsung.com>
> +Description:
> + The /sys/devices/.../power/devfreq_cur_freq shows the current
> + frequency of the corresponding device.
> +
> +What: /sys/devices/.../power/devfreq_max_freq
> +Date: July 2011
> +Contact: MyungJoo Ham <myungjoo.ham@samsung.com>
> +Description:
> + The /sys/devices/.../power/devfreq_max_freq shows the
> + maximum operable frequency of the corresponding device.
> +
> +What: /sys/devices/.../power/devfreq_min_freq
> +Date: July 2011
> +Contact: MyungJoo Ham <myungjoo.ham@samsung.com>
> +Description:
> + The /sys/devices/.../power/devfreq_min_freq shows the
> + minimum operable frequency of the corresponding device.
> +
> +What: /sys/devices/.../power/devfreq_polling_interval
> +Date: July 2011
> +Contact: MyungJoo Ham <myungjoo.ham@samsung.com>
> +Description:
> + The /sys/devices/.../power/devfreq_polling_interval sets and
> + shows the requested polling interval of the corresponding
> + device. The values are represented in ms. If the value is less
> + than 1 jiffy, it is considered to be 0, which means no polling.
> diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
> index 621b863..1c46052 100644
> --- a/drivers/devfreq/devfreq.c
> +++ b/drivers/devfreq/devfreq.c
> @@ -37,6 +37,8 @@ static struct delayed_work devfreq_work;
> static LIST_HEAD(devfreq_list);
> static DEFINE_MUTEX(devfreq_list_lock);
>
> +static struct attribute_group dev_attr_group;
> +
> /**
> * find_device_devfreq() - find devfreq struct using device pointer
> * @dev: device pointer used to lookup device devfreq.
> @@ -191,6 +193,8 @@ static void devfreq_monitor(struct work_struct *work)
> dev_err(devfreq->dev, "Due to devfreq_do error(%d), devfreq(%s) is removed from the device\n",
> error, devfreq->governor->name);
>
> + sysfs_unmerge_group(&devfreq->dev->kobj,
> + &dev_attr_group);
> list_del(&devfreq->node);
> mutex_unlock(&devfreq->lock);
> kfree(devfreq);
> @@ -293,6 +297,8 @@ int devfreq_add_device(struct device *dev, struct devfreq_dev_profile *profile,
> queue_delayed_work(devfreq_wq, &devfreq_work,
> devfreq->next_polling);
> }
> +
> + sysfs_merge_group(&dev->kobj, &dev_attr_group);
> mutex_unlock(&devfreq->lock);
> goto out;
> err_init:
> @@ -333,6 +339,8 @@ int devfreq_remove_device(struct device *dev)
> goto out;
> }
>
> + sysfs_unmerge_group(&dev->kobj, &dev_attr_group);
> +
> list_del(&devfreq->node);
>
> if (devfreq->governor->exit)
> @@ -346,6 +354,201 @@ out:
> return 0;
> }
>
> +static ssize_t show_governor(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + struct devfreq *df;
> + ssize_t ret;
> +
> + mutex_lock(&devfreq_list_lock);
> + df = find_device_devfreq(dev);
> + if (IS_ERR(df)) {
> + ret = PTR_ERR(df);
> + goto out;
> + }
> +
> + mutex_lock(&df->lock);
> + if (!df->governor) {
> + ret = -EINVAL;
> + goto out_l;
> + }
> +
> + ret = sprintf(buf, "%s\n", df->governor->name);
> +out_l:
> + mutex_unlock(&df->lock);
> +out:
> + mutex_unlock(&devfreq_list_lock);
> + return ret;
> +}
> +
> +static ssize_t show_freq(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + struct devfreq *df;
> + ssize_t ret;
> +
> + mutex_lock(&devfreq_list_lock);
> + df = find_device_devfreq(dev);
> + if (IS_ERR(df)) {
> + ret = PTR_ERR(df);
> + goto out;
> + }
> +
> + ret = sprintf(buf, "%lu\n", df->previous_freq);
> +out:
> + mutex_unlock(&devfreq_list_lock);
> + return ret;
> +}
> +
> +static ssize_t show_max_freq(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + struct devfreq *df;
> + ssize_t ret;
> + unsigned long freq = ULONG_MAX;
> + struct opp *opp;
> +
> + mutex_lock(&devfreq_list_lock);
> + df = find_device_devfreq(dev);
> + if (IS_ERR(df)) {
> + ret = PTR_ERR(df);
> + goto out;
> + }
> +
> + mutex_lock(&df->lock);
> + opp = opp_find_freq_floor(df->dev, &freq);
> + if (IS_ERR(opp)) {
> + ret = PTR_ERR(opp);
> + goto out_l;
> + }
> +
> + ret = sprintf(buf, "%lu\n", freq);
> +out_l:
> + mutex_unlock(&df->lock);
> +out:
> + mutex_unlock(&devfreq_list_lock);
> + return ret;
> +}
> +
> +static ssize_t show_min_freq(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + struct devfreq *df;
> + ssize_t ret;
> + unsigned long freq = 0;
> + struct opp *opp;
> +
> + mutex_lock(&devfreq_list_lock);
> + df = find_device_devfreq(dev);
> + if (IS_ERR(df)) {
> + ret = PTR_ERR(df);
> + goto out;
> + }
> +
> + mutex_lock(&df->lock);
> + opp = opp_find_freq_ceil(df->dev, &freq);
> + if (IS_ERR(opp)) {
> + ret = PTR_ERR(opp);
> + goto out_l;
> + }
> +
> + ret = sprintf(buf, "%lu\n", freq);
> +out_l:
> + mutex_unlock(&df->lock);
> +out:
> + mutex_unlock(&devfreq_list_lock);
> + return ret;
> +}
> +
> +static ssize_t show_polling_interval(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + struct devfreq *df;
> + ssize_t ret;
> +
> + mutex_lock(&devfreq_list_lock);
> + df = find_device_devfreq(dev);
> + if (IS_ERR(df)) {
> + ret = PTR_ERR(df);
> + goto out;
> + }
> +
> + mutex_lock(&df->lock);
> + if (!df->profile) {
> + ret = -EINVAL;
> + goto out_l;
> + }
> +
> + ret = sprintf(buf, "%d\n", df->profile->polling_ms);
> +out_l:
> + mutex_unlock(&df->lock);
> +out:
> + mutex_unlock(&devfreq_list_lock);
> + return ret;
> +}
> +
> +static ssize_t store_polling_interval(struct device *dev,
> + struct device_attribute *attr,
> + const char *buf, size_t count)
> +{
> + struct devfreq *df;
> + unsigned int value;
> + int ret;
> +
> + mutex_lock(&devfreq_list_lock);
> + df = find_device_devfreq(dev);
> + if (IS_ERR(df)) {
> + count = PTR_ERR(df);
> + goto out;
> + }
> + mutex_lock(&df->lock);
> + if (!df->profile) {
> + count = -EINVAL;
> + goto out_l;
> + }
> +
> + ret = sscanf(buf, "%u", &value);
> + if (ret != 1) {
> + count = -EINVAL;
> + goto out_l;
> + }
> +
> + df->profile->polling_ms = value;
> + df->next_polling = df->polling_jiffies
> + = msecs_to_jiffies(value);
> +
> + if (df->next_polling > 0 && !polling) {
> + polling = true;
> + queue_delayed_work(devfreq_wq, &devfreq_work,
> + df->next_polling);
> + }
> +out_l:
> + mutex_unlock(&df->lock);
> +out:
> + mutex_unlock(&devfreq_list_lock);
> +
> + return count;
> +}
> +
> +static DEVICE_ATTR(devfreq_governor, 0444, show_governor, NULL);
> +static DEVICE_ATTR(devfreq_cur_freq, 0444, show_freq, NULL);
> +static DEVICE_ATTR(devfreq_max_freq, 0444, show_max_freq, NULL);
> +static DEVICE_ATTR(devfreq_min_freq, 0444, show_min_freq, NULL);
> +static DEVICE_ATTR(devfreq_polling_interval, 0644, show_polling_interval,
Instead of DEVICE_ATTR, why don't you create your own ktype specific
to devfreq? That would also mean that you don't have to do the struct
device * conversions to get struct devfreq * everytime (which requires
locking and walking the list).
Regards,
Mike
> + store_polling_interval);
> +static struct attribute *dev_entries[] = {
> + &dev_attr_devfreq_governor.attr,
> + &dev_attr_devfreq_cur_freq.attr,
> + &dev_attr_devfreq_max_freq.attr,
> + &dev_attr_devfreq_min_freq.attr,
> + &dev_attr_devfreq_polling_interval.attr,
> + NULL,
> +};
> +static struct attribute_group dev_attr_group = {
> + .name = power_group_name,
> + .attrs = dev_entries,
> +};
> +
> /**
> * devfreq_init() - Initialize data structure for devfreq framework and
> * start polling registered devfreq devices.
> --
> 1.7.4.1
>
>
^ permalink raw reply
* Re: [PATCH v9 2/4] PM: Introduce devfreq: generic DVFS framework with device-specific OPPs
From: Turquette, Mike @ 2011-08-31 20:05 UTC (permalink / raw)
To: MyungJoo Ham
Cc: Len Brown, Greg Kroah-Hartman, Kyungmin Park, linux-pm,
Thomas Gleixner
In-Reply-To: <1314775779-21399-3-git-send-email-myungjoo.ham@samsung.com>
On Wed, Aug 31, 2011 at 12:29 AM, MyungJoo Ham <myungjoo.ham@samsung.com> wrote:
[snip]
> +/**
> + * get_devfreq() - find devfreq struct. a wrapped find_device_devfreq()
> + * with mutex protection. exported for governors
> + * @dev: device pointer used to lookup device devfreq.
> + */
> +struct devfreq *get_devfreq(struct device *dev)
> +{
> + struct devfreq *ret;
> +
> + mutex_lock(&devfreq_list_lock);
> + ret = find_device_devfreq(dev);
> + mutex_unlock(&devfreq_list_lock);
You prevent changes to the devfreq list while searching (good) but
after returning the pointer there is no protection from that item
being removed from the list. Generally "get" and "put" functions do
more than just return a pointer: get functions often increment a
refcount, or hold a lock. The put function will decrement the
refcount or release the lock. Maybe you want something like the
following:
mutex_lock(&devfreq_list_lock);
ret = find_device_devfreq(dev);
mutex_lock(&devfreq->lock);
mutex_unlock(&devfreq_list_lock);
Then you need a corresponding put which does the mutex_unlock(&devfreq->lock).
It looks like the only consumers of get_devfreq are the sysfs
show/store interfaces, which immediately hold devfreq->lock, so the
above proposal certainly makes fits the existing use case.
Also CPUfreq's "cpufreq_get" function does a nice job of protecting
the object from getting freed with a rw_semaphore. It is a bit more
"complicated" but makes for good reading.
> +
> + return ret;
> +}
> +
> +/**
> + * devfreq_do() - Check the usage profile of a given device and configure
> + * frequency and voltage accordingly
> + * @devfreq: devfreq info of the given device
> + */
> +static int devfreq_do(struct devfreq *devfreq)
> +{
> + struct opp *opp;
> + unsigned long freq;
> + int err;
> +
Put the mutex_is_locked check here? See more below.
> + err = devfreq->governor->get_target_freq(devfreq, &freq);
> + if (err)
> + return err;
> +
> + opp = opp_find_freq_ceil(devfreq->dev, &freq);
> + if (opp == ERR_PTR(-ENODEV))
> + opp = opp_find_freq_floor(devfreq->dev, &freq);
> +
> + if (IS_ERR(opp))
> + return PTR_ERR(opp);
> +
> + if (devfreq->previous_freq == freq)
> + return 0;
> +
> + err = devfreq->profile->target(devfreq->dev, opp);
> + if (err)
> + return err;
> +
> + devfreq->previous_freq = freq;
> + return 0;
> +}
> +
> +/**
> + * update_devfreq() - Notify that the device OPP or frequency requirement
> + * has been changed. This function is exported for governors.
> + * @devfreq: the devfreq instance.
> + *
> + * Note: lock devfreq->lock before calling update_devfreq
> + */
> +int update_devfreq(struct devfreq *devfreq)
> +{
> + int err = 0;
> +
> + if (!mutex_is_locked(&devfreq->lock)) {
> + WARN(true, "devfreq->lock must be locked by the caller.\n");
> + return -EINVAL;
> + }
> +
> + /* Reevaluate the proper frequency */
> + err = devfreq_do(devfreq);
> + return err;
> +}
> +
> +/**
> + * devfreq_update() - Notify that the device OPP has been changed.
> + * @dev: the device whose OPP has been changed.
> + *
> + * Called by OPP notifier.
> + */
> +static int devfreq_update(struct notifier_block *nb, unsigned long type,
> + void *devp)
> +{
> + struct devfreq *devfreq = container_of(nb, struct devfreq, nb);
> + int ret;
> +
> + mutex_lock(&devfreq->lock);
> + ret = update_devfreq(devfreq);
> + mutex_unlock(&devfreq->lock);
The whole devfreq_update/update_devfreq pairing is redundant.
update_devfreq's purpose is to make sure the lock is held before going
further, and the only caller of update_devfreq is devfreq_update which
always holds the lock.
This still doesn't stop a bad driver writer from just calling
devfreq_do with an extern. Perhaps the lock detection should be moved
into devfreq_do and update_devfreq should go away?
> +
> + return ret;
> +}
> +
> +/**
> + * devfreq_monitor() - Periodically run devfreq_do()
> + * @work: the work struct used to run devfreq_monitor periodically.
> + *
> + */
> +static void devfreq_monitor(struct work_struct *work)
> +{
> + static unsigned long last_polled_at;
> + struct devfreq *devfreq, *tmp;
> + int error;
> + unsigned long jiffies_passed;
> + unsigned long next_jiffies = ULONG_MAX, now = jiffies;
> +
> + /* Initially last_polled_at = 0, polling every device at bootup */
> + jiffies_passed = now - last_polled_at;
> + last_polled_at = now;
> + if (jiffies_passed == 0)
> + jiffies_passed = 1;
> +
> + mutex_lock(&devfreq_list_lock);
Should not lock the list here. If we lock the list for all major
operations, it nullifies the performance benefit of having a mutex in
struct devfreq.
> +
> + list_for_each_entry_safe(devfreq, tmp, &devfreq_list, node) {
> + mutex_lock(&devfreq->lock);
> +
> + if (devfreq->next_polling == 0) {
> + mutex_unlock(&devfreq->lock);
> + continue;
> + }
> +
> + /*
> + * Reduce more next_polling if devfreq_wq took an extra
> + * delay. (i.e., CPU has been idled.)
> + */
> + if (devfreq->next_polling <= jiffies_passed) {
> + error = devfreq_do(devfreq);
> +
> + /* Remove a devfreq with an error. */
> + if (error && error != -EAGAIN) {
> + dev_err(devfreq->dev, "Due to devfreq_do error(%d), devfreq(%s) is removed from the device\n",
> + error, devfreq->governor->name);
> +
> + list_del(&devfreq->node);
> + mutex_unlock(&devfreq->lock);
> + kfree(devfreq);
> + continue;
Should this error handling also unregister the OPP notifier? This
code duplicates portions of devfreq_remove_device. I propose instead
here we do the following:
mutex_unlock(&devfreq->lock);
_devfreq_remove_lock(devfreq);
/* this locks the list first,
* then locks devfreq->lock,
* then does the house cleaning
*/
continue;
> + }
> + devfreq->next_polling = devfreq->polling_jiffies;
> +
> + /* No more polling required (polling_ms changed) */
> + if (devfreq->next_polling == 0) {
> + mutex_unlock(&devfreq->lock);
> + continue;
> + }
> + } else {
> + devfreq->next_polling -= jiffies_passed;
> + }
> +
> + next_jiffies = (next_jiffies > devfreq->next_polling) ?
> + devfreq->next_polling : next_jiffies;
> +
> + mutex_unlock(&devfreq->lock);
> + }
> +
> + if (next_jiffies > 0 && next_jiffies < ULONG_MAX) {
> + polling = true;
> + queue_delayed_work(devfreq_wq, &devfreq_work, next_jiffies);
> + } else {
> + polling = false;
> + }
> +
> + mutex_unlock(&devfreq_list_lock);
Again, list should not be locked over this whole function. It blocks
other unrelated devfreq devices from scaling.
> +}
[snip]
> +/**
> + * devfreq_remove_device() - Remove devfreq feature from a device.
> + * @device: the device to remove devfreq feature.
> + */
> +int devfreq_remove_device(struct device *dev)
Why does this take a struct device*? Shouldn't it be a struct devfreq*?
If there is a case for removing a devfreq device with only struct
device* as input, how about:
int devfreq_remove_device(struct device *dev)
{
struct devfreq *devfreq;
mutex_lock(&devfreq_list_lock);
devfreq = find_device_devfreq(dev);
mutex_unlock(&devfreq_list_lock);
return _devfreq_remove_device(struct devfreq *df);
/* _devfreq_remove_device does the real work and can also be
called from devfreq_monitor */
}
Regards,
Mike
> +{
> + struct devfreq *devfreq;
> + struct srcu_notifier_head *nh;
> + int err = 0;
> +
> + if (!dev)
> + return -EINVAL;
> +
> + mutex_lock(&devfreq_list_lock);
> + devfreq = find_device_devfreq(dev);
> + if (IS_ERR(devfreq)) {
> + err = PTR_ERR(devfreq);
> + goto out;
> + }
> +
> + mutex_lock(&devfreq->lock);
> + nh = opp_get_notifier(dev);
> + if (IS_ERR(nh)) {
> + err = PTR_ERR(nh);
> + mutex_unlock(&devfreq->lock);
> + goto out;
> + }
> +
> + list_del(&devfreq->node);
> +
> + if (devfreq->governor->exit)
> + devfreq->governor->exit(devfreq);
> +
> + srcu_notifier_chain_unregister(nh, &devfreq->nb);
> + mutex_unlock(&devfreq->lock);
> + kfree(devfreq);
> +out:
> + mutex_unlock(&devfreq_list_lock);
> + return 0;
> +}
[snip]
^ permalink raw reply
* Re: [BUG] Soft-lockup during cpu-hotplug in VFS callpaths
From: Maciej Rutecki @ 2011-08-31 18:40 UTC (permalink / raw)
To: Srivatsa S. Bhat; +Cc: linux-fsdevel, linux-pm, linux-kernel
In-Reply-To: <4E550057.9070609@linux.vnet.ibm.com>
On środa, 24 sierpnia 2011 o 15:44:55 Srivatsa S. Bhat wrote:
> Hi,
>
> While running stressful cpu hotplug tests along with kernel compilation
> running in background, soft-lockups are detected on multiple CPUs.
> Sometimes this also leads to hard lockups and kernel panic.
> All the soft-lockups seem to occur at vfsmount_lock_local_cpu() or other
> VFS callpaths.
>
>
> [37108.410813] BUG: soft lockup - CPU#5 stuck for 22s! [cc1:29669]
> <snip>
> [37108.694781] Call Trace:
> [37108.697306] [<ffffffff81199e70>] ?
> vfsmount_lock_local_lock_cpu+0x70/0x70 [37108.704258]
> [<ffffffff81187cb5>] path_init+0x315/0x400
> [37108.709558] [<ffffffff8127c398>] ? __raw_spin_lock_init+0x38/0x70
> [37108.715812] [<ffffffff8118961c>] path_openat+0x8c/0x3f0
> [37108.721203] [<ffffffff81012129>] ? sched_clock+0x9/0x10
> [37108.726597] [<ffffffff8109416d>] ? sched_clock_cpu+0xcd/0x110
> [37108.732508] [<ffffffff810a178d>] ? trace_hardirqs_off+0xd/0x10
> [37108.738498] [<ffffffff8109421f>] ? local_clock+0x6f/0x80
> [37108.743970] [<ffffffff81189a99>] do_filp_open+0x49/0xa0
> [37108.749362] [<ffffffff811982f3>] ? alloc_fd+0xc3/0x210
> [37108.754665] [<ffffffff8152584b>] ? _raw_spin_unlock+0x2b/0x40
> [37108.760575] [<ffffffff811982f3>] ? alloc_fd+0xc3/0x210
> [37108.765875] [<ffffffff81179607>] do_sys_open+0x107/0x1e0
> [37108.771352] [<ffffffff810d610f>] ? audit_syscall_entry+0x1bf/0x1f0
> [37108.777695] [<ffffffff81179720>] sys_open+0x20/0x30
> [37108.782741] [<ffffffff8152e202>] system_call_fastpath+0x16/0x1b
>
> Kernel version: 3.0.1, 3.0.3
> Hardware: Dual socket quad-core hyper-threaded Intel x86 machine
> Scenario:
> (a) Stressful cpu hotplug tests + kernel compilation
>
> (b) IRQ balancing had been disabled and all the IRQs were made to be
> routed to CPU 0 (except the ones that couldn't be routed).
>
> (c) Lockdep was enabled during kernel configuration.
>
> Steps (b) and (c) were done to dig deeper into the issue. However the same
> issue was observed by just doing step (a).
>
> Definitely there seems to be a race condition occurring here, because this
> issue is hit after sometime, after starting the tests. And the time it
> takes to hit the issue increases as we increase the number of debug print
> statements. In some cases (especially when the number of debug print
> statements were quite high), the stress on the machine had to be increased
> in order to hit the issue within measurable time. In my tests, a maximum
> of about 2 to 2.5 hours was sufficient, to hit this bug.
>
> Please find the console log attached with this mail.
>
> Any ideas on how to go about fixing this bug?
It is a regression?
Regards
--
Maciej Rutecki
http://www.maciek.unixy.pl
_______________________________________________
linux-pm mailing list
linux-pm@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/linux-pm
^ permalink raw reply
* [PATCH 7/7] xen/acpi/sleep: Register to the acpi_suspend_lowlevel a callback.
From: Konrad Rzeszutek Wilk @ 2011-08-31 18:31 UTC (permalink / raw)
To: x86, tglx, tboot-devel, shane.wang, linux-pm, linux-acpi,
len.brown
Cc: xen-devel, Konrad Rzeszutek Wilk
In-Reply-To: <1314815484-4668-1-git-send-email-konrad.wilk@oracle.com>
We piggyback on "x86/acpi: Provide registration for acpi_suspend_lowlevel."
to register a Xen version of the callback. The callback does not
do anything special - except it omits the x86_acpi_suspend_lowlevel.
It does that b/c during suspend it tries to save cr8 values (which
the hypervisor does not support), and then on resume path the
cr3, cr8, idt, and gdt are all resumed which clashes with what
the hypervisor has set up for the guest.
Signed-off-by: Liang Tang <liang.tang@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
include/xen/acpi.h | 14 +++++++++++++-
1 files changed, 13 insertions(+), 1 deletions(-)
diff --git a/include/xen/acpi.h b/include/xen/acpi.h
index e414f14..0409919 100644
--- a/include/xen/acpi.h
+++ b/include/xen/acpi.h
@@ -12,10 +12,22 @@ int xen_acpi_notify_hypervisor_state(u8 sleep_state,
u32 pm1a_cnt, u32 pm1b_cnd,
bool *skip_rest);
+static inline int xen_acpi_suspend_lowlevel(void)
+{
+ /*
+ * Xen will save and restore CPU context, so
+ * we can skip that and just go straight to
+ * the suspend.
+ */
+ acpi_enter_sleep_state(ACPI_STATE_S3);
+ return 0;
+}
static inline void xen_acpi_sleep_register(void)
{
- if (xen_initial_domain())
+ if (xen_initial_domain()) {
+ acpi_suspend_lowlevel = xen_acpi_suspend_lowlevel;
__acpi_override_sleep = xen_acpi_notify_hypervisor_state;
+ }
}
#else
static inline void xen_acpi_sleep_register(void)
--
1.7.4.1
^ permalink raw reply related
* [PATCH 6/7] xen/acpi/sleep: Enable ACPI sleep via the __acpi_override_sleep
From: Konrad Rzeszutek Wilk @ 2011-08-31 18:31 UTC (permalink / raw)
To: x86, tglx, tboot-devel, shane.wang, linux-pm, linux-acpi,
len.brown
Cc: xen-devel, Konrad Rzeszutek Wilk
In-Reply-To: <1314815484-4668-1-git-send-email-konrad.wilk@oracle.com>
Provide the registration callback to call in the Xen's
ACPI sleep functionality. This means that during S3/S5
we make a hypercall XENPF_enter_acpi_sleep with the
proper PM1A/PM1B registers.
Based of Ke Yu's <ke.yu@intel.com> initial idea.
[ From http://xenbits.xensource.com/linux-2.6.18-xen.hg
change c68699484a65 ]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
arch/x86/include/asm/xen/hypercall.h | 8 ++++++++
arch/x86/xen/enlighten.c | 3 +++
drivers/xen/Makefile | 2 +-
drivers/xen/acpi.c | 25 +++++++++++++++++++++++++
include/xen/acpi.h | 26 ++++++++++++++++++++++++++
5 files changed, 63 insertions(+), 1 deletions(-)
create mode 100644 drivers/xen/acpi.c
create mode 100644 include/xen/acpi.h
diff --git a/arch/x86/include/asm/xen/hypercall.h b/arch/x86/include/asm/xen/hypercall.h
index d240ea9..0c9894e 100644
--- a/arch/x86/include/asm/xen/hypercall.h
+++ b/arch/x86/include/asm/xen/hypercall.h
@@ -45,6 +45,7 @@
#include <xen/interface/xen.h>
#include <xen/interface/sched.h>
#include <xen/interface/physdev.h>
+#include <xen/interface/platform.h>
/*
* The hypercall asms have to meet several constraints:
@@ -299,6 +300,13 @@ HYPERVISOR_set_timer_op(u64 timeout)
}
static inline int
+HYPERVISOR_dom0_op(struct xen_platform_op *platform_op)
+{
+ platform_op->interface_version = XENPF_INTERFACE_VERSION;
+ return _hypercall1(int, dom0_op, platform_op);
+}
+
+static inline int
HYPERVISOR_set_debugreg(int reg, unsigned long value)
{
return _hypercall2(int, set_debugreg, reg, value);
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 5525163..6962653 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -42,6 +42,7 @@
#include <xen/page.h>
#include <xen/hvm.h>
#include <xen/hvc-console.h>
+#include <xen/acpi.h>
#include <asm/paravirt.h>
#include <asm/apic.h>
@@ -1250,6 +1251,8 @@ asmlinkage void __init xen_start_kernel(void)
} else {
/* Make sure ACS will be enabled */
pci_request_acs();
+
+ xen_acpi_sleep_register();
}
diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile
index bbc1825..370552d 100644
--- a/drivers/xen/Makefile
+++ b/drivers/xen/Makefile
@@ -16,7 +16,7 @@ obj-$(CONFIG_XENFS) += xenfs/
obj-$(CONFIG_XEN_SYS_HYPERVISOR) += sys-hypervisor.o
obj-$(CONFIG_XEN_PLATFORM_PCI) += xen-platform-pci.o
obj-$(CONFIG_SWIOTLB_XEN) += swiotlb-xen.o
-obj-$(CONFIG_XEN_DOM0) += pci.o
+obj-$(CONFIG_XEN_DOM0) += pci.o acpi.o
xen-evtchn-y := evtchn.o
xen-gntdev-y := gntdev.o
diff --git a/drivers/xen/acpi.c b/drivers/xen/acpi.c
new file mode 100644
index 0000000..c0f829f
--- /dev/null
+++ b/drivers/xen/acpi.c
@@ -0,0 +1,25 @@
+#include <xen/acpi.h>
+#include <xen/interface/platform.h>
+#include <asm/xen/hypercall.h>
+#include <asm/xen/hypervisor.h>
+
+int xen_acpi_notify_hypervisor_state(u8 sleep_state,
+ u32 pm1a_cnt, u32 pm1b_cnt,
+ bool *skip_rest)
+{
+ struct xen_platform_op op = {
+ .cmd = XENPF_enter_acpi_sleep,
+ .interface_version = XENPF_INTERFACE_VERSION,
+ .u = {
+ .enter_acpi_sleep = {
+ .pm1a_cnt_val = (u16)pm1a_cnt,
+ .pm1b_cnt_val = (u16)pm1b_cnt,
+ .sleep_state = sleep_state,
+ },
+ },
+ };
+ if (skip_rest)
+ *skip_rest = true;
+
+ return HYPERVISOR_dom0_op(&op);
+}
diff --git a/include/xen/acpi.h b/include/xen/acpi.h
new file mode 100644
index 0000000..e414f14
--- /dev/null
+++ b/include/xen/acpi.h
@@ -0,0 +1,26 @@
+#ifndef _XEN_ACPI_H
+#define _XEN_ACPI_H
+
+#include <linux/types.h>
+
+#ifdef CONFIG_XEN_DOM0
+#include <asm/xen/hypervisor.h>
+#include <xen/xen.h>
+#include <linux/acpi.h>
+
+int xen_acpi_notify_hypervisor_state(u8 sleep_state,
+ u32 pm1a_cnt, u32 pm1b_cnd,
+ bool *skip_rest);
+
+static inline void xen_acpi_sleep_register(void)
+{
+ if (xen_initial_domain())
+ __acpi_override_sleep = xen_acpi_notify_hypervisor_state;
+}
+#else
+static inline void xen_acpi_sleep_register(void)
+{
+}
+#endif
+
+#endif /* _XEN_ACPI_H */
--
1.7.4.1
^ permalink raw reply related
* [PATCH 5/7] xen/acpi: Domain0 acpi parser related platform hypercall
From: Konrad Rzeszutek Wilk @ 2011-08-31 18:31 UTC (permalink / raw)
To: x86, tglx, tboot-devel, shane.wang, linux-pm, linux-acpi,
len.brown
Cc: xen-devel, Jeremy Fitzhardinge, Konrad Rzeszutek Wilk
In-Reply-To: <1314815484-4668-1-git-send-email-konrad.wilk@oracle.com>
From: Yu Ke <ke.yu@intel.com>
This patches implements the xen_platform_op hypercall, to pass the parsed
ACPI info to hypervisor.
Signed-off-by: Yu Ke <ke.yu@intel.com>
Signed-off-by: Tian Kevin <kevin.tian@intel.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
[v1: Added DEFINE_GUEST.. in appropiate headers]
[v2: Ripped out typedefs]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
arch/ia64/include/asm/xen/interface.h | 1 +
arch/x86/include/asm/xen/interface.h | 1 +
include/xen/interface/platform.h | 320 +++++++++++++++++++++++++++++++++
include/xen/interface/xen.h | 1 +
4 files changed, 323 insertions(+), 0 deletions(-)
create mode 100644 include/xen/interface/platform.h
diff --git a/arch/ia64/include/asm/xen/interface.h b/arch/ia64/include/asm/xen/interface.h
index e951e74..1d2427d 100644
--- a/arch/ia64/include/asm/xen/interface.h
+++ b/arch/ia64/include/asm/xen/interface.h
@@ -76,6 +76,7 @@ DEFINE_GUEST_HANDLE(char);
DEFINE_GUEST_HANDLE(int);
DEFINE_GUEST_HANDLE(long);
DEFINE_GUEST_HANDLE(void);
+DEFINE_GUEST_HANDLE(uint64_t);
typedef unsigned long xen_pfn_t;
DEFINE_GUEST_HANDLE(xen_pfn_t);
diff --git a/arch/x86/include/asm/xen/interface.h b/arch/x86/include/asm/xen/interface.h
index 5d4922a..a1f2db5 100644
--- a/arch/x86/include/asm/xen/interface.h
+++ b/arch/x86/include/asm/xen/interface.h
@@ -55,6 +55,7 @@ DEFINE_GUEST_HANDLE(char);
DEFINE_GUEST_HANDLE(int);
DEFINE_GUEST_HANDLE(long);
DEFINE_GUEST_HANDLE(void);
+DEFINE_GUEST_HANDLE(uint64_t);
#endif
#ifndef HYPERVISOR_VIRT_START
diff --git a/include/xen/interface/platform.h b/include/xen/interface/platform.h
new file mode 100644
index 0000000..c168468
--- /dev/null
+++ b/include/xen/interface/platform.h
@@ -0,0 +1,320 @@
+/******************************************************************************
+ * platform.h
+ *
+ * Hardware platform operations. Intended for use by domain-0 kernel.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Copyright (c) 2002-2006, K Fraser
+ */
+
+#ifndef __XEN_PUBLIC_PLATFORM_H__
+#define __XEN_PUBLIC_PLATFORM_H__
+
+#include "xen.h"
+
+#define XENPF_INTERFACE_VERSION 0x03000001
+
+/*
+ * Set clock such that it would read <secs,nsecs> after 00:00:00 UTC,
+ * 1 January, 1970 if the current system time was <system_time>.
+ */
+#define XENPF_settime 17
+struct xenpf_settime {
+ /* IN variables. */
+ uint32_t secs;
+ uint32_t nsecs;
+ uint64_t system_time;
+};
+DEFINE_GUEST_HANDLE_STRUCT(xenpf_settime_t);
+
+/*
+ * Request memory range (@mfn, @mfn+@nr_mfns-1) to have type @type.
+ * On x86, @type is an architecture-defined MTRR memory type.
+ * On success, returns the MTRR that was used (@reg) and a handle that can
+ * be passed to XENPF_DEL_MEMTYPE to accurately tear down the new setting.
+ * (x86-specific).
+ */
+#define XENPF_add_memtype 31
+struct xenpf_add_memtype {
+ /* IN variables. */
+ unsigned long mfn;
+ uint64_t nr_mfns;
+ uint32_t type;
+ /* OUT variables. */
+ uint32_t handle;
+ uint32_t reg;
+};
+DEFINE_GUEST_HANDLE_STRUCT(xenpf_add_memtype_t);
+
+/*
+ * Tear down an existing memory-range type. If @handle is remembered then it
+ * should be passed in to accurately tear down the correct setting (in case
+ * of overlapping memory regions with differing types). If it is not known
+ * then @handle should be set to zero. In all cases @reg must be set.
+ * (x86-specific).
+ */
+#define XENPF_del_memtype 32
+struct xenpf_del_memtype {
+ /* IN variables. */
+ uint32_t handle;
+ uint32_t reg;
+};
+DEFINE_GUEST_HANDLE_STRUCT(xenpf_del_memtype_t);
+
+/* Read current type of an MTRR (x86-specific). */
+#define XENPF_read_memtype 33
+struct xenpf_read_memtype {
+ /* IN variables. */
+ uint32_t reg;
+ /* OUT variables. */
+ unsigned long mfn;
+ uint64_t nr_mfns;
+ uint32_t type;
+};
+DEFINE_GUEST_HANDLE_STRUCT(xenpf_read_memtype_t);
+
+#define XENPF_microcode_update 35
+struct xenpf_microcode_update {
+ /* IN variables. */
+ GUEST_HANDLE(void) data; /* Pointer to microcode data */
+ uint32_t length; /* Length of microcode data. */
+};
+DEFINE_GUEST_HANDLE_STRUCT(xenpf_microcode_update_t);
+
+#define XENPF_platform_quirk 39
+#define QUIRK_NOIRQBALANCING 1 /* Do not restrict IO-APIC RTE targets */
+#define QUIRK_IOAPIC_BAD_REGSEL 2 /* IO-APIC REGSEL forgets its value */
+#define QUIRK_IOAPIC_GOOD_REGSEL 3 /* IO-APIC REGSEL behaves properly */
+struct xenpf_platform_quirk {
+ /* IN variables. */
+ uint32_t quirk_id;
+};
+DEFINE_GUEST_HANDLE_STRUCT(xenpf_platform_quirk_t);
+
+#define XENPF_firmware_info 50
+#define XEN_FW_DISK_INFO 1 /* from int 13 AH=08/41/48 */
+#define XEN_FW_DISK_MBR_SIGNATURE 2 /* from MBR offset 0x1b8 */
+#define XEN_FW_VBEDDC_INFO 3 /* from int 10 AX=4f15 */
+struct xenpf_firmware_info {
+ /* IN variables. */
+ uint32_t type;
+ uint32_t index;
+ /* OUT variables. */
+ union {
+ struct {
+ /* Int13, Fn48: Check Extensions Present. */
+ uint8_t device; /* %dl: bios device number */
+ uint8_t version; /* %ah: major version */
+ uint16_t interface_support; /* %cx: support bitmap */
+ /* Int13, Fn08: Legacy Get Device Parameters. */
+ uint16_t legacy_max_cylinder; /* %cl[7:6]:%ch: max cyl # */
+ uint8_t legacy_max_head; /* %dh: max head # */
+ uint8_t legacy_sectors_per_track; /* %cl[5:0]: max sector # */
+ /* Int13, Fn41: Get Device Parameters (as filled into %ds:%esi). */
+ /* NB. First uint16_t of buffer must be set to buffer size. */
+ GUEST_HANDLE(void) edd_params;
+ } disk_info; /* XEN_FW_DISK_INFO */
+ struct {
+ uint8_t device; /* bios device number */
+ uint32_t mbr_signature; /* offset 0x1b8 in mbr */
+ } disk_mbr_signature; /* XEN_FW_DISK_MBR_SIGNATURE */
+ struct {
+ /* Int10, AX=4F15: Get EDID info. */
+ uint8_t capabilities;
+ uint8_t edid_transfer_time;
+ /* must refer to 128-byte buffer */
+ GUEST_HANDLE(uchar) edid;
+ } vbeddc_info; /* XEN_FW_VBEDDC_INFO */
+ } u;
+};
+DEFINE_GUEST_HANDLE_STRUCT(xenpf_firmware_info_t);
+
+#define XENPF_enter_acpi_sleep 51
+struct xenpf_enter_acpi_sleep {
+ /* IN variables */
+ uint16_t pm1a_cnt_val; /* PM1a control value. */
+ uint16_t pm1b_cnt_val; /* PM1b control value. */
+ uint32_t sleep_state; /* Which state to enter (Sn). */
+ uint32_t flags; /* Must be zero. */
+};
+DEFINE_GUEST_HANDLE_STRUCT(xenpf_enter_acpi_sleep_t);
+
+#define XENPF_change_freq 52
+struct xenpf_change_freq {
+ /* IN variables */
+ uint32_t flags; /* Must be zero. */
+ uint32_t cpu; /* Physical cpu. */
+ uint64_t freq; /* New frequency (Hz). */
+};
+DEFINE_GUEST_HANDLE_STRUCT(xenpf_change_freq_t);
+
+/*
+ * Get idle times (nanoseconds since boot) for physical CPUs specified in the
+ * @cpumap_bitmap with range [0..@cpumap_nr_cpus-1]. The @idletime array is
+ * indexed by CPU number; only entries with the corresponding @cpumap_bitmap
+ * bit set are written to. On return, @cpumap_bitmap is modified so that any
+ * non-existent CPUs are cleared. Such CPUs have their @idletime array entry
+ * cleared.
+ */
+#define XENPF_getidletime 53
+struct xenpf_getidletime {
+ /* IN/OUT variables */
+ /* IN: CPUs to interrogate; OUT: subset of IN which are present */
+ GUEST_HANDLE(uchar) cpumap_bitmap;
+ /* IN variables */
+ /* Size of cpumap bitmap. */
+ uint32_t cpumap_nr_cpus;
+ /* Must be indexable for every cpu in cpumap_bitmap. */
+ GUEST_HANDLE(uint64_t) idletime;
+ /* OUT variables */
+ /* System time when the idletime snapshots were taken. */
+ uint64_t now;
+};
+DEFINE_GUEST_HANDLE_STRUCT(xenpf_getidletime_t);
+
+#define XENPF_set_processor_pminfo 54
+
+/* ability bits */
+#define XEN_PROCESSOR_PM_CX 1
+#define XEN_PROCESSOR_PM_PX 2
+#define XEN_PROCESSOR_PM_TX 4
+
+/* cmd type */
+#define XEN_PM_CX 0
+#define XEN_PM_PX 1
+#define XEN_PM_TX 2
+
+/* Px sub info type */
+#define XEN_PX_PCT 1
+#define XEN_PX_PSS 2
+#define XEN_PX_PPC 4
+#define XEN_PX_PSD 8
+
+struct xen_power_register {
+ uint32_t space_id;
+ uint32_t bit_width;
+ uint32_t bit_offset;
+ uint32_t access_size;
+ uint64_t address;
+};
+
+struct xen_processor_csd {
+ uint32_t domain; /* domain number of one dependent group */
+ uint32_t coord_type; /* coordination type */
+ uint32_t num; /* number of processors in same domain */
+};
+DEFINE_GUEST_HANDLE_STRUCT(xen_processor_csd);
+
+struct xen_processor_cx {
+ struct xen_power_register reg; /* GAS for Cx trigger register */
+ uint8_t type; /* cstate value, c0: 0, c1: 1, ... */
+ uint32_t latency; /* worst latency (ms) to enter/exit this cstate */
+ uint32_t power; /* average power consumption(mW) */
+ uint32_t dpcnt; /* number of dependency entries */
+ GUEST_HANDLE(xen_processor_csd) dp; /* NULL if no dependency */
+};
+DEFINE_GUEST_HANDLE_STRUCT(xen_processor_cx);
+
+struct xen_processor_flags {
+ uint32_t bm_control:1;
+ uint32_t bm_check:1;
+ uint32_t has_cst:1;
+ uint32_t power_setup_done:1;
+ uint32_t bm_rld_set:1;
+};
+
+struct xen_processor_power {
+ uint32_t count; /* number of C state entries in array below */
+ struct xen_processor_flags flags; /* global flags of this processor */
+ GUEST_HANDLE(xen_processor_cx) states; /* supported c states */
+};
+
+struct xen_pct_register {
+ uint8_t descriptor;
+ uint16_t length;
+ uint8_t space_id;
+ uint8_t bit_width;
+ uint8_t bit_offset;
+ uint8_t reserved;
+ uint64_t address;
+};
+
+struct xen_processor_px {
+ uint64_t core_frequency; /* megahertz */
+ uint64_t power; /* milliWatts */
+ uint64_t transition_latency; /* microseconds */
+ uint64_t bus_master_latency; /* microseconds */
+ uint64_t control; /* control value */
+ uint64_t status; /* success indicator */
+};
+DEFINE_GUEST_HANDLE_STRUCT(xen_processor_px);
+
+struct xen_psd_package {
+ uint64_t num_entries;
+ uint64_t revision;
+ uint64_t domain;
+ uint64_t coord_type;
+ uint64_t num_processors;
+};
+
+struct xen_processor_performance {
+ uint32_t flags; /* flag for Px sub info type */
+ uint32_t platform_limit; /* Platform limitation on freq usage */
+ struct xen_pct_register control_register;
+ struct xen_pct_register status_register;
+ uint32_t state_count; /* total available performance states */
+ GUEST_HANDLE(xen_processor_px) states;
+ struct xen_psd_package domain_info;
+ uint32_t shared_type; /* coordination type of this processor */
+};
+DEFINE_GUEST_HANDLE_STRUCT(xen_processor_performance);
+
+struct xenpf_set_processor_pminfo {
+ /* IN variables */
+ uint32_t id; /* ACPI CPU ID */
+ uint32_t type; /* {XEN_PM_CX, XEN_PM_PX} */
+ union {
+ struct xen_processor_power power;/* Cx: _CST/_CSD */
+ struct xen_processor_performance perf; /* Px: _PPC/_PCT/_PSS/_PSD */
+ };
+};
+DEFINE_GUEST_HANDLE_STRUCT(xenpf_set_processor_pminfo);
+
+struct xen_platform_op {
+ uint32_t cmd;
+ uint32_t interface_version; /* XENPF_INTERFACE_VERSION */
+ union {
+ struct xenpf_settime settime;
+ struct xenpf_add_memtype add_memtype;
+ struct xenpf_del_memtype del_memtype;
+ struct xenpf_read_memtype read_memtype;
+ struct xenpf_microcode_update microcode;
+ struct xenpf_platform_quirk platform_quirk;
+ struct xenpf_firmware_info firmware_info;
+ struct xenpf_enter_acpi_sleep enter_acpi_sleep;
+ struct xenpf_change_freq change_freq;
+ struct xenpf_getidletime getidletime;
+ struct xenpf_set_processor_pminfo set_pminfo;
+ uint8_t pad[128];
+ } u;
+};
+DEFINE_GUEST_HANDLE_STRUCT(xen_platform_op_t);
+
+#endif /* __XEN_PUBLIC_PLATFORM_H__ */
diff --git a/include/xen/interface/xen.h b/include/xen/interface/xen.h
index 70213b4..d83cc08 100644
--- a/include/xen/interface/xen.h
+++ b/include/xen/interface/xen.h
@@ -453,6 +453,7 @@ struct start_info {
/* These flags are passed in the 'flags' field of start_info_t. */
#define SIF_PRIVILEGED (1<<0) /* Is the domain privileged? */
#define SIF_INITDOMAIN (1<<1) /* Is this the initial control domain? */
+#define SIF_PM_MASK (0xFF<<8) /* reserve 1 byte for xen-pm options */
typedef uint64_t cpumap_t;
--
1.7.4.1
^ permalink raw reply related
* [PATCH 4/7] xen: Utilize the restore_msi_irqs hook.
From: Konrad Rzeszutek Wilk @ 2011-08-31 18:31 UTC (permalink / raw)
To: x86, tglx, tboot-devel, shane.wang, linux-pm, linux-acpi,
len.brown
Cc: xen-devel, Konrad Rzeszutek Wilk
In-Reply-To: <1314815484-4668-1-git-send-email-konrad.wilk@oracle.com>
to make a hypercall to restore the vectors in the MSI/MSI-X
configuration space.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
arch/x86/pci/xen.c | 12 ++++++++++++
include/xen/interface/physdev.h | 7 +++++++
2 files changed, 19 insertions(+), 0 deletions(-)
diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
index f567965..f140999 100644
--- a/arch/x86/pci/xen.c
+++ b/arch/x86/pci/xen.c
@@ -241,6 +241,17 @@ static int xen_initdom_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
out:
return ret;
}
+
+static void xen_initdom_restore_msi_irqs(struct pci_dev *dev, int irq)
+{
+ int ret = 0;
+ struct physdev_restore_msi restore;
+
+ restore.bus = dev->bus->number;
+ restore.devfn = dev->devfn;
+ ret = HYPERVISOR_physdev_op(PHYSDEVOP_restore_msi, &restore);
+ WARN(ret && ret != -ENOSYS, "restore_msi -> %d\n", ret);
+}
#endif
#endif
@@ -458,6 +469,7 @@ static int __init pci_xen_initial_domain(void)
#ifdef CONFIG_PCI_MSI
x86_msi.setup_msi_irqs = xen_initdom_setup_msi_irqs;
x86_msi.teardown_msi_irq = xen_teardown_msi_irq;
+ x86_msi.restore_msi_irqs = xen_initdom_restore_msi_irqs;
#endif
xen_setup_acpi_sci();
__acpi_register_gsi = acpi_register_gsi_xen;
diff --git a/include/xen/interface/physdev.h b/include/xen/interface/physdev.h
index 534cac8..44aefa9 100644
--- a/include/xen/interface/physdev.h
+++ b/include/xen/interface/physdev.h
@@ -144,6 +144,13 @@ struct physdev_manage_pci {
uint8_t devfn;
};
+#define PHYSDEVOP_restore_msi 19
+struct physdev_restore_msi {
+ /* IN */
+ uint8_t bus;
+ uint8_t devfn;
+};
+
#define PHYSDEVOP_manage_pci_add_ext 20
struct physdev_manage_pci_ext {
/* IN */
--
1.7.4.1
^ permalink raw reply related
* [PATCH 3/7] x86/acpi/sleep: Provide registration for acpi_suspend_lowlevel.
From: Konrad Rzeszutek Wilk @ 2011-08-31 18:31 UTC (permalink / raw)
To: x86, tglx, tboot-devel, shane.wang, linux-pm, linux-acpi,
len.brown
Cc: xen-devel, Konrad Rzeszutek Wilk
In-Reply-To: <1314815484-4668-1-git-send-email-konrad.wilk@oracle.com>
From: Liang Tang <liang.tang@oracle.com>
Which by default will be x86_acpi_suspend_lowlevel.
This registration allows us to register another callback
if there is a need to use another platform specific callback.
CC: Thomas Gleixner <tglx@linutronix.de>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: x86@kernel.org
CC: Len Brown <len.brown@intel.com>
CC: Joseph Cihula <joseph.cihula@intel.com>
CC: Shane Wang <shane.wang@intel.com>
CC: linux-pm@lists.linux-foundation.org
CC: linux-acpi@vger.kernel.org
CC: Len Brown <len.brown@intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Liang Tang <liang.tang@oracle.com>
---
arch/x86/include/asm/acpi.h | 2 +-
arch/x86/kernel/acpi/boot.c | 2 ++
arch/x86/kernel/acpi/sleep.c | 4 ++--
arch/x86/kernel/acpi/sleep.h | 2 ++
drivers/acpi/sleep.c | 2 ++
5 files changed, 9 insertions(+), 3 deletions(-)
diff --git a/arch/x86/include/asm/acpi.h b/arch/x86/include/asm/acpi.h
index 49864a1..a5f0b73 100644
--- a/arch/x86/include/asm/acpi.h
+++ b/arch/x86/include/asm/acpi.h
@@ -118,7 +118,7 @@ static inline void acpi_disable_pci(void)
}
/* Low-level suspend routine. */
-extern int acpi_suspend_lowlevel(void);
+extern int (*acpi_suspend_lowlevel)(void);
extern const unsigned char acpi_wakeup_code[];
#define acpi_wakeup_address (__pa(TRAMPOLINE_SYM(acpi_wakeup_code)))
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index d191b4c..92f4f38 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -44,6 +44,7 @@
#include <asm/mpspec.h>
#include <asm/smp.h>
+#include "sleep.h" /* To include x86_acpi_suspend_lowlevel */
static int __initdata acpi_force = 0;
u32 acpi_rsdt_forced;
int acpi_disabled;
@@ -555,6 +556,7 @@ int (*__acpi_register_gsi)(struct device *dev, u32 gsi,
int (*__acpi_override_sleep)(u8 sleep_state, u32 pm1a_ctrl,
u32 pm1b_ctrl, bool *skip_rest) = NULL;
+int (*acpi_suspend_lowlevel)(void) = x86_acpi_suspend_lowlevel;
/*
* success: return IRQ number (>=0)
* failure: return < 0
diff --git a/arch/x86/kernel/acpi/sleep.c b/arch/x86/kernel/acpi/sleep.c
index 103b6ab..4d2d0b1 100644
--- a/arch/x86/kernel/acpi/sleep.c
+++ b/arch/x86/kernel/acpi/sleep.c
@@ -25,12 +25,12 @@ static char temp_stack[4096];
#endif
/**
- * acpi_suspend_lowlevel - save kernel state
+ * x86_acpi_suspend_lowlevel - save kernel state
*
* Create an identity mapped page table and copy the wakeup routine to
* low memory.
*/
-int acpi_suspend_lowlevel(void)
+int x86_acpi_suspend_lowlevel(void)
{
struct wakeup_header *header;
/* address in low memory of the wakeup routine. */
diff --git a/arch/x86/kernel/acpi/sleep.h b/arch/x86/kernel/acpi/sleep.h
index 416d4be..4d3feb5 100644
--- a/arch/x86/kernel/acpi/sleep.h
+++ b/arch/x86/kernel/acpi/sleep.h
@@ -13,3 +13,5 @@ extern unsigned long acpi_copy_wakeup_routine(unsigned long);
extern void wakeup_long64(void);
extern void do_suspend_lowlevel(void);
+
+extern int x86_acpi_suspend_lowlevel(void);
diff --git a/drivers/acpi/sleep.c b/drivers/acpi/sleep.c
index 6c94960..a6da454 100644
--- a/drivers/acpi/sleep.c
+++ b/drivers/acpi/sleep.c
@@ -254,6 +254,8 @@ static int acpi_suspend_enter(suspend_state_t pm_state)
break;
case ACPI_STATE_S3:
+ if (!acpi_suspend_lowlevel)
+ return -ENODEV;
error = acpi_suspend_lowlevel();
if (error)
return error;
--
1.7.4.1
^ permalink raw reply related
* [PATCH 2/7] x86, acpi, tboot: Have a ACPI sleep override instead of calling tboot_sleep.
From: Konrad Rzeszutek Wilk @ 2011-08-31 18:31 UTC (permalink / raw)
To: x86, tglx, tboot-devel, shane.wang, linux-pm, linux-acpi,
len.brown
Cc: xen-devel, Konrad Rzeszutek Wilk
In-Reply-To: <1314815484-4668-1-git-send-email-konrad.wilk@oracle.com>
The ACPI suspend path makes a call to tboot_sleep right before
it writes the PM1A, PM1B values. We replace the direct call to
tboot via an registration callback similar to __acpi_register_gsi.
CC: Thomas Gleixner <tglx@linutronix.de>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: x86@kernel.org
CC: Len Brown <len.brown@intel.com>
CC: Joseph Cihula <joseph.cihula@intel.com>
CC: Shane Wang <shane.wang@intel.com>
CC: xen-devel@lists.xensource.com
CC: linux-pm@lists.linux-foundation.org
CC: tboot-devel@lists.sourceforge.net
CC: linux-acpi@vger.kernel.org
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
arch/x86/include/asm/acpi.h | 3 +++
arch/x86/kernel/acpi/boot.c | 3 +++
arch/x86/kernel/tboot.c | 13 +++++++++----
drivers/acpi/acpica/hwsleep.c | 12 ++++++++++--
include/linux/tboot.h | 3 ++-
5 files changed, 27 insertions(+), 7 deletions(-)
diff --git a/arch/x86/include/asm/acpi.h b/arch/x86/include/asm/acpi.h
index 610001d..49864a1 100644
--- a/arch/x86/include/asm/acpi.h
+++ b/arch/x86/include/asm/acpi.h
@@ -98,6 +98,9 @@ void acpi_pic_sci_set_trigger(unsigned int, u16);
extern int (*__acpi_register_gsi)(struct device *dev, u32 gsi,
int trigger, int polarity);
+extern int (*__acpi_override_sleep)(u8 sleep_state, u32 pm1a_ctrl,
+ u32 pm1b_ctrl, bool *skip_rest);
+
static inline void disable_acpi(void)
{
acpi_disabled = 1;
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 4558f0d..d191b4c 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -552,6 +552,9 @@ static int acpi_register_gsi_ioapic(struct device *dev, u32 gsi,
int (*__acpi_register_gsi)(struct device *dev, u32 gsi,
int trigger, int polarity) = acpi_register_gsi_pic;
+int (*__acpi_override_sleep)(u8 sleep_state, u32 pm1a_ctrl,
+ u32 pm1b_ctrl, bool *skip_rest) = NULL;
+
/*
* success: return IRQ number (>=0)
* failure: return < 0
diff --git a/arch/x86/kernel/tboot.c b/arch/x86/kernel/tboot.c
index 30ac65d..a18070c 100644
--- a/arch/x86/kernel/tboot.c
+++ b/arch/x86/kernel/tboot.c
@@ -41,7 +41,7 @@
#include <asm/setup.h>
#include <asm/e820.h>
#include <asm/io.h>
-
+#include <linux/acpi.h>
#include "acpi/realmode/wakeup.h"
/* Global pointer to shared data; NULL means no measured launch. */
@@ -270,7 +270,8 @@ static void tboot_copy_fadt(const struct acpi_table_fadt *fadt)
offsetof(struct acpi_table_facs, firmware_waking_vector);
}
-void tboot_sleep(u8 sleep_state, u32 pm1a_control, u32 pm1b_control)
+int tboot_sleep(u8 sleep_state, u32 pm1a_control, u32 pm1b_control,
+ bool *skip_rest)
{
static u32 acpi_shutdown_map[ACPI_S_STATE_COUNT] = {
/* S0,1,2: */ -1, -1, -1,
@@ -279,7 +280,7 @@ void tboot_sleep(u8 sleep_state, u32 pm1a_control, u32 pm1b_control)
/* S5: */ TB_SHUTDOWN_S5 };
if (!tboot_enabled())
- return;
+ return AE_OK;
tboot_copy_fadt(&acpi_gbl_FADT);
tboot->acpi_sinfo.pm1a_cnt_val = pm1a_control;
@@ -290,10 +291,12 @@ void tboot_sleep(u8 sleep_state, u32 pm1a_control, u32 pm1b_control)
if (sleep_state >= ACPI_S_STATE_COUNT ||
acpi_shutdown_map[sleep_state] == -1) {
pr_warning("unsupported sleep state 0x%x\n", sleep_state);
- return;
+ return AE_ERROR;
}
tboot_shutdown(acpi_shutdown_map[sleep_state]);
+
+ return AE_OK;
}
static atomic_t ap_wfs_count;
@@ -343,6 +346,8 @@ static __init int tboot_late_init(void)
atomic_set(&ap_wfs_count, 0);
register_hotcpu_notifier(&tboot_cpu_notifier);
+
+ __acpi_override_sleep = tboot_sleep;
return 0;
}
diff --git a/drivers/acpi/acpica/hwsleep.c b/drivers/acpi/acpica/hwsleep.c
index 2ac28bb..31d1198 100644
--- a/drivers/acpi/acpica/hwsleep.c
+++ b/drivers/acpi/acpica/hwsleep.c
@@ -45,7 +45,6 @@
#include <acpi/acpi.h>
#include "accommon.h"
#include "actables.h"
-#include <linux/tboot.h>
#define _COMPONENT ACPI_HARDWARE
ACPI_MODULE_NAME("hwsleep")
@@ -343,8 +342,17 @@ acpi_status asmlinkage acpi_enter_sleep_state(u8 sleep_state)
ACPI_FLUSH_CPU_CACHE();
- tboot_sleep(sleep_state, pm1a_control, pm1b_control);
+ if (__acpi_override_sleep) {
+ bool skip_rest = false;
+ status = __acpi_override_sleep(sleep_state, pm1a_control,
+ pm1b_control, &skip_rest);
+
+ if (ACPI_FAILURE(status))
+ return_ACPI_STATUS(status);
+ if (skip_rest)
+ return_ACPI_STATUS(AE_OK);
+ }
/* Write #2: Write both SLP_TYP + SLP_EN */
status = acpi_hw_write_pm1_control(pm1a_control, pm1b_control);
diff --git a/include/linux/tboot.h b/include/linux/tboot.h
index 1dba6ee..19badbd 100644
--- a/include/linux/tboot.h
+++ b/include/linux/tboot.h
@@ -143,7 +143,8 @@ static inline int tboot_enabled(void)
extern void tboot_probe(void);
extern void tboot_shutdown(u32 shutdown_type);
-extern void tboot_sleep(u8 sleep_state, u32 pm1a_control, u32 pm1b_control);
+extern int tboot_sleep(u8 sleep_state, u32 pm1a_control, u32 pm1b_control,
+ bool *skip);
extern struct acpi_table_header *tboot_get_dmar_table(
struct acpi_table_header *dmar_tbl);
extern int tboot_force_iommu(void);
--
1.7.4.1
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox