Linux ACPI

Linux ACPI
 help / color / mirror / Atom feed

* Re: USB runtime D3
From: Alan Stern @ 2009-07-30 18:57 UTC (permalink / raw)
  To: Matthew Garrett; +Cc: linux-usb, linux-acpi
In-Reply-To: <20090730020623.GB26389@srcf.ucam.org>

On Thu, 30 Jul 2009, Matthew Garrett wrote:

> I've been playing with putting host controllers into D3 when the root 
> hub is suspended, with a certain amount of success. My current 
> implementation simply calls the suspend and resume functions in the HCD 
> device's pm struct on bus_suspend and bus_resume. It's also necessary to 
> flag the ACPI GPEs for the HCDs as runtime and enabled in order to get 
> wakeup events. This is good enough for already plugged devices to work, 
> and running lsusb powers up any idle ports and notices that devices have 
> been plugged in. I also get an event when devices are unplugged.
> 
> This is obviously not entirely ideal, for a couple of reasons.
> 
> The first is that UHCI works perfectly providing I provide a remote 
> wakeup. An SCI is fired via ACPI, a notification is sent to the 
> appropriate device and we resume it quite happily. Unplugs also trigger 
> the ACPI event. However, plugging in doesn't generate any kind of wakeup 
> event, even though port0en and port1n are set in USB_RES. Is there any 
> way to get UHCI to do this? I'm guessing that the power is being cut to 
> the port when it suspends with no device connected.

No, that doesn't sound right.  On the other hand, there's only one way 
to find out for certain.  Voltmeters do come in handy at times...

UHCI isn't very flexible or configurable.  It doesn't sound like 
there's anything more you can do about this.

> The second is that EHCI generates a PME# notification on device plugin. 
> This triggers an SCI and the PME# notification shows up as a GPE event. 
> This causes the kernel to evaluate _L0D. On this Dell, that looks 
> something like this:
> 
> _L0D {
> 	SMI(foo, Local0)
> 	if (Local0 & 0x1)
> 		Notify(some hardware)
> 	if (Local0 & 0x2)
> 		Notify(EHCI)
> 	if (Local0 & 0x4)
> 		Notify(EHCI 2)
> }
> 
> However, Local0 ends up being 0 and no notification is fired. I'm 
> therefore not able to acknowledge the PME and it fires again. This is 
> obviously not ideal. Does anyone have any idea why this might be? The 
> fact that this ends up in SMI code obviously makes things more awkward.
> 
> I've thought of a way that this could be made to work, but it's kind of 
> hacky. A generic GPE handler could be registered and then scan all PCIe 
> devices for a raised PME bit.

Something like that is probably needed anyway -- for all PCI devices, 
not just PCIe.  You have to do this on systems that support PCI-PM but 
don't have ACPI.

> It could then flag it and send a 
> notification to the attached driver. EHCI could then force a rescan of 
> all currently powered off USB devices.

It should send the notification for each such device to the bus
subsystem (i.e., the PCI core).  The bus is then responsible for
telling the drivers what to do.

> The downside to this (other than it being a hack) is that I can't see an 
> obvious way to work out what the appropriate GPE is. On Intel, internal 
> PCI devices generate an event on GPE 0xd - but the PCI object only 
> claims 0xb in its _PRW object.
> 
> Anyone have any ideas? I've attached the current version of my code, 
> which is hacked up in various ways as I try to understand what's going 
> on but should give some idea what I'm trying.

Ugh...  Hacked is right.  I can't comment on the ACPI stuff, but the
USB parts are a mess.  You put stuff in the USB core that really
belongs in the PCI core and you put stuff in the kernel that belongs in
userspace.  Of course, there's nothing wrong with doing this as part of
a "proof-of-principle" thing.

Alan Stern


^ permalink raw reply

* Re: USB runtime D3
From: Alan Stern @ 2009-07-30 18:58 UTC (permalink / raw)
  To: Matthew Garrett
  Cc: Shaohua Li, linux-usb@vger.kernel.org, linux-acpi@vger.kernel.org
In-Reply-To: <20090730025657.GE26389@srcf.ucam.org>

On Thu, 30 Jul 2009, Matthew Garrett wrote:

> Yes, that would basically work for me. I'm still slightly curious as to 
> why my hardware's not giving me the correct ACPI notification, though - 
> I guess this may be entirely untested under Windows.

To put it mildly...

Alan Stern


^ permalink raw reply

* Re: USB runtime D3
From: Matthew Garrett @ 2009-07-30 19:02 UTC (permalink / raw)
  To: Alan Stern
  Cc: Shaohua Li, linux-usb-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <Pine.LNX.4.44L0.0907301458090.6087-100000-IYeN2dnnYyZXsRXLowluHWD2FQJk+8+b@public.gmane.org>

On Thu, Jul 30, 2009 at 02:58:26PM -0400, Alan Stern wrote:
> On Thu, 30 Jul 2009, Matthew Garrett wrote:
> 
> > Yes, that would basically work for me. I'm still slightly curious as to 
> > why my hardware's not giving me the correct ACPI notification, though - 
> > I guess this may be entirely untested under Windows.
> 
> To put it mildly...

Windows doesn't D3 the host?

-- 
Matthew Garrett | mjg59-1xO5oi07KQx4cg9Nei1l7Q@public.gmane.org
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: USB runtime D3
From: Matthew Garrett @ 2009-07-30 19:07 UTC (permalink / raw)
  To: Alan Stern
  Cc: linux-usb-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <Pine.LNX.4.44L0.0907301447080.6087-100000-IYeN2dnnYyZXsRXLowluHWD2FQJk+8+b@public.gmane.org>

On Thu, Jul 30, 2009 at 02:57:48PM -0400, Alan Stern wrote:
> > event, even though port0en and port1n are set in USB_RES. Is there any 
> > way to get UHCI to do this? I'm guessing that the power is being cut to 
> > the port when it suspends with no device connected.
> 
> No, that doesn't sound right.  On the other hand, there's only one way 
> to find out for certain.  Voltmeters do come in handy at times...

Thinking about it, I get plug events from EHCI, so it can't be that the 
port is powered down. Maybe it's something to do with the port switching 
logic.

> UHCI isn't very flexible or configurable.  It doesn't sound like 
> there's anything more you can do about this.

Which possibly rules out the ability to do this.

> > I've thought of a way that this could be made to work, but it's kind of 
> > hacky. A generic GPE handler could be registered and then scan all PCIe 
> > devices for a raised PME bit.
> 
> Something like that is probably needed anyway -- for all PCI devices, 
> not just PCIe.  You have to do this on systems that support PCI-PM but 
> don't have ACPI.

Yes, they'll need to have their own platform hook into this.

> > Anyone have any ideas? I've attached the current version of my code, 
> > which is hacked up in various ways as I try to understand what's going 
> > on but should give some idea what I'm trying.
> 
> Ugh...  Hacked is right.  I can't comment on the ACPI stuff, but the
> USB parts are a mess.  You put stuff in the USB core that really
> belongs in the PCI core and you put stuff in the kernel that belongs in
> userspace.  Of course, there's nothing wrong with doing this as part of
> a "proof-of-principle" thing.

Yes, I guess the correct model is for this to be in PCI and have the USB 
code do nothing other than indicate that the PCI device is now idle.

-- 
Matthew Garrett | mjg59-1xO5oi07KQx4cg9Nei1l7Q@public.gmane.org
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: USB runtime D3
From: Alan Stern @ 2009-07-30 19:15 UTC (permalink / raw)
  To: Matthew Garrett
  Cc: Shaohua Li, linux-usb-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <20090730190243.GA22016-1xO5oi07KQx4cg9Nei1l7Q@public.gmane.org>

On Thu, 30 Jul 2009, Matthew Garrett wrote:

> On Thu, Jul 30, 2009 at 02:58:26PM -0400, Alan Stern wrote:
> > On Thu, 30 Jul 2009, Matthew Garrett wrote:
> > 
> > > Yes, that would basically work for me. I'm still slightly curious as to 
> > > why my hardware's not giving me the correct ACPI notification, though - 
> > > I guess this may be entirely untested under Windows.
> > 
> > To put it mildly...
> 
> Windows doesn't D3 the host?

As far as I know, Windows doesn't even suspend the root hub.  However 
my knowledge of Windows internals is out of date by at least 10 years.  
Thank goodness...  :-)

Alan Stern

P.S.: Besides which, you can't put these Intel UHCI controllers into D3
because they don't support PCI PM.  Controllers from other vendors
(i.e., VIA) do.

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: USB runtime D3
From: Alan Stern @ 2009-07-30 19:22 UTC (permalink / raw)
  To: Matthew Garrett; +Cc: linux-usb, linux-acpi
In-Reply-To: <20090730190716.GB22016@srcf.ucam.org>

On Thu, 30 Jul 2009, Matthew Garrett wrote:

> On Thu, Jul 30, 2009 at 02:57:48PM -0400, Alan Stern wrote:
> > > event, even though port0en and port1n are set in USB_RES. Is there any 
> > > way to get UHCI to do this? I'm guessing that the power is being cut to 
> > > the port when it suspends with no device connected.
> > 
> > No, that doesn't sound right.  On the other hand, there's only one way 
> > to find out for certain.  Voltmeters do come in handy at times...
> 
> Thinking about it, I get plug events from EHCI, so it can't be that the 
> port is powered down. Maybe it's something to do with the port switching 
> logic.

Seems likely.  What happens if you suspend the UHCI controller but 
leave the EHCI controller active?  Then the port-switching logic should 
kick in.

> > > Anyone have any ideas? I've attached the current version of my code, 
> > > which is hacked up in various ways as I try to understand what's going 
> > > on but should give some idea what I'm trying.
> > 
> > Ugh...  Hacked is right.  I can't comment on the ACPI stuff, but the
> > USB parts are a mess.  You put stuff in the USB core that really
> > belongs in the PCI core and you put stuff in the kernel that belongs in
> > userspace.  Of course, there's nothing wrong with doing this as part of
> > a "proof-of-principle" thing.
> 
> Yes, I guess the correct model is for this to be in PCI and have the USB 
> code do nothing other than indicate that the PCI device is now idle.

Which reminds me...  One of the things you had to do was enable remote
wakeup for the host controllers.  The current initial state is
disabled, for a good reason.  People don't like it if they suspend
their laptop only to find that the computer wakes back up again when
they unplug the USB mouse.

If we do end up implementing runtime power management for USB host
controllers, something (a userspace program?) will have to turn off
remote wakeup before system sleeps and turn it back on afterward.

Alan Stern

^ permalink raw reply

* Re: USB runtime D3
From: Matthew Garrett @ 2009-07-30 19:23 UTC (permalink / raw)
  To: Alan Stern
  Cc: Shaohua Li, linux-usb@vger.kernel.org, linux-acpi@vger.kernel.org
In-Reply-To: <Pine.LNX.4.44L0.0907301511030.6087-100000@iolanthe.rowland.org>

On Thu, Jul 30, 2009 at 03:15:46PM -0400, Alan Stern wrote:

> P.S.: Besides which, you can't put these Intel UHCI controllers into D3
> because they don't support PCI PM.  Controllers from other vendors
> (i.e., VIA) do.

There's an out of band wakeup mechanism via ACPI for the Intel ones. It 
works for remote wakeups and disconnects - it's the connects I'm having 
trouble with. Given that all PCI PM events are delivered via ACPI on 
standard x86, the only difference it makes from an implementation 
viewpoint is whether or not you have to enable the PME bits.

-- 
Matthew Garrett | mjg59@srcf.ucam.org

^ permalink raw reply

* Re: USB runtime D3
From: Matthew Garrett @ 2009-07-30 19:26 UTC (permalink / raw)
  To: Alan Stern; +Cc: linux-usb, linux-acpi
In-Reply-To: <Pine.LNX.4.44L0.0907301516100.6087-100000@iolanthe.rowland.org>

On Thu, Jul 30, 2009 at 03:22:32PM -0400, Alan Stern wrote:

> Seems likely.  What happens if you suspend the UHCI controller but 
> leave the EHCI controller active?  Then the port-switching logic should 
> kick in.

Yeah, I'll give that a go.

> Which reminds me...  One of the things you had to do was enable remote
> wakeup for the host controllers.  The current initial state is
> disabled, for a good reason.  People don't like it if they suspend
> their laptop only to find that the computer wakes back up again when
> they unplug the USB mouse.
> 
> If we do end up implementing runtime power management for USB host
> controllers, something (a userspace program?) will have to turn off
> remote wakeup before system sleeps and turn it back on afterward.

I think we'll probably want a call for "transitioning from runtime 
suspend to system suspend", at which point the driver can clean that up 
itself.

-- 
Matthew Garrett | mjg59@srcf.ucam.org

^ permalink raw reply

* Re: [Xen-devel] [PATCH][pvops_dom0][2/4] Introduce the external control operation interface for domain0 ACPI parser
From: Jeremy Fitzhardinge @ 2009-07-30 20:36 UTC (permalink / raw)
  To: Len Brown; +Cc: Yu, Ke, linux-acpi@vger.kernel.org, Tian, Kevin
In-Reply-To: <alpine.LFD.2.00.0907301139530.17818@localhost.localdomain>

On 07/30/09 09:00, Len Brown wrote:
>>> My understanding is that all that code is supposed to be
>>> kernel-independent.  We could lift all that code as-is, write some new
>>> kernel interface shims and shove it all into Xen.   But I don't know if
>>> that solves any more problems than it causes.
>>>       
>
> Many have incorporated ACPICA into their OS, from BeOS to BSD, 
> Solaris to Linux.  And I'm not going to tell you that you can't
> do the same and make the xen hypervisor into an OS that knows
> about both policy and the hardware it is running on.
>
> However, ACPI != all the ACPI code in Linux.  ACPICA is the
> stuff in the drivers/acpi/acpica directory, and nothing else
> (asside from a few header files outside that directory)
>
> Also, I'm not sure the OS you build will be competitive with
> other OS's when you're done.
>   

Yes.  The general intent is that Xen tries to avoid having to
reimplement stuff that the guest operating systems are already much
better at.  However, it seems that ACPI's
>   
>>>>> s/extcntl/xen/ to make it clear why this code exists --
>>>>> or is there an expected "external control" other than Xen?
>>>>>           
> ...
>   
>>> I don't think that's a good idea.  If we need to do that, then there's a
>>> deeper design problem we need to address.  (And, if nothing else, people
>>> have become hypersensitive to naked Xen-specific code references in
>>> non-Xen files, and I'm tired of having those arguments.)
>>>       
>> Can you explain what the deeper design problem is?
>>     
>
> Obfuscating the code by calling a Xen-specific abstraction
> "extcntl" instead of "xen" will not fly.
>
> Jeremy's desire to create and use generic abstractions that have multliple 
> users is a good thing.  It simply does not apply here.
>   

OK.  I don't really understand ACPIs overall design or philosophy,
beyond a rough idea of what kinds of things it gets used for.  If there
really is no scope for refactoring things to make Ke's changes fit in
more naturally, then I guess we can live with some explicit hooks.

> I have no objection to having a xen-specific hook being called xen_*
> To do otherwise would be a dis-service to future maintainers of the code.
>   

I agree completely.  I have no interest in extra layers of
indirection/"abstraction" which are just disguises rather than having
some inherent value.

    J

^ permalink raw reply

* Re: [Xen-devel] [PATCH][pvops_dom0][2/4] Introduce the external control operation interface for domain0 ACPI parser
From: Jeremy Fitzhardinge @ 2009-07-30 20:52 UTC (permalink / raw)
  To: Len Brown; +Cc: Yu, Ke, linux-acpi@vger.kernel.org, Tian, Kevin, Xen-devel
In-Reply-To: <alpine.LFD.2.00.0907301121080.17818@localhost.localdomain>

On 07/30/09 08:37, Len Brown wrote:
> I agree with Kevin that it would be a mistake to put ACPI both into
> both dom0 and the hypervisor.  Frankly, on many levels, ACPI was
> designed with Windows in mind, and the further an OS strays from
> how Windows does things, the less likely you'll run well on many 
> systems.  Obviously, Xen looks nothing like windows, or any other OS,
> for it seems to have not one division between implementation and
> policy, but multiple...
>
> So I have a fundamental lack of understanding of the logic
> behind the partitioning behind the hypervisor and dom0.
> Maybe somebody explain it to me in terms that I'll understand?
>   

The basic idea is that Xen controls the things it must, and leaves
everything else to guest kernels.  At heart that means it controls the
physical CPUs (which includes things like local APICs) and memory (by
maintaining control over the CPU's paging hardware via the pagetables).

Everything beyond that is left to guest domains which handle various
responsibilities; informally we refer to "dom0" which is "the"
privileged domain which handles things like hardware discovery, device
drivers, domain creation, and a number of other services. However
there's no inherent reason why all these jobs must be aggregated
together.  For example, there's active work on having specific "driver
domains" which have responsibility for a specific piece of hardware or
class of hardware, but don't (and can't) do any of the other
"privileged" jobs.

> It reminds me of the partitioning between the Mach microkernel
> and the "user-space OS personality, eg Unix".  This looked really neat
> in proposals for funding and academic papers, but in reality it turned
> out to have little value other than employing programmers to
> re-invent the wheel, only to discover that the original round
> wheel was better than the square one that they produced...
>   

There are some parallels, and there's even a paper with a title
something like "hypervisors: microkernels done right".  I think the
rough sketch of the argument is that a Mach-like system with lots of
server processes fails in practise because its a poor match for the APIs
that people actually want to use and are used to, and if you want
Unix-like functionality then you just need to put a Unix in there. 
Xen's breakdown is more along the lines of the hardware architecture
itself, and therefore is in some sense more "natural": kernels which run
processes are real kernels, with the full programming interfaces
everyone wants to use; shared resources like CPU and memory can be
multiplexed fairly easily using well-understood techniques; with some
extra work, you can even let guest domains have direct hardware access
using their normal device drivers so that the hypervisor doesn't need to
deal with them.

ACPI - which wears many hats affecting many aspects of the system -
doesn't have any neat dividing lines, and doesn't follow the contours of
the underlying architecture, and so appears to be a poor match for a
Xen-like model.  If it were possible to partition ACPI into broad
function groupings, then maybe it would be possible to get a better fit,
but I'm not sure whether that's possible.

In general Xen doesn't have much use for ACPI; aside from this specific
case of needing to know how to control the CPUs for power management, it
doesn't really care about ACPI's services.  It doesn't do interrupt
routing, it doesn't really need to know about thermal management, any
anything it does need to know it can be told by the kernel which does
have a much greater interest in ACPI.  S3 suspend/resume needs to go via
Xen for the final stage, but aside from that it can all be handled in Linux.

    J

^ permalink raw reply

* Re: [Bug 13865] [PATCH] hp-wmi: check that an input device exists in resume handler
From: Len Brown @ 2009-07-30 21:17 UTC (permalink / raw)
  To: Matthew Garrett
  Cc: Frans Pop, Rafael J. Wysocki, bugzilla-daemon,
	ACPI Devel Maling List, pm list, LKML, Cédric Godin
In-Reply-To: <20090730123711.GB14127@srcf.ucam.org>

[-- Attachment #1: Type: TEXT/PLAIN, Size: 317 bytes --]

> > Reported-and-tested-by: Cédric Godin <cedric@belbone.be>
> > Signed-off-by: Frans Pop <elendil@planet.nl>
> > Cc: Matthew Garrett <mjg59@srcf.ucam.org>

> Acked-by: Matthew Garrett <mjg@redhat.com>

Applied -- and i'll put it on my queue for 2.6.30.stable.

thanks,
Len Brown, Intel Open Source Technology Center

^ permalink raw reply

* Re: [Bugme-new] [Bug 13879] New: Bluetooth stops working after s3 suspend/resume
From: Andrew Morton @ 2009-07-30 21:27 UTC (permalink / raw)
  To: linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	linux-bluetooth-u79uwXL29TY76Z2rM5mHXA
  Cc: bugzilla-daemon-590EEB7GvNiWaY/ihj7yzEB+6BGkLq7r,
	bugme-daemon-590EEB7GvNiWaY/ihj7yzEB+6BGkLq7r,
	pi3mm3-Re5JQEeQqe8AvxtiuMwx3w, John W. Linville, Johannes Berg,
	Henrique de Moraes Holschuh
In-Reply-To: <bug-13879-10286-V0hAGp6uBxO456/isadD/XN4h3HLQggn@public.gmane.org/>


(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

(added lots of cc's)

On Thu, 30 Jul 2009 15:30:31 GMT
bugzilla-daemon-590EEB7GvNiWaY/ihj7yzEB+6BGkLq7r@public.gmane.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=13879
> 
>            Summary: Bluetooth stops working after s3 suspend/resume
>            Product: Drivers
>            Version: 2.5
>     Kernel Version: 2.6.30-gentoo-r1
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Bluetooth
>         AssignedTo: drivers_bluetooth-ztI5WcYan/vQLgFONoPN62D2FQJk+8+b@public.gmane.org
>         ReportedBy: pi3mm3-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
>                 CC: acpi_platform-drivers-ztI5WcYan/vQLgFONoPN62D2FQJk+8+b@public.gmane.org
>         Regression: No
> 
> 
> Created an attachment (id=22540)
>  --> (http://bugzilla.kernel.org/attachment.cgi?id=22540)
> dmesg output
> 
> Distribution: Gentoo 2008.0 64bit
> Hardware Environment: notebook Toshiba Portege M750
> Software Environment: desktop with gnome-2.24.3, udev-141 and dbus-1.2.3
> Problem Description: After an S3 cycle the hci0 device disappears.
> 
> The only way I found to wake it up is:
> 1) patch drivers/platform/x86/toshiba_acpi.c module with the Experimental
> Toshiba Acpi Driver
> (http://memebeam.org/free-software/toshiba_acpi/toshiba_acpi-dev_toshiba_test5-linux_2.6.29.patch)
> 2) run as root "toshset -bluetooth on" (http://schwieters.org/toshset/).
> 

hm, what's all that stuff in
http://memebeam.org/free-software/toshiba_acpi/ and who wrote it and
should we be merging it into mainline?

^ permalink raw reply

* Re: [Xen-devel] [PATCH][pvops_dom0][2/4] Introduce the external control operation interface for domain0 ACPI parser
From: Jeremy Fitzhardinge @ 2009-07-30 22:04 UTC (permalink / raw)
  To: Len Brown; +Cc: Yu, Ke, linux-acpi@vger.kernel.org, Tian, Kevin
In-Reply-To: <alpine.LFD.2.00.0907301204260.17818@localhost.localdomain>

On 07/30/09 09:29, Len Brown wrote:
> Unclear that the power management partitioning between xen hypervisor
> and dom0 is fully baked.
>
> Uncear (to me) what xen is doing internally with these power management 
> objects, and how that differs from what Linux would do.
>   

Yes.  The key thing is that Xen is the only entity which really knows
about physical CPUs and how they're being used, and so is the only thing
which can correctly apply the chosen policy.  If any particular guest
domain did it, it would only take into account that particular domain's
CPU use, and ignore everyone else (or have the extra complexity of
extracting system-wide usage from Xen then applying that to its own policy).

If I understand correctly, the code currently relies on Linux running
the _PSD method with its AML interpreter, and then feeding the results
to Xen as it doesn't have an AML interpreter.  And putting AML into Xen
would be an all-or-nothing proposition, because the entity which runs
AML maintains a lot of state which can't be separated between Xen and
Linux, and can't be shared.

    J

^ permalink raw reply

* Montreal Linux Power Management Mini-Summit, July 13, 2009 - Meeting Notes
From: Len Brown @ 2009-07-30 22:04 UTC (permalink / raw)
  To: Linux Power Management List, linux-acpi; +Cc: Linux Kernel Mailing List
In-Reply-To: <alpine.LFD.2.00.0905211430220.4390@localhost.localdomain>

A Linux Power Management "mini-summit" was held on July 13th, 2009 -
on the first day of the Montreal Linux Symposium.

The Linux Symposium generously provided the facilities.

We repeated the process used in 2008: http://lwn.net/Articles/292447/

This year the meeting room was more accessible to the general attendees
of the Linux Symposium, so we had a fair number of "drop-ins".
25 signed in (listed below) plus a few more that came and went.
While this exceeded our cap of 20, the extra people did not hinder
our goal of focusing on a single discussion.

Attendees
---------
Len Brown - Intel - ACPI, SFI, Suspend co-Maintainer
Howard Alyne - Wind River
Pierre Phaneuf
Rafael J. Wysocki - SUSE Labs/Novell, U. Warsaw;  Hibernate and Suspend Maintainer
Per-Inge Tallberg - Ericsson
Rickard Andersson - Ericsson
Paul Mundt - Renesas - SH Maintainer
Magnus Damm - Renesas
Richard Wooodruff - Texas Instruments, OMAP
Stephen Hui - Zarlink
John Linville - Red Hat - Wireless LAN maintainer
Mark Brown - Marvell
Samuel Thibault - labri.fr
Lucas Nussbaun - inria.fr
Srinivas Sripathi - Motorola
Jason Baron - Red Hat
Aristu Rozanaski - Red Hat - RHEL6 kernel maintainer
Christopher Curtis - RipTide Software
Klaus Pedersen - Nokia
H. Peter Anvin - Intel - x86 maintainer
Ernest Szedeman - Nortel
Rick Leir - Leirtech
David Ahern - Cisco
Wending Wen - Rheinmetall
Jason Chagas - Marvell

Some of the attendees are in photos here:
http://picasaweb.google.com/lenb417/2009LinuxSymposium#

Agenda
------
	1. Review changes over the last year
	2. Survey tools, techniques, workloads
	3. Discuss upcoming work

Summary of Power Management kernel changes since last year
----------------------------------------------------------
ACPI Platform BIOS compatibility fixes
	ACPI ACPI_SCI_EN work-around
	resume memory corruption workarounds

hibernation:
	NVS memory handling
	handle overlapping memory zones

suspend/resume framework re-work (Rafael Wysocki)
	shipped suspend/resume RTC test feature
	ordering update/workaround
	simplified driver interface now available
		r8169 etc. drivers now using it
	PCI PM framework re-worked to simplify drivers
	graphics drivers better support suspend/resume
		i915 video restore, though has bugs
		ATI making progress, especially older cards
		NVIDIA - continues to trail
			no open source support for devices after 7200
power aware scheduling
	sched_mc_power_savings
	per-CPU timers fixed
clock_events_broadcast()
	bugs fixed
	(no longer needed on Westmere, which has always running LAPIC timer)
range timers shipped upstream
	eg. range timers used android to group around wireless

Intel shipped Nehalem (Core i7), which has always-running-TSC

Run Time power management is receiving some attention now.

OMAP (Richard Woodruff)
	2008 had TI releasing aggressive full-off reference code on public portals
		Customers snapshotted this code at different points
		Heavy support burden ramping variants into production
	Linux-OMAP community have been creating a cleaner version of aggressive PM
	code suitable for mainline kernel in Linux-OMAP PM branch.
		Hope of reduced burden for future kernels with mainlined code

ACPI sub-system (Len Brown)
	quality has been the focus for the last year.
	We continue to process about 300 bugs/year
	with 50-60 unresolved at any given time.

Wireless: (John Linville)
	mac-80211 is now suspend/resume aware
	IEEE-80211 has run-time power saving features
		eg. negotiate w/ access point
		starting to deploy in drivers
	beacon filtering (reduces CPU wake-ups)
	TX power upcoming in cfg-80211 API
		Nokia tablets pushing power savings

SH: (Paul Mundt)
	cpuidle integration
	using clocksources & clockevents from upstream
	can switch between timers depending on sleep states
	Hibernate & STR enabled, can test w/ RTC & kexec-jump-and-return

s390:
	added suspend/resume support

5-second boot on Atom netbook for Moblin
	async API is upstream
	Fedora Core-11 boots in 20 seconds on a notebook
		Down from 60 seconds in Fedora Core-10

PM-QOS shipped
	Documentation/power/pm_qos_interface.txt

Survey of Tools, Techniques, workloads for optimizing power management
----------------------------------------------------------------------
powertop
bootchart
bootgraph
CONFIG_POWER_TRACER=y
	LTT-lite
performance counters for energy coming
OMAP uses on-board instrumentation
suspend/resume debug I/F
Power meters:
O(100) Watts Up Pro; O(600) Extech; O(1000) Yokogawa
O(600) HP/Agilent 34401A
OMAP: measure per-power-plane w/ lab instruments
500mA vs uA range difficult to measure w/ precision
	multi-channel DAC - each channel calibrated to range

Workloads for measuring power:

handheld: no standard workloads
	however device vendors have internal benchmarks
	#1 idle
	#2 specific workloads
	#3 combination use-case

SpecPower benchmark for servers (only)

Energy Star for client computers
	idle only
	requires STR to be enabled by default
	Energy Star Server spec coming
	Future Energy Star wants to use energy benchmark
BAPCO
	MobileMark 2007 for Windows
	Apple joined, so expect something new to work also on Apple
	No Linux Distro representation
EEMBC
	released something or other...
BLTK (Battery Life Toolkit) for Linux
	http://www.lesswatts.org/projects/bltk/
	could use refresh
	could use handheld new workloads

Future plans for the PM development, kernel side
-------------------------------------------------
cpuidle C-states generalized to be platform idle states...
	platform driver can hide platform hooks into CPU power states

Runtime PM for Platform Devices.
	2.6.32 framework plan simmering
	SH running on top of prototype now
		context save/restore for power off power domain

	platform devices
		SH specific - Magnus
	IO devices
		eg PCI, USB - Alan Stern

	clock framework (started in ARM, now common on embedded)
		includes ref-counts/clock
		architecture specific implementation
		x86/ACPI system doesn't expose clock dependencies
			so unclear benefit to that arch

Run-time PM of I/O devices, from the PCI POV mostly
	ability to put device into D1/D2 (~200us) /D3 (10ms)

	wakeup: PCIe #PME plug-event via root port
	(PCI #PME is less well specified)

ACPI 4.0 adds D3hot
	Q: has an effect on _SD3?

Hibernate/suspend:

Axiom: we need more people fixing suspend/resume bugs

Suspend2 aka "Tux on Ice"
	Spring 2009 patch set to replace hibernate w/ TOI was
	deemed impractical by upstream community, which prefers
	an incremental approach.

	Since, Nigel has sent specific patches to Rafael along
	the lines of gradual cherry-picking that upstream needs.

	First example is patch to compress hibernation image
	which Rafael thinks can be integrated.

	TOI is able to save larger hibernate images due to
	how it manages memory.  This is a nice benefit and
	we'd like to see if we can do it upstream.

	patch review bandwidth limited
	
	1. image compression
	2. image saving performance
		currently very slow
	3. ability to use multiple devices to save images
		including multiple swaps, and regular files
	4. break the half-of-memory image limitation
	5. Image encryption (solution for keys is an issue)

	It would be great to have Nigel supporting upstream hibernate.

	TOI supports snapshot boot via "kiosk mode"

Hibernate & kexec
	kexec-jump is upstream (i386, SH, no x86_64)
		simplifies memory management of the "jumped to" code
		unclear if any other advantages.

kexec-crash-dump is useful
	can make an oops "look less scary" and be automatic

STR performance
	eliminate console switch
	async device resume

android submitted "auto-suspend" patches
	compromise between low-level and high-level suspend invocation policy.

cpuidle vs auto-suspend
	suspend is more "draconian", it stops timers etc for you.
	platform drivers in cpuidle can get to same place.

Android
	OHA -Open Handset Alliance
		controls android license(s)
		Android = access to app-store
Moblin
	shall support Android applications

OMAP & SH specifics

	UIO - user space codec etc. have no concept of PM
		could use clock framework extension
		(clock framework is accessible via debugfs if necessary)
	interrupt coalescing
	deferred I/O to LCD
		delay until regular (infrequent) update interval
		use x-damage API to track change to visible screen

SH running cpufreq on top of clock framework
	cpufreq has notifiers, clock framework does not

lightweight CPU hotplug
	IBM proposed "idle throttling" approach using scheduler
	Intel is proposing simple "forced idle" RT thread
	PeterZ likes neither implementation, but
	favors the IBM approach in the long term.

	SH SMP wants to run Itron on some cores...
	low latency transition is important

Memory Power Management
	Nokia project w/ U. in Brazil
		more pain than gain in memory offline prototype
	"partial RAM self refresh"

	page tables for kernel memory would allow
		moving kernel physical memory

	memory off-line incompatible with high-performance interleaving

	using NUMA node to segment memory allows tracking
		unused memory
	anti-fragmentation went upstream last year

	consensus: online/offline
		node granularity only

ACPI 4.0 was published
	Error Reporting extensions
	processor aggregator device (forced idle to save power)
	D3hot
	generalized fan support
	thermal extensions
	IPMI op-region

	Len will do a Linux ACPI 4.0 presentation this Fall

virtualization power management
	PM is still an after-though in the VMM space
		they have bigger problems

	KVM gets everything in Linux for free
		but could benefit from more info from the guests

	Xen gets to re-invent/port/re-implement everything in Linux

	VMMS have an easier time moving physical pages
		and thus doing memory power management


^ permalink raw reply

* Re: [Bugme-new] [Bug 13879] New: Bluetooth stops working after s3 suspend/resume
From: Matthew Garrett @ 2009-07-30 22:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-acpi, linux-bluetooth, bugzilla-daemon, bugme-daemon,
	pi3mm3, John W. Linville, Johannes Berg,
	Henrique de Moraes Holschuh
In-Reply-To: <20090730142733.b05a95ed.akpm@linux-foundation.org>

On Thu, Jul 30, 2009 at 02:27:33PM -0700, Andrew Morton wrote:

> hm, what's all that stuff in
> http://memebeam.org/free-software/toshiba_acpi/ and who wrote it and
> should we be merging it into mainline?

Still waiting for someone to just do the right thing and add rfkill 
support to the Toshiba driver. I'm actually considering a complete 
rewrite - the existing code's grown kind of organically, and support for 
modern kernel features has been rather bolted onto the side.

-- 
Matthew Garrett | mjg59@srcf.ucam.org

^ permalink raw reply

* [PATCH] dell-laptop: Fix rfkill state queries
From: Matthew Garrett @ 2009-07-31  2:25 UTC (permalink / raw)
  To: linux-acpi; +Cc: johannes, linux-wireless, timg, Matthew Garrett

The current code in dell-laptop is confused about the hardware rfkill
state. Fix it up such that it's always reported correctly.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
---
 drivers/platform/x86/dell-laptop.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/platform/x86/dell-laptop.c b/drivers/platform/x86/dell-laptop.c
index 74909c4..9061111 100644
--- a/drivers/platform/x86/dell-laptop.c
+++ b/drivers/platform/x86/dell-laptop.c
@@ -197,8 +197,8 @@ static void dell_rfkill_query(struct rfkill *rfkill, void *data)
 	dell_send_request(&buffer, 17, 11);
 	status = buffer.output[1];
 
-	if (status & BIT(bit))
-		rfkill_set_hw_state(rfkill, !!(status & BIT(16)));
+	rfkill_set_sw_state(rfkill, !!(status & BIT(bit)));
+	rfkill_set_hw_state(rfkill, !(status & BIT(16)));
 }
 
 static const struct rfkill_ops dell_rfkill_ops = {
-- 
1.6.2.5


^ permalink raw reply related

* [PATCH 1/2] input: Add support for filtering input events
From: Matthew Garrett @ 2009-07-31  2:26 UTC (permalink / raw)
  To: linux-acpi; +Cc: linux-input, dmitry.torokhov, Matthew Garrett

Devices occasionally use the keyboard controller to send hardware
notifications that should be handled by the kernel rather than userspace.
These can be handled in kernel by registering an input handler, but the
event will still bubble up to userspace where it may cause confusion. This
patch adds support for the kernel to register filters, allowing it to
indicate that the event should be consumed by the hardware-specific driver
rather than passed to other input handlers. The assumption is that, as this
is hardware specific, there should be no need to have more than one filter -
similarly, if an input device is grabbed, we assume that people know what
they're doing and don't pass it to the filter.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
---
 drivers/input/input.c |   91 ++++++++++++++++++++++++++++++++++++++++++-------
 include/linux/input.h |    5 +++
 2 files changed, 83 insertions(+), 13 deletions(-)

diff --git a/drivers/input/input.c b/drivers/input/input.c
index 7c237e6..80f1e48 100644
--- a/drivers/input/input.c
+++ b/drivers/input/input.c
@@ -88,19 +88,26 @@ static int input_defuzz_abs_event(int value, int old_val, int fuzz)
  */
 static void input_pass_event(struct input_dev *dev,
 			     unsigned int type, unsigned int code, int value)
-{
-	struct input_handle *handle;
+
+{	struct input_handle *handle;
 
 	rcu_read_lock();
 
 	handle = rcu_dereference(dev->grab);
-	if (handle)
+	if (handle) {
 		handle->handler->event(handle, type, code, value);
-	else
-		list_for_each_entry_rcu(handle, &dev->h_list, d_node)
-			if (handle->open)
-				handle->handler->event(handle,
-							type, code, value);
+		goto out;
+	}
+
+	handle = rcu_dereference(dev->filter);
+	if (handle && handle->handler->filter(handle, type, code, value))
+		goto out;
+
+	list_for_each_entry_rcu(handle, &dev->h_list, d_node)
+		if (handle->open)
+			handle->handler->event(handle,
+					       type, code, value);
+out:
 	rcu_read_unlock();
 }
 
@@ -375,12 +382,15 @@ int input_grab_device(struct input_handle *handle)
 }
 EXPORT_SYMBOL(input_grab_device);
 
-static void __input_release_device(struct input_handle *handle)
+static void __input_release_device(struct input_handle *handle, bool filter)
 {
 	struct input_dev *dev = handle->dev;
 
-	if (dev->grab == handle) {
-		rcu_assign_pointer(dev->grab, NULL);
+	if (handle == (filter ? dev->filter : dev->grab)) {
+		if (filter)
+			rcu_assign_pointer(dev->filter, NULL);
+		else
+			rcu_assign_pointer(dev->grab, NULL);
 		/* Make sure input_pass_event() notices that grab is gone */
 		synchronize_rcu();
 
@@ -404,12 +414,65 @@ void input_release_device(struct input_handle *handle)
 	struct input_dev *dev = handle->dev;
 
 	mutex_lock(&dev->mutex);
-	__input_release_device(handle);
+	__input_release_device(handle, false);
 	mutex_unlock(&dev->mutex);
 }
 EXPORT_SYMBOL(input_release_device);
 
 /**
+ * input_filter_device - allow input events to be filtered from higher layers
+ * @handle: input handle that wants to filter the device
+ *
+ * When a device is filtered by an input handle all events generated by
+ * the device are to this handle. If the filter function returns true then
+ * the event is discarded rather than being passed to any other input handles,
+ * otherwise it is passed to them as normal. Grabs will be handled before
+ * filters, so a grabbed device will not deliver events to a filter function.
+ */
+int input_filter_device(struct input_handle *handle)
+{
+	struct input_dev *dev = handle->dev;
+	int retval;
+
+	retval = mutex_lock_interruptible(&dev->mutex);
+	if (retval)
+		return retval;
+
+	if (dev->filter) {
+		retval = -EBUSY;
+		goto out;
+	}
+
+	rcu_assign_pointer(dev->filter, handle);
+	synchronize_rcu();
+
+ out:
+	mutex_unlock(&dev->mutex);
+	return retval;
+}
+EXPORT_SYMBOL(input_filter_device);
+
+/**
+ * input_unfilter_device - removes a filter from a device
+ * @handle: input handle that owns the device
+ *
+ * Removes the filter from a device so that other input handles can
+ * start receiving unfiltered input events. Upon release all handlers
+ * attached to the device have their start() method called so they
+ * have a change to synchronize device state with the rest of the
+ * system.
+ */
+void input_unfilter_device(struct input_handle *handle)
+{
+	struct input_dev *dev = handle->dev;
+
+	mutex_lock(&dev->mutex);
+	__input_release_device(handle, true);
+	mutex_unlock(&dev->mutex);
+}
+EXPORT_SYMBOL(input_unfilter_device);
+
+/**
  * input_open_device - open input device
  * @handle: handle through which device is being accessed
  *
@@ -482,7 +545,9 @@ void input_close_device(struct input_handle *handle)
 
 	mutex_lock(&dev->mutex);
 
-	__input_release_device(handle);
+	/* Release both grabs and filters */
+	__input_release_device(handle, false);
+	__input_release_device(handle, true);
 
 	if (!--dev->users && dev->close)
 		dev->close(dev);
diff --git a/include/linux/input.h b/include/linux/input.h
index 8b3bc3e..e28f116 100644
--- a/include/linux/input.h
+++ b/include/linux/input.h
@@ -1118,6 +1118,7 @@ struct input_dev {
 	int (*event)(struct input_dev *dev, unsigned int type, unsigned int code, int value);
 
 	struct input_handle *grab;
+	struct input_handle *filter;
 
 	spinlock_t event_lock;
 	struct mutex mutex;
@@ -1218,6 +1219,7 @@ struct input_handler {
 	void *private;
 
 	void (*event)(struct input_handle *handle, unsigned int type, unsigned int code, int value);
+	bool (*filter)(struct input_handle *handle, unsigned int type, unsigned int code, int value);
 	int (*connect)(struct input_handler *handler, struct input_dev *dev, const struct input_device_id *id);
 	void (*disconnect)(struct input_handle *handle);
 	void (*start)(struct input_handle *handle);
@@ -1295,6 +1297,9 @@ void input_unregister_handle(struct input_handle *);
 int input_grab_device(struct input_handle *);
 void input_release_device(struct input_handle *);
 
+int input_filter_device(struct input_handle *);
+void input_unfilter_device(struct input_handle *);
+
 int input_open_device(struct input_handle *);
 void input_close_device(struct input_handle *);
 
-- 
1.6.2.5


^ permalink raw reply related

* [PATCH 2/2] dell-laptop: Trigger rfkill updates on wifi toggle switch press
From: Matthew Garrett @ 2009-07-31  2:26 UTC (permalink / raw)
  To: linux-acpi; +Cc: linux-input, dmitry.torokhov, Matthew Garrett
In-Reply-To: <1249007207-27392-1-git-send-email-mjg@redhat.com>

Dell hardware sends the rfkill hardware killswitch event via the keyboard
controller, even if it's a physical toggle switch on the side of the
machine. Add support for catching the input event and updating the rfkill
state, allowing userspace to receive notifications that the change has
occurred.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
---
 drivers/platform/x86/dell-laptop.c |  100 ++++++++++++++++++++++++++++++++++++
 1 files changed, 100 insertions(+), 0 deletions(-)

diff --git a/drivers/platform/x86/dell-laptop.c b/drivers/platform/x86/dell-laptop.c
index 74909c4..71a4149 100644
--- a/drivers/platform/x86/dell-laptop.c
+++ b/drivers/platform/x86/dell-laptop.c
@@ -22,6 +22,7 @@
 #include <linux/rfkill.h>
 #include <linux/power_supply.h>
 #include <linux/acpi.h>
+#include <linux/input.h>
 #include "../../firmware/dcdbas.h"
 
 #define BRIGHTNESS_TOKEN 0x7d
@@ -206,6 +207,16 @@ static const struct rfkill_ops dell_rfkill_ops = {
 	.query = dell_rfkill_query,
 };
 
+static void dell_rfkill_update(void)
+{
+	if (wifi_rfkill)
+		dell_rfkill_query(wifi_rfkill, (void *)1);
+	if (bluetooth_rfkill)
+		dell_rfkill_query(bluetooth_rfkill, (void *)2);
+	if (wwan_rfkill)
+		dell_rfkill_query(wwan_rfkill, (void *)3);
+}
+
 static int dell_setup_rfkill(void)
 {
 	struct calling_interface_buffer buffer;
@@ -310,6 +321,90 @@ static struct backlight_ops dell_ops = {
 	.update_status  = dell_send_intensity,
 };
 
+static const struct input_device_id dell_input_ids[] = {
+	{
+		.bustype = 0x11,
+		.vendor = 0x01,
+		.product = 0x01,
+		.version = 0xab41,
+		.flags = INPUT_DEVICE_ID_MATCH_BUS |
+			 INPUT_DEVICE_ID_MATCH_VENDOR |
+			 INPUT_DEVICE_ID_MATCH_PRODUCT |
+			 INPUT_DEVICE_ID_MATCH_VERSION
+	},
+	{ },
+};
+
+static bool dell_input_filter(struct input_handle *handle, unsigned int type,
+			     unsigned int code, int value)
+{
+	if (type == EV_KEY && code == KEY_WLAN && value == 1) {
+		dell_rfkill_update();
+		return 1;
+	}
+
+	return 0;
+}
+
+static void dell_input_event(struct input_handle *handle, unsigned int type,
+			     unsigned int code, int value)
+{
+}
+
+static int dell_input_connect(struct input_handler *handler,
+			      struct input_dev *dev,
+			      const struct input_device_id *id)
+{
+	struct input_handle *handle;
+	int error;
+
+	handle = kzalloc(sizeof(struct input_handle), GFP_KERNEL);
+	if (!handle)
+		return -ENOMEM;
+
+	handle->dev = dev;
+	handle->handler = handler;
+	handle->name = "dell-laptop";
+
+	error = input_register_handle(handle);
+	if (error)
+		goto err_free_handle;
+
+	error = input_open_device(handle);
+	if (error)
+		goto err_unregister_handle;
+
+	error = input_filter_device(handle);
+	if (error)
+		goto err_close_handle;
+
+	return 0;
+
+err_close_handle:
+	input_close_device(handle);
+err_unregister_handle:
+	input_unregister_handle(handle);
+err_free_handle:
+	kfree(handle);
+	return error;
+}
+
+static void dell_input_disconnect(struct input_handle *handle)
+{
+	input_close_device(handle);
+	input_unregister_handle(handle);
+	kfree(handle);
+}
+
+static struct input_handler dell_input_handler = {
+	.name = "dell-laptop",
+	.filter = dell_input_filter,
+	.event = dell_input_event,
+	.connect = dell_input_connect,
+	.disconnect = dell_input_disconnect,
+	.id_table = dell_input_ids,
+};
+
 static int __init dell_init(void)
 {
 	struct calling_interface_buffer buffer;
@@ -333,6 +428,10 @@ static int __init dell_init(void)
 		goto out;
 	}
 
+	if (input_register_handler(&dell_input_handler))
+		printk(KERN_INFO
+		       "dell-laptop: Could not register input filter\n");
+
 #ifdef CONFIG_ACPI
 	/* In the event of an ACPI backlight being available, don't
 	 * register the platform controller.
@@ -388,6 +487,7 @@ static void __exit dell_exit(void)
 		rfkill_unregister(bluetooth_rfkill);
 	if (wwan_rfkill)
 		rfkill_unregister(wwan_rfkill);
+	input_unregister_handler(&dell_input_handler);
 }
 
 module_init(dell_init);
-- 
1.6.2.5


^ permalink raw reply related

* Re: Dynamic configure max_cstate
From: Robert Hancock @ 2009-07-31  3:43 UTC (permalink / raw)
  To: Andreas Mohr; +Cc: Zhang, Yanmin, Corrado Zoccolo, LKML, linux-acpi
In-Reply-To: <20090728101135.GA22358@rhlx01.hs-esslingen.de>

On 07/28/2009 04:11 AM, Andreas Mohr wrote:
> Hi,
>
> On Tue, Jul 28, 2009 at 05:00:35PM +0800, Zhang, Yanmin wrote:
>> I tried different clocksources. For exmaple, I could get a better (30%) result with
>> hpet. With hpet, cpu utilization is about 5~8%. Function hpet_read uses too much cpu
>> time. With tsc, cpu utilization is about 2~3%. I think more cpu utilization causes fewer
>> C state transitions.
>>
>> With idle=poll, the result is about 10% better than the one of hpet. If using idle=poll,
>> I didn't find result difference among different clocksources.
>
> IOW, this seems to clearly point to ACPI Cx causing it.
>
> Both Corrado and me have been thinking that one should try skipping all
> bigger-latency ACPI Cx states whenever there's an ongoing I/O request where an
> immediate reply interrupt is expected.
>
> I've been investigating this a bit, and interesting parts would perhaps include
>    . kernel/pm_qos_params.c
>    . drivers/cpuidle/governors/menu.c (which acts on the ACPI _cx state
> structs as configured by drivers/acpi/processor_idle.c)
>    . and e.g. the wait_for_completion_timeout() part in drivers/ata/libata-core.c
>      (or other sources in case of other disk I/O mechanisms)
>
> One way to do some quick (and dirty!!) testing would be to set a flag
> before calling wait_for_completion_timeout() and testing for this flag in
> drivers/cpuidle/governors/menu.c and then skip deeper Cx states
> conditionally.
>
> As a very quick test, I tried a
> while :; do :; done
> loop in shell and renicing shell to 19 (to keep my CPU out of ACPI idle),
> but bonnie -s 100 results initially looked promising yet turned out to
> be inconsistent. The real way to test this would be idle=poll.
> My test system was Athlon XP with /proc/acpi/processor/CPU0/power
> latencies of 000 and 100 (the maximum allowed value, BTW) for C1/C2.
>
> If the wait_for_completion_timeout() flag testing turns out to help,
> then one might intend to use the pm_qos infrastructure to indicate
> these conditions, however it might be too bloated for such a
> purpose, a relatively simple (read: fast) boolean flag mechanism
> could be better.
>
> Plus one could then create a helper function which figures out a
> "pretty fast" Cx state (independent of specific latency times!).
> But when introducing this mechanism, take care to not ignore the
> requirements defined by pm_qos settings!
>
> Oh, and about the places which submit I/O requests where one would have to
> flag this: are they in any way correlated with the scheduler I/O wait
> value? Would the I/O wait mechanism be a place to more easily and centrally
> indicate that we're waiting for a request to come back in "very soon"?
> OTOH I/O requests may have vastly differing delay expectations,
> thus specifically only short-term expected I/O replies should be flagged,
> otherwise we're wasting lots of ACPI deep idle opportunities.

Did the results show a big difference in performance between maximum C2 
and maximum C3? Thing with C3 is that it likely will have some 
interference with bus-master DMA activity as the CPU has to wake up at 
least partially before the SATA controller can complete DMA operations, 
which will likely stall the controller for some period of time. There 
would be an argument for avoiding going into deep C-states which can't 
handle snooping while IO is in progress and DMA will shortly be occurring..

^ permalink raw reply

* Re: [PATCH] fujitsu-laptop: remove superfluous NULL pointer checks
From: Jonathan Woithe @ 2009-07-31  3:57 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz
  Cc: Jonathan Woithe, linux-acpi, linux-kernel, Dan Carpenter, corbet,
	eteo, Julia Lawall
In-Reply-To: <200907301312.48132.bzolnier@gmail.com>

Hi

> > > From: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
> > > Subject: [PATCH] fujitsu-laptop: remove superfluous NULL pointer checks
> > > 
> > > This takes care of the following entries from Dan's list:
> > > 
> > > drivers/platform/x86/fujitsu-laptop.c +327 set_lcd_level(13) warning: variable derefenced before check 'fujitsu'
> > > drivers/platform/x86/fujitsu-laptop.c +358 set_lcd_level_alt(13) warning: variable derefenced before check 'fujitsu'
> > 
> > I'd rather keep the test for a non-null fujitsu in there, but obviously it's
> > kind of pointless doing it after the first dereference.  Since this fixup
> > overlaps with the one previously discussed with Julia I've taken the liberty
> 
> They are on top of Julia's patch (sorry I forgot to mention it)..

Ah, right.  I was trying to work out where it fitted (that was probably the
one preceeding patch I *didn't* end up trying).

Regards
  jonathan

^ permalink raw reply

* Re: [PATCH] fujitsu-laptop: consolidated fixes (NULL pointer checks, [un]registration fixes, etc)
From: Jonathan Woithe @ 2009-07-31  4:04 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz
  Cc: Jonathan Woithe, linux-acpi, linux-kernel, Dan Carpenter, corbet,
	eteo, Julia Lawall
In-Reply-To: <200907301408.10864.bzolnier@gmail.com>

Hi all

> > This patch consolidates the two patches posted by Bartlomiej Zolnierkiewicz
> > and the NULL pointer patch which came out of a discussion with Julia Lawall.
> > Some additional NULL pointer check tidy-ups have also been done.
> > 
> > To summarise:
> 
> I think that for the increased readability and to preserve proper credits it
> would be better to keep separate patches (updating the driver version can be
> as well handled in an incremental patch).

Ok, leave it with me and I'll re-split along these lines.

> I would also strongly insist on removing the following needless NULL pointer
> checks introduced by this version.  They may hide real bugs in the ACPI driver
> code (which should make sure that we always have valid 'device' in the ACPI
> driver methods and that it doesn't go away while the method executes) or in
> the fujitsu-laptop itself (which has to always provide valid 'fujitsu_hotkey',
> 'fujitsu' and 'input' pointers and not free them while there are still some
> users left).

I'll take another look at these.  What I was concerned about was whether the
hotkey pointer in particular was valid in all cases (since some of the
relevant laptops don't in fact have hotkeys) - and I was very short of time
yesterday.  I've had a very quick look through the code just now and it
seems that this may not be a problem after all.  In any case I'll confirm
this over the next day or so and redo/reisssue the patches accordingly.

Regards
  jonathan

^ permalink raw reply

* Re: Dynamic configure max_cstate
From: Zhang, Yanmin @ 2009-07-31  7:06 UTC (permalink / raw)
  To: Robert Hancock; +Cc: Andreas Mohr, Corrado Zoccolo, LKML, linux-acpi
In-Reply-To: <4A726844.7040505@gmail.com>

On Thu, 2009-07-30 at 21:43 -0600, Robert Hancock wrote:
> On 07/28/2009 04:11 AM, Andreas Mohr wrote:
> > Hi,
> >
> > On Tue, Jul 28, 2009 at 05:00:35PM +0800, Zhang, Yanmin wrote:
> >> I tried different clocksources. For exmaple, I could get a better (30%) result with
> >> hpet. With hpet, cpu utilization is about 5~8%. Function hpet_read uses too much cpu
> >> time. With tsc, cpu utilization is about 2~3%. I think more cpu utilization causes fewer
> >> C state transitions.
> >>
> >> With idle=poll, the result is about 10% better than the one of hpet. If using idle=poll,
> >> I didn't find result difference among different clocksources.
> >
> > IOW, this seems to clearly point to ACPI Cx causing it.
> >
> > Both Corrado and me have been thinking that one should try skipping all
> > bigger-latency ACPI Cx states whenever there's an ongoing I/O request where an
> > immediate reply interrupt is expected.
> >
> > I've been investigating this a bit, and interesting parts would perhaps include
> >    . kernel/pm_qos_params.c
> >    . drivers/cpuidle/governors/menu.c (which acts on the ACPI _cx state
> > structs as configured by drivers/acpi/processor_idle.c)
> >    . and e.g. the wait_for_completion_timeout() part in drivers/ata/libata-core.c
> >      (or other sources in case of other disk I/O mechanisms)
> >
> > One way to do some quick (and dirty!!) testing would be to set a flag
> > before calling wait_for_completion_timeout() and testing for this flag in
> > drivers/cpuidle/governors/menu.c and then skip deeper Cx states
> > conditionally.
> >
> > As a very quick test, I tried a
> > while :; do :; done
> > loop in shell and renicing shell to 19 (to keep my CPU out of ACPI idle),
> > but bonnie -s 100 results initially looked promising yet turned out to
> > be inconsistent. The real way to test this would be idle=poll.
> > My test system was Athlon XP with /proc/acpi/processor/CPU0/power
> > latencies of 000 and 100 (the maximum allowed value, BTW) for C1/C2.
> >
> > If the wait_for_completion_timeout() flag testing turns out to help,
> > then one might intend to use the pm_qos infrastructure to indicate
> > these conditions, however it might be too bloated for such a
> > purpose, a relatively simple (read: fast) boolean flag mechanism
> > could be better.
> >
> > Plus one could then create a helper function which figures out a
> > "pretty fast" Cx state (independent of specific latency times!).
> > But when introducing this mechanism, take care to not ignore the
> > requirements defined by pm_qos settings!
> >
> > Oh, and about the places which submit I/O requests where one would have to
> > flag this: are they in any way correlated with the scheduler I/O wait
> > value? Would the I/O wait mechanism be a place to more easily and centrally
> > indicate that we're waiting for a request to come back in "very soon"?
> > OTOH I/O requests may have vastly differing delay expectations,
> > thus specifically only short-term expected I/O replies should be flagged,
> > otherwise we're wasting lots of ACPI deep idle opportunities.
> 
> Did the results show a big difference in performance between maximum C2 
> and maximum C3? 
No big difference. I tried different max cstate by processor.max_cstate.
Mostly, processor.max_cstate=1 could get the similiar result like idle=poll.

> Thing with C3 is that it likely will have some 
> interference with bus-master DMA activity as the CPU has to wake up at 
> least partially before the SATA controller can complete DMA operations, 
> which will likely stall the controller for some period of time. There 
> would be an argument for avoiding going into deep C-states which can't 
> handle snooping while IO is in progress and DMA will shortly be occurring..


^ permalink raw reply

* RE: [Xen-devel] [PATCH][pvops_dom0][2/4] Introduce the external control operation interface for domain0 ACPI parser
From: Yu, Ke @ 2009-07-31  8:05 UTC (permalink / raw)
  To: Len Brown; +Cc: linux-acpi@vger.kernel.org, Jeremy Fitzhardinge, Tian, Kevin
In-Reply-To: <alpine.LFD.2.00.0907301204260.17818@localhost.localdomain>

[-- Attachment #1: Type: text/plain, Size: 4086 bytes --]

>-----Original Message-----
>From: Len Brown [mailto:lenb@kernel.org]
>Sent: Friday, July 31, 2009 12:29 AM
>To: Yu, Ke
>Cc: linux-acpi@vger.kernel.org; Jeremy Fitzhardinge; Tian, Kevin
>Subject: RE: [Xen-devel] [PATCH][pvops_dom0][2/4] Introduce the external
>control operation interface for domain0 ACPI parser
>
>Unclear that the power management partitioning between xen hypervisor
>and dom0 is fully baked.
>
>Uncear (to me) what xen is doing internally with these power management
>objects, and how that differs from what Linux would do.
>
>While patches to the Linux kernel may be a good RFE, prototype, or base
>for discussion, the unknowns above need to be addressed to before it
>makes much sense to spent a large amount of time on the source.

Oh yes, I would like to add the background. In xen architecture, only hypervisor can control the physical CPU, so all the physical CPU Cx/Px power management is done in hypervisor side. And the cpuidle/cpufreq driver in dom0 is disabled. the Xen PM algorithm is ported from linux side, so it is similar as linux. For cpuidle part, when idle vcpu is scheduled (similar as idle process in linux), it will invoke cpuidle routine. cpuidle routine firstly decide which C state to go by calling cpuidle governor (e.g. menu governor), then cpuidle routine will enter C state using the ACPI method (e.g. I/O port, mwait described in _CST).
For cpufreq part, cpufreq driver will register the governor (e.g. ondemand) and cpu driver (cpufreq-acpi driver). once the cpufreq start on certain CPU, the governor will change the frequency by calling the corresponding cpu driver. http://wiki.xensource.com/xenwiki/xenpm has one picture on the cpufreq architecture.

All the necessary ACPI information in the above process is parsed by dom0 and pass to hypervisor.

>
>some things did jump out of the patch, however...
>
>I do not recommend believing _PSD.
>Our experience is that 50% of the time it is crap.

That is true, we also meet crap _PSD in especially the new platform. We will add more check on the _PSD data.

>
>Why does xen_processor_px exists when it is the same as acpi_processor_px?
>ditto for acpi_processor_cx and xen_processor_cx

xen_processor_px is part of the hypercall interface, and mainly used in hypervisor. since hypervisor has no acpi_processor_px as kernel has, so we have to add the same data structure for information passing purpose. similar for xen_processor_cx.

>
>Lose the ifdefs

Can you elaborate which part lose the ifdef? for the drivers/xen/processor_extcntl.c, since it has "obj-$(CONFIG_ACPI_PROCESSOR_XEN) += processor_extcntl.o" in Makefile, it implies #ifdef CONFIG_ACPI_PROCESSOR_XEN. 

>Lose the tests for xen on the inline code when
>they can be inside the called routines.

Can you elaborate which part? 

>
>This doesn't look like an abstraction layer,
>it looks more like a simple conduit.
>The thing at the other end (xen) will need to know
>just as much about these data structures as the
>thing that sent them (linux)

The reason is that xen and linux use the similar power management algorithm, so the data structure xen need is also similar as Linux side. This actually make life easier, in that we don't need to invent an abstraction layer here to handle the difference, since there is actually no difference.

>
>the patch doesn't apply to the upstream kernel --
>even the ACPI specific parts.

That is strange. I can successfully apply the ACPI specific parts to the upstream (linus 2.6 tree, parent commit b592972). 
For the xen part, it is based on Jeremy's git tree (git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git) rebase/master branch, so it cannot be cleanly applied to upstream.

>
>checkpatch.pl:
>total: 168 errors, 42 warnings, 643 lines checked
>
>Don't send me another patch that can't pass checkpatch.pl,
>even if just an RFC.

Sorry for that. I have fix the code style issue and the attached updated version can pass the checkpatch.pl

BTW, thanks for your comments.

Best Regards
Ke

[-- Attachment #2: 03.patch --]
[-- Type: application/octet-stream, Size: 21948 bytes --]

commit 60864023151926a582d403a899e69dd53421cb34
Author: Yu Ke <ke.yu@intel.com>
Date:   Fri Jul 31 11:00:15 2009 +0800

    Leverage domain0 ACPI parser for xen
    
    This patch reuse dom0 ACPI parser to get C/P state for Xen
    
    === Overview ===
    
    Requirement: Xen hypervisor need Cx/Px ACPI info to do the Cx/Px states
    power management. This info is provided by BIOS ACPI table. Since
    hypervisor has no ACPI parser, this info has to be parsed by domain0
    kernel ACPI sub-system, and then passed to hypervisor by hypercall.
    
    To make this happen, the key point is to add hook in the kernel ACPI
    sub-system. Fortunately, kernel already has good abstraction, and
    only several places need to add hook. To be more detail, there is an
    acpi_processor_driver (in drivers/acpi/processor_core.c) , which all the
    Cx/Px parsing event will go to. This driver will call its acpi processor
    event handler, e.g. add/remove, start/stop, notify to handle these
    events. These event handlers in turn will call some library functions (in
    drivers/acpi/processor_perflib.c), e.g. acpi_processor_ppc_has_changed,
    acpi_processor_ppc_has_changed, acpi_processor_cst_has_changed, to finish
    the acpi info parsing.
    
    So this patch add the xen hook in these places to notify xen for the parsed
    Cx/Px state information.
    
    Signed-off-by: Yu Ke <ke.yu@intel.com>
    Signed-off-by: Tian Kevin <kevin.tian@intel.com>

diff --git a/drivers/acpi/processor_core.c b/drivers/acpi/processor_core.c
index 84e0f3c..2707d65 100644
--- a/drivers/acpi/processor_core.c
+++ b/drivers/acpi/processor_core.c
@@ -58,6 +58,7 @@
 #include <acpi/acpi_bus.h>
 #include <acpi/acpi_drivers.h>
 #include <acpi/processor.h>
+#include <xen/acpi.h>
 
 #define ACPI_PROCESSOR_CLASS		"processor"
 #define ACPI_PROCESSOR_DEVICE_NAME	"Processor"
@@ -751,6 +752,10 @@ static int __cpuinit acpi_processor_start(struct acpi_device *device)
 
 	acpi_processor_power_init(pr, device);
 
+	result = processor_cntl_xen_prepare(pr);
+	if (result)
+		goto end;
+
 	pr->cdev = thermal_cooling_device_register("Processor", device,
 						&processor_cooling_ops);
 	if (IS_ERR(pr->cdev)) {
@@ -963,6 +968,10 @@ int acpi_processor_device_add(acpi_handle handle, struct acpi_device **device)
 	if (!pr)
 		return -ENODEV;
 
+	if (processor_cntl_xen())
+		processor_cntl_xen_notify(pr,
+			PROCESSOR_HOTPLUG, HOTPLUG_TYPE_ADD);
+
 	if ((pr->id >= 0) && (pr->id < nr_cpu_ids)) {
 		kobject_uevent(&(*device)->dev.kobj, KOBJ_ONLINE);
 	}
@@ -1002,11 +1011,19 @@ static void __ref acpi_processor_hotplug_notify(acpi_handle handle,
 			break;
 		}
 
+		if (processor_cntl_xen())
+			processor_cntl_xen_notify(pr,
+					PROCESSOR_HOTPLUG, HOTPLUG_TYPE_ADD);
+
 		if (pr->id >= 0 && (pr->id < nr_cpu_ids)) {
 			kobject_uevent(&device->dev.kobj, KOBJ_OFFLINE);
 			break;
 		}
 
+		if (processor_cntl_xen())
+			processor_cntl_xen_notify(pr, PROCESSOR_HOTPLUG,
+							HOTPLUG_TYPE_REMOVE);
+
 		result = acpi_processor_start(device);
 		if ((!result) && ((pr->id >= 0) && (pr->id < nr_cpu_ids))) {
 			kobject_uevent(&device->dev.kobj, KOBJ_ONLINE);
diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index 0efa59e..8994aff 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -58,6 +58,7 @@
 
 #include <acpi/acpi_bus.h>
 #include <acpi/processor.h>
+#include <xen/acpi.h>
 #include <asm/processor.h>
 
 #define ACPI_PROCESSOR_CLASS            "processor"
@@ -455,6 +456,12 @@ static int acpi_processor_get_power_info_cst(struct acpi_processor *pr)
 
 		cx.power = obj->integer.value;
 
+#ifdef CONFIG_ACPI_PROCESSOR_XEN
+		/* cache control methods to notify xen*/
+		if (processor_cntl_xen_pm())
+			memcpy(&cx.reg, reg, sizeof(*reg));
+#endif
+
 		current_count++;
 		memcpy(&(pr->power.states[current_count]), &cx, sizeof(cx));
 
@@ -1141,6 +1148,13 @@ int acpi_processor_cst_has_changed(struct acpi_processor *pr)
 	if (!pr->flags.power_setup_done)
 		return -ENODEV;
 
+    if (processor_cntl_xen_pm()) {
+		acpi_processor_get_power_info(pr);
+		processor_cntl_xen_notify(pr,
+			PROCESSOR_PM_CHANGE, PM_TYPE_IDLE);
+		return ret;
+	}
+
 	cpuidle_pause_and_lock();
 	cpuidle_disable_device(&pr->power.dev);
 	acpi_processor_get_power_info(pr);
@@ -1204,9 +1218,14 @@ int __cpuinit acpi_processor_power_init(struct acpi_processor *pr,
 	 * platforms that only support C1.
 	 */
 	if (pr->flags.power) {
-		acpi_processor_setup_cpuidle(pr);
-		if (cpuidle_register_device(&pr->power.dev))
-			return -EIO;
+		if (processor_cntl_xen_pm())
+			processor_cntl_xen_notify(pr,
+					PROCESSOR_PM_INIT, PM_TYPE_IDLE);
+		else {
+			acpi_processor_setup_cpuidle(pr);
+			if (cpuidle_register_device(&pr->power.dev))
+				return -EIO;
+		}
 
 		printk(KERN_INFO PREFIX "CPU%d (power states:", pr->id);
 		for (i = 1; i <= pr->power.count; i++)
diff --git a/drivers/acpi/processor_perflib.c b/drivers/acpi/processor_perflib.c
index 60e543d..8375075 100644
--- a/drivers/acpi/processor_perflib.c
+++ b/drivers/acpi/processor_perflib.c
@@ -38,6 +38,7 @@
 #include <acpi/acpi_bus.h>
 #include <acpi/acpi_drivers.h>
 #include <acpi/processor.h>
+#include <xen/acpi.h>
 
 #define ACPI_PROCESSOR_CLASS		"processor"
 #define ACPI_PROCESSOR_FILE_PERFORMANCE	"performance"
@@ -154,13 +155,16 @@ int acpi_processor_ppc_has_changed(struct acpi_processor *pr)
 {
 	int ret;
 
-	if (ignore_ppc)
+	if (ignore_ppc && !processor_cntl_xen_pmperf())
 		return 0;
 
 	ret = acpi_processor_get_platform_limit(pr);
 
 	if (ret < 0)
 		return (ret);
+	else if (processor_cntl_xen_pmperf())
+		return processor_cntl_xen_notify(pr,
+				PROCESSOR_PM_CHANGE, PM_TYPE_PERF);
 	else
 		return cpufreq_update_policy(pr->id);
 }
@@ -330,7 +334,7 @@ static int acpi_processor_get_performance_states(struct acpi_processor *pr)
 	return result;
 }
 
-static int acpi_processor_get_performance_info(struct acpi_processor *pr)
+int acpi_processor_get_performance_info(struct acpi_processor *pr)
 {
 	int result = 0;
 	acpi_status status = AE_OK;
@@ -432,7 +436,7 @@ int acpi_processor_notify_smm(struct module *calling_module)
 
 EXPORT_SYMBOL(acpi_processor_notify_smm);
 
-static int acpi_processor_get_psd(struct acpi_processor	*pr)
+int acpi_processor_get_psd(struct acpi_processor	*pr)
 {
 	int result = 0;
 	acpi_status status = AE_OK;
diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
index 3b1c421..d303c25 100644
--- a/drivers/xen/Kconfig
+++ b/drivers/xen/Kconfig
@@ -90,4 +90,9 @@ config XEN_XENBUS_FRONTEND
 
 config XEN_S3
        def_bool y
-       depends on XEN_DOM0 && ACPI
\ No newline at end of file
+       depends on XEN_DOM0 && ACPI
+
+config ACPI_PROCESSOR_XEN
+	bool
+	depends on XEN_DOM0 && ACPI_PROCESSOR && CPU_FREQ
+	default y
diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile
index 386c775..42c9ace 100644
--- a/drivers/xen/Makefile
+++ b/drivers/xen/Makefile
@@ -9,4 +9,5 @@ obj-$(CONFIG_XEN_BLKDEV_BACKEND)	+= blkback/
 obj-$(CONFIG_XEN_NETDEV_BACKEND)	+= netback/
 obj-$(CONFIG_XENFS)			+= xenfs/
 obj-$(CONFIG_XEN_SYS_HYPERVISOR)	+= sys-hypervisor.o
-obj-$(CONFIG_XEN_S3)			+= acpi.o
\ No newline at end of file
+obj-$(CONFIG_XEN_S3)			+= acpi.o
+obj-$(CONFIG_ACPI_PROCESSOR_XEN) += processor_extcntl.o
diff --git a/drivers/xen/processor_extcntl.c b/drivers/xen/processor_extcntl.c
new file mode 100644
index 0000000..85d708c
--- /dev/null
+++ b/drivers/xen/processor_extcntl.c
@@ -0,0 +1,418 @@
+/*
+ * processor_extcntl.c - interface to notify Xen
+ *
+ *  Copyright (C) 2008, Intel corporation
+ *
+ * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or (at
+ *  your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful, but
+ *  WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ *  General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License along
+ *  with this program; if not, write to the Free Software Foundation, Inc.,
+ *  59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.
+ *
+ */
+
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/types.h>
+#include <linux/acpi.h>
+#include <linux/pm.h>
+#include <linux/cpu.h>
+
+#include <linux/cpufreq.h>
+#include <acpi/processor.h>
+#include <xen/acpi.h>
+
+#include <asm/xen/hypercall.h>
+#include <asm/xen/hypervisor.h>
+
+static int processor_cntl_xen_get_performance(struct acpi_processor *pr);
+static int xen_hotplug_notifier(struct acpi_processor *pr, int event);
+
+static struct processor_cntl_xen_ops xen_ops = {
+	.hotplug		= xen_hotplug_notifier,
+};
+
+int processor_cntl_xen(void)
+{
+	return 1;
+}
+
+int processor_cntl_xen_pm(void)
+{
+	return (xen_ops.pm_ops[PM_TYPE_IDLE] != NULL);
+}
+
+int processor_cntl_xen_pmperf(void)
+{
+	return	(xen_ops.pm_ops[PM_TYPE_PERF] != NULL);
+}
+
+int processor_cntl_xen_pmthr(void)
+{
+	return (xen_ops.pm_ops[PM_TYPE_THR] != NULL);
+}
+
+static int processor_notify_smm(void)
+{
+	acpi_status status;
+	static int is_done;
+
+	/* only need successfully notify BIOS once */
+	/* avoid double notification which may lead to unexpected result */
+	if (is_done)
+		return 0;
+
+	/* Can't write pstate_cnt to smi_cmd if either value is zero */
+	if ((!acpi_gbl_FADT.smi_command) || (!acpi_gbl_FADT.pstate_control)) {
+		ACPI_DEBUG_PRINT((ACPI_DB_INFO, "No SMI port or pstate_cnt\n"));
+		return 0;
+	}
+
+	ACPI_DEBUG_PRINT((ACPI_DB_INFO,
+		"Writing pstate_cnt [0x%x] to smi_cmd [0x%x]\n",
+		acpi_gbl_FADT.pstate_control, acpi_gbl_FADT.smi_command));
+
+	status = acpi_os_write_port(acpi_gbl_FADT.smi_command,
+				    (u32) acpi_gbl_FADT.pstate_control, 8);
+	if (ACPI_FAILURE(status))
+		return status;
+
+	is_done = 1;
+
+	return 0;
+}
+
+int processor_cntl_xen_notify(struct acpi_processor *pr, int event, int type)
+{
+	int ret = -EINVAL;
+
+	switch (event) {
+	case PROCESSOR_PM_INIT:
+	case PROCESSOR_PM_CHANGE:
+		if ((type >= PM_TYPE_MAX) ||
+			!xen_ops.pm_ops[type])
+			break;
+
+		ret = xen_ops.pm_ops[type](pr, event);
+		break;
+	case PROCESSOR_HOTPLUG:
+		if (xen_ops.hotplug)
+			ret = xen_ops.hotplug(pr, type);
+		break;
+	default:
+		printk(KERN_ERR "Unsupport processor events %d.\n", event);
+		break;
+	}
+
+	return ret;
+}
+
+/*
+ * This is called from ACPI processor init, and targeted to hold
+ * some tricky housekeeping jobs to satisfy xen.
+ * For example, we may put dependency parse stub here for idle
+ * and performance state. Those information may be not available
+ * if splitting from dom0 control logic like cpufreq driver.
+ */
+int processor_cntl_xen_prepare(struct acpi_processor *pr)
+{
+
+	/* Initialize performance states */
+	if (processor_cntl_xen_pmperf())
+		processor_cntl_xen_get_performance(pr);
+
+	return 0;
+}
+
+/*
+ * Existing ACPI module does parse performance states at some point,
+ * when acpi-cpufreq driver is loaded which however is something
+ * we'd like to disable to avoid confliction with xen PM
+ * logic. So we have to collect raw performance information here
+ * when ACPI processor object is found and started.
+ */
+static int processor_cntl_xen_get_performance(struct acpi_processor *pr)
+{
+	int ret;
+	struct acpi_processor_performance *perf;
+	struct acpi_psd_package *pdomain;
+
+	if (pr->performance)
+		return -EBUSY;
+
+	perf = kzalloc(sizeof(struct acpi_processor_performance), GFP_KERNEL);
+	if (!perf)
+		return -ENOMEM;
+
+	pr->performance = perf;
+	/* Get basic performance state information */
+	ret = acpi_processor_get_performance_info(pr);
+	if (ret < 0)
+		goto err_out;
+
+	/*
+	 * Well, here we need retrieve performance dependency information
+	 * from _PSD object. The reason why existing interface is not used
+	 * is due to the reason that existing interface sticks to Linux cpu
+	 * id to construct some bitmap, however we want to split ACPI
+	 * processor objects from Linux cpu id logic. For example, even
+	 * when Linux is configured as UP, we still want to parse all ACPI
+	 * processor objects to xen. In this case, it's preferred
+	 * to use ACPI ID instead.
+	 */
+	pdomain = &pr->performance->domain_info;
+	pdomain->num_processors = 0;
+	ret = acpi_processor_get_psd(pr);
+	if (ret < 0) {
+		/*
+		 * _PSD is optional - assume no coordination if absent (or
+		 * broken), matching native kernels' behavior.
+		 */
+		pdomain->num_entries = ACPI_PSD_REV0_ENTRIES;
+		pdomain->revision = ACPI_PSD_REV0_REVISION;
+		pdomain->domain = pr->acpi_id;
+		pdomain->coord_type = DOMAIN_COORD_TYPE_SW_ALL;
+		pdomain->num_processors = 1;
+	}
+
+	/* Some sanity check */
+	if ((pdomain->revision != ACPI_PSD_REV0_REVISION) ||
+	    (pdomain->num_entries != ACPI_PSD_REV0_ENTRIES) ||
+	    ((pdomain->coord_type != DOMAIN_COORD_TYPE_SW_ALL) &&
+	     (pdomain->coord_type != DOMAIN_COORD_TYPE_SW_ANY) &&
+	     (pdomain->coord_type != DOMAIN_COORD_TYPE_HW_ALL))) {
+		ret = -EINVAL;
+		goto err_out;
+	}
+
+	/* Last step is to notify BIOS that xen exists */
+	processor_notify_smm();
+
+	processor_cntl_xen_notify(pr, PROCESSOR_PM_INIT, PM_TYPE_PERF);
+
+	return 0;
+err_out:
+	pr->performance = NULL;
+	kfree(perf);
+	return ret;
+}
+
+static inline void xen_convert_pct_reg(struct xen_pct_register *xpct,
+	struct acpi_pct_register *apct)
+{
+	xpct->descriptor = apct->descriptor;
+	xpct->length     = apct->length;
+	xpct->space_id   = apct->space_id;
+	xpct->bit_width  = apct->bit_width;
+	xpct->bit_offset = apct->bit_offset;
+	xpct->reserved   = apct->reserved;
+	xpct->address    = apct->address;
+}
+
+static inline void xen_convert_pss_states(struct xen_processor_px *xpss,
+	struct acpi_processor_px *apss, int state_count)
+{
+	int i;
+	for (i = 0; i < state_count; i++) {
+		xpss->core_frequency     = apss->core_frequency;
+		xpss->power              = apss->power;
+		xpss->transition_latency = apss->transition_latency;
+		xpss->bus_master_latency = apss->bus_master_latency;
+		xpss->control            = apss->control;
+		xpss->status             = apss->status;
+		xpss++;
+		apss++;
+	}
+}
+
+static inline void xen_convert_psd_pack(struct xen_psd_package *xpsd,
+	struct acpi_psd_package *apsd)
+{
+	xpsd->num_entries    = apsd->num_entries;
+	xpsd->revision       = apsd->revision;
+	xpsd->domain         = apsd->domain;
+	xpsd->coord_type     = apsd->coord_type;
+	xpsd->num_processors = apsd->num_processors;
+}
+
+static int xen_cx_notifier(struct acpi_processor *pr, int action)
+{
+	int ret, count = 0, i;
+	xen_platform_op_t op = {
+		.cmd			= XENPF_set_processor_pminfo,
+		.interface_version	= XENPF_INTERFACE_VERSION,
+		.u.set_pminfo.id	= pr->acpi_id,
+		.u.set_pminfo.type	= XEN_PM_CX,
+	};
+	struct xen_processor_cx *data, *buf;
+	struct acpi_processor_cx *cx;
+
+	if (action == PROCESSOR_PM_CHANGE)
+		return -EINVAL;
+
+	/* Convert to Xen defined structure and hypercall */
+	buf = kzalloc(pr->power.count * sizeof(struct xen_processor_cx),
+			GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+
+	data = buf;
+	for (i = 1; i <= pr->power.count; i++) {
+		cx = &pr->power.states[i];
+		/* Skip invalid cstate entry */
+		if (!cx->valid)
+			continue;
+
+		data->type = cx->type;
+		data->latency = cx->latency;
+		data->power = cx->power;
+		data->reg.space_id = cx->reg.space_id;
+		data->reg.bit_width = cx->reg.bit_width;
+		data->reg.bit_offset = cx->reg.bit_offset;
+		data->reg.access_size = cx->reg.reserved;
+		data->reg.address = cx->reg.address;
+
+		/* Get dependency relationships, _CSD is not supported yet */
+		data->dpcnt = 0;
+		set_xen_guest_handle(data->dp, NULL);
+
+		data++;
+		count++;
+	}
+
+	if (!count) {
+		printk(KERN_ERR "No available Cx info for cpu %d\n",
+				pr->acpi_id);
+		kfree(buf);
+		return -EINVAL;
+	}
+
+	op.u.set_pminfo.power.count = count;
+	op.u.set_pminfo.power.flags.bm_control = pr->flags.bm_control;
+	op.u.set_pminfo.power.flags.bm_check = pr->flags.bm_check;
+	op.u.set_pminfo.power.flags.has_cst = pr->flags.has_cst;
+	op.u.set_pminfo.power.flags.power_setup_done =
+		pr->flags.power_setup_done;
+
+	set_xen_guest_handle(op.u.set_pminfo.power.states, buf);
+	ret = HYPERVISOR_dom0_op(&op);
+	kfree(buf);
+	return ret;
+}
+
+static int xen_px_notifier(struct acpi_processor *pr, int action)
+{
+	int ret = -EINVAL;
+	xen_platform_op_t op = {
+		.cmd			= XENPF_set_processor_pminfo,
+		.interface_version	= XENPF_INTERFACE_VERSION,
+		.u.set_pminfo.id	= pr->acpi_id,
+		.u.set_pminfo.type	= XEN_PM_PX,
+	};
+	struct xen_processor_performance *perf;
+	struct xen_processor_px *states = NULL;
+	struct acpi_processor_performance *px;
+	struct acpi_psd_package *pdomain;
+
+	if (!pr)
+		return -EINVAL;
+
+	perf = &op.u.set_pminfo.perf;
+	px = pr->performance;
+
+	switch (action) {
+	case PROCESSOR_PM_CHANGE:
+		/* ppc dynamic handle */
+		perf->flags = XEN_PX_PPC;
+		perf->platform_limit = pr->performance_platform_limit;
+
+		ret = HYPERVISOR_dom0_op(&op);
+		break;
+
+	case PROCESSOR_PM_INIT:
+		/* px normal init */
+		perf->flags = XEN_PX_PPC |
+			      XEN_PX_PCT |
+			      XEN_PX_PSS |
+			      XEN_PX_PSD;
+
+		/* ppc */
+		perf->platform_limit = pr->performance_platform_limit;
+
+		/* pct */
+		xen_convert_pct_reg(&perf->control_register,
+				&px->control_register);
+		xen_convert_pct_reg(&perf->status_register,
+				&px->status_register);
+
+		/* pss */
+		perf->state_count = px->state_count;
+		states = kzalloc(px->state_count*sizeof(xen_processor_px_t),
+				GFP_KERNEL);
+		if (!states)
+			return -ENOMEM;
+		xen_convert_pss_states(states, px->states, px->state_count);
+		set_xen_guest_handle(perf->states, states);
+
+		/* psd */
+		pdomain = &px->domain_info;
+		xen_convert_psd_pack(&perf->domain_info, pdomain);
+		if (pdomain->coord_type == DOMAIN_COORD_TYPE_SW_ALL)
+			perf->shared_type = CPUFREQ_SHARED_TYPE_ALL;
+		else if (pdomain->coord_type == DOMAIN_COORD_TYPE_SW_ANY)
+			perf->shared_type = CPUFREQ_SHARED_TYPE_ANY;
+		else if (pdomain->coord_type == DOMAIN_COORD_TYPE_HW_ALL)
+			perf->shared_type = CPUFREQ_SHARED_TYPE_HW;
+		else {
+			ret = -ENODEV;
+			kfree(states);
+			break;
+		}
+
+		ret = HYPERVISOR_dom0_op(&op);
+		kfree(states);
+		break;
+
+	default:
+		break;
+	}
+
+	return ret;
+}
+
+static int xen_tx_notifier(struct acpi_processor *pr, int action)
+{
+	return -EINVAL;
+}
+static int xen_hotplug_notifier(struct acpi_processor *pr, int event)
+{
+	return -EINVAL;
+}
+
+static int __init xen_acpi_processor_extcntl_init(void)
+{
+	unsigned int pmbits = (xen_start_info->flags & SIF_PM_MASK) >> 8;
+
+	if (!pmbits)
+		return 0;
+	if (pmbits & XEN_PROCESSOR_PM_CX)
+		xen_ops.pm_ops[PM_TYPE_IDLE] = xen_cx_notifier;
+	if (pmbits & XEN_PROCESSOR_PM_PX)
+		xen_ops.pm_ops[PM_TYPE_PERF] = xen_px_notifier;
+	if (pmbits & XEN_PROCESSOR_PM_TX)
+		xen_ops.pm_ops[PM_TYPE_THR] = xen_tx_notifier;
+
+	return 0;
+}
+
+subsys_initcall(xen_acpi_processor_extcntl_init);
diff --git a/include/acpi/processor.h b/include/acpi/processor.h
index baf1e0a..14c7e4c 100644
--- a/include/acpi/processor.h
+++ b/include/acpi/processor.h
@@ -77,6 +77,10 @@ struct acpi_processor_cx {
 	struct acpi_processor_cx_policy promotion;
 	struct acpi_processor_cx_policy demotion;
 	char desc[ACPI_CX_DESC_LEN];
+#ifdef CONFIG_ACPI_PROCESSOR_XEN
+	/* Require raw information for xen*/
+	struct acpi_power_register reg;
+#endif /* CONFIG_ACPI_PROCESSOE_XEN */
 };
 
 struct acpi_processor_power {
@@ -295,6 +299,8 @@ static inline void acpi_processor_ffh_cstate_enter(struct acpi_processor_cx
 void acpi_processor_ppc_init(void);
 void acpi_processor_ppc_exit(void);
 int acpi_processor_ppc_has_changed(struct acpi_processor *pr);
+int acpi_processor_get_performance_info(struct acpi_processor *pr);
+int acpi_processor_get_psd(struct acpi_processor *pr);
 #else
 static inline void acpi_processor_ppc_init(void)
 {
diff --git a/include/xen/acpi.h b/include/xen/acpi.h
index fea4cfb..6ba2f5b 100644
--- a/include/xen/acpi.h
+++ b/include/xen/acpi.h
@@ -20,4 +20,59 @@ static inline bool xen_pv_acpi(void)
 int acpi_notify_hypervisor_state(u8 sleep_state,
 				 u32 pm1a_cnt, u32 pm1b_cnd);
 
+/*
+ * Following are interfaces for xen acpi processor control
+ */
+
+/* Events notified to xen */
+#define PROCESSOR_PM_INIT	1
+#define PROCESSOR_PM_CHANGE	2
+#define PROCESSOR_HOTPLUG	3
+
+/* Objects for the PM events */
+#define PM_TYPE_IDLE		0
+#define PM_TYPE_PERF		1
+#define PM_TYPE_THR		2
+#define PM_TYPE_MAX		3
+
+/* Processor hotplug events */
+#define HOTPLUG_TYPE_ADD	0
+#define HOTPLUG_TYPE_REMOVE	1
+
+#ifdef CONFIG_ACPI_PROCESSOR_XEN
+#include <acpi/acpi_drivers.h>
+#include <acpi/processor.h>
+
+struct processor_cntl_xen_ops {
+	/* Transfer processor PM events to xen */
+int (*pm_ops[PM_TYPE_MAX])(struct acpi_processor *pr, int event);
+	/* Notify physical processor status to xen */
+	int (*hotplug)(struct acpi_processor *pr, int type);
+};
+
+extern int processor_cntl_xen(void);
+extern int processor_cntl_xen_pm(void);
+extern int processor_cntl_xen_pmperf(void);
+extern int processor_cntl_xen_pmthr(void);
+extern int processor_cntl_xen_prepare(struct acpi_processor *pr);
+extern int processor_cntl_xen_notify(struct acpi_processor *pr,
+			int event, int type);
+
+#else
+
+static inline int processor_cntl_xen(void) { return 0; }
+static inline int processor_cntl_xen_pm(void) { return 0; }
+static inline int processor_cntl_xen_pmperf(void) { return 0; }
+static inline int processor_cntl_xen_pmthr(void) { return 0; }
+static inline int processor_cntl_xen_notify(struct acpi_processor *pr,
+			int event, int type)
+{
+	return 0;
+}
+static inline int processor_cntl_xen_prepare(struct acpi_processor *pr)
+{
+	return 0;
+}
+#endif /* CONFIG_ACPI_PROCESSOR_XEN */
+
 #endif	/* _XEN_ACPI_H */

^ permalink raw reply related

* Re: Dynamic configure max_cstate
From: Andreas Mohr @ 2009-07-31  8:07 UTC (permalink / raw)
  To: Zhang, Yanmin
  Cc: Robert Hancock, Andreas Mohr, Corrado Zoccolo, LKML, linux-acpi
In-Reply-To: <1249024006.2560.735.camel@ymzhang>

Hi,

On Fri, Jul 31, 2009 at 03:06:46PM +0800, Zhang, Yanmin wrote:
> On Thu, 2009-07-30 at 21:43 -0600, Robert Hancock wrote:
> > On 07/28/2009 04:11 AM, Andreas Mohr wrote:
> > > Oh, and about the places which submit I/O requests where one would have to
> > > flag this: are they in any way correlated with the scheduler I/O wait
> > > value? Would the I/O wait mechanism be a place to more easily and centrally
> > > indicate that we're waiting for a request to come back in "very soon"?
> > > OTOH I/O requests may have vastly differing delay expectations,
> > > thus specifically only short-term expected I/O replies should be flagged,
> > > otherwise we're wasting lots of ACPI deep idle opportunities.
> > 
> > Did the results show a big difference in performance between maximum C2 
> > and maximum C3? 
> No big difference. I tried different max cstate by processor.max_cstate.
> Mostly, processor.max_cstate=1 could get the similiar result like idle=poll.

OK, but I'd say that this doesn't mean that we should implement a
hard-coded mechanism which simply says "in such cases, don't do anything > C1".
Instead we should strive for a far-reaching _generic_ mechanism
which gathers average latencies of various I/O activities/devices
and then uses some formula to determine the maximum (not necessarily ACPI)
idle latency that we're willing to endure (e.g. average device I/O reply latency
divided by 10 or so).

And in addition to this, we should also take into account (read: skip)
any idle states which kill busmaster DMA completely
(in case of busmaster DMA I/O activities, that is).

_Lots_ of very nice opportunities for improvement here, I'd say...
(in the 5, 10 or even 40% range in the case of certain network I/O)

Andreas

^ permalink raw reply

* [PATCH 0/4] fujitsu-laptop: clean-ups and fixes
From: Jonathan Woithe @ 2009-07-31  8:40 UTC (permalink / raw)
  To: linux-acpi; +Cc: Jonathan Woithe, linux-kernel, julia, bzolnier

[PATCH 0/4] fujitsu-laptop: clean-ups and fixes

This patch series against fujitsu-laptop contains corrections to NULL
pointer tests, removal of redundant tests, fixes to the driver
registration/unregistration and other minor cleanups.

The patches have been sourced from various authors - credits appear in the
individual patches.

Please consider applying to linux-acpi.

Acked-by: Jonathan Woithe <jwoithe@physics.adelaide.edu.au>

Regards
  jonathan

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox