From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jeremy Fitzhardinge Subject: Re: [RFC] pvops domain0 Cx/Px info parser patch Date: Mon, 11 May 2009 11:48:08 -0700 Message-ID: <4A0872E8.7010208@goop.org> References: <4D05DB80B95B23498C72C700BD6C2E0B16A17718@pdsmsx502.ccr.corp.intel.com> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="------------090108000504000300040906" Return-path: In-Reply-To: <4D05DB80B95B23498C72C700BD6C2E0B16A17718@pdsmsx502.ccr.corp.intel.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xensource.com Errors-To: xen-devel-bounces@lists.xensource.com To: "Yu, Ke" Cc: "Tian, Kevin" , "xen-devel@lists.xensource.com" List-Id: xen-devel@lists.xenproject.org This is a multi-part message in MIME format. --------------090108000504000300040906 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Yu, Ke wrote: > Hi, Jeremy > > Since now the pv_ops domain0 is approaching, we are considering to port the dom0 power management related code, or more specifically, the Cx/Px ACPI info parsing code, to pv_ops domain0 tree, so that people can utilize Xen power management capability (Cx/PX) under pv_ops domain0. This RFC is a draft version for comments. > Hi. I commented on the last patch, but saw no response. Did you get those comments? Oh, perhaps they never got sent. I've attached them now. I have not looked at these patches yet. Do they differ much? J --------------090108000504000300040906 Content-Type: message/rfc822; name="Attached Message" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="Attached Message" From: Jeremy Fitzhardinge To: "Yu, Ke" CC: "Tian, Kevin" Date: Thu, 23 Apr 2009 16:14:49 -0400 Subject: Re: [RFC] pvops domain0 acpi parser patch Thread-Topic: [RFC] pvops domain0 acpi parser patch Message-ID: <49F0CC39.4010609@citrix.com> References: <4D05DB80B95B23498C72C700BD6C2E0B1675A6AF@pdsmsx502.ccr.corp.intel.com> In-Reply-To: <4D05DB80B95B23498C72C700BD6C2E0B1675A6AF@pdsmsx502.ccr.corp.intel.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: user-agent: Thunderbird 2.0.0.21 (X11/20090320) x-identity-key: id3 Content-Type: multipart/alternative; boundary="_000_49F0CC394010609citrixcom_" MIME-Version: 1.0 --_000_49F0CC394010609citrixcom_ Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Yu, Ke wrote: Hi Jeremy, Kevin and I originally write the domain0 host S3 code and Cx/Px state ACPI = info parsing code in the linux-2.6.18-xen.hg tree. Since now the pv_ops dom= ain0 is approaching, we are considering to port these code to pv_ops domain= 0 tree. It is very nice that you already port the host S3 code, so in this = email we post the Cx/Px state ACPI info parsing code. This is a draft versi= on intending for early comments. Thanks for doing this. I looked at the changes in 2.6.18-xen and put them = into the too-hard basket. Some general comments about patch submission: * please send patches one-per-mail, with [PATCH N/M] in the subject line= to indicate ordering * in each mail, include the full changelog description like you have bel= ow * each person signing off the mail should have a Signed-off-by: line This makes it easier to do line-by-line commentary on the patches and also = process it with the tools I have on hand. I just committed these patches to my tree, rebased onto xen-tip/dom0/core a= s xen-tip/dom0/acpi-parser, but I haven't even compile-tested to see if the= rebase worked. But I did cut-and-paste the overview into the first patch = comment so its recorded properly. =3D=3D=3D Overview =3D=3D=3D Requirement: Xen hypervisor need Cx/Px ACPI info to do the Cx/Px states pow= er management. This info is provided by BIOS ACPI table. Since hypervisor h= as no ACPI parser, this info has to be parsed by domain0 kernel ACPI sub-sy= stem, and then passed to hypervisor by hypercall. Couldn't this be done with a variant of the cpufreq_acpi driver? It needs = to parse the tables, but rather than acting on them directly it passes the = information to Xen? To make this happen, the key point is to add hook in the kernel ACPI sub-sy= stem. Fortunately, kernel already has good abstraction, and only several pl= aces need to add hook. To be more detail, there is an acpi_processor_driver= (in drivers/acpi/processor_core.c) , which all the Cx/Px parsing event wil= l go to. This driver will call its acpi processor event handler, e.g. add/r= emove, start/stop, notify to handle these events. These event handlers in t= urn will call some library functions (in drivers/acpi/processor_perflib.c),= e.g. acpi_processor_ppc_has_changed, acpi_processor_ppc_has_changed, acpi_= processor_cst_has_changed, to finish the acpi info parsing. So the conclusion is: adding hooks in acpi_processor_driver and those relat= ed library functions will satisfy our requirement. To make the added hook cleaner, we introduce an interface called external c= ontrol operation (struct processor_extcntl_ops). All the hooks are encapsul= ated in this interface processor_extcntl_ops . Here the "external" means th= e acpi processor is controlled by external entity, e.g. VMM. Every kind of = external entity can has its implementation of this interface. In this patch= , the interface for Xen is implemented. Is the issue that the hypervisor will change the CPUs states asynchronously= under the kernel, and the kernel wants to be informed about those state ch= anges, along with some kind of mapping from pcpu to vcpu? What use does the kernel make of that information? I guess I don't really understand what the larger problem domain is. Do you have any plans for other hypervisor backends? Does a kvm one make s= ense? =3D=3D Patch Set Description =3D=3D This patch set is based the xen/dom0/hackery branch. It has three patches: 1. acpi_parser_framework.patch This patch introduces the interface of external control operation, and adds= hooks to the acpi sub-system, including the acpi_processor_driver, and the= related library functions. 2. acpi_parser_xen_driver.patch This patch implements the Xen version of the external control operation int= erface. 3. acpi_parser_platform_hypercall.patch This patches implements the xen_platform_op hypercall, to pass the parsed A= CPI info to hypervisor. =3D=3D Misc =3D=3D To make this patch works correctly, there is a corner case need to consider= : processor in domain0 is virtual CPU, while BIOS ACPI table provides the p= hysical CPU info, since the vCPU and pCPU is not 1:1 mapped, this may cause= d unexpected result. E.g. in UP domain0, Xen hypervisor may only get one p= hysical processor info. Current linux-2.6.18-xen.hg tree solve this issue = by using acpi_id instead of processor_id. However, this code is a bit trick= y and hacky. We are still trying to find a cleaner way to solve this issue. Is this a corner case? I would think its the common case. Anyway, this is draft version for your early comments, and after your revie= wing, we will send it to wider scope, e.g. kernel ACPI sub-system developer= in Intel. OK, some more specific comments: Cut down on the number of #ifdefs. If CONFIG_PROCESSOR_EXTERNAL_CONTROL is= not set, define the functions to appropriate no-op definitions. For examp= le, rather than: +#ifdef CONFIG_PROCESSOR_EXTERNAL_CONTROL + if (!processor_cntl_external()) { + return -ENODEV; + } +#else return -ENODEV; +#endif /* CONFIG_PROCESSOR_EXTERNAL_CONTROL */ just define processor_cntl_external() to always return 0. (Oh, it looks li= ke you already do this. Why all the #ifdefs then?) Also, rather than: + if (processor_cntl_external()) + processor_notify_external(pr, PROCESSOR_HOTPLUG, + HOTPLUG_TYPE_REMOVE= ); can't processor_notify_external() do the test itself? (In addition to the = #ifdef comments.) This seems pointless: -static int acpi_processor_get_performance_info(struct acpi_processor *pr) +#ifndef CONFIG_PROCESSOR_EXTERNAL_CONTROL +static +#endif +int acpi_processor_get_performance_info(struct acpi_processor *pr) config CPU_FREQ bool "CPU Frequency scaling" + depends on !PROCESSOR_EXTERNAL_CONTROL This isn't acceptible. A single kernel image could be running on bare hard= ware, or be booted as a Xen dom0 kernel. It needs to be able to handle bot= h CPUFREQ and external processor control. Unless external processor contro= l somehow subsumes all the cpufreq driver functions? Anyway, all decisions= need to be made at runtime. Put all the Xen-specific code into a Xen-specific header. I've already add= ed include/xen/acpi.h for the S3 changes, so perhaps that would be suitable= . +ifneq ($(CONFIG_PROCESSOR_EXTERNAL_CONTROL),) +obj-$(CONFIG_XEN) +=3D processor_extcntl_xen.o +endif No, do this in Kconfig, with something like: config XEN_PROCESSOR_EXTERNAL_CONTROL def_bool y depends on XEN && PROCESSOR_EXTERNAL_CONTROL And maybe XEN_ACPI and/or XEN_DOM0? If this is useful for non-x86 (ia64), = then put it in drivers/xen. J --_000_49F0CC394010609citrixcom_ Content-Type: text/html; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Yu, Ke wrote:
Hi Jeremy,

Kevin and I originally write the domain0 host S3 code and Cx/Px state ACPI =
info parsing code in the linux-2.6.18-xen.hg tree. Since now the pv_ops dom=
ain0 is approaching, we are considering to port these code to pv_ops domain=
0 tree. It is very nice that you already port the host S3 code, so in this =
email we post the Cx/Px state ACPI info parsing code. This is a draft versi=
on intending for early comments.=20
  

Thanks for doing this.  I looked at the changes in 2.6.18-xen and put them into the too-hard basket.

Some general comments about patch submission:
  • please send patches one-per-mail, with [PATCH N/M] in the subject line to indicate ordering
  • in each mail, include the full changelog description like you have below
  • each person signing off the mail should have a Signed-off-by: line
This makes it easier to do line-by-line commentary on the patches and also process it with the tools I have on hand.

I just committed these patches to my tree, rebased onto xen-tip/dom0/core as xen-tip/dom0/acpi-parser, but I haven't even compile-tested to see if the rebase worked.  But I did cut-and-paste the overview into the first patch comment so its recorded properly.

=3D=3D=3D Overview =3D=3D=3D

Requirement: Xen hypervisor need Cx/Px ACPI info to do the Cx/Px states pow=
er management. This info is provided by BIOS ACPI table. Since hypervisor h=
as no ACPI parser, this info has to be parsed by domain0 kernel ACPI sub-sy=
stem, and then passed to hypervisor by hypercall.
  

Couldn't this be done with a variant of the cpufreq_acpi driver?  It needs to parse the tables, but rather than acting on them directly it passes the information to Xen?

To make this happen, the key point is to add hook in the k=
ernel ACPI sub-system. Fortunately, kernel already has good abstraction, an=
d only several places need to add hook. To be more detail, there is an acpi=
_processor_driver (in drivers/acpi/processor_core.c) , which all the Cx/Px =
parsing event will go to. This driver will call its acpi processor event ha=
ndler, e.g. add/remove, start/stop, notify to handle these events. These ev=
ent handlers in turn will call some library functions (in drivers/acpi/proc=
essor_perflib.c), e.g. acpi_processor_ppc_has_changed, acpi_processor_ppc_h=
as_changed, acpi_processor_cst_has_changed, to finish the acpi info parsing=
.

So the conclusion is: adding hooks in acpi_processor_driver and those relat=
ed library functions will satisfy our requirement.

To make the added hook cleaner, we introduce an interface called external c=
ontrol operation (struct processor_extcntl_ops). All the hooks are encapsul=
ated in this interface processor_extcntl_ops . Here the "external" means th=
e acpi processor is controlled by external entity, e.g. VMM. Every kind of =
external entity can has its implementation of this interface. In this patch=
, the interface for Xen is implemented.
  

Is the issue that the hypervisor will change the CPUs states asynchronously under the kernel, and the kernel wants to be informed about those state changes, along with some kind of mapping from pcpu to vcpu?

What use does the kernel make of that information?

I guess I don't really understand what the larger problem domain is.

Do you have any plans for other hypervisor backends?  Does a kvm one make sense?

=3D=3D Patch Set Description =3D=3D

This patch set is based the xen/dom0/hackery branch. It has three patches:

1.	acpi_parser_framework.patch
This patch introduces the interface of external control operation, and adds=
 hooks to the acpi sub-system, including the acpi_processor_driver, and the=
 related library functions.

2.	acpi_parser_xen_driver.patch
This patch implements the Xen version of the external control operation int=
erface.=20

3.	acpi_parser_platform_hypercall.patch
This patches implements the xen_platform_op hypercall, to pass the parsed A=
CPI info to hypervisor.

=3D=3D Misc =3D=3D

To make this patch works correctly, there is a corner case need to consider=
: processor in domain0 is virtual CPU, while BIOS ACPI table provides the p=
hysical CPU info, since the vCPU and pCPU is not 1:1 mapped, this may cause=
d unexpected result.  E.g. in UP domain0, Xen hypervisor may only get one p=
hysical processor info.  Current linux-2.6.18-xen.hg tree solve this issue =
by using acpi_id instead of processor_id. However, this code is a bit trick=
y and hacky. We are still trying to find a cleaner way to solve this issue.=
=20
  

Is this a corner case?  I would think its the common case.

Anyway, this is draft version for your early comments, and=
 after your reviewing, we will send it to wider scope, e.g. kernel ACPI sub=
-system developer in Intel.
  

OK, some more specific comments:

Cut down on the number of #ifdefs.  If CONFIG_PROCESSOR_EXTERNAL_CONTROL is not set, define the functions to appropriate no-op definitions.  For example, rather than:
+#ifdef CONFIG_PROCESSOR_EXTERNAL_CONTROL
+			if (!processor_cntl_external()) {
+				return -ENODEV;
+			}
+#else
 			return -ENODEV;
+#endif /* CONFIG_PROCESSOR_EXTERNAL_CONTROL */
just define processor_cntl_external() to always return 0.  (Oh, it looks like you already do this.  Why all the #ifdefs then?)

Also, rather than:
+		if (processor_cntl_external())
+			processor_notify_external(pr, PROCESSOR_HOTPLUG,
+							HOTPLUG_TYPE_REMOVE);
can't processor_notify_external() do the test itself?  (In addition to the #ifdef comments.)

This seems pointless:
-static int acpi_processor_get_performance_info(struct acpi_processor =
*pr)
+#ifndef CONFIG_PROCESSOR_EXTERNAL_CONTROL
+static
+#endif
+int acpi_processor_get_performance_info(struct acpi_processor *pr)


 config CPU_FREQ
 	bool "CPU Frequency scaling"
+	depends on !PROCESSOR_EXTERNAL_CONTROL

This isn't acceptible.  A single kernel image could be running on bare hardware, or be booted as a Xen dom0 kernel.  It needs to be able to handle both CPUFREQ and external processor control.  Unless external processor control somehow subsumes all the cpufreq driver functions?  Anyway, all decisions need to be made at runtime.

Put all the Xen-specific code into a Xen-specific header.  I've alread= y added include/xen/acpi.h for the S3 changes, so perhaps that would be suitable.


+ifneq ($(CONFIG_PROCESSOR_EXTERNAL_CONTROL),)
+obj-$(CONFIG_XEN)		+=3D processor_extcntl_xen.o
+endif
No, do this in Kconfig, with something like:
config XEN_PROCESSOR_EXTERNAL_CONTROL
	def_bool y
	depends on XEN && PROCESSOR_EXTERNAL_CONTROL
And maybe XEN_ACPI and/or XEN_DOM0?  If this is useful for non-x86 (ia64), then put it in drivers/xen.

    J
--_000_49F0CC394010609citrixcom_-- --------------090108000504000300040906 Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Disposition: inline _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel --------------090108000504000300040906--