xen-devel.lists.xenproject.org archive mirror
* [PATCH v12 0/9] enable Cache QoS Monitoring (CQM) feature
@ 2014-07-04  8:34 Dongxiao Xu
  2014-07-04  8:34 ` [PATCH v12 1/9] x86: add generic resource (e.g. MSR) access hypercall Dongxiao Xu
                   ` (9 more replies)
  0 siblings, 10 replies; 50+ messages in thread
From: Dongxiao Xu @ 2014-07-04  8:34 UTC (permalink / raw)
  To: xen-devel
  Cc: keir, Ian.Campbell, stefano.stabellini, George.Dunlap,
	andrew.cooper3, Ian.Jackson, JBeulich, dgdegra

Changes from v11:
 - Turn off pqos and pqos_monitor in Xen command line by default.
 - Modify the original specific MSR access hypercall into a generic
   resource access hypercall. This hypercall could be used to access
   MSR, Port I/O, etc. Use platform_op to replace sysctl so that both
   dom0 kernel and userspace could use this hypercall.
 - Address various comments from Jan, Ian, Konrad, and Daniel.

Changes from v10:
 - Re-design and re-implement the whole logic. In this version,
   hypervisor provides basic mechanisms (like access MSRs) while all
   policies are put in user space.
   patch 1-3 provide a generic MSR hypercall for toolstack to access
   patch 4-9 implement the cache QoS monitoring feature

Changes from v9:
 - Revise the readonly mapping mechanism to share data between Xen and
   userspace. We create L3C buffer for each socket, share both buffer
   address MFNs and buffer MFNs to userspace.
 - Split the pqos.c into pqos.c and cqm.c for better code structure.
 - Show the total L3 cache size when issuing the xl pqos-list cqm command.
 - Abstract a libxl_getcqminfo() function to fetch cqm data from Xen.
 - Several coding style fixes.

Changes from v8:
 - Address comments from Ian Campbell, including:
   * Modify the return handling for xc_sysctl();
   * Add man page items for platform QoS related commands.
   * Fix typo in commit message.

Changes from v7:
 - Address comments from Andrew Cooper, including:
   * Check CQM capability before allocating cpumask memory.
   * Move one function declaration into the correct patch.

Changes from v6:
 - Address comments from Jan Beulich, including:
   * Remove the unnecessary CPUID feature check.
   * Remove the unnecessary socket_cpu_map.
   * Spin_lock related changes, avoid spin_lock_irqsave().
   * Use readonly mapping to pass cqm data between Xen/Userspace,
     to avoid data copying.
   * Optimize RDMSR/WRMSR logic to avoid unnecessary calls.
   * Misc fixes including __read_mostly prefix, return value, etc.

Changes from v5:
 - Address comments from Dario Faggioli, including:
   * Define a new libxl_cqminfo structure to avoid reference of xc
     structure in libxl functions.
   * Use LOGE() instead of the LIBXL__LOG() functions.

Changes from v4:
 - When comparing the xl cqm parameter, use strcmp instead of strncmp;
   otherwise, "xl pqos-attach cqmabcd domid" would be considered a valid
   command line.
 - Address comments from Andrew Cooper, including:
   * Adjust the pqos parameter parsing function.
   * Modify the pqos related documentation.
   * Add a check for opt_cqm_max_rmid in initialization code.
   * Do not IPI CPU that is in same socket with current CPU.
 - Address comments from Dario Faggioli, including:
   * Fix a typo in export symbols.
   * Return correct libxl error code for qos related functions.
   * Abstract the error printing logic into a function.
 - Address comment from Daniel De Graaf, including:
   * Add a return value for the pqos related check.
 - Address comments from Konrad Rzeszutek Wilk, including:
   * Modify the GPLv2 related file header, remove the address.

Changes from v3:
 - Use structure to better organize CQM related global variables.
 - Address comments from Andrew Cooper, including:
   * Remove the domain creation flag for CQM RMID allocation.
   * Adjust the boot parameter format, use custom_param().
   * Add documentation for the new added boot parameter.
   * Change QoS type flag to be uint64_t.
   * Initialize the per socket cpu bitmap in system boot time.
   * Remove get_cqm_avail() function.
   * Misc of format changes.
 - Address comment from Daniel De Graaf, including:
   * Use avc_current_has_perm() for XEN2__PQOS_OP that belongs to SECCLASS_XEN2.

Changes from v2:
 - Address comments from Andrew Cooper, including:
   * Merging tools stack changes into one patch.
   * Reduce the IPI number to one per socket.
   * Change structures for CQM data exchange between tools and Xen.
   * Misc of format/variable/function name changes.
 - Address comments from Konrad Rzeszutek Wilk, including:
   * Simplify the error printing logic.
   * Add xsm check for the new added hypercalls.

Changes from v1:
 - Address comments from Andrew Cooper, including:
   * Change function names, e.g., alloc_cqm_rmid(), system_supports_cqm(), etc.
   * Change some structure element order to save packing cost.
   * Correct some function's return value.
   * Some programming styles change.
   * ...

Future generations of Intel Xeon processors may offer a monitoring capability
in each logical processor to measure specific quality-of-service metrics,
for example Cache QoS Monitoring to obtain L3 cache occupancy.
For detailed information, please refer to Intel SDM chapter 17.14.

Cache QoS Monitoring provides a layer of abstraction between applications and
logical processors through the use of Resource Monitoring IDs (RMIDs).
In the Xen design, each guest in the system can be assigned an RMID
independently, while RMID=0 is reserved for domains that don't enable the CQM
service. When any of the domain's vCPUs is scheduled on a logical processor,
the domain's RMID is activated by programming the value into a specific MSR,
and when the vCPU is scheduled out, RMID=0 is programmed into that MSR.
The Cache QoS hardware tracks the cache utilization of memory accesses
according to the RMIDs and reports the monitored data via a counter register.
With this solution, we can learn how much L3 cache is used by a certain guest.

To attach QoS monitoring service to a certain guest:
xl pqos-monitor-attach domid

To detach the CQM service from a guest:
xl pqos-monitor-detach domid

To get the L3 cache usage:
$ xl pqos-monitor-show cache_occupancy <domid>

The data below is just an example showing how the CQM related data is exposed
to the end user.

[root@localhost]# xl pqos-monitor-show cache_occupancy
Total RMID: 55
Per-Socket L3 Cache Size: 35840 KB
Name                                        ID  SocketID        L3C_Usage       SocketID        L3C_Usage
Domain-0                                     0         0         31920 KB              1         28728 KB
ExampleHVMDomain                             1         0           504 KB              1          6160 KB

Dongxiao Xu (9):
  x86: add generic resource (e.g. MSR) access hypercall
  xsm: add resource operation related xsm policy
  tools: provide interface for generic MSR access
  x86: detect and initialize Platform QoS Monitoring feature
  x86: dynamically attach/detach QoS monitoring service for a guest
  x86: collect global QoS monitoring information
  x86: enable QoS monitoring for each domain RMID
  xsm: add platform QoS related xsm policies
  tools: CMDs and APIs for Platform QoS Monitoring

 docs/man/xl.pod.1                            |   24 +++
 docs/misc/xen-command-line.markdown          |   14 ++
 tools/flask/policy/policy/modules/xen/xen.if |    2 +-
 tools/flask/policy/policy/modules/xen/xen.te |    6 +-
 tools/libxc/Makefile                         |    2 +
 tools/libxc/xc_msr_x86.h                     |   36 +++++
 tools/libxc/xc_pqos.c                        |  219 ++++++++++++++++++++++++++
 tools/libxc/xc_private.h                     |   31 ++++
 tools/libxc/xc_resource.c                    |   53 +++++++
 tools/libxc/xenctrl.h                        |   23 +++
 tools/libxl/Makefile                         |    2 +-
 tools/libxl/libxl.h                          |   17 ++
 tools/libxl/libxl_pqos.c                     |  171 ++++++++++++++++++++
 tools/libxl/libxl_types.idl                  |    4 +
 tools/libxl/xl.h                             |    3 +
 tools/libxl/xl_cmdimpl.c                     |  131 +++++++++++++++
 tools/libxl/xl_cmdtable.c                    |   17 ++
 xen/arch/x86/Makefile                        |    2 +
 xen/arch/x86/domain.c                        |    8 +
 xen/arch/x86/domctl.c                        |   29 ++++
 xen/arch/x86/platform_hypercall.c            |   39 +++++
 xen/arch/x86/pqos.c                          |  193 +++++++++++++++++++++++
 xen/arch/x86/resource.c                      |  119 ++++++++++++++
 xen/arch/x86/setup.c                         |    3 +
 xen/arch/x86/sysctl.c                        |   56 +++++++
 xen/include/asm-x86/cpufeature.h             |    1 +
 xen/include/asm-x86/domain.h                 |    2 +
 xen/include/asm-x86/msr-index.h              |    3 +
 xen/include/asm-x86/pqos.h                   |   65 ++++++++
 xen/include/asm-x86/resource.h               |   40 +++++
 xen/include/public/domctl.h                  |   12 ++
 xen/include/public/platform.h                |   24 +++
 xen/include/public/sysctl.h                  |   14 ++
 xen/include/xlat.lst                         |    1 +
 xen/xsm/flask/hooks.c                        |   11 ++
 xen/xsm/flask/policy/access_vectors          |   18 ++-
 xen/xsm/flask/policy/security_classes        |    1 +
 37 files changed, 1390 insertions(+), 6 deletions(-)
 create mode 100644 tools/libxc/xc_msr_x86.h
 create mode 100644 tools/libxc/xc_pqos.c
 create mode 100644 tools/libxc/xc_resource.c
 create mode 100644 tools/libxl/libxl_pqos.c
 create mode 100644 xen/arch/x86/pqos.c
 create mode 100644 xen/arch/x86/resource.c
 create mode 100644 xen/include/asm-x86/pqos.h
 create mode 100644 xen/include/asm-x86/resource.h

-- 
1.7.9.5

* Re: [PATCH v12 1/9] x86: add generic resource (e.g. MSR) access hypercall
@ 2014-07-15  2:23 Xu, Dongxiao
  2014-07-15 10:00 ` Andrew Cooper
  0 siblings, 1 reply; 50+ messages in thread
From: Xu, Dongxiao @ 2014-07-15  2:23 UTC (permalink / raw)
  To: Andrew Cooper, Jan Beulich
  Cc: keir@xen.org, Ian.Campbell@citrix.com,
	George.Dunlap@eu.citrix.com, stefano.stabellini@eu.citrix.com,
	Ian.Jackson@eu.citrix.com, xen-devel@lists.xen.org,
	dgdegra@tycho.nsa.gov

> -----Original Message-----
> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: Friday, July 11, 2014 5:25 PM
> To: Xu, Dongxiao; Jan Beulich
> Cc: Ian.Campbell@citrix.com; George.Dunlap@eu.citrix.com;
> Ian.Jackson@eu.citrix.com; stefano.stabellini@eu.citrix.com;
> xen-devel@lists.xen.org; konrad.wilk@oracle.com; dgdegra@tycho.nsa.gov;
> keir@xen.org
> Subject: Re: [PATCH v12 1/9] x86: add generic resource (e.g. MSR) access
> hypercall
> 
> On 11/07/14 05:29, Xu, Dongxiao wrote:
> >> -----Original Message-----
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> Sent: Friday, July 04, 2014 6:44 PM
> >> To: Xu, Dongxiao
> >> Cc: andrew.cooper3@citrix.com; Ian.Campbell@citrix.com;
> >> George.Dunlap@eu.citrix.com; Ian.Jackson@eu.citrix.com;
> >> stefano.stabellini@eu.citrix.com; xen-devel@lists.xen.org;
> >> konrad.wilk@oracle.com; dgdegra@tycho.nsa.gov; keir@xen.org
> >> Subject: Re: [PATCH v12 1/9] x86: add generic resource (e.g. MSR) access
> >> hypercall
> >>
> >>>>> On 04.07.14 at 10:34, <dongxiao.xu@intel.com> wrote:
> >>> Add a generic resource access hypercall for tool stack or other
> >>> components, e.g., accessing MSR, port I/O, etc.
> >> Sigh - you're still allocating an unbounded amount of memory for
> >> passing around the input arguments, despite it being possible (and
> >> having been suggested) to read these from the original buffer on
> >> each iteration. You're still not properly checking for preemption
> >> between iterations. And you're still not making use of
> >> continue_hypercall_on_cpu(). Plus you now silently ignore the
> >> upper 32-bits of the passing in "idx" value as well as not
> >> understood XEN_RESOURCE_OP_* values.
> > continue_hypercall_on_cpu() is asynchronous, which requires that the
> > "data" field always point to the right place before the hypercall returns.
> > However, in our function we have a "for" loop covering multiple
> > operations, so the "data" field will be modified in each iteration, which
> > cannot meet continue_hypercall_on_cpu()'s requirements...
> 
> This is because you are still copying all resource data at once from the
> hypercall.
> 
> As Jan points out, this is an unbounded allocation in Xen which must be
> addressed.  If instead you were to copy each element one at a time, you
> would avoid this allocation entirely and be able to correctly use
> continue_hypercall_on_cpu().

I've accepted the idea of copying the elements one by one; however, it seems that it doesn't help with continue_hypercall_on_cpu().
The full code looks like the following, where the variable "ra" is updated on every iteration of the "for" loop, and so can't be handed to continue_hypercall_on_cpu().
Do you have an idea of how to solve this issue and use continue_hypercall_on_cpu() here?

static int resource_access_helper(struct xenpf_resource_op *op)
{
    struct xen_resource_access ra;
    unsigned int i;
    int ret = 0;

    for ( i = 0; i < op->nr; i++ )
    {
        /* Copy the i-th entry in from the guest buffer, one at a time. */
        if ( copy_from_guest_offset(&ra.data, op->data, i, 1) )
        {
            ret = -EFAULT;
            break;
        }

        /* Perform the access locally, or IPI the target CPU to do it. */
        if ( ra.data.cpu == smp_processor_id() )
            resource_access_one(&ra);
        else
            on_selected_cpus(cpumask_of(ra.data.cpu),
                             resource_access_one, &ra, 1);

        /* Copy the (possibly updated) entry back out to the guest. */
        if ( copy_to_guest_offset(op->data, i, &ra.data, 1) )
        {
            ret = -EFAULT;
            break;
        }
    }

    return ret;
}

> 
> 
> >
> > For the preemption check, what about the following? Here the preemption is
> checked within each resource_access_one() function.
> 
> None of this preemption works.
> 
> In the case a hypercall gets preempted, you need to increment the guest
> handle along to the next element to process, and decrement the count by
> the number of elements processed in *the guest context*.
> 
> That way, when the hypercall continues in Xen, it shall pick up with the
> next action to perform rather than restarting from the beginning.

Some actions (like CQM) require reading/writing the MSRs in a continuous way; if the sequence is interrupted, this "continuity" can't be guaranteed. The RESTART return value indicates that the operations should be re-run.
BTW, I tested it on my box, and the "failure" case doesn't happen frequently.

Thanks,
Dongxiao

> 
> ~Andrew


Thread overview: 50+ messages
2014-07-04  8:34 [PATCH v12 0/9] enable Cache QoS Monitoring (CQM) feature Dongxiao Xu
2014-07-04  8:34 ` [PATCH v12 1/9] x86: add generic resource (e.g. MSR) access hypercall Dongxiao Xu
2014-07-04  9:40   ` Andrew Cooper
2014-07-04 10:30     ` Jan Beulich
2014-07-04 10:52       ` Andrew Cooper
2014-07-08  7:06         ` Xu, Dongxiao
2014-07-08  9:07           ` Andrew Cooper
2014-07-08  9:30             ` Jürgen Groß
2014-07-09  2:06             ` Xu, Dongxiao
2014-07-09 14:17               ` Daniel De Graaf
2014-07-08  8:57         ` George Dunlap
2014-07-08  9:20           ` Andrew Cooper
2014-07-04 10:44   ` Jan Beulich
2014-07-11  4:29     ` Xu, Dongxiao
2014-07-11  9:24       ` Andrew Cooper
2014-07-04  8:34 ` [PATCH v12 2/9] xsm: add resource operation related xsm policy Dongxiao Xu
2014-07-08 21:22   ` Daniel De Graaf
2014-07-09  5:28     ` Xu, Dongxiao
2014-07-09 14:17       ` Daniel De Graaf
2014-07-04  8:34 ` [PATCH v12 3/9] tools: provide interface for generic MSR access Dongxiao Xu
2014-07-04 11:42   ` Jan Beulich
2014-07-09 16:58     ` Ian Campbell
2014-07-23  7:48       ` Jan Beulich
2014-07-24  6:31         ` Xu, Dongxiao
2014-07-24  6:56           ` Jan Beulich
2014-07-24  6:36         ` Xu, Dongxiao
2014-07-09 17:01   ` Ian Campbell
2014-07-04  8:34 ` [PATCH v12 4/9] x86: detect and initialize Platform QoS Monitoring feature Dongxiao Xu
2014-07-04 11:56   ` Jan Beulich
2014-07-15  6:18     ` Xu, Dongxiao
2014-07-04  8:34 ` [PATCH v12 5/9] x86: dynamically attach/detach QoS monitoring service for a guest Dongxiao Xu
2014-07-04 12:06   ` Jan Beulich
2014-07-15  5:31     ` Xu, Dongxiao
2014-07-23  7:53       ` Jan Beulich
2014-07-04  8:34 ` [PATCH v12 6/9] x86: collect global QoS monitoring information Dongxiao Xu
2014-07-04 12:14   ` Jan Beulich
2014-08-01  8:26     ` Xu, Dongxiao
2014-08-01  9:19       ` Jan Beulich
2014-07-04  8:34 ` [PATCH v12 7/9] x86: enable QoS monitoring for each domain RMID Dongxiao Xu
2014-07-04 12:15   ` Jan Beulich
2014-07-04  8:34 ` [PATCH v12 8/9] xsm: add platform QoS related xsm policies Dongxiao Xu
2014-07-08 21:22   ` Daniel De Graaf
2014-07-04  8:34 ` [PATCH v12 9/9] tools: CMDs and APIs for Platform QoS Monitoring Dongxiao Xu
2014-07-10 15:50   ` Ian Campbell
2014-07-04 10:26 ` [PATCH v12 0/9] enable Cache QoS Monitoring (CQM) feature Jan Beulich
  -- strict thread matches above, loose matches on Subject: below --
2014-07-15  2:23 [PATCH v12 1/9] x86: add generic resource (e.g. MSR) access hypercall Xu, Dongxiao
2014-07-15 10:00 ` Andrew Cooper
2014-07-23  7:45   ` Jan Beulich
2014-07-23  9:09     ` Andrew Cooper
2014-07-28 10:01       ` Jan Beulich
