* [PATCH] Xenoprof patches for xen-unstable
@ 2005-05-25 19:10 Santos, Jose Renato G
2005-05-25 19:22 ` Nivedita Singhvi
2005-05-31 16:08 ` Andrew Theurer
0 siblings, 2 replies; 12+ messages in thread
From: Santos, Jose Renato G @ 2005-05-25 19:10 UTC (permalink / raw)
To: Xen-devel
[-- Attachment #1: Type: text/plain, Size: 936 bytes --]
hi,
I have attached patches for enabling system wide profiling
using oprofile for xen unstable.
The patches were generated against change-set 1.1507 (May 22).
The 4 attached files are
1) xenoprof.txt:
- xenoprof overview and user guide
2) xenoprof-1.1-xen-3.0-devel.patch:
- patch for xen
3) xenoprof-1.1-linux-2.6.11:
- patch for linux. Note that this needs to be applied
twice, once to linux-2.6.11-xen0 and once to
linux-2.6.11-xenU. (This is different than the last
patch which was created against the linux sparse tree).
4) xenoprof-1.1-oprofile-0.8.2:
- patch for oprofile version 0.8.2
Current known limitation/bugs are
- No support for SMP guests yet.
- when using passive domains, most samples are lost.
I will be working on these issues and post new patches
when I have them available.
Thanks
Renato
[-- Attachment #2: xenoprof.txt --]
[-- Type: text/plain, Size: 9765 bytes --]
XENOPROF - Performance profiling in Xen
=========================================
User Guide
============
Version: 1.1
Date: May 25, 2005
Copyright (C) 2005 Hewlett-Packard Co. (http://xenoprof.sourceforge.net)
(Aravind Menon, Jose Renato Santos, Yoshio Turner, G.(John) Janakiraman)
1. Overview
===========
This file provides an overview of Xenoprof, a system-wide statistical
profiling toolkit implemented for the Xen virtual machine environment.
The Xenoprof toolkit supports system-wide coordinated profiling in a
Xen environment to obtain the distribution of hardware events such as
clock cycles, instruction execution, TLB and cache misses, etc. Xenoprof
allows profiling of concurrently executing virtual machines (which
includes the operating system and applications running in each virtual
machine) and the Xen VMM itself. Xenoprof provides profiling data at the
fine granularity of individual processes and routines executing in either
the virtual machine or in the Xen VMM
Xenoprof was developed at HP Labs by modifying and extending the
original OProfile code for linux (http://oprofile.sourceforge.net).
We assume the reader is familiar with OProfile and its tools. If you
are not familiar with OProfile we suggest that you read the OProfile
user manual at http://oprofile.sourceforge.net/docs before using Xenoprof.
System wide profiling in Xen requires the cooperation of 3 software
components at different levels of the software stack.
a) Xenoprof:
Extensions to the Xen hypervisor to support system-wide statistical
profiling. Xenoprof programs hardware performance counters to
generate sampling interrupts at regular event count intervals, and
handles the Non Maskable Interrupts (NMI) generated by the
performance counters at overflow. The NMI handler samples the
program counter (PC) at the time of interrupt and stores the PC
value in a per domain sample buffer. Domains interact with
Xenoprof using a specific hypercall. This hypercall enables
domains to define the hardware performance events to be sampled and
their parameters (e.g., overflow interval), as well as to control
the start and end of profiling. Domains are notified of new PC
samples in their respective sample buffers using the virtual
interrupt mechanism provided by Xen (e.g., event notification).
b) OProfile kernel module:
This module is responsible for interpreting the PC samples received
from Xenoprof and mapping the PC sample to the appropriate routine
in user, kernel or hypervisor level. The original OProfile kernel
module for linux was modified to use the Xenoprof interface instead
of accessing the hardware counters directly.
The OProfile module is organized in two main components: a low
level driver, specific to a particular CPU model, and a generic
module that is independent of the specific CPU model and implements
the higher level profiling functions. To enable OProfile to be
used with Xenoprof, a new low level driver specific to Xen was
created. This driver accesses the hypervisor through the exposed
Xenoprof interface, while the high level generic module was kept
almost unmodified, except for minor changes necessary to interpret
performance events associated with the hypervisor.
c) OProfile user level daemon and tools:
The user level daemon is responsible for collecting the performance
event samples from the kernel module and storing them on files for
later processing and reporting. The user level tools implement
commands that enable the user to start and stop a profiling
session, selecting the appropriate performance events and
parameters as well as to generate reports. In order to be used in
a Xen environment these tools were slightly modified. In
particular, new command line options were added to the opcontrol
command as described below.
2. Profiling multiple domains
=============================
A profiling session may profile one, a subset, or all domains running
in a particular physical machine. In every profiling session one of
the domains takes the role of the initiator, which is responsible for
configuring, starting and stopping the session. Other domains can be
included in the session as active participants or passive
participants. Active participants are domains which have an active
OProfile kernel module that can map a PC sample to the appropriate
routine in user, kernel or hypervisor level, given that the CPU was
executing that domain when the PC was sampled. Passive participants
do not need to be executing an OProfile kernel module. For these
domains performance profiling is done at a coarser granularity with PC
samples being assigned to the domain as whole, instead of to specific
routines. Passive domains are useful when profiling systems running
domains with operating systems that do not support the OProfile kernel
module or equivalent. Note that the initiator must always be an
active domain. The initiator will process the PC samples of all
passive domains.
A performance event (generated when one of the hardware performance
counters overflows) is delivered to the appropriate domain, depending
on the type of domain running at the time of the event. If the
running domain is an active domain the PC sample is delivered to that
domain. If the running domain is a passive domain, the PC sample is
delivered to the initiator. If the running domain is not included in
the profiling session, the PC sample is discarded.
3) Extensions to OProfile user level commands
=============================================
A few command line options were added to OProfile command "opcontrol"
for use in Xen environments. The new command line options are:
a) --xen=<xen_image_file>
This option is used to specify the xen image (e.g. xen-syms). This
is used to resolve PC samples collected when executing the
hypervisor.
b) --active-domains=<list>
(where <list> is a list of comma separated domain ids)
This option is used in the initiator domain to specify the list of
active domains to be profiled. The specification of the initiator
domain id in the list of active domains is not necessary. The
initiator domain will always be considered an active domain and its
inclusion on the specified active domain list is optional.
For example: --active-domains=2,5,6 indicates that domains 2, 5 and
6 are active domains. Assuming that domain 0 was the initiator the
previous specification would be equivalent to
--active-domains=0,2,5,6.
c) --passive-domains=<list>
This option is used to specify the list of passive domains.
Besides opcontrol no other OProfile commands were modified for use in
Xen environments.
Full system profiling reports can be easily obtained by concatenating
the individual reports of each active domain, using the regular
opreport command in each active domain. New tools that combine
multiple reports on a single system-wide report can be implemented in
the future.
4) Multi-domain profiling
=========================
In order to start and stop a profiling session across multiple domains
a set of OProfile commands must be executed in the multiple domains in
a coordinated way. A typical sequence of commands for starting and
stopping profiling are listed below
A) Sequence of commands to start profiling:
1) On the initiator domain
> opcontrol --reset
(clear out any previous data of current session)
> opcontrol --start-daemon
[--active-domains=<active_list>]
[--passive-domains=<passive_list>] ...
(start OProfile daemon and specify the set of active and
passive domains in the session)
2) On each active domain
> opcontrol --reset
> opcontrol --start
(indicates domain is ready to process performance events)
3) On initiator
> opcontrol --start
(Multi-domain profiling session starts)
(This is only successful if all active domains are ready)
B) Sequence of commands to stop profiling
1) On each active domain
> opcontrol --stop
2) On initiator domain
> opcontrol --stop
5) Current supported configurations
a) Xen versions: Xen 2.0.3 to 2.0.5;
Xen 3.0-devel (tested on xen-unstable as of
05/22/2005 - change set 1.1507)
b) Oprofile version 0.8.2
b) Processor architecture: X86
c) Processor models: Pentium 4, Pentium iii
d) Active Domains: Uniprocessor - linux 2.6. (No SMP, No linux 2.4)
e) Passive Domains: No restriction
6) Patch files
==============
In order to run OProfile in Xen environments three patches are needed:
a) xenoprof-1.1-xen-3.0-devel.patch
Patch for Xen hypervisor.5
b) xenoprof-1.1-linux-2.6.11.patch
Patch for Linux 2.6.11
(Apply twice: One to linux-2.6.11-xen0 and once to linux-2.6.11-xenU)
c) xenoprof-1.1-oprofile-0.8.2.patch
Patch for OProfile 0.8.2
Version 1.1
===========
- Modifications necessary for running on xen 3.0 (xen-unstable)
(This version will not run on xen 2.0. For xen 2.0 use xenoprof 1.0)
- Code clean up based on John Levon comments
- Current known limitations/bugs
- no support for SMP guests
- large number of passive domains samples being lost
[-- Attachment #3: xenoprof-1.1-xen-3.0-devel.patch --]
[-- Type: application/octet-stream, Size: 61330 bytes --]
diff -PNaur xen-unstable/xen/arch/x86/Makefile xen-unstable-prof/xen/arch/x86/Makefile
--- xen-unstable/xen/arch/x86/Makefile 2005-05-22 20:12:45.000000000 -0700
+++ xen-unstable-prof/xen/arch/x86/Makefile 2005-05-24 11:54:47.000000000 -0700
@@ -12,7 +12,10 @@
OBJS := $(patsubst cdb%.o,,$(OBJS))
endif
+OBJS += oprofile/oprofile.o
+
default: $(TARGET)
+ make -C oprofile
$(TARGET): $(TARGET)-syms boot/mkelf32
./boot/mkelf32 $(TARGET)-syms $(TARGET) 0x100000
@@ -30,12 +33,16 @@
boot/mkelf32: boot/mkelf32.c
$(HOSTCC) $(HOSTCFLAGS) -o $@ $<
+oprofile/oprofile.o:
+ $(MAKE) -C oprofile
+
clean:
rm -f *.o *.s *~ core boot/*.o boot/*~ boot/core boot/mkelf32
rm -f x86_32/*.o x86_32/*~ x86_32/core
rm -f x86_64/*.o x86_64/*~ x86_64/core
rm -f mtrr/*.o mtrr/*~ mtrr/core
rm -f acpi/*.o acpi/*~ acpi/core
+ rm -f oprofile/*.o
delete-unfresh-files:
# nothing
diff -PNaur xen-unstable/xen/arch/x86/nmi.c xen-unstable-prof/xen/arch/x86/nmi.c
--- xen-unstable/xen/arch/x86/nmi.c 2005-05-22 20:12:45.000000000 -0700
+++ xen-unstable-prof/xen/arch/x86/nmi.c 2005-05-24 11:54:47.000000000 -0700
@@ -5,6 +5,10 @@
*
* Started by Ingo Molnar <mingo@redhat.com>
*
+ * Modified by Aravind Menon for supporting oprofile
+ * These modifications are:
+ * Copyright (C) 2005 Hewlett-Packard Co.
+ *
* Fixes:
* Mikael Pettersson : AMD K7 support for local APIC NMI watchdog.
* Mikael Pettersson : Power Management for local APIC NMI watchdog.
@@ -33,6 +37,28 @@
extern int logical_proc_id[];
+/*
+ * lapic_nmi_owner tracks the ownership of the lapic NMI hardware:
+ * - it may be reserved by some other driver, or not
+ * - when not reserved by some other driver, it may be used for
+ * the NMI watchdog, or not
+ *
+ * This is maintained separately from nmi_active because the NMI
+ * watchdog may also be driven from the I/O APIC timer.
+ */
+static spinlock_t lapic_nmi_owner_lock = SPIN_LOCK_UNLOCKED;
+static unsigned int lapic_nmi_owner;
+#define LAPIC_NMI_WATCHDOG (1<<0)
+#define LAPIC_NMI_RESERVED (1<<1)
+
+/* nmi_active:
+ * +1: the lapic NMI watchdog is active, but can be disabled
+ * 0: the lapic NMI watchdog has not been set up, and cannot
+ * be enabled
+ * -1: the lapic NMI watchdog is disabled, but can be enabled
+ */
+int nmi_active;
+
#define K7_EVNTSEL_ENABLE (1 << 22)
#define K7_EVNTSEL_INT (1 << 20)
#define K7_EVNTSEL_OS (1 << 17)
@@ -69,9 +95,6 @@
*/
#define MSR_P4_IQ_COUNTER0 0x30C
#define MSR_P4_IQ_COUNTER1 0x30D
-#define MSR_P4_IQ_CCCR0 0x36C
-#define MSR_P4_IQ_CCCR1 0x36D
-#define MSR_P4_CRU_ESCR0 0x3B8 /* ESCR no. 4 */
#define P4_NMI_CRU_ESCR0 \
(P4_ESCR_EVENT_SELECT(0x3F)|P4_ESCR_OS0|P4_ESCR_USR0| \
P4_ESCR_OS1|P4_ESCR_USR1)
@@ -123,6 +146,69 @@
* Original code written by Keith Owens.
*/
+static void disable_lapic_nmi_watchdog(void)
+{
+ if (nmi_active <= 0)
+ return;
+ switch (boot_cpu_data.x86_vendor) {
+ case X86_VENDOR_AMD:
+ wrmsr(MSR_K7_EVNTSEL0, 0, 0);
+ break;
+ case X86_VENDOR_INTEL:
+ switch (boot_cpu_data.x86) {
+ case 6:
+ wrmsr(MSR_P6_EVNTSEL0, 0, 0);
+ break;
+ case 15:
+ if (logical_proc_id[smp_processor_id()] == 0)
+ {
+ wrmsr(MSR_P4_IQ_CCCR0, 0, 0);
+ wrmsr(MSR_P4_CRU_ESCR0, 0, 0);
+ } else {
+ wrmsr(MSR_P4_IQ_CCCR1, 0, 0);
+ }
+ break;
+ }
+ break;
+ }
+ nmi_active = -1;
+ /* tell do_nmi() and others that we're not active any more */
+ nmi_watchdog = 0;
+}
+
+static void enable_lapic_nmi_watchdog(void)
+{
+ if (nmi_active < 0) {
+ nmi_watchdog = NMI_LOCAL_APIC;
+ setup_apic_nmi_watchdog();
+ }
+}
+
+int reserve_lapic_nmi(void)
+{
+ unsigned int old_owner;
+ spin_lock(&lapic_nmi_owner_lock);
+ old_owner = lapic_nmi_owner;
+ lapic_nmi_owner |= LAPIC_NMI_RESERVED;
+ spin_unlock(&lapic_nmi_owner_lock);
+ if (old_owner & LAPIC_NMI_RESERVED)
+ return -EBUSY;
+ if (old_owner & LAPIC_NMI_WATCHDOG)
+ disable_lapic_nmi_watchdog();
+ return 0;
+}
+
+void release_lapic_nmi(void)
+{
+ unsigned int new_owner;
+ spin_lock(&lapic_nmi_owner_lock);
+ new_owner = lapic_nmi_owner & ~LAPIC_NMI_RESERVED;
+ lapic_nmi_owner = new_owner;
+ spin_unlock(&lapic_nmi_owner_lock);
+ if (new_owner & LAPIC_NMI_WATCHDOG)
+ enable_lapic_nmi_watchdog();
+}
+
static void __pminit clear_msr_range(unsigned int base, unsigned int n)
{
unsigned int i;
@@ -247,6 +333,8 @@
default:
return;
}
+ lapic_nmi_owner = LAPIC_NMI_WATCHDOG;
+ nmi_active = 1;
nmi_pm_init();
}
@@ -333,3 +421,7 @@
}
}
}
+
+EXPORT_SYMBOL(reserve_lapic_nmi);
+EXPORT_SYMBOL(release_lapic_nmi);
+
diff -PNaur xen-unstable/xen/arch/x86/oprofile/Makefile xen-unstable-prof/xen/arch/x86/oprofile/Makefile
--- xen-unstable/xen/arch/x86/oprofile/Makefile 1969-12-31 16:00:00.000000000 -0800
+++ xen-unstable-prof/xen/arch/x86/oprofile/Makefile 2005-05-24 11:54:47.000000000 -0700
@@ -0,0 +1,9 @@
+
+include $(BASEDIR)/Rules.mk
+
+default: $(OBJS)
+ $(LD) $(LDFLAGS) -r -o oprofile.o $(OBJS)
+
+%.o: %.c $(HDRS) Makefile
+ $(CC) $(CFLAGS) -c $< -o $@
+
diff -PNaur xen-unstable/xen/arch/x86/oprofile/nmi_int.c xen-unstable-prof/xen/arch/x86/oprofile/nmi_int.c
--- xen-unstable/xen/arch/x86/oprofile/nmi_int.c 1969-12-31 16:00:00.000000000 -0800
+++ xen-unstable-prof/xen/arch/x86/oprofile/nmi_int.c 2005-05-24 11:54:47.000000000 -0700
@@ -0,0 +1,433 @@
+/**
+ * @file nmi_int.c
+ *
+ * @remark Copyright 2002 OProfile authors
+ * @remark Read the file COPYING
+ *
+ * @author John Levon <levon@movementarian.org>
+ *
+ * Modified by Aravind Menon for Xen
+ * These modifications are:
+ * Copyright (C) 2005 Hewlett-Packard Co.
+ */
+
+#include <xen/event.h>
+#include <xen/types.h>
+#include <xen/errno.h>
+#include <xen/init.h>
+#include <public/xen.h>
+#include <asm/nmi.h>
+#include <asm/msr.h>
+#include <asm/apic.h>
+
+#include "op_counter.h"
+#include "op_x86_model.h"
+
+static struct op_x86_model_spec const * model;
+static struct op_msrs cpu_msrs[NR_CPUS];
+static unsigned long saved_lvtpc[NR_CPUS];
+
+#define VIRQ_BITMASK_SIZE (MAX_OPROF_DOMAINS/32 + 1)
+
+extern int active_domains[MAX_OPROF_DOMAINS];
+extern unsigned int adomains;
+
+extern struct domain * primary_profiler;
+extern struct domain * adomain_ptrs[MAX_OPROF_DOMAINS];
+extern unsigned long virq_ovf_pending[VIRQ_BITMASK_SIZE];
+
+extern int is_active(struct domain *d);
+extern int active_id(struct domain *d);
+extern int is_passive(struct domain *d);
+extern int is_profiled(struct domain *d);
+
+
+int nmi_profiling_started = 0;
+
+int active_virq_count = 0;
+int passive_virq_count = 0;
+int other_virq_count = 0;
+int other_id = -1;
+int xen_count = 0;
+int dom_count = 0;
+int ovf = 0;
+
+int nmi_callback(struct cpu_user_regs * regs, int cpu)
+{
+ int xen_mode = 0;
+
+ ovf = model->check_ctrs(cpu, &cpu_msrs[cpu], regs);
+ xen_mode = RING_0(regs);
+ if (ovf) {
+ if (xen_mode)
+ xen_count++;
+ else
+ dom_count++;
+
+ if (is_active(current->domain)) {
+ /* This is lightly incorrect. If we do not deliver
+ OVF virtual interrupts in a synchronous
+ manner, a process switch may happen in the domain
+ between the point the sample was collected and
+ the point at which a VIRQ was delivered. However,
+ it is not safe to call send_guest_virq from this
+ NMI context, it may lead to a deadlock since NMIs are
+ unmaskable. One optimization that we can do is
+ that if the sample occurs while domain code is
+ runnng, we know that it is safe to call
+ send_guest_virq, since we know no Xen code
+ is running at that time.
+ However, this may distort the sample distribution,
+ because we may lose more Xen mode samples.*/
+ active_virq_count++;
+ if (!xen_mode) {
+ send_guest_virq(current, VIRQ_PMC_OVF);
+ clear_bit(active_id(current->domain), &virq_ovf_pending[0]);
+ } else
+ set_bit(active_id(current->domain), &virq_ovf_pending[0]);
+ primary_profiler->shared_info->active_samples++;
+ }
+ else if (is_passive(current->domain)) {
+ set_bit(active_id(primary_profiler), &virq_ovf_pending[0]);
+ passive_virq_count++;
+ primary_profiler->shared_info->passive_samples++;
+ }
+ else {
+ other_virq_count++;
+ other_id = current->domain->domain_id;
+ primary_profiler->shared_info->other_samples++;
+ }
+ }
+ return 1;
+}
+
+static void free_msrs(void)
+{
+ int i;
+ for (i = 0; i < NR_CPUS; ++i) {
+ xfree(cpu_msrs[i].counters);
+ cpu_msrs[i].counters = NULL;
+ xfree(cpu_msrs[i].controls);
+ cpu_msrs[i].controls = NULL;
+ }
+}
+
+static int allocate_msrs(void)
+{
+ int success = 1;
+ size_t controls_size = sizeof(struct op_msr) * model->num_controls;
+ size_t counters_size = sizeof(struct op_msr) * model->num_counters;
+
+ int i;
+ for (i = 0; i < NR_CPUS; ++i) {
+ //if (!cpu_online(i))
+ if (!test_bit(i, &cpu_online_map))
+ continue;
+
+ cpu_msrs[i].counters = xmalloc_bytes(counters_size);
+ if (!cpu_msrs[i].counters) {
+ success = 0;
+ break;
+ }
+ cpu_msrs[i].controls = xmalloc_bytes(controls_size);
+ if (!cpu_msrs[i].controls) {
+ success = 0;
+ break;
+ }
+ }
+ if (!success)
+ free_msrs();
+
+ return success;
+}
+
+static void nmi_cpu_save_registers(struct op_msrs * msrs)
+{
+ unsigned int const nr_ctrs = model->num_counters;
+ unsigned int const nr_ctrls = model->num_controls;
+ struct op_msr * counters = msrs->counters;
+ struct op_msr * controls = msrs->controls;
+ unsigned int i;
+
+ for (i = 0; i < nr_ctrs; ++i) {
+ rdmsr(counters[i].addr,
+ counters[i].saved.low,
+ counters[i].saved.high);
+ }
+
+ for (i = 0; i < nr_ctrls; ++i) {
+ rdmsr(controls[i].addr,
+ controls[i].saved.low,
+ controls[i].saved.high);
+ }
+}
+
+static void nmi_save_registers(void * dummy)
+{
+ int cpu = smp_processor_id();
+ struct op_msrs * msrs = &cpu_msrs[cpu];
+ model->fill_in_addresses(msrs);
+ nmi_cpu_save_registers(msrs);
+}
+
+int nmi_reserve_counters(void)
+{
+ if (!allocate_msrs())
+ return -ENOMEM;
+
+ /* We walk a thin line between law and rape here.
+ * We need to be careful to install our NMI handler
+ * without actually triggering any NMIs as this will
+ * break the core code horrifically.
+ */
+ /* Don't we need to do this on all CPUs?*/
+ if (reserve_lapic_nmi() < 0) {
+ free_msrs();
+ return -EBUSY;
+ }
+ /* We need to serialize save and setup for HT because the subset
+ * of msrs are distinct for save and setup operations
+ */
+ on_each_cpu(nmi_save_registers, NULL, 0, 1);
+ return 0;
+}
+
+static void nmi_cpu_setup(void * dummy)
+{
+ int cpu = smp_processor_id();
+ struct op_msrs * msrs = &cpu_msrs[cpu];
+ model->setup_ctrs(msrs);
+}
+
+int nmi_setup_events(void)
+{
+ on_each_cpu(nmi_cpu_setup, NULL, 0, 1);
+ return 0;
+}
+
+int nmi_enable_virq()
+{
+ set_nmi_callback(nmi_callback);
+ return 0;
+}
+
+static void nmi_cpu_start(void * dummy)
+{
+ int cpu = smp_processor_id();
+ struct op_msrs const * msrs = &cpu_msrs[cpu];
+ saved_lvtpc[cpu] = apic_read(APIC_LVTPC);
+ apic_write(APIC_LVTPC, APIC_DM_NMI);
+ model->start(msrs);
+}
+
+int nmi_start(void)
+{
+ on_each_cpu(nmi_cpu_start, NULL, 0, 1);
+ nmi_profiling_started = 1;
+ return 0;
+}
+
+static void nmi_cpu_stop(void * dummy)
+{
+ unsigned int v;
+ int cpu = smp_processor_id();
+ struct op_msrs const * msrs = &cpu_msrs[cpu];
+ model->stop(msrs);
+
+ /* restoring APIC_LVTPC can trigger an apic error because the delivery
+ * mode and vector nr combination can be illegal. That's by design: on
+ * power on apic lvt contain a zero vector nr which are legal only for
+ * NMI delivery mode. So inhibit apic err before restoring lvtpc
+ */
+ if (!(apic_read(APIC_LVTPC) & APIC_DM_NMI)
+ || (apic_read(APIC_LVTPC) & APIC_LVT_MASKED)) {
+ printk("nmi_stop: APIC not good %ul\n", apic_read(APIC_LVTPC));
+ mdelay(5000);
+ }
+ v = apic_read(APIC_LVTERR);
+ apic_write(APIC_LVTERR, v | APIC_LVT_MASKED);
+ apic_write(APIC_LVTPC, saved_lvtpc[cpu]);
+ apic_write(APIC_LVTERR, v);
+}
+
+void nmi_stop(void)
+{
+ nmi_profiling_started = 0;
+ on_each_cpu(nmi_cpu_stop, NULL, 0, 1);
+ active_virq_count = 0;
+ passive_virq_count = 0;
+ other_virq_count = 0;
+ xen_count = 0;
+ dom_count = 0;
+}
+
+extern unsigned int read_ctr(struct op_msrs const * const msrs, int ctr);
+
+void nmi_sanity_check(struct cpu_user_regs *regs, int cpu)
+{
+ int i;
+ int masked = 0;
+
+ /* We may have missed some NMI interrupts if we were already
+ in an NMI context at that time. If this happens, then
+ the counters are not reset and in the case of P4, the
+ APIC LVT disable mask is set. In both cases we end up
+ losing samples. On P4, this condition can be detected
+ by checking the APIC LVT mask. But in P6, we need to
+ examine the counters for overflow. So, every timer
+ interrupt, we check that everything is OK */
+
+ if (apic_read(APIC_LVTPC) & APIC_LVT_MASKED)
+ masked = 1;
+
+ nmi_callback(regs, cpu);
+
+ if (ovf && masked) {
+ if (is_active(current->domain))
+ current->domain->shared_info->nmi_restarts++;
+ else if (is_passive(current->domain))
+ primary_profiler->shared_info->nmi_restarts++;
+ }
+
+ /*if (jiffies %1000 == 0) {
+ printk("cpu %d: sample count %d %d %d at %u\n", cpu, active_virq_count, passive_virq_count, other_virq_count, jiffies);
+ printk("other task id %d\n", other_id);
+ printk("%d in xen, %d in domain\n", xen_count, dom_count);
+ printk("counters %p %p\n", read_ctr(&cpu_msrs[cpu], 0), read_ctr(&cpu_msrs[cpu], 1));
+ }*/
+
+
+ for (i = 0; i < adomains; i++)
+ if (test_and_clear_bit(i, &virq_ovf_pending[0])) {
+ /* For now we do not support profiling of SMP guests */
+ /* virq is delivered to first VCPU */
+ send_guest_virq(adomain_ptrs[i]->exec_domain[0], VIRQ_PMC_OVF);
+ }
+}
+
+void nmi_disable_virq(void)
+{
+ unset_nmi_callback();
+}
+
+static void nmi_restore_registers(struct op_msrs * msrs)
+{
+ unsigned int const nr_ctrs = model->num_counters;
+ unsigned int const nr_ctrls = model->num_controls;
+ struct op_msr * counters = msrs->counters;
+ struct op_msr * controls = msrs->controls;
+ unsigned int i;
+
+ for (i = 0; i < nr_ctrls; ++i) {
+ wrmsr(controls[i].addr,
+ controls[i].saved.low,
+ controls[i].saved.high);
+ }
+
+ for (i = 0; i < nr_ctrs; ++i) {
+ wrmsr(counters[i].addr,
+ counters[i].saved.low,
+ counters[i].saved.high);
+ }
+}
+
+static void nmi_cpu_shutdown(void * dummy)
+{
+ int cpu = smp_processor_id();
+ struct op_msrs * msrs = &cpu_msrs[cpu];
+ nmi_restore_registers(msrs);
+}
+
+void nmi_release_counters(void)
+{
+ on_each_cpu(nmi_cpu_shutdown, NULL, 0, 1);
+ release_lapic_nmi();
+ free_msrs();
+}
+
+struct op_counter_config counter_config[OP_MAX_COUNTER];
+
+static int __init p4_init(void)
+{
+ __u8 cpu_model = current_cpu_data.x86_model;
+
+ if (cpu_model > 3)
+ return 0;
+
+#ifndef CONFIG_SMP
+ model = &op_p4_spec;
+ return 1;
+#else
+ //switch (smp_num_siblings) {
+ if (cpu_has_ht)
+ {
+ model = &op_p4_ht2_spec;
+ return 1;
+ }
+ else
+ {
+ model = &op_p4_spec;
+ return 1;
+ }
+#endif
+ return 0;
+}
+
+
+static int __init ppro_init(void)
+{
+ __u8 cpu_model = current_cpu_data.x86_model;
+
+ if (cpu_model > 0xd)
+ return 0;
+
+ model = &op_ppro_spec;
+ return 1;
+}
+
+int nmi_init(int *num_events, int *is_primary)
+{
+ __u8 vendor = current_cpu_data.x86_vendor;
+ __u8 family = current_cpu_data.x86;
+ int prim = 0;
+
+ if (!cpu_has_apic)
+ return -ENODEV;
+
+ if (primary_profiler == NULL) {
+ primary_profiler = current->domain;
+ prim = 1;
+ }
+
+ if (primary_profiler != current->domain)
+ goto out;
+
+ switch (vendor) {
+ case X86_VENDOR_INTEL:
+ switch (family) {
+ /* Pentium IV */
+ case 0xf:
+ if (!p4_init())
+ return -ENODEV;
+ break;
+ /* A P6-class processor */
+ case 6:
+ if (!ppro_init())
+ return -ENODEV;
+ break;
+ default:
+ return -ENODEV;
+ }
+ break;
+ default:
+ return -ENODEV;
+ }
+out:
+ if (copy_to_user((void *)num_events, (void *)&model->num_counters, sizeof(int)))
+ return -EFAULT;
+ if (copy_to_user((void *)is_primary, (void *)&prim, sizeof(int)))
+ return -EFAULT;
+
+ return 0;
+}
+
diff -PNaur xen-unstable/xen/arch/x86/oprofile/op_counter.h xen-unstable-prof/xen/arch/x86/oprofile/op_counter.h
--- xen-unstable/xen/arch/x86/oprofile/op_counter.h 1969-12-31 16:00:00.000000000 -0800
+++ xen-unstable-prof/xen/arch/x86/oprofile/op_counter.h 2005-05-24 11:54:47.000000000 -0700
@@ -0,0 +1,33 @@
+/**
+ * @file op_counter.h
+ *
+ * @remark Copyright 2002 OProfile authors
+ * @remark Read the file COPYING
+ *
+ * @author John Levon
+ *
+ * Modified by Aravind Menon for Xen
+ * These modifications are:
+ * Copyright (C) 2005 Hewlett-Packard Co.
+ */
+
+#ifndef OP_COUNTER_H
+#define OP_COUNTER_H
+
+#define OP_MAX_COUNTER 8
+
+/* Per-perfctr configuration as set via
+ * oprofilefs.
+ */
+struct op_counter_config {
+ unsigned long count;
+ unsigned long enabled;
+ unsigned long event;
+ unsigned long kernel;
+ unsigned long user;
+ unsigned long unit_mask;
+};
+
+extern struct op_counter_config counter_config[];
+
+#endif /* OP_COUNTER_H */
diff -PNaur xen-unstable/xen/arch/x86/oprofile/op_model_p4.c xen-unstable-prof/xen/arch/x86/oprofile/op_model_p4.c
--- xen-unstable/xen/arch/x86/oprofile/op_model_p4.c 1969-12-31 16:00:00.000000000 -0800
+++ xen-unstable-prof/xen/arch/x86/oprofile/op_model_p4.c 2005-05-24 11:54:47.000000000 -0700
@@ -0,0 +1,744 @@
+/**
+ * @file op_model_p4.c
+ * P4 model-specific MSR operations
+ *
+ * @remark Copyright 2002 OProfile authors
+ * @remark Read the file COPYING
+ *
+ * @author Graydon Hoare
+ *
+ * Modified by Aravind Menon for Xen
+ * These modifications are:
+ * Copyright (C) 2005 Hewlett-Packard Co.
+ */
+
+#include <xen/types.h>
+#include <asm/msr.h>
+#include <asm/io.h>
+#include <asm/apic.h>
+#include <asm/processor.h>
+#include <xen/sched.h>
+
+#include "op_x86_model.h"
+#include "op_counter.h"
+
+#define NUM_EVENTS 39
+
+#define NUM_COUNTERS_NON_HT 8
+#define NUM_ESCRS_NON_HT 45
+#define NUM_CCCRS_NON_HT 18
+#define NUM_CONTROLS_NON_HT (NUM_ESCRS_NON_HT + NUM_CCCRS_NON_HT)
+
+#define NUM_COUNTERS_HT2 4
+#define NUM_ESCRS_HT2 23
+#define NUM_CCCRS_HT2 9
+#define NUM_CONTROLS_HT2 (NUM_ESCRS_HT2 + NUM_CCCRS_HT2)
+
+static unsigned int num_counters = NUM_COUNTERS_NON_HT;
+
+
+/* this has to be checked dynamically since the
+ hyper-threadedness of a chip is discovered at
+ kernel boot-time. */
+static inline void setup_num_counters(void)
+{
+#ifdef CONFIG_SMP
+ if (cpu_has_ht)
+ num_counters = NUM_COUNTERS_HT2;
+#endif
+}
+
+static int inline addr_increment(void)
+{
+#ifdef CONFIG_SMP
+ return cpu_has_ht ? 2 : 1;
+#else
+ return 1;
+#endif
+}
+
+
+/* tables to simulate simplified hardware view of p4 registers */
+struct p4_counter_binding {
+ int virt_counter;
+ int counter_address;
+ int cccr_address;
+};
+
+struct p4_event_binding {
+ int escr_select; /* value to put in CCCR */
+ int event_select; /* value to put in ESCR */
+ struct {
+ int virt_counter; /* for this counter... */
+ int escr_address; /* use this ESCR */
+ } bindings[2];
+};
+
+/* nb: these CTR_* defines are a duplicate of defines in
+ event/i386.p4*events. */
+
+
+#define CTR_BPU_0 (1 << 0)
+#define CTR_MS_0 (1 << 1)
+#define CTR_FLAME_0 (1 << 2)
+#define CTR_IQ_4 (1 << 3)
+#define CTR_BPU_2 (1 << 4)
+#define CTR_MS_2 (1 << 5)
+#define CTR_FLAME_2 (1 << 6)
+#define CTR_IQ_5 (1 << 7)
+
+static struct p4_counter_binding p4_counters [NUM_COUNTERS_NON_HT] = {
+ { CTR_BPU_0, MSR_P4_BPU_PERFCTR0, MSR_P4_BPU_CCCR0 },
+ { CTR_MS_0, MSR_P4_MS_PERFCTR0, MSR_P4_MS_CCCR0 },
+ { CTR_FLAME_0, MSR_P4_FLAME_PERFCTR0, MSR_P4_FLAME_CCCR0 },
+ { CTR_IQ_4, MSR_P4_IQ_PERFCTR4, MSR_P4_IQ_CCCR4 },
+ { CTR_BPU_2, MSR_P4_BPU_PERFCTR2, MSR_P4_BPU_CCCR2 },
+ { CTR_MS_2, MSR_P4_MS_PERFCTR2, MSR_P4_MS_CCCR2 },
+ { CTR_FLAME_2, MSR_P4_FLAME_PERFCTR2, MSR_P4_FLAME_CCCR2 },
+ { CTR_IQ_5, MSR_P4_IQ_PERFCTR5, MSR_P4_IQ_CCCR5 }
+};
+
+#define NUM_UNUSED_CCCRS NUM_CCCRS_NON_HT - NUM_COUNTERS_NON_HT
+
+/* All cccr we don't use. */
+static int p4_unused_cccr[NUM_UNUSED_CCCRS] = {
+ MSR_P4_BPU_CCCR1, MSR_P4_BPU_CCCR3,
+ MSR_P4_MS_CCCR1, MSR_P4_MS_CCCR3,
+ MSR_P4_FLAME_CCCR1, MSR_P4_FLAME_CCCR3,
+ MSR_P4_IQ_CCCR0, MSR_P4_IQ_CCCR1,
+ MSR_P4_IQ_CCCR2, MSR_P4_IQ_CCCR3
+};
+
+/* p4 event codes in libop/op_event.h are indices into this table. */
+
+static struct p4_event_binding p4_events[NUM_EVENTS] = {
+
+ { /* BRANCH_RETIRED */
+ 0x05, 0x06,
+ { {CTR_IQ_4, MSR_P4_CRU_ESCR2},
+ {CTR_IQ_5, MSR_P4_CRU_ESCR3} }
+ },
+
+ { /* MISPRED_BRANCH_RETIRED */
+ 0x04, 0x03,
+ { { CTR_IQ_4, MSR_P4_CRU_ESCR0},
+ { CTR_IQ_5, MSR_P4_CRU_ESCR1} }
+ },
+
+ { /* TC_DELIVER_MODE */
+ 0x01, 0x01,
+ { { CTR_MS_0, MSR_P4_TC_ESCR0},
+ { CTR_MS_2, MSR_P4_TC_ESCR1} }
+ },
+
+ { /* BPU_FETCH_REQUEST */
+ 0x00, 0x03,
+ { { CTR_BPU_0, MSR_P4_BPU_ESCR0},
+ { CTR_BPU_2, MSR_P4_BPU_ESCR1} }
+ },
+
+ { /* ITLB_REFERENCE */
+ 0x03, 0x18,
+ { { CTR_BPU_0, MSR_P4_ITLB_ESCR0},
+ { CTR_BPU_2, MSR_P4_ITLB_ESCR1} }
+ },
+
+ { /* MEMORY_CANCEL */
+ 0x05, 0x02,
+ { { CTR_FLAME_0, MSR_P4_DAC_ESCR0},
+ { CTR_FLAME_2, MSR_P4_DAC_ESCR1} }
+ },
+
+ { /* MEMORY_COMPLETE */
+ 0x02, 0x08,
+ { { CTR_FLAME_0, MSR_P4_SAAT_ESCR0},
+ { CTR_FLAME_2, MSR_P4_SAAT_ESCR1} }
+ },
+
+ { /* LOAD_PORT_REPLAY */
+ 0x02, 0x04,
+ { { CTR_FLAME_0, MSR_P4_SAAT_ESCR0},
+ { CTR_FLAME_2, MSR_P4_SAAT_ESCR1} }
+ },
+
+ { /* STORE_PORT_REPLAY */
+ 0x02, 0x05,
+ { { CTR_FLAME_0, MSR_P4_SAAT_ESCR0},
+ { CTR_FLAME_2, MSR_P4_SAAT_ESCR1} }
+ },
+
+ { /* MOB_LOAD_REPLAY */
+ 0x02, 0x03,
+ { { CTR_BPU_0, MSR_P4_MOB_ESCR0},
+ { CTR_BPU_2, MSR_P4_MOB_ESCR1} }
+ },
+
+ { /* PAGE_WALK_TYPE */
+ 0x04, 0x01,
+ { { CTR_BPU_0, MSR_P4_PMH_ESCR0},
+ { CTR_BPU_2, MSR_P4_PMH_ESCR1} }
+ },
+
+ { /* BSQ_CACHE_REFERENCE */
+ 0x07, 0x0c,
+ { { CTR_BPU_0, MSR_P4_BSU_ESCR0},
+ { CTR_BPU_2, MSR_P4_BSU_ESCR1} }
+ },
+
+ { /* IOQ_ALLOCATION */
+ 0x06, 0x03,
+ { { CTR_BPU_0, MSR_P4_FSB_ESCR0},
+ { 0, 0 } }
+ },
+
+ { /* IOQ_ACTIVE_ENTRIES */
+ 0x06, 0x1a,
+ { { CTR_BPU_2, MSR_P4_FSB_ESCR1},
+ { 0, 0 } }
+ },
+
+ { /* FSB_DATA_ACTIVITY */
+ 0x06, 0x17,
+ { { CTR_BPU_0, MSR_P4_FSB_ESCR0},
+ { CTR_BPU_2, MSR_P4_FSB_ESCR1} }
+ },
+
+ { /* BSQ_ALLOCATION */
+ 0x07, 0x05,
+ { { CTR_BPU_0, MSR_P4_BSU_ESCR0},
+ { 0, 0 } }
+ },
+
+ { /* BSQ_ACTIVE_ENTRIES */
+ 0x07, 0x06,
+ { { CTR_BPU_2, MSR_P4_BSU_ESCR1 /* guess */},
+ { 0, 0 } }
+ },
+
+ { /* X87_ASSIST */
+ 0x05, 0x03,
+ { { CTR_IQ_4, MSR_P4_CRU_ESCR2},
+ { CTR_IQ_5, MSR_P4_CRU_ESCR3} }
+ },
+
+ { /* SSE_INPUT_ASSIST */
+ 0x01, 0x34,
+ { { CTR_FLAME_0, MSR_P4_FIRM_ESCR0},
+ { CTR_FLAME_2, MSR_P4_FIRM_ESCR1} }
+ },
+
+ { /* PACKED_SP_UOP */
+ 0x01, 0x08,
+ { { CTR_FLAME_0, MSR_P4_FIRM_ESCR0},
+ { CTR_FLAME_2, MSR_P4_FIRM_ESCR1} }
+ },
+
+ { /* PACKED_DP_UOP */
+ 0x01, 0x0c,
+ { { CTR_FLAME_0, MSR_P4_FIRM_ESCR0},
+ { CTR_FLAME_2, MSR_P4_FIRM_ESCR1} }
+ },
+
+ { /* SCALAR_SP_UOP */
+ 0x01, 0x0a,
+ { { CTR_FLAME_0, MSR_P4_FIRM_ESCR0},
+ { CTR_FLAME_2, MSR_P4_FIRM_ESCR1} }
+ },
+
+ { /* SCALAR_DP_UOP */
+ 0x01, 0x0e,
+ { { CTR_FLAME_0, MSR_P4_FIRM_ESCR0},
+ { CTR_FLAME_2, MSR_P4_FIRM_ESCR1} }
+ },
+
+ { /* 64BIT_MMX_UOP */
+ 0x01, 0x02,
+ { { CTR_FLAME_0, MSR_P4_FIRM_ESCR0},
+ { CTR_FLAME_2, MSR_P4_FIRM_ESCR1} }
+ },
+
+ { /* 128BIT_MMX_UOP */
+ 0x01, 0x1a,
+ { { CTR_FLAME_0, MSR_P4_FIRM_ESCR0},
+ { CTR_FLAME_2, MSR_P4_FIRM_ESCR1} }
+ },
+
+ { /* X87_FP_UOP */
+ 0x01, 0x04,
+ { { CTR_FLAME_0, MSR_P4_FIRM_ESCR0},
+ { CTR_FLAME_2, MSR_P4_FIRM_ESCR1} }
+ },
+
+ { /* X87_SIMD_MOVES_UOP */
+ 0x01, 0x2e,
+ { { CTR_FLAME_0, MSR_P4_FIRM_ESCR0},
+ { CTR_FLAME_2, MSR_P4_FIRM_ESCR1} }
+ },
+
+ { /* MACHINE_CLEAR */
+ 0x05, 0x02,
+ { { CTR_IQ_4, MSR_P4_CRU_ESCR2},
+ { CTR_IQ_5, MSR_P4_CRU_ESCR3} }
+ },
+
+ { /* GLOBAL_POWER_EVENTS */
+ 0x06, 0x13 /* older manual says 0x05, newer 0x13 */,
+ { { CTR_BPU_0, MSR_P4_FSB_ESCR0},
+ { CTR_BPU_2, MSR_P4_FSB_ESCR1} }
+ },
+
+ { /* TC_MS_XFER */
+ 0x00, 0x05,
+ { { CTR_MS_0, MSR_P4_MS_ESCR0},
+ { CTR_MS_2, MSR_P4_MS_ESCR1} }
+ },
+
+ { /* UOP_QUEUE_WRITES */
+ 0x00, 0x09,
+ { { CTR_MS_0, MSR_P4_MS_ESCR0},
+ { CTR_MS_2, MSR_P4_MS_ESCR1} }
+ },
+
+ { /* FRONT_END_EVENT */
+ 0x05, 0x08,
+ { { CTR_IQ_4, MSR_P4_CRU_ESCR2},
+ { CTR_IQ_5, MSR_P4_CRU_ESCR3} }
+ },
+
+ { /* EXECUTION_EVENT */
+ 0x05, 0x0c,
+ { { CTR_IQ_4, MSR_P4_CRU_ESCR2},
+ { CTR_IQ_5, MSR_P4_CRU_ESCR3} }
+ },
+
+ { /* REPLAY_EVENT */
+ 0x05, 0x09,
+ { { CTR_IQ_4, MSR_P4_CRU_ESCR2},
+ { CTR_IQ_5, MSR_P4_CRU_ESCR3} }
+ },
+
+ { /* INSTR_RETIRED */
+ 0x04, 0x02,
+ { { CTR_IQ_4, MSR_P4_CRU_ESCR0},
+ { CTR_IQ_5, MSR_P4_CRU_ESCR1} }
+ },
+
+ { /* UOPS_RETIRED */
+ 0x04, 0x01,
+ { { CTR_IQ_4, MSR_P4_CRU_ESCR0},
+ { CTR_IQ_5, MSR_P4_CRU_ESCR1} }
+ },
+
+ { /* UOP_TYPE */
+ 0x02, 0x02,
+ { { CTR_IQ_4, MSR_P4_RAT_ESCR0},
+ { CTR_IQ_5, MSR_P4_RAT_ESCR1} }
+ },
+
+ { /* RETIRED_MISPRED_BRANCH_TYPE */
+ 0x02, 0x05,
+ { { CTR_MS_0, MSR_P4_TBPU_ESCR0},
+ { CTR_MS_2, MSR_P4_TBPU_ESCR1} }
+ },
+
+ { /* RETIRED_BRANCH_TYPE */
+ 0x02, 0x04,
+ { { CTR_MS_0, MSR_P4_TBPU_ESCR0},
+ { CTR_MS_2, MSR_P4_TBPU_ESCR1} }
+ }
+};
+
+
+#define MISC_PMC_ENABLED_P(x) ((x) & 1 << 7)
+
+#define ESCR_RESERVED_BITS 0x80000003
+#define ESCR_CLEAR(escr) ((escr) &= ESCR_RESERVED_BITS)
+#define ESCR_SET_USR_0(escr, usr) ((escr) |= (((usr) & 1) << 2))
+#define ESCR_SET_OS_0(escr, os) ((escr) |= (((os) & 1) << 3))
+#define ESCR_SET_USR_1(escr, usr) ((escr) |= (((usr) & 1)))
+#define ESCR_SET_OS_1(escr, os) ((escr) |= (((os) & 1) << 1))
+#define ESCR_SET_EVENT_SELECT(escr, sel) ((escr) |= (((sel) & 0x3f) << 25))
+#define ESCR_SET_EVENT_MASK(escr, mask) ((escr) |= (((mask) & 0xffff) << 9))
+#define ESCR_READ(escr,high,ev,i) do {rdmsr(ev->bindings[(i)].escr_address, (escr), (high));} while (0)
+#define ESCR_WRITE(escr,high,ev,i) do {wrmsr(ev->bindings[(i)].escr_address, (escr), (high));} while (0)
+
+#define CCCR_RESERVED_BITS 0x38030FFF
+#define CCCR_CLEAR(cccr) ((cccr) &= CCCR_RESERVED_BITS)
+#define CCCR_SET_REQUIRED_BITS(cccr) ((cccr) |= 0x00030000)
+#define CCCR_SET_ESCR_SELECT(cccr, sel) ((cccr) |= (((sel) & 0x07) << 13))
+#define CCCR_SET_PMI_OVF_0(cccr) ((cccr) |= (1<<26))
+#define CCCR_SET_PMI_OVF_1(cccr) ((cccr) |= (1<<27))
+#define CCCR_SET_ENABLE(cccr) ((cccr) |= (1<<12))
+#define CCCR_SET_DISABLE(cccr) ((cccr) &= ~(1<<12))
+#define CCCR_READ(low, high, i) do {rdmsr(p4_counters[(i)].cccr_address, (low), (high));} while (0)
+#define CCCR_WRITE(low, high, i) do {wrmsr(p4_counters[(i)].cccr_address, (low), (high));} while (0)
+#define CCCR_OVF_P(cccr) ((cccr) & (1U<<31))
+#define CCCR_CLEAR_OVF(cccr) ((cccr) &= (~(1U<<31)))
+
+#define CTR_READ(l,h,i) do {rdmsr(p4_counters[(i)].counter_address, (l), (h));} while (0)
+#define CTR_WRITE(l,i) do {wrmsr(p4_counters[(i)].counter_address, -(u32)(l), -1);} while (0)
+#define CTR_OVERFLOW_P(ctr) (!((ctr) & 0x80000000))
+
+
+/* this assigns a "stagger" to the current CPU, which is used throughout
+ the code in this module as an extra array offset, to select the "even"
+ or "odd" part of all the divided resources. */
+static unsigned int get_stagger(void)
+{
+#ifdef CONFIG_SMP
+ /*int cpu = smp_processor_id();
+ return (cpu != first_cpu(cpu_sibling_map[cpu]));*/
+ /* We want the two logical cpus of a physical cpu to use
+ disjoint set of counters. The following code is wrong. */
+ return 0;
+#endif
+ return 0;
+}
+
+
+/* finally, mediate access to a real hardware counter
+ by passing a "virtual" counter numer to this macro,
+ along with your stagger setting. */
+#define VIRT_CTR(stagger, i) ((i) + ((num_counters) * (stagger)))
+
+static unsigned long reset_value[NUM_COUNTERS_NON_HT];
+
+
+static void p4_fill_in_addresses(struct op_msrs * const msrs)
+{
+ unsigned int i;
+ unsigned int addr, stag;
+
+ setup_num_counters();
+ stag = get_stagger();
+
+ /* the counter registers we pay attention to */
+ for (i = 0; i < num_counters; ++i) {
+ msrs->counters[i].addr =
+ p4_counters[VIRT_CTR(stag, i)].counter_address;
+ }
+
+ /* FIXME: bad feeling, we don't save the 10 counters we don't use. */
+
+ /* 18 CCCR registers */
+ for (i = 0, addr = MSR_P4_BPU_CCCR0 + stag;
+ addr <= MSR_P4_IQ_CCCR5; ++i, addr += addr_increment()) {
+ msrs->controls[i].addr = addr;
+ }
+
+ /* 43 ESCR registers in three or four discontiguous group */
+ for (addr = MSR_P4_BSU_ESCR0 + stag;
+ addr < MSR_P4_IQ_ESCR0; ++i, addr += addr_increment()) {
+ msrs->controls[i].addr = addr;
+ }
+
+ /* no IQ_ESCR0/1 on some models, we save a seconde time BSU_ESCR0/1
+ * to avoid special case in nmi_{save|restore}_registers() */
+ if (boot_cpu_data.x86_model >= 0x3) {
+ for (addr = MSR_P4_BSU_ESCR0 + stag;
+ addr <= MSR_P4_BSU_ESCR1; ++i, addr += addr_increment()) {
+ msrs->controls[i].addr = addr;
+ }
+ } else {
+ for (addr = MSR_P4_IQ_ESCR0 + stag;
+ addr <= MSR_P4_IQ_ESCR1; ++i, addr += addr_increment()) {
+ msrs->controls[i].addr = addr;
+ }
+ }
+
+ for (addr = MSR_P4_RAT_ESCR0 + stag;
+ addr <= MSR_P4_SSU_ESCR0; ++i, addr += addr_increment()) {
+ msrs->controls[i].addr = addr;
+ }
+
+ for (addr = MSR_P4_MS_ESCR0 + stag;
+ addr <= MSR_P4_TC_ESCR1; ++i, addr += addr_increment()) {
+ msrs->controls[i].addr = addr;
+ }
+
+ for (addr = MSR_P4_IX_ESCR0 + stag;
+ addr <= MSR_P4_CRU_ESCR3; ++i, addr += addr_increment()) {
+ msrs->controls[i].addr = addr;
+ }
+
+ /* there are 2 remaining non-contiguously located ESCRs */
+
+ if (num_counters == NUM_COUNTERS_NON_HT) {
+ /* standard non-HT CPUs handle both remaining ESCRs*/
+ msrs->controls[i++].addr = MSR_P4_CRU_ESCR5;
+ msrs->controls[i++].addr = MSR_P4_CRU_ESCR4;
+
+ } else if (stag == 0) {
+ /* HT CPUs give the first remainder to the even thread, as
+ the 32nd control register */
+ msrs->controls[i++].addr = MSR_P4_CRU_ESCR4;
+
+ } else {
+ /* and two copies of the second to the odd thread,
+ for the 22st and 23nd control registers */
+ msrs->controls[i++].addr = MSR_P4_CRU_ESCR5;
+ msrs->controls[i++].addr = MSR_P4_CRU_ESCR5;
+ }
+}
+
+
+static void pmc_setup_one_p4_counter(unsigned int ctr)
+{
+ int i;
+ int const maxbind = 2;
+ unsigned int cccr = 0;
+ unsigned int escr = 0;
+ unsigned int high = 0;
+ unsigned int counter_bit;
+ struct p4_event_binding *ev = NULL;
+ unsigned int stag;
+
+ stag = get_stagger();
+
+ /* convert from counter *number* to counter *bit* */
+ counter_bit = 1 << VIRT_CTR(stag, ctr);
+
+ /* find our event binding structure. */
+ if (counter_config[ctr].event <= 0 || counter_config[ctr].event > NUM_EVENTS) {
+ printk(KERN_ERR
+ "oprofile: P4 event code 0x%lx out of range\n",
+ counter_config[ctr].event);
+ return;
+ }
+
+ ev = &(p4_events[counter_config[ctr].event - 1]);
+
+ for (i = 0; i < maxbind; i++) {
+ if (ev->bindings[i].virt_counter & counter_bit) {
+
+ /* modify ESCR */
+ ESCR_READ(escr, high, ev, i);
+ ESCR_CLEAR(escr);
+ if (stag == 0) {
+ ESCR_SET_USR_0(escr, counter_config[ctr].user);
+ ESCR_SET_OS_0(escr, counter_config[ctr].kernel);
+ } else {
+ ESCR_SET_USR_1(escr, counter_config[ctr].user);
+ ESCR_SET_OS_1(escr, counter_config[ctr].kernel);
+ }
+ ESCR_SET_EVENT_SELECT(escr, ev->event_select);
+ ESCR_SET_EVENT_MASK(escr, counter_config[ctr].unit_mask);
+ ESCR_WRITE(escr, high, ev, i);
+
+ /* modify CCCR */
+ CCCR_READ(cccr, high, VIRT_CTR(stag, ctr));
+ CCCR_CLEAR(cccr);
+ CCCR_SET_REQUIRED_BITS(cccr);
+ CCCR_SET_ESCR_SELECT(cccr, ev->escr_select);
+ if (stag == 0) {
+ CCCR_SET_PMI_OVF_0(cccr);
+ } else {
+ CCCR_SET_PMI_OVF_1(cccr);
+ }
+ CCCR_WRITE(cccr, high, VIRT_CTR(stag, ctr));
+ return;
+ }
+ }
+
+ printk(KERN_ERR
+ "oprofile: P4 event code 0x%lx no binding, stag %d ctr %d\n",
+ counter_config[ctr].event, stag, ctr);
+}
+
+
+static void p4_setup_ctrs(struct op_msrs const * const msrs)
+{
+ unsigned int i;
+ unsigned int low, high;
+ unsigned int addr;
+ unsigned int stag;
+
+ stag = get_stagger();
+
+ rdmsr(MSR_IA32_MISC_ENABLE, low, high);
+ if (! MISC_PMC_ENABLED_P(low)) {
+ printk(KERN_ERR "oprofile: P4 PMC not available\n");
+ return;
+ }
+
+ /* clear the cccrs we will use */
+ for (i = 0 ; i < num_counters ; i++) {
+ rdmsr(p4_counters[VIRT_CTR(stag, i)].cccr_address, low, high);
+ CCCR_CLEAR(low);
+ CCCR_SET_REQUIRED_BITS(low);
+ wrmsr(p4_counters[VIRT_CTR(stag, i)].cccr_address, low, high);
+ }
+
+ /* clear cccrs outside our concern */
+ for (i = stag ; i < NUM_UNUSED_CCCRS ; i += addr_increment()) {
+ rdmsr(p4_unused_cccr[i], low, high);
+ CCCR_CLEAR(low);
+ CCCR_SET_REQUIRED_BITS(low);
+ wrmsr(p4_unused_cccr[i], low, high);
+ }
+
+ /* clear all escrs (including those outside our concern) */
+ for (addr = MSR_P4_BSU_ESCR0 + stag;
+ addr < MSR_P4_IQ_ESCR0; addr += addr_increment()) {
+ wrmsr(addr, 0, 0);
+ }
+
+ /* On older models clear also MSR_P4_IQ_ESCR0/1 */
+ if (boot_cpu_data.x86_model < 0x3) {
+ wrmsr(MSR_P4_IQ_ESCR0, 0, 0);
+ wrmsr(MSR_P4_IQ_ESCR1, 0, 0);
+ }
+
+ for (addr = MSR_P4_RAT_ESCR0 + stag;
+ addr <= MSR_P4_SSU_ESCR0; ++i, addr += addr_increment()) {
+ wrmsr(addr, 0, 0);
+ }
+
+ for (addr = MSR_P4_MS_ESCR0 + stag;
+ addr <= MSR_P4_TC_ESCR1; addr += addr_increment()){
+ wrmsr(addr, 0, 0);
+ }
+
+ for (addr = MSR_P4_IX_ESCR0 + stag;
+ addr <= MSR_P4_CRU_ESCR3; addr += addr_increment()){
+ wrmsr(addr, 0, 0);
+ }
+
+ if (num_counters == NUM_COUNTERS_NON_HT) {
+ wrmsr(MSR_P4_CRU_ESCR4, 0, 0);
+ wrmsr(MSR_P4_CRU_ESCR5, 0, 0);
+ } else if (stag == 0) {
+ wrmsr(MSR_P4_CRU_ESCR4, 0, 0);
+ } else {
+ wrmsr(MSR_P4_CRU_ESCR5, 0, 0);
+ }
+
+ /* setup all counters */
+ for (i = 0 ; i < num_counters ; ++i) {
+ if (counter_config[i].enabled) {
+ reset_value[i] = counter_config[i].count;
+ pmc_setup_one_p4_counter(i);
+ CTR_WRITE(counter_config[i].count, VIRT_CTR(stag, i));
+ } else {
+ reset_value[i] = 0;
+ }
+ }
+}
+
+
+extern void pmc_log_event(struct domain *d, unsigned int eip, int mode, int event);
+extern int is_profiled(struct domain * d);
+extern struct domain * primary_profiler;
+
+static int p4_check_ctrs(unsigned int const cpu,
+ struct op_msrs const * const msrs,
+ struct cpu_user_regs * const regs)
+{
+ unsigned long ctr, low, high, stag, real;
+ int i, ovf = 0;
+ unsigned long eip = regs->eip;
+ int mode = 0;
+
+ if (RING_1(regs))
+ mode = 1;
+ else if (RING_0(regs))
+ mode = 2;
+
+ stag = get_stagger();
+
+ for (i = 0; i < num_counters; ++i) {
+ if (!reset_value[i])
+ continue;
+
+ /*
+ * there is some eccentricity in the hardware which
+ * requires that we perform 2 extra corrections:
+ *
+ * - check both the CCCR:OVF flag for overflow and the
+ * counter high bit for un-flagged overflows.
+ *
+ * - write the counter back twice to ensure it gets
+ * updated properly.
+ *
+ * the former seems to be related to extra NMIs happening
+ * during the current NMI; the latter is reported as errata
+ * N15 in intel doc 249199-029, pentium 4 specification
+ * update, though their suggested work-around does not
+ * appear to solve the problem.
+ */
+
+ real = VIRT_CTR(stag, i);
+
+ CCCR_READ(low, high, real);
+ CTR_READ(ctr, high, real);
+ if (CCCR_OVF_P(low) || CTR_OVERFLOW_P(ctr)) {
+ pmc_log_event(current->domain, eip, mode, i);
+ CTR_WRITE(reset_value[i], real);
+ CCCR_CLEAR_OVF(low);
+ CCCR_WRITE(low, high, real);
+ CTR_WRITE(reset_value[i], real);
+ ovf = 1;
+ }
+ }
+
+ /* P4 quirk: you have to re-unmask the apic vector */
+ apic_write(APIC_LVTPC, apic_read(APIC_LVTPC) & ~APIC_LVT_MASKED);
+
+ /* See op_model_ppro.c */
+ return ovf;
+}
+
+
+static void p4_start(struct op_msrs const * const msrs)
+{
+ unsigned int low, high, stag;
+ int i;
+
+ stag = get_stagger();
+
+ for (i = 0; i < num_counters; ++i) {
+ if (!reset_value[i])
+ continue;
+ CCCR_READ(low, high, VIRT_CTR(stag, i));
+ CCCR_SET_ENABLE(low);
+ CCCR_WRITE(low, high, VIRT_CTR(stag, i));
+ }
+}
+
+
+static void p4_stop(struct op_msrs const * const msrs)
+{
+ unsigned int low, high, stag;
+ int i;
+
+ stag = get_stagger();
+
+ for (i = 0; i < num_counters; ++i) {
+ CCCR_READ(low, high, VIRT_CTR(stag, i));
+ CCCR_SET_DISABLE(low);
+ CCCR_WRITE(low, high, VIRT_CTR(stag, i));
+ }
+}
+
+
+#ifdef CONFIG_SMP
+struct op_x86_model_spec const op_p4_ht2_spec = {
+ .num_counters = NUM_COUNTERS_HT2,
+ .num_controls = NUM_CONTROLS_HT2,
+ .fill_in_addresses = &p4_fill_in_addresses,
+ .setup_ctrs = &p4_setup_ctrs,
+ .check_ctrs = &p4_check_ctrs,
+ .start = &p4_start,
+ .stop = &p4_stop
+};
+#endif
+
+struct op_x86_model_spec const op_p4_spec = {
+ .num_counters = NUM_COUNTERS_NON_HT,
+ .num_controls = NUM_CONTROLS_NON_HT,
+ .fill_in_addresses = &p4_fill_in_addresses,
+ .setup_ctrs = &p4_setup_ctrs,
+ .check_ctrs = &p4_check_ctrs,
+ .start = &p4_start,
+ .stop = &p4_stop
+};
diff -PNaur xen-unstable/xen/arch/x86/oprofile/op_model_ppro.c xen-unstable-prof/xen/arch/x86/oprofile/op_model_ppro.c
--- xen-unstable/xen/arch/x86/oprofile/op_model_ppro.c 1969-12-31 16:00:00.000000000 -0800
+++ xen-unstable-prof/xen/arch/x86/oprofile/op_model_ppro.c 2005-05-24 11:54:47.000000000 -0700
@@ -0,0 +1,166 @@
+/**
+ * @file op_model_ppro.h
+ * pentium pro / P6 model-specific MSR operations
+ *
+ * @remark Copyright 2002 OProfile authors
+ * @remark Read the file COPYING
+ *
+ * @author John Levon
+ * @author Philippe Elie
+ * @author Graydon Hoare
+ *
+ * Modified by Aravind Menon for Xen
+ * These modifications are:
+ * Copyright (C) 2005 Hewlett-Packard Co.
+ */
+
+#include <xen/types.h>
+#include <asm/msr.h>
+#include <asm/io.h>
+#include <asm/apic.h>
+#include <asm/processor.h>
+#include <xen/sched.h>
+
+#include "op_x86_model.h"
+#include "op_counter.h"
+
+#define NUM_COUNTERS 2
+#define NUM_CONTROLS 2
+
+#define CTR_READ(l,h,msrs,c) do {rdmsr(msrs->counters[(c)].addr, (l), (h));} while (0)
+#define CTR_WRITE(l,msrs,c) do {wrmsr(msrs->counters[(c)].addr, -(u32)(l), -1);} while (0)
+#define CTR_OVERFLOWED(n) (!((n) & (1U<<31)))
+
+#define CTRL_READ(l,h,msrs,c) do {rdmsr((msrs->controls[(c)].addr), (l), (h));} while (0)
+#define CTRL_WRITE(l,h,msrs,c) do {wrmsr((msrs->controls[(c)].addr), (l), (h));} while (0)
+#define CTRL_SET_ACTIVE(n) (n |= (1<<22))
+#define CTRL_SET_INACTIVE(n) (n &= ~(1<<22))
+#define CTRL_CLEAR(x) (x &= (1<<21))
+#define CTRL_SET_ENABLE(val) (val |= 1<<20)
+#define CTRL_SET_USR(val,u) (val |= ((u & 1) << 16))
+#define CTRL_SET_KERN(val,k) (val |= ((k & 1) << 17))
+#define CTRL_SET_UM(val, m) (val |= (m << 8))
+#define CTRL_SET_EVENT(val, e) (val |= e)
+
+static unsigned long reset_value[NUM_COUNTERS];
+
+static void ppro_fill_in_addresses(struct op_msrs * const msrs)
+{
+ msrs->counters[0].addr = MSR_P6_PERFCTR0;
+ msrs->counters[1].addr = MSR_P6_PERFCTR1;
+
+ msrs->controls[0].addr = MSR_P6_EVNTSEL0;
+ msrs->controls[1].addr = MSR_P6_EVNTSEL1;
+}
+
+
+static void ppro_setup_ctrs(struct op_msrs const * const msrs)
+{
+ unsigned int low, high;
+ int i;
+
+ /* clear all counters */
+ for (i = 0 ; i < NUM_CONTROLS; ++i) {
+ CTRL_READ(low, high, msrs, i);
+ CTRL_CLEAR(low);
+ CTRL_WRITE(low, high, msrs, i);
+ }
+
+ /* avoid a false detection of ctr overflows in NMI handler */
+ for (i = 0; i < NUM_COUNTERS; ++i) {
+ CTR_WRITE(1, msrs, i);
+ }
+
+ /* enable active counters */
+ for (i = 0; i < NUM_COUNTERS; ++i) {
+ if (counter_config[i].enabled) {
+ reset_value[i] = counter_config[i].count;
+
+ CTR_WRITE(counter_config[i].count, msrs, i);
+
+ CTRL_READ(low, high, msrs, i);
+ CTRL_CLEAR(low);
+ CTRL_SET_ENABLE(low);
+ CTRL_SET_USR(low, counter_config[i].user);
+ CTRL_SET_KERN(low, counter_config[i].kernel);
+ CTRL_SET_UM(low, counter_config[i].unit_mask);
+ CTRL_SET_EVENT(low, counter_config[i].event);
+ CTRL_WRITE(low, high, msrs, i);
+ }
+ }
+}
+
+extern void pmc_log_event(struct domain *d, unsigned int eip, int mode, int event);
+extern int is_profiled(struct domain * d);
+extern struct domain * primary_profiler;
+
+static int ppro_check_ctrs(unsigned int const cpu,
+ struct op_msrs const * const msrs,
+ struct cpu_user_regs * const regs)
+{
+ unsigned int low, high;
+ int i, ovf = 0;
+ unsigned long eip = regs->eip;
+ int mode = 0;
+
+ if (RING_1(regs))
+ mode = 1;
+ else if (RING_0(regs))
+ mode = 2;
+
+ for (i = 0 ; i < NUM_COUNTERS; ++i) {
+ CTR_READ(low, high, msrs, i);
+ if (CTR_OVERFLOWED(low)) {
+ pmc_log_event(current->domain, eip, mode, i);
+ CTR_WRITE(reset_value[i], msrs, i);
+ ovf = 1;
+ }
+ }
+
+ /* Only P6 based Pentium M need to re-unmask the apic vector but it
+ * doesn't hurt other P6 variant */
+ apic_write(APIC_LVTPC, apic_read(APIC_LVTPC) & ~APIC_LVT_MASKED);
+
+ /* We can't work out if we really handled an interrupt. We
+ * might have caught a *second* counter just after overflowing
+ * the interrupt for this counter then arrives
+ * and we don't find a counter that's overflowed, so we
+ * would return 0 and get dazed + confused. Instead we always
+ * assume we found an overflow. This sucks.
+ */
+ return ovf;
+}
+
+
+static void ppro_start(struct op_msrs const * const msrs)
+{
+ unsigned int low,high;
+ CTRL_READ(low, high, msrs, 0);
+ CTRL_SET_ACTIVE(low);
+ CTRL_WRITE(low, high, msrs, 0);
+}
+
+static void ppro_stop(struct op_msrs const * const msrs)
+{
+ unsigned int low,high;
+ CTRL_READ(low, high, msrs, 0);
+ CTRL_SET_INACTIVE(low);
+ CTRL_WRITE(low, high, msrs, 0);
+}
+
+unsigned int read_ctr(struct op_msrs const * const msrs, int i)
+{
+ unsigned int low, high;
+ CTR_READ(low, high, msrs, i);
+ return low;
+}
+
+struct op_x86_model_spec const op_ppro_spec = {
+ .num_counters = NUM_COUNTERS,
+ .num_controls = NUM_CONTROLS,
+ .fill_in_addresses = &ppro_fill_in_addresses,
+ .setup_ctrs = &ppro_setup_ctrs,
+ .check_ctrs = &ppro_check_ctrs,
+ .start = &ppro_start,
+ .stop = &ppro_stop
+};
diff -PNaur xen-unstable/xen/arch/x86/oprofile/op_x86_model.h xen-unstable-prof/xen/arch/x86/oprofile/op_x86_model.h
--- xen-unstable/xen/arch/x86/oprofile/op_x86_model.h 1969-12-31 16:00:00.000000000 -0800
+++ xen-unstable-prof/xen/arch/x86/oprofile/op_x86_model.h 2005-05-24 11:54:47.000000000 -0700
@@ -0,0 +1,55 @@
+/**
+ * @file op_x86_model.h
+ * interface to x86 model-specific MSR operations
+ *
+ * @remark Copyright 2002 OProfile authors
+ * @remark Read the file COPYING
+ *
+ * @author Graydon Hoare
+ *
+ * Modified by Aravind Menon for Xen
+ * These modifications are:
+ * Copyright (C) 2005 Hewlett-Packard Co.
+ */
+
+#ifndef OP_X86_MODEL_H
+#define OP_X86_MODEL_H
+
+struct op_saved_msr {
+ unsigned int high;
+ unsigned int low;
+};
+
+struct op_msr {
+ unsigned long addr;
+ struct op_saved_msr saved;
+};
+
+struct op_msrs {
+ struct op_msr * counters;
+ struct op_msr * controls;
+};
+
+struct pt_regs;
+
+/* The model vtable abstracts the differences between
+ * various x86 CPU model's perfctr support.
+ */
+struct op_x86_model_spec {
+ unsigned int const num_counters;
+ unsigned int const num_controls;
+ void (*fill_in_addresses)(struct op_msrs * const msrs);
+ void (*setup_ctrs)(struct op_msrs const * const msrs);
+ int (*check_ctrs)(unsigned int const cpu,
+ struct op_msrs const * const msrs,
+ struct cpu_user_regs * const regs);
+ void (*start)(struct op_msrs const * const msrs);
+ void (*stop)(struct op_msrs const * const msrs);
+};
+
+extern struct op_x86_model_spec const op_ppro_spec;
+extern struct op_x86_model_spec const op_p4_spec;
+extern struct op_x86_model_spec const op_p4_ht2_spec;
+extern struct op_x86_model_spec const op_athlon_spec;
+
+#endif /* OP_X86_MODEL_H */
diff -PNaur xen-unstable/xen/arch/x86/oprofile/pmc.c xen-unstable-prof/xen/arch/x86/oprofile/pmc.c
--- xen-unstable/xen/arch/x86/oprofile/pmc.c 1969-12-31 16:00:00.000000000 -0800
+++ xen-unstable-prof/xen/arch/x86/oprofile/pmc.c 2005-05-24 11:54:47.000000000 -0700
@@ -0,0 +1,281 @@
+/*
+ * Copyright (C) 2005 Hewlett-Packard Co.
+ * written by Aravind Menon, email: xenoprof@groups.hp.com
+ */
+
+#include <xen/sched.h>
+
+#include "op_counter.h"
+
+int active_domains[MAX_OPROF_DOMAINS];
+int passive_domains[MAX_OPROF_DOMAINS];
+unsigned int adomains = 0;
+unsigned int pdomains = 0;
+unsigned int activated = 0;
+
+#define VIRQ_BITMASK_SIZE (MAX_OPROF_DOMAINS/32 + 1)
+
+struct domain * primary_profiler = NULL;
+struct domain * adomain_ptrs[MAX_OPROF_DOMAINS];
+unsigned int virq_ovf_pending[VIRQ_BITMASK_SIZE];
+
+int is_active(struct domain *d)
+{
+ int i;
+ for (i = 0; i < adomains; i++)
+ if (d->domain_id == active_domains[i])
+ return 1;
+ return 0;
+}
+
+int active_id(struct domain *d)
+{
+ int i;
+ for (i = 0; i < adomains; i++)
+ if (d == adomain_ptrs[i])
+ return i;
+ return -1;
+}
+
+void free_adomain_ptrs()
+{
+ int i;
+ int num = adomains;
+
+ adomains = 0;
+ for (i = 0; i < VIRQ_BITMASK_SIZE; i++)
+ virq_ovf_pending[i] = 0;
+
+ for (i = 0; i < num; i++) {
+ put_domain(adomain_ptrs[i]);
+ adomain_ptrs[i] = NULL;
+ }
+}
+
+int set_adomain_ptrs(int num)
+{
+ int i;
+ struct domain *d;
+
+ for (i = 0; i < VIRQ_BITMASK_SIZE; i++)
+ virq_ovf_pending[i] = 0;
+
+ for (i = 0; i < num; i++) {
+ d = find_domain_by_id(active_domains[i]);
+ if (!d) {
+ free_adomain_ptrs();
+ return -EFAULT;
+ }
+ adomain_ptrs[i] = d;
+ adomains++;
+ }
+ return 0;
+}
+
+int set_active(struct domain *d)
+{
+ if (is_active(d))
+ return 0;
+ /* hack if we run out of space */
+ if (adomains >= MAX_OPROF_DOMAINS) {
+ adomains--;
+ put_domain(adomain_ptrs[adomains]);
+ }
+ active_domains[adomains] = d->domain_id;
+ if (get_domain(d))
+ adomain_ptrs[adomains++] = d;
+ else {
+ free_adomain_ptrs();
+ return -EFAULT;
+ }
+ return 0;
+}
+
+int is_passive(struct domain *d)
+{
+ int i;
+ for (i = 0; i < pdomains; i++)
+ if (d->domain_id == passive_domains[i])
+ return 1;
+ return 0;
+}
+
+int is_profiled(struct domain *d)
+{
+ if (is_active(d) || is_passive(d))
+ return 1;
+ return 0;
+}
+
+void pmc_log_event(struct domain *d, unsigned int eip, int mode, int event)
+{
+ shared_info_t *s = NULL;
+ struct domain *dest = d;
+ int head = 0;
+
+ if (!is_profiled(d))
+ return;
+
+ if (is_passive(d)) {
+ dest = primary_profiler;
+ goto log_passive;
+ }
+
+log_active:
+ s = dest->shared_info;
+
+ head = s->event_head;
+ if (head >= MAX_OPROF_EVENTS)
+ head = 0;
+
+ if (s->losing_samples)
+ s->samples_lost++;
+ if (head == s->event_tail - 1 || (head == MAX_OPROF_EVENTS - 1 && s->event_tail == 0))
+ s->losing_samples = 1;
+
+ s->event_log[head].eip = eip;
+ s->event_log[head].mode = mode;
+ s->event_log[head].event = event;
+ head++;
+ s->event_head = head;
+ return;
+
+log_passive:
+ /* We use the following inefficient format for logging events from other
+ domains. We put a special record indicating that the next record is
+ for another domain. This is done for each sample from another
+ domain */
+ s = dest->shared_info;
+
+ head = s->event_head;
+ if (head >= MAX_OPROF_EVENTS)
+ head = 0;
+
+ if (s->losing_samples)
+ s->samples_lost++;
+ if (head == s->event_tail - 1 || (head == MAX_OPROF_EVENTS - 1 && s->event_tail == 0))
+ s->losing_samples = 1;
+
+ s->event_log[head].eip = ~1UL;
+ s->event_log[head].mode = ~0;
+ s->event_log[head].event = d->domain_id;
+ head++;
+ s->event_head = head;
+ goto log_active;
+}
+
+static void pmc_event_init(struct domain *d)
+{
+ shared_info_t *s = d->shared_info;
+ s->event_head = 0;
+ s->event_tail = 0;
+ s->losing_samples = 0;
+ s->samples_lost = 0;
+ s->nmi_restarts = 0;
+ s->active_samples = 0;
+ s->passive_samples = 0;
+ s->other_samples = 0;
+}
+
+extern int nmi_init(int *num_events, int *is_primary);
+extern int nmi_reserve_counters(void);
+extern int nmi_setup_events(void);
+extern int nmi_enable_virq(void);
+extern int nmi_start(void);
+extern void nmi_stop(void);
+extern void nmi_disable_virq(void);
+extern void nmi_release_counters(void);
+
+#define PRIV_OP(op) ((op == PMC_SET_ACTIVE) || (op == PMC_SET_PASSIVE) || (op == PMC_RESERVE_COUNTERS) \
+ || (op == PMC_SETUP_EVENTS) || (op == PMC_START) || (op == PMC_STOP) \
+ || (op == PMC_RELEASE_COUNTERS) || (op == PMC_SHUTDOWN))
+
+int do_pmc_op(int op, unsigned int arg1, unsigned int arg2)
+{
+ int ret = 0;
+
+ if (PRIV_OP(op) && current->domain != primary_profiler)
+ return -EPERM;
+
+ switch (op) {
+ case PMC_INIT:
+ ret = nmi_init((int *)arg1, (int *)arg2);
+ break;
+
+ case PMC_SET_ACTIVE:
+ if (adomains != 0)
+ return -EPERM;
+ if (copy_from_user((void *)&active_domains,
+ (void *)arg1, arg2*sizeof(int)))
+ return -EFAULT;
+ if (set_adomain_ptrs(arg2))
+ return -EFAULT;
+ if (set_active(current->domain))
+ return -EFAULT;
+ break;
+
+ case PMC_SET_PASSIVE:
+ if (pdomains != 0)
+ return -EPERM;
+ if (copy_from_user((void *)&passive_domains,
+ (void *)arg1, arg2*sizeof(int)))
+ return -EFAULT;
+ pdomains = arg2;
+ break;
+
+ case PMC_RESERVE_COUNTERS:
+ ret = nmi_reserve_counters();
+ break;
+
+ case PMC_SETUP_EVENTS:
+ if (copy_from_user((void *)&counter_config,
+ (void *)arg1, arg2*sizeof(struct op_counter_config)))
+ return -EFAULT;
+ ret = nmi_setup_events();
+ break;
+
+ case PMC_ENABLE_VIRQ:
+ if (!is_active(current->domain)) {
+ if (current->domain != primary_profiler)
+ return -EPERM;
+ else
+ set_active(current->domain);
+ }
+ ret = nmi_enable_virq();
+ pmc_event_init(current->domain);
+ activated++;
+ break;
+
+ case PMC_START:
+ if (activated < adomains)
+ return -EPERM;
+ ret = nmi_start();
+ break;
+
+ case PMC_STOP:
+ nmi_stop();
+ break;
+
+ case PMC_DISABLE_VIRQ:
+ if (!is_active(current->domain))
+ return -EPERM;
+ nmi_disable_virq();
+ activated--;
+ break;
+
+ case PMC_RELEASE_COUNTERS:
+ nmi_release_counters();
+ break;
+
+ case PMC_SHUTDOWN:
+ free_adomain_ptrs();
+ pdomains = 0;
+ activated = 0;
+ primary_profiler = NULL;
+ break;
+
+ default:
+ ret = -EINVAL;
+ }
+ return ret;
+}
diff -PNaur xen-unstable/xen/arch/x86/traps.c xen-unstable-prof/xen/arch/x86/traps.c
--- xen-unstable/xen/arch/x86/traps.c 2005-05-22 20:12:48.000000000 -0700
+++ xen-unstable-prof/xen/arch/x86/traps.c 2005-05-24 11:54:47.000000000 -0700
@@ -2,6 +2,10 @@
* arch/x86/traps.c
*
* Modifications to Linux original are copyright (c) 2002-2004, K A Fraser
+ *
+ * Modified by Aravind Menon for supporting oprofile
+ * These modifications are:
+ * Copyright (C) 2005 Hewlett-Packard Co.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
@@ -53,6 +57,7 @@
#include <asm/debugger.h>
#include <asm/msr.h>
#include <asm/x86_emulate.h>
+#include <asm/nmi.h>
/*
* opt_nmi: one of 'ignore', 'dom0', or 'fatal'.
@@ -1026,7 +1031,7 @@
printk("Do you have a strange power saving mode enabled?\n");
}
-asmlinkage void do_nmi(struct cpu_user_regs *regs, unsigned long reason)
+static void default_do_nmi(struct cpu_user_regs * regs, unsigned long reason)
{
++nmi_count(smp_processor_id());
@@ -1041,6 +1046,35 @@
unknown_nmi_error((unsigned char)(reason&0xff));
}
+static int dummy_nmi_callback(struct cpu_user_regs * regs, int cpu)
+{
+ return 0;
+}
+
+static nmi_callback_t nmi_callback = dummy_nmi_callback;
+
+asmlinkage void do_nmi(struct cpu_user_regs * regs, unsigned long reason)
+{
+ int cpu;
+ cpu = smp_processor_id();
+
+ if (!nmi_callback(regs, cpu))
+ default_do_nmi(regs, reason);
+}
+
+void set_nmi_callback(nmi_callback_t callback)
+{
+ nmi_callback = callback;
+}
+
+void unset_nmi_callback(void)
+{
+ nmi_callback = dummy_nmi_callback;
+}
+
+EXPORT_SYMBOL(set_nmi_callback);
+EXPORT_SYMBOL(unset_nmi_callback);
+
asmlinkage int math_state_restore(struct cpu_user_regs *regs)
{
/* Prevent recursion. */
diff -PNaur xen-unstable/xen/arch/x86/x86_32/entry.S xen-unstable-prof/xen/arch/x86/x86_32/entry.S
--- xen-unstable/xen/arch/x86/x86_32/entry.S 2005-05-22 20:12:52.000000000 -0700
+++ xen-unstable-prof/xen/arch/x86/x86_32/entry.S 2005-05-24 11:54:47.000000000 -0700
@@ -3,6 +3,10 @@
*
* Copyright (c) 2002-2004, K A Fraser
* Copyright (c) 1991, 1992 Linus Torvalds
+ *
+ * Modified by Aravind Menon for supporting oprofile
+ * These modifications are:
+ * Copyright (C) 2005 Hewlett-Packard Co.
*
* Calling back to a guest OS:
* ===========================
@@ -564,15 +568,15 @@
movl UREGS_eflags(%esp),%eax
movb UREGS_cs(%esp),%al
testl $(3|X86_EFLAGS_VM),%eax
- jnz do_watchdog_tick
+ jnz normal_nmi
movl %ds,%eax
cmpw $(__HYPERVISOR_DS),%ax
- jne defer_nmi
+ jne inv_seg_nmi
movl %es,%eax
cmpw $(__HYPERVISOR_DS),%ax
- jne defer_nmi
+ jne inv_seg_nmi
-do_watchdog_tick:
+normal_nmi:
movl $(__HYPERVISOR_DS),%edx
movl %edx,%ds
movl %edx,%es
@@ -582,17 +586,30 @@
call do_nmi
addl $8,%esp
jmp ret_from_intr
-
-defer_nmi:
- movl $FIXMAP_apic_base,%eax
- # apic_wait_icr_idle()
-1: movl %ss:APIC_ICR(%eax),%ebx
- testl $APIC_ICR_BUSY,%ebx
- jnz 1b
- # __send_IPI_shortcut(APIC_DEST_SELF, TRAP_deferred_nmi)
- movl $(APIC_DM_FIXED | APIC_DEST_SELF | APIC_DEST_LOGICAL | \
- TRAP_deferred_nmi),%ss:APIC_ICR(%eax)
- jmp restore_all_xen
+inv_seg_nmi:
+ movl %ds,-4(%esp)
+ movl %es,-8(%esp)
+ movl %fs,-12(%esp)
+ movl %gs,-16(%esp)
+ subl $16,%esp
+
+ movl $(__HYPERVISOR_DS),%edx
+ movl %edx,%ds
+ movl %edx,%es
+
+ movl %esp,%edx
+ addl $16,%edx
+ pushl %ebx
+ pushl %edx
+ call do_nmi
+ addl $8,%esp
+
+ addl $16,%esp
+ movl -4(%esp),%ds
+ movl -8(%esp),%es
+ movl -12(%esp),%fs
+ movl -16(%esp),%gs
+ jmp restore_all_xen
nmi_parity_err:
# Clear and disable the parity-error line
@@ -749,6 +766,7 @@
.long do_boot_vcpu
.long do_ni_hypercall /* 25 */
.long do_mmuext_op
+ .long do_pmc_op
.rept NR_hypercalls-((.-hypercall_table)/4)
.long do_ni_hypercall
.endr
diff -PNaur xen-unstable/xen/include/asm-x86/msr.h xen-unstable-prof/xen/include/asm-x86/msr.h
--- xen-unstable/xen/include/asm-x86/msr.h 2005-05-22 20:12:52.000000000 -0700
+++ xen-unstable-prof/xen/include/asm-x86/msr.h 2005-05-24 11:54:47.000000000 -0700
@@ -191,6 +191,89 @@
#define MSR_P6_EVNTSEL0 0x186
#define MSR_P6_EVNTSEL1 0x187
+/* Pentium IV performance counter MSRs */
+#define MSR_P4_BPU_PERFCTR0 0x300
+#define MSR_P4_BPU_PERFCTR1 0x301
+#define MSR_P4_BPU_PERFCTR2 0x302
+#define MSR_P4_BPU_PERFCTR3 0x303
+#define MSR_P4_MS_PERFCTR0 0x304
+#define MSR_P4_MS_PERFCTR1 0x305
+#define MSR_P4_MS_PERFCTR2 0x306
+#define MSR_P4_MS_PERFCTR3 0x307
+#define MSR_P4_FLAME_PERFCTR0 0x308
+#define MSR_P4_FLAME_PERFCTR1 0x309
+#define MSR_P4_FLAME_PERFCTR2 0x30a
+#define MSR_P4_FLAME_PERFCTR3 0x30b
+#define MSR_P4_IQ_PERFCTR0 0x30c
+#define MSR_P4_IQ_PERFCTR1 0x30d
+#define MSR_P4_IQ_PERFCTR2 0x30e
+#define MSR_P4_IQ_PERFCTR3 0x30f
+#define MSR_P4_IQ_PERFCTR4 0x310
+#define MSR_P4_IQ_PERFCTR5 0x311
+#define MSR_P4_BPU_CCCR0 0x360
+#define MSR_P4_BPU_CCCR1 0x361
+#define MSR_P4_BPU_CCCR2 0x362
+#define MSR_P4_BPU_CCCR3 0x363
+#define MSR_P4_MS_CCCR0 0x364
+#define MSR_P4_MS_CCCR1 0x365
+#define MSR_P4_MS_CCCR2 0x366
+#define MSR_P4_MS_CCCR3 0x367
+#define MSR_P4_FLAME_CCCR0 0x368
+#define MSR_P4_FLAME_CCCR1 0x369
+#define MSR_P4_FLAME_CCCR2 0x36a
+#define MSR_P4_FLAME_CCCR3 0x36b
+#define MSR_P4_IQ_CCCR0 0x36c
+#define MSR_P4_IQ_CCCR1 0x36d
+#define MSR_P4_IQ_CCCR2 0x36e
+#define MSR_P4_IQ_CCCR3 0x36f
+#define MSR_P4_IQ_CCCR4 0x370
+#define MSR_P4_IQ_CCCR5 0x371
+#define MSR_P4_ALF_ESCR0 0x3ca
+#define MSR_P4_ALF_ESCR1 0x3cb
+#define MSR_P4_BPU_ESCR0 0x3b2
+#define MSR_P4_BPU_ESCR1 0x3b3
+#define MSR_P4_BSU_ESCR0 0x3a0
+#define MSR_P4_BSU_ESCR1 0x3a1
+#define MSR_P4_CRU_ESCR0 0x3b8
+#define MSR_P4_CRU_ESCR1 0x3b9
+#define MSR_P4_CRU_ESCR2 0x3cc
+#define MSR_P4_CRU_ESCR3 0x3cd
+#define MSR_P4_CRU_ESCR4 0x3e0
+#define MSR_P4_CRU_ESCR5 0x3e1
+#define MSR_P4_DAC_ESCR0 0x3a8
+#define MSR_P4_DAC_ESCR1 0x3a9
+#define MSR_P4_FIRM_ESCR0 0x3a4
+#define MSR_P4_FIRM_ESCR1 0x3a5
+#define MSR_P4_FLAME_ESCR0 0x3a6
+#define MSR_P4_FLAME_ESCR1 0x3a7
+#define MSR_P4_FSB_ESCR0 0x3a2
+#define MSR_P4_FSB_ESCR1 0x3a3
+#define MSR_P4_IQ_ESCR0 0x3ba
+#define MSR_P4_IQ_ESCR1 0x3bb
+#define MSR_P4_IS_ESCR0 0x3b4
+#define MSR_P4_IS_ESCR1 0x3b5
+#define MSR_P4_ITLB_ESCR0 0x3b6
+#define MSR_P4_ITLB_ESCR1 0x3b7
+#define MSR_P4_IX_ESCR0 0x3c8
+#define MSR_P4_IX_ESCR1 0x3c9
+#define MSR_P4_MOB_ESCR0 0x3aa
+#define MSR_P4_MOB_ESCR1 0x3ab
+#define MSR_P4_MS_ESCR0 0x3c0
+#define MSR_P4_MS_ESCR1 0x3c1
+#define MSR_P4_PMH_ESCR0 0x3ac
+#define MSR_P4_PMH_ESCR1 0x3ad
+#define MSR_P4_RAT_ESCR0 0x3bc
+#define MSR_P4_RAT_ESCR1 0x3bd
+#define MSR_P4_SAAT_ESCR0 0x3ae
+#define MSR_P4_SAAT_ESCR1 0x3af
+#define MSR_P4_SSU_ESCR0 0x3be
+#define MSR_P4_SSU_ESCR1 0x3bf /* guess: not defined in manual */
+#define MSR_P4_TBPU_ESCR0 0x3c2
+#define MSR_P4_TBPU_ESCR1 0x3c3
+#define MSR_P4_TC_ESCR0 0x3c4
+#define MSR_P4_TC_ESCR1 0x3c5
+#define MSR_P4_U2L_ESCR0 0x3b0
+#define MSR_P4_U2L_ESCR1 0x3b1
/* K7/K8 MSRs. Not complete. See the architecture manual for a more complete list. */
#define MSR_K7_EVNTSEL0 0xC0010000
diff -PNaur xen-unstable/xen/include/asm-x86/nmi.h xen-unstable-prof/xen/include/asm-x86/nmi.h
--- xen-unstable/xen/include/asm-x86/nmi.h 1969-12-31 16:00:00.000000000 -0800
+++ xen-unstable-prof/xen/include/asm-x86/nmi.h 2005-05-24 11:54:47.000000000 -0700
@@ -0,0 +1,26 @@
+/*
+ * linux/include/asm-i386/nmi.h
+ */
+#ifndef ASM_NMI_H
+#define ASM_NMI_H
+
+struct cpu_user_regs;
+
+typedef int (*nmi_callback_t)(struct cpu_user_regs * regs, int cpu);
+
+/**
+ * set_nmi_callback
+ *
+ * Set a handler for an NMI. Only one handler may be
+ * set. Return 1 if the NMI was handled.
+ */
+void set_nmi_callback(nmi_callback_t callback);
+
+/**
+ * unset_nmi_callback
+ *
+ * Remove the handler previously set.
+ */
+void unset_nmi_callback(void);
+
+#endif /* ASM_NMI_H */
diff -PNaur xen-unstable/xen/include/public/xen.h xen-unstable-prof/xen/include/public/xen.h
--- xen-unstable/xen/include/public/xen.h 2005-05-22 20:12:45.000000000 -0700
+++ xen-unstable-prof/xen/include/public/xen.h 2005-05-24 11:54:47.000000000 -0700
@@ -4,6 +4,10 @@
* Guest OS interface to Xen.
*
* Copyright (c) 2004, K A Fraser
+ *
+ * Modified by Aravind Menon for supporting oprofile
+ * These modifications are:
+ * Copyright (C) 2005 Hewlett-Packard Co.
*/
#ifndef __XEN_PUBLIC_XEN_H__
@@ -58,6 +62,7 @@
#define __HYPERVISOR_boot_vcpu 24
#define __HYPERVISOR_set_segment_base 25 /* x86/64 only */
#define __HYPERVISOR_mmuext_op 26
+#define __HYPERVISOR_pmc_op 27
/*
* MULTICALLS
@@ -80,6 +85,7 @@
#define VIRQ_DOM_EXC 3 /* (DOM0) Exceptional event for some domain. */
#define VIRQ_PARITY_ERR 4 /* (DOM0) NMI parity error. */
#define VIRQ_IO_ERR 5 /* (DOM0) NMI I/O error. */
+#define VIRQ_PMC_OVF 6 /* PMC Overflow */
#define NR_VIRQS 7
/*
@@ -244,6 +250,21 @@
#define VMASST_TYPE_writable_pagetables 2
#define MAX_VMASST_TYPE 2
+/*
+ * Commands to HYPERVISOR_pmc_op().
+ */
+#define PMC_INIT 0
+#define PMC_SET_ACTIVE 1
+#define PMC_SET_PASSIVE 2
+#define PMC_RESERVE_COUNTERS 3
+#define PMC_SETUP_EVENTS 4
+#define PMC_ENABLE_VIRQ 5
+#define PMC_START 6
+#define PMC_STOP 7
+#define PMC_DISABLE_VIRQ 8
+#define PMC_RELEASE_COUNTERS 9
+#define PMC_SHUTDOWN 10
+
#ifndef __ASSEMBLY__
typedef u16 domid_t;
@@ -299,6 +320,8 @@
/* Support for multi-processor guests. */
#define MAX_VIRT_CPUS 32
+#define MAX_OPROF_EVENTS 32
+#define MAX_OPROF_DOMAINS 25
/*
* Per-VCPU information goes here. This will be cleaned up more when Xen
* actually supports multi-VCPU guests.
@@ -412,6 +435,20 @@
arch_shared_info_t arch;
+ /* Oprofile structures */
+ u8 event_head;
+ u8 event_tail;
+ struct {
+ u32 eip;
+ u8 mode;
+ u8 event;
+ } PACKED event_log[MAX_OPROF_EVENTS];
+ u8 losing_samples;
+ u64 samples_lost;
+ u32 nmi_restarts;
+ u64 active_samples;
+ u64 passive_samples;
+ u64 other_samples;
} PACKED shared_info_t;
/*
[-- Attachment #4: xenoprof-1.1-linux-2.6.11.patch --]
[-- Type: application/octet-stream, Size: 35868 bytes --]
diff -PNaur --exclude=.asm-ignore xen-unstable/linux-2.6.11-xen0/arch/xen/arch/Makefile xen-unstable-prof/linux-2.6.11-xen0/arch/xen/arch/Makefile
--- xen-unstable/linux-2.6.11-xen0/arch/xen/arch/Makefile 2005-05-22 20:12:49.000000000 -0700
+++ xen-unstable-prof/linux-2.6.11-xen0/arch/xen/arch/Makefile 2005-05-24 11:56:13.000000000 -0700
@@ -79,7 +79,6 @@
drivers-$(CONFIG_MATH_EMULATION) += arch/i386/math-emu/
drivers-$(CONFIG_PCI) += arch/xen/i386/pci/
# must be linked after kernel/
-drivers-$(CONFIG_OPROFILE) += arch/i386/oprofile/
drivers-$(CONFIG_PM) += arch/i386/power/
# for clean
diff -PNaur --exclude=.asm-ignore xen-unstable/linux-2.6.11-xen0/arch/xen/configs/xen0_defconfig_x86_32 xen-unstable-prof/linux-2.6.11-xen0/arch/xen/configs/xen0_defconfig_x86_32
--- xen-unstable/linux-2.6.11-xen0/arch/xen/configs/xen0_defconfig_x86_32 2005-05-22 20:12:46.000000000 -0700
+++ xen-unstable-prof/linux-2.6.11-xen0/arch/xen/configs/xen0_defconfig_x86_32 2005-05-24 11:56:13.000000000 -0700
@@ -75,6 +75,12 @@
CONFIG_KMOD=y
#
+# OProfile options
+#
+CONFIG_PROFILING=y
+CONFIG_OPROFILE=m
+
+#
# X86 Processor Configuration
#
CONFIG_XENARCH="i386"
diff -PNaur --exclude=.asm-ignore xen-unstable/linux-2.6.11-xen0/arch/xen/configs/xenU_defconfig_x86_32 xen-unstable-prof/linux-2.6.11-xen0/arch/xen/configs/xenU_defconfig_x86_32
--- xen-unstable/linux-2.6.11-xen0/arch/xen/configs/xenU_defconfig_x86_32 2005-05-22 20:12:46.000000000 -0700
+++ xen-unstable-prof/linux-2.6.11-xen0/arch/xen/configs/xenU_defconfig_x86_32 2005-05-24 11:56:13.000000000 -0700
@@ -71,6 +71,12 @@
CONFIG_KMOD=y
#
+# OProfile options
+#
+CONFIG_PROFILING=y
+CONFIG_OPROFILE=m
+
+#
# X86 Processor Configuration
#
CONFIG_XENARCH="i386"
diff -PNaur --exclude=.asm-ignore xen-unstable/linux-2.6.11-xen0/arch/xen/i386/Makefile xen-unstable-prof/linux-2.6.11-xen0/arch/xen/i386/Makefile
--- xen-unstable/linux-2.6.11-xen0/arch/xen/i386/Makefile 2005-05-22 20:12:49.000000000 -0700
+++ xen-unstable-prof/linux-2.6.11-xen0/arch/xen/i386/Makefile 2005-05-24 11:56:13.000000000 -0700
@@ -79,7 +79,6 @@
drivers-$(CONFIG_MATH_EMULATION) += arch/i386/math-emu/
drivers-$(CONFIG_PCI) += arch/xen/i386/pci/
# must be linked after kernel/
-drivers-$(CONFIG_OPROFILE) += arch/i386/oprofile/
drivers-$(CONFIG_PM) += arch/i386/power/
# for clean
diff -PNaur --exclude=.asm-ignore xen-unstable/linux-2.6.11-xen0/arch/xen/Kconfig xen-unstable-prof/linux-2.6.11-xen0/arch/xen/Kconfig
--- xen-unstable/linux-2.6.11-xen0/arch/xen/Kconfig 2005-05-22 20:12:43.000000000 -0700
+++ xen-unstable-prof/linux-2.6.11-xen0/arch/xen/Kconfig 2005-05-24 11:56:13.000000000 -0700
@@ -194,3 +194,6 @@
source "crypto/Kconfig"
source "lib/Kconfig"
+
+source "arch/xen/oprofile/Kconfig"
+
diff -PNaur --exclude=.asm-ignore xen-unstable/linux-2.6.11-xen0/arch/xen/kernel/evtchn.c xen-unstable-prof/linux-2.6.11-xen0/arch/xen/kernel/evtchn.c
--- xen-unstable/linux-2.6.11-xen0/arch/xen/kernel/evtchn.c 2005-05-22 20:12:47.000000000 -0700
+++ xen-unstable-prof/linux-2.6.11-xen0/arch/xen/kernel/evtchn.c 2005-05-24 11:56:13.000000000 -0700
@@ -44,9 +44,14 @@
#include <asm-xen/hypervisor.h>
#include <asm-xen/evtchn.h>
+int virq_to_phys(int virq);
+
#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,0)
EXPORT_SYMBOL(force_evtchn_callback);
EXPORT_SYMBOL(evtchn_do_upcall);
+EXPORT_SYMBOL(virq_to_phys);
+EXPORT_SYMBOL(bind_virq_to_irq);
+EXPORT_SYMBOL(unbind_virq_from_irq);
#endif
/*
@@ -150,6 +155,15 @@
return irq;
}
+int virq_to_phys(int virq)
+{
+ int cpu = smp_processor_id();
+
+ if (virq >= NR_VIRQS)
+ return -1;
+ return per_cpu(virq_to_irq,cpu)[virq];
+}
+
int bind_virq_to_irq(int virq)
{
evtchn_op_t op;
diff -PNaur --exclude=.asm-ignore xen-unstable/linux-2.6.11-xen0/arch/xen/Makefile xen-unstable-prof/linux-2.6.11-xen0/arch/xen/Makefile
--- xen-unstable/linux-2.6.11-xen0/arch/xen/Makefile 2005-05-22 20:12:52.000000000 -0700
+++ xen-unstable-prof/linux-2.6.11-xen0/arch/xen/Makefile 2005-05-24 11:56:13.000000000 -0700
@@ -22,6 +22,8 @@
core-y += arch/xen/kernel/
+drivers-$(CONFIG_OPROFILE) += arch/xen/oprofile/
+
include/.asm-ignore: include/asm
@rm -f include/.asm-ignore
@mv include/asm include/.asm-ignore
diff -PNaur --exclude=.asm-ignore xen-unstable/linux-2.6.11-xen0/arch/xen/oprofile/Kconfig xen-unstable-prof/linux-2.6.11-xen0/arch/xen/oprofile/Kconfig
--- xen-unstable/linux-2.6.11-xen0/arch/xen/oprofile/Kconfig 1969-12-31 16:00:00.000000000 -0800
+++ xen-unstable-prof/linux-2.6.11-xen0/arch/xen/oprofile/Kconfig 2005-05-24 11:56:13.000000000 -0700
@@ -0,0 +1,23 @@
+
+menu "Profiling support"
+ depends on EXPERIMENTAL
+
+config PROFILING
+ bool "Profiling support (EXPERIMENTAL)"
+ help
+ Say Y here to enable the extended profiling support mechanisms used
+ by profilers such as OProfile.
+
+
+config OPROFILE
+ tristate "OProfile system profiling (EXPERIMENTAL)"
+ depends on PROFILING
+ help
+ OProfile is a profiling system capable of profiling the
+ whole system, include the kernel, kernel modules, libraries,
+ and applications.
+
+ If unsure, say N.
+
+endmenu
+
diff -PNaur --exclude=.asm-ignore xen-unstable/linux-2.6.11-xen0/arch/xen/oprofile/Makefile xen-unstable-prof/linux-2.6.11-xen0/arch/xen/oprofile/Makefile
--- xen-unstable/linux-2.6.11-xen0/arch/xen/oprofile/Makefile 1969-12-31 16:00:00.000000000 -0800
+++ xen-unstable-prof/linux-2.6.11-xen0/arch/xen/oprofile/Makefile 2005-05-24 11:56:13.000000000 -0700
@@ -0,0 +1,9 @@
+obj-$(CONFIG_OPROFILE) += oprofile.o
+
+DRIVER_OBJS = $(addprefix ../../../drivers/oprofile/, \
+ oprof.o cpu_buffer.o buffer_sync.o \
+ event_buffer.o oprofile_files.o \
+ oprofilefs.o oprofile_stats.o \
+ timer_int.o )
+
+oprofile-y := $(DRIVER_OBJS) pmc.o
diff -PNaur --exclude=.asm-ignore xen-unstable/linux-2.6.11-xen0/arch/xen/oprofile/op_counter.h xen-unstable-prof/linux-2.6.11-xen0/arch/xen/oprofile/op_counter.h
--- xen-unstable/linux-2.6.11-xen0/arch/xen/oprofile/op_counter.h 1969-12-31 16:00:00.000000000 -0800
+++ xen-unstable-prof/linux-2.6.11-xen0/arch/xen/oprofile/op_counter.h 2005-05-24 11:56:13.000000000 -0700
@@ -0,0 +1,29 @@
+/**
+ * @file op_counter.h
+ *
+ * @remark Copyright 2002 OProfile authors
+ * @remark Read the file COPYING
+ *
+ * @author John Levon
+ */
+
+#ifndef OP_COUNTER_H
+#define OP_COUNTER_H
+
+#define OP_MAX_COUNTER 8
+
+/* Per-perfctr configuration as set via
+ * oprofilefs.
+ */
+struct op_counter_config {
+ unsigned long count;
+ unsigned long enabled;
+ unsigned long event;
+ unsigned long kernel;
+ unsigned long user;
+ unsigned long unit_mask;
+};
+
+extern struct op_counter_config counter_config[];
+
+#endif /* OP_COUNTER_H */
diff -PNaur --exclude=.asm-ignore xen-unstable/linux-2.6.11-xen0/arch/xen/oprofile/pmc.c xen-unstable-prof/linux-2.6.11-xen0/arch/xen/oprofile/pmc.c
--- xen-unstable/linux-2.6.11-xen0/arch/xen/oprofile/pmc.c 1969-12-31 16:00:00.000000000 -0800
+++ xen-unstable-prof/linux-2.6.11-xen0/arch/xen/oprofile/pmc.c 2005-05-24 11:56:13.000000000 -0700
@@ -0,0 +1,322 @@
+/**
+ * @file nmi_int.c
+ *
+ * @remark Copyright 2002 OProfile authors
+ * @remark Read the file COPYING
+ *
+ * @author John Levon <levon@movementarian.org>
+ *
+ * Modified by Aravind Menon for Xen
+ * These modifications are:
+ * Copyright (C) 2005 Hewlett-Packard Co.
+ */
+
+#include <linux/init.h>
+#include <linux/notifier.h>
+#include <linux/smp.h>
+#include <linux/oprofile.h>
+#include <linux/sysdev.h>
+#include <linux/slab.h>
+#include <linux/interrupt.h>
+#include <asm/nmi.h>
+#include <asm/msr.h>
+#include <asm/apic.h>
+
+#include "op_counter.h"
+
+static int pmc_start(void);
+static void pmc_stop(void);
+
+/* 0 == registered but off, 1 == registered and on */
+static int pmc_enabled = 0;
+static int num_events = 0;
+static int is_primary = 0;
+
+#ifdef CONFIG_PM
+
+static int pmc_suspend(struct sys_device *dev, u32 state)
+{
+ if (pmc_enabled == 1)
+ pmc_stop();
+ return 0;
+}
+
+
+static int pmc_resume(struct sys_device *dev)
+{
+ if (pmc_enabled == 1)
+ pmc_start();
+ return 0;
+}
+
+
+static struct sysdev_class oprofile_sysclass = {
+ set_kset_name("oprofile"),
+ .resume = pmc_resume,
+ .suspend = pmc_suspend,
+};
+
+
+static struct sys_device device_oprofile = {
+ .id = 0,
+ .cls = &oprofile_sysclass,
+};
+
+
+static int __init init_driverfs(void)
+{
+ int error;
+ if (!(error = sysdev_class_register(&oprofile_sysclass)))
+ error = sysdev_register(&device_oprofile);
+ return error;
+}
+
+
+static void __exit exit_driverfs(void)
+{
+ sysdev_unregister(&device_oprofile);
+ sysdev_class_unregister(&oprofile_sysclass);
+}
+
+#else
+#define init_driverfs() do { } while (0)
+#define exit_driverfs() do { } while (0)
+#endif /* CONFIG_PM */
+
+unsigned long long oprofile_samples = 0;
+
+static irqreturn_t pmc_ovf_interrupt (int irq, void *dev_id, struct pt_regs *regs)
+{
+ int head, tail;
+ shared_info_t *s = HYPERVISOR_shared_info;
+
+ head = s->event_head;
+ tail = s->event_tail;
+
+ /* oprofile_add_sample will also handle samples from other domains */
+
+ if (tail > head) {
+ while (tail < MAX_OPROF_EVENTS) {
+ oprofile_add_sample_xen(s->event_log[tail].eip,
+ s->event_log[tail].mode,
+ s->event_log[tail].event);
+ /*printk(KERN_INFO "pmc_sample: %p, %d, %d\n",
+ s->event_log[tail].eip, s->event_log[tail].mode,
+ s->event_log[tail].event);*/
+ oprofile_samples++;
+ tail++;
+ }
+ tail = 0;
+ }
+ while (tail < head) {
+ oprofile_add_sample_xen(s->event_log[tail].eip,
+ s->event_log[tail].mode, s->event_log[tail].event);
+ /*printk(KERN_INFO "pmc_sample: %p, %d, %d\n",
+ s->event_log[tail].eip, s->event_log[tail].mode,
+ s->event_log[tail].event);*/
+ oprofile_samples++;
+ tail++;
+ }
+
+ s->event_tail = tail;
+ s->losing_samples = 0;
+
+ return IRQ_HANDLED;
+}
+
+extern int virq_to_phys(int virq);
+
+static int pmc_setup(void)
+{
+ int ret;
+
+ if ((ret = request_irq(bind_virq_to_irq(VIRQ_PMC_OVF),
+ pmc_ovf_interrupt, SA_INTERRUPT, "pmc_ovf", NULL)))
+ goto release_irq;
+
+ if (is_primary) {
+ ret = HYPERVISOR_pmc_op(PMC_RESERVE_COUNTERS, (unsigned int)NULL, (unsigned int)NULL);
+ //printk(KERN_INFO "pmc_setup: reserve_counters: ret %d\n", ret);
+
+ ret = HYPERVISOR_pmc_op(PMC_SETUP_EVENTS, (unsigned int)&counter_config, (unsigned int)num_events);
+ //printk(KERN_INFO "pmc_setup: setup_events: ret %d\n", ret);
+ }
+
+ ret = HYPERVISOR_pmc_op(PMC_ENABLE_VIRQ, (unsigned int)NULL, (unsigned int)NULL);
+ //printk(KERN_INFO "pmc_setup: enable_virq: ret %d\n", ret);
+
+ pmc_enabled = 1;
+ return 0;
+
+release_irq:
+ free_irq(virq_to_phys(VIRQ_PMC_OVF), NULL);
+ unbind_virq_from_irq(VIRQ_PMC_OVF);
+
+ return ret;
+}
+
+static void pmc_shutdown(void)
+{
+ int ret;
+ pmc_enabled = 0;
+
+ ret = HYPERVISOR_pmc_op(PMC_DISABLE_VIRQ, (unsigned int)NULL, (unsigned int)NULL);
+ //printk(KERN_INFO "pmc_shutdown: disable_virq: ret %d\n", ret);
+
+ if (is_primary) {
+ ret = HYPERVISOR_pmc_op(PMC_RELEASE_COUNTERS, (unsigned int)NULL, (unsigned int)NULL);
+ //printk(KERN_INFO "pmc_shutdown: release_counters: ret %d\n", ret);
+ }
+
+ free_irq(virq_to_phys(VIRQ_PMC_OVF), NULL);
+ unbind_virq_from_irq(VIRQ_PMC_OVF);
+}
+
+static int pmc_start(void)
+{
+ int ret = 0;
+ if (is_primary)
+ ret = HYPERVISOR_pmc_op(PMC_START, (unsigned int)NULL, (unsigned int)NULL);
+ //printk(KERN_INFO "pmc_start: ret %d\n", ret);
+ return ret;
+}
+
+static void pmc_stop(void)
+{
+ int ret = 0;
+ if (is_primary)
+ ret = HYPERVISOR_pmc_op(PMC_STOP, (unsigned int)NULL, (unsigned int)NULL);
+ //printk(KERN_INFO "pmc_stop: ret %d\n", ret);
+ printk(KERN_INFO "pmc: oprofile samples %llu, active %llu, passive %llu, other %llu, buffering losses %llu, NMI restarted %d\n",
+ oprofile_samples, HYPERVISOR_shared_info->active_samples, HYPERVISOR_shared_info->passive_samples,
+ HYPERVISOR_shared_info->other_samples, HYPERVISOR_shared_info->samples_lost, HYPERVISOR_shared_info->nmi_restarts);
+}
+
+static int pmc_set_active(int *active_domains, unsigned int adomains)
+{
+ int ret = 0;
+ if (is_primary)
+ ret = HYPERVISOR_pmc_op(PMC_SET_ACTIVE,
+ (unsigned int)active_domains, (unsigned int)adomains);
+ return ret;
+}
+
+static int pmc_set_passive(int *passive_domains, unsigned int pdomains)
+{
+ int ret = 0;
+ if (is_primary)
+ ret = HYPERVISOR_pmc_op(PMC_SET_PASSIVE,
+ (unsigned int)passive_domains, (unsigned int)pdomains);
+ return ret;
+}
+
+struct op_counter_config counter_config[OP_MAX_COUNTER];
+
+static int pmc_create_files(struct super_block * sb, struct dentry * root)
+{
+ unsigned int i;
+
+ for (i = 0; i < num_events; ++i) {
+ struct dentry * dir;
+ char buf[2];
+
+ snprintf(buf, 2, "%d", i);
+ dir = oprofilefs_mkdir(sb, root, buf);
+ oprofilefs_create_ulong(sb, dir, "enabled", &counter_config[i].enabled);
+ oprofilefs_create_ulong(sb, dir, "event", &counter_config[i].event);
+ oprofilefs_create_ulong(sb, dir, "count", &counter_config[i].count);
+ oprofilefs_create_ulong(sb, dir, "unit_mask", &counter_config[i].unit_mask);
+ oprofilefs_create_ulong(sb, dir, "kernel", &counter_config[i].kernel);
+ oprofilefs_create_ulong(sb, dir, "user", &counter_config[i].user);
+ }
+
+ //printk(KERN_INFO "pmc_create_files\n");
+ return 0;
+}
+
+
+struct oprofile_operations pmc_ops = {
+ .create_files = pmc_create_files,
+ .set_active = pmc_set_active,
+ .set_passive = pmc_set_passive,
+ .setup = pmc_setup,
+ .shutdown = pmc_shutdown,
+ .start = pmc_start,
+ .stop = pmc_stop
+};
+
+
+static void __init p4_init(void)
+{
+ __u8 cpu_model = current_cpu_data.x86_model;
+
+ if (cpu_model > 3)
+ pmc_ops.cpu_type = "type_unknown";
+
+ /* We always use a non-HT system because that goves us more events */
+ pmc_ops.cpu_type = "i386/p4";
+}
+
+
+static void __init ppro_init(void)
+{
+ __u8 cpu_model = current_cpu_data.x86_model;
+
+ if (cpu_model > 0xd)
+ pmc_ops.cpu_type = "type_unknown";
+
+ if (cpu_model == 9) {
+ pmc_ops.cpu_type = "i386/p6_mobile";
+ } else if (cpu_model > 5) {
+ pmc_ops.cpu_type = "i386/piii";
+ } else if (cpu_model > 2) {
+ pmc_ops.cpu_type = "i386/pii";
+ } else {
+ pmc_ops.cpu_type = "i386/ppro";
+ }
+}
+
+/* in order to get driverfs right */
+static int using_pmc;
+
+int __init oprofile_arch_init(struct oprofile_operations * ops)
+{
+ int ret = HYPERVISOR_pmc_op(PMC_INIT, (unsigned int)&num_events, (unsigned int)&is_primary);
+
+ if (!ret) {
+ __u8 vendor = current_cpu_data.x86_vendor;
+ __u8 family = current_cpu_data.x86;
+
+ if (vendor == X86_VENDOR_INTEL) {
+ switch (family) {
+ /* Pentium IV */
+ case 0xf:
+ p4_init();
+ break;
+ /* A P6-class processor */
+ case 6:
+ ppro_init();
+ break;
+ default:
+ pmc_ops.cpu_type = "type_unknown";
+ }
+ } else pmc_ops.cpu_type = "type_unknown";
+
+ init_driverfs();
+ using_pmc = 1;
+ *ops = pmc_ops;
+ }
+ printk (KERN_INFO "oprofile_arch_init: ret %d, events %d, is_primary %d\n", ret, num_events, is_primary);
+ return ret;
+}
+
+
+void __exit oprofile_arch_exit(void)
+{
+ if (using_pmc)
+ exit_driverfs();
+
+ if (is_primary)
+ HYPERVISOR_pmc_op(PMC_SHUTDOWN, (unsigned int )NULL, (unsigned int)NULL);
+
+}
diff -PNaur --exclude=.asm-ignore xen-unstable/linux-2.6.11-xen0/drivers/oprofile/buffer_sync.c xen-unstable-prof/linux-2.6.11-xen0/drivers/oprofile/buffer_sync.c
--- xen-unstable/linux-2.6.11-xen0/drivers/oprofile/buffer_sync.c 2005-03-01 23:37:51.000000000 -0800
+++ xen-unstable-prof/linux-2.6.11-xen0/drivers/oprofile/buffer_sync.c 2005-05-24 11:56:13.000000000 -0700
@@ -6,6 +6,10 @@
*
* @author John Levon <levon@movementarian.org>
*
+ * Modified by Aravind Menon for Xen
+ * These modifications are:
+ * Copyright (C) 2005 Hewlett-Packard Co.
+ *
* This is the core of the buffer management. Each
* CPU buffer is processed and entered into the
* global event buffer. Such processing is necessary
@@ -265,13 +269,30 @@
last_cookie = ~0UL;
}
-static void add_kernel_ctx_switch(unsigned int in_kernel)
+static void add_cpu_mode_switch(unsigned int cpu_mode)
{
add_event_entry(ESCAPE_CODE);
- if (in_kernel)
- add_event_entry(KERNEL_ENTER_SWITCH_CODE);
- else
- add_event_entry(KERNEL_EXIT_SWITCH_CODE);
+ switch (cpu_mode)
+ {
+ case CPU_MODE_USER:
+ add_event_entry(USER_ENTER_SWITCH_CODE);
+ break;
+ case CPU_MODE_KERNEL:
+ add_event_entry(KERNEL_ENTER_SWITCH_CODE);
+ break;
+ case CPU_MODE_XEN:
+ add_event_entry(XEN_ENTER_SWITCH_CODE);
+ break;
+ default:
+ break;
+ }
+}
+
+static void add_dom_switch(int domain_id)
+{
+ add_event_entry(ESCAPE_CODE);
+ add_event_entry(DOMAIN_SWITCH_CODE);
+ add_event_entry(domain_id);
}
static void
@@ -337,10 +358,9 @@
* sample is converted into a persistent dentry/offset pair
* for later lookup from userspace.
*/
-static int
-add_sample(struct mm_struct * mm, struct op_sample * s, int in_kernel)
+static int add_sample(struct mm_struct * mm, struct op_sample * s, int cpu_mode)
{
- if (in_kernel) {
+ if (cpu_mode >= CPU_MODE_KERNEL) {
add_sample_entry(s->eip, s->event);
return 1;
} else if (mm) {
@@ -374,6 +394,11 @@
{
return val == ESCAPE_CODE;
}
+
+static inline int is_dom_switch(unsigned long val)
+{
+ return val == DOMAIN_SWITCH_ESCAPE_CODE;
+}
/* "acquire" as many cpu buffer slots as we can */
@@ -489,10 +514,11 @@
struct mm_struct *mm = NULL;
struct task_struct * new;
unsigned long cookie = 0;
- int in_kernel = 1;
+ int cpu_mode = 1;
unsigned int i;
sync_buffer_state state = sb_buffer_start;
unsigned long available;
+ int domain_switch = 0;
down(&buffer_sem);
@@ -506,12 +532,12 @@
struct op_sample * s = &cpu_buf->buffer[cpu_buf->tail_pos];
if (is_code(s->eip)) {
- if (s->event <= CPU_IS_KERNEL) {
+ if (s->event <= CPU_MODE_MAX) {
/* kernel/userspace switch */
- in_kernel = s->event;
+ cpu_mode = s->event;
if (state == sb_buffer_start)
state = sb_sample_start;
- add_kernel_ctx_switch(s->event);
+ add_cpu_mode_switch(s->event);
} else if (s->event == CPU_TRACE_BEGIN) {
state = sb_bt_start;
add_trace_begin();
@@ -528,11 +554,23 @@
add_user_ctx_switch(new, cookie);
}
} else {
- if (state >= sb_bt_start &&
- !add_sample(mm, s, in_kernel)) {
- if (state == sb_bt_start) {
- state = sb_bt_ignore;
- atomic_inc(&oprofile_stats.bt_lost_no_mapping);
+ if (is_dom_switch(s->eip)) {
+ add_dom_switch((int)(s->event));
+ domain_switch = 1;
+ }
+ else {
+ if (domain_switch) {
+ add_sample_entry (s->eip, s->event);
+ domain_switch = 0;
+ }
+ else {
+ if (state >= sb_bt_start &&
+ !add_sample(mm, s, cpu_mode)) {
+ if (state == sb_bt_start) {
+ state = sb_bt_ignore;
+ atomic_inc(&oprofile_stats.bt_lost_no_mapping);
+ }
+ }
}
}
}
diff -PNaur --exclude=.asm-ignore xen-unstable/linux-2.6.11-xen0/drivers/oprofile/cpu_buffer.c xen-unstable-prof/linux-2.6.11-xen0/drivers/oprofile/cpu_buffer.c
--- xen-unstable/linux-2.6.11-xen0/drivers/oprofile/cpu_buffer.c 2005-03-01 23:37:31.000000000 -0800
+++ xen-unstable-prof/linux-2.6.11-xen0/drivers/oprofile/cpu_buffer.c 2005-05-24 11:56:13.000000000 -0700
@@ -6,6 +6,10 @@
*
* @author John Levon <levon@movementarian.org>
*
+ * Modified by Aravind Menon for Xen
+ * These modifications are:
+ * Copyright (C) 2005 Hewlett-Packard Co.
+ *
* Each CPU has a local buffer that stores PC value/event
* pairs. We also log context switches when we notice them.
* Eventually each CPU's buffer is processed into the global
@@ -58,7 +62,7 @@
goto fail;
b->last_task = NULL;
- b->last_is_kernel = -1;
+ b->last_cpu_mode = -1;
b->tracing = 0;
b->buffer_size = buffer_size;
b->tail_pos = 0;
@@ -117,7 +121,7 @@
* collected will populate the buffer with proper
* values to initialize the buffer
*/
- cpu_buf->last_is_kernel = -1;
+ cpu_buf->last_cpu_mode = -1;
cpu_buf->last_task = NULL;
}
@@ -180,7 +184,7 @@
* events whenever is_kernel changes
*/
static int log_sample(struct oprofile_cpu_buffer * cpu_buf, unsigned long pc,
- int is_kernel, unsigned long event)
+ int cpu_mode, unsigned long event)
{
struct task_struct * task;
@@ -191,24 +195,39 @@
return 0;
}
- is_kernel = !!is_kernel;
+ // Ensure a valid cpu mode
+ if (cpu_mode > CPU_MODE_XEN)
+ return 0;
task = current;
- /* notice a switch from user->kernel or vice versa */
- if (cpu_buf->last_is_kernel != is_kernel) {
- cpu_buf->last_is_kernel = is_kernel;
- add_code(cpu_buf, is_kernel);
- }
- /* notice a task switch */
- if (cpu_buf->last_task != task) {
- cpu_buf->last_task = task;
- add_code(cpu_buf, (unsigned long)task);
+ /* We treat samples from other domains in a special manner:
+ each sample is preceded by a record with eip equal to ~1UL.
+ This record is non-sticky i.e. it holds only for the following
+ sample. The event field of this record stores the domain id.*/
+ if (pc == DOMAIN_SWITCH_ESCAPE_CODE) {
+ add_sample(cpu_buf, pc, event);
+ return 1;
+ } else {
+ /* notice a switch from user->kernel or vice versa */
+ if (cpu_buf->last_cpu_mode != cpu_mode) {
+ cpu_buf->last_cpu_mode = cpu_mode;
+ add_code(cpu_buf, cpu_mode);
+ }
+
+ /* notice a task switch */
+ if (cpu_buf->last_task != task) {
+ cpu_buf->last_task = task;
+ add_code(cpu_buf, (unsigned long)task);
+ }
+
+ /* Note: at this point, we lose the cpu_mode of a sample
+ if it is from another domain */
+
+ add_sample(cpu_buf, pc, event);
+ return 1;
}
-
- add_sample(cpu_buf, pc, event);
- return 1;
}
static int oprofile_begin_trace(struct oprofile_cpu_buffer * cpu_buf)
@@ -229,6 +248,14 @@
cpu_buf->tracing = 0;
}
+void oprofile_add_sample_xen(unsigned long eip, unsigned int cpu_mode,
+ unsigned long event)
+{
+ struct oprofile_cpu_buffer * cpu_buf = &cpu_buffer[smp_processor_id()];
+ log_sample(cpu_buf, eip, cpu_mode, event);
+
+
+}
void oprofile_add_sample(struct pt_regs * const regs, unsigned long event)
{
diff -PNaur --exclude=.asm-ignore xen-unstable/linux-2.6.11-xen0/drivers/oprofile/cpu_buffer.h xen-unstable-prof/linux-2.6.11-xen0/drivers/oprofile/cpu_buffer.h
--- xen-unstable/linux-2.6.11-xen0/drivers/oprofile/cpu_buffer.h 2005-03-01 23:38:38.000000000 -0800
+++ xen-unstable-prof/linux-2.6.11-xen0/drivers/oprofile/cpu_buffer.h 2005-05-24 11:56:13.000000000 -0700
@@ -36,7 +36,7 @@
volatile unsigned long tail_pos;
unsigned long buffer_size;
struct task_struct * last_task;
- int last_is_kernel;
+ int last_cpu_mode;
int tracing;
struct op_sample * buffer;
unsigned long sample_received;
@@ -51,7 +51,14 @@
void cpu_buffer_reset(struct oprofile_cpu_buffer * cpu_buf);
/* transient events for the CPU buffer -> event buffer */
-#define CPU_IS_KERNEL 1
-#define CPU_TRACE_BEGIN 2
+#define CPU_MODE_USER 0
+#define CPU_MODE_KERNEL 1
+#define CPU_MODE_XEN 2
+#define CPU_MODE_MAX 2
+#define CPU_TRACE_BEGIN 3
+/* special escape code for indicating next sample in the CPU */
+/* buffer is from another Xen domain */
+#define DOMAIN_SWITCH_ESCAPE_CODE ~1UL
+
#endif /* OPROFILE_CPU_BUFFER_H */
diff -PNaur --exclude=.asm-ignore xen-unstable/linux-2.6.11-xen0/drivers/oprofile/event_buffer.c xen-unstable-prof/linux-2.6.11-xen0/drivers/oprofile/event_buffer.c
--- xen-unstable/linux-2.6.11-xen0/drivers/oprofile/event_buffer.c 2005-03-01 23:37:30.000000000 -0800
+++ xen-unstable-prof/linux-2.6.11-xen0/drivers/oprofile/event_buffer.c 2005-05-24 11:56:13.000000000 -0700
@@ -56,6 +56,7 @@
/* Wake up the waiting process if any. This happens
* on "echo 0 >/dev/oprofile/enable" so the daemon
* processes the data remaining in the event buffer.
+ * also called on echo 1 > /dev/oprofile/dump
*/
void wake_up_buffer_waiter(void)
{
diff -PNaur --exclude=.asm-ignore xen-unstable/linux-2.6.11-xen0/drivers/oprofile/event_buffer.h xen-unstable-prof/linux-2.6.11-xen0/drivers/oprofile/event_buffer.h
--- xen-unstable/linux-2.6.11-xen0/drivers/oprofile/event_buffer.h 2005-03-01 23:37:48.000000000 -0800
+++ xen-unstable-prof/linux-2.6.11-xen0/drivers/oprofile/event_buffer.h 2005-05-24 11:56:13.000000000 -0700
@@ -5,6 +5,10 @@
* @remark Read the file COPYING
*
* @author John Levon <levon@movementarian.org>
+ *
+ * Modified by Aravind Menon for Xen
+ * These modifications are:
+ * Copyright (C) 2005 Hewlett-Packard Co.
*/
#ifndef EVENT_BUFFER_H
@@ -29,11 +33,13 @@
#define CPU_SWITCH_CODE 2
#define COOKIE_SWITCH_CODE 3
#define KERNEL_ENTER_SWITCH_CODE 4
-#define KERNEL_EXIT_SWITCH_CODE 5
+#define USER_ENTER_SWITCH_CODE 5
#define MODULE_LOADED_CODE 6
#define CTX_TGID_CODE 7
#define TRACE_BEGIN_CODE 8
#define TRACE_END_CODE 9
+#define XEN_ENTER_SWITCH_CODE 10
+#define DOMAIN_SWITCH_CODE 11
/* add data to the event buffer */
void add_event_entry(unsigned long data);
diff -PNaur --exclude=.asm-ignore xen-unstable/linux-2.6.11-xen0/drivers/oprofile/oprof.c xen-unstable-prof/linux-2.6.11-xen0/drivers/oprofile/oprof.c
--- xen-unstable/linux-2.6.11-xen0/drivers/oprofile/oprof.c 2005-03-01 23:37:50.000000000 -0800
+++ xen-unstable-prof/linux-2.6.11-xen0/drivers/oprofile/oprof.c 2005-05-24 11:56:13.000000000 -0700
@@ -5,6 +5,10 @@
* @remark Read the file COPYING
*
* @author John Levon <levon@movementarian.org>
+ *
+ * Modified by Aravind Menon for Xen
+ * These modifications are:
+ * Copyright (C) 2005 Hewlett-Packard Co.
*/
#include <linux/kernel.h>
@@ -33,6 +37,25 @@
*/
static int timer = 0;
+extern unsigned int adomains, pdomains;
+extern int active_domains[MAX_OPROF_DOMAINS], passive_domains[MAX_OPROF_DOMAINS];
+
+int oprofile_set_active(void)
+{
+ if (oprofile_ops.set_active)
+ return oprofile_ops.set_active(active_domains, adomains);
+
+ return -EINVAL;
+}
+
+int oprofile_set_passive(void)
+{
+ if (oprofile_ops.set_passive)
+ return oprofile_ops.set_passive(passive_domains, pdomains);
+
+ return -EINVAL;
+}
+
int oprofile_setup(void)
{
int err;
diff -PNaur --exclude=.asm-ignore xen-unstable/linux-2.6.11-xen0/drivers/oprofile/oprofile_files.c xen-unstable-prof/linux-2.6.11-xen0/drivers/oprofile/oprofile_files.c
--- xen-unstable/linux-2.6.11-xen0/drivers/oprofile/oprofile_files.c 2005-03-01 23:38:09.000000000 -0800
+++ xen-unstable-prof/linux-2.6.11-xen0/drivers/oprofile/oprofile_files.c 2005-05-24 11:56:13.000000000 -0700
@@ -5,10 +5,16 @@
* @remark Read the file COPYING
*
* @author John Levon <levon@movementarian.org>
+ *
+ * Modified by Aravind Menon for Xen
+ * These modifications are:
+ * Copyright (C) 2005 Hewlett-Packard Co.
*/
#include <linux/fs.h>
#include <linux/oprofile.h>
+#include <linux/pagemap.h>
+#include <linux/ctype.h>
#include "event_buffer.h"
#include "oprofile_stats.h"
@@ -117,11 +123,140 @@
static struct file_operations dump_fops = {
.write = dump_write,
};
-
+
+#define TMPBUFSIZE 50
+
+unsigned int adomains = 0;
+long active_domains[MAX_OPROF_DOMAINS];
+
+extern int oprofile_set_active(void);
+
+static ssize_t adomain_write(struct file *file, char const __user *buf, size_t count, loff_t * offset)
+{
+ char tmpbuf[TMPBUFSIZE];
+ char *startp = tmpbuf;
+ char *endp = tmpbuf;
+ int i;
+ unsigned long val;
+
+ if (*offset)
+ return -EINVAL;
+ if (!count)
+ return 0;
+ if (count > TMPBUFSIZE - 1)
+ return -EINVAL;
+
+ memset(tmpbuf, 0x0, TMPBUFSIZE);
+
+ if (copy_from_user(tmpbuf, buf, count))
+ return -EFAULT;
+
+ for (i = 0; i < MAX_OPROF_DOMAINS; i++)
+ active_domains[i] = -1;
+ adomains = 0;
+
+ while (1) {
+ val = simple_strtol(startp, &endp, 0);
+ if (endp == startp)
+ break;
+ while (ispunct(*endp))
+ endp++;
+ active_domains[adomains++] = val;
+ if (adomains >= MAX_OPROF_DOMAINS)
+ break;
+ startp = endp;
+ }
+ if (oprofile_set_active())
+ return -EINVAL;
+ return count;
+}
+
+static ssize_t adomain_read(struct file *file, char __user * buf, size_t count, loff_t * offset)
+{
+ char tmpbuf[TMPBUFSIZE];
+ size_t len = 0;
+ int i;
+ /* This is all screwed up if we run out of space */
+ for (i = 0; i < adomains; i++)
+ len += snprintf(tmpbuf + len, TMPBUFSIZE - len, "%u ", (unsigned int)active_domains[i]);
+ len += snprintf(tmpbuf + len, TMPBUFSIZE - len, "\n");
+ return simple_read_from_buffer((void __user *)buf, count, offset, tmpbuf, len);
+}
+
+
+static struct file_operations active_domain_ops = {
+ .read = adomain_read,
+ .write = adomain_write,
+};
+
+unsigned int pdomains = 0;
+long passive_domains[MAX_OPROF_DOMAINS];
+
+extern int oprofile_set_passive(void);
+
+static ssize_t pdomain_write(struct file *file, char const __user *buf, size_t count, loff_t * offset)
+{
+ char tmpbuf[TMPBUFSIZE];
+ char *startp = tmpbuf;
+ char *endp = tmpbuf;
+ int i;
+ unsigned long val;
+
+ if (*offset)
+ return -EINVAL;
+ if (!count)
+ return 0;
+ if (count > TMPBUFSIZE - 1)
+ return -EINVAL;
+
+ memset(tmpbuf, 0x0, TMPBUFSIZE);
+
+ if (copy_from_user(tmpbuf, buf, count))
+ return -EFAULT;
+
+ for (i = 0; i < MAX_OPROF_DOMAINS; i++)
+ passive_domains[i] = -1;
+ pdomains = 0;
+
+ while (1) {
+ val = simple_strtol(startp, &endp, 0);
+ if (endp == startp)
+ break;
+ while (ispunct(*endp))
+ endp++;
+ passive_domains[pdomains++] = val;
+ if (pdomains >= MAX_OPROF_DOMAINS)
+ break;
+ startp = endp;
+ }
+ if (oprofile_set_passive())
+ return -EINVAL;
+ return count;
+}
+
+static ssize_t pdomain_read(struct file *file, char __user * buf, size_t count, loff_t * offset)
+{
+ char tmpbuf[TMPBUFSIZE];
+ size_t len = 0;
+ int i;
+ /* This is all screwed up if we run out of space */
+ for (i = 0; i < pdomains; i++)
+ len += snprintf(tmpbuf + len, TMPBUFSIZE - len, "%u ", (unsigned int)passive_domains[i]);
+ len += snprintf (tmpbuf + len, TMPBUFSIZE - len, "\n");
+ return simple_read_from_buffer((void __user *)buf, count, offset, tmpbuf, len);
+}
+
+static struct file_operations passive_domain_ops = {
+ .read = pdomain_read,
+ .write = pdomain_write,
+};
+
void oprofile_create_files(struct super_block * sb, struct dentry * root)
{
oprofilefs_create_file(sb, root, "enable", &enable_fops);
oprofilefs_create_file_perm(sb, root, "dump", &dump_fops, 0666);
+ oprofilefs_create_file(sb, root, "active_domains", &active_domain_ops);
+ oprofilefs_create_file(sb, root, "passive_domains", &passive_domain_ops);
oprofilefs_create_file(sb, root, "buffer", &event_buffer_fops);
oprofilefs_create_ulong(sb, root, "buffer_size", &fs_buffer_size);
oprofilefs_create_ulong(sb, root, "buffer_watershed", &fs_buffer_watershed);
diff -PNaur --exclude=.asm-ignore xen-unstable/linux-2.6.11-xen0/include/asm-xen/hypervisor.h xen-unstable-prof/linux-2.6.11-xen0/include/asm-xen/hypervisor.h
--- xen-unstable/linux-2.6.11-xen0/include/asm-xen/hypervisor.h 2005-05-22 20:12:45.000000000 -0700
+++ xen-unstable-prof/linux-2.6.11-xen0/include/asm-xen/hypervisor.h 2005-05-24 11:56:13.000000000 -0700
@@ -4,6 +4,10 @@
* Linux-specific hypervisor handling.
*
* Copyright (c) 2002-2004, K A Fraser
+ *
+ * Modified by Aravind Menon for supporting oprofile
+ * These modifications are:
+ * Copyright (C) 2005 Hewlett-Packard Co.
*
* This file may be distributed separately from the Linux kernel, or
* incorporated into other software packages, subject to the following license:
@@ -137,4 +141,20 @@
#include <asm/hypercall.h>
+static inline int
+HYPERVISOR_pmc_op(
+ int op, unsigned int arg1, unsigned int arg2)
+{
+ int ret;
+ unsigned long ign1, ign2, ign3;
+
+ __asm__ __volatile__ (
+ TRAP_INSTR
+ : "=a"(ret), "=b"(ign1), "=c"(ign2), "=d"(ign3)
+ : "0"(__HYPERVISOR_pmc_op), "1"(op), "2"(arg1), "3"(arg2)
+ : "memory" );
+
+ return ret;
+}
+
#endif /* __HYPERVISOR_H__ */
diff -PNaur --exclude=.asm-ignore xen-unstable/linux-2.6.11-xen0/include/asm-xen/xen-public/xen.h xen-unstable-prof/linux-2.6.11-xen0/include/asm-xen/xen-public/xen.h
--- xen-unstable/linux-2.6.11-xen0/include/asm-xen/xen-public/xen.h 2005-05-22 20:12:45.000000000 -0700
+++ xen-unstable-prof/linux-2.6.11-xen0/include/asm-xen/xen-public/xen.h 2005-05-24 11:54:47.000000000 -0700
@@ -4,6 +4,10 @@
* Guest OS interface to Xen.
*
* Copyright (c) 2004, K A Fraser
+ *
+ * Modified by Aravind Menon for supporting oprofile
+ * These modifications are:
+ * Copyright (C) 2005 Hewlett-Packard Co.
*/
#ifndef __XEN_PUBLIC_XEN_H__
@@ -58,6 +62,7 @@
#define __HYPERVISOR_boot_vcpu 24
#define __HYPERVISOR_set_segment_base 25 /* x86/64 only */
#define __HYPERVISOR_mmuext_op 26
+#define __HYPERVISOR_pmc_op 27
/*
* MULTICALLS
@@ -80,6 +85,7 @@
#define VIRQ_DOM_EXC 3 /* (DOM0) Exceptional event for some domain. */
#define VIRQ_PARITY_ERR 4 /* (DOM0) NMI parity error. */
#define VIRQ_IO_ERR 5 /* (DOM0) NMI I/O error. */
+#define VIRQ_PMC_OVF 6 /* PMC Overflow */
#define NR_VIRQS 7
/*
@@ -244,6 +250,21 @@
#define VMASST_TYPE_writable_pagetables 2
#define MAX_VMASST_TYPE 2
+/*
+ * Commands to HYPERVISOR_pmc_op().
+ */
+#define PMC_INIT 0
+#define PMC_SET_ACTIVE 1
+#define PMC_SET_PASSIVE 2
+#define PMC_RESERVE_COUNTERS 3
+#define PMC_SETUP_EVENTS 4
+#define PMC_ENABLE_VIRQ 5
+#define PMC_START 6
+#define PMC_STOP 7
+#define PMC_DISABLE_VIRQ 8
+#define PMC_RELEASE_COUNTERS 9
+#define PMC_SHUTDOWN 10
+
#ifndef __ASSEMBLY__
typedef u16 domid_t;
@@ -299,6 +320,8 @@
/* Support for multi-processor guests. */
#define MAX_VIRT_CPUS 32
+#define MAX_OPROF_EVENTS 32
+#define MAX_OPROF_DOMAINS 25
/*
* Per-VCPU information goes here. This will be cleaned up more when Xen
* actually supports multi-VCPU guests.
@@ -412,6 +435,20 @@
arch_shared_info_t arch;
+ /* Oprofile structures */
+ u8 event_head;
+ u8 event_tail;
+ struct {
+ u32 eip;
+ u8 mode;
+ u8 event;
+ } PACKED event_log[MAX_OPROF_EVENTS];
+ u8 losing_samples;
+ u64 samples_lost;
+ u32 nmi_restarts;
+ u64 active_samples;
+ u64 passive_samples;
+ u64 other_samples;
} PACKED shared_info_t;
/*
diff -PNaur --exclude=.asm-ignore xen-unstable/linux-2.6.11-xen0/include/linux/oprofile.h xen-unstable-prof/linux-2.6.11-xen0/include/linux/oprofile.h
--- xen-unstable/linux-2.6.11-xen0/include/linux/oprofile.h 2005-03-01 23:38:10.000000000 -0800
+++ xen-unstable-prof/linux-2.6.11-xen0/include/linux/oprofile.h 2005-05-24 11:56:13.000000000 -0700
@@ -8,6 +8,10 @@
* @remark Read the file COPYING
*
* @author John Levon <levon@movementarian.org>
+ *
+ * Modified by Aravind Menon for Xen
+ * These modifications are:
+ * Copyright (C) 2005 Hewlett-Packard Co.
*/
#ifndef OPROFILE_H
@@ -27,6 +31,10 @@
/* create any necessary configuration files in the oprofile fs.
* Optional. */
int (*create_files)(struct super_block * sb, struct dentry * root);
+ /* setup active domains with Xen */
+ int (*set_active)(int *active_domains, unsigned int adomains);
+ /* setup passive domains with Xen */
+ int (*set_passive)(int *passive_domains, unsigned int pdomains);
/* Do any necessary interrupt setup. Optional. */
int (*setup)(void);
/* Do any necessary interrupt shutdown. Optional. */
@@ -61,6 +69,15 @@
*/
void oprofile_add_sample(struct pt_regs * const regs, unsigned long event);
+/**
+ * alternative function to Add a sample for Xen.
+ * It would be better to combine both functions into only one but this would
+ * require getting parameter cpu_mode(old is_kernel) back to
+ * oprofile_add_sample() m(Xen is the best location to determine cpu_mode)
+ */
+extern void oprofile_add_sample_xen(unsigned long eip, unsigned int cpu_mode,
+ unsigned long event);
+
/* Use this instead when the PC value is not from the regs. Doesn't
* backtrace. */
void oprofile_add_pc(unsigned long pc, int is_kernel, unsigned long event);
[-- Attachment #5: xenoprof-1.1-oprofile-0.8.2.patch --]
[-- Type: application/octet-stream, Size: 15074 bytes --]
diff -PNaur oprofile-0.8.2/daemon/init.c oprofile-0.8.2-xen/daemon/init.c
--- oprofile-0.8.2/daemon/init.c 2004-01-29 12:00:26.000000000 -0800
+++ oprofile-0.8.2-xen/daemon/init.c 2005-04-08 13:42:06.000000000 -0700
@@ -7,6 +7,9 @@
*
* @author John Levon
* @author Philippe Elie
+ * Modified by Aravind Menon for Xen
+ * These modifications are:
+ * Copyright (C) 2005 Hewlett-Packard Co.
*/
#include "config.h"
@@ -221,6 +224,7 @@
size_t opd_buf_size;
opd_create_vmlinux(vmlinux, kernel_range);
+ opd_create_xen(xenimage, xen_range);
opd_buf_size = opd_read_fs_int("/dev/oprofile/", "buffer_size", 1);
kernel_pointer_size = opd_read_fs_int("/dev/oprofile/", "pointer_size", 1);
diff -PNaur oprofile-0.8.2/daemon/opd_interface.h oprofile-0.8.2-xen/daemon/opd_interface.h
--- oprofile-0.8.2/daemon/opd_interface.h 2004-01-17 18:21:13.000000000 -0800
+++ oprofile-0.8.2-xen/daemon/opd_interface.h 2005-04-08 13:42:06.000000000 -0700
@@ -8,6 +8,9 @@
*
* @author John Levon
* @author Philippe Elie
+ * Modified by Aravind Menon for Xen
+ * These modifications are:
+ * Copyright (C) 2005 Hewlett-Packard Co.
*/
#ifndef OPD_INTERFACE_H
@@ -17,11 +20,13 @@
#define CPU_SWITCH_CODE 2
#define COOKIE_SWITCH_CODE 3
#define KERNEL_ENTER_SWITCH_CODE 4
-#define KERNEL_EXIT_SWITCH_CODE 5
+#define USER_ENTER_SWITCH_CODE 5
#define MODULE_LOADED_CODE 6
#define CTX_TGID_CODE 7
#define TRACE_BEGIN_CODE 8
#define TRACE_END_CODE 9
-#define LAST_CODE 10
+#define XEN_ENTER_SWITCH_CODE 10
+#define DOMAIN_SWITCH_CODE 11
+#define LAST_CODE 12
#endif /* OPD_INTERFACE_H */
diff -PNaur oprofile-0.8.2/daemon/opd_kernel.c oprofile-0.8.2-xen/daemon/opd_kernel.c
--- oprofile-0.8.2/daemon/opd_kernel.c 2004-01-29 12:00:26.000000000 -0800
+++ oprofile-0.8.2-xen/daemon/opd_kernel.c 2005-05-23 15:07:01.000000000 -0700
@@ -7,6 +7,9 @@
*
* @author John Levon
* @author Philippe Elie
+ * Modified by Aravind Menon for Xen
+ * These modifications are:
+ * Copyright (C) 2005 Hewlett-Packard Co.
*/
#include "opd_kernel.h"
@@ -27,8 +30,12 @@
static LIST_HEAD(modules);
+static LIST_HEAD(other_domains);
+
static struct kernel_image vmlinux_image;
+static struct kernel_image xen_image;
+
void opd_create_vmlinux(char const * name, char const * arg)
{
/* vmlinux is *not* on the list of modules */
@@ -54,6 +61,43 @@
}
}
+void opd_create_xen(char const * name, char const * arg)
+{
+ /* xen is *not* on the list of modules */
+ list_init(&xen_image.list);
+
+ /* for no xen */
+ if (no_xen) {
+ xen_image.name = "no-xen";
+ return;
+ }
+
+ xen_image.name = xstrdup(name);
+
+ sscanf(arg, "%llx,%llx", &xen_image.start, &xen_image.end);
+
+ verbprintf(vmisc, "xen_start = %llx, xen_end = %llx\n",
+ xen_image.start, xen_image.end);
+
+ if (!xen_image.start && !xen_image.end) {
+ fprintf(stderr, "error: mis-parsed xen range: %llx-%llx\n",
+ xen_image.start, xen_image.end);
+ exit(EXIT_FAILURE);
+ }
+}
+
+static struct kernel_image *
+opd_create_domain(char const * name)
+{
+ struct kernel_image * image = xmalloc(sizeof(struct kernel_image));
+
+ image->name = xstrdup(name);
+ image->start = 0;
+ image->end = 0;
+ list_add(&image->list, &other_domains);
+
+ return image;
+}
/**
* Allocate and initialise a kernel image description
@@ -192,5 +236,21 @@
return image;
}
+ if (xen_image.start <= trans->pc && xen_image.end > trans->pc)
+ return &xen_image;
+
+ if (trans->other_domain != -1) {
+ char name[64];
+ list_for_each(pos, &other_domains) {
+ image = list_entry(pos, struct kernel_image, list);
+ if ( image->domain_id == trans->other_domain)
+ return image;
+ }
+ sprintf (name, "__domain_%d", (int)(trans->other_domain));
+ image = opd_create_domain(name);
+ image->domain_id = trans->other_domain;
+ return image;
+ }
+
return NULL;
}
diff -PNaur oprofile-0.8.2/daemon/opd_kernel.h oprofile-0.8.2-xen/daemon/opd_kernel.h
--- oprofile-0.8.2/daemon/opd_kernel.h 2003-09-24 14:21:14.000000000 -0700
+++ oprofile-0.8.2-xen/daemon/opd_kernel.h 2005-05-19 14:52:19.000000000 -0700
@@ -7,6 +7,9 @@
*
* @author John Levon
* @author Philippe Elie
+ * Modified by Aravind Menon for Xen
+ * These modifications are:
+ * Copyright (C) 2005 Hewlett-Packard Co.
*/
#ifndef OPD_KERNEL_H
@@ -20,6 +23,8 @@
/** create the kernel image */
void opd_create_vmlinux(char const * name, char const * arg);
+void opd_create_xen(char const * name, char const * arg);
+
/** opd_reread_module_info - parse /proc/modules for kernel modules */
void opd_reread_module_info(void);
@@ -28,6 +33,7 @@
char * name;
vma_t start;
vma_t end;
+ int domain_id;
struct list_head list;
};
diff -PNaur oprofile-0.8.2/daemon/opd_trans.c oprofile-0.8.2-xen/daemon/opd_trans.c
--- oprofile-0.8.2/daemon/opd_trans.c 2004-01-29 12:00:26.000000000 -0800
+++ oprofile-0.8.2-xen/daemon/opd_trans.c 2005-05-23 15:12:28.000000000 -0700
@@ -7,6 +7,9 @@
*
* @author John Levon
* @author Philippe Elie
+ * Modified by Aravind Menon for Xen
+ * These modifications are:
+ * Copyright (C) 2005 Hewlett-Packard Co.
*/
#include "opd_trans.h"
@@ -189,9 +192,9 @@
}
-static void code_kernel_exit(struct transient * trans)
+static void code_user_enter(struct transient * trans)
{
- verbprintf(vmisc, "KERNEL_EXIT_SWITCH to user-space\n");
+ verbprintf(vmisc, "USER_ENTER_SWITCH to user-space\n");
trans->in_kernel = 0;
trans->current = NULL;
}
@@ -218,6 +221,27 @@
trans->tracing = TRACING_OFF;
}
+static void code_xen_enter(struct transient *trans)
+{
+ verbprintf(vmisc, "XEN_ENTER_SWITCH to xen\n");
+ trans->in_kernel = 1;
+ trans->current = NULL;
+ /* subtlety: we must keep trans->cookie cached, even though it's
+ * meaningless for Xen - we won't necessarily get a cookie switch
+ * on Xen exit. See comments in opd_sfile.c. It seems that we can
+ * get away with in_kernel = 1 as long as we supply the correct
+ * Xen image, and its address range in startup find_kernel_image
+ * is modified to look in the Xen image also
+ */
+}
+
+static void code_domain_switch(struct transient * trans)
+{
+ /* We have to remember the old kernel value, so we do the safe thing: increment */
+ trans->in_kernel++;
+ trans->current = NULL;
+ trans->other_domain = pop_buffer_value(trans);
+}
typedef void (*handler_t)(struct transient *);
@@ -227,12 +251,14 @@
&code_cpu_switch,
&code_cookie_switch,
&code_kernel_enter,
- &code_kernel_exit,
+ &code_user_enter,
&code_module_loaded,
/* tgid handled differently */
&code_unknown,
&code_trace_begin,
&code_trace_end,
+ &code_xen_enter,
+ &code_domain_switch,
};
@@ -252,7 +278,8 @@
.in_kernel = -1,
.cpu = -1,
.tid = -1,
- .tgid = -1
+ .tgid = -1,
+ .other_domain = -1
};
/* FIXME: was uint64_t but it can't compile on alpha where uint64_t
@@ -266,6 +293,13 @@
if (!is_escape_code(code)) {
opd_put_sample(&trans, code);
+ if (trans.other_domain != -1) {
+ /* if last sample was for another domain,
+ restore trans fields */
+ trans.other_domain = -1;
+ trans.current = NULL;
+ trans.in_kernel--;
+ }
continue;
}
diff -PNaur oprofile-0.8.2/daemon/opd_trans.h oprofile-0.8.2-xen/daemon/opd_trans.h
--- oprofile-0.8.2/daemon/opd_trans.h 2004-01-17 18:21:13.000000000 -0800
+++ oprofile-0.8.2-xen/daemon/opd_trans.h 2005-05-19 16:48:59.000000000 -0700
@@ -44,6 +44,7 @@
unsigned long cpu;
pid_t tid;
pid_t tgid;
+ long long other_domain;
};
void opd_process_samples(char const * buffer, size_t count);
diff -PNaur oprofile-0.8.2/daemon/oprofiled.c oprofile-0.8.2-xen/daemon/oprofiled.c
--- oprofile-0.8.2/daemon/oprofiled.c 2004-05-28 10:21:39.000000000 -0700
+++ oprofile-0.8.2-xen/daemon/oprofiled.c 2005-05-19 17:08:46.000000000 -0700
@@ -7,6 +7,9 @@
*
* @author John Levon
* @author Philippe Elie
+ * Modified by Aravind Menon for Xen
+ * These modifications are:
+ * Copyright (C) 2005 Hewlett-Packard Co.
*/
#include "config.h"
@@ -60,6 +63,9 @@
int no_vmlinux;
char * vmlinux;
char * kernel_range;
+int no_xen;
+char * xenimage;
+char * xen_range;
static char * verbose;
static char * binary_name_filter;
static char * events;
@@ -75,6 +81,9 @@
{ "kernel-range", 'r', POPT_ARG_STRING, &kernel_range, 0, "Kernel VMA range", "start-end", },
{ "vmlinux", 'k', POPT_ARG_STRING, &vmlinux, 0, "vmlinux kernel image", "file", },
{ "no-vmlinux", 0, POPT_ARG_NONE, &no_vmlinux, 0, "vmlinux kernel image file not available", NULL, },
+ { "xen-range", 0, POPT_ARG_STRING, &xen_range, 0, "Xen VMA range", "start-end", },
+ { "xen-image", 0, POPT_ARG_STRING, &xenimage, 0, "Xen image", "file", },
+ { "no-xen", 0, POPT_ARG_NONE, &no_xen, 0, "xen image not available", NULL, },
{ "image", 0, POPT_ARG_STRING, &binary_name_filter, 0, "image name filter", "profile these comma separated image" },
{ "separate-lib", 0, POPT_ARG_INT, &separate_lib, 0, "separate library samples for each distinct application", "[0|1]", },
{ "separate-kernel", 0, POPT_ARG_INT, &separate_kernel, 0, "separate kernel samples for each distinct application", "[0|1]", },
@@ -402,6 +411,27 @@
}
}
+ if (!no_xen) {
+ if (!xenimage || !strcmp("", xenimage)) {
+ fprintf(stderr, "oprofiled: no xen image specified.\n");
+ poptPrintHelp(optcon, stderr, 0);
+ exit(EXIT_FAILURE);
+ }
+
+ /* canonicalise xen image filename. */
+ tmp = xmalloc(PATH_MAX);
+ if (realpath(xenimage, tmp))
+ xenimage = tmp;
+ else
+ free(tmp);
+
+ if (!xen_range || !strcmp("", xen_range)) {
+ fprintf(stderr, "oprofiled: no Xen VMA range specified.\n");
+ poptPrintHelp(optcon, stderr, 0);
+ exit(EXIT_FAILURE);
+ }
+ }
+
opd_parse_events(events);
opd_parse_image_filter();
diff -PNaur oprofile-0.8.2/daemon/oprofiled.h oprofile-0.8.2-xen/daemon/oprofiled.h
--- oprofile-0.8.2/daemon/oprofiled.h 2004-01-29 12:00:26.000000000 -0800
+++ oprofile-0.8.2-xen/daemon/oprofiled.h 2005-04-08 13:42:06.000000000 -0700
@@ -7,6 +7,9 @@
*
* @author John Levon
* @author Philippe Elie
+ * Modified by Aravind Menon for Xen
+ * These modifications are:
+ * Copyright (C) 2005 Hewlett-Packard Co.
*/
#ifndef OPROFILED_H
@@ -68,5 +71,8 @@
extern int no_vmlinux;
extern char * vmlinux;
extern char * kernel_range;
+extern int no_xen;
+extern char * xenimage;
+extern char * xen_range;
#endif /* OPROFILED_H */
diff -PNaur oprofile-0.8.2/utils/opcontrol oprofile-0.8.2-xen/utils/opcontrol
--- oprofile-0.8.2/utils/opcontrol 2005-03-18 12:44:46.000000000 -0800
+++ oprofile-0.8.2-xen/utils/opcontrol 2005-05-20 17:06:51.000000000 -0700
@@ -245,6 +245,7 @@
CPU_BUF_SIZE=0
NOTE_SIZE=0
VMLINUX=
+ XENIMAGE="none"
VERBOSE=""
SEPARATE_LIB=0
SEPARATE_KERNEL=0
@@ -325,6 +326,9 @@
echo "NOTE_SIZE=$NOTE_SIZE" >> $SETUP_FILE
fi
echo "CALLGRAPH=$CALLGRAPH" >> $SETUP_FILE
+ echo "XENIMAGE=$XENIMAGE" >> $SETUP_FILE
+ echo "ACTIVE_DOMAINS=$ACTIVE_DOMAINS" >> $SETUP_FILE
+ echo "PASSIVE_DOMAINS=$PASSIVE_DOMAINS" >> $SETUP_FILE
}
@@ -383,39 +387,64 @@
echo "The specified vmlinux file \"$VMLINUX\" doesn't exist." >&2
exit 1
+
+# similar check for Xen image
+ if test -f "$XENIMAGE"; then
+ return
+ fi
+
+ if test "$XENIMAGE" = "none"; then
+ return
+ fi
+
+ echo "The spcefified XenImage file \"$XENIMAGE\" does not exist." >&2
+ exit 1
}
-# get start and end points of the kernel
-get_kernel_range()
+# get start and end points of a file image (linux kernel or xen)
+# get_image_range parameter: $1=type_of_image (xen or linux)
+get_image_range()
{
- if test ! -z "$KERNEL_RANGE"; then
- return;
+ if test "$1" = "xen"; then
+ if test ! -z "XEN_RANGE"; then
+ return;
+ fi
+ FILE_IMAGE="$XENIMAGE"
+ else
+ if test ! -z "$KERNEL_RANGE"; then
+ return;
+ fi
+ FILE_IMAGE="$VMLINUX"
fi
- if test "$VMLINUX" = "none"; then
+ if test "$FILE_IMAGE" = "none"; then
return;
fi
# start from section 0 and then continue till end of .text
- range_info=`objdump -h $VMLINUX 2>/dev/null | grep -w "0 "`
+ range_info=`objdump -h $FILE_IMAGE 2>/dev/null | grep -w "0 "`
tmp1=`echo $range_info | awk '{print $4}'`
- range_info=`objdump -h $VMLINUX 2>/dev/null | grep " .text "`
+ range_info=`objdump -h $FILE_IMAGE 2>/dev/null | grep " .text "`
tmp_length=`echo $range_info | awk '{print $3}'`
- tmp2=`objdump -h $VMLINUX --adjust-vma=0x$tmp_length 2>/dev/null | grep " .text " | awk '{print $4}'`
+ tmp2=`objdump -h $FILE_IMAGE --adjust-vma=0x$tmp_length 2>/dev/null | grep " .text " | awk '{print $4}'`
if test -z "$tmp1" -o -z "$tmp2"; then
- echo "The specified file $VMLINUX does not seem to be valid" >&2
- echo "Make sure you are using vmlinux not vmlinuz" >&2
+ echo "The specified file $FILE_IMAGE does not seem to be valid" >&2
+ echo "Make sure you are using the non-compressed image file (e.g. vmlinux not vmlinuz)" >&2
vecho "found start as \"$tmp1\", end as \"$tmp2\"" >&2
exit 1
fi
- KERNEL_RANGE="`echo $tmp1`,`echo $tmp2`"
- vecho "KERNEL_RANGE $KERNEL_RANGE"
+ if test "$1" = "xen"; then
+ XEN_RANGE="`echo $tmp1`,`echo $tmp2`"
+ vecho "XEN_RANGE $XEN_RANGE"
+ else
+ KERNEL_RANGE="`echo $tmp1`,`echo $tmp2`"
+ vecho "KERNEL_RANGE $KERNEL_RANGE"
+ fi
}
-
# validate --separate= parameters. This function is called with IFS=,
# so on each argument is splitted
validate_separate_args()
@@ -688,7 +717,7 @@
VMLINUX=$val
DO_SETUP=yes
# check validity
- get_kernel_range
+ get_image_range "linux"
;;
--no-vmlinux)
VMLINUX=none
@@ -699,6 +728,22 @@
KERNEL_RANGE=$val
DO_SETUP=yes
;;
+ --xen)
+ error_if_empty $arg $val
+ XENIMAGE=$val
+ DO_SETUP=yes
+ get_image_range "xen"
+ ;;
+ --active-domains)
+ error_if_empty $arg $val
+ ACTIVE_DOMAINS=$val
+ DO_SETUP=yes
+ ;;
+ --passive-domains)
+ error_if_empty $arg $val
+ PASSIVE_DOMAINS=$val
+ DO_SETUP=yes
+ ;;
--note-table-size)
error_if_empty $arg $val
if test $"KERNEL_SUPPORT" = "yes"; then
@@ -970,6 +1015,22 @@
fi
fi
+ if test -n "$ACTIVE_DOMAINS"; then
+ if test "$KERNEL_SUPPORT" = "yes"; then
+ echo $ACTIVE_DOMAINS >$MOUNT/active_domains
+ else
+ echo "active-domains not supported - ignored" >&2
+ fi
+ fi
+
+ if test -n "$PASSIVE_DOMAINS"; then
+ if test "$KERNEL_SUPPORT" = "yes"; then
+ echo $PASSIVE_DOMAINS >$MOUNT/passive_domains
+ else
+ echo "passive-domains not supported - ignored" >&2
+ fi
+ fi
+
if test $NOTE_SIZE != 0; then
set_param notesize $NOTE_SIZE
fi
@@ -1063,7 +1124,7 @@
do_setup
do_load_setup
check_valid_args
- get_kernel_range
+ get_image_range "linux"
do_param_setup
OPD_ARGS=" \
@@ -1080,6 +1141,12 @@
OPD_ARGS="$OPD_ARGS --vmlinux=$VMLINUX --kernel-range=$KERNEL_RANGE"
fi
+ if test "$XENIMAGE" = "none"; then
+ OPD_ARGS="$OPD_ARGS --no-xen"
+ else
+ OPD_ARGS="$OPD_ARGS --xen-image=$XENIMAGE --xen-range=$XEN_RANGE"
+ fi
+
if ! test -z "$IMAGE_FILTER"; then
OPD_ARGS="$OPD_ARGS --image=$IMAGE_FILTER"
fi
[-- Attachment #6: Type: text/plain, Size: 138 bytes --]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] Xenoprof patches for xen-unstable
2005-05-25 19:10 Santos, Jose Renato G
@ 2005-05-25 19:22 ` Nivedita Singhvi
2005-05-31 16:08 ` Andrew Theurer
1 sibling, 0 replies; 12+ messages in thread
From: Nivedita Singhvi @ 2005-05-25 19:22 UTC (permalink / raw)
To: Santos, Jose Renato G; +Cc: Xen-devel
Santos, Jose Renato G wrote:
> hi,
>
> I have attached patches for enabling system wide profiling
> using oprofile for xen unstable.
> The patches were generated against change-set 1.1507 (May 22).
> The 4 attached files are
Jose, thanks very much for this! I really needed it...
> 1) xenoprof.txt:
> - xenoprof overview and user guide
> 2) xenoprof-1.1-xen-3.0-devel.patch:
> - patch for xen
> 3) xenoprof-1.1-linux-2.6.11:
> - patch for linux. Note that this needs to be applied
> twice, once to linux-2.6.11-xen0 and once to
> linux-2.6.11-xenU. (This is different than the last
> patch which was created against the linux sparse tree).
> 4) xenoprof-1.1-oprofile-0.8.2:
> - patch for oprofile version 0.8.2
>
> Current known limitation/bugs are
> - No support for SMP guests yet.
> - when using passive domains, most samples are lost.
Excuse my ignorance - what exactly is a passive domain?
> I will be working on these issues and post new patches
> when I have them available.
thanks!
Nivedita
^ permalink raw reply [flat|nested] 12+ messages in thread
* RE: [PATCH] Xenoprof patches for xen-unstable
@ 2005-05-25 20:35 Santos, Jose Renato G
2005-05-25 21:22 ` Nivedita Singhvi
0 siblings, 1 reply; 12+ messages in thread
From: Santos, Jose Renato G @ 2005-05-25 20:35 UTC (permalink / raw)
To: Nivedita Singhvi; +Cc: Xen-devel
>> -----Original Message-----
>> From: Nivedita Singhvi [mailto:niv@us.ibm.com]
>> Sent: Wednesday, May 25, 2005 12:23 PM
>> To: Santos, Jose Renato G
>> Cc: Xen-devel@lists.xensource.com
>> Subject: Re: [Xen-devel] [PATCH] Xenoprof patches for xen-unstable
>>
>>
>> Santos, Jose Renato G wrote:
>>
>> > hi,
>> >
>> > I have attached patches for enabling system wide profiling
>> > using oprofile for xen unstable.
>> > The patches were generated against change-set 1.1507 (May 22).
>> > The 4 attached files are
>>
>> Jose, thanks very much for this! I really needed it...
>>
You are welcome. Please keep me posted on how you are
using it and any insights you may get using xenoprof
>> > 1) xenoprof.txt:
>> > - xenoprof overview and user guide
>> > 2) xenoprof-1.1-xen-3.0-devel.patch:
>> > - patch for xen
>> > 3) xenoprof-1.1-linux-2.6.11:
>> > - patch for linux. Note that this needs to be applied
>> > twice, once to linux-2.6.11-xen0 and once to
>> > linux-2.6.11-xenU. (This is different than the last
>> > patch which was created against the linux sparse tree).
>> > 4) xenoprof-1.1-oprofile-0.8.2:
>> > - patch for oprofile version 0.8.2
>> >
>> > Current known limitation/bugs are
>> > - No support for SMP guests yet.
>> > - when using passive domains, most samples are lost.
>>
>> Excuse my ignorance - what exactly is a passive domain?
>>
Sorry for using the term without explanation.
We call passive domains domains that are
profiled but do not have support for collecting PC samples
(i.e they do not have oprofile running).
In this case their samples are sent to the initiator
domain (usually domain 0) which process the samples on
their behalf. In this case the samples
are not decoded to specific binary files/functions
but are assigned to the whole domain (coarse granularity
profiling). We are planning to enable fine granularity
profiling for the kernel code on passive domains in
the future. However, as I mentioned, xenoprof is not working
properly with passive domains yet.
In the normal mode of operation that is functional
at this time, every domain being profiled must be running
an oprofile kernel module to process the PC samples
generated when they are running. Each domain will
have a partial report of the samples that happened
when they were running.
System wide profiling is obtained by combining the
reports of all profiled domains.
I hope this is clearer now.
Renato
>> > I will be working on these issues and post new patches
>> > when I have them available.
>>
>> thanks!
>>
>> Nivedita
>>
>>
>>
>>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] Xenoprof patches for xen-unstable
2005-05-25 20:35 Santos, Jose Renato G
@ 2005-05-25 21:22 ` Nivedita Singhvi
0 siblings, 0 replies; 12+ messages in thread
From: Nivedita Singhvi @ 2005-05-25 21:22 UTC (permalink / raw)
To: Santos, Jose Renato G; +Cc: Xen-devel
Santos, Jose Renato G wrote:
> You are welcome. Please keep me posted on how you are
> using it and any insights you may get using xenoprof
Absolutely!
>>>> Current known limitation/bugs are
>>>> - No support for SMP guests yet.
>>>> - when using passive domains, most samples are lost.
Thanks for the explanation below. Speaking for myself, neither
of the above will be an issue, since we can run oprofile
on all the kernels and are not yet running SMP guests.
Many thanks!
Nivedita
>>>Excuse my ignorance - what exactly is a passive domain?
>>>
>
> Sorry for using the term without explanation.
> We call passive domains domains that are
> profiled but do not have support for collecting PC samples
> (i.e they do not have oprofile running).
> In this case their samples are sent to the initiator
> domain (usually domain 0) which process the samples on
> their behalf. In this case the samples
> are not decoded to specific binary files/functions
> but are assigned to the whole domain (coarse granularity
> profiling). We are planning to enable fine granularity
> profiling for the kernel code on passive domains in
> the future. However, as I mentioned, xenoprof is not working
> properly with passive domains yet.
>
> In the normal mode of operation that is functional
> at this time, every domain being profiled must be running
> an oprofile kernel module to process the PC samples
> generated when they are running. Each domain will
> have a partial report of the samples that happened
> when they were running.
> System wide profiling is obtained by combining the
> reports of all profiled domains.
>
> I hope this is clearer now.
>
> Renato
>
>
>>>> I will be working on these issues and post new patches
>>>> when I have them available.
>>>
>>>thanks!
>>>
>>>Nivedita
>>>
>>>
>>>
>>>
>
>
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] Xenoprof patches for xen-unstable
2005-05-25 19:10 Santos, Jose Renato G
2005-05-25 19:22 ` Nivedita Singhvi
@ 2005-05-31 16:08 ` Andrew Theurer
1 sibling, 0 replies; 12+ messages in thread
From: Andrew Theurer @ 2005-05-31 16:08 UTC (permalink / raw)
To: Santos, Jose Renato G; +Cc: Xen-devel
On Wednesday 25 May 2005 14:10, Santos, Jose Renato G wrote:
> hi,
>
> I have attached patches for enabling system wide profiling
> using oprofile for xen unstable.
> The patches were generated against change-set 1.1507 (May 22).
> The 4 attached files are
>
> 1) xenoprof.txt:
> - xenoprof overview and user guide
> 2) xenoprof-1.1-xen-3.0-devel.patch:
> - patch for xen
> 3) xenoprof-1.1-linux-2.6.11:
> - patch for linux. Note that this needs to be applied
> twice, once to linux-2.6.11-xen0 and once to
> linux-2.6.11-xenU. (This is different than the last
> patch which was created against the linux sparse tree).
> 4) xenoprof-1.1-oprofile-0.8.2:
> - patch for oprofile version 0.8.2
Thanks very much for these; this is going to be extremly helpful. I am
using these on xen-unstable-bk-1.1518 currently. One problem: so far I
have not observed any ticks in xen-syms. I have tried SDET benchmark,
which on your previous patches (for xen-2.0-testing), I would get about
12% of ticks in xen-syms.
This is on a single cpu xen0 domain with no other domains running. I
verified that the XENIMAGE and XEN_RANGE were getting passed to
oprofiled correctly. I do not specify any active or passive domains
since this is the only domain running. Any ideas why I would not get
any ticks for xen-syms?
Has anyone else tried xenoprofile?
Thanks,
-Andrew
^ permalink raw reply [flat|nested] 12+ messages in thread
* RE: [PATCH] Xenoprof patches for xen-unstable
@ 2005-05-31 16:35 Santos, Jose Renato G
2005-05-31 17:41 ` Andrew Theurer
0 siblings, 1 reply; 12+ messages in thread
From: Santos, Jose Renato G @ 2005-05-31 16:35 UTC (permalink / raw)
To: Andrew Theurer; +Cc: Xen-devel
Andrew
This is weird. Something seems wrong.
I am not familiar with the benchmark you are
running. Is this something easy to try?
If you could send me the code and some instructions on
how to use it, I can try running the same benchmark
in my environment.
I will also spend sometime looking more carefully
at xenoprof and my tests to see if I find anything wrong
Renato
>> -----Original Message-----
>> From: Andrew Theurer [mailto:habanero@us.ibm.com]
>> Sent: Tuesday, May 31, 2005 9:08 AM
>> To: Santos, Jose Renato G
>> Cc: Xen-devel@lists.xensource.com
>> Subject: Re: [Xen-devel] [PATCH] Xenoprof patches for xen-unstable
>>
>>
>> On Wednesday 25 May 2005 14:10, Santos, Jose Renato G wrote:
>> > hi,
>> >
>> > I have attached patches for enabling system wide profiling
>> > using oprofile for xen unstable.
>> > The patches were generated against change-set 1.1507 (May 22).
>> > The 4 attached files are
>> >
>> > 1) xenoprof.txt:
>> > - xenoprof overview and user guide
>> > 2) xenoprof-1.1-xen-3.0-devel.patch:
>> > - patch for xen
>> > 3) xenoprof-1.1-linux-2.6.11:
>> > - patch for linux. Note that this needs to be applied
>> > twice, once to linux-2.6.11-xen0 and once to
>> > linux-2.6.11-xenU. (This is different than the last
>> > patch which was created against the linux sparse tree).
>> > 4) xenoprof-1.1-oprofile-0.8.2:
>> > - patch for oprofile version 0.8.2
>>
>> Thanks very much for these; this is going to be extremly
>> helpful. I am
>> using these on xen-unstable-bk-1.1518 currently. One
>> problem: so far I
>> have not observed any ticks in xen-syms. I have tried SDET
>> benchmark,
>> which on your previous patches (for xen-2.0-testing), I
>> would get about
>> 12% of ticks in xen-syms.
>>
>> This is on a single cpu xen0 domain with no other domains
>> running. I
>> verified that the XENIMAGE and XEN_RANGE were getting passed to
>> oprofiled correctly. I do not specify any active or passive domains
>> since this is the only domain running. Any ideas why I
>> would not get
>> any ticks for xen-syms?
>>
>> Has anyone else tried xenoprofile?
>>
>> Thanks,
>>
>> -Andrew
>>
^ permalink raw reply [flat|nested] 12+ messages in thread
* RE: [PATCH] Xenoprof patches for xen-unstable
@ 2005-05-31 16:37 Santos, Jose Renato G
2005-05-31 17:37 ` Andrew Theurer
0 siblings, 1 reply; 12+ messages in thread
From: Santos, Jose Renato G @ 2005-05-31 16:37 UTC (permalink / raw)
To: Santos, Jose Renato G, Andrew Theurer; +Cc: Xen-devel
Andrew,
Could you please send me the commands you are using
to run oprofile?
Thanks
Renato
>> -----Original Message-----
>> From: Santos, Jose Renato G
>> Sent: Tuesday, May 31, 2005 9:35 AM
>> To: 'Andrew Theurer'
>> Cc: Xen-devel@lists.xensource.com
>> Subject: RE: [Xen-devel] [PATCH] Xenoprof patches for xen-unstable
>>
>>
>>
>> Andrew
>>
>> This is weird. Something seems wrong.
>> I am not familiar with the benchmark you are
>> running. Is this something easy to try?
>> If you could send me the code and some instructions on
>> how to use it, I can try running the same benchmark
>> in my environment.
>> I will also spend sometime looking more carefully
>> at xenoprof and my tests to see if I find anything wrong
>>
>> Renato
>>
>> >> -----Original Message-----
>> >> From: Andrew Theurer [mailto:habanero@us.ibm.com]
>> >> Sent: Tuesday, May 31, 2005 9:08 AM
>> >> To: Santos, Jose Renato G
>> >> Cc: Xen-devel@lists.xensource.com
>> >> Subject: Re: [Xen-devel] [PATCH] Xenoprof patches for xen-unstable
>> >>
>> >>
>> >> On Wednesday 25 May 2005 14:10, Santos, Jose Renato G wrote:
>> >> > hi,
>> >> >
>> >> > I have attached patches for enabling system wide profiling
>> >> > using oprofile for xen unstable.
>> >> > The patches were generated against change-set 1.1507 (May 22).
>> >> > The 4 attached files are
>> >> >
>> >> > 1) xenoprof.txt:
>> >> > - xenoprof overview and user guide
>> >> > 2) xenoprof-1.1-xen-3.0-devel.patch:
>> >> > - patch for xen
>> >> > 3) xenoprof-1.1-linux-2.6.11:
>> >> > - patch for linux. Note that this needs to be applied
>> >> > twice, once to linux-2.6.11-xen0 and once to
>> >> > linux-2.6.11-xenU. (This is different than the last
>> >> > patch which was created against the linux sparse tree).
>> >> > 4) xenoprof-1.1-oprofile-0.8.2:
>> >> > - patch for oprofile version 0.8.2
>> >>
>> >> Thanks very much for these; this is going to be extremly
>> >> helpful. I am
>> >> using these on xen-unstable-bk-1.1518 currently. One
>> >> problem: so far I
>> >> have not observed any ticks in xen-syms. I have tried SDET
>> >> benchmark,
>> >> which on your previous patches (for xen-2.0-testing), I
>> >> would get about
>> >> 12% of ticks in xen-syms.
>> >>
>> >> This is on a single cpu xen0 domain with no other domains
>> >> running. I
>> >> verified that the XENIMAGE and XEN_RANGE were getting passed to
>> >> oprofiled correctly. I do not specify any active or
>> passive domains
>> >> since this is the only domain running. Any ideas why I
>> >> would not get
>> >> any ticks for xen-syms?
>> >>
>> >> Has anyone else tried xenoprofile?
>> >>
>> >> Thanks,
>> >>
>> >> -Andrew
>> >>
>>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] Xenoprof patches for xen-unstable
2005-05-31 16:37 [PATCH] Xenoprof patches for xen-unstable Santos, Jose Renato G
@ 2005-05-31 17:37 ` Andrew Theurer
0 siblings, 0 replies; 12+ messages in thread
From: Andrew Theurer @ 2005-05-31 17:37 UTC (permalink / raw)
To: Santos, Jose Renato G; +Cc: Xen-devel
Santos, Jose Renato G wrote:
> Andrew,
>
> Could you please send me the commands you are using
> to run oprofile?
>
>
Sure, I use:
opcontrol --vmlinux=/boot/vmlinux-`uname -r` --xen=/boot/xen-unstable-syms
opcontrol --init
opcontrol --start
<benchmark>
opcontrol --stop
opreport -l
**I changed opcontrol to echo XENIMAGE and XEN_RANGE to
~/.oprofile/daemonrc so they are picked up by oprofiled.
oprofiled cmd line looks like:
oprofiled --separate-lib=0 --separate-kernel=0 --spearate-thread=0
--separate-thread=0 --spearate-cpu=0
--events=GLOBAL_POWER_EVENTS:29:0:100000:1:1:1,
--vmlinux=/boot/2.6.11-xen0-up --kernel-range=c01000000,c03ed2ae
--xen-image=/boot/xen-unstable-syms --xen-range=c01000000,c03fc45e
> Thanks
>
> Renato
>
>
>
>>>-----Original Message-----
>>>From: Santos, Jose Renato G
>>>Sent: Tuesday, May 31, 2005 9:35 AM
>>>To: 'Andrew Theurer'
>>>Cc: Xen-devel@lists.xensource.com
>>>Subject: RE: [Xen-devel] [PATCH] Xenoprof patches for xen-unstable
>>>
>>>
>>>
>>> Andrew
>>>
>>> This is weird. Something seems wrong.
>>> I am not familiar with the benchmark you are
>>> running. Is this something easy to try?
>>> If you could send me the code and some instructions on
>>> how to use it, I can try running the same benchmark
>>> in my environment.
>>> I will also spend sometime looking more carefully
>>> at xenoprof and my tests to see if I find anything wrong
>>>
>>> Renato
>>>
>>>
>>>
>>>>>-----Original Message-----
>>>>>From: Andrew Theurer [mailto:habanero@us.ibm.com]
>>>>>Sent: Tuesday, May 31, 2005 9:08 AM
>>>>>To: Santos, Jose Renato G
>>>>>Cc: Xen-devel@lists.xensource.com
>>>>>Subject: Re: [Xen-devel] [PATCH] Xenoprof patches for xen-unstable
>>>>>
>>>>>
>>>>>On Wednesday 25 May 2005 14:10, Santos, Jose Renato G wrote:
>>>>>
>>>>>
>>>>>> hi,
>>>>>>
>>>>>> I have attached patches for enabling system wide profiling
>>>>>> using oprofile for xen unstable.
>>>>>> The patches were generated against change-set 1.1507 (May 22).
>>>>>> The 4 attached files are
>>>>>>
>>>>>> 1) xenoprof.txt:
>>>>>> - xenoprof overview and user guide
>>>>>> 2) xenoprof-1.1-xen-3.0-devel.patch:
>>>>>> - patch for xen
>>>>>> 3) xenoprof-1.1-linux-2.6.11:
>>>>>> - patch for linux. Note that this needs to be applied
>>>>>> twice, once to linux-2.6.11-xen0 and once to
>>>>>> linux-2.6.11-xenU. (This is different than the last
>>>>>> patch which was created against the linux sparse tree).
>>>>>> 4) xenoprof-1.1-oprofile-0.8.2:
>>>>>> - patch for oprofile version 0.8.2
>>>>>>
>>>>>>
>>>>>Thanks very much for these; this is going to be extremly
>>>>>helpful. I am
>>>>>using these on xen-unstable-bk-1.1518 currently. One
>>>>>problem: so far I
>>>>>have not observed any ticks in xen-syms. I have tried SDET
>>>>>benchmark,
>>>>>which on your previous patches (for xen-2.0-testing), I
>>>>>would get about
>>>>>12% of ticks in xen-syms.
>>>>>
>>>>>This is on a single cpu xen0 domain with no other domains
>>>>>running. I
>>>>>verified that the XENIMAGE and XEN_RANGE were getting passed to
>>>>>oprofiled correctly. I do not specify any active or
>>>>>
>>>>>
>>>passive domains
>>>
>>>
>>>>>since this is the only domain running. Any ideas why I
>>>>>would not get
>>>>>any ticks for xen-syms?
>>>>>
>>>>>Has anyone else tried xenoprofile?
>>>>>
>>>>>Thanks,
>>>>>
>>>>>-Andrew
>>>>>
>>>>>
>>>>>
>
>_______________________________________________
>Xen-devel mailing list
>Xen-devel@lists.xensource.com
>http://lists.xensource.com/xen-devel
>
>
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] Xenoprof patches for xen-unstable
2005-05-31 16:35 Santos, Jose Renato G
@ 2005-05-31 17:41 ` Andrew Theurer
0 siblings, 0 replies; 12+ messages in thread
From: Andrew Theurer @ 2005-05-31 17:41 UTC (permalink / raw)
To: Santos, Jose Renato G; +Cc: Xen-devel
Santos, Jose Renato G wrote:
> Andrew
>
> This is weird. Something seems wrong.
> I am not familiar with the benchmark you are
> running. Is this something easy to try?
> If you could send me the code and some instructions on
> how to use it, I can try running the same benchmark
> in my environment.
> I will also spend sometime looking more carefully
> at xenoprof and my tests to see if I find anything wrong
>
>
Although I can not redistribute SDET, it probably is available to you
since you are at HP:
http://www.spec.org/osg/sdm91/
It's actually quite a useful benchmark as it has a lot of fork+exec and
you can really stress a system with it.
-Andrew
> Renato
>
>
>
>>>-----Original Message-----
>>>From: Andrew Theurer [mailto:habanero@us.ibm.com]
>>>Sent: Tuesday, May 31, 2005 9:08 AM
>>>To: Santos, Jose Renato G
>>>Cc: Xen-devel@lists.xensource.com
>>>Subject: Re: [Xen-devel] [PATCH] Xenoprof patches for xen-unstable
>>>
>>>
>>>On Wednesday 25 May 2005 14:10, Santos, Jose Renato G wrote:
>>>
>>>
>>>> hi,
>>>>
>>>> I have attached patches for enabling system wide profiling
>>>> using oprofile for xen unstable.
>>>> The patches were generated against change-set 1.1507 (May 22).
>>>> The 4 attached files are
>>>>
>>>> 1) xenoprof.txt:
>>>> - xenoprof overview and user guide
>>>> 2) xenoprof-1.1-xen-3.0-devel.patch:
>>>> - patch for xen
>>>> 3) xenoprof-1.1-linux-2.6.11:
>>>> - patch for linux. Note that this needs to be applied
>>>> twice, once to linux-2.6.11-xen0 and once to
>>>> linux-2.6.11-xenU. (This is different than the last
>>>> patch which was created against the linux sparse tree).
>>>> 4) xenoprof-1.1-oprofile-0.8.2:
>>>> - patch for oprofile version 0.8.2
>>>>
>>>>
>>>Thanks very much for these; this is going to be extremly
>>>helpful. I am
>>>using these on xen-unstable-bk-1.1518 currently. One
>>>problem: so far I
>>>have not observed any ticks in xen-syms. I have tried SDET
>>>benchmark,
>>>which on your previous patches (for xen-2.0-testing), I
>>>would get about
>>>12% of ticks in xen-syms.
>>>
>>>This is on a single cpu xen0 domain with no other domains
>>>running. I
>>>verified that the XENIMAGE and XEN_RANGE were getting passed to
>>>oprofiled correctly. I do not specify any active or passive domains
>>>since this is the only domain running. Any ideas why I
>>>would not get
>>>any ticks for xen-syms?
>>>
>>>Has anyone else tried xenoprofile?
>>>
>>>Thanks,
>>>
>>>-Andrew
>>>
>>>
>>>
>
>_______________________________________________
>Xen-devel mailing list
>Xen-devel@lists.xensource.com
>http://lists.xensource.com/xen-devel
>
>
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* RE: [PATCH] Xenoprof patches for xen-unstable
@ 2005-05-31 18:22 Santos, Jose Renato G
2005-05-31 18:35 ` Andrew Theurer
0 siblings, 1 reply; 12+ messages in thread
From: Santos, Jose Renato G @ 2005-05-31 18:22 UTC (permalink / raw)
To: Andrew Theurer; +Cc: Xen-devel
>> -----Original Message-----
>> From: Andrew Theurer [mailto:habanero@us.ibm.com]
>> Sent: Tuesday, May 31, 2005 10:38 AM
>> To: Santos, Jose Renato G
>> Cc: Xen-devel@lists.xensource.com
>> Subject: Re: [Xen-devel] [PATCH] Xenoprof patches for xen-unstable
>>
>>
>> Santos, Jose Renato G wrote:
>>
>> > Andrew,
>> >
>> > Could you please send me the commands you are using
>> > to run oprofile?
>> >
>> >
>>
>> Sure, I use:
>>
>> opcontrol --vmlinux=/boot/vmlinux-`uname -r`
>> --xen=/boot/xen-unstable-syms opcontrol --init
>>
>> opcontrol --start
>>
>> <benchmark>
>>
>> opcontrol --stop
>>
>> opreport -l
>>
>>
>> **I changed opcontrol to echo XENIMAGE and XEN_RANGE to
>> ~/.oprofile/daemonrc so they are picked up by oprofiled.
>>
>> oprofiled cmd line looks like:
>>
>> oprofiled --separate-lib=0 --separate-kernel=0 --spearate-thread=0
>> --separate-thread=0 --spearate-cpu=0
>> --events=GLOBAL_POWER_EVENTS:29:0:100000:1:1:1,
>> --vmlinux=/boot/2.6.11-xen0-up --kernel-range=c01000000,c03ed2ae
>> --xen-image=/boot/xen-unstable-syms --xen-range=c01000000,c03fc45e
>>
Andrew,
I think your changes to opcontrol are not doing the
right thing. Your --xen-range option to oprofiled
is not correct. It overlaps with the kernel range.
Look at my version of the oprofiled cmd line:
(see --xen-range values).
oprofiled --separate-lib=0 --separate-kernel=0
--separate-thread=0 --separate-cpu=0
--events=GLOBAL_POWER_EVENTS:29:0:100000:1:1:1,
--vmlinux=/boot/vmlinux-syms-2.6.11.10-xen0
--kernel-range=c0100000,c0422e31
--xen-image=/boot/xen-3.0-devel-syms
--xen-range=ff100000,ff13c567
That is probably the reason you are not seeing
any Xen samples
Renato
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] Xenoprof patches for xen-unstable
2005-05-31 18:22 Santos, Jose Renato G
@ 2005-05-31 18:35 ` Andrew Theurer
0 siblings, 0 replies; 12+ messages in thread
From: Andrew Theurer @ 2005-05-31 18:35 UTC (permalink / raw)
To: Santos, Jose Renato G; +Cc: Xen-devel
> >> **I changed opcontrol to echo XENIMAGE and XEN_RANGE to
> >> ~/.oprofile/daemonrc so they are picked up by oprofiled.
> >>
> >> oprofiled cmd line looks like:
> >>
> >> oprofiled --separate-lib=0 --separate-kernel=0 --spearate-thread=0
> >> --separate-thread=0 --spearate-cpu=0
> >> --events=GLOBAL_POWER_EVENTS:29:0:100000:1:1:1,
> >> --vmlinux=/boot/2.6.11-xen0-up --kernel-range=c01000000,c03ed2ae
> >> --xen-image=/boot/xen-unstable-syms --xen-range=c01000000,c03fc45e
>
> Andrew,
>
> I think your changes to opcontrol are not doing the
> right thing. Your --xen-range option to oprofiled
> is not correct. It overlaps with the kernel range.
> Look at my version of the oprofiled cmd line:
> (see --xen-range values).
Renato, it looks like I did not update my syms file. Thanks for
steering me the right way. It works perfectly now!
-Andrew
^ permalink raw reply [flat|nested] 12+ messages in thread
* RE: [PATCH] Xenoprof patches for xen-unstable
@ 2005-06-04 7:20 Santos, Jose Renato G
0 siblings, 0 replies; 12+ messages in thread
From: Santos, Jose Renato G @ 2005-06-04 7:20 UTC (permalink / raw)
To: Xen-devel
[-- Attachment #1: Type: text/plain, Size: 1441 bytes --]
A new version of the patch removing the changes
to the the defer_nmi code in entry.S (based on
Keir's feedback) and some small bug fixes.
Renato
Signed-off-by: Jose Renato Santos <jsantos@hpl.hp.com>
> -----Original Message-----
> From: Santos, Jose Renato G
> Sent: Wednesday, May 25, 2005 12:11 PM
> To: Xen-devel@lists.xensource.com
> Subject: [Xen-devel] [PATCH] Xenoprof patches for xen-unstable
>
>
>
> hi,
>
> I have attached patches for enabling system wide profiling
> using oprofile for xen unstable.
> The patches were generated against change-set 1.1507 (May 22).
> The 4 attached files are
>
> 1) xenoprof.txt:
> - xenoprof overview and user guide
> 2) xenoprof-1.1-xen-3.0-devel.patch:
> - patch for xen
> 3) xenoprof-1.1-linux-2.6.11:
> - patch for linux. Note that this needs to be applied
> twice, once to linux-2.6.11-xen0 and once to
> linux-2.6.11-xenU. (This is different than the last
> patch which was created against the linux sparse tree).
> 4) xenoprof-1.1-oprofile-0.8.2:
> - patch for oprofile version 0.8.2
>
> Current known limitation/bugs are
> - No support for SMP guests yet.
> - when using passive domains, most samples are lost.
>
> I will be working on these issues and post new patches
> when I have them available.
>
> Thanks
>
> Renato
>
>
[-- Attachment #2: xenoprof-1.2-xen-3.0-devel.patch --]
[-- Type: application/octet-stream, Size: 60106 bytes --]
diff -Naur xen-unstable/xen/arch/x86/Makefile xen-unstable-prof-1.2/xen/arch/x86/Makefile
--- xen-unstable/xen/arch/x86/Makefile 2005-05-22 20:12:45.000000000 -0700
+++ xen-unstable-prof-1.2/xen/arch/x86/Makefile 2005-05-24 11:54:47.000000000 -0700
@@ -12,7 +12,10 @@
OBJS := $(patsubst cdb%.o,,$(OBJS))
endif
+OBJS += oprofile/oprofile.o
+
default: $(TARGET)
+ make -C oprofile
$(TARGET): $(TARGET)-syms boot/mkelf32
./boot/mkelf32 $(TARGET)-syms $(TARGET) 0x100000
@@ -30,12 +33,16 @@
boot/mkelf32: boot/mkelf32.c
$(HOSTCC) $(HOSTCFLAGS) -o $@ $<
+oprofile/oprofile.o:
+ $(MAKE) -C oprofile
+
clean:
rm -f *.o *.s *~ core boot/*.o boot/*~ boot/core boot/mkelf32
rm -f x86_32/*.o x86_32/*~ x86_32/core
rm -f x86_64/*.o x86_64/*~ x86_64/core
rm -f mtrr/*.o mtrr/*~ mtrr/core
rm -f acpi/*.o acpi/*~ acpi/core
+ rm -f oprofile/*.o
delete-unfresh-files:
# nothing
diff -Naur xen-unstable/xen/arch/x86/nmi.c xen-unstable-prof-1.2/xen/arch/x86/nmi.c
--- xen-unstable/xen/arch/x86/nmi.c 2005-05-22 20:12:45.000000000 -0700
+++ xen-unstable-prof-1.2/xen/arch/x86/nmi.c 2005-05-24 11:54:47.000000000 -0700
@@ -5,6 +5,10 @@
*
* Started by Ingo Molnar <mingo@redhat.com>
*
+ * Modified by Aravind Menon for supporting oprofile
+ * These modifications are:
+ * Copyright (C) 2005 Hewlett-Packard Co.
+ *
* Fixes:
* Mikael Pettersson : AMD K7 support for local APIC NMI watchdog.
* Mikael Pettersson : Power Management for local APIC NMI watchdog.
@@ -33,6 +37,28 @@
extern int logical_proc_id[];
+/*
+ * lapic_nmi_owner tracks the ownership of the lapic NMI hardware:
+ * - it may be reserved by some other driver, or not
+ * - when not reserved by some other driver, it may be used for
+ * the NMI watchdog, or not
+ *
+ * This is maintained separately from nmi_active because the NMI
+ * watchdog may also be driven from the I/O APIC timer.
+ */
+static spinlock_t lapic_nmi_owner_lock = SPIN_LOCK_UNLOCKED;
+static unsigned int lapic_nmi_owner;
+#define LAPIC_NMI_WATCHDOG (1<<0)
+#define LAPIC_NMI_RESERVED (1<<1)
+
+/* nmi_active:
+ * +1: the lapic NMI watchdog is active, but can be disabled
+ * 0: the lapic NMI watchdog has not been set up, and cannot
+ * be enabled
+ * -1: the lapic NMI watchdog is disabled, but can be enabled
+ */
+int nmi_active;
+
#define K7_EVNTSEL_ENABLE (1 << 22)
#define K7_EVNTSEL_INT (1 << 20)
#define K7_EVNTSEL_OS (1 << 17)
@@ -69,9 +95,6 @@
*/
#define MSR_P4_IQ_COUNTER0 0x30C
#define MSR_P4_IQ_COUNTER1 0x30D
-#define MSR_P4_IQ_CCCR0 0x36C
-#define MSR_P4_IQ_CCCR1 0x36D
-#define MSR_P4_CRU_ESCR0 0x3B8 /* ESCR no. 4 */
#define P4_NMI_CRU_ESCR0 \
(P4_ESCR_EVENT_SELECT(0x3F)|P4_ESCR_OS0|P4_ESCR_USR0| \
P4_ESCR_OS1|P4_ESCR_USR1)
@@ -123,6 +146,69 @@
* Original code written by Keith Owens.
*/
+static void disable_lapic_nmi_watchdog(void)
+{
+ if (nmi_active <= 0)
+ return;
+ switch (boot_cpu_data.x86_vendor) {
+ case X86_VENDOR_AMD:
+ wrmsr(MSR_K7_EVNTSEL0, 0, 0);
+ break;
+ case X86_VENDOR_INTEL:
+ switch (boot_cpu_data.x86) {
+ case 6:
+ wrmsr(MSR_P6_EVNTSEL0, 0, 0);
+ break;
+ case 15:
+ if (logical_proc_id[smp_processor_id()] == 0)
+ {
+ wrmsr(MSR_P4_IQ_CCCR0, 0, 0);
+ wrmsr(MSR_P4_CRU_ESCR0, 0, 0);
+ } else {
+ wrmsr(MSR_P4_IQ_CCCR1, 0, 0);
+ }
+ break;
+ }
+ break;
+ }
+ nmi_active = -1;
+ /* tell do_nmi() and others that we're not active any more */
+ nmi_watchdog = 0;
+}
+
+static void enable_lapic_nmi_watchdog(void)
+{
+ if (nmi_active < 0) {
+ nmi_watchdog = NMI_LOCAL_APIC;
+ setup_apic_nmi_watchdog();
+ }
+}
+
+int reserve_lapic_nmi(void)
+{
+ unsigned int old_owner;
+ spin_lock(&lapic_nmi_owner_lock);
+ old_owner = lapic_nmi_owner;
+ lapic_nmi_owner |= LAPIC_NMI_RESERVED;
+ spin_unlock(&lapic_nmi_owner_lock);
+ if (old_owner & LAPIC_NMI_RESERVED)
+ return -EBUSY;
+ if (old_owner & LAPIC_NMI_WATCHDOG)
+ disable_lapic_nmi_watchdog();
+ return 0;
+}
+
+void release_lapic_nmi(void)
+{
+ unsigned int new_owner;
+ spin_lock(&lapic_nmi_owner_lock);
+ new_owner = lapic_nmi_owner & ~LAPIC_NMI_RESERVED;
+ lapic_nmi_owner = new_owner;
+ spin_unlock(&lapic_nmi_owner_lock);
+ if (new_owner & LAPIC_NMI_WATCHDOG)
+ enable_lapic_nmi_watchdog();
+}
+
static void __pminit clear_msr_range(unsigned int base, unsigned int n)
{
unsigned int i;
@@ -247,6 +333,8 @@
default:
return;
}
+ lapic_nmi_owner = LAPIC_NMI_WATCHDOG;
+ nmi_active = 1;
nmi_pm_init();
}
@@ -333,3 +421,7 @@
}
}
}
+
+EXPORT_SYMBOL(reserve_lapic_nmi);
+EXPORT_SYMBOL(release_lapic_nmi);
+
diff -Naur xen-unstable/xen/arch/x86/oprofile/Makefile xen-unstable-prof-1.2/xen/arch/x86/oprofile/Makefile
--- xen-unstable/xen/arch/x86/oprofile/Makefile 1969-12-31 16:00:00.000000000 -0800
+++ xen-unstable-prof-1.2/xen/arch/x86/oprofile/Makefile 2005-05-24 11:54:47.000000000 -0700
@@ -0,0 +1,9 @@
+
+include $(BASEDIR)/Rules.mk
+
+default: $(OBJS)
+ $(LD) $(LDFLAGS) -r -o oprofile.o $(OBJS)
+
+%.o: %.c $(HDRS) Makefile
+ $(CC) $(CFLAGS) -c $< -o $@
+
diff -Naur xen-unstable/xen/arch/x86/oprofile/nmi_int.c xen-unstable-prof-1.2/xen/arch/x86/oprofile/nmi_int.c
--- xen-unstable/xen/arch/x86/oprofile/nmi_int.c 1969-12-31 16:00:00.000000000 -0800
+++ xen-unstable-prof-1.2/xen/arch/x86/oprofile/nmi_int.c 2005-06-03 18:39:32.000000000 -0700
@@ -0,0 +1,433 @@
+/**
+ * @file nmi_int.c
+ *
+ * @remark Copyright 2002 OProfile authors
+ * @remark Read the file COPYING
+ *
+ * @author John Levon <levon@movementarian.org>
+ *
+ * Modified by Aravind Menon for Xen
+ * These modifications are:
+ * Copyright (C) 2005 Hewlett-Packard Co.
+ */
+
+#include <xen/event.h>
+#include <xen/types.h>
+#include <xen/errno.h>
+#include <xen/init.h>
+#include <public/xen.h>
+#include <asm/nmi.h>
+#include <asm/msr.h>
+#include <asm/apic.h>
+
+#include "op_counter.h"
+#include "op_x86_model.h"
+
+static struct op_x86_model_spec const * model;
+static struct op_msrs cpu_msrs[NR_CPUS];
+static unsigned long saved_lvtpc[NR_CPUS];
+
+#define VIRQ_BITMASK_SIZE (MAX_OPROF_DOMAINS/32 + 1)
+
+extern int active_domains[MAX_OPROF_DOMAINS];
+extern unsigned int adomains;
+
+extern struct domain * primary_profiler;
+extern struct domain * adomain_ptrs[MAX_OPROF_DOMAINS];
+extern unsigned long virq_ovf_pending[VIRQ_BITMASK_SIZE];
+
+extern int is_active(struct domain *d);
+extern int active_id(struct domain *d);
+extern int is_passive(struct domain *d);
+extern int is_profiled(struct domain *d);
+
+
+int nmi_profiling_started = 0;
+
+int active_virq_count = 0;
+int passive_virq_count = 0;
+int other_virq_count = 0;
+int other_id = -1;
+int xen_count = 0;
+int dom_count = 0;
+int ovf = 0;
+
+int nmi_callback(struct cpu_user_regs * regs, int cpu)
+{
+ int xen_mode = 0;
+
+ ovf = model->check_ctrs(cpu, &cpu_msrs[cpu], regs);
+ xen_mode = RING_0(regs);
+ if (ovf) {
+ if (xen_mode)
+ xen_count++;
+ else
+ dom_count++;
+
+ if (is_active(current->domain)) {
+ /* This is slightly incorrect. If we do not deliver
+ OVF virtual interrupts in a synchronous
+ manner, a process switch may happen in the domain
+ between the point the sample was collected and
+ the point at which a VIRQ was delivered. However,
+ it is not safe to call send_guest_virq from this
+ NMI context, it may lead to a deadlock since NMIs are
+ unmaskable. One optimization that we can do is
+ that if the sample occurs while domain code is
+ runnng, we know that it is safe to call
+ send_guest_virq, since we know no Xen code
+ is running at that time.
+ However, this may distort the sample distribution,
+ because we may lose more Xen mode samples.*/
+ active_virq_count++;
+ if (!xen_mode) {
+ send_guest_virq(current, VIRQ_PMC_OVF);
+ clear_bit(active_id(current->domain), &virq_ovf_pending[0]);
+ } else
+ set_bit(active_id(current->domain), &virq_ovf_pending[0]);
+ primary_profiler->shared_info->active_samples++;
+ }
+ else if (is_passive(current->domain)) {
+ set_bit(active_id(primary_profiler), &virq_ovf_pending[0]);
+ passive_virq_count++;
+ primary_profiler->shared_info->passive_samples++;
+ }
+ else {
+ other_virq_count++;
+ other_id = current->domain->domain_id;
+ primary_profiler->shared_info->other_samples++;
+ }
+ }
+ return 1;
+}
+
+static void free_msrs(void)
+{
+ int i;
+ for (i = 0; i < NR_CPUS; ++i) {
+ xfree(cpu_msrs[i].counters);
+ cpu_msrs[i].counters = NULL;
+ xfree(cpu_msrs[i].controls);
+ cpu_msrs[i].controls = NULL;
+ }
+}
+
+static int allocate_msrs(void)
+{
+ int success = 1;
+ size_t controls_size = sizeof(struct op_msr) * model->num_controls;
+ size_t counters_size = sizeof(struct op_msr) * model->num_counters;
+
+ int i;
+ for (i = 0; i < NR_CPUS; ++i) {
+ //if (!cpu_online(i))
+ if (!test_bit(i, &cpu_online_map))
+ continue;
+
+ cpu_msrs[i].counters = xmalloc_bytes(counters_size);
+ if (!cpu_msrs[i].counters) {
+ success = 0;
+ break;
+ }
+ cpu_msrs[i].controls = xmalloc_bytes(controls_size);
+ if (!cpu_msrs[i].controls) {
+ success = 0;
+ break;
+ }
+ }
+ if (!success)
+ free_msrs();
+
+ return success;
+}
+
+static void nmi_cpu_save_registers(struct op_msrs * msrs)
+{
+ unsigned int const nr_ctrs = model->num_counters;
+ unsigned int const nr_ctrls = model->num_controls;
+ struct op_msr * counters = msrs->counters;
+ struct op_msr * controls = msrs->controls;
+ unsigned int i;
+
+ for (i = 0; i < nr_ctrs; ++i) {
+ rdmsr(counters[i].addr,
+ counters[i].saved.low,
+ counters[i].saved.high);
+ }
+
+ for (i = 0; i < nr_ctrls; ++i) {
+ rdmsr(controls[i].addr,
+ controls[i].saved.low,
+ controls[i].saved.high);
+ }
+}
+
+static void nmi_save_registers(void * dummy)
+{
+ int cpu = smp_processor_id();
+ struct op_msrs * msrs = &cpu_msrs[cpu];
+ model->fill_in_addresses(msrs);
+ nmi_cpu_save_registers(msrs);
+}
+
+int nmi_reserve_counters(void)
+{
+ if (!allocate_msrs())
+ return -ENOMEM;
+
+ /* We walk a thin line between law and rape here.
+ * We need to be careful to install our NMI handler
+ * without actually triggering any NMIs as this will
+ * break the core code horrifically.
+ */
+ /* Don't we need to do this on all CPUs?*/
+ if (reserve_lapic_nmi() < 0) {
+ free_msrs();
+ return -EBUSY;
+ }
+ /* We need to serialize save and setup for HT because the subset
+ * of msrs are distinct for save and setup operations
+ */
+ on_each_cpu(nmi_save_registers, NULL, 0, 1);
+ return 0;
+}
+
+static void nmi_cpu_setup(void * dummy)
+{
+ int cpu = smp_processor_id();
+ struct op_msrs * msrs = &cpu_msrs[cpu];
+ model->setup_ctrs(msrs);
+}
+
+int nmi_setup_events(void)
+{
+ on_each_cpu(nmi_cpu_setup, NULL, 0, 1);
+ return 0;
+}
+
+int nmi_enable_virq()
+{
+ set_nmi_callback(nmi_callback);
+ return 0;
+}
+
+static void nmi_cpu_start(void * dummy)
+{
+ int cpu = smp_processor_id();
+ struct op_msrs const * msrs = &cpu_msrs[cpu];
+ saved_lvtpc[cpu] = apic_read(APIC_LVTPC);
+ apic_write(APIC_LVTPC, APIC_DM_NMI);
+ model->start(msrs);
+}
+
+int nmi_start(void)
+{
+ on_each_cpu(nmi_cpu_start, NULL, 0, 1);
+ nmi_profiling_started = 1;
+ return 0;
+}
+
+static void nmi_cpu_stop(void * dummy)
+{
+ unsigned int v;
+ int cpu = smp_processor_id();
+ struct op_msrs const * msrs = &cpu_msrs[cpu];
+ model->stop(msrs);
+
+ /* restoring APIC_LVTPC can trigger an apic error because the delivery
+ * mode and vector nr combination can be illegal. That's by design: on
+ * power on apic lvt contain a zero vector nr which are legal only for
+ * NMI delivery mode. So inhibit apic err before restoring lvtpc
+ */
+ if (!(apic_read(APIC_LVTPC) & APIC_DM_NMI)
+ || (apic_read(APIC_LVTPC) & APIC_LVT_MASKED)) {
+ printk("nmi_stop: APIC not good %ul\n", apic_read(APIC_LVTPC));
+ mdelay(5000);
+ }
+ v = apic_read(APIC_LVTERR);
+ apic_write(APIC_LVTERR, v | APIC_LVT_MASKED);
+ apic_write(APIC_LVTPC, saved_lvtpc[cpu]);
+ apic_write(APIC_LVTERR, v);
+}
+
+void nmi_stop(void)
+{
+ nmi_profiling_started = 0;
+ on_each_cpu(nmi_cpu_stop, NULL, 0, 1);
+ active_virq_count = 0;
+ passive_virq_count = 0;
+ other_virq_count = 0;
+ xen_count = 0;
+ dom_count = 0;
+}
+
+extern unsigned int read_ctr(struct op_msrs const * const msrs, int ctr);
+
+void nmi_sanity_check(struct cpu_user_regs *regs, int cpu)
+{
+ int i;
+ int masked = 0;
+
+ /* We may have missed some NMI interrupts if we were already
+ in an NMI context at that time. If this happens, then
+ the counters are not reset and in the case of P4, the
+ APIC LVT disable mask is set. In both cases we end up
+ losing samples. On P4, this condition can be detected
+ by checking the APIC LVT mask. But in P6, we need to
+ examine the counters for overflow. So, every timer
+ interrupt, we check that everything is OK */
+
+ if (apic_read(APIC_LVTPC) & APIC_LVT_MASKED)
+ masked = 1;
+
+ nmi_callback(regs, cpu);
+
+ if (ovf && masked) {
+ if (is_active(current->domain))
+ current->domain->shared_info->nmi_restarts++;
+ else if (is_passive(current->domain))
+ primary_profiler->shared_info->nmi_restarts++;
+ }
+
+ /*if (jiffies %1000 == 0) {
+ printk("cpu %d: sample count %d %d %d at %u\n", cpu, active_virq_count, passive_virq_count, other_virq_count, jiffies);
+ printk("other task id %d\n", other_id);
+ printk("%d in xen, %d in domain\n", xen_count, dom_count);
+ printk("counters %p %p\n", read_ctr(&cpu_msrs[cpu], 0), read_ctr(&cpu_msrs[cpu], 1));
+ }*/
+
+
+ for (i = 0; i < adomains; i++)
+ if (test_and_clear_bit(i, &virq_ovf_pending[0])) {
+ /* For now we do not support profiling of SMP guests */
+ /* virq is delivered to first VCPU */
+ send_guest_virq(adomain_ptrs[i]->exec_domain[0], VIRQ_PMC_OVF);
+ }
+}
+
+void nmi_disable_virq(void)
+{
+ unset_nmi_callback();
+}
+
+static void nmi_restore_registers(struct op_msrs * msrs)
+{
+ unsigned int const nr_ctrs = model->num_counters;
+ unsigned int const nr_ctrls = model->num_controls;
+ struct op_msr * counters = msrs->counters;
+ struct op_msr * controls = msrs->controls;
+ unsigned int i;
+
+ for (i = 0; i < nr_ctrls; ++i) {
+ wrmsr(controls[i].addr,
+ controls[i].saved.low,
+ controls[i].saved.high);
+ }
+
+ for (i = 0; i < nr_ctrs; ++i) {
+ wrmsr(counters[i].addr,
+ counters[i].saved.low,
+ counters[i].saved.high);
+ }
+}
+
+static void nmi_cpu_shutdown(void * dummy)
+{
+ int cpu = smp_processor_id();
+ struct op_msrs * msrs = &cpu_msrs[cpu];
+ nmi_restore_registers(msrs);
+}
+
+void nmi_release_counters(void)
+{
+ on_each_cpu(nmi_cpu_shutdown, NULL, 0, 1);
+ release_lapic_nmi();
+ free_msrs();
+}
+
+struct op_counter_config counter_config[OP_MAX_COUNTER];
+
+static int __init p4_init(void)
+{
+ __u8 cpu_model = current_cpu_data.x86_model;
+
+ if (cpu_model > 3)
+ return 0;
+
+#ifndef CONFIG_SMP
+ model = &op_p4_spec;
+ return 1;
+#else
+ //switch (smp_num_siblings) {
+ if (cpu_has_ht)
+ {
+ model = &op_p4_ht2_spec;
+ return 1;
+ }
+ else
+ {
+ model = &op_p4_spec;
+ return 1;
+ }
+#endif
+ return 0;
+}
+
+
+static int __init ppro_init(void)
+{
+ __u8 cpu_model = current_cpu_data.x86_model;
+
+ if (cpu_model > 0xd)
+ return 0;
+
+ model = &op_ppro_spec;
+ return 1;
+}
+
+int nmi_init(int *num_events, int *is_primary)
+{
+ __u8 vendor = current_cpu_data.x86_vendor;
+ __u8 family = current_cpu_data.x86;
+ int prim = 0;
+
+ if (!cpu_has_apic)
+ return -ENODEV;
+
+ if (primary_profiler == NULL) {
+ primary_profiler = current->domain;
+ prim = 1;
+ }
+
+ if (primary_profiler != current->domain)
+ goto out;
+
+ switch (vendor) {
+ case X86_VENDOR_INTEL:
+ switch (family) {
+ /* Pentium IV */
+ case 0xf:
+ if (!p4_init())
+ return -ENODEV;
+ break;
+ /* A P6-class processor */
+ case 6:
+ if (!ppro_init())
+ return -ENODEV;
+ break;
+ default:
+ return -ENODEV;
+ }
+ break;
+ default:
+ return -ENODEV;
+ }
+out:
+ if (copy_to_user((void *)num_events, (void *)&model->num_counters, sizeof(int)))
+ return -EFAULT;
+ if (copy_to_user((void *)is_primary, (void *)&prim, sizeof(int)))
+ return -EFAULT;
+
+ return 0;
+}
+
diff -Naur xen-unstable/xen/arch/x86/oprofile/op_counter.h xen-unstable-prof-1.2/xen/arch/x86/oprofile/op_counter.h
--- xen-unstable/xen/arch/x86/oprofile/op_counter.h 1969-12-31 16:00:00.000000000 -0800
+++ xen-unstable-prof-1.2/xen/arch/x86/oprofile/op_counter.h 2005-05-24 11:54:47.000000000 -0700
@@ -0,0 +1,33 @@
+/**
+ * @file op_counter.h
+ *
+ * @remark Copyright 2002 OProfile authors
+ * @remark Read the file COPYING
+ *
+ * @author John Levon
+ *
+ * Modified by Aravind Menon for Xen
+ * These modifications are:
+ * Copyright (C) 2005 Hewlett-Packard Co.
+ */
+
+#ifndef OP_COUNTER_H
+#define OP_COUNTER_H
+
+#define OP_MAX_COUNTER 8
+
+/* Per-perfctr configuration as set via
+ * oprofilefs.
+ */
+struct op_counter_config {
+ unsigned long count;
+ unsigned long enabled;
+ unsigned long event;
+ unsigned long kernel;
+ unsigned long user;
+ unsigned long unit_mask;
+};
+
+extern struct op_counter_config counter_config[];
+
+#endif /* OP_COUNTER_H */
diff -Naur xen-unstable/xen/arch/x86/oprofile/op_model_p4.c xen-unstable-prof-1.2/xen/arch/x86/oprofile/op_model_p4.c
--- xen-unstable/xen/arch/x86/oprofile/op_model_p4.c 1969-12-31 16:00:00.000000000 -0800
+++ xen-unstable-prof-1.2/xen/arch/x86/oprofile/op_model_p4.c 2005-06-03 13:31:58.000000000 -0700
@@ -0,0 +1,744 @@
+/**
+ * @file op_model_p4.c
+ * P4 model-specific MSR operations
+ *
+ * @remark Copyright 2002 OProfile authors
+ * @remark Read the file COPYING
+ *
+ * @author Graydon Hoare
+ *
+ * Modified by Aravind Menon for Xen
+ * These modifications are:
+ * Copyright (C) 2005 Hewlett-Packard Co.
+ */
+
+#include <xen/types.h>
+#include <asm/msr.h>
+#include <asm/io.h>
+#include <asm/apic.h>
+#include <asm/processor.h>
+#include <xen/sched.h>
+
+#include "op_x86_model.h"
+#include "op_counter.h"
+
+#define NUM_EVENTS 39
+
+#define NUM_COUNTERS_NON_HT 8
+#define NUM_ESCRS_NON_HT 45
+#define NUM_CCCRS_NON_HT 18
+#define NUM_CONTROLS_NON_HT (NUM_ESCRS_NON_HT + NUM_CCCRS_NON_HT)
+
+#define NUM_COUNTERS_HT2 4
+#define NUM_ESCRS_HT2 23
+#define NUM_CCCRS_HT2 9
+#define NUM_CONTROLS_HT2 (NUM_ESCRS_HT2 + NUM_CCCRS_HT2)
+
+static unsigned int num_counters = NUM_COUNTERS_NON_HT;
+
+
+/* this has to be checked dynamically since the
+ hyper-threadedness of a chip is discovered at
+ kernel boot-time. */
+static inline void setup_num_counters(void)
+{
+#ifdef CONFIG_SMP
+ if (cpu_has_ht)
+ num_counters = NUM_COUNTERS_HT2;
+#endif
+}
+
+static int inline addr_increment(void)
+{
+#ifdef CONFIG_SMP
+ return cpu_has_ht ? 2 : 1;
+#else
+ return 1;
+#endif
+}
+
+
+/* tables to simulate simplified hardware view of p4 registers */
+struct p4_counter_binding {
+ int virt_counter;
+ int counter_address;
+ int cccr_address;
+};
+
+struct p4_event_binding {
+ int escr_select; /* value to put in CCCR */
+ int event_select; /* value to put in ESCR */
+ struct {
+ int virt_counter; /* for this counter... */
+ int escr_address; /* use this ESCR */
+ } bindings[2];
+};
+
+/* nb: these CTR_* defines are a duplicate of defines in
+ event/i386.p4*events. */
+
+
+#define CTR_BPU_0 (1 << 0)
+#define CTR_MS_0 (1 << 1)
+#define CTR_FLAME_0 (1 << 2)
+#define CTR_IQ_4 (1 << 3)
+#define CTR_BPU_2 (1 << 4)
+#define CTR_MS_2 (1 << 5)
+#define CTR_FLAME_2 (1 << 6)
+#define CTR_IQ_5 (1 << 7)
+
+static struct p4_counter_binding p4_counters [NUM_COUNTERS_NON_HT] = {
+ { CTR_BPU_0, MSR_P4_BPU_PERFCTR0, MSR_P4_BPU_CCCR0 },
+ { CTR_MS_0, MSR_P4_MS_PERFCTR0, MSR_P4_MS_CCCR0 },
+ { CTR_FLAME_0, MSR_P4_FLAME_PERFCTR0, MSR_P4_FLAME_CCCR0 },
+ { CTR_IQ_4, MSR_P4_IQ_PERFCTR4, MSR_P4_IQ_CCCR4 },
+ { CTR_BPU_2, MSR_P4_BPU_PERFCTR2, MSR_P4_BPU_CCCR2 },
+ { CTR_MS_2, MSR_P4_MS_PERFCTR2, MSR_P4_MS_CCCR2 },
+ { CTR_FLAME_2, MSR_P4_FLAME_PERFCTR2, MSR_P4_FLAME_CCCR2 },
+ { CTR_IQ_5, MSR_P4_IQ_PERFCTR5, MSR_P4_IQ_CCCR5 }
+};
+
+#define NUM_UNUSED_CCCRS NUM_CCCRS_NON_HT - NUM_COUNTERS_NON_HT
+
+/* All cccr we don't use. */
+static int p4_unused_cccr[NUM_UNUSED_CCCRS] = {
+ MSR_P4_BPU_CCCR1, MSR_P4_BPU_CCCR3,
+ MSR_P4_MS_CCCR1, MSR_P4_MS_CCCR3,
+ MSR_P4_FLAME_CCCR1, MSR_P4_FLAME_CCCR3,
+ MSR_P4_IQ_CCCR0, MSR_P4_IQ_CCCR1,
+ MSR_P4_IQ_CCCR2, MSR_P4_IQ_CCCR3
+};
+
+/* p4 event codes in libop/op_event.h are indices into this table. */
+
+static struct p4_event_binding p4_events[NUM_EVENTS] = {
+
+ { /* BRANCH_RETIRED */
+ 0x05, 0x06,
+ { {CTR_IQ_4, MSR_P4_CRU_ESCR2},
+ {CTR_IQ_5, MSR_P4_CRU_ESCR3} }
+ },
+
+ { /* MISPRED_BRANCH_RETIRED */
+ 0x04, 0x03,
+ { { CTR_IQ_4, MSR_P4_CRU_ESCR0},
+ { CTR_IQ_5, MSR_P4_CRU_ESCR1} }
+ },
+
+ { /* TC_DELIVER_MODE */
+ 0x01, 0x01,
+ { { CTR_MS_0, MSR_P4_TC_ESCR0},
+ { CTR_MS_2, MSR_P4_TC_ESCR1} }
+ },
+
+ { /* BPU_FETCH_REQUEST */
+ 0x00, 0x03,
+ { { CTR_BPU_0, MSR_P4_BPU_ESCR0},
+ { CTR_BPU_2, MSR_P4_BPU_ESCR1} }
+ },
+
+ { /* ITLB_REFERENCE */
+ 0x03, 0x18,
+ { { CTR_BPU_0, MSR_P4_ITLB_ESCR0},
+ { CTR_BPU_2, MSR_P4_ITLB_ESCR1} }
+ },
+
+ { /* MEMORY_CANCEL */
+ 0x05, 0x02,
+ { { CTR_FLAME_0, MSR_P4_DAC_ESCR0},
+ { CTR_FLAME_2, MSR_P4_DAC_ESCR1} }
+ },
+
+ { /* MEMORY_COMPLETE */
+ 0x02, 0x08,
+ { { CTR_FLAME_0, MSR_P4_SAAT_ESCR0},
+ { CTR_FLAME_2, MSR_P4_SAAT_ESCR1} }
+ },
+
+ { /* LOAD_PORT_REPLAY */
+ 0x02, 0x04,
+ { { CTR_FLAME_0, MSR_P4_SAAT_ESCR0},
+ { CTR_FLAME_2, MSR_P4_SAAT_ESCR1} }
+ },
+
+ { /* STORE_PORT_REPLAY */
+ 0x02, 0x05,
+ { { CTR_FLAME_0, MSR_P4_SAAT_ESCR0},
+ { CTR_FLAME_2, MSR_P4_SAAT_ESCR1} }
+ },
+
+ { /* MOB_LOAD_REPLAY */
+ 0x02, 0x03,
+ { { CTR_BPU_0, MSR_P4_MOB_ESCR0},
+ { CTR_BPU_2, MSR_P4_MOB_ESCR1} }
+ },
+
+ { /* PAGE_WALK_TYPE */
+ 0x04, 0x01,
+ { { CTR_BPU_0, MSR_P4_PMH_ESCR0},
+ { CTR_BPU_2, MSR_P4_PMH_ESCR1} }
+ },
+
+ { /* BSQ_CACHE_REFERENCE */
+ 0x07, 0x0c,
+ { { CTR_BPU_0, MSR_P4_BSU_ESCR0},
+ { CTR_BPU_2, MSR_P4_BSU_ESCR1} }
+ },
+
+ { /* IOQ_ALLOCATION */
+ 0x06, 0x03,
+ { { CTR_BPU_0, MSR_P4_FSB_ESCR0},
+ { 0, 0 } }
+ },
+
+ { /* IOQ_ACTIVE_ENTRIES */
+ 0x06, 0x1a,
+ { { CTR_BPU_2, MSR_P4_FSB_ESCR1},
+ { 0, 0 } }
+ },
+
+ { /* FSB_DATA_ACTIVITY */
+ 0x06, 0x17,
+ { { CTR_BPU_0, MSR_P4_FSB_ESCR0},
+ { CTR_BPU_2, MSR_P4_FSB_ESCR1} }
+ },
+
+ { /* BSQ_ALLOCATION */
+ 0x07, 0x05,
+ { { CTR_BPU_0, MSR_P4_BSU_ESCR0},
+ { 0, 0 } }
+ },
+
+ { /* BSQ_ACTIVE_ENTRIES */
+ 0x07, 0x06,
+ { { CTR_BPU_2, MSR_P4_BSU_ESCR1 /* guess */},
+ { 0, 0 } }
+ },
+
+ { /* X87_ASSIST */
+ 0x05, 0x03,
+ { { CTR_IQ_4, MSR_P4_CRU_ESCR2},
+ { CTR_IQ_5, MSR_P4_CRU_ESCR3} }
+ },
+
+ { /* SSE_INPUT_ASSIST */
+ 0x01, 0x34,
+ { { CTR_FLAME_0, MSR_P4_FIRM_ESCR0},
+ { CTR_FLAME_2, MSR_P4_FIRM_ESCR1} }
+ },
+
+ { /* PACKED_SP_UOP */
+ 0x01, 0x08,
+ { { CTR_FLAME_0, MSR_P4_FIRM_ESCR0},
+ { CTR_FLAME_2, MSR_P4_FIRM_ESCR1} }
+ },
+
+ { /* PACKED_DP_UOP */
+ 0x01, 0x0c,
+ { { CTR_FLAME_0, MSR_P4_FIRM_ESCR0},
+ { CTR_FLAME_2, MSR_P4_FIRM_ESCR1} }
+ },
+
+ { /* SCALAR_SP_UOP */
+ 0x01, 0x0a,
+ { { CTR_FLAME_0, MSR_P4_FIRM_ESCR0},
+ { CTR_FLAME_2, MSR_P4_FIRM_ESCR1} }
+ },
+
+ { /* SCALAR_DP_UOP */
+ 0x01, 0x0e,
+ { { CTR_FLAME_0, MSR_P4_FIRM_ESCR0},
+ { CTR_FLAME_2, MSR_P4_FIRM_ESCR1} }
+ },
+
+ { /* 64BIT_MMX_UOP */
+ 0x01, 0x02,
+ { { CTR_FLAME_0, MSR_P4_FIRM_ESCR0},
+ { CTR_FLAME_2, MSR_P4_FIRM_ESCR1} }
+ },
+
+ { /* 128BIT_MMX_UOP */
+ 0x01, 0x1a,
+ { { CTR_FLAME_0, MSR_P4_FIRM_ESCR0},
+ { CTR_FLAME_2, MSR_P4_FIRM_ESCR1} }
+ },
+
+ { /* X87_FP_UOP */
+ 0x01, 0x04,
+ { { CTR_FLAME_0, MSR_P4_FIRM_ESCR0},
+ { CTR_FLAME_2, MSR_P4_FIRM_ESCR1} }
+ },
+
+ { /* X87_SIMD_MOVES_UOP */
+ 0x01, 0x2e,
+ { { CTR_FLAME_0, MSR_P4_FIRM_ESCR0},
+ { CTR_FLAME_2, MSR_P4_FIRM_ESCR1} }
+ },
+
+ { /* MACHINE_CLEAR */
+ 0x05, 0x02,
+ { { CTR_IQ_4, MSR_P4_CRU_ESCR2},
+ { CTR_IQ_5, MSR_P4_CRU_ESCR3} }
+ },
+
+ { /* GLOBAL_POWER_EVENTS */
+ 0x06, 0x13 /* older manual says 0x05, newer 0x13 */,
+ { { CTR_BPU_0, MSR_P4_FSB_ESCR0},
+ { CTR_BPU_2, MSR_P4_FSB_ESCR1} }
+ },
+
+ { /* TC_MS_XFER */
+ 0x00, 0x05,
+ { { CTR_MS_0, MSR_P4_MS_ESCR0},
+ { CTR_MS_2, MSR_P4_MS_ESCR1} }
+ },
+
+ { /* UOP_QUEUE_WRITES */
+ 0x00, 0x09,
+ { { CTR_MS_0, MSR_P4_MS_ESCR0},
+ { CTR_MS_2, MSR_P4_MS_ESCR1} }
+ },
+
+ { /* FRONT_END_EVENT */
+ 0x05, 0x08,
+ { { CTR_IQ_4, MSR_P4_CRU_ESCR2},
+ { CTR_IQ_5, MSR_P4_CRU_ESCR3} }
+ },
+
+ { /* EXECUTION_EVENT */
+ 0x05, 0x0c,
+ { { CTR_IQ_4, MSR_P4_CRU_ESCR2},
+ { CTR_IQ_5, MSR_P4_CRU_ESCR3} }
+ },
+
+ { /* REPLAY_EVENT */
+ 0x05, 0x09,
+ { { CTR_IQ_4, MSR_P4_CRU_ESCR2},
+ { CTR_IQ_5, MSR_P4_CRU_ESCR3} }
+ },
+
+ { /* INSTR_RETIRED */
+ 0x04, 0x02,
+ { { CTR_IQ_4, MSR_P4_CRU_ESCR0},
+ { CTR_IQ_5, MSR_P4_CRU_ESCR1} }
+ },
+
+ { /* UOPS_RETIRED */
+ 0x04, 0x01,
+ { { CTR_IQ_4, MSR_P4_CRU_ESCR0},
+ { CTR_IQ_5, MSR_P4_CRU_ESCR1} }
+ },
+
+ { /* UOP_TYPE */
+ 0x02, 0x02,
+ { { CTR_IQ_4, MSR_P4_RAT_ESCR0},
+ { CTR_IQ_5, MSR_P4_RAT_ESCR1} }
+ },
+
+ { /* RETIRED_MISPRED_BRANCH_TYPE */
+ 0x02, 0x05,
+ { { CTR_MS_0, MSR_P4_TBPU_ESCR0},
+ { CTR_MS_2, MSR_P4_TBPU_ESCR1} }
+ },
+
+ { /* RETIRED_BRANCH_TYPE */
+ 0x02, 0x04,
+ { { CTR_MS_0, MSR_P4_TBPU_ESCR0},
+ { CTR_MS_2, MSR_P4_TBPU_ESCR1} }
+ }
+};
+
+
+#define MISC_PMC_ENABLED_P(x) ((x) & 1 << 7)
+
+#define ESCR_RESERVED_BITS 0x80000003
+#define ESCR_CLEAR(escr) ((escr) &= ESCR_RESERVED_BITS)
+#define ESCR_SET_USR_0(escr, usr) ((escr) |= (((usr) & 1) << 2))
+#define ESCR_SET_OS_0(escr, os) ((escr) |= (((os) & 1) << 3))
+#define ESCR_SET_USR_1(escr, usr) ((escr) |= (((usr) & 1)))
+#define ESCR_SET_OS_1(escr, os) ((escr) |= (((os) & 1) << 1))
+#define ESCR_SET_EVENT_SELECT(escr, sel) ((escr) |= (((sel) & 0x3f) << 25))
+#define ESCR_SET_EVENT_MASK(escr, mask) ((escr) |= (((mask) & 0xffff) << 9))
+#define ESCR_READ(escr,high,ev,i) do {rdmsr(ev->bindings[(i)].escr_address, (escr), (high));} while (0)
+#define ESCR_WRITE(escr,high,ev,i) do {wrmsr(ev->bindings[(i)].escr_address, (escr), (high));} while (0)
+
+#define CCCR_RESERVED_BITS 0x38030FFF
+#define CCCR_CLEAR(cccr) ((cccr) &= CCCR_RESERVED_BITS)
+#define CCCR_SET_REQUIRED_BITS(cccr) ((cccr) |= 0x00030000)
+#define CCCR_SET_ESCR_SELECT(cccr, sel) ((cccr) |= (((sel) & 0x07) << 13))
+#define CCCR_SET_PMI_OVF_0(cccr) ((cccr) |= (1<<26))
+#define CCCR_SET_PMI_OVF_1(cccr) ((cccr) |= (1<<27))
+#define CCCR_SET_ENABLE(cccr) ((cccr) |= (1<<12))
+#define CCCR_SET_DISABLE(cccr) ((cccr) &= ~(1<<12))
+#define CCCR_READ(low, high, i) do {rdmsr(p4_counters[(i)].cccr_address, (low), (high));} while (0)
+#define CCCR_WRITE(low, high, i) do {wrmsr(p4_counters[(i)].cccr_address, (low), (high));} while (0)
+#define CCCR_OVF_P(cccr) ((cccr) & (1U<<31))
+#define CCCR_CLEAR_OVF(cccr) ((cccr) &= (~(1U<<31)))
+
+#define CTR_READ(l,h,i) do {rdmsr(p4_counters[(i)].counter_address, (l), (h));} while (0)
+#define CTR_WRITE(l,i) do {wrmsr(p4_counters[(i)].counter_address, -(u32)(l), -1);} while (0)
+#define CTR_OVERFLOW_P(ctr) (!((ctr) & 0x80000000))
+
+
+/* this assigns a "stagger" to the current CPU, which is used throughout
+ the code in this module as an extra array offset, to select the "even"
+ or "odd" part of all the divided resources. */
+static unsigned int get_stagger(void)
+{
+#ifdef CONFIG_SMP
+ /*int cpu = smp_processor_id();
+ return (cpu != first_cpu(cpu_sibling_map[cpu]));*/
+ /* We want the two logical cpus of a physical cpu to use
+ disjoint set of counters. The following code is wrong. */
+ return 0;
+#endif
+ return 0;
+}
+
+
+/* finally, mediate access to a real hardware counter
+ by passing a "virtual" counter numer to this macro,
+ along with your stagger setting. */
+#define VIRT_CTR(stagger, i) ((i) + ((num_counters) * (stagger)))
+
+static unsigned long reset_value[NUM_COUNTERS_NON_HT];
+
+
+static void p4_fill_in_addresses(struct op_msrs * const msrs)
+{
+ unsigned int i;
+ unsigned int addr, stag;
+
+ setup_num_counters();
+ stag = get_stagger();
+
+ /* the counter registers we pay attention to */
+ for (i = 0; i < num_counters; ++i) {
+ msrs->counters[i].addr =
+ p4_counters[VIRT_CTR(stag, i)].counter_address;
+ }
+
+ /* FIXME: bad feeling, we don't save the 10 counters we don't use. */
+
+ /* 18 CCCR registers */
+ for (i = 0, addr = MSR_P4_BPU_CCCR0 + stag;
+ addr <= MSR_P4_IQ_CCCR5; ++i, addr += addr_increment()) {
+ msrs->controls[i].addr = addr;
+ }
+
+ /* 43 ESCR registers in three or four discontiguous group */
+ for (addr = MSR_P4_BSU_ESCR0 + stag;
+ addr < MSR_P4_IQ_ESCR0; ++i, addr += addr_increment()) {
+ msrs->controls[i].addr = addr;
+ }
+
+ /* no IQ_ESCR0/1 on some models, we save a seconde time BSU_ESCR0/1
+ * to avoid special case in nmi_{save|restore}_registers() */
+ if (boot_cpu_data.x86_model >= 0x3) {
+ for (addr = MSR_P4_BSU_ESCR0 + stag;
+ addr <= MSR_P4_BSU_ESCR1; ++i, addr += addr_increment()) {
+ msrs->controls[i].addr = addr;
+ }
+ } else {
+ for (addr = MSR_P4_IQ_ESCR0 + stag;
+ addr <= MSR_P4_IQ_ESCR1; ++i, addr += addr_increment()) {
+ msrs->controls[i].addr = addr;
+ }
+ }
+
+ for (addr = MSR_P4_RAT_ESCR0 + stag;
+ addr <= MSR_P4_SSU_ESCR0; ++i, addr += addr_increment()) {
+ msrs->controls[i].addr = addr;
+ }
+
+ for (addr = MSR_P4_MS_ESCR0 + stag;
+ addr <= MSR_P4_TC_ESCR1; ++i, addr += addr_increment()) {
+ msrs->controls[i].addr = addr;
+ }
+
+ for (addr = MSR_P4_IX_ESCR0 + stag;
+ addr <= MSR_P4_CRU_ESCR3; ++i, addr += addr_increment()) {
+ msrs->controls[i].addr = addr;
+ }
+
+ /* there are 2 remaining non-contiguously located ESCRs */
+
+ if (num_counters == NUM_COUNTERS_NON_HT) {
+ /* standard non-HT CPUs handle both remaining ESCRs*/
+ msrs->controls[i++].addr = MSR_P4_CRU_ESCR5;
+ msrs->controls[i++].addr = MSR_P4_CRU_ESCR4;
+
+ } else if (stag == 0) {
+ /* HT CPUs give the first remainder to the even thread, as
+ the 32nd control register */
+ msrs->controls[i++].addr = MSR_P4_CRU_ESCR4;
+
+ } else {
+ /* and two copies of the second to the odd thread,
+ for the 22st and 23nd control registers */
+ msrs->controls[i++].addr = MSR_P4_CRU_ESCR5;
+ msrs->controls[i++].addr = MSR_P4_CRU_ESCR5;
+ }
+}
+
+
+static void pmc_setup_one_p4_counter(unsigned int ctr)
+{
+ int i;
+ int const maxbind = 2;
+ unsigned int cccr = 0;
+ unsigned int escr = 0;
+ unsigned int high = 0;
+ unsigned int counter_bit;
+ struct p4_event_binding *ev = NULL;
+ unsigned int stag;
+
+ stag = get_stagger();
+
+ /* convert from counter *number* to counter *bit* */
+ counter_bit = 1 << VIRT_CTR(stag, ctr);
+
+ /* find our event binding structure. */
+ if (counter_config[ctr].event <= 0 || counter_config[ctr].event > NUM_EVENTS) {
+ printk(KERN_ERR
+ "oprofile: P4 event code 0x%lx out of range\n",
+ counter_config[ctr].event);
+ return;
+ }
+
+ ev = &(p4_events[counter_config[ctr].event - 1]);
+
+ for (i = 0; i < maxbind; i++) {
+ if (ev->bindings[i].virt_counter & counter_bit) {
+
+ /* modify ESCR */
+ ESCR_READ(escr, high, ev, i);
+ ESCR_CLEAR(escr);
+ if (stag == 0) {
+ ESCR_SET_USR_0(escr, counter_config[ctr].user);
+ ESCR_SET_OS_0(escr, counter_config[ctr].kernel);
+ } else {
+ ESCR_SET_USR_1(escr, counter_config[ctr].user);
+ ESCR_SET_OS_1(escr, counter_config[ctr].kernel);
+ }
+ ESCR_SET_EVENT_SELECT(escr, ev->event_select);
+ ESCR_SET_EVENT_MASK(escr, counter_config[ctr].unit_mask);
+ ESCR_WRITE(escr, high, ev, i);
+
+ /* modify CCCR */
+ CCCR_READ(cccr, high, VIRT_CTR(stag, ctr));
+ CCCR_CLEAR(cccr);
+ CCCR_SET_REQUIRED_BITS(cccr);
+ CCCR_SET_ESCR_SELECT(cccr, ev->escr_select);
+ if (stag == 0) {
+ CCCR_SET_PMI_OVF_0(cccr);
+ } else {
+ CCCR_SET_PMI_OVF_1(cccr);
+ }
+ CCCR_WRITE(cccr, high, VIRT_CTR(stag, ctr));
+ return;
+ }
+ }
+
+ printk(KERN_ERR
+ "oprofile: P4 event code 0x%lx no binding, stag %d ctr %d\n",
+ counter_config[ctr].event, stag, ctr);
+}
+
+
+static void p4_setup_ctrs(struct op_msrs const * const msrs)
+{
+ unsigned int i;
+ unsigned int low, high;
+ unsigned int addr;
+ unsigned int stag;
+
+ stag = get_stagger();
+
+ rdmsr(MSR_IA32_MISC_ENABLE, low, high);
+ if (! MISC_PMC_ENABLED_P(low)) {
+ printk(KERN_ERR "oprofile: P4 PMC not available\n");
+ return;
+ }
+
+ /* clear the cccrs we will use */
+ for (i = 0 ; i < num_counters ; i++) {
+ rdmsr(p4_counters[VIRT_CTR(stag, i)].cccr_address, low, high);
+ CCCR_CLEAR(low);
+ CCCR_SET_REQUIRED_BITS(low);
+ wrmsr(p4_counters[VIRT_CTR(stag, i)].cccr_address, low, high);
+ }
+
+ /* clear cccrs outside our concern */
+ for (i = stag ; i < NUM_UNUSED_CCCRS ; i += addr_increment()) {
+ rdmsr(p4_unused_cccr[i], low, high);
+ CCCR_CLEAR(low);
+ CCCR_SET_REQUIRED_BITS(low);
+ wrmsr(p4_unused_cccr[i], low, high);
+ }
+
+ /* clear all escrs (including those outside our concern) */
+ for (addr = MSR_P4_BSU_ESCR0 + stag;
+ addr < MSR_P4_IQ_ESCR0; addr += addr_increment()) {
+ wrmsr(addr, 0, 0);
+ }
+
+ /* On older models clear also MSR_P4_IQ_ESCR0/1 */
+ if (boot_cpu_data.x86_model < 0x3) {
+ wrmsr(MSR_P4_IQ_ESCR0, 0, 0);
+ wrmsr(MSR_P4_IQ_ESCR1, 0, 0);
+ }
+
+ for (addr = MSR_P4_RAT_ESCR0 + stag;
+ addr <= MSR_P4_SSU_ESCR0; ++i, addr += addr_increment()) {
+ wrmsr(addr, 0, 0);
+ }
+
+ for (addr = MSR_P4_MS_ESCR0 + stag;
+ addr <= MSR_P4_TC_ESCR1; addr += addr_increment()){
+ wrmsr(addr, 0, 0);
+ }
+
+ for (addr = MSR_P4_IX_ESCR0 + stag;
+ addr <= MSR_P4_CRU_ESCR3; addr += addr_increment()){
+ wrmsr(addr, 0, 0);
+ }
+
+ if (num_counters == NUM_COUNTERS_NON_HT) {
+ wrmsr(MSR_P4_CRU_ESCR4, 0, 0);
+ wrmsr(MSR_P4_CRU_ESCR5, 0, 0);
+ } else if (stag == 0) {
+ wrmsr(MSR_P4_CRU_ESCR4, 0, 0);
+ } else {
+ wrmsr(MSR_P4_CRU_ESCR5, 0, 0);
+ }
+
+ /* setup all counters */
+ for (i = 0 ; i < num_counters ; ++i) {
+ if (counter_config[i].enabled) {
+ reset_value[i] = counter_config[i].count;
+ pmc_setup_one_p4_counter(i);
+ CTR_WRITE(counter_config[i].count, VIRT_CTR(stag, i));
+ } else {
+ reset_value[i] = 0;
+ }
+ }
+}
+
+
+extern void pmc_log_event(struct domain *d, unsigned int eip, int mode, int event);
+extern int is_profiled(struct domain * d);
+extern struct domain * primary_profiler;
+
+static int p4_check_ctrs(unsigned int const cpu,
+ struct op_msrs const * const msrs,
+ struct cpu_user_regs * const regs)
+{
+ unsigned long ctr, low, high, stag, real;
+ int i, ovf = 0;
+ unsigned long eip = regs->eip;
+ int mode = 0;
+
+ if (RING_1(regs))
+ mode = 1;
+ else if (RING_0(regs))
+ mode = 2;
+
+ stag = get_stagger();
+
+ for (i = 0; i < num_counters; ++i) {
+ if (!reset_value[i])
+ continue;
+
+ /*
+ * there is some eccentricity in the hardware which
+ * requires that we perform 2 extra corrections:
+ *
+ * - check both the CCCR:OVF flag for overflow and the
+ * counter high bit for un-flagged overflows.
+ *
+ * - write the counter back twice to ensure it gets
+ * updated properly.
+ *
+ * the former seems to be related to extra NMIs happening
+ * during the current NMI; the latter is reported as errata
+ * N15 in intel doc 249199-029, pentium 4 specification
+ * update, though their suggested work-around does not
+ * appear to solve the problem.
+ */
+
+ real = VIRT_CTR(stag, i);
+
+ CCCR_READ(low, high, real);
+ CTR_READ(ctr, high, real);
+ if (CCCR_OVF_P(low) || CTR_OVERFLOW_P(ctr)) {
+ pmc_log_event(current->domain, eip, mode, i);
+ CTR_WRITE(reset_value[i], real);
+ CCCR_CLEAR_OVF(low);
+ CCCR_WRITE(low, high, real);
+ CTR_WRITE(reset_value[i], real);
+ ovf = 1;
+ }
+ }
+
+ /* P4 quirk: you have to re-unmask the apic vector */
+ apic_write(APIC_LVTPC, apic_read(APIC_LVTPC) & ~APIC_LVT_MASKED);
+
+ /* See op_model_ppro.c */
+ return ovf;
+}
+
+
+static void p4_start(struct op_msrs const * const msrs)
+{
+ unsigned int low, high, stag;
+ int i;
+
+ stag = get_stagger();
+
+ for (i = 0; i < num_counters; ++i) {
+ if (!reset_value[i])
+ continue;
+ CCCR_READ(low, high, VIRT_CTR(stag, i));
+ CCCR_SET_ENABLE(low);
+ CCCR_WRITE(low, high, VIRT_CTR(stag, i));
+ }
+}
+
+
+static void p4_stop(struct op_msrs const * const msrs)
+{
+ unsigned int low, high, stag;
+ int i;
+
+ stag = get_stagger();
+
+ for (i = 0; i < num_counters; ++i) {
+ CCCR_READ(low, high, VIRT_CTR(stag, i));
+ CCCR_SET_DISABLE(low);
+ CCCR_WRITE(low, high, VIRT_CTR(stag, i));
+ }
+}
+
+
+#ifdef CONFIG_SMP
+struct op_x86_model_spec const op_p4_ht2_spec = {
+ .num_counters = NUM_COUNTERS_HT2,
+ .num_controls = NUM_CONTROLS_HT2,
+ .fill_in_addresses = &p4_fill_in_addresses,
+ .setup_ctrs = &p4_setup_ctrs,
+ .check_ctrs = &p4_check_ctrs,
+ .start = &p4_start,
+ .stop = &p4_stop
+};
+#endif
+
+struct op_x86_model_spec const op_p4_spec = {
+ .num_counters = NUM_COUNTERS_NON_HT,
+ .num_controls = NUM_CONTROLS_NON_HT,
+ .fill_in_addresses = &p4_fill_in_addresses,
+ .setup_ctrs = &p4_setup_ctrs,
+ .check_ctrs = &p4_check_ctrs,
+ .start = &p4_start,
+ .stop = &p4_stop
+};
diff -Naur xen-unstable/xen/arch/x86/oprofile/op_model_ppro.c xen-unstable-prof-1.2/xen/arch/x86/oprofile/op_model_ppro.c
--- xen-unstable/xen/arch/x86/oprofile/op_model_ppro.c 1969-12-31 16:00:00.000000000 -0800
+++ xen-unstable-prof-1.2/xen/arch/x86/oprofile/op_model_ppro.c 2005-05-24 11:54:47.000000000 -0700
@@ -0,0 +1,166 @@
+/**
+ * @file op_model_ppro.h
+ * pentium pro / P6 model-specific MSR operations
+ *
+ * @remark Copyright 2002 OProfile authors
+ * @remark Read the file COPYING
+ *
+ * @author John Levon
+ * @author Philippe Elie
+ * @author Graydon Hoare
+ *
+ * Modified by Aravind Menon for Xen
+ * These modifications are:
+ * Copyright (C) 2005 Hewlett-Packard Co.
+ */
+
+#include <xen/types.h>
+#include <asm/msr.h>
+#include <asm/io.h>
+#include <asm/apic.h>
+#include <asm/processor.h>
+#include <xen/sched.h>
+
+#include "op_x86_model.h"
+#include "op_counter.h"
+
+#define NUM_COUNTERS 2
+#define NUM_CONTROLS 2
+
+#define CTR_READ(l,h,msrs,c) do {rdmsr(msrs->counters[(c)].addr, (l), (h));} while (0)
+#define CTR_WRITE(l,msrs,c) do {wrmsr(msrs->counters[(c)].addr, -(u32)(l), -1);} while (0)
+#define CTR_OVERFLOWED(n) (!((n) & (1U<<31)))
+
+#define CTRL_READ(l,h,msrs,c) do {rdmsr((msrs->controls[(c)].addr), (l), (h));} while (0)
+#define CTRL_WRITE(l,h,msrs,c) do {wrmsr((msrs->controls[(c)].addr), (l), (h));} while (0)
+#define CTRL_SET_ACTIVE(n) (n |= (1<<22))
+#define CTRL_SET_INACTIVE(n) (n &= ~(1<<22))
+#define CTRL_CLEAR(x) (x &= (1<<21))
+#define CTRL_SET_ENABLE(val) (val |= 1<<20)
+#define CTRL_SET_USR(val,u) (val |= ((u & 1) << 16))
+#define CTRL_SET_KERN(val,k) (val |= ((k & 1) << 17))
+#define CTRL_SET_UM(val, m) (val |= (m << 8))
+#define CTRL_SET_EVENT(val, e) (val |= e)
+
+static unsigned long reset_value[NUM_COUNTERS];
+
+static void ppro_fill_in_addresses(struct op_msrs * const msrs)
+{
+ msrs->counters[0].addr = MSR_P6_PERFCTR0;
+ msrs->counters[1].addr = MSR_P6_PERFCTR1;
+
+ msrs->controls[0].addr = MSR_P6_EVNTSEL0;
+ msrs->controls[1].addr = MSR_P6_EVNTSEL1;
+}
+
+
+static void ppro_setup_ctrs(struct op_msrs const * const msrs)
+{
+ unsigned int low, high;
+ int i;
+
+ /* clear all counters */
+ for (i = 0 ; i < NUM_CONTROLS; ++i) {
+ CTRL_READ(low, high, msrs, i);
+ CTRL_CLEAR(low);
+ CTRL_WRITE(low, high, msrs, i);
+ }
+
+ /* avoid a false detection of ctr overflows in NMI handler */
+ for (i = 0; i < NUM_COUNTERS; ++i) {
+ CTR_WRITE(1, msrs, i);
+ }
+
+ /* enable active counters */
+ for (i = 0; i < NUM_COUNTERS; ++i) {
+ if (counter_config[i].enabled) {
+ reset_value[i] = counter_config[i].count;
+
+ CTR_WRITE(counter_config[i].count, msrs, i);
+
+ CTRL_READ(low, high, msrs, i);
+ CTRL_CLEAR(low);
+ CTRL_SET_ENABLE(low);
+ CTRL_SET_USR(low, counter_config[i].user);
+ CTRL_SET_KERN(low, counter_config[i].kernel);
+ CTRL_SET_UM(low, counter_config[i].unit_mask);
+ CTRL_SET_EVENT(low, counter_config[i].event);
+ CTRL_WRITE(low, high, msrs, i);
+ }
+ }
+}
+
+extern void pmc_log_event(struct domain *d, unsigned int eip, int mode, int event);
+extern int is_profiled(struct domain * d);
+extern struct domain * primary_profiler;
+
+static int ppro_check_ctrs(unsigned int const cpu,
+ struct op_msrs const * const msrs,
+ struct cpu_user_regs * const regs)
+{
+ unsigned int low, high;
+ int i, ovf = 0;
+ unsigned long eip = regs->eip;
+ int mode = 0;
+
+ if (RING_1(regs))
+ mode = 1;
+ else if (RING_0(regs))
+ mode = 2;
+
+ for (i = 0 ; i < NUM_COUNTERS; ++i) {
+ CTR_READ(low, high, msrs, i);
+ if (CTR_OVERFLOWED(low)) {
+ pmc_log_event(current->domain, eip, mode, i);
+ CTR_WRITE(reset_value[i], msrs, i);
+ ovf = 1;
+ }
+ }
+
+ /* Only P6 based Pentium M need to re-unmask the apic vector but it
+ * doesn't hurt other P6 variant */
+ apic_write(APIC_LVTPC, apic_read(APIC_LVTPC) & ~APIC_LVT_MASKED);
+
+ /* We can't work out if we really handled an interrupt. We
+ * might have caught a *second* counter just after overflowing
+ * the interrupt for this counter then arrives
+ * and we don't find a counter that's overflowed, so we
+ * would return 0 and get dazed + confused. Instead we always
+ * assume we found an overflow. This sucks.
+ */
+ return ovf;
+}
+
+
+static void ppro_start(struct op_msrs const * const msrs)
+{
+ unsigned int low,high;
+ CTRL_READ(low, high, msrs, 0);
+ CTRL_SET_ACTIVE(low);
+ CTRL_WRITE(low, high, msrs, 0);
+}
+
+static void ppro_stop(struct op_msrs const * const msrs)
+{
+ unsigned int low,high;
+ CTRL_READ(low, high, msrs, 0);
+ CTRL_SET_INACTIVE(low);
+ CTRL_WRITE(low, high, msrs, 0);
+}
+
+unsigned int read_ctr(struct op_msrs const * const msrs, int i)
+{
+ unsigned int low, high;
+ CTR_READ(low, high, msrs, i);
+ return low;
+}
+
+struct op_x86_model_spec const op_ppro_spec = {
+ .num_counters = NUM_COUNTERS,
+ .num_controls = NUM_CONTROLS,
+ .fill_in_addresses = &ppro_fill_in_addresses,
+ .setup_ctrs = &ppro_setup_ctrs,
+ .check_ctrs = &ppro_check_ctrs,
+ .start = &ppro_start,
+ .stop = &ppro_stop
+};
diff -Naur xen-unstable/xen/arch/x86/oprofile/op_x86_model.h xen-unstable-prof-1.2/xen/arch/x86/oprofile/op_x86_model.h
--- xen-unstable/xen/arch/x86/oprofile/op_x86_model.h 1969-12-31 16:00:00.000000000 -0800
+++ xen-unstable-prof-1.2/xen/arch/x86/oprofile/op_x86_model.h 2005-05-24 11:54:47.000000000 -0700
@@ -0,0 +1,55 @@
+/**
+ * @file op_x86_model.h
+ * interface to x86 model-specific MSR operations
+ *
+ * @remark Copyright 2002 OProfile authors
+ * @remark Read the file COPYING
+ *
+ * @author Graydon Hoare
+ *
+ * Modified by Aravind Menon for Xen
+ * These modifications are:
+ * Copyright (C) 2005 Hewlett-Packard Co.
+ */
+
+#ifndef OP_X86_MODEL_H
+#define OP_X86_MODEL_H
+
+struct op_saved_msr {
+ unsigned int high;
+ unsigned int low;
+};
+
+struct op_msr {
+ unsigned long addr;
+ struct op_saved_msr saved;
+};
+
+struct op_msrs {
+ struct op_msr * counters;
+ struct op_msr * controls;
+};
+
+struct pt_regs;
+
+/* The model vtable abstracts the differences between
+ * various x86 CPU model's perfctr support.
+ */
+struct op_x86_model_spec {
+ unsigned int const num_counters;
+ unsigned int const num_controls;
+ void (*fill_in_addresses)(struct op_msrs * const msrs);
+ void (*setup_ctrs)(struct op_msrs const * const msrs);
+ int (*check_ctrs)(unsigned int const cpu,
+ struct op_msrs const * const msrs,
+ struct cpu_user_regs * const regs);
+ void (*start)(struct op_msrs const * const msrs);
+ void (*stop)(struct op_msrs const * const msrs);
+};
+
+extern struct op_x86_model_spec const op_ppro_spec;
+extern struct op_x86_model_spec const op_p4_spec;
+extern struct op_x86_model_spec const op_p4_ht2_spec;
+extern struct op_x86_model_spec const op_athlon_spec;
+
+#endif /* OP_X86_MODEL_H */
diff -Naur xen-unstable/xen/arch/x86/oprofile/pmc.c xen-unstable-prof-1.2/xen/arch/x86/oprofile/pmc.c
--- xen-unstable/xen/arch/x86/oprofile/pmc.c 1969-12-31 16:00:00.000000000 -0800
+++ xen-unstable-prof-1.2/xen/arch/x86/oprofile/pmc.c 2005-06-03 13:38:12.000000000 -0700
@@ -0,0 +1,295 @@
+/*
+ * Copyright (C) 2005 Hewlett-Packard Co.
+ * written by Aravind Menon, email: xenoprof@groups.hp.com
+ */
+
+#include <xen/sched.h>
+
+#include "op_counter.h"
+
+int active_domains[MAX_OPROF_DOMAINS];
+int passive_domains[MAX_OPROF_DOMAINS];
+unsigned int adomains = 0;
+unsigned int pdomains = 0;
+unsigned int activated = 0;
+
+#define VIRQ_BITMASK_SIZE (MAX_OPROF_DOMAINS/32 + 1)
+
+struct domain * primary_profiler = NULL;
+struct domain * adomain_ptrs[MAX_OPROF_DOMAINS];
+unsigned int virq_ovf_pending[VIRQ_BITMASK_SIZE];
+
+int is_active(struct domain *d)
+{
+ int i;
+ for (i = 0; i < adomains; i++)
+ if (d->domain_id == active_domains[i])
+ return 1;
+ return 0;
+}
+
+int active_id(struct domain *d)
+{
+ int i;
+ for (i = 0; i < adomains; i++)
+ if (d == adomain_ptrs[i])
+ return i;
+ return -1;
+}
+
+void free_adomain_ptrs()
+{
+ int i;
+ int num = adomains;
+
+ adomains = 0;
+ for (i = 0; i < VIRQ_BITMASK_SIZE; i++)
+ virq_ovf_pending[i] = 0;
+
+ for (i = 0; i < num; i++) {
+ put_domain(adomain_ptrs[i]);
+ adomain_ptrs[i] = NULL;
+ }
+}
+
+int set_adomain_ptrs(int num)
+{
+ int i;
+ struct domain *d;
+
+ for (i = 0; i < VIRQ_BITMASK_SIZE; i++)
+ virq_ovf_pending[i] = 0;
+
+ for (i = 0; i < num; i++) {
+ d = find_domain_by_id(active_domains[i]);
+ if (!d) {
+ free_adomain_ptrs();
+ return -EFAULT;
+ }
+ adomain_ptrs[i] = d;
+ adomains++;
+ }
+ return 0;
+}
+
+int set_active(struct domain *d)
+{
+ if (is_active(d))
+ return 0;
+ /* hack if we run out of space */
+ if (adomains >= MAX_OPROF_DOMAINS) {
+ adomains--;
+ put_domain(adomain_ptrs[adomains]);
+ }
+ active_domains[adomains] = d->domain_id;
+ if (get_domain(d))
+ adomain_ptrs[adomains++] = d;
+ else {
+ free_adomain_ptrs();
+ return -EFAULT;
+ }
+ return 0;
+}
+
+int is_passive(struct domain *d)
+{
+ int i;
+ for (i = 0; i < pdomains; i++)
+ if (d->domain_id == passive_domains[i])
+ return 1;
+ return 0;
+}
+
+int is_profiled(struct domain *d)
+{
+ if (is_active(d) || is_passive(d))
+ return 1;
+ return 0;
+}
+
+void pmc_log_event(struct domain *d, unsigned int eip, int mode, int event)
+{
+ shared_info_t *s = NULL;
+ struct domain *dest = d;
+ int head;
+ int tail;
+
+ if (!is_profiled(d))
+ return;
+
+ if (!is_passive(d)) {
+ s = dest->shared_info;
+ head = s->event_head;
+ tail = s->event_tail;
+ if ((head == tail - 1) ||
+ (head == MAX_OPROF_EVENTS - 1 && tail == 0)) {
+ s->losing_samples = 1;
+ s->samples_lost++;
+ }
+ else {
+ s->event_log[head].eip = eip;
+ s->event_log[head].mode = mode;
+ s->event_log[head].event = event;
+ head++;
+ if (head >= MAX_OPROF_EVENTS)
+ head = 0;
+ s->event_head = head;
+ }
+ }
+ /* passive domains */
+ else {
+ dest = primary_profiler;
+ s = dest->shared_info;
+ head = s->event_head;
+ tail = s->event_tail;
+
+ /* We use the following inefficient format for logging
+ events from other domains. We put a special record
+ indicating that the next record is for another domain.
+ This is done for each sample from another domain */
+
+ head = s->event_head;
+ if (head >= MAX_OPROF_EVENTS)
+ head = 0;
+ /* for passive domains we need to have at least two
+ entries empty in the buffer */
+ if ((head == tail - 1) ||
+ (head == tail - 2) ||
+ (head == MAX_OPROF_EVENTS - 1 && tail <= 1) ||
+ (head == MAX_OPROF_EVENTS - 2 && tail == 0) ) {
+ s->losing_samples = 1;
+ s->samples_lost++;
+ }
+ else {
+ s->event_log[head].eip = ~1UL;
+ s->event_log[head].mode = ~0;
+ s->event_log[head].event = d->domain_id;
+ head++;
+ if (head >= MAX_OPROF_EVENTS)
+ head = 0;
+ s->event_log[head].eip = eip;
+ s->event_log[head].mode = mode;
+ s->event_log[head].event = event;
+ head++;
+ if (head >= MAX_OPROF_EVENTS)
+ head = 0;
+ s->event_head = head;
+ }
+ }
+}
+
+static void pmc_event_init(struct domain *d)
+{
+ shared_info_t *s = d->shared_info;
+ s->event_head = 0;
+ s->event_tail = 0;
+ s->losing_samples = 0;
+ s->samples_lost = 0;
+ s->nmi_restarts = 0;
+ s->active_samples = 0;
+ s->passive_samples = 0;
+ s->other_samples = 0;
+}
+
+extern int nmi_init(int *num_events, int *is_primary);
+extern int nmi_reserve_counters(void);
+extern int nmi_setup_events(void);
+extern int nmi_enable_virq(void);
+extern int nmi_start(void);
+extern void nmi_stop(void);
+extern void nmi_disable_virq(void);
+extern void nmi_release_counters(void);
+
+#define PRIV_OP(op) ((op == PMC_SET_ACTIVE) || (op == PMC_SET_PASSIVE) || (op == PMC_RESERVE_COUNTERS) \
+ || (op == PMC_SETUP_EVENTS) || (op == PMC_START) || (op == PMC_STOP) \
+ || (op == PMC_RELEASE_COUNTERS) || (op == PMC_SHUTDOWN))
+
+int do_pmc_op(int op, unsigned int arg1, unsigned int arg2)
+{
+ int ret = 0;
+
+ if (PRIV_OP(op) && current->domain != primary_profiler)
+ return -EPERM;
+
+ switch (op) {
+ case PMC_INIT:
+ ret = nmi_init((int *)arg1, (int *)arg2);
+ break;
+
+ case PMC_SET_ACTIVE:
+ if (adomains != 0)
+ return -EPERM;
+ if (copy_from_user((void *)&active_domains,
+ (void *)arg1, arg2*sizeof(int)))
+ return -EFAULT;
+ if (set_adomain_ptrs(arg2))
+ return -EFAULT;
+ if (set_active(current->domain))
+ return -EFAULT;
+ break;
+
+ case PMC_SET_PASSIVE:
+ if (pdomains != 0)
+ return -EPERM;
+ if (copy_from_user((void *)&passive_domains,
+ (void *)arg1, arg2*sizeof(int)))
+ return -EFAULT;
+ pdomains = arg2;
+ break;
+
+ case PMC_RESERVE_COUNTERS:
+ ret = nmi_reserve_counters();
+ break;
+
+ case PMC_SETUP_EVENTS:
+ if (copy_from_user((void *)&counter_config,
+ (void *)arg1, arg2*sizeof(struct op_counter_config)))
+ return -EFAULT;
+ ret = nmi_setup_events();
+ break;
+
+ case PMC_ENABLE_VIRQ:
+ if (!is_active(current->domain)) {
+ if (current->domain != primary_profiler)
+ return -EPERM;
+ else
+ set_active(current->domain);
+ }
+ ret = nmi_enable_virq();
+ pmc_event_init(current->domain);
+ activated++;
+ break;
+
+ case PMC_START:
+ if (activated < adomains)
+ return -EPERM;
+ ret = nmi_start();
+ break;
+
+ case PMC_STOP:
+ nmi_stop();
+ break;
+
+ case PMC_DISABLE_VIRQ:
+ if (!is_active(current->domain))
+ return -EPERM;
+ nmi_disable_virq();
+ activated--;
+ break;
+
+ case PMC_RELEASE_COUNTERS:
+ nmi_release_counters();
+ break;
+
+ case PMC_SHUTDOWN:
+ free_adomain_ptrs();
+ pdomains = 0;
+ activated = 0;
+ primary_profiler = NULL;
+ break;
+
+ default:
+ ret = -EINVAL;
+ }
+ return ret;
+}
diff -Naur xen-unstable/xen/arch/x86/traps.c xen-unstable-prof-1.2/xen/arch/x86/traps.c
--- xen-unstable/xen/arch/x86/traps.c 2005-05-22 20:12:48.000000000 -0700
+++ xen-unstable-prof-1.2/xen/arch/x86/traps.c 2005-06-03 14:05:51.000000000 -0700
@@ -2,6 +2,10 @@
* arch/x86/traps.c
*
* Modifications to Linux original are copyright (c) 2002-2004, K A Fraser
+ *
+ * Modified by Aravind Menon for supporting oprofile
+ * These modifications are:
+ * Copyright (C) 2005 Hewlett-Packard Co.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
@@ -53,6 +57,7 @@
#include <asm/debugger.h>
#include <asm/msr.h>
#include <asm/x86_emulate.h>
+#include <asm/nmi.h>
/*
* opt_nmi: one of 'ignore', 'dom0', or 'fatal'.
@@ -1026,7 +1031,7 @@
printk("Do you have a strange power saving mode enabled?\n");
}
-asmlinkage void do_nmi(struct cpu_user_regs *regs, unsigned long reason)
+static void default_do_nmi(struct cpu_user_regs * regs, unsigned long reason)
{
++nmi_count(smp_processor_id());
@@ -1041,6 +1046,35 @@
unknown_nmi_error((unsigned char)(reason&0xff));
}
+static int dummy_nmi_callback(struct cpu_user_regs * regs, int cpu)
+{
+ return 0;
+}
+
+static nmi_callback_t nmi_callback = dummy_nmi_callback;
+
+asmlinkage void do_nmi(struct cpu_user_regs * regs, unsigned long reason)
+{
+ int cpu;
+ cpu = smp_processor_id();
+
+ if (!nmi_callback(regs, cpu))
+ default_do_nmi(regs, reason);
+}
+
+void set_nmi_callback(nmi_callback_t callback)
+{
+ nmi_callback = callback;
+}
+
+void unset_nmi_callback(void)
+{
+ nmi_callback = dummy_nmi_callback;
+}
+
+EXPORT_SYMBOL(set_nmi_callback);
+EXPORT_SYMBOL(unset_nmi_callback);
+
asmlinkage int math_state_restore(struct cpu_user_regs *regs)
{
/* Prevent recursion. */
diff -Naur xen-unstable/xen/arch/x86/x86_32/entry.S xen-unstable-prof-1.2/xen/arch/x86/x86_32/entry.S
--- xen-unstable/xen/arch/x86/x86_32/entry.S 2005-05-22 20:12:52.000000000 -0700
+++ xen-unstable-prof-1.2/xen/arch/x86/x86_32/entry.S 2005-05-27 10:08:27.000000000 -0700
@@ -749,6 +749,7 @@
.long do_boot_vcpu
.long do_ni_hypercall /* 25 */
.long do_mmuext_op
+ .long do_pmc_op
.rept NR_hypercalls-((.-hypercall_table)/4)
.long do_ni_hypercall
.endr
diff -Naur xen-unstable/xen/include/asm-x86/msr.h xen-unstable-prof-1.2/xen/include/asm-x86/msr.h
--- xen-unstable/xen/include/asm-x86/msr.h 2005-05-22 20:12:52.000000000 -0700
+++ xen-unstable-prof-1.2/xen/include/asm-x86/msr.h 2005-05-24 11:54:47.000000000 -0700
@@ -191,6 +191,89 @@
#define MSR_P6_EVNTSEL0 0x186
#define MSR_P6_EVNTSEL1 0x187
+/* Pentium IV performance counter MSRs */
+#define MSR_P4_BPU_PERFCTR0 0x300
+#define MSR_P4_BPU_PERFCTR1 0x301
+#define MSR_P4_BPU_PERFCTR2 0x302
+#define MSR_P4_BPU_PERFCTR3 0x303
+#define MSR_P4_MS_PERFCTR0 0x304
+#define MSR_P4_MS_PERFCTR1 0x305
+#define MSR_P4_MS_PERFCTR2 0x306
+#define MSR_P4_MS_PERFCTR3 0x307
+#define MSR_P4_FLAME_PERFCTR0 0x308
+#define MSR_P4_FLAME_PERFCTR1 0x309
+#define MSR_P4_FLAME_PERFCTR2 0x30a
+#define MSR_P4_FLAME_PERFCTR3 0x30b
+#define MSR_P4_IQ_PERFCTR0 0x30c
+#define MSR_P4_IQ_PERFCTR1 0x30d
+#define MSR_P4_IQ_PERFCTR2 0x30e
+#define MSR_P4_IQ_PERFCTR3 0x30f
+#define MSR_P4_IQ_PERFCTR4 0x310
+#define MSR_P4_IQ_PERFCTR5 0x311
+#define MSR_P4_BPU_CCCR0 0x360
+#define MSR_P4_BPU_CCCR1 0x361
+#define MSR_P4_BPU_CCCR2 0x362
+#define MSR_P4_BPU_CCCR3 0x363
+#define MSR_P4_MS_CCCR0 0x364
+#define MSR_P4_MS_CCCR1 0x365
+#define MSR_P4_MS_CCCR2 0x366
+#define MSR_P4_MS_CCCR3 0x367
+#define MSR_P4_FLAME_CCCR0 0x368
+#define MSR_P4_FLAME_CCCR1 0x369
+#define MSR_P4_FLAME_CCCR2 0x36a
+#define MSR_P4_FLAME_CCCR3 0x36b
+#define MSR_P4_IQ_CCCR0 0x36c
+#define MSR_P4_IQ_CCCR1 0x36d
+#define MSR_P4_IQ_CCCR2 0x36e
+#define MSR_P4_IQ_CCCR3 0x36f
+#define MSR_P4_IQ_CCCR4 0x370
+#define MSR_P4_IQ_CCCR5 0x371
+#define MSR_P4_ALF_ESCR0 0x3ca
+#define MSR_P4_ALF_ESCR1 0x3cb
+#define MSR_P4_BPU_ESCR0 0x3b2
+#define MSR_P4_BPU_ESCR1 0x3b3
+#define MSR_P4_BSU_ESCR0 0x3a0
+#define MSR_P4_BSU_ESCR1 0x3a1
+#define MSR_P4_CRU_ESCR0 0x3b8
+#define MSR_P4_CRU_ESCR1 0x3b9
+#define MSR_P4_CRU_ESCR2 0x3cc
+#define MSR_P4_CRU_ESCR3 0x3cd
+#define MSR_P4_CRU_ESCR4 0x3e0
+#define MSR_P4_CRU_ESCR5 0x3e1
+#define MSR_P4_DAC_ESCR0 0x3a8
+#define MSR_P4_DAC_ESCR1 0x3a9
+#define MSR_P4_FIRM_ESCR0 0x3a4
+#define MSR_P4_FIRM_ESCR1 0x3a5
+#define MSR_P4_FLAME_ESCR0 0x3a6
+#define MSR_P4_FLAME_ESCR1 0x3a7
+#define MSR_P4_FSB_ESCR0 0x3a2
+#define MSR_P4_FSB_ESCR1 0x3a3
+#define MSR_P4_IQ_ESCR0 0x3ba
+#define MSR_P4_IQ_ESCR1 0x3bb
+#define MSR_P4_IS_ESCR0 0x3b4
+#define MSR_P4_IS_ESCR1 0x3b5
+#define MSR_P4_ITLB_ESCR0 0x3b6
+#define MSR_P4_ITLB_ESCR1 0x3b7
+#define MSR_P4_IX_ESCR0 0x3c8
+#define MSR_P4_IX_ESCR1 0x3c9
+#define MSR_P4_MOB_ESCR0 0x3aa
+#define MSR_P4_MOB_ESCR1 0x3ab
+#define MSR_P4_MS_ESCR0 0x3c0
+#define MSR_P4_MS_ESCR1 0x3c1
+#define MSR_P4_PMH_ESCR0 0x3ac
+#define MSR_P4_PMH_ESCR1 0x3ad
+#define MSR_P4_RAT_ESCR0 0x3bc
+#define MSR_P4_RAT_ESCR1 0x3bd
+#define MSR_P4_SAAT_ESCR0 0x3ae
+#define MSR_P4_SAAT_ESCR1 0x3af
+#define MSR_P4_SSU_ESCR0 0x3be
+#define MSR_P4_SSU_ESCR1 0x3bf /* guess: not defined in manual */
+#define MSR_P4_TBPU_ESCR0 0x3c2
+#define MSR_P4_TBPU_ESCR1 0x3c3
+#define MSR_P4_TC_ESCR0 0x3c4
+#define MSR_P4_TC_ESCR1 0x3c5
+#define MSR_P4_U2L_ESCR0 0x3b0
+#define MSR_P4_U2L_ESCR1 0x3b1
/* K7/K8 MSRs. Not complete. See the architecture manual for a more complete list. */
#define MSR_K7_EVNTSEL0 0xC0010000
diff -Naur xen-unstable/xen/include/asm-x86/nmi.h xen-unstable-prof-1.2/xen/include/asm-x86/nmi.h
--- xen-unstable/xen/include/asm-x86/nmi.h 1969-12-31 16:00:00.000000000 -0800
+++ xen-unstable-prof-1.2/xen/include/asm-x86/nmi.h 2005-05-24 11:54:47.000000000 -0700
@@ -0,0 +1,26 @@
+/*
+ * linux/include/asm-i386/nmi.h
+ */
+#ifndef ASM_NMI_H
+#define ASM_NMI_H
+
+struct cpu_user_regs;
+
+typedef int (*nmi_callback_t)(struct cpu_user_regs * regs, int cpu);
+
+/**
+ * set_nmi_callback
+ *
+ * Set a handler for an NMI. Only one handler may be
+ * set. Return 1 if the NMI was handled.
+ */
+void set_nmi_callback(nmi_callback_t callback);
+
+/**
+ * unset_nmi_callback
+ *
+ * Remove the handler previously set.
+ */
+void unset_nmi_callback(void);
+
+#endif /* ASM_NMI_H */
diff -Naur xen-unstable/xen/include/public/xen.h xen-unstable-prof-1.2/xen/include/public/xen.h
--- xen-unstable/xen/include/public/xen.h 2005-05-22 20:12:45.000000000 -0700
+++ xen-unstable-prof-1.2/xen/include/public/xen.h 2005-05-24 11:54:47.000000000 -0700
@@ -4,6 +4,10 @@
* Guest OS interface to Xen.
*
* Copyright (c) 2004, K A Fraser
+ *
+ * Modified by Aravind Menon for supporting oprofile
+ * These modifications are:
+ * Copyright (C) 2005 Hewlett-Packard Co.
*/
#ifndef __XEN_PUBLIC_XEN_H__
@@ -58,6 +62,7 @@
#define __HYPERVISOR_boot_vcpu 24
#define __HYPERVISOR_set_segment_base 25 /* x86/64 only */
#define __HYPERVISOR_mmuext_op 26
+#define __HYPERVISOR_pmc_op 27
/*
* MULTICALLS
@@ -80,6 +85,7 @@
#define VIRQ_DOM_EXC 3 /* (DOM0) Exceptional event for some domain. */
#define VIRQ_PARITY_ERR 4 /* (DOM0) NMI parity error. */
#define VIRQ_IO_ERR 5 /* (DOM0) NMI I/O error. */
+#define VIRQ_PMC_OVF 6 /* PMC Overflow */
#define NR_VIRQS 7
/*
@@ -244,6 +250,21 @@
#define VMASST_TYPE_writable_pagetables 2
#define MAX_VMASST_TYPE 2
+/*
+ * Commands to HYPERVISOR_pmc_op().
+ */
+#define PMC_INIT 0
+#define PMC_SET_ACTIVE 1
+#define PMC_SET_PASSIVE 2
+#define PMC_RESERVE_COUNTERS 3
+#define PMC_SETUP_EVENTS 4
+#define PMC_ENABLE_VIRQ 5
+#define PMC_START 6
+#define PMC_STOP 7
+#define PMC_DISABLE_VIRQ 8
+#define PMC_RELEASE_COUNTERS 9
+#define PMC_SHUTDOWN 10
+
#ifndef __ASSEMBLY__
typedef u16 domid_t;
@@ -299,6 +320,8 @@
/* Support for multi-processor guests. */
#define MAX_VIRT_CPUS 32
+#define MAX_OPROF_EVENTS 32
+#define MAX_OPROF_DOMAINS 25
/*
* Per-VCPU information goes here. This will be cleaned up more when Xen
* actually supports multi-VCPU guests.
@@ -412,6 +435,20 @@
arch_shared_info_t arch;
+ /* Oprofile structures */
+ u8 event_head;
+ u8 event_tail;
+ struct {
+ u32 eip;
+ u8 mode;
+ u8 event;
+ } PACKED event_log[MAX_OPROF_EVENTS];
+ u8 losing_samples;
+ u64 samples_lost;
+ u32 nmi_restarts;
+ u64 active_samples;
+ u64 passive_samples;
+ u64 other_samples;
} PACKED shared_info_t;
/*
[-- Attachment #3: Type: text/plain, Size: 138 bytes --]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2005-06-04 7:20 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-05-31 16:37 [PATCH] Xenoprof patches for xen-unstable Santos, Jose Renato G
2005-05-31 17:37 ` Andrew Theurer
-- strict thread matches above, loose matches on Subject: below --
2005-06-04 7:20 Santos, Jose Renato G
2005-05-31 18:22 Santos, Jose Renato G
2005-05-31 18:35 ` Andrew Theurer
2005-05-31 16:35 Santos, Jose Renato G
2005-05-31 17:41 ` Andrew Theurer
2005-05-25 20:35 Santos, Jose Renato G
2005-05-25 21:22 ` Nivedita Singhvi
2005-05-25 19:10 Santos, Jose Renato G
2005-05-25 19:22 ` Nivedita Singhvi
2005-05-31 16:08 ` Andrew Theurer
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.