* [PATCH/RFC 1/4]Introduce a new field "guest" in cpustat
[not found] <46C4719A.2060308@bull.net>
@ 2007-08-16 15:57 ` Laurent Vivier
[not found] ` <46C4720F.7030304@bull.net>
1 sibling, 0 replies; 22+ messages in thread
From: Laurent Vivier @ 2007-08-16 15:57 UTC (permalink / raw)
To: kvm-devel; +Cc: Ingo Molnar, Rusty Russell, virtualization, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 386 bytes --]
[PATCH 1/4] as modern CPUs introduce a third running state, after "user" and
"system", we need a new field, "guest", in cpustat to store the time used by
the CPU to run virtual CPU. Modify /proc/stat to display this new field.
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
--
------------- Laurent.Vivier@bull.net --------------
"Software is hard" - Donald Knuth
[-- Attachment #2: proc_stat_guest --]
[-- Type: text/plain, Size: 3000 bytes --]
Index: kvm/fs/proc/proc_misc.c
===================================================================
--- kvm.orig/fs/proc/proc_misc.c 2007-08-10 16:49:42.000000000 +0200
+++ kvm/fs/proc/proc_misc.c 2007-08-10 16:51:34.000000000 +0200
@@ -443,6 +443,7 @@
int i;
unsigned long jif;
cputime64_t user, nice, system, idle, iowait, irq, softirq, steal;
+ cputime64_t guest;
u64 sum = 0;
struct timespec boottime;
unsigned int *per_irq_sum;
@@ -452,7 +453,7 @@
return -ENOMEM;
user = nice = system = idle = iowait =
- irq = softirq = steal = cputime64_zero;
+ irq = softirq = steal = guest = cputime64_zero;
getboottime(&boottime);
jif = boottime.tv_sec;
@@ -467,6 +468,7 @@
irq = cputime64_add(irq, kstat_cpu(i).cpustat.irq);
softirq = cputime64_add(softirq, kstat_cpu(i).cpustat.softirq);
steal = cputime64_add(steal, kstat_cpu(i).cpustat.steal);
+ guest = cputime64_add(guest, kstat_cpu(i).cpustat.guest);
for (j = 0; j < NR_IRQS; j++) {
unsigned int temp = kstat_cpu(i).irqs[j];
sum += temp;
@@ -474,7 +476,7 @@
}
}
- seq_printf(p, "cpu %llu %llu %llu %llu %llu %llu %llu %llu\n",
+ seq_printf(p, "cpu %llu %llu %llu %llu %llu %llu %llu %llu %llu\n",
(unsigned long long)cputime64_to_clock_t(user),
(unsigned long long)cputime64_to_clock_t(nice),
(unsigned long long)cputime64_to_clock_t(system),
@@ -482,7 +484,8 @@
(unsigned long long)cputime64_to_clock_t(iowait),
(unsigned long long)cputime64_to_clock_t(irq),
(unsigned long long)cputime64_to_clock_t(softirq),
- (unsigned long long)cputime64_to_clock_t(steal));
+ (unsigned long long)cputime64_to_clock_t(steal),
+ (unsigned long long)cputime64_to_clock_t(guest));
for_each_online_cpu(i) {
/* Copy values here to work around gcc-2.95.3, gcc-2.96 */
@@ -494,7 +497,8 @@
irq = kstat_cpu(i).cpustat.irq;
softirq = kstat_cpu(i).cpustat.softirq;
steal = kstat_cpu(i).cpustat.steal;
- seq_printf(p, "cpu%d %llu %llu %llu %llu %llu %llu %llu %llu\n",
+ guest = kstat_cpu(i).cpustat.guest;
+ seq_printf(p, "cpu%d %llu %llu %llu %llu %llu %llu %llu %llu %llu\n",
i,
(unsigned long long)cputime64_to_clock_t(user),
(unsigned long long)cputime64_to_clock_t(nice),
@@ -503,7 +507,8 @@
(unsigned long long)cputime64_to_clock_t(iowait),
(unsigned long long)cputime64_to_clock_t(irq),
(unsigned long long)cputime64_to_clock_t(softirq),
- (unsigned long long)cputime64_to_clock_t(steal));
+ (unsigned long long)cputime64_to_clock_t(steal),
+ (unsigned long long)cputime64_to_clock_t(guest));
}
seq_printf(p, "intr %llu", (unsigned long long)sum);
Index: kvm/include/linux/kernel_stat.h
===================================================================
--- kvm.orig/include/linux/kernel_stat.h 2007-08-10 16:49:42.000000000 +0200
+++ kvm/include/linux/kernel_stat.h 2007-08-10 16:49:59.000000000 +0200
@@ -23,6 +23,7 @@
cputime64_t idle;
cputime64_t iowait;
cputime64_t steal;
+ cputime64_t guest;
};
struct kernel_stat {
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH/RFC 2/4]Introduce a new field "guest" in task_struct
[not found] ` <46C4720F.7030304@bull.net>
@ 2007-08-16 15:57 ` Laurent Vivier
[not found] ` <46C4725A.4070607@bull.net>
1 sibling, 0 replies; 22+ messages in thread
From: Laurent Vivier @ 2007-08-16 15:57 UTC (permalink / raw)
To: kvm-devel; +Cc: Ingo Molnar, Rusty Russell, virtualization, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 334 bytes --]
PATCH 2/4] like for cpustat, introduce the "guest" and "cguest" fields for the
tasks. Modify
signal_struct and task_struct. Modify /proc/<pid>/stat to display these new field
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
--
------------- Laurent.Vivier@bull.net --------------
"Software is hard" - Donald Knuth
[-- Attachment #2: proc_task_stat_guest --]
[-- Type: text/plain, Size: 5190 bytes --]
Index: kvm/fs/proc/array.c
===================================================================
--- kvm.orig/fs/proc/array.c 2007-08-16 15:23:52.000000000 +0200
+++ kvm/fs/proc/array.c 2007-08-16 15:36:54.000000000 +0200
@@ -354,6 +354,13 @@
return stime;
}
+static clock_t task_gtime(struct task_struct *p)
+{
+ clock_t gtime = cputime_to_clock_t(p->gtime);
+
+ return gtime;
+}
+
static int do_task_stat(struct task_struct *task, char *buffer, int whole)
{
unsigned long vsize, eip, esp, wchan = ~0UL;
@@ -368,8 +375,8 @@
unsigned long long start_time;
unsigned long cmin_flt = 0, cmaj_flt = 0;
unsigned long min_flt = 0, maj_flt = 0;
- cputime_t cutime, cstime;
- clock_t utime, stime;
+ cputime_t cutime, cstime, cgtime;
+ clock_t utime, stime, gtime;
unsigned long rsslim = 0;
char tcomm[sizeof(task->comm)];
unsigned long flags;
@@ -387,8 +394,8 @@
sigemptyset(&sigign);
sigemptyset(&sigcatch);
- cutime = cstime = cputime_zero;
- utime = stime = 0;
+ cutime = cstime = cgtime = cputime_zero;
+ utime = stime = gtime = 0;
rcu_read_lock();
if (lock_task_sighand(task, &flags)) {
@@ -406,6 +413,7 @@
cmaj_flt = sig->cmaj_flt;
cutime = sig->cutime;
cstime = sig->cstime;
+ cgtime = sig->cgtime;
rsslim = sig->rlim[RLIMIT_RSS].rlim_cur;
/* add up live thread stats at the group level */
@@ -416,6 +424,7 @@
maj_flt += t->maj_flt;
utime += task_utime(t);
stime += task_stime(t);
+ gtime += task_gtime(t);
t = next_thread(t);
} while (t != task);
@@ -423,6 +432,7 @@
maj_flt += sig->maj_flt;
utime += cputime_to_clock_t(sig->utime);
stime += cputime_to_clock_t(sig->stime);
+ gtime += cputime_to_clock_t(sig->gtime);
}
sid = signal_session(sig);
@@ -440,6 +450,7 @@
maj_flt = task->maj_flt;
utime = task_utime(task);
stime = task_stime(task);
+ gtime = task_gtime(task);
}
/* scale priority and nice values from timeslices to -20..20 */
@@ -457,7 +468,7 @@
res = sprintf(buffer, "%d (%s) %c %d %d %d %d %d %u %lu \
%lu %lu %lu %lu %lu %ld %ld %ld %ld %d 0 %llu %lu %ld %lu %lu %lu %lu %lu \
-%lu %lu %lu %lu %lu %lu %lu %lu %d %d %u %u %llu\n",
+%lu %lu %lu %lu %lu %lu %lu %lu %d %d %u %u %llu %lu %ld\n",
task->pid,
tcomm,
state,
@@ -502,7 +513,10 @@
task_cpu(task),
task->rt_priority,
task->policy,
- (unsigned long long)delayacct_blkio_ticks(task));
+ (unsigned long long)delayacct_blkio_ticks(task),
+ gtime,
+ cputime_to_clock_t(cgtime)
+ );
if (mm)
mmput(mm);
return res;
Index: kvm/include/linux/sched.h
===================================================================
--- kvm.orig/include/linux/sched.h 2007-08-16 15:23:52.000000000 +0200
+++ kvm/include/linux/sched.h 2007-08-16 15:36:54.000000000 +0200
@@ -514,7 +514,7 @@
* Live threads maintain their own counters and add to these
* in __exit_signal, except for the group leader.
*/
- cputime_t utime, stime, cutime, cstime;
+ cputime_t utime, stime, gtime, cutime, cstime, cgtime;
unsigned long nvcsw, nivcsw, cnvcsw, cnivcsw;
unsigned long min_flt, maj_flt, cmin_flt, cmaj_flt;
unsigned long inblock, oublock, cinblock, coublock;
@@ -1018,7 +1018,7 @@
int __user *clear_child_tid; /* CLONE_CHILD_CLEARTID */
unsigned int rt_priority;
- cputime_t utime, stime;
+ cputime_t utime, stime, gtime;
unsigned long nvcsw, nivcsw; /* context switch counts */
struct timespec start_time; /* monotonic time */
struct timespec real_start_time; /* boot based time */
Index: kvm/kernel/exit.c
===================================================================
--- kvm.orig/kernel/exit.c 2007-08-16 15:23:52.000000000 +0200
+++ kvm/kernel/exit.c 2007-08-16 15:36:54.000000000 +0200
@@ -120,6 +120,7 @@
*/
sig->utime = cputime_add(sig->utime, tsk->utime);
sig->stime = cputime_add(sig->stime, tsk->stime);
+ sig->gtime = cputime_add(sig->gtime, tsk->gtime);
sig->min_flt += tsk->min_flt;
sig->maj_flt += tsk->maj_flt;
sig->nvcsw += tsk->nvcsw;
@@ -1255,6 +1256,11 @@
cputime_add(p->stime,
cputime_add(sig->stime,
sig->cstime)));
+ psig->cgtime =
+ cputime_add(psig->cgtime,
+ cputime_add(p->gtime,
+ cputime_add(sig->gtime,
+ sig->cgtime)));
psig->cmin_flt +=
p->min_flt + sig->min_flt + sig->cmin_flt;
psig->cmaj_flt +=
Index: kvm/kernel/fork.c
===================================================================
--- kvm.orig/kernel/fork.c 2007-08-16 15:23:52.000000000 +0200
+++ kvm/kernel/fork.c 2007-08-16 15:39:07.000000000 +0200
@@ -876,7 +876,8 @@
sig->leader = 0; /* session leadership doesn't inherit */
sig->tty_old_pgrp = NULL;
- sig->utime = sig->stime = sig->cutime = sig->cstime = cputime_zero;
+ sig->utime = sig->stime = sig->gtime = sig->cutime =
+ sig->cstime = sig->cgtime = cputime_zero;
sig->nvcsw = sig->nivcsw = sig->cnvcsw = sig->cnivcsw = 0;
sig->min_flt = sig->maj_flt = sig->cmin_flt = sig->cmaj_flt = 0;
sig->inblock = sig->oublock = sig->cinblock = sig->coublock = 0;
@@ -1045,6 +1046,7 @@
p->utime = cputime_zero;
p->stime = cputime_zero;
+ p->gtime = cputime_zero;
#ifdef CONFIG_TASK_XACCT
p->rchar = 0; /* I/O counter: bytes read */
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH/RFC 3/4]Introduce "account modifiers" mechanism
[not found] ` <46C4725A.4070607@bull.net>
@ 2007-08-16 15:58 ` Laurent Vivier
2007-08-16 22:39 ` Rusty Russell
[not found] ` <46C472D2.7000702@bull.net>
1 sibling, 1 reply; 22+ messages in thread
From: Laurent Vivier @ 2007-08-16 15:58 UTC (permalink / raw)
To: kvm-devel; +Cc: Ingo Molnar, Rusty Russell, virtualization, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 473 bytes --]
[PATCH 3/3] introduce "account modifiers" mechanism in the kernel allowing a
module to modify the collected accounting for a given task. This implementation
is based on the "preempt_notifier". "account_system_time()" and
"account_user_time()" can call functions registered by a module to modify the
cputime value.
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
--
------------- Laurent.Vivier@bull.net --------------
"Software is hard" - Donald Knuth
[-- Attachment #2: accounting_modifiers --]
[-- Type: text/plain, Size: 4091 bytes --]
Index: kvm/include/linux/sched.h
===================================================================
--- kvm.orig/include/linux/sched.h 2007-08-16 15:23:27.000000000 +0200
+++ kvm/include/linux/sched.h 2007-08-16 15:23:41.000000000 +0200
@@ -955,6 +955,10 @@
/* list of struct preempt_notifier: */
struct hlist_head preempt_notifiers;
#endif
+#ifdef CONFIG_ACCOUNT_MODIFIERS
+ /* list of struct account_modifiers: */
+ struct hlist_head account_modifiers;
+#endif
unsigned short ioprio;
#ifdef CONFIG_BLK_DEV_IO_TRACE
@@ -1873,6 +1877,31 @@
}
#endif
+#ifdef CONFIG_ACCOUNT_MODIFIERS
+struct account_modifier;
+struct account_ops {
+ cputime_t (*user_time)(struct account_modifier *,
+ struct task_struct *, cputime_t );
+ cputime_t (*system_time)(struct account_modifier *,
+ struct task_struct *, int, cputime_t);
+};
+
+struct account_modifier {
+ struct hlist_node link;
+ struct account_ops *ops;
+};
+
+void account_modifier_register(struct account_modifier *modifier);
+void account_modifier_unregister(struct account_modifier *modifier);
+
+static inline void account_modifier_init(struct account_modifier *modifier,
+ struct account_ops *ops)
+{
+ INIT_HLIST_NODE(&modifier->link);
+ modifier->ops = ops;
+}
+#endif
+
#endif /* __KERNEL__ */
#endif
Index: kvm/kernel/sched.c
===================================================================
--- kvm.orig/kernel/sched.c 2007-08-16 15:15:33.000000000 +0200
+++ kvm/kernel/sched.c 2007-08-16 15:23:28.000000000 +0200
@@ -1593,6 +1593,9 @@
#ifdef CONFIG_PREEMPT_NOTIFIERS
INIT_HLIST_HEAD(&p->preempt_notifiers);
#endif
+#ifdef CONFIG_ACCOUNT_MODIFIERS
+ INIT_HLIST_HEAD(&p->account_modifiers);
+#endif
/*
* We mark the process as running here, but have not actually
@@ -1731,6 +1734,62 @@
}
#endif
+#ifdef CONFIG_ACCOUNT_MODIFIERS
+
+void account_modifier_register(struct account_modifier *modifier)
+{
+ hlist_add_head(&modifier->link, ¤t->account_modifiers);
+}
+EXPORT_SYMBOL_GPL(account_modifier_register);
+
+void account_modifier_unregister(struct account_modifier *modifier)
+{
+ hlist_del(&modifier->link);
+}
+EXPORT_SYMBOL_GPL(account_modifier_unregister);
+
+static cputime_t fire_user_time_account_modifiers(struct task_struct *curr,
+ cputime_t cputime)
+{
+ struct account_modifier *modifier;
+ struct hlist_node *node;
+
+ hlist_for_each_entry(modifier, node, &curr->account_modifiers, link)
+ if (modifier->ops->user_time)
+ cputime = modifier->ops->user_time(modifier,
+ curr, cputime);
+ return cputime;
+}
+
+static cputime_t fire_system_time_account_modifiers(struct task_struct *curr,
+ int hardirq_offset,
+ cputime_t cputime)
+{
+ struct account_modifier *modifier;
+ struct hlist_node *node;
+
+ hlist_for_each_entry(modifier, node, &curr->account_modifiers, link)
+ if (modifier->ops->system_time)
+ cputime = modifier->ops->system_time(modifier,
+ curr, hardirq_offset, cputime);
+ return cputime;
+}
+
+#else
+
+static inline cputime_t fire_user_time_account_modifiers(struct task_struct *curr, cputime_t cputime)
+{
+ return cputime;
+}
+
+static inline cputime_t fire_system_time_account_modifiers(struct task_struct *curr,
+ int hardirq_offset,
+ cputime_t cputime)
+{
+ return cputime;
+}
+
+#endif
/**
* prepare_task_switch - prepare to switch tasks
@@ -3223,6 +3282,8 @@
struct cpu_usage_stat *cpustat = &kstat_this_cpu.cpustat;
cputime64_t tmp;
+ cputime = fire_user_time_account_modifiers(p, cputime);
+
p->utime = cputime_add(p->utime, cputime);
/* Add user time to cpustat. */
@@ -3246,6 +3307,8 @@
struct rq *rq = this_rq();
cputime64_t tmp;
+ cputime = fire_system_time_account_modifiers(p,hardirq_offset, cputime);
+
p->stime = cputime_add(p->stime, cputime);
/* Add system time to cpustat. */
@@ -6522,6 +6585,9 @@
#ifdef CONFIG_PREEMPT_NOTIFIERS
INIT_HLIST_HEAD(&init_task.preempt_notifiers);
#endif
+#ifdef CONFIG_ACCOUNT_MODIFIERS
+ INIT_HLIST_HEAD(&init_task.account_modifiers);
+#endif
#ifdef CONFIG_SMP
nr_cpu_ids = highest_cpu + 1;
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH/RFC 4/4]Modify KVM to use the "account modifiers"
[not found] ` <46C472D2.7000702@bull.net>
@ 2007-08-16 15:59 ` Laurent Vivier
0 siblings, 0 replies; 22+ messages in thread
From: Laurent Vivier @ 2007-08-16 15:59 UTC (permalink / raw)
To: kvm-devel; +Cc: Ingo Molnar, Rusty Russell, virtualization, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 339 bytes --]
[PATCH 4/4] Modify KVM to use the "account modifiers". KVM can now measure time
consumed by a Virtual Machine on a per-cpu basis and modify kernel statistics to
report this value.
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
--
------------- Laurent.Vivier@bull.net --------------
"Software is hard" - Donald Knuth
[-- Attachment #2: kvm_accounting_modifiers --]
[-- Type: text/plain, Size: 5907 bytes --]
Index: kvm/drivers/kvm/kvm.h
===================================================================
--- kvm.orig/drivers/kvm/kvm.h 2007-08-16 16:21:42.000000000 +0200
+++ kvm/drivers/kvm/kvm.h 2007-08-16 16:27:00.000000000 +0200
@@ -408,6 +408,10 @@
struct file *filp;
struct kvm_io_bus mmio_bus;
struct kvm_io_bus pio_bus;
+#ifdef CONFIG_ACCOUNT_MODIFIERS
+ struct account_modifier account_modifier;
+ ktime_t vtime[NR_CPUS];
+#endif
};
struct descriptor_table {
@@ -589,6 +593,20 @@
int kvm_hypercall(struct kvm_vcpu *vcpu, struct kvm_run *run);
+#ifdef CONFIG_ACCOUNT_MODIFIERS
+static inline ktime_t kvm_guest_enter(void)
+{
+ return ktime_get();
+}
+
+static inline void kvm_guest_exit(struct kvm *kvm, ktime_t enter)
+{
+ ktime_t delta = ktime_sub(ktime_get(), enter);
+ kvm->vtime[smp_processor_id()] =
+ ktime_add(kvm->vtime[smp_processor_id()], delta);
+}
+#endif
+
static inline int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t gva,
u32 error_code)
{
Index: kvm/drivers/kvm/kvm_main.c
===================================================================
--- kvm.orig/drivers/kvm/kvm_main.c 2007-08-16 16:21:42.000000000 +0200
+++ kvm/drivers/kvm/kvm_main.c 2007-08-16 16:40:28.000000000 +0200
@@ -33,6 +33,7 @@
#include <linux/file.h>
#include <linux/sysdev.h>
#include <linux/cpu.h>
+#include <linux/kernel_stat.h>
#include <linux/sched.h>
#include <linux/cpumask.h>
#include <linux/smp.h>
@@ -57,6 +58,9 @@
EXPORT_SYMBOL_GPL(kvm_vcpu_cache);
static __read_mostly struct preempt_ops kvm_preempt_ops;
+#ifdef CONFIG_ACCOUNT_MODIFIERS
+static __read_mostly struct account_ops kvm_account_ops;
+#endif
#define STAT_OFFSET(x) offsetof(struct kvm_vcpu, stat.x)
@@ -285,6 +289,42 @@
}
EXPORT_SYMBOL_GPL(kvm_vcpu_uninit);
+#ifdef CONFIG_ACCOUNT_MODIFIERS
+static inline
+struct kvm *account_modifier_to_kvm(struct account_modifier *am)
+{
+ return container_of(am, struct kvm, account_modifier);
+}
+
+static cputime_t kvm_account_system_modifier(struct account_modifier *modifier,
+ struct task_struct *curr,
+ int hardirq_offset,
+ cputime_t cputime)
+{
+ struct kvm *kvm = account_modifier_to_kvm(modifier);
+ ktime_t kmsec = ktime_set(0, NSEC_PER_MSEC);
+ cputime_t cmsec = msecs_to_cputime(1);
+ ktime_t vtime = kvm->vtime[smp_processor_id()];
+
+ while ((ktime_to_ns(vtime) >= NSEC_PER_MSEC) &&
+ (cputime_to_msecs(cputime) >= 1)) {
+ kvm->vtime[smp_processor_id()] = ktime_sub(vtime, kmsec);
+
+ curr->utime = cputime_add(curr->utime, cmsec);
+ curr->gtime = cputime_add(curr->gtime, cmsec);
+
+ kstat_this_cpu.cpustat.guest = cputime64_add(kstat_this_cpu.cpustat.guest,
+ cputime_to_cputime64(cmsec));
+ kstat_this_cpu.cpustat.user = cputime64_add(kstat_this_cpu.cpustat.user,
+ cputime_to_cputime64(cmsec));
+
+ cputime = cputime_sub(cputime, cmsec);
+ }
+
+ return cputime;
+}
+#endif
+
static struct kvm *kvm_create_vm(void)
{
struct kvm *kvm = kzalloc(sizeof(struct kvm), GFP_KERNEL);
@@ -299,6 +339,15 @@
spin_lock(&kvm_lock);
list_add(&kvm->vm_list, &vm_list);
spin_unlock(&kvm_lock);
+
+#ifdef CONFIG_ACCOUNT_MODIFIERS
+ kvm_account_ops.user_time = NULL;
+ kvm_account_ops.system_time = kvm_account_system_modifier;
+
+ account_modifier_init(&kvm->account_modifier, &kvm_account_ops);
+ account_modifier_register(&kvm->account_modifier);
+#endif
+
return kvm;
}
@@ -376,6 +425,9 @@
spin_lock(&kvm_lock);
list_del(&kvm->vm_list);
spin_unlock(&kvm_lock);
+#ifdef CONFIG_ACCOUNT_MODIFIERS
+ account_modifier_unregister(&kvm->account_modifier);
+#endif
kvm_io_bus_destroy(&kvm->pio_bus);
kvm_io_bus_destroy(&kvm->mmio_bus);
kvm_free_vcpus(kvm);
Index: kvm/drivers/kvm/svm.c
===================================================================
--- kvm.orig/drivers/kvm/svm.c 2007-08-16 16:21:42.000000000 +0200
+++ kvm/drivers/kvm/svm.c 2007-08-16 16:29:41.000000000 +0200
@@ -1392,6 +1392,9 @@
u16 gs_selector;
u16 ldt_selector;
int r;
+#ifdef CONFIG_ACCOUNT_MODIFIERS
+ ktime_t now;
+#endif
again:
r = kvm_mmu_reload(vcpu);
@@ -1404,6 +1407,9 @@
clgi();
vcpu->guest_mode = 1;
+#ifdef CONFIG_ACCOUNT_MODIFIERS
+ now = kvm_guest_enter();
+#endif
if (vcpu->requests)
if (test_and_clear_bit(KVM_TLB_FLUSH, &vcpu->requests))
svm_flush_tlb(vcpu);
@@ -1536,6 +1542,9 @@
#endif
: "cc", "memory" );
+#ifdef CONFIG_ACCOUNT_MODIFIERS
+ kvm_guest_exit(vcpu->kvm, now);
+#endif
vcpu->guest_mode = 0;
if (vcpu->fpu_active) {
Index: kvm/drivers/kvm/vmx.c
===================================================================
--- kvm.orig/drivers/kvm/vmx.c 2007-08-16 16:21:42.000000000 +0200
+++ kvm/drivers/kvm/vmx.c 2007-08-16 16:30:24.000000000 +0200
@@ -2052,6 +2052,9 @@
struct vcpu_vmx *vmx = to_vmx(vcpu);
u8 fail;
int r;
+#ifdef CONFIG_ACCOUNT_MODIFIERS
+ ktime_t now;
+#endif
preempted:
if (vcpu->guest_debug.enabled)
@@ -2078,6 +2081,9 @@
local_irq_disable();
vcpu->guest_mode = 1;
+#ifdef CONFIG_ACCOUNT_MODIFIERS
+ now = kvm_guest_enter();
+#endif
if (vcpu->requests)
if (test_and_clear_bit(KVM_TLB_FLUSH, &vcpu->requests))
vmx_flush_tlb(vcpu);
@@ -2198,6 +2204,9 @@
[cr2]"i"(offsetof(struct kvm_vcpu, cr2))
: "cc", "memory" );
+#ifdef CONFIG_ACCOUNT_MODIFIERS
+ kvm_guest_exit(vcpu->kvm, now);
+#endif
vcpu->guest_mode = 0;
local_irq_enable();
Index: kvm/drivers/kvm/Kconfig
===================================================================
--- kvm.orig/drivers/kvm/Kconfig 2007-08-16 16:21:42.000000000 +0200
+++ kvm/drivers/kvm/Kconfig 2007-08-16 16:21:44.000000000 +0200
@@ -41,4 +41,10 @@
Provides support for KVM on AMD processors equipped with the AMD-V
(SVM) extensions.
+config ACCOUNT_MODIFIERS
+ bool "Virtual Machine accounting support"
+ depends on KVM
+ ---help---
+ Allows to account CPU time used by the Virtual Machines.
+
endif # VIRTUALIZATION
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH/RFC 3/4]Introduce "account modifiers" mechanism
2007-08-16 15:58 ` [PATCH/RFC 3/4]Introduce "account modifiers" mechanism Laurent Vivier
@ 2007-08-16 22:39 ` Rusty Russell
2007-08-17 7:35 ` Laurent Vivier
0 siblings, 1 reply; 22+ messages in thread
From: Rusty Russell @ 2007-08-16 22:39 UTC (permalink / raw)
To: Laurent Vivier; +Cc: kvm-devel, Ingo Molnar, virtualization, linux-kernel
On Thu, 2007-08-16 at 17:58 +0200, Laurent Vivier wrote:
> [PATCH 3/3] introduce "account modifiers" mechanism in the kernel allowing a
> module to modify the collected accounting for a given task. This implementation
> is based on the "preempt_notifier". "account_system_time()" and
> "account_user_time()" can call functions registered by a module to modify the
> cputime value.
>
> Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Hi Laurent,
This seems a little like overkill. Why not just add an
"account_guest_time" which subtracts the given amount of time from
system time (if available) and adds it to guest time? Then kvm (and
lguest) should just need to call this at the right times.
Am I missing something?
Rusty.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH/RFC 3/4]Introduce "account modifiers" mechanism
2007-08-16 22:39 ` Rusty Russell
@ 2007-08-17 7:35 ` Laurent Vivier
2007-08-17 8:30 ` Rusty Russell
0 siblings, 1 reply; 22+ messages in thread
From: Laurent Vivier @ 2007-08-17 7:35 UTC (permalink / raw)
To: Rusty Russell; +Cc: kvm-devel, Ingo Molnar, virtualization, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 2110 bytes --]
Rusty Russell wrote:
> On Thu, 2007-08-16 at 17:58 +0200, Laurent Vivier wrote:
>> [PATCH 3/3] introduce "account modifiers" mechanism in the kernel allowing a
>> module to modify the collected accounting for a given task. This implementation
>> is based on the "preempt_notifier". "account_system_time()" and
>> "account_user_time()" can call functions registered by a module to modify the
>> cputime value.
>>
>> Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
>
>
> Hi Laurent,
Hi Rusty,
how are your puppies ?
And thank you for your comment.
> This seems a little like overkill. Why not just add an
> "account_guest_time" which subtracts the given amount of time from
> system time (if available) and adds it to guest time? Then kvm (and
> lguest) should just need to call this at the right times.
We can. I did something like this before.
By doing like that, I think there is a major issue: system time can be
decreasing (as we substract a value from it), and thus we can have negative
value in a tool like top. It's why I play with the cputime to add to system time
and not directly with the system time.
BUT I'm very open, my only goal is be able to compute guest time, "how" is not
very important...
what we can do:
- keep PATCHES 1 and 2, because we need to store guest time for cpu and tasks.
It is very generic.
- remove PATCH 3, and add in task_struct a "ktime vtime" where we accumulate
guest time (by calling something like guest_enter() and guest_exit() from the
virtualization engine), and when in account_system_time() we have cputime >
vtime we substrate vtime from cputime and add vtime to user time and guest time.
But doing like this we freeze in kernel/sched.c the link between system time,
user time and guest time (i.e. system time = system time - vtime, user time =
user time + vtime and guest time = guest time + vtime).
- modify PATCH 4 to use new PATCH 3.
Do you agree ? Anybody doesn't agree ?
Laurent
--
------------- Laurent.Vivier@bull.net --------------
"Software is hard" - Donald Knuth
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH/RFC 3/4]Introduce "account modifiers" mechanism
2007-08-17 7:35 ` Laurent Vivier
@ 2007-08-17 8:30 ` Rusty Russell
2007-08-17 9:16 ` Laurent Vivier
0 siblings, 1 reply; 22+ messages in thread
From: Rusty Russell @ 2007-08-17 8:30 UTC (permalink / raw)
To: Laurent Vivier; +Cc: kvm-devel, Ingo Molnar, virtualization, linux-kernel
On Fri, 2007-08-17 at 09:35 +0200, Laurent Vivier wrote:
> Rusty Russell wrote:
> > Hi Laurent,
>
> Hi Rusty,
> how are your puppies ?
They're getting a little fat, actually. Too many features ...
> - remove PATCH 3, and add in task_struct a "ktime vtime" where we accumulate
> guest time (by calling something like guest_enter() and guest_exit() from the
> virtualization engine), and when in account_system_time() we have cputime >
> vtime we substrate vtime from cputime and add vtime to user time and guest time.
> But doing like this we freeze in kernel/sched.c the link between system time,
> user time and guest time (i.e. system time = system time - vtime, user time =
> user time + vtime and guest time = guest time + vtime).
Actually, I think we can set a per-cpu "in_guest" flag for the scheduler
code, which then knows to add the tick to the guest time. That seems
the simplest possible solution.
lguest or kvm would set the flag before running the guest (which is done
with preempt disabled or using preemption hooks), and reset it
afterwards.
Thoughts?
Rusty.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH/RFC 3/4]Introduce "account modifiers" mechanism
2007-08-17 8:30 ` Rusty Russell
@ 2007-08-17 9:16 ` Laurent Vivier
2007-08-17 11:51 ` [PATCH/RFC 3/4, second shot]Introduce "account_guest_time" Laurent Vivier
2007-08-17 12:55 ` [kvm-devel] [PATCH/RFC 3/4]Introduce "account modifiers" mechanism Avi Kivity
0 siblings, 2 replies; 22+ messages in thread
From: Laurent Vivier @ 2007-08-17 9:16 UTC (permalink / raw)
To: Rusty Russell; +Cc: kvm-devel, Ingo Molnar, virtualization, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 2069 bytes --]
Rusty Russell wrote:
> On Fri, 2007-08-17 at 09:35 +0200, Laurent Vivier wrote:
>> Rusty Russell wrote:
>>> Hi Laurent,
>> Hi Rusty,
>> how are your puppies ?
>
> They're getting a little fat, actually. Too many features ...
>
>> - remove PATCH 3, and add in task_struct a "ktime vtime" where we accumulate
>> guest time (by calling something like guest_enter() and guest_exit() from the
>> virtualization engine), and when in account_system_time() we have cputime >
>> vtime we substrate vtime from cputime and add vtime to user time and guest time.
>> But doing like this we freeze in kernel/sched.c the link between system time,
>> user time and guest time (i.e. system time = system time - vtime, user time =
>> user time + vtime and guest time = guest time + vtime).
>
> Actually, I think we can set a per-cpu "in_guest" flag for the scheduler
> code, which then knows to add the tick to the guest time. That seems
> the simplest possible solution.
>
> lguest or kvm would set the flag before running the guest (which is done
> with preempt disabled or using preemption hooks), and reset it
> afterwards.
>
> Thoughts?
It was my first attempt (except I didn't have a per-cpu flag, but a per-task
flag), it's not visible but I love simplicity... ;-)
A KVM VCPU is stopped by preemption, so when we enter in scheduler we have
exited from VCPU and thus this flags is off (so we account 0 to the guest). What
I did then is "set the flag on when we enter in the VCPU, and
"account_system_time()" sets the flag off when it adds this timeslice to cpustat
(and compute correctly guest, user, system time). But I didn't like this idea
because all code executed after we entered in the VCPU is accounted to the guest
until we have an account_system_time() and I suppose we can have real system
time in this part. And I guess a VCPU can be less than 1 ms (unit of cputime) in
a timeslice.
So ? What's best ?
Laurent
--
------------- Laurent.Vivier@bull.net --------------
"Software is hard" - Donald Knuth
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH/RFC 3/4, second shot]Introduce "account_guest_time"
2007-08-17 9:16 ` Laurent Vivier
@ 2007-08-17 11:51 ` Laurent Vivier
2007-08-17 11:54 ` [PATCH/RFC 4/4, second shot]KVM uses "account_guest_time()" Laurent Vivier
2007-08-17 12:59 ` [kvm-devel] [PATCH/RFC 3/4, second shot]Introduce "account_guest_time" Avi Kivity
2007-08-17 12:55 ` [kvm-devel] [PATCH/RFC 3/4]Introduce "account modifiers" mechanism Avi Kivity
1 sibling, 2 replies; 22+ messages in thread
From: Laurent Vivier @ 2007-08-17 11:51 UTC (permalink / raw)
To: Laurent Vivier
Cc: Rusty Russell, kvm-devel, Ingo Molnar, virtualization,
linux-kernel
[-- Attachment #1: Type: text/plain, Size: 412 bytes --]
This is another way to compute guest time... I remove the "account modifiers"
mechanism and call directly account_guest_time() from account_system_time().
account_system_time() computes user, system and guest times according value
accumulated in vtime (a ktime_t) in task_struct by the virtual machine.
--
------------- Laurent.Vivier@bull.net --------------
"Software is hard" - Donald Knuth
[-- Attachment #2: account_guest --]
[-- Type: text/plain, Size: 1938 bytes --]
Index: kvm/include/linux/sched.h
===================================================================
--- kvm.orig/include/linux/sched.h 2007-08-17 10:18:53.000000000 +0200
+++ kvm/include/linux/sched.h 2007-08-17 12:33:22.000000000 +0200
@@ -1192,6 +1192,9 @@
#ifdef CONFIG_FAULT_INJECTION
int make_it_fail;
#endif
+#ifdef CONFIG_GUEST_ACCOUNTING
+ ktime_t vtime;
+#endif
};
/*
Index: kvm/kernel/sched.c
===================================================================
--- kvm.orig/kernel/sched.c 2007-08-17 10:18:53.000000000 +0200
+++ kvm/kernel/sched.c 2007-08-17 12:33:07.000000000 +0200
@@ -3233,6 +3233,37 @@
cpustat->user = cputime64_add(cpustat->user, tmp);
}
+#ifdef CONFIG_GUEST_ACCOUNTING
+/*
+ * Account guest time to a process
+ * @p: the process that the cpu time gets accounted to
+ * @cputime: the cpu time spent in kernel space since the last update
+ */
+
+static cputime_t account_guest_time(struct task_struct *p, cputime_t cputime)
+{
+ struct cpu_usage_stat *cpustat = &kstat_this_cpu.cpustat;
+ ktime_t kmsec = ktime_set(0, NSEC_PER_MSEC);
+ cputime_t cmsec = msecs_to_cputime(1);
+
+ while ((ktime_to_ns(p->vtime) >= NSEC_PER_MSEC) &&
+ (cputime_to_msecs(cputime) >= 1)) {
+ p->vtime = ktime_sub(p->vtime, kmsec);
+ p->utime = cputime_add(p->utime, cmsec);
+ p->gtime = cputime_add(p->gtime, cmsec);
+
+ cpustat->guest = cputime64_add(cpustat->guest,
+ cputime_to_cputime64(cmsec));
+ cpustat->user = cputime64_add(cpustat->user,
+ cputime_to_cputime64(cmsec));
+
+ cputime = cputime_sub(cputime, cmsec);
+ }
+
+ return cputime;
+}
+#endif
+
/*
* Account system cpu time to a process.
* @p: the process that the cpu time gets accounted to
@@ -3246,6 +3277,10 @@
struct rq *rq = this_rq();
cputime64_t tmp;
+#ifdef CONFIG_GUEST_ACCOUNTING
+ cputime = account_guest_time(p, cputime);
+#endif
+
p->stime = cputime_add(p->stime, cputime);
/* Add system time to cpustat. */
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH/RFC 4/4, second shot]KVM uses "account_guest_time()"
2007-08-17 11:51 ` [PATCH/RFC 3/4, second shot]Introduce "account_guest_time" Laurent Vivier
@ 2007-08-17 11:54 ` Laurent Vivier
2007-08-17 13:03 ` [kvm-devel] " Avi Kivity
2007-08-17 12:59 ` [kvm-devel] [PATCH/RFC 3/4, second shot]Introduce "account_guest_time" Avi Kivity
1 sibling, 1 reply; 22+ messages in thread
From: Laurent Vivier @ 2007-08-17 11:54 UTC (permalink / raw)
To: Laurent Vivier
Cc: Rusty Russell, kvm-devel, Ingo Molnar, virtualization,
linux-kernel
[-- Attachment #1: Type: text/plain, Size: 125 bytes --]
KVM updates vtime in task_struct to allow account_guest_time() to modify user,
system and guest time in cpustat accordingly.
[-- Attachment #2: kvm_account_guest --]
[-- Type: text/plain, Size: 2882 bytes --]
Index: kvm/drivers/kvm/Kconfig
===================================================================
--- kvm.orig/drivers/kvm/Kconfig 2007-08-17 10:24:46.000000000 +0200
+++ kvm/drivers/kvm/Kconfig 2007-08-17 10:25:25.000000000 +0200
@@ -41,4 +41,10 @@
Provides support for KVM on AMD processors equipped with the AMD-V
(SVM) extensions.
+config GUEST_ACCOUNTING
+ bool "Virtual Machine accounting support"
+ depends on KVM
+ ---help---
+ Allows to account CPU time used by the Virtual Machines.
+
endif # VIRTUALIZATION
Index: kvm/drivers/kvm/kvm.h
===================================================================
--- kvm.orig/drivers/kvm/kvm.h 2007-08-17 10:19:30.000000000 +0200
+++ kvm/drivers/kvm/kvm.h 2007-08-17 10:21:56.000000000 +0200
@@ -589,6 +589,19 @@
int kvm_hypercall(struct kvm_vcpu *vcpu, struct kvm_run *run);
+#ifdef CONFIG_GUEST_ACCOUNTING
+static inline ktime_t kvm_guest_enter(void)
+{
+ return ktime_get();
+}
+
+static inline void kvm_guest_exit(ktime_t enter)
+{
+ ktime_t delta = ktime_sub(ktime_get(), enter);
+ current->vtime = ktime_add(current->vtime, delta);
+}
+#endif
+
static inline int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t gva,
u32 error_code)
{
Index: kvm/drivers/kvm/svm.c
===================================================================
--- kvm.orig/drivers/kvm/svm.c 2007-08-17 10:22:07.000000000 +0200
+++ kvm/drivers/kvm/svm.c 2007-08-17 10:23:32.000000000 +0200
@@ -1392,6 +1392,9 @@
u16 gs_selector;
u16 ldt_selector;
int r;
+#ifdef CONFIG_GUEST_ACCOUNTING
+ ktime_t now;
+#endif
again:
r = kvm_mmu_reload(vcpu);
@@ -1404,6 +1407,9 @@
clgi();
vcpu->guest_mode = 1;
+#ifdef CONFIG_GUEST_ACCOUNTING
+ now = kvm_guest_enter();
+#endif
if (vcpu->requests)
if (test_and_clear_bit(KVM_TLB_FLUSH, &vcpu->requests))
svm_flush_tlb(vcpu);
@@ -1536,6 +1542,9 @@
#endif
: "cc", "memory" );
+#ifdef CONFIG_GUEST_ACCOUNTING
+ kvm_guest_exit(now);
+#endif
vcpu->guest_mode = 0;
if (vcpu->fpu_active) {
Index: kvm/drivers/kvm/vmx.c
===================================================================
--- kvm.orig/drivers/kvm/vmx.c 2007-08-17 10:23:36.000000000 +0200
+++ kvm/drivers/kvm/vmx.c 2007-08-17 10:24:37.000000000 +0200
@@ -2052,6 +2052,9 @@
struct vcpu_vmx *vmx = to_vmx(vcpu);
u8 fail;
int r;
+#ifdef CONFIG_GUEST_ACCOUNTING
+ ktime_t now;
+#endif
preempted:
if (vcpu->guest_debug.enabled)
@@ -2078,6 +2081,9 @@
local_irq_disable();
vcpu->guest_mode = 1;
+#ifdef CONFIG_GUEST_ACCOUNTING
+ now = kvm_guest_enter();
+#endif
if (vcpu->requests)
if (test_and_clear_bit(KVM_TLB_FLUSH, &vcpu->requests))
vmx_flush_tlb(vcpu);
@@ -2198,6 +2204,9 @@
[cr2]"i"(offsetof(struct kvm_vcpu, cr2))
: "cc", "memory" );
+#ifdef CONFIG_GUEST_ACCOUNTING
+ kvm_guest_exit(now);
+#endif
vcpu->guest_mode = 0;
local_irq_enable();
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [kvm-devel] [PATCH/RFC 3/4]Introduce "account modifiers" mechanism
2007-08-17 9:16 ` Laurent Vivier
2007-08-17 11:51 ` [PATCH/RFC 3/4, second shot]Introduce "account_guest_time" Laurent Vivier
@ 2007-08-17 12:55 ` Avi Kivity
2007-08-17 13:08 ` Laurent Vivier
2007-08-17 14:12 ` Laurent Vivier
1 sibling, 2 replies; 22+ messages in thread
From: Avi Kivity @ 2007-08-17 12:55 UTC (permalink / raw)
To: Laurent Vivier; +Cc: Rusty Russell, kvm-devel, linux-kernel, virtualization
Laurent Vivier wrote:
>>
>>> - remove PATCH 3, and add in task_struct a "ktime vtime" where we accumulate
>>> guest time (by calling something like guest_enter() and guest_exit() from the
>>> virtualization engine), and when in account_system_time() we have cputime >
>>> vtime we substrate vtime from cputime and add vtime to user time and guest time.
>>> But doing like this we freeze in kernel/sched.c the link between system time,
>>> user time and guest time (i.e. system time = system time - vtime, user time =
>>> user time + vtime and guest time = guest time + vtime).
>>>
>> Actually, I think we can set a per-cpu "in_guest" flag for the scheduler
>> code, which then knows to add the tick to the guest time. That seems
>> the simplest possible solution.
>>
>> lguest or kvm would set the flag before running the guest (which is done
>> with preempt disabled or using preemption hooks), and reset it
>> afterwards.
>>
>> Thoughts?
>>
>
> It was my first attempt (except I didn't have a per-cpu flag, but a per-task
> flag), it's not visible but I love simplicity... ;-)
>
> A KVM VCPU is stopped by preemption, so when we enter in scheduler we have
> exited from VCPU and thus this flags is off (so we account 0 to the guest). What
> I did then is "set the flag on when we enter in the VCPU, and
> "account_system_time()" sets the flag off when it adds this timeslice to cpustat
> (and compute correctly guest, user, system time). But I didn't like this idea
> because all code executed after we entered in the VCPU is accounted to the guest
> until we have an account_system_time() and I suppose we can have real system
> time in this part. And I guess a VCPU can be less than 1 ms (unit of cputime) in
> a timeslice.
>
> So ? What's best ?
>
The normal user/system accounting has the same issue, no? Whereever we
happen to land (kernel or user) gets the whole tick.
So I think it is okay to have the same limitation for guest time.
--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [kvm-devel] [PATCH/RFC 3/4, second shot]Introduce "account_guest_time"
2007-08-17 11:51 ` [PATCH/RFC 3/4, second shot]Introduce "account_guest_time" Laurent Vivier
2007-08-17 11:54 ` [PATCH/RFC 4/4, second shot]KVM uses "account_guest_time()" Laurent Vivier
@ 2007-08-17 12:59 ` Avi Kivity
1 sibling, 0 replies; 22+ messages in thread
From: Avi Kivity @ 2007-08-17 12:59 UTC (permalink / raw)
To: Laurent Vivier; +Cc: kvm-devel, linux-kernel, virtualization
Laurent Vivier wrote:
> This is another way to compute guest time... I remove the "account modifiers"
> mechanism and call directly account_guest_time() from account_system_time().
> account_system_time() computes user, system and guest times according value
> accumulated in vtime (a ktime_t) in task_struct by the virtual machine.
>
> @@ -3246,6 +3277,10 @@
> struct rq *rq = this_rq();
> cputime64_t tmp;
>
> +#ifdef CONFIG_GUEST_ACCOUNTING
> + cputime = account_guest_time(p, cputime);
> +#endif
> +
> p->stime = cputime_add(p->stime, cputime);
>
> /* Add system time to cpustat. */
In order to reduce the impact on whatever function this is in (use diff
-p please), you can always have a definition of account_guest_time:
#else
static cputime_t account_guest_time(struct task_struct *p, cputime_t cputime)
{
return cputime;
}
#endif
This way the #ifdef/#endif is not necessary when calling it.
--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [kvm-devel] [PATCH/RFC 4/4, second shot]KVM uses "account_guest_time()"
2007-08-17 11:54 ` [PATCH/RFC 4/4, second shot]KVM uses "account_guest_time()" Laurent Vivier
@ 2007-08-17 13:03 ` Avi Kivity
2007-08-17 13:16 ` Laurent Vivier
0 siblings, 1 reply; 22+ messages in thread
From: Avi Kivity @ 2007-08-17 13:03 UTC (permalink / raw)
To: Laurent Vivier; +Cc: kvm-devel, linux-kernel, virtualization
Laurent Vivier wrote:
> KVM updates vtime in task_struct to allow account_guest_time() to modify user,
> system and guest time in cpustat accordingly.
>
> --- kvm.orig/drivers/kvm/Kconfig 2007-08-17 10:24:46.000000000 +0200
> +++ kvm/drivers/kvm/Kconfig 2007-08-17 10:25:25.000000000 +0200
> @@ -41,4 +41,10 @@
> Provides support for KVM on AMD processors equipped with the AMD-V
> (SVM) extensions.
>
> +config GUEST_ACCOUNTING
> + bool "Virtual Machine accounting support"
> + depends on KVM
> + ---help---
> + Allows to account CPU time used by the Virtual Machines.
> +
Other way round. In the patch that adds account_guest_time(), have a
CONFIG_GUEST_ACCOUNTING (defaulting to n, with no description, help, or
dependencies. Then, CONFIG_KVM can select GUEST_ACCOUNTING.
The advantages of this are:
- the puppyvisor can also select this if it so wishes
- we don't have core code reference some obscure module
CONFIG_PREEMPT_NOTIFIERS does the same thing.
--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [kvm-devel] [PATCH/RFC 3/4]Introduce "account modifiers" mechanism
2007-08-17 12:55 ` [kvm-devel] [PATCH/RFC 3/4]Introduce "account modifiers" mechanism Avi Kivity
@ 2007-08-17 13:08 ` Laurent Vivier
2007-08-17 13:32 ` Christian Borntraeger
2007-08-19 7:41 ` Avi Kivity
2007-08-17 14:12 ` Laurent Vivier
1 sibling, 2 replies; 22+ messages in thread
From: Laurent Vivier @ 2007-08-17 13:08 UTC (permalink / raw)
To: Avi Kivity; +Cc: Rusty Russell, kvm-devel, linux-kernel, virtualization
[-- Attachment #1: Type: text/plain, Size: 2334 bytes --]
Avi Kivity wrote:
> Laurent Vivier wrote:
>>>
>>>> - remove PATCH 3, and add in task_struct a "ktime vtime" where we accumulate
>>>> guest time (by calling something like guest_enter() and guest_exit() from the
>>>> virtualization engine), and when in account_system_time() we have cputime >
>>>> vtime we substrate vtime from cputime and add vtime to user time and guest time.
>>>> But doing like this we freeze in kernel/sched.c the link between system time,
>>>> user time and guest time (i.e. system time = system time - vtime, user time =
>>>> user time + vtime and guest time = guest time + vtime).
>>>>
>>> Actually, I think we can set a per-cpu "in_guest" flag for the scheduler
>>> code, which then knows to add the tick to the guest time. That seems
>>> the simplest possible solution.
>>>
>>> lguest or kvm would set the flag before running the guest (which is done
>>> with preempt disabled or using preemption hooks), and reset it
>>> afterwards.
>>>
>>> Thoughts?
>>>
>> It was my first attempt (except I didn't have a per-cpu flag, but a per-task
>> flag), it's not visible but I love simplicity... ;-)
>>
>> A KVM VCPU is stopped by preemption, so when we enter in scheduler we have
>> exited from VCPU and thus this flags is off (so we account 0 to the guest). What
>> I did then is "set the flag on when we enter in the VCPU, and
>> "account_system_time()" sets the flag off when it adds this timeslice to cpustat
>> (and compute correctly guest, user, system time). But I didn't like this idea
>> because all code executed after we entered in the VCPU is accounted to the guest
>> until we have an account_system_time() and I suppose we can have real system
>> time in this part. And I guess a VCPU can be less than 1 ms (unit of cputime) in
>> a timeslice.
>>
>> So ? What's best ?
>>
>
> The normal user/system accounting has the same issue, no? Whereever we
> happen to land (kernel or user) gets the whole tick.
Yes... but perhaps I should rewrite this too ;-)
> So I think it is okay to have the same limitation for guest time.
OK, so we can go back to my first patch.
Who can decide to introduce this into the kernel ?
Laurent
--
------------- Laurent.Vivier@bull.net --------------
"Software is hard" - Donald Knuth
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [kvm-devel] [PATCH/RFC 4/4, second shot]KVM uses "account_guest_time()"
2007-08-17 13:03 ` [kvm-devel] " Avi Kivity
@ 2007-08-17 13:16 ` Laurent Vivier
2007-08-19 7:39 ` Avi Kivity
0 siblings, 1 reply; 22+ messages in thread
From: Laurent Vivier @ 2007-08-17 13:16 UTC (permalink / raw)
To: Avi Kivity; +Cc: kvm-devel, linux-kernel, virtualization
[-- Attachment #1: Type: text/plain, Size: 1256 bytes --]
Avi Kivity wrote:
> Laurent Vivier wrote:
>> KVM updates vtime in task_struct to allow account_guest_time() to modify user,
>> system and guest time in cpustat accordingly.
>>
>
>> --- kvm.orig/drivers/kvm/Kconfig 2007-08-17 10:24:46.000000000 +0200
>> +++ kvm/drivers/kvm/Kconfig 2007-08-17 10:25:25.000000000 +0200
>> @@ -41,4 +41,10 @@
>> Provides support for KVM on AMD processors equipped with the AMD-V
>> (SVM) extensions.
>>
>> +config GUEST_ACCOUNTING
>> + bool "Virtual Machine accounting support"
>> + depends on KVM
>> + ---help---
>> + Allows to account CPU time used by the Virtual Machines.
>> +
>
>
> Other way round. In the patch that adds account_guest_time(), have a
> CONFIG_GUEST_ACCOUNTING (defaulting to n, with no description, help, or
> dependencies. Then, CONFIG_KVM can select GUEST_ACCOUNTING.
I was wondering in which Kconfig I can put it...
> The advantages of this are:
> - the puppyvisor can also select this if it so wishes
> - we don't have core code reference some obscure module
I agree.
> CONFIG_PREEMPT_NOTIFIERS does the same thing.
I saw
Laurent
--
------------- Laurent.Vivier@bull.net --------------
"Software is hard" - Donald Knuth
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [kvm-devel] [PATCH/RFC 3/4]Introduce "account modifiers" mechanism
2007-08-17 13:08 ` Laurent Vivier
@ 2007-08-17 13:32 ` Christian Borntraeger
2007-08-19 7:41 ` Avi Kivity
1 sibling, 0 replies; 22+ messages in thread
From: Christian Borntraeger @ 2007-08-17 13:32 UTC (permalink / raw)
To: Laurent Vivier
Cc: Avi Kivity, Rusty Russell, kvm-devel, linux-kernel,
virtualization
Am Freitag, 17. August 2007 schrieb Laurent Vivier:
> > The normal user/system accounting has the same issue, no? Whereever we
> > happen to land (kernel or user) gets the whole tick.
>
> Yes... but perhaps I should rewrite this too ;-)
If you look further, you will see, that this was actually rewritten in 2.6.12
and thats why we have cputime_t. The infrastructure is currently only used by
s390 and ppc64. On s390 we use our cpu timer to get the current time on each
syscall/irq/context switch etc to get precise accounting data for
system/user/irq/softirq/nice. We also get steal time (this is interesting for
the guest: how much of my cpu was actually not available because the
hypervisor scheduled me away). I dont know enough about other architectures
to say which one could exploit this infrastructure as well.
The current git tree is somewhat broken for CONFIG_VIRT_CPU_ACCOUTING due to
the accouting changes introduced by CFS - we work on this.
If you are interested in the cputime stuff, you can have a look at
arch/s390/kernel/vtime.c (account_system_vtime, account_vtime) as well as:
http://git.kernel.org/?p=linux/kernel/git/torvalds/old-2.6-bkcvs.git;a=commit;h=6be7071fdd321c206b1ee7a3e534782f25572830
for the first introduction and
http://git.kernel.org/?p=linux/kernel/git/torvalds/old-2.6-bkcvs.git;a=commit;h=db59f519d37f80683f3611927f6b5c3bdfc0f9e6
for the s390 exploitation.
Christian
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [kvm-devel] [PATCH/RFC 3/4]Introduce "account modifiers" mechanism
2007-08-17 12:55 ` [kvm-devel] [PATCH/RFC 3/4]Introduce "account modifiers" mechanism Avi Kivity
2007-08-17 13:08 ` Laurent Vivier
@ 2007-08-17 14:12 ` Laurent Vivier
2007-08-19 7:38 ` Avi Kivity
1 sibling, 1 reply; 22+ messages in thread
From: Laurent Vivier @ 2007-08-17 14:12 UTC (permalink / raw)
To: Avi Kivity; +Cc: Rusty Russell, kvm-devel, linux-kernel, virtualization
[-- Attachment #1: Type: text/plain, Size: 311 bytes --]
Avi Kivity wrote:
[...]
>
> The normal user/system accounting has the same issue, no? Whereever we
> happen to land (kernel or user) gets the whole tick.
>
> So I think it is okay to have the same limitation for guest time.
>
So this is how it looks like.
PATCH 1 and 2 are always a prerequisite.
Laurent
[-- Attachment #2: account_guest --]
[-- Type: text/plain, Size: 1515 bytes --]
Index: kvm/include/linux/sched.h
===================================================================
--- kvm.orig/include/linux/sched.h 2007-08-17 15:07:02.000000000 +0200
+++ kvm/include/linux/sched.h 2007-08-17 15:08:19.000000000 +0200
@@ -1310,6 +1310,7 @@
#define PF_STARTING 0x00000002 /* being created */
#define PF_EXITING 0x00000004 /* getting shut down */
#define PF_EXITPIDONE 0x00000008 /* pi exit done on shut down */
+#define PF_VCPU 0x00000010 /* I'm a virtual CPU */
#define PF_FORKNOEXEC 0x00000040 /* forked but didn't exec */
#define PF_SUPERPRIV 0x00000100 /* used super-user privileges */
#define PF_DUMPCORE 0x00000200 /* dumped core */
Index: kvm/kernel/sched.c
===================================================================
--- kvm.orig/kernel/sched.c 2007-08-17 14:42:43.000000000 +0200
+++ kvm/kernel/sched.c 2007-08-17 15:16:20.000000000 +0200
@@ -3246,10 +3246,22 @@
struct rq *rq = this_rq();
cputime64_t tmp;
+ tmp = cputime_to_cputime64(cputime);
+ if (p->flags & PF_VCPU) {
+ p->utime = cputime_add(p->utime, cputime);
+ p->gtime = cputime_add(p->gtime, cputime);
+
+ cpustat->guest = cputime64_add(cpustat->guest, tmp);
+ cpustat->user = cputime64_add(cpustat->user, tmp);
+
+ p->flags &= ~PF_VCPU;
+
+ return;
+ }
+
p->stime = cputime_add(p->stime, cputime);
/* Add system time to cpustat. */
- tmp = cputime_to_cputime64(cputime);
if (hardirq_count() - hardirq_offset)
cpustat->irq = cputime64_add(cpustat->irq, tmp);
else if (softirq_count())
[-- Attachment #3: kvm_account_guest --]
[-- Type: text/plain, Size: 1746 bytes --]
Index: kvm/drivers/kvm/kvm.h
===================================================================
--- kvm.orig/drivers/kvm/kvm.h 2007-08-17 15:26:16.000000000 +0200
+++ kvm/drivers/kvm/kvm.h 2007-08-17 15:29:46.000000000 +0200
@@ -589,6 +589,19 @@
int kvm_hypercall(struct kvm_vcpu *vcpu, struct kvm_run *run);
+#ifndef PF_VCPU
+#define PF_VCPU 0 /* no kernel support */
+#endif
+
+static inline void kvm_guest_enter(void)
+{
+ current->flags |= PF_VCPU;
+}
+
+static inline void kvm_guest_exit(void)
+{
+}
+
static inline int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t gva,
u32 error_code)
{
Index: kvm/drivers/kvm/svm.c
===================================================================
--- kvm.orig/drivers/kvm/svm.c 2007-08-17 15:26:16.000000000 +0200
+++ kvm/drivers/kvm/svm.c 2007-08-17 15:27:03.000000000 +0200
@@ -1404,6 +1404,7 @@
clgi();
vcpu->guest_mode = 1;
+ kvm_guest_enter();
if (vcpu->requests)
if (test_and_clear_bit(KVM_TLB_FLUSH, &vcpu->requests))
svm_flush_tlb(vcpu);
@@ -1536,6 +1537,7 @@
#endif
: "cc", "memory" );
+ kvm_guest_exit();
vcpu->guest_mode = 0;
if (vcpu->fpu_active) {
Index: kvm/drivers/kvm/vmx.c
===================================================================
--- kvm.orig/drivers/kvm/vmx.c 2007-08-17 15:26:16.000000000 +0200
+++ kvm/drivers/kvm/vmx.c 2007-08-17 15:27:45.000000000 +0200
@@ -2078,6 +2078,7 @@
local_irq_disable();
vcpu->guest_mode = 1;
+ kvm_guest_enter();
if (vcpu->requests)
if (test_and_clear_bit(KVM_TLB_FLUSH, &vcpu->requests))
vmx_flush_tlb(vcpu);
@@ -2198,6 +2199,7 @@
[cr2]"i"(offsetof(struct kvm_vcpu, cr2))
: "cc", "memory" );
+ kvm_guest_exit();
vcpu->guest_mode = 0;
local_irq_enable();
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [kvm-devel] [PATCH/RFC 3/4]Introduce "account modifiers" mechanism
2007-08-17 14:12 ` Laurent Vivier
@ 2007-08-19 7:38 ` Avi Kivity
2007-08-20 7:30 ` Laurent Vivier
0 siblings, 1 reply; 22+ messages in thread
From: Avi Kivity @ 2007-08-19 7:38 UTC (permalink / raw)
To: Laurent Vivier; +Cc: Rusty Russell, kvm-devel, linux-kernel, virtualization
Laurent Vivier wrote:
> Avi Kivity wrote:
> [...]
>
>> The normal user/system accounting has the same issue, no? Whereever we
>> happen to land (kernel or user) gets the whole tick.
>>
>> So I think it is okay to have the same limitation for guest time.
>>
>>
>
> So this is how it looks like.
> PATCH 1 and 2 are always a prerequisite.
>
>
> + tmp = cputime_to_cputime64(cputime);
> + if (p->flags & PF_VCPU) {
> + p->utime = cputime_add(p->utime, cputime);
> + p->gtime = cputime_add(p->gtime, cputime);
> +
> + cpustat->guest = cputime64_add(cpustat->guest, tmp);
> + cpustat->user = cputime64_add(cpustat->user, tmp);
> +
> + p->flags &= ~PF_VCPU;
> +
> + return;
> + }
> +
Where did CONFIG_GUEST_ACCOUNTING go?
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [kvm-devel] [PATCH/RFC 4/4, second shot]KVM uses "account_guest_time()"
2007-08-17 13:16 ` Laurent Vivier
@ 2007-08-19 7:39 ` Avi Kivity
0 siblings, 0 replies; 22+ messages in thread
From: Avi Kivity @ 2007-08-19 7:39 UTC (permalink / raw)
To: Laurent Vivier; +Cc: kvm-devel, linux-kernel, virtualization
Laurent Vivier wrote:
>> Other way round. In the patch that adds account_guest_time(), have a
>> CONFIG_GUEST_ACCOUNTING (defaulting to n, with no description, help, or
>> dependencies. Then, CONFIG_KVM can select GUEST_ACCOUNTING.
>>
>
> I was wondering in which Kconfig I can put it...
>
>
Looks like init/Kconfig has similar stuff.
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [kvm-devel] [PATCH/RFC 3/4]Introduce "account modifiers" mechanism
2007-08-17 13:08 ` Laurent Vivier
2007-08-17 13:32 ` Christian Borntraeger
@ 2007-08-19 7:41 ` Avi Kivity
1 sibling, 0 replies; 22+ messages in thread
From: Avi Kivity @ 2007-08-19 7:41 UTC (permalink / raw)
To: Laurent Vivier; +Cc: Rusty Russell, kvm-devel, linux-kernel, virtualization
Laurent Vivier wrote:
>> So I think it is okay to have the same limitation for guest time.
>>
>
> OK, so we can go back to my first patch.
> Who can decide to introduce this into the kernel ?
>
The sched.c parts are best merged by Ingo, and I can carry the kvm
parts. Alternatively, I can carry the entire patchset if Ingo acks it.
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [kvm-devel] [PATCH/RFC 3/4]Introduce "account modifiers" mechanism
2007-08-19 7:38 ` Avi Kivity
@ 2007-08-20 7:30 ` Laurent Vivier
2007-08-20 7:55 ` Avi Kivity
0 siblings, 1 reply; 22+ messages in thread
From: Laurent Vivier @ 2007-08-20 7:30 UTC (permalink / raw)
To: Avi Kivity; +Cc: Rusty Russell, kvm-devel, linux-kernel, virtualization
[-- Attachment #1: Type: text/plain, Size: 1171 bytes --]
Avi Kivity wrote:
> Laurent Vivier wrote:
>> Avi Kivity wrote:
>> [...]
>>
>>> The normal user/system accounting has the same issue, no? Whereever we
>>> happen to land (kernel or user) gets the whole tick.
>>>
>>> So I think it is okay to have the same limitation for guest time.
>>>
>>>
>>
>> So this is how it looks like.
>> PATCH 1 and 2 are always a prerequisite.
>>
>>
>
>> + tmp = cputime_to_cputime64(cputime);
>> + if (p->flags & PF_VCPU) {
>> + p->utime = cputime_add(p->utime, cputime);
>> + p->gtime = cputime_add(p->gtime, cputime);
>> +
>> + cpustat->guest = cputime64_add(cpustat->guest, tmp);
>> + cpustat->user = cputime64_add(cpustat->user, tmp);
>> +
>> + p->flags &= ~PF_VCPU;
>> +
>> + return;
>> + }
>> +
>
> Where did CONFIG_GUEST_ACCOUNTING go?
>
Lost in the sea ...
Actually, I thought this modification is not enough expensive (in time and
space) to justify a CONFIG_*. But if you think so I can add this in init/Kconfig.
Laurent
--
------------- Laurent.Vivier@bull.net --------------
"Software is hard" - Donald Knuth
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [kvm-devel] [PATCH/RFC 3/4]Introduce "account modifiers" mechanism
2007-08-20 7:30 ` Laurent Vivier
@ 2007-08-20 7:55 ` Avi Kivity
0 siblings, 0 replies; 22+ messages in thread
From: Avi Kivity @ 2007-08-20 7:55 UTC (permalink / raw)
To: Laurent Vivier; +Cc: Rusty Russell, kvm-devel, linux-kernel, virtualization
Laurent Vivier wrote:
>>
>> Where did CONFIG_GUEST_ACCOUNTING go?
>>
>>
>
> Lost in the sea ...
>
> Actually, I thought this modification is not enough expensive (in time and
> space) to justify a CONFIG_*. But if you think so I can add this in init/Kconfig.
>
>
The difference between "convince everyone that it isn't expensive,
including the embedded guys" and preemptively adding a config option can
be quite large...
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2007-08-20 7:55 UTC | newest]
Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <46C4719A.2060308@bull.net>
2007-08-16 15:57 ` [PATCH/RFC 1/4]Introduce a new field "guest" in cpustat Laurent Vivier
[not found] ` <46C4720F.7030304@bull.net>
2007-08-16 15:57 ` [PATCH/RFC 2/4]Introduce a new field "guest" in task_struct Laurent Vivier
[not found] ` <46C4725A.4070607@bull.net>
2007-08-16 15:58 ` [PATCH/RFC 3/4]Introduce "account modifiers" mechanism Laurent Vivier
2007-08-16 22:39 ` Rusty Russell
2007-08-17 7:35 ` Laurent Vivier
2007-08-17 8:30 ` Rusty Russell
2007-08-17 9:16 ` Laurent Vivier
2007-08-17 11:51 ` [PATCH/RFC 3/4, second shot]Introduce "account_guest_time" Laurent Vivier
2007-08-17 11:54 ` [PATCH/RFC 4/4, second shot]KVM uses "account_guest_time()" Laurent Vivier
2007-08-17 13:03 ` [kvm-devel] " Avi Kivity
2007-08-17 13:16 ` Laurent Vivier
2007-08-19 7:39 ` Avi Kivity
2007-08-17 12:59 ` [kvm-devel] [PATCH/RFC 3/4, second shot]Introduce "account_guest_time" Avi Kivity
2007-08-17 12:55 ` [kvm-devel] [PATCH/RFC 3/4]Introduce "account modifiers" mechanism Avi Kivity
2007-08-17 13:08 ` Laurent Vivier
2007-08-17 13:32 ` Christian Borntraeger
2007-08-19 7:41 ` Avi Kivity
2007-08-17 14:12 ` Laurent Vivier
2007-08-19 7:38 ` Avi Kivity
2007-08-20 7:30 ` Laurent Vivier
2007-08-20 7:55 ` Avi Kivity
[not found] ` <46C472D2.7000702@bull.net>
2007-08-16 15:59 ` [PATCH/RFC 4/4]Modify KVM to use the "account modifiers" Laurent Vivier
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox