linux-mips.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v5 0/4] KVM statistics data fd-based binary interface
@ 2021-05-17 14:53 Jing Zhang
  2021-05-17 14:53 ` [PATCH v5 1/4] KVM: stats: Separate common stats from architecture specific ones Jing Zhang
                   ` (4 more replies)
  0 siblings, 5 replies; 30+ messages in thread
From: Jing Zhang @ 2021-05-17 14:53 UTC (permalink / raw)
  To: KVM, KVMARM, LinuxMIPS, KVMPPC, LinuxS390, Linuxkselftest,
	Paolo Bonzini, Marc Zyngier, James Morse, Julien Thierry,
	Suzuki K Poulose, Will Deacon, Huacai Chen, Aleksandar Markovic,
	Thomas Bogendoerfer, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck, Claudio Imbrenda,
	Sean Christopherson, Vitaly Kuznetsov, Jim Mattson, Peter Shier,
	Oliver Upton, David Rientjes, Emanuele Giuseppe Esposito
  Cc: Jing Zhang

This patchset provides a file descriptor for every VM and VCPU to read
KVM statistics data in binary format.
It is meant to provide a lightweight, flexible, scalable and efficient
lock-free solution for user space telemetry applications to pull the
statistics data periodically for large scale systems. The pulling
frequency could be as high as a few times per second.
In this patchset, every statistics data are treated to have some
attributes as below:
  * architecture dependent or common
  * VM statistics data or VCPU statistics data
  * type: cumulative, instantaneous,
  * unit: none for simple counter, nanosecond, microsecond,
    millisecond, second, Byte, KiByte, MiByte, GiByte. Clock Cycles
Since no lock/synchronization is used, the consistency between all
the statistics data is not guaranteed. That means not all statistics
data are read out at the exact same time, since the statistics date
are still being updated by KVM subsystems while they are read out.

---

* v4 -> v5
  - Rebase to kvm/queue, commit a4345a7cecfb ("Merge tag
    'kvmarm-fixes-5.13-1'")
  - Change maximum stats name length to 48
  - Replace VM_STATS_COMMON/VCPU_STATS_COMMON macros with stats
    descriptor definition macros.
  - Fixed some errors/warnings reported by checkpatch.pl

* v3 -> v4
  - Rebase to kvm/queue, commit 9f242010c3b4 ("KVM: avoid "deadlock"
    between install_new_memslots and MMU notifier")
  - Use C-stype comments in the whole patch
  - Fix wrong count for x86 VCPU stats descriptors
  - Fix KVM stats data size counting and validity check in selftest

* v2 -> v3
  - Rebase to kvm/queue, commit edf408f5257b ("KVM: avoid "deadlock"
    between install_new_memslots and MMU notifier")
  - Resolve some nitpicks about format

* v1 -> v2
  - Use ARRAY_SIZE to count the number of stats descriptors
  - Fix missing `size` field initialization in macro STATS_DESC

[1] https://lore.kernel.org/kvm/20210402224359.2297157-1-jingzhangos@google.com
[2] https://lore.kernel.org/kvm/20210415151741.1607806-1-jingzhangos@google.com
[3] https://lore.kernel.org/kvm/20210423181727.596466-1-jingzhangos@google.com
[4] https://lore.kernel.org/kvm/20210429203740.1935629-1-jingzhangos@google.com

---

Jing Zhang (4):
  KVM: stats: Separate common stats from architecture specific ones
  KVM: stats: Add fd-based API to read binary stats data
  KVM: stats: Add documentation for statistics data binary interface
  KVM: selftests: Add selftest for KVM statistics data binary interface

 Documentation/virt/kvm/api.rst                | 171 ++++++++
 arch/arm64/include/asm/kvm_host.h             |   9 +-
 arch/arm64/kvm/guest.c                        |  38 +-
 arch/mips/include/asm/kvm_host.h              |   9 +-
 arch/mips/kvm/mips.c                          |  64 ++-
 arch/powerpc/include/asm/kvm_host.h           |   9 +-
 arch/powerpc/kvm/book3s.c                     |  64 ++-
 arch/powerpc/kvm/book3s_hv.c                  |  12 +-
 arch/powerpc/kvm/book3s_pr.c                  |   2 +-
 arch/powerpc/kvm/book3s_pr_papr.c             |   2 +-
 arch/powerpc/kvm/booke.c                      |  59 ++-
 arch/s390/include/asm/kvm_host.h              |   9 +-
 arch/s390/kvm/kvm-s390.c                      | 129 +++++-
 arch/x86/include/asm/kvm_host.h               |   9 +-
 arch/x86/kvm/x86.c                            |  67 +++-
 include/linux/kvm_host.h                      | 136 ++++++-
 include/linux/kvm_types.h                     |  12 +
 include/uapi/linux/kvm.h                      |  50 +++
 tools/testing/selftests/kvm/.gitignore        |   1 +
 tools/testing/selftests/kvm/Makefile          |   3 +
 .../testing/selftests/kvm/include/kvm_util.h  |   3 +
 .../selftests/kvm/kvm_bin_form_stats.c        | 379 ++++++++++++++++++
 tools/testing/selftests/kvm/lib/kvm_util.c    |  12 +
 virt/kvm/kvm_main.c                           | 237 ++++++++++-
 24 files changed, 1396 insertions(+), 90 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/kvm_bin_form_stats.c


base-commit: a4345a7cecfb91ae78cd43d26b0c6a956420761a
-- 
2.31.1.751.gd2f1c929bd-goog


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v5 1/4] KVM: stats: Separate common stats from architecture specific ones
  2021-05-17 14:53 [PATCH v5 0/4] KVM statistics data fd-based binary interface Jing Zhang
@ 2021-05-17 14:53 ` Jing Zhang
  2021-05-17 23:39   ` David Matlack
  2021-05-17 14:53 ` [PATCH v5 2/4] KVM: stats: Add fd-based API to read binary stats data Jing Zhang
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 30+ messages in thread
From: Jing Zhang @ 2021-05-17 14:53 UTC (permalink / raw)
  To: KVM, KVMARM, LinuxMIPS, KVMPPC, LinuxS390, Linuxkselftest,
	Paolo Bonzini, Marc Zyngier, James Morse, Julien Thierry,
	Suzuki K Poulose, Will Deacon, Huacai Chen, Aleksandar Markovic,
	Thomas Bogendoerfer, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck, Claudio Imbrenda,
	Sean Christopherson, Vitaly Kuznetsov, Jim Mattson, Peter Shier,
	Oliver Upton, David Rientjes, Emanuele Giuseppe Esposito
  Cc: Jing Zhang

Put all common statistics in a separate structure to ease
statistics handling for the incoming new statistics API.

No functional change intended.

Signed-off-by: Jing Zhang <jingzhangos@google.com>
---
 arch/arm64/include/asm/kvm_host.h   |  9 ++-------
 arch/arm64/kvm/guest.c              | 12 ++++++------
 arch/mips/include/asm/kvm_host.h    |  9 ++-------
 arch/mips/kvm/mips.c                | 12 ++++++------
 arch/powerpc/include/asm/kvm_host.h |  9 ++-------
 arch/powerpc/kvm/book3s.c           | 12 ++++++------
 arch/powerpc/kvm/book3s_hv.c        | 12 ++++++------
 arch/powerpc/kvm/book3s_pr.c        |  2 +-
 arch/powerpc/kvm/book3s_pr_papr.c   |  2 +-
 arch/powerpc/kvm/booke.c            | 14 +++++++-------
 arch/s390/include/asm/kvm_host.h    |  9 ++-------
 arch/s390/kvm/kvm-s390.c            | 12 ++++++------
 arch/x86/include/asm/kvm_host.h     |  9 ++-------
 arch/x86/kvm/x86.c                  | 14 +++++++-------
 include/linux/kvm_host.h            |  9 +++++++--
 include/linux/kvm_types.h           | 12 ++++++++++++
 virt/kvm/kvm_main.c                 | 14 +++++++-------
 17 files changed, 82 insertions(+), 90 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 7cd7d5c8c4bc..f3ad7a20b0af 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -556,16 +556,11 @@ static inline bool __vcpu_write_sys_reg_to_cpu(u64 val, int reg)
 }
 
 struct kvm_vm_stat {
-	ulong remote_tlb_flush;
+	struct kvm_vm_stat_common common;
 };
 
 struct kvm_vcpu_stat {
-	u64 halt_successful_poll;
-	u64 halt_attempted_poll;
-	u64 halt_poll_success_ns;
-	u64 halt_poll_fail_ns;
-	u64 halt_poll_invalid;
-	u64 halt_wakeup;
+	struct kvm_vcpu_stat_common common;
 	u64 hvc_exit_stat;
 	u64 wfe_exit_stat;
 	u64 wfi_exit_stat;
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 5cb4a1cd5603..0e41331b0911 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -29,18 +29,18 @@
 #include "trace.h"
 
 struct kvm_stats_debugfs_item debugfs_entries[] = {
-	VCPU_STAT("halt_successful_poll", halt_successful_poll),
-	VCPU_STAT("halt_attempted_poll", halt_attempted_poll),
-	VCPU_STAT("halt_poll_invalid", halt_poll_invalid),
-	VCPU_STAT("halt_wakeup", halt_wakeup),
+	VCPU_STAT_COM("halt_successful_poll", halt_successful_poll),
+	VCPU_STAT_COM("halt_attempted_poll", halt_attempted_poll),
+	VCPU_STAT_COM("halt_poll_invalid", halt_poll_invalid),
+	VCPU_STAT_COM("halt_wakeup", halt_wakeup),
 	VCPU_STAT("hvc_exit_stat", hvc_exit_stat),
 	VCPU_STAT("wfe_exit_stat", wfe_exit_stat),
 	VCPU_STAT("wfi_exit_stat", wfi_exit_stat),
 	VCPU_STAT("mmio_exit_user", mmio_exit_user),
 	VCPU_STAT("mmio_exit_kernel", mmio_exit_kernel),
 	VCPU_STAT("exits", exits),
-	VCPU_STAT("halt_poll_success_ns", halt_poll_success_ns),
-	VCPU_STAT("halt_poll_fail_ns", halt_poll_fail_ns),
+	VCPU_STAT_COM("halt_poll_success_ns", halt_poll_success_ns),
+	VCPU_STAT_COM("halt_poll_fail_ns", halt_poll_fail_ns),
 	{ NULL }
 };
 
diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h
index fca4547d580f..6f610fbcd8d1 100644
--- a/arch/mips/include/asm/kvm_host.h
+++ b/arch/mips/include/asm/kvm_host.h
@@ -109,10 +109,11 @@ static inline bool kvm_is_error_hva(unsigned long addr)
 }
 
 struct kvm_vm_stat {
-	ulong remote_tlb_flush;
+	struct kvm_vm_stat_common common;
 };
 
 struct kvm_vcpu_stat {
+	struct kvm_vcpu_stat_common common;
 	u64 wait_exits;
 	u64 cache_exits;
 	u64 signal_exits;
@@ -142,12 +143,6 @@ struct kvm_vcpu_stat {
 #ifdef CONFIG_CPU_LOONGSON64
 	u64 vz_cpucfg_exits;
 #endif
-	u64 halt_successful_poll;
-	u64 halt_attempted_poll;
-	u64 halt_poll_success_ns;
-	u64 halt_poll_fail_ns;
-	u64 halt_poll_invalid;
-	u64 halt_wakeup;
 };
 
 struct kvm_arch_memory_slot {
diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index 4d4af97dcc88..f4fc60c05e9c 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -68,12 +68,12 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
 #ifdef CONFIG_CPU_LOONGSON64
 	VCPU_STAT("vz_cpucfg", vz_cpucfg_exits),
 #endif
-	VCPU_STAT("halt_successful_poll", halt_successful_poll),
-	VCPU_STAT("halt_attempted_poll", halt_attempted_poll),
-	VCPU_STAT("halt_poll_invalid", halt_poll_invalid),
-	VCPU_STAT("halt_wakeup", halt_wakeup),
-	VCPU_STAT("halt_poll_success_ns", halt_poll_success_ns),
-	VCPU_STAT("halt_poll_fail_ns", halt_poll_fail_ns),
+	VCPU_STAT_COM("halt_successful_poll", halt_successful_poll),
+	VCPU_STAT_COM("halt_attempted_poll", halt_attempted_poll),
+	VCPU_STAT_COM("halt_poll_invalid", halt_poll_invalid),
+	VCPU_STAT_COM("halt_wakeup", halt_wakeup),
+	VCPU_STAT_COM("halt_poll_success_ns", halt_poll_success_ns),
+	VCPU_STAT_COM("halt_poll_fail_ns", halt_poll_fail_ns),
 	{NULL}
 };
 
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 1e83359f286b..473d9d0804ff 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -80,12 +80,13 @@ struct kvmppc_book3s_shadow_vcpu;
 struct kvm_nested_guest;
 
 struct kvm_vm_stat {
-	ulong remote_tlb_flush;
+	struct kvm_vm_stat_common common;
 	ulong num_2M_pages;
 	ulong num_1G_pages;
 };
 
 struct kvm_vcpu_stat {
+	struct kvm_vcpu_stat_common common;
 	u64 sum_exits;
 	u64 mmio_exits;
 	u64 signal_exits;
@@ -101,14 +102,8 @@ struct kvm_vcpu_stat {
 	u64 emulated_inst_exits;
 	u64 dec_exits;
 	u64 ext_intr_exits;
-	u64 halt_poll_success_ns;
-	u64 halt_poll_fail_ns;
 	u64 halt_wait_ns;
-	u64 halt_successful_poll;
-	u64 halt_attempted_poll;
 	u64 halt_successful_wait;
-	u64 halt_poll_invalid;
-	u64 halt_wakeup;
 	u64 dbell_exits;
 	u64 gdbell_exits;
 	u64 ld;
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 2b691f4d1f26..bd3a10e1fdaf 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -47,14 +47,14 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
 	VCPU_STAT("dec", dec_exits),
 	VCPU_STAT("ext_intr", ext_intr_exits),
 	VCPU_STAT("queue_intr", queue_intr),
-	VCPU_STAT("halt_poll_success_ns", halt_poll_success_ns),
-	VCPU_STAT("halt_poll_fail_ns", halt_poll_fail_ns),
+	VCPU_STAT_COM("halt_poll_success_ns", halt_poll_success_ns),
+	VCPU_STAT_COM("halt_poll_fail_ns", halt_poll_fail_ns),
 	VCPU_STAT("halt_wait_ns", halt_wait_ns),
-	VCPU_STAT("halt_successful_poll", halt_successful_poll),
-	VCPU_STAT("halt_attempted_poll", halt_attempted_poll),
+	VCPU_STAT_COM("halt_successful_poll", halt_successful_poll),
+	VCPU_STAT_COM("halt_attempted_poll", halt_attempted_poll),
 	VCPU_STAT("halt_successful_wait", halt_successful_wait),
-	VCPU_STAT("halt_poll_invalid", halt_poll_invalid),
-	VCPU_STAT("halt_wakeup", halt_wakeup),
+	VCPU_STAT_COM("halt_poll_invalid", halt_poll_invalid),
+	VCPU_STAT_COM("halt_wakeup", halt_wakeup),
 	VCPU_STAT("pf_storage", pf_storage),
 	VCPU_STAT("sp_storage", sp_storage),
 	VCPU_STAT("pf_instruc", pf_instruc),
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 28a80d240b76..58e187e03c52 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -236,7 +236,7 @@ static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu)
 
 	waitp = kvm_arch_vcpu_get_wait(vcpu);
 	if (rcuwait_wake_up(waitp))
-		++vcpu->stat.halt_wakeup;
+		++vcpu->stat.common.halt_wakeup;
 
 	cpu = READ_ONCE(vcpu->arch.thread_cpu);
 	if (cpu >= 0 && kvmppc_ipi_thread(cpu))
@@ -3925,7 +3925,7 @@ static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
 	cur = start_poll = ktime_get();
 	if (vc->halt_poll_ns) {
 		ktime_t stop = ktime_add_ns(start_poll, vc->halt_poll_ns);
-		++vc->runner->stat.halt_attempted_poll;
+		++vc->runner->stat.common.halt_attempted_poll;
 
 		vc->vcore_state = VCORE_POLLING;
 		spin_unlock(&vc->lock);
@@ -3942,7 +3942,7 @@ static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
 		vc->vcore_state = VCORE_INACTIVE;
 
 		if (!do_sleep) {
-			++vc->runner->stat.halt_successful_poll;
+			++vc->runner->stat.common.halt_successful_poll;
 			goto out;
 		}
 	}
@@ -3954,7 +3954,7 @@ static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
 		do_sleep = 0;
 		/* If we polled, count this as a successful poll */
 		if (vc->halt_poll_ns)
-			++vc->runner->stat.halt_successful_poll;
+			++vc->runner->stat.common.halt_successful_poll;
 		goto out;
 	}
 
@@ -3981,13 +3981,13 @@ static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
 			ktime_to_ns(cur) - ktime_to_ns(start_wait);
 		/* Attribute failed poll time */
 		if (vc->halt_poll_ns)
-			vc->runner->stat.halt_poll_fail_ns +=
+			vc->runner->stat.common.halt_poll_fail_ns +=
 				ktime_to_ns(start_wait) -
 				ktime_to_ns(start_poll);
 	} else {
 		/* Attribute successful poll time */
 		if (vc->halt_poll_ns)
-			vc->runner->stat.halt_poll_success_ns +=
+			vc->runner->stat.common.halt_poll_success_ns +=
 				ktime_to_ns(cur) -
 				ktime_to_ns(start_poll);
 	}
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index d7733b07f489..214caa9d9675 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -493,7 +493,7 @@ static void kvmppc_set_msr_pr(struct kvm_vcpu *vcpu, u64 msr)
 		if (!vcpu->arch.pending_exceptions) {
 			kvm_vcpu_block(vcpu);
 			kvm_clear_request(KVM_REQ_UNHALT, vcpu);
-			vcpu->stat.halt_wakeup++;
+			vcpu->stat.common.halt_wakeup++;
 
 			/* Unset POW bit after we woke up */
 			msr &= ~MSR_POW;
diff --git a/arch/powerpc/kvm/book3s_pr_papr.c b/arch/powerpc/kvm/book3s_pr_papr.c
index 031c8015864a..9384625c8051 100644
--- a/arch/powerpc/kvm/book3s_pr_papr.c
+++ b/arch/powerpc/kvm/book3s_pr_papr.c
@@ -378,7 +378,7 @@ int kvmppc_h_pr(struct kvm_vcpu *vcpu, unsigned long cmd)
 		kvmppc_set_msr_fast(vcpu, kvmppc_get_msr(vcpu) | MSR_EE);
 		kvm_vcpu_block(vcpu);
 		kvm_clear_request(KVM_REQ_UNHALT, vcpu);
-		vcpu->stat.halt_wakeup++;
+		vcpu->stat.common.halt_wakeup++;
 		return EMULATE_DONE;
 	case H_LOGICAL_CI_LOAD:
 		return kvmppc_h_pr_logical_ci_load(vcpu);
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 7d5fe43f85c4..07fdd7a1254a 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -49,15 +49,15 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
 	VCPU_STAT("inst_emu", emulated_inst_exits),
 	VCPU_STAT("dec", dec_exits),
 	VCPU_STAT("ext_intr", ext_intr_exits),
-	VCPU_STAT("halt_successful_poll", halt_successful_poll),
-	VCPU_STAT("halt_attempted_poll", halt_attempted_poll),
-	VCPU_STAT("halt_poll_invalid", halt_poll_invalid),
-	VCPU_STAT("halt_wakeup", halt_wakeup),
+	VCPU_STAT_COM("halt_successful_poll", halt_successful_poll),
+	VCPU_STAT_COM("halt_attempted_poll", halt_attempted_poll),
+	VCPU_STAT_COM("halt_poll_invalid", halt_poll_invalid),
+	VCPU_STAT_COM("halt_wakeup", halt_wakeup),
 	VCPU_STAT("doorbell", dbell_exits),
 	VCPU_STAT("guest doorbell", gdbell_exits),
-	VCPU_STAT("halt_poll_success_ns", halt_poll_success_ns),
-	VCPU_STAT("halt_poll_fail_ns", halt_poll_fail_ns),
-	VM_STAT("remote_tlb_flush", remote_tlb_flush),
+	VCPU_STAT_COM("halt_poll_success_ns", halt_poll_success_ns),
+	VCPU_STAT_COM("halt_poll_fail_ns", halt_poll_fail_ns),
+	VM_STAT_COM("remote_tlb_flush", remote_tlb_flush),
 	{ NULL }
 };
 
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 8925f3969478..57a20897f3db 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -361,6 +361,7 @@ struct sie_page {
 };
 
 struct kvm_vcpu_stat {
+	struct kvm_vcpu_stat_common common;
 	u64 exit_userspace;
 	u64 exit_null;
 	u64 exit_external_request;
@@ -370,13 +371,7 @@ struct kvm_vcpu_stat {
 	u64 exit_validity;
 	u64 exit_instruction;
 	u64 exit_pei;
-	u64 halt_successful_poll;
-	u64 halt_attempted_poll;
-	u64 halt_poll_invalid;
 	u64 halt_no_poll_steal;
-	u64 halt_wakeup;
-	u64 halt_poll_success_ns;
-	u64 halt_poll_fail_ns;
 	u64 instruction_lctl;
 	u64 instruction_lctlg;
 	u64 instruction_stctl;
@@ -755,12 +750,12 @@ struct kvm_vcpu_arch {
 };
 
 struct kvm_vm_stat {
+	struct kvm_vm_stat_common common;
 	u64 inject_io;
 	u64 inject_float_mchk;
 	u64 inject_pfault_done;
 	u64 inject_service_signal;
 	u64 inject_virtio;
-	u64 remote_tlb_flush;
 };
 
 struct kvm_arch_memory_slot {
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 1296fc10f80c..d6bf3372bb10 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -72,13 +72,13 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
 	VCPU_STAT("exit_program_interruption", exit_program_interruption),
 	VCPU_STAT("exit_instr_and_program_int", exit_instr_and_program),
 	VCPU_STAT("exit_operation_exception", exit_operation_exception),
-	VCPU_STAT("halt_successful_poll", halt_successful_poll),
-	VCPU_STAT("halt_attempted_poll", halt_attempted_poll),
-	VCPU_STAT("halt_poll_invalid", halt_poll_invalid),
+	VCPU_STAT_COM("halt_successful_poll", halt_successful_poll),
+	VCPU_STAT_COM("halt_attempted_poll", halt_attempted_poll),
+	VCPU_STAT_COM("halt_poll_invalid", halt_poll_invalid),
 	VCPU_STAT("halt_no_poll_steal", halt_no_poll_steal),
-	VCPU_STAT("halt_wakeup", halt_wakeup),
-	VCPU_STAT("halt_poll_success_ns", halt_poll_success_ns),
-	VCPU_STAT("halt_poll_fail_ns", halt_poll_fail_ns),
+	VCPU_STAT_COM("halt_wakeup", halt_wakeup),
+	VCPU_STAT_COM("halt_poll_success_ns", halt_poll_success_ns),
+	VCPU_STAT_COM("halt_poll_fail_ns", halt_poll_fail_ns),
 	VCPU_STAT("instruction_lctlg", instruction_lctlg),
 	VCPU_STAT("instruction_lctl", instruction_lctl),
 	VCPU_STAT("instruction_stctl", instruction_stctl),
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 55efbacfc244..5bfd6893fbf6 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1127,6 +1127,7 @@ struct kvm_arch {
 };
 
 struct kvm_vm_stat {
+	struct kvm_vm_stat_common common;
 	ulong mmu_shadow_zapped;
 	ulong mmu_pte_write;
 	ulong mmu_pde_zapped;
@@ -1134,13 +1135,13 @@ struct kvm_vm_stat {
 	ulong mmu_recycled;
 	ulong mmu_cache_miss;
 	ulong mmu_unsync;
-	ulong remote_tlb_flush;
 	ulong lpages;
 	ulong nx_lpage_splits;
 	ulong max_mmu_page_hash_collisions;
 };
 
 struct kvm_vcpu_stat {
+	struct kvm_vcpu_stat_common common;
 	u64 pf_fixed;
 	u64 pf_guest;
 	u64 tlb_flush;
@@ -1154,10 +1155,6 @@ struct kvm_vcpu_stat {
 	u64 nmi_window_exits;
 	u64 l1d_flush;
 	u64 halt_exits;
-	u64 halt_successful_poll;
-	u64 halt_attempted_poll;
-	u64 halt_poll_invalid;
-	u64 halt_wakeup;
 	u64 request_irq_exits;
 	u64 irq_exits;
 	u64 host_state_reload;
@@ -1168,8 +1165,6 @@ struct kvm_vcpu_stat {
 	u64 irq_injections;
 	u64 nmi_injections;
 	u64 req_event;
-	u64 halt_poll_success_ns;
-	u64 halt_poll_fail_ns;
 	u64 nested_run;
 	u64 directed_yield_attempted;
 	u64 directed_yield_successful;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9b6bca616929..9a93d80caff6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -226,10 +226,10 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
 	VCPU_STAT("irq_window", irq_window_exits),
 	VCPU_STAT("nmi_window", nmi_window_exits),
 	VCPU_STAT("halt_exits", halt_exits),
-	VCPU_STAT("halt_successful_poll", halt_successful_poll),
-	VCPU_STAT("halt_attempted_poll", halt_attempted_poll),
-	VCPU_STAT("halt_poll_invalid", halt_poll_invalid),
-	VCPU_STAT("halt_wakeup", halt_wakeup),
+	VCPU_STAT_COM("halt_successful_poll", halt_successful_poll),
+	VCPU_STAT_COM("halt_attempted_poll", halt_attempted_poll),
+	VCPU_STAT_COM("halt_poll_invalid", halt_poll_invalid),
+	VCPU_STAT_COM("halt_wakeup", halt_wakeup),
 	VCPU_STAT("hypercalls", hypercalls),
 	VCPU_STAT("request_irq", request_irq_exits),
 	VCPU_STAT("irq_exits", irq_exits),
@@ -241,8 +241,8 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
 	VCPU_STAT("nmi_injections", nmi_injections),
 	VCPU_STAT("req_event", req_event),
 	VCPU_STAT("l1d_flush", l1d_flush),
-	VCPU_STAT("halt_poll_success_ns", halt_poll_success_ns),
-	VCPU_STAT("halt_poll_fail_ns", halt_poll_fail_ns),
+	VCPU_STAT_COM("halt_poll_success_ns", halt_poll_success_ns),
+	VCPU_STAT_COM("halt_poll_fail_ns", halt_poll_fail_ns),
 	VCPU_STAT("nested_run", nested_run),
 	VCPU_STAT("directed_yield_attempted", directed_yield_attempted),
 	VCPU_STAT("directed_yield_successful", directed_yield_successful),
@@ -253,7 +253,7 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
 	VM_STAT("mmu_recycled", mmu_recycled),
 	VM_STAT("mmu_cache_miss", mmu_cache_miss),
 	VM_STAT("mmu_unsync", mmu_unsync),
-	VM_STAT("remote_tlb_flush", remote_tlb_flush),
+	VM_STAT_COM("remote_tlb_flush", remote_tlb_flush),
 	VM_STAT("largepages", lpages, .mode = 0444),
 	VM_STAT("nx_largepages_splitted", nx_lpage_splits, .mode = 0444),
 	VM_STAT("max_mmu_page_hash_collisions", max_mmu_page_hash_collisions),
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 2f34487e21f2..97700e41db3b 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1243,10 +1243,15 @@ struct kvm_stats_debugfs_item {
 #define KVM_DBGFS_GET_MODE(dbgfs_item)                                         \
 	((dbgfs_item)->mode ? (dbgfs_item)->mode : 0644)
 
-#define VM_STAT(n, x, ...) 							\
+#define VM_STAT(n, x, ...)						       \
 	{ n, offsetof(struct kvm, stat.x), KVM_STAT_VM, ## __VA_ARGS__ }
-#define VCPU_STAT(n, x, ...)							\
+#define VCPU_STAT(n, x, ...)						       \
 	{ n, offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU, ## __VA_ARGS__ }
+#define VM_STAT_COM(n, x, ...)						       \
+	{ n, offsetof(struct kvm, stat.common.x), KVM_STAT_VM, ## __VA_ARGS__ }
+#define VCPU_STAT_COM(n, x, ...)					       \
+	{ n, offsetof(struct kvm_vcpu, stat.common.x),			       \
+	  KVM_STAT_VCPU, ## __VA_ARGS__ }
 
 extern struct kvm_stats_debugfs_item debugfs_entries[];
 extern struct dentry *kvm_debugfs_dir;
diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h
index a7580f69dda0..87eb05ad678b 100644
--- a/include/linux/kvm_types.h
+++ b/include/linux/kvm_types.h
@@ -76,5 +76,17 @@ struct kvm_mmu_memory_cache {
 };
 #endif
 
+struct kvm_vm_stat_common {
+	ulong remote_tlb_flush;
+};
+
+struct kvm_vcpu_stat_common {
+	u64 halt_successful_poll;
+	u64 halt_attempted_poll;
+	u64 halt_poll_invalid;
+	u64 halt_wakeup;
+	u64 halt_poll_success_ns;
+	u64 halt_poll_fail_ns;
+};
 
 #endif /* __KVM_TYPES_H__ */
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 6b4feb92dc79..34a4cf265297 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -330,7 +330,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm)
 	 */
 	if (!kvm_arch_flush_remote_tlb(kvm)
 	    || kvm_make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH))
-		++kvm->stat.remote_tlb_flush;
+		++kvm->stat.common.remote_tlb_flush;
 	cmpxchg(&kvm->tlbs_dirty, dirty_count, 0);
 }
 EXPORT_SYMBOL_GPL(kvm_flush_remote_tlbs);
@@ -2940,9 +2940,9 @@ static inline void
 update_halt_poll_stats(struct kvm_vcpu *vcpu, u64 poll_ns, bool waited)
 {
 	if (waited)
-		vcpu->stat.halt_poll_fail_ns += poll_ns;
+		vcpu->stat.common.halt_poll_fail_ns += poll_ns;
 	else
-		vcpu->stat.halt_poll_success_ns += poll_ns;
+		vcpu->stat.common.halt_poll_success_ns += poll_ns;
 }
 
 /*
@@ -2960,16 +2960,16 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
 	if (vcpu->halt_poll_ns && !kvm_arch_no_poll(vcpu)) {
 		ktime_t stop = ktime_add_ns(ktime_get(), vcpu->halt_poll_ns);
 
-		++vcpu->stat.halt_attempted_poll;
+		++vcpu->stat.common.halt_attempted_poll;
 		do {
 			/*
 			 * This sets KVM_REQ_UNHALT if an interrupt
 			 * arrives.
 			 */
 			if (kvm_vcpu_check_block(vcpu) < 0) {
-				++vcpu->stat.halt_successful_poll;
+				++vcpu->stat.common.halt_successful_poll;
 				if (!vcpu_valid_wakeup(vcpu))
-					++vcpu->stat.halt_poll_invalid;
+					++vcpu->stat.common.halt_poll_invalid;
 				goto out;
 			}
 			poll_end = cur = ktime_get();
@@ -3027,7 +3027,7 @@ bool kvm_vcpu_wake_up(struct kvm_vcpu *vcpu)
 	waitp = kvm_arch_vcpu_get_wait(vcpu);
 	if (rcuwait_wake_up(waitp)) {
 		WRITE_ONCE(vcpu->ready, true);
-		++vcpu->stat.halt_wakeup;
+		++vcpu->stat.common.halt_wakeup;
 		return true;
 	}
 
-- 
2.31.1.751.gd2f1c929bd-goog


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v5 2/4] KVM: stats: Add fd-based API to read binary stats data
  2021-05-17 14:53 [PATCH v5 0/4] KVM statistics data fd-based binary interface Jing Zhang
  2021-05-17 14:53 ` [PATCH v5 1/4] KVM: stats: Separate common stats from architecture specific ones Jing Zhang
@ 2021-05-17 14:53 ` Jing Zhang
  2021-05-19 17:12   ` David Matlack
  2021-05-20  4:21   ` Ricardo Koller
  2021-05-17 14:53 ` [PATCH v5 3/4] KVM: stats: Add documentation for statistics data binary interface Jing Zhang
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 30+ messages in thread
From: Jing Zhang @ 2021-05-17 14:53 UTC (permalink / raw)
  To: KVM, KVMARM, LinuxMIPS, KVMPPC, LinuxS390, Linuxkselftest,
	Paolo Bonzini, Marc Zyngier, James Morse, Julien Thierry,
	Suzuki K Poulose, Will Deacon, Huacai Chen, Aleksandar Markovic,
	Thomas Bogendoerfer, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck, Claudio Imbrenda,
	Sean Christopherson, Vitaly Kuznetsov, Jim Mattson, Peter Shier,
	Oliver Upton, David Rientjes, Emanuele Giuseppe Esposito
  Cc: Jing Zhang

Provides a file descriptor per VM to read VM stats info/data.
Provides a file descriptor per vCPU to read vCPU stats info/data.

Signed-off-by: Jing Zhang <jingzhangos@google.com>
---
 arch/arm64/kvm/guest.c    |  26 +++++
 arch/mips/kvm/mips.c      |  52 +++++++++
 arch/powerpc/kvm/book3s.c |  52 +++++++++
 arch/powerpc/kvm/booke.c  |  45 ++++++++
 arch/s390/kvm/kvm-s390.c  | 117 ++++++++++++++++++++
 arch/x86/kvm/x86.c        |  53 +++++++++
 include/linux/kvm_host.h  | 127 ++++++++++++++++++++++
 include/uapi/linux/kvm.h  |  50 +++++++++
 virt/kvm/kvm_main.c       | 223 ++++++++++++++++++++++++++++++++++++++
 9 files changed, 745 insertions(+)

diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 0e41331b0911..1cc1d83630ac 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -28,6 +28,32 @@
 
 #include "trace.h"
 
+struct _kvm_stats_desc kvm_vm_stats_desc[] = DEFINE_VM_STATS_DESC();
+
+struct _kvm_stats_header kvm_vm_stats_header = {
+	.name_size = KVM_STATS_NAME_LEN,
+	.count = ARRAY_SIZE(kvm_vm_stats_desc),
+	.desc_offset = sizeof(struct kvm_stats_header),
+	.data_offset = sizeof(struct kvm_stats_header) +
+		sizeof(kvm_vm_stats_desc),
+};
+
+struct _kvm_stats_desc kvm_vcpu_stats_desc[] = DEFINE_VCPU_STATS_DESC(
+	STATS_DESC_COUNTER("hvc_exit_stat"),
+	STATS_DESC_COUNTER("wfe_exit_stat"),
+	STATS_DESC_COUNTER("wfi_exit_stat"),
+	STATS_DESC_COUNTER("mmio_exit_user"),
+	STATS_DESC_COUNTER("mmio_exit_kernel"),
+	STATS_DESC_COUNTER("exits"));
+
+struct _kvm_stats_header kvm_vcpu_stats_header = {
+	.name_size = KVM_STATS_NAME_LEN,
+	.count = ARRAY_SIZE(kvm_vcpu_stats_desc),
+	.desc_offset = sizeof(struct kvm_stats_header),
+	.data_offset = sizeof(struct kvm_stats_header) +
+		sizeof(kvm_vcpu_stats_desc),
+};
+
 struct kvm_stats_debugfs_item debugfs_entries[] = {
 	VCPU_STAT_COM("halt_successful_poll", halt_successful_poll),
 	VCPU_STAT_COM("halt_attempted_poll", halt_attempted_poll),
diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index f4fc60c05e9c..f17a65743ccd 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -38,6 +38,58 @@
 #define VECTORSPACING 0x100	/* for EI/VI mode */
 #endif
 
+struct _kvm_stats_desc kvm_vm_stats_desc[] = DEFINE_VM_STATS_DESC();
+
+struct _kvm_stats_header kvm_vm_stats_header = {
+	.name_size = KVM_STATS_NAME_LEN,
+	.count = ARRAY_SIZE(kvm_vm_stats_desc),
+	.desc_offset = sizeof(struct kvm_stats_header),
+	.data_offset = sizeof(struct kvm_stats_header) +
+		sizeof(kvm_vm_stats_desc),
+};
+
+struct _kvm_stats_desc kvm_vcpu_stats_desc[] = DEFINE_VCPU_STATS_DESC(
+	STATS_DESC_COUNTER("wait_exits"),
+	STATS_DESC_COUNTER("cache_exits"),
+	STATS_DESC_COUNTER("signal_exits"),
+	STATS_DESC_COUNTER("int_exits"),
+	STATS_DESC_COUNTER("cop_unusable_exits"),
+	STATS_DESC_COUNTER("tlbmod_exits"),
+	STATS_DESC_COUNTER("tlbmiss_ld_exits"),
+	STATS_DESC_COUNTER("tlbmiss_st_exits"),
+	STATS_DESC_COUNTER("addrerr_st_exits"),
+	STATS_DESC_COUNTER("addrerr_ld_exits"),
+	STATS_DESC_COUNTER("syscall_exits"),
+	STATS_DESC_COUNTER("resvd_inst_exits"),
+	STATS_DESC_COUNTER("break_inst_exits"),
+	STATS_DESC_COUNTER("trap_inst_exits"),
+	STATS_DESC_COUNTER("msa_fpe_exits"),
+	STATS_DESC_COUNTER("fpe_exits"),
+	STATS_DESC_COUNTER("msa_disabled_exits"),
+	STATS_DESC_COUNTER("flush_dcache_exits"),
+#ifdef CONFIG_KVM_MIPS_VZ
+	STATS_DESC_COUNTER("vz_gpsi_exits"),
+	STATS_DESC_COUNTER("vz_gsfc_exits"),
+	STATS_DESC_COUNTER("vz_hc_exits"),
+	STATS_DESC_COUNTER("vz_grr_exits"),
+	STATS_DESC_COUNTER("vz_gva_exits"),
+	STATS_DESC_COUNTER("vz_ghfc_exits"),
+	STATS_DESC_COUNTER("vz_gpa_exits"),
+	STATS_DESC_COUNTER("vz_resvd_exits"),
+#ifdef CONFIG_CPU_LOONGSON64
+	STATS_DESC_COUNTER("vz_cpucfg_exits"),
+#endif
+#endif
+	);
+
+struct _kvm_stats_header kvm_vcpu_stats_header = {
+	.name_size = KVM_STATS_NAME_LEN,
+	.count = ARRAY_SIZE(kvm_vcpu_stats_desc),
+	.desc_offset = sizeof(struct kvm_stats_header),
+	.data_offset = sizeof(struct kvm_stats_header) +
+		sizeof(kvm_vcpu_stats_desc),
+};
+
 struct kvm_stats_debugfs_item debugfs_entries[] = {
 	VCPU_STAT("wait", wait_exits),
 	VCPU_STAT("cache", cache_exits),
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index bd3a10e1fdaf..5e8ee0d39ef9 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -38,6 +38,58 @@
 
 /* #define EXIT_DEBUG */
 
+struct _kvm_stats_desc kvm_vm_stats_desc[] = DEFINE_VM_STATS_DESC(
+	STATS_DESC_ICOUNTER("num_2M_pages"),
+	STATS_DESC_ICOUNTER("num_1G_pages"));
+
+struct _kvm_stats_header kvm_vm_stats_header = {
+	.name_size = KVM_STATS_NAME_LEN,
+	.count = ARRAY_SIZE(kvm_vm_stats_desc),
+	.desc_offset = sizeof(struct kvm_stats_header),
+	.data_offset = sizeof(struct kvm_stats_header) +
+		sizeof(kvm_vm_stats_desc),
+};
+
+struct _kvm_stats_desc kvm_vcpu_stats_desc[] = DEFINE_VCPU_STATS_DESC(
+	STATS_DESC_COUNTER("sum_exits"),
+	STATS_DESC_COUNTER("mmio_exits"),
+	STATS_DESC_COUNTER("signal_exits"),
+	STATS_DESC_COUNTER("light_exits"),
+	STATS_DESC_COUNTER("itlb_real_miss_exits"),
+	STATS_DESC_COUNTER("itlb_virt_miss_exits"),
+	STATS_DESC_COUNTER("dtlb_real_miss_exits"),
+	STATS_DESC_COUNTER("dtlb_virt_miss_exits"),
+	STATS_DESC_COUNTER("syscall_exits"),
+	STATS_DESC_COUNTER("isi_exits"),
+	STATS_DESC_COUNTER("dsi_exits"),
+	STATS_DESC_COUNTER("emulated_inst_exits"),
+	STATS_DESC_COUNTER("dec_exits"),
+	STATS_DESC_COUNTER("ext_intr_exits"),
+	STATS_DESC_TIME_NSEC("halt_wait_ns"),
+	STATS_DESC_COUNTER("halt_successful_wait"),
+	STATS_DESC_COUNTER("dbell_exits"),
+	STATS_DESC_COUNTER("gdbell_exits"),
+	STATS_DESC_COUNTER("ld"),
+	STATS_DESC_COUNTER("st"),
+	STATS_DESC_COUNTER("pf_storage"),
+	STATS_DESC_COUNTER("pf_instruc"),
+	STATS_DESC_COUNTER("sp_storage"),
+	STATS_DESC_COUNTER("sp_instruc"),
+	STATS_DESC_COUNTER("queue_intr"),
+	STATS_DESC_COUNTER("ld_slow"),
+	STATS_DESC_COUNTER("st_slow"),
+	STATS_DESC_COUNTER("pthru_all"),
+	STATS_DESC_COUNTER("pthru_host"),
+	STATS_DESC_COUNTER("pthru_bad_aff"));
+
+struct _kvm_stats_header kvm_vcpu_stats_header = {
+	.name_size = KVM_STATS_NAME_LEN,
+	.count = ARRAY_SIZE(kvm_vcpu_stats_desc),
+	.desc_offset = sizeof(struct kvm_stats_header),
+	.data_offset = sizeof(struct kvm_stats_header) +
+		sizeof(kvm_vcpu_stats_desc),
+};
+
 struct kvm_stats_debugfs_item debugfs_entries[] = {
 	VCPU_STAT("exits", sum_exits),
 	VCPU_STAT("mmio", mmio_exits),
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 07fdd7a1254a..86d221e9193e 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -36,6 +36,51 @@
 
 unsigned long kvmppc_booke_handlers;
 
+struct _kvm_stats_desc kvm_vm_stats_desc[] = DEFINE_VM_STATS_DESC(
+	STATS_DESC_ICOUNTER("num_2M_pages"),
+	STATS_DESC_ICOUNTER("num_1G_pages"));
+
+struct _kvm_stats_header kvm_vm_stats_header = {
+	.name_size = KVM_STATS_NAME_LEN,
+	.count = ARRAY_SIZE(kvm_vm_stats_desc),
+	.desc_offset = sizeof(struct kvm_stats_header),
+	.data_offset = sizeof(struct kvm_stats_header) +
+		sizeof(kvm_vm_stats_desc),
+};
+
+struct _kvm_stats_desc kvm_vcpu_stats_desc[] = DEFINE_VCPU_STATS_DESC(
+	STATS_DESC_COUNTER("sum_exits"),
+	STATS_DESC_COUNTER("mmio_exits"),
+	STATS_DESC_COUNTER("signal_exits"),
+	STATS_DESC_COUNTER("light_exits"),
+	STATS_DESC_COUNTER("itlb_real_miss_exits"),
+	STATS_DESC_COUNTER("itlb_virt_miss_exits"),
+	STATS_DESC_COUNTER("dtlb_real_miss_exits"),
+	STATS_DESC_COUNTER("dtlb_virt_miss_exits"),
+	STATS_DESC_COUNTER("syscall_exits"),
+	STATS_DESC_COUNTER("isi_exits"),
+	STATS_DESC_COUNTER("dsi_exits"),
+	STATS_DESC_COUNTER("emulated_inst_exits"),
+	STATS_DESC_COUNTER("dec_exits"),
+	STATS_DESC_COUNTER("ext_intr_exits"),
+	STATS_DESC_TIME_NSEC("halt_wait_ns"),
+	STATS_DESC_COUNTER("halt_successful_wait"),
+	STATS_DESC_COUNTER("dbell_exits"),
+	STATS_DESC_COUNTER("gdbell_exits"),
+	STATS_DESC_COUNTER("ld"),
+	STATS_DESC_COUNTER("st"),
+	STATS_DESC_COUNTER("pthru_all"),
+	STATS_DESC_COUNTER("pthru_host"),
+	STATS_DESC_COUNTER("pthru_bad_aff"));
+
+struct _kvm_stats_header kvm_vcpu_stats_header = {
+	.name_size = KVM_STATS_NAME_LEN,
+	.count = ARRAY_SIZE(kvm_vcpu_stats_desc),
+	.desc_offset = sizeof(struct kvm_stats_header),
+	.data_offset = sizeof(struct kvm_stats_header) +
+		sizeof(kvm_vcpu_stats_desc),
+};
+
 struct kvm_stats_debugfs_item debugfs_entries[] = {
 	VCPU_STAT("mmio", mmio_exits),
 	VCPU_STAT("sig", signal_exits),
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index d6bf3372bb10..003feee79fce 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -58,6 +58,123 @@
 #define VCPU_IRQS_MAX_BUF (sizeof(struct kvm_s390_irq) * \
 			   (KVM_MAX_VCPUS + LOCAL_IRQS))
 
+struct _kvm_stats_desc kvm_vm_stats_desc[] = DEFINE_VM_STATS_DESC(
+	STATS_DESC_COUNTER("inject_io"),
+	STATS_DESC_COUNTER("inject_float_mchk"),
+	STATS_DESC_COUNTER("inject_pfault_done"),
+	STATS_DESC_COUNTER("inject_service_signal"),
+	STATS_DESC_COUNTER("inject_virtio"));
+
+struct _kvm_stats_header kvm_vm_stats_header = {
+	.name_size = KVM_STATS_NAME_LEN,
+	.count = ARRAY_SIZE(kvm_vm_stats_desc),
+	.desc_offset = sizeof(struct kvm_stats_header),
+	.data_offset = sizeof(struct kvm_stats_header) +
+		sizeof(kvm_vm_stats_desc),
+};
+
+struct _kvm_stats_desc kvm_vcpu_stats_desc[] = DEFINE_VCPU_STATS_DESC(
+	STATS_DESC_COUNTER("exit_userspace"),
+	STATS_DESC_COUNTER("exit_null"),
+	STATS_DESC_COUNTER("exit_external_request"),
+	STATS_DESC_COUNTER("exit_io_request"),
+	STATS_DESC_COUNTER("exit_external_interrupt"),
+	STATS_DESC_COUNTER("exit_stop_request"),
+	STATS_DESC_COUNTER("exit_validity"),
+	STATS_DESC_COUNTER("exit_instruction"),
+	STATS_DESC_COUNTER("exit_pei"),
+	STATS_DESC_COUNTER("halt_no_poll_steal"),
+	STATS_DESC_COUNTER("instruction_lctl"),
+	STATS_DESC_COUNTER("instruction_lctlg"),
+	STATS_DESC_COUNTER("instruction_stctl"),
+	STATS_DESC_COUNTER("instruction_stctg"),
+	STATS_DESC_COUNTER("exit_program_interruption"),
+	STATS_DESC_COUNTER("exit_instr_and_program"),
+	STATS_DESC_COUNTER("exit_operation_exception"),
+	STATS_DESC_COUNTER("deliver_ckc"),
+	STATS_DESC_COUNTER("deliver_cputm"),
+	STATS_DESC_COUNTER("deliver_external_call"),
+	STATS_DESC_COUNTER("deliver_emergency_signal"),
+	STATS_DESC_COUNTER("deliver_service_signal"),
+	STATS_DESC_COUNTER("deliver_virtio"),
+	STATS_DESC_COUNTER("deliver_stop_signal"),
+	STATS_DESC_COUNTER("deliver_prefix_signal"),
+	STATS_DESC_COUNTER("deliver_restart_signal"),
+	STATS_DESC_COUNTER("deliver_program"),
+	STATS_DESC_COUNTER("deliver_io"),
+	STATS_DESC_COUNTER("deliver_machine_check"),
+	STATS_DESC_COUNTER("exit_wait_state"),
+	STATS_DESC_COUNTER("inject_ckc"),
+	STATS_DESC_COUNTER("inject_cputm"),
+	STATS_DESC_COUNTER("inject_external_call"),
+	STATS_DESC_COUNTER("inject_emergency_signal"),
+	STATS_DESC_COUNTER("inject_mchk"),
+	STATS_DESC_COUNTER("inject_pfault_init"),
+	STATS_DESC_COUNTER("inject_program"),
+	STATS_DESC_COUNTER("inject_restart"),
+	STATS_DESC_COUNTER("inject_set_prefix"),
+	STATS_DESC_COUNTER("inject_stop_signal"),
+	STATS_DESC_COUNTER("instruction_epsw"),
+	STATS_DESC_COUNTER("instruction_gs"),
+	STATS_DESC_COUNTER("instruction_io_other"),
+	STATS_DESC_COUNTER("instruction_lpsw"),
+	STATS_DESC_COUNTER("instruction_lpswe"),
+	STATS_DESC_COUNTER("instruction_pfmf"),
+	STATS_DESC_COUNTER("instruction_ptff"),
+	STATS_DESC_COUNTER("instruction_sck"),
+	STATS_DESC_COUNTER("instruction_sckpf"),
+	STATS_DESC_COUNTER("instruction_stidp"),
+	STATS_DESC_COUNTER("instruction_spx"),
+	STATS_DESC_COUNTER("instruction_stpx"),
+	STATS_DESC_COUNTER("instruction_stap"),
+	STATS_DESC_COUNTER("instruction_iske"),
+	STATS_DESC_COUNTER("instruction_ri"),
+	STATS_DESC_COUNTER("instruction_rrbe"),
+	STATS_DESC_COUNTER("instruction_sske"),
+	STATS_DESC_COUNTER("instruction_ipte_interlock"),
+	STATS_DESC_COUNTER("instruction_stsi"),
+	STATS_DESC_COUNTER("instruction_stfl"),
+	STATS_DESC_COUNTER("instruction_tb"),
+	STATS_DESC_COUNTER("instruction_tpi"),
+	STATS_DESC_COUNTER("instruction_tprot"),
+	STATS_DESC_COUNTER("instruction_tsch"),
+	STATS_DESC_COUNTER("instruction_sie"),
+	STATS_DESC_COUNTER("instruction_essa"),
+	STATS_DESC_COUNTER("instruction_sthyi"),
+	STATS_DESC_COUNTER("instruction_sigp_sense"),
+	STATS_DESC_COUNTER("instruction_sigp_sense_running"),
+	STATS_DESC_COUNTER("instruction_sigp_external_call"),
+	STATS_DESC_COUNTER("instruction_sigp_emergency"),
+	STATS_DESC_COUNTER("instruction_sigp_cond_emergency"),
+	STATS_DESC_COUNTER("instruction_sigp_start"),
+	STATS_DESC_COUNTER("instruction_sigp_stop"),
+	STATS_DESC_COUNTER("instruction_sigp_stop_store_status"),
+	STATS_DESC_COUNTER("instruction_sigp_store_status"),
+	STATS_DESC_COUNTER("instruction_sigp_store_adtl_status"),
+	STATS_DESC_COUNTER("instruction_sigp_arch"),
+	STATS_DESC_COUNTER("instruction_sigp_prefix"),
+	STATS_DESC_COUNTER("instruction_sigp_restart"),
+	STATS_DESC_COUNTER("instruction_sigp_init_cpu_reset"),
+	STATS_DESC_COUNTER("instruction_sigp_cpu_reset"),
+	STATS_DESC_COUNTER("instruction_sigp_unknown"),
+	STATS_DESC_COUNTER("diagnose_10"),
+	STATS_DESC_COUNTER("diagnose_44"),
+	STATS_DESC_COUNTER("diagnose_9c"),
+	STATS_DESC_COUNTER("diagnose_9c_ignored"),
+	STATS_DESC_COUNTER("diagnose_258"),
+	STATS_DESC_COUNTER("diagnose_308"),
+	STATS_DESC_COUNTER("diagnose_500"),
+	STATS_DESC_COUNTER("diagnose_other"),
+	STATS_DESC_COUNTER("pfault_sync"));
+
+struct _kvm_stats_header kvm_vcpu_stats_header = {
+	.name_size = KVM_STATS_NAME_LEN,
+	.count = ARRAY_SIZE(kvm_vcpu_stats_desc),
+	.desc_offset = sizeof(struct kvm_stats_header),
+	.data_offset = sizeof(struct kvm_stats_header) +
+		sizeof(kvm_vcpu_stats_desc),
+};
+
 struct kvm_stats_debugfs_item debugfs_entries[] = {
 	VCPU_STAT("userspace_handled", exit_userspace),
 	VCPU_STAT("exit_null", exit_null),
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9a93d80caff6..84880687c199 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -214,6 +214,59 @@ EXPORT_SYMBOL_GPL(host_xss);
 u64 __read_mostly supported_xss;
 EXPORT_SYMBOL_GPL(supported_xss);
 
+struct _kvm_stats_desc kvm_vm_stats_desc[] = DEFINE_VM_STATS_DESC(
+	STATS_DESC_COUNTER("mmu_shadow_zapped"),
+	STATS_DESC_COUNTER("mmu_pte_write"),
+	STATS_DESC_COUNTER("mmu_pde_zapped"),
+	STATS_DESC_COUNTER("mmu_flooded"),
+	STATS_DESC_COUNTER("mmu_recycled"),
+	STATS_DESC_COUNTER("mmu_cache_miss"),
+	STATS_DESC_ICOUNTER("mmu_unsync"),
+	STATS_DESC_ICOUNTER("largepages"),
+	STATS_DESC_ICOUNTER("nx_largepages_splits"),
+	STATS_DESC_ICOUNTER("max_mmu_page_hash_collisions"));
+
+struct _kvm_stats_header kvm_vm_stats_header = {
+	.name_size = KVM_STATS_NAME_LEN,
+	.count = ARRAY_SIZE(kvm_vm_stats_desc),
+	.desc_offset = sizeof(struct kvm_stats_header),
+	.data_offset = sizeof(struct kvm_stats_header) +
+		sizeof(kvm_vm_stats_desc),
+};
+
+struct _kvm_stats_desc kvm_vcpu_stats_desc[] = DEFINE_VCPU_STATS_DESC(
+	STATS_DESC_COUNTER("pf_fixed"),
+	STATS_DESC_COUNTER("pf_guest"),
+	STATS_DESC_COUNTER("tlb_flush"),
+	STATS_DESC_COUNTER("invlpg"),
+	STATS_DESC_COUNTER("exits"),
+	STATS_DESC_COUNTER("io_exits"),
+	STATS_DESC_COUNTER("mmio_exits"),
+	STATS_DESC_COUNTER("signal_exits"),
+	STATS_DESC_COUNTER("irq_window_exits"),
+	STATS_DESC_COUNTER("nmi_window_exits"),
+	STATS_DESC_COUNTER("l1d_flush"),
+	STATS_DESC_COUNTER("halt_exits"),
+	STATS_DESC_COUNTER("request_irq_exits"),
+	STATS_DESC_COUNTER("irq_exits"),
+	STATS_DESC_COUNTER("host_state_reload"),
+	STATS_DESC_COUNTER("fpu_reload"),
+	STATS_DESC_COUNTER("insn_emulation"),
+	STATS_DESC_COUNTER("insn_emulation_fail"),
+	STATS_DESC_COUNTER("hypercalls"),
+	STATS_DESC_COUNTER("irq_injections"),
+	STATS_DESC_COUNTER("nmi_injections"),
+	STATS_DESC_COUNTER("req_event"),
+	STATS_DESC_COUNTER("nested_run"));
+
+struct _kvm_stats_header kvm_vcpu_stats_header = {
+	.name_size = KVM_STATS_NAME_LEN,
+	.count = ARRAY_SIZE(kvm_vcpu_stats_desc),
+	.desc_offset = sizeof(struct kvm_stats_header),
+	.data_offset = sizeof(struct kvm_stats_header) +
+		sizeof(kvm_vcpu_stats_desc),
+};
+
 struct kvm_stats_debugfs_item debugfs_entries[] = {
 	VCPU_STAT("pf_fixed", pf_fixed),
 	VCPU_STAT("pf_guest", pf_guest),
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 97700e41db3b..52783f8062ca 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1240,6 +1240,19 @@ struct kvm_stats_debugfs_item {
 	int mode;
 };
 
+struct _kvm_stats_header {
+	__u32 name_size;
+	__u32 count;
+	__u32 desc_offset;
+	__u32 data_offset;
+};
+
+#define KVM_STATS_NAME_LEN	48
+struct _kvm_stats_desc {
+	struct kvm_stats_desc desc;
+	char name[KVM_STATS_NAME_LEN];
+};
+
 #define KVM_DBGFS_GET_MODE(dbgfs_item)                                         \
 	((dbgfs_item)->mode ? (dbgfs_item)->mode : 0644)
 
@@ -1253,8 +1266,122 @@ struct kvm_stats_debugfs_item {
 	{ n, offsetof(struct kvm_vcpu, stat.common.x),			       \
 	  KVM_STAT_VCPU, ## __VA_ARGS__ }
 
+#define STATS_DESC(name, type, unit, scale, exponent)			       \
+	{								       \
+		{type | unit | scale, exponent, 1}, name,		       \
+	}
+#define STATS_DESC_CUMULATIVE(name, unit, scale, exponent)		       \
+	STATS_DESC(name, KVM_STATS_TYPE_CUMULATIVE, unit, scale, exponent)
+#define STATS_DESC_INSTANT(name, unit, scale, exponent)			       \
+	STATS_DESC(name, KVM_STATS_TYPE_INSTANT, unit, scale, exponent)
+
+/* Cumulative counter */
+#define STATS_DESC_COUNTER(name)					       \
+	STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_NONE,		       \
+		KVM_STATS_SCALE_POW10, 0)
+/* Instantaneous counter */
+#define STATS_DESC_ICOUNTER(name)					       \
+	STATS_DESC_INSTANT(name, KVM_STATS_UNIT_NONE,			       \
+		KVM_STATS_SCALE_POW10, 0)
+
+/* Cumulative clock cycles */
+#define STATS_DESC_CYCLE(name)						       \
+	STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_CYCLES,		       \
+		KVM_STATS_SCALE_POW10, 0)
+/* Instantaneous clock cycles */
+#define STATS_DESC_ICYCLE(name)						       \
+	STATS_DESC_INSTANT(name, KVM_STATS_UNIT_CYCLES,			       \
+		KVM_STATS_SCALE_POW10, 0)
+
+/* Cumulative memory size in Byte */
+#define STATS_DESC_SIZE_BYTE(name)					       \
+	STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_BYTES,		       \
+		KVM_STATS_SCALE_POW2, 0)
+/* Cumulative memory size in KiByte */
+#define STATS_DESC_SIZE_KBYTE(name)					       \
+	STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_BYTES,		       \
+		KVM_STATS_SCALE_POW2, 10)
+/* Cumulative memory size in MiByte */
+#define STATS_DESC_SIZE_MBYTE(name)					       \
+	STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_BYTES,		       \
+		KVM_STATS_SCALE_POW2, 20)
+/* Cumulative memory size in GiByte */
+#define STATS_DESC_SIZE_GBYTE(name)					       \
+	STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_BYTES,		       \
+		KVM_STATS_SCALE_POW2, 30)
+
+/* Instantaneous memory size in Byte */
+#define STATS_DESC_ISIZE_BYTE(name)					       \
+	STATS_DESC_INSTANT(name, KVM_STATS_UNIT_BYTES,			       \
+		KVM_STATS_SCALE_POW2, 0)
+/* Instantaneous memory size in KiByte */
+#define STATS_DESC_ISIZE_KBYTE(name)					       \
+	STATS_DESC_INSTANT(name, KVM_STATS_UNIT_BYTES,			       \
+		KVM_STATS_SCALE_POW2, 10)
+/* Instantaneous memory size in MiByte */
+#define STATS_DESC_ISIZE_MBYTE(name)					       \
+	STATS_DESC_INSTANT(name, KVM_STATS_UNIT_BYTES,			       \
+		KVM_STATS_SCALE_POW2, 20)
+/* Instantaneous memory size in GiByte */
+#define STATS_DESC_ISIZE_GBYTE(name)					       \
+	STATS_DESC_INSTANT(name, KVM_STATS_UNIT_BYTES,			       \
+		KVM_STATS_SCALE_POW2, 30)
+
+/* Cumulative time in second */
+#define STATS_DESC_TIME_SEC(name)					       \
+	STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_SECONDS,		       \
+		KVM_STATS_SCALE_POW10, 0)
+/* Cumulative time in millisecond */
+#define STATS_DESC_TIME_MSEC(name)					       \
+	STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_SECONDS,		       \
+		KVM_STATS_SCALE_POW10, -3)
+/* Cumulative time in microsecond */
+#define STATS_DESC_TIME_USEC(name)					       \
+	STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_SECONDS,		       \
+		KVM_STATS_SCALE_POW10, -6)
+/* Cumulative time in nanosecond */
+#define STATS_DESC_TIME_NSEC(name)					       \
+	STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_SECONDS,		       \
+		KVM_STATS_SCALE_POW10, -9)
+
+/* Instantaneous time in second */
+#define STATS_DESC_ITIME_SEC(name)					       \
+	STATS_DESC_INSTANT(name, KVM_STATS_UNIT_SECONDS,		       \
+		KVM_STATS_SCALE_POW10, 0)
+/* Instantaneous time in millisecond */
+#define STATS_DESC_ITIME_MSEC(name)					       \
+	STATS_DESC_INSTANT(name, KVM_STATS_UNIT_SECONDS,		       \
+		KVM_STATS_SCALE_POW10, -3)
+/* Instantaneous time in microsecond */
+#define STATS_DESC_ITIME_USEC(name)					       \
+	STATS_DESC_INSTANT(name, KVM_STATS_UNIT_SECONDS,		       \
+		KVM_STATS_SCALE_POW10, -6)
+/* Instantaneous time in nanosecond */
+#define STATS_DESC_ITIME_NSEC(name)					       \
+	STATS_DESC_INSTANT(name, KVM_STATS_UNIT_SECONDS,		       \
+		KVM_STATS_SCALE_POW10, -9)
+
+#define DEFINE_VM_STATS_DESC(...) {					       \
+	STATS_DESC_COUNTER("remote_tlb_flush"),				       \
+	## __VA_ARGS__							       \
+}
+
+#define DEFINE_VCPU_STATS_DESC(...) {					       \
+	STATS_DESC_COUNTER("halt_successful_poll"),			       \
+	STATS_DESC_COUNTER("halt_attempted_poll"),			       \
+	STATS_DESC_COUNTER("halt_poll_invalid"),			       \
+	STATS_DESC_COUNTER("halt_wakeup"),				       \
+	STATS_DESC_TIME_NSEC("halt_poll_success_ns"),			       \
+	STATS_DESC_TIME_NSEC("halt_poll_fail_ns"),			       \
+	## __VA_ARGS__							       \
+}
+
 extern struct kvm_stats_debugfs_item debugfs_entries[];
 extern struct dentry *kvm_debugfs_dir;
+extern struct _kvm_stats_header kvm_vm_stats_header;
+extern struct _kvm_stats_header kvm_vcpu_stats_header;
+extern struct _kvm_stats_desc kvm_vm_stats_desc[];
+extern struct _kvm_stats_desc kvm_vcpu_stats_desc[];
 
 #if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)
 static inline int mmu_notifier_retry(struct kvm *kvm, unsigned long mmu_seq)
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 3fd9a7e9d90c..a64e92c7d9de 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1082,6 +1082,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_SGX_ATTRIBUTE 196
 #define KVM_CAP_VM_COPY_ENC_CONTEXT_FROM 197
 #define KVM_CAP_PTP_KVM 198
+#define KVM_CAP_STATS_BINARY_FD 199
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -1898,4 +1899,53 @@ struct kvm_dirty_gfn {
 #define KVM_BUS_LOCK_DETECTION_OFF             (1 << 0)
 #define KVM_BUS_LOCK_DETECTION_EXIT            (1 << 1)
 
+#define KVM_STATS_ID_MAXLEN		64
+
+struct kvm_stats_header {
+	char id[KVM_STATS_ID_MAXLEN];
+	__u32 name_size;
+	__u32 count;
+	__u32 desc_offset;
+	__u32 data_offset;
+};
+
+#define KVM_STATS_TYPE_SHIFT		0
+#define KVM_STATS_TYPE_MASK		(0xF << KVM_STATS_TYPE_SHIFT)
+#define KVM_STATS_TYPE_CUMULATIVE	(0x0 << KVM_STATS_TYPE_SHIFT)
+#define KVM_STATS_TYPE_INSTANT		(0x1 << KVM_STATS_TYPE_SHIFT)
+#define KVM_STATS_TYPE_MAX		KVM_STATS_TYPE_INSTANT
+
+#define KVM_STATS_UNIT_SHIFT		4
+#define KVM_STATS_UNIT_MASK		(0xF << KVM_STATS_UNIT_SHIFT)
+#define KVM_STATS_UNIT_NONE		(0x0 << KVM_STATS_UNIT_SHIFT)
+#define KVM_STATS_UNIT_BYTES		(0x1 << KVM_STATS_UNIT_SHIFT)
+#define KVM_STATS_UNIT_SECONDS		(0x2 << KVM_STATS_UNIT_SHIFT)
+#define KVM_STATS_UNIT_CYCLES		(0x3 << KVM_STATS_UNIT_SHIFT)
+#define KVM_STATS_UNIT_MAX		KVM_STATS_UNIT_CYCLES
+
+#define KVM_STATS_SCALE_SHIFT		8
+#define KVM_STATS_SCALE_MASK		(0xF << KVM_STATS_SCALE_SHIFT)
+#define KVM_STATS_SCALE_POW10		(0x0 << KVM_STATS_SCALE_SHIFT)
+#define KVM_STATS_SCALE_POW2		(0x1 << KVM_STATS_SCALE_SHIFT)
+#define KVM_STATS_SCALE_MAX		KVM_STATS_SCALE_POW2
+
+struct kvm_stats_desc {
+	__u32 flags;
+	__s16 exponent;
+	__u16 size;
+	__u32 unused1;
+	__u32 unused2;
+	char name[0];
+};
+
+struct kvm_vm_stats_data {
+	unsigned long value[0];
+};
+
+struct kvm_vcpu_stats_data {
+	__u64 value[0];
+};
+
+#define KVM_STATS_GETFD  _IOR(KVMIO,  0xcc, struct kvm_stats_header)
+
 #endif /* __LINUX_KVM_H */
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 34a4cf265297..9e2c8dcdeae9 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3409,6 +3409,115 @@ static int kvm_vcpu_ioctl_set_sigmask(struct kvm_vcpu *vcpu, sigset_t *sigset)
 	return 0;
 }
 
+static ssize_t kvm_vcpu_stats_read(struct file *file, char __user *user_buffer,
+			      size_t size, loff_t *offset)
+{
+	char id[KVM_STATS_ID_MAXLEN];
+	struct kvm_vcpu *vcpu = file->private_data;
+	ssize_t copylen, len, remain = size;
+	size_t size_header, size_desc, size_stats;
+	loff_t pos = *offset;
+	char __user *dest = user_buffer;
+	void *src;
+
+	snprintf(id, sizeof(id), "kvm-%d/vcpu-%d",
+			task_pid_nr(current), vcpu->vcpu_id);
+	size_header = sizeof(kvm_vcpu_stats_header);
+	size_desc =
+		kvm_vcpu_stats_header.count * sizeof(struct _kvm_stats_desc);
+	size_stats = sizeof(vcpu->stat);
+
+	len = sizeof(id) + size_header + size_desc + size_stats - pos;
+	len = min(len, remain);
+	if (len <= 0)
+		return 0;
+	remain = len;
+
+	/* Copy kvm vcpu stats header id string */
+	copylen = sizeof(id) - pos;
+	copylen = min(copylen, remain);
+	if (copylen > 0) {
+		src = (void *)id + pos;
+		if (copy_to_user(dest, src, copylen))
+			return -EFAULT;
+		remain -= copylen;
+		pos += copylen;
+		dest += copylen;
+	}
+	/* Copy kvm vcpu stats header */
+	copylen = sizeof(id) + size_header - pos;
+	copylen = min(copylen, remain);
+	if (copylen > 0) {
+		src = (void *)&kvm_vcpu_stats_header;
+		src += pos - sizeof(id);
+		if (copy_to_user(dest, src, copylen))
+			return -EFAULT;
+		remain -= copylen;
+		pos += copylen;
+		dest += copylen;
+	}
+	/* Copy kvm vcpu stats descriptors */
+	copylen = kvm_vcpu_stats_header.desc_offset + size_desc - pos;
+	copylen = min(copylen, remain);
+	if (copylen > 0) {
+		src = (void *)&kvm_vcpu_stats_desc;
+		src += pos - kvm_vcpu_stats_header.desc_offset;
+		if (copy_to_user(dest, src, copylen))
+			return -EFAULT;
+		remain -= copylen;
+		pos += copylen;
+		dest += copylen;
+	}
+	/* Copy kvm vcpu stats values */
+	copylen = kvm_vcpu_stats_header.data_offset + size_stats - pos;
+	copylen = min(copylen, remain);
+	if (copylen > 0) {
+		src = (void *)&vcpu->stat;
+		src += pos - kvm_vcpu_stats_header.data_offset;
+		if (copy_to_user(dest, src, copylen))
+			return -EFAULT;
+		remain -= copylen;
+		pos += copylen;
+		dest += copylen;
+	}
+
+	*offset = pos;
+	return len;
+}
+
+static const struct file_operations kvm_vcpu_stats_fops = {
+	.read = kvm_vcpu_stats_read,
+	.llseek = noop_llseek,
+};
+
+static int kvm_vcpu_ioctl_get_statsfd(struct kvm_vcpu *vcpu)
+{
+	int error, fd;
+	struct file *file;
+	char name[15 + ITOA_MAX_LEN + 1];
+
+	snprintf(name, sizeof(name), "kvm-vcpu-stats:%d", vcpu->vcpu_id);
+
+	error = get_unused_fd_flags(O_CLOEXEC);
+	if (error < 0)
+		return error;
+	fd = error;
+
+	file = anon_inode_getfile(name, &kvm_vcpu_stats_fops, vcpu, O_RDONLY);
+	if (IS_ERR(file)) {
+		error = PTR_ERR(file);
+		goto err_put_unused_fd;
+	}
+	file->f_mode |= FMODE_PREAD;
+	fd_install(fd, file);
+
+	return fd;
+
+err_put_unused_fd:
+	put_unused_fd(fd);
+	return error;
+}
+
 static long kvm_vcpu_ioctl(struct file *filp,
 			   unsigned int ioctl, unsigned long arg)
 {
@@ -3606,6 +3715,10 @@ static long kvm_vcpu_ioctl(struct file *filp,
 		r = kvm_arch_vcpu_ioctl_set_fpu(vcpu, fpu);
 		break;
 	}
+	case KVM_STATS_GETFD: {
+		r = kvm_vcpu_ioctl_get_statsfd(vcpu);
+		break;
+	}
 	default:
 		r = kvm_arch_vcpu_ioctl(filp, ioctl, arg);
 	}
@@ -3864,6 +3977,8 @@ static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
 #else
 		return 0;
 #endif
+	case KVM_CAP_STATS_BINARY_FD:
+		return 1;
 	default:
 		break;
 	}
@@ -3967,6 +4082,111 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
 	}
 }
 
+static ssize_t kvm_vm_stats_read(struct file *file, char __user *user_buffer,
+			      size_t size, loff_t *offset)
+{
+	char id[KVM_STATS_ID_MAXLEN];
+	struct kvm *kvm = file->private_data;
+	ssize_t copylen, len, remain = size;
+	size_t size_header, size_desc, size_stats;
+	loff_t pos = *offset;
+	char __user *dest = user_buffer;
+	void *src;
+
+	snprintf(id, sizeof(id), "kvm-%d", task_pid_nr(current));
+	size_header = sizeof(kvm_vm_stats_header);
+	size_desc = kvm_vm_stats_header.count * sizeof(struct _kvm_stats_desc);
+	size_stats = sizeof(kvm->stat);
+
+	len = sizeof(id) + size_header + size_desc + size_stats - pos;
+	len = min(len, remain);
+	if (len <= 0)
+		return 0;
+	remain = len;
+
+	/* Copy kvm vm stats header id string */
+	copylen = sizeof(id) - pos;
+	copylen = min(copylen, remain);
+	if (copylen > 0) {
+		src = (void *)id + pos;
+		if (copy_to_user(dest, src, copylen))
+			return -EFAULT;
+		remain -= copylen;
+		pos += copylen;
+		dest += copylen;
+	}
+	/* Copy kvm vm stats header */
+	copylen = sizeof(id) + size_header - pos;
+	copylen = min(copylen, remain);
+	if (copylen > 0) {
+		src = (void *)&kvm_vm_stats_header;
+		src += pos - sizeof(id);
+		if (copy_to_user(dest, src, copylen))
+			return -EFAULT;
+		remain -= copylen;
+		pos += copylen;
+		dest += copylen;
+	}
+	/* Copy kvm vm stats descriptors */
+	copylen = kvm_vm_stats_header.desc_offset + size_desc - pos;
+	copylen = min(copylen, remain);
+	if (copylen > 0) {
+		src = (void *)&kvm_vm_stats_desc;
+		src += pos - kvm_vm_stats_header.desc_offset;
+		if (copy_to_user(dest, src, copylen))
+			return -EFAULT;
+		remain -= copylen;
+		pos += copylen;
+		dest += copylen;
+	}
+	/* Copy kvm vm stats values */
+	copylen = kvm_vm_stats_header.data_offset + size_stats - pos;
+	copylen = min(copylen, remain);
+	if (copylen > 0) {
+		src = (void *)&kvm->stat;
+		src += pos - kvm_vm_stats_header.data_offset;
+		if (copy_to_user(dest, src, copylen))
+			return -EFAULT;
+		remain -= copylen;
+		pos += copylen;
+		dest += copylen;
+	}
+
+	*offset = pos;
+	return len;
+}
+
+static const struct file_operations kvm_vm_stats_fops = {
+	.read = kvm_vm_stats_read,
+	.llseek = noop_llseek,
+};
+
+static int kvm_vm_ioctl_get_statsfd(struct kvm *kvm)
+{
+	int error, fd;
+	struct file *file;
+
+	error = get_unused_fd_flags(O_CLOEXEC);
+	if (error < 0)
+		return error;
+	fd = error;
+
+	file = anon_inode_getfile("kvm-vm-stats",
+			&kvm_vm_stats_fops, kvm, O_RDONLY);
+	if (IS_ERR(file)) {
+		error = PTR_ERR(file);
+		goto err_put_unused_fd;
+	}
+	file->f_mode |= FMODE_PREAD;
+	fd_install(fd, file);
+
+	return fd;
+
+err_put_unused_fd:
+	put_unused_fd(fd);
+	return error;
+}
+
 static long kvm_vm_ioctl(struct file *filp,
 			   unsigned int ioctl, unsigned long arg)
 {
@@ -4149,6 +4369,9 @@ static long kvm_vm_ioctl(struct file *filp,
 	case KVM_RESET_DIRTY_RINGS:
 		r = kvm_vm_ioctl_reset_dirty_pages(kvm);
 		break;
+	case KVM_STATS_GETFD:
+		r = kvm_vm_ioctl_get_statsfd(kvm);
+		break;
 	default:
 		r = kvm_arch_vm_ioctl(filp, ioctl, arg);
 	}
-- 
2.31.1.751.gd2f1c929bd-goog


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v5 3/4] KVM: stats: Add documentation for statistics data binary interface
  2021-05-17 14:53 [PATCH v5 0/4] KVM statistics data fd-based binary interface Jing Zhang
  2021-05-17 14:53 ` [PATCH v5 1/4] KVM: stats: Separate common stats from architecture specific ones Jing Zhang
  2021-05-17 14:53 ` [PATCH v5 2/4] KVM: stats: Add fd-based API to read binary stats data Jing Zhang
@ 2021-05-17 14:53 ` Jing Zhang
  2021-05-19 16:57   ` David Matlack
  2021-05-19 17:02   ` David Matlack
  2021-05-17 14:53 ` [PATCH v5 4/4] KVM: selftests: Add selftest for KVM " Jing Zhang
  2021-05-17 14:55 ` [PATCH v5 0/4] KVM statistics data fd-based " Jing Zhang
  4 siblings, 2 replies; 30+ messages in thread
From: Jing Zhang @ 2021-05-17 14:53 UTC (permalink / raw)
  To: KVM, KVMARM, LinuxMIPS, KVMPPC, LinuxS390, Linuxkselftest,
	Paolo Bonzini, Marc Zyngier, James Morse, Julien Thierry,
	Suzuki K Poulose, Will Deacon, Huacai Chen, Aleksandar Markovic,
	Thomas Bogendoerfer, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck, Claudio Imbrenda,
	Sean Christopherson, Vitaly Kuznetsov, Jim Mattson, Peter Shier,
	Oliver Upton, David Rientjes, Emanuele Giuseppe Esposito
  Cc: Jing Zhang

Update KVM API documentation for binary statistics.

Signed-off-by: Jing Zhang <jingzhangos@google.com>
---
 Documentation/virt/kvm/api.rst | 171 +++++++++++++++++++++++++++++++++
 1 file changed, 171 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 7fcb2fd38f42..9a6aa9770dfd 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -5034,6 +5034,169 @@ see KVM_XEN_VCPU_SET_ATTR above.
 The KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADJUST type may not be used
 with the KVM_XEN_VCPU_GET_ATTR ioctl.
 
+4.130 KVM_STATS_GETFD
+---------------------
+
+:Capability: KVM_CAP_STATS_BINARY_FD
+:Architectures: all
+:Type: vm ioctl, vcpu ioctl
+:Parameters: none
+:Returns: statistics file descriptor on success, < 0 on error
+
+Errors:
+
+  ======     ======================================================
+  ENOMEM     if the fd could not be created due to lack of memory
+  EMFILE     if the number of opened files exceeds the limit
+  ======     ======================================================
+
+The file descriptor can be used to read VM/vCPU statistics data in binary
+format. The file data is organized into three blocks as below:
++-------------+
+|   Header    |
++-------------+
+| Descriptors |
++-------------+
+| Stats Data  |
++-------------+
+
+The Header block is always at the start of the file. It is only needed to be
+read one time after a system boot.
+It is in the form of ``struct kvm_stats_header`` as below::
+
+	#define KVM_STATS_ID_MAXLEN		64
+
+	struct kvm_stats_header {
+		char id[KVM_STATS_ID_MAXLEN];
+		__u32 name_size;
+		__u32 count;
+		__u32 desc_offset;
+		__u32 data_offset;
+	};
+
+The ``id`` field is identification for the corresponding KVM statistics. For
+KVM statistics, it is in the form of "kvm-{kvm pid}", like "kvm-12345". For
+VCPU statistics, it is in the form of "kvm-{kvm pid}/vcpu-{vcpu id}", like
+"kvm-12345/vcpu-12".
+
+The ``name_size`` field is the size (byte) of the statistics name string
+(including trailing '\0') appended to the end of every statistics descriptor.
+
+The ``count`` field is the number of statistics.
+
+The ``desc_offset`` field is the offset of the Descriptors block from the start
+of the file indicated by the file descriptor.
+
+The ``data_offset`` field is the offset of the Stats Data block from the start
+of the file indicated by the file descriptor.
+
+The Descriptors block is only needed to be read once after a system boot. It is
+an array of ``struct kvm_stats_desc`` as below::
+
+	#define KVM_STATS_TYPE_SHIFT		0
+	#define KVM_STATS_TYPE_MASK		(0xF << KVM_STATS_TYPE_SHIFT)
+	#define KVM_STATS_TYPE_CUMULATIVE	(0x0 << KVM_STATS_TYPE_SHIFT)
+	#define KVM_STATS_TYPE_INSTANT		(0x1 << KVM_STATS_TYPE_SHIFT)
+	#define KVM_STATS_TYPE_MAX		KVM_STATS_TYPE_INSTANT
+
+	#define KVM_STATS_UNIT_SHIFT		4
+	#define KVM_STATS_UNIT_MASK		(0xF << KVM_STATS_UNIT_SHIFT)
+	#define KVM_STATS_UNIT_NONE		(0x0 << KVM_STATS_UNIT_SHIFT)
+	#define KVM_STATS_UNIT_BYTES		(0x1 << KVM_STATS_UNIT_SHIFT)
+	#define KVM_STATS_UNIT_SECONDS		(0x2 << KVM_STATS_UNIT_SHIFT)
+	#define KVM_STATS_UNIT_CYCLES		(0x3 << KVM_STATS_UNIT_SHIFT)
+	#define KVM_STATS_UNIT_MAX		KVM_STATS_UNIT_CYCLES
+
+	#define KVM_STATS_SCALE_SHIFT		8
+	#define KVM_STATS_SCALE_MASK		(0xF << KVM_STATS_SCALE_SHIFT)
+	#define KVM_STATS_SCALE_POW10		(0x0 << KVM_STATS_SCALE_SHIFT)
+	#define KVM_STATS_SCALE_POW2		(0x1 << KVM_STATS_SCALE_SHIFT)
+	#define KVM_STATS_SCALE_MAX		KVM_STATS_SCALE_POW2
+
+	struct kvm_stats_desc {
+		__u32 flags;
+		__s16 exponent;
+		__u16 size;
+		__u32 unused1;
+		__u32 unused2;
+		char name[0];
+	};
+
+The ``flags`` field contains the type and unit of the statistics data described
+by this descriptor. The following flags are supported:
+  * ``KVM_STATS_TYPE_CUMULATIVE``
+    The statistics data is cumulative. The value of data can only be increased.
+    Most of the counters used in KVM are of this type.
+    The corresponding ``count`` filed for this type is always 1.
+  * ``KVM_STATS_TYPE_INSTANT``
+    The statistics data is instantaneous. Its value can be increased or
+    decreased. This type is usually used as a measurement of some resources,
+    like the number of dirty pages, the number of large pages, etc.
+    The corresponding ``count`` field for this type is always 1.
+  * ``KVM_STATS_UNIT_NONE``
+    There is no unit for the value of statistics data. This usually means that
+    the value is a simple counter of an event.
+  * ``KVM_STATS_UNIT_BYTES``
+    It indicates that the statistics data is used to measure memory size, in the
+    unit of Byte, KiByte, MiByte, GiByte, etc. The unit of the data is
+    determined by the ``exponent`` field in the descriptor. The
+    ``KVM_STATS_SCALE_POW2`` flag is valid in this case. The unit of the data is
+    determined by ``pow(2, exponent)``. For example, if value is 10,
+    ``exponent`` is 20, which means the unit of statistics data is MiByte, we
+    can get the statistics data in the unit of Byte by
+    ``value * pow(2, exponent) = 10 * pow(2, 20) = 10 MiByte`` which is
+    10 * 1024 * 1024 Bytes.
+  * ``KVM_STATS_UNIT_SECONDS``
+    It indicates that the statistics data is used to measure time/latency, in
+    the unit of nanosecond, microsecond, millisecond and second. The unit of the
+    data is determined by the ``exponent`` field in the descriptor. The
+    ``KVM_STATS_SCALE_POW10`` flag is valid in this case. The unit of the data
+    is determined by ``pow(10, exponent)``. For example, if value is 2000000,
+    ``exponent`` is -6, which means the unit of statistics data is microsecond,
+    we can get the statistics data in the unit of second by
+    ``value * pow(10, exponent) = 2000000 * pow(10, -6) = 2 seconds``.
+  * ``KVM_STATS_UNIT_CYCLES``
+    It indicates that the statistics data is used to measure CPU clock cycles.
+    The ``KVM_STATS_SCALE_POW10`` flag is valid in this case. For example, if
+    value is 200, ``exponent`` is 4, we can get the number of CPU clock cycles
+    by ``value * pow(10, exponent) = 200 * pow(10, 4) = 2000000``.
+
+The ``exponent`` field is the scale of corresponding statistics data. It has two
+values as follows:
+  * ``KVM_STATS_SCALE_POW10``
+    The scale is based on power of 10. It is used for measurement of time and
+    CPU clock cycles.
+  * ``KVM_STATS_SCALE_POW2``
+    The scale is based on power of 2. It is used for measurement of memory size.
+
+The ``size`` field is the number of values of this statistics data. It is in the
+unit of ``unsigned long`` for VCPU or ``__u64`` for VM.
+
+The ``unused1`` and ``unused2`` fields are reserved for future
+support for other types of statistics data, like log/linear histogram.
+
+The ``name`` field points to the name string of the statistics data. The name
+string starts at the end of ``struct kvm_stats_desc``.
+The maximum length (including trailing '\0') is indicated by ``name_size``
+in ``struct kvm_stats_header``.
+
+The Stats Data block contains an array of data values of type ``struct
+kvm_vm_stats_data`` or ``struct kvm_vcpu_stats_data``. It would be read by
+user space periodically to pull statistics data.
+The order of data value in Stats Data block is the same as the order of
+descriptors in Descriptors block.
+  * Statistics data for VM::
+
+	struct kvm_vm_stats_data {
+		unsigned long value[0];
+	};
+
+  * Statistics data for VCPU::
+
+	struct kvm_vcpu_stats_data {
+		__u64 value[0];
+	};
+
 5. The kvm_run structure
 ========================
 
@@ -6891,3 +7054,11 @@ This capability is always enabled.
 This capability indicates that the KVM virtual PTP service is
 supported in the host. A VMM can check whether the service is
 available to the guest on migration.
+
+8.33 KVM_CAP_STATS_BINARY_FD
+----------------------------
+
+:Architectures: all
+
+This capability indicates the feature that user space can create get a file
+descriptor for every VM and VCPU to read statistics data in binary format.
-- 
2.31.1.751.gd2f1c929bd-goog


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v5 4/4] KVM: selftests: Add selftest for KVM statistics data binary interface
  2021-05-17 14:53 [PATCH v5 0/4] KVM statistics data fd-based binary interface Jing Zhang
                   ` (2 preceding siblings ...)
  2021-05-17 14:53 ` [PATCH v5 3/4] KVM: stats: Add documentation for statistics data binary interface Jing Zhang
@ 2021-05-17 14:53 ` Jing Zhang
  2021-05-19 17:21   ` David Matlack
  2021-05-19 22:00   ` Ricardo Koller
  2021-05-17 14:55 ` [PATCH v5 0/4] KVM statistics data fd-based " Jing Zhang
  4 siblings, 2 replies; 30+ messages in thread
From: Jing Zhang @ 2021-05-17 14:53 UTC (permalink / raw)
  To: KVM, KVMARM, LinuxMIPS, KVMPPC, LinuxS390, Linuxkselftest,
	Paolo Bonzini, Marc Zyngier, James Morse, Julien Thierry,
	Suzuki K Poulose, Will Deacon, Huacai Chen, Aleksandar Markovic,
	Thomas Bogendoerfer, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck, Claudio Imbrenda,
	Sean Christopherson, Vitaly Kuznetsov, Jim Mattson, Peter Shier,
	Oliver Upton, David Rientjes, Emanuele Giuseppe Esposito
  Cc: Jing Zhang

Add selftest to check KVM stats descriptors validity.

Signed-off-by: Jing Zhang <jingzhangos@google.com>
---
 tools/testing/selftests/kvm/.gitignore        |   1 +
 tools/testing/selftests/kvm/Makefile          |   3 +
 .../testing/selftests/kvm/include/kvm_util.h  |   3 +
 .../selftests/kvm/kvm_bin_form_stats.c        | 379 ++++++++++++++++++
 tools/testing/selftests/kvm/lib/kvm_util.c    |  12 +
 5 files changed, 398 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/kvm_bin_form_stats.c

diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
index bd83158e0e0b..35796667c944 100644
--- a/tools/testing/selftests/kvm/.gitignore
+++ b/tools/testing/selftests/kvm/.gitignore
@@ -43,3 +43,4 @@
 /memslot_modification_stress_test
 /set_memory_region_test
 /steal_time
+/kvm_bin_form_stats
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index e439d027939d..2984c86c848a 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -76,6 +76,7 @@ TEST_GEN_PROGS_x86_64 += kvm_page_table_test
 TEST_GEN_PROGS_x86_64 += memslot_modification_stress_test
 TEST_GEN_PROGS_x86_64 += set_memory_region_test
 TEST_GEN_PROGS_x86_64 += steal_time
+TEST_GEN_PROGS_x86_64 += kvm_bin_form_stats
 
 TEST_GEN_PROGS_aarch64 += aarch64/get-reg-list
 TEST_GEN_PROGS_aarch64 += aarch64/get-reg-list-sve
@@ -87,6 +88,7 @@ TEST_GEN_PROGS_aarch64 += kvm_create_max_vcpus
 TEST_GEN_PROGS_aarch64 += kvm_page_table_test
 TEST_GEN_PROGS_aarch64 += set_memory_region_test
 TEST_GEN_PROGS_aarch64 += steal_time
+TEST_GEN_PROGS_aarch64 += kvm_bin_form_stats
 
 TEST_GEN_PROGS_s390x = s390x/memop
 TEST_GEN_PROGS_s390x += s390x/resets
@@ -96,6 +98,7 @@ TEST_GEN_PROGS_s390x += dirty_log_test
 TEST_GEN_PROGS_s390x += kvm_create_max_vcpus
 TEST_GEN_PROGS_s390x += kvm_page_table_test
 TEST_GEN_PROGS_s390x += set_memory_region_test
+TEST_GEN_PROGS_s390x += kvm_bin_form_stats
 
 TEST_GEN_PROGS += $(TEST_GEN_PROGS_$(UNAME_M))
 LIBKVM += $(LIBKVM_$(UNAME_M))
diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
index a8f022794ce3..ee01a67022d9 100644
--- a/tools/testing/selftests/kvm/include/kvm_util.h
+++ b/tools/testing/selftests/kvm/include/kvm_util.h
@@ -387,4 +387,7 @@ uint64_t get_ucall(struct kvm_vm *vm, uint32_t vcpu_id, struct ucall *uc);
 #define GUEST_ASSERT_4(_condition, arg1, arg2, arg3, arg4) \
 	__GUEST_ASSERT((_condition), 4, (arg1), (arg2), (arg3), (arg4))
 
+int vm_get_statsfd(struct kvm_vm *vm);
+int vcpu_get_statsfd(struct kvm_vm *vm, uint32_t vcpuid);
+
 #endif /* SELFTEST_KVM_UTIL_H */
diff --git a/tools/testing/selftests/kvm/kvm_bin_form_stats.c b/tools/testing/selftests/kvm/kvm_bin_form_stats.c
new file mode 100644
index 000000000000..dae44397d0f4
--- /dev/null
+++ b/tools/testing/selftests/kvm/kvm_bin_form_stats.c
@@ -0,0 +1,379 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * kvm_bin_form_stats
+ *
+ * Copyright (C) 2021, Google LLC.
+ *
+ * Test the fd-based interface for KVM statistics.
+ */
+
+#define _GNU_SOURCE /* for program_invocation_short_name */
+#include <fcntl.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <errno.h>
+
+#include "test_util.h"
+
+#include "kvm_util.h"
+#include "asm/kvm.h"
+#include "linux/kvm.h"
+
+int vm_stats_test(struct kvm_vm *vm)
+{
+	ssize_t ret;
+	int i, stats_fd, err = -1;
+	size_t size_desc, size_data = 0;
+	struct kvm_stats_header header;
+	struct kvm_stats_desc *stats_desc, *pdesc;
+	struct kvm_vm_stats_data *stats_data;
+
+	/* Get fd for VM stats */
+	stats_fd = vm_get_statsfd(vm);
+	if (stats_fd < 0) {
+		perror("Get VM stats fd");
+		return err;
+	}
+	/* Read kvm vm stats header */
+	ret = read(stats_fd, &header, sizeof(header));
+	if (ret != sizeof(header)) {
+		perror("Read VM stats header");
+		goto out_close_fd;
+	}
+	size_desc = sizeof(*stats_desc) + header.name_size;
+	/* Check id string in header, that should start with "kvm" */
+	if (strncmp(header.id, "kvm", 3) ||
+			strlen(header.id) >= KVM_STATS_ID_MAXLEN) {
+		printf("Invalid KVM VM stats type!\n");
+		goto out_close_fd;
+	}
+	/* Sanity check for other fields in header */
+	if (header.count == 0) {
+		err = 0;
+		goto out_close_fd;
+	}
+	/* Check overlap */
+	if (header.desc_offset == 0 || header.data_offset == 0 ||
+			header.desc_offset < sizeof(header) ||
+			header.data_offset < sizeof(header)) {
+		printf("Invalid offset fields in header!\n");
+		goto out_close_fd;
+	}
+	if (header.desc_offset < header.data_offset &&
+			(header.desc_offset + size_desc * header.count >
+			header.data_offset)) {
+		printf("VM Descriptor block is overlapped with data block!\n");
+		goto out_close_fd;
+	}
+
+	/* Allocate memory for stats descriptors */
+	stats_desc = calloc(header.count, size_desc);
+	if (!stats_desc) {
+		perror("Allocate memory for VM stats descriptors");
+		goto out_close_fd;
+	}
+	/* Read kvm vm stats descriptors */
+	ret = pread(stats_fd, stats_desc,
+			size_desc * header.count, header.desc_offset);
+	if (ret != size_desc * header.count) {
+		perror("Read KVM VM stats descriptors");
+		goto out_free_desc;
+	}
+	/* Sanity check for fields in descriptors */
+	for (i = 0; i < header.count; ++i) {
+		pdesc = (void *)stats_desc + i * size_desc;
+		/* Check type,unit,scale boundaries */
+		if ((pdesc->flags & KVM_STATS_TYPE_MASK) > KVM_STATS_TYPE_MAX) {
+			printf("Unknown KVM stats type!\n");
+			goto out_free_desc;
+		}
+		if ((pdesc->flags & KVM_STATS_UNIT_MASK) > KVM_STATS_UNIT_MAX) {
+			printf("Unknown KVM stats unit!\n");
+			goto out_free_desc;
+		}
+		if ((pdesc->flags & KVM_STATS_SCALE_MASK) >
+				KVM_STATS_SCALE_MAX) {
+			printf("Unknown KVM stats scale!\n");
+			goto out_free_desc;
+		}
+		/* Check exponent for stats unit
+		 * Exponent for counter should be greater than or equal to 0
+		 * Exponent for unit bytes should be greater than or equal to 0
+		 * Exponent for unit seconds should be less than or equal to 0
+		 * Exponent for unit clock cycles should be greater than or
+		 * equal to 0
+		 */
+		switch (pdesc->flags & KVM_STATS_UNIT_MASK) {
+		case KVM_STATS_UNIT_NONE:
+		case KVM_STATS_UNIT_BYTES:
+		case KVM_STATS_UNIT_CYCLES:
+			if (pdesc->exponent < 0) {
+				printf("Unsupported KVM stats unit!\n");
+				goto out_free_desc;
+			}
+			break;
+		case KVM_STATS_UNIT_SECONDS:
+			if (pdesc->exponent > 0) {
+				printf("Unsupported KVM stats unit!\n");
+				goto out_free_desc;
+			}
+			break;
+		}
+		/* Check name string */
+		if (strlen(pdesc->name) >= header.name_size) {
+			printf("KVM stats name(%s) too long!\n", pdesc->name);
+			goto out_free_desc;
+		}
+		/* Check size field, which should not be zero */
+		if (pdesc->size == 0) {
+			printf("KVM descriptor(%s) with size of 0!\n",
+					pdesc->name);
+			goto out_free_desc;
+		}
+		size_data += pdesc->size * sizeof(stats_data->value[0]);
+	}
+	/* Check overlap */
+	if (header.data_offset < header.desc_offset &&
+		header.data_offset + size_data > header.desc_offset) {
+		printf("Data block is overlapped with Descriptor block!\n");
+		goto out_free_desc;
+	}
+	/* Check validity of all stats data size */
+	if (size_data < header.count * sizeof(stats_data->value[0])) {
+		printf("Data size is not correct!\n");
+		goto out_free_desc;
+	}
+
+	/* Allocate memory for stats data */
+	stats_data = malloc(size_data);
+	if (!stats_data) {
+		perror("Allocate memory for VM stats data");
+		goto out_free_desc;
+	}
+	/* Read kvm vm stats data */
+	ret = pread(stats_fd, stats_data, size_data, header.data_offset);
+	if (ret != size_data) {
+		perror("Read KVM VM stats data");
+		goto out_free_data;
+	}
+
+	err = 0;
+out_free_data:
+	free(stats_data);
+out_free_desc:
+	free(stats_desc);
+out_close_fd:
+	close(stats_fd);
+	return err;
+}
+
+int vcpu_stats_test(struct kvm_vm *vm, int vcpu_id)
+{
+	ssize_t ret;
+	int i, stats_fd, err = -1;
+	size_t size_desc, size_data = 0;
+	struct kvm_stats_header header;
+	struct kvm_stats_desc *stats_desc, *pdesc;
+	struct kvm_vcpu_stats_data *stats_data;
+
+	/* Get fd for VCPU stats */
+	stats_fd = vcpu_get_statsfd(vm, vcpu_id);
+	if (stats_fd < 0) {
+		perror("Get VCPU stats fd");
+		return err;
+	}
+	/* Read kvm vcpu stats header */
+	ret = read(stats_fd, &header, sizeof(header));
+	if (ret != sizeof(header)) {
+		perror("Read VCPU stats header");
+		goto out_close_fd;
+	}
+	size_desc = sizeof(*stats_desc) + header.name_size;
+	/* Check id string in header, that should start with "kvm" */
+	if (strncmp(header.id, "kvm", 3) ||
+			strlen(header.id) >= KVM_STATS_ID_MAXLEN) {
+		printf("Invalid KVM VCPU stats type!\n");
+		goto out_close_fd;
+	}
+	/* Sanity check for other fields in header */
+	if (header.count == 0) {
+		err = 0;
+		goto out_close_fd;
+	}
+	/* Check overlap */
+	if (header.desc_offset == 0 || header.data_offset == 0 ||
+			header.desc_offset < sizeof(header) ||
+			header.data_offset < sizeof(header)) {
+		printf("Invalid offset fields in header!\n");
+		goto out_close_fd;
+	}
+	if (header.desc_offset < header.data_offset &&
+			(header.desc_offset + size_desc * header.count >
+			header.data_offset)) {
+		printf("VCPU Descriptor block is overlapped with data block!\n");
+		goto out_close_fd;
+	}
+
+	/* Allocate memory for stats descriptors */
+	stats_desc = calloc(header.count, size_desc);
+	if (!stats_desc) {
+		perror("Allocate memory for VCPU stats descriptors");
+		goto out_close_fd;
+	}
+	/* Read kvm vcpu stats descriptors */
+	ret = pread(stats_fd, stats_desc,
+			size_desc * header.count, header.desc_offset);
+	if (ret != size_desc * header.count) {
+		perror("Read KVM VCPU stats descriptors");
+		goto out_free_desc;
+	}
+	/* Sanity check for fields in descriptors */
+	for (i = 0; i < header.count; ++i) {
+		pdesc = (void *)stats_desc + i * size_desc;
+		/* Check boundaries */
+		if ((pdesc->flags & KVM_STATS_TYPE_MASK) > KVM_STATS_TYPE_MAX) {
+			printf("Unknown KVM stats type!\n");
+			goto out_free_desc;
+		}
+		if ((pdesc->flags & KVM_STATS_UNIT_MASK) > KVM_STATS_UNIT_MAX) {
+			printf("Unknown KVM stats unit!\n");
+			goto out_free_desc;
+		}
+		if ((pdesc->flags & KVM_STATS_SCALE_MASK) >
+				KVM_STATS_SCALE_MAX) {
+			printf("Unknown KVM stats scale!\n");
+			goto out_free_desc;
+		}
+		/* Check exponent for stats unit
+		 * Exponent for counter should be greater than or equal to 0
+		 * Exponent for unit bytes should be greater than or equal to 0
+		 * Exponent for unit seconds should be less than or equal to 0
+		 * Exponent for unit clock cycles should be greater than or
+		 * equal to 0
+		 */
+		switch (pdesc->flags & KVM_STATS_UNIT_MASK) {
+		case KVM_STATS_UNIT_NONE:
+		case KVM_STATS_UNIT_BYTES:
+		case KVM_STATS_UNIT_CYCLES:
+			if (pdesc->exponent < 0) {
+				printf("Unsupported KVM stats unit!\n");
+				goto out_free_desc;
+			}
+			break;
+		case KVM_STATS_UNIT_SECONDS:
+			if (pdesc->exponent > 0) {
+				printf("Unsupported KVM stats unit!\n");
+				goto out_free_desc;
+			}
+			break;
+		}
+		/* Check name string */
+		if (strlen(pdesc->name) >= header.name_size) {
+			printf("KVM stats name(%s) too long!\n", pdesc->name);
+			goto out_free_desc;
+		}
+		/* Check size field, which should not be zero */
+		if (pdesc->size == 0) {
+			printf("KVM descriptor(%s) with size of 0!\n",
+					pdesc->name);
+			goto out_free_desc;
+		}
+		size_data += pdesc->size * sizeof(stats_data->value[0]);
+	}
+	/* Check overlap */
+	if (header.data_offset < header.desc_offset &&
+		header.data_offset + size_data > header.desc_offset) {
+		printf("Data block is overlapped with Descriptor block!\n");
+		goto out_free_desc;
+	}
+	/* Check validity of all stats data size */
+	if (size_data < header.count * sizeof(stats_data->value[0])) {
+		printf("Data size is not correct!\n");
+		goto out_free_desc;
+	}
+
+	/* Allocate memory for stats data */
+	stats_data = malloc(size_data);
+	if (!stats_data) {
+		perror("Allocate memory for VCPU stats data");
+		goto out_free_desc;
+	}
+	/* Read kvm vcpu stats data */
+	ret = pread(stats_fd, stats_data, size_data, header.data_offset);
+	if (ret != size_data) {
+		perror("Read KVM VCPU stats data");
+		goto out_free_data;
+	}
+
+	err = 0;
+out_free_data:
+	free(stats_data);
+out_free_desc:
+	free(stats_desc);
+out_close_fd:
+	close(stats_fd);
+	return err;
+}
+
+/*
+ * Usage: kvm_bin_form_stats [#vm] [#vcpu]
+ * The first parameter #vm set the number of VMs being created.
+ * The second parameter #vcpu set the number of VCPUs being created.
+ * By default, 1 VM and 1 VCPU for the VM would be created for testing.
+ */
+
+int main(int argc, char *argv[])
+{
+	int max_vm = 1, max_vcpu = 1, ret, i, j, err = -1;
+	struct kvm_vm **vms;
+
+	/* Get the number of VMs and VCPUs that would be created for testing. */
+	if (argc > 1) {
+		max_vm = strtol(argv[1], NULL, 0);
+		if (max_vm <= 0)
+			max_vm = 1;
+	}
+	if (argc > 2) {
+		max_vcpu = strtol(argv[2], NULL, 0);
+		if (max_vcpu <= 0)
+			max_vcpu = 1;
+	}
+
+	/* Check the extension for binary stats */
+	ret = kvm_check_cap(KVM_CAP_STATS_BINARY_FD);
+	if (ret < 0) {
+		printf("Binary form statistics interface is not supported!\n");
+		return err;
+	}
+
+	/* Create VMs and VCPUs */
+	vms = malloc(sizeof(vms[0]) * max_vm);
+	if (!vms) {
+		perror("Allocate memory for storing VM pointers");
+		return err;
+	}
+	for (i = 0; i < max_vm; ++i) {
+		vms[i] = vm_create(VM_MODE_DEFAULT,
+				DEFAULT_GUEST_PHY_PAGES, O_RDWR);
+		for (j = 0; j < max_vcpu; ++j)
+			vm_vcpu_add(vms[i], j);
+	}
+
+	/* Check stats read for every VM and VCPU */
+	for (i = 0; i < max_vm; ++i) {
+		if (vm_stats_test(vms[i]))
+			goto out_free_vm;
+		for (j = 0; j < max_vcpu; ++j) {
+			if (vcpu_stats_test(vms[i], j))
+				goto out_free_vm;
+		}
+	}
+
+	err = 0;
+out_free_vm:
+	for (i = 0; i < max_vm; ++i)
+		kvm_vm_free(vms[i]);
+	free(vms);
+	return err;
+}
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index fc83f6c5902d..d9e0b2c8b906 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -2090,3 +2090,15 @@ unsigned int vm_calc_num_guest_pages(enum vm_guest_mode mode, size_t size)
 	n = DIV_ROUND_UP(size, vm_guest_mode_params[mode].page_size);
 	return vm_adjust_num_guest_pages(mode, n);
 }
+
+int vm_get_statsfd(struct kvm_vm *vm)
+{
+	return ioctl(vm->fd, KVM_STATS_GETFD, NULL);
+}
+
+int vcpu_get_statsfd(struct kvm_vm *vm, uint32_t vcpuid)
+{
+	struct vcpu *vcpu = vcpu_find(vm, vcpuid);
+
+	return ioctl(vcpu->fd, KVM_STATS_GETFD, NULL);
+}
-- 
2.31.1.751.gd2f1c929bd-goog


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 0/4] KVM statistics data fd-based binary interface
  2021-05-17 14:53 [PATCH v5 0/4] KVM statistics data fd-based binary interface Jing Zhang
                   ` (3 preceding siblings ...)
  2021-05-17 14:53 ` [PATCH v5 4/4] KVM: selftests: Add selftest for KVM " Jing Zhang
@ 2021-05-17 14:55 ` Jing Zhang
  4 siblings, 0 replies; 30+ messages in thread
From: Jing Zhang @ 2021-05-17 14:55 UTC (permalink / raw)
  To: KVM, KVMARM, LinuxMIPS, KVMPPC, LinuxS390, Linuxkselftest,
	Paolo Bonzini, Marc Zyngier, James Morse, Julien Thierry,
	Suzuki K Poulose, Will Deacon, Huacai Chen, Aleksandar Markovic,
	Thomas Bogendoerfer, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck, Claudio Imbrenda,
	Sean Christopherson, Vitaly Kuznetsov, Jim Mattson, Peter Shier,
	Oliver Upton, David Rientjes, Emanuele Giuseppe Esposito

Hi Paolo,

On Mon, May 17, 2021 at 9:53 AM Jing Zhang <jingzhangos@google.com> wrote:
>
> This patchset provides a file descriptor for every VM and VCPU to read
> KVM statistics data in binary format.
> It is meant to provide a lightweight, flexible, scalable and efficient
> lock-free solution for user space telemetry applications to pull the
> statistics data periodically for large scale systems. The pulling
> frequency could be as high as a few times per second.
> In this patchset, every statistics data are treated to have some
> attributes as below:
>   * architecture dependent or common
>   * VM statistics data or VCPU statistics data
>   * type: cumulative, instantaneous,
>   * unit: none for simple counter, nanosecond, microsecond,
>     millisecond, second, Byte, KiByte, MiByte, GiByte. Clock Cycles
> Since no lock/synchronization is used, the consistency between all
> the statistics data is not guaranteed. That means not all statistics
> data are read out at the exact same time, since the statistics date
> are still being updated by KVM subsystems while they are read out.
>
> ---
>
> * v4 -> v5
>   - Rebase to kvm/queue, commit a4345a7cecfb ("Merge tag
>     'kvmarm-fixes-5.13-1'")
>   - Change maximum stats name length to 48
>   - Replace VM_STATS_COMMON/VCPU_STATS_COMMON macros with stats
>     descriptor definition macros.
>   - Fixed some errors/warnings reported by checkpatch.pl
>
> * v3 -> v4
>   - Rebase to kvm/queue, commit 9f242010c3b4 ("KVM: avoid "deadlock"
>     between install_new_memslots and MMU notifier")
>   - Use C-stype comments in the whole patch
>   - Fix wrong count for x86 VCPU stats descriptors
>   - Fix KVM stats data size counting and validity check in selftest
>
> * v2 -> v3
>   - Rebase to kvm/queue, commit edf408f5257b ("KVM: avoid "deadlock"
>     between install_new_memslots and MMU notifier")
>   - Resolve some nitpicks about format
>
> * v1 -> v2
>   - Use ARRAY_SIZE to count the number of stats descriptors
>   - Fix missing `size` field initialization in macro STATS_DESC
>
> [1] https://lore.kernel.org/kvm/20210402224359.2297157-1-jingzhangos@google.com
> [2] https://lore.kernel.org/kvm/20210415151741.1607806-1-jingzhangos@google.com
> [3] https://lore.kernel.org/kvm/20210423181727.596466-1-jingzhangos@google.com
> [4] https://lore.kernel.org/kvm/20210429203740.1935629-1-jingzhangos@google.com
>
> ---
>
> Jing Zhang (4):
>   KVM: stats: Separate common stats from architecture specific ones
>   KVM: stats: Add fd-based API to read binary stats data
>   KVM: stats: Add documentation for statistics data binary interface
>   KVM: selftests: Add selftest for KVM statistics data binary interface
>
>  Documentation/virt/kvm/api.rst                | 171 ++++++++
>  arch/arm64/include/asm/kvm_host.h             |   9 +-
>  arch/arm64/kvm/guest.c                        |  38 +-
>  arch/mips/include/asm/kvm_host.h              |   9 +-
>  arch/mips/kvm/mips.c                          |  64 ++-
>  arch/powerpc/include/asm/kvm_host.h           |   9 +-
>  arch/powerpc/kvm/book3s.c                     |  64 ++-
>  arch/powerpc/kvm/book3s_hv.c                  |  12 +-
>  arch/powerpc/kvm/book3s_pr.c                  |   2 +-
>  arch/powerpc/kvm/book3s_pr_papr.c             |   2 +-
>  arch/powerpc/kvm/booke.c                      |  59 ++-
>  arch/s390/include/asm/kvm_host.h              |   9 +-
>  arch/s390/kvm/kvm-s390.c                      | 129 +++++-
>  arch/x86/include/asm/kvm_host.h               |   9 +-
>  arch/x86/kvm/x86.c                            |  67 +++-
>  include/linux/kvm_host.h                      | 136 ++++++-
>  include/linux/kvm_types.h                     |  12 +
>  include/uapi/linux/kvm.h                      |  50 +++
>  tools/testing/selftests/kvm/.gitignore        |   1 +
>  tools/testing/selftests/kvm/Makefile          |   3 +
>  .../testing/selftests/kvm/include/kvm_util.h  |   3 +
>  .../selftests/kvm/kvm_bin_form_stats.c        | 379 ++++++++++++++++++
>  tools/testing/selftests/kvm/lib/kvm_util.c    |  12 +
>  virt/kvm/kvm_main.c                           | 237 ++++++++++-
>  24 files changed, 1396 insertions(+), 90 deletions(-)
>  create mode 100644 tools/testing/selftests/kvm/kvm_bin_form_stats.c
>
>
> base-commit: a4345a7cecfb91ae78cd43d26b0c6a956420761a
> --
> 2.31.1.751.gd2f1c929bd-goog
>
Please use this patchset which has some nontrivial changes and improvements.

Thanks,
Jing

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 1/4] KVM: stats: Separate common stats from architecture specific ones
  2021-05-17 14:53 ` [PATCH v5 1/4] KVM: stats: Separate common stats from architecture specific ones Jing Zhang
@ 2021-05-17 23:39   ` David Matlack
  2021-05-18  0:10     ` Jing Zhang
  0 siblings, 1 reply; 30+ messages in thread
From: David Matlack @ 2021-05-17 23:39 UTC (permalink / raw)
  To: Jing Zhang
  Cc: KVM, KVMARM, LinuxMIPS, KVMPPC, LinuxS390, Linuxkselftest,
	Paolo Bonzini, Marc Zyngier, James Morse, Julien Thierry,
	Suzuki K Poulose, Will Deacon, Huacai Chen, Aleksandar Markovic,
	Thomas Bogendoerfer, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck, Claudio Imbrenda,
	Sean Christopherson, Vitaly Kuznetsov, Jim Mattson, Peter Shier,
	Oliver Upton, David Rientjes, Emanuele Giuseppe Esposito

On Mon, May 17, 2021 at 9:24 AM Jing Zhang <jingzhangos@google.com> wrote:
>
> Put all common statistics in a separate structure to ease
> statistics handling for the incoming new statistics API.
>
> No functional change intended.
>
> Signed-off-by: Jing Zhang <jingzhangos@google.com>
> ---
>  arch/arm64/include/asm/kvm_host.h   |  9 ++-------
>  arch/arm64/kvm/guest.c              | 12 ++++++------
>  arch/mips/include/asm/kvm_host.h    |  9 ++-------
>  arch/mips/kvm/mips.c                | 12 ++++++------
>  arch/powerpc/include/asm/kvm_host.h |  9 ++-------
>  arch/powerpc/kvm/book3s.c           | 12 ++++++------
>  arch/powerpc/kvm/book3s_hv.c        | 12 ++++++------
>  arch/powerpc/kvm/book3s_pr.c        |  2 +-
>  arch/powerpc/kvm/book3s_pr_papr.c   |  2 +-
>  arch/powerpc/kvm/booke.c            | 14 +++++++-------
>  arch/s390/include/asm/kvm_host.h    |  9 ++-------
>  arch/s390/kvm/kvm-s390.c            | 12 ++++++------
>  arch/x86/include/asm/kvm_host.h     |  9 ++-------
>  arch/x86/kvm/x86.c                  | 14 +++++++-------
>  include/linux/kvm_host.h            |  9 +++++++--
>  include/linux/kvm_types.h           | 12 ++++++++++++
>  virt/kvm/kvm_main.c                 | 14 +++++++-------
>  17 files changed, 82 insertions(+), 90 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 7cd7d5c8c4bc..f3ad7a20b0af 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -556,16 +556,11 @@ static inline bool __vcpu_write_sys_reg_to_cpu(u64 val, int reg)
>  }
>
>  struct kvm_vm_stat {
> -       ulong remote_tlb_flush;
> +       struct kvm_vm_stat_common common;
>  };
>
>  struct kvm_vcpu_stat {
> -       u64 halt_successful_poll;
> -       u64 halt_attempted_poll;
> -       u64 halt_poll_success_ns;
> -       u64 halt_poll_fail_ns;
> -       u64 halt_poll_invalid;
> -       u64 halt_wakeup;
> +       struct kvm_vcpu_stat_common common;
>         u64 hvc_exit_stat;
>         u64 wfe_exit_stat;
>         u64 wfi_exit_stat;
> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
> index 5cb4a1cd5603..0e41331b0911 100644
> --- a/arch/arm64/kvm/guest.c
> +++ b/arch/arm64/kvm/guest.c
> @@ -29,18 +29,18 @@
>  #include "trace.h"
>
>  struct kvm_stats_debugfs_item debugfs_entries[] = {
> -       VCPU_STAT("halt_successful_poll", halt_successful_poll),
> -       VCPU_STAT("halt_attempted_poll", halt_attempted_poll),
> -       VCPU_STAT("halt_poll_invalid", halt_poll_invalid),
> -       VCPU_STAT("halt_wakeup", halt_wakeup),
> +       VCPU_STAT_COM("halt_successful_poll", halt_successful_poll),
> +       VCPU_STAT_COM("halt_attempted_poll", halt_attempted_poll),
> +       VCPU_STAT_COM("halt_poll_invalid", halt_poll_invalid),
> +       VCPU_STAT_COM("halt_wakeup", halt_wakeup),

nit: I may be alone in this but I find using the  the following more readable:

        VCPU_STAT("halt_wakeup", common.halt_wakeup),

>         VCPU_STAT("hvc_exit_stat", hvc_exit_stat),
>         VCPU_STAT("wfe_exit_stat", wfe_exit_stat),
>         VCPU_STAT("wfi_exit_stat", wfi_exit_stat),
>         VCPU_STAT("mmio_exit_user", mmio_exit_user),
>         VCPU_STAT("mmio_exit_kernel", mmio_exit_kernel),
>         VCPU_STAT("exits", exits),
> -       VCPU_STAT("halt_poll_success_ns", halt_poll_success_ns),
> -       VCPU_STAT("halt_poll_fail_ns", halt_poll_fail_ns),
> +       VCPU_STAT_COM("halt_poll_success_ns", halt_poll_success_ns),
> +       VCPU_STAT_COM("halt_poll_fail_ns", halt_poll_fail_ns),
>         { NULL }
>  };
>
> diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h
> index fca4547d580f..6f610fbcd8d1 100644
> --- a/arch/mips/include/asm/kvm_host.h
> +++ b/arch/mips/include/asm/kvm_host.h
> @@ -109,10 +109,11 @@ static inline bool kvm_is_error_hva(unsigned long addr)
>  }
>
>  struct kvm_vm_stat {
> -       ulong remote_tlb_flush;
> +       struct kvm_vm_stat_common common;
>  };
>
>  struct kvm_vcpu_stat {
> +       struct kvm_vcpu_stat_common common;
>         u64 wait_exits;
>         u64 cache_exits;
>         u64 signal_exits;
> @@ -142,12 +143,6 @@ struct kvm_vcpu_stat {
>  #ifdef CONFIG_CPU_LOONGSON64
>         u64 vz_cpucfg_exits;
>  #endif
> -       u64 halt_successful_poll;
> -       u64 halt_attempted_poll;
> -       u64 halt_poll_success_ns;
> -       u64 halt_poll_fail_ns;
> -       u64 halt_poll_invalid;
> -       u64 halt_wakeup;
>  };
>
>  struct kvm_arch_memory_slot {
> diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
> index 4d4af97dcc88..f4fc60c05e9c 100644
> --- a/arch/mips/kvm/mips.c
> +++ b/arch/mips/kvm/mips.c
> @@ -68,12 +68,12 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
>  #ifdef CONFIG_CPU_LOONGSON64
>         VCPU_STAT("vz_cpucfg", vz_cpucfg_exits),
>  #endif
> -       VCPU_STAT("halt_successful_poll", halt_successful_poll),
> -       VCPU_STAT("halt_attempted_poll", halt_attempted_poll),
> -       VCPU_STAT("halt_poll_invalid", halt_poll_invalid),
> -       VCPU_STAT("halt_wakeup", halt_wakeup),
> -       VCPU_STAT("halt_poll_success_ns", halt_poll_success_ns),
> -       VCPU_STAT("halt_poll_fail_ns", halt_poll_fail_ns),
> +       VCPU_STAT_COM("halt_successful_poll", halt_successful_poll),
> +       VCPU_STAT_COM("halt_attempted_poll", halt_attempted_poll),
> +       VCPU_STAT_COM("halt_poll_invalid", halt_poll_invalid),
> +       VCPU_STAT_COM("halt_wakeup", halt_wakeup),
> +       VCPU_STAT_COM("halt_poll_success_ns", halt_poll_success_ns),
> +       VCPU_STAT_COM("halt_poll_fail_ns", halt_poll_fail_ns),
>         {NULL}
>  };
>
> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
> index 1e83359f286b..473d9d0804ff 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -80,12 +80,13 @@ struct kvmppc_book3s_shadow_vcpu;
>  struct kvm_nested_guest;
>
>  struct kvm_vm_stat {
> -       ulong remote_tlb_flush;
> +       struct kvm_vm_stat_common common;
>         ulong num_2M_pages;
>         ulong num_1G_pages;
>  };
>
>  struct kvm_vcpu_stat {
> +       struct kvm_vcpu_stat_common common;
>         u64 sum_exits;
>         u64 mmio_exits;
>         u64 signal_exits;
> @@ -101,14 +102,8 @@ struct kvm_vcpu_stat {
>         u64 emulated_inst_exits;
>         u64 dec_exits;
>         u64 ext_intr_exits;
> -       u64 halt_poll_success_ns;
> -       u64 halt_poll_fail_ns;
>         u64 halt_wait_ns;
> -       u64 halt_successful_poll;
> -       u64 halt_attempted_poll;
>         u64 halt_successful_wait;
> -       u64 halt_poll_invalid;
> -       u64 halt_wakeup;
>         u64 dbell_exits;
>         u64 gdbell_exits;
>         u64 ld;
> diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
> index 2b691f4d1f26..bd3a10e1fdaf 100644
> --- a/arch/powerpc/kvm/book3s.c
> +++ b/arch/powerpc/kvm/book3s.c
> @@ -47,14 +47,14 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
>         VCPU_STAT("dec", dec_exits),
>         VCPU_STAT("ext_intr", ext_intr_exits),
>         VCPU_STAT("queue_intr", queue_intr),
> -       VCPU_STAT("halt_poll_success_ns", halt_poll_success_ns),
> -       VCPU_STAT("halt_poll_fail_ns", halt_poll_fail_ns),
> +       VCPU_STAT_COM("halt_poll_success_ns", halt_poll_success_ns),
> +       VCPU_STAT_COM("halt_poll_fail_ns", halt_poll_fail_ns),
>         VCPU_STAT("halt_wait_ns", halt_wait_ns),
> -       VCPU_STAT("halt_successful_poll", halt_successful_poll),
> -       VCPU_STAT("halt_attempted_poll", halt_attempted_poll),
> +       VCPU_STAT_COM("halt_successful_poll", halt_successful_poll),
> +       VCPU_STAT_COM("halt_attempted_poll", halt_attempted_poll),
>         VCPU_STAT("halt_successful_wait", halt_successful_wait),
> -       VCPU_STAT("halt_poll_invalid", halt_poll_invalid),
> -       VCPU_STAT("halt_wakeup", halt_wakeup),
> +       VCPU_STAT_COM("halt_poll_invalid", halt_poll_invalid),
> +       VCPU_STAT_COM("halt_wakeup", halt_wakeup),
>         VCPU_STAT("pf_storage", pf_storage),
>         VCPU_STAT("sp_storage", sp_storage),
>         VCPU_STAT("pf_instruc", pf_instruc),
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 28a80d240b76..58e187e03c52 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -236,7 +236,7 @@ static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu)
>
>         waitp = kvm_arch_vcpu_get_wait(vcpu);
>         if (rcuwait_wake_up(waitp))
> -               ++vcpu->stat.halt_wakeup;
> +               ++vcpu->stat.common.halt_wakeup;
>
>         cpu = READ_ONCE(vcpu->arch.thread_cpu);
>         if (cpu >= 0 && kvmppc_ipi_thread(cpu))
> @@ -3925,7 +3925,7 @@ static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
>         cur = start_poll = ktime_get();
>         if (vc->halt_poll_ns) {
>                 ktime_t stop = ktime_add_ns(start_poll, vc->halt_poll_ns);
> -               ++vc->runner->stat.halt_attempted_poll;
> +               ++vc->runner->stat.common.halt_attempted_poll;
>
>                 vc->vcore_state = VCORE_POLLING;
>                 spin_unlock(&vc->lock);
> @@ -3942,7 +3942,7 @@ static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
>                 vc->vcore_state = VCORE_INACTIVE;
>
>                 if (!do_sleep) {
> -                       ++vc->runner->stat.halt_successful_poll;
> +                       ++vc->runner->stat.common.halt_successful_poll;
>                         goto out;
>                 }
>         }
> @@ -3954,7 +3954,7 @@ static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
>                 do_sleep = 0;
>                 /* If we polled, count this as a successful poll */
>                 if (vc->halt_poll_ns)
> -                       ++vc->runner->stat.halt_successful_poll;
> +                       ++vc->runner->stat.common.halt_successful_poll;
>                 goto out;
>         }
>
> @@ -3981,13 +3981,13 @@ static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
>                         ktime_to_ns(cur) - ktime_to_ns(start_wait);
>                 /* Attribute failed poll time */
>                 if (vc->halt_poll_ns)
> -                       vc->runner->stat.halt_poll_fail_ns +=
> +                       vc->runner->stat.common.halt_poll_fail_ns +=
>                                 ktime_to_ns(start_wait) -
>                                 ktime_to_ns(start_poll);
>         } else {
>                 /* Attribute successful poll time */
>                 if (vc->halt_poll_ns)
> -                       vc->runner->stat.halt_poll_success_ns +=
> +                       vc->runner->stat.common.halt_poll_success_ns +=
>                                 ktime_to_ns(cur) -
>                                 ktime_to_ns(start_poll);
>         }
> diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
> index d7733b07f489..214caa9d9675 100644
> --- a/arch/powerpc/kvm/book3s_pr.c
> +++ b/arch/powerpc/kvm/book3s_pr.c
> @@ -493,7 +493,7 @@ static void kvmppc_set_msr_pr(struct kvm_vcpu *vcpu, u64 msr)
>                 if (!vcpu->arch.pending_exceptions) {
>                         kvm_vcpu_block(vcpu);
>                         kvm_clear_request(KVM_REQ_UNHALT, vcpu);
> -                       vcpu->stat.halt_wakeup++;
> +                       vcpu->stat.common.halt_wakeup++;
>
>                         /* Unset POW bit after we woke up */
>                         msr &= ~MSR_POW;
> diff --git a/arch/powerpc/kvm/book3s_pr_papr.c b/arch/powerpc/kvm/book3s_pr_papr.c
> index 031c8015864a..9384625c8051 100644
> --- a/arch/powerpc/kvm/book3s_pr_papr.c
> +++ b/arch/powerpc/kvm/book3s_pr_papr.c
> @@ -378,7 +378,7 @@ int kvmppc_h_pr(struct kvm_vcpu *vcpu, unsigned long cmd)
>                 kvmppc_set_msr_fast(vcpu, kvmppc_get_msr(vcpu) | MSR_EE);
>                 kvm_vcpu_block(vcpu);
>                 kvm_clear_request(KVM_REQ_UNHALT, vcpu);
> -               vcpu->stat.halt_wakeup++;
> +               vcpu->stat.common.halt_wakeup++;
>                 return EMULATE_DONE;
>         case H_LOGICAL_CI_LOAD:
>                 return kvmppc_h_pr_logical_ci_load(vcpu);
> diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
> index 7d5fe43f85c4..07fdd7a1254a 100644
> --- a/arch/powerpc/kvm/booke.c
> +++ b/arch/powerpc/kvm/booke.c
> @@ -49,15 +49,15 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
>         VCPU_STAT("inst_emu", emulated_inst_exits),
>         VCPU_STAT("dec", dec_exits),
>         VCPU_STAT("ext_intr", ext_intr_exits),
> -       VCPU_STAT("halt_successful_poll", halt_successful_poll),
> -       VCPU_STAT("halt_attempted_poll", halt_attempted_poll),
> -       VCPU_STAT("halt_poll_invalid", halt_poll_invalid),
> -       VCPU_STAT("halt_wakeup", halt_wakeup),
> +       VCPU_STAT_COM("halt_successful_poll", halt_successful_poll),
> +       VCPU_STAT_COM("halt_attempted_poll", halt_attempted_poll),
> +       VCPU_STAT_COM("halt_poll_invalid", halt_poll_invalid),
> +       VCPU_STAT_COM("halt_wakeup", halt_wakeup),
>         VCPU_STAT("doorbell", dbell_exits),
>         VCPU_STAT("guest doorbell", gdbell_exits),
> -       VCPU_STAT("halt_poll_success_ns", halt_poll_success_ns),
> -       VCPU_STAT("halt_poll_fail_ns", halt_poll_fail_ns),
> -       VM_STAT("remote_tlb_flush", remote_tlb_flush),
> +       VCPU_STAT_COM("halt_poll_success_ns", halt_poll_success_ns),
> +       VCPU_STAT_COM("halt_poll_fail_ns", halt_poll_fail_ns),
> +       VM_STAT_COM("remote_tlb_flush", remote_tlb_flush),
>         { NULL }
>  };
>
> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
> index 8925f3969478..57a20897f3db 100644
> --- a/arch/s390/include/asm/kvm_host.h
> +++ b/arch/s390/include/asm/kvm_host.h
> @@ -361,6 +361,7 @@ struct sie_page {
>  };
>
>  struct kvm_vcpu_stat {
> +       struct kvm_vcpu_stat_common common;
>         u64 exit_userspace;
>         u64 exit_null;
>         u64 exit_external_request;
> @@ -370,13 +371,7 @@ struct kvm_vcpu_stat {
>         u64 exit_validity;
>         u64 exit_instruction;
>         u64 exit_pei;
> -       u64 halt_successful_poll;
> -       u64 halt_attempted_poll;
> -       u64 halt_poll_invalid;
>         u64 halt_no_poll_steal;
> -       u64 halt_wakeup;
> -       u64 halt_poll_success_ns;
> -       u64 halt_poll_fail_ns;
>         u64 instruction_lctl;
>         u64 instruction_lctlg;
>         u64 instruction_stctl;
> @@ -755,12 +750,12 @@ struct kvm_vcpu_arch {
>  };
>
>  struct kvm_vm_stat {
> +       struct kvm_vm_stat_common common;
>         u64 inject_io;
>         u64 inject_float_mchk;
>         u64 inject_pfault_done;
>         u64 inject_service_signal;
>         u64 inject_virtio;
> -       u64 remote_tlb_flush;
>  };
>
>  struct kvm_arch_memory_slot {
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index 1296fc10f80c..d6bf3372bb10 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -72,13 +72,13 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
>         VCPU_STAT("exit_program_interruption", exit_program_interruption),
>         VCPU_STAT("exit_instr_and_program_int", exit_instr_and_program),
>         VCPU_STAT("exit_operation_exception", exit_operation_exception),
> -       VCPU_STAT("halt_successful_poll", halt_successful_poll),
> -       VCPU_STAT("halt_attempted_poll", halt_attempted_poll),
> -       VCPU_STAT("halt_poll_invalid", halt_poll_invalid),
> +       VCPU_STAT_COM("halt_successful_poll", halt_successful_poll),
> +       VCPU_STAT_COM("halt_attempted_poll", halt_attempted_poll),
> +       VCPU_STAT_COM("halt_poll_invalid", halt_poll_invalid),
>         VCPU_STAT("halt_no_poll_steal", halt_no_poll_steal),
> -       VCPU_STAT("halt_wakeup", halt_wakeup),
> -       VCPU_STAT("halt_poll_success_ns", halt_poll_success_ns),
> -       VCPU_STAT("halt_poll_fail_ns", halt_poll_fail_ns),
> +       VCPU_STAT_COM("halt_wakeup", halt_wakeup),
> +       VCPU_STAT_COM("halt_poll_success_ns", halt_poll_success_ns),
> +       VCPU_STAT_COM("halt_poll_fail_ns", halt_poll_fail_ns),
>         VCPU_STAT("instruction_lctlg", instruction_lctlg),
>         VCPU_STAT("instruction_lctl", instruction_lctl),
>         VCPU_STAT("instruction_stctl", instruction_stctl),
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 55efbacfc244..5bfd6893fbf6 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1127,6 +1127,7 @@ struct kvm_arch {
>  };
>
>  struct kvm_vm_stat {
> +       struct kvm_vm_stat_common common;
>         ulong mmu_shadow_zapped;
>         ulong mmu_pte_write;
>         ulong mmu_pde_zapped;
> @@ -1134,13 +1135,13 @@ struct kvm_vm_stat {
>         ulong mmu_recycled;
>         ulong mmu_cache_miss;
>         ulong mmu_unsync;
> -       ulong remote_tlb_flush;
>         ulong lpages;
>         ulong nx_lpage_splits;
>         ulong max_mmu_page_hash_collisions;
>  };
>
>  struct kvm_vcpu_stat {
> +       struct kvm_vcpu_stat_common common;
>         u64 pf_fixed;
>         u64 pf_guest;
>         u64 tlb_flush;
> @@ -1154,10 +1155,6 @@ struct kvm_vcpu_stat {
>         u64 nmi_window_exits;
>         u64 l1d_flush;
>         u64 halt_exits;
> -       u64 halt_successful_poll;
> -       u64 halt_attempted_poll;
> -       u64 halt_poll_invalid;
> -       u64 halt_wakeup;
>         u64 request_irq_exits;
>         u64 irq_exits;
>         u64 host_state_reload;
> @@ -1168,8 +1165,6 @@ struct kvm_vcpu_stat {
>         u64 irq_injections;
>         u64 nmi_injections;
>         u64 req_event;
> -       u64 halt_poll_success_ns;
> -       u64 halt_poll_fail_ns;
>         u64 nested_run;
>         u64 directed_yield_attempted;
>         u64 directed_yield_successful;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 9b6bca616929..9a93d80caff6 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -226,10 +226,10 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
>         VCPU_STAT("irq_window", irq_window_exits),
>         VCPU_STAT("nmi_window", nmi_window_exits),
>         VCPU_STAT("halt_exits", halt_exits),
> -       VCPU_STAT("halt_successful_poll", halt_successful_poll),
> -       VCPU_STAT("halt_attempted_poll", halt_attempted_poll),
> -       VCPU_STAT("halt_poll_invalid", halt_poll_invalid),
> -       VCPU_STAT("halt_wakeup", halt_wakeup),
> +       VCPU_STAT_COM("halt_successful_poll", halt_successful_poll),
> +       VCPU_STAT_COM("halt_attempted_poll", halt_attempted_poll),
> +       VCPU_STAT_COM("halt_poll_invalid", halt_poll_invalid),
> +       VCPU_STAT_COM("halt_wakeup", halt_wakeup),
>         VCPU_STAT("hypercalls", hypercalls),
>         VCPU_STAT("request_irq", request_irq_exits),
>         VCPU_STAT("irq_exits", irq_exits),
> @@ -241,8 +241,8 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
>         VCPU_STAT("nmi_injections", nmi_injections),
>         VCPU_STAT("req_event", req_event),
>         VCPU_STAT("l1d_flush", l1d_flush),
> -       VCPU_STAT("halt_poll_success_ns", halt_poll_success_ns),
> -       VCPU_STAT("halt_poll_fail_ns", halt_poll_fail_ns),
> +       VCPU_STAT_COM("halt_poll_success_ns", halt_poll_success_ns),
> +       VCPU_STAT_COM("halt_poll_fail_ns", halt_poll_fail_ns),
>         VCPU_STAT("nested_run", nested_run),
>         VCPU_STAT("directed_yield_attempted", directed_yield_attempted),
>         VCPU_STAT("directed_yield_successful", directed_yield_successful),
> @@ -253,7 +253,7 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
>         VM_STAT("mmu_recycled", mmu_recycled),
>         VM_STAT("mmu_cache_miss", mmu_cache_miss),
>         VM_STAT("mmu_unsync", mmu_unsync),
> -       VM_STAT("remote_tlb_flush", remote_tlb_flush),
> +       VM_STAT_COM("remote_tlb_flush", remote_tlb_flush),
>         VM_STAT("largepages", lpages, .mode = 0444),
>         VM_STAT("nx_largepages_splitted", nx_lpage_splits, .mode = 0444),
>         VM_STAT("max_mmu_page_hash_collisions", max_mmu_page_hash_collisions),
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 2f34487e21f2..97700e41db3b 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1243,10 +1243,15 @@ struct kvm_stats_debugfs_item {
>  #define KVM_DBGFS_GET_MODE(dbgfs_item)                                         \
>         ((dbgfs_item)->mode ? (dbgfs_item)->mode : 0644)
>
> -#define VM_STAT(n, x, ...)                                                     \
> +#define VM_STAT(n, x, ...)                                                    \
>         { n, offsetof(struct kvm, stat.x), KVM_STAT_VM, ## __VA_ARGS__ }
> -#define VCPU_STAT(n, x, ...)                                                   \
> +#define VCPU_STAT(n, x, ...)                                                  \
>         { n, offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU, ## __VA_ARGS__ }
> +#define VM_STAT_COM(n, x, ...)                                                \
> +       { n, offsetof(struct kvm, stat.common.x), KVM_STAT_VM, ## __VA_ARGS__ }
> +#define VCPU_STAT_COM(n, x, ...)                                              \
> +       { n, offsetof(struct kvm_vcpu, stat.common.x),                         \
> +         KVM_STAT_VCPU, ## __VA_ARGS__ }
>
>  extern struct kvm_stats_debugfs_item debugfs_entries[];
>  extern struct dentry *kvm_debugfs_dir;
> diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h
> index a7580f69dda0..87eb05ad678b 100644
> --- a/include/linux/kvm_types.h
> +++ b/include/linux/kvm_types.h
> @@ -76,5 +76,17 @@ struct kvm_mmu_memory_cache {
>  };
>  #endif
>
> +struct kvm_vm_stat_common {
> +       ulong remote_tlb_flush;
> +};
> +
> +struct kvm_vcpu_stat_common {
> +       u64 halt_successful_poll;
> +       u64 halt_attempted_poll;
> +       u64 halt_poll_invalid;
> +       u64 halt_wakeup;
> +       u64 halt_poll_success_ns;
> +       u64 halt_poll_fail_ns;
> +};

Putting a "_common" struct here is the opposite of the pattern than
what KVM uses for struct kvm and struct kvm_vcpu. What are your
thoughts on inverting it so the common stats go in struct
kvm_{vcpu,vm}_stat and the arch-specific stats go in a arch-specific
struct kvm_{vcpu,vm}_stat_arch?

I imagine this may result in more churn in this patch since there are
more arch-specific stats than there are common stats, but would result
in a more consistent struct layout.


>
>  #endif /* __KVM_TYPES_H__ */
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 6b4feb92dc79..34a4cf265297 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -330,7 +330,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm)
>          */
>         if (!kvm_arch_flush_remote_tlb(kvm)
>             || kvm_make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH))
> -               ++kvm->stat.remote_tlb_flush;
> +               ++kvm->stat.common.remote_tlb_flush;
>         cmpxchg(&kvm->tlbs_dirty, dirty_count, 0);
>  }
>  EXPORT_SYMBOL_GPL(kvm_flush_remote_tlbs);
> @@ -2940,9 +2940,9 @@ static inline void
>  update_halt_poll_stats(struct kvm_vcpu *vcpu, u64 poll_ns, bool waited)
>  {
>         if (waited)
> -               vcpu->stat.halt_poll_fail_ns += poll_ns;
> +               vcpu->stat.common.halt_poll_fail_ns += poll_ns;
>         else
> -               vcpu->stat.halt_poll_success_ns += poll_ns;
> +               vcpu->stat.common.halt_poll_success_ns += poll_ns;
>  }
>
>  /*
> @@ -2960,16 +2960,16 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
>         if (vcpu->halt_poll_ns && !kvm_arch_no_poll(vcpu)) {
>                 ktime_t stop = ktime_add_ns(ktime_get(), vcpu->halt_poll_ns);
>
> -               ++vcpu->stat.halt_attempted_poll;
> +               ++vcpu->stat.common.halt_attempted_poll;
>                 do {
>                         /*
>                          * This sets KVM_REQ_UNHALT if an interrupt
>                          * arrives.
>                          */
>                         if (kvm_vcpu_check_block(vcpu) < 0) {
> -                               ++vcpu->stat.halt_successful_poll;
> +                               ++vcpu->stat.common.halt_successful_poll;
>                                 if (!vcpu_valid_wakeup(vcpu))
> -                                       ++vcpu->stat.halt_poll_invalid;
> +                                       ++vcpu->stat.common.halt_poll_invalid;
>                                 goto out;
>                         }
>                         poll_end = cur = ktime_get();
> @@ -3027,7 +3027,7 @@ bool kvm_vcpu_wake_up(struct kvm_vcpu *vcpu)
>         waitp = kvm_arch_vcpu_get_wait(vcpu);
>         if (rcuwait_wake_up(waitp)) {
>                 WRITE_ONCE(vcpu->ready, true);
> -               ++vcpu->stat.halt_wakeup;
> +               ++vcpu->stat.common.halt_wakeup;
>                 return true;
>         }
>
> --
> 2.31.1.751.gd2f1c929bd-goog
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 1/4] KVM: stats: Separate common stats from architecture specific ones
  2021-05-17 23:39   ` David Matlack
@ 2021-05-18  0:10     ` Jing Zhang
  2021-05-18 16:27       ` David Matlack
  0 siblings, 1 reply; 30+ messages in thread
From: Jing Zhang @ 2021-05-18  0:10 UTC (permalink / raw)
  To: David Matlack
  Cc: KVM, KVMARM, LinuxMIPS, KVMPPC, LinuxS390, Linuxkselftest,
	Paolo Bonzini, Marc Zyngier, James Morse, Julien Thierry,
	Suzuki K Poulose, Will Deacon, Huacai Chen, Aleksandar Markovic,
	Thomas Bogendoerfer, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck, Claudio Imbrenda,
	Sean Christopherson, Vitaly Kuznetsov, Jim Mattson, Peter Shier,
	Oliver Upton, David Rientjes, Emanuele Giuseppe Esposito

Hi David,

On Mon, May 17, 2021 at 6:39 PM David Matlack <dmatlack@google.com> wrote:
>
> On Mon, May 17, 2021 at 9:24 AM Jing Zhang <jingzhangos@google.com> wrote:
> >
> > Put all common statistics in a separate structure to ease
> > statistics handling for the incoming new statistics API.
> >
> > No functional change intended.
> >
> > Signed-off-by: Jing Zhang <jingzhangos@google.com>
> > ---
> >  arch/arm64/include/asm/kvm_host.h   |  9 ++-------
> >  arch/arm64/kvm/guest.c              | 12 ++++++------
> >  arch/mips/include/asm/kvm_host.h    |  9 ++-------
> >  arch/mips/kvm/mips.c                | 12 ++++++------
> >  arch/powerpc/include/asm/kvm_host.h |  9 ++-------
> >  arch/powerpc/kvm/book3s.c           | 12 ++++++------
> >  arch/powerpc/kvm/book3s_hv.c        | 12 ++++++------
> >  arch/powerpc/kvm/book3s_pr.c        |  2 +-
> >  arch/powerpc/kvm/book3s_pr_papr.c   |  2 +-
> >  arch/powerpc/kvm/booke.c            | 14 +++++++-------
> >  arch/s390/include/asm/kvm_host.h    |  9 ++-------
> >  arch/s390/kvm/kvm-s390.c            | 12 ++++++------
> >  arch/x86/include/asm/kvm_host.h     |  9 ++-------
> >  arch/x86/kvm/x86.c                  | 14 +++++++-------
> >  include/linux/kvm_host.h            |  9 +++++++--
> >  include/linux/kvm_types.h           | 12 ++++++++++++
> >  virt/kvm/kvm_main.c                 | 14 +++++++-------
> >  17 files changed, 82 insertions(+), 90 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index 7cd7d5c8c4bc..f3ad7a20b0af 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -556,16 +556,11 @@ static inline bool __vcpu_write_sys_reg_to_cpu(u64 val, int reg)
> >  }
> >
> >  struct kvm_vm_stat {
> > -       ulong remote_tlb_flush;
> > +       struct kvm_vm_stat_common common;
> >  };
> >
> >  struct kvm_vcpu_stat {
> > -       u64 halt_successful_poll;
> > -       u64 halt_attempted_poll;
> > -       u64 halt_poll_success_ns;
> > -       u64 halt_poll_fail_ns;
> > -       u64 halt_poll_invalid;
> > -       u64 halt_wakeup;
> > +       struct kvm_vcpu_stat_common common;
> >         u64 hvc_exit_stat;
> >         u64 wfe_exit_stat;
> >         u64 wfi_exit_stat;
> > diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
> > index 5cb4a1cd5603..0e41331b0911 100644
> > --- a/arch/arm64/kvm/guest.c
> > +++ b/arch/arm64/kvm/guest.c
> > @@ -29,18 +29,18 @@
> >  #include "trace.h"
> >
> >  struct kvm_stats_debugfs_item debugfs_entries[] = {
> > -       VCPU_STAT("halt_successful_poll", halt_successful_poll),
> > -       VCPU_STAT("halt_attempted_poll", halt_attempted_poll),
> > -       VCPU_STAT("halt_poll_invalid", halt_poll_invalid),
> > -       VCPU_STAT("halt_wakeup", halt_wakeup),
> > +       VCPU_STAT_COM("halt_successful_poll", halt_successful_poll),
> > +       VCPU_STAT_COM("halt_attempted_poll", halt_attempted_poll),
> > +       VCPU_STAT_COM("halt_poll_invalid", halt_poll_invalid),
> > +       VCPU_STAT_COM("halt_wakeup", halt_wakeup),
>
> nit: I may be alone in this but I find using the  the following more readable:
>
>         VCPU_STAT("halt_wakeup", common.halt_wakeup),
>
> >         VCPU_STAT("hvc_exit_stat", hvc_exit_stat),
> >         VCPU_STAT("wfe_exit_stat", wfe_exit_stat),
> >         VCPU_STAT("wfi_exit_stat", wfi_exit_stat),
> >         VCPU_STAT("mmio_exit_user", mmio_exit_user),
> >         VCPU_STAT("mmio_exit_kernel", mmio_exit_kernel),
> >         VCPU_STAT("exits", exits),
> > -       VCPU_STAT("halt_poll_success_ns", halt_poll_success_ns),
> > -       VCPU_STAT("halt_poll_fail_ns", halt_poll_fail_ns),
> > +       VCPU_STAT_COM("halt_poll_success_ns", halt_poll_success_ns),
> > +       VCPU_STAT_COM("halt_poll_fail_ns", halt_poll_fail_ns),
> >         { NULL }
> >  };
> >
> > diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h
> > index fca4547d580f..6f610fbcd8d1 100644
> > --- a/arch/mips/include/asm/kvm_host.h
> > +++ b/arch/mips/include/asm/kvm_host.h
> > @@ -109,10 +109,11 @@ static inline bool kvm_is_error_hva(unsigned long addr)
> >  }
> >
> >  struct kvm_vm_stat {
> > -       ulong remote_tlb_flush;
> > +       struct kvm_vm_stat_common common;
> >  };
> >
> >  struct kvm_vcpu_stat {
> > +       struct kvm_vcpu_stat_common common;
> >         u64 wait_exits;
> >         u64 cache_exits;
> >         u64 signal_exits;
> > @@ -142,12 +143,6 @@ struct kvm_vcpu_stat {
> >  #ifdef CONFIG_CPU_LOONGSON64
> >         u64 vz_cpucfg_exits;
> >  #endif
> > -       u64 halt_successful_poll;
> > -       u64 halt_attempted_poll;
> > -       u64 halt_poll_success_ns;
> > -       u64 halt_poll_fail_ns;
> > -       u64 halt_poll_invalid;
> > -       u64 halt_wakeup;
> >  };
> >
> >  struct kvm_arch_memory_slot {
> > diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
> > index 4d4af97dcc88..f4fc60c05e9c 100644
> > --- a/arch/mips/kvm/mips.c
> > +++ b/arch/mips/kvm/mips.c
> > @@ -68,12 +68,12 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
> >  #ifdef CONFIG_CPU_LOONGSON64
> >         VCPU_STAT("vz_cpucfg", vz_cpucfg_exits),
> >  #endif
> > -       VCPU_STAT("halt_successful_poll", halt_successful_poll),
> > -       VCPU_STAT("halt_attempted_poll", halt_attempted_poll),
> > -       VCPU_STAT("halt_poll_invalid", halt_poll_invalid),
> > -       VCPU_STAT("halt_wakeup", halt_wakeup),
> > -       VCPU_STAT("halt_poll_success_ns", halt_poll_success_ns),
> > -       VCPU_STAT("halt_poll_fail_ns", halt_poll_fail_ns),
> > +       VCPU_STAT_COM("halt_successful_poll", halt_successful_poll),
> > +       VCPU_STAT_COM("halt_attempted_poll", halt_attempted_poll),
> > +       VCPU_STAT_COM("halt_poll_invalid", halt_poll_invalid),
> > +       VCPU_STAT_COM("halt_wakeup", halt_wakeup),
> > +       VCPU_STAT_COM("halt_poll_success_ns", halt_poll_success_ns),
> > +       VCPU_STAT_COM("halt_poll_fail_ns", halt_poll_fail_ns),
> >         {NULL}
> >  };
> >
> > diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
> > index 1e83359f286b..473d9d0804ff 100644
> > --- a/arch/powerpc/include/asm/kvm_host.h
> > +++ b/arch/powerpc/include/asm/kvm_host.h
> > @@ -80,12 +80,13 @@ struct kvmppc_book3s_shadow_vcpu;
> >  struct kvm_nested_guest;
> >
> >  struct kvm_vm_stat {
> > -       ulong remote_tlb_flush;
> > +       struct kvm_vm_stat_common common;
> >         ulong num_2M_pages;
> >         ulong num_1G_pages;
> >  };
> >
> >  struct kvm_vcpu_stat {
> > +       struct kvm_vcpu_stat_common common;
> >         u64 sum_exits;
> >         u64 mmio_exits;
> >         u64 signal_exits;
> > @@ -101,14 +102,8 @@ struct kvm_vcpu_stat {
> >         u64 emulated_inst_exits;
> >         u64 dec_exits;
> >         u64 ext_intr_exits;
> > -       u64 halt_poll_success_ns;
> > -       u64 halt_poll_fail_ns;
> >         u64 halt_wait_ns;
> > -       u64 halt_successful_poll;
> > -       u64 halt_attempted_poll;
> >         u64 halt_successful_wait;
> > -       u64 halt_poll_invalid;
> > -       u64 halt_wakeup;
> >         u64 dbell_exits;
> >         u64 gdbell_exits;
> >         u64 ld;
> > diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
> > index 2b691f4d1f26..bd3a10e1fdaf 100644
> > --- a/arch/powerpc/kvm/book3s.c
> > +++ b/arch/powerpc/kvm/book3s.c
> > @@ -47,14 +47,14 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
> >         VCPU_STAT("dec", dec_exits),
> >         VCPU_STAT("ext_intr", ext_intr_exits),
> >         VCPU_STAT("queue_intr", queue_intr),
> > -       VCPU_STAT("halt_poll_success_ns", halt_poll_success_ns),
> > -       VCPU_STAT("halt_poll_fail_ns", halt_poll_fail_ns),
> > +       VCPU_STAT_COM("halt_poll_success_ns", halt_poll_success_ns),
> > +       VCPU_STAT_COM("halt_poll_fail_ns", halt_poll_fail_ns),
> >         VCPU_STAT("halt_wait_ns", halt_wait_ns),
> > -       VCPU_STAT("halt_successful_poll", halt_successful_poll),
> > -       VCPU_STAT("halt_attempted_poll", halt_attempted_poll),
> > +       VCPU_STAT_COM("halt_successful_poll", halt_successful_poll),
> > +       VCPU_STAT_COM("halt_attempted_poll", halt_attempted_poll),
> >         VCPU_STAT("halt_successful_wait", halt_successful_wait),
> > -       VCPU_STAT("halt_poll_invalid", halt_poll_invalid),
> > -       VCPU_STAT("halt_wakeup", halt_wakeup),
> > +       VCPU_STAT_COM("halt_poll_invalid", halt_poll_invalid),
> > +       VCPU_STAT_COM("halt_wakeup", halt_wakeup),
> >         VCPU_STAT("pf_storage", pf_storage),
> >         VCPU_STAT("sp_storage", sp_storage),
> >         VCPU_STAT("pf_instruc", pf_instruc),
> > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> > index 28a80d240b76..58e187e03c52 100644
> > --- a/arch/powerpc/kvm/book3s_hv.c
> > +++ b/arch/powerpc/kvm/book3s_hv.c
> > @@ -236,7 +236,7 @@ static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu)
> >
> >         waitp = kvm_arch_vcpu_get_wait(vcpu);
> >         if (rcuwait_wake_up(waitp))
> > -               ++vcpu->stat.halt_wakeup;
> > +               ++vcpu->stat.common.halt_wakeup;
> >
> >         cpu = READ_ONCE(vcpu->arch.thread_cpu);
> >         if (cpu >= 0 && kvmppc_ipi_thread(cpu))
> > @@ -3925,7 +3925,7 @@ static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
> >         cur = start_poll = ktime_get();
> >         if (vc->halt_poll_ns) {
> >                 ktime_t stop = ktime_add_ns(start_poll, vc->halt_poll_ns);
> > -               ++vc->runner->stat.halt_attempted_poll;
> > +               ++vc->runner->stat.common.halt_attempted_poll;
> >
> >                 vc->vcore_state = VCORE_POLLING;
> >                 spin_unlock(&vc->lock);
> > @@ -3942,7 +3942,7 @@ static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
> >                 vc->vcore_state = VCORE_INACTIVE;
> >
> >                 if (!do_sleep) {
> > -                       ++vc->runner->stat.halt_successful_poll;
> > +                       ++vc->runner->stat.common.halt_successful_poll;
> >                         goto out;
> >                 }
> >         }
> > @@ -3954,7 +3954,7 @@ static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
> >                 do_sleep = 0;
> >                 /* If we polled, count this as a successful poll */
> >                 if (vc->halt_poll_ns)
> > -                       ++vc->runner->stat.halt_successful_poll;
> > +                       ++vc->runner->stat.common.halt_successful_poll;
> >                 goto out;
> >         }
> >
> > @@ -3981,13 +3981,13 @@ static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
> >                         ktime_to_ns(cur) - ktime_to_ns(start_wait);
> >                 /* Attribute failed poll time */
> >                 if (vc->halt_poll_ns)
> > -                       vc->runner->stat.halt_poll_fail_ns +=
> > +                       vc->runner->stat.common.halt_poll_fail_ns +=
> >                                 ktime_to_ns(start_wait) -
> >                                 ktime_to_ns(start_poll);
> >         } else {
> >                 /* Attribute successful poll time */
> >                 if (vc->halt_poll_ns)
> > -                       vc->runner->stat.halt_poll_success_ns +=
> > +                       vc->runner->stat.common.halt_poll_success_ns +=
> >                                 ktime_to_ns(cur) -
> >                                 ktime_to_ns(start_poll);
> >         }
> > diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
> > index d7733b07f489..214caa9d9675 100644
> > --- a/arch/powerpc/kvm/book3s_pr.c
> > +++ b/arch/powerpc/kvm/book3s_pr.c
> > @@ -493,7 +493,7 @@ static void kvmppc_set_msr_pr(struct kvm_vcpu *vcpu, u64 msr)
> >                 if (!vcpu->arch.pending_exceptions) {
> >                         kvm_vcpu_block(vcpu);
> >                         kvm_clear_request(KVM_REQ_UNHALT, vcpu);
> > -                       vcpu->stat.halt_wakeup++;
> > +                       vcpu->stat.common.halt_wakeup++;
> >
> >                         /* Unset POW bit after we woke up */
> >                         msr &= ~MSR_POW;
> > diff --git a/arch/powerpc/kvm/book3s_pr_papr.c b/arch/powerpc/kvm/book3s_pr_papr.c
> > index 031c8015864a..9384625c8051 100644
> > --- a/arch/powerpc/kvm/book3s_pr_papr.c
> > +++ b/arch/powerpc/kvm/book3s_pr_papr.c
> > @@ -378,7 +378,7 @@ int kvmppc_h_pr(struct kvm_vcpu *vcpu, unsigned long cmd)
> >                 kvmppc_set_msr_fast(vcpu, kvmppc_get_msr(vcpu) | MSR_EE);
> >                 kvm_vcpu_block(vcpu);
> >                 kvm_clear_request(KVM_REQ_UNHALT, vcpu);
> > -               vcpu->stat.halt_wakeup++;
> > +               vcpu->stat.common.halt_wakeup++;
> >                 return EMULATE_DONE;
> >         case H_LOGICAL_CI_LOAD:
> >                 return kvmppc_h_pr_logical_ci_load(vcpu);
> > diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
> > index 7d5fe43f85c4..07fdd7a1254a 100644
> > --- a/arch/powerpc/kvm/booke.c
> > +++ b/arch/powerpc/kvm/booke.c
> > @@ -49,15 +49,15 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
> >         VCPU_STAT("inst_emu", emulated_inst_exits),
> >         VCPU_STAT("dec", dec_exits),
> >         VCPU_STAT("ext_intr", ext_intr_exits),
> > -       VCPU_STAT("halt_successful_poll", halt_successful_poll),
> > -       VCPU_STAT("halt_attempted_poll", halt_attempted_poll),
> > -       VCPU_STAT("halt_poll_invalid", halt_poll_invalid),
> > -       VCPU_STAT("halt_wakeup", halt_wakeup),
> > +       VCPU_STAT_COM("halt_successful_poll", halt_successful_poll),
> > +       VCPU_STAT_COM("halt_attempted_poll", halt_attempted_poll),
> > +       VCPU_STAT_COM("halt_poll_invalid", halt_poll_invalid),
> > +       VCPU_STAT_COM("halt_wakeup", halt_wakeup),
> >         VCPU_STAT("doorbell", dbell_exits),
> >         VCPU_STAT("guest doorbell", gdbell_exits),
> > -       VCPU_STAT("halt_poll_success_ns", halt_poll_success_ns),
> > -       VCPU_STAT("halt_poll_fail_ns", halt_poll_fail_ns),
> > -       VM_STAT("remote_tlb_flush", remote_tlb_flush),
> > +       VCPU_STAT_COM("halt_poll_success_ns", halt_poll_success_ns),
> > +       VCPU_STAT_COM("halt_poll_fail_ns", halt_poll_fail_ns),
> > +       VM_STAT_COM("remote_tlb_flush", remote_tlb_flush),
> >         { NULL }
> >  };
> >
> > diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
> > index 8925f3969478..57a20897f3db 100644
> > --- a/arch/s390/include/asm/kvm_host.h
> > +++ b/arch/s390/include/asm/kvm_host.h
> > @@ -361,6 +361,7 @@ struct sie_page {
> >  };
> >
> >  struct kvm_vcpu_stat {
> > +       struct kvm_vcpu_stat_common common;
> >         u64 exit_userspace;
> >         u64 exit_null;
> >         u64 exit_external_request;
> > @@ -370,13 +371,7 @@ struct kvm_vcpu_stat {
> >         u64 exit_validity;
> >         u64 exit_instruction;
> >         u64 exit_pei;
> > -       u64 halt_successful_poll;
> > -       u64 halt_attempted_poll;
> > -       u64 halt_poll_invalid;
> >         u64 halt_no_poll_steal;
> > -       u64 halt_wakeup;
> > -       u64 halt_poll_success_ns;
> > -       u64 halt_poll_fail_ns;
> >         u64 instruction_lctl;
> >         u64 instruction_lctlg;
> >         u64 instruction_stctl;
> > @@ -755,12 +750,12 @@ struct kvm_vcpu_arch {
> >  };
> >
> >  struct kvm_vm_stat {
> > +       struct kvm_vm_stat_common common;
> >         u64 inject_io;
> >         u64 inject_float_mchk;
> >         u64 inject_pfault_done;
> >         u64 inject_service_signal;
> >         u64 inject_virtio;
> > -       u64 remote_tlb_flush;
> >  };
> >
> >  struct kvm_arch_memory_slot {
> > diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> > index 1296fc10f80c..d6bf3372bb10 100644
> > --- a/arch/s390/kvm/kvm-s390.c
> > +++ b/arch/s390/kvm/kvm-s390.c
> > @@ -72,13 +72,13 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
> >         VCPU_STAT("exit_program_interruption", exit_program_interruption),
> >         VCPU_STAT("exit_instr_and_program_int", exit_instr_and_program),
> >         VCPU_STAT("exit_operation_exception", exit_operation_exception),
> > -       VCPU_STAT("halt_successful_poll", halt_successful_poll),
> > -       VCPU_STAT("halt_attempted_poll", halt_attempted_poll),
> > -       VCPU_STAT("halt_poll_invalid", halt_poll_invalid),
> > +       VCPU_STAT_COM("halt_successful_poll", halt_successful_poll),
> > +       VCPU_STAT_COM("halt_attempted_poll", halt_attempted_poll),
> > +       VCPU_STAT_COM("halt_poll_invalid", halt_poll_invalid),
> >         VCPU_STAT("halt_no_poll_steal", halt_no_poll_steal),
> > -       VCPU_STAT("halt_wakeup", halt_wakeup),
> > -       VCPU_STAT("halt_poll_success_ns", halt_poll_success_ns),
> > -       VCPU_STAT("halt_poll_fail_ns", halt_poll_fail_ns),
> > +       VCPU_STAT_COM("halt_wakeup", halt_wakeup),
> > +       VCPU_STAT_COM("halt_poll_success_ns", halt_poll_success_ns),
> > +       VCPU_STAT_COM("halt_poll_fail_ns", halt_poll_fail_ns),
> >         VCPU_STAT("instruction_lctlg", instruction_lctlg),
> >         VCPU_STAT("instruction_lctl", instruction_lctl),
> >         VCPU_STAT("instruction_stctl", instruction_stctl),
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 55efbacfc244..5bfd6893fbf6 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -1127,6 +1127,7 @@ struct kvm_arch {
> >  };
> >
> >  struct kvm_vm_stat {
> > +       struct kvm_vm_stat_common common;
> >         ulong mmu_shadow_zapped;
> >         ulong mmu_pte_write;
> >         ulong mmu_pde_zapped;
> > @@ -1134,13 +1135,13 @@ struct kvm_vm_stat {
> >         ulong mmu_recycled;
> >         ulong mmu_cache_miss;
> >         ulong mmu_unsync;
> > -       ulong remote_tlb_flush;
> >         ulong lpages;
> >         ulong nx_lpage_splits;
> >         ulong max_mmu_page_hash_collisions;
> >  };
> >
> >  struct kvm_vcpu_stat {
> > +       struct kvm_vcpu_stat_common common;
> >         u64 pf_fixed;
> >         u64 pf_guest;
> >         u64 tlb_flush;
> > @@ -1154,10 +1155,6 @@ struct kvm_vcpu_stat {
> >         u64 nmi_window_exits;
> >         u64 l1d_flush;
> >         u64 halt_exits;
> > -       u64 halt_successful_poll;
> > -       u64 halt_attempted_poll;
> > -       u64 halt_poll_invalid;
> > -       u64 halt_wakeup;
> >         u64 request_irq_exits;
> >         u64 irq_exits;
> >         u64 host_state_reload;
> > @@ -1168,8 +1165,6 @@ struct kvm_vcpu_stat {
> >         u64 irq_injections;
> >         u64 nmi_injections;
> >         u64 req_event;
> > -       u64 halt_poll_success_ns;
> > -       u64 halt_poll_fail_ns;
> >         u64 nested_run;
> >         u64 directed_yield_attempted;
> >         u64 directed_yield_successful;
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 9b6bca616929..9a93d80caff6 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -226,10 +226,10 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
> >         VCPU_STAT("irq_window", irq_window_exits),
> >         VCPU_STAT("nmi_window", nmi_window_exits),
> >         VCPU_STAT("halt_exits", halt_exits),
> > -       VCPU_STAT("halt_successful_poll", halt_successful_poll),
> > -       VCPU_STAT("halt_attempted_poll", halt_attempted_poll),
> > -       VCPU_STAT("halt_poll_invalid", halt_poll_invalid),
> > -       VCPU_STAT("halt_wakeup", halt_wakeup),
> > +       VCPU_STAT_COM("halt_successful_poll", halt_successful_poll),
> > +       VCPU_STAT_COM("halt_attempted_poll", halt_attempted_poll),
> > +       VCPU_STAT_COM("halt_poll_invalid", halt_poll_invalid),
> > +       VCPU_STAT_COM("halt_wakeup", halt_wakeup),
> >         VCPU_STAT("hypercalls", hypercalls),
> >         VCPU_STAT("request_irq", request_irq_exits),
> >         VCPU_STAT("irq_exits", irq_exits),
> > @@ -241,8 +241,8 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
> >         VCPU_STAT("nmi_injections", nmi_injections),
> >         VCPU_STAT("req_event", req_event),
> >         VCPU_STAT("l1d_flush", l1d_flush),
> > -       VCPU_STAT("halt_poll_success_ns", halt_poll_success_ns),
> > -       VCPU_STAT("halt_poll_fail_ns", halt_poll_fail_ns),
> > +       VCPU_STAT_COM("halt_poll_success_ns", halt_poll_success_ns),
> > +       VCPU_STAT_COM("halt_poll_fail_ns", halt_poll_fail_ns),
> >         VCPU_STAT("nested_run", nested_run),
> >         VCPU_STAT("directed_yield_attempted", directed_yield_attempted),
> >         VCPU_STAT("directed_yield_successful", directed_yield_successful),
> > @@ -253,7 +253,7 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
> >         VM_STAT("mmu_recycled", mmu_recycled),
> >         VM_STAT("mmu_cache_miss", mmu_cache_miss),
> >         VM_STAT("mmu_unsync", mmu_unsync),
> > -       VM_STAT("remote_tlb_flush", remote_tlb_flush),
> > +       VM_STAT_COM("remote_tlb_flush", remote_tlb_flush),
> >         VM_STAT("largepages", lpages, .mode = 0444),
> >         VM_STAT("nx_largepages_splitted", nx_lpage_splits, .mode = 0444),
> >         VM_STAT("max_mmu_page_hash_collisions", max_mmu_page_hash_collisions),
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index 2f34487e21f2..97700e41db3b 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -1243,10 +1243,15 @@ struct kvm_stats_debugfs_item {
> >  #define KVM_DBGFS_GET_MODE(dbgfs_item)                                         \
> >         ((dbgfs_item)->mode ? (dbgfs_item)->mode : 0644)
> >
> > -#define VM_STAT(n, x, ...)                                                     \
> > +#define VM_STAT(n, x, ...)                                                    \
> >         { n, offsetof(struct kvm, stat.x), KVM_STAT_VM, ## __VA_ARGS__ }
> > -#define VCPU_STAT(n, x, ...)                                                   \
> > +#define VCPU_STAT(n, x, ...)                                                  \
> >         { n, offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU, ## __VA_ARGS__ }
> > +#define VM_STAT_COM(n, x, ...)                                                \
> > +       { n, offsetof(struct kvm, stat.common.x), KVM_STAT_VM, ## __VA_ARGS__ }
> > +#define VCPU_STAT_COM(n, x, ...)                                              \
> > +       { n, offsetof(struct kvm_vcpu, stat.common.x),                         \
> > +         KVM_STAT_VCPU, ## __VA_ARGS__ }
> >
> >  extern struct kvm_stats_debugfs_item debugfs_entries[];
> >  extern struct dentry *kvm_debugfs_dir;
> > diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h
> > index a7580f69dda0..87eb05ad678b 100644
> > --- a/include/linux/kvm_types.h
> > +++ b/include/linux/kvm_types.h
> > @@ -76,5 +76,17 @@ struct kvm_mmu_memory_cache {
> >  };
> >  #endif
> >
> > +struct kvm_vm_stat_common {
> > +       ulong remote_tlb_flush;
> > +};
> > +
> > +struct kvm_vcpu_stat_common {
> > +       u64 halt_successful_poll;
> > +       u64 halt_attempted_poll;
> > +       u64 halt_poll_invalid;
> > +       u64 halt_wakeup;
> > +       u64 halt_poll_success_ns;
> > +       u64 halt_poll_fail_ns;
> > +};
>
> Putting a "_common" struct here is the opposite of the pattern than
> what KVM uses for struct kvm and struct kvm_vcpu. What are your
> thoughts on inverting it so the common stats go in struct
> kvm_{vcpu,vm}_stat and the arch-specific stats go in a arch-specific
> struct kvm_{vcpu,vm}_stat_arch?
>
> I imagine this may result in more churn in this patch since there are
> more arch-specific stats than there are common stats, but would result
> in a more consistent struct layout.
Actually the definition of kvm_{vcpu,vm}_stat are arch specific. There is
no real structure for arch agnostic stats. Most of the stats in common
structures are arch agnostic, but not all of them.
There are some benefits to put all common stats in a separate structure.
e.g. if we want to add a stat in kvm_main.c, we only need to add this stat
in the common structure, don't have to update all kvm_{vcpu,vm}_stat
definition for all architectures.
>
>
> >
> >  #endif /* __KVM_TYPES_H__ */
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 6b4feb92dc79..34a4cf265297 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -330,7 +330,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm)
> >          */
> >         if (!kvm_arch_flush_remote_tlb(kvm)
> >             || kvm_make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH))
> > -               ++kvm->stat.remote_tlb_flush;
> > +               ++kvm->stat.common.remote_tlb_flush;
> >         cmpxchg(&kvm->tlbs_dirty, dirty_count, 0);
> >  }
> >  EXPORT_SYMBOL_GPL(kvm_flush_remote_tlbs);
> > @@ -2940,9 +2940,9 @@ static inline void
> >  update_halt_poll_stats(struct kvm_vcpu *vcpu, u64 poll_ns, bool waited)
> >  {
> >         if (waited)
> > -               vcpu->stat.halt_poll_fail_ns += poll_ns;
> > +               vcpu->stat.common.halt_poll_fail_ns += poll_ns;
> >         else
> > -               vcpu->stat.halt_poll_success_ns += poll_ns;
> > +               vcpu->stat.common.halt_poll_success_ns += poll_ns;
> >  }
> >
> >  /*
> > @@ -2960,16 +2960,16 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
> >         if (vcpu->halt_poll_ns && !kvm_arch_no_poll(vcpu)) {
> >                 ktime_t stop = ktime_add_ns(ktime_get(), vcpu->halt_poll_ns);
> >
> > -               ++vcpu->stat.halt_attempted_poll;
> > +               ++vcpu->stat.common.halt_attempted_poll;
> >                 do {
> >                         /*
> >                          * This sets KVM_REQ_UNHALT if an interrupt
> >                          * arrives.
> >                          */
> >                         if (kvm_vcpu_check_block(vcpu) < 0) {
> > -                               ++vcpu->stat.halt_successful_poll;
> > +                               ++vcpu->stat.common.halt_successful_poll;
> >                                 if (!vcpu_valid_wakeup(vcpu))
> > -                                       ++vcpu->stat.halt_poll_invalid;
> > +                                       ++vcpu->stat.common.halt_poll_invalid;
> >                                 goto out;
> >                         }
> >                         poll_end = cur = ktime_get();
> > @@ -3027,7 +3027,7 @@ bool kvm_vcpu_wake_up(struct kvm_vcpu *vcpu)
> >         waitp = kvm_arch_vcpu_get_wait(vcpu);
> >         if (rcuwait_wake_up(waitp)) {
> >                 WRITE_ONCE(vcpu->ready, true);
> > -               ++vcpu->stat.halt_wakeup;
> > +               ++vcpu->stat.common.halt_wakeup;
> >                 return true;
> >         }
> >
> > --
> > 2.31.1.751.gd2f1c929bd-goog
> >

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 1/4] KVM: stats: Separate common stats from architecture specific ones
  2021-05-18  0:10     ` Jing Zhang
@ 2021-05-18 16:27       ` David Matlack
  2021-05-18 17:25         ` Jing Zhang
  0 siblings, 1 reply; 30+ messages in thread
From: David Matlack @ 2021-05-18 16:27 UTC (permalink / raw)
  To: Jing Zhang
  Cc: KVM, KVMARM, LinuxMIPS, KVMPPC, LinuxS390, Linuxkselftest,
	Paolo Bonzini, Marc Zyngier, James Morse, Julien Thierry,
	Suzuki K Poulose, Will Deacon, Huacai Chen, Aleksandar Markovic,
	Thomas Bogendoerfer, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck, Claudio Imbrenda,
	Sean Christopherson, Vitaly Kuznetsov, Jim Mattson, Peter Shier,
	Oliver Upton, David Rientjes, Emanuele Giuseppe Esposito

On Mon, May 17, 2021 at 5:10 PM Jing Zhang <jingzhangos@google.com> wrote:
<snip>
> Actually the definition of kvm_{vcpu,vm}_stat are arch specific. There is
> no real structure for arch agnostic stats. Most of the stats in common
> structures are arch agnostic, but not all of them.
> There are some benefits to put all common stats in a separate structure.
> e.g. if we want to add a stat in kvm_main.c, we only need to add this stat
> in the common structure, don't have to update all kvm_{vcpu,vm}_stat
> definition for all architectures.

I meant rename the existing arch-specific struct kvm_{vcpu,vm}_stat to
kvm_{vcpu,vm}_stat_arch and rename struct kvm_{vcpu,vm}_stat_common to
kvm_{vcpu,vm}_stat.

So in  include/linux/kvm_types.h you'd have:

struct kvm_vm_stat {
  ulong remote_tlb_flush;
  struct kvm_vm_stat_arch arch;
};

struct kvm_vcpu_stat {
  u64 halt_successful_poll;
  u64 halt_attempted_poll;
  u64 halt_poll_invalid;
  u64 halt_wakeup;
  u64 halt_poll_success_ns;
  u64 halt_poll_fail_ns;
  struct kvm_vcpu_stat_arch arch;
};

And in arch/x86/include/asm/kvm_host.h you'd have:

struct kvm_vm_stat_arch {
  ulong mmu_shadow_zapped;
  ...
};

struct kvm_vcpu_stat_arch {
  u64 pf_fixed;
  u64 pf_guest;
  u64 tlb_flush;
  ...
};

You still have the same benefits of having an arch-neutral place to
store stats but the struct layout more closely resembles struct
kvm_vcpu and struct kvm.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 1/4] KVM: stats: Separate common stats from architecture specific ones
  2021-05-18 16:27       ` David Matlack
@ 2021-05-18 17:25         ` Jing Zhang
  2021-05-18 18:40           ` Krish Sadhukhan
  0 siblings, 1 reply; 30+ messages in thread
From: Jing Zhang @ 2021-05-18 17:25 UTC (permalink / raw)
  To: David Matlack
  Cc: KVM, KVMARM, LinuxMIPS, KVMPPC, LinuxS390, Linuxkselftest,
	Paolo Bonzini, Marc Zyngier, James Morse, Julien Thierry,
	Suzuki K Poulose, Will Deacon, Huacai Chen, Aleksandar Markovic,
	Thomas Bogendoerfer, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck, Claudio Imbrenda,
	Sean Christopherson, Vitaly Kuznetsov, Jim Mattson, Peter Shier,
	Oliver Upton, David Rientjes, Emanuele Giuseppe Esposito

Hi David,

On Tue, May 18, 2021 at 11:27 AM David Matlack <dmatlack@google.com> wrote:
>
> On Mon, May 17, 2021 at 5:10 PM Jing Zhang <jingzhangos@google.com> wrote:
> <snip>
> > Actually the definition of kvm_{vcpu,vm}_stat are arch specific. There is
> > no real structure for arch agnostic stats. Most of the stats in common
> > structures are arch agnostic, but not all of them.
> > There are some benefits to put all common stats in a separate structure.
> > e.g. if we want to add a stat in kvm_main.c, we only need to add this stat
> > in the common structure, don't have to update all kvm_{vcpu,vm}_stat
> > definition for all architectures.
>
> I meant rename the existing arch-specific struct kvm_{vcpu,vm}_stat to
> kvm_{vcpu,vm}_stat_arch and rename struct kvm_{vcpu,vm}_stat_common to
> kvm_{vcpu,vm}_stat.
>
> So in  include/linux/kvm_types.h you'd have:
>
> struct kvm_vm_stat {
>   ulong remote_tlb_flush;
>   struct kvm_vm_stat_arch arch;
> };
>
> struct kvm_vcpu_stat {
>   u64 halt_successful_poll;
>   u64 halt_attempted_poll;
>   u64 halt_poll_invalid;
>   u64 halt_wakeup;
>   u64 halt_poll_success_ns;
>   u64 halt_poll_fail_ns;
>   struct kvm_vcpu_stat_arch arch;
> };
>
> And in arch/x86/include/asm/kvm_host.h you'd have:
>
> struct kvm_vm_stat_arch {
>   ulong mmu_shadow_zapped;
>   ...
> };
>
> struct kvm_vcpu_stat_arch {
>   u64 pf_fixed;
>   u64 pf_guest;
>   u64 tlb_flush;
>   ...
> };
>
> You still have the same benefits of having an arch-neutral place to
> store stats but the struct layout more closely resembles struct
> kvm_vcpu and struct kvm.
You are right. This is a more reasonable way to layout the structures.
I remember that I didn't choose this way is only because that it needs
touching every arch specific stats in all architectures (stat.name ->
stat.arch.name) instead of only touching arch neutral stats.
Let's see if there is any vote from others about this.

Thanks,
Jing

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 1/4] KVM: stats: Separate common stats from architecture specific ones
  2021-05-18 17:25         ` Jing Zhang
@ 2021-05-18 18:40           ` Krish Sadhukhan
  2021-05-21 19:04             ` Jing Zhang
  0 siblings, 1 reply; 30+ messages in thread
From: Krish Sadhukhan @ 2021-05-18 18:40 UTC (permalink / raw)
  To: Jing Zhang, David Matlack
  Cc: KVM, KVMARM, LinuxMIPS, KVMPPC, LinuxS390, Linuxkselftest,
	Paolo Bonzini, Marc Zyngier, James Morse, Julien Thierry,
	Suzuki K Poulose, Will Deacon, Huacai Chen, Aleksandar Markovic,
	Thomas Bogendoerfer, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck, Claudio Imbrenda,
	Sean Christopherson, Vitaly Kuznetsov, Jim Mattson, Peter Shier,
	Oliver Upton, David Rientjes, Emanuele Giuseppe Esposito


On 5/18/21 10:25 AM, Jing Zhang wrote:
> Hi David,
>
> On Tue, May 18, 2021 at 11:27 AM David Matlack <dmatlack@google.com> wrote:
>> On Mon, May 17, 2021 at 5:10 PM Jing Zhang <jingzhangos@google.com> wrote:
>> <snip>
>>> Actually the definition of kvm_{vcpu,vm}_stat are arch specific. There is
>>> no real structure for arch agnostic stats. Most of the stats in common
>>> structures are arch agnostic, but not all of them.
>>> There are some benefits to put all common stats in a separate structure.
>>> e.g. if we want to add a stat in kvm_main.c, we only need to add this stat
>>> in the common structure, don't have to update all kvm_{vcpu,vm}_stat
>>> definition for all architectures.
>> I meant rename the existing arch-specific struct kvm_{vcpu,vm}_stat to
>> kvm_{vcpu,vm}_stat_arch and rename struct kvm_{vcpu,vm}_stat_common to
>> kvm_{vcpu,vm}_stat.
>>
>> So in  include/linux/kvm_types.h you'd have:
>>
>> struct kvm_vm_stat {
>>    ulong remote_tlb_flush;
>>    struct kvm_vm_stat_arch arch;
>> };
>>
>> struct kvm_vcpu_stat {
>>    u64 halt_successful_poll;
>>    u64 halt_attempted_poll;
>>    u64 halt_poll_invalid;
>>    u64 halt_wakeup;
>>    u64 halt_poll_success_ns;
>>    u64 halt_poll_fail_ns;
>>    struct kvm_vcpu_stat_arch arch;
>> };
>>
>> And in arch/x86/include/asm/kvm_host.h you'd have:
>>
>> struct kvm_vm_stat_arch {
>>    ulong mmu_shadow_zapped;
>>    ...
>> };
>>
>> struct kvm_vcpu_stat_arch {
>>    u64 pf_fixed;
>>    u64 pf_guest;
>>    u64 tlb_flush;
>>    ...
>> };
>>
>> You still have the same benefits of having an arch-neutral place to
>> store stats but the struct layout more closely resembles struct
>> kvm_vcpu and struct kvm.
> You are right. This is a more reasonable way to layout the structures.
> I remember that I didn't choose this way is only because that it needs
> touching every arch specific stats in all architectures (stat.name ->
> stat.arch.name) instead of only touching arch neutral stats.
> Let's see if there is any vote from others about this.


+1

>
> Thanks,
> Jing

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 3/4] KVM: stats: Add documentation for statistics data binary interface
  2021-05-17 14:53 ` [PATCH v5 3/4] KVM: stats: Add documentation for statistics data binary interface Jing Zhang
@ 2021-05-19 16:57   ` David Matlack
  2021-05-19 19:29     ` Jing Zhang
  2021-05-19 17:02   ` David Matlack
  1 sibling, 1 reply; 30+ messages in thread
From: David Matlack @ 2021-05-19 16:57 UTC (permalink / raw)
  To: Jing Zhang
  Cc: KVM, KVMARM, LinuxMIPS, KVMPPC, LinuxS390, Linuxkselftest,
	Paolo Bonzini, Marc Zyngier, James Morse, Julien Thierry,
	Suzuki K Poulose, Will Deacon, Huacai Chen, Aleksandar Markovic,
	Thomas Bogendoerfer, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck, Claudio Imbrenda,
	Sean Christopherson, Vitaly Kuznetsov, Jim Mattson, Peter Shier,
	Oliver Upton, David Rientjes, Emanuele Giuseppe Esposito

On Mon, May 17, 2021 at 9:25 AM Jing Zhang <jingzhangos@google.com> wrote:
>
> Update KVM API documentation for binary statistics.
>
> Signed-off-by: Jing Zhang <jingzhangos@google.com>
> ---
>  Documentation/virt/kvm/api.rst | 171 +++++++++++++++++++++++++++++++++
>  1 file changed, 171 insertions(+)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 7fcb2fd38f42..9a6aa9770dfd 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -5034,6 +5034,169 @@ see KVM_XEN_VCPU_SET_ATTR above.
>  The KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADJUST type may not be used
>  with the KVM_XEN_VCPU_GET_ATTR ioctl.
>
> +4.130 KVM_STATS_GETFD
> +---------------------
> +
> +:Capability: KVM_CAP_STATS_BINARY_FD
> +:Architectures: all
> +:Type: vm ioctl, vcpu ioctl
> +:Parameters: none
> +:Returns: statistics file descriptor on success, < 0 on error
> +
> +Errors:
> +
> +  ======     ======================================================
> +  ENOMEM     if the fd could not be created due to lack of memory
> +  EMFILE     if the number of opened files exceeds the limit
> +  ======     ======================================================
> +
> +The file descriptor can be used to read VM/vCPU statistics data in binary
> +format. The file data is organized into three blocks as below:
> ++-------------+
> +|   Header    |
> ++-------------+
> +| Descriptors |
> ++-------------+
> +| Stats Data  |
> ++-------------+
> +
> +The Header block is always at the start of the file. It is only needed to be
> +read one time after a system boot.

By system boot do you mean the host or the VM? If the host then it's
probably just cleaner to omit that part entirely and just say "It is
only needed to be read once.".

> +It is in the form of ``struct kvm_stats_header`` as below::
> +
> +       #define KVM_STATS_ID_MAXLEN             64
> +
> +       struct kvm_stats_header {
> +               char id[KVM_STATS_ID_MAXLEN];
> +               __u32 name_size;
> +               __u32 count;
> +               __u32 desc_offset;
> +               __u32 data_offset;
> +       };
> +
> +The ``id`` field is identification for the corresponding KVM statistics. For
> +KVM statistics, it is in the form of "kvm-{kvm pid}", like "kvm-12345". For

Should this say "For VM statistics, ..." instead?

> +VCPU statistics, it is in the form of "kvm-{kvm pid}/vcpu-{vcpu id}", like
> +"kvm-12345/vcpu-12".
> +
> +The ``name_size`` field is the size (byte) of the statistics name string
> +(including trailing '\0') appended to the end of every statistics descriptor.
> +
> +The ``count`` field is the number of statistics.
> +
> +The ``desc_offset`` field is the offset of the Descriptors block from the start
> +of the file indicated by the file descriptor.
> +
> +The ``data_offset`` field is the offset of the Stats Data block from the start
> +of the file indicated by the file descriptor.
> +
> +The Descriptors block is only needed to be read once after a system boot. It is

Ditto here about system boot.

> +an array of ``struct kvm_stats_desc`` as below::

Consider omitting these macros from the documentation, or moving them
to later. Readers right here are expecting to see the struct
kvm_stats_desc given the previous line.

> +
> +       #define KVM_STATS_TYPE_SHIFT            0
> +       #define KVM_STATS_TYPE_MASK             (0xF << KVM_STATS_TYPE_SHIFT)
> +       #define KVM_STATS_TYPE_CUMULATIVE       (0x0 << KVM_STATS_TYPE_SHIFT)
> +       #define KVM_STATS_TYPE_INSTANT          (0x1 << KVM_STATS_TYPE_SHIFT)
> +       #define KVM_STATS_TYPE_MAX              KVM_STATS_TYPE_INSTANT
> +
> +       #define KVM_STATS_UNIT_SHIFT            4
> +       #define KVM_STATS_UNIT_MASK             (0xF << KVM_STATS_UNIT_SHIFT)
> +       #define KVM_STATS_UNIT_NONE             (0x0 << KVM_STATS_UNIT_SHIFT)
> +       #define KVM_STATS_UNIT_BYTES            (0x1 << KVM_STATS_UNIT_SHIFT)
> +       #define KVM_STATS_UNIT_SECONDS          (0x2 << KVM_STATS_UNIT_SHIFT)
> +       #define KVM_STATS_UNIT_CYCLES           (0x3 << KVM_STATS_UNIT_SHIFT)
> +       #define KVM_STATS_UNIT_MAX              KVM_STATS_UNIT_CYCLES
> +
> +       #define KVM_STATS_SCALE_SHIFT           8
> +       #define KVM_STATS_SCALE_MASK            (0xF << KVM_STATS_SCALE_SHIFT)
> +       #define KVM_STATS_SCALE_POW10           (0x0 << KVM_STATS_SCALE_SHIFT)
> +       #define KVM_STATS_SCALE_POW2            (0x1 << KVM_STATS_SCALE_SHIFT)
> +       #define KVM_STATS_SCALE_MAX             KVM_STATS_SCALE_POW2

Terminology nit: I think usually this part is called the "base". e.g.
when you decompose a number X into N * B^E, B is the "base" and E is
the "exponent". I see you're using "exponent" already but it might
make sense to change "scale" to "base" throughout this series.

> +
> +       struct kvm_stats_desc {
> +               __u32 flags;
> +               __s16 exponent;
> +               __u16 size;
> +               __u32 unused1;
> +               __u32 unused2;
> +               char name[0];
> +       };
> +
> +The ``flags`` field contains the type and unit of the statistics data described
> +by this descriptor. The following flags are supported:

nit: Suggest breaking this list out into separate lists so readers can
differentiate between the type, unit, and scale. Something like:

Bits 0-3 of ``flags`` encode the type:

* ``KVM_STATS_TYPE_CUMULATIVE`` ...
* ``KVM_STATS_TYPE_INSTANT`` ...

Bits 4-7 of ``flags encode the unit:

* ``KVM_STATS_UNIT_NONE`` ...
...
etc.

> +  * ``KVM_STATS_TYPE_CUMULATIVE``
> +    The statistics data is cumulative. The value of data can only be increased.
> +    Most of the counters used in KVM are of this type.
> +    The corresponding ``count`` filed for this type is always 1.
> +  * ``KVM_STATS_TYPE_INSTANT``
> +    The statistics data is instantaneous. Its value can be increased or
> +    decreased. This type is usually used as a measurement of some resources,
> +    like the number of dirty pages, the number of large pages, etc.
> +    The corresponding ``count`` field for this type is always 1.
> +  * ``KVM_STATS_UNIT_NONE``
> +    There is no unit for the value of statistics data. This usually means that
> +    the value is a simple counter of an event.
> +  * ``KVM_STATS_UNIT_BYTES``
> +    It indicates that the statistics data is used to measure memory size, in the
> +    unit of Byte, KiByte, MiByte, GiByte, etc. The unit of the data is
> +    determined by the ``exponent`` field in the descriptor. The
> +    ``KVM_STATS_SCALE_POW2`` flag is valid in this case. The unit of the data is
> +    determined by ``pow(2, exponent)``. For example, if value is 10,
> +    ``exponent`` is 20, which means the unit of statistics data is MiByte, we
> +    can get the statistics data in the unit of Byte by
> +    ``value * pow(2, exponent) = 10 * pow(2, 20) = 10 MiByte`` which is
> +    10 * 1024 * 1024 Bytes.
> +  * ``KVM_STATS_UNIT_SECONDS``
> +    It indicates that the statistics data is used to measure time/latency, in
> +    the unit of nanosecond, microsecond, millisecond and second. The unit of the
> +    data is determined by the ``exponent`` field in the descriptor. The
> +    ``KVM_STATS_SCALE_POW10`` flag is valid in this case. The unit of the data
> +    is determined by ``pow(10, exponent)``. For example, if value is 2000000,
> +    ``exponent`` is -6, which means the unit of statistics data is microsecond,
> +    we can get the statistics data in the unit of second by
> +    ``value * pow(10, exponent) = 2000000 * pow(10, -6) = 2 seconds``.
> +  * ``KVM_STATS_UNIT_CYCLES``
> +    It indicates that the statistics data is used to measure CPU clock cycles.
> +    The ``KVM_STATS_SCALE_POW10`` flag is valid in this case. For example, if
> +    value is 200, ``exponent`` is 4, we can get the number of CPU clock cycles
> +    by ``value * pow(10, exponent) = 200 * pow(10, 4) = 2000000``.
> +
> +The ``exponent`` field is the scale of corresponding statistics data. It has two
> +values as follows:
> +  * ``KVM_STATS_SCALE_POW10``

I thought the scale was encoded in ``flags`` not ``exponent``? Isn't
the exponent the

> +    The scale is based on power of 10. It is used for measurement of time and
> +    CPU clock cycles.
> +  * ``KVM_STATS_SCALE_POW2``
> +    The scale is based on power of 2. It is used for measurement of memory size.

It might be useful to give an example of how to use the exponent field
in practice.

> +
> +The ``size`` field is the number of values of this statistics data. It is in the
> +unit of ``unsigned long`` for VCPU or ``__u64`` for VM.
> +
> +The ``unused1`` and ``unused2`` fields are reserved for future
> +support for other types of statistics data, like log/linear histogram.
> +
> +The ``name`` field points to the name string of the statistics data. The name
> +string starts at the end of ``struct kvm_stats_desc``.
> +The maximum length (including trailing '\0') is indicated by ``name_size``
> +in ``struct kvm_stats_header``.
> +
> +The Stats Data block contains an array of data values of type ``struct
> +kvm_vm_stats_data`` or ``struct kvm_vcpu_stats_data``. It would be read by
> +user space periodically to pull statistics data.
> +The order of data value in Stats Data block is the same as the order of
> +descriptors in Descriptors block.
> +  * Statistics data for VM::
> +
> +       struct kvm_vm_stats_data {
> +               unsigned long value[0];
> +       };
> +
> +  * Statistics data for VCPU::
> +
> +       struct kvm_vcpu_stats_data {
> +               __u64 value[0];
> +       };
> +
>  5. The kvm_run structure
>  ========================
>
> @@ -6891,3 +7054,11 @@ This capability is always enabled.
>  This capability indicates that the KVM virtual PTP service is
>  supported in the host. A VMM can check whether the service is
>  available to the guest on migration.
> +
> +8.33 KVM_CAP_STATS_BINARY_FD
> +----------------------------
> +
> +:Architectures: all
> +
> +This capability indicates the feature that user space can create get a file
> +descriptor for every VM and VCPU to read statistics data in binary format.
> --
> 2.31.1.751.gd2f1c929bd-goog
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 3/4] KVM: stats: Add documentation for statistics data binary interface
  2021-05-17 14:53 ` [PATCH v5 3/4] KVM: stats: Add documentation for statistics data binary interface Jing Zhang
  2021-05-19 16:57   ` David Matlack
@ 2021-05-19 17:02   ` David Matlack
  2021-05-19 19:30     ` Jing Zhang
  1 sibling, 1 reply; 30+ messages in thread
From: David Matlack @ 2021-05-19 17:02 UTC (permalink / raw)
  To: Jing Zhang
  Cc: KVM, KVMARM, LinuxMIPS, KVMPPC, LinuxS390, Linuxkselftest,
	Paolo Bonzini, Marc Zyngier, James Morse, Julien Thierry,
	Suzuki K Poulose, Will Deacon, Huacai Chen, Aleksandar Markovic,
	Thomas Bogendoerfer, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck, Claudio Imbrenda,
	Sean Christopherson, Vitaly Kuznetsov, Jim Mattson, Peter Shier,
	Oliver Upton, David Rientjes, Emanuele Giuseppe Esposito

On Mon, May 17, 2021 at 9:25 AM Jing Zhang <jingzhangos@google.com> wrote:
>
> Update KVM API documentation for binary statistics.
>
> Signed-off-by: Jing Zhang <jingzhangos@google.com>
> ---
>  Documentation/virt/kvm/api.rst | 171 +++++++++++++++++++++++++++++++++
>  1 file changed, 171 insertions(+)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 7fcb2fd38f42..9a6aa9770dfd 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -5034,6 +5034,169 @@ see KVM_XEN_VCPU_SET_ATTR above.
>  The KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADJUST type may not be used
>  with the KVM_XEN_VCPU_GET_ATTR ioctl.
>
> +4.130 KVM_STATS_GETFD
> +---------------------
> +
> +:Capability: KVM_CAP_STATS_BINARY_FD
> +:Architectures: all
> +:Type: vm ioctl, vcpu ioctl
> +:Parameters: none
> +:Returns: statistics file descriptor on success, < 0 on error
> +
> +Errors:
> +
> +  ======     ======================================================
> +  ENOMEM     if the fd could not be created due to lack of memory
> +  EMFILE     if the number of opened files exceeds the limit
> +  ======     ======================================================
> +
> +The file descriptor can be used to read VM/vCPU statistics data in binary
> +format. The file data is organized into three blocks as below:
> ++-------------+
> +|   Header    |
> ++-------------+
> +| Descriptors |
> ++-------------+
> +| Stats Data  |
> ++-------------+
> +
> +The Header block is always at the start of the file. It is only needed to be
> +read one time after a system boot.
> +It is in the form of ``struct kvm_stats_header`` as below::
> +
> +       #define KVM_STATS_ID_MAXLEN             64
> +
> +       struct kvm_stats_header {
> +               char id[KVM_STATS_ID_MAXLEN];
> +               __u32 name_size;
> +               __u32 count;
> +               __u32 desc_offset;
> +               __u32 data_offset;
> +       };
> +
> +The ``id`` field is identification for the corresponding KVM statistics. For
> +KVM statistics, it is in the form of "kvm-{kvm pid}", like "kvm-12345". For
> +VCPU statistics, it is in the form of "kvm-{kvm pid}/vcpu-{vcpu id}", like
> +"kvm-12345/vcpu-12".
> +
> +The ``name_size`` field is the size (byte) of the statistics name string
> +(including trailing '\0') appended to the end of every statistics descriptor.
> +
> +The ``count`` field is the number of statistics.
> +
> +The ``desc_offset`` field is the offset of the Descriptors block from the start
> +of the file indicated by the file descriptor.
> +
> +The ``data_offset`` field is the offset of the Stats Data block from the start
> +of the file indicated by the file descriptor.
> +
> +The Descriptors block is only needed to be read once after a system boot. It is
> +an array of ``struct kvm_stats_desc`` as below::
> +
> +       #define KVM_STATS_TYPE_SHIFT            0
> +       #define KVM_STATS_TYPE_MASK             (0xF << KVM_STATS_TYPE_SHIFT)
> +       #define KVM_STATS_TYPE_CUMULATIVE       (0x0 << KVM_STATS_TYPE_SHIFT)
> +       #define KVM_STATS_TYPE_INSTANT          (0x1 << KVM_STATS_TYPE_SHIFT)
> +       #define KVM_STATS_TYPE_MAX              KVM_STATS_TYPE_INSTANT
> +
> +       #define KVM_STATS_UNIT_SHIFT            4
> +       #define KVM_STATS_UNIT_MASK             (0xF << KVM_STATS_UNIT_SHIFT)
> +       #define KVM_STATS_UNIT_NONE             (0x0 << KVM_STATS_UNIT_SHIFT)
> +       #define KVM_STATS_UNIT_BYTES            (0x1 << KVM_STATS_UNIT_SHIFT)
> +       #define KVM_STATS_UNIT_SECONDS          (0x2 << KVM_STATS_UNIT_SHIFT)
> +       #define KVM_STATS_UNIT_CYCLES           (0x3 << KVM_STATS_UNIT_SHIFT)
> +       #define KVM_STATS_UNIT_MAX              KVM_STATS_UNIT_CYCLES
> +
> +       #define KVM_STATS_SCALE_SHIFT           8
> +       #define KVM_STATS_SCALE_MASK            (0xF << KVM_STATS_SCALE_SHIFT)
> +       #define KVM_STATS_SCALE_POW10           (0x0 << KVM_STATS_SCALE_SHIFT)
> +       #define KVM_STATS_SCALE_POW2            (0x1 << KVM_STATS_SCALE_SHIFT)
> +       #define KVM_STATS_SCALE_MAX             KVM_STATS_SCALE_POW2
> +
> +       struct kvm_stats_desc {
> +               __u32 flags;
> +               __s16 exponent;
> +               __u16 size;
> +               __u32 unused1;
> +               __u32 unused2;
> +               char name[0];
> +       };
> +
> +The ``flags`` field contains the type and unit of the statistics data described
> +by this descriptor. The following flags are supported:
> +  * ``KVM_STATS_TYPE_CUMULATIVE``
> +    The statistics data is cumulative. The value of data can only be increased.
> +    Most of the counters used in KVM are of this type.
> +    The corresponding ``count`` filed for this type is always 1.
> +  * ``KVM_STATS_TYPE_INSTANT``
> +    The statistics data is instantaneous. Its value can be increased or
> +    decreased. This type is usually used as a measurement of some resources,
> +    like the number of dirty pages, the number of large pages, etc.
> +    The corresponding ``count`` field for this type is always 1.
> +  * ``KVM_STATS_UNIT_NONE``
> +    There is no unit for the value of statistics data. This usually means that
> +    the value is a simple counter of an event.
> +  * ``KVM_STATS_UNIT_BYTES``
> +    It indicates that the statistics data is used to measure memory size, in the
> +    unit of Byte, KiByte, MiByte, GiByte, etc. The unit of the data is
> +    determined by the ``exponent`` field in the descriptor. The
> +    ``KVM_STATS_SCALE_POW2`` flag is valid in this case. The unit of the data is
> +    determined by ``pow(2, exponent)``. For example, if value is 10,
> +    ``exponent`` is 20, which means the unit of statistics data is MiByte, we
> +    can get the statistics data in the unit of Byte by
> +    ``value * pow(2, exponent) = 10 * pow(2, 20) = 10 MiByte`` which is
> +    10 * 1024 * 1024 Bytes.
> +  * ``KVM_STATS_UNIT_SECONDS``
> +    It indicates that the statistics data is used to measure time/latency, in
> +    the unit of nanosecond, microsecond, millisecond and second. The unit of the
> +    data is determined by the ``exponent`` field in the descriptor. The
> +    ``KVM_STATS_SCALE_POW10`` flag is valid in this case. The unit of the data
> +    is determined by ``pow(10, exponent)``. For example, if value is 2000000,
> +    ``exponent`` is -6, which means the unit of statistics data is microsecond,
> +    we can get the statistics data in the unit of second by
> +    ``value * pow(10, exponent) = 2000000 * pow(10, -6) = 2 seconds``.
> +  * ``KVM_STATS_UNIT_CYCLES``
> +    It indicates that the statistics data is used to measure CPU clock cycles.
> +    The ``KVM_STATS_SCALE_POW10`` flag is valid in this case. For example, if
> +    value is 200, ``exponent`` is 4, we can get the number of CPU clock cycles
> +    by ``value * pow(10, exponent) = 200 * pow(10, 4) = 2000000``.
> +
> +The ``exponent`` field is the scale of corresponding statistics data. It has two
> +values as follows:
> +  * ``KVM_STATS_SCALE_POW10``
> +    The scale is based on power of 10. It is used for measurement of time and
> +    CPU clock cycles.
> +  * ``KVM_STATS_SCALE_POW2``
> +    The scale is based on power of 2. It is used for measurement of memory size.
> +
> +The ``size`` field is the number of values of this statistics data. It is in the
> +unit of ``unsigned long`` for VCPU or ``__u64`` for VM.

Note it is the reverse in the implementation.

> +
> +The ``unused1`` and ``unused2`` fields are reserved for future
> +support for other types of statistics data, like log/linear histogram.
> +
> +The ``name`` field points to the name string of the statistics data. The name
> +string starts at the end of ``struct kvm_stats_desc``.
> +The maximum length (including trailing '\0') is indicated by ``name_size``
> +in ``struct kvm_stats_header``.
> +
> +The Stats Data block contains an array of data values of type ``struct
> +kvm_vm_stats_data`` or ``struct kvm_vcpu_stats_data``. It would be read by
> +user space periodically to pull statistics data.
> +The order of data value in Stats Data block is the same as the order of
> +descriptors in Descriptors block.
> +  * Statistics data for VM::
> +
> +       struct kvm_vm_stats_data {
> +               unsigned long value[0];
> +       };
> +
> +  * Statistics data for VCPU::
> +
> +       struct kvm_vcpu_stats_data {
> +               __u64 value[0];
> +       };
> +
>  5. The kvm_run structure
>  ========================
>
> @@ -6891,3 +7054,11 @@ This capability is always enabled.
>  This capability indicates that the KVM virtual PTP service is
>  supported in the host. A VMM can check whether the service is
>  available to the guest on migration.
> +
> +8.33 KVM_CAP_STATS_BINARY_FD
> +----------------------------
> +
> +:Architectures: all
> +
> +This capability indicates the feature that user space can create get a file
> +descriptor for every VM and VCPU to read statistics data in binary format.
> --
> 2.31.1.751.gd2f1c929bd-goog
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 2/4] KVM: stats: Add fd-based API to read binary stats data
  2021-05-17 14:53 ` [PATCH v5 2/4] KVM: stats: Add fd-based API to read binary stats data Jing Zhang
@ 2021-05-19 17:12   ` David Matlack
  2021-05-19 19:02     ` Jing Zhang
  2021-05-20  4:21   ` Ricardo Koller
  1 sibling, 1 reply; 30+ messages in thread
From: David Matlack @ 2021-05-19 17:12 UTC (permalink / raw)
  To: Jing Zhang
  Cc: KVM, KVMARM, LinuxMIPS, KVMPPC, LinuxS390, Linuxkselftest,
	Paolo Bonzini, Marc Zyngier, James Morse, Julien Thierry,
	Suzuki K Poulose, Will Deacon, Huacai Chen, Aleksandar Markovic,
	Thomas Bogendoerfer, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck, Claudio Imbrenda,
	Sean Christopherson, Vitaly Kuznetsov, Jim Mattson, Peter Shier,
	Oliver Upton, David Rientjes, Emanuele Giuseppe Esposito

On Mon, May 17, 2021 at 9:32 AM Jing Zhang <jingzhangos@google.com> wrote:
>
> Provides a file descriptor per VM to read VM stats info/data.
> Provides a file descriptor per vCPU to read vCPU stats info/data.
>
> Signed-off-by: Jing Zhang <jingzhangos@google.com>
> ---
>  arch/arm64/kvm/guest.c    |  26 +++++
>  arch/mips/kvm/mips.c      |  52 +++++++++
>  arch/powerpc/kvm/book3s.c |  52 +++++++++
>  arch/powerpc/kvm/booke.c  |  45 ++++++++
>  arch/s390/kvm/kvm-s390.c  | 117 ++++++++++++++++++++
>  arch/x86/kvm/x86.c        |  53 +++++++++
>  include/linux/kvm_host.h  | 127 ++++++++++++++++++++++
>  include/uapi/linux/kvm.h  |  50 +++++++++
>  virt/kvm/kvm_main.c       | 223 ++++++++++++++++++++++++++++++++++++++
>  9 files changed, 745 insertions(+)
>
> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
> index 0e41331b0911..1cc1d83630ac 100644
> --- a/arch/arm64/kvm/guest.c
> +++ b/arch/arm64/kvm/guest.c
> @@ -28,6 +28,32 @@
>
>  #include "trace.h"
>
> +struct _kvm_stats_desc kvm_vm_stats_desc[] = DEFINE_VM_STATS_DESC();
> +
> +struct _kvm_stats_header kvm_vm_stats_header = {
> +       .name_size = KVM_STATS_NAME_LEN,
> +       .count = ARRAY_SIZE(kvm_vm_stats_desc),
> +       .desc_offset = sizeof(struct kvm_stats_header),
> +       .data_offset = sizeof(struct kvm_stats_header) +
> +               sizeof(kvm_vm_stats_desc),
> +};
> +
> +struct _kvm_stats_desc kvm_vcpu_stats_desc[] = DEFINE_VCPU_STATS_DESC(
> +       STATS_DESC_COUNTER("hvc_exit_stat"),
> +       STATS_DESC_COUNTER("wfe_exit_stat"),
> +       STATS_DESC_COUNTER("wfi_exit_stat"),
> +       STATS_DESC_COUNTER("mmio_exit_user"),
> +       STATS_DESC_COUNTER("mmio_exit_kernel"),
> +       STATS_DESC_COUNTER("exits"));
> +
> +struct _kvm_stats_header kvm_vcpu_stats_header = {
> +       .name_size = KVM_STATS_NAME_LEN,
> +       .count = ARRAY_SIZE(kvm_vcpu_stats_desc),
> +       .desc_offset = sizeof(struct kvm_stats_header),
> +       .data_offset = sizeof(struct kvm_stats_header) +
> +               sizeof(kvm_vcpu_stats_desc),
> +};
> +
>  struct kvm_stats_debugfs_item debugfs_entries[] = {
>         VCPU_STAT_COM("halt_successful_poll", halt_successful_poll),
>         VCPU_STAT_COM("halt_attempted_poll", halt_attempted_poll),
> diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
> index f4fc60c05e9c..f17a65743ccd 100644
> --- a/arch/mips/kvm/mips.c
> +++ b/arch/mips/kvm/mips.c
> @@ -38,6 +38,58 @@
>  #define VECTORSPACING 0x100    /* for EI/VI mode */
>  #endif
>
> +struct _kvm_stats_desc kvm_vm_stats_desc[] = DEFINE_VM_STATS_DESC();
> +
> +struct _kvm_stats_header kvm_vm_stats_header = {
> +       .name_size = KVM_STATS_NAME_LEN,
> +       .count = ARRAY_SIZE(kvm_vm_stats_desc),
> +       .desc_offset = sizeof(struct kvm_stats_header),
> +       .data_offset = sizeof(struct kvm_stats_header) +
> +               sizeof(kvm_vm_stats_desc),
> +};
> +
> +struct _kvm_stats_desc kvm_vcpu_stats_desc[] = DEFINE_VCPU_STATS_DESC(
> +       STATS_DESC_COUNTER("wait_exits"),
> +       STATS_DESC_COUNTER("cache_exits"),
> +       STATS_DESC_COUNTER("signal_exits"),
> +       STATS_DESC_COUNTER("int_exits"),
> +       STATS_DESC_COUNTER("cop_unusable_exits"),
> +       STATS_DESC_COUNTER("tlbmod_exits"),
> +       STATS_DESC_COUNTER("tlbmiss_ld_exits"),
> +       STATS_DESC_COUNTER("tlbmiss_st_exits"),
> +       STATS_DESC_COUNTER("addrerr_st_exits"),
> +       STATS_DESC_COUNTER("addrerr_ld_exits"),
> +       STATS_DESC_COUNTER("syscall_exits"),
> +       STATS_DESC_COUNTER("resvd_inst_exits"),
> +       STATS_DESC_COUNTER("break_inst_exits"),
> +       STATS_DESC_COUNTER("trap_inst_exits"),
> +       STATS_DESC_COUNTER("msa_fpe_exits"),
> +       STATS_DESC_COUNTER("fpe_exits"),
> +       STATS_DESC_COUNTER("msa_disabled_exits"),
> +       STATS_DESC_COUNTER("flush_dcache_exits"),
> +#ifdef CONFIG_KVM_MIPS_VZ
> +       STATS_DESC_COUNTER("vz_gpsi_exits"),
> +       STATS_DESC_COUNTER("vz_gsfc_exits"),
> +       STATS_DESC_COUNTER("vz_hc_exits"),
> +       STATS_DESC_COUNTER("vz_grr_exits"),
> +       STATS_DESC_COUNTER("vz_gva_exits"),
> +       STATS_DESC_COUNTER("vz_ghfc_exits"),
> +       STATS_DESC_COUNTER("vz_gpa_exits"),
> +       STATS_DESC_COUNTER("vz_resvd_exits"),
> +#ifdef CONFIG_CPU_LOONGSON64
> +       STATS_DESC_COUNTER("vz_cpucfg_exits"),
> +#endif
> +#endif
> +       );
> +
> +struct _kvm_stats_header kvm_vcpu_stats_header = {
> +       .name_size = KVM_STATS_NAME_LEN,
> +       .count = ARRAY_SIZE(kvm_vcpu_stats_desc),
> +       .desc_offset = sizeof(struct kvm_stats_header),
> +       .data_offset = sizeof(struct kvm_stats_header) +
> +               sizeof(kvm_vcpu_stats_desc),
> +};
> +
>  struct kvm_stats_debugfs_item debugfs_entries[] = {
>         VCPU_STAT("wait", wait_exits),
>         VCPU_STAT("cache", cache_exits),
> diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
> index bd3a10e1fdaf..5e8ee0d39ef9 100644
> --- a/arch/powerpc/kvm/book3s.c
> +++ b/arch/powerpc/kvm/book3s.c
> @@ -38,6 +38,58 @@
>
>  /* #define EXIT_DEBUG */
>
> +struct _kvm_stats_desc kvm_vm_stats_desc[] = DEFINE_VM_STATS_DESC(
> +       STATS_DESC_ICOUNTER("num_2M_pages"),
> +       STATS_DESC_ICOUNTER("num_1G_pages"));
> +
> +struct _kvm_stats_header kvm_vm_stats_header = {
> +       .name_size = KVM_STATS_NAME_LEN,
> +       .count = ARRAY_SIZE(kvm_vm_stats_desc),
> +       .desc_offset = sizeof(struct kvm_stats_header),
> +       .data_offset = sizeof(struct kvm_stats_header) +
> +               sizeof(kvm_vm_stats_desc),
> +};
> +
> +struct _kvm_stats_desc kvm_vcpu_stats_desc[] = DEFINE_VCPU_STATS_DESC(
> +       STATS_DESC_COUNTER("sum_exits"),
> +       STATS_DESC_COUNTER("mmio_exits"),
> +       STATS_DESC_COUNTER("signal_exits"),
> +       STATS_DESC_COUNTER("light_exits"),
> +       STATS_DESC_COUNTER("itlb_real_miss_exits"),
> +       STATS_DESC_COUNTER("itlb_virt_miss_exits"),
> +       STATS_DESC_COUNTER("dtlb_real_miss_exits"),
> +       STATS_DESC_COUNTER("dtlb_virt_miss_exits"),
> +       STATS_DESC_COUNTER("syscall_exits"),
> +       STATS_DESC_COUNTER("isi_exits"),
> +       STATS_DESC_COUNTER("dsi_exits"),
> +       STATS_DESC_COUNTER("emulated_inst_exits"),
> +       STATS_DESC_COUNTER("dec_exits"),
> +       STATS_DESC_COUNTER("ext_intr_exits"),
> +       STATS_DESC_TIME_NSEC("halt_wait_ns"),
> +       STATS_DESC_COUNTER("halt_successful_wait"),
> +       STATS_DESC_COUNTER("dbell_exits"),
> +       STATS_DESC_COUNTER("gdbell_exits"),
> +       STATS_DESC_COUNTER("ld"),
> +       STATS_DESC_COUNTER("st"),
> +       STATS_DESC_COUNTER("pf_storage"),
> +       STATS_DESC_COUNTER("pf_instruc"),
> +       STATS_DESC_COUNTER("sp_storage"),
> +       STATS_DESC_COUNTER("sp_instruc"),
> +       STATS_DESC_COUNTER("queue_intr"),
> +       STATS_DESC_COUNTER("ld_slow"),
> +       STATS_DESC_COUNTER("st_slow"),
> +       STATS_DESC_COUNTER("pthru_all"),
> +       STATS_DESC_COUNTER("pthru_host"),
> +       STATS_DESC_COUNTER("pthru_bad_aff"));
> +
> +struct _kvm_stats_header kvm_vcpu_stats_header = {
> +       .name_size = KVM_STATS_NAME_LEN,
> +       .count = ARRAY_SIZE(kvm_vcpu_stats_desc),
> +       .desc_offset = sizeof(struct kvm_stats_header),
> +       .data_offset = sizeof(struct kvm_stats_header) +
> +               sizeof(kvm_vcpu_stats_desc),
> +};
> +
>  struct kvm_stats_debugfs_item debugfs_entries[] = {
>         VCPU_STAT("exits", sum_exits),
>         VCPU_STAT("mmio", mmio_exits),
> diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
> index 07fdd7a1254a..86d221e9193e 100644
> --- a/arch/powerpc/kvm/booke.c
> +++ b/arch/powerpc/kvm/booke.c
> @@ -36,6 +36,51 @@
>
>  unsigned long kvmppc_booke_handlers;
>
> +struct _kvm_stats_desc kvm_vm_stats_desc[] = DEFINE_VM_STATS_DESC(
> +       STATS_DESC_ICOUNTER("num_2M_pages"),
> +       STATS_DESC_ICOUNTER("num_1G_pages"));
> +
> +struct _kvm_stats_header kvm_vm_stats_header = {
> +       .name_size = KVM_STATS_NAME_LEN,
> +       .count = ARRAY_SIZE(kvm_vm_stats_desc),
> +       .desc_offset = sizeof(struct kvm_stats_header),
> +       .data_offset = sizeof(struct kvm_stats_header) +
> +               sizeof(kvm_vm_stats_desc),
> +};
> +
> +struct _kvm_stats_desc kvm_vcpu_stats_desc[] = DEFINE_VCPU_STATS_DESC(
> +       STATS_DESC_COUNTER("sum_exits"),
> +       STATS_DESC_COUNTER("mmio_exits"),
> +       STATS_DESC_COUNTER("signal_exits"),
> +       STATS_DESC_COUNTER("light_exits"),
> +       STATS_DESC_COUNTER("itlb_real_miss_exits"),
> +       STATS_DESC_COUNTER("itlb_virt_miss_exits"),
> +       STATS_DESC_COUNTER("dtlb_real_miss_exits"),
> +       STATS_DESC_COUNTER("dtlb_virt_miss_exits"),
> +       STATS_DESC_COUNTER("syscall_exits"),
> +       STATS_DESC_COUNTER("isi_exits"),
> +       STATS_DESC_COUNTER("dsi_exits"),
> +       STATS_DESC_COUNTER("emulated_inst_exits"),
> +       STATS_DESC_COUNTER("dec_exits"),
> +       STATS_DESC_COUNTER("ext_intr_exits"),
> +       STATS_DESC_TIME_NSEC("halt_wait_ns"),
> +       STATS_DESC_COUNTER("halt_successful_wait"),
> +       STATS_DESC_COUNTER("dbell_exits"),
> +       STATS_DESC_COUNTER("gdbell_exits"),
> +       STATS_DESC_COUNTER("ld"),
> +       STATS_DESC_COUNTER("st"),
> +       STATS_DESC_COUNTER("pthru_all"),
> +       STATS_DESC_COUNTER("pthru_host"),
> +       STATS_DESC_COUNTER("pthru_bad_aff"));
> +
> +struct _kvm_stats_header kvm_vcpu_stats_header = {
> +       .name_size = KVM_STATS_NAME_LEN,
> +       .count = ARRAY_SIZE(kvm_vcpu_stats_desc),
> +       .desc_offset = sizeof(struct kvm_stats_header),
> +       .data_offset = sizeof(struct kvm_stats_header) +
> +               sizeof(kvm_vcpu_stats_desc),
> +};
> +
>  struct kvm_stats_debugfs_item debugfs_entries[] = {
>         VCPU_STAT("mmio", mmio_exits),
>         VCPU_STAT("sig", signal_exits),
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index d6bf3372bb10..003feee79fce 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -58,6 +58,123 @@
>  #define VCPU_IRQS_MAX_BUF (sizeof(struct kvm_s390_irq) * \
>                            (KVM_MAX_VCPUS + LOCAL_IRQS))
>
> +struct _kvm_stats_desc kvm_vm_stats_desc[] = DEFINE_VM_STATS_DESC(
> +       STATS_DESC_COUNTER("inject_io"),
> +       STATS_DESC_COUNTER("inject_float_mchk"),
> +       STATS_DESC_COUNTER("inject_pfault_done"),
> +       STATS_DESC_COUNTER("inject_service_signal"),
> +       STATS_DESC_COUNTER("inject_virtio"));
> +
> +struct _kvm_stats_header kvm_vm_stats_header = {
> +       .name_size = KVM_STATS_NAME_LEN,
> +       .count = ARRAY_SIZE(kvm_vm_stats_desc),
> +       .desc_offset = sizeof(struct kvm_stats_header),
> +       .data_offset = sizeof(struct kvm_stats_header) +
> +               sizeof(kvm_vm_stats_desc),
> +};
> +
> +struct _kvm_stats_desc kvm_vcpu_stats_desc[] = DEFINE_VCPU_STATS_DESC(
> +       STATS_DESC_COUNTER("exit_userspace"),
> +       STATS_DESC_COUNTER("exit_null"),
> +       STATS_DESC_COUNTER("exit_external_request"),
> +       STATS_DESC_COUNTER("exit_io_request"),
> +       STATS_DESC_COUNTER("exit_external_interrupt"),
> +       STATS_DESC_COUNTER("exit_stop_request"),
> +       STATS_DESC_COUNTER("exit_validity"),
> +       STATS_DESC_COUNTER("exit_instruction"),
> +       STATS_DESC_COUNTER("exit_pei"),
> +       STATS_DESC_COUNTER("halt_no_poll_steal"),
> +       STATS_DESC_COUNTER("instruction_lctl"),
> +       STATS_DESC_COUNTER("instruction_lctlg"),
> +       STATS_DESC_COUNTER("instruction_stctl"),
> +       STATS_DESC_COUNTER("instruction_stctg"),
> +       STATS_DESC_COUNTER("exit_program_interruption"),
> +       STATS_DESC_COUNTER("exit_instr_and_program"),
> +       STATS_DESC_COUNTER("exit_operation_exception"),
> +       STATS_DESC_COUNTER("deliver_ckc"),
> +       STATS_DESC_COUNTER("deliver_cputm"),
> +       STATS_DESC_COUNTER("deliver_external_call"),
> +       STATS_DESC_COUNTER("deliver_emergency_signal"),
> +       STATS_DESC_COUNTER("deliver_service_signal"),
> +       STATS_DESC_COUNTER("deliver_virtio"),
> +       STATS_DESC_COUNTER("deliver_stop_signal"),
> +       STATS_DESC_COUNTER("deliver_prefix_signal"),
> +       STATS_DESC_COUNTER("deliver_restart_signal"),
> +       STATS_DESC_COUNTER("deliver_program"),
> +       STATS_DESC_COUNTER("deliver_io"),
> +       STATS_DESC_COUNTER("deliver_machine_check"),
> +       STATS_DESC_COUNTER("exit_wait_state"),
> +       STATS_DESC_COUNTER("inject_ckc"),
> +       STATS_DESC_COUNTER("inject_cputm"),
> +       STATS_DESC_COUNTER("inject_external_call"),
> +       STATS_DESC_COUNTER("inject_emergency_signal"),
> +       STATS_DESC_COUNTER("inject_mchk"),
> +       STATS_DESC_COUNTER("inject_pfault_init"),
> +       STATS_DESC_COUNTER("inject_program"),
> +       STATS_DESC_COUNTER("inject_restart"),
> +       STATS_DESC_COUNTER("inject_set_prefix"),
> +       STATS_DESC_COUNTER("inject_stop_signal"),
> +       STATS_DESC_COUNTER("instruction_epsw"),
> +       STATS_DESC_COUNTER("instruction_gs"),
> +       STATS_DESC_COUNTER("instruction_io_other"),
> +       STATS_DESC_COUNTER("instruction_lpsw"),
> +       STATS_DESC_COUNTER("instruction_lpswe"),
> +       STATS_DESC_COUNTER("instruction_pfmf"),
> +       STATS_DESC_COUNTER("instruction_ptff"),
> +       STATS_DESC_COUNTER("instruction_sck"),
> +       STATS_DESC_COUNTER("instruction_sckpf"),
> +       STATS_DESC_COUNTER("instruction_stidp"),
> +       STATS_DESC_COUNTER("instruction_spx"),
> +       STATS_DESC_COUNTER("instruction_stpx"),
> +       STATS_DESC_COUNTER("instruction_stap"),
> +       STATS_DESC_COUNTER("instruction_iske"),
> +       STATS_DESC_COUNTER("instruction_ri"),
> +       STATS_DESC_COUNTER("instruction_rrbe"),
> +       STATS_DESC_COUNTER("instruction_sske"),
> +       STATS_DESC_COUNTER("instruction_ipte_interlock"),
> +       STATS_DESC_COUNTER("instruction_stsi"),
> +       STATS_DESC_COUNTER("instruction_stfl"),
> +       STATS_DESC_COUNTER("instruction_tb"),
> +       STATS_DESC_COUNTER("instruction_tpi"),
> +       STATS_DESC_COUNTER("instruction_tprot"),
> +       STATS_DESC_COUNTER("instruction_tsch"),
> +       STATS_DESC_COUNTER("instruction_sie"),
> +       STATS_DESC_COUNTER("instruction_essa"),
> +       STATS_DESC_COUNTER("instruction_sthyi"),
> +       STATS_DESC_COUNTER("instruction_sigp_sense"),
> +       STATS_DESC_COUNTER("instruction_sigp_sense_running"),
> +       STATS_DESC_COUNTER("instruction_sigp_external_call"),
> +       STATS_DESC_COUNTER("instruction_sigp_emergency"),
> +       STATS_DESC_COUNTER("instruction_sigp_cond_emergency"),
> +       STATS_DESC_COUNTER("instruction_sigp_start"),
> +       STATS_DESC_COUNTER("instruction_sigp_stop"),
> +       STATS_DESC_COUNTER("instruction_sigp_stop_store_status"),
> +       STATS_DESC_COUNTER("instruction_sigp_store_status"),
> +       STATS_DESC_COUNTER("instruction_sigp_store_adtl_status"),
> +       STATS_DESC_COUNTER("instruction_sigp_arch"),
> +       STATS_DESC_COUNTER("instruction_sigp_prefix"),
> +       STATS_DESC_COUNTER("instruction_sigp_restart"),
> +       STATS_DESC_COUNTER("instruction_sigp_init_cpu_reset"),
> +       STATS_DESC_COUNTER("instruction_sigp_cpu_reset"),
> +       STATS_DESC_COUNTER("instruction_sigp_unknown"),
> +       STATS_DESC_COUNTER("diagnose_10"),
> +       STATS_DESC_COUNTER("diagnose_44"),
> +       STATS_DESC_COUNTER("diagnose_9c"),
> +       STATS_DESC_COUNTER("diagnose_9c_ignored"),
> +       STATS_DESC_COUNTER("diagnose_258"),
> +       STATS_DESC_COUNTER("diagnose_308"),
> +       STATS_DESC_COUNTER("diagnose_500"),
> +       STATS_DESC_COUNTER("diagnose_other"),
> +       STATS_DESC_COUNTER("pfault_sync"));
> +
> +struct _kvm_stats_header kvm_vcpu_stats_header = {
> +       .name_size = KVM_STATS_NAME_LEN,
> +       .count = ARRAY_SIZE(kvm_vcpu_stats_desc),
> +       .desc_offset = sizeof(struct kvm_stats_header),
> +       .data_offset = sizeof(struct kvm_stats_header) +
> +               sizeof(kvm_vcpu_stats_desc),
> +};
> +
>  struct kvm_stats_debugfs_item debugfs_entries[] = {
>         VCPU_STAT("userspace_handled", exit_userspace),
>         VCPU_STAT("exit_null", exit_null),
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 9a93d80caff6..84880687c199 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -214,6 +214,59 @@ EXPORT_SYMBOL_GPL(host_xss);
>  u64 __read_mostly supported_xss;
>  EXPORT_SYMBOL_GPL(supported_xss);
>
> +struct _kvm_stats_desc kvm_vm_stats_desc[] = DEFINE_VM_STATS_DESC(
> +       STATS_DESC_COUNTER("mmu_shadow_zapped"),
> +       STATS_DESC_COUNTER("mmu_pte_write"),
> +       STATS_DESC_COUNTER("mmu_pde_zapped"),
> +       STATS_DESC_COUNTER("mmu_flooded"),
> +       STATS_DESC_COUNTER("mmu_recycled"),
> +       STATS_DESC_COUNTER("mmu_cache_miss"),
> +       STATS_DESC_ICOUNTER("mmu_unsync"),
> +       STATS_DESC_ICOUNTER("largepages"),
> +       STATS_DESC_ICOUNTER("nx_largepages_splits"),
> +       STATS_DESC_ICOUNTER("max_mmu_page_hash_collisions"));
> +
> +struct _kvm_stats_header kvm_vm_stats_header = {
> +       .name_size = KVM_STATS_NAME_LEN,
> +       .count = ARRAY_SIZE(kvm_vm_stats_desc),
> +       .desc_offset = sizeof(struct kvm_stats_header),
> +       .data_offset = sizeof(struct kvm_stats_header) +
> +               sizeof(kvm_vm_stats_desc),
> +};
> +
> +struct _kvm_stats_desc kvm_vcpu_stats_desc[] = DEFINE_VCPU_STATS_DESC(
> +       STATS_DESC_COUNTER("pf_fixed"),
> +       STATS_DESC_COUNTER("pf_guest"),
> +       STATS_DESC_COUNTER("tlb_flush"),
> +       STATS_DESC_COUNTER("invlpg"),
> +       STATS_DESC_COUNTER("exits"),
> +       STATS_DESC_COUNTER("io_exits"),
> +       STATS_DESC_COUNTER("mmio_exits"),
> +       STATS_DESC_COUNTER("signal_exits"),
> +       STATS_DESC_COUNTER("irq_window_exits"),
> +       STATS_DESC_COUNTER("nmi_window_exits"),
> +       STATS_DESC_COUNTER("l1d_flush"),
> +       STATS_DESC_COUNTER("halt_exits"),
> +       STATS_DESC_COUNTER("request_irq_exits"),
> +       STATS_DESC_COUNTER("irq_exits"),
> +       STATS_DESC_COUNTER("host_state_reload"),
> +       STATS_DESC_COUNTER("fpu_reload"),
> +       STATS_DESC_COUNTER("insn_emulation"),
> +       STATS_DESC_COUNTER("insn_emulation_fail"),
> +       STATS_DESC_COUNTER("hypercalls"),
> +       STATS_DESC_COUNTER("irq_injections"),
> +       STATS_DESC_COUNTER("nmi_injections"),
> +       STATS_DESC_COUNTER("req_event"),
> +       STATS_DESC_COUNTER("nested_run"));
> +
> +struct _kvm_stats_header kvm_vcpu_stats_header = {
> +       .name_size = KVM_STATS_NAME_LEN,
> +       .count = ARRAY_SIZE(kvm_vcpu_stats_desc),
> +       .desc_offset = sizeof(struct kvm_stats_header),
> +       .data_offset = sizeof(struct kvm_stats_header) +
> +               sizeof(kvm_vcpu_stats_desc),
> +};
> +
>  struct kvm_stats_debugfs_item debugfs_entries[] = {
>         VCPU_STAT("pf_fixed", pf_fixed),
>         VCPU_STAT("pf_guest", pf_guest),
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 97700e41db3b..52783f8062ca 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1240,6 +1240,19 @@ struct kvm_stats_debugfs_item {
>         int mode;
>  };
>
> +struct _kvm_stats_header {
> +       __u32 name_size;
> +       __u32 count;
> +       __u32 desc_offset;
> +       __u32 data_offset;
> +};
> +
> +#define KVM_STATS_NAME_LEN     48
> +struct _kvm_stats_desc {
> +       struct kvm_stats_desc desc;
> +       char name[KVM_STATS_NAME_LEN];
> +};
> +
>  #define KVM_DBGFS_GET_MODE(dbgfs_item)                                         \
>         ((dbgfs_item)->mode ? (dbgfs_item)->mode : 0644)
>
> @@ -1253,8 +1266,122 @@ struct kvm_stats_debugfs_item {
>         { n, offsetof(struct kvm_vcpu, stat.common.x),                         \
>           KVM_STAT_VCPU, ## __VA_ARGS__ }
>
> +#define STATS_DESC(name, type, unit, scale, exponent)                         \
> +       {                                                                      \
> +               {type | unit | scale, exponent, 1}, name,                      \
> +       }

Suggest using designated initializers here.

> +#define STATS_DESC_CUMULATIVE(name, unit, scale, exponent)                    \
> +       STATS_DESC(name, KVM_STATS_TYPE_CUMULATIVE, unit, scale, exponent)
> +#define STATS_DESC_INSTANT(name, unit, scale, exponent)                               \
> +       STATS_DESC(name, KVM_STATS_TYPE_INSTANT, unit, scale, exponent)
> +
> +/* Cumulative counter */
> +#define STATS_DESC_COUNTER(name)                                              \
> +       STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_NONE,                       \
> +               KVM_STATS_SCALE_POW10, 0)
> +/* Instantaneous counter */
> +#define STATS_DESC_ICOUNTER(name)                                             \
> +       STATS_DESC_INSTANT(name, KVM_STATS_UNIT_NONE,                          \
> +               KVM_STATS_SCALE_POW10, 0)
> +
> +/* Cumulative clock cycles */
> +#define STATS_DESC_CYCLE(name)                                                \
> +       STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_CYCLES,                     \
> +               KVM_STATS_SCALE_POW10, 0)
> +/* Instantaneous clock cycles */
> +#define STATS_DESC_ICYCLE(name)                                                       \
> +       STATS_DESC_INSTANT(name, KVM_STATS_UNIT_CYCLES,                        \
> +               KVM_STATS_SCALE_POW10, 0)
> +
> +/* Cumulative memory size in Byte */
> +#define STATS_DESC_SIZE_BYTE(name)                                            \
> +       STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_BYTES,                      \
> +               KVM_STATS_SCALE_POW2, 0)
> +/* Cumulative memory size in KiByte */
> +#define STATS_DESC_SIZE_KBYTE(name)                                           \
> +       STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_BYTES,                      \
> +               KVM_STATS_SCALE_POW2, 10)
> +/* Cumulative memory size in MiByte */
> +#define STATS_DESC_SIZE_MBYTE(name)                                           \
> +       STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_BYTES,                      \
> +               KVM_STATS_SCALE_POW2, 20)
> +/* Cumulative memory size in GiByte */
> +#define STATS_DESC_SIZE_GBYTE(name)                                           \
> +       STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_BYTES,                      \
> +               KVM_STATS_SCALE_POW2, 30)
> +
> +/* Instantaneous memory size in Byte */
> +#define STATS_DESC_ISIZE_BYTE(name)                                           \
> +       STATS_DESC_INSTANT(name, KVM_STATS_UNIT_BYTES,                         \
> +               KVM_STATS_SCALE_POW2, 0)
> +/* Instantaneous memory size in KiByte */
> +#define STATS_DESC_ISIZE_KBYTE(name)                                          \
> +       STATS_DESC_INSTANT(name, KVM_STATS_UNIT_BYTES,                         \
> +               KVM_STATS_SCALE_POW2, 10)
> +/* Instantaneous memory size in MiByte */
> +#define STATS_DESC_ISIZE_MBYTE(name)                                          \
> +       STATS_DESC_INSTANT(name, KVM_STATS_UNIT_BYTES,                         \
> +               KVM_STATS_SCALE_POW2, 20)
> +/* Instantaneous memory size in GiByte */
> +#define STATS_DESC_ISIZE_GBYTE(name)                                          \
> +       STATS_DESC_INSTANT(name, KVM_STATS_UNIT_BYTES,                         \
> +               KVM_STATS_SCALE_POW2, 30)
> +
> +/* Cumulative time in second */
> +#define STATS_DESC_TIME_SEC(name)                                             \
> +       STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_SECONDS,                    \
> +               KVM_STATS_SCALE_POW10, 0)
> +/* Cumulative time in millisecond */
> +#define STATS_DESC_TIME_MSEC(name)                                            \
> +       STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_SECONDS,                    \
> +               KVM_STATS_SCALE_POW10, -3)
> +/* Cumulative time in microsecond */
> +#define STATS_DESC_TIME_USEC(name)                                            \
> +       STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_SECONDS,                    \
> +               KVM_STATS_SCALE_POW10, -6)
> +/* Cumulative time in nanosecond */
> +#define STATS_DESC_TIME_NSEC(name)                                            \
> +       STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_SECONDS,                    \
> +               KVM_STATS_SCALE_POW10, -9)
> +
> +/* Instantaneous time in second */
> +#define STATS_DESC_ITIME_SEC(name)                                            \
> +       STATS_DESC_INSTANT(name, KVM_STATS_UNIT_SECONDS,                       \
> +               KVM_STATS_SCALE_POW10, 0)
> +/* Instantaneous time in millisecond */
> +#define STATS_DESC_ITIME_MSEC(name)                                           \
> +       STATS_DESC_INSTANT(name, KVM_STATS_UNIT_SECONDS,                       \
> +               KVM_STATS_SCALE_POW10, -3)
> +/* Instantaneous time in microsecond */
> +#define STATS_DESC_ITIME_USEC(name)                                           \
> +       STATS_DESC_INSTANT(name, KVM_STATS_UNIT_SECONDS,                       \
> +               KVM_STATS_SCALE_POW10, -6)
> +/* Instantaneous time in nanosecond */
> +#define STATS_DESC_ITIME_NSEC(name)                                           \
> +       STATS_DESC_INSTANT(name, KVM_STATS_UNIT_SECONDS,                       \
> +               KVM_STATS_SCALE_POW10, -9)
> +
> +#define DEFINE_VM_STATS_DESC(...) {                                           \
> +       STATS_DESC_COUNTER("remote_tlb_flush"),                                \
> +       ## __VA_ARGS__                                                         \
> +}
> +
> +#define DEFINE_VCPU_STATS_DESC(...) {                                         \
> +       STATS_DESC_COUNTER("halt_successful_poll"),                            \
> +       STATS_DESC_COUNTER("halt_attempted_poll"),                             \
> +       STATS_DESC_COUNTER("halt_poll_invalid"),                               \
> +       STATS_DESC_COUNTER("halt_wakeup"),                                     \
> +       STATS_DESC_TIME_NSEC("halt_poll_success_ns"),                          \
> +       STATS_DESC_TIME_NSEC("halt_poll_fail_ns"),                             \
> +       ## __VA_ARGS__                                                         \
> +}
> +
>  extern struct kvm_stats_debugfs_item debugfs_entries[];
>  extern struct dentry *kvm_debugfs_dir;
> +extern struct _kvm_stats_header kvm_vm_stats_header;
> +extern struct _kvm_stats_header kvm_vcpu_stats_header;
> +extern struct _kvm_stats_desc kvm_vm_stats_desc[];
> +extern struct _kvm_stats_desc kvm_vcpu_stats_desc[];
>
>  #if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)
>  static inline int mmu_notifier_retry(struct kvm *kvm, unsigned long mmu_seq)
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 3fd9a7e9d90c..a64e92c7d9de 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1082,6 +1082,7 @@ struct kvm_ppc_resize_hpt {
>  #define KVM_CAP_SGX_ATTRIBUTE 196
>  #define KVM_CAP_VM_COPY_ENC_CONTEXT_FROM 197
>  #define KVM_CAP_PTP_KVM 198
> +#define KVM_CAP_STATS_BINARY_FD 199
>
>  #ifdef KVM_CAP_IRQ_ROUTING
>
> @@ -1898,4 +1899,53 @@ struct kvm_dirty_gfn {
>  #define KVM_BUS_LOCK_DETECTION_OFF             (1 << 0)
>  #define KVM_BUS_LOCK_DETECTION_EXIT            (1 << 1)
>
> +#define KVM_STATS_ID_MAXLEN            64
> +
> +struct kvm_stats_header {
> +       char id[KVM_STATS_ID_MAXLEN];
> +       __u32 name_size;
> +       __u32 count;
> +       __u32 desc_offset;
> +       __u32 data_offset;
> +};
> +
> +#define KVM_STATS_TYPE_SHIFT           0
> +#define KVM_STATS_TYPE_MASK            (0xF << KVM_STATS_TYPE_SHIFT)
> +#define KVM_STATS_TYPE_CUMULATIVE      (0x0 << KVM_STATS_TYPE_SHIFT)
> +#define KVM_STATS_TYPE_INSTANT         (0x1 << KVM_STATS_TYPE_SHIFT)
> +#define KVM_STATS_TYPE_MAX             KVM_STATS_TYPE_INSTANT
> +
> +#define KVM_STATS_UNIT_SHIFT           4
> +#define KVM_STATS_UNIT_MASK            (0xF << KVM_STATS_UNIT_SHIFT)
> +#define KVM_STATS_UNIT_NONE            (0x0 << KVM_STATS_UNIT_SHIFT)
> +#define KVM_STATS_UNIT_BYTES           (0x1 << KVM_STATS_UNIT_SHIFT)
> +#define KVM_STATS_UNIT_SECONDS         (0x2 << KVM_STATS_UNIT_SHIFT)
> +#define KVM_STATS_UNIT_CYCLES          (0x3 << KVM_STATS_UNIT_SHIFT)
> +#define KVM_STATS_UNIT_MAX             KVM_STATS_UNIT_CYCLES
> +
> +#define KVM_STATS_SCALE_SHIFT          8
> +#define KVM_STATS_SCALE_MASK           (0xF << KVM_STATS_SCALE_SHIFT)
> +#define KVM_STATS_SCALE_POW10          (0x0 << KVM_STATS_SCALE_SHIFT)
> +#define KVM_STATS_SCALE_POW2           (0x1 << KVM_STATS_SCALE_SHIFT)
> +#define KVM_STATS_SCALE_MAX            KVM_STATS_SCALE_POW2
> +
> +struct kvm_stats_desc {
> +       __u32 flags;
> +       __s16 exponent;
> +       __u16 size;
> +       __u32 unused1;
> +       __u32 unused2;
> +       char name[0];
> +};
> +
> +struct kvm_vm_stats_data {
> +       unsigned long value[0];
> +};
> +
> +struct kvm_vcpu_stats_data {
> +       __u64 value[0];
> +};
> +
> +#define KVM_STATS_GETFD  _IOR(KVMIO,  0xcc, struct kvm_stats_header)
> +
>  #endif /* __LINUX_KVM_H */
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 34a4cf265297..9e2c8dcdeae9 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -3409,6 +3409,115 @@ static int kvm_vcpu_ioctl_set_sigmask(struct kvm_vcpu *vcpu, sigset_t *sigset)
>         return 0;
>  }
>
> +static ssize_t kvm_vcpu_stats_read(struct file *file, char __user *user_buffer,
> +                             size_t size, loff_t *offset)
> +{
> +       char id[KVM_STATS_ID_MAXLEN];
> +       struct kvm_vcpu *vcpu = file->private_data;
> +       ssize_t copylen, len, remain = size;
> +       size_t size_header, size_desc, size_stats;
> +       loff_t pos = *offset;
> +       char __user *dest = user_buffer;
> +       void *src;
> +
> +       snprintf(id, sizeof(id), "kvm-%d/vcpu-%d",
> +                       task_pid_nr(current), vcpu->vcpu_id);
> +       size_header = sizeof(kvm_vcpu_stats_header);
> +       size_desc =
> +               kvm_vcpu_stats_header.count * sizeof(struct _kvm_stats_desc);
> +       size_stats = sizeof(vcpu->stat);
> +
> +       len = sizeof(id) + size_header + size_desc + size_stats - pos;
> +       len = min(len, remain);
> +       if (len <= 0)
> +               return 0;
> +       remain = len;
> +
> +       /* Copy kvm vcpu stats header id string */
> +       copylen = sizeof(id) - pos;
> +       copylen = min(copylen, remain);
> +       if (copylen > 0) {
> +               src = (void *)id + pos;
> +               if (copy_to_user(dest, src, copylen))
> +                       return -EFAULT;
> +               remain -= copylen;
> +               pos += copylen;
> +               dest += copylen;
> +       }
> +       /* Copy kvm vcpu stats header */
> +       copylen = sizeof(id) + size_header - pos;
> +       copylen = min(copylen, remain);
> +       if (copylen > 0) {
> +               src = (void *)&kvm_vcpu_stats_header;
> +               src += pos - sizeof(id);
> +               if (copy_to_user(dest, src, copylen))
> +                       return -EFAULT;
> +               remain -= copylen;
> +               pos += copylen;
> +               dest += copylen;
> +       }
> +       /* Copy kvm vcpu stats descriptors */
> +       copylen = kvm_vcpu_stats_header.desc_offset + size_desc - pos;
> +       copylen = min(copylen, remain);
> +       if (copylen > 0) {
> +               src = (void *)&kvm_vcpu_stats_desc;
> +               src += pos - kvm_vcpu_stats_header.desc_offset;
> +               if (copy_to_user(dest, src, copylen))
> +                       return -EFAULT;
> +               remain -= copylen;
> +               pos += copylen;
> +               dest += copylen;
> +       }

KVM could cache everything above this to avoid the cost of
regenerating it on every read. It would require allocating some extra
memory in the kernel though, so it's not free. But if userspace is
reading stats for every vCPU every second it could be worth it.

> +       /* Copy kvm vcpu stats values */
> +       copylen = kvm_vcpu_stats_header.data_offset + size_stats - pos;
> +       copylen = min(copylen, remain);
> +       if (copylen > 0) {
> +               src = (void *)&vcpu->stat;
> +               src += pos - kvm_vcpu_stats_header.data_offset;
> +               if (copy_to_user(dest, src, copylen))
> +                       return -EFAULT;
> +               remain -= copylen;
> +               pos += copylen;
> +               dest += copylen;
> +       }
> +
> +       *offset = pos;
> +       return len;
> +}
> +
> +static const struct file_operations kvm_vcpu_stats_fops = {
> +       .read = kvm_vcpu_stats_read,
> +       .llseek = noop_llseek,
> +};
> +
> +static int kvm_vcpu_ioctl_get_statsfd(struct kvm_vcpu *vcpu)
> +{
> +       int error, fd;
> +       struct file *file;
> +       char name[15 + ITOA_MAX_LEN + 1];
> +
> +       snprintf(name, sizeof(name), "kvm-vcpu-stats:%d", vcpu->vcpu_id);

Does this need to be globally unique? I was going to suggest using the
id ("kvm-%d/vcpu-%d") but the slash is probably not allowed. It would
be nice though to have the file name the same as the id though so
maybe change the id and name to something like  "kvm-%d.vcpu-%d"?

> +
> +       error = get_unused_fd_flags(O_CLOEXEC);
> +       if (error < 0)
> +               return error;
> +       fd = error;
> +
> +       file = anon_inode_getfile(name, &kvm_vcpu_stats_fops, vcpu, O_RDONLY);
> +       if (IS_ERR(file)) {
> +               error = PTR_ERR(file);
> +               goto err_put_unused_fd;
> +       }
> +       file->f_mode |= FMODE_PREAD;
> +       fd_install(fd, file);
> +
> +       return fd;
> +
> +err_put_unused_fd:
> +       put_unused_fd(fd);
> +       return error;
> +}
> +
>  static long kvm_vcpu_ioctl(struct file *filp,
>                            unsigned int ioctl, unsigned long arg)
>  {
> @@ -3606,6 +3715,10 @@ static long kvm_vcpu_ioctl(struct file *filp,
>                 r = kvm_arch_vcpu_ioctl_set_fpu(vcpu, fpu);
>                 break;
>         }
> +       case KVM_STATS_GETFD: {
> +               r = kvm_vcpu_ioctl_get_statsfd(vcpu);
> +               break;
> +       }
>         default:
>                 r = kvm_arch_vcpu_ioctl(filp, ioctl, arg);
>         }
> @@ -3864,6 +3977,8 @@ static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
>  #else
>                 return 0;
>  #endif
> +       case KVM_CAP_STATS_BINARY_FD:
> +               return 1;
>         default:
>                 break;
>         }
> @@ -3967,6 +4082,111 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
>         }
>  }
>
> +static ssize_t kvm_vm_stats_read(struct file *file, char __user *user_buffer,
> +                             size_t size, loff_t *offset)
> +{
> +       char id[KVM_STATS_ID_MAXLEN];
> +       struct kvm *kvm = file->private_data;
> +       ssize_t copylen, len, remain = size;
> +       size_t size_header, size_desc, size_stats;
> +       loff_t pos = *offset;
> +       char __user *dest = user_buffer;
> +       void *src;
> +
> +       snprintf(id, sizeof(id), "kvm-%d", task_pid_nr(current));
> +       size_header = sizeof(kvm_vm_stats_header);
> +       size_desc = kvm_vm_stats_header.count * sizeof(struct _kvm_stats_desc);
> +       size_stats = sizeof(kvm->stat);
> +
> +       len = sizeof(id) + size_header + size_desc + size_stats - pos;
> +       len = min(len, remain);
> +       if (len <= 0)
> +               return 0;
> +       remain = len;
> +
> +       /* Copy kvm vm stats header id string */
> +       copylen = sizeof(id) - pos;
> +       copylen = min(copylen, remain);
> +       if (copylen > 0) {
> +               src = (void *)id + pos;
> +               if (copy_to_user(dest, src, copylen))
> +                       return -EFAULT;
> +               remain -= copylen;
> +               pos += copylen;
> +               dest += copylen;
> +       }
> +       /* Copy kvm vm stats header */
> +       copylen = sizeof(id) + size_header - pos;
> +       copylen = min(copylen, remain);
> +       if (copylen > 0) {
> +               src = (void *)&kvm_vm_stats_header;
> +               src += pos - sizeof(id);
> +               if (copy_to_user(dest, src, copylen))
> +                       return -EFAULT;
> +               remain -= copylen;
> +               pos += copylen;
> +               dest += copylen;
> +       }
> +       /* Copy kvm vm stats descriptors */
> +       copylen = kvm_vm_stats_header.desc_offset + size_desc - pos;
> +       copylen = min(copylen, remain);
> +       if (copylen > 0) {
> +               src = (void *)&kvm_vm_stats_desc;
> +               src += pos - kvm_vm_stats_header.desc_offset;
> +               if (copy_to_user(dest, src, copylen))
> +                       return -EFAULT;
> +               remain -= copylen;
> +               pos += copylen;
> +               dest += copylen;
> +       }

Ditto here about caching.


> +       /* Copy kvm vm stats values */
> +       copylen = kvm_vm_stats_header.data_offset + size_stats - pos;
> +       copylen = min(copylen, remain);
> +       if (copylen > 0) {
> +               src = (void *)&kvm->stat;
> +               src += pos - kvm_vm_stats_header.data_offset;
> +               if (copy_to_user(dest, src, copylen))
> +                       return -EFAULT;
> +               remain -= copylen;
> +               pos += copylen;
> +               dest += copylen;
> +       }
> +
> +       *offset = pos;
> +       return len;
> +}
> +
> +static const struct file_operations kvm_vm_stats_fops = {
> +       .read = kvm_vm_stats_read,
> +       .llseek = noop_llseek,
> +};
> +
> +static int kvm_vm_ioctl_get_statsfd(struct kvm *kvm)
> +{
> +       int error, fd;
> +       struct file *file;
> +
> +       error = get_unused_fd_flags(O_CLOEXEC);
> +       if (error < 0)
> +               return error;
> +       fd = error;
> +
> +       file = anon_inode_getfile("kvm-vm-stats",
> +                       &kvm_vm_stats_fops, kvm, O_RDONLY);
> +       if (IS_ERR(file)) {
> +               error = PTR_ERR(file);
> +               goto err_put_unused_fd;
> +       }
> +       file->f_mode |= FMODE_PREAD;
> +       fd_install(fd, file);
> +
> +       return fd;
> +
> +err_put_unused_fd:
> +       put_unused_fd(fd);
> +       return error;
> +}
> +
>  static long kvm_vm_ioctl(struct file *filp,
>                            unsigned int ioctl, unsigned long arg)
>  {
> @@ -4149,6 +4369,9 @@ static long kvm_vm_ioctl(struct file *filp,
>         case KVM_RESET_DIRTY_RINGS:
>                 r = kvm_vm_ioctl_reset_dirty_pages(kvm);
>                 break;
> +       case KVM_STATS_GETFD:
> +               r = kvm_vm_ioctl_get_statsfd(kvm);
> +               break;
>         default:
>                 r = kvm_arch_vm_ioctl(filp, ioctl, arg);
>         }
> --
> 2.31.1.751.gd2f1c929bd-goog
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 4/4] KVM: selftests: Add selftest for KVM statistics data binary interface
  2021-05-17 14:53 ` [PATCH v5 4/4] KVM: selftests: Add selftest for KVM " Jing Zhang
@ 2021-05-19 17:21   ` David Matlack
  2021-05-19 17:58     ` Jing Zhang
  2021-05-19 22:00   ` Ricardo Koller
  1 sibling, 1 reply; 30+ messages in thread
From: David Matlack @ 2021-05-19 17:21 UTC (permalink / raw)
  To: Jing Zhang
  Cc: KVM, KVMARM, LinuxMIPS, KVMPPC, LinuxS390, Linuxkselftest,
	Paolo Bonzini, Marc Zyngier, James Morse, Julien Thierry,
	Suzuki K Poulose, Will Deacon, Huacai Chen, Aleksandar Markovic,
	Thomas Bogendoerfer, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck, Claudio Imbrenda,
	Sean Christopherson, Vitaly Kuznetsov, Jim Mattson, Peter Shier,
	Oliver Upton, David Rientjes, Emanuele Giuseppe Esposito

On Mon, May 17, 2021 at 9:24 AM Jing Zhang <jingzhangos@google.com> wrote:
>
> Add selftest to check KVM stats descriptors validity.
>
> Signed-off-by: Jing Zhang <jingzhangos@google.com>
> ---
>  tools/testing/selftests/kvm/.gitignore        |   1 +
>  tools/testing/selftests/kvm/Makefile          |   3 +
>  .../testing/selftests/kvm/include/kvm_util.h  |   3 +
>  .../selftests/kvm/kvm_bin_form_stats.c        | 379 ++++++++++++++++++
>  tools/testing/selftests/kvm/lib/kvm_util.c    |  12 +
>  5 files changed, 398 insertions(+)
>  create mode 100644 tools/testing/selftests/kvm/kvm_bin_form_stats.c
>
> diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
> index bd83158e0e0b..35796667c944 100644
> --- a/tools/testing/selftests/kvm/.gitignore
> +++ b/tools/testing/selftests/kvm/.gitignore
> @@ -43,3 +43,4 @@
>  /memslot_modification_stress_test
>  /set_memory_region_test
>  /steal_time
> +/kvm_bin_form_stats
> diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
> index e439d027939d..2984c86c848a 100644
> --- a/tools/testing/selftests/kvm/Makefile
> +++ b/tools/testing/selftests/kvm/Makefile
> @@ -76,6 +76,7 @@ TEST_GEN_PROGS_x86_64 += kvm_page_table_test
>  TEST_GEN_PROGS_x86_64 += memslot_modification_stress_test
>  TEST_GEN_PROGS_x86_64 += set_memory_region_test
>  TEST_GEN_PROGS_x86_64 += steal_time
> +TEST_GEN_PROGS_x86_64 += kvm_bin_form_stats
>
>  TEST_GEN_PROGS_aarch64 += aarch64/get-reg-list
>  TEST_GEN_PROGS_aarch64 += aarch64/get-reg-list-sve
> @@ -87,6 +88,7 @@ TEST_GEN_PROGS_aarch64 += kvm_create_max_vcpus
>  TEST_GEN_PROGS_aarch64 += kvm_page_table_test
>  TEST_GEN_PROGS_aarch64 += set_memory_region_test
>  TEST_GEN_PROGS_aarch64 += steal_time
> +TEST_GEN_PROGS_aarch64 += kvm_bin_form_stats
>
>  TEST_GEN_PROGS_s390x = s390x/memop
>  TEST_GEN_PROGS_s390x += s390x/resets
> @@ -96,6 +98,7 @@ TEST_GEN_PROGS_s390x += dirty_log_test
>  TEST_GEN_PROGS_s390x += kvm_create_max_vcpus
>  TEST_GEN_PROGS_s390x += kvm_page_table_test
>  TEST_GEN_PROGS_s390x += set_memory_region_test
> +TEST_GEN_PROGS_s390x += kvm_bin_form_stats
>
>  TEST_GEN_PROGS += $(TEST_GEN_PROGS_$(UNAME_M))
>  LIBKVM += $(LIBKVM_$(UNAME_M))
> diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
> index a8f022794ce3..ee01a67022d9 100644
> --- a/tools/testing/selftests/kvm/include/kvm_util.h
> +++ b/tools/testing/selftests/kvm/include/kvm_util.h
> @@ -387,4 +387,7 @@ uint64_t get_ucall(struct kvm_vm *vm, uint32_t vcpu_id, struct ucall *uc);
>  #define GUEST_ASSERT_4(_condition, arg1, arg2, arg3, arg4) \
>         __GUEST_ASSERT((_condition), 4, (arg1), (arg2), (arg3), (arg4))
>
> +int vm_get_statsfd(struct kvm_vm *vm);
> +int vcpu_get_statsfd(struct kvm_vm *vm, uint32_t vcpuid);
> +
>  #endif /* SELFTEST_KVM_UTIL_H */
> diff --git a/tools/testing/selftests/kvm/kvm_bin_form_stats.c b/tools/testing/selftests/kvm/kvm_bin_form_stats.c
> new file mode 100644
> index 000000000000..dae44397d0f4
> --- /dev/null
> +++ b/tools/testing/selftests/kvm/kvm_bin_form_stats.c
> @@ -0,0 +1,379 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * kvm_bin_form_stats
> + *
> + * Copyright (C) 2021, Google LLC.
> + *
> + * Test the fd-based interface for KVM statistics.
> + */
> +
> +#define _GNU_SOURCE /* for program_invocation_short_name */
> +#include <fcntl.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <errno.h>
> +
> +#include "test_util.h"
> +
> +#include "kvm_util.h"
> +#include "asm/kvm.h"
> +#include "linux/kvm.h"
> +
> +int vm_stats_test(struct kvm_vm *vm)
> +{
> +       ssize_t ret;
> +       int i, stats_fd, err = -1;
> +       size_t size_desc, size_data = 0;
> +       struct kvm_stats_header header;
> +       struct kvm_stats_desc *stats_desc, *pdesc;
> +       struct kvm_vm_stats_data *stats_data;
> +
> +       /* Get fd for VM stats */
> +       stats_fd = vm_get_statsfd(vm);
> +       if (stats_fd < 0) {
> +               perror("Get VM stats fd");
> +               return err;
> +       }
> +       /* Read kvm vm stats header */
> +       ret = read(stats_fd, &header, sizeof(header));
> +       if (ret != sizeof(header)) {
> +               perror("Read VM stats header");
> +               goto out_close_fd;
> +       }
> +       size_desc = sizeof(*stats_desc) + header.name_size;
> +       /* Check id string in header, that should start with "kvm" */
> +       if (strncmp(header.id, "kvm", 3) ||
> +                       strlen(header.id) >= KVM_STATS_ID_MAXLEN) {
> +               printf("Invalid KVM VM stats type!\n");
> +               goto out_close_fd;

Is there a reason why you are not using TEST_ASSERT for these checks?
The memory will get cleaned up when the test process exits, so there's
no need to do the careful error handling and goto statements.

(This applies throughout this whole test.)

> +       }
> +       /* Sanity check for other fields in header */
> +       if (header.count == 0) {
> +               err = 0;
> +               goto out_close_fd;
> +       }
> +       /* Check overlap */
> +       if (header.desc_offset == 0 || header.data_offset == 0 ||
> +                       header.desc_offset < sizeof(header) ||
> +                       header.data_offset < sizeof(header)) {
> +               printf("Invalid offset fields in header!\n");
> +               goto out_close_fd;
> +       }
> +       if (header.desc_offset < header.data_offset &&
> +                       (header.desc_offset + size_desc * header.count >
> +                       header.data_offset)) {
> +               printf("VM Descriptor block is overlapped with data block!\n");
> +               goto out_close_fd;
> +       }
> +
> +       /* Allocate memory for stats descriptors */
> +       stats_desc = calloc(header.count, size_desc);
> +       if (!stats_desc) {
> +               perror("Allocate memory for VM stats descriptors");
> +               goto out_close_fd;
> +       }
> +       /* Read kvm vm stats descriptors */
> +       ret = pread(stats_fd, stats_desc,
> +                       size_desc * header.count, header.desc_offset);
> +       if (ret != size_desc * header.count) {
> +               perror("Read KVM VM stats descriptors");
> +               goto out_free_desc;
> +       }
> +       /* Sanity check for fields in descriptors */
> +       for (i = 0; i < header.count; ++i) {
> +               pdesc = (void *)stats_desc + i * size_desc;
> +               /* Check type,unit,scale boundaries */
> +               if ((pdesc->flags & KVM_STATS_TYPE_MASK) > KVM_STATS_TYPE_MAX) {
> +                       printf("Unknown KVM stats type!\n");
> +                       goto out_free_desc;
> +               }
> +               if ((pdesc->flags & KVM_STATS_UNIT_MASK) > KVM_STATS_UNIT_MAX) {
> +                       printf("Unknown KVM stats unit!\n");
> +                       goto out_free_desc;
> +               }
> +               if ((pdesc->flags & KVM_STATS_SCALE_MASK) >
> +                               KVM_STATS_SCALE_MAX) {
> +                       printf("Unknown KVM stats scale!\n");
> +                       goto out_free_desc;
> +               }
> +               /* Check exponent for stats unit
> +                * Exponent for counter should be greater than or equal to 0
> +                * Exponent for unit bytes should be greater than or equal to 0
> +                * Exponent for unit seconds should be less than or equal to 0
> +                * Exponent for unit clock cycles should be greater than or
> +                * equal to 0
> +                */
> +               switch (pdesc->flags & KVM_STATS_UNIT_MASK) {
> +               case KVM_STATS_UNIT_NONE:
> +               case KVM_STATS_UNIT_BYTES:
> +               case KVM_STATS_UNIT_CYCLES:
> +                       if (pdesc->exponent < 0) {
> +                               printf("Unsupported KVM stats unit!\n");
> +                               goto out_free_desc;
> +                       }
> +                       break;
> +               case KVM_STATS_UNIT_SECONDS:
> +                       if (pdesc->exponent > 0) {
> +                               printf("Unsupported KVM stats unit!\n");
> +                               goto out_free_desc;
> +                       }
> +                       break;
> +               }
> +               /* Check name string */
> +               if (strlen(pdesc->name) >= header.name_size) {
> +                       printf("KVM stats name(%s) too long!\n", pdesc->name);
> +                       goto out_free_desc;
> +               }
> +               /* Check size field, which should not be zero */
> +               if (pdesc->size == 0) {
> +                       printf("KVM descriptor(%s) with size of 0!\n",
> +                                       pdesc->name);
> +                       goto out_free_desc;
> +               }
> +               size_data += pdesc->size * sizeof(stats_data->value[0]);
> +       }
> +       /* Check overlap */
> +       if (header.data_offset < header.desc_offset &&
> +               header.data_offset + size_data > header.desc_offset) {
> +               printf("Data block is overlapped with Descriptor block!\n");
> +               goto out_free_desc;
> +       }
> +       /* Check validity of all stats data size */
> +       if (size_data < header.count * sizeof(stats_data->value[0])) {
> +               printf("Data size is not correct!\n");
> +               goto out_free_desc;
> +       }
> +
> +       /* Allocate memory for stats data */
> +       stats_data = malloc(size_data);
> +       if (!stats_data) {
> +               perror("Allocate memory for VM stats data");
> +               goto out_free_desc;
> +       }
> +       /* Read kvm vm stats data */
> +       ret = pread(stats_fd, stats_data, size_data, header.data_offset);
> +       if (ret != size_data) {
> +               perror("Read KVM VM stats data");
> +               goto out_free_data;
> +       }
> +
> +       err = 0;
> +out_free_data:
> +       free(stats_data);
> +out_free_desc:
> +       free(stats_desc);
> +out_close_fd:
> +       close(stats_fd);
> +       return err;
> +}
> +
> +int vcpu_stats_test(struct kvm_vm *vm, int vcpu_id)
> +{
> +       ssize_t ret;
> +       int i, stats_fd, err = -1;
> +       size_t size_desc, size_data = 0;
> +       struct kvm_stats_header header;
> +       struct kvm_stats_desc *stats_desc, *pdesc;
> +       struct kvm_vcpu_stats_data *stats_data;
> +
> +       /* Get fd for VCPU stats */
> +       stats_fd = vcpu_get_statsfd(vm, vcpu_id);
> +       if (stats_fd < 0) {
> +               perror("Get VCPU stats fd");
> +               return err;
> +       }
> +       /* Read kvm vcpu stats header */
> +       ret = read(stats_fd, &header, sizeof(header));
> +       if (ret != sizeof(header)) {
> +               perror("Read VCPU stats header");
> +               goto out_close_fd;
> +       }
> +       size_desc = sizeof(*stats_desc) + header.name_size;
> +       /* Check id string in header, that should start with "kvm" */
> +       if (strncmp(header.id, "kvm", 3) ||
> +                       strlen(header.id) >= KVM_STATS_ID_MAXLEN) {
> +               printf("Invalid KVM VCPU stats type!\n");
> +               goto out_close_fd;
> +       }
> +       /* Sanity check for other fields in header */
> +       if (header.count == 0) {
> +               err = 0;
> +               goto out_close_fd;
> +       }
> +       /* Check overlap */
> +       if (header.desc_offset == 0 || header.data_offset == 0 ||
> +                       header.desc_offset < sizeof(header) ||
> +                       header.data_offset < sizeof(header)) {
> +               printf("Invalid offset fields in header!\n");
> +               goto out_close_fd;
> +       }
> +       if (header.desc_offset < header.data_offset &&
> +                       (header.desc_offset + size_desc * header.count >
> +                       header.data_offset)) {
> +               printf("VCPU Descriptor block is overlapped with data block!\n");
> +               goto out_close_fd;
> +       }
> +
> +       /* Allocate memory for stats descriptors */
> +       stats_desc = calloc(header.count, size_desc);
> +       if (!stats_desc) {
> +               perror("Allocate memory for VCPU stats descriptors");
> +               goto out_close_fd;
> +       }
> +       /* Read kvm vcpu stats descriptors */
> +       ret = pread(stats_fd, stats_desc,
> +                       size_desc * header.count, header.desc_offset);
> +       if (ret != size_desc * header.count) {
> +               perror("Read KVM VCPU stats descriptors");
> +               goto out_free_desc;
> +       }
> +       /* Sanity check for fields in descriptors */
> +       for (i = 0; i < header.count; ++i) {
> +               pdesc = (void *)stats_desc + i * size_desc;
> +               /* Check boundaries */
> +               if ((pdesc->flags & KVM_STATS_TYPE_MASK) > KVM_STATS_TYPE_MAX) {
> +                       printf("Unknown KVM stats type!\n");
> +                       goto out_free_desc;
> +               }
> +               if ((pdesc->flags & KVM_STATS_UNIT_MASK) > KVM_STATS_UNIT_MAX) {
> +                       printf("Unknown KVM stats unit!\n");
> +                       goto out_free_desc;
> +               }
> +               if ((pdesc->flags & KVM_STATS_SCALE_MASK) >
> +                               KVM_STATS_SCALE_MAX) {
> +                       printf("Unknown KVM stats scale!\n");
> +                       goto out_free_desc;
> +               }
> +               /* Check exponent for stats unit
> +                * Exponent for counter should be greater than or equal to 0
> +                * Exponent for unit bytes should be greater than or equal to 0
> +                * Exponent for unit seconds should be less than or equal to 0
> +                * Exponent for unit clock cycles should be greater than or
> +                * equal to 0
> +                */
> +               switch (pdesc->flags & KVM_STATS_UNIT_MASK) {
> +               case KVM_STATS_UNIT_NONE:
> +               case KVM_STATS_UNIT_BYTES:
> +               case KVM_STATS_UNIT_CYCLES:
> +                       if (pdesc->exponent < 0) {
> +                               printf("Unsupported KVM stats unit!\n");
> +                               goto out_free_desc;
> +                       }
> +                       break;
> +               case KVM_STATS_UNIT_SECONDS:
> +                       if (pdesc->exponent > 0) {
> +                               printf("Unsupported KVM stats unit!\n");
> +                               goto out_free_desc;
> +                       }
> +                       break;
> +               }
> +               /* Check name string */
> +               if (strlen(pdesc->name) >= header.name_size) {
> +                       printf("KVM stats name(%s) too long!\n", pdesc->name);
> +                       goto out_free_desc;
> +               }
> +               /* Check size field, which should not be zero */
> +               if (pdesc->size == 0) {
> +                       printf("KVM descriptor(%s) with size of 0!\n",
> +                                       pdesc->name);
> +                       goto out_free_desc;
> +               }
> +               size_data += pdesc->size * sizeof(stats_data->value[0]);
> +       }
> +       /* Check overlap */
> +       if (header.data_offset < header.desc_offset &&
> +               header.data_offset + size_data > header.desc_offset) {
> +               printf("Data block is overlapped with Descriptor block!\n");
> +               goto out_free_desc;
> +       }
> +       /* Check validity of all stats data size */
> +       if (size_data < header.count * sizeof(stats_data->value[0])) {
> +               printf("Data size is not correct!\n");
> +               goto out_free_desc;
> +       }
> +
> +       /* Allocate memory for stats data */
> +       stats_data = malloc(size_data);
> +       if (!stats_data) {
> +               perror("Allocate memory for VCPU stats data");
> +               goto out_free_desc;
> +       }
> +       /* Read kvm vcpu stats data */
> +       ret = pread(stats_fd, stats_data, size_data, header.data_offset);
> +       if (ret != size_data) {
> +               perror("Read KVM VCPU stats data");
> +               goto out_free_data;
> +       }
> +
> +       err = 0;
> +out_free_data:
> +       free(stats_data);
> +out_free_desc:
> +       free(stats_desc);
> +out_close_fd:
> +       close(stats_fd);
> +       return err;
> +}
> +
> +/*
> + * Usage: kvm_bin_form_stats [#vm] [#vcpu]
> + * The first parameter #vm set the number of VMs being created.
> + * The second parameter #vcpu set the number of VCPUs being created.
> + * By default, 1 VM and 1 VCPU for the VM would be created for testing.
> + */

Consider setting the default to something higher so people running
this test with default arguments get more test coverage?

> +
> +int main(int argc, char *argv[])
> +{
> +       int max_vm = 1, max_vcpu = 1, ret, i, j, err = -1;
> +       struct kvm_vm **vms;
> +
> +       /* Get the number of VMs and VCPUs that would be created for testing. */
> +       if (argc > 1) {
> +               max_vm = strtol(argv[1], NULL, 0);
> +               if (max_vm <= 0)
> +                       max_vm = 1;
> +       }
> +       if (argc > 2) {
> +               max_vcpu = strtol(argv[2], NULL, 0);
> +               if (max_vcpu <= 0)
> +                       max_vcpu = 1;
> +       }
> +
> +       /* Check the extension for binary stats */
> +       ret = kvm_check_cap(KVM_CAP_STATS_BINARY_FD);
> +       if (ret < 0) {
> +               printf("Binary form statistics interface is not supported!\n");
> +               return err;
> +       }
> +
> +       /* Create VMs and VCPUs */
> +       vms = malloc(sizeof(vms[0]) * max_vm);
> +       if (!vms) {
> +               perror("Allocate memory for storing VM pointers");
> +               return err;
> +       }
> +       for (i = 0; i < max_vm; ++i) {
> +               vms[i] = vm_create(VM_MODE_DEFAULT,
> +                               DEFAULT_GUEST_PHY_PAGES, O_RDWR);
> +               for (j = 0; j < max_vcpu; ++j)
> +                       vm_vcpu_add(vms[i], j);
> +       }
> +
> +       /* Check stats read for every VM and VCPU */
> +       for (i = 0; i < max_vm; ++i) {
> +               if (vm_stats_test(vms[i]))
> +                       goto out_free_vm;
> +               for (j = 0; j < max_vcpu; ++j) {
> +                       if (vcpu_stats_test(vms[i], j))
> +                               goto out_free_vm;
> +               }
> +       }
> +
> +       err = 0;
> +out_free_vm:
> +       for (i = 0; i < max_vm; ++i)
> +               kvm_vm_free(vms[i]);
> +       free(vms);
> +       return err;
> +}
> diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
> index fc83f6c5902d..d9e0b2c8b906 100644
> --- a/tools/testing/selftests/kvm/lib/kvm_util.c
> +++ b/tools/testing/selftests/kvm/lib/kvm_util.c
> @@ -2090,3 +2090,15 @@ unsigned int vm_calc_num_guest_pages(enum vm_guest_mode mode, size_t size)
>         n = DIV_ROUND_UP(size, vm_guest_mode_params[mode].page_size);
>         return vm_adjust_num_guest_pages(mode, n);
>  }
> +
> +int vm_get_statsfd(struct kvm_vm *vm)
> +{
> +       return ioctl(vm->fd, KVM_STATS_GETFD, NULL);
> +}
> +
> +int vcpu_get_statsfd(struct kvm_vm *vm, uint32_t vcpuid)
> +{
> +       struct vcpu *vcpu = vcpu_find(vm, vcpuid);
> +
> +       return ioctl(vcpu->fd, KVM_STATS_GETFD, NULL);
> +}
> --
> 2.31.1.751.gd2f1c929bd-goog
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 4/4] KVM: selftests: Add selftest for KVM statistics data binary interface
  2021-05-19 17:21   ` David Matlack
@ 2021-05-19 17:58     ` Jing Zhang
  0 siblings, 0 replies; 30+ messages in thread
From: Jing Zhang @ 2021-05-19 17:58 UTC (permalink / raw)
  To: David Matlack
  Cc: KVM, KVMARM, LinuxMIPS, KVMPPC, LinuxS390, Linuxkselftest,
	Paolo Bonzini, Marc Zyngier, James Morse, Julien Thierry,
	Suzuki K Poulose, Will Deacon, Huacai Chen, Aleksandar Markovic,
	Thomas Bogendoerfer, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck, Claudio Imbrenda,
	Sean Christopherson, Vitaly Kuznetsov, Jim Mattson, Peter Shier,
	Oliver Upton, David Rientjes, Emanuele Giuseppe Esposito

Hi David,

On Wed, May 19, 2021 at 12:22 PM David Matlack <dmatlack@google.com> wrote:
>
> On Mon, May 17, 2021 at 9:24 AM Jing Zhang <jingzhangos@google.com> wrote:
> >
> > Add selftest to check KVM stats descriptors validity.
> >
> > Signed-off-by: Jing Zhang <jingzhangos@google.com>
> > ---
> >  tools/testing/selftests/kvm/.gitignore        |   1 +
> >  tools/testing/selftests/kvm/Makefile          |   3 +
> >  .../testing/selftests/kvm/include/kvm_util.h  |   3 +
> >  .../selftests/kvm/kvm_bin_form_stats.c        | 379 ++++++++++++++++++
> >  tools/testing/selftests/kvm/lib/kvm_util.c    |  12 +
> >  5 files changed, 398 insertions(+)
> >  create mode 100644 tools/testing/selftests/kvm/kvm_bin_form_stats.c
> >
> > diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
> > index bd83158e0e0b..35796667c944 100644
> > --- a/tools/testing/selftests/kvm/.gitignore
> > +++ b/tools/testing/selftests/kvm/.gitignore
> > @@ -43,3 +43,4 @@
> >  /memslot_modification_stress_test
> >  /set_memory_region_test
> >  /steal_time
> > +/kvm_bin_form_stats
> > diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
> > index e439d027939d..2984c86c848a 100644
> > --- a/tools/testing/selftests/kvm/Makefile
> > +++ b/tools/testing/selftests/kvm/Makefile
> > @@ -76,6 +76,7 @@ TEST_GEN_PROGS_x86_64 += kvm_page_table_test
> >  TEST_GEN_PROGS_x86_64 += memslot_modification_stress_test
> >  TEST_GEN_PROGS_x86_64 += set_memory_region_test
> >  TEST_GEN_PROGS_x86_64 += steal_time
> > +TEST_GEN_PROGS_x86_64 += kvm_bin_form_stats
> >
> >  TEST_GEN_PROGS_aarch64 += aarch64/get-reg-list
> >  TEST_GEN_PROGS_aarch64 += aarch64/get-reg-list-sve
> > @@ -87,6 +88,7 @@ TEST_GEN_PROGS_aarch64 += kvm_create_max_vcpus
> >  TEST_GEN_PROGS_aarch64 += kvm_page_table_test
> >  TEST_GEN_PROGS_aarch64 += set_memory_region_test
> >  TEST_GEN_PROGS_aarch64 += steal_time
> > +TEST_GEN_PROGS_aarch64 += kvm_bin_form_stats
> >
> >  TEST_GEN_PROGS_s390x = s390x/memop
> >  TEST_GEN_PROGS_s390x += s390x/resets
> > @@ -96,6 +98,7 @@ TEST_GEN_PROGS_s390x += dirty_log_test
> >  TEST_GEN_PROGS_s390x += kvm_create_max_vcpus
> >  TEST_GEN_PROGS_s390x += kvm_page_table_test
> >  TEST_GEN_PROGS_s390x += set_memory_region_test
> > +TEST_GEN_PROGS_s390x += kvm_bin_form_stats
> >
> >  TEST_GEN_PROGS += $(TEST_GEN_PROGS_$(UNAME_M))
> >  LIBKVM += $(LIBKVM_$(UNAME_M))
> > diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
> > index a8f022794ce3..ee01a67022d9 100644
> > --- a/tools/testing/selftests/kvm/include/kvm_util.h
> > +++ b/tools/testing/selftests/kvm/include/kvm_util.h
> > @@ -387,4 +387,7 @@ uint64_t get_ucall(struct kvm_vm *vm, uint32_t vcpu_id, struct ucall *uc);
> >  #define GUEST_ASSERT_4(_condition, arg1, arg2, arg3, arg4) \
> >         __GUEST_ASSERT((_condition), 4, (arg1), (arg2), (arg3), (arg4))
> >
> > +int vm_get_statsfd(struct kvm_vm *vm);
> > +int vcpu_get_statsfd(struct kvm_vm *vm, uint32_t vcpuid);
> > +
> >  #endif /* SELFTEST_KVM_UTIL_H */
> > diff --git a/tools/testing/selftests/kvm/kvm_bin_form_stats.c b/tools/testing/selftests/kvm/kvm_bin_form_stats.c
> > new file mode 100644
> > index 000000000000..dae44397d0f4
> > --- /dev/null
> > +++ b/tools/testing/selftests/kvm/kvm_bin_form_stats.c
> > @@ -0,0 +1,379 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * kvm_bin_form_stats
> > + *
> > + * Copyright (C) 2021, Google LLC.
> > + *
> > + * Test the fd-based interface for KVM statistics.
> > + */
> > +
> > +#define _GNU_SOURCE /* for program_invocation_short_name */
> > +#include <fcntl.h>
> > +#include <stdio.h>
> > +#include <stdlib.h>
> > +#include <string.h>
> > +#include <errno.h>
> > +
> > +#include "test_util.h"
> > +
> > +#include "kvm_util.h"
> > +#include "asm/kvm.h"
> > +#include "linux/kvm.h"
> > +
> > +int vm_stats_test(struct kvm_vm *vm)
> > +{
> > +       ssize_t ret;
> > +       int i, stats_fd, err = -1;
> > +       size_t size_desc, size_data = 0;
> > +       struct kvm_stats_header header;
> > +       struct kvm_stats_desc *stats_desc, *pdesc;
> > +       struct kvm_vm_stats_data *stats_data;
> > +
> > +       /* Get fd for VM stats */
> > +       stats_fd = vm_get_statsfd(vm);
> > +       if (stats_fd < 0) {
> > +               perror("Get VM stats fd");
> > +               return err;
> > +       }
> > +       /* Read kvm vm stats header */
> > +       ret = read(stats_fd, &header, sizeof(header));
> > +       if (ret != sizeof(header)) {
> > +               perror("Read VM stats header");
> > +               goto out_close_fd;
> > +       }
> > +       size_desc = sizeof(*stats_desc) + header.name_size;
> > +       /* Check id string in header, that should start with "kvm" */
> > +       if (strncmp(header.id, "kvm", 3) ||
> > +                       strlen(header.id) >= KVM_STATS_ID_MAXLEN) {
> > +               printf("Invalid KVM VM stats type!\n");
> > +               goto out_close_fd;
>
> Is there a reason why you are not using TEST_ASSERT for these checks?
> The memory will get cleaned up when the test process exits, so there's
> no need to do the careful error handling and goto statements.
>
> (This applies throughout this whole test.)
>
No reason to not use TEST_ASSERT. Will do.
> > +       }
> > +       /* Sanity check for other fields in header */
> > +       if (header.count == 0) {
> > +               err = 0;
> > +               goto out_close_fd;
> > +       }
> > +       /* Check overlap */
> > +       if (header.desc_offset == 0 || header.data_offset == 0 ||
> > +                       header.desc_offset < sizeof(header) ||
> > +                       header.data_offset < sizeof(header)) {
> > +               printf("Invalid offset fields in header!\n");
> > +               goto out_close_fd;
> > +       }
> > +       if (header.desc_offset < header.data_offset &&
> > +                       (header.desc_offset + size_desc * header.count >
> > +                       header.data_offset)) {
> > +               printf("VM Descriptor block is overlapped with data block!\n");
> > +               goto out_close_fd;
> > +       }
> > +
> > +       /* Allocate memory for stats descriptors */
> > +       stats_desc = calloc(header.count, size_desc);
> > +       if (!stats_desc) {
> > +               perror("Allocate memory for VM stats descriptors");
> > +               goto out_close_fd;
> > +       }
> > +       /* Read kvm vm stats descriptors */
> > +       ret = pread(stats_fd, stats_desc,
> > +                       size_desc * header.count, header.desc_offset);
> > +       if (ret != size_desc * header.count) {
> > +               perror("Read KVM VM stats descriptors");
> > +               goto out_free_desc;
> > +       }
> > +       /* Sanity check for fields in descriptors */
> > +       for (i = 0; i < header.count; ++i) {
> > +               pdesc = (void *)stats_desc + i * size_desc;
> > +               /* Check type,unit,scale boundaries */
> > +               if ((pdesc->flags & KVM_STATS_TYPE_MASK) > KVM_STATS_TYPE_MAX) {
> > +                       printf("Unknown KVM stats type!\n");
> > +                       goto out_free_desc;
> > +               }
> > +               if ((pdesc->flags & KVM_STATS_UNIT_MASK) > KVM_STATS_UNIT_MAX) {
> > +                       printf("Unknown KVM stats unit!\n");
> > +                       goto out_free_desc;
> > +               }
> > +               if ((pdesc->flags & KVM_STATS_SCALE_MASK) >
> > +                               KVM_STATS_SCALE_MAX) {
> > +                       printf("Unknown KVM stats scale!\n");
> > +                       goto out_free_desc;
> > +               }
> > +               /* Check exponent for stats unit
> > +                * Exponent for counter should be greater than or equal to 0
> > +                * Exponent for unit bytes should be greater than or equal to 0
> > +                * Exponent for unit seconds should be less than or equal to 0
> > +                * Exponent for unit clock cycles should be greater than or
> > +                * equal to 0
> > +                */
> > +               switch (pdesc->flags & KVM_STATS_UNIT_MASK) {
> > +               case KVM_STATS_UNIT_NONE:
> > +               case KVM_STATS_UNIT_BYTES:
> > +               case KVM_STATS_UNIT_CYCLES:
> > +                       if (pdesc->exponent < 0) {
> > +                               printf("Unsupported KVM stats unit!\n");
> > +                               goto out_free_desc;
> > +                       }
> > +                       break;
> > +               case KVM_STATS_UNIT_SECONDS:
> > +                       if (pdesc->exponent > 0) {
> > +                               printf("Unsupported KVM stats unit!\n");
> > +                               goto out_free_desc;
> > +                       }
> > +                       break;
> > +               }
> > +               /* Check name string */
> > +               if (strlen(pdesc->name) >= header.name_size) {
> > +                       printf("KVM stats name(%s) too long!\n", pdesc->name);
> > +                       goto out_free_desc;
> > +               }
> > +               /* Check size field, which should not be zero */
> > +               if (pdesc->size == 0) {
> > +                       printf("KVM descriptor(%s) with size of 0!\n",
> > +                                       pdesc->name);
> > +                       goto out_free_desc;
> > +               }
> > +               size_data += pdesc->size * sizeof(stats_data->value[0]);
> > +       }
> > +       /* Check overlap */
> > +       if (header.data_offset < header.desc_offset &&
> > +               header.data_offset + size_data > header.desc_offset) {
> > +               printf("Data block is overlapped with Descriptor block!\n");
> > +               goto out_free_desc;
> > +       }
> > +       /* Check validity of all stats data size */
> > +       if (size_data < header.count * sizeof(stats_data->value[0])) {
> > +               printf("Data size is not correct!\n");
> > +               goto out_free_desc;
> > +       }
> > +
> > +       /* Allocate memory for stats data */
> > +       stats_data = malloc(size_data);
> > +       if (!stats_data) {
> > +               perror("Allocate memory for VM stats data");
> > +               goto out_free_desc;
> > +       }
> > +       /* Read kvm vm stats data */
> > +       ret = pread(stats_fd, stats_data, size_data, header.data_offset);
> > +       if (ret != size_data) {
> > +               perror("Read KVM VM stats data");
> > +               goto out_free_data;
> > +       }
> > +
> > +       err = 0;
> > +out_free_data:
> > +       free(stats_data);
> > +out_free_desc:
> > +       free(stats_desc);
> > +out_close_fd:
> > +       close(stats_fd);
> > +       return err;
> > +}
> > +
> > +int vcpu_stats_test(struct kvm_vm *vm, int vcpu_id)
> > +{
> > +       ssize_t ret;
> > +       int i, stats_fd, err = -1;
> > +       size_t size_desc, size_data = 0;
> > +       struct kvm_stats_header header;
> > +       struct kvm_stats_desc *stats_desc, *pdesc;
> > +       struct kvm_vcpu_stats_data *stats_data;
> > +
> > +       /* Get fd for VCPU stats */
> > +       stats_fd = vcpu_get_statsfd(vm, vcpu_id);
> > +       if (stats_fd < 0) {
> > +               perror("Get VCPU stats fd");
> > +               return err;
> > +       }
> > +       /* Read kvm vcpu stats header */
> > +       ret = read(stats_fd, &header, sizeof(header));
> > +       if (ret != sizeof(header)) {
> > +               perror("Read VCPU stats header");
> > +               goto out_close_fd;
> > +       }
> > +       size_desc = sizeof(*stats_desc) + header.name_size;
> > +       /* Check id string in header, that should start with "kvm" */
> > +       if (strncmp(header.id, "kvm", 3) ||
> > +                       strlen(header.id) >= KVM_STATS_ID_MAXLEN) {
> > +               printf("Invalid KVM VCPU stats type!\n");
> > +               goto out_close_fd;
> > +       }
> > +       /* Sanity check for other fields in header */
> > +       if (header.count == 0) {
> > +               err = 0;
> > +               goto out_close_fd;
> > +       }
> > +       /* Check overlap */
> > +       if (header.desc_offset == 0 || header.data_offset == 0 ||
> > +                       header.desc_offset < sizeof(header) ||
> > +                       header.data_offset < sizeof(header)) {
> > +               printf("Invalid offset fields in header!\n");
> > +               goto out_close_fd;
> > +       }
> > +       if (header.desc_offset < header.data_offset &&
> > +                       (header.desc_offset + size_desc * header.count >
> > +                       header.data_offset)) {
> > +               printf("VCPU Descriptor block is overlapped with data block!\n");
> > +               goto out_close_fd;
> > +       }
> > +
> > +       /* Allocate memory for stats descriptors */
> > +       stats_desc = calloc(header.count, size_desc);
> > +       if (!stats_desc) {
> > +               perror("Allocate memory for VCPU stats descriptors");
> > +               goto out_close_fd;
> > +       }
> > +       /* Read kvm vcpu stats descriptors */
> > +       ret = pread(stats_fd, stats_desc,
> > +                       size_desc * header.count, header.desc_offset);
> > +       if (ret != size_desc * header.count) {
> > +               perror("Read KVM VCPU stats descriptors");
> > +               goto out_free_desc;
> > +       }
> > +       /* Sanity check for fields in descriptors */
> > +       for (i = 0; i < header.count; ++i) {
> > +               pdesc = (void *)stats_desc + i * size_desc;
> > +               /* Check boundaries */
> > +               if ((pdesc->flags & KVM_STATS_TYPE_MASK) > KVM_STATS_TYPE_MAX) {
> > +                       printf("Unknown KVM stats type!\n");
> > +                       goto out_free_desc;
> > +               }
> > +               if ((pdesc->flags & KVM_STATS_UNIT_MASK) > KVM_STATS_UNIT_MAX) {
> > +                       printf("Unknown KVM stats unit!\n");
> > +                       goto out_free_desc;
> > +               }
> > +               if ((pdesc->flags & KVM_STATS_SCALE_MASK) >
> > +                               KVM_STATS_SCALE_MAX) {
> > +                       printf("Unknown KVM stats scale!\n");
> > +                       goto out_free_desc;
> > +               }
> > +               /* Check exponent for stats unit
> > +                * Exponent for counter should be greater than or equal to 0
> > +                * Exponent for unit bytes should be greater than or equal to 0
> > +                * Exponent for unit seconds should be less than or equal to 0
> > +                * Exponent for unit clock cycles should be greater than or
> > +                * equal to 0
> > +                */
> > +               switch (pdesc->flags & KVM_STATS_UNIT_MASK) {
> > +               case KVM_STATS_UNIT_NONE:
> > +               case KVM_STATS_UNIT_BYTES:
> > +               case KVM_STATS_UNIT_CYCLES:
> > +                       if (pdesc->exponent < 0) {
> > +                               printf("Unsupported KVM stats unit!\n");
> > +                               goto out_free_desc;
> > +                       }
> > +                       break;
> > +               case KVM_STATS_UNIT_SECONDS:
> > +                       if (pdesc->exponent > 0) {
> > +                               printf("Unsupported KVM stats unit!\n");
> > +                               goto out_free_desc;
> > +                       }
> > +                       break;
> > +               }
> > +               /* Check name string */
> > +               if (strlen(pdesc->name) >= header.name_size) {
> > +                       printf("KVM stats name(%s) too long!\n", pdesc->name);
> > +                       goto out_free_desc;
> > +               }
> > +               /* Check size field, which should not be zero */
> > +               if (pdesc->size == 0) {
> > +                       printf("KVM descriptor(%s) with size of 0!\n",
> > +                                       pdesc->name);
> > +                       goto out_free_desc;
> > +               }
> > +               size_data += pdesc->size * sizeof(stats_data->value[0]);
> > +       }
> > +       /* Check overlap */
> > +       if (header.data_offset < header.desc_offset &&
> > +               header.data_offset + size_data > header.desc_offset) {
> > +               printf("Data block is overlapped with Descriptor block!\n");
> > +               goto out_free_desc;
> > +       }
> > +       /* Check validity of all stats data size */
> > +       if (size_data < header.count * sizeof(stats_data->value[0])) {
> > +               printf("Data size is not correct!\n");
> > +               goto out_free_desc;
> > +       }
> > +
> > +       /* Allocate memory for stats data */
> > +       stats_data = malloc(size_data);
> > +       if (!stats_data) {
> > +               perror("Allocate memory for VCPU stats data");
> > +               goto out_free_desc;
> > +       }
> > +       /* Read kvm vcpu stats data */
> > +       ret = pread(stats_fd, stats_data, size_data, header.data_offset);
> > +       if (ret != size_data) {
> > +               perror("Read KVM VCPU stats data");
> > +               goto out_free_data;
> > +       }
> > +
> > +       err = 0;
> > +out_free_data:
> > +       free(stats_data);
> > +out_free_desc:
> > +       free(stats_desc);
> > +out_close_fd:
> > +       close(stats_fd);
> > +       return err;
> > +}
> > +
> > +/*
> > + * Usage: kvm_bin_form_stats [#vm] [#vcpu]
> > + * The first parameter #vm set the number of VMs being created.
> > + * The second parameter #vcpu set the number of VCPUs being created.
> > + * By default, 1 VM and 1 VCPU for the VM would be created for testing.
> > + */
>
> Consider setting the default to something higher so people running
> this test with default arguments get more test coverage?
>
Good point. Will use 4 VM and 4 VCPU as default.
> > +
> > +int main(int argc, char *argv[])
> > +{
> > +       int max_vm = 1, max_vcpu = 1, ret, i, j, err = -1;
> > +       struct kvm_vm **vms;
> > +
> > +       /* Get the number of VMs and VCPUs that would be created for testing. */
> > +       if (argc > 1) {
> > +               max_vm = strtol(argv[1], NULL, 0);
> > +               if (max_vm <= 0)
> > +                       max_vm = 1;
> > +       }
> > +       if (argc > 2) {
> > +               max_vcpu = strtol(argv[2], NULL, 0);
> > +               if (max_vcpu <= 0)
> > +                       max_vcpu = 1;
> > +       }
> > +
> > +       /* Check the extension for binary stats */
> > +       ret = kvm_check_cap(KVM_CAP_STATS_BINARY_FD);
> > +       if (ret < 0) {
> > +               printf("Binary form statistics interface is not supported!\n");
> > +               return err;
> > +       }
> > +
> > +       /* Create VMs and VCPUs */
> > +       vms = malloc(sizeof(vms[0]) * max_vm);
> > +       if (!vms) {
> > +               perror("Allocate memory for storing VM pointers");
> > +               return err;
> > +       }
> > +       for (i = 0; i < max_vm; ++i) {
> > +               vms[i] = vm_create(VM_MODE_DEFAULT,
> > +                               DEFAULT_GUEST_PHY_PAGES, O_RDWR);
> > +               for (j = 0; j < max_vcpu; ++j)
> > +                       vm_vcpu_add(vms[i], j);
> > +       }
> > +
> > +       /* Check stats read for every VM and VCPU */
> > +       for (i = 0; i < max_vm; ++i) {
> > +               if (vm_stats_test(vms[i]))
> > +                       goto out_free_vm;
> > +               for (j = 0; j < max_vcpu; ++j) {
> > +                       if (vcpu_stats_test(vms[i], j))
> > +                               goto out_free_vm;
> > +               }
> > +       }
> > +
> > +       err = 0;
> > +out_free_vm:
> > +       for (i = 0; i < max_vm; ++i)
> > +               kvm_vm_free(vms[i]);
> > +       free(vms);
> > +       return err;
> > +}
> > diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
> > index fc83f6c5902d..d9e0b2c8b906 100644
> > --- a/tools/testing/selftests/kvm/lib/kvm_util.c
> > +++ b/tools/testing/selftests/kvm/lib/kvm_util.c
> > @@ -2090,3 +2090,15 @@ unsigned int vm_calc_num_guest_pages(enum vm_guest_mode mode, size_t size)
> >         n = DIV_ROUND_UP(size, vm_guest_mode_params[mode].page_size);
> >         return vm_adjust_num_guest_pages(mode, n);
> >  }
> > +
> > +int vm_get_statsfd(struct kvm_vm *vm)
> > +{
> > +       return ioctl(vm->fd, KVM_STATS_GETFD, NULL);
> > +}
> > +
> > +int vcpu_get_statsfd(struct kvm_vm *vm, uint32_t vcpuid)
> > +{
> > +       struct vcpu *vcpu = vcpu_find(vm, vcpuid);
> > +
> > +       return ioctl(vcpu->fd, KVM_STATS_GETFD, NULL);
> > +}
> > --
> > 2.31.1.751.gd2f1c929bd-goog
> >

Thanks,
Jing

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 2/4] KVM: stats: Add fd-based API to read binary stats data
  2021-05-19 17:12   ` David Matlack
@ 2021-05-19 19:02     ` Jing Zhang
  0 siblings, 0 replies; 30+ messages in thread
From: Jing Zhang @ 2021-05-19 19:02 UTC (permalink / raw)
  To: David Matlack
  Cc: KVM, KVMARM, LinuxMIPS, KVMPPC, LinuxS390, Linuxkselftest,
	Paolo Bonzini, Marc Zyngier, James Morse, Julien Thierry,
	Suzuki K Poulose, Will Deacon, Huacai Chen, Aleksandar Markovic,
	Thomas Bogendoerfer, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck, Claudio Imbrenda,
	Sean Christopherson, Vitaly Kuznetsov, Jim Mattson, Peter Shier,
	Oliver Upton, David Rientjes, Emanuele Giuseppe Esposito

Hi David,

On Wed, May 19, 2021 at 12:13 PM David Matlack <dmatlack@google.com> wrote:
>
> On Mon, May 17, 2021 at 9:32 AM Jing Zhang <jingzhangos@google.com> wrote:
> >
> > Provides a file descriptor per VM to read VM stats info/data.
> > Provides a file descriptor per vCPU to read vCPU stats info/data.
> >
> > Signed-off-by: Jing Zhang <jingzhangos@google.com>
> > ---
> >  arch/arm64/kvm/guest.c    |  26 +++++
> >  arch/mips/kvm/mips.c      |  52 +++++++++
> >  arch/powerpc/kvm/book3s.c |  52 +++++++++
> >  arch/powerpc/kvm/booke.c  |  45 ++++++++
> >  arch/s390/kvm/kvm-s390.c  | 117 ++++++++++++++++++++
> >  arch/x86/kvm/x86.c        |  53 +++++++++
> >  include/linux/kvm_host.h  | 127 ++++++++++++++++++++++
> >  include/uapi/linux/kvm.h  |  50 +++++++++
> >  virt/kvm/kvm_main.c       | 223 ++++++++++++++++++++++++++++++++++++++
> >  9 files changed, 745 insertions(+)
> >
> > diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
> > index 0e41331b0911..1cc1d83630ac 100644
> > --- a/arch/arm64/kvm/guest.c
> > +++ b/arch/arm64/kvm/guest.c
> > @@ -28,6 +28,32 @@
> >
> >  #include "trace.h"
> >
> > +struct _kvm_stats_desc kvm_vm_stats_desc[] = DEFINE_VM_STATS_DESC();
> > +
> > +struct _kvm_stats_header kvm_vm_stats_header = {
> > +       .name_size = KVM_STATS_NAME_LEN,
> > +       .count = ARRAY_SIZE(kvm_vm_stats_desc),
> > +       .desc_offset = sizeof(struct kvm_stats_header),
> > +       .data_offset = sizeof(struct kvm_stats_header) +
> > +               sizeof(kvm_vm_stats_desc),
> > +};
> > +
> > +struct _kvm_stats_desc kvm_vcpu_stats_desc[] = DEFINE_VCPU_STATS_DESC(
> > +       STATS_DESC_COUNTER("hvc_exit_stat"),
> > +       STATS_DESC_COUNTER("wfe_exit_stat"),
> > +       STATS_DESC_COUNTER("wfi_exit_stat"),
> > +       STATS_DESC_COUNTER("mmio_exit_user"),
> > +       STATS_DESC_COUNTER("mmio_exit_kernel"),
> > +       STATS_DESC_COUNTER("exits"));
> > +
> > +struct _kvm_stats_header kvm_vcpu_stats_header = {
> > +       .name_size = KVM_STATS_NAME_LEN,
> > +       .count = ARRAY_SIZE(kvm_vcpu_stats_desc),
> > +       .desc_offset = sizeof(struct kvm_stats_header),
> > +       .data_offset = sizeof(struct kvm_stats_header) +
> > +               sizeof(kvm_vcpu_stats_desc),
> > +};
> > +
> >  struct kvm_stats_debugfs_item debugfs_entries[] = {
> >         VCPU_STAT_COM("halt_successful_poll", halt_successful_poll),
> >         VCPU_STAT_COM("halt_attempted_poll", halt_attempted_poll),
> > diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
> > index f4fc60c05e9c..f17a65743ccd 100644
> > --- a/arch/mips/kvm/mips.c
> > +++ b/arch/mips/kvm/mips.c
> > @@ -38,6 +38,58 @@
> >  #define VECTORSPACING 0x100    /* for EI/VI mode */
> >  #endif
> >
> > +struct _kvm_stats_desc kvm_vm_stats_desc[] = DEFINE_VM_STATS_DESC();
> > +
> > +struct _kvm_stats_header kvm_vm_stats_header = {
> > +       .name_size = KVM_STATS_NAME_LEN,
> > +       .count = ARRAY_SIZE(kvm_vm_stats_desc),
> > +       .desc_offset = sizeof(struct kvm_stats_header),
> > +       .data_offset = sizeof(struct kvm_stats_header) +
> > +               sizeof(kvm_vm_stats_desc),
> > +};
> > +
> > +struct _kvm_stats_desc kvm_vcpu_stats_desc[] = DEFINE_VCPU_STATS_DESC(
> > +       STATS_DESC_COUNTER("wait_exits"),
> > +       STATS_DESC_COUNTER("cache_exits"),
> > +       STATS_DESC_COUNTER("signal_exits"),
> > +       STATS_DESC_COUNTER("int_exits"),
> > +       STATS_DESC_COUNTER("cop_unusable_exits"),
> > +       STATS_DESC_COUNTER("tlbmod_exits"),
> > +       STATS_DESC_COUNTER("tlbmiss_ld_exits"),
> > +       STATS_DESC_COUNTER("tlbmiss_st_exits"),
> > +       STATS_DESC_COUNTER("addrerr_st_exits"),
> > +       STATS_DESC_COUNTER("addrerr_ld_exits"),
> > +       STATS_DESC_COUNTER("syscall_exits"),
> > +       STATS_DESC_COUNTER("resvd_inst_exits"),
> > +       STATS_DESC_COUNTER("break_inst_exits"),
> > +       STATS_DESC_COUNTER("trap_inst_exits"),
> > +       STATS_DESC_COUNTER("msa_fpe_exits"),
> > +       STATS_DESC_COUNTER("fpe_exits"),
> > +       STATS_DESC_COUNTER("msa_disabled_exits"),
> > +       STATS_DESC_COUNTER("flush_dcache_exits"),
> > +#ifdef CONFIG_KVM_MIPS_VZ
> > +       STATS_DESC_COUNTER("vz_gpsi_exits"),
> > +       STATS_DESC_COUNTER("vz_gsfc_exits"),
> > +       STATS_DESC_COUNTER("vz_hc_exits"),
> > +       STATS_DESC_COUNTER("vz_grr_exits"),
> > +       STATS_DESC_COUNTER("vz_gva_exits"),
> > +       STATS_DESC_COUNTER("vz_ghfc_exits"),
> > +       STATS_DESC_COUNTER("vz_gpa_exits"),
> > +       STATS_DESC_COUNTER("vz_resvd_exits"),
> > +#ifdef CONFIG_CPU_LOONGSON64
> > +       STATS_DESC_COUNTER("vz_cpucfg_exits"),
> > +#endif
> > +#endif
> > +       );
> > +
> > +struct _kvm_stats_header kvm_vcpu_stats_header = {
> > +       .name_size = KVM_STATS_NAME_LEN,
> > +       .count = ARRAY_SIZE(kvm_vcpu_stats_desc),
> > +       .desc_offset = sizeof(struct kvm_stats_header),
> > +       .data_offset = sizeof(struct kvm_stats_header) +
> > +               sizeof(kvm_vcpu_stats_desc),
> > +};
> > +
> >  struct kvm_stats_debugfs_item debugfs_entries[] = {
> >         VCPU_STAT("wait", wait_exits),
> >         VCPU_STAT("cache", cache_exits),
> > diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
> > index bd3a10e1fdaf..5e8ee0d39ef9 100644
> > --- a/arch/powerpc/kvm/book3s.c
> > +++ b/arch/powerpc/kvm/book3s.c
> > @@ -38,6 +38,58 @@
> >
> >  /* #define EXIT_DEBUG */
> >
> > +struct _kvm_stats_desc kvm_vm_stats_desc[] = DEFINE_VM_STATS_DESC(
> > +       STATS_DESC_ICOUNTER("num_2M_pages"),
> > +       STATS_DESC_ICOUNTER("num_1G_pages"));
> > +
> > +struct _kvm_stats_header kvm_vm_stats_header = {
> > +       .name_size = KVM_STATS_NAME_LEN,
> > +       .count = ARRAY_SIZE(kvm_vm_stats_desc),
> > +       .desc_offset = sizeof(struct kvm_stats_header),
> > +       .data_offset = sizeof(struct kvm_stats_header) +
> > +               sizeof(kvm_vm_stats_desc),
> > +};
> > +
> > +struct _kvm_stats_desc kvm_vcpu_stats_desc[] = DEFINE_VCPU_STATS_DESC(
> > +       STATS_DESC_COUNTER("sum_exits"),
> > +       STATS_DESC_COUNTER("mmio_exits"),
> > +       STATS_DESC_COUNTER("signal_exits"),
> > +       STATS_DESC_COUNTER("light_exits"),
> > +       STATS_DESC_COUNTER("itlb_real_miss_exits"),
> > +       STATS_DESC_COUNTER("itlb_virt_miss_exits"),
> > +       STATS_DESC_COUNTER("dtlb_real_miss_exits"),
> > +       STATS_DESC_COUNTER("dtlb_virt_miss_exits"),
> > +       STATS_DESC_COUNTER("syscall_exits"),
> > +       STATS_DESC_COUNTER("isi_exits"),
> > +       STATS_DESC_COUNTER("dsi_exits"),
> > +       STATS_DESC_COUNTER("emulated_inst_exits"),
> > +       STATS_DESC_COUNTER("dec_exits"),
> > +       STATS_DESC_COUNTER("ext_intr_exits"),
> > +       STATS_DESC_TIME_NSEC("halt_wait_ns"),
> > +       STATS_DESC_COUNTER("halt_successful_wait"),
> > +       STATS_DESC_COUNTER("dbell_exits"),
> > +       STATS_DESC_COUNTER("gdbell_exits"),
> > +       STATS_DESC_COUNTER("ld"),
> > +       STATS_DESC_COUNTER("st"),
> > +       STATS_DESC_COUNTER("pf_storage"),
> > +       STATS_DESC_COUNTER("pf_instruc"),
> > +       STATS_DESC_COUNTER("sp_storage"),
> > +       STATS_DESC_COUNTER("sp_instruc"),
> > +       STATS_DESC_COUNTER("queue_intr"),
> > +       STATS_DESC_COUNTER("ld_slow"),
> > +       STATS_DESC_COUNTER("st_slow"),
> > +       STATS_DESC_COUNTER("pthru_all"),
> > +       STATS_DESC_COUNTER("pthru_host"),
> > +       STATS_DESC_COUNTER("pthru_bad_aff"));
> > +
> > +struct _kvm_stats_header kvm_vcpu_stats_header = {
> > +       .name_size = KVM_STATS_NAME_LEN,
> > +       .count = ARRAY_SIZE(kvm_vcpu_stats_desc),
> > +       .desc_offset = sizeof(struct kvm_stats_header),
> > +       .data_offset = sizeof(struct kvm_stats_header) +
> > +               sizeof(kvm_vcpu_stats_desc),
> > +};
> > +
> >  struct kvm_stats_debugfs_item debugfs_entries[] = {
> >         VCPU_STAT("exits", sum_exits),
> >         VCPU_STAT("mmio", mmio_exits),
> > diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
> > index 07fdd7a1254a..86d221e9193e 100644
> > --- a/arch/powerpc/kvm/booke.c
> > +++ b/arch/powerpc/kvm/booke.c
> > @@ -36,6 +36,51 @@
> >
> >  unsigned long kvmppc_booke_handlers;
> >
> > +struct _kvm_stats_desc kvm_vm_stats_desc[] = DEFINE_VM_STATS_DESC(
> > +       STATS_DESC_ICOUNTER("num_2M_pages"),
> > +       STATS_DESC_ICOUNTER("num_1G_pages"));
> > +
> > +struct _kvm_stats_header kvm_vm_stats_header = {
> > +       .name_size = KVM_STATS_NAME_LEN,
> > +       .count = ARRAY_SIZE(kvm_vm_stats_desc),
> > +       .desc_offset = sizeof(struct kvm_stats_header),
> > +       .data_offset = sizeof(struct kvm_stats_header) +
> > +               sizeof(kvm_vm_stats_desc),
> > +};
> > +
> > +struct _kvm_stats_desc kvm_vcpu_stats_desc[] = DEFINE_VCPU_STATS_DESC(
> > +       STATS_DESC_COUNTER("sum_exits"),
> > +       STATS_DESC_COUNTER("mmio_exits"),
> > +       STATS_DESC_COUNTER("signal_exits"),
> > +       STATS_DESC_COUNTER("light_exits"),
> > +       STATS_DESC_COUNTER("itlb_real_miss_exits"),
> > +       STATS_DESC_COUNTER("itlb_virt_miss_exits"),
> > +       STATS_DESC_COUNTER("dtlb_real_miss_exits"),
> > +       STATS_DESC_COUNTER("dtlb_virt_miss_exits"),
> > +       STATS_DESC_COUNTER("syscall_exits"),
> > +       STATS_DESC_COUNTER("isi_exits"),
> > +       STATS_DESC_COUNTER("dsi_exits"),
> > +       STATS_DESC_COUNTER("emulated_inst_exits"),
> > +       STATS_DESC_COUNTER("dec_exits"),
> > +       STATS_DESC_COUNTER("ext_intr_exits"),
> > +       STATS_DESC_TIME_NSEC("halt_wait_ns"),
> > +       STATS_DESC_COUNTER("halt_successful_wait"),
> > +       STATS_DESC_COUNTER("dbell_exits"),
> > +       STATS_DESC_COUNTER("gdbell_exits"),
> > +       STATS_DESC_COUNTER("ld"),
> > +       STATS_DESC_COUNTER("st"),
> > +       STATS_DESC_COUNTER("pthru_all"),
> > +       STATS_DESC_COUNTER("pthru_host"),
> > +       STATS_DESC_COUNTER("pthru_bad_aff"));
> > +
> > +struct _kvm_stats_header kvm_vcpu_stats_header = {
> > +       .name_size = KVM_STATS_NAME_LEN,
> > +       .count = ARRAY_SIZE(kvm_vcpu_stats_desc),
> > +       .desc_offset = sizeof(struct kvm_stats_header),
> > +       .data_offset = sizeof(struct kvm_stats_header) +
> > +               sizeof(kvm_vcpu_stats_desc),
> > +};
> > +
> >  struct kvm_stats_debugfs_item debugfs_entries[] = {
> >         VCPU_STAT("mmio", mmio_exits),
> >         VCPU_STAT("sig", signal_exits),
> > diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> > index d6bf3372bb10..003feee79fce 100644
> > --- a/arch/s390/kvm/kvm-s390.c
> > +++ b/arch/s390/kvm/kvm-s390.c
> > @@ -58,6 +58,123 @@
> >  #define VCPU_IRQS_MAX_BUF (sizeof(struct kvm_s390_irq) * \
> >                            (KVM_MAX_VCPUS + LOCAL_IRQS))
> >
> > +struct _kvm_stats_desc kvm_vm_stats_desc[] = DEFINE_VM_STATS_DESC(
> > +       STATS_DESC_COUNTER("inject_io"),
> > +       STATS_DESC_COUNTER("inject_float_mchk"),
> > +       STATS_DESC_COUNTER("inject_pfault_done"),
> > +       STATS_DESC_COUNTER("inject_service_signal"),
> > +       STATS_DESC_COUNTER("inject_virtio"));
> > +
> > +struct _kvm_stats_header kvm_vm_stats_header = {
> > +       .name_size = KVM_STATS_NAME_LEN,
> > +       .count = ARRAY_SIZE(kvm_vm_stats_desc),
> > +       .desc_offset = sizeof(struct kvm_stats_header),
> > +       .data_offset = sizeof(struct kvm_stats_header) +
> > +               sizeof(kvm_vm_stats_desc),
> > +};
> > +
> > +struct _kvm_stats_desc kvm_vcpu_stats_desc[] = DEFINE_VCPU_STATS_DESC(
> > +       STATS_DESC_COUNTER("exit_userspace"),
> > +       STATS_DESC_COUNTER("exit_null"),
> > +       STATS_DESC_COUNTER("exit_external_request"),
> > +       STATS_DESC_COUNTER("exit_io_request"),
> > +       STATS_DESC_COUNTER("exit_external_interrupt"),
> > +       STATS_DESC_COUNTER("exit_stop_request"),
> > +       STATS_DESC_COUNTER("exit_validity"),
> > +       STATS_DESC_COUNTER("exit_instruction"),
> > +       STATS_DESC_COUNTER("exit_pei"),
> > +       STATS_DESC_COUNTER("halt_no_poll_steal"),
> > +       STATS_DESC_COUNTER("instruction_lctl"),
> > +       STATS_DESC_COUNTER("instruction_lctlg"),
> > +       STATS_DESC_COUNTER("instruction_stctl"),
> > +       STATS_DESC_COUNTER("instruction_stctg"),
> > +       STATS_DESC_COUNTER("exit_program_interruption"),
> > +       STATS_DESC_COUNTER("exit_instr_and_program"),
> > +       STATS_DESC_COUNTER("exit_operation_exception"),
> > +       STATS_DESC_COUNTER("deliver_ckc"),
> > +       STATS_DESC_COUNTER("deliver_cputm"),
> > +       STATS_DESC_COUNTER("deliver_external_call"),
> > +       STATS_DESC_COUNTER("deliver_emergency_signal"),
> > +       STATS_DESC_COUNTER("deliver_service_signal"),
> > +       STATS_DESC_COUNTER("deliver_virtio"),
> > +       STATS_DESC_COUNTER("deliver_stop_signal"),
> > +       STATS_DESC_COUNTER("deliver_prefix_signal"),
> > +       STATS_DESC_COUNTER("deliver_restart_signal"),
> > +       STATS_DESC_COUNTER("deliver_program"),
> > +       STATS_DESC_COUNTER("deliver_io"),
> > +       STATS_DESC_COUNTER("deliver_machine_check"),
> > +       STATS_DESC_COUNTER("exit_wait_state"),
> > +       STATS_DESC_COUNTER("inject_ckc"),
> > +       STATS_DESC_COUNTER("inject_cputm"),
> > +       STATS_DESC_COUNTER("inject_external_call"),
> > +       STATS_DESC_COUNTER("inject_emergency_signal"),
> > +       STATS_DESC_COUNTER("inject_mchk"),
> > +       STATS_DESC_COUNTER("inject_pfault_init"),
> > +       STATS_DESC_COUNTER("inject_program"),
> > +       STATS_DESC_COUNTER("inject_restart"),
> > +       STATS_DESC_COUNTER("inject_set_prefix"),
> > +       STATS_DESC_COUNTER("inject_stop_signal"),
> > +       STATS_DESC_COUNTER("instruction_epsw"),
> > +       STATS_DESC_COUNTER("instruction_gs"),
> > +       STATS_DESC_COUNTER("instruction_io_other"),
> > +       STATS_DESC_COUNTER("instruction_lpsw"),
> > +       STATS_DESC_COUNTER("instruction_lpswe"),
> > +       STATS_DESC_COUNTER("instruction_pfmf"),
> > +       STATS_DESC_COUNTER("instruction_ptff"),
> > +       STATS_DESC_COUNTER("instruction_sck"),
> > +       STATS_DESC_COUNTER("instruction_sckpf"),
> > +       STATS_DESC_COUNTER("instruction_stidp"),
> > +       STATS_DESC_COUNTER("instruction_spx"),
> > +       STATS_DESC_COUNTER("instruction_stpx"),
> > +       STATS_DESC_COUNTER("instruction_stap"),
> > +       STATS_DESC_COUNTER("instruction_iske"),
> > +       STATS_DESC_COUNTER("instruction_ri"),
> > +       STATS_DESC_COUNTER("instruction_rrbe"),
> > +       STATS_DESC_COUNTER("instruction_sske"),
> > +       STATS_DESC_COUNTER("instruction_ipte_interlock"),
> > +       STATS_DESC_COUNTER("instruction_stsi"),
> > +       STATS_DESC_COUNTER("instruction_stfl"),
> > +       STATS_DESC_COUNTER("instruction_tb"),
> > +       STATS_DESC_COUNTER("instruction_tpi"),
> > +       STATS_DESC_COUNTER("instruction_tprot"),
> > +       STATS_DESC_COUNTER("instruction_tsch"),
> > +       STATS_DESC_COUNTER("instruction_sie"),
> > +       STATS_DESC_COUNTER("instruction_essa"),
> > +       STATS_DESC_COUNTER("instruction_sthyi"),
> > +       STATS_DESC_COUNTER("instruction_sigp_sense"),
> > +       STATS_DESC_COUNTER("instruction_sigp_sense_running"),
> > +       STATS_DESC_COUNTER("instruction_sigp_external_call"),
> > +       STATS_DESC_COUNTER("instruction_sigp_emergency"),
> > +       STATS_DESC_COUNTER("instruction_sigp_cond_emergency"),
> > +       STATS_DESC_COUNTER("instruction_sigp_start"),
> > +       STATS_DESC_COUNTER("instruction_sigp_stop"),
> > +       STATS_DESC_COUNTER("instruction_sigp_stop_store_status"),
> > +       STATS_DESC_COUNTER("instruction_sigp_store_status"),
> > +       STATS_DESC_COUNTER("instruction_sigp_store_adtl_status"),
> > +       STATS_DESC_COUNTER("instruction_sigp_arch"),
> > +       STATS_DESC_COUNTER("instruction_sigp_prefix"),
> > +       STATS_DESC_COUNTER("instruction_sigp_restart"),
> > +       STATS_DESC_COUNTER("instruction_sigp_init_cpu_reset"),
> > +       STATS_DESC_COUNTER("instruction_sigp_cpu_reset"),
> > +       STATS_DESC_COUNTER("instruction_sigp_unknown"),
> > +       STATS_DESC_COUNTER("diagnose_10"),
> > +       STATS_DESC_COUNTER("diagnose_44"),
> > +       STATS_DESC_COUNTER("diagnose_9c"),
> > +       STATS_DESC_COUNTER("diagnose_9c_ignored"),
> > +       STATS_DESC_COUNTER("diagnose_258"),
> > +       STATS_DESC_COUNTER("diagnose_308"),
> > +       STATS_DESC_COUNTER("diagnose_500"),
> > +       STATS_DESC_COUNTER("diagnose_other"),
> > +       STATS_DESC_COUNTER("pfault_sync"));
> > +
> > +struct _kvm_stats_header kvm_vcpu_stats_header = {
> > +       .name_size = KVM_STATS_NAME_LEN,
> > +       .count = ARRAY_SIZE(kvm_vcpu_stats_desc),
> > +       .desc_offset = sizeof(struct kvm_stats_header),
> > +       .data_offset = sizeof(struct kvm_stats_header) +
> > +               sizeof(kvm_vcpu_stats_desc),
> > +};
> > +
> >  struct kvm_stats_debugfs_item debugfs_entries[] = {
> >         VCPU_STAT("userspace_handled", exit_userspace),
> >         VCPU_STAT("exit_null", exit_null),
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 9a93d80caff6..84880687c199 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -214,6 +214,59 @@ EXPORT_SYMBOL_GPL(host_xss);
> >  u64 __read_mostly supported_xss;
> >  EXPORT_SYMBOL_GPL(supported_xss);
> >
> > +struct _kvm_stats_desc kvm_vm_stats_desc[] = DEFINE_VM_STATS_DESC(
> > +       STATS_DESC_COUNTER("mmu_shadow_zapped"),
> > +       STATS_DESC_COUNTER("mmu_pte_write"),
> > +       STATS_DESC_COUNTER("mmu_pde_zapped"),
> > +       STATS_DESC_COUNTER("mmu_flooded"),
> > +       STATS_DESC_COUNTER("mmu_recycled"),
> > +       STATS_DESC_COUNTER("mmu_cache_miss"),
> > +       STATS_DESC_ICOUNTER("mmu_unsync"),
> > +       STATS_DESC_ICOUNTER("largepages"),
> > +       STATS_DESC_ICOUNTER("nx_largepages_splits"),
> > +       STATS_DESC_ICOUNTER("max_mmu_page_hash_collisions"));
> > +
> > +struct _kvm_stats_header kvm_vm_stats_header = {
> > +       .name_size = KVM_STATS_NAME_LEN,
> > +       .count = ARRAY_SIZE(kvm_vm_stats_desc),
> > +       .desc_offset = sizeof(struct kvm_stats_header),
> > +       .data_offset = sizeof(struct kvm_stats_header) +
> > +               sizeof(kvm_vm_stats_desc),
> > +};
> > +
> > +struct _kvm_stats_desc kvm_vcpu_stats_desc[] = DEFINE_VCPU_STATS_DESC(
> > +       STATS_DESC_COUNTER("pf_fixed"),
> > +       STATS_DESC_COUNTER("pf_guest"),
> > +       STATS_DESC_COUNTER("tlb_flush"),
> > +       STATS_DESC_COUNTER("invlpg"),
> > +       STATS_DESC_COUNTER("exits"),
> > +       STATS_DESC_COUNTER("io_exits"),
> > +       STATS_DESC_COUNTER("mmio_exits"),
> > +       STATS_DESC_COUNTER("signal_exits"),
> > +       STATS_DESC_COUNTER("irq_window_exits"),
> > +       STATS_DESC_COUNTER("nmi_window_exits"),
> > +       STATS_DESC_COUNTER("l1d_flush"),
> > +       STATS_DESC_COUNTER("halt_exits"),
> > +       STATS_DESC_COUNTER("request_irq_exits"),
> > +       STATS_DESC_COUNTER("irq_exits"),
> > +       STATS_DESC_COUNTER("host_state_reload"),
> > +       STATS_DESC_COUNTER("fpu_reload"),
> > +       STATS_DESC_COUNTER("insn_emulation"),
> > +       STATS_DESC_COUNTER("insn_emulation_fail"),
> > +       STATS_DESC_COUNTER("hypercalls"),
> > +       STATS_DESC_COUNTER("irq_injections"),
> > +       STATS_DESC_COUNTER("nmi_injections"),
> > +       STATS_DESC_COUNTER("req_event"),
> > +       STATS_DESC_COUNTER("nested_run"));
> > +
> > +struct _kvm_stats_header kvm_vcpu_stats_header = {
> > +       .name_size = KVM_STATS_NAME_LEN,
> > +       .count = ARRAY_SIZE(kvm_vcpu_stats_desc),
> > +       .desc_offset = sizeof(struct kvm_stats_header),
> > +       .data_offset = sizeof(struct kvm_stats_header) +
> > +               sizeof(kvm_vcpu_stats_desc),
> > +};
> > +
> >  struct kvm_stats_debugfs_item debugfs_entries[] = {
> >         VCPU_STAT("pf_fixed", pf_fixed),
> >         VCPU_STAT("pf_guest", pf_guest),
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index 97700e41db3b..52783f8062ca 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -1240,6 +1240,19 @@ struct kvm_stats_debugfs_item {
> >         int mode;
> >  };
> >
> > +struct _kvm_stats_header {
> > +       __u32 name_size;
> > +       __u32 count;
> > +       __u32 desc_offset;
> > +       __u32 data_offset;
> > +};
> > +
> > +#define KVM_STATS_NAME_LEN     48
> > +struct _kvm_stats_desc {
> > +       struct kvm_stats_desc desc;
> > +       char name[KVM_STATS_NAME_LEN];
> > +};
> > +
> >  #define KVM_DBGFS_GET_MODE(dbgfs_item)                                         \
> >         ((dbgfs_item)->mode ? (dbgfs_item)->mode : 0644)
> >
> > @@ -1253,8 +1266,122 @@ struct kvm_stats_debugfs_item {
> >         { n, offsetof(struct kvm_vcpu, stat.common.x),                         \
> >           KVM_STAT_VCPU, ## __VA_ARGS__ }
> >
> > +#define STATS_DESC(name, type, unit, scale, exponent)                         \
> > +       {                                                                      \
> > +               {type | unit | scale, exponent, 1}, name,                      \
> > +       }
>
> Suggest using designated initializers here.
>
Sure, will do.
> > +#define STATS_DESC_CUMULATIVE(name, unit, scale, exponent)                    \
> > +       STATS_DESC(name, KVM_STATS_TYPE_CUMULATIVE, unit, scale, exponent)
> > +#define STATS_DESC_INSTANT(name, unit, scale, exponent)                               \
> > +       STATS_DESC(name, KVM_STATS_TYPE_INSTANT, unit, scale, exponent)
> > +
> > +/* Cumulative counter */
> > +#define STATS_DESC_COUNTER(name)                                              \
> > +       STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_NONE,                       \
> > +               KVM_STATS_SCALE_POW10, 0)
> > +/* Instantaneous counter */
> > +#define STATS_DESC_ICOUNTER(name)                                             \
> > +       STATS_DESC_INSTANT(name, KVM_STATS_UNIT_NONE,                          \
> > +               KVM_STATS_SCALE_POW10, 0)
> > +
> > +/* Cumulative clock cycles */
> > +#define STATS_DESC_CYCLE(name)                                                \
> > +       STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_CYCLES,                     \
> > +               KVM_STATS_SCALE_POW10, 0)
> > +/* Instantaneous clock cycles */
> > +#define STATS_DESC_ICYCLE(name)                                                       \
> > +       STATS_DESC_INSTANT(name, KVM_STATS_UNIT_CYCLES,                        \
> > +               KVM_STATS_SCALE_POW10, 0)
> > +
> > +/* Cumulative memory size in Byte */
> > +#define STATS_DESC_SIZE_BYTE(name)                                            \
> > +       STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_BYTES,                      \
> > +               KVM_STATS_SCALE_POW2, 0)
> > +/* Cumulative memory size in KiByte */
> > +#define STATS_DESC_SIZE_KBYTE(name)                                           \
> > +       STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_BYTES,                      \
> > +               KVM_STATS_SCALE_POW2, 10)
> > +/* Cumulative memory size in MiByte */
> > +#define STATS_DESC_SIZE_MBYTE(name)                                           \
> > +       STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_BYTES,                      \
> > +               KVM_STATS_SCALE_POW2, 20)
> > +/* Cumulative memory size in GiByte */
> > +#define STATS_DESC_SIZE_GBYTE(name)                                           \
> > +       STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_BYTES,                      \
> > +               KVM_STATS_SCALE_POW2, 30)
> > +
> > +/* Instantaneous memory size in Byte */
> > +#define STATS_DESC_ISIZE_BYTE(name)                                           \
> > +       STATS_DESC_INSTANT(name, KVM_STATS_UNIT_BYTES,                         \
> > +               KVM_STATS_SCALE_POW2, 0)
> > +/* Instantaneous memory size in KiByte */
> > +#define STATS_DESC_ISIZE_KBYTE(name)                                          \
> > +       STATS_DESC_INSTANT(name, KVM_STATS_UNIT_BYTES,                         \
> > +               KVM_STATS_SCALE_POW2, 10)
> > +/* Instantaneous memory size in MiByte */
> > +#define STATS_DESC_ISIZE_MBYTE(name)                                          \
> > +       STATS_DESC_INSTANT(name, KVM_STATS_UNIT_BYTES,                         \
> > +               KVM_STATS_SCALE_POW2, 20)
> > +/* Instantaneous memory size in GiByte */
> > +#define STATS_DESC_ISIZE_GBYTE(name)                                          \
> > +       STATS_DESC_INSTANT(name, KVM_STATS_UNIT_BYTES,                         \
> > +               KVM_STATS_SCALE_POW2, 30)
> > +
> > +/* Cumulative time in second */
> > +#define STATS_DESC_TIME_SEC(name)                                             \
> > +       STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_SECONDS,                    \
> > +               KVM_STATS_SCALE_POW10, 0)
> > +/* Cumulative time in millisecond */
> > +#define STATS_DESC_TIME_MSEC(name)                                            \
> > +       STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_SECONDS,                    \
> > +               KVM_STATS_SCALE_POW10, -3)
> > +/* Cumulative time in microsecond */
> > +#define STATS_DESC_TIME_USEC(name)                                            \
> > +       STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_SECONDS,                    \
> > +               KVM_STATS_SCALE_POW10, -6)
> > +/* Cumulative time in nanosecond */
> > +#define STATS_DESC_TIME_NSEC(name)                                            \
> > +       STATS_DESC_CUMULATIVE(name, KVM_STATS_UNIT_SECONDS,                    \
> > +               KVM_STATS_SCALE_POW10, -9)
> > +
> > +/* Instantaneous time in second */
> > +#define STATS_DESC_ITIME_SEC(name)                                            \
> > +       STATS_DESC_INSTANT(name, KVM_STATS_UNIT_SECONDS,                       \
> > +               KVM_STATS_SCALE_POW10, 0)
> > +/* Instantaneous time in millisecond */
> > +#define STATS_DESC_ITIME_MSEC(name)                                           \
> > +       STATS_DESC_INSTANT(name, KVM_STATS_UNIT_SECONDS,                       \
> > +               KVM_STATS_SCALE_POW10, -3)
> > +/* Instantaneous time in microsecond */
> > +#define STATS_DESC_ITIME_USEC(name)                                           \
> > +       STATS_DESC_INSTANT(name, KVM_STATS_UNIT_SECONDS,                       \
> > +               KVM_STATS_SCALE_POW10, -6)
> > +/* Instantaneous time in nanosecond */
> > +#define STATS_DESC_ITIME_NSEC(name)                                           \
> > +       STATS_DESC_INSTANT(name, KVM_STATS_UNIT_SECONDS,                       \
> > +               KVM_STATS_SCALE_POW10, -9)
> > +
> > +#define DEFINE_VM_STATS_DESC(...) {                                           \
> > +       STATS_DESC_COUNTER("remote_tlb_flush"),                                \
> > +       ## __VA_ARGS__                                                         \
> > +}
> > +
> > +#define DEFINE_VCPU_STATS_DESC(...) {                                         \
> > +       STATS_DESC_COUNTER("halt_successful_poll"),                            \
> > +       STATS_DESC_COUNTER("halt_attempted_poll"),                             \
> > +       STATS_DESC_COUNTER("halt_poll_invalid"),                               \
> > +       STATS_DESC_COUNTER("halt_wakeup"),                                     \
> > +       STATS_DESC_TIME_NSEC("halt_poll_success_ns"),                          \
> > +       STATS_DESC_TIME_NSEC("halt_poll_fail_ns"),                             \
> > +       ## __VA_ARGS__                                                         \
> > +}
> > +
> >  extern struct kvm_stats_debugfs_item debugfs_entries[];
> >  extern struct dentry *kvm_debugfs_dir;
> > +extern struct _kvm_stats_header kvm_vm_stats_header;
> > +extern struct _kvm_stats_header kvm_vcpu_stats_header;
> > +extern struct _kvm_stats_desc kvm_vm_stats_desc[];
> > +extern struct _kvm_stats_desc kvm_vcpu_stats_desc[];
> >
> >  #if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)
> >  static inline int mmu_notifier_retry(struct kvm *kvm, unsigned long mmu_seq)
> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > index 3fd9a7e9d90c..a64e92c7d9de 100644
> > --- a/include/uapi/linux/kvm.h
> > +++ b/include/uapi/linux/kvm.h
> > @@ -1082,6 +1082,7 @@ struct kvm_ppc_resize_hpt {
> >  #define KVM_CAP_SGX_ATTRIBUTE 196
> >  #define KVM_CAP_VM_COPY_ENC_CONTEXT_FROM 197
> >  #define KVM_CAP_PTP_KVM 198
> > +#define KVM_CAP_STATS_BINARY_FD 199
> >
> >  #ifdef KVM_CAP_IRQ_ROUTING
> >
> > @@ -1898,4 +1899,53 @@ struct kvm_dirty_gfn {
> >  #define KVM_BUS_LOCK_DETECTION_OFF             (1 << 0)
> >  #define KVM_BUS_LOCK_DETECTION_EXIT            (1 << 1)
> >
> > +#define KVM_STATS_ID_MAXLEN            64
> > +
> > +struct kvm_stats_header {
> > +       char id[KVM_STATS_ID_MAXLEN];
> > +       __u32 name_size;
> > +       __u32 count;
> > +       __u32 desc_offset;
> > +       __u32 data_offset;
> > +};
> > +
> > +#define KVM_STATS_TYPE_SHIFT           0
> > +#define KVM_STATS_TYPE_MASK            (0xF << KVM_STATS_TYPE_SHIFT)
> > +#define KVM_STATS_TYPE_CUMULATIVE      (0x0 << KVM_STATS_TYPE_SHIFT)
> > +#define KVM_STATS_TYPE_INSTANT         (0x1 << KVM_STATS_TYPE_SHIFT)
> > +#define KVM_STATS_TYPE_MAX             KVM_STATS_TYPE_INSTANT
> > +
> > +#define KVM_STATS_UNIT_SHIFT           4
> > +#define KVM_STATS_UNIT_MASK            (0xF << KVM_STATS_UNIT_SHIFT)
> > +#define KVM_STATS_UNIT_NONE            (0x0 << KVM_STATS_UNIT_SHIFT)
> > +#define KVM_STATS_UNIT_BYTES           (0x1 << KVM_STATS_UNIT_SHIFT)
> > +#define KVM_STATS_UNIT_SECONDS         (0x2 << KVM_STATS_UNIT_SHIFT)
> > +#define KVM_STATS_UNIT_CYCLES          (0x3 << KVM_STATS_UNIT_SHIFT)
> > +#define KVM_STATS_UNIT_MAX             KVM_STATS_UNIT_CYCLES
> > +
> > +#define KVM_STATS_SCALE_SHIFT          8
> > +#define KVM_STATS_SCALE_MASK           (0xF << KVM_STATS_SCALE_SHIFT)
> > +#define KVM_STATS_SCALE_POW10          (0x0 << KVM_STATS_SCALE_SHIFT)
> > +#define KVM_STATS_SCALE_POW2           (0x1 << KVM_STATS_SCALE_SHIFT)
> > +#define KVM_STATS_SCALE_MAX            KVM_STATS_SCALE_POW2
> > +
> > +struct kvm_stats_desc {
> > +       __u32 flags;
> > +       __s16 exponent;
> > +       __u16 size;
> > +       __u32 unused1;
> > +       __u32 unused2;
> > +       char name[0];
> > +};
> > +
> > +struct kvm_vm_stats_data {
> > +       unsigned long value[0];
> > +};
> > +
> > +struct kvm_vcpu_stats_data {
> > +       __u64 value[0];
> > +};
> > +
> > +#define KVM_STATS_GETFD  _IOR(KVMIO,  0xcc, struct kvm_stats_header)
> > +
> >  #endif /* __LINUX_KVM_H */
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 34a4cf265297..9e2c8dcdeae9 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -3409,6 +3409,115 @@ static int kvm_vcpu_ioctl_set_sigmask(struct kvm_vcpu *vcpu, sigset_t *sigset)
> >         return 0;
> >  }
> >
> > +static ssize_t kvm_vcpu_stats_read(struct file *file, char __user *user_buffer,
> > +                             size_t size, loff_t *offset)
> > +{
> > +       char id[KVM_STATS_ID_MAXLEN];
> > +       struct kvm_vcpu *vcpu = file->private_data;
> > +       ssize_t copylen, len, remain = size;
> > +       size_t size_header, size_desc, size_stats;
> > +       loff_t pos = *offset;
> > +       char __user *dest = user_buffer;
> > +       void *src;
> > +
> > +       snprintf(id, sizeof(id), "kvm-%d/vcpu-%d",
> > +                       task_pid_nr(current), vcpu->vcpu_id);
> > +       size_header = sizeof(kvm_vcpu_stats_header);
> > +       size_desc =
> > +               kvm_vcpu_stats_header.count * sizeof(struct _kvm_stats_desc);
> > +       size_stats = sizeof(vcpu->stat);
> > +
> > +       len = sizeof(id) + size_header + size_desc + size_stats - pos;
> > +       len = min(len, remain);
> > +       if (len <= 0)
> > +               return 0;
> > +       remain = len;
> > +
> > +       /* Copy kvm vcpu stats header id string */
> > +       copylen = sizeof(id) - pos;
> > +       copylen = min(copylen, remain);
> > +       if (copylen > 0) {
> > +               src = (void *)id + pos;
> > +               if (copy_to_user(dest, src, copylen))
> > +                       return -EFAULT;
> > +               remain -= copylen;
> > +               pos += copylen;
> > +               dest += copylen;
> > +       }
> > +       /* Copy kvm vcpu stats header */
> > +       copylen = sizeof(id) + size_header - pos;
> > +       copylen = min(copylen, remain);
> > +       if (copylen > 0) {
> > +               src = (void *)&kvm_vcpu_stats_header;
> > +               src += pos - sizeof(id);
> > +               if (copy_to_user(dest, src, copylen))
> > +                       return -EFAULT;
> > +               remain -= copylen;
> > +               pos += copylen;
> > +               dest += copylen;
> > +       }
> > +       /* Copy kvm vcpu stats descriptors */
> > +       copylen = kvm_vcpu_stats_header.desc_offset + size_desc - pos;
> > +       copylen = min(copylen, remain);
> > +       if (copylen > 0) {
> > +               src = (void *)&kvm_vcpu_stats_desc;
> > +               src += pos - kvm_vcpu_stats_header.desc_offset;
> > +               if (copy_to_user(dest, src, copylen))
> > +                       return -EFAULT;
> > +               remain -= copylen;
> > +               pos += copylen;
> > +               dest += copylen;
> > +       }
>
> KVM could cache everything above this to avoid the cost of
> regenerating it on every read. It would require allocating some extra
> memory in the kernel though, so it's not free. But if userspace is
> reading stats for every vCPU every second it could be worth it.
>
Stats descriptors are only read one time for every VM and VCPU during a
VM boot. No cache is needed.
> > +       /* Copy kvm vcpu stats values */
> > +       copylen = kvm_vcpu_stats_header.data_offset + size_stats - pos;
> > +       copylen = min(copylen, remain);
> > +       if (copylen > 0) {
> > +               src = (void *)&vcpu->stat;
> > +               src += pos - kvm_vcpu_stats_header.data_offset;
> > +               if (copy_to_user(dest, src, copylen))
> > +                       return -EFAULT;
> > +               remain -= copylen;
> > +               pos += copylen;
> > +               dest += copylen;
> > +       }
> > +
> > +       *offset = pos;
> > +       return len;
> > +}
> > +
> > +static const struct file_operations kvm_vcpu_stats_fops = {
> > +       .read = kvm_vcpu_stats_read,
> > +       .llseek = noop_llseek,
> > +};
> > +
> > +static int kvm_vcpu_ioctl_get_statsfd(struct kvm_vcpu *vcpu)
> > +{
> > +       int error, fd;
> > +       struct file *file;
> > +       char name[15 + ITOA_MAX_LEN + 1];
> > +
> > +       snprintf(name, sizeof(name), "kvm-vcpu-stats:%d", vcpu->vcpu_id);
>
> Does this need to be globally unique? I was going to suggest using the
> id ("kvm-%d/vcpu-%d") but the slash is probably not allowed. It would
> be nice though to have the file name the same as the id though so
> maybe change the id and name to something like  "kvm-%d.vcpu-%d"?
>
The name passed into anon_inode_getfile is called a "class name" used for
dentry names which is associated with the anonymous inode. According to
the first commit for anonymous indoe support below, we know that the name
actually doesn't matter for anything.
https://github.com/torvalds/linux/commit/5dc8bf8132d59c03fe2562bce165c2f03f021687

This name is not related to the id in any way. Will keep the id format as it is.
> > +
> > +       error = get_unused_fd_flags(O_CLOEXEC);
> > +       if (error < 0)
> > +               return error;
> > +       fd = error;
> > +
> > +       file = anon_inode_getfile(name, &kvm_vcpu_stats_fops, vcpu, O_RDONLY);
> > +       if (IS_ERR(file)) {
> > +               error = PTR_ERR(file);
> > +               goto err_put_unused_fd;
> > +       }
> > +       file->f_mode |= FMODE_PREAD;
> > +       fd_install(fd, file);
> > +
> > +       return fd;
> > +
> > +err_put_unused_fd:
> > +       put_unused_fd(fd);
> > +       return error;
> > +}
> > +
> >  static long kvm_vcpu_ioctl(struct file *filp,
> >                            unsigned int ioctl, unsigned long arg)
> >  {
> > @@ -3606,6 +3715,10 @@ static long kvm_vcpu_ioctl(struct file *filp,
> >                 r = kvm_arch_vcpu_ioctl_set_fpu(vcpu, fpu);
> >                 break;
> >         }
> > +       case KVM_STATS_GETFD: {
> > +               r = kvm_vcpu_ioctl_get_statsfd(vcpu);
> > +               break;
> > +       }
> >         default:
> >                 r = kvm_arch_vcpu_ioctl(filp, ioctl, arg);
> >         }
> > @@ -3864,6 +3977,8 @@ static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
> >  #else
> >                 return 0;
> >  #endif
> > +       case KVM_CAP_STATS_BINARY_FD:
> > +               return 1;
> >         default:
> >                 break;
> >         }
> > @@ -3967,6 +4082,111 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
> >         }
> >  }
> >
> > +static ssize_t kvm_vm_stats_read(struct file *file, char __user *user_buffer,
> > +                             size_t size, loff_t *offset)
> > +{
> > +       char id[KVM_STATS_ID_MAXLEN];
> > +       struct kvm *kvm = file->private_data;
> > +       ssize_t copylen, len, remain = size;
> > +       size_t size_header, size_desc, size_stats;
> > +       loff_t pos = *offset;
> > +       char __user *dest = user_buffer;
> > +       void *src;
> > +
> > +       snprintf(id, sizeof(id), "kvm-%d", task_pid_nr(current));
> > +       size_header = sizeof(kvm_vm_stats_header);
> > +       size_desc = kvm_vm_stats_header.count * sizeof(struct _kvm_stats_desc);
> > +       size_stats = sizeof(kvm->stat);
> > +
> > +       len = sizeof(id) + size_header + size_desc + size_stats - pos;
> > +       len = min(len, remain);
> > +       if (len <= 0)
> > +               return 0;
> > +       remain = len;
> > +
> > +       /* Copy kvm vm stats header id string */
> > +       copylen = sizeof(id) - pos;
> > +       copylen = min(copylen, remain);
> > +       if (copylen > 0) {
> > +               src = (void *)id + pos;
> > +               if (copy_to_user(dest, src, copylen))
> > +                       return -EFAULT;
> > +               remain -= copylen;
> > +               pos += copylen;
> > +               dest += copylen;
> > +       }
> > +       /* Copy kvm vm stats header */
> > +       copylen = sizeof(id) + size_header - pos;
> > +       copylen = min(copylen, remain);
> > +       if (copylen > 0) {
> > +               src = (void *)&kvm_vm_stats_header;
> > +               src += pos - sizeof(id);
> > +               if (copy_to_user(dest, src, copylen))
> > +                       return -EFAULT;
> > +               remain -= copylen;
> > +               pos += copylen;
> > +               dest += copylen;
> > +       }
> > +       /* Copy kvm vm stats descriptors */
> > +       copylen = kvm_vm_stats_header.desc_offset + size_desc - pos;
> > +       copylen = min(copylen, remain);
> > +       if (copylen > 0) {
> > +               src = (void *)&kvm_vm_stats_desc;
> > +               src += pos - kvm_vm_stats_header.desc_offset;
> > +               if (copy_to_user(dest, src, copylen))
> > +                       return -EFAULT;
> > +               remain -= copylen;
> > +               pos += copylen;
> > +               dest += copylen;
> > +       }
>
> Ditto here about caching.
>
>
> > +       /* Copy kvm vm stats values */
> > +       copylen = kvm_vm_stats_header.data_offset + size_stats - pos;
> > +       copylen = min(copylen, remain);
> > +       if (copylen > 0) {
> > +               src = (void *)&kvm->stat;
> > +               src += pos - kvm_vm_stats_header.data_offset;
> > +               if (copy_to_user(dest, src, copylen))
> > +                       return -EFAULT;
> > +               remain -= copylen;
> > +               pos += copylen;
> > +               dest += copylen;
> > +       }
> > +
> > +       *offset = pos;
> > +       return len;
> > +}
> > +
> > +static const struct file_operations kvm_vm_stats_fops = {
> > +       .read = kvm_vm_stats_read,
> > +       .llseek = noop_llseek,
> > +};
> > +
> > +static int kvm_vm_ioctl_get_statsfd(struct kvm *kvm)
> > +{
> > +       int error, fd;
> > +       struct file *file;
> > +
> > +       error = get_unused_fd_flags(O_CLOEXEC);
> > +       if (error < 0)
> > +               return error;
> > +       fd = error;
> > +
> > +       file = anon_inode_getfile("kvm-vm-stats",
> > +                       &kvm_vm_stats_fops, kvm, O_RDONLY);
> > +       if (IS_ERR(file)) {
> > +               error = PTR_ERR(file);
> > +               goto err_put_unused_fd;
> > +       }
> > +       file->f_mode |= FMODE_PREAD;
> > +       fd_install(fd, file);
> > +
> > +       return fd;
> > +
> > +err_put_unused_fd:
> > +       put_unused_fd(fd);
> > +       return error;
> > +}
> > +
> >  static long kvm_vm_ioctl(struct file *filp,
> >                            unsigned int ioctl, unsigned long arg)
> >  {
> > @@ -4149,6 +4369,9 @@ static long kvm_vm_ioctl(struct file *filp,
> >         case KVM_RESET_DIRTY_RINGS:
> >                 r = kvm_vm_ioctl_reset_dirty_pages(kvm);
> >                 break;
> > +       case KVM_STATS_GETFD:
> > +               r = kvm_vm_ioctl_get_statsfd(kvm);
> > +               break;
> >         default:
> >                 r = kvm_arch_vm_ioctl(filp, ioctl, arg);
> >         }
> > --
> > 2.31.1.751.gd2f1c929bd-goog
> >

Thanks,
Jing

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 3/4] KVM: stats: Add documentation for statistics data binary interface
  2021-05-19 16:57   ` David Matlack
@ 2021-05-19 19:29     ` Jing Zhang
  2021-05-19 20:30       ` Jing Zhang
  0 siblings, 1 reply; 30+ messages in thread
From: Jing Zhang @ 2021-05-19 19:29 UTC (permalink / raw)
  To: David Matlack
  Cc: KVM, KVMARM, LinuxMIPS, KVMPPC, LinuxS390, Linuxkselftest,
	Paolo Bonzini, Marc Zyngier, James Morse, Julien Thierry,
	Suzuki K Poulose, Will Deacon, Huacai Chen, Aleksandar Markovic,
	Thomas Bogendoerfer, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck, Claudio Imbrenda,
	Sean Christopherson, Vitaly Kuznetsov, Jim Mattson, Peter Shier,
	Oliver Upton, David Rientjes, Emanuele Giuseppe Esposito

Hi David,

On Wed, May 19, 2021 at 11:57 AM David Matlack <dmatlack@google.com> wrote:
>
> On Mon, May 17, 2021 at 9:25 AM Jing Zhang <jingzhangos@google.com> wrote:
> >
> > Update KVM API documentation for binary statistics.
> >
> > Signed-off-by: Jing Zhang <jingzhangos@google.com>
> > ---
> >  Documentation/virt/kvm/api.rst | 171 +++++++++++++++++++++++++++++++++
> >  1 file changed, 171 insertions(+)
> >
> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > index 7fcb2fd38f42..9a6aa9770dfd 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -5034,6 +5034,169 @@ see KVM_XEN_VCPU_SET_ATTR above.
> >  The KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADJUST type may not be used
> >  with the KVM_XEN_VCPU_GET_ATTR ioctl.
> >
> > +4.130 KVM_STATS_GETFD
> > +---------------------
> > +
> > +:Capability: KVM_CAP_STATS_BINARY_FD
> > +:Architectures: all
> > +:Type: vm ioctl, vcpu ioctl
> > +:Parameters: none
> > +:Returns: statistics file descriptor on success, < 0 on error
> > +
> > +Errors:
> > +
> > +  ======     ======================================================
> > +  ENOMEM     if the fd could not be created due to lack of memory
> > +  EMFILE     if the number of opened files exceeds the limit
> > +  ======     ======================================================
> > +
> > +The file descriptor can be used to read VM/vCPU statistics data in binary
> > +format. The file data is organized into three blocks as below:
> > ++-------------+
> > +|   Header    |
> > ++-------------+
> > +| Descriptors |
> > ++-------------+
> > +| Stats Data  |
> > ++-------------+
> > +
> > +The Header block is always at the start of the file. It is only needed to be
> > +read one time after a system boot.
>
> By system boot do you mean the host or the VM? If the host then it's
> probably just cleaner to omit that part entirely and just say "It is
> only needed to be read once.".
>
Will change "system boot" to "VM boot".
> > +It is in the form of ``struct kvm_stats_header`` as below::
> > +
> > +       #define KVM_STATS_ID_MAXLEN             64
> > +
> > +       struct kvm_stats_header {
> > +               char id[KVM_STATS_ID_MAXLEN];
> > +               __u32 name_size;
> > +               __u32 count;
> > +               __u32 desc_offset;
> > +               __u32 data_offset;
> > +       };
> > +
> > +The ``id`` field is identification for the corresponding KVM statistics. For
> > +KVM statistics, it is in the form of "kvm-{kvm pid}", like "kvm-12345". For
>
> Should this say "For VM statistics, ..." instead?
>
Yes, will fix it.
> > +VCPU statistics, it is in the form of "kvm-{kvm pid}/vcpu-{vcpu id}", like
> > +"kvm-12345/vcpu-12".
> > +
> > +The ``name_size`` field is the size (byte) of the statistics name string
> > +(including trailing '\0') appended to the end of every statistics descriptor.
> > +
> > +The ``count`` field is the number of statistics.
> > +
> > +The ``desc_offset`` field is the offset of the Descriptors block from the start
> > +of the file indicated by the file descriptor.
> > +
> > +The ``data_offset`` field is the offset of the Stats Data block from the start
> > +of the file indicated by the file descriptor.
> > +
> > +The Descriptors block is only needed to be read once after a system boot. It is
>
> Ditto here about system boot.
>
> > +an array of ``struct kvm_stats_desc`` as below::
>
> Consider omitting these macros from the documentation, or moving them
> to later. Readers right here are expecting to see the struct
> kvm_stats_desc given the previous line.
>
How about changing "as below" to "as shown in below code block"?
> > +
> > +       #define KVM_STATS_TYPE_SHIFT            0
> > +       #define KVM_STATS_TYPE_MASK             (0xF << KVM_STATS_TYPE_SHIFT)
> > +       #define KVM_STATS_TYPE_CUMULATIVE       (0x0 << KVM_STATS_TYPE_SHIFT)
> > +       #define KVM_STATS_TYPE_INSTANT          (0x1 << KVM_STATS_TYPE_SHIFT)
> > +       #define KVM_STATS_TYPE_MAX              KVM_STATS_TYPE_INSTANT
> > +
> > +       #define KVM_STATS_UNIT_SHIFT            4
> > +       #define KVM_STATS_UNIT_MASK             (0xF << KVM_STATS_UNIT_SHIFT)
> > +       #define KVM_STATS_UNIT_NONE             (0x0 << KVM_STATS_UNIT_SHIFT)
> > +       #define KVM_STATS_UNIT_BYTES            (0x1 << KVM_STATS_UNIT_SHIFT)
> > +       #define KVM_STATS_UNIT_SECONDS          (0x2 << KVM_STATS_UNIT_SHIFT)
> > +       #define KVM_STATS_UNIT_CYCLES           (0x3 << KVM_STATS_UNIT_SHIFT)
> > +       #define KVM_STATS_UNIT_MAX              KVM_STATS_UNIT_CYCLES
> > +
> > +       #define KVM_STATS_SCALE_SHIFT           8
> > +       #define KVM_STATS_SCALE_MASK            (0xF << KVM_STATS_SCALE_SHIFT)
> > +       #define KVM_STATS_SCALE_POW10           (0x0 << KVM_STATS_SCALE_SHIFT)
> > +       #define KVM_STATS_SCALE_POW2            (0x1 << KVM_STATS_SCALE_SHIFT)
> > +       #define KVM_STATS_SCALE_MAX             KVM_STATS_SCALE_POW2
>
> Terminology nit: I think usually this part is called the "base". e.g.
> when you decompose a number X into N * B^E, B is the "base" and E is
> the "exponent". I see you're using "exponent" already but it might
> make sense to change "scale" to "base" throughout this series.
>
Will change "SCALE" to "SCALE_BASE".
> > +
> > +       struct kvm_stats_desc {
> > +               __u32 flags;
> > +               __s16 exponent;
> > +               __u16 size;
> > +               __u32 unused1;
> > +               __u32 unused2;
> > +               char name[0];
> > +       };
> > +
> > +The ``flags`` field contains the type and unit of the statistics data described
> > +by this descriptor. The following flags are supported:
>
> nit: Suggest breaking this list out into separate lists so readers can
> differentiate between the type, unit, and scale. Something like:
>
> Bits 0-3 of ``flags`` encode the type:
>
> * ``KVM_STATS_TYPE_CUMULATIVE`` ...
> * ``KVM_STATS_TYPE_INSTANT`` ...
>
> Bits 4-7 of ``flags encode the unit:
>
> * ``KVM_STATS_UNIT_NONE`` ...
> ...
> etc.
>
Good suggestion. Will do that.
> > +  * ``KVM_STATS_TYPE_CUMULATIVE``
> > +    The statistics data is cumulative. The value of data can only be increased.
> > +    Most of the counters used in KVM are of this type.
> > +    The corresponding ``count`` filed for this type is always 1.
> > +  * ``KVM_STATS_TYPE_INSTANT``
> > +    The statistics data is instantaneous. Its value can be increased or
> > +    decreased. This type is usually used as a measurement of some resources,
> > +    like the number of dirty pages, the number of large pages, etc.
> > +    The corresponding ``count`` field for this type is always 1.
> > +  * ``KVM_STATS_UNIT_NONE``
> > +    There is no unit for the value of statistics data. This usually means that
> > +    the value is a simple counter of an event.
> > +  * ``KVM_STATS_UNIT_BYTES``
> > +    It indicates that the statistics data is used to measure memory size, in the
> > +    unit of Byte, KiByte, MiByte, GiByte, etc. The unit of the data is
> > +    determined by the ``exponent`` field in the descriptor. The
> > +    ``KVM_STATS_SCALE_POW2`` flag is valid in this case. The unit of the data is
> > +    determined by ``pow(2, exponent)``. For example, if value is 10,
> > +    ``exponent`` is 20, which means the unit of statistics data is MiByte, we
> > +    can get the statistics data in the unit of Byte by
> > +    ``value * pow(2, exponent) = 10 * pow(2, 20) = 10 MiByte`` which is
> > +    10 * 1024 * 1024 Bytes.
> > +  * ``KVM_STATS_UNIT_SECONDS``
> > +    It indicates that the statistics data is used to measure time/latency, in
> > +    the unit of nanosecond, microsecond, millisecond and second. The unit of the
> > +    data is determined by the ``exponent`` field in the descriptor. The
> > +    ``KVM_STATS_SCALE_POW10`` flag is valid in this case. The unit of the data
> > +    is determined by ``pow(10, exponent)``. For example, if value is 2000000,
> > +    ``exponent`` is -6, which means the unit of statistics data is microsecond,
> > +    we can get the statistics data in the unit of second by
> > +    ``value * pow(10, exponent) = 2000000 * pow(10, -6) = 2 seconds``.
> > +  * ``KVM_STATS_UNIT_CYCLES``
> > +    It indicates that the statistics data is used to measure CPU clock cycles.
> > +    The ``KVM_STATS_SCALE_POW10`` flag is valid in this case. For example, if
> > +    value is 200, ``exponent`` is 4, we can get the number of CPU clock cycles
> > +    by ``value * pow(10, exponent) = 200 * pow(10, 4) = 2000000``.
> > +
> > +The ``exponent`` field is the scale of corresponding statistics data. It has two
> > +values as follows:
> > +  * ``KVM_STATS_SCALE_POW10``
>
> I thought the scale was encoded in ``flags`` not ``exponent``? Isn't
> the exponent the
>
The base is encoded in ``flags``, not the exponent.
> > +    The scale is based on power of 10. It is used for measurement of time and
> > +    CPU clock cycles.
> > +  * ``KVM_STATS_SCALE_POW2``
> > +    The scale is based on power of 2. It is used for measurement of memory size.
>
> It might be useful to give an example of how to use the exponent field
> in practice.
>
Those examples where we discuss ``flags`` field also cover the usage
of exponent field.
> > +
> > +The ``size`` field is the number of values of this statistics data. It is in the
> > +unit of ``unsigned long`` for VCPU or ``__u64`` for VM.
> > +
> > +The ``unused1`` and ``unused2`` fields are reserved for future
> > +support for other types of statistics data, like log/linear histogram.
> > +
> > +The ``name`` field points to the name string of the statistics data. The name
> > +string starts at the end of ``struct kvm_stats_desc``.
> > +The maximum length (including trailing '\0') is indicated by ``name_size``
> > +in ``struct kvm_stats_header``.
> > +
> > +The Stats Data block contains an array of data values of type ``struct
> > +kvm_vm_stats_data`` or ``struct kvm_vcpu_stats_data``. It would be read by
> > +user space periodically to pull statistics data.
> > +The order of data value in Stats Data block is the same as the order of
> > +descriptors in Descriptors block.
> > +  * Statistics data for VM::
> > +
> > +       struct kvm_vm_stats_data {
> > +               unsigned long value[0];
> > +       };
> > +
> > +  * Statistics data for VCPU::
> > +
> > +       struct kvm_vcpu_stats_data {
> > +               __u64 value[0];
> > +       };
> > +
> >  5. The kvm_run structure
> >  ========================
> >
> > @@ -6891,3 +7054,11 @@ This capability is always enabled.
> >  This capability indicates that the KVM virtual PTP service is
> >  supported in the host. A VMM can check whether the service is
> >  available to the guest on migration.
> > +
> > +8.33 KVM_CAP_STATS_BINARY_FD
> > +----------------------------
> > +
> > +:Architectures: all
> > +
> > +This capability indicates the feature that user space can create get a file
> > +descriptor for every VM and VCPU to read statistics data in binary format.
> > --
> > 2.31.1.751.gd2f1c929bd-goog
> >
Thanks,
Jing

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 3/4] KVM: stats: Add documentation for statistics data binary interface
  2021-05-19 17:02   ` David Matlack
@ 2021-05-19 19:30     ` Jing Zhang
  0 siblings, 0 replies; 30+ messages in thread
From: Jing Zhang @ 2021-05-19 19:30 UTC (permalink / raw)
  To: David Matlack
  Cc: KVM, KVMARM, LinuxMIPS, KVMPPC, LinuxS390, Linuxkselftest,
	Paolo Bonzini, Marc Zyngier, James Morse, Julien Thierry,
	Suzuki K Poulose, Will Deacon, Huacai Chen, Aleksandar Markovic,
	Thomas Bogendoerfer, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck, Claudio Imbrenda,
	Sean Christopherson, Vitaly Kuznetsov, Jim Mattson, Peter Shier,
	Oliver Upton, David Rientjes, Emanuele Giuseppe Esposito

Hi David,

On Wed, May 19, 2021 at 12:02 PM David Matlack <dmatlack@google.com> wrote:
>
> On Mon, May 17, 2021 at 9:25 AM Jing Zhang <jingzhangos@google.com> wrote:
> >
> > Update KVM API documentation for binary statistics.
> >
> > Signed-off-by: Jing Zhang <jingzhangos@google.com>
> > ---
> >  Documentation/virt/kvm/api.rst | 171 +++++++++++++++++++++++++++++++++
> >  1 file changed, 171 insertions(+)
> >
> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > index 7fcb2fd38f42..9a6aa9770dfd 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -5034,6 +5034,169 @@ see KVM_XEN_VCPU_SET_ATTR above.
> >  The KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADJUST type may not be used
> >  with the KVM_XEN_VCPU_GET_ATTR ioctl.
> >
> > +4.130 KVM_STATS_GETFD
> > +---------------------
> > +
> > +:Capability: KVM_CAP_STATS_BINARY_FD
> > +:Architectures: all
> > +:Type: vm ioctl, vcpu ioctl
> > +:Parameters: none
> > +:Returns: statistics file descriptor on success, < 0 on error
> > +
> > +Errors:
> > +
> > +  ======     ======================================================
> > +  ENOMEM     if the fd could not be created due to lack of memory
> > +  EMFILE     if the number of opened files exceeds the limit
> > +  ======     ======================================================
> > +
> > +The file descriptor can be used to read VM/vCPU statistics data in binary
> > +format. The file data is organized into three blocks as below:
> > ++-------------+
> > +|   Header    |
> > ++-------------+
> > +| Descriptors |
> > ++-------------+
> > +| Stats Data  |
> > ++-------------+
> > +
> > +The Header block is always at the start of the file. It is only needed to be
> > +read one time after a system boot.
> > +It is in the form of ``struct kvm_stats_header`` as below::
> > +
> > +       #define KVM_STATS_ID_MAXLEN             64
> > +
> > +       struct kvm_stats_header {
> > +               char id[KVM_STATS_ID_MAXLEN];
> > +               __u32 name_size;
> > +               __u32 count;
> > +               __u32 desc_offset;
> > +               __u32 data_offset;
> > +       };
> > +
> > +The ``id`` field is identification for the corresponding KVM statistics. For
> > +KVM statistics, it is in the form of "kvm-{kvm pid}", like "kvm-12345". For
> > +VCPU statistics, it is in the form of "kvm-{kvm pid}/vcpu-{vcpu id}", like
> > +"kvm-12345/vcpu-12".
> > +
> > +The ``name_size`` field is the size (byte) of the statistics name string
> > +(including trailing '\0') appended to the end of every statistics descriptor.
> > +
> > +The ``count`` field is the number of statistics.
> > +
> > +The ``desc_offset`` field is the offset of the Descriptors block from the start
> > +of the file indicated by the file descriptor.
> > +
> > +The ``data_offset`` field is the offset of the Stats Data block from the start
> > +of the file indicated by the file descriptor.
> > +
> > +The Descriptors block is only needed to be read once after a system boot. It is
> > +an array of ``struct kvm_stats_desc`` as below::
> > +
> > +       #define KVM_STATS_TYPE_SHIFT            0
> > +       #define KVM_STATS_TYPE_MASK             (0xF << KVM_STATS_TYPE_SHIFT)
> > +       #define KVM_STATS_TYPE_CUMULATIVE       (0x0 << KVM_STATS_TYPE_SHIFT)
> > +       #define KVM_STATS_TYPE_INSTANT          (0x1 << KVM_STATS_TYPE_SHIFT)
> > +       #define KVM_STATS_TYPE_MAX              KVM_STATS_TYPE_INSTANT
> > +
> > +       #define KVM_STATS_UNIT_SHIFT            4
> > +       #define KVM_STATS_UNIT_MASK             (0xF << KVM_STATS_UNIT_SHIFT)
> > +       #define KVM_STATS_UNIT_NONE             (0x0 << KVM_STATS_UNIT_SHIFT)
> > +       #define KVM_STATS_UNIT_BYTES            (0x1 << KVM_STATS_UNIT_SHIFT)
> > +       #define KVM_STATS_UNIT_SECONDS          (0x2 << KVM_STATS_UNIT_SHIFT)
> > +       #define KVM_STATS_UNIT_CYCLES           (0x3 << KVM_STATS_UNIT_SHIFT)
> > +       #define KVM_STATS_UNIT_MAX              KVM_STATS_UNIT_CYCLES
> > +
> > +       #define KVM_STATS_SCALE_SHIFT           8
> > +       #define KVM_STATS_SCALE_MASK            (0xF << KVM_STATS_SCALE_SHIFT)
> > +       #define KVM_STATS_SCALE_POW10           (0x0 << KVM_STATS_SCALE_SHIFT)
> > +       #define KVM_STATS_SCALE_POW2            (0x1 << KVM_STATS_SCALE_SHIFT)
> > +       #define KVM_STATS_SCALE_MAX             KVM_STATS_SCALE_POW2
> > +
> > +       struct kvm_stats_desc {
> > +               __u32 flags;
> > +               __s16 exponent;
> > +               __u16 size;
> > +               __u32 unused1;
> > +               __u32 unused2;
> > +               char name[0];
> > +       };
> > +
> > +The ``flags`` field contains the type and unit of the statistics data described
> > +by this descriptor. The following flags are supported:
> > +  * ``KVM_STATS_TYPE_CUMULATIVE``
> > +    The statistics data is cumulative. The value of data can only be increased.
> > +    Most of the counters used in KVM are of this type.
> > +    The corresponding ``count`` filed for this type is always 1.
> > +  * ``KVM_STATS_TYPE_INSTANT``
> > +    The statistics data is instantaneous. Its value can be increased or
> > +    decreased. This type is usually used as a measurement of some resources,
> > +    like the number of dirty pages, the number of large pages, etc.
> > +    The corresponding ``count`` field for this type is always 1.
> > +  * ``KVM_STATS_UNIT_NONE``
> > +    There is no unit for the value of statistics data. This usually means that
> > +    the value is a simple counter of an event.
> > +  * ``KVM_STATS_UNIT_BYTES``
> > +    It indicates that the statistics data is used to measure memory size, in the
> > +    unit of Byte, KiByte, MiByte, GiByte, etc. The unit of the data is
> > +    determined by the ``exponent`` field in the descriptor. The
> > +    ``KVM_STATS_SCALE_POW2`` flag is valid in this case. The unit of the data is
> > +    determined by ``pow(2, exponent)``. For example, if value is 10,
> > +    ``exponent`` is 20, which means the unit of statistics data is MiByte, we
> > +    can get the statistics data in the unit of Byte by
> > +    ``value * pow(2, exponent) = 10 * pow(2, 20) = 10 MiByte`` which is
> > +    10 * 1024 * 1024 Bytes.
> > +  * ``KVM_STATS_UNIT_SECONDS``
> > +    It indicates that the statistics data is used to measure time/latency, in
> > +    the unit of nanosecond, microsecond, millisecond and second. The unit of the
> > +    data is determined by the ``exponent`` field in the descriptor. The
> > +    ``KVM_STATS_SCALE_POW10`` flag is valid in this case. The unit of the data
> > +    is determined by ``pow(10, exponent)``. For example, if value is 2000000,
> > +    ``exponent`` is -6, which means the unit of statistics data is microsecond,
> > +    we can get the statistics data in the unit of second by
> > +    ``value * pow(10, exponent) = 2000000 * pow(10, -6) = 2 seconds``.
> > +  * ``KVM_STATS_UNIT_CYCLES``
> > +    It indicates that the statistics data is used to measure CPU clock cycles.
> > +    The ``KVM_STATS_SCALE_POW10`` flag is valid in this case. For example, if
> > +    value is 200, ``exponent`` is 4, we can get the number of CPU clock cycles
> > +    by ``value * pow(10, exponent) = 200 * pow(10, 4) = 2000000``.
> > +
> > +The ``exponent`` field is the scale of corresponding statistics data. It has two
> > +values as follows:
> > +  * ``KVM_STATS_SCALE_POW10``
> > +    The scale is based on power of 10. It is used for measurement of time and
> > +    CPU clock cycles.
> > +  * ``KVM_STATS_SCALE_POW2``
> > +    The scale is based on power of 2. It is used for measurement of memory size.
> > +
> > +The ``size`` field is the number of values of this statistics data. It is in the
> > +unit of ``unsigned long`` for VCPU or ``__u64`` for VM.
>
> Note it is the reverse in the implementation.
Will fix this. Thanks.
>
> > +
> > +The ``unused1`` and ``unused2`` fields are reserved for future
> > +support for other types of statistics data, like log/linear histogram.
> > +
> > +The ``name`` field points to the name string of the statistics data. The name
> > +string starts at the end of ``struct kvm_stats_desc``.
> > +The maximum length (including trailing '\0') is indicated by ``name_size``
> > +in ``struct kvm_stats_header``.
> > +
> > +The Stats Data block contains an array of data values of type ``struct
> > +kvm_vm_stats_data`` or ``struct kvm_vcpu_stats_data``. It would be read by
> > +user space periodically to pull statistics data.
> > +The order of data value in Stats Data block is the same as the order of
> > +descriptors in Descriptors block.
> > +  * Statistics data for VM::
> > +
> > +       struct kvm_vm_stats_data {
> > +               unsigned long value[0];
> > +       };
> > +
> > +  * Statistics data for VCPU::
> > +
> > +       struct kvm_vcpu_stats_data {
> > +               __u64 value[0];
> > +       };
> > +
> >  5. The kvm_run structure
> >  ========================
> >
> > @@ -6891,3 +7054,11 @@ This capability is always enabled.
> >  This capability indicates that the KVM virtual PTP service is
> >  supported in the host. A VMM can check whether the service is
> >  available to the guest on migration.
> > +
> > +8.33 KVM_CAP_STATS_BINARY_FD
> > +----------------------------
> > +
> > +:Architectures: all
> > +
> > +This capability indicates the feature that user space can create get a file
> > +descriptor for every VM and VCPU to read statistics data in binary format.
> > --
> > 2.31.1.751.gd2f1c929bd-goog
> >

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 3/4] KVM: stats: Add documentation for statistics data binary interface
  2021-05-19 19:29     ` Jing Zhang
@ 2021-05-19 20:30       ` Jing Zhang
  0 siblings, 0 replies; 30+ messages in thread
From: Jing Zhang @ 2021-05-19 20:30 UTC (permalink / raw)
  To: David Matlack
  Cc: KVM, KVMARM, LinuxMIPS, KVMPPC, LinuxS390, Linuxkselftest,
	Paolo Bonzini, Marc Zyngier, James Morse, Julien Thierry,
	Suzuki K Poulose, Will Deacon, Huacai Chen, Aleksandar Markovic,
	Thomas Bogendoerfer, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck, Claudio Imbrenda,
	Sean Christopherson, Vitaly Kuznetsov, Jim Mattson, Peter Shier,
	Oliver Upton, David Rientjes, Emanuele Giuseppe Esposito

On Wed, May 19, 2021 at 2:29 PM Jing Zhang <jingzhangos@google.com> wrote:
>
> Hi David,
>
> On Wed, May 19, 2021 at 11:57 AM David Matlack <dmatlack@google.com> wrote:
> >
> > On Mon, May 17, 2021 at 9:25 AM Jing Zhang <jingzhangos@google.com> wrote:
> > >
> > > Update KVM API documentation for binary statistics.
> > >
> > > Signed-off-by: Jing Zhang <jingzhangos@google.com>
> > > ---
> > >  Documentation/virt/kvm/api.rst | 171 +++++++++++++++++++++++++++++++++
> > >  1 file changed, 171 insertions(+)
> > >
> > > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > > index 7fcb2fd38f42..9a6aa9770dfd 100644
> > > --- a/Documentation/virt/kvm/api.rst
> > > +++ b/Documentation/virt/kvm/api.rst
> > > @@ -5034,6 +5034,169 @@ see KVM_XEN_VCPU_SET_ATTR above.
> > >  The KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADJUST type may not be used
> > >  with the KVM_XEN_VCPU_GET_ATTR ioctl.
> > >
> > > +4.130 KVM_STATS_GETFD
> > > +---------------------
> > > +
> > > +:Capability: KVM_CAP_STATS_BINARY_FD
> > > +:Architectures: all
> > > +:Type: vm ioctl, vcpu ioctl
> > > +:Parameters: none
> > > +:Returns: statistics file descriptor on success, < 0 on error
> > > +
> > > +Errors:
> > > +
> > > +  ======     ======================================================
> > > +  ENOMEM     if the fd could not be created due to lack of memory
> > > +  EMFILE     if the number of opened files exceeds the limit
> > > +  ======     ======================================================
> > > +
> > > +The file descriptor can be used to read VM/vCPU statistics data in binary
> > > +format. The file data is organized into three blocks as below:
> > > ++-------------+
> > > +|   Header    |
> > > ++-------------+
> > > +| Descriptors |
> > > ++-------------+
> > > +| Stats Data  |
> > > ++-------------+
> > > +
> > > +The Header block is always at the start of the file. It is only needed to be
> > > +read one time after a system boot.
> >
> > By system boot do you mean the host or the VM? If the host then it's
> > probably just cleaner to omit that part entirely and just say "It is
> > only needed to be read once.".
> >
> Will change "system boot" to "VM boot".
> > > +It is in the form of ``struct kvm_stats_header`` as below::
> > > +
> > > +       #define KVM_STATS_ID_MAXLEN             64
> > > +
> > > +       struct kvm_stats_header {
> > > +               char id[KVM_STATS_ID_MAXLEN];
> > > +               __u32 name_size;
> > > +               __u32 count;
> > > +               __u32 desc_offset;
> > > +               __u32 data_offset;
> > > +       };
> > > +
> > > +The ``id`` field is identification for the corresponding KVM statistics. For
> > > +KVM statistics, it is in the form of "kvm-{kvm pid}", like "kvm-12345". For
> >
> > Should this say "For VM statistics, ..." instead?
> >
> Yes, will fix it.
> > > +VCPU statistics, it is in the form of "kvm-{kvm pid}/vcpu-{vcpu id}", like
> > > +"kvm-12345/vcpu-12".
> > > +
> > > +The ``name_size`` field is the size (byte) of the statistics name string
> > > +(including trailing '\0') appended to the end of every statistics descriptor.
> > > +
> > > +The ``count`` field is the number of statistics.
> > > +
> > > +The ``desc_offset`` field is the offset of the Descriptors block from the start
> > > +of the file indicated by the file descriptor.
> > > +
> > > +The ``data_offset`` field is the offset of the Stats Data block from the start
> > > +of the file indicated by the file descriptor.
> > > +
> > > +The Descriptors block is only needed to be read once after a system boot. It is
> >
> > Ditto here about system boot.
> >
> > > +an array of ``struct kvm_stats_desc`` as below::
> >
> > Consider omitting these macros from the documentation, or moving them
> > to later. Readers right here are expecting to see the struct
> > kvm_stats_desc given the previous line.
> >
> How about changing "as below" to "as shown in below code block"?
> > > +
> > > +       #define KVM_STATS_TYPE_SHIFT            0
> > > +       #define KVM_STATS_TYPE_MASK             (0xF << KVM_STATS_TYPE_SHIFT)
> > > +       #define KVM_STATS_TYPE_CUMULATIVE       (0x0 << KVM_STATS_TYPE_SHIFT)
> > > +       #define KVM_STATS_TYPE_INSTANT          (0x1 << KVM_STATS_TYPE_SHIFT)
> > > +       #define KVM_STATS_TYPE_MAX              KVM_STATS_TYPE_INSTANT
> > > +
> > > +       #define KVM_STATS_UNIT_SHIFT            4
> > > +       #define KVM_STATS_UNIT_MASK             (0xF << KVM_STATS_UNIT_SHIFT)
> > > +       #define KVM_STATS_UNIT_NONE             (0x0 << KVM_STATS_UNIT_SHIFT)
> > > +       #define KVM_STATS_UNIT_BYTES            (0x1 << KVM_STATS_UNIT_SHIFT)
> > > +       #define KVM_STATS_UNIT_SECONDS          (0x2 << KVM_STATS_UNIT_SHIFT)
> > > +       #define KVM_STATS_UNIT_CYCLES           (0x3 << KVM_STATS_UNIT_SHIFT)
> > > +       #define KVM_STATS_UNIT_MAX              KVM_STATS_UNIT_CYCLES
> > > +
> > > +       #define KVM_STATS_SCALE_SHIFT           8
> > > +       #define KVM_STATS_SCALE_MASK            (0xF << KVM_STATS_SCALE_SHIFT)
> > > +       #define KVM_STATS_SCALE_POW10           (0x0 << KVM_STATS_SCALE_SHIFT)
> > > +       #define KVM_STATS_SCALE_POW2            (0x1 << KVM_STATS_SCALE_SHIFT)
> > > +       #define KVM_STATS_SCALE_MAX             KVM_STATS_SCALE_POW2
> >
> > Terminology nit: I think usually this part is called the "base". e.g.
> > when you decompose a number X into N * B^E, B is the "base" and E is
> > the "exponent". I see you're using "exponent" already but it might
> > make sense to change "scale" to "base" throughout this series.
> >
> Will change "SCALE" to "SCALE_BASE".
> > > +
> > > +       struct kvm_stats_desc {
> > > +               __u32 flags;
> > > +               __s16 exponent;
> > > +               __u16 size;
> > > +               __u32 unused1;
> > > +               __u32 unused2;
> > > +               char name[0];
> > > +       };
> > > +
> > > +The ``flags`` field contains the type and unit of the statistics data described
> > > +by this descriptor. The following flags are supported:
> >
> > nit: Suggest breaking this list out into separate lists so readers can
> > differentiate between the type, unit, and scale. Something like:
> >
> > Bits 0-3 of ``flags`` encode the type:
> >
> > * ``KVM_STATS_TYPE_CUMULATIVE`` ...
> > * ``KVM_STATS_TYPE_INSTANT`` ...
> >
> > Bits 4-7 of ``flags encode the unit:
> >
> > * ``KVM_STATS_UNIT_NONE`` ...
> > ...
> > etc.
> >
> Good suggestion. Will do that.
> > > +  * ``KVM_STATS_TYPE_CUMULATIVE``
> > > +    The statistics data is cumulative. The value of data can only be increased.
> > > +    Most of the counters used in KVM are of this type.
> > > +    The corresponding ``count`` filed for this type is always 1.
> > > +  * ``KVM_STATS_TYPE_INSTANT``
> > > +    The statistics data is instantaneous. Its value can be increased or
> > > +    decreased. This type is usually used as a measurement of some resources,
> > > +    like the number of dirty pages, the number of large pages, etc.
> > > +    The corresponding ``count`` field for this type is always 1.
> > > +  * ``KVM_STATS_UNIT_NONE``
> > > +    There is no unit for the value of statistics data. This usually means that
> > > +    the value is a simple counter of an event.
> > > +  * ``KVM_STATS_UNIT_BYTES``
> > > +    It indicates that the statistics data is used to measure memory size, in the
> > > +    unit of Byte, KiByte, MiByte, GiByte, etc. The unit of the data is
> > > +    determined by the ``exponent`` field in the descriptor. The
> > > +    ``KVM_STATS_SCALE_POW2`` flag is valid in this case. The unit of the data is
> > > +    determined by ``pow(2, exponent)``. For example, if value is 10,
> > > +    ``exponent`` is 20, which means the unit of statistics data is MiByte, we
> > > +    can get the statistics data in the unit of Byte by
> > > +    ``value * pow(2, exponent) = 10 * pow(2, 20) = 10 MiByte`` which is
> > > +    10 * 1024 * 1024 Bytes.
> > > +  * ``KVM_STATS_UNIT_SECONDS``
> > > +    It indicates that the statistics data is used to measure time/latency, in
> > > +    the unit of nanosecond, microsecond, millisecond and second. The unit of the
> > > +    data is determined by the ``exponent`` field in the descriptor. The
> > > +    ``KVM_STATS_SCALE_POW10`` flag is valid in this case. The unit of the data
> > > +    is determined by ``pow(10, exponent)``. For example, if value is 2000000,
> > > +    ``exponent`` is -6, which means the unit of statistics data is microsecond,
> > > +    we can get the statistics data in the unit of second by
> > > +    ``value * pow(10, exponent) = 2000000 * pow(10, -6) = 2 seconds``.
> > > +  * ``KVM_STATS_UNIT_CYCLES``
> > > +    It indicates that the statistics data is used to measure CPU clock cycles.
> > > +    The ``KVM_STATS_SCALE_POW10`` flag is valid in this case. For example, if
> > > +    value is 200, ``exponent`` is 4, we can get the number of CPU clock cycles
> > > +    by ``value * pow(10, exponent) = 200 * pow(10, 4) = 2000000``.
> > > +
> > > +The ``exponent`` field is the scale of corresponding statistics data. It has two
> > > +values as follows:
> > > +  * ``KVM_STATS_SCALE_POW10``
> >
> > I thought the scale was encoded in ``flags`` not ``exponent``? Isn't
> > the exponent the
> >
> The base is encoded in ``flags``, not the exponent.
The description about ``exponent`` is not right here. Will fix it.
> > > +    The scale is based on power of 10. It is used for measurement of time and
> > > +    CPU clock cycles.
> > > +  * ``KVM_STATS_SCALE_POW2``
> > > +    The scale is based on power of 2. It is used for measurement of memory size.
> >
> > It might be useful to give an example of how to use the exponent field
> > in practice.
> >
> Those examples where we discuss ``flags`` field also cover the usage
> of exponent field.
Will add an example here too.
> > > +
> > > +The ``size`` field is the number of values of this statistics data. It is in the
> > > +unit of ``unsigned long`` for VCPU or ``__u64`` for VM.
> > > +
> > > +The ``unused1`` and ``unused2`` fields are reserved for future
> > > +support for other types of statistics data, like log/linear histogram.
> > > +
> > > +The ``name`` field points to the name string of the statistics data. The name
> > > +string starts at the end of ``struct kvm_stats_desc``.
> > > +The maximum length (including trailing '\0') is indicated by ``name_size``
> > > +in ``struct kvm_stats_header``.
> > > +
> > > +The Stats Data block contains an array of data values of type ``struct
> > > +kvm_vm_stats_data`` or ``struct kvm_vcpu_stats_data``. It would be read by
> > > +user space periodically to pull statistics data.
> > > +The order of data value in Stats Data block is the same as the order of
> > > +descriptors in Descriptors block.
> > > +  * Statistics data for VM::
> > > +
> > > +       struct kvm_vm_stats_data {
> > > +               unsigned long value[0];
> > > +       };
> > > +
> > > +  * Statistics data for VCPU::
> > > +
> > > +       struct kvm_vcpu_stats_data {
> > > +               __u64 value[0];
> > > +       };
> > > +
> > >  5. The kvm_run structure
> > >  ========================
> > >
> > > @@ -6891,3 +7054,11 @@ This capability is always enabled.
> > >  This capability indicates that the KVM virtual PTP service is
> > >  supported in the host. A VMM can check whether the service is
> > >  available to the guest on migration.
> > > +
> > > +8.33 KVM_CAP_STATS_BINARY_FD
> > > +----------------------------
> > > +
> > > +:Architectures: all
> > > +
> > > +This capability indicates the feature that user space can create get a file
> > > +descriptor for every VM and VCPU to read statistics data in binary format.
> > > --
> > > 2.31.1.751.gd2f1c929bd-goog
> > >
> Thanks,
> Jing

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 4/4] KVM: selftests: Add selftest for KVM statistics data binary interface
  2021-05-17 14:53 ` [PATCH v5 4/4] KVM: selftests: Add selftest for KVM " Jing Zhang
  2021-05-19 17:21   ` David Matlack
@ 2021-05-19 22:00   ` Ricardo Koller
  2021-05-19 22:54     ` Jing Zhang
  2021-05-20 21:30     ` Jing Zhang
  1 sibling, 2 replies; 30+ messages in thread
From: Ricardo Koller @ 2021-05-19 22:00 UTC (permalink / raw)
  To: Jing Zhang
  Cc: KVM, KVMARM, LinuxMIPS, KVMPPC, LinuxS390, Linuxkselftest,
	Paolo Bonzini, Marc Zyngier, James Morse, Julien Thierry,
	Suzuki K Poulose, Will Deacon, Huacai Chen, Aleksandar Markovic,
	Thomas Bogendoerfer, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck, Claudio Imbrenda,
	Sean Christopherson, Vitaly Kuznetsov, Jim Mattson, Peter Shier,
	Oliver Upton, David Rientjes, Emanuele Giuseppe Esposito

On Mon, May 17, 2021 at 02:53:14PM +0000, Jing Zhang wrote:
> Add selftest to check KVM stats descriptors validity.
> 
> Signed-off-by: Jing Zhang <jingzhangos@google.com>
> ---
>  tools/testing/selftests/kvm/.gitignore        |   1 +
>  tools/testing/selftests/kvm/Makefile          |   3 +
>  .../testing/selftests/kvm/include/kvm_util.h  |   3 +
>  .../selftests/kvm/kvm_bin_form_stats.c        | 379 ++++++++++++++++++
>  tools/testing/selftests/kvm/lib/kvm_util.c    |  12 +
>  5 files changed, 398 insertions(+)
>  create mode 100644 tools/testing/selftests/kvm/kvm_bin_form_stats.c
> 
> diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
> index bd83158e0e0b..35796667c944 100644
> --- a/tools/testing/selftests/kvm/.gitignore
> +++ b/tools/testing/selftests/kvm/.gitignore
> @@ -43,3 +43,4 @@
>  /memslot_modification_stress_test
>  /set_memory_region_test
>  /steal_time
> +/kvm_bin_form_stats
> diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
> index e439d027939d..2984c86c848a 100644
> --- a/tools/testing/selftests/kvm/Makefile
> +++ b/tools/testing/selftests/kvm/Makefile
> @@ -76,6 +76,7 @@ TEST_GEN_PROGS_x86_64 += kvm_page_table_test
>  TEST_GEN_PROGS_x86_64 += memslot_modification_stress_test
>  TEST_GEN_PROGS_x86_64 += set_memory_region_test
>  TEST_GEN_PROGS_x86_64 += steal_time
> +TEST_GEN_PROGS_x86_64 += kvm_bin_form_stats
>  
>  TEST_GEN_PROGS_aarch64 += aarch64/get-reg-list
>  TEST_GEN_PROGS_aarch64 += aarch64/get-reg-list-sve
> @@ -87,6 +88,7 @@ TEST_GEN_PROGS_aarch64 += kvm_create_max_vcpus
>  TEST_GEN_PROGS_aarch64 += kvm_page_table_test
>  TEST_GEN_PROGS_aarch64 += set_memory_region_test
>  TEST_GEN_PROGS_aarch64 += steal_time
> +TEST_GEN_PROGS_aarch64 += kvm_bin_form_stats
>  
>  TEST_GEN_PROGS_s390x = s390x/memop
>  TEST_GEN_PROGS_s390x += s390x/resets
> @@ -96,6 +98,7 @@ TEST_GEN_PROGS_s390x += dirty_log_test
>  TEST_GEN_PROGS_s390x += kvm_create_max_vcpus
>  TEST_GEN_PROGS_s390x += kvm_page_table_test
>  TEST_GEN_PROGS_s390x += set_memory_region_test
> +TEST_GEN_PROGS_s390x += kvm_bin_form_stats
>  
>  TEST_GEN_PROGS += $(TEST_GEN_PROGS_$(UNAME_M))
>  LIBKVM += $(LIBKVM_$(UNAME_M))
> diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
> index a8f022794ce3..ee01a67022d9 100644
> --- a/tools/testing/selftests/kvm/include/kvm_util.h
> +++ b/tools/testing/selftests/kvm/include/kvm_util.h
> @@ -387,4 +387,7 @@ uint64_t get_ucall(struct kvm_vm *vm, uint32_t vcpu_id, struct ucall *uc);
>  #define GUEST_ASSERT_4(_condition, arg1, arg2, arg3, arg4) \
>  	__GUEST_ASSERT((_condition), 4, (arg1), (arg2), (arg3), (arg4))
>  
> +int vm_get_statsfd(struct kvm_vm *vm);
> +int vcpu_get_statsfd(struct kvm_vm *vm, uint32_t vcpuid);
> +
>  #endif /* SELFTEST_KVM_UTIL_H */
> diff --git a/tools/testing/selftests/kvm/kvm_bin_form_stats.c b/tools/testing/selftests/kvm/kvm_bin_form_stats.c
> new file mode 100644
> index 000000000000..dae44397d0f4
> --- /dev/null
> +++ b/tools/testing/selftests/kvm/kvm_bin_form_stats.c
> @@ -0,0 +1,379 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * kvm_bin_form_stats
> + *
> + * Copyright (C) 2021, Google LLC.
> + *
> + * Test the fd-based interface for KVM statistics.
> + */
> +
> +#define _GNU_SOURCE /* for program_invocation_short_name */
> +#include <fcntl.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <errno.h>
> +
> +#include "test_util.h"
> +
> +#include "kvm_util.h"
> +#include "asm/kvm.h"
> +#include "linux/kvm.h"
> +
> +int vm_stats_test(struct kvm_vm *vm)
> +{
> +	ssize_t ret;
> +	int i, stats_fd, err = -1;
> +	size_t size_desc, size_data = 0;
> +	struct kvm_stats_header header;
> +	struct kvm_stats_desc *stats_desc, *pdesc;
> +	struct kvm_vm_stats_data *stats_data;
> +
> +	/* Get fd for VM stats */
> +	stats_fd = vm_get_statsfd(vm);
> +	if (stats_fd < 0) {
> +		perror("Get VM stats fd");
> +		return err;
> +	}

It seems that the only difference between vm_stats_test and
vcpu_stats_test is what function to use for getting the fd.  If that's
the case, it might be better to move all the checks to a common
function.

> +	/* Read kvm vm stats header */
> +	ret = read(stats_fd, &header, sizeof(header));
> +	if (ret != sizeof(header)) {
> +		perror("Read VM stats header");
> +		goto out_close_fd;
> +	}
> +	size_desc = sizeof(*stats_desc) + header.name_size;
> +	/* Check id string in header, that should start with "kvm" */
> +	if (strncmp(header.id, "kvm", 3) ||
> +			strlen(header.id) >= KVM_STATS_ID_MAXLEN) {
> +		printf("Invalid KVM VM stats type!\n");
> +		goto out_close_fd;
> +	}
> +	/* Sanity check for other fields in header */
> +	if (header.count == 0) {
> +		err = 0;
> +		goto out_close_fd;
> +	}

As mentioned by David, it would be better to replace the checks with
TEST_ASSERT's. Most other selftests rely on TEST_ASSERT.

> +	/* Check overlap */
> +	if (header.desc_offset == 0 || header.data_offset == 0 ||
> +			header.desc_offset < sizeof(header) ||
> +			header.data_offset < sizeof(header)) {
> +		printf("Invalid offset fields in header!\n");
> +		goto out_close_fd;
> +	}
> +	if (header.desc_offset < header.data_offset &&
> +			(header.desc_offset + size_desc * header.count >
> +			header.data_offset)) {

Could you make the check more strict?

TEST_ASSERT(header.desc_offset + size_desc * header.count == header.data_offset,
	"The data block should be at the end of the descriptor block.");

> +		printf("VM Descriptor block is overlapped with data block!\n");
> +		goto out_close_fd;
> +	}
> +
> +	/* Allocate memory for stats descriptors */
> +	stats_desc = calloc(header.count, size_desc);
> +	if (!stats_desc) {
> +		perror("Allocate memory for VM stats descriptors");
> +		goto out_close_fd;
> +	}
> +	/* Read kvm vm stats descriptors */
> +	ret = pread(stats_fd, stats_desc,
> +			size_desc * header.count, header.desc_offset);

You could stress kvm_vm_stats_read() more by calling pread for more
offsets. For example, for every descriptor:

	pread(..., header.desc_offset + i * size_desc)

I realize that the typical usage will be to read once for all
descriptors. But kvm_vm_stats_read (and kvm_vcpu_stats_read) need to
handle any offset, and doing so seems to be quite complicated.

Actually, you could stress kvm_vm_stats_read() even more by calling it
for _every_ possible offset (and eventually invalid offsets and sizes).
One easier way to check this is by calling read all descriptors into
some reference buffer using a single pread, and then call it for all
offsets while comparing against the reference buf.

> +	if (ret != size_desc * header.count) {
> +		perror("Read KVM VM stats descriptors");
> +		goto out_free_desc;
> +	}
> +	/* Sanity check for fields in descriptors */
> +	for (i = 0; i < header.count; ++i) {
> +		pdesc = (void *)stats_desc + i * size_desc;

cast to (struct kvm_stats_desc *)

> +		/* Check type,unit,scale boundaries */
> +		if ((pdesc->flags & KVM_STATS_TYPE_MASK) > KVM_STATS_TYPE_MAX) {
> +			printf("Unknown KVM stats type!\n");
> +			goto out_free_desc;
> +		}
> +		if ((pdesc->flags & KVM_STATS_UNIT_MASK) > KVM_STATS_UNIT_MAX) {
> +			printf("Unknown KVM stats unit!\n");
> +			goto out_free_desc;
> +		}
> +		if ((pdesc->flags & KVM_STATS_SCALE_MASK) >
> +				KVM_STATS_SCALE_MAX) {
> +			printf("Unknown KVM stats scale!\n");
> +			goto out_free_desc;
> +		}
> +		/* Check exponent for stats unit
> +		 * Exponent for counter should be greater than or equal to 0
> +		 * Exponent for unit bytes should be greater than or equal to 0
> +		 * Exponent for unit seconds should be less than or equal to 0
> +		 * Exponent for unit clock cycles should be greater than or
> +		 * equal to 0
> +		 */
> +		switch (pdesc->flags & KVM_STATS_UNIT_MASK) {
> +		case KVM_STATS_UNIT_NONE:
> +		case KVM_STATS_UNIT_BYTES:
> +		case KVM_STATS_UNIT_CYCLES:
> +			if (pdesc->exponent < 0) {
> +				printf("Unsupported KVM stats unit!\n");
> +				goto out_free_desc;
> +			}
> +			break;
> +		case KVM_STATS_UNIT_SECONDS:
> +			if (pdesc->exponent > 0) {
> +				printf("Unsupported KVM stats unit!\n");
> +				goto out_free_desc;
> +			}
> +			break;

		default:
			TEST_FAIL("Unexpected unit ...");

> +		}
> +		/* Check name string */
> +		if (strlen(pdesc->name) >= header.name_size) {
> +			printf("KVM stats name(%s) too long!\n", pdesc->name);
> +			goto out_free_desc;
> +		}

Tighter check:

TEST_ASSERT(header.name_size > 0 &&
	strlen(pdesc->name) + 1 == header.name_size);

> +		/* Check size field, which should not be zero */
> +		if (pdesc->size == 0) {
> +			printf("KVM descriptor(%s) with size of 0!\n",
> +					pdesc->name);
> +			goto out_free_desc;
> +		}
> +		size_data += pdesc->size * sizeof(stats_data->value[0]);
> +	}
> +	/* Check overlap */
> +	if (header.data_offset < header.desc_offset &&
> +		header.data_offset + size_data > header.desc_offset) {
> +		printf("Data block is overlapped with Descriptor block!\n");
> +		goto out_free_desc;
> +	}

This won't be needed if you use the suggested TEST_ASSERT (the other
overlap check).

> +	/* Check validity of all stats data size */
> +	if (size_data < header.count * sizeof(stats_data->value[0])) {
> +		printf("Data size is not correct!\n");
> +		goto out_free_desc;
> +	}

Tighter check:

TEST_ASSERT(size_data == header.count * stats_data->value[0]);

> +
> +	/* Allocate memory for stats data */
> +	stats_data = malloc(size_data);
> +	if (!stats_data) {
> +		perror("Allocate memory for VM stats data");
> +		goto out_free_desc;
> +	}
> +	/* Read kvm vm stats data */
> +	ret = pread(stats_fd, stats_data, size_data, header.data_offset);
> +	if (ret != size_data) {
> +		perror("Read KVM VM stats data");
> +		goto out_free_data;
> +	}
> +
> +	err = 0;
> +out_free_data:
> +	free(stats_data);
> +out_free_desc:
> +	free(stats_desc);
> +out_close_fd:
> +	close(stats_fd);
> +	return err;
> +}
> +
> +int vcpu_stats_test(struct kvm_vm *vm, int vcpu_id)
> +{
> +	ssize_t ret;
> +	int i, stats_fd, err = -1;
> +	size_t size_desc, size_data = 0;
> +	struct kvm_stats_header header;
> +	struct kvm_stats_desc *stats_desc, *pdesc;
> +	struct kvm_vcpu_stats_data *stats_data;
> +
> +	/* Get fd for VCPU stats */
> +	stats_fd = vcpu_get_statsfd(vm, vcpu_id);
> +	if (stats_fd < 0) {
> +		perror("Get VCPU stats fd");
> +		return err;
> +	}
> +	/* Read kvm vcpu stats header */
> +	ret = read(stats_fd, &header, sizeof(header));
> +	if (ret != sizeof(header)) {
> +		perror("Read VCPU stats header");
> +		goto out_close_fd;
> +	}
> +	size_desc = sizeof(*stats_desc) + header.name_size;
> +	/* Check id string in header, that should start with "kvm" */
> +	if (strncmp(header.id, "kvm", 3) ||
> +			strlen(header.id) >= KVM_STATS_ID_MAXLEN) {
> +		printf("Invalid KVM VCPU stats type!\n");
> +		goto out_close_fd;
> +	}
> +	/* Sanity check for other fields in header */
> +	if (header.count == 0) {
> +		err = 0;
> +		goto out_close_fd;
> +	}
> +	/* Check overlap */
> +	if (header.desc_offset == 0 || header.data_offset == 0 ||
> +			header.desc_offset < sizeof(header) ||
> +			header.data_offset < sizeof(header)) {
> +		printf("Invalid offset fields in header!\n");
> +		goto out_close_fd;
> +	}
> +	if (header.desc_offset < header.data_offset &&
> +			(header.desc_offset + size_desc * header.count >
> +			header.data_offset)) {
> +		printf("VCPU Descriptor block is overlapped with data block!\n");
> +		goto out_close_fd;
> +	}

Same as above (tighter check).

> +
> +	/* Allocate memory for stats descriptors */
> +	stats_desc = calloc(header.count, size_desc);
> +	if (!stats_desc) {
> +		perror("Allocate memory for VCPU stats descriptors");
> +		goto out_close_fd;
> +	}
> +	/* Read kvm vcpu stats descriptors */
> +	ret = pread(stats_fd, stats_desc,
> +			size_desc * header.count, header.desc_offset);
> +	if (ret != size_desc * header.count) {
> +		perror("Read KVM VCPU stats descriptors");
> +		goto out_free_desc;
> +	}
> +	/* Sanity check for fields in descriptors */
> +	for (i = 0; i < header.count; ++i) {
> +		pdesc = (void *)stats_desc + i * size_desc;
> +		/* Check boundaries */
> +		if ((pdesc->flags & KVM_STATS_TYPE_MASK) > KVM_STATS_TYPE_MAX) {
> +			printf("Unknown KVM stats type!\n");
> +			goto out_free_desc;
> +		}
> +		if ((pdesc->flags & KVM_STATS_UNIT_MASK) > KVM_STATS_UNIT_MAX) {
> +			printf("Unknown KVM stats unit!\n");
> +			goto out_free_desc;
> +		}
> +		if ((pdesc->flags & KVM_STATS_SCALE_MASK) >
> +				KVM_STATS_SCALE_MAX) {
> +			printf("Unknown KVM stats scale!\n");
> +			goto out_free_desc;
> +		}
> +		/* Check exponent for stats unit
> +		 * Exponent for counter should be greater than or equal to 0
> +		 * Exponent for unit bytes should be greater than or equal to 0
> +		 * Exponent for unit seconds should be less than or equal to 0
> +		 * Exponent for unit clock cycles should be greater than or
> +		 * equal to 0
> +		 */
> +		switch (pdesc->flags & KVM_STATS_UNIT_MASK) {
> +		case KVM_STATS_UNIT_NONE:
> +		case KVM_STATS_UNIT_BYTES:
> +		case KVM_STATS_UNIT_CYCLES:
> +			if (pdesc->exponent < 0) {
> +				printf("Unsupported KVM stats unit!\n");
> +				goto out_free_desc;
> +			}
> +			break;
> +		case KVM_STATS_UNIT_SECONDS:
> +			if (pdesc->exponent > 0) {
> +				printf("Unsupported KVM stats unit!\n");
> +				goto out_free_desc;
> +			}
> +			break;
> +		}
> +		/* Check name string */
> +		if (strlen(pdesc->name) >= header.name_size) {
> +			printf("KVM stats name(%s) too long!\n", pdesc->name);
> +			goto out_free_desc;
> +		}
> +		/* Check size field, which should not be zero */
> +		if (pdesc->size == 0) {
> +			printf("KVM descriptor(%s) with size of 0!\n",
> +					pdesc->name);
> +			goto out_free_desc;
> +		}
> +		size_data += pdesc->size * sizeof(stats_data->value[0]);
> +	}
> +	/* Check overlap */
> +	if (header.data_offset < header.desc_offset &&
> +		header.data_offset + size_data > header.desc_offset) {
> +		printf("Data block is overlapped with Descriptor block!\n");
> +		goto out_free_desc;
> +	}
> +	/* Check validity of all stats data size */
> +	if (size_data < header.count * sizeof(stats_data->value[0])) {
> +		printf("Data size is not correct!\n");
> +		goto out_free_desc;
> +	}
> +
> +	/* Allocate memory for stats data */
> +	stats_data = malloc(size_data);
> +	if (!stats_data) {
> +		perror("Allocate memory for VCPU stats data");
> +		goto out_free_desc;
> +	}
> +	/* Read kvm vcpu stats data */
> +	ret = pread(stats_fd, stats_data, size_data, header.data_offset);
> +	if (ret != size_data) {
> +		perror("Read KVM VCPU stats data");
> +		goto out_free_data;
> +	}
> +
> +	err = 0;
> +out_free_data:
> +	free(stats_data);
> +out_free_desc:
> +	free(stats_desc);
> +out_close_fd:
> +	close(stats_fd);
> +	return err;
> +}
> +
> +/*
> + * Usage: kvm_bin_form_stats [#vm] [#vcpu]
> + * The first parameter #vm set the number of VMs being created.
> + * The second parameter #vcpu set the number of VCPUs being created.
> + * By default, 1 VM and 1 VCPU for the VM would be created for testing.
> + */
> +
> +int main(int argc, char *argv[])
> +{
> +	int max_vm = 1, max_vcpu = 1, ret, i, j, err = -1;
> +	struct kvm_vm **vms;
> +
> +	/* Get the number of VMs and VCPUs that would be created for testing. */
> +	if (argc > 1) {
> +		max_vm = strtol(argv[1], NULL, 0);
> +		if (max_vm <= 0)
> +			max_vm = 1;
> +	}
> +	if (argc > 2) {
> +		max_vcpu = strtol(argv[2], NULL, 0);
> +		if (max_vcpu <= 0)
> +			max_vcpu = 1;
> +	}
> +
> +	/* Check the extension for binary stats */
> +	ret = kvm_check_cap(KVM_CAP_STATS_BINARY_FD);
> +	if (ret < 0) {
> +		printf("Binary form statistics interface is not supported!\n");
> +		return err;
> +	}
> +
> +	/* Create VMs and VCPUs */
> +	vms = malloc(sizeof(vms[0]) * max_vm);
> +	if (!vms) {
> +		perror("Allocate memory for storing VM pointers");
> +		return err;
> +	}
> +	for (i = 0; i < max_vm; ++i) {
> +		vms[i] = vm_create(VM_MODE_DEFAULT,
> +				DEFAULT_GUEST_PHY_PAGES, O_RDWR);
> +		for (j = 0; j < max_vcpu; ++j)
> +			vm_vcpu_add(vms[i], j);
> +	}
> +
> +	/* Check stats read for every VM and VCPU */
> +	for (i = 0; i < max_vm; ++i) {
> +		if (vm_stats_test(vms[i]))
> +			goto out_free_vm;
> +		for (j = 0; j < max_vcpu; ++j) {
> +			if (vcpu_stats_test(vms[i], j))
> +				goto out_free_vm;
> +		}
> +	}
> +
> +	err = 0;
> +out_free_vm:
> +	for (i = 0; i < max_vm; ++i)
> +		kvm_vm_free(vms[i]);
> +	free(vms);
> +	return err;
> +}
> diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
> index fc83f6c5902d..d9e0b2c8b906 100644
> --- a/tools/testing/selftests/kvm/lib/kvm_util.c
> +++ b/tools/testing/selftests/kvm/lib/kvm_util.c
> @@ -2090,3 +2090,15 @@ unsigned int vm_calc_num_guest_pages(enum vm_guest_mode mode, size_t size)
>  	n = DIV_ROUND_UP(size, vm_guest_mode_params[mode].page_size);
>  	return vm_adjust_num_guest_pages(mode, n);
>  }
> +
> +int vm_get_statsfd(struct kvm_vm *vm)
> +{
> +	return ioctl(vm->fd, KVM_STATS_GETFD, NULL);
> +}
> +
> +int vcpu_get_statsfd(struct kvm_vm *vm, uint32_t vcpuid)
> +{
> +	struct vcpu *vcpu = vcpu_find(vm, vcpuid);
> +
> +	return ioctl(vcpu->fd, KVM_STATS_GETFD, NULL);
> +}
> -- 
> 2.31.1.751.gd2f1c929bd-goog
> 
> _______________________________________________
> kvmarm mailing list
> kvmarm@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 4/4] KVM: selftests: Add selftest for KVM statistics data binary interface
  2021-05-19 22:00   ` Ricardo Koller
@ 2021-05-19 22:54     ` Jing Zhang
  2021-05-20 21:30     ` Jing Zhang
  1 sibling, 0 replies; 30+ messages in thread
From: Jing Zhang @ 2021-05-19 22:54 UTC (permalink / raw)
  To: Ricardo Koller
  Cc: KVM, KVMARM, LinuxMIPS, KVMPPC, LinuxS390, Linuxkselftest,
	Paolo Bonzini, Marc Zyngier, James Morse, Julien Thierry,
	Suzuki K Poulose, Will Deacon, Huacai Chen, Aleksandar Markovic,
	Thomas Bogendoerfer, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck, Claudio Imbrenda,
	Sean Christopherson, Vitaly Kuznetsov, Jim Mattson, Peter Shier,
	Oliver Upton, David Rientjes, Emanuele Giuseppe Esposito

Hi Ricardo,

On Wed, May 19, 2021 at 5:00 PM Ricardo Koller <ricarkol@google.com> wrote:
>
> On Mon, May 17, 2021 at 02:53:14PM +0000, Jing Zhang wrote:
> > Add selftest to check KVM stats descriptors validity.
> >
> > Signed-off-by: Jing Zhang <jingzhangos@google.com>
> > ---
> >  tools/testing/selftests/kvm/.gitignore        |   1 +
> >  tools/testing/selftests/kvm/Makefile          |   3 +
> >  .../testing/selftests/kvm/include/kvm_util.h  |   3 +
> >  .../selftests/kvm/kvm_bin_form_stats.c        | 379 ++++++++++++++++++
> >  tools/testing/selftests/kvm/lib/kvm_util.c    |  12 +
> >  5 files changed, 398 insertions(+)
> >  create mode 100644 tools/testing/selftests/kvm/kvm_bin_form_stats.c
> >
> > diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
> > index bd83158e0e0b..35796667c944 100644
> > --- a/tools/testing/selftests/kvm/.gitignore
> > +++ b/tools/testing/selftests/kvm/.gitignore
> > @@ -43,3 +43,4 @@
> >  /memslot_modification_stress_test
> >  /set_memory_region_test
> >  /steal_time
> > +/kvm_bin_form_stats
> > diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
> > index e439d027939d..2984c86c848a 100644
> > --- a/tools/testing/selftests/kvm/Makefile
> > +++ b/tools/testing/selftests/kvm/Makefile
> > @@ -76,6 +76,7 @@ TEST_GEN_PROGS_x86_64 += kvm_page_table_test
> >  TEST_GEN_PROGS_x86_64 += memslot_modification_stress_test
> >  TEST_GEN_PROGS_x86_64 += set_memory_region_test
> >  TEST_GEN_PROGS_x86_64 += steal_time
> > +TEST_GEN_PROGS_x86_64 += kvm_bin_form_stats
> >
> >  TEST_GEN_PROGS_aarch64 += aarch64/get-reg-list
> >  TEST_GEN_PROGS_aarch64 += aarch64/get-reg-list-sve
> > @@ -87,6 +88,7 @@ TEST_GEN_PROGS_aarch64 += kvm_create_max_vcpus
> >  TEST_GEN_PROGS_aarch64 += kvm_page_table_test
> >  TEST_GEN_PROGS_aarch64 += set_memory_region_test
> >  TEST_GEN_PROGS_aarch64 += steal_time
> > +TEST_GEN_PROGS_aarch64 += kvm_bin_form_stats
> >
> >  TEST_GEN_PROGS_s390x = s390x/memop
> >  TEST_GEN_PROGS_s390x += s390x/resets
> > @@ -96,6 +98,7 @@ TEST_GEN_PROGS_s390x += dirty_log_test
> >  TEST_GEN_PROGS_s390x += kvm_create_max_vcpus
> >  TEST_GEN_PROGS_s390x += kvm_page_table_test
> >  TEST_GEN_PROGS_s390x += set_memory_region_test
> > +TEST_GEN_PROGS_s390x += kvm_bin_form_stats
> >
> >  TEST_GEN_PROGS += $(TEST_GEN_PROGS_$(UNAME_M))
> >  LIBKVM += $(LIBKVM_$(UNAME_M))
> > diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
> > index a8f022794ce3..ee01a67022d9 100644
> > --- a/tools/testing/selftests/kvm/include/kvm_util.h
> > +++ b/tools/testing/selftests/kvm/include/kvm_util.h
> > @@ -387,4 +387,7 @@ uint64_t get_ucall(struct kvm_vm *vm, uint32_t vcpu_id, struct ucall *uc);
> >  #define GUEST_ASSERT_4(_condition, arg1, arg2, arg3, arg4) \
> >       __GUEST_ASSERT((_condition), 4, (arg1), (arg2), (arg3), (arg4))
> >
> > +int vm_get_statsfd(struct kvm_vm *vm);
> > +int vcpu_get_statsfd(struct kvm_vm *vm, uint32_t vcpuid);
> > +
> >  #endif /* SELFTEST_KVM_UTIL_H */
> > diff --git a/tools/testing/selftests/kvm/kvm_bin_form_stats.c b/tools/testing/selftests/kvm/kvm_bin_form_stats.c
> > new file mode 100644
> > index 000000000000..dae44397d0f4
> > --- /dev/null
> > +++ b/tools/testing/selftests/kvm/kvm_bin_form_stats.c
> > @@ -0,0 +1,379 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * kvm_bin_form_stats
> > + *
> > + * Copyright (C) 2021, Google LLC.
> > + *
> > + * Test the fd-based interface for KVM statistics.
> > + */
> > +
> > +#define _GNU_SOURCE /* for program_invocation_short_name */
> > +#include <fcntl.h>
> > +#include <stdio.h>
> > +#include <stdlib.h>
> > +#include <string.h>
> > +#include <errno.h>
> > +
> > +#include "test_util.h"
> > +
> > +#include "kvm_util.h"
> > +#include "asm/kvm.h"
> > +#include "linux/kvm.h"
> > +
> > +int vm_stats_test(struct kvm_vm *vm)
> > +{
> > +     ssize_t ret;
> > +     int i, stats_fd, err = -1;
> > +     size_t size_desc, size_data = 0;
> > +     struct kvm_stats_header header;
> > +     struct kvm_stats_desc *stats_desc, *pdesc;
> > +     struct kvm_vm_stats_data *stats_data;
> > +
> > +     /* Get fd for VM stats */
> > +     stats_fd = vm_get_statsfd(vm);
> > +     if (stats_fd < 0) {
> > +             perror("Get VM stats fd");
> > +             return err;
> > +     }
>
> It seems that the only difference between vm_stats_test and
> vcpu_stats_test is what function to use for getting the fd.  If that's
> the case, it might be better to move all the checks to a common
> function.
>
Sure, will do that.
> > +     /* Read kvm vm stats header */
> > +     ret = read(stats_fd, &header, sizeof(header));
> > +     if (ret != sizeof(header)) {
> > +             perror("Read VM stats header");
> > +             goto out_close_fd;
> > +     }
> > +     size_desc = sizeof(*stats_desc) + header.name_size;
> > +     /* Check id string in header, that should start with "kvm" */
> > +     if (strncmp(header.id, "kvm", 3) ||
> > +                     strlen(header.id) >= KVM_STATS_ID_MAXLEN) {
> > +             printf("Invalid KVM VM stats type!\n");
> > +             goto out_close_fd;
> > +     }
> > +     /* Sanity check for other fields in header */
> > +     if (header.count == 0) {
> > +             err = 0;
> > +             goto out_close_fd;
> > +     }
>
> As mentioned by David, it would be better to replace the checks with
> TEST_ASSERT's. Most other selftests rely on TEST_ASSERT.
>
Will do.
> > +     /* Check overlap */
> > +     if (header.desc_offset == 0 || header.data_offset == 0 ||
> > +                     header.desc_offset < sizeof(header) ||
> > +                     header.data_offset < sizeof(header)) {
> > +             printf("Invalid offset fields in header!\n");
> > +             goto out_close_fd;
> > +     }
> > +     if (header.desc_offset < header.data_offset &&
> > +                     (header.desc_offset + size_desc * header.count >
> > +                     header.data_offset)) {
>
> Could you make the check more strict?
>
> TEST_ASSERT(header.desc_offset + size_desc * header.count == header.data_offset,
>         "The data block should be at the end of the descriptor block.");
>
We can't do stricter checks like this. Only the header block is
enforced to be at offset 0.
The descriptor block and data block are not enforced to be adjacent,
they can be at any
offset theoretically. That's why we have the desc_offset and data_offset fields.
> > +             printf("VM Descriptor block is overlapped with data block!\n");
> > +             goto out_close_fd;
> > +     }
> > +
> > +     /* Allocate memory for stats descriptors */
> > +     stats_desc = calloc(header.count, size_desc);
> > +     if (!stats_desc) {
> > +             perror("Allocate memory for VM stats descriptors");
> > +             goto out_close_fd;
> > +     }
> > +     /* Read kvm vm stats descriptors */
> > +     ret = pread(stats_fd, stats_desc,
> > +                     size_desc * header.count, header.desc_offset);
>
> You could stress kvm_vm_stats_read() more by calling pread for more
> offsets. For example, for every descriptor:
>
>         pread(..., header.desc_offset + i * size_desc)
>
> I realize that the typical usage will be to read once for all
> descriptors. But kvm_vm_stats_read (and kvm_vcpu_stats_read) need to
> handle any offset, and doing so seems to be quite complicated.
>
> Actually, you could stress kvm_vm_stats_read() even more by calling it
> for _every_ possible offset (and eventually invalid offsets and sizes).
> One easier way to check this is by calling read all descriptors into
> some reference buffer using a single pread, and then call it for all
> offsets while comparing against the reference buf.
>
Yes, the kvm_{vm,vcpu}_stats_read support read at any offset.
It is a good idea to do stress read and invalid offset read.
Will add those tests.
> > +     if (ret != size_desc * header.count) {
> > +             perror("Read KVM VM stats descriptors");
> > +             goto out_free_desc;
> > +     }
> > +     /* Sanity check for fields in descriptors */
> > +     for (i = 0; i < header.count; ++i) {
> > +             pdesc = (void *)stats_desc + i * size_desc;
>
> cast to (struct kvm_stats_desc *)
>
> > +             /* Check type,unit,scale boundaries */
> > +             if ((pdesc->flags & KVM_STATS_TYPE_MASK) > KVM_STATS_TYPE_MAX) {
> > +                     printf("Unknown KVM stats type!\n");
> > +                     goto out_free_desc;
> > +             }
> > +             if ((pdesc->flags & KVM_STATS_UNIT_MASK) > KVM_STATS_UNIT_MAX) {
> > +                     printf("Unknown KVM stats unit!\n");
> > +                     goto out_free_desc;
> > +             }
> > +             if ((pdesc->flags & KVM_STATS_SCALE_MASK) >
> > +                             KVM_STATS_SCALE_MAX) {
> > +                     printf("Unknown KVM stats scale!\n");
> > +                     goto out_free_desc;
> > +             }
> > +             /* Check exponent for stats unit
> > +              * Exponent for counter should be greater than or equal to 0
> > +              * Exponent for unit bytes should be greater than or equal to 0
> > +              * Exponent for unit seconds should be less than or equal to 0
> > +              * Exponent for unit clock cycles should be greater than or
> > +              * equal to 0
> > +              */
> > +             switch (pdesc->flags & KVM_STATS_UNIT_MASK) {
> > +             case KVM_STATS_UNIT_NONE:
> > +             case KVM_STATS_UNIT_BYTES:
> > +             case KVM_STATS_UNIT_CYCLES:
> > +                     if (pdesc->exponent < 0) {
> > +                             printf("Unsupported KVM stats unit!\n");
> > +                             goto out_free_desc;
> > +                     }
> > +                     break;
> > +             case KVM_STATS_UNIT_SECONDS:
> > +                     if (pdesc->exponent > 0) {
> > +                             printf("Unsupported KVM stats unit!\n");
> > +                             goto out_free_desc;
> > +                     }
> > +                     break;
>
>                 default:
>                         TEST_FAIL("Unexpected unit ...");
>
Will do. thanks.
> > +             }
> > +             /* Check name string */
> > +             if (strlen(pdesc->name) >= header.name_size) {
> > +                     printf("KVM stats name(%s) too long!\n", pdesc->name);
> > +                     goto out_free_desc;
> > +             }
>
> Tighter check:
>
> TEST_ASSERT(header.name_size > 0 &&
>         strlen(pdesc->name) + 1 == header.name_size);
>
The length of name string can be any number less than header.name_size.
We can't add this kind of check here.
> > +             /* Check size field, which should not be zero */
> > +             if (pdesc->size == 0) {
> > +                     printf("KVM descriptor(%s) with size of 0!\n",
> > +                                     pdesc->name);
> > +                     goto out_free_desc;
> > +             }
> > +             size_data += pdesc->size * sizeof(stats_data->value[0]);
> > +     }
> > +     /* Check overlap */
> > +     if (header.data_offset < header.desc_offset &&
> > +             header.data_offset + size_data > header.desc_offset) {
> > +             printf("Data block is overlapped with Descriptor block!\n");
> > +             goto out_free_desc;
> > +     }
>
> This won't be needed if you use the suggested TEST_ASSERT (the other
> overlap check).
>
> > +     /* Check validity of all stats data size */
> > +     if (size_data < header.count * sizeof(stats_data->value[0])) {
> > +             printf("Data size is not correct!\n");
> > +             goto out_free_desc;
> > +     }
>
> Tighter check:
>
> TEST_ASSERT(size_data == header.count * stats_data->value[0]);
>
Will do this.
> > +
> > +     /* Allocate memory for stats data */
> > +     stats_data = malloc(size_data);
> > +     if (!stats_data) {
> > +             perror("Allocate memory for VM stats data");
> > +             goto out_free_desc;
> > +     }
> > +     /* Read kvm vm stats data */
> > +     ret = pread(stats_fd, stats_data, size_data, header.data_offset);
> > +     if (ret != size_data) {
> > +             perror("Read KVM VM stats data");
> > +             goto out_free_data;
> > +     }
> > +
> > +     err = 0;
> > +out_free_data:
> > +     free(stats_data);
> > +out_free_desc:
> > +     free(stats_desc);
> > +out_close_fd:
> > +     close(stats_fd);
> > +     return err;
> > +}
> > +
> > +int vcpu_stats_test(struct kvm_vm *vm, int vcpu_id)
> > +{
> > +     ssize_t ret;
> > +     int i, stats_fd, err = -1;
> > +     size_t size_desc, size_data = 0;
> > +     struct kvm_stats_header header;
> > +     struct kvm_stats_desc *stats_desc, *pdesc;
> > +     struct kvm_vcpu_stats_data *stats_data;
> > +
> > +     /* Get fd for VCPU stats */
> > +     stats_fd = vcpu_get_statsfd(vm, vcpu_id);
> > +     if (stats_fd < 0) {
> > +             perror("Get VCPU stats fd");
> > +             return err;
> > +     }
> > +     /* Read kvm vcpu stats header */
> > +     ret = read(stats_fd, &header, sizeof(header));
> > +     if (ret != sizeof(header)) {
> > +             perror("Read VCPU stats header");
> > +             goto out_close_fd;
> > +     }
> > +     size_desc = sizeof(*stats_desc) + header.name_size;
> > +     /* Check id string in header, that should start with "kvm" */
> > +     if (strncmp(header.id, "kvm", 3) ||
> > +                     strlen(header.id) >= KVM_STATS_ID_MAXLEN) {
> > +             printf("Invalid KVM VCPU stats type!\n");
> > +             goto out_close_fd;
> > +     }
> > +     /* Sanity check for other fields in header */
> > +     if (header.count == 0) {
> > +             err = 0;
> > +             goto out_close_fd;
> > +     }
> > +     /* Check overlap */
> > +     if (header.desc_offset == 0 || header.data_offset == 0 ||
> > +                     header.desc_offset < sizeof(header) ||
> > +                     header.data_offset < sizeof(header)) {
> > +             printf("Invalid offset fields in header!\n");
> > +             goto out_close_fd;
> > +     }
> > +     if (header.desc_offset < header.data_offset &&
> > +                     (header.desc_offset + size_desc * header.count >
> > +                     header.data_offset)) {
> > +             printf("VCPU Descriptor block is overlapped with data block!\n");
> > +             goto out_close_fd;
> > +     }
>
> Same as above (tighter check).
>
> > +
> > +     /* Allocate memory for stats descriptors */
> > +     stats_desc = calloc(header.count, size_desc);
> > +     if (!stats_desc) {
> > +             perror("Allocate memory for VCPU stats descriptors");
> > +             goto out_close_fd;
> > +     }
> > +     /* Read kvm vcpu stats descriptors */
> > +     ret = pread(stats_fd, stats_desc,
> > +                     size_desc * header.count, header.desc_offset);
> > +     if (ret != size_desc * header.count) {
> > +             perror("Read KVM VCPU stats descriptors");
> > +             goto out_free_desc;
> > +     }
> > +     /* Sanity check for fields in descriptors */
> > +     for (i = 0; i < header.count; ++i) {
> > +             pdesc = (void *)stats_desc + i * size_desc;
> > +             /* Check boundaries */
> > +             if ((pdesc->flags & KVM_STATS_TYPE_MASK) > KVM_STATS_TYPE_MAX) {
> > +                     printf("Unknown KVM stats type!\n");
> > +                     goto out_free_desc;
> > +             }
> > +             if ((pdesc->flags & KVM_STATS_UNIT_MASK) > KVM_STATS_UNIT_MAX) {
> > +                     printf("Unknown KVM stats unit!\n");
> > +                     goto out_free_desc;
> > +             }
> > +             if ((pdesc->flags & KVM_STATS_SCALE_MASK) >
> > +                             KVM_STATS_SCALE_MAX) {
> > +                     printf("Unknown KVM stats scale!\n");
> > +                     goto out_free_desc;
> > +             }
> > +             /* Check exponent for stats unit
> > +              * Exponent for counter should be greater than or equal to 0
> > +              * Exponent for unit bytes should be greater than or equal to 0
> > +              * Exponent for unit seconds should be less than or equal to 0
> > +              * Exponent for unit clock cycles should be greater than or
> > +              * equal to 0
> > +              */
> > +             switch (pdesc->flags & KVM_STATS_UNIT_MASK) {
> > +             case KVM_STATS_UNIT_NONE:
> > +             case KVM_STATS_UNIT_BYTES:
> > +             case KVM_STATS_UNIT_CYCLES:
> > +                     if (pdesc->exponent < 0) {
> > +                             printf("Unsupported KVM stats unit!\n");
> > +                             goto out_free_desc;
> > +                     }
> > +                     break;
> > +             case KVM_STATS_UNIT_SECONDS:
> > +                     if (pdesc->exponent > 0) {
> > +                             printf("Unsupported KVM stats unit!\n");
> > +                             goto out_free_desc;
> > +                     }
> > +                     break;
> > +             }
> > +             /* Check name string */
> > +             if (strlen(pdesc->name) >= header.name_size) {
> > +                     printf("KVM stats name(%s) too long!\n", pdesc->name);
> > +                     goto out_free_desc;
> > +             }
> > +             /* Check size field, which should not be zero */
> > +             if (pdesc->size == 0) {
> > +                     printf("KVM descriptor(%s) with size of 0!\n",
> > +                                     pdesc->name);
> > +                     goto out_free_desc;
> > +             }
> > +             size_data += pdesc->size * sizeof(stats_data->value[0]);
> > +     }
> > +     /* Check overlap */
> > +     if (header.data_offset < header.desc_offset &&
> > +             header.data_offset + size_data > header.desc_offset) {
> > +             printf("Data block is overlapped with Descriptor block!\n");
> > +             goto out_free_desc;
> > +     }
> > +     /* Check validity of all stats data size */
> > +     if (size_data < header.count * sizeof(stats_data->value[0])) {
> > +             printf("Data size is not correct!\n");
> > +             goto out_free_desc;
> > +     }
> > +
> > +     /* Allocate memory for stats data */
> > +     stats_data = malloc(size_data);
> > +     if (!stats_data) {
> > +             perror("Allocate memory for VCPU stats data");
> > +             goto out_free_desc;
> > +     }
> > +     /* Read kvm vcpu stats data */
> > +     ret = pread(stats_fd, stats_data, size_data, header.data_offset);
> > +     if (ret != size_data) {
> > +             perror("Read KVM VCPU stats data");
> > +             goto out_free_data;
> > +     }
> > +
> > +     err = 0;
> > +out_free_data:
> > +     free(stats_data);
> > +out_free_desc:
> > +     free(stats_desc);
> > +out_close_fd:
> > +     close(stats_fd);
> > +     return err;
> > +}
> > +
> > +/*
> > + * Usage: kvm_bin_form_stats [#vm] [#vcpu]
> > + * The first parameter #vm set the number of VMs being created.
> > + * The second parameter #vcpu set the number of VCPUs being created.
> > + * By default, 1 VM and 1 VCPU for the VM would be created for testing.
> > + */
> > +
> > +int main(int argc, char *argv[])
> > +{
> > +     int max_vm = 1, max_vcpu = 1, ret, i, j, err = -1;
> > +     struct kvm_vm **vms;
> > +
> > +     /* Get the number of VMs and VCPUs that would be created for testing. */
> > +     if (argc > 1) {
> > +             max_vm = strtol(argv[1], NULL, 0);
> > +             if (max_vm <= 0)
> > +                     max_vm = 1;
> > +     }
> > +     if (argc > 2) {
> > +             max_vcpu = strtol(argv[2], NULL, 0);
> > +             if (max_vcpu <= 0)
> > +                     max_vcpu = 1;
> > +     }
> > +
> > +     /* Check the extension for binary stats */
> > +     ret = kvm_check_cap(KVM_CAP_STATS_BINARY_FD);
> > +     if (ret < 0) {
> > +             printf("Binary form statistics interface is not supported!\n");
> > +             return err;
> > +     }
> > +
> > +     /* Create VMs and VCPUs */
> > +     vms = malloc(sizeof(vms[0]) * max_vm);
> > +     if (!vms) {
> > +             perror("Allocate memory for storing VM pointers");
> > +             return err;
> > +     }
> > +     for (i = 0; i < max_vm; ++i) {
> > +             vms[i] = vm_create(VM_MODE_DEFAULT,
> > +                             DEFAULT_GUEST_PHY_PAGES, O_RDWR);
> > +             for (j = 0; j < max_vcpu; ++j)
> > +                     vm_vcpu_add(vms[i], j);
> > +     }
> > +
> > +     /* Check stats read for every VM and VCPU */
> > +     for (i = 0; i < max_vm; ++i) {
> > +             if (vm_stats_test(vms[i]))
> > +                     goto out_free_vm;
> > +             for (j = 0; j < max_vcpu; ++j) {
> > +                     if (vcpu_stats_test(vms[i], j))
> > +                             goto out_free_vm;
> > +             }
> > +     }
> > +
> > +     err = 0;
> > +out_free_vm:
> > +     for (i = 0; i < max_vm; ++i)
> > +             kvm_vm_free(vms[i]);
> > +     free(vms);
> > +     return err;
> > +}
> > diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
> > index fc83f6c5902d..d9e0b2c8b906 100644
> > --- a/tools/testing/selftests/kvm/lib/kvm_util.c
> > +++ b/tools/testing/selftests/kvm/lib/kvm_util.c
> > @@ -2090,3 +2090,15 @@ unsigned int vm_calc_num_guest_pages(enum vm_guest_mode mode, size_t size)
> >       n = DIV_ROUND_UP(size, vm_guest_mode_params[mode].page_size);
> >       return vm_adjust_num_guest_pages(mode, n);
> >  }
> > +
> > +int vm_get_statsfd(struct kvm_vm *vm)
> > +{
> > +     return ioctl(vm->fd, KVM_STATS_GETFD, NULL);
> > +}
> > +
> > +int vcpu_get_statsfd(struct kvm_vm *vm, uint32_t vcpuid)
> > +{
> > +     struct vcpu *vcpu = vcpu_find(vm, vcpuid);
> > +
> > +     return ioctl(vcpu->fd, KVM_STATS_GETFD, NULL);
> > +}
> > --
> > 2.31.1.751.gd2f1c929bd-goog
> >
> > _______________________________________________
> > kvmarm mailing list
> > kvmarm@lists.cs.columbia.edu
> > https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

Thanks,
Jing

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 2/4] KVM: stats: Add fd-based API to read binary stats data
  2021-05-17 14:53 ` [PATCH v5 2/4] KVM: stats: Add fd-based API to read binary stats data Jing Zhang
  2021-05-19 17:12   ` David Matlack
@ 2021-05-20  4:21   ` Ricardo Koller
  2021-05-20 17:37     ` Jing Zhang
  1 sibling, 1 reply; 30+ messages in thread
From: Ricardo Koller @ 2021-05-20  4:21 UTC (permalink / raw)
  To: Jing Zhang
  Cc: KVM, KVMARM, LinuxMIPS, KVMPPC, LinuxS390, Linuxkselftest,
	Paolo Bonzini, Marc Zyngier, James Morse, Julien Thierry,
	Suzuki K Poulose, Will Deacon, Huacai Chen, Aleksandar Markovic,
	Thomas Bogendoerfer, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck, Claudio Imbrenda,
	Sean Christopherson, Vitaly Kuznetsov, Jim Mattson, Peter Shier,
	Oliver Upton, David Rientjes, Emanuele Giuseppe Esposito

On Mon, May 17, 2021 at 02:53:12PM +0000, Jing Zhang wrote:
> Provides a file descriptor per VM to read VM stats info/data.
> Provides a file descriptor per vCPU to read vCPU stats info/data.
> 
> Signed-off-by: Jing Zhang <jingzhangos@google.com>
> ---
>  arch/arm64/kvm/guest.c    |  26 +++++
>  arch/mips/kvm/mips.c      |  52 +++++++++
>  arch/powerpc/kvm/book3s.c |  52 +++++++++
>  arch/powerpc/kvm/booke.c  |  45 ++++++++
>  arch/s390/kvm/kvm-s390.c  | 117 ++++++++++++++++++++
>  arch/x86/kvm/x86.c        |  53 +++++++++
>  include/linux/kvm_host.h  | 127 ++++++++++++++++++++++
>  include/uapi/linux/kvm.h  |  50 +++++++++
>  virt/kvm/kvm_main.c       | 223 ++++++++++++++++++++++++++++++++++++++
>  9 files changed, 745 insertions(+)
> 
  
> +static ssize_t kvm_vcpu_stats_read(struct file *file, char __user *user_buffer,
> +			      size_t size, loff_t *offset)
> +{
> +	char id[KVM_STATS_ID_MAXLEN];
> +	struct kvm_vcpu *vcpu = file->private_data;
> +	ssize_t copylen, len, remain = size;
> +	size_t size_header, size_desc, size_stats;
> +	loff_t pos = *offset;
> +	char __user *dest = user_buffer;
> +	void *src;

Nit. Better to do pointer arithmetic on a "char *".  Note that gcc and
clang will do the expected thing.

> +
> +	snprintf(id, sizeof(id), "kvm-%d/vcpu-%d",
> +			task_pid_nr(current), vcpu->vcpu_id);
> +	size_header = sizeof(kvm_vcpu_stats_header);
> +	size_desc =
> +		kvm_vcpu_stats_header.count * sizeof(struct _kvm_stats_desc);
> +	size_stats = sizeof(vcpu->stat);
> +
> +	len = sizeof(id) + size_header + size_desc + size_stats - pos;
> +	len = min(len, remain);
> +	if (len <= 0)
> +		return 0;
> +	remain = len;

If 'desc_offset' is not right after the header, then the 'len'
calculation is missing the gap into account. For example, assuming there
is a gap of 0x1000000 between the header and the descriptors:

	desc_offset = sizeof(id) + size_header + 0x1000000

and the user calls the ioctl with enough space for the whole file,
including the gap:

	*offset = 0
	size = sizeof(id) + size_header + size_desc + size_stats + 0x1000000

then 'remain' gets the wrong size:

	remain = sizeof(id) + size_header + size_desc + size_stats

and ... (more below)

> +
> +	/* Copy kvm vcpu stats header id string */
> +	copylen = sizeof(id) - pos;
> +	copylen = min(copylen, remain);
> +	if (copylen > 0) {
> +		src = (void *)id + pos;
> +		if (copy_to_user(dest, src, copylen))
> +			return -EFAULT;
> +		remain -= copylen;
> +		pos += copylen;
> +		dest += copylen;
> +	}
> +	/* Copy kvm vcpu stats header */
> +	copylen = sizeof(id) + size_header - pos;
> +	copylen = min(copylen, remain);
> +	if (copylen > 0) {
> +		src = (void *)&kvm_vcpu_stats_header;
> +		src += pos - sizeof(id);
> +		if (copy_to_user(dest, src, copylen))
> +			return -EFAULT;
> +		remain -= copylen;
> +		pos += copylen;
> +		dest += copylen;
> +	}
> +	/* Copy kvm vcpu stats descriptors */
> +	copylen = kvm_vcpu_stats_header.desc_offset + size_desc - pos;

This would be the state at this point:

	pos	= sizeof(id) + size_header
	copylen	= sizeof(id) + size_header + 0x1000000 + size_desc - (sizeof(id) + size_header)
		= 0x1000000 + size_desc
	remain	= size_desc + size_stats

> +	copylen = min(copylen, remain);

	copylen = size_desc + size_stats

which is not enough to copy the descriptors (and the data).

> +	if (copylen > 0) {
> +		src = (void *)&kvm_vcpu_stats_desc;
> +		src += pos - kvm_vcpu_stats_header.desc_offset;

Moreover, src also needs to take the gap into account.

	src	= &kvm_vcpu_stats_desc + (sizeof(id) + size_header) - (sizeof(id) + size_header + 0x1000000)
		= &kvm_vcpu_stats_desc - 0x1000000

Otherwise, src ends up pointing at the wrong place.

> +		if (copy_to_user(dest, src, copylen))
> +			return -EFAULT;
> +		remain -= copylen;
> +		pos += copylen;
> +		dest += copylen;
> +	}
> +	/* Copy kvm vcpu stats values */
> +	copylen = kvm_vcpu_stats_header.data_offset + size_stats - pos;

The same problem occurs here. There is a potential gap before
data_offset that needs to be taken into account for src and len.

Would it be possible to just ensure that there is no gap? maybe even
remove data_offset and desc_offset and always place them adjacent, and
have the descriptors right after the header.

> +	copylen = min(copylen, remain);
> +	if (copylen > 0) {
> +		src = (void *)&vcpu->stat;
> +		src += pos - kvm_vcpu_stats_header.data_offset;
> +		if (copy_to_user(dest, src, copylen))
> +			return -EFAULT;
> +		remain -= copylen;
> +		pos += copylen;
> +		dest += copylen;
> +	}
> +
> +	*offset = pos;
> +	return len;
> +}
> +
>  



> +static ssize_t kvm_vm_stats_read(struct file *file, char __user *user_buffer,
> +			      size_t size, loff_t *offset)
> +{

Consider moving the common code between kvm_vcpu_stats_read and this one
into some function that takes pointers to header, desc, and data. Unless
there is something vcpu or vm specific besides that.

> +	char id[KVM_STATS_ID_MAXLEN];
> +	struct kvm *kvm = file->private_data;
> +	ssize_t copylen, len, remain = size;
> +	size_t size_header, size_desc, size_stats;
> +	loff_t pos = *offset;
> +	char __user *dest = user_buffer;
> +	void *src;
> +
> +	snprintf(id, sizeof(id), "kvm-%d", task_pid_nr(current));
> +	size_header = sizeof(kvm_vm_stats_header);
> +	size_desc = kvm_vm_stats_header.count * sizeof(struct _kvm_stats_desc);
> +	size_stats = sizeof(kvm->stat);
> +
> +	len = sizeof(id) + size_header + size_desc + size_stats - pos;
> +	len = min(len, remain);
> +	if (len <= 0)
> +		return 0;
> +	remain = len;
> +
> +	/* Copy kvm vm stats header id string */
> +	copylen = sizeof(id) - pos;
> +	copylen = min(copylen, remain);
> +	if (copylen > 0) {
> +		src = (void *)id + pos;
> +		if (copy_to_user(dest, src, copylen))
> +			return -EFAULT;
> +		remain -= copylen;
> +		pos += copylen;
> +		dest += copylen;
> +	}
> +	/* Copy kvm vm stats header */
> +	copylen = sizeof(id) + size_header - pos;
> +	copylen = min(copylen, remain);
> +	if (copylen > 0) {
> +		src = (void *)&kvm_vm_stats_header;
> +		src += pos - sizeof(id);
> +		if (copy_to_user(dest, src, copylen))
> +			return -EFAULT;
> +		remain -= copylen;
> +		pos += copylen;
> +		dest += copylen;
> +	}
> +	/* Copy kvm vm stats descriptors */
> +	copylen = kvm_vm_stats_header.desc_offset + size_desc - pos;
> +	copylen = min(copylen, remain);
> +	if (copylen > 0) {
> +		src = (void *)&kvm_vm_stats_desc;
> +		src += pos - kvm_vm_stats_header.desc_offset;
> +		if (copy_to_user(dest, src, copylen))
> +			return -EFAULT;
> +		remain -= copylen;
> +		pos += copylen;
> +		dest += copylen;
> +	}
> +	/* Copy kvm vm stats values */
> +	copylen = kvm_vm_stats_header.data_offset + size_stats - pos;
> +	copylen = min(copylen, remain);
> +	if (copylen > 0) {
> +		src = (void *)&kvm->stat;
> +		src += pos - kvm_vm_stats_header.data_offset;
> +		if (copy_to_user(dest, src, copylen))
> +			return -EFAULT;
> +		remain -= copylen;
> +		pos += copylen;
> +		dest += copylen;
> +	}
> +
> +	*offset = pos;
> +	return len;
> +}
> +
> -- 
> 2.31.1.751.gd2f1c929bd-goog
> 
> _______________________________________________
> kvmarm mailing list
> kvmarm@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 2/4] KVM: stats: Add fd-based API to read binary stats data
  2021-05-20  4:21   ` Ricardo Koller
@ 2021-05-20 17:37     ` Jing Zhang
  2021-05-20 18:58       ` Ricardo Koller
  0 siblings, 1 reply; 30+ messages in thread
From: Jing Zhang @ 2021-05-20 17:37 UTC (permalink / raw)
  To: Ricardo Koller
  Cc: KVM, KVMARM, LinuxMIPS, KVMPPC, LinuxS390, Linuxkselftest,
	Paolo Bonzini, Marc Zyngier, James Morse, Julien Thierry,
	Suzuki K Poulose, Will Deacon, Huacai Chen, Aleksandar Markovic,
	Thomas Bogendoerfer, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck, Claudio Imbrenda,
	Sean Christopherson, Vitaly Kuznetsov, Jim Mattson, Peter Shier,
	Oliver Upton, David Rientjes, Emanuele Giuseppe Esposito

Hi Ricardo,

On Wed, May 19, 2021 at 11:21 PM Ricardo Koller <ricarkol@google.com> wrote:
>
> On Mon, May 17, 2021 at 02:53:12PM +0000, Jing Zhang wrote:
> > Provides a file descriptor per VM to read VM stats info/data.
> > Provides a file descriptor per vCPU to read vCPU stats info/data.
> >
> > Signed-off-by: Jing Zhang <jingzhangos@google.com>
> > ---
> >  arch/arm64/kvm/guest.c    |  26 +++++
> >  arch/mips/kvm/mips.c      |  52 +++++++++
> >  arch/powerpc/kvm/book3s.c |  52 +++++++++
> >  arch/powerpc/kvm/booke.c  |  45 ++++++++
> >  arch/s390/kvm/kvm-s390.c  | 117 ++++++++++++++++++++
> >  arch/x86/kvm/x86.c        |  53 +++++++++
> >  include/linux/kvm_host.h  | 127 ++++++++++++++++++++++
> >  include/uapi/linux/kvm.h  |  50 +++++++++
> >  virt/kvm/kvm_main.c       | 223 ++++++++++++++++++++++++++++++++++++++
> >  9 files changed, 745 insertions(+)
> >
>
> > +static ssize_t kvm_vcpu_stats_read(struct file *file, char __user *user_buffer,
> > +                           size_t size, loff_t *offset)
> > +{
> > +     char id[KVM_STATS_ID_MAXLEN];
> > +     struct kvm_vcpu *vcpu = file->private_data;
> > +     ssize_t copylen, len, remain = size;
> > +     size_t size_header, size_desc, size_stats;
> > +     loff_t pos = *offset;
> > +     char __user *dest = user_buffer;
> > +     void *src;
>
> Nit. Better to do pointer arithmetic on a "char *".  Note that gcc and
> clang will do the expected thing.
>
> > +
> > +     snprintf(id, sizeof(id), "kvm-%d/vcpu-%d",
> > +                     task_pid_nr(current), vcpu->vcpu_id);
> > +     size_header = sizeof(kvm_vcpu_stats_header);
> > +     size_desc =
> > +             kvm_vcpu_stats_header.count * sizeof(struct _kvm_stats_desc);
> > +     size_stats = sizeof(vcpu->stat);
> > +
> > +     len = sizeof(id) + size_header + size_desc + size_stats - pos;
> > +     len = min(len, remain);
> > +     if (len <= 0)
> > +             return 0;
> > +     remain = len;
>
> If 'desc_offset' is not right after the header, then the 'len'
> calculation is missing the gap into account. For example, assuming there
> is a gap of 0x1000000 between the header and the descriptors:
>
>         desc_offset = sizeof(id) + size_header + 0x1000000
>
> and the user calls the ioctl with enough space for the whole file,
> including the gap:
>
>         *offset = 0
>         size = sizeof(id) + size_header + size_desc + size_stats + 0x1000000
>
> then 'remain' gets the wrong size:
>
>         remain = sizeof(id) + size_header + size_desc + size_stats
>
> and ... (more below)
>
> > +
> > +     /* Copy kvm vcpu stats header id string */
> > +     copylen = sizeof(id) - pos;
> > +     copylen = min(copylen, remain);
> > +     if (copylen > 0) {
> > +             src = (void *)id + pos;
> > +             if (copy_to_user(dest, src, copylen))
> > +                     return -EFAULT;
> > +             remain -= copylen;
> > +             pos += copylen;
> > +             dest += copylen;
> > +     }
> > +     /* Copy kvm vcpu stats header */
> > +     copylen = sizeof(id) + size_header - pos;
> > +     copylen = min(copylen, remain);
> > +     if (copylen > 0) {
> > +             src = (void *)&kvm_vcpu_stats_header;
> > +             src += pos - sizeof(id);
> > +             if (copy_to_user(dest, src, copylen))
> > +                     return -EFAULT;
> > +             remain -= copylen;
> > +             pos += copylen;
> > +             dest += copylen;
> > +     }
> > +     /* Copy kvm vcpu stats descriptors */
> > +     copylen = kvm_vcpu_stats_header.desc_offset + size_desc - pos;
>
> This would be the state at this point:
>
>         pos     = sizeof(id) + size_header
>         copylen = sizeof(id) + size_header + 0x1000000 + size_desc - (sizeof(id) + size_header)
>                 = 0x1000000 + size_desc
>         remain  = size_desc + size_stats
>
> > +     copylen = min(copylen, remain);
>
>         copylen = size_desc + size_stats
>
> which is not enough to copy the descriptors (and the data).
>
> > +     if (copylen > 0) {
> > +             src = (void *)&kvm_vcpu_stats_desc;
> > +             src += pos - kvm_vcpu_stats_header.desc_offset;
>
> Moreover, src also needs to take the gap into account.
>
>         src     = &kvm_vcpu_stats_desc + (sizeof(id) + size_header) - (sizeof(id) + size_header + 0x1000000)
>                 = &kvm_vcpu_stats_desc - 0x1000000
>
> Otherwise, src ends up pointing at the wrong place.
>
> > +             if (copy_to_user(dest, src, copylen))
> > +                     return -EFAULT;
> > +             remain -= copylen;
> > +             pos += copylen;
> > +             dest += copylen;
> > +     }
> > +     /* Copy kvm vcpu stats values */
> > +     copylen = kvm_vcpu_stats_header.data_offset + size_stats - pos;
>
> The same problem occurs here. There is a potential gap before
> data_offset that needs to be taken into account for src and len.
>
> Would it be possible to just ensure that there is no gap? maybe even
> remove data_offset and desc_offset and always place them adjacent, and
> have the descriptors right after the header.
>
I guess I didn't make it clear about the offset fields in the header block.
We don't create any gap here. In this implementation, kernel knows that
descriptor block is right after header block and data block is right after
descriptor block.
The reason we have offset fields for descriptor block and data block is
for flexibility and future potential extension. e.g. we might add another
block between header block and descriptor block in the future for some
other metadata information.
I think we are good here.
> > +     copylen = min(copylen, remain);
> > +     if (copylen > 0) {
> > +             src = (void *)&vcpu->stat;
> > +             src += pos - kvm_vcpu_stats_header.data_offset;
> > +             if (copy_to_user(dest, src, copylen))
> > +                     return -EFAULT;
> > +             remain -= copylen;
> > +             pos += copylen;
> > +             dest += copylen;
> > +     }
> > +
> > +     *offset = pos;
> > +     return len;
> > +}
> > +
> >
>
>
>
> > +static ssize_t kvm_vm_stats_read(struct file *file, char __user *user_buffer,
> > +                           size_t size, loff_t *offset)
> > +{
>
> Consider moving the common code between kvm_vcpu_stats_read and this one
> into some function that takes pointers to header, desc, and data. Unless
> there is something vcpu or vm specific besides that.
>
Will do that, thanks.
> > +     char id[KVM_STATS_ID_MAXLEN];
> > +     struct kvm *kvm = file->private_data;
> > +     ssize_t copylen, len, remain = size;
> > +     size_t size_header, size_desc, size_stats;
> > +     loff_t pos = *offset;
> > +     char __user *dest = user_buffer;
> > +     void *src;
> > +
> > +     snprintf(id, sizeof(id), "kvm-%d", task_pid_nr(current));
> > +     size_header = sizeof(kvm_vm_stats_header);
> > +     size_desc = kvm_vm_stats_header.count * sizeof(struct _kvm_stats_desc);
> > +     size_stats = sizeof(kvm->stat);
> > +
> > +     len = sizeof(id) + size_header + size_desc + size_stats - pos;
> > +     len = min(len, remain);
> > +     if (len <= 0)
> > +             return 0;
> > +     remain = len;
> > +
> > +     /* Copy kvm vm stats header id string */
> > +     copylen = sizeof(id) - pos;
> > +     copylen = min(copylen, remain);
> > +     if (copylen > 0) {
> > +             src = (void *)id + pos;
> > +             if (copy_to_user(dest, src, copylen))
> > +                     return -EFAULT;
> > +             remain -= copylen;
> > +             pos += copylen;
> > +             dest += copylen;
> > +     }
> > +     /* Copy kvm vm stats header */
> > +     copylen = sizeof(id) + size_header - pos;
> > +     copylen = min(copylen, remain);
> > +     if (copylen > 0) {
> > +             src = (void *)&kvm_vm_stats_header;
> > +             src += pos - sizeof(id);
> > +             if (copy_to_user(dest, src, copylen))
> > +                     return -EFAULT;
> > +             remain -= copylen;
> > +             pos += copylen;
> > +             dest += copylen;
> > +     }
> > +     /* Copy kvm vm stats descriptors */
> > +     copylen = kvm_vm_stats_header.desc_offset + size_desc - pos;
> > +     copylen = min(copylen, remain);
> > +     if (copylen > 0) {
> > +             src = (void *)&kvm_vm_stats_desc;
> > +             src += pos - kvm_vm_stats_header.desc_offset;
> > +             if (copy_to_user(dest, src, copylen))
> > +                     return -EFAULT;
> > +             remain -= copylen;
> > +             pos += copylen;
> > +             dest += copylen;
> > +     }
> > +     /* Copy kvm vm stats values */
> > +     copylen = kvm_vm_stats_header.data_offset + size_stats - pos;
> > +     copylen = min(copylen, remain);
> > +     if (copylen > 0) {
> > +             src = (void *)&kvm->stat;
> > +             src += pos - kvm_vm_stats_header.data_offset;
> > +             if (copy_to_user(dest, src, copylen))
> > +                     return -EFAULT;
> > +             remain -= copylen;
> > +             pos += copylen;
> > +             dest += copylen;
> > +     }
> > +
> > +     *offset = pos;
> > +     return len;
> > +}
> > +
> > --
> > 2.31.1.751.gd2f1c929bd-goog
> >
> > _______________________________________________
> > kvmarm mailing list
> > kvmarm@lists.cs.columbia.edu
> > https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 2/4] KVM: stats: Add fd-based API to read binary stats data
  2021-05-20 17:37     ` Jing Zhang
@ 2021-05-20 18:58       ` Ricardo Koller
  2021-05-20 19:46         ` Jing Zhang
  0 siblings, 1 reply; 30+ messages in thread
From: Ricardo Koller @ 2021-05-20 18:58 UTC (permalink / raw)
  To: Jing Zhang
  Cc: KVM, KVMARM, LinuxMIPS, KVMPPC, LinuxS390, Linuxkselftest,
	Paolo Bonzini, Marc Zyngier, James Morse, Julien Thierry,
	Suzuki K Poulose, Will Deacon, Huacai Chen, Aleksandar Markovic,
	Thomas Bogendoerfer, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck, Claudio Imbrenda,
	Sean Christopherson, Vitaly Kuznetsov, Jim Mattson, Peter Shier,
	Oliver Upton, David Rientjes, Emanuele Giuseppe Esposito

On Thu, May 20, 2021 at 12:37:59PM -0500, Jing Zhang wrote:
> Hi Ricardo,
> 
> On Wed, May 19, 2021 at 11:21 PM Ricardo Koller <ricarkol@google.com> wrote:
> >
> > On Mon, May 17, 2021 at 02:53:12PM +0000, Jing Zhang wrote:
> > > Provides a file descriptor per VM to read VM stats info/data.
> > > Provides a file descriptor per vCPU to read vCPU stats info/data.
> > >
> > > Signed-off-by: Jing Zhang <jingzhangos@google.com>
> > > ---
> > >  arch/arm64/kvm/guest.c    |  26 +++++
> > >  arch/mips/kvm/mips.c      |  52 +++++++++
> > >  arch/powerpc/kvm/book3s.c |  52 +++++++++
> > >  arch/powerpc/kvm/booke.c  |  45 ++++++++
> > >  arch/s390/kvm/kvm-s390.c  | 117 ++++++++++++++++++++
> > >  arch/x86/kvm/x86.c        |  53 +++++++++
> > >  include/linux/kvm_host.h  | 127 ++++++++++++++++++++++
> > >  include/uapi/linux/kvm.h  |  50 +++++++++
> > >  virt/kvm/kvm_main.c       | 223 ++++++++++++++++++++++++++++++++++++++
> > >  9 files changed, 745 insertions(+)
> > >
> >
> > > +static ssize_t kvm_vcpu_stats_read(struct file *file, char __user *user_buffer,
> > > +                           size_t size, loff_t *offset)
> > > +{
> > > +     char id[KVM_STATS_ID_MAXLEN];
> > > +     struct kvm_vcpu *vcpu = file->private_data;
> > > +     ssize_t copylen, len, remain = size;
> > > +     size_t size_header, size_desc, size_stats;
> > > +     loff_t pos = *offset;
> > > +     char __user *dest = user_buffer;
> > > +     void *src;
> >
> > Nit. Better to do pointer arithmetic on a "char *".  Note that gcc and
> > clang will do the expected thing.
> >
> > > +
> > > +     snprintf(id, sizeof(id), "kvm-%d/vcpu-%d",
> > > +                     task_pid_nr(current), vcpu->vcpu_id);
> > > +     size_header = sizeof(kvm_vcpu_stats_header);
> > > +     size_desc =
> > > +             kvm_vcpu_stats_header.count * sizeof(struct _kvm_stats_desc);
> > > +     size_stats = sizeof(vcpu->stat);
> > > +
> > > +     len = sizeof(id) + size_header + size_desc + size_stats - pos;
> > > +     len = min(len, remain);
> > > +     if (len <= 0)
> > > +             return 0;
> > > +     remain = len;
> >
> > If 'desc_offset' is not right after the header, then the 'len'
> > calculation is missing the gap into account. For example, assuming there
> > is a gap of 0x1000000 between the header and the descriptors:
> >
> >         desc_offset = sizeof(id) + size_header + 0x1000000
> >
> > and the user calls the ioctl with enough space for the whole file,
> > including the gap:
> >
> >         *offset = 0
> >         size = sizeof(id) + size_header + size_desc + size_stats + 0x1000000
> >
> > then 'remain' gets the wrong size:
> >
> >         remain = sizeof(id) + size_header + size_desc + size_stats
> >
> > and ... (more below)
> >
> > > +
> > > +     /* Copy kvm vcpu stats header id string */
> > > +     copylen = sizeof(id) - pos;
> > > +     copylen = min(copylen, remain);
> > > +     if (copylen > 0) {
> > > +             src = (void *)id + pos;
> > > +             if (copy_to_user(dest, src, copylen))
> > > +                     return -EFAULT;
> > > +             remain -= copylen;
> > > +             pos += copylen;
> > > +             dest += copylen;
> > > +     }
> > > +     /* Copy kvm vcpu stats header */
> > > +     copylen = sizeof(id) + size_header - pos;
> > > +     copylen = min(copylen, remain);
> > > +     if (copylen > 0) {
> > > +             src = (void *)&kvm_vcpu_stats_header;
> > > +             src += pos - sizeof(id);
> > > +             if (copy_to_user(dest, src, copylen))
> > > +                     return -EFAULT;
> > > +             remain -= copylen;
> > > +             pos += copylen;
> > > +             dest += copylen;
> > > +     }
> > > +     /* Copy kvm vcpu stats descriptors */
> > > +     copylen = kvm_vcpu_stats_header.desc_offset + size_desc - pos;
> >
> > This would be the state at this point:
> >
> >         pos     = sizeof(id) + size_header
> >         copylen = sizeof(id) + size_header + 0x1000000 + size_desc - (sizeof(id) + size_header)
> >                 = 0x1000000 + size_desc
> >         remain  = size_desc + size_stats
> >
> > > +     copylen = min(copylen, remain);
> >
> >         copylen = size_desc + size_stats
> >
> > which is not enough to copy the descriptors (and the data).
> >
> > > +     if (copylen > 0) {
> > > +             src = (void *)&kvm_vcpu_stats_desc;
> > > +             src += pos - kvm_vcpu_stats_header.desc_offset;
> >
> > Moreover, src also needs to take the gap into account.
> >
> >         src     = &kvm_vcpu_stats_desc + (sizeof(id) + size_header) - (sizeof(id) + size_header + 0x1000000)
> >                 = &kvm_vcpu_stats_desc - 0x1000000
> >
> > Otherwise, src ends up pointing at the wrong place.
> >
> > > +             if (copy_to_user(dest, src, copylen))
> > > +                     return -EFAULT;
> > > +             remain -= copylen;
> > > +             pos += copylen;
> > > +             dest += copylen;
> > > +     }
> > > +     /* Copy kvm vcpu stats values */
> > > +     copylen = kvm_vcpu_stats_header.data_offset + size_stats - pos;
> >
> > The same problem occurs here. There is a potential gap before
> > data_offset that needs to be taken into account for src and len.
> >
> > Would it be possible to just ensure that there is no gap? maybe even
> > remove data_offset and desc_offset and always place them adjacent, and
> > have the descriptors right after the header.
> >
> I guess I didn't make it clear about the offset fields in the header block.
> We don't create any gap here. In this implementation, kernel knows that
> descriptor block is right after header block and data block is right after
> descriptor block.
> The reason we have offset fields for descriptor block and data block is
> for flexibility and future potential extension. e.g. we might add another
> block between header block and descriptor block in the future for some
> other metadata information.
> I think we are good here.

Hi Jing,

I realize they are adjacent right now, as the function wouldn't work if
they weren't. My comment was more about code maintenance, what happens
if the layout changes. This function depends on an asumption about a
layout defined somewhere else. For example,

	copylen = kvm_vm_stats_header.desc_offset + size_desc - pos;

makes an assumption about desc_offset being set to:

	.desc_offset = sizeof(struct kvm_stats_header),

and if desc_offset is not exactly that, then the function doesn't
explicitely fail and instead does unexpected things (probably undetected
by tests).

I think the solution is to just check the assumptions. Either an assert
or just bail out with a warning:

	/* This function currently depends on the following layout. */
	if (kvm_vm_stats_header.desc_offset != sizeof(struct kvm_stats_header) ||
			kvm_vm_stats_header.data_offset != sizeof(struct kvm_stats_header) +
			sizeof(kvm_vm_stats_desc)) {
		warning(...);
		return 0;
	}

> > > +     copylen = min(copylen, remain);
> > > +     if (copylen > 0) {
> > > +             src = (void *)&vcpu->stat;
> > > +             src += pos - kvm_vcpu_stats_header.data_offset;
> > > +             if (copy_to_user(dest, src, copylen))
> > > +                     return -EFAULT;
> > > +             remain -= copylen;
> > > +             pos += copylen;
> > > +             dest += copylen;
> > > +     }
> > > +
> > > +     *offset = pos;
> > > +     return len;
> > > +}
> > > +
> > >
> >
> >
> >
> > > +static ssize_t kvm_vm_stats_read(struct file *file, char __user *user_buffer,
> > > +                           size_t size, loff_t *offset)
> > > +{
> >
> > Consider moving the common code between kvm_vcpu_stats_read and this one
> > into some function that takes pointers to header, desc, and data. Unless
> > there is something vcpu or vm specific besides that.
> >
> Will do that, thanks.
> > > +     char id[KVM_STATS_ID_MAXLEN];
> > > +     struct kvm *kvm = file->private_data;
> > > +     ssize_t copylen, len, remain = size;
> > > +     size_t size_header, size_desc, size_stats;
> > > +     loff_t pos = *offset;
> > > +     char __user *dest = user_buffer;
> > > +     void *src;
> > > +
> > > +     snprintf(id, sizeof(id), "kvm-%d", task_pid_nr(current));
> > > +     size_header = sizeof(kvm_vm_stats_header);
> > > +     size_desc = kvm_vm_stats_header.count * sizeof(struct _kvm_stats_desc);
> > > +     size_stats = sizeof(kvm->stat);
> > > +
> > > +     len = sizeof(id) + size_header + size_desc + size_stats - pos;
> > > +     len = min(len, remain);
> > > +     if (len <= 0)
> > > +             return 0;
> > > +     remain = len;
> > > +
> > > +     /* Copy kvm vm stats header id string */
> > > +     copylen = sizeof(id) - pos;
> > > +     copylen = min(copylen, remain);
> > > +     if (copylen > 0) {
> > > +             src = (void *)id + pos;
> > > +             if (copy_to_user(dest, src, copylen))
> > > +                     return -EFAULT;
> > > +             remain -= copylen;
> > > +             pos += copylen;
> > > +             dest += copylen;
> > > +     }
> > > +     /* Copy kvm vm stats header */
> > > +     copylen = sizeof(id) + size_header - pos;
> > > +     copylen = min(copylen, remain);
> > > +     if (copylen > 0) {
> > > +             src = (void *)&kvm_vm_stats_header;
> > > +             src += pos - sizeof(id);
> > > +             if (copy_to_user(dest, src, copylen))
> > > +                     return -EFAULT;
> > > +             remain -= copylen;
> > > +             pos += copylen;
> > > +             dest += copylen;
> > > +     }
> > > +     /* Copy kvm vm stats descriptors */
> > > +     copylen = kvm_vm_stats_header.desc_offset + size_desc - pos;
> > > +     copylen = min(copylen, remain);
> > > +     if (copylen > 0) {
> > > +             src = (void *)&kvm_vm_stats_desc;
> > > +             src += pos - kvm_vm_stats_header.desc_offset;
> > > +             if (copy_to_user(dest, src, copylen))
> > > +                     return -EFAULT;
> > > +             remain -= copylen;
> > > +             pos += copylen;
> > > +             dest += copylen;
> > > +     }
> > > +     /* Copy kvm vm stats values */
> > > +     copylen = kvm_vm_stats_header.data_offset + size_stats - pos;
> > > +     copylen = min(copylen, remain);
> > > +     if (copylen > 0) {
> > > +             src = (void *)&kvm->stat;
> > > +             src += pos - kvm_vm_stats_header.data_offset;
> > > +             if (copy_to_user(dest, src, copylen))
> > > +                     return -EFAULT;
> > > +             remain -= copylen;
> > > +             pos += copylen;
> > > +             dest += copylen;
> > > +     }
> > > +
> > > +     *offset = pos;
> > > +     return len;
> > > +}
> > > +
> > > --
> > > 2.31.1.751.gd2f1c929bd-goog
> > >
> > > _______________________________________________
> > > kvmarm mailing list
> > > kvmarm@lists.cs.columbia.edu
> > > https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 2/4] KVM: stats: Add fd-based API to read binary stats data
  2021-05-20 18:58       ` Ricardo Koller
@ 2021-05-20 19:46         ` Jing Zhang
  2021-05-20 20:50           ` Ricardo Koller
  0 siblings, 1 reply; 30+ messages in thread
From: Jing Zhang @ 2021-05-20 19:46 UTC (permalink / raw)
  To: Ricardo Koller
  Cc: KVM, KVMARM, LinuxMIPS, KVMPPC, LinuxS390, Linuxkselftest,
	Paolo Bonzini, Marc Zyngier, James Morse, Julien Thierry,
	Suzuki K Poulose, Will Deacon, Huacai Chen, Aleksandar Markovic,
	Thomas Bogendoerfer, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck, Claudio Imbrenda,
	Sean Christopherson, Vitaly Kuznetsov, Jim Mattson, Peter Shier,
	Oliver Upton, David Rientjes, Emanuele Giuseppe Esposito

Hi Ricardo,

On Thu, May 20, 2021 at 1:58 PM Ricardo Koller <ricarkol@google.com> wrote:
>
> On Thu, May 20, 2021 at 12:37:59PM -0500, Jing Zhang wrote:
> > Hi Ricardo,
> >
> > On Wed, May 19, 2021 at 11:21 PM Ricardo Koller <ricarkol@google.com> wrote:
> > >
> > > On Mon, May 17, 2021 at 02:53:12PM +0000, Jing Zhang wrote:
> > > > Provides a file descriptor per VM to read VM stats info/data.
> > > > Provides a file descriptor per vCPU to read vCPU stats info/data.
> > > >
> > > > Signed-off-by: Jing Zhang <jingzhangos@google.com>
> > > > ---
> > > >  arch/arm64/kvm/guest.c    |  26 +++++
> > > >  arch/mips/kvm/mips.c      |  52 +++++++++
> > > >  arch/powerpc/kvm/book3s.c |  52 +++++++++
> > > >  arch/powerpc/kvm/booke.c  |  45 ++++++++
> > > >  arch/s390/kvm/kvm-s390.c  | 117 ++++++++++++++++++++
> > > >  arch/x86/kvm/x86.c        |  53 +++++++++
> > > >  include/linux/kvm_host.h  | 127 ++++++++++++++++++++++
> > > >  include/uapi/linux/kvm.h  |  50 +++++++++
> > > >  virt/kvm/kvm_main.c       | 223 ++++++++++++++++++++++++++++++++++++++
> > > >  9 files changed, 745 insertions(+)
> > > >
> > >
> > > > +static ssize_t kvm_vcpu_stats_read(struct file *file, char __user *user_buffer,
> > > > +                           size_t size, loff_t *offset)
> > > > +{
> > > > +     char id[KVM_STATS_ID_MAXLEN];
> > > > +     struct kvm_vcpu *vcpu = file->private_data;
> > > > +     ssize_t copylen, len, remain = size;
> > > > +     size_t size_header, size_desc, size_stats;
> > > > +     loff_t pos = *offset;
> > > > +     char __user *dest = user_buffer;
> > > > +     void *src;
> > >
> > > Nit. Better to do pointer arithmetic on a "char *".  Note that gcc and
> > > clang will do the expected thing.
> > >
> > > > +
> > > > +     snprintf(id, sizeof(id), "kvm-%d/vcpu-%d",
> > > > +                     task_pid_nr(current), vcpu->vcpu_id);
> > > > +     size_header = sizeof(kvm_vcpu_stats_header);
> > > > +     size_desc =
> > > > +             kvm_vcpu_stats_header.count * sizeof(struct _kvm_stats_desc);
> > > > +     size_stats = sizeof(vcpu->stat);
> > > > +
> > > > +     len = sizeof(id) + size_header + size_desc + size_stats - pos;
> > > > +     len = min(len, remain);
> > > > +     if (len <= 0)
> > > > +             return 0;
> > > > +     remain = len;
> > >
> > > If 'desc_offset' is not right after the header, then the 'len'
> > > calculation is missing the gap into account. For example, assuming there
> > > is a gap of 0x1000000 between the header and the descriptors:
> > >
> > >         desc_offset = sizeof(id) + size_header + 0x1000000
> > >
> > > and the user calls the ioctl with enough space for the whole file,
> > > including the gap:
> > >
> > >         *offset = 0
> > >         size = sizeof(id) + size_header + size_desc + size_stats + 0x1000000
> > >
> > > then 'remain' gets the wrong size:
> > >
> > >         remain = sizeof(id) + size_header + size_desc + size_stats
> > >
> > > and ... (more below)
> > >
> > > > +
> > > > +     /* Copy kvm vcpu stats header id string */
> > > > +     copylen = sizeof(id) - pos;
> > > > +     copylen = min(copylen, remain);
> > > > +     if (copylen > 0) {
> > > > +             src = (void *)id + pos;
> > > > +             if (copy_to_user(dest, src, copylen))
> > > > +                     return -EFAULT;
> > > > +             remain -= copylen;
> > > > +             pos += copylen;
> > > > +             dest += copylen;
> > > > +     }
> > > > +     /* Copy kvm vcpu stats header */
> > > > +     copylen = sizeof(id) + size_header - pos;
> > > > +     copylen = min(copylen, remain);
> > > > +     if (copylen > 0) {
> > > > +             src = (void *)&kvm_vcpu_stats_header;
> > > > +             src += pos - sizeof(id);
> > > > +             if (copy_to_user(dest, src, copylen))
> > > > +                     return -EFAULT;
> > > > +             remain -= copylen;
> > > > +             pos += copylen;
> > > > +             dest += copylen;
> > > > +     }
> > > > +     /* Copy kvm vcpu stats descriptors */
> > > > +     copylen = kvm_vcpu_stats_header.desc_offset + size_desc - pos;
> > >
> > > This would be the state at this point:
> > >
> > >         pos     = sizeof(id) + size_header
> > >         copylen = sizeof(id) + size_header + 0x1000000 + size_desc - (sizeof(id) + size_header)
> > >                 = 0x1000000 + size_desc
> > >         remain  = size_desc + size_stats
> > >
> > > > +     copylen = min(copylen, remain);
> > >
> > >         copylen = size_desc + size_stats
> > >
> > > which is not enough to copy the descriptors (and the data).
> > >
> > > > +     if (copylen > 0) {
> > > > +             src = (void *)&kvm_vcpu_stats_desc;
> > > > +             src += pos - kvm_vcpu_stats_header.desc_offset;
> > >
> > > Moreover, src also needs to take the gap into account.
> > >
> > >         src     = &kvm_vcpu_stats_desc + (sizeof(id) + size_header) - (sizeof(id) + size_header + 0x1000000)
> > >                 = &kvm_vcpu_stats_desc - 0x1000000
> > >
> > > Otherwise, src ends up pointing at the wrong place.
> > >
> > > > +             if (copy_to_user(dest, src, copylen))
> > > > +                     return -EFAULT;
> > > > +             remain -= copylen;
> > > > +             pos += copylen;
> > > > +             dest += copylen;
> > > > +     }
> > > > +     /* Copy kvm vcpu stats values */
> > > > +     copylen = kvm_vcpu_stats_header.data_offset + size_stats - pos;
> > >
> > > The same problem occurs here. There is a potential gap before
> > > data_offset that needs to be taken into account for src and len.
> > >
> > > Would it be possible to just ensure that there is no gap? maybe even
> > > remove data_offset and desc_offset and always place them adjacent, and
> > > have the descriptors right after the header.
> > >
> > I guess I didn't make it clear about the offset fields in the header block.
> > We don't create any gap here. In this implementation, kernel knows that
> > descriptor block is right after header block and data block is right after
> > descriptor block.
> > The reason we have offset fields for descriptor block and data block is
> > for flexibility and future potential extension. e.g. we might add another
> > block between header block and descriptor block in the future for some
> > other metadata information.
> > I think we are good here.
>
> Hi Jing,
>
> I realize they are adjacent right now, as the function wouldn't work if
> they weren't. My comment was more about code maintenance, what happens
> if the layout changes. This function depends on an asumption about a
> layout defined somewhere else. For example,
>
>         copylen = kvm_vm_stats_header.desc_offset + size_desc - pos;
>
> makes an assumption about desc_offset being set to:
>
>         .desc_offset = sizeof(struct kvm_stats_header),
>
> and if desc_offset is not exactly that, then the function doesn't
> explicitely fail and instead does unexpected things (probably undetected
> by tests).
>
> I think the solution is to just check the assumptions. Either an assert
> or just bail out with a warning:
>
>         /* This function currently depends on the following layout. */
>         if (kvm_vm_stats_header.desc_offset != sizeof(struct kvm_stats_header) ||
>                         kvm_vm_stats_header.data_offset != sizeof(struct kvm_stats_header) +
>                         sizeof(kvm_vm_stats_desc)) {
>                 warning(...);
>                 return 0;
>         }
>
I understand your concern. But whenever layout changes, the read function needs
to be updated anyway. The read function is actually the place to cook
the data layout
of the anonymous file. If the vm/vcpu stats header has an incorrect
offset value that is
defined in the read function, the test will complain about wrong stats
descriptor field
values usually.
Anyway, I will add more sanity tests in the selftest to cover the
potential risks.
Thanks.
> > > > +     copylen = min(copylen, remain);
> > > > +     if (copylen > 0) {
> > > > +             src = (void *)&vcpu->stat;
> > > > +             src += pos - kvm_vcpu_stats_header.data_offset;
> > > > +             if (copy_to_user(dest, src, copylen))
> > > > +                     return -EFAULT;
> > > > +             remain -= copylen;
> > > > +             pos += copylen;
> > > > +             dest += copylen;
> > > > +     }
> > > > +
> > > > +     *offset = pos;
> > > > +     return len;
> > > > +}
> > > > +
> > > >
> > >
> > >
> > >
> > > > +static ssize_t kvm_vm_stats_read(struct file *file, char __user *user_buffer,
> > > > +                           size_t size, loff_t *offset)
> > > > +{
> > >
> > > Consider moving the common code between kvm_vcpu_stats_read and this one
> > > into some function that takes pointers to header, desc, and data. Unless
> > > there is something vcpu or vm specific besides that.
> > >
> > Will do that, thanks.
> > > > +     char id[KVM_STATS_ID_MAXLEN];
> > > > +     struct kvm *kvm = file->private_data;
> > > > +     ssize_t copylen, len, remain = size;
> > > > +     size_t size_header, size_desc, size_stats;
> > > > +     loff_t pos = *offset;
> > > > +     char __user *dest = user_buffer;
> > > > +     void *src;
> > > > +
> > > > +     snprintf(id, sizeof(id), "kvm-%d", task_pid_nr(current));
> > > > +     size_header = sizeof(kvm_vm_stats_header);
> > > > +     size_desc = kvm_vm_stats_header.count * sizeof(struct _kvm_stats_desc);
> > > > +     size_stats = sizeof(kvm->stat);
> > > > +
> > > > +     len = sizeof(id) + size_header + size_desc + size_stats - pos;
> > > > +     len = min(len, remain);
> > > > +     if (len <= 0)
> > > > +             return 0;
> > > > +     remain = len;
> > > > +
> > > > +     /* Copy kvm vm stats header id string */
> > > > +     copylen = sizeof(id) - pos;
> > > > +     copylen = min(copylen, remain);
> > > > +     if (copylen > 0) {
> > > > +             src = (void *)id + pos;
> > > > +             if (copy_to_user(dest, src, copylen))
> > > > +                     return -EFAULT;
> > > > +             remain -= copylen;
> > > > +             pos += copylen;
> > > > +             dest += copylen;
> > > > +     }
> > > > +     /* Copy kvm vm stats header */
> > > > +     copylen = sizeof(id) + size_header - pos;
> > > > +     copylen = min(copylen, remain);
> > > > +     if (copylen > 0) {
> > > > +             src = (void *)&kvm_vm_stats_header;
> > > > +             src += pos - sizeof(id);
> > > > +             if (copy_to_user(dest, src, copylen))
> > > > +                     return -EFAULT;
> > > > +             remain -= copylen;
> > > > +             pos += copylen;
> > > > +             dest += copylen;
> > > > +     }
> > > > +     /* Copy kvm vm stats descriptors */
> > > > +     copylen = kvm_vm_stats_header.desc_offset + size_desc - pos;
> > > > +     copylen = min(copylen, remain);
> > > > +     if (copylen > 0) {
> > > > +             src = (void *)&kvm_vm_stats_desc;
> > > > +             src += pos - kvm_vm_stats_header.desc_offset;
> > > > +             if (copy_to_user(dest, src, copylen))
> > > > +                     return -EFAULT;
> > > > +             remain -= copylen;
> > > > +             pos += copylen;
> > > > +             dest += copylen;
> > > > +     }
> > > > +     /* Copy kvm vm stats values */
> > > > +     copylen = kvm_vm_stats_header.data_offset + size_stats - pos;
> > > > +     copylen = min(copylen, remain);
> > > > +     if (copylen > 0) {
> > > > +             src = (void *)&kvm->stat;
> > > > +             src += pos - kvm_vm_stats_header.data_offset;
> > > > +             if (copy_to_user(dest, src, copylen))
> > > > +                     return -EFAULT;
> > > > +             remain -= copylen;
> > > > +             pos += copylen;
> > > > +             dest += copylen;
> > > > +     }
> > > > +
> > > > +     *offset = pos;
> > > > +     return len;
> > > > +}
> > > > +
> > > > --
> > > > 2.31.1.751.gd2f1c929bd-goog
> > > >
> > > > _______________________________________________
> > > > kvmarm mailing list
> > > > kvmarm@lists.cs.columbia.edu
> > > > https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

Jing

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 2/4] KVM: stats: Add fd-based API to read binary stats data
  2021-05-20 19:46         ` Jing Zhang
@ 2021-05-20 20:50           ` Ricardo Koller
  2021-05-20 21:14             ` Jing Zhang
  0 siblings, 1 reply; 30+ messages in thread
From: Ricardo Koller @ 2021-05-20 20:50 UTC (permalink / raw)
  To: Jing Zhang
  Cc: KVM, KVMARM, LinuxMIPS, KVMPPC, LinuxS390, Linuxkselftest,
	Paolo Bonzini, Marc Zyngier, James Morse, Julien Thierry,
	Suzuki K Poulose, Will Deacon, Huacai Chen, Aleksandar Markovic,
	Thomas Bogendoerfer, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck, Claudio Imbrenda,
	Sean Christopherson, Vitaly Kuznetsov, Jim Mattson, Peter Shier,
	Oliver Upton, David Rientjes, Emanuele Giuseppe Esposito

On Thu, May 20, 2021 at 02:46:41PM -0500, Jing Zhang wrote:
> Hi Ricardo,
> 
> On Thu, May 20, 2021 at 1:58 PM Ricardo Koller <ricarkol@google.com> wrote:
> >
> > On Thu, May 20, 2021 at 12:37:59PM -0500, Jing Zhang wrote:
> > > Hi Ricardo,
> > >
> > > On Wed, May 19, 2021 at 11:21 PM Ricardo Koller <ricarkol@google.com> wrote:
> > > >
> > > > On Mon, May 17, 2021 at 02:53:12PM +0000, Jing Zhang wrote:
> > > > > Provides a file descriptor per VM to read VM stats info/data.
> > > > > Provides a file descriptor per vCPU to read vCPU stats info/data.
> > > > >
> > > > > Signed-off-by: Jing Zhang <jingzhangos@google.com>
> > > > > ---
> > > > >  arch/arm64/kvm/guest.c    |  26 +++++
> > > > >  arch/mips/kvm/mips.c      |  52 +++++++++
> > > > >  arch/powerpc/kvm/book3s.c |  52 +++++++++
> > > > >  arch/powerpc/kvm/booke.c  |  45 ++++++++
> > > > >  arch/s390/kvm/kvm-s390.c  | 117 ++++++++++++++++++++
> > > > >  arch/x86/kvm/x86.c        |  53 +++++++++
> > > > >  include/linux/kvm_host.h  | 127 ++++++++++++++++++++++
> > > > >  include/uapi/linux/kvm.h  |  50 +++++++++
> > > > >  virt/kvm/kvm_main.c       | 223 ++++++++++++++++++++++++++++++++++++++
> > > > >  9 files changed, 745 insertions(+)
> > > > >
> > > >
> > > > > +static ssize_t kvm_vcpu_stats_read(struct file *file, char __user *user_buffer,
> > > > > +                           size_t size, loff_t *offset)
> > > > > +{
> > > > > +     char id[KVM_STATS_ID_MAXLEN];
> > > > > +     struct kvm_vcpu *vcpu = file->private_data;
> > > > > +     ssize_t copylen, len, remain = size;
> > > > > +     size_t size_header, size_desc, size_stats;
> > > > > +     loff_t pos = *offset;
> > > > > +     char __user *dest = user_buffer;
> > > > > +     void *src;
> > > >
> > > > Nit. Better to do pointer arithmetic on a "char *".  Note that gcc and
> > > > clang will do the expected thing.
> > > >
> > > > > +
> > > > > +     snprintf(id, sizeof(id), "kvm-%d/vcpu-%d",
> > > > > +                     task_pid_nr(current), vcpu->vcpu_id);
> > > > > +     size_header = sizeof(kvm_vcpu_stats_header);
> > > > > +     size_desc =
> > > > > +             kvm_vcpu_stats_header.count * sizeof(struct _kvm_stats_desc);
> > > > > +     size_stats = sizeof(vcpu->stat);
> > > > > +
> > > > > +     len = sizeof(id) + size_header + size_desc + size_stats - pos;
> > > > > +     len = min(len, remain);
> > > > > +     if (len <= 0)
> > > > > +             return 0;
> > > > > +     remain = len;
> > > >
> > > > If 'desc_offset' is not right after the header, then the 'len'
> > > > calculation is missing the gap into account. For example, assuming there
> > > > is a gap of 0x1000000 between the header and the descriptors:
> > > >
> > > >         desc_offset = sizeof(id) + size_header + 0x1000000
> > > >
> > > > and the user calls the ioctl with enough space for the whole file,
> > > > including the gap:
> > > >
> > > >         *offset = 0
> > > >         size = sizeof(id) + size_header + size_desc + size_stats + 0x1000000
> > > >
> > > > then 'remain' gets the wrong size:
> > > >
> > > >         remain = sizeof(id) + size_header + size_desc + size_stats
> > > >
> > > > and ... (more below)
> > > >
> > > > > +
> > > > > +     /* Copy kvm vcpu stats header id string */
> > > > > +     copylen = sizeof(id) - pos;
> > > > > +     copylen = min(copylen, remain);
> > > > > +     if (copylen > 0) {
> > > > > +             src = (void *)id + pos;
> > > > > +             if (copy_to_user(dest, src, copylen))
> > > > > +                     return -EFAULT;
> > > > > +             remain -= copylen;
> > > > > +             pos += copylen;
> > > > > +             dest += copylen;
> > > > > +     }
> > > > > +     /* Copy kvm vcpu stats header */
> > > > > +     copylen = sizeof(id) + size_header - pos;
> > > > > +     copylen = min(copylen, remain);
> > > > > +     if (copylen > 0) {
> > > > > +             src = (void *)&kvm_vcpu_stats_header;
> > > > > +             src += pos - sizeof(id);
> > > > > +             if (copy_to_user(dest, src, copylen))
> > > > > +                     return -EFAULT;
> > > > > +             remain -= copylen;
> > > > > +             pos += copylen;
> > > > > +             dest += copylen;
> > > > > +     }
> > > > > +     /* Copy kvm vcpu stats descriptors */
> > > > > +     copylen = kvm_vcpu_stats_header.desc_offset + size_desc - pos;
> > > >
> > > > This would be the state at this point:
> > > >
> > > >         pos     = sizeof(id) + size_header
> > > >         copylen = sizeof(id) + size_header + 0x1000000 + size_desc - (sizeof(id) + size_header)
> > > >                 = 0x1000000 + size_desc
> > > >         remain  = size_desc + size_stats
> > > >
> > > > > +     copylen = min(copylen, remain);
> > > >
> > > >         copylen = size_desc + size_stats
> > > >
> > > > which is not enough to copy the descriptors (and the data).
> > > >
> > > > > +     if (copylen > 0) {
> > > > > +             src = (void *)&kvm_vcpu_stats_desc;
> > > > > +             src += pos - kvm_vcpu_stats_header.desc_offset;
> > > >
> > > > Moreover, src also needs to take the gap into account.
> > > >
> > > >         src     = &kvm_vcpu_stats_desc + (sizeof(id) + size_header) - (sizeof(id) + size_header + 0x1000000)
> > > >                 = &kvm_vcpu_stats_desc - 0x1000000
> > > >
> > > > Otherwise, src ends up pointing at the wrong place.
> > > >
> > > > > +             if (copy_to_user(dest, src, copylen))
> > > > > +                     return -EFAULT;
> > > > > +             remain -= copylen;
> > > > > +             pos += copylen;
> > > > > +             dest += copylen;
> > > > > +     }
> > > > > +     /* Copy kvm vcpu stats values */
> > > > > +     copylen = kvm_vcpu_stats_header.data_offset + size_stats - pos;
> > > >
> > > > The same problem occurs here. There is a potential gap before
> > > > data_offset that needs to be taken into account for src and len.
> > > >
> > > > Would it be possible to just ensure that there is no gap? maybe even
> > > > remove data_offset and desc_offset and always place them adjacent, and
> > > > have the descriptors right after the header.
> > > >
> > > I guess I didn't make it clear about the offset fields in the header block.
> > > We don't create any gap here. In this implementation, kernel knows that
> > > descriptor block is right after header block and data block is right after
> > > descriptor block.
> > > The reason we have offset fields for descriptor block and data block is
> > > for flexibility and future potential extension. e.g. we might add another
> > > block between header block and descriptor block in the future for some
> > > other metadata information.
> > > I think we are good here.
> >
> > Hi Jing,
> >
> > I realize they are adjacent right now, as the function wouldn't work if
> > they weren't. My comment was more about code maintenance, what happens
> > if the layout changes. This function depends on an asumption about a
> > layout defined somewhere else. For example,
> >
> >         copylen = kvm_vm_stats_header.desc_offset + size_desc - pos;
> >
> > makes an assumption about desc_offset being set to:
> >
> >         .desc_offset = sizeof(struct kvm_stats_header),
> >
> > and if desc_offset is not exactly that, then the function doesn't
> > explicitely fail and instead does unexpected things (probably undetected
> > by tests).
> >
> > I think the solution is to just check the assumptions. Either an assert
> > or just bail out with a warning:
> >
> >         /* This function currently depends on the following layout. */
> >         if (kvm_vm_stats_header.desc_offset != sizeof(struct kvm_stats_header) ||
> >                         kvm_vm_stats_header.data_offset != sizeof(struct kvm_stats_header) +
> >                         sizeof(kvm_vm_stats_desc)) {
> >                 warning(...);
> >                 return 0;
> >         }
> >
> I understand your concern. But whenever layout changes, the read function needs
> to be updated anyway. The read function is actually the place to cook
> the data layout
> of the anonymous file.

Could it be a good idea for header.data_offset and header.desc_offset to
be set here (in the function)? so the function has full control of the
file layout.

> If the vm/vcpu stats header has an incorrect
> offset value that is
> defined in the read function, the test will complain about wrong stats
> descriptor field
> values usually.
> Anyway, I will add more sanity tests in the selftest to cover the
> potential risks.
> Thanks.
> > > > > +     copylen = min(copylen, remain);
> > > > > +     if (copylen > 0) {
> > > > > +             src = (void *)&vcpu->stat;
> > > > > +             src += pos - kvm_vcpu_stats_header.data_offset;
> > > > > +             if (copy_to_user(dest, src, copylen))
> > > > > +                     return -EFAULT;
> > > > > +             remain -= copylen;
> > > > > +             pos += copylen;
> > > > > +             dest += copylen;
> > > > > +     }
> > > > > +
> > > > > +     *offset = pos;
> > > > > +     return len;
> > > > > +}
> > > > > +
> > > > >
> > > >
> > > >
> > > >
> > > > > +static ssize_t kvm_vm_stats_read(struct file *file, char __user *user_buffer,
> > > > > +                           size_t size, loff_t *offset)
> > > > > +{
> > > >
> > > > Consider moving the common code between kvm_vcpu_stats_read and this one
> > > > into some function that takes pointers to header, desc, and data. Unless
> > > > there is something vcpu or vm specific besides that.
> > > >
> > > Will do that, thanks.
> > > > > +     char id[KVM_STATS_ID_MAXLEN];
> > > > > +     struct kvm *kvm = file->private_data;
> > > > > +     ssize_t copylen, len, remain = size;
> > > > > +     size_t size_header, size_desc, size_stats;
> > > > > +     loff_t pos = *offset;
> > > > > +     char __user *dest = user_buffer;
> > > > > +     void *src;
> > > > > +
> > > > > +     snprintf(id, sizeof(id), "kvm-%d", task_pid_nr(current));
> > > > > +     size_header = sizeof(kvm_vm_stats_header);
> > > > > +     size_desc = kvm_vm_stats_header.count * sizeof(struct _kvm_stats_desc);
> > > > > +     size_stats = sizeof(kvm->stat);
> > > > > +
> > > > > +     len = sizeof(id) + size_header + size_desc + size_stats - pos;
> > > > > +     len = min(len, remain);
> > > > > +     if (len <= 0)
> > > > > +             return 0;
> > > > > +     remain = len;
> > > > > +
> > > > > +     /* Copy kvm vm stats header id string */
> > > > > +     copylen = sizeof(id) - pos;
> > > > > +     copylen = min(copylen, remain);
> > > > > +     if (copylen > 0) {
> > > > > +             src = (void *)id + pos;
> > > > > +             if (copy_to_user(dest, src, copylen))
> > > > > +                     return -EFAULT;
> > > > > +             remain -= copylen;
> > > > > +             pos += copylen;
> > > > > +             dest += copylen;
> > > > > +     }
> > > > > +     /* Copy kvm vm stats header */
> > > > > +     copylen = sizeof(id) + size_header - pos;
> > > > > +     copylen = min(copylen, remain);
> > > > > +     if (copylen > 0) {
> > > > > +             src = (void *)&kvm_vm_stats_header;
> > > > > +             src += pos - sizeof(id);
> > > > > +             if (copy_to_user(dest, src, copylen))
> > > > > +                     return -EFAULT;
> > > > > +             remain -= copylen;
> > > > > +             pos += copylen;
> > > > > +             dest += copylen;
> > > > > +     }
> > > > > +     /* Copy kvm vm stats descriptors */
> > > > > +     copylen = kvm_vm_stats_header.desc_offset + size_desc - pos;
> > > > > +     copylen = min(copylen, remain);
> > > > > +     if (copylen > 0) {
> > > > > +             src = (void *)&kvm_vm_stats_desc;
> > > > > +             src += pos - kvm_vm_stats_header.desc_offset;
> > > > > +             if (copy_to_user(dest, src, copylen))
> > > > > +                     return -EFAULT;
> > > > > +             remain -= copylen;
> > > > > +             pos += copylen;
> > > > > +             dest += copylen;
> > > > > +     }
> > > > > +     /* Copy kvm vm stats values */
> > > > > +     copylen = kvm_vm_stats_header.data_offset + size_stats - pos;
> > > > > +     copylen = min(copylen, remain);
> > > > > +     if (copylen > 0) {
> > > > > +             src = (void *)&kvm->stat;
> > > > > +             src += pos - kvm_vm_stats_header.data_offset;
> > > > > +             if (copy_to_user(dest, src, copylen))
> > > > > +                     return -EFAULT;
> > > > > +             remain -= copylen;
> > > > > +             pos += copylen;
> > > > > +             dest += copylen;
> > > > > +     }
> > > > > +
> > > > > +     *offset = pos;
> > > > > +     return len;
> > > > > +}
> > > > > +
> > > > > --
> > > > > 2.31.1.751.gd2f1c929bd-goog
> > > > >
> > > > > _______________________________________________
> > > > > kvmarm mailing list
> > > > > kvmarm@lists.cs.columbia.edu
> > > > > https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
> 
> Jing

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 2/4] KVM: stats: Add fd-based API to read binary stats data
  2021-05-20 20:50           ` Ricardo Koller
@ 2021-05-20 21:14             ` Jing Zhang
  0 siblings, 0 replies; 30+ messages in thread
From: Jing Zhang @ 2021-05-20 21:14 UTC (permalink / raw)
  To: Ricardo Koller
  Cc: KVM, KVMARM, LinuxMIPS, KVMPPC, LinuxS390, Linuxkselftest,
	Paolo Bonzini, Marc Zyngier, James Morse, Julien Thierry,
	Suzuki K Poulose, Will Deacon, Huacai Chen, Aleksandar Markovic,
	Thomas Bogendoerfer, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck, Claudio Imbrenda,
	Sean Christopherson, Vitaly Kuznetsov, Jim Mattson, Peter Shier,
	Oliver Upton, David Rientjes, Emanuele Giuseppe Esposito

On Thu, May 20, 2021 at 3:51 PM Ricardo Koller <ricarkol@google.com> wrote:
>
> On Thu, May 20, 2021 at 02:46:41PM -0500, Jing Zhang wrote:
> > Hi Ricardo,
> >
> > On Thu, May 20, 2021 at 1:58 PM Ricardo Koller <ricarkol@google.com> wrote:
> > >
> > > On Thu, May 20, 2021 at 12:37:59PM -0500, Jing Zhang wrote:
> > > > Hi Ricardo,
> > > >
> > > > On Wed, May 19, 2021 at 11:21 PM Ricardo Koller <ricarkol@google.com> wrote:
> > > > >
> > > > > On Mon, May 17, 2021 at 02:53:12PM +0000, Jing Zhang wrote:
> > > > > > Provides a file descriptor per VM to read VM stats info/data.
> > > > > > Provides a file descriptor per vCPU to read vCPU stats info/data.
> > > > > >
> > > > > > Signed-off-by: Jing Zhang <jingzhangos@google.com>
> > > > > > ---
> > > > > >  arch/arm64/kvm/guest.c    |  26 +++++
> > > > > >  arch/mips/kvm/mips.c      |  52 +++++++++
> > > > > >  arch/powerpc/kvm/book3s.c |  52 +++++++++
> > > > > >  arch/powerpc/kvm/booke.c  |  45 ++++++++
> > > > > >  arch/s390/kvm/kvm-s390.c  | 117 ++++++++++++++++++++
> > > > > >  arch/x86/kvm/x86.c        |  53 +++++++++
> > > > > >  include/linux/kvm_host.h  | 127 ++++++++++++++++++++++
> > > > > >  include/uapi/linux/kvm.h  |  50 +++++++++
> > > > > >  virt/kvm/kvm_main.c       | 223 ++++++++++++++++++++++++++++++++++++++
> > > > > >  9 files changed, 745 insertions(+)
> > > > > >
> > > > >
> > > > > > +static ssize_t kvm_vcpu_stats_read(struct file *file, char __user *user_buffer,
> > > > > > +                           size_t size, loff_t *offset)
> > > > > > +{
> > > > > > +     char id[KVM_STATS_ID_MAXLEN];
> > > > > > +     struct kvm_vcpu *vcpu = file->private_data;
> > > > > > +     ssize_t copylen, len, remain = size;
> > > > > > +     size_t size_header, size_desc, size_stats;
> > > > > > +     loff_t pos = *offset;
> > > > > > +     char __user *dest = user_buffer;
> > > > > > +     void *src;
> > > > >
> > > > > Nit. Better to do pointer arithmetic on a "char *".  Note that gcc and
> > > > > clang will do the expected thing.
> > > > >
> > > > > > +
> > > > > > +     snprintf(id, sizeof(id), "kvm-%d/vcpu-%d",
> > > > > > +                     task_pid_nr(current), vcpu->vcpu_id);
> > > > > > +     size_header = sizeof(kvm_vcpu_stats_header);
> > > > > > +     size_desc =
> > > > > > +             kvm_vcpu_stats_header.count * sizeof(struct _kvm_stats_desc);
> > > > > > +     size_stats = sizeof(vcpu->stat);
> > > > > > +
> > > > > > +     len = sizeof(id) + size_header + size_desc + size_stats - pos;
> > > > > > +     len = min(len, remain);
> > > > > > +     if (len <= 0)
> > > > > > +             return 0;
> > > > > > +     remain = len;
> > > > >
> > > > > If 'desc_offset' is not right after the header, then the 'len'
> > > > > calculation is missing the gap into account. For example, assuming there
> > > > > is a gap of 0x1000000 between the header and the descriptors:
> > > > >
> > > > >         desc_offset = sizeof(id) + size_header + 0x1000000
> > > > >
> > > > > and the user calls the ioctl with enough space for the whole file,
> > > > > including the gap:
> > > > >
> > > > >         *offset = 0
> > > > >         size = sizeof(id) + size_header + size_desc + size_stats + 0x1000000
> > > > >
> > > > > then 'remain' gets the wrong size:
> > > > >
> > > > >         remain = sizeof(id) + size_header + size_desc + size_stats
> > > > >
> > > > > and ... (more below)
> > > > >
> > > > > > +
> > > > > > +     /* Copy kvm vcpu stats header id string */
> > > > > > +     copylen = sizeof(id) - pos;
> > > > > > +     copylen = min(copylen, remain);
> > > > > > +     if (copylen > 0) {
> > > > > > +             src = (void *)id + pos;
> > > > > > +             if (copy_to_user(dest, src, copylen))
> > > > > > +                     return -EFAULT;
> > > > > > +             remain -= copylen;
> > > > > > +             pos += copylen;
> > > > > > +             dest += copylen;
> > > > > > +     }
> > > > > > +     /* Copy kvm vcpu stats header */
> > > > > > +     copylen = sizeof(id) + size_header - pos;
> > > > > > +     copylen = min(copylen, remain);
> > > > > > +     if (copylen > 0) {
> > > > > > +             src = (void *)&kvm_vcpu_stats_header;
> > > > > > +             src += pos - sizeof(id);
> > > > > > +             if (copy_to_user(dest, src, copylen))
> > > > > > +                     return -EFAULT;
> > > > > > +             remain -= copylen;
> > > > > > +             pos += copylen;
> > > > > > +             dest += copylen;
> > > > > > +     }
> > > > > > +     /* Copy kvm vcpu stats descriptors */
> > > > > > +     copylen = kvm_vcpu_stats_header.desc_offset + size_desc - pos;
> > > > >
> > > > > This would be the state at this point:
> > > > >
> > > > >         pos     = sizeof(id) + size_header
> > > > >         copylen = sizeof(id) + size_header + 0x1000000 + size_desc - (sizeof(id) + size_header)
> > > > >                 = 0x1000000 + size_desc
> > > > >         remain  = size_desc + size_stats
> > > > >
> > > > > > +     copylen = min(copylen, remain);
> > > > >
> > > > >         copylen = size_desc + size_stats
> > > > >
> > > > > which is not enough to copy the descriptors (and the data).
> > > > >
> > > > > > +     if (copylen > 0) {
> > > > > > +             src = (void *)&kvm_vcpu_stats_desc;
> > > > > > +             src += pos - kvm_vcpu_stats_header.desc_offset;
> > > > >
> > > > > Moreover, src also needs to take the gap into account.
> > > > >
> > > > >         src     = &kvm_vcpu_stats_desc + (sizeof(id) + size_header) - (sizeof(id) + size_header + 0x1000000)
> > > > >                 = &kvm_vcpu_stats_desc - 0x1000000
> > > > >
> > > > > Otherwise, src ends up pointing at the wrong place.
> > > > >
> > > > > > +             if (copy_to_user(dest, src, copylen))
> > > > > > +                     return -EFAULT;
> > > > > > +             remain -= copylen;
> > > > > > +             pos += copylen;
> > > > > > +             dest += copylen;
> > > > > > +     }
> > > > > > +     /* Copy kvm vcpu stats values */
> > > > > > +     copylen = kvm_vcpu_stats_header.data_offset + size_stats - pos;
> > > > >
> > > > > The same problem occurs here. There is a potential gap before
> > > > > data_offset that needs to be taken into account for src and len.
> > > > >
> > > > > Would it be possible to just ensure that there is no gap? maybe even
> > > > > remove data_offset and desc_offset and always place them adjacent, and
> > > > > have the descriptors right after the header.
> > > > >
> > > > I guess I didn't make it clear about the offset fields in the header block.
> > > > We don't create any gap here. In this implementation, kernel knows that
> > > > descriptor block is right after header block and data block is right after
> > > > descriptor block.
> > > > The reason we have offset fields for descriptor block and data block is
> > > > for flexibility and future potential extension. e.g. we might add another
> > > > block between header block and descriptor block in the future for some
> > > > other metadata information.
> > > > I think we are good here.
> > >
> > > Hi Jing,
> > >
> > > I realize they are adjacent right now, as the function wouldn't work if
> > > they weren't. My comment was more about code maintenance, what happens
> > > if the layout changes. This function depends on an asumption about a
> > > layout defined somewhere else. For example,
> > >
> > >         copylen = kvm_vm_stats_header.desc_offset + size_desc - pos;
> > >
> > > makes an assumption about desc_offset being set to:
> > >
> > >         .desc_offset = sizeof(struct kvm_stats_header),
> > >
> > > and if desc_offset is not exactly that, then the function doesn't
> > > explicitely fail and instead does unexpected things (probably undetected
> > > by tests).
> > >
> > > I think the solution is to just check the assumptions. Either an assert
> > > or just bail out with a warning:
> > >
> > >         /* This function currently depends on the following layout. */
> > >         if (kvm_vm_stats_header.desc_offset != sizeof(struct kvm_stats_header) ||
> > >                         kvm_vm_stats_header.data_offset != sizeof(struct kvm_stats_header) +
> > >                         sizeof(kvm_vm_stats_desc)) {
> > >                 warning(...);
> > >                 return 0;
> > >         }
> > >
> > I understand your concern. But whenever layout changes, the read function needs
> > to be updated anyway. The read function is actually the place to cook
> > the data layout
> > of the anonymous file.
>
> Could it be a good idea for header.data_offset and header.desc_offset to
> be set here (in the function)? so the function has full control of the
> file layout.
>
It is hard to do that since all those values are architecture dependent.
> > If the vm/vcpu stats header has an incorrect
> > offset value that is
> > defined in the read function, the test will complain about wrong stats
> > descriptor field
> > values usually.
> > Anyway, I will add more sanity tests in the selftest to cover the
> > potential risks.
> > Thanks.
> > > > > > +     copylen = min(copylen, remain);
> > > > > > +     if (copylen > 0) {
> > > > > > +             src = (void *)&vcpu->stat;
> > > > > > +             src += pos - kvm_vcpu_stats_header.data_offset;
> > > > > > +             if (copy_to_user(dest, src, copylen))
> > > > > > +                     return -EFAULT;
> > > > > > +             remain -= copylen;
> > > > > > +             pos += copylen;
> > > > > > +             dest += copylen;
> > > > > > +     }
> > > > > > +
> > > > > > +     *offset = pos;
> > > > > > +     return len;
> > > > > > +}
> > > > > > +
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > > +static ssize_t kvm_vm_stats_read(struct file *file, char __user *user_buffer,
> > > > > > +                           size_t size, loff_t *offset)
> > > > > > +{
> > > > >
> > > > > Consider moving the common code between kvm_vcpu_stats_read and this one
> > > > > into some function that takes pointers to header, desc, and data. Unless
> > > > > there is something vcpu or vm specific besides that.
> > > > >
> > > > Will do that, thanks.
> > > > > > +     char id[KVM_STATS_ID_MAXLEN];
> > > > > > +     struct kvm *kvm = file->private_data;
> > > > > > +     ssize_t copylen, len, remain = size;
> > > > > > +     size_t size_header, size_desc, size_stats;
> > > > > > +     loff_t pos = *offset;
> > > > > > +     char __user *dest = user_buffer;
> > > > > > +     void *src;
> > > > > > +
> > > > > > +     snprintf(id, sizeof(id), "kvm-%d", task_pid_nr(current));
> > > > > > +     size_header = sizeof(kvm_vm_stats_header);
> > > > > > +     size_desc = kvm_vm_stats_header.count * sizeof(struct _kvm_stats_desc);
> > > > > > +     size_stats = sizeof(kvm->stat);
> > > > > > +
> > > > > > +     len = sizeof(id) + size_header + size_desc + size_stats - pos;
> > > > > > +     len = min(len, remain);
> > > > > > +     if (len <= 0)
> > > > > > +             return 0;
> > > > > > +     remain = len;
> > > > > > +
> > > > > > +     /* Copy kvm vm stats header id string */
> > > > > > +     copylen = sizeof(id) - pos;
> > > > > > +     copylen = min(copylen, remain);
> > > > > > +     if (copylen > 0) {
> > > > > > +             src = (void *)id + pos;
> > > > > > +             if (copy_to_user(dest, src, copylen))
> > > > > > +                     return -EFAULT;
> > > > > > +             remain -= copylen;
> > > > > > +             pos += copylen;
> > > > > > +             dest += copylen;
> > > > > > +     }
> > > > > > +     /* Copy kvm vm stats header */
> > > > > > +     copylen = sizeof(id) + size_header - pos;
> > > > > > +     copylen = min(copylen, remain);
> > > > > > +     if (copylen > 0) {
> > > > > > +             src = (void *)&kvm_vm_stats_header;
> > > > > > +             src += pos - sizeof(id);
> > > > > > +             if (copy_to_user(dest, src, copylen))
> > > > > > +                     return -EFAULT;
> > > > > > +             remain -= copylen;
> > > > > > +             pos += copylen;
> > > > > > +             dest += copylen;
> > > > > > +     }
> > > > > > +     /* Copy kvm vm stats descriptors */
> > > > > > +     copylen = kvm_vm_stats_header.desc_offset + size_desc - pos;
> > > > > > +     copylen = min(copylen, remain);
> > > > > > +     if (copylen > 0) {
> > > > > > +             src = (void *)&kvm_vm_stats_desc;
> > > > > > +             src += pos - kvm_vm_stats_header.desc_offset;
> > > > > > +             if (copy_to_user(dest, src, copylen))
> > > > > > +                     return -EFAULT;
> > > > > > +             remain -= copylen;
> > > > > > +             pos += copylen;
> > > > > > +             dest += copylen;
> > > > > > +     }
> > > > > > +     /* Copy kvm vm stats values */
> > > > > > +     copylen = kvm_vm_stats_header.data_offset + size_stats - pos;
> > > > > > +     copylen = min(copylen, remain);
> > > > > > +     if (copylen > 0) {
> > > > > > +             src = (void *)&kvm->stat;
> > > > > > +             src += pos - kvm_vm_stats_header.data_offset;
> > > > > > +             if (copy_to_user(dest, src, copylen))
> > > > > > +                     return -EFAULT;
> > > > > > +             remain -= copylen;
> > > > > > +             pos += copylen;
> > > > > > +             dest += copylen;
> > > > > > +     }
> > > > > > +
> > > > > > +     *offset = pos;
> > > > > > +     return len;
> > > > > > +}
> > > > > > +
> > > > > > --
> > > > > > 2.31.1.751.gd2f1c929bd-goog
> > > > > >
> > > > > > _______________________________________________
> > > > > > kvmarm mailing list
> > > > > > kvmarm@lists.cs.columbia.edu
> > > > > > https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
> >
> > Jing

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 4/4] KVM: selftests: Add selftest for KVM statistics data binary interface
  2021-05-19 22:00   ` Ricardo Koller
  2021-05-19 22:54     ` Jing Zhang
@ 2021-05-20 21:30     ` Jing Zhang
  1 sibling, 0 replies; 30+ messages in thread
From: Jing Zhang @ 2021-05-20 21:30 UTC (permalink / raw)
  To: Ricardo Koller
  Cc: KVM, KVMARM, LinuxMIPS, KVMPPC, LinuxS390, Linuxkselftest,
	Paolo Bonzini, Marc Zyngier, James Morse, Julien Thierry,
	Suzuki K Poulose, Will Deacon, Huacai Chen, Aleksandar Markovic,
	Thomas Bogendoerfer, Paul Mackerras, Christian Borntraeger,
	Janosch Frank, David Hildenbrand, Cornelia Huck, Claudio Imbrenda,
	Sean Christopherson, Vitaly Kuznetsov, Jim Mattson, Peter Shier,
	Oliver Upton, David Rientjes, Emanuele Giuseppe Esposito

On Wed, May 19, 2021 at 5:00 PM Ricardo Koller <ricarkol@google.com> wrote:
>
> On Mon, May 17, 2021 at 02:53:14PM +0000, Jing Zhang wrote:
> > Add selftest to check KVM stats descriptors validity.
> >
> > Signed-off-by: Jing Zhang <jingzhangos@google.com>
> > ---
> >  tools/testing/selftests/kvm/.gitignore        |   1 +
> >  tools/testing/selftests/kvm/Makefile          |   3 +
> >  .../testing/selftests/kvm/include/kvm_util.h  |   3 +
> >  .../selftests/kvm/kvm_bin_form_stats.c        | 379 ++++++++++++++++++
> >  tools/testing/selftests/kvm/lib/kvm_util.c    |  12 +
> >  5 files changed, 398 insertions(+)
> >  create mode 100644 tools/testing/selftests/kvm/kvm_bin_form_stats.c
> >
> > diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
> > index bd83158e0e0b..35796667c944 100644
> > --- a/tools/testing/selftests/kvm/.gitignore
> > +++ b/tools/testing/selftests/kvm/.gitignore
> > @@ -43,3 +43,4 @@
> >  /memslot_modification_stress_test
> >  /set_memory_region_test
> >  /steal_time
> > +/kvm_bin_form_stats
> > diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
> > index e439d027939d..2984c86c848a 100644
> > --- a/tools/testing/selftests/kvm/Makefile
> > +++ b/tools/testing/selftests/kvm/Makefile
> > @@ -76,6 +76,7 @@ TEST_GEN_PROGS_x86_64 += kvm_page_table_test
> >  TEST_GEN_PROGS_x86_64 += memslot_modification_stress_test
> >  TEST_GEN_PROGS_x86_64 += set_memory_region_test
> >  TEST_GEN_PROGS_x86_64 += steal_time
> > +TEST_GEN_PROGS_x86_64 += kvm_bin_form_stats
> >
> >  TEST_GEN_PROGS_aarch64 += aarch64/get-reg-list
> >  TEST_GEN_PROGS_aarch64 += aarch64/get-reg-list-sve
> > @@ -87,6 +88,7 @@ TEST_GEN_PROGS_aarch64 += kvm_create_max_vcpus
> >  TEST_GEN_PROGS_aarch64 += kvm_page_table_test
> >  TEST_GEN_PROGS_aarch64 += set_memory_region_test
> >  TEST_GEN_PROGS_aarch64 += steal_time
> > +TEST_GEN_PROGS_aarch64 += kvm_bin_form_stats
> >
> >  TEST_GEN_PROGS_s390x = s390x/memop
> >  TEST_GEN_PROGS_s390x += s390x/resets
> > @@ -96,6 +98,7 @@ TEST_GEN_PROGS_s390x += dirty_log_test
> >  TEST_GEN_PROGS_s390x += kvm_create_max_vcpus
> >  TEST_GEN_PROGS_s390x += kvm_page_table_test
> >  TEST_GEN_PROGS_s390x += set_memory_region_test
> > +TEST_GEN_PROGS_s390x += kvm_bin_form_stats
> >
> >  TEST_GEN_PROGS += $(TEST_GEN_PROGS_$(UNAME_M))
> >  LIBKVM += $(LIBKVM_$(UNAME_M))
> > diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
> > index a8f022794ce3..ee01a67022d9 100644
> > --- a/tools/testing/selftests/kvm/include/kvm_util.h
> > +++ b/tools/testing/selftests/kvm/include/kvm_util.h
> > @@ -387,4 +387,7 @@ uint64_t get_ucall(struct kvm_vm *vm, uint32_t vcpu_id, struct ucall *uc);
> >  #define GUEST_ASSERT_4(_condition, arg1, arg2, arg3, arg4) \
> >       __GUEST_ASSERT((_condition), 4, (arg1), (arg2), (arg3), (arg4))
> >
> > +int vm_get_statsfd(struct kvm_vm *vm);
> > +int vcpu_get_statsfd(struct kvm_vm *vm, uint32_t vcpuid);
> > +
> >  #endif /* SELFTEST_KVM_UTIL_H */
> > diff --git a/tools/testing/selftests/kvm/kvm_bin_form_stats.c b/tools/testing/selftests/kvm/kvm_bin_form_stats.c
> > new file mode 100644
> > index 000000000000..dae44397d0f4
> > --- /dev/null
> > +++ b/tools/testing/selftests/kvm/kvm_bin_form_stats.c
> > @@ -0,0 +1,379 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * kvm_bin_form_stats
> > + *
> > + * Copyright (C) 2021, Google LLC.
> > + *
> > + * Test the fd-based interface for KVM statistics.
> > + */
> > +
> > +#define _GNU_SOURCE /* for program_invocation_short_name */
> > +#include <fcntl.h>
> > +#include <stdio.h>
> > +#include <stdlib.h>
> > +#include <string.h>
> > +#include <errno.h>
> > +
> > +#include "test_util.h"
> > +
> > +#include "kvm_util.h"
> > +#include "asm/kvm.h"
> > +#include "linux/kvm.h"
> > +
> > +int vm_stats_test(struct kvm_vm *vm)
> > +{
> > +     ssize_t ret;
> > +     int i, stats_fd, err = -1;
> > +     size_t size_desc, size_data = 0;
> > +     struct kvm_stats_header header;
> > +     struct kvm_stats_desc *stats_desc, *pdesc;
> > +     struct kvm_vm_stats_data *stats_data;
> > +
> > +     /* Get fd for VM stats */
> > +     stats_fd = vm_get_statsfd(vm);
> > +     if (stats_fd < 0) {
> > +             perror("Get VM stats fd");
> > +             return err;
> > +     }
>
> It seems that the only difference between vm_stats_test and
> vcpu_stats_test is what function to use for getting the fd.  If that's
> the case, it might be better to move all the checks to a common
> function.
>
> > +     /* Read kvm vm stats header */
> > +     ret = read(stats_fd, &header, sizeof(header));
> > +     if (ret != sizeof(header)) {
> > +             perror("Read VM stats header");
> > +             goto out_close_fd;
> > +     }
> > +     size_desc = sizeof(*stats_desc) + header.name_size;
> > +     /* Check id string in header, that should start with "kvm" */
> > +     if (strncmp(header.id, "kvm", 3) ||
> > +                     strlen(header.id) >= KVM_STATS_ID_MAXLEN) {
> > +             printf("Invalid KVM VM stats type!\n");
> > +             goto out_close_fd;
> > +     }
> > +     /* Sanity check for other fields in header */
> > +     if (header.count == 0) {
> > +             err = 0;
> > +             goto out_close_fd;
> > +     }
>
> As mentioned by David, it would be better to replace the checks with
> TEST_ASSERT's. Most other selftests rely on TEST_ASSERT.
>
> > +     /* Check overlap */
> > +     if (header.desc_offset == 0 || header.data_offset == 0 ||
> > +                     header.desc_offset < sizeof(header) ||
> > +                     header.data_offset < sizeof(header)) {
> > +             printf("Invalid offset fields in header!\n");
> > +             goto out_close_fd;
> > +     }
> > +     if (header.desc_offset < header.data_offset &&
> > +                     (header.desc_offset + size_desc * header.count >
> > +                     header.data_offset)) {
>
> Could you make the check more strict?
>
> TEST_ASSERT(header.desc_offset + size_desc * header.count == header.data_offset,
>         "The data block should be at the end of the descriptor block.");
>
> > +             printf("VM Descriptor block is overlapped with data block!\n");
> > +             goto out_close_fd;
> > +     }
> > +
> > +     /* Allocate memory for stats descriptors */
> > +     stats_desc = calloc(header.count, size_desc);
> > +     if (!stats_desc) {
> > +             perror("Allocate memory for VM stats descriptors");
> > +             goto out_close_fd;
> > +     }
> > +     /* Read kvm vm stats descriptors */
> > +     ret = pread(stats_fd, stats_desc,
> > +                     size_desc * header.count, header.desc_offset);
>
> You could stress kvm_vm_stats_read() more by calling pread for more
> offsets. For example, for every descriptor:
>
>         pread(..., header.desc_offset + i * size_desc)
>
> I realize that the typical usage will be to read once for all
> descriptors. But kvm_vm_stats_read (and kvm_vcpu_stats_read) need to
> handle any offset, and doing so seems to be quite complicated.
>
> Actually, you could stress kvm_vm_stats_read() even more by calling it
> for _every_ possible offset (and eventually invalid offsets and sizes).
> One easier way to check this is by calling read all descriptors into
> some reference buffer using a single pread, and then call it for all
> offsets while comparing against the reference buf.
>
> > +     if (ret != size_desc * header.count) {
> > +             perror("Read KVM VM stats descriptors");
> > +             goto out_free_desc;
> > +     }
> > +     /* Sanity check for fields in descriptors */
> > +     for (i = 0; i < header.count; ++i) {
> > +             pdesc = (void *)stats_desc + i * size_desc;
>
> cast to (struct kvm_stats_desc *)
>
> > +             /* Check type,unit,scale boundaries */
> > +             if ((pdesc->flags & KVM_STATS_TYPE_MASK) > KVM_STATS_TYPE_MAX) {
> > +                     printf("Unknown KVM stats type!\n");
> > +                     goto out_free_desc;
> > +             }
> > +             if ((pdesc->flags & KVM_STATS_UNIT_MASK) > KVM_STATS_UNIT_MAX) {
> > +                     printf("Unknown KVM stats unit!\n");
> > +                     goto out_free_desc;
> > +             }
> > +             if ((pdesc->flags & KVM_STATS_SCALE_MASK) >
> > +                             KVM_STATS_SCALE_MAX) {
> > +                     printf("Unknown KVM stats scale!\n");
> > +                     goto out_free_desc;
> > +             }
> > +             /* Check exponent for stats unit
> > +              * Exponent for counter should be greater than or equal to 0
> > +              * Exponent for unit bytes should be greater than or equal to 0
> > +              * Exponent for unit seconds should be less than or equal to 0
> > +              * Exponent for unit clock cycles should be greater than or
> > +              * equal to 0
> > +              */
> > +             switch (pdesc->flags & KVM_STATS_UNIT_MASK) {
> > +             case KVM_STATS_UNIT_NONE:
> > +             case KVM_STATS_UNIT_BYTES:
> > +             case KVM_STATS_UNIT_CYCLES:
> > +                     if (pdesc->exponent < 0) {
> > +                             printf("Unsupported KVM stats unit!\n");
> > +                             goto out_free_desc;
> > +                     }
> > +                     break;
> > +             case KVM_STATS_UNIT_SECONDS:
> > +                     if (pdesc->exponent > 0) {
> > +                             printf("Unsupported KVM stats unit!\n");
> > +                             goto out_free_desc;
> > +                     }
> > +                     break;
>
>                 default:
>                         TEST_FAIL("Unexpected unit ...");
>
> > +             }
> > +             /* Check name string */
> > +             if (strlen(pdesc->name) >= header.name_size) {
> > +                     printf("KVM stats name(%s) too long!\n", pdesc->name);
> > +                     goto out_free_desc;
> > +             }
>
> Tighter check:
>
> TEST_ASSERT(header.name_size > 0 &&
>         strlen(pdesc->name) + 1 == header.name_size);
>
> > +             /* Check size field, which should not be zero */
> > +             if (pdesc->size == 0) {
> > +                     printf("KVM descriptor(%s) with size of 0!\n",
> > +                                     pdesc->name);
> > +                     goto out_free_desc;
> > +             }
> > +             size_data += pdesc->size * sizeof(stats_data->value[0]);
> > +     }
> > +     /* Check overlap */
> > +     if (header.data_offset < header.desc_offset &&
> > +             header.data_offset + size_data > header.desc_offset) {
> > +             printf("Data block is overlapped with Descriptor block!\n");
> > +             goto out_free_desc;
> > +     }
>
> This won't be needed if you use the suggested TEST_ASSERT (the other
> overlap check).
>
> > +     /* Check validity of all stats data size */
> > +     if (size_data < header.count * sizeof(stats_data->value[0])) {
> > +             printf("Data size is not correct!\n");
> > +             goto out_free_desc;
> > +     }
>
> Tighter check:
>
> TEST_ASSERT(size_data == header.count * stats_data->value[0]);
>
The value of size_data should be equal or larger than
header.count * stats_data->value[0], since some stats may
have size (the size field in descriptor) larger than 1.
> > +
> > +     /* Allocate memory for stats data */
> > +     stats_data = malloc(size_data);
> > +     if (!stats_data) {
> > +             perror("Allocate memory for VM stats data");
> > +             goto out_free_desc;
> > +     }
> > +     /* Read kvm vm stats data */
> > +     ret = pread(stats_fd, stats_data, size_data, header.data_offset);
> > +     if (ret != size_data) {
> > +             perror("Read KVM VM stats data");
> > +             goto out_free_data;
> > +     }
> > +
> > +     err = 0;
> > +out_free_data:
> > +     free(stats_data);
> > +out_free_desc:
> > +     free(stats_desc);
> > +out_close_fd:
> > +     close(stats_fd);
> > +     return err;
> > +}
> > +
> > +int vcpu_stats_test(struct kvm_vm *vm, int vcpu_id)
> > +{
> > +     ssize_t ret;
> > +     int i, stats_fd, err = -1;
> > +     size_t size_desc, size_data = 0;
> > +     struct kvm_stats_header header;
> > +     struct kvm_stats_desc *stats_desc, *pdesc;
> > +     struct kvm_vcpu_stats_data *stats_data;
> > +
> > +     /* Get fd for VCPU stats */
> > +     stats_fd = vcpu_get_statsfd(vm, vcpu_id);
> > +     if (stats_fd < 0) {
> > +             perror("Get VCPU stats fd");
> > +             return err;
> > +     }
> > +     /* Read kvm vcpu stats header */
> > +     ret = read(stats_fd, &header, sizeof(header));
> > +     if (ret != sizeof(header)) {
> > +             perror("Read VCPU stats header");
> > +             goto out_close_fd;
> > +     }
> > +     size_desc = sizeof(*stats_desc) + header.name_size;
> > +     /* Check id string in header, that should start with "kvm" */
> > +     if (strncmp(header.id, "kvm", 3) ||
> > +                     strlen(header.id) >= KVM_STATS_ID_MAXLEN) {
> > +             printf("Invalid KVM VCPU stats type!\n");
> > +             goto out_close_fd;
> > +     }
> > +     /* Sanity check for other fields in header */
> > +     if (header.count == 0) {
> > +             err = 0;
> > +             goto out_close_fd;
> > +     }
> > +     /* Check overlap */
> > +     if (header.desc_offset == 0 || header.data_offset == 0 ||
> > +                     header.desc_offset < sizeof(header) ||
> > +                     header.data_offset < sizeof(header)) {
> > +             printf("Invalid offset fields in header!\n");
> > +             goto out_close_fd;
> > +     }
> > +     if (header.desc_offset < header.data_offset &&
> > +                     (header.desc_offset + size_desc * header.count >
> > +                     header.data_offset)) {
> > +             printf("VCPU Descriptor block is overlapped with data block!\n");
> > +             goto out_close_fd;
> > +     }
>
> Same as above (tighter check).
>
> > +
> > +     /* Allocate memory for stats descriptors */
> > +     stats_desc = calloc(header.count, size_desc);
> > +     if (!stats_desc) {
> > +             perror("Allocate memory for VCPU stats descriptors");
> > +             goto out_close_fd;
> > +     }
> > +     /* Read kvm vcpu stats descriptors */
> > +     ret = pread(stats_fd, stats_desc,
> > +                     size_desc * header.count, header.desc_offset);
> > +     if (ret != size_desc * header.count) {
> > +             perror("Read KVM VCPU stats descriptors");
> > +             goto out_free_desc;
> > +     }
> > +     /* Sanity check for fields in descriptors */
> > +     for (i = 0; i < header.count; ++i) {
> > +             pdesc = (void *)stats_desc + i * size_desc;
> > +             /* Check boundaries */
> > +             if ((pdesc->flags & KVM_STATS_TYPE_MASK) > KVM_STATS_TYPE_MAX) {
> > +                     printf("Unknown KVM stats type!\n");
> > +                     goto out_free_desc;
> > +             }
> > +             if ((pdesc->flags & KVM_STATS_UNIT_MASK) > KVM_STATS_UNIT_MAX) {
> > +                     printf("Unknown KVM stats unit!\n");
> > +                     goto out_free_desc;
> > +             }
> > +             if ((pdesc->flags & KVM_STATS_SCALE_MASK) >
> > +                             KVM_STATS_SCALE_MAX) {
> > +                     printf("Unknown KVM stats scale!\n");
> > +                     goto out_free_desc;
> > +             }
> > +             /* Check exponent for stats unit
> > +              * Exponent for counter should be greater than or equal to 0
> > +              * Exponent for unit bytes should be greater than or equal to 0
> > +              * Exponent for unit seconds should be less than or equal to 0
> > +              * Exponent for unit clock cycles should be greater than or
> > +              * equal to 0
> > +              */
> > +             switch (pdesc->flags & KVM_STATS_UNIT_MASK) {
> > +             case KVM_STATS_UNIT_NONE:
> > +             case KVM_STATS_UNIT_BYTES:
> > +             case KVM_STATS_UNIT_CYCLES:
> > +                     if (pdesc->exponent < 0) {
> > +                             printf("Unsupported KVM stats unit!\n");
> > +                             goto out_free_desc;
> > +                     }
> > +                     break;
> > +             case KVM_STATS_UNIT_SECONDS:
> > +                     if (pdesc->exponent > 0) {
> > +                             printf("Unsupported KVM stats unit!\n");
> > +                             goto out_free_desc;
> > +                     }
> > +                     break;
> > +             }
> > +             /* Check name string */
> > +             if (strlen(pdesc->name) >= header.name_size) {
> > +                     printf("KVM stats name(%s) too long!\n", pdesc->name);
> > +                     goto out_free_desc;
> > +             }
> > +             /* Check size field, which should not be zero */
> > +             if (pdesc->size == 0) {
> > +                     printf("KVM descriptor(%s) with size of 0!\n",
> > +                                     pdesc->name);
> > +                     goto out_free_desc;
> > +             }
> > +             size_data += pdesc->size * sizeof(stats_data->value[0]);
> > +     }
> > +     /* Check overlap */
> > +     if (header.data_offset < header.desc_offset &&
> > +             header.data_offset + size_data > header.desc_offset) {
> > +             printf("Data block is overlapped with Descriptor block!\n");
> > +             goto out_free_desc;
> > +     }
> > +     /* Check validity of all stats data size */
> > +     if (size_data < header.count * sizeof(stats_data->value[0])) {
> > +             printf("Data size is not correct!\n");
> > +             goto out_free_desc;
> > +     }
> > +
> > +     /* Allocate memory for stats data */
> > +     stats_data = malloc(size_data);
> > +     if (!stats_data) {
> > +             perror("Allocate memory for VCPU stats data");
> > +             goto out_free_desc;
> > +     }
> > +     /* Read kvm vcpu stats data */
> > +     ret = pread(stats_fd, stats_data, size_data, header.data_offset);
> > +     if (ret != size_data) {
> > +             perror("Read KVM VCPU stats data");
> > +             goto out_free_data;
> > +     }
> > +
> > +     err = 0;
> > +out_free_data:
> > +     free(stats_data);
> > +out_free_desc:
> > +     free(stats_desc);
> > +out_close_fd:
> > +     close(stats_fd);
> > +     return err;
> > +}
> > +
> > +/*
> > + * Usage: kvm_bin_form_stats [#vm] [#vcpu]
> > + * The first parameter #vm set the number of VMs being created.
> > + * The second parameter #vcpu set the number of VCPUs being created.
> > + * By default, 1 VM and 1 VCPU for the VM would be created for testing.
> > + */
> > +
> > +int main(int argc, char *argv[])
> > +{
> > +     int max_vm = 1, max_vcpu = 1, ret, i, j, err = -1;
> > +     struct kvm_vm **vms;
> > +
> > +     /* Get the number of VMs and VCPUs that would be created for testing. */
> > +     if (argc > 1) {
> > +             max_vm = strtol(argv[1], NULL, 0);
> > +             if (max_vm <= 0)
> > +                     max_vm = 1;
> > +     }
> > +     if (argc > 2) {
> > +             max_vcpu = strtol(argv[2], NULL, 0);
> > +             if (max_vcpu <= 0)
> > +                     max_vcpu = 1;
> > +     }
> > +
> > +     /* Check the extension for binary stats */
> > +     ret = kvm_check_cap(KVM_CAP_STATS_BINARY_FD);
> > +     if (ret < 0) {
> > +             printf("Binary form statistics interface is not supported!\n");
> > +             return err;
> > +     }
> > +
> > +     /* Create VMs and VCPUs */
> > +     vms = malloc(sizeof(vms[0]) * max_vm);
> > +     if (!vms) {
> > +             perror("Allocate memory for storing VM pointers");
> > +             return err;
> > +     }
> > +     for (i = 0; i < max_vm; ++i) {
> > +             vms[i] = vm_create(VM_MODE_DEFAULT,
> > +                             DEFAULT_GUEST_PHY_PAGES, O_RDWR);
> > +             for (j = 0; j < max_vcpu; ++j)
> > +                     vm_vcpu_add(vms[i], j);
> > +     }
> > +
> > +     /* Check stats read for every VM and VCPU */
> > +     for (i = 0; i < max_vm; ++i) {
> > +             if (vm_stats_test(vms[i]))
> > +                     goto out_free_vm;
> > +             for (j = 0; j < max_vcpu; ++j) {
> > +                     if (vcpu_stats_test(vms[i], j))
> > +                             goto out_free_vm;
> > +             }
> > +     }
> > +
> > +     err = 0;
> > +out_free_vm:
> > +     for (i = 0; i < max_vm; ++i)
> > +             kvm_vm_free(vms[i]);
> > +     free(vms);
> > +     return err;
> > +}
> > diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
> > index fc83f6c5902d..d9e0b2c8b906 100644
> > --- a/tools/testing/selftests/kvm/lib/kvm_util.c
> > +++ b/tools/testing/selftests/kvm/lib/kvm_util.c
> > @@ -2090,3 +2090,15 @@ unsigned int vm_calc_num_guest_pages(enum vm_guest_mode mode, size_t size)
> >       n = DIV_ROUND_UP(size, vm_guest_mode_params[mode].page_size);
> >       return vm_adjust_num_guest_pages(mode, n);
> >  }
> > +
> > +int vm_get_statsfd(struct kvm_vm *vm)
> > +{
> > +     return ioctl(vm->fd, KVM_STATS_GETFD, NULL);
> > +}
> > +
> > +int vcpu_get_statsfd(struct kvm_vm *vm, uint32_t vcpuid)
> > +{
> > +     struct vcpu *vcpu = vcpu_find(vm, vcpuid);
> > +
> > +     return ioctl(vcpu->fd, KVM_STATS_GETFD, NULL);
> > +}
> > --
> > 2.31.1.751.gd2f1c929bd-goog
> >
> > _______________________________________________
> > kvmarm mailing list
> > kvmarm@lists.cs.columbia.edu
> > https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 1/4] KVM: stats: Separate common stats from architecture specific ones
  2021-05-18 18:40           ` Krish Sadhukhan
@ 2021-05-21 19:04             ` Jing Zhang
  0 siblings, 0 replies; 30+ messages in thread
From: Jing Zhang @ 2021-05-21 19:04 UTC (permalink / raw)
  To: Krish Sadhukhan
  Cc: David Matlack, KVM, KVMARM, LinuxMIPS, KVMPPC, LinuxS390,
	Linuxkselftest, Paolo Bonzini, Marc Zyngier, James Morse,
	Julien Thierry, Suzuki K Poulose, Will Deacon, Huacai Chen,
	Aleksandar Markovic, Thomas Bogendoerfer, Paul Mackerras,
	Christian Borntraeger, Janosch Frank, David Hildenbrand,
	Cornelia Huck, Claudio Imbrenda, Sean Christopherson,
	Vitaly Kuznetsov, Jim Mattson, Peter Shier, Oliver Upton,
	David Rientjes, Emanuele Giuseppe Esposito

On Tue, May 18, 2021 at 1:40 PM Krish Sadhukhan
<krish.sadhukhan@oracle.com> wrote:
>
>
> On 5/18/21 10:25 AM, Jing Zhang wrote:
> > Hi David,
> >
> > On Tue, May 18, 2021 at 11:27 AM David Matlack <dmatlack@google.com> wrote:
> >> On Mon, May 17, 2021 at 5:10 PM Jing Zhang <jingzhangos@google.com> wrote:
> >> <snip>
> >>> Actually the definition of kvm_{vcpu,vm}_stat are arch specific. There is
> >>> no real structure for arch agnostic stats. Most of the stats in common
> >>> structures are arch agnostic, but not all of them.
> >>> There are some benefits to put all common stats in a separate structure.
> >>> e.g. if we want to add a stat in kvm_main.c, we only need to add this stat
> >>> in the common structure, don't have to update all kvm_{vcpu,vm}_stat
> >>> definition for all architectures.
> >> I meant rename the existing arch-specific struct kvm_{vcpu,vm}_stat to
> >> kvm_{vcpu,vm}_stat_arch and rename struct kvm_{vcpu,vm}_stat_common to
> >> kvm_{vcpu,vm}_stat.
> >>
> >> So in  include/linux/kvm_types.h you'd have:
> >>
> >> struct kvm_vm_stat {
> >>    ulong remote_tlb_flush;
> >>    struct kvm_vm_stat_arch arch;
> >> };
> >>
> >> struct kvm_vcpu_stat {
> >>    u64 halt_successful_poll;
> >>    u64 halt_attempted_poll;
> >>    u64 halt_poll_invalid;
> >>    u64 halt_wakeup;
> >>    u64 halt_poll_success_ns;
> >>    u64 halt_poll_fail_ns;
> >>    struct kvm_vcpu_stat_arch arch;
> >> };
> >>
> >> And in arch/x86/include/asm/kvm_host.h you'd have:
> >>
> >> struct kvm_vm_stat_arch {
> >>    ulong mmu_shadow_zapped;
> >>    ...
> >> };
> >>
> >> struct kvm_vcpu_stat_arch {
> >>    u64 pf_fixed;
> >>    u64 pf_guest;
> >>    u64 tlb_flush;
> >>    ...
> >> };
> >>
> >> You still have the same benefits of having an arch-neutral place to
> >> store stats but the struct layout more closely resembles struct
> >> kvm_vcpu and struct kvm.
> > You are right. This is a more reasonable way to layout the structures.
> > I remember that I didn't choose this way is only because that it needs
> > touching every arch specific stats in all architectures (stat.name ->
> > stat.arch.name) instead of only touching arch neutral stats.
> > Let's see if there is any vote from others about this.
>
>
> +1
>
> >
> > Thanks,
> > Jing
It is still not fun to change hundreds of stats update code in every
architectures.
Let's keep it as it is for now and see how it is going.

Thanks,
Jing

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2021-05-21 19:04 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-05-17 14:53 [PATCH v5 0/4] KVM statistics data fd-based binary interface Jing Zhang
2021-05-17 14:53 ` [PATCH v5 1/4] KVM: stats: Separate common stats from architecture specific ones Jing Zhang
2021-05-17 23:39   ` David Matlack
2021-05-18  0:10     ` Jing Zhang
2021-05-18 16:27       ` David Matlack
2021-05-18 17:25         ` Jing Zhang
2021-05-18 18:40           ` Krish Sadhukhan
2021-05-21 19:04             ` Jing Zhang
2021-05-17 14:53 ` [PATCH v5 2/4] KVM: stats: Add fd-based API to read binary stats data Jing Zhang
2021-05-19 17:12   ` David Matlack
2021-05-19 19:02     ` Jing Zhang
2021-05-20  4:21   ` Ricardo Koller
2021-05-20 17:37     ` Jing Zhang
2021-05-20 18:58       ` Ricardo Koller
2021-05-20 19:46         ` Jing Zhang
2021-05-20 20:50           ` Ricardo Koller
2021-05-20 21:14             ` Jing Zhang
2021-05-17 14:53 ` [PATCH v5 3/4] KVM: stats: Add documentation for statistics data binary interface Jing Zhang
2021-05-19 16:57   ` David Matlack
2021-05-19 19:29     ` Jing Zhang
2021-05-19 20:30       ` Jing Zhang
2021-05-19 17:02   ` David Matlack
2021-05-19 19:30     ` Jing Zhang
2021-05-17 14:53 ` [PATCH v5 4/4] KVM: selftests: Add selftest for KVM " Jing Zhang
2021-05-19 17:21   ` David Matlack
2021-05-19 17:58     ` Jing Zhang
2021-05-19 22:00   ` Ricardo Koller
2021-05-19 22:54     ` Jing Zhang
2021-05-20 21:30     ` Jing Zhang
2021-05-17 14:55 ` [PATCH v5 0/4] KVM statistics data fd-based " Jing Zhang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).