From: Richard Henderson <richard.henderson@linaro.org>
To: Don Porter <porter@cs.unc.edu>, qemu-devel@nongnu.org
Cc: dave@treblig.org, peter.maydell@linaro.org, nadav.amit@gmail.com,
philmd@linaro.org, berrange@redhat.com
Subject: Re: [PATCH v4 3/7] Add an "info pg" command that prints the current page tables
Date: Wed, 24 Jul 2024 13:33:13 +1000 [thread overview]
Message-ID: <d3316e48-e601-4a43-b77e-2bf127d91a7a@linaro.org> (raw)
In-Reply-To: <20240723010545.3648706-4-porter@cs.unc.edu>
On 7/23/24 11:05, Don Porter wrote:
> diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
> index d946161717..c70d31433d 100644
> --- a/include/hw/core/cpu.h
> +++ b/include/hw/core/cpu.h
> @@ -605,10 +605,11 @@ extern bool mttcg_enabled;
> /**
> * cpu_paging_enabled:
> * @cpu: The CPU whose state is to be inspected.
> + * @mmu_idx: 0 == traditional paging, 1 == nested paging
> *
> * Returns: %true if paging is enabled, %false otherwise.
> */
> -bool cpu_paging_enabled(const CPUState *cpu);
> +bool cpu_paging_enabled(const CPUState *cpu, int mmu_idx);
mmu_idx already means something very different to TCG.
You will only confuse matters by using this name for a different concept here.
> @@ -671,9 +672,82 @@ int cpu_write_elf32_qemunote(WriteCoreDumpFunction f, CPUState *cpu,
> * Caller is responsible for freeing the data.
> */
> GuestPanicInformation *cpu_get_crash_info(CPUState *cpu);
> -
> #endif /* !CONFIG_USER_ONLY */
>
> +/* Maximum supported page table height - currently x86 at 5 */
> +#define MAX_HEIGHT 5
> +
> +typedef struct PageTableLayout {
> + int height; /* Height of the page table */
> + int entries_per_node[MAX_HEIGHT + 1];
> +} PageTableLayout;
> +
> +typedef struct DecodedPTE {
> + int prot; /* Always populated, arch-specific, decoded flags */
> + bool present;
> + bool leaf; /* Only valid if present */
> + bool reserved_bits_ok;
> + bool user_read_ok;
> + bool user_write_ok;
> + bool user_exec_ok;
> + bool super_read_ok;
> + bool super_write_ok;
> + bool super_exec_ok;
> + bool dirty;
> + hwaddr child; /* Only valid if present and !leaf */
> + uint64_t leaf_page_size; /* Only valid if present and leaf */
> + uint64_t nested_page_size; /*
> + * If nested paging, the page size of the host
> + * page storing the data, versus the size of the
> + * guest page frame in leaf_page_size
> + */
> + vaddr bits_translated; /*
> + * The virtual address bits translated in walking
> + * the page table to node[i].
> + */
> + hwaddr pte_addr; /* (guest) physical address of the PTE */
> + hwaddr pte_host_addr; /* (host) physical address of the PTE */
> + uint64_t pte_contents; /* Raw contents of the pte */
> +} DecodedPTE;
> +
> +typedef int (*qemu_page_walker_for_each)(CPUState *cs, void *data,
> + DecodedPTE *pte,
> + int height, int offset, int mmu_idx,
> + const PageTableLayout *layout);
> +
> +/**
> + * for_each_pte - iterate over a page table, and
> + * call fn on each entry
> + *
> + * @cs - CPU state
> + * @fn(cs, data, pte, height, offset, layout) - User-provided function to call
> + * on each pte.
> + * * @cs - pass through cs
> + * * @data - user-provided, opaque pointer
> + * * @pte - current pte, decoded
> + * * @height - height in the tree of pte
> + * * @offset - offset within the page tabe node
> + * * @layout - pointer to a PageTableLayout for this tree
> + * @data - opaque pointer; passed through to fn
> + * @visit_interior_nodes - if true, call fn() on interior entries in
> + * page table; if false, visit only leaf entries.
> + * @visit_not_present - if true, call fn() on entries that are not present.
> + * if false, visit only present entries.
> + * @mmu_idx - Which level of the mmu we are interested in:
> + * 0 == user mode, 1 == nested page table
> + * Note that MMU_*_IDX macros are not consistent across
> + * architectures.
> + *
> + * Returns true on success, false on error.
> + *
> + * We assume all callers of this function are in debug mode, and do not
> + * want to synthesize, say, a user-mode load, on each page in the address
> + * space.
> + */
> +bool for_each_pte(CPUState *cs, qemu_page_walker_for_each fn, void *data,
> + bool visit_interior_nodes, bool visit_not_present,
> + bool visit_malformed, int mmu_idx);
> +
None of this should be in hw/core/cpu.h.
Isolate it somewhere else, not included by 80% of QEMU.
> diff --git a/include/hw/core/sysemu-cpu-ops.h b/include/hw/core/sysemu-cpu-ops.h
> index 4c94e51267..d0e939def8 100644
> --- a/include/hw/core/sysemu-cpu-ops.h
> +++ b/include/hw/core/sysemu-cpu-ops.h
> @@ -12,6 +12,43 @@
>
> #include "hw/core/cpu.h"
>
> +/*
> + * struct mem_print_state: Used by qmp in walking page tables.
> + */
> +struct mem_print_state {
> + GString *buf;
> + CPUArchState *env;
> + int vaw, paw; /* VA and PA width in characters */
> + int max_height;
> + int mmu_idx; /* 0 == user mode, 1 == nested page table */
> + bool (*flusher)(CPUState *cs, struct mem_print_state *state);
> + bool flush_interior; /* If false, only call flusher() on leaves */
> + bool require_physical_contiguity;
> + /*
> + * The height at which we started accumulating ranges, i.e., the
> + * next height we need to print once we hit the end of a
> + * contiguous range.
> + */
> + int start_height;
> + int leaf_height; /* The height at which we found a leaf, or -1 */
> + /*
> + * For compressing contiguous ranges, track the
> + * start and end of the range
> + */
> + hwaddr vstart[MAX_HEIGHT + 1]; /* Starting virt. addr. of open pte range */
> + hwaddr vend[MAX_HEIGHT + 1]; /* Ending virtual address of open pte range */
> + hwaddr pstart; /* Starting physical address of open pte range */
> + hwaddr pend; /* Ending physical address of open pte range */
> +
> + /* PTE protection flags current root->leaf path */
> + uint64_t prot[MAX_HEIGHT + 1];
> +
> + /* Page size (leaf) or address range covered (non-leaf). */
> + uint64_t pg_size[MAX_HEIGHT + 1];
> + int offset[MAX_HEIGHT + 1]; /* PTE range starting offsets */
> + int last_offset[MAX_HEIGHT + 1]; /* PTE range ending offsets */
> +};
Likewise.
Also, CamelCase for typedefs, per coding style.
> + if (env->hflags & HF_GUEST_MASK) {
> +
> + /* Extract the EPTP value from vmcs12 structure, store in arch state */
> + if (env->nested_state->format == KVM_STATE_NESTED_FORMAT_VMX) {
> + struct vmcs12 *vmcs =
> + (struct vmcs12 *) env->nested_state->data.vmx->vmcs12;
This is not required. You appear to be confused by nested paging.
First: nested paging is how hardware virtualization works. When we are *using* hardware
virtualization, all of that is the kernel's job. Our job as hypervisor is to give a bag
of pages to the kernel and have it map them into the guest intermediate address space.
When we are *using* hardware virtualization, we are only ever concerned with one level of
paging: from the guest to the intermediate address space. From there we use QEMU data
structures to map to QEMU virtual address space (address_space_ld/st, etc).
This is all we will ever see from KVM, HVF etc.
With TCG, we can *emulate* hardware virtualization. It is at this point where we are
concerned about two levels of paging, because QEMU is handling both.
r~
next prev parent reply other threads:[~2024-07-24 3:34 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-07-23 1:05 [PATCH v4 0/7] Rework x86 page table walks Don Porter
2024-07-23 1:05 ` [PATCH v4 1/7] Code motion: expose some TCG definitions for page table walk consolidation Don Porter
2024-07-24 3:14 ` Richard Henderson
2024-07-23 1:05 ` [PATCH v4 2/7] Import vmcs12 definition from Linux/KVM Don Porter
2024-07-24 1:34 ` Dr. David Alan Gilbert
2024-07-24 3:18 ` Richard Henderson
2024-07-23 1:05 ` [PATCH v4 3/7] Add an "info pg" command that prints the current page tables Don Porter
2024-07-24 3:33 ` Richard Henderson [this message]
2024-07-26 20:07 ` Don Porter
2024-07-23 1:05 ` [PATCH v4 4/7] Convert 'info tlb' to use generic iterator Don Porter
2024-07-27 20:47 ` Dr. David Alan Gilbert
2024-07-23 1:05 ` [PATCH v4 5/7] Convert 'info mem' " Don Porter
2024-07-23 1:05 ` [PATCH v4 6/7] Convert x86_cpu_get_memory_mapping() to use generic iterators Don Porter
2024-07-23 1:05 ` [PATCH v4 7/7] Convert x86_mmu_translate() to use common code Don Porter
2024-07-24 3:39 ` [PATCH v4 0/7] Rework x86 page table walks Richard Henderson
2024-08-02 16:53 ` Don Porter
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=d3316e48-e601-4a43-b77e-2bf127d91a7a@linaro.org \
--to=richard.henderson@linaro.org \
--cc=berrange@redhat.com \
--cc=dave@treblig.org \
--cc=nadav.amit@gmail.com \
--cc=peter.maydell@linaro.org \
--cc=philmd@linaro.org \
--cc=porter@cs.unc.edu \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).