qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Richard Henderson <richard.henderson@linaro.org>
To: Don Porter <porter@cs.unc.edu>, qemu-devel@nongnu.org
Cc: dave@treblig.org, peter.maydell@linaro.org, nadav.amit@gmail.com,
	philmd@linaro.org
Subject: Re: [PATCH v3 1/6] Add an "info pg" command that prints the current page tables
Date: Fri, 7 Jun 2024 09:57:05 -0700	[thread overview]
Message-ID: <101886bb-12f2-43f9-8a7b-d2bf8e8f596c@linaro.org> (raw)
In-Reply-To: <20240606140253.2277760-2-porter@cs.unc.edu>

On 6/6/24 07:02, Don Porter wrote:
> +/**
> + * _for_each_pte - recursive helper function
> + *
> + * @cs - CPU state
> + * @fn(cs, data, pte, vaddr, height) - User-provided function to call on each
> + *                                     pte.
> + *   * @cs - pass through cs
> + *   * @data - user-provided, opaque pointer
> + *   * @pte - current pte
> + *   * @vaddr_in - virtual address translated by pte
> + *   * @height - height in the tree of pte
> + * @data - user-provided, opaque pointer, passed to fn()
> + * @visit_interior_nodes - if true, call fn() on page table entries in
> + *                         interior nodes.  If false, only call fn() on page
> + *                         table entries in leaves.
> + * @visit_not_present - if true, call fn() on entries that are not present.
> + *                         if false, visit only present entries.
> + * @node - The physical address of the current page table radix tree node
> + * @vaddr_in - The virtual address bits translated in walking the page
> + *          table to node
> + * @height - The height of node in the radix tree
> + *
> + * height starts at the max and counts down.
> + * In a 4 level x86 page table, pml4e is level 4, pdpe is level 3,
> + *  pde is level 2, and pte is level 1
> + *
> + * Returns true on success, false on error.
> + */
> +static bool
> +_for_each_pte(CPUState *cs,

External identifiers beginning with "_[a-z]" are reserved.  While this is not external, 
and so is technically ok, it is still not a great idea.  Use for_each_pte_$SUFFIX, with 
SUFFIX as int, internal, recurse or something.


> +              int (*fn)(CPUState *cs, void *data, PTE_t *pte,
> +                        vaddr vaddr_in, int height, int offset),
> +              void *data, bool visit_interior_nodes,
> +              bool visit_not_present, hwaddr node,
> +              vaddr vaddr_in, int height)
> +{
> +    int ptes_per_node;
> +    int i;
> +
> +    assert(height > 0);
> +
> +    CPUClass *cc = CPU_GET_CLASS(cs);
> +
> +    if ((!cc->sysemu_ops->page_table_entries_per_node)
> +        || (!cc->sysemu_ops->get_pte)
> +        || (!cc->sysemu_ops->pte_present)
> +        || (!cc->sysemu_ops->pte_leaf)
> +        || (!cc->sysemu_ops->pte_child)) {
> +        return false;
> +    }

Probably move this check to the non-recursive function, since it only needs to be done 
once.  Following a check for page_table_root, should the rest be asserts?  I.e. if a 
target implements one hook, it must implement them all?

CPU_GET_CLASS is non-trivial: it is cached in cs->cc.
It seems like you should cache cs->cc->sysemu_ops in a local to avoid constantly reloading 
the pointer across the loop.

> +    ptes_per_node = cc->sysemu_ops->page_table_entries_per_node(cs, height);

It would be better if the entire page table format were computed by page_table_root. 
Perhaps fill in a whole structure rather than just height.

> +    for (i = 0; i < ptes_per_node; i++) {
> +        PTE_t pt_entry;
> +        vaddr vaddr_i;
> +        bool pte_present;
> +
> +        cc->sysemu_ops->get_pte(cs, node, i, height, &pt_entry, vaddr_in,
> +                                &vaddr_i, NULL);
> +        pte_present = cc->sysemu_ops->pte_present(cs, &pt_entry);
> +
> +        if (pte_present || visit_not_present) {
> +            if ((!pte_present) || cc->sysemu_ops->pte_leaf(cs, height,
> +                                                           &pt_entry)) {

Again, better to fill in a structure in get_pte rather than make 4 calls for vaddr, 
present, leaf, child.

Drop the unnecessary (), e.g. around !pte_present.

> +/* Intended to become a generic PTE type */
> +typedef union PTE {
> +    uint64_t pte64_t;
> +    uint32_t pte32_t;
> +} PTE_t;

While not yet supported by qemu, the latest Arm v9.4 document contains a 128-bit PTE.
Don't give the tag and the typedef different names.
Avoid "_t" as that's POSIX reserved namespace.

I know naming is hard, but there are many kinds of PTE in qemu -- better to use something 
more descriptive.  Perhaps "QemuPageWalkerPTE"?

> +bool for_each_pte(CPUState *cs,
> +                  int (*fn)(CPUState *cs, void *data, PTE_t *pte, vaddr vaddr,
> +                            int height, int offset), void *data,
> +                  bool visit_interior_nodes, bool visit_not_present);

Use a typedef for the callback function.
qemu_page_walker_for_each?

> +
> +
>   /**
>    * CPUDumpFlags:
>    * @CPU_DUMP_CODE:
> diff --git a/include/hw/core/sysemu-cpu-ops.h b/include/hw/core/sysemu-cpu-ops.h
> index 24d003fe04..eb16a1c3e2 100644
> --- a/include/hw/core/sysemu-cpu-ops.h
> +++ b/include/hw/core/sysemu-cpu-ops.h
> @@ -12,6 +12,39 @@
>   
>   #include "hw/core/cpu.h"
>   
> +/* Maximum supported page table height - currently x86 at 5 */
> +#define MAX_HEIGHT 5
> +
> +/*
> + * struct mem_print_state: Used by monitor in walking page tables.
> + */
> +struct mem_print_state {
> +    Monitor *mon;
> +    CPUArchState *env;
> +    int vaw, paw; /* VA and PA width in characters */
> +    int max_height;
> +    bool (*flusher)(CPUState *cs, struct mem_print_state *state);
> +    bool flush_interior; /* If false, only call flusher() on leaves */
> +    bool require_physical_contiguity;
> +    /*
> +     * The height at which we started accumulating ranges, i.e., the
> +     * next height we need to print once we hit the end of a
> +     * contiguous range.
> +     */
> +    int start_height;
> +    /*
> +     * For compressing contiguous ranges, track the
> +     * start and end of the range
> +     */
> +    hwaddr vstart[MAX_HEIGHT + 1]; /* Starting virt. addr. of open pte range */
> +    hwaddr vend[MAX_HEIGHT + 1]; /* Ending virtual address of open pte range */
> +    hwaddr pstart; /* Starting physical address of open pte range */
> +    hwaddr pend; /* Ending physical address of open pte range */
> +    int64_t ent[MAX_HEIGHT + 1]; /* PTE contents on current root->leaf path */

PTE_t (or the rename)?

> +    int offset[MAX_HEIGHT + 1]; /* PTE range starting offsets */
> +    int last_offset[MAX_HEIGHT + 1]; /* PTE range ending offsets */
> +};
> +
>   /*
>    * struct SysemuCPUOps: System operations specific to a CPU class
>    */
> @@ -87,6 +120,129 @@ typedef struct SysemuCPUOps {
>        */
>       const VMStateDescription *legacy_vmsd;
>   
> +    /**
> +     * page_table_root - Given a CPUState, return the physical address
> +     *                    of the current page table root, as well as
> +     *                    write the height of the tree into *height.
> +     *
> +     * @cs - CPU state
> +
> +     * @height - a pointer to an integer, to store the page table tree
> +     *           height
> +     *
> +     * Returns a hardware address on success.  Should not fail (i.e.,
> +     * caller is responsible to ensure that a page table is actually
> +     * present).
> +     */
> +    hwaddr (*page_table_root)(CPUState *cs, int *height);

There may be two page table roots on Arm, depending on the cpu state and the virtual 
address.  The shape of the two page tables are independent so...


> +
> +    /**
> +     * page_table_entries_per_node - Return the number of entries in a
> +     *                                   page table node for the CPU
> +     *                                   at a given height.
> +     *
> +     * @cs - CPU state
> +     * @height - height of the page table tree to query, where the leaves
> +     *          are 1.
> +     *
> +     * Returns a value greater than zero on success, -1 on error.
> +     */
> +    int (*page_table_entries_per_node)(CPUState *cs, int height);

... implementing this independently from page_table_root is not possible.


> +
> +    /**
> +     * @mon_init_page_table_iterator: Callback to configure a page table
> +     * iterator for use by a monitor function.
> +     * Returns true on success, false if not supported (e.g., no paging disabled
> +     * or not implemented on this CPU).
> +     */
> +    bool (*mon_init_page_table_iterator)(Monitor *mon,
> +                                         struct mem_print_state *state);

I don't understand the purpose of this one as a target-specific hook.

> +    /**
> +     * @flush_page_table_iterator_state: Prints the last entry,
> +     * if one is present.  Useful for iterators that aggregate information
> +     * across page table entries.
> +     */
> +    bool (*mon_flush_page_print_state)(CPUState *cs,
> +                                       struct mem_print_state *state);

Is this specific to "info pg" or not?
It appears to be, but the description suggests it is not.


r~


  parent reply	other threads:[~2024-06-08  0:08 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-06 14:02 [PATCH v3 0/6] Rework x86 page table walks Don Porter
2024-06-06 14:02 ` [PATCH v3 1/6] Add an "info pg" command that prints the current page tables Don Porter
2024-06-07  6:09   ` Philippe Mathieu-Daudé
2024-06-07  7:16   ` Daniel P. Berrangé
2024-06-11 18:49     ` Don Porter
2024-06-07 16:57   ` Richard Henderson [this message]
2024-06-14 18:16     ` Don Porter
2024-06-07 17:43   ` Richard Henderson
2024-06-14 21:14     ` Don Porter
2024-06-15 15:34       ` Richard Henderson
2024-06-06 14:02 ` [PATCH v3 2/6] Convert 'info tlb' to use generic iterator Don Porter
2024-06-07  6:02   ` Philippe Mathieu-Daudé
2024-06-10  9:13     ` Daniel P. Berrangé
2024-06-06 14:02 ` [PATCH v3 3/6] Convert 'info mem' " Don Porter
2024-06-10  9:15   ` Daniel P. Berrangé
2024-06-06 14:02 ` [PATCH v3 4/6] Convert x86_cpu_get_memory_mapping() to use generic iterators Don Porter
2024-06-06 14:02 ` [PATCH v3 5/6] Move tcg implementation of x86 get_physical_address into common helper code Don Porter
2024-06-07  6:12   ` Philippe Mathieu-Daudé
2024-06-07 17:03   ` Richard Henderson
2024-06-15 12:49     ` Don Porter
2024-06-06 14:02 ` [PATCH v3 6/6] Convert x86_mmu_translate() to use common code Don Porter
2024-06-07 17:28   ` Richard Henderson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=101886bb-12f2-43f9-8a7b-d2bf8e8f596c@linaro.org \
    --to=richard.henderson@linaro.org \
    --cc=dave@treblig.org \
    --cc=nadav.amit@gmail.com \
    --cc=peter.maydell@linaro.org \
    --cc=philmd@linaro.org \
    --cc=porter@cs.unc.edu \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).