All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH for 4.20? v3 0/3] Fixes for vmap_to_mfn() and pt_mapping_level
@ 2025-02-07 13:13 Oleksii Kurochko
  2025-02-07 13:13 ` [PATCH for 4.20? v3 1/3] xen/riscv: implement software page table walking Oleksii Kurochko
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Oleksii Kurochko @ 2025-02-07 13:13 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksii Kurochko, Alistair Francis, Bob Eshleman, Connor Davis,
	Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
	Julien Grall, Roger Pau Monné, Stefano Stabellini

Introduce pt_walk(), which does software page table walking to resolve the
following issues:
1. vmap_to_mfn() uses virt_to_maddr(), which is designed to work with VA
   from either the direct map region or Xen's linkage region (XEN_VIRT_START),
   thereby an assertion will occur if it is used with other regions, in
   particular for the VMAP region. The solution is usage of pt_walk() for
   vmap_to_mfn().
2. pt_mapping_level() returns incorrect page table level in the case when
   mfn==INVALID_MFN when, for example, VA was mapped to PA using 4k mapping,
   but during destroying/modification pt_mapping_level() could return incorrect
   page table level as when mfn==INVALID_MFN then only VA is taking into account
   for page table level calculation and so if VA is page table level 1 aligned
   then page_mapping_level() will return level 1 ( instead of level 0 as VA was
   mapped to PA using 4k mapping so there is incostinency here ).
   The solution is to set level=CONFIG_PAGING_TABLE to tell pt_update() algo
   that it should use pt_walk() to find proper page table entry instead of
   using for searching of page table entry based on precalculated by
   pt_mapping_level() `level` and `order` values.

It would be nice  to have these fixes in Xen 4.20 but isn't really critical as
there is no any users for RISC-V port at this moment.

---
Changes in v2-v3:
 - update the commit message.
 - other changes look in specific patch.
---

Oleksii Kurochko (3):
  xen/riscv: implement software page table walking
  xen/riscv: update defintion of vmap_to_mfn()
  xen/riscv: update mfn calculation in pt_mapping_level()

 xen/arch/riscv/include/asm/mm.h   |   2 +-
 xen/arch/riscv/include/asm/page.h |   9 ++
 xen/arch/riscv/pt.c               | 141 +++++++++++++++++++++++-------
 3 files changed, 120 insertions(+), 32 deletions(-)

-- 
2.48.1



^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH for 4.20? v3 1/3] xen/riscv: implement software page table walking
  2025-02-07 13:13 [PATCH for 4.20? v3 0/3] Fixes for vmap_to_mfn() and pt_mapping_level Oleksii Kurochko
@ 2025-02-07 13:13 ` Oleksii Kurochko
  2025-02-10 16:32   ` Jan Beulich
  2025-02-07 13:13 ` [PATCH for 4.20? v3 2/3] xen/riscv: update defintion of vmap_to_mfn() Oleksii Kurochko
  2025-02-07 13:13 ` [PATCH for 4.20? v3 3/3] xen/riscv: update mfn calculation in pt_mapping_level() Oleksii Kurochko
  2 siblings, 1 reply; 13+ messages in thread
From: Oleksii Kurochko @ 2025-02-07 13:13 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksii Kurochko, Alistair Francis, Bob Eshleman, Connor Davis,
	Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
	Julien Grall, Roger Pau Monné, Stefano Stabellini

RISC-V doesn't have hardware feature to ask MMU to translate
virtual address to physical address ( like Arm has, for example ),
so software page table walking is implemented.

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
Changes in v3:
 - Remove circular dependency.
 - Move declaration of pt_walk() to asm/page.h.
 - Revert other not connected to pt_walk() changes.
 - Update the commit message.
 - Drop unnessary anymore local variables of pt_walk().
 - Refactor pt_walk() to use for() loop instead of while() loop
   as it was suggested by Jan B.
 - Introduce _pt_walk() which returns pte_t * and update prototype
   of pt_walk() to return pte_t by value.
---
Changes in v2:
 - Update the code of pt_walk() to return pte_t instead of paddr_t.
 - Fix typos and drop blankets inside parentheses in the comment.
 - Update the `ret` handling; there is no need for `mfn` calculation
   anymore as pt_walk() returns or pte_t of a leaf node or non-present
   pte_t.
 - Drop the comment before unmap_table().
 - Drop local variable `pa` as pt_walk() is going to return pte_t
   instead of paddr_t.
 - Add the comment above pt_walk() to explain what it does and returns.
 - Add additional explanation about possible return values of pt_next_level()
   used inside while loop in pt_walk().
 - Change `pa` to `table` in the comment before while loop in pt_walk()
   as actually this loop finds a page table where paga table entry for `va`
   is located.
 - After including <asm/page.h> in <asm/mm.h>, the following compilation
   error occurs:
    ./arch/riscv/include/asm/cmpxchg.h:56:9: error: implicit declaration of
    function 'GENMASK'
   To resolve this, <xen/bitops.h> needs to be included at the top of
   <asm/cmpxchg.h>.
 - To avoid an issue with the implicit declaration of map_domain_page() and
   unmap_domain_page() after including <asm/page.h> in <asm/mm.h>, the
   implementation of flush_page_to_ram() has been moved to mm.c. (look for
   more detailed explanation in the commit message) As a result drop
   inclusion of <xen/domain.h> in <asm/page.h>.
 - Update the commit message.
---
 xen/arch/riscv/include/asm/page.h |  2 ++
 xen/arch/riscv/pt.c               | 51 +++++++++++++++++++++++++++++++
 2 files changed, 53 insertions(+)

diff --git a/xen/arch/riscv/include/asm/page.h b/xen/arch/riscv/include/asm/page.h
index 7a6174a109..0439a1a9ee 100644
--- a/xen/arch/riscv/include/asm/page.h
+++ b/xen/arch/riscv/include/asm/page.h
@@ -208,6 +208,8 @@ static inline pte_t pte_from_mfn(mfn_t mfn, unsigned int flags)
     return (pte_t){ .pte = pte };
 }
 
+pte_t pt_walk(vaddr_t va, unsigned int *pte_level);
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* ASM__RISCV__PAGE_H */
diff --git a/xen/arch/riscv/pt.c b/xen/arch/riscv/pt.c
index a703e0f1bd..66cb976b55 100644
--- a/xen/arch/riscv/pt.c
+++ b/xen/arch/riscv/pt.c
@@ -185,6 +185,57 @@ static int pt_next_level(bool alloc_tbl, pte_t **table, unsigned int offset)
     return XEN_TABLE_NORMAL;
 }
 
+/*
+ * _pt_walk() performs software page table walking and returns the pte_t of
+ * a leaf node or the leaf-most not-present pte_t if no leaf node is found
+ * for further analysis.
+ * Additionally, _pt_walk() returns the level of the found pte.
+ */
+static pte_t *_pt_walk(vaddr_t va, unsigned int *pte_level)
+{
+    const mfn_t root = get_root_page();
+    unsigned int level;
+    pte_t *table;
+
+    DECLARE_OFFSETS(offsets, va);
+
+    table = map_table(root);
+
+    /*
+     * Find `table` of an entry which corresponds to `va` by iterating for each
+     * page level and checking if the entry points to a next page table or
+     * to a page.
+     *
+     * Two cases are possible:
+     * - ret == XEN_TABLE_SUPER_PAGE means that the entry was found;
+     *   (Despite the name) XEN_TABLE_SUPER_PAGE also covers 4K mappings. If
+     *   pt_next_level() is called for page table level 0, it results in the
+     *   entry being a pointer to a leaf node, thereby returning
+     *   XEN_TABLE_SUPER_PAGE, despite of the fact this leaf covers 4k mapping.
+     * - ret == XEN_TABLE_MAP_NONE means that requested `va` wasn't actually
+     *   mapped.
+     */
+    for ( level = HYP_PT_ROOT_LEVEL; ; --level )
+    {
+        int ret = pt_next_level(false, &table, offsets[level]);
+
+        if ( ret == XEN_TABLE_MAP_NONE || ret == XEN_TABLE_SUPER_PAGE )
+            break;
+
+        ASSERT(level);
+    }
+
+    if ( pte_level )
+        *pte_level = level;
+
+    return table + offsets[level];
+}
+
+pte_t pt_walk(vaddr_t va, unsigned int *pte_level)
+{
+    return *_pt_walk(va, pte_level);
+}
+
 /* Update an entry at the level @target. */
 static int pt_update_entry(mfn_t root, vaddr_t virt,
                            mfn_t mfn, unsigned int target,
-- 
2.48.1



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH for 4.20? v3 2/3] xen/riscv: update defintion of vmap_to_mfn()
  2025-02-07 13:13 [PATCH for 4.20? v3 0/3] Fixes for vmap_to_mfn() and pt_mapping_level Oleksii Kurochko
  2025-02-07 13:13 ` [PATCH for 4.20? v3 1/3] xen/riscv: implement software page table walking Oleksii Kurochko
@ 2025-02-07 13:13 ` Oleksii Kurochko
  2025-02-07 13:30   ` Jan Beulich
  2025-02-07 13:13 ` [PATCH for 4.20? v3 3/3] xen/riscv: update mfn calculation in pt_mapping_level() Oleksii Kurochko
  2 siblings, 1 reply; 13+ messages in thread
From: Oleksii Kurochko @ 2025-02-07 13:13 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksii Kurochko, Alistair Francis, Bob Eshleman, Connor Davis,
	Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
	Julien Grall, Roger Pau Monné, Stefano Stabellini

vmap_to_mfn() uses virt_to_maddr(), which is designed to work with VA from
either the direct map region or Xen's linkage region (XEN_VIRT_START).
An assertion will occur if it is used with other regions, in particular for
the VMAP region.

Since RISC-V lacks a hardware feature to request the MMU to translate a VA to
a PA (as Arm does, for example), software page table walking (pt_walk()) is
used for the VMAP region to obtain the mfn from pte_t.

To avoid introduce a circular dependency between asm/mm.h and asm/page.h by
including each other, the macro _vmap_to_mfn() is introduced in asm/page.h,
as it uses struct pte_t and pte_is_mapping() from asm/page.h. _vmap_to_mfn()
macro is then reused in the definition of vmap_to_mfn() macro in asm/mm.h.

Fixes: 7db8d2bd9b ("xen/riscv: add minimal stuff to mm.h to build full Xen")
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
Changes in v3:
- Move vmap_to_mfn_ to asm/page.h to deal with circular dependency.
- Convert vmap_to_mfn_() to macros.
- Rename vmap_to_mfn_ to _vmap_to_mfn.
- Update _vmap_to_mfn() to work with pte_t instead of pte_t*.
- Add parentheses around va argument for macros vmap_to_mfn().
- Updated the commit message.
---
Changes in v2:
 - Update defintion of vmap_to_mfn() as pt_walk() now returns pte_t
   instead of paddr_t.
 - Update the commit message.
---
 xen/arch/riscv/include/asm/mm.h   | 2 +-
 xen/arch/riscv/include/asm/page.h | 7 +++++++
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/xen/arch/riscv/include/asm/mm.h b/xen/arch/riscv/include/asm/mm.h
index 292aa48fc1..4035cd400a 100644
--- a/xen/arch/riscv/include/asm/mm.h
+++ b/xen/arch/riscv/include/asm/mm.h
@@ -23,7 +23,7 @@ extern vaddr_t directmap_virt_start;
 #define gaddr_to_gfn(ga)    _gfn(paddr_to_pfn(ga))
 #define mfn_to_maddr(mfn)   pfn_to_paddr(mfn_x(mfn))
 #define maddr_to_mfn(ma)    _mfn(paddr_to_pfn(ma))
-#define vmap_to_mfn(va)     maddr_to_mfn(virt_to_maddr((vaddr_t)(va)))
+#define vmap_to_mfn(va)     _vmap_to_mfn((vaddr_t)(va))
 #define vmap_to_page(va)    mfn_to_page(vmap_to_mfn(va))
 
 static inline void *maddr_to_virt(paddr_t ma)
diff --git a/xen/arch/riscv/include/asm/page.h b/xen/arch/riscv/include/asm/page.h
index 0439a1a9ee..18ba0dd9df 100644
--- a/xen/arch/riscv/include/asm/page.h
+++ b/xen/arch/riscv/include/asm/page.h
@@ -210,6 +210,13 @@ static inline pte_t pte_from_mfn(mfn_t mfn, unsigned int flags)
 
 pte_t pt_walk(vaddr_t va, unsigned int *pte_level);
 
+#define _vmap_to_mfn(va)                \
+({                                      \
+    pte_t entry = pt_walk((va), NULL);  \
+    BUG_ON(!pte_is_mapping(entry));     \
+    mfn_from_pte(entry);                \
+})
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* ASM__RISCV__PAGE_H */
-- 
2.48.1



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH for 4.20? v3 3/3] xen/riscv: update mfn calculation in pt_mapping_level()
  2025-02-07 13:13 [PATCH for 4.20? v3 0/3] Fixes for vmap_to_mfn() and pt_mapping_level Oleksii Kurochko
  2025-02-07 13:13 ` [PATCH for 4.20? v3 1/3] xen/riscv: implement software page table walking Oleksii Kurochko
  2025-02-07 13:13 ` [PATCH for 4.20? v3 2/3] xen/riscv: update defintion of vmap_to_mfn() Oleksii Kurochko
@ 2025-02-07 13:13 ` Oleksii Kurochko
  2025-02-10 16:53   ` Jan Beulich
  2 siblings, 1 reply; 13+ messages in thread
From: Oleksii Kurochko @ 2025-02-07 13:13 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksii Kurochko, Alistair Francis, Bob Eshleman, Connor Davis,
	Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
	Julien Grall, Roger Pau Monné, Stefano Stabellini

When pt_update() is called with arguments (..., INVALID_MFN, ..., 0 or 1),
it indicates that a mapping is being destroyed/modifyed.

In the case when modifying or destroying a mapping, it is necessary to
search until a leaf node is found, instead of searching for a page table
entry based on the precalculated `level` and `order`(look at pt_update()).
This is because when `mfn` == INVALID_MFN, the `mask` (in pt_mapping_level())
will take into account only `vfn`, which could accidentally return an
incorrect level, leading to the discovery of an incorrect page table entry.

For example, if `vfn` is page table level 1 aligned, but it was mapped as
page table level 0, then pt_mapping_level() will return `level` = 1, since
only `vfn` (which is page table level 1 aligned) is taken into account when
`mfn` == INVALID_MFN (look at pt_mapping_level()).

Fixes: c2f1ded524 ("xen/riscv: page table handling")
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
Changes in v3:
- Drop ASSERT() for order as it isn't needed anymore.
- Drop PTE_LEAF_SEARCH and use instead level=CONFIG_PAGING_LEVELS;
  refactor connected code correspondingly.
- Calculate order once.
- Drop initializer for local variable order.
- Drop BUG_ON(!pte_is_mapping(*entry)) for the case when leaf searching
  happens as there is a similar check in pt_check_entry(). Look at
  pt.c:41 and pt.c:75.
---
Changes in v2:
 - Introduce PTE_LEAF_SEARCH to tell page table update operation to
   walk down to wherever the leaf entry is.
 - Use introduced PTE_LEAF_SEARCH to not searching pte_t entry twice.
 - Update the commit message.
---
 xen/arch/riscv/pt.c | 90 +++++++++++++++++++++++++++++----------------
 1 file changed, 59 insertions(+), 31 deletions(-)

diff --git a/xen/arch/riscv/pt.c b/xen/arch/riscv/pt.c
index 66cb976b55..8c15a48f60 100644
--- a/xen/arch/riscv/pt.c
+++ b/xen/arch/riscv/pt.c
@@ -238,11 +238,10 @@ pte_t pt_walk(vaddr_t va, unsigned int *pte_level)
 
 /* Update an entry at the level @target. */
 static int pt_update_entry(mfn_t root, vaddr_t virt,
-                           mfn_t mfn, unsigned int target,
+                           mfn_t mfn, unsigned int *target,
                            unsigned int flags)
 {
     int rc;
-    unsigned int level = HYP_PT_ROOT_LEVEL;
     pte_t *table;
     /*
      * The intermediate page table shouldn't be allocated when MFN isn't
@@ -256,39 +255,45 @@ static int pt_update_entry(mfn_t root, vaddr_t virt,
     bool alloc_tbl = !mfn_eq(mfn, INVALID_MFN) || (flags & PTE_POPULATE);
     pte_t pte, *entry;
 
-    /* convenience aliases */
-    DECLARE_OFFSETS(offsets, virt);
-
-    table = map_table(root);
-    for ( ; level > target; level-- )
+    if ( *target == CONFIG_PAGING_LEVELS )
+        entry = _pt_walk(virt, target);
+    else
     {
-        rc = pt_next_level(alloc_tbl, &table, offsets[level]);
-        if ( rc == XEN_TABLE_MAP_NOMEM )
+        unsigned int level = HYP_PT_ROOT_LEVEL;
+        /* convenience aliases */
+        DECLARE_OFFSETS(offsets, virt);
+
+        table = map_table(root);
+        for ( ; level > *target; level-- )
         {
-            rc = -ENOMEM;
-            goto out;
+            rc = pt_next_level(alloc_tbl, &table, offsets[level]);
+            if ( rc == XEN_TABLE_MAP_NOMEM )
+            {
+                rc = -ENOMEM;
+                goto out;
+            }
+
+            if ( rc == XEN_TABLE_MAP_NONE )
+            {
+                rc = 0;
+                goto out;
+            }
+
+            if ( rc != XEN_TABLE_NORMAL )
+                break;
         }
 
-        if ( rc == XEN_TABLE_MAP_NONE )
+        if ( level != *target )
         {
-            rc = 0;
+            dprintk(XENLOG_ERR,
+                    "%s: Shattering superpage is not supported\n", __func__);
+            rc = -EOPNOTSUPP;
             goto out;
         }
 
-        if ( rc != XEN_TABLE_NORMAL )
-            break;
-    }
-
-    if ( level != target )
-    {
-        dprintk(XENLOG_ERR,
-                "%s: Shattering superpage is not supported\n", __func__);
-        rc = -EOPNOTSUPP;
-        goto out;
+        entry = table + offsets[level];
     }
 
-    entry = table + offsets[level];
-
     rc = -EINVAL;
     if ( !pt_check_entry(*entry, mfn, flags) )
         goto out;
@@ -413,17 +418,40 @@ static int pt_update(vaddr_t virt, mfn_t mfn,
 
     while ( left )
     {
-        unsigned int order, level;
-
-        level = pt_mapping_level(vfn, mfn, left, flags);
-        order = XEN_PT_LEVEL_ORDER(level);
+        unsigned int order, level = CONFIG_PAGING_LEVELS;
 
-        ASSERT(left >= BIT(order, UL));
+        /*
+         * In the case when modifying or destroying a mapping, it is necessary
+         * to search until a leaf node is found, instead of searching for
+         * a page table entry based on the precalculated `level` and `order`
+         * (look at pt_update()).
+         * This is because when `mfn` == INVALID_MFN, the `mask`(in
+         * pt_mapping_level()) will take into account only `vfn`, which could
+         * accidentally return an incorrect level, leading to the discovery of
+         * an incorrect page table entry.
+         *
+         * For example, if `vfn` is page table level 1 aligned, but it was
+         * mapped as page table level 0, then pt_mapping_level() will return
+         * `level` = 1, since only `vfn` (which is page table level 1 aligned)
+         * is taken into account when `mfn` == INVALID_MFN
+         * (look at pt_mapping_level()).
+         *
+         * To force searching until a leaf node is found is necessary to have
+         * `level` == CONFIG_PAGING_LEVELS which is a default value for
+         * `level`.
+         *
+         * For other cases (when a mapping is not being modified or destroyed),
+         * pt_mapping_level() should be used.
+         */
+        if ( !mfn_eq(mfn, INVALID_MFN) || (flags & PTE_POPULATE) )
+            level = pt_mapping_level(vfn, mfn, left, flags);
 
-        rc = pt_update_entry(root, vfn << PAGE_SHIFT, mfn, level, flags);
+        rc = pt_update_entry(root, vfn << PAGE_SHIFT, mfn, &level, flags);
         if ( rc )
             break;
 
+        order = XEN_PT_LEVEL_ORDER(level);
+
         vfn += 1UL << order;
         if ( !mfn_eq(mfn, INVALID_MFN) )
             mfn = mfn_add(mfn, 1UL << order);
-- 
2.48.1



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH for 4.20? v3 2/3] xen/riscv: update defintion of vmap_to_mfn()
  2025-02-07 13:13 ` [PATCH for 4.20? v3 2/3] xen/riscv: update defintion of vmap_to_mfn() Oleksii Kurochko
@ 2025-02-07 13:30   ` Jan Beulich
  2025-02-10  8:46     ` Oleksii Kurochko
  0 siblings, 1 reply; 13+ messages in thread
From: Jan Beulich @ 2025-02-07 13:30 UTC (permalink / raw)
  To: Oleksii Kurochko
  Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
	Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
	Stefano Stabellini, xen-devel

On 07.02.2025 14:13, Oleksii Kurochko wrote:
> vmap_to_mfn() uses virt_to_maddr(), which is designed to work with VA from
> either the direct map region or Xen's linkage region (XEN_VIRT_START).
> An assertion will occur if it is used with other regions, in particular for
> the VMAP region.
> 
> Since RISC-V lacks a hardware feature to request the MMU to translate a VA to
> a PA (as Arm does, for example), software page table walking (pt_walk()) is
> used for the VMAP region to obtain the mfn from pte_t.
> 
> To avoid introduce a circular dependency between asm/mm.h and asm/page.h by
> including each other, the macro _vmap_to_mfn() is introduced in asm/page.h,
> as it uses struct pte_t and pte_is_mapping() from asm/page.h. _vmap_to_mfn()
> macro is then reused in the definition of vmap_to_mfn() macro in asm/mm.h.
> 
> Fixes: 7db8d2bd9b ("xen/riscv: add minimal stuff to mm.h to build full Xen")
> Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
> ---
> Changes in v3:
> - Move vmap_to_mfn_ to asm/page.h to deal with circular dependency.
> - Convert vmap_to_mfn_() to macros.

Why both?

> --- a/xen/arch/riscv/include/asm/page.h
> +++ b/xen/arch/riscv/include/asm/page.h
> @@ -210,6 +210,13 @@ static inline pte_t pte_from_mfn(mfn_t mfn, unsigned int flags)
>  
>  pte_t pt_walk(vaddr_t va, unsigned int *pte_level);
>  
> +#define _vmap_to_mfn(va)                \
> +({                                      \
> +    pte_t entry = pt_walk((va), NULL);  \

If this is to remain a macro, va doesn't need parenthesizing (as the argument
passed is just the identifier, not an expression.

Be careful with the naming of macro local variables. Consider a use size (for
whatever reason) having

    unsigned long entry;
    ...
    mfn = vmap_to_mfn(entry);

This is where appending an underscore comes into play.

Jan


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH for 4.20? v3 2/3] xen/riscv: update defintion of vmap_to_mfn()
  2025-02-07 13:30   ` Jan Beulich
@ 2025-02-10  8:46     ` Oleksii Kurochko
  0 siblings, 0 replies; 13+ messages in thread
From: Oleksii Kurochko @ 2025-02-10  8:46 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
	Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
	Stefano Stabellini, xen-devel

[-- Attachment #1: Type: text/plain, Size: 2393 bytes --]


On 2/7/25 2:30 PM, Jan Beulich wrote:
> On 07.02.2025 14:13, Oleksii Kurochko wrote:
>> vmap_to_mfn() uses virt_to_maddr(), which is designed to work with VA from
>> either the direct map region or Xen's linkage region (XEN_VIRT_START).
>> An assertion will occur if it is used with other regions, in particular for
>> the VMAP region.
>>
>> Since RISC-V lacks a hardware feature to request the MMU to translate a VA to
>> a PA (as Arm does, for example), software page table walking (pt_walk()) is
>> used for the VMAP region to obtain the mfn from pte_t.
>>
>> To avoid introduce a circular dependency between asm/mm.h and asm/page.h by
>> including each other, the macro _vmap_to_mfn() is introduced in asm/page.h,
>> as it uses struct pte_t and pte_is_mapping() from asm/page.h. _vmap_to_mfn()
>> macro is then reused in the definition of vmap_to_mfn() macro in asm/mm.h.
>>
>> Fixes: 7db8d2bd9b ("xen/riscv: add minimal stuff to mm.h to build full Xen")
>> Signed-off-by: Oleksii Kurochko<oleksii.kurochko@gmail.com>
>> ---
>> Changes in v3:
>> - Move vmap_to_mfn_ to asm/page.h to deal with circular dependency.
>> - Convert vmap_to_mfn_() to macros.
> Why both?

Just for consistency. vmap_to_mfn_() could be defined as static inline, I can
send the new patch version with such changes, probably, static inline would be
better in this case.

>> --- a/xen/arch/riscv/include/asm/page.h
>> +++ b/xen/arch/riscv/include/asm/page.h
>> @@ -210,6 +210,13 @@ static inline pte_t pte_from_mfn(mfn_t mfn, unsigned int flags)
>>   
>>   pte_t pt_walk(vaddr_t va, unsigned int *pte_level);
>>   
>> +#define _vmap_to_mfn(va)                \
>> +({                                      \
>> +    pte_t entry = pt_walk((va), NULL);  \
> If this is to remain a macro, va doesn't need parenthesizing (as the argument
> passed is just the identifier, not an expression.
>
> Be careful with the naming of macro local variables. Consider a use size (for
> whatever reason) having
>
>      unsigned long entry;
>      ...
>      mfn = vmap_to_mfn(entry);
>
> This is where appending an underscore comes into play.

This could be another reason to use|static inline _vmap_to_mfn(...)| instead of a macro.

I think I will rewrite the|_vmap_to_mfn()| macro as a|static inline| function in the next
patch series version. I'll send it after receiving comments on the entire patch series.

Thanks.

~ Oleksii

[-- Attachment #2: Type: text/html, Size: 3255 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH for 4.20? v3 1/3] xen/riscv: implement software page table walking
  2025-02-07 13:13 ` [PATCH for 4.20? v3 1/3] xen/riscv: implement software page table walking Oleksii Kurochko
@ 2025-02-10 16:32   ` Jan Beulich
  2025-02-11  9:15     ` Oleksii Kurochko
  2025-02-12 11:13     ` Oleksii Kurochko
  0 siblings, 2 replies; 13+ messages in thread
From: Jan Beulich @ 2025-02-10 16:32 UTC (permalink / raw)
  To: Oleksii Kurochko
  Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
	Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
	Stefano Stabellini, xen-devel

On 07.02.2025 14:13, Oleksii Kurochko wrote:
> --- a/xen/arch/riscv/pt.c
> +++ b/xen/arch/riscv/pt.c
> @@ -185,6 +185,57 @@ static int pt_next_level(bool alloc_tbl, pte_t **table, unsigned int offset)
>      return XEN_TABLE_NORMAL;
>  }
>  
> +/*
> + * _pt_walk() performs software page table walking and returns the pte_t of
> + * a leaf node or the leaf-most not-present pte_t if no leaf node is found
> + * for further analysis.
> + * Additionally, _pt_walk() returns the level of the found pte.

That's optional, which I think wants expressing here.

> + */
> +static pte_t *_pt_walk(vaddr_t va, unsigned int *pte_level)
> +{
> +    const mfn_t root = get_root_page();
> +    unsigned int level;
> +    pte_t *table;
> +
> +    DECLARE_OFFSETS(offsets, va);
> +
> +    table = map_table(root);

This mapping operation doesn't look to have a counterpart. Aiui ...

> +    /*
> +     * Find `table` of an entry which corresponds to `va` by iterating for each
> +     * page level and checking if the entry points to a next page table or
> +     * to a page.
> +     *
> +     * Two cases are possible:
> +     * - ret == XEN_TABLE_SUPER_PAGE means that the entry was found;
> +     *   (Despite the name) XEN_TABLE_SUPER_PAGE also covers 4K mappings. If
> +     *   pt_next_level() is called for page table level 0, it results in the
> +     *   entry being a pointer to a leaf node, thereby returning
> +     *   XEN_TABLE_SUPER_PAGE, despite of the fact this leaf covers 4k mapping.
> +     * - ret == XEN_TABLE_MAP_NONE means that requested `va` wasn't actually
> +     *   mapped.
> +     */
> +    for ( level = HYP_PT_ROOT_LEVEL; ; --level )
> +    {
> +        int ret = pt_next_level(false, &table, offsets[level]);

... the mapping may be replaced here, but a new mapping will then still
be held by this function and ...

> +        if ( ret == XEN_TABLE_MAP_NONE || ret == XEN_TABLE_SUPER_PAGE )
> +            break;
> +
> +        ASSERT(level);
> +    }
> +
> +    if ( pte_level )
> +        *pte_level = level;
> +
> +    return table + offsets[level];
> +}

... the final one then be transferred to the caller.

> +pte_t pt_walk(vaddr_t va, unsigned int *pte_level)
> +{
> +    return *_pt_walk(va, pte_level);
> +}

Hence aiui there needs to be an unmap operation here.

Jan


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH for 4.20? v3 3/3] xen/riscv: update mfn calculation in pt_mapping_level()
  2025-02-07 13:13 ` [PATCH for 4.20? v3 3/3] xen/riscv: update mfn calculation in pt_mapping_level() Oleksii Kurochko
@ 2025-02-10 16:53   ` Jan Beulich
  2025-02-11  9:22     ` Oleksii Kurochko
  0 siblings, 1 reply; 13+ messages in thread
From: Jan Beulich @ 2025-02-10 16:53 UTC (permalink / raw)
  To: Oleksii Kurochko
  Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
	Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
	Stefano Stabellini, xen-devel

On 07.02.2025 14:13, Oleksii Kurochko wrote:
> --- a/xen/arch/riscv/pt.c
> +++ b/xen/arch/riscv/pt.c
> @@ -238,11 +238,10 @@ pte_t pt_walk(vaddr_t va, unsigned int *pte_level)
>  
>  /* Update an entry at the level @target. */
>  static int pt_update_entry(mfn_t root, vaddr_t virt,
> -                           mfn_t mfn, unsigned int target,
> +                           mfn_t mfn, unsigned int *target,
>                             unsigned int flags)
>  {
>      int rc;
> -    unsigned int level = HYP_PT_ROOT_LEVEL;
>      pte_t *table;

Considering the lack of an initializer here, ...

> @@ -256,39 +255,45 @@ static int pt_update_entry(mfn_t root, vaddr_t virt,
>      bool alloc_tbl = !mfn_eq(mfn, INVALID_MFN) || (flags & PTE_POPULATE);
>      pte_t pte, *entry;
>  
> -    /* convenience aliases */
> -    DECLARE_OFFSETS(offsets, virt);
> -
> -    table = map_table(root);
> -    for ( ; level > target; level-- )
> +    if ( *target == CONFIG_PAGING_LEVELS )
> +        entry = _pt_walk(virt, target);
> +    else
>      {
> -        rc = pt_next_level(alloc_tbl, &table, offsets[level]);
> -        if ( rc == XEN_TABLE_MAP_NOMEM )
> +        unsigned int level = HYP_PT_ROOT_LEVEL;
> +        /* convenience aliases */
> +        DECLARE_OFFSETS(offsets, virt);
> +
> +        table = map_table(root);
> +        for ( ; level > *target; level-- )
>          {
> -            rc = -ENOMEM;
> -            goto out;
> +            rc = pt_next_level(alloc_tbl, &table, offsets[level]);
> +            if ( rc == XEN_TABLE_MAP_NOMEM )
> +            {
> +                rc = -ENOMEM;
> +                goto out;
> +            }
> +
> +            if ( rc == XEN_TABLE_MAP_NONE )
> +            {
> +                rc = 0;
> +                goto out;
> +            }
> +
> +            if ( rc != XEN_TABLE_NORMAL )
> +                break;
>          }
>  
> -        if ( rc == XEN_TABLE_MAP_NONE )
> +        if ( level != *target )
>          {
> -            rc = 0;
> +            dprintk(XENLOG_ERR,
> +                    "%s: Shattering superpage is not supported\n", __func__);
> +            rc = -EOPNOTSUPP;
>              goto out;
>          }
>  
> -        if ( rc != XEN_TABLE_NORMAL )
> -            break;
> -    }
> -
> -    if ( level != target )
> -    {
> -        dprintk(XENLOG_ERR,
> -                "%s: Shattering superpage is not supported\n", __func__);
> -        rc = -EOPNOTSUPP;
> -        goto out;
> +        entry = table + offsets[level];
>      }
>  
> -    entry = table + offsets[level];
> -
>      rc = -EINVAL;
>      if ( !pt_check_entry(*entry, mfn, flags) )
>          goto out;

... I'm surprised the compiler doesn't complain about use of a possibly
uninitialized variable at

 out:
    unmap_table(table);

For the new path you're adding the variable is uninitialized afaict.
Which implies that you're again failing to unmap what _pt_walk() has
handed you.

Jan


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH for 4.20? v3 1/3] xen/riscv: implement software page table walking
  2025-02-10 16:32   ` Jan Beulich
@ 2025-02-11  9:15     ` Oleksii Kurochko
  2025-02-12 11:13     ` Oleksii Kurochko
  1 sibling, 0 replies; 13+ messages in thread
From: Oleksii Kurochko @ 2025-02-11  9:15 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
	Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
	Stefano Stabellini, xen-devel

[-- Attachment #1: Type: text/plain, Size: 2501 bytes --]


On 2/10/25 5:32 PM, Jan Beulich wrote:
> On 07.02.2025 14:13, Oleksii Kurochko wrote:
>> --- a/xen/arch/riscv/pt.c
>> +++ b/xen/arch/riscv/pt.c
>> @@ -185,6 +185,57 @@ static int pt_next_level(bool alloc_tbl, pte_t **table, unsigned int offset)
>>       return XEN_TABLE_NORMAL;
>>   }
>>   
>> +/*
>> + * _pt_walk() performs software page table walking and returns the pte_t of
>> + * a leaf node or the leaf-most not-present pte_t if no leaf node is found
>> + * for further analysis.
>> + * Additionally, _pt_walk() returns the level of the found pte.
> That's optional, which I think wants expressing here.
>
>> + */
>> +static pte_t *_pt_walk(vaddr_t va, unsigned int *pte_level)
>> +{
>> +    const mfn_t root = get_root_page();
>> +    unsigned int level;
>> +    pte_t *table;
>> +
>> +    DECLARE_OFFSETS(offsets, va);
>> +
>> +    table = map_table(root);
> This mapping operation doesn't look to have a counterpart. Aiui ...
>
>> +    /*
>> +     * Find `table` of an entry which corresponds to `va` by iterating for each
>> +     * page level and checking if the entry points to a next page table or
>> +     * to a page.
>> +     *
>> +     * Two cases are possible:
>> +     * - ret == XEN_TABLE_SUPER_PAGE means that the entry was found;
>> +     *   (Despite the name) XEN_TABLE_SUPER_PAGE also covers 4K mappings. If
>> +     *   pt_next_level() is called for page table level 0, it results in the
>> +     *   entry being a pointer to a leaf node, thereby returning
>> +     *   XEN_TABLE_SUPER_PAGE, despite of the fact this leaf covers 4k mapping.
>> +     * - ret == XEN_TABLE_MAP_NONE means that requested `va` wasn't actually
>> +     *   mapped.
>> +     */
>> +    for ( level = HYP_PT_ROOT_LEVEL; ; --level )
>> +    {
>> +        int ret = pt_next_level(false, &table, offsets[level]);
> ... the mapping may be replaced here, but a new mapping will then still
> be held by this function and ...
>
>> +        if ( ret == XEN_TABLE_MAP_NONE || ret == XEN_TABLE_SUPER_PAGE )
>> +            break;
>> +
>> +        ASSERT(level);
>> +    }
>> +
>> +    if ( pte_level )
>> +        *pte_level = level;
>> +
>> +    return table + offsets[level];
>> +}
> ... the final one then be transferred to the caller.
>
>> +pte_t pt_walk(vaddr_t va, unsigned int *pte_level)
>> +{
>> +    return *_pt_walk(va, pte_level);
>> +}
> Hence aiui there needs to be an unmap operation here.

Agree, it should be an unmap here. I will update that in the next patch version.

Thanks.

~ Oleksii

[-- Attachment #2: Type: text/html, Size: 3473 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH for 4.20? v3 3/3] xen/riscv: update mfn calculation in pt_mapping_level()
  2025-02-10 16:53   ` Jan Beulich
@ 2025-02-11  9:22     ` Oleksii Kurochko
  0 siblings, 0 replies; 13+ messages in thread
From: Oleksii Kurochko @ 2025-02-11  9:22 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
	Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
	Stefano Stabellini, xen-devel

[-- Attachment #1: Type: text/plain, Size: 3251 bytes --]


On 2/10/25 5:53 PM, Jan Beulich wrote:
> On 07.02.2025 14:13, Oleksii Kurochko wrote:
>> --- a/xen/arch/riscv/pt.c
>> +++ b/xen/arch/riscv/pt.c
>> @@ -238,11 +238,10 @@ pte_t pt_walk(vaddr_t va, unsigned int *pte_level)
>>   
>>   /* Update an entry at the level @target. */
>>   static int pt_update_entry(mfn_t root, vaddr_t virt,
>> -                           mfn_t mfn, unsigned int target,
>> +                           mfn_t mfn, unsigned int *target,
>>                              unsigned int flags)
>>   {
>>       int rc;
>> -    unsigned int level = HYP_PT_ROOT_LEVEL;
>>       pte_t *table;
> Considering the lack of an initializer here, ...
>
>> @@ -256,39 +255,45 @@ static int pt_update_entry(mfn_t root, vaddr_t virt,
>>       bool alloc_tbl = !mfn_eq(mfn, INVALID_MFN) || (flags & PTE_POPULATE);
>>       pte_t pte, *entry;
>>   
>> -    /* convenience aliases */
>> -    DECLARE_OFFSETS(offsets, virt);
>> -
>> -    table = map_table(root);
>> -    for ( ; level > target; level-- )
>> +    if ( *target == CONFIG_PAGING_LEVELS )
>> +        entry = _pt_walk(virt, target);
>> +    else
>>       {
>> -        rc = pt_next_level(alloc_tbl, &table, offsets[level]);
>> -        if ( rc == XEN_TABLE_MAP_NOMEM )
>> +        unsigned int level = HYP_PT_ROOT_LEVEL;
>> +        /* convenience aliases */
>> +        DECLARE_OFFSETS(offsets, virt);
>> +
>> +        table = map_table(root);
>> +        for ( ; level > *target; level-- )
>>           {
>> -            rc = -ENOMEM;
>> -            goto out;
>> +            rc = pt_next_level(alloc_tbl, &table, offsets[level]);
>> +            if ( rc == XEN_TABLE_MAP_NOMEM )
>> +            {
>> +                rc = -ENOMEM;
>> +                goto out;
>> +            }
>> +
>> +            if ( rc == XEN_TABLE_MAP_NONE )
>> +            {
>> +                rc = 0;
>> +                goto out;
>> +            }
>> +
>> +            if ( rc != XEN_TABLE_NORMAL )
>> +                break;
>>           }
>>   
>> -        if ( rc == XEN_TABLE_MAP_NONE )
>> +        if ( level != *target )
>>           {
>> -            rc = 0;
>> +            dprintk(XENLOG_ERR,
>> +                    "%s: Shattering superpage is not supported\n", __func__);
>> +            rc = -EOPNOTSUPP;
>>               goto out;
>>           }
>>   
>> -        if ( rc != XEN_TABLE_NORMAL )
>> -            break;
>> -    }
>> -
>> -    if ( level != target )
>> -    {
>> -        dprintk(XENLOG_ERR,
>> -                "%s: Shattering superpage is not supported\n", __func__);
>> -        rc = -EOPNOTSUPP;
>> -        goto out;
>> +        entry = table + offsets[level];
>>       }
>>   
>> -    entry = table + offsets[level];
>> -
>>       rc = -EINVAL;
>>       if ( !pt_check_entry(*entry, mfn, flags) )
>>           goto out;
> ... I'm surprised the compiler doesn't complain about use of a possibly
> uninitialized variable at
>
>   out:
>      unmap_table(table);
>
> For the new path you're adding the variable is uninitialized afaict.
> Which implies that you're again failing to unmap what _pt_walk() has
> handed you.

Thanks, unmapping of table and entry (in the case of the new patch) should be
really updated. I'll take care of it in the next patch version.

~ Oleksii

[-- Attachment #2: Type: text/html, Size: 3659 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH for 4.20? v3 1/3] xen/riscv: implement software page table walking
  2025-02-10 16:32   ` Jan Beulich
  2025-02-11  9:15     ` Oleksii Kurochko
@ 2025-02-12 11:13     ` Oleksii Kurochko
  2025-02-12 11:27       ` Jan Beulich
  1 sibling, 1 reply; 13+ messages in thread
From: Oleksii Kurochko @ 2025-02-12 11:13 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
	Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
	Stefano Stabellini, xen-devel

[-- Attachment #1: Type: text/plain, Size: 2916 bytes --]


On 2/10/25 5:32 PM, Jan Beulich wrote:
> On 07.02.2025 14:13, Oleksii Kurochko wrote:
>> --- a/xen/arch/riscv/pt.c
>> +++ b/xen/arch/riscv/pt.c
>> @@ -185,6 +185,57 @@ static int pt_next_level(bool alloc_tbl, pte_t **table, unsigned int offset)
>>       return XEN_TABLE_NORMAL;
>>   }
>>   
>> +/*
>> + * _pt_walk() performs software page table walking and returns the pte_t of
>> + * a leaf node or the leaf-most not-present pte_t if no leaf node is found
>> + * for further analysis.
>> + * Additionally, _pt_walk() returns the level of the found pte.
> That's optional, which I think wants expressing here.
>
>> + */
>> +static pte_t *_pt_walk(vaddr_t va, unsigned int *pte_level)
>> +{
>> +    const mfn_t root = get_root_page();
>> +    unsigned int level;
>> +    pte_t *table;
>> +
>> +    DECLARE_OFFSETS(offsets, va);
>> +
>> +    table = map_table(root);
> This mapping operation doesn't look to have a counterpart. Aiui ...
>
>> +    /*
>> +     * Find `table` of an entry which corresponds to `va` by iterating for each
>> +     * page level and checking if the entry points to a next page table or
>> +     * to a page.
>> +     *
>> +     * Two cases are possible:
>> +     * - ret == XEN_TABLE_SUPER_PAGE means that the entry was found;
>> +     *   (Despite the name) XEN_TABLE_SUPER_PAGE also covers 4K mappings. If
>> +     *   pt_next_level() is called for page table level 0, it results in the
>> +     *   entry being a pointer to a leaf node, thereby returning
>> +     *   XEN_TABLE_SUPER_PAGE, despite of the fact this leaf covers 4k mapping.
>> +     * - ret == XEN_TABLE_MAP_NONE means that requested `va` wasn't actually
>> +     *   mapped.
>> +     */
>> +    for ( level = HYP_PT_ROOT_LEVEL; ; --level )
>> +    {
>> +        int ret = pt_next_level(false, &table, offsets[level]);
> ... the mapping may be replaced here, but a new mapping will then still
> be held by this function and ...
>
>> +        if ( ret == XEN_TABLE_MAP_NONE || ret == XEN_TABLE_SUPER_PAGE )
>> +            break;
>> +
>> +        ASSERT(level);
>> +    }
>> +
>> +    if ( pte_level )
>> +        *pte_level = level;
>> +
>> +    return table + offsets[level];
>> +}
> ... the final one then be transferred to the caller.
>
>> +pte_t pt_walk(vaddr_t va, unsigned int *pte_level)
>> +{
>> +    return *_pt_walk(va, pte_level);
>> +}
> Hence aiui there needs to be an unmap operation here.

As _pt_walk() returns page table entry, it is needed to transform entry to table.

Do we have any function in Xen for that?

Or the best I can do is:
   pte_t *table = *_pt_walk(va, pte_level) - TABLE_OFFSET(pt_linear_offset(*pte_level, va)
(of course, it is needed to check if pte_level isn't null and do some extra actions if it is NULL)

As an option, as all page tables are PAGE_SIZE aligned, we could use PAGE_OFFSET():
  pte_t *entry = _pt_walk(va, pte_level);
  pte_t *table = PAGE_OFFSET(entry);

~ Oleksii


[-- Attachment #2: Type: text/html, Size: 3888 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH for 4.20? v3 1/3] xen/riscv: implement software page table walking
  2025-02-12 11:13     ` Oleksii Kurochko
@ 2025-02-12 11:27       ` Jan Beulich
  2025-02-12 12:15         ` Oleksii Kurochko
  0 siblings, 1 reply; 13+ messages in thread
From: Jan Beulich @ 2025-02-12 11:27 UTC (permalink / raw)
  To: Oleksii Kurochko
  Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
	Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
	Stefano Stabellini, xen-devel

On 12.02.2025 12:13, Oleksii Kurochko wrote:
> 
> On 2/10/25 5:32 PM, Jan Beulich wrote:
>> On 07.02.2025 14:13, Oleksii Kurochko wrote:
>>> --- a/xen/arch/riscv/pt.c
>>> +++ b/xen/arch/riscv/pt.c
>>> @@ -185,6 +185,57 @@ static int pt_next_level(bool alloc_tbl, pte_t **table, unsigned int offset)
>>>       return XEN_TABLE_NORMAL;
>>>   }
>>>   
>>> +/*
>>> + * _pt_walk() performs software page table walking and returns the pte_t of
>>> + * a leaf node or the leaf-most not-present pte_t if no leaf node is found
>>> + * for further analysis.
>>> + * Additionally, _pt_walk() returns the level of the found pte.
>> That's optional, which I think wants expressing here.
>>
>>> + */
>>> +static pte_t *_pt_walk(vaddr_t va, unsigned int *pte_level)
>>> +{
>>> +    const mfn_t root = get_root_page();
>>> +    unsigned int level;
>>> +    pte_t *table;
>>> +
>>> +    DECLARE_OFFSETS(offsets, va);
>>> +
>>> +    table = map_table(root);
>> This mapping operation doesn't look to have a counterpart. Aiui ...
>>
>>> +    /*
>>> +     * Find `table` of an entry which corresponds to `va` by iterating for each
>>> +     * page level and checking if the entry points to a next page table or
>>> +     * to a page.
>>> +     *
>>> +     * Two cases are possible:
>>> +     * - ret == XEN_TABLE_SUPER_PAGE means that the entry was found;
>>> +     *   (Despite the name) XEN_TABLE_SUPER_PAGE also covers 4K mappings. If
>>> +     *   pt_next_level() is called for page table level 0, it results in the
>>> +     *   entry being a pointer to a leaf node, thereby returning
>>> +     *   XEN_TABLE_SUPER_PAGE, despite of the fact this leaf covers 4k mapping.
>>> +     * - ret == XEN_TABLE_MAP_NONE means that requested `va` wasn't actually
>>> +     *   mapped.
>>> +     */
>>> +    for ( level = HYP_PT_ROOT_LEVEL; ; --level )
>>> +    {
>>> +        int ret = pt_next_level(false, &table, offsets[level]);
>> ... the mapping may be replaced here, but a new mapping will then still
>> be held by this function and ...
>>
>>> +        if ( ret == XEN_TABLE_MAP_NONE || ret == XEN_TABLE_SUPER_PAGE )
>>> +            break;
>>> +
>>> +        ASSERT(level);
>>> +    }
>>> +
>>> +    if ( pte_level )
>>> +        *pte_level = level;
>>> +
>>> +    return table + offsets[level];
>>> +}
>> ... the final one then be transferred to the caller.
>>
>>> +pte_t pt_walk(vaddr_t va, unsigned int *pte_level)
>>> +{
>>> +    return *_pt_walk(va, pte_level);
>>> +}
>> Hence aiui there needs to be an unmap operation here.
> 
> As _pt_walk() returns page table entry, it is needed to transform entry to table.
> 
> Do we have any function in Xen for that?

I don't understand. Both unmap_domain_page() and pmap_unmap() ignore
the low bits of the passed in address.

Jan

> Or the best I can do is:
>    pte_t *table = *_pt_walk(va, pte_level) - TABLE_OFFSET(pt_linear_offset(*pte_level, va)
> (of course, it is needed to check if pte_level isn't null and do some extra actions if it is NULL)
> 
> As an option, as all page tables are PAGE_SIZE aligned, we could use PAGE_OFFSET():
>   pte_t *entry = _pt_walk(va, pte_level);
>   pte_t *table = PAGE_OFFSET(entry);
> 
> ~ Oleksii
> 
> 



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH for 4.20? v3 1/3] xen/riscv: implement software page table walking
  2025-02-12 11:27       ` Jan Beulich
@ 2025-02-12 12:15         ` Oleksii Kurochko
  0 siblings, 0 replies; 13+ messages in thread
From: Oleksii Kurochko @ 2025-02-12 12:15 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
	Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
	Stefano Stabellini, xen-devel

[-- Attachment #1: Type: text/plain, Size: 3382 bytes --]


On 2/12/25 12:27 PM, Jan Beulich wrote:
> On 12.02.2025 12:13, Oleksii Kurochko wrote:
>> On 2/10/25 5:32 PM, Jan Beulich wrote:
>>> On 07.02.2025 14:13, Oleksii Kurochko wrote:
>>>> --- a/xen/arch/riscv/pt.c
>>>> +++ b/xen/arch/riscv/pt.c
>>>> @@ -185,6 +185,57 @@ static int pt_next_level(bool alloc_tbl, pte_t **table, unsigned int offset)
>>>>        return XEN_TABLE_NORMAL;
>>>>    }
>>>>    
>>>> +/*
>>>> + * _pt_walk() performs software page table walking and returns the pte_t of
>>>> + * a leaf node or the leaf-most not-present pte_t if no leaf node is found
>>>> + * for further analysis.
>>>> + * Additionally, _pt_walk() returns the level of the found pte.
>>> That's optional, which I think wants expressing here.
>>>
>>>> + */
>>>> +static pte_t *_pt_walk(vaddr_t va, unsigned int *pte_level)
>>>> +{
>>>> +    const mfn_t root = get_root_page();
>>>> +    unsigned int level;
>>>> +    pte_t *table;
>>>> +
>>>> +    DECLARE_OFFSETS(offsets, va);
>>>> +
>>>> +    table = map_table(root);
>>> This mapping operation doesn't look to have a counterpart. Aiui ...
>>>
>>>> +    /*
>>>> +     * Find `table` of an entry which corresponds to `va` by iterating for each
>>>> +     * page level and checking if the entry points to a next page table or
>>>> +     * to a page.
>>>> +     *
>>>> +     * Two cases are possible:
>>>> +     * - ret == XEN_TABLE_SUPER_PAGE means that the entry was found;
>>>> +     *   (Despite the name) XEN_TABLE_SUPER_PAGE also covers 4K mappings. If
>>>> +     *   pt_next_level() is called for page table level 0, it results in the
>>>> +     *   entry being a pointer to a leaf node, thereby returning
>>>> +     *   XEN_TABLE_SUPER_PAGE, despite of the fact this leaf covers 4k mapping.
>>>> +     * - ret == XEN_TABLE_MAP_NONE means that requested `va` wasn't actually
>>>> +     *   mapped.
>>>> +     */
>>>> +    for ( level = HYP_PT_ROOT_LEVEL; ; --level )
>>>> +    {
>>>> +        int ret = pt_next_level(false, &table, offsets[level]);
>>> ... the mapping may be replaced here, but a new mapping will then still
>>> be held by this function and ...
>>>
>>>> +        if ( ret == XEN_TABLE_MAP_NONE || ret == XEN_TABLE_SUPER_PAGE )
>>>> +            break;
>>>> +
>>>> +        ASSERT(level);
>>>> +    }
>>>> +
>>>> +    if ( pte_level )
>>>> +        *pte_level = level;
>>>> +
>>>> +    return table + offsets[level];
>>>> +}
>>> ... the final one then be transferred to the caller.
>>>
>>>> +pte_t pt_walk(vaddr_t va, unsigned int *pte_level)
>>>> +{
>>>> +    return *_pt_walk(va, pte_level);
>>>> +}
>>> Hence aiui there needs to be an unmap operation here.
>> As _pt_walk() returns page table entry, it is needed to transform entry to table.
>>
>> Do we have any function in Xen for that?
> I don't understand. Both unmap_domain_page() and pmap_unmap() ignore
> the low bits of the passed in address.

Missed that. Then it is safe to use unmap_table() with page table entry.

Thanks.

~ Oleksii

>> Or the best I can do is:
>>     pte_t *table = *_pt_walk(va, pte_level) - TABLE_OFFSET(pt_linear_offset(*pte_level, va)
>> (of course, it is needed to check if pte_level isn't null and do some extra actions if it is NULL)
>>
>> As an option, as all page tables are PAGE_SIZE aligned, we could use PAGE_OFFSET():
>>    pte_t *entry = _pt_walk(va, pte_level);
>>    pte_t *table = PAGE_OFFSET(entry);
>>
>> ~ Oleksii
>>
>>

[-- Attachment #2: Type: text/html, Size: 4803 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2025-02-12 12:15 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-02-07 13:13 [PATCH for 4.20? v3 0/3] Fixes for vmap_to_mfn() and pt_mapping_level Oleksii Kurochko
2025-02-07 13:13 ` [PATCH for 4.20? v3 1/3] xen/riscv: implement software page table walking Oleksii Kurochko
2025-02-10 16:32   ` Jan Beulich
2025-02-11  9:15     ` Oleksii Kurochko
2025-02-12 11:13     ` Oleksii Kurochko
2025-02-12 11:27       ` Jan Beulich
2025-02-12 12:15         ` Oleksii Kurochko
2025-02-07 13:13 ` [PATCH for 4.20? v3 2/3] xen/riscv: update defintion of vmap_to_mfn() Oleksii Kurochko
2025-02-07 13:30   ` Jan Beulich
2025-02-10  8:46     ` Oleksii Kurochko
2025-02-07 13:13 ` [PATCH for 4.20? v3 3/3] xen/riscv: update mfn calculation in pt_mapping_level() Oleksii Kurochko
2025-02-10 16:53   ` Jan Beulich
2025-02-11  9:22     ` Oleksii Kurochko

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.