All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 for 4.20? 0/3] Fixes for vmap_to_mfn() and pt_mapping_level
@ 2025-02-03 13:12 Oleksii Kurochko
  2025-02-03 13:12 ` [PATCH v2 for 4.20? 1/3] xen/riscv: implement software page table walking Oleksii Kurochko
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Oleksii Kurochko @ 2025-02-03 13:12 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksii Kurochko, Alistair Francis, Bob Eshleman, Connor Davis,
	Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
	Julien Grall, Roger Pau Monné, Stefano Stabellini

Introduce pt_walk(), which does software page table walking to resolve the
following issues:
1. vmap_to_mfn() uses virt_to_maddr(), which is designed to work with VA
   from either the direct map region or Xen's linkage region (XEN_VIRT_START),
   thereby an assertion will occur if it is used with other regions, in
   particular for the VMAP region. The solution is usage of pt_walk() for
   vmap_to_mfn().
2. pt_mapping_level() returns incorrect page table level in the case when
   mfn==INVALID_MFN when, for example, VA was mapped to PA using 4k mapping,
   but during destroying/modification pt_mapping_level() could return incorrect
   page table level as when mfn==INVALID_MFN then only VA is taking into account
   for page table level calculation and so if VA is page table level 1 aligned
   then page_mapping_level() will return level 1 ( instead of level 0 as VA was
   mapped to PA using 4k mapping so there is incostinency here ).
   The solution is an introduction of PTE_LEAF_SEACH bit to tell pt_update() algo
   that it should use pt_walk() to find proper page table entry instead of
   using for searching of page table entry based on precalculated by
   pt_mapping_level() `level` and `order` values.

It would be nice  to have these fixes in Xen 4.20 but isn't really critical as
there is no any users for RISC-V port at this moment.

---
Changes in v2:
 - update the commit message.
 - other changes look in specific patch.

Oleksii Kurochko (3):
  xen/riscv: implement software page table walking
  xen/riscv: update defintion of vmap_to_mfn()
  xen/riscv: update mfn calculation in pt_mapping_level()

 xen/arch/riscv/include/asm/cmpxchg.h |   1 +
 xen/arch/riscv/include/asm/mm.h      |  18 +++-
 xen/arch/riscv/include/asm/page.h    |  30 +++---
 xen/arch/riscv/mm.c                  |  14 +++
 xen/arch/riscv/pt.c                  | 142 ++++++++++++++++++++-------
 5 files changed, 156 insertions(+), 49 deletions(-)

-- 
2.48.1



^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v2 for 4.20? 1/3] xen/riscv: implement software page table walking
  2025-02-03 13:12 [PATCH v2 for 4.20? 0/3] Fixes for vmap_to_mfn() and pt_mapping_level Oleksii Kurochko
@ 2025-02-03 13:12 ` Oleksii Kurochko
  2025-02-04 13:50   ` Jan Beulich
  2025-02-03 13:12 ` [PATCH v2 for 4.20? 2/3] xen/riscv: update defintion of vmap_to_mfn() Oleksii Kurochko
  2025-02-03 13:12 ` [PATCH v2 3/3] xen/riscv: update mfn calculation in pt_mapping_level() Oleksii Kurochko
  2 siblings, 1 reply; 12+ messages in thread
From: Oleksii Kurochko @ 2025-02-03 13:12 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksii Kurochko, Alistair Francis, Bob Eshleman, Connor Davis,
	Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
	Julien Grall, Roger Pau Monné, Stefano Stabellini

RISC-V doesn't have hardware feature to ask MMU to translate
virtual address to physical address ( like Arm has, for example ),
so software page table walking is implemented.

As pte_walk() returns pte_t and it requires inclusion of <asm/page.h>
in <asm/mm.h>, the following compilation started to occur after this
inclusion:
- implicit declaration of function 'GENMASK'. To resolve this, <xen/bitops.h>
  needs to be included at the top of <asm/cmpxchg.h>.
- implicit declaration of map_domain_page() and unmap_domain_page(). This issue
  arises because, after including <asm/page.h> in <asm/mm.h>, we could have the
  following hierarchy if someone decides to include <xen/domain.h>:
    <xen/domain_page.h>
      <xen/mm.h>
        <asm/mm.h>
          <asm/page.h>
            <xen/domain_page.h>
            ...
            flush_to_ram() which uses {un}map_domain_page() <---
            ...
    Declaration of {un}map_domain_page() happens here <---

    To avoid this compilation issue, definition of flush_to_ram() is moved to
    mm.c.

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
Changes in v2:
 - Update the code of pt_walk() to return pte_t instead of paddr_t.
 - Fix typos and drop blankets inside parentheses in the comment.
 - Update the `ret` handling; there is no need for `mfn` calculation
   anymore as pt_walk() returns or pte_t of a leaf node or non-present
   pte_t.
 - Drop the comment before unmap_table().
 - Drop local variable `pa` as pt_walk() is going to return pte_t
   instead of paddr_t.
 - Add the comment above pt_walk() to explain what it does and returns.
 - Add additional explanation about possible return values of pt_next_level()
   used inside while loop in pt_walk().
 - Change `pa` to `table` in the comment before while loop in pt_walk()
   as actually this loop finds a page table where paga table entry for `va`
   is located.
 - After including <asm/page.h> in <asm/mm.h>, the following compilation
   error occurs:
    ./arch/riscv/include/asm/cmpxchg.h:56:9: error: implicit declaration of
    function 'GENMASK'
   To resolve this, <xen/bitops.h> needs to be included at the top of
   <asm/cmpxchg.h>.
 - To avoid an issue with the implicit declaration of map_domain_page() and
   unmap_domain_page() after including <asm/page.h> in <asm/mm.h>, the
   implementation of flush_page_to_ram() has been moved to mm.c. (look for
   more detailed explanation in the commit message) As a result drop
   inclusion of <xen/domain.h> in <asm/page.h>.
 - Update the commit message
---
 xen/arch/riscv/include/asm/cmpxchg.h |  1 +
 xen/arch/riscv/include/asm/mm.h      |  4 ++
 xen/arch/riscv/include/asm/page.h    | 14 +------
 xen/arch/riscv/mm.c                  | 14 +++++++
 xen/arch/riscv/pt.c                  | 55 ++++++++++++++++++++++++++++
 5 files changed, 75 insertions(+), 13 deletions(-)

diff --git a/xen/arch/riscv/include/asm/cmpxchg.h b/xen/arch/riscv/include/asm/cmpxchg.h
index 662d3fd5d4..4cfe5927b7 100644
--- a/xen/arch/riscv/include/asm/cmpxchg.h
+++ b/xen/arch/riscv/include/asm/cmpxchg.h
@@ -4,6 +4,7 @@
 #ifndef ASM__RISCV__CMPXCHG_H
 #define ASM__RISCV__CMPXCHG_H
 
+#include <xen/bitops.h>
 #include <xen/compiler.h>
 #include <xen/lib.h>
 
diff --git a/xen/arch/riscv/include/asm/mm.h b/xen/arch/riscv/include/asm/mm.h
index 292aa48fc1..10a15a8b03 100644
--- a/xen/arch/riscv/include/asm/mm.h
+++ b/xen/arch/riscv/include/asm/mm.h
@@ -156,6 +156,10 @@ static inline struct page_info *virt_to_page(const void *v)
     return frametable_virt_start + PFN_DOWN(va - directmap_virt_start);
 }
 
+#include <asm/page.h>
+
+pte_t * pt_walk(vaddr_t va, unsigned int *pte_level);
+
 /*
  * Common code requires get_page_type and put_page_type.
  * We don't care about typecounts so we just do the minimum to make it
diff --git a/xen/arch/riscv/include/asm/page.h b/xen/arch/riscv/include/asm/page.h
index 7a6174a109..b9076173f4 100644
--- a/xen/arch/riscv/include/asm/page.h
+++ b/xen/arch/riscv/include/asm/page.h
@@ -7,7 +7,6 @@
 
 #include <xen/bug.h>
 #include <xen/const.h>
-#include <xen/domain_page.h>
 #include <xen/errno.h>
 #include <xen/types.h>
 
@@ -177,18 +176,7 @@ static inline void invalidate_icache(void)
 #define clear_page(page) memset((void *)(page), 0, PAGE_SIZE)
 #define copy_page(dp, sp) memcpy(dp, sp, PAGE_SIZE)
 
-static inline void flush_page_to_ram(unsigned long mfn, bool sync_icache)
-{
-    const void *v = map_domain_page(_mfn(mfn));
-
-    if ( clean_and_invalidate_dcache_va_range(v, PAGE_SIZE) )
-        BUG();
-
-    unmap_domain_page(v);
-
-    if ( sync_icache )
-        invalidate_icache();
-}
+void flush_page_to_ram(unsigned long mfn, bool sync_icache);
 
 /* Write a pagetable entry. */
 static inline void write_pte(pte_t *p, pte_t pte)
diff --git a/xen/arch/riscv/mm.c b/xen/arch/riscv/mm.c
index f2bf279bac..32574c879b 100644
--- a/xen/arch/riscv/mm.c
+++ b/xen/arch/riscv/mm.c
@@ -3,6 +3,7 @@
 #include <xen/bootfdt.h>
 #include <xen/bug.h>
 #include <xen/compiler.h>
+#include <xen/domain_page.h>
 #include <xen/init.h>
 #include <xen/kernel.h>
 #include <xen/libfdt/libfdt.h>
@@ -564,3 +565,16 @@ void *__init arch_vmap_virt_end(void)
 {
     return (void *)(VMAP_VIRT_START + VMAP_VIRT_SIZE);
 }
+
+void flush_page_to_ram(unsigned long mfn, bool sync_icache)
+{
+    const void *v = map_domain_page(_mfn(mfn));
+
+    if ( clean_and_invalidate_dcache_va_range(v, PAGE_SIZE) )
+        BUG();
+
+    unmap_domain_page(v);
+
+    if ( sync_icache )
+        invalidate_icache();
+}
diff --git a/xen/arch/riscv/pt.c b/xen/arch/riscv/pt.c
index a703e0f1bd..2a5a191a70 100644
--- a/xen/arch/riscv/pt.c
+++ b/xen/arch/riscv/pt.c
@@ -274,6 +274,61 @@ static int pt_update_entry(mfn_t root, vaddr_t virt,
     return rc;
 }
 
+/*
+ * pt_walk() performs software page table walking and returns the pte_t of
+ * a leaf node or the leaf-most not-present pte_t if no leaf node is found
+ * for further analysis.
+ * Additionally, pt_walk() returns the level of the found pte.
+ */
+pte_t * pt_walk(vaddr_t va, unsigned int *pte_level)
+{
+    const mfn_t root = get_root_page();
+    /*
+     * In pt_walk() only XEN_TABLE_MAP_NONE and XEN_TABLE_SUPER_PAGE are
+     * handled (as they are only possible for page table walking), so
+     * initialize `ret` with "impossible" XEN_TABLE_MAP_NOMEM.
+     */
+    int ret = XEN_TABLE_MAP_NOMEM;
+    unsigned int level = HYP_PT_ROOT_LEVEL;
+    pte_t *table;
+
+    DECLARE_OFFSETS(offsets, va);
+
+    table = map_table(root);
+
+    /*
+     * Find `table` of an entry which corresponds to `va` by iterating for each
+     * page level and checking if the entry points to a next page table or
+     * to a page.
+     *
+     * Two cases are possible:
+     * - ret == XEN_TABLE_SUPER_PAGE means that the entry was find;
+     *   (Despite the name) XEN_TABLE_SUPER_PAGE also covers 4K mappings. If
+     *   pt_next_level() is called for page table level 0, it results in the
+     *   entry being a pointer to a leaf node, thereby returning
+     *   XEN_TABLE_SUPER_PAGE, despite of the fact this leaf covers 4k mapping.
+     * - ret == XEN_TABLE_MAP_NONE means that requested `va` wasn't actually
+     *   mapped.
+     */
+    while ( (ret != XEN_TABLE_MAP_NONE) && (ret != XEN_TABLE_SUPER_PAGE) )
+    {
+        /*
+         * This case shouldn't really occur as it will mean that for table
+         * level 0 a pointer to next page table has been written, but at
+         * level 0 it could be only a pointer to 4k page.
+         */
+        ASSERT(level <= HYP_PT_ROOT_LEVEL);
+
+        ret = pt_next_level(false, &table, offsets[level]);
+        level--;
+    }
+
+    if ( pte_level )
+        *pte_level = level + 1;
+
+    return table + offsets[level + 1];
+}
+
 /* Return the level where mapping should be done */
 static int pt_mapping_level(unsigned long vfn, mfn_t mfn, unsigned long nr,
                             unsigned int flags)
-- 
2.48.1



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 for 4.20? 2/3] xen/riscv: update defintion of vmap_to_mfn()
  2025-02-03 13:12 [PATCH v2 for 4.20? 0/3] Fixes for vmap_to_mfn() and pt_mapping_level Oleksii Kurochko
  2025-02-03 13:12 ` [PATCH v2 for 4.20? 1/3] xen/riscv: implement software page table walking Oleksii Kurochko
@ 2025-02-03 13:12 ` Oleksii Kurochko
  2025-02-04 13:56   ` Jan Beulich
  2025-02-03 13:12 ` [PATCH v2 3/3] xen/riscv: update mfn calculation in pt_mapping_level() Oleksii Kurochko
  2 siblings, 1 reply; 12+ messages in thread
From: Oleksii Kurochko @ 2025-02-03 13:12 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksii Kurochko, Alistair Francis, Bob Eshleman, Connor Davis,
	Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
	Julien Grall, Roger Pau Monné, Stefano Stabellini

vmap_to_mfn() uses virt_to_maddr(), which is designed to work with VA from
either the direct map region or Xen's linkage region (XEN_VIRT_START).
An assertion will occur if it is used with other regions, in particular for
the VMAP region.

Since RISC-V lacks a hardware feature to request the MMU to translate a VA to
a PA (as Arm does, for example), software page table walking (pt_walk()) is
used for the VMAP region to obtain the mfn from pte_t.

Fixes: 7db8d2bd9b ("xen/riscv: add minimal stuff to mm.h to build full Xen")
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
Changes in v2:
 - Update defintion of vmap_to_mfn() as pt_walk() now returns pte_t
   instead of paddr_t.
 - Update the commit message.
---
 xen/arch/riscv/include/asm/mm.h | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/xen/arch/riscv/include/asm/mm.h b/xen/arch/riscv/include/asm/mm.h
index 10a15a8b03..814a7035a8 100644
--- a/xen/arch/riscv/include/asm/mm.h
+++ b/xen/arch/riscv/include/asm/mm.h
@@ -23,8 +23,6 @@ extern vaddr_t directmap_virt_start;
 #define gaddr_to_gfn(ga)    _gfn(paddr_to_pfn(ga))
 #define mfn_to_maddr(mfn)   pfn_to_paddr(mfn_x(mfn))
 #define maddr_to_mfn(ma)    _mfn(paddr_to_pfn(ma))
-#define vmap_to_mfn(va)     maddr_to_mfn(virt_to_maddr((vaddr_t)(va)))
-#define vmap_to_page(va)    mfn_to_page(vmap_to_mfn(va))
 
 static inline void *maddr_to_virt(paddr_t ma)
 {
@@ -160,6 +158,18 @@ static inline struct page_info *virt_to_page(const void *v)
 
 pte_t * pt_walk(vaddr_t va, unsigned int *pte_level);
 
+static inline mfn_t vmap_to_mfn_(vaddr_t va)
+{
+    pte_t *entry = pt_walk(va, NULL);
+
+    BUG_ON(!pte_is_mapping(*entry));
+
+    return mfn_from_pte(*entry);
+}
+
+#define vmap_to_mfn(va)     vmap_to_mfn_((vaddr_t)va)
+#define vmap_to_page(va)    mfn_to_page(vmap_to_mfn(va))
+
 /*
  * Common code requires get_page_type and put_page_type.
  * We don't care about typecounts so we just do the minimum to make it
-- 
2.48.1



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 3/3] xen/riscv: update mfn calculation in pt_mapping_level()
  2025-02-03 13:12 [PATCH v2 for 4.20? 0/3] Fixes for vmap_to_mfn() and pt_mapping_level Oleksii Kurochko
  2025-02-03 13:12 ` [PATCH v2 for 4.20? 1/3] xen/riscv: implement software page table walking Oleksii Kurochko
  2025-02-03 13:12 ` [PATCH v2 for 4.20? 2/3] xen/riscv: update defintion of vmap_to_mfn() Oleksii Kurochko
@ 2025-02-03 13:12 ` Oleksii Kurochko
  2025-02-04 14:57   ` Jan Beulich
  2 siblings, 1 reply; 12+ messages in thread
From: Oleksii Kurochko @ 2025-02-03 13:12 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksii Kurochko, Alistair Francis, Bob Eshleman, Connor Davis,
	Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
	Julien Grall, Roger Pau Monné, Stefano Stabellini

When pt_update() is called with arguments (..., INVALID_MFN, ..., 0 or 1),
it indicates that a mapping is being destroyed/modifyed.

In the case when modifying or destroying a mapping, it is necessary to
search until a leaf node is found, instead of searching for a page table
entry based on the precalculated `level` and `order` returned from pt_update().
This is because when `mfn` == INVALID_MFN, the `mask` (in pt_mapping_level())
will take into account only `vfn`, which could accidentally return an
incorrect level, leading to the discovery of an incorrect page table entry.

For example, if `vfn` is page table level 1 aligned, but it was mapped as
page table level 0, then pt_mapping_level() will return `level` = 1, since
only `vfn` (which is page table level 1 aligned) is taken into account when
`mfn` == INVALID_MFN (look at pt_mapping_level()).

Fixes: c2f1ded524 ("xen/riscv: page table handling")
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
Changes in v2:
 - Introduce PTE_LEAF_SEARCH to tell page table update operation to
   walk down to wherever the leaf entry is.
 - Use introduced PTE_LEAF_SEARCH to not searching pte_t entry twice.
 - Update the commit message.
---
 xen/arch/riscv/include/asm/page.h | 16 ++++++
 xen/arch/riscv/pt.c               | 87 +++++++++++++++++++------------
 2 files changed, 69 insertions(+), 34 deletions(-)

diff --git a/xen/arch/riscv/include/asm/page.h b/xen/arch/riscv/include/asm/page.h
index b9076173f4..72d29376bc 100644
--- a/xen/arch/riscv/include/asm/page.h
+++ b/xen/arch/riscv/include/asm/page.h
@@ -55,6 +55,22 @@
 #define PTE_SMALL       BIT(10, UL)
 #define PTE_POPULATE    BIT(11, UL)
 
+/*
+ * In the case when modifying or destroying a mapping, it is necessary to
+ * search until a leaf node is found, instead of searching for a page table
+ * entry based on the precalculated `level` and `order` (look at pt_update()).
+ * This is because when `mfn` == INVALID_MFN, the `mask`(in pt_mapping_level())
+ * will take into account only `vfn`, which could accidentally return an
+ * incorrect level, leading to the discovery of an incorrect page table entry.
+ *
+ * For example, if `vfn` is page table level 1 aligned, but it was mapped as
+ * page table level 0, then pt_mapping_level() will return `level` = 1,
+ * since only `vfn` (which is page table level 1 aligned) is taken into account
+ * when `mfn` == INVALID_MFN (look at pt_mapping_level()).
+ */
+
+#define PTE_LEAF_SEARCH BIT(12, UL)
+
 #define PTE_ACCESS_MASK (PTE_READABLE | PTE_WRITABLE | PTE_EXECUTABLE)
 
 /* Calculate the offsets into the pagetables for a given VA */
diff --git a/xen/arch/riscv/pt.c b/xen/arch/riscv/pt.c
index 2a5a191a70..9db41eac53 100644
--- a/xen/arch/riscv/pt.c
+++ b/xen/arch/riscv/pt.c
@@ -187,11 +187,10 @@ static int pt_next_level(bool alloc_tbl, pte_t **table, unsigned int offset)
 
 /* Update an entry at the level @target. */
 static int pt_update_entry(mfn_t root, vaddr_t virt,
-                           mfn_t mfn, unsigned int target,
+                           mfn_t mfn, unsigned int *target,
                            unsigned int flags)
 {
     int rc;
-    unsigned int level = HYP_PT_ROOT_LEVEL;
     pte_t *table;
     /*
      * The intermediate page table shouldn't be allocated when MFN isn't
@@ -205,39 +204,48 @@ static int pt_update_entry(mfn_t root, vaddr_t virt,
     bool alloc_tbl = !mfn_eq(mfn, INVALID_MFN) || (flags & PTE_POPULATE);
     pte_t pte, *entry;
 
-    /* convenience aliases */
-    DECLARE_OFFSETS(offsets, virt);
-
-    table = map_table(root);
-    for ( ; level > target; level-- )
+    if ( flags & PTE_LEAF_SEARCH )
     {
-        rc = pt_next_level(alloc_tbl, &table, offsets[level]);
-        if ( rc == XEN_TABLE_MAP_NOMEM )
+        entry = pt_walk(virt, target);
+        BUG_ON(!pte_is_mapping(*entry));
+    }
+    else
+    {
+        unsigned int level = HYP_PT_ROOT_LEVEL;
+        /* convenience aliases */
+        DECLARE_OFFSETS(offsets, virt);
+
+        table = map_table(root);
+        for ( ; level > *target; level-- )
         {
-            rc = -ENOMEM;
-            goto out;
+            rc = pt_next_level(alloc_tbl, &table, offsets[level]);
+            if ( rc == XEN_TABLE_MAP_NOMEM )
+            {
+                rc = -ENOMEM;
+                goto out;
+            }
+
+            if ( rc == XEN_TABLE_MAP_NONE )
+            {
+                rc = 0;
+                goto out;
+            }
+
+            if ( rc != XEN_TABLE_NORMAL )
+                break;
         }
 
-        if ( rc == XEN_TABLE_MAP_NONE )
+        if ( level != *target )
         {
-            rc = 0;
+            dprintk(XENLOG_ERR,
+                    "%s: Shattering superpage is not supported\n", __func__);
+            rc = -EOPNOTSUPP;
             goto out;
         }
 
-        if ( rc != XEN_TABLE_NORMAL )
-            break;
+        entry = table + offsets[level];
     }
 
-    if ( level != target )
-    {
-        dprintk(XENLOG_ERR,
-                "%s: Shattering superpage is not supported\n", __func__);
-        rc = -EOPNOTSUPP;
-        goto out;
-    }
-
-    entry = table + offsets[level];
-
     rc = -EINVAL;
     if ( !pt_check_entry(*entry, mfn, flags) )
         goto out;
@@ -345,9 +353,6 @@ static int pt_mapping_level(unsigned long vfn, mfn_t mfn, unsigned long nr,
         return level;
 
     /*
-     * Don't take into account the MFN when removing mapping (i.e
-     * MFN_INVALID) to calculate the correct target order.
-     *
      * `vfn` and `mfn` must be both superpage aligned.
      * They are or-ed together and then checked against the size of
      * each level.
@@ -415,19 +420,33 @@ static int pt_update(vaddr_t virt, mfn_t mfn,
 
     spin_lock(&pt_lock);
 
-    while ( left )
+    /* look at the comment above the definition of PTE_LEAF_SEARCH */
+    if ( mfn_eq(mfn, INVALID_MFN) && !(flags & PTE_POPULATE) )
     {
-        unsigned int order, level;
+        flags |= PTE_LEAF_SEARCH;
+    }
 
-        level = pt_mapping_level(vfn, mfn, left, flags);
-        order = XEN_PT_LEVEL_ORDER(level);
+    while ( left )
+    {
+        unsigned int order = 0, level;
 
-        ASSERT(left >= BIT(order, UL));
+        if ( !(flags & PTE_LEAF_SEARCH) )
+        {
+            level = pt_mapping_level(vfn, mfn, left, flags);
+            order = XEN_PT_LEVEL_ORDER(level);
+            ASSERT(left >= BIT(order, UL));
+        }
 
-        rc = pt_update_entry(root, vfn << PAGE_SHIFT, mfn, level, flags);
+        rc = pt_update_entry(root, vfn << PAGE_SHIFT, mfn, &level, flags);
         if ( rc )
             break;
 
+        if ( flags & PTE_LEAF_SEARCH )
+        {
+            order = XEN_PT_LEVEL_ORDER(level);
+            ASSERT(left >= BIT(order, UL));
+        }
+
         vfn += 1UL << order;
         if ( !mfn_eq(mfn, INVALID_MFN) )
             mfn = mfn_add(mfn, 1UL << order);
-- 
2.48.1



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 for 4.20? 1/3] xen/riscv: implement software page table walking
  2025-02-03 13:12 ` [PATCH v2 for 4.20? 1/3] xen/riscv: implement software page table walking Oleksii Kurochko
@ 2025-02-04 13:50   ` Jan Beulich
  2025-02-05 16:55     ` Oleksii Kurochko
  0 siblings, 1 reply; 12+ messages in thread
From: Jan Beulich @ 2025-02-04 13:50 UTC (permalink / raw)
  To: Oleksii Kurochko
  Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
	Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
	Stefano Stabellini, xen-devel

On 03.02.2025 14:12, Oleksii Kurochko wrote:
> --- a/xen/arch/riscv/include/asm/mm.h
> +++ b/xen/arch/riscv/include/asm/mm.h
> @@ -156,6 +156,10 @@ static inline struct page_info *virt_to_page(const void *v)
>      return frametable_virt_start + PFN_DOWN(va - directmap_virt_start);
>  }
>  
> +#include <asm/page.h>

asm/page.h already includes asm/mm.h, so you're introducing a circular
dependency here (much of which the patch description is about, so it's
unclear why you didn't solve this another way). Afaict ...

> +pte_t * pt_walk(vaddr_t va, unsigned int *pte_level);

... it's pte_t that presents a problem here. Why not simply put the
declaration in asm/page.h (and drop all the secondary changes that
don't really belong in this patch anyway)?

Also nit: Excess blank after the first *.

> --- a/xen/arch/riscv/pt.c
> +++ b/xen/arch/riscv/pt.c
> @@ -274,6 +274,61 @@ static int pt_update_entry(mfn_t root, vaddr_t virt,
>      return rc;
>  }
>  
> +/*
> + * pt_walk() performs software page table walking and returns the pte_t of
> + * a leaf node or the leaf-most not-present pte_t if no leaf node is found
> + * for further analysis.
> + * Additionally, pt_walk() returns the level of the found pte.
> + */
> +pte_t * pt_walk(vaddr_t va, unsigned int *pte_level)

See nit above.

> +{
> +    const mfn_t root = get_root_page();
> +    /*
> +     * In pt_walk() only XEN_TABLE_MAP_NONE and XEN_TABLE_SUPER_PAGE are
> +     * handled (as they are only possible for page table walking), so
> +     * initialize `ret` with "impossible" XEN_TABLE_MAP_NOMEM.
> +     */
> +    int ret = XEN_TABLE_MAP_NOMEM;
> +    unsigned int level = HYP_PT_ROOT_LEVEL;

With this initialization and ...

> +    pte_t *table;
> +
> +    DECLARE_OFFSETS(offsets, va);
> +
> +    table = map_table(root);
> +
> +    /*
> +     * Find `table` of an entry which corresponds to `va` by iterating for each
> +     * page level and checking if the entry points to a next page table or
> +     * to a page.
> +     *
> +     * Two cases are possible:
> +     * - ret == XEN_TABLE_SUPER_PAGE means that the entry was find;

(nit: found)

> +     *   (Despite the name) XEN_TABLE_SUPER_PAGE also covers 4K mappings. If
> +     *   pt_next_level() is called for page table level 0, it results in the
> +     *   entry being a pointer to a leaf node, thereby returning
> +     *   XEN_TABLE_SUPER_PAGE, despite of the fact this leaf covers 4k mapping.
> +     * - ret == XEN_TABLE_MAP_NONE means that requested `va` wasn't actually
> +     *   mapped.
> +     */
> +    while ( (ret != XEN_TABLE_MAP_NONE) && (ret != XEN_TABLE_SUPER_PAGE) )
> +    {
> +        /*
> +         * This case shouldn't really occur as it will mean that for table
> +         * level 0 a pointer to next page table has been written, but at
> +         * level 0 it could be only a pointer to 4k page.
> +         */
> +        ASSERT(level <= HYP_PT_ROOT_LEVEL);
> +
> +        ret = pt_next_level(false, &table, offsets[level]);
> +        level--;

... this being the only updating of the variable, what are the assertion and
accompanying comment about? What you're rather at risk of is for level to
wrap around through 0. In fact I think it does, ...

> +    }
> +
> +    if ( pte_level )
> +        *pte_level = level + 1;
> +
> +    return table + offsets[level + 1];
> +}

... which you account for by adding 1 in both of the places here. Don't you
think that it would be better to avoid such, e.g.:

    for ( level = HYP_PT_ROOT_LEVEL; ; --level )
    {
        int ret = pt_next_level(false, &table, offsets[level]);

        if ( ret == XEN_TABLE_MAP_NONE || ret == XEN_TABLE_SUPER_PAGE )
            break;
        ASSERT(level);
    } 

or

    for ( level = CONFIG_PAGING_LEVELS; level--; )
    {
        int ret = pt_next_level(false, &table, offsets[level]);

        if ( ret == XEN_TABLE_MAP_NONE || ret == XEN_TABLE_SUPER_PAGE )
            break;
    } 

This then also avoids the oddity about ret's initializer.

Jan


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 for 4.20? 2/3] xen/riscv: update defintion of vmap_to_mfn()
  2025-02-03 13:12 ` [PATCH v2 for 4.20? 2/3] xen/riscv: update defintion of vmap_to_mfn() Oleksii Kurochko
@ 2025-02-04 13:56   ` Jan Beulich
  2025-02-05 16:58     ` Oleksii Kurochko
  0 siblings, 1 reply; 12+ messages in thread
From: Jan Beulich @ 2025-02-04 13:56 UTC (permalink / raw)
  To: Oleksii Kurochko
  Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
	Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
	Stefano Stabellini, xen-devel

On 03.02.2025 14:12, Oleksii Kurochko wrote:
> @@ -160,6 +158,18 @@ static inline struct page_info *virt_to_page(const void *v)
>  
>  pte_t * pt_walk(vaddr_t va, unsigned int *pte_level);
>  
> +static inline mfn_t vmap_to_mfn_(vaddr_t va)

Btw., for static functions (and variables) a prefixing underscore is
fine to use. Its identifiers that don't have file scope which shouldn't.

> +{
> +    pte_t *entry = pt_walk(va, NULL);

Oh, noticing the anomaly only here: Why would pt_walk() return a pointer
to a PTE, rather than the pte_t by value? All this does is encourage
open-coded accesses (even writes), when especially writes are supposed
to be going through pt_update().

> +    BUG_ON(!pte_is_mapping(*entry));
> +
> +    return mfn_from_pte(*entry);
> +}
> +
> +#define vmap_to_mfn(va)     vmap_to_mfn_((vaddr_t)va)

You've lost the parenthesizing of va.

Jan


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 3/3] xen/riscv: update mfn calculation in pt_mapping_level()
  2025-02-03 13:12 ` [PATCH v2 3/3] xen/riscv: update mfn calculation in pt_mapping_level() Oleksii Kurochko
@ 2025-02-04 14:57   ` Jan Beulich
  2025-02-05 17:55     ` Oleksii Kurochko
  0 siblings, 1 reply; 12+ messages in thread
From: Jan Beulich @ 2025-02-04 14:57 UTC (permalink / raw)
  To: Oleksii Kurochko
  Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
	Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
	Stefano Stabellini, xen-devel

On 03.02.2025 14:12, Oleksii Kurochko wrote:
> --- a/xen/arch/riscv/include/asm/page.h
> +++ b/xen/arch/riscv/include/asm/page.h
> @@ -55,6 +55,22 @@
>  #define PTE_SMALL       BIT(10, UL)
>  #define PTE_POPULATE    BIT(11, UL)
>  
> +/*
> + * In the case when modifying or destroying a mapping, it is necessary to
> + * search until a leaf node is found, instead of searching for a page table
> + * entry based on the precalculated `level` and `order` (look at pt_update()).
> + * This is because when `mfn` == INVALID_MFN, the `mask`(in pt_mapping_level())
> + * will take into account only `vfn`, which could accidentally return an
> + * incorrect level, leading to the discovery of an incorrect page table entry.
> + *
> + * For example, if `vfn` is page table level 1 aligned, but it was mapped as
> + * page table level 0, then pt_mapping_level() will return `level` = 1,
> + * since only `vfn` (which is page table level 1 aligned) is taken into account
> + * when `mfn` == INVALID_MFN (look at pt_mapping_level()).
> + */
> +
> +#define PTE_LEAF_SEARCH BIT(12, UL)

Is it intended for callers outside of pt.c to make use of this? If not,
it better wouldn't be globally exposed.

Furthermore, this isn't a property of the PTE(s) to be created, so is
likely wrong to mix with PTE_* flags. (PTE_POPULATE is on the edge of
also falling in this category, btw.) Perhaps ...

> --- a/xen/arch/riscv/pt.c
> +++ b/xen/arch/riscv/pt.c
> @@ -187,11 +187,10 @@ static int pt_next_level(bool alloc_tbl, pte_t **table, unsigned int offset)
>  
>  /* Update an entry at the level @target. */
>  static int pt_update_entry(mfn_t root, vaddr_t virt,
> -                           mfn_t mfn, unsigned int target,
> +                           mfn_t mfn, unsigned int *target,

... you instead want to have callers of this function preset *target
to a special value (e.g. UINT_MAX or CONFIG_PAGING_LEVELS) indicating
the level is wanted as an output.

> @@ -205,39 +204,48 @@ static int pt_update_entry(mfn_t root, vaddr_t virt,
>      bool alloc_tbl = !mfn_eq(mfn, INVALID_MFN) || (flags & PTE_POPULATE);
>      pte_t pte, *entry;
>  
> -    /* convenience aliases */
> -    DECLARE_OFFSETS(offsets, virt);
> -
> -    table = map_table(root);
> -    for ( ; level > target; level-- )
> +    if ( flags & PTE_LEAF_SEARCH )
>      {
> -        rc = pt_next_level(alloc_tbl, &table, offsets[level]);
> -        if ( rc == XEN_TABLE_MAP_NOMEM )
> +        entry = pt_walk(virt, target);
> +        BUG_ON(!pte_is_mapping(*entry));

Is this really necessarily a bug? Can't one want to determine how deep
the (populated) page tables are for a given VA?

Hmm, here I can see why you have pt_walk() return a pointer. As per the
comment on the earlier patch, I don't think this is a good idea. You
may want to have

static pte_t *_pt_walk(...)
{
    ...
}

pte_t pt_walk(...)
{
    return *_pt_walk(...);
}

> @@ -345,9 +353,6 @@ static int pt_mapping_level(unsigned long vfn, mfn_t mfn, unsigned long nr,
>          return level;
>  
>      /*
> -     * Don't take into account the MFN when removing mapping (i.e
> -     * MFN_INVALID) to calculate the correct target order.
> -     *
>       * `vfn` and `mfn` must be both superpage aligned.
>       * They are or-ed together and then checked against the size of
>       * each level.

You drop part of the comment without altering the code being commented.
What's the deal?

> @@ -415,19 +420,33 @@ static int pt_update(vaddr_t virt, mfn_t mfn,
>  
>      spin_lock(&pt_lock);
>  
> -    while ( left )
> +    /* look at the comment above the definition of PTE_LEAF_SEARCH */
> +    if ( mfn_eq(mfn, INVALID_MFN) && !(flags & PTE_POPULATE) )
>      {
> -        unsigned int order, level;
> +        flags |= PTE_LEAF_SEARCH;
> +    }

For readability I think it would be better if the figure braces were
dropped.

> -        level = pt_mapping_level(vfn, mfn, left, flags);
> -        order = XEN_PT_LEVEL_ORDER(level);
> +    while ( left )
> +    {
> +        unsigned int order = 0, level;
>  
> -        ASSERT(left >= BIT(order, UL));
> +        if ( !(flags & PTE_LEAF_SEARCH) )
> +        {
> +            level = pt_mapping_level(vfn, mfn, left, flags);
> +            order = XEN_PT_LEVEL_ORDER(level);
> +            ASSERT(left >= BIT(order, UL));

Assignment to order and assertion are ...

> +        }
>  
> -        rc = pt_update_entry(root, vfn << PAGE_SHIFT, mfn, level, flags);
> +        rc = pt_update_entry(root, vfn << PAGE_SHIFT, mfn, &level, flags);
>          if ( rc )
>              break;
>  
> +        if ( flags & PTE_LEAF_SEARCH )
> +        {
> +            order = XEN_PT_LEVEL_ORDER(level);
> +            ASSERT(left >= BIT(order, UL));
> +        }

... repeated here, with neither left nor order being passed into
pt_update_entry(). Does this really need doing twice? (I have to
admit that I have trouble determining what the assertion is about.
For order alone it clearly could be done centrally after the call.)

Jan


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 for 4.20? 1/3] xen/riscv: implement software page table walking
  2025-02-04 13:50   ` Jan Beulich
@ 2025-02-05 16:55     ` Oleksii Kurochko
  2025-02-06 11:15       ` Jan Beulich
  0 siblings, 1 reply; 12+ messages in thread
From: Oleksii Kurochko @ 2025-02-05 16:55 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
	Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
	Stefano Stabellini, xen-devel

[-- Attachment #1: Type: text/plain, Size: 4894 bytes --]


On 2/4/25 2:50 PM, Jan Beulich wrote:
> On 03.02.2025 14:12, Oleksii Kurochko wrote:
>> --- a/xen/arch/riscv/include/asm/mm.h
>> +++ b/xen/arch/riscv/include/asm/mm.h
>> @@ -156,6 +156,10 @@ static inline struct page_info *virt_to_page(const void *v)
>>       return frametable_virt_start + PFN_DOWN(va - directmap_virt_start);
>>   }
>>   
>> +#include <asm/page.h>
> asm/page.h already includes asm/mm.h, so you're introducing a circular
> dependency here (much of which the patch description is about, so it's
> unclear why you didn't solve this another way). Afaict ...
>
>> +pte_t * pt_walk(vaddr_t va, unsigned int *pte_level);
> ... it's pte_t that presents a problem here. Why not simply put the
> declaration in asm/page.h (and drop all the secondary changes that
> don't really belong in this patch anyway)?

In the patch 2 it is used for implementing vmap_to_mfn():

   static inline mfn_t vmap_to_mfn_(vaddr_t va)
   {
       pte_t *entry = pt_walk(va, NULL);

       BUG_ON(!pte_is_mapping(*entry));

       return mfn_from_pte(*entry);
   }

   #define vmap_to_mfn(va)     vmap_to_mfn_((vaddr_t)va)mfn_from_pte

what leads to including of <asm/page.h> in <asm/mm.h>.

As an option, if to move the following to <asm/page.h>:
   #define vmap_to_mfn(va)     vmap_to_mfn_((vaddr_t)va)
   #define vmap_to_page(va)    mfn_to_page(vmap_to_mfn(va))
the circular dependency could be dropped.

> Also nit: Excess blank after the first *.
>
>> --- a/xen/arch/riscv/pt.c
>> +++ b/xen/arch/riscv/pt.c
>> @@ -274,6 +274,61 @@ static int pt_update_entry(mfn_t root, vaddr_t virt,
>>       return rc;
>>   }
>>   
>> +/*
>> + * pt_walk() performs software page table walking and returns the pte_t of
>> + * a leaf node or the leaf-most not-present pte_t if no leaf node is found
>> + * for further analysis.
>> + * Additionally, pt_walk() returns the level of the found pte.
>> + */
>> +pte_t * pt_walk(vaddr_t va, unsigned int *pte_level)
> See nit above.
>
>> +{
>> +    const mfn_t root = get_root_page();
>> +    /*
>> +     * In pt_walk() only XEN_TABLE_MAP_NONE and XEN_TABLE_SUPER_PAGE are
>> +     * handled (as they are only possible for page table walking), so
>> +     * initialize `ret` with "impossible" XEN_TABLE_MAP_NOMEM.
>> +     */
>> +    int ret = XEN_TABLE_MAP_NOMEM;
>> +    unsigned int level = HYP_PT_ROOT_LEVEL;
> With this initialization and ...
>
>> +    pte_t *table;
>> +
>> +    DECLARE_OFFSETS(offsets, va);
>> +
>> +    table = map_table(root);
>> +
>> +    /*
>> +     * Find `table` of an entry which corresponds to `va` by iterating for each
>> +     * page level and checking if the entry points to a next page table or
>> +     * to a page.
>> +     *
>> +     * Two cases are possible:
>> +     * - ret == XEN_TABLE_SUPER_PAGE means that the entry was find;
> (nit: found)
>
>> +     *   (Despite the name) XEN_TABLE_SUPER_PAGE also covers 4K mappings. If
>> +     *   pt_next_level() is called for page table level 0, it results in the
>> +     *   entry being a pointer to a leaf node, thereby returning
>> +     *   XEN_TABLE_SUPER_PAGE, despite of the fact this leaf covers 4k mapping.
>> +     * - ret == XEN_TABLE_MAP_NONE means that requested `va` wasn't actually
>> +     *   mapped.
>> +     */
>> +    while ( (ret != XEN_TABLE_MAP_NONE) && (ret != XEN_TABLE_SUPER_PAGE) )
>> +    {
>> +        /*
>> +         * This case shouldn't really occur as it will mean that for table
>> +         * level 0 a pointer to next page table has been written, but at
>> +         * level 0 it could be only a pointer to 4k page.
>> +         */
>> +        ASSERT(level <= HYP_PT_ROOT_LEVEL);
>> +
>> +        ret = pt_next_level(false, &table, offsets[level]);
>> +        level--;
> ... this being the only updating of the variable, what are the assertion and
> accompanying comment about? What you're rather at risk of is for level to
> wrap around through 0. In fact I think it does, ...
>
>> +    }
>> +
>> +    if ( pte_level )
>> +        *pte_level = level + 1;
>> +
>> +    return table + offsets[level + 1];
>> +}
> ... which you account for by adding 1 in both of the places here. Don't you
> think that it would be better to avoid such, e.g.:
>
>      for ( level = HYP_PT_ROOT_LEVEL; ; --level )
>      {
>          int ret = pt_next_level(false, &table, offsets[level]);
>
>          if ( ret == XEN_TABLE_MAP_NONE || ret == XEN_TABLE_SUPER_PAGE )
>              break;
>          ASSERT(level);
>      }
>
> or
>
>      for ( level = CONFIG_PAGING_LEVELS; level--; )
>      {
>          int ret = pt_next_level(false, &table, offsets[level]);
>
>          if ( ret == XEN_TABLE_MAP_NONE || ret == XEN_TABLE_SUPER_PAGE )
>              break;
>      }
>
> This then also avoids the oddity about ret's initializer.

We can do in the way you suggested, it is more clear. I will go with the second option.

Thanks.

~ Oleksii

[-- Attachment #2: Type: text/html, Size: 6269 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 for 4.20? 2/3] xen/riscv: update defintion of vmap_to_mfn()
  2025-02-04 13:56   ` Jan Beulich
@ 2025-02-05 16:58     ` Oleksii Kurochko
  2025-02-06 11:17       ` Jan Beulich
  0 siblings, 1 reply; 12+ messages in thread
From: Oleksii Kurochko @ 2025-02-05 16:58 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
	Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
	Stefano Stabellini, xen-devel

[-- Attachment #1: Type: text/plain, Size: 1009 bytes --]


On 2/4/25 2:56 PM, Jan Beulich wrote:
> On 03.02.2025 14:12, Oleksii Kurochko wrote:
>> @@ -160,6 +158,18 @@ static inline struct page_info *virt_to_page(const void *v)
>>   
>>   pte_t * pt_walk(vaddr_t va, unsigned int *pte_level);
>>   
>> +static inline mfn_t vmap_to_mfn_(vaddr_t va)
> Btw., for static functions (and variables) a prefixing underscore is
> fine to use. Its identifiers that don't have file scope which shouldn't.

Should it be used a single underscore prefixing or a double one?

>
>> +{
>> +    pte_t *entry = pt_walk(va, NULL);
> Oh, noticing the anomaly only here: Why would pt_walk() return a pointer
> to a PTE, rather than the pte_t by value? All this does is encourage
> open-coded accesses (even writes), when especially writes are supposed
> to be going through pt_update().

I tried to play with forward declaration of pte_t to not introduce
circular dependency in the previous patch. It would be really better to return
pte_t by value, I will update that.

Thanks.

~Oleksii

[-- Attachment #2: Type: text/html, Size: 1824 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 3/3] xen/riscv: update mfn calculation in pt_mapping_level()
  2025-02-04 14:57   ` Jan Beulich
@ 2025-02-05 17:55     ` Oleksii Kurochko
  0 siblings, 0 replies; 12+ messages in thread
From: Oleksii Kurochko @ 2025-02-05 17:55 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
	Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
	Stefano Stabellini, xen-devel

[-- Attachment #1: Type: text/plain, Size: 6230 bytes --]


On 2/4/25 3:57 PM, Jan Beulich wrote:
> On 03.02.2025 14:12, Oleksii Kurochko wrote:
>> --- a/xen/arch/riscv/include/asm/page.h
>> +++ b/xen/arch/riscv/include/asm/page.h
>> @@ -55,6 +55,22 @@
>>   #define PTE_SMALL       BIT(10, UL)
>>   #define PTE_POPULATE    BIT(11, UL)
>>   
>> +/*
>> + * In the case when modifying or destroying a mapping, it is necessary to
>> + * search until a leaf node is found, instead of searching for a page table
>> + * entry based on the precalculated `level` and `order` (look at pt_update()).
>> + * This is because when `mfn` == INVALID_MFN, the `mask`(in pt_mapping_level())
>> + * will take into account only `vfn`, which could accidentally return an
>> + * incorrect level, leading to the discovery of an incorrect page table entry.
>> + *
>> + * For example, if `vfn` is page table level 1 aligned, but it was mapped as
>> + * page table level 0, then pt_mapping_level() will return `level` = 1,
>> + * since only `vfn` (which is page table level 1 aligned) is taken into account
>> + * when `mfn` == INVALID_MFN (look at pt_mapping_level()).
>> + */
>> +
>> +#define PTE_LEAF_SEARCH BIT(12, UL)
> Is it intended for callers outside of pt.c to make use of this? If not,
> it better wouldn't be globally exposed.
>
> Furthermore, this isn't a property of the PTE(s) to be created, so is
> likely wrong to mix with PTE_* flags. (PTE_POPULATE is on the edge of
> also falling in this category, btw.) Perhaps ...
>
>> --- a/xen/arch/riscv/pt.c
>> +++ b/xen/arch/riscv/pt.c
>> @@ -187,11 +187,10 @@ static int pt_next_level(bool alloc_tbl, pte_t **table, unsigned int offset)
>>   
>>   /* Update an entry at the level @target. */
>>   static int pt_update_entry(mfn_t root, vaddr_t virt,
>> -                           mfn_t mfn, unsigned int target,
>> +                           mfn_t mfn, unsigned int *target,
> ... you instead want to have callers of this function preset *target
> to a special value (e.g. UINT_MAX or CONFIG_PAGING_LEVELS) indicating
> the level is wanted as an output.

I thought about this way but decided to use a separate for PTE_* flag which looked to me more clearer, at
that moment. But I didn't take into account that it will be used only inside pt.c, so I agree that it should
declared locally in pt.c and used for that a special value. I will update correspondingly.

>
>> @@ -205,39 +204,48 @@ static int pt_update_entry(mfn_t root, vaddr_t virt,
>>       bool alloc_tbl = !mfn_eq(mfn, INVALID_MFN) || (flags & PTE_POPULATE);
>>       pte_t pte, *entry;
>>   
>> -    /* convenience aliases */
>> -    DECLARE_OFFSETS(offsets, virt);
>> -
>> -    table = map_table(root);
>> -    for ( ; level > target; level-- )
>> +    if ( flags & PTE_LEAF_SEARCH )
>>       {
>> -        rc = pt_next_level(alloc_tbl, &table, offsets[level]);
>> -        if ( rc == XEN_TABLE_MAP_NOMEM )
>> +        entry = pt_walk(virt, target);
>> +        BUG_ON(!pte_is_mapping(*entry));
> Is this really necessarily a bug? Can't one want to determine how deep
> the (populated) page tables are for a given VA?

pt_walk() could be used in that way but PTE_LEAF_SEARCH won't be used for page table populating:
     if ( mfn_eq(mfn, INVALID_MFN) && !(flags & PTE_POPULATE) )
     {
         flags |= PTE_LEAF_SEARCH;
     }
so in the current version of the patch only mapped VA <-> PA is expected to be in this part of the code.


>
> Hmm, here I can see why you have pt_walk() return a pointer. As per the
> comment on the earlier patch, I don't think this is a good idea. You
> may want to have
>
> static pte_t *_pt_walk(...)
> {
>      ...
> }
>
> pte_t pt_walk(...)
> {
>      return *_pt_walk(...);
> }

It would be better to do in this way.

>
>> @@ -345,9 +353,6 @@ static int pt_mapping_level(unsigned long vfn, mfn_t mfn, unsigned long nr,
>>           return level;
>>   
>>       /*
>> -     * Don't take into account the MFN when removing mapping (i.e
>> -     * MFN_INVALID) to calculate the correct target order.
>> -     *
>>        * `vfn` and `mfn` must be both superpage aligned.
>>        * They are or-ed together and then checked against the size of
>>        * each level.
> You drop part of the comment without altering the code being commented.
> What's the deal?

These changes are the part of v1 version of this functions. Basically I did incorrect reverting. Thanks for noticing
that, I have to return this comments back.


>
>> @@ -415,19 +420,33 @@ static int pt_update(vaddr_t virt, mfn_t mfn,
>>   
>>       spin_lock(&pt_lock);
>>   
>> -    while ( left )
>> +    /* look at the comment above the definition of PTE_LEAF_SEARCH */
>> +    if ( mfn_eq(mfn, INVALID_MFN) && !(flags & PTE_POPULATE) )
>>       {
>> -        unsigned int order, level;
>> +        flags |= PTE_LEAF_SEARCH;
>> +    }
> For readability I think it would be better if the figure braces were
> dropped.
>
>> -        level = pt_mapping_level(vfn, mfn, left, flags);
>> -        order = XEN_PT_LEVEL_ORDER(level);
>> +    while ( left )
>> +    {
>> +        unsigned int order = 0, level;
>>   
>> -        ASSERT(left >= BIT(order, UL));
>> +        if ( !(flags & PTE_LEAF_SEARCH) )
>> +        {
>> +            level = pt_mapping_level(vfn, mfn, left, flags);
>> +            order = XEN_PT_LEVEL_ORDER(level);
>> +            ASSERT(left >= BIT(order, UL));
> Assignment to order and assertion are ...
>
>> +        }
>>   
>> -        rc = pt_update_entry(root, vfn << PAGE_SHIFT, mfn, level, flags);
>> +        rc = pt_update_entry(root, vfn << PAGE_SHIFT, mfn, &level, flags);
>>           if ( rc )
>>               break;
>>   
>> +        if ( flags & PTE_LEAF_SEARCH )
>> +        {
>> +            order = XEN_PT_LEVEL_ORDER(level);
>> +            ASSERT(left >= BIT(order, UL));
>> +        }
> ... repeated here, with neither left nor order being passed into
> pt_update_entry(). Does this really need doing twice? (I have to
> admit that I have trouble determining what the assertion is about.
> For order alone it clearly could be done centrally after the call.)

Sure, it could be done just once.

Regarding ASSERT() itself it was added to be sure that pt_mapping_level() returns proper `level`.
I am not really sure that it is needed anymore.

~ Oleksii

[-- Attachment #2: Type: text/html, Size: 8082 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 for 4.20? 1/3] xen/riscv: implement software page table walking
  2025-02-05 16:55     ` Oleksii Kurochko
@ 2025-02-06 11:15       ` Jan Beulich
  0 siblings, 0 replies; 12+ messages in thread
From: Jan Beulich @ 2025-02-06 11:15 UTC (permalink / raw)
  To: Oleksii Kurochko
  Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
	Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
	Stefano Stabellini, xen-devel

On 05.02.2025 17:55, Oleksii Kurochko wrote:
> On 2/4/25 2:50 PM, Jan Beulich wrote:
>> On 03.02.2025 14:12, Oleksii Kurochko wrote:
>>> --- a/xen/arch/riscv/include/asm/mm.h
>>> +++ b/xen/arch/riscv/include/asm/mm.h
>>> @@ -156,6 +156,10 @@ static inline struct page_info *virt_to_page(const void *v)
>>>       return frametable_virt_start + PFN_DOWN(va - directmap_virt_start);
>>>   }
>>>   
>>> +#include <asm/page.h>
>> asm/page.h already includes asm/mm.h, so you're introducing a circular
>> dependency here (much of which the patch description is about, so it's
>> unclear why you didn't solve this another way). Afaict ...
>>
>>> +pte_t * pt_walk(vaddr_t va, unsigned int *pte_level);
>> ... it's pte_t that presents a problem here. Why not simply put the
>> declaration in asm/page.h (and drop all the secondary changes that
>> don't really belong in this patch anyway)?
> 
> In the patch 2 it is used for implementing vmap_to_mfn():
> 
>    static inline mfn_t vmap_to_mfn_(vaddr_t va)
>    {
>        pte_t *entry = pt_walk(va, NULL);
> 
>        BUG_ON(!pte_is_mapping(*entry));
> 
>        return mfn_from_pte(*entry);
>    }
> 
>    #define vmap_to_mfn(va)     vmap_to_mfn_((vaddr_t)va)mfn_from_pte
> 
> what leads to including of <asm/page.h> in <asm/mm.h>.
> 
> As an option, if to move the following to <asm/page.h>:
>    #define vmap_to_mfn(va)     vmap_to_mfn_((vaddr_t)va)
>    #define vmap_to_page(va)    mfn_to_page(vmap_to_mfn(va))
> the circular dependency could be dropped.

I wouldn't like that, but it's an option, yes. Alternatives I can think of
right away:
- put the helper function an asm/page.h, but keep the macros where they are,
- convert the helper function to a helper macro.

Jan


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 for 4.20? 2/3] xen/riscv: update defintion of vmap_to_mfn()
  2025-02-05 16:58     ` Oleksii Kurochko
@ 2025-02-06 11:17       ` Jan Beulich
  0 siblings, 0 replies; 12+ messages in thread
From: Jan Beulich @ 2025-02-06 11:17 UTC (permalink / raw)
  To: Oleksii Kurochko
  Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
	Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
	Stefano Stabellini, xen-devel

On 05.02.2025 17:58, Oleksii Kurochko wrote:
> 
> On 2/4/25 2:56 PM, Jan Beulich wrote:
>> On 03.02.2025 14:12, Oleksii Kurochko wrote:
>>> @@ -160,6 +158,18 @@ static inline struct page_info *virt_to_page(const void *v)
>>>   
>>>   pte_t * pt_walk(vaddr_t va, unsigned int *pte_level);
>>>   
>>> +static inline mfn_t vmap_to_mfn_(vaddr_t va)
>> Btw., for static functions (and variables) a prefixing underscore is
>> fine to use. Its identifiers that don't have file scope which shouldn't.
> 
> Should it be used a single underscore prefixing or a double one?

Never use double underscores as an identifier prefix of your own. Only the
compiler (and in principle the library, if there was one) may use such.

Jan


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2025-02-06 11:17 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-02-03 13:12 [PATCH v2 for 4.20? 0/3] Fixes for vmap_to_mfn() and pt_mapping_level Oleksii Kurochko
2025-02-03 13:12 ` [PATCH v2 for 4.20? 1/3] xen/riscv: implement software page table walking Oleksii Kurochko
2025-02-04 13:50   ` Jan Beulich
2025-02-05 16:55     ` Oleksii Kurochko
2025-02-06 11:15       ` Jan Beulich
2025-02-03 13:12 ` [PATCH v2 for 4.20? 2/3] xen/riscv: update defintion of vmap_to_mfn() Oleksii Kurochko
2025-02-04 13:56   ` Jan Beulich
2025-02-05 16:58     ` Oleksii Kurochko
2025-02-06 11:17       ` Jan Beulich
2025-02-03 13:12 ` [PATCH v2 3/3] xen/riscv: update mfn calculation in pt_mapping_level() Oleksii Kurochko
2025-02-04 14:57   ` Jan Beulich
2025-02-05 17:55     ` Oleksii Kurochko

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.