* [PATCH v8 0/3] xen/riscv: introduce p2m functionality
@ 2025-12-22 16:37 Oleksii Kurochko
2025-12-22 16:37 ` [PATCH v8 1/3] xen/riscv: add support of page lookup by GFN Oleksii Kurochko
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Oleksii Kurochko @ 2025-12-22 16:37 UTC (permalink / raw)
To: xen-devel
Cc: Oleksii Kurochko, Alistair Francis, Bob Eshleman, Connor Davis,
Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
Julien Grall, Roger Pau Monné, Stefano Stabellini
In this patch series are introduced necessary functions to build and manage
RISC-V guest page tables and MMIO/RAM mappings.
CI tests:
https://gitlab.com/xen-project/people/olkur/xen/-/pipelines/2224311393
---
Changes in V8:
- All patches (except last three ones) are merged to staging.
- Addressed comments for v7.
---
Changes in V7:
- Merged to staging:
- xen/riscv: avoid redundant HGATP*_MODE_SHIFT and HGATP*_VMID_SHIFT
- Introduce new patch:
- xen/riscv: update p2m_set_entry to free unused metadata page
(could be merged with previous one: xen/riscv: introduce metadata table to
store P2M type )
- Addressed comments for v6.
---
Changes in V6:
- Addressed coment for v5.
---
Changes in V5:
- Addressed comments for v4.
---
Changes in V4:
- Merged to staging:
- xen/riscv: introduce sbi_remote_hfence_gvma()
- xen/riscv: introduce sbi_remote_hfence_gvma_vmid()
- Drop "xen/riscv: introduce page_{get,set}_xenheap_gfn()" as grant tables aren't going to be introduced for the moment. Also, drops other parts connected to grant tables support.
- All other changes are patch specific.
---
Changes in V3:
- Introduce metadata table to store P2M types.
- Use x86's way to allocate VMID.
- Abstract Arm-specific p2m type name for device MMIO mappings.
- All other updates please look at specific patch.
---
Changes in V2:
- Merged to staging:
- [PATCH v1 1/6] xen/riscv: add inclusion of xen/bitops.h to asm/cmpxchg.h
- New patches:
- xen/riscv: implement sbi_remote_hfence_gvma{_vmid}().
- Split patch "xen/riscv: implement p2m mapping functionality" into smaller
one patches:
- xen/riscv: introduce page_set_xenheap_gfn()
- xen/riscv: implement guest_physmap_add_entry() for mapping GFNs to MFNs
- xen/riscv: implement p2m_set_entry() and __p2m_set_entry()
- xen/riscv: Implement p2m_free_entry() and related helpers
- xen/riscv: Implement superpage splitting for p2m mappings
- xen/riscv: implement p2m_next_level()
- xen/riscv: Implement p2m_entry_from_mfn() and support PBMT configuration
- Move root p2m table allocation to separate patch:
xen/riscv: add root page table allocation
- Drop dependency of this patch series from the patch witn an introduction of
SvPBMT as it was merged.
- Patch "[PATCH v1 4/6] xen/riscv: define pt_t and pt_walk_t structures" was
renamed to xen/riscv: introduce pte_{set,get}_mfn() as after dropping of
bitfields for PTE structure, this patch introduce only pte_{set,get}_mfn().
- Rename "xen/riscv: define pt_t and pt_walk_t structures" to
"xen/riscv: introduce pte_{set,get}_mfn()" as pt_t and pt_walk_t were
dropped.
- Introduce guest domain's VMID allocation and manegement.
- Add patches necessary to implement p2m lookup:
- xen/riscv: implement mfn_valid() and page reference, ownership handling helpers
- xen/riscv: add support of page lookup by GFN
- Re-sort patch series.
- All other changes are patch-specific. Please check them.
---
Oleksii Kurochko (3):
xen/riscv: add support of page lookup by GFN
xen/riscv: introduce metadata table to store P2M type
xen/riscv: update p2m_set_entry() to free unused metadata pages
xen/arch/riscv/include/asm/flushtlb.h | 2 +-
xen/arch/riscv/include/asm/mm.h | 21 ++
xen/arch/riscv/include/asm/p2m.h | 21 ++
xen/arch/riscv/mm.c | 13 +
xen/arch/riscv/p2m.c | 437 ++++++++++++++++++++++++--
5 files changed, 464 insertions(+), 30 deletions(-)
--
2.52.0
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v8 1/3] xen/riscv: add support of page lookup by GFN
2025-12-22 16:37 [PATCH v8 0/3] xen/riscv: introduce p2m functionality Oleksii Kurochko
@ 2025-12-22 16:37 ` Oleksii Kurochko
2025-12-29 15:06 ` Jan Beulich
2025-12-22 16:37 ` [PATCH v8 2/3] xen/riscv: introduce metadata table to store P2M type Oleksii Kurochko
2025-12-22 16:37 ` [PATCH v8 3/3] xen/riscv: update p2m_set_entry() to free unused metadata pages Oleksii Kurochko
2 siblings, 1 reply; 10+ messages in thread
From: Oleksii Kurochko @ 2025-12-22 16:37 UTC (permalink / raw)
To: xen-devel
Cc: Oleksii Kurochko, Alistair Francis, Bob Eshleman, Connor Davis,
Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
Julien Grall, Roger Pau Monné, Stefano Stabellini
Introduce helper functions for safely querying the P2M (physical-to-machine)
mapping:
- add p2m_read_lock(), p2m_read_unlock(), and p2m_is_locked() for managing
P2M lock state.
- Implement p2m_get_entry() to retrieve mapping details for a given GFN,
including MFN, page order, and validity.
- Introduce p2m_get_page_from_gfn() to convert a GFN into a page_info
pointer, acquiring a reference to the page if valid.
- Introduce get_page().
Implementations are based on Arm's functions with some minor modifications:
- p2m_get_entry():
- Reverse traversal of page tables, as RISC-V uses the opposite level
numbering compared to Arm.
- Removed the return of p2m_access_t from p2m_get_entry() since
mem_access_settings is not introduced for RISC-V.
- Updated BUILD_BUG_ON() to check using the level 0 mask, which corresponds
to Arm's THIRD_MASK.
- Replaced open-coded bit shifts with the BIT() macro.
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
Changes in V8:
- Drop the local variable masked_gfn inside check_outside_boundary() and fold
the is_lower conditionals into the for loop.
- Initialize the local variable level in p2m_get_entry() to the root level
and drop the explicit assignment when root page table wasn't found, as it
now defaults to the root level.
- Introduce gfn_limit_bits and use it to calculate the maximum GFN for the
MMU second stage, and return the appropriate page_order when the GFN exceeds
this limit.
---
Changes in V7:
- Refactor check_outside_boundary().
- Reword the comment above p2m_get_entry().
- As at the moment p2m_get_entry() doesn't pass `t` as NULL we could drop
"if ( t )" checks inside it to not have dead code now.
- Add the check inside p2m_get_entry() that requested gfn is correct.
- Add "if ( t )" check inside p2m_get_page_from_gfn() as it
is going to be some callers with t = NULL.
---
Changes in V6:
- Move if-condition with initialization up in p2m_get_page_from_gfn().
- Pass p2mt to the call of p2m_get_entry() inside p2m_get_page_from_gfn()
to avoid an issue when 't' is passed NULL. With p2mt passed to p2m_get_entry()
we will recieve a proper type and so the rest of the function will able to
continue use a proper type.
- In check_outside_boundary() in the case when is_lower == true fill the bottom
bits of masked_gfn with all 1s.
- Update code of check_outside_boundary() to return proper level in the case when
`level` is equal to 0.
- Add ASSERT(p2m) in check_outside_boundary() to be sure that p2m isn't NULL as
P2M_LEVEL_MASK() depends on p2m value.
---
Changes in V5:
- Use introduced in earlier patches P2M_DECLARE_OFFSETS() instead of
DECLARE_OFFSETS().
- Drop blank line before check_outside_boundary().
- Use more readable version of if statements inside check_outside_boundary().
- Accumulate mask in check_outside_boundary() instead of re-writing it for
each page table level to have correct gfns for comparison.
- Set argument `t` of p2m_get_entry() to p2m_invalid by default.
- Drop checking of (rc == P2M_TABLE_MAP_NOMEM ) when p2m_next_level(...,false,...)
is called.
- Add ASSERT(mfn & (BIT(P2M_LEVEL_ORDER(level), UL) - 1)); in p2m_get_entry()
to be sure that recieved `mfn` has cleared lowest bits.
- Drop `valid` argument from p2m_get_entry(), it is not needed anymore.
- Drop p2m_lookup(), use p2m_get_entry() explicitly inside p2m_get_page_from_gfn().
- Update the commit message.
---
Changes in V4:
- Update prototype of p2m_is_locked() to return bool and accept pointer-to-const.
- Correct the comment above p2m_get_entry().
- Drop the check "BUILD_BUG_ON(XEN_PT_LEVEL_MAP_MASK(0) != PAGE_MASK);" inside
p2m_get_entry() as it is stale and it was needed to sure that 4k page(s) are
used on L3 (in Arm terms) what is true for RISC-V. (if not special extension
are used). It was another reason for Arm to have it (and I copied it to RISC-V),
but it isn't true for RISC-V. (some details could be found in response to the
patch).
- Style fixes.
- Add explanatory comment what the loop inside "gfn is higher then the highest
p2m mapping" does. Move this loop to separate function check_outside_boundary()
to cover both boundaries (lower_mapped_gfn and max_mapped_gfn).
- There is not need to allocate a page table as it is expected that
p2m_get_entry() normally would be called after a corresponding p2m_set_entry()
was called. So change 'true' to 'false' in a page table walking loop inside
p2m_get_entry().
- Correct handling of p2m_is_foreign case inside p2m_get_page_from_gfn().
- Introduce and use P2M_LEVEL_MASK instead of XEN_PT_LEVEL_MASK as it isn't take
into account two extra bits for root table in case of P2M.
- Drop stale item from "change in v3" - Add is_p2m_foreign() macro and connected stuff.
- Add p2m_read_(un)lock().
---
Changes in V3:
- Change struct domain *d argument of p2m_get_page_from_gfn() to
struct p2m_domain.
- Update the comment above p2m_get_entry().
- s/_t/p2mt for local variable in p2m_get_entry().
- Drop local variable addr in p2m_get_entry() and use gfn_to_gaddr(gfn)
to define offsets array.
- Code style fixes.
- Update a check of rc code from p2m_next_level() in p2m_get_entry()
and drop "else" case.
- Do not call p2m_get_type() if p2m_get_entry()'s t argument is NULL.
- Use struct p2m_domain instead of struct domain for p2m_lookup() and
p2m_get_page_from_gfn().
- Move defintion of get_page() from "xen/riscv: implement mfn_valid() and page reference, ownership handling helpers"
---
Changes in V2:
- New patch.
---
xen/arch/riscv/include/asm/p2m.h | 21 ++++
xen/arch/riscv/mm.c | 13 +++
xen/arch/riscv/p2m.c | 185 +++++++++++++++++++++++++++++++
3 files changed, 219 insertions(+)
diff --git a/xen/arch/riscv/include/asm/p2m.h b/xen/arch/riscv/include/asm/p2m.h
index b48693a2b41c..f63b5dec99b1 100644
--- a/xen/arch/riscv/include/asm/p2m.h
+++ b/xen/arch/riscv/include/asm/p2m.h
@@ -41,6 +41,9 @@
#define P2M_GFN_LEVEL_SHIFT(lvl) (P2M_LEVEL_ORDER(lvl) + PAGE_SHIFT)
+#define P2M_LEVEL_MASK(p2m, lvl) \
+ (P2M_TABLE_OFFSET(p2m, lvl) << P2M_GFN_LEVEL_SHIFT(lvl))
+
#define paddr_bits PADDR_BITS
/* Get host p2m table */
@@ -234,6 +237,24 @@ static inline bool p2m_is_write_locked(struct p2m_domain *p2m)
unsigned long construct_hgatp(const struct p2m_domain *p2m, uint16_t vmid);
+static inline void p2m_read_lock(struct p2m_domain *p2m)
+{
+ read_lock(&p2m->lock);
+}
+
+static inline void p2m_read_unlock(struct p2m_domain *p2m)
+{
+ read_unlock(&p2m->lock);
+}
+
+static inline bool p2m_is_locked(const struct p2m_domain *p2m)
+{
+ return rw_is_locked(&p2m->lock);
+}
+
+struct page_info *p2m_get_page_from_gfn(struct p2m_domain *p2m, gfn_t gfn,
+ p2m_type_t *t);
+
#endif /* ASM__RISCV__P2M_H */
/*
diff --git a/xen/arch/riscv/mm.c b/xen/arch/riscv/mm.c
index e25f995b727f..e9ce182d066c 100644
--- a/xen/arch/riscv/mm.c
+++ b/xen/arch/riscv/mm.c
@@ -673,3 +673,16 @@ struct domain *page_get_owner_and_reference(struct page_info *page)
return owner;
}
+
+bool get_page(struct page_info *page, const struct domain *domain)
+{
+ const struct domain *owner = page_get_owner_and_reference(page);
+
+ if ( likely(owner == domain) )
+ return true;
+
+ if ( owner != NULL )
+ put_page(page);
+
+ return false;
+}
diff --git a/xen/arch/riscv/p2m.c b/xen/arch/riscv/p2m.c
index 8d572f838fc3..66943b969e8a 100644
--- a/xen/arch/riscv/p2m.c
+++ b/xen/arch/riscv/p2m.c
@@ -1057,3 +1057,188 @@ int map_regions_p2mt(struct domain *d,
return rc;
}
+
+/*
+ * p2m_get_entry() should always return the correct order value, even if an
+ * entry is not present (i.e. the GFN is outside the range):
+ * [p2m->lowest_mapped_gfn, p2m->max_mapped_gfn] (1)
+ *
+ * This ensures that callers of p2m_get_entry() can determine what range of
+ * address space would be altered by a corresponding p2m_set_entry().
+ * Also, it would help to avoid costly page walks for GFNs outside range (1).
+ *
+ * Therefore, this function returns true for GFNs outside range (1), and in
+ * that case the corresponding level is returned via the level_out argument.
+ * Otherwise, it returns false and p2m_get_entry() performs a page walk to
+ * find the proper entry.
+ */
+static bool check_outside_boundary(const struct p2m_domain *p2m, gfn_t gfn,
+ gfn_t boundary, bool is_lower,
+ unsigned int *level_out)
+{
+ unsigned int level = P2M_ROOT_LEVEL(p2m);
+ bool ret = false;
+
+ ASSERT(p2m);
+
+ if ( is_lower ? gfn_x(gfn) < gfn_x(boundary)
+ : gfn_x(gfn) > gfn_x(boundary) )
+ {
+ for ( ; level; level-- )
+ {
+ unsigned long mask = BIT(P2M_GFN_LEVEL_SHIFT(level), UL) - 1;
+
+ if ( is_lower ? (gfn_x(gfn) | mask) < gfn_x(boundary)
+ : (gfn_x(gfn) & ~mask) > gfn_x(boundary) )
+ break;
+ }
+
+ ret = true;
+ }
+
+ if ( level_out )
+ *level_out = level;
+
+ return ret;
+}
+
+/*
+ * Get the details of a given gfn.
+ *
+ * If the entry is present, the associated MFN, the p2m type of the mapping,
+ * and the page order of the mapping in the page table (i.e., it could be a
+ * superpage) will be returned.
+ *
+ * If the entry is not present, INVALID_MFN will be returned, page_order will
+ * be set according to the order of the invalid range, and the type will be
+ * p2m_invalid.
+ */
+static mfn_t p2m_get_entry(struct p2m_domain *p2m, gfn_t gfn,
+ p2m_type_t *t,
+ unsigned int *page_order)
+{
+ unsigned int level = P2M_ROOT_LEVEL(p2m);
+ unsigned int gfn_limit_bits =
+ P2M_LEVEL_ORDER(level + 1) + P2M_ROOT_EXTRA_BITS(p2m, level);
+ pte_t entry, *table;
+ int rc;
+ mfn_t mfn = INVALID_MFN;
+
+ P2M_BUILD_LEVEL_OFFSETS(p2m, offsets, gfn_to_gaddr(gfn));
+
+ ASSERT(p2m_is_locked(p2m));
+
+ *t = p2m_invalid;
+
+ if ( gfn_x(gfn) > (BIT(gfn_limit_bits, UL) - 1) )
+ {
+ if ( page_order )
+ *page_order = gfn_limit_bits;
+
+ return mfn;
+ }
+
+ if ( check_outside_boundary(p2m, gfn, p2m->lowest_mapped_gfn, true,
+ &level) )
+ goto out;
+
+ if ( check_outside_boundary(p2m, gfn, p2m->max_mapped_gfn, false, &level) )
+ goto out;
+
+ table = p2m_get_root_pointer(p2m, gfn);
+
+ /*
+ * The table should always be non-NULL because the gfn is below
+ * p2m->max_mapped_gfn and the root table pages are always present.
+ */
+ if ( !table )
+ {
+ ASSERT_UNREACHABLE();
+ goto out;
+ }
+
+ for ( level = P2M_ROOT_LEVEL(p2m); level; level-- )
+ {
+ rc = p2m_next_level(p2m, false, level, &table, offsets[level]);
+ if ( rc == P2M_TABLE_MAP_NONE )
+ goto out_unmap;
+
+ if ( rc != P2M_TABLE_NORMAL )
+ break;
+ }
+
+ entry = table[offsets[level]];
+
+ if ( pte_is_valid(entry) )
+ {
+ *t = p2m_get_type(entry);
+
+ mfn = pte_get_mfn(entry);
+
+ ASSERT(!(mfn_x(mfn) & (BIT(P2M_LEVEL_ORDER(level), UL) - 1)));
+
+ /*
+ * The entry may point to a superpage. Find the MFN associated
+ * to the GFN.
+ */
+ mfn = mfn_add(mfn,
+ gfn_x(gfn) & (BIT(P2M_LEVEL_ORDER(level), UL) - 1));
+ }
+
+ out_unmap:
+ unmap_domain_page(table);
+
+ out:
+ if ( page_order )
+ *page_order = P2M_LEVEL_ORDER(level);
+
+ return mfn;
+}
+
+struct page_info *p2m_get_page_from_gfn(struct p2m_domain *p2m, gfn_t gfn,
+ p2m_type_t *t)
+{
+ struct page_info *page;
+ p2m_type_t p2mt;
+ mfn_t mfn;
+
+ p2m_read_lock(p2m);
+ mfn = p2m_get_entry(p2m, gfn, &p2mt, NULL);
+
+ if ( t )
+ *t = p2mt;
+
+ if ( !mfn_valid(mfn) )
+ {
+ p2m_read_unlock(p2m);
+ return NULL;
+ }
+
+ page = mfn_to_page(mfn);
+
+ /*
+ * get_page won't work on foreign mapping because the page doesn't
+ * belong to the current domain.
+ */
+ if ( unlikely(p2m_is_foreign(p2mt)) )
+ {
+ const struct domain *fdom = page_get_owner_and_reference(page);
+
+ p2m_read_unlock(p2m);
+
+ if ( fdom )
+ {
+ if ( likely(fdom != p2m->domain) )
+ return page;
+
+ ASSERT_UNREACHABLE();
+ put_page(page);
+ }
+
+ return NULL;
+ }
+
+ p2m_read_unlock(p2m);
+
+ return get_page(page, p2m->domain) ? page : NULL;
+}
--
2.52.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH v8 2/3] xen/riscv: introduce metadata table to store P2M type
2025-12-22 16:37 [PATCH v8 0/3] xen/riscv: introduce p2m functionality Oleksii Kurochko
2025-12-22 16:37 ` [PATCH v8 1/3] xen/riscv: add support of page lookup by GFN Oleksii Kurochko
@ 2025-12-22 16:37 ` Oleksii Kurochko
2025-12-29 15:13 ` Jan Beulich
2025-12-22 16:37 ` [PATCH v8 3/3] xen/riscv: update p2m_set_entry() to free unused metadata pages Oleksii Kurochko
2 siblings, 1 reply; 10+ messages in thread
From: Oleksii Kurochko @ 2025-12-22 16:37 UTC (permalink / raw)
To: xen-devel
Cc: Oleksii Kurochko, Alistair Francis, Bob Eshleman, Connor Davis,
Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
Julien Grall, Roger Pau Monné, Stefano Stabellini
RISC-V's PTE has only two available bits that can be used to store the P2M
type. This is insufficient to represent all the current RISC-V P2M types.
Therefore, some P2M types must be stored outside the PTE bits.
To address this, a metadata table is introduced to store P2M types that
cannot fit in the PTE itself. Not all P2M types are stored in the
metadata table—only those that require it.
The metadata table is linked to the intermediate page table via the
`struct page_info`'s v.md.metadata field of the corresponding intermediate
page.
Such pages are allocated with MEMF_no_owner, which allows us to use
the v field for the purpose of storing the metadata table.
To simplify the allocation and linking of intermediate and metadata page
tables, `p2m_{alloc,free}_table()` functions are implemented.
These changes impact `p2m_split_superpage()`, since when a superpage is
split, it is necessary to update the metadata table of the new
intermediate page table — if the entry being split has its P2M type set
to `p2m_ext_storage` in its `P2M_TYPES` bits. In addition to updating
the metadata of the new intermediate page table, the corresponding entry
in the metadata for the original superpage is invalidated.
Also, update p2m_{get,set}_type to work with P2M types which don't fit
into PTE bits.
Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
Changes in V8:
- Update the comment above p2m_set_type().
- Drop BUG_ON(ctx->level ...) and
"if ( ctx->level <= P2M_MAX_SUPPORTED_LEVEL_MAPPING )" as p2m_set_type()
doesn't care about ctx->level and it is expected that passed `pte` is valid,
and so ctx->level is expected to be valid too.
- Rename p2m_pte_ctx argument to ctx for p2m_pte_from_mfn() and p2m_free_subtree().
- Initialize local variable p2m_pte_ctx inside p2m_split_superpage() with
an initializer. Drop an assigment of p2m_pte_ctx->level when old pte's type is
got.
- Use initializer for tmp_ctx and drop an assignment of tmp_ctx.p2m inside
p2m_set_type().
- Drop brackets around p2m_free_subtree() call inside p2m_set_entry().
---
Changes in V7:
- Put p2m_domain * inside struct p2m_pte_ctx and update an APIs of
p2m_set_type(), p2m_pte_from_mfn().
Also, move ASSERT(p2m) closer to p2m_alloc_page(ctx->p2m) inside
p2m_set_type().
Update all callers of p2m_set_type() and p2m_pte_from_mfn().
- Update the comment above BUILD_BUG_ON(p2m_invalid): drop unnessary
sentenses and make it shorter then 80 chars.
- Drop the comment and BUILD_BUG_ON() in p2m_get_type() as it is
enough to have it in p2m_set_type().
- Update the comment above p2m_set_type() about p2m argument which
was droppped.
- Make ctx argument of p2m_set_type() const to be able to re-use
p2m_pte_ctx across multiple iterations without fully reinitializing.
- Declare "struct p2m_pte_ctx tmp_ctx;" as function scope variable and
rework p2m_set_entry() correspondingly.
---
Changes in V6:
- Introduce new type md_t to use it instead of pte_t to store metadata
types outside PTE bits.
- Integrate introduced struct md_t.
- Drop local variable "struct domain *d" inside p2m_set_type().
- Drop __func__ printting and use %pv.
- Code style fixes
- Drop unnessarry check inside if-condition in p2m_pte_from_mfn()
as we have ASSERT(p2m) inside p2m_set_type() anyway.
- Return back the commnent inside page_to_p2m_table() as it was deleted
accidently.
- move the initialization of
p2m_pte_ctx.pt_page and p2m_pte_ctx.level ahead of the loop
- Add BUILD_BUG_ON(p2m_invalid) before the call of p2m_alloc_page() in p2m_set_type() and in p2m_get_type() before " if ( type == p2m_ext_storage )".
- Set to NULL tbl_pg->v.md.pg in p2m_free_table().
- Make argument 't' of p2m_set_type() non-const as we are going to change it.
- Add some explanatory comments.
- Update ASSERT at the start of p2m_set_type() to verify that
passed ctx->index is lesser then 512 and drop calculation of
an index of root page as it is guaranteed by calc_offset()
and get_root_pointer() that we will aready get proper page and
proper index inside this page.
---
Changes in V5:
- Rename metadata member of stuct md inside struct page_info to pg.
- Stray blank in the declaration of p2m_alloc_table().
- Use "<" instead of "<=" in ASSERT() in p2m_set_type().
- Move the check that ctx is provided to an earlier point in
p2m_set_type().
- Set `md_pg` after ASSERT() in p2m_set_type().
- Add BUG_ON() insetead of ASSERT_UNREACHABLE() in p2m_set_type().
- Drop a check that metadata isn't NULL before unmap_domain_page() is
being called.
- Make const `md` variable in p2m_get_type().
- unmap correct domain's page in p2m_get_type: use `md` instead of
ctx->pt_page->v.md.pg.
- Add description of how p2m and p2m_pte_ctx is expected to be used
in p2m_pte_from_mfn() and drop a comment from page_to_p2m_table().
- Drop the stale part of the comment above p2m_alloc_table().
- Drop ASSERT(tbl_pg->v.md.pg) from p2m_free_table() as tbl_pg->v.md.pg
is created conditionally now.
- Drop an introduction of p2m_alloc_table(), update p2m_alloc_page()
correspondengly and use it instead.
- Add missing blank in definition of level member for tmp_ctx variable
in p2m_free_subtree(). Also, add the comma at the end.
- Initialize old_type once before for-loop in p2m_split_superpage() as
old type will be used for all newly created PTEs.
- Properly initialize p2m_pte_ctx.level with next_level instead of
level when p2m_set_type() is going to be called for new PTEs.
- Fix identations.
- Move ASSERT(p2m) on top of p2m_set_type() to be sure that NULL isn't
passed for p2m argument of p2m_set_type().
- s/virt_to_page(table)/mfn_to_page(domain_page_map_to_mfn(table))
to recieve correct page for a table which is mapped by domain_page_map().
- Add "return;" after domain_crash() in p2m_set_type() to avoid potential
NULL pointer dereference of md_pg.
---
Changes in V4:
- Add Suggested-by: Jan Beulich <jbeulich@suse.com>.
- Update the comment above declation of md structure inside struct page_info to:
"Page is used as an intermediate P2M page table".
- Allocate metadata table on demand to save some memory. (1)
- Rework p2m_set_type():
- Add allocatation of metadata page only if needed.
- Move a check what kind of type we are handling inside p2m_set_type().
- Move mapping of metadata page inside p2m_get_type() as it is needed only
in case if PTE's type is equal to p2m_ext_storage.
- Add some description to p2m_get_type() function.
- Drop blank after return type of p2m_alloc_table().
- Drop allocation of metadata page inside p2m_alloc_table becaues of (1).
- Fix p2m_free_table() to free metadata page only if it was allocated.
---
Changes in V3:
- Add is_p2m_foreign() macro and connected stuff.
- Change struct domain *d argument of p2m_get_page_from_gfn() to
struct p2m_domain.
- Update the comment above p2m_get_entry().
- s/_t/p2mt for local variable in p2m_get_entry().
- Drop local variable addr in p2m_get_entry() and use gfn_to_gaddr(gfn)
to define offsets array.
- Code style fixes.
- Update a check of rc code from p2m_next_level() in p2m_get_entry()
and drop "else" case.
- Do not call p2m_get_type() if p2m_get_entry()'s t argument is NULL.
- Use struct p2m_domain instead of struct domain for p2m_lookup() and
p2m_get_page_from_gfn().
- Move defintion of get_page() from "xen/riscv: implement mfn_valid() and page reference, ownership handling helpers"
---
Changes in V2:
- New patch.
---
xen/arch/riscv/include/asm/mm.h | 9 ++
xen/arch/riscv/p2m.c | 236 ++++++++++++++++++++++++++++----
2 files changed, 215 insertions(+), 30 deletions(-)
diff --git a/xen/arch/riscv/include/asm/mm.h b/xen/arch/riscv/include/asm/mm.h
index 1a99e1cf0a3c..48162f5d65cd 100644
--- a/xen/arch/riscv/include/asm/mm.h
+++ b/xen/arch/riscv/include/asm/mm.h
@@ -149,6 +149,15 @@ struct page_info
/* Order-size of the free chunk this page is the head of. */
unsigned int order;
} free;
+
+ /* Page is used as an intermediate P2M page table */
+ struct {
+ /*
+ * Pointer to a page which store metadata for an intermediate page
+ * table.
+ */
+ struct page_info *pg;
+ } md;
} v;
union {
diff --git a/xen/arch/riscv/p2m.c b/xen/arch/riscv/p2m.c
index 66943b969e8a..24dd07165bd1 100644
--- a/xen/arch/riscv/p2m.c
+++ b/xen/arch/riscv/p2m.c
@@ -26,6 +26,25 @@
*/
#define P2M_MAX_SUPPORTED_LEVEL_MAPPING _AC(2, U)
+struct md_t {
+ /*
+ * Describes a type stored outside PTE bits.
+ * Look at the comment above definition of enum p2m_type_t.
+ */
+ p2m_type_t type : 4;
+};
+
+/*
+ * P2M PTE context is used only when a PTE's P2M type is p2m_ext_storage.
+ * In this case, the P2M type is stored separately in the metadata page.
+ */
+struct p2m_pte_ctx {
+ struct p2m_domain *p2m;
+ struct page_info *pt_page; /* Page table page containing the PTE. */
+ unsigned int index; /* Index of the PTE within that page. */
+ unsigned int level; /* Paging level at which the PTE resides. */
+};
+
static struct gstage_mode_desc __ro_after_init max_gstage_mode = {
.mode = HGATP_MODE_OFF,
.paging_levels = 0,
@@ -37,6 +56,10 @@ unsigned char get_max_supported_mode(void)
return max_gstage_mode.mode;
}
+/*
+ * If anything is changed here, it may also require updates to
+ * p2m_{get,set}_type().
+ */
static inline unsigned int calc_offset(const struct p2m_domain *p2m,
const unsigned int lvl,
const paddr_t gpa)
@@ -79,6 +102,9 @@ static inline unsigned int calc_offset(const struct p2m_domain *p2m,
* The caller is responsible for unmapping the page after use.
*
* Returns NULL if the calculated offset into the root table is invalid.
+ *
+ * If anything is changed here, it may also require updates to
+ * p2m_{get,set}_type().
*/
static pte_t *p2m_get_root_pointer(struct p2m_domain *p2m, gfn_t gfn)
{
@@ -370,24 +396,96 @@ static struct page_info *p2m_alloc_page(struct p2m_domain *p2m)
return pg;
}
-static int p2m_set_type(pte_t *pte, p2m_type_t t)
+/*
+ * `pte` – PTE entry for which the type `t` will be stored.
+ *
+ * If `t` >= p2m_first_external, a valid `ctx` must be provided.
+ */
+static void p2m_set_type(pte_t *pte, p2m_type_t t,
+ const struct p2m_pte_ctx *ctx)
{
- int rc = 0;
+ struct page_info **md_pg;
+ struct md_t *metadata = NULL;
- if ( t > p2m_first_external )
- panic("unimplemeted\n");
- else
- pte->pte |= MASK_INSR(t, P2M_TYPE_PTE_BITS_MASK);
+ /*
+ * It is sufficient to compare ctx->index with PAGETABLE_ENTRIES because,
+ * even for the p2m root page table (which is a 16 KB page allocated as
+ * four 4 KB pages), calc_offset() guarantees that the page-table index
+ * will always fall within the range [0, 511].
+ */
+ ASSERT(ctx && ctx->index < PAGETABLE_ENTRIES);
- return rc;
+ /*
+ * At the moment, p2m_get_root_pointer() returns one of four possible p2m
+ * root pages, so there is no need to search for the correct ->pt_page
+ * here.
+ * Non-root page tables are 4 KB pages, so simply using ->pt_page is
+ * sufficient.
+ */
+ md_pg = &ctx->pt_page->v.md.pg;
+
+ if ( !*md_pg && (t >= p2m_first_external) )
+ {
+ /*
+ * Since p2m_alloc_page() initializes an allocated page with
+ * zeros, p2m_invalid is expected to have the value 0 as well.
+ */
+ BUILD_BUG_ON(p2m_invalid);
+
+ ASSERT(ctx->p2m);
+
+ *md_pg = p2m_alloc_page(ctx->p2m);
+ if ( !*md_pg )
+ {
+ printk("%pd: can't allocate metadata page\n",
+ ctx->p2m->domain);
+ domain_crash(ctx->p2m->domain);
+
+ return;
+ }
+ }
+
+ if ( *md_pg )
+ metadata = __map_domain_page(*md_pg);
+
+ if ( t >= p2m_first_external )
+ {
+ metadata[ctx->index].type = t;
+
+ t = p2m_ext_storage;
+ }
+ else if ( metadata )
+ metadata[ctx->index].type = p2m_invalid;
+
+ pte->pte |= MASK_INSR(t, P2M_TYPE_PTE_BITS_MASK);
+
+ unmap_domain_page(metadata);
}
-static p2m_type_t p2m_get_type(const pte_t pte)
+/*
+ * `pte` -> PTE entry that stores the PTE's type.
+ *
+ * If the PTE's type is `p2m_ext_storage`, `ctx` should be provided;
+ * otherwise it could be NULL.
+ */
+static p2m_type_t p2m_get_type(const pte_t pte, const struct p2m_pte_ctx *ctx)
{
p2m_type_t type = MASK_EXTR(pte.pte, P2M_TYPE_PTE_BITS_MASK);
if ( type == p2m_ext_storage )
- panic("unimplemented\n");
+ {
+ const struct md_t *md = __map_domain_page(ctx->pt_page->v.md.pg);
+
+ type = md[ctx->index].type;
+
+ /*
+ * Since p2m_set_type() guarantees that the type will be greater than
+ * p2m_first_external, just check that we received a valid type here.
+ */
+ ASSERT(type > p2m_first_external);
+
+ unmap_domain_page(md);
+ }
return type;
}
@@ -477,7 +575,14 @@ static void p2m_set_permission(pte_t *e, p2m_type_t t)
}
}
-static pte_t p2m_pte_from_mfn(mfn_t mfn, p2m_type_t t, bool is_table)
+/*
+ * If p2m_pte_from_mfn() is called with ctx = NULL,
+ * it means the function is working with a page table for which the `t`
+ * should not be applicable. Otherwise, the function is handling a leaf PTE
+ * for which `t` is applicable.
+ */
+static pte_t p2m_pte_from_mfn(mfn_t mfn, p2m_type_t t,
+ struct p2m_pte_ctx *ctx)
{
pte_t e = (pte_t) { PTE_VALID };
@@ -485,7 +590,7 @@ static pte_t p2m_pte_from_mfn(mfn_t mfn, p2m_type_t t, bool is_table)
ASSERT(!(mfn_to_maddr(mfn) & ~PADDR_MASK) || mfn_eq(mfn, INVALID_MFN));
- if ( !is_table )
+ if ( ctx )
{
switch ( t )
{
@@ -498,7 +603,7 @@ static pte_t p2m_pte_from_mfn(mfn_t mfn, p2m_type_t t, bool is_table)
}
p2m_set_permission(&e, t);
- p2m_set_type(&e, t);
+ p2m_set_type(&e, t, ctx);
}
else
/*
@@ -518,7 +623,22 @@ static pte_t page_to_p2m_table(const struct page_info *page)
* set to true and p2m_type_t shouldn't be applied for PTEs which
* describe an intermediate table.
*/
- return p2m_pte_from_mfn(page_to_mfn(page), p2m_invalid, true);
+ return p2m_pte_from_mfn(page_to_mfn(page), p2m_invalid, NULL);
+}
+
+static void p2m_free_page(struct p2m_domain *p2m, struct page_info *pg);
+
+/*
+ * Free page table's page and metadata page linked to page table's page.
+ */
+static void p2m_free_table(struct p2m_domain *p2m, struct page_info *tbl_pg)
+{
+ if ( tbl_pg->v.md.pg )
+ {
+ p2m_free_page(p2m, tbl_pg->v.md.pg);
+ tbl_pg->v.md.pg = NULL;
+ }
+ p2m_free_page(p2m, tbl_pg);
}
/* Allocate a new page table page and hook it in via the given entry. */
@@ -679,12 +799,14 @@ static void p2m_free_page(struct p2m_domain *p2m, struct page_info *pg)
/* Free pte sub-tree behind an entry */
static void p2m_free_subtree(struct p2m_domain *p2m,
- pte_t entry, unsigned int level)
+ pte_t entry,
+ const struct p2m_pte_ctx *ctx)
{
unsigned int i;
pte_t *table;
mfn_t mfn;
struct page_info *pg;
+ unsigned int level = ctx->level;
/*
* Check if the level is valid: only 4K - 2M - 1G mappings are supported.
@@ -700,7 +822,7 @@ static void p2m_free_subtree(struct p2m_domain *p2m,
if ( pte_is_mapping(entry) )
{
- p2m_type_t p2mt = p2m_get_type(entry);
+ p2m_type_t p2mt = p2m_get_type(entry, ctx);
#ifdef CONFIG_IOREQ_SERVER
/*
@@ -719,10 +841,22 @@ static void p2m_free_subtree(struct p2m_domain *p2m,
return;
}
- table = map_domain_page(pte_get_mfn(entry));
+ mfn = pte_get_mfn(entry);
+ ASSERT(mfn_valid(mfn));
+ table = map_domain_page(mfn);
+ pg = mfn_to_page(mfn);
for ( i = 0; i < P2M_PAGETABLE_ENTRIES(p2m, level); i++ )
- p2m_free_subtree(p2m, table[i], level - 1);
+ {
+ struct p2m_pte_ctx tmp_ctx = {
+ .pt_page = pg,
+ .index = i,
+ .level = level - 1,
+ .p2m = p2m,
+ };
+
+ p2m_free_subtree(p2m, table[i], &tmp_ctx);
+ }
unmap_domain_page(table);
@@ -734,17 +868,13 @@ static void p2m_free_subtree(struct p2m_domain *p2m,
*/
p2m_tlb_flush_sync(p2m);
- mfn = pte_get_mfn(entry);
- ASSERT(mfn_valid(mfn));
-
- pg = mfn_to_page(mfn);
-
- p2m_free_page(p2m, pg);
+ p2m_free_table(p2m, pg);
}
static bool p2m_split_superpage(struct p2m_domain *p2m, pte_t *entry,
unsigned int level, unsigned int target,
- const unsigned int *offsets)
+ const unsigned int *offsets,
+ struct page_info *tbl_pg)
{
struct page_info *page;
unsigned long i;
@@ -756,6 +886,14 @@ static bool p2m_split_superpage(struct p2m_domain *p2m, pte_t *entry,
unsigned int next_level = level - 1;
unsigned int level_order = P2M_LEVEL_ORDER(next_level);
+ struct p2m_pte_ctx p2m_pte_ctx = {
+ .p2m = p2m,
+ .level = level,
+ };
+
+ /* Init with p2m_invalid just to make compiler happy. */
+ p2m_type_t old_type = p2m_invalid;
+
/*
* This should only be called with target != level and the entry is
* a superpage.
@@ -777,6 +915,17 @@ static bool p2m_split_superpage(struct p2m_domain *p2m, pte_t *entry,
table = __map_domain_page(page);
+ if ( MASK_EXTR(entry->pte, P2M_TYPE_PTE_BITS_MASK) == p2m_ext_storage )
+ {
+ p2m_pte_ctx.pt_page = tbl_pg;
+ p2m_pte_ctx.index = offsets[level];
+
+ old_type = p2m_get_type(*entry, &p2m_pte_ctx);
+ }
+
+ p2m_pte_ctx.pt_page = page;
+ p2m_pte_ctx.level = next_level;
+
for ( i = 0; i < P2M_PAGETABLE_ENTRIES(p2m, next_level); i++ )
{
pte_t *new_entry = table + i;
@@ -788,6 +937,13 @@ static bool p2m_split_superpage(struct p2m_domain *p2m, pte_t *entry,
pte = *entry;
pte_set_mfn(&pte, mfn_add(mfn, i << level_order));
+ if ( MASK_EXTR(pte.pte, P2M_TYPE_PTE_BITS_MASK) == p2m_ext_storage )
+ {
+ p2m_pte_ctx.index = i;
+
+ p2m_set_type(&pte, old_type, &p2m_pte_ctx);
+ }
+
write_pte(new_entry, pte);
}
@@ -799,7 +955,7 @@ static bool p2m_split_superpage(struct p2m_domain *p2m, pte_t *entry,
*/
if ( next_level != target )
rv = p2m_split_superpage(p2m, table + offsets[next_level],
- next_level, target, offsets);
+ next_level, target, offsets, page);
if ( p2m->clean_dcache )
clean_dcache_va_range(table, PAGE_SIZE);
@@ -840,6 +996,9 @@ static int p2m_set_entry(struct p2m_domain *p2m,
* are still allowed.
*/
bool removing_mapping = mfn_eq(mfn, INVALID_MFN);
+ struct p2m_pte_ctx tmp_ctx = {
+ .p2m = p2m,
+ };
P2M_BUILD_LEVEL_OFFSETS(p2m, offsets, gfn_to_gaddr(gfn));
ASSERT(p2m_is_write_locked(p2m));
@@ -890,13 +1049,19 @@ static int p2m_set_entry(struct p2m_domain *p2m,
{
/* We need to split the original page. */
pte_t split_pte = *entry;
+ struct page_info *tbl_pg = mfn_to_page(domain_page_map_to_mfn(table));
ASSERT(pte_is_superpage(*entry, level));
- if ( !p2m_split_superpage(p2m, &split_pte, level, target, offsets) )
+ if ( !p2m_split_superpage(p2m, &split_pte, level, target, offsets,
+ tbl_pg) )
{
+ tmp_ctx.pt_page = tbl_pg;
+ tmp_ctx.index = offsets[level];
+ tmp_ctx.level = level;
+
/* Free the allocated sub-tree */
- p2m_free_subtree(p2m, split_pte, level);
+ p2m_free_subtree(p2m, split_pte, &tmp_ctx);
rc = -ENOMEM;
goto out;
@@ -922,6 +1087,10 @@ static int p2m_set_entry(struct p2m_domain *p2m,
entry = table + offsets[level];
}
+ tmp_ctx.pt_page = mfn_to_page(domain_page_map_to_mfn(table));
+ tmp_ctx.index = offsets[level];
+ tmp_ctx.level = level;
+
/*
* We should always be there with the correct level because all the
* intermediate tables have been installed if necessary.
@@ -934,7 +1103,7 @@ static int p2m_set_entry(struct p2m_domain *p2m,
p2m_clean_pte(entry, p2m->clean_dcache);
else
{
- pte_t pte = p2m_pte_from_mfn(mfn, t, false);
+ pte_t pte = p2m_pte_from_mfn(mfn, t, &tmp_ctx);
p2m_write_pte(entry, pte, p2m->clean_dcache);
@@ -970,7 +1139,7 @@ static int p2m_set_entry(struct p2m_domain *p2m,
if ( pte_is_valid(orig_pte) &&
(!pte_is_valid(*entry) ||
!mfn_eq(pte_get_mfn(*entry), pte_get_mfn(orig_pte))) )
- p2m_free_subtree(p2m, orig_pte, level);
+ p2m_free_subtree(p2m, orig_pte, &tmp_ctx);
out:
unmap_domain_page(table);
@@ -1171,7 +1340,14 @@ static mfn_t p2m_get_entry(struct p2m_domain *p2m, gfn_t gfn,
if ( pte_is_valid(entry) )
{
- *t = p2m_get_type(entry);
+ struct p2m_pte_ctx p2m_pte_ctx = {
+ .pt_page = mfn_to_page(domain_page_map_to_mfn(table)),
+ .index = offsets[level],
+ .level = level,
+ .p2m = p2m,
+ };
+
+ *t = p2m_get_type(entry, &p2m_pte_ctx);
mfn = pte_get_mfn(entry);
--
2.52.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH v8 3/3] xen/riscv: update p2m_set_entry() to free unused metadata pages
2025-12-22 16:37 [PATCH v8 0/3] xen/riscv: introduce p2m functionality Oleksii Kurochko
2025-12-22 16:37 ` [PATCH v8 1/3] xen/riscv: add support of page lookup by GFN Oleksii Kurochko
2025-12-22 16:37 ` [PATCH v8 2/3] xen/riscv: introduce metadata table to store P2M type Oleksii Kurochko
@ 2025-12-22 16:37 ` Oleksii Kurochko
2 siblings, 0 replies; 10+ messages in thread
From: Oleksii Kurochko @ 2025-12-22 16:37 UTC (permalink / raw)
To: xen-devel
Cc: Oleksii Kurochko, Alistair Francis, Bob Eshleman, Connor Davis,
Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
Julien Grall, Roger Pau Monné, Stefano Stabellini
Introduce tracking of metadata page entries usage and if all of them are
p2m_invalid then free them.
Intermediate P2M page tables are allocated with MEMF_no_owner, so we are free
to repurpose struct page_info fields for them. Since page_info.u.* is not
used for such pages, introduce a used_entries counter in struct page_info
to track how many metadata entries are in use for a given intermediate P2M
page table.
The counter is updated in p2m_set_type() when metadata entries transition
between p2m_invalid and a valid external type. When the last metadata entry
is cleared (used_entries == 0), the associated metadata page is freed and
returned to the P2M pool.
Refactor metadata page freeing into a new helper, p2m_free_metadata_page(),
as the same logic is needed both when tearing down a P2M table and when
all metadata entries become p2m_invalid in p2m_set_type(). As part of this
refactoring, move the declaration of p2m_free_page() earlier to satisfy the
new helper.
Additionally, implement page_set_tlbflush_timestamp() for RISC-V instead of
BUGing, as it is invoked when returning memory to the domheap.
Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
---
Changes in V9:
- Add Acked-by: Jan Beulich <jbeulich@suse.com>.
---
Changes in V8:
- New patch.
---
xen/arch/riscv/include/asm/flushtlb.h | 2 +-
xen/arch/riscv/include/asm/mm.h | 12 ++++++++++
xen/arch/riscv/p2m.c | 32 +++++++++++++++++++++------
3 files changed, 38 insertions(+), 8 deletions(-)
diff --git a/xen/arch/riscv/include/asm/flushtlb.h b/xen/arch/riscv/include/asm/flushtlb.h
index ab32311568ac..4f64f9757058 100644
--- a/xen/arch/riscv/include/asm/flushtlb.h
+++ b/xen/arch/riscv/include/asm/flushtlb.h
@@ -38,7 +38,7 @@ static inline void tlbflush_filter(cpumask_t *mask, uint32_t page_timestamp) {}
static inline void page_set_tlbflush_timestamp(struct page_info *page)
{
- BUG_ON("unimplemented");
+ page->tlbflush_timestamp = tlbflush_current_time();
}
static inline void arch_flush_tlb_mask(const cpumask_t *mask)
diff --git a/xen/arch/riscv/include/asm/mm.h b/xen/arch/riscv/include/asm/mm.h
index 48162f5d65cd..a005d0247a6f 100644
--- a/xen/arch/riscv/include/asm/mm.h
+++ b/xen/arch/riscv/include/asm/mm.h
@@ -113,6 +113,18 @@ struct page_info
unsigned long type_info;
} inuse;
+ /* Page is used as an intermediate P2M page table: count_info == 0 */
+ struct {
+ /*
+ * Tracks the number of used entries in the metadata page table.
+ *
+ * If used_entries == 0, then `page_info.v.md.pg` can be freed and
+ * returned to the P2M pool.
+ */
+ unsigned long used_entries;
+ } md;
+
+
/* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
union {
struct {
diff --git a/xen/arch/riscv/p2m.c b/xen/arch/riscv/p2m.c
index 24dd07165bd1..a6e4c01b873d 100644
--- a/xen/arch/riscv/p2m.c
+++ b/xen/arch/riscv/p2m.c
@@ -51,6 +51,18 @@ static struct gstage_mode_desc __ro_after_init max_gstage_mode = {
.name = "Bare",
};
+static void p2m_free_page(struct p2m_domain *p2m, struct page_info *pg);
+
+static inline void p2m_free_metadata_page(struct p2m_domain *p2m,
+ struct page_info **md_pg)
+{
+ if ( *md_pg )
+ {
+ p2m_free_page(p2m, *md_pg);
+ *md_pg = NULL;
+ }
+}
+
unsigned char get_max_supported_mode(void)
{
return max_gstage_mode.mode;
@@ -450,16 +462,27 @@ static void p2m_set_type(pte_t *pte, p2m_type_t t,
if ( t >= p2m_first_external )
{
+ if ( metadata[ctx->index].type == p2m_invalid )
+ ctx->pt_page->u.md.used_entries++;
+
metadata[ctx->index].type = t;
t = p2m_ext_storage;
}
else if ( metadata )
+ {
+ if ( metadata[ctx->index].type != p2m_invalid )
+ ctx->pt_page->u.md.used_entries--;
+
metadata[ctx->index].type = p2m_invalid;
+ }
pte->pte |= MASK_INSR(t, P2M_TYPE_PTE_BITS_MASK);
unmap_domain_page(metadata);
+
+ if ( *md_pg && !ctx->pt_page->u.md.used_entries )
+ p2m_free_metadata_page(ctx->p2m, md_pg);
}
/*
@@ -626,18 +649,13 @@ static pte_t page_to_p2m_table(const struct page_info *page)
return p2m_pte_from_mfn(page_to_mfn(page), p2m_invalid, NULL);
}
-static void p2m_free_page(struct p2m_domain *p2m, struct page_info *pg);
-
/*
* Free page table's page and metadata page linked to page table's page.
*/
static void p2m_free_table(struct p2m_domain *p2m, struct page_info *tbl_pg)
{
- if ( tbl_pg->v.md.pg )
- {
- p2m_free_page(p2m, tbl_pg->v.md.pg);
- tbl_pg->v.md.pg = NULL;
- }
+ p2m_free_metadata_page(p2m, &tbl_pg->v.md.pg);
+
p2m_free_page(p2m, tbl_pg);
}
--
2.52.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v8 1/3] xen/riscv: add support of page lookup by GFN
2025-12-22 16:37 ` [PATCH v8 1/3] xen/riscv: add support of page lookup by GFN Oleksii Kurochko
@ 2025-12-29 15:06 ` Jan Beulich
2025-12-30 15:25 ` Oleksii Kurochko
0 siblings, 1 reply; 10+ messages in thread
From: Jan Beulich @ 2025-12-29 15:06 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 22.12.2025 17:37, Oleksii Kurochko wrote:
> --- a/xen/arch/riscv/p2m.c
> +++ b/xen/arch/riscv/p2m.c
> @@ -1057,3 +1057,188 @@ int map_regions_p2mt(struct domain *d,
>
> return rc;
> }
> +
> +/*
> + * p2m_get_entry() should always return the correct order value, even if an
> + * entry is not present (i.e. the GFN is outside the range):
> + * [p2m->lowest_mapped_gfn, p2m->max_mapped_gfn] (1)
> + *
> + * This ensures that callers of p2m_get_entry() can determine what range of
> + * address space would be altered by a corresponding p2m_set_entry().
> + * Also, it would help to avoid costly page walks for GFNs outside range (1).
> + *
> + * Therefore, this function returns true for GFNs outside range (1), and in
> + * that case the corresponding level is returned via the level_out argument.
> + * Otherwise, it returns false and p2m_get_entry() performs a page walk to
> + * find the proper entry.
> + */
> +static bool check_outside_boundary(const struct p2m_domain *p2m, gfn_t gfn,
> + gfn_t boundary, bool is_lower,
> + unsigned int *level_out)
> +{
> + unsigned int level = P2M_ROOT_LEVEL(p2m);
> + bool ret = false;
> +
> + ASSERT(p2m);
> +
> + if ( is_lower ? gfn_x(gfn) < gfn_x(boundary)
> + : gfn_x(gfn) > gfn_x(boundary) )
> + {
> + for ( ; level; level-- )
> + {
> + unsigned long mask = BIT(P2M_GFN_LEVEL_SHIFT(level), UL) - 1;
> +
> + if ( is_lower ? (gfn_x(gfn) | mask) < gfn_x(boundary)
> + : (gfn_x(gfn) & ~mask) > gfn_x(boundary) )
> + break;
> + }
> +
> + ret = true;
For this case ...
> + }
> +
> + if ( level_out )
> + *level_out = level;
... this is correct, but of "ret" is still false it very likely isn't, and
arranging things this way may end up being confusing. Perhaps "level" should
be constrained to the if()'s scope? The caller cares about the value only
when the return value is true, after all.
Jan
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v8 2/3] xen/riscv: introduce metadata table to store P2M type
2025-12-22 16:37 ` [PATCH v8 2/3] xen/riscv: introduce metadata table to store P2M type Oleksii Kurochko
@ 2025-12-29 15:13 ` Jan Beulich
2025-12-30 15:47 ` Oleksii Kurochko
0 siblings, 1 reply; 10+ messages in thread
From: Jan Beulich @ 2025-12-29 15:13 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 22.12.2025 17:37, Oleksii Kurochko wrote:
> @@ -370,24 +396,96 @@ static struct page_info *p2m_alloc_page(struct p2m_domain *p2m)
> return pg;
> }
>
> -static int p2m_set_type(pte_t *pte, p2m_type_t t)
> +/*
> + * `pte` – PTE entry for which the type `t` will be stored.
> + *
> + * If `t` >= p2m_first_external, a valid `ctx` must be provided.
> + */
> +static void p2m_set_type(pte_t *pte, p2m_type_t t,
> + const struct p2m_pte_ctx *ctx)
> {
> - int rc = 0;
> + struct page_info **md_pg;
> + struct md_t *metadata = NULL;
>
> - if ( t > p2m_first_external )
> - panic("unimplemeted\n");
> - else
> - pte->pte |= MASK_INSR(t, P2M_TYPE_PTE_BITS_MASK);
> + /*
> + * It is sufficient to compare ctx->index with PAGETABLE_ENTRIES because,
> + * even for the p2m root page table (which is a 16 KB page allocated as
> + * four 4 KB pages), calc_offset() guarantees that the page-table index
> + * will always fall within the range [0, 511].
> + */
> + ASSERT(ctx && ctx->index < PAGETABLE_ENTRIES);
>
> - return rc;
> + /*
> + * At the moment, p2m_get_root_pointer() returns one of four possible p2m
> + * root pages, so there is no need to search for the correct ->pt_page
> + * here.
> + * Non-root page tables are 4 KB pages, so simply using ->pt_page is
> + * sufficient.
> + */
> + md_pg = &ctx->pt_page->v.md.pg;
> +
> + if ( !*md_pg && (t >= p2m_first_external) )
> + {
> + /*
> + * Since p2m_alloc_page() initializes an allocated page with
> + * zeros, p2m_invalid is expected to have the value 0 as well.
> + */
> + BUILD_BUG_ON(p2m_invalid);
> +
> + ASSERT(ctx->p2m);
I think I previously asked for this to be moved out of the if(). Else you
may not notice a caller side issue until a point where a metadata page
actually needs allocating. (This could simply be folded into the earlier
ASSERT().)
Jan
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v8 1/3] xen/riscv: add support of page lookup by GFN
2025-12-29 15:06 ` Jan Beulich
@ 2025-12-30 15:25 ` Oleksii Kurochko
2026-01-05 14:23 ` Jan Beulich
0 siblings, 1 reply; 10+ messages in thread
From: Oleksii Kurochko @ 2025-12-30 15:25 UTC (permalink / raw)
To: Jan Beulich
Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 12/29/25 4:06 PM, Jan Beulich wrote:
> On 22.12.2025 17:37, Oleksii Kurochko wrote:
>> --- a/xen/arch/riscv/p2m.c
>> +++ b/xen/arch/riscv/p2m.c
>> @@ -1057,3 +1057,188 @@ int map_regions_p2mt(struct domain *d,
>>
>> return rc;
>> }
>> +
>> +/*
>> + * p2m_get_entry() should always return the correct order value, even if an
>> + * entry is not present (i.e. the GFN is outside the range):
>> + * [p2m->lowest_mapped_gfn, p2m->max_mapped_gfn] (1)
>> + *
>> + * This ensures that callers of p2m_get_entry() can determine what range of
>> + * address space would be altered by a corresponding p2m_set_entry().
>> + * Also, it would help to avoid costly page walks for GFNs outside range (1).
>> + *
>> + * Therefore, this function returns true for GFNs outside range (1), and in
>> + * that case the corresponding level is returned via the level_out argument.
>> + * Otherwise, it returns false and p2m_get_entry() performs a page walk to
>> + * find the proper entry.
>> + */
>> +static bool check_outside_boundary(const struct p2m_domain *p2m, gfn_t gfn,
>> + gfn_t boundary, bool is_lower,
>> + unsigned int *level_out)
>> +{
>> + unsigned int level = P2M_ROOT_LEVEL(p2m);
>> + bool ret = false;
>> +
>> + ASSERT(p2m);
>> +
>> + if ( is_lower ? gfn_x(gfn) < gfn_x(boundary)
>> + : gfn_x(gfn) > gfn_x(boundary) )
>> + {
>> + for ( ; level; level-- )
>> + {
>> + unsigned long mask = BIT(P2M_GFN_LEVEL_SHIFT(level), UL) - 1;
>> +
>> + if ( is_lower ? (gfn_x(gfn) | mask) < gfn_x(boundary)
>> + : (gfn_x(gfn) & ~mask) > gfn_x(boundary) )
>> + break;
>> + }
>> +
>> + ret = true;
> For this case ...
>
>> + }
>> +
>> + if ( level_out )
>> + *level_out = level;
> ... this is correct, but of "ret" is still false it very likely isn't, and
> arranging things this way may end up being confusing. Perhaps "level" should
> be constrained to the if()'s scope? The caller cares about the value only
> when the return value is true, after all.
We could simply move the "|if ( level_out )"| check inside the|if| block, but
is this really a significant issue? We still need to check the return value,
and if it is false,|level_out| should just be ignored and there is not big
difference then if level_out will contain what it contained before the call
of check_outside_boundary() or it will be set to P2M_ROOT_LEVEL(p2m).
Alternatively, could we initialize|level| to a non-existent value in the
"ret=false" case, for example|P2M_MAX_ROOT_LEVEL| + 1, and return that value
via|level_out|?
~ Oleksii
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v8 2/3] xen/riscv: introduce metadata table to store P2M type
2025-12-29 15:13 ` Jan Beulich
@ 2025-12-30 15:47 ` Oleksii Kurochko
2026-01-05 14:24 ` Jan Beulich
0 siblings, 1 reply; 10+ messages in thread
From: Oleksii Kurochko @ 2025-12-30 15:47 UTC (permalink / raw)
To: Jan Beulich
Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 12/29/25 4:13 PM, Jan Beulich wrote:
> On 22.12.2025 17:37, Oleksii Kurochko wrote:
>> @@ -370,24 +396,96 @@ static struct page_info *p2m_alloc_page(struct p2m_domain *p2m)
>> return pg;
>> }
>>
>> -static int p2m_set_type(pte_t *pte, p2m_type_t t)
>> +/*
>> + * `pte` – PTE entry for which the type `t` will be stored.
>> + *
>> + * If `t` >= p2m_first_external, a valid `ctx` must be provided.
>> + */
>> +static void p2m_set_type(pte_t *pte, p2m_type_t t,
>> + const struct p2m_pte_ctx *ctx)
>> {
>> - int rc = 0;
>> + struct page_info **md_pg;
>> + struct md_t *metadata = NULL;
>>
>> - if ( t > p2m_first_external )
>> - panic("unimplemeted\n");
>> - else
>> - pte->pte |= MASK_INSR(t, P2M_TYPE_PTE_BITS_MASK);
>> + /*
>> + * It is sufficient to compare ctx->index with PAGETABLE_ENTRIES because,
>> + * even for the p2m root page table (which is a 16 KB page allocated as
>> + * four 4 KB pages), calc_offset() guarantees that the page-table index
>> + * will always fall within the range [0, 511].
>> + */
>> + ASSERT(ctx && ctx->index < PAGETABLE_ENTRIES);
>>
>> - return rc;
>> + /*
>> + * At the moment, p2m_get_root_pointer() returns one of four possible p2m
>> + * root pages, so there is no need to search for the correct ->pt_page
>> + * here.
>> + * Non-root page tables are 4 KB pages, so simply using ->pt_page is
>> + * sufficient.
>> + */
>> + md_pg = &ctx->pt_page->v.md.pg;
>> +
>> + if ( !*md_pg && (t >= p2m_first_external) )
>> + {
>> + /*
>> + * Since p2m_alloc_page() initializes an allocated page with
>> + * zeros, p2m_invalid is expected to have the value 0 as well.
>> + */
>> + BUILD_BUG_ON(p2m_invalid);
>> +
>> + ASSERT(ctx->p2m);
> I think I previously asked for this to be moved out of the if(). Else you
> may not notice a caller side issue until a point where a metadata page
> actually needs allocating. (This could simply be folded into the earlier
> ASSERT().)
I think that I understand your intention and okay to fold ASSERT(ctx->p2m)
into the earlier ASSERT().
Just want to note that generally if the metadata page has been already
allocated and then p2m_set_type() will be called with ctx->p2m == NULL then
nothing serious will happen as basically ctx->p2m is needed in this function
only for allocation of metadata page.
~ Oleksii
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v8 1/3] xen/riscv: add support of page lookup by GFN
2025-12-30 15:25 ` Oleksii Kurochko
@ 2026-01-05 14:23 ` Jan Beulich
0 siblings, 0 replies; 10+ messages in thread
From: Jan Beulich @ 2026-01-05 14:23 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 30.12.2025 16:25, Oleksii Kurochko wrote:
>
> On 12/29/25 4:06 PM, Jan Beulich wrote:
>> On 22.12.2025 17:37, Oleksii Kurochko wrote:
>>> --- a/xen/arch/riscv/p2m.c
>>> +++ b/xen/arch/riscv/p2m.c
>>> @@ -1057,3 +1057,188 @@ int map_regions_p2mt(struct domain *d,
>>>
>>> return rc;
>>> }
>>> +
>>> +/*
>>> + * p2m_get_entry() should always return the correct order value, even if an
>>> + * entry is not present (i.e. the GFN is outside the range):
>>> + * [p2m->lowest_mapped_gfn, p2m->max_mapped_gfn] (1)
>>> + *
>>> + * This ensures that callers of p2m_get_entry() can determine what range of
>>> + * address space would be altered by a corresponding p2m_set_entry().
>>> + * Also, it would help to avoid costly page walks for GFNs outside range (1).
>>> + *
>>> + * Therefore, this function returns true for GFNs outside range (1), and in
>>> + * that case the corresponding level is returned via the level_out argument.
>>> + * Otherwise, it returns false and p2m_get_entry() performs a page walk to
>>> + * find the proper entry.
>>> + */
>>> +static bool check_outside_boundary(const struct p2m_domain *p2m, gfn_t gfn,
>>> + gfn_t boundary, bool is_lower,
>>> + unsigned int *level_out)
>>> +{
>>> + unsigned int level = P2M_ROOT_LEVEL(p2m);
>>> + bool ret = false;
>>> +
>>> + ASSERT(p2m);
>>> +
>>> + if ( is_lower ? gfn_x(gfn) < gfn_x(boundary)
>>> + : gfn_x(gfn) > gfn_x(boundary) )
>>> + {
>>> + for ( ; level; level-- )
>>> + {
>>> + unsigned long mask = BIT(P2M_GFN_LEVEL_SHIFT(level), UL) - 1;
>>> +
>>> + if ( is_lower ? (gfn_x(gfn) | mask) < gfn_x(boundary)
>>> + : (gfn_x(gfn) & ~mask) > gfn_x(boundary) )
>>> + break;
>>> + }
>>> +
>>> + ret = true;
>> For this case ...
>>
>>> + }
>>> +
>>> + if ( level_out )
>>> + *level_out = level;
>> ... this is correct, but of "ret" is still false it very likely isn't, and
>> arranging things this way may end up being confusing. Perhaps "level" should
>> be constrained to the if()'s scope? The caller cares about the value only
>> when the return value is true, after all.
>
> We could simply move the "|if ( level_out )"| check inside the|if| block, but
> is this really a significant issue?
As I said - it is (or has the potential to be) confusing. No more, but also no
less.
> We still need to check the return value,
> and if it is false,|level_out| should just be ignored and there is not big
> difference then if level_out will contain what it contained before the call
> of check_outside_boundary() or it will be set to P2M_ROOT_LEVEL(p2m).
>
> Alternatively, could we initialize|level| to a non-existent value in the
> "ret=false" case, for example|P2M_MAX_ROOT_LEVEL| + 1, and return that value
> via|level_out|?
Might be another option, yes. Depending on how the ultimate set of callers
are going to behave.
Jan
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v8 2/3] xen/riscv: introduce metadata table to store P2M type
2025-12-30 15:47 ` Oleksii Kurochko
@ 2026-01-05 14:24 ` Jan Beulich
0 siblings, 0 replies; 10+ messages in thread
From: Jan Beulich @ 2026-01-05 14:24 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 30.12.2025 16:47, Oleksii Kurochko wrote:
>
> On 12/29/25 4:13 PM, Jan Beulich wrote:
>> On 22.12.2025 17:37, Oleksii Kurochko wrote:
>>> @@ -370,24 +396,96 @@ static struct page_info *p2m_alloc_page(struct p2m_domain *p2m)
>>> return pg;
>>> }
>>>
>>> -static int p2m_set_type(pte_t *pte, p2m_type_t t)
>>> +/*
>>> + * `pte` – PTE entry for which the type `t` will be stored.
>>> + *
>>> + * If `t` >= p2m_first_external, a valid `ctx` must be provided.
>>> + */
>>> +static void p2m_set_type(pte_t *pte, p2m_type_t t,
>>> + const struct p2m_pte_ctx *ctx)
>>> {
>>> - int rc = 0;
>>> + struct page_info **md_pg;
>>> + struct md_t *metadata = NULL;
>>>
>>> - if ( t > p2m_first_external )
>>> - panic("unimplemeted\n");
>>> - else
>>> - pte->pte |= MASK_INSR(t, P2M_TYPE_PTE_BITS_MASK);
>>> + /*
>>> + * It is sufficient to compare ctx->index with PAGETABLE_ENTRIES because,
>>> + * even for the p2m root page table (which is a 16 KB page allocated as
>>> + * four 4 KB pages), calc_offset() guarantees that the page-table index
>>> + * will always fall within the range [0, 511].
>>> + */
>>> + ASSERT(ctx && ctx->index < PAGETABLE_ENTRIES);
>>>
>>> - return rc;
>>> + /*
>>> + * At the moment, p2m_get_root_pointer() returns one of four possible p2m
>>> + * root pages, so there is no need to search for the correct ->pt_page
>>> + * here.
>>> + * Non-root page tables are 4 KB pages, so simply using ->pt_page is
>>> + * sufficient.
>>> + */
>>> + md_pg = &ctx->pt_page->v.md.pg;
>>> +
>>> + if ( !*md_pg && (t >= p2m_first_external) )
>>> + {
>>> + /*
>>> + * Since p2m_alloc_page() initializes an allocated page with
>>> + * zeros, p2m_invalid is expected to have the value 0 as well.
>>> + */
>>> + BUILD_BUG_ON(p2m_invalid);
>>> +
>>> + ASSERT(ctx->p2m);
>> I think I previously asked for this to be moved out of the if(). Else you
>> may not notice a caller side issue until a point where a metadata page
>> actually needs allocating. (This could simply be folded into the earlier
>> ASSERT().)
>
> I think that I understand your intention and okay to fold ASSERT(ctx->p2m)
> into the earlier ASSERT().
> Just want to note that generally if the metadata page has been already
> allocated and then p2m_set_type() will be called with ctx->p2m == NULL then
> nothing serious will happen as basically ctx->p2m is needed in this function
> only for allocation of metadata page.
Correct, but how would any particular caller know? Yes, there may be special
cases where a caller does know, but imo the code is going to be more robust
if the check is always being made (forcing all callers to set ->p2m).
Jan
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2026-01-05 14:25 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-12-22 16:37 [PATCH v8 0/3] xen/riscv: introduce p2m functionality Oleksii Kurochko
2025-12-22 16:37 ` [PATCH v8 1/3] xen/riscv: add support of page lookup by GFN Oleksii Kurochko
2025-12-29 15:06 ` Jan Beulich
2025-12-30 15:25 ` Oleksii Kurochko
2026-01-05 14:23 ` Jan Beulich
2025-12-22 16:37 ` [PATCH v8 2/3] xen/riscv: introduce metadata table to store P2M type Oleksii Kurochko
2025-12-29 15:13 ` Jan Beulich
2025-12-30 15:47 ` Oleksii Kurochko
2026-01-05 14:24 ` Jan Beulich
2025-12-22 16:37 ` [PATCH v8 3/3] xen/riscv: update p2m_set_entry() to free unused metadata pages Oleksii Kurochko
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.