From: Oleksii Kurochko <oleksii.kurochko@gmail.com>
To: Jan Beulich <jbeulich@suse.com>
Cc: "Alistair Francis" <alistair.francis@wdc.com>,
"Bob Eshleman" <bobbyeshleman@gmail.com>,
"Connor Davis" <connojdavis@gmail.com>,
"Andrew Cooper" <andrew.cooper3@citrix.com>,
"Anthony PERARD" <anthony.perard@vates.tech>,
"Michal Orzel" <michal.orzel@amd.com>,
"Julien Grall" <julien@xen.org>,
"Roger Pau Monné" <roger.pau@citrix.com>,
"Stefano Stabellini" <sstabellini@kernel.org>,
xen-devel@lists.xenproject.org
Subject: Re: [PATCH v4 10/18] xen/riscv: implement p2m_set_range()
Date: Thu, 25 Sep 2025 22:08:52 +0200 [thread overview]
Message-ID: <a5c016c9-aee4-4a86-a6cc-0d89dd5e9216@gmail.com> (raw)
In-Reply-To: <6ee4846e-dd27-4588-aac5-f2fe2937db18@suse.com>
On 9/20/25 1:36 AM, Jan Beulich wrote:
> On 17.09.2025 23:55, Oleksii Kurochko wrote:
>> --- a/xen/arch/riscv/p2m.c
>> +++ b/xen/arch/riscv/p2m.c
>> @@ -16,6 +16,7 @@
>> #include <asm/riscv_encoding.h>
>>
>> unsigned long __ro_after_init gstage_mode;
>> +unsigned int __ro_after_init gstage_root_level;
>>
>> void __init gstage_mode_detect(void)
>> {
>> @@ -53,6 +54,7 @@ void __init gstage_mode_detect(void)
>> if ( MASK_EXTR(csr_read(CSR_HGATP), HGATP_MODE_MASK) == mode )
>> {
>> gstage_mode = mode;
>> + gstage_root_level = modes[mode_idx].paging_levels - 1;
>> break;
>> }
>> }
>> @@ -210,6 +212,9 @@ int p2m_init(struct domain *d)
>> rwlock_init(&p2m->lock);
>> INIT_PAGE_LIST_HEAD(&p2m->pages);
>>
>> + p2m->max_mapped_gfn = _gfn(0);
>> + p2m->lowest_mapped_gfn = _gfn(ULONG_MAX);
>> +
>> /*
>> * Currently, the infrastructure required to enable CONFIG_HAS_PASSTHROUGH
>> * is not ready for RISC-V support.
>> @@ -251,13 +256,287 @@ int p2m_set_allocation(struct domain *d, unsigned long pages, bool *preempted)
>> return rc;
>> }
>>
>> +/*
>> + * Find and map the root page table. The caller is responsible for
>> + * unmapping the table.
> With the root table being 4 pages, "the root table" is slightly misleading
> here: You never map the entire table.
I will update the comment then to:
/*
* Map one of the four root pages of the P2M root page table.
*
* The P2M root page table is larger than normal (16KB instead of 4KB),
* so it is allocated as four consecutive 4KB pages. This function selects
* the appropriate 4KB page based on the given GFN and returns a mapping
* to it.
*
* The caller is responsible for unmapping the page after use.
*
* Returns NULL if the calculated offset into the root table is invalid.
*/
>
>> + * The function will return NULL if the offset into the root table is
>> + * invalid.
>> + */
>> +static pte_t *p2m_get_root_pointer(struct p2m_domain *p2m, gfn_t gfn)
>> +{
>> + unsigned long root_table_indx;
>> +
>> + root_table_indx = gfn_x(gfn) >> P2M_LEVEL_ORDER(P2M_ROOT_LEVEL);
>> + if ( root_table_indx >= P2M_ROOT_PAGES )
>> + return NULL;
>> +
>> + /*
>> + * The P2M root page table is extended by 2 bits, making its size 16KB
>> + * (instead of 4KB for non-root page tables). Therefore, p2m->root is
>> + * allocated as four consecutive 4KB pages (since alloc_domheap_pages()
>> + * only allocates 4KB pages).
>> + *
>> + * To determine which of these four 4KB pages the root_table_indx falls
>> + * into, we divide root_table_indx by
>> + * P2M_PAGETABLE_ENTRIES(P2M_ROOT_LEVEL - 1).
>> + */
>> + root_table_indx /= P2M_PAGETABLE_ENTRIES(P2M_ROOT_LEVEL - 1);
> The subtraction of 1 here feels odd: You're after the root table's
> number of entries, i.e. I'd expect you to pass just P2M_ROOT_LEVEL.
> And the way P2M_PAGETABLE_ENTRIES() works also suggests so.
The purpose of this line is to select the page within the root table, which
consists of 4 consecutive pages. However, P2M_PAGETABLE_ENTRIES(P2M_ROOT_LEVEL)
returns 2048, so root_table_indx would always be 0 after the division, which is
not what we want.
As an alternative, P2M_PAGETABLE_ENTRIES(0) could be used, since it always
returns 512. Dividing root_table_indx by 512 then yields the index of the page
within the root table, which is made up of 4 consecutive pages.
Does it make sense now?
A problem may occur with DECLARE_OFFSETS(), which can produce an incorrect
index within the root page table: since the root index is in the range
[0, 2047], it becomes an issue when the value is greater than 511, because
DECLARE_OFFSETS() does not account for the larger range of the root index.
I am not sure whether it is better to make DECLARE_OFFSETS() generic enough
for both P2M and Xen page tables, or to provide a separate P2M_DECLARE_OFFSETS()
and use it only in P2M-related code.
Another option would be to move DECLARE_OFFSETS() from the asm/page.h header
to riscv/pt.c and define a separate one in riscv/p2m.c.
Do you have a preference?
>
>> +/*
>> + * Insert an entry in the p2m. This should be called with a mapping
>> + * equal to a page/superpage.
>> + */
> I don't follow this comment: There isn't any mapping being passed in, is there?
I think this comment should be dropped; it was meant to say that the requested
mapping must be a page/superpage (4K, 2M, 1G), and the correct order is always
guaranteed by p2m_mapping_order().
>
>> +static int p2m_set_entry(struct p2m_domain *p2m,
>> + gfn_t gfn,
>> + unsigned long page_order,
>> + mfn_t mfn,
>> + p2m_type_t t)
> Nit: Indentation.
>
>> +{
>> + unsigned int level;
>> + unsigned int target = page_order / PAGETABLE_ORDER;
>> + pte_t *entry, *table, orig_pte;
>> + int rc;
>> + /*
>> + * A mapping is removed only if the MFN is explicitly set to INVALID_MFN.
>> + * Other MFNs that are considered invalid by mfn_valid() (e.g., MMIO)
>> + * are still allowed.
>> + */
>> + bool removing_mapping = mfn_eq(mfn, INVALID_MFN);
>> + DECLARE_OFFSETS(offsets, gfn_to_gaddr(gfn));
>> +
>> + ASSERT(p2m_is_write_locked(p2m));
>> +
>> + /*
>> + * Check if the level target is valid: we only support
>> + * 4K - 2M - 1G mapping.
>> + */
>> + ASSERT(target <= 2);
>> +
>> + table = p2m_get_root_pointer(p2m, gfn);
>> + if ( !table )
>> + return -EINVAL;
>> +
>> + for ( level = P2M_ROOT_LEVEL; level > target; level-- )
>> + {
>> + /*
>> + * Don't try to allocate intermediate page table if the mapping
>> + * is about to be removed.
>> + */
>> + rc = p2m_next_level(p2m, !removing_mapping,
>> + level, &table, offsets[level]);
>> + if ( (rc == P2M_TABLE_MAP_NONE) || (rc == P2M_TABLE_MAP_NOMEM) )
>> + {
>> + rc = (rc == P2M_TABLE_MAP_NONE) ? -ENOENT : -ENOMEM;
>> + /*
>> + * We are here because p2m_next_level has failed to map
>> + * the intermediate page table (e.g. the table does not exist
>> + * and the p2m tree is read-only).
> I thought I commented on this or something similar already: Calling the
> p2m tree "read-only" is imo misleading.
I will change then "read-only" to "not allocatable".
>
>> It is a valid case
>> + * when removing a mapping as it may not exist in the
>> + * page table. In this case, just ignore it.
> I fear the "it" has no reference; aiui you mean "ignore the lookup failure",
> but the comment isn't worded to refer to that by "it".
I will update the comment correspondingly.
>
>> + */
>> + rc = removing_mapping ? 0 : rc;
>> + goto out;
>> + }
>> +
>> + if ( rc != P2M_TABLE_NORMAL )
>> + break;
>> + }
>> +
>> + entry = table + offsets[level];
>> +
>> + /*
>> + * If we are here with level > target, we must be at a leaf node,
>> + * and we need to break up the superpage.
>> + */
>> + if ( level > target )
>> + {
>> + panic("Shattering isn't implemented\n");
>> + }
>> +
>> + /*
>> + * We should always be there with the correct level because all the
>> + * intermediate tables have been installed if necessary.
>> + */
>> + ASSERT(level == target);
>> +
>> + orig_pte = *entry;
>> +
>> + if ( removing_mapping )
>> + p2m_clean_pte(entry, p2m->clean_dcache);
>> + else
>> + {
>> + pte_t pte = p2m_pte_from_mfn(mfn, t);
>> +
>> + p2m_write_pte(entry, pte, p2m->clean_dcache);
>> +
>> + p2m->max_mapped_gfn = gfn_max(p2m->max_mapped_gfn,
>> + gfn_add(gfn, BIT(page_order, UL) - 1));
>> + p2m->lowest_mapped_gfn = gfn_min(p2m->lowest_mapped_gfn, gfn);
>> + }
>> +
>> + p2m->need_flush = true;
>> +
>> + /*
>> + * Currently, the infrastructure required to enable CONFIG_HAS_PASSTHROUGH
>> + * is not ready for RISC-V support.
>> + *
>> + * When CONFIG_HAS_PASSTHROUGH=y, iommu_iotlb_flush() should be done
>> + * here.
>> + */
>> +#ifdef CONFIG_HAS_PASSTHROUGH
>> +# error "add code to flush IOMMU TLB"
>> +#endif
>> +
>> + rc = 0;
>> +
>> + /*
>> + * Free the entry only if the original pte was valid and the base
>> + * is different (to avoid freeing when permission is changed).
>> + *
>> + * If previously MFN 0 was mapped and it is going to be removed
>> + * and considering that during removing MFN 0 is used then `entry`
>> + * and `new_entry` will be the same and p2m_free_subtree() won't be
>> + * called. This case is handled explicitly.
>> + */
>> + if ( pte_is_valid(orig_pte) &&
>> + (!mfn_eq(pte_get_mfn(*entry), pte_get_mfn(orig_pte)) ||
>> + (removing_mapping && mfn_eq(pte_get_mfn(*entry), _mfn(0)))) )
>> + p2m_free_subtree(p2m, orig_pte, level);
> I continue to fail to understand why the MFN would matter here.
My understanding is that if, for the same GFN, the MFN changes from MFN_1 to
MFN_2, then we need to update any references on the page referenced by
orig_pte to ensure the proper reference counter is maintained for the page
pointed to by MFN_1.
> Isn't the
> need to free strictly tied to a VALID -> NOT VALID transition? A permission
> change simply retains the VALID state of an entry.
It covers the case when a removal happens, and in that case we probably don't
need to check specifically for the MFN 0 case "mfn_eq(pte_get_mfn(*entry), _mfn(0))";
it would be enough to check !pte_is_valid(entry) instead:
...
(removing_mapping && !pte_is_valid(entry)))) )
Or we could only check the removing_mapping variable, as `entry` will have been
invalidated by the code above anyway. So we would get:
+ if ( pte_is_valid(orig_pte) &&
+ (!mfn_eq(pte_get_mfn(*entry), pte_get_mfn(orig_pte)) || removing_mapping) )
+ p2m_free_subtree(p2m, orig_pte, level);
Does it make sense now?
~ Oleksii