xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
From: Oleksii Kurochko <oleksii.kurochko@gmail.com>
To: Jan Beulich <jbeulich@suse.com>
Cc: "Alistair Francis" <alistair.francis@wdc.com>,
	"Bob Eshleman" <bobbyeshleman@gmail.com>,
	"Connor Davis" <connojdavis@gmail.com>,
	"Andrew Cooper" <andrew.cooper3@citrix.com>,
	"Anthony PERARD" <anthony.perard@vates.tech>,
	"Michal Orzel" <michal.orzel@amd.com>,
	"Julien Grall" <julien@xen.org>,
	"Roger Pau Monné" <roger.pau@citrix.com>,
	"Stefano Stabellini" <sstabellini@kernel.org>,
	xen-devel@lists.xenproject.org
Subject: Re: [PATCH v4 10/18] xen/riscv: implement p2m_set_range()
Date: Fri, 26 Sep 2025 10:58:53 +0200	[thread overview]
Message-ID: <b60c5228-d7da-4b8e-b12b-3fe26825759b@gmail.com> (raw)
In-Reply-To: <6b62cf4c-8367-47dc-9911-206c220fb050@suse.com>

[-- Attachment #1: Type: text/plain, Size: 6949 bytes --]


On 9/26/25 9:07 AM, Jan Beulich wrote:
> On 25.09.2025 22:08, Oleksii Kurochko wrote:
>> On 9/20/25 1:36 AM, Jan Beulich wrote:
>>> On 17.09.2025 23:55, Oleksii Kurochko wrote:
>>>> +static pte_t *p2m_get_root_pointer(struct p2m_domain *p2m, gfn_t gfn)
>>>> +{
>>>> +    unsigned long root_table_indx;
>>>> +
>>>> +    root_table_indx = gfn_x(gfn) >> P2M_LEVEL_ORDER(P2M_ROOT_LEVEL);
>>>> +    if ( root_table_indx >= P2M_ROOT_PAGES )
>>>> +        return NULL;
>>>> +
>>>> +    /*
>>>> +     * The P2M root page table is extended by 2 bits, making its size 16KB
>>>> +     * (instead of 4KB for non-root page tables). Therefore, p2m->root is
>>>> +     * allocated as four consecutive 4KB pages (since alloc_domheap_pages()
>>>> +     * only allocates 4KB pages).
>>>> +     *
>>>> +     * To determine which of these four 4KB pages the root_table_indx falls
>>>> +     * into, we divide root_table_indx by
>>>> +     * P2M_PAGETABLE_ENTRIES(P2M_ROOT_LEVEL - 1).
>>>> +     */
>>>> +    root_table_indx /= P2M_PAGETABLE_ENTRIES(P2M_ROOT_LEVEL - 1);
>>> The subtraction of 1 here feels odd: You're after the root table's
>>> number of entries, i.e. I'd expect you to pass just P2M_ROOT_LEVEL.
>>> And the way P2M_PAGETABLE_ENTRIES() works also suggests so.
>> The purpose of this line is to select the page within the root table, which
>> consists of 4 consecutive pages. However, P2M_PAGETABLE_ENTRIES(P2M_ROOT_LEVEL)
>> returns 2048, so root_table_idx will always be 0 after devision, which is not
>> what we want.
>>
>> As an alternative, P2M_PAGETABLE_ENTRIES(0) could be used, since it always
>> returns 512. Dividing root_table_idx by 512 then yields the index of the page
>> within the root table, which is made up of 4 consecutive pages.
>>
>> Does it make sense now?
> Yes and no. I understand what you're after, but that doesn't make the use of
> P2M_PAGETABLE_ENTRIES() (with an arbitrary level as argument) correct. This
> calculation wants doing by solely using properties of the top level.

Got it, thanks. Then I will use solely properties of the top level.

>>>> +static int p2m_set_entry(struct p2m_domain *p2m,
>>>> +                           gfn_t gfn,
>>>> +                           unsigned long page_order,
>>>> +                           mfn_t mfn,
>>>> +                           p2m_type_t t)
>>> Nit: Indentation.
>>>
>>>> +{
>>>> +    unsigned int level;
>>>> +    unsigned int target = page_order / PAGETABLE_ORDER;
>>>> +    pte_t *entry, *table, orig_pte;
>>>> +    int rc;
>>>> +    /*
>>>> +     * A mapping is removed only if the MFN is explicitly set to INVALID_MFN.
>>>> +     * Other MFNs that are considered invalid by mfn_valid() (e.g., MMIO)
>>>> +     * are still allowed.
>>>> +     */
>>>> +    bool removing_mapping = mfn_eq(mfn, INVALID_MFN);
>>>> +    DECLARE_OFFSETS(offsets, gfn_to_gaddr(gfn));
>>>> +
>>>> +    ASSERT(p2m_is_write_locked(p2m));
>>>> +
>>>> +    /*
>>>> +     * Check if the level target is valid: we only support
>>>> +     * 4K - 2M - 1G mapping.
>>>> +     */
>>>> +    ASSERT(target <= 2);
>>>> +
>>>> +    table = p2m_get_root_pointer(p2m, gfn);
>>>> +    if ( !table )
>>>> +        return -EINVAL;
>>>> +
>>>> +    for ( level = P2M_ROOT_LEVEL; level > target; level-- )
>>>> +    {
>>>> +        /*
>>>> +         * Don't try to allocate intermediate page table if the mapping
>>>> +         * is about to be removed.
>>>> +         */
>>>> +        rc = p2m_next_level(p2m, !removing_mapping,
>>>> +                            level, &table, offsets[level]);
>>>> +        if ( (rc == P2M_TABLE_MAP_NONE) || (rc == P2M_TABLE_MAP_NOMEM) )
>>>> +        {
>>>> +            rc = (rc == P2M_TABLE_MAP_NONE) ? -ENOENT : -ENOMEM;
>>>> +            /*
>>>> +             * We are here because p2m_next_level has failed to map
>>>> +             * the intermediate page table (e.g the table does not exist
>>>> +             * and they p2m tree is read-only).
>>> I thought I commented on this or something similar already: Calling the
>>> p2m tree "read-only" is imo misleading.
>> I will change then "read-only" to "not allocatable".
> That'll be only marginally better: What's "allocatable"? Why not something
> like "... does not exist and none should be allocated"? Or maybe simply
> omit this part of the comment?

Agree, "allocatable" could be also confusing. Perhaps, just omitting will
be fine.

>
>>>> +    /*
>>>> +     * Free the entry only if the original pte was valid and the base
>>>> +     * is different (to avoid freeing when permission is changed).
>>>> +     *
>>>> +     * If previously MFN 0 was mapped and it is going to be removed
>>>> +     * and considering that during removing MFN 0 is used then `entry`
>>>> +     * and `new_entry` will be the same and p2m_free_subtree() won't be
>>>> +     * called. This case is handled explicitly.
>>>> +     */
>>>> +    if ( pte_is_valid(orig_pte) &&
>>>> +         (!mfn_eq(pte_get_mfn(*entry), pte_get_mfn(orig_pte)) ||
>>>> +          (removing_mapping && mfn_eq(pte_get_mfn(*entry), _mfn(0)))) )
>>>> +        p2m_free_subtree(p2m, orig_pte, level);
>>> I continue to fail to understand why the MFN would matter here.
>> My understanding is that if, for the same GFN, the MFN changes fromMFN_1 to
>> MFN_2, then we need to update any references on the page referenced by
>> |orig_pte| to ensure the proper reference counter is maintained for the page
>> pointed to byMFN_1.
>>
>>>    Isn't the
>>> need to free strictly tied to a VALID -> NOT VALID transition? A permission
>>> change simply retains the VALID state of an entry.
>> It covers a case when removing happens and probably in this case we don't need
>> to check specifically for mfn(0) case "mfn_eq(pte_get_mfn(*entry), _mfn(0))",
>> but it would be enough to check that pte_is_valid(entry) instead:
>>     ...
>>     (removing_mapping && !pte_is_valid(entry)))) )
>>
>> Or only check removing_mapping variable as `entry` would be invalided by the
>> code above anyway. So we will get:
>> +    if ( pte_is_valid(orig_pte) &&
>> +         (!mfn_eq(pte_get_mfn(*entry), pte_get_mfn(orig_pte)) || removing_mapping) )
>> +        p2m_free_subtree(p2m, orig_pte, level);
>>
>> Does it make sense now?
> Not really, sorry. Imo the complicated condition indicates that something is
> wrong (or at least inefficient) here.

Then, in the case of aVALID -> VALID transition, where the MFN is changed for the
same PTE, should something be done with the old MFN (e.g., calling|p2m_put_page()|
for it), or can freeing the old MFN be delayed until|domain_relinquish_resources() |is called? If so, wouldn’t that lead to a situation where many old MFNs accumulate
and cannot be re-used until|domain_relinquish_resources()| (or another function that
explicitly frees pages) is invoked?

If we only need to care about theVALID -> NOT VALID transition, doesn’t that mean
|p2m_free_subtree()| should be called only when a removal actually occurs?

~ Oleksii


[-- Attachment #2: Type: text/html, Size: 9088 bytes --]

  reply	other threads:[~2025-09-26  8:59 UTC|newest]

Thread overview: 62+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-17 21:55 [PATCH v4 00/18 for 4.22] xen/riscv: introduce p2m functionality Oleksii Kurochko
2025-09-17 21:55 ` [PATCH v4 01/18] xen/riscv: detect and initialize G-stage mode Oleksii Kurochko
2025-09-18 15:54   ` Jan Beulich
2025-09-24 11:31     ` Oleksii Kurochko
2025-09-24 15:00       ` Oleksii Kurochko
2025-09-25 13:46         ` Jan Beulich
2025-09-26  7:30           ` Oleksii Kurochko
2025-09-17 21:55 ` [PATCH v4 02/18] xen/riscv: introduce VMID allocation and manegement Oleksii Kurochko
2025-09-19 21:26   ` Jan Beulich
2025-09-24 14:25     ` Oleksii Kurochko
2025-09-25 13:53       ` Jan Beulich
2025-09-17 21:55 ` [PATCH v4 03/18] xen/riscv: introduce things necessary for p2m initialization Oleksii Kurochko
2025-09-19 21:43   ` Jan Beulich
2025-09-17 21:55 ` [PATCH v4 04/18] xen/riscv: construct the P2M pages pool for guests Oleksii Kurochko
2025-09-17 21:55 ` [PATCH v4 05/18] xen/riscv: add root page table allocation Oleksii Kurochko
2025-09-19 22:14   ` Jan Beulich
2025-09-24 15:40     ` Oleksii Kurochko
2025-09-25 13:56       ` Jan Beulich
2025-09-17 21:55 ` [PATCH v4 06/18] xen/riscv: introduce pte_{set,get}_mfn() Oleksii Kurochko
2025-09-17 21:55 ` [PATCH v4 07/18] xen/riscv: add new p2m types and helper macros for type classification Oleksii Kurochko
2025-09-19 22:18   ` Jan Beulich
2025-09-17 21:55 ` [PATCH v4 08/18] xen/dom0less: abstract Arm-specific p2m type name for device MMIO mappings Oleksii Kurochko
2025-09-17 21:55 ` [PATCH v4 09/18] xen/riscv: implement function to map memory in guest p2m Oleksii Kurochko
2025-09-19 23:12   ` Jan Beulich
2025-09-17 21:55 ` [PATCH v4 10/18] xen/riscv: implement p2m_set_range() Oleksii Kurochko
2025-09-19 23:36   ` Jan Beulich
2025-09-25 20:08     ` Oleksii Kurochko
2025-09-26  7:07       ` Jan Beulich
2025-09-26  8:58         ` Oleksii Kurochko [this message]
2025-10-13 11:59           ` Oleksii Kurochko
2025-09-17 21:55 ` [PATCH v4 11/18] xen/riscv: Implement p2m_free_subtree() and related helpers Oleksii Kurochko
2025-09-19 23:57   ` Jan Beulich
2025-09-26 15:33     ` Oleksii Kurochko
2025-09-28 14:30       ` Jan Beulich
2025-09-17 21:55 ` [PATCH v4 12/18] xen/riscv: Implement p2m_pte_from_mfn() and support PBMT configuration Oleksii Kurochko
2025-09-22 16:28   ` Jan Beulich
2025-09-29 13:30     ` Oleksii Kurochko
2025-10-07 13:09       ` Jan Beulich
2025-10-09  9:21         ` Oleksii Kurochko
2025-10-09 12:06           ` Jan Beulich
2025-10-10  8:29             ` Oleksii Kurochko
2025-09-17 21:55 ` [PATCH v4 13/18] xen/riscv: implement p2m_next_level() Oleksii Kurochko
2025-09-22 17:35   ` Jan Beulich
2025-09-29 14:23     ` Oleksii Kurochko
2025-09-17 21:55 ` [PATCH v4 14/18] xen/riscv: Implement superpage splitting for p2m mappings Oleksii Kurochko
2025-09-22 17:55   ` Jan Beulich
2025-09-17 21:55 ` [PATCH v4 15/18] xen/riscv: implement put_page() Oleksii Kurochko
2025-09-22 19:54   ` Jan Beulich
2025-09-17 21:55 ` [PATCH v4 16/18] xen/riscv: implement mfn_valid() and page reference, ownership handling helpers Oleksii Kurochko
2025-09-22 20:02   ` Jan Beulich
2025-09-17 21:55 ` [PATCH v4 17/18] xen/riscv: add support of page lookup by GFN Oleksii Kurochko
2025-09-22 20:46   ` Jan Beulich
2025-09-30 15:37     ` Oleksii Kurochko
2025-10-07 13:14       ` Jan Beulich
2025-09-17 21:55 ` [PATCH v4 18/18] xen/riscv: introduce metadata table to store P2M type Oleksii Kurochko
2025-09-22 22:41   ` Jan Beulich
2025-10-01 16:00     ` Oleksii Kurochko
2025-10-07 13:25       ` Jan Beulich
2025-10-09 11:34         ` Oleksii Kurochko
2025-10-09 12:10           ` Jan Beulich
2025-10-10  8:42             ` Oleksii Kurochko
2025-10-10 11:00               ` Jan Beulich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b60c5228-d7da-4b8e-b12b-3fe26825759b@gmail.com \
    --to=oleksii.kurochko@gmail.com \
    --cc=alistair.francis@wdc.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=anthony.perard@vates.tech \
    --cc=bobbyeshleman@gmail.com \
    --cc=connojdavis@gmail.com \
    --cc=jbeulich@suse.com \
    --cc=julien@xen.org \
    --cc=michal.orzel@amd.com \
    --cc=roger.pau@citrix.com \
    --cc=sstabellini@kernel.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).