Re: [PATCH v4 17/18] xen/riscv: add support of page lookup by GFN

xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed

From: Oleksii Kurochko <oleksii.kurochko@gmail.com>
To: Jan Beulich <jbeulich@suse.com>
Cc: "Alistair Francis" <alistair.francis@wdc.com>,
	"Bob Eshleman" <bobbyeshleman@gmail.com>,
	"Connor Davis" <connojdavis@gmail.com>,
	"Andrew Cooper" <andrew.cooper3@citrix.com>,
	"Anthony PERARD" <anthony.perard@vates.tech>,
	"Michal Orzel" <michal.orzel@amd.com>,
	"Julien Grall" <julien@xen.org>,
	"Roger Pau Monné" <roger.pau@citrix.com>,
	"Stefano Stabellini" <sstabellini@kernel.org>,
	xen-devel@lists.xenproject.org
Subject: Re: [PATCH v4 17/18] xen/riscv: add support of page lookup by GFN
Date: Tue, 30 Sep 2025 17:37:25 +0200	[thread overview]
Message-ID: <577daeb5-d43f-4172-9a3b-2062c01b8d45@gmail.com> (raw)
In-Reply-To: <854ff53f-5af0-43bf-83b0-8e1e1e0deefc@suse.com>

[-- Attachment #1: Type: text/plain, Size: 7854 bytes --]


On 9/22/25 10:46 PM, Jan Beulich wrote:
> On 17.09.2025 23:55, Oleksii Kurochko wrote:
>> --- a/xen/arch/riscv/p2m.c
>> +++ b/xen/arch/riscv/p2m.c
>> @@ -978,3 +978,189 @@ int map_regions_p2mt(struct domain *d,
>>   
>>       return rc;
>>   }
>> +
>> +
> Nit: No double blank lines please.
>
>> +/*
>> + * p2m_get_entry() should always return the correct order value, even if an
>> + * entry is not present (i.e. the GFN is outside the range):
>> + *   [p2m->lowest_mapped_gfn, p2m->max_mapped_gfn]).    (1)
>> + *
>> + * This ensures that callers of p2m_get_entry() can determine what range of
>> + * address space would be altered by a corresponding p2m_set_entry().
>> + * Also, it would help to avoid cost page walks for GFNs outside range (1).
>> + *
>> + * Therefore, this function returns true for GFNs outside range (1), and in
>> + * that case the corresponding level is returned via the level_out argument.
>> + * Otherwise, it returns false and p2m_get_entry() performs a page walk to
>> + * find the proper entry.
>> + */
>> +static bool check_outside_boundary(gfn_t gfn, gfn_t boundary, bool is_lower,
>> +                                   unsigned int *level_out)
>> +{
>> +    unsigned int level;
>> +
>> +    if ( (is_lower && gfn_x(gfn) < gfn_x(boundary)) ||
>> +         (!is_lower && gfn_x(gfn) > gfn_x(boundary)) )
> I understand people write things this way, but personally I find it confusing
> to read. Why not simply use a conditional operator here (and again below):
>
>      if ( is_lower ? gfn_x(gfn) < gfn_x(boundary)
>                    : gfn_x(gfn) > gfn_x(boundary) )

I am okay with both options. If you think the second one is more readable then I
will use it.

>> +    {
>> +        for ( level = P2M_ROOT_LEVEL; level; level-- )
>> +        {
>> +            unsigned long mask = PFN_DOWN(P2M_LEVEL_MASK(level));
> Don't you need to accumulate the mask to use across loop iterations here
> (or calculate it accordingly)? Else ...
>
>> +            if ( (is_lower && ((gfn_x(gfn) & mask) < gfn_x(boundary))) ||
>> +                 (!is_lower && ((gfn_x(gfn) & mask) > gfn_x(boundary))) )
> ... here you'll compare some middle part of the original GFN against the
> boundary.

Agree, accumulation of the mask should be done here.

>> +            {
>> +                *level_out = level;
>> +                return true;
>> +            }
>> +        }
>> +    }
>> +
>> +    return false;
>> +}
>> +
>> +/*
>> + * Get the details of a given gfn.
>> + *
>> + * If the entry is present, the associated MFN will be returned and the
>> + * p2m type of the mapping.
>> + * The page_order will correspond to the order of the mapping in the page
>> + * table (i.e it could be a superpage).
>> + *
>> + * If the entry is not present, INVALID_MFN will be returned and the
>> + * page_order will be set according to the order of the invalid range.
>> + *
>> + * valid will contain the value of bit[0] (e.g valid bit) of the
>> + * entry.
>> + */
>> +static mfn_t p2m_get_entry(struct p2m_domain *p2m, gfn_t gfn,
>> +                           p2m_type_t *t,
>> +                           unsigned int *page_order,
>> +                           bool *valid)
>> +{
>> +    unsigned int level = 0;
>> +    pte_t entry, *table;
>> +    int rc;
>> +    mfn_t mfn = INVALID_MFN;
>> +    DECLARE_OFFSETS(offsets, gfn_to_gaddr(gfn));
>> +
>> +    ASSERT(p2m_is_locked(p2m));
>> +
>> +    if ( valid )
>> +        *valid = false;
> Wouldn't you better similarly set *t to some "default" value?

I think it makes sense. I will set it to p2m_invalid.

>> +    if ( check_outside_boundary(gfn, p2m->lowest_mapped_gfn, true, &level) )
>> +        goto out;
>> +
>> +    if ( check_outside_boundary(gfn, p2m->max_mapped_gfn, false, &level) )
>> +        goto out;
>> +
>> +    table = p2m_get_root_pointer(p2m, gfn);
>> +
>> +    /*
>> +     * The table should always be non-NULL because the gfn is below
>> +     * p2m->max_mapped_gfn and the root table pages are always present.
>> +     */
>> +    if ( !table )
>> +    {
>> +        ASSERT_UNREACHABLE();
>> +        level = P2M_ROOT_LEVEL;
>> +        goto out;
>> +    }
>> +
>> +    for ( level = P2M_ROOT_LEVEL; level; level-- )
>> +    {
>> +        rc = p2m_next_level(p2m, false, level, &table, offsets[level]);
>> +        if ( (rc == P2M_TABLE_MAP_NONE) || (rc == P2M_TABLE_MAP_NOMEM) )
>> +            goto out_unmap;
> Getting back P2M_TABLE_MAP_NOMEM here is a bug, not really a loop exit
> condition.

Oh, I agree. With the second argument set to|false|,|rc = P2M_TABLE_MAP_NOMEM|
will never be returned, so it can simply be dropped.


>
>> +        if ( rc != P2M_TABLE_NORMAL )
>> +            break;
>> +    }
>> +
>> +    entry = table[offsets[level]];
>> +
>> +    if ( pte_is_valid(entry) )
>> +    {
>> +        if ( t )
>> +            *t = p2m_get_type(entry);
>> +
>> +        mfn = pte_get_mfn(entry);
>> +        /*
>> +         * The entry may point to a superpage. Find the MFN associated
>> +         * to the GFN.
>> +         */
>> +        mfn = mfn_add(mfn,
>> +                      gfn_x(gfn) & (BIT(P2M_LEVEL_ORDER(level), UL) - 1));
> May want to assert that the respective bits of "mfn" are actually clear
> before this calculation.

ASSERT(!(mfn & (BIT(P2M_LEVEL_ORDER(level), UL) - 1)));
Do you mean something like that?

I am not 100% sure that there is really need for that as page-fault exception
is raised if the PA is insufficienlty aligned:
  Any level of PTE may be a leaf PTE, so in addition to 4 KiB pages, Sv39 supports
  2 MiB megapages and 1 GiB gigapages, each of which must be virtually and
  physically aligned to a boundary equal to its size. A page-fault exception is
  raised if the physical address is insufficiently aligned.

>
>> +        if ( valid )
>> +            *valid = pte_is_valid(entry);
>> +    }
>> +
>> + out_unmap:
>> +    unmap_domain_page(table);
>> +
>> + out:
>> +    if ( page_order )
>> +        *page_order = P2M_LEVEL_ORDER(level);
>> +
>> +    return mfn;
>> +}
>> +
>> +static mfn_t p2m_lookup(struct p2m_domain *p2m, gfn_t gfn, p2m_type_t *t)
>> +{
>> +    mfn_t mfn;
>> +
>> +    p2m_read_lock(p2m);
>> +    mfn = p2m_get_entry(p2m, gfn, t, NULL, NULL);
> Seeing the two NULLs here I wonder: What use is the "valid" parameter of that
> function?

`valid` parameter isn't really needed anymore. It was needed when I had a copy
of `valid` bit with real (in PTE) valid bit set to 0 to track which one pages
are used.
I will drop `valid` parameter.

> And what use is the function here when it doesn't also return the
> order?

It could be used for gfn_to_mfn(), but p2m_get_entry() could be used too, just
not need to forget to wrap by p2m_read_(un)lock() each time when p2m_get_entry()
is used. Probably, it makes sense to put p2m_read_(un)lock() inside p2m_get_entry().
I think we can leave only p2m_get_entry() and drop p2m_lookup().

>   IOW I'm not sure having this helper is actually worthwhile. This is
> even more so that ...
>> +    p2m_read_unlock(p2m);
>> +
>> +    return mfn;
>> +}
>> +
>> +struct page_info *p2m_get_page_from_gfn(struct p2m_domain *p2m, gfn_t gfn,
>> +                                        p2m_type_t *t)
>> +{
>> +    struct page_info *page;
>> +    p2m_type_t p2mt = p2m_invalid;
>> +    mfn_t mfn;
>> +
>> +    p2m_read_lock(p2m);
>> +    mfn = p2m_lookup(p2m, gfn, t);
> ... there's a locking problem here: You cannot acquire a read lock in a
> nested fashion - that's a recipe for a deadlock when between the first
> acquire and the 2nd acquire attempt another CPU tries to acquire the
> lock for writing (which will result in no further readers being allowed
> in). It wasn't all that long ago that in the security team we actually
> audited the code base for the absence of such a pattern.

Oh, missed such case. Thanks for explanation and review.

~ Oleksii

[-- Attachment #2: Type: text/html, Size: 10417 bytes --]

next prev parent reply	other threads:[~2025-09-30 15:37 UTC|newest]

Thread overview: 62+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-17 21:55 [PATCH v4 00/18 for 4.22] xen/riscv: introduce p2m functionality Oleksii Kurochko
2025-09-17 21:55 ` [PATCH v4 01/18] xen/riscv: detect and initialize G-stage mode Oleksii Kurochko
2025-09-18 15:54   ` Jan Beulich
2025-09-24 11:31     ` Oleksii Kurochko
2025-09-24 15:00       ` Oleksii Kurochko
2025-09-25 13:46         ` Jan Beulich
2025-09-26  7:30           ` Oleksii Kurochko
2025-09-17 21:55 ` [PATCH v4 02/18] xen/riscv: introduce VMID allocation and manegement Oleksii Kurochko
2025-09-19 21:26   ` Jan Beulich
2025-09-24 14:25     ` Oleksii Kurochko
2025-09-25 13:53       ` Jan Beulich
2025-09-17 21:55 ` [PATCH v4 03/18] xen/riscv: introduce things necessary for p2m initialization Oleksii Kurochko
2025-09-19 21:43   ` Jan Beulich
2025-09-17 21:55 ` [PATCH v4 04/18] xen/riscv: construct the P2M pages pool for guests Oleksii Kurochko
2025-09-17 21:55 ` [PATCH v4 05/18] xen/riscv: add root page table allocation Oleksii Kurochko
2025-09-19 22:14   ` Jan Beulich
2025-09-24 15:40     ` Oleksii Kurochko
2025-09-25 13:56       ` Jan Beulich
2025-09-17 21:55 ` [PATCH v4 06/18] xen/riscv: introduce pte_{set,get}_mfn() Oleksii Kurochko
2025-09-17 21:55 ` [PATCH v4 07/18] xen/riscv: add new p2m types and helper macros for type classification Oleksii Kurochko
2025-09-19 22:18   ` Jan Beulich
2025-09-17 21:55 ` [PATCH v4 08/18] xen/dom0less: abstract Arm-specific p2m type name for device MMIO mappings Oleksii Kurochko
2025-09-17 21:55 ` [PATCH v4 09/18] xen/riscv: implement function to map memory in guest p2m Oleksii Kurochko
2025-09-19 23:12   ` Jan Beulich
2025-09-17 21:55 ` [PATCH v4 10/18] xen/riscv: implement p2m_set_range() Oleksii Kurochko
2025-09-19 23:36   ` Jan Beulich
2025-09-25 20:08     ` Oleksii Kurochko
2025-09-26  7:07       ` Jan Beulich
2025-09-26  8:58         ` Oleksii Kurochko
2025-10-13 11:59           ` Oleksii Kurochko
2025-09-17 21:55 ` [PATCH v4 11/18] xen/riscv: Implement p2m_free_subtree() and related helpers Oleksii Kurochko
2025-09-19 23:57   ` Jan Beulich
2025-09-26 15:33     ` Oleksii Kurochko
2025-09-28 14:30       ` Jan Beulich
2025-09-17 21:55 ` [PATCH v4 12/18] xen/riscv: Implement p2m_pte_from_mfn() and support PBMT configuration Oleksii Kurochko
2025-09-22 16:28   ` Jan Beulich
2025-09-29 13:30     ` Oleksii Kurochko
2025-10-07 13:09       ` Jan Beulich
2025-10-09  9:21         ` Oleksii Kurochko
2025-10-09 12:06           ` Jan Beulich
2025-10-10  8:29             ` Oleksii Kurochko
2025-09-17 21:55 ` [PATCH v4 13/18] xen/riscv: implement p2m_next_level() Oleksii Kurochko
2025-09-22 17:35   ` Jan Beulich
2025-09-29 14:23     ` Oleksii Kurochko
2025-09-17 21:55 ` [PATCH v4 14/18] xen/riscv: Implement superpage splitting for p2m mappings Oleksii Kurochko
2025-09-22 17:55   ` Jan Beulich
2025-09-17 21:55 ` [PATCH v4 15/18] xen/riscv: implement put_page() Oleksii Kurochko
2025-09-22 19:54   ` Jan Beulich
2025-09-17 21:55 ` [PATCH v4 16/18] xen/riscv: implement mfn_valid() and page reference, ownership handling helpers Oleksii Kurochko
2025-09-22 20:02   ` Jan Beulich
2025-09-17 21:55 ` [PATCH v4 17/18] xen/riscv: add support of page lookup by GFN Oleksii Kurochko
2025-09-22 20:46   ` Jan Beulich
2025-09-30 15:37     ` Oleksii Kurochko [this message]
2025-10-07 13:14       ` Jan Beulich
2025-09-17 21:55 ` [PATCH v4 18/18] xen/riscv: introduce metadata table to store P2M type Oleksii Kurochko
2025-09-22 22:41   ` Jan Beulich
2025-10-01 16:00     ` Oleksii Kurochko
2025-10-07 13:25       ` Jan Beulich
2025-10-09 11:34         ` Oleksii Kurochko
2025-10-09 12:10           ` Jan Beulich
2025-10-10  8:42             ` Oleksii Kurochko
2025-10-10 11:00               ` Jan Beulich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=577daeb5-d43f-4172-9a3b-2062c01b8d45@gmail.com \
    --to=oleksii.kurochko@gmail.com \
    --cc=alistair.francis@wdc.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=anthony.perard@vates.tech \
    --cc=bobbyeshleman@gmail.com \
    --cc=connojdavis@gmail.com \
    --cc=jbeulich@suse.com \
    --cc=julien@xen.org \
    --cc=michal.orzel@amd.com \
    --cc=roger.pau@citrix.com \
    --cc=sstabellini@kernel.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).