From: Oleksii Kurochko <oleksii.kurochko@gmail.com>
To: Jan Beulich <jbeulich@suse.com>
Cc: "Alistair Francis" <alistair.francis@wdc.com>,
"Bob Eshleman" <bobbyeshleman@gmail.com>,
"Connor Davis" <connojdavis@gmail.com>,
"Andrew Cooper" <andrew.cooper3@citrix.com>,
"Anthony PERARD" <anthony.perard@vates.tech>,
"Michal Orzel" <michal.orzel@amd.com>,
"Julien Grall" <julien@xen.org>,
"Roger Pau Monné" <roger.pau@citrix.com>,
"Stefano Stabellini" <sstabellini@kernel.org>,
xen-devel@lists.xenproject.org
Subject: Re: [PATCH v4 17/18] xen/riscv: add support of page lookup by GFN
Date: Tue, 30 Sep 2025 17:37:25 +0200
Message-ID: <577daeb5-d43f-4172-9a3b-2062c01b8d45@gmail.com>
In-Reply-To: <854ff53f-5af0-43bf-83b0-8e1e1e0deefc@suse.com>
On 9/22/25 10:46 PM, Jan Beulich wrote:
> On 17.09.2025 23:55, Oleksii Kurochko wrote:
>> --- a/xen/arch/riscv/p2m.c
>> +++ b/xen/arch/riscv/p2m.c
>> @@ -978,3 +978,189 @@ int map_regions_p2mt(struct domain *d,
>>
>> return rc;
>> }
>> +
>> +
> Nit: No double blank lines please.
>
>> +/*
>> + * p2m_get_entry() should always return the correct order value, even if an
>> + * entry is not present (i.e. the GFN is outside the range):
>> + * [p2m->lowest_mapped_gfn, p2m->max_mapped_gfn]). (1)
>> + *
>> + * This ensures that callers of p2m_get_entry() can determine what range of
>> + * address space would be altered by a corresponding p2m_set_entry().
>> + * Also, it would help to avoid cost page walks for GFNs outside range (1).
>> + *
>> + * Therefore, this function returns true for GFNs outside range (1), and in
>> + * that case the corresponding level is returned via the level_out argument.
>> + * Otherwise, it returns false and p2m_get_entry() performs a page walk to
>> + * find the proper entry.
>> + */
>> +static bool check_outside_boundary(gfn_t gfn, gfn_t boundary, bool is_lower,
>> + unsigned int *level_out)
>> +{
>> + unsigned int level;
>> +
>> + if ( (is_lower && gfn_x(gfn) < gfn_x(boundary)) ||
>> + (!is_lower && gfn_x(gfn) > gfn_x(boundary)) )
> I understand people write things this way, but personally I find it confusing
> to read. Why not simply use a conditional operator here (and again below):
>
> if ( is_lower ? gfn_x(gfn) < gfn_x(boundary)
> : gfn_x(gfn) > gfn_x(boundary) )
I am okay with both options. If you think the second one is more readable then I
will use it.
>> + {
>> + for ( level = P2M_ROOT_LEVEL; level; level-- )
>> + {
>> + unsigned long mask = PFN_DOWN(P2M_LEVEL_MASK(level));
> Don't you need to accumulate the mask to use across loop iterations here
> (or calculate it accordingly)? Else ...
>
>> + if ( (is_lower && ((gfn_x(gfn) & mask) < gfn_x(boundary))) ||
>> + (!is_lower && ((gfn_x(gfn) & mask) > gfn_x(boundary))) )
> ... here you'll compare some middle part of the original GFN against the
> boundary.
Agreed, the mask should be accumulated across iterations here.
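For reference, a minimal standalone sketch of what accumulating the mask could look like. It assumes 9 index bits per level and a root level of 2 (an Sv39-like layout), and it uses hypothetical helpers level_mask() and accumulated_mask() instead of the real P2M_LEVEL_MASK() macro, so treat it as an illustration rather than the code that will land in the next version:

```c
/* Assumptions for illustration only: 9 index bits per level, root level 2. */
#define LEVEL_BITS 9U
#define ROOT_LEVEL 2U

/* GFN bits that index the page table at one single level. */
static unsigned long level_mask(unsigned int level)
{
    return ((1UL << LEVEL_BITS) - 1) << (level * LEVEL_BITS);
}

/*
 * GFN bits that index every table from the root down to 'level'.
 * Returns 0 if 'level' is above the root level.
 */
static unsigned long accumulated_mask(unsigned int level)
{
    unsigned long mask = 0;
    unsigned int l;

    /* Visits ROOT_LEVEL, ROOT_LEVEL - 1, ..., level (in that order). */
    for ( l = ROOT_LEVEL + 1; l-- > level; )
        mask |= level_mask(l);

    return mask;
}
```

Under these assumptions, gfn & level_mask(1) keeps only bits 9..17 of the GFN (the "middle part" mentioned above), whereas gfn & accumulated_mask(1) keeps bits 9..26, i.e. the GFN rounded down to the start of its level-1 block.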
>> + {
>> + *level_out = level;
>> + return true;
>> + }
>> + }
>> + }
>> +
>> + return false;
>> +}
>> +
>> +/*
>> + * Get the details of a given gfn.
>> + *
>> + * If the entry is present, the associated MFN will be returned and the
>> + * p2m type of the mapping.
>> + * The page_order will correspond to the order of the mapping in the page
>> + * table (i.e it could be a superpage).
>> + *
>> + * If the entry is not present, INVALID_MFN will be returned and the
>> + * page_order will be set according to the order of the invalid range.
>> + *
>> + * valid will contain the value of bit[0] (e.g valid bit) of the
>> + * entry.
>> + */
>> +static mfn_t p2m_get_entry(struct p2m_domain *p2m, gfn_t gfn,
>> + p2m_type_t *t,
>> + unsigned int *page_order,
>> + bool *valid)
>> +{
>> + unsigned int level = 0;
>> + pte_t entry, *table;
>> + int rc;
>> + mfn_t mfn = INVALID_MFN;
>> + DECLARE_OFFSETS(offsets, gfn_to_gaddr(gfn));
>> +
>> + ASSERT(p2m_is_locked(p2m));
>> +
>> + if ( valid )
>> + *valid = false;
> Wouldn't you better similarly set *t to some "default" value?
I think it makes sense. I will set it to p2m_invalid.
>> + if ( check_outside_boundary(gfn, p2m->lowest_mapped_gfn, true, &level) )
>> + goto out;
>> +
>> + if ( check_outside_boundary(gfn, p2m->max_mapped_gfn, false, &level) )
>> + goto out;
>> +
>> + table = p2m_get_root_pointer(p2m, gfn);
>> +
>> + /*
>> + * The table should always be non-NULL because the gfn is below
>> + * p2m->max_mapped_gfn and the root table pages are always present.
>> + */
>> + if ( !table )
>> + {
>> + ASSERT_UNREACHABLE();
>> + level = P2M_ROOT_LEVEL;
>> + goto out;
>> + }
>> +
>> + for ( level = P2M_ROOT_LEVEL; level; level-- )
>> + {
>> + rc = p2m_next_level(p2m, false, level, &table, offsets[level]);
>> + if ( (rc == P2M_TABLE_MAP_NONE) || (rc == P2M_TABLE_MAP_NOMEM) )
>> + goto out_unmap;
> Getting back P2M_TABLE_MAP_NOMEM here is a bug, not really a loop exit
> condition.
Oh, I agree. With the second argument set to `false`, `P2M_TABLE_MAP_NOMEM`
will never be returned, so that check can simply be dropped.
>
>> + if ( rc != P2M_TABLE_NORMAL )
>> + break;
>> + }
>> +
>> + entry = table[offsets[level]];
>> +
>> + if ( pte_is_valid(entry) )
>> + {
>> + if ( t )
>> + *t = p2m_get_type(entry);
>> +
>> + mfn = pte_get_mfn(entry);
>> + /*
>> + * The entry may point to a superpage. Find the MFN associated
>> + * to the GFN.
>> + */
>> + mfn = mfn_add(mfn,
>> + gfn_x(gfn) & (BIT(P2M_LEVEL_ORDER(level), UL) - 1));
> May want to assert that the respective bits of "mfn" are actually clear
> before this calculation.
ASSERT(!(mfn_x(mfn) & (BIT(P2M_LEVEL_ORDER(level), UL) - 1)));
Do you mean something like that?
I am not 100% sure that there is really a need for that, as a page-fault
exception is raised if the PA is insufficiently aligned. The RISC-V privileged
spec says:
  "Any level of PTE may be a leaf PTE, so in addition to 4 KiB pages, Sv39
   supports 2 MiB megapages and 1 GiB gigapages, each of which must be
   virtually and physically aligned to a boundary equal to its size. A
   page-fault exception is raised if the physical address is insufficiently
   aligned."
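To make the question concrete, here is a standalone sketch of the check followed by the offset calculation. Plain unsigned long stands in for mfn_t and gfn_t, and superpage_mfn() is a hypothetical helper that is not part of the patch; in Xen itself the raw value would come from mfn_x():

```c
#include <assert.h>

/*
 * Hypothetical helper: given the base MFN of a mapping of the given order
 * (order 0 being a single 4 KiB page), return the MFN backing 'gfn'.
 */
static unsigned long superpage_mfn(unsigned long base_mfn, unsigned long gfn,
                                   unsigned int order)
{
    unsigned long offset_mask = (1UL << order) - 1;

    /* The low 'order' bits of a superpage's base MFN are expected to be 0. */
    assert(!(base_mfn & offset_mask));

    /* Add the GFN's offset within the (super)page to the base MFN. */
    return base_mfn + (gfn & offset_mask);
}
```

If the base MFN were not aligned, the addition would silently yield an MFN outside the intended superpage, which is what such an assertion would catch in debug builds.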
>
>> + if ( valid )
>> + *valid = pte_is_valid(entry);
>> + }
>> +
>> + out_unmap:
>> + unmap_domain_page(table);
>> +
>> + out:
>> + if ( page_order )
>> + *page_order = P2M_LEVEL_ORDER(level);
>> +
>> + return mfn;
>> +}
>> +
>> +static mfn_t p2m_lookup(struct p2m_domain *p2m, gfn_t gfn, p2m_type_t *t)
>> +{
>> + mfn_t mfn;
>> +
>> + p2m_read_lock(p2m);
>> + mfn = p2m_get_entry(p2m, gfn, t, NULL, NULL);
> Seeing the two NULLs here I wonder: What use is the "valid" parameter of that
> function?
The `valid` parameter isn't really needed anymore. It was needed when I kept a
software copy of the valid bit while the real valid bit (in the PTE) was set to
0, in order to track which pages are used.
I will drop the `valid` parameter.
> And what use is the function here when it doesn't also return the
> order?
It could be used for gfn_to_mfn(), but p2m_get_entry() could be used there too;
one just must not forget to wrap each p2m_get_entry() call in
p2m_read_(un)lock(). It probably makes sense to move p2m_read_(un)lock() inside
p2m_get_entry().
I think we can keep only p2m_get_entry() and drop p2m_lookup().
> IOW I'm not sure having this helper is actually worthwhile. This is
> even more so that ...
>> + p2m_read_unlock(p2m);
>> +
>> + return mfn;
>> +}
>> +
>> +struct page_info *p2m_get_page_from_gfn(struct p2m_domain *p2m, gfn_t gfn,
>> + p2m_type_t *t)
>> +{
>> + struct page_info *page;
>> + p2m_type_t p2mt = p2m_invalid;
>> + mfn_t mfn;
>> +
>> + p2m_read_lock(p2m);
>> + mfn = p2m_lookup(p2m, gfn, t);
> ... there's a locking problem here: You cannot acquire a read lock in a
> nested fashion - that's a recipe for a deadlock when between the first
> acquire and the 2nd acquire attempt another CPU tries to acquire the
> lock for writing (which will result in no further readers being allowed
> in). It wasn't all that long ago that in the security team we actually
> audited the code base for the absence of such a pattern.
Oh, I missed that case. Thanks for the explanation and the review.
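For the record, one way to avoid the nested acquisition, sketched with a toy lock that merely counts read acquisitions. The names toy_p2m, toy_get_entry() and toy_get_page() are made up for this example, and the counter is not Xen's rwlock implementation: the inner helper asserts that the lock is held rather than taking it, and only the outermost caller acquires it.

```c
#include <assert.h>

/* Toy stand-in for a p2m with a read lock; it only tracks nesting depth. */
struct toy_p2m {
    unsigned int read_depth;      /* current number of read acquisitions */
    unsigned int max_read_depth;  /* highest depth observed so far */
    unsigned long mfn_base;       /* toy translation: mfn = gfn + mfn_base */
};

static void toy_read_lock(struct toy_p2m *p2m)
{
    p2m->read_depth++;
    if ( p2m->read_depth > p2m->max_read_depth )
        p2m->max_read_depth = p2m->read_depth;
}

static void toy_read_unlock(struct toy_p2m *p2m)
{
    assert(p2m->read_depth > 0);
    p2m->read_depth--;
}

/* Expects the caller to hold the lock; never acquires it itself. */
static unsigned long toy_get_entry(const struct toy_p2m *p2m,
                                   unsigned long gfn)
{
    assert(p2m->read_depth > 0);
    return gfn + p2m->mfn_base;
}

/* Outermost caller: the only place where the lock is taken. */
static unsigned long toy_get_page(struct toy_p2m *p2m, unsigned long gfn)
{
    unsigned long mfn;

    toy_read_lock(p2m);
    mfn = toy_get_entry(p2m, gfn);
    toy_read_unlock(p2m);

    return mfn;
}
```

Whichever function ends up owning the lock, the point is that it is taken exactly once per call chain, so max_read_depth never exceeds 1 for a single caller.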
~ Oleksii