From: oleksii.kurochko@gmail.com
To: Jan Beulich <jbeulich@suse.com>
Cc: Alistair Francis <alistair.francis@wdc.com>,
Bob Eshleman <bobbyeshleman@gmail.com>,
Connor Davis <connojdavis@gmail.com>,
Andrew Cooper <andrew.cooper3@citrix.com>,
Julien Grall <julien@xen.org>,
Stefano Stabellini <sstabellini@kernel.org>,
xen-devel@lists.xenproject.org
Subject: Re: [PATCH v2 1/3] xen/riscv: introduce setup_mm()
Date: Wed, 30 Oct 2024 17:50:32 +0100 [thread overview]
Message-ID: <76fc4999eded2ce74fe73bc81998e92147cf802a.camel@gmail.com> (raw)
In-Reply-To: <8ec6463e-40a8-4d60-b4c2-ea964a06c572@suse.com>
On Wed, 2024-10-30 at 11:25 +0100, Jan Beulich wrote:
> On 23.10.2024 17:50, Oleksii Kurochko wrote:
> > Introduce the implementation of setup_mm(), which includes:
> > 1. Adding all free regions to the boot allocator, as memory is
> > needed
> > to allocate page tables used for frame table mapping.
> > 2. Calculating RAM size and the RAM end address.
> > 3. Setting up direct map mappings from each RAM bank and initialize
> > directmap_virt_start (also introduce XENHEAP_VIRT_START which is
> > defined as directmap_virt_start) to be properly aligned with RAM
> > start to use more superpages to reduce pressure on the TLB.
> > 4. Setting up frame table mappings from physical address 0 to
> > ram_end
> > to simplify mfn_to_page() and page_to_mfn() conversions.
> > 5. Setting up total_pages and max_page.
> >
> > Update virt_to_maddr() to use introduced XENHEAP_VIRT_START.
> >
> > Implement maddr_to_virt() function to convert a machine address
> > to a virtual address. This function is specifically designed to be
> > used
> > only for the DIRECTMAP region, so a check has been added to ensure
> > that
> > the address does not exceed DIRECTMAP_SIZE.
>
> I'm unconvinced by this. Conceivably the function could be used on
> "imaginary" addresses, just to calculate abstract positions or e.g.
> deltas. At the same time I'm also not going to insist on the removal
> of
> that assertion, so long as it doesn't trigger.
>
> > After the introduction of maddr_to_virt() the following linkage
> > error starts
> > to occur and to avoid it share_xen_page_with_guest() stub is added:
> > riscv64-linux-gnu-ld: prelink.o: in function `tasklet_kill':
> > /build/xen/common/tasklet.c:176: undefined reference to
> > `share_xen_page_with_guest'
> > riscv64-linux-gnu-ld: ./.xen-syms.0: hidden symbol
> > `share_xen_page_with_guest'
> > isn't defined riscv64-linux-gnu-ld: final link failed: bad
> > value
> >
> > Despite the linkger fingering tasklet.c, it's trace.o which has the
> > undefined
> > refenrece:
> > $ find . -name \*.o | while read F; do nm $F | grep
> > share_xen_page_with_guest &&
> > echo $F; done
> > U share_xen_page_with_guest
> > ./xen/common/built_in.o
> > U share_xen_page_with_guest
> > ./xen/common/trace.o
> > U share_xen_page_with_guest
> > ./xen/prelink.o
> >
> > Looking at trace.i, there is call of share_xen_page_with_guest()
> > but in case of
> > when maddr_to_virt() is defined as "return NULL" compiler optimizes
> > the part of
> > common/trace.c code where share_xen_page_with_priviliged_guest() is
> > called
> > ( there is no any code in dissambled common/trace.o ) so there is
> > no real call
> > of share_xen_page_with_priviliged_guest().
>
> I don't think it's the "return NULL", but rather BUG_ON()'s (really
> BUG()'s)
> unreachable(). Not the least because the function can't validly
> return NULL,
> and hence callers have no need to check for NULL.
>
> > @@ -25,8 +27,11 @@
> >
> > static inline void *maddr_to_virt(paddr_t ma)
> > {
> > - BUG_ON("unimplemented");
> > - return NULL;
> > + unsigned long va_offset = maddr_to_directmapoff(ma);
> > +
> > + ASSERT(va_offset < DIRECTMAP_SIZE);
> > +
> > + return (void *)(XENHEAP_VIRT_START + va_offset);
> > }
>
> I'm afraid I'm not following why this uses XENHEAP_VIRT_START, when
> it's all about the directmap. I'm in trouble with XENHEAP_VIRT_START
> in the first place: You don't have a separate "heap" virtual address
> range, do you?
The name may not be ideal for RISC-V. I borrowed it from Arm, intending
to account for cases where the directmap virtual start might not align
with DIRECTMAP_VIRT_START due to potential adjustments for superpage
mapping.
And my understanding is that XENHEAP == DIRECTMAP in case of Arm64.
Let's discuss below whether XENHEAP_VIRT_START is necessary, as there
are related questions connected to it.
>
> > @@ -37,9 +42,9 @@ static inline void *maddr_to_virt(paddr_t ma)
> > */
> > static inline unsigned long virt_to_maddr(unsigned long va)
> > {
> > - if ((va >= DIRECTMAP_VIRT_START) &&
> > + if ((va >= XENHEAP_VIRT_START) &&
> > (va < (DIRECTMAP_VIRT_START + DIRECTMAP_SIZE)))
> > - return directmapoff_to_maddr(va - DIRECTMAP_VIRT_START);
> > + return directmapoff_to_maddr(va - XENHEAP_VIRT_START);
>
> Same concern here then.
>
> > @@ -423,3 +424,123 @@ void * __init early_fdt_map(paddr_t
> > fdt_paddr)
> >
> > return fdt_virt;
> > }
> > +
> > +#ifndef CONFIG_RISCV_32
>
> I'd like to ask that you be more selective with this #ifdef (or omit
> it
> altogether here). setup_mm() itself, for example, looks good for any
> mode.
Regarding setup_mm() as they have pretty different implementations for
32 and 64 bit versions.
> Like does ...
>
> > +#define ROUNDDOWN(addr, size) ((addr) & ~((size) - 1))
>
> ... this #define. Then again this macro may better be placed in
> xen/macros.h anyway, next to ROUNDUP().
I will put it there. It was put in arch specific code as for such long
existence of Xen project no one introduce that so I decided that it is
only one specific case thereby no real need to go to common.
>
> > + frametable_size = ROUNDUP(frametable_size, MB(2));
> > + base_mfn = alloc_boot_pages(frametable_size >> PAGE_SHIFT,
> > PFN_DOWN(MB(2)));
>
> The 2Mb aspect wants a (brief) comment, imo.
>
> > + if ( map_pages_to_xen(FRAMETABLE_VIRT_START, base_mfn,
> > + PFN_DOWN(frametable_size),
> > + PAGE_HYPERVISOR_RW) )
> > + panic("Unable to setup the frametable mappings\n");
> > +
> > + memset(&frame_table[0], 0, nr_mfns * sizeof(struct
> > page_info));
> > + memset(&frame_table[nr_mfns], -1,
> > + frametable_size - (nr_mfns * sizeof(struct
> > page_info)));
>
> Here (see comments on v1) you're still assuming ps == 0.
Do you refer to ?
```
> +/* Map a frame table to cover physical addresses ps through pe */
> +static void __init setup_frametable_mappings(paddr_t ps, paddr_t pe)
> +{
> + unsigned long nr_mfns = mfn_x(mfn_add(maddr_to_mfn(pe), -1)) -
This looks to be accounting for a partial page at the end.
> + mfn_x(maddr_to_mfn(ps)) + 1;
Whereas this doesn't do the same at the start. The sole present caller
passes 0, so that's going to be fine for the time being. Yet it's a
latent pitfall. I'd recommend to either drop the function parameter, or
to deal with it correctly right away.
```
And I've added aligned_ps to cover the case that ps could be not page
aligned.
Or are you refering to 0 in memset(&frame_table[0],...)?
>
> > +/* Map the region in the directmap area. */
> > +static void __init setup_directmap_mappings(unsigned long
> > base_mfn,
> > + unsigned long nr_mfns)
> > +{
> > + int rc;
> > +
> > + /* First call sets the directmap physical and virtual offset.
> > */
> > + if ( mfn_eq(directmap_mfn_start, INVALID_MFN) )
> > + {
> > + directmap_mfn_start = _mfn(base_mfn);
> > +
> > + /*
> > + * The base address may not be aligned to the second level
> > + * size (e.g. 1GB when using 4KB pages). This would
> > prevent
> > + * superpage mappings for all the regions because the
> > virtual
> > + * address and machine address should both be suitably
> > aligned.
> > + *
> > + * Prevent that by offsetting the start of the directmap
> > virtual
> > + * address.
> > + */
> > + directmap_virt_start = DIRECTMAP_VIRT_START +
> > pfn_to_paddr(base_mfn);
>
> Don't you need to mask off top bits of the incoming MFN here, or else
> you
> may waste a huge part of direct map space?
Yes, it will result in a loss of direct map space, but we still have a
considerable amount available in Sv39 mode and higher modes. The
largest RAM_START I see currently is 0x1000000000, which means we would
lose 68 GB. However, our DIRECTMAP_SIZE is 308 GB, so there is still
plenty of free space available, and we can always increase
DIRECTMAP_SIZE since we have a lot of free virtual address space in
Sv39.
That said, I’m not insisting on this approach.
My suggestion was to handle the addition and subtraction of
directmap_mfn_start in maddr_to_virt() and virt_to_maddr():
```
+extern mfn_t directmap_mfn_start;
extern vaddr_t directmap_virt_start;
#define pfn_to_paddr(pfn) ((paddr_t)(pfn) << PAGE_SHIFT)
@@ -31,7 +32,7 @@ static inline void *maddr_to_virt(paddr_t ma)
ASSERT(va_offset < DIRECTMAP_SIZE);
- return (void *)(XENHEAP_VIRT_START + va_offset);
+ return (void *)(XENHEAP_VIRT_START -
(mfn_to_maddr(directmap_mfn_start)) + va_offset);
}
/*
@@ -44,7 +45,7 @@ static inline unsigned long virt_to_maddr(unsigned
long va)
{
if ((va >= XENHEAP_VIRT_START) &&
(va < (DIRECTMAP_VIRT_START + DIRECTMAP_SIZE)))
- return directmapoff_to_maddr(va - XENHEAP_VIRT_START);
+ return directmapoff_to_maddr(va - XENHEAP_VIRT_START +
mfn_to_maddr(directmap_mfn_start));
BUILD_BUG_ON(XEN_VIRT_SIZE != MB(2));
ASSERT((va >> (PAGETABLE_ORDER + PAGE_SHIFT)) ==
diff --git a/xen/arch/riscv/mm.c b/xen/arch/riscv/mm.c
index 262cec811e..7ef9db2363 100644
--- a/xen/arch/riscv/mm.c
+++ b/xen/arch/riscv/mm.c
@@ -450,7 +450,7 @@ static void __init
setup_frametable_mappings(paddr_t ps, paddr_t pe)
}
-static mfn_t __ro_after_init directmap_mfn_start =
INVALID_MFN_INITIALIZER;
+mfn_t __ro_after_init directmap_mfn_start = INVALID_MFN_INITIALIZER;
vaddr_t __ro_after_init directmap_virt_start;
/* Map the region in the directmap area. */
@@ -462,6 +462,8 @@ static void __init
setup_directmap_mappings(unsigned long base_mfn,
/* First call sets the directmap physical and virtual offset. */
if ( mfn_eq(directmap_mfn_start, INVALID_MFN) )
{
+ unsigned long mfn_gb = base_mfn & ~XEN_PT_LEVEL_SIZE(2);
+
directmap_mfn_start = _mfn(base_mfn);
/*
@@ -473,7 +475,8 @@ static void __init
setup_directmap_mappings(unsigned long base_mfn,
* Prevent that by offsetting the start of the directmap
virtual
* address.
*/
- directmap_virt_start = DIRECTMAP_VIRT_START +
pfn_to_paddr(base_mfn);
+ directmap_virt_start = DIRECTMAP_VIRT_START +
+ (base_mfn - mfn_gb) * PAGE_SIZE; /*+
pfn_to_paddr(base_mfn)*/;
```
Finally, regarding masking off the top bits of mfn, I'm not entirely
clear on how this should work. If I understand correctly, if I mask off
certain top bits in mfn, then I would need to unmask those same top
bits in maddr_to_virt() and virt_to_maddr(). Is that correct?
Another point I’m unclear on is which specific part of the top bits
should be masked.
If you could explain this to me, I would really appreciate it, and I'll
be happy to use the masking approach.
>
> > +}
> > +
> > +/*
> > + * Setup memory management
> > + *
> > + * RISC-V 64 has a large virtual address space (the minimum
> > supported
> > + * MMU mode is Sv39, which provides TBs of VA space).
>
> Is it really TBs? According to my math you'd need more than 40 bits
> to
> map a single Tb (alongside other stuff).
I accidentally calculated it as the first 40 bits (from bits 0 to 39)
due to the "39" in Sv39. However, in reality, it’s actually 39 bits
(from bits 0 to 38), so it represents less than TBs, only GBs of
virtual address space.
>
> > + */
> > +void __init setup_mm(void)
> > +{
> > + const struct membanks *banks = bootinfo_get_mem();
> > + paddr_t ram_start = INVALID_PADDR;
> > + paddr_t ram_end = 0;
> > + paddr_t ram_size = 0;
> > + unsigned int i;
> > +
> > + /*
> > + * We need some memory to allocate the page-tables used for
> > the directmap
> > + * mappings. But some regions may contain memory already
> > allocated
> > + * for other uses (e.g. modules, reserved-memory...).
> > + *
> > + * For simplicity, add all the free regions in the boot
> > allocator.
> > + */
> > + populate_boot_allocator();
> > +
> > + total_pages = 0;
> > +
> > + for ( i = 0; i < banks->nr_banks; i++ )
> > + {
> > + const struct membank *bank = &banks->bank[i];
> > + paddr_t bank_end = bank->start + bank->size;
> > +
> > + ram_size += ROUNDDOWN(bank->size, PAGE_SIZE);
>
> As before - if a bank doesn't cover full pages, this may give the
> impression
> of there being more "total pages" than there are.
Since it rounds down to PAGE_SIZE, if ram_start is 2K and the total
size of a bank is 11K, ram_size will end up being 8K, so the "total
pages" will cover less RAM than the actual size of the RAM bank.
>
> > + ram_start = min(ram_start, bank->start);
> > + ram_end = max(ram_end, bank_end);
> > +
> > + setup_directmap_mappings(PFN_DOWN(bank->start),
> > + PFN_DOWN(bank->size));
>
> Similarly I don't think this is right when both start and size aren't
> multiple of PAGE_SIZE. You may map an unsuable partial page at the
> start,
> and then fail to map a fully usable page at the end.
ram_size should be a multiple of PAGE_SIZE because we have:
ram_size += ROUNDDOWN(bank->size, PAGE_SIZE);
Do you know of any examples where bank->start isn't aligned to
PAGE_SIZE? Should be somewhere mentioned what is legal physical address
for RAM start? If it’s not PAGE_SIZE-aligned, then it seems we have no
choice but to use ALIGNUP(..., PAGE_SIZE), which would mean losing part
of the bank.
~ Oleksii
next prev parent reply other threads:[~2024-10-30 16:50 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-10-23 15:50 [PATCH v2 0/3] Setup memory management for RISC-V Oleksii Kurochko
2024-10-23 15:50 ` [PATCH v2 1/3] xen/riscv: introduce setup_mm() Oleksii Kurochko
2024-10-30 10:25 ` Jan Beulich
2024-10-30 16:50 ` oleksii.kurochko [this message]
2024-10-31 9:08 ` Jan Beulich
2024-10-31 13:19 ` oleksii.kurochko
2024-10-31 14:28 ` Jan Beulich
2024-10-23 15:50 ` [PATCH v2 2/3] xen/riscv: initialize the VMAP_DEFAULT virtual range Oleksii Kurochko
2024-10-23 15:50 ` [PATCH v2 3/3] xen/riscv: finalize boot allocator and transition to boot state Oleksii Kurochko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=76fc4999eded2ce74fe73bc81998e92147cf802a.camel@gmail.com \
--to=oleksii.kurochko@gmail.com \
--cc=alistair.francis@wdc.com \
--cc=andrew.cooper3@citrix.com \
--cc=bobbyeshleman@gmail.com \
--cc=connojdavis@gmail.com \
--cc=jbeulich@suse.com \
--cc=julien@xen.org \
--cc=sstabellini@kernel.org \
--cc=xen-devel@lists.xenproject.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.