[PATCH v6 0/9] RISCV device tree mapping


* [PATCH v6 0/9] RISCV device tree mapping
@ 2024-09-02 17:01 Oleksii Kurochko
  2024-09-02 17:01 ` [PATCH v6 1/9] xen/riscv: prevent recursion when ASSERT(), BUG*(), or panic() are called Oleksii Kurochko
                   ` (8 more replies)
  0 siblings, 9 replies; 42+ messages in thread
From: Oleksii Kurochko @ 2024-09-02 17:01 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksii Kurochko, Alistair Francis, Bob Eshleman, Connor Davis,
	Andrew Cooper, Jan Beulich, Julien Grall, Stefano Stabellini

This patch series introduces device tree mapping for RISC-V,
together with the prerequisites it depends on:
- Fixmap mapping
- pmap
- Xen page table processing

---
Changes in v6:
 - Add patch to fix recursion when ASSERT(), BUG*(), panic() are called.
 - Add patch to allow write_atomic() to work with non-scalar types for consistency
   with read_atomic().
 - All other changes are patch specific so please look at the patch.
---
Changes in v5:
 - The following patch was merged to staging:
     [PATCH v3 3/9] xen/riscv: enable CONFIG_HAS_DEVICE_TREE
 - Drop dependency on "RISCV basic exception handling implementation" as
   it was merged to the staging branch.
 - All other changes are patch specific so please look at the patch.
---
Changes in v4:
 - Drop dependency on the common device tree patch series as it was merged to
   staging.
 - Update the cover letter message.
 - All other changes are patch specific so please look at the patch.
---
Changes in v3:
 - Introduce SBI RFENCE extension support.
 - Introduce and initialize pcpu_info[] and __cpuid_to_hartid_map[] and functionality
   to work with these arrays.
 - Make page table handling arch specific instead of trying to make it generic.
 - All other changes are patch specific so please look at the patch.
---
Changes in v2:
 - Update the cover letter message
 - introduce fixmap mapping
 - introduce pmap
 - introduce CONFIG_GENREIC_PT
 - update the use of early_fdt_map() after the MMU is enabled.
---

Oleksii Kurochko (9):
  xen/riscv: prevent recursion when ASSERT(), BUG*(), or panic() are
    called
  xen/riscv: use {read,write}{b,w,l,q}_cpu() to define
    {read,write}_atomic()
  xen/riscv: allow write_atomic() to work with non-scalar types
  xen/riscv: set up fixmap mappings
  xen/riscv: introduce asm/pmap.h header
  xen/riscv: introduce functionality to work with CPU info
  xen/riscv: introduce and initialize SBI RFENCE extension
  xen/riscv: page table handling
  xen/riscv: introduce early_fdt_map()

 xen/arch/riscv/Kconfig                      |   1 +
 xen/arch/riscv/Makefile                     |   2 +
 xen/arch/riscv/include/asm/atomic.h         |  37 +-
 xen/arch/riscv/include/asm/config.h         |  16 +-
 xen/arch/riscv/include/asm/fixmap.h         |  46 +++
 xen/arch/riscv/include/asm/flushtlb.h       |  15 +
 xen/arch/riscv/include/asm/mm.h             |   6 +
 xen/arch/riscv/include/asm/page.h           |  91 +++++
 xen/arch/riscv/include/asm/pmap.h           |  36 ++
 xen/arch/riscv/include/asm/processor.h      |  27 +-
 xen/arch/riscv/include/asm/riscv_encoding.h |   1 +
 xen/arch/riscv/include/asm/sbi.h            |  63 +++
 xen/arch/riscv/include/asm/smp.h            |   9 +
 xen/arch/riscv/mm.c                         | 101 ++++-
 xen/arch/riscv/pt.c                         | 423 ++++++++++++++++++++
 xen/arch/riscv/riscv64/asm-offsets.c        |   2 +
 xen/arch/riscv/riscv64/head.S               |  15 +
 xen/arch/riscv/sbi.c                        | 274 ++++++++++++-
 xen/arch/riscv/setup.c                      |  17 +
 xen/arch/riscv/smp.c                        |  15 +
 xen/arch/riscv/stubs.c                      |   2 +-
 xen/arch/riscv/xen.lds.S                    |   2 +-
 22 files changed, 1166 insertions(+), 35 deletions(-)
 create mode 100644 xen/arch/riscv/include/asm/fixmap.h
 create mode 100644 xen/arch/riscv/include/asm/pmap.h
 create mode 100644 xen/arch/riscv/pt.c
 create mode 100644 xen/arch/riscv/smp.c

-- 
2.46.0



^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v6 1/9] xen/riscv: prevent recursion when ASSERT(), BUG*(), or panic() are called
  2024-09-02 17:01 [PATCH v6 0/9] RISCV device tree mapping Oleksii Kurochko
@ 2024-09-02 17:01 ` Oleksii Kurochko
  2024-09-03 14:19   ` [PATCH] RISCV/shutdown: Implement machine_{halt,restart}() Andrew Cooper
  2024-09-10  9:42   ` [PATCH v6 1/9] xen/riscv: prevent recursion when ASSERT(), BUG*(), or panic() are called Jan Beulich
  2024-09-02 17:01 ` [PATCH v6 2/9] xen/riscv: use {read,write}{b,w,l,q}_cpu() to define {read,write}_atomic() Oleksii Kurochko
                   ` (7 subsequent siblings)
  8 siblings, 2 replies; 42+ messages in thread
From: Oleksii Kurochko @ 2024-09-02 17:01 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksii Kurochko, Alistair Francis, Bob Eshleman, Connor Davis,
	Andrew Cooper, Jan Beulich, Julien Grall, Stefano Stabellini

Implement machine_restart() using printk() to prevent recursion that
occurs when ASSERT(), BUG*(), or panic() are invoked.
All these macros (except panic() which could be called directly)
eventually call panic(), which then calls machine_restart(),
leading to a recursive loop.

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
Changes in v6:
 - new patch.
---
 xen/arch/riscv/stubs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/arch/riscv/stubs.c b/xen/arch/riscv/stubs.c
index 3285d18899..144f1250e1 100644
--- a/xen/arch/riscv/stubs.c
+++ b/xen/arch/riscv/stubs.c
@@ -53,7 +53,7 @@ void domain_set_time_offset(struct domain *d, int64_t time_offset_seconds)
 
 void machine_restart(unsigned int delay_millisecs)
 {
-    BUG_ON("unimplemented");
+    printk("%s: unimplemented\n", __func__);
 }
 
 void machine_halt(void)
-- 
2.46.0




* [PATCH] RISCV/shutdown: Implement machine_{halt,restart}()
  2024-09-02 17:01 ` [PATCH v6 1/9] xen/riscv: prevent recursion when ASSERT(), BUG*(), or panic() are called Oleksii Kurochko
@ 2024-09-03 14:19   ` Andrew Cooper
  2024-09-03 14:23     ` Andrew Cooper
                       ` (2 more replies)
  2024-09-10  9:42   ` [PATCH v6 1/9] xen/riscv: prevent recursion when ASSERT(), BUG*(), or panic() are called Jan Beulich
  1 sibling, 3 replies; 42+ messages in thread
From: Andrew Cooper @ 2024-09-03 14:19 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Oleksii Kurochko

SBI has an API for shutdown so wire it up.  However, the spec does allow the
call not to be implemented, so we have to cope with the call return returning.

There is a reboot-capable SBI extention, but in the short term route route
machine_restart() into machine_halt().

Then, use use machine_halt() rather than an infinite loop at the end of
start_xen().  This avoids the Qemu smoke test needing to wait for the full
timeout in order to succeed.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Oleksii Kurochko <oleksii.kurochko@gmail.com>

As per commit e44f33ccddc2 ("ppc/shutdown: Implement
machine_{halt,restart}()")

Simply replacing BUG() with a printk() is just swapping one problem for
another.
---
 xen/arch/riscv/Makefile          |  1 +
 xen/arch/riscv/include/asm/sbi.h |  3 +++
 xen/arch/riscv/sbi.c             |  5 +++++
 xen/arch/riscv/setup.c           |  6 ++----
 xen/arch/riscv/shutdown.c        | 25 +++++++++++++++++++++++++
 xen/arch/riscv/stubs.c           | 12 ------------
 6 files changed, 36 insertions(+), 16 deletions(-)
 create mode 100644 xen/arch/riscv/shutdown.c

diff --git a/xen/arch/riscv/Makefile b/xen/arch/riscv/Makefile
index 81b77b13d652..d192be7b552a 100644
--- a/xen/arch/riscv/Makefile
+++ b/xen/arch/riscv/Makefile
@@ -4,6 +4,7 @@ obj-y += mm.o
 obj-$(CONFIG_RISCV_64) += riscv64/
 obj-y += sbi.o
 obj-y += setup.o
+obj-y += shutdown.o
 obj-y += stubs.o
 obj-y += traps.o
 obj-y += vm_event.o
diff --git a/xen/arch/riscv/include/asm/sbi.h b/xen/arch/riscv/include/asm/sbi.h
index 0e6820a4eda3..4d72a2295e72 100644
--- a/xen/arch/riscv/include/asm/sbi.h
+++ b/xen/arch/riscv/include/asm/sbi.h
@@ -13,6 +13,7 @@
 #define __ASM_RISCV_SBI_H__
 
 #define SBI_EXT_0_1_CONSOLE_PUTCHAR		0x1
+#define SBI_EXT_0_1_SHUTDOWN			0x8
 
 struct sbiret {
     long error;
@@ -31,4 +32,6 @@ struct sbiret sbi_ecall(unsigned long ext, unsigned long fid,
  */
 void sbi_console_putchar(int ch);
 
+void sbi_shutdown(void);
+
 #endif /* __ASM_RISCV_SBI_H__ */
diff --git a/xen/arch/riscv/sbi.c b/xen/arch/riscv/sbi.c
index 0ae166c8610e..c7984344bc6b 100644
--- a/xen/arch/riscv/sbi.c
+++ b/xen/arch/riscv/sbi.c
@@ -42,3 +42,8 @@ void sbi_console_putchar(int ch)
 {
     sbi_ecall(SBI_EXT_0_1_CONSOLE_PUTCHAR, 0, ch, 0, 0, 0, 0, 0);
 }
+
+void sbi_shutdown(void)
+{
+    sbi_ecall(SBI_EXT_0_1_SHUTDOWN, 0, 0, 0, 0, 0, 0, 0);
+}
diff --git a/xen/arch/riscv/setup.c b/xen/arch/riscv/setup.c
index a6a29a150869..bf9078f36aff 100644
--- a/xen/arch/riscv/setup.c
+++ b/xen/arch/riscv/setup.c
@@ -4,6 +4,7 @@
 #include <xen/compile.h>
 #include <xen/init.h>
 #include <xen/mm.h>
+#include <xen/shutdown.h>
 
 #include <public/version.h>
 
@@ -28,8 +29,5 @@ void __init noreturn start_xen(unsigned long bootcpu_id,
 
     printk("All set up\n");
 
-    for ( ;; )
-        asm volatile ("wfi");
-
-    unreachable();
+    machine_halt();
 }
diff --git a/xen/arch/riscv/shutdown.c b/xen/arch/riscv/shutdown.c
new file mode 100644
index 000000000000..270bb26b68a6
--- /dev/null
+++ b/xen/arch/riscv/shutdown.c
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#include <xen/shutdown.h>
+
+#include <asm/sbi.h>
+
+void machine_halt(void)
+{
+    sbi_shutdown();
+
+    for ( ;; )
+        asm volatile ("wfi");
+
+    unreachable();
+}
+
+void machine_restart(unsigned int delay_millisecs)
+{
+    /*
+     * TODO: mdelay(delay_millisecs)
+     * TODO: Probe for #SRST support, where sbi_system_reset() has a
+     *       shutdown/reboot parameter.
+     */
+
+    machine_halt();
+}
diff --git a/xen/arch/riscv/stubs.c b/xen/arch/riscv/stubs.c
index 3285d1889940..2aa245f272b5 100644
--- a/xen/arch/riscv/stubs.c
+++ b/xen/arch/riscv/stubs.c
@@ -49,18 +49,6 @@ void domain_set_time_offset(struct domain *d, int64_t time_offset_seconds)
     BUG_ON("unimplemented");
 }
 
-/* shutdown.c */
-
-void machine_restart(unsigned int delay_millisecs)
-{
-    BUG_ON("unimplemented");
-}
-
-void machine_halt(void)
-{
-    BUG_ON("unimplemented");
-}
-
 /* domctl.c */
 
 long arch_do_domctl(struct xen_domctl *domctl, struct domain *d,

base-commit: 1e6bb29b03680a9d0e12f14c4d406a0d67317ea7
-- 
2.39.2




* Re: [PATCH] RISCV/shutdown: Implement machine_{halt,restart}()
  2024-09-03 14:19   ` [PATCH] RISCV/shutdown: Implement machine_{halt,restart}() Andrew Cooper
@ 2024-09-03 14:23     ` Andrew Cooper
  2024-09-03 14:27       ` Jan Beulich
  2024-09-03 14:26     ` Jan Beulich
  2024-09-04 10:22     ` oleksii.kurochko
  2 siblings, 1 reply; 42+ messages in thread
From: Andrew Cooper @ 2024-09-03 14:23 UTC (permalink / raw)
  To: Xen-devel; +Cc: Oleksii Kurochko

On 03/09/2024 3:19 pm, Andrew Cooper wrote:
> SBI has an API for shutdown so wire it up.  However, the spec does allow the
> call not to be implemented, so we have to cope with the call return returning.

Sorry, this is supposed to read "... cope with sbi_shutdown() returning."

~Andrew





* Re: [PATCH] RISCV/shutdown: Implement machine_{halt,restart}()
  2024-09-03 14:23     ` Andrew Cooper
@ 2024-09-03 14:27       ` Jan Beulich
  0 siblings, 0 replies; 42+ messages in thread
From: Jan Beulich @ 2024-09-03 14:27 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Oleksii Kurochko, Xen-devel

On 03.09.2024 16:23, Andrew Cooper wrote:
> On 03/09/2024 3:19 pm, Andrew Cooper wrote:
>> SBI has an API for shutdown so wire it up.  However, the spec does allow the
>> call not to be implemented, so we have to cope with the call return returning.
> 
> Sorry, this is supposed to read "... cope with sbi_shutdown() returning."

And then perhaps also ...

>> There is a reboot-capable SBI extention, but in the short term route route
>> machine_restart() into machine_halt().

... one "route" dropped from this sentence?

Jan



* Re: [PATCH] RISCV/shutdown: Implement machine_{halt,restart}()
  2024-09-03 14:19   ` [PATCH] RISCV/shutdown: Implement machine_{halt,restart}() Andrew Cooper
  2024-09-03 14:23     ` Andrew Cooper
@ 2024-09-03 14:26     ` Jan Beulich
  2024-09-03 14:27       ` Andrew Cooper
  2024-09-04 10:22     ` oleksii.kurochko
  2 siblings, 1 reply; 42+ messages in thread
From: Jan Beulich @ 2024-09-03 14:26 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Oleksii Kurochko, Xen-devel

On 03.09.2024 16:19, Andrew Cooper wrote:
> SBI has an API for shutdown so wire it up.  However, the spec does allow the
> call not to be implemented, so we have to cope with the call return returning.
> 
> There is a reboot-capable SBI extention, but in the short term route route
> machine_restart() into machine_halt().
> 
> Then, use use machine_halt() rather than an infinite loop at the end of
> start_xen().  This avoids the Qemu smoke test needing to wait for the full
> timeout in order to succeed.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Acked-by: Jan Beulich <jbeulich@suse.com>
ideally with ...

> --- /dev/null
> +++ b/xen/arch/riscv/shutdown.c
> @@ -0,0 +1,25 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +#include <xen/shutdown.h>
> +
> +#include <asm/sbi.h>
> +
> +void machine_halt(void)
> +{
> +    sbi_shutdown();
> +
> +    for ( ;; )
> +        asm volatile ("wfi");

... the missing blanks added here, as you move that loop around.

Jan



* Re: [PATCH] RISCV/shutdown: Implement machine_{halt,restart}()
  2024-09-03 14:26     ` Jan Beulich
@ 2024-09-03 14:27       ` Andrew Cooper
  0 siblings, 0 replies; 42+ messages in thread
From: Andrew Cooper @ 2024-09-03 14:27 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Oleksii Kurochko, Xen-devel

On 03/09/2024 3:26 pm, Jan Beulich wrote:
> On 03.09.2024 16:19, Andrew Cooper wrote:
>> SBI has an API for shutdown so wire it up.  However, the spec does allow the
>> call not to be implemented, so we have to cope with the call return returning.
>>
>> There is a reboot-capable SBI extention, but in the short term route route
>> machine_restart() into machine_halt().
>>
>> Then, use use machine_halt() rather than an infinite loop at the end of
>> start_xen().  This avoids the Qemu smoke test needing to wait for the full
>> timeout in order to succeed.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Acked-by: Jan Beulich <jbeulich@suse.com>

Thanks.

> ideally with ...
>
>> --- /dev/null
>> +++ b/xen/arch/riscv/shutdown.c
>> @@ -0,0 +1,25 @@
>> +/* SPDX-License-Identifier: GPL-2.0-or-later */
>> +#include <xen/shutdown.h>
>> +
>> +#include <asm/sbi.h>
>> +
>> +void machine_halt(void)
>> +{
>> +    sbi_shutdown();
>> +
>> +    for ( ;; )
>> +        asm volatile ("wfi");
> ... the missing blanks added here, as you move that loop around.

Ah yes.  Will do.

~Andrew



* Re: [PATCH] RISCV/shutdown: Implement machine_{halt,restart}()
  2024-09-03 14:19   ` [PATCH] RISCV/shutdown: Implement machine_{halt,restart}() Andrew Cooper
  2024-09-03 14:23     ` Andrew Cooper
  2024-09-03 14:26     ` Jan Beulich
@ 2024-09-04 10:22     ` oleksii.kurochko
  2 siblings, 0 replies; 42+ messages in thread
From: oleksii.kurochko @ 2024-09-04 10:22 UTC (permalink / raw)
  To: Andrew Cooper, Xen-devel

On Tue, 2024-09-03 at 15:19 +0100, Andrew Cooper wrote:
> SBI has an API for shutdown so wire it up.  However, the spec does allow the
> call not to be implemented, so we have to cope with the call return returning.
> 
> There is a reboot-capable SBI extention, but in the short term route route
> machine_restart() into machine_halt().
> 
> Then, use use machine_halt() rather than an infinite loop at the end of
> start_xen().  This avoids the Qemu smoke test needing to wait for the full
> timeout in order to succeed.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Oleksii Kurochko <oleksii.kurochko@gmail.com>
LGTM:
 Reviewed-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>

Thanks for the patch.

~ Oleksii





* Re: [PATCH v6 1/9] xen/riscv: prevent recursion when ASSERT(), BUG*(), or panic() are called
  2024-09-02 17:01 ` [PATCH v6 1/9] xen/riscv: prevent recursion when ASSERT(), BUG*(), or panic() are called Oleksii Kurochko
  2024-09-03 14:19   ` [PATCH] RISCV/shutdown: Implement machine_{halt,restart}() Andrew Cooper
@ 2024-09-10  9:42   ` Jan Beulich
  2024-09-10 13:55     ` oleksii.kurochko
  1 sibling, 1 reply; 42+ messages in thread
From: Jan Beulich @ 2024-09-10  9:42 UTC (permalink / raw)
  To: Oleksii Kurochko
  Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
	Julien Grall, Stefano Stabellini, xen-devel

On 02.09.2024 19:01, Oleksii Kurochko wrote:
> Implement machine_restart() using printk() to prevent recursion that
> occurs when ASSERT(), BUG*(), or panic() are invoked.
> All these macros (except panic() which could be called directly)
> eventually call panic(), which then calls machine_restart(),
> leading to a recursive loop.

Right, that pretty likely was an oversight. Yet then ...

> --- a/xen/arch/riscv/stubs.c
> +++ b/xen/arch/riscv/stubs.c
> @@ -53,7 +53,7 @@ void domain_set_time_offset(struct domain *d, int64_t time_offset_seconds)
>  
>  void machine_restart(unsigned int delay_millisecs)
>  {
> -    BUG_ON("unimplemented");
> +    printk("%s: unimplemented\n", __func__);
>  }

... you still want to halt execution here, by (re?)adding a for() loop
of the kind you at least had in a few places earlier on. The function
is declared noreturn after all, which you're now violating. I'm
actually surprised the compiler didn't complain to you.

The same is also going to be needed for machine_halt(), btw: As soon
as you get far enough to parse the command line, "noreboot" on the
command line would have crashes end up there, not here.

Jan



* Re: [PATCH v6 1/9] xen/riscv: prevent recursion when ASSERT(), BUG*(), or panic() are called
  2024-09-10  9:42   ` [PATCH v6 1/9] xen/riscv: prevent recursion when ASSERT(), BUG*(), or panic() are called Jan Beulich
@ 2024-09-10 13:55     ` oleksii.kurochko
  0 siblings, 0 replies; 42+ messages in thread
From: oleksii.kurochko @ 2024-09-10 13:55 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
	Julien Grall, Stefano Stabellini, xen-devel

On Tue, 2024-09-10 at 11:42 +0200, Jan Beulich wrote:
> On 02.09.2024 19:01, Oleksii Kurochko wrote:
> > Implement machine_restart() using printk() to prevent recursion that
> > occurs when ASSERT(), BUG*(), or panic() are invoked.
> > All these macros (except panic() which could be called directly)
> > eventually call panic(), which then calls machine_restart(),
> > leading to a recursive loop.
> 
> Right, that pretty likely was an oversight. Yet then ...
> 
> > --- a/xen/arch/riscv/stubs.c
> > +++ b/xen/arch/riscv/stubs.c
> > @@ -53,7 +53,7 @@ void domain_set_time_offset(struct domain *d, int64_t time_offset_seconds)
> >  
> >  void machine_restart(unsigned int delay_millisecs)
> >  {
> > -    BUG_ON("unimplemented");
> > +    printk("%s: unimplemented\n", __func__);
> >  }
> 
> ... you still want to halt execution here, by (re?)adding a for() loop
> of the kind you at least had in a few places earlier on. The function
> is declared noreturn after all, which you're now violating. I'm
> actually surprised the compiler didn't complain to you.
> 
> The same is also going to be needed for machine_halt(), btw: As soon
> as you get far enough to parse the command line, "noreboot" on the
> command line would have crashes end up there, not here.

I will drop this patch in the next version, as Andrew C. has provided a
patch that supersedes it:
https://gitlab.com/xen-project/people/olkur/xen/-/commit/ea6d5a148970a7f8066e51e64fe67a9bd51e3084


~ Oleksii



* [PATCH v6 2/9] xen/riscv: use {read,write}{b,w,l,q}_cpu() to define {read,write}_atomic()
  2024-09-02 17:01 [PATCH v6 0/9] RISCV device tree mapping Oleksii Kurochko
  2024-09-02 17:01 ` [PATCH v6 1/9] xen/riscv: prevent recursion when ASSERT(), BUG*(), or panic() are called Oleksii Kurochko
@ 2024-09-02 17:01 ` Oleksii Kurochko
  2024-09-03 14:21   ` Andrew Cooper
  2024-09-02 17:01 ` [PATCH v6 3/9] xen/riscv: allow write_atomic() to work with non-scalar types Oleksii Kurochko
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 42+ messages in thread
From: Oleksii Kurochko @ 2024-09-02 17:01 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksii Kurochko, Alistair Francis, Bob Eshleman, Connor Davis,
	Andrew Cooper, Jan Beulich, Julien Grall, Stefano Stabellini

The functions {read,write}{b,w,l,q}_cpu() do not need to be memory-ordered
atomic operations in Xen, based on their definitions for other architectures.

Therefore, {read,write}{b,w,l,q}_cpu() can be used instead of
{read,write}{b,w,l,q}(), allowing the caller to decide if additional
fences should be applied before or after {read,write}_atomic().

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
Changes in V6:
 - revert the changes to the _write_atomic() prototype and to the definition of write_atomic().
 - update the commit message.
---
Changes in v5:
 - new patch.
---
 xen/arch/riscv/include/asm/atomic.h | 23 ++++++++++-------------
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/xen/arch/riscv/include/asm/atomic.h b/xen/arch/riscv/include/asm/atomic.h
index 31b91a79c8..3c6bd86406 100644
--- a/xen/arch/riscv/include/asm/atomic.h
+++ b/xen/arch/riscv/include/asm/atomic.h
@@ -31,21 +31,17 @@
 
 void __bad_atomic_size(void);
 
-/*
- * Legacy from Linux kernel. For some reason they wanted to have ordered
- * read/write access. Thereby read* is used instead of read*_cpu()
- */
 static always_inline void read_atomic_size(const volatile void *p,
                                            void *res,
                                            unsigned int size)
 {
     switch ( size )
     {
-    case 1: *(uint8_t *)res = readb(p); break;
-    case 2: *(uint16_t *)res = readw(p); break;
-    case 4: *(uint32_t *)res = readl(p); break;
+    case 1: *(uint8_t *)res = readb_cpu(p); break;
+    case 2: *(uint16_t *)res = readw_cpu(p); break;
+    case 4: *(uint32_t *)res = readl_cpu(p); break;
 #ifndef CONFIG_RISCV_32
-    case 8: *(uint32_t *)res = readq(p); break;
+    case 8: *(uint32_t *)res = readq_cpu(p); break;
 #endif
     default: __bad_atomic_size(); break;
     }
@@ -58,15 +54,16 @@ static always_inline void read_atomic_size(const volatile void *p,
 })
 
 static always_inline void _write_atomic(volatile void *p,
-                                       unsigned long x, unsigned int size)
+                                        unsigned long x,
+                                        unsigned int size)
 {
     switch ( size )
     {
-    case 1: writeb(x, p); break;
-    case 2: writew(x, p); break;
-    case 4: writel(x, p); break;
+    case 1: writeb_cpu(x, p); break;
+    case 2: writew_cpu(x, p); break;
+    case 4: writel_cpu(x, p); break;
 #ifndef CONFIG_RISCV_32
-    case 8: writeq(x, p); break;
+    case 8: writeq_cpu(x, p); break;
 #endif
     default: __bad_atomic_size(); break;
     }
-- 
2.46.0



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v6 2/9] xen/riscv: use {read,write}{b,w,l,q}_cpu() to define {read,write}_atomic()
  2024-09-02 17:01 ` [PATCH v6 2/9] xen/riscv: use {read,write}{b,w,l,q}_cpu() to define {read,write}_atomic() Oleksii Kurochko
@ 2024-09-03 14:21   ` Andrew Cooper
  2024-09-04 10:27     ` oleksii.kurochko
  0 siblings, 1 reply; 42+ messages in thread
From: Andrew Cooper @ 2024-09-03 14:21 UTC (permalink / raw)
  To: Oleksii Kurochko, xen-devel
  Cc: Alistair Francis, Bob Eshleman, Connor Davis, Jan Beulich,
	Julien Grall, Stefano Stabellini

On 02/09/2024 6:01 pm, Oleksii Kurochko wrote:
> diff --git a/xen/arch/riscv/include/asm/atomic.h b/xen/arch/riscv/include/asm/atomic.h
> index 31b91a79c8..3c6bd86406 100644
> --- a/xen/arch/riscv/include/asm/atomic.h
> +++ b/xen/arch/riscv/include/asm/atomic.h
> @@ -31,21 +31,17 @@
>  
>  void __bad_atomic_size(void);
>  
> -/*
> - * Legacy from Linux kernel. For some reason they wanted to have ordered
> - * read/write access. Thereby read* is used instead of read*_cpu()
> - */
>  static always_inline void read_atomic_size(const volatile void *p,
>                                             void *res,
>                                             unsigned int size)
>  {
>      switch ( size )
>      {
> -    case 1: *(uint8_t *)res = readb(p); break;
> -    case 2: *(uint16_t *)res = readw(p); break;
> -    case 4: *(uint32_t *)res = readl(p); break;
> +    case 1: *(uint8_t *)res = readb_cpu(p); break;
> +    case 2: *(uint16_t *)res = readw_cpu(p); break;
> +    case 4: *(uint32_t *)res = readl_cpu(p); break;
>  #ifndef CONFIG_RISCV_32
> -    case 8: *(uint32_t *)res = readq(p); break;
> +    case 8: *(uint32_t *)res = readq_cpu(p); break;

This cast looks suspiciously like it's wrong already in staging...

~Andrew


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v6 2/9] xen/riscv: use {read,write}{b,w,l,q}_cpu() to define {read,write}_atomic()
  2024-09-03 14:21   ` Andrew Cooper
@ 2024-09-04 10:27     ` oleksii.kurochko
  2024-09-04 10:31       ` Andrew Cooper
  0 siblings, 1 reply; 42+ messages in thread
From: oleksii.kurochko @ 2024-09-04 10:27 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: Alistair Francis, Bob Eshleman, Connor Davis, Jan Beulich,
	Julien Grall, Stefano Stabellini

On Tue, 2024-09-03 at 15:21 +0100, Andrew Cooper wrote:
> On 02/09/2024 6:01 pm, Oleksii Kurochko wrote:
> > diff --git a/xen/arch/riscv/include/asm/atomic.h
> > b/xen/arch/riscv/include/asm/atomic.h
> > index 31b91a79c8..3c6bd86406 100644
> > --- a/xen/arch/riscv/include/asm/atomic.h
> > +++ b/xen/arch/riscv/include/asm/atomic.h
> > @@ -31,21 +31,17 @@
> >  
> >  void __bad_atomic_size(void);
> >  
> > -/*
> > - * Legacy from Linux kernel. For some reason they wanted to have
> > ordered
> > - * read/write access. Thereby read* is used instead of read*_cpu()
> > - */
> >  static always_inline void read_atomic_size(const volatile void *p,
> >                                             void *res,
> >                                             unsigned int size)
> >  {
> >      switch ( size )
> >      {
> > -    case 1: *(uint8_t *)res = readb(p); break;
> > -    case 2: *(uint16_t *)res = readw(p); break;
> > -    case 4: *(uint32_t *)res = readl(p); break;
> > +    case 1: *(uint8_t *)res = readb_cpu(p); break;
> > +    case 2: *(uint16_t *)res = readw_cpu(p); break;
> > +    case 4: *(uint32_t *)res = readl_cpu(p); break;
> >  #ifndef CONFIG_RISCV_32
> > -    case 8: *(uint32_t *)res = readq(p); break;
> > +    case 8: *(uint32_t *)res = readq_cpu(p); break;
> 
> This cast looks suspiciously like it's wrong already in staging...
Thanks for noticing that, it really should be uint64_t. I'll update
that in the next patch version.
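
To illustrate the truncation, here is a minimal sketch for a hosted C
compiler. It is not Xen code: load64() is a hypothetical stand-in for
readq_cpu(), and memcpy() replaces the pointer casts only to keep the
demo well-defined under strict aliasing.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for readq_cpu(): a plain volatile 64-bit load. */
static inline uint64_t load64(const volatile void *p)
{
    return *(const volatile uint64_t *)p;
}

/* Models the current case 8: only a uint32_t worth of data reaches res. */
static inline void read8_as_u32(const volatile void *p, void *res)
{
    uint32_t tmp = (uint32_t)load64(p);

    memcpy(res, &tmp, sizeof(tmp));
}

/* Models the corrected case 8: all 64 bits reach res. */
static inline void read8_as_u64(const volatile void *p, void *res)
{
    uint64_t tmp = load64(p);

    memcpy(res, &tmp, sizeof(tmp));
}

/* Returns 1 when the 32-bit variant lost data and the 64-bit one did not. */
static inline int cast_truncates(void)
{
    uint64_t src = 0x1122334455667788ULL;
    uint64_t narrow = 0, wide = 0;

    read8_as_u32(&src, &narrow);
    read8_as_u64(&src, &wide);
    assert(wide == src);

    return narrow != src;
}
```

With the uint32_t store, half of the 8-byte value never reaches the
caller's buffer, whatever the byte order.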

~ Oleksii


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v6 2/9] xen/riscv: use {read,write}{b,w,l,q}_cpu() to define {read,write}_atomic()
  2024-09-04 10:27     ` oleksii.kurochko
@ 2024-09-04 10:31       ` Andrew Cooper
  2024-09-05 15:45         ` oleksii.kurochko
  0 siblings, 1 reply; 42+ messages in thread
From: Andrew Cooper @ 2024-09-04 10:31 UTC (permalink / raw)
  To: oleksii.kurochko, xen-devel
  Cc: Alistair Francis, Bob Eshleman, Connor Davis, Jan Beulich,
	Julien Grall, Stefano Stabellini

On 04/09/2024 11:27 am, oleksii.kurochko@gmail.com wrote:
> On Tue, 2024-09-03 at 15:21 +0100, Andrew Cooper wrote:
>> On 02/09/2024 6:01 pm, Oleksii Kurochko wrote:
>>> diff --git a/xen/arch/riscv/include/asm/atomic.h
>>> b/xen/arch/riscv/include/asm/atomic.h
>>> index 31b91a79c8..3c6bd86406 100644
>>> --- a/xen/arch/riscv/include/asm/atomic.h
>>> +++ b/xen/arch/riscv/include/asm/atomic.h
>>> @@ -31,21 +31,17 @@
>>>  
>>>  void __bad_atomic_size(void);
>>>  
>>> -/*
>>> - * Legacy from Linux kernel. For some reason they wanted to have
>>> ordered
>>> - * read/write access. Thereby read* is used instead of read*_cpu()
>>> - */
>>>  static always_inline void read_atomic_size(const volatile void *p,
>>>                                             void *res,
>>>                                             unsigned int size)
>>>  {
>>>      switch ( size )
>>>      {
>>> -    case 1: *(uint8_t *)res = readb(p); break;
>>> -    case 2: *(uint16_t *)res = readw(p); break;
>>> -    case 4: *(uint32_t *)res = readl(p); break;
>>> +    case 1: *(uint8_t *)res = readb_cpu(p); break;
>>> +    case 2: *(uint16_t *)res = readw_cpu(p); break;
>>> +    case 4: *(uint32_t *)res = readl_cpu(p); break;
>>>  #ifndef CONFIG_RISCV_32
>>> -    case 8: *(uint32_t *)res = readq(p); break;
>>> +    case 8: *(uint32_t *)res = readq_cpu(p); break;
>> This cast looks suspiciously like it's wrong already in staging...
> Thanks for noticing that, it should be really uint64_t. I'll update
> that in the next patch version.

This bug is in 4.19.

I know RISC-V is experimental, but this is the kind of thing that Jan
might consider for backporting.

Whether it gets backported or not, it wants to be in a standalone
bugfix, not as a part of "rewrite the accessors used".

~Andrew


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v6 2/9] xen/riscv: use {read,write}{b,w,l,q}_cpu() to define {read,write}_atomic()
  2024-09-04 10:31       ` Andrew Cooper
@ 2024-09-05 15:45         ` oleksii.kurochko
  0 siblings, 0 replies; 42+ messages in thread
From: oleksii.kurochko @ 2024-09-05 15:45 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: Alistair Francis, Bob Eshleman, Connor Davis, Jan Beulich,
	Julien Grall, Stefano Stabellini

On Wed, 2024-09-04 at 11:31 +0100, Andrew Cooper wrote:
> On 04/09/2024 11:27 am, oleksii.kurochko@gmail.com wrote:
> > On Tue, 2024-09-03 at 15:21 +0100, Andrew Cooper wrote:
> > > On 02/09/2024 6:01 pm, Oleksii Kurochko wrote:
> > > > diff --git a/xen/arch/riscv/include/asm/atomic.h
> > > > b/xen/arch/riscv/include/asm/atomic.h
> > > > index 31b91a79c8..3c6bd86406 100644
> > > > --- a/xen/arch/riscv/include/asm/atomic.h
> > > > +++ b/xen/arch/riscv/include/asm/atomic.h
> > > > @@ -31,21 +31,17 @@
> > > >  
> > > >  void __bad_atomic_size(void);
> > > >  
> > > > -/*
> > > > - * Legacy from Linux kernel. For some reason they wanted to
> > > > have
> > > > ordered
> > > > - * read/write access. Thereby read* is used instead of
> > > > read*_cpu()
> > > > - */
> > > >  static always_inline void read_atomic_size(const volatile void
> > > > *p,
> > > >                                             void *res,
> > > >                                             unsigned int size)
> > > >  {
> > > >      switch ( size )
> > > >      {
> > > > -    case 1: *(uint8_t *)res = readb(p); break;
> > > > -    case 2: *(uint16_t *)res = readw(p); break;
> > > > -    case 4: *(uint32_t *)res = readl(p); break;
> > > > +    case 1: *(uint8_t *)res = readb_cpu(p); break;
> > > > +    case 2: *(uint16_t *)res = readw_cpu(p); break;
> > > > +    case 4: *(uint32_t *)res = readl_cpu(p); break;
> > > >  #ifndef CONFIG_RISCV_32
> > > > -    case 8: *(uint32_t *)res = readq(p); break;
> > > > +    case 8: *(uint32_t *)res = readq_cpu(p); break;
> > > This cast looks suspiciously like it's wrong already in
> > > staging...
> > Thanks for noticing that, it should be really uint64_t. I'll update
> > that in the next patch version.
> 
> This bug is in 4.19.
> 
> I know RISC-V is experimental, but this is the kind of thing that Jan
> might consider for backporting.
> 
> Whether it gets backported or not, it wants to be in a standalone
> bugfix, not as a part of "rewrite the accessors used".
It makes sense. I will send a separate patch tomorrow.

~ Oleksii



^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v6 3/9] xen/riscv: allow write_atomic() to work with non-scalar types
  2024-09-02 17:01 [PATCH v6 0/9] RISCV device tree mapping Oleksii Kurochko
  2024-09-02 17:01 ` [PATCH v6 1/9] xen/riscv: prevent recursion when ASSERT(), BUG*(), or panic() are called Oleksii Kurochko
  2024-09-02 17:01 ` [PATCH v6 2/9] xen/riscv: use {read,write}{b,w,l,q}_cpu() to define {read,write}_atomic() Oleksii Kurochko
@ 2024-09-02 17:01 ` Oleksii Kurochko
  2024-09-10  9:53   ` Jan Beulich
  2024-09-02 17:01 ` [PATCH v6 4/9] xen/riscv: set up fixmap mappings Oleksii Kurochko
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 42+ messages in thread
From: Oleksii Kurochko @ 2024-09-02 17:01 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksii Kurochko, Alistair Francis, Bob Eshleman, Connor Davis,
	Andrew Cooper, Jan Beulich, Julien Grall, Stefano Stabellini

Update the 2nd argument of _write_atomic() from 'unsigned long x'
to 'void *x' to allow write_atomic() to handle non-scalar types,
aligning it with read_atomic(), which can work with non-scalar types.

Additionally, update the implementation of _add_sized() to use
"writeX_cpu(readX_cpu(p) + x, p)" instead of
"write_atomic(ptr, read_atomic(ptr) + x)" because 'ptr' is defined
as 'volatile uintX_t *'.
This avoids a compilation error that occurs when passing the 2nd
argument to _write_atomic() (i.e., "passing argument 2 of '_write_atomic'
discards 'volatile' qualifier from pointer target type") since the 2nd
argument of _write_atomic() is now 'void *' instead of 'unsigned long'.

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
Changes in v6:
 - new patch.
---
 xen/arch/riscv/include/asm/atomic.h | 24 ++++++++++--------------
 1 file changed, 10 insertions(+), 14 deletions(-)

diff --git a/xen/arch/riscv/include/asm/atomic.h b/xen/arch/riscv/include/asm/atomic.h
index 3c6bd86406..92b92fb4d4 100644
--- a/xen/arch/riscv/include/asm/atomic.h
+++ b/xen/arch/riscv/include/asm/atomic.h
@@ -54,16 +54,16 @@ static always_inline void read_atomic_size(const volatile void *p,
 })
 
 static always_inline void _write_atomic(volatile void *p,
-                                        unsigned long x,
+                                        void *x,
                                         unsigned int size)
 {
     switch ( size )
     {
-    case 1: writeb_cpu(x, p); break;
-    case 2: writew_cpu(x, p); break;
-    case 4: writel_cpu(x, p); break;
+    case 1: writeb_cpu(*(uint8_t *)x, p); break;
+    case 2: writew_cpu(*(uint16_t *)x, p); break;
+    case 4: writel_cpu(*(uint32_t *)x, p); break;
 #ifndef CONFIG_RISCV_32
-    case 8: writeq_cpu(x, p); break;
+    case 8: writeq_cpu(*(uint64_t *)x, p); break;
 #endif
     default: __bad_atomic_size(); break;
     }
@@ -72,7 +72,7 @@ static always_inline void _write_atomic(volatile void *p,
 #define write_atomic(p, x)                              \
 ({                                                      \
     typeof(*(p)) x_ = (x);                              \
-    _write_atomic(p, x_, sizeof(*(p)));                 \
+    _write_atomic(p, &x_, sizeof(*(p)));                \
 })
 
 static always_inline void _add_sized(volatile void *p,
@@ -82,27 +82,23 @@ static always_inline void _add_sized(volatile void *p,
     {
     case 1:
     {
-        volatile uint8_t *ptr = p;
-        write_atomic(ptr, read_atomic(ptr) + x);
+        writeb_cpu(readb_cpu(p) + x, p);
         break;
     }
     case 2:
     {
-        volatile uint16_t *ptr = p;
-        write_atomic(ptr, read_atomic(ptr) + x);
+        writew_cpu(readw_cpu(p) + x, p);
         break;
     }
     case 4:
     {
-        volatile uint32_t *ptr = p;
-        write_atomic(ptr, read_atomic(ptr) + x);
+        writel_cpu(readl_cpu(p) + x, p);
         break;
     }
 #ifndef CONFIG_RISCV_32
     case 8:
     {
-        volatile uint64_t *ptr = p;
-        write_atomic(ptr, read_atomic(ptr) + x);
+        writeq_cpu(readw_cpu(p) + x, p);
         break;
     }
 #endif
-- 
2.46.0



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v6 3/9] xen/riscv: allow write_atomic() to work with non-scalar types
  2024-09-02 17:01 ` [PATCH v6 3/9] xen/riscv: allow write_atomic() to work with non-scalar types Oleksii Kurochko
@ 2024-09-10  9:53   ` Jan Beulich
  2024-09-10 15:28     ` oleksii.kurochko
  0 siblings, 1 reply; 42+ messages in thread
From: Jan Beulich @ 2024-09-10  9:53 UTC (permalink / raw)
  To: Oleksii Kurochko
  Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
	Julien Grall, Stefano Stabellini, xen-devel

On 02.09.2024 19:01, Oleksii Kurochko wrote:
> --- a/xen/arch/riscv/include/asm/atomic.h
> +++ b/xen/arch/riscv/include/asm/atomic.h
> @@ -54,16 +54,16 @@ static always_inline void read_atomic_size(const volatile void *p,
>  })
>  
>  static always_inline void _write_atomic(volatile void *p,
> -                                        unsigned long x,
> +                                        void *x,

Pointer-to-const please, to further aid in easily recognizing which
parameter is what. After all ...

>                                          unsigned int size)
>  {
>      switch ( size )
>      {
> -    case 1: writeb_cpu(x, p); break;
> -    case 2: writew_cpu(x, p); break;
> -    case 4: writel_cpu(x, p); break;

... unhelpfully enough parameters are then swapped, just to confuse
things.

> +    case 1: writeb_cpu(*(uint8_t *)x, p); break;
> +    case 2: writew_cpu(*(uint16_t *)x, p); break;
> +    case 4: writel_cpu(*(uint32_t *)x, p); break;
>  #ifndef CONFIG_RISCV_32
> -    case 8: writeq_cpu(x, p); break;
> +    case 8: writeq_cpu(*(uint64_t *)x, p); break;

With const added to the parameter, please further make sure to then not
cast that away again.

> @@ -72,7 +72,7 @@ static always_inline void _write_atomic(volatile void *p,
>  #define write_atomic(p, x)                              \
>  ({                                                      \
>      typeof(*(p)) x_ = (x);                              \
> -    _write_atomic(p, x_, sizeof(*(p)));                 \
> +    _write_atomic(p, &x_, sizeof(*(p)));                \
>  })
>  
>  static always_inline void _add_sized(volatile void *p,
> @@ -82,27 +82,23 @@ static always_inline void _add_sized(volatile void *p,
>      {
>      case 1:
>      {
> -        volatile uint8_t *ptr = p;
> -        write_atomic(ptr, read_atomic(ptr) + x);
> +        writeb_cpu(readb_cpu(p) + x, p);
>          break;
>      }
>      case 2:
>      {
> -        volatile uint16_t *ptr = p;
> -        write_atomic(ptr, read_atomic(ptr) + x);
> +        writew_cpu(readw_cpu(p) + x, p);
>          break;
>      }
>      case 4:
>      {
> -        volatile uint32_t *ptr = p;
> -        write_atomic(ptr, read_atomic(ptr) + x);
> +        writel_cpu(readl_cpu(p) + x, p);
>          break;
>      }
>  #ifndef CONFIG_RISCV_32
>      case 8:
>      {
> -        volatile uint64_t *ptr = p;
> -        write_atomic(ptr, read_atomic(ptr) + x);
> +        writeq_cpu(readw_cpu(p) + x, p);
>          break;
>      }
>  #endif

I'm afraid I don't understand this part, or more specifically the respective
part of the description. It is the first parameter of write_atomic() which is
volatile qualified. And it is the first argument that's volatile qualified
here. Isn't the problem entirely unrelated to volatile-ness, and instead a
result of the other parameter changing from scalar to pointer type, which
doesn't fit the addition expressions you pass in?

Also you surely mean readq_cpu() in the 8-byte case.

Jan


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v6 3/9] xen/riscv: allow write_atomic() to work with non-scalar types
  2024-09-10  9:53   ` Jan Beulich
@ 2024-09-10 15:28     ` oleksii.kurochko
  2024-09-10 16:05       ` Jan Beulich
  0 siblings, 1 reply; 42+ messages in thread
From: oleksii.kurochko @ 2024-09-10 15:28 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
	Julien Grall, Stefano Stabellini, xen-devel

On Tue, 2024-09-10 at 11:53 +0200, Jan Beulich wrote:
> On 02.09.2024 19:01, Oleksii Kurochko wrote:
> > --- a/xen/arch/riscv/include/asm/atomic.h
> > +++ b/xen/arch/riscv/include/asm/atomic.h
> > @@ -54,16 +54,16 @@ static always_inline void
> > read_atomic_size(const volatile void *p,
> >  })
> >  
> >  static always_inline void _write_atomic(volatile void *p,
> > -                                        unsigned long x,
> > +                                        void *x,
> 
> Pointer-to-const please, to further aid in easily recognizing which
> parameter is what. After all ...
> 
> >                                          unsigned int size)
> >  {
> >      switch ( size )
> >      {
> > -    case 1: writeb_cpu(x, p); break;
> > -    case 2: writew_cpu(x, p); break;
> > -    case 4: writel_cpu(x, p); break;
> 
> ... unhelpfully enough parameters are then swapped, just to confuse
> things.
If it would be better to keep 'unsigned long' as the type of x, then,
if I am not mistaken, write_atomic() should be updated in the following
way:
   #define write_atomic(p, x)                                    \
   ({                                                            \
       typeof(*(p)) x_ = (x);                                    \
       _write_atomic(p, *(unsigned long *)&x_, sizeof(*(p)));    \
   })
However, I am not sure this is safe: when x is, for example, a 2-byte
value, the cast will read more than 2 bytes before passing the value
to the _write_atomic() function.

> 
> > +    case 1: writeb_cpu(*(uint8_t *)x, p); break;
> > +    case 2: writew_cpu(*(uint16_t *)x, p); break;
> > +    case 4: writel_cpu(*(uint32_t *)x, p); break;
> >  #ifndef CONFIG_RISCV_32
> > -    case 8: writeq_cpu(x, p); break;
> > +    case 8: writeq_cpu(*(uint64_t *)x, p); break;
> 
> With const added to the parameter, please further make sure to then
> not
> cast that away again.
> 
> > @@ -72,7 +72,7 @@ static always_inline void _write_atomic(volatile
> > void *p,
> >  #define write_atomic(p, x)                              \
> >  ({                                                      \
> >      typeof(*(p)) x_ = (x);                              \
> > -    _write_atomic(p, x_, sizeof(*(p)));                 \
> > +    _write_atomic(p, &x_, sizeof(*(p)));                \
> >  })
> >  
> >  static always_inline void _add_sized(volatile void *p,
> > @@ -82,27 +82,23 @@ static always_inline void _add_sized(volatile
> > void *p,
> >      {
> >      case 1:
> >      {
> > -        volatile uint8_t *ptr = p;
> > -        write_atomic(ptr, read_atomic(ptr) + x);
> > +        writeb_cpu(readb_cpu(p) + x, p);
> >          break;
> >      }
> >      case 2:
> >      {
> > -        volatile uint16_t *ptr = p;
> > -        write_atomic(ptr, read_atomic(ptr) + x);
> > +        writew_cpu(readw_cpu(p) + x, p);
> >          break;
> >      }
> >      case 4:
> >      {
> > -        volatile uint32_t *ptr = p;
> > -        write_atomic(ptr, read_atomic(ptr) + x);
> > +        writel_cpu(readl_cpu(p) + x, p);
> >          break;
> >      }
> >  #ifndef CONFIG_RISCV_32
> >      case 8:
> >      {
> > -        volatile uint64_t *ptr = p;
> > -        write_atomic(ptr, read_atomic(ptr) + x);
> > +        writeq_cpu(readw_cpu(p) + x, p);
> >          break;
> >      }
> >  #endif
> 
> I'm afraid I don't understand this part, or more specifically the
> respective
> part of the description. It is the first parameter of write_atomic()
> which is
> volatile qualified. And it is the first argument that's volatile
> qualified
> here. Isn't the problem entirely unrelated to volatile-ness, and
> instead a
> result of the other parameter changing from scalar to pointer type,
> which
> doesn't fit the addition expressions you pass in?
If _add_sized() is defined as it was before:
   static always_inline void _add_sized(volatile void *p,
                                        unsigned long x,
                                        unsigned int size)
   {
       switch ( size )
       {
       case 1:
       {
           volatile uint8_t *ptr = p;
           write_atomic(ptr, read_atomic(ptr) + x);
           break;
       }
   ...
Then write_atomic(ptr, read_atomic(ptr) + x) will be changed to:
   volatile uint8_t x_ = (x);
   
And that will cause a compiler error:
   ./arch/riscv/include/asm/atomic.h:75:22: error: passing argument 2
   of '_write_atomic' discards 'volatile' qualifier from pointer target
   type [-Werror=discarded-qualifiers]
      75 |     _write_atomic(p, &x_, sizeof(*(p)));
Because 'volatile uint8_t *' can't be implicitly converted to 'void *':
   expected 'void *' but argument is of type 'volatile uint8_t *' {aka
   'volatile unsigned char *'}
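
The qualifier propagation described above can be checked at compile
time. The following is only a sketch for a C11 compiler with the GNU
__typeof__ extension; the macro and function names are made up for the
illustration and do not exist in Xen.

```c
#include <assert.h>
#include <stdint.h>

/* 1 if the expression has type 'volatile uint8_t *', 0 otherwise. */
#define IS_VOLATILE_U8_PTR(ptr_expr) \
    _Generic((ptr_expr), volatile uint8_t *: 1, default: 0)

/* Same shape as the temporary that write_atomic() declares. */
static inline int typeof_keeps_volatile(void)
{
    volatile uint8_t target = 0;
    volatile uint8_t *ptr = &target;
    __typeof__(*ptr) x_ = 1;          /* x_ is 'volatile uint8_t' */

    assert(x_ == 1);
    return IS_VOLATILE_U8_PTR(&x_);
}

/* With an unqualified pointee the temporary is a plain uint8_t. */
static inline int typeof_on_plain_pointer(void)
{
    uint8_t target = 0;
    uint8_t *ptr = &target;
    __typeof__(*ptr) x_ = 1;          /* x_ is 'uint8_t' */

    assert(x_ == 1);
    return IS_VOLATILE_U8_PTR(&x_);
}
```

So &x_ only has a qualifier to discard when the pointer handed to the
macro is itself declared with a volatile pointee, which is the case in
the old _add_sized().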

> 
> Also you surely mean readq_cpu() in the 8-byte case.
Yes, thanks for finding that.

~ Oleksii


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v6 3/9] xen/riscv: allow write_atomic() to work with non-scalar types
  2024-09-10 15:28     ` oleksii.kurochko
@ 2024-09-10 16:05       ` Jan Beulich
  2024-09-11 11:34         ` oleksii.kurochko
  0 siblings, 1 reply; 42+ messages in thread
From: Jan Beulich @ 2024-09-10 16:05 UTC (permalink / raw)
  To: oleksii.kurochko
  Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
	Julien Grall, Stefano Stabellini, xen-devel

On 10.09.2024 17:28, oleksii.kurochko@gmail.com wrote:
> On Tue, 2024-09-10 at 11:53 +0200, Jan Beulich wrote:
>> On 02.09.2024 19:01, Oleksii Kurochko wrote:
>>> --- a/xen/arch/riscv/include/asm/atomic.h
>>> +++ b/xen/arch/riscv/include/asm/atomic.h
>>> @@ -54,16 +54,16 @@ static always_inline void
>>> read_atomic_size(const volatile void *p,
>>>  })
>>>  
>>>  static always_inline void _write_atomic(volatile void *p,
>>> -                                        unsigned long x,
>>> +                                        void *x,
>>
>> Pointer-to-const please, to further aid in easily recognizing which
>> parameter is what. After all ...
>>
>>>                                          unsigned int size)
>>>  {
>>>      switch ( size )
>>>      {
>>> -    case 1: writeb_cpu(x, p); break;
>>> -    case 2: writew_cpu(x, p); break;
>>> -    case 4: writel_cpu(x, p); break;
>>
>> ... unhelpfully enough parameters are then swapped, just to confuse
>> things.
> If it would be better to keep 'unsigned long' as the type of x, then,
> if I am not mistaken, write_atomic() should be updated in the following
> way:
>    #define write_atomic(p, x)                              \
>    ({                                                      \
>        typeof(*(p)) x_ = (x);                              \
>        _write_atomic(p, *(unsigned long *)&x_, sizeof(*(p)));            
>    \
>    })
> However, I am not sure if it is safe when x is a 2-byte value (for
> example) that it will read more than 2 bytes before passing the value
> to the _write_atomic() function.

No, that's definitely unsafe.
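
A sketch of why the fixed-width read is a problem, written for a hosted
C compiler rather than for Xen. The struct and helper names are
hypothetical, and uint64_t stands in for 'unsigned long' on a 64-bit
target. The over-read is modelled with memcpy() over a deliberately
8-byte object so the demo itself stays well-defined.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* A 2-byte value followed by unrelated data, 8 bytes in total. */
struct demo {
    uint16_t val;
    uint16_t neighbour[3];
};

static_assert(sizeof(struct demo) == sizeof(uint64_t),
              "demo relies on an 8-byte struct");

/* Models *(unsigned long *)&x_: always fetches 8 bytes. */
static inline uint64_t fetch_fixed_width(const struct demo *d)
{
    uint64_t out;

    memcpy(&out, d, sizeof(out));
    return out;
}

/* Fetches exactly 'size' bytes, as a pointer-plus-size interface can. */
static inline uint64_t fetch_by_size(const void *x, unsigned int size)
{
    switch ( size )
    {
    case 1: { uint8_t v;  memcpy(&v, x, sizeof(v)); return v; }
    case 2: { uint16_t v; memcpy(&v, x, sizeof(v)); return v; }
    case 4: { uint32_t v; memcpy(&v, x, sizeof(v)); return v; }
    case 8: { uint64_t v; memcpy(&v, x, sizeof(v)); return v; }
    default: assert(!"unsupported size"); return 0;
    }
}
```

The fixed-width fetch picks up whatever follows the 2-byte value. In
the macro there is no such neighbour to read, so the access would go
past the end of x_.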

>>> @@ -72,7 +72,7 @@ static always_inline void _write_atomic(volatile
>>> void *p,
>>>  #define write_atomic(p, x)                              \
>>>  ({                                                      \
>>>      typeof(*(p)) x_ = (x);                              \
>>> -    _write_atomic(p, x_, sizeof(*(p)));                 \
>>> +    _write_atomic(p, &x_, sizeof(*(p)));                \
>>>  })
>>>  
>>>  static always_inline void _add_sized(volatile void *p,
>>> @@ -82,27 +82,23 @@ static always_inline void _add_sized(volatile
>>> void *p,
>>>      {
>>>      case 1:
>>>      {
>>> -        volatile uint8_t *ptr = p;
>>> -        write_atomic(ptr, read_atomic(ptr) + x);
>>> +        writeb_cpu(readb_cpu(p) + x, p);
>>>          break;
>>>      }
>>>      case 2:
>>>      {
>>> -        volatile uint16_t *ptr = p;
>>> -        write_atomic(ptr, read_atomic(ptr) + x);
>>> +        writew_cpu(readw_cpu(p) + x, p);
>>>          break;
>>>      }
>>>      case 4:
>>>      {
>>> -        volatile uint32_t *ptr = p;
>>> -        write_atomic(ptr, read_atomic(ptr) + x);
>>> +        writel_cpu(readl_cpu(p) + x, p);
>>>          break;
>>>      }
>>>  #ifndef CONFIG_RISCV_32
>>>      case 8:
>>>      {
>>> -        volatile uint64_t *ptr = p;
>>> -        write_atomic(ptr, read_atomic(ptr) + x);
>>> +        writeq_cpu(readw_cpu(p) + x, p);
>>>          break;
>>>      }
>>>  #endif
>>
>> I'm afraid I don't understand this part, or more specifically the
>> respective
>> part of the description. It is the first parameter of write_atomic()
>> which is
>> volatile qualified. And it is the first argument that's volatile
>> qualified
>> here. Isn't the problem entirely unrelated to volatile-ness, and
>> instead a
>> result of the other parameter changing from scalar to pointer type,
>> which
>> doesn't fit the addition expressions you pass in?
> if _add_sized() is defined as it was before:
>    static always_inline void _add_sized(volatile void *p,
>                                         unsigned long x, unsigned int
>    size)
>    {
>        switch ( size )
>        {
>        case 1:
>        {
>            volatile uint8_t *ptr = p;
>            write_atomic(ptr, read_atomic(ptr) + x);
>            break;
>        }
>    ...
> Then write_atomic(ptr, read_atomic(ptr) + x) will be changed to:
>    volatile uint8_t x_ = (x);
>    
> And that will cause a compiler error:
>    ./arch/riscv/include/asm/atomic.h:75:22: error: passing argument 2
>    of '_write_atomic' discards 'volatile' qualifier from pointer target
>    type [-Werror=discarded-qualifiers]
>       75 |     _write_atomic(p, &x_, sizeof(*(p)));
> Because it can't cast 'volatile uint8_t *' to 'void *':
>    expected 'void *' but argument is of type 'volatile uint8_t *' {aka
>    'volatile unsigned char *'}

Oh, I think I see now. What we'd like write_atomic() to derive is the bare
(unqualified) type of *ptr, yet iirc only recent compilers have a way to
obtain that.

Jan


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v6 3/9] xen/riscv: allow write_atomic() to work with non-scalar types
  2024-09-10 16:05       ` Jan Beulich
@ 2024-09-11 11:34         ` oleksii.kurochko
  2024-09-11 11:49           ` Jan Beulich
  0 siblings, 1 reply; 42+ messages in thread
From: oleksii.kurochko @ 2024-09-11 11:34 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
	Julien Grall, Stefano Stabellini, xen-devel

On Tue, 2024-09-10 at 18:05 +0200, Jan Beulich wrote:
> On 10.09.2024 17:28, oleksii.kurochko@gmail.com wrote:
> > On Tue, 2024-09-10 at 11:53 +0200, Jan Beulich wrote:
> > > On 02.09.2024 19:01, Oleksii Kurochko wrote:
> > > > --- a/xen/arch/riscv/include/asm/atomic.h
> > > > +++ b/xen/arch/riscv/include/asm/atomic.h
> > > > @@ -54,16 +54,16 @@ static always_inline void
> > > > read_atomic_size(const volatile void *p,
> > > >  })
> > > >  
> > > >  static always_inline void _write_atomic(volatile void *p,
> > > > -                                        unsigned long x,
> > > > +                                        void *x,
> > > 
> > > Pointer-to-const please, to further aid in easily recognizing
> > > which
> > > parameter is what. After all ...
> > > 
> > > >                                          unsigned int size)
> > > >  {
> > > >      switch ( size )
> > > >      {
> > > > -    case 1: writeb_cpu(x, p); break;
> > > > -    case 2: writew_cpu(x, p); break;
> > > > -    case 4: writel_cpu(x, p); break;
> > > 
> > > ... unhelpfully enough parameters are then swapped, just to
> > > confuse
> > > things.
> > If it would be better to keep 'unsigned long' as the type of x,
> > then,
> > if I am not mistaken, write_atomic() should be updated in the
> > following
> > way:
> >    #define write_atomic(p, x)                              \
> >    ({                                                      \
> >        typeof(*(p)) x_ = (x);                              \
> >        _write_atomic(p, *(unsigned long *)&x_,
> > sizeof(*(p)));            
> >    \
> >    })
> > However, I am not sure if it is safe when x is a 2-byte value (for
> > example) that it will read more than 2 bytes before passing the
> > value
> > to the _write_atomic() function.
> 
> No, that's definitely unsafe.

Then, at the moment, I don't see a better option than having const void
*x as an argument for the _write_atomic() function and then performing
casts when writeX_cpu(*(const uintX *)x, p) is called.
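
Roughly, that shape could look like the sketch below. It is not the
Xen implementation: store8()..store64() are hypothetical stand-ins for
writeX_cpu() done as plain volatile stores, the macro is renamed to
write_value(), and the default case asserts instead of calling
__bad_atomic_size().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for writeX_cpu(): plain volatile stores. */
#define store8(v, p)  (*(volatile uint8_t *)(p) = (v))
#define store16(v, p) (*(volatile uint16_t *)(p) = (v))
#define store32(v, p) (*(volatile uint32_t *)(p) = (v))
#define store64(v, p) (*(volatile uint64_t *)(p) = (v))

/* The value arrives by pointer-to-const; the casts keep the const. */
static inline void write_sized(volatile void *p, const void *x,
                               unsigned int size)
{
    switch ( size )
    {
    case 1: store8(*(const uint8_t *)x, p); break;
    case 2: store16(*(const uint16_t *)x, p); break;
    case 4: store32(*(const uint32_t *)x, p); break;
    case 8: store64(*(const uint64_t *)x, p); break;
    default: assert(!"unsupported size"); break;
    }
}

/* Copies x into a temporary of the destination type, then dispatches. */
#define write_value(p, x)                       \
do {                                            \
    __typeof__(*(p)) x_ = (x);                  \
    write_sized(p, &x_, sizeof(*(p)));          \
} while ( 0 )
```

This compiles cleanly only when *(p) is not volatile-qualified. With a
volatile pointee, &x_ still carries the qualifier and the diagnostic
quoted below is triggered again.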

> 
> > > > @@ -72,7 +72,7 @@ static always_inline void
> > > > _write_atomic(volatile
> > > > void *p,
> > > >  #define write_atomic(p, x)                              \
> > > >  ({                                                      \
> > > >      typeof(*(p)) x_ = (x);                              \
> > > > -    _write_atomic(p, x_, sizeof(*(p)));                 \
> > > > +    _write_atomic(p, &x_, sizeof(*(p)));                \
> > > >  })
> > > >  
> > > >  static always_inline void _add_sized(volatile void *p,
> > > > @@ -82,27 +82,23 @@ static always_inline void
> > > > _add_sized(volatile
> > > > void *p,
> > > >      {
> > > >      case 1:
> > > >      {
> > > > -        volatile uint8_t *ptr = p;
> > > > -        write_atomic(ptr, read_atomic(ptr) + x);
> > > > +        writeb_cpu(readb_cpu(p) + x, p);
> > > >          break;
> > > >      }
> > > >      case 2:
> > > >      {
> > > > -        volatile uint16_t *ptr = p;
> > > > -        write_atomic(ptr, read_atomic(ptr) + x);
> > > > +        writew_cpu(readw_cpu(p) + x, p);
> > > >          break;
> > > >      }
> > > >      case 4:
> > > >      {
> > > > -        volatile uint32_t *ptr = p;
> > > > -        write_atomic(ptr, read_atomic(ptr) + x);
> > > > +        writel_cpu(readl_cpu(p) + x, p);
> > > >          break;
> > > >      }
> > > >  #ifndef CONFIG_RISCV_32
> > > >      case 8:
> > > >      {
> > > > -        volatile uint64_t *ptr = p;
> > > > -        write_atomic(ptr, read_atomic(ptr) + x);
> > > > +        writeq_cpu(readw_cpu(p) + x, p);
> > > >          break;
> > > >      }
> > > >  #endif
> > > 
> > > I'm afraid I don't understand this part, or more specifically the
> > > respective
> > > part of the description. It is the first parameter of
> > > write_atomic()
> > > which is
> > > volatile qualified. And it is the first argument that's volatile
> > > qualified
> > > here. Isn't the problem entirely unrelated to volatile-ness, and
> > > instead a
> > > result of the other parameter changing from scalar to pointer
> > > type,
> > > which
> > > doesn't fit the addition expressions you pass in?
> > if _add_sized() is defined as it was before:
> >    static always_inline void _add_sized(volatile void *p,
> >                                         unsigned long x, unsigned
> > int
> >    size)
> >    {
> >        switch ( size )
> >        {
> >        case 1:
> >        {
> >            volatile uint8_t *ptr = p;
> >            write_atomic(ptr, read_atomic(ptr) + x);
> >            break;
> >        }
> >    ...
> > Then write_atomic(ptr, read_atomic(ptr) + x) will be changed to:
> >    volatile uint8_t x_ = (x);
> >    
> > And that will cause a compiler error:
> >    ./arch/riscv/include/asm/atomic.h:75:22: error: passing argument
> > 2
> >    of '_write_atomic' discards 'volatile' qualifier from pointer
> > target
> >    type [-Werror=discarded-qualifiers]
> >       75 |     _write_atomic(p, &x_, sizeof(*(p)));
> > Because it can't cast 'volatile uint8_t *' to 'void *':
> >    expected 'void *' but argument is of type 'volatile uint8_t *'
> > {aka
> >    'volatile unsigned char *'}
> 
> Oh, I think I see now. What we'd like write_atomic() to derive is the
> bare
> (unqualified) type of *ptr, yet iirc only recent compilers have a way
> to
> obtain that.
I assume that you are speaking about typeof_unqual which requires C23
(?).
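
To make the qualifier problem concrete, here is a minimal sketch (a hypothetical probe, not Xen code; it assumes GCC or clang for statement expressions and __typeof__, which Xen spells typeof). It shows that a temporary declared with typeof(*(p)) inherits volatile when p points to a volatile object, which is why taking its address yields a pointer to volatile.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical probe (not Xen code): yields 1 if a temporary declared as
 * typeof(*(p)) is volatile-qualified, 0 if it is a plain uint8_t.
 * Uses GNU statement expressions and __typeof__ plus C11 _Generic.
 */
#define tmp_is_volatile(p)                                   \
({                                                           \
    __typeof__(*(p)) t_ = 0;                                 \
    _Generic(&t_, volatile uint8_t *: 1, uint8_t *: 0);      \
})

static int probe_volatile_pointer(void)
{
    volatile uint8_t v = 0;
    volatile uint8_t *ptr = &v;

    return tmp_is_volatile(ptr);
}

static int probe_plain_pointer(void)
{
    uint8_t v = 0;
    uint8_t *ptr = &v;

    return tmp_is_volatile(ptr);
}

/* Self-check: the temporary is volatile only via a volatile pointer. */
static void check_probes(void)
{
    assert(probe_volatile_pointer() == 1);
    assert(probe_plain_pointer() == 0);
}
```

An unqualified-type operator would make both probes return 0, which is the behaviour write_atomic() would want for its temporary.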

__auto_type also seems to drop the volatile qualifier, but the docs
don't say whether it should (or should not) discard qualifiers. Could
this be an option:
   #define write_atomic(p, x)                              \
   ({                                                      \
       __auto_type x_ = (x);                              \
       _write_atomic(p, &x_, sizeof(*(p)));                 \
   })

Another option could be to simply drop volatile by casting:
   #define write_atomic(p, x)                              \
   ...
       _write_atomic(p, (const void *)&x_, sizeof(*(p)));
   
~ Oleksii

* Re: [PATCH v6 3/9] xen/riscv: allow write_atomic() to work with non-scalar types
  2024-09-11 11:34         ` oleksii.kurochko
@ 2024-09-11 11:49           ` Jan Beulich
  2024-09-12 11:15             ` oleksii.kurochko
  0 siblings, 1 reply; 42+ messages in thread
From: Jan Beulich @ 2024-09-11 11:49 UTC (permalink / raw)
  To: oleksii.kurochko
  Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
	Julien Grall, Stefano Stabellini, xen-devel

On 11.09.2024 13:34, oleksii.kurochko@gmail.com wrote:
> On Tue, 2024-09-10 at 18:05 +0200, Jan Beulich wrote:
>> On 10.09.2024 17:28, oleksii.kurochko@gmail.com wrote:
>>> On Tue, 2024-09-10 at 11:53 +0200, Jan Beulich wrote:
>>>> On 02.09.2024 19:01, Oleksii Kurochko wrote:
>>>>> @@ -72,7 +72,7 @@ static always_inline void
>>>>> _write_atomic(volatile
>>>>> void *p,
>>>>>  #define write_atomic(p, x)                              \
>>>>>  ({                                                      \
>>>>>      typeof(*(p)) x_ = (x);                              \
>>>>> -    _write_atomic(p, x_, sizeof(*(p)));                 \
>>>>> +    _write_atomic(p, &x_, sizeof(*(p)));                \
>>>>>  })
>>>>>  
>>>>>  static always_inline void _add_sized(volatile void *p,
>>>>> @@ -82,27 +82,23 @@ static always_inline void
>>>>> _add_sized(volatile
>>>>> void *p,
>>>>>      {
>>>>>      case 1:
>>>>>      {
>>>>> -        volatile uint8_t *ptr = p;
>>>>> -        write_atomic(ptr, read_atomic(ptr) + x);
>>>>> +        writeb_cpu(readb_cpu(p) + x, p);
>>>>>          break;
>>>>>      }
>>>>>      case 2:
>>>>>      {
>>>>> -        volatile uint16_t *ptr = p;
>>>>> -        write_atomic(ptr, read_atomic(ptr) + x);
>>>>> +        writew_cpu(readw_cpu(p) + x, p);
>>>>>          break;
>>>>>      }
>>>>>      case 4:
>>>>>      {
>>>>> -        volatile uint32_t *ptr = p;
>>>>> -        write_atomic(ptr, read_atomic(ptr) + x);
>>>>> +        writel_cpu(readl_cpu(p) + x, p);
>>>>>          break;
>>>>>      }
>>>>>  #ifndef CONFIG_RISCV_32
>>>>>      case 8:
>>>>>      {
>>>>> -        volatile uint64_t *ptr = p;
>>>>> -        write_atomic(ptr, read_atomic(ptr) + x);
>>>>> +        writeq_cpu(readw_cpu(p) + x, p);
>>>>>          break;
>>>>>      }
>>>>>  #endif
>>>>
>>>> I'm afraid I don't understand this part, or more specifically the
>>>> respective
>>>> part of the description. It is the first parameter of
>>>> write_atomic()
>>>> which is
>>>> volatile qualified. And it is the first argument that's volatile
>>>> qualified
>>>> here. Isn't the problem entirely unrelated to volatile-ness, and
>>>> instead a
>>>> result of the other parameter changing from scalar to pointer
>>>> type,
>>>> which
>>>> doesn't fit the addition expressions you pass in?
>>> if _add_sized() is defined as it was before:
>>>    static always_inline void _add_sized(volatile void *p,
>>>                                         unsigned long x, unsigned
>>> int
>>>    size)
>>>    {
>>>        switch ( size )
>>>        {
>>>        case 1:
>>>        {
>>>            volatile uint8_t *ptr = p;
>>>            write_atomic(ptr, read_atomic(ptr) + x);
>>>            break;
>>>        }
>>>    ...
>>> Then write_atomic(ptr, read_atomic(ptr) + x) will be be changed to:
>>>    volatile uint8_t x_ = (x);
>>>    
>>> And that will cause a compiler error:
>>>    ./arch/riscv/include/asm/atomic.h:75:22: error: passing argument
>>> 2
>>>    of '_write_atomic' discards 'volatile' qualifier from pointer
>>> target
>>>    type [-Werror=discarded-qualifiers]
>>>       75 |     _write_atomic(p, &x_, sizeof(*(p)));
>>> Because it can't cast 'volatile uint8_t *' to 'void *':
>>>    expected 'void *' but argument is of type 'volatile uint8_t *'
>>> {aka
>>>    'volatile unsigned char *'}
>>
>> Oh, I think I see now. What we'd like write_atomic() to derive is the
>> bare
>> (unqualified) type of *ptr, yet iirc only recent compilers have a way
>> to
>> obtain that.
> I assume that you are speaking about typeof_unqual which requires C23
> (?).

What C version it requires doesn't matter much for our purposes. The
question is as of which gcc / clang version (if any) this is supported
as an extension.

> __auto_type seems to me can also drop volatile quilifier but in the
> docs I don't see that it should (or not) discard qualifier. Could it be
> an option:
>    #define write_atomic(p, x)                              \
>    ({                                                      \
>        __auto_type x_ = (x);                              \
>        _write_atomic(p, &x_, sizeof(*(p)));                 \
>    })

For our purposes __auto_type doesn't provide advantages over typeof().
Plus, more importantly, the use above is wrong, just like typeof(x)
would also be wrong. It needs to be p that the type is derived from.
Otherwise consider what happens when ptr is unsigned long * or
unsigned short * and you write

    write_atomic(ptr, 0);
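
A small sketch of why the type has to come from p (hypothetical helper macros, assuming GCC or clang for __auto_type, __typeof__ and statement expressions): with a literal 0 the temporary is an int, while the size passed along is sizeof(*(p)), so the two can disagree.

```c
#include <assert.h>
#include <stddef.h>

/* Size of the temporary when its type comes from the value (__auto_type). */
#define tmp_size_from_value(p, x)       \
({                                      \
    __auto_type x_ = (x);               \
    (void)(p);                          \
    sizeof(x_);                         \
})

/* Size of the temporary when its type comes from the pointed-to object. */
#define tmp_size_from_pointer(p, x)     \
({                                      \
    __typeof__(*(p)) x_ = (x);          \
    sizeof(x_);                         \
})

static void check_temporary_sizes(void)
{
    unsigned long ul = 0, *ul_ptr = &ul;
    unsigned short us = 0, *us_ptr = &us;

    /* The literal 0 has type int, whatever the destination is. */
    assert(tmp_size_from_value(ul_ptr, 0) == sizeof(int));
    assert(tmp_size_from_value(us_ptr, 0) == sizeof(int));

    /* Deriving the type from p matches the size handed on to the helper. */
    assert(tmp_size_from_pointer(ul_ptr, 0) == sizeof(*ul_ptr));
    assert(tmp_size_from_pointer(us_ptr, 0) == sizeof(*us_ptr));
}
```

Where unsigned long is wider than int, a helper told to copy sizeof(*(p)) bytes from the address of an int temporary would read past the end of that temporary.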

> And another option could be just drop volatile by casting:
>    #define write_atomic(p, x)                              \
>    ...
>        _write_atomic(p, (const void *)&x_, sizeof(*(p)));                 

See what I said earlier about casts: You shall not cast away
qualifiers. Besides doing so being bad practice, you'll notice this at
the latest when RISC-V code also becomes subject to MISRA compliance.

Jan


* Re: [PATCH v6 3/9] xen/riscv: allow write_atomic() to work with non-scalar types
  2024-09-11 11:49           ` Jan Beulich
@ 2024-09-12 11:15             ` oleksii.kurochko
  2024-09-12 11:41               ` oleksii.kurochko
  0 siblings, 1 reply; 42+ messages in thread
From: oleksii.kurochko @ 2024-09-12 11:15 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
	Julien Grall, Stefano Stabellini, xen-devel

On Wed, 2024-09-11 at 13:49 +0200, Jan Beulich wrote:
> On 11.09.2024 13:34, oleksii.kurochko@gmail.com wrote:
> > On Tue, 2024-09-10 at 18:05 +0200, Jan Beulich wrote:
> > > On 10.09.2024 17:28, oleksii.kurochko@gmail.com wrote:
> > > > On Tue, 2024-09-10 at 11:53 +0200, Jan Beulich wrote:
> > > > > On 02.09.2024 19:01, Oleksii Kurochko wrote:
> > > > > > @@ -72,7 +72,7 @@ static always_inline void
> > > > > > _write_atomic(volatile
> > > > > > void *p,
> > > > > >  #define write_atomic(p, x)                              \
> > > > > >  ({                                                      \
> > > > > >      typeof(*(p)) x_ = (x);                              \
> > > > > > -    _write_atomic(p, x_, sizeof(*(p)));                 \
> > > > > > +    _write_atomic(p, &x_, sizeof(*(p)));                \
> > > > > >  })
> > > > > >  
> > > > > >  static always_inline void _add_sized(volatile void *p,
> > > > > > @@ -82,27 +82,23 @@ static always_inline void
> > > > > > _add_sized(volatile
> > > > > > void *p,
> > > > > >      {
> > > > > >      case 1:
> > > > > >      {
> > > > > > -        volatile uint8_t *ptr = p;
> > > > > > -        write_atomic(ptr, read_atomic(ptr) + x);
> > > > > > +        writeb_cpu(readb_cpu(p) + x, p);
> > > > > >          break;
> > > > > >      }
> > > > > >      case 2:
> > > > > >      {
> > > > > > -        volatile uint16_t *ptr = p;
> > > > > > -        write_atomic(ptr, read_atomic(ptr) + x);
> > > > > > +        writew_cpu(readw_cpu(p) + x, p);
> > > > > >          break;
> > > > > >      }
> > > > > >      case 4:
> > > > > >      {
> > > > > > -        volatile uint32_t *ptr = p;
> > > > > > -        write_atomic(ptr, read_atomic(ptr) + x);
> > > > > > +        writel_cpu(readl_cpu(p) + x, p);
> > > > > >          break;
> > > > > >      }
> > > > > >  #ifndef CONFIG_RISCV_32
> > > > > >      case 8:
> > > > > >      {
> > > > > > -        volatile uint64_t *ptr = p;
> > > > > > -        write_atomic(ptr, read_atomic(ptr) + x);
> > > > > > +        writeq_cpu(readw_cpu(p) + x, p);
> > > > > >          break;
> > > > > >      }
> > > > > >  #endif
> > > > > 
> > > > > I'm afraid I don't understand this part, or more specifically
> > > > > the
> > > > > respective
> > > > > part of the description. It is the first parameter of
> > > > > write_atomic()
> > > > > which is
> > > > > volatile qualified. And it is the first argument that's
> > > > > volatile
> > > > > qualified
> > > > > here. Isn't the problem entirely unrelated to volatile-ness,
> > > > > and
> > > > > instead a
> > > > > result of the other parameter changing from scalar to pointer
> > > > > type,
> > > > > which
> > > > > doesn't fit the addition expressions you pass in?
> > > > if _add_sized() is defined as it was before:
> > > >    static always_inline void _add_sized(volatile void *p,
> > > >                                         unsigned long x,
> > > > unsigned
> > > > int
> > > >    size)
> > > >    {
> > > >        switch ( size )
> > > >        {
> > > >        case 1:
> > > >        {
> > > >            volatile uint8_t *ptr = p;
> > > >            write_atomic(ptr, read_atomic(ptr) + x);
> > > >            break;
> > > >        }
> > > >    ...
> > > > Then write_atomic(ptr, read_atomic(ptr) + x) will be be changed
> > > > to:
> > > >    volatile uint8_t x_ = (x);
> > > >    
> > > > And that will cause a compiler error:
> > > >    ./arch/riscv/include/asm/atomic.h:75:22: error: passing
> > > > argument
> > > > 2
> > > >    of '_write_atomic' discards 'volatile' qualifier from
> > > > pointer
> > > > target
> > > >    type [-Werror=discarded-qualifiers]
> > > >       75 |     _write_atomic(p, &x_, sizeof(*(p)));
> > > > Because it can't cast 'volatile uint8_t *' to 'void *':
> > > >    expected 'void *' but argument is of type 'volatile uint8_t
> > > > *'
> > > > {aka
> > > >    'volatile unsigned char *'}
> > > 
> > > Oh, I think I see now. What we'd like write_atomic() to derive is
> > > the
> > > bare
> > > (unqualified) type of *ptr, yet iirc only recent compilers have a
> > > way
> > > to
> > > obtain that.
> > I assume that you are speaking about typeof_unqual which requires
> > C23
> > (?).
> 
> What C version it requires doesn't matter much for our purposes. The
> question is as of which gcc / clang version (if any) this is
> supported
> as an extension.
> 
> > __auto_type seems to me can also drop volatile quilifier but in the
> > docs I don't see that it should (or not) discard qualifier. Could
> > it be
> > an option:
> >    #define write_atomic(p, x)                              \
> >    ({                                                      \
> >        __auto_type x_ = (x);                              \
> >        _write_atomic(p, &x_, sizeof(*(p)));                 \
> >    })
> 
> For our purposes __auto_type doesn't provide advantages over
> typeof().
> Plus, more importantly, the use above is wrong, just like typeof(x)
> would also be wrong. It needs to be p that the type is derived from.
> Otherwise consider what happens when ptr is unsigned long * or
> unsigned short * and you write
> 
>     write_atomic(ptr, 0);
> 
> > And another option could be just drop volatile by casting:
> >    #define write_atomic(p, x)                              \
> >    ...
> >        _write_atomic(p, (const void *)&x_,
> > sizeof(*(p)));                 
> 
> See what I said earlier about casts: You shall not cast away
> qualifiers. Besides doing so being bad practice, you'll notice the
> latest when RISC-V code also becomes subject to Misra compliance.

Then probably the best option would be to use a union:
   #define write_atomic(p, x)                                          \
   ({                                                                  \
       union { typeof(*(p)) val; char c[sizeof(*(p))]; } x_ =          \
           { .val = (x) };                                             \
       _write_atomic(p, x_.c, sizeof(*(p)));                           \
   })
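
A self-contained sketch of the union idea, under stated assumptions: model_write_atomic() and model_write() are simplified stand-ins (not the real _write_atomic()/write_atomic()), demo_pte_t is a hypothetical one-member struct, and GCC or clang is assumed for statement expressions and __typeof__. The point is that x_.c is a plain char array, so no qualifier is discarded and no cast is needed, even when p points to a volatile or non-scalar object.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Simplified stand-in for _write_atomic(): takes the new value by address,
 * as in the patch under review, and performs one sized volatile store.
 */
static inline void model_write_atomic(volatile void *p, const void *x,
                                      unsigned int size)
{
    switch ( size )
    {
    case 1: { uint8_t v;  memcpy(&v, x, sizeof(v)); *(volatile uint8_t *)p = v;  break; }
    case 2: { uint16_t v; memcpy(&v, x, sizeof(v)); *(volatile uint16_t *)p = v; break; }
    case 4: { uint32_t v; memcpy(&v, x, sizeof(v)); *(volatile uint32_t *)p = v; break; }
    case 8: { uint64_t v; memcpy(&v, x, sizeof(v)); *(volatile uint64_t *)p = v; break; }
    default: assert(!"unsupported size");
    }
}

/* The union gives access to the value's bytes through an unqualified array. */
#define model_write(p, x)                                               \
({                                                                      \
    union { __typeof__(*(p)) val; char c[sizeof(*(p))]; } x_ =          \
        { .val = (x) };                                                 \
    model_write_atomic(p, x_.c, sizeof(*(p)));                          \
})

/* Hypothetical stand-in for a non-scalar type such as pte_t. */
typedef struct { uint64_t pte; } demo_pte_t;

static uint8_t demo_store_u8(uint8_t value)
{
    volatile uint8_t slot = 0;
    volatile uint8_t *ptr = &slot;

    model_write(ptr, value);        /* volatile pointee, no warning expected */
    return slot;
}

static uint64_t demo_store_pte(uint64_t bits)
{
    demo_pte_t slot = { 0 };
    demo_pte_t new_pte = { bits };

    model_write(&slot, new_pte);    /* non-scalar value */
    return slot.pte;
}
```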
   
~ Oleksii

* Re: [PATCH v6 3/9] xen/riscv: allow write_atomic() to work with non-scalar types
  2024-09-12 11:15             ` oleksii.kurochko
@ 2024-09-12 11:41               ` oleksii.kurochko
  0 siblings, 0 replies; 42+ messages in thread
From: oleksii.kurochko @ 2024-09-12 11:41 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
	Julien Grall, Stefano Stabellini, xen-devel

On Thu, 2024-09-12 at 13:15 +0200, oleksii.kurochko@gmail.com wrote:
> On Wed, 2024-09-11 at 13:49 +0200, Jan Beulich wrote:
> > On 11.09.2024 13:34, oleksii.kurochko@gmail.com wrote:
> > > On Tue, 2024-09-10 at 18:05 +0200, Jan Beulich wrote:
> > > > On 10.09.2024 17:28, oleksii.kurochko@gmail.com wrote:
> > > > > On Tue, 2024-09-10 at 11:53 +0200, Jan Beulich wrote:
> > > > > > On 02.09.2024 19:01, Oleksii Kurochko wrote:
> > > > > > > @@ -72,7 +72,7 @@ static always_inline void
> > > > > > > _write_atomic(volatile
> > > > > > > void *p,
> > > > > > >  #define write_atomic(p, x)                             
> > > > > > > \
> > > > > > >  ({                                                     
> > > > > > > \
> > > > > > >      typeof(*(p)) x_ = (x);                             
> > > > > > > \
> > > > > > > -    _write_atomic(p, x_, sizeof(*(p)));                
> > > > > > > \
> > > > > > > +    _write_atomic(p, &x_, sizeof(*(p)));               
> > > > > > > \
> > > > > > >  })
> > > > > > >  
> > > > > > >  static always_inline void _add_sized(volatile void *p,
> > > > > > > @@ -82,27 +82,23 @@ static always_inline void
> > > > > > > _add_sized(volatile
> > > > > > > void *p,
> > > > > > >      {
> > > > > > >      case 1:
> > > > > > >      {
> > > > > > > -        volatile uint8_t *ptr = p;
> > > > > > > -        write_atomic(ptr, read_atomic(ptr) + x);
> > > > > > > +        writeb_cpu(readb_cpu(p) + x, p);
> > > > > > >          break;
> > > > > > >      }
> > > > > > >      case 2:
> > > > > > >      {
> > > > > > > -        volatile uint16_t *ptr = p;
> > > > > > > -        write_atomic(ptr, read_atomic(ptr) + x);
> > > > > > > +        writew_cpu(readw_cpu(p) + x, p);
> > > > > > >          break;
> > > > > > >      }
> > > > > > >      case 4:
> > > > > > >      {
> > > > > > > -        volatile uint32_t *ptr = p;
> > > > > > > -        write_atomic(ptr, read_atomic(ptr) + x);
> > > > > > > +        writel_cpu(readl_cpu(p) + x, p);
> > > > > > >          break;
> > > > > > >      }
> > > > > > >  #ifndef CONFIG_RISCV_32
> > > > > > >      case 8:
> > > > > > >      {
> > > > > > > -        volatile uint64_t *ptr = p;
> > > > > > > -        write_atomic(ptr, read_atomic(ptr) + x);
> > > > > > > +        writeq_cpu(readw_cpu(p) + x, p);
> > > > > > >          break;
> > > > > > >      }
> > > > > > >  #endif
> > > > > > 
> > > > > > I'm afraid I don't understand this part, or more
> > > > > > specifically
> > > > > > the
> > > > > > respective
> > > > > > part of the description. It is the first parameter of
> > > > > > write_atomic()
> > > > > > which is
> > > > > > volatile qualified. And it is the first argument that's
> > > > > > volatile
> > > > > > qualified
> > > > > > here. Isn't the problem entirely unrelated to volatile-
> > > > > > ness,
> > > > > > and
> > > > > > instead a
> > > > > > result of the other parameter changing from scalar to
> > > > > > pointer
> > > > > > type,
> > > > > > which
> > > > > > doesn't fit the addition expressions you pass in?
> > > > > if _add_sized() is defined as it was before:
> > > > >    static always_inline void _add_sized(volatile void *p,
> > > > >                                         unsigned long x,
> > > > > unsigned
> > > > > int
> > > > >    size)
> > > > >    {
> > > > >        switch ( size )
> > > > >        {
> > > > >        case 1:
> > > > >        {
> > > > >            volatile uint8_t *ptr = p;
> > > > >            write_atomic(ptr, read_atomic(ptr) + x);
> > > > >            break;
> > > > >        }
> > > > >    ...
> > > > > Then write_atomic(ptr, read_atomic(ptr) + x) will be be
> > > > > changed
> > > > > to:
> > > > >    volatile uint8_t x_ = (x);
> > > > >    
> > > > > And that will cause a compiler error:
> > > > >    ./arch/riscv/include/asm/atomic.h:75:22: error: passing
> > > > > argument
> > > > > 2
> > > > >    of '_write_atomic' discards 'volatile' qualifier from
> > > > > pointer
> > > > > target
> > > > >    type [-Werror=discarded-qualifiers]
> > > > >       75 |     _write_atomic(p, &x_, sizeof(*(p)));
> > > > > Because it can't cast 'volatile uint8_t *' to 'void *':
> > > > >    expected 'void *' but argument is of type 'volatile
> > > > > uint8_t
> > > > > *'
> > > > > {aka
> > > > >    'volatile unsigned char *'}
> > > > 
> > > > Oh, I think I see now. What we'd like write_atomic() to derive
> > > > is
> > > > the
> > > > bare
> > > > (unqualified) type of *ptr, yet iirc only recent compilers have
> > > > a
> > > > way
> > > > to
> > > > obtain that.
> > > I assume that you are speaking about typeof_unqual which requires
> > > C23
> > > (?).
> > 
> > What C version it requires doesn't matter much for our purposes.
> > The
> > question is as of which gcc / clang version (if any) this is
> > supported
> > as an extension.
> > 
> > > __auto_type seems to me can also drop volatile quilifier but in
> > > the
> > > docs I don't see that it should (or not) discard qualifier. Could
> > > it be
> > > an option:
> > >    #define write_atomic(p, x)                              \
> > >    ({                                                      \
> > >        __auto_type x_ = (x);                              \
> > >        _write_atomic(p, &x_, sizeof(*(p)));                 \
> > >    })
> > 
> > For our purposes __auto_type doesn't provide advantages over
> > typeof().
> > Plus, more importantly, the use above is wrong, just like typeof(x)
> > would also be wrong. It needs to be p that the type is derived
> > from.
> > Otherwise consider what happens when ptr is unsigned long * or
> > unsigned short * and you write
> > 
> >     write_atomic(ptr, 0);
> > 
> > > And another option could be just drop volatile by casting:
> > >    #define write_atomic(p, x)                              \
> > >    ...
> > >        _write_atomic(p, (const void *)&x_,
> > > sizeof(*(p)));                 
> > 
> > See what I said earlier about casts: You shall not cast away
> > qualifiers. Besides doing so being bad practice, you'll notice the
> > latest when RISC-V code also becomes subject to Misra compliance.
> 
> Then probably the best one option will be to use union:
>    #define write_atomic(p, x)                                        
>    \
>    ({                                                                
>    \
>        union { typeof(*(p)) val; char c[sizeof(*(p))]; } x_ = { .val
> =
>    (x) };  \
>        _write_atomic(p, x_.c, sizeof(*(p)));                         
>    \
>    })
Or maybe we can use 'unsigned long' instead of char c[]; then the
casts inside _write_atomic() could be dropped, as we could switch to
_write_atomic(..., const unsigned long x, ...).

But then it would probably be good to initialize x_.c = 0UL first, to
be sure that when val has a narrower type (uint8_t, for example) the
remaining (more significant) bytes of 'union {...; unsigned long c}'
are 0.
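
A rough sketch of that variant (widen_via_union() is a hypothetical name; GCC or clang and a little-endian target such as RISC-V are assumed). Note the caveat: ISO C leaves the bytes outside val unspecified after a store to val, so this relies on compiler behaviour in practice and is not a portable guarantee.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Zero the unsigned long member first, then store the narrower value.
 * In practice (GCC/clang) the bytes not covered by val stay zero; the C
 * standard itself does not promise that.
 */
#define widen_via_union(type, x)                                 \
({                                                               \
    union { type val; unsigned long ul; } x_ = { .ul = 0UL };    \
    x_.val = (x);                                                \
    x_.ul;                                                       \
})

static unsigned long widen_u8(uint8_t v)
{
    return widen_via_union(uint8_t, v);
}

static unsigned long widen_u16(uint16_t v)
{
    return widen_via_union(uint16_t, v);
}

static void check_widening(void)
{
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    /* On little-endian the narrow value lands in the low-order bytes. */
    assert(widen_u8(0xab) == 0xabUL);
    assert(widen_u16(0xbeef) == 0xbeefUL);
#endif
}
```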

~ Oleksii

* [PATCH v6 4/9] xen/riscv: set up fixmap mappings
  2024-09-02 17:01 [PATCH v6 0/9] RISCV device tree mapping Oleksii Kurochko
                   ` (2 preceding siblings ...)
  2024-09-02 17:01 ` [PATCH v6 3/9] xen/riscv: allow write_atomic() to work with non-scalar types Oleksii Kurochko
@ 2024-09-02 17:01 ` Oleksii Kurochko
  2024-09-10 10:01   ` Jan Beulich
  2024-09-02 17:01 ` [PATCH v6 5/9] xen/riscv: introduce asm/pmap.h header Oleksii Kurochko
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 42+ messages in thread
From: Oleksii Kurochko @ 2024-09-02 17:01 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksii Kurochko, Alistair Francis, Bob Eshleman, Connor Davis,
	Andrew Cooper, Jan Beulich, Julien Grall, Stefano Stabellini

Set up fixmap mappings and the L0 page table for fixmap support.

Modify the Page Table Entries (PTEs) directly in arch_pmap_map()
instead of using set_fixmap() ( which relies on map_pages_to_xen() ).
This change is necessary because PMAP is used when the domain map
page infrastructure is not yet initialized, so map_pages_to_xen(),
called by set_fixmap(), needs to map pages on demand, which in turn
calls pmap() again, resulting in a loop.
The same reasoning applies to pmap_unmap(), which also modifies
PTEs directly to avoid this issue.

Define new macros in riscv/config.h for calculating
the FIXMAP_BASE address, including BOOT_FDT_VIRT_{START, SIZE},
XEN_VIRT_SIZE, and XEN_VIRT_END.

Update the check for Xen size in riscv/lds.S to use
XEN_VIRT_SIZE instead of a hardcoded constant.

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
Changes in V6:
 - avoid case mixing for address in RISC-V64 layout table.
 - move definition of FIXMAP_BASE to new line.
 - update the commit message.
---
Changes in V5:
 - move definition of FIXMAP_ADDR() to asm/fixmap.h
 - add gap size equal to 2 MB ( 512 * 4K one page table entry in L1 page table )
   between Xen, FDT and Fixmap.
 - drop the comment for FIX_LAST.
 - move +1 from FIX_LAST definition to FIXADDR_TOP to be aligned with Arm.
   ( probably everything below FIX_LAST will be moved to a separate header in asm/generic.h )
 - correct the "Changes in V4" entry: s/'fence r,r'/'fence rw, rw'
 - use write_atomic() in set_pte().
 - introduce read_pte().
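
As a cross-check of the 2 MB gaps against the layout table, the arithmetic can be replayed at compile time. This is a sketch: XEN_VIRT_START is taken from the table in config.h, and MB() is assumed to be the usual megabyte helper.

```c
#include <assert.h>
#include <stdint.h>

#define MB(n)                   ((uint64_t)(n) << 20)       /* assumed helper */

#define XEN_VIRT_START          UINT64_C(0xffffffffc0000000) /* from the table */
#define GAP_SIZE                MB(2)
#define XEN_VIRT_SIZE           MB(2)
#define BOOT_FDT_VIRT_START     (XEN_VIRT_START + XEN_VIRT_SIZE + GAP_SIZE)
#define BOOT_FDT_VIRT_SIZE      MB(4)
#define FIXMAP_BASE \
    (BOOT_FDT_VIRT_START + BOOT_FDT_VIRT_SIZE + GAP_SIZE)

/* Xen: 0xffffffffc0000000..0xffffffffc0200000, then a 2 MB gap. */
static_assert(BOOT_FDT_VIRT_START == UINT64_C(0xffffffffc0400000),
              "FDT start does not match the layout table");

/* FDT: 4 MB up to 0xffffffffc0800000, then another 2 MB gap. */
static_assert(BOOT_FDT_VIRT_START + BOOT_FDT_VIRT_SIZE ==
              UINT64_C(0xffffffffc0800000),
              "FDT end does not match the layout table");

static_assert(FIXMAP_BASE == UINT64_C(0xffffffffc0a00000),
              "fixmap base does not match the layout table");

static uint64_t demo_fixmap_base(void)
{
    return FIXMAP_BASE;
}
```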
---
Changes in V4:
 - move definitions of XEN_VIRT_SIZE, BOOT_FDT_VIRT_{START,SIZE}, FIXMAP_{BASE,ADDR}
   below XEN_VIRT_START to have definitions appear in order.
 - define FIX_LAST as (FIX_MISC + 1) to have a guard slot at the end.
 - s/enumerated/numbered in the comment
 - update the loop which looks for the L1 page table in setup_fixmap_mapping_function() and
   the comment above it.
 - drop fences inside write_pte() and put 'fence rw,rw' in setup_fixmap() before sfence_vma().
 - update the commit message
 - drop printk message inside setup_fixmap().
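
For reference, a sketch of how the L1 lookup loop behaves. demo_pt_index() and demo_walk_levels() are hypothetical helpers; the index computation assumes the standard layout of 4 KB pages with 9 index bits per level, and FIXMAP_BASE is the value from the layout table.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT          12
#define PAGETABLE_ORDER     9       /* 512 entries per table */
#define FIXMAP_BASE         UINT64_C(0xffffffffc0a00000)

/* Index into the table at the given level for a virtual address. */
static unsigned int demo_pt_index(unsigned int level, uint64_t va)
{
    return (va >> (PAGE_SHIFT + PAGETABLE_ORDER * level)) & 0x1ff;
}

/*
 * Replays only the loop header used in setup_fixmap_mappings() and records
 * the value of i seen by each iteration of the body. The caller must pass
 * an array large enough for root_level entries.
 */
static unsigned int demo_walk_levels(unsigned int root_level,
                                     unsigned int visited[])
{
    unsigned int i, n = 0;

    for ( i = root_level; i-- > 1; )
        visited[n++] = i;

    return n;
}

static void check_walk(void)
{
    unsigned int visited[4];

    /* Sv39: root level is 2, so the body runs once, with i == 1. */
    assert(demo_walk_levels(2, visited) == 1 && visited[0] == 1);

    /* The fixmap sits in L2 slot 511, L1 slot 5, and starts at L0 slot 0. */
    assert(demo_pt_index(2, FIXMAP_BASE) == 511);
    assert(demo_pt_index(1, FIXMAP_BASE) == 5);
    assert(demo_pt_index(0, FIXMAP_BASE) == 0);
}
```

With a root level of 3 the same header would visit i == 2 and then i == 1, so in both cases the walk stops at the L1 entry that is to point at xen_fixmap[].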
---
Changes in V3:
 - s/XEN_SIZE/XEN_VIRT_SIZE
 - drop usage of XEN_VIRT_END.
 - sort newly introduced defines in config.h by address
 - code style fixes
 - drop the runtime check that the pte is valid, as it is already checked by BUG_ON() in the
   L1 page table lookup loop.
 - update implementation of write_pte() with FENCE rw, rw.
 - add BUILD_BUG_ON() to check that the number of fixmap entries isn't bigger than the number
   of entries in a page table.
 - drop set_fixmap, clear_fixmap declarations as they aren't used and defined now
 - update the commit message.
 - s/__ASM_FIXMAP_H/ASM_FIXMAP_H
 - add SPDX-License-Identifier: GPL-2.0 
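
A sketch of the slot/address round trip that fixmap.h provides, with the slot-count check expressed as a static assertion. Assumptions: NUM_FIX_PMAP is 8 here purely for illustration (the real value comes from xen/pmap.h), FIXMAP_BASE is the value from the layout table, and the demo_* helpers are hypothetical stand-ins that return -1 where the real virt_to_fix() would hit BUG_ON().

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT          12
#define PAGE_SIZE           (UINT64_C(1) << PAGE_SHIFT)
#define PAGETABLE_ENTRIES   512
#define FIXMAP_BASE         UINT64_C(0xffffffffc0a00000)
#define NUM_FIX_PMAP        8       /* assumed value, for illustration only */

#define FIXMAP_ADDR(n)      (FIXMAP_BASE + (n) * PAGE_SIZE)

#define FIX_PMAP_BEGIN      0
#define FIX_PMAP_END        (FIX_PMAP_BEGIN + NUM_FIX_PMAP - 1)
#define FIX_MISC            (FIX_PMAP_END + 1)
#define FIX_LAST            FIX_MISC

#define FIXADDR_START       FIXMAP_ADDR(0)
#define FIXADDR_TOP         FIXMAP_ADDR(FIX_LAST + 1)

/* Same condition as the BUILD_BUG_ON() in setup_fixmap_mappings(). */
static_assert(FIX_LAST < PAGETABLE_ENTRIES, "fixmap must fit in one L0 table");

static uint64_t demo_fix_to_virt(unsigned int slot)
{
    return FIXMAP_ADDR(slot);
}

/* Returns the slot index, or -1 for an address outside the fixmap range. */
static int demo_virt_to_fix(uint64_t vaddr)
{
    if ( vaddr >= FIXADDR_TOP || vaddr < FIXADDR_START )
        return -1;

    return (int)((vaddr - FIXADDR_START) >> PAGE_SHIFT);
}
```

Any address inside a slot's page maps back to that slot, and FIXADDR_TOP itself is already out of range because it is the first address past slot FIX_LAST.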
---
 xen/arch/riscv/include/asm/config.h | 16 ++++++++--
 xen/arch/riscv/include/asm/fixmap.h | 46 +++++++++++++++++++++++++++++
 xen/arch/riscv/include/asm/mm.h     |  2 ++
 xen/arch/riscv/include/asm/page.h   | 13 ++++++++
 xen/arch/riscv/mm.c                 | 43 +++++++++++++++++++++++++++
 xen/arch/riscv/setup.c              |  2 ++
 xen/arch/riscv/xen.lds.S            |  2 +-
 7 files changed, 121 insertions(+), 3 deletions(-)
 create mode 100644 xen/arch/riscv/include/asm/fixmap.h

diff --git a/xen/arch/riscv/include/asm/config.h b/xen/arch/riscv/include/asm/config.h
index 50583aafdc..7dbb235685 100644
--- a/xen/arch/riscv/include/asm/config.h
+++ b/xen/arch/riscv/include/asm/config.h
@@ -41,8 +41,10 @@
  * Start addr          | End addr         | Slot       | area description
  * ============================================================================
  *                   .....                 L2 511          Unused
- *  0xffffffffc0600000  0xffffffffc0800000 L2 511          Fixmap
- *  0xffffffffc0200000  0xffffffffc0600000 L2 511          FDT
+ *  0xffffffffc0a00000  0xffffffffc0c00000 L2 511          Fixmap
+ *                   ..... ( 2 MB gap )
+ *  0xffffffffc0400000  0xffffffffc0800000 L2 511          FDT
+ *                   ..... ( 2 MB gap )
  *  0xffffffffc0000000  0xffffffffc0200000 L2 511          Xen
  *                   .....                 L2 510          Unused
  *  0x3200000000        0x7f40000000       L2 200-509      Direct map
@@ -74,6 +76,16 @@
 #error "unsupported RV_STAGE1_MODE"
 #endif
 
+#define GAP_SIZE                MB(2)
+
+#define XEN_VIRT_SIZE           MB(2)
+
+#define BOOT_FDT_VIRT_START     (XEN_VIRT_START + XEN_VIRT_SIZE + GAP_SIZE)
+#define BOOT_FDT_VIRT_SIZE      MB(4)
+
+#define FIXMAP_BASE \
+    (BOOT_FDT_VIRT_START + BOOT_FDT_VIRT_SIZE + GAP_SIZE)
+
 #define DIRECTMAP_SLOT_END      509
 #define DIRECTMAP_SLOT_START    200
 #define DIRECTMAP_VIRT_START    SLOTN(DIRECTMAP_SLOT_START)
diff --git a/xen/arch/riscv/include/asm/fixmap.h b/xen/arch/riscv/include/asm/fixmap.h
new file mode 100644
index 0000000000..63732df36c
--- /dev/null
+++ b/xen/arch/riscv/include/asm/fixmap.h
@@ -0,0 +1,46 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * fixmap.h: compile-time virtual memory allocation
+ */
+#ifndef ASM_FIXMAP_H
+#define ASM_FIXMAP_H
+
+#include <xen/bug.h>
+#include <xen/page-size.h>
+#include <xen/pmap.h>
+
+#include <asm/page.h>
+
+#define FIXMAP_ADDR(n) (FIXMAP_BASE + (n) * PAGE_SIZE)
+
+/* Fixmap slots */
+#define FIX_PMAP_BEGIN (0) /* Start of PMAP */
+#define FIX_PMAP_END (FIX_PMAP_BEGIN + NUM_FIX_PMAP - 1) /* End of PMAP */
+#define FIX_MISC (FIX_PMAP_END + 1)  /* Ephemeral mappings of hardware */
+
+#define FIX_LAST FIX_MISC
+
+#define FIXADDR_START FIXMAP_ADDR(0)
+#define FIXADDR_TOP FIXMAP_ADDR(FIX_LAST + 1)
+
+#ifndef __ASSEMBLY__
+
+/*
+ * Direct access to xen_fixmap[] should only happen when {set,
+ * clear}_fixmap() is unusable (e.g. where we would end up to
+ * recursively call the helpers).
+ */
+extern pte_t xen_fixmap[];
+
+#define fix_to_virt(slot) ((void *)FIXMAP_ADDR(slot))
+
+static inline unsigned int virt_to_fix(vaddr_t vaddr)
+{
+    BUG_ON(vaddr >= FIXADDR_TOP || vaddr < FIXADDR_START);
+
+    return ((vaddr - FIXADDR_START) >> PAGE_SHIFT);
+}
+
+#endif /* __ASSEMBLY__ */
+
+#endif /* ASM_FIXMAP_H */
diff --git a/xen/arch/riscv/include/asm/mm.h b/xen/arch/riscv/include/asm/mm.h
index 25af9e1aaa..a0bdc2bc3a 100644
--- a/xen/arch/riscv/include/asm/mm.h
+++ b/xen/arch/riscv/include/asm/mm.h
@@ -255,4 +255,6 @@ static inline unsigned int arch_get_dma_bitsize(void)
     return 32; /* TODO */
 }
 
+void setup_fixmap_mappings(void);
+
 #endif /* _ASM_RISCV_MM_H */
diff --git a/xen/arch/riscv/include/asm/page.h b/xen/arch/riscv/include/asm/page.h
index c831e16417..a7419b93b2 100644
--- a/xen/arch/riscv/include/asm/page.h
+++ b/xen/arch/riscv/include/asm/page.h
@@ -9,6 +9,7 @@
 #include <xen/bug.h>
 #include <xen/types.h>
 
+#include <asm/atomic.h>
 #include <asm/mm.h>
 #include <asm/page-bits.h>
 
@@ -81,6 +82,18 @@ static inline void flush_page_to_ram(unsigned long mfn, bool sync_icache)
     BUG_ON("unimplemented");
 }
 
+/* Write a pagetable entry. */
+static inline void write_pte(pte_t *p, pte_t pte)
+{
+    write_atomic(p, pte);
+}
+
+/* Read a pagetable entry. */
+static inline pte_t read_pte(pte_t *p)
+{
+    return read_atomic(p);
+}
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_RISCV_PAGE_H */
diff --git a/xen/arch/riscv/mm.c b/xen/arch/riscv/mm.c
index 7d09e781bf..b8ff91cf4e 100644
--- a/xen/arch/riscv/mm.c
+++ b/xen/arch/riscv/mm.c
@@ -12,6 +12,7 @@
 #include <asm/early_printk.h>
 #include <asm/csr.h>
 #include <asm/current.h>
+#include <asm/fixmap.h>
 #include <asm/page.h>
 #include <asm/processor.h>
 
@@ -49,6 +50,9 @@ stage1_pgtbl_root[PAGETABLE_ENTRIES];
 pte_t __section(".bss.page_aligned") __aligned(PAGE_SIZE)
 stage1_pgtbl_nonroot[PGTBL_INITIAL_COUNT * PAGETABLE_ENTRIES];
 
+pte_t __section(".bss.page_aligned") __aligned(PAGE_SIZE)
+xen_fixmap[PAGETABLE_ENTRIES];
+
 #define HANDLE_PGTBL(curr_lvl_num)                                          \
     index = pt_index(curr_lvl_num, page_addr);                              \
     if ( pte_is_valid(pgtbl[index]) )                                       \
@@ -191,6 +195,45 @@ static bool __init check_pgtbl_mode_support(struct mmu_desc *mmu_desc,
     return is_mode_supported;
 }
 
+void __init setup_fixmap_mappings(void)
+{
+    pte_t *pte, tmp;
+    unsigned int i;
+
+    BUILD_BUG_ON(FIX_LAST >= PAGETABLE_ENTRIES);
+
+    pte = &stage1_pgtbl_root[pt_index(HYP_PT_ROOT_LEVEL, FIXMAP_ADDR(0))];
+
+    /*
+     * In RISC-V page table levels are numbered from Lx to L0 where x is
+     * the highest page table level for the current MMU mode ( for example,
+     * Sv39 has 3 page table levels, so x = 2 (L2 -> L1 -> L0) ).
+     *
+     * In this loop we want to find the L1 page table because xen_fixmap[]
+     * will be used as the L0 page table.
+     */
+    for ( i = HYP_PT_ROOT_LEVEL; i-- > 1; )
+    {
+        BUG_ON(!pte_is_valid(*pte));
+
+        pte = (pte_t *)LOAD_TO_LINK(pte_to_paddr(*pte));
+        pte = &pte[pt_index(i, FIXMAP_ADDR(0))];
+    }
+
+    BUG_ON(pte_is_valid(*pte));
+
+    tmp = paddr_to_pte(LINK_TO_LOAD((unsigned long)&xen_fixmap), PTE_TABLE);
+    write_pte(pte, tmp);
+
+    RISCV_FENCE(rw, rw);
+    sfence_vma();
+
+    /*
+     * We only need the zeroth table allocated, but not the PTEs set, because
+     * set_fixmap() will set them on the fly.
+     */
+}
+
 /*
  * setup_initial_pagetables:
  *
diff --git a/xen/arch/riscv/setup.c b/xen/arch/riscv/setup.c
index 4defad68f4..13f0e8c77d 100644
--- a/xen/arch/riscv/setup.c
+++ b/xen/arch/riscv/setup.c
@@ -46,6 +46,8 @@ void __init noreturn start_xen(unsigned long bootcpu_id,
     test_macros_from_bug_h();
 #endif
 
+    setup_fixmap_mappings();
+
     printk("All set up\n");
 
     for ( ;; )
diff --git a/xen/arch/riscv/xen.lds.S b/xen/arch/riscv/xen.lds.S
index 070b19d915..7a683f6065 100644
--- a/xen/arch/riscv/xen.lds.S
+++ b/xen/arch/riscv/xen.lds.S
@@ -181,6 +181,6 @@ ASSERT(!SIZEOF(.got.plt),  ".got.plt non-empty")
  * Changing the size of Xen binary can require an update of
  * PGTBL_INITIAL_COUNT.
  */
-ASSERT(_end - _start <= MB(2), "Xen too large for early-boot assumptions")
+ASSERT(_end - _start <= XEN_VIRT_SIZE, "Xen too large for early-boot assumptions")
 
 ASSERT(_ident_end - _ident_start <= IDENT_AREA_SIZE, "identity region is too big");
-- 
2.46.0



* Re: [PATCH v6 4/9] xen/riscv: set up fixmap mappings
  2024-09-02 17:01 ` [PATCH v6 4/9] xen/riscv: set up fixmap mappings Oleksii Kurochko
@ 2024-09-10 10:01   ` Jan Beulich
  2024-09-10 15:55     ` oleksii.kurochko
  0 siblings, 1 reply; 42+ messages in thread
From: Jan Beulich @ 2024-09-10 10:01 UTC (permalink / raw)
  To: Oleksii Kurochko
  Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
	Julien Grall, Stefano Stabellini, xen-devel

On 02.09.2024 19:01, Oleksii Kurochko wrote:
> Set up fixmap mappings and the L0 page table for fixmap support.
> 
> Modify the Page Table Entries (PTEs) directly in arch_pmap_map()
> instead of using set_fixmap() ( which relies on map_pages_to_xen() ).

What do you derive this from? There's no set_fixmap() here, and hence
it's unknown how it is going to be implemented. The most you can claim
is that it is expected that it will use map_pages_to_xen(), which in
turn ...

> This change is necessary because PMAP is used when the domain map
> page infrastructure is not yet initialized so map_pages_to_xen()
> called by set_fixmap() needs to map pages on demand, which then
> calls pmap() again, resulting in a loop.

... is only expected to use arch_pmap_map().

> @@ -81,6 +82,18 @@ static inline void flush_page_to_ram(unsigned long mfn, bool sync_icache)
>      BUG_ON("unimplemented");
>  }
>  
> +/* Write a pagetable entry. */
> +static inline void write_pte(pte_t *p, pte_t pte)
> +{
> +    write_atomic(p, pte);
> +}
> +
> +/* Read a pagetable entry. */
> +static inline pte_t read_pte(pte_t *p)

const pte_t *?

Jan


^ permalink raw reply	[flat|nested] 42+ messages in thread
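
For reference, the const-qualified accessor suggested in the review above
could look roughly as follows. This is a standalone user-space sketch, not
the Xen code: pte_t is reduced to a single 64-bit field, and the GCC/Clang
__atomic builtins stand in (as an assumption) for Xen's read_atomic() and
write_atomic().

```c
#include <stdint.h>

/* Simplified stand-in for Xen's pte_t (assumption: one 64-bit word). */
typedef struct {
    uint64_t pte;
} pte_t;

/* Write a pagetable entry with a single atomic store. */
static inline void write_pte(pte_t *p, pte_t pte)
{
    __atomic_store_n(&p->pte, pte.pte, __ATOMIC_RELAXED);
}

/* Read a pagetable entry; the pointee is never modified, hence const. */
static inline pte_t read_pte(const pte_t *p)
{
    return (pte_t){ .pte = __atomic_load_n(&p->pte, __ATOMIC_RELAXED) };
}
```

With the const qualifier, callers that only hold a "const pte_t *" can read
an entry without casting the qualifier away.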

* Re: [PATCH v6 4/9] xen/riscv: set up fixmap mappings
  2024-09-10 10:01   ` Jan Beulich
@ 2024-09-10 15:55     ` oleksii.kurochko
  2024-09-10 16:07       ` Jan Beulich
  0 siblings, 1 reply; 42+ messages in thread
From: oleksii.kurochko @ 2024-09-10 15:55 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
	Julien Grall, Stefano Stabellini, xen-devel

On Tue, 2024-09-10 at 12:01 +0200, Jan Beulich wrote:
> On 02.09.2024 19:01, Oleksii Kurochko wrote:
> > Set up fixmap mappings and the L0 page table for fixmap support.
> > 
> > Modify the Page Table Entries (PTEs) directly in arch_pmap_map()
> > instead of using set_fixmap() ( which relies on map_pages_to_xen()
> > ).
> 
> What do you derive this from? There's no set_fixmap() here, and hence
> it's unknown how it is going to be implemented.
I derived it from my own code, where set_fixmap() is implemented, but I
agree that in the brackets it is better to write "will use
map_pages_to_xen()" instead of "which relies on map_pages_to_xen()".

>  The most you can claim
> is that it is expected that it will use map_pages_to_xen(), which in
> turn ...
> 
> > This change is necessary because PMAP is used when the domain map
> > page infrastructure is not yet initialized so map_pages_to_xen()
> > called by set_fixmap() needs to map pages on demand, which then
> > calls pmap() again, resulting in a loop.
> 
> ... is only expected to use arch_pmap_map().
That is what is written in the message, isn't it? But I am okay with
changing this part of the commit message to:
   {set, clear}_fixmap() is expected to be implemented using
   map_pages_to_xen(), which, in turn, is only expected to use
   arch_pmap_map().

> 
> > @@ -81,6 +82,18 @@ static inline void flush_page_to_ram(unsigned
> > long mfn, bool sync_icache)
> >      BUG_ON("unimplemented");
> >  }
> >  
> > +/* Write a pagetable entry. */
> > +static inline void write_pte(pte_t *p, pte_t pte)
> > +{
> > +    write_atomic(p, pte);
> > +}
> > +
> > +/* Read a pagetable entry. */
> > +static inline pte_t read_pte(pte_t *p)
> 
> const pte_t *?
It would be better to make it const. Thanks.

~ Oleksii



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v6 4/9] xen/riscv: set up fixmap mappings
  2024-09-10 15:55     ` oleksii.kurochko
@ 2024-09-10 16:07       ` Jan Beulich
  0 siblings, 0 replies; 42+ messages in thread
From: Jan Beulich @ 2024-09-10 16:07 UTC (permalink / raw)
  To: oleksii.kurochko
  Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
	Julien Grall, Stefano Stabellini, xen-devel

On 10.09.2024 17:55, oleksii.kurochko@gmail.com wrote:
> On Tue, 2024-09-10 at 12:01 +0200, Jan Beulich wrote:
>> On 02.09.2024 19:01, Oleksii Kurochko wrote:
>>> Set up fixmap mappings and the L0 page table for fixmap support.
>>>
>>> Modify the Page Table Entries (PTEs) directly in arch_pmap_map()
>>> instead of using set_fixmap() ( which relies on map_pages_to_xen()
>>> ).
>>
>> What do you derive this from? There's no set_fixmap() here, and hence
>> it's unknown how it is going to be implemented.
> I derived it from my own code, where set_fixmap() is implemented, but I
> agree that in the brackets it is better to write "will use
> map_pages_to_xen()" instead of "which relies on map_pages_to_xen()".
> 
>>  The most you can claim
>> is that it is expected that it will use map_pages_to_xen(), which in
>> turn ...
>>
>>> This change is necessary because PMAP is used when the domain map
>>> page infrastructure is not yet initialized so map_pages_to_xen()
>>> called by set_fixmap() needs to map pages on demand, which then
>>> calls pmap() again, resulting in a loop.
>>
>> ... is only expected to use arch_pmap_map().
> it is what is written in the message, isn't it?

Not quite - the original sentence is written as if map_pages_to_xen()
existed already in the code base, or was brought into existence by
this very patch. (Of course I mean the real function, not the stub
that's there.)

Jan


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v6 5/9] xen/riscv: introduce asm/pmap.h header
  2024-09-02 17:01 [PATCH v6 0/9] RISCV device tree mapping Oleksii Kurochko
                   ` (3 preceding siblings ...)
  2024-09-02 17:01 ` [PATCH v6 4/9] xen/riscv: set up fixmap mappings Oleksii Kurochko
@ 2024-09-02 17:01 ` Oleksii Kurochko
  2024-09-02 17:01 ` [PATCH v6 6/9] xen/riscv: introduce functionality to work with CPU info Oleksii Kurochko
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 42+ messages in thread
From: Oleksii Kurochko @ 2024-09-02 17:01 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksii Kurochko, Alistair Francis, Bob Eshleman, Connor Davis,
	Andrew Cooper, Jan Beulich, Julien Grall, Stefano Stabellini

Introduce arch_pmap_{un}map functions and select HAS_PMAP for CONFIG_RISCV.

Add pte_from_mfn() for use in arch_pmap_map().

Introduce flush_xen_tlb_one_local() and use it in arch_pmap_{un}map().

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
Changes in V6:
 - No changes ( only rebase )
---
Changes in V5:
 - Add Reviewed-by: Jan Beulich <jbeulich@suse.com>.
 - Fix a typo in "Changes in V4":
   - "drop flush_xen_tlb_range_va_local() as it isn't used in this patch" ->
     "drop flush_xen_tlb_range_va() as it isn't used in this patch"
   - "s/flush_xen_tlb_range_va_local/flush_tlb_range_va_local" ->
     "s/flush_xen_tlb_one_local/flush_tlb_one_local"
---
Changes in V4:
 - mark arch_pmap_{un}map() as __init: documentation purpose and
   a necessary (but not sufficient) condition here, to validly
   use local TLB flushes only.
 - add flush_xen_tlb_one_local() to arch_pmap_map() as absence of
   "negative" TLB entries will be guaranteed only in the case
   when Svvptc extension is present.
 - s/mfn_from_pte/pte_from_mfn
 - drop mfn_to_xen_entry() as pte_from_mfn() does the same thing
 - add flags argument to pte_from_mfn().
 - update the commit message.
 - drop flush_xen_tlb_range_va() as it isn't used in this patch
 - s/flush_xen_tlb_one_local/flush_tlb_one_local
---
Changes in V3:
 - rename argument of function mfn_to_xen_entry(..., attr -> flags ).
 - update the code of mfn_to_xen_entry() to use flags argument.
 - add blank in mfn_from_pte() in return line.
 - introduce flush_xen_tlb_range_va_local() and use it inside arch_pmap_{un}map().
 - s/__ASM_PMAP_H__/ASM_PMAP_H
 - add SPDX-License-Identifier: GPL-2.0 
---
 xen/arch/riscv/Kconfig                |  1 +
 xen/arch/riscv/include/asm/flushtlb.h |  6 +++++
 xen/arch/riscv/include/asm/page.h     |  6 +++++
 xen/arch/riscv/include/asm/pmap.h     | 36 +++++++++++++++++++++++++++
 4 files changed, 49 insertions(+)
 create mode 100644 xen/arch/riscv/include/asm/pmap.h

diff --git a/xen/arch/riscv/Kconfig b/xen/arch/riscv/Kconfig
index 259eea8d3b..0112aa8778 100644
--- a/xen/arch/riscv/Kconfig
+++ b/xen/arch/riscv/Kconfig
@@ -3,6 +3,7 @@ config RISCV
 	select FUNCTION_ALIGNMENT_16B
 	select GENERIC_BUG_FRAME
 	select HAS_DEVICE_TREE
+	select HAS_PMAP
 
 config RISCV_64
 	def_bool y
diff --git a/xen/arch/riscv/include/asm/flushtlb.h b/xen/arch/riscv/include/asm/flushtlb.h
index 7ce32bea0b..f4a735fd6c 100644
--- a/xen/arch/riscv/include/asm/flushtlb.h
+++ b/xen/arch/riscv/include/asm/flushtlb.h
@@ -5,6 +5,12 @@
 #include <xen/bug.h>
 #include <xen/cpumask.h>
 
+/* Flush TLB of local processor for address va. */
+static inline void flush_tlb_one_local(vaddr_t va)
+{
+    asm volatile ( "sfence.vma %0" :: "r" (va) : "memory" );
+}
+
 /*
  * Filter the given set of CPUs, removing those that definitely flushed their
  * TLB since @page_timestamp.
diff --git a/xen/arch/riscv/include/asm/page.h b/xen/arch/riscv/include/asm/page.h
index a7419b93b2..55916eaa92 100644
--- a/xen/arch/riscv/include/asm/page.h
+++ b/xen/arch/riscv/include/asm/page.h
@@ -94,6 +94,12 @@ static inline pte_t read_pte(pte_t *p)
     return read_atomic(p);
 }
 
+static inline pte_t pte_from_mfn(mfn_t mfn, unsigned int flags)
+{
+    unsigned long pte = (mfn_x(mfn) << PTE_PPN_SHIFT) | flags;
+    return (pte_t){ .pte = pte };
+}
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_RISCV_PAGE_H */
diff --git a/xen/arch/riscv/include/asm/pmap.h b/xen/arch/riscv/include/asm/pmap.h
new file mode 100644
index 0000000000..60065c996f
--- /dev/null
+++ b/xen/arch/riscv/include/asm/pmap.h
@@ -0,0 +1,36 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef ASM_PMAP_H
+#define ASM_PMAP_H
+
+#include <xen/bug.h>
+#include <xen/init.h>
+#include <xen/mm.h>
+#include <xen/page-size.h>
+
+#include <asm/fixmap.h>
+#include <asm/flushtlb.h>
+#include <asm/system.h>
+
+static inline void __init arch_pmap_map(unsigned int slot, mfn_t mfn)
+{
+    pte_t *entry = &xen_fixmap[slot];
+    pte_t pte;
+
+    ASSERT(!pte_is_valid(*entry));
+
+    pte = pte_from_mfn(mfn, PAGE_HYPERVISOR_RW);
+    write_pte(entry, pte);
+
+    flush_tlb_one_local(FIXMAP_ADDR(slot));
+}
+
+static inline void __init arch_pmap_unmap(unsigned int slot)
+{
+    pte_t pte = {};
+
+    write_pte(&xen_fixmap[slot], pte);
+
+    flush_tlb_one_local(FIXMAP_ADDR(slot));
+}
+
+#endif /* ASM_PMAP_H */
-- 
2.46.0



^ permalink raw reply related	[flat|nested] 42+ messages in thread
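
To make the encoding done by pte_from_mfn() concrete, here is a standalone
sketch. It assumes the Sv39 layout, in which the physical page number
occupies bits 53:10 of the PTE (so PTE_PPN_SHIFT is 10) and the low bits
hold V/R/W. The flag macros are illustrative stand-ins rather than Xen's
definitions, mfn_t is reduced to a plain unsigned long, and pte_to_mfn() is
a hypothetical helper added only to show the round trip.

```c
#include <stdint.h>

#define PTE_PPN_SHIFT 10                  /* assumption: Sv39 PPN position */
#define PTE_PPN_MASK  ((UINT64_C(1) << 44) - 1) /* 44-bit PPN in Sv39 */
#define PTE_VALID     (1U << 0)           /* V bit */
#define PTE_READABLE  (1U << 1)           /* R bit */
#define PTE_WRITABLE  (1U << 2)           /* W bit */

typedef struct {
    uint64_t pte;
} pte_t;

/* Place the frame number above the flag bits and OR in the flags. */
static inline pte_t pte_from_mfn(unsigned long mfn, unsigned int flags)
{
    uint64_t pte = ((uint64_t)mfn << PTE_PPN_SHIFT) | flags;

    return (pte_t){ .pte = pte };
}

/* An entry is in use once its V bit is set. */
static inline int pte_is_valid(pte_t p)
{
    return (p.pte & PTE_VALID) != 0;
}

/* Hypothetical inverse of pte_from_mfn(), for illustration only. */
static inline unsigned long pte_to_mfn(pte_t p)
{
    return (unsigned long)((p.pte >> PTE_PPN_SHIFT) & PTE_PPN_MASK);
}
```

An all-zero pte_t, as written by arch_pmap_unmap(), has the V bit clear and
is therefore treated as not valid by the ASSERT() in arch_pmap_map().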

* [PATCH v6 6/9] xen/riscv: introduce functionality to work with CPU info
  2024-09-02 17:01 [PATCH v6 0/9] RISCV device tree mapping Oleksii Kurochko
                   ` (4 preceding siblings ...)
  2024-09-02 17:01 ` [PATCH v6 5/9] xen/riscv: introduce asm/pmap.h header Oleksii Kurochko
@ 2024-09-02 17:01 ` Oleksii Kurochko
  2024-09-10 10:33   ` Jan Beulich
  2024-09-02 17:01 ` [PATCH v6 7/9] xen/riscv: introduce and initialize SBI RFENCE extension Oleksii Kurochko
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 42+ messages in thread
From: Oleksii Kurochko @ 2024-09-02 17:01 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksii Kurochko, Alistair Francis, Bob Eshleman, Connor Davis,
	Andrew Cooper, Jan Beulich, Julien Grall, Stefano Stabellini

Introduce struct pcpu_info to store pCPU-related information.
Initially, it includes only processor_id and hart id, but it
will be extended to include guest CPU information and
temporary variables for saving/restoring vCPU registers.

Add set_processor_id() and get_processor_id() functions to set
and retrieve the processor_id stored in pcpu_info.

Define smp_processor_id() to provide accurate information,
replacing the previous "dummy" value of 0.

Initialize the tp register to point to pcpu_info[0].
Set processor_id to 0 for logical CPU 0 and store the physical
CPU ID in pcpu_info[0].

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
Changes in V6:
 - update the commit message ( drop outdated information ).
 - s/FIXME commit/FIXME comment in "changes in V5".
 - code style fixes.
 - refactoring of smp_processor_id() and fix BUG_ON() condition inside it.
 - change "mv a0,x0" to "li a0, 0".
 - add __cacheline_aligned to the struct pcpu_info.
 - drop smp_set_bootcpu_id() and smpboot.c as it has only smp_set_bootcpu_id()
   defined at the moment.
 - re-write setup_tp() to assembler.
---
Changes in V5:
 - add hart_id to pcpu_info;
 - add comments to pcpu_info members.
 - define INVALID_HARTID as ULONG_MAX, as the mhartid register is MXLEN bits
   wide, i.e. 32 bits for RV32 and 64 bits for RV64.
 - add hart_id to pcpu_info structure.
 - drop cpuid_to_hartid_map[] and use pcpu_info[] for the same purpose.
 - introduce new function setup_tp(cpuid).
 - add the FIXME comment on top of pcpu_info[].
 - setup TP register before start_xen() being called.
 - update the commit message.
 - change "commit message" to "comment" in "Changes in V4" in "update the comment
   above the code of TP..."
---
Changes in V4:
 - wrap id with () inside set_processor_id().
 - code style fixes
 - update BUG_ON(id > NR_CPUS) in smp_processor_id() and drop the comment
   above BUG_ON().
 - s/__cpuid_to_hartid_map/cpuid_to_hartid_map
 - s/cpuid_to_hartid_map/cpuid_to_hartid ( here cpuid_to_hartid_map is the name
   of the macro ).
 - update the comment above the code of TP register initialization in
   start_xen().
 - s/smp_setup_processor_id/smp_setup_bootcpu_id
 - update the commit message.
 - cleanup headers which are included in <asm/processor.h>
---
Changes in V3:
 - new patch.
---
 xen/arch/riscv/Makefile                |  1 +
 xen/arch/riscv/include/asm/processor.h | 27 ++++++++++++++++++++++++--
 xen/arch/riscv/include/asm/smp.h       |  9 +++++++++
 xen/arch/riscv/riscv64/asm-offsets.c   |  2 ++
 xen/arch/riscv/riscv64/head.S          | 15 ++++++++++++++
 xen/arch/riscv/setup.c                 |  5 +++++
 xen/arch/riscv/smp.c                   | 15 ++++++++++++++
 7 files changed, 72 insertions(+), 2 deletions(-)
 create mode 100644 xen/arch/riscv/smp.c

diff --git a/xen/arch/riscv/Makefile b/xen/arch/riscv/Makefile
index 81b77b13d6..2f2d6647a2 100644
--- a/xen/arch/riscv/Makefile
+++ b/xen/arch/riscv/Makefile
@@ -4,6 +4,7 @@ obj-y += mm.o
 obj-$(CONFIG_RISCV_64) += riscv64/
 obj-y += sbi.o
 obj-y += setup.o
+obj-y += smp.o
 obj-y += stubs.o
 obj-y += traps.o
 obj-y += vm_event.o
diff --git a/xen/arch/riscv/include/asm/processor.h b/xen/arch/riscv/include/asm/processor.h
index 3ae164c265..4799243863 100644
--- a/xen/arch/riscv/include/asm/processor.h
+++ b/xen/arch/riscv/include/asm/processor.h
@@ -12,8 +12,31 @@
 
 #ifndef __ASSEMBLY__
 
-/* TODO: need to be implemeted */
-#define smp_processor_id() 0
+#include <xen/bug.h>
+
+register struct pcpu_info *tp asm ( "tp" );
+
+struct pcpu_info {
+    unsigned int processor_id; /* Xen CPU id */
+    unsigned long hart_id; /* physical CPU id */
+} __cacheline_aligned;
+
+/* tp points to one of these */
+extern struct pcpu_info pcpu_info[NR_CPUS];
+
+#define get_processor_id()      (tp->processor_id)
+#define set_processor_id(id)    do { \
+    tp->processor_id = (id);         \
+} while (0)
+
+static inline unsigned int smp_processor_id(void)
+{
+    unsigned int id = get_processor_id();
+
+    BUG_ON(id > (NR_CPUS - 1));
+
+    return id;
+}
 
 /* On stack VCPU state */
 struct cpu_user_regs
diff --git a/xen/arch/riscv/include/asm/smp.h b/xen/arch/riscv/include/asm/smp.h
index b1ea91b1eb..11eee67d62 100644
--- a/xen/arch/riscv/include/asm/smp.h
+++ b/xen/arch/riscv/include/asm/smp.h
@@ -5,6 +5,8 @@
 #include <xen/cpumask.h>
 #include <xen/percpu.h>
 
+#include <asm/processor.h>
+
 DECLARE_PER_CPU(cpumask_var_t, cpu_sibling_mask);
 DECLARE_PER_CPU(cpumask_var_t, cpu_core_mask);
 
@@ -14,6 +16,13 @@ DECLARE_PER_CPU(cpumask_var_t, cpu_core_mask);
  */
 #define park_offline_cpus false
 
+/*
+ * Mapping between linux logical cpu index and hartid.
+ */
+#define cpuid_to_hartid(cpu) (pcpu_info[cpu].hart_id)
+
+void setup_tp(unsigned int cpuid);
+
 #endif
 
 /*
diff --git a/xen/arch/riscv/riscv64/asm-offsets.c b/xen/arch/riscv/riscv64/asm-offsets.c
index 9f663b9510..11400c4697 100644
--- a/xen/arch/riscv/riscv64/asm-offsets.c
+++ b/xen/arch/riscv/riscv64/asm-offsets.c
@@ -50,4 +50,6 @@ void asm_offsets(void)
     OFFSET(CPU_USER_REGS_SSTATUS, struct cpu_user_regs, sstatus);
     OFFSET(CPU_USER_REGS_PREGS, struct cpu_user_regs, pregs);
     BLANK();
+    DEFINE(PCPU_INFO_SIZE, sizeof(struct pcpu_info));
+    BLANK();
 }
diff --git a/xen/arch/riscv/riscv64/head.S b/xen/arch/riscv/riscv64/head.S
index 3261e9fce8..c7d8bf18c5 100644
--- a/xen/arch/riscv/riscv64/head.S
+++ b/xen/arch/riscv/riscv64/head.S
@@ -1,4 +1,5 @@
 #include <asm/asm.h>
+#include <asm/asm-offsets.h>
 #include <asm/riscv_encoding.h>
 
         .section .text.header, "ax", %progbits
@@ -55,6 +56,10 @@ FUNC(start)
          */
         jal     reset_stack
 
+        /* Xen's boot cpu id is equal to 0 so setup TP register for it */
+        li      a0, 0
+        jal     setup_tp
+
         /* restore hart_id ( bootcpu_id ) and dtb address */
         mv      a0, s0
         mv      a1, s1
@@ -72,6 +77,16 @@ FUNC(reset_stack)
         ret
 END(reset_stack)
 
+/* void setup_tp(unsigned int xen_cpuid); */
+FUNC(setup_tp)
+        la      tp, pcpu_info
+        li      t0, PCPU_INFO_SIZE
+        mul     t1, a0, t0
+        add     tp, tp, t1
+
+        ret
+END(setup_tp)
+
         .section .text.ident, "ax", %progbits
 
 FUNC(turn_on_mmu)
diff --git a/xen/arch/riscv/setup.c b/xen/arch/riscv/setup.c
index 13f0e8c77d..540a3a608e 100644
--- a/xen/arch/riscv/setup.c
+++ b/xen/arch/riscv/setup.c
@@ -8,6 +8,7 @@
 #include <public/version.h>
 
 #include <asm/early_printk.h>
+#include <asm/smp.h>
 #include <asm/traps.h>
 
 void arch_get_xen_caps(xen_capabilities_info_t *info)
@@ -40,6 +41,10 @@ void __init noreturn start_xen(unsigned long bootcpu_id,
 {
     remove_identity_mapping();
 
+    set_processor_id(0);
+
+    cpuid_to_hartid(0) = bootcpu_id;
+
     trap_init();
 
 #ifdef CONFIG_SELF_TESTS
diff --git a/xen/arch/riscv/smp.c b/xen/arch/riscv/smp.c
new file mode 100644
index 0000000000..4ca6a4e892
--- /dev/null
+++ b/xen/arch/riscv/smp.c
@@ -0,0 +1,15 @@
+#include <xen/smp.h>
+
+/*
+ * FIXME: make pcpu_info[] dynamically allocated when necessary
+ *        functionality will be ready
+ */
+/*
+ * tp points to one of these per cpu.
+ *
+ * hart_id would be valid (no matter which value) if its
+ * processor_id field is valid (less than NR_CPUS).
+ */
+struct pcpu_info pcpu_info[NR_CPUS] = { [0 ... NR_CPUS - 1] = {
+    .processor_id = NR_CPUS,
+}};
-- 
2.46.0



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v6 6/9] xen/riscv: introduce functionality to work with CPU info
  2024-09-02 17:01 ` [PATCH v6 6/9] xen/riscv: introduce functionality to work with CPU info Oleksii Kurochko
@ 2024-09-10 10:33   ` Jan Beulich
  2024-09-11 12:05     ` oleksii.kurochko
  2024-09-12 16:02     ` oleksii.kurochko
  0 siblings, 2 replies; 42+ messages in thread
From: Jan Beulich @ 2024-09-10 10:33 UTC (permalink / raw)
  To: Oleksii Kurochko
  Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
	Julien Grall, Stefano Stabellini, xen-devel

On 02.09.2024 19:01, Oleksii Kurochko wrote:
> --- a/xen/arch/riscv/include/asm/processor.h
> +++ b/xen/arch/riscv/include/asm/processor.h
> @@ -12,8 +12,31 @@
>  
>  #ifndef __ASSEMBLY__
>  
> -/* TODO: need to be implemeted */
> -#define smp_processor_id() 0
> +#include <xen/bug.h>
> +
> +register struct pcpu_info *tp asm ( "tp" );
> +
> +struct pcpu_info {
> +    unsigned int processor_id; /* Xen CPU id */
> +    unsigned long hart_id; /* physical CPU id */
> +} __cacheline_aligned;

Shouldn't you include xen/cache.h for this, to be sure the header can
be included on its own?

I'm also unconvinced of this placement: Both Arm and x86 have similar
structures (afaict), living in current.h.

> +/* tp points to one of these */
> +extern struct pcpu_info pcpu_info[NR_CPUS];
> +
> +#define get_processor_id()      (tp->processor_id)

Iirc it was in response to one of your earlier patches that we removed
get_processor_id() from the other architectures, as being fully
redundant with smp_processor_id(). Is there a particular reason you
re-introduce that now for RISC-V?

> +#define set_processor_id(id)    do { \
> +    tp->processor_id = (id);         \
> +} while (0)
> +
> +static inline unsigned int smp_processor_id(void)
> +{
> +    unsigned int id = get_processor_id();
> +
> +    BUG_ON(id > (NR_CPUS - 1));

The more conventional way of expressing this is >= NR_CPUS.

> @@ -14,6 +16,13 @@ DECLARE_PER_CPU(cpumask_var_t, cpu_core_mask);
>   */
>  #define park_offline_cpus false
>  
> +/*
> + * Mapping between linux logical cpu index and hartid.
> + */
> +#define cpuid_to_hartid(cpu) (pcpu_info[cpu].hart_id)

Does this need to be a macro (rather than an inline function)?

> @@ -72,6 +77,16 @@ FUNC(reset_stack)
>          ret
>  END(reset_stack)
>  
> +/* void setup_tp(unsigned int xen_cpuid); */
> +FUNC(setup_tp)
> +        la      tp, pcpu_info
> +        li      t0, PCPU_INFO_SIZE
> +        mul     t1, a0, t0
> +        add     tp, tp, t1
> +
> +        ret
> +END(setup_tp)

I take it this is going to run (i.e. also for secondary CPUs) ahead of
Xen being able to handle any kind of exception (on the given CPU)? If
so, all is fine here. If not, transiently pointing tp at CPU0's space
is a possible problem.

Jan


^ permalink raw reply	[flat|nested] 42+ messages in thread
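
The review points about the bounds check and about cpuid_to_hartid() can be
illustrated with a standalone sketch. It is not the eventual Xen code: tp is
an ordinary global pointer instead of the RISC-V tp register, BUG_ON() is
modelled with assert(), NR_CPUS is an arbitrary 4, and set_hartid() is a
hypothetical helper. It is introduced here because an inline function,
unlike the macro, cannot appear on the left-hand side of an assignment such
as "cpuid_to_hartid(0) = bootcpu_id".

```c
#include <assert.h>

#define NR_CPUS 4 /* arbitrary value for the sketch */

struct pcpu_info {
    unsigned int processor_id; /* Xen CPU id */
    unsigned long hart_id;     /* physical CPU id */
};

/* Every slot starts out invalid: processor_id == NR_CPUS. */
static struct pcpu_info pcpu_info[NR_CPUS] = {
    { NR_CPUS, 0 }, { NR_CPUS, 0 }, { NR_CPUS, 0 }, { NR_CPUS, 0 },
};

/* Stand-in for the tp register. */
static struct pcpu_info *tp;

/* Same arithmetic as the assembly: pcpu_info + cpuid * sizeof(*tp). */
static void setup_tp(unsigned int cpuid)
{
    tp = &pcpu_info[cpuid];
}

static void set_processor_id(unsigned int id)
{
    tp->processor_id = id;
}

static unsigned int smp_processor_id(void)
{
    unsigned int id = tp->processor_id;

    assert(id < NR_CPUS); /* models BUG_ON(id >= NR_CPUS) */

    return id;
}

/* Read-only accessor replacing the macro. */
static unsigned long cpuid_to_hartid(unsigned int cpu)
{
    return pcpu_info[cpu].hart_id;
}

/* Hypothetical setter needed once the macro is no longer an lvalue. */
static void set_hartid(unsigned int cpu, unsigned long hart_id)
{
    pcpu_info[cpu].hart_id = hart_id;
}
```

For unsigned ids, "id >= NR_CPUS" and "id > (NR_CPUS - 1)" reject the same
values; the former is simply the more conventional spelling.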

* Re: [PATCH v6 6/9] xen/riscv: introduce functionality to work with CPU info
  2024-09-10 10:33   ` Jan Beulich
@ 2024-09-11 12:05     ` oleksii.kurochko
  2024-09-11 12:14       ` Jan Beulich
  2024-09-12 16:02     ` oleksii.kurochko
  1 sibling, 1 reply; 42+ messages in thread
From: oleksii.kurochko @ 2024-09-11 12:05 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
	Julien Grall, Stefano Stabellini, xen-devel

On Tue, 2024-09-10 at 12:33 +0200, Jan Beulich wrote:
> On 02.09.2024 19:01, Oleksii Kurochko wrote:
> > --- a/xen/arch/riscv/include/asm/processor.h
> > +++ b/xen/arch/riscv/include/asm/processor.h
> > @@ -12,8 +12,31 @@
> >  
> >  #ifndef __ASSEMBLY__
> >  
> > -/* TODO: need to be implemeted */
> > -#define smp_processor_id() 0
> > +#include <xen/bug.h>
> > +
> > +register struct pcpu_info *tp asm ( "tp" );
> > +
> > +struct pcpu_info {
> > +    unsigned int processor_id; /* Xen CPU id */
> > +    unsigned long hart_id; /* physical CPU id */
> > +} __cacheline_aligned;
> 
> Shouldn't you include xen/cache.h for this, to be sure the header can
> be included on its own?
Agree, it would be better to include xen/cache.h header.

> 
> I'm also unconvinced of this placement: Both Arm and x86 have similar
> structures (afaict), living in current.h.
Then for consistency it would be better to move this structure to
current.h for RISC-V.

> 
> > +/* tp points to one of these */
> > +extern struct pcpu_info pcpu_info[NR_CPUS];
> > +
> > +#define get_processor_id()      (tp->processor_id)
> 
> Iirc it was in response to one of your earlier patches that we
> removed
> get_processor_id() from the other architectures, as being fully
> redundant with smp_processor_id(). Is there a particular reason you
> re-introduce that now for RISC-V?
No reason. I just forgot that we agreed to use only smp_processor_id(),
and I made a bad rebase of my 'latest' branch on top of current staging,
which didn't report a merge conflict in that place.
I will drop get_processor_id().

> 
> > +#define set_processor_id(id)    do { \
> > +    tp->processor_id = (id);         \
> > +} while (0)
> > +
> > +static inline unsigned int smp_processor_id(void)
> > +{
> > +    unsigned int id = get_processor_id();
> > +
> > +    BUG_ON(id > (NR_CPUS - 1));
> 
> The more conventional way of expressing this is >= NR_CPUS.
> 
> > @@ -14,6 +16,13 @@ DECLARE_PER_CPU(cpumask_var_t, cpu_core_mask);
> >   */
> >  #define park_offline_cpus false
> >  
> > +/*
> > + * Mapping between linux logical cpu index and hartid.
> > + */
> > +#define cpuid_to_hartid(cpu) (pcpu_info[cpu].hart_id)
> 
> Does this need to be a macro (rather than an inline function)?
No, there is no such need. I will use an inline function instead.

> 
> > @@ -72,6 +77,16 @@ FUNC(reset_stack)
> >          ret
> >  END(reset_stack)
> >  
> > +/* void setup_tp(unsigned int xen_cpuid); */
> > +FUNC(setup_tp)
> > +        la      tp, pcpu_info
> > +        li      t0, PCPU_INFO_SIZE
> > +        mul     t1, a0, t0
> > +        add     tp, tp, t1
> > +
> > +        ret
> > +END(setup_tp)
> 
> I take it this is going to run (i.e. also for secondary CPUs) ahead
> of
> Xen being able to handle any kind of exception (on the given CPU)?
Yes, I am using it for secondary CPUs and Xen is handling exceptions (
on the given CPU ) fine.

>  If
> so, all is fine here. If not, transiently pointing tp at CPU0's space
> is a possible problem.
I haven't had any problem with that at the moment.

Do you think it would be better to use DECLARE_PER_CPU(), with setup_tp()
updated accordingly, instead of pcpu_info[] once SMP is introduced?
What kinds of problems should I take into account?

~ Oleksii





^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v6 6/9] xen/riscv: introduce functionality to work with CPU info
  2024-09-11 12:05     ` oleksii.kurochko
@ 2024-09-11 12:14       ` Jan Beulich
  2024-09-12  9:27         ` oleksii.kurochko
  0 siblings, 1 reply; 42+ messages in thread
From: Jan Beulich @ 2024-09-11 12:14 UTC (permalink / raw)
  To: oleksii.kurochko
  Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
	Julien Grall, Stefano Stabellini, xen-devel

On 11.09.2024 14:05, oleksii.kurochko@gmail.com wrote:
> On Tue, 2024-09-10 at 12:33 +0200, Jan Beulich wrote:
>> On 02.09.2024 19:01, Oleksii Kurochko wrote:
>>> @@ -72,6 +77,16 @@ FUNC(reset_stack)
>>>          ret
>>>  END(reset_stack)
>>>  
>>> +/* void setup_tp(unsigned int xen_cpuid); */
>>> +FUNC(setup_tp)
>>> +        la      tp, pcpu_info
>>> +        li      t0, PCPU_INFO_SIZE
>>> +        mul     t1, a0, t0
>>> +        add     tp, tp, t1
>>> +
>>> +        ret
>>> +END(setup_tp)
>>
>> I take it this is going to run (i.e. also for secondary CPUs) ahead
>> of
>> Xen being able to handle any kind of exception (on the given CPU)?
> Yes, I am using it for secondary CPUs and Xen is handling exceptions (
> on the given CPU ) fine.

Yet that wasn't my question. Note in particular the use of "ahead of".

>>  If
>> so, all is fine here. If not, transiently pointing tp at CPU0's space
>> is a possible problem.
> I haven't had any problem with that at the moment.
> 
> Do you think that it will be better to use DECLARE_PER_CPU() with
> updating of setup_tp() instead of pcpu_info[] when SMP will be
> introduced?
> What kind of problems should I take into account?

If exceptions can be handled by Xen already when entering this function,
then the exception handler would need to be setting up tp for itself. If
not, it would use whatever the interrupted context used (or what is
brought into context by hardware while delivering the exception). If I
assumed that tp in principle doesn't need setting up when handling
exceptions (sorry, haven't read up enough yet about how guest -> host
switches work for RISC-V), and if further exceptions can already be
handled upon entering setup_tp(), then keeping tp properly invalid until
it can be set to its correct value will make it easier to diagnose
problems than when - like you do - transiently setting tp to CPU0's
value (and hence risking corruption of its state).

Jan

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v6 6/9] xen/riscv: introduce functionality to work with CPU info
  2024-09-11 12:14       ` Jan Beulich
@ 2024-09-12  9:27         ` oleksii.kurochko
  2024-09-12  9:58           ` Jan Beulich
  0 siblings, 1 reply; 42+ messages in thread
From: oleksii.kurochko @ 2024-09-12  9:27 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
	Julien Grall, Stefano Stabellini, xen-devel

On Wed, 2024-09-11 at 14:14 +0200, Jan Beulich wrote:
> On 11.09.2024 14:05, oleksii.kurochko@gmail.com wrote:
> > On Tue, 2024-09-10 at 12:33 +0200, Jan Beulich wrote:
> > > On 02.09.2024 19:01, Oleksii Kurochko wrote:
> > > > @@ -72,6 +77,16 @@ FUNC(reset_stack)
> > > >          ret
> > > >  END(reset_stack)
> > > >  
> > > > +/* void setup_tp(unsigned int xen_cpuid); */
> > > > +FUNC(setup_tp)
> > > > +        la      tp, pcpu_info
> > > > +        li      t0, PCPU_INFO_SIZE
> > > > +        mul     t1, a0, t0
> > > > +        add     tp, tp, t1
> > > > +
> > > > +        ret
> > > > +END(setup_tp)
> > > 
> > > I take it this is going to run (i.e. also for secondary CPUs)
> > > ahead
> > > of
> > > Xen being able to handle any kind of exception (on the given
> > > CPU)?
> > Yes, I am using it for secondary CPUs and Xen is handling
> > exceptions (
> > on the given CPU ) fine.
> 
> Yet that wasn't my question. Note in particular the use of "ahead
> of".
The first function executed on a secondary CPU will be secondary_start_sbi
( https://gitlab.com/xen-project/people/olkur/xen/-/blob/latest/xen/arch/riscv/riscv64/head.S?ref_type=heads#L100
), whose first instruction masks all interrupts:
           /*
            * a0 -> started hart id
            * a1 -> private data passed by boot cpu
            */
   ENTRY(secondary_start_sbi)
           /* Mask all interrupts */
           csrw    CSR_SIE, zero
           ...
           tail    smp_callin
   
Then, at the start of smp_callin()
( https://gitlab.com/xen-project/people/olkur/xen/-/blob/latest/xen/arch/riscv/smpboot.c?ref_type=heads#L258
), the tp register is set up ( for now in the old way, using inline
assembly; a little later I will switch to setup_tp() and call it before
'tail smp_callin' ), and only after that are local IRQs enabled:
   void __init smp_callin(unsigned int cpuid)
   {
       unsigned int hcpu = 1;
   
       for ( ; (hcpu < NR_CPUS) && (cpuid_to_hartid_map(hcpu) != cpuid); hcpu++ )
       {}
   
       asm volatile ("mv tp, %0" : : "r"((unsigned long)&pcpu_info[hcpu]));
   ...
       trap_init(); /* write handle_trap() address to CSR_STVEC */
   ...
       local_irq_enable();
   ...
   
> 
> > >  If
> > > so, all is fine here. If not, transiently pointing tp at CPU0's
> > > space
> > > is a possible problem.
> > I haven't had any problem with that at the moment.
> > 
> > Do you think that it will be better to use DECLARE_PER_CPU() with
> > updating of setup_tp() instead of pcpu_info[] when SMP will be
> > introduced?
> > What kind of problems should I take into account?
> 
> If exceptions can be handled by Xen already when entering this
> function,
> then the exception handler would need to be setting up tp for itself.
> If
> not, it would use whatever the interrupted context used (or what is
> brought into context by hardware while delivering the exception). If
> I
> assumed that tp in principle doesn't need setting up when handling
> exceptions (sorry, haven't read up enough yet about how guest -> host
> switches work for RISC-V), and if further exceptions can already be
> handled upon entering setup_tp(), then keeping tp properly invalid
> until
> it can be set to its correct value will make it easier to diagnose
> problems than when - like you do - transiently setting tp to CPU0's
> value (and hence risking corruption of its state).
Regarding tp in the exception handler: at the start of the handler, tp and
CSR_SSCRATCH are swapped. If the exception comes from Xen, tp will be set
to 0, because CSR_SSCRATCH is always 0 while in Xen ( for a guest it is set
to pcpu_info[cpuid] before returning to the new vcpu:
https://gitlab.com/xen-project/people/olkur/xen/-/blob/latest/xen/arch/riscv/entry.S?ref_type=heads#L165
). Otherwise, if the exception comes from a guest, tp will be set to
&pcpu_info[cpuid], which was stored in CSR_SSCRATCH:
https://gitlab.com/xen-project/people/olkur/xen/-/blob/latest/xen/arch/riscv/entry.S?ref_type=heads#L15

As I mentioned above, interrupts will be disabled until tp is set. Even
if they aren’t disabled, tp will be set to 0 because, at the moment the
secondary CPU boots, CSR_SSCRATCH will be 0, which indicates that the
interrupt is from Xen.

> - like you do - transiently setting tp to CPU0's value (and hence
> risking corruption of its state).
I think I’m missing something—why would the secondary CPU have the same
value as CPU0? If we don’t set up the tp register when the secondary
CPU boots, it will contain a default value, which is expected upon
boot. It will retain this value until setup_tp() is called, which will
then set tp to pcpu_info[SECONDARY_CPU_ID].

~ Oleksii
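
The tp / CSR_SSCRATCH convention described in this message can be modelled in plain C. This is only a sketch under stated assumptions: struct hart_regs, swap_tp_sscratch() and trap_came_from_xen() are hypothetical names made up for illustration, and the real logic is assembly in entry.S.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the two registers involved; not Xen code. */
struct hart_regs {
    uintptr_t tp;       /* thread pointer register */
    uintptr_t sscratch; /* CSR_SSCRATCH */
};

/* Models the tp <-> CSR_SSCRATCH swap done at the start of the handler. */
static void swap_tp_sscratch(struct hart_regs *r)
{
    uintptr_t old_tp;

    assert(r != NULL);

    old_tp = r->tp;
    r->tp = r->sscratch;
    r->sscratch = old_tp;
}

/*
 * After the swap, tp == 0 means CSR_SSCRATCH held 0 when the trap was
 * taken, which in the convention above means the trap interrupted Xen.
 */
static bool trap_came_from_xen(const struct hart_regs *r)
{
    return r->tp == 0;
}
```

With CSR_SSCRATCH == 0 (Xen context, including a freshly booted secondary CPU) the swap leaves tp at 0. With CSR_SSCRATCH holding a per-CPU pointer (guest context) the swap loads that pointer into tp.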

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v6 6/9] xen/riscv: introduce functionality to work with CPU info
  2024-09-12  9:27         ` oleksii.kurochko
@ 2024-09-12  9:58           ` Jan Beulich
  0 siblings, 0 replies; 42+ messages in thread
From: Jan Beulich @ 2024-09-12  9:58 UTC (permalink / raw)
  To: oleksii.kurochko
  Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
	Julien Grall, Stefano Stabellini, xen-devel

On 12.09.2024 11:27, oleksii.kurochko@gmail.com wrote:
> As I mentioned above, interrupts will be disabled until tp is set.

Okay, so all good then.

> Even
> if they aren’t disabled, tp will be set to 0 because, at the moment the
> secondary CPU boots, CSR_SSCRATCH will be 0, which indicates that the
> interrupt is from Xen.
> 
>> - like you do - transiently setting tp to CPU0's value (and hence
>> risking corruption of its state).
> I think I’m missing something—why would the secondary CPU have the same
> value as CPU0? If we don’t set up the tp register when the secondary
> CPU boots, it will contain a default value, which is expected upon
> boot. It will retain this value until setup_tp() is called, which will
> then set tp to pcpu_info[SECONDARY_CPU_ID].

Just to clarify (shouldn't matter in practice according to what you
said above) - in

FUNC(setup_tp)
        la      tp, pcpu_info
        li      t0, PCPU_INFO_SIZE
        mul     t1, a0, t0
        add     tp, tp, t1
        ret
END(setup_tp)

you start with setting tp to the CPU0 value. You only then adjust tp (3
insns later) to the designated value. If you wanted to play safe, you'd
do it e.g. like this

FUNC(setup_tp)
        la      t0, pcpu_info
        li      t1, PCPU_INFO_SIZE
        mul     t1, a0, t1
        add     tp, t0, t1
        ret
END(setup_tp)

Jan
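
The same "compute in temporaries, then write tp once" idea can be sketched in C. This is an illustration only: tp_model is a hypothetical stand-in for the tp register, setup_tp_model() is not a Xen function, and the struct pcpu_info layout and NR_CPUS value are assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 4 /* arbitrary value for this sketch */

/* Assumed layout; only the element size matters here. */
struct pcpu_info {
    unsigned int processor_id;
    unsigned long hart_id;
};

static struct pcpu_info pcpu_info[NR_CPUS];

/* Hypothetical stand-in for the tp register. */
static struct pcpu_info *tp_model;

static void setup_tp_model(unsigned int cpuid)
{
    uintptr_t base, offset;

    assert(cpuid < NR_CPUS);

    /* All arithmetic happens in temporaries (t0/t1 in the assembly). */
    base = (uintptr_t)pcpu_info;
    offset = (uintptr_t)cpuid * sizeof(struct pcpu_info);

    /* Single write: tp_model never transiently points at pcpu_info[0]. */
    tp_model = (struct pcpu_info *)(base + offset);
}
```

Calling setup_tp_model(3) leaves tp_model equal to &pcpu_info[3], and no intermediate value is ever visible in tp_model.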


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v6 6/9] xen/riscv: introduce functionality to work with CPU info
  2024-09-10 10:33   ` Jan Beulich
  2024-09-11 12:05     ` oleksii.kurochko
@ 2024-09-12 16:02     ` oleksii.kurochko
  2024-09-13 12:51       ` Jan Beulich
  1 sibling, 1 reply; 42+ messages in thread
From: oleksii.kurochko @ 2024-09-12 16:02 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
	Julien Grall, Stefano Stabellini, xen-devel

On Tue, 2024-09-10 at 12:33 +0200, Jan Beulich wrote:
> > +/*
> > + * Mapping between linux logical cpu index and hartid.
> > + */
> > +#define cpuid_to_hartid(cpu) (pcpu_info[cpu].hart_id)
> 
> Does this need to be a macro (rather than an inline function)?
I started to rework that and I am using this macro for both read
and write. So set and get inline functions would need to be introduced
instead of just one macro. I think I will stick with one macro
instead of 2 inline functions.

~ Oleksii


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v6 6/9] xen/riscv: introduce functionality to work with CPU info
  2024-09-12 16:02     ` oleksii.kurochko
@ 2024-09-13 12:51       ` Jan Beulich
  0 siblings, 0 replies; 42+ messages in thread
From: Jan Beulich @ 2024-09-13 12:51 UTC (permalink / raw)
  To: oleksii.kurochko
  Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
	Julien Grall, Stefano Stabellini, xen-devel

On 12.09.2024 18:02, oleksii.kurochko@gmail.com wrote:
> On Tue, 2024-09-10 at 12:33 +0200, Jan Beulich wrote:
>>> +/*
>>> + * Mapping between linux logical cpu index and hartid.
>>> + */
>>> +#define cpuid_to_hartid(cpu) (pcpu_info[cpu].hart_id)
>>
>> Does this need to be a macro (rather than an inline function)?
> I started to rework that and I am using this macro for both read
> and write. So set and get inline functions would need to be introduced
> instead of just one macro. I think I will stick with one macro
> instead of 2 inline functions.

You may want to consult with Andrew as to use of such a macro on
the lhs of an assignment. I expect he'll ask to avoid such, and
instead indeed go with both a get and a set accessor (unless it
would make sense to simply open-code the few sets that there are
going to be).

Jan
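
A get/set accessor pair of the kind suggested here might look roughly as follows. This is a sketch, not the code that was eventually committed: set_cpuid_to_hartid() is a hypothetical name, and the reduced struct pcpu_info and NR_CPUS value are assumptions for illustration.

```c
#include <assert.h>

#define NR_CPUS 4 /* arbitrary value for this sketch */

/* Reduced, assumed layout: only the field the accessors touch. */
struct pcpu_info {
    unsigned long hart_id;
};

static struct pcpu_info pcpu_info[NR_CPUS];

/* Getter: cannot appear on the lhs of an assignment, unlike the macro. */
static inline unsigned long cpuid_to_hartid(unsigned int cpu)
{
    assert(cpu < NR_CPUS);
    return pcpu_info[cpu].hart_id;
}

/* Setter (hypothetical name): the only way to update the mapping. */
static inline void set_cpuid_to_hartid(unsigned int cpu, unsigned long hartid)
{
    assert(cpu < NR_CPUS);
    pcpu_info[cpu].hart_id = hartid;
}
```

Compared with the macro, the functions give type checking on both arguments and make every write to the mapping easy to grep for.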


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v6 7/9] xen/riscv: introduce and initialize SBI RFENCE extension
  2024-09-02 17:01 [PATCH v6 0/9] RISCV device tree mapping Oleksii Kurochko
                   ` (5 preceding siblings ...)
  2024-09-02 17:01 ` [PATCH v6 6/9] xen/riscv: introduce functionality to work with CPU info Oleksii Kurochko
@ 2024-09-02 17:01 ` Oleksii Kurochko
  2024-09-10 11:32   ` Jan Beulich
  2024-09-02 17:01 ` [PATCH v6 8/9] xen/riscv: page table handling Oleksii Kurochko
  2024-09-02 17:01 ` [PATCH v6 9/9] xen/riscv: introduce early_fdt_map() Oleksii Kurochko
  8 siblings, 1 reply; 42+ messages in thread
From: Oleksii Kurochko @ 2024-09-02 17:01 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksii Kurochko, Alistair Francis, Bob Eshleman, Connor Davis,
	Andrew Cooper, Jan Beulich, Julien Grall, Stefano Stabellini

Introduce functions to work with the SBI RFENCE extension for issuing
various fence operations to remote CPUs.

Add the sbi_init() function along with auxiliary functions and macro
definitions for proper initialization and checking the availability of
SBI extensions. Currently, this is implemented only for RFENCE.

Introduce sbi_remote_sfence_vma() to send SFENCE_VMA instructions to
a set of target HARTs. This will support the implementation of
flush_xen_tlb_range_va().

Integrate __sbi_rfence_v02 from Linux kernel 6.6.0-rc4 with minimal
modifications:
 - Adapt to Xen code style.
 - Use cpuid_to_hartid() instead of cpuid_to_hartid_map[].
 - Update BIT(...) to BIT(..., UL).
 - Rename __sbi_rfence_v02_call to sbi_rfence_v02_real and
   remove the unused arg5.
 - Handle NULL cpu_mask to execute rfence on all CPUs by calling
   sbi_rfence_v02_real(..., 0UL, -1UL,...) instead of creating hmask.
 - change type for start_addr and size to vaddr_t and size_t.
 - Add an explanatory comment about when batching can and cannot occur,
   and why batching happens in the first place.

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
Changes in V6:
 - align with zeros the definition of SBI_SPEC_VERSION_MINOR_MASK.
 - drop fallthrough in sbi_err_map_xen_errno() between 'case SBI_ERR_FAILURE'
   and default.
 - update return type of sbi_{major,minor}_version() to unsigned int.
 - move BUG_ON(ret.value < 0); inside if ( !ret.error ) in sbi_ext_base_func().
 - print fid as %#lx instead of %lu.
 - print ret.error instead of what sbi_err_map_xen_errno() as it may lose
   information.
 - drop unrelated information in the comment of the for_each_cpu cycle in
   sbi_rfence_v02().
 - small refactoring in sbi_rfence_v02(): making uniform path for returning
   result variable.
 - rename start_addr argument to start for sbi_remote_sfence_vma().
 - use sbi_err_map_xen_errno() inside sbi_probe_extension() to return an error
   value instead of -EOPNOTSUPP.
 - s/unsigned long start/vaddr_t start
 - s/unsigned long size/size_t size
 - update the commit message.
---
Changes in V5:
 - update the comment for sbi_has_rfence().
 - update the comment for sbi_remote_sfence_vma().
 - update the prototype of sbi_remote_sfence_vma() and declare cpu_mask
   argument as pointer to const.
 - use MASK_EXTR() for sbi_{major, minor}_version().
 - redefine SBI_SPEC_VERSION_MAJOR_MASK as 0x7F000000
 - drop SBI_SPEC_VERSION_MAJOR_SHIFT as unneeded.
 - add BUG_ON(ret.value < 0) inside sbi_ext_base_func() to be sure that
   ret.value is always >= 0 as SBI spec explicitly doesn't say that.
 - s/__sbi_rfence_v02_real/sbi_rfence_v02_real
 - s/__sbi_rfence_v02/sbi_rfence_v02
 - s/__sbi_rfence/sbi_rfence
 - fold cases inside sbi_rfence_v02_real()
 - mark sbi_rfence_v02 with cf_check.
 - code style fixes in sbi_rfence_v02().
 - add the comment with explanation of algorithm used in sbi_rfence_v02().
 - use __ro_after_init for sbi_rfence variable.
 - add ASSERT(sbi_rfence) inside sbi_remote_sfence_vma() to be sure that it
   is not NULL.
 - drop local variable ret inside sbi_init() and init sbi_spec_version
   directly by return value of sbi_get_spec_version() as this function
   must always succeed.
 - add the comment above sbi_get_spec_version().
 - add BUG_ON for sbi_fw_id and sbi_fw_version() to be sure that they
   have correct values.
 - make sbi_fw_id, sbi_fw_version as local because they are used only once
   for printk().
 - s/veriosn/version
 - drop BUG_ON("At the moment flush_xen_tlb_range_va() uses SBI rfence...")
   as now we have ASSERT() in the place where sbi_rfence is actually used.
 - update the commit message.
 - s/BUG_ON("Ooops. SBI spec version 0.1 detected. Need to add support")/panic("Ooops. SBI ...");
---
Changes in V4:
 - update the commit message.
 - code style fixes
 - update return type of sbi_has_rfence() from int to bool and drop
   conditional operator inside implementation.
 - Update mapping of SBI_ERR_FAILURE in sbi_err_map_xen_errno().
 - Update return type of sbi_spec_is_0_1() and drop conditional operator
   inside implementation.
 - s/0x%lx/%#lx
 - update the comment above declaration of sbi_remote_sfence_vma() with
   more detailed explanation what the function does.
 - update prototype of sbi_remote_sfence_vma(). Now it receives cpumask_t
   and returns int.
 - refactor __sbi_rfence_v02() taken from the Linux kernel so that it takes
   into account the case where hart IDs could be from different hbase. For
   example, the case when hart IDs are the following: 0, 3, 65, 2. Or the
   case when hart IDs are unsorted: 0 3 1 2.
 - drop sbi_cpumask_to_hartmask() as it is not needed anymore
 - Update the prototype of sbi_remote_sfence_vma() and implementation accordingly
   to the fact it returns 'int'.
 - s/flush_xen_tlb_one_local/flush_tlb_one_local
---
Changes in V3:
 - new patch.
---
 xen/arch/riscv/include/asm/sbi.h |  63 +++++++
 xen/arch/riscv/sbi.c             | 274 ++++++++++++++++++++++++++++++-
 xen/arch/riscv/setup.c           |   3 +
 3 files changed, 339 insertions(+), 1 deletion(-)

diff --git a/xen/arch/riscv/include/asm/sbi.h b/xen/arch/riscv/include/asm/sbi.h
index 0e6820a4ed..445d215535 100644
--- a/xen/arch/riscv/include/asm/sbi.h
+++ b/xen/arch/riscv/include/asm/sbi.h
@@ -12,8 +12,41 @@
 #ifndef __ASM_RISCV_SBI_H__
 #define __ASM_RISCV_SBI_H__
 
+#include <xen/cpumask.h>
+
 #define SBI_EXT_0_1_CONSOLE_PUTCHAR		0x1
 
+#define SBI_EXT_BASE                    0x10
+#define SBI_EXT_RFENCE                  0x52464E43
+
+/* SBI function IDs for BASE extension */
+#define SBI_EXT_BASE_GET_SPEC_VERSION   0x0
+#define SBI_EXT_BASE_GET_IMP_ID         0x1
+#define SBI_EXT_BASE_GET_IMP_VERSION    0x2
+#define SBI_EXT_BASE_PROBE_EXT          0x3
+
+/* SBI function IDs for RFENCE extension */
+#define SBI_EXT_RFENCE_REMOTE_FENCE_I           0x0
+#define SBI_EXT_RFENCE_REMOTE_SFENCE_VMA        0x1
+#define SBI_EXT_RFENCE_REMOTE_SFENCE_VMA_ASID   0x2
+#define SBI_EXT_RFENCE_REMOTE_HFENCE_GVMA       0x3
+#define SBI_EXT_RFENCE_REMOTE_HFENCE_GVMA_VMID  0x4
+#define SBI_EXT_RFENCE_REMOTE_HFENCE_VVMA       0x5
+#define SBI_EXT_RFENCE_REMOTE_HFENCE_VVMA_ASID  0x6
+
+#define SBI_SPEC_VERSION_MAJOR_MASK     0x7f000000
+#define SBI_SPEC_VERSION_MINOR_MASK     0x00ffffff
+
+/* SBI return error codes */
+#define SBI_SUCCESS             0
+#define SBI_ERR_FAILURE         (-1)
+#define SBI_ERR_NOT_SUPPORTED   (-2)
+#define SBI_ERR_INVALID_PARAM   (-3)
+#define SBI_ERR_DENIED          (-4)
+#define SBI_ERR_INVALID_ADDRESS (-5)
+
+#define SBI_SPEC_VERSION_DEFAULT 0x1
+
 struct sbiret {
     long error;
     long value;
@@ -31,4 +64,34 @@ struct sbiret sbi_ecall(unsigned long ext, unsigned long fid,
  */
 void sbi_console_putchar(int ch);
 
+/*
+ * Check underlying SBI implementation has RFENCE
+ *
+ * @return true for supported AND false for not-supported
+ */
+bool sbi_has_rfence(void);
+
+/*
+ * Instructs the remote harts to execute one or more SFENCE.VMA
+ * instructions, covering the range of virtual addresses between
+ * [start, start + size).
+ *
+ * Returns 0 if IPI was sent to all the targeted harts successfully
+ * or negative value if start or size is not valid.
+ *
+ * @param cpu_mask a cpu mask containing all the target harts.
+ * @param start virtual address start
+ * @param size virtual address range size
+ */
+int sbi_remote_sfence_vma(const cpumask_t *cpu_mask,
+                          vaddr_t start,
+                          size_t size);
+
+/*
+ * Initialize SBI library
+ *
+ * @return 0 on success, otherwise negative errno on failure
+ */
+int sbi_init(void);
+
 #endif /* __ASM_RISCV_SBI_H__ */
diff --git a/xen/arch/riscv/sbi.c b/xen/arch/riscv/sbi.c
index 0ae166c861..30ef661118 100644
--- a/xen/arch/riscv/sbi.c
+++ b/xen/arch/riscv/sbi.c
@@ -5,13 +5,26 @@
  * (anup.patel@wdc.com).
  *
  * Modified by Bobby Eshleman (bobby.eshleman@gmail.com).
+ * Modified by Oleksii Kurochko (oleksii.kurochko@gmail.com).
  *
  * Copyright (c) 2019 Western Digital Corporation or its affiliates.
- * Copyright (c) 2021-2023 Vates SAS.
+ * Copyright (c) 2021-2024 Vates SAS.
  */
 
+#include <xen/compiler.h>
+#include <xen/const.h>
+#include <xen/cpumask.h>
+#include <xen/errno.h>
+#include <xen/init.h>
+#include <xen/lib.h>
+#include <xen/sections.h>
+#include <xen/smp.h>
+
+#include <asm/processor.h>
 #include <asm/sbi.h>
 
+static unsigned long __ro_after_init sbi_spec_version = SBI_SPEC_VERSION_DEFAULT;
+
 struct sbiret sbi_ecall(unsigned long ext, unsigned long fid,
                         unsigned long arg0, unsigned long arg1,
                         unsigned long arg2, unsigned long arg3,
@@ -38,7 +51,266 @@ struct sbiret sbi_ecall(unsigned long ext, unsigned long fid,
     return ret;
 }
 
+static int sbi_err_map_xen_errno(int err)
+{
+    switch ( err )
+    {
+    case SBI_SUCCESS:
+        return 0;
+    case SBI_ERR_DENIED:
+        return -EACCES;
+    case SBI_ERR_INVALID_PARAM:
+        return -EINVAL;
+    case SBI_ERR_INVALID_ADDRESS:
+        return -EFAULT;
+    case SBI_ERR_NOT_SUPPORTED:
+        return -EOPNOTSUPP;
+    case SBI_ERR_FAILURE:
+    default:
+        return -ENOSYS;
+    };
+}
+
 void sbi_console_putchar(int ch)
 {
     sbi_ecall(SBI_EXT_0_1_CONSOLE_PUTCHAR, 0, ch, 0, 0, 0, 0, 0);
 }
+
+static unsigned int sbi_major_version(void)
+{
+    return MASK_EXTR(sbi_spec_version, SBI_SPEC_VERSION_MAJOR_MASK);
+}
+
+static unsigned int sbi_minor_version(void)
+{
+    return MASK_EXTR(sbi_spec_version, SBI_SPEC_VERSION_MINOR_MASK);
+}
+
+static long sbi_ext_base_func(long fid)
+{
+    struct sbiret ret;
+
+    ret = sbi_ecall(SBI_EXT_BASE, fid, 0, 0, 0, 0, 0, 0);
+
+    if ( !ret.error )
+    {
+       /*
+        * I wasn't able to find a case in the SBI spec where sbiret.value
+        * could be negative.
+        *
+        * Unfortunately, the spec does not specify the possible values of
+        * sbiret.value, but based on the description of the SBI function,
+        * ret.value >= 0 when sbiret.error = 0. The SBI spec specifies only
+        * the possible values of sbiret.error (<= 0, where 0 is SBI_SUCCESS).
+        *
+        * Just to be sure that SBI base extension functions one day won't
+        * start to return a negative value for sbiret.value when
+        * sbiret.error = 0, BUG_ON() is added.
+        */
+        BUG_ON(ret.value < 0);
+
+        return ret.value;
+    }
+    else
+        return ret.error;
+}
+
+static int sbi_rfence_v02_real(unsigned long fid,
+                               unsigned long hmask, unsigned long hbase,
+                               vaddr_t start, size_t size,
+                               unsigned long arg4)
+{
+    struct sbiret ret = {0};
+    int result = 0;
+
+    switch ( fid )
+    {
+    case SBI_EXT_RFENCE_REMOTE_FENCE_I:
+        ret = sbi_ecall(SBI_EXT_RFENCE, fid, hmask, hbase,
+                        0, 0, 0, 0);
+        break;
+
+    case SBI_EXT_RFENCE_REMOTE_HFENCE_GVMA:
+    case SBI_EXT_RFENCE_REMOTE_HFENCE_VVMA:
+    case SBI_EXT_RFENCE_REMOTE_SFENCE_VMA:
+        ret = sbi_ecall(SBI_EXT_RFENCE, fid, hmask, hbase,
+                        start, size, 0, 0);
+        break;
+
+    case SBI_EXT_RFENCE_REMOTE_SFENCE_VMA_ASID:
+    case SBI_EXT_RFENCE_REMOTE_HFENCE_GVMA_VMID:
+    case SBI_EXT_RFENCE_REMOTE_HFENCE_VVMA_ASID:
+        ret = sbi_ecall(SBI_EXT_RFENCE, fid, hmask, hbase,
+                        start, size, arg4, 0);
+        break;
+
+    default:
+        printk("%s: unknown function ID [%#lx]\n",
+               __func__, fid);
+        result = -EINVAL;
+        break;
+    };
+
+    if ( ret.error )
+    {
+        result = sbi_err_map_xen_errno(ret.error);
+        printk("%s: hbase=%lu hmask=%#lx failed (error %ld)\n",
+               __func__, hbase, hmask, ret.error);
+    }
+
+    return result;
+}
+
+static int cf_check sbi_rfence_v02(unsigned long fid,
+                                   const cpumask_t *cpu_mask,
+                                   vaddr_t start, size_t size,
+                                   unsigned long arg4, unsigned long arg5)
+{
+    unsigned long hartid, cpuid, hmask = 0, hbase = 0, htop = 0;
+    int result = -EINVAL;
+
+    /*
+     * hart_mask_base can be set to -1 to indicate that hart_mask can be
+     * ignored and all available harts must be considered.
+     */
+    if ( !cpu_mask )
+        return sbi_rfence_v02_real(fid, 0UL, -1UL, start, size, arg4);
+
+    for_each_cpu ( cpuid, cpu_mask )
+    {
+        /*
+        * Hart IDs might not necessarily be numbered contiguously in
+        * a multiprocessor system.
+        *
+        * This means that it is possible for the hart ID mapping to look like:
+        *  0, 1, 3, 65, 66, 69
+        * In such cases, more than one call to sbi_rfence_v02_real() will be
+        * needed, as a single hmask can only cover BITS_PER_LONG harts:
+        *  1. sbi_rfence_v02_real(hmask=0b1011, hbase=0)
+        *  2. sbi_rfence_v02_real(hmask=0b10011, hbase=65)
+        *
+        * The algorithm below tries to batch as many harts as possible before
+        * making an SBI call. However, batching may not always be possible.
+        * For example, consider the hart ID mapping:
+        *   0, 64, 1, 65, 2, 66 (1)
+        *
+        * Generally, batching is also possible for (1):
+        *    First (0,1,2), then (64,65,66).
+        * It just requires a different approach and updates to the current
+        * algorithm.
+        */
+        hartid = cpuid_to_hartid(cpuid);
+        if ( hmask )
+        {
+            if ( hartid + BITS_PER_LONG <= htop ||
+                 hbase + BITS_PER_LONG <= hartid )
+            {
+                result = sbi_rfence_v02_real(fid, hmask, hbase,
+                                             start, size, arg4);
+                hmask = 0;
+                if ( result )
+                    break;
+            }
+            else if ( hartid < hbase )
+            {
+                /* shift the mask to fit lower hartid */
+                hmask <<= hbase - hartid;
+                hbase = hartid;
+            }
+        }
+
+        if ( !hmask )
+        {
+            hbase = hartid;
+            htop = hartid;
+        }
+        else if ( hartid > htop )
+            htop = hartid;
+
+        hmask |= BIT(hartid - hbase, UL);
+    }
+
+    if ( hmask )
+        result = sbi_rfence_v02_real(fid, hmask, hbase,
+                                     start, size, arg4);
+
+    return result;
+}
+
+static int (* __ro_after_init sbi_rfence)(unsigned long fid,
+                                          const cpumask_t *cpu_mask,
+                                          vaddr_t start,
+                                          size_t size,
+                                          unsigned long arg4,
+                                          unsigned long arg5);
+
+int sbi_remote_sfence_vma(const cpumask_t *cpu_mask,
+                          vaddr_t start,
+                          size_t size)
+{
+    ASSERT(sbi_rfence);
+
+    return sbi_rfence(SBI_EXT_RFENCE_REMOTE_SFENCE_VMA,
+                      cpu_mask, start, size, 0, 0);
+}
+
+/* This function must always succeed. */
+#define sbi_get_spec_version()  \
+    sbi_ext_base_func(SBI_EXT_BASE_GET_SPEC_VERSION)
+
+#define sbi_get_firmware_id()   \
+    sbi_ext_base_func(SBI_EXT_BASE_GET_IMP_ID)
+
+#define sbi_get_firmware_version()  \
+    sbi_ext_base_func(SBI_EXT_BASE_GET_IMP_VERSION)
+
+int sbi_probe_extension(long extid)
+{
+    struct sbiret ret;
+
+    ret = sbi_ecall(SBI_EXT_BASE, SBI_EXT_BASE_PROBE_EXT, extid,
+                    0, 0, 0, 0, 0);
+    if ( !ret.error && ret.value )
+        return ret.value;
+
+    return sbi_err_map_xen_errno(ret.error);
+}
+
+static bool sbi_spec_is_0_1(void)
+{
+    return (sbi_spec_version == SBI_SPEC_VERSION_DEFAULT);
+}
+
+bool sbi_has_rfence(void)
+{
+    return (sbi_rfence != NULL);
+}
+
+int __init sbi_init(void)
+{
+    sbi_spec_version = sbi_get_spec_version();
+
+    printk("SBI specification v%u.%u detected\n",
+            sbi_major_version(), sbi_minor_version());
+
+    if ( !sbi_spec_is_0_1() )
+    {
+        long sbi_fw_id = sbi_get_firmware_id();
+        long sbi_fw_version = sbi_get_firmware_version();
+
+        BUG_ON((sbi_fw_id < 0) || (sbi_fw_version < 0));
+
+        printk("SBI implementation ID=%#lx Version=%#lx\n",
+            sbi_fw_id, sbi_fw_version);
+
+        if ( sbi_probe_extension(SBI_EXT_RFENCE) > 0 )
+        {
+            sbi_rfence = sbi_rfence_v02;
+            printk("SBI v0.2 RFENCE extension detected\n");
+        }
+    }
+    else
+        panic("Ooops. SBI spec version 0.1 detected. Need to add support");
+
+    return 0;
+}
diff --git a/xen/arch/riscv/setup.c b/xen/arch/riscv/setup.c
index 540a3a608e..164b9cfdd1 100644
--- a/xen/arch/riscv/setup.c
+++ b/xen/arch/riscv/setup.c
@@ -8,6 +8,7 @@
 #include <public/version.h>
 
 #include <asm/early_printk.h>
+#include <asm/sbi.h>
 #include <asm/smp.h>
 #include <asm/traps.h>
 
@@ -47,6 +48,8 @@ void __init noreturn start_xen(unsigned long bootcpu_id,
 
     trap_init();
 
+    sbi_init();
+
 #ifdef CONFIG_SELF_TESTS
     test_macros_from_bug_h();
 #endif
-- 
2.46.0
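
For reference, the version decoding done by sbi_major_version() and sbi_minor_version() can be tried outside of Xen. This is a sketch under stated assumptions: MASK_EXTR() below is a local stand-in written for this example (the patch uses the one from Xen's headers), and the two functions take the spec version as a parameter instead of reading the sbi_spec_version variable.

```c
#define SBI_SPEC_VERSION_MAJOR_MASK 0x7f000000UL
#define SBI_SPEC_VERSION_MINOR_MASK 0x00ffffffUL

/*
 * Local stand-in for MASK_EXTR(): isolate the field with the mask, then
 * divide by the mask's lowest set bit to shift the field down to bit 0.
 */
#define MASK_EXTR(v, m) (((v) & (m)) / ((m) & -(m)))

/* Major number: bits 30:24 of the value returned by the BASE extension. */
static unsigned int sbi_major_version(unsigned long spec_version)
{
    return MASK_EXTR(spec_version, SBI_SPEC_VERSION_MAJOR_MASK);
}

/* Minor number: bits 23:0. */
static unsigned int sbi_minor_version(unsigned long spec_version)
{
    return MASK_EXTR(spec_version, SBI_SPEC_VERSION_MINOR_MASK);
}
```

SBI_SPEC_VERSION_DEFAULT (0x1) decodes as major 0, minor 1, which is why sbi_spec_is_0_1() can compare against it directly.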



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v6 7/9] xen/riscv: introduce and initialize SBI RFENCE extension
  2024-09-02 17:01 ` [PATCH v6 7/9] xen/riscv: introduce and initialize SBI RFENCE extension Oleksii Kurochko
@ 2024-09-10 11:32   ` Jan Beulich
  0 siblings, 0 replies; 42+ messages in thread
From: Jan Beulich @ 2024-09-10 11:32 UTC (permalink / raw)
  To: Oleksii Kurochko
  Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
	Julien Grall, Stefano Stabellini, xen-devel

On 02.09.2024 19:01, Oleksii Kurochko wrote:
> Introduce functions to work with the SBI RFENCE extension for issuing
> various fence operations to remote CPUs.
> 
> Add the sbi_init() function along with auxiliary functions and macro
> definitions for proper initialization and checking the availability of
> SBI extensions. Currently, this is implemented only for RFENCE.
> 
> Introduce sbi_remote_sfence_vma() to send SFENCE_VMA instructions to
> a set of target HARTs. This will support the implementation of
> flush_xen_tlb_range_va().
> 
> Integrate __sbi_rfence_v02 from Linux kernel 6.6.0-rc4 with minimal
> modifications:
>  - Adapt to Xen code style.
>  - Use cpuid_to_hartid() instead of cpuid_to_hartid_map[].
>  - Update BIT(...) to BIT(..., UL).
>  - Rename __sbi_rfence_v02_call to sbi_rfence_v02_real and
>    remove the unused arg5.
>  - Handle NULL cpu_mask to execute rfence on all CPUs by calling
>    sbi_rfence_v02_real(..., 0UL, -1UL,...) instead of creating hmask.
>  - change type for start_addr and size to vaddr_t and size_t.
>  - Add an explanatory comment about when batching can and cannot occur,
>    and why batching happens in the first place.
> 
> Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>

Acked-by: Jan Beulich <jbeulich@suse.com>
with three more cosmetic things taken care of:

> +static long sbi_ext_base_func(long fid)
> +{
> +    struct sbiret ret;
> +
> +    ret = sbi_ecall(SBI_EXT_BASE, fid, 0, 0, 0, 0, 0, 0);
> +
> +    if ( !ret.error )
> +    {
> +       /*
> +        * I wasn't able to find a case in the SBI spec where sbiret.value
> +        * could be negative.
> +        *
> +        * Unfortunately, the spec does not specify the possible values of
> +        * sbiret.value, but based on the description of the SBI function,
> +        * ret.value >= 0 when sbiret.error = 0. The SBI spec specifies only
> +        * the possible values of sbiret.error (<= 0, where 0 is SBI_SUCCESS).
> +        *
> +        * Just to be sure that SBI base extension functions one day won't
> +        * start to return a negative value for sbiret.value when
> +        * sbiret.error = 0, BUG_ON() is added.
> +        */

The entire comment's indentation is off by one.

> +static int cf_check sbi_rfence_v02(unsigned long fid,
> +                                   const cpumask_t *cpu_mask,
> +                                   vaddr_t start, size_t size,
> +                                   unsigned long arg4, unsigned long arg5)
> +{
> +    unsigned long hartid, cpuid, hmask = 0, hbase = 0, htop = 0;
> +    int result = -EINVAL;
> +
> +    /*
> +     * hart_mask_base can be set to -1 to indicate that hart_mask can be
> +     * ignored and all available harts must be considered.
> +     */
> +    if ( !cpu_mask )
> +        return sbi_rfence_v02_real(fid, 0UL, -1UL, start, size, arg4);
> +
> +    for_each_cpu ( cpuid, cpu_mask )
> +    {
> +        /*
> +        * Hart IDs might not necessarily be numbered contiguously in
> +        * a multiprocessor system.
> +        *
> +        * This means that it is possible for the hart ID mapping to look like:
> +        *  0, 1, 3, 65, 66, 69
> +        * In such cases, more than one call to sbi_rfence_v02_real() will be
> +        * needed, as a single hmask can only cover BITS_PER_LONG harts:
> +        *  1. sbi_rfence_v02_real(hmask=0b1011, hbase=0)
> +        *  2. sbi_rfence_v02_real(hmask=0b10011, hbase=65)
> +        *
> +        * The algorithm below tries to batch as many harts as possible before
> +        * making an SBI call. However, batching may not always be possible.
> +        * For example, consider the hart ID mapping:
> +        *   0, 64, 1, 65, 2, 66 (1)
> +        *
> +        * Generally, batching is also possible for (1):
> +        *    First (0,1,2), then (64,65,66).
> +        * It just requires a different approach and updates to the current
> +        * algorithm.
> +        */

Except for the initial line, the entire comment's indentation is off by
one.

> +int sbi_remote_sfence_vma(const cpumask_t *cpu_mask,
> +                          vaddr_t start,
> +                          size_t size)

Elsewhere you put multiple parameters on a line when they fit.

Jan
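
The batching loop of sbi_rfence_v02() can be exercised in isolation with a small stand-alone model. This is a sketch, not the patch's code: batch_harts() and struct rfence_call are hypothetical names, the hart IDs come from a plain array instead of a cpumask, a fixed 64-bit mask is assumed (as on RV64), and each SBI call is recorded in an array instead of being issued.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HMASK_BITS 64 /* assumption: 64-bit hart mask, as on RV64 */

/* One recorded (hmask, hbase) pair, i.e. one would-be SBI call. */
struct rfence_call {
    uint64_t hmask;
    uint64_t hbase;
};

/* Returns the number of calls recorded in calls[]. */
static size_t batch_harts(const uint64_t *hartids, size_t n,
                          struct rfence_call *calls, size_t max_calls)
{
    uint64_t hmask = 0, hbase = 0, htop = 0;
    size_t ncalls = 0;

    for ( size_t i = 0; i < n; i++ )
    {
        uint64_t hartid = hartids[i];

        if ( hmask )
        {
            if ( hartid + HMASK_BITS <= htop ||
                 hbase + HMASK_BITS <= hartid )
            {
                /* New hart does not fit the current window: flush it. */
                assert(ncalls < max_calls);
                calls[ncalls++] = (struct rfence_call){ hmask, hbase };
                hmask = 0;
            }
            else if ( hartid < hbase )
            {
                /* Shift the mask so that the lower hart ID becomes bit 0. */
                hmask <<= hbase - hartid;
                hbase = hartid;
            }
        }

        if ( !hmask )
        {
            hbase = hartid;
            htop = hartid;
        }
        else if ( hartid > htop )
            htop = hartid;

        hmask |= UINT64_C(1) << (hartid - hbase);
    }

    /* Flush whatever is left in the last window. */
    if ( hmask )
    {
        assert(ncalls < max_calls);
        calls[ncalls++] = (struct rfence_call){ hmask, hbase };
    }

    return ncalls;
}
```

For the hart IDs 0, 1, 3, 65, 66, 69 this model records two calls: hmask=0b1011 with hbase=0, and hmask=0b10011 with hbase=65. For the interleaved order 0, 64, 1, 65, 2, 66 it records four calls, which illustrates the batching limitation mentioned in the comment.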


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v6 8/9] xen/riscv: page table handling
  2024-09-02 17:01 [PATCH v6 0/9] RISCV device tree mapping Oleksii Kurochko
                   ` (6 preceding siblings ...)
  2024-09-02 17:01 ` [PATCH v6 7/9] xen/riscv: introduce and initialize SBI RFENCE extension Oleksii Kurochko
@ 2024-09-02 17:01 ` Oleksii Kurochko
  2024-09-10 12:19   ` Jan Beulich
  2024-09-02 17:01 ` [PATCH v6 9/9] xen/riscv: introduce early_fdt_map() Oleksii Kurochko
  8 siblings, 1 reply; 42+ messages in thread
From: Oleksii Kurochko @ 2024-09-02 17:01 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksii Kurochko, Alistair Francis, Bob Eshleman, Connor Davis,
	Andrew Cooper, Jan Beulich, Julien Grall, Stefano Stabellini

Implement map_pages_to_xen() which requires several
functions to manage page tables and entries:
- pt_update()
- pt_mapping_level()
- pt_update_entry()
- pt_next_level()
- pt_check_entry()

To support these operations, add functions for creating,
mapping, and unmapping Xen tables:
- create_table()
- map_table()
- unmap_table()

Introduce PTE_SMALL to indicate that 4KB mapping is needed
and PTE_POPULATE.

In addition introduce flush_tlb_range_va() for TLB flushing across
CPUs after updating the PTE for the requested mapping.

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
 riscv_encoding.h uses hard tabs, as is done in XVisor, from where
 this file has been taken, and SATP_PPN_MASK was aligned using 3 hard
 tabs, as was done for the definitions above SATP_PPN_MASK.
---
Changes in V6:
 - update the commit message.
 - correct the comment above flush_tlb_range_va().
 - add PTE_READABLE to the check of pte.rwx permissions in
   pte_is_mapping().
 - s/printk/dprintk in pt_check_entry().
 - drop unnecessary ASSERTS() in pt_check_entry().
 - drop checking of PTE_VALID flags in /* Sanity check when removing
   a mapping */ because of the earlier check.
 - drop ASSERT(flags & PTE_POPULATE) in /* Sanity check when populating the page-table */
   section as in the earlier if it is checked.
 - pt_next_level() changes:
   - invert if ( alloc_tbl ) condition.
   - drop local variable ret.
 - pt_update_entry() changes:
   - invert definition of alloc_tbl.
   - update the comment inside "if ( rc == XEN_TABLE_MAP_FAILED )".
   - drop else for mentioned above if (...).
   - clear some PTE flags before update.
 - s/xen_pt_lock/pt_lock
 - use PFN_DOWN() for vfn variable definition in pt_update().
 - drop definition of PTE_{R,W,X}_MASK.
 - introduce PTE_XWV_BITS and PTE_XWV_MASK() for convenience and use them in if (...)
   in pt_update().
 - update the comment above pt_update().
 - change memset(&pte, 0x00, sizeof(pte)) to pte.pte = 0.
 - add the comment above pte_is_table().
 - add ASSERT in pte_is_mapping() to check the cases which are reserved for future
   use.
---
Changes in V5:
 - s/xen_{un}map/{un}map
 - introduce PTE_SMALL instead of PTE_BLOCK.
 - update the comment above definition of PTE_4K_PAGES.
 - code style fixes.
 - s/RV_STAGE1_MODE > SATP_MODE_SV48/RV_STAGE1_MODE > SATP_MODE_SV39 around
   DECLARE_OFFSETS macros.
 - change type of root_maddr from unsigned long to maddr_t.
 - drop duplicated check ( if (rc) break ) in pt_update() inside while cycle.
 - s/1U/1UL
 - put 'spin_unlock(&xen_pt_lock);' ahead of TLB flush in pt_update().
 - update the commit message.
 - update the comment above ASSERT() in map_pages_to_xen() and also update
   the check within ASSERT() to check that flags has PTE_VALID bit set.
 - update the comment above pt_update() function.
 - add the comment inside pt_check_entry().
 - update the TLB flushing region in pt_update().
 - s/alloc_only/alloc_tbl
---
Changes in V4:
 - update the commit message.
 - drop xen_ prefix for functions: xen_pt_update(), xen_pt_mapping_level(),
   xen_pt_update_entry(), xen_pt_next_level(), xen_pt_check_entry().
 - drop 'select GENERIC_PT' for CONFIG_RISCV. There is no GENERIC_PT anymore.
 - update implementation of flush_xen_tlb_range_va and s/flush_xen_tlb_range_va/flush_tlb_range_va
 - s/pte_get_mfn/mfn_from_pte. Other similar definitions I decided not to touch as
   they were introduced before and this pattern of naming such type of macros will be applied
   for newly introduced macros.
 - drop _PAGE_* definitions and use analogues of PTE_*.
 - introduce PTE_{W,X,R}_MASK and drop PAGE_{XN,W,X}_MASK. Also drop _PAGE_{*}_BIT
 - introduce PAGE_HYPERVISOR_RX.
 - drop unused now l3_table_offset.
 - drop struct pt_t as it was used only for one function. If it will be needed in the future
   pt_t will be re-introduced.
 - code styles fixes in pte_is_table(). drop level argument from t.
 - update implementation and prototype of pte_is_mapping().
 - drop level argument from pt_next_level().
 - introduce definition of SATP_PPN_MASK.
 - isolate PPN of CSR_SATP before shift by PAGE_SHIFT.
 - drop set_permission() function as it is not used more than once.
 - update prototype of pt_check_entry(): drop level argument as it is not used.
 - pt_check_entry():
   - code style fixes
   - update the sanity check when modifying an entry
   - update the sanity check when removing a mapping.
 - s/read_only/alloc_only.
 - code style fixes for pt_next_level().
 - pt_update_entry() changes:
   - drop arch_level variable inside pt_update_entry()
   - drop conversion near virt to paddr_t in DECLARE_OFFSETS(offsets, virt);
   - pull out 'goto out' inside first 'for' cycle.
   - drop braces for 'if' cases which have only one line.
   - indent 'out' label with one blank.
   - update the comment above alloc_only and also its definition to take into
     account whether pte population was requested or not.
   - drop target variable and rename arch_target argument of the function to
     target.
 - pt_mapping_level() changes:
   - move the check if PTE_BLOCK should be mapped on the top of the function.
   - change int i to unsigned int and update 'for' cycle correspondingly.
 - update prototype of pt_update():
   - drop the comment above nr_mfns and drop const to be consistent with other
     arguments.
   - always flush TLB at the end of the function as non-present entries can be put
     in the TLB.
   - add fence before TLB flush to ensure that PTEs are all updated before flushing.
 - s/XEN_TABLE_NORMAL_PAGE/XEN_TABLE_NORMAL
 - add a check in map_pages_to_xen() the mfn is not INVALID_MFN.
 - add the comment on top of pt_update() how mfn = INVALID_MFN is considered.
 - s/_PAGE_BLOCK/PTE_BLOCK.
 - add the comment with additional explanation for PTE_BLOCK.
 - drop definition of FIRST_SIZE as it isn't used.
---
Changes in V3:
 - new patch. ( Technically it is reworked version of the generic approach
   which I tried to suggest in the previous version )
---
 xen/arch/riscv/Makefile                     |   1 +
 xen/arch/riscv/include/asm/flushtlb.h       |   9 +
 xen/arch/riscv/include/asm/mm.h             |   2 +
 xen/arch/riscv/include/asm/page.h           |  72 ++++
 xen/arch/riscv/include/asm/riscv_encoding.h |   1 +
 xen/arch/riscv/mm.c                         |   9 -
 xen/arch/riscv/pt.c                         | 423 ++++++++++++++++++++
 7 files changed, 508 insertions(+), 9 deletions(-)
 create mode 100644 xen/arch/riscv/pt.c

diff --git a/xen/arch/riscv/Makefile b/xen/arch/riscv/Makefile
index 2f2d6647a2..fca9fd93b6 100644
--- a/xen/arch/riscv/Makefile
+++ b/xen/arch/riscv/Makefile
@@ -1,6 +1,7 @@
 obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
 obj-y += entry.o
 obj-y += mm.o
+obj-y += pt.o
 obj-$(CONFIG_RISCV_64) += riscv64/
 obj-y += sbi.o
 obj-y += setup.o
diff --git a/xen/arch/riscv/include/asm/flushtlb.h b/xen/arch/riscv/include/asm/flushtlb.h
index f4a735fd6c..43214f5e95 100644
--- a/xen/arch/riscv/include/asm/flushtlb.h
+++ b/xen/arch/riscv/include/asm/flushtlb.h
@@ -5,12 +5,21 @@
 #include <xen/bug.h>
 #include <xen/cpumask.h>
 
+#include <asm/sbi.h>
+
 /* Flush TLB of local processor for address va. */
 static inline void flush_tlb_one_local(vaddr_t va)
 {
     asm volatile ( "sfence.vma %0" :: "r" (va) : "memory" );
 }
 
+/* Flush a range of VA's hypervisor mappings from the TLB of all processors. */
+static inline void flush_tlb_range_va(vaddr_t va, size_t size)
+{
+    BUG_ON(!sbi_has_rfence());
+    sbi_remote_sfence_vma(NULL, va, size);
+}
+
 /*
  * Filter the given set of CPUs, removing those that definitely flushed their
  * TLB since @page_timestamp.
diff --git a/xen/arch/riscv/include/asm/mm.h b/xen/arch/riscv/include/asm/mm.h
index a0bdc2bc3a..ce1557bb27 100644
--- a/xen/arch/riscv/include/asm/mm.h
+++ b/xen/arch/riscv/include/asm/mm.h
@@ -42,6 +42,8 @@ static inline void *maddr_to_virt(paddr_t ma)
 #define virt_to_mfn(va)     __virt_to_mfn(va)
 #define mfn_to_virt(mfn)    __mfn_to_virt(mfn)
 
+#define mfn_from_pte(pte) maddr_to_mfn(pte_to_paddr(pte))
+
 struct page_info
 {
     /* Each frame can be threaded onto a doubly-linked list. */
diff --git a/xen/arch/riscv/include/asm/page.h b/xen/arch/riscv/include/asm/page.h
index 55916eaa92..9b7d4fd597 100644
--- a/xen/arch/riscv/include/asm/page.h
+++ b/xen/arch/riscv/include/asm/page.h
@@ -21,6 +21,11 @@
 #define XEN_PT_LEVEL_MAP_MASK(lvl)  (~(XEN_PT_LEVEL_SIZE(lvl) - 1))
 #define XEN_PT_LEVEL_MASK(lvl)      (VPN_MASK << XEN_PT_LEVEL_SHIFT(lvl))
 
+/*
+ * PTE format:
+ * | XLEN-1  10 | 9             8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0
+ *       PFN      reserved for SW   D   A   G   U   X   W   R   V
+ */
 #define PTE_VALID                   BIT(0, UL)
 #define PTE_READABLE                BIT(1, UL)
 #define PTE_WRITABLE                BIT(2, UL)
@@ -34,15 +39,51 @@
 #define PTE_LEAF_DEFAULT            (PTE_VALID | PTE_READABLE | PTE_WRITABLE)
 #define PTE_TABLE                   (PTE_VALID)
 
+#define PAGE_HYPERVISOR_RO          (PTE_VALID | PTE_READABLE)
 #define PAGE_HYPERVISOR_RW          (PTE_VALID | PTE_READABLE | PTE_WRITABLE)
+#define PAGE_HYPERVISOR_RX          (PTE_VALID | PTE_READABLE | PTE_EXECUTABLE)
 
 #define PAGE_HYPERVISOR             PAGE_HYPERVISOR_RW
 
+/*
+ * The PTE format does not contain the following bits within itself;
+ * they are created artificially to inform the Xen page table
+ * handling algorithm. These bits should not be explicitly written
+ * to the PTE entry.
+ */
+#define PTE_SMALL       BIT(10, UL)
+#define PTE_POPULATE    BIT(11, UL)
+
+#define PTE_XWV_BITS    (PTE_WRITABLE | PTE_EXECUTABLE | PTE_VALID)
+#define PTE_XWV_MASK(x) ((x) & PTE_XWV_BITS)
+#define PTE_RWX_MASK(x) ((x) & (PTE_READABLE | PTE_WRITABLE | PTE_EXECUTABLE))
+
 /* Calculate the offsets into the pagetables for a given VA */
 #define pt_linear_offset(lvl, va)   ((va) >> XEN_PT_LEVEL_SHIFT(lvl))
 
 #define pt_index(lvl, va) (pt_linear_offset((lvl), (va)) & VPN_MASK)
 
+#define PAGETABLE_ORDER_MASK ((_AC(1, U) << PAGETABLE_ORDER) - 1)
+#define TABLE_OFFSET(offs) (_AT(unsigned int, offs) & PAGETABLE_ORDER_MASK)
+
+#if RV_STAGE1_MODE > SATP_MODE_SV39
+#error "need to update DECLARE_OFFSETS macros"
+#else
+
+#define l0_table_offset(va) TABLE_OFFSET(pt_linear_offset(0, va))
+#define l1_table_offset(va) TABLE_OFFSET(pt_linear_offset(1, va))
+#define l2_table_offset(va) TABLE_OFFSET(pt_linear_offset(2, va))
+
+/* Generate an array @var containing the offset for each level from @addr */
+#define DECLARE_OFFSETS(var, addr)          \
+    const unsigned int var[] = {            \
+        l0_table_offset(addr),              \
+        l1_table_offset(addr),              \
+        l2_table_offset(addr),              \
+    }
+
+#endif
+
 /* Page Table entry */
 typedef struct {
 #ifdef CONFIG_RISCV_64
@@ -68,6 +109,37 @@ static inline bool pte_is_valid(pte_t p)
     return p.pte & PTE_VALID;
 }
 
+/*
+ * From the RISC-V spec:
+ *    Table 4.5 summarizes the encoding of the permission bits.
+ *      X W R Meaning
+ *      0 0 0 Pointer to next level of page table.
+ *      0 0 1 Read-only page.
+ *      0 1 0 Reserved for future use.
+ *      0 1 1 Read-write page.
+ *      1 0 0 Execute-only page.
+ *      1 0 1 Read-execute page.
+ *      1 1 0 Reserved for future use.
+ *      1 1 1 Read-write-execute page.
+ */
+inline bool pte_is_table(const pte_t p)
+{
+    return ((p.pte & (PTE_VALID |
+                      PTE_READABLE |
+                      PTE_WRITABLE |
+                      PTE_EXECUTABLE)) == PTE_VALID);
+}
+
+static inline bool pte_is_mapping(const pte_t p)
+{
+    /* W = 1 || (X=1 && W=1) -> Reserved for future use */
+    ASSERT((PTE_RWX_MASK(p.pte) != PTE_WRITABLE) ||
+           (PTE_RWX_MASK(p.pte) != (PTE_WRITABLE | PTE_EXECUTABLE)));
+
+    return (p.pte & PTE_VALID) &&
+           (p.pte & (PTE_READABLE | PTE_WRITABLE | PTE_EXECUTABLE));
+}
+
 static inline void invalidate_icache(void)
 {
     BUG_ON("unimplemented");
diff --git a/xen/arch/riscv/include/asm/riscv_encoding.h b/xen/arch/riscv/include/asm/riscv_encoding.h
index 58abe5eccc..d80cef0093 100644
--- a/xen/arch/riscv/include/asm/riscv_encoding.h
+++ b/xen/arch/riscv/include/asm/riscv_encoding.h
@@ -164,6 +164,7 @@
 #define SSTATUS_SD			SSTATUS64_SD
 #define SATP_MODE			SATP64_MODE
 #define SATP_MODE_SHIFT			SATP64_MODE_SHIFT
+#define SATP_PPN_MASK			_UL(0x00000FFFFFFFFFFF)
 
 #define HGATP_PPN			HGATP64_PPN
 #define HGATP_VMID_SHIFT		HGATP64_VMID_SHIFT
diff --git a/xen/arch/riscv/mm.c b/xen/arch/riscv/mm.c
index b8ff91cf4e..e8430def14 100644
--- a/xen/arch/riscv/mm.c
+++ b/xen/arch/riscv/mm.c
@@ -369,12 +369,3 @@ int destroy_xen_mappings(unsigned long s, unsigned long e)
     BUG_ON("unimplemented");
     return -1;
 }
-
-int map_pages_to_xen(unsigned long virt,
-                     mfn_t mfn,
-                     unsigned long nr_mfns,
-                     unsigned int flags)
-{
-    BUG_ON("unimplemented");
-    return -1;
-}
diff --git a/xen/arch/riscv/pt.c b/xen/arch/riscv/pt.c
new file mode 100644
index 0000000000..332ae90599
--- /dev/null
+++ b/xen/arch/riscv/pt.c
@@ -0,0 +1,423 @@
+#include <xen/bug.h>
+#include <xen/domain_page.h>
+#include <xen/errno.h>
+#include <xen/lib.h>
+#include <xen/mm.h>
+#include <xen/mm-frame.h>
+#include <xen/pfn.h>
+#include <xen/pmap.h>
+#include <xen/spinlock.h>
+
+#include <asm/flushtlb.h>
+#include <asm/page.h>
+
+static inline const mfn_t get_root_page(void)
+{
+    paddr_t root_maddr = (csr_read(CSR_SATP) & SATP_PPN_MASK) << PAGE_SHIFT;
+
+    return maddr_to_mfn(root_maddr);
+}
+
+/* Sanity check of the entry. */
+static bool pt_check_entry(pte_t entry, mfn_t mfn, unsigned int flags)
+{
+    /*
+     * See the comment about the possible combination of (mfn, flags) in
+     * the comment above pt_update().
+     */
+
+    /* Sanity check when modifying an entry. */
+    if ( (flags & PTE_VALID) && mfn_eq(mfn, INVALID_MFN) )
+    {
+        /* We don't allow modifying an invalid entry. */
+        if ( !pte_is_valid(entry) )
+        {
+            dprintk(XENLOG_ERR, "Modifying invalid entry is not allowed\n");
+            return false;
+        }
+
+        /* We don't allow modifying a table entry */
+        if ( pte_is_table(entry) )
+        {
+            dprintk(XENLOG_ERR, "Modifying a table entry is not allowed\n");
+            return false;
+        }
+    }
+    /* Sanity check when inserting a mapping */
+    else if ( flags & PTE_VALID )
+    {
+        /*
+         * We don't allow replacing any valid entry.
+         *
+         * Note that the function pt_update() relies on this
+         * assumption and will skip the TLB flush (when Svvptc
+         * extension will be ratified). The function will need
+         * to be updated if the check is relaxed.
+         */
+        if ( pte_is_valid(entry) )
+        {
+            if ( pte_is_mapping(entry) )
+                dprintk(XENLOG_ERR, "Changing MFN for a valid entry is not allowed (%#"PRI_mfn" -> %#"PRI_mfn")\n",
+                       mfn_x(mfn_from_pte(entry)), mfn_x(mfn));
+            else
+                dprintk(XENLOG_ERR, "Trying to replace a table with a mapping\n");
+            return false;
+        }
+    }
+    /* Sanity check when removing a mapping. */
+    else if ( !(flags & PTE_POPULATE) )
+    {
+        /* We should be here with an invalid MFN. */
+        ASSERT(mfn_eq(mfn, INVALID_MFN));
+
+        /* We don't allow removing a table */
+        if ( pte_is_table(entry) )
+        {
+            dprintk(XENLOG_ERR, "Removing a table is not allowed\n");
+            return false;
+        }
+    }
+    /* Sanity check when populating the page-table. No check so far. */
+    else
+    {
+        /* We should be here with an invalid MFN */
+        ASSERT(mfn_eq(mfn, INVALID_MFN));
+    }
+
+    return true;
+}
+
+static pte_t *map_table(mfn_t mfn)
+{
+    /*
+     * During early boot, map_domain_page() may be unusable. Use the
+     * PMAP to map temporarily a page-table.
+     */
+    if ( system_state == SYS_STATE_early_boot )
+        return pmap_map(mfn);
+
+    return map_domain_page(mfn);
+}
+
+static void unmap_table(const pte_t *table)
+{
+    /*
+     * During early boot, map_table() will not use map_domain_page()
+     * but the PMAP.
+     */
+    if ( system_state == SYS_STATE_early_boot )
+        pmap_unmap(table);
+    else
+        unmap_domain_page(table);
+}
+
+static int create_table(pte_t *entry)
+{
+    mfn_t mfn;
+    void *p;
+    pte_t pte;
+
+    if ( system_state != SYS_STATE_early_boot )
+    {
+        struct page_info *pg = alloc_domheap_page(NULL, 0);
+
+        if ( pg == NULL )
+            return -ENOMEM;
+
+        mfn = page_to_mfn(pg);
+    }
+    else
+        mfn = alloc_boot_pages(1, 1);
+
+    p = map_table(mfn);
+    clear_page(p);
+    unmap_table(p);
+
+    pte = pte_from_mfn(mfn, PTE_TABLE);
+    write_pte(entry, pte);
+
+    return 0;
+}
+
+#define XEN_TABLE_MAP_FAILED 0
+#define XEN_TABLE_SUPER_PAGE 1
+#define XEN_TABLE_NORMAL 2
+
+/*
+ * Take the currently mapped table, find the corresponding entry,
+ * and map the next table, if available.
+ *
+ * The alloc_tbl parameter indicates whether intermediate tables should
+ * be allocated when not present.
+ *
+ * Return values:
+ *  XEN_TABLE_MAP_FAILED: Either alloc_only was set and the entry
+ *  was empty, or allocating a new page failed.
+ *  XEN_TABLE_NORMAL: next level or leaf mapped normally
+ *  XEN_TABLE_SUPER_PAGE: The next entry points to a superpage.
+ */
+static int pt_next_level(bool alloc_tbl, pte_t **table, unsigned int offset)
+{
+    pte_t *entry;
+    mfn_t mfn;
+
+    entry = *table + offset;
+
+    if ( !pte_is_valid(*entry) )
+    {
+        if ( !alloc_tbl )
+            return XEN_TABLE_MAP_FAILED;
+
+        if ( create_table(entry) )
+            return XEN_TABLE_MAP_FAILED;
+    }
+
+    if ( pte_is_mapping(*entry) )
+        return XEN_TABLE_SUPER_PAGE;
+
+    mfn = mfn_from_pte(*entry);
+
+    unmap_table(*table);
+    *table = map_table(mfn);
+
+    return XEN_TABLE_NORMAL;
+}
+
+/* Update an entry at the level @target. */
+static int pt_update_entry(mfn_t root, unsigned long virt,
+                           mfn_t mfn, unsigned int target,
+                           unsigned int flags)
+{
+    int rc;
+    unsigned int level = HYP_PT_ROOT_LEVEL;
+    pte_t *table;
+    /*
+     * The intermediate page table shouldn't be allocated when MFN isn't
+     * valid and we are not populating page table.
+     * This means we either modify permissions or remove an entry, or
+     * inserting brand new entry.
+     *
+     * See the comment above pt_update() for an additional explanation about
+     * combinations of (mfn, flags).
+    */
+    bool alloc_tbl = !mfn_eq(mfn, INVALID_MFN) || (flags & PTE_POPULATE);
+    pte_t pte, *entry;
+
+    /* convenience aliases */
+    DECLARE_OFFSETS(offsets, virt);
+
+    table = map_table(root);
+    for ( ; level > target; level-- )
+    {
+        rc = pt_next_level(alloc_tbl, &table, offsets[level]);
+        if ( rc == XEN_TABLE_MAP_FAILED )
+        {
+            rc = 0;
+
+            /*
+             * We are here because pt_next_level has failed to map
+             * the intermediate page table (e.g the table does not exist
+             * and the pt shouldn't be allocated). It is a valid case when
+             * removing a mapping as it may not exist in the page table.
+             * In this case, just ignore it.
+             */
+            if ( flags & PTE_VALID )
+            {
+                printk("%s: Unable to map level %u\n", __func__, level);
+                rc = -ENOENT;
+            }
+
+            goto out;
+        }
+
+        if ( rc != XEN_TABLE_NORMAL )
+            break;
+    }
+
+    if ( level != target )
+    {
+        printk("%s: Shattering superpage is not supported\n", __func__);
+        rc = -EOPNOTSUPP;
+        goto out;
+    }
+
+    entry = table + offsets[level];
+
+    rc = -EINVAL;
+    if ( !pt_check_entry(*entry, mfn, flags) )
+        goto out;
+
+    /* We are removing the page */
+    if ( !(flags & PTE_VALID) )
+        /*
+         * there is also a check in pt_check_entry() which check that
+         * mfn=INVALID_MFN
+         */
+        pte.pte = 0;
+    else
+    {
+        /* We are inserting a mapping => Create new pte. */
+        if ( !mfn_eq(mfn, INVALID_MFN) )
+            pte = pte_from_mfn(mfn, PTE_VALID);
+        else /* We are updating the permission => Copy the current pte. */
+            pte = *entry;
+
+        /* update permission according to the flags */
+        pte.pte &= ~PTE_RWX_MASK(flags);
+        pte.pte |= PTE_RWX_MASK(flags) | PTE_ACCESSED | PTE_DIRTY;
+    }
+
+    write_pte(entry, pte);
+
+    rc = 0;
+
+ out:
+    unmap_table(table);
+
+    return rc;
+}
+
+/* Return the level where mapping should be done */
+static int pt_mapping_level(unsigned long vfn, mfn_t mfn, unsigned long nr,
+                            unsigned int flags)
+{
+    unsigned int level = 0;
+    unsigned long mask;
+    unsigned int i;
+
+    /* Use blocking mapping unless the caller requests 4K mapping */
+    if ( unlikely(flags & PTE_SMALL) )
+        return level;
+
+    /*
+     * Don't take into account the MFN when removing mapping (i.e
+     * MFN_INVALID) to calculate the correct target order.
+     *
+     * `vfn` and `mfn` must be both superpage aligned.
+     * They are or-ed together and then checked against the size of
+     * each level.
+     *
+     * `left` ( variable declared in pt_update() ) is not included
+     * and checked separately to allow superpage mapping even if it
+     * is not properly aligned (the user may have asked to map 2MB + 4k).
+     */
+    mask = !mfn_eq(mfn, INVALID_MFN) ? mfn_x(mfn) : 0;
+    mask |= vfn;
+
+    for ( i = HYP_PT_ROOT_LEVEL; i != 0; i-- )
+    {
+        if ( !(mask & (BIT(XEN_PT_LEVEL_ORDER(i), UL) - 1)) &&
+             (nr >= BIT(XEN_PT_LEVEL_ORDER(i), UL)) )
+        {
+            level = i;
+            break;
+        }
+    }
+
+    return level;
+}
+
+static DEFINE_SPINLOCK(pt_lock);
+
+/*
+ * If `mfn` equals `INVALID_MFN`, it indicates that the following page table
+ * update operation might be related to either:
+ *   - populating the table (PTE_POPULATE will be set additionally),
+ *   - destroying a mapping (PTE_VALID = 0 and mfn = INVALID_MFN),
+ *   - modifying an existing mapping ( PTE_VALID = 1 and mfn == INVALID_MFN ).
+ *
+ * If `mfn` != INVALID_MFN and flags has PTE_VALID bit set then it means that
+ * inserting will be done.
+ */
+static int pt_update(unsigned long virt,
+                     mfn_t mfn,
+                     unsigned long nr_mfns,
+                     unsigned int flags)
+{
+    int rc = 0;
+    unsigned long vfn = PFN_DOWN(virt);
+    unsigned long left = nr_mfns;
+
+    const mfn_t root = get_root_page();
+
+    /*
+     * It is bad idea to have mapping both writeable and
+     * executable.
+     * When modifying/creating mapping (i.e PTE_VALID is set),
+     * prevent any update if this happen.
+     */
+    if ( PTE_XWV_MASK(flags) == PTE_XWV_BITS )
+    {
+        printk("Mappings should not be both Writeable and Executable.\n");
+        return -EINVAL;
+    }
+
+    if ( !IS_ALIGNED(virt, PAGE_SIZE) )
+    {
+        printk("The virtual address is not aligned to the page-size.\n");
+        return -EINVAL;
+    }
+
+    spin_lock(&pt_lock);
+
+    while ( left )
+    {
+        unsigned int order, level;
+
+        level = pt_mapping_level(vfn, mfn, left, flags);
+        order = XEN_PT_LEVEL_ORDER(level);
+
+        ASSERT(left >= BIT(order, UL));
+
+        rc = pt_update_entry(root, vfn << PAGE_SHIFT, mfn, level, flags);
+        if ( rc )
+            break;
+
+        vfn += 1UL << order;
+        if ( !mfn_eq(mfn, INVALID_MFN) )
+            mfn = mfn_add(mfn, 1UL << order);
+
+        left -= (1UL << order);
+    }
+
+    /* Ensure that PTEs are all updated before flushing */
+    RISCV_FENCE(rw, rw);
+
+    spin_unlock(&pt_lock);
+
+    /*
+     * Always flush TLB at the end of the function as non-present entries
+     * can be put in the TLB.
+     *
+     * The remote fence operation applies to the entire address space if
+     * either:
+     *  - start and size are both 0, or
+     *  - size is equal to 2^XLEN-1.
+     *
+     * TODO: come up with something which will allow not to flush the entire
+     *       address space.
+     */
+    flush_tlb_range_va(0, 0);
+
+    return rc;
+}
+
+int map_pages_to_xen(unsigned long virt,
+                     mfn_t mfn,
+                     unsigned long nr_mfns,
+                     unsigned int flags)
+{
+    /*
+     * Ensure that flags has PTE_VALID bit as map_pages_to_xen() is supposed
+     * to create a mapping.
+     *
+     * Ensure that we have a valid MFN before proceeding.
+     *
+     * If the MFN is invalid, pt_update() might misinterpret the operation,
+     * treating it as either a population, a mapping destruction,
+     * or a mapping modification.
+     */
+    ASSERT(!mfn_eq(mfn, INVALID_MFN) || (flags & PTE_VALID));
+
+    return pt_update(virt, mfn, nr_mfns, flags);
+}
-- 
2.46.0
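
For readers following the level selection in pt_mapping_level(), here is a
stand-alone sketch of the same idea. It is not the patch code itself. It
assumes Sv39 (three levels, 9 index bits per level), and mapping_level(),
level_order(), ROOT_LEVEL and NO_MFN are hypothetical stand-ins for
pt_mapping_level(), XEN_PT_LEVEL_ORDER(), HYP_PT_ROOT_LEVEL and INVALID_MFN.

```c
#define PAGETABLE_ORDER 9        /* assumed: 512 entries per table (Sv39) */
#define ROOT_LEVEL      2        /* assumed: Sv39 has levels 0..2 */
#define NO_MFN          (~0UL)   /* hypothetical stand-in for INVALID_MFN */

/* log2 of the number of 4K pages covered by one entry at @level. */
static unsigned int level_order(unsigned int level)
{
    return level * PAGETABLE_ORDER;
}

/*
 * Pick the highest level at which both frame numbers are aligned and
 * at least one full entry's worth of pages is still left to map.
 */
static unsigned int mapping_level(unsigned long vfn, unsigned long mfn,
                                  unsigned long nr)
{
    /* The MFN is ignored when it is invalid (e.g. when removing). */
    unsigned long mask = (mfn != NO_MFN ? mfn : 0) | vfn;
    unsigned int i;

    for ( i = ROOT_LEVEL; i != 0; i-- )
    {
        unsigned long pages = 1UL << level_order(i);

        if ( !(mask & (pages - 1)) && nr >= pages )
            return i;
    }

    /* Fall back to a 4K mapping. */
    return 0;
}
```

With these assumptions, a request for 2MB + 4K starting at 2MB-aligned frame
numbers would be handled as one level-1 entry (512 pages) followed by one
level-0 entry for the remaining page, which is the case the comment about
`left` describes.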



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v6 8/9] xen/riscv: page table handling
  2024-09-02 17:01 ` [PATCH v6 8/9] xen/riscv: page table handling Oleksii Kurochko
@ 2024-09-10 12:19   ` Jan Beulich
  2024-09-11 15:09     ` oleksii.kurochko
  0 siblings, 1 reply; 42+ messages in thread
From: Jan Beulich @ 2024-09-10 12:19 UTC (permalink / raw)
  To: Oleksii Kurochko
  Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
	Julien Grall, Stefano Stabellini, xen-devel

On 02.09.2024 19:01, Oleksii Kurochko wrote:
> Implement map_pages_to_xen() which requires several
> functions to manage page tables and entries:
> - pt_update()
> - pt_mapping_level()
> - pt_update_entry()
> - pt_next_level()
> - pt_check_entry()
> 
> To support these operations, add functions for creating,
> mapping, and unmapping Xen tables:
> - create_table()
> - map_table()
> - unmap_table()
> 
> Introduce PTE_SMALL to indicate that 4KB mapping is needed
> and PTE_POPULATE.
> 
> In addition introduce flush_tlb_range_va() for TLB flushing across
> CPUs after updating the PTE for the requested mapping.
> 
> Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
> ---
>  riscv_encoding.h using hard tabs as it is used in XVisor from where
>  this file has been taken and SATP_PPN_MASK was aligned using 3 hard
>  tabs as it was done for the definitions above SATP_PPN_MASK.
> ---
> Changes in V6:
>  - update the commit message.
>  - correct the comment above flush_tlb_range_va().
>  - add PTE_READABLE to the check of pte.rwx permissions in
>    pte_is_mapping().
>  - s/printk/dprintk in pt_check_entry().
>  - drop unnecessary ASSERTS() in pt_check_entry().
>  - drop checking of PTE_VALID flags in /* Sanity check when removing
>    a mapping */ because of the earlier check.
>  - drop ASSERT(flags & PTE_POPULATE) in /* Sanity check when populating the page-table */
>    section as in the earlier if it is checked.
>  - pt_next_level() changes:
>    - invert if ( alloc_tbl ) condition.
>    - drop local variable ret.
>  - pt_update_entry() changes:
>    - invert definition of alloc_tbl.
>    - update the comment inside "if ( rc == XEN_TABLE_MAP_FAILED )".
>    - drop else for mentioned above if (...).
>    - clear some PTE flags before update.
>  - s/xen_pt_lock/pt_lock
>  - use PFN_DOWN() for vfn variable definition in pt_update().
>  - drop definition of PTE_{R,W,X}_MASK.
>  - introduce PTE_XWV_BITS and PTE_XWV_MASK() for convenience and use them in if (...)
>    in pt_update().
>  - update the comment above pt_update().
>  - change memset(&pte, 0x00, sizeof(pte)) to pte.pte = 0.
>  - add the comment above pte_is_table().
>  - add ASSERT in pte_is_mapping() to check the cases which are reserved for future
>    use.
> ---
> Changes in V5:
>  - s/xen_{un}map/{un}map
>  - introduce PTE_SMALL instead of PTE_BLOCK.
>  - update the comment above definition of PTE_4K_PAGES.
>  - code style fixes.
>  - s/RV_STAGE1_MODE > SATP_MODE_SV48/RV_STAGE1_MODE > SATP_MODE_SV39 around
>    DECLARE_OFFSETS macros.
>  - change type of root_maddr from unsigned long to maddr_t.
>  - drop duplicated check ( if (rc) break ) in pt_update() inside while cycle.
>  - s/1U/1UL
>  - put 'spin_unlock(&xen_pt_lock);' ahead of TLB flush in pt_update().
>  - update the commit message.
>  - update the comment above ASSERT() in map_pages_to_xen() and also update
>    the check within ASSERT() to check that flags has PTE_VALID bit set.
>  - update the comment above pt_update() function.
>  - add the comment inside pt_check_entry().
>  - update the TLB flushing region in pt_update().
>  - s/alloc_only/alloc_tbl
> ---
> Changes in V4:
>  - update the commit message.
>  - drop xen_ prefix for functions: xen_pt_update(), xen_pt_mapping_level(),
>    xen_pt_update_entry(), xen_pt_next_level(), xen_pt_check_entry().
>  - drop 'select GENERIC_PT' for CONFIG_RISCV. There is no GENERIC_PT anymore.
>  - update implementation of flush_xen_tlb_range_va and s/flush_xen_tlb_range_va/flush_tlb_range_va
>  - s/pte_get_mfn/mfn_from_pte. Other similar definitions I decided not to touch as
>    they were introduced before and this pattern of naming such type of macros will be applied
>    for newly introduced macros.
>  - drop _PAGE_* definitions and use analogues of PTE_*.
>  - introduce PTE_{W,X,R}_MASK and drop PAGE_{XN,W,X}_MASK. Also drop _PAGE_{*}_BIT
>  - introduce PAGE_HYPERVISOR_RX.
>  - drop unused now l3_table_offset.
>  - drop struct pt_t as it was used only for one function. If it will be needed in the future
>    pt_t will be re-introduced.
>  - code styles fixes in pte_is_table(). drop level argument from t.
>  - update implementation and prototype of pte_is_mapping().
>  - drop level argument from pt_next_level().
>  - introduce definition of SATP_PPN_MASK.
>  - isolate PPN of CSR_SATP before shift by PAGE_SHIFT.
>  - drop set_permission() function as it is not used more than once.
>  - update prototype of pt_check_entry(): drop level argument as it is not used.
>  - pt_check_entry():
>    - code style fixes
>    - update the sanity check when modifying an entry
>    - update the sanity check when removing a mapping.
>  - s/read_only/alloc_only.
>  - code style fixes for pt_next_level().
>  - pt_update_entry() changes:
>    - drop arch_level variable inside pt_update_entry()
>    - drop conversion near virt to paddr_t in DECLARE_OFFSETS(offsets, virt);
>    - pull out 'goto out' inside first 'for' cycle.
>    - drop braces for 'if' cases which have only one line.
>    - indent 'out' label with one blank.
>    - update the comment above alloc_only and also its definition to take into
>      account whether pte population was requested or not.
>    - drop target variable and rename arch_target argument of the function to
>      target.
>  - pt_mapping_level() changes:
>    - move the check if PTE_BLOCK should be mapped on the top of the function.
>    - change int i to unsigned int and update 'for' cycle correspondingly.
>  - update prototype of pt_update():
>    - drop the comment above nr_mfns and drop const to be consistent with other
>      arguments.
>    - always flush TLB at the end of the function as non-present entries can be put
>      in the TLB.
>    - add fence before TLB flush to ensure that PTEs are all updated before flushing.
>  - s/XEN_TABLE_NORMAL_PAGE/XEN_TABLE_NORMAL
>  - add a check in map_pages_to_xen() the mfn is not INVALID_MFN.
>  - add the comment on top of pt_update() how mfn = INVALID_MFN is considered.
>  - s/_PAGE_BLOCK/PTE_BLOCK.
>  - add the comment with additional explanation for PTE_BLOCK.
>  - drop definition of FIRST_SIZE as it isn't used.
> ---
> Changes in V3:
>  - new patch. ( Technically it is reworked version of the generic approach
>    which I tried to suggest in the previous version )
> ---
>  xen/arch/riscv/Makefile                     |   1 +
>  xen/arch/riscv/include/asm/flushtlb.h       |   9 +
>  xen/arch/riscv/include/asm/mm.h             |   2 +
>  xen/arch/riscv/include/asm/page.h           |  72 ++++
>  xen/arch/riscv/include/asm/riscv_encoding.h |   1 +
>  xen/arch/riscv/mm.c                         |   9 -
>  xen/arch/riscv/pt.c                         | 423 ++++++++++++++++++++
>  7 files changed, 508 insertions(+), 9 deletions(-)
>  create mode 100644 xen/arch/riscv/pt.c
> 
> diff --git a/xen/arch/riscv/Makefile b/xen/arch/riscv/Makefile
> index 2f2d6647a2..fca9fd93b6 100644
> --- a/xen/arch/riscv/Makefile
> +++ b/xen/arch/riscv/Makefile
> @@ -1,6 +1,7 @@
>  obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
>  obj-y += entry.o
>  obj-y += mm.o
> +obj-y += pt.o
>  obj-$(CONFIG_RISCV_64) += riscv64/
>  obj-y += sbi.o
>  obj-y += setup.o
> diff --git a/xen/arch/riscv/include/asm/flushtlb.h b/xen/arch/riscv/include/asm/flushtlb.h
> index f4a735fd6c..43214f5e95 100644
> --- a/xen/arch/riscv/include/asm/flushtlb.h
> +++ b/xen/arch/riscv/include/asm/flushtlb.h
> @@ -5,12 +5,21 @@
>  #include <xen/bug.h>
>  #include <xen/cpumask.h>
>  
> +#include <asm/sbi.h>
> +
>  /* Flush TLB of local processor for address va. */
>  static inline void flush_tlb_one_local(vaddr_t va)
>  {
>      asm volatile ( "sfence.vma %0" :: "r" (va) : "memory" );
>  }
>  
> +/* Flush a range of VA's hypervisor mappings from the TLB of all processors. */
> +static inline void flush_tlb_range_va(vaddr_t va, size_t size)
> +{
> +    BUG_ON(!sbi_has_rfence());
> +    sbi_remote_sfence_vma(NULL, va, size);
> +}
> +
>  /*
>   * Filter the given set of CPUs, removing those that definitely flushed their
>   * TLB since @page_timestamp.
> diff --git a/xen/arch/riscv/include/asm/mm.h b/xen/arch/riscv/include/asm/mm.h
> index a0bdc2bc3a..ce1557bb27 100644
> --- a/xen/arch/riscv/include/asm/mm.h
> +++ b/xen/arch/riscv/include/asm/mm.h
> @@ -42,6 +42,8 @@ static inline void *maddr_to_virt(paddr_t ma)
>  #define virt_to_mfn(va)     __virt_to_mfn(va)
>  #define mfn_to_virt(mfn)    __mfn_to_virt(mfn)
>  
> +#define mfn_from_pte(pte) maddr_to_mfn(pte_to_paddr(pte))
> +
>  struct page_info
>  {
>      /* Each frame can be threaded onto a doubly-linked list. */
> diff --git a/xen/arch/riscv/include/asm/page.h b/xen/arch/riscv/include/asm/page.h
> index 55916eaa92..9b7d4fd597 100644
> --- a/xen/arch/riscv/include/asm/page.h
> +++ b/xen/arch/riscv/include/asm/page.h
> @@ -21,6 +21,11 @@
>  #define XEN_PT_LEVEL_MAP_MASK(lvl)  (~(XEN_PT_LEVEL_SIZE(lvl) - 1))
>  #define XEN_PT_LEVEL_MASK(lvl)      (VPN_MASK << XEN_PT_LEVEL_SHIFT(lvl))
>  
> +/*
> + * PTE format:
> + * | XLEN-1  10 | 9             8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0
> + *       PFN      reserved for SW   D   A   G   U   X   W   R   V
> + */
>  #define PTE_VALID                   BIT(0, UL)
>  #define PTE_READABLE                BIT(1, UL)
>  #define PTE_WRITABLE                BIT(2, UL)
> @@ -34,15 +39,51 @@
>  #define PTE_LEAF_DEFAULT            (PTE_VALID | PTE_READABLE | PTE_WRITABLE)
>  #define PTE_TABLE                   (PTE_VALID)
>  
> +#define PAGE_HYPERVISOR_RO          (PTE_VALID | PTE_READABLE)
>  #define PAGE_HYPERVISOR_RW          (PTE_VALID | PTE_READABLE | PTE_WRITABLE)
> +#define PAGE_HYPERVISOR_RX          (PTE_VALID | PTE_READABLE | PTE_EXECUTABLE)
>  
>  #define PAGE_HYPERVISOR             PAGE_HYPERVISOR_RW
>  
> +/*
> + * The PTE format does not contain the following bits within itself;
> + * they are created artificially to inform the Xen page table
> + * handling algorithm. These bits should not be explicitly written
> + * to the PTE entry.
> + */
> +#define PTE_SMALL       BIT(10, UL)
> +#define PTE_POPULATE    BIT(11, UL)
> +
> +#define PTE_XWV_BITS    (PTE_WRITABLE | PTE_EXECUTABLE | PTE_VALID)
> +#define PTE_XWV_MASK(x) ((x) & PTE_XWV_BITS)
> +#define PTE_RWX_MASK(x) ((x) & (PTE_READABLE | PTE_WRITABLE | PTE_EXECUTABLE))

I think I commented on *_MASK macros before: They are conventionally
constants (see e.g. PAGETABLE_ORDER_MASK that you have further down),
not operations on an input. It's not really clear to me what the
"mask" in this name is meant to signify as to what the macros are
doing. I seem to vaguely recall that you indicated you'd drop all
such helpers, in favor of using respective constants directly at use
sites.

As a less significant (because of being a matter of personal taste to
a fair degree) aspect: XWV is a pretty random sequence of characters.
I for one wouldn't remember what order they need to be used in, and
hence would always need to look this up.

Taken together, what about

#define PTE_LEAF_MASK   (PTE_WRITABLE | PTE_EXECUTABLE | PTE_VALID)
#define PTE_ACCESS_MASK (PTE_READABLE | PTE_WRITABLE | PTE_EXECUTABLE)

?
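
To illustrate the suggestion, here is a minimal standalone sketch (not part of the patch) of constant masks applied directly at use sites. The bit positions follow the PTE layout quoted above; the helpers flags_are_wx() and pte_set_access() are hypothetical names used only for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit positions follow the PTE layout quoted in the patch (V R W X = bits 0..3). */
#define PTE_VALID       (1UL << 0)
#define PTE_READABLE    (1UL << 1)
#define PTE_WRITABLE    (1UL << 2)
#define PTE_EXECUTABLE  (1UL << 3)

/* Constant masks, as proposed in the review: no macro argument. */
#define PTE_LEAF_MASK   (PTE_WRITABLE | PTE_EXECUTABLE | PTE_VALID)
#define PTE_ACCESS_MASK (PTE_READABLE | PTE_WRITABLE | PTE_EXECUTABLE)

/* Hypothetical helper: true if flags ask for a valid, writable and executable mapping. */
static bool flags_are_wx(unsigned long flags)
{
    return (flags & PTE_LEAF_MASK) == PTE_LEAF_MASK;
}

/* Hypothetical helper: replace the R/W/X bits of pte with those from flags. */
static unsigned long pte_set_access(unsigned long pte, unsigned long flags)
{
    pte &= ~PTE_ACCESS_MASK;          /* drop the old permissions */
    pte |= flags & PTE_ACCESS_MASK;   /* install the requested ones */
    return pte;
}
```

With constants, the masking operation stays visible at the call site, so a reader does not have to look up what a function-like *_MASK() macro does.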

> @@ -68,6 +109,37 @@ static inline bool pte_is_valid(pte_t p)
>      return p.pte & PTE_VALID;
>  }
>  
> +/*
> + * From the RISC-V spec:
> + *    Table 4.5 summarizes the encoding of the permission bits.
> + *      X W R Meaning
> + *      0 0 0 Pointer to next level of page table.
> + *      0 0 1 Read-only page.
> + *      0 1 0 Reserved for future use.
> + *      0 1 1 Read-write page.
> + *      1 0 0 Execute-only page.
> + *      1 0 1 Read-execute page.
> + *      1 1 0 Reserved for future use.
> + *      1 1 1 Read-write-execute page.
> + */
> +inline bool pte_is_table(const pte_t p)
> +{
> +    return ((p.pte & (PTE_VALID |
> +                      PTE_READABLE |
> +                      PTE_WRITABLE |
> +                      PTE_EXECUTABLE)) == PTE_VALID);
> +}
> +
> +static inline bool pte_is_mapping(const pte_t p)
> +{
> +    /* W = 1 || (X=1 && W=1) -> Reserved for future use */
> +    ASSERT((PTE_RWX_MASK(p.pte) != PTE_WRITABLE) ||
> +           (PTE_RWX_MASK(p.pte) != (PTE_WRITABLE | PTE_EXECUTABLE)));

I'm afraid I'm pretty unhappy with comments not matching the commented
code: The comment mentions only set bits, but not clear ones. Further
you're missing a check of the V bit - with that clear, the other bits
can be set whichever way. Taken together (and the spec also says it
this way): If V=1 and W=1 then R also needs to be 1.

Also - isn't this check equally relevant in pte_is_table()?
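
A minimal standalone sketch of the check described above, assuming the rule "if V=1 and W=1 then R must also be 1". The helper pte_is_reserved() is a hypothetical name, plain assert() stands in for Xen's ASSERT(), and the body of pte_is_mapping() is an assumption, since the quoted hunk does not show it in full.

```c
#include <assert.h>
#include <stdbool.h>

#define PTE_VALID       (1UL << 0)
#define PTE_READABLE    (1UL << 1)
#define PTE_WRITABLE    (1UL << 2)
#define PTE_EXECUTABLE  (1UL << 3)
#define PTE_ACCESS_MASK (PTE_READABLE | PTE_WRITABLE | PTE_EXECUTABLE)

/*
 * Hypothetical helper: a valid PTE with W=1 and R=0 uses a reserved
 * encoding, whatever X is. With V=0 the other bits carry no meaning.
 */
static bool pte_is_reserved(unsigned long pte)
{
    return (pte & PTE_VALID) && (pte & PTE_WRITABLE) && !(pte & PTE_READABLE);
}

/* V=1 with R=W=X=0: pointer to the next level of page table. */
static bool pte_is_table(unsigned long pte)
{
    assert(!pte_is_reserved(pte));
    return (pte & (PTE_VALID | PTE_ACCESS_MASK)) == PTE_VALID;
}

/* V=1 with at least one of R/W/X set: leaf mapping (assumed semantics). */
static bool pte_is_mapping(unsigned long pte)
{
    assert(!pte_is_reserved(pte));
    return (pte & PTE_VALID) && (pte & PTE_ACCESS_MASK);
}
```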

> --- a/xen/arch/riscv/include/asm/riscv_encoding.h
> +++ b/xen/arch/riscv/include/asm/riscv_encoding.h
> @@ -164,6 +164,7 @@
>  #define SSTATUS_SD			SSTATUS64_SD
>  #define SATP_MODE			SATP64_MODE
>  #define SATP_MODE_SHIFT			SATP64_MODE_SHIFT
> +#define SATP_PPN_MASK			_UL(0x00000FFFFFFFFFFF)

Why not SATP64_PPN on the rhs? And why no similar #define in the #else
block that follows, using SATP32_PPN?

> --- /dev/null
> +++ b/xen/arch/riscv/pt.c
> @@ -0,0 +1,423 @@
> +#include <xen/bug.h>
> +#include <xen/domain_page.h>
> +#include <xen/errno.h>
> +#include <xen/lib.h>
> +#include <xen/mm.h>
> +#include <xen/mm-frame.h>

I don't think you need this when you already have xen/mm.h.

> +#include <xen/pfn.h>
> +#include <xen/pmap.h>
> +#include <xen/spinlock.h>
> +
> +#include <asm/flushtlb.h>
> +#include <asm/page.h>
> +
> +static inline const mfn_t get_root_page(void)
> +{
> +    paddr_t root_maddr = (csr_read(CSR_SATP) & SATP_PPN_MASK) << PAGE_SHIFT;

Won't this lose bits in RV32 mode? IOW wouldn't you better avoid open-
coding pfn_to_paddr() here?
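
A minimal standalone sketch of the truncation in question, assuming a 32-bit register value, a 22-bit RV32 PPN field, and a 64-bit paddr_t. The function names are hypothetical; the second variant mirrors what a pfn_to_paddr()-style helper is assumed to do, namely widen before shifting.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12

/* RV32 satp: the PPN occupies bits 21:0. */
#define SATP32_PPN 0x003FFFFFU

/* Assumption for this sketch: physical addresses are held in 64 bits. */
typedef uint64_t paddr_t;

/* Open-coded shift: performed in 32 bits, so the top PPN bits fall off. */
static paddr_t root_maddr_truncating(uint32_t satp)
{
    return (satp & SATP32_PPN) << PAGE_SHIFT;
}

/* Widen to paddr_t first, then shift: all 22 PPN bits survive. */
static paddr_t root_maddr_widened(uint32_t satp)
{
    return (paddr_t)(satp & SATP32_PPN) << PAGE_SHIFT;
}
```

A 22-bit PPN shifted left by 12 needs 34 bits, which is why the 32-bit shift cannot represent every root table address.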

> +    return maddr_to_mfn(root_maddr);
> +}
> +
> +/* Sanity check of the entry. */
> +static bool pt_check_entry(pte_t entry, mfn_t mfn, unsigned int flags)
> +{
> +    /*
> +     * See the comment about the possible combination of (mfn, flags) in
> +     * the comment above pt_update().
> +     */
> +
> +    /* Sanity check when modifying an entry. */
> +    if ( (flags & PTE_VALID) && mfn_eq(mfn, INVALID_MFN) )
> +    {
> +        /* We don't allow modifying an invalid entry. */
> +        if ( !pte_is_valid(entry) )
> +        {
> +            dprintk(XENLOG_ERR, "Modifying invalid entry is not allowed\n");
> +            return false;
> +        }
> +
> +        /* We don't allow modifying a table entry */
> +        if ( pte_is_table(entry) )
> +        {
> +            dprintk(XENLOG_ERR, "Modifying a table entry is not allowed\n");
> +            return false;
> +        }
> +    }
> +    /* Sanity check when inserting a mapping */
> +    else if ( flags & PTE_VALID )
> +    {
> +        /*
> +         * We don't allow replacing any valid entry.
> +         *
> +         * Note that the function pt_update() relies on this
> +         * assumption and will skip the TLB flush (when Svvptc
> +         * extension will be ratified). The function will need
> +         * to be updated if the check is relaxed.
> +         */
> +        if ( pte_is_valid(entry) )
> +        {
> +            if ( pte_is_mapping(entry) )
> +                dprintk(XENLOG_ERR, "Changing MFN for a valid entry is not allowed (%#"PRI_mfn" -> %#"PRI_mfn")\n",

As a general suggestion: Try to keep such log messages short, by omitting
parts not needed for understanding of what's meant. Here e.g. "Changing
MFN for valid PTE not allowed (...)\n".

> +                       mfn_x(mfn_from_pte(entry)), mfn_x(mfn));
> +            else
> +                dprintk(XENLOG_ERR, "Trying to replace a table with a mapping\n");

Similarly this would imply you can omit both "a" here.

> +static int pt_update_entry(mfn_t root, unsigned long virt,
> +                           mfn_t mfn, unsigned int target,
> +                           unsigned int flags)
> +{
> +    int rc;
> +    unsigned int level = HYP_PT_ROOT_LEVEL;
> +    pte_t *table;
> +    /*
> +     * The intermediate page table shouldn't be allocated when MFN isn't
> +     * valid and we are not populating page table.
> +     * This means we either modify permissions or remove an entry, or
> +     * inserting brand new entry.
> +     *
> +     * See the comment above pt_update() for an additional explanation about
> +     * combinations of (mfn, flags).
> +    */
> +    bool alloc_tbl = !mfn_eq(mfn, INVALID_MFN) || (flags & PTE_POPULATE);
> +    pte_t pte, *entry;
> +
> +    /* convenience aliases */
> +    DECLARE_OFFSETS(offsets, virt);
> +
> +    table = map_table(root);
> +    for ( ; level > target; level-- )
> +    {
> +        rc = pt_next_level(alloc_tbl, &table, offsets[level]);
> +        if ( rc == XEN_TABLE_MAP_FAILED )
> +        {
> +            rc = 0;
> +
> +            /*
> +             * We are here because pt_next_level has failed to map
> +             * the intermediate page table (e.g the table does not exist
> +             * and the pt shouldn't be allocated). It is a valid case when
> +             * removing a mapping as it may not exist in the page table.
> +             * In this case, just ignore it.
> +             */
> +            if ( flags & PTE_VALID )
> +            {
> +                printk("%s: Unable to map level %u\n", __func__, level);
> +                rc = -ENOENT;
> +            }

Both comment and error code assume the !populate case. What about the case
where the allocation failed? That's "couldn't be allocated" and would better
return back -ENOMEM (as create_table() correctly returns in that case).
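
One way to keep the two failure reasons apart, as a minimal standalone sketch. The enum and walk_failure_to_errno() are hypothetical; the patch's XEN_TABLE_* values and the pt_next_level() interface are not reproduced here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical outcomes of one page table walk step. */
enum walk_result {
    WALK_OK,            /* next level mapped */
    WALK_NOT_PRESENT,   /* table missing and allocation not requested */
    WALK_ALLOC_FAILED,  /* allocation requested but it failed */
};

/* Map a walk outcome to an error code; 'removing' means PTE_VALID is clear. */
static int walk_failure_to_errno(enum walk_result res, bool removing)
{
    switch ( res )
    {
    case WALK_ALLOC_FAILED:
        return -ENOMEM;                 /* "couldn't be allocated" */
    case WALK_NOT_PRESENT:
        return removing ? 0 : -ENOENT;  /* nothing to remove is not an error */
    default:
        return 0;
    }
}
```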

> +            goto out;
> +        }
> +
> +        if ( rc != XEN_TABLE_NORMAL )
> +            break;
> +    }
> +
> +    if ( level != target )
> +    {
> +        printk("%s: Shattering superpage is not supported\n", __func__);
> +        rc = -EOPNOTSUPP;
> +        goto out;
> +    }
> +
> +    entry = table + offsets[level];
> +
> +    rc = -EINVAL;
> +    if ( !pt_check_entry(*entry, mfn, flags) )
> +        goto out;
> +
> +    /* We are removing the page */
> +    if ( !(flags & PTE_VALID) )
> +        /*
> +         * there is also a check in pt_check_entry() which check that
> +         * mfn=INVALID_MFN
> +         */
> +        pte.pte = 0;
> +    else
> +    {
> +        /* We are inserting a mapping => Create new pte. */
> +        if ( !mfn_eq(mfn, INVALID_MFN) )
> +            pte = pte_from_mfn(mfn, PTE_VALID);
> +        else /* We are updating the permission => Copy the current pte. */
> +            pte = *entry;
> +
> +        /* update permission according to the flags */
> +        pte.pte &= ~PTE_RWX_MASK(flags);

This really is needed only on the "else" branch above.

> +static int pt_mapping_level(unsigned long vfn, mfn_t mfn, unsigned long nr,
> +                            unsigned int flags)
> +{
> +    unsigned int level = 0;
> +    unsigned long mask;
> +    unsigned int i;
> +
> +    /* Use blocking mapping unless the caller requests 4K mapping */
> +    if ( unlikely(flags & PTE_SMALL) )
> +        return level;

Maybe "block mapping" in the comment? "Blocking" typically has quite
different a meaning. I'm uncertain about that terminology anyway, as
the spec doesn't use it.

> +/*
> + * If `mfn` equals `INVALID_MFN`, it indicates that the following page table
> + * update operation might be related to either:
> + *   - populating the table (PTE_POPULATE will be set additionaly),
> + *   - destroying a mapping (PTE_VALID = 0 and mfn = INVALID_MFN),
> + *   - modifying an existing mapping ( PTE_VALID = 1 and mfn == INVALID_MFN ).

No need to repeat the INVALID_MFN part that is already stated at the
start of the paragraph. It would also be nice if you consistently
omitted the blanks immediately inside parentheses in comments.

> + * If `mfn` != INVALID_MFN and flags has PTE_VALID bit set then it means that
> + * inserting will be done.
> + */
> +static int pt_update(unsigned long virt,
> +                     mfn_t mfn,
> +                     unsigned long nr_mfns,
> +                     unsigned int flags)
> +{
> +    int rc = 0;
> +    unsigned long vfn = PFN_DOWN(virt);
> +    unsigned long left = nr_mfns;
> +
> +    const mfn_t root = get_root_page();
> +
> +    /*
> +     * It is bad idea to have mapping both writeable and
> +     * executable.
> +     * When modifying/creating mapping (i.e PTE_VALID is set),
> +     * prevent any update if this happen.
> +     */
> +    if ( PTE_XWV_MASK(flags) == PTE_XWV_BITS )
> +    {
> +        printk("Mappings should not be both Writeable and Executable.\n");

I'm pretty sure I asked before that you omit full stops from log messages.

> +int map_pages_to_xen(unsigned long virt,
> +                     mfn_t mfn,
> +                     unsigned long nr_mfns,
> +                     unsigned int flags)
> +{
> +    /*
> +     * Ensure that flags has PTE_VALID bit as map_pages_to_xen() is supposed
> +     * to create a mapping.
> +     *
> +     * Ensure that we have a valid MFN before proceeding.
> +     *
> +     * If the MFN is invalid, pt_update() might misinterpret the operation,
> +     * treating it as either a population, a mapping destruction,
> +     * or a mapping modification.
> +     */
> +    ASSERT(!mfn_eq(mfn, INVALID_MFN) || (flags & PTE_VALID));

Judging from the comment, do you mean && instead of || ?
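
The difference between the two operators, as a minimal standalone sketch. The two booleans stand for !mfn_eq(mfn, INVALID_MFN) and (flags & PTE_VALID); the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Precondition as written in the patch: either condition alone is enough. */
static bool precondition_or(bool mfn_is_valid, bool has_pte_valid)
{
    return mfn_is_valid || has_pte_valid;
}

/* Precondition the comment describes: both conditions must hold. */
static bool precondition_and(bool mfn_is_valid, bool has_pte_valid)
{
    return mfn_is_valid && has_pte_valid;
}
```

With ||, a call passing INVALID_MFN together with PTE_VALID gets through the assertion, and that is exactly the combination pt_update() treats as a permission modification.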

Jan



* Re: [PATCH v6 8/9] xen/riscv: page table handling
  2024-09-10 12:19   ` Jan Beulich
@ 2024-09-11 15:09     ` oleksii.kurochko
  0 siblings, 0 replies; 42+ messages in thread
From: oleksii.kurochko @ 2024-09-11 15:09 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
	Julien Grall, Stefano Stabellini, xen-devel

> > +/*
> > + * The PTE format does not contain the following bits within
> > itself;
> > + * they are created artificially to inform the Xen page table
> > + * handling algorithm. These bits should not be explicitly written
> > + * to the PTE entry.
> > + */
> > +#define PTE_SMALL       BIT(10, UL)
> > +#define PTE_POPULATE    BIT(11, UL)
> > +
> > +#define PTE_XWV_BITS    (PTE_WRITABLE | PTE_EXECUTABLE |
> > PTE_VALID)
> > +#define PTE_XWV_MASK(x) ((x) & PTE_XWV_BITS)
> > +#define PTE_RWX_MASK(x) ((x) & (PTE_READABLE | PTE_WRITABLE |
> > PTE_EXECUTABLE))
> 
> I think I commented on *_MASK macros before: They are conventionally
> constants (see e.g. PAGETABLE_ORDER_MASK that you have further down),
> not operations on an input. It's not really clear to me what the
> "mask" in this name is meant to signify as to what the macros are
> doing. I seem to vaguely recall that you indicated you'd drop all
> such helpers, in favor of using respective constants directly at use
> sites.
Regarding *_MASK, I wrote about PTE_{R,W,X}_MASK (which were dropped
because they were used only once), and by MASK here I mean that only
some bits are going to be taken. For example, PTE_XWV_MASK() means that
only the eXecute, Write and Valid bits will be taken. Probably EXTR
(extract) would be better to use instead of MASK.

> 
> As a less significant (because of being a matter of personal taste to
> a fair degree) aspect: XWV is a pretty random sequence of characters.
> I for one wouldn't remember what order they need to be used in, and
> hence would always need to look this up.
I used those letters as they are used by the RISC-V spec.

> 
> Taken together, what about
> 
> #define PTE_LEAF_MASK   (PTE_WRITABLE | PTE_EXECUTABLE | PTE_VALID)
> #define PTE_ACCESS_MASK (PTE_READABLE | PTE_WRITABLE |
> PTE_EXECUTABLE)
> 
> ?
Looks good to me. I will use them. Thanks for the naming and
clarification.

> 
> > @@ -68,6 +109,37 @@ static inline bool pte_is_valid(pte_t p)
> >      return p.pte & PTE_VALID;
> >  }
> >  
> > +/*
> > + * From the RISC-V spec:
> > + *    Table 4.5 summarizes the encoding of the permission bits.
> > + *      X W R Meaning
> > + *      0 0 0 Pointer to next level of page table.
> > + *      0 0 1 Read-only page.
> > + *      0 1 0 Reserved for future use.
> > + *      0 1 1 Read-write page.
> > + *      1 0 0 Execute-only page.
> > + *      1 0 1 Read-execute page.
> > + *      1 1 0 Reserved for future use.
> > + *      1 1 1 Read-write-execute page.
> > + */
> > +inline bool pte_is_table(const pte_t p)
> > +{
> > +    return ((p.pte & (PTE_VALID |
> > +                      PTE_READABLE |
> > +                      PTE_WRITABLE |
> > +                      PTE_EXECUTABLE)) == PTE_VALID);
> > +}
> > +
> > +static inline bool pte_is_mapping(const pte_t p)
> > +{
> > +    /* W = 1 || (X=1 && W=1) -> Reserved for future use */
> > +    ASSERT((PTE_RWX_MASK(p.pte) != PTE_WRITABLE) ||
> > +           (PTE_RWX_MASK(p.pte) != (PTE_WRITABLE |
> > PTE_EXECUTABLE)));
> 
> I'm afraid I'm pretty unhappy with comments not matching the
> commented
> code: The comment mentions only set bits, but not clear ones.
I assumed it would be clear that the other bits should be 0, taking
into account the table above, but I will update the comment to be more
precise.

>  Further
> you're missing a check of the V bit - with that clear, the other bits
> can be set whichever way. Taken together (and the spec also says it
> this way): If V=1 and W=1 then R also needs to be 1.
My intention was to check it the way it is presented in Table 4.5.
For example,
  0 1 0 Reserved for future use.
So I wanted to check that X=R=0 and W=1; I just confused myself because
ASSERT(p) internally checks !p. I will update the ASSERT() properly.

> 
> Also - isn't this check equally relevant in pte_is_table()?
Missed that, it should be in pte_is_table() too.

> 
> > --- a/xen/arch/riscv/include/asm/riscv_encoding.h
> > +++ b/xen/arch/riscv/include/asm/riscv_encoding.h
> > @@ -164,6 +164,7 @@
> >  #define SSTATUS_SD			SSTATUS64_SD
> >  #define SATP_MODE			SATP64_MODE
> >  #define SATP_MODE_SHIFT			SATP64_MODE_SHIFT
> > +#define SATP_PPN_MASK			_UL(0x00000FFFFFFFFFFF)
> 
> Why not SATP64_PPN on the rhs? And why no similar #define in the
> #else
> block that follows, using SATP32_PPN?
> 
> > --- /dev/null
> > +++ b/xen/arch/riscv/pt.c
> > @@ -0,0 +1,423 @@
> > +#include <xen/bug.h>
> > +#include <xen/domain_page.h>
> > +#include <xen/errno.h>
> > +#include <xen/lib.h>
> > +#include <xen/mm.h>
> > +#include <xen/mm-frame.h>
> 
> I don't think you need this when you already have xen/mm.h.
> 
> > +#include <xen/pfn.h>
> > +#include <xen/pmap.h>
> > +#include <xen/spinlock.h>
> > +
> > +#include <asm/flushtlb.h>
> > +#include <asm/page.h>
> > +
> > +static inline const mfn_t get_root_page(void)
> > +{
> > +    paddr_t root_maddr = (csr_read(CSR_SATP) & SATP_PPN_MASK) <<
> > PAGE_SHIFT;
> 
> Won't this lose bits in RV32 mode? IOW wouldn't you better avoid
> open-
> coding pfn_to_paddr() here?
Considering that the PPN in RV32 mode is 22 bits, it will lose bits.
Anyway, I agree that it would be better to use pfn_to_paddr().

> 
> > +static int pt_update_entry(mfn_t root, unsigned long virt,
> > +                           mfn_t mfn, unsigned int target,
> > +                           unsigned int flags)
> > +{
> > +    int rc;
> > +    unsigned int level = HYP_PT_ROOT_LEVEL;
> > +    pte_t *table;
> > +    /*
> > +     * The intermediate page table shouldn't be allocated when MFN
> > isn't
> > +     * valid and we are not populating page table.
> > +     * This means we either modify permissions or remove an entry,
> > or
> > +     * inserting brand new entry.
> > +     *
> > +     * See the comment above pt_update() for an additional
> > explanation about
> > +     * combinations of (mfn, flags).
> > +    */
> > +    bool alloc_tbl = !mfn_eq(mfn, INVALID_MFN) || (flags &
> > PTE_POPULATE);
> > +    pte_t pte, *entry;
> > +
> > +    /* convenience aliases */
> > +    DECLARE_OFFSETS(offsets, virt);
> > +
> > +    table = map_table(root);
> > +    for ( ; level > target; level-- )
> > +    {
> > +        rc = pt_next_level(alloc_tbl, &table, offsets[level]);
> > +        if ( rc == XEN_TABLE_MAP_FAILED )
> > +        {
> > +            rc = 0;
> > +
> > +            /*
> > +             * We are here because pt_next_level has failed to map
> > +             * the intermediate page table (e.g the table does not
> > exist
> > +             * and the pt shouldn't be allocated). It is a valid
> > case when
> > +             * removing a mapping as it may not exist in the page
> > table.
> > +             * In this case, just ignore it.
> > +             */
> > +            if ( flags & PTE_VALID )
> > +            {
> > +                printk("%s: Unable to map level %u\n", __func__,
> > level);
> > +                rc = -ENOENT;
> > +            }
> 
> Both comment and error code assume the !populate case. What about the
> case
> where the allocation failed? That's "couldn't be allocated" and would
> better
> return back -ENOMEM (as create_table() correctly returns in that
> case).
The condition should be updated here:
            if ( flags & (PTE_VALID|PTE_POPULATE) )
            {
                ...

> 
> > +static int pt_mapping_level(unsigned long vfn, mfn_t mfn, unsigned
> > long nr,
> > +                            unsigned int flags)
> > +{
> > +    unsigned int level = 0;
> > +    unsigned long mask;
> > +    unsigned int i;
> > +
> > +    /* Use blocking mapping unless the caller requests 4K mapping
> > */
> > +    if ( unlikely(flags & PTE_SMALL) )
> > +        return level;
> 
> Maybe "block mapping" in the comment? "Blocking" typically has quite
> different a meaning. I'm uncertain about that terminology anyway, as
> the spec doesn't use it.
You are right that the spec doesn't define what to call a mapping bigger
than 4K, so I just re-used the terminology from Arm here. Probably it
would be better just to reword:
/* Use a larger mapping than 4K unless the caller specifically requests
a 4K mapping */

> 
> > + * If `mfn` != INVALID_MFN and flags has PTE_VALID bit set then it
> > means that
> > + * inserting will be done.
> > + */
> > +static int pt_update(unsigned long virt,
> > +                     mfn_t mfn,
> > +                     unsigned long nr_mfns,
> > +                     unsigned int flags)
> > +{
> > +    int rc = 0;
> > +    unsigned long vfn = PFN_DOWN(virt);
> > +    unsigned long left = nr_mfns;
> > +
> > +    const mfn_t root = get_root_page();
> > +
> > +    /*
> > +     * It is bad idea to have mapping both writeable and
> > +     * executable.
> > +     * When modifying/creating mapping (i.e PTE_VALID is set),
> > +     * prevent any update if this happen.
> > +     */
> > +    if ( PTE_XWV_MASK(flags) == PTE_XWV_BITS )
> > +    {
> > +        printk("Mappings should not be both Writeable and
> > Executable.\n");
> 
> I'm pretty sure I asked before that you omit full stops from log
> messages.
Yes, you asked, and I think even about this very place. I just
overlooked it, as there were a lot of comments on the previous patch
version. Sorry for that.

> 
> > +int map_pages_to_xen(unsigned long virt,
> > +                     mfn_t mfn,
> > +                     unsigned long nr_mfns,
> > +                     unsigned int flags)
> > +{
> > +    /*
> > +     * Ensure that flags has PTE_VALID bit as map_pages_to_xen()
> > is supposed
> > +     * to create a mapping.
> > +     *
> > +     * Ensure that we have a valid MFN before proceeding.
> > +     *
> > +     * If the MFN is invalid, pt_update() might misinterpret the
> > operation,
> > +     * treating it as either a population, a mapping destruction,
> > +     * or a mapping modification.
> > +     */
> > +    ASSERT(!mfn_eq(mfn, INVALID_MFN) || (flags & PTE_VALID));
> 
> Judging from the comment, do you mean && instead of || ?
Yes, it should be &&.

Thanks.

~ Oleksii





* [PATCH v6 9/9] xen/riscv: introduce early_fdt_map()
  2024-09-02 17:01 [PATCH v6 0/9] RISCV device tree mapping Oleksii Kurochko
                   ` (7 preceding siblings ...)
  2024-09-02 17:01 ` [PATCH v6 8/9] xen/riscv: page table handling Oleksii Kurochko
@ 2024-09-02 17:01 ` Oleksii Kurochko
  8 siblings, 0 replies; 42+ messages in thread
From: Oleksii Kurochko @ 2024-09-02 17:01 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksii Kurochko, Alistair Francis, Bob Eshleman, Connor Davis,
	Andrew Cooper, Jan Beulich, Julien Grall, Stefano Stabellini

Introduce a function to map the FDT into Xen.

Also, initialize device_tree_flattened using early_fdt_map().

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
---
Changes in V6:
 - Nothing changed. Only rebase.
---
Changes in V5:
 - drop usage of PTE_BLOCK for flag argument of map_pages_to_xen() in early_fdt_map()
   as block mapping is now default behaviour. Also PTE_BLOCK was dropped in the patch
   "xen/riscv: page table handling".
---
Changes in V4:
 - s/_PAGE_BLOCK/PTE_BLOCK
 - Add Acked-by: Jan Beulich <jbeulich@suse.com>
 - unwrap two lines in panic() in the case when device_tree_flattened is NULL
   so grep-ing for any part of the message line will always produce a hit.
 - slightly update the commit message.
---
Changes in V3:
 - Code style fixes
 - s/SZ_2M/MB(2)
 - fix condition to check if early_fdt_map() in setup.c return NULL or not.
---
Changes in V2:
 - rework early_fdt_map to use map_pages_to_xen()
 - move call early_fdt_map() to C code after MMU is enabled.
---
 xen/arch/riscv/include/asm/mm.h |  2 ++
 xen/arch/riscv/mm.c             | 55 +++++++++++++++++++++++++++++++++
 xen/arch/riscv/setup.c          |  7 +++++
 3 files changed, 64 insertions(+)

diff --git a/xen/arch/riscv/include/asm/mm.h b/xen/arch/riscv/include/asm/mm.h
index ce1557bb27..4b7b00b850 100644
--- a/xen/arch/riscv/include/asm/mm.h
+++ b/xen/arch/riscv/include/asm/mm.h
@@ -259,4 +259,6 @@ static inline unsigned int arch_get_dma_bitsize(void)
 
 void setup_fixmap_mappings(void);
 
+void *early_fdt_map(paddr_t fdt_paddr);
+
 #endif /* _ASM_RISCV_MM_H */
diff --git a/xen/arch/riscv/mm.c b/xen/arch/riscv/mm.c
index e8430def14..4a628aef83 100644
--- a/xen/arch/riscv/mm.c
+++ b/xen/arch/riscv/mm.c
@@ -1,13 +1,16 @@
 /* SPDX-License-Identifier: GPL-2.0-only */
 
+#include <xen/bootfdt.h>
 #include <xen/bug.h>
 #include <xen/compiler.h>
 #include <xen/init.h>
 #include <xen/kernel.h>
+#include <xen/libfdt/libfdt.h>
 #include <xen/macros.h>
 #include <xen/mm.h>
 #include <xen/pfn.h>
 #include <xen/sections.h>
+#include <xen/sizes.h>
 
 #include <asm/early_printk.h>
 #include <asm/csr.h>
@@ -369,3 +372,55 @@ int destroy_xen_mappings(unsigned long s, unsigned long e)
     BUG_ON("unimplemented");
     return -1;
 }
+
+void * __init early_fdt_map(paddr_t fdt_paddr)
+{
+    /* We are using 2MB superpage for mapping the FDT */
+    paddr_t base_paddr = fdt_paddr & XEN_PT_LEVEL_MAP_MASK(1);
+    paddr_t offset;
+    void *fdt_virt;
+    uint32_t size;
+    int rc;
+
+    /*
+     * Check whether the physical FDT address is set and meets the minimum
+     * alignment requirement. Since we are relying on MIN_FDT_ALIGN to be at
+     * least 8 bytes so that we always access the magic and size fields
+     * of the FDT header after mapping the first chunk, double check if
+     * that is indeed the case.
+     */
+    BUILD_BUG_ON(MIN_FDT_ALIGN < 8);
+    if ( !fdt_paddr || fdt_paddr % MIN_FDT_ALIGN )
+        return NULL;
+
+    /* The FDT is mapped using 2MB superpage */
+    BUILD_BUG_ON(BOOT_FDT_VIRT_START % MB(2));
+
+    rc = map_pages_to_xen(BOOT_FDT_VIRT_START, maddr_to_mfn(base_paddr),
+                          MB(2) >> PAGE_SHIFT,
+                          PAGE_HYPERVISOR_RO);
+    if ( rc )
+        panic("Unable to map the device-tree.\n");
+
+    offset = fdt_paddr % XEN_PT_LEVEL_SIZE(1);
+    fdt_virt = (void *)BOOT_FDT_VIRT_START + offset;
+
+    if ( fdt_magic(fdt_virt) != FDT_MAGIC )
+        return NULL;
+
+    size = fdt_totalsize(fdt_virt);
+    if ( size > BOOT_FDT_VIRT_SIZE )
+        return NULL;
+
+    if ( (offset + size) > MB(2) )
+    {
+        rc = map_pages_to_xen(BOOT_FDT_VIRT_START + MB(2),
+                              maddr_to_mfn(base_paddr + MB(2)),
+                              MB(2) >> PAGE_SHIFT,
+                              PAGE_HYPERVISOR_RO);
+        if ( rc )
+            panic("Unable to map the device-tree\n");
+    }
+
+    return fdt_virt;
+}
diff --git a/xen/arch/riscv/setup.c b/xen/arch/riscv/setup.c
index 164b9cfdd1..a671a5442b 100644
--- a/xen/arch/riscv/setup.c
+++ b/xen/arch/riscv/setup.c
@@ -2,6 +2,7 @@
 
 #include <xen/bug.h>
 #include <xen/compile.h>
+#include <xen/device_tree.h>
 #include <xen/init.h>
 #include <xen/mm.h>
 
@@ -56,6 +57,12 @@ void __init noreturn start_xen(unsigned long bootcpu_id,
 
     setup_fixmap_mappings();
 
+    device_tree_flattened = early_fdt_map(dtb_addr);
+    if ( !device_tree_flattened )
+        panic("Invalid device tree blob at physical address %#lx. The DTB must be 8-byte aligned and must not exceed %lld bytes in size.\n\n"
+              "Please check your bootloader.\n",
+              dtb_addr, BOOT_FDT_VIRT_SIZE);
+
     printk("All set up\n");
 
     for ( ;; )
-- 
2.46.0
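
For readers following the address arithmetic in early_fdt_map() above, here is a minimal standalone sketch, assuming a 2MB level-1 superpage as stated in the patch comments. The helper names and the local MB() macro are hypothetical stand-ins for the XEN_PT_LEVEL_* macros used in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption for this sketch: 64-bit physical addresses, 2MB superpages. */
typedef uint64_t paddr_t;
#define MB(n) ((paddr_t)(n) << 20)

/* Physical base of the 2MB superpage that contains the FDT. */
static paddr_t fdt_superpage_base(paddr_t fdt_paddr)
{
    return fdt_paddr & ~(MB(2) - 1);
}

/* Offset of the FDT inside that superpage; added to the virtual base. */
static paddr_t fdt_superpage_offset(paddr_t fdt_paddr)
{
    return fdt_paddr % MB(2);
}

/* True if the blob crosses into the next superpage, which must then be mapped too. */
static bool fdt_needs_second_superpage(paddr_t fdt_paddr, uint32_t size)
{
    return fdt_superpage_offset(fdt_paddr) + size > MB(2);
}
```

The first superpage is mapped before the size is known because the header has to be readable to obtain the total size; only then can the second mapping be decided.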




end of thread, other threads:[~2024-09-13 12:51 UTC | newest]

Thread overview: 42+ messages
2024-09-02 17:01 [PATCH v6 0/9] RISCV device tree mapping Oleksii Kurochko
2024-09-02 17:01 ` [PATCH v6 1/9] xen/riscv: prevent recursion when ASSERT(), BUG*(), or panic() are called Oleksii Kurochko
2024-09-03 14:19   ` [PATCH] RISCV/shutdown: Implement machine_{halt,restart}() Andrew Cooper
2024-09-03 14:23     ` Andrew Cooper
2024-09-03 14:27       ` Jan Beulich
2024-09-03 14:26     ` Jan Beulich
2024-09-03 14:27       ` Andrew Cooper
2024-09-04 10:22     ` oleksii.kurochko
2024-09-10  9:42   ` [PATCH v6 1/9] xen/riscv: prevent recursion when ASSERT(), BUG*(), or panic() are called Jan Beulich
2024-09-10 13:55     ` oleksii.kurochko
2024-09-02 17:01 ` [PATCH v6 2/9] xen/riscv: use {read,write}{b,w,l,q}_cpu() to define {read,write}_atomic() Oleksii Kurochko
2024-09-03 14:21   ` Andrew Cooper
2024-09-04 10:27     ` oleksii.kurochko
2024-09-04 10:31       ` Andrew Cooper
2024-09-05 15:45         ` oleksii.kurochko
2024-09-02 17:01 ` [PATCH v6 3/9] xen/riscv: allow write_atomic() to work with non-scalar types Oleksii Kurochko
2024-09-10  9:53   ` Jan Beulich
2024-09-10 15:28     ` oleksii.kurochko
2024-09-10 16:05       ` Jan Beulich
2024-09-11 11:34         ` oleksii.kurochko
2024-09-11 11:49           ` Jan Beulich
2024-09-12 11:15             ` oleksii.kurochko
2024-09-12 11:41               ` oleksii.kurochko
2024-09-02 17:01 ` [PATCH v6 4/9] xen/riscv: set up fixmap mappings Oleksii Kurochko
2024-09-10 10:01   ` Jan Beulich
2024-09-10 15:55     ` oleksii.kurochko
2024-09-10 16:07       ` Jan Beulich
2024-09-02 17:01 ` [PATCH v6 5/9] xen/riscv: introduce asm/pmap.h header Oleksii Kurochko
2024-09-02 17:01 ` [PATCH v6 6/9] xen/riscv: introduce functionality to work with CPU info Oleksii Kurochko
2024-09-10 10:33   ` Jan Beulich
2024-09-11 12:05     ` oleksii.kurochko
2024-09-11 12:14       ` Jan Beulich
2024-09-12  9:27         ` oleksii.kurochko
2024-09-12  9:58           ` Jan Beulich
2024-09-12 16:02     ` oleksii.kurochko
2024-09-13 12:51       ` Jan Beulich
2024-09-02 17:01 ` [PATCH v6 7/9] xen/riscv: introduce and initialize SBI RFENCE extension Oleksii Kurochko
2024-09-10 11:32   ` Jan Beulich
2024-09-02 17:01 ` [PATCH v6 8/9] xen/riscv: page table handling Oleksii Kurochko
2024-09-10 12:19   ` Jan Beulich
2024-09-11 15:09     ` oleksii.kurochko
2024-09-02 17:01 ` [PATCH v6 9/9] xen/riscv: introduce early_fdt_map() Oleksii Kurochko
