* [MINI-OS PATCH 00/12] kexec: add kexec support to Mini-OS
@ 2025-03-21  9:24 Juergen Gross
  2025-03-21  9:24 ` [MINI-OS PATCH 01/12] kexec: add kexec framework Juergen Gross
                   ` (13 more replies)
  0 siblings, 14 replies; 37+ messages in thread
From: Juergen Gross @ 2025-03-21  9:24 UTC
  To: minios-devel, xen-devel; +Cc: samuel.thibault, Juergen Gross

Add basic kexec support to Mini-OS for running in x86 PVH mode.

With this series applied it is possible to activate another kernel
from within Mini-OS.

Right now no Xen-related teardown is done (so no reset of the grant
table, event channels, or PV devices). These should be added via kexec
callbacks, for which this series adds a framework.

This is a major building block for support of Xenstore-stubdom live
update (in fact I've tested the kexec path to work using the PVH
variant of Xenstore-stubdom).

Juergen Gross (12):
  add kexec framework
  Mini-OS: add final kexec stage
  mini-os: add elf.h
  mini-os: analyze new kernel for kexec
  mini-os: kexec: finalize parameter location and size
  mini-os: reserve memory below boundary
  mini-os: kexec: build parameters for new kernel
  mini-os: kexec: move used pages away for new kernel
  Mini-OS: mm: change set_readonly() to change_readonly()
  Mini-OS: kexec: switch read-only area to be writable again
  mini-os: kexec: add kexec callback functionality
  mini-os: kexec: do the final kexec step

 Config.mk                  |   1 +
 Makefile                   |   1 +
 arch/x86/kexec.c           | 273 +++++++++++++++++++++++++++++
 arch/x86/minios-x86.lds.S  |  16 ++
 arch/x86/mm.c              | 238 ++++++++++++++++++++------
 arch/x86/testbuild/all-no  |   1 +
 arch/x86/testbuild/all-yes |   2 +
 arch/x86/testbuild/kexec   |   4 +
 arch/x86/x86_hvm.S         |  46 +++++
 include/elf.h              | 340 +++++++++++++++++++++++++++++++++++++
 include/kexec.h            |  63 +++++++
 include/mm.h               |   8 +
 include/x86/os.h           |   5 +
 kexec.c                    | 253 +++++++++++++++++++++++++++
 mm.c                       |  89 +++++++++-
 15 files changed, 1289 insertions(+), 51 deletions(-)
 create mode 100644 arch/x86/kexec.c
 create mode 100644 arch/x86/testbuild/kexec
 create mode 100644 include/elf.h
 create mode 100644 include/kexec.h
 create mode 100644 kexec.c

-- 
2.43.0




* [MINI-OS PATCH 01/12] kexec: add kexec framework
  2025-03-21  9:24 [MINI-OS PATCH 00/12] kexec: add kexec support to Mini-OS Juergen Gross
@ 2025-03-21  9:24 ` Juergen Gross
  2025-06-14 16:40   ` Jason Andryuk
  2025-03-21  9:24 ` [MINI-OS PATCH 02/12] kexec: add final kexec stage Juergen Gross
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 37+ messages in thread
From: Juergen Gross @ 2025-03-21  9:24 UTC
  To: minios-devel, xen-devel; +Cc: samuel.thibault, Juergen Gross

Add a new config option CONFIG_KEXEC for support of kexec-ing into a
new mini-os kernel. Add a related kexec.c source and a kexec.h header.

For now allow CONFIG_KEXEC to be set only for the PVH variant of mini-os.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 Config.mk                  |  1 +
 Makefile                   |  1 +
 arch/x86/testbuild/all-no  |  1 +
 arch/x86/testbuild/all-yes |  2 ++
 arch/x86/testbuild/kexec   |  4 +++
 include/kexec.h            |  7 +++++
 kexec.c                    | 62 ++++++++++++++++++++++++++++++++++++++
 7 files changed, 78 insertions(+)
 create mode 100644 arch/x86/testbuild/kexec
 create mode 100644 include/kexec.h
 create mode 100644 kexec.c

diff --git a/Config.mk b/Config.mk
index e493533a..e2afb1b4 100644
--- a/Config.mk
+++ b/Config.mk
@@ -204,6 +204,7 @@ CONFIG-n += CONFIG_LIBXENGUEST
 CONFIG-n += CONFIG_LIBXENTOOLCORE
 CONFIG-n += CONFIG_LIBXENTOOLLOG
 CONFIG-n += CONFIG_LIBXENMANAGE
+CONFIG-n += CONFIG_KEXEC
 # Setting CONFIG_USE_XEN_CONSOLE copies all print output to the Xen emergency
 # console apart of standard dom0 handled console.
 CONFIG-n += CONFIG_USE_XEN_CONSOLE
diff --git a/Makefile b/Makefile
index d094858a..a64913ad 100644
--- a/Makefile
+++ b/Makefile
@@ -51,6 +51,7 @@ src-y += gntmap.c
 src-y += gnttab.c
 src-y += hypervisor.c
 src-y += kernel.c
+src-$(CONFIG_KEXEC) += kexec.c
 src-y += lock.c
 src-y += main.c
 src-y += mm.c
diff --git a/arch/x86/testbuild/all-no b/arch/x86/testbuild/all-no
index 5b3e99ed..b2ee5ce8 100644
--- a/arch/x86/testbuild/all-no
+++ b/arch/x86/testbuild/all-no
@@ -18,3 +18,4 @@ CONFIG_LIBXS = n
 CONFIG_LWIP = n
 CONFIG_BALLOON = n
 CONFIG_USE_XEN_CONSOLE = n
+CONFIG_KEXEC = n
diff --git a/arch/x86/testbuild/all-yes b/arch/x86/testbuild/all-yes
index 8ae489a4..99ba75dd 100644
--- a/arch/x86/testbuild/all-yes
+++ b/arch/x86/testbuild/all-yes
@@ -19,3 +19,5 @@ CONFIG_BALLOON = y
 CONFIG_USE_XEN_CONSOLE = y
 # The following are special: they need support from outside
 CONFIG_LWIP = n
+# KEXEC only without PARAVIRT
+CONFIG_KEXEC = n
diff --git a/arch/x86/testbuild/kexec b/arch/x86/testbuild/kexec
new file mode 100644
index 00000000..ea17b4d9
--- /dev/null
+++ b/arch/x86/testbuild/kexec
@@ -0,0 +1,4 @@
+CONFIG_PARAVIRT = n
+CONFIG_BALLOON = y
+CONFIG_USE_XEN_CONSOLE = y
+CONFIG_KEXEC = y
diff --git a/include/kexec.h b/include/kexec.h
new file mode 100644
index 00000000..6fd96774
--- /dev/null
+++ b/include/kexec.h
@@ -0,0 +1,7 @@
+#ifndef _KEXEC_H
+#define _KEXEC_H
+
+int kexec(void *kernel, unsigned long kernel_size,
+          const char *cmdline);
+
+#endif /* _KEXEC_H */
diff --git a/kexec.c b/kexec.c
new file mode 100644
index 00000000..53528169
--- /dev/null
+++ b/kexec.c
@@ -0,0 +1,62 @@
+/******************************************************************************
+ * kexec.c
+ *
+ * Support of kexec (reboot locally into new mini-os kernel).
+ *
+ * Copyright (c) 2024, Juergen Gross, SUSE Linux GmbH
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#ifdef CONFIG_PARAVIRT
+#error "kexec support not implemented in PV variant"
+#endif
+
+#include <errno.h>
+#include <mini-os/os.h>
+#include <mini-os/lib.h>
+#include <mini-os/kexec.h>
+
+/*
+ * General approach for kexec support (PVH only) is as follows:
+ *
+ * - New kernel needs to be in memory in the form of an ELF file in a virtual
+ *   memory region.
+ * - A new start_info structure is constructed in memory with the final
+ *   memory locations included.
+ * - All memory areas needed for kexec execution are being finalized.
+ * - From here on a graceful failure is no longer possible.
+ * - Grants and event channels are torn down.
+ * - A temporary set of page tables is constructed at a location where it
+ *   doesn't conflict with old and new kernel or start_info.
+ * - The final kexec execution stage is copied to a memory area below 4G which
+ *   doesn't conflict with the target areas of kernel etc.
+ * - Cr3 is switched to the new set of page tables.
+ * - Execution continues in the final execution stage.
+ * - All data is copied to its final addresses.
+ * - Processing is switched to 32-bit mode without address translation.
+ * - The new kernel is activated.
+ */
+
+int kexec(void *kernel, unsigned long kernel_size,
+          const char *cmdline)
+{
+    return ENOSYS;
+}
+EXPORT_SYMBOL(kexec);
-- 
2.43.0




* [MINI-OS PATCH 02/12] kexec: add final kexec stage
  2025-03-21  9:24 [MINI-OS PATCH 00/12] kexec: add kexec support to Mini-OS Juergen Gross
  2025-03-21  9:24 ` [MINI-OS PATCH 01/12] kexec: add kexec framework Juergen Gross
@ 2025-03-21  9:24 ` Juergen Gross
  2025-06-14 16:40   ` Jason Andryuk
  2025-03-21  9:24 ` [MINI-OS PATCH 03/12] add elf.h Juergen Gross
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 37+ messages in thread
From: Juergen Gross @ 2025-03-21  9:24 UTC
  To: minios-devel, xen-devel; +Cc: samuel.thibault, Juergen Gross

Add the code and data definitions of the final kexec stage.

Put the code and related data into a dedicated section so that it can
be copied to another location. For this reason the code and data must
not use any absolute relocations.

As it is functionally related, also add a function for adding a final
kexec action.
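
The action list can be exercised outside of Mini-OS. The following is a
minimal user-space sketch, not the patch's code: struct kexec_action and
kexec_add_action() mirror the definitions in this patch, while
run_actions() is a hypothetical stand-in for kexec_final() that returns
the KEXEC_CALL target instead of jumping to it, and that stops after
KEXEC_MAX_ACTIONS entries (an assumption added here so the sketch cannot
run off the end of the array).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Same three action kinds as in the patch's include/kexec.h. */
enum kexec_action_type { KEXEC_COPY, KEXEC_ZERO, KEXEC_CALL };

struct kexec_action {
    enum kexec_action_type action;
    unsigned int len;
    void *dest;
    void *src;
};

#define KEXEC_MAX_ACTIONS 16

static struct kexec_action kexec_actions[KEXEC_MAX_ACTIONS];
static unsigned int act_idx;

/* Append one action; fails with -ENOSPC once the table is full. */
static int kexec_add_action(int action, void *dest, void *src,
                            unsigned int len)
{
    struct kexec_action *act;

    if (act_idx == KEXEC_MAX_ACTIONS)
        return -ENOSPC;

    act = &kexec_actions[act_idx++];
    act->action = (enum kexec_action_type)action;
    act->len = len;
    act->dest = dest;
    act->src = src;

    return 0;
}

/*
 * Hypothetical stand-in for kexec_final(): processes the actions in
 * order and returns the KEXEC_CALL target instead of jumping to it.
 * Byte-wise loops are used, as in the patch, so that no library calls
 * are needed. Returns NULL if no KEXEC_CALL entry is found.
 */
static void *run_actions(const struct kexec_action *actions)
{
    unsigned int a, cnt;
    char *dest;
    const char *src;

    assert(actions != NULL);

    for (a = 0; a < KEXEC_MAX_ACTIONS; a++) {
        switch (actions[a].action) {
        case KEXEC_COPY:
            dest = actions[a].dest;
            src = actions[a].src;
            for (cnt = 0; cnt < actions[a].len; cnt++)
                *dest++ = *src++;
            break;

        case KEXEC_ZERO:
            dest = actions[a].dest;
            for (cnt = 0; cnt < actions[a].len; cnt++)
                *dest++ = 0;
            break;

        case KEXEC_CALL:
            return actions[a].dest;
        }
    }

    return NULL;
}
```

A caller would queue the copy and zero actions first and add a single
KEXEC_CALL entry last, matching the requirement that the last element
of the list has action KEXEC_CALL.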

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 arch/x86/kexec.c          | 109 ++++++++++++++++++++++++++++++++++++++
 arch/x86/minios-x86.lds.S |   8 +++
 arch/x86/x86_hvm.S        |  46 ++++++++++++++++
 include/kexec.h           |  27 ++++++++++
 include/x86/os.h          |   5 ++
 kexec.c                   |  22 ++++++++
 6 files changed, 217 insertions(+)
 create mode 100644 arch/x86/kexec.c

diff --git a/arch/x86/kexec.c b/arch/x86/kexec.c
new file mode 100644
index 00000000..bf247797
--- /dev/null
+++ b/arch/x86/kexec.c
@@ -0,0 +1,109 @@
+/******************************************************************************
+ * kexec.c
+ *
+ * Support of kexec (reboot locally into new mini-os kernel).
+ *
+ * Copyright (c) 2024, Juergen Gross, SUSE Linux GmbH
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#ifdef CONFIG_KEXEC
+
+#include <mini-os/os.h>
+#include <mini-os/lib.h>
+#include <mini-os/kexec.h>
+
+/*
+ * Final stage of kexec. Copies all data to the final destinations, zeroes
+ * .bss and activates new kernel.
+ * Must be called with interrupts off. Stack, code and data must be
+ * accessible via identity mapped virtual addresses (virt == phys). Copying
+ * and zeroing is done using virtual addresses.
+ * No relocations inside the function are allowed, as it is copied to an
+ * allocated page before being executed.
+ */
+static void __attribute__((__section__(".text.kexec")))
+    kexec_final(struct kexec_action *actions, unsigned long real)
+{
+    char *src, *dest;
+    unsigned int a, cnt;
+
+    for ( a = 0; ; a++ )
+    {
+        switch ( actions[a].action )
+        {
+        case KEXEC_COPY:
+            dest = actions[a].dest;
+            src = actions[a].src;
+            for ( cnt = 0; cnt < actions[a].len; cnt++ )
+                *dest++ = *src++;
+            break;
+
+        case KEXEC_ZERO:
+            dest = actions[a].dest;
+            for ( cnt = 0; cnt < actions[a].len; cnt++ )
+                *dest++ = 0;
+            break;
+
+        case KEXEC_CALL:
+            asm("movl %0, %%ebx\n\t"
+                "movl %1, %%edi\n\t"
+                "jmp *%2"
+                :"=m" (actions[a].src), "=m" (actions[a].dest)
+                :"m" (real));
+            break;
+        }
+    }
+}
+
+#define KEXEC_STACK_LONGS  8
+static unsigned long __attribute__((__section__(".data.kexec")))
+    kexec_stack[KEXEC_STACK_LONGS];
+
+static unsigned long get_kexec_addr(void *kexec_page, void *addr)
+{
+    unsigned long off = (unsigned long)addr - (unsigned long)_kexec_start;
+
+    return (unsigned long)kexec_page + off;
+}
+
+void do_kexec(void *kexec_page)
+{
+    unsigned long actions;
+    unsigned long stack;
+    unsigned long final;
+    unsigned long phys;
+
+    actions = get_kexec_addr(kexec_page, kexec_actions);
+    stack = get_kexec_addr(kexec_page, kexec_stack + KEXEC_STACK_LONGS);
+    final = get_kexec_addr(kexec_page, kexec_final);
+    phys = get_kexec_addr(kexec_page, kexec_phys);
+
+    memcpy(kexec_page, _kexec_start, KEXEC_SECSIZE);
+    asm("cli\n\t"
+        "mov %0, %%"ASM_SP"\n\t"
+        "mov %1, %%"ASM_ARG1"\n\t"
+        "mov %2, %%"ASM_ARG2"\n\t"
+        "jmp *%3"
+        :"=m" (stack), "=m" (actions), "=m" (phys)
+        :"m" (final));
+}
+
+#endif /* CONFIG_KEXEC */
diff --git a/arch/x86/minios-x86.lds.S b/arch/x86/minios-x86.lds.S
index 8aae2fd6..83ec41ce 100644
--- a/arch/x86/minios-x86.lds.S
+++ b/arch/x86/minios-x86.lds.S
@@ -87,6 +87,14 @@ SECTIONS
 
         _edata = .;			/* End of data section */
 
+        . = ALIGN(8);
+        _kexec_start = .;		/* Kexec relocatable code/data */
+        .kexec : {
+                *(.text.kexec)
+                *(.data.kexec)
+        }
+        _kexec_end = .;
+
         __bss_start = .;		/* BSS */
         .bss : {
                 *(.bss)
diff --git a/arch/x86/x86_hvm.S b/arch/x86/x86_hvm.S
index 42a5f02e..e2f82e96 100644
--- a/arch/x86/x86_hvm.S
+++ b/arch/x86/x86_hvm.S
@@ -85,4 +85,50 @@ page_table_l2:
 #endif
         .align __PAGE_SIZE, 0
 
+#ifdef CONFIG_KEXEC
+.section .text.kexec, "ax", @progbits
+
+/*
+ * Switch off paging and call new OS for kexec.
+ * %ebx holds the physical address of the start_info structure
+ * %edi holds the physical address of the entry point to call
+ */
+.globl kexec_phys
+kexec_phys:
+        /* Set DS, ES, SS to 0...ffffffff. */
+        mov $(GDTE_DS32_DPL0 * 8), %eax
+        mov %eax, %ds
+        mov %eax, %es
+        mov %eax, %ss
+
+#ifdef __x86_64__
+        /* Switch to 32-bit mode. */
+        pushq $(GDTE_CS32_DPL0 * 8)
+        lea cs32_switch(%rip),%edx
+        push %rdx
+        lretq
+
+        .code32
+cs32_switch:
+#endif
+        /* Set %cr0 and %cr4 (disables paging). */
+        mov $X86_CR0_PE, %eax
+        mov %eax, %cr0
+        mov $0, %eax
+        mov %eax, %cr4
+#ifdef __x86_64__
+        /* Disable 64-bit mode. */
+        mov $MSR_EFER, %ecx
+        rdmsr
+        btr $_EFER_LME, %eax
+        wrmsr
+#endif
+
+        jmp *%edi
+
+#ifdef __x86_64__
+        .code64
+#endif
+#endif /* CONFIG_KEXEC */
+
 .text
diff --git a/include/kexec.h b/include/kexec.h
index 6fd96774..722be456 100644
--- a/include/kexec.h
+++ b/include/kexec.h
@@ -1,7 +1,34 @@
 #ifndef _KEXEC_H
 #define _KEXEC_H
 
+/* One element of kexec actions (last element must have action KEXEC_CALL): */
+struct kexec_action {
+    enum {
+        KEXEC_COPY,   /* Copy len bytes from src to dest. */
+        KEXEC_ZERO,   /* Zero len bytes at dest. */
+        KEXEC_CALL    /* Call dest with paging turned off, param is src. */
+    } action;
+    unsigned int len;
+    void *dest;
+    void *src;
+};
+
+#define KEXEC_MAX_ACTIONS  16
+
+extern char _kexec_start[], _kexec_end[];
+extern struct kexec_action kexec_actions[KEXEC_MAX_ACTIONS];
+
+int kexec_add_action(int action, void *dest, void *src, unsigned int len);
+
+#define KEXEC_SECSIZE ((unsigned long)_kexec_end - (unsigned long)_kexec_start)
+
 int kexec(void *kernel, unsigned long kernel_size,
           const char *cmdline);
 
+/* Initiate final kexec stage. */
+void do_kexec(void *kexec_page);
+
+/* Assembler code for switching off paging and passing execution to new OS. */
+void kexec_phys(void);
+
 #endif /* _KEXEC_H */
diff --git a/include/x86/os.h b/include/x86/os.h
index 0095be13..8a057d81 100644
--- a/include/x86/os.h
+++ b/include/x86/os.h
@@ -27,6 +27,7 @@
 #define MSR_EFER          0xc0000080
 #define _EFER_LME         8             /* Long mode enable */
 
+#define X86_CR0_PE        0x00000001    /* Protected mode enable */
 #define X86_CR0_WP        0x00010000    /* Write protect */
 #define X86_CR0_PG        0x80000000    /* Paging */
 #define X86_CR4_PAE       0x00000020    /* enable physical address extensions */
@@ -64,9 +65,13 @@
 #if defined(__i386__)
 #define __SZ    "l"
 #define __REG   "e"
+#define ASM_ARG1 "eax"
+#define ASM_ARG2 "edx"
 #else
 #define __SZ    "q"
 #define __REG   "r"
+#define ASM_ARG1 "rdi"
+#define ASM_ARG2 "rsi"
 #endif
 
 #define ASM_SP  __REG"sp"
diff --git a/kexec.c b/kexec.c
index 53528169..849a98e4 100644
--- a/kexec.c
+++ b/kexec.c
@@ -60,3 +60,25 @@ int kexec(void *kernel, unsigned long kernel_size,
     return ENOSYS;
 }
 EXPORT_SYMBOL(kexec);
+
+struct kexec_action __attribute__((__section__(".data.kexec")))
+    kexec_actions[KEXEC_MAX_ACTIONS];
+static unsigned int act_idx;
+
+int kexec_add_action(int action, void *dest, void *src, unsigned int len)
+{
+    struct kexec_action *act;
+
+    if ( act_idx == KEXEC_MAX_ACTIONS )
+        return -ENOSPC;
+
+    act = kexec_actions + act_idx;
+    act_idx++;
+
+    act->action = action;
+    act->len = len;
+    act->dest = dest;
+    act->src = src;
+
+    return 0;
+}
-- 
2.43.0




* [MINI-OS PATCH 03/12] add elf.h
  2025-03-21  9:24 [MINI-OS PATCH 00/12] kexec: add kexec support to Mini-OS Juergen Gross
  2025-03-21  9:24 ` [MINI-OS PATCH 01/12] kexec: add kexec framework Juergen Gross
  2025-03-21  9:24 ` [MINI-OS PATCH 02/12] kexec: add final kexec stage Juergen Gross
@ 2025-03-21  9:24 ` Juergen Gross
  2025-03-21 13:51   ` Jan Beulich
  2025-03-21  9:24 ` [MINI-OS PATCH 04/12] kexec: analyze new kernel for kexec Juergen Gross
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 37+ messages in thread
From: Juergen Gross @ 2025-03-21  9:24 UTC
  To: minios-devel, xen-devel; +Cc: samuel.thibault, Juergen Gross

Add some definitions for accessing an ELF file. Only the file header
and the program header are needed.

The main sources for those are elfstructs.h and libelf.h from the Xen
tree. The license boilerplate of those files is kept in the resulting
header file.
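
As an illustration of how the e_ident[] definitions are meant to be
used, here is a minimal stand-alone sketch. The constants carry the same
values as in the new header; elf_class_of() is a hypothetical helper
that is not part of this patch. It looks only at the identification
bytes and makes no statement about the rest of the file being valid.

```c
#include <stddef.h>

/* Subset of the e_ident[] definitions, same values as in elf.h. */
#define EI_MAG0      0
#define EI_MAG1      1
#define EI_MAG2      2
#define EI_MAG3      3
#define EI_CLASS     4
#define EI_NIDENT   16

#define ELFMAG0      0x7f
#define ELFMAG1      'E'
#define ELFMAG2      'L'
#define ELFMAG3      'F'

#define ELFCLASS32   1
#define ELFCLASS64   2

/*
 * Hypothetical helper: classify a buffer by its ELF identification
 * bytes. Returns 32 or 64 for a 32-bit or 64-bit ELF image, and 0 if
 * the buffer is too short, lacks the ELF magic, or has an unknown
 * class.
 */
static int elf_class_of(const void *buf, size_t size)
{
    const unsigned char *ident = buf;

    if (size < EI_NIDENT)
        return 0;

    if (ident[EI_MAG0] != ELFMAG0 || ident[EI_MAG1] != ELFMAG1 ||
        ident[EI_MAG2] != ELFMAG2 || ident[EI_MAG3] != ELFMAG3)
        return 0;

    switch (ident[EI_CLASS]) {
    case ELFCLASS32:
        return 32;
    case ELFCLASS64:
        return 64;
    default:
        return 0;
    }
}
```

The header's own IS_ELF(), elf_is_32bit() and elf_is_64bit() cover the
same checks on an elf_ehdr; the sketch adds only the size check, which
a caller has to do before touching the header at all.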

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 include/elf.h | 340 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 340 insertions(+)
 create mode 100644 include/elf.h

diff --git a/include/elf.h b/include/elf.h
new file mode 100644
index 00000000..35a9c9fe
--- /dev/null
+++ b/include/elf.h
@@ -0,0 +1,340 @@
+#ifndef __ELF_H__
+#define __ELF_H__
+/*
+ * Copyright (c) 1995, 1996 Erik Theisen.  All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ * 3. The name of the author may not be used to endorse or promote products
+ *    derived from this software without specific prior written permission
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
+ * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+ * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
+ * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
+ * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
+ * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
+ * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdbool.h>
+#include <mini-os/types.h>
+
+typedef uint32_t    Elf32_Addr;  /* Unsigned program address */
+typedef uint32_t    Elf32_Off;   /* Unsigned file offset */
+typedef uint16_t    Elf32_Half;  /* Unsigned medium integer */
+typedef uint32_t    Elf32_Word;  /* Unsigned large integer */
+
+typedef uint64_t    Elf64_Addr;
+typedef uint64_t    Elf64_Off;
+typedef uint16_t    Elf64_Half;
+typedef uint32_t    Elf64_Word;
+typedef uint64_t    Elf64_Xword;
+
+/* Unique build id string format when using --build-id. */
+#define NT_GNU_BUILD_ID 3
+
+/*
+ * e_ident[] identification indexes
+ * See http://www.caldera.com/developers/gabi/2000-07-17/ch4.eheader.html
+ */
+#define EI_MAG0        0         /* file ID */
+#define EI_MAG1        1         /* file ID */
+#define EI_MAG2        2         /* file ID */
+#define EI_MAG3        3         /* file ID */
+#define EI_CLASS       4         /* file class */
+#define EI_DATA        5         /* data encoding */
+#define EI_VERSION     6         /* ELF header version */
+#define EI_OSABI       7         /* OS/ABI ID */
+#define EI_ABIVERSION  8         /* ABI version */
+#define EI_PAD         9         /* start of pad bytes */
+#define EI_NIDENT     16         /* Size of e_ident[] */
+
+/* e_ident[] magic number */
+#define ELFMAG0        0x7f      /* e_ident[EI_MAG0] */
+#define ELFMAG1        'E'       /* e_ident[EI_MAG1] */
+#define ELFMAG2        'L'       /* e_ident[EI_MAG2] */
+#define ELFMAG3        'F'       /* e_ident[EI_MAG3] */
+#define ELFMAG         "\177ELF" /* magic */
+#define SELFMAG        4         /* size of magic */
+
+/* e_ident[] file class */
+#define ELFCLASSNONE   0         /* invalid */
+#define ELFCLASS32     1         /* 32-bit objs */
+#define ELFCLASS64     2         /* 64-bit objs */
+#define ELFCLASSNUM    3         /* number of classes */
+
+/* e_ident[] data encoding */
+#define ELFDATANONE    0         /* invalid */
+#define ELFDATA2LSB    1         /* Little-Endian */
+#define ELFDATA2MSB    2         /* Big-Endian */
+#define ELFDATANUM     3         /* number of data encode defines */
+
+/* e_ident[] Operating System/ABI */
+#define ELFOSABI_SYSV         0  /* UNIX System V ABI */
+#define ELFOSABI_NONE         0  /* Same as ELFOSABI_SYSV */
+#define ELFOSABI_HPUX         1  /* HP-UX operating system */
+#define ELFOSABI_NETBSD       2  /* NetBSD */
+#define ELFOSABI_LINUX        3  /* GNU/Linux */
+#define ELFOSABI_HURD         4  /* GNU/Hurd */
+#define ELFOSABI_86OPEN       5  /* 86Open common IA32 ABI */
+#define ELFOSABI_SOLARIS      6  /* Solaris */
+#define ELFOSABI_MONTEREY     7  /* Monterey */
+#define ELFOSABI_IRIX         8  /* IRIX */
+#define ELFOSABI_FREEBSD      9  /* FreeBSD */
+#define ELFOSABI_TRU64       10  /* TRU64 UNIX */
+#define ELFOSABI_MODESTO     11  /* Novell Modesto */
+#define ELFOSABI_OPENBSD     12  /* OpenBSD */
+#define ELFOSABI_ARM         97  /* ARM */
+#define ELFOSABI_STANDALONE 255  /* Standalone (embedded) application */
+
+/* e_ident */
+#define IS_ELF(ehdr) ((ehdr).e_ident[EI_MAG0] == ELFMAG0 && \
+                      (ehdr).e_ident[EI_MAG1] == ELFMAG1 && \
+                      (ehdr).e_ident[EI_MAG2] == ELFMAG2 && \
+                      (ehdr).e_ident[EI_MAG3] == ELFMAG3)
+
+/* e_flags */
+#define EF_ARM_EABI_MASK    0xff000000
+#define EF_ARM_EABI_UNKNOWN 0x00000000
+#define EF_ARM_EABI_VER1    0x01000000
+#define EF_ARM_EABI_VER2    0x02000000
+#define EF_ARM_EABI_VER3    0x03000000
+#define EF_ARM_EABI_VER4    0x04000000
+#define EF_ARM_EABI_VER5    0x05000000
+
+/* ELF Header */
+typedef struct {
+    unsigned char e_ident[EI_NIDENT]; /* ELF Identification */
+    Elf32_Half    e_type;        /* object file type */
+    Elf32_Half    e_machine;     /* machine */
+    Elf32_Word    e_version;     /* object file version */
+    Elf32_Addr    e_entry;       /* virtual entry point */
+    Elf32_Off     e_phoff;       /* program header table offset */
+    Elf32_Off     e_shoff;       /* section header table offset */
+    Elf32_Word    e_flags;       /* processor-specific flags */
+    Elf32_Half    e_ehsize;      /* ELF header size */
+    Elf32_Half    e_phentsize;   /* program header entry size */
+    Elf32_Half    e_phnum;       /* number of program header entries */
+    Elf32_Half    e_shentsize;   /* section header entry size */
+    Elf32_Half    e_shnum;       /* number of section header entries */
+    Elf32_Half    e_shstrndx;    /* section header table's "section
+                                    header string table" entry offset */
+} Elf32_Ehdr;
+
+typedef struct {
+    unsigned char e_ident[EI_NIDENT]; /* Id bytes */
+    Elf64_Half    e_type;        /* file type */
+    Elf64_Half    e_machine;     /* machine type */
+    Elf64_Word    e_version;     /* version number */
+    Elf64_Addr    e_entry;       /* entry point */
+    Elf64_Off     e_phoff;       /* Program hdr offset */
+    Elf64_Off     e_shoff;       /* Section hdr offset */
+    Elf64_Word    e_flags;       /* Processor flags */
+    Elf64_Half    e_ehsize;      /* sizeof ehdr */
+    Elf64_Half    e_phentsize;   /* Program header entry size */
+    Elf64_Half    e_phnum;       /* Number of program headers */
+    Elf64_Half    e_shentsize;   /* Section header entry size */
+    Elf64_Half    e_shnum;       /* Number of section headers */
+    Elf64_Half    e_shstrndx;    /* String table index */
+} Elf64_Ehdr;
+
+/* e_type */
+#define ET_NONE      0           /* No file type */
+#define ET_REL       1           /* relocatable file */
+#define ET_EXEC      2           /* executable file */
+#define ET_DYN       3           /* shared object file */
+#define ET_CORE      4           /* core file */
+#define ET_NUM       5           /* number of types */
+#define ET_LOPROC    0xff00      /* reserved range for processor */
+#define ET_HIPROC    0xffff      /*   specific e_type */
+
+/* e_machine */
+#define EM_NONE      0           /* No Machine */
+#define EM_M32       1           /* AT&T WE 32100 */
+#define EM_SPARC     2           /* SPARC */
+#define EM_386       3           /* Intel 80386 */
+#define EM_68K       4           /* Motorola 68000 */
+#define EM_88K       5           /* Motorola 88000 */
+#define EM_486       6           /* Intel 80486 - unused? */
+#define EM_860       7           /* Intel 80860 */
+#define EM_MIPS      8           /* MIPS R3000 Big-Endian only */
+/*
+ * Don't know if EM_MIPS_RS4_BE,
+ * EM_SPARC64, EM_PARISC,
+ * or EM_PPC are ABI compliant
+ */
+#define EM_MIPS_RS4_BE 10        /* MIPS R4000 Big-Endian */
+#define EM_SPARC64     11        /* SPARC v9 64-bit unofficial */
+#define EM_PARISC      15        /* HPPA */
+#define EM_SPARC32PLUS 18        /* Enhanced instruction set SPARC */
+#define EM_PPC         20        /* PowerPC */
+#define EM_PPC64       21        /* PowerPC 64-bit */
+#define EM_ARM         40        /* Advanced RISC Machines ARM */
+#define EM_ALPHA       41        /* DEC ALPHA */
+#define EM_SPARCV9     43        /* SPARC version 9 */
+#define EM_ALPHA_EXP   0x9026    /* DEC ALPHA */
+#define EM_IA_64       50        /* Intel Merced */
+#define EM_X86_64      62        /* AMD x86-64 architecture */
+#define EM_VAX         75        /* DEC VAX */
+#define EM_AARCH64    183        /* ARM 64-bit */
+
+/* Version */
+#define EV_NONE      0           /* Invalid */
+#define EV_CURRENT   1           /* Current */
+#define EV_NUM       2           /* number of versions */
+
+/* Program Header */
+typedef struct {
+    Elf32_Word    p_type;        /* segment type */
+    Elf32_Off     p_offset;      /* segment offset */
+    Elf32_Addr    p_vaddr;       /* virtual address of segment */
+    Elf32_Addr    p_paddr;       /* physical address - ignored? */
+    Elf32_Word    p_filesz;      /* number of bytes in file for seg. */
+    Elf32_Word    p_memsz;       /* number of bytes in mem. for seg. */
+    Elf32_Word    p_flags;       /* flags */
+    Elf32_Word    p_align;       /* memory alignment */
+} Elf32_Phdr;
+
+typedef struct {
+    Elf64_Word    p_type;        /* entry type */
+    Elf64_Word    p_flags;       /* flags */
+    Elf64_Off     p_offset;      /* offset */
+    Elf64_Addr    p_vaddr;       /* virtual address */
+    Elf64_Addr    p_paddr;       /* physical address */
+    Elf64_Xword   p_filesz;      /* file size */
+    Elf64_Xword   p_memsz;       /* memory size */
+    Elf64_Xword   p_align;       /* memory & file alignment */
+} Elf64_Phdr;
+
+/* Segment types - p_type */
+#define PT_NULL      0           /* unused */
+#define PT_LOAD      1           /* loadable segment */
+#define PT_DYNAMIC   2           /* dynamic linking section */
+#define PT_INTERP    3           /* the RTLD */
+#define PT_NOTE      4           /* auxiliary information */
+#define PT_SHLIB     5           /* reserved - purpose undefined */
+#define PT_PHDR      6           /* program header */
+#define PT_NUM       7           /* Number of segment types */
+#define PT_LOPROC    0x70000000  /* reserved range for processor */
+#define PT_HIPROC    0x7fffffff  /*  specific segment types */
+
+/* Segment flags - p_flags */
+#define PF_X         0x1        /* Executable */
+#define PF_W         0x2        /* Writable */
+#define PF_R         0x4        /* Readable */
+#define PF_MASKPROC  0xf0000000 /* reserved bits for processor */
+                                /*  specific segment flags */
+
+/* Section Header */
+typedef struct {
+    Elf32_Word    sh_name;      /* name - index into section header
+                                   string table section */
+    Elf32_Word    sh_type;      /* type */
+    Elf32_Word    sh_flags;     /* flags */
+    Elf32_Addr    sh_addr;      /* address */
+    Elf32_Off     sh_offset;    /* file offset */
+    Elf32_Word    sh_size;      /* section size */
+    Elf32_Word    sh_link;      /* section header table index link */
+    Elf32_Word    sh_info;      /* extra information */
+    Elf32_Word    sh_addralign; /* address alignment */
+    Elf32_Word    sh_entsize;   /* section entry size */
+} Elf32_Shdr;
+
+typedef struct {
+    Elf64_Word    sh_name;      /* section name */
+    Elf64_Word    sh_type;      /* section type */
+    Elf64_Xword   sh_flags;     /* section flags */
+    Elf64_Addr    sh_addr;      /* virtual address */
+    Elf64_Off     sh_offset;    /* file offset */
+    Elf64_Xword   sh_size;      /* section size */
+    Elf64_Word    sh_link;      /* link to another */
+    Elf64_Word    sh_info;      /* misc info */
+    Elf64_Xword   sh_addralign; /* memory alignment */
+    Elf64_Xword   sh_entsize;   /* table entry size */
+} Elf64_Shdr;
+
+/* sh_type */
+#define SHT_NULL        0       /* inactive */
+#define SHT_PROGBITS    1       /* program defined information */
+#define SHT_SYMTAB      2       /* symbol table section */
+#define SHT_STRTAB      3       /* string table section */
+#define SHT_RELA        4       /* relocation section with addends */
+#define SHT_HASH        5       /* symbol hash table section */
+#define SHT_DYNAMIC     6       /* dynamic section */
+#define SHT_NOTE        7       /* note section */
+#define SHT_NOBITS      8       /* no space section */
+#define SHT_REL         9       /* relocation section without addends */
+#define SHT_SHLIB      10       /* reserved - purpose unknown */
+#define SHT_DYNSYM     11       /* dynamic symbol table section */
+#define SHT_NUM        12       /* number of section types */
+
+/* Note definitions */
+typedef struct {
+    Elf32_Word namesz;
+    Elf32_Word descsz;
+    Elf32_Word type;
+    char data[];
+} Elf32_Note;
+
+typedef struct {
+    Elf64_Word namesz;
+    Elf64_Word descsz;
+    Elf64_Word type;
+    char data[];
+} Elf64_Note;
+
+/* Abstraction layer for handling 32- and 64-bit ELF files. */
+
+typedef union {
+    Elf32_Ehdr e32;
+    Elf64_Ehdr e64;
+} elf_ehdr;
+
+static inline bool elf_is_32bit(elf_ehdr *ehdr)
+{
+    return ehdr->e32.e_ident[EI_CLASS] == ELFCLASS32;
+}
+
+static inline bool elf_is_64bit(elf_ehdr *ehdr)
+{
+    return ehdr->e32.e_ident[EI_CLASS] == ELFCLASS64;
+}
+
+#define ehdr_val(ehdr, elem) (elf_is_32bit(ehdr) ? (ehdr)->e32.elem : (ehdr)->e64.elem)
+
+typedef union {
+    Elf32_Phdr e32;
+    Elf64_Phdr e64;
+} elf_phdr;
+
+#define phdr_val(ehdr, phdr, elem) (elf_is_32bit(ehdr) ? (phdr)->e32.elem : (phdr)->e64.elem)
+
+typedef union {
+    Elf32_Shdr e32;
+    Elf64_Shdr e64;
+} elf_shdr;
+
+#define shdr_val(ehdr, shdr, elem) (elf_is_32bit(ehdr) ? (shdr)->e32.elem : (shdr)->e64.elem)
+
+typedef union {
+    Elf32_Note e32;
+    Elf64_Note e64;
+} elf_note;
+
+#define note_val(ehdr, note, elem) (elf_is_32bit(ehdr) ? (note)->e32.elem : (note)->e64.elem)
+
+static inline void *elf_ptr_add(void *ptr, unsigned long add)
+{
+    return (char *)ptr + add;
+}
+#endif /* __ELF_H__ */
-- 
2.43.0




* [MINI-OS PATCH 04/12] kexec: analyze new kernel for kexec
  2025-03-21  9:24 [MINI-OS PATCH 00/12] kexec: add kexec support to Mini-OS Juergen Gross
                   ` (2 preceding siblings ...)
  2025-03-21  9:24 ` [MINI-OS PATCH 03/12] add elf.h Juergen Gross
@ 2025-03-21  9:24 ` Juergen Gross
  2025-06-14 16:41   ` Jason Andryuk
  2025-03-21  9:24 ` [MINI-OS PATCH 05/12] kexec: finalize parameter location and size Juergen Gross
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 37+ messages in thread
From: Juergen Gross @ 2025-03-21  9:24 UTC (permalink / raw)
  To: minios-devel, xen-devel; +Cc: samuel.thibault, Juergen Gross

Analyze the properties of the new kernel to be loaded by kexec. The
data needed is:

- upper boundary in final location
- copy and memory clear operations
- entry point and entry parameter
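
As a rough illustration of how the first two items fall out of a single
loadable segment, here is a minimal sketch. It works on a simplified,
hypothetical struct seg (not the elf_phdr union used in this series) and
only computes the ranges; queueing them as kexec actions is left out.
All names in it are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified view of one PT_LOAD program header. */
struct seg {
    uint64_t paddr;   /* physical load address (p_paddr) */
    uint64_t filesz;  /* bytes present in the file (p_filesz) */
    uint64_t memsz;   /* bytes occupied in memory (p_memsz) */
};

/* Hypothetical result: what to copy, what to zero, where the segment ends. */
struct seg_plan {
    uint64_t copy_dst, copy_len;  /* file-backed part */
    uint64_t zero_dst, zero_len;  /* tail not backed by the file (e.g. .bss) */
    uint64_t end;                 /* first address after the segment */
};

static struct seg_plan plan_segment(const struct seg *s)
{
    struct seg_plan p = { 0 };

    /* This sketch assumes a well-formed segment. */
    assert(s->memsz >= s->filesz);

    p.copy_dst = s->paddr;
    p.copy_len = s->filesz;
    p.zero_dst = s->paddr + s->filesz;
    p.zero_len = s->memsz - s->filesz;
    p.end = s->paddr + s->memsz;

    return p;
}

/* The upper boundary is the maximum end address over all segments. */
static uint64_t upper_boundary(const struct seg *segs, unsigned int n)
{
    uint64_t last = 0;
    unsigned int i;

    for ( i = 0; i < n; i++ )
    {
        struct seg_plan p = plan_segment(&segs[i]);

        if ( p.end > last )
            last = p.end;
    }

    return last;
}
```

A segment with filesz equal to memsz yields a zero-length clear range, so
a caller would simply skip the clear operation in that case.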

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 arch/x86/kexec.c |  91 +++++++++++++++++++++++++++++++++++
 include/kexec.h  |  11 +++++
 kexec.c          | 120 ++++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 220 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kexec.c b/arch/x86/kexec.c
index bf247797..2069f3c6 100644
--- a/arch/x86/kexec.c
+++ b/arch/x86/kexec.c
@@ -28,8 +28,15 @@
 
 #include <mini-os/os.h>
 #include <mini-os/lib.h>
+#include <mini-os/e820.h>
+#include <mini-os/err.h>
 #include <mini-os/kexec.h>
 
+#include <xen/elfnote.h>
+#include <xen/arch-x86/hvm/start_info.h>
+
+static unsigned long kernel_entry = ~0UL;
+
 /*
  * Final stage of kexec. Copies all data to the final destinations, zeroes
  * .bss and activates new kernel.
@@ -106,4 +113,88 @@ void do_kexec(void *kexec_page)
         :"m" (final));
 }
 
+bool kexec_chk_arch(elf_ehdr *ehdr)
+{
+    return ehdr->e32.e_machine == EM_386 || ehdr->e32.e_machine == EM_X86_64;
+}
+
+static unsigned int note_data_sz(unsigned int sz)
+{
+    return (sz + 3) & ~3;
+}
+
+static void check_notes_entry(elf_ehdr *ehdr, void *start, unsigned int len)
+{
+    elf_note *note = start;
+    unsigned int off, note_len, namesz, descsz;
+    char *val;
+
+    for ( off = 0; off < len; off += note_len )
+    {
+        namesz = note_data_sz(note_val(ehdr, note, namesz));
+        descsz = note_data_sz(note_val(ehdr, note, descsz));
+        val = note_val(ehdr, note, data);
+        note_len = val - (char *)note + namesz + descsz;
+
+        if ( !strncmp(val, "Xen", namesz) &&
+             note_val(ehdr, note, type) == XEN_ELFNOTE_PHYS32_ENTRY )
+        {
+            val += namesz;
+            switch ( note_val(ehdr, note, descsz) )
+            {
+            case 1:
+                kernel_entry = *(uint8_t *)val;
+                return;
+            case 2:
+                kernel_entry = *(uint16_t *)val;
+                return;
+            case 4:
+                kernel_entry = *(uint32_t *)val;
+                return;
+            case 8:
+                kernel_entry = *(uint64_t *)val;
+                return;
+            default:
+                break;
+            }
+        }
+
+        note = elf_ptr_add(note, note_len);
+    }
+}
+
+int kexec_arch_analyze_phdr(elf_ehdr *ehdr, elf_phdr *phdr)
+{
+    void *notes_start;
+    unsigned int notes_len;
+
+    if ( phdr_val(ehdr, phdr, p_type) != PT_NOTE || kernel_entry != ~0UL )
+        return 0;
+
+    notes_start = elf_ptr_add(ehdr, phdr_val(ehdr, phdr, p_offset));
+    notes_len = phdr_val(ehdr, phdr, p_filesz);
+    check_notes_entry(ehdr, notes_start, notes_len);
+
+    return 0;
+}
+
+int kexec_arch_analyze_shdr(elf_ehdr *ehdr, elf_shdr *shdr)
+{
+    void *notes_start;
+    unsigned int notes_len;
+
+    if ( shdr_val(ehdr, shdr, sh_type) != SHT_NOTE || kernel_entry != ~0UL )
+        return 0;
+
+    notes_start = elf_ptr_add(ehdr, shdr_val(ehdr, shdr, sh_offset));
+    notes_len = shdr_val(ehdr, shdr, sh_size);
+    check_notes_entry(ehdr, notes_start, notes_len);
+
+    return 0;
+}
+
+bool kexec_arch_need_analyze_shdrs(void)
+{
+    return kernel_entry == ~0UL;
+}
 #endif /* CONFIG_KEXEC */
diff --git a/include/kexec.h b/include/kexec.h
index 722be456..f54cbb90 100644
--- a/include/kexec.h
+++ b/include/kexec.h
@@ -1,5 +1,6 @@
 #ifndef _KEXEC_H
 #define _KEXEC_H
+#include <mini-os/elf.h>
 
 /* One element of kexec actions (last element must have action KEXEC_CALL): */
 struct kexec_action {
@@ -18,6 +19,8 @@ struct kexec_action {
 extern char _kexec_start[], _kexec_end[];
 extern struct kexec_action kexec_actions[KEXEC_MAX_ACTIONS];
 
+extern unsigned long kexec_last_addr;
+
 int kexec_add_action(int action, void *dest, void *src, unsigned int len);
 
 #define KEXEC_SECSIZE ((unsigned long)_kexec_end - (unsigned long)_kexec_start)
@@ -31,4 +34,12 @@ void do_kexec(void *kexec_page);
 /* Assembler code for switching off paging and passing execution to new OS. */
 void kexec_phys(void);
 
+/* Check kernel to match current architecture. */
+bool kexec_chk_arch(elf_ehdr *ehdr);
+
+/* Architecture specific ELF handling functions. */
+int kexec_arch_analyze_phdr(elf_ehdr *ehdr, elf_phdr *phdr);
+int kexec_arch_analyze_shdr(elf_ehdr *ehdr, elf_shdr *shdr);
+bool kexec_arch_need_analyze_shdrs(void);
+
 #endif /* _KEXEC_H */
diff --git a/kexec.c b/kexec.c
index 849a98e4..3ff4ea07 100644
--- a/kexec.c
+++ b/kexec.c
@@ -31,6 +31,9 @@
 #include <errno.h>
 #include <mini-os/os.h>
 #include <mini-os/lib.h>
+#include <mini-os/console.h>
+#include <mini-os/elf.h>
+#include <mini-os/err.h>
 #include <mini-os/kexec.h>
 
 /*
@@ -54,9 +57,122 @@
  * - The new kernel is activated.
  */
 
-int kexec(void *kernel, unsigned long kernel_size,
-          const char *cmdline)
+unsigned long kexec_last_addr;
+
+static int analyze_phdrs(elf_ehdr *ehdr)
+{
+    elf_phdr *phdr;
+    unsigned int n_hdr, i;
+    unsigned long paddr, offset, filesz, memsz;
+    int ret;
+
+    phdr = elf_ptr_add(ehdr, ehdr_val(ehdr, e_phoff));
+    n_hdr = ehdr_val(ehdr, e_phnum);
+    for ( i = 0; i < n_hdr; i++ )
+    {
+        ret = kexec_arch_analyze_phdr(ehdr, phdr);
+        if ( ret )
+            return ret;
+
+        if ( phdr_val(ehdr, phdr, p_type) == PT_LOAD &&
+             (phdr_val(ehdr, phdr, p_flags) & (PF_X | PF_W | PF_R)) )
+        {
+            paddr = phdr_val(ehdr, phdr, p_paddr);
+            offset = phdr_val(ehdr, phdr, p_offset);
+            filesz = phdr_val(ehdr, phdr, p_filesz);
+            memsz = phdr_val(ehdr, phdr, p_memsz);
+            if ( filesz > 0 )
+            {
+                ret = kexec_add_action(KEXEC_COPY, to_virt(paddr),
+                                       (char *)ehdr + offset, filesz);
+                if ( ret )
+                    return ret;
+            }
+            if ( memsz > filesz )
+            {
+                ret = kexec_add_action(KEXEC_ZERO, to_virt(paddr + filesz),
+                                       NULL, memsz - filesz);
+                if ( ret )
+                    return ret;
+            }
+            if ( paddr + memsz > kexec_last_addr )
+                kexec_last_addr = paddr + memsz;
+        }
+
+        phdr = elf_ptr_add(phdr, ehdr_val(ehdr, e_phentsize));
+    }
+
+    return 0;
+}
+
+static int analyze_shdrs(elf_ehdr *ehdr)
 {
+    elf_shdr *shdr;
+    unsigned int n_hdr, i;
+    int ret;
+
+    if ( !kexec_arch_need_analyze_shdrs() )
+        return 0;
+
+    shdr = elf_ptr_add(ehdr, ehdr_val(ehdr, e_shoff));
+    n_hdr = ehdr_val(ehdr, e_shnum);
+    for ( i = 0; i < n_hdr; i++ )
+    {
+        ret = kexec_arch_analyze_shdr(ehdr, shdr);
+        if ( ret )
+            return ret;
+
+        shdr = elf_ptr_add(shdr, ehdr_val(ehdr, e_shentsize));
+    }
+
+    return 0;
+}
+
+static int analyze_kernel(void *kernel, unsigned long size)
+{
+    elf_ehdr *ehdr = kernel;
+    int ret;
+
+    if ( !IS_ELF(ehdr->e32) )
+    {
+        printk("kexec: new kernel not an ELF file\n");
+        return ENOEXEC;
+    }
+    if ( ehdr->e32.e_ident[EI_DATA] != ELFDATA2LSB )
+    {
+        printk("kexec: ELF file of new kernel is big endian\n");
+        return ENOEXEC;
+    }
+    if ( !elf_is_32bit(ehdr) && !elf_is_64bit(ehdr) )
+    {
+        printk("kexec: ELF file of new kernel is neither 32 nor 64 bit\n");
+        return ENOEXEC;
+    }
+    if ( !kexec_chk_arch(ehdr) )
+    {
+        printk("kexec: ELF file of new kernel is not compatible with arch\n");
+        return ENOEXEC;
+    }
+
+    ret = analyze_phdrs(ehdr);
+    if ( ret )
+        return ret;
+
+    ret = analyze_shdrs(ehdr);
+    if ( ret )
+        return ret;
+
+    return 0;
+}
+
+int kexec(void *kernel, unsigned long kernel_size, const char *cmdline)
+{
+    int ret;
+
+    ret = analyze_kernel(kernel, kernel_size);
+    if ( ret )
+        return ret;
+
     return ENOSYS;
 }
 EXPORT_SYMBOL(kexec);
-- 
2.43.0




* [MINI-OS PATCH 05/12] kexec: finalize parameter location and size
  2025-03-21  9:24 [MINI-OS PATCH 00/12] kexec: add kexec support to Mini-OS Juergen Gross
                   ` (3 preceding siblings ...)
  2025-03-21  9:24 ` [MINI-OS PATCH 04/12] kexec: analyze new kernel for kexec Juergen Gross
@ 2025-03-21  9:24 ` Juergen Gross
  2025-06-14 16:43   ` Jason Andryuk
  2025-03-21  9:24 ` [MINI-OS PATCH 06/12] kexec: reserve memory below boundary Juergen Gross
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 37+ messages in thread
From: Juergen Gross @ 2025-03-21  9:24 UTC (permalink / raw)
  To: minios-devel, xen-devel; +Cc: samuel.thibault, Juergen Gross

Finalize the location and the size of the parameters for the new
kernel. This is needed in order to avoid allocating new memory in the
area occupied by the new kernel and parameters.
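
The size and placement arithmetic can be sketched in isolation as shown
below. This is only an illustration under stated assumptions: the struct
sizes are passed in as plain numbers, SKETCH_PAGE_SIZE is assumed to be
4 KiB, and round_up_to() and place_params() are hypothetical helpers, not
functions of this series.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096UL   /* assumption: 4 KiB pages */

/* Round addr up to a power-of-two alignment. */
static unsigned long round_up_to(unsigned long addr, unsigned long align)
{
    assert(align && !(align & (align - 1)));
    return (addr + align - 1) & ~(align - 1);
}

/* Hypothetical result of the placement. */
struct param_layout {
    unsigned long loc;       /* 8-byte aligned start of the parameters */
    unsigned long size;      /* start info + memory map + command line */
    unsigned long new_last;  /* page aligned end of the occupied area */
};

static struct param_layout place_params(unsigned long last_addr,
                                        size_t info_size, size_t entry_size,
                                        unsigned int n_entries,
                                        const char *cmdline)
{
    struct param_layout l;

    /* The command line is stored including its terminating NUL byte. */
    l.size = info_size + n_entries * entry_size + strlen(cmdline) + 1;
    l.loc = round_up_to(last_addr, 8);
    l.new_last = round_up_to(l.loc + l.size, SKETCH_PAGE_SIZE);

    return l;
}
```

Rounding the end up to a page boundary means the reserved area always
covers whole pages, which matches the page granularity of the allocator.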

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 arch/x86/kexec.c | 15 +++++++++++++++
 include/kexec.h  |  3 +++
 kexec.c          |  2 ++
 3 files changed, 20 insertions(+)

diff --git a/arch/x86/kexec.c b/arch/x86/kexec.c
index 2069f3c6..98a478d3 100644
--- a/arch/x86/kexec.c
+++ b/arch/x86/kexec.c
@@ -197,4 +197,19 @@ bool kexec_arch_need_analyze_shdrs(void)
 {
     return kernel_entry == ~0UL;
 }
+
+static unsigned long kexec_param_loc;
+static unsigned int kexec_param_size;
+
+void kexec_set_param_loc(const char *cmdline)
+{
+    kexec_param_size = sizeof(struct hvm_start_info);
+    kexec_param_size += e820_entries * sizeof(struct hvm_memmap_table_entry);
+    kexec_param_size += strlen(cmdline) + 1;
+
+    kexec_last_addr = (kexec_last_addr + 7) & ~7UL;
+    kexec_param_loc = kexec_last_addr;
+    kexec_last_addr += kexec_param_size;
+    kexec_last_addr = round_pgup(kexec_last_addr);
+}
 #endif /* CONFIG_KEXEC */
diff --git a/include/kexec.h b/include/kexec.h
index f54cbb90..8a2b552f 100644
--- a/include/kexec.h
+++ b/include/kexec.h
@@ -42,4 +42,7 @@ int kexec_arch_analyze_phdr(elf_ehdr *ehdr, elf_phdr *phdr);
 int kexec_arch_analyze_shdr(elf_ehdr *ehdr, elf_shdr *shdr);
 bool kexec_arch_need_analyze_shdrs(void);
 
+/* Finalize parameter location and size. */
+void kexec_set_param_loc(const char *cmdline);
+
 #endif /* _KEXEC_H */
diff --git a/kexec.c b/kexec.c
index 3ff4ea07..7e559994 100644
--- a/kexec.c
+++ b/kexec.c
@@ -173,6 +173,8 @@ int kexec(void *kernel, unsigned long kernel_size, const char *cmdline)
     if ( ret )
         return ret;
 
+    kexec_set_param_loc(cmdline);
+
     return ENOSYS;
 }
 EXPORT_SYMBOL(kexec);
-- 
2.43.0




* [MINI-OS PATCH 06/12] kexec: reserve memory below boundary
  2025-03-21  9:24 [MINI-OS PATCH 00/12] kexec: add kexec support to Mini-OS Juergen Gross
                   ` (4 preceding siblings ...)
  2025-03-21  9:24 ` [MINI-OS PATCH 05/12] kexec: finalize parameter location and size Juergen Gross
@ 2025-03-21  9:24 ` Juergen Gross
  2025-06-14 16:56   ` Jason Andryuk
  2025-03-21  9:24 ` [MINI-OS PATCH 07/12] kexec: build parameters for new kernel Juergen Gross
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 37+ messages in thread
From: Juergen Gross @ 2025-03-21  9:24 UTC (permalink / raw)
  To: minios-devel, xen-devel; +Cc: samuel.thibault, Juergen Gross

To support kexec, any memory used while copying the new kernel to its
final destination must not overlap with the destination area.

To achieve that, add a new interface that marks all allocatable memory
below a specific physical address as not available.

This is done by walking through all chunks of the buddy allocator and
removing the chunks (or chunk parts) below the boundary. The removed
chunks are put into a list so that the operation can be undone in case
kexec fails before doing any unrecoverable system modifications.

Any pages below the boundary that are freed later need to go directly
into the list of reserved pages instead of the free pool.

Call the new function from kexec code.
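
How a single chunk is divided at the boundary can be sketched as follows.
The sketch only counts pages and does not touch any allocator state;
struct split and split_chunk() are hypothetical names for this example.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical result: pages going to the reserved list vs. the free pool. */
struct split {
    unsigned long reserved;
    unsigned long returned;
};

/*
 * Divide a chunk of 2^order pages starting at pfn_s at boundary_pfn.
 * Pages below the boundary are reserved, the rest stay allocatable.
 */
static struct split split_chunk(unsigned long pfn_s, unsigned int order,
                                unsigned long boundary_pfn)
{
    struct split s = { 0, 0 };
    unsigned long n;

    assert(order < sizeof(unsigned long) * CHAR_BIT);
    n = 1UL << order;

    if ( pfn_s >= boundary_pfn )
        s.returned = n;                     /* completely above */
    else if ( pfn_s + n <= boundary_pfn )
        s.reserved = n;                     /* completely below */
    else
    {
        s.reserved = boundary_pfn - pfn_s;  /* straddles the boundary */
        s.returned = n - s.reserved;
    }

    return s;
}
```

In the straddling case the pages are handled individually, so the part
above the boundary re-enters the free pool as order 0 chunks and relies
on the normal buddy merging to form larger chunks again.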

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 include/mm.h |  5 +++
 kexec.c      |  5 +++
 mm.c         | 89 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 98 insertions(+), 1 deletion(-)

diff --git a/include/mm.h b/include/mm.h
index 4fc364ff..5775c3e1 100644
--- a/include/mm.h
+++ b/include/mm.h
@@ -57,6 +57,11 @@ unsigned long alloc_pages(int order);
 void free_pages(void *pointer, int order);
 #define free_page(p)    free_pages(p, 0)
 
+#ifdef CONFIG_KEXEC
+void reserve_memory_below(unsigned long boundary);
+void unreserve_memory_below(void);
+#endif
+
 static __inline__ int get_order(unsigned long size)
 {
     int order;
diff --git a/kexec.c b/kexec.c
index 7e559994..68457711 100644
--- a/kexec.c
+++ b/kexec.c
@@ -175,6 +175,11 @@ int kexec(void *kernel, unsigned long kernel_size, const char *cmdline)
 
     kexec_set_param_loc(cmdline);
 
+    reserve_memory_below(kexec_last_addr);
+
+    /* Error exit. */
+    unreserve_memory_below();
+
     return ENOSYS;
 }
 EXPORT_SYMBOL(kexec);
diff --git a/mm.c b/mm.c
index a5d3f5e5..9236db58 100644
--- a/mm.c
+++ b/mm.c
@@ -230,6 +230,84 @@ static void init_page_allocator(unsigned long min, unsigned long max)
     mm_alloc_bitmap_remap();
 }
 
+#ifdef CONFIG_KEXEC
+static chunk_head_t *reserved_chunks;
+static unsigned long boundary_pfn;
+
+static void free_pages_below(void *pointer, unsigned int order)
+{
+    unsigned long pfn_s, pfn;
+    chunk_head_t *ch = pointer;
+
+    pfn_s = virt_to_pfn(ch);
+
+    if ( pfn_s + (1UL << order) <= boundary_pfn )
+    {
+        /* Put complete chunk into reserved list. */
+        ch->level = order;
+        ch->next = reserved_chunks;
+        reserved_chunks = ch;
+    }
+    else
+    {
+        /* Put pages below boundary into reserved list. */
+        for ( pfn = pfn_s; pfn < boundary_pfn; pfn++ )
+        {
+            chunk_head_t *ch_res = pfn_to_virt(pfn);
+
+            ch_res->level = 0;
+            ch_res->next = reserved_chunks;
+            reserved_chunks = ch_res;
+        }
+
+        /* Return pages above boundary to free pool again. */
+        for ( ; pfn < pfn_s + (1UL << order); pfn++ )
+            free_pages(pfn_to_virt(pfn), 0);
+    }
+}
+
+void reserve_memory_below(unsigned long boundary)
+{
+    unsigned long pfn;
+    unsigned int order;
+    chunk_head_t *ch;
+
+    ASSERT(!boundary_pfn);
+    boundary_pfn = PHYS_PFN(boundary);
+
+    for ( order = 0; order < FREELIST_SIZE; order++ )
+    {
+        for ( ch = free_list[order].next; !FREELIST_EMPTY(ch); ch = ch->next )
+        {
+            pfn = virt_to_pfn(ch);
+            if ( pfn >= boundary_pfn )
+                continue;
+
+            /* Dequeue from this level, at least parts will be reserved. */
+            dequeue_elem(ch);
+            /* Mark all as allocated, pieces above boundary will be returned. */
+            map_alloc(pfn, 1UL << ch->level);
+
+            free_pages_below(ch, ch->level);
+        }
+    }
+}
+
+void unreserve_memory_below(void)
+{
+    chunk_head_t *ch;
+
+    boundary_pfn = 0;
+
+    while ( reserved_chunks )
+    {
+        ch = reserved_chunks;
+        reserved_chunks = ch->next;
+        free_pages(ch, ch->level);
+    }
+}
+#endif /* CONFIG_KEXEC */
+
 /* Allocate 2^@order contiguous pages. Returns a VIRTUAL address. */
 unsigned long alloc_pages(int order)
 {
@@ -279,10 +357,19 @@ EXPORT_SYMBOL(alloc_pages);
 void free_pages(void *pointer, int order)
 {
     chunk_head_t *freed_ch, *to_merge_ch;
+    unsigned long pfn = virt_to_pfn(pointer);
     unsigned long mask;
 
+#ifdef CONFIG_KEXEC
+    if ( pfn < boundary_pfn )
+    {
+        free_pages_below(pointer, order);
+        return;
+    }
+#endif
+
     /* First free the chunk */
-    map_free(virt_to_pfn(pointer), 1UL << order);
+    map_free(pfn, 1UL << order);
 
     /* Create free chunk */
     freed_ch = (chunk_head_t *)pointer;
-- 
2.43.0




* [MINI-OS PATCH 07/12] kexec: build parameters for new kernel
  2025-03-21  9:24 [MINI-OS PATCH 00/12] kexec: add kexec support to Mini-OS Juergen Gross
                   ` (5 preceding siblings ...)
  2025-03-21  9:24 ` [MINI-OS PATCH 06/12] kexec: reserve memory below boundary Juergen Gross
@ 2025-03-21  9:24 ` Juergen Gross
  2025-06-14 17:02   ` Jason Andryuk
  2025-03-21  9:24 ` [MINI-OS PATCH 08/12] kexec: move used pages away " Juergen Gross
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 37+ messages in thread
From: Juergen Gross @ 2025-03-21  9:24 UTC (permalink / raw)
  To: minios-devel, xen-devel; +Cc: samuel.thibault, Juergen Gross

Build the parameters for the new kernel, consisting of the
hvm_start_info struct, the memory map and the command line.
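
The resulting layout (info struct, then memory map, then command line)
can be sketched with reduced stand-in structures. struct sketch_info and
struct sketch_mmap below are hypothetical and carry only the fields
needed for the example; they are not the Xen public structs. The base
argument is whatever address the embedded pointers should be relative
to, which is left to the caller in this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, reduced stand-ins for the real start info structures. */
struct sketch_info {
    uint64_t cmdline_paddr;
    uint64_t memmap_paddr;
    uint32_t memmap_entries;
    uint32_t pad;
};

struct sketch_mmap {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
    uint32_t reserved;
};

/*
 * Serialize info, memory map and command line back to back into buf.
 * Returns the number of bytes used, or 0 if buf is too small.
 */
static size_t build_params(unsigned char *buf, size_t buf_len, uint64_t base,
                           const struct sketch_mmap *map, unsigned int n,
                           const char *cmdline)
{
    struct sketch_info info;
    size_t map_len = n * sizeof(*map);
    size_t cmd_len = strlen(cmdline) + 1;
    size_t total = sizeof(info) + map_len + cmd_len;

    assert(buf && map && cmdline);

    if ( total > buf_len )
        return 0;

    memset(&info, 0, sizeof(info));
    info.memmap_paddr = base + sizeof(info);
    info.memmap_entries = n;
    info.cmdline_paddr = info.memmap_paddr + map_len;

    memcpy(buf, &info, sizeof(info));
    memcpy(buf + sizeof(info), map, map_len);
    memcpy(buf + sizeof(info) + map_len, cmdline, cmd_len);

    return total;
}
```

Checking the total size before writing keeps the sketch from overrunning
the buffer when the command line is longer than expected.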

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 arch/x86/kexec.c | 58 ++++++++++++++++++++++++++++++++++++++++++++++++
 include/kexec.h  |  4 ++++
 kexec.c          | 13 ++++++++++-
 3 files changed, 74 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kexec.c b/arch/x86/kexec.c
index 98a478d3..6fc7d02d 100644
--- a/arch/x86/kexec.c
+++ b/arch/x86/kexec.c
@@ -200,6 +200,7 @@ bool kexec_arch_need_analyze_shdrs(void)
 
 static unsigned long kexec_param_loc;
 static unsigned int kexec_param_size;
+static unsigned long kexec_param_mem;
 
 void kexec_set_param_loc(const char *cmdline)
 {
@@ -212,4 +213,61 @@ void kexec_set_param_loc(const char *cmdline)
     kexec_last_addr += kexec_param_size;
     kexec_last_addr = round_pgup(kexec_last_addr);
 }
+
+int kexec_get_entry(const char *cmdline)
+{
+    struct hvm_start_info *info;
+    struct hvm_memmap_table_entry *mmap;
+    unsigned int order;
+    unsigned int i;
+
+    if ( kernel_entry == ~0UL )
+        return ENOEXEC;
+
+    order = get_order(kexec_param_size);
+
+    kexec_param_mem = alloc_pages(order);
+    if ( !kexec_param_mem )
+        return ENOMEM;
+
+    info = (struct hvm_start_info *)kexec_param_mem;
+    memset(info, 0, sizeof(*info));
+    info->magic = XEN_HVM_START_MAGIC_VALUE;
+    info->version = 1;
+    info->cmdline_paddr = kexec_param_mem + sizeof(*info) +
+                          e820_entries * sizeof(struct hvm_memmap_table_entry);
+    info->memmap_paddr = kexec_param_mem + sizeof(*info);
+    info->memmap_entries = e820_entries;
+
+    mmap = (struct hvm_memmap_table_entry *)(info + 1);
+    for ( i = 0; i < e820_entries; i++ )
+    {
+        mmap->addr = e820_map[i].addr;
+        mmap->size = e820_map[i].size;
+        mmap->type = e820_map[i].type;
+        mmap++;
+    }
+
+    strcpy((char *)mmap, cmdline);
+
+    if ( kexec_add_action(KEXEC_COPY, to_virt(kexec_param_loc), info,
+                          kexec_param_size) )
+        return ENOSPC;
+
+    /* The call of the new kernel happens via the physical address! */
+    if ( kexec_add_action(KEXEC_CALL, (void *)kernel_entry,
+                          (void *)kexec_param_loc, 0) )
+        return ENOSPC;
+
+    return 0;
+}
+
+void kexec_get_entry_undo(void)
+{
+    if ( kexec_param_mem )
+    {
+        free_pages((void *)kexec_param_mem, get_order(kexec_param_size));
+        kexec_param_mem = 0;
+    }
+}
 #endif /* CONFIG_KEXEC */
diff --git a/include/kexec.h b/include/kexec.h
index 8a2b552f..7b103dea 100644
--- a/include/kexec.h
+++ b/include/kexec.h
@@ -45,4 +45,8 @@ bool kexec_arch_need_analyze_shdrs(void);
 /* Finalize parameter location and size. */
 void kexec_set_param_loc(const char *cmdline);
 
+/* Get entry point and parameter of new kernel. */
+int kexec_get_entry(const char *cmdline);
+void kexec_get_entry_undo(void);
+
 #endif /* _KEXEC_H */
diff --git a/kexec.c b/kexec.c
index 68457711..0ef8eb35 100644
--- a/kexec.c
+++ b/kexec.c
@@ -177,10 +177,21 @@ int kexec(void *kernel, unsigned long kernel_size, const char *cmdline)
 
     reserve_memory_below(kexec_last_addr);
 
+    ret = kexec_get_entry(cmdline);
+    if ( ret )
+    {
+        printk("kexec: ELF file of new kernel has no valid entry point\n");
+        goto err;
+    }
+
     /* Error exit. */
+    ret = ENOSYS;
+
+ err:
     unreserve_memory_below();
+    kexec_get_entry_undo();
 
-    return ENOSYS;
+    return ret;
 }
 EXPORT_SYMBOL(kexec);
 
-- 
2.43.0




* [MINI-OS PATCH 08/12] kexec: move used pages away for new kernel
  2025-03-21  9:24 [MINI-OS PATCH 00/12] kexec: add kexec support to Mini-OS Juergen Gross
                   ` (6 preceding siblings ...)
  2025-03-21  9:24 ` [MINI-OS PATCH 07/12] kexec: build parameters for new kernel Juergen Gross
@ 2025-03-21  9:24 ` Juergen Gross
  2025-06-14 17:19   ` Jason Andryuk
  2025-03-21  9:24 ` [MINI-OS PATCH 09/12] mm: change set_readonly() to change_readonly() Juergen Gross
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 37+ messages in thread
From: Juergen Gross @ 2025-03-21  9:24 UTC (permalink / raw)
  To: minios-devel, xen-devel; +Cc: samuel.thibault, Juergen Gross

Copying the new kexec kernel must not overwrite any pages still needed
during this process. These are in particular the GDT, IDT and page tables.

Move those to newly allocated pages and update any related pointers.

In case the kexec process is cancelled later, don't undo any page table
moves, as the system can just continue to use the new layout. As the
original pages are freed, no memory is leaked.

GDT and IDT should be reverted to their original locations, as their
original memory can't be freed due to not being whole pages.
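
The basic "copy, repoint, release" step for one page can be sketched on
a toy memory model. Everything here is hypothetical: the page size, the
array standing in for physical memory and the bump allocator, which
ignores pages already in use at or above the boundary. Real code also
has to flush the TLB after changing a mapping.

```c
#include <assert.h>
#include <string.h>

#define SK_PAGE_SIZE 16U   /* toy page size */
#define SK_NR_PAGES   8U   /* toy memory size in pages */

static unsigned char sk_mem[SK_NR_PAGES][SK_PAGE_SIZE];
static unsigned int sk_next_free;   /* trivial bump allocator */

/* Reset the toy memory; new pages are handed out from boundary upwards. */
static void sk_init(unsigned int boundary)
{
    memset(sk_mem, 0, sizeof(sk_mem));
    sk_next_free = boundary;
}

/*
 * Move the page referenced by *entry above the boundary if needed.
 * Returns 0 on success (or if nothing had to be done), -1 if no page
 * at or above the boundary is left.
 */
static int sk_move_if_below(unsigned int *entry, unsigned int boundary)
{
    unsigned int new_pfn;

    assert(*entry < SK_NR_PAGES);

    if ( *entry >= boundary )
        return 0;
    if ( sk_next_free >= SK_NR_PAGES )
        return -1;

    new_pfn = sk_next_free++;
    memcpy(sk_mem[new_pfn], sk_mem[*entry], SK_PAGE_SIZE);
    *entry = new_pfn;   /* the old page could be released at this point */

    return 0;
}
```

Updating the reference only after the copy has completed means the
reference always points at valid contents.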

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 arch/x86/mm.c   | 126 ++++++++++++++++++++++++++++++++++++++++++++++++
 include/kexec.h |   5 ++
 kexec.c         |   6 +++
 3 files changed, 137 insertions(+)

diff --git a/arch/x86/mm.c b/arch/x86/mm.c
index 3ba6d917..a71eb192 100644
--- a/arch/x86/mm.c
+++ b/arch/x86/mm.c
@@ -42,6 +42,7 @@
 #include <mini-os/paravirt.h>
 #include <mini-os/types.h>
 #include <mini-os/lib.h>
+#include <mini-os/kexec.h>
 #include <mini-os/xmalloc.h>
 #include <mini-os/e820.h>
 #include <xen/memory.h>
@@ -923,3 +924,128 @@ unsigned long map_frame_virt(unsigned long mfn)
     return addr;
 }
 EXPORT_SYMBOL(map_frame_virt);
+
+#ifdef CONFIG_KEXEC
+static unsigned long kexec_gdt;
+static unsigned long kexec_idt;
+
+static int move_pt(unsigned long va, unsigned int lvl, bool is_leaf,
+                   pgentry_t *pte, void *par)
+{
+    unsigned long boundary_pfn = *(unsigned long *)par;
+    unsigned long pfn;
+    void *old_pg, *new_pg;
+
+    if ( is_leaf )
+        return 0;
+
+    pfn = (lvl == PAGETABLE_LEVELS + 1) ? PHYS_PFN(*(unsigned long *)pte)
+                                        : pte_to_mfn(*pte);
+    if ( pfn >= boundary_pfn )
+        return 0;
+
+    new_pg = (void *)alloc_page();
+    if ( !new_pg )
+        return ENOMEM;
+    old_pg = pfn_to_virt(pfn);
+    memcpy(new_pg, old_pg, PAGE_SIZE);
+    if ( lvl == PAGETABLE_LEVELS + 1 )
+        *(pgentry_t **)pte = new_pg;
+    else
+        *pte = ((unsigned long)new_pg & PAGE_MASK) | ptdata[lvl].prot;
+
+    tlb_flush();
+
+    free_page(old_pg);
+
+    return 0;
+}
+
+static int move_leaf(unsigned long va, unsigned int lvl, bool is_leaf,
+                     pgentry_t *pte, void *par)
+{
+    unsigned long boundary_pfn = *(unsigned long *)par;
+    unsigned long pfn;
+    void *old_pg, *new_pg;
+
+    if ( !is_leaf )
+        return 0;
+
+    /* No large page support, all pages must be valid. */
+    if ( (*pte & _PAGE_PSE) || !(*pte & _PAGE_PRESENT) )
+        return EINVAL;
+
+    pfn = pte_to_mfn(*pte);
+    if ( pfn >= boundary_pfn )
+        return 0;
+
+    new_pg = (void *)alloc_page();
+    if ( !new_pg )
+        return ENOMEM;
+    old_pg = pfn_to_virt(pfn);
+    memcpy(new_pg, old_pg, PAGE_SIZE);
+    *pte = ((unsigned long)new_pg & PAGE_MASK) | ptdata[lvl].prot;
+
+    invlpg(va);
+
+    free_page(old_pg);
+
+    return 0;
+}
+
+int kexec_move_used_pages(unsigned long boundary, unsigned long kernel,
+                          unsigned long kernel_size)
+{
+    int ret;
+    unsigned long boundary_pfn = PHYS_PFN(boundary);
+
+    kexec_gdt = alloc_page();
+    if ( !kexec_gdt )
+        return ENOMEM;
+    memcpy((char *)kexec_gdt, &gdt, sizeof(gdt));
+    gdt_ptr.base = kexec_gdt;
+    asm volatile("lgdt %0" : : "m" (gdt_ptr));
+
+    kexec_idt = alloc_page();
+    if ( !kexec_idt )
+        return ENOMEM;
+    memcpy((char *)kexec_idt, &idt, sizeof(idt));
+    idt_ptr.base = kexec_idt;
+    asm volatile("lidt %0" : : "m" (idt_ptr));
+
+    /* Top level page table needs special handling. */
+    ret = move_pt(0, PAGETABLE_LEVELS + 1, false, (pgentry_t *)(&pt_base),
+                  &boundary_pfn);
+    if ( ret )
+        return ret;
+    ret = walk_pt(0, ~0UL, move_pt, &boundary_pfn);
+    if ( ret )
+        return ret;
+
+    /* Move new kernel image pages. */
+    ret = walk_pt(kernel, kernel + kernel_size - 1, move_leaf, &boundary_pfn);
+    if ( ret )
+        return ret;
+
+    return 0;
+}
+
+void kexec_move_used_pages_undo(void)
+{
+    if ( kexec_gdt )
+    {
+        gdt_ptr.base = (unsigned long)&gdt;
+        asm volatile("lgdt %0" : : "m" (gdt_ptr));
+        free_page((void *)kexec_gdt);
+        kexec_gdt = 0;
+    }
+
+    if ( kexec_idt )
+    {
+        idt_ptr.base = (unsigned long)&idt;
+        asm volatile("lidt %0" : : "m" (idt_ptr));
+        free_page((void *)kexec_idt);
+        kexec_idt = 0;
+    }
+}
+#endif
diff --git a/include/kexec.h b/include/kexec.h
index 7b103dea..411fa013 100644
--- a/include/kexec.h
+++ b/include/kexec.h
@@ -49,4 +49,9 @@ void kexec_set_param_loc(const char *cmdline);
 int kexec_get_entry(const char *cmdline);
 void kexec_get_entry_undo(void);
 
+/* Move used pages away from new kernel area. */
+int kexec_move_used_pages(unsigned long boundary, unsigned long kernel,
+                          unsigned long kernel_size);
+void kexec_move_used_pages_undo(void);
+
 #endif /* _KEXEC_H */
diff --git a/kexec.c b/kexec.c
index 0ef8eb35..16a0030a 100644
--- a/kexec.c
+++ b/kexec.c
@@ -184,11 +184,17 @@ int kexec(void *kernel, unsigned long kernel_size, const char *cmdline)
         goto err;
     }
 
+    ret = kexec_move_used_pages(kexec_last_addr, (unsigned long)kernel,
+                                kernel_size);
+    if ( ret )
+        goto err;
+
     /* Error exit. */
     ret = ENOSYS;
 
  err:
     unreserve_memory_below();
+    kexec_move_used_pages_undo();
     kexec_get_entry_undo();
 
     return ret;
-- 
2.43.0




* [MINI-OS PATCH 09/12] mm: change set_readonly() to change_readonly()
  2025-03-21  9:24 [MINI-OS PATCH 00/12] kexec: add kexec support to Mini-OS Juergen Gross
                   ` (7 preceding siblings ...)
  2025-03-21  9:24 ` [MINI-OS PATCH 08/12] kexec: move used pages away " Juergen Gross
@ 2025-03-21  9:24 ` Juergen Gross
  2025-06-14 17:25   ` Jason Andryuk
  2025-03-21  9:24 ` [MINI-OS PATCH 10/12] kexec: switch read-only area to be writable again Juergen Gross
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 37+ messages in thread
From: Juergen Gross @ 2025-03-21  9:24 UTC (permalink / raw)
  To: minios-devel, xen-devel; +Cc: samuel.thibault, Juergen Gross

Rename set_readonly() to change_readonly() and add a parameter
specifying whether it should set the kernel to readonly or to writable.
At the same time move the boundary setting from the only caller into
the function itself, avoiding the need to pass the same boundaries
again in the future, when it will be called to make the kernel writable
again. Make the function globally visible in order to allow calling it
from kexec code later.

Merge clear_bootstrap() into change_readonly() and undo its setting of
page 0 to invalid when setting the kernel writable.
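
The per-entry effect of the new parameter can be sketched as a pure
function on a PTE value. SK_PAGE_RW mirrors the x86 R/W bit (bit 1);
the helper names are hypothetical and the sketch leaves out the page
table walk, the PARAVIRT batching and the TLB flush.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SK_PAGE_RW 0x002ULL   /* x86 page table R/W bit */

/* Return the PTE value with the R/W bit cleared or set. */
static uint64_t pte_change_readonly(uint64_t pte, bool readonly)
{
    return readonly ? (pte & ~SK_PAGE_RW) : (pte | SK_PAGE_RW);
}

/* Apply the change to a contiguous run of entries. */
static void range_change_readonly(uint64_t *ptes, size_t n, bool readonly)
{
    size_t i;

    assert(ptes != NULL);

    for ( i = 0; i < n; i++ )
        ptes[i] = pte_change_readonly(ptes[i], readonly);
}
```

Since both directions only touch the R/W bit, switching to readonly and
back restores the original value of an entry that was writable before.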

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 arch/x86/mm.c | 111 +++++++++++++++++++++++++++-----------------------
 include/mm.h  |   3 ++
 2 files changed, 64 insertions(+), 50 deletions(-)

diff --git a/arch/x86/mm.c b/arch/x86/mm.c
index a71eb192..f4419d95 100644
--- a/arch/x86/mm.c
+++ b/arch/x86/mm.c
@@ -405,17 +405,19 @@ static void build_pagetable(unsigned long *start_pfn, unsigned long *max_pfn)
  */
 extern struct shared_info shared_info;
 
-struct set_readonly_par {
+struct change_readonly_par {
     unsigned long etext;
 #ifdef CONFIG_PARAVIRT
     unsigned int count;
 #endif
+    bool readonly;
 };
 
-static int set_readonly_func(unsigned long va, unsigned int lvl, bool is_leaf,
-                             pgentry_t *pte, void *par)
+static int change_readonly_func(unsigned long va, unsigned int lvl,
+                                bool is_leaf, pgentry_t *pte, void *par)
 {
-    struct set_readonly_par *ro = par;
+    struct change_readonly_par *ro = par;
+    pgentry_t newval;
 
     if ( !is_leaf )
         return 0;
@@ -429,9 +431,11 @@ static int set_readonly_func(unsigned long va, unsigned int lvl, bool is_leaf,
         return 0;
     }
 
+    newval = ro->readonly ? (*pte & ~_PAGE_RW) : (*pte | _PAGE_RW);
+
 #ifdef CONFIG_PARAVIRT
     mmu_updates[ro->count].ptr = virt_to_mach(pte);
-    mmu_updates[ro->count].val = *pte & ~_PAGE_RW;
+    mmu_updates[ro->count].val = newval;
     ro->count++;
 
     if ( ro->count == L1_PAGETABLE_ENTRIES )
@@ -442,7 +446,7 @@ static int set_readonly_func(unsigned long va, unsigned int lvl, bool is_leaf,
          ro->count = 0;
     }
 #else
-    *pte &= ~_PAGE_RW;
+    *pte = newval;
 #endif
 
     return 0;
@@ -463,23 +467,6 @@ static void tlb_flush(void)
 }
 #endif
 
-static void set_readonly(void *text, void *etext)
-{
-    struct set_readonly_par setro = { .etext = (unsigned long)etext };
-    unsigned long start_address = PAGE_ALIGN((unsigned long)text);
-
-    printk("setting %p-%p readonly\n", text, etext);
-    walk_pt(start_address, setro.etext, set_readonly_func, &setro);
-
-#ifdef CONFIG_PARAVIRT
-    if ( setro.count &&
-         HYPERVISOR_mmu_update(mmu_updates, setro.count, NULL, DOMID_SELF) < 0)
-        BUG();
-#endif
-
-    tlb_flush();
-}
-
 /*
  * get the PTE for virtual address va if it exists. Otherwise NULL.
  */
@@ -508,6 +495,51 @@ static pgentry_t *get_pgt(unsigned long va)
     return tab;
 }
 
+void change_readonly(bool readonly)
+{
+    struct change_readonly_par ro = {
+        .etext = (unsigned long)&_erodata,
+        .readonly = readonly,
+    };
+    unsigned long start_address = PAGE_ALIGN((unsigned long)&_text);
+#ifdef CONFIG_PARAVIRT
+    pte_t nullpte = { };
+    int rc;
+#else
+    pgentry_t *pgt = get_pgt((unsigned long)&_text);
+#endif
+
+    if ( readonly )
+    {
+#ifdef CONFIG_PARAVIRT
+        if ( (rc = HYPERVISOR_update_va_mapping(0, nullpte, UVMF_INVLPG)) )
+            printk("Unable to unmap NULL page. rc=%d\n", rc);
+#else
+        *pgt = 0;
+        invlpg((unsigned long)&_text);
+#endif
+    }
+    else
+    {
+#ifdef CONFIG_PARAVIRT
+        /* No kexec support with PARAVIRT. */
+        BUG();
+#else
+        *pgt = L1_PROT;
+#endif
+    }
+
+    printk("setting %p-%p readonly\n", &_text, &_erodata);
+    walk_pt(start_address, ro.etext, change_readonly_func, &ro);
+
+#ifdef CONFIG_PARAVIRT
+    if ( ro.count &&
+         HYPERVISOR_mmu_update(mmu_updates, ro.count, NULL, DOMID_SELF) < 0)
+        BUG();
+#endif
+
+    tlb_flush();
+}
 
 /*
  * return a valid PTE for a given virtual address. If PTE does not exist,
@@ -789,31 +821,6 @@ int unmap_frames(unsigned long va, unsigned long num_frames)
 }
 EXPORT_SYMBOL(unmap_frames);
 
-/*
- * Clear some of the bootstrap memory
- */
-static void clear_bootstrap(void)
-{
-#ifdef CONFIG_PARAVIRT
-    pte_t nullpte = { };
-    int rc;
-#else
-    pgentry_t *pgt;
-#endif
-
-    /* Use first page as the CoW zero page */
-    memset(&_text, 0, PAGE_SIZE);
-    mfn_zero = virt_to_mfn((unsigned long) &_text);
-#ifdef CONFIG_PARAVIRT
-    if ( (rc = HYPERVISOR_update_va_mapping(0, nullpte, UVMF_INVLPG)) )
-        printk("Unable to unmap NULL page. rc=%d\n", rc);
-#else
-    pgt = get_pgt((unsigned long)&_text);
-    *pgt = 0;
-    invlpg((unsigned long)&_text);
-#endif
-}
-
 #ifdef CONFIG_PARAVIRT
 void p2m_chk_pfn(unsigned long pfn)
 {
@@ -884,8 +891,12 @@ void arch_init_mm(unsigned long* start_pfn_p, unsigned long* max_pfn_p)
     printk("    max_pfn: %lx\n", max_pfn);
 
     build_pagetable(&start_pfn, &max_pfn);
-    clear_bootstrap();
-    set_readonly(&_text, &_erodata);
+
+    /* Prepare page 0 as CoW page. */
+    memset(&_text, 0, PAGE_SIZE);
+    mfn_zero = virt_to_mfn((unsigned long)&_text);
+
+    change_readonly(true);
 
     *start_pfn_p = start_pfn;
     *max_pfn_p = max_pfn;
diff --git a/include/mm.h b/include/mm.h
index 5775c3e1..0a16d56c 100644
--- a/include/mm.h
+++ b/include/mm.h
@@ -25,6 +25,7 @@
 #ifndef _MM_H_
 #define _MM_H_
 
+#include <stdbool.h>
 #if defined(__i386__)
 #include <xen/arch-x86_32.h>
 #elif defined(__x86_64__)
@@ -92,4 +93,6 @@ extern unsigned long heap, brk, heap_mapped, heap_end;
 int free_physical_pages(xen_pfn_t *mfns, int n);
 void fini_mm(void);
 
+void change_readonly(bool readonly);
+
 #endif /* _MM_H_ */
-- 
2.43.0




* [MINI-OS PATCH 10/12] kexec: switch read-only area to be writable again
  2025-03-21  9:24 [MINI-OS PATCH 00/12] kexec: add kexec support to Mini-OS Juergen Gross
                   ` (8 preceding siblings ...)
  2025-03-21  9:24 ` [MINI-OS PATCH 09/12] mm: change set_readonly() to change_readonly() Juergen Gross
@ 2025-03-21  9:24 ` Juergen Gross
  2025-06-14 17:26   ` Jason Andryuk
  2025-03-21  9:24 ` [MINI-OS PATCH 11/12] kexec: add kexec callback functionality Juergen Gross
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 37+ messages in thread
From: Juergen Gross @ 2025-03-21  9:24 UTC (permalink / raw)
  To: minios-devel, xen-devel; +Cc: samuel.thibault, Juergen Gross

In order to allow writing the new kernel, make the read-only area
covering the current kernel text writable again.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 kexec.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kexec.c b/kexec.c
index 16a0030a..2992f58f 100644
--- a/kexec.c
+++ b/kexec.c
@@ -184,6 +184,8 @@ int kexec(void *kernel, unsigned long kernel_size, const char *cmdline)
         goto err;
     }
 
+    change_readonly(false);
+
     ret = kexec_move_used_pages(kexec_last_addr, (unsigned long)kernel,
                                 kernel_size);
     if ( ret )
@@ -193,6 +195,7 @@ int kexec(void *kernel, unsigned long kernel_size, const char *cmdline)
     ret = ENOSYS;
 
  err:
+    change_readonly(true);
     unreserve_memory_below();
     kexec_move_used_pages_undo();
     kexec_get_entry_undo();
-- 
2.43.0




* [MINI-OS PATCH 11/12] kexec: add kexec callback functionality
  2025-03-21  9:24 [MINI-OS PATCH 00/12] kexec: add kexec support to Mini-OS Juergen Gross
                   ` (9 preceding siblings ...)
  2025-03-21  9:24 ` [MINI-OS PATCH 10/12] kexec: switch read-only area to be writable again Juergen Gross
@ 2025-03-21  9:24 ` Juergen Gross
  2025-06-14 17:34   ` Jason Andryuk
  2025-03-21  9:24 ` [MINI-OS PATCH 12/12] kexec: do the final kexec step Juergen Gross
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 37+ messages in thread
From: Juergen Gross @ 2025-03-21  9:24 UTC (permalink / raw)
  To: minios-devel, xen-devel; +Cc: samuel.thibault, Juergen Gross

Add a kexec_call() macro for registering a function to be called
during kexec(). Each registered function is called with a boolean
parameter "undo", indicating whether a previous call needs to be undone
due to a failure during kexec().

The loop calling all registered callbacks is added to kexec().

While at it, let change_readonly() print whether the area is being set
readonly or writable.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 arch/x86/minios-x86.lds.S |  8 ++++++++
 arch/x86/mm.c             |  3 ++-
 include/kexec.h           |  6 ++++++
 kexec.c                   | 13 +++++++++++++
 4 files changed, 29 insertions(+), 1 deletion(-)
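
The call/undo protocol described above can be sketched in isolation.
This is a minimal sketch under simplifying assumptions: the callbacks
live in a plain array passed by the caller, whereas the patch collects
them in the .kexec_array linker section. sketch_run_kexec_calls() and
the sketch_cb_*() callbacks are hypothetical names used for illustration
only. The kexeccall_t type is the one introduced by this patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

typedef int (*kexeccall_t)(bool undo);

/*
 * Call every registered callback with undo == false. If one of them
 * fails, call the callbacks that already succeeded with undo == true,
 * in reverse order, and return the error of the failing callback.
 * The failing callback itself is not asked to undo anything.
 */
static int sketch_run_kexec_calls(kexeccall_t *calls, size_t n)
{
    size_t i;
    int ret;

    for ( i = 0; i < n; i++ )
    {
        ret = calls[i](false);
        if ( ret )
        {
            while ( i-- > 0 )
                calls[i](true);
            return ret;
        }
    }

    return 0;
}

/* Hypothetical callbacks: 1 means "torn down", 0 means "active again". */
static int sketch_state_a, sketch_state_b;

static int sketch_cb_a(bool undo)
{
    sketch_state_a = undo ? 0 : 1;
    return 0;
}

static int sketch_cb_b(bool undo)
{
    sketch_state_b = undo ? 0 : 1;
    return 0;
}

static int sketch_cb_fail(bool undo)
{
    (void)undo;
    return EBUSY;
}
```

With the macro from this patch, a registration at file scope would read
kexec_call(my_teardown); where my_teardown is a hypothetical function
with the kexeccall_t signature.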

diff --git a/arch/x86/minios-x86.lds.S b/arch/x86/minios-x86.lds.S
index 83ec41ce..35356b34 100644
--- a/arch/x86/minios-x86.lds.S
+++ b/arch/x86/minios-x86.lds.S
@@ -58,6 +58,14 @@ SECTIONS
         }
         PROVIDE (__fini_array_end = .);
 
+#if defined(CONFIG_KEXEC)
+        PROVIDE (__kexec_array_start = .);
+        .kexec_array : {
+                *(.kexec_array)
+        }
+        PROVIDE (__kexec_array_end = .);
+#endif
+
         .ctors : {
                 __CTOR_LIST__ = .;
                 *(.ctors)
diff --git a/arch/x86/mm.c b/arch/x86/mm.c
index f4419d95..26ede6f4 100644
--- a/arch/x86/mm.c
+++ b/arch/x86/mm.c
@@ -529,7 +529,8 @@ void change_readonly(bool readonly)
 #endif
     }
 
-    printk("setting %p-%p readonly\n", &_text, &_erodata);
+    printk("setting %p-%p %s\n", &_text, &_erodata,
+           readonly ? "readonly" : "writable");
     walk_pt(start_address, ro.etext, change_readonly_func, &ro);
 
 #ifdef CONFIG_PARAVIRT
diff --git a/include/kexec.h b/include/kexec.h
index 411fa013..b89c3000 100644
--- a/include/kexec.h
+++ b/include/kexec.h
@@ -18,6 +18,12 @@ struct kexec_action {
 
 extern char _kexec_start[], _kexec_end[];
 extern struct kexec_action kexec_actions[KEXEC_MAX_ACTIONS];
+extern unsigned long __kexec_array_start[], __kexec_array_end[];
+
+typedef int(*kexeccall_t)(bool undo);
+#define kexec_call(func)                                                   \
+    static kexeccall_t __kexeccall_##func __attribute__((__used__))        \
+                       __attribute__((__section__(".kexec_array"))) = func
 
 extern unsigned long kexec_last_addr;
 
diff --git a/kexec.c b/kexec.c
index 2992f58f..2db876e8 100644
--- a/kexec.c
+++ b/kexec.c
@@ -168,6 +168,7 @@ static int analyze_kernel(void *kernel, unsigned long size)
 int kexec(void *kernel, unsigned long kernel_size, const char *cmdline)
 {
     int ret;
+    unsigned long *func;
 
     ret = analyze_kernel(kernel, kernel_size);
     if ( ret )
@@ -191,6 +192,18 @@ int kexec(void *kernel, unsigned long kernel_size, const char *cmdline)
     if ( ret )
         goto err;
 
+    for ( func = __kexec_array_start; func < __kexec_array_end; func++ )
+    {
+        ret = ((kexeccall_t)(*func))(false);
+        if ( ret )
+        {
+            for ( func--; func >= __kexec_array_start; func-- )
+                ((kexeccall_t)(*func))(true);
+
+            goto err;
+        }
+    }
+
     /* Error exit. */
     ret = ENOSYS;
 
-- 
2.43.0




* [MINI-OS PATCH 12/12] kexec: do the final kexec step
  2025-03-21  9:24 [MINI-OS PATCH 00/12] kexec: add kexec support to Mini-OS Juergen Gross
                   ` (10 preceding siblings ...)
  2025-03-21  9:24 ` [MINI-OS PATCH 11/12] kexec: add kexec callback functionality Juergen Gross
@ 2025-03-21  9:24 ` Juergen Gross
  2025-06-14 17:39   ` Jason Andryuk
  2025-03-22 23:54 ` [MINI-OS PATCH 00/12] kexec: add kexec support to Mini-OS Samuel Thibault
  2025-05-07 12:58 ` Juergen Gross
  13 siblings, 1 reply; 37+ messages in thread
From: Juergen Gross @ 2025-03-21  9:24 UTC (permalink / raw)
  To: minios-devel, xen-devel; +Cc: samuel.thibault, Juergen Gross

With all kexec preparations done, activate the new kernel.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 kexec.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)
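
The shared error path now also has to release the page allocated for the
final step, and that path can be reached before the allocation happened.
A minimal sketch of this control flow follows, assuming the pointer
starts out as NULL. sketch_kexec_flow() and its parameters are
hypothetical, and malloc()/free() merely stand in for
alloc_page()/free_page().

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Counts how often the error path released the page (illustration only). */
static int sketch_pages_freed;

/*
 * Hypothetical model of the control flow: 'early_fail' stands for any
 * failure before the page is allocated, 'late_fail' for one after it.
 */
static int sketch_kexec_flow(bool early_fail, bool late_fail)
{
    void *page = NULL;   /* must start as NULL: 'err' may be reached early */
    int ret;

    if ( early_fail )
    {
        ret = EINVAL;
        goto err;
    }

    page = malloc(4096);
    if ( !page )
    {
        ret = ENOMEM;
        goto err;
    }

    if ( late_fail )
    {
        ret = EBUSY;
        goto err;
    }

    /* The real code would not return from here; the sketch just cleans up. */
    free(page);
    return 0;

 err:
    if ( page )
    {
        free(page);
        sketch_pages_freed++;
    }
    return ret;
}
```

The NULL check at the label only gives a reliable answer if the pointer
has a defined value on every path that jumps there.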

diff --git a/kexec.c b/kexec.c
index 2db876e8..85b09959 100644
--- a/kexec.c
+++ b/kexec.c
@@ -169,6 +169,7 @@ int kexec(void *kernel, unsigned long kernel_size, const char *cmdline)
 {
     int ret;
     unsigned long *func;
+    void *kexec_page = NULL;
 
     ret = analyze_kernel(kernel, kernel_size);
     if ( ret )
@@ -192,6 +193,13 @@ int kexec(void *kernel, unsigned long kernel_size, const char *cmdline)
     if ( ret )
         goto err;
 
+    kexec_page = (void *)alloc_page();
+    if ( !kexec_page )
+    {
+        ret = ENOMEM;
+        goto err;
+    }
+
     for ( func = __kexec_array_start; func < __kexec_array_end; func++ )
     {
         ret = ((kexeccall_t)(*func))(false);
@@ -204,10 +212,15 @@ int kexec(void *kernel, unsigned long kernel_size, const char *cmdline)
         }
     }
 
-    /* Error exit. */
-    ret = ENOSYS;
+    /* Activate the new kernel. */
+    do_kexec(kexec_page);
+
+    /* do_kexec() shouldn't return, crash. */
+    BUG();
 
  err:
+    if ( kexec_page )
+        free_page(kexec_page);
     change_readonly(true);
     unreserve_memory_below();
     kexec_move_used_pages_undo();
-- 
2.43.0




* Re: [MINI-OS PATCH 03/12] add elf.h
  2025-03-21  9:24 ` [MINI-OS PATCH 03/12] add elf.h Juergen Gross
@ 2025-03-21 13:51   ` Jan Beulich
  2025-03-21 15:53     ` Jürgen Groß
  0 siblings, 1 reply; 37+ messages in thread
From: Jan Beulich @ 2025-03-21 13:51 UTC (permalink / raw)
  To: Juergen Gross; +Cc: samuel.thibault, minios-devel, xen-devel

On 21.03.2025 10:24, Juergen Gross wrote:
> Add some definitions for accessing an ELF file. Only the file header
> and the program header are needed.
> 
> The main source for those are elfstructs.h and libelf.h from the Xen
> tree. The license boiler plate of those files is being kept in the
> resulting header file.

Maybe the copying was a bit too literal.

> --- /dev/null
> +++ b/include/elf.h
> @@ -0,0 +1,340 @@
> +#ifndef __ELF_H__
> +#define __ELF_H__
> +/*
> + * Copyright (c) 1995, 1996 Erik Theisen.  All rights reserved.
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions
> + * are met:
> + * 1. Redistributions of source code must retain the above copyright
> + *    notice, this list of conditions and the following disclaimer.
> + * 2. Redistributions in binary form must reproduce the above copyright
> + *    notice, this list of conditions and the following disclaimer in the
> + *    documentation and/or other materials provided with the distribution.
> + * 3. The name of the author may not be used to endorse or promote products
> + *    derived from this software without specific prior written permission
> + *
> + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
> + * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
> + * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
> + * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
> + * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
> + * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
> + * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#include <stdbool.h>
> +#include <mini-os/types.h>
> +
> +typedef uint32_t    Elf32_Addr;  /* Unsigned program address */
> +typedef uint32_t    Elf32_Off;   /* Unsigned file offset */
> +typedef uint16_t    Elf32_Half;  /* Unsigned medium integer */
> +typedef uint32_t    Elf32_Word;  /* Unsigned large integer */
> +
> +typedef uint64_t    Elf64_Addr;
> +typedef uint64_t    Elf64_Off;
> +typedef uint16_t    Elf64_Half;
> +typedef uint32_t    Elf64_Word;
> +typedef uint64_t    Elf64_Xword;
> +
> +/* Unique build id string format when using --build-id. */
> +#define NT_GNU_BUILD_ID 3
> +
> +/*
> + * e_ident[] identification indexes
> + * See http://www.caldera.com/developers/gabi/2000-07-17/ch4.eheader.html
> + */
> +#define EI_MAG0        0         /* file ID */
> +#define EI_MAG1        1         /* file ID */
> +#define EI_MAG2        2         /* file ID */
> +#define EI_MAG3        3         /* file ID */
> +#define EI_CLASS       4         /* file class */
> +#define EI_DATA        5         /* data encoding */
> +#define EI_VERSION     6         /* ELF header version */
> +#define EI_OSABI       7         /* OS/ABI ID */
> +#define EI_ABIVERSION  8         /* ABI version */
> +#define EI_PAD         9         /* start of pad bytes */
> +#define EI_NIDENT     16         /* Size of e_ident[] */
> +
> +/* e_ident[] magic number */
> +#define ELFMAG0        0x7f      /* e_ident[EI_MAG0] */
> +#define ELFMAG1        'E'       /* e_ident[EI_MAG1] */
> +#define ELFMAG2        'L'       /* e_ident[EI_MAG2] */
> +#define ELFMAG3        'F'       /* e_ident[EI_MAG3] */
> +#define ELFMAG         "\177ELF" /* magic */
> +#define SELFMAG        4         /* size of magic */
> +
> +/* e_ident[] file class */
> +#define ELFCLASSNONE   0         /* invalid */
> +#define ELFCLASS32     1         /* 32-bit objs */
> +#define ELFCLASS64     2         /* 64-bit objs */
> +#define ELFCLASSNUM    3         /* number of classes */
> +
> +/* e_ident[] data encoding */
> +#define ELFDATANONE    0         /* invalid */
> +#define ELFDATA2LSB    1         /* Little-Endian */
> +#define ELFDATA2MSB    2         /* Big-Endian */
> +#define ELFDATANUM     3         /* number of data encode defines */
> +
> +/* e_ident[] Operating System/ABI */
> +#define ELFOSABI_SYSV         0  /* UNIX System V ABI */
> +#define ELFOSABI_NONE         0  /* Same as ELFOSABI_SYSV */
> +#define ELFOSABI_HPUX         1  /* HP-UX operating system */
> +#define ELFOSABI_NETBSD       2  /* NetBSD */
> +#define ELFOSABI_LINUX        3  /* GNU/Linux */
> +#define ELFOSABI_HURD         4  /* GNU/Hurd */
> +#define ELFOSABI_86OPEN       5  /* 86Open common IA32 ABI */
> +#define ELFOSABI_SOLARIS      6  /* Solaris */
> +#define ELFOSABI_MONTEREY     7  /* Monterey */
> +#define ELFOSABI_IRIX         8  /* IRIX */
> +#define ELFOSABI_FREEBSD      9  /* FreeBSD */
> +#define ELFOSABI_TRU64       10  /* TRU64 UNIX */
> +#define ELFOSABI_MODESTO     11  /* Novell Modesto */
> +#define ELFOSABI_OPENBSD     12  /* OpenBSD */
> +#define ELFOSABI_ARM         97  /* ARM */
> +#define ELFOSABI_STANDALONE 255  /* Standalone (embedded) application */

While I'm happy to see Modesto mentioned in yet another place, I don't
think you need the majority of these?

> +/* e_ident */
> +#define IS_ELF(ehdr) ((ehdr).e_ident[EI_MAG0] == ELFMAG0 && \
> +                      (ehdr).e_ident[EI_MAG1] == ELFMAG1 && \
> +                      (ehdr).e_ident[EI_MAG2] == ELFMAG2 && \
> +                      (ehdr).e_ident[EI_MAG3] == ELFMAG3)
> +
> +/* e_flags */
> +#define EF_ARM_EABI_MASK    0xff000000
> +#define EF_ARM_EABI_UNKNOWN 0x00000000
> +#define EF_ARM_EABI_VER1    0x01000000
> +#define EF_ARM_EABI_VER2    0x02000000
> +#define EF_ARM_EABI_VER3    0x03000000
> +#define EF_ARM_EABI_VER4    0x04000000
> +#define EF_ARM_EABI_VER5    0x05000000
> +
> +/* ELF Header */
> +typedef struct {
> +    unsigned char e_ident[EI_NIDENT]; /* ELF Identification */
> +    Elf32_Half    e_type;        /* object file type */
> +    Elf32_Half    e_machine;     /* machine */
> +    Elf32_Word    e_version;     /* object file version */
> +    Elf32_Addr    e_entry;       /* virtual entry point */
> +    Elf32_Off     e_phoff;       /* program header table offset */
> +    Elf32_Off     e_shoff;       /* section header table offset */
> +    Elf32_Word    e_flags;       /* processor-specific flags */
> +    Elf32_Half    e_ehsize;      /* ELF header size */
> +    Elf32_Half    e_phentsize;   /* program header entry size */
> +    Elf32_Half    e_phnum;       /* number of program header entries */
> +    Elf32_Half    e_shentsize;   /* section header entry size */
> +    Elf32_Half    e_shnum;       /* number of section header entries */
> +    Elf32_Half    e_shstrndx;    /* section header table's "section
> +                                    header string table" entry offset */
> +} Elf32_Ehdr;
> +
> +typedef struct {
> +    unsigned char e_ident[EI_NIDENT]; /* Id bytes */
> +    Elf64_Half    e_type;        /* file type */
> +    Elf64_Half    e_machine;     /* machine type */
> +    Elf64_Word    e_version;     /* version number */
> +    Elf64_Addr    e_entry;       /* entry point */
> +    Elf64_Off     e_phoff;       /* Program hdr offset */
> +    Elf64_Off     e_shoff;       /* Section hdr offset */
> +    Elf64_Word    e_flags;       /* Processor flags */
> +    Elf64_Half    e_ehsize;      /* sizeof ehdr */
> +    Elf64_Half    e_phentsize;   /* Program header entry size */
> +    Elf64_Half    e_phnum;       /* Number of program headers */
> +    Elf64_Half    e_shentsize;   /* Section header entry size */
> +    Elf64_Half    e_shnum;       /* Number of section headers */
> +    Elf64_Half    e_shstrndx;    /* String table index */
> +} Elf64_Ehdr;
> +
> +/* e_type */
> +#define ET_NONE      0           /* No file type */
> +#define ET_REL       1           /* relocatable file */
> +#define ET_EXEC      2           /* executable file */
> +#define ET_DYN       3           /* shared object file */
> +#define ET_CORE      4           /* core file */
> +#define ET_NUM       5           /* number of types */
> +#define ET_LOPROC    0xff00      /* reserved range for processor */
> +#define ET_HIPROC    0xffff      /*   specific e_type */
> +
> +/* e_machine */
> +#define EM_NONE      0           /* No Machine */
> +#define EM_M32       1           /* AT&T WE 32100 */
> +#define EM_SPARC     2           /* SPARC */
> +#define EM_386       3           /* Intel 80386 */
> +#define EM_68K       4           /* Motorola 68000 */
> +#define EM_88K       5           /* Motorola 88000 */
> +#define EM_486       6           /* Intel 80486 - unused? */
> +#define EM_860       7           /* Intel 80860 */
> +#define EM_MIPS      8           /* MIPS R3000 Big-Endian only */
> +/*
> + * Don't know if EM_MIPS_RS4_BE,
> + * EM_SPARC64, EM_PARISC,
> + * or EM_PPC are ABI compliant
> + */
> +#define EM_MIPS_RS4_BE 10        /* MIPS R4000 Big-Endian */
> +#define EM_SPARC64     11        /* SPARC v9 64-bit unoffical */
> +#define EM_PARISC      15        /* HPPA */
> +#define EM_SPARC32PLUS 18        /* Enhanced instruction set SPARC */
> +#define EM_PPC         20        /* PowerPC */
> +#define EM_PPC64       21        /* PowerPC 64-bit */
> +#define EM_ARM         40        /* Advanced RISC Machines ARM */
> +#define EM_ALPHA       41        /* DEC ALPHA */
> +#define EM_SPARCV9     43        /* SPARC version 9 */
> +#define EM_ALPHA_EXP   0x9026    /* DEC ALPHA */
> +#define EM_IA_64       50        /* Intel Merced */
> +#define EM_X86_64      62        /* AMD x86-64 architecture */
> +#define EM_VAX         75        /* DEC VAX */
> +#define EM_AARCH64    183        /* ARM 64-bit */

Here I similarly think some stripping down might help. Doing so would then
also permit to leave out the comment in the middle.

> +/* Version */
> +#define EV_NONE      0           /* Invalid */
> +#define EV_CURRENT   1           /* Current */
> +#define EV_NUM       2           /* number of versions */
> +
> +/* Program Header */
> +typedef struct {
> +    Elf32_Word    p_type;        /* segment type */
> +    Elf32_Off     p_offset;      /* segment offset */
> +    Elf32_Addr    p_vaddr;       /* virtual address of segment */
> +    Elf32_Addr    p_paddr;       /* physical address - ignored? */
> +    Elf32_Word    p_filesz;      /* number of bytes in file for seg. */
> +    Elf32_Word    p_memsz;       /* number of bytes in mem. for seg. */
> +    Elf32_Word    p_flags;       /* flags */
> +    Elf32_Word    p_align;       /* memory alignment */
> +} Elf32_Phdr;
> +
> +typedef struct {
> +    Elf64_Word    p_type;        /* entry type */
> +    Elf64_Word    p_flags;       /* flags */
> +    Elf64_Off     p_offset;      /* offset */
> +    Elf64_Addr    p_vaddr;       /* virtual address */
> +    Elf64_Addr    p_paddr;       /* physical address */
> +    Elf64_Xword   p_filesz;      /* file size */
> +    Elf64_Xword   p_memsz;       /* memory size */
> +    Elf64_Xword   p_align;       /* memory & file alignment */
> +} Elf64_Phdr;
> +
> +/* Segment types - p_type */
> +#define PT_NULL      0           /* unused */
> +#define PT_LOAD      1           /* loadable segment */
> +#define PT_DYNAMIC   2           /* dynamic linking section */
> +#define PT_INTERP    3           /* the RTLD */
> +#define PT_NOTE      4           /* auxiliary information */
> +#define PT_SHLIB     5           /* reserved - purpose undefined */
> +#define PT_PHDR      6           /* program header */
> +#define PT_NUM       7           /* Number of segment types */
> +#define PT_LOPROC    0x70000000  /* reserved range for processor */
> +#define PT_HIPROC    0x7fffffff  /*  specific segment types */
> +
> +/* Segment flags - p_flags */
> +#define PF_X         0x1        /* Executable */
> +#define PF_W         0x2        /* Writable */
> +#define PF_R         0x4        /* Readable */
> +#define PF_MASKPROC  0xf0000000 /* reserved bits for processor */
> +                                /*  specific segment flags */
> +
> +/* Section Header */
> +typedef struct {
> +    Elf32_Word    sh_name;      /* name - index into section header
> +                                   string table section */
> +    Elf32_Word    sh_type;      /* type */
> +    Elf32_Word    sh_flags;     /* flags */
> +    Elf32_Addr    sh_addr;      /* address */
> +    Elf32_Off     sh_offset;    /* file offset */
> +    Elf32_Word    sh_size;      /* section size */
> +    Elf32_Word    sh_link;      /* section header table index link */
> +    Elf32_Word    sh_info;      /* extra information */
> +    Elf32_Word    sh_addralign; /* address alignment */
> +    Elf32_Word    sh_entsize;   /* section entry size */
> +} Elf32_Shdr;
> +
> +typedef struct {
> +    Elf64_Word    sh_name;      /* section name */
> +    Elf64_Word    sh_type;      /* section type */
> +    Elf64_Xword   sh_flags;     /* section flags */
> +    Elf64_Addr    sh_addr;      /* virtual address */
> +    Elf64_Off     sh_offset;    /* file offset */
> +    Elf64_Xword   sh_size;      /* section size */
> +    Elf64_Word    sh_link;      /* link to another */
> +    Elf64_Word    sh_info;      /* misc info */
> +    Elf64_Xword   sh_addralign; /* memory alignment */
> +    Elf64_Xword   sh_entsize;   /* table entry size */
> +} Elf64_Shdr;
> +
> +/* sh_type */
> +#define SHT_NULL        0       /* inactive */
> +#define SHT_PROGBITS    1       /* program defined information */
> +#define SHT_SYMTAB      2       /* symbol table section */
> +#define SHT_STRTAB      3       /* string table section */
> +#define SHT_RELA        4       /* relocation section with addends*/
> +#define SHT_HASH        5       /* symbol hash table section */
> +#define SHT_DYNAMIC     6       /* dynamic section */
> +#define SHT_NOTE        7       /* note section */
> +#define SHT_NOBITS      8       /* no space section */
> +#define SHT_REL         9       /* relation section without addends */
> +#define SHT_SHLIB      10       /* reserved - purpose unknown */
> +#define SHT_DYNSYM     11       /* dynamic symbol table section */
> +#define SHT_NUM        12       /* number of section types */
> +
> +/* Note definitions */
> +typedef struct {
> +    Elf32_Word namesz;
> +    Elf32_Word descsz;
> +    Elf32_Word type;
> +    char data[];
> +} Elf32_Note;
> +
> +typedef struct {
> +    Elf64_Word namesz;
> +    Elf64_Word descsz;
> +    Elf64_Word type;
> +    char data[];
> +} Elf64_Note;
> +
> +/* Abstraction layer for handling 32- and 64-bit ELF files. */
> +
> +typedef union {
> +    Elf32_Ehdr e32;
> +    Elf64_Ehdr e64;
> +} elf_ehdr;
> +
> +static inline bool elf_is_32bit(elf_ehdr *ehdr)
> +{
> +    return ehdr->e32.e_ident[EI_CLASS] == ELFCLASS32;
> +}
> +
> +static inline bool elf_is_64bit(elf_ehdr *ehdr)
> +{
> +    return ehdr->e32.e_ident[EI_CLASS] == ELFCLASS64;
> +}
> +
> +#define ehdr_val(ehdr, elem) (elf_is_32bit(ehdr) ? (ehdr)->e32.elem : (ehdr)->e64.elem)
> +
> +typedef union {
> +    Elf32_Phdr e32;
> +    Elf64_Phdr e64;
> +} elf_phdr;
> +
> +#define phdr_val(ehdr, phdr, elem) (elf_is_32bit(ehdr) ? (phdr)->e32.elem : (phdr)->e64.elem)
> +
> +typedef union {
> +    Elf32_Shdr e32;
> +    Elf64_Shdr e64;
> +} elf_shdr;
> +
> +#define shdr_val(ehdr, shdr, elem) (elf_is_32bit(ehdr) ? (shdr)->e32.elem : (shdr)->e64.elem)
> +
> +typedef union {
> +    Elf32_Note e32;
> +    Elf64_Note e64;
> +} elf_note;
> +
> +#define note_val(ehdr, note, elem) (elf_is_32bit(ehdr) ? (note)->e32.elem : (note)->e64.elem)
> +
> +static inline void *elf_ptr_add(void *ptr, unsigned long add)
> +{
> +    return (char *)ptr + add;

You can omit the cast here, can't you?
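
For reference, a sketch of the two variants side by side. Arithmetic on
void * is a GNU C extension that treats the pointee size as 1, so the
cast-free form assumes a compiler in GNU mode (GCC or Clang); ISO C
requires the cast. Both function names are hypothetical.

```c
/* Variant from the patch: plain ISO C, the cast makes the step size 1. */
static inline void *sketch_ptr_add_cast(void *ptr, unsigned long add)
{
    return (char *)ptr + add;
}

#ifdef __GNUC__
/* Variant without the cast: GNU C extension, sizeof(void) is taken as 1. */
static inline void *sketch_ptr_add_gnu(void *ptr, unsigned long add)
{
    return ptr + add;
}
#endif
```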

You're the maintainer, so you'll know how many of the comments you want to
address. Either way:
Reviewed-by: Jan Beulich <jbeulich@suse.com>

Jan



* Re: [MINI-OS PATCH 03/12] add elf.h
  2025-03-21 13:51   ` Jan Beulich
@ 2025-03-21 15:53     ` Jürgen Groß
  2025-03-24  9:25       ` Jan Beulich
  0 siblings, 1 reply; 37+ messages in thread
From: Jürgen Groß @ 2025-03-21 15:53 UTC (permalink / raw)
  To: Jan Beulich; +Cc: samuel.thibault, minios-devel, xen-devel



On 21.03.25 14:51, Jan Beulich wrote:
> On 21.03.2025 10:24, Juergen Gross wrote:
>> Add some definitions for accessing an ELF file. Only the file header
>> and the program header are needed.
>>
>> The main source for those are elfstructs.h and libelf.h from the Xen
>> tree. The license boiler plate of those files is being kept in the
>> resulting header file.
> 
> Maybe the copying was a bit too literal.
> 
>> --- /dev/null
>> +++ b/include/elf.h
>> @@ -0,0 +1,340 @@
>> +#ifndef __ELF_H__
>> +#define __ELF_H__
>> +/*
>> + * Copyright (c) 1995, 1996 Erik Theisen.  All rights reserved.
>> + *
>> + * Redistribution and use in source and binary forms, with or without
>> + * modification, are permitted provided that the following conditions
>> + * are met:
>> + * 1. Redistributions of source code must retain the above copyright
>> + *    notice, this list of conditions and the following disclaimer.
>> + * 2. Redistributions in binary form must reproduce the above copyright
>> + *    notice, this list of conditions and the following disclaimer in the
>> + *    documentation and/or other materials provided with the distribution.
>> + * 3. The name of the author may not be used to endorse or promote products
>> + *    derived from this software without specific prior written permission
>> + *
>> + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
>> + * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
>> + * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
>> + * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
>> + * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
>> + * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
>> + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
>> + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
>> + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
>> + * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
>> + */
>> +
>> +#include <stdbool.h>
>> +#include <mini-os/types.h>
>> +
>> +typedef uint32_t    Elf32_Addr;  /* Unsigned program address */
>> +typedef uint32_t    Elf32_Off;   /* Unsigned file offset */
>> +typedef uint16_t    Elf32_Half;  /* Unsigned medium integer */
>> +typedef uint32_t    Elf32_Word;  /* Unsigned large integer */
>> +
>> +typedef uint64_t    Elf64_Addr;
>> +typedef uint64_t    Elf64_Off;
>> +typedef uint16_t    Elf64_Half;
>> +typedef uint32_t    Elf64_Word;
>> +typedef uint64_t    Elf64_Xword;
>> +
>> +/* Unique build id string format when using --build-id. */
>> +#define NT_GNU_BUILD_ID 3
>> +
>> +/*
>> + * e_ident[] identification indexes
>> + * See http://www.caldera.com/developers/gabi/2000-07-17/ch4.eheader.html
>> + */
>> +#define EI_MAG0        0         /* file ID */
>> +#define EI_MAG1        1         /* file ID */
>> +#define EI_MAG2        2         /* file ID */
>> +#define EI_MAG3        3         /* file ID */
>> +#define EI_CLASS       4         /* file class */
>> +#define EI_DATA        5         /* data encoding */
>> +#define EI_VERSION     6         /* ELF header version */
>> +#define EI_OSABI       7         /* OS/ABI ID */
>> +#define EI_ABIVERSION  8         /* ABI version */
>> +#define EI_PAD         9         /* start of pad bytes */
>> +#define EI_NIDENT     16         /* Size of e_ident[] */
>> +
>> +/* e_ident[] magic number */
>> +#define ELFMAG0        0x7f      /* e_ident[EI_MAG0] */
>> +#define ELFMAG1        'E'       /* e_ident[EI_MAG1] */
>> +#define ELFMAG2        'L'       /* e_ident[EI_MAG2] */
>> +#define ELFMAG3        'F'       /* e_ident[EI_MAG3] */
>> +#define ELFMAG         "\177ELF" /* magic */
>> +#define SELFMAG        4         /* size of magic */
>> +
>> +/* e_ident[] file class */
>> +#define ELFCLASSNONE   0         /* invalid */
>> +#define ELFCLASS32     1         /* 32-bit objs */
>> +#define ELFCLASS64     2         /* 64-bit objs */
>> +#define ELFCLASSNUM    3         /* number of classes */
>> +
>> +/* e_ident[] data encoding */
>> +#define ELFDATANONE    0         /* invalid */
>> +#define ELFDATA2LSB    1         /* Little-Endian */
>> +#define ELFDATA2MSB    2         /* Big-Endian */
>> +#define ELFDATANUM     3         /* number of data encode defines */
>> +
>> +/* e_ident[] Operating System/ABI */
>> +#define ELFOSABI_SYSV         0  /* UNIX System V ABI */
>> +#define ELFOSABI_NONE         0  /* Same as ELFOSABI_SYSV */
>> +#define ELFOSABI_HPUX         1  /* HP-UX operating system */
>> +#define ELFOSABI_NETBSD       2  /* NetBSD */
>> +#define ELFOSABI_LINUX        3  /* GNU/Linux */
>> +#define ELFOSABI_HURD         4  /* GNU/Hurd */
>> +#define ELFOSABI_86OPEN       5  /* 86Open common IA32 ABI */
>> +#define ELFOSABI_SOLARIS      6  /* Solaris */
>> +#define ELFOSABI_MONTEREY     7  /* Monterey */
>> +#define ELFOSABI_IRIX         8  /* IRIX */
>> +#define ELFOSABI_FREEBSD      9  /* FreeBSD */
>> +#define ELFOSABI_TRU64       10  /* TRU64 UNIX */
>> +#define ELFOSABI_MODESTO     11  /* Novell Modesto */
>> +#define ELFOSABI_OPENBSD     12  /* OpenBSD */
>> +#define ELFOSABI_ARM         97  /* ARM */
>> +#define ELFOSABI_STANDALONE 255  /* Standalone (embedded) application */
> 
> While I'm happy to see Modesto mentioned in yet another place, I don't
> think you need the majority of these?

Hmm, true. In the end I don't need any of those, as the handled binary
won't have any external ABI which the running kernel would need to know
(apart from the PVH boot interface, which isn't OS dependent).
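To illustrate: a loader can validate the identification bytes without
looking at EI_OSABI at all. A minimal sketch, not code from this series;
elf_ident_ok() is a made-up name and the constants are repeated only to
make the example stand alone:

```c
#include <stdbool.h>
#include <stddef.h>

/* Subset of e_ident[] indexes and values, same numbers as in elf.h. */
#define EI_MAG0       0
#define EI_MAG1       1
#define EI_MAG2       2
#define EI_MAG3       3
#define EI_CLASS      4
#define EI_DATA       5
#define EI_VERSION    6
#define EI_NIDENT    16

#define ELFCLASS32    1
#define ELFCLASS64    2
#define ELFDATA2LSB   1
#define EV_CURRENT    1

/*
 * Hypothetical helper: accept a little-endian 32- or 64-bit ELF image.
 * EI_OSABI is deliberately not examined.
 */
static bool elf_ident_ok(const unsigned char *ident, size_t len)
{
    if (len < EI_NIDENT)
        return false;
    if (ident[EI_MAG0] != 0x7f || ident[EI_MAG1] != 'E' ||
        ident[EI_MAG2] != 'L' || ident[EI_MAG3] != 'F')
        return false;
    if (ident[EI_CLASS] != ELFCLASS32 && ident[EI_CLASS] != ELFCLASS64)
        return false;
    if (ident[EI_DATA] != ELFDATA2LSB)
        return false;
    return ident[EI_VERSION] == EV_CURRENT;
}
```

Any value in the OS/ABI byte passes this check, which is the behaviour
I'd expect to need for a PVH kernel.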

> 
>> +/* e_ident */
>> +#define IS_ELF(ehdr) ((ehdr).e_ident[EI_MAG0] == ELFMAG0 && \
>> +                      (ehdr).e_ident[EI_MAG1] == ELFMAG1 && \
>> +                      (ehdr).e_ident[EI_MAG2] == ELFMAG2 && \
>> +                      (ehdr).e_ident[EI_MAG3] == ELFMAG3)
>> +
>> +/* e_flags */
>> +#define EF_ARM_EABI_MASK    0xff000000
>> +#define EF_ARM_EABI_UNKNOWN 0x00000000
>> +#define EF_ARM_EABI_VER1    0x01000000
>> +#define EF_ARM_EABI_VER2    0x02000000
>> +#define EF_ARM_EABI_VER3    0x03000000
>> +#define EF_ARM_EABI_VER4    0x04000000
>> +#define EF_ARM_EABI_VER5    0x05000000
>> +
>> +/* ELF Header */
>> +typedef struct {
>> +    unsigned char e_ident[EI_NIDENT]; /* ELF Identification */
>> +    Elf32_Half    e_type;        /* object file type */
>> +    Elf32_Half    e_machine;     /* machine */
>> +    Elf32_Word    e_version;     /* object file version */
>> +    Elf32_Addr    e_entry;       /* virtual entry point */
>> +    Elf32_Off     e_phoff;       /* program header table offset */
>> +    Elf32_Off     e_shoff;       /* section header table offset */
>> +    Elf32_Word    e_flags;       /* processor-specific flags */
>> +    Elf32_Half    e_ehsize;      /* ELF header size */
>> +    Elf32_Half    e_phentsize;   /* program header entry size */
>> +    Elf32_Half    e_phnum;       /* number of program header entries */
>> +    Elf32_Half    e_shentsize;   /* section header entry size */
>> +    Elf32_Half    e_shnum;       /* number of section header entries */
>> +    Elf32_Half    e_shstrndx;    /* section header table's "section
>> +                                    header string table" entry offset */
>> +} Elf32_Ehdr;
>> +
>> +typedef struct {
>> +    unsigned char e_ident[EI_NIDENT]; /* Id bytes */
>> +    Elf64_Half    e_type;        /* file type */
>> +    Elf64_Half    e_machine;     /* machine type */
>> +    Elf64_Word    e_version;     /* version number */
>> +    Elf64_Addr    e_entry;       /* entry point */
>> +    Elf64_Off     e_phoff;       /* Program hdr offset */
>> +    Elf64_Off     e_shoff;       /* Section hdr offset */
>> +    Elf64_Word    e_flags;       /* Processor flags */
>> +    Elf64_Half    e_ehsize;      /* sizeof ehdr */
>> +    Elf64_Half    e_phentsize;   /* Program header entry size */
>> +    Elf64_Half    e_phnum;       /* Number of program headers */
>> +    Elf64_Half    e_shentsize;   /* Section header entry size */
>> +    Elf64_Half    e_shnum;       /* Number of section headers */
>> +    Elf64_Half    e_shstrndx;    /* String table index */
>> +} Elf64_Ehdr;
>> +
>> +/* e_type */
>> +#define ET_NONE      0           /* No file type */
>> +#define ET_REL       1           /* relocatable file */
>> +#define ET_EXEC      2           /* executable file */
>> +#define ET_DYN       3           /* shared object file */
>> +#define ET_CORE      4           /* core file */
>> +#define ET_NUM       5           /* number of types */
>> +#define ET_LOPROC    0xff00      /* reserved range for processor */
>> +#define ET_HIPROC    0xffff      /*   specific e_type */
>> +
>> +/* e_machine */
>> +#define EM_NONE      0           /* No Machine */
>> +#define EM_M32       1           /* AT&T WE 32100 */
>> +#define EM_SPARC     2           /* SPARC */
>> +#define EM_386       3           /* Intel 80386 */
>> +#define EM_68K       4           /* Motorola 68000 */
>> +#define EM_88K       5           /* Motorola 88000 */
>> +#define EM_486       6           /* Intel 80486 - unused? */
>> +#define EM_860       7           /* Intel 80860 */
>> +#define EM_MIPS      8           /* MIPS R3000 Big-Endian only */
>> +/*
>> + * Don't know if EM_MIPS_RS4_BE,
>> + * EM_SPARC64, EM_PARISC,
>> + * or EM_PPC are ABI compliant
>> + */
>> +#define EM_MIPS_RS4_BE 10        /* MIPS R4000 Big-Endian */
>> +#define EM_SPARC64     11        /* SPARC v9 64-bit unofficial */
>> +#define EM_PARISC      15        /* HPPA */
>> +#define EM_SPARC32PLUS 18        /* Enhanced instruction set SPARC */
>> +#define EM_PPC         20        /* PowerPC */
>> +#define EM_PPC64       21        /* PowerPC 64-bit */
>> +#define EM_ARM         40        /* Advanced RISC Machines ARM */
>> +#define EM_ALPHA       41        /* DEC ALPHA */
>> +#define EM_SPARCV9     43        /* SPARC version 9 */
>> +#define EM_ALPHA_EXP   0x9026    /* DEC ALPHA */
>> +#define EM_IA_64       50        /* Intel Merced */
>> +#define EM_X86_64      62        /* AMD x86-64 architecture */
>> +#define EM_VAX         75        /* DEC VAX */
>> +#define EM_AARCH64    183        /* ARM 64-bit */
> 
> Here I similarly think some stripping down might help. Doing so would then
> also permit leaving out the comment in the middle.

Here I'm a little bit more on the fence. Some historical entries can
probably be dropped, but which ones should stay?

> 
>> +/* Version */
>> +#define EV_NONE      0           /* Invalid */
>> +#define EV_CURRENT   1           /* Current */
>> +#define EV_NUM       2           /* number of versions */
>> +
>> +/* Program Header */
>> +typedef struct {
>> +    Elf32_Word    p_type;        /* segment type */
>> +    Elf32_Off     p_offset;      /* segment offset */
>> +    Elf32_Addr    p_vaddr;       /* virtual address of segment */
>> +    Elf32_Addr    p_paddr;       /* physical address - ignored? */
>> +    Elf32_Word    p_filesz;      /* number of bytes in file for seg. */
>> +    Elf32_Word    p_memsz;       /* number of bytes in mem. for seg. */
>> +    Elf32_Word    p_flags;       /* flags */
>> +    Elf32_Word    p_align;       /* memory alignment */
>> +} Elf32_Phdr;
>> +
>> +typedef struct {
>> +    Elf64_Word    p_type;        /* entry type */
>> +    Elf64_Word    p_flags;       /* flags */
>> +    Elf64_Off     p_offset;      /* offset */
>> +    Elf64_Addr    p_vaddr;       /* virtual address */
>> +    Elf64_Addr    p_paddr;       /* physical address */
>> +    Elf64_Xword   p_filesz;      /* file size */
>> +    Elf64_Xword   p_memsz;       /* memory size */
>> +    Elf64_Xword   p_align;       /* memory & file alignment */
>> +} Elf64_Phdr;
>> +
>> +/* Segment types - p_type */
>> +#define PT_NULL      0           /* unused */
>> +#define PT_LOAD      1           /* loadable segment */
>> +#define PT_DYNAMIC   2           /* dynamic linking section */
>> +#define PT_INTERP    3           /* the RTLD */
>> +#define PT_NOTE      4           /* auxiliary information */
>> +#define PT_SHLIB     5           /* reserved - purpose undefined */
>> +#define PT_PHDR      6           /* program header */
>> +#define PT_NUM       7           /* Number of segment types */
>> +#define PT_LOPROC    0x70000000  /* reserved range for processor */
>> +#define PT_HIPROC    0x7fffffff  /*  specific segment types */
>> +
>> +/* Segment flags - p_flags */
>> +#define PF_X         0x1        /* Executable */
>> +#define PF_W         0x2        /* Writable */
>> +#define PF_R         0x4        /* Readable */
>> +#define PF_MASKPROC  0xf0000000 /* reserved bits for processor */
>> +                                /*  specific segment flags */
>> +
>> +/* Section Header */
>> +typedef struct {
>> +    Elf32_Word    sh_name;      /* name - index into section header
>> +                                   string table section */
>> +    Elf32_Word    sh_type;      /* type */
>> +    Elf32_Word    sh_flags;     /* flags */
>> +    Elf32_Addr    sh_addr;      /* address */
>> +    Elf32_Off     sh_offset;    /* file offset */
>> +    Elf32_Word    sh_size;      /* section size */
>> +    Elf32_Word    sh_link;      /* section header table index link */
>> +    Elf32_Word    sh_info;      /* extra information */
>> +    Elf32_Word    sh_addralign; /* address alignment */
>> +    Elf32_Word    sh_entsize;   /* section entry size */
>> +} Elf32_Shdr;
>> +
>> +typedef struct {
>> +    Elf64_Word    sh_name;      /* section name */
>> +    Elf64_Word    sh_type;      /* section type */
>> +    Elf64_Xword   sh_flags;     /* section flags */
>> +    Elf64_Addr    sh_addr;      /* virtual address */
>> +    Elf64_Off     sh_offset;    /* file offset */
>> +    Elf64_Xword   sh_size;      /* section size */
>> +    Elf64_Word    sh_link;      /* link to another */
>> +    Elf64_Word    sh_info;      /* misc info */
>> +    Elf64_Xword   sh_addralign; /* memory alignment */
>> +    Elf64_Xword   sh_entsize;   /* table entry size */
>> +} Elf64_Shdr;
>> +
>> +/* sh_type */
>> +#define SHT_NULL        0       /* inactive */
>> +#define SHT_PROGBITS    1       /* program defined information */
>> +#define SHT_SYMTAB      2       /* symbol table section */
>> +#define SHT_STRTAB      3       /* string table section */
>> +#define SHT_RELA        4       /* relocation section with addends */
>> +#define SHT_HASH        5       /* symbol hash table section */
>> +#define SHT_DYNAMIC     6       /* dynamic section */
>> +#define SHT_NOTE        7       /* note section */
>> +#define SHT_NOBITS      8       /* no space section */
>> +#define SHT_REL         9       /* relocation section without addends */
>> +#define SHT_SHLIB      10       /* reserved - purpose unknown */
>> +#define SHT_DYNSYM     11       /* dynamic symbol table section */
>> +#define SHT_NUM        12       /* number of section types */
>> +
>> +/* Note definitions */
>> +typedef struct {
>> +    Elf32_Word namesz;
>> +    Elf32_Word descsz;
>> +    Elf32_Word type;
>> +    char data[];
>> +} Elf32_Note;
>> +
>> +typedef struct {
>> +    Elf64_Word namesz;
>> +    Elf64_Word descsz;
>> +    Elf64_Word type;
>> +    char data[];
>> +} Elf64_Note;
>> +
>> +/* Abstraction layer for handling 32- and 64-bit ELF files. */
>> +
>> +typedef union {
>> +    Elf32_Ehdr e32;
>> +    Elf64_Ehdr e64;
>> +} elf_ehdr;
>> +
>> +static inline bool elf_is_32bit(elf_ehdr *ehdr)
>> +{
>> +    return ehdr->e32.e_ident[EI_CLASS] == ELFCLASS32;
>> +}
>> +
>> +static inline bool elf_is_64bit(elf_ehdr *ehdr)
>> +{
>> +    return ehdr->e32.e_ident[EI_CLASS] == ELFCLASS64;
>> +}
>> +
>> +#define ehdr_val(ehdr, elem) (elf_is_32bit(ehdr) ? (ehdr)->e32.elem : (ehdr)->e64.elem)
>> +
>> +typedef union {
>> +    Elf32_Phdr e32;
>> +    Elf64_Phdr e64;
>> +} elf_phdr;
>> +
>> +#define phdr_val(ehdr, phdr, elem) (elf_is_32bit(ehdr) ? (phdr)->e32.elem : (phdr)->e64.elem)
>> +
>> +typedef union {
>> +    Elf32_Shdr e32;
>> +    Elf64_Shdr e64;
>> +} elf_shdr;
>> +
>> +#define shdr_val(ehdr, shdr, elem) (elf_is_32bit(ehdr) ? (shdr)->e32.elem : (shdr)->e64.elem)
>> +
>> +typedef union {
>> +    Elf32_Note e32;
>> +    Elf64_Note e64;
>> +} elf_note;
>> +
>> +#define note_val(ehdr, note, elem) (elf_is_32bit(ehdr) ? (note)->e32.elem : (note)->e64.elem)
>> +
>> +static inline void *elf_ptr_add(void *ptr, unsigned long add)
>> +{
>> +    return (char *)ptr + add;
> 
> You can omit the cast here, can't you?

Yes.
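For reference, the two forms differ only in which language rules they
rely on: arithmetic on void * is a GNU C extension (the pointed-to size
is treated as 1), while the char * cast is plain ISO C. A small sketch
with made-up function names, assuming a GNU-compatible compiler such as
gcc or clang:

```c
/* ISO C form: arithmetic on char * is always done in bytes. */
static inline void *ptr_add_cast(void *ptr, unsigned long add)
{
    return (char *)ptr + add;
}

/*
 * GNU C form: arithmetic on void * is an extension that treats the
 * pointed-to size as 1, so the result is the same address.
 */
static inline void *ptr_add_gnu(void *ptr, unsigned long add)
{
    return ptr + add;
}
```

Both return the same address when built with the GNU extension enabled.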

> 
> You're the maintainer, so you'll know how many of the comments you want to
> address. Either way:
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

Thanks,


Juergen

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [MINI-OS PATCH 00/12] kexec: add kexec support to Mini-OS
  2025-03-21  9:24 [MINI-OS PATCH 00/12] kexec: add kexec support to Mini-OS Juergen Gross
                   ` (11 preceding siblings ...)
  2025-03-21  9:24 ` [MINI-OS PATCH 12/12] kexec: do the final kexec step Juergen Gross
@ 2025-03-22 23:54 ` Samuel Thibault
  2025-03-23  7:01   ` Jürgen Groß
  2025-05-07 12:58 ` Juergen Gross
  13 siblings, 1 reply; 37+ messages in thread
From: Samuel Thibault @ 2025-03-22 23:54 UTC (permalink / raw)
  To: Juergen Gross; +Cc: minios-devel, xen-devel

Hello,

Juergen Gross, on Fri, 21 Mar 2025 10:24:39 +0100, wrote:
> Add basic kexec support to Mini-OS for running in x86 PVH mode.

I am wondering if you had considered using libxc to implement this?
The original pv-grub1 is doing it (xen/stubdom/grub/kexec.c)

Samuel


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [MINI-OS PATCH 00/12] kexec: add kexec support to Mini-OS
  2025-03-22 23:54 ` [MINI-OS PATCH 00/12] kexec: add kexec support to Mini-OS Samuel Thibault
@ 2025-03-23  7:01   ` Jürgen Groß
  0 siblings, 0 replies; 37+ messages in thread
From: Jürgen Groß @ 2025-03-23  7:01 UTC (permalink / raw)
  To: Samuel Thibault, minios-devel, xen-devel


On 23.03.25 00:54, Samuel Thibault wrote:
> Hello,
> 
> Juergen Gross, on Fri, 21 Mar 2025 10:24:39 +0100, wrote:
>> Add basic kexec support to Mini-OS for running in x86 PVH mode.
> 
> I am wondering if you had considered using libxc to implement this?
> The original pv-grub1 is doing it (xen/stubdom/grub/kexec.c)

The libxc (or more precisely: libxenguest) usage is needed there only for
construction of the PV specific parts like the initial page tables and
the p2m map. The main kexec functionality as I need it for PVH stubdom
is in grub itself.

Additionally I've worked hard to get rid of non-stable Xen libraries in
Xenstore-stubdom. Adding them again just after removal would be weird.


Juergen

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [MINI-OS PATCH 03/12] add elf.h
  2025-03-21 15:53     ` Jürgen Groß
@ 2025-03-24  9:25       ` Jan Beulich
  0 siblings, 0 replies; 37+ messages in thread
From: Jan Beulich @ 2025-03-24  9:25 UTC (permalink / raw)
  To: Jürgen Groß; +Cc: samuel.thibault, minios-devel, xen-devel

On 21.03.2025 16:53, Jürgen Groß wrote:
> On 21.03.25 14:51, Jan Beulich wrote:
>> On 21.03.2025 10:24, Juergen Gross wrote:
>>> +/* e_type */
>>> +#define ET_NONE      0           /* No file type */
>>> +#define ET_REL       1           /* relocatable file */
>>> +#define ET_EXEC      2           /* executable file */
>>> +#define ET_DYN       3           /* shared object file */
>>> +#define ET_CORE      4           /* core file */
>>> +#define ET_NUM       5           /* number of types */
>>> +#define ET_LOPROC    0xff00      /* reserved range for processor */
>>> +#define ET_HIPROC    0xffff      /*   specific e_type */
>>> +
>>> +/* e_machine */
>>> +#define EM_NONE      0           /* No Machine */
>>> +#define EM_M32       1           /* AT&T WE 32100 */
>>> +#define EM_SPARC     2           /* SPARC */
>>> +#define EM_386       3           /* Intel 80386 */
>>> +#define EM_68K       4           /* Motorola 68000 */
>>> +#define EM_88K       5           /* Motorola 88000 */
>>> +#define EM_486       6           /* Intel 80486 - unused? */
>>> +#define EM_860       7           /* Intel 80860 */
>>> +#define EM_MIPS      8           /* MIPS R3000 Big-Endian only */
>>> +/*
>>> + * Don't know if EM_MIPS_RS4_BE,
>>> + * EM_SPARC64, EM_PARISC,
>>> + * or EM_PPC are ABI compliant
>>> + */
>>> +#define EM_MIPS_RS4_BE 10        /* MIPS R4000 Big-Endian */
>>> +#define EM_SPARC64     11        /* SPARC v9 64-bit unofficial */
>>> +#define EM_PARISC      15        /* HPPA */
>>> +#define EM_SPARC32PLUS 18        /* Enhanced instruction set SPARC */
>>> +#define EM_PPC         20        /* PowerPC */
>>> +#define EM_PPC64       21        /* PowerPC 64-bit */
>>> +#define EM_ARM         40        /* Advanced RISC Machines ARM */
>>> +#define EM_ALPHA       41        /* DEC ALPHA */
>>> +#define EM_SPARCV9     43        /* SPARC version 9 */
>>> +#define EM_ALPHA_EXP   0x9026    /* DEC ALPHA */
>>> +#define EM_IA_64       50        /* Intel Merced */
>>> +#define EM_X86_64      62        /* AMD x86-64 architecture */
>>> +#define EM_VAX         75        /* DEC VAX */
>>> +#define EM_AARCH64    183        /* ARM 64-bit */
>>
>> Here I similarly think some stripping down might help. Doing so would then
>> also permit leaving out the comment in the middle.
> 
> Here I'm a little bit more on the fence. Some historical entries can
> probably be dropped, but which ones should stay?

The ones we presently need for any of the architectures we have (partial)
ports for in the hypervisor, or that the hypervisor targets for compatibility.

Jan


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [MINI-OS PATCH 00/12] kexec: add kexec support to Mini-OS
  2025-03-21  9:24 [MINI-OS PATCH 00/12] kexec: add kexec support to Mini-OS Juergen Gross
                   ` (12 preceding siblings ...)
  2025-03-22 23:54 ` [MINI-OS PATCH 00/12] kexec: add kexec support to Mini-OS Samuel Thibault
@ 2025-05-07 12:58 ` Juergen Gross
  2025-06-13  9:34   ` Juergen Gross
  13 siblings, 1 reply; 37+ messages in thread
From: Juergen Gross @ 2025-05-07 12:58 UTC (permalink / raw)
  To: minios-devel, xen-devel; +Cc: samuel.thibault


Ping?

On 21.03.25 10:24, Juergen Gross wrote:
> Add basic kexec support to Mini-OS for running in x86 PVH mode.
> 
> With this series applied it is possible to activate another kernel
> from within Mini-OS.
> 
> Right now no Xen related teardown is done (so no reset of grant table,
> event channels, PV devices). These should be added via kexec callbacks
> which are added as a framework.
> 
> This is a major building block for support of Xenstore-stubdom live
> update (in fact I've tested the kexec path to work using the PVH
> variant of Xenstore-stubdom).
> 
> Juergen Gross (12):
>    add kexec framework
>    Mini-OS: add final kexec stage
>    mini-os: add elf.h
>    mini-os: analyze new kernel for kexec
>    mini-os: kexec: finalize parameter location and size
>    mini-os: reserve memory below boundary
>    mini-os: kexec: build parameters for new kernel
>    mini-os: kexec: move used pages away for new kernel
>    Mini-OS: mm: change set_readonly() to change_readonly()
>    Mini-OS: kexec: switch read-only area to be writable again
>    mini-os: kexec: add kexec callback functionality
>    mini-os: kexec: do the final kexec step
> 
>   Config.mk                  |   1 +
>   Makefile                   |   1 +
>   arch/x86/kexec.c           | 273 +++++++++++++++++++++++++++++
>   arch/x86/minios-x86.lds.S  |  16 ++
>   arch/x86/mm.c              | 238 ++++++++++++++++++++------
>   arch/x86/testbuild/all-no  |   1 +
>   arch/x86/testbuild/all-yes |   2 +
>   arch/x86/testbuild/kexec   |   4 +
>   arch/x86/x86_hvm.S         |  46 +++++
>   include/elf.h              | 340 +++++++++++++++++++++++++++++++++++++
>   include/kexec.h            |  63 +++++++
>   include/mm.h               |   8 +
>   include/x86/os.h           |   5 +
>   kexec.c                    | 253 +++++++++++++++++++++++++++
>   mm.c                       |  89 +++++++++-
>   15 files changed, 1289 insertions(+), 51 deletions(-)
>   create mode 100644 arch/x86/kexec.c
>   create mode 100644 arch/x86/testbuild/kexec
>   create mode 100644 include/elf.h
>   create mode 100644 include/kexec.h
>   create mode 100644 kexec.c
> 


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [MINI-OS PATCH 00/12] kexec: add kexec support to Mini-OS
  2025-05-07 12:58 ` Juergen Gross
@ 2025-06-13  9:34   ` Juergen Gross
  0 siblings, 0 replies; 37+ messages in thread
From: Juergen Gross @ 2025-06-13  9:34 UTC (permalink / raw)
  To: minios-devel, xen-devel, samuel.thibault


On 07.05.25 14:58, Juergen Gross wrote:
> Ping?

I'd really appreciate some feedback.


Juergen



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [MINI-OS PATCH 01/12] kexec: add kexec framework
  2025-03-21  9:24 ` [MINI-OS PATCH 01/12] kexec: add kexec framework Juergen Gross
@ 2025-06-14 16:40   ` Jason Andryuk
  2025-06-16  5:40     ` Jürgen Groß
  0 siblings, 1 reply; 37+ messages in thread
From: Jason Andryuk @ 2025-06-14 16:40 UTC (permalink / raw)
  To: Juergen Gross; +Cc: minios-devel, xen-devel, samuel.thibault

On Fri, Mar 21, 2025 at 5:25 AM Juergen Gross <jgross@suse.com> wrote:
>
> Add a new config option CONFIG_KEXEC for support of kexec-ing into a
> new mini-os kernel. Add a related kexec.c source and a kexec.h header.
>
> For now allow CONFIG_KEXEC to be set only for PVH variant of mini-os.
>
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---


> diff --git a/arch/x86/testbuild/all-yes b/arch/x86/testbuild/all-yes
> index 8ae489a4..99ba75dd 100644
> --- a/arch/x86/testbuild/all-yes
> +++ b/arch/x86/testbuild/all-yes
> @@ -19,3 +19,5 @@ CONFIG_BALLOON = y
>  CONFIG_USE_XEN_CONSOLE = y
>  # The following are special: they need support from outside
>  CONFIG_LWIP = n
> +# KEXEC only without PARAVIRT

Maybe: "KEXEC not implemented for PARAVIRT"?

> +CONFIG_KEXEC = n

> diff --git a/kexec.c b/kexec.c
> new file mode 100644
> index 00000000..53528169
> --- /dev/null
> +++ b/kexec.c
> @@ -0,0 +1,62 @@

> +
> +#include <errno.h>
> +#include <mini-os/os.h>
> +#include <mini-os/lib.h>
> +#include <mini-os/kexec.h>
> +
> +/*
> + * General approach for kexec support (PVH only) is as follows:
> + *
> + * - New kernel needs to be in memory in form of a ELF file in a virtual

"in the form of an ELF binary"

> + *   memory region.

Maybe just "The new kernel needs to be an ELF binary loaded into the
Mini-OS address space"?

> + * - A new start_info structure is constructed in memory with the final
> + *   memory locations included.
> + * - All memory areas needed for kexec execution are being finalized.
> + * - From here on a graceful failure is no longer possible.
> + * - Grants and event channels are torn down.
> + * - A temporary set of page tables is constructed at a location where it
> + *   doesn't conflict with old and new kernel or start_info.
> + * - The final kexec execution stage is copied to a memory area below 4G which
> + *   doesn't conflict with the target areas of kernel etc.
> + * - Cr3 is switched to the new set of page tables.
> + * - Execution continues in the final execution stage.
> + * - All data is copied to its final addresses.
> + * - Processing is switched to 32-bit mode without address translation.

Maybe "CPU is switched to 32-bit mode with paging disabled."?

Is the following memory layout correct?

[ 0 ... 8MB ] ... [ X ... X + Y ] ... [ Z ...      ]
 Old stubdom        New stubdom         kexec code

kexec code copies New stubdom to 0 and later jumps to New stubdom @ 0

The temporary page tables are to allow old stubdom and kexec code to
be called while overwriting the "Old stubdom" range which would
include the page tables originally used?  Or it can only run the kexec
code once old stubdom is overwritten, right?

I think some comment tweaks would be helpful, but code-wise
everything is okay, so:

Reviewed-by: Jason Andryuk <jason.andryuk@amd.com>

Regards,
Jason


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [MINI-OS PATCH 02/12] kexec: add final kexec stage
  2025-03-21  9:24 ` [MINI-OS PATCH 02/12] kexec: add final kexec stage Juergen Gross
@ 2025-06-14 16:40   ` Jason Andryuk
  2025-06-16  6:13     ` Jürgen Groß
  0 siblings, 1 reply; 37+ messages in thread
From: Jason Andryuk @ 2025-06-14 16:40 UTC (permalink / raw)
  To: Juergen Gross; +Cc: minios-devel, xen-devel, samuel.thibault

On Fri, Mar 21, 2025 at 5:25 AM Juergen Gross <jgross@suse.com> wrote:
>
> Add the code and data definitions of the final kexec stage.
>
> Put the code and related data into a dedicated section in order to be
> able to copy it to another location. For this reason there must be no
> absolute relocations being used in the code or data.
>
> Being functionally related, add a function for adding a final kexec
> action.
>
> Signed-off-by: Juergen Gross <jgross@suse.com>

> --- /dev/null
> +++ b/arch/x86/kexec.c
> @@ -0,0 +1,109 @@

> +
> +/*
> + * Final stage of kexec. Copies all data to the final destinations, zeroes
> + * .bss and activates new kernel.
> + * Must be called with interrupts off. Stack, code and data must be
> + * accessible via identity mapped virtual addresses (virt == phys). Copying
> + * and zeroing is done using virtual addresses.
> + * No relocations inside the function are allowed, as it is copied to an
> + * allocated page before being executed.

"page" is stated here.  Do we need an ASSERT later?

> + */

> +void do_kexec(void *kexec_page)
> +{
> +    unsigned long actions;
> +    unsigned long stack;
> +    unsigned long final;
> +    unsigned long phys;
> +
> +    actions = get_kexec_addr(kexec_page, kexec_actions);
> +    stack = get_kexec_addr(kexec_page, kexec_stack + KEXEC_STACK_LONGS);
> +    final = get_kexec_addr(kexec_page, kexec_final);
> +    phys = get_kexec_addr(kexec_page, kexec_phys);
> +
> +    memcpy(kexec_page, _kexec_start, KEXEC_SECSIZE);
> +    asm("cli\n\t"
> +        "mov %0, %%"ASM_SP"\n\t"
> +        "mov %1, %%"ASM_ARG1"\n\t"
> +        "mov %2, %%"ASM_ARG2"\n\t"
> +        "jmp *%3"
> +        :"=m" (stack), "=m" (actions), "=m" (phys)

Aren't these inputs and not outputs?

> +        :"m" (final));
> +}
> +
> +#endif /* CONFIG_KEXEC */
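
On the operand question: in GCC extended asm the first list after the
template holds outputs and the second holds inputs, so values the asm
only reads would normally go into the second list. A small standalone
sketch of the distinction (load_via_asm() is a made-up name, this is not
the do_kexec() code, and the non-x86 branch is only there so the example
builds everywhere):

```c
/*
 * Illustration only: a value the asm merely reads is listed as an input
 * (after the second colon), a value the asm writes is listed as an
 * output (after the first colon).
 */
static unsigned long load_via_asm(unsigned long value)
{
    unsigned long result;

#if defined(__x86_64__) || defined(__i386__)
    __asm__ __volatile__ ("mov %1, %0"
                          : "=r" (result)   /* output: written by the asm */
                          : "m" (value));   /* input: only read by the asm */
#else
    result = value;                         /* fallback for non-x86 builds */
#endif

    return result;
}
```

With "=m" the compiler is told the asm overwrites the variable, so it is
free to assume the previous contents are dead.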


> diff --git a/include/kexec.h b/include/kexec.h
> index 6fd96774..722be456 100644
> --- a/include/kexec.h
> +++ b/include/kexec.h
> @@ -1,7 +1,34 @@

> +
> +int kexec_add_action(int action, void *dest, void *src, unsigned int len);
> +
> +#define KEXEC_SECSIZE ((unsigned long)_kexec_end - (unsigned long)_kexec_start)

Add a build assertion here?  Or maybe the correct amount is allocated
and it doesn't matter.

Generally looks good.

Regards,
Jason


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [MINI-OS PATCH 04/12] kexec: analyze new kernel for kexec
  2025-03-21  9:24 ` [MINI-OS PATCH 04/12] kexec: analyze new kernel for kexec Juergen Gross
@ 2025-06-14 16:41   ` Jason Andryuk
  2025-06-16  6:42     ` Jürgen Groß
  0 siblings, 1 reply; 37+ messages in thread
From: Jason Andryuk @ 2025-06-14 16:41 UTC (permalink / raw)
  To: Juergen Gross; +Cc: minios-devel, xen-devel, samuel.thibault

On Fri, Mar 21, 2025 at 5:25 AM Juergen Gross <jgross@suse.com> wrote:
>
> Analyze the properties of the new kernel to be loaded by kexec. The
> data needed is:
>
> - upper boundary in final location
> - copy and memory clear operations
> - entry point and entry parameter
>
> Signed-off-by: Juergen Gross <jgross@suse.com>

> +
> +static void check_notes_entry(elf_ehdr *ehdr, void *start, unsigned int len)

I think you should rename this to include read_ since it is necessary
to set kernel_entry.  read_phys32_entry_note() or
read_note_kernel_entry() or some variation.  To me, check_ implies a
boolean return without a side effect.
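
To show what I mean by the naming, here is a rough standalone sketch of
a read_-style helper that walks a note area and stores the entry point.
It is not the code from the patch: the function name, the note_hdr type
and the out-parameter are assumptions for the example, it only reads the
low 32 bits of the descriptor, and it assumes a little-endian host.
XEN_ELFNOTE_PHYS32_ENTRY is 18 in Xen's public elfnote.h.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define XEN_ELFNOTE_PHYS32_ENTRY 18   /* from Xen's public elfnote.h */

/* Note header; name and descriptor follow, each padded to 4 bytes. */
typedef struct {
    uint32_t namesz;
    uint32_t descsz;
    uint32_t type;
} note_hdr;

static uint64_t note_align(uint64_t x)
{
    return (x + 3) & ~(uint64_t)3;
}

/*
 * Hypothetical variant of the function under review: the read_ prefix
 * signals that *entry is written on success.
 */
static bool read_phys32_entry_note(const void *start, size_t len,
                                   uint32_t *entry)
{
    const unsigned char *p = start;
    size_t off = 0;

    while (off + sizeof(note_hdr) <= len) {
        note_hdr hdr;
        uint64_t name_off, desc_off, next;

        memcpy(&hdr, p + off, sizeof(hdr));
        name_off = off + sizeof(hdr);
        desc_off = name_off + note_align(hdr.namesz);
        next = desc_off + note_align(hdr.descsz);
        if (next > len)
            return false;           /* truncated or malformed note */

        if (hdr.type == XEN_ELFNOTE_PHYS32_ENTRY && hdr.namesz == 4 &&
            !memcmp(p + name_off, "Xen", 4) &&
            hdr.descsz >= sizeof(*entry)) {
            memcpy(entry, p + desc_off, sizeof(*entry));
            return true;
        }

        off = (size_t)next;
    }

    return false;                   /* no matching note found */
}
```

The bool return plus the explicit out-parameter makes the side effect
visible at the call site.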

> @@ -54,9 +57,122 @@
>   * - The new kernel is activated.
>   */
>
> -int kexec(void *kernel, unsigned long kernel_size,
> -          const char *cmdline)

> +int kexec(void *kernel, unsigned long kernel_size, const char *cmdline)

NIT: introduce kexec() with the single line form to avoid changing it

Everything else looks good, so preferably with the renaming:

Reviewed-by: Jason Andryuk <jason.andryuk@amd.com>

Regards,
Jason


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [MINI-OS PATCH 05/12] kexec: finalize parameter location and size
  2025-03-21  9:24 ` [MINI-OS PATCH 05/12] kexec: finalize parameter location and size Juergen Gross
@ 2025-06-14 16:43   ` Jason Andryuk
  0 siblings, 0 replies; 37+ messages in thread
From: Jason Andryuk @ 2025-06-14 16:43 UTC (permalink / raw)
  To: Juergen Gross; +Cc: minios-devel, xen-devel, samuel.thibault

On Fri, Mar 21, 2025 at 5:25 AM Juergen Gross <jgross@suse.com> wrote:
>
> Finalize the location and the size of the parameters for the new
> kernel. This is needed in order to avoid allocating new memory in the
> area occupied by the new kernel and parameters.
>
> Signed-off-by: Juergen Gross <jgross@suse.com>

Reviewed-by: Jason Andryuk <jason.andryuk@amd.com>


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [MINI-OS PATCH 06/12] kexec: reserve memory below boundary
  2025-03-21  9:24 ` [MINI-OS PATCH 06/12] kexec: reserve memory below boundary Juergen Gross
@ 2025-06-14 16:56   ` Jason Andryuk
  0 siblings, 0 replies; 37+ messages in thread
From: Jason Andryuk @ 2025-06-14 16:56 UTC (permalink / raw)
  To: Juergen Gross; +Cc: minios-devel, xen-devel, samuel.thibault

On Fri, Mar 21, 2025 at 5:31 AM Juergen Gross <jgross@suse.com> wrote:
>
> In order to support kexec any memory used during copying the new
> kernel to its final destination must not overlap with the destination
> area.
>
> In order to achieve that add a new interface allowing to mark all
> allocatable memory below a specific physical address as not available.
>
> This is done by walking through all chunks of the buddy allocator and
> removing the chunks (or chunk parts) below the boundary. The removed
> chunks are put into a list in order to be able to undo the operation
> in case kexec is failing before doing any unrecoverable system
> modifications.
>
> Any pages freed located below the boundary need to go directly into
> the list of reserved pages instead of the free pool.
>
> Call the new function from kexec code.
>
> Signed-off-by: Juergen Gross <jgross@suse.com>

Reviewed-by: Jason Andryuk <jason.andryuk@amd.com>
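
The reservation step described in the commit message can be sketched roughly as follows. This is only an illustration: it assumes free memory is tracked as a flat array of page-frame ranges instead of Mini-OS's real buddy allocator chunks, and the names `struct range` and `reserve_below()` are hypothetical, not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical half-open range of page frame numbers: [start, end). */
struct range {
    unsigned long start;
    unsigned long end;
};

/*
 * Move every free page below 'boundary' from 'free_r' into 'reserved'.
 * Ranges entirely below the boundary are moved whole; ranges straddling
 * the boundary are split. 'reserved' must have room for *n_free entries.
 * Returns the number of reserved ranges; *n_free is updated in place.
 */
static size_t reserve_below(struct range *free_r, size_t *n_free,
                            struct range *reserved, unsigned long boundary)
{
    size_t kept = 0, n_res = 0;

    assert(free_r && n_free && reserved);

    for (size_t i = 0; i < *n_free; i++) {
        struct range r = free_r[i];

        if (r.end <= boundary) {
            /* Completely below the boundary: reserve the whole range. */
            reserved[n_res++] = r;
            continue;
        }
        if (r.start < boundary) {
            /* Straddles the boundary: reserve only the lower part. */
            reserved[n_res++] = (struct range){ r.start, boundary };
            r.start = boundary;
        }
        free_r[kept++] = r;
    }

    *n_free = kept;
    return n_res;
}
```

Keeping the removed ranges in a separate list is what makes the operation reversible: undoing it only means handing the reserved ranges back to the free pool.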


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [MINI-OS PATCH 07/12] kexec: build parameters for new kernel
  2025-03-21  9:24 ` [MINI-OS PATCH 07/12] kexec: build parameters for new kernel Juergen Gross
@ 2025-06-14 17:02   ` Jason Andryuk
  2025-06-16  7:00     ` Jürgen Groß
  0 siblings, 1 reply; 37+ messages in thread
From: Jason Andryuk @ 2025-06-14 17:02 UTC (permalink / raw)
  To: Juergen Gross; +Cc: minios-devel, xen-devel, samuel.thibault

On Fri, Mar 21, 2025 at 5:30 AM Juergen Gross <jgross@suse.com> wrote:
>
> Build the parameters for the new kernel, consisting of the
> hvm_start_info struct, the memory map and the command line.
>
> Signed-off-by: Juergen Gross <jgross@suse.com>

> @@ -212,4 +213,61 @@ void kexec_set_param_loc(const char *cmdline)

> +
> +    /* The call of the new kernel happens via the physical address! */
> +    if ( kexec_add_action(KEXEC_CALL, (void *)kernel_entry,

Maybe kernel_entry_pa, kernel_phys32_entry, or kernel_phys_entry would
be a better name to make the physical address clear?

Either way:
Reviewed-by: Jason Andryuk <jason.andryuk@amd.com>

Regards,
Jason


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [MINI-OS PATCH 08/12] kexec: move used pages away for new kernel
  2025-03-21  9:24 ` [MINI-OS PATCH 08/12] kexec: move used pages away " Juergen Gross
@ 2025-06-14 17:19   ` Jason Andryuk
  0 siblings, 0 replies; 37+ messages in thread
From: Jason Andryuk @ 2025-06-14 17:19 UTC (permalink / raw)
  To: Juergen Gross; +Cc: minios-devel, xen-devel, samuel.thibault

On Fri, Mar 21, 2025 at 5:30 AM Juergen Gross <jgross@suse.com> wrote:
>
> Copying the new kexec kernel must not overwrite any pages still needed
> during this process. Those are especially the GDT, IDT and page tables.
>
> Move those to new allocated pages and update any related pointers.
>
> In case the kexec process is cancelled later, don't undo any page table
> moves, as the system can just be used with the new layout. By freeing
> the original pages there is no memory leaking.
>
> GDT and IDT should be reverted to their original locations, as their
> original memory can't be freed due to not being whole pages.
>
> Signed-off-by: Juergen Gross <jgross@suse.com>

Reviewed-by: Jason Andryuk <jason.andryuk@amd.com>


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [MINI-OS PATCH 09/12] mm: change set_readonly() to change_readonly()
  2025-03-21  9:24 ` [MINI-OS PATCH 09/12] mm: change set_readonly() to change_readonly() Juergen Gross
@ 2025-06-14 17:25   ` Jason Andryuk
  0 siblings, 0 replies; 37+ messages in thread
From: Jason Andryuk @ 2025-06-14 17:25 UTC (permalink / raw)
  To: Juergen Gross; +Cc: minios-devel, xen-devel, samuel.thibault

On Fri, Mar 21, 2025 at 5:31 AM Juergen Gross <jgross@suse.com> wrote:
>
> Rename set_readonly() to change_readonly() and add a parameter
> specifying whether it should set the kernel to readonly or to writable.
> At the same time move the boundary setting from the only caller into
> the function itself, avoiding the need to use the same boundaries in
> future, when it will be called to set the kernel to writable again.
> Make the function globally visible in order to allow calling it from
> kexec coding later.
>
> Merge clear_bootstrap() into change_readonly() and undo its setting of
> page 0 to invalid when setting the kernel writable.
>
> Signed-off-by: Juergen Gross <jgross@suse.com>

Reviewed-by: Jason Andryuk <jason.andryuk@amd.com>


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [MINI-OS PATCH 10/12] kexec: switch read-only area to be writable again
  2025-03-21  9:24 ` [MINI-OS PATCH 10/12] kexec: switch read-only area to be writable again Juergen Gross
@ 2025-06-14 17:26   ` Jason Andryuk
  0 siblings, 0 replies; 37+ messages in thread
From: Jason Andryuk @ 2025-06-14 17:26 UTC (permalink / raw)
  To: Juergen Gross; +Cc: minios-devel, xen-devel, samuel.thibault

On Fri, Mar 21, 2025 at 5:30 AM Juergen Gross <jgross@suse.com> wrote:
>
> In order to allow writing the new kernel, make the readonly area
> covering current kernel text writable again.
>
> Signed-off-by: Juergen Gross <jgross@suse.com>

Reviewed-by: Jason Andryuk <jason.andryuk@amd.com>


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [MINI-OS PATCH 11/12] kexec: add kexec callback functionality
  2025-03-21  9:24 ` [MINI-OS PATCH 11/12] kexec: add kexec callback functionality Juergen Gross
@ 2025-06-14 17:34   ` Jason Andryuk
  2025-06-16  7:08     ` Jürgen Groß
  0 siblings, 1 reply; 37+ messages in thread
From: Jason Andryuk @ 2025-06-14 17:34 UTC (permalink / raw)
  To: Juergen Gross; +Cc: minios-devel, xen-devel, samuel.thibault

On Fri, Mar 21, 2025 at 5:32 AM Juergen Gross <jgross@suse.com> wrote:
>
> Add a kexec_call() macro which will provide the capability to register
> a function for being called when doing a kexec() call. The called
> functions will be called with a boolean parameter "undo" indicating
> whether a previous call needs to be undone due to a failure during
> kexec().
>
> The related loop to call all callbacks is added to kexec().
>
> Signed-off-by: Juergen Gross <jgross@suse.com>

> diff --git a/arch/x86/mm.c b/arch/x86/mm.c
> index f4419d95..26ede6f4 100644
> --- a/arch/x86/mm.c
> +++ b/arch/x86/mm.c
> @@ -529,7 +529,8 @@ void change_readonly(bool readonly)
>  #endif
>      }
>
> -    printk("setting %p-%p readonly\n", &_text, &_erodata);
> +    printk("setting %p-%p %s\n", &_text, &_erodata,
> +           readonly ? "readonly" : "writable");

Oh, I think this belongs in the earlier change.

With that moved, this one (and the earlier one still)

Code wise:
Reviewed-by: Jason Andryuk <jason.andryuk@amd.com>

But this kexec_call() macro isn't actually used?  xenstore needs this
to prepare for kexec?

Regards,
Jason


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [MINI-OS PATCH 12/12] kexec: do the final kexec step
  2025-03-21  9:24 ` [MINI-OS PATCH 12/12] kexec: do the final kexec step Juergen Gross
@ 2025-06-14 17:39   ` Jason Andryuk
  2025-06-16  7:09     ` Jürgen Groß
  0 siblings, 1 reply; 37+ messages in thread
From: Jason Andryuk @ 2025-06-14 17:39 UTC (permalink / raw)
  To: Juergen Gross; +Cc: minios-devel, xen-devel, samuel.thibault

On Fri, Mar 21, 2025 at 5:30 AM Juergen Gross <jgross@suse.com> wrote:
>
> With all kexec preparations done, activate the new kernel.
>
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
>  kexec.c | 17 +++++++++++++++--
>  1 file changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/kexec.c b/kexec.c
> index 2db876e8..85b09959 100644
> --- a/kexec.c
> +++ b/kexec.c
> @@ -169,6 +169,7 @@ int kexec(void *kernel, unsigned long kernel_size, const char *cmdline)

> @@ -192,6 +193,13 @@ int kexec(void *kernel, unsigned long kernel_size, const char *cmdline)
>      if ( ret )
>          goto err;
>
> +    kexec_page = (void *)alloc_page();

kexec_page is referenced already in do_kexec(), but it hasn't been
hooked up yet, right?  I guess that is okay.

If not an ASSERT on 1 page, then allocate KEXEC_SECSIZE?
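
The two options mentioned here (assert that the section fits into one page, or size the allocation from the section length) both come down to a round-up division. A minimal sketch, assuming 4 KiB pages; `SKETCH_PAGE_SIZE`, `pages_for()` and `fits_in_one_page()` are hypothetical names and not part of the series:

```c
#include <assert.h>
#include <limits.h>

#define SKETCH_PAGE_SIZE 4096UL  /* assumption: 4 KiB pages, as on x86 */

/* Number of whole pages needed to hold 'size' bytes, rounding up. */
static unsigned long pages_for(unsigned long size)
{
    /* Guard against wrap-around in the round-up addition. */
    assert(size <= ULONG_MAX - (SKETCH_PAGE_SIZE - 1));
    return (size + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}

/* Returns 1 if a section of 'size' bytes fits into a single page. */
static int fits_in_one_page(unsigned long size)
{
    return pages_for(size) <= 1;
}
```

Because KEXEC_SECSIZE is derived from linker symbols, it is not a compile-time constant in C, so a check like this can only run at run time unless it is moved into the linker script.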

Regards,
Jason


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [MINI-OS PATCH 01/12] kexec: add kexec framework
  2025-06-14 16:40   ` Jason Andryuk
@ 2025-06-16  5:40     ` Jürgen Groß
  0 siblings, 0 replies; 37+ messages in thread
From: Jürgen Groß @ 2025-06-16  5:40 UTC (permalink / raw)
  To: Jason Andryuk; +Cc: minios-devel, xen-devel, samuel.thibault


Jason,

thanks for having a look at the series! I very much appreciate that!

On 14.06.25 18:40, Jason Andryuk wrote:
> On Fri, Mar 21, 2025 at 5:25 AM Juergen Gross <jgross@suse.com> wrote:
>>
>> Add a new config option CONFIG_KEXEC for support of kexec-ing into a
>> new mini-os kernel. Add a related kexec.c source and a kexec.h header.
>>
>> For now allow CONFIG_KEXEC to be set only for PVH variant of mini-os.
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> ---
> 
> 
>> diff --git a/arch/x86/testbuild/all-yes b/arch/x86/testbuild/all-yes
>> index 8ae489a4..99ba75dd 100644
>> --- a/arch/x86/testbuild/all-yes
>> +++ b/arch/x86/testbuild/all-yes
>> @@ -19,3 +19,5 @@ CONFIG_BALLOON = y
>>   CONFIG_USE_XEN_CONSOLE = y
>>   # The following are special: they need support from outside
>>   CONFIG_LWIP = n
>> +# KEXEC only without PARAVIRT
> 
> Maybe: "KEXEC not implemented for PARAVIRT"?

Fine with me.

> 
>> +CONFIG_KEXEC = n
> 
>> diff --git a/kexec.c b/kexec.c
>> new file mode 100644
>> index 00000000..53528169
>> --- /dev/null
>> +++ b/kexec.c
>> @@ -0,0 +1,62 @@
> 
>> +
>> +#include <errno.h>
>> +#include <mini-os/os.h>
>> +#include <mini-os/lib.h>
>> +#include <mini-os/kexec.h>
>> +
>> +/*
>> + * General approach for kexec support (PVH only) is as follows:
>> + *
>> + * - New kernel needs to be in memory in form of a ELF file in a virtual
> 
> "in the form of an ELF binary"
> 
>> + *   memory region.
> 
> Maybe just "The new kernel needs to be an ELF binary loaded into the
> Mini-OS address space"?

The "virtual memory region" is quite important, as it allows handling
conflicts with the target memory layout on a per-page basis.

> 
>> + * - A new start_info structure is constructed in memory with the final
>> + *   memory locations included.
>> + * - All memory areas needed for kexec execution are being finalized.
>> + * - From here on a graceful failure is no longer possible.
>> + * - Grants and event channels are torn down.
>> + * - A temporary set of page tables is constructed at a location where it
>> + *   doesn't conflict with old and new kernel or start_info.
>> + * - The final kexec execution stage is copied to a memory area below 4G which
>> + *   doesn't conflict with the target areas of kernel etc.
>> + * - Cr3 is switched to the new set of page tables.
>> + * - Execution continues in the final execution stage.
>> + * - All data is copied to its final addresses.
>> + * - Processing is switched to 32-bit mode without address translation.
> 
> Maybe "CPU is switched to 32-bit mode with paging disabled."?

Okay.

> 
> Is the following memory layout correct?
> 
> [ 0 ... 8MB ] ... [ X ... X + Y ] ... [ Z ...      ]
>   Old stubdom        New stubdom         kexec code

With:
O: old stubdom kernel
P: active page tables
N: new stubdom kernel
Z: kexec code.

The guest physical memory layout is more like:
OPOOONP.NN.N.NNN..ZNN..PP..

The target layout of this example (before the final kexec stage) will be:
O.OOO....N.N.NNNP.ZNNP.PPNN

Note that all conflicting N and P entries have been moved to a position
behind the target position of the new kernel. This includes the page
tables in the old kernel which were pre-populated at boot time.

And before passing control to the new kernel it will be:
NNNNNNNNN.........Z........

> kexec code copies New stubdom to 0 and later jumps to New stubdom @ 0

Kind of. The "0" is not hard wired in the kexec code.

> The temporary page tables are to allow old stubdom and kexec code to
> be called while overwriting the "Old stubdom" range which would
> include the page tables originally used?  Or it can only run the kexec
> code once old stubdom is overwritten, right?

Yes.

I just realized that some of the comments are stale now. The current
implementation doesn't set up a new set of page tables, but tweaks
the existing set to avoid conflicts.

> I think some comments tweaks would be helpful, but code-wise
> everything is okay, so:
> 
> Reviewed-by: Jason Andryuk <jason.andryuk@amd.com>

Thanks,


Juergen


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [MINI-OS PATCH 02/12] kexec: add final kexec stage
  2025-06-14 16:40   ` Jason Andryuk
@ 2025-06-16  6:13     ` Jürgen Groß
  0 siblings, 0 replies; 37+ messages in thread
From: Jürgen Groß @ 2025-06-16  6:13 UTC (permalink / raw)
  To: Jason Andryuk; +Cc: minios-devel, xen-devel, samuel.thibault


On 14.06.25 18:40, Jason Andryuk wrote:
> On Fri, Mar 21, 2025 at 5:25 AM Juergen Gross <jgross@suse.com> wrote:
>>
>> Add the code and data definitions of the final kexec stage.
>>
>> Put the code and related data into a dedicated section in order to be
>> able to copy it to another location. For this reason there must be no
>> absolute relocations being used in the code or data.
>>
>> Being functionally related, add a function for adding a final kexec
>> action.
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
> 
>> --- /dev/null
>> +++ b/arch/x86/kexec.c
>> @@ -0,0 +1,109 @@
> 
>> +
>> +/*
>> + * Final stage of kexec. Copies all data to the final destinations, zeroes
>> + * .bss and activates new kernel.
>> + * Must be called with interrupts off. Stack, code and data must be
>> + * accessible via identity mapped virtual addresses (virt == phys). Copying
>> + * and zeroing is done using virtual addresses.
>> + * No relocations inside the function are allowed, as it is copied to an
>> + * allocated page before being executed.
> 
> "page" is stated here.  Do we need an ASSERT later?

Good idea. I'll add an ASSERT() to the linker script in order to catch
such an issue at build time.

> 
>> + */
> 
>> +void do_kexec(void *kexec_page)
>> +{
>> +    unsigned long actions;
>> +    unsigned long stack;
>> +    unsigned long final;
>> +    unsigned long phys;
>> +
>> +    actions = get_kexec_addr(kexec_page, kexec_actions);
>> +    stack = get_kexec_addr(kexec_page, kexec_stack + KEXEC_STACK_LONGS);
>> +    final = get_kexec_addr(kexec_page, kexec_final);
>> +    phys = get_kexec_addr(kexec_page, kexec_phys);
>> +
>> +    memcpy(kexec_page, _kexec_start, KEXEC_SECSIZE);
>> +    asm("cli\n\t"
>> +        "mov %0, %%"ASM_SP"\n\t"
>> +        "mov %1, %%"ASM_ARG1"\n\t"
>> +        "mov %2, %%"ASM_ARG2"\n\t"
>> +        "jmp *%3"
>> +        :"=m" (stack), "=m" (actions), "=m" (phys)
> 
> Aren't these inputs and not outputs?

Oh, of course they are.
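
For illustration, the difference between input and output operands in GCC-style extended asm can be shown with a small stand-alone sketch. It is not the corrected do_kexec() code; `load_via_asm()` is a hypothetical helper, and the non-x86 fallback exists only so that the sketch builds elsewhere:

```c
#include <assert.h>

/*
 * Loads 'value' from memory through inline assembly.
 * 'value' is only read by the asm, so it is an input operand ("m").
 * 'out' is written by the asm, so it is an output operand ("=r").
 */
static unsigned long load_via_asm(unsigned long value)
{
    unsigned long out;

#if defined(__x86_64__) || defined(__i386__)
    __asm__ volatile("mov %1, %0" : "=r" (out) : "m" (value));
#else
    out = value;    /* portable fallback for non-x86 builds */
#endif
    return out;
}

/* Usage example: the loaded value must match what was passed in. */
void load_via_asm_selfcheck(void)
{
    assert(load_via_asm(42UL) == 42UL);
}
```

Declaring a read-only operand as an output ("=m") tells the compiler that the asm overwrites it, so the compiler is free to skip storing the value before the asm statement.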

> 
>> +        :"m" (final));
>> +}
>> +
>> +#endif /* CONFIG_KEXEC */
> 
> 
>> diff --git a/include/kexec.h b/include/kexec.h
>> index 6fd96774..722be456 100644
>> --- a/include/kexec.h
>> +++ b/include/kexec.h
>> @@ -1,7 +1,34 @@
> 
>> +
>> +int kexec_add_action(int action, void *dest, void *src, unsigned int len);
>> +
>> +#define KEXEC_SECSIZE ((unsigned long)_kexec_end - (unsigned long)_kexec_start)
> 
> Add a build assertion here?  Or maybe the correct amount is allocated
> and it doesn't matter.
> 
> Generally looks good.

Thanks,


Juergen


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [MINI-OS PATCH 04/12] kexec: analyze new kernel for kexec
  2025-06-14 16:41   ` Jason Andryuk
@ 2025-06-16  6:42     ` Jürgen Groß
  0 siblings, 0 replies; 37+ messages in thread
From: Jürgen Groß @ 2025-06-16  6:42 UTC (permalink / raw)
  To: Jason Andryuk; +Cc: minios-devel, xen-devel, samuel.thibault


On 14.06.25 18:41, Jason Andryuk wrote:
> On Fri, Mar 21, 2025 at 5:25 AM Juergen Gross <jgross@suse.com> wrote:
>>
>> Analyze the properties of the new kernel to be loaded by kexec. The
>> data needed is:
>>
>> - upper boundary in final location
>> - copy and memory clear operations
>> - entry point and entry parameter
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
> 
>> +
>> +static void check_notes_entry(elf_ehdr *ehdr, void *start, unsigned int len)
> 
> I think you should rename this to include read_ since it is necessary
> to set kernel_entry.  read_phys32_entry_note() or
> read_note_kernel_entry() or some variation.  To me, check_ implies a
> boolean return without a side effect.

I'll go with read_note_entry().
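
The naming distinction discussed above might look like this in a simplified form. The sketch assumes a flat array of made-up note records instead of real ELF notes; `struct note`, `NOTE_TYPE_ENTRY` and the bool return value are hypothetical (the function in the patch returns void):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified note record; not the real ELF note layout. */
struct note {
    unsigned int type;
    unsigned long value;
};

#define NOTE_TYPE_ENTRY 1U  /* placeholder value for the entry-point note */

/* Set as a side effect of read_note_entry(). */
static unsigned long kernel_entry;

/* check_*: pure predicate without side effects. */
static bool check_note_is_entry(const struct note *n)
{
    return n->type == NOTE_TYPE_ENTRY;
}

/* read_*: scans the notes and records the entry point if one is found. */
static bool read_note_entry(const struct note *notes, size_t count)
{
    assert(notes || count == 0);

    for (size_t i = 0; i < count; i++) {
        if (check_note_is_entry(&notes[i])) {
            kernel_entry = notes[i].value;
            return true;
        }
    }
    return false;
}
```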

> 
>> @@ -54,9 +57,122 @@
>>    * - The new kernel is activated.
>>    */
>>
>> -int kexec(void *kernel, unsigned long kernel_size,
>> -          const char *cmdline)
> 
>> +int kexec(void *kernel, unsigned long kernel_size, const char *cmdline)
> 
> NIT: introduce kexec() with the single line form to avoid changing it
> 
> Everything else looks good, so preferably with the renaming:
> 
> Reviewed-by: Jason Andryuk <jason.andryuk@amd.com>

Thanks,


Juergen


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [MINI-OS PATCH 07/12] kexec: build parameters for new kernel
  2025-06-14 17:02   ` Jason Andryuk
@ 2025-06-16  7:00     ` Jürgen Groß
  0 siblings, 0 replies; 37+ messages in thread
From: Jürgen Groß @ 2025-06-16  7:00 UTC (permalink / raw)
  To: Jason Andryuk; +Cc: minios-devel, xen-devel, samuel.thibault


On 14.06.25 19:02, Jason Andryuk wrote:
> On Fri, Mar 21, 2025 at 5:30 AM Juergen Gross <jgross@suse.com> wrote:
>>
>> Build the parameters for the new kernel, consisting of the
>> hvm_start_info struct, the memory map and the command line.
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
> 
>> @@ -212,4 +213,61 @@ void kexec_set_param_loc(const char *cmdline)
> 
>> +
>> +    /* The call of the new kernel happens via the physical address! */
>> +    if ( kexec_add_action(KEXEC_CALL, (void *)kernel_entry,
> 
> Maybe kernel_entry_pa, kernel_phys32_entry, or kernel_phys_entry would
> be a better name to make the physical address clear?

Fine with me.

> 
> Either way:
> Reviewed-by: Jason Andryuk <jason.andryuk@amd.com>

Thanks,


Juergen


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [MINI-OS PATCH 11/12] kexec: add kexec callback functionality
  2025-06-14 17:34   ` Jason Andryuk
@ 2025-06-16  7:08     ` Jürgen Groß
  0 siblings, 0 replies; 37+ messages in thread
From: Jürgen Groß @ 2025-06-16  7:08 UTC (permalink / raw)
  To: Jason Andryuk; +Cc: minios-devel, xen-devel, samuel.thibault


On 14.06.25 19:34, Jason Andryuk wrote:
> On Fri, Mar 21, 2025 at 5:32 AM Juergen Gross <jgross@suse.com> wrote:
>>
>> Add a kexec_call() macro which will provide the capability to register
>> a function for being called when doing a kexec() call. The called
>> functions will be called with a boolean parameter "undo" indicating
>> whether a previous call needs to be undone due to a failure during
>> kexec().
>>
>> The related loop to call all callbacks is added to kexec().
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
> 
>> diff --git a/arch/x86/mm.c b/arch/x86/mm.c
>> index f4419d95..26ede6f4 100644
>> --- a/arch/x86/mm.c
>> +++ b/arch/x86/mm.c
>> @@ -529,7 +529,8 @@ void change_readonly(bool readonly)
>>   #endif
>>       }
>>
>> -    printk("setting %p-%p readonly\n", &_text, &_erodata);
>> +    printk("setting %p-%p %s\n", &_text, &_erodata,
>> +           readonly ? "readonly" : "writable");
> 
> Oh, I think this belongs in the earlier change.

Indeed.

> 
> With that moved, this one (and the earlier one still)
> 
> Code wise:
> Reviewed-by: Jason Andryuk <jason.andryuk@amd.com>
> 
> But this kexec_call() macro isn't actually used?  xenstore needs this
> to prepare for kexec?

This will be needed to e.g. handle FD_CLOEXEC.
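
As a rough illustration of the callback scheme with the "undo" parameter, consider the sketch below. It uses a plain array for registration, whereas the patch uses a kexec_call() macro whose implementation is not shown in this thread. The names, the reverse-order undo, and skipping the failing callback during undo are all assumptions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Called with undo == false to prepare, undo == true to revert. */
typedef int (*kexec_cb_t)(bool undo);

#define MAX_CALLBACKS 8
static kexec_cb_t callbacks[MAX_CALLBACKS];
static size_t n_callbacks;

/* Register a callback; returns 0 on success, -1 if the table is full. */
static int register_kexec_cb(kexec_cb_t cb)
{
    assert(cb != NULL);
    if (n_callbacks == MAX_CALLBACKS)
        return -1;
    callbacks[n_callbacks++] = cb;
    return 0;
}

/*
 * Run all callbacks. If one fails, undo the ones that already succeeded
 * (in reverse order) and return the failing callback's error code.
 */
static int run_kexec_cbs(void)
{
    for (size_t i = 0; i < n_callbacks; i++) {
        int ret = callbacks[i](false);

        if (ret) {
            while (i-- > 0)
                callbacks[i](true);
            return ret;
        }
    }
    return 0;
}

/* Example callbacks: one counts prepare/undo calls, one always fails. */
static int prepared;

static int cb_count(bool undo)
{
    prepared += undo ? -1 : 1;
    return 0;
}

static int cb_fail(bool undo)
{
    (void)undo;
    return -5;
}
```

With cb_count registered before cb_fail, a run prepares the first callback, fails in the second one, and then reverts the first, so its state ends up unchanged.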


Juergen


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [MINI-OS PATCH 12/12] kexec: do the final kexec step
  2025-06-14 17:39   ` Jason Andryuk
@ 2025-06-16  7:09     ` Jürgen Groß
  0 siblings, 0 replies; 37+ messages in thread
From: Jürgen Groß @ 2025-06-16  7:09 UTC (permalink / raw)
  To: Jason Andryuk; +Cc: minios-devel, xen-devel, samuel.thibault


On 14.06.25 19:39, Jason Andryuk wrote:
> On Fri, Mar 21, 2025 at 5:30 AM Juergen Gross <jgross@suse.com> wrote:
>>
>> With all kexec preparations done, activate the new kernel.
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> ---
>>   kexec.c | 17 +++++++++++++++--
>>   1 file changed, 15 insertions(+), 2 deletions(-)
>>
>> diff --git a/kexec.c b/kexec.c
>> index 2db876e8..85b09959 100644
>> --- a/kexec.c
>> +++ b/kexec.c
>> @@ -169,6 +169,7 @@ int kexec(void *kernel, unsigned long kernel_size, const char *cmdline)
> 
>> @@ -192,6 +193,13 @@ int kexec(void *kernel, unsigned long kernel_size, const char *cmdline)
>>       if ( ret )
>>           goto err;
>>
>> +    kexec_page = (void *)alloc_page();
> 
> kexec_page is referenced already in do_kexec(), but it hasn't been
> hooked up yet, right?  I guess that is okay.

Yes, shouldn't cause any issues.

> 
> If not an ASSERT on 1 page, then allocate KEXEC_SECSIZE?

Handled now via an ASSERT() in the linker script.


Juergen


^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2025-06-16  7:10 UTC | newest]

Thread overview: 37+ messages
2025-03-21  9:24 [MINI-OS PATCH 00/12] kexec: add kexec support to Mini-OS Juergen Gross
2025-03-21  9:24 ` [MINI-OS PATCH 01/12] kexec: add kexec framework Juergen Gross
2025-06-14 16:40   ` Jason Andryuk
2025-06-16  5:40     ` Jürgen Groß
2025-03-21  9:24 ` [MINI-OS PATCH 02/12] kexec: add final kexec stage Juergen Gross
2025-06-14 16:40   ` Jason Andryuk
2025-06-16  6:13     ` Jürgen Groß
2025-03-21  9:24 ` [MINI-OS PATCH 03/12] add elf.h Juergen Gross
2025-03-21 13:51   ` Jan Beulich
2025-03-21 15:53     ` Jürgen Groß
2025-03-24  9:25       ` Jan Beulich
2025-03-21  9:24 ` [MINI-OS PATCH 04/12] kexec: analyze new kernel for kexec Juergen Gross
2025-06-14 16:41   ` Jason Andryuk
2025-06-16  6:42     ` Jürgen Groß
2025-03-21  9:24 ` [MINI-OS PATCH 05/12] kexec: finalize parameter location and size Juergen Gross
2025-06-14 16:43   ` Jason Andryuk
2025-03-21  9:24 ` [MINI-OS PATCH 06/12] kexec: reserve memory below boundary Juergen Gross
2025-06-14 16:56   ` Jason Andryuk
2025-03-21  9:24 ` [MINI-OS PATCH 07/12] kexec: build parameters for new kernel Juergen Gross
2025-06-14 17:02   ` Jason Andryuk
2025-06-16  7:00     ` Jürgen Groß
2025-03-21  9:24 ` [MINI-OS PATCH 08/12] kexec: move used pages away " Juergen Gross
2025-06-14 17:19   ` Jason Andryuk
2025-03-21  9:24 ` [MINI-OS PATCH 09/12] mm: change set_readonly() to change_readonly() Juergen Gross
2025-06-14 17:25   ` Jason Andryuk
2025-03-21  9:24 ` [MINI-OS PATCH 10/12] kexec: switch read-only area to be writable again Juergen Gross
2025-06-14 17:26   ` Jason Andryuk
2025-03-21  9:24 ` [MINI-OS PATCH 11/12] kexec: add kexec callback functionality Juergen Gross
2025-06-14 17:34   ` Jason Andryuk
2025-06-16  7:08     ` Jürgen Groß
2025-03-21  9:24 ` [MINI-OS PATCH 12/12] kexec: do the final kexec step Juergen Gross
2025-06-14 17:39   ` Jason Andryuk
2025-06-16  7:09     ` Jürgen Groß
2025-03-22 23:54 ` [MINI-OS PATCH 00/12] kexec: add kexec support to Mini-OS Samuel Thibault
2025-03-23  7:01   ` Jürgen Groß
2025-05-07 12:58 ` Juergen Gross
2025-06-13  9:34   ` Juergen Gross
