public inbox for u-boot@lists.denx.de
 help / color / mirror / Atom feed
* [RFC PATCH 0/2] Coroutines
@ 2025-01-20 13:50 Jerome Forissier
  2025-01-20 13:50 ` [RFC PATCH 1/2] Introduce coroutines framework Jerome Forissier
  2025-01-20 13:50 ` [RFC PATCH 2/2] efi_loader: optimize efi_init_obj_list() with coroutines Jerome Forissier
  0 siblings, 2 replies; 6+ messages in thread
From: Jerome Forissier @ 2025-01-20 13:50 UTC (permalink / raw)
  To: u-boot; +Cc: Ilias Apalodimas, Jerome Forissier

This series introduces a simple coroutines framework and uses it to
speed up the efi_init_obj_list() function. It is just an example to
trigger discussions; hopefully other places in U-Boot can benefit from
a similar treatment. Suggestions are welcome.

I came up with this idea after analyzing some profiling data (ftrace)
taken during the execution of the "printenv -e" command on a Kria KV260
board. When the command is first invoked, efi_init_obj_list() is called
which takes a significant amount of time (2.872 seconds). This time can
be split into 0.570 s for efi_disks_register(), 0.811 s for
efi_tcg2_register() and 1.418 s for efi_tcg2_do_initial_measurement().
All the other child functions are much quicker. Another interesting
observation is that a large part of the time is actually spent waiting
in __udelay(). More precisely:
- For efi_disks_register(): 421 ms / 570 ms = 73.8 % spent in __udelay()
- For efi_tcg2_register(): 805 ms / 811 ms = 99.1 % spent in __udelay()
- For efi_tcg2_do_initial_measurement(): 1395.025 ms / 1418.372 ms =
  98.3 % spent in __udelay()

Given the above data, it was reasonable to think that a nice speedup
could be obtained if these functions could somehow be run in parallel.
efi_tcg2_do_initial_measurement() unfortunately needs to be excluded
because it depends on the first two. But efi_disks_register() and
efi_tcg2_register() are clearly independant and are therefore good
candidates for concurrent execution.

So, I considered two options:
- Spin up a secondary core to take care of one function while the other
one runs on the main core
- Introduce some kind of software scheduling
I quickly ruled out the first one for several reasons: initializing a
secondary core is typically quite hardware-specific, it would not scale
well if more functions than available cores would need to be run in
parallel, it would make debugging harder etc.
Software scheduling however can be accomplished quite easily, especially
since we don't need to consider preemptive multitasking. Coroutines [1]
for example can perfectly do the job. They provide a way to save and
restore the execution context (registers and stack). Here is how it
look like:

static void do_some_work(int v)
{
	int i;
	
	for (i = 0; i < 5; i++) {
		printf("%d", v);
		/* Save context and resume main thread */
		co_yield();
	}
}

static void co1(void *arg)
{
	do_some_work(1);
	/* Mark coroutine as "done" and resume main thread */
	co_exit();
}

static void co2(void *arg)
{
	do_some_work(2);
	co_exit();
}

void main_thread(void)
{
	struct co *co1, *co2;

	co1 = co_create(co1, ...);
	co2 = co_create(co2, ...);

	do {
		printf("A");
		if (!co1->done) {
			/* Invoke or resume first coroutine */
			co_resume(co1);
		}
		printf("B");
		if (!cor21->done) {
			/* Invoke or resume second coroutine */
			co_resume(co2);
		}
	} while (!(co1->done && co2->done));

	/* At this point, co1 and co2 have both called co_exit() */
}

The above example would print: A1B2A1B2A1B2A1B2A1B2.

- The first commit introduces the coroutine framework, loosely based on
libaco [2]. The code was simplified and reformatted to better suit
U-Boot.
- The second commit modifies efi_init_obj_list() in order to turn
efi_disks_register() and efi_tcg2_register() into coroutines when
COROUTINES in enabled.

[1] https://en.wikipedia.org/wiki/Coroutine
[2] https://github.com/hnes/libaco/

Jerome Forissier (2):
  Introduce coroutines framework
  efi_loader: optimize efi_init_obj_list() with coroutines

 arch/arm/cpu/armv8/Makefile     |   1 +
 arch/arm/cpu/armv8/co_switch.S  |  35 +++++++
 arch/x86/cpu/i386/Makefile      |   1 +
 arch/x86/cpu/i386/co_switch.S   |  26 +++++
 arch/x86/cpu/x86_64/Makefile    |   2 +
 arch/x86/cpu/x86_64/co_switch.S |  26 +++++
 include/coroutines.h            | 151 +++++++++++++++++++++++++++
 lib/Kconfig                     |  10 ++
 lib/Makefile                    |   2 +
 lib/coroutines.c                | 176 ++++++++++++++++++++++++++++++++
 lib/efi_loader/efi_setup.c      | 113 +++++++++++++++++++-
 lib/time.c                      |  14 ++-
 12 files changed, 552 insertions(+), 5 deletions(-)
 create mode 100644 arch/arm/cpu/armv8/co_switch.S
 create mode 100644 arch/x86/cpu/i386/co_switch.S
 create mode 100644 arch/x86/cpu/x86_64/co_switch.S
 create mode 100644 include/coroutines.h
 create mode 100644 lib/coroutines.c

-- 
2.43.0


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [RFC PATCH 1/2] Introduce coroutines framework
  2025-01-20 13:50 [RFC PATCH 0/2] Coroutines Jerome Forissier
@ 2025-01-20 13:50 ` Jerome Forissier
  2025-01-21 11:44   ` Michal Simek
  2025-01-20 13:50 ` [RFC PATCH 2/2] efi_loader: optimize efi_init_obj_list() with coroutines Jerome Forissier
  1 sibling, 1 reply; 6+ messages in thread
From: Jerome Forissier @ 2025-01-20 13:50 UTC (permalink / raw)
  To: u-boot
  Cc: Ilias Apalodimas, Jerome Forissier, Tom Rini, Simon Glass,
	Bin Meng, Patrick Rudolph, Sughosh Ganu, Michal Simek,
	Heinrich Schuchardt, Raymond Mao

Adds the COROUTINES Kconfig symbol which introduces a new internal API
for coroutines support. As explained in the Kconfig file, this is meant
to provide some kind of cooperative multi-tasking with the goal to
improve performance by overlapping lengthy operations.

The API as well as the implementation is very much inspired from libaco
[1]. The reference implementation is simplified to remove all things
not needed in U-Boot, the coding style is updated, and the aco_ prefix
is replaced by co_.

I believe the stack handling could be simplified: the stack of the main
coroutine could probably probably be used by the secondary coroutines
instead of allocating a new stack dynamically.

Only i386, x86_64 and aarch64 are supported at the moment. Other
architectures need to provide a _co_switch() function in assembly.

Only aarch64 has been tested.

[1] https://github.com/hnes/libaco/

Signed-off-by: Jerome Forissier <jerome.forissier@linaro.org>
---
 arch/arm/cpu/armv8/Makefile     |   1 +
 arch/arm/cpu/armv8/co_switch.S  |  35 +++++++
 arch/x86/cpu/i386/Makefile      |   1 +
 arch/x86/cpu/i386/co_switch.S   |  26 +++++
 arch/x86/cpu/x86_64/Makefile    |   2 +
 arch/x86/cpu/x86_64/co_switch.S |  26 +++++
 include/coroutines.h            | 151 +++++++++++++++++++++++++++
 lib/Kconfig                     |  10 ++
 lib/Makefile                    |   2 +
 lib/coroutines.c                | 176 ++++++++++++++++++++++++++++++++
 10 files changed, 430 insertions(+)
 create mode 100644 arch/arm/cpu/armv8/co_switch.S
 create mode 100644 arch/x86/cpu/i386/co_switch.S
 create mode 100644 arch/x86/cpu/x86_64/co_switch.S
 create mode 100644 include/coroutines.h
 create mode 100644 lib/coroutines.c

diff --git a/arch/arm/cpu/armv8/Makefile b/arch/arm/cpu/armv8/Makefile
index 2e71ff2dc97..6d07b6aa9f9 100644
--- a/arch/arm/cpu/armv8/Makefile
+++ b/arch/arm/cpu/armv8/Makefile
@@ -46,3 +46,4 @@ obj-$(CONFIG_TARGET_BCMNS3) += bcmns3/
 obj-$(CONFIG_XEN) += xen/
 obj-$(CONFIG_ARMV8_CE_SHA1) += sha1_ce_glue.o sha1_ce_core.o
 obj-$(CONFIG_ARMV8_CE_SHA256) += sha256_ce_glue.o sha256_ce_core.o
+obj-$(CONFIG_COROUTINES) += co_switch.o
diff --git a/arch/arm/cpu/armv8/co_switch.S b/arch/arm/cpu/armv8/co_switch.S
new file mode 100644
index 00000000000..2fa6c52935a
--- /dev/null
+++ b/arch/arm/cpu/armv8/co_switch.S
@@ -0,0 +1,35 @@
+/* void _co_switch(struct uco *from_co, struct uco *to_co); */
+.text
+.globl _co_switch
+.type  _co_switch, @function
+_co_switch:
+    // x0: from_co
+    // x1: to_co
+    // from_co and to_co layout: { pc, sp, x19-x29 }
+
+    // Save context to from_co (x0)
+    // AAPCS64 says "A subroutine invocation must preserve the contents of the
+    // registers r19-r29 and SP"
+    adr x2, 1f  // pc we should use to resume after this function
+    mov x3, sp
+    stp x2, x3, [x0, #0]  // pc, sp
+    stp x19, x20, [x0, #16]
+    stp x21, x22, [x0, #32]
+    stp x23, x24, [x0, #48]
+    stp x25, x26, [x0, #64]
+    stp x27, x28, [x0, #80]
+    stp x29, x30, [x0, #96]
+
+    // Load new context from to_co (x1)
+    ldp x2, x3, [x1, #0]  // pc, sp
+    ldp x19, x20, [x1, #16]
+    ldp x21, x22, [x1, #32]
+    ldp x23, x24, [x1, #48]
+    ldp x25, x26, [x1, #64]
+    ldp x27, x28, [x1, #80]
+    ldp x29, x30, [x1, #96]
+    mov sp, x3
+    br x2
+
+1:  // Return to the caller
+    ret
diff --git a/arch/x86/cpu/i386/Makefile b/arch/x86/cpu/i386/Makefile
index 18e152074a7..7066904a41e 100644
--- a/arch/x86/cpu/i386/Makefile
+++ b/arch/x86/cpu/i386/Makefile
@@ -9,3 +9,4 @@ ifndef CONFIG_TPL_BUILD
 obj-y += interrupt.o
 endif
 obj-y += setjmp.o
+obj-$(CONFIG_COROUTINES) += co_switch.o
diff --git a/arch/x86/cpu/i386/co_switch.S b/arch/x86/cpu/i386/co_switch.S
new file mode 100644
index 00000000000..ec4d2d778f7
--- /dev/null
+++ b/arch/x86/cpu/i386/co_switch.S
@@ -0,0 +1,26 @@
+/* void _co_switch(struct uco *from_co, struct uco *to_co); */
+.text
+.globl _co_switch
+.type  _co_switch, @function
+.intel_syntax noprefix
+_co_switch:
+    mov     eax,DWORD PTR [esp+0x4]     // from_co
+    mov     edx,DWORD PTR [esp]         // retaddr
+    lea     ecx,[esp+0x4]               // esp
+    mov     DWORD PTR [eax+0x8],ebp     //<ebp
+    mov     DWORD PTR [eax+0x4],ecx     //<esp
+    mov     DWORD PTR [eax+0x0],edx     //<retaddr
+    mov     DWORD PTR [eax+0xc],edi     //<edi
+    mov     ecx,DWORD PTR [esp+0x8]     // to_co
+    mov     DWORD PTR [eax+0x10],esi    //<esi
+    mov     DWORD PTR [eax+0x14],ebx    //<ebx
+    mov     edx,DWORD PTR [ecx+0x4]     //>esp
+    mov     ebp,DWORD PTR [ecx+0x8]     //>ebp
+    mov     eax,DWORD PTR [ecx+0x0]     //>retaddr
+    mov     edi,DWORD PTR [ecx+0xc]     //>edi
+    mov     esi,DWORD PTR [ecx+0x10]    //>esi
+    mov     ebx,DWORD PTR [ecx+0x14]    //>ebx
+    xor     ecx,ecx
+    mov     esp,edx
+    xor     edx,edx
+    jmp     eax
diff --git a/arch/x86/cpu/x86_64/Makefile b/arch/x86/cpu/x86_64/Makefile
index e929563b2c1..862f522242c 100644
--- a/arch/x86/cpu/x86_64/Makefile
+++ b/arch/x86/cpu/x86_64/Makefile
@@ -8,3 +8,5 @@ obj-y += cpu.o interrupts.o setjmp.o
 ifndef CONFIG_EFI
 obj-y += misc.o
 endif
+
+obj-$(CONFIG_COUROUTINES) += co_switch.o
diff --git a/arch/x86/cpu/x86_64/co_switch.S b/arch/x86/cpu/x86_64/co_switch.S
new file mode 100644
index 00000000000..ec928f2d1f7
--- /dev/null
+++ b/arch/x86/cpu/x86_64/co_switch.S
@@ -0,0 +1,26 @@
+/* void _co_switch(struct uco *from_co, struct uco *to_co); */
+.text
+.globl _co_switch
+.type  _co_switch, @function
+.intel_syntax noprefix
+_co_switch:
+    mov     rdx,QWORD PTR [rsp]      // retaddr
+    lea     rcx,[rsp+0x8]            // rsp
+    mov     QWORD PTR [rdi+0x0], r12
+    mov     QWORD PTR [rdi+0x8], r13
+    mov     QWORD PTR [rdi+0x10],r14
+    mov     QWORD PTR [rdi+0x18],r15
+    mov     QWORD PTR [rdi+0x20],rdx // retaddr
+    mov     QWORD PTR [rdi+0x28],rcx // rsp
+    mov     QWORD PTR [rdi+0x30],rbx
+    mov     QWORD PTR [rdi+0x38],rbp
+    mov     r12,QWORD PTR [rsi+0x0]
+    mov     r13,QWORD PTR [rsi+0x8]
+    mov     r14,QWORD PTR [rsi+0x10]
+    mov     r15,QWORD PTR [rsi+0x18]
+    mov     rax,QWORD PTR [rsi+0x20] // retaddr
+    mov     rcx,QWORD PTR [rsi+0x28] // rsp
+    mov     rbx,QWORD PTR [rsi+0x30]
+    mov     rbp,QWORD PTR [rsi+0x38]
+    mov     rsp,rcx
+    jmp     rax
diff --git a/include/coroutines.h b/include/coroutines.h
new file mode 100644
index 00000000000..2e2fb1170a3
--- /dev/null
+++ b/include/coroutines.h
@@ -0,0 +1,151 @@
+/* SPDX-License-Identifier: Apache-2.0 */
+/*
+ * Copyright 2018 Sen Han <00hnes@gmail.com>
+ * Copyright 2025 Linaro Limited
+ */
+
+#ifndef _COROUTINES_H_
+#define _COROUTINES_H_
+
+#ifndef CONFIG_COROUTINES
+
+static inline void co_yield(void) {}
+static inline void co_exit(void) {}
+
+#else
+
+#ifdef __UBOOT__
+#include <log.h>
+#else
+#include <assert.h>
+#endif
+#include <limits.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <time.h>
+
+#ifdef __i386__
+#define UCO_REG_IDX_RETADDR 0
+#define UCO_REG_IDX_SP 1
+#define UCO_REG_IDX_BP 2
+#elif __x86_64__
+#define UCO_REG_IDX_RETADDR 4
+#define UCO_REG_IDX_SP 5
+#define UCO_REG_IDX_BP 7
+#elif __aarch64__
+#define UCO_REG_IDX_RETADDR 0
+#define UCO_REG_IDX_SP 1
+#else
+#error Architecture no supported
+#endif
+
+struct co_save_stack {
+    void*  ptr;
+    size_t sz;
+    size_t valid_sz;
+    size_t max_cpsz; /* max copy size in bytes */
+};
+
+struct co_stack {
+    void *ptr;
+    size_t sz;
+    void *align_highptr;
+    void *align_retptr;
+    size_t align_validsz;
+    size_t align_limit;
+    struct co *owner;
+    void *real_ptr;
+    size_t real_sz;
+};
+
+struct co {
+    /*
+     * CPU registers state (callee-savec plus SP, PC)
+     */
+#ifdef __i386__
+        void*  reg[6];
+#elif __x86_64__
+        void*  reg[8];
+#elif __aarch64__
+	void *reg[14];  // pc, sp, x19-x29, x30 (lr)
+#endif
+	struct co *main_co;
+	void *arg;
+	bool done;
+
+	void (*fp)(void);
+
+	struct co_save_stack save_stack;
+	struct co_stack *stack;
+};
+
+#if defined(__i386__) || defined(__x86_64__)
+#define UCO_THREAD __thread
+#else
+#define UCO_THREAD
+#endif
+
+extern UCO_THREAD struct co *current_co;
+
+static inline struct co *co_get_co(void)
+{
+	return current_co;
+}
+
+static inline void *co_get_arg(void)
+{
+	return co_get_co()->arg;
+}
+
+struct co_stack *co_stack_new(size_t sz);
+
+void co_stack_destroy(struct co_stack *s);
+
+struct co *co_create(struct co *main_co,
+		     struct co_stack *stack,
+		     size_t save_stack_sz, void (*fp)(void),
+		     void *arg);
+
+void co_resume(struct co *resume_co);
+
+void co_destroy(struct co *co);
+
+void *_co_switch(struct co *from_co, struct co *to_co);
+
+static inline void _co_yield_to_main_co(struct co *yield_co)
+{
+    assert(yield_co);
+    assert(yield_co->main_co);
+    _co_switch(yield_co, yield_co->main_co);
+}
+
+static inline void co_yield(void)
+{
+	if (current_co)
+		_co_yield_to_main_co(current_co);
+}
+
+static inline bool co_is_main_co(struct co *co)
+{
+	return !co->main_co;
+}
+
+static inline void co_exit(void)
+{
+	struct co *co = co_get_co();
+
+	if (!co)
+		return;
+	co->done = true;
+	assert(co->stack->owner == co);
+	co->stack->owner = NULL;
+	co->stack->align_validsz = 0;
+	_co_yield_to_main_co(co);
+	assert(false);
+}
+
+#endif /* CONFIG_COROUTINES */
+#endif /* _COROUTINES_H_ */
diff --git a/lib/Kconfig b/lib/Kconfig
index 8f1a96d98c4..b6c1380b927 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -1226,6 +1226,16 @@ config PHANDLE_CHECK_SEQ
 	  enable this config option to distinguish them using
 	  phandles in fdtdec_get_alias_seq() function.
 
+config COROUTINES
+	bool "Enable coroutine support"
+	help
+	  Coroutines allow to implement a simple form of cooperative
+	  multi-tasking. The main thread of execution registers one or
+	  more functions as coroutine entry points, then it schedules one
+	  of them. At any point the scheduled coroutine may yield, that is,
+	  suspend its execution and return back to the main thread. At this
+	  point another coroutine may be scheduled and so on until all the
+	  registered coroutines are done.
 endmenu
 
 source "lib/fwu_updates/Kconfig"
diff --git a/lib/Makefile b/lib/Makefile
index 5cb3278d2ef..7b809151f5a 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -159,6 +159,8 @@ obj-$(CONFIG_LIB_ELF) += elf.o
 
 obj-$(CONFIG_$(PHASE_)SEMIHOSTING) += semihosting.o
 
+obj-$(CONFIG_COROUTINES) += coroutines.o
+
 #
 # Build a fast OID lookup registry from include/linux/oid_registry.h
 #
diff --git a/lib/coroutines.c b/lib/coroutines.c
new file mode 100644
index 00000000000..b23c2004a06
--- /dev/null
+++ b/lib/coroutines.c
@@ -0,0 +1,176 @@
+// SPDX-License-Identifier: Apache-2.0
+
+// Copyright 2018 Sen Han <00hnes@gmail.com>
+// Copyright 2025 Linaro Limited
+
+#include <coroutines.h>
+#include <stdio.h>
+#include <stdint.h>
+
+
+/* Current co-routine */
+UCO_THREAD struct co *current_co;
+
+struct co_stack *co_stack_new(size_t sz)
+{
+	struct co_stack *p = calloc(1, sizeof(*p));
+	uintptr_t u_p;
+
+	if (!p)
+		return NULL;
+
+	if (sz < 4096)
+		sz = 4096;
+
+	p->sz = sz;
+	p->ptr = malloc(sz);
+	if (!p->ptr) {
+		free(p);
+		return NULL;
+	}
+
+	p->owner = NULL;
+	u_p = (uintptr_t)(p->sz - (sizeof(void*) << 1) + (uintptr_t)p->ptr);
+	u_p = (u_p >> 4) << 4;
+	p->align_highptr = (void*)u_p;
+	p->align_retptr  = (void*)(u_p - sizeof(void*));
+	assert(p->sz > (16 + (sizeof(void*) << 1) + sizeof(void*)));
+	p->align_limit = p->sz - 16 - (sizeof(void*) << 1);
+
+	return p;
+}
+
+void co_stack_destroy(struct co_stack *s){
+	if (!s)
+		return;
+	free(s->ptr);
+	free(s);
+}
+
+struct co *co_create(struct co *main_co,
+		     struct co_stack *stack,
+		     size_t save_stack_sz,
+		     void (*fp)(void), void *arg)
+{
+	struct co *p = malloc(sizeof(*p));
+	assert(p);
+	memset(p, 0, sizeof(*p));
+
+	if (main_co) {
+		assert(stack);
+		p->stack = stack;
+#ifdef __i386__
+		// POSIX.1-2008 (IEEE Std 1003.1-2008) - General Information - Data Types - Pointer Types
+		// http://pubs.opengroup.org/onlinepubs/9699919799.2008edition/functions/V2_chap02.html#tag_15_12_03
+		p->reg[UCO_REG_IDX_RETADDR] = (void*)fp;
+		// push retaddr
+		p->reg[UCO_REG_IDX_SP] = p->stack->align_retptr;
+#elif  __x86_64__
+		p->reg[UCO_REG_IDX_RETADDR] = (void*)fp;
+		p->reg[UCO_REG_IDX_SP] = p->stack->align_retptr;
+#elif __aarch64__
+		p->reg[UCO_REG_IDX_RETADDR] = (void *)fp;
+		// FIXME setting to align_retptr causes a crash
+		p->reg[UCO_REG_IDX_SP] = p->stack->align_highptr;
+#endif
+		p->main_co = main_co;
+		p->arg = arg;
+		p->fp = fp;
+		if (!save_stack_sz)
+			save_stack_sz = 64;
+		p->save_stack.ptr = malloc(save_stack_sz);
+		assert(p->save_stack.ptr);
+		p->save_stack.sz = save_stack_sz;
+		p->save_stack.valid_sz = 0;
+	} else {
+		p->main_co = NULL;
+		p->arg = arg;
+		p->fp = fp;
+		p->stack = NULL;
+		p->save_stack.ptr = NULL;
+	}
+	return p;
+}
+
+static void grab_stack(struct co *resume_co)
+{
+	struct co *owner_co = resume_co->stack->owner;
+
+	if (owner_co) {
+		assert(owner_co->stack == resume_co->stack);
+		assert((uintptr_t)(owner_co->stack->align_retptr) >=
+		       (uintptr_t)(owner_co->reg[UCO_REG_IDX_SP]));
+		assert((uintptr_t)owner_co->stack->align_highptr -
+				(uintptr_t)owner_co->stack->align_limit
+			<= (uintptr_t)owner_co->reg[UCO_REG_IDX_SP]);
+		owner_co->save_stack.valid_sz =
+			(uintptr_t)owner_co->stack->align_retptr -
+			(uintptr_t)owner_co->reg[UCO_REG_IDX_SP];
+		if (owner_co->save_stack.sz < owner_co->save_stack.valid_sz) {
+			free(owner_co->save_stack.ptr);
+			owner_co->save_stack.ptr = NULL;
+			do {
+				owner_co->save_stack.sz <<= 1;
+				assert(owner_co->save_stack.sz > 0);
+			} while (owner_co->save_stack.sz <
+				 owner_co->save_stack.valid_sz);
+			owner_co->save_stack.ptr =
+				malloc(owner_co->save_stack.sz);
+			assert(owner_co->save_stack.ptr);
+		}
+		if (owner_co->save_stack.valid_sz > 0)
+			memcpy(owner_co->save_stack.ptr,
+			       owner_co->reg[UCO_REG_IDX_SP],
+			       owner_co->save_stack.valid_sz);
+		if (owner_co->save_stack.valid_sz >
+		    owner_co->save_stack.max_cpsz)
+			owner_co->save_stack.max_cpsz =
+				owner_co->save_stack.valid_sz;
+		owner_co->stack->owner = NULL;
+		owner_co->stack->align_validsz = 0;
+	}
+	assert(!resume_co->stack->owner);
+	assert(resume_co->save_stack.valid_sz <=
+	       resume_co->stack->align_limit - sizeof(void *));
+	if (resume_co->save_stack.valid_sz > 0)
+		memcpy((void*)
+		       (uintptr_t)(resume_co->stack->align_retptr) -
+				resume_co->save_stack.valid_sz,
+		       resume_co->save_stack.ptr,
+		       resume_co->save_stack.valid_sz);
+	if (resume_co->save_stack.valid_sz > resume_co->save_stack.max_cpsz)
+		resume_co->save_stack.max_cpsz = resume_co->save_stack.valid_sz;
+	resume_co->stack->align_validsz =
+		resume_co->save_stack.valid_sz + sizeof(void *);
+	resume_co->stack->owner = resume_co;
+}
+
+void co_resume(struct co *resume_co)
+{
+	assert(resume_co && resume_co->main_co && !resume_co->done);
+
+	if (resume_co->stack->owner != resume_co)
+		grab_stack(resume_co);
+
+	current_co = resume_co;
+	_co_switch(resume_co->main_co, resume_co);
+	current_co = resume_co->main_co;
+}
+
+void co_destroy(struct co *co){
+	if (!co)
+		return;
+
+	if(co_is_main_co(co)){
+		free(co);
+		current_co = NULL;
+	} else {
+		if(co->stack->owner == co){
+			co->stack->owner = NULL;
+			co->stack->align_validsz = 0;
+		}
+		free(co->save_stack.ptr);
+		co->save_stack.ptr = NULL;
+		free(co);
+	}
+}
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [RFC PATCH 2/2] efi_loader: optimize efi_init_obj_list() with coroutines
  2025-01-20 13:50 [RFC PATCH 0/2] Coroutines Jerome Forissier
  2025-01-20 13:50 ` [RFC PATCH 1/2] Introduce coroutines framework Jerome Forissier
@ 2025-01-20 13:50 ` Jerome Forissier
  1 sibling, 0 replies; 6+ messages in thread
From: Jerome Forissier @ 2025-01-20 13:50 UTC (permalink / raw)
  To: u-boot
  Cc: Ilias Apalodimas, Jerome Forissier, Heinrich Schuchardt, Tom Rini,
	Simon Glass

When COROUTINES is enabled, schedule efi_disks_register() and
efi_tcg2_register() as two coroutines in efi_init_obj_list() instead of
invoking them sequentially. The voluntary yield point is introduced
inside udelay() which is called frequently as a result of each function
polling the hardware. This allows the two coroutines to make progress
simultaneously and reduce the wall clock time required by
efi_init_obj_list().

Tested on Kria KV260 with a microSD card inserted with the "printenv
-e" command. With COROUTINES disabled, efi_init_obj_list() completes in
2821 ms on average (2825, 2822, 2817). With COROUTINES enabled, it
takes 2265 ms (2262, 2260, 2272). That is a reduction of 556 ms which
is not bad at all considering that measured separately,
efi_tcg2_register() takes ~825 ms and efi_disks_register() needs ~600
ms, so assuming they would overlap perfectly one can expect a 600 ms
improvement at best.

The code size penalty for this improvement is 1340 bytes.

Signed-off-by: Jerome Forissier <jerome.forissier@linaro.org>
---
 lib/efi_loader/efi_setup.c | 113 +++++++++++++++++++++++++++++++++++--
 lib/time.c                 |  14 ++++-
 2 files changed, 122 insertions(+), 5 deletions(-)

diff --git a/lib/efi_loader/efi_setup.c b/lib/efi_loader/efi_setup.c
index aa59bc7779d..94160f4bd86 100644
--- a/lib/efi_loader/efi_setup.c
+++ b/lib/efi_loader/efi_setup.c
@@ -7,10 +7,12 @@
 
 #define LOG_CATEGORY LOGC_EFI
 
+#include <coroutines.h>
 #include <efi_loader.h>
 #include <efi_variable.h>
 #include <log.h>
 #include <asm-generic/unaligned.h>
+#include <stdlib.h>
 
 #define OBJ_LIST_NOT_INITIALIZED 1
 
@@ -208,6 +210,46 @@ out:
 	return -1;
 }
 
+#if CONFIG_IS_ENABLED(COROUTINES)
+
+static void efi_disks_register_co(void)
+{
+	efi_status_t ret;
+
+	if (efi_obj_list_initialized != OBJ_LIST_NOT_INITIALIZED)
+		goto out;
+
+	/*
+	 * Probe block devices to find the ESP.
+	 * efi_disks_register() must be called before efi_init_variables().
+	 */
+	ret = efi_disks_register();
+	if (ret != EFI_SUCCESS)
+		efi_obj_list_initialized = ret;
+out:
+	co_exit();
+}
+
+static void efi_tcg2_register_co(void)
+{
+	efi_status_t ret = EFI_SUCCESS;
+
+	if (efi_obj_list_initialized != OBJ_LIST_NOT_INITIALIZED)
+		goto out;
+
+	if (IS_ENABLED(CONFIG_EFI_TCG2_PROTOCOL)) {
+		ret = efi_tcg2_register();
+		if (ret != EFI_SUCCESS)
+			efi_obj_list_initialized = ret;
+	}
+out:
+	co_exit();
+}
+
+extern int udelay_yield;
+
+#endif /* COROUTINES */
+
 /**
  * efi_init_obj_list() - Initialize and populate EFI object list
  *
@@ -216,6 +258,12 @@ out:
 efi_status_t efi_init_obj_list(void)
 {
 	efi_status_t ret = EFI_SUCCESS;
+#if CONFIG_IS_ENABLED(COROUTINES)
+	struct co_stack *stk = NULL;
+	struct co *main_co = NULL;
+	struct co *co1 = NULL;
+	struct co *co2 = NULL;
+#endif
 
 	/* Initialize once only */
 	if (efi_obj_list_initialized != OBJ_LIST_NOT_INITIALIZED)
@@ -224,6 +272,53 @@ efi_status_t efi_init_obj_list(void)
 	/* Set up console modes */
 	efi_setup_console_size();
 
+#if CONFIG_IS_ENABLED(COROUTINES)
+	main_co = co_create(NULL, NULL, 0, NULL, NULL);
+	if (!main_co) {
+		ret = EFI_OUT_OF_RESOURCES;
+		goto out;
+	}
+
+	stk = co_stack_new(8192);
+	if (!stk) {
+		ret = EFI_OUT_OF_RESOURCES;
+		goto out;
+	}
+
+	co1 = co_create(main_co, stk, 0, efi_disks_register_co, NULL);
+	if (!co1) {
+		ret = EFI_OUT_OF_RESOURCES;
+		goto out;
+	}
+
+	co2 = co_create(main_co, stk, 0, efi_tcg2_register_co, NULL);
+	if (!co2) {
+		ret = EFI_OUT_OF_RESOURCES;
+		goto out;
+	}
+
+	udelay_yield = 0xCAFEDECA;
+	do {
+		if (!co1->done)
+			co_resume(co1);
+		if (!co2->done)
+			co_resume(co2);
+	} while (!(co1->done && co2->done));
+	udelay_yield = 0;
+
+	co_stack_destroy(stk);
+	co_destroy(main_co);
+	co_destroy(co1);
+	co_destroy(co2);
+	stk = NULL;
+	main_co = co1 = co2 = NULL;
+
+	if (efi_obj_list_initialized != OBJ_LIST_NOT_INITIALIZED) {
+		/* Some kind of error was saved by a coroutine */
+		ret = efi_obj_list_initialized;
+		goto out;
+	}
+#else
 	/*
 	 * Probe block devices to find the ESP.
 	 * efi_disks_register() must be called before efi_init_variables().
@@ -232,6 +327,13 @@ efi_status_t efi_init_obj_list(void)
 	if (ret != EFI_SUCCESS)
 		goto out;
 
+	if (IS_ENABLED(CONFIG_EFI_TCG2_PROTOCOL)) {
+		ret = efi_tcg2_register();
+		if (ret != EFI_SUCCESS)
+			efi_obj_list_initialized = ret;
+	}
+#endif
+
 	/* Initialize variable services */
 	ret = efi_init_variables();
 	if (ret != EFI_SUCCESS)
@@ -272,10 +374,6 @@ efi_status_t efi_init_obj_list(void)
 	}
 
 	if (IS_ENABLED(CONFIG_EFI_TCG2_PROTOCOL)) {
-		ret = efi_tcg2_register();
-		if (ret != EFI_SUCCESS)
-			goto out;
-
 		ret = efi_tcg2_do_initial_measurement();
 		if (ret == EFI_SECURITY_VIOLATION)
 			goto out;
@@ -350,6 +448,13 @@ efi_status_t efi_init_obj_list(void)
 	    !IS_ENABLED(CONFIG_EFI_CAPSULE_ON_DISK_EARLY))
 		ret = efi_launch_capsules();
 out:
+#if CONFIG_IS_ENABLED(COROUTINES)
+	co_stack_destroy(stk);
+	co_destroy(main_co);
+	co_destroy(co1);
+	co_destroy(co2);
 	efi_obj_list_initialized = ret;
+#endif
+
 	return ret;
 }
diff --git a/lib/time.c b/lib/time.c
index d88edafb196..c11288102fe 100644
--- a/lib/time.c
+++ b/lib/time.c
@@ -17,6 +17,7 @@
 #include <asm/global_data.h>
 #include <asm/io.h>
 #include <linux/delay.h>
+#include <coroutines.h>
 
 #ifndef CFG_WD_PERIOD
 # define CFG_WD_PERIOD	(10 * 1000 * 1000)	/* 10 seconds default */
@@ -190,6 +191,8 @@ void __weak __udelay(unsigned long usec)
 
 /* ------------------------------------------------------------------------- */
 
+int udelay_yield;
+
 void udelay(unsigned long usec)
 {
 	ulong kv;
@@ -197,7 +200,16 @@ void udelay(unsigned long usec)
 	do {
 		schedule();
 		kv = usec > CFG_WD_PERIOD ? CFG_WD_PERIOD : usec;
-		__udelay(kv);
+		if (CONFIG_IS_ENABLED(COROUTINES) &&
+		    udelay_yield == 0xCAFEDECA) {
+			ulong t0 = timer_get_us();
+			do {
+				co_yield();
+				__udelay(10);
+			} while (timer_get_us() < t0 + kv);
+		} else {
+			__udelay(kv);
+		}
 		usec -= kv;
 	} while(usec);
 }
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH 1/2] Introduce coroutines framework
  2025-01-20 13:50 ` [RFC PATCH 1/2] Introduce coroutines framework Jerome Forissier
@ 2025-01-21 11:44   ` Michal Simek
  2025-01-21 13:15     ` Jerome Forissier
  0 siblings, 1 reply; 6+ messages in thread
From: Michal Simek @ 2025-01-21 11:44 UTC (permalink / raw)
  To: Jerome Forissier, u-boot
  Cc: Ilias Apalodimas, Tom Rini, Simon Glass, Bin Meng,
	Patrick Rudolph, Sughosh Ganu, Heinrich Schuchardt, Raymond Mao



On 1/20/25 14:50, Jerome Forissier wrote:
> Adds the COROUTINES Kconfig symbol which introduces a new internal API
> for coroutines support. As explained in the Kconfig file, this is meant
> to provide some kind of cooperative multi-tasking with the goal to
> improve performance by overlapping lengthy operations.
> 
> The API as well as the implementation is very much inspired from libaco
> [1]. The reference implementation is simplified to remove all things
> not needed in U-Boot, the coding style is updated, and the aco_ prefix
> is replaced by co_.
> 
> I believe the stack handling could be simplified: the stack of the main
> coroutine could probably probably be used by the secondary coroutines
> instead of allocating a new stack dynamically.
> 
> Only i386, x86_64 and aarch64 are supported at the moment. Other
> architectures need to provide a _co_switch() function in assembly.
> 
> Only aarch64 has been tested.

I can't see the reason why to keep x86 around if it is not tested.

Licenses should be cleared for all files.

Also I think this mostly should target full u-boot and not really TPL/SPL as of 
today.

I have applied this patch to measure time difference on kv260 and kd240 and 
pretty much the time is the same.

diff --git a/lib/efi_loader/efi_setup.c b/lib/efi_loader/efi_setup.c
index 94160f4bd86c..18d2260b8c11 100644
--- a/lib/efi_loader/efi_setup.c
+++ b/lib/efi_loader/efi_setup.c
@@ -258,6 +258,7 @@ extern int udelay_yield;
  efi_status_t efi_init_obj_list(void)
  {
         efi_status_t ret = EFI_SUCCESS;
+       ulong t0, t1;
  #if CONFIG_IS_ENABLED(COROUTINES)
         struct co_stack *stk = NULL;
         struct co *main_co = NULL;
@@ -272,6 +273,7 @@ efi_status_t efi_init_obj_list(void)
         /* Set up console modes */
         efi_setup_console_size();

+       t0 = timer_get_us();
  #if CONFIG_IS_ENABLED(COROUTINES)
         main_co = co_create(NULL, NULL, 0, NULL, NULL);
         if (!main_co) {
@@ -333,6 +335,9 @@ efi_status_t efi_init_obj_list(void)
                         efi_obj_list_initialized = ret;
         }
  #endif
+       t1 = timer_get_us();
+
+       printf("time counted %ld\n", t1 - t0);

         /* Initialize variable services */
         ret = efi_init_variables();

Please correct me if you have measured it differently.

Thanks,
Michal

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH 1/2] Introduce coroutines framework
  2025-01-21 11:44   ` Michal Simek
@ 2025-01-21 13:15     ` Jerome Forissier
  2025-01-21 13:52       ` Michal Simek
  0 siblings, 1 reply; 6+ messages in thread
From: Jerome Forissier @ 2025-01-21 13:15 UTC (permalink / raw)
  To: Michal Simek, u-boot
  Cc: Ilias Apalodimas, Tom Rini, Simon Glass, Bin Meng,
	Patrick Rudolph, Sughosh Ganu, Heinrich Schuchardt, Raymond Mao

Hi Michal,

On 1/21/25 12:44, Michal Simek wrote:
> 
> 
> On 1/20/25 14:50, Jerome Forissier wrote:
>> Adds the COROUTINES Kconfig symbol which introduces a new internal API
>> for coroutines support. As explained in the Kconfig file, this is meant
>> to provide some kind of cooperative multi-tasking with the goal to
>> improve performance by overlapping lengthy operations.
>>
>> The API as well as the implementation is very much inspired from libaco
>> [1]. The reference implementation is simplified to remove all things
>> not needed in U-Boot, the coding style is updated, and the aco_ prefix
>> is replaced by co_.
>>
>> I believe the stack handling could be simplified: the stack of the main
>> coroutine could probably probably be used by the secondary coroutines
>> instead of allocating a new stack dynamically.
>>
>> Only i386, x86_64 and aarch64 are supported at the moment. Other
>> architectures need to provide a _co_switch() function in assembly.
>>
>> Only aarch64 has been tested.
> 
> I can't see the reason why to keep x86 around if it is not tested.

OK, I'll drop x86 in v2.
 
> Licenses should be cleared for all files.

I will add "SPDX-License-Identifier: Apache-2.0 OR GPL-2.0-or-later"
to the coroutines C implementation (include/coroutines.h and
lib/coroutines.c) since it originates from libaco which is Apache-2.0
then was modified by me for U-Boot. I will add GPL-2.0-or-later to
arch/arm/cpu/armv8/co_switch.S since it is a new file written by me.


> Also I think this mostly should target full u-boot and not really TPL/SPL as of today.

I agree, but why do you think it targets xPL? efi_init_obj_list() is
called as a result of "printenv -e" in the main U-Boot for example.
I mentioned in the cover letter that I think coroutines could be
beneficial in other places too.

> 
> I have applied this patch to measure time difference on kv260 and kd240 and pretty much the time is the same.
> 
> diff --git a/lib/efi_loader/efi_setup.c b/lib/efi_loader/efi_setup.c
> index 94160f4bd86c..18d2260b8c11 100644
> --- a/lib/efi_loader/efi_setup.c
> +++ b/lib/efi_loader/efi_setup.c
> @@ -258,6 +258,7 @@ extern int udelay_yield;
>  efi_status_t efi_init_obj_list(void)
>  {
>         efi_status_t ret = EFI_SUCCESS;
> +       ulong t0, t1;
>  #if CONFIG_IS_ENABLED(COROUTINES)
>         struct co_stack *stk = NULL;
>         struct co *main_co = NULL;
> @@ -272,6 +273,7 @@ efi_status_t efi_init_obj_list(void)
>         /* Set up console modes */
>         efi_setup_console_size();
> 
> +       t0 = timer_get_us();
>  #if CONFIG_IS_ENABLED(COROUTINES)
>         main_co = co_create(NULL, NULL, 0, NULL, NULL);
>         if (!main_co) {
> @@ -333,6 +335,9 @@ efi_status_t efi_init_obj_list(void)
>                         efi_obj_list_initialized = ret;
>         }
>  #endif
> +       t1 = timer_get_us();
> +
> +       printf("time counted %ld\n", t1 - t0);
> 
>         /* Initialize variable services */
>         ret = efi_init_variables();
> 
> Please correct me if you have measured it differently.

That's fine. Do you have a SD card inserted in your board?
If not, the efi_disks_register() call is very quick and
therefore it makes little difference if it's run
sequentially or in parallel with efi_tcg2_register().

Thanks,
-- 
Jerome

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH 1/2] Introduce coroutines framework
  2025-01-21 13:15     ` Jerome Forissier
@ 2025-01-21 13:52       ` Michal Simek
  0 siblings, 0 replies; 6+ messages in thread
From: Michal Simek @ 2025-01-21 13:52 UTC (permalink / raw)
  To: Jerome Forissier, u-boot
  Cc: Ilias Apalodimas, Tom Rini, Simon Glass, Bin Meng,
	Patrick Rudolph, Sughosh Ganu, Heinrich Schuchardt, Raymond Mao



On 1/21/25 14:15, Jerome Forissier wrote:
> Hi Michal,
> 
> On 1/21/25 12:44, Michal Simek wrote:
>>
>>
>> On 1/20/25 14:50, Jerome Forissier wrote:
>>> Adds the COROUTINES Kconfig symbol which introduces a new internal API
>>> for coroutines support. As explained in the Kconfig file, this is meant
>>> to provide some kind of cooperative multi-tasking with the goal to
>>> improve performance by overlapping lengthy operations.
>>>
>>> The API as well as the implementation is very much inspired from libaco
>>> [1]. The reference implementation is simplified to remove all things
>>> not needed in U-Boot, the coding style is updated, and the aco_ prefix
>>> is replaced by co_.
>>>
>>> I believe the stack handling could be simplified: the stack of the main
>>> coroutine could probably probably be used by the secondary coroutines
>>> instead of allocating a new stack dynamically.
>>>
>>> Only i386, x86_64 and aarch64 are supported at the moment. Other
>>> architectures need to provide a _co_switch() function in assembly.
>>>
>>> Only aarch64 has been tested.
>>
>> I can't see the reason why to keep x86 around if it is not tested.
> 
> OK, I'll drop x86 in v2.
>   
>> Licenses should be cleared for all files.
> 
> I will add "SPDX-License-Identifier: Apache-2.0 OR GPL-2.0-or-later"
> to the coroutines C implementation (include/coroutines.h and
> lib/coroutines.c) since it originates from libaco which is Apache-2.0
> then was modified by me for U-Boot. I will add GPL-2.0-or-later to
> arch/arm/cpu/armv8/co_switch.S since it is a new file written by me.
> 
> 
>> Also I think this mostly should target full u-boot and not really TPL/SPL as of today.
> 
> I agree, but why do you think it targets xPL? efi_init_obj_list() is
> called as a result of "printenv -e" in the main U-Boot for example.
> I mentioned in the cover letter that I think coroutines could be
> beneficial in other places too.

It is build for SPL too if you look at files which are touched.
I think you should define separate symbols for SPL/TPL.


> 
>>
>> I have applied this patch to measure time difference on kv260 and kd240 and pretty much the time is the same.
>>
>> diff --git a/lib/efi_loader/efi_setup.c b/lib/efi_loader/efi_setup.c
>> index 94160f4bd86c..18d2260b8c11 100644
>> --- a/lib/efi_loader/efi_setup.c
>> +++ b/lib/efi_loader/efi_setup.c
>> @@ -258,6 +258,7 @@ extern int udelay_yield;
>>   efi_status_t efi_init_obj_list(void)
>>   {
>>          efi_status_t ret = EFI_SUCCESS;
>> +       ulong t0, t1;
>>   #if CONFIG_IS_ENABLED(COROUTINES)
>>          struct co_stack *stk = NULL;
>>          struct co *main_co = NULL;
>> @@ -272,6 +273,7 @@ efi_status_t efi_init_obj_list(void)
>>          /* Set up console modes */
>>          efi_setup_console_size();
>>
>> +       t0 = timer_get_us();
>>   #if CONFIG_IS_ENABLED(COROUTINES)
>>          main_co = co_create(NULL, NULL, 0, NULL, NULL);
>>          if (!main_co) {
>> @@ -333,6 +335,9 @@ efi_status_t efi_init_obj_list(void)
>>                          efi_obj_list_initialized = ret;
>>          }
>>   #endif
>> +       t1 = timer_get_us();
>> +
>> +       printf("time counted %ld\n", t1 - t0);
>>
>>          /* Initialize variable services */
>>          ret = efi_init_variables();
>>
>> Please correct me if you have measured it differently.
> 
> That's fine. Do you have a SD card inserted in your board?

yes I have. I do use xilinx_zynqmp_kria_defconfig only.

> If not, the efi_disks_register() call is very quick and
> therefore it makes little difference if it's run
> sequentially or in parallel with efi_tcg2_register().

I think in my case where distro boot runs things are initialized already that's 
why diff is not there.

Thanks,
Michal


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2025-01-21 13:52 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-01-20 13:50 [RFC PATCH 0/2] Coroutines Jerome Forissier
2025-01-20 13:50 ` [RFC PATCH 1/2] Introduce coroutines framework Jerome Forissier
2025-01-21 11:44   ` Michal Simek
2025-01-21 13:15     ` Jerome Forissier
2025-01-21 13:52       ` Michal Simek
2025-01-20 13:50 ` [RFC PATCH 2/2] efi_loader: optimize efi_init_obj_list() with coroutines Jerome Forissier

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox