* [RFC PATCH v2 0/3] Coroutines
@ 2025-01-28 10:19 Jerome Forissier
2025-01-28 10:19 ` [RFC PATCH v2 1/3] Introduce coroutines framework Jerome Forissier
` (3 more replies)
0 siblings, 4 replies; 12+ messages in thread
From: Jerome Forissier @ 2025-01-28 10:19 UTC (permalink / raw)
To: u-boot; +Cc: Ilias Apalodimas, Jerome Forissier
This series introduces a simple coroutines framework and uses it to
speed up the efi_init_obj_list() function. It is just an example to
trigger discussions; hopefully other places in U-Boot can benefit from
a similar treatment. Suggestions are welcome.
I came up with this idea after analyzing some profiling data (ftrace)
taken during the execution of the "printenv -e" command on a Kria KV260
board. When the command is first invoked, efi_init_obj_list() is called
which takes a significant amount of time (2.872 seconds). This time can
be split into 0.570 s for efi_disks_register(), 0.811 s for
efi_tcg2_register() and 1.418 s for efi_tcg2_do_initial_measurement().
All the other child functions are much quicker. Another interesting
observation is that a large part of the time is actually spent waiting
in __udelay(). More precisely:
- For efi_disks_register(): 421 ms / 570 ms = 73.8 % spent in __udelay()
- For efi_tcg2_register(): 805 ms / 811 ms = 99.1 % spent in __udelay()
- For efi_tcg2_do_initial_measurement(): 1395.025 ms / 1418.372 ms =
98.3 % spent in __udelay()
Given the above data, it is reasonable to think that a nice speedup
could be obtained if these functions could somehow be run in parallel.
efi_tcg2_do_initial_measurement() unfortunately needs to be excluded
because it depends on the first two. But efi_disks_register() and
efi_tcg2_register() are clearly independant and are therefore good
candidates for concurrent execution.
So, I considered two options:
- Spin up a secondary core to take care of one function while the other
one runs on the main core
- Introduce some kind of software scheduling
I quickly ruled out the first one for several reasons: initializing a
secondary core is typically quite hardware-specific, it would not scale
well if more functions than available cores would need to be run in
parallel, it would make debugging harder etc.
Software scheduling however can be accomplished quite easily, especially
since we don't need to consider preemptive multitasking. Coroutines [1]
for example can perfectly do the job. They provide a way to save and
restore the execution context (registers and stack). Here is how it
look like:
static void do_some_work(int v)
{
int i;
for (i = 0; i < 5; i++) {
printf("%d", v);
/* Save context and resume main thread */
co_yield();
}
}
static void co1(void *arg)
{
do_some_work(1);
/* Mark coroutine as "done" and resume main thread */
co_exit();
}
static void co2(void *arg)
{
do_some_work(2);
co_exit();
}
void main_thread(void)
{
struct co *co1, *co2;
co1 = co_create(co1, ...);
co2 = co_create(co2, ...);
do {
printf("A");
if (!co1->done) {
/* Invoke or resume first coroutine */
co_resume(co1);
}
printf("B");
if (!cor21->done) {
/* Invoke or resume second coroutine */
co_resume(co2);
}
} while (!(co1->done && co2->done));
/* At this point, co1 and co2 have both called co_exit() */
}
The above example would print: A1B2A1B2A1B2A1B2A1B2.
- The first commit introduces the coroutine framework, loosely based on
libaco [2]. The code was simplified and reformatted to better suit
U-Boot.
- The second commit modifies efi_init_obj_list() in order to turn
efi_disks_register() and efi_tcg2_register() into coroutines when
COROUTINES in enabled. On a KV260 board with a SD card inserted, this
saves about .6 second (2.2 s instead of 2.8 s).
- The third commit applies coroutines to usb_init(), which can
significantly reduce the time it takes to scan multiple buses. Tested
on arm64 QEMU with 4 XHCI buses: the USB scan takes 2.2 s instead of
5.8 s.
[1] https://en.wikipedia.org/wiki/Coroutine
[2] https://github.com/hnes/libaco/
Changes in v2
- Remove x86 and x86_64 arch code since it is untested
- Add missing SPDX license tag to arch/arm/cpu/armv8/co_switch.S
- Change Apache-2.0 SPDX license tag to "Apache-2.0 OR GPL-2.0-or-later"
- Apply coroutines to the USB bus scan in usb_init()
Jerome Forissier (3):
Introduce coroutines framework
efi_loader: optimize efi_init_obj_list() with coroutines
usb: scan multiple buses simultaneously with coroutines
arch/arm/cpu/armv8/Makefile | 1 +
arch/arm/cpu/armv8/co_switch.S | 36 +++++++
drivers/usb/host/usb-uclass.c | 152 +++++++++++++++++++++++++++++-
include/coroutines.h | 130 ++++++++++++++++++++++++++
lib/Kconfig | 10 ++
lib/Makefile | 2 +
lib/coroutines.c | 165 +++++++++++++++++++++++++++++++++
lib/efi_loader/efi_setup.c | 113 +++++++++++++++++++++-
lib/time.c | 14 ++-
9 files changed, 615 insertions(+), 8 deletions(-)
create mode 100644 arch/arm/cpu/armv8/co_switch.S
create mode 100644 include/coroutines.h
create mode 100644 lib/coroutines.c
--
2.43.0
^ permalink raw reply [flat|nested] 12+ messages in thread
* [RFC PATCH v2 1/3] Introduce coroutines framework
2025-01-28 10:19 [RFC PATCH v2 0/3] Coroutines Jerome Forissier
@ 2025-01-28 10:19 ` Jerome Forissier
2025-01-28 11:09 ` Michal Simek
2025-01-31 16:10 ` Yao Zi
2025-01-28 10:19 ` [RFC PATCH v2 2/3] efi_loader: optimize efi_init_obj_list() with coroutines Jerome Forissier
` (2 subsequent siblings)
3 siblings, 2 replies; 12+ messages in thread
From: Jerome Forissier @ 2025-01-28 10:19 UTC (permalink / raw)
To: u-boot
Cc: Ilias Apalodimas, Jerome Forissier, Tom Rini, Simon Glass,
Patrick Rudolph, Sughosh Ganu, Michal Simek, Raymond Mao,
Heinrich Schuchardt
Adds the COROUTINES Kconfig symbol which introduces a new internal API
for coroutines support. As explained in the Kconfig file, this is meant
to provide some kind of cooperative multi-tasking with the goal to
improve performance by overlapping lengthy operations.
The API as well as the implementation is very much inspired from libaco
[1]. The reference implementation is simplified to remove all things
not needed in U-Boot, the coding style is updated, and the aco_ prefix
is replaced by co_.
I believe the stack handling could be simplified: the stack of the main
coroutine could probably probably be used by the secondary coroutines
instead of allocating a new stack dynamically.
Only i386, x86_64 and aarch64 are supported at the moment. Other
architectures need to provide a _co_switch() function in assembly.
Only aarch64 has been tested.
[1] https://github.com/hnes/libaco/
Signed-off-by: Jerome Forissier <jerome.forissier@linaro.org>
---
arch/arm/cpu/armv8/Makefile | 1 +
arch/arm/cpu/armv8/co_switch.S | 36 +++++++
include/coroutines.h | 130 ++++++++++++++++++++++++++
lib/Kconfig | 10 ++
lib/Makefile | 2 +
lib/coroutines.c | 165 +++++++++++++++++++++++++++++++++
6 files changed, 344 insertions(+)
create mode 100644 arch/arm/cpu/armv8/co_switch.S
create mode 100644 include/coroutines.h
create mode 100644 lib/coroutines.c
diff --git a/arch/arm/cpu/armv8/Makefile b/arch/arm/cpu/armv8/Makefile
index 2e71ff2dc97..6d07b6aa9f9 100644
--- a/arch/arm/cpu/armv8/Makefile
+++ b/arch/arm/cpu/armv8/Makefile
@@ -46,3 +46,4 @@ obj-$(CONFIG_TARGET_BCMNS3) += bcmns3/
obj-$(CONFIG_XEN) += xen/
obj-$(CONFIG_ARMV8_CE_SHA1) += sha1_ce_glue.o sha1_ce_core.o
obj-$(CONFIG_ARMV8_CE_SHA256) += sha256_ce_glue.o sha256_ce_core.o
+obj-$(CONFIG_COROUTINES) += co_switch.o
diff --git a/arch/arm/cpu/armv8/co_switch.S b/arch/arm/cpu/armv8/co_switch.S
new file mode 100644
index 00000000000..4405e89ec56
--- /dev/null
+++ b/arch/arm/cpu/armv8/co_switch.S
@@ -0,0 +1,36 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+/* void _co_switch(struct uco *from_co, struct uco *to_co); */
+.text
+.globl _co_switch
+.type _co_switch, @function
+_co_switch:
+ // x0: from_co
+ // x1: to_co
+ // from_co and to_co layout: { pc, sp, x19-x29 }
+
+ // Save context to from_co (x0)
+ // AAPCS64 says "A subroutine invocation must preserve the contents of the
+ // registers r19-r29 and SP"
+ adr x2, 1f // pc we should use to resume after this function
+ mov x3, sp
+ stp x2, x3, [x0, #0] // pc, sp
+ stp x19, x20, [x0, #16]
+ stp x21, x22, [x0, #32]
+ stp x23, x24, [x0, #48]
+ stp x25, x26, [x0, #64]
+ stp x27, x28, [x0, #80]
+ stp x29, x30, [x0, #96]
+
+ // Load new context from to_co (x1)
+ ldp x2, x3, [x1, #0] // pc, sp
+ ldp x19, x20, [x1, #16]
+ ldp x21, x22, [x1, #32]
+ ldp x23, x24, [x1, #48]
+ ldp x25, x26, [x1, #64]
+ ldp x27, x28, [x1, #80]
+ ldp x29, x30, [x1, #96]
+ mov sp, x3
+ br x2
+
+1: // Return to the caller
+ ret
diff --git a/include/coroutines.h b/include/coroutines.h
new file mode 100644
index 00000000000..b85b656127c
--- /dev/null
+++ b/include/coroutines.h
@@ -0,0 +1,130 @@
+/* SPDX-License-Identifier: Apache-2.0 OR GPL-2.0-or-later */
+/*
+ * Copyright 2018 Sen Han <00hnes@gmail.com>
+ * Copyright 2025 Linaro Limited
+ */
+
+#ifndef _COROUTINES_H_
+#define _COROUTINES_H_
+
+#ifndef CONFIG_COROUTINES
+
+static inline void co_yield(void) {}
+static inline void co_exit(void) {}
+
+#else
+
+#ifdef __UBOOT__
+#include <log.h>
+#else
+#include <assert.h>
+#endif
+#include <limits.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <time.h>
+
+#ifdef __aarch64__
+#define CO_REG_IDX_RETADDR 0
+#define CO_REG_IDX_SP 1
+#else
+#error Architecture no supported
+#endif
+
+struct co_save_stack {
+ void* ptr;
+ size_t sz;
+ size_t valid_sz;
+ size_t max_cpsz; /* max copy size in bytes */
+};
+
+struct co_stack {
+ void *ptr;
+ size_t sz;
+ void *align_highptr;
+ void *align_retptr;
+ size_t align_validsz;
+ size_t align_limit;
+ struct co *owner;
+ void *real_ptr;
+ size_t real_sz;
+};
+
+struct co {
+ /* CPU state: callee-saved registers plus SP and PC */
+ void *reg[14]; // pc, sp, x19-x29, x30 (lr)
+
+ struct co *main_co;
+ void *arg;
+ bool done;
+
+ void (*fp)(void);
+
+ struct co_save_stack save_stack;
+ struct co_stack *stack;
+};
+
+extern struct co *current_co;
+
+static inline struct co *co_get_co(void)
+{
+ return current_co;
+}
+
+static inline void *co_get_arg(void)
+{
+ return co_get_co()->arg;
+}
+
+struct co_stack *co_stack_new(size_t sz);
+
+void co_stack_destroy(struct co_stack *s);
+
+struct co *co_create(struct co *main_co,
+ struct co_stack *stack,
+ size_t save_stack_sz, void (*fp)(void),
+ void *arg);
+
+void co_resume(struct co *resume_co);
+
+void co_destroy(struct co *co);
+
+void *_co_switch(struct co *from_co, struct co *to_co);
+
+static inline void _co_yield_to_main_co(struct co *yield_co)
+{
+ assert(yield_co);
+ assert(yield_co->main_co);
+ _co_switch(yield_co, yield_co->main_co);
+}
+
+static inline void co_yield(void)
+{
+ if (current_co)
+ _co_yield_to_main_co(current_co);
+}
+
+static inline bool co_is_main_co(struct co *co)
+{
+ return !co->main_co;
+}
+
+static inline void co_exit(void)
+{
+ struct co *co = co_get_co();
+
+ if (!co)
+ return;
+ co->done = true;
+ assert(co->stack->owner == co);
+ co->stack->owner = NULL;
+ co->stack->align_validsz = 0;
+ _co_yield_to_main_co(co);
+ assert(false);
+}
+
+#endif /* CONFIG_COROUTINES */
+#endif /* _COROUTINES_H_ */
diff --git a/lib/Kconfig b/lib/Kconfig
index 8f1a96d98c4..b6c1380b927 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -1226,6 +1226,16 @@ config PHANDLE_CHECK_SEQ
enable this config option to distinguish them using
phandles in fdtdec_get_alias_seq() function.
+config COROUTINES
+ bool "Enable coroutine support"
+ help
+ Coroutines allow to implement a simple form of cooperative
+ multi-tasking. The main thread of execution registers one or
+ more functions as coroutine entry points, then it schedules one
+ of them. At any point the scheduled coroutine may yield, that is,
+ suspend its execution and return back to the main thread. At this
+ point another coroutine may be scheduled and so on until all the
+ registered coroutines are done.
endmenu
source "lib/fwu_updates/Kconfig"
diff --git a/lib/Makefile b/lib/Makefile
index 5cb3278d2ef..7b809151f5a 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -159,6 +159,8 @@ obj-$(CONFIG_LIB_ELF) += elf.o
obj-$(CONFIG_$(PHASE_)SEMIHOSTING) += semihosting.o
+obj-$(CONFIG_COROUTINES) += coroutines.o
+
#
# Build a fast OID lookup registry from include/linux/oid_registry.h
#
diff --git a/lib/coroutines.c b/lib/coroutines.c
new file mode 100644
index 00000000000..20c5aba5510
--- /dev/null
+++ b/lib/coroutines.c
@@ -0,0 +1,165 @@
+// SPDX-License-Identifier: Apache-2.0 OR GPL-2.0-or-later
+
+// Copyright 2018 Sen Han <00hnes@gmail.com>
+// Copyright 2025 Linaro Limited
+
+#include <coroutines.h>
+#include <stdio.h>
+#include <stdint.h>
+
+
+/* Current co-routine */
+struct co *current_co;
+
+struct co_stack *co_stack_new(size_t sz)
+{
+ struct co_stack *p = calloc(1, sizeof(*p));
+ uintptr_t u_p;
+
+ if (!p)
+ return NULL;
+
+ if (sz < 4096)
+ sz = 4096;
+
+ p->sz = sz;
+ p->ptr = malloc(sz);
+ if (!p->ptr) {
+ free(p);
+ return NULL;
+ }
+
+ p->owner = NULL;
+ u_p = (uintptr_t)(p->sz - (sizeof(void*) << 1) + (uintptr_t)p->ptr);
+ u_p = (u_p >> 4) << 4;
+ p->align_highptr = (void*)u_p;
+ p->align_retptr = (void*)(u_p - sizeof(void*));
+ assert(p->sz > (16 + (sizeof(void*) << 1) + sizeof(void*)));
+ p->align_limit = p->sz - 16 - (sizeof(void*) << 1);
+
+ return p;
+}
+
+void co_stack_destroy(struct co_stack *s){
+ if (!s)
+ return;
+ free(s->ptr);
+ free(s);
+}
+
+struct co *co_create(struct co *main_co,
+ struct co_stack *stack,
+ size_t save_stack_sz,
+ void (*fp)(void), void *arg)
+{
+ struct co *p = malloc(sizeof(*p));
+ assert(p);
+ memset(p, 0, sizeof(*p));
+
+ if (main_co) {
+ assert(stack);
+ p->stack = stack;
+ p->reg[CO_REG_IDX_RETADDR] = (void *)fp;
+ // FIXME original code uses align_retptr; causes a crash
+ p->reg[CO_REG_IDX_SP] = p->stack->align_highptr;
+ p->main_co = main_co;
+ p->arg = arg;
+ p->fp = fp;
+ if (!save_stack_sz)
+ save_stack_sz = 64;
+ p->save_stack.ptr = malloc(save_stack_sz);
+ assert(p->save_stack.ptr);
+ p->save_stack.sz = save_stack_sz;
+ p->save_stack.valid_sz = 0;
+ } else {
+ p->main_co = NULL;
+ p->arg = arg;
+ p->fp = fp;
+ p->stack = NULL;
+ p->save_stack.ptr = NULL;
+ }
+ return p;
+}
+
+static void grab_stack(struct co *resume_co)
+{
+ struct co *owner_co = resume_co->stack->owner;
+
+ if (owner_co) {
+ assert(owner_co->stack == resume_co->stack);
+ assert((uintptr_t)(owner_co->stack->align_retptr) >=
+ (uintptr_t)(owner_co->reg[CO_REG_IDX_SP]));
+ assert((uintptr_t)owner_co->stack->align_highptr -
+ (uintptr_t)owner_co->stack->align_limit
+ <= (uintptr_t)owner_co->reg[CO_REG_IDX_SP]);
+ owner_co->save_stack.valid_sz =
+ (uintptr_t)owner_co->stack->align_retptr -
+ (uintptr_t)owner_co->reg[CO_REG_IDX_SP];
+ if (owner_co->save_stack.sz < owner_co->save_stack.valid_sz) {
+ free(owner_co->save_stack.ptr);
+ owner_co->save_stack.ptr = NULL;
+ do {
+ owner_co->save_stack.sz <<= 1;
+ assert(owner_co->save_stack.sz > 0);
+ } while (owner_co->save_stack.sz <
+ owner_co->save_stack.valid_sz);
+ owner_co->save_stack.ptr =
+ malloc(owner_co->save_stack.sz);
+ assert(owner_co->save_stack.ptr);
+ }
+ if (owner_co->save_stack.valid_sz > 0)
+ memcpy(owner_co->save_stack.ptr,
+ owner_co->reg[CO_REG_IDX_SP],
+ owner_co->save_stack.valid_sz);
+ if (owner_co->save_stack.valid_sz >
+ owner_co->save_stack.max_cpsz)
+ owner_co->save_stack.max_cpsz =
+ owner_co->save_stack.valid_sz;
+ owner_co->stack->owner = NULL;
+ owner_co->stack->align_validsz = 0;
+ }
+ assert(!resume_co->stack->owner);
+ assert(resume_co->save_stack.valid_sz <=
+ resume_co->stack->align_limit - sizeof(void *));
+ if (resume_co->save_stack.valid_sz > 0)
+ memcpy((void*)
+ (uintptr_t)(resume_co->stack->align_retptr) -
+ resume_co->save_stack.valid_sz,
+ resume_co->save_stack.ptr,
+ resume_co->save_stack.valid_sz);
+ if (resume_co->save_stack.valid_sz > resume_co->save_stack.max_cpsz)
+ resume_co->save_stack.max_cpsz = resume_co->save_stack.valid_sz;
+ resume_co->stack->align_validsz =
+ resume_co->save_stack.valid_sz + sizeof(void *);
+ resume_co->stack->owner = resume_co;
+}
+
+void co_resume(struct co *resume_co)
+{
+ assert(resume_co && resume_co->main_co && !resume_co->done);
+
+ if (resume_co->stack->owner != resume_co)
+ grab_stack(resume_co);
+
+ current_co = resume_co;
+ _co_switch(resume_co->main_co, resume_co);
+ current_co = resume_co->main_co;
+}
+
+void co_destroy(struct co *co){
+ if (!co)
+ return;
+
+ if(co_is_main_co(co)){
+ free(co);
+ current_co = NULL;
+ } else {
+ if(co->stack->owner == co){
+ co->stack->owner = NULL;
+ co->stack->align_validsz = 0;
+ }
+ free(co->save_stack.ptr);
+ co->save_stack.ptr = NULL;
+ free(co);
+ }
+}
--
2.43.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [RFC PATCH v2 2/3] efi_loader: optimize efi_init_obj_list() with coroutines
2025-01-28 10:19 [RFC PATCH v2 0/3] Coroutines Jerome Forissier
2025-01-28 10:19 ` [RFC PATCH v2 1/3] Introduce coroutines framework Jerome Forissier
@ 2025-01-28 10:19 ` Jerome Forissier
2025-01-28 10:19 ` [RFC PATCH v2 3/3] usb: scan multiple buses simultaneously " Jerome Forissier
2025-01-28 14:04 ` [RFC PATCH v2 0/3] Coroutines Simon Glass
3 siblings, 0 replies; 12+ messages in thread
From: Jerome Forissier @ 2025-01-28 10:19 UTC (permalink / raw)
To: u-boot
Cc: Ilias Apalodimas, Jerome Forissier, Heinrich Schuchardt, Tom Rini,
Simon Glass
When COROUTINES is enabled, schedule efi_disks_register() and
efi_tcg2_register() as two coroutines in efi_init_obj_list() instead of
invoking them sequentially. The voluntary yield point is introduced
inside udelay() which is called frequently as a result of each function
polling the hardware. This allows the two coroutines to make progress
simultaneously and reduce the wall clock time required by
efi_init_obj_list().
Tested on Kria KV260 with a microSD card inserted with the "printenv
-e" command. With COROUTINES disabled, efi_init_obj_list() completes in
2821 ms on average (2825, 2822, 2817). With COROUTINES enabled, it
takes 2265 ms (2262, 2260, 2272). That is a reduction of 556 ms which
is not bad at all considering that measured separately,
efi_tcg2_register() takes ~825 ms and efi_disks_register() needs ~600
ms, so assuming they would overlap perfectly one can expect a 600 ms
improvement at best.
The code size penalty for this improvement is 1340 bytes.
Signed-off-by: Jerome Forissier <jerome.forissier@linaro.org>
---
lib/efi_loader/efi_setup.c | 113 +++++++++++++++++++++++++++++++++++--
lib/time.c | 14 ++++-
2 files changed, 122 insertions(+), 5 deletions(-)
diff --git a/lib/efi_loader/efi_setup.c b/lib/efi_loader/efi_setup.c
index aa59bc7779d..94160f4bd86 100644
--- a/lib/efi_loader/efi_setup.c
+++ b/lib/efi_loader/efi_setup.c
@@ -7,10 +7,12 @@
#define LOG_CATEGORY LOGC_EFI
+#include <coroutines.h>
#include <efi_loader.h>
#include <efi_variable.h>
#include <log.h>
#include <asm-generic/unaligned.h>
+#include <stdlib.h>
#define OBJ_LIST_NOT_INITIALIZED 1
@@ -208,6 +210,46 @@ out:
return -1;
}
+#if CONFIG_IS_ENABLED(COROUTINES)
+
+static void efi_disks_register_co(void)
+{
+ efi_status_t ret;
+
+ if (efi_obj_list_initialized != OBJ_LIST_NOT_INITIALIZED)
+ goto out;
+
+ /*
+ * Probe block devices to find the ESP.
+ * efi_disks_register() must be called before efi_init_variables().
+ */
+ ret = efi_disks_register();
+ if (ret != EFI_SUCCESS)
+ efi_obj_list_initialized = ret;
+out:
+ co_exit();
+}
+
+static void efi_tcg2_register_co(void)
+{
+ efi_status_t ret = EFI_SUCCESS;
+
+ if (efi_obj_list_initialized != OBJ_LIST_NOT_INITIALIZED)
+ goto out;
+
+ if (IS_ENABLED(CONFIG_EFI_TCG2_PROTOCOL)) {
+ ret = efi_tcg2_register();
+ if (ret != EFI_SUCCESS)
+ efi_obj_list_initialized = ret;
+ }
+out:
+ co_exit();
+}
+
+extern int udelay_yield;
+
+#endif /* COROUTINES */
+
/**
* efi_init_obj_list() - Initialize and populate EFI object list
*
@@ -216,6 +258,12 @@ out:
efi_status_t efi_init_obj_list(void)
{
efi_status_t ret = EFI_SUCCESS;
+#if CONFIG_IS_ENABLED(COROUTINES)
+ struct co_stack *stk = NULL;
+ struct co *main_co = NULL;
+ struct co *co1 = NULL;
+ struct co *co2 = NULL;
+#endif
/* Initialize once only */
if (efi_obj_list_initialized != OBJ_LIST_NOT_INITIALIZED)
@@ -224,6 +272,53 @@ efi_status_t efi_init_obj_list(void)
/* Set up console modes */
efi_setup_console_size();
+#if CONFIG_IS_ENABLED(COROUTINES)
+ main_co = co_create(NULL, NULL, 0, NULL, NULL);
+ if (!main_co) {
+ ret = EFI_OUT_OF_RESOURCES;
+ goto out;
+ }
+
+ stk = co_stack_new(8192);
+ if (!stk) {
+ ret = EFI_OUT_OF_RESOURCES;
+ goto out;
+ }
+
+ co1 = co_create(main_co, stk, 0, efi_disks_register_co, NULL);
+ if (!co1) {
+ ret = EFI_OUT_OF_RESOURCES;
+ goto out;
+ }
+
+ co2 = co_create(main_co, stk, 0, efi_tcg2_register_co, NULL);
+ if (!co2) {
+ ret = EFI_OUT_OF_RESOURCES;
+ goto out;
+ }
+
+ udelay_yield = 0xCAFEDECA;
+ do {
+ if (!co1->done)
+ co_resume(co1);
+ if (!co2->done)
+ co_resume(co2);
+ } while (!(co1->done && co2->done));
+ udelay_yield = 0;
+
+ co_stack_destroy(stk);
+ co_destroy(main_co);
+ co_destroy(co1);
+ co_destroy(co2);
+ stk = NULL;
+ main_co = co1 = co2 = NULL;
+
+ if (efi_obj_list_initialized != OBJ_LIST_NOT_INITIALIZED) {
+ /* Some kind of error was saved by a coroutine */
+ ret = efi_obj_list_initialized;
+ goto out;
+ }
+#else
/*
* Probe block devices to find the ESP.
* efi_disks_register() must be called before efi_init_variables().
@@ -232,6 +327,13 @@ efi_status_t efi_init_obj_list(void)
if (ret != EFI_SUCCESS)
goto out;
+ if (IS_ENABLED(CONFIG_EFI_TCG2_PROTOCOL)) {
+ ret = efi_tcg2_register();
+ if (ret != EFI_SUCCESS)
+ efi_obj_list_initialized = ret;
+ }
+#endif
+
/* Initialize variable services */
ret = efi_init_variables();
if (ret != EFI_SUCCESS)
@@ -272,10 +374,6 @@ efi_status_t efi_init_obj_list(void)
}
if (IS_ENABLED(CONFIG_EFI_TCG2_PROTOCOL)) {
- ret = efi_tcg2_register();
- if (ret != EFI_SUCCESS)
- goto out;
-
ret = efi_tcg2_do_initial_measurement();
if (ret == EFI_SECURITY_VIOLATION)
goto out;
@@ -350,6 +448,13 @@ efi_status_t efi_init_obj_list(void)
!IS_ENABLED(CONFIG_EFI_CAPSULE_ON_DISK_EARLY))
ret = efi_launch_capsules();
out:
+#if CONFIG_IS_ENABLED(COROUTINES)
+ co_stack_destroy(stk);
+ co_destroy(main_co);
+ co_destroy(co1);
+ co_destroy(co2);
efi_obj_list_initialized = ret;
+#endif
+
return ret;
}
diff --git a/lib/time.c b/lib/time.c
index d88edafb196..c11288102fe 100644
--- a/lib/time.c
+++ b/lib/time.c
@@ -17,6 +17,7 @@
#include <asm/global_data.h>
#include <asm/io.h>
#include <linux/delay.h>
+#include <coroutines.h>
#ifndef CFG_WD_PERIOD
# define CFG_WD_PERIOD (10 * 1000 * 1000) /* 10 seconds default */
@@ -190,6 +191,8 @@ void __weak __udelay(unsigned long usec)
/* ------------------------------------------------------------------------- */
+int udelay_yield;
+
void udelay(unsigned long usec)
{
ulong kv;
@@ -197,7 +200,16 @@ void udelay(unsigned long usec)
do {
schedule();
kv = usec > CFG_WD_PERIOD ? CFG_WD_PERIOD : usec;
- __udelay(kv);
+ if (CONFIG_IS_ENABLED(COROUTINES) &&
+ udelay_yield == 0xCAFEDECA) {
+ ulong t0 = timer_get_us();
+ do {
+ co_yield();
+ __udelay(10);
+ } while (timer_get_us() < t0 + kv);
+ } else {
+ __udelay(kv);
+ }
usec -= kv;
} while(usec);
}
--
2.43.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [RFC PATCH v2 3/3] usb: scan multiple buses simultaneously with coroutines
2025-01-28 10:19 [RFC PATCH v2 0/3] Coroutines Jerome Forissier
2025-01-28 10:19 ` [RFC PATCH v2 1/3] Introduce coroutines framework Jerome Forissier
2025-01-28 10:19 ` [RFC PATCH v2 2/3] efi_loader: optimize efi_init_obj_list() with coroutines Jerome Forissier
@ 2025-01-28 10:19 ` Jerome Forissier
2025-01-28 11:58 ` Michal Simek
2025-01-28 21:36 ` Marek Vasut
2025-01-28 14:04 ` [RFC PATCH v2 0/3] Coroutines Simon Glass
3 siblings, 2 replies; 12+ messages in thread
From: Jerome Forissier @ 2025-01-28 10:19 UTC (permalink / raw)
To: u-boot
Cc: Ilias Apalodimas, Jerome Forissier, Marek Vasut, Tom Rini,
Mattijs Korpershoek, Heinrich Schuchardt, Dragan Simic,
Caleb Connolly
Use the coroutines framework to scan USB buses in parallel for better
performance. Tested on arm64 QEMU on a somewhat contrived example
(4 USB buses, each with one audio device, one keyboard, one mouse and
one tablet).
$ make qemu_arm64_defconfig
$ make -j$(nproc) CROSS_COMPILE="ccache aarch64-linux-gnu-"
$ qemu-system-aarch64 -M virt -nographic -cpu max -bios u-boot.bin \
$(for i in {1..4}; do echo -device qemu-xhci,id=xhci$i \
-device\ usb-{audio,kbd,mouse,tablet},bus=xhci$i.0; \
done)
The time spent in usb_init() is reported on the console and shows a
significant improvement with COROUTINES enabled.
** Without COROUTINES
Bus xhci_pci: Register 8001040 NbrPorts 8
Starting the controller
USB XHCI 1.00
Bus xhci_pci: Register 8001040 NbrPorts 8
Starting the controller
USB XHCI 1.00
Bus xhci_pci: Register 8001040 NbrPorts 8
Starting the controller
USB XHCI 1.00
Bus xhci_pci: Register 8001040 NbrPorts 8
Starting the controller
USB XHCI 1.00
scanning bus xhci_pci for devices... 6 USB Device(s) found
scanning bus xhci_pci for devices... 6 USB Device(s) found
scanning bus xhci_pci for devices... 6 USB Device(s) found
scanning bus xhci_pci for devices... 6 USB Device(s) found
USB: 4 bus(es) scanned in 5873 ms
** With COROUTINES
Bus xhci_pci: Register 8001040 NbrPorts 8
Starting the controller
USB XHCI 1.00
Bus xhci_pci: Register 8001040 NbrPorts 8
Starting the controller
USB XHCI 1.00
Bus xhci_pci: Register 8001040 NbrPorts 8
Starting the controller
USB XHCI 1.00
Bus xhci_pci: Register 8001040 NbrPorts 8
Starting the controller
USB XHCI 1.00
Scanning 4 USB bus(es)... done
Bus xhci_pci: 6 USB device(s) found
Bus xhci_pci: 6 USB device(s) found
Bus xhci_pci: 6 USB device(s) found
Bus xhci_pci: 6 USB device(s) found
USB: 4 bus(es) scanned in 2213 ms
Signed-off-by: Jerome Forissier <jerome.forissier@linaro.org>
---
drivers/usb/host/usb-uclass.c | 152 +++++++++++++++++++++++++++++++++-
1 file changed, 149 insertions(+), 3 deletions(-)
diff --git a/drivers/usb/host/usb-uclass.c b/drivers/usb/host/usb-uclass.c
index bfec303e7af..3104efe7f9e 100644
--- a/drivers/usb/host/usb-uclass.c
+++ b/drivers/usb/host/usb-uclass.c
@@ -9,6 +9,7 @@
#define LOG_CATEGORY UCLASS_USB
#include <bootdev.h>
+#include <coroutines.h>
#include <dm.h>
#include <errno.h>
#include <log.h>
@@ -18,6 +19,8 @@
#include <dm/lists.h>
#include <dm/uclass-internal.h>
+#include <time.h>
+
static bool asynch_allowed;
struct usb_uclass_priv {
@@ -221,6 +224,40 @@ int usb_stop(void)
return err;
}
+static int nbus;
+
+#if CONFIG_IS_ENABLED(COROUTINES)
+static void usb_scan_bus(struct udevice *bus, bool recurse)
+{
+ struct usb_bus_priv *priv;
+ struct udevice *dev;
+ int ret;
+
+ priv = dev_get_uclass_priv(bus);
+
+ assert(recurse); /* TODO: Support non-recusive */
+
+ debug("\n");
+ ret = usb_scan_device(bus, 0, USB_SPEED_FULL, &dev);
+ if (ret)
+ printf("Scanning bus %s failed, error %d\n", bus->name, ret);
+}
+
+static void usb_report_devices(struct uclass *uc)
+{
+ struct usb_bus_priv *priv;
+ struct udevice *bus;
+
+ uclass_foreach_dev(bus, uc) {
+ priv = dev_get_uclass_priv(bus);
+ printf("Bus %s: ", bus->name);
+ if (priv->next_addr == 0)
+ printf("No USB device found\n");
+ else
+ printf("%d USB device(s) found\n", priv->next_addr);
+ }
+}
+#else
static void usb_scan_bus(struct udevice *bus, bool recurse)
{
struct usb_bus_priv *priv;
@@ -240,7 +277,81 @@ static void usb_scan_bus(struct udevice *bus, bool recurse)
printf("No USB Device found\n");
else
printf("%d USB Device(s) found\n", priv->next_addr);
+ nbus++;
}
+#endif
+
+#if CONFIG_IS_ENABLED(COROUTINES)
+extern int udelay_yield;
+
+static void usb_scan_bus_co(void)
+{
+ usb_scan_bus((struct udevice *)co_get_arg(), true);
+ co_exit();
+}
+
+static struct co_stack *stk;
+static struct co *main_co;
+static struct co **co;
+static int co_sz = 8;
+
+static int add_usb_scan_bus_co(struct udevice *bus)
+{
+ if (!co) {
+ co = malloc(co_sz * sizeof(*co));
+ if (!co)
+ return -ENOMEM;
+ }
+ if (nbus == co_sz) {
+ struct co **nco;
+
+ co_sz *= 2;
+ nco = realloc(co, co_sz * sizeof(*co));
+ if (!nco)
+ return -ENOMEM;
+ co = nco;
+ }
+ if (!main_co) {
+ main_co = co_create(NULL, NULL, 0, NULL, NULL);
+ if (!main_co)
+ return -ENOMEM;
+ }
+ if (!stk) {
+ stk = co_stack_new(32768);
+ if (!stk)
+ return -ENOMEM;
+ }
+ co[nbus] = co_create(main_co, stk, 0, usb_scan_bus_co, bus);
+ if (!co[nbus])
+ return -ENOMEM;
+ nbus++;
+ return 0;
+}
+
+static void usb_scan_cleanup(void)
+{
+ int i;
+
+ for (i = 0; i < nbus; i++) {
+ co_destroy(co[i]);
+ co[i] = NULL;
+ }
+ nbus = 0;
+ co_destroy(main_co);
+ main_co = NULL;
+ co_stack_destroy(stk);
+ stk = NULL;
+}
+#else
+static int add_usb_scan_bus_co(struct udevice *bus)
+{
+ return 0;
+}
+
+static void usb_scan_cleanup(void)
+{
+}
+#endif
static void remove_inactive_children(struct uclass *uc, struct udevice *bus)
{
@@ -289,6 +400,7 @@ static int usb_probe_companion(struct udevice *bus)
int usb_init(void)
{
+ unsigned long t0 = timer_get_us();
int controllers_initialized = 0;
struct usb_uclass_priv *uc_priv;
struct usb_bus_priv *priv;
@@ -355,10 +467,40 @@ int usb_init(void)
continue;
priv = dev_get_uclass_priv(bus);
- if (!priv->companion)
- usb_scan_bus(bus, true);
+ if (!priv->companion) {
+ if (CONFIG_IS_ENABLED(COROUTINES)) {
+ ret = add_usb_scan_bus_co(bus);
+ if (ret)
+ goto out;
+ } else {
+ usb_scan_bus(bus, true);
+ }
+ }
}
+#if CONFIG_IS_ENABLED(COROUTINES)
+ {
+ bool done;
+ int i;
+
+ printf("Scanning %d USB bus(es)... ", nbus);
+ udelay_yield = 0xCAFEDECA;
+ do {
+ done = true;
+ for (i = 0; i < nbus; i++) {
+ if (!co[i]->done) {
+ done = false;
+ co_resume(co[i]);
+ }
+ }
+ } while (!done);
+ udelay_yield = 0;
+ printf("done\n");
+
+ usb_report_devices(uc);
+ }
+#endif
+
/*
* Now that the primary controllers have been scanned and have handed
* over any devices they do not understand to their companions, scan
@@ -388,7 +530,11 @@ int usb_init(void)
/* if we were not able to find at least one working bus, bail out */
if (controllers_initialized == 0)
printf("No USB controllers found\n");
-
+out:
+ if (nbus)
+ printf("USB: %d bus(es) scanned in %ld ms\n", nbus,
+ (timer_get_us() - t0) / 1000);
+ usb_scan_cleanup();
return usb_started ? 0 : -ENOENT;
}
--
2.43.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [RFC PATCH v2 1/3] Introduce coroutines framework
2025-01-28 10:19 ` [RFC PATCH v2 1/3] Introduce coroutines framework Jerome Forissier
@ 2025-01-28 11:09 ` Michal Simek
2025-01-28 13:30 ` Jerome Forissier
2025-01-31 16:10 ` Yao Zi
1 sibling, 1 reply; 12+ messages in thread
From: Michal Simek @ 2025-01-28 11:09 UTC (permalink / raw)
To: Jerome Forissier, u-boot
Cc: Ilias Apalodimas, Tom Rini, Simon Glass, Patrick Rudolph,
Sughosh Ganu, Raymond Mao, Heinrich Schuchardt
On 1/28/25 11:19, Jerome Forissier wrote:
> Adds the COROUTINES Kconfig symbol which introduces a new internal API
> for coroutines support. As explained in the Kconfig file, this is meant
> to provide some kind of cooperative multi-tasking with the goal to
> improve performance by overlapping lengthy operations.
>
> The API as well as the implementation is very much inspired from libaco
> [1]. The reference implementation is simplified to remove all things
> not needed in U-Boot, the coding style is updated, and the aco_ prefix
> is replaced by co_.
>
> I believe the stack handling could be simplified: the stack of the main
Please use imperative mood.
> coroutine could probably probably be used by the secondary coroutines
> instead of allocating a new stack dynamically.
>
> Only i386, x86_64 and aarch64 are supported at the moment. Other
> architectures need to provide a _co_switch() function in assembly.
Likely should be updated.
>
> Only aarch64 has been tested.
>
> [1] https://github.com/hnes/libaco/
>
> Signed-off-by: Jerome Forissier <jerome.forissier@linaro.org>
> ---
> arch/arm/cpu/armv8/Makefile | 1 +
> arch/arm/cpu/armv8/co_switch.S | 36 +++++++
> include/coroutines.h | 130 ++++++++++++++++++++++++++
> lib/Kconfig | 10 ++
> lib/Makefile | 2 +
> lib/coroutines.c | 165 +++++++++++++++++++++++++++++++++
Checkpatch is reporting some issues. Please fix them.
total: 17 errors, 19 warnings, 1 checks, 359 lines checked
> 6 files changed, 344 insertions(+)
> create mode 100644 arch/arm/cpu/armv8/co_switch.S
> create mode 100644 include/coroutines.h
> create mode 100644 lib/coroutines.c
>
> diff --git a/arch/arm/cpu/armv8/Makefile b/arch/arm/cpu/armv8/Makefile
> index 2e71ff2dc97..6d07b6aa9f9 100644
> --- a/arch/arm/cpu/armv8/Makefile
> +++ b/arch/arm/cpu/armv8/Makefile
> @@ -46,3 +46,4 @@ obj-$(CONFIG_TARGET_BCMNS3) += bcmns3/
> obj-$(CONFIG_XEN) += xen/
> obj-$(CONFIG_ARMV8_CE_SHA1) += sha1_ce_glue.o sha1_ce_core.o
> obj-$(CONFIG_ARMV8_CE_SHA256) += sha256_ce_glue.o sha256_ce_core.o
> +obj-$(CONFIG_COROUTINES) += co_switch.o
> diff --git a/arch/arm/cpu/armv8/co_switch.S b/arch/arm/cpu/armv8/co_switch.S
> new file mode 100644
> index 00000000000..4405e89ec56
> --- /dev/null
> +++ b/arch/arm/cpu/armv8/co_switch.S
> @@ -0,0 +1,36 @@
> +/* SPDX-License-Identifier: GPL-2.0+ */
> +/* void _co_switch(struct uco *from_co, struct uco *to_co); */
> +.text
> +.globl _co_switch
> +.type _co_switch, @function
> +_co_switch:
> + // x0: from_co
> + // x1: to_co
> + // from_co and to_co layout: { pc, sp, x19-x29 }
> +
> + // Save context to from_co (x0)
> + // AAPCS64 says "A subroutine invocation must preserve the contents of the
> + // registers r19-r29 and SP"
> + adr x2, 1f // pc we should use to resume after this function
> + mov x3, sp
> + stp x2, x3, [x0, #0] // pc, sp
> + stp x19, x20, [x0, #16]
> + stp x21, x22, [x0, #32]
> + stp x23, x24, [x0, #48]
> + stp x25, x26, [x0, #64]
> + stp x27, x28, [x0, #80]
> + stp x29, x30, [x0, #96]
> +
> + // Load new context from to_co (x1)
> + ldp x2, x3, [x1, #0] // pc, sp
> + ldp x19, x20, [x1, #16]
> + ldp x21, x22, [x1, #32]
> + ldp x23, x24, [x1, #48]
> + ldp x25, x26, [x1, #64]
> + ldp x27, x28, [x1, #80]
> + ldp x29, x30, [x1, #96]
> + mov sp, x3
> + br x2
> +
> +1: // Return to the caller
> + ret
> diff --git a/include/coroutines.h b/include/coroutines.h
> new file mode 100644
> index 00000000000..b85b656127c
> --- /dev/null
> +++ b/include/coroutines.h
> @@ -0,0 +1,130 @@
> +/* SPDX-License-Identifier: Apache-2.0 OR GPL-2.0-or-later */
> +/*
> + * Copyright 2018 Sen Han <00hnes@gmail.com>
> + * Copyright 2025 Linaro Limited
> + */
> +
> +#ifndef _COROUTINES_H_
> +#define _COROUTINES_H_
> +
> +#ifndef CONFIG_COROUTINES
> +
> +static inline void co_yield(void) {}
> +static inline void co_exit(void) {}
> +
> +#else
> +
> +#ifdef __UBOOT__
> +#include <log.h>
> +#else
> +#include <assert.h>
> +#endif
Do we need ifdef at all? Are you testing it out of u-boot too? Or is this just
untested code?
> +#include <limits.h>
> +#include <stdbool.h>
> +#include <stdint.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <time.h>
> +
> +#ifdef __aarch64__
> +#define CO_REG_IDX_RETADDR 0
> +#define CO_REG_IDX_SP 1
> +#else
> +#error Architecture no supported
> +#endif
> +
> +struct co_save_stack {
> + void* ptr;
> + size_t sz;
> + size_t valid_sz;
> + size_t max_cpsz; /* max copy size in bytes */
> +};
> +
> +struct co_stack {
> + void *ptr;
> + size_t sz;
> + void *align_highptr;
> + void *align_retptr;
> + size_t align_validsz;
> + size_t align_limit;
> + struct co *owner;
> + void *real_ptr;
> + size_t real_sz;
> +};
> +
> +struct co {
> + /* CPU state: callee-saved registers plus SP and PC */
> + void *reg[14]; // pc, sp, x19-x29, x30 (lr)
> +
> + struct co *main_co;
> + void *arg;
> + bool done;
> +
> + void (*fp)(void);
> +
> + struct co_save_stack save_stack;
> + struct co_stack *stack;
> +};
> +
> +extern struct co *current_co;
> +
> +static inline struct co *co_get_co(void)
> +{
> + return current_co;
> +}
> +
> +static inline void *co_get_arg(void)
> +{
> + return co_get_co()->arg;
> +}
> +
> +struct co_stack *co_stack_new(size_t sz);
> +
> +void co_stack_destroy(struct co_stack *s);
> +
> +struct co *co_create(struct co *main_co,
> + struct co_stack *stack,
> + size_t save_stack_sz, void (*fp)(void),
> + void *arg);
> +
> +void co_resume(struct co *resume_co);
> +
> +void co_destroy(struct co *co);
> +
> +void *_co_switch(struct co *from_co, struct co *to_co);
> +
> +static inline void _co_yield_to_main_co(struct co *yield_co)
> +{
> + assert(yield_co);
> + assert(yield_co->main_co);
> + _co_switch(yield_co, yield_co->main_co);
> +}
> +
> +static inline void co_yield(void)
> +{
> + if (current_co)
> + _co_yield_to_main_co(current_co);
> +}
> +
> +static inline bool co_is_main_co(struct co *co)
> +{
> + return !co->main_co;
> +}
> +
> +static inline void co_exit(void)
> +{
> + struct co *co = co_get_co();
> +
> + if (!co)
> + return;
> + co->done = true;
> + assert(co->stack->owner == co);
> + co->stack->owner = NULL;
> + co->stack->align_validsz = 0;
> + _co_yield_to_main_co(co);
> + assert(false);
> +}
> +
> +#endif /* CONFIG_COROUTINES */
> +#endif /* _COROUTINES_H_ */
> diff --git a/lib/Kconfig b/lib/Kconfig
> index 8f1a96d98c4..b6c1380b927 100644
> --- a/lib/Kconfig
> +++ b/lib/Kconfig
> @@ -1226,6 +1226,16 @@ config PHANDLE_CHECK_SEQ
> enable this config option to distinguish them using
> phandles in fdtdec_get_alias_seq() function.
>
> +config COROUTINES
> + bool "Enable coroutine support"
> + help
> + Coroutines allow to implement a simple form of cooperative
> + multi-tasking. The main thread of execution registers one or
> + more functions as coroutine entry points, then it schedules one
> + of them. At any point the scheduled coroutine may yield, that is,
> + suspend its execution and return back to the main thread. At this
> + point another coroutine may be scheduled and so on until all the
> + registered coroutines are done.
newline
> endmenu
>
> source "lib/fwu_updates/Kconfig"
> diff --git a/lib/Makefile b/lib/Makefile
> index 5cb3278d2ef..7b809151f5a 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -159,6 +159,8 @@ obj-$(CONFIG_LIB_ELF) += elf.o
>
> obj-$(CONFIG_$(PHASE_)SEMIHOSTING) += semihosting.o
>
> +obj-$(CONFIG_COROUTINES) += coroutines.o
> +
> #
> # Build a fast OID lookup registry from include/linux/oid_registry.h
> #
> diff --git a/lib/coroutines.c b/lib/coroutines.c
> new file mode 100644
> index 00000000000..20c5aba5510
> --- /dev/null
> +++ b/lib/coroutines.c
> @@ -0,0 +1,165 @@
> +// SPDX-License-Identifier: Apache-2.0 OR GPL-2.0-or-later
> +
> +// Copyright 2018 Sen Han <00hnes@gmail.com>
> +// Copyright 2025 Linaro Limited
> +
> +#include <coroutines.h>
> +#include <stdio.h>
> +#include <stdint.h>
> +
> +
> +/* Current co-routine */
> +struct co *current_co;
> +
> +struct co_stack *co_stack_new(size_t sz)
> +{
> + struct co_stack *p = calloc(1, sizeof(*p));
> + uintptr_t u_p;
> +
> + if (!p)
> + return NULL;
> +
> + if (sz < 4096)
> + sz = 4096;
> +
> + p->sz = sz;
> + p->ptr = malloc(sz);
> + if (!p->ptr) {
> + free(p);
> + return NULL;
> + }
> +
> + p->owner = NULL;
> + u_p = (uintptr_t)(p->sz - (sizeof(void*) << 1) + (uintptr_t)p->ptr);
> + u_p = (u_p >> 4) << 4;
> + p->align_highptr = (void*)u_p;
> + p->align_retptr = (void*)(u_p - sizeof(void*));
> + assert(p->sz > (16 + (sizeof(void*) << 1) + sizeof(void*)));
> + p->align_limit = p->sz - 16 - (sizeof(void*) << 1);
> +
> + return p;
> +}
> +
> +void co_stack_destroy(struct co_stack *s){
> + if (!s)
> + return;
> + free(s->ptr);
> + free(s);
> +}
> +
> +struct co *co_create(struct co *main_co,
> + struct co_stack *stack,
> + size_t save_stack_sz,
> + void (*fp)(void), void *arg)
> +{
> + struct co *p = malloc(sizeof(*p));
> + assert(p);
> + memset(p, 0, sizeof(*p));
> +
> + if (main_co) {
> + assert(stack);
> + p->stack = stack;
> + p->reg[CO_REG_IDX_RETADDR] = (void *)fp;
> + // FIXME original code uses align_retptr; causes a crash
> + p->reg[CO_REG_IDX_SP] = p->stack->align_highptr;
> + p->main_co = main_co;
> + p->arg = arg;
> + p->fp = fp;
> + if (!save_stack_sz)
> + save_stack_sz = 64;
> + p->save_stack.ptr = malloc(save_stack_sz);
> + assert(p->save_stack.ptr);
> + p->save_stack.sz = save_stack_sz;
> + p->save_stack.valid_sz = 0;
> + } else {
> + p->main_co = NULL;
> + p->arg = arg;
> + p->fp = fp;
> + p->stack = NULL;
> + p->save_stack.ptr = NULL;
> + }
> + return p;
> +}
> +
> +static void grab_stack(struct co *resume_co)
> +{
> + struct co *owner_co = resume_co->stack->owner;
> +
> + if (owner_co) {
> + assert(owner_co->stack == resume_co->stack);
> + assert((uintptr_t)(owner_co->stack->align_retptr) >=
> + (uintptr_t)(owner_co->reg[CO_REG_IDX_SP]));
> + assert((uintptr_t)owner_co->stack->align_highptr -
> + (uintptr_t)owner_co->stack->align_limit
> + <= (uintptr_t)owner_co->reg[CO_REG_IDX_SP]);
> + owner_co->save_stack.valid_sz =
> + (uintptr_t)owner_co->stack->align_retptr -
> + (uintptr_t)owner_co->reg[CO_REG_IDX_SP];
> + if (owner_co->save_stack.sz < owner_co->save_stack.valid_sz) {
> + free(owner_co->save_stack.ptr);
> + owner_co->save_stack.ptr = NULL;
> + do {
> + owner_co->save_stack.sz <<= 1;
> + assert(owner_co->save_stack.sz > 0);
> + } while (owner_co->save_stack.sz <
> + owner_co->save_stack.valid_sz);
> + owner_co->save_stack.ptr =
> + malloc(owner_co->save_stack.sz);
> + assert(owner_co->save_stack.ptr);
> + }
> + if (owner_co->save_stack.valid_sz > 0)
> + memcpy(owner_co->save_stack.ptr,
> + owner_co->reg[CO_REG_IDX_SP],
> + owner_co->save_stack.valid_sz);
> + if (owner_co->save_stack.valid_sz >
> + owner_co->save_stack.max_cpsz)
> + owner_co->save_stack.max_cpsz =
> + owner_co->save_stack.valid_sz;
> + owner_co->stack->owner = NULL;
> + owner_co->stack->align_validsz = 0;
> + }
> + assert(!resume_co->stack->owner);
> + assert(resume_co->save_stack.valid_sz <=
> + resume_co->stack->align_limit - sizeof(void *));
> + if (resume_co->save_stack.valid_sz > 0)
> + memcpy((void*)
> + (uintptr_t)(resume_co->stack->align_retptr) -
> + resume_co->save_stack.valid_sz,
> + resume_co->save_stack.ptr,
> + resume_co->save_stack.valid_sz);
> + if (resume_co->save_stack.valid_sz > resume_co->save_stack.max_cpsz)
> + resume_co->save_stack.max_cpsz = resume_co->save_stack.valid_sz;
> + resume_co->stack->align_validsz =
> + resume_co->save_stack.valid_sz + sizeof(void *);
> + resume_co->stack->owner = resume_co;
> +}
> +
> +void co_resume(struct co *resume_co)
> +{
> + assert(resume_co && resume_co->main_co && !resume_co->done);
> +
> + if (resume_co->stack->owner != resume_co)
> + grab_stack(resume_co);
> +
> + current_co = resume_co;
> + _co_switch(resume_co->main_co, resume_co);
> + current_co = resume_co->main_co;
> +}
> +
> +void co_destroy(struct co *co){
obviously this is not u-boot coding style. I expect checkpatch reports this too.
> + if (!co)
> + return;
> +
> + if(co_is_main_co(co)){
> + free(co);
> + current_co = NULL;
> + } else {
> + if(co->stack->owner == co){
> + co->stack->owner = NULL;
> + co->stack->align_validsz = 0;
> + }
> + free(co->save_stack.ptr);
> + co->save_stack.ptr = NULL;
> + free(co);
> + }
> +}
Thanks,
Michal
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC PATCH v2 3/3] usb: scan multiple buses simultaneously with coroutines
2025-01-28 10:19 ` [RFC PATCH v2 3/3] usb: scan multiple buses simultaneously " Jerome Forissier
@ 2025-01-28 11:58 ` Michal Simek
2025-01-28 13:53 ` Jerome Forissier
2025-01-28 21:36 ` Marek Vasut
1 sibling, 1 reply; 12+ messages in thread
From: Michal Simek @ 2025-01-28 11:58 UTC (permalink / raw)
To: Jerome Forissier
Cc: u-boot, Ilias Apalodimas, Marek Vasut, Tom Rini,
Mattijs Korpershoek, Heinrich Schuchardt, Dragan Simic,
Caleb Connolly
Hi,
út 28. 1. 2025 v 11:20 odesílatel Jerome Forissier
<jerome.forissier@linaro.org> napsal:
>
> Use the coroutines framework to scan USB buses in parallel for better
> performance. Tested on arm64 QEMU on a somewhat contrived example
> (4 USB buses, each with one audio device, one keyboard, one mouse and
> one tablet).
>
> $ make qemu_arm64_defconfig
> $ make -j$(nproc) CROSS_COMPILE="ccache aarch64-linux-gnu-"
> $ qemu-system-aarch64 -M virt -nographic -cpu max -bios u-boot.bin \
> $(for i in {1..4}; do echo -device qemu-xhci,id=xhci$i \
> -device\ usb-{audio,kbd,mouse,tablet},bus=xhci$i.0; \
> done)
>
> The time spent in usb_init() is reported on the console and shows a
> significant improvement with COROUTINES enabled.
>
> ** Without COROUTINES
>
> Bus xhci_pci: Register 8001040 NbrPorts 8
> Starting the controller
> USB XHCI 1.00
> Bus xhci_pci: Register 8001040 NbrPorts 8
> Starting the controller
> USB XHCI 1.00
> Bus xhci_pci: Register 8001040 NbrPorts 8
> Starting the controller
> USB XHCI 1.00
> Bus xhci_pci: Register 8001040 NbrPorts 8
> Starting the controller
> USB XHCI 1.00
> scanning bus xhci_pci for devices... 6 USB Device(s) found
> scanning bus xhci_pci for devices... 6 USB Device(s) found
> scanning bus xhci_pci for devices... 6 USB Device(s) found
> scanning bus xhci_pci for devices... 6 USB Device(s) found
> USB: 4 bus(es) scanned in 5873 ms
>
> ** With COROUTINES
>
> Bus xhci_pci: Register 8001040 NbrPorts 8
> Starting the controller
> USB XHCI 1.00
> Bus xhci_pci: Register 8001040 NbrPorts 8
> Starting the controller
> USB XHCI 1.00
> Bus xhci_pci: Register 8001040 NbrPorts 8
> Starting the controller
> USB XHCI 1.00
> Bus xhci_pci: Register 8001040 NbrPorts 8
> Starting the controller
> USB XHCI 1.00
> Scanning 4 USB bus(es)... done
> Bus xhci_pci: 6 USB device(s) found
> Bus xhci_pci: 6 USB device(s) found
> Bus xhci_pci: 6 USB device(s) found
> Bus xhci_pci: 6 USB device(s) found
> USB: 4 bus(es) scanned in 2213 ms
>
> Signed-off-by: Jerome Forissier <jerome.forissier@linaro.org>
> ---
> drivers/usb/host/usb-uclass.c | 152 +++++++++++++++++++++++++++++++++-
> 1 file changed, 149 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/usb/host/usb-uclass.c b/drivers/usb/host/usb-uclass.c
> index bfec303e7af..3104efe7f9e 100644
> --- a/drivers/usb/host/usb-uclass.c
> +++ b/drivers/usb/host/usb-uclass.c
> @@ -9,6 +9,7 @@
> #define LOG_CATEGORY UCLASS_USB
>
> #include <bootdev.h>
> +#include <coroutines.h>
> #include <dm.h>
> #include <errno.h>
> #include <log.h>
> @@ -18,6 +19,8 @@
> #include <dm/lists.h>
> #include <dm/uclass-internal.h>
>
> +#include <time.h>
> +
> static bool asynch_allowed;
>
> struct usb_uclass_priv {
> @@ -221,6 +224,40 @@ int usb_stop(void)
> return err;
> }
>
> +static int nbus;
> +
> +#if CONFIG_IS_ENABLED(COROUTINES)
> +static void usb_scan_bus(struct udevice *bus, bool recurse)
> +{
> + struct usb_bus_priv *priv;
> + struct udevice *dev;
> + int ret;
> +
> + priv = dev_get_uclass_priv(bus);
> +
> + assert(recurse); /* TODO: Support non-recusive */
> +
> + debug("\n");
> + ret = usb_scan_device(bus, 0, USB_SPEED_FULL, &dev);
> + if (ret)
> + printf("Scanning bus %s failed, error %d\n", bus->name, ret);
> +}
> +
> +static void usb_report_devices(struct uclass *uc)
> +{
> + struct usb_bus_priv *priv;
> + struct udevice *bus;
> +
> + uclass_foreach_dev(bus, uc) {
> + priv = dev_get_uclass_priv(bus);
> + printf("Bus %s: ", bus->name);
> + if (priv->next_addr == 0)
> + printf("No USB device found\n");
> + else
> + printf("%d USB device(s) found\n", priv->next_addr);
> + }
> +}
> +#else
> static void usb_scan_bus(struct udevice *bus, bool recurse)
> {
> struct usb_bus_priv *priv;
> @@ -240,7 +277,81 @@ static void usb_scan_bus(struct udevice *bus, bool recurse)
> printf("No USB Device found\n");
> else
> printf("%d USB Device(s) found\n", priv->next_addr);
> + nbus++;
> }
> +#endif
> +
> +#if CONFIG_IS_ENABLED(COROUTINES)
> +extern int udelay_yield;
> +
> +static void usb_scan_bus_co(void)
> +{
> + usb_scan_bus((struct udevice *)co_get_arg(), true);
> + co_exit();
> +}
> +
> +static struct co_stack *stk;
> +static struct co *main_co;
> +static struct co **co;
> +static int co_sz = 8;
> +
> +static int add_usb_scan_bus_co(struct udevice *bus)
> +{
> + if (!co) {
> + co = malloc(co_sz * sizeof(*co));
> + if (!co)
> + return -ENOMEM;
> + }
> + if (nbus == co_sz) {
> + struct co **nco;
> +
> + co_sz *= 2;
> + nco = realloc(co, co_sz * sizeof(*co));
> + if (!nco)
> + return -ENOMEM;
> + co = nco;
> + }
> + if (!main_co) {
> + main_co = co_create(NULL, NULL, 0, NULL, NULL);
> + if (!main_co)
> + return -ENOMEM;
> + }
> + if (!stk) {
> + stk = co_stack_new(32768);
> + if (!stk)
> + return -ENOMEM;
> + }
> + co[nbus] = co_create(main_co, stk, 0, usb_scan_bus_co, bus);
> + if (!co[nbus])
> + return -ENOMEM;
> + nbus++;
> + return 0;
> +}
> +
> +static void usb_scan_cleanup(void)
> +{
> + int i;
> +
> + for (i = 0; i < nbus; i++) {
> + co_destroy(co[i]);
> + co[i] = NULL;
> + }
> + nbus = 0;
> + co_destroy(main_co);
> + main_co = NULL;
> + co_stack_destroy(stk);
> + stk = NULL;
> +}
> +#else
> +static int add_usb_scan_bus_co(struct udevice *bus)
> +{
> + return 0;
> +}
> +
> +static void usb_scan_cleanup(void)
> +{
> +}
> +#endif
>
> static void remove_inactive_children(struct uclass *uc, struct udevice *bus)
> {
> @@ -289,6 +400,7 @@ static int usb_probe_companion(struct udevice *bus)
>
> int usb_init(void)
> {
> + unsigned long t0 = timer_get_us();
> int controllers_initialized = 0;
> struct usb_uclass_priv *uc_priv;
> struct usb_bus_priv *priv;
> @@ -355,10 +467,40 @@ int usb_init(void)
> continue;
>
> priv = dev_get_uclass_priv(bus);
> - if (!priv->companion)
> - usb_scan_bus(bus, true);
> + if (!priv->companion) {
> + if (CONFIG_IS_ENABLED(COROUTINES)) {
> + ret = add_usb_scan_bus_co(bus);
> + if (ret)
> + goto out;
> + } else {
> + usb_scan_bus(bus, true);
> + }
> + }
> }
>
> +#if CONFIG_IS_ENABLED(COROUTINES)
> + {
> + bool done;
> + int i;
> +
> + printf("Scanning %d USB bus(es)... ", nbus);
> + udelay_yield = 0xCAFEDECA;
> + do {
> + done = true;
> + for (i = 0; i < nbus; i++) {
> + if (!co[i]->done) {
> + done = false;
> + co_resume(co[i]);
> + }
> + }
> + } while (!done);
> + udelay_yield = 0;
> + printf("done\n");
> +
> + usb_report_devices(uc);
> + }
> +#endif
> +
> /*
> * Now that the primary controllers have been scanned and have handed
> * over any devices they do not understand to their companions, scan
> @@ -388,7 +530,11 @@ int usb_init(void)
> /* if we were not able to find at least one working bus, bail out */
> if (controllers_initialized == 0)
> printf("No USB controllers found\n");
> -
> +out:
> + if (nbus)
> + printf("USB: %d bus(es) scanned in %ld ms\n", nbus,
> + (timer_get_us() - t0) / 1000);
> + usb_scan_cleanup();
> return usb_started ? 0 : -ENOENT;
> }
>
> --
> 2.43.0
>
I have tested it on kr260 which is using 2 usb interfaces and there is
an issue with usb hub initialization.
That board has two hubs connected over i2c and only one of them is
initialized over i2c.
It means there is some work which needs to happen and likely some
locking should be in place.
Thanks,
Michal
--
Michal Simek, Ing. (M.Eng), OpenPGP -> KeyID: FE3D1F91
w: www.monstr.eu p: +42-0-721842854
Maintainer of Linux kernel - Xilinx Microblaze
Maintainer of Linux kernel - Xilinx Zynq ARM and ZynqMP ARM64 SoCs
U-Boot custodian - Xilinx Microblaze/Zynq/ZynqMP/Versal SoCs
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC PATCH v2 1/3] Introduce coroutines framework
2025-01-28 11:09 ` Michal Simek
@ 2025-01-28 13:30 ` Jerome Forissier
0 siblings, 0 replies; 12+ messages in thread
From: Jerome Forissier @ 2025-01-28 13:30 UTC (permalink / raw)
To: Michal Simek, u-boot
Cc: Ilias Apalodimas, Tom Rini, Simon Glass, Patrick Rudolph,
Sughosh Ganu, Raymond Mao, Heinrich Schuchardt
On 1/28/25 12:09, Michal Simek wrote:
>
>
> On 1/28/25 11:19, Jerome Forissier wrote:
>> Adds the COROUTINES Kconfig symbol which introduces a new internal API
>> for coroutines support. As explained in the Kconfig file, this is meant
>> to provide some kind of cooperative multi-tasking with the goal to
>> improve performance by overlapping lengthy operations.
>>
>> The API as well as the implementation is very much inspired from libaco
>> [1]. The reference implementation is simplified to remove all things
>> not needed in U-Boot, the coding style is updated, and the aco_ prefix
>> is replaced by co_.
>>
>> I believe the stack handling could be simplified: the stack of the main
>
> Please use imperative mood.
OK. I will drop the remark about stack handling simplification because it
is not very helpful here.
>> coroutine could probably probably be used by the secondary coroutines
>> instead of allocating a new stack dynamically.
>>
>> Only i386, x86_64 and aarch64 are supported at the moment. Other
>> architectures need to provide a _co_switch() function in assembly.
>
> Likely should be updated.
OK
>
>>
>> Only aarch64 has been tested.
>>
>> [1] https://github.com/hnes/libaco/
>>
>> Signed-off-by: Jerome Forissier <jerome.forissier@linaro.org>
>> ---
>> arch/arm/cpu/armv8/Makefile | 1 +
>> arch/arm/cpu/armv8/co_switch.S | 36 +++++++
>> include/coroutines.h | 130 ++++++++++++++++++++++++++
>> lib/Kconfig | 10 ++
>> lib/Makefile | 2 +
>> lib/coroutines.c | 165 +++++++++++++++++++++++++++++++++
>
> Checkpatch is reporting some issues. Please fix them.
>
> total: 17 errors, 19 warnings, 1 checks, 359 lines checked
All the checkpatch issues will be fixed in v3.
>> 6 files changed, 344 insertions(+)
>> create mode 100644 arch/arm/cpu/armv8/co_switch.S
>> create mode 100644 include/coroutines.h
>> create mode 100644 lib/coroutines.c
>>
>> diff --git a/arch/arm/cpu/armv8/Makefile b/arch/arm/cpu/armv8/Makefile
>> index 2e71ff2dc97..6d07b6aa9f9 100644
>> --- a/arch/arm/cpu/armv8/Makefile
>> +++ b/arch/arm/cpu/armv8/Makefile
>> @@ -46,3 +46,4 @@ obj-$(CONFIG_TARGET_BCMNS3) += bcmns3/
>> obj-$(CONFIG_XEN) += xen/
>> obj-$(CONFIG_ARMV8_CE_SHA1) += sha1_ce_glue.o sha1_ce_core.o
>> obj-$(CONFIG_ARMV8_CE_SHA256) += sha256_ce_glue.o sha256_ce_core.o
>> +obj-$(CONFIG_COROUTINES) += co_switch.o
>> diff --git a/arch/arm/cpu/armv8/co_switch.S b/arch/arm/cpu/armv8/co_switch.S
>> new file mode 100644
>> index 00000000000..4405e89ec56
>> --- /dev/null
>> +++ b/arch/arm/cpu/armv8/co_switch.S
>> @@ -0,0 +1,36 @@
>> +/* SPDX-License-Identifier: GPL-2.0+ */
>> +/* void _co_switch(struct uco *from_co, struct uco *to_co); */
>> +.text
>> +.globl _co_switch
>> +.type _co_switch, @function
>> +_co_switch:
>> + // x0: from_co
>> + // x1: to_co
>> + // from_co and to_co layout: { pc, sp, x19-x29 }
>> +
>> + // Save context to from_co (x0)
>> + // AAPCS64 says "A subroutine invocation must preserve the contents of the
>> + // registers r19-r29 and SP"
>> + adr x2, 1f // pc we should use to resume after this function
>> + mov x3, sp
>> + stp x2, x3, [x0, #0] // pc, sp
>> + stp x19, x20, [x0, #16]
>> + stp x21, x22, [x0, #32]
>> + stp x23, x24, [x0, #48]
>> + stp x25, x26, [x0, #64]
>> + stp x27, x28, [x0, #80]
>> + stp x29, x30, [x0, #96]
>> +
>> + // Load new context from to_co (x1)
>> + ldp x2, x3, [x1, #0] // pc, sp
>> + ldp x19, x20, [x1, #16]
>> + ldp x21, x22, [x1, #32]
>> + ldp x23, x24, [x1, #48]
>> + ldp x25, x26, [x1, #64]
>> + ldp x27, x28, [x1, #80]
>> + ldp x29, x30, [x1, #96]
>> + mov sp, x3
>> + br x2
>> +
>> +1: // Return to the caller
>> + ret
>> diff --git a/include/coroutines.h b/include/coroutines.h
>> new file mode 100644
>> index 00000000000..b85b656127c
>> --- /dev/null
>> +++ b/include/coroutines.h
>> @@ -0,0 +1,130 @@
>> +/* SPDX-License-Identifier: Apache-2.0 OR GPL-2.0-or-later */
>> +/*
>> + * Copyright 2018 Sen Han <00hnes@gmail.com>
>> + * Copyright 2025 Linaro Limited
>> + */
>> +
>> +#ifndef _COROUTINES_H_
>> +#define _COROUTINES_H_
>> +
>> +#ifndef CONFIG_COROUTINES
>> +
>> +static inline void co_yield(void) {}
>> +static inline void co_exit(void) {}
>> +
>> +#else
>> +
>> +#ifdef __UBOOT__
>> +#include <log.h>
>> +#else
>> +#include <assert.h>
>> +#endif
>
> Do we need ifdef at all? Are you testing it out of u-boot too? Or is this just untested code?
It is a leftover from when I used to test outside U-Boot (in a regular
Linux application) but that is not useful anymore. I'll remove the
conditional.
>> +#include <limits.h>
>> +#include <stdbool.h>
>> +#include <stdint.h>
>> +#include <stdio.h>
>> +#include <stdlib.h>
>> +#include <string.h>
>> +#include <time.h>
>> +
>> +#ifdef __aarch64__
>> +#define CO_REG_IDX_RETADDR 0
>> +#define CO_REG_IDX_SP 1
>> +#else
>> +#error Architecture no supported
>> +#endif
>> +
>> +struct co_save_stack {
>> + void* ptr;
>> + size_t sz;
>> + size_t valid_sz;
>> + size_t max_cpsz; /* max copy size in bytes */
>> +};
>> +
>> +struct co_stack {
>> + void *ptr;
>> + size_t sz;
>> + void *align_highptr;
>> + void *align_retptr;
>> + size_t align_validsz;
>> + size_t align_limit;
>> + struct co *owner;
>> + void *real_ptr;
>> + size_t real_sz;
>> +};
>> +
>> +struct co {
>> + /* CPU state: callee-saved registers plus SP and PC */
>> + void *reg[14]; // pc, sp, x19-x29, x30 (lr)
>> +
>> + struct co *main_co;
>> + void *arg;
>> + bool done;
>> +
>> + void (*fp)(void);
>> +
>> + struct co_save_stack save_stack;
>> + struct co_stack *stack;
>> +};
>> +
>> +extern struct co *current_co;
>> +
>> +static inline struct co *co_get_co(void)
>> +{
>> + return current_co;
>> +}
>> +
>> +static inline void *co_get_arg(void)
>> +{
>> + return co_get_co()->arg;
>> +}
>> +
>> +struct co_stack *co_stack_new(size_t sz);
>> +
>> +void co_stack_destroy(struct co_stack *s);
>> +
>> +struct co *co_create(struct co *main_co,
>> + struct co_stack *stack,
>> + size_t save_stack_sz, void (*fp)(void),
>> + void *arg);
>> +
>> +void co_resume(struct co *resume_co);
>> +
>> +void co_destroy(struct co *co);
>> +
>> +void *_co_switch(struct co *from_co, struct co *to_co);
>> +
>> +static inline void _co_yield_to_main_co(struct co *yield_co)
>> +{
>> + assert(yield_co);
>> + assert(yield_co->main_co);
>> + _co_switch(yield_co, yield_co->main_co);
>> +}
>> +
>> +static inline void co_yield(void)
>> +{
>> + if (current_co)
>> + _co_yield_to_main_co(current_co);
>> +}
>> +
>> +static inline bool co_is_main_co(struct co *co)
>> +{
>> + return !co->main_co;
>> +}
>> +
>> +static inline void co_exit(void)
>> +{
>> + struct co *co = co_get_co();
>> +
>> + if (!co)
>> + return;
>> + co->done = true;
>> + assert(co->stack->owner == co);
>> + co->stack->owner = NULL;
>> + co->stack->align_validsz = 0;
>> + _co_yield_to_main_co(co);
>> + assert(false);
>> +}
>> +
>> +#endif /* CONFIG_COROUTINES */
>> +#endif /* _COROUTINES_H_ */
>> diff --git a/lib/Kconfig b/lib/Kconfig
>> index 8f1a96d98c4..b6c1380b927 100644
>> --- a/lib/Kconfig
>> +++ b/lib/Kconfig
>> @@ -1226,6 +1226,16 @@ config PHANDLE_CHECK_SEQ
>> enable this config option to distinguish them using
>> phandles in fdtdec_get_alias_seq() function.
>> +config COROUTINES
>> + bool "Enable coroutine support"
>> + help
>> + Coroutines allow to implement a simple form of cooperative
>> + multi-tasking. The main thread of execution registers one or
>> + more functions as coroutine entry points, then it schedules one
>> + of them. At any point the scheduled coroutine may yield, that is,
>> + suspend its execution and return back to the main thread. At this
>> + point another coroutine may be scheduled and so on until all the
>> + registered coroutines are done.
>
> newline
>
>> endmenu
>> source "lib/fwu_updates/Kconfig"
>> diff --git a/lib/Makefile b/lib/Makefile
>> index 5cb3278d2ef..7b809151f5a 100644
>> --- a/lib/Makefile
>> +++ b/lib/Makefile
>> @@ -159,6 +159,8 @@ obj-$(CONFIG_LIB_ELF) += elf.o
>> obj-$(CONFIG_$(PHASE_)SEMIHOSTING) += semihosting.o
>> +obj-$(CONFIG_COROUTINES) += coroutines.o
>> +
>> #
>> # Build a fast OID lookup registry from include/linux/oid_registry.h
>> #
>> diff --git a/lib/coroutines.c b/lib/coroutines.c
>> new file mode 100644
>> index 00000000000..20c5aba5510
>> --- /dev/null
>> +++ b/lib/coroutines.c
>> @@ -0,0 +1,165 @@
>> +// SPDX-License-Identifier: Apache-2.0 OR GPL-2.0-or-later
>> +
>> +// Copyright 2018 Sen Han <00hnes@gmail.com>
>> +// Copyright 2025 Linaro Limited
>> +
>> +#include <coroutines.h>
>> +#include <stdio.h>
>> +#include <stdint.h>
>> +
>> +
>> +/* Current co-routine */
>> +struct co *current_co;
>> +
>> +struct co_stack *co_stack_new(size_t sz)
>> +{
>> + struct co_stack *p = calloc(1, sizeof(*p));
>> + uintptr_t u_p;
>> +
>> + if (!p)
>> + return NULL;
>> +
>> + if (sz < 4096)
>> + sz = 4096;
>> +
>> + p->sz = sz;
>> + p->ptr = malloc(sz);
>> + if (!p->ptr) {
>> + free(p);
>> + return NULL;
>> + }
>> +
>> + p->owner = NULL;
>> + u_p = (uintptr_t)(p->sz - (sizeof(void*) << 1) + (uintptr_t)p->ptr);
>> + u_p = (u_p >> 4) << 4;
>> + p->align_highptr = (void*)u_p;
>> + p->align_retptr = (void*)(u_p - sizeof(void*));
>> + assert(p->sz > (16 + (sizeof(void*) << 1) + sizeof(void*)));
>> + p->align_limit = p->sz - 16 - (sizeof(void*) << 1);
>> +
>> + return p;
>> +}
>> +
>> +void co_stack_destroy(struct co_stack *s){
>> + if (!s)
>> + return;
>> + free(s->ptr);
>> + free(s);
>> +}
>> +
>> +struct co *co_create(struct co *main_co,
>> + struct co_stack *stack,
>> + size_t save_stack_sz,
>> + void (*fp)(void), void *arg)
>> +{
>> + struct co *p = malloc(sizeof(*p));
>> + assert(p);
>> + memset(p, 0, sizeof(*p));
>> +
>> + if (main_co) {
>> + assert(stack);
>> + p->stack = stack;
>> + p->reg[CO_REG_IDX_RETADDR] = (void *)fp;
>> + // FIXME original code uses align_retptr; causes a crash
>> + p->reg[CO_REG_IDX_SP] = p->stack->align_highptr;
>> + p->main_co = main_co;
>> + p->arg = arg;
>> + p->fp = fp;
>> + if (!save_stack_sz)
>> + save_stack_sz = 64;
>> + p->save_stack.ptr = malloc(save_stack_sz);
>> + assert(p->save_stack.ptr);
>> + p->save_stack.sz = save_stack_sz;
>> + p->save_stack.valid_sz = 0;
>> + } else {
>> + p->main_co = NULL;
>> + p->arg = arg;
>> + p->fp = fp;
>> + p->stack = NULL;
>> + p->save_stack.ptr = NULL;
>> + }
>> + return p;
>> +}
>> +
>> +static void grab_stack(struct co *resume_co)
>> +{
>> + struct co *owner_co = resume_co->stack->owner;
>> +
>> + if (owner_co) {
>> + assert(owner_co->stack == resume_co->stack);
>> + assert((uintptr_t)(owner_co->stack->align_retptr) >=
>> + (uintptr_t)(owner_co->reg[CO_REG_IDX_SP]));
>> + assert((uintptr_t)owner_co->stack->align_highptr -
>> + (uintptr_t)owner_co->stack->align_limit
>> + <= (uintptr_t)owner_co->reg[CO_REG_IDX_SP]);
>> + owner_co->save_stack.valid_sz =
>> + (uintptr_t)owner_co->stack->align_retptr -
>> + (uintptr_t)owner_co->reg[CO_REG_IDX_SP];
>> + if (owner_co->save_stack.sz < owner_co->save_stack.valid_sz) {
>> + free(owner_co->save_stack.ptr);
>> + owner_co->save_stack.ptr = NULL;
>> + do {
>> + owner_co->save_stack.sz <<= 1;
>> + assert(owner_co->save_stack.sz > 0);
>> + } while (owner_co->save_stack.sz <
>> + owner_co->save_stack.valid_sz);
>> + owner_co->save_stack.ptr =
>> + malloc(owner_co->save_stack.sz);
>> + assert(owner_co->save_stack.ptr);
>> + }
>> + if (owner_co->save_stack.valid_sz > 0)
>> + memcpy(owner_co->save_stack.ptr,
>> + owner_co->reg[CO_REG_IDX_SP],
>> + owner_co->save_stack.valid_sz);
>> + if (owner_co->save_stack.valid_sz >
>> + owner_co->save_stack.max_cpsz)
>> + owner_co->save_stack.max_cpsz =
>> + owner_co->save_stack.valid_sz;
>> + owner_co->stack->owner = NULL;
>> + owner_co->stack->align_validsz = 0;
>> + }
>> + assert(!resume_co->stack->owner);
>> + assert(resume_co->save_stack.valid_sz <=
>> + resume_co->stack->align_limit - sizeof(void *));
>> + if (resume_co->save_stack.valid_sz > 0)
>> + memcpy((void*)
>> + (uintptr_t)(resume_co->stack->align_retptr) -
>> + resume_co->save_stack.valid_sz,
>> + resume_co->save_stack.ptr,
>> + resume_co->save_stack.valid_sz);
>> + if (resume_co->save_stack.valid_sz > resume_co->save_stack.max_cpsz)
>> + resume_co->save_stack.max_cpsz = resume_co->save_stack.valid_sz;
>> + resume_co->stack->align_validsz =
>> + resume_co->save_stack.valid_sz + sizeof(void *);
>> + resume_co->stack->owner = resume_co;
>> +}
>> +
>> +void co_resume(struct co *resume_co)
>> +{
>> + assert(resume_co && resume_co->main_co && !resume_co->done);
>> +
>> + if (resume_co->stack->owner != resume_co)
>> + grab_stack(resume_co);
>> +
>> + current_co = resume_co;
>> + _co_switch(resume_co->main_co, resume_co);
>> + current_co = resume_co->main_co;
>> +}
>> +
>> +void co_destroy(struct co *co){
>
> obviously this is not u-boot coding style. I expect checkpatch reports this too.
Yep. Addressed in v3.
>> + if (!co)
>> + return;
>> +
>> + if(co_is_main_co(co)){
>> + free(co);
>> + current_co = NULL;
>> + } else {
>> + if(co->stack->owner == co){
>> + co->stack->owner = NULL;
>> + co->stack->align_validsz = 0;
>> + }
>> + free(co->save_stack.ptr);
>> + co->save_stack.ptr = NULL;
>> + free(co);
>> + }
>> +}
>
>
> Thanks,
> Michal
Thanks for the review.
--
Jerome
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC PATCH v2 3/3] usb: scan multiple buses simultaneously with coroutines
2025-01-28 11:58 ` Michal Simek
@ 2025-01-28 13:53 ` Jerome Forissier
0 siblings, 0 replies; 12+ messages in thread
From: Jerome Forissier @ 2025-01-28 13:53 UTC (permalink / raw)
To: Michal Simek
Cc: u-boot, Ilias Apalodimas, Marek Vasut, Tom Rini,
Mattijs Korpershoek, Heinrich Schuchardt, Dragan Simic,
Caleb Connolly
On 1/28/25 12:58, Michal Simek wrote:
> Hi,
>
> út 28. 1. 2025 v 11:20 odesílatel Jerome Forissier
> <jerome.forissier@linaro.org> napsal:
>>
>> Use the coroutines framework to scan USB buses in parallel for better
>> performance. Tested on arm64 QEMU on a somewhat contrived example
>> (4 USB buses, each with one audio device, one keyboard, one mouse and
>> one tablet).
>>
>> $ make qemu_arm64_defconfig
>> $ make -j$(nproc) CROSS_COMPILE="ccache aarch64-linux-gnu-"
>> $ qemu-system-aarch64 -M virt -nographic -cpu max -bios u-boot.bin \
>> $(for i in {1..4}; do echo -device qemu-xhci,id=xhci$i \
>> -device\ usb-{audio,kbd,mouse,tablet},bus=xhci$i.0; \
>> done)
>>
>> The time spent in usb_init() is reported on the console and shows a
>> significant improvement with COROUTINES enabled.
>>
>> ** Without COROUTINES
>>
>> Bus xhci_pci: Register 8001040 NbrPorts 8
>> Starting the controller
>> USB XHCI 1.00
>> Bus xhci_pci: Register 8001040 NbrPorts 8
>> Starting the controller
>> USB XHCI 1.00
>> Bus xhci_pci: Register 8001040 NbrPorts 8
>> Starting the controller
>> USB XHCI 1.00
>> Bus xhci_pci: Register 8001040 NbrPorts 8
>> Starting the controller
>> USB XHCI 1.00
>> scanning bus xhci_pci for devices... 6 USB Device(s) found
>> scanning bus xhci_pci for devices... 6 USB Device(s) found
>> scanning bus xhci_pci for devices... 6 USB Device(s) found
>> scanning bus xhci_pci for devices... 6 USB Device(s) found
>> USB: 4 bus(es) scanned in 5873 ms
>>
>> ** With COROUTINES
>>
>> Bus xhci_pci: Register 8001040 NbrPorts 8
>> Starting the controller
>> USB XHCI 1.00
>> Bus xhci_pci: Register 8001040 NbrPorts 8
>> Starting the controller
>> USB XHCI 1.00
>> Bus xhci_pci: Register 8001040 NbrPorts 8
>> Starting the controller
>> USB XHCI 1.00
>> Bus xhci_pci: Register 8001040 NbrPorts 8
>> Starting the controller
>> USB XHCI 1.00
>> Scanning 4 USB bus(es)... done
>> Bus xhci_pci: 6 USB device(s) found
>> Bus xhci_pci: 6 USB device(s) found
>> Bus xhci_pci: 6 USB device(s) found
>> Bus xhci_pci: 6 USB device(s) found
>> USB: 4 bus(es) scanned in 2213 ms
>>
>> Signed-off-by: Jerome Forissier <jerome.forissier@linaro.org>
>> ---
>> drivers/usb/host/usb-uclass.c | 152 +++++++++++++++++++++++++++++++++-
>> 1 file changed, 149 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/usb/host/usb-uclass.c b/drivers/usb/host/usb-uclass.c
>> index bfec303e7af..3104efe7f9e 100644
>> --- a/drivers/usb/host/usb-uclass.c
>> +++ b/drivers/usb/host/usb-uclass.c
>> @@ -9,6 +9,7 @@
>> #define LOG_CATEGORY UCLASS_USB
>>
>> #include <bootdev.h>
>> +#include <coroutines.h>
>> #include <dm.h>
>> #include <errno.h>
>> #include <log.h>
>> @@ -18,6 +19,8 @@
>> #include <dm/lists.h>
>> #include <dm/uclass-internal.h>
>>
>> +#include <time.h>
>> +
>> static bool asynch_allowed;
>>
>> struct usb_uclass_priv {
>> @@ -221,6 +224,40 @@ int usb_stop(void)
>> return err;
>> }
>>
>> +static int nbus;
>> +
>> +#if CONFIG_IS_ENABLED(COROUTINES)
>> +static void usb_scan_bus(struct udevice *bus, bool recurse)
>> +{
>> + struct usb_bus_priv *priv;
>> + struct udevice *dev;
>> + int ret;
>> +
>> + priv = dev_get_uclass_priv(bus);
>> +
>> + assert(recurse); /* TODO: Support non-recusive */
>> +
>> + debug("\n");
>> + ret = usb_scan_device(bus, 0, USB_SPEED_FULL, &dev);
>> + if (ret)
>> + printf("Scanning bus %s failed, error %d\n", bus->name, ret);
>> +}
>> +
>> +static void usb_report_devices(struct uclass *uc)
>> +{
>> + struct usb_bus_priv *priv;
>> + struct udevice *bus;
>> +
>> + uclass_foreach_dev(bus, uc) {
>> + priv = dev_get_uclass_priv(bus);
>> + printf("Bus %s: ", bus->name);
>> + if (priv->next_addr == 0)
>> + printf("No USB device found\n");
>> + else
>> + printf("%d USB device(s) found\n", priv->next_addr);
>> + }
>> +}
>> +#else
>> static void usb_scan_bus(struct udevice *bus, bool recurse)
>> {
>> struct usb_bus_priv *priv;
>> @@ -240,7 +277,81 @@ static void usb_scan_bus(struct udevice *bus, bool recurse)
>> printf("No USB Device found\n");
>> else
>> printf("%d USB Device(s) found\n", priv->next_addr);
>> + nbus++;
>> }
>> +#endif
>> +
>> +#if CONFIG_IS_ENABLED(COROUTINES)
>> +extern int udelay_yield;
>> +
>> +static void usb_scan_bus_co(void)
>> +{
>> + usb_scan_bus((struct udevice *)co_get_arg(), true);
>> + co_exit();
>> +}
>> +
>> +static struct co_stack *stk;
>> +static struct co *main_co;
>> +static struct co **co;
>> +static int co_sz = 8;
>> +
>> +static int add_usb_scan_bus_co(struct udevice *bus)
>> +{
>> + if (!co) {
>> + co = malloc(co_sz * sizeof(*co));
>> + if (!co)
>> + return -ENOMEM;
>> + }
>> + if (nbus == co_sz) {
>> + struct co **nco;
>> +
>> + co_sz *= 2;
>> + nco = realloc(co, co_sz * sizeof(*co));
>> + if (!nco)
>> + return -ENOMEM;
>> + co = nco;
>> + }
>> + if (!main_co) {
>> + main_co = co_create(NULL, NULL, 0, NULL, NULL);
>> + if (!main_co)
>> + return -ENOMEM;
>> + }
>> + if (!stk) {
>> + stk = co_stack_new(32768);
>> + if (!stk)
>> + return -ENOMEM;
>> + }
>> + co[nbus] = co_create(main_co, stk, 0, usb_scan_bus_co, bus);
>> + if (!co[nbus])
>> + return -ENOMEM;
>> + nbus++;
>> + return 0;
>> +}
>> +
>> +static void usb_scan_cleanup(void)
>> +{
>> + int i;
>> +
>> + for (i = 0; i < nbus; i++) {
>> + co_destroy(co[i]);
>> + co[i] = NULL;
>> + }
>> + nbus = 0;
>> + co_destroy(main_co);
>> + main_co = NULL;
>> + co_stack_destroy(stk);
>> + stk = NULL;
>> +}
>> +#else
>> +static int add_usb_scan_bus_co(struct udevice *bus)
>> +{
>> + return 0;
>> +}
>> +
>> +static void usb_scan_cleanup(void)
>> +{
>> +}
>> +#endif
>>
>> static void remove_inactive_children(struct uclass *uc, struct udevice *bus)
>> {
>> @@ -289,6 +400,7 @@ static int usb_probe_companion(struct udevice *bus)
>>
>> int usb_init(void)
>> {
>> + unsigned long t0 = timer_get_us();
>> int controllers_initialized = 0;
>> struct usb_uclass_priv *uc_priv;
>> struct usb_bus_priv *priv;
>> @@ -355,10 +467,40 @@ int usb_init(void)
>> continue;
>>
>> priv = dev_get_uclass_priv(bus);
>> - if (!priv->companion)
>> - usb_scan_bus(bus, true);
>> + if (!priv->companion) {
>> + if (CONFIG_IS_ENABLED(COROUTINES)) {
>> + ret = add_usb_scan_bus_co(bus);
>> + if (ret)
>> + goto out;
>> + } else {
>> + usb_scan_bus(bus, true);
>> + }
>> + }
>> }
>>
>> +#if CONFIG_IS_ENABLED(COROUTINES)
>> + {
>> + bool done;
>> + int i;
>> +
>> + printf("Scanning %d USB bus(es)... ", nbus);
>> + udelay_yield = 0xCAFEDECA;
>> + do {
>> + done = true;
>> + for (i = 0; i < nbus; i++) {
>> + if (!co[i]->done) {
>> + done = false;
>> + co_resume(co[i]);
>> + }
>> + }
>> + } while (!done);
>> + udelay_yield = 0;
>> + printf("done\n");
>> +
>> + usb_report_devices(uc);
>> + }
>> +#endif
>> +
>> /*
>> * Now that the primary controllers have been scanned and have handed
>> * over any devices they do not understand to their companions, scan
>> @@ -388,7 +530,11 @@ int usb_init(void)
>> /* if we were not able to find at least one working bus, bail out */
>> if (controllers_initialized == 0)
>> printf("No USB controllers found\n");
>> -
>> +out:
>> + if (nbus)
>> + printf("USB: %d bus(es) scanned in %ld ms\n", nbus,
>> + (timer_get_us() - t0) / 1000);
>> + usb_scan_cleanup();
>> return usb_started ? 0 : -ENOENT;
>> }
>>
>> --
>> 2.43.0
>>
>
> I have tested it on kr260 which is using 2 usb interfaces and there is
> an issue with usb hub initialization.
> That board has two hubs connected over i2c and only one of them is
> initialized over i2c.
> It means there is some work which needs to happen and likely some
> locking should be in place.
Hmmmm... I was kind of expecting that :-/ Do you think the kv260 has a
similar problem? If so I can use mine to troubleshoot.
Thanks,
--
Jerome
> Thanks,
> Michal
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC PATCH v2 0/3] Coroutines
2025-01-28 10:19 [RFC PATCH v2 0/3] Coroutines Jerome Forissier
` (2 preceding siblings ...)
2025-01-28 10:19 ` [RFC PATCH v2 3/3] usb: scan multiple buses simultaneously " Jerome Forissier
@ 2025-01-28 14:04 ` Simon Glass
3 siblings, 0 replies; 12+ messages in thread
From: Simon Glass @ 2025-01-28 14:04 UTC (permalink / raw)
To: Jerome Forissier; +Cc: u-boot, Ilias Apalodimas
Hi Jerome,
On Tue, 28 Jan 2025 at 03:20, Jerome Forissier
<jerome.forissier@linaro.org> wrote:
>
> This series introduces a simple coroutines framework and uses it to
> speed up the efi_init_obj_list() function. It is just an example to
> trigger discussions; hopefully other places in U-Boot can benefit from
> a similar treatment. Suggestions are welcome.
>
> I came up with this idea after analyzing some profiling data (ftrace)
> taken during the execution of the "printenv -e" command on a Kria KV260
> board. When the command is first invoked, efi_init_obj_list() is called
> which takes a significant amount of time (2.872 seconds). This time can
> be split into 0.570 s for efi_disks_register(), 0.811 s for
> efi_tcg2_register() and 1.418 s for efi_tcg2_do_initial_measurement().
> All the other child functions are much quicker. Another interesting
> observation is that a large part of the time is actually spent waiting
> in __udelay(). More precisely:
> - For efi_disks_register(): 421 ms / 570 ms = 73.8 % spent in __udelay()
> - For efi_tcg2_register(): 805 ms / 811 ms = 99.1 % spent in __udelay()
> - For efi_tcg2_do_initial_measurement(): 1395.025 ms / 1418.372 ms =
> 98.3 % spent in __udelay()
>
> Given the above data, it is reasonable to think that a nice speedup
> could be obtained if these functions could somehow be run in parallel.
> efi_tcg2_do_initial_measurement() unfortunately needs to be excluded
> because it depends on the first two. But efi_disks_register() and
> efi_tcg2_register() are clearly independant and are therefore good
> candidates for concurrent execution.
>
> So, I considered two options:
>
> - Spin up a secondary core to take care of one function while the other
> one runs on the main core
> - Introduce some kind of software scheduling
>
> I quickly ruled out the first one for several reasons: initializing a
> secondary core is typically quite hardware-specific, it would not scale
> well if more functions than available cores would need to be run in
> parallel, it would make debugging harder etc.
Yes this is fairly brittle IMO. Aide: Years ago I did a prototype of
using a second core to speed up loading from MMC. It turned out that
just 'kicking' the mmc controller at the start was enough to speed
things up. I suspect that these days the controllers are faster and
there is less of a delay getting going.
> Software scheduling however can be accomplished quite easily, especially
> since we don't need to consider preemptive multitasking. Coroutines [1]
> for example can perfectly do the job. They provide a way to save and
> restore the execution context (registers and stack). Here is how it
> look like:
>
> static void do_some_work(int v)
> {
> int i;
>
> for (i = 0; i < 5; i++) {
> printf("%d", v);
> /* Save context and resume main thread */
> co_yield();
> }
> }
>
> static void co1(void *arg)
> {
> do_some_work(1);
> /* Mark coroutine as "done" and resume main thread */
> co_exit();
> }
>
> static void co2(void *arg)
> {
> do_some_work(2);
> co_exit();
> }
>
> void main_thread(void)
> {
> struct co *co1, *co2;
>
> co1 = co_create(co1, ...);
> co2 = co_create(co2, ...);
>
> do {
> printf("A");
> if (!co1->done) {
> /* Invoke or resume first coroutine */
> co_resume(co1);
> }
> printf("B");
> if (!cor21->done) {
> /* Invoke or resume second coroutine */
> co_resume(co2);
> }
> } while (!(co1->done && co2->done));
>
> /* At this point, co1 and co2 have both called co_exit() */
> }
>
> The above example would print: A1B2A1B2A1B2A1B2A1B2.
>
> - The first commit introduces the coroutine framework, loosely based on
> libaco [2]. The code was simplified and reformatted to better suit
> U-Boot.
> - The second commit modifies efi_init_obj_list() in order to turn
> efi_disks_register() and efi_tcg2_register() into coroutines when
> COROUTINES in enabled. On a KV260 board with a SD card inserted, this
> saves about .6 second (2.2 s instead of 2.8 s).
> - The third commit applies coroutines to usb_init(), which can
> significantly reduce the time it takes to scan multiple buses. Tested
> on arm64 QEMU with 4 XHCI buses: the USB scan takes 2.2 s instead of
> 5.8 s.
>
> [1] https://en.wikipedia.org/wiki/Coroutine
> [2] https://github.com/hnes/libaco/
>
> Changes in v2
>
> - Remove x86 and x86_64 arch code since it is untested
> - Add missing SPDX license tag to arch/arm/cpu/armv8/co_switch.S
> - Change Apache-2.0 SPDX license tag to "Apache-2.0 OR GPL-2.0-or-later"
> - Apply coroutines to the USB bus scan in usb_init()
> Jerome Forissier (3):
> Introduce coroutines framework
> efi_loader: optimize efi_init_obj_list() with coroutines
> usb: scan multiple buses simultaneously with coroutines
>
> arch/arm/cpu/armv8/Makefile | 1 +
> arch/arm/cpu/armv8/co_switch.S | 36 +++++++
> drivers/usb/host/usb-uclass.c | 152 +++++++++++++++++++++++++++++-
> include/coroutines.h | 130 ++++++++++++++++++++++++++
> lib/Kconfig | 10 ++
> lib/Makefile | 2 +
> lib/coroutines.c | 165 +++++++++++++++++++++++++++++++++
> lib/efi_loader/efi_setup.c | 113 +++++++++++++++++++++-
> lib/time.c | 14 ++-
> 9 files changed, 615 insertions(+), 8 deletions(-)
> create mode 100644 arch/arm/cpu/armv8/co_switch.S
> create mode 100644 include/coroutines.h
> create mode 100644 lib/coroutines.c
>
> --
> 2.43.0
>
Another idea would be to use the cyclic framework.
To use cyclic for USB, we would need a struct that holds the state of
the USB scanning. This is likely just a few index variables. The nice
thing about cyclic is that we are a bit more in control of what is
going on and we know at any point what state things are in. We have a
way of aborting when we want to. Arguably many of the high-level
functions in U-Boot should be written as:
int do_more(struct my_state *state)
and just go until they run out of things to immediately do*
It would collect the state for each subsystem in one place.
The other nice thing is that we can progress the boot with multiple
subsystems at the same time, with less risk of strange behaviour (I
suspect?). In other words, start USB scanning 'in the background'
while continuing to allow the user to type stuff at the cmdline.
I believe that we would very quickly want to get progress info from
the operation, but it's hard to be sure when it is safe to read it, if
the coroutines are doing their own thing on their own time with
limited visibility from the 'main' process.
You can see this sort of API with libfdt (fdt_first/next_subnode()),
driver model (uclass_first/next_device() and bootstd
(bootflow_scan_first/next()).
If we are going to have coroutines, then we should have sandbox
support, at least. How does it look inside the debugger?
Regards,
Simon
* BTW, with EFI, do the disks and TPM actually all need to be probed
right at the start? I'm not quite convinced that anyone has seriously
looked at this.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC PATCH v2 3/3] usb: scan multiple buses simultaneously with coroutines
2025-01-28 10:19 ` [RFC PATCH v2 3/3] usb: scan multiple buses simultaneously " Jerome Forissier
2025-01-28 11:58 ` Michal Simek
@ 2025-01-28 21:36 ` Marek Vasut
1 sibling, 0 replies; 12+ messages in thread
From: Marek Vasut @ 2025-01-28 21:36 UTC (permalink / raw)
To: Jerome Forissier, u-boot
Cc: Ilias Apalodimas, Tom Rini, Mattijs Korpershoek,
Heinrich Schuchardt, Dragan Simic, Caleb Connolly
On 1/28/25 11:19 AM, Jerome Forissier wrote:
> Use the coroutines framework to scan USB buses in parallel for better
> performance. Tested on arm64 QEMU on a somewhat contrived example
> (4 USB buses, each with one audio device, one keyboard, one mouse and
> one tablet).
>
> $ make qemu_arm64_defconfig
> $ make -j$(nproc) CROSS_COMPILE="ccache aarch64-linux-gnu-"
> $ qemu-system-aarch64 -M virt -nographic -cpu max -bios u-boot.bin \
> $(for i in {1..4}; do echo -device qemu-xhci,id=xhci$i \
> -device\ usb-{audio,kbd,mouse,tablet},bus=xhci$i.0; \
> done)
>
> The time spent in usb_init() is reported on the console and shows a
> significant improvement with COROUTINES enabled.
Have you considered using the cyclic framework ( cyclic_register() and
co. ) to fully offload USB operations away from the main thread, i.e. to
make the U-Boot shell and e.g. 'usb start' run fully in parallel ? That
could then be extended to block transfers, which could run in
background, and ... we would also get a concept of shell pipes to move
data around like we do in Linux.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC PATCH v2 1/3] Introduce coroutines framework
2025-01-28 10:19 ` [RFC PATCH v2 1/3] Introduce coroutines framework Jerome Forissier
2025-01-28 11:09 ` Michal Simek
@ 2025-01-31 16:10 ` Yao Zi
2025-02-03 11:57 ` Ahmad Fatoum
1 sibling, 1 reply; 12+ messages in thread
From: Yao Zi @ 2025-01-31 16:10 UTC (permalink / raw)
To: Jerome Forissier, u-boot
Cc: Ilias Apalodimas, Tom Rini, Simon Glass, Patrick Rudolph,
Sughosh Ganu, Michal Simek, Raymond Mao, Heinrich Schuchardt
On Tue, Jan 28, 2025 at 11:19:15AM +0100, Jerome Forissier wrote:
> Adds the COROUTINES Kconfig symbol which introduces a new internal API
> for coroutines support. As explained in the Kconfig file, this is meant
> to provide some kind of cooperative multi-tasking with the goal to
> improve performance by overlapping lengthy operations.
>
> The API as well as the implementation is very much inspired from libaco
> [1]. The reference implementation is simplified to remove all things
> not needed in U-Boot, the coding style is updated, and the aco_ prefix
> is replaced by co_.
>
> I believe the stack handling could be simplified: the stack of the main
> coroutine could probably probably be used by the secondary coroutines
> instead of allocating a new stack dynamically.
>
> Only i386, x86_64 and aarch64 are supported at the moment. Other
> architectures need to provide a _co_switch() function in assembly.
>
> Only aarch64 has been tested.
>
> [1] https://github.com/hnes/libaco/
>
> Signed-off-by: Jerome Forissier <jerome.forissier@linaro.org>
> ---
> arch/arm/cpu/armv8/Makefile | 1 +
> arch/arm/cpu/armv8/co_switch.S | 36 +++++++
> include/coroutines.h | 130 ++++++++++++++++++++++++++
> lib/Kconfig | 10 ++
> lib/Makefile | 2 +
> lib/coroutines.c | 165 +++++++++++++++++++++++++++++++++
> 6 files changed, 344 insertions(+)
> create mode 100644 arch/arm/cpu/armv8/co_switch.S
> create mode 100644 include/coroutines.h
> create mode 100644 lib/coroutines.c
>
> diff --git a/arch/arm/cpu/armv8/Makefile b/arch/arm/cpu/armv8/Makefile
> index 2e71ff2dc97..6d07b6aa9f9 100644
> --- a/arch/arm/cpu/armv8/Makefile
> +++ b/arch/arm/cpu/armv8/Makefile
> @@ -46,3 +46,4 @@ obj-$(CONFIG_TARGET_BCMNS3) += bcmns3/
> obj-$(CONFIG_XEN) += xen/
> obj-$(CONFIG_ARMV8_CE_SHA1) += sha1_ce_glue.o sha1_ce_core.o
> obj-$(CONFIG_ARMV8_CE_SHA256) += sha256_ce_glue.o sha256_ce_core.o
> +obj-$(CONFIG_COROUTINES) += co_switch.o
> diff --git a/arch/arm/cpu/armv8/co_switch.S b/arch/arm/cpu/armv8/co_switch.S
> new file mode 100644
> index 00000000000..4405e89ec56
> --- /dev/null
> +++ b/arch/arm/cpu/armv8/co_switch.S
> @@ -0,0 +1,36 @@
> +/* SPDX-License-Identifier: GPL-2.0+ */
> +/* void _co_switch(struct uco *from_co, struct uco *to_co); */
> +.text
> +.globl _co_switch
> +.type _co_switch, @function
> +_co_switch:
> + // x0: from_co
> + // x1: to_co
> + // from_co and to_co layout: { pc, sp, x19-x29 }
> +
> + // Save context to from_co (x0)
> + // AAPCS64 says "A subroutine invocation must preserve the contents of the
> + // registers r19-r29 and SP"
> + adr x2, 1f // pc we should use to resume after this function
> + mov x3, sp
> + stp x2, x3, [x0, #0] // pc, sp
> + stp x19, x20, [x0, #16]
> + stp x21, x22, [x0, #32]
> + stp x23, x24, [x0, #48]
> + stp x25, x26, [x0, #64]
> + stp x27, x28, [x0, #80]
> + stp x29, x30, [x0, #96]
> +
> + // Load new context from to_co (x1)
> + ldp x2, x3, [x1, #0] // pc, sp
> + ldp x19, x20, [x1, #16]
> + ldp x21, x22, [x1, #32]
> + ldp x23, x24, [x1, #48]
> + ldp x25, x26, [x1, #64]
> + ldp x27, x28, [x1, #80]
> + ldp x29, x30, [x1, #96]
> + mov sp, x3
> + br x2
> +
> +1: // Return to the caller
> + ret
We've done similar context switching in setjmp/longjmp. Is it possible
to unify this part and get rid of the duplicated assembly for each
architecture?
The jmp_buf structure is actually non-opaque to the caller through
jmp_buf_data, thus I believe this logic could be rewritten in C with
setjmp/longjmp(),
if (!setjmp(from_co))
longjmp(to_co);
else
return;
and replace co.regs with jmp_buf_data.
btw, I guess the jmp_buf_data type is kept for historical usage in EFI
implementation, but looking through our EFI code it seems the details of
jmp_buf aren't used anymore now. So maybe it's the time to clean up and
make it a better context-switching API as well.
> diff --git a/include/coroutines.h b/include/coroutines.h
> new file mode 100644
> index 00000000000..b85b656127c
> --- /dev/null
> +++ b/include/coroutines.h
> @@ -0,0 +1,130 @@
> +/* SPDX-License-Identifier: Apache-2.0 OR GPL-2.0-or-later */
> +/*
> + * Copyright 2018 Sen Han <00hnes@gmail.com>
> + * Copyright 2025 Linaro Limited
> + */
> +
> +#ifndef _COROUTINES_H_
> +#define _COROUTINES_H_
> +
> +#ifndef CONFIG_COROUTINES
> +
> +static inline void co_yield(void) {}
> +static inline void co_exit(void) {}
> +
> +#else
> +
> +#ifdef __UBOOT__
> +#include <log.h>
> +#else
> +#include <assert.h>
> +#endif
> +#include <limits.h>
> +#include <stdbool.h>
> +#include <stdint.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <time.h>
> +
> +#ifdef __aarch64__
> +#define CO_REG_IDX_RETADDR 0
> +#define CO_REG_IDX_SP 1
> +#else
> +#error Architecture no supported
> +#endif
> +
> +struct co_save_stack {
> + void* ptr;
> + size_t sz;
> + size_t valid_sz;
> + size_t max_cpsz; /* max copy size in bytes */
> +};
> +
> +struct co_stack {
> + void *ptr;
> + size_t sz;
> + void *align_highptr;
> + void *align_retptr;
> + size_t align_validsz;
> + size_t align_limit;
> + struct co *owner;
> + void *real_ptr;
> + size_t real_sz;
> +};
> +
> +struct co {
> + /* CPU state: callee-saved registers plus SP and PC */
> + void *reg[14]; // pc, sp, x19-x29, x30 (lr)
> +
> + struct co *main_co;
> + void *arg;
> + bool done;
> +
> + void (*fp)(void);
> +
> + struct co_save_stack save_stack;
> + struct co_stack *stack;
> +};
> +
> +extern struct co *current_co;
> +
> +static inline struct co *co_get_co(void)
> +{
> + return current_co;
> +}
> +
> +static inline void *co_get_arg(void)
> +{
> + return co_get_co()->arg;
> +}
> +
> +struct co_stack *co_stack_new(size_t sz);
> +
> +void co_stack_destroy(struct co_stack *s);
> +
> +struct co *co_create(struct co *main_co,
> + struct co_stack *stack,
> + size_t save_stack_sz, void (*fp)(void),
> + void *arg);
> +
> +void co_resume(struct co *resume_co);
> +
> +void co_destroy(struct co *co);
> +
> +void *_co_switch(struct co *from_co, struct co *to_co);
> +
> +static inline void _co_yield_to_main_co(struct co *yield_co)
> +{
> + assert(yield_co);
> + assert(yield_co->main_co);
> + _co_switch(yield_co, yield_co->main_co);
> +}
> +
> +static inline void co_yield(void)
> +{
> + if (current_co)
> + _co_yield_to_main_co(current_co);
> +}
> +
> +static inline bool co_is_main_co(struct co *co)
> +{
> + return !co->main_co;
> +}
> +
> +static inline void co_exit(void)
> +{
> + struct co *co = co_get_co();
> +
> + if (!co)
> + return;
> + co->done = true;
> + assert(co->stack->owner == co);
> + co->stack->owner = NULL;
> + co->stack->align_validsz = 0;
> + _co_yield_to_main_co(co);
> + assert(false);
> +}
> +
> +#endif /* CONFIG_COROUTINES */
> +#endif /* _COROUTINES_H_ */
> diff --git a/lib/Kconfig b/lib/Kconfig
> index 8f1a96d98c4..b6c1380b927 100644
> --- a/lib/Kconfig
> +++ b/lib/Kconfig
> @@ -1226,6 +1226,16 @@ config PHANDLE_CHECK_SEQ
> enable this config option to distinguish them using
> phandles in fdtdec_get_alias_seq() function.
>
> +config COROUTINES
> + bool "Enable coroutine support"
> + help
> + Coroutines allow to implement a simple form of cooperative
> + multi-tasking. The main thread of execution registers one or
> + more functions as coroutine entry points, then it schedules one
> + of them. At any point the scheduled coroutine may yield, that is,
> + suspend its execution and return back to the main thread. At this
> + point another coroutine may be scheduled and so on until all the
> + registered coroutines are done.
> endmenu
>
> source "lib/fwu_updates/Kconfig"
> diff --git a/lib/Makefile b/lib/Makefile
> index 5cb3278d2ef..7b809151f5a 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -159,6 +159,8 @@ obj-$(CONFIG_LIB_ELF) += elf.o
>
> obj-$(CONFIG_$(PHASE_)SEMIHOSTING) += semihosting.o
>
> +obj-$(CONFIG_COROUTINES) += coroutines.o
> +
> #
> # Build a fast OID lookup registry from include/linux/oid_registry.h
> #
> diff --git a/lib/coroutines.c b/lib/coroutines.c
> new file mode 100644
> index 00000000000..20c5aba5510
> --- /dev/null
> +++ b/lib/coroutines.c
> @@ -0,0 +1,165 @@
> +// SPDX-License-Identifier: Apache-2.0 OR GPL-2.0-or-later
> +
> +// Copyright 2018 Sen Han <00hnes@gmail.com>
> +// Copyright 2025 Linaro Limited
> +
> +#include <coroutines.h>
> +#include <stdio.h>
> +#include <stdint.h>
> +
> +
> +/* Current co-routine */
> +struct co *current_co;
> +
> +struct co_stack *co_stack_new(size_t sz)
> +{
> + struct co_stack *p = calloc(1, sizeof(*p));
> + uintptr_t u_p;
> +
> + if (!p)
> + return NULL;
> +
> + if (sz < 4096)
> + sz = 4096;
> +
> + p->sz = sz;
> + p->ptr = malloc(sz);
> + if (!p->ptr) {
> + free(p);
> + return NULL;
> + }
> +
> + p->owner = NULL;
> + u_p = (uintptr_t)(p->sz - (sizeof(void*) << 1) + (uintptr_t)p->ptr);
> + u_p = (u_p >> 4) << 4;
> + p->align_highptr = (void*)u_p;
> + p->align_retptr = (void*)(u_p - sizeof(void*));
> + assert(p->sz > (16 + (sizeof(void*) << 1) + sizeof(void*)));
> + p->align_limit = p->sz - 16 - (sizeof(void*) << 1);
> +
> + return p;
> +}
> +
> +void co_stack_destroy(struct co_stack *s){
> + if (!s)
> + return;
> + free(s->ptr);
> + free(s);
> +}
> +
> +struct co *co_create(struct co *main_co,
> + struct co_stack *stack,
> + size_t save_stack_sz,
> + void (*fp)(void), void *arg)
> +{
> + struct co *p = malloc(sizeof(*p));
> + assert(p);
> + memset(p, 0, sizeof(*p));
> +
> + if (main_co) {
> + assert(stack);
> + p->stack = stack;
> + p->reg[CO_REG_IDX_RETADDR] = (void *)fp;
> + // FIXME original code uses align_retptr; causes a crash
> + p->reg[CO_REG_IDX_SP] = p->stack->align_highptr;
> + p->main_co = main_co;
> + p->arg = arg;
> + p->fp = fp;
> + if (!save_stack_sz)
> + save_stack_sz = 64;
> + p->save_stack.ptr = malloc(save_stack_sz);
> + assert(p->save_stack.ptr);
> + p->save_stack.sz = save_stack_sz;
> + p->save_stack.valid_sz = 0;
> + } else {
> + p->main_co = NULL;
> + p->arg = arg;
> + p->fp = fp;
> + p->stack = NULL;
> + p->save_stack.ptr = NULL;
> + }
> + return p;
> +}
> +
> +static void grab_stack(struct co *resume_co)
> +{
> + struct co *owner_co = resume_co->stack->owner;
> +
> + if (owner_co) {
> + assert(owner_co->stack == resume_co->stack);
> + assert((uintptr_t)(owner_co->stack->align_retptr) >=
> + (uintptr_t)(owner_co->reg[CO_REG_IDX_SP]));
> + assert((uintptr_t)owner_co->stack->align_highptr -
> + (uintptr_t)owner_co->stack->align_limit
> + <= (uintptr_t)owner_co->reg[CO_REG_IDX_SP]);
> + owner_co->save_stack.valid_sz =
> + (uintptr_t)owner_co->stack->align_retptr -
> + (uintptr_t)owner_co->reg[CO_REG_IDX_SP];
> + if (owner_co->save_stack.sz < owner_co->save_stack.valid_sz) {
> + free(owner_co->save_stack.ptr);
> + owner_co->save_stack.ptr = NULL;
> + do {
> + owner_co->save_stack.sz <<= 1;
> + assert(owner_co->save_stack.sz > 0);
> + } while (owner_co->save_stack.sz <
> + owner_co->save_stack.valid_sz);
> + owner_co->save_stack.ptr =
> + malloc(owner_co->save_stack.sz);
> + assert(owner_co->save_stack.ptr);
> + }
> + if (owner_co->save_stack.valid_sz > 0)
> + memcpy(owner_co->save_stack.ptr,
> + owner_co->reg[CO_REG_IDX_SP],
> + owner_co->save_stack.valid_sz);
> + if (owner_co->save_stack.valid_sz >
> + owner_co->save_stack.max_cpsz)
> + owner_co->save_stack.max_cpsz =
> + owner_co->save_stack.valid_sz;
> + owner_co->stack->owner = NULL;
> + owner_co->stack->align_validsz = 0;
> + }
> + assert(!resume_co->stack->owner);
> + assert(resume_co->save_stack.valid_sz <=
> + resume_co->stack->align_limit - sizeof(void *));
> + if (resume_co->save_stack.valid_sz > 0)
> + memcpy((void*)
> + (uintptr_t)(resume_co->stack->align_retptr) -
> + resume_co->save_stack.valid_sz,
> + resume_co->save_stack.ptr,
> + resume_co->save_stack.valid_sz);
> + if (resume_co->save_stack.valid_sz > resume_co->save_stack.max_cpsz)
> + resume_co->save_stack.max_cpsz = resume_co->save_stack.valid_sz;
> + resume_co->stack->align_validsz =
> + resume_co->save_stack.valid_sz + sizeof(void *);
> + resume_co->stack->owner = resume_co;
> +}
> +
> +void co_resume(struct co *resume_co)
> +{
> + assert(resume_co && resume_co->main_co && !resume_co->done);
> +
> + if (resume_co->stack->owner != resume_co)
> + grab_stack(resume_co);
> +
> + current_co = resume_co;
> + _co_switch(resume_co->main_co, resume_co);
> + current_co = resume_co->main_co;
> +}
> +
> +void co_destroy(struct co *co){
> + if (!co)
> + return;
> +
> + if(co_is_main_co(co)){
> + free(co);
> + current_co = NULL;
> + } else {
> + if(co->stack->owner == co){
> + co->stack->owner = NULL;
> + co->stack->align_validsz = 0;
> + }
> + free(co->save_stack.ptr);
> + co->save_stack.ptr = NULL;
> + free(co);
> + }
> +}
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC PATCH v2 1/3] Introduce coroutines framework
2025-01-31 16:10 ` Yao Zi
@ 2025-02-03 11:57 ` Ahmad Fatoum
0 siblings, 0 replies; 12+ messages in thread
From: Ahmad Fatoum @ 2025-02-03 11:57 UTC (permalink / raw)
To: Yao Zi, Jerome Forissier, u-boot
Cc: Ilias Apalodimas, Tom Rini, Simon Glass, Patrick Rudolph,
Sughosh Ganu, Michal Simek, Raymond Mao, Heinrich Schuchardt
Hi,
On 31.01.25 17:10, Yao Zi wrote:
> On Tue, Jan 28, 2025 at 11:19:15AM +0100, Jerome Forissier wrote:
>> +.globl _co_switch
>> +.type _co_switch, @function
>> +_co_switch:
>> + // x0: from_co
>> + // x1: to_co
>> + // from_co and to_co layout: { pc, sp, x19-x29 }
>> +
>> + // Save context to from_co (x0)
>> + // AAPCS64 says "A subroutine invocation must preserve the contents of the
>> + // registers r19-r29 and SP"
>> + adr x2, 1f // pc we should use to resume after this function
>> + mov x3, sp
>> + stp x2, x3, [x0, #0] // pc, sp
>> + stp x19, x20, [x0, #16]
>> + stp x21, x22, [x0, #32]
>> + stp x23, x24, [x0, #48]
>> + stp x25, x26, [x0, #64]
>> + stp x27, x28, [x0, #80]
>> + stp x29, x30, [x0, #96]
>> +
>> + // Load new context from to_co (x1)
>> + ldp x2, x3, [x1, #0] // pc, sp
>> + ldp x19, x20, [x1, #16]
>> + ldp x21, x22, [x1, #32]
>> + ldp x23, x24, [x1, #48]
>> + ldp x25, x26, [x1, #64]
>> + ldp x27, x28, [x1, #80]
>> + ldp x29, x30, [x1, #96]
>> + mov sp, x3
>> + br x2
>> +
>> +1: // Return to the caller
>> + ret
>
> We've done similar context switching in setjmp/longjmp. Is it possible
> to unify this part and get rid of the duplicated assembly for each
> architecture?
>
> The jmp_buf structure is actually non-opaque to the caller through
> jmp_buf_data, thus I believe this logic could be rewritten in C with
> setjmp/longjmp(),
>
> if (!setjmp(from_co))
> longjmp(to_co);
> else
> return;
>
> and replace co.regs with jmp_buf_data.
That's what the equivalent feature in barebox (called bthreads) is doing
as well: https://github.com/barebox/barebox/blob/master/common/bthread.c#L116
In addition to longjmp/setjmp, a third initjmp was introduced for all
architectures that allows creating the new context from scratch.
Cheers,
Ahmad
>
> btw, I guess the jmp_buf_data type is kept for historical usage in EFI
> implementation, but looking through our EFI code it seems the details of
> jmp_buf aren't used anymore now. So maybe it's the time to clean up and
> make it a better context-switching API as well.
>
>> diff --git a/include/coroutines.h b/include/coroutines.h
>> new file mode 100644
>> index 00000000000..b85b656127c
>> --- /dev/null
>> +++ b/include/coroutines.h
>> @@ -0,0 +1,130 @@
>> +/* SPDX-License-Identifier: Apache-2.0 OR GPL-2.0-or-later */
>> +/*
>> + * Copyright 2018 Sen Han <00hnes@gmail.com>
>> + * Copyright 2025 Linaro Limited
>> + */
>> +
>> +#ifndef _COROUTINES_H_
>> +#define _COROUTINES_H_
>> +
>> +#ifndef CONFIG_COROUTINES
>> +
>> +static inline void co_yield(void) {}
>> +static inline void co_exit(void) {}
>> +
>> +#else
>> +
>> +#ifdef __UBOOT__
>> +#include <log.h>
>> +#else
>> +#include <assert.h>
>> +#endif
>> +#include <limits.h>
>> +#include <stdbool.h>
>> +#include <stdint.h>
>> +#include <stdio.h>
>> +#include <stdlib.h>
>> +#include <string.h>
>> +#include <time.h>
>> +
>> +#ifdef __aarch64__
>> +#define CO_REG_IDX_RETADDR 0
>> +#define CO_REG_IDX_SP 1
>> +#else
>> +#error Architecture no supported
>> +#endif
>> +
>> +struct co_save_stack {
>> + void* ptr;
>> + size_t sz;
>> + size_t valid_sz;
>> + size_t max_cpsz; /* max copy size in bytes */
>> +};
>> +
>> +struct co_stack {
>> + void *ptr;
>> + size_t sz;
>> + void *align_highptr;
>> + void *align_retptr;
>> + size_t align_validsz;
>> + size_t align_limit;
>> + struct co *owner;
>> + void *real_ptr;
>> + size_t real_sz;
>> +};
>> +
>> +struct co {
>> + /* CPU state: callee-saved registers plus SP and PC */
>> + void *reg[14]; // pc, sp, x19-x29, x30 (lr)
>> +
>> + struct co *main_co;
>> + void *arg;
>> + bool done;
>> +
>> + void (*fp)(void);
>> +
>> + struct co_save_stack save_stack;
>> + struct co_stack *stack;
>> +};
>> +
>> +extern struct co *current_co;
>> +
>> +static inline struct co *co_get_co(void)
>> +{
>> + return current_co;
>> +}
>> +
>> +static inline void *co_get_arg(void)
>> +{
>> + return co_get_co()->arg;
>> +}
>> +
>> +struct co_stack *co_stack_new(size_t sz);
>> +
>> +void co_stack_destroy(struct co_stack *s);
>> +
>> +struct co *co_create(struct co *main_co,
>> + struct co_stack *stack,
>> + size_t save_stack_sz, void (*fp)(void),
>> + void *arg);
>> +
>> +void co_resume(struct co *resume_co);
>> +
>> +void co_destroy(struct co *co);
>> +
>> +void *_co_switch(struct co *from_co, struct co *to_co);
>> +
>> +static inline void _co_yield_to_main_co(struct co *yield_co)
>> +{
>> + assert(yield_co);
>> + assert(yield_co->main_co);
>> + _co_switch(yield_co, yield_co->main_co);
>> +}
>> +
>> +static inline void co_yield(void)
>> +{
>> + if (current_co)
>> + _co_yield_to_main_co(current_co);
>> +}
>> +
>> +static inline bool co_is_main_co(struct co *co)
>> +{
>> + return !co->main_co;
>> +}
>> +
>> +static inline void co_exit(void)
>> +{
>> + struct co *co = co_get_co();
>> +
>> + if (!co)
>> + return;
>> + co->done = true;
>> + assert(co->stack->owner == co);
>> + co->stack->owner = NULL;
>> + co->stack->align_validsz = 0;
>> + _co_yield_to_main_co(co);
>> + assert(false);
>> +}
>> +
>> +#endif /* CONFIG_COROUTINES */
>> +#endif /* _COROUTINES_H_ */
>> diff --git a/lib/Kconfig b/lib/Kconfig
>> index 8f1a96d98c4..b6c1380b927 100644
>> --- a/lib/Kconfig
>> +++ b/lib/Kconfig
>> @@ -1226,6 +1226,16 @@ config PHANDLE_CHECK_SEQ
>> enable this config option to distinguish them using
>> phandles in fdtdec_get_alias_seq() function.
>>
>> +config COROUTINES
>> + bool "Enable coroutine support"
>> + help
>> + Coroutines allow to implement a simple form of cooperative
>> + multi-tasking. The main thread of execution registers one or
>> + more functions as coroutine entry points, then it schedules one
>> + of them. At any point the scheduled coroutine may yield, that is,
>> + suspend its execution and return back to the main thread. At this
>> + point another coroutine may be scheduled and so on until all the
>> + registered coroutines are done.
>> endmenu
>>
>> source "lib/fwu_updates/Kconfig"
>> diff --git a/lib/Makefile b/lib/Makefile
>> index 5cb3278d2ef..7b809151f5a 100644
>> --- a/lib/Makefile
>> +++ b/lib/Makefile
>> @@ -159,6 +159,8 @@ obj-$(CONFIG_LIB_ELF) += elf.o
>>
>> obj-$(CONFIG_$(PHASE_)SEMIHOSTING) += semihosting.o
>>
>> +obj-$(CONFIG_COROUTINES) += coroutines.o
>> +
>> #
>> # Build a fast OID lookup registry from include/linux/oid_registry.h
>> #
>> diff --git a/lib/coroutines.c b/lib/coroutines.c
>> new file mode 100644
>> index 00000000000..20c5aba5510
>> --- /dev/null
>> +++ b/lib/coroutines.c
>> @@ -0,0 +1,165 @@
>> +// SPDX-License-Identifier: Apache-2.0 OR GPL-2.0-or-later
>> +
>> +// Copyright 2018 Sen Han <00hnes@gmail.com>
>> +// Copyright 2025 Linaro Limited
>> +
>> +#include <coroutines.h>
>> +#include <stdio.h>
>> +#include <stdint.h>
>> +
>> +
>> +/* Current co-routine */
>> +struct co *current_co;
>> +
>> +struct co_stack *co_stack_new(size_t sz)
>> +{
>> + struct co_stack *p = calloc(1, sizeof(*p));
>> + uintptr_t u_p;
>> +
>> + if (!p)
>> + return NULL;
>> +
>> + if (sz < 4096)
>> + sz = 4096;
>> +
>> + p->sz = sz;
>> + p->ptr = malloc(sz);
>> + if (!p->ptr) {
>> + free(p);
>> + return NULL;
>> + }
>> +
>> + p->owner = NULL;
>> + u_p = (uintptr_t)(p->sz - (sizeof(void*) << 1) + (uintptr_t)p->ptr);
>> + u_p = (u_p >> 4) << 4;
>> + p->align_highptr = (void*)u_p;
>> + p->align_retptr = (void*)(u_p - sizeof(void*));
>> + assert(p->sz > (16 + (sizeof(void*) << 1) + sizeof(void*)));
>> + p->align_limit = p->sz - 16 - (sizeof(void*) << 1);
>> +
>> + return p;
>> +}
>> +
>> +void co_stack_destroy(struct co_stack *s){
>> + if (!s)
>> + return;
>> + free(s->ptr);
>> + free(s);
>> +}
>> +
>> +struct co *co_create(struct co *main_co,
>> + struct co_stack *stack,
>> + size_t save_stack_sz,
>> + void (*fp)(void), void *arg)
>> +{
>> + struct co *p = malloc(sizeof(*p));
>> + assert(p);
>> + memset(p, 0, sizeof(*p));
>> +
>> + if (main_co) {
>> + assert(stack);
>> + p->stack = stack;
>> + p->reg[CO_REG_IDX_RETADDR] = (void *)fp;
>> + // FIXME original code uses align_retptr; causes a crash
>> + p->reg[CO_REG_IDX_SP] = p->stack->align_highptr;
>> + p->main_co = main_co;
>> + p->arg = arg;
>> + p->fp = fp;
>> + if (!save_stack_sz)
>> + save_stack_sz = 64;
>> + p->save_stack.ptr = malloc(save_stack_sz);
>> + assert(p->save_stack.ptr);
>> + p->save_stack.sz = save_stack_sz;
>> + p->save_stack.valid_sz = 0;
>> + } else {
>> + p->main_co = NULL;
>> + p->arg = arg;
>> + p->fp = fp;
>> + p->stack = NULL;
>> + p->save_stack.ptr = NULL;
>> + }
>> + return p;
>> +}
>> +
>> +static void grab_stack(struct co *resume_co)
>> +{
>> + struct co *owner_co = resume_co->stack->owner;
>> +
>> + if (owner_co) {
>> + assert(owner_co->stack == resume_co->stack);
>> + assert((uintptr_t)(owner_co->stack->align_retptr) >=
>> + (uintptr_t)(owner_co->reg[CO_REG_IDX_SP]));
>> + assert((uintptr_t)owner_co->stack->align_highptr -
>> + (uintptr_t)owner_co->stack->align_limit
>> + <= (uintptr_t)owner_co->reg[CO_REG_IDX_SP]);
>> + owner_co->save_stack.valid_sz =
>> + (uintptr_t)owner_co->stack->align_retptr -
>> + (uintptr_t)owner_co->reg[CO_REG_IDX_SP];
>> + if (owner_co->save_stack.sz < owner_co->save_stack.valid_sz) {
>> + free(owner_co->save_stack.ptr);
>> + owner_co->save_stack.ptr = NULL;
>> + do {
>> + owner_co->save_stack.sz <<= 1;
>> + assert(owner_co->save_stack.sz > 0);
>> + } while (owner_co->save_stack.sz <
>> + owner_co->save_stack.valid_sz);
>> + owner_co->save_stack.ptr =
>> + malloc(owner_co->save_stack.sz);
>> + assert(owner_co->save_stack.ptr);
>> + }
>> + if (owner_co->save_stack.valid_sz > 0)
>> + memcpy(owner_co->save_stack.ptr,
>> + owner_co->reg[CO_REG_IDX_SP],
>> + owner_co->save_stack.valid_sz);
>> + if (owner_co->save_stack.valid_sz >
>> + owner_co->save_stack.max_cpsz)
>> + owner_co->save_stack.max_cpsz =
>> + owner_co->save_stack.valid_sz;
>> + owner_co->stack->owner = NULL;
>> + owner_co->stack->align_validsz = 0;
>> + }
>> + assert(!resume_co->stack->owner);
>> + assert(resume_co->save_stack.valid_sz <=
>> + resume_co->stack->align_limit - sizeof(void *));
>> + if (resume_co->save_stack.valid_sz > 0)
>> + memcpy((void*)
>> + (uintptr_t)(resume_co->stack->align_retptr) -
>> + resume_co->save_stack.valid_sz,
>> + resume_co->save_stack.ptr,
>> + resume_co->save_stack.valid_sz);
>> + if (resume_co->save_stack.valid_sz > resume_co->save_stack.max_cpsz)
>> + resume_co->save_stack.max_cpsz = resume_co->save_stack.valid_sz;
>> + resume_co->stack->align_validsz =
>> + resume_co->save_stack.valid_sz + sizeof(void *);
>> + resume_co->stack->owner = resume_co;
>> +}
>> +
>> +void co_resume(struct co *resume_co)
>> +{
>> + assert(resume_co && resume_co->main_co && !resume_co->done);
>> +
>> + if (resume_co->stack->owner != resume_co)
>> + grab_stack(resume_co);
>> +
>> + current_co = resume_co;
>> + _co_switch(resume_co->main_co, resume_co);
>> + current_co = resume_co->main_co;
>> +}
>> +
>> +void co_destroy(struct co *co){
>> + if (!co)
>> + return;
>> +
>> + if(co_is_main_co(co)){
>> + free(co);
>> + current_co = NULL;
>> + } else {
>> + if(co->stack->owner == co){
>> + co->stack->owner = NULL;
>> + co->stack->align_validsz = 0;
>> + }
>> + free(co->save_stack.ptr);
>> + co->save_stack.ptr = NULL;
>> + free(co);
>> + }
>> +}
>> --
>> 2.43.0
>>
>
--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2025-02-03 11:57 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-01-28 10:19 [RFC PATCH v2 0/3] Coroutines Jerome Forissier
2025-01-28 10:19 ` [RFC PATCH v2 1/3] Introduce coroutines framework Jerome Forissier
2025-01-28 11:09 ` Michal Simek
2025-01-28 13:30 ` Jerome Forissier
2025-01-31 16:10 ` Yao Zi
2025-02-03 11:57 ` Ahmad Fatoum
2025-01-28 10:19 ` [RFC PATCH v2 2/3] efi_loader: optimize efi_init_obj_list() with coroutines Jerome Forissier
2025-01-28 10:19 ` [RFC PATCH v2 3/3] usb: scan multiple buses simultaneously " Jerome Forissier
2025-01-28 11:58 ` Michal Simek
2025-01-28 13:53 ` Jerome Forissier
2025-01-28 21:36 ` Marek Vasut
2025-01-28 14:04 ` [RFC PATCH v2 0/3] Coroutines Simon Glass
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.