Date: Wed, 30 Jul 2025 22:07:40 -0700
From: Namhyung Kim
To: Yuzhuo Jing
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter, Liang Kan,
	Yuzhuo Jing, Andrea Parri, Palmer Dabbelt, Charlie Jenkins,
	Sebastian Andrzej Siewior, Kumar Kartikeya Dwivedi, Alexei Starovoitov,
	Barret Rhoden, Alexandre Ghiti, Guo Ren,
	linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org
Subject: Re: [PATCH v1 4/7] tools: Implement userspace per-cpu
References: <20250729022640.3134066-1-yuzhuo@google.com>
	<20250729022640.3134066-5-yuzhuo@google.com>
In-Reply-To: <20250729022640.3134066-5-yuzhuo@google.com>
X-Mailing-List: linux-perf-users@vger.kernel.org

On Mon, Jul 28, 2025 at 07:26:37PM -0700, Yuzhuo Jing wrote:
> Implement userspace per-cpu for imported kernel code.  Compared with
> simple thread-local definition, the kernel per-cpu provides 1) a
> guarantee of static lifetime even when thread exits, and 2) the ability
> to access other CPU's per-cpu data.
>
> This patch adds an alternative implementation and interface for
> userspace per-cpu.  The kernel implementation uses special ELF sections
> and offset calculation.  For simplicity, this version defines a
> PERCPU_MAX length global array for each per-cpu data, and uses a
> thread-local cpu id for indexing.
>
> Signed-off-by: Yuzhuo Jing
> ---
>  tools/include/linux/compiler_types.h  |   3 +
>  tools/include/linux/percpu-simulate.h | 128 ++++++++++++++++++++++++++
>  2 files changed, 131 insertions(+)
>  create mode 100644 tools/include/linux/percpu-simulate.h
>
> diff --git a/tools/include/linux/compiler_types.h b/tools/include/linux/compiler_types.h
> index 9a2a2f8d7b6c..46550c500b8c 100644
> --- a/tools/include/linux/compiler_types.h
> +++ b/tools/include/linux/compiler_types.h
> @@ -31,6 +31,9 @@
>  # define __cond_lock(x,c) (c)
>  #endif /* __CHECKER__ */
>
> +/* Per-cpu checker flag does not use address space attribute in userspace */
> +#define __percpu
> +
>  /*
>   * __unqual_scalar_typeof(x) - Declare an unqualified scalar type, leaving
>   * non-scalar types unchanged.
> diff --git a/tools/include/linux/percpu-simulate.h b/tools/include/linux/percpu-simulate.h
> new file mode 100644
> index 000000000000..a6af2f2211eb
> --- /dev/null
> +++ b/tools/include/linux/percpu-simulate.h
> @@ -0,0 +1,128 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Userspace implementation of per_cpu_ptr for adapted kernel code.
> + *
> + * Userspace code does not have and does not need a per-cpu concept, but
> + * instead can declare variables as thread-local.  However, the kernel per-cpu
> + * further provides 1) the guarantee of static lifetime when thread exits, and
> + * 2) the ability to access other CPU's per-cpu data.  This file provides a
> + * simple implementation of such functionality, but with slightly different
> + * APIs and without linker script changes.
> + *
> + * 2025  Yuzhuo Jing
> + */
> +#ifndef __PERCPU_SIMULATE_H__
> +#define __PERCPU_SIMULATE_H__
> +
> +#include
> +
> +#include
> +#include
> +
> +/*
> + * The maximum supported number of CPUs.  Per-cpu variables are defined as a
> + * PERCPU_MAX length array, indexed by a thread-local cpu id.
> + */
> +#define PERCPU_MAX 4096
> +
> +#ifdef ASSERT_PERCPU
> +#define __check_cpu_id(cpu) \
> +({ \
> +	u32 cpuid = (cpu); \
> +	assert(cpuid < PERCPU_MAX); \
> +	cpuid; \
> +})
> +#else
> +#define __check_cpu_id(cpu) (cpu)
> +#endif
> +
> +/*
> + * Use weak symbol: only define __thread_per_cpu_id variable if any perf tool
> + * includes this header file.
> + */
> +_Thread_local u32 __thread_per_cpu_id __weak;

Is there any overhead (or some indirection) when using the thread local
variable?

> +
> +static inline u32 get_this_cpu_id(void)
> +{
> +	return __thread_per_cpu_id;
> +}
> +
> +/*
> + * The user code must call this function inside of each thread that uses
> + * per-cpu data structures.  The user code can choose an id of their choice,
> + * but must ensure each thread uses a different id.
> + *
> + * Safety: asserts CPU id smaller than PERCPU_MAX if ASSERT_PERCPU is defined.
> + */
> +static inline void set_this_cpu_id(u32 id)
> +{
> +	__thread_per_cpu_id = __check_cpu_id(id);
> +}
> +
> +/*
> + * Declare a per-cpu data structure.  This only declares the data type and
> + * array length. Different per-cpu data are differentiated by a key (identifier).
> + *
> + * Different from the kernel version, this API must be called before the actual
> + * definition (i.e. DEFINE_PER_CPU_ALIGNED).
> + *
> + * Note that this implementation does not support prepending static qualifier,
> + * or appending assignment expressions.
> + */
> +#define DECLARE_PER_CPU_ALIGNED(key, type, data) \
> +	extern struct __percpu_type_##key { \
> +		type data; \
> +	} __percpu_data_##key[PERCPU_MAX]
> +
> +/*
> + * Define the per-cpu data storage for a given key.  This uses a previously
> + * defined data type in DECLARE_PER_CPU_ALIGNED.
> + *
> + * Different from the kernel version, this API only accepts a key name.
> + */
> +#define DEFINE_PER_CPU_ALIGNED(key) \
> +	struct __percpu_type_##key __percpu_data_##key[PERCPU_MAX]

How do these APIs guarantee the alignment?
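
To make the alignment question concrete: as quoted, the struct carries no
alignment attribute, so neighbouring slots of the array can share a cache
line.  Below is a minimal sketch of one way the slots could be padded,
assuming GCC/Clang attribute syntax and a 64-byte cache line.  The SKETCH_*
macros, the sketch_* names and the `counter' key are hypothetical and are not
part of the patch; PERCPU_MAX is shrunk to keep the example small.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines.  The quoted patch names no value. */
#define SKETCH_CACHELINE 64
/* Kept small for the example; the quoted patch uses 4096. */
#define SKETCH_PERCPU_MAX 8

/*
 * Hypothetical variant of DECLARE_PER_CPU_ALIGNED.  The aligned attribute on
 * the struct type rounds sizeof() up to a multiple of the alignment, so every
 * array slot starts on its own cache line.
 */
#define SKETCH_DECLARE_PER_CPU_ALIGNED(key, type, data) \
	extern struct sketch_percpu_type_##key { \
		type data; \
	} __attribute__((aligned(SKETCH_CACHELINE))) \
	sketch_percpu_data_##key[SKETCH_PERCPU_MAX]

/* Hypothetical variant of DEFINE_PER_CPU_ALIGNED; reuses the aligned type. */
#define SKETCH_DEFINE_PER_CPU_ALIGNED(key) \
	struct sketch_percpu_type_##key sketch_percpu_data_##key[SKETCH_PERCPU_MAX]

SKETCH_DECLARE_PER_CPU_ALIGNED(counter, uint32_t, val);
SKETCH_DEFINE_PER_CPU_ALIGNED(counter);

/* Compile-time check: one slot occupies exactly one cache line. */
static_assert(sizeof(struct sketch_percpu_type_counter) == SKETCH_CACHELINE,
	      "each per-cpu slot should fill one cache line");

/* Byte distance between the slots of two neighbouring cpu ids. */
static inline size_t sketch_slot_stride(void)
{
	return (size_t)((char *)&sketch_percpu_data_counter[1] -
			(char *)&sketch_percpu_data_counter[0]);
}
```

With this layout a 4-byte counter costs 64 bytes per slot, so the trade-off
is memory for the absence of false sharing between threads that use
different ids.
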

Thanks,
Namhyung

> +
> +#define __raw_per_cpu_value(key, field, cpu) \
> +	(__percpu_data_##key[cpu].field)
> +
> +/*
> + * Get a pointer of per-cpu data for a given key.
> + *
> + * Different from the kernel version, users of this API don't need to pass the
> + * address of the base variable (through `&varname').
> + *
> + * Safety: asserts CPU id smaller than PERCPU_MAX if ASSERT_PERCPU is defined.
> + */
> +#define per_cpu_ptr(key, field, cpu) (&per_cpu_value(key, field, cpu))
> +#define this_cpu_ptr(key, field) (&this_cpu_value(key, field))
> +
> +/*
> + * Additional APIs for direct value access.  Effectively, `*per_cpu_ptr(...)'.
> + *
> + * Safety: asserts CPU id smaller than PERCPU_MAX if ASSERT_PERCPU is defined.
> + */
> +#define per_cpu_value(key, field, cpu) \
> +	(__raw_per_cpu_value(key, field, __check_cpu_id(cpu)))
> +#define this_cpu_value(key, field) \
> +	(__raw_per_cpu_value(key, field, __thread_per_cpu_id))
> +
> +/*
> + * Helper functions of simple per-cpu operations.
> + *
> + * The kernel version differentiates __this_cpu_* from this_cpu_* for
> + * preemption/interrupt-safe contexts, but the userspace version defines them
> + * as the same.
> + */
> +
> +#define __this_cpu_add(key, field, val) (this_cpu_value(key, field) += (val))
> +#define __this_cpu_sub(key, field, val) (this_cpu_value(key, field) -= (val))
> +#define __this_cpu_inc(key, field) (++this_cpu_value(key, field))
> +#define __this_cpu_dec(key, field) (--this_cpu_value(key, field))
> +
> +#define this_cpu_add __this_cpu_add
> +#define this_cpu_sub __this_cpu_sub
> +#define this_cpu_inc __this_cpu_inc
> +#define this_cpu_dec __this_cpu_dec
> +
> +#endif /* __PERCPU_SIMULATE_H__ */
> --
> 2.50.1.487.gc89ff58d15-goog
>
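
For readers following the thread, here is a rough sketch of how the quoted
interface would be used.  The macros are condensed copies of the ones in the
patch, with these deviations that are assumptions of the sketch: PERCPU_MAX
is shrunk to 8, the id variable is plain static TLS instead of a __weak
symbol, and the bounds assert is always on.  The `stats' key, the `hits'
field and the sketch_* helpers are hypothetical.  The helpers run on one
thread and switch ids by hand, purely to show the indexing; real users would
call set_this_cpu_id() once per thread.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;

/* Shrunk for the example; the quoted patch uses 4096. */
#define PERCPU_MAX 8

/* Plain static TLS here; the quoted patch uses a __weak definition. */
static _Thread_local u32 __thread_per_cpu_id;

static inline void set_this_cpu_id(u32 id)
{
	assert(id < PERCPU_MAX);	/* mirrors __check_cpu_id() */
	__thread_per_cpu_id = id;
}

/* Condensed copies of the quoted macros. */
#define DECLARE_PER_CPU_ALIGNED(key, type, data) \
	extern struct __percpu_type_##key { \
		type data; \
	} __percpu_data_##key[PERCPU_MAX]
#define DEFINE_PER_CPU_ALIGNED(key) \
	struct __percpu_type_##key __percpu_data_##key[PERCPU_MAX]

#define per_cpu_value(key, field, cpu)	(__percpu_data_##key[cpu].field)
#define this_cpu_value(key, field) \
	(__percpu_data_##key[__thread_per_cpu_id].field)
#define per_cpu_ptr(key, field, cpu)	(&per_cpu_value(key, field, cpu))
#define this_cpu_inc(key, field)	(++this_cpu_value(key, field))

/* The declaration must come before the definition, as the patch notes. */
DECLARE_PER_CPU_ALIGNED(stats, u32, hits);
DEFINE_PER_CPU_ALIGNED(stats);

/* Hypothetical helper: act as the thread that owns `id' and count n hits. */
static inline void sketch_record_hits(u32 id, u32 n)
{
	set_this_cpu_id(id);
	while (n--)
		this_cpu_inc(stats, hits);
}

/* Hypothetical helper: sum every slot, i.e. read the other ids' data. */
static inline u32 sketch_total_hits(void)
{
	u32 sum = 0;

	for (u32 cpu = 0; cpu < PERCPU_MAX; cpu++)
		sum += *per_cpu_ptr(stats, hits, cpu);
	return sum;
}
```

Because the storage is an ordinary global array, a slot stays readable after
the thread that wrote it has exited, which is the lifetime property the
commit message contrasts with a plain thread-local variable.
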