From: Mathieu Desnoyers
Date: Thu, 20 Jun 2024 11:38:38 -0400
Subject: Re: [PATCH v4 1/5] tracing: Introduce faultable tracepoints
To: Steven Rostedt
Cc: Masami Hiramatsu, linux-kernel@vger.kernel.org, Michael Jeanson,
 Peter Zijlstra, Alexei Starovoitov, Yonghong Song, "Paul E. McKenney",
 Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
 Jiri Olsa, Namhyung Kim, bpf@vger.kernel.org, Joel Fernandes
References: <20231120205418.334172-1-mathieu.desnoyers@efficios.com>
 <20231120205418.334172-2-mathieu.desnoyers@efficios.com>
 <20231120172004.7a1c3acc@gandalf.local.home>
In-Reply-To: <20231120172004.7a1c3acc@gandalf.local.home>

On 2023-11-20 17:20, Steven Rostedt wrote:
> On Mon, 20 Nov 2023 15:54:14 -0500
> Mathieu Desnoyers wrote:
>
>> diff --git a/include/linux/tracepoint-defs.h b/include/linux/tracepoint-defs.h
>> index 4dc4955f0fbf..67bacfaa8fd0 100644
>> --- a/include/linux/tracepoint-defs.h
>> +++ b/include/linux/tracepoint-defs.h
>> @@ -29,6 +29,19 @@ struct tracepoint_func {
>>          int prio;
>>  };
>>
>> +/**
>> + * enum tracepoint_flags - Tracepoint flags
>> + * @TRACEPOINT_MAY_EXIST: Don't return an error if the tracepoint does not
>> + *                        exist upon registration.
>> + * @TRACEPOINT_MAY_FAULT: The tracepoint probe callback will be called with
>> + *                        preemption enabled, and is allowed to take page
>> + *                        faults.
>> + */
>> +enum tracepoint_flags {
>> +        TRACEPOINT_MAY_EXIST = (1 << 0),
>> +        TRACEPOINT_MAY_FAULT = (1 << 1),
>> +};
>> +
>>  struct tracepoint {
>>          const char *name; /* Tracepoint name */
>>          struct static_key key;
>> @@ -39,6 +52,7 @@ struct tracepoint {
>>          int (*regfunc)(void);
>>          void (*unregfunc)(void);
>>          struct tracepoint_func __rcu *funcs;
>> +        unsigned int flags;
>
> Since faultable and non-faultable events are mutually exclusive, why not
> just allocated them separately? Then you could have the __DO_TRACE() macro
> get passed in whether the event can be faulted or not, by the created trace.

Hi Steven,

Sorry for the delayed reply. We're now resuming work on this series.

We already have may_exist and want to introduce may_fault. I want to avoid:

- a combinatorial explosion of the number of tracepoint API functions,
- allocating tracepoints into different sections based on their
  characteristics, which will make it unclear how additional axes will
  later fit into the scheme,
- passing a set of booleans to functions as an API, which I find more
  error prone than explicit flags.

I prefer:

  func(..., TRACEPOINT_MAY_FAULT | TRACEPOINT_MAY_EXIST);

over:

  func(..., true, true);

So technically we could split faultable and non-faultable tracepoints into
different sections, but how would it be an improvement over the proposed
approach?

Note that the registration function checks that the faultable flag of the
probe matches the faultable flag of the tracepoint, which prevents mixups
already.

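For illustration only, the flag-matching check described above can be
sketched in plain userspace C. This is not the code from the series:
struct tp_sketch, tp_sketch_register() and the -EINVAL return value are
hypothetical names and choices made up for this example; only the two
flag values are taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Flag values as introduced by the patch under discussion. */
enum tracepoint_flags {
	TRACEPOINT_MAY_EXIST = (1 << 0),
	TRACEPOINT_MAY_FAULT = (1 << 1),
};

/* Hypothetical, stripped-down stand-in for struct tracepoint. */
struct tp_sketch {
	const char *name;
	unsigned int flags;
};

/*
 * Hypothetical registration helper. It rejects a probe whose MAY_FAULT
 * bit differs from the tracepoint's MAY_FAULT bit and ignores all other
 * bits. The -EINVAL error code is an assumption made for this sketch.
 */
static int tp_sketch_register(const struct tp_sketch *tp,
			      unsigned int probe_flags)
{
	assert(tp != NULL);

	/* XOR leaves MAY_FAULT set only when the two sides disagree. */
	if ((tp->flags ^ probe_flags) & TRACEPOINT_MAY_FAULT)
		return -EINVAL;
	return 0;
}
```

With this shape, a call site names every property it asks for, and a
future axis costs one more bit rather than one more parameter or one more
function name.
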
>
>
>>  };
>>
>>  #ifdef CONFIG_HAVE_ARCH_PREL32_RELOCATIONS
>> diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
>> index 88c0ba623ee6..8a6b58a2bf3b 100644
>> --- a/include/linux/tracepoint.h
>> +++ b/include/linux/tracepoint.h
>> @@ -18,6 +18,7 @@
>>  #include
>>  #include
>>  #include
>> +#include
>>  #include
>>  #include
>>
>> @@ -41,17 +42,10 @@ extern int
>>  tracepoint_probe_register_prio(struct tracepoint *tp, void *probe, void *data,
>>                                 int prio);
>>  extern int
>> -tracepoint_probe_register_prio_may_exist(struct tracepoint *tp, void *probe, void *data,
>> -                                         int prio);
>> +tracepoint_probe_register_prio_flags(struct tracepoint *tp, void *probe, void *data,
>> +                                     int prio, unsigned int flags);
>>  extern int
>>  tracepoint_probe_unregister(struct tracepoint *tp, void *probe, void *data);
>> -static inline int
>> -tracepoint_probe_register_may_exist(struct tracepoint *tp, void *probe,
>> -                                    void *data)
>> -{
>> -        return tracepoint_probe_register_prio_may_exist(tp, probe, data,
>> -                                                        TRACEPOINT_DEFAULT_PRIO);
>> -}
>>  extern void
>>  for_each_kernel_tracepoint(void (*fct)(struct tracepoint *tp, void *priv),
>>                  void *priv);
>> @@ -90,6 +84,7 @@ int unregister_tracepoint_module_notifier(struct notifier_block *nb)
>>  #ifdef CONFIG_TRACEPOINTS
>>  static inline void tracepoint_synchronize_unregister(void)
>>  {
>> +        synchronize_rcu_tasks_trace();
>
> As Peter mentioned, why not use the srcu below?

This was discussed thoroughly in a separate thread. See
https://lore.kernel.org/lkml/e3721b80-4dfb-4914-acfb-b315b8cc45b8@paulmck-laptop/

>
>>          synchronize_srcu(&tracepoint_srcu);
>>          synchronize_rcu();
>>  }
>> @@ -192,9 +187,10 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
>>   * it_func[0] is never NULL because there is at least one element in the array
>>   * when the array itself is non NULL.
>>   */
>> -#define __DO_TRACE(name, args, cond, rcuidle) \
>> +#define __DO_TRACE(name, args, cond, rcuidle, tp_flags) \
>>          do { \
>>                  int __maybe_unused __idx = 0; \
>> +                bool mayfault = (tp_flags) & TRACEPOINT_MAY_FAULT; \
>>  \
>>                  if (!(cond)) \
>>                          return; \
>> @@ -202,8 +198,12 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
>>                  if (WARN_ON_ONCE(RCUIDLE_COND(rcuidle))) \
>>                          return; \
>>  \
>> -                /* keep srcu and sched-rcu usage consistent */ \
>> -                preempt_disable_notrace(); \
>> +                if (mayfault) { \
>> +                        rcu_read_lock_trace(); \
>> +                } else { \
>> +                        /* keep srcu and sched-rcu usage consistent */ \
>> +                        preempt_disable_notrace(); \
>> +                } \
>
> Change the above comment and have:
>
>         if (!mayfault)
>                 preempt_disable_notrace();
>
> And we can have:
>
>         if (rcuidle || mayfault) {
>                 __idx = srcu_read_lock_notrace(&tracepoint_srcu);
>                 if (!mayfault)
>                         ct_irq_enter_irqson();
>         }

This is not needed if we keep rcu_read_lock_trace(), which exists for this
purpose.

>
>>  \
>>                  /* \
>>                   * For rcuidle callers, use srcu since sched-rcu \
>> @@ -221,20 +221,23 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
>>                          srcu_read_unlock_notrace(&tracepoint_srcu, __idx);\
>>                  } \
>>  \
>> -                preempt_enable_notrace(); \
>> +                if (mayfault) \
>> +                        rcu_read_unlock_trace(); \
>> +                else \
>> +                        preempt_enable_notrace(); \
>>          } while (0)
>>
>>  #ifndef MODULE
>> -#define __DECLARE_TRACE_RCU(name, proto, args, cond) \
>> +#define __DECLARE_TRACE_RCU(name, proto, args, cond, tp_flags) \
>>          static inline void trace_##name##_rcuidle(proto) \
>>          { \
>>                  if (static_key_false(&__tracepoint_##name.key)) \
>>                          __DO_TRACE(name, \
>>                                  TP_ARGS(args), \
>> -                                TP_CONDITION(cond), 1); \
>> +                                TP_CONDITION(cond), 1, tp_flags); \
>>          }
>>  #else
>> -#define __DECLARE_TRACE_RCU(name, proto, args, cond)
>> +#define __DECLARE_TRACE_RCU(name, proto, args, cond, tp_flags)
>>  #endif
>>
>>  /*
>> @@ -248,7 +251,7 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
>>   * site if it is not watching, as it will need to be active when the
>>   * tracepoint is enabled.
>>   */
>> -#define __DECLARE_TRACE(name, proto, args, cond, data_proto) \
>> +#define __DECLARE_TRACE(name, proto, args, cond, data_proto, tp_flags) \
>
> Instead of adding "tp_flags" just pass the "mayfault" boolean in.

As explained above, I want to avoid a combinatorial explosion of the number
of API functions with ..._may_exist_may_fault_may_xxx_may_yyy().

I also want to avoid the pattern where we need one boolean per axis, e.g.:

  ...(..., bool may_exist, bool may_fault, bool may_xxx, bool may_yyy)

which then looks like a maze of (... true, false, true, false) in the
caller macros. This is error prone and tricky to review.

The solution I propose to this problem is introducing the tp_flags. In the
case of __DECLARE_TRACE, the tp_flags are only used for their
TRACEPOINT_MAY_FAULT bit, but passing them keeps things consistent
everywhere: at tracepoint declaration, registration and use. Note that
((tp_flags) & TRACEPOINT_MAY_FAULT) evaluates to a compile-time constant,
so there is no performance overhead involved.

I would favor keeping the tp_flags to keep everything consistent.

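As a rough illustration of the constant-evaluation point, here is a
userspace sketch, not the kernel macro. DO_TRACE_SKETCH(), the two
counters and the trace_*_sketch() helpers are hypothetical stand-ins; the
counters merely mark which read-side flavour would have been taken. The
sketch assumes, as in the patch, that tp_flags is a constant at every
expansion site.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values as introduced by the patch under discussion. */
enum tracepoint_flags {
	TRACEPOINT_MAY_EXIST = (1 << 0),
	TRACEPOINT_MAY_FAULT = (1 << 1),
};

/* Hypothetical counters standing in for the two read-side paths. */
static int trace_rcu_sections;
static int preempt_off_sections;

/*
 * Simplified stand-in for __DO_TRACE(). Because tp_flags is a constant
 * at each expansion site, the flag test is an integer constant
 * expression and a compiler can typically drop the untaken branch.
 */
#define DO_TRACE_SKETCH(tp_flags) \
	do { \
		bool mayfault = (tp_flags) & TRACEPOINT_MAY_FAULT; \
		if (mayfault) \
			trace_rcu_sections++; \
		else \
			preempt_off_sections++; \
	} while (0)

/* These only compile if the flag tests are constant expressions. */
_Static_assert((TRACEPOINT_MAY_FAULT & TRACEPOINT_MAY_FAULT) != 0,
	       "MAY_FAULT test must be a compile-time constant");
_Static_assert((TRACEPOINT_MAY_EXIST & TRACEPOINT_MAY_FAULT) == 0,
	       "MAY_EXIST must not imply MAY_FAULT");

static void trace_faultable_sketch(void)
{
	DO_TRACE_SKETCH(TRACEPOINT_MAY_FAULT);
}

static void trace_plain_sketch(void)
{
	DO_TRACE_SKETCH(0);
}

/* Each helper takes exactly one of the two paths. */
static void sketch_selftest(void)
{
	trace_faultable_sketch();
	trace_plain_sketch();
	assert(trace_rcu_sections == 1);
	assert(preempt_off_sections == 1);
}
```

Whether a given compiler actually removes the dead branch depends on the
optimization level, so the sketch only demonstrates that the test is a
constant expression, not what code is generated.
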
>
>>          extern int __traceiter_##name(data_proto); \
>>          DECLARE_STATIC_CALL(tp_func_##name, __traceiter_##name); \
>>          extern struct tracepoint __tracepoint_##name; \
>> @@ -257,13 +260,15 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
>>                  if (static_key_false(&__tracepoint_##name.key)) \
>>                          __DO_TRACE(name, \
>>                                  TP_ARGS(args), \
>> -                                TP_CONDITION(cond), 0); \
>> +                                TP_CONDITION(cond), 0, tp_flags); \
>>                  if (IS_ENABLED(CONFIG_LOCKDEP) && (cond)) { \
>>                          WARN_ON_ONCE(!rcu_is_watching()); \
>>                  } \
>> +                if ((tp_flags) & TRACEPOINT_MAY_FAULT) \
>> +                        might_fault(); \
>>          } \
>>          __DECLARE_TRACE_RCU(name, PARAMS(proto), PARAMS(args), \
>> -                            PARAMS(cond)) \
>> +                            PARAMS(cond), tp_flags) \
>>          static inline int \
>>          register_trace_##name(void (*probe)(data_proto), void *data) \
>>          { \
>> @@ -278,6 +283,13 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
>>                                                (void *)probe, data, prio); \
>>          } \
>>          static inline int \
>> +        register_trace_prio_flags_##name(void (*probe)(data_proto), void *data, \
>> +                                         int prio, unsigned int flags) \
>> +        { \
>> +                return tracepoint_probe_register_prio_flags(&__tracepoint_##name, \
>> +                                                (void *)probe, data, prio, flags); \
>> +        } \
>> +        static inline int \
>>          unregister_trace_##name(void (*probe)(data_proto), void *data) \
>>          { \
>>                  return tracepoint_probe_unregister(&__tracepoint_##name,\
>> @@ -298,7 +310,7 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
>>   * structures, so we create an array of pointers that will be used for iteration
>>   * on the tracepoints.
>>   */
>> -#define DEFINE_TRACE_FN(_name, _reg, _unreg, proto, args) \
>> +#define DEFINE_TRACE_FN_FLAGS(_name, _reg, _unreg, proto, args, tp_flags) \
>
> Instead of passing in flags, I'm thinking that the faultable tracepoints
> need to go into its own section, and possibly have a
> register_trace_mayfault_##event() to make it highly distinguishable from
> events that don't expect to fault.

Registering a probe on a tracepoint whose may_fault bit does not match is
already rejected.

I'm concerned about multiplying the number of API functions. It may look
fine just now to add "just one more" axis and have
...may_fault_may_exist_prio(), but I find this approach has already reached
its limits. This is why I favor the flags.

As for placing the faultable tracepoints into their own section, what is
the benefit in doing that?

>
> Since everything is made by macros, it's not hard to keep all the above
> code, and wrap it in other macros so that the faultable and non-faultable
> tracepoints share most of the code.
>
> But as tracepoints live in __section("__tracepoints"), I'm thinking we may
> want __section("__tracepoints_mayfault") to keep them separate.

We could do that, but I'm not sure what we'd gain, and it would certainly
make things awkward when other mutually exclusive "may_..." axes need to be
added in the future.

Thanks,

Mathieu

>
> Thoughts?
>
> -- Steve
>
>
>>          static const char __tpstrtab_##_name[] \
>>          __section("__tracepoints_strings") = #_name; \
>>          extern struct static_call_key STATIC_CALL_KEY(tp_func_##_name); \
>> @@ -314,7 +326,9 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
>>                  .probestub = &__probestub_##_name, \
>>                  .regfunc = _reg, \
>>                  .unregfunc = _unreg, \
>> -                .funcs = NULL }; \
>> +                .funcs = NULL, \
>> +                .flags = (tp_flags), \
>> +        }; \
>>          __TRACEPOINT_ENTRY(_name); \
>>          int __traceiter_##_name(void *__data, proto) \
>>          { \
>> @@ -337,8 +351,11 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
>>          } \
>>          DEFINE_STATIC_CALL(tp_func_##_name, __traceiter_##_name);
>>
>> +#define DEFINE_TRACE_FN(_name, _reg, _unreg, proto, args) \
>> +        DEFINE_TRACE_FN_FLAGS(_name, _reg, _unreg, PARAMS(proto), PARAMS(args), 0)
>> +
>>  #define DEFINE_TRACE(name, proto, args) \
>> -        DEFINE_TRACE_FN(name, NULL, NULL, PARAMS(proto), PARAMS(args));
>> +        DEFINE_TRACE_FN(name, NULL, NULL, PARAMS(proto), PARAMS(args))
>>
>>  #define EXPORT_TRACEPOINT_SYMBOL_GPL(name) \
>>          EXPORT_SYMBOL_GPL(__tracepoint_##name); \
>> @@ -351,7 +368,7 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
>>
>>
>>  #else /* !TRACEPOINTS_ENABLED */
>> -#define __DECLARE_TRACE(name, proto, args, cond, data_proto) \
>> +#define __DECLARE_TRACE(name, proto, args, cond, data_proto, tp_flags) \
>>          static inline void trace_##name(proto) \
>>          { } \
>>          static inline void trace_##name##_rcuidle(proto) \
>> @@ -363,6 +380,18 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
>>                  return -ENOSYS; \
>>          } \
>>          static inline int \
>> +        register_trace_prio_##name(void (*probe)(data_proto), \
>> +                                   void *data, int prio) \
>> +        { \
>> +                return -ENOSYS; \
>> +        } \
>> +        static inline int \
>> +        register_trace_prio_flags_##name(void (*probe)(data_proto), \
>> +                                   void *data, int prio, unsigned int flags) \
>> +        { \
>> +                return -ENOSYS; \
>> +        } \
>> +        static inline int \
>>          unregister_trace_##name(void (*probe)(data_proto), \
>>                                  void *data) \
>>          { \
>> @@ -377,6 +406,7 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
>>                  return false; \
>>          }
>>

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com