From mboxrd@z Thu Jan 1 00:00:00 1970
From: Valentin Schneider
To: Peter Zijlstra, Leonardo Brás
Cc: Steven Rostedt, Masami Hiramatsu, Thomas Gleixner, Yury Norov,
 "Paul E. McKenney", Sebastian Andrzej Siewior, Nadav Amit, Zhen Lei,
 Chen Zhongjin, linux-kernel@vger.kernel.org,
 linux-trace-kernel@vger.kernel.org, Marcelo Tosatti,
 Daniel Bristot de Oliveira
Subject: Re: [RFC PATCH 1/1] smp: Add tracepoints for functions called with smp_call_function*()
In-Reply-To: <20230503145900.GC1689514@hirez.programming.kicks-ass.net>
References: <20230406075718.68672-1-leobras@redhat.com>
 <20230406095519.GG386572@hirez.programming.kicks-ass.net>
 <20230503145900.GC1689514@hirez.programming.kicks-ass.net>
Date: Thu, 04 May 2023 12:59:41 +0100
X-Mailing-List: linux-trace-kernel@vger.kernel.org

+Daniel

On 03/05/23 16:59, Peter Zijlstra wrote:
> On Wed, Apr 19, 2023 at 12:45:08AM -0300, Leonardo Brás wrote:
>>
>> IIUC the last commits add tracepoints that are collected on the
>> requesting CPU, at the moment of scheduling the IPI, which are also
>> useful in some scenarios.
>>
>> In my scenario it could help a little, since it makes it possible to
>> filter what all the other CPUs are scheduling on the requested CPU.
>> OTOH it could also be misleading, as the requested CPU could be running
>> something that was scheduled way before.
>>
>> The change I propose does exactly what my scenario needs: track exactly
>> which function was running at a given time on the requested CPU. With
>> this info, we can check which (if any) remotely requested function was
>> running in a given time window.
>
> I was thinking you could simply (graph-)trace
> __flush_smp_call_function_queue() with a max_graph_depth or so (Steve
> says that ought to work).
>
> But even that might be too specific; your use case sounds more like what
> we have the irq-off latency tracer for, and that thing will immediately
> tell you which functions were being run.
>
>> (An unrelated thing I just thought of: we could even use the commits
>> you pointed out together with my proposed change in order to measure how
>> long it takes for a requested function to run / complete on the
>> requested CPU.)
>
> I don't think you could actually do that; the send tracepoints Valentin
> added don't log the csd address, which means you cannot distinguish
> two CSDs with the same function sent from different CPUs.
>
> To do this you'd need to add the csd address to the ipi_send
> tracepoints and your own (possibly replacing info -- because I don't
> think that's too useful).
>
> Valentin -- is any of this something you'd also find useful?

Conceptually yeah, however:

With isolcpus + NOHZ_FULL, *any* IPI sent to an isolated CPU is game over:
you interrupt the userspace task and you eat the full meal that is a
NOHZ_FULL kernel entry. Pretty much any such IPI will show up when
evaluating your setup with rteval/rtla/whatever, so having an event at IPI
reception is redundant.

IIUC Leonardo is looking at a setup with isolcpus but not necessarily
NOHZ_FULL, so he wants to see *which* IPIs (if any) are pushing the
isolated task over its deadline/period. I would argue any IPI received
there is at risk of doing that, so it's the same fight as with NOHZ_FULL.

With that said, I suppose this could still be helpful for e.g. osnoise to
hook into and point the finger at which CPU/context sent the problematic
IPI. Or, more generally, as Leonardo suggested, to measure CSD IPI
delivery times.

One thing though is that trace_ipi_send_cpu*() is not used solely for
CSDs, cf. irq_work_raise() or smp_send_reschedule(). We might want to
split that into e.g. trace_csd_queue_cpu*() + trace_ipi_send*().

Something like so...
---
diff --git a/include/trace/events/smp.h b/include/trace/events/smp.h
new file mode 100644
index 0000000000000..8fc725a2b45b9
--- /dev/null
+++ b/include/trace/events/smp.h
@@ -0,0 +1,69 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM smp
+
+#if !defined(_TRACE_SMP_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_SMP_H
+
+#include <linux/tracepoint.h>
+
+TRACE_EVENT(csd_queue_cpu,
+
+	TP_PROTO(const unsigned int cpu,
+		 unsigned long callsite,
+		 smp_call_func_t func,
+		 call_single_data_t *csd),
+
+	TP_ARGS(cpu, callsite, func, csd),
+
+	TP_STRUCT__entry(
+		__field(unsigned int, cpu)
+		__field(void *, callsite)
+		__field(void *, func)
+		__field(void *, csd)
+	),
+
+	TP_fast_assign(
+		__entry->cpu = cpu;
+		__entry->callsite = (void *)callsite;
+		__entry->func = func;
+		__entry->csd = csd;
+	),
+
+	TP_printk("cpu=%u callsite=%pS func=%pS csd=%p",
+		  __entry->cpu, __entry->callsite, __entry->func, __entry->csd)
+);
+
+DECLARE_EVENT_CLASS(csd_function,
+
+	TP_PROTO(smp_call_func_t func, call_single_data_t *csd),
+
+	TP_ARGS(func, csd),
+
+	TP_STRUCT__entry(
+		__field(void *,	func)
+		__field(void *,	csd)
+	),
+
+	TP_fast_assign(
+		__entry->func	= func;
+		__entry->csd	= csd;
+	),
+
+	TP_printk("func=%pS csd=%p", __entry->func, __entry->csd)
+);
+
+DEFINE_EVENT(csd_function, csd_function_entry,
+	TP_PROTO(smp_call_func_t func, call_single_data_t *csd),
+	TP_ARGS(func, csd)
+);
+
+DEFINE_EVENT(csd_function, csd_function_exit,
+	TP_PROTO(smp_call_func_t func, call_single_data_t *csd),
+	TP_ARGS(func, csd)
+);
+
+#endif /* _TRACE_SMP_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/kernel/smp.c b/kernel/smp.c
index ab3e5dad6cfe9..7d28db303e9bc 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -27,6 +27,9 @@
 #include <linux/jump_label.h>
 
 #include <trace/events/ipi.h>
+#define CREATE_TRACE_POINTS
+#include <trace/events/smp.h>
+#undef CREATE_TRACE_POINTS
 
 #include "smpboot.h"
 #include "sched/smp.h"
@@ -121,6 +124,14 @@ send_call_function_ipi_mask(struct cpumask *mask)
 	arch_send_call_function_ipi_mask(mask);
 }
 
+static __always_inline void
+csd_do_func(smp_call_func_t func, void *info, call_single_data_t *csd)
+{
+	trace_csd_function_entry(func, csd);
+	func(info);
+	trace_csd_function_exit(func, csd);
+}
+
 #ifdef CONFIG_CSD_LOCK_WAIT_DEBUG
 
 static DEFINE_STATIC_KEY_MAYBE(CONFIG_CSD_LOCK_WAIT_DEBUG_DEFAULT, csdlock_debug_enabled);
@@ -329,7 +340,7 @@ void __smp_call_single_queue(int cpu, struct llist_node *node)
 	 * even if we haven't sent the smp_call IPI yet (e.g. the stopper
 	 * executes migration_cpu_stop() on the remote CPU).
 	 */
-	if (trace_ipi_send_cpu_enabled()) {
+	if (trace_csd_queue_cpu_enabled()) {
 		call_single_data_t *csd;
 		smp_call_func_t func;
 
@@ -337,7 +348,7 @@ void __smp_call_single_queue(int cpu, struct llist_node *node)
 		func = CSD_TYPE(csd) == CSD_TYPE_TTWU ?
 			sched_ttwu_pending : csd->func;
 
-		trace_ipi_send_cpu(cpu, _RET_IP_, func);
+		trace_csd_queue_cpu(cpu, _RET_IP_, func, csd);
 	}
 
 	/*
@@ -375,7 +386,7 @@ static int generic_exec_single(int cpu, struct __call_single_data *csd)
 		csd_lock_record(csd);
 		csd_unlock(csd);
 		local_irq_save(flags);
-		func(info);
+		csd_do_func(func, info, NULL);
 		csd_lock_record(NULL);
 		local_irq_restore(flags);
 		return 0;
@@ -477,7 +488,7 @@ static void __flush_smp_call_function_queue(bool warn_cpu_offline)
 			}
 
 			csd_lock_record(csd);
-			func(info);
+			csd_do_func(func, info, csd);
 			csd_unlock(csd);
 			csd_lock_record(NULL);
 		} else {
@@ -508,7 +519,7 @@ static void __flush_smp_call_function_queue(bool warn_cpu_offline)
 
 				csd_lock_record(csd);
 				csd_unlock(csd);
-				func(info);
+				csd_do_func(func, info, csd);
 				csd_lock_record(NULL);
 			} else if (type == CSD_TYPE_IRQ_WORK) {
 				irq_work_single(csd);
@@ -522,8 +533,10 @@ static void __flush_smp_call_function_queue(bool warn_cpu_offline)
 	/*
 	 * Third; only CSD_TYPE_TTWU is left, issue those.
 	 */
-	if (entry)
-		sched_ttwu_pending(entry);
+	if (entry) {
+		csd = llist_entry(entry, typeof(*csd), node.llist);
+		csd_do_func(sched_ttwu_pending, entry, csd);
+	}
 }
 
 
@@ -624,7 +637,7 @@ EXPORT_SYMBOL(smp_call_function_single);
 
 /**
  * smp_call_function_single_async() - Run an asynchronous function on a
- * 			 specific CPU.
+ *				 specific CPU.
  * @cpu: The CPU to run on.
  * @csd: Pre-allocated and setup data structure
  *
@@ -728,7 +741,7 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 	int cpu, last_cpu, this_cpu = smp_processor_id();
 	struct call_function_data *cfd;
 	bool wait = scf_flags & SCF_WAIT;
-	int nr_cpus = 0, nr_queued = 0;
+	int nr_cpus = 0;
 	bool run_remote = false;
 	bool run_local = false;
 
@@ -786,21 +799,16 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 			csd->node.src = smp_processor_id();
 			csd->node.dst = cpu;
 #endif
+
+			trace_csd_queue_cpu(cpu, _RET_IP_, func, csd);
+
 			if (llist_add(&csd->node.llist, &per_cpu(call_single_queue, cpu))) {
 				__cpumask_set_cpu(cpu, cfd->cpumask_ipi);
 				nr_cpus++;
 				last_cpu = cpu;
 			}
-			nr_queued++;
 		}
 
-		/*
-		 * Trace each smp_function_call_*() as an IPI, actual IPIs
-		 * will be traced with func==generic_smp_call_function_single_ipi().
-		 */
-		if (nr_queued)
-			trace_ipi_send_cpumask(cfd->cpumask, _RET_IP_, func);
-
 		/*
 		 * Choose the most efficient way to send an IPI. Note that the
 		 * number of CPUs might be zero due to concurrent changes to the
@@ -816,7 +824,7 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 		unsigned long flags;
 
 		local_irq_save(flags);
-		func(info);
+		csd_do_func(func, info, NULL);
 		local_irq_restore(flags);
 	}
 
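For context, here is how one might actually consume these events from tracefs if a patch along these lines were applied. This is a hypothetical sketch, not merged kernel interface: it assumes the events land under the `smp` trace system (per the `TRACE_SYSTEM smp` above), that tracefs is mounted at the usual /sys/kernel/tracing, and that you have root. All tracefs files used below (tracing_cpumask, set_graph_function, max_graph_depth, current_tracer, tracing_on) are the standard ftrace ones.

```shell
cd /sys/kernel/tracing

# Enable the queue-side and execution-side CSD events sketched above
# (hypothetical event paths, assuming the RFC patch as posted).
echo 1 > events/smp/csd_queue_cpu/enable
echo 1 > events/smp/csd_function_entry/enable
echo 1 > events/smp/csd_function_exit/enable

# Optionally restrict tracing to the isolated CPU under study.
# tracing_cpumask takes a hex mask; 8 == CPU 3 only.
echo 8 > tracing_cpumask

echo 1 > tracing_on
# ... run the latency-sensitive workload ...
echo 0 > tracing_on

# Matching a csd_queue_cpu event to the csd_function_entry event carrying
# the same csd pointer gives the queue-to-execution delay; entry/exit
# pairs give the remote function's runtime on the isolated CPU.
grep 'csd_' trace

# Alternatively, Peter's suggestion upthread: graph-trace the flush path.
echo __flush_smp_call_function_queue > set_graph_function
echo 2 > max_graph_depth
echo function_graph > current_tracer
```

Note that the cross-CPU matching in the `grep` step is exactly why the csd address has to be logged on both sides: without it, two in-flight CSDs carrying the same func from different senders are indistinguishable.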