* [PATCH v2 1/2] sched: fix firing of sched:::on-cpu
@ 2024-06-28 17:16 Alan Maguire
2024-06-28 17:16 ` [PATCH v2 2/2] unittest/sched: remove dtv2 xfail Alan Maguire
2024-08-02 21:36 ` [DTrace-devel] [PATCH v2 1/2] sched: fix firing of sched:::on-cpu Kris Van Hees
0 siblings, 2 replies; 9+ messages in thread
From: Alan Maguire @ 2024-06-28 17:16 UTC (permalink / raw)
To: dtrace; +Cc: dtrace-devel, Alan Maguire
sched:::on-cpu is not firing very often versus off-cpu. It appears
that - for recent kernels at least - fbt::schedule_tail:entry
placement is wrong. The only way to efficiently ensure firing in
the right place - when the new task has been just scheduled in -
is to use fbt::__perf_event_task_sched_in:entry as it
- fires at the right time
- is not static, so not subject to inlining or other optimizations
- is stable across kernel versions.
However the downside is it will not be called unless context switch
perf events are enabled. So the most efficient method is to
perf_event_open() such an event but not attach anything to it.
Also explored was attaching to cpc:::sched_switch-all-1 and weeding
out off-cpu events, but that required a copy in of task state,
comparison etc so in such a hot codepath a more precise attach
is preferable.
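
As a rough illustration of the "open but do not attach" idea above, a
standalone sketch using the raw perf_event_open() syscall might look like the
following. The patch itself goes through dt_perf_event_open(); the helper
names ctxsw_attr_init() and ctxsw_event_open() are hypothetical and exist
only for this example. The attribute values mirror the ones in the patch.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: fill in the attributes for a software
 * context-switch counter, using the same values as the patch.
 */
static void ctxsw_attr_init(struct perf_event_attr *attr)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_SOFTWARE;
	attr->config = PERF_COUNT_SW_CONTEXT_SWITCHES;
	attr->freq = 1;
	attr->sample_freq = 1000;
	attr->context_switch = 1;
}

/*
 * Hypothetical helper: open the event for all tasks (pid -1) on the
 * given CPU, with no group leader and no flags.  Nothing is attached
 * to the returned descriptor.  Returns the fd, or -1 with errno set.
 */
static int ctxsw_event_open(int cpu)
{
	struct perf_event_attr attr;

	ctxsw_attr_init(&attr);
	return (int)syscall(SYS_perf_event_open, &attr, -1, cpu, -1, 0UL);
}
```

The caller is expected to keep the descriptor open for as long as the probe
is enabled and close() it afterwards. Opening the event may fail without
sufficient privileges, so the return value needs checking.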
With this in place we get sensible on/off cpu numbers:
$ dtrace -n 'sched:::*-cpu { @c[probename] = count();}'
dtrace: description 'sched:::*-cpu ' matched 2 probes
^C
off-cpu 1454
on-cpu 1454
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
libdtrace/dt_prov_sched.c | 48 +++++++++++++++++++++++++++++++++++++--
1 file changed, 46 insertions(+), 2 deletions(-)
diff --git a/libdtrace/dt_prov_sched.c b/libdtrace/dt_prov_sched.c
index 2749385a..3e9d4f6b 100644
--- a/libdtrace/dt_prov_sched.c
+++ b/libdtrace/dt_prov_sched.c
@@ -9,6 +9,9 @@
#include <assert.h>
#include <errno.h>
+#include <linux/perf_event.h>
+#include <perfmon/pfmlib_perf_event.h>
+
#include "dt_dctx.h"
#include "dt_cg.h"
#include "dt_provider_sdt.h"
@@ -25,7 +28,7 @@ static probe_dep_t probes[] = {
{ "off-cpu",
DTRACE_PROBESPEC_NAME, "rawtp:sched::sched_switch" },
{ "on-cpu",
- DTRACE_PROBESPEC_NAME, "fbt::schedule_tail:entry" },
+ DTRACE_PROBESPEC_NAME, "fbt::__perf_event_task_sched_in:entry" },
{ "surrender",
DTRACE_PROBESPEC_NAME, "fbt::do_sched_yield:entry" },
{ "tick",
@@ -141,13 +144,54 @@ static int trampoline(dt_pcb_t *pcb, uint_t exitlbl)
return 0;
}
+/* We need a custom enabling for on-cpu probes to ensure that the fbt function
+ * __perf_event_task_sched_in is called. __perf_event_task_sched_in
+ * will not be called unless context switch perf events have been enabled,
+ * so we do that here by opening a context switch count perf event but not
+ * attaching anything to it to minimize overhead. The alternative - attaching
+ * to cpc:::context_switches-all-1 and weeding out on- versus off-cpu events
+ * via a trampoline is too expensive. This approach works stably across
+ * kernels because __perf_event_task_sched_in() is not static, so not potentially
+ * subject to inlining or other optimizations.
+ */
+static void enable(dtrace_hdl_t *dtp, dt_probe_t *prp)
+{
+ struct perf_event_attr attr = {};
+ int swfd;
+
+ if (strcmp(prp->desc->prb, "on-cpu") != 0)
+ return dt_sdt_enable(dtp, prp);
+
+ memset(&attr, 0, sizeof(attr));
+ attr.size = sizeof(attr);
+ attr.type = PERF_TYPE_SOFTWARE;
+ attr.config = PERF_COUNT_SW_CONTEXT_SWITCHES;
+ attr.freq = 1;
+ attr.sample_freq = 1000;
+ attr.context_switch = 1;
+
+ swfd = dt_perf_event_open(&attr, -1, 0, -1, 0);
+ if (swfd < 0)
+ dt_dprintf("open of context_switch perf event failed: %d\n", errno);
+ else
+ prp->prv_data = (void *)(long)swfd;
+ dt_sdt_enable(dtp, prp);
+}
+
+static void detach(dtrace_hdl_t *dtp, const dt_probe_t *prp)
+{
+ if (prp->prv_data)
+ close((int)(long)prp->prv_data);
+}
+
dt_provimpl_t dt_sched = {
.name = prvname,
.prog_type = BPF_PROG_TYPE_UNSPEC,
.populate = &populate,
- .enable = &dt_sdt_enable,
+ .enable = &enable,
.load_prog = &dt_bpf_prog_load,
.trampoline = &trampoline,
.probe_info = &dt_sdt_probe_info,
+ .detach = &detach,
.destroy = &dt_sdt_destroy,
};
--
2.43.5
* [PATCH v2 2/2] unittest/sched: remove dtv2 xfail
2024-06-28 17:16 [PATCH v2 1/2] sched: fix firing of sched:::on-cpu Alan Maguire
@ 2024-06-28 17:16 ` Alan Maguire
2024-08-02 21:37 ` Kris Van Hees
2024-08-02 21:36 ` [DTrace-devel] [PATCH v2 1/2] sched: fix firing of sched:::on-cpu Kris Van Hees
1 sibling, 1 reply; 9+ messages in thread
From: Alan Maguire @ 2024-06-28 17:16 UTC (permalink / raw)
To: dtrace; +Cc: dtrace-devel, Alan Maguire
...since tst.oncpu.d test passes now.
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
test/unittest/sched/tst.oncpu.d | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/test/unittest/sched/tst.oncpu.d b/test/unittest/sched/tst.oncpu.d
index d2c2ce57..7a33bab4 100644
--- a/test/unittest/sched/tst.oncpu.d
+++ b/test/unittest/sched/tst.oncpu.d
@@ -1,11 +1,10 @@
/*
* Oracle Linux DTrace.
- * Copyright (c) 2006, 2020, Oracle and/or its affiliates. All rights reserved.
+ * Copyright (c) 2006, 2024, Oracle and/or its affiliates. All rights reserved.
* Licensed under the Universal Permissive License v 1.0 as shown at
* http://oss.oracle.com/licenses/upl.
*/
-/* @@xfail: dtv2 */
/* @@timeout: 15 */
#pragma D option switchrate=100hz
--
2.43.5
* Re: [PATCH v2 2/2] unittest/sched: remove dtv2 xfail
2024-06-28 17:16 ` [PATCH v2 2/2] unittest/sched: remove dtv2 xfail Alan Maguire
@ 2024-08-02 21:37 ` Kris Van Hees
2024-08-16 19:33 ` Kris Van Hees
0 siblings, 1 reply; 9+ messages in thread
From: Kris Van Hees @ 2024-08-02 21:37 UTC (permalink / raw)
To: Alan Maguire; +Cc: dtrace, dtrace-devel
On Fri, Jun 28, 2024 at 06:16:34PM +0100, Alan Maguire wrote:
> ...since tst.oncpu.d test passes now.
>
> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Reviewed-by: Kris Van Hees <kris.van.hees@oracle.com>
> ---
> test/unittest/sched/tst.oncpu.d | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/test/unittest/sched/tst.oncpu.d b/test/unittest/sched/tst.oncpu.d
> index d2c2ce57..7a33bab4 100644
> --- a/test/unittest/sched/tst.oncpu.d
> +++ b/test/unittest/sched/tst.oncpu.d
> @@ -1,11 +1,10 @@
> /*
> * Oracle Linux DTrace.
> - * Copyright (c) 2006, 2020, Oracle and/or its affiliates. All rights reserved.
> + * Copyright (c) 2006, 2024, Oracle and/or its affiliates. All rights reserved.
> * Licensed under the Universal Permissive License v 1.0 as shown at
> * http://oss.oracle.com/licenses/upl.
> */
>
> -/* @@xfail: dtv2 */
> /* @@timeout: 15 */
>
> #pragma D option switchrate=100hz
> --
> 2.43.5
>
* Re: [PATCH v2 2/2] unittest/sched: remove dtv2 xfail
2024-08-02 21:37 ` Kris Van Hees
@ 2024-08-16 19:33 ` Kris Van Hees
2024-08-17 0:28 ` Kris Van Hees
0 siblings, 1 reply; 9+ messages in thread
From: Kris Van Hees @ 2024-08-16 19:33 UTC (permalink / raw)
To: Kris Van Hees; +Cc: Alan Maguire, dtrace, dtrace-devel
Running a full testsuite run (and then also individual test) I found this
test to fail on my OL9 VM with 5.15.0-205.149.5.1.el9uek.x86_64 kernel
while it works on my Debian VM with a 6.5.0 kernel.
On Fri, Aug 02, 2024 at 05:37:11PM -0400, Kris Van Hees wrote:
> On Fri, Jun 28, 2024 at 06:16:34PM +0100, Alan Maguire wrote:
> > ...since tst.oncpu.d test passes now.
> >
> > Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
>
> Reviewed-by: Kris Van Hees <kris.van.hees@oracle.com>
>
> > ---
> > test/unittest/sched/tst.oncpu.d | 3 +--
> > 1 file changed, 1 insertion(+), 2 deletions(-)
> >
> > diff --git a/test/unittest/sched/tst.oncpu.d b/test/unittest/sched/tst.oncpu.d
> > index d2c2ce57..7a33bab4 100644
> > --- a/test/unittest/sched/tst.oncpu.d
> > +++ b/test/unittest/sched/tst.oncpu.d
> > @@ -1,11 +1,10 @@
> > /*
> > * Oracle Linux DTrace.
> > - * Copyright (c) 2006, 2020, Oracle and/or its affiliates. All rights reserved.
> > + * Copyright (c) 2006, 2024, Oracle and/or its affiliates. All rights reserved.
> > * Licensed under the Universal Permissive License v 1.0 as shown at
> > * http://oss.oracle.com/licenses/upl.
> > */
> >
> > -/* @@xfail: dtv2 */
> > /* @@timeout: 15 */
> >
> > #pragma D option switchrate=100hz
> > --
> > 2.43.5
> >
* Re: [PATCH v2 2/2] unittest/sched: remove dtv2 xfail
2024-08-16 19:33 ` Kris Van Hees
@ 2024-08-17 0:28 ` Kris Van Hees
2024-08-28 16:58 ` Alan Maguire
0 siblings, 1 reply; 9+ messages in thread
From: Kris Van Hees @ 2024-08-17 0:28 UTC (permalink / raw)
To: Alan Maguire; +Cc: dtrace, dtrace-devel
Problem found: the OL9 UEK7 kernel I am working with (and possibly all) does
not allow an FBT probe on __perf_event_task_sched_in. The failure is silent,
causing the probe to simply never get enabled and no error reported, so the
probe does not fire and causes the test to fail.
In other words... the approach in 1/2 of this series does *not* seem to work
for UEK7 kernels. That is a problem.
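
Since the failure is silent, one way to detect it up front is to look the
function up in the ftrace function list before relying on the FBT probe. The
sketch below is only an illustration: func_is_traceable() is a hypothetical
helper, not something in libdtrace, and the tracefs path in the usage note is
the usual mount point, which may differ on a given system.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if func is listed in the ftrace
 * function list at path, 0 if it is not, and -1 if the file cannot
 * be opened.
 */
static int func_is_traceable(const char *path, const char *func)
{
	char line[512];
	size_t len = strlen(func);
	int found = 0;
	FILE *fp = fopen(path, "r");

	if (fp == NULL)
		return -1;

	while (fgets(line, sizeof(line), fp) != NULL) {
		/* Entries are "name" or "name [module]"; match the whole name. */
		if (strncmp(line, func, len) == 0 &&
		    (line[len] == '\n' || line[len] == ' ' ||
		     line[len] == '\t' || line[len] == '\0')) {
			found = 1;
			break;
		}
	}

	fclose(fp);
	return found;
}
```

For example, func_is_traceable("/sys/kernel/tracing/available_filter_functions",
"__perf_event_task_sched_in") would be expected to return 0 on a kernel where
the function cannot be traced, which is the case that would otherwise go
unnoticed here.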
On Fri, Aug 16, 2024 at 03:33:00PM -0400, Kris Van Hees wrote:
> Running a full testsuite run (and then also individual test) I found this
> test to fail on my OL9 VM with 5.15.0-205.149.5.1.el9uek.x86_64 kernel
> while it works on my Debian VM with a 6.5.0 kernel.
>
> On Fri, Aug 02, 2024 at 05:37:11PM -0400, Kris Van Hees wrote:
> > On Fri, Jun 28, 2024 at 06:16:34PM +0100, Alan Maguire wrote:
> > > ...since tst.oncpu.d test passes now.
> > >
> > > Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> >
> > Reviewed-by: Kris Van Hees <kris.van.hees@oracle.com>
> >
> > > ---
> > > test/unittest/sched/tst.oncpu.d | 3 +--
> > > 1 file changed, 1 insertion(+), 2 deletions(-)
> > >
> > > diff --git a/test/unittest/sched/tst.oncpu.d b/test/unittest/sched/tst.oncpu.d
> > > index d2c2ce57..7a33bab4 100644
> > > --- a/test/unittest/sched/tst.oncpu.d
> > > +++ b/test/unittest/sched/tst.oncpu.d
> > > @@ -1,11 +1,10 @@
> > > /*
> > > * Oracle Linux DTrace.
> > > - * Copyright (c) 2006, 2020, Oracle and/or its affiliates. All rights reserved.
> > > + * Copyright (c) 2006, 2024, Oracle and/or its affiliates. All rights reserved.
> > > * Licensed under the Universal Permissive License v 1.0 as shown at
> > > * http://oss.oracle.com/licenses/upl.
> > > */
> > >
> > > -/* @@xfail: dtv2 */
> > > /* @@timeout: 15 */
> > >
> > > #pragma D option switchrate=100hz
> > > --
> > > 2.43.5
> > >
* Re: [PATCH v2 2/2] unittest/sched: remove dtv2 xfail
2024-08-17 0:28 ` Kris Van Hees
@ 2024-08-28 16:58 ` Alan Maguire
2024-10-07 19:10 ` Kris Van Hees
0 siblings, 1 reply; 9+ messages in thread
From: Alan Maguire @ 2024-08-28 16:58 UTC (permalink / raw)
To: Kris Van Hees; +Cc: dtrace, dtrace-devel
On 17/08/2024 01:28, Kris Van Hees wrote:
> Problem found: the OL9 UEK7 kernel I am working with (and possibly all) does
> not allow an FBT probe on __perf_event_task_sched_in. The failure is silent,
> causing the probe to simply never get enabled and no error reported, so the
> probe does not fire and causes the test to fail.
>
> In other words... the approach in 1/2 of this series does *not* seem to work
> for UEK7 kernels. That is a problem.
>
Thanks for the report! I've root-caused the absence of the function from
available_filter_functions in UEK7 and earlier (upstream works fine).
Prior to
commit 79df45731da68772d2285265864a52c900b8c65f
Author: Song Liu <songliubraving@fb.com>
Date: Wed Oct 6 14:07:32 2021 -0700
perf/core: Allow ftrace for functions in kernel/event/core.c
It is useful to trace functions in kernel/event/core.c. Allow ftrace for
them by removing $(CC_FLAGS_FTRACE) from Makefile.
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20211006210732.2826289-1-songliubraving@fb.com
...kernel/events/Makefile removed the ftrace compile flags which mark
function entry for the code in kernel/events. In UEK7 we see
ifdef CONFIG_FUNCTION_TRACER
CFLAGS_REMOVE_core.o = $(CC_FLAGS_FTRACE)
endif
I'll try to find a workaround that doesn't rely on function boundary
tracing working in these files.
Alan
> On Fri, Aug 16, 2024 at 03:33:00PM -0400, Kris Van Hees wrote:
>> Running a full testsuite run (and then also individual test) I found this
>> test to fail on my OL9 VM with 5.15.0-205.149.5.1.el9uek.x86_64 kernel
>> while it works on my Debian VM with a 6.5.0 kernel.
>>
>> On Fri, Aug 02, 2024 at 05:37:11PM -0400, Kris Van Hees wrote:
>>> On Fri, Jun 28, 2024 at 06:16:34PM +0100, Alan Maguire wrote:
>>>> ...since tst.oncpu.d test passes now.
>>>>
>>>> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
>>>
>>> Reviewed-by: Kris Van Hees <kris.van.hees@oracle.com>
>>>
>>>> ---
>>>> test/unittest/sched/tst.oncpu.d | 3 +--
>>>> 1 file changed, 1 insertion(+), 2 deletions(-)
>>>>
>>>> diff --git a/test/unittest/sched/tst.oncpu.d b/test/unittest/sched/tst.oncpu.d
>>>> index d2c2ce57..7a33bab4 100644
>>>> --- a/test/unittest/sched/tst.oncpu.d
>>>> +++ b/test/unittest/sched/tst.oncpu.d
>>>> @@ -1,11 +1,10 @@
>>>> /*
>>>> * Oracle Linux DTrace.
>>>> - * Copyright (c) 2006, 2020, Oracle and/or its affiliates. All rights reserved.
>>>> + * Copyright (c) 2006, 2024, Oracle and/or its affiliates. All rights reserved.
>>>> * Licensed under the Universal Permissive License v 1.0 as shown at
>>>> * http://oss.oracle.com/licenses/upl.
>>>> */
>>>>
>>>> -/* @@xfail: dtv2 */
>>>> /* @@timeout: 15 */
>>>>
>>>> #pragma D option switchrate=100hz
>>>> --
>>>> 2.43.5
>>>>
* Re: [PATCH v2 2/2] unittest/sched: remove dtv2 xfail
2024-08-28 16:58 ` Alan Maguire
@ 2024-10-07 19:10 ` Kris Van Hees
2024-10-08 18:17 ` Alan Maguire
0 siblings, 1 reply; 9+ messages in thread
From: Kris Van Hees @ 2024-10-07 19:10 UTC (permalink / raw)
To: Alan Maguire; +Cc: Kris Van Hees, dtrace, dtrace-devel
Any progress on this?
On Wed, Aug 28, 2024 at 05:58:11PM +0100, Alan Maguire wrote:
> On 17/08/2024 01:28, Kris Van Hees wrote:
> > Problem found: the OL9 UEK7 kernel I am working with (and possibly all) does
> > not allow an FBT probe on __perf_event_task_sched_in. The failure is silent,
> > causing the probe to simply never get enabled and no error reported, so the
> > probe does not fire and causes the test to fail.
> >
> > In other words... the approach in 1/2 of this series does *not* seem to work
> > for UEK7 kernels. That is a problem.
> >
>
> Thanks for the report! I've root-caused the absence of the function from
> available_filter_functions in UEK7 and earlier (upstream works fine).
> Prior to
>
> commit 79df45731da68772d2285265864a52c900b8c65f
> Author: Song Liu <songliubraving@fb.com>
> Date: Wed Oct 6 14:07:32 2021 -0700
>
> perf/core: Allow ftrace for functions in kernel/event/core.c
>
> It is useful to trace functions in kernel/event/core.c. Allow ftrace for
> them by removing $(CC_FLAGS_FTRACE) from Makefile.
>
> Signed-off-by: Song Liu <songliubraving@fb.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Link:
> https://lkml.kernel.org/r/20211006210732.2826289-1-songliubraving@fb.com
>
>
> ...kernel/events/Makefile removed the ftrace compile flags which mark
> function entry for the code in kernel/events. In UEK7 we see
>
> ifdef CONFIG_FUNCTION_TRACER
> CFLAGS_REMOVE_core.o = $(CC_FLAGS_FTRACE)
> endif
>
> I'll try to find a workaround that doesn't rely on function boundary
> tracing in these files to work..
>
> Alan
>
> > On Fri, Aug 16, 2024 at 03:33:00PM -0400, Kris Van Hees wrote:
> >> Running a full testsuite run (and then also individual test) I found this
> >> test to fail on my OL9 VM with 5.15.0-205.149.5.1.el9uek.x86_64 kernel
> >> while it works on my Debian VM with a 6.5.0 kernel.
> >>
> >> On Fri, Aug 02, 2024 at 05:37:11PM -0400, Kris Van Hees wrote:
> >>> On Fri, Jun 28, 2024 at 06:16:34PM +0100, Alan Maguire wrote:
> >>>> ...since tst.oncpu.d test passes now.
> >>>>
> >>>> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> >>>
> >>> Reviewed-by: Kris Van Hees <kris.van.hees@oracle.com>
> >>>
> >>>> ---
> >>>> test/unittest/sched/tst.oncpu.d | 3 +--
> >>>> 1 file changed, 1 insertion(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/test/unittest/sched/tst.oncpu.d b/test/unittest/sched/tst.oncpu.d
> >>>> index d2c2ce57..7a33bab4 100644
> >>>> --- a/test/unittest/sched/tst.oncpu.d
> >>>> +++ b/test/unittest/sched/tst.oncpu.d
> >>>> @@ -1,11 +1,10 @@
> >>>> /*
> >>>> * Oracle Linux DTrace.
> >>>> - * Copyright (c) 2006, 2020, Oracle and/or its affiliates. All rights reserved.
> >>>> + * Copyright (c) 2006, 2024, Oracle and/or its affiliates. All rights reserved.
> >>>> * Licensed under the Universal Permissive License v 1.0 as shown at
> >>>> * http://oss.oracle.com/licenses/upl.
> >>>> */
> >>>>
> >>>> -/* @@xfail: dtv2 */
> >>>> /* @@timeout: 15 */
> >>>>
> >>>> #pragma D option switchrate=100hz
> >>>> --
> >>>> 2.43.5
> >>>>
* Re: [PATCH v2 2/2] unittest/sched: remove dtv2 xfail
2024-10-07 19:10 ` Kris Van Hees
@ 2024-10-08 18:17 ` Alan Maguire
0 siblings, 0 replies; 9+ messages in thread
From: Alan Maguire @ 2024-10-08 18:17 UTC (permalink / raw)
To: Kris Van Hees; +Cc: dtrace, dtrace-devel
On 07/10/2024 20:10, Kris Van Hees wrote:
> Any progress on this?
>
The first approach I investigated - using a perf event BPF program to
catch sched switch software perf events - seems to be a dead end because
there is no easy way I could find to access the perf event payload to
determine if the event is a sched_in or sched_out event. There's nothing
I can see in the perf sample to provide that info, and even if we could
access it we would have to do a series of expensive
bpf_probe_read_kernel()s in a trampoline for sched:::on-cpu to determine
if it is the right event. There is no way I can see to limit perf event
firing to on-cpu events, so we'd have to filter and that's expensive
since we know 50% of the time we will be in an off-cpu event. I can't
see any other obvious context we could draw upon reliably to distinguish
on- and off-cpu events in this context.
An alternative - that has other benefits aside from this specific
problem - is to support attaching to .isra.0-suffixed functions. If we can
do that, we can attach to fbt::finish_task_switch[.isra.0]:return to
trace the right location for on-cpu. This works on UEK7 also.
The .isra.0 functions are in available_filter_functions, and have BTF
representations (as long as they do not violate register expectations
with parameters), so having support for these would be valuable in
expanding the traceable surface of the system. There are a few wrinkles,
specifically:
- kprobe event names do not support a "." so we need to remove it
- BTF representations do not have the ".isra.0" suffix (in line with
DWARF) so we need to drop it to do BTF function lookup
Both these can be accomplished by creating a string which drops the
suffix and using that at the appropriate times. I have a rough
implementation working for kprobes, and will test fprobes shortly. What
do you think? It would be nice to expand the tracing surface, and if we
did add that, supporting sched:::on-cpu would be much easier (and
backwards-compatible to 5.15 at least).
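
To make the "string which drops the suffix" idea concrete, a minimal sketch
could look like the following. This is not the rough implementation mentioned
above; strip_symbol_suffix() is a hypothetical helper, and it assumes that
cutting the symbol at its first "." is acceptable for both the kprobe event
name and the BTF lookup.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy sym into buf up to (but not including) its
 * first '.', so "finish_task_switch.isra.0" becomes
 * "finish_task_switch".  Symbols without a '.' are copied unchanged.
 * Returns 0 on success, or -1 if buf is too small.
 */
static int strip_symbol_suffix(const char *sym, char *buf, size_t bufsz)
{
	size_t len = strcspn(sym, ".");

	if (len >= bufsz)
		return -1;

	memcpy(buf, sym, len);
	buf[len] = '\0';
	return 0;
}
```

The full suffixed symbol would still be needed as the attach target; only the
event name and the BTF function lookup would use the stripped form.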
Thanks!
Alan
>>> not allow an FBT probe on __perf_event_task_sched_in. The failure is silent,
>>> causing the probe to simply never get enabled and no error reported, so the
>>> probe does not fire and causes the test to fail.
>>>
>>> In other words... the approach in 1/2 of this series does *not* seem to work
>>> for UEK7 kernels. That is a problem.
>>>
>>
>> Thanks for the report! I've root-caused the absence of the function from
>> available_filter_functions in UEK7 and earlier (upstream works fine).
>> Prior to
>>
>> commit 79df45731da68772d2285265864a52c900b8c65f
>> Author: Song Liu <songliubraving@fb.com>
>> Date: Wed Oct 6 14:07:32 2021 -0700
>>
>> perf/core: Allow ftrace for functions in kernel/event/core.c
>>
>> It is useful to trace functions in kernel/event/core.c. Allow ftrace for
>> them by removing $(CC_FLAGS_FTRACE) from Makefile.
>>
>> Signed-off-by: Song Liu <songliubraving@fb.com>
>> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>> Link:
>> https://lkml.kernel.org/r/20211006210732.2826289-1-songliubraving@fb.com
>>
>>
>> ...kernel/events/Makefile removed the ftrace compile flags which mark
>> function entry for the code in kernel/events. In UEK7 we see
>>
>> ifdef CONFIG_FUNCTION_TRACER
>> CFLAGS_REMOVE_core.o = $(CC_FLAGS_FTRACE)
>> endif
>>
>> I'll try to find a workaround that doesn't rely on function boundary
>> tracing in these files to work..
>>
>> Alan
>>
>>> On Fri, Aug 16, 2024 at 03:33:00PM -0400, Kris Van Hees wrote:
>>>> Running a full testsuite run (and then also individual test) I found this
>>>> test to fail on my OL9 VM with 5.15.0-205.149.5.1.el9uek.x86_64 kernel
>>>> while it works on my Debian VM with a 6.5.0 kernel.
>>>>
>>>> On Fri, Aug 02, 2024 at 05:37:11PM -0400, Kris Van Hees wrote:
>>>>> On Fri, Jun 28, 2024 at 06:16:34PM +0100, Alan Maguire wrote:
>>>>>> ...since tst.oncpu.d test passes now.
>>>>>>
>>>>>> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
>>>>>
>>>>> Reviewed-by: Kris Van Hees <kris.van.hees@oracle.com>
>>>>>
>>>>>> ---
>>>>>> test/unittest/sched/tst.oncpu.d | 3 +--
>>>>>> 1 file changed, 1 insertion(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/test/unittest/sched/tst.oncpu.d b/test/unittest/sched/tst.oncpu.d
>>>>>> index d2c2ce57..7a33bab4 100644
>>>>>> --- a/test/unittest/sched/tst.oncpu.d
>>>>>> +++ b/test/unittest/sched/tst.oncpu.d
>>>>>> @@ -1,11 +1,10 @@
>>>>>> /*
>>>>>> * Oracle Linux DTrace.
>>>>>> - * Copyright (c) 2006, 2020, Oracle and/or its affiliates. All rights reserved.
>>>>>> + * Copyright (c) 2006, 2024, Oracle and/or its affiliates. All rights reserved.
>>>>>> * Licensed under the Universal Permissive License v 1.0 as shown at
>>>>>> * http://oss.oracle.com/licenses/upl.
>>>>>> */
>>>>>>
>>>>>> -/* @@xfail: dtv2 */
>>>>>> /* @@timeout: 15 */
>>>>>>
>>>>>> #pragma D option switchrate=100hz
>>>>>> --
>>>>>> 2.43.5
>>>>>>
* Re: [DTrace-devel] [PATCH v2 1/2] sched: fix firing of sched:::on-cpu
2024-06-28 17:16 [PATCH v2 1/2] sched: fix firing of sched:::on-cpu Alan Maguire
2024-06-28 17:16 ` [PATCH v2 2/2] unittest/sched: remove dtv2 xfail Alan Maguire
@ 2024-08-02 21:36 ` Kris Van Hees
1 sibling, 0 replies; 9+ messages in thread
From: Kris Van Hees @ 2024-08-02 21:36 UTC (permalink / raw)
To: Alan Maguire; +Cc: dtrace, dtrace-devel
Reviewed-by: Kris Van Hees <kris.van.hees@oracle.com>
On Fri, Jun 28, 2024 at 06:16:33PM +0100, Alan Maguire via DTrace-devel wrote:
> sched:::on-cpu is not firing very often versus off-cpu. It appears
> that - for recent kernels at least - fbt::schedule_tail:entry
> placement is wrong. The only way to efficiently ensure firing in
> the right place - when the new task has been just scheduled in -
> is to use fbt::__perf_event_task_sched_in:entry as it
>
> - fires at the right time
> - is not static, so not subject to inlining or other optimizations
> - is stable across kernel versions.
>
> However the downside is it will not be called unless context switch
> perf events are enabled. So the most efficient method is to
> perf_event_open() such an event but not attach anything to it.
> Also explored was attaching to cpc:::sched_switch-all-1 and weeding
> out off-cpu events, but that required a copy in of task state,
> comparison etc so in such a hot codepath a more precise attach
> is preferable.
>
> With this in place we get sensible on/off cpu numbers:
>
> $ dtrace -n 'sched:::*-cpu { @c[probename] = count();}'
> dtrace: description 'sched:::*-cpu ' matched 2 probes
> ^C
>
> off-cpu 1454
> on-cpu 1454
>
> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> ---
> libdtrace/dt_prov_sched.c | 48 +++++++++++++++++++++++++++++++++++++--
> 1 file changed, 46 insertions(+), 2 deletions(-)
>
> diff --git a/libdtrace/dt_prov_sched.c b/libdtrace/dt_prov_sched.c
> index 2749385a..3e9d4f6b 100644
> --- a/libdtrace/dt_prov_sched.c
> +++ b/libdtrace/dt_prov_sched.c
> @@ -9,6 +9,9 @@
> #include <assert.h>
> #include <errno.h>
>
> +#include <linux/perf_event.h>
> +#include <perfmon/pfmlib_perf_event.h>
> +
> #include "dt_dctx.h"
> #include "dt_cg.h"
> #include "dt_provider_sdt.h"
> @@ -25,7 +28,7 @@ static probe_dep_t probes[] = {
> { "off-cpu",
> DTRACE_PROBESPEC_NAME, "rawtp:sched::sched_switch" },
> { "on-cpu",
> - DTRACE_PROBESPEC_NAME, "fbt::schedule_tail:entry" },
> + DTRACE_PROBESPEC_NAME, "fbt::__perf_event_task_sched_in:entry" },
> { "surrender",
> DTRACE_PROBESPEC_NAME, "fbt::do_sched_yield:entry" },
> { "tick",
> @@ -141,13 +144,54 @@ static int trampoline(dt_pcb_t *pcb, uint_t exitlbl)
> return 0;
> }
>
> +/* We need a custom enabling for on-cpu probes to ensure that the fbt function
> + * __perf_event_task_sched_in is called. __perf_event_task_sched_in
> + * will not be called unless context switch perf events have been enabled,
> + * so we do that here by opening a context switch count perf event but not
> + * attaching anything to it to minimize overhead. The alternative - attaching
> + * to cpc:::context_switches-all-1 and weeding out on- versus off-cpu events
> + * via a trampoline is too expensive. This approach works stably across
> + * kernels because __perf_event_task_sched_in() is not static, so not potentially
> + * subject to inlining or other optimizations.
> + */
> +static void enable(dtrace_hdl_t *dtp, dt_probe_t *prp)
> +{
> + struct perf_event_attr attr = {};
> + int swfd;
> +
> + if (strcmp(prp->desc->prb, "on-cpu") != 0)
> + return dt_sdt_enable(dtp, prp);
> +
> + memset(&attr, 0, sizeof(attr));
> + attr.size = sizeof(attr);
> + attr.type = PERF_TYPE_SOFTWARE;
> + attr.config = PERF_COUNT_SW_CONTEXT_SWITCHES;
> + attr.freq = 1;
> + attr.sample_freq = 1000;
> + attr.context_switch = 1;
> +
> + swfd = dt_perf_event_open(&attr, -1, 0, -1, 0);
> + if (swfd < 0)
> + dt_dprintf("open of context_switch perf event failed: %d\n", errno);
> + else
> + prp->prv_data = (void *)(long)swfd;
> + dt_sdt_enable(dtp, prp);
> +}
> +
> +static void detach(dtrace_hdl_t *dtp, const dt_probe_t *prp)
> +{
> + if (prp->prv_data)
> + close((int)(long)prp->prv_data);
> +}
> +
> dt_provimpl_t dt_sched = {
> .name = prvname,
> .prog_type = BPF_PROG_TYPE_UNSPEC,
> .populate = &populate,
> - .enable = &dt_sdt_enable,
> + .enable = &enable,
> .load_prog = &dt_bpf_prog_load,
> .trampoline = &trampoline,
> .probe_info = &dt_sdt_probe_info,
> + .detach = &detach,
> .destroy = &dt_sdt_destroy,
> };
> --
> 2.43.5
>