* Query regarding work scheduling @ 2025-08-12 11:40 Subbaraya Sundeep 2025-08-12 18:52 ` Tejun Heo 0 siblings, 1 reply; 8+ messages in thread From: Subbaraya Sundeep @ 2025-08-12 11:40 UTC (permalink / raw) To: mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman, vschneid, tj, jiangshanlai Cc: linux-kernel Hi, One of our customers reported that when their kernel upgraded from 6.1 to 6.6 then they see more delay in their applications shutdown time. To put in simple terms, dataplane applications are run with SRIOV VFs attached to them and apps send number of mailbox messages to kernel PF driver (PF receives an mbox interrupt). During interrupt handler work is queued and messages are processed in work handler. I calculated the latencies (time between work queued and work execution start) of 6.1 and 6.16 and below are the observations 6.1 mainline ------------ Total samples: 4647 Min latency: 0.001 ms Max latency: 0.195 ms Total latency: 7.797 ms Latency Histogram (bucket size = 0.01 ms): 0.00 - 0.01 ms: 4644 0.01 - 0.02 ms: 1 0.03 - 0.04 ms: 1 0.19 - 0.20 ms: 1 ================== 6.16 mainline ------------- Total samples: 4647 Min latency: 0.000 ms Max latency: 4.880 ms Total latency: 158.813 ms Latency Histogram (bucket size = 0.01 ms): 0.00 - 0.01 ms: 4573 0.03 - 0.04 ms: 1 0.19 - 0.20 ms: 1 0.70 - 0.71 ms: 1 0.72 - 0.73 ms: 1 0.92 - 0.93 ms: 3 0.93 - 0.94 ms: 1 0.95 - 0.96 ms: 2 0.97 - 0.98 ms: 2 0.98 - 0.99 ms: 6 0.99 - 1.00 ms: 8 1.00 - 1.01 ms: 14 1.08 - 1.09 ms: 1 1.41 - 1.42 ms: 1 1.79 - 1.80 ms: 1 1.80 - 1.81 ms: 1 1.81 - 1.82 ms: 1 1.92 - 1.93 ms: 1 1.99 - 2.00 ms: 1 2.34 - 2.35 ms: 1 2.61 - 2.62 ms: 1 2.99 - 3.00 ms: 1 3.14 - 3.15 ms: 1 3.62 - 3.63 ms: 1 3.70 - 3.71 ms: 1 3.71 - 3.72 ms: 1 3.75 - 3.76 ms: 1 3.87 - 3.88 ms: 4 3.90 - 3.91 ms: 1 3.91 - 3.92 ms: 2 3.92 - 3.93 ms: 2 3.94 - 3.95 ms: 2 3.95 - 3.96 ms: 1 3.98 - 3.99 ms: 2 3.99 - 4.00 ms: 3 4.87 - 4.88 ms: 2 ================== As seen from histograms above, latency is more with 6.16 kernel. Above is tested on uniprocessor system. SMP is fine since work is scheduled to all cores. Please let me know what is going on and provide some pointers if am missing very basic here. I changed only kernel images. Application and rootfs are same in both cases (to ensure no additonal daemons or load in 6.16 case) Let me know if there is any knob in 6.16 to have same behavior as 6.1. Scheduler features of 6.16 are as below: # cat /sys/kernel/debug/sched/features PLACE_LAG PLACE_DEADLINE_INITIAL PLACE_REL_DEADLINE RUN_TO_PARITY PREEMPT_SHORT NO_NEXT_BUDDY PICK_BUDDY CACHE_HOT_BUDDY DELAY_DEQUEUE DELAY_ZERO WAKEUP_PREEMPTION NO_HRTICK NO_HRTICK_DL NONTASK_CAPACITY TTWU_QUEUE SIS_UTIL NO_WARN_DOUBLE_CLOCK RT_PUSH_IPI NO_RT_RUNTIME_SHARE NO_LB_MIN ATTACH_AGE_LOAD WA_IDLE WA_WEIGHT WA_BIAS UTIL_EST NO_LATENCY_WARN Thanks, Sundeep ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Query regarding work scheduling 2025-08-12 11:40 Query regarding work scheduling Subbaraya Sundeep @ 2025-08-12 18:52 ` Tejun Heo 2025-08-14 3:54 ` Subbaraya Sundeep 0 siblings, 1 reply; 8+ messages in thread From: Tejun Heo @ 2025-08-12 18:52 UTC (permalink / raw) To: Subbaraya Sundeep Cc: mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman, vschneid, jiangshanlai, linux-kernel Hello, On Tue, Aug 12, 2025 at 11:40:34AM +0000, Subbaraya Sundeep wrote: > Hi, > > One of our customers reported that when their kernel upgraded from 6.1 to 6.6 then they > see more delay in their applications shutdown time. > To put in simple terms, dataplane applications are run with SRIOV VFs attached to them and > apps send number of mailbox messages to kernel PF driver (PF receives an mbox interrupt). > During interrupt handler work is queued and messages are processed in work handler. > I calculated the latencies (time between work queued and work execution start) of 6.1 > and 6.16 and below are the observations > > > 6.1 mainline > ------------ > Total samples: 4647 > Min latency: 0.001 ms > Max latency: 0.195 ms > Total latency: 7.797 ms > > Latency Histogram (bucket size = 0.01 ms): > 0.00 - 0.01 ms: 4644 > 0.01 - 0.02 ms: 1 > 0.03 - 0.04 ms: 1 > 0.19 - 0.20 ms: 1 > > ================== > > 6.16 mainline > ------------- > Total samples: 4647 > Min latency: 0.000 ms > Max latency: 4.880 ms > Total latency: 158.813 ms Difficult to tell where the latencies are coming from. Maybe you can use something like https://github.com/josefbacik/systing to look further into it? All the scheduling events are tracked by default and you should be able to add tracepoints and other events relatively easily. You can also set trigger conditions so that trace around a high latency event can be captured reliably. Thanks. -- tejun ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Query regarding work scheduling 2025-08-12 18:52 ` Tejun Heo @ 2025-08-14 3:54 ` Subbaraya Sundeep 2025-08-14 9:48 ` Peter Zijlstra 2025-08-14 17:45 ` Steven Rostedt 0 siblings, 2 replies; 8+ messages in thread From: Subbaraya Sundeep @ 2025-08-14 3:54 UTC (permalink / raw) To: Tejun Heo Cc: mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman, vschneid, jiangshanlai, linux-kernel Hi Tejun, On 2025-08-12 at 18:52:32, Tejun Heo (tj@kernel.org) wrote: > Hello, > > On Tue, Aug 12, 2025 at 11:40:34AM +0000, Subbaraya Sundeep wrote: > > Hi, > > > > One of our customers reported that when their kernel upgraded from 6.1 to 6.6 then they > > see more delay in their applications shutdown time. > > To put in simple terms, dataplane applications are run with SRIOV VFs attached to them and > > apps send number of mailbox messages to kernel PF driver (PF receives an mbox interrupt). > > During interrupt handler work is queued and messages are processed in work handler. > > I calculated the latencies (time between work queued and work execution start) of 6.1 > > and 6.16 and below are the observations > > > > > > 6.1 mainline > > ------------ > > Total samples: 4647 > > Min latency: 0.001 ms > > Max latency: 0.195 ms > > Total latency: 7.797 ms > > > > Latency Histogram (bucket size = 0.01 ms): > > 0.00 - 0.01 ms: 4644 > > 0.01 - 0.02 ms: 1 > > 0.03 - 0.04 ms: 1 > > 0.19 - 0.20 ms: 1 > > > > ================== > > > > 6.16 mainline > > ------------- > > Total samples: 4647 > > Min latency: 0.000 ms > > Max latency: 4.880 ms > > Total latency: 158.813 ms > > Difficult to tell where the latencies are coming from. Maybe you can use > something like https://github.com/josefbacik/systing to look further into > it? All the scheduling events are tracked by default and you should be able > to add tracepoints and other events relatively easily. You can also set Thanks for the reply. I am using simple busybox to avoid overhead of any other apps or deamons running in background and taking CPU time in between. I will try building systing and running it. 6.16 histogram shows that it is not one high latency event causing overall latency but bunch of small latencies are adding up and causing big latency. I suspect this has something to do with EEVDF scheduling since this behavior is seen from 6.6 (please note I may be wrong completly). Are there any methods or options with which I can bring back CFS scheduling behavior maybe with the knobs in /sys/kernel/debug/sched/features as a quick check? Thanks, Sundeep > trigger conditions so that trace around a high latency event can be captured > reliably. > > Thanks. > > -- > tejun ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Query regarding work scheduling 2025-08-14 3:54 ` Subbaraya Sundeep @ 2025-08-14 9:48 ` Peter Zijlstra 2025-08-19 11:18 ` Subbaraya Sundeep 2025-08-14 17:45 ` Steven Rostedt 1 sibling, 1 reply; 8+ messages in thread From: Peter Zijlstra @ 2025-08-14 9:48 UTC (permalink / raw) To: Subbaraya Sundeep Cc: Tejun Heo, mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman, vschneid, jiangshanlai, linux-kernel On Thu, Aug 14, 2025 at 03:54:58AM +0000, Subbaraya Sundeep wrote: > Hi Tejun, > > On 2025-08-12 at 18:52:32, Tejun Heo (tj@kernel.org) wrote: > > Hello, > > > > On Tue, Aug 12, 2025 at 11:40:34AM +0000, Subbaraya Sundeep wrote: > > > Hi, > > > > > > One of our customers reported that when their kernel upgraded from 6.1 to 6.6 then they > > > see more delay in their applications shutdown time. > > > To put in simple terms, dataplane applications are run with SRIOV VFs attached to them and > > > apps send number of mailbox messages to kernel PF driver (PF receives an mbox interrupt). > > > During interrupt handler work is queued and messages are processed in work handler. > > > I calculated the latencies (time between work queued and work execution start) of 6.1 > > > and 6.16 and below are the observations > > > > > > > > > 6.1 mainline > > > ------------ > > > Total samples: 4647 > > > Min latency: 0.001 ms > > > Max latency: 0.195 ms > > > Total latency: 7.797 ms > > > > > > Latency Histogram (bucket size = 0.01 ms): > > > 0.00 - 0.01 ms: 4644 > > > 0.01 - 0.02 ms: 1 > > > 0.03 - 0.04 ms: 1 > > > 0.19 - 0.20 ms: 1 > > > > > > ================== > > > > > > 6.16 mainline > > > ------------- > > > Total samples: 4647 > > > Min latency: 0.000 ms > > > Max latency: 4.880 ms > > > Total latency: 158.813 ms > > > > Difficult to tell where the latencies are coming from. Maybe you can use > > something like https://github.com/josefbacik/systing to look further into > > it? All the scheduling events are tracked by default and you should be able > > to add tracepoints and other events relatively easily. You can also set > Thanks for the reply. I am using simple busybox to avoid overhead of any other apps > or deamons running in background and taking CPU time in between. Well, something is running. So there must be competing runnable tasks. > I suspect this has something to do with EEVDF scheduling since this behavior is > seen from 6.6 (please note I may be wrong completly). EEVDF is stricter in a sense than CFS was, is looks like the workqueue thread just ran out of cycles and is made to wait. > Are there any methods or options with which I can bring back CFS scheduling behavior > maybe with the knobs in /sys/kernel/debug/sched/features as a quick check? We have a lot of knobs; but not one that says: do-what-I-want. If you push a ton of work into a workqueue and have competing runnable tasks; why do you think it isn't reasonable to have the competing tasks run some of the time? You can maybe push the slice length up a bit -- it was fixed to the small side of the CFS dynamic slice. But who knows what your workload is doing. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Query regarding work scheduling 2025-08-14 9:48 ` Peter Zijlstra @ 2025-08-19 11:18 ` Subbaraya Sundeep 2025-10-31 15:14 ` Subbaraya Sundeep 0 siblings, 1 reply; 8+ messages in thread From: Subbaraya Sundeep @ 2025-08-19 11:18 UTC (permalink / raw) To: Peter Zijlstra Cc: Tejun Heo, mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman, vschneid, jiangshanlai, linux-kernel Hi Peter, On 2025-08-14 at 09:48:31, Peter Zijlstra (peterz@infradead.org) wrote: > On Thu, Aug 14, 2025 at 03:54:58AM +0000, Subbaraya Sundeep wrote: > > Hi Tejun, > > > > On 2025-08-12 at 18:52:32, Tejun Heo (tj@kernel.org) wrote: > > > Hello, > > > > > > On Tue, Aug 12, 2025 at 11:40:34AM +0000, Subbaraya Sundeep wrote: > > > > Hi, > > > > > > > > One of our customers reported that when their kernel upgraded from 6.1 to 6.6 then they > > > > see more delay in their applications shutdown time. > > > > To put in simple terms, dataplane applications are run with SRIOV VFs attached to them and > > > > apps send number of mailbox messages to kernel PF driver (PF receives an mbox interrupt). > > > > During interrupt handler work is queued and messages are processed in work handler. > > > > I calculated the latencies (time between work queued and work execution start) of 6.1 > > > > and 6.16 and below are the observations > > > > > > > > > > > > 6.1 mainline > > > > ------------ > > > > Total samples: 4647 > > > > Min latency: 0.001 ms > > > > Max latency: 0.195 ms > > > > Total latency: 7.797 ms > > > > > > > > Latency Histogram (bucket size = 0.01 ms): > > > > 0.00 - 0.01 ms: 4644 > > > > 0.01 - 0.02 ms: 1 > > > > 0.03 - 0.04 ms: 1 > > > > 0.19 - 0.20 ms: 1 > > > > > > > > ================== > > > > > > > > 6.16 mainline > > > > ------------- > > > > Total samples: 4647 > > > > Min latency: 0.000 ms > > > > Max latency: 4.880 ms > > > > Total latency: 158.813 ms > > > > > > Difficult to tell where the latencies are coming from. Maybe you can use > > > something like https://github.com/josefbacik/systing to look further into > > > it? All the scheduling events are tracked by default and you should be able > > > to add tracepoints and other events relatively easily. You can also set > > > Thanks for the reply. I am using simple busybox to avoid overhead of any other apps > > or deamons running in background and taking CPU time in between. > > Well, something is running. So there must be competing runnable tasks. > > > I suspect this has something to do with EEVDF scheduling since this behavior is > > seen from 6.6 (please note I may be wrong completly). > > EEVDF is stricter in a sense than CFS was, is looks like the workqueue > thread just ran out of cycles and is made to wait. > I am a complete beginner in this area. If a work function has executed thousands of times by a kworker then will it be made to wait a little longer after some invocations since kworker has taken too much of CPU time already? Or the accounting starts from the moment kworker became runnable from sleep state only? Sorry if I am not making any sense but I want to understand below: 1. kworker sleeping -> waking up and running a function -> sleeping Above can be done n number of times and scheduler is always in favor of kworker and picks it when it becomes runnable since new runnable task Or Scheduler knows that CPU has executed kworker thread lot of time (runtime of each invocation is tracked) so starts delaying the kworker for execution. > > Are there any methods or options with which I can bring back CFS scheduling behavior > > maybe with the knobs in /sys/kernel/debug/sched/features as a quick check? > > We have a lot of knobs; but not one that says: do-what-I-want. > > If you push a ton of work into a workqueue and have competing runnable > tasks; why do you think it isn't reasonable to have the competing tasks > run some of the time? > > You can maybe push the slice length up a bit -- it was fixed to the > small side of the CFS dynamic slice. But who knows what your workload is > doing. Workload is like 1. userspace writes message in hw shared mbox region 2. triggers interrupt to PF 3. PF receives interrupt and queues work for processing message and sends response 3. userspace polls for response in while(1) So on a single cpu system userspace while(1) code and kernel workqueue function are competing whereas userspace while(1) code actually depends on workqueue function execution in kernel. I am doing more experiments and will update you. Thanks for the reply and your time. Sundeep ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Query regarding work scheduling 2025-08-19 11:18 ` Subbaraya Sundeep @ 2025-10-31 15:14 ` Subbaraya Sundeep 0 siblings, 0 replies; 8+ messages in thread From: Subbaraya Sundeep @ 2025-10-31 15:14 UTC (permalink / raw) To: Peter Zijlstra Cc: Tejun Heo, mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman, vschneid, jiangshanlai, linux-kernel Hi Peter, On 2025-08-19 at 16:48:18, Subbaraya Sundeep (sbhatta@marvell.com) wrote: > Hi Peter, > > On 2025-08-14 at 09:48:31, Peter Zijlstra (peterz@infradead.org) wrote: > > On Thu, Aug 14, 2025 at 03:54:58AM +0000, Subbaraya Sundeep wrote: > > > Hi Tejun, > > > > > > On 2025-08-12 at 18:52:32, Tejun Heo (tj@kernel.org) wrote: > > > > Hello, > > > > > > > > On Tue, Aug 12, 2025 at 11:40:34AM +0000, Subbaraya Sundeep wrote: > > > > > Hi, > > > > > > > > > > One of our customers reported that when their kernel upgraded from 6.1 to 6.6 then they > > > > > see more delay in their applications shutdown time. > > > > > To put in simple terms, dataplane applications are run with SRIOV VFs attached to them and > > > > > apps send number of mailbox messages to kernel PF driver (PF receives an mbox interrupt). > > > > > During interrupt handler work is queued and messages are processed in work handler. > > > > > I calculated the latencies (time between work queued and work execution start) of 6.1 > > > > > and 6.16 and below are the observations > > > > > > > > > > > > > > > 6.1 mainline > > > > > ------------ > > > > > Total samples: 4647 > > > > > Min latency: 0.001 ms > > > > > Max latency: 0.195 ms > > > > > Total latency: 7.797 ms > > > > > > > > > > Latency Histogram (bucket size = 0.01 ms): > > > > > 0.00 - 0.01 ms: 4644 > > > > > 0.01 - 0.02 ms: 1 > > > > > 0.03 - 0.04 ms: 1 > > > > > 0.19 - 0.20 ms: 1 > > > > > > > > > > ================== > > > > > > > > > > 6.16 mainline > > > > > ------------- > > > > > Total samples: 4647 > > > > > Min latency: 0.000 ms > > > > > Max latency: 4.880 ms > > > > > Total latency: 158.813 ms > > > > > > > > Difficult to tell where the latencies are coming from. Maybe you can use > > > > something like https://github.com/josefbacik/systing to look further into > > > > it? All the scheduling events are tracked by default and you should be able > > > > to add tracepoints and other events relatively easily. You can also set > > > > > Thanks for the reply. I am using simple busybox to avoid overhead of any other apps > > > or deamons running in background and taking CPU time in between. > > > > Well, something is running. So there must be competing runnable tasks. > > > > > I suspect this has something to do with EEVDF scheduling since this behavior is > > > seen from 6.6 (please note I may be wrong completly). > > > > EEVDF is stricter in a sense than CFS was, is looks like the workqueue > > thread just ran out of cycles and is made to wait. > > > I am a complete beginner in this area. If a work function has executed > thousands of times by a kworker then will it be made to wait a little longer after > some invocations since kworker has taken too much of CPU time already? > Or the accounting starts from the moment kworker became runnable from > sleep state only? Sorry if I am not making any sense but I want to understand > below: > 1. kworker sleeping -> waking up and running a function -> sleeping > > Above can be done n number of times and scheduler is always in favor of > kworker and picks it when it becomes runnable since new runnable task > Or > Scheduler knows that CPU has executed kworker thread lot of time > (runtime of each invocation is tracked) so starts delaying the kworker > for execution. > > > > Are there any methods or options with which I can bring back CFS scheduling behavior > > > maybe with the knobs in /sys/kernel/debug/sched/features as a quick check? > > > > We have a lot of knobs; but not one that says: do-what-I-want. > > > > If you push a ton of work into a workqueue and have competing runnable > > tasks; why do you think it isn't reasonable to have the competing tasks > > run some of the time? > > > > You can maybe push the slice length up a bit -- it was fixed to the > > small side of the CFS dynamic slice. But who knows what your workload is > > doing. > Workload is like > 1. userspace writes message in hw shared mbox region > 2. triggers interrupt to PF > 3. PF receives interrupt and queues work for processing message and sends response > 3. userspace polls for response in while(1) > > So on a single cpu system userspace while(1) code and kernel workqueue function > are competing whereas userspace while(1) code actually depends on workqueue > function execution in kernel. > I am doing more experiments and will update you. Thanks for the reply > and your time. Sorry for long delay. I suggested customer to use sleep instead of busy loop in their application then latencies are gone and they are okay with it. Thanks, Sundeep > > Sundeep ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Query regarding work scheduling 2025-08-14 3:54 ` Subbaraya Sundeep 2025-08-14 9:48 ` Peter Zijlstra @ 2025-08-14 17:45 ` Steven Rostedt 2025-08-19 11:36 ` Subbaraya Sundeep 1 sibling, 1 reply; 8+ messages in thread From: Steven Rostedt @ 2025-08-14 17:45 UTC (permalink / raw) To: Subbaraya Sundeep Cc: Tejun Heo, mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann, bsegall, mgorman, vschneid, jiangshanlai, linux-kernel On Thu, 14 Aug 2025 03:54:58 +0000 Subbaraya Sundeep <sbhatta@marvell.com> wrote: > > Difficult to tell where the latencies are coming from. Maybe you can use > > something like https://github.com/josefbacik/systing to look further into > > it? All the scheduling events are tracked by default and you should be able > > to add tracepoints and other events relatively easily. You can also set > Thanks for the reply. I am using simple busybox to avoid overhead of any other apps > or deamons running in background and taking CPU time in between. > I will try building systing and running it. 6.16 histogram shows that it > is not one high latency event causing overall latency but bunch of small > latencies are adding up and causing big latency. > I suspect this has something to do with EEVDF scheduling since this behavior is > seen from 6.6 (please note I may be wrong completly). > Are there any methods or options with which I can bring back CFS scheduling behavior > maybe with the knobs in /sys/kernel/debug/sched/features as a quick check? You could also use tracefs as that works on busybox: # echo 0 > /sys/kernel/tracing/tracing_on # echo 1 > /sys/kernel/tracing/events/sched/sched_switch/enable [ and perhaps even more events ] # echo 1 > /sys/kernel/tracing/tracing_on # <run test>; echo 0 > /sys/kernel/tracing/tracing_on # cat /sys/kernel/tracing/trace You could even make it a trace.dat file: # mkdir /tmp/tracing # cp -r /sys/kernel/tracing/events /tmp/tracing/ # cp -r /proc/kallsyms /tmp/tracing/ [ have bs be PAGE_SIZE for your architecture ] # dd bs=4096 if=/sys/kernel/tracing/per_cpu/cpu0/trace_pipe_raw of=/tmp/tracing/trace0.raw # cd /tmp # tar cvf trace.tar tracing Copy trace.tar to a desktop and extract it. $ cd /tmp $ tar xvf trace.tar [ Make sure you have the latest trace-cmd installed ] $ trace-cmd restore -t /tmp/tracing/ -k /tmp/tracing/kallsyms -o /tmp/trace.dat /tmp/tracing/trace0.raw $ trace-cmp report /tmp/trace.dat Now you can send us the trace.dat file and we could analyze it more. You could also enable more events than just sched_switch, like sched_waking and such. -- Steve ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Query regarding work scheduling 2025-08-14 17:45 ` Steven Rostedt @ 2025-08-19 11:36 ` Subbaraya Sundeep 0 siblings, 0 replies; 8+ messages in thread From: Subbaraya Sundeep @ 2025-08-19 11:36 UTC (permalink / raw) To: Steven Rostedt Cc: Tejun Heo, mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann, bsegall, mgorman, vschneid, jiangshanlai, linux-kernel Hi Steven, On 2025-08-14 at 17:45:50, Steven Rostedt (rostedt@goodmis.org) wrote: > On Thu, 14 Aug 2025 03:54:58 +0000 > Subbaraya Sundeep <sbhatta@marvell.com> wrote: > > > > Difficult to tell where the latencies are coming from. Maybe you can use > > > something like https://github.com/josefbacik/systing to look further into > > > it? All the scheduling events are tracked by default and you should be able > > > to add tracepoints and other events relatively easily. You can also set > > > Thanks for the reply. I am using simple busybox to avoid overhead of any other apps > > or deamons running in background and taking CPU time in between. > > I will try building systing and running it. 6.16 histogram shows that it > > is not one high latency event causing overall latency but bunch of small > > latencies are adding up and causing big latency. > > I suspect this has something to do with EEVDF scheduling since this behavior is > > seen from 6.6 (please note I may be wrong completly). > > Are there any methods or options with which I can bring back CFS scheduling behavior > > maybe with the knobs in /sys/kernel/debug/sched/features as a quick check? > > You could also use tracefs as that works on busybox: > > # echo 0 > /sys/kernel/tracing/tracing_on > # echo 1 > /sys/kernel/tracing/events/sched/sched_switch/enable > [ and perhaps even more events ] > # echo 1 > /sys/kernel/tracing/tracing_on > # <run test>; echo 0 > /sys/kernel/tracing/tracing_on > # cat /sys/kernel/tracing/trace Thank you for tracefs. I use it extensively and using timestamps in trace data calculated the latencies. > > You could even make it a trace.dat file: > > # mkdir /tmp/tracing > # cp -r /sys/kernel/tracing/events /tmp/tracing/ > # cp -r /proc/kallsyms /tmp/tracing/ > [ have bs be PAGE_SIZE for your architecture ] > # dd bs=4096 if=/sys/kernel/tracing/per_cpu/cpu0/trace_pipe_raw of=/tmp/tracing/trace0.raw > # cd /tmp > # tar cvf trace.tar tracing > > Copy trace.tar to a desktop and extract it. > > $ cd /tmp > $ tar xvf trace.tar > [ Make sure you have the latest trace-cmd installed ] > $ trace-cmd restore -t /tmp/tracing/ -k /tmp/tracing/kallsyms -o /tmp/trace.dat /tmp/tracing/trace0.raw > $ trace-cmp report /tmp/trace.dat > > Now you can send us the trace.dat file and we could analyze it more. > > You could also enable more events than just sched_switch, like sched_waking > and such. > I am on it. Thanks for the commands and your time. Sundeep > -- Steve ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2025-10-31 15:15 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-08-12 11:40 Query regarding work scheduling Subbaraya Sundeep 2025-08-12 18:52 ` Tejun Heo 2025-08-14 3:54 ` Subbaraya Sundeep 2025-08-14 9:48 ` Peter Zijlstra 2025-08-19 11:18 ` Subbaraya Sundeep 2025-10-31 15:14 ` Subbaraya Sundeep 2025-08-14 17:45 ` Steven Rostedt 2025-08-19 11:36 ` Subbaraya Sundeep
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.