public inbox for linux-rt-users@vger.kernel.org
* [RFC PATCH] cyclictest: Add skip_sec option
@ 2025-05-20  3:56 Leonardo Bras
  2025-05-20  7:27 ` John Ogness
  2025-05-20 15:14 ` Leonardo Bras Soares Passos
  0 siblings, 2 replies; 6+ messages in thread
From: Leonardo Bras @ 2025-05-20  3:56 UTC (permalink / raw)
  To: Clark Williams, John Kacur; +Cc: Leonardo Bras, rt-users, Sana Sharma

When running cyclictest with the break trace (-b) option, wait for skip_sec
seconds before issuing the break.

Cyclictest may present high latency on the initial cycles, which can be
caused by initial resource allocations, and may not represent the
expected latency for the running workload.

So add skip_sec so that the user can make cyclictest ignore any break that
may happen in those initial seconds.

Signed-off-by: Leonardo Bras <leobras@redhat.com>
---

This behavior was found by Sana and me when we were testing cyclictest
on an ARM KVM VM, while running trace-cmd on the host and guest
(with trace-cmd agent).

 src/cyclictest/cyclictest.c | 11 ++++++++++-
 src/cyclictest/cyclictest.8 |  2 ++
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/src/cyclictest/cyclictest.c b/src/cyclictest/cyclictest.c
index 890da5d..26e181a 100644
--- a/src/cyclictest/cyclictest.c
+++ b/src/cyclictest/cyclictest.c
@@ -183,20 +183,21 @@ static int use_nsecs = 0;
 static int refresh_on_max;
 static int force_sched_other;
 static int priospread = 0;
 static int check_clock_resolution;
 static int ct_debug;
 static int use_fifo = 0;
 static pthread_t fifo_threadid;
 static int laptop = 0;
 static int power_management = 0;
 static int use_histfile = 0;
+static int skip_sec = 0;
 
 #ifdef ARCH_HAS_SMI_COUNTER
 static int smi = 0;
 #else
 #define smi	0
 #endif
 
 static pthread_cond_t refresh_on_max_cond = PTHREAD_COND_INITIALIZER;
 static pthread_mutex_t refresh_on_max_lock = PTHREAD_MUTEX_INITIALIZER;
 
@@ -637,20 +638,21 @@ static void *timerthread(void *param)
 	sigset_t sigset;
 	timer_t timer;
 	struct timespec now, next, interval, stop = { 0 };
 	struct itimerval itimer;
 	struct itimerspec tspec;
 	struct thread_stat *stat = par->stats;
 	int stopped = 0;
 	cpu_set_t mask;
 	pthread_t thread;
 	unsigned long smi_now, smi_old = 0;
+	unsigned long skip_cycles = skip_sec * USEC_PER_SEC / par->interval;
 
 	/* if we're running in numa mode, set our memory node */
 	if (par->node != -1)
 		rt_numa_set_numa_run_on_node(par->node, par->cpu);
 
 	if (par->cpu != -1) {
 		CPU_ZERO(&mask);
 		CPU_SET(par->cpu, &mask);
 		thread = pthread_self();
 		if (pthread_setaffinity_np(thread, sizeof(mask), &mask) != 0)
@@ -837,21 +839,21 @@ static void *timerthread(void *param)
 				pthread_cond_signal(&refresh_on_max_cond);
 		}
 		stat->avg += (double) diff;
 
 		if (trigger && (diff > trigger))
 			trigger_update(par, diff, calctime(now));
 
 		if (duration && (calcdiff(now, stop) >= 0))
 			shutdown++;
 
-		if (!stopped && tracelimit && (diff > tracelimit)) {
+		if (!stopped && tracelimit && (diff > tracelimit) && stat->cycles > skip_cycles) {
 			stopped++;
 			shutdown++;
 			pthread_mutex_lock(&break_thread_id_lock);
 			if (break_thread_id == 0) {
 				break_thread_id = stat->tid;
 				tracemark("hit latency threshold (%llu > %d)",
 					  (unsigned long long) diff, tracelimit);
 				break_thread_value = diff;
 			}
 			pthread_mutex_unlock(&break_thread_id_lock);
@@ -1081,20 +1083,21 @@ enum option_values {
 	OPT_DEFAULT_SYSTEM, OPT_DISTANCE, OPT_DURATION, OPT_LATENCY,
 	OPT_FIFO, OPT_HISTOGRAM, OPT_HISTOFALL, OPT_HISTFILE,
 	OPT_INTERVAL, OPT_JSON, OPT_MAINAFFINITY, OPT_LOOPS, OPT_MLOCKALL,
 	OPT_REFRESH, OPT_NANOSLEEP, OPT_NSECS, OPT_OSCOPE, OPT_PRIORITY,
 	OPT_QUIET, OPT_PRIOSPREAD, OPT_RELATIVE, OPT_RESOLUTION,
 	OPT_SYSTEM, OPT_SMP, OPT_THREADS, OPT_TRIGGER,
 	OPT_TRIGGER_NODES, OPT_UNBUFFERED, OPT_NUMA, OPT_VERBOSE,
 	OPT_DBGCYCLIC, OPT_POLICY, OPT_HELP, OPT_NUMOPTS,
 	OPT_ALIGNED, OPT_SECALIGNED, OPT_LAPTOP, OPT_SMI,
 	OPT_TRACEMARK, OPT_POSIX_TIMERS, OPT_DEEPEST_IDLE_STATE,
+	OPT_SKIP_SEC
 };
 
 /* Process commandline options */
 static void process_options(int argc, char *argv[], int max_cpus)
 {
 	int error = 0;
 	int option_affinity = 0;
 
 	for (;;) {
 		int option_index = 0;
@@ -1137,20 +1140,21 @@ static void process_options(int argc, char *argv[], int max_cpus)
 			{"spike-nodes",	     required_argument, NULL, OPT_TRIGGER_NODES },
 			{"threads",          optional_argument, NULL, OPT_THREADS },
 			{"tracemark",	     no_argument,	NULL, OPT_TRACEMARK },
 			{"unbuffered",       no_argument,       NULL, OPT_UNBUFFERED },
 			{"verbose",          no_argument,       NULL, OPT_VERBOSE },
 			{"dbg_cyclictest",   no_argument,       NULL, OPT_DBGCYCLIC },
 			{"policy",           required_argument, NULL, OPT_POLICY },
 			{"help",             no_argument,       NULL, OPT_HELP },
 			{"posix_timers",     no_argument,	NULL, OPT_POSIX_TIMERS },
 			{"deepest-idle-state", required_argument,	NULL, OPT_DEEPEST_IDLE_STATE },
+			{"skip_sec",         required_argument,	NULL, OPT_SKIP_SEC },
 			{NULL, 0, NULL, 0 },
 		};
 		int c = getopt_long(argc, argv, "a::A::b:c:d:D:F:h:H:i:l:MNo:p:mqrRsSt::uvD:x",
 				    long_options, &option_index);
 		if (c == -1)
 			break;
 		switch (c) {
 		case 'a':
 		case OPT_AFFINITY:
 			option_affinity = 1;
@@ -1340,20 +1344,25 @@ static void process_options(int argc, char *argv[], int max_cpus)
 			smi = 1;
 #else
 			fatal("--smi is not available on your arch\n");
 #endif
 			break;
 		case OPT_TRACEMARK:
 			trace_marker = 1; break;
 		case OPT_DEEPEST_IDLE_STATE:
 			deepest_idle_state = atoi(optarg);
 			break;
+		case OPT_SKIP_SEC:
+			skip_sec = atoi(optarg);
+			if (skip_sec < 0)
+				skip_sec = 0;
+			break;
 		}
 	}
 
 	if ((use_system == MODE_SYS_OFFSET) && (use_nanosleep == MODE_CYCLIC)) {
 		warn("The system option requires clock_nanosleep\n");
 		warn("and is not compatible with posix_timers\n");
 		warn("Using clock_nanosleep\n");
 		use_nanosleep = MODE_CLOCK_NANOSLEEP;
 	}
 
diff --git a/src/cyclictest/cyclictest.8 b/src/cyclictest/cyclictest.8
index 6becd19..a0eaf7e 100644
--- a/src/cyclictest/cyclictest.8
+++ b/src/cyclictest/cyclictest.8
@@ -174,20 +174,22 @@ Output values on stdout for statistics. This option is used to gather statistica
 
 n:c:v
 
 where n=task number c=count v=latency value in us.
 .TP
 .B \-\-dbg_cyclictest
 Print info userful for debugging cyclictest
 .TP
 .B \-x, \-\-posix_timers
 Use POSIX timers instead of clock_nanosleep.
+.B \-\-skip_sec=SEC
+When using -b, skip sending break trace command for the first SEC seconds.
 
 .SH SEE ALSO
 .BR numa (3),
 .BR numactl (8),
 .\" .br
 .\" The programs are documented fully by
 .\" .IR "The Rise and Fall of a Fooish Bar" ,
 .\" available via the Info system.
 .SH AUTHOR
 cyclictest was written by Thomas Gleixner <tglx@linuxtronix.de>.
-- 
2.49.0



* Re: [RFC PATCH] cyclictest: Add skip_sec option
  2025-05-20  3:56 [RFC PATCH] cyclictest: Add skip_sec option Leonardo Bras
@ 2025-05-20  7:27 ` John Ogness
  2025-05-20 14:30   ` Leonardo Bras Soares Passos
  2025-05-20 15:14 ` Leonardo Bras Soares Passos
  1 sibling, 1 reply; 6+ messages in thread
From: John Ogness @ 2025-05-20  7:27 UTC (permalink / raw)
  To: Leonardo Bras, Clark Williams, John Kacur
  Cc: Leonardo Bras, rt-users, Sana Sharma

On 2025-05-20, Leonardo Bras <leobras@redhat.com> wrote:
> When running cyclictest with break trace (-b) option, wait for skip_sec
> seconds before issuing the break.
>
> Cyclictest may present high latency on the initial cycles, which can be
> caused by initial resource allocations, and may not represent the
> expected latency for the running workload.

If cyclictest is programmed correctly, this will not happen. Can you
provide a technical explanation for high latencies on initial cycles?

John Ogness


* Re: [RFC PATCH] cyclictest: Add skip_sec option
  2025-05-20  7:27 ` John Ogness
@ 2025-05-20 14:30   ` Leonardo Bras Soares Passos
  2025-05-21  8:40     ` John Ogness
  0 siblings, 1 reply; 6+ messages in thread
From: Leonardo Bras Soares Passos @ 2025-05-20 14:30 UTC (permalink / raw)
  To: John Ogness
  Cc: Clark Williams, John Kacur, rt-users, Sana Sharma,
	Marcelo Tosatti

On Tue, May 20, 2025 at 4:35 AM John Ogness <john.ogness@linutronix.de> wrote:
>
> On 2025-05-20, Leonardo Bras <leobras@redhat.com> wrote:
> > When running cyclictest with break trace (-b) option, wait for skip_sec
> > seconds before issuing the break.
> >
> > Cyclictest may present high latency on the initial cycles, which can be
> > caused by initial resource allocations, and may not represent the
> > expected latency for the running workload.
>
> If cyclictest is programmed correctly, this will not happen. Can you
> provide a technical explanation for high latencies on initial cycles?

Hi John,

We are currently investigating the source of this latency, but I have
heard from other team members that it is also happening on Intel Secure
VMs.

Scenario:
Host: ARM64, kernel-rt, 120 out of 128 CPUs isolated, hugepages for guest
KVM Guest: kernel-rt, 120 vcpus pinned on the above isolated CPUs, 116
vcpus isolated in the guest

Cyclictest runs with trace-cmd attaching to a guest agent:

trace-cmd record --poll -m 1000 -e csd -e sched/sched_switch -e timer
-e irq_handler_entry -e irq_handler_exit -e tick_stop -e
ipi_send_cpumask -e ipi_send_cpu -A 3 -e csd -e sched/sched_switch -e
timer -e irq_handler_entry -e irq_handler_exit -e tick_stop -e
ipi_send_cpumask -e ipi_send_cpu bash -c "ssh $my_guest 'cyclictest -m
-q -p95 --policy=fifo -D 2h -i 200 -h60 -t 116 -a 4-119 -b 50
--mainaffinity 0,1,2,3 --tracemark'"

What we see is a peak of >50us in the first couple cycles, and then a
max peak of 15 in our tests.
We also tested a sleep before the above cyclictest command, in order
to remove any trace-cmd setup that might have been in progress, but it
did not help.

If you have any idea why this might happen, or can suggest a debugging
setup, it would be appreciated.

Thanks!
Leo



* Re: [RFC PATCH] cyclictest: Add skip_sec option
  2025-05-20  3:56 [RFC PATCH] cyclictest: Add skip_sec option Leonardo Bras
  2025-05-20  7:27 ` John Ogness
@ 2025-05-20 15:14 ` Leonardo Bras Soares Passos
  1 sibling, 0 replies; 6+ messages in thread
From: Leonardo Bras Soares Passos @ 2025-05-20 15:14 UTC (permalink / raw)
  To: Clark Williams, John Kacur; +Cc: rt-users, Sana Sharma

On Tue, May 20, 2025 at 12:56 AM Leonardo Bras <leobras@redhat.com> wrote:
> @@ -637,20 +638,21 @@ static void *timerthread(void *param)
>         sigset_t sigset;
>         timer_t timer;
>         struct timespec now, next, interval, stop = { 0 };
>         struct itimerval itimer;
>         struct itimerspec tspec;
>         struct thread_stat *stat = par->stats;
>         int stopped = 0;
>         cpu_set_t mask;
>         pthread_t thread;
>         unsigned long smi_now, smi_old = 0;
> +       unsigned long skip_cycles = skip_sec * USEC_PER_SEC / par->interval;
>

BTW, I noticed this is not the best approach: if cycles take longer than
the interval, we may end up skipping for longer than the parameter
specifies. I will send an RFC v2 that takes elapsed time into account
instead of cycle counts.


Thanks!
Leo



* Re: [RFC PATCH] cyclictest: Add skip_sec option
  2025-05-20 14:30   ` Leonardo Bras Soares Passos
@ 2025-05-21  8:40     ` John Ogness
  2025-05-22  4:01       ` Leonardo Bras Soares Passos
  0 siblings, 1 reply; 6+ messages in thread
From: John Ogness @ 2025-05-21  8:40 UTC (permalink / raw)
  To: Leonardo Bras Soares Passos
  Cc: Clark Williams, John Kacur, rt-users, Sana Sharma,
	Marcelo Tosatti

Hi Leo,

On 2025-05-20, Leonardo Bras Soares Passos <leobras@redhat.com> wrote:
> On Tue, May 20, 2025 at 4:35 AM John Ogness <john.ogness@linutronix.de> wrote:
>>
>> On 2025-05-20, Leonardo Bras <leobras@redhat.com> wrote:
>> > When running cyclictest with break trace (-b) option, wait for skip_sec
>> > seconds before issuing the break.
>> >
>> > Cyclictest may present high latency on the initial cycles, which can be
>> > caused by initial resource allocations, and may not represent the
>> > expected latency for the running workload.
>>
>> If cyclictest is programmed correctly, this will not happen. Can you
>> provide a technical explanation for high latencies on initial cycles?
>
> We are currently investigating the source of this latency, but I heard
> from other team members it's also happening on Intel Secure VMs as
> well.
>
> Scenario:
> Host: ARM64, kernel-rt, 120 out of 128 cpu isolated, hugepages for guest
> KVM Guest: kernel-rt, 120 vcpus pinned on above isolated cpus, 116
> vcpus isolated on guest
>
> Cyclictest runs with trace-cmd attaching to a guest agent:
>
> trace-cmd record --poll -m 1000 -e csd -e sched/sched_switch -e timer
> -e irq_handler_entry -e irq_handler_exit -e tick_stop -e
> ipi_send_cpumask -e ipi_send_cpu -A 3 -e csd -e sched/sched_switch -e
> timer -e irq_handler_entry -e irq_handler_exit -e tick_stop -e
> ipi_send_cpumask -e ipi_send_cpu bash -c "ssh $my_guest 'cyclictest -m
> -q -p95 --policy=fifo -D 2h -i 200 -h60 -t 116 -a 4-119 -b 50
> --mainaffinity 0,1,2,3 --tracemark'"
>
> What we see is a peak of >50us in the first couple cycles, and then a
> max peak of 15 in our tests.

I wonder if this is related to cache misses and/or memory bandwidth. If you
keep this patch but you increase the interval, do you see >50us during
your tests? For example:

    -i 10000

If so, your "max peak" of 15us is only valid when the cache is hot.

IMHO whatever situation is happening for the "first couple cycles" could
happen later. In that case, the patch is just sweeping some of the bad
numbers under the rug. It is really important to understand exactly what
the problem is before changing cyclictest to ignore such problems.

John


* Re: [RFC PATCH] cyclictest: Add skip_sec option
  2025-05-21  8:40     ` John Ogness
@ 2025-05-22  4:01       ` Leonardo Bras Soares Passos
  0 siblings, 0 replies; 6+ messages in thread
From: Leonardo Bras Soares Passos @ 2025-05-22  4:01 UTC (permalink / raw)
  To: John Ogness
  Cc: Clark Williams, John Kacur, rt-users, Sana Sharma,
	Marcelo Tosatti

On Wed, May 21, 2025 at 5:40 AM John Ogness <john.ogness@linutronix.de> wrote:
>
> Hi Leo,
>
> On 2025-05-20, Leonardo Bras Soares Passos <leobras@redhat.com> wrote:
> > On Tue, May 20, 2025 at 4:35 AM John Ogness <john.ogness@linutronix.de> wrote:
> >>
> >> On 2025-05-20, Leonardo Bras <leobras@redhat.com> wrote:
> >> > When running cyclictest with break trace (-b) option, wait for skip_sec
> >> > seconds before issuing the break.
> >> >
> >> > Cyclictest may present high latency on the initial cycles, which can be
> >> > caused by initial resource allocations, and may not represent the
> >> > expected latency for the running workload.
> >>
> >> If cyclictest is programmed correctly, this will not happen. Can you
> >> provide a technical explanation for high latencies on initial cycles?
> >
> > We are currently investigating the source of this latency, but I heard
> > from other team members it's also happening on Intel Secure VMs as
> > well.
> >
> > Scenario:
> > Host: ARM64, kernel-rt, 120 out of 128 cpu isolated, hugepages for guest
> > KVM Guest: kernel-rt, 120 vcpus pinned on above isolated cpus, 116
> > vcpus isolated on guest
> >
> > Cyclictest runs with trace-cmd attaching to a guest agent:
> >
> > trace-cmd record --poll -m 1000 -e csd -e sched/sched_switch -e timer
> > -e irq_handler_entry -e irq_handler_exit -e tick_stop -e
> > ipi_send_cpumask -e ipi_send_cpu -A 3 -e csd -e sched/sched_switch -e
> > timer -e irq_handler_entry -e irq_handler_exit -e tick_stop -e
> > ipi_send_cpumask -e ipi_send_cpu bash -c "ssh $my_guest 'cyclictest -m
> > -q -p95 --policy=fifo -D 2h -i 200 -h60 -t 116 -a 4-119 -b 50
> > --mainaffinity 0,1,2,3 --tracemark'"
> >
> > What we see is a peak of >50us in the first couple cycles, and then a
> > max peak of 15 in our tests.
>
> I wonder if this related to cache misses and/or memory bandwidth. If you
> keep this patch but you increase the interval, do you see >50us during
> your tests? For example:
>
>     -i 10000
>
> If so, your "max peak" of 15us is only valid when cache hot.
>

That's a smart move; we will run this test as suggested and report the results.

> IMHO whatever situation is happening for the "first couple cycles" could
> happen later. In that case, the patch is just sweeping some of the bad
> numbers under the rug. It is really important to understand exactly what
> the problem is before changing cyclictest to ignore such problems.

That's a valid point.
We are currently investigating it, and I will share the results when we
have some.

I have a v2 ready, but I think it makes sense to send it only after the
above investigation shows whether it makes sense to proceed with this
suggestion.

Thanks!
Leo



end of thread, other threads:[~2025-05-22  4:02 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-05-20  3:56 [RFC PATCH] cyclictest: Add skip_sec option Leonardo Bras
2025-05-20  7:27 ` John Ogness
2025-05-20 14:30   ` Leonardo Bras Soares Passos
2025-05-21  8:40     ` John Ogness
2025-05-22  4:01       ` Leonardo Bras Soares Passos
2025-05-20 15:14 ` Leonardo Bras Soares Passos
