Cyclictest usage

linux-rt-users.vger.kernel.org archive mirror

* Cyclictest usage
@ 2008-08-11  9:00 Tobias Knutsson
  2008-08-11 12:48 ` Gregory Haskins
  0 siblings, 1 reply; 9+ messages in thread
From: Tobias Knutsson @ 2008-08-11  9:00 UTC (permalink / raw)
  To: linux-rt-users

Hello,

I'm trying to do some initial benchmarking on an RT kernel versus a
vanilla kernel. On the RT kernel the results are stable between test runs
using the following command:

cyclictest -t 1 -p 80 -c 1 -n -i 10000 -l 10000
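
For readers unfamiliar with the options:
- -t 1 runs one measurement thread.
- -p 80 gives it real-time priority 80.
- -c 1 selects CLOCK_REALTIME.
- -n uses clock_nanosleep().
- -i 10000 sets a 10000 us interval.
- -l 10000 stops after 10000 loops.

The C sketch below illustrates the basic measurement idea: sleep until an absolute deadline, then record how late the wakeup was. It is a simplified illustration, not cyclictest's actual source, and the struct and function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/* Hypothetical summary, in microseconds, similar in spirit to the
 * Min/Avg/Max columns cyclictest prints. */
struct latency_stats {
    long min_us;
    long max_us;
    double avg_us;
    long loops;
};

static int64_t ts_to_ns(const struct timespec *ts)
{
    return (int64_t)ts->tv_sec * NSEC_PER_SEC + ts->tv_nsec;
}

static void ts_add_us(struct timespec *ts, long us)
{
    ts->tv_nsec += us * 1000L;
    while (ts->tv_nsec >= NSEC_PER_SEC) {
        ts->tv_nsec -= NSEC_PER_SEC;
        ts->tv_sec++;
    }
}

/* Sleep until absolute deadlines interval_us apart and record how late
 * each wakeup was. Returns 0 on success, -1 on a clock error. */
int measure_wakeup_latency(clockid_t clock, long interval_us, long loops,
                           struct latency_stats *out)
{
    struct timespec next, now;
    int64_t sum_us = 0;

    out->min_us = 0;
    out->max_us = 0;
    out->avg_us = 0.0;
    out->loops = 0;

    if (clock_gettime(clock, &next) != 0)
        return -1;

    for (long i = 0; i < loops; i++) {
        int rc;

        /* Deadlines advance from the previous deadline, not from the
         * wakeup time, so a late wakeup does not shift later deadlines. */
        ts_add_us(&next, interval_us);
        do {
            rc = clock_nanosleep(clock, TIMER_ABSTIME, &next, NULL);
        } while (rc == EINTR);
        if (rc != 0 || clock_gettime(clock, &now) != 0)
            return -1;

        long lat_us = (long)((ts_to_ns(&now) - ts_to_ns(&next)) / 1000);
        if (i == 0 || lat_us < out->min_us)
            out->min_us = lat_us;
        if (i == 0 || lat_us > out->max_us)
            out->max_us = lat_us;
        sum_us += lat_us;
        out->loops++;
    }
    if (out->loops > 0)
        out->avg_us = (double)sum_us / (double)out->loops;
    return 0;
}
```

With the options above, this roughly corresponds to measure_wakeup_latency(CLOCK_REALTIME, 10000, 10000, &stats), about 100 seconds of run time.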

However, using the same command while running the vanilla kernel, I
get widely varying results. I would completely understand if that were
due to increased jitter. The thing is, the jitter is very low (0%
load), but the average times vary from 600 us to 6500 us depending on
when the test is started.

I imagine this might have something to do with the vanilla kernel not
having high-resolution timers. If that is what's causing my problems,
what is the best way of comparing the two kernels?

I would be most grateful for a few pointers!

-- 

Hälsningar/Regards
Tobias Knutsson

* Re: Cyclictest usage
  2008-08-11  9:00 Cyclictest usage Tobias Knutsson
@ 2008-08-11 12:48 ` Gregory Haskins
       [not found]   ` <ccb913ac0808110638v79283450j8d4953bc0820e747@mail.gmail.com>
  0 siblings, 1 reply; 9+ messages in thread
From: Gregory Haskins @ 2008-08-11 12:48 UTC (permalink / raw)
  To: Tobias Knutsson; +Cc: linux-rt-users


Tobias Knutsson wrote:
> Hello,
>
> I'm trying to do some initial benchmarking on an RT kernel versus a
> vanilla kernel. On the RT kernel the results are stable between test runs
> using the following command:
>
> cyclictest -t 1 -p 80 -c 1 -n -i 10000 -l 10000
>
> However, using the same command while running the vanilla kernel, I
> get widely varying results. I would completely understand if that were
> due to increased jitter. The thing is, the jitter is very low (0%
> load), but the average times vary from 600 us to 6500 us depending on
> when the test is started.
>
> I imagine this might have something to do with the vanilla kernel not
> having high-resolution timers. If that is what's causing my problems,
> what is the best way of comparing the two kernels?
>
> I would be most grateful for a few pointers!
>
>   

Hi Tobias,
  What kernel versions are you testing out?



* Re: Cyclictest usage
       [not found]   ` <ccb913ac0808110638v79283450j8d4953bc0820e747@mail.gmail.com>
@ 2008-08-11 13:55     ` Gregory Haskins
  0 siblings, 0 replies; 9+ messages in thread
From: Gregory Haskins @ 2008-08-11 13:55 UTC (permalink / raw)
  To: Tobias Knutsson; +Cc: linux-rt-users


Hi Tobias,

  Some more questions (inline)

Tobias Knutsson wrote:
> 2.6.25.8-rt7 and 2.6.25.8
>
> On Mon, Aug 11, 2008 at 14:48, Gregory Haskins <ghaskins@novell.com> wrote:
>   
>> Tobias Knutsson wrote:
>>     
>>> Hello,
>>>
>>> I'm trying to do some initial benchmarking on an RT kernel versus a
>>> vanilla kernel. On the RT kernel the results are stable between test runs
>>> using the following command:
>>>
>>> cyclictest -t 1 -p 80 -c 1 -n -i 10000 -l 10000
>>>       

What architecture is this?  And what do you see for ave/max in the
cyclictest output on -rt (so we have a reference)?

>>> However, using the same command while running the vanilla kernel, I
>>> get widely varying results. I would completely understand if that were
>>> due to increased jitter. The thing is, the jitter is very low (0%
>>> load), but the average times vary from 600 us to 6500 us depending on
>>> when the test is started.
>>>       

What are you considering "jitter" when you say it is very low?  Are the 
600-6500us values truly from the "ave" output of cyclictest, or are 
these an average of the "max" output?

>>> I imagine this might have something to do with the vanilla kernel not
>>> having high-resolution timers.

I will defer to Thomas, but my understanding is that 2.6.25 vanilla 
should have HRT as an option (though I am not sure if you enabled it or 
not).

>>>  If that is what's causing my problems.
>>>       

Cyclictest will report that it cannot access the HRT when it cannot find 
it (at least it does on my system, but I might have an older version 
running).  Do you see any such warnings?
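
One way to check for high-resolution timers on Linux is to ask clock_getres() for the clock's resolution. With hrtimers active it is typically reported as 1 ns; without them it is one scheduler tick (e.g. 4 ms at HZ=250). The sketch below uses that heuristic. The 1 us threshold and the function name are assumptions, and this is not necessarily how cyclictest itself performs the check.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Heuristic: treat a reported CLOCK_MONOTONIC resolution of at most 1 us
 * as "high-resolution timers active". Without hrtimers Linux reports one
 * tick (several milliseconds). The threshold is an assumption. */
bool hrtimers_look_active(long *res_ns_out)
{
    struct timespec res;

    if (clock_getres(CLOCK_MONOTONIC, &res) != 0)
        return false;

    long res_ns = (long)res.tv_sec * 1000000000L + res.tv_nsec;
    if (res_ns_out != NULL)
        *res_ns_out = res_ns;
    return res_ns <= 1000;
}
```

On a vanilla kernel built without CONFIG_HIGH_RES_TIMERS, this would be expected to return false.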

>>> What is the best way of comparing the two kernels?
>>>       

This is a pretty good way, as long as you have HRT in the vanilla 
kernel.  Assuming for a second that you do, your results are showing you 
what I would expect to see for vanilla (600-6500us spikes).

HTH

-Greg


* cyclictest usage
@ 2017-05-26 16:25 Clark Williams
  2017-06-02  9:59 ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 9+ messages in thread
From: Clark Williams @ 2017-05-26 16:25 UTC (permalink / raw)
  To: RT; +Cc: John Kacur


I was gently reminded at the Linux Foundation RT board meeting that I had been tasked with sending out an email soliciting people's use cases for running cyclictest on RT kernels, and announcing that we're looking at doing a rewrite for cyclictest 2.0. Herein is that email (better late than never). 

What we're looking for is how people are using cyclictest. For example, at Red Hat we use the 'rteval' tool, which puts a large SCHED_OTHER load on the system and then runs cyclictest with a measurement thread on each core. The intent is to put a large load on the scheduler and prove that the RT patchset provides deterministic performance under load. 
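
The "one measurement thread per core" arrangement can be sketched in C as follows. The sketch assumes glibc's pthread_attr_setaffinity_np() and sched_getcpu() extensions. Each worker only records which CPU it ran on. A real measurement thread would also switch to SCHED_FIFO (which needs privileges) and run a clock_nanosleep() loop. All names are hypothetical.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

#define MAX_WORKERS 1024

struct worker {
    pthread_t tid;
    int cpu;          /* CPU this worker is pinned to */
    int observed_cpu; /* CPU reported by sched_getcpu() inside the thread */
};

static void *worker_main(void *arg)
{
    struct worker *w = arg;
    /* A real measurement thread would run its timer loop here. */
    w->observed_cpu = sched_getcpu();
    return NULL;
}

/* Start one thread per CPU in the process's affinity mask, each pinned
 * to its CPU, then wait for all of them. Returns the number of threads
 * that ran, or -1 if the affinity mask could not be read. */
int run_per_cpu_workers(struct worker *workers, int max_workers)
{
    cpu_set_t allowed;
    int n = 0;

    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
        return -1;

    for (int cpu = 0; cpu < CPU_SETSIZE && n < max_workers; cpu++) {
        pthread_attr_t attr;
        cpu_set_t one;
        int rc;

        if (!CPU_ISSET(cpu, &allowed))
            continue;
        CPU_ZERO(&one);
        CPU_SET(cpu, &one);
        pthread_attr_init(&attr);
        /* Pin via the attribute so the thread never starts elsewhere. */
        pthread_attr_setaffinity_np(&attr, sizeof(one), &one);
        workers[n].cpu = cpu;
        workers[n].observed_cpu = -1;
        rc = pthread_create(&workers[n].tid, &attr, worker_main, &workers[n]);
        pthread_attr_destroy(&attr);
        if (rc != 0)
            break;
        n++;
    }
    for (int i = 0; i < n; i++)
        pthread_join(workers[i].tid, NULL);
    return n;
}
```

Iterating over the process's own affinity mask, rather than every online CPU, keeps the sketch working when it runs inside a restricted cpuset.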

What other types of testing/measurement are people doing with cyclictest?

John Kacur and I want to clean up tracing and make sure that the most commonly used options are on by default. In addition, we want to refactor some of the runtime logic. Are there other areas that need to be cleaned up? Features that need to be added/deleted?

Thanks,

Clark

-- 
The United States Coast Guard
Ruining Natural Selection since 1790


* Re: cyclictest usage
  2017-05-26 16:25 cyclictest usage Clark Williams
@ 2017-06-02  9:59 ` Sebastian Andrzej Siewior
  2017-06-02 14:26   ` Clark Williams
  0 siblings, 1 reply; 9+ messages in thread
From: Sebastian Andrzej Siewior @ 2017-06-02  9:59 UTC (permalink / raw)
  To: Clark Williams; +Cc: RT, John Kacur

On 2017-05-26 11:25:02 [-0500], Clark Williams wrote:
> What we're looking for is how people are using cyclictest. For example, at Red Hat we use the 'rteval' tool, which puts a large SCHED_OTHER load on the system and then runs cyclictest with a measurement thread on each core. The intent is to put a large load on the scheduler and prove that the RT patchset provides deterministic performance under load. 
> 
> What other types of testing/measurement are people doing with cyclictest?

hackbench, disk I/O, network-related ping/traffic for the "normal"
interfaces, and some custom ones to poke at the gpio, i2c, … drivers to
ensure that they don't have a long off time. Either way, I prefer starting
them independently of cyclictest.

> John Kacur and I want to clean up tracing and make sure that the most commonly used options are on by default. In addition, we want to refactor some of the runtime logic. Are there other areas that need to be cleaned up? Features that need to be added/deleted?

I do have (had) a tiny version of cyclictest with a lot of things pulled
out, simply to get it to run on a system with 8 MiB of RAM in total. What I
learned from this: take everything out :)
Basically, the only interaction between cyclictest and the tracing
infrastructure should be to stop tracing, and only if a break value was
specified _and_ was the reason for cyclictest to abort.
This would also reduce the number of command line options, which would
be _really_ nice.
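
A minimal sketch of that "break-trace only" interaction follows. It assumes ftrace's tracing_on control file, commonly /sys/kernel/tracing/tracing_on or /sys/kernel/debug/tracing/tracing_on. The path is passed in so it can be adapted, and the helper names are hypothetical.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Write "0" to ftrace's tracing_on file to freeze the trace buffer. */
static int stop_tracing(const char *tracing_on_path)
{
    int fd = open(tracing_on_path, O_WRONLY);
    if (fd < 0)
        return -1;
    ssize_t n = write(fd, "0", 1);
    close(fd);
    return n == 1 ? 0 : -1;
}

/* Called per sample. break_us <= 0 means no break value was given, so
 * tracing is never touched. Returns true when the run should abort. */
bool check_breaktrace(long latency_us, long break_us,
                      const char *tracing_on_path)
{
    if (break_us <= 0 || latency_us <= break_us)
        return false;
    (void)stop_tracing(tracing_on_path); /* best effort */
    return true;
}
```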
As for defaults, it should have those arguments which people use
by default anyway. I guess this includes clock_nanosleep(), mlockall(),
very high priority, one thread per core and so on.
Not sure about "-d 0 --secaligned 250", but something should be the default
so that every invocation behaves the same.
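
Two of those defaults, mlockall() and a real-time priority, might be applied at startup as below. Both calls normally need root (or CAP_IPC_LOCK / CAP_SYS_NICE), so this sketch reports what succeeded instead of aborting. The function name and result flags are hypothetical.

```c
#include <sched.h>
#include <sys/mman.h>

#define RT_MLOCK_OK 0x1 /* mlockall() succeeded */
#define RT_FIFO_OK  0x2 /* SCHED_FIFO was applied */

/* Lock current and future pages so they stay resident, and switch the
 * caller to SCHED_FIFO at prio (clamped to the maximum allowed).
 * Returns a bitmask of the steps that worked. */
int apply_rt_defaults(int prio)
{
    int result = 0;
    int max = sched_get_priority_max(SCHED_FIFO);
    struct sched_param sp;

    if (max > 0 && prio > max)
        prio = max;
    sp.sched_priority = prio;

    if (mlockall(MCL_CURRENT | MCL_FUTURE) == 0)
        result |= RT_MLOCK_OK;
    if (sched_setscheduler(0, SCHED_FIFO, &sp) == 0)
        result |= RT_FIFO_OK;
    return result;
}
```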
I remember that there was (or is) an option to figure out whether the
hrtimer is working on the system and to estimate the resolution of the
clocksource. I would move that into a different tool.
That -M mode is nice, but it should give some kind of indication that
the system is still alive, like updating the "loop" count once in a
while. But this brings me to another topic: the output system. Usually
the console output is enough. Then we have the "histogram" mode to check
the distribution. People often use the histogram mode because the console
output can't be parsed by a script/tool. Here (the histogram), I hear people
complaining that the output is not (easily) machine-readable. *I* think
it would be okay to use the "histogram" mode for machine-readable output,
but the output should be better structured. Something like YAML is
probably just fine. However, I can't tell if this will work for everyone
or if a plugin-like interface would be best, so we can plug in YAML
output as well as something that creates XML-based output (for people
who dream in XML, too).
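
As an illustration only, YAML output for per-thread results and a histogram could look like what the sketch below produces. The field names and layout are assumptions; no format was agreed on in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

/* Hypothetical per-thread result record. */
struct thread_result {
    int cpu;
    long min_us;
    long max_us;
    double avg_us;
    const unsigned long *hist; /* hist[b] = samples that took b us */
    int hist_len;
};

/* Write the results as a small YAML document; empty buckets are omitted. */
void write_yaml(FILE *out, const struct thread_result *r, int n)
{
    if (n <= 0) {
        fprintf(out, "threads: []\n");
        return;
    }
    fprintf(out, "threads:\n");
    for (int i = 0; i < n; i++) {
        int nonzero = 0;

        fprintf(out, "  - cpu: %d\n", r[i].cpu);
        fprintf(out, "    min_us: %ld\n", r[i].min_us);
        fprintf(out, "    avg_us: %.2f\n", r[i].avg_us);
        fprintf(out, "    max_us: %ld\n", r[i].max_us);
        for (int b = 0; b < r[i].hist_len; b++)
            if (r[i].hist[b] != 0)
                nonzero++;
        if (nonzero == 0) {
            fprintf(out, "    histogram: {}\n");
            continue;
        }
        fprintf(out, "    histogram:\n");
        for (int b = 0; b < r[i].hist_len; b++)
            if (r[i].hist[b] != 0)
                fprintf(out, "      %d: %lu\n", b, r[i].hist[b]);
    }
}
```

Any YAML parser can load this output directly, which is the property the histogram output currently lacks.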

> Thanks,
> 
> Clark

Sebastian


* Re: cyclictest usage
  2017-06-02  9:59 ` Sebastian Andrzej Siewior
@ 2017-06-02 14:26   ` Clark Williams
  2017-06-02 16:54     ` Austin Schuh
  2017-06-08  8:40     ` Sebastian Andrzej Siewior
  0 siblings, 2 replies; 9+ messages in thread
From: Clark Williams @ 2017-06-02 14:26 UTC (permalink / raw)
  Cc: Sebastian Andrzej Siewior, RT, John Kacur


On Fri, 2 Jun 2017 11:59:11 +0200
Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:

> On 2017-05-26 11:25:02 [-0500], Clark Williams wrote:
> > What we're looking for is how people are using cyclictest. For example, at Red Hat we use the 'rteval' tool, which puts a large SCHED_OTHER load on the system and then runs cyclictest with a measurement thread on each core. The intent is to put a large load on the scheduler and prove that the RT patchset provides deterministic performance under load. 
> > 
> > What other types of testing/measurement are people doing with cyclictest?  
> 
> hackbench, disk I/O, network-related ping/traffic for the "normal"
> interfaces, and some custom ones to poke at the gpio, i2c, … drivers to
> ensure that they don't have a long off time. Either way, I prefer starting
> them independently of cyclictest.

Yeah, that's the *other* tool we should discuss some time: rteval. Currently the only loads rteval has are 'kcompile', which is a parallel make of a kernel tree, and 'hackbench', which kicks off a bunch of hackbench instances. I've been thinking about adding a 'stress' load that uses the stress app, but just haven't had time. 

And of course the only measurement module for rteval is cyclictest. Haven't really figured out what would be another good measurement module. 

> 
> > John Kacur and I want to clean up tracing and make sure that the most commonly used options are on by default. In addition, we want to refactor some of the runtime logic. Are there other areas that need to be cleaned up? Features that need to be added/deleted?  
> 
> I do have (had) a tiny version of cyclictest with a lot of things pulled
> out, simply to get it to run on a system with 8 MiB of RAM in total. What I
> learned from this: take everything out :)
> Basically, the only interaction between cyclictest and the tracing
> infrastructure should be to stop tracing, and only if a break value was
> specified _and_ was the reason for cyclictest to abort.
> This would also reduce the number of command line options, which would
> be _really_ nice.

That's exactly the plan. Rip out the tracing bits (except for -b/--breaktrace) and their associated command line args. 

> As for defaults, it should have those arguments which people use
> by default anyway. I guess this includes clock_nanosleep(), mlockall(),
> very high priority, one thread per core and so on.
> Not sure about "-d 0 --secaligned 250", but something should be the default
> so that every invocation behaves the same.

When rteval starts cyclictest it uses '-d0 -i100'. I know that's the worst case of having every timer interrupt fire at the same time, but when we started this that's what we wanted to see. Do we need to have two meta-arguments: --worst-case / --best-case? The --worst-case would do things most normal deployers of a realtime app wouldn't do, while the --best-case would stagger timer starts, isolate to numa nodes, etc. 
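
To make the difference concrete: cyclictest's -d option sets the distance between thread intervals, so thread i runs with interval + i * distance, and -d0 gives every thread the same period. The sketch below computes per-thread timing for both cases. It also includes a hypothetical stagger mode (one possible reading of --best-case) that spreads the first wakeups evenly across one base interval. All names are hypothetical.

```c
#include <stdbool.h>

struct thread_timing {
    long interval_us; /* period of this thread's timer */
    long offset_us;   /* first wakeup, relative to a common start time */
};

/* interval_us / distance_us play the roles of cyclictest's -i / -d.
 * With stagger (hypothetical best-case mode), first wakeups are spread
 * evenly over one base interval instead of all starting together. */
void plan_timers(int nthreads, long interval_us, long distance_us,
                 bool stagger, struct thread_timing *out)
{
    for (int i = 0; i < nthreads; i++) {
        out[i].interval_us = interval_us + (long)i * distance_us;
        out[i].offset_us = stagger ? interval_us * i / nthreads : 0;
    }
}
```

With four threads at -i100 -d0 and staggering enabled, the offsets are 0, 25, 50 and 75 us. Since the period is shared, the expiries stay 25 us apart instead of coinciding.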

> I remember that there was (or is) an option to figure out whether the
> hrtimer is working on the system and to estimate the resolution of the
> clocksource. I would move that into a different tool.
> That -M mode is nice, but it should give some kind of indication that
> the system is still alive, like updating the "loop" count once in a
> while. But this brings me to another topic: the output system. Usually
> the console output is enough. Then we have the "histogram" mode to check
> the distribution. People often use the histogram mode because the console
> output can't be parsed by a script/tool. Here (the histogram), I hear people
> complaining that the output is not (easily) machine-readable. *I* think
> it would be okay to use the "histogram" mode for machine-readable output,
> but the output should be better structured. Something like YAML is
> probably just fine. However, I can't tell if this will work for everyone
> or if a plugin-like interface would be best, so we can plug in YAML
> output as well as something that creates XML-based output (for people
> who dream in XML, too).

I don't have a problem generating XML or JSON or some sort of structured output. I'll put that on the list. Hmmm, plugins...

Regarding periodic updates, what do you think about providing a way for an app to query the current state of a run? Possibly a memory region that can be mmapped by a process? Or possibly a UNIX domain socket? I'm thinking of the case where you're doing a 48 hour test run on a box where you don't have easy access to the console and would just like to know if a) the box is alive and b) is still showing reasonable values for max latency and jitter. 

I kinda like the memory region since that's how cyclictest handles output: the measurement threads all dump their data into a cpu-specific memory area and a display thread (running at a low priority) is the one that wakes up, reads the update and printf's to the screen. If we exported that region so that another process could mmap it read-only, that would provide a way to get a snapshot of the run, without actually depending on cyclictest to *do* anything out of the ordinary. 
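
A read-only export of the statistics area could be sketched with a file-backed MAP_SHARED mapping: cyclictest maps the file read-write, and a monitor maps the same file with PROT_READ. The file path, layout and names are assumptions; shm_open() would be the more usual choice for a pure shared-memory object. A real version would also need a sequence counter or atomics so the reader never sees a half-updated slot.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#define STATS_MAX_CPUS 64

/* Hypothetical layout of the exported region: one slot per CPU. */
struct cpu_stats {
    long cycles;
    long act_us;
    long min_us;
    long max_us;
};

struct stats_region {
    int ncpus;
    struct cpu_stats cpu[STATS_MAX_CPUS];
};

/* Writer side (cyclictest): create the file and map it read-write. */
struct stats_region *stats_create(const char *path, int ncpus)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, (off_t)sizeof(struct stats_region)) != 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, sizeof(struct stats_region), PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after close() */
    if (p == MAP_FAILED)
        return NULL;
    struct stats_region *r = p;
    r->ncpus = ncpus < STATS_MAX_CPUS ? ncpus : STATS_MAX_CPUS;
    return r;
}

/* Reader side (monitor process): map the same file read-only, so the
 * monitor cannot disturb the measurement data. */
const struct stats_region *stats_attach(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    void *p = mmap(NULL, sizeof(struct stats_region), PROT_READ, MAP_SHARED,
                   fd, 0);
    close(fd);
    return p == MAP_FAILED ? NULL : p;
}
```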

Thanks for the input Sebastian. 

Clark


* Re: cyclictest usage
  2017-06-02 14:26   ` Clark Williams
@ 2017-06-02 16:54     ` Austin Schuh
  2017-06-08  8:46       ` Sebastian Andrzej Siewior
  2017-06-08  8:40     ` Sebastian Andrzej Siewior
  1 sibling, 1 reply; 9+ messages in thread
From: Austin Schuh @ 2017-06-02 16:54 UTC (permalink / raw)
  To: Clark Williams; +Cc: Sebastian Andrzej Siewior, RT, John Kacur

On Fri, Jun 2, 2017 at 7:26 AM, Clark Williams <williams@redhat.com> wrote:
> Regarding periodic updates, what do you think about providing a way for an app to query the current state of a run? Possibly a memory region that can be mmapped by a process? Or possibly a UNIX domain socket? I'm thinking of the case where you're doing a 48 hour test run on a box where you don't have easy access to the console and would just like to know if a) the box is alive and b) is still showing reasonable values for max latency and jitter.
>
> I kinda like the memory region since that's how cyclictest handles output: the measurement threads all dump their data into a cpu-specific memory area and a display thread (running at a low priority) is the one that wakes up, reads the update and printf's to the screen. If we exported that region so that another process could mmap it read-only, that would provide a way to get a snapshot of the run, without actually depending on cyclictest to *do* anything out of the ordinary.

I might be missing something, but if you want to be able to view the
status remotely, why not tee the output to a file and tail that from
the remote session, or use screen/tmux/etc?  That would stay aligned
with Sebastian's goal of keeping it small and simple.

> Thanks for the input Sebastian.
>
> Clark

Austin


* Re: cyclictest usage
  2017-06-02 14:26   ` Clark Williams
  2017-06-02 16:54     ` Austin Schuh
@ 2017-06-08  8:40     ` Sebastian Andrzej Siewior
  1 sibling, 0 replies; 9+ messages in thread
From: Sebastian Andrzej Siewior @ 2017-06-08  8:40 UTC (permalink / raw)
  To: Clark Williams; +Cc: linux-rt-users, John Kacur

On 2017-06-02 09:26:12 [-0500], Clark Williams wrote:
> > As for defaults, it should have those arguments which people use
> > by default anyway. I guess this includes clock_nanosleep(), mlockall(),
> > very high priority, one thread per core and so on.
> > Not sure about "-d 0 --secaligned 250", but something should be the default
> > so that every invocation behaves the same.
> 
> When rteval starts cyclictest it uses '-d0 -i100'. I know that's the worst case of having every timer interrupt fire at the same time, but when we started this that's what we wanted to see. Do we need to have two meta-arguments: --worst-case / --best-case? The --worst-case would do things most normal deployers of a realtime app wouldn't do, while the --best-case would stagger timer starts, isolate to numa nodes, etc. 



> 
> > I remember that there was (or is) an option to figure out whether the
> > hrtimer is working on the system and to estimate the resolution of the
> > clocksource. I would move that into a different tool.
> > That -M mode is nice, but it should give some kind of indication that
> > the system is still alive, like updating the "loop" count once in a
> > while. But this brings me to another topic: the output system. Usually
> > the console output is enough. Then we have the "histogram" mode to check
> > the distribution. People often use the histogram mode because the console
> > output can't be parsed by a script/tool. Here (the histogram), I hear people
> > complaining that the output is not (easily) machine-readable. *I* think
> > it would be okay to use the "histogram" mode for machine-readable output,
> > but the output should be better structured. Something like YAML is
> > probably just fine. However, I can't tell if this will work for everyone
> > or if a plugin-like interface would be best, so we can plug in YAML
> > output as well as something that creates XML-based output (for people
> > who dream in XML, too).
> 
> I don't have a problem generating XML or JSON or some sort of structured output. I'll put that on the list. Hmmm, plugins...
I remember Steven said something about plugins; I don't really care. It
is just that libxml2 is ~2 MiB, libjson-c3 85 KiB and libyaml 156 KiB, so
I would prefer both (JSON & YAML) over XML. I don't want to make things
more complicated (with plugins).

> Regarding periodic updates, what do you think about providing a way for an app to query the current state of a run? Possibly a memory region that can be mmapped by a process? Or possibly a UNIX domain socket? I'm thinking of the case where you're doing a 48 hour test run on a box where you don't have easy access to the console and would just like to know if a) the box is alive and b) is still showing reasonable values for max latency and jitter. 

The memory-mapped output is probably fine for local usage. Beyond that,
I think we also want a UNIX domain socket / pipe or something similar.
Another use case would be where you have a limited box and you want to
obtain the cyclictest information over the network and provide a
real-time view of the data on another box.
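
For the socket variant, a sketch might send one fixed-size datagram per update over a UNIX datagram socket. It uses MSG_DONTWAIT so a slow or absent reader cannot stall the sender. The message layout and names are hypothetical. Sending over a real network would additionally need fixed-width fields and a defined byte order.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical message: one fixed-size datagram per CPU update. */
struct snapshot_msg {
    int cpu;
    long cycles;
    long max_us;
};

/* With MSG_DONTWAIT, send() fails with EAGAIN instead of blocking when
 * the reader falls behind; the caller can simply drop that update.
 * Returns 0 if the whole message was queued. */
int publish_snapshot(int fd, const struct snapshot_msg *m)
{
    ssize_t n = send(fd, m, sizeof *m, MSG_DONTWAIT);
    return n == (ssize_t)sizeof *m ? 0 : -1;
}

/* Blocking receive of one update. Returns 0 on a complete message. */
int read_snapshot(int fd, struct snapshot_msg *m)
{
    ssize_t n = recv(fd, m, sizeof *m, 0);
    return n == (ssize_t)sizeof *m ? 0 : -1;
}
```

In the example usage, socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) stands in for a named socket to keep the example self-contained.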
 
> I kinda like the memory region since that's how cyclictest handles output: the measurement threads all dump their data into a cpu-specific memory area and a display thread (running at a low priority) is the one that wakes up, reads the update and printf's to the screen. If we exported that region so that another process could mmap it read-only, that would provide a way to get a snapshot of the run, without actually depending on cyclictest to *do* anything out of the ordinary. 

So if we had a "naked" cyclictest which just dumps data, then we could
attach console output (which is what we have now) _or_ network output,
which sends the data over the network. Sounds good.
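
The "naked" core plus attachable outputs could be modelled as a small sink interface. The measurement code only calls dispatch(), and console, network or file back ends register an emit() callback. This is a hypothetical design sketch, not an agreed API.

```c
#include <stdio.h>

struct sample {
    int cpu;
    long latency_us;
};

/* Hypothetical sink interface: the measurement core only knows emit(). */
struct output_sink {
    void (*emit)(void *ctx, const struct sample *s);
    void *ctx;
};

/* Console back end: ctx is a FILE *. */
void console_emit(void *ctx, const struct sample *s)
{
    fprintf((FILE *)ctx, "CPU%d: %ld us\n", s->cpu, s->latency_us);
}

/* Trivial "machine" back end: ctx points to a long holding the
 * running maximum latency. */
void max_emit(void *ctx, const struct sample *s)
{
    long *max = ctx;
    if (s->latency_us > *max)
        *max = s->latency_us;
}

/* Hand one sample to every attached sink. */
void dispatch(const struct output_sink *sinks, int nsinks,
              const struct sample *s)
{
    for (int i = 0; i < nsinks; i++)
        sinks[i].emit(sinks[i].ctx, s);
}
```

A network back end would be one more emit() implementation; the measurement loop would not change.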

> 
> Clark

Sebastian


* Re: cyclictest usage
  2017-06-02 16:54     ` Austin Schuh
@ 2017-06-08  8:46       ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 9+ messages in thread
From: Sebastian Andrzej Siewior @ 2017-06-08  8:46 UTC (permalink / raw)
  To: Austin Schuh; +Cc: Clark Williams, RT, John Kacur

On 2017-06-02 09:54:16 [-0700], Austin Schuh wrote:
> On Fri, Jun 2, 2017 at 7:26 AM, Clark Williams <williams@redhat.com> wrote:
> > Regarding periodic updates, what do you think about providing a way for an app to query the current state of a run? Possibly a memory region that can be mmapped by a process? Or possibly a UNIX domain socket? I'm thinking of the case where you're doing a 48 hour test run on a box where you don't have easy access to the console and would just like to know if a) the box is alive and b) is still showing reasonable values for max latency and jitter.
> >
> > I kinda like the memory region since that's how cyclictest handles output: the measurement threads all dump their data into a cpu-specific memory area and a display thread (running at a low priority) is the one that wakes up, reads the update and printf's to the screen. If we exported that region so that another process could mmap it read-only, that would provide a way to get a snapshot of the run, without actually depending on cyclictest to *do* anything out of the ordinary.
> 
> I might be missing something, but if you want to be able to view the
> status remotely, why not tee the output to a file and tail that from
> the remote session, or use screen/tmux/etc?  That would stay aligned
> with Sebastian's goal of keeping it small and simple.

This works in general for you as a person. But if you want to have the
output read by a machine/script then this won't work.
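
This is why scraping the console output is fragile. As an illustration, a script-style parser for one per-thread status line might look like the sketch below. The line format is an assumption based on typical cyclictest output and may differ between versions, which is exactly the problem with parsing it.

```c
#include <stdio.h>

/* Hypothetical record for one per-thread status line. */
struct ct_line {
    int thread, tid, prio;
    long interval_us;
    unsigned long cycles;
    long min_us, act_us, avg_us, max_us;
};

/* Parse a line assumed to look like
 *   T: 0 ( 1234) P:80 I:10000 C:  10000 Min:      5 Act:    8 Avg:    7 Max:      23
 * Returns 0 if all nine fields were found, -1 otherwise. */
int parse_status_line(const char *line, struct ct_line *out)
{
    int n = sscanf(line,
                   "T: %d ( %d) P:%d I:%ld C: %lu Min: %ld Act: %ld Avg: %ld Max: %ld",
                   &out->thread, &out->tid, &out->prio, &out->interval_us,
                   &out->cycles, &out->min_us, &out->act_us, &out->avg_us,
                   &out->max_us);
    return n == 9 ? 0 : -1;
}
```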

> Austin

Sebastian

