* Re: 1us latency?
2015-08-03 18:53 ` Clark Williams
@ 2015-08-03 19:03 ` pavel
2015-08-04 20:27 ` Frank Rowand
2015-08-06 13:12 ` Sebastian Andrzej Siewior
2 siblings, 0 replies; 9+ messages in thread
From: pavel @ 2015-08-03 19:03 UTC (permalink / raw)
To: Clark Williams; +Cc: Linux RT Users
03.08.2015 21:53, Clark Williams wrote:
> On Mon, 3 Aug 2015 21:36:26 +0300
> pavel <pavel@pavlinux.ru> wrote:
>
>> 1. Without patch
>>
>> # ./cyclictest -S -p fifo -D60s
>> defaulting realtime priority to 9
>> # /dev/cpu_dma_latency set to 0us
>> policy: fifo: loadavg: 0.00 0.01 0.05 1/207 12378
>>
>> T: 0 (12361) P: 9 I:1000 C: 59993 Min: 1 Act: 1 Avg: 1 Max: 8
>> T: 1 (12362) P: 9 I:1500 C: 39995 Min: 1 Act: 1 Avg: 1 Max: 9
>> T: 2 (12363) P: 9 I:2000 C: 29996 Min: 1 Act: 1 Avg: 1 Max: 12
>> T: 3 (12364) P: 9 I:2500 C: 23997 Min: 1 Act: 1 Avg: 1 Max: 9
>> T: 4 (12365) P: 9 I:3000 C: 19997 Min: 1 Act: 1 Avg: 1 Max: 6
>> T: 5 (12366) P: 9 I:3500 C: 17141 Min: 1 Act: 1 Avg: 1 Max: 11
>> T: 6 (12367) P: 9 I:4000 C: 14998 Min: 0 Act: 1 Avg: 1 Max: 10
>> T: 7 (12368) P: 9 I:4500 C: 13331 Min: 0 Act: 1 Avg: 1 Max: 6
>>
>> 2. With patch
>>
>> # ./cyclictest -S -p fifo -D60s
>> defaulting realtime priority to 9
>> # /dev/cpu_dma_latency set to 0us
>> policy: fifo: loadavg: 0.05 0.04 0.05 1/206 12469
>>
>> T: 0 (12452) P: 9 I:1000 C: 59997 Min: 1 Act: 1 Avg: 1 Max: 3
>> T: 1 (12453) P: 9 I:1500 C: 39998 Min: 1 Act: 1 Avg: 1 Max: 2
>> T: 2 (12454) P: 9 I:2000 C: 29998 Min: 1 Act: 1 Avg: 1 Max: 4
>> T: 3 (12455) P: 9 I:2500 C: 23999 Min: 1 Act: 1 Avg: 1 Max: 4
>> T: 4 (12456) P: 9 I:3000 C: 19999 Min: 1 Act: 1 Avg: 1 Max: 4
>> T: 5 (12457) P: 9 I:3500 C: 17142 Min: 1 Act: 1 Avg: 1 Max: 3
>> T: 6 (12458) P: 9 I:4000 C: 14999 Min: 1 Act: 1 Avg: 1 Max: 2
>> T: 7 (12459) P: 9 I:4500 C: 13332 Min: 1 Act: 1 Avg: 1 Max: 3
>>
>>
>> Patch ---
>>
>> diff --git a/src/cyclictest/cyclictest.c b/src/cyclictest/cyclictest.c
>> index 34053c5..84a70de 100644
>> --- a/src/cyclictest/cyclictest.c
>> +++ b/src/cyclictest/cyclictest.c
>> @@ -1727,6 +1727,9 @@ static void print_stat(FILE *fp, struct thread_param *par,
>> int index, int verbos
>> {
>> struct thread_stat *stat = par->stats;
>>
>> + if ( stat->cycles < 5000)
>> + stat->max = 0;
>> +
>> if (!verbose) {
>> if (quiet != 1) {
>> char *fmt;
>>
>> ---
>>
>>
>> 03.08.2015 20:59, pavel wrote:
>>>>> # ./cyclictest -S -p fifo
>>>>> defaulting realtime priority to 9
>>>>> # /dev/cpu_dma_latency set to 0us
>>>>> policy: fifo: loadavg: 0.00 0.03 0.05 1/218 4240
>>>>>
>>>>> T: 0 ( 4174) P: 9 I:1000 C: 430806 Min: 1 Act: 1 Avg: 1 Max: 10
>>>>> T: 1 ( 4175) P: 9 I:1500 C: 287204 Min: 1 Act: 1 Avg: 1 Max: 6
>>>>> T: 2 ( 4176) P: 9 I:2000 C: 215403 Min: 1 Act: 2 Avg: 1 Max: 11
>>>>> T: 3 ( 4177) P: 9 I:2500 C: 172322 Min: 1 Act: 1 Avg: 1 Max: 9
>>>>> T: 4 ( 4178) P: 9 I:3000 C: 143602 Min: 1 Act: 1 Avg: 1 Max: 10
>>>>> T: 5 ( 4179) P: 9 I:3500 C: 123087 Min: 1 Act: 1 Avg: 1 Max: 11
>>>>> T: 6 ( 4180) P: 9 I:4000 C: 107701 Min: 1 Act: 1 Avg: 1 Max: 10
>>>>> T: 7 ( 4181) P: 9 I:4500 C: 108232 Min: 1 Act: 2 Avg: 1 Max: 11
>>>>>
>>>>>
>>>>> Is it possible? 1us latency? o_O
>>>> No, your latency is 11us. Max latency is what we care about, not the
>>>> average.
>>> By the way, max values appear only at the start, then they are roughly equal to
>>> the average.
> Interesting. Betting that's page faults and cache filling.
>
> I don't think we want to arbitrarily pick some number of cycles for a
> "settle time" (i.e. a grace period for the application to reach steady
> state). Possibly we should add an option for that? Specify some number
> of cycles or some amount of time during which the measurement threads run
> before actual measurements start?
>
> $ cyclictest --numa -p95 -m --settle=10ms
>
> That would say "run the measurement threads for ten milliseconds before
> actually starting the measurement period". That would allow them to
> fault in and fill cache lines before starting real work.
>
> Anyone else have an opinion?
Add the option, plus another utility to detect steady state, i.e. when the
difference is no more than 10% (like old-school resistors with a silver band :))
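
A minimal sketch of such a steady-state check, under assumptions that are not
from the thread: "difference" is read here as the relative difference between
the maximum latencies of two consecutive sample windows, and the names
settle_detector, settle_init and settle_feed are hypothetical, not part of
cyclictest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not cyclictest code): compares per-window maxima. */
struct settle_detector {
	size_t window;      /* samples per window */
	size_t filled;      /* samples seen in the current window */
	uint64_t cur_max;   /* max latency in the current window (us) */
	uint64_t prev_max;  /* max latency of the previous full window (us) */
	bool have_prev;     /* true once one full window has completed */
	bool steady;        /* latched once steady state is detected */
};

static void settle_init(struct settle_detector *d, size_t window)
{
	assert(d != NULL && window > 0);
	d->window = window;
	d->filled = 0;
	d->cur_max = 0;
	d->prev_max = 0;
	d->have_prev = false;
	d->steady = false;
}

/* Feed one latency sample; returns true once steady state has been reached. */
static bool settle_feed(struct settle_detector *d, uint64_t latency_us)
{
	if (d->steady)
		return true;

	if (latency_us > d->cur_max)
		d->cur_max = latency_us;
	if (++d->filled < d->window)
		return false;

	/* A window just completed: compare its max with the previous one. */
	if (d->have_prev) {
		uint64_t hi = d->cur_max > d->prev_max ? d->cur_max : d->prev_max;
		uint64_t lo = d->cur_max > d->prev_max ? d->prev_max : d->cur_max;

		/* (hi - lo) / hi <= 10%  is the same as  10 * (hi - lo) <= hi */
		if (10 * (hi - lo) <= hi)
			d->steady = true;
	}
	d->prev_max = d->cur_max;
	d->have_prev = true;
	d->cur_max = 0;
	d->filled = 0;
	return d->steady;
}
```

With whole-microsecond values as small as the ones above, a 10% threshold is
strict: window maxima of 3 and 4 already differ by 25%, so in practice the
check would mostly fire when two consecutive windows report the same maximum.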
--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: 1us latency?
2015-08-03 18:53 ` Clark Williams
2015-08-03 19:03 ` pavel
@ 2015-08-04 20:27 ` Frank Rowand
2015-08-06 13:12 ` Sebastian Andrzej Siewior
2 siblings, 0 replies; 9+ messages in thread
From: Frank Rowand @ 2015-08-04 20:27 UTC (permalink / raw)
To: Clark Williams; +Cc: pavel, Linux RT Users
On 8/3/2015 11:53 AM, Clark Williams wrote:
> On Mon, 3 Aug 2015 21:36:26 +0300
> pavel <pavel@pavlinux.ru> wrote:
>
>> 1. Without patch
>>
>> # ./cyclictest -S -p fifo -D60s
>> defaulting realtime priority to 9
>> # /dev/cpu_dma_latency set to 0us
>> policy: fifo: loadavg: 0.00 0.01 0.05 1/207 12378
>>
>> T: 0 (12361) P: 9 I:1000 C: 59993 Min: 1 Act: 1 Avg: 1 Max: 8
>> T: 1 (12362) P: 9 I:1500 C: 39995 Min: 1 Act: 1 Avg: 1 Max: 9
>> T: 2 (12363) P: 9 I:2000 C: 29996 Min: 1 Act: 1 Avg: 1 Max: 12
>> T: 3 (12364) P: 9 I:2500 C: 23997 Min: 1 Act: 1 Avg: 1 Max: 9
>> T: 4 (12365) P: 9 I:3000 C: 19997 Min: 1 Act: 1 Avg: 1 Max: 6
>> T: 5 (12366) P: 9 I:3500 C: 17141 Min: 1 Act: 1 Avg: 1 Max: 11
>> T: 6 (12367) P: 9 I:4000 C: 14998 Min: 0 Act: 1 Avg: 1 Max: 10
>> T: 7 (12368) P: 9 I:4500 C: 13331 Min: 0 Act: 1 Avg: 1 Max: 6
>>
>> 2. With patch
>>
>> # ./cyclictest -S -p fifo -D60s
>> defaulting realtime priority to 9
>> # /dev/cpu_dma_latency set to 0us
>> policy: fifo: loadavg: 0.05 0.04 0.05 1/206 12469
>>
>> T: 0 (12452) P: 9 I:1000 C: 59997 Min: 1 Act: 1 Avg: 1 Max: 3
>> T: 1 (12453) P: 9 I:1500 C: 39998 Min: 1 Act: 1 Avg: 1 Max: 2
>> T: 2 (12454) P: 9 I:2000 C: 29998 Min: 1 Act: 1 Avg: 1 Max: 4
>> T: 3 (12455) P: 9 I:2500 C: 23999 Min: 1 Act: 1 Avg: 1 Max: 4
>> T: 4 (12456) P: 9 I:3000 C: 19999 Min: 1 Act: 1 Avg: 1 Max: 4
>> T: 5 (12457) P: 9 I:3500 C: 17142 Min: 1 Act: 1 Avg: 1 Max: 3
>> T: 6 (12458) P: 9 I:4000 C: 14999 Min: 1 Act: 1 Avg: 1 Max: 2
>> T: 7 (12459) P: 9 I:4500 C: 13332 Min: 1 Act: 1 Avg: 1 Max: 3
>>
>>
>> Patch ---
>>
>> diff --git a/src/cyclictest/cyclictest.c b/src/cyclictest/cyclictest.c
>> index 34053c5..84a70de 100644
>> --- a/src/cyclictest/cyclictest.c
>> +++ b/src/cyclictest/cyclictest.c
>> @@ -1727,6 +1727,9 @@ static void print_stat(FILE *fp, struct thread_param *par,
>> int index, int verbos
>> {
>> struct thread_stat *stat = par->stats;
>>
>> + if ( stat->cycles < 5000)
>> + stat->max = 0;
>> +
>> if (!verbose) {
>> if (quiet != 1) {
>> char *fmt;
>>
>> ---
>>
>>
>> 03.08.2015 20:59, pavel wrote:
>>>
>>>>
>>>>> # ./cyclictest -S -p fifo
>>>>> defaulting realtime priority to 9
>>>>> # /dev/cpu_dma_latency set to 0us
>>>>> policy: fifo: loadavg: 0.00 0.03 0.05 1/218 4240
>>>>>
>>>>> T: 0 ( 4174) P: 9 I:1000 C: 430806 Min: 1 Act: 1 Avg: 1 Max: 10
>>>>> T: 1 ( 4175) P: 9 I:1500 C: 287204 Min: 1 Act: 1 Avg: 1 Max: 6
>>>>> T: 2 ( 4176) P: 9 I:2000 C: 215403 Min: 1 Act: 2 Avg: 1 Max: 11
>>>>> T: 3 ( 4177) P: 9 I:2500 C: 172322 Min: 1 Act: 1 Avg: 1 Max: 9
>>>>> T: 4 ( 4178) P: 9 I:3000 C: 143602 Min: 1 Act: 1 Avg: 1 Max: 10
>>>>> T: 5 ( 4179) P: 9 I:3500 C: 123087 Min: 1 Act: 1 Avg: 1 Max: 11
>>>>> T: 6 ( 4180) P: 9 I:4000 C: 107701 Min: 1 Act: 1 Avg: 1 Max: 10
>>>>> T: 7 ( 4181) P: 9 I:4500 C: 108232 Min: 1 Act: 2 Avg: 1 Max: 11
>>>>>
>>>>>
>>>>> Is it possible? 1us latency? o_O
>>>> No, your latency is 11us. Max latency is what we care about, not the
>>>> average.
>>> By the way, max values appear only at the start, then they are roughly equal to
>>> the average.
>>
>
> Interesting. Betting that's page faults and cache filling.
You might want to try running a background load on the system that pollutes
the cache and TLB and see if you get larger values after the start up period.
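
A rough sketch of what such a background load could look like, assuming a
buffer larger than the last-level cache and a caller-chosen stride; the
function name pollute and the sizes mentioned below are illustrative
assumptions, not something taken from rt-tests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical polluter: walk buf with the given stride, `rounds` times,
 * doing a read-modify-write on one byte per step. Each touch pulls a cache
 * line in and needs a TLB entry for the page that holds it.
 * Returns the number of touches performed.
 */
static size_t pollute(volatile unsigned char *buf, size_t size,
		      size_t stride, unsigned rounds)
{
	size_t touches = 0;

	assert(buf != NULL && stride > 0);
	for (unsigned r = 0; r < rounds; r++) {
		for (size_t off = 0; off < size; off += stride) {
			buf[off]++;
			touches++;
		}
	}
	return touches;
}
```

A stride of one cache line (often 64 bytes, but that is an assumption about
the hardware) sweeps every line and every page of the buffer; a stride of one
page mostly stresses the TLB. Run it in a loop from a separate, non-realtime
process while cyclictest is measuring.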
>
> I don't think we want to arbitrarily pick some number of cycles for a
> "settle time" (i.e. a grace period for the application to reach steady
> state). Possibly we should add an option for that? Specify some number
> of cycles or some amount of time during which the measurement threads run
> before actual measurements start?
>
> $ cyclictest --numa -p95 -m --settle=10ms
>
> That would say "run the measurement threads for ten milliseconds before
> actually starting the measurement period". That would allow them to
> fault in and fill cache lines before starting real work.
>
> Anyone else have an opinion?
>
> Clark
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: 1us latency?
2015-08-03 18:53 ` Clark Williams
2015-08-03 19:03 ` pavel
2015-08-04 20:27 ` Frank Rowand
@ 2015-08-06 13:12 ` Sebastian Andrzej Siewior
2015-08-06 15:03 ` Clark Williams
2 siblings, 1 reply; 9+ messages in thread
From: Sebastian Andrzej Siewior @ 2015-08-06 13:12 UTC (permalink / raw)
To: Clark Williams; +Cc: pavel, Linux RT Users
* Clark Williams | 2015-08-03 13:53:26 [-0500]:
>On Mon, 3 Aug 2015 21:36:26 +0300
>
>Interesting. Betting that's page faults and cache filling.
>
>I don't think we want to arbitrarily pick some number of cycles for a
>"settle time" (i.e. a grace period for the application to reach steady
>state). Possibly we should add an option for that? Specify some number
>of cycles or some amount of time during which the measurement threads run
>before actual measurements start?
>
> $ cyclictest --numa -p95 -m --settle=10ms
>
>That would say "run the measurement threads for ten milliseconds before
>actually starting the measurement period". That would allow them to
>fault in and fill cache lines before starting real work.
>
>Anyone else have an opinion?
Wouldn't you have everything in memory after one cycle of each thread?
>Clark
Sebastian
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: 1us latency?
2015-08-06 13:12 ` Sebastian Andrzej Siewior
@ 2015-08-06 15:03 ` Clark Williams
0 siblings, 0 replies; 9+ messages in thread
From: Clark Williams @ 2015-08-06 15:03 UTC (permalink / raw)
To: Sebastian Andrzej Siewior; +Cc: pavel, Linux RT Users
On Thu, 6 Aug 2015 15:12:59 +0200
Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> * Clark Williams | 2015-08-03 13:53:26 [-0500]:
>
> >On Mon, 3 Aug 2015 21:36:26 +0300
> >
> >Interesting. Betting that's page faults and cache filling.
> >
> >I don't think we want to arbitrarily pick some number of cycles for a
> >"settle time" (i.e. a grace period for the application to reach steady
> >state). Possibly we should add an option for that? Specify some number
> >of cycles or some amount of time during which the measurement threads run
> >before actual measurements start?
> >
> > $ cyclictest --numa -p95 -m --settle=10ms
> >
> >That would say "run the measurement threads for ten milliseconds before
> >actually starting the measurement period". That would allow them to
> >fault in and fill cache lines before starting real work.
> >
> >Anyone else have an opinion?
>
> Wouldn't you have everything in memory after one cycle of each thread?
I had to go through the timerthread() routine a couple of times to
convince myself, but I think you're right.
So if we wanted to discount the paging-in overhead, we could have each
thread do a "dummy" pass through the timer loop (i.e. do everything but
just not record the results) and then start recording measurements. I
may hack together an option to try that and see what sort of results
we get.
Clark
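
A minimal sketch of the "dummy pass" idea described above. This is not the
timerthread() code from cyclictest: struct loop_stat, timer_loop and the
helper names are hypothetical, and the loop is reduced to an absolute
clock_nanosleep() on CLOCK_MONOTONIC followed by a latency calculation.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Hypothetical per-thread statistics; only recorded passes are counted. */
struct loop_stat {
	unsigned long cycles;
	int64_t min, max, sum;  /* wakeup latency in nanoseconds */
};

static void ts_add_ns(struct timespec *ts, long ns)
{
	ts->tv_nsec += ns;
	while (ts->tv_nsec >= NSEC_PER_SEC) {
		ts->tv_nsec -= NSEC_PER_SEC;
		ts->tv_sec++;
	}
}

static int64_t ts_diff_ns(const struct timespec *a, const struct timespec *b)
{
	return (int64_t)(a->tv_sec - b->tv_sec) * NSEC_PER_SEC +
	       (a->tv_nsec - b->tv_nsec);
}

/*
 * Run `dummy + cycles` passes through the timer loop. The first `dummy`
 * passes do the same work but their results are thrown away.
 * Returns 0 on success, -1 if a clock call fails or is interrupted.
 */
static int timer_loop(struct loop_stat *st, long interval_ns,
		      unsigned dummy, unsigned long cycles)
{
	struct timespec next, now;

	assert(st != NULL && interval_ns > 0 && interval_ns < NSEC_PER_SEC);
	st->cycles = 0;
	st->min = INT64_MAX;
	st->max = 0;
	st->sum = 0;

	if (clock_gettime(CLOCK_MONOTONIC, &next) != 0)
		return -1;

	for (unsigned long pass = 0; pass < dummy + cycles; pass++) {
		int64_t lat;

		ts_add_ns(&next, interval_ns);
		if (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL) != 0)
			return -1;
		if (clock_gettime(CLOCK_MONOTONIC, &now) != 0)
			return -1;
		lat = ts_diff_ns(&now, &next);

		if (pass < dummy)
			continue;  /* dummy pass: pages and caches warmed, nothing recorded */

		if (lat < st->min)
			st->min = lat;
		if (lat > st->max)
			st->max = lat;
		st->sum += lat;
		st->cycles++;
	}
	return 0;
}
```

Calling timer_loop(&st, 1000000L, 1, 60000) would discard one pass and then
record 60000 cycles at a 1 ms interval. Whether a single dummy pass is really
enough to hide the start-up maxima is exactly what would need to be measured.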
^ permalink raw reply [flat|nested] 9+ messages in thread