* Some troubles with perf and measuring flops
From: Alen Stojanov @ 2014-03-06 0:55 UTC (permalink / raw)
To: linux-perf-users
[-- Attachment #1: Type: text/plain, Size: 1677 bytes --]
Dear Linux Perf Users Community,
I have noticed some inconsistencies with the perf tool. I would like to
determine whether I am doing something wrong, or whether there is a
problem in the perf tool itself. Here is the issue:
I would like to count the flops of a simple matrix-matrix multiplication
algorithm. The code is attached as mmmtest.c. To obtain flop counts, I
run the perf tool with raw counters. For matrices with sizes below
150x150, I obtain accurate results. Example (anticipated flops: 100 *
100 * 100 * 2 = 2'000'000):
perf stat -e r538010 ./mmmtest 100
Performance counter stats for './mmmtest 100':
2,078,775 r538010
0.003889544 seconds time elapsed
However, whenever I try to run matrices of bigger size, the reported
flops are not even close to the flops that I am supposed to obtain
(anticipated results: 600 * 600 * 600 * 2 = 432'000'000):
perf stat -e r538010 ./mmmtest 600
Performance counter stats for './mmmtest 600':
2,348,148,851 r538010
0.955511968 seconds time elapsed
To give you more info to replicate the problem, I provide you with the
following:
CPU: Intel(R) Xeon(R) CPU E5-2643 0 @ 3.30GHz, 8 cores
Linux Kernel: 3.11.0-12-generic
GCC Version: gcc version 4.8.1 (Ubuntu/Linaro 4.8.1-10ubuntu8)
Monitored events: FP_COMP_OPS_EXE:SSE_SCALAR_DOUBLE - Raw event:
0x538010 (converted using libpfm4)
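(For reference: the raw code is just the Sandy Bridge perfmon encoding of
this event, i.e. event select 0x10 and umask 0x80, with the leading 0x53
being the usr/os/int/enable flag bits. Assuming a perf that supports the
named-field PMU syntax, a sketch of an equivalent, more readable
invocation would be:

perf stat -e cpu/event=0x10,umask=0x80/ ./mmmtest 100

The encoding can also be cross-checked against the check_events example
program that ships with libpfm4.)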
I compiled mmmtest.c using: gcc -O3 -march=corei7-avx -o mmmtest
mmmtest.c. You can also find the generated assembly, mmmtest.s, in the
attachment.
Do you know why this happens? How can I instruct perf to obtain
accurate results?
Greetings,
Alen
[-- Attachment #2: mmmtest.c --]
[-- Type: text/plain, Size: 479 bytes --]
#include <stdlib.h>

int m, n, k;
double *A, *B, *C;

/* Naive triple-loop matrix-matrix multiplication, C += A * B.
 * The innermost statement performs one multiply and one add, so the
 * anticipated scalar double-precision flop count is 2 * m * n * k. */
void compute() {
    int i, j, h;
    for (i = 0; i < m; ++i) {
        for (j = 0; j < n; ++j) {
            for (h = 0; h < k; ++h) {
                C[i*n+j] += A[i*k+h] * B[h*n+j];
            }
        }
    }
}

int main(int argc, char **argv)
{
    m = atoi(argv[1]); n = m; k = m;
    A = (double *) malloc (m * k * sizeof(double));
    B = (double *) malloc (k * n * sizeof(double));
    /* Note: C is accumulated into without being zeroed first; for a
     * correctness check it should be allocated with calloc instead. */
    C = (double *) malloc (m * n * sizeof(double));
    compute ();
    free(A);
    free(B);
    free(C);
}
[-- Attachment #3: mmmtest.s --]
[-- Type: text/plain, Size: 2423 bytes --]
        .file   "mmmtest.c"
        .text
        .p2align 4,,15
        .globl  compute
        .type   compute, @function
compute:
.LFB14:
        .cfi_startproc
        pushq   %r15
        .cfi_def_cfa_offset 16
        .cfi_offset 15, -16
        pushq   %r14
        .cfi_def_cfa_offset 24
        .cfi_offset 14, -24
        pushq   %r13
        .cfi_def_cfa_offset 32
        .cfi_offset 13, -32
        pushq   %r12
        .cfi_def_cfa_offset 40
        .cfi_offset 12, -40
        movl    m(%rip), %r12d
        pushq   %rbp
        .cfi_def_cfa_offset 48
        .cfi_offset 6, -48
        pushq   %rbx
        .cfi_def_cfa_offset 56
        .cfi_offset 3, -56
        testl   %r12d, %r12d
        jle     .L9
        movl    n(%rip), %ebp
        xorl    %ebx, %ebx
        movl    k(%rip), %esi
        movq    B(%rip), %r15
        movq    A(%rip), %rdi
        movq    C(%rip), %r11
        leal    -1(%rbp), %eax
        movslq  %ebp, %r8
        leaq    8(,%rax,8), %r13
        movslq  %esi, %r14
        salq    $3, %r8
        salq    $3, %r14
.L3:
        testl   %ebp, %ebp
        jle     .L5
        leaq    0(%r13,%r11), %r10
        movq    %r15, %r9
        movq    %r11, %rcx
        .p2align 4,,10
        .p2align 3
.L8:
        testl   %esi, %esi
        jle     .L6
        vmovsd  (%rcx), %xmm0
        movq    %r9, %rdx
        xorl    %eax, %eax
        .p2align 4,,10
        .p2align 3
# innermost loop over h: one VEX-encoded scalar vmulsd and one vaddsd
# per iteration, i.e. two scalar double flops
.L7:
        vmovsd  (%rdi,%rax,8), %xmm1
        addq    $1, %rax
        vmulsd  (%rdx), %xmm1, %xmm1
        addq    %r8, %rdx
        cmpl    %eax, %esi
        vaddsd  %xmm1, %xmm0, %xmm0
        vmovsd  %xmm0, (%rcx)
        jg      .L7
.L6:
        addq    $8, %rcx
        addq    $8, %r9
        cmpq    %r10, %rcx
        jne     .L8
.L5:
        addl    $1, %ebx
        addq    %r14, %rdi
        addq    %r8, %r11
        cmpl    %r12d, %ebx
        jne     .L3
.L9:
        popq    %rbx
        .cfi_def_cfa_offset 48
        popq    %rbp
        .cfi_def_cfa_offset 40
        popq    %r12
        .cfi_def_cfa_offset 32
        popq    %r13
        .cfi_def_cfa_offset 24
        popq    %r14
        .cfi_def_cfa_offset 16
        popq    %r15
        .cfi_def_cfa_offset 8
        ret
        .cfi_endproc
.LFE14:
        .size   compute, .-compute
        .section .text.startup,"ax",@progbits
        .p2align 4,,15
        .globl  main
        .type   main, @function
main:
.LFB15:
        .cfi_startproc
        pushq   %rbx
        .cfi_def_cfa_offset 16
        .cfi_offset 3, -16
        movl    $10, %edx
        movq    8(%rsi), %rdi
        xorl    %esi, %esi
        call    strtol
        movl    %eax, m(%rip)
        movl    %eax, n(%rip)
        movl    %eax, k(%rip)
        imull   %eax, %eax
        movslq  %eax, %rbx
        salq    $3, %rbx
        movq    %rbx, %rdi
        call    malloc
        movq    %rbx, %rdi
        movq    %rax, A(%rip)
        call    malloc
        movq    %rbx, %rdi
        movq    %rax, B(%rip)
        call    malloc
        movq    %rax, C(%rip)
        xorl    %eax, %eax
        call    compute
        movq    A(%rip), %rdi
        call    free
        movq    B(%rip), %rdi
        call    free
        movq    C(%rip), %rdi
        call    free
        popq    %rbx
        .cfi_def_cfa_offset 8
        ret
        .cfi_endproc
.LFE15:
        .size   main, .-main
        .comm   C,8,8
        .comm   B,8,8
        .comm   A,8,8
        .comm   k,4,4
        .comm   n,4,4
        .comm   m,4,4
        .ident  "GCC: (Ubuntu/Linaro 4.8.1-10ubuntu8) 4.8.1"
        .section .note.GNU-stack,"",@progbits
* Re: Some troubles with perf and measuring flops
From: Vince Weaver @ 2014-03-06 1:40 UTC (permalink / raw)
To: Alen Stojanov; +Cc: linux-perf-users
On Thu, 6 Mar 2014, Alen Stojanov wrote:
> However, whenever I try to run matrices of bigger size, the reported flops are
> not even close to the flops that I am supposed to obtain (anticipated results:
> 600 * 600 * 600 * 2 = 432'000'000):
>
> perf stat -e r538010 ./mmmtest 600
>
> Performance counter stats for './mmmtest 600':
>
> 2,348,148,851 r538010
>
> 0.955511968 seconds time elapsed
>
...
> CPU: Intel(R) Xeon(R) CPU E5-2643 0 @ 3.30GHz, 8 cores
> Linux Kernel: 3.11.0-12-generic
> GCC Version: gcc version 4.8.1 (Ubuntu/Linaro 4.8.1-10ubuntu8)
> Monitored events: FP_COMP_OPS_EXE:SSE_SCALAR_DOUBLE - Raw event: 0x538010
> (converted using libpfm4)
...
> Do you know why this happens? How can I instruct perf to obtain accurate
> results?
One thing you might want to do is put :u on your event name, so you are
only measuring user-space accesses, not kernel ones too.
Floating point events are notoriously unreliable on modern Intel
processors.
The event might also be counting speculative events or uops, and it gets
more complicated with AVX in the mix. What does the Intel documentation
say about the event on your architecture?
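(One way to check, assuming the libpfm4 example programs are built: its
showevtinfo tool dumps the per-architecture event and umask
descriptions, e.g.

./examples/showevtinfo | grep -B2 -A12 FP_COMP_OPS_EXE

The exact output format may vary between libpfm4 versions.)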
Vince
* Re: Some troubles with perf and measuring flops
From: Alen Stojanov @ 2014-03-06 1:53 UTC (permalink / raw)
To: Vince Weaver; +Cc: linux-perf-users
On 06/03/14 02:40, Vince Weaver wrote:
> On Thu, 6 Mar 2014, Alen Stojanov wrote:
>
>> However, whenever I try to run matrices of bigger size, the reported flops are
>> not even close to the flops that I am supposed to obtain (anticipated results:
>> 600 * 600 * 600 * 2 = 432'000'000):
>>
>> perf stat -e r538010 ./mmmtest 600
>>
>> Performance counter stats for './mmmtest 600':
>>
>> 2,348,148,851 r538010
>>
>> 0.955511968 seconds time elapsed
>>
> ...
>> CPU: Intel(R) Xeon(R) CPU E5-2643 0 @ 3.30GHz, 8 cores
>> Linux Kernel: 3.11.0-12-generic
>> GCC Version: gcc version 4.8.1 (Ubuntu/Linaro 4.8.1-10ubuntu8)
>> Monitored events: FP_COMP_OPS_EXE:SSE_SCALAR_DOUBLE - Raw event: 0x538010
>> (converted using libpfm4)
> ...
>> Do you know why this happens? How can I instruct perf to obtain accurate
>> results?
> One thing you might want to do is put :u on your event name, so you are
> only measuring user-space accesses, not kernel ones too.
Well, even if perf is measuring kernel events, I really doubt that the
kernel is doing any double-precision floating-point operations.
Nevertheless, I tried the :u option, and it does not change anything:
perf stat -e r538010:u ./mmmtest 100
Performance counter stats for './mmmtest 100':
2,079,002 r538010:u
0.003887873 seconds time elapsed
perf stat -e r538010:u ./mmmtest 600
Performance counter stats for './mmmtest 600':
2,349,426,507 r538010:u
0.956538237 seconds time elapsed
>
> Floating point events are notoriously unreliable on modern Intel
> processors.
>
> The event might also be counting speculative events or uops, and it gets
> more complicated with AVX in the mix. What does the Intel documentation
> say about the event on your architecture?
I agree on this. However, if you look at the .s file, you can see that
it does not contain any AVX instructions. And if I monitor any other
event on the CPU that counts flop operations, I get 0s. It seems that
FP_COMP_OPS_EXE:SSE_SCALAR_DOUBLE is the only one that fires. I don't
think that FP_COMP_OPS_EXE:SSE_SCALAR_DOUBLE counts speculative events.
>
> Vince
* Re: Some troubles with perf and measuring flops
From: Vince Weaver @ 2014-03-06 18:25 UTC (permalink / raw)
To: Alen Stojanov; +Cc: linux-perf-users
On Thu, 6 Mar 2014, Alen Stojanov wrote:
> > more complicated with AVX in the mix. What does the Intel documentation
> > say about the event on your architecture?
> I agree on this. However, if you look at the .s file, you can see that
> it does not contain any AVX instructions.
I'm pretty sure vmovsd and vmulsd are AVX instructions.
> And if I monitor any other
> event on the CPU that counts flop operations, I get 0s. It seems that
> FP_COMP_OPS_EXE:SSE_SCALAR_DOUBLE is the only one that fires. I don't
> think that FP_COMP_OPS_EXE:SSE_SCALAR_DOUBLE counts speculative events.
are you sure?
See http://icl.cs.utk.edu/projects/papi/wiki/PAPITopics:SandyFlops
about FP events on SNB and IVB at least.
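(The overcounting documented there appears to correlate with cache
misses: FP uops that stall waiting on memory can be re-issued and
counted more than once. As a sketch, one way to test that hypothesis on
this workload is to measure both events together:

perf stat -e r538010:u,cache-misses:u ./mmmtest 600

If the inflation grows with the miss count as the matrices outgrow the
cache, re-issued uops are the likely culprit.)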
Vince
* Re: Some troubles with perf and measuring flops
From: Alen Stojanov @ 2014-03-06 19:41 UTC (permalink / raw)
To: Vince Weaver; +Cc: linux-perf-users
On 06/03/14 19:25, Vince Weaver wrote:
> On Thu, 6 Mar 2014, Alen Stojanov wrote:
>
>>> more complicated with AVX in the mix. What does the Intel documentation
>>> say about the event on your architecture?
>> I agree on this. However, if you look at the .s file, you can see that
>> it does not contain any AVX instructions.
> I'm pretty sure vmovsd and vmulsd are AVX instructions.
Yes, you are absolutely right, that was a wrong statement on my part.
What I really meant was that there are no AVX instructions on packed
doubles, since vmovsd and vmulsd operate on scalar doubles. This is also
why I get zeros whenever I do:
perf stat -e r530211 ./mmmtest 600
Performance counter stats for './mmmtest 600':
0 r530211
0.952037328 seconds time elapsed
What I really wanted to show is that I don't have to combine several
counters to obtain results, as FP_COMP_OPS_EXE:SSE_SCALAR_DOUBLE should
be the only flop event that this code triggers.
>> And if I monitor any other
>> event on the CPU that counts flop operations, I get 0s. It seems that
>> FP_COMP_OPS_EXE:SSE_SCALAR_DOUBLE is the only one that fires. I don't
>> think that FP_COMP_OPS_EXE:SSE_SCALAR_DOUBLE counts speculative events.
> are you sure?
>
> See http://icl.cs.utk.edu/projects/papi/wiki/PAPITopics:SandyFlops
> about FP events on SNB and IVB at least.
Thank you for the link. I only assumed that we do not have speculative
events because, in a previous project done in my research group, we were
able to get accurate flops using Intel PCM:
https://github.com/GeorgOfenbeck/perfplot/ (we were able to get correct
flops for an MMM of size 1600x1600x1600).
Nevertheless, as far as I understand, the PAPI page discusses count
deviations whenever several counters are combined. In the use case that
I sent you before, I always use one single raw counter to obtain counts.
Yet the deviations I obtain grow as the matrix size grows. I made a list
to show how much the flops deviate.
List format:
(mmm size) (anticipated_flops) (obtained_flops)
(anticipated_flops / obtained_flops * 100.0, i.e. accuracy in %)
10 2000 2061 97.040
20 16000 16692 95.854
30 54000 58097 92.948
40 128000 132457 96.635
50 250000 257482 97.094
60 432000 452624 95.443
70 686000 730299 93.934
80 1024000 1098453 93.222
90 1458000 1573331 92.670
100 2000000 2138014 93.545
110 2662000 2852239 93.330
120 3456000 3626028 95.311
130 4394000 4783638 91.855
140 5488000 5979236 91.784
150 6750000 7349358 91.845
160 8192000 11324521 72.339
170 9826000 11000354 89.324
180 11664000 13191288 88.422
190 13718000 16492253 83.178
200 16000000 20253599 78.998
210 18522000 23839202 77.696
220 21296000 27832906 76.514
230 24334000 32056213 75.910
240 27648000 40026709 69.074
250 31250000 41837527 74.694
260 35152000 47291908 74.330
270 39366000 53534225 73.534
280 43904000 60193718 72.938
290 48778000 67230702 72.553
300 54000000 74451165 72.531
310 59582000 82773965 71.982
320 65536000 129974914 50.422
330 71874000 99894238 71.950
340 78608000 108421806 72.502
350 85750000 118870753 72.137
360 93312000 129058036 72.302
370 101306000 141901053 71.392
380 109744000 152138340 72.134
390 118638000 170393279 69.626
400 128000000 225637046 56.728
410 137842000 208174503 66.215
420 148176000 205434911 72.128
430 159014000 231594232 68.661
440 170368000 235422186 72.367
450 182250000 280728129 64.920
460 194672000 282586911 68.889
470 207646000 310944304 66.779
480 221184000 409532779 54.009
490 235298000 381057200 61.749
500 250000000 413099959 60.518
510 265302000 393498007 67.421
520 281216000 675607105 41.624
530 297754000 988906780 30.109
540 314928000 1228529787 25.635
550 332750000 1396858866 23.821
560 351232000 2144144283 16.381
570 370386000 2712975462 13.652
580 390224000 3308411489 11.795
590 410758000 2326514544 17.656
And I can't see a pattern from which to derive any conclusion that makes
sense.
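(For completeness, a sketch of a sweep script that would reproduce a
list like the one above, assuming perf's CSV output mode -x, where the
event count is the first field:

#!/bin/sh
for n in $(seq 10 10 590); do
    measured=$(perf stat -x, -e r538010:u ./mmmtest $n 2>&1 | cut -d, -f1)
    echo "$n $((2 * n * n * n)) $measured"
done

The percentage column then follows from the second and third fields.)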
>
> Vince
Alen
* Re: Some troubles with perf and measuring flops
From: Alen Stojanov @ 2014-03-11 23:53 UTC (permalink / raw)
To: Vince Weaver; +Cc: linux-perf-users
So, just to summarize (since I did not get any reply): is the final
conclusion that I cannot simply obtain proper flop counts with Linux
perf because of hardware limitations?
On 06/03/14 20:41, Alen Stojanov wrote:
> [full quote of the previous message, including the deviation table, snipped]
* Re: Some troubles with perf and measuring flops
From: Vince Weaver @ 2014-03-13 20:17 UTC (permalink / raw)
To: Alen Stojanov; +Cc: linux-perf-users
On Wed, 12 Mar 2014, Alen Stojanov wrote:
> So, just to summarize (since I did not get any reply): is the final conclusion
> that I cannot simply obtain proper flop counts with Linux perf because of
> hardware limitations?
Performance counters are tricky things. You shouldn't take my word for
it, you should either run tests or contact people inside Intel.
But yes, "flop count" has always been a tricky quantity to measure (what
constitutes a flop? Is a fused multiply-add one flop or two? etc.)
And on recent Intel processors the floating point events are notoriously
hard to use; for a while Intel even stopped documenting that they
existed, until the HPC community complained enough that Intel brought
them back, with a lot of warnings about accuracy.
This doesn't mean you can't get useful results out of the events; it
just means that it's probably never going to be possible to get "exact
flop counts", whatever that means.
Vince