linux-sh.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [Issue] access cpu_cycles PMU conter in r8a7791
@ 2014-10-15 12:13 Chai Wen
  2014-10-17  5:15 ` Magnus Damm
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Chai Wen @ 2014-10-15 12:13 UTC (permalink / raw)
  To: linux-sh

Hi developers,

Sorry for some disturbing.

I am planing to use pmu counter cpu_cycles to measure the elapsed time of some small pieces of code.
But I found that I can not make the PMU work properly.

My cpu is called r8a7791, a 2-core cpu. And the kernel version is 3.10.31.
The following is my simple code to test it and its result.

I found that no matter what's the scale of this loop:
	for (i = 0; i < loops; i++) {
		__asm__ __volatile__("mov r0, r0\n\t");
	}
	the cycles counts got via:
	__asm__ __volatile__("MRC p15, 0, %0, c9, c13, 0\n\t" : "=r"(count));
	are not significantly different from each other. I am confused about these values.

Any comment or help is appreciated, thanks a lot.

============
num of counters: 6
before count: 0
loops is 100 takes cycles 104665
num of counters: 6
before count: 0
loops is 500 takes cycles 104650
num of counters: 6
before count: 0
loops is 1000 takes cycles 104827
num of counters: 6
before count: 0
loops is 5000 takes cycles 120351
num of counters: 6
before count: 0
loops is 10000 takes cycles 105041
num of counters: 6
before count: 0
loops is 50000 takes cycles 115470
num of counters: 6
before count: 0
loops is 100000 takes cycles 107905
num of counters: 6
before count: 0
loops is 500000 takes cycles 120427
num of counters: 6
before count: 9
loops is 1000000 takes cycles 145033



-------------------------------------------------
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/preempt.h>

MODULE_LICENSE("GPL");


static unsigned int loops\x100;
module_param(loops, uint, S_IRUGO);

static unsigned int get_pmucycles(void)
{
	unsigned int count;
	__asm__ __volatile__("MRC p15, 0, %0, c9, c13, 0\n\t" : "=r"(count));
	return count;
}

static int __init pmu_init(void)
{
	unsigned int value = 0x1;
	unsigned int num_counters;
	unsigned int i;
	unsigned int count1, count2;

	//__asm__ __volatile__("MCR p15, 0, %0, c9, c14, 0\n\t" :: "r"(1));
	/* disable interrupt request */
	__asm__ __volatile__("MCR p15, 0, %0, c9, c14, 2\n\t" :: "r"(0x8000001f));


	__asm__ __volatile__("MRC p15, 0, %0, c9, c12, 0\n\t" : "=r"(num_counters));
	num_counters = (num_counters >> 11) & 0x1FU;
	printk("num of counters: %u\n", num_counters);

	value |= 0x6; //reset and enable counters
	value |= 0x8;   //enable cycles counter divide

	__asm__ __volatile__("MCR p15, 0, %0, c9, c12, 0\n\t" :: "r"(value));
	/* enable all possible counters */
	__asm__ __volatile__("MCR p15, 0, %0, c9, c12, 1\n\t" :: "r"(0x8000001f));
	/* clear counters */
	__asm__ __volatile__("MCR p15, 0, %0, c9, c12, 3\n\t" :: "r"(0x8000001f));
	/* select cycles counter */
	__asm__ __volatile__("MCR p15, 0, %0, c9, c12, 5\n\t" :: "r"(0x1f));
	
	count1 = get_pmucycles();
	printk("before count: %u\n", count1);
	
	preempt_disable();
	for (i = 0; i < loops; i++) {
		__asm__ __volatile__("mov r0, r0\n\t");
	}
	preempt_enable();
	
	count2 = get_pmucycles();
	if (count2 < count1)
	printk("after count: %u\n", count2);
	printk("loops is %u takes cycles %u\n", loops, count2 - count1);
	return -1;
}

static void __exit pmu_exit(void)
{
	printk("bye\n");
}

module_init(pmu_init);
module_exit(pmu_exit);
--------------------------------------

-- 
Regards

Chai Wen

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Issue] access cpu_cycles PMU conter in r8a7791
  2014-10-15 12:13 [Issue] access cpu_cycles PMU conter in r8a7791 Chai Wen
@ 2014-10-17  5:15 ` Magnus Damm
  2014-10-17 11:37 ` Chai Wen
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Magnus Damm @ 2014-10-17  5:15 UTC (permalink / raw)
  To: linux-sh

Hi Chai Wen,

On Wed, Oct 15, 2014 at 2:13 PM, Chai Wen <chaiw.fnst@cn.fujitsu.com> wrote:
> Hi developers,
>
> Sorry for some disturbing.
>
> I am planing to use pmu counter cpu_cycles to measure the elapsed time of some small pieces of code.
> But I found that I can not make the PMU work properly.
>
> My cpu is called r8a7791, a 2-core cpu. And the kernel version is 3.10.31.
> The following is my simple code to test it and its result.
>
> I found that no matter what's the scale of this loop:
>         for (i = 0; i < loops; i++) {
>                 __asm__ __volatile__("mov r0, r0\n\t");
>         }
>         the cycles counts got via:
>         __asm__ __volatile__("MRC p15, 0, %0, c9, c13, 0\n\t" : "=r"(count));
>         are not significantly different from each other. I am confused about these values.
>
> Any comment or help is appreciated, thanks a lot.

I'm not aware of any known issue like this, however I do notice that
you are using a rather old kernel version. The kernel version may be
unrelated to the root cause of your issue, however from a community
stand point I recommend that you try to reproduce this issue on the
latest upstream with the most recent boot loader for your board. If
upstream is not working then we need to fix that first.

Cheers,

/ magnus

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Issue] access cpu_cycles PMU conter in r8a7791
  2014-10-15 12:13 [Issue] access cpu_cycles PMU conter in r8a7791 Chai Wen
  2014-10-17  5:15 ` Magnus Damm
@ 2014-10-17 11:37 ` Chai Wen
  2014-10-20 10:14 ` Geert Uytterhoeven
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Chai Wen @ 2014-10-17 11:37 UTC (permalink / raw)
  To: linux-sh

On 10/17/2014 01:15 PM, Magnus Damm wrote:

> Hi Chai Wen,
> 
> On Wed, Oct 15, 2014 at 2:13 PM, Chai Wen <chaiw.fnst@cn.fujitsu.com> wrote:
>> Hi developers,
>>
>> Sorry for some disturbing.
>>
>> I am planing to use pmu counter cpu_cycles to measure the elapsed time of some small pieces of code.
>> But I found that I can not make the PMU work properly.
>>
>> My cpu is called r8a7791, a 2-core cpu. And the kernel version is 3.10.31.
>> The following is my simple code to test it and its result.
>>
>> I found that no matter what's the scale of this loop:
>>         for (i = 0; i < loops; i++) {
>>                 __asm__ __volatile__("mov r0, r0\n\t");
>>         }
>>         the cycles counts got via:
>>         __asm__ __volatile__("MRC p15, 0, %0, c9, c13, 0\n\t" : "=r"(count));
>>         are not significantly different from each other. I am confused about these values.
>>
>> Any comment or help is appreciated, thanks a lot.
> 
> I'm not aware of any known issue like this, however I do notice that
> you are using a rather old kernel version. The kernel version may be
> unrelated to the root cause of your issue, however from a community
> stand point I recommend that you try to reproduce this issue on the
> latest upstream with the most recent boot loader for your board. If
> upstream is not working then we need to fix that first.


Hi Magnus Damm

thanks for your reply.

I will try this on the latest upstream kernel version.


thanks
chai wen

> 
> Cheers,
> 
> / magnus
> .
> 



-- 
Regards

Chai Wen

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Issue] access cpu_cycles PMU conter in r8a7791
  2014-10-15 12:13 [Issue] access cpu_cycles PMU conter in r8a7791 Chai Wen
  2014-10-17  5:15 ` Magnus Damm
  2014-10-17 11:37 ` Chai Wen
@ 2014-10-20 10:14 ` Geert Uytterhoeven
  2014-10-20 10:29 ` Geert Uytterhoeven
  2014-10-20 11:15 ` Chai Wen
  4 siblings, 0 replies; 6+ messages in thread
From: Geert Uytterhoeven @ 2014-10-20 10:14 UTC (permalink / raw)
  To: linux-sh

Hi Wen-san,

On Wed, Oct 15, 2014 at 2:13 PM, Chai Wen <chaiw.fnst@cn.fujitsu.com> wrote:
> I am planing to use pmu counter cpu_cycles to measure the elapsed time of some small pieces of code.
> But I found that I can not make the PMU work properly.
>
> My cpu is called r8a7791, a 2-core cpu. And the kernel version is 3.10.31.
> The following is my simple code to test it and its result.
>
> I found that no matter what's the scale of this loop:
>         for (i = 0; i < loops; i++) {
>                 __asm__ __volatile__("mov r0, r0\n\t");
>         }
>         the cycles counts got via:
>         __asm__ __volatile__("MRC p15, 0, %0, c9, c13, 0\n\t" : "=r"(count));
>         are not significantly different from each other. I am confused about these values.
>
> Any comment or help is appreciated, thanks a lot.
>
> ============
> num of counters: 6
> before count: 0
> loops is 100 takes cycles 104665
> num of counters: 6
> before count: 0
> loops is 500 takes cycles 104650
> num of counters: 6
> before count: 0
> loops is 1000 takes cycles 104827
> num of counters: 6
> before count: 0
> loops is 5000 takes cycles 120351
> num of counters: 6
> before count: 0
> loops is 10000 takes cycles 105041
> num of counters: 6
> before count: 0
> loops is 50000 takes cycles 115470
> num of counters: 6
> before count: 0
> loops is 100000 takes cycles 107905
> num of counters: 6
> before count: 0
> loops is 500000 takes cycles 120427
> num of counters: 6
> before count: 9
> loops is 1000000 takes cycles 145033

I get similar results when compiling your test program as a loadable
module. When built-in, the results look saner, but still not perfect.

After wrapping everything in local_irq_save(flags) / local_irq_restore(flags),
and moving the printk() outside the measurement loop, it looks much better:

On r8a7791:

num of counters: 6
before count: 0
loops is 100 takes cycles 26
num of counters: 6
before count: 0
loops is 1000 takes cycles 265
num of counters: 6
before count: 0
loops is 10000 takes cycles 2656
num of counters: 6
before count: 0
loops is 100000 takes cycles 26563
num of counters: 6
before count: 0
loops is 1000000 takes cycles 265625
num of counters: 6
before count: 0
loops is 10000000 takes cycles 2656250
num of counters: 6
before count: 0
loops is 100000000 takes cycles 26562500

On r8a7740:

num of counters: 6
before count: 0
loops is 100 takes cycles 5
num of counters: 6
before count: 0
loops is 1000 takes cycles 40
num of counters: 6
before count: 0
loops is 10000 takes cycles 391
num of counters: 6
before count: 0
loops is 100000 takes cycles 3907
num of counters: 6
before count: 0
loops is 1000000 takes cycles 39063
num of counters: 6
before count: 0
loops is 10000000 takes cycles 390626
num of counters: 6
before count: 0
loops is 100000000 takes cycles 3906255

Thanks a lot for your test code, I was just about to do high-resolution
measurements on r8a7740 anyway ;-)

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Issue] access cpu_cycles PMU conter in r8a7791
  2014-10-15 12:13 [Issue] access cpu_cycles PMU conter in r8a7791 Chai Wen
                   ` (2 preceding siblings ...)
  2014-10-20 10:14 ` Geert Uytterhoeven
@ 2014-10-20 10:29 ` Geert Uytterhoeven
  2014-10-20 11:15 ` Chai Wen
  4 siblings, 0 replies; 6+ messages in thread
From: Geert Uytterhoeven @ 2014-10-20 10:29 UTC (permalink / raw)
  To: linux-sh

On Mon, Oct 20, 2014 at 12:14 PM, Geert Uytterhoeven
<geert@linux-m68k.org> wrote:
> After wrapping everything in local_irq_save(flags) / local_irq_restore(flags),
> and moving the printk() outside the measurement loop, it looks much better:

The printk() was the biggest culprit. Disabling interrupts isn't needed.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Issue] access cpu_cycles PMU conter in r8a7791
  2014-10-15 12:13 [Issue] access cpu_cycles PMU conter in r8a7791 Chai Wen
                   ` (3 preceding siblings ...)
  2014-10-20 10:29 ` Geert Uytterhoeven
@ 2014-10-20 11:15 ` Chai Wen
  4 siblings, 0 replies; 6+ messages in thread
From: Chai Wen @ 2014-10-20 11:15 UTC (permalink / raw)
  To: linux-sh

On 10/20/2014 06:14 PM, Geert Uytterhoeven wrote:

> Hi Wen-san,
> 
> On Wed, Oct 15, 2014 at 2:13 PM, Chai Wen <chaiw.fnst@cn.fujitsu.com> wrote:
>> I am planing to use pmu counter cpu_cycles to measure the elapsed time of some small pieces of code.
>> But I found that I can not make the PMU work properly.
>>
>> My cpu is called r8a7791, a 2-core cpu. And the kernel version is 3.10.31.
>> The following is my simple code to test it and its result.
>>
>> I found that no matter what's the scale of this loop:
>>         for (i = 0; i < loops; i++) {
>>                 __asm__ __volatile__("mov r0, r0\n\t");
>>         }
>>         the cycles counts got via:
>>         __asm__ __volatile__("MRC p15, 0, %0, c9, c13, 0\n\t" : "=r"(count));
>>         are not significantly different from each other. I am confused about these values.
>>
>> Any comment or help is appreciated, thanks a lot.
>>
>> ============
>> num of counters: 6
>> before count: 0
>> loops is 100 takes cycles 104665
>> num of counters: 6
>> before count: 0
>> loops is 500 takes cycles 104650
>> num of counters: 6
>> before count: 0
>> loops is 1000 takes cycles 104827
>> num of counters: 6
>> before count: 0
>> loops is 5000 takes cycles 120351
>> num of counters: 6
>> before count: 0
>> loops is 10000 takes cycles 105041
>> num of counters: 6
>> before count: 0
>> loops is 50000 takes cycles 115470
>> num of counters: 6
>> before count: 0
>> loops is 100000 takes cycles 107905
>> num of counters: 6
>> before count: 0
>> loops is 500000 takes cycles 120427
>> num of counters: 6
>> before count: 9
>> loops is 1000000 takes cycles 145033
> 
> I get similar results when compiling your test program as a loadable
> module. When built-in, the results look saner, but still not perfect.
> 
> After wrapping everything in local_irq_save(flags) / local_irq_restore(flags),
> and moving the printk() outside the measurement loop, it looks much better:



Hi Geert Uytterhoeven

Thank you very much for your test and detail explanation in your another reply.

Indeed,  printk is a rather heavy operation. The result became much saner just after I removed
the printk in the loop. And it was rather stable.

It is a mistake of my pool knowledge of printk.
Glad to see that this code could help you ;).

thanks
chai wen

> 
> On r8a7791:
> 
> num of counters: 6
> before count: 0
> loops is 100 takes cycles 26
> num of counters: 6
> before count: 0
> loops is 1000 takes cycles 265
> num of counters: 6
> before count: 0
> loops is 10000 takes cycles 2656
> num of counters: 6
> before count: 0
> loops is 100000 takes cycles 26563
> num of counters: 6
> before count: 0
> loops is 1000000 takes cycles 265625
> num of counters: 6
> before count: 0
> loops is 10000000 takes cycles 2656250
> num of counters: 6
> before count: 0
> loops is 100000000 takes cycles 26562500
> 
> On r8a7740:
> 
> num of counters: 6
> before count: 0
> loops is 100 takes cycles 5
> num of counters: 6
> before count: 0
> loops is 1000 takes cycles 40
> num of counters: 6
> before count: 0
> loops is 10000 takes cycles 391
> num of counters: 6
> before count: 0
> loops is 100000 takes cycles 3907
> num of counters: 6
> before count: 0
> loops is 1000000 takes cycles 39063
> num of counters: 6
> before count: 0
> loops is 10000000 takes cycles 390626
> num of counters: 6
> before count: 0
> loops is 100000000 takes cycles 3906255
> 
> Thanks a lot for your test code, I was just about to do high-resolution
> measurements on r8a7740 anyway ;-)
> 
> Gr{oetje,eeting}s,
> 
>                         Geert
> 
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
> 
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                 -- Linus Torvalds
> .
> 



-- 
Regards

Chai Wen

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2014-10-20 11:15 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-10-15 12:13 [Issue] access cpu_cycles PMU conter in r8a7791 Chai Wen
2014-10-17  5:15 ` Magnus Damm
2014-10-17 11:37 ` Chai Wen
2014-10-20 10:14 ` Geert Uytterhoeven
2014-10-20 10:29 ` Geert Uytterhoeven
2014-10-20 11:15 ` Chai Wen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).