* sched_clock - microblaze
@ 2010-04-15 14:55 Michal Simek
2010-04-15 15:04 ` Steven Rostedt
` (2 more replies)
0 siblings, 3 replies; 13+ messages in thread
From: Michal Simek @ 2010-04-15 14:55 UTC (permalink / raw)
To: Thomas Gleixner, Steven Rostedt, LKML; +Cc: Wu Zhangjin
Hi Thomas and Steven,
I would like to improve time measuring for ftrace (Log below)
I looked at http://lkml.org/lkml/2009/11/16/181
where Thomas suggested to create sched_clock() function.
I used the same solution as Wu proposed but it is not nice.
Is unimplemented sched_clock the reason why ftrace not show fine grain time?
Or is there any other thing which is wrong?
Thanks,
Michal
Dumping ftrace buffer:
---------------------------------
0) ! 10000.00 us | }
0) <========== |
0) ! 10000.00 us | }
0) ! 10000.00 us | }
0) 0.000 us | get_seconds();
0) 0.000 us | tcp_rcv_rtt_update();
0) 0.000 us | tcp_ack();
0) 0.000 us | tcp_rcv_rtt_update();
0) 0.000 us | tcp_urg();
0) | tcp_data_queue() {
0) 0.000 us | __sk_mem_schedule();
0) 0.000 us | tcp_event_data_recv();
0) 0.000 us | sock_def_readable();
0) 0.000 us | }
0) 0.000 us | tcp_check_space();
0) | __tcp_ack_snd_check() {
--
Michal Simek, Ing. (M.Eng)
PetaLogix - Linux Solutions for a Reconfigurable World
w: www.petalogix.com p: +61-7-30090663,+42-0-721842854 f: +61-7-30090663
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: sched_clock - microblaze 2010-04-15 14:55 sched_clock - microblaze Michal Simek @ 2010-04-15 15:04 ` Steven Rostedt 2010-04-15 16:32 ` Michal Simek 2010-04-15 15:06 ` Steven J. Magnani 2010-04-15 17:18 ` Wu Zhangjin 2 siblings, 1 reply; 13+ messages in thread From: Steven Rostedt @ 2010-04-15 15:04 UTC (permalink / raw) To: michal.simek; +Cc: Thomas Gleixner, LKML, Wu Zhangjin On Thu, 2010-04-15 at 16:55 +0200, Michal Simek wrote: > Hi Thomas and Steven, > > I would like to improve time measuring for ftrace (Log below) > > I looked at http://lkml.org/lkml/2009/11/16/181 > where Thomas suggested to create sched_clock() function. > I used the same solution as Wu proposed but it is not nice. > > Is unimplemented sched_clock the reason why ftrace not show fine grain time? Yeah, sched_clock is used by ftrace for timings, so if it only returns jiffies, then that will, unfortunately, be the resolution of the tracer. I've been told that if you make a higher resolution timer for sched clock, it will improve the scheduling in CFS. -- Steve > > Or is there any other thing which is wrong? > > Thanks, > Michal > > > > Dumping ftrace buffer: > --------------------------------- > 0) ! 10000.00 us | } > 0) <========== | > 0) ! 10000.00 us | } > 0) ! 10000.00 us | } > 0) 0.000 us | get_seconds(); > 0) 0.000 us | tcp_rcv_rtt_update(); > 0) 0.000 us | tcp_ack(); > 0) 0.000 us | tcp_rcv_rtt_update(); > 0) 0.000 us | tcp_urg(); > 0) | tcp_data_queue() { > 0) 0.000 us | __sk_mem_schedule(); > 0) 0.000 us | tcp_event_data_recv(); > 0) 0.000 us | sock_def_readable(); > 0) 0.000 us | } > 0) 0.000 us | tcp_check_space(); > 0) | __tcp_ack_snd_check() { > > ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: sched_clock - microblaze 2010-04-15 15:04 ` Steven Rostedt @ 2010-04-15 16:32 ` Michal Simek 2010-04-15 18:12 ` Wu Zhangjin 0 siblings, 1 reply; 13+ messages in thread From: Michal Simek @ 2010-04-15 16:32 UTC (permalink / raw) To: Steven Rostedt; +Cc: Thomas Gleixner, LKML, Wu Zhangjin Steven Rostedt wrote: > On Thu, 2010-04-15 at 16:55 +0200, Michal Simek wrote: >> Hi Thomas and Steven, >> >> I would like to improve time measuring for ftrace (Log below) >> >> I looked at http://lkml.org/lkml/2009/11/16/181 >> where Thomas suggested to create sched_clock() function. >> I used the same solution as Wu proposed but it is not nice. >> >> Is unimplemented sched_clock the reason why ftrace not show fine grain time? > > Yeah, sched_clock is used by ftrace for timings, so if it only returns > jiffies, then that will, unfortunately, be the resolution of the tracer. > > I've been told that if you make a higher resolution timer for sched > clock, it will improve the scheduling in CFS. We have two timers (in one IP). timer0 - in IRQ mode - clock event timer1 - free running up counter without IRQ - clock source I hope that I can use timer1 in sched_clock too. Anyway. I played with it a little bit and I also implemented microblaze specific sched_clock function (inspired by arch/arm/mach-ixp4xx/common.c) unsigned long long notrace sched_clock(void) { printk("%s\n", __func__); cycle_t cyc = microblaze_read(NULL); struct clocksource *cs = &clocksource_microblaze; return clocksource_cyc2ns(cyc, cs->mult, cs->shift); } (And comment sched_clock in kernel/sched_clock.c because our toolchain has the problem with weak symbols) Recompile the kernel and then show the log_buf <4>Ramdisk addr 0x00000003, Compiled-in FDT at 0xc0242a28 <5>Linux version 2.6.34-rc4-00053-gbc6ce8a-dirty (monstr@monstr.eu) (gcc version 4.1.2) #27 Thu Apr 15 16:38:16 CEST 2010 <6>setup_cpuinfo: initialising <6>setup_cpuinfo: Using full CPU PVR support <6>cache: wb_msr <6>setup_memory: Main mem: 0x48000000-0x50000000, size 0x08000000, klimit 0xc09c7000 <6>setup_memory: max_mapnr: 0x8000 <6>setup_memory: min_low_pfn: 0x48000 <6>setup_memory: max_low_pfn: 0x50000 <4>reserved 0 - 0x48000000-0x009c8000 <4>reserved 1 - 0x48fe0000-0x00020000 <4>reserved 2 - 0x4fffd8c8-0x00002738 <7>On node 0 totalpages: 32768 <7>free_area_init_node: node 0, pgdat c02bc70c, node_mem_map c09c8000 <7> Normal zone: 256 pages used for memmap <7> Normal zone: 32512 pages, LIFO batch:7 <4>Built 1 zonelists in Zone order, mobility grouping on. Total pages: 32512 <5>Kernel command line: console=ttyS0,115200 <6>PID hash table entries: 512 (order: -1, 2048 bytes) <6>Dentry cache hash table entries: 16384 (order: 4, 65536 bytes) <6>Inode-cache hash table entries: 8192 (order: 3, 32768 bytes) <6>Memory: 119796k/131072k available <4>sched_clock The problem I see is that scheck_clock is used before our timer subsystem is initialized. What am I missing? Thanks, Michal -- Michal Simek, Ing. (M.Eng) PetaLogix - Linux Solutions for a Reconfigurable World w: www.petalogix.com p: +61-7-30090663,+42-0-721842854 f: +61-7-30090663 ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: sched_clock - microblaze 2010-04-15 16:32 ` Michal Simek @ 2010-04-15 18:12 ` Wu Zhangjin 2010-04-15 19:29 ` Michal Simek 0 siblings, 1 reply; 13+ messages in thread From: Wu Zhangjin @ 2010-04-15 18:12 UTC (permalink / raw) To: michal.simek; +Cc: Steven Rostedt, Thomas Gleixner, LKML On Thu, 2010-04-15 at 18:32 +0200, Michal Simek wrote: > Steven Rostedt wrote: > > On Thu, 2010-04-15 at 16:55 +0200, Michal Simek wrote: > >> Hi Thomas and Steven, > >> > >> I would like to improve time measuring for ftrace (Log below) > >> > >> I looked at http://lkml.org/lkml/2009/11/16/181 > >> where Thomas suggested to create sched_clock() function. > >> I used the same solution as Wu proposed but it is not nice. > >> > >> Is unimplemented sched_clock the reason why ftrace not show fine grain time? > > > > Yeah, sched_clock is used by ftrace for timings, so if it only returns > > jiffies, then that will, unfortunately, be the resolution of the tracer. > > > > I've been told that if you make a higher resolution timer for sched > > clock, it will improve the scheduling in CFS. > > We have two timers (in one IP). > timer0 - in IRQ mode - clock event > timer1 - free running up counter without IRQ - clock source > > I hope that I can use timer1 in sched_clock too. > > Anyway. I played with it a little bit and I also implemented microblaze > specific sched_clock function (inspired by arch/arm/mach-ixp4xx/common.c) > > unsigned long long notrace sched_clock(void) > { > printk("%s\n", __func__); > cycle_t cyc = microblaze_read(NULL); > struct clocksource *cs = &clocksource_microblaze; > return clocksource_cyc2ns(cyc, cs->mult, cs->shift); > } Hey, I guess you got: sched_clock() -> printk() -> sched_clock() ... since printk() is not annotated with notrace, it will call mcount() and then sched_clock() and ... so, you can not call printk() in sched_clock(). but you may try this if you want to print something in sched_clock() for debugging. #include <> sched_clock() { ftrace_stop(); printk(...); ftrace_start(); ... } Regards, Wu Zhangjin > > (And comment sched_clock in kernel/sched_clock.c because our toolchain > has the problem with weak symbols) > > Recompile the kernel and then show the log_buf > > <4>Ramdisk addr 0x00000003, Compiled-in FDT at 0xc0242a28 > <5>Linux version 2.6.34-rc4-00053-gbc6ce8a-dirty (monstr@monstr.eu) (gcc > version 4.1.2) #27 Thu Apr 15 16:38:16 CEST 2010 > <6>setup_cpuinfo: initialising > <6>setup_cpuinfo: Using full CPU PVR support > <6>cache: wb_msr > <6>setup_memory: Main mem: 0x48000000-0x50000000, size 0x08000000, > klimit 0xc09c7000 > <6>setup_memory: max_mapnr: 0x8000 > <6>setup_memory: min_low_pfn: 0x48000 > <6>setup_memory: max_low_pfn: 0x50000 > <4>reserved 0 - 0x48000000-0x009c8000 > <4>reserved 1 - 0x48fe0000-0x00020000 > <4>reserved 2 - 0x4fffd8c8-0x00002738 > <7>On node 0 totalpages: 32768 > <7>free_area_init_node: node 0, pgdat c02bc70c, node_mem_map c09c8000 > <7> Normal zone: 256 pages used for memmap > <7> Normal zone: 32512 pages, LIFO batch:7 > <4>Built 1 zonelists in Zone order, mobility grouping on. Total pages: > 32512 > <5>Kernel command line: console=ttyS0,115200 > <6>PID hash table entries: 512 (order: -1, 2048 bytes) > <6>Dentry cache hash table entries: 16384 (order: 4, 65536 bytes) > <6>Inode-cache hash table entries: 8192 (order: 3, 32768 bytes) > <6>Memory: 119796k/131072k available > <4>sched_clock > > The problem I see is that scheck_clock is used before our timer > subsystem is initialized. > What am I missing? > > Thanks, > Michal > ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: sched_clock - microblaze 2010-04-15 18:12 ` Wu Zhangjin @ 2010-04-15 19:29 ` Michal Simek 2010-04-15 19:52 ` Arnd Bergmann 0 siblings, 1 reply; 13+ messages in thread From: Michal Simek @ 2010-04-15 19:29 UTC (permalink / raw) To: wuzhangjin; +Cc: Steven Rostedt, Thomas Gleixner, LKML Wu Zhangjin wrote: > On Thu, 2010-04-15 at 18:32 +0200, Michal Simek wrote: >> Steven Rostedt wrote: >>> On Thu, 2010-04-15 at 16:55 +0200, Michal Simek wrote: >>>> Hi Thomas and Steven, >>>> >>>> I would like to improve time measuring for ftrace (Log below) >>>> >>>> I looked at http://lkml.org/lkml/2009/11/16/181 >>>> where Thomas suggested to create sched_clock() function. >>>> I used the same solution as Wu proposed but it is not nice. >>>> >>>> Is unimplemented sched_clock the reason why ftrace not show fine grain time? >>> Yeah, sched_clock is used by ftrace for timings, so if it only returns >>> jiffies, then that will, unfortunately, be the resolution of the tracer. >>> >>> I've been told that if you make a higher resolution timer for sched >>> clock, it will improve the scheduling in CFS. >> We have two timers (in one IP). >> timer0 - in IRQ mode - clock event >> timer1 - free running up counter without IRQ - clock source >> >> I hope that I can use timer1 in sched_clock too. >> >> Anyway. I played with it a little bit and I also implemented microblaze >> specific sched_clock function (inspired by arch/arm/mach-ixp4xx/common.c) >> >> unsigned long long notrace sched_clock(void) >> { >> printk("%s\n", __func__); >> cycle_t cyc = microblaze_read(NULL); >> struct clocksource *cs = &clocksource_microblaze; >> return clocksource_cyc2ns(cyc, cs->mult, cs->shift); >> } > > Hey, I guess you got: > > sched_clock() -> printk() -> sched_clock() ... > > since printk() is not annotated with notrace, it will call mcount() and > then sched_clock() and ... so, you can not call printk() in > sched_clock(). > > but you may try this if you want to print something in sched_clock() for > debugging. > > #include <> > > sched_clock() { > ftrace_stop(); > printk(...); > ftrace_start(); > ... > } But I am describing different problem. sched_clock is called before kernel initialized timers. Microblaze takes all information from DTS that's means that I am not able to read timer because I even don't know where it is. :-( Thanks, Michal > > Regards, > Wu Zhangjin > >> (And comment sched_clock in kernel/sched_clock.c because our toolchain >> has the problem with weak symbols) >> >> Recompile the kernel and then show the log_buf >> >> <4>Ramdisk addr 0x00000003, Compiled-in FDT at 0xc0242a28 >> <5>Linux version 2.6.34-rc4-00053-gbc6ce8a-dirty (monstr@monstr.eu) (gcc >> version 4.1.2) #27 Thu Apr 15 16:38:16 CEST 2010 >> <6>setup_cpuinfo: initialising >> <6>setup_cpuinfo: Using full CPU PVR support >> <6>cache: wb_msr >> <6>setup_memory: Main mem: 0x48000000-0x50000000, size 0x08000000, >> klimit 0xc09c7000 >> <6>setup_memory: max_mapnr: 0x8000 >> <6>setup_memory: min_low_pfn: 0x48000 >> <6>setup_memory: max_low_pfn: 0x50000 >> <4>reserved 0 - 0x48000000-0x009c8000 >> <4>reserved 1 - 0x48fe0000-0x00020000 >> <4>reserved 2 - 0x4fffd8c8-0x00002738 >> <7>On node 0 totalpages: 32768 >> <7>free_area_init_node: node 0, pgdat c02bc70c, node_mem_map c09c8000 >> <7> Normal zone: 256 pages used for memmap >> <7> Normal zone: 32512 pages, LIFO batch:7 >> <4>Built 1 zonelists in Zone order, mobility grouping on. Total pages: >> 32512 >> <5>Kernel command line: console=ttyS0,115200 >> <6>PID hash table entries: 512 (order: -1, 2048 bytes) >> <6>Dentry cache hash table entries: 16384 (order: 4, 65536 bytes) >> <6>Inode-cache hash table entries: 8192 (order: 3, 32768 bytes) >> <6>Memory: 119796k/131072k available >> <4>sched_clock >> >> The problem I see is that scheck_clock is used before our timer >> subsystem is initialized. >> What am I missing? >> >> Thanks, >> Michal >> > > -- Michal Simek, Ing. (M.Eng) PetaLogix - Linux Solutions for a Reconfigurable World w: www.petalogix.com p: +61-7-30090663,+42-0-721842854 f: +61-7-30090663 ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: sched_clock - microblaze 2010-04-15 19:29 ` Michal Simek @ 2010-04-15 19:52 ` Arnd Bergmann 2010-04-16 6:23 ` Michal Simek 0 siblings, 1 reply; 13+ messages in thread From: Arnd Bergmann @ 2010-04-15 19:52 UTC (permalink / raw) To: michal.simek; +Cc: wuzhangjin, Steven Rostedt, Thomas Gleixner, LKML On Thursday 15 April 2010 21:29:27 Michal Simek wrote: > > sched_clock() { > > ftrace_stop(); > > printk(...); > > ftrace_start(); > > ... > > } > > But I am describing different problem. sched_clock is called before > kernel initialized timers. Microblaze takes all information from DTS > that's means that I am not able to read timer because I even don't know > where it is. :-( You could make sched_clock return zero before the clock is probed, it is very unlikely to be needed before that anyway. Arnd ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: sched_clock - microblaze 2010-04-15 19:52 ` Arnd Bergmann @ 2010-04-16 6:23 ` Michal Simek 2010-04-16 8:37 ` Arnd Bergmann 0 siblings, 1 reply; 13+ messages in thread From: Michal Simek @ 2010-04-16 6:23 UTC (permalink / raw) To: Arnd Bergmann; +Cc: wuzhangjin, Steven Rostedt, Thomas Gleixner, LKML Arnd Bergmann wrote: > On Thursday 15 April 2010 21:29:27 Michal Simek wrote: >>> sched_clock() { >>> ftrace_stop(); >>> printk(...); >>> ftrace_start(); >>> ... >>> } >> But I am describing different problem. sched_clock is called before >> kernel initialized timers. Microblaze takes all information from DTS >> that's means that I am not able to read timer because I even don't know >> where it is. :-( > > You could make sched_clock return zero before the clock is probed, it > is very unlikely to be needed before that anyway. First of all - this technique works. That calling sequence is main.c: sched_init() -> sched.c: sched_init(); -> init_idle(); -> sched_clock() without initialized timer which is causing my problems. Why is sched_clock called in init_idle? Do you get any non zero value on any ARCH? Thanks, Michal diff --git a/kernel/sched.c b/kernel/sched.c index 6af210a..bddd918 100644 --- a/kernel/sched.c +++ b/kernel/sched.c @@ -5244,7 +5244,7 @@ void __cpuinit init_idle(struct task_struct *idle, int cpu) __sched_fork(idle); idle->state = TASK_RUNNING; - idle->se.exec_start = sched_clock(); + idle->se.exec_start = 0; cpumask_copy(&idle->cpus_allowed, cpumask_of(cpu)); __set_task_cpu(idle, cpu); -- Michal Simek, Ing. (M.Eng) PetaLogix - Linux Solutions for a Reconfigurable World w: www.petalogix.com p: +61-7-30090663,+42-0-721842854 f: +61-7-30090663 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: sched_clock - microblaze 2010-04-16 6:23 ` Michal Simek @ 2010-04-16 8:37 ` Arnd Bergmann 2010-04-16 9:11 ` Michal Simek 0 siblings, 1 reply; 13+ messages in thread From: Arnd Bergmann @ 2010-04-16 8:37 UTC (permalink / raw) To: michal.simek; +Cc: wuzhangjin, Steven Rostedt, Thomas Gleixner, LKML On Friday 16 April 2010, Michal Simek wrote: > Why is sched_clock called in init_idle? Do you get any non zero value on > any ARCH? Yes, some architectures return a real-time value here that starts before boot. Also, when the clock starts at boot time, there can be a significant time spent in the firmware and early kernel code before we get to init_idle. Arnd ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: sched_clock - microblaze 2010-04-16 8:37 ` Arnd Bergmann @ 2010-04-16 9:11 ` Michal Simek 0 siblings, 0 replies; 13+ messages in thread From: Michal Simek @ 2010-04-16 9:11 UTC (permalink / raw) To: Arnd Bergmann; +Cc: wuzhangjin, Steven Rostedt, Thomas Gleixner, LKML Arnd Bergmann wrote: > On Friday 16 April 2010, Michal Simek wrote: >> Why is sched_clock called in init_idle? Do you get any non zero value on >> any ARCH? > > Yes, some architectures return a real-time value here that starts before boot. > Also, when the clock starts at boot time, there can be a significant time > spent in the firmware and early kernel code before we get to init_idle. ok. I will return 0 before timer initialization. Thanks for explanation, Michal > > Arnd -- Michal Simek, Ing. (M.Eng) PetaLogix - Linux Solutions for a Reconfigurable World w: www.petalogix.com p: +61-7-30090663,+42-0-721842854 f: +61-7-30090663 ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: sched_clock - microblaze 2010-04-15 14:55 sched_clock - microblaze Michal Simek 2010-04-15 15:04 ` Steven Rostedt @ 2010-04-15 15:06 ` Steven J. Magnani 2010-04-15 16:21 ` Michal Simek 2010-04-15 17:18 ` Wu Zhangjin 2 siblings, 1 reply; 13+ messages in thread From: Steven J. Magnani @ 2010-04-15 15:06 UTC (permalink / raw) To: michal.simek; +Cc: Thomas Gleixner, Steven Rostedt, LKML, Wu Zhangjin Michal, On Thu, 2010-04-15 at 16:55 +0200, Michal Simek wrote: > Is unimplemented sched_clock the reason why ftrace not show fine grain time? > > Or is there any other thing which is wrong? > I think that's it. On my platform we have a free-running 1 MHz counter so I implemented a platform sched_clock for that, and I get nice ftrace times. 0) + 65.000 us | finish_task_switch(); 0) | lock_sock_nested() { 0) + 52.000 us | local_bh_disable(); 0) + 53.000 us | local_bh_enable(); 0) ! 264.000 us | } Steve ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: sched_clock - microblaze 2010-04-15 15:06 ` Steven J. Magnani @ 2010-04-15 16:21 ` Michal Simek 2010-04-15 18:26 ` Thomas Gleixner 0 siblings, 1 reply; 13+ messages in thread From: Michal Simek @ 2010-04-15 16:21 UTC (permalink / raw) To: steve; +Cc: Thomas Gleixner, Steven Rostedt, LKML, Wu Zhangjin Steven J. Magnani wrote: > Michal, > > On Thu, 2010-04-15 at 16:55 +0200, Michal Simek wrote: > >> Is unimplemented sched_clock the reason why ftrace not show fine grain time? >> >> Or is there any other thing which is wrong? >> > > I think that's it. On my platform we have a free-running 1 MHz counter > so I implemented a platform sched_clock for that, and I get nice ftrace > times. > > 0) + 65.000 us | finish_task_switch(); > 0) | lock_sock_nested() { > 0) + 52.000 us | local_bh_disable(); > 0) + 53.000 us | local_bh_enable(); > 0) ! 264.000 us | } I don't understand why I should add any "new" free running counter because we have one free running counter which do it (clocksource timer - timer1). Or am I missing something? Thanks for sending your code. Michal -- Michal Simek, Ing. (M.Eng) PetaLogix - Linux Solutions for a Reconfigurable World w: www.petalogix.com p: +61-7-30090663,+42-0-721842854 f: +61-7-30090663 ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: sched_clock - microblaze 2010-04-15 16:21 ` Michal Simek @ 2010-04-15 18:26 ` Thomas Gleixner 0 siblings, 0 replies; 13+ messages in thread From: Thomas Gleixner @ 2010-04-15 18:26 UTC (permalink / raw) To: Michal Simek; +Cc: steve, Steven Rostedt, LKML, Wu Zhangjin On Thu, 15 Apr 2010, Michal Simek wrote: > Steven J. Magnani wrote: > > Michal, > > > > On Thu, 2010-04-15 at 16:55 +0200, Michal Simek wrote: > > > > > Is unimplemented sched_clock the reason why ftrace not show fine grain > > > time? > > > > > > Or is there any other thing which is wrong? > > > > > > > I think that's it. On my platform we have a free-running 1 MHz counter > > so I implemented a platform sched_clock for that, and I get nice ftrace > > times. > > > > 0) + 65.000 us | finish_task_switch(); > > 0) | lock_sock_nested() { > > 0) + 52.000 us | local_bh_disable(); > > 0) + 53.000 us | local_bh_enable(); > > 0) ! 264.000 us | } > > I don't understand why I should add any "new" free running counter because we > have one free running counter which do it (clocksource timer - timer1). Or am > I missing something? No, you just can use the clocksource for it. We have that separation as on x86 we want to use the TSC even if it is not usable for time keeping as it is the fastest accesible clock in x86. So for your case you can just use your main clocksource as sched_clock and be done. Thanks, tglx ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: sched_clock - microblaze 2010-04-15 14:55 sched_clock - microblaze Michal Simek 2010-04-15 15:04 ` Steven Rostedt 2010-04-15 15:06 ` Steven J. Magnani @ 2010-04-15 17:18 ` Wu Zhangjin 2 siblings, 0 replies; 13+ messages in thread From: Wu Zhangjin @ 2010-04-15 17:18 UTC (permalink / raw) To: michal.simek; +Cc: Thomas Gleixner, Steven Rostedt, LKML On Thu, 2010-04-15 at 16:55 +0200, Michal Simek wrote: > Hi Thomas and Steven, > > I would like to improve time measuring for ftrace (Log below) > > I looked at http://lkml.org/lkml/2009/11/16/181 > where Thomas suggested to create sched_clock() function. > I used the same solution as Wu proposed but it is not nice. Which solution did you use? >From my point of view, the basic idea of high resolution sched_clock() looks like: 0. read the clock cycles from the hardware clock counter cyc = read_clock_counter() read_clock_counter() is processor specific, for example, in X86, there is a tsc register, and in a r4k MIPS, there is a c0 counter. such counters can provide a high resolution of time counting: the tsc register's frequency is equal to the frequency of the processor, if the frequency of the processor is 1G, then one cycle of the counter gives us 1ns. 1. convert the clock cycles to ns ns = cyc * NSEC_PER_SEC / freq (freq is clock freq of the counter) = use the scaling math. (please refer to arch/x86/kernel/tsc.c) = (cyc * mult) >> shift (refer to clocksource_calc_mult_shift() ) so, you need to implement the arch specific cyc2ns(), now, we get: cyc = read_clock_counter(); ns = cyc2ns(cyc); 2. handle the overflow problems + overflow of 32bit counter, solution: cnt32_to_63() If the hardware clock counter is 32bit long, you need to 'avoid' its counting overflow via cnt32_to_63() defined in include/linux/cnt32_to_63() Please ensure the cnt32_to_63() is called every less than half of the overflow period(the max time the counter can record) of the hardware clock counter, now, we get: #ifdef CONFIG_64BIT cyc = read_clock_counter(); #else /* CONFIG_32BIT */ cyc = cnt32_to_63(read_clock_counter()); #endif ns = cyc2ns(cyc); and a kernel software timer(setup_timer(), mod_timer()) is needed to update call the cnt32_to_63() automatically. [why cnt32_to_63()? it has less overhead than the other methods, will talk about it later.] 'avoid' above means avoid it overflow too quickly, for example, a 32bit clock counter with 400MHz may overflow after about 10+ seconds, which is not enough for Ftrace since a function may be executed for more 200+ seconds, we need a bigger 'counter', that is why cnt32_to_63() is needed, it tries to convert the 32bit counter to a virtual 63bit(the highest bit is used to sync the virtual high 32bit with the low real 32bit in the hardware counter) long, then with the 63bit counter, it will overflow after several hundreds of days, which is enough for generic debugging. + overflow of (cyc * mult), solution: smaller shift OR 128bit arithmetic And please ensure the shift is not too big(Seems ARM use 8 as the shift, and X86 use 10), otherwise, the 64bit arithmetic (cyc * mult) will overflow quickly. As we can see, the type of the return value of sched_clock() is unsigned long long, 64bit, so, the value of (cyc * mult) is limited to 64bit. And because cyc is also in 64bit (63bit if using cnt32_to_63()), mult can not be too big, otherwise, (cyc * mult) will overflow quickly, mult is relative to shift, then shift can not be too big. In another aspect, the shift can not be too small, otherwise, the precision will be lost. to learn how to choose such a shift, please refer to arch/arm/plat-orion/time.c and arch/x86/kernel/tsc.c. By default, the mult and shift in struct clocksource is calculated by clocksource_calc_mult_shift(), both of them are in 32bit, and in some platforms, for a 32bit counter, the shift is calculated from 32 to 0 and will be set as 31 for getting a good precision, but it is too big a value for a virtual 63bit counter, so, we need to manually set it a smaller one, the same to a real 64bit hardware counter. In some platforms, they have used the 128bit arithmetic to handle the overflow problem for a real 64bit hardware counter(please refer to arch/mips/cavium-octeon/csrc-octeon.c), perhaps we also need to use it for clocksource_cyc2ns() too(I did not check the potential overflow problem in such processor yet, it should overflow if the shift is too big!), but we also need to notice that, 128bit arithmetic may not be supported directly in some processors or it may increase the overhead obviously. 3. why not getnstimeofday() we need a lower overhead of clock func (please refer to kernel/trace/trace_clock.c) to make Ftrace has lower overhead for the clock func is a basic function needed to get the timestamp of entering into and exiting from a kernel function. I have done some overhead benchmarking of the cnt32_to_63() based sched_clock() and getnstimeofday() on a MIPS machine with 64bit kernel: Clock func/overhead(ns) Min Avg Max Jitter Std.Dev. ---------------------------------------------- sched_clock(cnt32_to_63) 105 116.2 236 131 9.5 getnstimeofday() 160 167.1 437 277 15 sched_clock(Accumulation method[1]) 193 200.9 243 50 2.9 ---------------------------------------------- 'Overhead' above means the time spent on calling clock func themselves, as we can see, the result of cnt32_to_63() base method is better than the other two. [1] the algorithm looks like this: static inline unsigned long long notrace read_clock_counter(void) { static u64 clock; static u32 old_clock; u32 current_clock; raw_spin_lock(&clock_lock); current_clock = read_c0_count(); clock += ((current_clock - old_clock) & MASK); old_clock = current_clock; raw_spin_unlock(&clock_lock); return clock; } Regards, Wu Zhangjin > > Is unimplemented sched_clock the reason why ftrace not show fine grain time? > > Or is there any other thing which is wrong? > > Thanks, > Michal > > > > Dumping ftrace buffer: > --------------------------------- > 0) ! 10000.00 us | } > 0) <========== | > 0) ! 10000.00 us | } > 0) ! 10000.00 us | } > 0) 0.000 us | get_seconds(); > 0) 0.000 us | tcp_rcv_rtt_update(); > 0) 0.000 us | tcp_ack(); > 0) 0.000 us | tcp_rcv_rtt_update(); > 0) 0.000 us | tcp_urg(); > 0) | tcp_data_queue() { > 0) 0.000 us | __sk_mem_schedule(); > 0) 0.000 us | tcp_event_data_recv(); > 0) 0.000 us | sock_def_readable(); > 0) 0.000 us | } > 0) 0.000 us | tcp_check_space(); > 0) | __tcp_ack_snd_check() { > > ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2010-04-16 9:11 UTC | newest] Thread overview: 13+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2010-04-15 14:55 sched_clock - microblaze Michal Simek 2010-04-15 15:04 ` Steven Rostedt 2010-04-15 16:32 ` Michal Simek 2010-04-15 18:12 ` Wu Zhangjin 2010-04-15 19:29 ` Michal Simek 2010-04-15 19:52 ` Arnd Bergmann 2010-04-16 6:23 ` Michal Simek 2010-04-16 8:37 ` Arnd Bergmann 2010-04-16 9:11 ` Michal Simek 2010-04-15 15:06 ` Steven J. Magnani 2010-04-15 16:21 ` Michal Simek 2010-04-15 18:26 ` Thomas Gleixner 2010-04-15 17:18 ` Wu Zhangjin
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox