* Module vs Kernel main performance
@ 2012-05-29 23:50 Abu Rasheda
  2012-05-30  4:18 ` Mulyadi Santosa
  2012-06-07 23:36 ` Peter Senna Tschudin
  0 siblings, 2 replies; 16+ messages in thread
From: Abu Rasheda @ 2012-05-29 23:50 UTC (permalink / raw)
To: kernelnewbies
Hi,
I am working on the x86_64 arch. I profiled (with oprofile) a Linux
kernel module and noticed that a whole lot of cycles are spent in the
copy_from_user call. I compared the same flow in kernel proper and
noticed that, for higher data throughput, far fewer cycles are spent in
copy_from_user: kernel proper uses about 1/8 of the cycles the module
does. (There is a user process that keeps sending data, like iperf.)
I used the perf tool to gather some statistics. For the call from kernel proper:
185,719,857,837 cpu-cycles          #    3.318 GHz                  [90.01%]
 99,886,030,243 instructions        #    0.54  insns per cycle      [95.00%]
  1,696,072,702 cache-references    #   30.297 M/sec                [94.99%]
    786,929,244 cache-misses        #   46.397 % of all cache refs  [95.00%]
 16,867,747,688 branch-instructions #  301.307 M/sec                [95.03%]
     86,752,646 branch-misses       #    0.51% of all branches      [95.00%]
  5,482,768,332 bus-cycles          #   97.938 M/sec                [20.08%]
   55967.269801 cpu-clock
   55981.842225 task-clock          #    0.933 CPUs utilized
and for the call from the kernel module:
  9,388,787,678 cpu-cycles          #    1.527 GHz                  [89.77%]
  1,706,203,221 instructions        #    0.18  insns per cycle      [94.59%]
    551,010,961 cache-references    #   89.588 M/sec                [94.73%]
    369,632,492 cache-misses        #   67.083 % of all cache refs  [95.18%]
    291,358,658 branch-instructions #   47.372 M/sec                [94.68%]
     10,291,678 branch-misses       #    3.53% of all branches      [95.01%]
    582,651,999 bus-cycles          #   94.733 M/sec                [20.55%]
    6112.471585 cpu-clock
    6150.490210 task-clock          #    0.102 CPUs utilized
            367 page-faults         #    0.000 M/sec
            367 minor-faults        #    0.000 M/sec
              0 major-faults        #    0.000 M/sec
         25,770 context-switches    #    0.004 M/sec
             23 cpu-migrations      #    0.000 M/sec
So obviously the CPU is stalling while it is copying data, and there are
more cache misses. My question is: is there a difference between calling
copy_from_user from kernel proper and calling it from an LKM?
^ permalink raw reply [flat|nested] 16+ messages in thread

* Module vs Kernel main performance
  2012-05-29 23:50 Module vs Kernel main performance Abu Rasheda
@ 2012-05-30  4:18 ` Mulyadi Santosa
  2012-05-30  4:51   ` Abu Rasheda
  2012-06-07 23:36 ` Peter Senna Tschudin
  1 sibling, 1 reply; 16+ messages in thread
From: Mulyadi Santosa @ 2012-05-30 4:18 UTC (permalink / raw)
To: kernelnewbies

Hi...

On Wed, May 30, 2012 at 6:50 AM, Abu Rasheda <rcpilot2010@gmail.com> wrote:
> So obviously the CPU is stalling while it is copying data, and there are
> more cache misses. My question is: is there a difference between calling
> copy_from_user from kernel proper and calling it from an LKM?

Theoretically, it should be the same. However, one thing that might
interest you is the fact that Linux kernel module memory is prepared
through vmalloc(), so there is a chance it is not physically contiguous,
whereas the main kernel image uses page_alloc() IIRC and is thus
physically contiguous.

What I mean is, there must be a difference in speed when you copy onto
something contiguous vs non-contiguous. IIRC it will waste at least some
portion of the L1/L2 cache.

Just my 2 cents, maybe I am wrong somewhere...

--
regards,

Mulyadi Santosa
Freelance Linux trainer and consultant

blog: the-hydra.blogspot.com
training: mulyaditraining.blogspot.com
* Module vs Kernel main performance
  2012-05-30  4:18 ` Mulyadi Santosa
@ 2012-05-30  4:51 ` Abu Rasheda
  2012-05-30 16:45   ` Mulyadi Santosa
  0 siblings, 1 reply; 16+ messages in thread
From: Abu Rasheda @ 2012-05-30 4:51 UTC (permalink / raw)
To: kernelnewbies

> What I mean is, there must be a difference in speed when you copy onto
> something contiguous vs non-contiguous. IIRC it will waste at least some
> portion of the L1/L2 cache.

When you say the LKM area is prepared with vmalloc, is it the code /
executable you are referring to? If so, will it matter for the data copy?

Point #2: Someone was saying that, at least on MIPS, it takes more cycles
to call a kernel-proper function from a module because of the long jump.
Does this apply to x86_64 too?

To test the above two, should I make my module part of the static kernel?
* Module vs Kernel main performance
  2012-05-30  4:51 ` Abu Rasheda
@ 2012-05-30 16:45 ` Mulyadi Santosa
  2012-05-30 21:44   ` Abu Rasheda
  0 siblings, 1 reply; 16+ messages in thread
From: Mulyadi Santosa @ 2012-05-30 16:45 UTC (permalink / raw)
To: kernelnewbies

Hi...

On Wed, May 30, 2012 at 11:51 AM, Abu Rasheda <rcpilot2010@gmail.com> wrote:
> When you say the LKM area is prepared with vmalloc, is it the code /
> executable you are referring to?

Yes, AFAIK the memory for code and static data in a Linux kernel module
is allocated via vmalloc().

> If so, will it matter for the data copy?

see my previous reply :)

> Point #2: Someone was saying that, at least on MIPS, it takes more cycles
> to call a kernel-proper function from a module because of the long jump.
> Does this apply to x86_64 too?

IIRC a long jump means jumping more than 64 KB... but that's in real mode
in 32 bit... so I am not sure whether it still applies in protected mode.

> To test the above two, should I make my module part of the static kernel?

good idea... I think you can try that... :)
* Module vs Kernel main performance
  2012-05-30 16:45 ` Mulyadi Santosa
@ 2012-05-30 21:44 ` Abu Rasheda
  2012-05-31  0:17   ` Abu Rasheda
  2012-05-31  5:35   ` Mulyadi Santosa
  0 siblings, 2 replies; 16+ messages in thread
From: Abu Rasheda @ 2012-05-30 21:44 UTC (permalink / raw)
To: kernelnewbies

I did another experiment. I wrote a stand-alone module and a user
program which does an ioctl and passes a buffer to the kernel module.

The user program passes a buffer through ioctl; the kernel module does a
kmalloc, calls copy_from_user, does a kfree, and returns. The test
program sends 120 gigabytes of data to the module.

If I pass a 1k buffer per call, I get

115,396,349,819 instructions  #  0.90 insns per cycle  [95.00%]

As I increase the size of the buffer, insns per cycle keep decreasing.
Here is the data:

  1k   0.90 insns per cycle
  8k   0.43 insns per cycle
 43k   0.18 insns per cycle
100k   0.08 insns per cycle

This shows that copy_from_user is more efficient when the copied data is
small. Why is that so?
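[Editor's note] The per-call pattern this message describes (allocate a
buffer, copy into it, free it, on every ioctl) can be mimicked in user
space with malloc/memcpy/free. The sketch below is a hypothetical
illustration of that pattern, not the poster's actual module code (which
was attached later in the thread as m.tgz); the function name is mine.

```c
#include <stdlib.h>
#include <string.h>

/* User-space analogue of the module's ioctl path: for each "call",
 * allocate a fresh destination buffer (kmalloc stand-in), copy the
 * payload into it (copy_from_user stand-in), then free it.
 * Returns 0 if every copy arrived intact, -1 otherwise. */
int copy_per_call(const unsigned char *src, size_t len, int calls)
{
    for (int i = 0; i < calls; i++) {
        unsigned char *dst = malloc(len);

        if (!dst)
            return -1;
        memcpy(dst, src, len);            /* the hot copy being profiled */
        if (memcmp(dst, src, len) != 0) { /* verify the copy */
            free(dst);
            return -1;
        }
        free(dst);
    }
    return 0;
}
```

Profiling this loop with `perf stat` while varying `len` is a cheap way
to see whether the insns-per-cycle trend reproduces outside the kernel.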
* Module vs Kernel main performance
  2012-05-30 21:44 ` Abu Rasheda
@ 2012-05-31  0:17 ` Abu Rasheda
  2012-05-31  5:35 ` Mulyadi Santosa
  1 sibling, 0 replies; 16+ messages in thread
From: Abu Rasheda @ 2012-05-31 0:17 UTC (permalink / raw)
To: kernelnewbies

On Wed, May 30, 2012 at 2:44 PM, Abu Rasheda <rcpilot2010@gmail.com> wrote:
> I did another experiment.
> [...]
> As I increase the size of the buffer, insns per cycle keep decreasing.
> Here is the data:
>
>   1k   0.90 insns per cycle
>   8k   0.43 insns per cycle
>  43k   0.18 insns per cycle
> 100k   0.08 insns per cycle
>
> This shows that copy_from_user is more efficient when the copied data is
> small. Why is that so?
I did another experiment. The user program sends 43k; the module
allocates 43k after entering the ioctl and copies a smaller portion in
each call to copy_from_user:

copy_from_user  0.25k at a time   0.56 insns per cycle
copy_from_user  0.50k at a time   0.42 insns per cycle
copy_from_user  1.00k at a time   0.36 insns per cycle
copy_from_user  2.00k at a time   0.29 insns per cycle
copy_from_user  3.00k at a time   0.26 insns per cycle
copy_from_user  4.00k at a time   0.23 insns per cycle
copy_from_user  8.00k at a time   0.21 insns per cycle
copy_from_user 16.00k at a time   0.19 insns per cycle

The user program sends 43k; the module allocates a smaller chunk and
passes that chunk to each call to copy_from_user:

Allocated  0.25k and copy_from_user  0.25k at a time   1.04 insns per cycle
Allocated  0.50k and copy_from_user  0.50k at a time   0.90 insns per cycle
Allocated  1.00k and copy_from_user  1.00k at a time   0.79 insns per cycle
Allocated  2.00k and copy_from_user  2.00k at a time   0.67 insns per cycle
Allocated  4.00k and copy_from_user  4.00k at a time   0.53 insns per cycle
Allocated  8.00k and copy_from_user  8.00k at a time   0.42 insns per cycle
Allocated 16.00k and copy_from_user 16.00k at a time   0.33 insns per cycle
Allocated 32.00k and copy_from_user 32.00k at a time   0.22 insns per cycle
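[Editor's note] The two strategies measured above can be sketched in
user space: one variant reuses a single scratch buffer across chunks
(the first table), the other allocates a fresh buffer per chunk (the
second table). This is a hypothetical illustration; the function name
and checksum are mine, not from the attached module.

```c
#include <stdlib.h>
#include <string.h>

/* Copy `total` bytes from src in `chunk`-sized pieces and return a
 * simple byte checksum of everything copied.  If alloc_per_chunk is
 * nonzero, a fresh buffer is allocated and freed for every chunk
 * (second experiment); otherwise one scratch buffer is reused
 * (first experiment). */
unsigned long copy_chunked(const unsigned char *src, size_t total,
                           size_t chunk, int alloc_per_chunk)
{
    unsigned long sum = 0;
    unsigned char *scratch = NULL;

    if (!alloc_per_chunk) {
        scratch = malloc(chunk);
        if (!scratch)
            return 0;
    }
    for (size_t off = 0; off < total; off += chunk) {
        size_t n = (total - off < chunk) ? total - off : chunk;
        unsigned char *dst = alloc_per_chunk ? malloc(n) : scratch;

        if (!dst)
            break;
        memcpy(dst, src + off, n);   /* stands in for copy_from_user */
        for (size_t i = 0; i < n; i++)
            sum += dst[i];           /* consume the copied data */
        if (alloc_per_chunk)
            free(dst);
    }
    free(scratch);
    return sum;
}
```

Both variants produce the same checksum; only the allocation pattern
(and hence the cache behaviour being discussed) differs.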
* Module vs Kernel main performance
  2012-05-30 21:44 ` Abu Rasheda
  2012-05-31  0:17   ` Abu Rasheda
@ 2012-05-31  5:35 ` Mulyadi Santosa
  2012-05-31 13:35   ` Abu Rasheda
  1 sibling, 1 reply; 16+ messages in thread
From: Mulyadi Santosa @ 2012-05-31 5:35 UTC (permalink / raw)
To: kernelnewbies

Hi...

On Thu, May 31, 2012 at 4:44 AM, Abu Rasheda <rcpilot2010@gmail.com> wrote:
> As I increase the size of the buffer, insns per cycle keep decreasing.
> Here is the data:
>
>   1k   0.90 insns per cycle
>   8k   0.43 insns per cycle
>  43k   0.18 insns per cycle
> 100k   0.08 insns per cycle
>
> This shows that copy_from_user is more efficient when the copied data is
> small. Why is that so?

You mean the bigger the buffer, the fewer the instructions per cycle,
right? Not sure why, but I am sure it will reach some peak point.

Anyway, you did kmalloc() and then kfree()? I think that's why... a
bigger buffer will grab a large chunk from the slab, and again it is
likely physically contiguous. Also, it will be placed in the same cache
lines.

The smaller one, on the other hand, will go through the allocate/free
cycle more often, flushing the L1/L2 cache even more.

CMIIW people...
* Module vs Kernel main performance
  2012-05-31  5:35 ` Mulyadi Santosa
@ 2012-05-31 13:35 ` Abu Rasheda
  2012-06-01  0:27   ` Chetan Nanda
  0 siblings, 1 reply; 16+ messages in thread
From: Abu Rasheda @ 2012-05-31 13:35 UTC (permalink / raw)
To: kernelnewbies

On Wed, May 30, 2012 at 10:35 PM, Mulyadi Santosa
<mulyadi.santosa@gmail.com> wrote:
> You mean the bigger the buffer, the fewer the instructions per cycle,
> right?

yes

> Not sure why, but I am sure it will reach some peak point.
>
> Anyway, you did kmalloc() and then kfree()? I think that's why... a
> bigger buffer will grab a large chunk from the slab, and again it is
> likely physically contiguous. Also, it will be placed in the same cache
> lines.
>
> The smaller one, on the other hand, will go through the allocate/free
> cycle more often, flushing the L1/L2 cache even more.

It seems to be the opposite: the bigger the allocation / copy, the
longer the stall is.
* Module vs Kernel main performance
  2012-05-31 13:35 ` Abu Rasheda
@ 2012-06-01  0:27 ` Chetan Nanda
  2012-06-01 18:52   ` Abu Rasheda
  0 siblings, 1 reply; 16+ messages in thread
From: Chetan Nanda @ 2012-06-01 0:27 UTC (permalink / raw)
To: kernelnewbies

On May 31, 2012 9:37 PM, "Abu Rasheda" <rcpilot2010@gmail.com> wrote:
> On Wed, May 30, 2012 at 10:35 PM, Mulyadi Santosa
> <mulyadi.santosa@gmail.com> wrote:
> > You mean the bigger the buffer, the fewer the instructions per cycle,
> > right?
>
> yes

If the buffer on the user side is more than a page, it may be that the
complete user-space buffer is not resident in memory, and the kernel
spends time processing page faults.

> > Not sure why, but I am sure it will reach some peak point.
> >
> > Anyway, you did kmalloc() and then kfree()? [...]
>
> It seems to be the opposite: the bigger the allocation / copy, the
> longer the stall is.

_______________________________________________
Kernelnewbies mailing list
Kernelnewbies at kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.kernelnewbies.org/pipermail/kernelnewbies/attachments/20120601/14efcdb3/attachment.html
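[Editor's note] If the page-fault hypothesis above is right,
pre-faulting the user buffer before issuing the ioctl should narrow the
gap, since copy_from_user would then find every page resident. A hedged
user-space sketch; `prefault()` is my name, not from the thread
(mlock(2) would also pin the pages but can hit RLIMIT_MEMLOCK):

```c
#include <stdlib.h>
#include <unistd.h>

/* Touch one byte per page so every page backing buf is resident
 * before the kernel walks the buffer in copy_from_user. */
void prefault(unsigned char *buf, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);

    for (size_t off = 0; off < len; off += page)
        buf[off] = 0;        /* the write faults the page in */
    if (len)
        buf[len - 1] = 0;    /* cover the tail if len is not page-aligned */
}
```

Comparing minor-fault counts in `perf stat` with and without this call
would confirm or rule out the page-fault explanation.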
* Module vs Kernel main performance
  2012-06-01  0:27 ` Chetan Nanda
@ 2012-06-01 18:52 ` Abu Rasheda
  2012-06-07 13:11   ` Peter Senna Tschudin
  0 siblings, 1 reply; 16+ messages in thread
From: Abu Rasheda @ 2012-06-01 18:52 UTC (permalink / raw)
To: kernelnewbies

> If the buffer on the user side is more than a page, it may be that the
> complete user-space buffer is not resident in memory, and the kernel
> spends time processing page faults.

I have attached the code for the module and the user program. If anyone
is bored over the weekend, they are welcome to try it and explain the
behavior.

Abu Rasheda

-------------- next part --------------
A non-text attachment was scrubbed...
Name: m.tgz
Type: application/x-gzip
Size: 18825 bytes
Desc: not available
Url : http://lists.kernelnewbies.org/pipermail/kernelnewbies/attachments/20120601/8a7dc407/attachment.tgz
* Module vs Kernel main performance
  2012-06-01 18:52 ` Abu Rasheda
@ 2012-06-07 13:11 ` Peter Senna Tschudin
  2012-06-07 17:47   ` Abu Rasheda
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Senna Tschudin @ 2012-06-07 13:11 UTC (permalink / raw)
To: kernelnewbies

Hello Abu,

I had to include <linux/module.h>, or an error was issued about
"THIS_MODULE".

What kernel version are you using? I'm trying to compile it and I'm
getting this error:

[peter@ace m]$ make
make -C /lib/modules/3.3.7-1.fc17.x86_64/build SUBDIRS=`pwd` modules
make[1]: Entering directory `/usr/src/kernels/3.3.7-1.fc17.x86_64'
  CC [M]  /tmp/m/m.o
/tmp/m/m.c:36:2: error: unknown field 'ioctl' specified in initializer
/tmp/m/m.c:36:2: warning: initialization from incompatible pointer type [enabled by default]
/tmp/m/m.c:36:2: warning: (near initialization for 'm_fops.llseek') [enabled by default]
make[2]: *** [/tmp/m/m.o] Error 1
make[1]: *** [_module_/tmp/m] Error 2
make[1]: Leaving directory `/usr/src/kernels/3.3.7-1.fc17.x86_64'
make: *** [module] Error 2

According to
http://lxr.linux.no/linux+v3.4.1/include/linux/fs.h#L1609
there is no .ioctl in struct file_operations...

Can you share how you've used perf/oprofile on your module/kernel code?

[]'s

Peter

On Fri, Jun 1, 2012 at 3:52 PM, Abu Rasheda <rcpilot2010@gmail.com> wrote:
> I have attached the code for the module and the user program. If anyone
> is bored over the weekend, they are welcome to try it and explain the
> behavior.
>
> Abu Rasheda

--
Peter Senna Tschudin
peter.senna at gmail.com
gpg id: 48274C36
* Module vs Kernel main performance
  2012-06-07 13:11 ` Peter Senna Tschudin
@ 2012-06-07 17:47 ` Abu Rasheda
  2012-06-07 18:10   ` Peter Senna Tschudin
  0 siblings, 1 reply; 16+ messages in thread
From: Abu Rasheda @ 2012-06-07 17:47 UTC (permalink / raw)
To: kernelnewbies

> Hello Abu,
>
> I had to include <linux/module.h>, or an error was issued about
> "THIS_MODULE".

I am running this tool on Scientific Linux 6.0, which is a 2.6.32
kernel. I know this is old, but this is what I have for my product.

> What kernel version are you using? I'm trying to compile it and I'm
> getting this error:
>
> [...]
> /tmp/m/m.c:36:2: error: unknown field 'ioctl' specified in initializer
> [...]
>
> There is no .ioctl in struct file_operations...
>
> Can you share how you've used perf/oprofile on your module/kernel code?

For perf:

perf stat -e cpu-cycles,stalled-cycles-frontend,stalled-cycles-backend,instructions,cache-references,cache-misses,branch-instructions,branch-misses,bus-cycles,cpu-clock,task-clock,page-faults,minor-faults,major-faults,context-switches,cpu-migrations,alignment-faults,emulation-faults,L1-dcache-loads,L1-dcache-load-misses,L1-dcache-stores,L1-dcache-store-misses,L1-dcache-prefetches,L1-dcache-prefetch-misses,L1-icache-loads,L1-icache-load-misses,L1-icache-prefetches,L1-icache-prefetch-misses,LLC-loads,LLC-load-misses,LLC-stores,LLC-store-misses,LLC-prefetches,LLC-prefetch-misses,dTLB-loads,dTLB-load-misses,dTLB-stores,dTLB-store-misses,dTLB-prefetches,dTLB-prefetch-misses,iTLB-loads,iTLB-load-misses,branch-loads,branch-load-misses,syscalls:sys_enter_sendmsg,syscalls:sys_exit_sendmsg,sched:sched_wakeup,sched:sched_stat_sleep ./prog

For oprofile:

# opcontrol --reset
# opcontrol --vmlinux=/boot/vmlinux.64
# opcontrol --start
# ./a.out
# opcontrol --shutdown
# opreport -l -p
* Module vs Kernel main performance
  2012-06-07 17:47 ` Abu Rasheda
@ 2012-06-07 18:10 ` Peter Senna Tschudin
  2012-06-09  1:52   ` Abu Rasheda
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Senna Tschudin @ 2012-06-07 18:10 UTC (permalink / raw)
To: kernelnewbies

Hello Abu,

On Thu, Jun 7, 2012 at 2:47 PM, Abu Rasheda <rcpilot2010@gmail.com> wrote:
> I am running this tool on Scientific Linux 6.0, which is a 2.6.32
> kernel. I know this is old, but this is what I have for my product.
>
> For perf:
>
> perf stat -e cpu-cycles,... ./prog
>
> For oprofile:
>
> # opcontrol --reset
> # opcontrol --vmlinux=/boot/vmlinux.64
> # opcontrol --start
> # ./a.out
> # opcontrol --shutdown
> # opreport -l -p

Thanks! I'll try it now.

I've made changes to your code, so it "probably" will:
 - run on a 3.4 kernel
 - partially meet the kernel coding style (try running
   scripts/checkpatch.pl -f m.c)
 - stop working due to lack of locking at m_ioctl(). I'm working on
   this now... :-)

See it at: http://pastebin.com/sibPrQJL

[]'s

Peter
* Module vs Kernel main performance
  2012-06-07 18:10 ` Peter Senna Tschudin
@ 2012-06-09  1:52 ` Abu Rasheda
  0 siblings, 0 replies; 16+ messages in thread
From: Abu Rasheda @ 2012-06-09 1:52 UTC (permalink / raw)
To: kernelnewbies

I modified my module (m.c). It still receives a buffer from user space
via ioctl, but instead of copying the data from the buffer provided by
the user, I have allocated (with kmalloc) a source buffer, and I copy
from this buffer to another kernel buffer which is allocated each time
the module's ioctl is invoked. copy_from_user is now replaced with
memcpy.

I still see the processor stall. This means the buffer allocated per
call is the cause.

Abu
* Module vs Kernel main performance
  2012-05-29 23:50 Module vs Kernel main performance Abu Rasheda
  2012-05-30  4:18 ` Mulyadi Santosa
@ 2012-06-07 23:36 ` Peter Senna Tschudin
  2012-06-07 23:41   ` Abu Rasheda
  1 sibling, 1 reply; 16+ messages in thread
From: Peter Senna Tschudin @ 2012-06-07 23:36 UTC (permalink / raw)
To: kernelnewbies

Hi again!

On Tue, May 29, 2012 at 8:50 PM, Abu Rasheda <rcpilot2010@gmail.com> wrote:
> Hi,
>
> I am working on the x86_64 arch. I profiled (with oprofile) a Linux
> kernel module and noticed that a whole lot of cycles are spent in the
> copy_from_user call. I compared the same flow in kernel proper and
> noticed that, for higher data throughput, far fewer cycles are spent in
> copy_from_user: kernel proper uses about 1/8 of the cycles the module
> does. (There is a user process that keeps sending data, like iperf.)
>
> I used the perf tool to gather some statistics. [...]

How did you call it from the kernel module?

> So obviously the CPU is stalling while it is copying data, and there are
> more cache misses. My question is: is there a difference between calling
> copy_from_user from kernel proper and calling it from an LKM?

[]'s

--
Peter Senna Tschudin
peter.senna at gmail.com
gpg id: 48274C36
* Module vs Kernel main performance
  2012-06-07 23:36 ` Peter Senna Tschudin
@ 2012-06-07 23:41 ` Abu Rasheda
  0 siblings, 0 replies; 16+ messages in thread
From: Abu Rasheda @ 2012-06-07 23:41 UTC (permalink / raw)
To: kernelnewbies

<peter.senna@gmail.com> wrote:
> Hi again!

Hi

> How did you call it from the kernel module?

In the original code the copied data is DMAed; in the experimental code
the data is dropped.
end of thread, other threads: [~2012-06-09 1:52 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed
-- links below jump to the message on this page --)
2012-05-29 23:50 Module vs Kernel main performance Abu Rasheda
2012-05-30  4:18 ` Mulyadi Santosa
2012-05-30  4:51   ` Abu Rasheda
2012-05-30 16:45     ` Mulyadi Santosa
2012-05-30 21:44       ` Abu Rasheda
2012-05-31  0:17         ` Abu Rasheda
2012-05-31  5:35         ` Mulyadi Santosa
2012-05-31 13:35           ` Abu Rasheda
2012-06-01  0:27             ` Chetan Nanda
2012-06-01 18:52               ` Abu Rasheda
2012-06-07 13:11                 ` Peter Senna Tschudin
2012-06-07 17:47                   ` Abu Rasheda
2012-06-07 18:10                     ` Peter Senna Tschudin
2012-06-09  1:52                       ` Abu Rasheda
2012-06-07 23:36 ` Peter Senna Tschudin
2012-06-07 23:41   ` Abu Rasheda
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).