From: "M. Koehrer" <mathias_koehrer@domain.hid>
To: stephane.fillod@domain.hid, jan.kiszka@domain.hid,
mathias_koehrer@domain.hid
Cc: xenomai@xenomai.org
Subject: Re: [Xenomai-help] Xenomai and futexes - Native API optimized
Date: Fri, 11 May 2007 09:08:32 +0200 (CEST)
Message-ID: <15183185.1178867312688.JavaMail.ngmail@domain.hid>
Hi,
Here are my latest results:
I built xeno_nucleus and xeno_native into the kernel (see the first oprofile result below).
(I also added debug information to the kernel, which did not help any further.)
I found that most of the time was spent in __ipipe_hard_cpuid.
I looked at that routine and split it up with two additional helper functions that are used
to "monitor" the apic_read() and GET_APIC_ID calls, respectively (see the patch below).
The results of this experiment can be found under "Second oprofile result" below.
If I interpret the results correctly, it looks as if apic_read() is what is actually
eating up the performance. As this call "leaves" the CPU-internal "area" and accesses the
memory-mapped APIC registers, this sounds plausible.
I think this is done here to determine the current CPU.
Is it possible to determine it differently, or to store that information with the thread
somehow (TLS), to avoid querying it so frequently?
I hope this helps a little to pin down the issue and (perhaps) to find a faster solution.
Thanks for all the feedback on this!
Regards
Mathias
------------ Begin of patch -----------------
--- ipipe.c.orig 2007-05-09 16:16:32.000000000 +0200
+++ ipipe.c 2007-05-11 08:47:41.000000000 +0200
@@ -72,13 +72,30 @@
int (*__ipipe_logical_cpuid)(void) = &__ipipe_boot_cpuid;
+
+unsigned long __ipipe_hard_cpuid_apic_read(void)
+{
+ return apic_read(APIC_ID);
+}
+
+unsigned __ipipe_hard_cpuid_get_apic_id(unsigned long apic)
+{
+ return GET_APIC_ID(apic);
+}
+
+
static notrace int __ipipe_hard_cpuid(void)
{
unsigned long flags;
int cpu;
+ unsigned long apic;
+ unsigned apic_id;
local_irq_save_hw_notrace(flags);
- cpu = __ipipe_apicid_2_cpu[GET_APIC_ID(apic_read(APIC_ID))];
+ // cpu = __ipipe_apicid_2_cpu[GET_APIC_ID(apic_read(APIC_ID))];
+ apic = __ipipe_hard_cpuid_apic_read();
+ apic_id = __ipipe_hard_cpuid_get_apic_id(apic);
+ cpu = __ipipe_apicid_2_cpu[apic_id];
local_irq_restore_hw_notrace(flags);
return cpu;
}
------------ End of patch -------------------
------------- First oprofile result: -----------------------------
Using default event: GLOBAL_POWER_EVENTS:100000:1:1:1
Daemon started.
Profiler running.
delta is 50404054895 per step: 5040
Stopping profiling.
CPU: P4 / Xeon, speed 3192.16 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a unit mask of 0x01 (mandatory) count 100000
samples  %        image name          app name            symbol name
6273     56.8928  vmlinux             vmlinux             __ipipe_hard_cpuid
900      8.1625   vmlinux             vmlinux             rt_sem_v
694      6.2942   vmlinux             vmlinux             xnregistry_fetch
321      2.9113   vmlinux             vmlinux             __ipipe_dispatch_event
293      2.6574   bash                bash                (no symbols)
250      2.2674   vmlinux             vmlinux             hrtimer_run_queues
227      2.0588   libc-2.3.6.so       libc-2.3.6.so       (no symbols)
140      1.2697   vmlinux             vmlinux             delay_tsc
123      1.1155   libnative.so.0.0.0  libnative.so.0.0.0  rt_sem_p
103      0.9342   vmlinux             vmlinux             __ipipe_stall_root
78       0.7074   vmlinux             vmlinux             __ipipe_test_and_stall_root
67       0.6077   vmlinux             vmlinux             apic_timer_interrupt
66       0.5986   vmlinux             vmlinux             sysenter_past_esp
60       0.5442   vmlinux             vmlinux             __ipipe_restore_pipeline_head
56       0.5079   vmlinux             vmlinux             do_wp_page
56       0.5079   vmlinux             vmlinux             search_by_key
54       0.4898   oprofiled           oprofiled           (no symbols)
41       0.3718   vmlinux             vmlinux             __ipipe_sync_stage
37       0.3356   ld-2.3.6.so         ld-2.3.6.so         do_lookup_x
37       0.3356   vmlinux             vmlinux             __handle_mm_fault
25       0.2267   vmlinux             vmlinux             __ipipe_handle_exception
25       0.2267   vmlinux             vmlinux             find_get_page
25       0.2267   vmlinux             vmlinux             get_page_from_freelist
25       0.2267   vmlinux             vmlinux             run_timer_softirq
25       0.2267   vmlinux             vmlinux             scheduler_tick
24       0.2177   vmlinux             vmlinux             sysenter_exit
23       0.2086   vmlinux             vmlinux             __ipipe_syscall_root
23       0.2086   vmlinux             vmlinux             __ipipe_unstall_root
22       0.1995   libnative.so.0.0.0  libnative.so.0.0.0  rt_sem_v
21       0.1905   vmlinux             vmlinux             __ipipe_test_root
21       0.1905   vmlinux             vmlinux             ata_bmdma_start
19       0.1723   ld-2.3.6.so         ld-2.3.6.so         strcmp
19       0.1723   vmlinux             vmlinux             ata_altstatus
19       0.1723   vmlinux             vmlinux             ata_bmdma_irq_clear
18       0.1633   oprofile            oprofile            (no symbols)
18       0.1633   vmlinux             vmlinux             find_vma
18       0.1633   vmlinux             vmlinux             flush_tlb_page
18       0.1633   vmlinux             vmlinux             release_pages
18       0.1633   vmlinux             vmlinux             unmap_vmas
17       0.1542   ld-2.3.6.so         ld-2.3.6.so         _dl_relocate_object
17       0.1542   vmlinux             vmlinux             __ipipe_unstall_iret_root
---------------------------------------------
------------- Second oprofile result: -----------------------------
Using default event: GLOBAL_POWER_EVENTS:100000:1:1:1
Daemon started.
Profiler running.
delta is 51846556098 per step: 5184
Stopping profiling.
CPU: P4 / Xeon, speed 3192.33 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a unit mask of 0x01 (mandatory) count 100000
samples  %        image name          app name            symbol name
4350     39.4307  vmlinux             vmlinux             __ipipe_hard_cpuid_apic_read
2336     21.1748  vmlinux             vmlinux             __ipipe_hard_cpuid
306      2.7737   bash                bash                (no symbols)
287      2.6015   vmlinux             vmlinux             __ipipe_dispatch_event
276      2.5018   vmlinux             vmlinux             sysenter_past_esp
269      2.4384   vmlinux             vmlinux             hrtimer_run_queues
264      2.3930   vmlinux             vmlinux             __ipipe_syscall_root
245      2.2208   vmlinux             vmlinux             xnregistry_fetch
209      1.8945   libc-2.3.6.so       libc-2.3.6.so       (no symbols)
173      1.5682   vmlinux             vmlinux             rt_sem_v
154      1.3959   vmlinux             vmlinux             __ipipe_restore_pipeline_head
128      1.1603   vmlinux             vmlinux             __copy_from_user_ll_nozero
102      0.9246   vmlinux             vmlinux             delay_tsc
100      0.9065   vmlinux             vmlinux             __ipipe_stall_root
100      0.9065   vmlinux             vmlinux             hisyscall_event
90       0.8158   vmlinux             vmlinux             apic_timer_interrupt
89       0.8067   vmlinux             vmlinux             __ipipe_test_and_stall_root
80       0.7252   vmlinux             vmlinux             rt_sem_p
74       0.6708   vmlinux             vmlinux             sysenter_exit
58       0.5257   vmlinux             vmlinux             do_wp_page
53       0.4804   oprofiled           oprofiled           (no symbols)
53       0.4804   vmlinux             vmlinux             search_by_key
50       0.4532   vmlinux             vmlinux             __ipipe_sync_stage
40       0.3626   vmlinux             vmlinux             __rt_sem_v
39       0.3535   vmlinux             vmlinux             __rt_sem_p
31       0.2810   ld-2.3.6.so         ld-2.3.6.so         do_lookup_x
30       0.2719   vmlinux             vmlinux             find_get_page
27       0.2447   vmlinux             vmlinux             ata_bmdma_start
24       0.2175   vmlinux             vmlinux             __ipipe_unstall_root
24       0.2175   vmlinux             vmlinux             run_timer_softirq
23       0.2085   vmlinux             vmlinux             unmap_vmas
22       0.1994   vmlinux             vmlinux             flush_tlb_page
21       0.1904   ld-2.3.6.so         ld-2.3.6.so         strcmp
21       0.1904   vmlinux             vmlinux             __handle_mm_fault
19       0.1722   vmlinux             vmlinux             __ipipe_test_root
18       0.1632   vmlinux             vmlinux             do_page_fault
17       0.1541   oprofile            oprofile            (no symbols)
17       0.1541   vmlinux             vmlinux             __ipipe_handle_exception
16       0.1450   vmlinux             vmlinux             __d_lookup
15       0.1360   vmlinux             vmlinux             filemap_nopage
15       0.1360   vmlinux             vmlinux             get_page_from_freelist
15       0.1360   vmlinux             vmlinux             page_remove_rmap
15       0.1360   vmlinux             vmlinux             restore_nocheck_notrace
14       0.1269   vmlinux             vmlinux             _atomic_dec_and_lock
14       0.1269   vmlinux             vmlinux             copy_page_range
14       0.1269   vmlinux             vmlinux             scheduler_tick
12       0.1088   ld-2.3.6.so         ld-2.3.6.so         _dl_relocate_object
12       0.1088   vmlinux             vmlinux             __find_get_block
12       0.1088   vmlinux             vmlinux             __ipipe_unstall_iret_root
--------------------------------------------------------------
> Define CONFIG_XENO_OPT_DEBUG and CONFIG_DEBUG_KERNEL/CONFIG_DEBUG_INFO
> to have symbols and all. OProfile is only able to look up virtual
> addresses when debug symbols are present in the file. You may have to
> pass the --vmlinux option to opcontrol. Compiling Xenomai as suggested
> by Jan will make your life easier, and will bring you an extra mini
> speedup if you're in that kind of business.
>
> --
> Stephane
>
--
Mathias Koehrer
mathias_koehrer@domain.hid