* [Xenomai-help] Xenomai and futexes - Native API optimized for user space only applications
@ 2007-05-09 10:52 M. Koehrer
  2007-05-09 11:42 ` Dmitry Adamushko
                   ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: M. Koehrer @ 2007-05-09 10:52 UTC (permalink / raw)
  To: xenomai

Hi everybody,

I am using Xenomai for a high-performance real-time simulation system.
All of the simulation code is executed in user space. A single application is running that
consists of several Xenomai real-time threads.
For performance reasons I always use the latest PC technology available (currently
Pentium D or Core 2 Duo PCs; we are even thinking of using quad-core CPUs).
Hardware I/O is done via RTnet or via PCI I/O boards that work in user space as well (PCI memory
mapped into user space).
The threads have to be synchronised, e.g. when accessing shared data.
This is done with semaphores, mutexes, etc.
I like the Xenomai native skin as it provides a very clear API that is easy to use.
However, for a user-space-only application it is a performance killer that every API call
leads to a mode switch from user to kernel space. Each API call takes about 1-2 microseconds (us)
on my PC, which is really expensive.
Especially when inter-process communication objects are used to protect access to shared data,
the calling thread usually does not have to wait. In this situation there is no
need for a context switch: the API call does not lead to a rescheduling of the available tasks.
And for this case the 1-2 us really hurt.
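
To put a number on the user/kernel round trip on a given machine, a small micro-benchmark can help. The following is a minimal sketch under stated assumptions: it runs on plain Linux and times the trivial getpid system call as a stand-in, so it does not measure a Xenomai skin call. The function names (now_ns, syscall_cost_ns, report_syscall_cost) are hypothetical and chosen for illustration only.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Monotonic time in nanoseconds. */
static double now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

/*
 * Average cost of one user -> kernel -> user round trip.
 * Assumption: SYS_getpid on plain Linux is used as a stand-in for a
 * cheap system call; this is NOT a Xenomai API call.
 */
double syscall_cost_ns(long iterations)
{
    double start = now_ns();

    for (long i = 0; i < iterations; i++)
        syscall(SYS_getpid);    /* raw syscall: enters the kernel every time */

    return (now_ns() - start) / (double)iterations;
}

/* Print the measured average; meant to be called from a test program. */
void report_syscall_cost(void)
{
    printf("average syscall cost: %.1f ns\n", syscall_cost_ns(1000000));
}
```

The result depends heavily on CPU, kernel version and configuration, so it only gives an order of magnitude for the fixed cost that every kernel-mediated API call has to pay.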

Thus my question/proposal: is there a plan for a "variant" of the native API that is optimized for
user-space-only applications? In that case futexes, for example, could be used. If there is a need to
reschedule to another task, it is fine to "invest" the 2 us, but in most cases this can be avoided, which should
increase the overall performance dramatically.
This would lead to a library in which a big part of the functionality is handled directly in the library
(in user space). Currently the skin passes each (user-space) API call via a Xenomai system call
to kernel space, where the actual functionality is executed.
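
The idea of an uncontended fast path that stays in user space can be sketched with the plain Linux futex system call. This is only an illustration under stated assumptions: it uses the standard Linux futex interface, not any Xenomai interface, and the names fast_mutex, futex_call, fast_mutex_lock and fast_mutex_unlock are hypothetical. It follows the well-known three-state scheme (0 = unlocked, 1 = locked, 2 = locked with possible waiters) and has no priority inheritance or other real-time properties.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* state: 0 = unlocked, 1 = locked, 2 = locked, waiters may exist */
typedef struct {
    atomic_int state;
} fast_mutex;

/* Thin wrapper around the raw Linux futex system call. */
static long futex_call(atomic_int *addr, int op, int val)
{
    return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

void fast_mutex_lock(fast_mutex *m)
{
    int c = 0;

    /* Fast path: 0 -> 1 with a single atomic operation, no kernel entry. */
    if (atomic_compare_exchange_strong(&m->state, &c, 1))
        return;

    /* Slow path: mark the lock as contended and sleep in the kernel. */
    if (c != 2)
        c = atomic_exchange(&m->state, 2);
    while (c != 0) {
        /* Sleeps only if the state is still 2 when the kernel checks it. */
        futex_call(&m->state, FUTEX_WAIT_PRIVATE, 2);
        c = atomic_exchange(&m->state, 2);
    }
}

void fast_mutex_unlock(fast_mutex *m)
{
    /* Fast path: 1 -> 0 means nobody was waiting, no kernel entry. */
    if (atomic_fetch_sub(&m->state, 1) != 1) {
        /* State was 2: release the lock and wake one waiter. */
        atomic_store(&m->state, 0);
        futex_call(&m->state, FUTEX_WAKE_PRIVATE, 1);
    }
}
```

In the uncontended case both lock and unlock complete with one atomic instruction each; the system call is only made when a thread actually has to wait or be woken, which is the case where the cost of the mode switch is acceptable anyway.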

Thanks for any feedback on this proposal.

Regards

Mathias





-- 
Mathias Koehrer
mathias_koehrer@domain.hid




* Re: [Xenomai-help] Xenomai and futexes - Native API optimized
@ 2007-05-10 17:23 Fillod Stephane
  0 siblings, 0 replies; 24+ messages in thread
From: Fillod Stephane @ 2007-05-10 17:23 UTC (permalink / raw)
  To: Jan Kiszka, M. Koehrer; +Cc: xenomai

Jan Kiszka wrote:
[...]
>> 2309      5.4062  xeno_nucleus             xeno_nucleus             (no symbols)
>
>We are lacking module symbols here. Dunno if one can teach them to
>oprofile (likely somehow), but the easiest approach would be to compile
>xeno_native and nucleus into the kernel.

You can pass --enable-debug to the configure script in order to enable
debug symbols (and don't strip programs/libs). For the kernel and
modules, define CONFIG_XENO_OPT_DEBUG and
CONFIG_DEBUG_KERNEL/CONFIG_DEBUG_INFO to have symbols and all. OProfile
is only able to resolve virtual addresses to symbols when debug symbols
are present in the file. You may have to pass the --vmlinux option to
opcontrol. Compiling Xenomai as suggested by Jan will make your life
easier, and will bring you an extra mini speedup if you're in that kind
of business.
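
As a kernel configuration fragment, the three options named above would look roughly like this. It is a sketch only: the option names are taken from this message, and whether each of them exists under exactly this name depends on the kernel and Xenomai version in use.

```
# Kernel .config fragment (option names as given in this message)
CONFIG_XENO_OPT_DEBUG=y
CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_INFO=y
```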

-- 
Stephane


* Re: [Xenomai-help] Xenomai and futexes - Native API optimized
@ 2007-05-11  7:08 M. Koehrer
  2007-05-11  7:37 ` M. Koehrer
  2007-05-11  7:44 ` Jan Kiszka
  0 siblings, 2 replies; 24+ messages in thread
From: M. Koehrer @ 2007-05-11  7:08 UTC (permalink / raw)
  To: stephane.fillod, jan.kiszka, mathias_koehrer; +Cc: xenomai

Hi,

here are my latest results:
I compiled xeno_nucleus and xeno_native into the kernel (see the first oprofile result below).
(I also added debug information to the kernel, which was of no additional help.)
I found that most of the time was spent in __ipipe_hard_cpuid.
I looked at that routine and split it up using two additional helper functions that are used
to "monitor" the apic_read() and the GET_APIC_ID calls respectively (see patch below).
The results of this experiment can be found at "Second oprofile result" below.
If I interpret the results correctly, it looks as if apic_read() is what is actually
eating up the performance. As this call "leaves" the CPU-internal "area" and accesses the external
APIC, this sounds plausible.
I think this is done here to detect the current CPU.
Is it possible to detect it differently, or to store that information somehow with a thread (TLS)
to avoid requesting it frequently?

I hope that helps a little bit to identify this issue and (perhaps) to find a faster solution.
Thanks for all feedback on this!
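
The TLS idea raised in the question can be illustrated in user space as follows. This is a sketch under loud assumptions: it is not I-pipe or kernel code, sched_getcpu() merely stands in for the expensive apic_read() query, and all names (cached_cpu, slow_cpu_lookup, current_cpu_cached, invalidate_cpu_cache) are hypothetical. A cached CPU number is only valid while the thread cannot migrate, so the cache has to be invalidated whenever the thread may have moved to another CPU.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Per-thread cache; -1 means "not cached yet". */
static __thread int cached_cpu = -1;

/* Counts how often the expensive query was really performed. */
static __thread unsigned long slow_lookups;

/* Stand-in for the expensive hardware query (assumption, see lead-in). */
static int slow_cpu_lookup(void)
{
    slow_lookups++;
    return sched_getcpu();
}

/* Return the CPU number, asking the slow path only on a cache miss. */
int current_cpu_cached(void)
{
    if (cached_cpu < 0)
        cached_cpu = slow_cpu_lookup();
    return cached_cpu;
}

/* Must be called whenever the thread may have migrated. */
void invalidate_cpu_cache(void)
{
    cached_cpu = -1;
}
```

Whether such a scheme is applicable inside the interrupt pipeline depends on whether every possible migration point can be hooked to invalidate the cache.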

Regards

Mathias

------------ Begin of patch -----------------
--- ipipe.c.orig        2007-05-09 16:16:32.000000000 +0200
+++ ipipe.c     2007-05-11 08:47:41.000000000 +0200
@@ -72,13 +72,30 @@

 int (*__ipipe_logical_cpuid)(void) = &__ipipe_boot_cpuid;

+
+unsigned long __ipipe_hard_cpuid_apic_read(void)
+{
+    return apic_read(APIC_ID);
+}
+
+unsigned __ipipe_hard_cpuid_get_apic_id(unsigned long apic)
+{
+        return GET_APIC_ID(apic);
+}
+
+
 static notrace int __ipipe_hard_cpuid(void)
 {
        unsigned long flags;
        int cpu;
+        unsigned long apic;
+        unsigned apic_id;

        local_irq_save_hw_notrace(flags);
-       cpu = __ipipe_apicid_2_cpu[GET_APIC_ID(apic_read(APIC_ID))];
+       // cpu = __ipipe_apicid_2_cpu[GET_APIC_ID(apic_read(APIC_ID))];
+        apic = __ipipe_hard_cpuid_apic_read();
+        apic_id = __ipipe_hard_cpuid_get_apic_id(apic);
+        cpu = __ipipe_apicid_2_cpu[apic_id];
        local_irq_restore_hw_notrace(flags);
        return cpu;
 }
------------ End of patch -----------------

------------- First oprofile result: -----------------------------
Using default event: GLOBAL_POWER_EVENTS:100000:1:1:1
Daemon started.
Profiler running.
delta is 50404054895 per step: 5040
Stopping profiling.
CPU: P4 / Xeon, speed 3192.16 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a unit mask of 0x01 (mandatory) count 100000
samples  %        image name               app name                 symbol name
6273     56.8928  vmlinux                  vmlinux                  __ipipe_hard_cpuid
900       8.1625  vmlinux                  vmlinux                  rt_sem_v
694       6.2942  vmlinux                  vmlinux                  xnregistry_fetch
321       2.9113  vmlinux                  vmlinux                  __ipipe_dispatch_event
293       2.6574  bash                     bash                     (no symbols)
250       2.2674  vmlinux                  vmlinux                  hrtimer_run_queues
227       2.0588  libc-2.3.6.so            libc-2.3.6.so            (no symbols)
140       1.2697  vmlinux                  vmlinux                  delay_tsc
123       1.1155  libnative.so.0.0.0       libnative.so.0.0.0       rt_sem_p
103       0.9342  vmlinux                  vmlinux                  __ipipe_stall_root
78        0.7074  vmlinux                  vmlinux                  __ipipe_test_and_stall_root
67        0.6077  vmlinux                  vmlinux                  apic_timer_interrupt
66        0.5986  vmlinux                  vmlinux                  sysenter_past_esp
60        0.5442  vmlinux                  vmlinux                  __ipipe_restore_pipeline_head
56        0.5079  vmlinux                  vmlinux                  do_wp_page
56        0.5079  vmlinux                  vmlinux                  search_by_key
54        0.4898  oprofiled                oprofiled                (no symbols)
41        0.3718  vmlinux                  vmlinux                  __ipipe_sync_stage
37        0.3356  ld-2.3.6.so              ld-2.3.6.so              do_lookup_x
37        0.3356  vmlinux                  vmlinux                  __handle_mm_fault
25        0.2267  vmlinux                  vmlinux                  __ipipe_handle_exception
25        0.2267  vmlinux                  vmlinux                  find_get_page
25        0.2267  vmlinux                  vmlinux                  get_page_from_freelist
25        0.2267  vmlinux                  vmlinux                  run_timer_softirq
25        0.2267  vmlinux                  vmlinux                  scheduler_tick
24        0.2177  vmlinux                  vmlinux                  sysenter_exit
23        0.2086  vmlinux                  vmlinux                  __ipipe_syscall_root
23        0.2086  vmlinux                  vmlinux                  __ipipe_unstall_root
22        0.1995  libnative.so.0.0.0       libnative.so.0.0.0       rt_sem_v
21        0.1905  vmlinux                  vmlinux                  __ipipe_test_root
21        0.1905  vmlinux                  vmlinux                  ata_bmdma_start
19        0.1723  ld-2.3.6.so              ld-2.3.6.so              strcmp
19        0.1723  vmlinux                  vmlinux                  ata_altstatus
19        0.1723  vmlinux                  vmlinux                  ata_bmdma_irq_clear
18        0.1633  oprofile                 oprofile                 (no symbols)
18        0.1633  vmlinux                  vmlinux                  find_vma
18        0.1633  vmlinux                  vmlinux                  flush_tlb_page
18        0.1633  vmlinux                  vmlinux                  release_pages
18        0.1633  vmlinux                  vmlinux                  unmap_vmas
17        0.1542  ld-2.3.6.so              ld-2.3.6.so              _dl_relocate_object
17        0.1542  vmlinux                  vmlinux                  __ipipe_unstall_iret_root
---------------------------------------------

------------- Second oprofile result: -----------------------------
Using default event: GLOBAL_POWER_EVENTS:100000:1:1:1
Daemon started.
Profiler running.
delta is 51846556098 per step: 5184
Stopping profiling.
CPU: P4 / Xeon, speed 3192.33 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a unit mask of 0x01 (mandatory) count 100000
samples  %        image name               app name                 symbol name
4350     39.4307  vmlinux                  vmlinux                  __ipipe_hard_cpuid_apic_read
2336     21.1748  vmlinux                  vmlinux                  __ipipe_hard_cpuid
306       2.7737  bash                     bash                     (no symbols)
287       2.6015  vmlinux                  vmlinux                  __ipipe_dispatch_event
276       2.5018  vmlinux                  vmlinux                  sysenter_past_esp
269       2.4384  vmlinux                  vmlinux                  hrtimer_run_queues
264       2.3930  vmlinux                  vmlinux                  __ipipe_syscall_root
245       2.2208  vmlinux                  vmlinux                  xnregistry_fetch
209       1.8945  libc-2.3.6.so            libc-2.3.6.so            (no symbols)
173       1.5682  vmlinux                  vmlinux                  rt_sem_v
154       1.3959  vmlinux                  vmlinux                  __ipipe_restore_pipeline_head
128       1.1603  vmlinux                  vmlinux                  __copy_from_user_ll_nozero
102       0.9246  vmlinux                  vmlinux                  delay_tsc
100       0.9065  vmlinux                  vmlinux                  __ipipe_stall_root
100       0.9065  vmlinux                  vmlinux                  hisyscall_event
90        0.8158  vmlinux                  vmlinux                  apic_timer_interrupt
89        0.8067  vmlinux                  vmlinux                  __ipipe_test_and_stall_root
80        0.7252  vmlinux                  vmlinux                  rt_sem_p
74        0.6708  vmlinux                  vmlinux                  sysenter_exit
58        0.5257  vmlinux                  vmlinux                  do_wp_page
53        0.4804  oprofiled                oprofiled                (no symbols)
53        0.4804  vmlinux                  vmlinux                  search_by_key
50        0.4532  vmlinux                  vmlinux                  __ipipe_sync_stage
40        0.3626  vmlinux                  vmlinux                  __rt_sem_v
39        0.3535  vmlinux                  vmlinux                  __rt_sem_p
31        0.2810  ld-2.3.6.so              ld-2.3.6.so              do_lookup_x
30        0.2719  vmlinux                  vmlinux                  find_get_page
27        0.2447  vmlinux                  vmlinux                  ata_bmdma_start
24        0.2175  vmlinux                  vmlinux                  __ipipe_unstall_root
24        0.2175  vmlinux                  vmlinux                  run_timer_softirq
23        0.2085  vmlinux                  vmlinux                  unmap_vmas
22        0.1994  vmlinux                  vmlinux                  flush_tlb_page
21        0.1904  ld-2.3.6.so              ld-2.3.6.so              strcmp
21        0.1904  vmlinux                  vmlinux                  __handle_mm_fault
19        0.1722  vmlinux                  vmlinux                  __ipipe_test_root
18        0.1632  vmlinux                  vmlinux                  do_page_fault
17        0.1541  oprofile                 oprofile                 (no symbols)
17        0.1541  vmlinux                  vmlinux                  __ipipe_handle_exception
16        0.1450  vmlinux                  vmlinux                  __d_lookup
15        0.1360  vmlinux                  vmlinux                  filemap_nopage
15        0.1360  vmlinux                  vmlinux                  get_page_from_freelist
15        0.1360  vmlinux                  vmlinux                  page_remove_rmap
15        0.1360  vmlinux                  vmlinux                  restore_nocheck_notrace
14        0.1269  vmlinux                  vmlinux                  _atomic_dec_and_lock
14        0.1269  vmlinux                  vmlinux                  copy_page_range
14        0.1269  vmlinux                  vmlinux                  scheduler_tick
12        0.1088  ld-2.3.6.so              ld-2.3.6.so              _dl_relocate_object
12        0.1088  vmlinux                  vmlinux                  __find_get_block
12        0.1088  vmlinux                  vmlinux                  __ipipe_unstall_iret_root
--------------------------------------------------------------
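
The two tables can be compared with a little arithmetic. The sketch below uses only numbers from the tables above; the helper names (total_samples, share_percent, print_cpuid_path_share) are hypothetical. It estimates the total sample count of each run from one row and then computes the combined share of __ipipe_hard_cpuid plus __ipipe_hard_cpuid_apic_read in the second run, which comes out at roughly 60.6% against 56.9% for __ipipe_hard_cpuid alone in the first run.

```c
#include <stdio.h>

/* Estimate the total sample count of a run from one table row. */
double total_samples(double samples, double percent)
{
    return samples * 100.0 / percent;
}

/* Share (in percent) of `samples` within a run of `total` samples. */
double share_percent(double samples, double total)
{
    return samples * 100.0 / total;
}

/* Print the comparison; the sample counts are copied from the tables. */
void print_cpuid_path_share(void)
{
    double total1 = total_samples(6273.0, 56.8928);  /* first run  */
    double total2 = total_samples(4350.0, 39.4307);  /* second run */
    double path2 = 4350.0 + 2336.0;  /* apic_read helper + remainder */

    printf("run 1: ~%.0f samples, __ipipe_hard_cpuid %.1f%%\n",
           total1, share_percent(6273.0, total1));
    printf("run 2: ~%.0f samples, cpuid path %.1f%%\n",
           total2, share_percent(path2, total2));
    printf("run 2: apic_read helper is %.1f%% of the cpuid path\n",
           share_percent(4350.0, path2));
}
```

Since compiler inlining can shift samples between a function and its helpers, the split within the path should be read as an indication rather than an exact attribution.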




-- 
Mathias Koehrer
mathias_koehrer@domain.hid





end of thread, other threads:[~2007-05-11  8:21 UTC | newest]

Thread overview: 24+ messages
2007-05-09 10:52 [Xenomai-help] Xenomai and futexes - Native API optimized for user space only applications M. Koehrer
2007-05-09 11:42 ` Dmitry Adamushko
2007-05-09 11:51 ` Jan Kiszka
2007-05-09 12:27   ` [Xenomai-help] Xenomai and futexes - Native API optimized for M. Koehrer
2007-05-09 12:51     ` Jan Kiszka
2007-05-09 13:35     ` Herman Bruyninckx
2007-05-09 13:58       ` M. Koehrer
2007-05-09 15:46         ` Jan Kiszka
2007-05-09 22:36         ` Herman Bruyninckx
2007-05-09 16:14 ` [Xenomai-help] Xenomai and futexes - Native API optimized for user space only applications Philippe Gerum
2007-05-09 16:32   ` Gilles Chanteperdrix
2007-05-10  8:31   ` [Xenomai-help] Xenomai and futexes - Native API optimized M. Koehrer
2007-05-10  9:26     ` Jan Kiszka
2007-05-10 13:02       ` M. Koehrer
2007-05-10 13:41         ` M. Koehrer
2007-05-10 16:21           ` Jan Kiszka
2007-05-10  9:46     ` Daniel Schnell
2007-05-10 15:16       ` Philippe Gerum
  -- strict thread matches above, loose matches on Subject: below --
2007-05-10 17:23 Fillod Stephane
2007-05-11  7:08 M. Koehrer
2007-05-11  7:37 ` M. Koehrer
2007-05-11  7:44 ` Jan Kiszka
2007-05-11  8:10   ` M. Koehrer
2007-05-11  8:21     ` Philippe Gerum
