* [Xenomai-core] [PATCH 0/4] Fixes and improvements around xnlock
@ 2008-02-23 13:33 Jan Kiszka
2008-02-23 13:36 ` [Xenomai-core] [PATCH 2/4] Fix and optimize xnlock_put Jan Kiszka
` (3 more replies)
0 siblings, 4 replies; 25+ messages in thread
From: Jan Kiszka @ 2008-02-23 13:33 UTC (permalink / raw)
To: Xenomai-core
Hi,
I'm happy to announce a patch series which actually started like "hey,
let's check if we can have this nice FIFO ticket spinlock algorithm also
for our recursive xnlock." The result is a critical bug fix and a
massive text size reduction of nucleus and skins, both concerning SMP
systems. Furthermore, there are now optional ticket locks available for
larger SMP boxes. :)
This series consists of
1. Refactoring of asm-generic/system.h
(+ some optimization of xnlock_get)
2. SMP-critical memory barrier fix for xnlock_put
(+ optimization for production code)
3. Uninlining of xnlock_get_irqsave/xnlock_put_irqrestore on SMP
(xeno_nucleus text, x86-64, non-debug: 112482 -> 79453 bytes!)
4. Optional FIFO ticket spinlock for strict deterministic spin times
on xnlocks (still highly experimental!)
The second patch is stable stuff, so we should merge 1+2 into 2.4.x as well.
Jan
^ permalink raw reply [flat|nested] 25+ messages in thread* [Xenomai-core] [PATCH 2/4] Fix and optimize xnlock_put 2008-02-23 13:33 [Xenomai-core] [PATCH 0/4] Fixes and improvements around xnlock Jan Kiszka @ 2008-02-23 13:36 ` Jan Kiszka 2008-02-23 17:41 ` Gilles Chanteperdrix 2008-02-23 13:37 ` [Xenomai-core] [PATCH 1/4] Refactor generic system.h Jan Kiszka ` (2 subsequent siblings) 3 siblings, 1 reply; 25+ messages in thread From: Jan Kiszka @ 2008-02-23 13:36 UTC (permalink / raw) To: Xenomai-core [-- Attachment #1: Type: text/plain, Size: 711 bytes --] As the #ifdef forest was cut down, I once again looked at xnlock_put. Why do you have this safety check for the owner also in production code? Let's move it into the debug section. That leaves up with static inline void xnlock_put(xnlock_t *lock) { xnlock_dbg_release(lock); atomic_set(&lock->owner, ~0); } Nice - but wait, where is the memory barrier here? Was it in the original code? No! Neither atomic_read, atomic_set, nor the rthal_processor_id imply any kind of barrier to ensure all memory writes inside the protected region are committed before releasing the lock. Cool, there was a real bug under all this dust, and this patch fixes it + it shortens the hot path on SMP production systems. Jan [-- Attachment #2: rework-xnlock_put.patch --] [-- Type: text/x-patch, Size: 2359 bytes --] --- include/asm-generic/system.h | 38 +++++++++++++++++++------------------- 1 file changed, 19 insertions(+), 19 deletions(-) Index: b/include/asm-generic/system.h =================================================================== --- a/include/asm-generic/system.h +++ b/include/asm-generic/system.h @@ -163,7 +163,20 @@ static inline void xnlock_dbg_release(xn { extern xnlockinfo_t xnlock_stats[]; unsigned long long lock_time = rthal_rdtsc() - lock->lock_date; - xnlockinfo_t *stats = &xnlock_stats[xnarch_current_cpu()]; + int cpu = xnarch_current_cpu(); + xnlockinfo_t *stats = &xnlock_stats[cpu]; + + if (unlikely(atomic_read(&lock->owner) != cpu)) { + rthal_emergency_console(); + printk(KERN_ERR "Xenomai: unlocking unlocked nucleus lock %p" + " on CPU #%d\n" + " owner = %s:%u (%s(), CPU #%d)\n", + lock, cpu, lock->file, lock->line, lock->function, + lock->cpu); + show_stack(NULL,NULL); + for (;;) + cpu_relax(); + } if (lock_time > stats->lock_time) { stats->lock_time = lock_time; @@ -174,17 +187,6 @@ static inline void xnlock_dbg_release(xn } } -static inline void xnlock_dbg_invalid_release(xnlock_t *lock) -{ - rthal_emergency_console(); - printk(KERN_ERR "Xenomai: unlocking unlocked nucleus lock %p\n" - " owner = %s:%u (%s(), CPU #%d)\n", - lock, lock->file, lock->line, lock->function, lock->cpu); - show_stack(NULL,NULL); - for (;;) - cpu_relax(); -} - #else /* !(CONFIG_SMP && XENO_DEBUG(NUCLEUS)) */ typedef struct { atomic_t owner; } xnlock_t; @@ -198,8 +200,7 @@ typedef struct { atomic_t owner; } xnloc #define XNLOCK_DBG_SPINNING() do { } while (0) #define XNLOCK_DBG_ACQUIRED() do { } while (0) -static inline void xnlock_dbg_release(xnlock_t *lock) { } -static inline void xnlock_dbg_invalid_release(xnlock_t *lock) { } +static inline void xnlock_dbg_release(xnlock_t *lock) { } #endif /* !(CONFIG_SMP && XENO_DEBUG(NUCLEUS)) */ @@ -325,11 +326,10 @@ static inline int __xnlock_get(xnlock_t static inline void xnlock_put(xnlock_t *lock) { - if (likely(atomic_read(&lock->owner) == xnarch_current_cpu())) { - xnlock_dbg_release(lock); - atomic_set(&lock->owner, ~0); - } else - xnlock_dbg_invalid_release(lock); + xnarch_memory_barrier(); + + xnlock_dbg_release(lock); + atomic_set(&lock->owner, ~0); } static inline spl_t ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Xenomai-core] [PATCH 2/4] Fix and optimize xnlock_put 2008-02-23 13:36 ` [Xenomai-core] [PATCH 2/4] Fix and optimize xnlock_put Jan Kiszka @ 2008-02-23 17:41 ` Gilles Chanteperdrix 2008-02-23 18:05 ` Jan Kiszka 0 siblings, 1 reply; 25+ messages in thread From: Gilles Chanteperdrix @ 2008-02-23 17:41 UTC (permalink / raw) To: Jan Kiszka; +Cc: Xenomai-core Jan Kiszka wrote: > As the #ifdef forest was cut down, I once again looked at xnlock_put. > Why do you have this safety check for the owner also in production code? Because only one broken xnlock_put could entail a chain reaction of broken xnlock sections with code on multiple CPU violating critical sections. With the test, we prevent the chain reaction. But I agree this check should not be silent. -- Gilles Chanteperdrix. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Xenomai-core] [PATCH 2/4] Fix and optimize xnlock_put 2008-02-23 17:41 ` Gilles Chanteperdrix @ 2008-02-23 18:05 ` Jan Kiszka 2008-02-23 18:29 ` Gilles Chanteperdrix 0 siblings, 1 reply; 25+ messages in thread From: Jan Kiszka @ 2008-02-23 18:05 UTC (permalink / raw) To: Gilles Chanteperdrix; +Cc: Xenomai-core [-- Attachment #1: Type: text/plain, Size: 1047 bytes --] Gilles Chanteperdrix wrote: > Jan Kiszka wrote: > > As the #ifdef forest was cut down, I once again looked at xnlock_put. > > Why do you have this safety check for the owner also in production code? > > Because only one broken xnlock_put could entail a chain reaction of > broken xnlock sections with code on multiple CPU violating critical > sections. With the test, we prevent the chain reaction. But I agree this > check should not be silent. When there is a bug, then there is bug and we are hosed. That's why we have debug checks for finding such cases in advance. Here I was talking about such a debug check in a hot path on a _production_ system, and that check even had no fault recovery. That appeared pointless to me. Just to avoid misunderstandings: This version is not different from the old one if XENO_OPT_DEBUG_NUCLEUS is on, the switch which was introduced to cover specifically lock debugging. Do you have an idea for some cheap fault recovery for broken locking that we should put in instead? Jan [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 254 bytes --] ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Xenomai-core] [PATCH 2/4] Fix and optimize xnlock_put 2008-02-23 18:05 ` Jan Kiszka @ 2008-02-23 18:29 ` Gilles Chanteperdrix 2008-02-23 18:57 ` Jan Kiszka 0 siblings, 1 reply; 25+ messages in thread From: Gilles Chanteperdrix @ 2008-02-23 18:29 UTC (permalink / raw) To: Jan Kiszka; +Cc: Xenomai-core Jan Kiszka wrote: > Gilles Chanteperdrix wrote: > > Jan Kiszka wrote: > > > As the #ifdef forest was cut down, I once again looked at xnlock_put. > > > Why do you have this safety check for the owner also in production code? > > > > Because only one broken xnlock_put could entail a chain reaction of > > broken xnlock sections with code on multiple CPU violating critical > > sections. With the test, we prevent the chain reaction. But I agree this > > check should not be silent. > > When there is a bug, then there is bug and we are hosed. That's why we > have debug checks for finding such cases in advance. Here I was talking > about such a debug check in a hot path on a _production_ system, and > that check even had no fault recovery. That appeared pointless to me. > > Just to avoid misunderstandings: This version is not different from the > old one if XENO_OPT_DEBUG_NUCLEUS is on, the switch which was introduced > to cover specifically lock debugging. > > Do you have an idea for some cheap fault recovery for broken locking > that we should put in instead? The fault recovery is to leave the xnlock untouched, in order to avoid the chain reaction I was talking about. Anyway, I think even production code should contain some sanity checks, and this one looks cheap. -- Gilles Chanteperdrix. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Xenomai-core] [PATCH 2/4] Fix and optimize xnlock_put 2008-02-23 18:29 ` Gilles Chanteperdrix @ 2008-02-23 18:57 ` Jan Kiszka 2008-02-23 19:41 ` Gilles Chanteperdrix 2008-02-23 23:50 ` Philippe Gerum 0 siblings, 2 replies; 25+ messages in thread From: Jan Kiszka @ 2008-02-23 18:57 UTC (permalink / raw) To: Gilles Chanteperdrix; +Cc: Xenomai-core [-- Attachment #1: Type: text/plain, Size: 1930 bytes --] Gilles Chanteperdrix wrote: > Jan Kiszka wrote: > > Gilles Chanteperdrix wrote: > > > Jan Kiszka wrote: > > > > As the #ifdef forest was cut down, I once again looked at xnlock_put. > > > > Why do you have this safety check for the owner also in production code? > > > > > > Because only one broken xnlock_put could entail a chain reaction of > > > broken xnlock sections with code on multiple CPU violating critical > > > sections. With the test, we prevent the chain reaction. But I agree this > > > check should not be silent. > > > > When there is a bug, then there is bug and we are hosed. That's why we > > have debug checks for finding such cases in advance. Here I was talking > > about such a debug check in a hot path on a _production_ system, and > > that check even had no fault recovery. That appeared pointless to me. > > > > Just to avoid misunderstandings: This version is not different from the > > old one if XENO_OPT_DEBUG_NUCLEUS is on, the switch which was introduced > > to cover specifically lock debugging. > > > > Do you have an idea for some cheap fault recovery for broken locking > > that we should put in instead? > > The fault recovery is to leave the xnlock untouched, in order to avoid > the chain reaction I was talking about. But you may later deadlock on it when trying to reacquire this unbalanced lock. It can help against double releases, granted. Still, such cases should be eliminated _ahead_ of release via review, so that one can finally go for the fast unchecked version in the matured system. > Anyway, I think even production > code should contain some sanity checks, and this one looks cheap. I'm fine with a simple check as well. But there should be an _option_ for such checks, even more if they are supposed to be inlined. Uninlining reduces this pressure, but it still adds text size to the hot path. Jan [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 254 bytes --] ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Xenomai-core] [PATCH 2/4] Fix and optimize xnlock_put 2008-02-23 18:57 ` Jan Kiszka @ 2008-02-23 19:41 ` Gilles Chanteperdrix 2008-02-23 23:50 ` Philippe Gerum 1 sibling, 0 replies; 25+ messages in thread From: Gilles Chanteperdrix @ 2008-02-23 19:41 UTC (permalink / raw) To: Jan Kiszka; +Cc: Xenomai-core Jan Kiszka wrote: > Gilles Chanteperdrix wrote: > > Jan Kiszka wrote: > > > Gilles Chanteperdrix wrote: > > > > Jan Kiszka wrote: > > > > > As the #ifdef forest was cut down, I once again looked at xnlock_put. > > > > > Why do you have this safety check for the owner also in production code? > > > > > > > > Because only one broken xnlock_put could entail a chain reaction of > > > > broken xnlock sections with code on multiple CPU violating critical > > > > sections. With the test, we prevent the chain reaction. But I agree this > > > > check should not be silent. > > > > > > When there is a bug, then there is bug and we are hosed. That's why we > > > have debug checks for finding such cases in advance. Here I was talking > > > about such a debug check in a hot path on a _production_ system, and > > > that check even had no fault recovery. That appeared pointless to me. > > > > > > Just to avoid misunderstandings: This version is not different from the > > > old one if XENO_OPT_DEBUG_NUCLEUS is on, the switch which was introduced > > > to cover specifically lock debugging. > > > > > > Do you have an idea for some cheap fault recovery for broken locking > > > that we should put in instead? > > > > The fault recovery is to leave the xnlock untouched, in order to avoid > > the chain reaction I was talking about. > > But you may later deadlock on it when trying to reacquire this > unbalanced lock. It can help against double releases, granted. Still, > such cases should be eliminated _ahead_ of release via review, so that > one can finally go for the fast unchecked version in the matured system. A deadlock is far easier to spot than a violated critical section. > > > Anyway, I think even production > > code should contain some sanity checks, and this one looks cheap. > > I'm fine with a simple check as well. But there should be an _option_ > for such checks, even more if they are supposed to be inlined. > Uninlining reduces this pressure, but it still adds text size to the hot > path. No, no option, some sanity checks should be unavoidable, this one is such a check. We will probably never agree I guess. -- Gilles Chanteperdrix. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Xenomai-core] [PATCH 2/4] Fix and optimize xnlock_put 2008-02-23 18:57 ` Jan Kiszka 2008-02-23 19:41 ` Gilles Chanteperdrix @ 2008-02-23 23:50 ` Philippe Gerum 1 sibling, 0 replies; 25+ messages in thread From: Philippe Gerum @ 2008-02-23 23:50 UTC (permalink / raw) To: Jan Kiszka; +Cc: Xenomai-core Jan Kiszka wrote: > Gilles Chanteperdrix wrote: >> Jan Kiszka wrote: >> > Gilles Chanteperdrix wrote: >> > > Jan Kiszka wrote: >> > > > As the #ifdef forest was cut down, I once again looked at xnlock_put. >> > > > Why do you have this safety check for the owner also in production code? >> > > >> > > Because only one broken xnlock_put could entail a chain reaction of >> > > broken xnlock sections with code on multiple CPU violating critical >> > > sections. With the test, we prevent the chain reaction. But I agree this >> > > check should not be silent. >> > >> > When there is a bug, then there is bug and we are hosed. That's why we >> > have debug checks for finding such cases in advance. Here I was talking >> > about such a debug check in a hot path on a _production_ system, and >> > that check even had no fault recovery. That appeared pointless to me. >> > >> > Just to avoid misunderstandings: This version is not different from the >> > old one if XENO_OPT_DEBUG_NUCLEUS is on, the switch which was introduced >> > to cover specifically lock debugging. >> > >> > Do you have an idea for some cheap fault recovery for broken locking >> > that we should put in instead? >> >> The fault recovery is to leave the xnlock untouched, in order to avoid >> the chain reaction I was talking about. > > But you may later deadlock on it when trying to reacquire this > unbalanced lock. It can help against double releases, granted. Still, > such cases should be eliminated _ahead_ of release via review, so that > one can finally go for the fast unchecked version in the matured system. > >> Anyway, I think even production >> code should contain some sanity checks, and this one looks cheap. > > I'm fine with a simple check as well. But there should be an _option_ > for such checks, even more if they are supposed to be inlined. > Uninlining reduces this pressure, but it still adds text size to the hot > path. I don't think this is strictly equivalent. The cause of deadlock is no rocket science to diagnose even if it may be painful to fix, so our main problem is critical section breakage. And this particular kind of bugs is extremely sensitive to the instrumentation overhead, to the point where enabling a debug option would turn it into a damned Heisenbug. Unless you want to create a unique debug option to control each and every tiny check, you will likely bind it to a more general debug option, turning on a significant amount of unrelated debug code, which could in turn affect the bug itself. This is where I agree with Gilles, I would rather pay peanuts cycles when acquiring a lock on SMP machines to run this check, than chase wild gooses for days. Some key issues like obvious critical section breakage deserve systematic detection, including in the field. Due to their very nature, maybe you won't be able to trigger some of them elsewhere anyway, because in some complex application cases (the ones we more often have with SMP configs), you may just fake part of the operating environment to debug the application (network feed simulation and so on). But, and there I tend to agree with you, uninlining the containing code is definitely something I would combine with such approach, since I-cache is a precious thing to save. Anyway, measurements will tell. -- Philippe. ^ permalink raw reply [flat|nested] 25+ messages in thread
* [Xenomai-core] [PATCH 1/4] Refactor generic system.h 2008-02-23 13:33 [Xenomai-core] [PATCH 0/4] Fixes and improvements around xnlock Jan Kiszka 2008-02-23 13:36 ` [Xenomai-core] [PATCH 2/4] Fix and optimize xnlock_put Jan Kiszka @ 2008-02-23 13:37 ` Jan Kiszka 2008-02-23 17:38 ` Gilles Chanteperdrix 2008-02-23 13:38 ` [Xenomai-core] [PATCH 3/4] Uninline heavy locking functions Jan Kiszka 2008-02-23 13:50 ` [Xenomai-core] [RFC][PATCH 4/4] Recursive FIFO ticket xnlock Jan Kiszka 3 siblings, 1 reply; 25+ messages in thread From: Jan Kiszka @ 2008-02-23 13:37 UTC (permalink / raw) To: Xenomai-core [-- Attachment #1: Type: text/plain, Size: 443 bytes --] In order to allow further optimizations of xnlock, I started with refactoring the related system.h. This improves the readability significantly, IMHO. It also happen to reduce the text size of __xnlock_get a bit by avoid redundant rthal_processor_id read-outs. Another quirk I happen to remove: xnlock debugging depends on XENO_OPT_DEBUG_NUCLEUS, but needlessly we used to pick the debug version of xnlock_t already with XENO_OPT_DEBUG. Jan [-- Attachment #2: refactor-generic-system-h.patch --] [-- Type: text/x-patch, Size: 17874 bytes --] --- include/asm-generic/system.h | 412 +++++++++++++++++++++---------------------- 1 file changed, 204 insertions(+), 208 deletions(-) Index: b/include/asm-generic/system.h =================================================================== --- a/include/asm-generic/system.h +++ b/include/asm-generic/system.h @@ -47,18 +47,22 @@ #define CONFIG_XENO_OPT_DEBUG_NUCLEUS 0 #endif +#ifdef __cplusplus +extern "C" { +#endif + /* Time base export */ #define xnarch_declare_tbase(base) do { } while(0) /* Tracer interface */ #define xnarch_trace_max_begin(v) rthal_trace_max_begin(v) -#define xnarch_trace_max_end(v) rthal_trace_max_end(v) +#define xnarch_trace_max_end(v) rthal_trace_max_end(v) #define xnarch_trace_max_reset() rthal_trace_max_reset() #define xnarch_trace_user_start() rthal_trace_user_start() #define xnarch_trace_user_stop(v) rthal_trace_user_stop(v) -#define xnarch_trace_user_freeze(v, once) rthal_trace_user_freeze(v, once) +#define xnarch_trace_user_freeze(v, once) rthal_trace_user_freeze(v, once) #define xnarch_trace_special(id, v) rthal_trace_special(id, v) -#define xnarch_trace_special_u64(id, v) rthal_trace_special_u64(id, v) +#define xnarch_trace_special_u64(id, v) rthal_trace_special_u64(id, v) #define xnarch_trace_pid(pid, prio) rthal_trace_pid(pid, prio) #define xnarch_trace_panic_freeze() rthal_trace_panic_freeze() #define xnarch_trace_panic_dump() rthal_trace_panic_dump() @@ -81,26 +85,32 @@ typedef unsigned long spl_t; #define spltest() rthal_local_irq_test() #define splget(x) rthal_local_irq_flags(x) -#if defined(CONFIG_SMP) && defined(CONFIG_XENO_OPT_DEBUG) +static inline unsigned xnarch_current_cpu(void) +{ + return rthal_processor_id(); +} + +#if defined(CONFIG_SMP) && XENO_DEBUG(NUCLEUS) + typedef struct { - unsigned long long spin_time; - unsigned long long lock_time; - const char *file; - const char *function; - unsigned line; + unsigned long long spin_time; + unsigned long long lock_time; + const char *file; + const char *function; + unsigned line; } xnlockinfo_t; typedef struct { - atomic_t owner; - const char *file; - const char *function; - unsigned line; - int cpu; - unsigned long long spin_time; - unsigned long long lock_date; + atomic_t owner; + const char *file; + const char *function; + unsigned line; + int cpu; + unsigned long long spin_time; + unsigned long long lock_date; } xnlock_t; @@ -114,70 +124,137 @@ typedef struct { 0LL, \ } -#else /* !(CONFIG_SMP && CONFIG_XENO_OPT_DEBUG) */ +#define XNLOCK_DBG_CONTEXT , __FILE__, __LINE__, __FUNCTION__ +#define XNLOCK_DBG_CONTEXT_ARGS \ + , const char *file, int line, const char *function +#define XNLOCK_DBG_PASS_CONTEXT , file, line, function + +#define XNLOCK_DBG_PREPARE_ACQUIRE() \ + unsigned long long __lock_date = rthal_rdtsc(); \ + unsigned __spin_limit = 3000000 + +#define XNLOCK_DBG_SPINNING() \ + do { \ + if (__spin_limit-- == 0) { \ + rthal_emergency_console(); \ + printk(KERN_ERR \ + "Xenomai: stuck on nucleus lock %p\n" \ + " waiter = %s:%u (%s(), CPU #%d)\n" \ + " owner = %s:%u (%s(), CPU #%d)\n", \ + lock, file, line, function, cpu, lock->file, \ + lock->line, lock->function, lock->cpu); \ + show_stack(NULL, NULL); \ + for (;;) \ + cpu_relax(); \ + } \ + } while (0) + +#define XNLOCK_DBG_ACQUIRED() \ + do { \ + lock->spin_time = rthal_rdtsc() - __lock_date; \ + lock->lock_date = __lock_date; \ + lock->file = file; \ + lock->function = function; \ + lock->line = line; \ + lock->cpu = cpu; \ + } while (0) + +static inline void xnlock_dbg_release(xnlock_t *lock) +{ + extern xnlockinfo_t xnlock_stats[]; + unsigned long long lock_time = rthal_rdtsc() - lock->lock_date; + xnlockinfo_t *stats = &xnlock_stats[xnarch_current_cpu()]; + + if (lock_time > stats->lock_time) { + stats->lock_time = lock_time; + stats->spin_time = lock->spin_time; + stats->file = lock->file; + stats->function = lock->function; + stats->line = lock->line; + } +} + +static inline void xnlock_dbg_invalid_release(xnlock_t *lock) +{ + rthal_emergency_console(); + printk(KERN_ERR "Xenomai: unlocking unlocked nucleus lock %p\n" + " owner = %s:%u (%s(), CPU #%d)\n", + lock, lock->file, lock->line, lock->function, lock->cpu); + show_stack(NULL,NULL); + for (;;) + cpu_relax(); +} + +#else /* !(CONFIG_SMP && XENO_DEBUG(NUCLEUS)) */ typedef struct { atomic_t owner; } xnlock_t; -#define XNARCH_LOCK_UNLOCKED (xnlock_t) { { ~0 } } -#endif /* !(CONFIG_SMP && CONFIG_XENO_OPT_DEBUG) */ +#define XNARCH_LOCK_UNLOCKED (xnlock_t) { { ~0 } } + +#define XNLOCK_DBG_CONTEXT +#define XNLOCK_DBG_CONTEXT_ARGS +#define XNLOCK_DBG_PASS_CONTEXT +#define XNLOCK_DBG_PREPARE_ACQUIRE() do { } while (0) +#define XNLOCK_DBG_SPINNING() do { } while (0) +#define XNLOCK_DBG_ACQUIRED() do { } while (0) + +static inline void xnlock_dbg_release(xnlock_t *lock) { } +static inline void xnlock_dbg_invalid_release(xnlock_t *lock) { } -#define XNARCH_NR_CPUS RTHAL_NR_CPUS +#endif /* !(CONFIG_SMP && XENO_DEBUG(NUCLEUS)) */ -#define XNARCH_NR_IRQS RTHAL_NR_IRQS -#define XNARCH_TIMER_IRQ RTHAL_TIMER_IRQ -#define XNARCH_TIMER_DEVICE RTHAL_TIMER_DEVICE -#define XNARCH_CLOCK_DEVICE RTHAL_CLOCK_DEVICE +#define XNARCH_NR_CPUS RTHAL_NR_CPUS -#define XNARCH_ROOT_STACKSZ 0 /* Only a placeholder -- no stack */ +#define XNARCH_NR_IRQS RTHAL_NR_IRQS +#define XNARCH_TIMER_IRQ RTHAL_TIMER_IRQ +#define XNARCH_TIMER_DEVICE RTHAL_TIMER_DEVICE +#define XNARCH_CLOCK_DEVICE RTHAL_CLOCK_DEVICE + +#define XNARCH_ROOT_STACKSZ 0 /* Only a placeholder -- no stack */ #define XNARCH_PROMPT "Xenomai: " -#define xnarch_loginfo(fmt,args...) printk(KERN_INFO XNARCH_PROMPT fmt , ##args) -#define xnarch_logwarn(fmt,args...) printk(KERN_WARNING XNARCH_PROMPT fmt , ##args) -#define xnarch_logerr(fmt,args...) printk(KERN_ERR XNARCH_PROMPT fmt , ##args) -#define xnarch_printf(fmt,args...) printk(KERN_INFO XNARCH_PROMPT fmt , ##args) +#define xnarch_loginfo(fmt, args...) printk(KERN_INFO XNARCH_PROMPT fmt, ##args) +#define xnarch_logwarn(fmt, args...) printk(KERN_WARNING XNARCH_PROMPT fmt, ##args) +#define xnarch_logerr(fmt, args...) printk(KERN_ERR XNARCH_PROMPT fmt, ##args) +#define xnarch_printf(fmt, args...) printk(KERN_INFO XNARCH_PROMPT fmt, ##args) typedef cpumask_t xnarch_cpumask_t; #ifdef CONFIG_SMP -#define xnarch_cpu_online_map cpu_online_map +#define xnarch_cpu_online_map cpu_online_map #else -#define xnarch_cpu_online_map cpumask_of_cpu(0) +#define xnarch_cpu_online_map cpumask_of_cpu(0) #endif -#define xnarch_num_online_cpus() num_online_cpus() -#define xnarch_cpu_set(cpu, mask) cpu_set(cpu, (mask)) -#define xnarch_cpu_clear(cpu, mask) cpu_clear(cpu, (mask)) -#define xnarch_cpus_clear(mask) cpus_clear(mask) -#define xnarch_cpu_isset(cpu, mask) cpu_isset(cpu, (mask)) -#define xnarch_cpus_and(dst, src1, src2) cpus_and((dst), (src1), (src2)) -#define xnarch_cpus_equal(mask1, mask2) cpus_equal((mask1), (mask2)) -#define xnarch_cpus_empty(mask) cpus_empty(mask) -#define xnarch_cpumask_of_cpu(cpu) cpumask_of_cpu(cpu) -#define xnarch_cpu_test_and_set(cpu,mask) cpu_test_and_set(cpu, (mask)) - -#define xnarch_first_cpu(mask) first_cpu(mask) -#define XNARCH_CPU_MASK_ALL CPU_MASK_ALL +#define xnarch_num_online_cpus() num_online_cpus() +#define xnarch_cpu_set(cpu, mask) cpu_set(cpu, (mask)) +#define xnarch_cpu_clear(cpu, mask) cpu_clear(cpu, (mask)) +#define xnarch_cpus_clear(mask) cpus_clear(mask) +#define xnarch_cpu_isset(cpu, mask) cpu_isset(cpu, (mask)) +#define xnarch_cpus_and(dst, src1, src2) cpus_and((dst), (src1), (src2)) +#define xnarch_cpus_equal(mask1, mask2) cpus_equal((mask1), (mask2)) +#define xnarch_cpus_empty(mask) cpus_empty(mask) +#define xnarch_cpumask_of_cpu(cpu) cpumask_of_cpu(cpu) +#define xnarch_cpu_test_and_set(cpu, mask) cpu_test_and_set(cpu, (mask)) +#define xnarch_first_cpu(mask) first_cpu(mask) +#define XNARCH_CPU_MASK_ALL CPU_MASK_ALL typedef struct xnarch_heapcb { - atomic_t numaps; /* # of active user-space mappings. */ + atomic_t numaps; /* # of active user-space mappings. */ - int kmflags; /* Kernel memory flags (0 if vmalloc()). */ + int kmflags; /* Kernel memory flags (0 if vmalloc()). */ - void *heapbase; /* Shared heap memory base. */ + void *heapbase; /* Shared heap memory base. */ } xnarch_heapcb_t; -#ifdef __cplusplus -extern "C" { -#endif - unsigned long long xnarch_get_host_time(void); long long xnarch_tsc_to_ns(long long ts); static inline long long xnarch_tsc_to_ns_rounded(long long ts) { - return (xnarch_llimd(ts, 1000000000, RTHAL_CPU_FREQ/2) + 1) / 2; + return (xnarch_llimd(ts, 1000000000, RTHAL_CPU_FREQ/2) + 1) / 2; } long long xnarch_ns_to_tsc(long long ns); @@ -186,48 +263,39 @@ unsigned long long xnarch_get_cpu_time(v static inline unsigned long long xnarch_get_cpu_freq(void) { - return RTHAL_CPU_FREQ; + return RTHAL_CPU_FREQ; } -static inline unsigned xnarch_current_cpu(void) -{ - return rthal_processor_id(); -} - -#define xnarch_halt(emsg) \ -do { \ - rthal_emergency_console(); \ - xnarch_logerr("fatal: %s\n",emsg); \ - show_stack(NULL,NULL); \ - xnarch_trace_panic_dump(); \ - for (;;) cpu_relax(); \ -} while(0) +#define xnarch_halt(emsg) \ + do { \ + rthal_emergency_console(); \ + xnarch_logerr("fatal: %s\n", emsg); \ + show_stack(NULL,NULL); \ + xnarch_trace_panic_dump(); \ + for (;;) \ + cpu_relax(); \ + } while(0) static inline int xnarch_setimask (int imask) { - spl_t s; - splhigh(s); - splexit(!!imask); - return !!s; + spl_t s; + + splhigh(s); + splexit(!!imask); + return !!s; } #ifdef CONFIG_SMP -#if XENO_DEBUG(NUCLEUS) -#define xnlock_get(lock) \ - __xnlock_get(lock, __FILE__, __LINE__,__FUNCTION__) +#define xnlock_get(lock) __xnlock_get(lock XNLOCK_DBG_CONTEXT) #define xnlock_get_irqsave(lock,x) \ - ((x) = __xnlock_get_irqsave(lock, __FILE__, __LINE__,__FUNCTION__)) -#else /* !XENO_DEBUG(NUCLEUS) */ -#define xnlock_get(lock) __xnlock_get(lock) -#define xnlock_get_irqsave(lock,x) ((x) = __xnlock_get_irqsave(lock)) -#endif /* !XENO_DEBUG(NUCLEUS) */ -#define xnlock_clear_irqoff(lock) xnlock_put_irqrestore(lock,1) -#define xnlock_clear_irqon(lock) xnlock_put_irqrestore(lock,0) + ((x) = __xnlock_get_irqsave(lock XNLOCK_DBG_CONTEXT)) +#define xnlock_clear_irqoff(lock) xnlock_put_irqrestore(lock, 1) +#define xnlock_clear_irqon(lock) xnlock_put_irqrestore(lock, 0) static inline void xnlock_init (xnlock_t *lock) { - *lock = XNARCH_LOCK_UNLOCKED; + *lock = XNARCH_LOCK_UNLOCKED; } #define DECLARE_XNLOCK(lock) xnlock_t lock @@ -235,131 +303,59 @@ static inline void xnlock_init (xnlock_t #define DEFINE_XNLOCK(lock) xnlock_t lock = XNARCH_LOCK_UNLOCKED #define DEFINE_PRIVATE_XNLOCK(lock) static DEFINE_XNLOCK(lock) -#if XENO_DEBUG(NUCLEUS) +static inline int __xnlock_get(xnlock_t *lock XNLOCK_DBG_CONTEXT_ARGS) +{ + int cpu = xnarch_current_cpu(); + int recursing = (atomic_read(&lock->owner) == cpu); + + if (!recursing) { + XNLOCK_DBG_PREPARE_ACQUIRE(); -#define XNARCH_DEBUG_SPIN_LIMIT 3000000 + while (atomic_cmpxchg(&lock->owner, ~0, cpu) != ~0) + do { + cpu_relax(); + XNLOCK_DBG_SPINNING(); + } while(atomic_read(&lock->owner) != ~0); -static inline int __xnlock_get (xnlock_t *lock, - const char *file, - unsigned line, - const char *function) -{ - unsigned spin_count = 0; -#else /* !XENO_DEBUG(NUCLEUS) */ -static inline int __xnlock_get (xnlock_t *lock) -{ -#endif /* !XENO_DEBUG(NUCLEUS) */ - int recursing; - - recursing = (atomic_read(&lock->owner) == rthal_processor_id()); - if (!recursing) { -#if XENO_DEBUG(NUCLEUS) - unsigned long long lock_date = rthal_rdtsc(); -#endif /* XENO_DEBUG(NUCLEUS) */ - while(atomic_cmpxchg(&lock->owner, ~0, rthal_processor_id()) != ~0) - do { - cpu_relax(); - -#if XENO_DEBUG(NUCLEUS) - if (++spin_count == XNARCH_DEBUG_SPIN_LIMIT) { - rthal_emergency_console(); - printk(KERN_ERR - "Xenomai: stuck on nucleus lock %p\n" - " waiter = %s:%u (%s(), CPU #%d)\n" - " owner = %s:%u (%s(), CPU #%d)\n", - lock,file,line,function,rthal_processor_id(), - lock->file,lock->line,lock->function,lock->cpu); - show_stack(NULL,NULL); - for (;;) - cpu_relax(); - } -#endif /* XENO_DEBUG(NUCLEUS) */ - } while(atomic_read(&lock->owner) != ~0); - -#if XENO_DEBUG(NUCLEUS) - lock->spin_time = rthal_rdtsc() - lock_date; - lock->lock_date = lock_date; - lock->file = file; - lock->function = function; - lock->line = line; - lock->cpu = rthal_processor_id(); -#endif /* XENO_DEBUG(NUCLEUS) */ - } - - return recursing; -} - -static inline void xnlock_put (xnlock_t *lock) -{ - if (likely(atomic_read(&lock->owner) == rthal_processor_id())) { - -#if XENO_DEBUG(NUCLEUS) - extern xnlockinfo_t xnlock_stats[]; - - unsigned long long lock_time = rthal_rdtsc() - lock->lock_date; - int cpu = rthal_processor_id(); - - if (lock_time > xnlock_stats[cpu].lock_time) { - xnlock_stats[cpu].lock_time = lock_time; - xnlock_stats[cpu].spin_time = lock->spin_time; - xnlock_stats[cpu].file = lock->file; - xnlock_stats[cpu].function = lock->function; - xnlock_stats[cpu].line = lock->line; - } -#endif /* XENO_DEBUG(NUCLEUS) */ - atomic_set(&lock->owner, ~0); - } -#if XENO_DEBUG(NUCLEUS) - else { - rthal_emergency_console(); - printk(KERN_ERR - "Xenomai: unlocking unlocked nucleus lock %p\n" - " owner = %s:%u (%s(), CPU #%d)\n", - lock,lock->file,lock->line,lock->function,lock->cpu); - show_stack(NULL,NULL); - for (;;) - cpu_relax(); - } -#endif /* XENO_DEBUG(NUCLEUS) */ -} - -#if XENO_DEBUG(NUCLEUS) - -static inline spl_t __xnlock_get_irqsave (xnlock_t *lock, - const char *file, - unsigned line, - const char *function) -{ -#else /* !XENO_DEBUG(NUCLEUS) */ -static inline spl_t __xnlock_get_irqsave (xnlock_t *lock) -{ -#endif /* !XENO_DEBUG(NUCLEUS) */ - unsigned long flags; - - rthal_local_irq_save(flags); - -#if XENO_DEBUG(NUCLEUS) - if (__xnlock_get(lock, file, line, function)) - flags |= 2; -#else /* !XENO_DEBUG(NUCLEUS) */ - if (__xnlock_get(lock)) - flags |= 2; -#endif /* !XENO_DEBUG(NUCLEUS) */ - - return flags; + XNLOCK_DBG_ACQUIRED(); + } + + return recursing; } -static inline void xnlock_put_irqrestore (xnlock_t *lock, spl_t flags) +static inline void xnlock_put(xnlock_t *lock) { - if (!(flags & 2)) - xnlock_put(lock); + if (likely(atomic_read(&lock->owner) == xnarch_current_cpu())) { + xnlock_dbg_release(lock); + atomic_set(&lock->owner, ~0); + } else + xnlock_dbg_invalid_release(lock); +} - rthal_local_irq_restore(flags & 1); +static inline spl_t +__xnlock_get_irqsave(xnlock_t *lock XNLOCK_DBG_CONTEXT_ARGS) +{ + unsigned long flags; + + rthal_local_irq_save(flags); + + if (__xnlock_get(lock XNLOCK_DBG_PASS_CONTEXT)) + flags |= 2; + + return flags; } -static inline int xnarch_send_ipi (xnarch_cpumask_t cpumask) +static inline void xnlock_put_irqrestore(xnlock_t *lock, spl_t flags) +{ + if (!(flags & 2)) + xnlock_put(lock); + + rthal_local_irq_restore(flags & 1); +} + +static inline int xnarch_send_ipi(xnarch_cpumask_t cpumask) { - return rthal_send_ipi(RTHAL_SERVICE_IPI0, cpumask); + return rthal_send_ipi(RTHAL_SERVICE_IPI0, cpumask); } static inline int xnlock_is_owner(xnlock_t *lock) @@ -369,14 +365,14 @@ static inline int xnlock_is_owner(xnlock #else /* !CONFIG_SMP */ -#define xnlock_init(lock) do { } while(0) -#define xnlock_get(lock) do { } while(0) -#define xnlock_put(lock) do { } while(0) -#define xnlock_get_irqsave(lock,x) rthal_local_irq_save(x) -#define xnlock_put_irqrestore(lock,x) rthal_local_irq_restore(x) -#define xnlock_clear_irqoff(lock) rthal_local_irq_disable() -#define xnlock_clear_irqon(lock) rthal_local_irq_enable() -#define xnlock_is_owner(lock) 1 +#define xnlock_init(lock) do { } while(0) +#define xnlock_get(lock) do { } while(0) +#define xnlock_put(lock) do { } while(0) +#define xnlock_get_irqsave(lock,x) rthal_local_irq_save(x) +#define xnlock_put_irqrestore(lock,x) rthal_local_irq_restore(x) +#define xnlock_clear_irqoff(lock) rthal_local_irq_disable() +#define xnlock_clear_irqon(lock) rthal_local_irq_enable() +#define xnlock_is_owner(lock) 1 #define DECLARE_XNLOCK(lock) #define DECLARE_EXTERN_XNLOCK(lock) @@ -385,7 +381,7 @@ static inline int xnlock_is_owner(xnlock static inline int xnarch_send_ipi (xnarch_cpumask_t cpumask) { - return 0; + return 0; } #endif /* !CONFIG_SMP */ @@ -400,7 +396,7 @@ static inline int xnarch_remap_vm_page(s unsigned long from, unsigned long to) { - return wrap_remap_vm_page(vma,from,to); + return wrap_remap_vm_page(vma, from, to); } static inline int xnarch_remap_io_page_range(struct vm_area_struct *vma, @@ -409,7 +405,7 @@ static inline int xnarch_remap_io_page_r unsigned long size, pgprot_t prot) { - return wrap_remap_io_page_range(vma,from,to,size,prot); + return wrap_remap_io_page_range(vma, from, to, size, prot); } #ifndef xnarch_hisyscall_entry ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Xenomai-core] [PATCH 1/4] Refactor generic system.h 2008-02-23 13:37 ` [Xenomai-core] [PATCH 1/4] Refactor generic system.h Jan Kiszka @ 2008-02-23 17:38 ` Gilles Chanteperdrix 2008-02-23 18:03 ` Jan Kiszka 0 siblings, 1 reply; 25+ messages in thread From: Gilles Chanteperdrix @ 2008-02-23 17:38 UTC (permalink / raw) To: Jan Kiszka; +Cc: Xenomai-core Jan Kiszka wrote: > In order to allow further optimizations of xnlock, I started with > refactoring the related system.h. This improves the readability > significantly, IMHO. It also happen to reduce the text size of > __xnlock_get a bit by avoid redundant rthal_processor_id read-outs. > > Another quirk I happen to remove: xnlock debugging depends on > XENO_OPT_DEBUG_NUCLEUS, but needlessly we used to pick the debug version > of xnlock_t already with XENO_OPT_DEBUG. There is a lot of whitespace change in this patch, which make it hard to read. Anyway, there are a few things I do not like in this patch: - macro which make reference to symbols defined elsewhere - functions arguments as macro, I find more readable the #ifdef with the different function prototypes, the code can be read without having to look at a different place. Something we could be interesting would be to be able to enable spinlocks debug in UP, which would enable real debugging xnlocks in this case. I made an attempt of doing this on ARM some time ago, this generated a kernel that would lockup at boot. But I think this is something we should sort out. -- Gilles Chanteperdrix. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Xenomai-core] [PATCH 1/4] Refactor generic system.h 2008-02-23 17:38 ` Gilles Chanteperdrix @ 2008-02-23 18:03 ` Jan Kiszka 2008-02-23 18:59 ` Gilles Chanteperdrix 2008-03-01 18:54 ` Gilles Chanteperdrix 0 siblings, 2 replies; 25+ messages in thread From: Jan Kiszka @ 2008-02-23 18:03 UTC (permalink / raw) To: Gilles Chanteperdrix; +Cc: Xenomai-core [-- Attachment #1: Type: text/plain, Size: 2147 bytes --] Gilles Chanteperdrix wrote: > Jan Kiszka wrote: > > In order to allow further optimizations of xnlock, I started with > > refactoring the related system.h. This improves the readability > > significantly, IMHO. It also happen to reduce the text size of > > __xnlock_get a bit by avoid redundant rthal_processor_id read-outs. > > > > Another quirk I happen to remove: xnlock debugging depends on > > XENO_OPT_DEBUG_NUCLEUS, but needlessly we used to pick the debug version > > of xnlock_t already with XENO_OPT_DEBUG. > > There is a lot of whitespace change in this patch, which make it hard to > read. Well, this patch is mostly about whitespace and formatting fixes (among which ifdef reduction falls for me as well). But I can split it up if desired. > > Anyway, there are a few things I do not like in this patch: > - macro which make reference to symbols defined elsewhere You mean XNLOCK_DBG_PREPARE_ACQUIRE vs. XNLOCK_DBG_SPINNING/ACQUIRED? Granted, not nice but so far the most compact approach I found. My goal was to keep the lock implementations as pure as possible (you can easily ignore the debug stuff now when reading xnlock_get/put). > - functions arguments as macro, I find more readable the #ifdef with the > different function prototypes, the code can be read without having to > look at a different place. I'm open to learn a third way to achieve what we need. I'm just convinced that the old way was far worse. Please consider for a better suggestion that the number of variants increase with my ticket lock. That's why I tried to stuff things in macros. Hmm, maybe we should simply get rid of the file/line/function stuff completely and switch to IP + ksyms. What do you think? > > Something we could be interesting would be to be able to enable > spinlocks debug in UP, which would enable real debugging xnlocks in this > case. I made an attempt of doing this on ARM some time ago, this > generated a kernel that would lockup at boot. But I think this is > something we should sort out. > Yeah, sounds good and should be feasible. Will check. Jan [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 254 bytes --] ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Xenomai-core] [PATCH 1/4] Refactor generic system.h 2008-02-23 18:03 ` Jan Kiszka @ 2008-02-23 18:59 ` Gilles Chanteperdrix 2008-03-01 18:54 ` Gilles Chanteperdrix 1 sibling, 0 replies; 25+ messages in thread From: Gilles Chanteperdrix @ 2008-02-23 18:59 UTC (permalink / raw) To: Jan Kiszka; +Cc: Xenomai-core Jan Kiszka wrote: > Gilles Chanteperdrix wrote: > > Jan Kiszka wrote: > > > In order to allow further optimizations of xnlock, I started with > > > refactoring the related system.h. This improves the readability > > > significantly, IMHO. It also happen to reduce the text size of > > > __xnlock_get a bit by avoid redundant rthal_processor_id read-outs. > > > > > > Another quirk I happen to remove: xnlock debugging depends on > > > XENO_OPT_DEBUG_NUCLEUS, but needlessly we used to pick the debug version > > > of xnlock_t already with XENO_OPT_DEBUG. > > > > There is a lot of whitespace change in this patch, which make it hard to > > read. > > Well, this patch is mostly about whitespace and formatting fixes (among > which ifdef reduction falls for me as well). But I can split it up if > desired. > > > > > Anyway, there are a few things I do not like in this patch: > > - macro which make reference to symbols defined elsewhere > > You mean XNLOCK_DBG_PREPARE_ACQUIRE vs. XNLOCK_DBG_SPINNING/ACQUIRED? > Granted, not nice but so far the most compact approach I found. My goal > was to keep the lock implementations as pure as possible (you can easily > ignore the debug stuff now when reading xnlock_get/put). > > > - functions arguments as macro, I find more readable the #ifdef with the > > different function prototypes, the code can be read without having to > > look at a different place. > > I'm open to learn a third way to achieve what we need. I'm just > convinced that the old way was far worse. I do not see a third approach. Maybe passing all the arguments to the function, and count on the optimizer to remove useless arguments when inlining ? > > Please consider for a better suggestion that the number of variants > increase with my ticket lock. That's why I tried to stuff things in > macros. Hmm, maybe we should simply get rid of the file/line/function > stuff completely and switch to IP + ksyms. What do you think? I find the file line approach more precise. print_symbol gives you an offset, you have to disassemble to understand what it means, and it does not see inline functions. -- Gilles Chanteperdrix. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Xenomai-core] [PATCH 1/4] Refactor generic system.h 2008-02-23 18:03 ` Jan Kiszka 2008-02-23 18:59 ` Gilles Chanteperdrix @ 2008-03-01 18:54 ` Gilles Chanteperdrix 2008-03-01 19:22 ` Jan Kiszka 1 sibling, 1 reply; 25+ messages in thread From: Gilles Chanteperdrix @ 2008-03-01 18:54 UTC (permalink / raw) To: Jan Kiszka; +Cc: Xenomai-core Jan Kiszka wrote: > Gilles Chanteperdrix wrote: > > Jan Kiszka wrote: > > > In order to allow further optimizations of xnlock, I started with > > > refactoring the related system.h. This improves the readability > > > significantly, IMHO. It also happen to reduce the text size of > > > __xnlock_get a bit by avoid redundant rthal_processor_id read-outs. > > > > > > Another quirk I happen to remove: xnlock debugging depends on > > > XENO_OPT_DEBUG_NUCLEUS, but needlessly we used to pick the debug version > > > of xnlock_t already with XENO_OPT_DEBUG. > > > > There is a lot of whitespace change in this patch, which make it hard to > > read. > > Well, this patch is mostly about whitespace and formatting fixes (among > which ifdef reduction falls for me as well). But I can split it up if > desired. > > > > > Anyway, there are a few things I do not like in this patch: > > - macro which make reference to symbols defined elsewhere > > You mean XNLOCK_DBG_PREPARE_ACQUIRE vs. XNLOCK_DBG_SPINNING/ACQUIRED? > Granted, not nice but so far the most compact approach I found. My goal > was to keep the lock implementations as pure as possible (you can easily > ignore the debug stuff now when reading xnlock_get/put). > > > - functions arguments as macro, I find more readable the #ifdef with the > > different function prototypes, the code can be read without having to > > look at a different place. > > I'm open to learn a third way to achieve what we need. I'm just > convinced that the old way was far worse. > > Please consider for a better suggestion that the number of variants > increase with my ticket lock. That's why I tried to stuff things in > macros. Hmm, maybe we should simply get rid of the file/line/function > stuff completely and switch to IP + ksyms. What do you think? I do not want to leave this in a dead end. IMO, your approach make xnlock_get readable in the non debugging case at the expense of its readability in the debugging case. I would better see the two implementations with a unique ifdef. Granted, there will be some code duplication, but it will not be that much, and this allows us to move the debugging version out of line while keeping the non debugging case inline. -- Gilles Chanteperdrix. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Xenomai-core] [PATCH 1/4] Refactor generic system.h 2008-03-01 18:54 ` Gilles Chanteperdrix @ 2008-03-01 19:22 ` Jan Kiszka 0 siblings, 0 replies; 25+ messages in thread From: Jan Kiszka @ 2008-03-01 19:22 UTC (permalink / raw) To: Gilles Chanteperdrix; +Cc: Xenomai-core [-- Attachment #1: Type: text/plain, Size: 2742 bytes --] Gilles Chanteperdrix wrote: > Jan Kiszka wrote: > > Gilles Chanteperdrix wrote: > > > Jan Kiszka wrote: > > > > In order to allow further optimizations of xnlock, I started with > > > > refactoring the related system.h. This improves the readability > > > > significantly, IMHO. It also happen to reduce the text size of > > > > __xnlock_get a bit by avoid redundant rthal_processor_id read-outs. > > > > > > > > Another quirk I happen to remove: xnlock debugging depends on > > > > XENO_OPT_DEBUG_NUCLEUS, but needlessly we used to pick the debug version > > > > of xnlock_t already with XENO_OPT_DEBUG. > > > > > > There is a lot of whitespace change in this patch, which make it hard to > > > read. > > > > Well, this patch is mostly about whitespace and formatting fixes (among > > which ifdef reduction falls for me as well). But I can split it up if > > desired. > > > > > > > > Anyway, there are a few things I do not like in this patch: > > > - macro which make reference to symbols defined elsewhere > > > > You mean XNLOCK_DBG_PREPARE_ACQUIRE vs. XNLOCK_DBG_SPINNING/ACQUIRED? > > Granted, not nice but so far the most compact approach I found. My goal > > was to keep the lock implementations as pure as possible (you can easily > > ignore the debug stuff now when reading xnlock_get/put). > > > > > - functions arguments as macro, I find more readable the #ifdef with the > > > different function prototypes, the code can be read without having to > > > look at a different place. > > > > I'm open to learn a third way to achieve what we need. I'm just > > convinced that the old way was far worse. > > > > Please consider for a better suggestion that the number of variants > > increase with my ticket lock. That's why I tried to stuff things in > > macros. Hmm, maybe we should simply get rid of the file/line/function > > stuff completely and switch to IP + ksyms. What do you think? > > I do not want to leave this in a dead end. IMO, your approach make > xnlock_get readable in the non debugging case at the expense of its > readability in the debugging case. I would better see the two > implementations with a unique ifdef. Granted, there will be some code > duplication, but it will not be that much, and this allows us to move > the debugging version out of line while keeping the non debugging case > inline. Don't panic. I'm sitting on a new version of this patch series, only running a final benchmark to estimate the gain. This unfortunately takes a lot of time. BTW, the series will also add lock debugging for UP, and it beautifies the refactoring a bit more, addressing your concerns. Jan [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 250 bytes --] ^ permalink raw reply [flat|nested] 25+ messages in thread
* [Xenomai-core] [PATCH 3/4] Uninline heavy locking functions 2008-02-23 13:33 [Xenomai-core] [PATCH 0/4] Fixes and improvements around xnlock Jan Kiszka 2008-02-23 13:36 ` [Xenomai-core] [PATCH 2/4] Fix and optimize xnlock_put Jan Kiszka 2008-02-23 13:37 ` [Xenomai-core] [PATCH 1/4] Refactor generic system.h Jan Kiszka @ 2008-02-23 13:38 ` Jan Kiszka 2008-02-23 17:51 ` Gilles Chanteperdrix 2008-02-23 21:36 ` Jeroen Van den Keybus 2008-02-23 13:50 ` [Xenomai-core] [RFC][PATCH 4/4] Recursive FIFO ticket xnlock Jan Kiszka 3 siblings, 2 replies; 25+ messages in thread From: Jan Kiszka @ 2008-02-23 13:38 UTC (permalink / raw) To: Xenomai-core [-- Attachment #1: Type: text/plain, Size: 2567 bytes --] At least when SMP is enable, already __xnlock_get becomes far too heavy-weighted for being inlined. xnlock_put is fine now, but looking closer at the disassembly still revealed a lot of redundancy related to acquiring and releasing xnlocks. In fact, we are mostly using xnlock_get_irqsave and xnlock_put_irqrestore. Both include fiddling with rthal_local_irq_save/restore, also heavy-weighted on SMP. So this patch turns the latter two into uninlined functions which reduces the text size or nucleus and skins significantly on x86-64/SMP (XENO_OPT_DEBUG_NUCLEUS disabled): Without any patch of this series: text data bss dec hex filename 79189 2168 308 81665 13f01 kernel/xenomai/skins/native/xeno_native.o 26668 2176 1104 29948 74fc kernel/xenomai/skins/rtdm/xeno_rtdm.o 102661 1376 160224 264261 40845 kernel/xenomai/skins/posix/xeno_posix.o 112482 5440 444340 562262 89456 kernel/xenomai/nucleus/xeno_nucleus.o With 1+2 applied: text data bss dec hex filename 76099 2168 308 78575 132ef kernel/xenomai/skins/native/xeno_native.o 25871 2176 1104 29151 71df kernel/xenomai/skins/rtdm/xeno_rtdm.o 97816 1376 160224 259416 3f558 kernel/xenomai/skins/posix/xeno_posix.o 108818 5440 444340 558598 88606 kernel/xenomai/nucleus/xeno_nucleus.o With this one applied: text data bss dec hex filename 49469 2168 308 51945 cae9 kernel/xenomai/skins/native/xeno_native.o 19247 2176 1104 22527 57ff kernel/xenomai/skins/rtdm/xeno_rtdm.o 60200 1376 160224 221800 36268 kernel/xenomai/skins/posix/xeno_posix.o 79453 5440 444340 529233 81351 kernel/xenomai/nucleus/xeno_nucleus.o Given this dramatic reduction, I really cannot imagine any negative impact on worst-case latencies. Already the first nesting of nklock in some hot path (I would say, this is the minimum in practice) should pay of the additional function call. But I wasn't able to test yet. Anyone is welcome to try this out, feedback highly appreciated! Jan PS: Philippe, ipipe_restore_pipeline_head's disassembly eg. looks awful for an inline function. That's most obvious on SMP, but it is not much better on UP. The reason seems to be ipipe_cpudom_var(__ipipe_pipeline_head(), status) with its indirections, required to find the head domain slot. Wild idea: Why not using a fixed slot for the head domain? There can only be one anyway. That would then also help UP configs where this patch doesn't improve anything. [-- Attachment #2: uninline-xnlock-functions.patch --] [-- Type: text/x-patch, Size: 1859 bytes --] --- include/asm-generic/bits/pod.h | 24 ++++++++++++++++++++++++ include/asm-generic/system.h | 22 ++-------------------- 2 files changed, 26 insertions(+), 20 deletions(-) Index: b/include/asm-generic/bits/pod.h =================================================================== --- a/include/asm-generic/bits/pod.h +++ b/include/asm-generic/bits/pod.h @@ -295,4 +295,28 @@ unsigned long long xnarch_get_cpu_time(v EXPORT_SYMBOL(xnarch_get_cpu_time); +#ifdef CONFIG_SMP +spl_t __xnlock_get_irqsave(xnlock_t *lock XNLOCK_DBG_CONTEXT_ARGS) +{ + unsigned long flags; + + rthal_local_irq_save(flags); + + if (__xnlock_get(lock XNLOCK_DBG_PASS_CONTEXT)) + flags |= 2; + + return flags; +} +EXPORT_SYMBOL(__xnlock_get_irqsave); + +void xnlock_put_irqrestore(xnlock_t *lock, spl_t flags) +{ + if (!(flags & 2)) + xnlock_put(lock); + + rthal_local_irq_restore(flags & 1); +} +EXPORT_SYMBOL(xnlock_put_irqrestore); +#endif /* CONFIG_SMP */ + #endif /* !_XENO_ASM_GENERIC_BITS_POD_H */ Index: b/include/asm-generic/system.h =================================================================== --- a/include/asm-generic/system.h +++ b/include/asm-generic/system.h @@ -332,26 +332,8 @@ static inline void xnlock_put(xnlock_t * atomic_set(&lock->owner, ~0); } -static inline spl_t -__xnlock_get_irqsave(xnlock_t *lock XNLOCK_DBG_CONTEXT_ARGS) -{ - unsigned long flags; - - rthal_local_irq_save(flags); - - if (__xnlock_get(lock XNLOCK_DBG_PASS_CONTEXT)) - flags |= 2; - - return flags; -} - -static inline void xnlock_put_irqrestore(xnlock_t *lock, spl_t flags) -{ - if (!(flags & 2)) - xnlock_put(lock); - - rthal_local_irq_restore(flags & 1); -} +spl_t __xnlock_get_irqsave(xnlock_t *lock XNLOCK_DBG_CONTEXT_ARGS); +void xnlock_put_irqrestore(xnlock_t *lock, spl_t flags); static inline int xnarch_send_ipi(xnarch_cpumask_t cpumask) { ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Xenomai-core] [PATCH 3/4] Uninline heavy locking functions 2008-02-23 13:38 ` [Xenomai-core] [PATCH 3/4] Uninline heavy locking functions Jan Kiszka @ 2008-02-23 17:51 ` Gilles Chanteperdrix 2008-02-23 18:13 ` Jan Kiszka 2008-02-23 21:36 ` Jeroen Van den Keybus 1 sibling, 1 reply; 25+ messages in thread From: Gilles Chanteperdrix @ 2008-02-23 17:51 UTC (permalink / raw) To: Jan Kiszka; +Cc: Xenomai-core Jan Kiszka wrote: > At least when SMP is enable, already __xnlock_get becomes far too > heavy-weighted for being inlined. xnlock_put is fine now, but looking > closer at the disassembly still revealed a lot of redundancy related to > acquiring and releasing xnlocks. In fact, we are mostly using > xnlock_get_irqsave and xnlock_put_irqrestore. Both include fiddling with > rthal_local_irq_save/restore, also heavy-weighted on SMP. > > So this patch turns the latter two into uninlined functions which > reduces the text size or nucleus and skins significantly on x86-64/SMP > (XENO_OPT_DEBUG_NUCLEUS disabled): I think the human idea of how long an inline function can be is far more restrictive than what a processor can take. When looking at assembly code, you always find the code long, whereas in reality it is not that long for a processor. Besides, IMO, the proper way to uninline xnlock operations is to leave the non contended case inline, and to move the spinning out of line. And this is something we should not do without measuring its impact. -- Gilles Chanteperdrix. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Xenomai-core] [PATCH 3/4] Uninline heavy locking functions 2008-02-23 17:51 ` Gilles Chanteperdrix @ 2008-02-23 18:13 ` Jan Kiszka 2008-02-23 18:33 ` Gilles Chanteperdrix 0 siblings, 1 reply; 25+ messages in thread From: Jan Kiszka @ 2008-02-23 18:13 UTC (permalink / raw) To: Gilles Chanteperdrix; +Cc: Xenomai-core [-- Attachment #1: Type: text/plain, Size: 1847 bytes --] Gilles Chanteperdrix wrote: > Jan Kiszka wrote: > > At least when SMP is enable, already __xnlock_get becomes far too > > heavy-weighted for being inlined. xnlock_put is fine now, but looking > > closer at the disassembly still revealed a lot of redundancy related to > > acquiring and releasing xnlocks. In fact, we are mostly using > > xnlock_get_irqsave and xnlock_put_irqrestore. Both include fiddling with > > rthal_local_irq_save/restore, also heavy-weighted on SMP. > > > > So this patch turns the latter two into uninlined functions which > > reduces the text size or nucleus and skins significantly on x86-64/SMP > > (XENO_OPT_DEBUG_NUCLEUS disabled): > > I think the human idea of how long an inline function can be is far more > restrictive than what a processor can take. When looking at assembly > code, you always find the code long, whereas in reality it is not that > long for a processor. > > Besides, IMO, the proper way to uninline xnlock operations is to leave > the non contended case inline, and to move the spinning out of line. This patch is not just about uninlining xnlock, that's only one half of the savings. The other one is irq-disabling via i-pipe. The problem with our case is that we have no simple single check to find out that we are on a fast ride. Rather, we have to do quite some calculations/lookups before the first check, and we have to perform multiple checks even in the best case. > > And this is something we should not do without measuring its impact. For sure, will be done. But I'm very optimistic about the results given this massive code size reduction - which should translates in less cache misses for the worst-case path. What increase latency most for us (special hardware properties aside) is memory access, both data and text. Jan [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 254 bytes --] ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Xenomai-core] [PATCH 3/4] Uninline heavy locking functions 2008-02-23 18:13 ` Jan Kiszka @ 2008-02-23 18:33 ` Gilles Chanteperdrix 2008-02-23 18:58 ` Jan Kiszka 0 siblings, 1 reply; 25+ messages in thread From: Gilles Chanteperdrix @ 2008-02-23 18:33 UTC (permalink / raw) To: Jan Kiszka; +Cc: Xenomai-core Jan Kiszka wrote: > Gilles Chanteperdrix wrote: > > Jan Kiszka wrote: > > > At least when SMP is enable, already __xnlock_get becomes far too > > > heavy-weighted for being inlined. xnlock_put is fine now, but looking > > > closer at the disassembly still revealed a lot of redundancy related to > > > acquiring and releasing xnlocks. In fact, we are mostly using > > > xnlock_get_irqsave and xnlock_put_irqrestore. Both include fiddling with > > > rthal_local_irq_save/restore, also heavy-weighted on SMP. > > > > > > So this patch turns the latter two into uninlined functions which > > > reduces the text size or nucleus and skins significantly on x86-64/SMP > > > (XENO_OPT_DEBUG_NUCLEUS disabled): > > > > I think the human idea of how long an inline function can be is far more > > restrictive than what a processor can take. When looking at assembly > > code, you always find the code long, whereas in reality it is not that > > long for a processor. > > > > Besides, IMO, the proper way to uninline xnlock operations is to leave > > the non contended case inline, and to move the spinning out of line. > > This patch is not just about uninlining xnlock, that's only one half of > the savings. The other one is irq-disabling via i-pipe. The problem with > our case is that we have no simple single check to find out that we are > on a fast ride. Rather, we have to do quite some calculations/lookups > before the first check, and we have to perform multiple checks even in > the best case. This is my fault, a tradeoff I made, I thought that the atomic_cmpxchg could be heavy on SMP systems, so I made a first check to see if we are not recursing. But we can do the two operations in one move if we accept to have a failing atomic_cmpxchg when recursing. -- Gilles Chanteperdrix. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Xenomai-core] [PATCH 3/4] Uninline heavy locking functions 2008-02-23 18:33 ` Gilles Chanteperdrix @ 2008-02-23 18:58 ` Jan Kiszka 0 siblings, 0 replies; 25+ messages in thread From: Jan Kiszka @ 2008-02-23 18:58 UTC (permalink / raw) To: Gilles Chanteperdrix; +Cc: Xenomai-core [-- Attachment #1: Type: text/plain, Size: 2314 bytes --] Gilles Chanteperdrix wrote: > Jan Kiszka wrote: > > Gilles Chanteperdrix wrote: > > > Jan Kiszka wrote: > > > > At least when SMP is enable, already __xnlock_get becomes far too > > > > heavy-weighted for being inlined. xnlock_put is fine now, but looking > > > > closer at the disassembly still revealed a lot of redundancy related to > > > > acquiring and releasing xnlocks. In fact, we are mostly using > > > > xnlock_get_irqsave and xnlock_put_irqrestore. Both include fiddling with > > > > rthal_local_irq_save/restore, also heavy-weighted on SMP. > > > > > > > > So this patch turns the latter two into uninlined functions which > > > > reduces the text size or nucleus and skins significantly on x86-64/SMP > > > > (XENO_OPT_DEBUG_NUCLEUS disabled): > > > > > > I think the human idea of how long an inline function can be is far more > > > restrictive than what a processor can take. When looking at assembly > > > code, you always find the code long, whereas in reality it is not that > > > long for a processor. > > > > > > Besides, IMO, the proper way to uninline xnlock operations is to leave > > > the non contended case inline, and to move the spinning out of line. > > > > This patch is not just about uninlining xnlock, that's only one half of > > the savings. The other one is irq-disabling via i-pipe. The problem with > > our case is that we have no simple single check to find out that we are > > on a fast ride. Rather, we have to do quite some calculations/lookups > > before the first check, and we have to perform multiple checks even in > > the best case. > > This is my fault, a tradeoff I made, I thought that the atomic_cmpxchg > could be heavy on SMP systems, so I made a first check to see if we are > not recursing. But we can do the two operations in one move if we accept > to have a failing atomic_cmpxchg when recursing. I'm unsure about the cache pressure of cmpxchg vs. plain read. I guess the existing variant is already better. Moreover, the spinning code is only a fraction of the fraction. We cannot eliminate the recursion check, and we still have all the local_irq_save code. And _all_ this code mostly comes together, thus we save so much by uninlining those two functions. Jan [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 254 bytes --] ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Xenomai-core] [PATCH 3/4] Uninline heavy locking functions 2008-02-23 13:38 ` [Xenomai-core] [PATCH 3/4] Uninline heavy locking functions Jan Kiszka 2008-02-23 17:51 ` Gilles Chanteperdrix @ 2008-02-23 21:36 ` Jeroen Van den Keybus 1 sibling, 0 replies; 25+ messages in thread From: Jeroen Van den Keybus @ 2008-02-23 21:36 UTC (permalink / raw) To: Jan Kiszka; +Cc: Xenomai-core [-- Attachment #1: Type: text/plain, Size: 1486 bytes --] > > Without any patch of this series: > text data bss dec hex filename > 79189 2168 308 81665 13f01 > kernel/xenomai/skins/native/xeno_native.o > 26668 2176 1104 29948 74fc > kernel/xenomai/skins/rtdm/xeno_rtdm.o > 102661 1376 160224 264261 40845 > kernel/xenomai/skins/posix/xeno_posix.o > 112482 5440 444340 562262 89456 > kernel/xenomai/nucleus/xeno_nucleus.o > > With 1+2 applied: > text data bss dec hex filename > 76099 2168 308 78575 132ef > kernel/xenomai/skins/native/xeno_native.o > 25871 2176 1104 29151 71df > kernel/xenomai/skins/rtdm/xeno_rtdm.o > 97816 1376 160224 259416 3f558 > kernel/xenomai/skins/posix/xeno_posix.o > 108818 5440 444340 558598 88606 > kernel/xenomai/nucleus/xeno_nucleus.o > > With this one applied: > text data bss dec hex filename > 49469 2168 308 51945 cae9 > kernel/xenomai/skins/native/xeno_native.o > 19247 2176 1104 22527 57ff > kernel/xenomai/skins/rtdm/xeno_rtdm.o > 60200 1376 160224 221800 36268 > kernel/xenomai/skins/posix/xeno_posix.o > 79453 5440 444340 529233 81351 > kernel/xenomai/nucleus/xeno_nucleus.o How badly would the inlining affect the cache ? I imagine that having the respective (un)locking functions in a single place would greatly reduce the chance of it being swapped out. Is there a difference on the latency values with those patches applied ? Jeroen. [-- Attachment #2: Type: text/html, Size: 2703 bytes --] ^ permalink raw reply [flat|nested] 25+ messages in thread
* [Xenomai-core] [RFC][PATCH 4/4] Recursive FIFO ticket xnlock 2008-02-23 13:33 [Xenomai-core] [PATCH 0/4] Fixes and improvements around xnlock Jan Kiszka ` (2 preceding siblings ...) 2008-02-23 13:38 ` [Xenomai-core] [PATCH 3/4] Uninline heavy locking functions Jan Kiszka @ 2008-02-23 13:50 ` Jan Kiszka 2008-02-23 17:54 ` Gilles Chanteperdrix 3 siblings, 1 reply; 25+ messages in thread From: Jan Kiszka @ 2008-02-23 13:50 UTC (permalink / raw) To: Xenomai-core [-- Attachment #1: Type: text/plain, Size: 1183 bytes --] The root of all this: When Nick Piggin posted his first suggestion for ticket spinlocks on LKML, I immediately liked the idea. For details check LWN [1], in a nutshell: This algorithm enforces strict FIFO order for the admission to contended spinlocks, thus it improves the determinism on SMP systems with more than 2 CPUs. Meanwhile, ticket spinlocks are mainline (2.6.25). But that version has to drawbacks for us: it doesn't support nesting like xnlock does, and it is x86-only so far. So I designed a version for Xenomai which is both nestable and arch-independent. It is certainly not as optimal as mainline's version, but our code path stresses the locking code differently anyway. This thing here /seems/ to work, but I'm lacking CPUs at home to test. You can't truly stress ticket locks with only a single dual-core :-/. QEMU runs into a live-lock with -smp 2, this patch applied and two moderate latency loops, but that might be an artifact of its single-threaded VCPU scheduling. And kvm currently locks up under SMP even without any change, but kvm and SMP is a story of its own. There is hope: 16-way is waiting at work... :) Jan [1] http://lwn.net/Articles/267331 [-- Attachment #2: ticket-xnlock.patch --] [-- Type: text/x-patch, Size: 5738 bytes --] --- include/asm-generic/system.h | 110 +++++++++++++++++++++++++++++++++++++++---- ksrc/nucleus/Kconfig | 12 ++++ 2 files changed, 112 insertions(+), 10 deletions(-) Index: b/include/asm-generic/system.h =================================================================== --- a/include/asm-generic/system.h +++ b/include/asm-generic/system.h @@ -104,7 +104,12 @@ typedef struct { typedef struct { +#ifdef CONFIG_XENO_OPT_TICKET_LOCK + atomic_t fifo; + int owner; +#else atomic_t owner; +#endif const char *file; const char *function; unsigned line; @@ -115,7 +120,7 @@ typedef struct { } xnlock_t; #define XNARCH_LOCK_UNLOCKED (xnlock_t) { \ - { ~0 }, \ + XNLOCK_INIT_VALUE, \ NULL, \ NULL, \ 0, \ @@ -159,6 +164,8 @@ typedef struct { lock->cpu = cpu; \ } while (0) +static inline int xnlock_owner(xnlock_t *lock); + static inline void xnlock_dbg_release(xnlock_t *lock) { extern xnlockinfo_t xnlock_stats[]; @@ -166,7 +173,7 @@ static inline void xnlock_dbg_release(xn int cpu = xnarch_current_cpu(); xnlockinfo_t *stats = &xnlock_stats[cpu]; - if (unlikely(atomic_read(&lock->owner) != cpu)) { + if (unlikely(xnlock_owner(lock) != cpu)) { rthal_emergency_console(); printk(KERN_ERR "Xenomai: unlocking unlocked nucleus lock %p" " on CPU #%d\n" @@ -189,9 +196,18 @@ static inline void xnlock_dbg_release(xn #else /* !(CONFIG_SMP && XENO_DEBUG(NUCLEUS)) */ +#ifdef CONFIG_XENO_OPT_TICKET_LOCK +typedef struct { + + atomic_t fifo; + int owner; + +} xnlock_t; +#else typedef struct { atomic_t owner; } xnlock_t; +#endif -#define XNARCH_LOCK_UNLOCKED (xnlock_t) { { ~0 } } +#define XNARCH_LOCK_UNLOCKED (xnlock_t) { XNLOCK_INIT_VALUE } #define XNLOCK_DBG_CONTEXT #define XNLOCK_DBG_CONTEXT_ARGS @@ -294,20 +310,88 @@ static inline int xnarch_setimask (int i #define xnlock_clear_irqoff(lock) xnlock_put_irqrestore(lock, 1) #define xnlock_clear_irqon(lock) xnlock_put_irqrestore(lock, 0) -static inline void xnlock_init (xnlock_t *lock) -{ - *lock = XNARCH_LOCK_UNLOCKED; -} - #define DECLARE_XNLOCK(lock) xnlock_t lock #define DECLARE_EXTERN_XNLOCK(lock) extern xnlock_t lock #define DEFINE_XNLOCK(lock) xnlock_t lock = XNARCH_LOCK_UNLOCKED #define DEFINE_PRIVATE_XNLOCK(lock) static DEFINE_XNLOCK(lock) +#ifdef CONFIG_XENO_OPT_TICKET_LOCK +/* Note: We assume atomic_t is at least 32-bit wide. */ +#define XNLOCK_HEAD_MASK 0x000000FF +#define XNLOCK_HEAD_INC 0x00000001 +#define XNLOCK_TAIL_MASK 0xFF000000 +#define XNLOCK_TAIL_SHIFT 24 +#define XNLOCK_TAIL_INC 0x01000000 +#define XNLOCK_TICKET_MASK (XNLOCK_HEAD_MASK | XNLOCK_TAIL_MASK) +#define XNLOCK_INIT_VALUE { 0x00000001 }, ~0 + +static inline int xnlock_owner(xnlock_t *lock) +{ + return lock->owner; +} + +static inline int __xnlock_get(xnlock_t *lock XNLOCK_DBG_CONTEXT_ARGS) +{ + int cpu = xnarch_current_cpu(); + int recursing = (xnlock_owner(lock) == cpu); + u32 fifo; + u8 ticket; + + if (!recursing) { + XNLOCK_DBG_PREPARE_ACQUIRE(); + + fifo = atomic_add_return(XNLOCK_TAIL_INC, &lock->fifo); + ticket = fifo >> XNLOCK_TAIL_SHIFT; + + /* First check if we are already the owner (head == tail). */ + if (unlikely((u8)fifo != ticket)) { + /* + * Spin until head equals our ticket. + */ + do { + cpu_relax(); + XNLOCK_DBG_SPINNING(); + } while ((u8)atomic_read(&lock->fifo) != ticket); + } + + lock->owner = cpu; + XNLOCK_DBG_ACQUIRED(); + } + + return recursing; +} + +static inline void xnlock_put(xnlock_t *lock) +{ + u32 fifo = atomic_read(&lock->fifo); + + xnlock_dbg_release(lock); + + lock->owner = ~0; + + xnarch_memory_barrier(); + + /* + * Carefully calculate the addend so that we confine the + * XNLOCK_HEAD_INC just to the first byte of fifo. + */ + atomic_add(((fifo + XNLOCK_HEAD_INC) & XNLOCK_TICKET_MASK) - fifo, + &lock->fifo); +} + +#else /* ! CONFIG_XENO_OPT_TICKET_LOCK */ + +#define XNLOCK_INIT_VALUE { ~0 } + +static inline int xnlock_owner(xnlock_t *lock) +{ + return atomic_read(&lock->owner); +} + static inline int __xnlock_get(xnlock_t *lock XNLOCK_DBG_CONTEXT_ARGS) { int cpu = xnarch_current_cpu(); - int recursing = (atomic_read(&lock->owner) == cpu); + int recursing = (xnlock_owner(lock) == cpu); if (!recursing) { XNLOCK_DBG_PREPARE_ACQUIRE(); @@ -331,6 +415,12 @@ static inline void xnlock_put(xnlock_t * xnlock_dbg_release(lock); atomic_set(&lock->owner, ~0); } +#endif /* ! CONFIG_XENO_OPT_TICKET_LOCK */ + +static inline void xnlock_init(xnlock_t *lock) +{ + *lock = XNARCH_LOCK_UNLOCKED; +} spl_t __xnlock_get_irqsave(xnlock_t *lock XNLOCK_DBG_CONTEXT_ARGS); void xnlock_put_irqrestore(xnlock_t *lock, spl_t flags); @@ -342,7 +432,7 @@ static inline int xnarch_send_ipi(xnarch static inline int xnlock_is_owner(xnlock_t *lock) { - return atomic_read(&lock->owner) == xnarch_current_cpu(); + return xnlock_owner(lock) == xnarch_current_cpu(); } #else /* !CONFIG_SMP */ Index: b/ksrc/nucleus/Kconfig =================================================================== --- a/ksrc/nucleus/Kconfig +++ b/ksrc/nucleus/Kconfig @@ -340,6 +340,18 @@ config XENO_OPT_TIMER_WHEEL_STEP Set the duration in ns of a timer wheel step. At each step, the timer wheel use the next hash bucket. +config XENO_OPT_TICKET_LOCK + bool "FIFO ticket spinlocks" + depends on SMP + help + + Uses a ticket spinlock algorithm which enforces strict FIFO + admission order on contended locks, improving the determinism + of lock acquisitions on large SMP systems. The ticket spinlock + has a higher overhead than the unordered algorithm and should + therefore only be used if there are more than two CPUs, thus + more than one waiter in the worst case. + endmenu endif ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Xenomai-core] [RFC][PATCH 4/4] Recursive FIFO ticket xnlock 2008-02-23 13:50 ` [Xenomai-core] [RFC][PATCH 4/4] Recursive FIFO ticket xnlock Jan Kiszka @ 2008-02-23 17:54 ` Gilles Chanteperdrix 2008-02-23 18:20 ` Jan Kiszka 0 siblings, 1 reply; 25+ messages in thread From: Gilles Chanteperdrix @ 2008-02-23 17:54 UTC (permalink / raw) To: Jan Kiszka; +Cc: Xenomai-core Jan Kiszka wrote: > The root of all this: When Nick Piggin posted his first suggestion for > ticket spinlocks on LKML, I immediately liked the idea. For details > check LWN [1], in a nutshell: This algorithm enforces strict FIFO order > for the admission to contended spinlocks, thus it improves the > determinism on SMP systems with more than 2 CPUs. > > Meanwhile, ticket spinlocks are mainline (2.6.25). But that version has > to drawbacks for us: it doesn't support nesting like xnlock does, and it > is x86-only so far. > > So I designed a version for Xenomai which is both nestable and > arch-independent. It is certainly not as optimal as mainline's version, > but our code path stresses the locking code differently anyway. > > This thing here /seems/ to work, but I'm lacking CPUs at home to test. > You can't truly stress ticket locks with only a single dual-core :-/. > QEMU runs into a live-lock with -smp 2, this patch applied and two > moderate latency loops, but that might be an artifact of its > single-threaded VCPU scheduling. And kvm currently locks up under SMP > even without any change, but kvm and SMP is a story of its own. There is > hope: 16-way is waiting at work... :) Xenomai uses a Big Kernel Lock, an approach known to not scale very well. So, if we want to scale correctly on machines with many cpus, we should change our locking strategy first. -- Gilles Chanteperdrix. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Xenomai-core] [RFC][PATCH 4/4] Recursive FIFO ticket xnlock 2008-02-23 17:54 ` Gilles Chanteperdrix @ 2008-02-23 18:20 ` Jan Kiszka 2008-02-23 18:43 ` Gilles Chanteperdrix 0 siblings, 1 reply; 25+ messages in thread From: Jan Kiszka @ 2008-02-23 18:20 UTC (permalink / raw) To: Gilles Chanteperdrix; +Cc: Xenomai-core [-- Attachment #1: Type: text/plain, Size: 2024 bytes --] Gilles Chanteperdrix wrote: > Jan Kiszka wrote: > > The root of all this: When Nick Piggin posted his first suggestion for > > ticket spinlocks on LKML, I immediately liked the idea. For details > > check LWN [1], in a nutshell: This algorithm enforces strict FIFO order > > for the admission to contended spinlocks, thus it improves the > > determinism on SMP systems with more than 2 CPUs. > > > > Meanwhile, ticket spinlocks are mainline (2.6.25). But that version has > > to drawbacks for us: it doesn't support nesting like xnlock does, and it > > is x86-only so far. > > > > So I designed a version for Xenomai which is both nestable and > > arch-independent. It is certainly not as optimal as mainline's version, > > but our code path stresses the locking code differently anyway. > > > > This thing here /seems/ to work, but I'm lacking CPUs at home to test. > > You can't truly stress ticket locks with only a single dual-core :-/. > > QEMU runs into a live-lock with -smp 2, this patch applied and two > > moderate latency loops, but that might be an artifact of its > > single-threaded VCPU scheduling. And kvm currently locks up under SMP > > even without any change, but kvm and SMP is a story of its own. There is > > hope: 16-way is waiting at work... :) > > Xenomai uses a Big Kernel Lock, an approach known to not scale very > well. So, if we want to scale correctly on machines with many cpus, we > should change our locking strategy first. Per-cpu IPC objects, per-cpu nklock - I'm all with you! But that's stuff for a massive restructuring we could schedule for Xenomai 3. This thing here surely does not help to scale RT loads across 16 CPUs or more. But it already pays off determinism-wise with 3 or 4 CPUs involved in RT jobs with moderate contention on nklock. It is nothing for 2-way boxes (except for testing) where the exiting algorithm is more efficient (and where you don't have the risk of unfair lock admission anyway). Jan [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 254 bytes --] ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Xenomai-core] [RFC][PATCH 4/4] Recursive FIFO ticket xnlock 2008-02-23 18:20 ` Jan Kiszka @ 2008-02-23 18:43 ` Gilles Chanteperdrix 2008-02-23 19:13 ` Jan Kiszka 0 siblings, 1 reply; 25+ messages in thread From: Gilles Chanteperdrix @ 2008-02-23 18:43 UTC (permalink / raw) To: Jan Kiszka; +Cc: Xenomai-core Jan Kiszka wrote: > Gilles Chanteperdrix wrote: > > Jan Kiszka wrote: > > > The root of all this: When Nick Piggin posted his first suggestion for > > > ticket spinlocks on LKML, I immediately liked the idea. For details > > > check LWN [1], in a nutshell: This algorithm enforces strict FIFO order > > > for the admission to contended spinlocks, thus it improves the > > > determinism on SMP systems with more than 2 CPUs. > > > > > > Meanwhile, ticket spinlocks are mainline (2.6.25). But that version has > > > to drawbacks for us: it doesn't support nesting like xnlock does, and it > > > is x86-only so far. > > > > > > So I designed a version for Xenomai which is both nestable and > > > arch-independent. It is certainly not as optimal as mainline's version, > > > but our code path stresses the locking code differently anyway. > > > > > > This thing here /seems/ to work, but I'm lacking CPUs at home to test. > > > You can't truly stress ticket locks with only a single dual-core :-/. > > > QEMU runs into a live-lock with -smp 2, this patch applied and two > > > moderate latency loops, but that might be an artifact of its > > > single-threaded VCPU scheduling. And kvm currently locks up under SMP > > > even without any change, but kvm and SMP is a story of its own. There is > > > hope: 16-way is waiting at work... :) > > > > Xenomai uses a Big Kernel Lock, an approach known to not scale very > > well. So, if we want to scale correctly on machines with many cpus, we > > should change our locking strategy first. > > Per-cpu IPC objects, per-cpu nklock - I'm all with you! But that's stuff > for a massive restructuring we could schedule for Xenomai 3. This does not look that easy. What I meant is rather that Xenomai is probably run mostly on machines with not so many cpus. -- Gilles Chanteperdrix. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Xenomai-core] [RFC][PATCH 4/4] Recursive FIFO ticket xnlock 2008-02-23 18:43 ` Gilles Chanteperdrix @ 2008-02-23 19:13 ` Jan Kiszka 0 siblings, 0 replies; 25+ messages in thread From: Jan Kiszka @ 2008-02-23 19:13 UTC (permalink / raw) To: Gilles Chanteperdrix; +Cc: Xenomai-core [-- Attachment #1: Type: text/plain, Size: 2853 bytes --] Gilles Chanteperdrix wrote: > Jan Kiszka wrote: > > Gilles Chanteperdrix wrote: > > > Jan Kiszka wrote: > > > > The root of all this: When Nick Piggin posted his first suggestion for > > > > ticket spinlocks on LKML, I immediately liked the idea. For details > > > > check LWN [1], in a nutshell: This algorithm enforces strict FIFO order > > > > for the admission to contended spinlocks, thus it improves the > > > > determinism on SMP systems with more than 2 CPUs. > > > > > > > > Meanwhile, ticket spinlocks are mainline (2.6.25). But that version has > > > > to drawbacks for us: it doesn't support nesting like xnlock does, and it > > > > is x86-only so far. > > > > > > > > So I designed a version for Xenomai which is both nestable and > > > > arch-independent. It is certainly not as optimal as mainline's version, > > > > but our code path stresses the locking code differently anyway. > > > > > > > > This thing here /seems/ to work, but I'm lacking CPUs at home to test. > > > > You can't truly stress ticket locks with only a single dual-core :-/. > > > > QEMU runs into a live-lock with -smp 2, this patch applied and two > > > > moderate latency loops, but that might be an artifact of its > > > > single-threaded VCPU scheduling. And kvm currently locks up under SMP > > > > even without any change, but kvm and SMP is a story of its own. There is > > > > hope: 16-way is waiting at work... :) > > > > > > Xenomai uses a Big Kernel Lock, an approach known to not scale very > > > well. So, if we want to scale correctly on machines with many cpus, we > > > should change our locking strategy first. > > > > Per-cpu IPC objects, per-cpu nklock - I'm all with you! But that's stuff > > for a massive restructuring we could schedule for Xenomai 3. > > This does not look that easy. What I meant is rather that Xenomai is > probably run mostly on machines with not so many cpus. You do not have to be an oracle to predict that the number of cores will grow quickly in the foreseeable future on almost any arch, and that users will like to _use_ more of them for RT. Running Xenomai on big boxes is already no big deal today, you just have to keep the RT (and borderline) load confined to a few of them. (Statically!) spreading the load over more cores is then just the next logical step. For sure, what I'm proposing, strict user-manageable cpu-locality of objects and thus of lock mechanisms, is not simple to achieve - but it would scale like hell and it would keep the costs mostly at UP level, on huge SMP boxes, NUMA included. But even in that case, you will need spinlocks from time to time, and you will want them to remain as predictable as possible. So this patch is also a bit of an investment in the future. :) Jan [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 254 bytes --] ^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2008-03-01 19:22 UTC | newest] Thread overview: 25+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2008-02-23 13:33 [Xenomai-core] [PATCH 0/4] Fixes and improvements around xnlock Jan Kiszka 2008-02-23 13:36 ` [Xenomai-core] [PATCH 2/4] Fix and optimize xnlock_put Jan Kiszka 2008-02-23 17:41 ` Gilles Chanteperdrix 2008-02-23 18:05 ` Jan Kiszka 2008-02-23 18:29 ` Gilles Chanteperdrix 2008-02-23 18:57 ` Jan Kiszka 2008-02-23 19:41 ` Gilles Chanteperdrix 2008-02-23 23:50 ` Philippe Gerum 2008-02-23 13:37 ` [Xenomai-core] [PATCH 1/4] Refactor generic system.h Jan Kiszka 2008-02-23 17:38 ` Gilles Chanteperdrix 2008-02-23 18:03 ` Jan Kiszka 2008-02-23 18:59 ` Gilles Chanteperdrix 2008-03-01 18:54 ` Gilles Chanteperdrix 2008-03-01 19:22 ` Jan Kiszka 2008-02-23 13:38 ` [Xenomai-core] [PATCH 3/4] Uninline heavy locking functions Jan Kiszka 2008-02-23 17:51 ` Gilles Chanteperdrix 2008-02-23 18:13 ` Jan Kiszka 2008-02-23 18:33 ` Gilles Chanteperdrix 2008-02-23 18:58 ` Jan Kiszka 2008-02-23 21:36 ` Jeroen Van den Keybus 2008-02-23 13:50 ` [Xenomai-core] [RFC][PATCH 4/4] Recursive FIFO ticket xnlock Jan Kiszka 2008-02-23 17:54 ` Gilles Chanteperdrix 2008-02-23 18:20 ` Jan Kiszka 2008-02-23 18:43 ` Gilles Chanteperdrix 2008-02-23 19:13 ` Jan Kiszka
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.