At least when SMP is enabled, __xnlock_get is already far too heavyweight
to be inlined. xnlock_put is fine now, but a closer look at the
disassembly still revealed a lot of redundancy related to acquiring and
releasing xnlocks. In fact, we mostly use xnlock_get_irqsave and
xnlock_put_irqrestore. Both include fiddling with
rthal_local_irq_save/restore, which is also heavyweight on SMP.

So this patch turns the latter two into uninlined functions, which
reduces the text size of nucleus and skins significantly on x86-64/SMP
(XENO_OPT_DEBUG_NUCLEUS disabled):

Without any patch of this series:
   text    data     bss     dec     hex filename
  79189    2168     308   81665   13f01 kernel/xenomai/skins/native/xeno_native.o
  26668    2176    1104   29948    74fc kernel/xenomai/skins/rtdm/xeno_rtdm.o
 102661    1376  160224  264261   40845 kernel/xenomai/skins/posix/xeno_posix.o
 112482    5440  444340  562262   89456 kernel/xenomai/nucleus/xeno_nucleus.o

With 1+2 applied:
   text    data     bss     dec     hex filename
  76099    2168     308   78575   132ef kernel/xenomai/skins/native/xeno_native.o
  25871    2176    1104   29151    71df kernel/xenomai/skins/rtdm/xeno_rtdm.o
  97816    1376  160224  259416   3f558 kernel/xenomai/skins/posix/xeno_posix.o
 108818    5440  444340  558598   88606 kernel/xenomai/nucleus/xeno_nucleus.o

With this one applied:
   text    data     bss     dec     hex filename
  49469    2168     308   51945    cae9 kernel/xenomai/skins/native/xeno_native.o
  19247    2176    1104   22527    57ff kernel/xenomai/skins/rtdm/xeno_rtdm.o
  60200    1376  160224  221800   36268 kernel/xenomai/skins/posix/xeno_posix.o
  79453    5440  444340  529233   81351 kernel/xenomai/nucleus/xeno_nucleus.o

Given this dramatic reduction, I really cannot imagine any negative
impact on worst-case latencies. Even the first nesting of nklock in some
hot path (I would say this is the minimum in practice) should already
pay off the additional function call. But I wasn't able to test this yet.
Anyone is welcome to try this out, feedback highly appreciated!

Jan

PS: Philippe, the disassembly of ipipe_restore_pipeline_head, e.g., looks
awful for an inline function. That's most obvious on SMP, but it is not
much better on UP. The reason seems to be
ipipe_cpudom_var(__ipipe_pipeline_head(), status) with its indirections,
which are required to find the head domain slot. Wild idea: Why not use a
fixed slot for the head domain? There can only be one anyway. That would
then also help UP configs, where this patch doesn't improve anything.
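
For readers who want to see the shape of the main change (moving the irqsave/irqrestore lock wrappers out of line), here is a minimal toy sketch. It is not the actual patch and not Xenomai code: every demo_* name, the single-CPU model, and the flag encoding (bit 0 for the saved interrupt state, bit 1 for "lock was already held") are assumptions made purely for illustration. It also assumes GCC or Clang for __attribute__((noinline)).

```c
#include <stdbool.h>

/* Hypothetical stand-in for the CPU's interrupt-enable state. */
static bool demo_irqs_enabled = true;

/* Hypothetical lock: -1 when free, otherwise the owning CPU id. */
typedef struct {
    int owner;
} demo_lock_t;

#define DEMO_THIS_CPU 0  /* toy model: we always run on CPU 0 */

/*
 * Out-of-line acquire: a single copy lives in the text section and
 * each call site only pays for a function call, instead of carrying
 * its own copy of the irq-save and lock-acquire sequence.
 */
__attribute__((noinline))
unsigned long demo_lock_get_irqsave(demo_lock_t *lock)
{
    unsigned long flags = demo_irqs_enabled ? 1UL : 0UL;

    demo_irqs_enabled = false;        /* models a local irq save */
    if (lock->owner == DEMO_THIS_CPU)
        flags |= 2UL;                 /* nested: must not release on put */
    else
        lock->owner = DEMO_THIS_CPU;  /* real SMP code would spin here */
    return flags;
}

/* Out-of-line release: undoes exactly what the matching get did. */
__attribute__((noinline))
void demo_lock_put_irqrestore(demo_lock_t *lock, unsigned long flags)
{
    if (!(flags & 2UL))
        lock->owner = -1;             /* outermost put drops the lock */
    demo_irqs_enabled = (flags & 1UL) != 0;
}

/* Nested use, as it would occur in a hot path. Returns 1 on success. */
int demo_nested_section(demo_lock_t *lock)
{
    unsigned long outer = demo_lock_get_irqsave(lock);
    unsigned long inner = demo_lock_get_irqsave(lock);
    int held_inside = (lock->owner == DEMO_THIS_CPU) && !demo_irqs_enabled;

    demo_lock_put_irqrestore(lock, inner);
    int still_held = (lock->owner == DEMO_THIS_CPU);

    demo_lock_put_irqrestore(lock, outer);
    return held_inside && still_held;
}
```

In this sketch demo_nested_section contains four call sites but only one copy of each lock routine exists in the object file. That is the trade the patch makes: a function call per acquisition in exchange for smaller text. Whether it is a net win for latency would still have to be measured.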