From: Gilles Chanteperdrix <gilles.chanteperdrix@xenomai.org>
To: Jan Kiszka <jan.kiszka@siemens.com>,
Jeroen Van den Keybus <jeroen.vandenkeybus@gmail.com>
Cc: "xenomai@xenomai.org" <xenomai@xenomai.org>
Subject: Re: [Xenomai] Reading /proc/xenomai/stat causes high latencies
Date: Fri, 19 Sep 2014 04:06:09 +0200
Message-ID: <541B8F91.9050603@xenomai.org>
In-Reply-To: <541B3ED6.8090606@xenomai.org>
On 09/18/2014 10:21 PM, Gilles Chanteperdrix wrote:
> On 09/18/2014 01:59 PM, Jan Kiszka wrote:
>> On 2014-09-18 13:46, Gilles Chanteperdrix wrote:
>>> (*) There is no guarantee that a CPU will see the correct order of
>>> effects from a second CPU's accesses, even _if_ the second CPU uses a
>>> memory barrier, unless the first CPU _also_ uses a matching memory
>>> barrier (see the subsection on "SMP Barrier Pairing").
>>
>> [quick answer]
>>
>> ...or the architecture refrains from reordering write requests, like x86
>> does. What may happen, though, is that the compiler reorders the writes.
>> Therefore you need at least a (much cheaper) compiler barrier on those
>> archs. See also linux/Documentation/memory-barriers.txt on this and more.
>
> The passage you quote is quoted from memory-barriers.txt, and I find it
> makes it pretty clear that the two barriers are needed for cache
> synchronization in the general case. Now, I read more in
> memory-barriers, and I do not find easily details about what the fact
> that x86 is "strictly ordered" means, and how it relaxes the constraints
> on what rules. Maybe you would care to give us the exact passage where
> this is mentioned? Also, I would welcome any detail about how SMP cache
> synchronization actually works on x86.

OK, I have read a few things. It would seem that recent x86
microarchitectures (Nehalem, Sandy Bridge and probably Haswell) use the
MESIF cache coherence protocol, with a twist for Haswell since it
introduced transactional memory. In theory, a cache coherence protocol
transparently gives all CPUs the same view of the caches. MESIF itself is
derived from the MESI cache coherence protocol, which is said (by the
Wikipedia article) to have some performance issues that are generally
compensated for by adding a store buffer, which in turn requires memory
barriers for a store on one CPU to become visible in the cache (and so to
other CPUs). I did not find any indication that memory barriers are
still needed in this case (which is exactly the case we are interested
in) with MESIF, but no indication that they are not needed either.
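
To make the "matching barriers" rule quoted above concrete, here is a
minimal userspace sketch using C11 atomics rather than the kernel
primitives. The names publish(), consume(), payload and ready are
hypothetical and exist only for this illustration. The writer orders its
data store before its flag store, and the reader orders its flag load
before its data load; neither barrier is sufficient without the other.

```c
#include <assert.h>
#include <stdatomic.h>

static int payload;        /* plain data handed from writer to reader */
static atomic_int ready;   /* one-shot flag: nonzero once payload is valid */

/* Writer side (hypothetical name): meant to run on one CPU. */
void publish(int value)
{
	/* One-shot mailbox: publishing twice would be a caller bug. */
	assert(!atomic_load_explicit(&ready, memory_order_relaxed));

	payload = value;
	/* Write-side barrier: payload store is ordered before the flag store. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

/* Reader side (hypothetical name): meant to run on another CPU. */
int consume(void)
{
	/* Spin until the flag store becomes visible to this CPU. */
	while (!atomic_load_explicit(&ready, memory_order_relaxed))
		;
	/* Matching read-side barrier: flag load is ordered before the payload load. */
	atomic_thread_fence(memory_order_acquire);
	return payload;
}
```

On x86, compilers typically emit no instruction for these two fences and
only refrain from reordering the accesses themselves, which matches Jan's
remark about needing at least a compiler barrier there. This sketch says
nothing about how long the store takes to become visible, only about the
order in which the two stores are observed.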

Then I had a look at the ticket spinlock implementations. The
operations they perform are roughly the same as in the xnlock
implementation, except that they are optimized for each architecture, and
so omit the useless barriers. The ARM implementation has the barrier
after unlock, and additionally uses the special "sev" instruction, which
lets the spinning CPU wait for this signal with the "wfe" (wait for
event) instruction instead of burning CPU power while spinning. In fact,
it does not spin.
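
For readers unfamiliar with ticket spinlocks, the idea can be sketched in
portable C11 as follows. This is an illustration only, not the kernel's
arch_spinlock_t: the struct layout and the ticket_lock_* names are
assumptions made for this example, and the ARM "wfe"/"sev" optimization
is not modelled, so the waiter here really does spin.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical userspace ticket lock, for illustration only. */
struct ticket_lock {
	atomic_uint next;    /* next ticket to hand out */
	atomic_uint owner;   /* ticket currently being served */
};

void ticket_lock_init(struct ticket_lock *l)
{
	atomic_init(&l->next, 0);
	atomic_init(&l->owner, 0);
}

/* The lock is held (or contended) whenever a handed-out ticket is unserved. */
int ticket_lock_is_locked(struct ticket_lock *l)
{
	return atomic_load_explicit(&l->next, memory_order_relaxed) !=
	       atomic_load_explicit(&l->owner, memory_order_relaxed);
}

void ticket_lock_acquire(struct ticket_lock *l)
{
	/* Atomic add: take a ticket, which also announces a waiter. */
	unsigned int me = atomic_fetch_add_explicit(&l->next, 1,
						    memory_order_relaxed);

	/* Wait for our turn; acquire pairs with the release in unlock. */
	while (atomic_load_explicit(&l->owner, memory_order_acquire) != me)
		;
}

void ticket_lock_release(struct ticket_lock *l)
{
	unsigned int cur;

	assert(ticket_lock_is_locked(l));   /* releasing a free lock is a bug */

	/* Only the holder writes owner, so a plain load/store pair suffices. */
	cur = atomic_load_explicit(&l->owner, memory_order_relaxed);
	atomic_store_explicit(&l->owner, cur + 1, memory_order_release);
}
```

Waiters are served in the order in which they took their tickets, so a
CPU that releases the lock and immediately tries to take it again queues
behind any CPU already waiting.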
Of course, the problem is that they are not recursive, and implementing
recursive ticket spinlocks without adding overhead seems tricky. Just
to test whether ticket spinlocks solve the issue that started this
thread, I made the following implementation:

typedef struct {
	unsigned owner;
	arch_spinlock_t alock;
} xnlock_t;

static inline int __xnlock_get(xnlock_t *lock /*, */
			       XNLOCK_DBG_CONTEXT_ARGS)
{
	unsigned long long start;
	int cpu = xnarch_current_cpu();

	if (lock->owner == cpu)
		return 1;

	xnlock_dbg_prepare_acquire(&start);

	arch_spin_lock(&lock->alock);
	lock->owner = cpu;

	xnlock_dbg_acquired(lock, cpu, &start /*, */ XNLOCK_DBG_PASS_CONTEXT);

	return 0;
}

static inline void xnlock_put(xnlock_t *lock)
{
	if (xnlock_dbg_release(lock))
		return;

	lock->owner = ~0U;
	arch_spin_unlock(&lock->alock);
}
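
The calling convention implied by the return value above can be tried out
in userspace with the following sketch. It assumes that the caller skips
the release when the get reported a nested acquisition. Everything here
is hypothetical: a C11 atomic_flag stands in for arch_spinlock_t, the
caller passes its own id instead of calling xnarch_current_cpu(), and the
rec_lock_* names are invented for the example.

```c
#include <assert.h>
#include <stdatomic.h>

#define NO_OWNER (~0U)

/* Hypothetical userspace analogue of the owner-wrapped lock. */
struct rec_lock {
	atomic_uint owner;   /* id of the current holder, or NO_OWNER */
	atomic_flag inner;   /* stand-in for arch_spinlock_t */
};

void rec_lock_init(struct rec_lock *l)
{
	atomic_init(&l->owner, NO_OWNER);
	atomic_flag_clear(&l->inner);
}

/* Returns 1 if self already holds the lock (nested), 0 if newly acquired. */
int rec_lock_get(struct rec_lock *l, unsigned int self)
{
	/* Only self ever stores self here, and only while holding the lock. */
	if (atomic_load_explicit(&l->owner, memory_order_relaxed) == self)
		return 1;

	while (atomic_flag_test_and_set_explicit(&l->inner,
						 memory_order_acquire))
		;   /* spin until the inner lock is free */

	atomic_store_explicit(&l->owner, self, memory_order_relaxed);
	return 0;
}

/* Releases the lock unless the matching get was a nested one. */
void rec_lock_put(struct rec_lock *l, int nested)
{
	if (nested)
		return;

	assert(atomic_load_explicit(&l->owner, memory_order_relaxed) != NO_OWNER);

	/* Clear the owner before the inner lock is handed to the next waiter. */
	atomic_store_explicit(&l->owner, NO_OWNER, memory_order_relaxed);
	atomic_flag_clear_explicit(&l->inner, memory_order_release);
}
```

A nested section then looks like "nested = rec_lock_get(&l, id); ...;
rec_lock_put(&l, nested);", and only the outermost pair touches the inner
lock.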

And the good news is that yes, this avoids the issue with
/proc/xenomai/stat. The bad news is that it does not answer the question
of whether stores on one CPU are visible on another CPU without a
barrier, because ticket spinlocks work either way on x86: the atomic add
at the beginning of arch_spin_lock ensures both that the CPU attempting
to relock sees that there is a waiter, and that the waiting CPU sees that
the spinlock has been unlocked. So, in the particular case of the
concurrent cat /proc/xenomai/stat, the "two barriers needed for
visibility" rule is respected.

I have also measured latencies with a cat /proc/xenomai/stat loop
running, with and without a memory barrier after arch_spin_unlock, and
could not find any difference: the minimum, average and maximum latencies
after a few minutes of runtime are the same, or at least differ by less
than 100 ns.

I am also wondering whether this xnlock implementation could be used on
forge. It has the advantage of benefiting from architecture-specific
optimizations without the need to maintain architecture-dependent code.

--
Gilles.