linux-arch.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
To: Nicolas Pitre <nico@cam.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Ingo Molnar <mingo@elte.hu>, lkml <linux-kernel@vger.kernel.org>,
	linux-arch@vger.kernel.org, Steven Rostedt <rostedt@goodmis.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	David Miller <davem@davemloft.net>
Subject: Re: [RFC patch 00/15] Tracer Timestamping
Date: Mon, 20 Oct 2008 21:32:24 -0400	[thread overview]
Message-ID: <20081021013223.GB18751@Krystal> (raw)
In-Reply-To: <alpine.LFD.2.00.0810202003260.26244@xanadu.home>

* Nicolas Pitre (nico@cam.org) wrote:
> On Mon, 20 Oct 2008, Mathieu Desnoyers wrote:
> 
> > * Peter Zijlstra (a.p.zijlstra@chello.nl) wrote:
> > > Have you looked at the existing 32->63 extention code in
> > > include/linux/cnt32_to_63.h and considered unifying it?
> > > 
> > 
> > Yep, I felt this code was dangerous on SMP given it could suffer from
> > the following type of race due to lack of proper barriers :
> 
> You are wrong.
> 
> > CPU    A                                 B
> >        read hw cnt low
> >        read __m_cnt_hi
> >                                          read hw cnt low
> >        (wrap detected)
> >        write __m_cnt_hi (incremented)
> >                                          read __m_cnt_hi
> >                                          (wrap detected)
> >                                          write __m_cnt_hi (incremented)
> > 
> > we therefore increment the high bits twice in the given race.
> 
> No.  Either you do the _same_ incrementation twice effectively writing 
> the _same_ high value twice, or you don't have simultaneous wrap 
> detections.  Simulate it on paper with real numbers if you are not 
> convinced.
> 

Hi Nicolas,

Yup, you are right. However, the case where one CPU sees the clock source
a little bit off-sync (late) still poses a problem. Example follows :

CPU    A                                 B
       read __m_cnt_hi (0x80000000)
       read hw cnt low (0x00000001)
       (wrap detected :
        (s32)(0x80000000 ^ 0x1) < 0)
       write __m_cnt_hi = 0x00000001
       return 0x0000000100000001
                                            read __m_cnt_hi (0x00000001)
                                     (late) read hw cnt low (0xFFFFFFFA)
                                            (wrap detected :
                                             (s32)(0x00000001 ^ 0xFFFFFFFA) < 0)
                                            write __m_cnt_hi = 0x80000001
                                            return 0x80000001FFFFFFFA
                                                                  (time jumps)
       read __m_cnt_hi (0x80000001)
       read hw cnt low (0x00000020)
       (wrap detected :
        (s32)(0x80000001 ^ 0x20) < 0)
       write __m_cnt_hi = 0x00000002
       return 0x0000000200000020
                             (time jumps again)

A similar situation can be generated by out-of-order hi/low bits reads.

> > On UP, the same race could happen if the code is called with preemption
> > enabled.
> 
> Wrong again.
> 
> > I don't think the "volatile" statement would necessarily make sure the
> > compiler and CPU would do the __m_cnt_hi read before the hw cnt low
> > read. A real memory barrier to order mmio reads wrt memory reads (or
> > instruction sync barrier if the value is taken from the cpu registers)
> > would be required to insure such order.
> 
> That's easy enough to fix, right?
> 

For memory read vs cycle counter, something like the "data_barrier(x)"
on powerpc would probably be enough when available. We would need
something that orders memory reads with the rest of instruction
execution. On other architectures, smp_rmb() + get_cycles_barrier() (the
latter being an instruction sync) would be required.

For memory read vs mmap io read, a smp_rmb() should be enough.

> > I also felt it would be more solid to have per-cpu structures to keep
> > track of 32->64 bits TSC updates, given the TSCs can always be slightly
> > out-of-sync :
> 
> If the low part is a per CPU value then the high part has to be a per 
> CPU value too.  There only needs to be a per-CPU variant of the same 
> algorithm where the only difference is that __m_cnt_hi would be per CPU.
> 
> If the low part is global then __m_cnt_hi has to be global, even on SMP.
> 

Hrm, given the performance impact of barriers that would need to be
added to insure correct read order of hi and low bits, and also
considering the risk that a "synchronized" timestamp counter off by a
few cycles poses in terms of time jumping forward, I don't see why we
would ever want to use a global __m_cnt_hi ? I would therefore recommend
to always use a per-cpu __m_cnt_hi.

Also, a small nit, to make sure cnt32_to_63 never gets scheduled out
with a stale old __m_cnt_hi value, its comments should probably say that
it has to be always used with preemption off.

Mathieu

> 
> Nicolas

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

  reply	other threads:[~2008-10-21  1:32 UTC|newest]

Thread overview: 94+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-10-16 23:27 [RFC patch 00/15] Tracer Timestamping Mathieu Desnoyers
2008-10-16 23:27 ` Mathieu Desnoyers
2008-10-16 23:27 ` [RFC patch 01/15] get_cycles() : kconfig HAVE_GET_CYCLES Mathieu Desnoyers
2008-10-16 23:27   ` Mathieu Desnoyers
2008-10-16 23:27 ` [RFC patch 02/15] get_cycles() : x86 HAVE_GET_CYCLES Mathieu Desnoyers
2008-10-16 23:27   ` Mathieu Desnoyers
2008-10-16 23:27 ` [RFC patch 03/15] get_cycles() : sparc64 HAVE_GET_CYCLES Mathieu Desnoyers
2008-10-16 23:27   ` Mathieu Desnoyers
2008-10-17  2:48   ` [RFC patch 03/15] get_cycles() : sparc64 HAVE_GET_CYCLES (update) Mathieu Desnoyers
2008-10-17  2:48     ` Mathieu Desnoyers
2008-10-17  2:57     ` David Miller
2008-10-16 23:27 ` [RFC patch 04/15] get_cycles() : powerpc64 HAVE_GET_CYCLES Mathieu Desnoyers
2008-10-16 23:27   ` Mathieu Desnoyers
2008-10-17  0:26   ` Paul Mackerras
2008-10-17  0:43     ` [RFC patch 04/15] get_cycles() : powerpc64 HAVE_GET_CYCLES (update) Mathieu Desnoyers
2008-10-17  0:43       ` Mathieu Desnoyers
2008-10-17  0:54       ` Paul Mackerras
2008-10-17  1:42       ` David Miller
2008-10-17  2:08         ` Mathieu Desnoyers
2008-10-17  2:08           ` Mathieu Desnoyers
2008-10-17  2:33           ` David Miller
2008-10-16 23:27 ` [RFC patch 05/15] get_cycles() : MIPS HAVE_GET_CYCLES_32 Mathieu Desnoyers
2008-10-16 23:27   ` Mathieu Desnoyers
2008-10-26 10:18   ` Ralf Baechle
2008-10-26 10:18     ` Ralf Baechle
2008-10-26 20:39     ` Mathieu Desnoyers
2008-10-16 23:27 ` [RFC patch 06/15] LTTng build Mathieu Desnoyers
2008-10-16 23:27   ` Mathieu Desnoyers
2008-10-17  8:10   ` KOSAKI Motohiro
2008-10-17 16:18     ` Mathieu Desnoyers
2008-10-16 23:27 ` [RFC patch 07/15] LTTng timestamp Mathieu Desnoyers
2008-10-16 23:27   ` Mathieu Desnoyers
2008-10-17  8:15   ` KOSAKI Motohiro
2008-10-17 16:23     ` Mathieu Desnoyers
2008-10-17 16:23       ` Mathieu Desnoyers
2008-10-16 23:27 ` [RFC patch 08/15] LTTng - Timestamping Mathieu Desnoyers
2008-10-16 23:27   ` Mathieu Desnoyers
2008-10-16 23:27 ` [RFC patch 09/15] LTTng mips export hpt frequency Mathieu Desnoyers
2008-10-16 23:27   ` Mathieu Desnoyers
2008-10-16 23:27 ` [RFC patch 10/15] LTTng timestamp mips Mathieu Desnoyers
2008-10-16 23:27   ` Mathieu Desnoyers
2008-10-16 23:27 ` [RFC patch 11/15] LTTng timestamp powerpc Mathieu Desnoyers
2008-10-16 23:27   ` Mathieu Desnoyers
2008-10-16 23:27 ` [RFC patch 12/15] LTTng timestamp sparc64 Mathieu Desnoyers
2008-10-16 23:27   ` Mathieu Desnoyers
2008-10-16 23:27 ` [RFC patch 13/15] LTTng timestamp sh Mathieu Desnoyers
2008-10-16 23:27   ` Mathieu Desnoyers
2008-10-16 23:27 ` [RFC patch 14/15] LTTng - TSC synchronicity test Mathieu Desnoyers
2008-10-16 23:27   ` Mathieu Desnoyers
2008-10-16 23:27 ` [RFC patch 15/15] LTTng timestamp x86 Mathieu Desnoyers
2008-10-16 23:27   ` Mathieu Desnoyers
2008-10-17  0:08   ` Linus Torvalds
2008-10-17  0:12     ` Linus Torvalds
2008-10-17  1:28     ` Mathieu Desnoyers
2008-10-17  2:19       ` Luck, Tony
2008-10-17  2:19         ` Luck, Tony
2008-10-17 17:25         ` Steven Rostedt
2008-10-17 18:08           ` Luck, Tony
2008-10-17 18:42             ` Mathieu Desnoyers
2008-10-17 18:58               ` Luck, Tony
2008-10-17 20:23                 ` Mathieu Desnoyers
2008-10-17 23:52                   ` Luck, Tony
2008-10-18 17:01                     ` Mathieu Desnoyers
2008-10-18 17:01                       ` Mathieu Desnoyers
2008-10-18 17:35                       ` Linus Torvalds
2008-10-18 17:50                         ` Ingo Molnar
2008-10-22 16:19                           ` Mathieu Desnoyers
2008-10-22 15:53                         ` Mathieu Desnoyers
2008-10-20 18:07                       ` Luck, Tony
2008-10-22 16:51                         ` Mathieu Desnoyers
2008-10-17 19:17               ` Steven Rostedt
2008-10-20 20:10               ` Linus Torvalds
2008-10-20 20:10                 ` Linus Torvalds
2008-10-20 21:38                 ` john stultz
2008-10-20 22:06                   ` Linus Torvalds
2008-10-20 22:17                     ` Ingo Molnar
2008-10-20 22:17                       ` Ingo Molnar
2008-10-20 22:29                     ` H. Peter Anvin
2008-10-20 22:29                       ` H. Peter Anvin
2008-10-21 18:10                       ` Bjorn Helgaas
2008-10-23 15:47                         ` Linus Torvalds
2008-10-23 16:39                           ` H. Peter Anvin
2008-10-23 21:54                           ` Paul Mackerras
2008-10-20 23:47                     ` john stultz
2008-10-20 23:47                       ` john stultz
2008-10-22 17:05                 ` Mathieu Desnoyers
2008-10-17 19:36         ` Christoph Lameter
2008-10-17  7:59 ` [RFC patch 00/15] Tracer Timestamping Peter Zijlstra
2008-10-20 20:25   ` Mathieu Desnoyers
2008-10-20 20:25     ` Mathieu Desnoyers
2008-10-21  0:20     ` Nicolas Pitre
2008-10-21  1:32       ` Mathieu Desnoyers [this message]
2008-10-21  2:32         ` Nicolas Pitre
2008-10-21  4:05           ` Mathieu Desnoyers

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20081021013223.GB18751@Krystal \
    --to=mathieu.desnoyers@polymtl.ca \
    --cc=a.p.zijlstra@chello.nl \
    --cc=akpm@linux-foundation.org \
    --cc=davem@davemloft.net \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=nico@cam.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).