All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
To: Nicolas Pitre <nico@cam.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Ingo Molnar <mingo@elte.hu>, lkml <linux-kernel@vger.kernel.org>,
	linux-arch@vger.kernel.org, Steven Rostedt <rostedt@goodmis.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	David Miller <davem@davemloft.net>
Subject: Re: [RFC patch 00/15] Tracer Timestamping
Date: Mon, 20 Oct 2008 21:32:24 -0400	[thread overview]
Message-ID: <20081021013223.GB18751@Krystal> (raw)
In-Reply-To: <alpine.LFD.2.00.0810202003260.26244@xanadu.home>

* Nicolas Pitre (nico@cam.org) wrote:
> On Mon, 20 Oct 2008, Mathieu Desnoyers wrote:
> 
> > * Peter Zijlstra (a.p.zijlstra@chello.nl) wrote:
> > > Have you looked at the existing 32->63 extention code in
> > > include/linux/cnt32_to_63.h and considered unifying it?
> > > 
> > 
> > Yep, I felt this code was dangerous on SMP given it could suffer from
> > the following type of race due to lack of proper barriers :
> 
> You are wrong.
> 
> > CPU    A                                 B
> >        read hw cnt low
> >        read __m_cnt_hi
> >                                          read hw cnt low
> >        (wrap detected)
> >        write __m_cnt_hi (incremented)
> >                                          read __m_cnt_hi
> >                                          (wrap detected)
> >                                          write __m_cnt_hi (incremented)
> > 
> > we therefore increment the high bits twice in the given race.
> 
> No.  Either you do the _same_ incrementation twice effectively writing 
> the _same_ high value twice, or you don't have simultaneous wrap 
> detections.  Simulate it on paper with real numbers if you are not 
> convinced.
> 

Hi Nicolas,

Yup, you are right. However, the case where one CPU sees the clock source
a little bit off-sync (late) still poses a problem. Example follows :

CPU    A                                 B
       read __m_cnt_hi (0x80000000)
       read hw cnt low (0x00000001)
       (wrap detected :
        (s32)(0x80000000 ^ 0x1) < 0)
       write __m_cnt_hi = 0x00000001
       return 0x0000000100000001
                                            read __m_cnt_hi (0x00000001)
                                     (late) read hw cnt low (0xFFFFFFFA)
                                            (wrap detected :
                                             (s32)(0x00000001 ^ 0xFFFFFFFA) < 0)
                                            write __m_cnt_hi = 0x80000001
                                            return 0x80000001FFFFFFFA
                                                                  (time jumps)
       read __m_cnt_hi (0x80000001)
       read hw cnt low (0x00000020)
       (wrap detected :
        (s32)(0x80000001 ^ 0x20) < 0)
       write __m_cnt_hi = 0x00000002
       return 0x0000000200000020
                             (time jumps again)

A similar situation can be generated by out-of-order hi/low bits reads.

> > On UP, the same race could happen if the code is called with preemption
> > enabled.
> 
> Wrong again.
> 
> > I don't think the "volatile" statement would necessarily make sure the
> > compiler and CPU would do the __m_cnt_hi read before the hw cnt low
> > read. A real memory barrier to order mmio reads wrt memory reads (or
> > instruction sync barrier if the value is taken from the cpu registers)
> > would be required to insure such order.
> 
> That's easy enough to fix, right?
> 

For memory read vs cycle counter, something like the "data_barrier(x)"
on powerpc would probably be enough when available. We would need
something that orders memory reads with the rest of instruction
execution. On other architectures, smp_rmb() + get_cycles_barrier() (the
latter being an instruction sync) would be required.

For memory read vs mmap io read, a smp_rmb() should be enough.

> > I also felt it would be more solid to have per-cpu structures to keep
> > track of 32->64 bits TSC updates, given the TSCs can always be slightly
> > out-of-sync :
> 
> If the low part is a per CPU value then the high part has to be a per 
> CPU value too.  There only needs to be a per-CPU variant of the same 
> algorithm where the only difference is that __m_cnt_hi would be per CPU.
> 
> If the low part is global then __m_cnt_hi has to be global, even on SMP.
> 

Hrm, given the performance impact of barriers that would need to be
added to insure correct read order of hi and low bits, and also
considering the risk that a "synchronized" timestamp counter off by a
few cycles poses in terms of time jumping forward, I don't see why we
would ever want to use a global __m_cnt_hi ? I would therefore recommend
to always use a per-cpu __m_cnt_hi.

Also, a small nit, to make sure cnt32_to_63 never gets scheduled out
with a stale old __m_cnt_hi value, its comments should probably say that
it has to be always used with preemption off.

Mathieu

> 
> Nicolas

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

  reply	other threads:[~2008-10-21  1:32 UTC|newest]

Thread overview: 84+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-10-16 23:27 [RFC patch 00/15] Tracer Timestamping Mathieu Desnoyers
2008-10-16 23:27 ` Mathieu Desnoyers
2008-10-16 23:27 ` [RFC patch 01/15] get_cycles() : kconfig HAVE_GET_CYCLES Mathieu Desnoyers
2008-10-16 23:27   ` Mathieu Desnoyers
2008-10-16 23:27 ` [RFC patch 02/15] get_cycles() : x86 HAVE_GET_CYCLES Mathieu Desnoyers
2008-10-16 23:27   ` Mathieu Desnoyers
2008-10-16 23:27 ` [RFC patch 03/15] get_cycles() : sparc64 HAVE_GET_CYCLES Mathieu Desnoyers
2008-10-16 23:27   ` Mathieu Desnoyers
2008-10-17  2:48   ` [RFC patch 03/15] get_cycles() : sparc64 HAVE_GET_CYCLES (update) Mathieu Desnoyers
2008-10-17  2:48     ` Mathieu Desnoyers
2008-10-17  2:57     ` David Miller
2008-10-16 23:27 ` [RFC patch 04/15] get_cycles() : powerpc64 HAVE_GET_CYCLES Mathieu Desnoyers
2008-10-16 23:27   ` Mathieu Desnoyers
2008-10-17  0:26   ` Paul Mackerras
2008-10-17  0:43     ` [RFC patch 04/15] get_cycles() : powerpc64 HAVE_GET_CYCLES (update) Mathieu Desnoyers
2008-10-17  0:54       ` Paul Mackerras
2008-10-17  1:42       ` David Miller
2008-10-17  2:08         ` Mathieu Desnoyers
2008-10-17  2:33           ` David Miller
2008-10-16 23:27 ` [RFC patch 05/15] get_cycles() : MIPS HAVE_GET_CYCLES_32 Mathieu Desnoyers
2008-10-16 23:27   ` Mathieu Desnoyers
2008-10-26 10:18   ` Ralf Baechle
2008-10-26 20:39     ` Mathieu Desnoyers
2008-10-16 23:27 ` [RFC patch 06/15] LTTng build Mathieu Desnoyers
2008-10-16 23:27   ` Mathieu Desnoyers
2008-10-17  8:10   ` KOSAKI Motohiro
2008-10-17 16:18     ` Mathieu Desnoyers
2008-10-16 23:27 ` [RFC patch 07/15] LTTng timestamp Mathieu Desnoyers
2008-10-16 23:27   ` Mathieu Desnoyers
2008-10-17  8:15   ` KOSAKI Motohiro
2008-10-17 16:23     ` Mathieu Desnoyers
2008-10-16 23:27 ` [RFC patch 08/15] LTTng - Timestamping Mathieu Desnoyers
2008-10-16 23:27   ` Mathieu Desnoyers
2008-10-16 23:27 ` [RFC patch 09/15] LTTng mips export hpt frequency Mathieu Desnoyers
2008-10-16 23:27   ` Mathieu Desnoyers
2008-10-16 23:27 ` [RFC patch 10/15] LTTng timestamp mips Mathieu Desnoyers
2008-10-16 23:27   ` Mathieu Desnoyers
2008-10-16 23:27 ` [RFC patch 11/15] LTTng timestamp powerpc Mathieu Desnoyers
2008-10-16 23:27   ` Mathieu Desnoyers
2008-10-16 23:27 ` [RFC patch 12/15] LTTng timestamp sparc64 Mathieu Desnoyers
2008-10-16 23:27   ` Mathieu Desnoyers
2008-10-16 23:27 ` [RFC patch 13/15] LTTng timestamp sh Mathieu Desnoyers
2008-10-16 23:27   ` Mathieu Desnoyers
2008-10-16 23:27   ` Mathieu Desnoyers
2008-10-16 23:27 ` [RFC patch 14/15] LTTng - TSC synchronicity test Mathieu Desnoyers
2008-10-16 23:27   ` Mathieu Desnoyers
2008-10-16 23:27 ` [RFC patch 15/15] LTTng timestamp x86 Mathieu Desnoyers
2008-10-16 23:27   ` Mathieu Desnoyers
2008-10-17  0:08   ` Linus Torvalds
2008-10-17  0:12     ` Linus Torvalds
2008-10-17  1:28     ` Mathieu Desnoyers
2008-10-17  2:19       ` Luck, Tony
2008-10-17 17:25         ` Steven Rostedt
2008-10-17 18:08           ` Luck, Tony
2008-10-17 18:42             ` Mathieu Desnoyers
2008-10-17 18:58               ` Luck, Tony
2008-10-17 20:23                 ` Mathieu Desnoyers
2008-10-17 23:52                   ` Luck, Tony
2008-10-18 17:01                     ` Mathieu Desnoyers
2008-10-18 17:35                       ` Linus Torvalds
2008-10-18 17:50                         ` Ingo Molnar
2008-10-22 16:19                           ` Mathieu Desnoyers
2008-10-22 15:53                         ` Mathieu Desnoyers
2008-10-20 18:07                       ` Luck, Tony
2008-10-22 16:51                         ` Mathieu Desnoyers
2008-10-17 19:17               ` Steven Rostedt
2008-10-20 20:10               ` Linus Torvalds
2008-10-20 21:38                 ` john stultz
2008-10-20 22:06                   ` Linus Torvalds
2008-10-20 22:17                     ` Ingo Molnar
2008-10-20 22:29                     ` H. Peter Anvin
2008-10-21 18:10                       ` Bjorn Helgaas
2008-10-23 15:47                         ` Linus Torvalds
2008-10-23 16:39                           ` H. Peter Anvin
2008-10-23 21:54                           ` Paul Mackerras
2008-10-20 23:47                     ` john stultz
2008-10-22 17:05                 ` Mathieu Desnoyers
2008-10-17 19:36         ` Christoph Lameter
2008-10-17  7:59 ` [RFC patch 00/15] Tracer Timestamping Peter Zijlstra
2008-10-20 20:25   ` Mathieu Desnoyers
2008-10-21  0:20     ` Nicolas Pitre
2008-10-21  1:32       ` Mathieu Desnoyers [this message]
2008-10-21  2:32         ` Nicolas Pitre
2008-10-21  4:05           ` Mathieu Desnoyers

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20081021013223.GB18751@Krystal \
    --to=mathieu.desnoyers@polymtl.ca \
    --cc=a.p.zijlstra@chello.nl \
    --cc=akpm@linux-foundation.org \
    --cc=davem@davemloft.net \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=nico@cam.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.