Perf user-space ABI sequence lock memory barriers

* Perf user-space ABI sequence lock memory barriers
       [not found] <104831840.19303.1391553779700.JavaMail.zimbra@efficios.com>
@ 2014-02-04 22:56 ` Mathieu Desnoyers
  2014-02-05  8:05   ` Peter Zijlstra
  0 siblings, 1 reply; 3+ messages in thread
From: Mathieu Desnoyers @ 2014-02-04 22:56 UTC (permalink / raw)
  To: tglx
  Cc: Heinz.Egger, bigeasy, Linux Kernel Mailing List, Peter Zijlstra,
	Molnar, Ingo, rostedt, Paul E. McKenney

Hi,

I'm currently integrating user-space performance counters from
Perf into LTTng-UST, and I'm noticing something odd regarding
the home-made sequence lock found at:

kernel/events/core.c: perf_event_update_userpage()

        ++userpg->lock;
        barrier();
[...]
        barrier();
        ++userpg->lock;

This pairs with something like this at user-level:

        do {
                seq = pc->lock;
                barrier();

                idx = pc->index;
                count = pc->offset;
                if (idx)
                        count += rdpmc(idx - 1);

                barrier();
        } while (pc->lock != seq);

As we can see, only compiler barrier() calls protect all this.
First question: is it possible for the update to be performed
by a thread running on a different CPU than the thread reading
the info in user-space?

I would be tempted to use a volatile semantic on all reads of the
lock field (ACCESS_ONCE()). Secondly, read sequence locks usually use a
smp_rmb() at the end of read_seqcount_begin(), and at the beginning
of read_seqcount_retry(). Moreover, this is usually matched
by smp_wmb() in write_seqcount_begin()/write_seqcount_end().

Am I missing something special about this lock that makes these
barriers unnecessary ?
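
For comparison, here is a minimal user-space sketch of a conventional
sequence counter that does carry read and write barriers, written with
C11 atomics. The names seqcount_sketch, sketch_write() and sketch_read()
are hypothetical and are not kernel or perf API; the fences only stand in
for the roles smp_wmb() and smp_rmb() play in the kernel helpers.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical illustration only: not the kernel seqcount implementation. */
struct seqcount_sketch {
	atomic_uint seq;        /* odd while a write is in progress */
	_Atomic uint64_t data;  /* payload protected by seq */
};

/* Single-writer update: bump seq to odd, store data, bump seq to even. */
static void sketch_write(struct seqcount_sketch *s, uint64_t v)
{
	unsigned int seq = atomic_load_explicit(&s->seq, memory_order_relaxed);

	atomic_store_explicit(&s->seq, seq + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);  /* role of smp_wmb() */
	atomic_store_explicit(&s->data, v, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);  /* role of smp_wmb() */
	atomic_store_explicit(&s->seq, seq + 2, memory_order_relaxed);
}

/* Reader: retry until the same even seq is seen before and after the read. */
static uint64_t sketch_read(struct seqcount_sketch *s)
{
	unsigned int seq;
	uint64_t v;

	do {
		/* Wait for an even value, i.e. no write in progress. */
		while ((seq = atomic_load_explicit(&s->seq,
						   memory_order_relaxed)) & 1)
			;
		atomic_thread_fence(memory_order_acquire);  /* role of smp_rmb() */
		v = atomic_load_explicit(&s->data, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);  /* role of smp_rmb() */
	} while (atomic_load_explicit(&s->seq, memory_order_relaxed) != seq);

	return v;
}
```

This sketch assumes a single writer and readers that may run on other
CPUs, which is the case the question above is concerned with.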

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

* Re: Perf user-space ABI sequence lock memory barriers
  2014-02-04 22:56 ` Perf user-space ABI sequence lock memory barriers Mathieu Desnoyers
@ 2014-02-05  8:05   ` Peter Zijlstra
  2014-02-05 20:33     ` Mathieu Desnoyers
  0 siblings, 1 reply; 3+ messages in thread
From: Peter Zijlstra @ 2014-02-05  8:05 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: tglx, Heinz.Egger, bigeasy, Linux Kernel Mailing List,
	Molnar, Ingo, rostedt, Paul E. McKenney

On Tue, Feb 04, 2014 at 10:56:24PM +0000, Mathieu Desnoyers wrote:
> Hi,
> 
> I'm currently integrating user-space performance counters from
> Perf into LTTng-UST, and I'm noticing something odd regarding
> the home-made sequence lock found at:
> 
> kernel/events/core.c: perf_event_update_userpage()
> 
>         ++userpg->lock;
>         barrier();
> [...]
>         barrier();
>         ++userpg->lock;
> 
> This pairs with something like this at user-level:
> 
>         do {
>                 seq = pc->lock;

You could make that:

		while ((seq = pc->lock) & 1);

>                 barrier();
> 
>                 idx = pc->index;
>                 count = pc->offset;
>                 if (idx)
>                         count += rdpmc(idx - 1);
> 
>                 barrier();
>         } while (pc->lock != seq);
> 
> As we can see, only compiler barrier() calls protect all this.
> First question: is it possible for the update to be performed
> by a thread running on a different CPU than the thread reading
> the info in user-space?

You can make that so, but that is not a 'supported' case. This all
assumes you're monitoring yourself, in which case the event is run on
the cpu you are running on too and the updates are matched on cpu, or
separated by schedule(), which includes the required memory barriers to
make it appear it's all on the same cpu anyway.
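
A minimal sketch of the self-monitoring read loop under that assumption,
using only compiler barriers. struct page_sketch carries just the three
fields used in this thread (lock, index, offset) and is not the real
struct perf_event_mmap_page layout. The counter read is passed in as a
hypothetical callback, and fake_pmc() is a stand-in for rdpmc so that the
loop can be exercised without hardware counter access.

```c
#include <stdint.h>

/* Compiler-only barrier: the compiler may not move or cache memory
 * accesses across it. It emits no CPU fence instruction. */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical stand-in holding only the fields discussed here. */
struct page_sketch {
	uint32_t lock;    /* sequence count, bumped before and after an update */
	uint32_t index;   /* hardware counter index + 1, or 0 if unavailable */
	int64_t  offset;  /* value to add to the hardware counter */
};

/* Hypothetical callback type standing in for rdpmc(). */
typedef uint64_t (*pmc_read_fn)(uint32_t counter);

/* Stand-in for rdpmc(), for illustration only: returns a fixed value. */
static uint64_t fake_pmc(uint32_t counter)
{
	return 100 + counter;
}

/* Read a consistent count; valid only when the reader is the monitored
 * thread, so that updates happen on the CPU the reader runs on. */
static int64_t read_self_count(const struct page_sketch *pc,
			       pmc_read_fn read_pmc)
{
	uint32_t seq, idx;
	int64_t count;

	do {
		seq = pc->lock;
		barrier();

		idx = pc->index;
		count = pc->offset;
		if (idx)
			count += (int64_t)read_pmc(idx - 1);

		barrier();
	} while (pc->lock != seq);  /* retry if an update intervened */

	return count;
}
```

In real use, pc would point at the mmap()ed perf page of an event opened
on the calling thread, and read_pmc would wrap the rdpmc instruction.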

> I would be tempted to use a volatile semantic on all reads of the
> lock field (ACCESS_ONCE()).

Since it's all separated by the compiler barrier, all the reads should be
contained and the compiler is not allowed to re-read once outside.

So I don't see the point of volatile/ACCESS_ONCE here.

You could make an argument for ACCESS_ONCE(pc->lock) though.
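
For reference, ACCESS_ONCE() in kernels of that era was a volatile cast.
A user-space sketch of applying it to the lock read, combined with the
wait-for-even loop suggested earlier, could look like this. The macro is
redefined locally, and read_even_seq() is a hypothetical helper name.

```c
#include <stdint.h>

/* Local redefinition of the kernel macro: forces exactly one access
 * through a volatile-qualified lvalue. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Spin until the sequence count is even, then return it. Without the
 * volatile access, the compiler would be free to hoist the load out of
 * this otherwise empty loop. */
static uint32_t read_even_seq(uint32_t *lock)
{
	uint32_t seq;

	while ((seq = ACCESS_ONCE(*lock)) & 1)
		;
	return seq;
}
```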

> Secondly, read sequence locks usually use a
> smp_rmb() at the end of read_seqcount_begin(), and at the beginning
> of read_seqcount_retry(). Moreover, this is usually matched
> by smp_wmb() in write_seqcount_begin()/write_seqcount_end().

Given this is all for self-monitoring and hard assuming the event runs
on the same cpu, smp barriers are pointless.

> Am I missing something special about this lock that makes these
> barriers unnecessary ?

The self-monitoring aspect perhaps? There's a NOTE in struct
perf_event_mmap_page that's rather a dead give-away on that.

* Re: Perf user-space ABI sequence lock memory barriers
  2014-02-05  8:05   ` Peter Zijlstra
@ 2014-02-05 20:33     ` Mathieu Desnoyers
  0 siblings, 0 replies; 3+ messages in thread
From: Mathieu Desnoyers @ 2014-02-05 20:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, Heinz Egger, bigeasy, Linux Kernel Mailing List,
	Ingo Molnar, rostedt, Paul E. McKenney

----- Original Message -----
> From: "Peter Zijlstra" <peterz@infradead.org>
> To: "Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>
> Cc: tglx@linutronix.de, "Heinz Egger" <Heinz.Egger@linutronix.de>, bigeasy@linutronix.de, "Linux Kernel Mailing List"
> <linux-kernel@vger.kernel.org>, "Ingo Molnar" <mingo@kernel.org>, "rostedt" <rostedt@goodmis.org>, "Paul E.
> McKenney" <paulmck@linux.vnet.ibm.com>
> Sent: Wednesday, February 5, 2014 3:05:18 AM
> Subject: Re: Perf user-space ABI sequence lock memory barriers
> 
> On Tue, Feb 04, 2014 at 10:56:24PM +0000, Mathieu Desnoyers wrote:
> > Hi,
> > 
> > I'm currently integrating user-space performance counters from
> > Perf into LTTng-UST, and I'm noticing something odd regarding
> > the home-made sequence lock found at:
> > 
> > kernel/events/core.c: perf_event_update_userpage()
> > 
> >         ++userpg->lock;
> >         barrier();
> > [...]
> >         barrier();
> >         ++userpg->lock;
> > 
> > This pairs with something like this at user-level:
> > 
> >         do {
> >                 seq = pc->lock;
> 
> You could make that:
> 
> 		while ((seq = pc->lock) & 1);

Ah, yes, although since the data structure is per-thread, as you
describe, there would be no need to do this.

> 
> >                 barrier();
> > 
> >                 idx = pc->index;
> >                 count = pc->offset;
> >                 if (idx)
> >                         count += rdpmc(idx - 1);
> > 
> >                 barrier();
> >         } while (pc->lock != seq);
> > 
> > As we can see, only compiler barrier() calls protect all this.
> > First question: is it possible for the update to be performed
> > by a thread running on a different CPU than the thread reading
> > the info in user-space?
> 
> You can make that so, but that is not a 'supported' case. This all
> assumes you're monitoring yourself, in which case the event is run on
> the cpu you are running on too and the updates are matched on cpu, or
> separated by schedule(), which includes the required memory barriers to
> make it appear it's all on the same cpu anyway.
> 
> > I would be tempted to use a volatile semantic on all reads of the
> > lock field (ACCESS_ONCE()).
> 
> Since it's all separated by the compiler barrier, all the reads should be
> contained and the compiler is not allowed to re-read once outside.
> 
> So I don't see the point of volatile/ACCESS_ONCE here.
> 
> You could make an argument for ACCESS_ONCE(pc->lock) though.

Yes, this is what I meant, but I'm not sure it's absolutely required.

> 
> > Secondly, read sequence locks usually use a
> > smp_rmb() at the end of read_seqcount_begin(), and at the beginning
> > of read_seqcount_retry(). Moreover, this is usually matched
> > by smp_wmb() in write_seqcount_begin()/write_seqcount_end().
> 
> Given this is all for self-monitoring and hard assuming the event runs
> on the same cpu, smp barriers are pointless.
> 
> > Am I missing something special about this lock that makes these
> > barriers unnecessary ?
> 
> The self-monitoring aspect perhaps? There's a NOTE in struct
> perf_event_mmap_page that's rather a dead give-away on that.

The one thing that confused me in the note:

         * NOTE: for obvious reason this only works on self-monitoring
         *       processes.

is the use of the word "process" for a user-space API, when it actually
means "thread" in user-space semantics. Yes, I must have been doing too much
userland stuff lately. ;-)

Thanks for the clarification,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
