public inbox for linux-kernel@vger.kernel.org
From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: rostedt <rostedt@goodmis.org>
Cc: linux-kernel <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Masami Hiramatsu <mhiramat@kernel.org>,
	Arnaldo Carvalho de Melo <acme@redhat.com>,
	Jiri Olsa <jolsa@redhat.com>, Namhyung Kim <namhyung@kernel.org>,
	Yordan Karadzhov <y.karadz@gmail.com>,
	Tzvetomir Stoyanov <tz.stoyanov@gmail.com>,
	Tom Zanussi <zanussi@kernel.org>,
	Jason Behmer <jbehmer@google.com>,
	Julia Lawall <julia.lawall@inria.fr>,
	Clark Williams <williams@redhat.com>,
	bristot <bristot@redhat.com>, Daniel Wagner <wagi@monom.org>,
	Darren Hart <dvhart@vmware.com>, Jonathan Corbet <corbet@lwn.net>,
	"Suresh E. Warrier" <warrier@linux.vnet.ibm.com>
Subject: Re: [RFC][PATCH] ring-buffer: Have nested events still record running time stamp
Date: Thu, 25 Jun 2020 15:35:02 -0400 (EDT)	[thread overview]
Message-ID: <79426976.13417.1593113702719.JavaMail.zimbra@efficios.com> (raw)
In-Reply-To: <20200625143525.2f3a2902@oasis.local.home>

----- On Jun 25, 2020, at 2:35 PM, rostedt rostedt@goodmis.org wrote:

> On Thu, 25 Jun 2020 13:55:07 -0400 (EDT)
> Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> 
>> > Here's the design of this solution:
>> > 
>> > All this is per cpu, and only needs to worry about nested events (not
>> > parallel events).
>> > 
>> > The players:
>> > 
>> > write_tail: The index in the buffer where new events can be written to.
>> >     It is incremented via local_add() to reserve space for a new event.
>> > 
>> > before_stamp: A time stamp set by all events before reserving space.
>> > 
>> > write_stamp: A time stamp updated by events after it has successfully
>> >     reserved space.
>> 
>> What are the types used for before_stamp and write_stamp ? If those
>> are 64-bit integers, how does sharing them between nested contexts
>> work on 32-bit architectures ?
> 
> Well, write_stamp is updated via local64, which I believe handles this
> for us. I probably should make before_stamp handle it as well.

Looking at the local64 headers, it appears that 32-bit architectures rely
on atomic64, which on x86 is implemented with LOCK; cmpxchg8b for 586+
(which is AFAIK painfully slow) and with cli/sti for 386/486 (which is not
NMI-safe).

For all other 32-bit architectures, the generic atomic64.h implements 64-bit
atomics using spinlocks with irqs off, which also brings considerable
overhead, in addition to being non-reentrant with respect to NMI-like
interrupts, e.g. FIQ on ARM32.

That seems at odds with the performance constraints of ftrace's ring buffer.

Those performance and reentrancy concerns are why I always stick to local_t
(long), and never use a full 64-bit type for anything that has to
do with concurrent store/load between execution contexts in LTTng.

> 
> 
>> 
>> >	 * a full time stamp (this can turn into a time extend which is
>> >	 * just an extended time delta but fill up the extra space).
>> >	 */
>> >	if (after != before)
>> >		abs = true;
>> > 
>> >	ts = clock();
>> > 
>> >	/* Now update the before_stamp (everyone does this!) */
>> > [B]	WRITE_ONCE(before_stamp, ts);
>> > 
>> >	/* Read the current next_write and update it to what we want
>> >	 * write to be after we reserve space. */
>> > 	next = READ_ONCE(next_write);
>> >	WRITE_ONCE(next_write, w + len);
>> > 
>> >	/* Now reserve space on the buffer */
>> > [C]	write = local_add_return(len, write_tail);
>> 
>> So the reservation is not "just" an add instruction, it's actually an
>> xadd on x86. Is that really faster than a cmpxchg ?
> 
> I believe the answer is still yes. But I can run some benchmarks to
> make sure.

This would be interesting to see, because if xadd and cmpxchg have
similar overhead, then going for a cmpxchg-loop for the space
reservation could vastly decrease the overall complexity of this
timestamp+space reservation algorithm.

If we decrease the complexity of the fast paths, we can reduce the number
of tests and branches that need to be performed and the amount of
instruction cache used, which can lead to performance improvements. So it's
not only good for making things more easily verifiable, it also improves
performance.

[...]
> 
> 
>> 
>> >		} else {
>> >			/* slow path */
>> >			if (w == next) {
>> 
>> If it's a slow path, why try to play games with a delta timestamp ?
>> Space should not be an issue for a slow path like this. It would be
>> simpler to just use the full timestamp here.
> 
> Hmm, you may be right. Previous iterations of this code had a distinct
> difference here, but after restructuring it, I don't think that
> distinction is valid anymore. If anything, having this at least lets me
> validate that it's doing what I believe it should be doing (as I added
> a ton of trace_printk()s into this).

Good. Simple is good. :-)

> 
>> 
>> >				/* This event preempted the previous event
>> >				 * just after it reserved its buffer.
>> 
>> You mean nesting after [C] but before [D].
> 
> Yes. I can add that for clarity, but perhaps I don't need that if I
> merge the two.

OK

> 
>> 
>> >				 * It's timestamp should be "before_stamp" */
>> 
>> It's -> Its
> 
> ;-)
> 
> My email client messed up your formatting of the rest of the email, so
> I'll send a separate reply.

OK,

Thanks!

Mathieu


-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

Thread overview: 24+ messages
2020-06-25 13:44 [RFC][PATCH] ring-buffer: Have nested events still record running time stamp Steven Rostedt
2020-06-25 13:53 ` Mathieu Desnoyers
2020-06-25 14:37   ` Steven Rostedt
2020-06-25 16:42     ` Korben Rusek
2020-06-25 18:12       ` Steven Rostedt
2020-06-25 17:55 ` Mathieu Desnoyers
2020-06-25 18:35   ` Steven Rostedt
2020-06-25 19:35     ` Mathieu Desnoyers [this message]
2020-06-25 19:58       ` Steven Rostedt
2020-06-26  2:36       ` Steven Rostedt
2020-06-26  3:35         ` Steven Rostedt
2020-06-26 13:58           ` Steven Rostedt
2020-06-26 18:13             ` Mathieu Desnoyers
2020-06-26 18:58               ` Steven Rostedt
2020-06-26 19:39                 ` Steven Rostedt
2020-06-30  0:21       ` Steven Rostedt
2020-06-30  3:13         ` Mathieu Desnoyers
2020-06-30  3:26           ` Steven Rostedt
2020-06-25 19:04   ` Steven Rostedt
2020-06-25 19:58     ` Mathieu Desnoyers
2020-06-25 20:42       ` Steven Rostedt
2020-06-25 19:09   ` Steven Rostedt
2020-06-25 20:03     ` Mathieu Desnoyers
2020-06-25 18:09 ` Steven Rostedt
