Re: bts & perf_counters - Peter Zijlstra

public inbox for linux-kernel@vger.kernel.org

From: Peter Zijlstra <a.p.zijlstra@chello.nl>
To: "Metzger, Markus T" <markus.t.metzger@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>, Thomas Gleixner <tglx@linutronix.de>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Markus Metzger <markus.t.metzger@googlemail.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Paul Mackerras <paulus@samba.org>
Subject: Re: bts & perf_counters
Date: Mon, 06 Jul 2009 17:34:14 +0200
Message-ID: <1246894454.8143.101.camel@twins>
In-Reply-To: <928CFBE8E7CB0040959E56B4EA41A77EBE519AE5@irsmsx504.ger.corp.intel.com>

On Tue, 2009-06-30 at 08:32 +0100, Metzger, Markus T wrote:
> 
> >> A debugger is interested in the tail of the execution trace. It
> >> won't poll the trace data (which would be far too much overhead).
> >> How would a user synchronize on the profile stream when the
> >> profiled process is stopped?
> >
> >Yeah, with a new perf_attr flag that activates overwrite, this
> >use case would be solved, right? The debugger has to make sure the
> >task is stopped before reading out the buffer, but that's pretty
> >much all.
> 
> I'm not sure about that. The way I read struct perf_counter_mmap_page,
> data_head points to the end of the stream (I would guess one byte
> beyond the last record).
> 
> I think we can ignore data_tail in the debug scenario since debuggers
> won't poll. We can further assume a buffer overflow no matter how big
> the ring buffer is: branch trace grows terribly fast, and we don't want
> normal users to lock megabytes of memory, do we?
> 
> How would a debugger find the beginning of the event stream to start
> reading?

Something like the below? (utterly untested)

---
 include/linux/perf_counter.h |    3 ++-
 kernel/perf_counter.c        |   35 +++++++++++++++++++++++++++++++++++
 2 files changed, 37 insertions(+), 1 deletions(-)

diff --git a/include/linux/perf_counter.h b/include/linux/perf_counter.h
index 5e970c7..95b5257 100644
--- a/include/linux/perf_counter.h
+++ b/include/linux/perf_counter.h
@@ -180,8 +180,9 @@ struct perf_counter_attr {
 				freq           :  1, /* use freq, not period  */
 				inherit_stat   :  1, /* per task counts       */
 				enable_on_exec :  1, /* next exec enables     */
+				overwrite      :  1, /* overwrite mmap data   */
 
-				__reserved_1   : 51;
+				__reserved_1   : 50;
 
 	__u32			wakeup_events;	/* wakeup every n events */
 	__u32			__reserved_2;
diff --git a/kernel/perf_counter.c b/kernel/perf_counter.c
index d55a50d..0c64d53 100644
--- a/kernel/perf_counter.c
+++ b/kernel/perf_counter.c
@@ -2097,6 +2097,13 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)
 	nr_pages = (vma_size / PAGE_SIZE) - 1;
 
 	/*
+	 * attr->overwrite and PROT_WRITE both use ->data_tail in an exclusive
+	 * manner; disallow this combination.
+	 */
+	if ((vma->vm_flags & VM_WRITE) && counter->attr.overwrite)
+		return -EINVAL;
+
+	/*
 	 * If we have data pages ensure they're a power-of-two number, so we
 	 * can do bitmasks instead of modulo.
 	 */
@@ -2329,6 +2336,7 @@ struct perf_output_handle {
 	struct perf_counter	*counter;
 	struct perf_mmap_data	*data;
 	unsigned long		head;
+	unsigned long		tail;
 	unsigned long		offset;
 	int			nmi;
 	int			sample;
@@ -2363,6 +2371,31 @@ static bool perf_output_space(struct perf_mmap_data *data,
 	return true;
 }
 
+static void perf_output_tail(struct perf_mmap_data *data, unsigned long head)
+{
+	__u64 *tailp = &data->user_page->data_tail;
+	struct perf_event_header *header;
+	unsigned long pages_mask, nr, size;
+	unsigned long tail, new, old;
+	void *ptr;
+
+	if (data->writable)
+		return;
+
+	size	   = data->nr_pages << PAGE_SHIFT;
+	pages_mask = data->nr_pages - 1;
+	tail	   = ACCESS_ONCE(*tailp);
+
+	while ((long)(tail + size - head) < 0) {
+		nr     = (tail >> PAGE_SHIFT) & pages_mask;
+		ptr    = data->data_pages[nr] + (tail & (PAGE_SIZE - 1));
+		header = (struct perf_event_header *)ptr;
+		new    = tail + header->size;
+		old    = cmpxchg64(tailp, tail, new);
+		tail   = (old == tail) ? new : old;
+	}
+}
+
 static void perf_output_wakeup(struct perf_output_handle *handle)
 {
 	atomic_set(&handle->data->poll, POLL_IN);
@@ -2535,6 +2568,8 @@ static int perf_output_begin(struct perf_output_handle *handle,
 		head += size;
 		if (unlikely(!perf_output_space(data, offset, head)))
 			goto fail;
+		if (unlikely(counter->attr.overwrite))
+			perf_output_tail(data, head);
 	} while (atomic_long_cmpxchg(&data->head, offset, head) != offset);
 
 	handle->offset	= offset;

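For anyone who wants to poke at the tail-advancing idea without a kernel, here is a minimal single-threaded userspace sketch of the same scheme. The names (struct ring, ring_write(), ring_read_types()) and the 64-byte buffer size are made up for illustration, the buffer is a plain array rather than mmap()ed pages, and the cmpxchg on data_tail is left out because there is only one writer. The writer drops the oldest records until the new one fits; the reader plays the debugger: with the task stopped, it walks from tail (oldest complete record) to head.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 64UL			/* power of two, like nr_pages << PAGE_SHIFT */
#define RING_MASK (RING_SIZE - 1)	/* bitmask instead of modulo */

/* Same layout as perf_event_header: size covers the whole record. */
struct rec_header {
	uint32_t type;
	uint16_t misc;
	uint16_t size;
};

/* Hypothetical stand-in for the mmap()ed data pages plus head/tail. */
struct ring {
	unsigned char buf[RING_SIZE];
	unsigned long head;		/* free-running write position */
	unsigned long tail;		/* start of the oldest complete record */
};

static void ring_copy_in(struct ring *r, unsigned long pos,
			 const void *src, size_t len)
{
	const unsigned char *s = src;

	for (size_t i = 0; i < len; i++)
		r->buf[(pos + i) & RING_MASK] = s[i];	/* wraps at the end */
}

static void ring_copy_out(const struct ring *r, unsigned long pos,
			  void *dst, size_t len)
{
	unsigned char *d = dst;

	for (size_t i = 0; i < len; i++)
		d[i] = r->buf[(pos + i) & RING_MASK];
}

/* Append one record, discarding the oldest records until it fits. */
static int ring_write(struct ring *r, uint32_t type,
		      const void *payload, uint16_t len)
{
	struct rec_header h;
	unsigned long new_head;

	if (sizeof(h) + len > RING_SIZE)
		return -1;		/* record can never fit */

	h.type = type;
	h.misc = 0;
	h.size = (uint16_t)(sizeof(h) + len);
	new_head = r->head + h.size;

	/* Signed test: as unsigned, "tail + size - head < 0" is never true. */
	while ((long)(r->tail + RING_SIZE - new_head) < 0) {
		struct rec_header old;

		ring_copy_out(r, r->tail, &old, sizeof(old));
		r->tail += old.size;	/* step over the oldest record */
	}

	ring_copy_in(r, r->head, &h, sizeof(h));
	ring_copy_in(r, r->head + sizeof(h), payload, len);
	r->head = new_head;
	return 0;
}

/* Debugger side: with the task stopped, walk from tail (oldest) to head. */
static int ring_read_types(const struct ring *r, uint32_t *types, int max)
{
	unsigned long pos = r->tail;
	int n = 0;

	while (pos != r->head && n < max) {
		struct rec_header h;

		ring_copy_out(r, pos, &h, sizeof(h));
		assert(h.size >= sizeof(h));	/* a zero size would loop forever */
		types[n++] = h.type;
		pos += h.size;
	}
	return n;
}
```

Writing ten records with 8-byte payloads (16 bytes each including the header) leaves head at 160 and tail at 96, so the walk returns only the last four records, types 7 through 10. The signed cast in the loop condition is what makes the overflow test work; as a plain unsigned expression the comparison against zero is always false.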

Thread overview: 14+ messages
     [not found] <tip-511b01bdf64ad8a38414096eab283c7784aebfc4@git.kernel.org>
2009-06-11  6:30 ` [tip:tracing/core] Revert "x86, bts: reenable ptrace branch trace support" Metzger, Markus T
2009-06-11  6:36   ` Peter Zijlstra
2009-06-11  7:17     ` Metzger, Markus T
2009-06-11  8:08       ` Peter Zijlstra
2009-06-11  8:30         ` Metzger, Markus T
2009-06-11 10:21     ` Ingo Molnar
2009-06-11 10:39       ` Metzger, Markus T
2009-06-11 21:41         ` Ingo Molnar
2009-06-12 11:04           ` Metzger, Markus T
2009-06-18 10:23             ` Metzger, Markus T
2009-06-24 13:10               ` Metzger, Markus T
     [not found]                 ` <20090624133645.GE6224@elte.hu>
     [not found]                   ` <928CFBE8E7CB0040959E56B4EA41A77EBE2DB9B9@irsmsx504.ger.corp.intel.com>
     [not found]                     ` <20090624153229.GA24346@elte.hu>
     [not found]                       ` <928CFBE8E7CB0040959E56B4EA41A77EBE2DC3D9@irsmsx504.ger.corp.intel.com>
     [not found]                         ` <20090626122948.GC10850@elte.hu>
     [not found]                           ` <928CFBE8E7CB0040959E56B4EA41A77EBE519869@irsmsx504.ger.corp.intel.com>
     [not found]                             ` <20090629202002.GF31577@elte.hu>
2009-06-30  7:32                               ` bts & perf_counters Metzger, Markus T
2009-06-30 19:32                                 ` Ingo Molnar
2009-07-06 15:34                                 ` Peter Zijlstra [this message]
