From: Peter Zijlstra <a.p.zijlstra@chello.nl>
To: "Metzger, Markus T" <markus.t.metzger@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>, Thomas Gleixner <tglx@linutronix.de>,
"H. Peter Anvin" <hpa@zytor.com>,
Markus Metzger <markus.t.metzger@googlemail.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Paul Mackerras <paulus@samba.org>
Subject: Re: bts & perf_counters
Date: Mon, 06 Jul 2009 17:34:14 +0200
Message-ID: <1246894454.8143.101.camel@twins>
In-Reply-To: <928CFBE8E7CB0040959E56B4EA41A77EBE519AE5@irsmsx504.ger.corp.intel.com>

On Tue, 2009-06-30 at 08:32 +0100, Metzger, Markus T wrote:
>
> >> A debugger is interested in the tail of the execution trace. It
> >> won't poll the trace data (which would be far too much overhead).
> >> How would a user synchronize on the profile stream when the
> >> profiled process is stopped?
> >
> >Yeah, with a new perf_attr flag that activates overwrite this
> >use case would be solved, right? The debugger has to make sure the
> >task is stopped before reading out the buffer, but that's pretty
> >much all.
>
> I'm not sure about that. The way I read struct perf_counter_mmap_page,
> data_head points to the end of the stream (I would guess one byte
> beyond the last record).
>
> I think we can ignore data_tail in the debug scenario since debuggers
> won't poll. We can further assume a buffer overflow no matter how big
> the ring buffer - branch trace grows terribly fast and we don't want
> normal uses to lock megabytes of memory, do we?
>
> How would a debugger find the beginning of the event stream to start
> reading?

Something like the below? (utterly untested)
---
include/linux/perf_counter.h | 3 ++-
kernel/perf_counter.c | 35 +++++++++++++++++++++++++++++++++++
2 files changed, 37 insertions(+), 1 deletions(-)
diff --git a/include/linux/perf_counter.h b/include/linux/perf_counter.h
index 5e970c7..95b5257 100644
--- a/include/linux/perf_counter.h
+++ b/include/linux/perf_counter.h
@@ -180,8 +180,9 @@ struct perf_counter_attr {
freq : 1, /* use freq, not period */
inherit_stat : 1, /* per task counts */
enable_on_exec : 1, /* next exec enables */
+ overwrite : 1, /* overwrite mmap data */
- __reserved_1 : 51;
+ __reserved_1 : 50;
__u32 wakeup_events; /* wakeup every n events */
__u32 __reserved_2;
diff --git a/kernel/perf_counter.c b/kernel/perf_counter.c
index d55a50d..0c64d53 100644
--- a/kernel/perf_counter.c
+++ b/kernel/perf_counter.c
@@ -2097,6 +2097,13 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)
nr_pages = (vma_size / PAGE_SIZE) - 1;
/*
+ * attr->overwrite and PROT_WRITE both use ->data_tail in an exclusive
+ * manner, disallow this combination.
+ */
+ if ((vma->vm_flags & VM_WRITE) && counter->attr.overwrite)
+ return -EINVAL;
+
+ /*
* If we have data pages ensure they're a power-of-two number, so we
* can do bitmasks instead of modulo.
*/
@@ -2329,6 +2336,7 @@ struct perf_output_handle {
struct perf_counter *counter;
struct perf_mmap_data *data;
unsigned long head;
+ unsigned long tail;
unsigned long offset;
int nmi;
int sample;
@@ -2363,6 +2371,31 @@ static bool perf_output_space(struct perf_mmap_data *data,
return true;
}
+static void perf_output_tail(struct perf_mmap_data *data, unsigned int head)
+{
+ __u64 *tailp = &data->user_page->data_tail;
+ struct perf_event_header *header;
+ unsigned long pages_mask, nr;
+ unsigned long tail, new;
+ unsigned long size;
+ void *ptr;
+
+ if (data->writable)
+ return;
+
+ size = data->nr_pages << PAGE_SHIFT;
+ pages_mask = data->nr_pages - 1;
+ tail = ACCESS_ONCE(*tailp);
+
+ while ((long)(tail + size - head) < 0) {
+ nr = (tail >> PAGE_SHIFT) & pages_mask;
+ ptr = data->pages[nr] + (tail & (PAGE_SIZE - 1));
+ header = (struct perf_event_header *)ptr;
+ new = tail + header->size;
+ tail = atomic64_cmpxchg(tailp, tail, new);
+ }
+}
+
static void perf_output_wakeup(struct perf_output_handle *handle)
{
atomic_set(&handle->data->poll, POLL_IN);
@@ -2535,6 +2568,8 @@ static int perf_output_begin(struct perf_output_handle *handle,
head += size;
if (unlikely(!perf_output_space(data, offset, head)))
goto fail;
+ if (unlikely(counter->attr.overwrite))
+ perf_output_tail(data, head);
} while (atomic_long_cmpxchg(&data->head, offset, head) != offset);
handle->offset = offset;
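
For illustration only, here is a minimal userspace model of what perf_output_tail() above is meant to do: before a new record is committed, the tail is moved forward over whole records until the span from tail to the new head fits in the buffer again. A reader (the debugger, once the task is stopped) can then start at the tail and walk record by record up to the head. This is a sketch under assumptions, not kernel code: struct ring, struct rec_header, ring_write() and ring_walk() are hypothetical names, the buffer is a flat 64-byte array instead of pages, and it is single-threaded, so the cmpxchg retry loop is left out. Because head and tail are free-running unsigned offsets, the "does not fit" test has to be done on the signed difference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BUF_SIZE 64u /* power of two; stands in for nr_pages << PAGE_SHIFT */

/* Hypothetical stand-in for struct perf_event_header. */
struct rec_header {
	uint16_t size; /* total record size, header included */
	uint16_t id;   /* sequence number, only used for the illustration */
};

struct ring {
	uint8_t  data[BUF_SIZE];
	uint64_t head; /* free-running write position */
	uint64_t tail; /* free-running position of the oldest intact record */
};

static void ring_copy_in(struct ring *r, uint64_t pos, const void *src, size_t len)
{
	const uint8_t *s = src;

	for (size_t i = 0; i < len; i++)
		r->data[(pos + i) & (BUF_SIZE - 1)] = s[i];
}

static void ring_copy_out(const struct ring *r, uint64_t pos, void *dst, size_t len)
{
	uint8_t *d = dst;

	for (size_t i = 0; i < len; i++)
		d[i] = r->data[(pos + i) & (BUF_SIZE - 1)];
}

/* Skip whole records until [tail, new_head) fits into the buffer. */
static void ring_advance_tail(struct ring *r, uint64_t new_head)
{
	/* Signed compare: the unsigned expression alone is never negative. */
	while ((int64_t)(r->tail + BUF_SIZE - new_head) < 0) {
		struct rec_header h;

		ring_copy_out(r, r->tail, &h, sizeof(h));
		r->tail += h.size;
	}
}

/* Append one record; only the header is stored, payload bytes are left as-is. */
static void ring_write(struct ring *r, uint16_t id, uint16_t payload)
{
	struct rec_header h = { .size = (uint16_t)(sizeof(h) + payload), .id = id };
	uint64_t new_head = r->head + h.size;

	assert(h.size <= BUF_SIZE);
	/* Move the tail first, while the old headers are still intact. */
	ring_advance_tail(r, new_head);
	ring_copy_in(r, r->head, &h, sizeof(h));
	r->head = new_head;
}

/* Reader side: walk tail..head, report the record count and first/last id. */
static unsigned ring_walk(const struct ring *r, uint16_t *first_id, uint16_t *last_id)
{
	unsigned n = 0;

	for (uint64_t pos = r->tail; pos != r->head; n++) {
		struct rec_header h;

		ring_copy_out(r, pos, &h, sizeof(h));
		if (n == 0)
			*first_id = h.id;
		*last_id = h.id;
		pos += h.size;
	}
	return n;
}
```

With a zero-initialized struct ring, ten calls to ring_write(&r, i, 12) produce 16-byte records; the buffer holds four of them, so ring_walk() should then see only the four newest records, starting at the tail.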