* perf MMAP2 interface and COW faults
@ 2014-03-13 20:03 Don Zickus
2014-03-14 11:17 ` Peter Zijlstra
0 siblings, 1 reply; 8+ messages in thread
From: Don Zickus @ 2014-03-13 20:03 UTC (permalink / raw)
To: peterz; +Cc: eranian, jmario, jolsa, acme, linux-kernel, lwoodman
Hi Peter,
So we found another corner case with the MMAP2 interface. I don't think it is
a big hurdle to overcome, just wanted a suggestion.
Joe ran specjbb2013 (which creates about 10,000 java threads across 9
processes) and our c2c tool turned up some cacheline collision data on
libjvm.so. This didn't make sense because you shouldn't be able to write
to a shared library.
Even worse, our tool said it affected all the java processes and a majority
of the threads. Which again didn't make sense because this shared library
should be local to each pid's memory.
Anyway, what we determined is that the shared library had mmap data that
was non-zero (because it was backed by a file, libjvm.so). So the
assumption was if the major, minor, inode and inode generation numbers
were non-zero, this memory segment was shared across processes.
So perf set up its map files for the mmap area and then started sampling data
addresses. A few hundred HITMs were to a virtual address that fell into
the libjvm.so memory segment (which was assumed to be mmap'd across
processes).
Coalescing all the data suggested that multiple pids/tids were contending
for a cacheline in a shared library.
After talking with Larry Woodman, we realized when you write to a 'data' or
'bss' segment of a shared library, you incur a COW fault that maps to an
anonymous page in the pid's memory. However, perf doesn't see this.
So when all the tids start writing to this 'data' or 'bss' segment they
generate HITMs within their pid (which is fine). However the tool thinks
it affects other pids (which is not fine).
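To make the COW behaviour concrete, here is a minimal userspace sketch. It is not part of the c2c tool; the helper name and the temporary file path are made up for illustration. It maps a file MAP_PRIVATE, writes through the mapping, and checks that the backing file still holds the old bytes, which is roughly what happens to a library's 'data' segment in each process.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: write through a MAP_PRIVATE file mapping and report
 * whether the backing file changed.
 * Returns 1 if the file is unchanged while the mapping sees the new byte
 * (the write landed in a private COW page), 0 otherwise, -1 on error.
 */
int private_write_leaves_file_untouched(void)
{
	char path[] = "/tmp/cow-demo-XXXXXX";	/* assumed writable location */
	char buf[4] = { 0 };
	char *p;
	int ret = -1;
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	unlink(path);		/* the file lives on until fd is closed */

	if (write(fd, "AAAA", 4) != 4)
		goto out;

	p = mmap(NULL, 4, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
	if (p == MAP_FAILED)
		goto out;

	p[0] = 'B';		/* first write triggers the COW fault */

	/* Read the file itself, bypassing the mapping. */
	if (pread(fd, buf, 4, 0) == 4)
		ret = (memcmp(buf, "AAAA", 4) == 0 && p[0] == 'B');

	munmap(p, 4);
out:
	close(fd);
	return ret;
}
```

The mapping still reports the file's device and inode numbers after the write, which is why those numbers alone cannot tell the tool that the page is now private.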
My question is, how can our tool determine if a virtual address is private
to a pid or not? Originally it had to have a zero for maj, min, ino, and
ino gen. But for file map'd libraries this doesn't always work because we
don't see COW faults in perf (and we may not want to :-) ).
Is there another technique we can use? Perhaps during the reading of
/proc/<pid>/maps, if the protection is marked 'p' for private, we just tell
the sort algorithm to sort locally to the process but a 's' for shared can
be sorted globally based on data addresses?
Or something else that tells us that a virtual address has changed its
mapping? Thoughts?
Cheers,
Don
* Re: perf MMAP2 interface and COW faults
2014-03-13 20:03 perf MMAP2 interface and COW faults Don Zickus
@ 2014-03-14 11:17 ` Peter Zijlstra
2014-03-14 12:58 ` Don Zickus
2014-03-17 15:37 ` Don Zickus
0 siblings, 2 replies; 8+ messages in thread
From: Peter Zijlstra @ 2014-03-14 11:17 UTC (permalink / raw)
To: Don Zickus; +Cc: eranian, jmario, jolsa, acme, linux-kernel, lwoodman
On Thu, Mar 13, 2014 at 04:03:52PM -0400, Don Zickus wrote:
> Hi Peter,
>
> So we found another corner case with the MMAP2 interface. I don't think it is
> a big hurdle to overcome, just wanted a suggestion.
>
> Joe ran specjbb2013 (which creates about 10,000 java threads across 9
> processes) and our c2c tool turned up some cacheline collision data on
> libjvm.so. This didn't make sense because you shouldn't be able to write
> to a shared library.
>
> Even worse, our tool said it affected all the java processes and a majority
> of the threads. Which again didn't make sense because this shared library
> should be local to each pid's memory.
>
> Anyway, what we determined is that the shared library had mmap data that
> was non-zero (because it was backed by a file, libjvm.so). So the
> assumption was if the major, minor, inode and inode generation numbers
> were non-zero, this memory segment was shared across processes.
>
> So perf set up its map files for the mmap area and then started sampling data
> addresses. A few hundred HITMs were to a virtual address that fell into
> the libjvm.so memory segment (which was assumed to be mmap'd across
> processes).
>
> Coalescing all the data suggested that multiple pids/tids were contending
> for a cacheline in a shared library.
>
> After talking with Larry Woodman, we realized when you write to a 'data' or
> 'bss' segment of a shared library, you incur a COW fault that maps to an
> anonymous page in the pid's memory. However, perf doesn't see this.
>
> So when all the tids start writing to this 'data' or 'bss' segment they
> generate HITMs within their pid (which is fine). However the tool thinks
> it affects other pids (which is not fine).
>
> My question is, how can our tool determine if a virtual address is private
> to a pid or not? Originally it had to have a zero for maj, min, ino, and
> ino gen. But for file map'd libraries this doesn't always work because we
> don't see COW faults in perf (and we may not want to :-) ).
>
> Is there another technique we can use? Perhaps during the reading of
> /proc/<pid>/maps, if the protection is marked 'p' for private, we just tell
> the sort algorithm to sort locally to the process but a 's' for shared can
> be sorted globally based on data addresses?
>
> Or something else that tells us that a virtual address has changed its
> mapping? Thoughts?
Very good indeed; we're missing the protection and flags bits.
How about something like the below; with that you can solve your problem
by looking at mmap2.flags & MAP_PRIVATE.
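On the tool side, the check could look roughly like the sketch below. The struct and function names are hypothetical (they are not perf's own types); the fields mirror the MMAP2 record with the proposed prot and flags members. It keeps the old "all zero means anonymous" heuristic and adds the MAP_PRIVATE test on top.

```c
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical subset of the MMAP2 payload the tool would look at. */
struct mmap2_info {
	uint32_t maj, min;
	uint64_t ino, ino_generation;
	uint32_t prot, flags;
};

/*
 * Returns 1 if samples in this mapping may be coalesced across pids,
 * 0 if they should be sorted per pid.
 */
int mapping_is_shared_across_pids(const struct mmap2_info *m)
{
	/* No backing file: the original heuristic treats this as private. */
	if (!m->maj && !m->min && !m->ino && !m->ino_generation)
		return 0;

	/* File-backed but private: writes COW into per-process pages. */
	if (m->flags & MAP_PRIVATE)
		return 0;

	/* File-backed and shared: data addresses are comparable globally. */
	return (m->flags & MAP_SHARED) != 0;
}
```

With this, a writable libjvm.so data segment (file-backed, MAP_PRIVATE) would be sorted locally to each process, while a MAP_SHARED file mapping would still be sorted globally.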
---
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 853bc1ccb395..2ed502f5679f 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -699,6 +699,7 @@ enum perf_event_type {
* u32 min;
* u64 ino;
* u64 ino_generation;
+ * u32 prot, flags;
* char filename[];
* struct sample_id sample_id;
* };
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 661951ab8ae7..6d50791d3d96 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5100,6 +5107,7 @@ struct perf_mmap_event {
int maj, min;
u64 ino;
u64 ino_generation;
+ u32 prot, flags;
struct {
struct perf_event_header header;
@@ -5141,6 +5149,8 @@ static void perf_event_mmap_output(struct perf_event *event,
mmap_event->event_id.header.size += sizeof(mmap_event->min);
mmap_event->event_id.header.size += sizeof(mmap_event->ino);
mmap_event->event_id.header.size += sizeof(mmap_event->ino_generation);
+ mmap_event->event_id.header.size += sizeof(mmap_event->prot);
+ mmap_event->event_id.header.size += sizeof(mmap_event->flags);
}
perf_event_header__init_id(&mmap_event->event_id.header, &sample, event);
@@ -5159,6 +5169,8 @@ static void perf_event_mmap_output(struct perf_event *event,
perf_output_put(&handle, mmap_event->min);
perf_output_put(&handle, mmap_event->ino);
perf_output_put(&handle, mmap_event->ino_generation);
+ perf_output_put(&handle, mmap_event->prot);
+ perf_output_put(&handle, mmap_event->flags);
}
__output_copy(&handle, mmap_event->file_name,
@@ -5177,11 +5189,35 @@ static void perf_event_mmap_event(struct perf_mmap_event *mmap_event)
struct file *file = vma->vm_file;
int maj = 0, min = 0;
u64 ino = 0, gen = 0;
+ u32 prot = 0, flags = 0;
unsigned int size;
char tmp[16];
char *buf = NULL;
char *name;
+ if (event->attr.mmap2) {
+ if (vma->vm_flags & VM_READ)
+ prot |= PROT_READ;
+ if (vma->vm_flags & VM_WRITE)
+ prot |= PROT_WRITE;
+ if (vma->vm_flags & VM_EXEC)
+ prot |= PROT_EXEC;
+
+ if (vma->vm_flags & VM_MAYSHARE)
+ flags = MAP_SHARED;
+ else
+ flags = MAP_PRIVATE;
+
+ if (vma->vm_flags & VM_DENYWRITE)
+ flags |= MAP_DENYWRITE;
+ if (vma->vm_flags & VM_MAYEXEC)
+ flags |= MAP_EXECUTABLE;
+ if (vma->vm_flags & VM_LOCKED)
+ flags |= MAP_LOCKED;
+ if (vma->vm_flags & VM_HUGETLB)
+ flags |= MAP_HUGETLB;
+ }
+
if (file) {
struct inode *inode;
dev_t dev;
@@ -5247,6 +5283,8 @@ static void perf_event_mmap_event(struct perf_mmap_event *mmap_event)
mmap_event->min = min;
mmap_event->ino = ino;
mmap_event->ino_generation = gen;
+ mmap_event->prot = prot;
+ mmap_event->flags = flags;
if (!(vma->vm_flags & VM_EXEC))
mmap_event->event_id.header.misc |= PERF_RECORD_MISC_MMAP_DATA;
@@ -5287,6 +5325,8 @@ void perf_event_mmap(struct vm_area_struct *vma)
/* .min (attr_mmap2 only) */
/* .ino (attr_mmap2 only) */
/* .ino_generation (attr_mmap2 only) */
+ /* .prot (attr_mmap2 only) */
+ /* .flags (attr_mmap2 only) */
};
perf_event_mmap_event(&mmap_event);
* Re: perf MMAP2 interface and COW faults
2014-03-14 11:17 ` Peter Zijlstra
@ 2014-03-14 12:58 ` Don Zickus
2014-03-14 13:24 ` Peter Zijlstra
2014-03-17 15:37 ` Don Zickus
1 sibling, 1 reply; 8+ messages in thread
From: Don Zickus @ 2014-03-14 12:58 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: eranian, jmario, jolsa, acme, linux-kernel, lwoodman
On Fri, Mar 14, 2014 at 12:17:05PM +0100, Peter Zijlstra wrote:
> Very good indeed; we're missing the protection and flags bits.
>
> How about something like the below; with that you can solve your problem
> by looking at mmap2.flags & MAP_PRIVATE.
Hmmm. That will probably work for future mmap events. My problem is for
synthesized mmap events. We read the same protection bits from
/proc/<pid>/maps file, so I assume the same strategy can work for those
events too?
Let me incorporate this patch and hack up perf to handle it and let you
know how it goes.
Thanks!
Cheers,
Don
* Re: perf MMAP2 interface and COW faults
2014-03-14 12:58 ` Don Zickus
@ 2014-03-14 13:24 ` Peter Zijlstra
2014-03-14 13:37 ` Don Zickus
0 siblings, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2014-03-14 13:24 UTC (permalink / raw)
To: Don Zickus; +Cc: eranian, jmario, jolsa, acme, linux-kernel, lwoodman
On Fri, Mar 14, 2014 at 08:58:37AM -0400, Don Zickus wrote:
> Hmmm. That will probably work for future mmap events. My problem is for
> synthesized mmap events. We read the same protection bits from
> /proc/<pid>/maps file, so I assume the same strategy can work for those
> events too?
Yeah, so /proc/$pid/maps doesn't contain all those bits, but it does
have the prot read/write/exec and flags shared/private thing, which
should be sufficient for your needs.
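As a rough sketch of that mapping (the helper name is hypothetical, and the example lines in the usage note are invented), the fourth character of the perms field carries shared/private and the first three carry read/write/exec:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: pull the "rwxp" field out of one line of
 * /proc/<pid>/maps and convert it to PROT_* and MAP_* bits.
 * Returns 0 on success, -1 if the line does not parse.
 */
int maps_line_to_prot_flags(const char *line, uint32_t *prot, uint32_t *flags)
{
	unsigned long long start, end;
	char perms[5];

	/* Line format: start-end perms offset dev inode [path] */
	if (sscanf(line, "%llx-%llx %4s", &start, &end, perms) != 3)
		return -1;
	if (strlen(perms) != 4)
		return -1;

	*prot = 0;
	if (perms[0] == 'r')
		*prot |= PROT_READ;
	if (perms[1] == 'w')
		*prot |= PROT_WRITE;
	if (perms[2] == 'x')
		*prot |= PROT_EXEC;

	/* 's' means shared; anything else ('p') is treated as private. */
	*flags = (perms[3] == 's') ? MAP_SHARED : MAP_PRIVATE;
	return 0;
}
```

A line such as "7f0000000000-7f0000200000 rw-p 00b5c000 08:01 1234 /usr/lib/libjvm.so" would yield PROT_READ | PROT_WRITE and MAP_PRIVATE. The other MAP_* bits (locked, hugetlb, and so on) are not recoverable from this file.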
* Re: perf MMAP2 interface and COW faults
2014-03-14 13:24 ` Peter Zijlstra
@ 2014-03-14 13:37 ` Don Zickus
0 siblings, 0 replies; 8+ messages in thread
From: Don Zickus @ 2014-03-14 13:37 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: eranian, jmario, jolsa, acme, linux-kernel, lwoodman
On Fri, Mar 14, 2014 at 02:24:47PM +0100, Peter Zijlstra wrote:
> On Fri, Mar 14, 2014 at 08:58:37AM -0400, Don Zickus wrote:
> > Hmmm. That will probably work for future mmap events. My problem is for
> > synthesized mmap events. We read the same protection bits from
> > /proc/<pid>/maps file, so I assume the same strategy can work for those
> > events too?
>
> Yeah, so /proc/$pid/maps doesn't contain all those bits, but it does
> have the prot read/write/exec and flags shared/private thing, which
> should be sufficient for your needs.
Great, thanks!
Cheers,
Don
* Re: perf MMAP2 interface and COW faults
2014-03-14 11:17 ` Peter Zijlstra
2014-03-14 12:58 ` Don Zickus
@ 2014-03-17 15:37 ` Don Zickus
2014-03-17 15:41 ` Don Zickus
2014-03-17 16:16 ` Peter Zijlstra
1 sibling, 2 replies; 8+ messages in thread
From: Don Zickus @ 2014-03-17 15:37 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: eranian, jmario, jolsa, acme, linux-kernel, lwoodman
On Fri, Mar 14, 2014 at 12:17:05PM +0100, Peter Zijlstra wrote:
> Very good indeed; we're missing the protection and flags bits.
>
> How about something like the below; with that you can solve your problem
> by looking at mmap2.flags & MAP_PRIVATE.
Yes this seemed to work. I attached a slight update to your patch (one
that compiles :-) ). And I will reply to this thread with the tool
changes I used to verify this (in case I did that piece wrong).
Cheers,
Don
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 853bc1c..2ed502f 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -699,6 +699,7 @@ enum perf_event_type {
* u32 min;
* u64 ino;
* u64 ino_generation;
+ * u32 prot, flags;
* char filename[];
* struct sample_id sample_id;
* };
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 81919fe..ace46f8 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -39,6 +39,7 @@
#include <linux/hw_breakpoint.h>
#include <linux/mm_types.h>
#include <linux/cgroup.h>
+#include <linux/mman.h>
#include "internal.h"
@@ -5100,6 +5101,7 @@ struct perf_mmap_event {
int maj, min;
u64 ino;
u64 ino_generation;
+ u32 prot, flags;
struct {
struct perf_event_header header;
@@ -5141,6 +5143,8 @@ static void perf_event_mmap_output(struct perf_event *event,
mmap_event->event_id.header.size += sizeof(mmap_event->min);
mmap_event->event_id.header.size += sizeof(mmap_event->ino);
mmap_event->event_id.header.size += sizeof(mmap_event->ino_generation);
+ mmap_event->event_id.header.size += sizeof(mmap_event->prot);
+ mmap_event->event_id.header.size += sizeof(mmap_event->flags);
}
perf_event_header__init_id(&mmap_event->event_id.header, &sample, event);
@@ -5159,6 +5163,8 @@ static void perf_event_mmap_output(struct perf_event *event,
perf_output_put(&handle, mmap_event->min);
perf_output_put(&handle, mmap_event->ino);
perf_output_put(&handle, mmap_event->ino_generation);
+ perf_output_put(&handle, mmap_event->prot);
+ perf_output_put(&handle, mmap_event->flags);
}
__output_copy(&handle, mmap_event->file_name,
@@ -5177,6 +5183,7 @@ static void perf_event_mmap_event(struct perf_mmap_event *mmap_event)
struct file *file = vma->vm_file;
int maj = 0, min = 0;
u64 ino = 0, gen = 0;
+ u32 prot = 0, flags = 0;
unsigned int size;
char tmp[16];
char *buf = NULL;
@@ -5207,6 +5214,28 @@ static void perf_event_mmap_event(struct perf_mmap_event *mmap_event)
gen = inode->i_generation;
maj = MAJOR(dev);
min = MINOR(dev);
+
+ if (vma->vm_flags & VM_READ)
+ prot |= PROT_READ;
+ if (vma->vm_flags & VM_WRITE)
+ prot |= PROT_WRITE;
+ if (vma->vm_flags & VM_EXEC)
+ prot |= PROT_EXEC;
+
+ if (vma->vm_flags & VM_MAYSHARE)
+ flags = MAP_SHARED;
+ else
+ flags = MAP_PRIVATE;
+
+ if (vma->vm_flags & VM_DENYWRITE)
+ flags |= MAP_DENYWRITE;
+ if (vma->vm_flags & VM_MAYEXEC)
+ flags |= MAP_EXECUTABLE;
+ if (vma->vm_flags & VM_LOCKED)
+ flags |= MAP_LOCKED;
+ if (vma->vm_flags & VM_HUGETLB)
+ flags |= MAP_HUGETLB;
+
goto got_name;
} else {
name = (char *)arch_vma_name(vma);
@@ -5247,6 +5276,8 @@ got_name:
mmap_event->min = min;
mmap_event->ino = ino;
mmap_event->ino_generation = gen;
+ mmap_event->prot = prot;
+ mmap_event->flags = flags;
if (!(vma->vm_flags & VM_EXEC))
mmap_event->event_id.header.misc |= PERF_RECORD_MISC_MMAP_DATA;
@@ -5287,6 +5318,8 @@ void perf_event_mmap(struct vm_area_struct *vma)
/* .min (attr_mmap2 only) */
/* .ino (attr_mmap2 only) */
/* .ino_generation (attr_mmap2 only) */
+ /* .prot (attr_mmap2 only) */
+ /* .flags (attr_mmap2 only) */
};
perf_event_mmap_event(&mmap_event);
* Re: perf MMAP2 interface and COW faults
2014-03-17 15:37 ` Don Zickus
@ 2014-03-17 15:41 ` Don Zickus
2014-03-17 16:16 ` Peter Zijlstra
1 sibling, 0 replies; 8+ messages in thread
From: Don Zickus @ 2014-03-17 15:41 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: eranian, jmario, jolsa, acme, linux-kernel, lwoodman
On Mon, Mar 17, 2014 at 11:37:47AM -0400, Don Zickus wrote:
> On Fri, Mar 14, 2014 at 12:17:05PM +0100, Peter Zijlstra wrote:
> > Very good indeed; we're missing the protection and flags bits.
> >
> > How about something like the below; with that you can solve your problem
> > by looking at mmap2.flags & MAP_PRIVATE.
>
> Yes this seemed to work. I attached a slight update to your patch (one
> that compiles :-) ). And I will reply to this thread with the tool
> changes I used to verify this (in case I did that piece wrong).
And here are the perf tool changes I made to utilize this. Of course this
is based on reverting 3090ffb5a2515990182f3f55b0688a7817325488.
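For verification it helps to print the new fields in the same "rwxp" form that /proc/<pid>/maps uses. A standalone sketch of that formatting, with a hypothetical helper name, might look like this:

```c
#include <stdint.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: render prot/flags as a maps-style permission
 * string. 'out' must have room for 4 characters plus the terminator.
 */
void prot_flags_to_perms(uint32_t prot, uint32_t flags, char out[5])
{
	out[0] = (prot & PROT_READ)  ? 'r' : '-';
	out[1] = (prot & PROT_WRITE) ? 'w' : '-';
	out[2] = (prot & PROT_EXEC)  ? 'x' : '-';
	/* Anything not marked shared is shown as private. */
	out[3] = (flags & MAP_SHARED) ? 's' : 'p';
	out[4] = '\0';
}
```

A record with all-zero prot and flags (for example one coming from the older MMAP event) would print as "---p" under this scheme.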
Cheers,
Don
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index b02f3b4..bad58fc 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -1,4 +1,5 @@
#include <linux/types.h>
+#include <sys/mman.h>
#include "event.h"
#include "debug.h"
#include "machine.h"
@@ -213,6 +214,21 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool,
else
event->header.misc = PERF_RECORD_MISC_GUEST_USER;
+ /* map protection and flags bits */
+ event->mmap2.prot = 0;
+ event->mmap2.flags = 0;
+ if (prot[0] == 'r')
+ event->mmap2.prot |= PROT_READ;
+ if (prot[1] == 'w')
+ event->mmap2.prot |= PROT_WRITE;
+ if (prot[2] == 'x')
+ event->mmap2.prot |= PROT_EXEC;
+
+ if (prot[3] == 's')
+ event->mmap2.flags |= MAP_SHARED;
+ else
+ event->mmap2.flags |= MAP_PRIVATE;
+
if (prot[2] != 'x') {
if (!mmap_data || prot[0] != 'r')
continue;
@@ -613,12 +629,15 @@ size_t perf_event__fprintf_mmap(union perf_event *event, FILE *fp)
size_t perf_event__fprintf_mmap2(union perf_event *event, FILE *fp)
{
return fprintf(fp, " %d/%d: [%#" PRIx64 "(%#" PRIx64 ") @ %#" PRIx64
- " %02x:%02x %"PRIu64" %"PRIu64"]: %c %s\n",
+ " %02x:%02x %"PRIu64" %"PRIu64"]: %c%c%c%c %s\n",
event->mmap2.pid, event->mmap2.tid, event->mmap2.start,
event->mmap2.len, event->mmap2.pgoff, event->mmap2.maj,
event->mmap2.min, event->mmap2.ino,
event->mmap2.ino_generation,
- (event->header.misc & PERF_RECORD_MISC_MMAP_DATA) ? 'r' : 'x',
+ (event->mmap2.prot & PROT_READ) ? 'r' : '-',
+ (event->mmap2.prot & PROT_WRITE) ? 'w' : '-',
+ (event->mmap2.prot & PROT_EXEC) ? 'x' : '-',
+ (event->mmap2.flags & MAP_SHARED) ? 's' : 'p',
event->mmap2.filename);
}
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index 38457d4..96bd19c 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -27,6 +27,8 @@ struct mmap2_event {
u32 min;
u64 ino;
u64 ino_generation;
+ u32 prot;
+ u32 flags;
char filename[PATH_MAX];
};
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index c8b0fdd..986931e 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1040,6 +1040,8 @@ int machine__process_mmap2_event(struct machine *machine,
event->mmap2.pid, event->mmap2.maj,
event->mmap2.min, event->mmap2.ino,
event->mmap2.ino_generation,
+ event->mmap2.prot,
+ event->mmap2.flags,
event->mmap2.filename, type);
if (map == NULL)
@@ -1085,7 +1087,7 @@ int machine__process_mmap_event(struct machine *machine, union perf_event *event
map = map__new(&machine->user_dsos, event->mmap.start,
event->mmap.len, event->mmap.pgoff,
- event->mmap.pid, 0, 0, 0, 0,
+ event->mmap.pid, 0, 0, 0, 0, 0, 0,
event->mmap.filename,
type);
diff --git a/tools/perf/util/map.c b/tools/perf/util/map.c
index 39cd2d0..f98f8fe 100644
--- a/tools/perf/util/map.c
+++ b/tools/perf/util/map.c
@@ -51,7 +51,7 @@ void map__init(struct map *map, enum map_type type,
struct map *map__new(struct list_head *dsos__list, u64 start, u64 len,
u64 pgoff, u32 pid, u32 d_maj, u32 d_min, u64 ino,
- u64 ino_gen, char *filename,
+ u64 ino_gen, u32 prot, u32 flags, char *filename,
enum map_type type)
{
struct map *map = malloc(sizeof(*map));
@@ -69,6 +69,8 @@ struct map *map__new(struct list_head *dsos__list, u64 start, u64 len,
map->min = d_min;
map->ino = ino;
map->ino_generation = ino_gen;
+ map->prot = prot;
+ map->flags = flags;
if ((anon || no_dso) && type == MAP__FUNCTION) {
snprintf(newfilename, sizeof(newfilename), "/tmp/perf-%d.map", pid);
diff --git a/tools/perf/util/map.h b/tools/perf/util/map.h
index f00f058..8cd0cff 100644
--- a/tools/perf/util/map.h
+++ b/tools/perf/util/map.h
@@ -35,6 +35,8 @@ struct map {
bool referenced;
bool erange_warned;
u32 priv;
+ u32 prot;
+ u32 flags;
u64 pgoff;
u64 reloc;
u32 maj, min; /* only valid for MMAP2 record */
@@ -106,7 +108,7 @@ void map__init(struct map *map, enum map_type type,
u64 start, u64 end, u64 pgoff, struct dso *dso);
struct map *map__new(struct list_head *dsos__list, u64 start, u64 len,
u64 pgoff, u32 pid, u32 d_maj, u32 d_min, u64 ino,
- u64 ino_gen,
+ u64 ino_gen, u32 prot, u32 flags,
char *filename, enum map_type type);
struct map *map__new2(u64 start, struct dso *dso, enum map_type type);
void map__delete(struct map *map);
* Re: perf MMAP2 interface and COW faults
2014-03-17 15:37 ` Don Zickus
2014-03-17 15:41 ` Don Zickus
@ 2014-03-17 16:16 ` Peter Zijlstra
1 sibling, 0 replies; 8+ messages in thread
From: Peter Zijlstra @ 2014-03-17 16:16 UTC (permalink / raw)
To: Don Zickus; +Cc: eranian, jmario, jolsa, acme, linux-kernel, lwoodman
On Mon, Mar 17, 2014 at 11:37:47AM -0400, Don Zickus wrote:
> Yes this seemed to work. I attached a slight update to your patch (one
> that compiles :-) ). And I will reply to this thread with the tool
> changes I used to verify this (in case I did that piece wrong).
:-)
OK, I'll write it a changelog and queue. Thanks!