* perf: child events not killed on release paths, survive indefinitely
@ 2014-07-18 12:32 Mark Rutland
2014-07-18 14:03 ` Peter Zijlstra
0 siblings, 1 reply; 5+ messages in thread
From: Mark Rutland @ 2014-07-18 12:32 UTC (permalink / raw)
To: Peter Zijlstra, Paul Mackerras, Ingo Molnar,
Arnaldo Carvalho de Melo, Sahasrabudhe, Sheetal
Cc: Will Deacon, linux-kernel@vger.kernel.org
Hi all,
Sheetal reported a weird issue on arm where events which have been
closed seem to stay around and compete for HW counters if an application
has forked between the events being opened and them being closed.
I've reproduced this in mainline and linux-next and this seems to be a
generic issue; the below test case fires on my x86-64 workstation as
well as on arm and arm64.
The problem is the way we (don't) handle child events when releasing a
parent in perf_release and perf_event_release_kernel. We call put_event
on the parent when it is released, but this will exit early having done
nothing because each child will have incremented the parent refcount
when initialised from perf_event_init_task. We don't seem to do anything
about the children in this case.
Thus the parent event can't be killed until all the children have first
been killed. As the only places references to them exist are the
parent's child_list and their respective tasks' hardware
perf_event_context, they'll only get killed when their respective tasks
exit (I confirmed this with some printks in hw_perf_event_destroy and
put_event). Until that happens they remain in their respective contexts
and continue to compete for HW counters, adversely affecting events
opened later.
I'm not sure what the best way of handling this is. We need to clean up
the children when the last possible user of the event is gone, but it
looks to me like we'd need to have a separate child_refcount or
reader_refcount to be able to tell when the events are still useful and
when the only references which remain are internal.
Any ideas?
Cheers,
Mark.
---->8----
/*
* It looks like events which we close are hanging around in
* kernel-space, and remain schedulable unexpectedly, penalizing events
* we later open. Let's try to detect that...
*/
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <linux/perf_event.h>
#include <asm/unistd.h>
/*
* Something the children can block on indefinitely without wasting CPU time.
*/
int pipefd[2];
void *block_forever(void *unused)
{
read(pipefd[0], NULL, 1);
return NULL;
}
void create_child(void)
{
pthread_t child;
if (pthread_create(&child, NULL, block_forever, NULL) == 0)
return;
printf("Unable to create child thread.\n");
exit(1);
}
/* There's no libc wrapper... */
int perf_event_open(struct perf_event_attr *attr,
pid_t pid, int cpu, int group_fd,
unsigned long flags)
{
return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}
struct perf_event_attr attr = {
.type = PERF_TYPE_HARDWARE,
.size = sizeof(attr),
.config = PERF_COUNT_HW_INSTRUCTIONS,
.read_format = PERF_FORMAT_TOTAL_TIME_ENABLED |
PERF_FORMAT_TOTAL_TIME_RUNNING,
.inherit = 1,
};
int create_event(void)
{
int event_fd = perf_event_open(&attr, 0, -1, -1, 0);
if (event_fd >= 0)
return event_fd;
printf("Unable to open event\n");
exit(1);
}
void validate_event(int fd)
{
struct read_format {
uint64_t value;
uint64_t enabled;
uint64_t running;
} data;
if (read(fd, &data, sizeof(data)) != sizeof(data)) {
printf("Unable to read event.\n");
exit(1);
}
/*
* When there's no competition for counters, eligible events should
* always be scheduled. Given we closed all prior events, there should
* be no contention in the absence of events owned by other tasks.
*/
if (data.enabled != data.running) {
printf("HW counter contention detected\n");
exit(1);
}
}
int main(int argc, char *argv[])
{
if (pipe(pipefd) != 0) {
printf("Unable to open pipe\n");
exit(1);
}
for (;;) {
int event_fd = create_event();
create_child();
validate_event(event_fd);
close(event_fd);
printf(".");
};
return 0;
}
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: perf: child events not killed on release paths, survive indefinitely
2014-07-18 12:32 perf: child events not killed on release paths, survive indefinitely Mark Rutland
@ 2014-07-18 14:03 ` Peter Zijlstra
2014-07-18 14:31 ` Mark Rutland
0 siblings, 1 reply; 5+ messages in thread
From: Peter Zijlstra @ 2014-07-18 14:03 UTC (permalink / raw)
To: Mark Rutland
Cc: Paul Mackerras, Ingo Molnar, Arnaldo Carvalho de Melo,
Sahasrabudhe, Sheetal, Will Deacon, linux-kernel@vger.kernel.org,
jolsa
On Fri, Jul 18, 2014 at 01:32:39PM +0100, Mark Rutland wrote:
> Hi all,
>
> Sheetal reported a weird issue on arm where events which have been
> closed seem to stay around and compete for HW counters if an application
> has forked between the events being opened and them being closed.
>
> I've reproduced this in mainline and linux-next and this seems to be a
> generic issue; the below test case fires on my x86-64 workstation as
> well as on arm and arm64.
>
> The problem is the way we (don't) handle child events when releasing a
> parent in perf_release and perf_event_release_kernel. We call put_event
> on the parent when it is released, but this will exit early having done
> nothing because each child will have incremented the parent refcount
> when initialised from perf_event_init_task. We don't seem to do anything
> about the children in this case.
>
> Thus the parent event can't be killed until all the children have first
> been killed. As the only places references to them exist are the
> parent's child_list and their respective tasks' hardware
> perf_event_context, they'll only get killed when their respective tasks
> exit (I confirmed this with some printks in hw_perf_event_destroy and
> put_event). Until that happens they remain in their respective contexts
> and continue to compete for HW counters, adversely affecting events
> opened later.
>
> I'm not sure what the best way of handling this is. We need to clean up
> the children when the last possible user of the event is gone, but it
> looks to me like we'd need to have a separate child_refcount or
> reader_refcount to be able to tell when the events are still useful and
> when the only references which remain are internal.
>
> Any ideas?
Jiri was recently poking at that:
lkml.kernel.org/r/1405079782-8139-3-git-send-email-jolsa@kernel.org
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: perf: child events not killed on release paths, survive indefinitely
2014-07-18 14:03 ` Peter Zijlstra
@ 2014-07-18 14:31 ` Mark Rutland
2014-07-18 14:42 ` Jiri Olsa
0 siblings, 1 reply; 5+ messages in thread
From: Mark Rutland @ 2014-07-18 14:31 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Paul Mackerras, Ingo Molnar, Arnaldo Carvalho de Melo,
Sahasrabudhe, Sheetal, Will Deacon, linux-kernel@vger.kernel.org,
jolsa@redhat.com
On Fri, Jul 18, 2014 at 03:03:43PM +0100, Peter Zijlstra wrote:
> On Fri, Jul 18, 2014 at 01:32:39PM +0100, Mark Rutland wrote:
> > Hi all,
> >
> > Sheetal reported a weird issue on arm where events which have been
> > closed seem to stay around and compete for HW counters if an application
> > has forked between the events being opened and them being closed.
> >
> > I've reproduced this in mainline and linux-next and this seems to be a
> > generic issue; the below test case fires on my x86-64 workstation as
> > well as on arm and arm64.
> >
> > The problem is the way we (don't) handle child events when releasing a
> > parent in perf_release and perf_event_release_kernel. We call put_event
> > on the parent when it is released, but this will exit early having done
> > nothing because each child will have incremented the parent refcount
> > when initialised from perf_event_init_task. We don't seem to do anything
> > about the children in this case.
> >
> > Thus the parent event can't be killed until all the children have first
> > been killed. As the only places references to them exist are the
> > parent's child_list and their respective tasks' hardware
> > perf_event_context, they'll only get killed when their respective tasks
> > exit (I confirmed this with some printks in hw_perf_event_destroy and
> > put_event). Until that happens they remain in their respective contexts
> > and continue to compete for HW counters, adversely affecting events
> > opened later.
> >
> > I'm not sure what the best way of handling this is. We need to clean up
> > the children when the last possible user of the event is gone, but it
> > looks to me like we'd need to have a separate child_refcount or
> > reader_refcount to be able to tell when the events are still useful and
> > when the only references which remain are internal.
> >
> > Any ideas?
>
> Jiri was recently poking at that:
>
> lkml.kernel.org/r/1405079782-8139-3-git-send-email-jolsa@kernel.org
Ah. I hadn't spotted that, thanks for the link.
That approach (closing child events when the owner exits) doesn't seem
to fix the general case, as long running tasks (think interactive
debugger/profiler) could open and close many events before exiting, if
nothing else wasting memory until it does so.
My test case triggers with said patch applied (before hanging, probably
due to the AB-BA deadlock).
Jiri, could you add me to Cc for future versions of that series?
I'll have a look and see if I can come up with something; otherwise I'm
happy to test/review. :)
Cheers,
Mark.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: perf: child events not killed on release paths, survive indefinitely
2014-07-18 14:31 ` Mark Rutland
@ 2014-07-18 14:42 ` Jiri Olsa
2014-07-18 15:50 ` Mark Rutland
0 siblings, 1 reply; 5+ messages in thread
From: Jiri Olsa @ 2014-07-18 14:42 UTC (permalink / raw)
To: Mark Rutland
Cc: Peter Zijlstra, Paul Mackerras, Ingo Molnar,
Arnaldo Carvalho de Melo, Sahasrabudhe, Sheetal, Will Deacon,
linux-kernel@vger.kernel.org
On Fri, Jul 18, 2014 at 03:31:57PM +0100, Mark Rutland wrote:
SNIP
> > > I'm not sure what the best way of handling this is. We need to clean up
> > > the children when the last possible user of the event is gone, but it
> > > looks to me like we'd need to have a separate child_refcount or
> > > reader_refcount to be able to tell when the events are still useful and
> > > when the only references which remain are internal.
> > >
> > > Any ideas?
> >
> > Jiri was recently poking at that:
> >
> > lkml.kernel.org/r/1405079782-8139-3-git-send-email-jolsa@kernel.org
>
> Ah. I hadn't spotted that, thanks for the link.
>
> That approach (closing child events when the owner exits) doesn't seem
> to fix the general case, as long running tasks (think interactive
> debugger/profiler) could open and close many events before exiting, if
> nothing else wasting memory until it does so.
>
> My test case triggers with said patch applied (before hanging, probably
> due to the AB-BA deadlock).
yep, peter already found that
http://marc.info/?l=linux-kernel&m=140541548218652&w=2
>
> Jiri, could you add me to Cc for future versions of that series?
>
> I'll have a look and see if I can come up with something; otherwise I'm
> happy to test/review. :)
sure, and vice versa ;-)
jirka
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: perf: child events not killed on release paths, survive indefinitely
2014-07-18 14:42 ` Jiri Olsa
@ 2014-07-18 15:50 ` Mark Rutland
0 siblings, 0 replies; 5+ messages in thread
From: Mark Rutland @ 2014-07-18 15:50 UTC (permalink / raw)
To: Jiri Olsa
Cc: Peter Zijlstra, Paul Mackerras, Ingo Molnar,
Arnaldo Carvalho de Melo, Sahasrabudhe, Sheetal, Will Deacon,
linux-kernel@vger.kernel.org
> > Jiri, could you add me to Cc for future versions of that series?
> >
> > I'll have a look and see if I can come up with something; otherwise I'm
> > happy to test/review. :)
>
> sure, and vice versa ;-)
Will do. :)
Mark.
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2014-07-18 15:50 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-07-18 12:32 perf: child events not killed on release paths, survive indefinitely Mark Rutland
2014-07-18 14:03 ` Peter Zijlstra
2014-07-18 14:31 ` Mark Rutland
2014-07-18 14:42 ` Jiri Olsa
2014-07-18 15:50 ` Mark Rutland
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox