From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1759221AbbA2Xzn (ORCPT ); Thu, 29 Jan 2015 18:55:43 -0500 Received: from e37.co.us.ibm.com ([32.97.110.158]:57586 "EHLO e37.co.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1758671AbbA2Xzm (ORCPT ); Thu, 29 Jan 2015 18:55:42 -0500 Date: Thu, 29 Jan 2015 15:55:22 -0800 From: Sukadev Bhattiprolu To: Arnaldo Carvalho de Melo Cc: Jiri Olsa , linux-kernel@vger.kernel.org, zhlbj@cn.ibm.com Subject: perf_evlist__filter_pollfd() in trace__run() Message-ID: <20150129235522.GA14385@us.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline X-Operating-System: Linux 2.0.32 on an i486 User-Agent: Mutt/1.5.21 (2010-09-15) X-TM-AS-MML: disable X-Content-Scanned: Fidelis XPS MAILER x-cbid: 15012923-0025-0000-0000-0000082B2E90 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Arnaldo, On one of our systems we are seeing an intermittent SIGSEGV with perf trace sleep 1 and I have question about the 'draining' flag below: | | From 46fb3c21d20415dd2693570c33d0ea6eb8745e04 Mon Sep 17 00:00:00 2001 | From: Arnaldo Carvalho de Melo | Date: Mon, 22 Sep 2014 14:39:48 -0300 | Subject: [PATCH 1/1] perf trace: Filter out POLLHUP'ed file descriptors | | So that we don't continue polling on vanished file descriptors, i.e. | file descriptors for events monitoring threads that exited. | | I.e. the following 'trace' command now exits as expected, instead | of staying in an eternal loop: | | $ sleep 5s & | $ trace -p `pidof sleep` | | Reported-by: Jiri Olsa | Cc: Adrian Hunter | Cc: David Ahern | Cc: Don Zickus | Cc: Frederic Weisbecker | Cc: Jiri Olsa | Cc: Mike Galbraith | Cc: Namhyung Kim | Cc: Paul Mackerras | Cc: Peter Zijlstra | Cc: Stephane Eranian | Link: http://lkml.kernel.org/n/tip-6qegv786zbf6i8us6t4rxug9@git.kernel.org | Signed-off-by: Arnaldo Carvalho de Melo | --- | tools/perf/builtin-trace.c | 7 ++++++- | 1 file changed, 6 insertions(+), 1 deletion(-) | | diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c | index b8fedf3..fe39dc6 100644 | --- a/tools/perf/builtin-trace.c | +++ b/tools/perf/builtin-trace.c | @@ -2044,6 +2044,7 @@ static int trace__run(struct trace *trace, int argc, const char **argv) | int err = -1, i; | unsigned long before; | const bool forks = argc > 0; | + bool draining = false; | char sbuf[STRERR_BUFSIZE]; | | trace->live = true; | @@ -2171,8 +2172,12 @@ next_event: | if (trace->nr_events == before) { | int timeout = done ? 100 : -1; | | - if (perf_evlist__poll(evlist, timeout) > 0) | + if (!draining && perf_evlist__poll(evlist, timeout) > 0) { | + if (perf_evlist__filter_pollfd(evlist, POLLERR | POLLHUP) == 0) | + draining = true; | + If an fd gets into POLLHUP state, perf_evlist__filter_pollfd() removes ("puts") the mmap for the fd. We are seeing that sometimes (frequently) _all_ fds are in the POLLHUP state and hence their mmap->base are set to NULL. | goto again; Now with this goto, we go back and call perf_evlist__mmap_read() which tries to access the freed mmaps. Should there be another check to before reading the mmap again ? I must add that I don't get the SIGSEGV on recent perf-core and the system where we get the crash, first runs into the following errors that we are still looking into (maybe related to "ppc64le" architecture). Problems reading syscall 45 information Problems reading syscall 5 information Problems reading syscall 5 information Problems reading syscall 108 information Problems reading syscall 108 information Problems reading syscall 90 information Problems reading syscall 90 information Problems reading syscall 6 information Unlike the SIGSEGV, these errors occur always. | + } | } else { | goto again; | } | -- | 1.8.3.1 | Following hack seems to fix the SIGSEGV, but then we completely ignore 'draining' flag. diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c index fb12645..ac25e16 100644 --- a/tools/perf/builtin-trace.c +++ b/tools/perf/builtin-trace.c @@ -2173,8 +2173,10 @@ next_event: int timeout = done ? 100 : -1; if (!draining && perf_evlist__poll(evlist, timeout) > 0) { - if (perf_evlist__filter_pollfd(evlist, POLLERR | POLLHUP) == 0) + if (perf_evlist__filter_pollfd(evlist, POLLERR | POLLHUP) == 0) { draining = true; + goto out_disable; + } goto again; }