public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] perf: detect loops processing events
@ 2013-05-08 15:18 David Ahern
  2013-05-09  9:24 ` Ingo Molnar
  0 siblings, 1 reply; 5+ messages in thread
From: David Ahern @ 2013-05-08 15:18 UTC (permalink / raw)
  To: acme, linux-kernel
  Cc: David Ahern, Ingo Molnar, Frederic Weisbecker, Peter Zijlstra,
	Jiri Olsa, Namhyung Kim, Stephane Eranian

Recovery algorithm in __perf_session__process_events attempts to remap
a perf.data file with a different file_offset and try again at a new head
position. Both of these adjustment rely on page_offset. If page_offset is
0 then file_offset and head never change which means the remap attempt is
the same and the fetch_mmaped_event is the same and the processing just
loops forever.

Detect this condition and warn the user.

Signed-off-by: David Ahern <dsahern@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
---
 tools/perf/util/session.c |    6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index cf1fe01..1c4dc45 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1235,6 +1235,12 @@ more:
 		}
 
 		page_offset = page_size * (head / page_size);
+		/* catch looping where we never make forward progress. */
+		if (page_offset == 0) {
+			pr_err("Loop detection processing events. Is file corrupted?\n");
+			return -1;
+		}
+
 		file_offset += page_offset;
 		head -= page_offset;
 		goto remap;
-- 
1.7.10.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] perf: detect loops processing events
  2013-05-08 15:18 [PATCH] perf: detect loops processing events David Ahern
@ 2013-05-09  9:24 ` Ingo Molnar
  2013-05-09  9:30   ` Ingo Molnar
  0 siblings, 1 reply; 5+ messages in thread
From: Ingo Molnar @ 2013-05-09  9:24 UTC (permalink / raw)
  To: David Ahern
  Cc: acme, linux-kernel, Frederic Weisbecker, Peter Zijlstra,
	Jiri Olsa, Namhyung Kim, Stephane Eranian


* David Ahern <dsahern@gmail.com> wrote:

> Recovery algorithm in __perf_session__process_events attempts to remap
> a perf.data file with a different file_offset and try again at a new head
> position. Both of these adjustment rely on page_offset. If page_offset is
> 0 then file_offset and head never change which means the remap attempt is
> the same and the fetch_mmaped_event is the same and the processing just
> loops forever.
> 
> Detect this condition and warn the user.
> 
> Signed-off-by: David Ahern <dsahern@gmail.com>
> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Jiri Olsa <jolsa@redhat.com>
> Cc: Namhyung Kim <namhyung@kernel.org>
> Cc: Stephane Eranian <eranian@google.com>
> ---
>  tools/perf/util/session.c |    6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> index cf1fe01..1c4dc45 100644
> --- a/tools/perf/util/session.c
> +++ b/tools/perf/util/session.c
> @@ -1235,6 +1235,12 @@ more:
>  		}
>  
>  		page_offset = page_size * (head / page_size);
> +		/* catch looping where we never make forward progress. */
> +		if (page_offset == 0) {
> +			pr_err("Loop detection processing events. Is file corrupted?\n");
> +			return -1;
> +		}
> +
>  		file_offset += page_offset;
>  		head -= page_offset;
>  		goto remap;

Ah, nice!

Btw., would it make sense to emit a (once-only) warning and optimistically 
fix page_offset up to 1 (or 4096) and let things continue with the next 
set of data - can we recover most of the data in that case?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] perf: detect loops processing events
  2013-05-09  9:24 ` Ingo Molnar
@ 2013-05-09  9:30   ` Ingo Molnar
  2013-05-10  1:10     ` Namhyung Kim
  0 siblings, 1 reply; 5+ messages in thread
From: Ingo Molnar @ 2013-05-09  9:30 UTC (permalink / raw)
  To: David Ahern
  Cc: acme, linux-kernel, Frederic Weisbecker, Peter Zijlstra,
	Jiri Olsa, Namhyung Kim, Stephane Eranian


* Ingo Molnar <mingo@kernel.org> wrote:

> 
> * David Ahern <dsahern@gmail.com> wrote:
> 
> > Recovery algorithm in __perf_session__process_events attempts to remap
> > a perf.data file with a different file_offset and try again at a new head
> > position. Both of these adjustment rely on page_offset. If page_offset is
> > 0 then file_offset and head never change which means the remap attempt is
> > the same and the fetch_mmaped_event is the same and the processing just
> > loops forever.
> > 
> > Detect this condition and warn the user.
> > 
> > Signed-off-by: David Ahern <dsahern@gmail.com>
> > Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
> > Cc: Ingo Molnar <mingo@kernel.org>
> > Cc: Frederic Weisbecker <fweisbec@gmail.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Jiri Olsa <jolsa@redhat.com>
> > Cc: Namhyung Kim <namhyung@kernel.org>
> > Cc: Stephane Eranian <eranian@google.com>
> > ---
> >  tools/perf/util/session.c |    6 ++++++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> > index cf1fe01..1c4dc45 100644
> > --- a/tools/perf/util/session.c
> > +++ b/tools/perf/util/session.c
> > @@ -1235,6 +1235,12 @@ more:
> >  		}
> >  
> >  		page_offset = page_size * (head / page_size);
> > +		/* catch looping where we never make forward progress. */
> > +		if (page_offset == 0) {
> > +			pr_err("Loop detection processing events. Is file corrupted?\n");
> > +			return -1;
> > +		}
> > +
> >  		file_offset += page_offset;
> >  		head -= page_offset;
> >  		goto remap;
> 
> Ah, nice!
> 
> Btw., would it make sense to emit a (once-only) warning and optimistically 
> fix page_offset up to 1 (or 4096) and let things continue with the next 
> set of data - can we recover most of the data in that case?

Basically, what'd like to do with binary data is similar to what we do 
with text data if some trace output is corrupted:

            sshd-15478 [002]  1100.859353:  15478:120:S ==> [002]     0:140:R <idle>
          <idle>-0     [005]  1100.859378:      0:140:R ==> [005] 20169:120:R cat
             cat-20169 [005]  1100.860718:  20169:120:R   + [005] 15521:120:S bash
             ^@¨<81><92>^@^Bª^P^P^D<9c>8@<88>^A^M ;¤0 ^D<90>"ª<81>^B^T)Ò^C$^@^N^@^A
             cat-20169 [005]  1100.860720:  20169:120:R   + [005]   305:115:S kblockd/5
             cat-20169 [005]  1100.860722:  20169:120:? ==> [005]   305:115:R kblockd/5
       kblockd/5-305   [005]  1100.860755:    305:115:S ==> [005] 15521:120:R bash
            bash-15521 [005]  1100.860792:  15521:120:S   + [002] 15478:120:S sshd
          <idle>-0     [002]  1100.860853:      0:140:R ==> [002] 15478:120:R sshd
            sshd-15478 [002]  1100.860895:  15478:120:S ==> [002]     0:140:R <idle>
            bash-15521 [005]  1100.860925:  15521:120:S   + [002] 15478:120:S sshd
            bash-15521 [005]  1100.860999:  15521:120:S ==> [005]     0:140:R <idle>

See that junk in the middle, sign of some sort of file corruption? Instead 
of detecting it and aborting we just try to skip that line and try to find 
the next useful looking line, ignoring the junk and bits around it.

Is there a perf.data equivalent of intelligently trying to skip to the 
next plausible looking event record?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] perf: detect loops processing events
  2013-05-09  9:30   ` Ingo Molnar
@ 2013-05-10  1:10     ` Namhyung Kim
  2013-05-10  1:40       ` David Ahern
  0 siblings, 1 reply; 5+ messages in thread
From: Namhyung Kim @ 2013-05-10  1:10 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: David Ahern, acme, linux-kernel, Frederic Weisbecker,
	Peter Zijlstra, Jiri Olsa, Stephane Eranian

Hi Ingo,

On Thu, 9 May 2013 11:30:25 +0200, Ingo Molnar wrote:
> * Ingo Molnar <mingo@kernel.org> wrote:
>> Btw., would it make sense to emit a (once-only) warning and optimistically 
>> fix page_offset up to 1 (or 4096) and let things continue with the next 
>> set of data - can we recover most of the data in that case?
>
> Basically, what'd like to do with binary data is similar to what we do 
> with text data if some trace output is corrupted:
>
>             sshd-15478 [002]  1100.859353:  15478:120:S ==> [002]     0:140:R <idle>
>           <idle>-0     [005]  1100.859378:      0:140:R ==> [005] 20169:120:R cat
>              cat-20169 [005]  1100.860718:  20169:120:R   + [005] 15521:120:S bash
>              ^@¨<81><92>^@^Bª^P^P^D<9c>8@<88>^A^M ;¤0 ^D<90>"ª<81>^B^T)Ò^C$^@^N^@^A
>              cat-20169 [005]  1100.860720:  20169:120:R   + [005]   305:115:S kblockd/5
>              cat-20169 [005]  1100.860722:  20169:120:? ==> [005]   305:115:R kblockd/5
>        kblockd/5-305   [005]  1100.860755:    305:115:S ==> [005] 15521:120:R bash
>             bash-15521 [005]  1100.860792:  15521:120:S   + [002] 15478:120:S sshd
>           <idle>-0     [002]  1100.860853:      0:140:R ==> [002] 15478:120:R sshd
>             sshd-15478 [002]  1100.860895:  15478:120:S ==> [002]     0:140:R <idle>
>             bash-15521 [005]  1100.860925:  15521:120:S   + [002] 15478:120:S sshd
>             bash-15521 [005]  1100.860999:  15521:120:S ==> [005]     0:140:R <idle>
>
> See that junk in the middle, sign of some sort of file corruption? Instead 
> of detecting it and aborting we just try to skip that line and try to find 
> the next useful looking line, ignoring the junk and bits around it.
>
> Is there a perf.data equivalent of intelligently trying to skip to the 
> next plausible looking event record?

I think we should not truncate file_size for this case.  It was
decreased to data_offset + data_size in order not to read unrelated
metadata (additional header feature info).  But in this case, since
data_size is 0 it'd have same value as data_offset, and in turn
mmap_size truncated to data_offset too.  So fetch_mmaped_event() always
return NULL as head + sizeof(event->header) exceeds mmap_size.

If we keep original file_size, perf can report existing samples but no
metadata.  So does the patch below make sense?


diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index cf1fe01b7e89..cf4e574c7b7f 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1196,7 +1196,7 @@ int __perf_session__process_events(struct perf_session *session,
        file_offset = page_offset;
        head = data_offset - page_offset;
 
-       if (data_offset + data_size < file_size)
+       if (data_size && (data_offset + data_size < file_size))
                file_size = data_offset + data_size;
 
        progress_next = file_size / 16;


-- 
Thanks,
Namhyung

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] perf: detect loops processing events
  2013-05-10  1:10     ` Namhyung Kim
@ 2013-05-10  1:40       ` David Ahern
  0 siblings, 0 replies; 5+ messages in thread
From: David Ahern @ 2013-05-10  1:40 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ingo Molnar, acme, linux-kernel, Frederic Weisbecker,
	Peter Zijlstra, Jiri Olsa, Stephane Eranian

On 5/9/13 7:10 PM, Namhyung Kim wrote:
>
> I think we should not truncate file_size for this case.  It was
> decreased to data_offset + data_size in order not to read unrelated
> metadata (additional header feature info).  But in this case, since
> data_size is 0 it'd have same value as data_offset, and in turn
> mmap_size truncated to data_offset too.  So fetch_mmaped_event() always
> return NULL as head + sizeof(event->header) exceeds mmap_size.
>
> If we keep original file_size, perf can report existing samples but no
> metadata.  So does the patch below make sense?
>
>
> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> index cf1fe01b7e89..cf4e574c7b7f 100644
> --- a/tools/perf/util/session.c
> +++ b/tools/perf/util/session.c
> @@ -1196,7 +1196,7 @@ int __perf_session__process_events(struct perf_session *session,
>          file_offset = page_offset;
>          head = data_offset - page_offset;
>
> -       if (data_offset + data_size < file_size)
> +       if (data_size && (data_offset + data_size < file_size))
>                  file_size = data_offset + data_size;
>
>          progress_next = file_size / 16;

Nice. That does handle the case of the perf.data file not getting closed 
properly. With this, my patch should not return -1 just print the error 
message to the user which would explain why the feature data is not printed.

David


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2013-05-10  1:40 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-05-08 15:18 [PATCH] perf: detect loops processing events David Ahern
2013-05-09  9:24 ` Ingo Molnar
2013-05-09  9:30   ` Ingo Molnar
2013-05-10  1:10     ` Namhyung Kim
2013-05-10  1:40       ` David Ahern

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox