PEBS bug on HSW: "Unexpected number of pebs records 10" (was: Re: [GIT PULL] perf changes for v3.12)

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Ingo Molnar <mingo@kernel.org>
To: eranian@gmail.com
Cc: Peter Zijlstra <peterz@infradead.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Arnaldo Carvalho de Melo <acme@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Andi Kleen <andi@firstfloor.org>
Subject: PEBS bug on HSW: "Unexpected number of pebs records 10" (was: Re: [GIT PULL] perf changes for v3.12)
Date: Tue, 10 Sep 2013 13:53:06 +0200	[thread overview]
Message-ID: <20130910115306.GA6091@gmail.com> (raw)
In-Reply-To: <CAMsRxfLEO15kKrbmtKKXuW-JTtCCgiuXS6wFs9kiLmG1wge24A@mail.gmail.com>


* Stephane Eranian <eranian@googlemail.com> wrote:

> Hi,
> 
> 
> And what was the perf record command line for this crash?

AFAICS it wasn't a crash but the WARN_ON() in intel_pmu_drain_pebs_hsw(), 
at arch/x86/kernel/cpu/perf_event_intel_ds.c:1003.

        at  = (struct pebs_record_hsw *)(unsigned long)ds->pebs_buffer_base;
        top = (struct pebs_record_hsw *)(unsigned long)ds->pebs_index;

        n = top - at;
        if (n <= 0)
                return;

        /*
         * Should not happen, we program the threshold at 1 and do not
         * set a reset value.
         */
        WARN_ONCE(n > x86_pmu.max_pebs_events,
                  "Unexpected number of pebs records %d\n", n);

The command line Linus used was probably close to:

   perf record -e cycles:pp -g make -j64 bzImage

i.e. PEBS precise profiling, call chains, LBR is used to figure out the 
real instruction, but no '-a' per CPU profiling option, i.e. high 
frequency per task PMU context switching.

Note that AFAIK neither the kernel nor user-space used any TSX extensions, 
so this is the Haswell PMU in pure compatibility mode.

My (wild) guess is that unless all of us missed some subtle race in the 
PEBS code it's an (unknown?) erratum: the hardware got confused by the 
high frequency PMU switches, in this particular case where we got a new 
PMI right after a very short interval was programmed:

>>  Call Trace:
>>   <NMI>  [<ffffffff815fc637>] dump_stack+0x45/0x56
>>   [<ffffffff81051e78>] warn_slowpath_common+0x78/0xa0
>>   [<ffffffff81051ee7>] warn_slowpath_fmt+0x47/0x50
>>   [<ffffffff8101b051>] intel_pmu_drain_pebs_hsw+0x91/0xa0
>>   [<ffffffff8101c5d0>] intel_pmu_handle_irq+0x210/0x390
>>   [<ffffffff81604deb>] perf_event_nmi_handler+0x2b/0x50
>>   [<ffffffff81604670>] nmi_handle.isra.3+0x80/0x180
>>   [<ffffffff81604840>] do_nmi+0xd0/0x310
>>   [<ffffffff81603d37>] end_repeat_nmi+0x1e/0x2e
>>   <<EOE>>  [<ffffffff810167df>] perf_events_lapic_init+0x2f/0x40
>>   [<ffffffff81016a50>] x86_pmu_enable+0x260/0x310
>>   [<ffffffff81111d87>] perf_pmu_enable+0x27/0x30
>>   [<ffffffff81112140>] perf_event_context_sched_in+0x80/0xc0
>>   [<ffffffff811127eb>] __perf_event_task_sched_in+0x16b/0x180
>>   [<ffffffff8107c300>] finish_task_switch+0x70/0xa0
>>   [<ffffffff81600f48>] __schedule+0x368/0x7c0
>>   [<ffffffff816013c4>] schedule+0x24/0x70

Note that due to per task profiling the default (long, about 1 KHz) 
interval can get chopped up and can result in a very small period value 
being reprogrammed at PMU-sched-in time.

That kind of high-freq back-to-back activity could, in theory, confuse the 
PEBS hardware. Or the kernel :-)

Thanks,

	Ingo

next prev parent reply	other threads:[~2013-09-10 11:53 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-09-03 13:29 [GIT PULL] perf changes for v3.12 Ingo Molnar
2013-09-03 13:37 ` Arnaldo Carvalho de Melo
2013-09-03 13:43   ` Ingo Molnar
2013-09-03 17:02 ` Vince Weaver
2013-09-04 17:53 ` Linus Torvalds
2013-09-05 10:56   ` Ingo Molnar
2013-09-05 12:42     ` Frederic Weisbecker
2013-09-05 12:51       ` Ingo Molnar
2013-09-05 12:58         ` Frederic Weisbecker
2013-09-10  8:06       ` Namhyung Kim
2013-09-10 11:18         ` Frederic Weisbecker
2013-09-05 13:38 ` Ingo Molnar
2013-09-08  2:17 ` Linus Torvalds
2013-09-09 10:05   ` Peter Zijlstra
2013-09-10 11:28     ` Stephane Eranian
2013-09-10 11:53       ` Ingo Molnar [this message]
2013-09-10 12:32         ` PEBS bug on HSW: "Unexpected number of pebs records 10" (was: Re: [GIT PULL] perf changes for v3.12) Stephane Eranian
2013-09-10 12:42           ` Ramkumar Ramachandra
2013-09-10 12:51           ` Ramkumar Ramachandra
2013-09-10 12:55             ` Stephane Eranian
2013-09-10 13:22               ` Ingo Molnar
2013-09-10 13:38           ` Ingo Molnar
2013-09-10 14:15             ` Stephane Eranian
2013-09-10 14:29               ` Ingo Molnar
2013-09-10 14:34                 ` Stephane Eranian
2013-09-10 17:14                   ` Ingo Molnar
2013-09-16 11:07                     ` Stephane Eranian
2013-09-16 15:41                       ` Ingo Molnar
2013-09-16 16:29                         ` Peter Zijlstra
2013-09-17  7:00                           ` Ingo Molnar
2013-09-23 15:25                           ` Stephane Eranian
2013-09-23 15:33                             ` Peter Zijlstra
2013-09-23 17:11                               ` Stephane Eranian
2013-09-23 17:24                                 ` Peter Zijlstra
2013-09-10 15:28               ` Peter Zijlstra
2013-09-10 16:14                 ` Stephane Eranian

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130910115306.GA6091@gmail.com \
    --to=mingo@kernel.org \
    --cc=acme@infradead.org \
    --cc=andi@firstfloor.org \
    --cc=eranian@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox