From: Cyrill Gorcunov <gorcunov@gmail.com>
To: Don Zickus <dzickus@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] x86, perf, p4: Counter corruption when using lots of perf groups
Date: Thu, 30 Jan 2014 00:06:57 +0400
Message-ID: <20140129200657.GJ29846@moon>
In-Reply-To: <1391024270-19469-1-git-send-email-dzickus@redhat.com>
On Wed, Jan 29, 2014 at 02:37:50PM -0500, Don Zickus wrote:
> On a P4 box stressing perf with
>
> ./perf record -o perf.data ./perf stat -v ./perf bench all
>
> it was noticed that a slew of unknown NMIs would pop out rather quickly.
>
> Painfully debugging this ancient platform led me to notice cross-cpu counter
> corruption.
>
> The P4 machine is special in that it has 18 counters, half are used for cpu0
> and the other half is for cpu1 (or all 18 if hyperthreading is disabled). But
> the splitting of the counters has to be actively managed by the software.
>
> In this particular bug, one of the cpu0 specific counters was being used by
> cpu1 and caused all sorts of random unknown nmis.
>
> I am not entirely sure on the corruption path, but what happens is:
>
> o perf schedules a group with p4_pmu_schedule_events()
> o inside p4_pmu_schedule_events(), it notices an hwc pointer is being reused
> but for a different cpu, so it 'swaps' the config bits and returns the
> updated 'assign' array with a _new_ index.
> o perf schedules another group with p4_pmu_schedule_events()
> o inside p4_pmu_schedule_events(), it notices an hwc pointer is being reused
> (the same one as above) but for the _same_ cpu [BUG!!], so it updates the
> 'assign' array to use the _old_ (wrong cpu) index because the _new_ index is in
> an earlier part of the 'assign' array (and hasn't been committed yet).
> o perf commits the transaction using the wrong index and corrupts the other cpu

Thanks for the fix, Don! I'm afraid I won't be able to look at it closely
tonight; could it wait until tomorrow? (If it's critical, sure, a fix like
this should do the trick.)
Thread overview: 6+ messages
2014-01-29 19:37 [PATCH] x86, perf, p4: Counter corruption when using lots of perf groups Don Zickus
2014-01-29 20:06 ` Cyrill Gorcunov [this message]
2014-01-29 20:17 ` Don Zickus
2014-02-03 6:19 ` Cyrill Gorcunov
2014-02-03 16:35 ` Don Zickus
2014-02-10 13:29 ` [tip:perf/core] perf/x86/p4: Fix counter " tip-bot for Don Zickus