From: Peter Zijlstra <peterz@infradead.org>
To: Zefan Li <lizefan@huawei.com>
Cc: Thomas Gleixner <tglx@linutronix.de>,
"H. Peter Anvin" <hpa@zytor.com>,
mingo@kernel.org, Matt Fleming <matt.fleming@intel.com>,
vikas.shivappa@linux.intel.com,
LKML <linux-kernel@vger.kernel.org>,
davidcc@google.com
Subject: Re: [PATCH] perf/x86/intel/cqm: Make sure the head event of cache_groups always has valid RMID
Date: Tue, 16 May 2017 16:38:27 +0200 [thread overview]
Message-ID: <20170516143827.GN4626@worktop.programming.kicks-ass.net> (raw)
In-Reply-To: <590A928F.8020201@huawei.com>
On Thu, May 04, 2017 at 10:31:43AM +0800, Zefan Li wrote:
> It is assumed that the head of cache_groups always has valid RMID,
> which isn't true.
>
> When we deallocate RMID from conflicting events currently we don't
> move them to the tail, and one of those events can happen to be in
> the head. Another case is we allocate RMIDs for all the events except
> the head event in intel_cqm_sched_in_event().
>
> Besides there's another bug that we retry rotating without resetting
> nr_needed and start in __intel_cqm_rmid_rotate().
>
> Those bugs combined together led to the following oops.
>
> WARNING: at arch/x86/kernel/cpu/perf_event_intel_cqm.c:186 __put_rmid+0x28/0x80()
> ...
> [<ffffffff8103a578>] __put_rmid+0x28/0x80
> [<ffffffff8103a74a>] intel_cqm_rmid_rotate+0xba/0x440
> [<ffffffff8109d8cb>] process_one_work+0x17b/0x470
> [<ffffffff8109e69b>] worker_thread+0x11b/0x400
> ...
> BUG: unable to handle kernel NULL pointer dereference at (null)
> ...
> [<ffffffff8103a74a>] intel_cqm_rmid_rotate+0xba/0x440
> [<ffffffff8109d8cb>] process_one_work+0x17b/0x470
> [<ffffffff8109e69b>] worker_thread+0x11b/0x400
I've managed to forgot most if not all of that horror show. Vikas and
David seem to be working on a replacement, but until such a time it
would be good if this thing would not crash the kernel.
Guys, could you have a look? To me it appears to mostly have the right
shape, but like I said, I forgot most details...
>
> Cc: stable@vger.kernel.org
> Signed-off-by: Zefan Li <lizefan@huawei.com>
> ---
> arch/x86/events/intel/cqm.c | 19 ++++++++++++++++---
> 1 file changed, 16 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/events/intel/cqm.c b/arch/x86/events/intel/cqm.c
> index 8c00dc0..c06a5ba 100644
> --- a/arch/x86/events/intel/cqm.c
> +++ b/arch/x86/events/intel/cqm.c
> @@ -553,8 +553,13 @@ static bool intel_cqm_sched_in_event(u32 rmid)
>
> leader = list_first_entry(&cache_groups, struct perf_event,
> hw.cqm_groups_entry);
> - event = leader;
>
> + if (!list_empty(&cache_groups) && !__rmid_valid(leader->hw.cqm_rmid)) {
> + intel_cqm_xchg_rmid(leader, rmid);
> + return true;
> + }
> +
> + event = leader;
> list_for_each_entry_continue(event, &cache_groups,
> hw.cqm_groups_entry) {
> if (__rmid_valid(event->hw.cqm_rmid))
> @@ -721,6 +726,7 @@ static void intel_cqm_sched_out_conflicting_events(struct perf_event *event)
> {
> struct perf_event *group, *g;
> u32 rmid;
> + LIST_HEAD(conflicting_groups);
>
> lockdep_assert_held(&cache_mutex);
>
> @@ -744,7 +750,11 @@ static void intel_cqm_sched_out_conflicting_events(struct perf_event *event)
>
> intel_cqm_xchg_rmid(group, INVALID_RMID);
> __put_rmid(rmid);
> + list_move_tail(&group->hw.cqm_groups_entry,
> + &conflicting_groups);
> }
> +
> + list_splice_tail(&conflicting_groups, &cache_groups);
> }
>
> /*
> @@ -773,9 +783,9 @@ static void intel_cqm_sched_out_conflicting_events(struct perf_event *event)
> */
> static bool __intel_cqm_rmid_rotate(void)
> {
> - struct perf_event *group, *start = NULL;
> + struct perf_event *group, *start;
> unsigned int threshold_limit;
> - unsigned int nr_needed = 0;
> + unsigned int nr_needed;
> unsigned int nr_available;
> bool rotated = false;
>
> @@ -789,6 +799,9 @@ static bool __intel_cqm_rmid_rotate(void)
> if (list_empty(&cache_groups) && list_empty(&cqm_rmid_limbo_lru))
> goto out;
>
> + nr_needed = 0;
> + start = NULL;
> +
> list_for_each_entry(group, &cache_groups, hw.cqm_groups_entry) {
> if (!__rmid_valid(group->hw.cqm_rmid)) {
> if (!start)
> --
> 1.8.2.2
>
next prev parent reply other threads:[~2017-05-16 14:38 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-05-04 2:31 [PATCH] perf/x86/intel/cqm: Make sure the head event of cache_groups always has valid RMID Zefan Li
2017-05-16 1:40 ` Zefan Li
2017-05-16 14:38 ` Peter Zijlstra [this message]
2017-05-18 4:59 ` David Carrillo-Cisneros
2017-05-22 17:06 ` Shivappa Vikas
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170516143827.GN4626@worktop.programming.kicks-ass.net \
--to=peterz@infradead.org \
--cc=davidcc@google.com \
--cc=hpa@zytor.com \
--cc=linux-kernel@vger.kernel.org \
--cc=lizefan@huawei.com \
--cc=matt.fleming@intel.com \
--cc=mingo@kernel.org \
--cc=tglx@linutronix.de \
--cc=vikas.shivappa@linux.intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox