All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Liang, Kan" <kan.liang@linux.intel.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: mingo@redhat.com, linux-kernel@vger.kernel.org,
	irogers@google.com, eranian@google.com, ak@linux.intel.com
Subject: Re: [PATCH] perf/core: Fix endless multiplex timer
Date: Wed, 4 Mar 2020 09:20:42 -0500	[thread overview]
Message-ID: <6a271c19-9575-24ca-8ebc-9ff5a65bbe3d@linux.intel.com> (raw)
In-Reply-To: <20200304093344.GJ2596@hirez.programming.kicks-ass.net>



On 3/4/2020 4:33 AM, Peter Zijlstra wrote:
> On Tue, Mar 03, 2020 at 08:40:10PM -0500, Liang, Kan wrote:
>>> I'm thinking this is wrong.
>>>
>>> That is, yes, this fixes the observed problem, but it also misses at
>>> least one other site. Which seems to suggest we ought to take a
>>> different approach.
>>>
>>> But even with that; I wonder if the actual condition isn't wrong.
>>> Suppose the event was exclusive, and other events weren't scheduled
>>> because of that. Then you disable the one exclusive event _and_ kill
>>> rotation, so then nothing else will ever get on.
>>>
>>> So what I think was supposed to happen is rotation killing itself;
>>> rotation will schedule out the context -- which will clear the flag, and
>>> then schedule the thing back in -- which will set the flag again when
>>> needed.
>>>
>>> Now, that isn't happening, and I think I see why, because when we drop
>>> to !nr_active, we terminate ctx_sched_out() before we get to clearing
>>> the flag, oops!
>>>
>>> So how about something like this?
>>>
>>> ---
>>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>>> index e453589da97c..7947bd3271a9 100644
>>> --- a/kernel/events/core.c
>>> +++ b/kernel/events/core.c
>>> @@ -2182,6 +2182,7 @@ __perf_remove_from_context(struct perf_event *event,
>>>    	if (!ctx->nr_events && ctx->is_active) {
>>>    		ctx->is_active = 0;
>>> +		ctx->rotate_necessary = 0;
>>>    		if (ctx->task) {
>>>    			WARN_ON_ONCE(cpuctx->task_ctx != ctx);
>>>    			cpuctx->task_ctx = NULL;
>>
>>
>> The patch can fix the observed problem with uncore PMU.
>> But it cannot fix all the cases with core PMU, especially when NMI watchdog
>> is enabled.
>> Because the ctx->nr_events never be 0 with NMI watchdog enabled.
> 
> But, I'm confused.. why do we care about nr_events==0 ? The below: vvvv
> 
>>> @@ -3074,15 +3075,15 @@ static void ctx_sched_out(struct perf_event_context *ctx,
>>>    	is_active ^= ctx->is_active; /* changed bits */
>>> -	if (!ctx->nr_active || !(is_active & EVENT_ALL))
>>> -		return;
>>> -
>>>    	/*
>>>    	 * If we had been multiplexing, no rotations are necessary, now no events
>>>    	 * are active.
>>>    	 */
>>>    	ctx->rotate_necessary = 0;
>>> +	if (!ctx->nr_active || !(is_active & EVENT_ALL))
>>> +		return;
>>> +
>>>    	perf_pmu_disable(ctx->pmu);
>>>    	if (is_active & EVENT_PINNED) {
>>>    		list_for_each_entry_safe(event, tmp, &ctx->pinned_active, active_list)
> 
> Makes sure we clear the flag when we ctx_sched_out(), and as long as
> ctx->rotate_necessary is set, perf_rotate_context() will do exactly
> that.
>

NMI watchdog is pinned event.
ctx_event_to_rotate() will only pick an event from the flexible_groups.
So the cpu_ctx_sched_out() in perf_rotate_context() will never be called.


Thanks,
Kan


> Then ctx_sched_in() will re-set the flag if it failed to schedule a
> counter.
> 
> So where is that going wrong?


  reply	other threads:[~2020-03-04 14:20 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-03 20:28 [PATCH] perf/core: Fix endless multiplex timer kan.liang
2020-03-03 21:08 ` Peter Zijlstra
2020-03-04  1:40   ` Liang, Kan
2020-03-04  9:33     ` Peter Zijlstra
2020-03-04 14:20       ` Liang, Kan [this message]
2020-03-05 12:38         ` Peter Zijlstra
2020-03-05 17:56           ` Liang, Kan
2020-03-20 12:58           ` [tip: perf/core] " tip-bot2 for Peter Zijlstra
2020-08-06 18:11             ` Robin Murphy
2020-08-06 18:53               ` Greg KH
2020-08-06 20:40                 ` Robin Murphy
2020-03-24  6:00 ` [perf/core] 92b1f046a2: BUG:kernel_NULL_pointer_dereference, address kernel test robot
2020-03-24  6:00   ` [perf/core] 92b1f046a2: BUG:kernel_NULL_pointer_dereference,address kernel test robot
2020-03-24 12:52   ` [perf/core] 92b1f046a2: BUG:kernel_NULL_pointer_dereference, address Liang, Kan
2020-03-24 12:52     ` [perf/core] 92b1f046a2: BUG:kernel_NULL_pointer_dereference,address Liang, Kan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6a271c19-9575-24ca-8ebc-9ff5a65bbe3d@linux.intel.com \
    --to=kan.liang@linux.intel.com \
    --cc=ak@linux.intel.com \
    --cc=eranian@google.com \
    --cc=irogers@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.