All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Liang, Kan" <kan.liang@linux.intel.com>
To: Matthieu Baerts <matttbe@kernel.org>
Cc: linux-kernel@vger.kernel.org, irogers@google.com,
	peterz@infradead.org, mingo@redhat.com, acme@kernel.org,
	namhyung@kernel.org, linux-perf-users@vger.kernel.org
Subject: Re: [PATCH 1/7] perf: Generic hotplug support for a PMU with a scope
Date: Wed, 23 Oct 2024 13:46:43 -0400	[thread overview]
Message-ID: <eb552133-829d-4935-87e9-101e052fd40c@linux.intel.com> (raw)
In-Reply-To: <f856d105-5463-4b8b-8715-0e6871165616@kernel.org>



On 2024-10-23 1:09 p.m., Matthieu Baerts wrote:
> Hi Kan Liang,
> 
> (+ cc Perf ML)
> 
> On 02/08/2024 17:16, kan.liang@linux.intel.com wrote:
>> From: Kan Liang <kan.liang@linux.intel.com>
>>
>> The perf subsystem assumes that the counters of a PMU are per-CPU. So
>> the user space tool reads a counter from each CPU in the system wide
>> mode. However, many PMUs don't have a per-CPU counter. The counter is
>> effective for a scope, e.g., a die or a socket. To address this, a
>> cpumask is exposed by the kernel driver to restrict to one CPU to stand
>> for a specific scope. In case the given CPU is removed,
>> the hotplug support has to be implemented for each such driver.
>>
>> The codes to support the cpumask and hotplug are very similar.
>> - Expose a cpumask into sysfs
>> - Pickup another CPU in the same scope if the given CPU is removed.
>> - Invoke the perf_pmu_migrate_context() to migrate to a new CPU.
>> - In event init, always set the CPU in the cpumask to event->cpu
>>
>> Similar duplicated codes are implemented for each such PMU driver. It
>> would be good to introduce a generic infrastructure to avoid such
>> duplication.
>>
>> 5 popular scopes are implemented here, core, die, cluster, pkg, and
>> the system-wide. The scope can be set when a PMU is registered. If so, a
>> "cpumask" is automatically exposed for the PMU.
>>
>> The "cpumask" is from the perf_online_<scope>_mask, which is to track
>> the active CPU for each scope. They are set when the first CPU of the
>> scope is online via the generic perf hotplug support. When a
>> corresponding CPU is removed, the perf_online_<scope>_mask is updated
>> accordingly and the PMU will be moved to a new CPU from the same scope
>> if possible.
> 
> Thank you for the patch.
> 
> It looks like this modification causes the following warning on my side
> when shutting down a VM running a kernel built with a debug config
> including CONFIG_PROVE_RCU_LIST=y (and CONFIG_RCU_EXPERT=y):
> 
> 
>> # /usr/lib/klibc/bin/poweroff
>>
>> =============================
>> WARNING: suspicious RCU usage
>> 6.12.0-rc3+ #3 Not tainted
>> -----------------------------
>> kernel/events/core.c:13962 RCU-list traversed in non-reader section!!
>>
>> other info that might help us debug this:
>>
>>
>> rcu_scheduler_active = 2, debug_locks = 1
>> 3 locks held by poweroff/11748:
>>  #0: ffffffff9b441e28 (system_transition_mutex){+.+.}-{3:3}, at: __do_sys_reboot (kernel/reboot.c:770)
>>  #1: ffffffff9b43eab0 ((reboot_notifier_list).rwsem){++++}-{3:3}, at: blocking_notifier_call_chain (kernel/notifier.c:388)
>>  #2: ffffffff9b6d06c8 (pmus_lock){+.+.}-{3:3}, at: perf_event_exit_cpu_context (kernel/events/core.c:13983)
>>
>> stack backtrace:
>> CPU: 0 UID: 0 PID: 11748 Comm: poweroff Not tainted 6.12.0-rc3+ #3
>> Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
>> Call Trace:
>>  <TASK>
>>  dump_stack_lvl (lib/dump_stack.c:123)
>>  lockdep_rcu_suspicious (kernel/locking/lockdep.c:6822)
>>  perf_event_clear_cpumask (kernel/events/core.c:13962 (discriminator 9))
>>  ? __pfx_perf_event_clear_cpumask (kernel/events/core.c:13939)
>>  ? acpi_execute_simple_method (drivers/acpi/utils.c:679)
>>  ? __pfx_acpi_execute_simple_method (drivers/acpi/utils.c:679)
>>  ? md_notify_reboot (drivers/md/md.c:9860)
>>  perf_event_exit_cpu_context (kernel/events/core.c:13984 (discriminator 1))
>>  perf_reboot (kernel/events/core.c:14066 (discriminator 3))
>>  ? trace_notifier_run (include/trace/events/notifier.h:59 (discriminator 2))
>>  notifier_call_chain (kernel/notifier.c:93)
>>  blocking_notifier_call_chain (kernel/notifier.c:389)
>>  kernel_power_off (kernel/reboot.c:294)
>>  __do_sys_reboot (kernel/reboot.c:771)
>>  ? __pfx___do_sys_reboot (kernel/reboot.c:717)
>>  ? __pfx_ksys_sync (fs/sync.c:98)
>>  do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1))
>>  entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
> 
> 
> It is very easy for me to reproduce it: simply by stopping the VM. Just
> in case, here are the steps I used to have the same VM:
> 
>   $ cd [kernel source code]
>   $ echo true > .virtme-exec-run
>   $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
>         --pull always mptcp/mptcp-upstream-virtme-docker:latest \
>         auto-debug -e RCU_EXPERT -e PROVE_RCU_LIST
> 
> 
> I have one question below about the modification you did here.
> 
> (...)
> 
>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>> index aa3450bdc227..5e1877c4cb4c 100644
>> --- a/kernel/events/core.c
>> +++ b/kernel/events/core.c
> 
> (...)
> 
>> @@ -13730,6 +13816,40 @@ static void __perf_event_exit_context(void *__info)
>>  	raw_spin_unlock(&ctx->lock);
>>  }
>>  
>> +static void perf_event_clear_cpumask(unsigned int cpu)
>> +{
>> +	int target[PERF_PMU_MAX_SCOPE];
>> +	unsigned int scope;
>> +	struct pmu *pmu;
>> +
>> +	cpumask_clear_cpu(cpu, perf_online_mask);
>> +
>> +	for (scope = PERF_PMU_SCOPE_NONE + 1; scope < PERF_PMU_MAX_SCOPE; scope++) {
>> +		const struct cpumask *cpumask = perf_scope_cpu_topology_cpumask(scope, cpu);
>> +		struct cpumask *pmu_cpumask = perf_scope_cpumask(scope);
>> +
>> +		target[scope] = -1;
>> +		if (WARN_ON_ONCE(!pmu_cpumask || !cpumask))
>> +			continue;
>> +
>> +		if (!cpumask_test_and_clear_cpu(cpu, pmu_cpumask))
>> +			continue;
>> +		target[scope] = cpumask_any_but(cpumask, cpu);
>> +		if (target[scope] < nr_cpu_ids)
>> +			cpumask_set_cpu(target[scope], pmu_cpumask);
>> +	}
>> +
>> +	/* migrate */
>> +	list_for_each_entry_rcu(pmu, &pmus, entry, lockdep_is_held(&pmus_srcu)) {
> 
> 
> Here is the line causing the warning, because rcu_read_lock() is not
> used before.
> 
> I don't know this code, but I guess you are not only doing read
> operations here if you are migrating something, right? This operation is
> done under the "pmus_lock", maybe the "_rcu" variant is not needed here?
> 
> So just using this instead is maybe enough?
> 
>   list_for_each_entry(pmu, &pmus, entry) {

Yes, it's good enough. A patch has been proposed, but haven't been
merged yet.

https://lore.kernel.org/lkml/20240913162340.2142976-1-kan.liang@linux.intel.com/

Thanks,
Kan
> 
> 
>> +		if (pmu->scope == PERF_PMU_SCOPE_NONE ||
>> +		    WARN_ON_ONCE(pmu->scope >= PERF_PMU_MAX_SCOPE))
>> +			continue;
>> +
>> +		if (target[pmu->scope] >= 0 && target[pmu->scope] < nr_cpu_ids)
>> +			perf_pmu_migrate_context(pmu, cpu, target[pmu->scope]);
>> +	}
>> +}
>> +
>>  static void perf_event_exit_cpu_context(int cpu)
>>  {
>>  	struct perf_cpu_context *cpuctx;
>> @@ -13737,6 +13857,11 @@ static void perf_event_exit_cpu_context(int cpu)
>>  
>>  	// XXX simplify cpuctx->online
>>  	mutex_lock(&pmus_lock);
>> +	/*
>> +	 * Clear the cpumasks, and migrate to other CPUs if possible.
>> +	 * Must be invoked before the __perf_event_exit_context.
>> +	 */
>> +	perf_event_clear_cpumask(cpu);
>>  	cpuctx = per_cpu_ptr(&perf_cpu_context, cpu);
>>  	ctx = &cpuctx->ctx;
>>  
>> @@ -13744,7 +13869,6 @@ static void perf_event_exit_cpu_context(int cpu)
>>  	smp_call_function_single(cpu, __perf_event_exit_context, ctx, 1);
>>  	cpuctx->online = 0;
>>  	mutex_unlock(&ctx->mutex);
>> -	cpumask_clear_cpu(cpu, perf_online_mask);
>>  	mutex_unlock(&pmus_lock);
>>  }
>>  #else
> (...)
> 
> Cheers,
> Matt


  reply	other threads:[~2024-10-23 17:46 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-02 15:16 [PATCH 0/7] Generic hotplug support for a PMU with a scope kan.liang
2024-08-02 15:16 ` [PATCH 1/7] perf: " kan.liang
2024-09-10  9:59   ` [tip: perf/core] " tip-bot2 for Kan Liang
2024-09-12 10:12     ` Steven Price
2024-09-12 14:53       ` Liang, Kan
2024-09-19 15:43   ` [PATCH 1/7] " Guenter Roeck
2024-09-19 16:28     ` Liang, Kan
2024-10-23 17:09   ` Matthieu Baerts
2024-10-23 17:46     ` Liang, Kan [this message]
2024-10-23 18:19       ` Peter Zijlstra
2024-08-02 15:16 ` [PATCH 2/7] perf: Add PERF_EV_CAP_READ_SCOPE kan.liang
2024-09-06 15:11   ` Peter Zijlstra
2024-09-06 15:26     ` Liang, Kan
2024-09-10  9:59   ` [tip: perf/core] " tip-bot2 for Kan Liang
2024-08-02 15:16 ` [PATCH 3/7] perf/x86/intel/cstate: Clean up cpumask and hotplug kan.liang
2024-09-10  9:59   ` [tip: perf/core] " tip-bot2 for Kan Liang
2024-08-02 15:16 ` [PATCH 4/7] iommu/vt-d: Clean up cpumask and hotplug for perfmon kan.liang
2024-09-10  9:59   ` [tip: perf/core] " tip-bot2 for Kan Liang
2024-08-02 15:16 ` [PATCH 5/7] dmaengine: idxd: " kan.liang
2024-08-05 15:40   ` Dave Jiang
2024-08-05 17:46   ` Fenghua Yu
2024-09-10  9:59   ` [tip: perf/core] " tip-bot2 for Kan Liang
2024-08-02 15:16 ` [PATCH 6/7] perf/x86/rapl: Move the pmu allocation out of CPU hotplug kan.liang
2024-09-09  9:26   ` Dhananjay Ugwekar
2024-09-09 13:02     ` Liang, Kan
2024-09-09 13:24       ` Peter Zijlstra
2024-09-09 17:11         ` Liang, Kan
2024-09-10  9:59   ` [tip: perf/core] " tip-bot2 for Kan Liang
2024-08-02 15:16 ` [PATCH 7/7] perf/x86/rapl: Clean up cpumask and hotplug kan.liang
2024-09-10  9:59   ` [tip: perf/core] " tip-bot2 for Kan Liang
2024-09-04 12:44 ` [PATCH 0/7] Generic hotplug support for a PMU with a scope Liang, Kan
2024-09-06 15:17   ` Peter Zijlstra
2024-09-06 15:30     ` Liang, Kan
2024-09-06 15:12 ` Peter Zijlstra
2024-09-06 15:30   ` Liang, Kan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=eb552133-829d-4935-87e9-101e052fd40c@linux.intel.com \
    --to=kan.liang@linux.intel.com \
    --cc=acme@kernel.org \
    --cc=irogers@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=matttbe@kernel.org \
    --cc=mingo@redhat.com \
    --cc=namhyung@kernel.org \
    --cc=peterz@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.