linux-perf-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Liang, Kan" <kan.liang@linux.intel.com>
To: Ian Rogers <irogers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Namhyung Kim <namhyung@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Jiri Olsa <jolsa@kernel.org>,
	Adrian Hunter <adrian.hunter@intel.com>,
	Bjorn Helgaas <bhelgaas@google.com>,
	Jonathan Corbet <corbet@lwn.net>,
	James Clark <james.clark@arm.com>,
	Ravi Bangoria <ravi.bangoria@amd.com>,
	Dominique Martinet <asmadeus@codewreck.org>,
	linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
	Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>,
	ananth.narayan@amd.com, gautham.shenoy@amd.com,
	kprateek.nayak@amd.com, sandipan.das@amd.com
Subject: Re: [PATCH v2 6/6] perf parse-events: Add "cpu" term to set the CPU an event is recorded on
Date: Fri, 19 Jul 2024 12:42:33 -0400	[thread overview]
Message-ID: <ee3c8e75-2463-49f8-b1ba-5a68db0bae39@linux.intel.com> (raw)
In-Reply-To: <CAP-5=fUvA5Lmk16Zs1B6bj1OLHf9NaHhAzx7yZL9S7JGwmO0uw@mail.gmail.com>



On 2024-07-19 11:01 a.m., Ian Rogers wrote:
> On Fri, Jul 19, 2024 at 7:14 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
>>
>>
>>
>> On 2024-07-18 5:06 p.m., Ian Rogers wrote:
>>> On Thu, Jul 18, 2024 at 11:03 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
>>>>
>>>>
>>>>
>>>> On 2024-07-18 11:12 a.m., Ian Rogers wrote:
>>>>> On Thu, Jul 18, 2024 at 7:41 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 2024-07-17 8:30 p.m., Ian Rogers wrote:
>>>>>>> The -C option allows the CPUs for a list of events to be specified but
>>>>>>> its not possible to set the CPU for a single event. Add a term to
>>>>>>> allow this. The term isn't a general CPU list due to ',' already being
>>>>>>> a special character in event parsing instead multiple cpu= terms may
>>>>>>> be provided and they will be merged/unioned together.
>>>>>>>
>>>>>>> An example of mixing different types of events counted on different CPUs:
>>>>>>> ```
>>>>>>> $ perf stat -A -C 0,4-5,8 -e "instructions/cpu=0/,l1d-misses/cpu=4,cpu=5/,inst_retired.any/cpu=8/,cycles" -a sleep 0.1
>>>>>>>
>>>>>>>  Performance counter stats for 'system wide':
>>>>>>>
>>>>>>> CPU0              368,647      instructions/cpu=0/              #    0.26  insn per cycle
>>>>>>> CPU4        <not counted>      instructions/cpu=0/
>>>>>>> CPU5        <not counted>      instructions/cpu=0/
>>>>>>> CPU8        <not counted>      instructions/cpu=0/
>>>>>>> CPU0        <not counted>      l1d-misses [cpu]
>>>>>>> CPU4              203,377      l1d-misses [cpu]
>>>>>>> CPU5              138,231      l1d-misses [cpu]
>>>>>>> CPU8        <not counted>      l1d-misses [cpu]
>>>>>>> CPU0        <not counted>      cpu/cpu=8/
>>>>>>> CPU4        <not counted>      cpu/cpu=8/
>>>>>>> CPU5        <not counted>      cpu/cpu=8/
>>>>>>> CPU8              943,861      cpu/cpu=8/
>>>>>>> CPU0            1,412,071      cycles
>>>>>>> CPU4           20,362,900      cycles
>>>>>>> CPU5           10,172,725      cycles
>>>>>>> CPU8            2,406,081      cycles
>>>>>>>
>>>>>>>        0.102925309 seconds time elapsed
>>>>>>> ```
>>>>>>>
>>>>>>> Note, the event name of inst_retired.any is missing, reported as
>>>>>>> cpu/cpu=8/, as there are unmerged uniquify fixes:
>>>>>>> https://lore.kernel.org/lkml/20240510053705.2462258-3-irogers@google.com/
>>>>>>>
>>>>>>> An example of spreading uncore overhead across two CPUs:
>>>>>>> ```
>>>>>>> $ perf stat -A -e "data_read/cpu=0/,data_write/cpu=1/" -a sleep 0.1
>>>>>>>
>>>>>>>  Performance counter stats for 'system wide':
>>>>>>>
>>>>>>> CPU0               223.65 MiB  uncore_imc_free_running_0/cpu=0/
>>>>>>> CPU0               223.66 MiB  uncore_imc_free_running_1/cpu=0/
>>>>>>> CPU0        <not counted> MiB  uncore_imc_free_running_0/cpu=1/
>>>>>>> CPU1                 5.78 MiB  uncore_imc_free_running_0/cpu=1/
>>>>>>> CPU0        <not counted> MiB  uncore_imc_free_running_1/cpu=1/
>>>>>>> CPU1                 5.74 MiB  uncore_imc_free_running_1/cpu=1/
>>>>>>> ```
>>>>>>>
>>>>>>> Manually fixing the output it should be:
>>>>>>> ```
>>>>>>> CPU0               223.65 MiB  uncore_imc_free_running_0/data_read,cpu=0/
>>>>>>> CPU0               223.66 MiB  uncore_imc_free_running_1/data_read,cpu=0/
>>>>>>> CPU1                 5.78 MiB  uncore_imc_free_running_0/data_write,cpu=1/
>>>>>>> CPU1                 5.74 MiB  uncore_imc_free_running_1/data_write,cpu=1/
>>>>>>> ```
>>>>>>>
>>>>>>> That is data_read from 2 PMUs was counted on CPU0 and data_write was
>>>>>>> counted on CPU1.
>>>>>>
>>>>>> There was an effort to make the counter access from any CPU of the package.
>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d6a2f9035bfc27d0e9d78b13635dda9fb017ac01
>>>>>>
>>>>>> But now it limits the access from specific CPUs. It sounds like a
>>>>>> regression.
>>>>>
>>>>> Thanks Kan, I'm not sure I understand the comment.
>>>>
>>>> The flag is also applied for the uncore and RAPL.
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/x86/events/intel/uncore.c?&id=e64cd6f73ff5a7eb4f8f759049ee24a3fe55e731
>>>>
>>>> So specifying a CPU to an uncore event doesn't make sense. If the
>>>> current CPU is in the same package as the asked CPU. The kernel will
>>>> always choose the current CPU.
>>>
>>> Ugh, that sounds sub-optimal. If I'm monitoring uncore events with
>>> cgroups CPU0 (or the first CPU in a package) is going to be loaded up
>>> with all the events and have all of the rdmsr/wrmsrs in its context
>>> switch. Perhaps we should warn and say to use BPF events.
>>>
>>> Is there a way through say ioctls to get the CPU an event is on? That
>>> way we could update the `perf stat -A` to accurately report cpus.
>>> There's also the issue that the affinity stuff is going to be off.
>>>
>>
>> I don't think there is such ioctl.
>>
>> Emphasizing the CPU ID for an uncore event seems misleading. The uncore
>> only supports per-socket counter, not per-core counter.
>> Opening/reading an counter from any CPUs on a package should be identical.
>> An accurate report of the `perf stat -A` for an uncore should use "S0".
> 
> I think it is an "elite" level trick to try to do things like balance
> context switch overhead. Putting everything on the first CPU feels
> like a scalability issue for cgroup events. BPF events work around it
> to some degree, but they are still going to put all the work on CPU0
> which could cause performance issues, latency spikes, etc. on it.

Yes, that's why the PERF_EV_CAP_READ_ACTIVE_PKG was introduced. So the
read would not be on the same CPU0.
But I'm not sure how bad it is if perf tool open/close all the uncore
events on the same default CPU0.

Thanks,
Kan
> 
> Thanks,
> Ian
> 
>> Thanks,
>> Kan
>>
>>> Thanks,
>>> Ian
>>>
>>>
>>>> Thanks,
>>>> Kan
>>>>> The overhead I was
>>>>> thinking of here is more along the lines of cgroup context switches
>>>>> (although that isn't in my example). There may be a large number of
>>>>> say memory controller events just by having 2 events for each PMU and
>>>>> then there are 10s of PMUs. By putting half of the events on 1 CPU and
>>>>> half on another, the context switch overhead is shared. That said, the
>>>>> counters don't care what cgroup is accessing memory, and users doing
>>>>> this are likely making some kind of error.
>>>>>
>>>>> Thanks,
>>>>> Ian
>>>>>
>>>
> 

  reply	other threads:[~2024-07-19 16:42 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-07-18  0:30 [PATCH v2 0/6] Add support for sysfs event.cpus and cpu event term Ian Rogers
2024-07-18  0:30 ` [PATCH v2 1/6] perf pmu: Merge boolean sysfs event option parsing Ian Rogers
2024-07-18  0:30 ` [PATCH v2 2/6] perf parse-events: Pass cpu_list as a perf_cpu_map in __add_event Ian Rogers
2024-07-18  0:30 ` [PATCH v2 3/6] perf pmu: Add support for event.cpus files in sysfs Ian Rogers
2024-07-18 14:33   ` Liang, Kan
2024-07-18 15:39     ` Ian Rogers
2024-07-18 17:47       ` Liang, Kan
2024-07-18 20:50         ` Ian Rogers
2024-07-19 13:55           ` Liang, Kan
2024-07-19 14:59             ` Ian Rogers
2024-07-19 16:35               ` Liang, Kan
2024-07-19 22:02                 ` Ian Rogers
2024-07-22 13:57                   ` Liang, Kan
2024-07-22 15:43                     ` Ian Rogers
2024-07-22 16:45                       ` Liang, Kan
2024-07-26  7:06                   ` Dhananjay Ugwekar
2024-07-26  7:09                     ` Dhananjay Ugwekar
2024-07-26  7:52                       ` Ian Rogers
2024-07-26  8:17                         ` Dhananjay Ugwekar
2024-07-26 14:07                           ` Liang, Kan
2024-07-18  0:30 ` [PATCH v2 4/6] libperf cpumap: Add ability to create CPU from a single CPU number Ian Rogers
2024-07-18  0:30 ` [PATCH v2 5/6] perf parse-events: Set is_pmu_core for legacy hardware events Ian Rogers
2024-07-18  0:30 ` [PATCH v2 6/6] perf parse-events: Add "cpu" term to set the CPU an event is recorded on Ian Rogers
2024-07-18 14:09   ` James Clark
2024-07-18 15:07     ` Ian Rogers
2024-07-18 14:41   ` Liang, Kan
2024-07-18 15:12     ` Ian Rogers
2024-07-18 18:02       ` Liang, Kan
2024-07-18 21:06         ` Ian Rogers
2024-07-19 14:14           ` Liang, Kan
2024-07-19 15:01             ` Ian Rogers
2024-07-19 16:42               ` Liang, Kan [this message]
2024-07-18 13:42 ` [PATCH v2 0/6] Add support for sysfs event.cpus and cpu event term Dhananjay Ugwekar
2024-07-18 15:00   ` Ian Rogers

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ee3c8e75-2463-49f8-b1ba-5a68db0bae39@linux.intel.com \
    --to=kan.liang@linux.intel.com \
    --cc=Dhananjay.Ugwekar@amd.com \
    --cc=acme@kernel.org \
    --cc=adrian.hunter@intel.com \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=ananth.narayan@amd.com \
    --cc=asmadeus@codewreck.org \
    --cc=bhelgaas@google.com \
    --cc=corbet@lwn.net \
    --cc=gautham.shenoy@amd.com \
    --cc=irogers@google.com \
    --cc=james.clark@arm.com \
    --cc=jolsa@kernel.org \
    --cc=kprateek.nayak@amd.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=mingo@redhat.com \
    --cc=namhyung@kernel.org \
    --cc=peterz@infradead.org \
    --cc=ravi.bangoria@amd.com \
    --cc=sandipan.das@amd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).