From: Thomas Gleixner <tglx@linutronix.de>
To: "Liang, Kan" <kan.liang@linux.intel.com>,
Li Huafei <lihuafei1@huawei.com>,
peterz@infradead.org, mingo@redhat.com
Cc: acme@kernel.org, namhyung@kernel.org, mark.rutland@arm.com,
alexander.shishkin@linux.intel.com, jolsa@kernel.org,
irogers@google.com, adrian.hunter@intel.com, bp@alien8.de,
dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com,
linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] perf/x86/intel: Restrict period on Haswell
Date: Wed, 14 Aug 2024 21:01:51 +0200
Message-ID: <87frr7nd28.ffs@tglx>
In-Reply-To: <a42a3e35-2166-4539-930b-21ea0921e8d8@linux.intel.com>

On Wed, Aug 14 2024 at 14:15, Kan Liang wrote:
> On 2024-08-14 10:52 a.m., Thomas Gleixner wrote:
>> Now looking at the HSW specification update, specifically erratum HSW11:
>>
>> Performance Monitor Precise Instruction Retired Event May Present
>> Wrong Indications
>>
>> Problem:
>> When the Precise Distribution for Instructions Retired (PDIR)
>> mechanism is activated (INST_RETIRED.ALL (event C0H, umask
>> value 00H) on Counter 1 programmed in PEBS mode), the processor
>> may return wrong PEBS or Performance Monitoring Interrupt (PMI)
>> interrupts and/or incorrect counter values if the counter is
>> reset with a Sample-After-Value (SAV) below 100 (the SAV is
>> the counter reset value software programs in the MSR
>> IA32_PMC1[47:0] in order to control interrupt frequency).
>>
>> Implication:
>> Due to this erratum, when using low SAV values, the program may
>> get incorrect PEBS or PMI interrupts and/or an invalid counter
>> state.
>>
>> Workaround:
>> The sampling driver should avoid using SAV<100.
>>
>> IOW, that's exactly the same issue as the BDM11 erratum.
>>
>> Kan: Can you please go through the various specification updates and
>> identify which generations are affected by this and fix it once and
>> forever in a sane way instead of relying on 'tried until it works by
>> some definition of works' hacks. These errata are there for a reason.
>
> Sure. I will check all the related errata and propose a fix.
>
>> But that does not explain the fallout with that cve test because that
>> does not use PEBS. It's using fixed counter 0.
>
> The erratum also mentions PMI interrupts, which may imply the
> non-PEBS case. I will double check with the architect.

Ah. Indeed.
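
For reference, the workaround quoted above ("avoid using SAV<100") amounts
to clamping the sample period before the counter is reloaded. A minimal
userspace sketch under stated assumptions: limit_period() and
reload_value() are hypothetical names, not kernel functions, the threshold
is taken verbatim from the erratum text, and the 48-bit width comes from
the IA32_PMC1[47:0] reference in the erratum. Whether the same limit has
to apply to non-PEBS counters is exactly the open question here.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: threshold copied from the erratum ("avoid using SAV<100"). */
#define MIN_SAV		100
/* Counter width per the erratum's IA32_PMC1[47:0]. */
#define CNT_WIDTH	48

/* Hypothetical helper: raise a too-small sample period to the minimum. */
static int64_t limit_period(int64_t left)
{
	return left < MIN_SAV ? MIN_SAV : left;
}

/*
 * Value to program into the counter so that it overflows after 'left'
 * events: the two's complement of 'left', truncated to the counter width.
 */
static uint64_t reload_value(int64_t left)
{
	const uint64_t mask = (1ULL << CNT_WIDTH) - 1;

	assert(left > 0);
	return (0 - (uint64_t)left) & mask;
}
```

With these helpers a requested period of 59 would be raised to 100 and
the counter programmed with 0xffffffffff9c, while a period of 1000 would
pass through unchanged.
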
> According to the description of the patch, if I understand correctly, it
> runs 100 CVE-2015-3290 tests at the same time. If so, all the GP
> counters are used. Huafei, could you please confirm?

I can reproduce it that way on my quad-socket HSW almost instantaneously:

[10473.376928] CPU#16: ctrl: 0000000000000000
[10473.376930] CPU#16: status: 0000000000000000
[10473.376931] CPU#16: overflow: 0000000000000000
[10473.376932] CPU#16: fixed: 00000000000000bb
[10473.376933] CPU#16: pebs: 0000000000000000
[10473.376934] CPU#16: debugctl: 0000000000004000
[10473.376935] CPU#16: active: 0000000300000000
[10473.376937] CPU#16: gen-PMC0 ctrl: 0000000000134f2e
[10473.376938] CPU#16: gen-PMC0 count: 0000ffffffffffca
[10473.376940] CPU#16: gen-PMC0 left: 000000000000003b
[10473.376941] CPU#16: gen-PMC1 ctrl: 0000000000000000
[10473.376943] CPU#16: gen-PMC1 count: 0000000000000000
[10473.376944] CPU#16: gen-PMC1 left: 0000000000000000
[10473.376946] CPU#16: gen-PMC2 ctrl: 0000000000000000
[10473.376947] CPU#16: gen-PMC2 count: 0000000000000000
[10473.376948] CPU#16: gen-PMC2 left: 0000000000000000
[10473.376949] CPU#16: gen-PMC3 ctrl: 0000000000000000
[10473.376950] CPU#16: gen-PMC3 count: 0000000000000000
[10473.376952] CPU#16: gen-PMC3 left: 0000000000000000
[10473.376953] CPU#16: fixed-PMC0 count: 0000fffffffffffe
[10473.376954] CPU#16: fixed-PMC1 count: 0000fffbabf57908
[10473.376955] CPU#16: fixed-PMC2 count: 0000000000000000
[10473.376928] CPU#88: ctrl: 0000000000000000
[10473.376930] CPU#88: status: 0000000000000000
[10473.376931] CPU#88: overflow: 0000000000000000
[10473.376932] CPU#88: fixed: 00000000000000bb
[10473.376933] CPU#88: pebs: 0000000000000000
[10473.376934] CPU#88: debugctl: 0000000000004000
[10473.376935] CPU#88: active: 0000000300000000
[10473.376937] CPU#88: gen-PMC0 ctrl: 0000000000134f2e
[10473.376939] CPU#88: gen-PMC0 count: 0000fffffffffff2
[10473.376940] CPU#88: gen-PMC0 left: 00000000000000a8
[10473.376942] CPU#88: gen-PMC1 ctrl: 0000000000000000
[10473.376944] CPU#88: gen-PMC1 count: 0000000000000000
[10473.376945] CPU#88: gen-PMC1 left: 0000000000000000
[10473.376946] CPU#88: gen-PMC2 ctrl: 0000000000000000
[10473.376947] CPU#88: gen-PMC2 count: 0000000000000000
[10473.376949] CPU#88: gen-PMC2 left: 0000000000000000
[10473.376950] CPU#88: gen-PMC3 ctrl: 0000000000000000
[10473.376951] CPU#88: gen-PMC3 count: 0000000000000000
[10473.376952] CPU#88: gen-PMC3 left: 0000000000000000
[10473.376953] CPU#88: fixed-PMC0 count: 0000fffffffffffe
[10473.376955] CPU#88: fixed-PMC1 count: 0000fffa79a83958
[10473.376956] CPU#88: fixed-PMC2 count: 0000000000000000

This happens at the very same time, and CPU#88 is the HT sibling of
CPU#16.
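
To make the dumps easier to read, here is a small sketch that decodes the
two interesting fields. It assumes 48-bit wide counters (per the
IA32_PMC1[47:0] reference in the erratum) and the architectural
IA32_PERFEVTSELx bit layout (event select in bits 7:0, umask in 15:8,
USR 16, OS 17, INT 20, EN 22). The struct and function names are made up
for illustration and do not exist in the kernel.

```c
#include <assert.h>
#include <stdint.h>

#define CNT_WIDTH	48	/* assumption: counter width per the erratum */

/* Events left before a counter holding 'count' wraps (0 if count is 0). */
static uint64_t events_until_overflow(uint64_t count)
{
	const uint64_t mask = (1ULL << CNT_WIDTH) - 1;

	assert((count & ~mask) == 0);
	return (0 - count) & mask;
}

/* Hypothetical container for the decoded event select fields. */
struct evtsel {
	uint8_t event;	/* bits 7:0  */
	uint8_t umask;	/* bits 15:8 */
	int usr;	/* bit 16: count in user mode */
	int os;		/* bit 17: count in kernel mode */
	int intr;	/* bit 20: raise PMI on overflow */
	int en;		/* bit 22: counter enabled */
};

static struct evtsel decode_evtsel(uint64_t ctrl)
{
	struct evtsel e = {
		.event	= (uint8_t)(ctrl & 0xff),
		.umask	= (uint8_t)((ctrl >> 8) & 0xff),
		.usr	= (int)((ctrl >> 16) & 1),
		.os	= (int)((ctrl >> 17) & 1),
		.intr	= (int)((ctrl >> 20) & 1),
		.en	= (int)((ctrl >> 22) & 1),
	};

	return e;
}
```

Applied to the values above, gen-PMC0 is 54 events (CPU#16) and 14 events
(CPU#88) away from overflow, and fixed-PMC0 is 2 events away on both CPUs,
all of which are below the SAV of 100 the erratum talks about. The ctrl
value 0x134f2e decodes to event 0x2e, umask 0x4f with USR, OS and INT set,
which should be the architectural LLC reference event.
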
Thanks,
tglx