linux-perf-users.vger.kernel.org archive mirror
From: Thomas Gleixner <tglx@linutronix.de>
To: "Liang, Kan" <kan.liang@linux.intel.com>,
	Li Huafei <lihuafei1@huawei.com>,
	peterz@infradead.org, mingo@redhat.com
Cc: acme@kernel.org, namhyung@kernel.org, mark.rutland@arm.com,
	alexander.shishkin@linux.intel.com, jolsa@kernel.org,
	irogers@google.com, adrian.hunter@intel.com, bp@alien8.de,
	dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com,
	linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] perf/x86/intel: Restrict period on Haswell
Date: Wed, 14 Aug 2024 21:01:51 +0200
Message-ID: <87frr7nd28.ffs@tglx>
In-Reply-To: <a42a3e35-2166-4539-930b-21ea0921e8d8@linux.intel.com>

On Wed, Aug 14 2024 at 14:15, Kan Liang wrote:
> On 2024-08-14 10:52 a.m., Thomas Gleixner wrote:
>> Now looking at the HSW specification update specifically erratum HSW11:
>> 
>>   Performance Monitor Precise Instruction Retired Event May Present
>>   Wrong Indications
>> 
>>   Problem:
>>          When the Precise Distribution for Instructions Retired (PDIR)
>>          mechanism is activated (INST_RETIRED.ALL (event C0H, umask
>>          value 00H) on Counter 1 programmed in PEBS mode), the processor
>>          may return wrong PEBS or Performance Monitoring Interrupt (PMI)
>>          interrupts and/or incorrect counter values if the counter is
>>          reset with a Sample-After-Value (SAV) below 100 (the SAV is
>>          the counter reset value software programs in the MSR
>>          IA32_PMC1[47:0] in order to control interrupt frequency).
>> 
>>   Implication:
>>          Due to this erratum, when using low SAV values, the program may
>>          get incorrect PEBS or PMI interrupts and/or an invalid counter
>>          state.
>> 
>>   Workaround:
>>          The sampling driver should avoid using SAV<100.
>> 
>> IOW, that's exactly the same issue as the BDM11 erratum.
>> 
>> Kan: Can you please go through the various specification updates and
>> identify which generations are affected by this and fix it once and
>> forever in a sane way instead of relying on 'tried until it works by
>> some definition of works' hacks. These errata are there for a reason.
>
> Sure. I will check all the related errata and propose a fix.
>
>> But that does not explain the fallout with that cve test because that
>> does not use PEBS. It's using fixed counter 0.
>
> The erratum also mentions PMI interrupts, which may imply the
> non-PEBS case. I will double check with the architect.

Ah. Indeed.
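
For reference, the HSW11 workaround quoted above ("avoid using SAV<100")
amounts to raising the period before it is programmed into the counter.
The following is only a minimal user-space sketch of that idea, not the
kernel's code and not a proposed patch. The helper name and the macros
are hypothetical. It matches on event C0H/umask 00H because that is what
the erratum names; whether the limit also has to cover other events or
the non-PEBS case is exactly the open question in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Minimum Sample-After-Value, taken from the HSW11 workaround text. */
#define HSW11_MIN_SAV 100

/* Event select (bits 7:0) and umask (bits 15:8) of an event select value. */
#define EVTSEL_EVENT_UMASK_MASK 0xffffULL

/* INST_RETIRED.ALL as named in the erratum: event C0H, umask 00H. */
#define EVTSEL_INST_RETIRED_ALL 0x00c0ULL

/*
 * Hypothetical helper, not a kernel function: return the period to
 * program, raised to the erratum minimum when the event matches.
 */
static int64_t hsw11_clamp_period(uint64_t config, int64_t left)
{
	/* Assumption: the caller has already made the period positive. */
	assert(left > 0);

	if ((config & EVTSEL_EVENT_UMASK_MASK) != EVTSEL_INST_RETIRED_ALL)
		return left;

	return left < HSW11_MIN_SAV ? HSW11_MIN_SAV : left;
}
```

With this sketch a requested period of 10 on a matching event becomes
100, while periods of 100 or more and non-matching events pass through
unchanged.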

> According to the description of the patch, if I understand correctly, it
> runs 100 CVE-2015-3290 tests at the same time. If so, all the GP
> counters are used. Huafei, could you please confirm?

I can reproduce it that way on my quad-socket HSW almost instantaneously:

[10473.376928] CPU#16: ctrl:       0000000000000000
[10473.376930] CPU#16: status:     0000000000000000
[10473.376931] CPU#16: overflow:   0000000000000000
[10473.376932] CPU#16: fixed:      00000000000000bb
[10473.376933] CPU#16: pebs:       0000000000000000
[10473.376934] CPU#16: debugctl:   0000000000004000
[10473.376935] CPU#16: active:     0000000300000000
[10473.376937] CPU#16:   gen-PMC0 ctrl:  0000000000134f2e
[10473.376938] CPU#16:   gen-PMC0 count: 0000ffffffffffca
[10473.376940] CPU#16:   gen-PMC0 left:  000000000000003b
[10473.376941] CPU#16:   gen-PMC1 ctrl:  0000000000000000
[10473.376943] CPU#16:   gen-PMC1 count: 0000000000000000
[10473.376944] CPU#16:   gen-PMC1 left:  0000000000000000
[10473.376946] CPU#16:   gen-PMC2 ctrl:  0000000000000000
[10473.376947] CPU#16:   gen-PMC2 count: 0000000000000000
[10473.376948] CPU#16:   gen-PMC2 left:  0000000000000000
[10473.376949] CPU#16:   gen-PMC3 ctrl:  0000000000000000
[10473.376950] CPU#16:   gen-PMC3 count: 0000000000000000
[10473.376952] CPU#16:   gen-PMC3 left:  0000000000000000
[10473.376953] CPU#16: fixed-PMC0 count: 0000fffffffffffe
[10473.376954] CPU#16: fixed-PMC1 count: 0000fffbabf57908
[10473.376955] CPU#16: fixed-PMC2 count: 0000000000000000

[10473.376928] CPU#88: ctrl:       0000000000000000
[10473.376930] CPU#88: status:     0000000000000000
[10473.376931] CPU#88: overflow:   0000000000000000
[10473.376932] CPU#88: fixed:      00000000000000bb
[10473.376933] CPU#88: pebs:       0000000000000000
[10473.376934] CPU#88: debugctl:   0000000000004000
[10473.376935] CPU#88: active:     0000000300000000
[10473.376937] CPU#88:   gen-PMC0 ctrl:  0000000000134f2e
[10473.376939] CPU#88:   gen-PMC0 count: 0000fffffffffff2
[10473.376940] CPU#88:   gen-PMC0 left:  00000000000000a8
[10473.376942] CPU#88:   gen-PMC1 ctrl:  0000000000000000
[10473.376944] CPU#88:   gen-PMC1 count: 0000000000000000
[10473.376945] CPU#88:   gen-PMC1 left:  0000000000000000
[10473.376946] CPU#88:   gen-PMC2 ctrl:  0000000000000000
[10473.376947] CPU#88:   gen-PMC2 count: 0000000000000000
[10473.376949] CPU#88:   gen-PMC2 left:  0000000000000000
[10473.376950] CPU#88:   gen-PMC3 ctrl:  0000000000000000
[10473.376951] CPU#88:   gen-PMC3 count: 0000000000000000
[10473.376952] CPU#88:   gen-PMC3 left:  0000000000000000
[10473.376953] CPU#88: fixed-PMC0 count: 0000fffffffffffe
[10473.376955] CPU#88: fixed-PMC1 count: 0000fffa79a83958
[10473.376956] CPU#88: fixed-PMC2 count: 0000000000000000

This happens at the very same time, and CPU#88 is the HT sibling of
CPU#16.
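
To make the raw values in the two dumps easier to read, here is a small
decoding sketch. It is an illustration only: the helper names are
hypothetical, the 48-bit counter width is assumed from the dump and from
"IA32_PMC1[47:0]" in the erratum text, and the bit positions follow the
Intel SDM layout of the event select and fixed counter control
registers.

```c
#include <assert.h>
#include <stdint.h>

/* Counter width, assumed from the dump and from "IA32_PMC1[47:0]". */
#define PMC_WIDTH 48
#define PMC_MASK  ((1ULL << PMC_WIDTH) - 1)

/* Increments left before a counter holding 'count' wraps to zero. */
static uint64_t pmc_events_to_overflow(uint64_t count)
{
	assert((count & ~PMC_MASK) == 0);
	return (1ULL << PMC_WIDTH) - count;
}

/* Selected fields of a general-purpose event select value. */
struct evtsel {
	unsigned int event;	/* bits 7:0 */
	unsigned int umask;	/* bits 15:8 */
	unsigned int usr;	/* bit 16 */
	unsigned int os;	/* bit 17 */
	unsigned int pmi;	/* bit 20, interrupt on overflow */
	unsigned int enable;	/* bit 22 */
};

static struct evtsel evtsel_decode(uint64_t ctrl)
{
	struct evtsel e;

	e.event  = ctrl & 0xff;
	e.umask  = (ctrl >> 8) & 0xff;
	e.usr    = (ctrl >> 16) & 1;
	e.os     = (ctrl >> 17) & 1;
	e.pmi    = (ctrl >> 20) & 1;
	e.enable = (ctrl >> 22) & 1;
	return e;
}

/*
 * 4-bit control field of fixed counter 'idx':
 * bit 0 = OS, bit 1 = USR, bit 2 = AnyThread, bit 3 = PMI.
 */
static unsigned int fixed_ctrl_field(uint64_t fixed, unsigned int idx)
{
	return (fixed >> (4 * idx)) & 0xf;
}
```

Applied to the dumps, gen-PMC0 is 54 increments from overflow on CPU#16
(count 0000ffffffffffca) and 14 on CPU#88 (count 0000fffffffffff2), and
fixed-PMC0 is 2 increments from overflow on both. The gen-PMC0 ctrl
value 134f2e decodes to event 2EH, umask 4FH with USR, OS and the
interrupt bit set. The fixed value bb gives 0xb (OS, USR, PMI) for fixed
counters 0 and 1 and 0 for fixed counter 2. The "left" lines in the dump
are printed separately and are not derived by this helper.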

Thanks,

        tglx

