linux-perf-users.vger.kernel.org archive mirror
From: Thomas Gleixner <tglx@linutronix.de>
To: "Liang, Kan" <kan.liang@linux.intel.com>,
	Li Huafei <lihuafei1@huawei.com>,
	peterz@infradead.org, mingo@redhat.com
Cc: acme@kernel.org, namhyung@kernel.org, mark.rutland@arm.com,
	alexander.shishkin@linux.intel.com, jolsa@kernel.org,
	irogers@google.com, adrian.hunter@intel.com, bp@alien8.de,
	dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com,
	linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org,
	Andi Kleen <ak@linux.intel.com>,
	Vince Weaver <vincent.weaver@maine.edu>
Subject: Re: [PATCH] perf/x86/intel: Restrict period on Haswell
Date: Fri, 16 Aug 2024 01:43:21 +0200	[thread overview]
Message-ID: <87plq9l5d2.ffs@tglx> (raw)
In-Reply-To: <059d6217-10a5-4d2a-b639-90806c04a13b@linux.intel.com>

On Thu, Aug 15 2024 at 16:15, Kan Liang wrote:
> On 2024-08-15 2:26 p.m., Thomas Gleixner wrote:
>>> Implication: When this erratum occurs, the fixed-function performance
>>> counter IA32_FIXED_CTR0 may over count by up to 32.
>> 
>> Sure. That's only explaining half of the problem.
>> 
>> As I demonstrated in the non-contended case even with a count of 2 (I
>> tried 1 too) the status bit is never set on the second check.
>> 
>
> Do you mean the below example? The status bit (32) of the fixed counter
> 0 is always set.

When HT is off or the threads are not running the handler concurrently,
there is zero looping. Once they start to fiddle concurrently, the
looping starts and potentially never ends.

> 65.147782: x86_perf_event_set_period: idx:    32 period:         1 left:  2
> 65.147783: intel_pmu_handle_irq:      loops: 001 status: 100000000
>
> 65.147784: x86_perf_event_set_period: idx:    32 period:         1 left:  2
> 65.147784: intel_pmu_handle_irq:      loops: 002 status: 100000000

So in the non-concurrent case (which includes !HT) the status check
after handling the event is always 0. This never gets into a loop, not
even once.

>>> It should explain that the issue is gone with the magic number 32 or
>>> with disabling Hyper-Threading.
>> 
>> It explains only half of it. If you use 32, then the counter is set to
>> -32 so the overcount of 32 will still bring it to 0, which should set
>> the status bit, no?
>
> I think it's up to 32, not always 32.
> I don't have more details regarding the issue. The architect of HSW has
> left. I'm asking around internally to find the original bug report of
> the erratum. Hope there are more details in the report.

See below.

>>> I also found a related discussion about 9 years ago.
>>> https://lore.kernel.org/lkml/alpine.DEB.2.11.1505181343090.32481@vincent-weaver-1.umelst.maine.edu/
>>> Vince tried the workaround but it seems not work.
>> 
>> Let me play with that. :)
>
> Appreciate it.

I actually got it working. The initial sequence which "worked" is:

    1) Clear bit 32  in IA32_PERF_GLOBAL_CTRL
    2) Clear bit 0/1 in IA32_PERF_FIXED_CTR_CTRL
    3) Zero the IA32_FIXED_CTR0 MSR
    4) Set IA32_FIXED_CTR0 to (-left) & mask;
    5) Set bit 0/1 in IA32_PERF_FIXED_CTR_CTRL
    6) Set bit 32  in IA32_PERF_GLOBAL_CTRL

If I omit #3 it does not work. If I flip #5 and #6 it does not work.

So the initial "working" variant I had was hacking this sequence into
x86_perf_event_set_period() (omitting the fixed counter 0 conditionals
for readability):

	rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, cglbl);
	wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, cglbl & ~(1ULL << 32));

	rdmsrl(MSR_ARCH_PERFMON_FIXED_CTR_CTRL, cfixd);
	wrmsrl(MSR_ARCH_PERFMON_FIXED_CTR_CTRL, cfixd & ~3ULL);

	wrmsrl(hwc->event_base, 0);
	wrmsrl(hwc->event_base, (u64)(-left) & x86_pmu.cntval_mask);

	wrmsrl(MSR_ARCH_PERFMON_FIXED_CTR_CTRL, cfixd);
	wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, cglbl);

Now I thought that this needs to be in intel/core.c and implemented a
proper quirk. And of course, being smart, I decided it was a brilliant
idea to use the cached values instead of the rdmsrl()s.

	cglbl = hybrid(cpuc->pmu, intel_ctrl) & ~cpuc->intel_ctrl_guest_mask;
	wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, cglbl & ~(1ULL << 32));

	cfixd = cpuc->fixed_ctrl_val;
	wrmsrl(MSR_ARCH_PERFMON_FIXED_CTR_CTRL, cfixd & ~3ULL);

	wrmsrl(hwc->event_base, 0);
	wrmsrl(hwc->event_base, (u64)(-left) & x86_pmu.cntval_mask);

	wrmsrl(MSR_ARCH_PERFMON_FIXED_CTR_CTRL, cfixd);
	wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, cglbl);

Surprise, surprise, that does not work. So I went back and wanted to
know which rdmsrl() cures it:

	rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, cglbl);
	wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, cglbl & ~(1ULL << 32));

	cfixd = cpuc->fixed_ctrl_val;
	wrmsrl(MSR_ARCH_PERFMON_FIXED_CTR_CTRL, cfixd & ~3ULL);

	wrmsrl(hwc->event_base, 0);
	wrmsrl(hwc->event_base, (u64)(-left) & x86_pmu.cntval_mask);

	wrmsrl(MSR_ARCH_PERFMON_FIXED_CTR_CTRL, cfixd);
	wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, cglbl);

This worked. Using the rdmsrl() only for MSR_ARCH_PERFMON_FIXED_CTR_CTRL
did not.

Now I got bold and boiled it down to:

	rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, cglbl);

	wrmsrl(hwc->event_base, 0);
	wrmsrl(hwc->event_base, (u64)(-left) & x86_pmu.cntval_mask);

and the whole thing still worked. *BLINK*

Exactly zero loop entries in the trace when running 100 instances of
that cve test case, which otherwise spams the trace with entries and
ends up in the loop > 100 path within a split second.

Removing the zeroing of the counter makes it come back, but reducing the
whole nonsense to:

	wrmsrl(hwc->event_base, 0);
	wrmsrl(hwc->event_base, (u64)(-left) & x86_pmu.cntval_mask);

makes the loop problem go away, but it "works" so well that the stupid
frequency adjustment algorithm keeps the left == 1, i.e. count == 2,
case around long enough to trigger 'hung task' messages ....

Now I looked more at the dmesg output of all the experiments. In all
"working" cases except one, running these 100 instances of cve... results
in a varying cascade of

   perf: interrupt took too long (2503 > 2500), lowering ...

messages.

The one case where this does not happen is when the limit is
unconditionally set to 32. But when I apply this limit only to the
fixed counter 0, it comes back.

Now I looked at when these 'took too long' problems surface, aside from
the obvious case of extensive looping. They are unrelated to the
hyper-threading issue, as I can reproduce them with smt=off too.

They always happen when a counter was armed with a count < 32 and two
events expired in the same NMI. The test case uses fixed counter 0 and
general counter 0 for the other event.

So that erratum is a good hint, but the hardware has more issues than
it admits to.

So I think we should just apply that limit patch with a proper
changelog and also make it:

hsw_limit(...)
{
	*left = max(*left, erratum_hsw11(event) ? 128 : 32);
}

or such.

That limit won't cure the overcount issue from that HSM154 erratum, but
*SHRUG*.

Thanks,

        tglx


Thread overview: 17+ messages in thread
2024-07-29 22:33 [PATCH] perf/x86/intel: Restrict period on Haswell Li Huafei
2024-07-31 19:20 ` Thomas Gleixner
2024-08-13 13:13   ` Li Huafei
2024-08-14 14:43     ` Thomas Gleixner
2024-08-14 14:52     ` Thomas Gleixner
2024-08-14 18:15       ` Liang, Kan
2024-08-14 19:01         ` Thomas Gleixner
2024-08-14 19:37           ` Liang, Kan
2024-08-14 22:47             ` Thomas Gleixner
2024-08-15 15:39               ` Liang, Kan
2024-08-15 18:26                 ` Thomas Gleixner
2024-08-15 20:15                   ` Liang, Kan
2024-08-15 23:43                     ` Thomas Gleixner [this message]
2024-08-16 19:27                       ` Liang, Kan
2024-08-17 12:22                         ` Liang, Kan
2024-08-17 12:23                         ` Thomas Gleixner
2024-08-15 19:01                 ` Vince Weaver
