From: Reinette Chatre <reinette.chatre@intel.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Hansen <dave.hansen@intel.com>,
tglx@linutronix.de, mingo@redhat.com, fenghua.yu@intel.com,
tony.luck@intel.com, vikas.shivappa@linux.intel.com,
gavin.hindman@intel.com, jithu.joseph@intel.com, hpa@zytor.com,
x86@kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/2] x86/intel_rdt and perf/x86: Fix lack of coordination with perf
Date: Mon, 6 Aug 2018 16:07:09 -0700
Message-ID: <08d51131-7802-5bfe-2cae-d116807183d1@intel.com>
In-Reply-To: <20180806221225.GO2458@hirez.programming.kicks-ass.net>

Hi Peter,

On 8/6/2018 3:12 PM, Peter Zijlstra wrote:
> On Mon, Aug 06, 2018 at 12:50:50PM -0700, Reinette Chatre wrote:
>> In my previous email I provided the details of the Cache Pseudo-Locking
>> feature implemented on top of resctrl. Please let me know if you would
>> like any more details about that. I can send you more materials.
>
> I've not yet had time to read..
>
>> BUG: sleeping function called from invalid context at
>> kernel/locking/mutex.c:748
>>
>> I thus continued to use the API with interrupts enabled and did the following:
>>
>> Two new event attributes:
>> static struct perf_event_attr l2_miss_attr = {
>> .type = PERF_TYPE_RAW,
>> .config = (0x10ULL << 8) | 0xd1,
>
> Please use something like:
>
> X86_CONFIG(.event=0xd1, .umask=0x10),
>
> that's ever so much more readable.
>
>> .size = sizeof(struct perf_event_attr),
>> .pinned = 1,
>> .disabled = 1,
>> .exclude_user = 1
>> };
>>
>> static struct perf_event_attr l2_hit_attr = {
>> .type = PERF_TYPE_RAW,
>> .config = (0x2ULL << 8) | 0xd1,
>> .size = sizeof(struct perf_event_attr),
>> .pinned = 1,
>> .disabled = 1,
>> .exclude_user = 1
>> };
>>
>> Create the two new events using these attributes:
>> l2_miss_event = perf_event_create_kernel_counter(&l2_miss_attr, cpu,
>> NULL, NULL, NULL);
>> l2_hit_event = perf_event_create_kernel_counter(&l2_hit_attr, cpu, NULL,
>> NULL, NULL);
>>
>> Take measurements:
>> perf_event_enable(l2_miss_event);
>> perf_event_enable(l2_hit_event);
>> local_irq_disable();
>> /* Disable hardware prefetchers */
>> /* Loop through pseudo-locked memory */
>> /* Enable hardware prefetchers */
>> local_irq_enable();
>> perf_event_disable(l2_hit_event);
>> perf_event_disable(l2_miss_event);
>>
>> Read results:
>> l2_hits = perf_event_read_value(l2_hit_event, &enabled, &running);
>> l2_miss = perf_event_read_value(l2_miss_event, &enabled, &running);
>> /* Make results available in tracepoints */
>
> switch to .disabled=0 and try this for measurement:
>
> local_irq_disable();
> perf_event_read_local(l2_miss_event, &miss_val1, NULL, NULL);
> perf_event_read_local(l2_hit_event, &hit_val1, NULL, NULL);
> /* do your thing */
> perf_event_read_local(l2_miss_event, &miss_val2, NULL, NULL);
> perf_event_read_local(l2_hit_event, &hit_val2, NULL, NULL);
> local_irq_enable();
Thank you very much for taking a look and providing your guidance.
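
As an illustration of the X86_CONFIG() suggestion quoted above, here is a
minimal standalone sketch (userspace C, not kernel code) of the encoding the
two attributes rely on: event select in bits 0-7 and unit mask in bits 8-15
of the raw config. The helper name raw_config() is hypothetical and exists
only for this example.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not a kernel API: build a raw config value with the
 * event select in bits 0-7 and the unit mask in bits 8-15.
 */
static uint64_t raw_config(uint8_t event, uint8_t umask)
{
	return ((uint64_t)umask << 8) | event;
}

/* The two encodings used by l2_miss_attr and l2_hit_attr in this thread. */
static uint64_t l2_miss_config(void)
{
	return raw_config(0xd1, 0x10);	/* same value as (0x10ULL << 8) | 0xd1 */
}

static uint64_t l2_hit_config(void)
{
	return raw_config(0xd1, 0x02);	/* same value as (0x2ULL << 8) | 0xd1 */
}
```

Naming the fields rather than shifting by hand gives the same value while
making the event and umask visible at a glance.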
>
> You're running this on the CPU you created the event for, right?
Yes.

I've modified your suggestion slightly in an attempt to gain accuracy.
Now it looks like:

	local_irq_disable();
	/* disable hw prefetchers */
	/* init local vars to loop through pseudo-locked mem */
	perf_event_read_local(l2_hit_event, &l2_hits_before, NULL, NULL);
	perf_event_read_local(l2_miss_event, &l2_miss_before, NULL, NULL);
	/* loop through pseudo-locked mem */
	perf_event_read_local(l2_hit_event, &l2_hits_after, NULL, NULL);
	perf_event_read_local(l2_miss_event, &l2_miss_after, NULL, NULL);
	/* enable hw prefetchers */
	local_irq_enable();

With the above I do not see the impact of an interference workload
anymore, but the results are not yet accurate:

pseudo_lock_mea-538 [002] .... 113.296084: pseudo_lock_l2: hits=4103 miss=2
pseudo_lock_mea-541 [002] .... 114.349343: pseudo_lock_l2: hits=4102 miss=3
pseudo_lock_mea-544 [002] .... 115.410206: pseudo_lock_l2: hits=4101 miss=4
pseudo_lock_mea-551 [002] .... 116.473912: pseudo_lock_l2: hits=4102 miss=3
pseudo_lock_mea-554 [002] .... 117.532446: pseudo_lock_l2: hits=4100 miss=5
pseudo_lock_mea-557 [002] .... 118.591121: pseudo_lock_l2: hits=4103 miss=2
pseudo_lock_mea-560 [002] .... 119.642467: pseudo_lock_l2: hits=4102 miss=3
pseudo_lock_mea-563 [002] .... 120.698562: pseudo_lock_l2: hits=4102 miss=3
pseudo_lock_mea-566 [002] .... 121.769348: pseudo_lock_l2: hits=4105 miss=4

In an attempt to improve the accuracy of the above I modified it to the
following:

	/* create the two events as before in "enabled" state */
	l2_hit_pmcnum = l2_hit_event->hw.event_base_rdpmc;
	l2_miss_pmcnum = l2_miss_event->hw.event_base_rdpmc;
	local_irq_disable();
	/* disable hw prefetchers */
	/* init local vars to loop through pseudo-locked mem */
	l2_hits_before = native_read_pmc(l2_hit_pmcnum);
	l2_miss_before = native_read_pmc(l2_miss_pmcnum);
	/* loop through pseudo-locked mem */
	l2_hits_after = native_read_pmc(l2_hit_pmcnum);
	l2_miss_after = native_read_pmc(l2_miss_pmcnum);
	/* enable hw prefetchers */
	local_irq_enable();

With the above I seem to get the same accuracy as before the conversion
to the perf API:

pseudo_lock_mea-557 [002] .... 155.402566: pseudo_lock_l2: hits=4096 miss=0
pseudo_lock_mea-564 [002] .... 156.441299: pseudo_lock_l2: hits=4096 miss=0
pseudo_lock_mea-567 [002] .... 157.478605: pseudo_lock_l2: hits=4096 miss=0
pseudo_lock_mea-570 [002] .... 158.524054: pseudo_lock_l2: hits=4096 miss=0
pseudo_lock_mea-573 [002] .... 159.561853: pseudo_lock_l2: hits=4096 miss=0
pseudo_lock_mea-576 [002] .... 160.599758: pseudo_lock_l2: hits=4096 miss=0
pseudo_lock_mea-579 [002] .... 161.645553: pseudo_lock_l2: hits=4096 miss=0
pseudo_lock_mea-582 [002] .... 162.687593: pseudo_lock_l2: hits=4096 miss=0

Would a solution like this perhaps be acceptable to you?

I will continue testing to look for any caveats in this solution.
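
The before/after snapshots in the rdpmc variant are raw counter values, so
the reported hits and misses are the difference between two reads. A minimal
standalone sketch of that subtraction follows (userspace C, not kernel code).
The helper name pmc_delta() and the 48-bit width are assumptions made for
illustration only; the actual counter width depends on the hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed counter width for this sketch; not taken from the thread. */
#define PMC_WIDTH_ASSUMED 48

/*
 * Hypothetical helper: events counted between two raw reads of a counter
 * that is `width` bits wide. Unsigned subtraction followed by masking keeps
 * the result correct even if the counter wrapped once between the reads.
 */
static uint64_t pmc_delta(uint64_t before, uint64_t after, unsigned int width)
{
	uint64_t mask;

	assert(width >= 1 && width <= 64);
	mask = (width == 64) ? UINT64_MAX : ((UINT64_C(1) << width) - 1);
	return (after - before) & mask;
}
```

For example, pmc_delta(l2_hits_before, l2_hits_after, PMC_WIDTH_ASSUMED)
would give the hit count for the loop under these assumptions.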
>> With the above implementation and a 256KB pseudo-locked memory region I
>> obtain the following results:
>> pseudo_lock_mea-755 [002] .... 396.946953: pseudo_lock_l2: hits=4140
>
>> The above results are not accurate since they do not reflect the success
>> of the pseudo-locked region. Expected results are as we can currently
>> obtain (copying results from previous email):
>> pseudo_lock_mea-26090 [002] .... 61838.488027: pseudo_lock_l2: hits=4096
>
> Still fairly close.. only like 44 extra hits or 1% error.
While the results do seem close, reporting a cache miss on memory that
is set up to be locked in cache is significant.
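
The numbers discussed here can be checked with a small standalone sketch
(userspace C). It assumes a 64-byte cache line, which is not stated in this
thread but is consistent with 4096 expected hits for a 256KB region; the
function names are hypothetical.

```c
#include <stdint.h>

/* Number of cache lines touched when reading a region once, line by line. */
static uint64_t expected_lines(uint64_t region_bytes, uint64_t line_bytes)
{
	return region_bytes / line_bytes;
}

/* Excess of measured over expected hits, as a percentage of expected. */
static double excess_percent(uint64_t measured, uint64_t expected)
{
	return 100.0 * ((double)measured - (double)expected) / (double)expected;
}
```

With these assumptions, expected_lines(256 * 1024, 64) is 4096, and
excess_percent(4140, 4096) is roughly 1.07, matching the "44 extra hits or
1% error" estimate quoted above.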

Thank you very much for your patience.

Reinette