public inbox for igt-dev@lists.freedesktop.org
 help / color / mirror / Atom feed
From: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
To: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: igt-dev@lists.freedesktop.org
Subject: Re: [igt-dev] [PATCH i-g-t 1/6] test/perf: Drop caches when closing perf stream
Date: Thu, 5 Mar 2020 11:36:28 -0800	[thread overview]
Message-ID: <20200305193628.GA63879@orsosgc001.amr.corp.intel.com> (raw)
In-Reply-To: <20200304235156.GB58678@orsosgc001.amr.corp.intel.com>

On Wed, Mar 04, 2020 at 03:51:56PM -0800, Umesh Nerlige Ramappa wrote:
>On Thu, Mar 05, 2020 at 01:29:50AM +0200, Lionel Landwerlin wrote:
>>On 05/03/2020 01:04, Umesh Nerlige Ramappa wrote:
>>>On Thu, Mar 05, 2020 at 12:05:55AM +0200, Lionel Landwerlin wrote:
>>>>On 04/03/2020 19:51, Umesh Nerlige Ramappa wrote:
>>>>>On Wed, Mar 04, 2020 at 10:45:55AM +0200, Lionel Landwerlin wrote:
>>>>>>On 04/03/2020 00:57, Umesh Nerlige Ramappa wrote:
>>>>>>>On Tue, Mar 03, 2020 at 02:38:08PM -0800, Umesh Nerlige 
>>>>>>>Ramappa wrote:
>>>>>>>>Running ./build/tests/perf will run all the perf 
>>>>>>>>subtests in a sequence.
>>>>>>>>When running tests in a sequence, subsequent tests may 
>>>>>>>>not run with a
>>>>>>>>clean slate. For resources that are lazily released, drop caches in
>>>>>>>>__perf_close.
>>>>>>>
>>>>>>>Hi Lionel, Chris,
>>>>>>>
>>>>>>>I notice an issue on TGL when running the entire suite of 
>>>>>>>perf tests.  In my setup, the polling test was failing 
>>>>>>>with invalid reports being seen in the beginning of the OA 
>>>>>>>buffer. This issue is seen more prominently with the newly 
>>>>>>>added subtests which call perf_open and perf_close a 
>>>>>>>couple of times (say blocking-with-interrupt).
>>>>>>>
>>>>>>>What I see in some runs is that the second test would 
>>>>>>>result in a bunch of unlanded reports in the beginning of 
>>>>>>>the OA buffer. Assuming that we are already waiting for 
>>>>>>>the NOA config with a noa_wait bo, I tried to look into 
>>>>>>>this further.
>>>>>>>
>>>>>>>free_oa_buffer is called to free the oa_buffer bo and this 
>>>>>>>work is deferred by the driver. If a test is called before 
>>>>>>>this free completes, we see the issue. Just to test out 
>>>>>>>this theory, if I comment out the free_oa_buffer entirely, 
>>>>>>>I see that the tests pass without any issues since new gtt 
>>>>>>>memory is being allocated each time.
>>>>>>>
>>>>>>>I guess the deferred free and the new allocation of the OA 
>>>>>>>buffer for subsequent test has something missing. Maybe 
>>>>>>>TLBs not being dropped? I imagine the OA unit might write 
>>>>>>>valid reports somewhere based on what it sees in the TLBs 
>>>>>>>and cpu is looking for them elsewhere (until the free 
>>>>>>>completes). Just a theory though. Let me know what you 
>>>>>>>think.
>>>>>>>
>>>>>>>For now igt_drop_caches_set(DROP_FREED) is what is helping 
>>>>>>>and hence this patch.
>>>>>>
>>>>>>
>>>>>>Hey Umesh,
>>>>>>
>>>>>>
>>>>>>I guess this could be fixed by this commit :
>>>>>>
>>>>>>
>>>>>>commit 4b4e973d5eb89244b67d3223b60f752d0479f253
>>>>>>Author: Chris Wilson <chris@chris-wilson.co.uk>
>>>>>>Date:   Mon Mar 2 08:57:57 2020 +0000
>>>>>>
>>>>>>    drm/i915/perf: Reintroduce wait on OA configuration completion
>>>>>>
>>>>>>If you can give this commit a try or rebase on drm-tip it 
>>>>>>would be great to confirm.
>>>>>
>>>>>I thought this commit was ensuring that the noa_wait is 
>>>>>executed completely before we enable the OA buffer captures. 
>>>>>That still does not explain why the issue goes away for me 
>>>>>when I comment out free_oa_buffer.
>>>>
>>>>
>>>>If noa_wait is not waited upon, either from CPU or GPU, then we 
>>>>enable OA while the configuration is not completely applied.
>>>>
>>>>Hence the invalid data at the beginning of the buffer.
>>>>
>>>>
>>>>Are you saying this commit didn't help?
>>>>
>>>No, I haven't tried it yet. I lost my reservation on the TGL 
>>>machine :(, so waiting for another one.
>>>
>>>What I meant is that - not freeing the OA buffer (only for 
>>>experimenting) results in a new gtt offset everytime we allocate 
>>>the OA buffer. When I do this, I don't see any invalid OA reports. 
>>>This is without the commit you mention above. If waiting for the 
>>>NOA config to complete were indeed the issue, I should have seen 
>>>it even with my experiment. Right?
>>
>>
>>I see, thanks that's a useful experiment.
>>
>>Only thing I can think of would be HEAD/TAIL register writes that 
>>didn't land before the OA unit was turned on.
>>
>>I'll dig into the code.
>>
>>
>>Is this only on Gen12?
>
>Yes. On TGL, There should be 2 way to reproduce this (without the 
>above commit):
>
>1. Just run perf so that it runs all the tests (or at least blocking 
>followed by polling). The polling test fails.
>
>2. If you are applying the interrupt patches from this and the kernel 
>thread, then you just need to run the blocking-with-interrupts test.
>
>In dmesg, you will see messages for unlanded reports. That's what I am 
>going by to decide if any fix/workaround worked or not.
>
>Thanks,
>Umesh
>>

I rebased the kernel patches to get Chris's commit, removed 
gem_quiesce_gpu from igt tests and tried running 
blocking-with-interrupts. I still see the issue. This subtest has 4 
tests - first one passes, second one has unlanded reports, third one
causes a reboot (have to enable crash dumps to debug further).

Thanks,
Umesh

>>
>>Thanks,
>>
>>
>>-Lionel
>>
>>
>>>
>>>Thanks,
>>>Umesh
>>>
>>>>
>>>>Thanks,
>>>>
>>>>
>>>>-Lionel
>>>>
>>>>
>>>>>
>>>>>Thanks,
>>>>>Umesh
>>>>>
>>>>>>
>>>>>>Otherwise we might need more digging to figure what's going on.
>>>>>>
>>>>>>
>>>>>>Thanks,
>>>>>>
>>>>>>
>>>>>>-Lionel
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>Thanks,
>>>>>>>Umesh
>>>>>>>
>>>>>>>>
>>>>>>>>Signed-off-by: Umesh Nerlige Ramappa 
>>>>>>>><umesh.nerlige.ramappa@intel.com>
>>>>>>>>---
>>>>>>>>tests/perf.c | 7 ++++++-
>>>>>>>>1 file changed, 6 insertions(+), 1 deletion(-)
>>>>>>>>
>>>>>>>>diff --git a/tests/perf.c b/tests/perf.c
>>>>>>>>index 5e818030..189c6aa1 100644
>>>>>>>>--- a/tests/perf.c
>>>>>>>>+++ b/tests/perf.c
>>>>>>>>@@ -244,6 +244,12 @@ __perf_close(int fd)
>>>>>>>>        close(pm_fd);
>>>>>>>>        pm_fd = -1;
>>>>>>>>    }
>>>>>>>>+
>>>>>>>>+    /* When running tests in a sequence, subsequent 
>>>>>>>>tests may not run with a
>>>>>>>>+     * clean slate. For resources that are lazily 
>>>>>>>>released, cleanup here.
>>>>>>>>+     */
>>>>>>>>+    if (drm_fd >= 0 && !getgid() && !getuid())
>>>>>>>>+        gem_quiescent_gpu(drm_fd);
>>>>>>>>}
>>>>>>>>
>>>>>>>>static int
>>>>>>>>@@ -3993,7 +3999,6 @@ test_rc6_disable(void)
>>>>>>>>    igt_assert_eq(n_events_end - n_events_start, 0);
>>>>>>>>
>>>>>>>>    __perf_close(stream_fd);
>>>>>>>>-    gem_quiescent_gpu(drm_fd);
>>>>>>>>
>>>>>>>>    n_events_start = rc6_residency_ms();
>>>>>>>>    nanosleep(&(struct timespec){ .tv_sec = 1, .tv_nsec 
>>>>>>>>= 0 }, NULL);
>>>>>>>>-- 
>>>>>>>>2.20.1
>>>>>>>>
>>>>>>>>_______________________________________________
>>>>>>>>igt-dev mailing list
>>>>>>>>igt-dev@lists.freedesktop.org
>>>>>>>>https://lists.freedesktop.org/mailman/listinfo/igt-dev
>>>>>>
>>>>>>
>>>>
>>
>_______________________________________________
>igt-dev mailing list
>igt-dev@lists.freedesktop.org
>https://lists.freedesktop.org/mailman/listinfo/igt-dev
_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

  reply	other threads:[~2020-03-05 19:36 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-03 22:38 [igt-dev] [PATCH 0/6] Add igt tests for perf OA interrupts Umesh Nerlige Ramappa
2020-03-03 22:38 ` [igt-dev] [PATCH i-g-t 1/6] test/perf: Drop caches when closing perf stream Umesh Nerlige Ramappa
2020-03-03 22:57   ` Umesh Nerlige Ramappa
2020-03-04  8:45     ` Lionel Landwerlin
2020-03-04 17:51       ` Umesh Nerlige Ramappa
2020-03-04 22:05         ` Lionel Landwerlin
2020-03-04 23:04           ` Umesh Nerlige Ramappa
2020-03-04 23:29             ` Lionel Landwerlin
2020-03-04 23:51               ` Umesh Nerlige Ramappa
2020-03-05 19:36                 ` Umesh Nerlige Ramappa [this message]
2020-03-03 22:38 ` [igt-dev] [PATCH i-g-t 2/6] include/drm-uapi: Update i915_drm.h for perf OA APIs Umesh Nerlige Ramappa
2020-03-03 22:38 ` [igt-dev] [PATCH i-g-t 3/6] tests/perf: new tests for parameterized OA buffer polling Umesh Nerlige Ramappa
2020-03-03 22:38 ` [igt-dev] [PATCH i-g-t 4/6] tests/perf: new tests for OA interrupt Umesh Nerlige Ramappa
2020-03-03 22:38 ` [igt-dev] [PATCH i-g-t 5/6] tests/perf: add flush data ioctl test Umesh Nerlige Ramappa
2020-03-03 22:38 ` [igt-dev] [PATCH i-g-t 6/6] tools: Enable interrupt support in i915 perf recorder Umesh Nerlige Ramappa
2020-03-03 23:56 ` [igt-dev] ✓ Fi.CI.BAT: success for Add igt tests for perf OA interrupts Patchwork
2020-03-04 14:44 ` [igt-dev] ✗ Fi.CI.IGT: failure " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200305193628.GA63879@orsosgc001.amr.corp.intel.com \
    --to=umesh.nerlige.ramappa@intel.com \
    --cc=igt-dev@lists.freedesktop.org \
    --cc=lionel.g.landwerlin@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox