From: "Vodapalli, Ravi Kumar" <ravi.kumar.vodapalli@intel.com>
To: Matt Roper <matthew.d.roper@intel.com>
Cc: <igt-dev@lists.freedesktop.org>,
<balasubramani.vivekanandan@intel.com>,
<lucas.demarchi@intel.com>, <gustavo.sousa@intel.com>,
<clinton.a.taylor@intel.com>, <matthew.s.atwood@intel.com>,
<dnyaneshwar.bhadane@intel.com>, <haridhar.kalvala@intel.com>,
<shekhar.chauhan@intel.com>, <umesh.nerlige.ramappa@intel.com>,
<ashutosh.dixit@intel.com>
Subject: Re: [PATCH] tests/intel/perf_pmu: Fix Test Assertion Failure for semaphore-busy test
Date: Wed, 27 Nov 2024 12:26:03 +0530
Message-ID: <75c4f516-681e-40cb-833c-5372b7729999@intel.com>
In-Reply-To: <20241126213932.GQ5725@mdroper-desk1.amr.corp.intel.com>
Hi Matt,
This issue is not reproducible; I have tested it multiple times.
Umesh Nerlige Ramappa from the kernel-telemetry team and I have
discussed the issue, but we did not find any permanent fix.
Please suggest what can be done.
Thanks,
Ravi Kumar V
On 11/27/2024 3:09 AM, Matt Roper wrote:
> On Tue, Nov 26, 2024 at 07:39:13PM +0530, Ravi Kumar Vodapalli wrote:
>> The tolerance limit exceeds the threshold values sometimes for test
>> igt@perf_pmu@semaphore-busy, bump up the limits slightly.
>> Also print the log in readable format in percentage instead of nanosec.
> But why is it exceeding the limits? We're already giving 5% wiggle room
> (which seems like a lot); bumping that up to 10% will hide CI failures,
> but it doesn't really explain why the numbers are so inaccurate to begin
> with. If there's a real bug causing the mismatch, then we should figure
> out how to fix that bug rather than just making IGT more willing to
> ignore the problem.
>
> BTW, looking through some of the results for this test in cibuglog, it
> seems like there are still cases where even the larger 10% tolerance
> still doesn't cover the gap. E.g.,
>
> https://intel-gfx-ci.01.org/tree/drm-tip/IGT_8126/shard-dg2-10/igt@perf_pmu@semaphore-busy.html
> (87.2% vs 100%)
>
> https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_15748/shard-dg2-4/igt@perf_pmu@semaphore-busy.html
> (88.1% vs 100%)
>
> So I think we need to understand what's actually going on here and
> causing these results. If there's a known, legitimate reason why the
> numbers are way off on specific platform(s) (e.g., some kind of
> workaround in the kernel or GuC) then it would be better to blacklist
> the test on platforms where we can't expect reliable results than to
> relax the test itself (and possibly let bugs go unnoticed on other
> platforms).
>
>
> Matt
>
>> Signed-off-by: Ravi Kumar Vodapalli <ravi.kumar.vodapalli@intel.com>
>> ---
>> tests/intel/perf_pmu.c | 9 ++++-----
>> 1 file changed, 4 insertions(+), 5 deletions(-)
>>
>> diff --git a/tests/intel/perf_pmu.c b/tests/intel/perf_pmu.c
>> index bfa2d501a..7f43354fd 100644
>> --- a/tests/intel/perf_pmu.c
>> +++ b/tests/intel/perf_pmu.c
>> @@ -189,7 +189,7 @@
>>
>> IGT_TEST_DESCRIPTION("Test the i915 pmu perf interface");
>>
>> -const double tolerance = 0.05f;
>> +const double tolerance = 0.1f;
>> const unsigned long batch_duration_ns = 500e6;
>>
>> char *drpc;
>> @@ -287,10 +287,9 @@ static uint64_t pmu_read_multi(int fd, unsigned int num, uint64_t *val)
>> #define __assert_within_epsilon(x, ref, tol_up, tol_down, debug_data) \
>> igt_assert_f((double)(x) <= (1.0 + (tol_up)) * (double)(ref) && \
>> (double)(x) >= (1.0 - (tol_down)) * (double)(ref), \
>> - "'%s' != '%s' (%f not within +%.1f%%/-%.1f%% tolerance of %f)\n%s\n",\
>> - #x, #ref, (double)(x), \
>> - (tol_up) * 100.0, (tol_down) * 100.0, \
>> - (double)(ref), debug_data)
>> + "%.3f%% is not within tolerance limits of +%.1f%%/-%.1f%%\n%s\n", \
>> + (((double)((double)(x) - (double)(ref)) * 100.0) / (double)(ref)), \
>> + (tol_up) * 100.0, (tol_down) * 100.0, debug_data)
>>
>> #define assert_within_epsilon(x, ref, tolerance) \
>> __assert_within_epsilon(x, ref, tolerance, tolerance, no_debug_data)
>> --
>> 2.25.1
>>
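For readers following the discussion, here is a minimal standalone sketch of the bounds check and the percentage deviation that the patched message would print. It is not IGT code: the helper names deviation_pct, within_epsilon and report_deviation are hypothetical and exist only for illustration, and the arithmetic is assumed to mirror the __assert_within_epsilon macro quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper (not in IGT): signed deviation of x from ref, in percent. */
static double deviation_pct(double x, double ref)
{
	assert(ref != 0.0); /* the reference value must be non-zero */
	return (x - ref) * 100.0 / ref;
}

/* Hypothetical helper: same bounds as the quoted __assert_within_epsilon macro. */
static bool within_epsilon(double x, double ref, double tol_up, double tol_down)
{
	return x <= (1.0 + tol_up) * ref && x >= (1.0 - tol_down) * ref;
}

/* Hypothetical helper: print the deviation and whether it passes a symmetric tolerance. */
static void report_deviation(FILE *out, double x, double ref, double tol)
{
	fprintf(out, "%.3f%% deviation, %s +%.1f%%/-%.1f%% tolerance\n",
		deviation_pct(x, ref),
		within_epsilon(x, ref, tol, tol) ? "within" : "outside",
		tol * 100.0, tol * 100.0);
}
```

As an example, calling report_deviation(stdout, 0.872 * 500e6, 500e6, 0.10) models the 87.2% result mentioned in the review against the 500e6 ns batch duration, and reports it as outside the tolerance even with the limit raised to 10%.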