From: Peter Senna Tschudin <peter.senna@linux.intel.com>
To: "Piecielska, Katarzyna" <katarzyna.piecielska@intel.com>,
Kamil Konieczny <kamil.konieczny@linux.intel.com>,
"igt-dev@lists.freedesktop.org" <igt-dev@lists.freedesktop.org>,
"Heikkila, Juha-pekka" <juha-pekka.heikkila@intel.com>,
"Knop, Ryszard" <ryszard.knop@intel.com>,
"Musial, Ewelina" <ewelina.musial@intel.com>,
"adrinael@adrinael.net" <adrinael@adrinael.net>,
"Grabski, Mateusz" <mateusz.grabski@intel.com>,
"Brodzik, Konrad B" <konrad.b.brodzik@intel.com>
Subject: Re: [PATCH i-g-t] Bump aborting on network failure deadline to 40 seconds
Date: Tue, 11 Feb 2025 14:38:27 +0100 [thread overview]
Message-ID: <e9aeefb5-942b-495d-9b0e-bc3c33cb9ccc@linux.intel.com> (raw)
In-Reply-To: <IA1PR11MB6097AD4DB631A02EF73EAE1886FD2@IA1PR11MB6097.namprd11.prod.outlook.com>
Hi Kasia,
On 11.02.2025 12:59, Piecielska, Katarzyna wrote:
> Hello Peter
>
>
>
> Hi Kamil,
>
> Thank you for your review. Please see below.
>
> On 11.02.2025 10:21, Kamil Konieczny wrote:
>> Hi Peter,
>> On 2025-02-06 at 16:21:47 +0100, Peter Senna Tschudin wrote:
>>> Commit ddfde25f16ba ("runner: Add support for aborting on network
>>> failure") introduced a 20 second deadline for the DUT’s network to
>>> recover after a suspend/resume cycle. If the network isn’t back up
>>> within that time, igt_runner aborts the test run to save logs and
>>> prevent potential log loss from an imminent power cycle.
>>>
>>> This deadline was set to accommodate our internal CI system, which
>>> checks for DUT network connectivity every 5 seconds and retries up to
>>> 3 times at 20 second intervals. If it fails 3 consecutive checks,
>>
>> This is a little confusing, you wrote in first paragraph about
>> 20 second deadline and here it looks like 60 seconds (3*20).
>
> The first paragraph is explicitly about the deadline introduced by a previous commit. The second paragraph is explicitly about the internal CI mechanism that inspired the deadline. Can you explain what is confusing?
>
>>
>>> it triggers a power cycle on the DUT.
>>>
>>> Although our internal CI system can be configured with a longer
>> -------------- ^^^^^^^^
>> Remove this.
>
> No, why?
>
>
> Kasia -> This is upstream review, no need to add word 'internal'.
This entire discussion is about safety mechanisms from our internal CI.
Even the commit that I talk about, ddfde25f16ba ("runner: Add support for
aborting on network failure") was created to work well with our internal
CI.
What is the problem?
>
>>
>>> wait time, extending it further would unnecessarily prolong tests in
>>> cases of DUT hangs.
>>>
>>> Bumping the deadline to 40 seconds keeps the abort mechanism safely
>>
>> imho this should be option for igt-runner, I would prefer to not
>> adjust it later, let CI team tune it. Option could be either time or
>> retry counter or both.
>
> I disagree. We can add value now by potentially preventing premature aborts with very little risk of creating any issue. The people in CC seems to agree with that.
>
> This patch is as simple as it can get. I don't buy the benefit of the extra complexity of adding yet another command line option. It is way more effective to send one of these every 6 years or so.
>
> Am I correct that this is a nack from you?
>
> This looks like we need a good discussion with CI team. @Knop, Ryszard @Grabski, Mateusz can you comment on this approach?
What is not clear? Not sure there is any open here.
Thanks,
Peter
next prev parent reply other threads:[~2025-02-11 13:38 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-02-06 15:21 [PATCH i-g-t] Bump aborting on network failure deadline to 40 seconds Peter Senna Tschudin
2025-02-06 21:06 ` ✓ Xe.CI.BAT: success for " Patchwork
2025-02-07 2:00 ` ✗ Xe.CI.Full: failure " Patchwork
2025-02-07 7:51 ` Peter Senna Tschudin
2025-02-07 8:13 ` Ravali, JupallyX
2025-02-07 14:00 ` ✓ Xe.CI.BAT: success " Patchwork
2025-02-07 14:07 ` ✗ i915.CI.BAT: failure " Patchwork
2025-02-07 14:43 ` Peter Senna Tschudin
2025-02-10 5:35 ` Ravali, JupallyX
2025-02-07 21:01 ` ✓ Xe.CI.Full: success " Patchwork
2025-02-10 5:33 ` ✓ i915.CI.BAT: " Patchwork
2025-02-10 7:08 ` ✗ i915.CI.Full: failure " Patchwork
2025-02-10 8:26 ` Peter Senna Tschudin
2025-02-10 14:43 ` Ravali, JupallyX
2025-02-10 10:36 ` ✓ i915.CI.Full: success " Patchwork
2025-02-11 9:21 ` [PATCH i-g-t] " Kamil Konieczny
2025-02-11 9:55 ` Peter Senna Tschudin
2025-02-11 11:59 ` Piecielska, Katarzyna
2025-02-11 13:38 ` Peter Senna Tschudin [this message]
2025-02-12 10:06 ` Knop, Ryszard
2025-02-12 13:52 ` Kamil Konieczny
2025-02-12 14:31 ` Knop, Ryszard
2025-02-12 14:59 ` Kamil Konieczny
2025-02-12 15:38 ` Peter Senna Tschudin
2025-02-13 13:37 ` Kamil Konieczny
2025-02-14 9:26 ` Peter Senna Tschudin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=e9aeefb5-942b-495d-9b0e-bc3c33cb9ccc@linux.intel.com \
--to=peter.senna@linux.intel.com \
--cc=adrinael@adrinael.net \
--cc=ewelina.musial@intel.com \
--cc=igt-dev@lists.freedesktop.org \
--cc=juha-pekka.heikkila@intel.com \
--cc=kamil.konieczny@linux.intel.com \
--cc=katarzyna.piecielska@intel.com \
--cc=konrad.b.brodzik@intel.com \
--cc=mateusz.grabski@intel.com \
--cc=ryszard.knop@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox