Igt-dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Senna Tschudin <peter.senna@linux.intel.com>
To: "Piecielska, Katarzyna" <katarzyna.piecielska@intel.com>,
	Kamil Konieczny <kamil.konieczny@linux.intel.com>,
	"igt-dev@lists.freedesktop.org" <igt-dev@lists.freedesktop.org>,
	"Heikkila, Juha-pekka" <juha-pekka.heikkila@intel.com>,
	"Knop, Ryszard" <ryszard.knop@intel.com>,
	"Musial, Ewelina" <ewelina.musial@intel.com>,
	"adrinael@adrinael.net" <adrinael@adrinael.net>,
	"Grabski, Mateusz" <mateusz.grabski@intel.com>,
	"Brodzik, Konrad B" <konrad.b.brodzik@intel.com>
Subject: Re: [PATCH i-g-t] Bump aborting on network failure deadline to 40 seconds
Date: Tue, 11 Feb 2025 14:38:27 +0100	[thread overview]
Message-ID: <e9aeefb5-942b-495d-9b0e-bc3c33cb9ccc@linux.intel.com> (raw)
In-Reply-To: <IA1PR11MB6097AD4DB631A02EF73EAE1886FD2@IA1PR11MB6097.namprd11.prod.outlook.com>

Hi Kasia,

On 11.02.2025 12:59, Piecielska, Katarzyna wrote:
> Hello Peter
> 
> 
> 
> Hi Kamil,
> 
> Thank you for your review. Please see below.
> 
> On 11.02.2025 10:21, Kamil Konieczny wrote:
>> Hi Peter,
>> On 2025-02-06 at 16:21:47 +0100, Peter Senna Tschudin wrote:
>>> Commit ddfde25f16ba ("runner: Add support for aborting on network
>>> failure") introduced a 20 second deadline for the DUT’s network to 
>>> recover after a suspend/resume cycle. If the network isn’t back up 
>>> within that time, igt_runner aborts the test run to save logs and 
>>> prevent potential log loss from an imminent power cycle.
>>>
>>> This deadline was set to accommodate our internal CI system, which 
>>> checks for DUT network connectivity every 5 seconds and retries up to 
>>> 3 times at 20 second intervals. If it fails 3 consecutive checks,
>>
>> This is a little confusing, you wrote in first paragraph about
>> 20 second deadline and here it looks like 60 seconds (3*20).
> 
> The first paragraph is explicitly about the deadline introduced by a previous commit. The second paragraph is explicitly about the internal CI mechanism that inspired the deadline. Can you explain what is confusing?
> 
>>
>>> it triggers a power cycle on the DUT.
>>>
>>> Although our internal CI system can be configured with a longer
>> -------------- ^^^^^^^^
>> Remove this.
> 
> No, why?
> 
> 
> Kasia -> This is upstream review, no need to add word 'internal'.

This entire discussion is about safety mechanisms from our internal CI.
Even the commit that I talk about, ddfde25f16ba ("runner: Add support for
aborting on network failure") was created to work well with our internal
CI.

What is the problem?

> 
>>
>>> wait time, extending it further would unnecessarily prolong tests in 
>>> cases of DUT hangs.
>>>
>>> Bumping the deadline to 40 seconds keeps the abort mechanism safely
>>
>> imho this should be option for igt-runner, I would prefer to not 
>> adjust it later, let CI team tune it. Option could be either time or 
>> retry counter or both.
> 
> I disagree. We can add value now by potentially preventing premature aborts with very little risk of creating any issue. The people in CC seems to agree with that.
> 
> This patch is as simple as it can get. I don't buy the benefit of the extra complexity of adding yet another command line option. It is way more effective to send one of these every 6 years or so.
> 
> Am I correct that this is a nack from you?
> 
> This looks like we need a good discussion with CI team. @Knop, Ryszard @Grabski, Mateusz can you comment on this approach?

What is not clear? Not sure there is any open here.

Thanks,

Peter

  reply	other threads:[~2025-02-11 13:38 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-02-06 15:21 [PATCH i-g-t] Bump aborting on network failure deadline to 40 seconds Peter Senna Tschudin
2025-02-06 21:06 ` ✓ Xe.CI.BAT: success for " Patchwork
2025-02-07  2:00 ` ✗ Xe.CI.Full: failure " Patchwork
2025-02-07  7:51   ` Peter Senna Tschudin
2025-02-07  8:13     ` Ravali, JupallyX
2025-02-07 14:00 ` ✓ Xe.CI.BAT: success " Patchwork
2025-02-07 14:07 ` ✗ i915.CI.BAT: failure " Patchwork
2025-02-07 14:43   ` Peter Senna Tschudin
2025-02-10  5:35     ` Ravali, JupallyX
2025-02-07 21:01 ` ✓ Xe.CI.Full: success " Patchwork
2025-02-10  5:33 ` ✓ i915.CI.BAT: " Patchwork
2025-02-10  7:08 ` ✗ i915.CI.Full: failure " Patchwork
2025-02-10  8:26   ` Peter Senna Tschudin
2025-02-10 14:43     ` Ravali, JupallyX
2025-02-10 10:36 ` ✓ i915.CI.Full: success " Patchwork
2025-02-11  9:21 ` [PATCH i-g-t] " Kamil Konieczny
2025-02-11  9:55   ` Peter Senna Tschudin
2025-02-11 11:59     ` Piecielska, Katarzyna
2025-02-11 13:38       ` Peter Senna Tschudin [this message]
2025-02-12 10:06       ` Knop, Ryszard
2025-02-12 13:52         ` Kamil Konieczny
2025-02-12 14:31           ` Knop, Ryszard
2025-02-12 14:59     ` Kamil Konieczny
2025-02-12 15:38       ` Peter Senna Tschudin
2025-02-13 13:37         ` Kamil Konieczny
2025-02-14  9:26           ` Peter Senna Tschudin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e9aeefb5-942b-495d-9b0e-bc3c33cb9ccc@linux.intel.com \
    --to=peter.senna@linux.intel.com \
    --cc=adrinael@adrinael.net \
    --cc=ewelina.musial@intel.com \
    --cc=igt-dev@lists.freedesktop.org \
    --cc=juha-pekka.heikkila@intel.com \
    --cc=kamil.konieczny@linux.intel.com \
    --cc=katarzyna.piecielska@intel.com \
    --cc=konrad.b.brodzik@intel.com \
    --cc=mateusz.grabski@intel.com \
    --cc=ryszard.knop@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox