Igt-dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: "Knop, Ryszard" <ryszard.knop@intel.com>
To: "Musial, Ewelina" <ewelina.musial@intel.com>,
	"Piecielska, Katarzyna" <katarzyna.piecielska@intel.com>,
	"Heikkila, Juha-pekka" <juha-pekka.heikkila@intel.com>,
	"Grabski, Mateusz" <mateusz.grabski@intel.com>,
	"adrinael@adrinael.net" <adrinael@adrinael.net>,
	 "igt-dev@lists.freedesktop.org" <igt-dev@lists.freedesktop.org>,
	"Brodzik, Konrad B" <konrad.b.brodzik@intel.com>,
	"kamil.konieczny@linux.intel.com"
	<kamil.konieczny@linux.intel.com>,
	"peter.senna@linux.intel.com" <peter.senna@linux.intel.com>
Subject: Re: [PATCH i-g-t] Bump aborting on network failure deadline to 40 seconds
Date: Wed, 12 Feb 2025 10:06:47 +0000	[thread overview]
Message-ID: <c246f66564337e7778abc23b23fa1ad7400138c1.camel@intel.com> (raw)
In-Reply-To: <IA1PR11MB6097AD4DB631A02EF73EAE1886FD2@IA1PR11MB6097.namprd11.prod.outlook.com>

On Tue, 2025-02-11 at 11:59 +0000, Piecielska, Katarzyna wrote:
> Hello Peter
> 
> 
> 
> Hi Kamil,
> 
> Thank you for your review. Please see below.
> 
> On 11.02.2025 10:21, Kamil Konieczny wrote:
> > Hi Peter,
> > On 2025-02-06 at 16:21:47 +0100, Peter Senna Tschudin wrote:
> > > Commit ddfde25f16ba ("runner: Add support for aborting on network
> > > failure") introduced a 20 second deadline for the DUT’s network to 
> > > recover after a suspend/resume cycle. If the network isn’t back up 
> > > within that time, igt_runner aborts the test run to save logs and 
> > > prevent potential log loss from an imminent power cycle.
> > > 
> > > This deadline was set to accommodate our internal CI system, which 
> > > checks for DUT network connectivity every 5 seconds and retries up to 
> > > 3 times at 20 second intervals. If it fails 3 consecutive checks,
> > 
> > This is a little confusing, you wrote in first paragraph about
> > 20 second deadline and here it looks like 60 seconds (3*20).
> 
> The first paragraph is explicitly about the deadline introduced by a previous commit. The second paragraph is explicitly about the internal CI mechanism that inspired the deadline. Can you explain what is confusing?
> 
> > 
> > > it triggers a power cycle on the DUT.
> > > 
> > > Although our internal CI system can be configured with a longer
> > -------------- ^^^^^^^^
> > Remove this.
> 
> No, why?
> 
> 
> Kasia -> This is upstream review, no need to add word 'internal'.
> 
> > 
> > > wait time, extending it further would unnecessarily prolong tests in 
> > > cases of DUT hangs.
> > > 
> > > Bumping the deadline to 40 seconds keeps the abort mechanism safely
> > 
> > imho this should be option for igt-runner, I would prefer to not 
> > adjust it later, let CI team tune it. Option could be either time or 
> > retry counter or both.
> 
> I disagree. We can add value now by potentially preventing premature aborts with very little risk of creating any issue. The people in CC seems to agree with that.
> 
> This patch is as simple as it can get. I don't buy the benefit of the extra complexity of adding yet another command line option. It is way more effective to send one of these every 6 years or so.
> 
> Am I correct that this is a nack from you?
> 
> This looks like we need a good discussion with CI team. @Knop, Ryszard @Grabski, Mateusz can you comment on this approach?

- If the current value is good and the ping timeout causes machines to
take 20 seconds longer on each broken run, it's just 20 seconds.
Compared to everything else we do suboptimally, it costs maybe a few
minutes on a bad day, so it's not a problem if it's overshot.
- If the current value is too low, runs take longer, but on the other
hand we get less broken runs and less noise - I would prefer this
instead of spurious aborts.

IMO it's fine to bump this to 40s.

Thanks, Ryszard

> Thanks,
> Kasia
> 
> > 
> > > within our internal CI system retry window while improving chances of 
> > > preventing a premature abort. For upstream testing on Jenkins, the 
> > > deadlines vary from 16 and 25 minutes, and this change has no impact.
> > > 
> > > CC: juha-pekka.heikkila@intel.com
> > > CC: katarzyna.piecielska@intel.com
> > > CC: ryszard.knop@intel.com
> > > CC: ewelina.musial@intel.com
> > > CC: adrinael@adrinael.net
> > > CC: mateusz.grabski@intel.com
> > > CC: konrad.b.brodzik@intel.com
> > 
> > imho better here 'Cc:'
> 
> Thank you!


  parent reply	other threads:[~2025-02-12 10:07 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-02-06 15:21 [PATCH i-g-t] Bump aborting on network failure deadline to 40 seconds Peter Senna Tschudin
2025-02-06 21:06 ` ✓ Xe.CI.BAT: success for " Patchwork
2025-02-07  2:00 ` ✗ Xe.CI.Full: failure " Patchwork
2025-02-07  7:51   ` Peter Senna Tschudin
2025-02-07  8:13     ` Ravali, JupallyX
2025-02-07 14:00 ` ✓ Xe.CI.BAT: success " Patchwork
2025-02-07 14:07 ` ✗ i915.CI.BAT: failure " Patchwork
2025-02-07 14:43   ` Peter Senna Tschudin
2025-02-10  5:35     ` Ravali, JupallyX
2025-02-07 21:01 ` ✓ Xe.CI.Full: success " Patchwork
2025-02-10  5:33 ` ✓ i915.CI.BAT: " Patchwork
2025-02-10  7:08 ` ✗ i915.CI.Full: failure " Patchwork
2025-02-10  8:26   ` Peter Senna Tschudin
2025-02-10 14:43     ` Ravali, JupallyX
2025-02-10 10:36 ` ✓ i915.CI.Full: success " Patchwork
2025-02-11  9:21 ` [PATCH i-g-t] " Kamil Konieczny
2025-02-11  9:55   ` Peter Senna Tschudin
2025-02-11 11:59     ` Piecielska, Katarzyna
2025-02-11 13:38       ` Peter Senna Tschudin
2025-02-12 10:06       ` Knop, Ryszard [this message]
2025-02-12 13:52         ` Kamil Konieczny
2025-02-12 14:31           ` Knop, Ryszard
2025-02-12 14:59     ` Kamil Konieczny
2025-02-12 15:38       ` Peter Senna Tschudin
2025-02-13 13:37         ` Kamil Konieczny
2025-02-14  9:26           ` Peter Senna Tschudin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c246f66564337e7778abc23b23fa1ad7400138c1.camel@intel.com \
    --to=ryszard.knop@intel.com \
    --cc=adrinael@adrinael.net \
    --cc=ewelina.musial@intel.com \
    --cc=igt-dev@lists.freedesktop.org \
    --cc=juha-pekka.heikkila@intel.com \
    --cc=kamil.konieczny@linux.intel.com \
    --cc=katarzyna.piecielska@intel.com \
    --cc=konrad.b.brodzik@intel.com \
    --cc=mateusz.grabski@intel.com \
    --cc=peter.senna@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox