From: John Kacur <jkacur@gmail.com>
To: Costa Shulyupin <costa.shul@redhat.com>
Cc: linux-rt-users <linux-rt-users@vger.kernel.org>,
John Kacur <jkacur@redhat.com>,
"Luis Claudio R. Goncalves" <lgoncalv@redhat.com>
Subject: Re: [PATCH v4] hwlatdetect: Add MTBF calculation
Date: Thu, 26 Feb 2026 15:37:28 -0500 (EST)
Message-ID: <29ea4c85-1ce6-ada8-3cdf-dc6ca79f432c@gmail.com>
In-Reply-To: <20260225133232.756608-1-costa.shul@redhat.com>
On Wed, 25 Feb 2026, Costa Shulyupin wrote:
> Hwlatdetect reports the number of latency spikes but gives no
> indication of how often they occur. This makes results hard to
> compare: 10 spikes over 1 hour is very different from 10 spikes
> over 24 hours, but both show 'spikes = 10'.
>
> Add Mean Time Between Failures (MTBF) calculation to quantify spike
> frequency.
>
> By definition MTBF = total operating time / number of failures.
>
> When the failure interval is large relative to test duration, this
> formula is biased. For example, imagine stable periodic failures. The
> total operating time will include time before the first failure and
> after the last failure. These intervals are determined by when the test
> starts and stops, not by the system's failure behavior, which adds
> measurement bias. The resulting MTBF will vary between runs even for
> stable periodic failures.
>
> To reduce this bias, calculate MTBF using only the time between the
> first and last failure, divided by the number of intervals (failures
> minus one).
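
A minimal sketch of the bias described above, assuming a hypothetical
perfectly periodic failure source (one failure every 400 s) observed by a
1000 s test. The helper names and the numbers are illustrative only and are
not part of the patch.

```python
def failure_times(period, offset, duration):
    """Timestamps of periodic failures seen during a test.

    offset is the (hypothetical) delay from test start to the first failure.
    """
    times = []
    t = offset
    while t < duration:
        times.append(t)
        t += period
    return times


def naive_mtbf(times, duration):
    # Textbook definition: total operating time / number of failures.
    return duration / len(times)


def interval_mtbf(times):
    # Span between first and last failure, divided by (failures - 1).
    return (times[-1] - times[0]) / (len(times) - 1)


PERIOD, DURATION = 400.0, 1000.0
for offset in (50.0, 350.0):
    times = failure_times(PERIOD, offset, DURATION)
    print(f"offset={offset}: naive={naive_mtbf(times, DURATION):.1f} "
          f"interval={interval_mtbf(times):.1f}")
```

With these assumed inputs the naive value changes with the start offset
(about 333.3 s versus 500.0 s), while the interval-based value stays at the
true period of 400.0 s in both runs.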
>
> Additionally, hwlatdetect only samples during the 'width' period within
> each 'window' cycle. The non-sampling periods contribute to the error
> margin. The MTBF is therefore adjusted by multiplying it by the ratio
> of window to width:
>
> MTBF = (timestamp of last failure - timestamp of first failure) * window
> / ((number of failures - 1) * width)
>
> In hwlatdetect, the failures are called samples. The failure count is
> the sum of counters from all samples.
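
For reference, the formula above can be sketched as a standalone function.
This is only an illustration under stated assumptions: the function name,
its parameters, and the example values are hypothetical, timestamps are
taken to be in seconds, and window and width only need to share a unit.

```python
def mtbf_seconds(first, last, failures, window, width):
    """MTBF per the quoted formula, or None with fewer than two failures.

    first, last: timestamps of the first and last failure, in seconds
    failures:    total failure count (sum of the per-sample counters)
    window:      length of one sampling cycle
    width:       sampled portion of each cycle, same unit as window
    """
    if failures < 2:
        # A single failure defines no interval, so no MTBF can be computed.
        return None
    return (last - first) * window / ((failures - 1) * width)


# Hypothetical run: 10 failures spanning 3600 s,
# window = 1000000 us and width = 500000 us.
print(mtbf_seconds(100.0, 3700.0, 10, 1_000_000, 500_000))  # 800.0
```

Without the window/width factor the same inputs would give 400.0 s, so the
adjustment doubles the reported value when only half of each window is
sampled.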
>
> This metric enables meaningful comparison of hardware latency across
> different test runs, hardware configurations, and kernel versions. It
> can be considered a KPI for real-time stability, relevant for
> certification and SLA evaluation.
>
> Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
> ---
>
> v4: Adjust MTBF by multiplying by the ratio of window to width
> v3:
> - Fix formatting
> - Make first and last instance variables
> v2:
> - Use another more stable calculation of MTBF
> ---
> src/hwlatdetect/hwlatdetect.py | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/src/hwlatdetect/hwlatdetect.py b/src/hwlatdetect/hwlatdetect.py
> index 38671724f3e3..abfbb954fe75 100755
> --- a/src/hwlatdetect/hwlatdetect.py
> +++ b/src/hwlatdetect/hwlatdetect.py
> @@ -306,6 +306,8 @@ def __init__(self):
> raise DetectorNotAvailable("hwlat", "hwlat tracer not available")
> self.type = "tracer"
> self.samples = []
> + self.first = None
> + self.last = None
> self.set("enable", 0)
> self.set('current_tracer', 'hwlat')
>
> @@ -338,6 +340,8 @@ def detect(self):
> pollcnt += 1
> val = self.get_sample()
> while val:
> + self.first = self.first or val.timestamp
> + self.last = val.timestamp
> self.samples.append(val)
> if watch:
> val.display()
> @@ -559,6 +563,11 @@ def cleanup(self):
> exceeding = detect.get("count")
> info(f"Samples exceeding threshold: {exceeding}")
>
> + if exceeding > 1:
> + mtbf = ((float(detect.last) - float(detect.first)) * int(detect.get('window'))
> + / ((exceeding - 1) * int(detect.get('width'))))
> + info(f"MTBF: {mtbf:.3f} seconds")
> +
> if detect.have_msr:
> finishsmi = detect.getsmicounts()
> total_smis = 0
> --
> 2.53.0
>
>
>
Signed-off-by: John Kacur <jkacur@redhat.com>