* KCIDB: Support one more test status
@ 2023-04-19 17:08 Nikolai Kondrashov
2023-04-19 17:49 ` Mark Brown
2023-04-19 19:38 ` Bird, Tim
0 siblings, 2 replies; 10+ messages in thread
From: Nikolai Kondrashov @ 2023-04-19 17:08 UTC (permalink / raw)
To: kernelci, Dmitry Vyukov, Cristian Marussi, Alice Ferrazzi,
Philip Li, Vishal Bhoj, automated-testing, Tim Bird, CKI,
Mark Brown, Johnson George, Sachin Sant
Hello everyone involved with, or interested in KCIDB,
I would like to make KCIDB I/O schema accept one more test status string:
"MISS" (preliminary name), meaning the test was supposed to run, but didn't
because of a harness/framework/infrastructure/CI system failure.
This would be distinct from the "SKIP" status, meaning the test was supposed
to run, but didn't (or didn't complete), because it was not applicable.
Here's the PR in question: https://github.com/kernelci/kcidb-io/pull/74
I'll merge it in two weeks, on Wed, May 3, if there are no unresolved
objections by that time.
Read on for details, and respond either to this message, or in the PR.
At CKI we have both tests that skip themselves for valid reasons, and tests
that should've executed, but didn't, because e.g. the previous test crashed
the machine, or the tested kernel failed to boot. At the same time, we
want to make sure all the tests we planned to execute had their chance.
Before testing, we send our plan to our database as a KCIDB dataset with all
tests listed without the "status" field. According to KCIDB schema/protocol a
missing "status" field simply means no data, and is canonically interpreted as
"execution in progress".
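[Editor's note: the plan-first flow described above can be sketched as below. This is only an illustration with simplified, loosely KCIDB-like field names; make_plan and the id format are hypothetical, not part of the real kcidb client or schema.]

```python
# Sketch of the "plan first, results later" flow: test objects are
# submitted up front WITHOUT a "status" field, which canonically
# means "no data yet" / "execution in progress".

def make_plan(origin, build_id, test_paths):
    """Build KCIDB-style test objects with no "status" field."""
    return [
        {
            "id": f"{origin}:{build_id}:{path}",  # hypothetical id format
            "build_id": build_id,
            "origin": origin,
            "path": path,
            # note: no "status" key on purpose
        }
        for path in test_paths
    ]

plan = make_plan("cki", "cki:build-1", ["ltp.syscalls", "kselftest.net"])
assert all("status" not in test for test in plan)
```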
After the testing is done, or we gave up trying, we send the same KCIDB test
objects to the database, but this time only containing whatever results we
got, including the "status" fields. However, with the current set of status
strings [1], the only way we can try to express "wanted to run, but couldn't"
is with "SKIP", which is not supposed to alert anyone, yet this situation
should be treated as a problem.
We propose to call this new status "MISS" (as in "the test result should be
there, but isn't"), and think it would be useful to others as well.
We can break down the testing stack into three layers: the tested code, the
test, and the harness (and everything above it) that runs the test. If we then
express each existing test status as one ternary outcome for each of those
layers, we would get this table (in order of descending status priority):
STATUS   CODE   TEST   HARNESS+        LEGEND

FAIL      ❌     ✅      ✅            ❌ - failure
ERROR     ➖     ❌      ✅            ✅ - success
PASS      ✅     ✅      ✅            ➖ - no data
DONE      ➖     ✅      ✅
SKIP      ➖     ➖      ✅
          ➖     ➖      ➖
If you look at the above closely, you will notice one possible state missing
(because we didn't need to express failing harnesses), and that is the status
we want to introduce:
   STATUS   CODE   TEST   HARNESS+        LEGEND

   FAIL      ❌     ✅      ✅            ❌ - failure
   ERROR     ➖     ❌      ✅            ✅ - success
=> MISS      ➖     ➖      ❌ <=         ➖ - no data
   PASS      ✅     ✅      ✅
   DONE      ➖     ✅      ✅
   SKIP      ➖     ➖      ✅
             ➖     ➖      ➖
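[Editor's note: read as data, the table enumerates one ternary outcome per layer. A sketch of that encoding is below; it is purely illustrative and not part of the KCIDB schema.]

```python
# The status table above as data: each status maps to a
# (code, test, harness+) triple, where True is success, False is
# failure, and None is "no data".

OUTCOMES = {
    # status: (code, test, harness+)
    "FAIL":  (False, True,  True),
    "ERROR": (None,  False, True),
    "MISS":  (None,  None,  False),   # the proposed new status
    "PASS":  (True,  True,  True),
    "DONE":  (None,  True,  True),
    "SKIP":  (None,  None,  True),
    None:    (None,  None,  None),    # "status" field missing: no data
}

# "MISS" is the only status expressing a harness failure.
harness_failures = [s for s, (_, _, h) in OUTCOMES.items() if h is False]
assert harness_failures == ["MISS"]
```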
Please respond with comments, objections, and (counter-)proposals,
if you have them.
Thank you for your attention!
Nick (and the CKI team)
[1] The current set of supported status strings
https://github.com/kernelci/kcidb-io/blob/0bb7ffff3fe012ae138dd2f7c1d817034fe2c0ba/kcidb_io/schema/v04_01.py#L126
^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: KCIDB: Support one more test status
  2023-04-19 17:08 KCIDB: Support one more test status Nikolai Kondrashov
@ 2023-04-19 17:49 ` Mark Brown
  2023-04-20 16:50   ` Nikolai Kondrashov
  2023-04-19 19:38 ` Bird, Tim
  1 sibling, 1 reply; 10+ messages in thread
From: Mark Brown @ 2023-04-19 17:49 UTC (permalink / raw)
To: Nikolai Kondrashov
Cc: kernelci, Dmitry Vyukov, Cristian Marussi, Alice Ferrazzi,
    Philip Li, Vishal Bhoj, automated-testing, Tim Bird, CKI,
    Johnson George, Sachin Sant

[-- Attachment #1: Type: text/plain, Size: 946 bytes --]

On Wed, Apr 19, 2023 at 08:08:03PM +0300, Nikolai Kondrashov wrote:

> I would like to make KCIDB I/O schema accept one more test status string:
> "MISS" (preliminary name), meaning the test was supposed to run, but didn't
> because of a harness/framework/infrastructure/CI system failure.
>
> This would be distinct from the "SKIP" status, meaning the test was supposed
> to run, but didn't (or didn't complete), because it was not applicable.
>
> Here's the PR in question: https://github.com/kernelci/kcidb-io/pull/74
>
> I'll merge it in two weeks, on Wed, May 3, if there are no unresolved
> objections by that time.
>
> Read on for details, and respond either to this message, or in the PR.

This makes sense to me, it corresponds to how my own testing interprets
results - I have a list of tests I expect to pass, and if any of them
return any result other than pass then that's an issue.  I also picked
the name "missing" for that in my code.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: KCIDB: Support one more test status
  2023-04-19 17:49 ` Mark Brown
@ 2023-04-20 16:50   ` Nikolai Kondrashov
  0 siblings, 0 replies; 10+ messages in thread
From: Nikolai Kondrashov @ 2023-04-20 16:50 UTC (permalink / raw)
To: Mark Brown
Cc: kernelci, Dmitry Vyukov, Cristian Marussi, Alice Ferrazzi,
    Philip Li, Vishal Bhoj, automated-testing, Tim Bird, CKI,
    Johnson George, Sachin Sant

On 4/19/23 20:49, Mark Brown wrote:
> On Wed, Apr 19, 2023 at 08:08:03PM +0300, Nikolai Kondrashov wrote:
>
>> I would like to make KCIDB I/O schema accept one more test status string:
>> "MISS" (preliminary name), meaning the test was supposed to run, but didn't
>> because of a harness/framework/infrastructure/CI system failure.
>
>> This would be distinct from the "SKIP" status, meaning the test was supposed
>> to run, but didn't (or didn't complete), because it was not applicable.
>
>> Here's the PR in question: https://github.com/kernelci/kcidb-io/pull/74
>
>> I'll merge it in two weeks, on Wed, May 3, if there are no unresolved
>> objections by that time.
>
>> Read on for details, and respond either to this message, or in the PR.
>
> This makes sense to me, it corresponds to how my own testing interprets
> results - I have a list of tests I expect to pass, and if any of them
> return any result other than pass then that's an issue.  I also picked
> the name "missing" for that in my code.

Yeah, it's kinda like that. Only KCIDB defines explicit statuses for a bunch
of specific failures, and I propose to make "MISS" represent
harness/framework/CI system mess-ups/"misses" only.

Nick

^ permalink raw reply	[flat|nested] 10+ messages in thread
* RE: KCIDB: Support one more test status
  2023-04-19 17:08 KCIDB: Support one more test status Nikolai Kondrashov
  2023-04-19 17:49 ` Mark Brown
@ 2023-04-19 19:38 ` Bird, Tim
  2023-04-20  5:53   ` Guillaume Tucker
  2023-04-20 12:08   ` Nikolai Kondrashov
  1 sibling, 2 replies; 10+ messages in thread
From: Bird, Tim @ 2023-04-19 19:38 UTC (permalink / raw)
To: Nikolai Kondrashov, kernelci@lists.linux.dev, Dmitry Vyukov,
    Cristian Marussi, Alice Ferrazzi, Philip Li, Vishal Bhoj,
    automated-testing@lists.yoctoproject.org, CKI, Mark Brown,
    Johnson George, Sachin Sant

> -----Original Message-----
> From: Nikolai Kondrashov <Nikolai.Kondrashov@redhat.com>
> Hello everyone involved with, or interested in KCIDB,
>
> I would like to make KCIDB I/O schema accept one more test status string:
> "MISS" (preliminary name), meaning the test was supposed to run, but didn't
> because of a harness/framework/infrastructure/CI system failure.
>
> This would be distinct from the "SKIP" status, meaning the test was supposed
> to run, but didn't (or didn't complete), because it was not applicable.
>
> Here's the PR in question: https://github.com/kernelci/kcidb-io/pull/74
>
> I'll merge it in two weeks, on Wed, May 3, if there are no unresolved
> objections by that time.
>
> Read on for details, and respond either to this message, or in the PR.
>
> At CKI we have both tests that skip themselves for valid reasons, and tests
> that should've executed, but didn't. Because e.g. the previous test has
> crashed the machine, or the tested kernel failed to boot. At the same time, we
> want to make sure all the tests we planned to execute had their chance.
>
> Before testing, we send our plan to our database as a KCIDB dataset with all
> tests listed without the "status" field. According to KCIDB schema/protocol a
> missing "status" field simply means no data, and is canonically interpreted as
> "execution in progress".
>
> After the testing is done, or we gave up trying, we send the same KCIDB test
> objects to the database, but this time only containing whatever results we
> got, including the "status" fields. However, with the current set of status
> strings [1], the only way we can try to express "wanted to run, but couldn't"
> is with "SKIP", which is not supposed to alert anyone, yet this situation
> should be treated as a problem.

Why can't "wanted to run, but couldn't" be expressed with "ERROR"?

> We propose to call this new status "MISS" (as in "the test result should be
> there, but isn't"), and think it would be useful to others as well.
>
> We can break down the testing stack into three layers: the tested code, the
> test, and the harness (and everything above it) that runs the test. If we then
> express each existing test status as one trinary outcome per each of those
> layers, we would get this table (in order of descending status priority):
>
> STATUS CODE TEST HARNESS+ LEGEND
>
> FAIL ❌ ✅ ✅ ❌ - failure
> ERROR ➖ ❌ ✅ ✅ - success
> PASS ✅ ✅ ✅ ➖ - no data
> DONE ➖ ✅ ✅
> SKIP ➖ ➖ ✅
> ➖ ➖ ➖
>
> If you look at the above closely, you will notice one possible state missing
> (because we didn't need to express failing harnesses), and that is the status
> we want to introduce:
>
> STATUS CODE TEST HARNESS+ LEGEND
>
> FAIL ❌ ✅ ✅ ❌ - failure
> ERROR ➖ ❌ ✅ ✅ - success
> => MISS ➖ ➖ ❌ <= ➖ - no data
> PASS ✅ ✅ ✅
> DONE ➖ ✅ ✅
> SKIP ➖ ➖ ✅
> ➖ ➖ ➖
>
> Please respond with comments, objections, and (counter-)proposals,
> if you have them.

I don't understand the rationale for distinguishing a test error from a harness
error.  In either case the test was not executed properly, and so there is no
useful test result data available.  Diagnostic information should enable
the user to determine whether the problem was due to the test code failing
or the test harness failing.

I think I'm missing something.  Are you trying to distinguish these so you
can determine whether there is a problem with the test itself, vs. the harness?
Are you automatically re-running a test if the harness is the problem?

Why do you want to distinguish these error cases?
 -- Tim

> Thank you for your attention!
> Nick (and the CKI team)
>
> [1] The current set of supported status strings
>
> https://github.com/kernelci/kcidb-io/blob/0bb7ffff3fe012ae138dd2f7c1d817034fe2c0ba/kcidb_io/schema/v04_01.py#L126

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: KCIDB: Support one more test status
  2023-04-19 19:38 ` Bird, Tim
@ 2023-04-20  5:53   ` Guillaume Tucker
  2023-04-20 16:37     ` [Automated-testing] " Nikolai Kondrashov
  2023-04-20 12:08   ` Nikolai Kondrashov
  1 sibling, 1 reply; 10+ messages in thread
From: Guillaume Tucker @ 2023-04-20 5:53 UTC (permalink / raw)
To: Bird, Tim, Nikolai Kondrashov, kernelci@lists.linux.dev,
    Dmitry Vyukov, Cristian Marussi, Alice Ferrazzi, Philip Li,
    Vishal Bhoj, automated-testing@lists.yoctoproject.org, CKI,
    Mark Brown, Johnson George, Sachin Sant

On 19/04/2023 21:38, Bird, Tim wrote:
>
>> -----Original Message-----
>> From: Nikolai Kondrashov <Nikolai.Kondrashov@redhat.com>
>> Hello everyone involved with, or interested in KCIDB,
>>
>> I would like to make KCIDB I/O schema accept one more test status string:
>> "MISS" (preliminary name), meaning the test was supposed to run, but didn't
>> because of a harness/framework/infrastructure/CI system failure.
>>
>> This would be distinct from the "SKIP" status, meaning the test was supposed
>> to run, but didn't (or didn't complete), because it was not applicable.
>>
>> Here's the PR in question: https://github.com/kernelci/kcidb-io/pull/74
>>
>> I'll merge it in two weeks, on Wed, May 3, if there are no unresolved
>> objections by that time.
>>
>> Read on for details, and respond either to this message, or in the PR.
>>
>> At CKI we have both tests that skip themselves for valid reasons, and tests
>> that should've executed, but didn't. Because e.g. the previous test has
>> crashed the machine, or the tested kernel failed to boot. At the same time, we
>> want to make sure all the tests we planned to execute had their chance.
>>
>> Before testing, we send our plan to our database as a KCIDB dataset with all
>> tests listed without the "status" field. According to KCIDB schema/protocol a
>> missing "status" field simply means no data, and is canonically interpreted as
>> "execution in progress".
>>
>> After the testing is done, or we gave up trying, we send the same KCIDB test
>> objects to the database, but this time only containing whatever results we
>> got, including the "status" fields. However, with the current set of status
>> strings [1], the only way we can try to express "wanted to run, but couldn't"
>> is with "SKIP", which is not supposed to alert anyone, yet this situation
>> should be treated as a problem.
>
> Why can't "wanted to run, but couldn't" be expressed with "ERROR"?

There's also "wanted to run, but didn't" if the test wasn't
actually run.  I guess failing to start the test is a harness
problem, and a failure while it's running is more likely a test
problem (but not necessarily).

But also, what if a test has disappeared from a test suite?

>> We propose to call this new status "MISS" (as in "the test result should be
>> there, but isn't"), and think it would be useful to others as well.
>>
>> We can break down the testing stack into three layers: the tested code, the
>> test, and the harness (and everything above it) that runs the test. If we then
>> express each existing test status as one trinary outcome per each of those
>> layers, we would get this table (in order of descending status priority):
>>
>> STATUS CODE TEST HARNESS+ LEGEND
>>
>> FAIL ❌ ✅ ✅ ❌ - failure
>> ERROR ➖ ❌ ✅ ✅ - success
>> PASS ✅ ✅ ✅ ➖ - no data
>> DONE ➖ ✅ ✅
>> SKIP ➖ ➖ ✅
>> ➖ ➖ ➖
>>
>> If you look at the above closely, you will notice one possible state missing
>> (because we didn't need to express failing harnesses), and that is the status
>> we want to introduce:
>>
>> STATUS CODE TEST HARNESS+ LEGEND
>>
>> FAIL ❌ ✅ ✅ ❌ - failure
>> ERROR ➖ ❌ ✅ ✅ - success
>> => MISS ➖ ➖ ❌ <= ➖ - no data
>> PASS ✅ ✅ ✅
>> DONE ➖ ✅ ✅
>> SKIP ➖ ➖ ✅
>> ➖ ➖ ➖

It seems there are many reasons for a test to not run correctly
and trying to put them in buckets isn't as simple as that.  I
would rather be in favour of having fewer states: pass, fail,
skip and unknown for when there's no clear test result.

On a side note, the last line with no data anywhere doesn't have
a name.  Is the status field optional in KCIDB, so we can send
data with no status?  That would match the last line I guess.  Or
maybe it could be named as UNKNOWN or something.  I think that
kind of relates to missing test results if there's no positive
error reported by the test harness, see my other paragraph below.

>> Please respond with comments, objections, and (counter-)proposals,
>> if you have them.
>
> I don't understand the rationale for distinguishing a test error from a harness
> error.  In either case the test was not executed properly, and so there is no
> useful test result data available.  Diagnostic information should enable
> the user to determine whether the problem was due to the test code failing
> or the test harness failing.
>
> I think I'm missing something.  Are you trying to distinguish these so you
> can determine whether there is a problem with the test itself, vs. the harness?
> Are you automatically re-running a test if the harness is the problem?
>
> Why do you want to distinguish these error cases?

I was wondering the same thing.  Is MISS effectively like
a "null" default state for when the system knows a test has been
started and a result is expected but hasn't arrived yet?  Whereas
ERROR is when the test has clearly hit an error and failed to
run?  For example, if a test suite changed and a test case
disappeared or was renamed then you would see a MISS result with
the old test case name and that's not really an error.

Thanks,
Guillaume

>> Thank you for your attention!
>> Nick (and the CKI team)
>>
>> [1] The current set of supported status strings
>>
>> https://github.com/kernelci/kcidb-io/blob/0bb7ffff3fe012ae138dd2f7c1d817034fe2c0ba/kcidb_io/schema/v04_01.py#L126

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [Automated-testing] KCIDB: Support one more test status
  2023-04-20  5:53 ` Guillaume Tucker
@ 2023-04-20 16:37   ` Nikolai Kondrashov
  2023-05-05 11:39     ` Guillaume Tucker
  0 siblings, 1 reply; 10+ messages in thread
From: Nikolai Kondrashov @ 2023-04-20 16:37 UTC (permalink / raw)
To: guillaume.tucker, Bird, Tim, kernelci@lists.linux.dev,
    Dmitry Vyukov, Cristian Marussi, Alice Ferrazzi, Philip Li,
    Vishal Bhoj, automated-testing@lists.yoctoproject.org, CKI,
    Mark Brown, Johnson George, Sachin Sant

Hi Guillaume,

Thank you for your response!

Snipping & answering below.

On 4/20/23 08:53, Guillaume Tucker via lists.yoctoproject.org wrote:
> On 19/04/2023 21:38, Bird, Tim wrote:
>>> -----Original Message-----
>>> From: Nikolai Kondrashov <Nikolai.Kondrashov@redhat.com>
>>> Hello everyone involved with, or interested in KCIDB,
>>>
>>> After the testing is done, or we gave up trying, we send the same KCIDB test
>>> objects to the database, but this time only containing whatever results we
>>> got, including the "status" fields. However, with the current set of status
>>> strings [1], the only way we can try to express "wanted to run, but couldn't"
>>> is with "SKIP", which is not supposed to alert anyone, yet this situation
>>> should be treated as a problem.
>>
>> Why can't "wanted to run, but couldn't" be expressed with "ERROR"?
>
> There's also "wanted to run, but didn't" if the test wasn't
> actually run.  I guess failing to start the test is a harness
> problem, and a failure while it's running is more likely a test
> problem (but not necessarily).

Yes, that's the idea.

> But also, what if a test has disappeared from a test suite?

That depends where you slice your responsibility, I suppose.

If you want to control and make sure which exact sub-tests of a test suite
should be executed, then you should report the disappeared test as "MISS",
because it would be the fault of your harness not noticing that.

If you'd prefer to leave it to the test suite maintainers to decide what
running their test suite (or a subset of) means, then you just collect
whatever you get, and only report "MISS" for the whole test suite when you
failed to run it.

>>> If you look at the above closely, you will notice one possible state missing
>>> (because we didn't need to express failing harnesses), and that is the status
>>> we want to introduce:
>>>
>>> STATUS CODE TEST HARNESS+ LEGEND
>>>
>>> FAIL ❌ ✅ ✅ ❌ - failure
>>> ERROR ➖ ❌ ✅ ✅ - success
>>> => MISS ➖ ➖ ❌ <= ➖ - no data
>>> PASS ✅ ✅ ✅
>>> DONE ➖ ✅ ✅
>>> SKIP ➖ ➖ ✅
>>> ➖ ➖ ➖
>
> It seems there are many reasons for a test to not run correctly
> and trying to put them in buckets isn't as simple as that.  I
> would rather be in favour of having fewer states: pass, fail,
> skip and unknown for when there's no clear test result.

I agree, we could slice the test outcome into an infinite number of
statuses, for every little thing, and keep adding them until the cows come
home.

However, if you look at the above table, you would see that we have statuses
strictly focusing on which part of the stack has passed/failed/had no data,
and these parts of the stack correspond to the clearly-separate
responsibility areas. We're just enumerating all the possibilities in that
space, and "MISS" plugs the one remaining hole.

So, I think it has its place. That is unless we object to having harness
(and higher) failures reported to KCIDB.

I would say, though, that we should accept them. Because, even if the CI
systems try hard to avoid them, their failures will creep in anyway, and
then the issue/incident system will be there to help them set the correct
status after the fact, for the situations when they failed to execute the
test and reported something else by mistake.

And of course, I'd be happy if this new status helps CKI keep using KCIDB
for internal result reporting as well.

> On a side note, the last line with no data anywhere doesn't have
> a name.  Is the status field optional in KCIDB, so we can send
> data with no status?  That would match the last line I guess.  Or
> maybe it could be named as UNKNOWN or something.  I think that
> kind of relates to missing test results if there's no positive
> error reported by the test harness, see my other paragraph below.

Yes, the "status" field is optional (same as most other fields), and yes,
the last line represents it missing (all missing fields represent "no data"
in KCIDB).

I don't think we should add an explicit "UNKNOWN" status, because then we
would have two kinds of "unknowns", with all the problems that follow.

>> I think I'm missing something.  Are you trying to distinguish these so you
>> can determine whether there is a problem with the test itself, vs. the harness?
>> Are you automatically re-running a test if the harness is the problem?
>>
>> Why do you want to distinguish these error cases?
>
> I was wondering the same thing.  Is MISS effectively like
> a "null" default state for when the system knows a test has been
> started and a result is expected but hasn't arrived yet?  Whereas
> ERROR is when the test has clearly hit an error and failed to
> run?  For example, if a test suite changed and a test case
> disappeared or was renamed then you would see a MISS result with
> the old test case name and that's not really an error.

No, "MISS" is not the default state. The default state is the "status" field
missing, which indicates that the test is planned, but not (completely)
executed yet. That is the only field state that could be changed by the
KCIDB protocol.

I.e. the only possible change to an already-submitted object is to add a
missing field. It's not possible to deterministically do any other change.
Basically, whichever status string you submit is final for that test.

"MISS" is for when you sent your "test plan" in the form of a set of tests
with missing "status" fields, proceeded to testing, and then e.g. network
went down in the middle of it, and you couldn't reach your hosts to get the
results until you ran out of time. Or e.g. you couldn't get the hosts you
need for some of the tests in a reasonable time, because they were taken by
someone else. Or the kernel simply failed to boot on some hosts, and no
tests could be executed there.

In these cases, you can send the "MISS" status to indicate you gave up on
this run. Then people and machines can stop waiting for those results and
go do something else, like say "this version was not fully tested, so we
cannot release it, and gonna ask for a retry", or "we did our best,
upstream, here are your test results, such as they are, take 'em or leave
'em". Or you can use this status in internal communication, to trigger a
re-run of those particular tests, before reporting the results to upstream
KCIDB.

Yes, the "ERROR" status is for when we actually got to executing the test
(suite), and then it messed up, and e.g. crashed in the middle of
execution. This is likely a problem for test maintainers to solve, and not
for the CI people (although they would do well to keep an eye on such
results too).

Yes, as I also described above, if you care about which exact tests a test
suite executes, and some tests in your previously-submitted plan didn't
execute after all, then you would need to set their status to "MISS".

But if you only care that this suite executes, and runs whatever tests it
deems important, you could e.g. send an object for the whole test suite
with its status missing before starting. Then send its subtests with status
filled in with whatever you get as they execute, capping off with the final
status of the whole suite after it's done. And if you failed to run the
suite for whatever reason, then you could send the "MISS" status for it
instead.

Hope this helps :D

Nick

^ permalink raw reply	[flat|nested] 10+ messages in thread
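[Editor's note: the submission semantics described in the message above, where a missing field may be filled in later but a submitted status is final, can be sketched as below. The merge helper is hypothetical, not the real KCIDB implementation.]

```python
# Sketch of the merge rule: a resubmitted object may only fill in
# fields that were missing; existing values (such as a final
# "status") cannot be changed.

def merge(stored, update):
    """Return the stored object with only its missing fields filled in."""
    merged = dict(stored)
    for key, value in update.items():
        if key not in merged:
            merged[key] = value
    return merged

planned = {"id": "cki:test-1", "path": "ltp.syscalls"}  # no status: in progress
finished = merge(planned, {"status": "MISS", "path": "ignored-change"})
assert finished["status"] == "MISS"        # missing field added
assert finished["path"] == "ltp.syscalls"  # existing field unchanged
```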
* Re: [Automated-testing] KCIDB: Support one more test status 2023-04-20 16:37 ` [Automated-testing] " Nikolai Kondrashov @ 2023-05-05 11:39 ` Guillaume Tucker 2023-05-08 10:16 ` Nikolai Kondrashov 0 siblings, 1 reply; 10+ messages in thread From: Guillaume Tucker @ 2023-05-05 11:39 UTC (permalink / raw) To: Nikolai Kondrashov, Bird, Tim, kernelci@lists.linux.dev, Dmitry Vyukov, Cristian Marussi, Alice Ferrazzi, Philip Li, Vishal Bhoj, automated-testing@lists.yoctoproject.org, CKI, Mark Brown, Johnson George, Sachin Sant On 20/04/2023 18:37, Nikolai Kondrashov wrote: > Hi Guillaume, > > Thank you for your response! > > Snipping & answering below. > > On 4/20/23 08:53, Guillaume Tucker via lists.yoctoproject.org wrote: >> On 19/04/2023 21:38, Bird, Tim wrote: >>>> -----Original Message----- >>>> From: Nikolai Kondrashov <Nikolai.Kondrashov@redhat.com> >>>> Hello everyone involved with, or interested in KCIDB, >>>> >>>> After the testing is done, or we gave up trying, we send the same KCIDB test >>>> objects to the database, but this time only containing whatever results we >>>> got, including the "status" fields. However, with the current set of status >>>> strings [1], the only way we can try to express "wanted to run, but couldn't" >>>> is with "SKIP", which is not supposed to alert anyone, yet this situation >>>> should be treated as a problem. >>> >>> Why can't "wanted to run, but couldn't" be expressed with "ERROR"? >> >> There's also "wanted to run, but didn't" if the test wasn't >> actually run. I guess failing to start the test is a harness >> problem, and a failure while it's running is more likely a test >> problem (but not necessarily). > > Yes, that's the idea. > >> But also, what if a test has disappeared from a test suite? > > That depends where you slice your responsibility, I suppose. 
> > If you want to control and make sure which exact sub-tests of a test suite should be executed, then you should report the disappeared test as "MISS", because it would be the fault of your harness not noticing that. > > If you'd prefer to leave it to the test suite maintainers to decide what running their test suite (or a subset of) means, then you just collect whatever you get, and only report "MISS" for the whole test suite when you failed to run it. > >>>> If you look at the above closely, you will notice one possible state missing >>>> (because we didn't need to express failing harnesses), and that is the status >>>> we want to introduce: >>>> >>>> STATUS CODE TEST HARNESS+ LEGEND >>>> >>>> FAIL ❌ ✅ ✅ ❌ - failure >>>> ERROR ➖ ❌ ✅ ✅ - success >>>> => MISS ➖ ➖ ❌ <= ➖ - no data >>>> PASS ✅ ✅ ✅ >>>> DONE ➖ ✅ ✅ >>>> SKIP ➖ ➖ ✅ >>>> ➖ ➖ ➖ >> >> It seems there are many reasons for a test to not run correctly >> and trying to put them in buckets isn't as simple as that. I >> would rather be in favour of having fewer states: pass, fail, >> skip and unknown for when there's no clear test result. > > I agree, we could slice the test outcome into an infinite number of statuses, for every little thing, and keep adding them until the cows come home. > > However, if you look at the above table, you would see that we have statuses strictly focusing on which part of the stack has passed/failed/had no data, and these parts of the stack correspond to the clearly-separate responsibility areas. We're just enumerating all the possibilities in that space, and "MISS" plugs the one remaining hole. > > So, I think it has its place. That is unless we object to having harness (and higher) failures reported to KCIDB. > > I would say, though, that we should accept them. 
Because, even if the CI systems try hard to avoid them, their failures will creep in anyway, and then the issue/incident system will be there to help them set the correct status after the fact, for the situations when they failed to execute the test and reported something else by mistake. > > And of course, I'd be happy if this new status helps CKI keep using KCIDB for internal result reporting as well. > >> On a side note, the last line with no data anywhere doesn't have >> a name. Is the status field optional in KCIDB, so we can send >> data with no status? That would match the last line I guess. Or >> maybe it could be named as UNKNOWN or something. I think that >> kind of relates to missing test results if there's no positive >> error reported by the test harness, see my other paragraph below. > > Yes, the "status" field is optional (same as most other fields), and yes, the last line represents it missing (all missing fields represent "no data" in KCIDB). > > I don't think we should add an explicit "UNKNOWN" status, because then we would have two kinds of "unknowns", with all the problems that follow. OK that's fair, effectively "null" fills that gap. >>> I think I'm missing something. Are you trying to distinguish these so you >>> can determine whether there is a problem with the test itself, vs. the harness? >>> Are you automatically re-running a test if the harness is the problem? >>> >>> Why do you want to distinguish these error cases? >> >> I was wondering the same thing. Is MISS effectively like >> a "null" default state for when the system knows a test has been >> started and a result is expected but hasn't arrived yet? Whereas >> ERROR is when the test has clearly hit an error and failed to >> run? For example, if a test suite changed and a test case >> disappeared or was renamed then you would see a MISS result with >> the old test case name and that's not really an error. > > No, "MISS" is not the default state. 
The default state is the "status" field missing, which indicates that the test is planned, but not (completely) executed yet. That is the only field state that could be changed by the KCIDB protocol. > > I.e. the only possible change to an already-submitted object is to add a missing field. It's not possible to deterministically do any other change. > Basically, whichever status string you submit is final for that test. > > "MISS" is for when you sent your "test plan" in the form of a set of tests with missing "status" fields, proceeded to testing, and then e.g. network went down in the middle of it, and you couldn't reach your hosts to get the results until you ran out of time. Or e.g. you couldn't get the hosts you need for some of the tests in a reasonable time, because they were taken by someone else. Or the kernel simply failed to boot on some hosts, and no tests could be executed there. > > In these cases, you can send the "MISS" status to indicate you gave up on this run. Then people and machines can stop waiting for those results and go do something else, like say "this version was not fully tested, so we cannot release it, and gonna ask for a retry", or "we did our best, upstream, here are your test results, such as they are, take 'em or leave 'em". Or you can use this status in internal communication, to trigger a re-run of those particular tests, before reporting the results to upstream KCIDB. > > Yes, the "ERROR" status is for when we actually got to executing the test (suite), and then it messed up, and e.g. crashed in the middle of execution. This is likely a problem for test maintainers to solve, and not for the CI people (although they would do well to keep an eye on such results too). 
OK I think the documentation could be updated to provide a bit more context about what's expected from each status: https://kernelci.org/docs/kcidb/submitter_guide/#status At the moment, it just says that ERROR means: "the test is faulty, the status of the tested code is unknown." The word "test" here I think means the code written to run a test to you, but I can see how this might be interpreted differently and things that prevent the test from being run could slip in and be reported as errors. Maybe adding something about who in principle is responsible for each error status would help as discussed in yesterday's meeting (e.g. ERROR is for the test authors, MISS is for the people in charge of the test infrastructure). > Yes, as I also described above, if you care about which exact tests a test suite executes, and some tests in your previously-submitted plan didn't execute after all, then you would need to set their status to "MISS". Right, even though I think that's a grey area between the test and the infrastructure. Maybe no status would also work here. > But if you only care that this suite executes, and runs whatever tests it deems important, you could e.g. send an object for the whole test suite with its status missing before starting. Then send its subtests with status filled in with whatever you get as they execute, capping off with the final status of the whole suite after it's done. And if you failed to run the suite for whatever reason, then you could send the "MISS" status for it instead. I think my main point was that we could just have ERROR with a broader scope and then something else like an error type or message to differentiate errors between the test, the infrastructure or anything specific to the CI system. So from a user point of view, it's either a valid result i.e. PASS / FAIL / SKIP or something went wrong and it's ERROR. On the other hand, having MISS shouldn't really be a problem. 
I can see why it could be very useful to have it in KCIDB for CKI's
use-case at least. So I'm still not personally convinced by this but I
think it's fine to add it anyway as I don't see any blocking reason for not
adding it either.

Thanks,
Guillaume

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [Automated-testing] KCIDB: Support one more test status
  2023-05-05 11:39       ` Guillaume Tucker
@ 2023-05-08 10:16         ` Nikolai Kondrashov
  0 siblings, 0 replies; 10+ messages in thread
From: Nikolai Kondrashov @ 2023-05-08 10:16 UTC (permalink / raw)
  To: Guillaume Tucker, Bird, Tim, kernelci@lists.linux.dev,
	Dmitry Vyukov, Cristian Marussi, Alice Ferrazzi, Philip Li,
	Vishal Bhoj, automated-testing@lists.yoctoproject.org, CKI,
	Mark Brown, Johnson George, Sachin Sant

Hi Guillaume,

Thank you for the response. I snipped some old stuff, and answered a bit,
below.

On 5/5/23 14:39, Guillaume Tucker wrote:
> On 20/04/2023 18:37, Nikolai Kondrashov wrote:
>> Yes, the "ERROR" status is for when we actually got to executing the
>> test (suite), and then it messed up, and e.g. crashed in the middle of
>> execution. This is likely a problem for test maintainers to solve, and
>> not for the CI people (although they would do well to keep an eye on
>> such results too).
>
> OK I think the documentation could be updated to provide a bit
> more context about what's expected from each status:
>
> https://kernelci.org/docs/kcidb/submitter_guide/#status
>
> At the moment, it just says that ERROR means:
>
> "the test is faulty, the status of the tested code is unknown."
>
> The word "test" here I think means the code written to run a test
> to you, but I can see how this might be interpreted differently
> and things that prevent the test from being run could slip in and
> be reported as errors. Maybe adding something about who in
> principle is responsible for each error status would help as
> discussed in yesterday's meeting (e.g. ERROR is for the test
> authors, MISS is for the people in charge of the test
> infrastructure).

Yes, we could use better documentation, definitely.

Have you taken a look at the PR in question? I refined the schema docs
there a bit, do they make more sense now?
Here's the PR: https://github.com/kernelci/kcidb-io/pull/74

If that looks good, I'll update the Submitter Guide with this and go into a
bit more detail, adding the table I used here to break down the
responsibility.

>> Yes, as I also described above, if you care about which exact tests a
>> test suite executes, and some tests in your previously-submitted plan
>> didn't execute after all, then you would need to set their status to
>> "MISS".
>
> Right, even though I think that's a grey area between the test
> and the infrastructure. Maybe no status would also work here.

Yes, often it's hard to determine what's at fault, but when you can, it's
best to avoid alerting people who are not responsible for the problem
(such as not notifying the test maintainer in case of a CI failure).

>> But if you only care that this suite executes, and runs whatever tests
>> it deems important, you could e.g. send an object for the whole test
>> suite with its status missing before starting. Then send its subtests
>> with status filled in with whatever you get as they execute, capping
>> off with the final status of the whole suite after it's done. And if
>> you failed to run the suite for whatever reason, then you could send
>> the "MISS" status for it instead.
>
> I think my main point was that we could just have ERROR with a
> broader scope and then something else like an error type or
> message to differentiate errors between the test, the
> infrastructure or anything specific to the CI system. So from a
> user point of view, it's either a valid result i.e. PASS / FAIL /
> SKIP or something went wrong and it's ERROR.

Sure, some CI systems would only be able to report "ERROR", that's OK, and
even if you try hard to report a "MISS" when you can, "ERROR" would also
keep slipping in by mistake. While we already have a lot of fields (and
need more still), I think "MISS" fits the value space of existing test
statuses quite well.
After all, we already try to distinguish "ERROR" from "FAIL", splitting the
test maintainer responsibility from the developer responsibility (when
possible). We're just adding a finer gradation here, splitting "ERROR"
again, into test maintainer and CI maintainer responsibilities.

> On the other hand, having MISS shouldn't really be a problem. I
> can see why it could be very useful to have it in KCIDB for CKI's
> use-case at least. So I'm still not personally convinced by this
> but I think it's fine to add it anyway as I don't see any
> blocking reason for not adding it either.

Thank you, Guillaume. I have now merged the PR and will proceed to finish
support for it in the notifications and dashboards. Right now the new
schema version is not accepted by Production yet.

Nick
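The responsibility split being discussed can be sketched as a small status-to-party routing table. This is an editorial illustration only: the role names and the `who_to_notify` function are hypothetical and not part of KCIDB; real KCIDB notifications are driven by subscriptions, not by a table like this.

```python
# Hypothetical sketch: map each KCIDB test status to the party that
# should act on it, per the FAIL/ERROR/MISS split discussed above.
# Names and function are illustrative, not part of the kcidb-io schema.

RESPONSIBLE_PARTY = {
    "FAIL":  "kernel developers",      # the tested code is at fault
    "ERROR": "test maintainers",       # the test itself misbehaved
    "MISS":  "CI system maintainers",  # the harness/CI never ran the test
    "PASS":  None,                     # nothing to act on
    "DONE":  None,
    "SKIP":  None,
}

def who_to_notify(status):
    """Return the party that should investigate a result, or None."""
    if status is None:
        return None  # missing status: test still planned / in progress
    return RESPONSIBLE_PARTY.get(status)

print(who_to_notify("ERROR"))  # -> test maintainers
print(who_to_notify("MISS"))   # -> CI system maintainers
```

The point of the extra "MISS" value is visible in the table: without it, the last row would have to collapse into "ERROR" and alert the wrong party.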
* Re: KCIDB: Support one more test status
  2023-04-19 19:38 ` Bird, Tim
  2023-04-20  5:53   ` Guillaume Tucker
@ 2023-04-20 12:08   ` Nikolai Kondrashov
  2023-04-20 21:48     ` Bird, Tim
  1 sibling, 1 reply; 10+ messages in thread
From: Nikolai Kondrashov @ 2023-04-20 12:08 UTC (permalink / raw)
  To: Bird, Tim, kernelci@lists.linux.dev, Dmitry Vyukov,
	Cristian Marussi, Alice Ferrazzi, Philip Li, Vishal Bhoj,
	automated-testing@lists.yoctoproject.org, CKI, Mark Brown,
	Johnson George, Sachin Sant

Hi Tim,

Thanks a lot for your response!
I will do some snipping and answering below.

On 4/19/23 22:38, Bird, Tim wrote:
>> After the testing is done, or we gave up trying, we send the same KCIDB
>> test objects to the database, but this time only containing whatever
>> results we got, including the "status" fields. However, with the
>> current set of status strings [1], the only way we can try to express
>> "wanted to run, but couldn't" is with "SKIP", which is not supposed to
>> alert anyone, yet this situation should be treated as a problem.
>
> Why can't "wanted to run, but couldn't" be expressed with "ERROR"?

This is a matter of responsibility area and distinguishing who should be
fixing the problem.

The three layers I listed below correspond to the three distinct parties
involved in testing, at Red Hat, and other large CI systems: "CODE" is for
kernel developers/maintainers, "TEST" is for test maintainers, and
"HARNESS+" is for CI system maintainers.

At Red Hat we have the CKI project, which is responsible for maintaining
the pipeline, builds, provisioning, reporting, etc - "HARNESS+". Then we
have *a lot* of test maintainers, both internal (for the tests Red Hat
needs), and external (for test suites like LTP) - "TEST". Finally, we have
the kernel developers/maintainers, of course - "CODE".

Naturally, if there was an issue with the test (normally reported as
ERROR), we don't want to bother kernel developers.
A test maintainer would need to deal with that (and they would get their
notification), although they would be interested in just regular PASS/FAIL
results too.

Similarly, if the CI system couldn't manage to run the test, we wouldn't
want to report it as ERROR, because that would alert the test maintainer,
while it wouldn't be their fault at all, and they shouldn't go and waste
their time investigating it.

Now, of course, this particular split is not always there, or it's not so
clear-cut. E.g. kunit tests are normally maintained by kernel developers
themselves. So they would be interested in both "CODE" and "TEST" layers
for those, and so wouldn't really need the "ERROR" status - "FAIL" would
be enough.

CI system maintainers often take the role of test maintainers as well, and
they wouldn't really need to distinguish "TEST" and "HARNESS+"; for them
it would be "TEST+", and they wouldn't need the "MISS" status - "ERROR"
would be enough.

However, this split (and the various statuses) is a good tool for handling
all the various responsibility combinations the CI systems submitting to
KCIDB have. It allows precise targeting of notifications and dashboard
data, saving time and effort in many cases.

>> We propose to call this new status "MISS" (as in "the test result
>> should be there, but isn't"), and think it would be useful to others as
>> well.
>>
>> We can break down the testing stack into three layers: the tested code,
>> the test, and the harness (and everything above it) that runs the test.
>> If we then express each existing test status as one trinary outcome per
>> each of those layers, we would get this table (in order of descending
>> status priority):
>>
>>     STATUS   CODE  TEST  HARNESS+        LEGEND
>>
>>     FAIL      ❌    ✅     ✅             ❌ - failure
>>     ERROR     ➖    ❌     ✅             ✅ - success
>>     PASS      ✅    ✅     ✅             ➖ - no data
>>     DONE      ➖    ✅     ✅
>>     SKIP      ➖    ➖     ✅
>>               ➖    ➖     ➖
>>
>> If you look at the above closely, you will notice one possible state
>> missing (because we didn't need to express failing harnesses), and that
>> is the status we want to introduce:
>>
>>     STATUS   CODE  TEST  HARNESS+        LEGEND
>>
>>     FAIL      ❌    ✅     ✅             ❌ - failure
>>     ERROR     ➖    ❌     ✅             ✅ - success
>> =>  MISS      ➖    ➖     ❌  <=         ➖ - no data
>>     PASS      ✅    ✅     ✅
>>     DONE      ➖    ✅     ✅
>>     SKIP      ➖    ➖     ✅
>>               ➖    ➖     ➖
>>
>> Please respond with comments, objections, and (counter-)proposals,
>> if you have them.
>
> I don't understand the rationale for distinguishing a test error from a
> harness error. In either case the test was not executed properly, and so
> there is no useful test result data available. Diagnostic information
> should enable the user to determine whether the problem was due to the
> test code failing or the test harness failing.

This works when the test maintainer and the harness/framework/CI system
maintainer are the same person or team. This doesn't work when everything
from the CI system down to the test (suite) harness is maintained by one
team, and the test by a completely different team (e.g. CKI and LTP).

> I think I'm missing something. Are you trying to distinguish these so
> you can determine whether there is a problem with the test itself, vs.
> the harness?

Yes.

> Are you automatically re-running a test if the harness is the problem?

We do try rerunning tests in case we hit a faulty host in our inventory
(this happens), or e.g. a network problem occurred.
However, at some point we gotta give up, and then we need a way to say:
"this test result is not just missing or in progress (as specified by a
missing "status" property), but we're done testing, and we couldn't run
that test".

Because we usually run multiple suites on each machine, one after another,
if one of them crashes/locks up the machine (we're testing the kernel,
after all), then the following suites won't be able to run. In this case
we also need a way to say "we finished testing, but those tests didn't
even get to run".

> Why do you want to distinguish these error cases?

As described above: to alert the right people, and avoid wasting the time
of others.

Nick
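The plan-then-resolve flow described in this thread can be sketched with minimal KCIDB-style test objects. This is a sketch under assumptions: the `id`, `build_id`, `origin`, and `status` field names follow the kcidb-io schema discussed here, but the concrete values and the surrounding code are illustrative, not a real CKI submission.

```python
import json

# Phase 1: submit the test plan. A missing "status" field means the test
# is planned but not (completely) executed yet - "in progress".
# IDs and origin are made up for illustration.
plan = [
    {"id": "cki:test-1", "build_id": "cki:build-1", "origin": "cki"},
    {"id": "cki:test-2", "build_id": "cki:build-1", "origin": "cki"},
]
print(json.dumps({"tests": plan}, indent=2))

# Phase 2: after testing is done, or we gave up trying, resubmit the same
# objects with final statuses. Since the protocol only allows adding
# previously-missing fields, whichever status is submitted is final.
results = [
    {"id": "cki:test-1", "build_id": "cki:build-1", "origin": "cki",
     "status": "PASS"},
    # test-2 never got to run (e.g. an earlier suite crashed the machine),
    # so it is resolved as MISS rather than SKIP or ERROR.
    {"id": "cki:test-2", "build_id": "cki:build-1", "origin": "cki",
     "status": "MISS"},
]
print(json.dumps({"tests": results}, indent=2))
```

Consumers waiting on `cki:test-2` can then stop waiting: the run is over, and the CI maintainers, not the test maintainers, own the follow-up.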
* RE: KCIDB: Support one more test status
  2023-04-20 12:08   ` Nikolai Kondrashov
@ 2023-04-20 21:48     ` Bird, Tim
  0 siblings, 0 replies; 10+ messages in thread
From: Bird, Tim @ 2023-04-20 21:48 UTC (permalink / raw)
  To: Nikolai Kondrashov, kernelci@lists.linux.dev, Dmitry Vyukov,
	Cristian Marussi, Alice Ferrazzi, Philip Li, Vishal Bhoj,
	automated-testing@lists.yoctoproject.org, CKI, Mark Brown,
	Johnson George, Sachin Sant

> -----Original Message-----
> From: Nikolai Kondrashov <Nikolai.Kondrashov@redhat.com>
>
> Hi Tim,
>
> Thanks a lot for your response!
> I will do some snipping and answering below.
>
> On 4/19/23 22:38, Bird, Tim wrote:
> >> After the testing is done, or we gave up trying, we send the same
> >> KCIDB test objects to the database, but this time only containing
> >> whatever results we got, including the "status" fields. However, with
> >> the current set of status strings [1], the only way we can try to
> >> express "wanted to run, but couldn't" is with "SKIP", which is not
> >> supposed to alert anyone, yet this situation should be treated as a
> >> problem.
> >
> > Why can't "wanted to run, but couldn't" be expressed with "ERROR"?
>
> This is a matter of responsibility area and distinguishing who should be
> fixing the problem.
>
> The three layers I listed below correspond to the three distinct parties
> involved in testing, at Red Hat, and other large CI systems: "CODE" is
> for kernel developers/maintainers, "TEST" is for test maintainers, and
> "HARNESS+" is for CI system maintainers.
>
> At Red Hat we have the CKI project, which is responsible for maintaining
> the pipeline, builds, provisioning, reporting, etc - "HARNESS+". Then we
> have *a lot* of test maintainers, both internal (for the tests Red Hat
> needs), and external (for test suites like LTP) - "TEST". Finally, we
> have the kernel developers/maintainers, of course - "CODE".
> Naturally, if there was an issue with the test (normally reported as
> ERROR), we don't want to bother kernel developers. A test maintainer
> would need to deal with that (and they would get their notification),
> although they would be interested in just regular PASS/FAIL results too.
>
> Similarly, if the CI system couldn't manage to run the test, we won't
> want to report it as ERROR, because that would alert the test maintainer,
> while it wouldn't be their fault at all, and they shouldn't go and waste
> their time investigating it.
>
> Now, of course, this particular split is not always there, or it's not
> so clear-cut. E.g. kunit tests are normally maintained by kernel
> developers themselves. So they would be interested in both "CODE" and
> "TEST" layers for those, and so wouldn't really need the "ERROR" status
> - "FAIL" would be enough.
>
> CI system maintainers often take the role of test maintainers as well,
> and they wouldn't really need to distinguish "TEST" and "HARNESS+", for
> them it would be "TEST+", and they wouldn't need the "MISS" status -
> "ERROR" would be enough.
>
> However, this split (and the various statuses) is a good tool for
> handling all the various responsibility combinations the CI systems
> submitting to KCIDB have. It allows precise targeting of notifications
> and dashboard data, saving time and effort in many cases.

Ok - this sounds reasonable.
 -- Tim
end of thread, other threads:[~2023-05-08 10:16 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed
-- links below jump to the message on this page --)
2023-04-19 17:08 KCIDB: Support one more test status Nikolai Kondrashov
2023-04-19 17:49 ` Mark Brown
2023-04-20 16:50   ` Nikolai Kondrashov
2023-04-19 19:38 ` Bird, Tim
2023-04-20  5:53   ` Guillaume Tucker
2023-04-20 16:37     ` [Automated-testing] " Nikolai Kondrashov
2023-05-05 11:39       ` Guillaume Tucker
2023-05-08 10:16         ` Nikolai Kondrashov
2023-04-20 12:08   ` Nikolai Kondrashov
2023-04-20 21:48     ` Bird, Tim