* KCIDB: Support one more test status
@ 2023-04-19 17:08 Nikolai Kondrashov
2023-04-19 17:49 ` Mark Brown
2023-04-19 19:38 ` Bird, Tim
0 siblings, 2 replies; 10+ messages in thread
From: Nikolai Kondrashov @ 2023-04-19 17:08 UTC (permalink / raw)
To: kernelci, Dmitry Vyukov, Cristian Marussi, Alice Ferrazzi,
Philip Li, Vishal Bhoj, automated-testing, Tim Bird, CKI,
Mark Brown, Johnson George, Sachin Sant
Hello everyone involved with, or interested in KCIDB,
I would like to make KCIDB I/O schema accept one more test status string:
"MISS" (preliminary name), meaning the test was supposed to run, but didn't
because of a harness/framework/infrastructure/CI system failure.
This would be distinct from the "SKIP" status, meaning the test was supposed
to run, but didn't (or didn't complete), because it was not applicable.
Here's the PR in question: https://github.com/kernelci/kcidb-io/pull/74
I'll merge it in two weeks, on Wed, May 3, if there are no unresolved
objections by that time.
Read on for details, and respond either to this message, or in the PR.
At CKI we have both tests that skip themselves for valid reasons, and tests
that should've executed, but didn't, because e.g. the previous test crashed
the machine, or the tested kernel failed to boot. At the same time, we
want to make sure all the tests we planned to execute had their chance.
Before testing, we send our plan to our database as a KCIDB dataset with all
tests listed without the "status" field. According to KCIDB schema/protocol a
missing "status" field simply means no data, and is canonically interpreted as
"execution in progress".
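[Editor's note: the plan-first flow described above can be sketched as below. This is only an illustration with simplified, loosely KCIDB-like field names; make_plan and the id format are hypothetical, not part of the real kcidb client or schema.]

```python
# Sketch of the "plan first, results later" flow: test objects are
# submitted up front WITHOUT a "status" field, which canonically
# means "no data yet" / "execution in progress".

def make_plan(origin, build_id, test_paths):
    """Build KCIDB-style test objects with no "status" field."""
    return [
        {
            "id": f"{origin}:{build_id}:{path}",  # hypothetical id format
            "build_id": build_id,
            "origin": origin,
            "path": path,
            # note: no "status" key on purpose
        }
        for path in test_paths
    ]

plan = make_plan("cki", "cki:build-1", ["ltp.syscalls", "kselftest.net"])
assert all("status" not in test for test in plan)
```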
After the testing is done, or we gave up trying, we send the same KCIDB test
objects to the database, but this time only containing whatever results we
got, including the "status" fields. However, with the current set of status
strings [1], the only way we can try to express "wanted to run, but couldn't"
is with "SKIP", which is not supposed to alert anyone, yet this situation
should be treated as a problem.
We propose to call this new status "MISS" (as in "the test result should be
there, but isn't"), and think it would be useful to others as well.
We can break down the testing stack into three layers: the tested code, the
test, and the harness (and everything above it) that runs the test. If we then
express each existing test status as one ternary outcome for each of those
layers, we would get this table (in order of descending status priority):
STATUS   CODE   TEST   HARNESS+        LEGEND

FAIL      ❌     ✅      ✅            ❌ - failure
ERROR     ➖     ❌      ✅            ✅ - success
PASS      ✅     ✅      ✅            ➖ - no data
DONE      ➖     ✅      ✅
SKIP      ➖     ➖      ✅
          ➖     ➖      ➖
If you look at the above closely, you will notice one possible state missing
(because we didn't need to express failing harnesses), and that is the status
we want to introduce:
   STATUS   CODE   TEST   HARNESS+        LEGEND

   FAIL      ❌     ✅      ✅            ❌ - failure
   ERROR     ➖     ❌      ✅            ✅ - success
=> MISS      ➖     ➖      ❌ <=         ➖ - no data
   PASS      ✅     ✅      ✅
   DONE      ➖     ✅      ✅
   SKIP      ➖     ➖      ✅
             ➖     ➖      ➖
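[Editor's note: read as data, the table enumerates one ternary outcome per layer. A sketch of that encoding is below; it is purely illustrative and not part of the KCIDB schema.]

```python
# The status table above as data: each status maps to a
# (code, test, harness+) triple, where True is success, False is
# failure, and None is "no data".

OUTCOMES = {
    # status: (code, test, harness+)
    "FAIL":  (False, True,  True),
    "ERROR": (None,  False, True),
    "MISS":  (None,  None,  False),   # the proposed new status
    "PASS":  (True,  True,  True),
    "DONE":  (None,  True,  True),
    "SKIP":  (None,  None,  True),
    None:    (None,  None,  None),    # "status" field missing: no data
}

# "MISS" is the only status expressing a harness failure.
harness_failures = [s for s, (_, _, h) in OUTCOMES.items() if h is False]
assert harness_failures == ["MISS"]
```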
Please respond with comments, objections, and (counter-)proposals,
if you have them.
Thank you for your attention!
Nick (and the CKI team)
[1] The current set of supported status strings
https://github.com/kernelci/kcidb-io/blob/0bb7ffff3fe012ae138dd2f7c1d817034fe2c0ba/kcidb_io/schema/v04_01.py#L126
^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: KCIDB: Support one more test status
  2023-04-19 17:08 KCIDB: Support one more test status Nikolai Kondrashov
@ 2023-04-19 17:49 ` Mark Brown
  2023-04-20 16:50   ` Nikolai Kondrashov
  2023-04-19 19:38 ` Bird, Tim
  1 sibling, 1 reply; 10+ messages in thread
From: Mark Brown @ 2023-04-19 17:49 UTC (permalink / raw)
To: Nikolai Kondrashov
Cc: kernelci, Dmitry Vyukov, Cristian Marussi, Alice Ferrazzi,
    Philip Li, Vishal Bhoj, automated-testing, Tim Bird, CKI,
    Johnson George, Sachin Sant

[-- Attachment #1: Type: text/plain, Size: 946 bytes --]

On Wed, Apr 19, 2023 at 08:08:03PM +0300, Nikolai Kondrashov wrote:

> I would like to make KCIDB I/O schema accept one more test status string:
> "MISS" (preliminary name), meaning the test was supposed to run, but didn't
> because of a harness/framework/infrastructure/CI system failure.
>
> This would be distinct from the "SKIP" status, meaning the test was supposed
> to run, but didn't (or didn't complete), because it was not applicable.
>
> Here's the PR in question: https://github.com/kernelci/kcidb-io/pull/74
>
> I'll merge it in two weeks, on Wed, May 3, if there are no unresolved
> objections by that time.
>
> Read on for details, and respond either to this message, or in the PR.

This makes sense to me, it corresponds to how my own testing interprets
results - I have a list of tests I expect to pass, and if any of them
return any result other than pass then that's an issue.  I also picked
the name "missing" for that in my code.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: KCIDB: Support one more test status
  2023-04-19 17:49 ` Mark Brown
@ 2023-04-20 16:50   ` Nikolai Kondrashov
  0 siblings, 0 replies; 10+ messages in thread
From: Nikolai Kondrashov @ 2023-04-20 16:50 UTC (permalink / raw)
To: Mark Brown
Cc: kernelci, Dmitry Vyukov, Cristian Marussi, Alice Ferrazzi,
    Philip Li, Vishal Bhoj, automated-testing, Tim Bird, CKI,
    Johnson George, Sachin Sant

On 4/19/23 20:49, Mark Brown wrote:
> On Wed, Apr 19, 2023 at 08:08:03PM +0300, Nikolai Kondrashov wrote:
>
>> I would like to make KCIDB I/O schema accept one more test status string:
>> "MISS" (preliminary name), meaning the test was supposed to run, but didn't
>> because of a harness/framework/infrastructure/CI system failure.
>
>> This would be distinct from the "SKIP" status, meaning the test was supposed
>> to run, but didn't (or didn't complete), because it was not applicable.
>
>> Here's the PR in question: https://github.com/kernelci/kcidb-io/pull/74
>
>> I'll merge it in two weeks, on Wed, May 3, if there are no unresolved
>> objections by that time.
>
>> Read on for details, and respond either to this message, or in the PR.
>
> This makes sense to me, it corresponds to how my own testing interprets
> results - I have a list of tests I expect to pass, and if any of them
> return any result other than pass then that's an issue.  I also picked
> the name "missing" for that in my code.

Yeah, it's kinda like that. Only KCIDB defines explicit statuses for a bunch
of specific failures, and I propose to make "MISS" represent
harness/framework/CI system mess-ups/"misses" only.

Nick

^ permalink raw reply	[flat|nested] 10+ messages in thread
* RE: KCIDB: Support one more test status
  2023-04-19 17:08 KCIDB: Support one more test status Nikolai Kondrashov
  2023-04-19 17:49 ` Mark Brown
@ 2023-04-19 19:38 ` Bird, Tim
  2023-04-20  5:53   ` Guillaume Tucker
  2023-04-20 12:08   ` Nikolai Kondrashov
  1 sibling, 2 replies; 10+ messages in thread
From: Bird, Tim @ 2023-04-19 19:38 UTC (permalink / raw)
To: Nikolai Kondrashov, kernelci@lists.linux.dev, Dmitry Vyukov,
    Cristian Marussi, Alice Ferrazzi, Philip Li, Vishal Bhoj,
    automated-testing@lists.yoctoproject.org, CKI, Mark Brown,
    Johnson George, Sachin Sant

> -----Original Message-----
> From: Nikolai Kondrashov <Nikolai.Kondrashov@redhat.com>
> Hello everyone involved with, or interested in KCIDB,
>
> I would like to make KCIDB I/O schema accept one more test status string:
> "MISS" (preliminary name), meaning the test was supposed to run, but didn't
> because of a harness/framework/infrastructure/CI system failure.
>
> This would be distinct from the "SKIP" status, meaning the test was supposed
> to run, but didn't (or didn't complete), because it was not applicable.
>
> Here's the PR in question: https://github.com/kernelci/kcidb-io/pull/74
>
> I'll merge it in two weeks, on Wed, May 3, if there are no unresolved
> objections by that time.
>
> Read on for details, and respond either to this message, or in the PR.
>
> At CKI we have both tests that skip themselves for valid reasons, and tests
> that should've executed, but didn't. Because e.g. the previous test has
> crashed the machine, or the tested kernel failed to boot. At the same time, we
> want to make sure all the tests we planned to execute had their chance.
>
> Before testing, we send our plan to our database as a KCIDB dataset with all
> tests listed without the "status" field. According to KCIDB schema/protocol a
> missing "status" field simply means no data, and is canonically interpreted as
> "execution in progress".
>
> After the testing is done, or we gave up trying, we send the same KCIDB test
> objects to the database, but this time only containing whatever results we
> got, including the "status" fields. However, with the current set of status
> strings [1], the only way we can try to express "wanted to run, but couldn't"
> is with "SKIP", which is not supposed to alert anyone, yet this situation
> should be treated as a problem.

Why can't "wanted to run, but couldn't" be expressed with "ERROR"?

> We propose to call this new status "MISS" (as in "the test result should be
> there, but isn't"), and think it would be useful to others as well.
>
> We can break down the testing stack into three layers: the tested code, the
> test, and the harness (and everything above it) that runs the test. If we then
> express each existing test status as one trinary outcome per each of those
> layers, we would get this table (in order of descending status priority):
>
> STATUS CODE TEST HARNESS+ LEGEND
>
> FAIL ❌ ✅ ✅ ❌ - failure
> ERROR ➖ ❌ ✅ ✅ - success
> PASS ✅ ✅ ✅ ➖ - no data
> DONE ➖ ✅ ✅
> SKIP ➖ ➖ ✅
> ➖ ➖ ➖
>
> If you look at the above closely, you will notice one possible state missing
> (because we didn't need to express failing harnesses), and that is the status
> we want to introduce:
>
> STATUS CODE TEST HARNESS+ LEGEND
>
> FAIL ❌ ✅ ✅ ❌ - failure
> ERROR ➖ ❌ ✅ ✅ - success
> => MISS ➖ ➖ ❌ <= ➖ - no data
> PASS ✅ ✅ ✅
> DONE ➖ ✅ ✅
> SKIP ➖ ➖ ✅
> ➖ ➖ ➖
>
> Please respond with comments, objections, and (counter-)proposals,
> if you have them.

I don't understand the rationale for distinguishing a test error from a harness
error.  In either case the test was not executed properly, and so there is no
useful test result data available.  Diagnostic information should enable
the user to determine whether the problem was due to the test code failing
or the test harness failing.

I think I'm missing something.  Are you trying to distinguish these so you
can determine whether there is a problem with the test itself, vs. the harness?
Are you automatically re-running a test if the harness is the problem?

Why do you want to distinguish these error cases?
 -- Tim

> Thank you for your attention!
> Nick (and the CKI team)
>
> [1] The current set of supported status strings
>
> https://github.com/kernelci/kcidb-io/blob/0bb7ffff3fe012ae138dd2f7c1d817034fe2c0ba/kcidb_io/schema/v04_01.py#L126

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: KCIDB: Support one more test status
  2023-04-19 19:38 ` Bird, Tim
@ 2023-04-20  5:53   ` Guillaume Tucker
  2023-04-20 16:37     ` [Automated-testing] " Nikolai Kondrashov
  2023-04-20 12:08   ` Nikolai Kondrashov
  1 sibling, 1 reply; 10+ messages in thread
From: Guillaume Tucker @ 2023-04-20 5:53 UTC (permalink / raw)
To: Bird, Tim, Nikolai Kondrashov, kernelci@lists.linux.dev,
    Dmitry Vyukov, Cristian Marussi, Alice Ferrazzi, Philip Li,
    Vishal Bhoj, automated-testing@lists.yoctoproject.org, CKI,
    Mark Brown, Johnson George, Sachin Sant

On 19/04/2023 21:38, Bird, Tim wrote:
>
>> -----Original Message-----
>> From: Nikolai Kondrashov <Nikolai.Kondrashov@redhat.com>
>> Hello everyone involved with, or interested in KCIDB,
>>
>> I would like to make KCIDB I/O schema accept one more test status string:
>> "MISS" (preliminary name), meaning the test was supposed to run, but didn't
>> because of a harness/framework/infrastructure/CI system failure.
>>
>> This would be distinct from the "SKIP" status, meaning the test was supposed
>> to run, but didn't (or didn't complete), because it was not applicable.
>>
>> Here's the PR in question: https://github.com/kernelci/kcidb-io/pull/74
>>
>> I'll merge it in two weeks, on Wed, May 3, if there are no unresolved
>> objections by that time.
>>
>> Read on for details, and respond either to this message, or in the PR.
>>
>> At CKI we have both tests that skip themselves for valid reasons, and tests
>> that should've executed, but didn't. Because e.g. the previous test has
>> crashed the machine, or the tested kernel failed to boot. At the same time, we
>> want to make sure all the tests we planned to execute had their chance.
>>
>> Before testing, we send our plan to our database as a KCIDB dataset with all
>> tests listed without the "status" field. According to KCIDB schema/protocol a
>> missing "status" field simply means no data, and is canonically interpreted as
>> "execution in progress".
>>
>> After the testing is done, or we gave up trying, we send the same KCIDB test
>> objects to the database, but this time only containing whatever results we
>> got, including the "status" fields. However, with the current set of status
>> strings [1], the only way we can try to express "wanted to run, but couldn't"
>> is with "SKIP", which is not supposed to alert anyone, yet this situation
>> should be treated as a problem.
>
> Why can't "wanted to run, but couldn't" be expressed with "ERROR"?

There's also "wanted to run, but didn't" if the test wasn't
actually run.  I guess failing to start the test is a harness
problem, and a failure while it's running is more likely a test
problem (but not necessarily).

But also, what if a test has disappeared from a test suite?

>> We propose to call this new status "MISS" (as in "the test result should be
>> there, but isn't"), and think it would be useful to others as well.
>>
>> We can break down the testing stack into three layers: the tested code, the
>> test, and the harness (and everything above it) that runs the test. If we then
>> express each existing test status as one trinary outcome per each of those
>> layers, we would get this table (in order of descending status priority):
>>
>> STATUS CODE TEST HARNESS+ LEGEND
>>
>> FAIL ❌ ✅ ✅ ❌ - failure
>> ERROR ➖ ❌ ✅ ✅ - success
>> PASS ✅ ✅ ✅ ➖ - no data
>> DONE ➖ ✅ ✅
>> SKIP ➖ ➖ ✅
>> ➖ ➖ ➖
>>
>> If you look at the above closely, you will notice one possible state missing
>> (because we didn't need to express failing harnesses), and that is the status
>> we want to introduce:
>>
>> STATUS CODE TEST HARNESS+ LEGEND
>>
>> FAIL ❌ ✅ ✅ ❌ - failure
>> ERROR ➖ ❌ ✅ ✅ - success
>> => MISS ➖ ➖ ❌ <= ➖ - no data
>> PASS ✅ ✅ ✅
>> DONE ➖ ✅ ✅
>> SKIP ➖ ➖ ✅
>> ➖ ➖ ➖

It seems there are many reasons for a test to not run correctly
and trying to put them in buckets isn't as simple as that.  I
would rather be in favour of having fewer states: pass, fail,
skip and unknown for when there's no clear test result.

On a side note, the last line with no data anywhere doesn't have
a name.  Is the status field optional in KCIDB, so we can send
data with no status?  That would match the last line I guess.  Or
maybe it could be named as UNKNOWN or something.  I think that
kind of relates to missing test results if there's no positive
error reported by the test harness, see my other paragraph below.

>> Please respond with comments, objections, and (counter-)proposals,
>> if you have them.
>
> I don't understand the rationale for distinguishing a test error from a harness
> error.  In either case the test was not executed properly, and so there is no
> useful test result data available.  Diagnostic information should enable
> the user to determine whether the problem was due to the test code failing
> or the test harness failing.
>
> I think I'm missing something.  Are you trying to distinguish these so you
> can determine whether there is a problem with the test itself, vs. the harness?
> Are you automatically re-running a test if the harness is the problem?
>
> Why do you want to distinguish these error cases?

I was wondering the same thing.  Is MISS effectively like
a "null" default state for when the system knows a test has been
started and a result is expected but hasn't arrived yet?  Whereas
ERROR is when the test has clearly hit an error and failed to
run?  For example, if a test suite changed and a test case
disappeared or was renamed then you would see a MISS result with
the old test case name and that's not really an error.

Thanks,
Guillaume

>> Thank you for your attention!
>> Nick (and the CKI team)
>>
>> [1] The current set of supported status strings
>>
>> https://github.com/kernelci/kcidb-io/blob/0bb7ffff3fe012ae138dd2f7c1d817034fe2c0ba/kcidb_io/schema/v04_01.py#L126

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [Automated-testing] KCIDB: Support one more test status
  2023-04-20  5:53 ` Guillaume Tucker
@ 2023-04-20 16:37   ` Nikolai Kondrashov
  2023-05-05 11:39     ` Guillaume Tucker
  0 siblings, 1 reply; 10+ messages in thread
From: Nikolai Kondrashov @ 2023-04-20 16:37 UTC (permalink / raw)
To: guillaume.tucker, Bird, Tim, kernelci@lists.linux.dev,
    Dmitry Vyukov, Cristian Marussi, Alice Ferrazzi, Philip Li,
    Vishal Bhoj, automated-testing@lists.yoctoproject.org, CKI,
    Mark Brown, Johnson George, Sachin Sant

Hi Guillaume,

Thank you for your response!

Snipping & answering below.

On 4/20/23 08:53, Guillaume Tucker via lists.yoctoproject.org wrote:
> On 19/04/2023 21:38, Bird, Tim wrote:
>>> -----Original Message-----
>>> From: Nikolai Kondrashov <Nikolai.Kondrashov@redhat.com>
>>> Hello everyone involved with, or interested in KCIDB,
>>>
>>> After the testing is done, or we gave up trying, we send the same KCIDB test
>>> objects to the database, but this time only containing whatever results we
>>> got, including the "status" fields. However, with the current set of status
>>> strings [1], the only way we can try to express "wanted to run, but couldn't"
>>> is with "SKIP", which is not supposed to alert anyone, yet this situation
>>> should be treated as a problem.
>>
>> Why can't "wanted to run, but couldn't" be expressed with "ERROR"?
>
> There's also "wanted to run, but didn't" if the test wasn't
> actually run.  I guess failing to start the test is a harness
> problem, and a failure while it's running is more likely a test
> problem (but not necessarily).

Yes, that's the idea.

> But also, what if a test has disappeared from a test suite?

That depends where you slice your responsibility, I suppose.

If you want to control and make sure which exact sub-tests of a test suite
should be executed, then you should report the disappeared test as "MISS",
because it would be the fault of your harness not noticing that.

If you'd prefer to leave it to the test suite maintainers to decide what
running their test suite (or a subset of) means, then you just collect
whatever you get, and only report "MISS" for the whole test suite when you
failed to run it.

>>> If you look at the above closely, you will notice one possible state missing
>>> (because we didn't need to express failing harnesses), and that is the status
>>> we want to introduce:
>>>
>>> STATUS CODE TEST HARNESS+ LEGEND
>>>
>>> FAIL ❌ ✅ ✅ ❌ - failure
>>> ERROR ➖ ❌ ✅ ✅ - success
>>> => MISS ➖ ➖ ❌ <= ➖ - no data
>>> PASS ✅ ✅ ✅
>>> DONE ➖ ✅ ✅
>>> SKIP ➖ ➖ ✅
>>> ➖ ➖ ➖
>
> It seems there are many reasons for a test to not run correctly
> and trying to put them in buckets isn't as simple as that.  I
> would rather be in favour of having fewer states: pass, fail,
> skip and unknown for when there's no clear test result.

I agree, we could slice the test outcome into an infinite number of
statuses, for every little thing, and keep adding them until the cows come
home.

However, if you look at the above table, you would see that we have statuses
strictly focusing on which part of the stack has passed/failed/had no data,
and these parts of the stack correspond to the clearly-separate
responsibility areas. We're just enumerating all the possibilities in that
space, and "MISS" plugs the one remaining hole.

So, I think it has its place. That is unless we object to having harness
(and higher) failures reported to KCIDB.

I would say, though, that we should accept them. Because, even if the CI
systems try hard to avoid them, their failures will creep in anyway, and
then the issue/incident system will be there to help them set the correct
status after the fact, for the situations when they failed to execute the
test and reported something else by mistake.

And of course, I'd be happy if this new status helps CKI keep using KCIDB
for internal result reporting as well.

> On a side note, the last line with no data anywhere doesn't have
> a name.  Is the status field optional in KCIDB, so we can send
> data with no status?  That would match the last line I guess.  Or
> maybe it could be named as UNKNOWN or something.  I think that
> kind of relates to missing test results if there's no positive
> error reported by the test harness, see my other paragraph below.

Yes, the "status" field is optional (same as most other fields), and yes,
the last line represents it missing (all missing fields represent "no data"
in KCIDB).

I don't think we should add an explicit "UNKNOWN" status, because then we
would have two kinds of "unknowns", with all the problems that follow.

>> I think I'm missing something.  Are you trying to distinguish these so you
>> can determine whether there is a problem with the test itself, vs. the harness?
>> Are you automatically re-running a test if the harness is the problem?
>>
>> Why do you want to distinguish these error cases?
>
> I was wondering the same thing.  Is MISS effectively like
> a "null" default state for when the system knows a test has been
> started and a result is expected but hasn't arrived yet?  Whereas
> ERROR is when the test has clearly hit an error and failed to
> run?  For example, if a test suite changed and a test case
> disappeared or was renamed then you would see a MISS result with
> the old test case name and that's not really an error.

No, "MISS" is not the default state. The default state is the "status" field
missing, which indicates that the test is planned, but not (completely)
executed yet. That is the only field state that could be changed by the
KCIDB protocol.

I.e. the only possible change to an already-submitted object is to add a
missing field. It's not possible to deterministically do any other change.
Basically, whichever status string you submit is final for that test.

"MISS" is for when you sent your "test plan" in the form of a set of tests
with missing "status" fields, proceeded to testing, and then e.g. network
went down in the middle of it, and you couldn't reach your hosts to get the
results until you ran out of time. Or e.g. you couldn't get the hosts you
need for some of the tests in a reasonable time, because they were taken by
someone else. Or the kernel simply failed to boot on some hosts, and no
tests could be executed there.

In these cases, you can send the "MISS" status to indicate you gave up on
this run. Then people and machines can stop waiting for those results and
go do something else, like say "this version was not fully tested, so we
cannot release it, and gonna ask for a retry", or "we did our best,
upstream, here are your test results, such as they are, take 'em or leave
'em". Or you can use this status in internal communication, to trigger a
re-run of those particular tests, before reporting the results to upstream
KCIDB.

Yes, the "ERROR" status is for when we actually got to executing the test
(suite), and then it messed up, and e.g. crashed in the middle of
execution. This is likely a problem for test maintainers to solve, and not
for the CI people (although they would do well to keep an eye on such
results too).

Yes, as I also described above, if you care about which exact tests a test
suite executes, and some tests in your previously-submitted plan didn't
execute after all, then you would need to set their status to "MISS".

But if you only care that this suite executes, and runs whatever tests it
deems important, you could e.g. send an object for the whole test suite
with its status missing before starting. Then send its subtests with status
filled in with whatever you get as they execute, capping off with the final
status of the whole suite after it's done. And if you failed to run the
suite for whatever reason, then you could send the "MISS" status for it
instead.

Hope this helps :D

Nick

^ permalink raw reply	[flat|nested] 10+ messages in thread
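[Editor's note: the submission semantics described in the message above, where a missing field may be filled in later but a submitted status is final, can be sketched as below. The merge helper is hypothetical, not the real KCIDB implementation.]

```python
# Sketch of the merge rule: a resubmitted object may only fill in
# fields that were missing; existing values (such as a final
# "status") cannot be changed.

def merge(stored, update):
    """Return the stored object with only its missing fields filled in."""
    merged = dict(stored)
    for key, value in update.items():
        if key not in merged:
            merged[key] = value
    return merged

planned = {"id": "cki:test-1", "path": "ltp.syscalls"}  # no status: in progress
finished = merge(planned, {"status": "MISS", "path": "ignored-change"})
assert finished["status"] == "MISS"        # missing field added
assert finished["path"] == "ltp.syscalls"  # existing field unchanged
```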
* Re: [Automated-testing] KCIDB: Support one more test status 2023-04-20 16:37 ` [Automated-testing] " Nikolai Kondrashov @ 2023-05-05 11:39 ` Guillaume Tucker 2023-05-08 10:16 ` Nikolai Kondrashov 0 siblings, 1 reply; 10+ messages in thread From: Guillaume Tucker @ 2023-05-05 11:39 UTC (permalink / raw) To: Nikolai Kondrashov, Bird, Tim, kernelci@lists.linux.dev, Dmitry Vyukov, Cristian Marussi, Alice Ferrazzi, Philip Li, Vishal Bhoj, automated-testing@lists.yoctoproject.org, CKI, Mark Brown, Johnson George, Sachin Sant On 20/04/2023 18:37, Nikolai Kondrashov wrote: > Hi Guillaume, > > Thank you for your response! > > Snipping & answering below. > > On 4/20/23 08:53, Guillaume Tucker via lists.yoctoproject.org wrote: >> On 19/04/2023 21:38, Bird, Tim wrote: >>>> -----Original Message----- >>>> From: Nikolai Kondrashov <Nikolai.Kondrashov@redhat.com> >>>> Hello everyone involved with, or interested in KCIDB, >>>> >>>> After the testing is done, or we gave up trying, we send the same KCIDB test >>>> objects to the database, but this time only containing whatever results we >>>> got, including the "status" fields. However, with the current set of status >>>> strings [1], the only way we can try to express "wanted to run, but couldn't" >>>> is with "SKIP", which is not supposed to alert anyone, yet this situation >>>> should be treated as a problem. >>> >>> Why can't "wanted to run, but couldn't" be expressed with "ERROR"? >> >> There's also "wanted to run, but didn't" if the test wasn't >> actually run. I guess failing to start the test is a harness >> problem, and a failure while it's running is more likely a test >> problem (but not necessarily). > > Yes, that's the idea. > >> But also, what if a test has disappeared from a test suite? > > That depends where you slice your responsibility, I suppose. 
> > If you want to control and make sure which exact sub-tests of a test suite should be executed, then you should report the disappeared test as "MISS", because it would be the fault of your harness not noticing that. > > If you'd prefer to leave it to the test suite maintainers to decide what running their test suite (or a subset of) means, then you just collect whatever you get, and only report "MISS" for the whole test suite when you failed to run it. > >>>> If you look at the above closely, you will notice one possible state missing >>>> (because we didn't need to express failing harnesses), and that is the status >>>> we want to introduce: >>>> >>>> STATUS CODE TEST HARNESS+ LEGEND >>>> >>>> FAIL ❌ ✅ ✅ ❌ - failure >>>> ERROR ➖ ❌ ✅ ✅ - success >>>> => MISS ➖ ➖ ❌ <= ➖ - no data >>>> PASS ✅ ✅ ✅ >>>> DONE ➖ ✅ ✅ >>>> SKIP ➖ ➖ ✅ >>>> ➖ ➖ ➖ >> >> It seems there are many reasons for a test to not run correctly >> and trying to put them in buckets isn't as simple as that. I >> would rather be in favour of having fewer states: pass, fail, >> skip and unknown for when there's no clear test result. > > I agree, we could slice the test outcome into an infinite number of statuses, for every little thing, and keep adding them until the cows come home. > > However, if you look at the above table, you would see that we have statuses strictly focusing on which part of the stack has passed/failed/had no data, and these parts of the stack correspond to the clearly-separate responsibility areas. We're just enumerating all the possibilities in that space, and "MISS" plugs the one remaining hole. > > So, I think it has its place. That is unless we object to having harness (and higher) failures reported to KCIDB. > > I would say, though, that we should accept them. 
Because, even if the CI systems try hard to avoid them, their failures will creep in anyway, and then the issue/incident system will be there to help them set the correct status after the fact, for the situations when they failed to execute the test and reported something else by mistake. > > And of course, I'd be happy if this new status helps CKI keep using KCIDB for internal result reporting as well. > >> On a side note, the last line with no data anywhere doesn't have >> a name. Is the status field optional in KCIDB, so we can send >> data with no status? That would match the last line I guess. Or >> maybe it could be named as UNKNOWN or something. I think that >> kind of relates to missing test results if there's no positive >> error reported by the test harness, see my other paragraph below. > > Yes, the "status" field is optional (same as most other fields), and yes, the last line represents it missing (all missing fields represent "no data" in KCIDB). > > I don't think we should add an explicit "UNKNOWN" status, because then we would have two kinds of "unknowns", with all the problems that follow. OK that's fair, effectively "null" fills that gap. >>> I think I'm missing something. Are you trying to distinguish these so you >>> can determine whether there is a problem with the test itself, vs. the harness? >>> Are you automatically re-running a test if the harness is the problem? >>> >>> Why do you want to distinguish these error cases? >> >> I was wondering the same thing. Is MISS effectively like >> a "null" default state for when the system knows a test has been >> started and a result is expected but hasn't arrived yet? Whereas >> ERROR is when the test has clearly hit an error and failed to >> run? For example, if a test suite changed and a test case >> disappeared or was renamed then you would see a MISS result with >> the old test case name and that's not really an error. > > No, "MISS" is not the default state. 
The default state is the "status" field missing, which indicates that the test is planned, but not (completely) executed yet. That is the only field state that could be changed by the KCIDB protocol. > > I.e. the only possible change to an already-submitted object is to add a missing field. It's not possible to deterministically do any other change. > Basically, whichever status string you submit is final for that test. > > "MISS" is for when you sent your "test plan" in the form of a set of tests with missing "status" fields, proceeded to testing, and then e.g. network went down in the middle of it, and you couldn't reach your hosts to get the results until you ran out of time. Or e.g. you couldn't get the hosts you need for some of the tests in a reasonable time, because they were taken by someone else. Or the kernel simply failed to boot on some hosts, and no tests could be executed there. > > In these cases, you can send the "MISS" status to indicate you gave up on this run. Then people and machines can stop waiting for those results and go do something else, like say "this version was not fully tested, so we cannot release it, and gonna ask for a retry", or "we did our best, upstream, here are your test results, such as they are, take 'em or leave 'em". Or you can use this status in internal communication, to trigger a re-run of those particular tests, before reporting the results to upstream KCIDB. > > Yes, the "ERROR" status is for when we actually got to executing the test (suite), and then it messed up, and e.g. crashed in the middle of execution. This is likely a problem for test maintainers to solve, and not for the CI people (although they would do well to keep an eye on such results too). 
OK I think the documentation could be updated to provide a bit more context about what's expected from each status: https://kernelci.org/docs/kcidb/submitter_guide/#status At the moment, it just says that ERROR means: "the test is faulty, the status of the tested code is unknown." The word "test" here I think means the code written to run a test to you, but I can see how this might be interpreted differently and things that prevent the test from being run could slip in and be reported as errors. Maybe adding something about who in principle is responsible for each error status would help as discussed in yesterday's meeting (e.g. ERROR is for the test authors, MISS is for the people in charge of the test infrastructure). > Yes, as I also described above, if you care about which exact tests a test suite executes, and some tests in your previously-submitted plan didn't execute after all, then you would need to set their status to "MISS". Right, even though I think that's a grey area between the test and the infrastructure. Maybe no status would also work here. > But if you only care that this suite executes, and runs whatever tests it deems important, you could e.g. send an object for the whole test suite with its status missing before starting. Then send its subtests with status filled in with whatever you get as they execute, capping off with the final status of the whole suite after it's done. And if you failed to run the suite for whatever reason, then you could send the "MISS" status for it instead. I think my main point was that we could just have ERROR with a broader scope and then something else like an error type or message to differentiate errors between the test, the infrastructure or anything specific to the CI system. So from a user point of view, it's either a valid result i.e. PASS / FAIL / SKIP or something went wrong and it's ERROR. On the other hand, having MISS shouldn't really be a problem. 
I can see why it could be very useful to have it in KCIDB for CKI's
use-case at least. So I'm still not personally convinced by this but I
think it's fine to add it anyway as I don't see any blocking reason for not
adding it either.

Thanks,
Guillaume

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [Automated-testing] KCIDB: Support one more test status
  2023-05-05 11:39       ` Guillaume Tucker
@ 2023-05-08 10:16         ` Nikolai Kondrashov
  0 siblings, 0 replies; 10+ messages in thread
From: Nikolai Kondrashov @ 2023-05-08 10:16 UTC (permalink / raw)
  To: Guillaume Tucker, Bird, Tim, kernelci@lists.linux.dev,
	Dmitry Vyukov, Cristian Marussi, Alice Ferrazzi, Philip Li,
	Vishal Bhoj, automated-testing@lists.yoctoproject.org, CKI,
	Mark Brown, Johnson George, Sachin Sant

Hi Guillaume,

Thank you for the response. I snipped some old stuff, and answered a bit,
below.

On 5/5/23 14:39, Guillaume Tucker wrote:
> On 20/04/2023 18:37, Nikolai Kondrashov wrote:
>> Yes, the "ERROR" status is for when we actually got to executing the
>> test (suite), and then it messed up, and e.g. crashed in the middle of
>> execution. This is likely a problem for test maintainers to solve, and
>> not for the CI people (although they would do well to keep an eye on
>> such results too).
>
> OK I think the documentation could be updated to provide a bit
> more context about what's expected from each status:
>
> https://kernelci.org/docs/kcidb/submitter_guide/#status
>
> At the moment, it just says that ERROR means:
>
> "the test is faulty, the status of the tested code is unknown."
>
> The word "test" here I think means the code written to run a test
> to you, but I can see how this might be interpreted differently
> and things that prevent the test from being run could slip in and
> be reported as errors. Maybe adding something about who in
> principle is responsible for each error status would help as
> discussed in yesterday's meeting (e.g. ERROR is for the test
> authors, MISS is for the people in charge of the test
> infrastructure).

Yes, we could use better documentation, definitely.

Have you taken a look at the PR in question? I refined the schema docs
there a bit, do they make more sense now?
Here's the PR: https://github.com/kernelci/kcidb-io/pull/74

If that looks good, I'll update the Submitter Guide with this and go into a
bit more detail, adding the table I used here to break down the
responsibility.

>> Yes, as I also described above, if you care about which exact tests a
>> test suite executes, and some tests in your previously-submitted plan
>> didn't execute after all, then you would need to set their status to
>> "MISS".
>
> Right, even though I think that's a grey area between the test
> and the infrastructure. Maybe no status would also work here.

Yes, often it's hard to determine what's at fault, but when you can, it's
best to avoid alerting people who are not responsible for the problem
(such as not notifying the test maintainer in case of a CI failure).

>> But if you only care that this suite executes, and runs whatever tests
>> it deems important, you could e.g. send an object for the whole test
>> suite with its status missing before starting. Then send its subtests
>> with status filled in with whatever you get as they execute, capping
>> off with the final status of the whole suite after it's done. And if
>> you failed to run the suite for whatever reason, then you could send
>> the "MISS" status for it instead.
>
> I think my main point was that we could just have ERROR with a
> broader scope and then something else like an error type or
> message to differentiate errors between the test, the
> infrastructure or anything specific to the CI system. So from a
> user point of view, it's either a valid result i.e. PASS / FAIL /
> SKIP or something went wrong and it's ERROR.

Sure, some CI systems would only be able to report "ERROR", that's OK, and
even if you try hard to report a "MISS" when you can, "ERROR" would also
keep slipping in by mistake. While we already have a lot of fields (and
need more still), I think "MISS" fits the value space of existing test
statuses quite well.
After all, we already try to distinguish "ERROR" from "FAIL", splitting the
test maintainer responsibility from the developer responsibility (when
possible). We're just adding a finer gradation here, splitting "ERROR"
again, into test maintainer and CI maintainer responsibilities.

> On the other hand, having MISS shouldn't really be a problem. I
> can see why it could be very useful to have it in KCIDB for CKI's
> use-case at least. So I'm still not personally convinced by this
> but I think it's fine to add it anyway as I don't see any
> blocking reason for not adding it either.

Thank you, Guillaume. I have now merged the PR and will proceed to finish
support for it in the notifications and dashboards. Right now the new
schema version is not accepted by Production yet.

Nick
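The responsibility split being discussed can be sketched as a small status-to-party routing table. This is an editorial illustration only: the role names and the `who_to_notify` function are hypothetical and not part of KCIDB; real KCIDB notifications are driven by subscriptions, not by a table like this.

```python
# Hypothetical sketch: map each KCIDB test status to the party that
# should act on it, per the FAIL/ERROR/MISS split discussed above.
# Names and function are illustrative, not part of the kcidb-io schema.

RESPONSIBLE_PARTY = {
    "FAIL":  "kernel developers",      # the tested code is at fault
    "ERROR": "test maintainers",       # the test itself misbehaved
    "MISS":  "CI system maintainers",  # the harness/CI never ran the test
    "PASS":  None,                     # nothing to act on
    "DONE":  None,
    "SKIP":  None,
}

def who_to_notify(status):
    """Return the party that should investigate a result, or None."""
    if status is None:
        return None  # missing status: test still planned / in progress
    return RESPONSIBLE_PARTY.get(status)

print(who_to_notify("ERROR"))  # -> test maintainers
print(who_to_notify("MISS"))   # -> CI system maintainers
```

The point of the extra "MISS" value is visible in the table: without it, the last row would have to collapse into "ERROR" and alert the wrong party.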
* Re: KCIDB: Support one more test status
  2023-04-19 19:38 ` Bird, Tim
  2023-04-20  5:53   ` Guillaume Tucker
@ 2023-04-20 12:08   ` Nikolai Kondrashov
  2023-04-20 21:48     ` Bird, Tim
  1 sibling, 1 reply; 10+ messages in thread
From: Nikolai Kondrashov @ 2023-04-20 12:08 UTC (permalink / raw)
  To: Bird, Tim, kernelci@lists.linux.dev, Dmitry Vyukov,
	Cristian Marussi, Alice Ferrazzi, Philip Li, Vishal Bhoj,
	automated-testing@lists.yoctoproject.org, CKI, Mark Brown,
	Johnson George, Sachin Sant

Hi Tim,

Thanks a lot for your response!
I will do some snipping and answering below.

On 4/19/23 22:38, Bird, Tim wrote:
>> After the testing is done, or we gave up trying, we send the same KCIDB
>> test objects to the database, but this time only containing whatever
>> results we got, including the "status" fields. However, with the
>> current set of status strings [1], the only way we can try to express
>> "wanted to run, but couldn't" is with "SKIP", which is not supposed to
>> alert anyone, yet this situation should be treated as a problem.
>
> Why can't "wanted to run, but couldn't" be expressed with "ERROR"?

This is a matter of responsibility area and distinguishing who should be
fixing the problem.

The three layers I listed below correspond to the three distinct parties
involved in testing, at Red Hat, and other large CI systems: "CODE" is for
kernel developers/maintainers, "TEST" is for test maintainers, and
"HARNESS+" is for CI system maintainers.

At Red Hat we have the CKI project, which is responsible for maintaining
the pipeline, builds, provisioning, reporting, etc - "HARNESS+". Then we
have *a lot* of test maintainers, both internal (for the tests Red Hat
needs), and external (for test suites like LTP) - "TEST". Finally, we have
the kernel developers/maintainers, of course - "CODE".

Naturally, if there was an issue with the test (normally reported as
ERROR), we don't want to bother kernel developers.
A test maintainer would need to deal with that (and they would get their
notification), although they would be interested in just regular PASS/FAIL
results too.

Similarly, if the CI system couldn't manage to run the test, we wouldn't
want to report it as ERROR, because that would alert the test maintainer,
while it wouldn't be their fault at all, and they shouldn't go and waste
their time investigating it.

Now, of course, this particular split is not always there, or it's not so
clear-cut. E.g. kunit tests are normally maintained by kernel developers
themselves. So they would be interested in both "CODE" and "TEST" layers
for those, and so wouldn't really need the "ERROR" status - "FAIL" would
be enough.

CI system maintainers often take the role of test maintainers as well, and
they wouldn't really need to distinguish "TEST" and "HARNESS+"; for them
it would be "TEST+", and they wouldn't need the "MISS" status - "ERROR"
would be enough.

However, this split (and the various statuses) is a good tool for handling
all the various responsibility combinations the CI systems submitting to
KCIDB have. It allows precise targeting of notifications and dashboard
data, saving time and effort in many cases.

>> We propose to call this new status "MISS" (as in "the test result
>> should be there, but isn't"), and think it would be useful to others as
>> well.
>>
>> We can break down the testing stack into three layers: the tested code,
>> the test, and the harness (and everything above it) that runs the test.
>> If we then express each existing test status as one trinary outcome per
>> each of those layers, we would get this table (in order of descending
>> status priority):
>>
>>     STATUS   CODE  TEST  HARNESS+        LEGEND
>>
>>     FAIL      ❌    ✅     ✅             ❌ - failure
>>     ERROR     ➖    ❌     ✅             ✅ - success
>>     PASS      ✅    ✅     ✅             ➖ - no data
>>     DONE      ➖    ✅     ✅
>>     SKIP      ➖    ➖     ✅
>>               ➖    ➖     ➖
>>
>> If you look at the above closely, you will notice one possible state
>> missing (because we didn't need to express failing harnesses), and that
>> is the status we want to introduce:
>>
>>     STATUS   CODE  TEST  HARNESS+        LEGEND
>>
>>     FAIL      ❌    ✅     ✅             ❌ - failure
>>     ERROR     ➖    ❌     ✅             ✅ - success
>> =>  MISS      ➖    ➖     ❌  <=         ➖ - no data
>>     PASS      ✅    ✅     ✅
>>     DONE      ➖    ✅     ✅
>>     SKIP      ➖    ➖     ✅
>>               ➖    ➖     ➖
>>
>> Please respond with comments, objections, and (counter-)proposals,
>> if you have them.
>
> I don't understand the rationale for distinguishing a test error from a
> harness error. In either case the test was not executed properly, and so
> there is no useful test result data available. Diagnostic information
> should enable the user to determine whether the problem was due to the
> test code failing or the test harness failing.

This works when the test maintainer and the harness/framework/CI system
maintainer are the same person or team. This doesn't work when everything
from the CI system down to the test (suite) harness is maintained by one
team, and the test by a completely different team (e.g. CKI and LTP).

> I think I'm missing something. Are you trying to distinguish these so
> you can determine whether there is a problem with the test itself, vs.
> the harness?

Yes.

> Are you automatically re-running a test if the harness is the problem?

We do try rerunning tests in case we hit a faulty host in our inventory
(this happens), or e.g. a network problem occurred.
However, at some point we gotta give up, and then we need a way to say:
"this test result is not just missing or in progress (as specified by a
missing "status" property), but we're done testing, and we couldn't run
that test".

Because we usually run multiple suites on each machine, one after another,
if one of them crashes/locks up the machine (we're testing the kernel,
after all), then the following suites won't be able to run. In this case
we also need a way to say "we finished testing, but those tests didn't
even get to run".

> Why do you want to distinguish these error cases?

As described above: to alert the right people, and avoid wasting the time
of others.

Nick
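The plan-then-resolve flow described in this thread can be sketched with minimal KCIDB-style test objects. This is a sketch under assumptions: the `id`, `build_id`, `origin`, and `status` field names follow the kcidb-io schema discussed here, but the concrete values and the surrounding code are illustrative, not a real CKI submission.

```python
import json

# Phase 1: submit the test plan. A missing "status" field means the test
# is planned but not (completely) executed yet - "in progress".
# IDs and origin are made up for illustration.
plan = [
    {"id": "cki:test-1", "build_id": "cki:build-1", "origin": "cki"},
    {"id": "cki:test-2", "build_id": "cki:build-1", "origin": "cki"},
]
print(json.dumps({"tests": plan}, indent=2))

# Phase 2: after testing is done, or we gave up trying, resubmit the same
# objects with final statuses. Since the protocol only allows adding
# previously-missing fields, whichever status is submitted is final.
results = [
    {"id": "cki:test-1", "build_id": "cki:build-1", "origin": "cki",
     "status": "PASS"},
    # test-2 never got to run (e.g. an earlier suite crashed the machine),
    # so it is resolved as MISS rather than SKIP or ERROR.
    {"id": "cki:test-2", "build_id": "cki:build-1", "origin": "cki",
     "status": "MISS"},
]
print(json.dumps({"tests": results}, indent=2))
```

Consumers waiting on `cki:test-2` can then stop waiting: the run is over, and the CI maintainers, not the test maintainers, own the follow-up.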
* RE: KCIDB: Support one more test status
  2023-04-20 12:08   ` Nikolai Kondrashov
@ 2023-04-20 21:48     ` Bird, Tim
  0 siblings, 0 replies; 10+ messages in thread
From: Bird, Tim @ 2023-04-20 21:48 UTC (permalink / raw)
  To: Nikolai Kondrashov, kernelci@lists.linux.dev, Dmitry Vyukov,
	Cristian Marussi, Alice Ferrazzi, Philip Li, Vishal Bhoj,
	automated-testing@lists.yoctoproject.org, CKI, Mark Brown,
	Johnson George, Sachin Sant

> -----Original Message-----
> From: Nikolai Kondrashov <Nikolai.Kondrashov@redhat.com>
>
> Hi Tim,
>
> Thanks a lot for your response!
> I will do some snipping and answering below.
>
> On 4/19/23 22:38, Bird, Tim wrote:
> >> After the testing is done, or we gave up trying, we send the same
> >> KCIDB test objects to the database, but this time only containing
> >> whatever results we got, including the "status" fields. However, with
> >> the current set of status strings [1], the only way we can try to
> >> express "wanted to run, but couldn't" is with "SKIP", which is not
> >> supposed to alert anyone, yet this situation should be treated as a
> >> problem.
> >
> > Why can't "wanted to run, but couldn't" be expressed with "ERROR"?
>
> This is a matter of responsibility area and distinguishing who should be
> fixing the problem.
>
> The three layers I listed below correspond to the three distinct parties
> involved in testing, at Red Hat, and other large CI systems: "CODE" is
> for kernel developers/maintainers, "TEST" is for test maintainers, and
> "HARNESS+" is for CI system maintainers.
>
> At Red Hat we have the CKI project, which is responsible for maintaining
> the pipeline, builds, provisioning, reporting, etc - "HARNESS+". Then we
> have *a lot* of test maintainers, both internal (for the tests Red Hat
> needs), and external (for test suites like LTP) - "TEST". Finally, we
> have the kernel developers/maintainers, of course - "CODE".
> Naturally, if there was an issue with the test (normally reported as
> ERROR), we don't want to bother kernel developers. A test maintainer
> would need to deal with that (and they would get their notification),
> although they would be interested in just regular PASS/FAIL results too.
>
> Similarly, if the CI system couldn't manage to run the test, we won't
> want to report it as ERROR, because that would alert the test maintainer,
> while it wouldn't be their fault at all, and they shouldn't go and waste
> their time investigating it.
>
> Now, of course, this particular split is not always there, or it's not
> so clear-cut. E.g. kunit tests are normally maintained by kernel
> developers themselves. So they would be interested in both "CODE" and
> "TEST" layers for those, and so wouldn't really need the "ERROR" status
> - "FAIL" would be enough.
>
> CI system maintainers often take the role of test maintainers as well,
> and they wouldn't really need to distinguish "TEST" and "HARNESS+", for
> them it would be "TEST+", and they wouldn't need the "MISS" status -
> "ERROR" would be enough.
>
> However, this split (and the various statuses) is a good tool for
> handling all the various responsibility combinations the CI systems
> submitting to KCIDB have. It allows precise targeting of notifications
> and dashboard data, saving time and effort in many cases.

Ok - this sounds reasonable.
 -- Tim
end of thread, other threads:[~2023-05-08 10:16 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed
-- links below jump to the message on this page --)
2023-04-19 17:08 KCIDB: Support one more test status Nikolai Kondrashov
2023-04-19 17:49 ` Mark Brown
2023-04-20 16:50   ` Nikolai Kondrashov
2023-04-19 19:38 ` Bird, Tim
2023-04-20  5:53   ` Guillaume Tucker
2023-04-20 16:37     ` [Automated-testing] " Nikolai Kondrashov
2023-05-05 11:39       ` Guillaume Tucker
2023-05-08 10:16         ` Nikolai Kondrashov
2023-04-20 12:08   ` Nikolai Kondrashov
2023-04-20 21:48     ` Bird, Tim