public inbox for kernelci@lists.linux.dev
 help / color / mirror / Atom feed
* KCIDB: v5.0 - backwards-incompatible changes
@ 2025-02-27 18:32 Nikolai Kondrashov
  2025-02-27 18:41 ` Bird, Tim
                   ` (4 more replies)
  0 siblings, 5 replies; 7+ messages in thread
From: Nikolai Kondrashov @ 2025-02-27 18:32 UTC (permalink / raw)
  To: syzkaller, Dmitry Vyukov, Vishal Bhoj, Alice Ferrazzi,
	automated-testing, Cristian Marussi, Johnson George,
	kernelci@lists.linux.dev, Mark Brown, Philip Li,
	Denys Fedoryshchenko, Tales da Aparecida, Aditya Nagesh,
	Sachin Sant, Benjamin Copeland, Manoj Kumar, Michael Hofmann,
	marcelo.santos, kernelci-webdashboard

Hello again, everyone (potentially) involved with sending data to KCIDB,

We've accumulated a bunch of schema change needs, which would break backwards
compatibility, and so it's time for another major update: v5.0.

First, here's the kcidb-io PR: https://github.com/kernelci/kcidb-io/pull/95

This will be followed by a DB/ORM update, once we agree on something. I'll go
over each proposed change below, linking to corresponding commits in the PR so
far. Please review and respond if you have any objections, comments or
requests (here, or in the PR).

I'll merge this next Friday, March 7, if there are no objections by that time.


DROP "contacts" FROM "checkouts"
--------------------------------

https://github.com/kernelci/kcidb-io/pull/95/commits/a29e115ac3b28781d9d82090b320c82e8559aec2

Remove the "contacts" field from checkout objects. It was added to support
extracting email addresses from patches so that notifications could be sent to
people concerned with them. Since then we've focused on subscription-based
system to minimize negative effect of sending false positives/negatives to
people who didn't ask for them.

Here's an example of a checkout with the field about to be removed:

    {
        "origin": "microsoft",
        "id": "microsoft:89bf6209cad6",
        "comment": "Merge tag 'devicetree-fixes-for-6.5-2' of git://git.kernel.org/pub/scm/linux/ker \bnel/git/robh/linux",
        "contacts": [
            "torvalds@linux-foundation.org"
        ],
        "git_commit_hash": "89bf6209cad66214d3774dac86b6bbf2aec6a30d",
        "git_commit_name": "v6.5-rc7-18-g89bf6209cad6",
        "git_repository_branch": "master",
        "git_repository_url": "https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git",
        "start_time": "2023-08-23T15:09:13.791021+00:00",
        "valid": true
    }


DROP "build_valid"/"test_status" FROM "issues"
----------------------------------------------

https://github.com/kernelci/kcidb-io/pull/95/commits/99ea3dfa1f8ed02949b58c180b13d86c77d14138

Drop the "build_valid" and "test_status" fields from issue objects.

The idea for these fields was to let submitters rectify results they sent
earlier, so that reports could be corrected. However, it seems that we're
shifting our attention to notifying about issues themselves, rather than
particular test and build results, so these fields wouldn't be needed, and
their absence would make things simpler.

Here are two issue examples with the fields to be removed:

    {
        "id": "maestro:b93384a5a9d06e5f3422c15569364f45f598bd2c",
        "version": 1,
        "origin": "maestro",
        "culprit": {
            "code": true,
            "tool": false,
            "harness": false
        },
        "build_valid": false,
        "comment": " use of undeclared identifier 'MIGRATE_CMA'; did you mean 'MIGRATE_SYNC'? in mm/page_alloc.o (mm/page_alloc.c) [logspec:kbuild,kbuild.compiler.error]",
    },
    {
        "id": "maestro:f627e08f767a688209f3f9a2d41c6b4c260897b0",
        "version": 0,
        "origin": "maestro",
        "test_status": "FAIL",
        "comment": "[logspec:generic_linux_boot] linux.kernel.null_pointer_dereference NULL pointer dereference at virtual address 00000000000000d0",
    }


REPLACE "valid" WITH "status" IN "builds"
-----------------------------------------

https://github.com/kernelci/kcidb-io/pull/95/commits/635b0695807ac366c3d52853b1056bed2052d80b

At the moment builds can only have two outcomes: pass and fail, encoded in the
boolean "valid" field. However, builds tend to have all the similar issues as
tests do, and at least two CI systems were missing support for that. So, I'd
like to replace the "valid" field with the same "status" field that tests use.

E.g. this:

    {
        "id": "redhat:1177148648-s390x-kernel",
        "valid": true,
        "origin": "redhat",
        "command": "make CC=clang -j24 INSTALL_MOD_STRIP=1 targz-pkg",
        "comment": "CKI build of mainline.kernel.org-clang",
        "compiler": "clang version 17.0.6 (Fedora 17.0.6-6.fc40)",
        "checkout_id": "redhat:1177148648",
        "config_name": "olddefconfig",
        "architecture": "s390x",
    }

becomes this:

    {
        "id": "redhat:1177148648-s390x-kernel",
        "status": "PASS",
        "origin": "redhat",
        "command": "make CC=clang -j24 INSTALL_MOD_STRIP=1 targz-pkg",
        "comment": "CKI build of mainline.kernel.org-clang",
        "compiler": "clang version 17.0.6 (Fedora 17.0.6-6.fc40)",
        "checkout_id": "redhat:1177148648",
        "config_name": "olddefconfig",
        "architecture": "s390x",
    }


BE STRICTER ABOUT TEST PATHS
----------------------------

https://github.com/kernelci/kcidb-io/pull/95/commits/fa9ec752a5d7c84b2391132815a136d5aabf0332

Test "paths" were defined rather loosely in the schema so far, and some corner
cases the current pattern permits are hard to make meaning of on the
dashboards and notifications. So I'd like to prohibit paths with leading,
trailing, and repeating dots. So something like "ltp.memcpy01" would still be
valid, but e.g. "ltp..memcpy01" or ".ltp.memcpy01", or "ltp.memcpy01." will
not.

NOTE: While the previous schema versions will validate these just fine, once
      the support for the new v5.0 schema is deployed, we'll start dropping
      submissions with those. You'll need to upgrade to this schema on the
      client side to catch these *before* sending them to KCIDB.


REPLACE "waived" field with an "incident"
-----------------------------------------

https://github.com/kernelci/kcidb-io/pull/95/commits/b2b6cfaabdd0a0024456018df86fbc44942beba3

We've had "waived" field in tests right from the start, because that was what
CKI was using at the time. They still do, but I think we can perhaps move on
past it, which will simplify things a lot.

The field is intended for a CI system to signal that the test which has it set
to true is unstable, and shouldn't be taken into consideration when deciding
on a testing result. This is useful to temporary disable the test effect,
while keeping running it and collecting data about its performance.

However, since then we came up with an idea of "issues" and "incidents" which
allow to express this particular situation, as well as many others. We're also
forgoing the whole idea of reporting pure (and often unreliable) test results,
and reporting issues themselves instead.

As such I would like to drop support for the "waived" field and when
inheriting data containing it from the older schema versions replace it with a
global (constant-ID) issue and a test-specific incident linking the test to
it.

This will result in submissions like this:

    {
        "version": {"major": 4, "minor": 5},
        "tests": [
            {
                "id": "redhat:1691700431-aarch64-kernel-64k_upt_22",
                "origin": "redhat",
                "build_id": "redhat:1691700431-aarch64-kernel-64k",
                "comment": "Hardware - libevdev test",
                "duration": 67.0,
                "path": "libevdev",
                "start_time": "2025-02-27T17:00:29.000000+00:00",
                "status": "PASS",
                "waived": true
            }
        ]
    }

Being converted to this:

    {
        "version": {"major": 5, "minor": 0},
        "tests": [
            {
                "id": "redhat:1691700431-aarch64-kernel-64k_upt_22",
                "origin": "redhat",
                "build_id": "redhat:1691700431-aarch64-kernel-64k",
                "comment": "Hardware - libevdev test",
                "duration": 67.0,
                "path": "libevdev",
                "start_time": "2025-02-27T17:00:29.000000+00:00",
                "status": "PASS"
            }
        ],
        "issues": [
            {
                "id": "_:waived",
                "origin": "_",
                "version": 1,
                "comment": "Test waived as unreliable"
            }
        ],
        "incidents": [
            {
                "id": "_:waived:1:redhat:1691700431-aarch64-kernel-64k_upt_22",
                "origin": "_",
                "issue_id": "_:waived",
                "issue_version": 1,
                "test_id": "redhat:1691700431-aarch64-kernel-64k_upt_22",
                "present": true
            }
        ]
    }


PROHIBIT NULL CHARACTERS ('\0') IN STRINGS
------------------------------------------

https://github.com/kernelci/kcidb-io/pull/95/commits/db36bc46b2c7ddb479eec7a5344e0ce28cb6c2de

JSON allows null characters ('\0') in strings, which is good for flexibility.
Unfortunately PostgreSQL, which we're using for our operational database
doesn't allow them in its TEXT columns. We've had a few cases when serial
console garbage ended up in log excerpts and failed to be loaded into the
database. So I'd like to prohibit null characters in all the strings of the
I/O schema to avoid such problems in the future.

NOTE: While the previous schema versions will validate these just fine, once
      the support for the new v5.0 schema is deployed, we'll start dropping
      submissions with those. You'll need to upgrade to this schema on the
      client side to catch these *before* sending them to KCIDB.


ONLY ALLOW HTTPS URLS FOR REPOS
-------------------------------

https://github.com/kernelci/kcidb-io/pull/95/commits/601157e5d3e140214656e69caf62fce9d7869737

We've supported both https:// and git:// URLs for "git_repository_url" field
from the early days. However, actually only one repository, tested by
KernelCI, required it. We no longer test that repository and it has been
deleted by the owner. Having only https:// URLs in JSON and the database would
make some things simpler, so I would like to drop support for git:// repos.

We last received a checkout with a git:// URL on July 17th last year, and that
was for the removed repo. Here it is, as an example of a checkout which would
fail validation and have the submission discarded:

    {
        "id": "_:kernelci:2cd7584b84b6e2732e9b87805f38b89b7b633cbf",
        "origin": "kernelci",
        "git_repository_url": "git://git.armlinux.org.uk/~rmk/linux-arm.git",
        "git_commit_hash": "2cd7584b84b6e2732e9b87805f38b89b7b633cbf",
        "git_commit_name": "v5.4-38-g2cd7584b84b6",
        "git_repository_branch": "to-build",
        "patchset_hash": "",
        "start_time": "2024-06-15T15:46:07.133000+00:00",
        "valid": true
    }


NOTE: While the previous schema versions will validate these just fine, once
      the support for the new v5.0 schema is deployed, we'll start dropping
      submissions with those. You'll need to upgrade to this schema on the
      client side to catch these *before* sending them to KCIDB.

---

And this is finally all for this schema update. Please respond if you disagree
with any of these changes, or would like to propose your own. Thank you for
reading all the way to here!

I'll merge this next Friday, March 7, if there are no objections by that time.

Nick

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2025-03-21 11:16 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-02-27 18:32 KCIDB: v5.0 - backwards-incompatible changes Nikolai Kondrashov
2025-02-27 18:41 ` Bird, Tim
2025-02-27 19:19 ` [kernelci-webdashboard] " Gustavo Padovan
2025-02-27 19:21 ` Mark Brown
2025-02-28 14:59 ` Tales da Aparecida
2025-03-07 16:37 ` Nikolai Kondrashov
2025-03-21 11:16   ` Nikolai Kondrashov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox