KernelCI backend redesign and generic lab support

public inbox for kernelci@lists.linux.dev

* KernelCI backend redesign and generic lab support
From: Guillaume Tucker @ 2021-03-05 20:55 UTC
  To: Michał Gałka, ticotimo, Nikolai Kondrashov,
	Michael Grzeschik, santiago.esteban, Jan Lübbe
  Cc: kernelci@groups.io, automated-testing

Hello,

As has been mentioned multiple times recently, the
kernelci-backend code is ageing pretty badly: it's doing too
many things so it's hard to maintain, there are better ways to
implement a backend now with less code, and it's still Python
2.7.  Also, there is a need to better support non-LAVA labs such
as Labgrid.  Finally, in order to really implement a modular
KernelCI pipeline, we need a good messaging system to
orchestrate the different components - which is similar to
having a generic way to notify labs about tests to run.  For all
these reasons, it's now time to seriously consider how we should
replace it with a better architecture.

I've gathered some ideas in this email regarding how we might go
about doing that.  It seems like there are several people
motivated to help on different aspects of the work, so it would
be really great to organise this as a community development
effort.

Please feel free to share your thoughts about any of the points
below, and say whether you're interested in taking part in any of
it.  If there appears to be enough interest, we should schedule
a meeting to kick-start this in a couple of weeks or so.

* Design ideas

  * REST API to submit / retrieve data
    * same idea as existing one but simplified implementation using jsonschema
    * auth tokens but if possible using existing frameworks to simplify code

  * interface to database
    * same idea as now but with better models implementation

  * pub/sub mechanism to coordinate pipeline with events
    * new feature, framework to be decided (Cloud Events? Autobahn?)
    * no logic in backend, only messages
    * send notifications when things get added in database
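
  For illustration only, the "no logic in backend, only messages"
  idea can be sketched as a tiny in-process bus.  The EventBus
  class and the channel name are hypothetical and do not stand for
  any of the candidate frameworks; the bus just relays each
  message to whoever subscribed to that channel.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[dict], None]


class EventBus:
    """Hypothetical in-memory pub/sub bus: relays messages, holds no logic."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, channel: str, handler: Handler) -> None:
        self._subscribers[channel].append(handler)

    def publish(self, channel: str, message: dict) -> int:
        # Deliver the message to every handler on this channel and return
        # how many received it (Redis PUBLISH also returns a receiver count).
        handlers = self._subscribers.get(channel, [])
        for handler in handlers:
            handler(message)
        return len(handlers)
```

  A real deployment would put a network broker in the middle, but
  the contract stays the same: publishers and subscribers only
  share channel names and message formats.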

* Client side

  Some features currently in kernelci-backend should be moved to client side
  and rely on the pub/sub and API instead:

  * LAVA callback handling (receive from LAVA, push via API)
  * log parsing (subscribe to events, get log when notified, send results)
  * email reports (subscribe to events, generate reports and send directly)
  * KCIDB bridge (subscribe to events, forward to KCIDB API)
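
  As an example of client-side log parsing, here is a sketch of
  what a handler could do once notified of a new test log: scan
  it for error patterns and build a summary to send back through
  the API.  The patterns, the parse_log name and the result format
  are assumptions made up for illustration.

```python
import re
from typing import Dict, List

# Hypothetical patterns for kernel log lines worth reporting.
ERROR_PATTERNS = [
    re.compile(r"Kernel panic"),
    re.compile(r"BUG: "),
    re.compile(r"WARNING: CPU: \d+ PID: \d+"),
]


def parse_log(log_text: str) -> Dict[str, object]:
    """Summarise matching lines; a client would then submit this via the API."""
    matches: List[Dict[str, object]] = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in ERROR_PATTERNS):
            matches.append({"line": number, "text": line.strip()})
    return {"status": "fail" if matches else "pass", "errors": matches}
```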

  As for getting tests to run in labs, this could then be
  unified so that LAVA labs are handled in the same way as
  non-LAVA ones.  At the moment, the Jenkins pipeline knows when
  builds are completed and directly schedules LAVA jobs to run.
  Instead, we should have a service listening to events to know
  when builds are available, and schedule LAVA jobs then.  Other
  labs could do the same, by receiving the same events but then
  performing actions that are specific to their own
  implementation.  For common ones such as Labgrid and
  Kubernetes, some code could be added to kernelci-core like we
  currently have for LAVA to facilitate translating KernelCI
  events into "lab dialects".
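
  Translating one generic event into "lab dialects" could look
  roughly like the sketch below.  The event fields, the
  LabAdapter interface and both adapters are hypothetical; real
  LAVA job definitions and Labgrid setups carry far more detail.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class LabAdapter(ABC):
    """Hypothetical interface: turn a generic 'build available' event into a lab job."""

    @abstractmethod
    def translate(self, event: Dict[str, str]) -> Dict[str, object]:
        ...


class LavaAdapter(LabAdapter):
    def translate(self, event: Dict[str, str]) -> Dict[str, object]:
        # A LAVA lab would render a job definition and submit it to its server.
        return {"lab": "lava", "device_type": event["device_type"],
                "kernel_url": event["kernel_url"]}


class LabgridAdapter(LabAdapter):
    def translate(self, event: Dict[str, str]) -> Dict[str, object]:
        # A Labgrid lab would instead acquire a place and run its own tests.
        return {"lab": "labgrid", "place": event["device_type"],
                "image": event["kernel_url"]}


def dispatch(event: Dict[str, str],
             adapters: List[LabAdapter]) -> List[Dict[str, object]]:
    # Every lab receives the same event and reacts in its own way.
    return [adapter.translate(event) for adapter in adapters]
```

  With this shape, supporting a new kind of lab means adding an
  adapter on the client side without touching the backend.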

  As for emails, we could also have a micro-service listening
  for incoming emails, such as replies to previously sent
  reports (say, to automatically change the status of a tracked
  regression) or specific ones such as stable reviews.
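
  A sketch of such an email listener, assuming a hypothetical
  convention where the Message-ID of each sent report is stored
  with the regression it describes, so that a reply can be
  matched through its In-Reply-To header.  Only the standard
  library email package is used; the REPORTS mapping and the
  "#fixed" keyword are made up.

```python
from email import message_from_string
from typing import Dict, Optional

# Hypothetical mapping: Message-ID of a sent report -> tracked regression ID.
REPORTS: Dict[str, str] = {"<report-42@example.org>": "regression-42"}


def handle_reply(raw_email: str) -> Optional[Dict[str, str]]:
    msg = message_from_string(raw_email)
    regression = REPORTS.get((msg.get("In-Reply-To") or "").strip())
    if regression is None:
        return None  # not a reply to one of our reports
    body = msg.get_payload()
    # Hypothetical convention: a line containing only "#fixed" closes it.
    if isinstance(body, str) and any(
        line.strip() == "#fixed" for line in body.splitlines()
    ):
        return {"regression": regression, "status": "fixed"}
    return {"regression": regression, "status": "discussed"}
```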

* Implementation ideas

  The current Python 2.7 implementation uses Tornado as the web
  framework, Redis for object caching and locking, Celery for
  asynchronous processing and interfaces with MongoDB.  Here's
  what I propose to do:

  * start a new design using Python 3.x (minor version TBD), using the current
    one as a reference rather than doing a straight port

  * keep Tornado as the web framework since it still has a good community and
    is well suited for backend applications

  * keep Redis for caching and locking, but also use it for the pub/sub
    mechanism provided out of the box (we may host it on Azure)

  * see if we really need to keep Celery when we have client-side services

  * keep MongoDB as it's been working well for us, also to reduce the effort
    with the new design and have the ability to directly import existing data
    (we may host it in Azure)

  * separate the "storage" server from the backend: the backend currently
    relies on it being on the same host, which causes bad design and
    unnecessary dependencies (the backend shouldn't even need to read anything
    from storage, only client code would do this using URLs stored in the
    database)

  * use the "kernelci" Python package from kernelci-core to define common code
    as appropriate such as YAML configuration handling and JSON schema
    validation, to be shared between the backend and client code

* Schema

  The current schema has worked well for many years, but it has
  also become inconsistent and hard to maintain.  For example,
  the names of the fields are getting translated in several
  places from "tree" to "job", from "kernel" to "git_describe",
  from "build_environment" to "compiler"...  So it needs a big
  refresh.

  Also, one important thing to consider would be to have common
  object properties for all the database entries so we could
  make a tree structure with them.  For example, tests may
  depend on other tests and also on builds, and also on
  revisions.  Pretty much like object inheritance, we could have
  a basic "type" and then derivatives such as build and test.
  So I think we should take this opportunity to start with a new
  schema design, taking inspiration from the current one and
  what has been done with KCIDB in terms of content.
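
  To illustrate the object-inheritance idea, here is a sketch of a
  common base type with build and test derivatives, each pointing
  to the node it depends on so that all results form a tree.  The
  field names are hypothetical and deliberately over-simplified.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical common properties shared by every database entry."""
    id: str
    name: str
    parent: Optional[str] = None  # id of the node this one depends on
    kind: str = "node"


@dataclass
class Build(Node):
    arch: str = "x86_64"
    kind: str = "build"


@dataclass
class Test(Node):
    result: Optional[str] = None  # None until the test has run
    kind: str = "test"


def children(nodes: List[Node]) -> Dict[Optional[str], List[str]]:
    """Index node ids by parent id, which is enough to walk the tree."""
    tree: Dict[Optional[str], List[str]] = {}
    for node in nodes:
        tree.setdefault(node.parent, []).append(node.id)
    return tree
```

  Here a revision is just a plain Node with kind "revision", a
  build depends on it, and tests can depend on builds or on other
  tests.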

  This relates to the YAML configuration, where dependencies
  should be better expressed (e.g. run this test once this build
  has completed, and this other test once the first test has
  completed...).  This is the same dependency tree as in the
  results, just without the runtime details and actual
  results.
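
  Such a dependency tree in the configuration could be resolved
  with a topological sort.  This sketch uses
  graphlib.TopologicalSorter from the Python 3.9+ standard
  library; the job names and the dict standing in for parsed YAML
  are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical parsed configuration: each job lists the jobs it depends on.
CONFIG: Dict[str, Set[str]] = {
    "build-defconfig": set(),
    "baseline": {"build-defconfig"},
    "ltp": {"baseline"},
    "kselftest": {"baseline"},
}


def schedule(config: Dict[str, Set[str]]) -> List[List[str]]:
    """Group jobs into waves; a job is ready once all its dependencies are done."""
    sorter = TopologicalSorter(config)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

  In an event-driven pipeline, the done() call would instead be
  triggered by the event saying that a job has completed.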

  All this would deserve a discussion of its own, and I think we
  should start with an over-simplified schema to get the
  components up and running with the new design.

* Development

  It seems the different pieces can be worked on in parallel to
  some extent, so it would be good to create a backlog on GitHub
  to define some high-level objectives accordingly.  Then people
  who are interested can assign issues to themselves.

  We should try to have this working in Docker from the start,
  to make it easier for all the contributors to have a
  compatible environment and also to actually deploy it.  We can
  run an instance of it on staging.kernelci.org on a different
  port from the current REST API.

  I believe it should be fine to ignore the web frontend
  initially; we can then adjust it to use the newly designed
  API.  However, we have to keep its use-case in mind, and the
  type of queries it would typically make.  We may have a
  minimal frontend instance reworked with only one view as a
  basic end-to-end test.

How does that all sound?

Have a good week-end!

Best wishes,
Guillaume

* Re: KernelCI backend redesign and generic lab support
From: Bjorn Andersson @ 2021-04-13  2:54 UTC
  To: kernelci, guillaume.tucker
  Cc: Michał Gałka, ticotimo, Nikolai Kondrashov, Michael Grzeschik,
	santiago.esteban, Jan Lübbe, automated-testing

On Fri 05 Mar 14:55 CST 2021, Guillaume Tucker wrote:

> Hello,
> 

Hi Guillaume,

Sorry for taking so long to give you some feedback on this.

> As has been mentioned multiple times recently, the
> kernelci-backend code is ageing pretty badly: it's doing too
> many things so it's hard to maintain, there are better ways to
> implement a backend now with less code, and it's still Python
> 2.7.  Also, there is a need to better support non-LAVA labs such
> as Labgrid.  Finally, in order to really implement a modular
> KernelCI pipeline, we need a good messaging system to
> orchestrate the different components - which is similar to
> having a generic way to notify labs about tests to run.  For all
> these reasons, it's now time to seriously consider how we should
> replace it with a better architecture.
> 
> I've gathered some ideas in this email regarding how we might go
> about doing that.  It seems like there are several people
> motivated to help on different aspects of the work, so it would
> be really great to organise this as a community development
> effort.
> 
> Please feel free to share your thoughts about any of the points
> below, and say whether you're interested in taking part in any of
> it.  If there appears to be enough interest, we should schedule
> a meeting to kick-start this in a couple of weeks or so.
> 
> 
> * Design ideas
> 
>   * REST API to submit / retrieve data
>     * same idea as existing one but simplified implementation using jsonschema
>     * auth tokens but if possible using existing frameworks to simplify code
> 
>   * interface to database
>     * same idea as now but with better models implementation
> 
>   * pub/sub mechanism to coordinate pipeline with events
>     * new feature, framework to be decided (Cloud Events? Autobahn?)
>     * no logic in backend, only messages
>     * send notifications when things get added in database

My current approach for lab-bjorn is to poll the REST API from time to
time for builds that match some search criteria relevant for my boards
and submit these builds to a RabbitMQ "topic" exchange. Then I have
individual jobs per board that consume these builds, run tests and
submit test results to a queue, which is finally consumed by a component
that reports back using the REST API.

The scraper at the start of this chain works, but replacing it with a
subscriber model would feel like a better design. Perhaps RabbitMQ is too
low level? But the model would be nice to have.
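
For reference, this is how a RabbitMQ-style "topic" exchange decides
which per-board queue receives a build, following the documented
binding rules: "*" matches exactly one dot-separated word and "#"
matches zero or more words.  The sketch below is not RabbitMQ's own
implementation, and the routing keys are hypothetical.

```python
from functools import lru_cache
from typing import Tuple


def topic_matches(pattern: str, routing_key: str) -> bool:
    """Return True if a topic binding pattern matches a routing key."""
    p_words: Tuple[str, ...] = tuple(pattern.split("."))
    k_words: Tuple[str, ...] = tuple(routing_key.split("."))

    @lru_cache(maxsize=None)
    def match(i: int, j: int) -> bool:
        if i == len(p_words):
            return j == len(k_words)
        if p_words[i] == "#":
            # '#' either matches zero words, or swallows one word and stays.
            return match(i + 1, j) or (j < len(k_words) and match(i, j + 1))
        if j == len(k_words):
            return False
        if p_words[i] in ("*", k_words[j]):
            return match(i + 1, j + 1)
        return False

    return match(0, 0)
```

A board queue bound with "build.arm64.*" would then only see arm64
builds, which is the same filtering a subscriber model would need.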

> 
> 
> * Client side
> 
>   Some features currently in kernelci-backend should be moved to client side
>   and rely on the pub/sub and API instead:
> 
>   * LAVA callback handling (receive from LAVA, push via API)
>   * log parsing (subscribe to events, get log when notified, send results)

Since I moved to the REST API for reporting, instead of faking a LAVA
instance, I lost a few details, such as the LAVA parser generating HTML
logs. Nothing serious, but unifying the interface here would be good.

Regards,
Bjorn

* Re: KernelCI backend redesign and generic lab support
From: Guillaume Tucker @ 2021-10-19 11:42 UTC
  To: Bjorn Andersson, kernelci
  Cc: ticotimo, Nikolai Kondrashov, Michael Grzeschik, santiago.esteban,
	automated-testing, Jan Lübbe, Michał Gałka

On 13/04/2021 03:54, Bjorn Andersson wrote:
> On Fri 05 Mar 14:55 CST 2021, Guillaume Tucker wrote:
> 
>> Hello,
>>
> 
> Hi Guillaume,
> 
> Sorry for taking so long to give you some feedback on this.

No worries, it's a long-term redesign :)

The good news is that we've now got something off the ground with
the new KernelCI API project:

  https://github.com/kernelci/kernelci-api

See below some notes based on the initial ideas in this thread.
The proof-of-concept has helped shed some light on a few things,
and now I think there's enough to start designing things in a way
that would overcome the limitations of the current
kernelci-backend.

It seems like a great opportunity for new people to start
contributing and to really make it a collaborative work.  We
could have a hackfest dedicated to it in a few months' time.
There is also an Outreachy project about it, with a few
candidates already contributing:

  https://www.outreachy.org/outreachy-december-2021-internship-round/communities/kernelci/create-new-kernelci-api/cfp/

While kernelci-backend is entirely monolithic, this new
architecture should be very modular with more logic on the client
side.  This should facilitate people working on different things
in parallel.

Feel free to join any of the weekly calls to discuss this, every
Tuesday at 17:00 BST (https://meet.kernel.social/kernelci-dev).

>> As has been mentioned multiple times recently, the
>> kernelci-backend code is ageing pretty badly: it's doing too
>> many things so it's hard to maintain, there are better ways to
>> implement a backend now with less code, and it's still Python
>> 2.7.  Also, there is a need to better support non-LAVA labs such
>> as Labgrid.  Finally, in order to really implement a modular
>> KernelCI pipeline, we need a good messaging system to
>> orchestrate the different components - which is similar to
>> having a generic way to notify labs about tests to run.  For all
>> these reasons, it's now time to seriously consider how we should
>> replace it with a better architecture.
>>
>> I've gathered some ideas in this email regarding how we might go
>> about doing that.  It seems like there are several people
>> motivated to help on different aspects of the work, so it would
>> be really great to organise this as a community development
>> effort.
>>
>> Please feel free to share your thoughts about any of the points
>> below, and say whether you're interested in taking part in any of
>> it.  If there appears to be enough interest, we should schedule
>> a meeting to kick-start this in a couple of weeks or so.
>>
>>
>> * Design ideas
>>
>>   * REST API to submit / retrieve data
>>     * same idea as existing one but simplified implementation using jsonschema

Actually, the new design is using FastAPI which relies on
Pydantic and OpenAPI.  This provides validation of the data
schema, automatically generated API documentation and
interoperability with other web services.

>>     * auth tokens but if possible using existing frameworks to simplify code

FastAPI provides OAuth2 support, which is what the new design is
using.  It's based on username / password accounts, but tokens can
be used too (JWT).  It also means we could use third-party
authentication such as GitHub...
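
To show what a JWT bearer token actually carries, here is a
standard-library sketch of HS256 signing and verification as
described in RFC 7519 and RFC 7515.  In practice the API would rely
on FastAPI's OAuth2 helpers and a maintained JWT library rather than
this code; the secret and claims are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Dict, Optional


def _b64url(data: bytes) -> str:
    # JWT uses base64url encoding without '=' padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: bytes, secret: bytes) -> bytes:
    return hmac.new(secret, signing_input, hashlib.sha256).digest()


def encode_token(claims: Dict[str, object], secret: bytes) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, claims)
    )
    return f"{signing_input}.{_b64url(_sign(signing_input.encode(), secret))}"


def decode_token(token: str, secret: bytes) -> Optional[Dict[str, object]]:
    """Return the claims if the signature is valid and the token has not expired."""
    try:
        header_b64, claims_b64, sig_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        if not isinstance(header, dict) or header.get("alg") != "HS256":
            return None  # refuse any other algorithm, including "none"
        expected = _sign(f"{header_b64}.{claims_b64}".encode(), secret)
        if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
            return None
        claims = json.loads(_b64url_decode(claims_b64))
    except ValueError:  # malformed token: bad split, base64 or JSON
        return None
    if "exp" in claims and claims["exp"] < time.time():
        return None
    return claims
```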

>>   * interface to database
>>     * same idea as now but with better models implementation

That's where Pydantic comes into play, and it's a key part of
FastAPI which can directly validate incoming data and create
objects following Pydantic models.

Also, FastAPI relies on the asynchronous features provided
natively by Python 3, which we can use with Redis and MongoDB via
the aioredis and motor Python packages.  This means we can have
the same benefits as Celery but without the added complexity of
managing tasks "by hand".  It also means the client can wait for
the outcome and get an HTTP error if the async task fails, without
blocking any backend threads.  That's an advantage compared to
the current Celery-based solution, where the client gets an HTTP
202 right away when the task starts but never finds out if the
task failed later on.
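
The difference can be sketched with plain asyncio: the handler
awaits the storage coroutine, so a failure turns into an error status
for that client while other requests keep being served on the same
thread.  The save_node coroutine and the status codes are
hypothetical stand-ins for a motor or aioredis call and a FastAPI
response.

```python
import asyncio
from typing import Dict, List, Tuple


async def save_node(node: Dict[str, str]) -> str:
    """Hypothetical stand-in for an async database write (e.g. via motor)."""
    await asyncio.sleep(0.01)  # simulated I/O; the event loop serves others meanwhile
    if "name" not in node:
        raise ValueError("missing 'name'")
    return f"id-{node['name']}"


async def handle_post(node: Dict[str, str]) -> Tuple[int, str]:
    # The client waits for the real outcome instead of an immediate 202.
    try:
        node_id = await save_node(node)
    except ValueError as exc:
        return 400, str(exc)  # the failure is reported back to this client
    return 200, node_id


async def main() -> List[Tuple[int, str]]:
    # Two requests handled concurrently on a single thread.
    return await asyncio.gather(handle_post({"name": "baseline"}), handle_post({}))
```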

>>   * pub/sub mechanism to coordinate pipeline with events
>>     * new feature, framework to be decided (Cloud Events? Autobahn?)

I just made a PR for this:

  https://github.com/kernelci/kernelci-api/pull/7

See the README on the incoming branch with some examples of how
to use it.  It's based on Redis, but with the authentication
provided by FastAPI.  It's using CloudEvents to format the
messages in a standard way, which should help interacting with
other web services that also use CloudEvents (I guess it's
becoming a standard but I don't know how widespread it is yet).
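
For illustration, a structured-mode CloudEvents 1.0 message in JSON
carries the required specversion, id, source and type attributes,
plus optional ones such as time and datacontenttype.  This sketch
builds one with the standard library; the source URI, the event type
name and the payload are hypothetical and not necessarily what the
PR uses.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Dict


def make_event(event_type: str, data: Dict[str, object]) -> str:
    event = {
        "specversion": "1.0",
        "id": str(uuid.uuid4()),                 # unique per event source
        "source": "https://api.example.org/",    # hypothetical producer URI
        "type": event_type,
        "time": datetime.now(timezone.utc).isoformat(),  # RFC 3339 timestamp
        "datacontenttype": "application/json",
        "data": data,
    }
    return json.dumps(event)
```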

>>     * no logic in backend, only messages

By "no logic", this means keeping things such as email
notifications, regression tracking, job submission, log parsing
all on the client side.  I think this is still a valid principle.

>>     * send notifications when things get added in database

The idea is that the backend will provide some basic mechanism to
generate event messages when events occur (e.g. when some data is
being pushed to it) in a systematic way and without any actual
application logic.  I believe this makes more sense than having
every client submit both data _and_ an event to say it has
submitted some data, since this should be entirely deterministic.
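
A minimal sketch of that principle, assuming a hypothetical
in-memory store: every successful write publishes an event derived
only from the operation itself, so clients never need to send a
separate notification.

```python
from typing import Callable, Dict, List

Listener = Callable[[Dict[str, object]], None]


class NodeStore:
    """Hypothetical store that emits an event for every write, with no app logic."""

    def __init__(self) -> None:
        self._nodes: Dict[str, Dict[str, object]] = {}
        self._listeners: List[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def _emit(self, op: str, node_id: str) -> None:
        for listener in self._listeners:
            listener({"op": op, "id": node_id})

    def put(self, node_id: str, node: Dict[str, object]) -> None:
        op = "updated" if node_id in self._nodes else "created"
        self._nodes[node_id] = node
        self._emit(op, node_id)  # deterministic: the same write gives the same event
```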

> My current approach for lab-bjorn is to poll the REST API from time to
> time for builds that match some search criteria relevant for my boards
> and submit these builds to a RabbitMQ "topic" exchange. Then I have
> individual jobs per board that consume these builds, run tests and
> submit test results to a queue, which is finally consumed by a component
> that reports back using the REST API.
> 
> The scraper at the start of this chain works, but replacing it with a
> subscriber model would feel like a better design. Perhaps RabbitMQ is too
> low level? But the model would be nice to have.

This is of course one of the main use-cases for the new pub/sub
mechanism.  It should in fact be a generic way of triggering
anything: builds, tests, data post-processing, email reports...

Please feel free to compare the proposed solution based on Redis
and FastAPI with other frameworks, it's still very easy to move
things around at this stage of development.

>> * Client side
>>
>>   Some features currently in kernelci-backend should be moved to client side
>>   and rely on the pub/sub and API instead:
>>
>>   * LAVA callback handling (receive from LAVA, push via API)
>>   * log parsing (subscribe to events, get log when notified, send results)

As I already mentioned above, this still seems valid to me.

> Since I moved to the REST API for reporting, instead of faking a LAVA
> instance, I lost a few details, such as the LAVA parser generating HTML
> logs. Nothing serious, but unifying the interface here would be good.

Absolutely.

Best wishes,
Guillaume
