public inbox for kdevops@lists.linux.dev
From: Chuck Lever <cel@kernel.org>
To: Luis Chamberlain <mcgrof@kernel.org>
Cc: Daniel Gomez <da.gomez@kruces.com>, kdevops@lists.linux.dev
Subject: Re: [PATCH 4/4] bootlinux: add support for A/B kernel testing
Date: Sat, 26 Jul 2025 19:35:54 -0400
Message-ID: <6ed06807-e5db-41a0-a471-35dc27cf16f7@kernel.org>
In-Reply-To: <aIU4uhrO_mOQNcKx@bombadil.infradead.org>

On 7/26/25 4:21 PM, Luis Chamberlain wrote:
> On Sat, Jul 26, 2025 at 02:00:55PM -0400, Chuck Lever wrote:
>> On 7/25/25 9:16 PM, Luis Chamberlain wrote:
>>> Right now we use the same kernel for all target nodes. We want to
>>> compare and contrast different kernels for different features. We
>>> add support for A/B testing by leveraging the baseline and dev groups
>>> provided to us by KDEVOPS_BASELINE_AND_DEV.
>>>
>>> This extends the bootlinux playbook by enabling us to allow a different
>>> kernel tree / ref to be used for the dev group. This just becomes a
>>> configuration thing. The targets are intuitive:
>>>
>>>   make linux                 # Handles A/B compilation transparently
>>>   make linux-baseline        # Build and install baseline kernel only
>>>   make linux-dev             # Build and install development kernel only
>>
>> My "build the kernel once and package it" patches are still under test,
>> but this patch conflicts heavily with that work. I'm not sure how to
>> reconcile it,
> 
> If you give me a branch, I can just fetch it and ask Claude Code to
> refactor my branch on top of yours, giving preference to your
> branch's style. It's as easy as that.

https://github.com/chucklever/kdevops/tree/builder

This is the first it's seen the light of day. :-) I have a few other
bowling balls in the air, so it's taken a while to get robust water
fowl alignment.


> What do you think?

You are welcome to experiment and/or provide feedback to me about the
ideas.


> You can try it as well too, if you like as my tree is public already.
> And it may help you test / fix any outstanding bugs.

Unfortunately my employer has a policy that we cannot make use of public
AI code generators at this time, so I can't try it quite yet.


>> What my set does is change the operation of the "make linux" target so
>> that, depending on Kconfig settings, it either:
>>
>> - Works as it does today (builds the kernel on each test runner)
>> - Builds the kernel and packages it on a separate node, saving the
>>   package
> 
> I see, OK so an extra option is available to "build on foo system which
> is wicked fast".

Yes, one new option for that, and one new option for "please install
from the packages sitting in this directory on the Ansible controller."

I'm also thinking that, for cloud, kdevops could set up a build pipeline
to build the kernel. So far, my tenancies are limited to 10 vCPUs at a
time. But the pipelines seem to get as many as 16, making the builds
quick.


>> - Finds the packaged kernel and installs it on the test runners
> 
> Sweeeeet. It would be nice for us to have a sort of registry for kdevops
> where if a kernel is already built and available we can use it but alas
> we don't yet have any public infrastructure for that.

Yes. Scott and I have been going back and forth about how to do such a
thing.

For cloud, it is typical for artifacts built by a pipeline to land in
a public S3-style bucket. We could certainly put them there and have
some scripting to create a website around it.

For guestfs, kdevops could set up a local MinIO instance to do much the
same. Or the guests could NFS-mount the controller.
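To make the "use it if it's already built" idea a bit more concrete, here is a minimal sketch of the lookup side only. It assumes the artifact store (a synced bucket, a MinIO mirror, or an NFS mount of the controller) shows up as a plain local directory of RPMs named the way rpmbuild names kernel packages, where "-" in the kernel release becomes "_". The function name is hypothetical, not anything kdevops has today.

```python
from pathlib import Path
from typing import Optional


def find_prebuilt_kernel(store: Path, kernelrelease: str,
                         arch: str = "x86_64") -> Optional[Path]:
    """Return the highest-release kernel RPM in `store` for `kernelrelease`, or None.

    Hypothetical helper: assumes packages are named
    kernel-<release with '-' replaced by '_'>-<N>.<arch>.rpm
    """
    # rpm does not allow '-' in the Version field, so a release such as
    # 6.16.0-rc6-00005-g6b59765c97a3 is packaged as 6.16.0_rc6_00005_g6b59765c97a3.
    prefix = f"kernel-{kernelrelease.replace('-', '_')}-"
    suffix = f".{arch}.rpm"
    best: Optional[Path] = None
    best_rel = -1
    for path in store.glob(f"{prefix}*{suffix}"):
        release = path.name[len(prefix):-len(suffix)]
        # Ignore names where the wildcard matched more than a numeric release.
        if release.isdigit() and int(release) > best_rel:
            best, best_rel = path, int(release)
    return best
```

A None result would mean "not built yet, go build it"; a hit means the test runners can install straight from the store.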


> Maybe later we can just
> regularly build using OBS, daily for next and weekly for RCs, and
> see if the latest build is available. That is, we just have a secondary
> process whose only job is to fetch, build, and push to OBS. If using
> a vanilla kernel or next kernel then it can use that. This limits the
> functionality to standard trees, but it's a possible nice enhancement later.

The primary goal is reducing the number of times we have to build any
kernel. Building them on a schedule (if there is a change) is better
than building the kernel every time a change is noticed, I think (and
unfortunately that is what I'm doing right now).

The reason to build kernels on the spot is for development work. For
CI, having one build, and using that build everywhere you can, seems
like it should be our goal.
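A minimal sketch of the "build on a schedule, but only if there is a change" policy. It assumes the scheduler already knows the current head commit of each tree (for example from `git ls-remote`) and keeps a small JSON state file recording the last commit it built per tree; the file layout and function names are hypothetical.

```python
import json
from pathlib import Path


def _load_state(state_file: Path) -> dict:
    """Read the {tree: last_built_commit} map, or start empty."""
    return json.loads(state_file.read_text()) if state_file.exists() else {}


def should_rebuild(state_file: Path, tree: str, head_commit: str) -> bool:
    """True if `head_commit` is not the last commit built for `tree` (e.g. "URL#ref")."""
    return _load_state(state_file).get(tree) != head_commit


def record_build(state_file: Path, tree: str, head_commit: str) -> None:
    """Remember that `head_commit` of `tree` has been built and published."""
    state = _load_state(state_file)
    state[tree] = head_commit
    state_file.write_text(json.dumps(state, indent=2, sort_keys=True))
```

A daily or weekly job would then call should_rebuild() per tree, build and publish only the ones that changed, and call record_build() after a successful publish.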


>> Along the way there are a number of clean-ups suggested by Daniel,
>> including improving the selection of kernel .configs, and splitting
>> apart the tasks for 9p and non-9p builds.
> 
> Awesome.
> 
>> It seems to me that we want to add a lot of complexity and nuance to
>> the "make linux" target. It might make sense to first split bootlinux
>> into three playbooks:
>>
>> 1. build this { URL, commit hash } tuple
>> 2. install from the source tree or existing package
>> 3. handle the grub details
>>
>> Then A/B testing (and other new features) can be built on top of those
>> smaller plays.
>>
>> Or, simply add a new "make linux-ab" target for this use case; I'm
>> guessing that the current set of "make linux" tasks might be utilized
>> far more frequently than this specialized case.
> 
> I just exposed make linux-baseline and make linux-dev, and so
> make linux becomes a sequential operation of these two. I decided to
> blend it into make linux just because that's the goal. It's not clear if we
> need a different target for when you already configured and know that
> you want different kernels for baseline (A) and dev (B), especially since
> I also added an option so that when you enable KDEVOPS_BASELINE_AND_DEV
> (essentially what we're using for A/B testing), in the bootlinux menu
> you can select whether you want both groups to have different kernels or
> the same kernel.
> 
> So yeah, I'm not sure we really need a linux-ab target.
> 
>> (Architecting the kdevops UX might be a little beyond the skill set of
>> AI code generators right at the moment)
> 
> Indeed. However, I'm seeing that once I augment CLAUDE.md with
> instructions on preferred style, it sometimes does pick up on it.
> But indeed, stylistic preferences are best expressed by humans in
> CLAUDE.md. And in this case I do think that proper architecture ends
> up being something we need to think hard about. I think this is a good
> example of the limits of these tools, and of what we humans end up
> having to think more about.
> 
>>> +# Run fio-tests with kernel comparison
>>> +make linux                 # Install different kernels
>>> +make fio-tests             # Performance test both kernels
>>> +make fio-tests-compare     # Compare performance metrics
>>
>> I know that Chinner has made claims about how closely his QEMU-based
>> performance testing approaches the same results from bare metal testing.
>> Even so ...
>>
>> This example is a little dangerous. fio results (especially latency)
>> will depend on there being enough idle CPU horsepower on the hypervisor
>> system. For cloud, all bets are off, because we have no control over the
>> other workloads on that physical host unless we have rented a bare metal
>> instance.
> 
> Agreed. It's up to the users to know the above. It's why my fio-tests
> patches clarify the same caution you're observing. So I do agree that
> revising the language to caution strongly about this would
> be good.
> 
> I just use guests to prototype. When I want real data I use bare metal.
> And fortunately I'm starting to see bare metal working on kdevops; it's
> just minor tweaks we have to do. The steady-state stuff I added already
> works on bare metal. For other workflows it's just a matter of modifying
> the create_data role calls to be skipped when SKIP_BRINGUP is enabled,
> as it's inferred the user would have done this step too. The other one
> is that SKIP_BRINGUP should likely select WORKFLOW_INFER_USER_AND_GROUP,
> so that data on /data/ gets the local user / group. And so the way
> I intend to do future performance analysis *will* be on bare metal.
> 
> Guests are just for prototyping. However, there are a few tests which one
> could *still* likely run with PCI-E passthrough onto SSDs, and those
> could still be very useful too. Once we have real data on bare metal we
> can compare and contrast against VMs with PCI-E passthrough (which
> kdevops supports) and try to see what the deltas are.

Interesting -- I didn't know you have so much bare metal work going on!

As far as I recall, kdevops/aws is the only provider where we offer a bare
metal choice. I can have a look at the others to see what is easy to
introduce.


>> Even in the guestfs world, it's easy to set up a configuration where you
>> have provisioned more vCPUs than your system has threads / cores. There
>> can be other workload traffic on the system too, and you can be sure
>> that running the A and B tests at the same time will interfere with each
>> other.
> 
> Yup!
> 
>> "-compare" makes total sense for functional test results. For comparing
>> automatically generated performance results, there are loads of caveats.
> 
> Indeed.
> 
>> (I rely on fio testing for NFS, so I'm pleased to see the fio-test
>> workflow -- but still, plenty of care must be taken before the results
>> can be used for inter-run comparison).
> 
> Yes, yes, yes.
> 
> I think we're in sync with our preferences for avoiding stupid
> performance results. I think the language needs to be enhanced so that
> those who are not familiar with these issues are brought up to speed.
> 
> Maybe just recommend guests for prototyping, and we'd evaluate which
> performance metrics *do* make sense with PCI-E passthrough, but advocate
> for only true bare metal as the real data. I wonder if it would help to
> have a Kconfig option which would taint performance results or something
> like that if the config does not adhere to our accepted norms. Perhaps?
> The more I think about it, the more I like it. Then we'd have a semantic
> way to express the idea / norms / best practices, which may also be
> useful for the bots.

Watermarking the results by injecting timing jitter? ;-)
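More seriously, here is a minimal sketch of the checks such a taint option might perform, drawn from the caveats raised above: no bare metal, more vCPUs provisioned than the host has threads, and A and B measured at the same time on one host. No such Kconfig option exists today; RunEnvironment and its fields are hypothetical, and it assumes bare-metal A and B nodes are separate machines.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RunEnvironment:
    bare_metal: bool          # False for guestfs guests and virtualized cloud instances
    pcie_passthrough: bool    # test devices passed through to the guest
    guest_vcpus: int          # vCPUs provisioned across guests running at once
    host_cpus: Optional[int]  # hardware threads on the hypervisor; None if unknown (cloud)
    ab_concurrent: bool       # A and B measured at the same time


def taint_reasons(env: RunEnvironment) -> List[str]:
    """Return reasons why performance results from `env` should be treated as tainted."""
    if env.bare_metal:
        return []  # assumes A and B nodes are separate, dedicated machines
    reasons = ["virtualized (PCIe passthrough)" if env.pcie_passthrough else "virtualized"]
    if env.host_cpus is not None and env.guest_vcpus > env.host_cpus:
        reasons.append(f"vCPUs overcommitted ({env.guest_vcpus} > {env.host_cpus})")
    if env.ab_concurrent:
        reasons.append("A and B ran concurrently on the same host")
    return reasons
```

An empty list would mean "clean"; anything else could be stamped into the results so a later compare step (or a bot) refuses to treat them as real data.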


>>> +# Run sysbench with kernel comparison
>>> +make linux                 # Install different kernels
>>> +make sysbench              # Database tests on both kernels
>>> +```
>>> +
>>> +### Best Practices
>>> +
>>> +1. **Version Identification**: Use descriptive kernel release versions to distinguish builds
>>
>> This has been an ongoing problem for my "build once, package, and
>> install everywhere" set. How to identify which package to install,
>> in particular when the build step and the install step do not take
>> place during the same "make linux" run?
>>
>> The kernel's Kconfig has the CONFIG_LOCALVERSION_AUTO option which
>> adds the 12-hexit commit hash to the name of the kernel:
>>
>>    6.16.0-rc6-00005-g6b59765c97a3
>>
>> Which:
>> - Gives us an identifier tied to a specific version of the code base
>> - rpmbuild makes part of the kernel package name, like so:
>>
>>    kernel-6.12.40_rc1_g596aae841edf-2.x86_64.rpm
>>
>> But CONFIG_LOCALVERSION_AUTO does not identify the particular .config
>> used to build the kernel.
> 
> I see -- I thought CONFIG_LOCALVERSION_AUTO was already nice enough,
> so you want to track configs more closely.

I set CONFIG_LOCALVERSION_AUTO pretty much everywhere. I am already a
client, as the TV ads used to say.

I'm just pointing out a possible use case where it wouldn't be enough.
And maybe we worry about that use case some other time. There is a lot
going on in this one patch!
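If we do get to that use case, one option is to key artifacts on both the CONFIG_LOCALVERSION_AUTO release string and a digest of the .config. A minimal sketch; the identifier format and function names are made up for illustration:

```python
import hashlib
from pathlib import Path


def config_digest(config_path: Path, length: int = 12) -> str:
    """First `length` hex digits of the SHA-256 of a kernel .config."""
    return hashlib.sha256(config_path.read_bytes()).hexdigest()[:length]


def build_identity(kernelrelease: str, config_path: Path) -> str:
    """Hypothetical artifact key covering both the source and the .config.

    kernelrelease is e.g. the CONFIG_LOCALVERSION_AUTO string 6.16.0-rc6-00005-g6b59765c97a3.
    """
    return f"{kernelrelease}+cfg-{config_digest(config_path)}"
```

Hashing the raw file is the simplest choice; it also means any textual change to the .config (including the compiler version recorded in it) yields a new identity, which is arguably what we want for A/B runs.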


> Maybe we should enable CONFIG_IKCONFIG and CONFIG_IKCONFIG_PROC, and
> to enhance this further perhaps we can also add, either upstream in Linux
> or in kdevops, an equivalent sha256sum of the config. So if not upstream
> in Linux, we'd record the output of cat linux/.config | sha256sum in the
> kdevops config, and I'd hope that $(zcat /proc/config.gz | sha256sum) at
> runtime on the booted target DUT would match.
> 
>> For instance you might want to A/B the same
>> code base built with different .configs. Or, maybe you are explicitly
>> intending not to support multiple .configs here.
> 
> You are totally right -- let's just extend our semantics to include more
> config details as suggested above. Thoughts?
> 
>   Luis
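On the CONFIG_IKCONFIG idea: a minimal sketch of that runtime check, assuming the booted kernel has CONFIG_IKCONFIG=y and CONFIG_IKCONFIG_PROC=y, and that both files have been fetched to one place (say, the controller). It relies on the embedded copy being byte-for-byte the .config used for the build, which is my understanding of how kernel/Makefile generates it. The function name is hypothetical.

```python
import gzip
import hashlib
from pathlib import Path


def _sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def running_config_matches(build_config: Path,
                           proc_config: Path = Path("/proc/config.gz")) -> bool:
    """Compare the build tree's .config with the config embedded in the booted kernel."""
    built = _sha256(build_config.read_bytes())
    # /proc/config.gz is gzip-compressed; hash the decompressed text, like zcat | sha256sum.
    with gzip.open(proc_config, "rb") as f:
        running = _sha256(f.read())
    return built == running
```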


-- 
Chuck Lever


Thread overview: 13+ messages
2025-07-26  1:16 [PATCH 0/4] kdevops: add support for A/B testing Luis Chamberlain
2025-07-26  1:16 ` [PATCH 1/4] Makefile: add make style for style checking Luis Chamberlain
2025-07-26  1:16 ` [PATCH 2/4] CLAUDE.md: new workflow guide for hosts and nodes Luis Chamberlain
2025-07-26  1:16 ` [PATCH 3/4] gen_nodes/gen_hosts: avoid usage of fs_config_path on task names Luis Chamberlain
2025-07-26  1:16 ` [PATCH 4/4] bootlinux: add support for A/B kernel testing Luis Chamberlain
2025-07-26 18:00   ` Chuck Lever
2025-07-26 20:21     ` Luis Chamberlain
2025-07-26 21:37       ` Luis Chamberlain
2025-07-26 22:46       ` Luis Chamberlain
2025-07-26 23:16         ` Chuck Lever
2025-07-26 23:34           ` Luis Chamberlain
2025-07-26 23:35       ` Chuck Lever [this message]
2025-07-27  0:06         ` Luis Chamberlain
