From: Fabiano Rosas <farosas@suse.de>
To: "Daniel P. Berrangé" <berrange@redhat.com>
Cc: qemu-devel@nongnu.org, "Peter Xu" <peterx@redhat.com>,
"Peter Maydell" <peter.maydell@linaro.org>,
"Alex Bennée" <alex.bennee@linaro.org>,
"Philippe Mathieu-Daudé" <philmd@linaro.org>,
"Thomas Huth" <thuth@redhat.com>,
"Wainer dos Santos Moschetta" <wainersm@redhat.com>
Subject: Re: [PATCH 4/4] ci: Add check-migration-quick to the clang job
Date: Fri, 18 Oct 2024 11:47:48 -0300 [thread overview]
Message-ID: <87r08d1n8r.fsf@suse.de> (raw)
In-Reply-To: <ZxJu6vfFVP6lYym_@redhat.com>
Daniel P. Berrangé <berrange@redhat.com> writes:
> On Fri, Oct 18, 2024 at 10:51:08AM -0300, Fabiano Rosas wrote:
>> Daniel P. Berrangé <berrange@redhat.com> writes:
>>
>> > On Thu, Oct 17, 2024 at 01:29:35PM -0300, Fabiano Rosas wrote:
>> >> Daniel P. Berrangé <berrange@redhat.com> writes:
>> >>
>> >> > On Thu, Oct 17, 2024 at 11:32:11AM -0300, Fabiano Rosas wrote:
>> >> >> Recent changes to how we invoke the migration tests have
>> >> >> (intentionally) caused them to not be part of the check-qtest target
>> >> >> anymore. Add the check-migration-quick target so we don't lose
>> >> >> migration code testing in this job.
>> >> >
>> >> > But 'check-migration-quick' is only the subset of migration tests,
>> >> > 'check-migration' is all of the migration tests. So surely this is
>> >> > a massive regressions in covage in CI pipelines.
>> >>
>> >> I'm not sure it is. There are tests there already for all the major
>> >> parts of the code: precopy, postcopy, multifd, socket. Besides, we can
>> >> tweak migration-quick to cover spots where we think we're losing
>> >> coverage.
>> >
>> > Each of the tests in migration-test were added for a good reason,
>> > generally to address testing gaps where we had functional regressions
>> > in the past. I don't think its a good idea to stop running such tests
>> > in CI as gating on new contributions. Any time we've had optional
>> > tests in QEMU, we've seen repeated regressions in the area in question.
>> >
>> >> Since our CI offers nothing in terms of reproducibility or
>> >> debuggability, I don't think it's productive to have an increasing
>> >> amount of tests running in CI if that means we'll be dealing with
>> >> timeouts and intermittent crashes constantly.
>> >
>> > Test reliability is a different thing. If a particular test is
>> > flaky, it needs to either be fixed or disabled.
>>
>> The problem is that in this community the idea of "fix" is: wait until
>> someone with the appropriate skill level and interest stumbles upon the
>> problem on their own and fix it in anger.
>>
>> For it to be a proper strategy, we'd need to create an issue in gitlab
>> referencing the bug, have a proper reproducer and encourage contributors
>> to work on the issue.
>
> It is policy that we should be creating issues in gitlab for any
> flaky tests. I wouldn't say we've been perfect at that, but we
> should be doing it, and that link ought to be linked in the code
> if we disable the test there.
>
>> - It was disabled in March 2023 and stood there *not testing anything*
>> while a major refactoring of the test code was happening.
>>
>> - The test was fixed in June 2023, but not reenabled in fear of getting
>> flak from the community for breaking CI again (or at least that's the
>> feeling I got from talking to Juan).
>>
>> - mapped-ram (which relies entirely on multifd) started being worked on
>> and I had to enable the test in my own branch to be able to test the
>> code properly. While disabled, it caught several issues in mapped-ram.
>>
>> - In October 2023 the test is re-enabled an immediately exposes issues
>> in the code.
>>
>> This is how I started working on the migration code. Maybe you can
>> appreciate why I don't feel confident about this fix or disable
>> strategy. It has eaten many hours of my work.
>
> The migration subsystem was definitely suffering from insufficient
> maintainer resources available historically, which is reflected
> in some of the testing problems we've had & largely how I ended
> up spending so much time on migration code too.
>
>> > Looking at its execution time right now, I'd say migration test
>> > is pretty good, considering the permutations we have to target.
>> >
>> > It gets a bad reputation because historically it has been as
>> > much as x20 slower than it is today, and has also struggled
>> > with reliability. The latter is a reflection of the complexity
>> > of migration and and IMHO actually justifies greater testing,
>> > as long as we put in time to address bugs.
>> >
>> > Also we've got one single test program, covering an entire
>> > subsystem in one go, rather than lots of short individual
>> > test programs, so migration unfairly gets blamed for being
>> > slow, when it simply covers alot of functionality in one
>> > program.
>>
>> And still you think it's not worth it having a separate target for
>> testing migration. FWIW, I also proposed splittling it into multiple
>> meson tests, which you also rejected. It would be so much easier to move
>> all of this into a separate target and let those who want nothing do to
>> with to just ignore it.
>
> In the qtests/meson.build, I see we register separate
> suites - a generic "qtest" suite, and a "qtest-$TARGET"
> suite. What's missing here is a suite for subsystem
> classification, which I guess is more or less what you
> proposed here for migration.
>
> How about (in addition to the idea of splitting migration-test
> into one part run for all targets, and one part run for just
> one target), we generalize this concept to work for any
> subsystem tagging
>
> qtest_subsystems = {
> 'migration-test': ['migration'],
> 'cdrom-test': ['block'],
> 'ahci-test': ['block'],
> ....
> }
This is interesting because it would allow a test to be considered in
more than one subsystem. There are many tests that invoke migration
themselves.
>
>
> then when registering tests we could do
>
> suites = ['qtest', 'qtest-' + target_base]
> foreach subsys: qtest_subsystems.get(test, [])
> suites += ['qtest-' + subsys, 'qtest-' + target_base + '-' + subsys]
Hopefully meson won't take this as a hint to construct a 1000 characters
long line when invoking the test.
> endforeach
>
> test(....
> suite: suites)
>
> that would give us a way to run just the migration tests, or
> just the migration tests on x86_64, etc, likewise for other
> subsystems we want to tag, while still keeping 'make check-qtest'
> as the full coverage.
>
>> >> > Any tests in tree need to be exercised by CI as the minimum bar
>> >> > to prevent bit rot from merges.
>> >> >
>> >>
>> >> No disagreement here. But then I'm going to need advice on what to do
>> >> when other maintainers ask us to stop writing migration tests because
>> >> they take too long. I cannot send contributors away nor merge code
>> >> without tests.
>> >
>> > In general, I think it is unreasonable for other maintainers to
>> > tell us to stop adding test coverage for migration, and would
>> > push back against such a request.
>>
>> This might be a larger issue in QEMU. I also heard the same back in 2021
>> when doing ppc work: "don't add too many tests because the CI buckles
>> and people get mad". The same with adding too much logging really. We're
>> hostages to the gitlab CI unfortunately.
>
> Yep, we need more investment in our CI resources. There were some
> ideas discussed at KVM Forum that could help with this, which
> should be publicised soon.
Great. Looking forward to those.
>
>
>> > This feels like something that should be amenable to unit testing.
>> > Might need a little re-factoring of migration code to make it
>> > easier to run a subset of its logic in isolation, but that'd
>> > probably be a win anyway, as such work usually makes code cleaner.
>>
>> I'll invest some time in that. I've no idea how we do unit testing in
>> QEMU.
>
> Mostly starts with the standard glib test program setup, where by
> you create a tests/unit/test-<blah>.c file, with functions for
> each test case that you register with g_test_add, same as qtest.
>
> The key difference from qtest is that you then just directly
> link the test binary to the code to be tested, and call into
> it internal QEMU APIs directly. In this case, it would involve
> linking to the 'libmigration' meson static library object.
Thanks for the introduction. I'll pick some self-contained part of
migration and see how far we are from being able to write unit tests.
>
>
> With regards,
> Daniel
next prev parent reply other threads:[~2024-10-18 14:48 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-10-17 14:32 [PATCH 0/4] tests/qtest: Move the bulk of migration tests into a separate target Fabiano Rosas
2024-10-17 14:32 ` [PATCH 1/4] tests/qtest: Add check-migration Fabiano Rosas
2024-10-17 15:09 ` Daniel P. Berrangé
2024-10-17 14:32 ` [PATCH 2/4] docs: Add migration tests documentation Fabiano Rosas
2024-10-17 14:32 ` [PATCH 3/4] tests/qtest/migration: Move tests into g_test_slow() Fabiano Rosas
2024-10-17 14:32 ` [PATCH 4/4] ci: Add check-migration-quick to the clang job Fabiano Rosas
2024-10-17 14:57 ` Daniel P. Berrangé
2024-10-17 16:29 ` Fabiano Rosas
2024-10-18 9:01 ` Daniel P. Berrangé
2024-10-18 9:46 ` Peter Maydell
2024-10-18 10:00 ` Daniel P. Berrangé
2024-10-18 10:09 ` Peter Maydell
2024-10-18 10:38 ` Alex Bennée
2024-10-18 13:51 ` Fabiano Rosas
2024-10-18 13:54 ` Daniel P. Berrangé
2024-10-18 14:28 ` Fabiano Rosas
2024-10-18 14:33 ` Daniel P. Berrangé
2024-10-18 13:51 ` Fabiano Rosas
2024-10-18 14:21 ` Daniel P. Berrangé
2024-10-18 14:47 ` Fabiano Rosas [this message]
2024-10-18 15:25 ` Peter Maydell
2024-10-18 16:12 ` Daniel P. Berrangé
2024-10-21 14:55 ` Daniel P. Berrangé
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87r08d1n8r.fsf@suse.de \
--to=farosas@suse.de \
--cc=alex.bennee@linaro.org \
--cc=berrange@redhat.com \
--cc=peter.maydell@linaro.org \
--cc=peterx@redhat.com \
--cc=philmd@linaro.org \
--cc=qemu-devel@nongnu.org \
--cc=thuth@redhat.com \
--cc=wainersm@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).