From: Alan Huang <mmpgouride@gmail.com>
To: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
Theodore Ts'o <tytso@mit.edu>,
linux-bcachefs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: [GIT PULL] bcachefs fixes for 6.12-rc2
Date: Mon, 7 Oct 2024 05:31:27 +0800
Message-ID: <D370F79F-8D33-4156-8675-8C00A2CD2DF3@gmail.com>
In-Reply-To: <dcfwznpfogbtbsiwbtj56fa3dxnba4aptkcq5a5buwnkma76nc@rjon67szaahh>
On Oct 7, 2024, at 03:29, Kent Overstreet <kent.overstreet@linux.dev> wrote:
>
> On Sun, Oct 06, 2024 at 12:04:45PM GMT, Linus Torvalds wrote:
>> On Sat, 5 Oct 2024 at 21:33, Kent Overstreet <kent.overstreet@linux.dev> wrote:
>>>
>>> On Sun, Oct 06, 2024 at 12:30:02AM GMT, Theodore Ts'o wrote:
>>>>
>>>> You may believe that yours is better than anyone else's, but with
>>>> respect, I disagree, at least for my own workflow and use case. And
>>>> if you look at the number of contributors in both Luis's and my xfstests
>>>> runners[2][3], I suspect you'll find that we have far more
>>>> contributors in our git repo than in your solo effort....
>>>
>>> Correct me if I'm wrong, but your system isn't available to the
>>> community, and I haven't seen a CI or dashboard for kdevops?
>>>
>>> Believe me, I would love to not be sinking time into this as well, but
>>> we need to standardize on something everyone can use.
>>
>> I really don't think we necessarily need to standardize. Certainly not
>> across completely different subsystems.
>>
>> Maybe filesystem people have something in common, but honestly, even
>> that is rather questionable. Different filesystems have enough
>> different features that you will have different testing needs.
>>
>> And a filesystem tree and an architecture tree (or the networking
>> tree, or whatever) have basically almost _zero_ overlap in testing -
>> apart from the obvious side of just basic build and boot testing.
>>
>> And don't even get me started on drivers, which have a whole different
>> thing and can generally not be tested in some random VM at all.
>
> Drivers are obviously a whole different ballgame, but what I'm after is
> more:
> - tooling the community can use
> - some level of common infrastructure, so we're not all rolling our own.
>
> "Test infrastructure the community can use" is a big one, because
> enabling the community and making it easier for people to participate
> and do real development is where our pipeline of new engineers comes
> from.
Yeah, the CI is really helpful, at least for those who want to get involved in
the development of bcachefs. As a newcomer, I’m not at all interested in setting up
a separate testing environment at the very beginning, which might be time-consuming
and costly.
>
> Over the past 15 years, I've seen the filesystem community get smaller
> and older, and that's not a good thing. I've had some good success with
> giving ktest access to people in the community, who then start using it
> actively and contributing (small, so far) patches (and interestingly, a
> lot of the new activity is from China) - this means they can do
> development at a reasonable pace and I don't have to look at their code
> until it's actually passing all the tests, which is _huge_.
>
> And filesystem tests take overnight to run on a single machine, so
> having something that gets them results back in 20 minutes is also huge.
Exactly, I can verify some ideas very quickly with the help of the CI.
So, a big thank you for all the effort you've put into it!
>
> The other thing I'd really like is to take the best of what we've got
> for testrunner/CI dashboard (and opinions will vary, but of course I
> like ktest the best) and make it available to other subsystems (mm,
> block, kselftests) because not everyone has time to roll their own.
>
> That takes a lot of facetime - getting to know people's workflows,
> porting tests - so it hasn't happened as much as I'd like, but it's
> still an active interest of mine.
>
>> So no. People should *not* try to standardize on something everyone can use.
>>
>> But _everybody_ should participate in the basic build testing (and the
>> basic boot testing we have, even if it probably doesn't exercise much
>> of most subsystems). That covers a *lot* of stuff that various
>> domain-specific testing does not (and generally should not).
>>
>> For example, when you do filesystem-specific testing, you very seldom
>> have many issues with different compilers or architectures. Sure,
>> there can be compiler version issues that affect behavior, but let's
>> be honest: it's very very rare. And yes, there are big-endian machines
>> and the whole 32-bit vs 64-bit thing, and that can certainly affect
>> your filesystem testing, but I would expect it to be a fairly rare and
>> secondary thing for you to worry about when you try to stress your
>> filesystem for correctness.
>
> But - a big gap right now is endian /portability/, and that one is a
> pain to cover with automated tests because you either need access to
> both big and little endian hardware (at a minimum for creating test
> images), or you need to run qemu in full-emulation mode, which is pretty
> unbearably slow.
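For anyone following along who hasn't dealt with on-disk endianness
before, here is a tiny self-contained C sketch of the property under
test (purely illustrative, not bcachefs code, and nothing to do with
how ktest drives qemu): the on-disk bytes are defined to be
little-endian and assembled byte by byte, so the same image decodes
identically on big- and little-endian hosts. What Kent is describing
as painful is verifying exactly that end to end, which is what needs
real big-endian hardware or slow full-system emulation.

#include <stdint.h>
#include <stdio.h>

/* Decode a 64-bit little-endian field from raw on-disk bytes.
 * Assembling the value byte by byte gives the same result on any host CPU. */
static uint64_t get_le64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* Encode a 64-bit value as little-endian bytes for the on-disk image. */
static void put_le64(uint8_t *p, uint64_t v)
{
	for (int i = 0; i < 8; i++) {
		p[i] = v & 0xff;
		v >>= 8;
	}
}

int main(void)
{
	uint8_t disk[8];
	uint64_t field = 0x0123456789abcdefULL;	/* arbitrary example value */

	put_le64(disk, field);			/* "format" on host A */
	printf("round trip ok: %d\n", get_le64(disk) == field);	/* "read back" on host B */
	return 0;
}

(In the kernel proper this is what the __le64 annotations and the
cpu_to_le64()/le64_to_cpu() helpers are for; the sketch just spells
out the idea.)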
>
>> But build and boot testing? All those random configs, all those odd
>> architectures, and all those odd compilers *do* affect build testing.
>> So you as a filesystem maintainer should *not* generally strive to do
>> your own basic build test, but very much participate in the generic
>> build test that is being done by various bots (not just on linux-next,
>> but things like the 0day bot on various patch series posted to the
>> list etc).
>>
>> End result: one size does not fit all. But I get unhappy when I see
>> some subsystem that doesn't seem to participate in what I consider the
>> absolute bare minimum.
>
> So the big issue for me has been that with the -next/0day pipeline, I
> have no visibility into when it finishes, which means it has to go onto
> my mental stack of things to watch for and becomes yet another thing to
> pipeline, and the more I have to pipeline the more I lose track of
> things.
>
> (Seriously: when I am constantly tracking 5 different bug reports and
> talking to 5 different users, every additional bit of mental state I
> have to remember is death by a thousand cuts).
>
> Which would all be solved with a dashboard - which is why adding the
> build testing to ktest (or ideally, stealing _all_ the 0day tests for
> ktest) is becoming a bigger and bigger priority.
>
>> Btw, there are other ways to make me less unhappy. For example, a
>> couple of years ago, we had a string of issues with the networking
>> tree. Not because there was any particular maintenance issue, but
>> because the networking tree is basically one of the biggest subsystems
>> there are, and so bugs just happen more for that simple reason. Random
>> driver issues that got found were resolved quickly, but they kept happening
>> in rc releases (or even final releases).
>>
>> And that was *despite* the networking fixes generally having been in linux-next.
>
> Yeah, same thing has been going on in filesystem land, which is why we
> now have fs-next that we're supposed to be targeting our testing automation
> at.
>
> That one will likely come slower for me, because I need to clear out a
> bunch of CI failing tests before I'll want to look at that, but it's on
> my radar.
>
>> Now, the reason I mention the networking tree is that the one simple
>> thing that made it a lot less stressful was that I asked whether the
>> networking fixes pulls could just come in on Thursday instead of late
>> on Friday or Saturday. That meant that any silly things that the bots
>> picked up on (or good testers picked up on quickly) now had an extra
>> day or two to get resolved.
>
> Ok, if fixes coming in on Saturday is an issue for you that's something
> I can absolutely change. The only _critical_ one for rc2 was the
> __wait_for_freeing_inode() fix (which did come in late), the rest
> could've waited until Monday.
>
>> Now, it may be that the string of unfortunate networking issues that
>> caused this policy was entirely just bad luck, and we just haven't
>> had that. But the networking pull still comes in on Thursdays, and
>> we've been doing it that way for four years, and it seems to have
>> worked out well for both sides. I certainly feel a lot better about
>> being able to do the (sometimes fairly sizeable) pull on a Thursday,
>> knowing that if there is some last-minute issue, we can still fix just
>> *that* before the rc or final release.
>>
>> And hey, that's literally just a "this was how we dealt with one
>> particular situation". Not everybody needs to have the same rules,
>> because the exact details will be different. I like doing releases on
>> Sundays, because that way the people who do a fairly normal Mon-Fri
>> week come in to a fresh release (whether rc or not). And people tend
>> to like sending in their "work of the week" to me on Fridays, so I get
>> a lot of pull requests on Friday, and most of the time that works just
>> fine.
>>
>> So the networking tree timing policy ended up working quite well for
>> that, but there's no reason it should be "The Rule" and that everybody
>> should do it. But maybe it would lessen the stress on both sides for
>> bcachefs too if we aimed for that kind of thing?
>
> Yeah, that sounds like the plan then.