From: Pingfan Liu <kernelfans@gmail.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>,
Catalin Marinas <catalin.marinas@arm.com>,
Thomas Gleixner <tglx@linutronix.de>,
Will Deacon <will@kernel.org>, Marc Zyngier <maz@kernel.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Linux ARM <linux-arm-kernel@lists.infradead.org>,
"Paul E. McKenney" <paulmck@kernel.org>,
Peter Zijlstra <peterz@infradead.org>
Subject: Re: [GIT PULL] arm64 fixes for 5.15-rc5
Date: Sun, 17 Oct 2021 15:42:34 +0800 [thread overview]
Message-ID: <YWvT6hQWODTFaMwA@piliu.users.ipa.redhat.com> (raw)
In-Reply-To: <CAHk-=wjTAJwMJZ-6PPxvdtDmkL0=pfRF77nJ5qWw2vbiTzT4nQ@mail.gmail.com>
Hi Linus,
I am not in the Cc list, so just aware of it the day before yesterday.
On Mon, Oct 11, 2021 at 12:54:49PM -0700, Linus Torvalds wrote:
> On Mon, Oct 11, 2021 at 3:47 AM Mark Rutland <mark.rutland@arm.com> wrote:
> >
> > Sorry; I agree that commit messages don't explain this thoroughly. I can
> > go and rework the commit messages to clarify things, if you'd like?
>
> So I really hate that patch, even with explanation.
>
> Having the Kconfig option for "do the right thing" is not how this
> should be fixed.
>
> Either the generic code is generic, or it isn't.
>
> And if the generic code doesn't work for arm64, we shouldn't add a
> random Kconfig option for "this architecture wants different
> behavior".
>
> > The TL;DR is that a bunch of constraints conspire such that we can't
> > defer accounting things to the irqdomain or irqchip code without making
> > this even more complicated/fragile, and many architectures get this
> > subtly wrong today -- it's not that arm64 is necessarily much different
> > from other architectures using this code, just that we're trying to
> > solve this first.
>
> So then I think the fix is to just say "accounting in this generic
> function is wrong, and the accounting needs to be moved to the
> callers".
>
> That's particularly true if you think arm64 is only the only actually
> _tested_ case that gets this wrong, and other architectures most
> likely have the exact same issue, but you only fixed it for arm64.
>
> So do it unconditionally - possibly even using a coccinelle script if
> there are lots of callers.
>
> Because generic code that just isn't generic, but randomly does
> different things based on subtle Kconfig options that are different
> for different architectures is bad, bad, bad.
>
When composing the patch, I failed to realize it, but now, I learn.
> And yes, I realize that we've had that kind of stuff before (and we
> have odd Kconfig option testing in that irqdesc.c file elsewhere), but
> the Kconfig options we currently have are mostly either
>
> (a) actual real honest-to-goodness config options with semantic
> meaning (ie things like CONFIG_SMP and CONFIG_NUMA)
>
> (b) really ugly workarounds for odd special module exports (that
> CONFIG_KVM_BOOK3S_64_HV_MODULE thing for irq_to_desc() that we *tried*
> but failed to remove).
>
> And so the reason I really hate that patch is that it introduces a new
> "different architectures randomly and inexplicably do different
> things, and the generic behavior is very different on arm64 than it is
> elsewhere".
>
> That's just the worst kind of hack to me.
>
> And in this case, it's really *horribly* hard to see what the call
> chain is. It all ends up being actively obfuscated and obscured
> through that 'handle_arch_irq' function pointer, that is sometimes set
> through set_handle_irq(), and sometimes set directly.
>
> I really think that if the rule is "we can't do accounting in
> handle_domain_irq(), because it's too late for arm64", then the fix
> really should be to just not do that.
>
> Move the irq_enter()/irq_exit() to the callers - quite possibly far up
> the call chain to the root of it all, and just say "architecture code
> needs to do this in the low-level code before calling
> handle_arch_irq".
>
> Or, if it turns out that 99% of callers do want it - and don't have
> the issues arm64 has - maybe we can have a helper wrapper that does
> the irq_enter/irq_exit, and another version that doesn't do it, and at
> least it's clear to the callers which one it is. But it looks like
> particularly with the odd indirection through that handle_arch_irq
> function, it's best to just say "this needs to be done".
>
> What architectures actually care? From what I can follow, it's really
> GENERIC_IRQ_MULTI_HANDLER that ends up doing this all, and then arm64
> has it's own special case for no obvious reason (why isn't it using
> GENERIC_IRQ_MULTI_HANDLER that seems to do the EXACT same thing in
> generic code except for a special "default != NULL" case?)
>
> Anyway, it _looks_ to me like the pattern is very simple:
>
> Step 1:
> - remove irq_enter/irq_exit from handle_domain_irq(), move it to all callers.
>
> This clearly doesn't change anything at all, but also doesn't fix the
> problem you have. But it's easy to verify that the code is the same
> before-and-after.
>
> Step 2 is the pattern matching step:
>
> - if the caller of handle_domain_irq() ends up being a function that
> is registered with set_handle_irq(), then we
> (a) remove the irq_enter/irq_exit from it
> (b) add it to the architectures that use handle_arch_irq.
> (c) make sure that if there are other callers of it (not through
> handle_arch_irq) we move that irq_enter/irq_exit into them too
>
> I _suspect_ - but didn't check - that Step 2(c) doesn't actually
> exist. But who knows.
>
> It really looks like there is a very tight connection between "uses
> handle_domain_irq()" and "uses handle_arch_irq/set_handle_irq()". No?
>
> Is this a bit more work? Yes.
>
> Would this fix arm64? Yes - it ends up effectively doing what you did.
>
> Would this fix _other_ architectures doing the same thing that you
> suspect are broken? YES. Instead of leaving them randomly broken.
>
> And most importantly, it would make the rules for "generic" code
> clear, and actually generic.
>
> Now, it may be that I'm wrong. I'm willing to be convinced, and if you
> beat me over the head enough I guess I can take that pull request and
> just be unhappy about it.
>
I had thought if any arch adapts GENERIC_ENTRY, then handle_domain_irq()
can be a bug. As GENERIC_ENTRY is a not _random_ config option, so try
to re-anchor the config depending on GENERIC_ENTRY.
But finally I gave up, since there is no direct link between them at a
glance. And what is more, as you said, "the rules for "generic" code
clear, and actually generic", so it is better to go in that way.
I think Mark has started the work or I will be happy to re-work on these
patches.
Thanks,
Pingfan
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
prev parent reply other threads:[~2021-10-17 7:44 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-10-08 18:36 [GIT PULL] arm64 fixes for 5.15-rc5 Catalin Marinas
2021-10-08 20:25 ` Linus Torvalds
2021-10-11 10:47 ` Mark Rutland
2021-10-11 19:54 ` Linus Torvalds
2021-10-12 13:18 ` Thomas Gleixner
2021-10-12 14:02 ` Mark Rutland
2021-10-12 15:39 ` Thomas Gleixner
2021-10-21 15:57 ` Mark Rutland
2021-10-17 7:42 ` Pingfan Liu [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YWvT6hQWODTFaMwA@piliu.users.ipa.redhat.com \
--to=kernelfans@gmail.com \
--cc=catalin.marinas@arm.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mark.rutland@arm.com \
--cc=maz@kernel.org \
--cc=paulmck@kernel.org \
--cc=peterz@infradead.org \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).