* Re: [PATCH] irqchip/armada-370-xp: Implement SoC Error interrupts
[not found] <20240814124537.29847-1-kabel@kernel.org>
@ 2024-08-14 15:51 ` Thomas Gleixner
2024-08-14 16:43 ` Conor Dooley
0 siblings, 1 reply; 3+ messages in thread
From: Thomas Gleixner @ 2024-08-14 15:51 UTC (permalink / raw)
To: Marek Behún, Andrew Lunn, Gregory Clement,
Sebastian Hesselbarth, linux-arm-kernel, arm, Andy Shevchenko,
Hans de Goede, Ilpo Järvinen, Rob Herring, devicetree
Cc: Marek Behún
On Wed, Aug 14 2024 at 14:45, Marek Behún wrote:
Cc+ device tree people.
> + The code binds this new interrupt domain to the same device-tree node as
> + the main interrupt domain. The main interrupt controller has its
> + interrupts described by one argument in device-tree
> + (#interrupt-cells = <1>), i.e.:
> +
> + interrupts-extended = <&mpic 8>;
> +
> + Because of backwards compatibility we cannot change this number of
> + arguments, and so the SoC Error interrupts must also be described by
> + this one number.
> +
> + Thus, to describe a SoC Error interrupt, one has to add the an offset
> + to the SoC Error interrupt number. Offset 0x400 was chosen because the
> + main controller supports at most 1024 interrupts (in theory; in practice
> + it seems to be 116 interrupts on all supported platforms). An example of
> + describing a SoC Error interrupt is
> +
> + interrupts-extended = <&mpic 0x404>;
This looks like a horrible hack and I don't understand why this can't be
a separate interrupt controller, which it is in the hardware. That
controller utilizes interrupt 4 from the MPIC.
But then my DT foo is limited, so I let the DT folks comment on that.
> +static int mpic_soc_err_irq_set_affinity(struct irq_data *d, const struct cpumask *mask, bool force)
> +{
> + unsigned int cpu;
> +
> + /*
> + * TODO: The mpic->per_cpu region accesses CPU Local IC registers for CPU n when accessed
> + * from CPU n. Thus if we want to access this registers from another CPU, we need to request
> + * a function to be executed on CPU n. This is what we do here by calling smp_call_on_cpu().
> + *
> + * Instead, we could access CPU Local IC registers by having CPU Local region of each CPU
> + * mapped in the MPIC private data structure. We could do this either by extending the
> + * register resource in the device-tree, or by computing the physical base address of those
> + * regions relative to the main MPIC base address.
That requires locking for those registers obviously.
> + */
> +
> + cpus_read_lock();
This code was clearly never tested with any debug enabled.
set_affinity() is invoked with interrupts disabled and irq_desc::lock
held. cpus_read_lock() can sleep... The mandatory debug options would
have told you loud and clearly.
> + /* First, disable the ERR IRQ on all cores */
> + for_each_online_cpu(cpu)
> + smp_call_on_cpu(cpu, mpic_soc_err_irq_mask_on_cpu, d, true);
Again. smp_call_on_cpu() invokes wait_for_completion(), which obviously
can sleep.
Also why do you want to do that on _ALL_ CPUs if there is only one you
pick from the effective affinity mask?
> + /* Then enable on one online core from the affinity mask */
> + cpu = cpumask_any_and(mask, cpu_online_mask);
> + smp_call_on_cpu(cpu, mpic_soc_err_irq_unmask_on_cpu, d, true);
Ditto.
So you really want to map the registers so they are accessible cross CPU
including locking. Alternatively pin the error interrupts to CPU0 which
cannot be unplugged and be done with it.
> +static int mpic_soc_err_irq_map(struct irq_domain *domain, unsigned int virq, irq_hw_number_t hwirq)
> +{
> + struct mpic *mpic = domain->host_data;
> +
> + irq_set_chip_data(virq, mpic);
> +
> + mpic_soc_err_irq_mask(irq_get_irq_data(virq));
What for? It should be masked if it's not mapped, no?
> + irq_set_status_flags(virq, IRQ_LEVEL);
> + irq_set_chip_and_handler(virq, &mpic_soc_err_irq_chip, handle_level_irq);
> + irq_set_probe(virq);
> +
> + return 0;
> +}
> +static int mpic_soc_err_xlate(struct irq_domain *domain, struct device_node *np,
> + const u32 *spec, unsigned int count,
> + unsigned long *hwirq, unsigned int *type)
> +{
> + int err = irq_domain_xlate_onecell(domain, np, spec, count, hwirq, type);
> +
> + if (err)
> + return err;
> +
> + *hwirq -= MPIC_SOC_ERR_IRQS_OFFSET;
> + return 0;
> +}
> +static int __init mpic_soc_err_init(struct mpic *mpic, struct device_node *np)
> +{
> + unsigned int nr_irqs;
> +
> + if (of_machine_is_compatible("marvell,armada-370-xp"))
> + nr_irqs = 32;
> + else
> + nr_irqs = 64;
> +
> + mpic->soc_err_domain = irq_domain_add_hierarchy(mpic->domain, 0, nr_irqs, np,
> + &mpic_soc_err_irq_ops, mpic);
Why is this a hierarchical domain? That does not make any sense at
all. The MPIC domain provides only the demultiplexing interrupt and is
not involved in a domain hierarchy. Hierarchical domains are required
when the top level domain depends on resources from the parent domain
which need to be allocated when mapping an interrupt. e.g. on x86:
vector - remap - PCI/MSI
So the top level PCI/MSI domain requires resources from the remap domain
and the remap domain requires a vector from the vector domain. These
resources are per interrupt.
During runtime there are also irq_*() callbacks which utilize callbacks
from the parent domains. I.e. mask() invokes mask(parent) ...
But that has nothing to do with demultiplexed interrupts because the
underlying demultiplex interrupt is already there and the same for all
interrupts in the demultiplexed domain. There is no callback dependency
and no resource dependency at all.
Thanks,
tglx
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] irqchip/armada-370-xp: Implement SoC Error interrupts
2024-08-14 15:51 ` [PATCH] irqchip/armada-370-xp: Implement SoC Error interrupts Thomas Gleixner
@ 2024-08-14 16:43 ` Conor Dooley
2024-08-15 10:15 ` Marek Behún
0 siblings, 1 reply; 3+ messages in thread
From: Conor Dooley @ 2024-08-14 16:43 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Marek Behún, Andrew Lunn, Gregory Clement,
Sebastian Hesselbarth, linux-arm-kernel, arm, Andy Shevchenko,
Hans de Goede, Ilpo Järvinen, Rob Herring, devicetree
[-- Attachment #1: Type: text/plain, Size: 2083 bytes --]
On Wed, Aug 14, 2024 at 05:51:33PM +0200, Thomas Gleixner wrote:
> On Wed, Aug 14 2024 at 14:45, Marek Behún wrote:
>
> Cc+ device tree people.
>
> > + The code binds this new interrupt domain to the same device-tree node as
> > + the main interrupt domain. The main interrupt controller has its
> > + interrupts described by one argument in device-tree
> > + (#interrupt-cells = <1>), i.e.:
> > +
> > + interrupts-extended = <&mpic 8>;
> > +
> > + Because of backwards compatibility we cannot change this number of
> > + arguments, and so the SoC Error interrupts must also be described by
> > + this one number.
> > +
> > + Thus, to describe a SoC Error interrupt, one has to add the an offset
> > + to the SoC Error interrupt number. Offset 0x400 was chosen because the
> > + main controller supports at most 1024 interrupts (in theory; in practice
> > + it seems to be 116 interrupts on all supported platforms). An example of
> > + describing a SoC Error interrupt is
> > +
> > + interrupts-extended = <&mpic 0x404>;
>
> This looks like a horrible hack and I don't understand why this can't be
> a separate interrupt controller, which it is in the hardware. That
> controller utilizes interrupt 4 from the MPIC.
>
> But then my DT foo is limited, so I let the DT folks comment on that.
I'm guessing, not being author and all that, that since the registers
for this SOC_ERR business are intermingled with those of the existing
interrupt controller it is not possible to create a standalone node for
the new controller with it's own reg entry, without having to redo
the binding etc for the existing controller, ending up with some sort of
syscon.
Alternatively, I suppose a child node could work, but it wouldn't be much
less of a hack than deciding that numbers > 400 are the SOC_ERR ones.
I see arguments for doing it either way and Marek must have an opinion
on doing it without a new node, given the comment suggesting that a
previous revision did have a dedicated node and that approach was
dropped.
Cheers,
Conor.
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] irqchip/armada-370-xp: Implement SoC Error interrupts
2024-08-14 16:43 ` Conor Dooley
@ 2024-08-15 10:15 ` Marek Behún
0 siblings, 0 replies; 3+ messages in thread
From: Marek Behún @ 2024-08-15 10:15 UTC (permalink / raw)
To: Conor Dooley, Thomas Gleixner
Cc: Marek Behún, Andrew Lunn, Gregory Clement,
Sebastian Hesselbarth, linux-arm-kernel, arm, Andy Shevchenko,
Hans de Goede, Ilpo Järvinen, Rob Herring, devicetree
On Wed, Aug 14, 2024 at 05:43:19PM +0100, Conor Dooley wrote:
> On Wed, Aug 14, 2024 at 05:51:33PM +0200, Thomas Gleixner wrote:
> > On Wed, Aug 14 2024 at 14:45, Marek Behún wrote:
> >
> > Cc+ device tree people.
> >
> > > + The code binds this new interrupt domain to the same device-tree node as
> > > + the main interrupt domain. The main interrupt controller has its
> > > + interrupts described by one argument in device-tree
> > > + (#interrupt-cells = <1>), i.e.:
> > > +
> > > + interrupts-extended = <&mpic 8>;
> > > +
> > > + Because of backwards compatibility we cannot change this number of
> > > + arguments, and so the SoC Error interrupts must also be described by
> > > + this one number.
> > > +
> > > + Thus, to describe a SoC Error interrupt, one has to add the an offset
> > > + to the SoC Error interrupt number. Offset 0x400 was chosen because the
> > > + main controller supports at most 1024 interrupts (in theory; in practice
> > > + it seems to be 116 interrupts on all supported platforms). An example of
> > > + describing a SoC Error interrupt is
> > > +
> > > + interrupts-extended = <&mpic 0x404>;
> >
> > This looks like a horrible hack and I don't understand why this can't be
> > a separate interrupt controller, which it is in the hardware. That
> > controller utilizes interrupt 4 from the MPIC.
> >
> > But then my DT foo is limited, so I let the DT folks comment on that.
>
> I'm guessing, not being author and all that, that since the registers
> for this SOC_ERR business are intermingled with those of the existing
> interrupt controller it is not possible to create a standalone node for
> the new controller with it's own reg entry, without having to redo
> the binding etc for the existing controller, ending up with some sort of
> syscon.
> Alternatively, I suppose a child node could work, but it wouldn't be much
> less of a hack than deciding that numbers > 400 are the SOC_ERR ones.
> I see arguments for doing it either way and Marek must have an opinion
> on doing it without a new node, given the comment suggesting that a
> previous revision did have a dedicated node and that approach was
> dropped.
This is exactly the reason.
Pali's original code required creating another interrupt-controller node
(soc_err) within the mpic node:
mpic: interrupt-controller@20a00 {
compatible = "marvell,mpic";
reg = <0x20a00 0x2d0>, <0x21070 0x58>;
#interrupt-cells = <1>;
interrupt-controller;
msi-controller;
interrupts = <GIC_PPI 15 IRQ_TYPE_LEVEL_HIGH>;
soc_err: interrupt-controller@20 {
interrupt-controller;
#interrupt-cells = <1>;
};
};
I decided against it, because to do this correctly in device-tree
bindings, it gets unnecessarily complicated:
- it requires #address-cells and #size-cells within the mpic node
- the `interrupt-controller` property of mpic node can in some
interpretations clash with the `interrupt-controller@20` name of the
child node
- the registers of soc_err and mpic overlap
- the child note should have a compatible string if we want to be proper
I also did consider syscon, but IMO it just adds unnecessary burden to
backwards compatibility of device-trees.
The solution with the 0x400 offset solves the backwards compatiblity
issue.
Marek
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2024-08-15 10:15 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20240814124537.29847-1-kabel@kernel.org>
2024-08-14 15:51 ` [PATCH] irqchip/armada-370-xp: Implement SoC Error interrupts Thomas Gleixner
2024-08-14 16:43 ` Conor Dooley
2024-08-15 10:15 ` Marek Behún
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox