From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A3DF91A6800; Fri, 20 Mar 2026 12:04:57 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774008297; cv=none; b=T1fUcqOqixKMqX6BZ+bueZZf3vfog8C8czEdhnHa6qmqE+YCXD8zYcPp5QklDjLLbpVMojb5LiT1+I3kYhTTH8Lbf1KV7gWNyEsPX1rowu9j7zt98nHSry9CQaTFiJGHYjj26pOk0ZUL/ME5uxVFfFnJmFpNE190F6dPscNkuwE= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774008297; c=relaxed/simple; bh=PEtVe0XP3Yy0m7adMQRU2WDLbqvQqWio1jNWvnoX4lY=; h=From:To:Cc:Subject:In-Reply-To:References:Date:Message-ID: MIME-Version:Content-Type; b=dkDRSEkN485qZTsXQJZ3sqNgtMUuBXGJTjQr/DvGvqFsJXJWn/wwESX4W8FtGYV9FSSnpOpyySA9vVuHhFekQtZ80jbb8oOry9MwW6TZlW2rMkNh44V49BOEOp9RC6xcM05yLp34kOKV6PLTaZnJRaI3snVHgQsXM2cAloO601U= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=PlRC563Y; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="PlRC563Y" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 68B0DC4CEF7; Fri, 20 Mar 2026 12:04:50 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1774008297; bh=PEtVe0XP3Yy0m7adMQRU2WDLbqvQqWio1jNWvnoX4lY=; h=From:To:Cc:Subject:In-Reply-To:References:Date:From; b=PlRC563YasevsgiVclZC/bBbLAaUVcMDM19PpLDduE/DppZ+VWTYu8kqchH0GFnhg lZE+maaLsGWW06Vw3wa6T6iGP8pn2jifiEFUpUCuAPWSTb87Y3Z/UQaqcINMzHIeoy qhxZqurfZPm/GmWDExV0unb1Y0B3nxBRGO6pghGlihsNDHVn4g6oDyqIxeAg6haBdm X+BTTdQSg9o2UMLPr90Z4FvDbUrAZen5OIo9UouETfqqGY5AAVLzV/rQGP0OCcqUEm 56tu8YRBlMTvPJXb12VHRnudTG2Je8IsAB0B1QKDFTpLNhlqQjiCks+j8Z77ccvxBD sTqafBEPYPpSg== From: Thomas Gleixner To: Florian Eckert , Rob Herring , Krzysztof Kozlowski , Conor Dooley Cc: linux-kernel@vger.kernel.org, devicetree@vger.kernel.org, Florian Eckert , Eckert.Florian@googlemail.com, ms@dev.tdt.de, Ricardo Neri Subject: Re: [PATCH 2/2] irqchip: Add Lightning Mountain irqchip support In-Reply-To: <20260318-irq-intel-soc-msi-v1-2-0e8cdf844fa8@dev.tdt.de> References: <20260318-irq-intel-soc-msi-v1-0-0e8cdf844fa8@dev.tdt.de> <20260318-irq-intel-soc-msi-v1-2-0e8cdf844fa8@dev.tdt.de> Date: Fri, 20 Mar 2026 13:04:44 +0100 Message-ID: <87v7eqk8pv.ffs@tglx> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain Florian! On Wed, Mar 18 2026 at 14:10, Florian Eckert wrote: > The Lightning Mountain (LGM) has a MSI irqchip connected to the x86 vector > domain. This commit adds the driver for this IP core, which is available on git grep 'This patch' Documentation/process. It doesn't make it better when you replace patch with commit. > +// SPDX-License-Identifier: GPL-2.0 > +/* > + * Based on arch/x86/kernel/apic/msi.c and kernel/irq/msi.c > + * Copyright (c) 2019 Intel Corporation. > + * Copyright (c) 2020-2022, MaxLinear, Inc. > + * Copyright (c) 2026 TDT AG. > + */ > + > +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt > + > +#include > +#include > +#include > +#include > +#include > + > +#include > +#include > + > +#define MSI_MSGA(x) ((x) << 2) > +#define MSI_MSGD(x) (0x200 + ((x) << 2)) > +#define MSI_CTRL 0x400 > +#define MSI_CTRL_EN BIT(0) > +#define MSI_MSK_L 0x404 > +#define MSI_MSK_H 0x408 > + > +#define NMI_MSI_47 47 > +#define NMI_MSI_49 49 > +#define NMI_MSI_62 62 > +#define NMI_MSI_63 63 > + > +#define MAX_SOC_MSI_IRQ_PINS 64 > + > +struct soc_msi_dev { > + struct device *dev; > + void __iomem *base; > + raw_spinlock_t lock; /* protect register handling */ No tail comments please > +}; > + > +struct soc_nmi_msi { > + irq_hw_number_t irq; > + int cpuid; unsigned int. We won't have negative CPU IDs in the forseeable future. > +}; > + > +static const struct soc_nmi_msi nmi_msi[] = { > + {NMI_MSI_47, 3}, > + {NMI_MSI_49, 2}, > + {NMI_MSI_62, 1}, > + {NMI_MSI_63, 0}, > + { }, No C89 initializers and get rid of the pointless trailing {} entry. > +}; > + > +static bool soc_nmi_msi(irq_hw_number_t hwirq) > +{ > + if (hwirq == NMI_MSI_47 || hwirq == NMI_MSI_49 || > + hwirq == NMI_MSI_62 || hwirq == NMI_MSI_63) > + return true; > + > + return false; > +} > + > +static u32 nmi_irq_to_cpuid(irq_hw_number_t hwirq) > +{ > + int i; > + unsigned int nr_pcpus = num_possible_cpus(); Variable declaration ordering. See https://www.kernel.org/doc/html/latest/process/maintainer-tip.html > + for (i = 0; i < ARRAY_SIZE(nmi_msi); i++) { Now you use ARRAY_SIZE() which makes the trailing entry even more wrong. > + if (nmi_msi[i].irq == hwirq) { > + if (nmi_msi[i].cpuid >= nr_pcpus) { > + WARN(1, "NMI on invalid CPU: cpu: %d\n", > + nmi_msi[i].cpuid); No line break required. You have 100 characters. But aside of that this WARN() is bogus as the code path is well known already, no? pr_warn() is sufficient. > + return -EINVAL; > + } > + return nmi_msi[i].cpuid; > + } > + } > + > + WARN((i >= ARRAY_SIZE(nmi_msi)), "Should never come"); Ditto > + return -EINVAL; > +} > + > +static inline void > +soc_msi_update_bits(struct soc_msi_dev *mdev, u32 clr, u32 set, u32 ofs) s/ofs/offs/ > +{ > + writel((readl(mdev->base + ofs) & ~clr) | set, mdev->base + ofs); > +} > + > +static void soc_msi_enable(struct soc_msi_dev *mdev) > +{ > + soc_msi_update_bits(mdev, MSI_CTRL_EN, MSI_CTRL_EN, MSI_CTRL); > +} > + > +static void soc_msi_write_msg(struct irq_data *d, struct msi_msg *msg) > +{ > + struct soc_msi_dev *mdev = irq_data_get_irq_chip_data(d); > + unsigned long flag; > + > + raw_spin_lock_irqsave(&mdev->lock, flag); guard(raw_spinlock)(&mdev->lock); There is no reason for irqsave as irq_write_msi_msg() is invoked with the interrupt descriptor lock held, which implies interrupts are disabled. > + writel(msg->address_lo, mdev->base + MSI_MSGA(d->hwirq)); > + writel(msg->data, mdev->base + MSI_MSGD(d->hwirq)); As the comment on top claims this was modelled after apic/msi.c I have to ask the obvious question how this MSI incarnation is magically not affected by the problem described in great length in msi_set_affinity(). > +static void nmi_msi_compose_msg(struct soc_msi_dev *mdev, irq_hw_number_t hwirq) x86 lacks NMI support for irq domain based allocations.... > +{ > + struct msi_msg msg = {0}; > + unsigned long flag; > + u32 cpuid, destid; > + u32 off; > + > + cpuid = nmi_irq_to_cpuid(hwirq); > + if (cpuid < 0) > + return; > + > + destid = apic->calc_dest_apicid(cpuid); > + off = hwirq < 32 ? MSI_MSK_L : MSI_MSK_H; > + > + msg.arch_addr_lo.base_address = X86_MSI_BASE_ADDRESS_LOW; > + msg.arch_addr_lo.dest_mode_logical = apic->dest_mode_logical; > + msg.arch_addr_lo.redirect_hint = 0; > + msg.arch_addr_lo.destid_0_7 = destid & 0xFF; > + > + msg.address_hi = X86_MSI_BASE_ADDRESS_HIGH; > + > + /* > + * On edge trigger, we don't care about assert level. Also, > + * since delivery mode is NMI, no irq vector is needed. > + */ > + msg.arch_data.is_level = 0; > + msg.arch_data.delivery_mode = APIC_DELIVERY_MODE_NMI; > + > + raw_spin_lock_irqsave(&mdev->lock, flag); > + writel(msg.address_lo, mdev->base + MSI_MSGA(hwirq)); > + writel(msg.data, mdev->base + MSI_MSGD(hwirq)); > + soc_msi_update_bits(mdev, BIT(hwirq % 32), 0, off); > + raw_spin_unlock_irqrestore(&mdev->lock, flag); Writing the MSI message in compose_msg() smells more than fishy. > +} > + > + > +static int soc_msi_domain_alloc(struct irq_domain *domain, unsigned int virq, > + unsigned int nr_irqs, void *arg) > +{ > + struct msi_domain_info *msi_info = domain->host_data; > + struct irq_fwspec *fwspec = arg; > + struct irq_alloc_info info; > + irq_hw_number_t hwirq; > + unsigned int type; > + void *chip_data; > + int i, ret; > + > + if (!msi_info) > + return -EINVAL; > + > + chip_data = msi_info->chip_data; > + > + ret = soc_msi_domain_xlate(domain, fwspec, &hwirq, &type); > + if (ret) > + return ret; > + > + if (irq_find_mapping(domain, hwirq) > 0) > + return -EEXIST; > + > + /* > + * All NMI interrupts go to vector 2, no irq mapping needed. > + * What we want is to configure hardware once, don't do anything else. > + * 0 means it will continue to initialize other stuff in the irqdomain. > + * We can just return other value after hw initialized. In this case, > + * irqdomain will release all resources. This comment is incomprehensible word salad. Aside of that ... > + */ > + if (soc_nmi_msi(hwirq)) { > + nmi_msi_compose_msg((struct soc_msi_dev *)chip_data, hwirq); > + return -EINVAL; ... this is a blatant violation of all layering rules known to mankind in one go. Admittedly you get creativity points, but that's not making it technically more correct. > +static int soc_msi_domain_activate(struct irq_domain *domain, > + struct irq_data *irq_data, bool early) > +{ > + struct msi_msg msg[2] = { [1] = { }, }; > + > + WARN_ON(irq_chip_compose_msi_msg(irq_data, msg)); > + soc_msi_write_msg(irq_data, msg); > + > + return 0; That's just a copy of the corresponding kernel/irq/msi.c function. > +} > + > +static void soc_msi_domain_deactivate(struct irq_domain *domain, > + struct irq_data *irq_data) > +{ > + struct msi_msg msg[2]; > + > + memset(msg, 0, sizeof(msg)); > + soc_msi_write_msg(irq_data, msg); Ditto. > +} > + > +static const struct irq_domain_ops soc_msi_domain_ops = { Now it's obvious why you need them. You are using the wrong interrupt domain type. > + .translate = soc_msi_domain_xlate, > + .alloc = soc_msi_domain_alloc, > + .free = irq_domain_free_irqs_common, > + .activate = soc_msi_domain_activate, > + .deactivate = soc_msi_domain_deactivate, > +}; > + > +static int intel_soc_msi_probe(struct platform_device *pdev) > +{ > + struct device_node *node = pdev->dev.of_node; > + struct msi_domain_info *msi_info; > + struct irq_domain *domain; > + struct soc_msi_dev *mdev; > + struct resource *res; > + > + mdev = devm_kzalloc(&pdev->dev, sizeof(*mdev), GFP_KERNEL); > + if (!mdev) > + return -ENOMEM; > + > + msi_info = devm_kzalloc(&pdev->dev, sizeof(*msi_info), GFP_KERNEL); > + if (!msi_info) > + return -ENOMEM; > + > + mdev->dev = &pdev->dev; > + > + msi_info->flags = MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS; > + msi_info->chip_data = mdev; Q: Does this magically create a MSI domain? A: No Q: Why? A: Because irq_domain_create_hierarchy() will never reach the MSI code that handles this. > + > + res = platform_get_resource(pdev, IORESOURCE_MEM, 0); > + if (!res) > + return -EINVAL; > + > + mdev->base = devm_ioremap_resource(&pdev->dev, res); > + if (IS_ERR(mdev->base)) { > + dev_err(&pdev->dev, "failed to ioremap %pR\n", res); > + return PTR_ERR(mdev->base); > + } > + > + domain = irq_domain_create_hierarchy(x86_vector_domain, 0, So this is hardwired to the vector domain and does not allow the interrupts to be remapped? Those SoCs have VT-x which implies interrupt remapping support. But what do I know about the infinite wisdom of hardware designers. TBH, if they decided to hardwire it to the vector domain, then they are begging for a cluebat treatment. Let me summarize what I can crystal-ball out of your comprehensive change log and the insane amount of comments in the code: 1) The IP block converts 'wired' interrupts to MSI messages 2) It needs to route four interrupts as NMI Right? #1 The implementation gets the MSI interrupt domain concept completely wrong X86 uses the MSI parent domain concept. [vector domain] -- [remap domain] -- [ device domain] The remap domain is optional, but both the vector domain and the remap domain act as MSI parent domains. So what you want to create for that chip is a MSI device domain and that domain needs to set the bus token to DOMAIN_BUS_WIRED_TO_MSI. See drivers/irqchip/irq-mbigen.c mbigen_create_device_domain() and related code as an example for a proper wired to MSI implementation. #2 NMI routing There has been attempts to implement that before in a clean way. The patch set dried out, but the underlying changes for NMI support are still valid and Ricardo (CC'ed) is working on them again, IIRC. See: https://lore.kernel.org/lkml/20230301234753.28582-1-ricardo.neri-calderon@linux.intel.com/ Thanks, tglx