* [PATCH v5 3/3] iommu/arm-smmu-v3: Allow ATS to be always on
From: Nicolin Chen @ 2026-05-20 19:46 UTC
To: jgg, will, joro, bhelgaas
Cc: robin.murphy, praan, baolu.lu, kevin.tian, miko.lenczewski,
linux-arm-kernel, iommu, linux-kernel, linux-pci, dan.j.williams,
jonathan.cameron, vsethi, linux-cxl, nirmoyd
In-Reply-To: <cover.1779304390.git.nicolinc@nvidia.com>
When a device's default substream attaches to an identity domain, the SMMU
driver currently sets the device's STE between two modes:
Mode 1: Cfg=Translate, S1DSS=Bypass, EATS=1
Mode 2: Cfg=bypass (EATS is ignored by HW)
When there is an active PASID (non-default substream), mode 1 is used. And
when there is no PASID support or no active PASID, mode 2 is used.
The driver will also downgrade an STE from mode 1 to mode 2, when the last
active substream becomes inactive.
However, some PCIe devices require ATS to be always on. The STEs of these
devices must use mode 1, because HW ignores EATS in mode 2.
Change the driver accordingly:
- always use mode 1
- never downgrade to mode 2
- allocate and retain a CD table (see note below)
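The mode choice for an identity-attached default substream can be modelled
roughly as below. This is a simplified sketch, not the driver code: the enum
and function names are illustrative, and the active-PASID count stands in for
what the driver derives from arm_smmu_ssids_in_use().

```c
#include <stdbool.h>

/* Illustrative names for the two STE modes from the commit message. */
enum ste_mode {
	STE_MODE_S1DSS_BYPASS = 1,	/* Cfg=Translate, S1DSS=Bypass, EATS=1 */
	STE_MODE_BYPASS = 2,		/* Cfg=bypass, EATS ignored by HW */
};

/*
 * Pick the STE mode for a default substream attached to an identity domain.
 * With ats_always_on set, mode 1 is used even with no active PASID, so the
 * STE is never downgraded to mode 2.
 */
static enum ste_mode identity_ste_mode(bool ats_always_on,
				       unsigned int active_pasids)
{
	if (ats_always_on || active_pasids > 0)
		return STE_MODE_S1DSS_BYPASS;
	return STE_MODE_BYPASS;
}
```

In this model, dropping the last PASID only returns mode 2 when ats_always_on
is false, which mirrors the "never downgrade" rule above.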
Note that these devices might not support PASID, i.e. they do non-PASID ATS.
In such a case, ssid_bits is set to 0. However, s1cdmax must be set to a
nonzero value in order to keep the S1DSS field effective. Thus, when a
master requires ats_always_on, set its s1cdmax to at least 1, meaning that
the CD table will have a dummy entry (SSID=1) that will never be used.
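As a worked example of the sizing rule, the following standalone sketch
computes the effective S1CDMAX and the resulting number of CD table entries.
The helper names are hypothetical; only ssid_bits, s1cdmax and ats_always_on
come from the patch.

```c
#include <stdbool.h>

/* S1CDMAX follows ssid_bits, but is raised to 1 when ATS must stay on. */
static unsigned int effective_s1cdmax(unsigned int ssid_bits,
				      bool ats_always_on)
{
	unsigned int s1cdmax = ssid_bits;

	if (ats_always_on && s1cdmax < 1)
		s1cdmax = 1;
	return s1cdmax;
}

/* The CD table holds 2^S1CDMAX entries. */
static unsigned int cd_table_entries(unsigned int s1cdmax)
{
	return 1u << s1cdmax;
}
```

For a non-PASID device with ats_always_on, this gives S1CDMAX=1 and a
two-entry table: SSID 0 for the default substream and the unused SSID 1.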
For these devices, arm_smmu_cdtab_allocated() now always returns true, vs.
false prior to this change. When such a device's default substream is
attached to an IDENTITY domain, the first CD in its table is NULL, which is
a valid case. Thus, add "!master->ats_always_on" to the WARN_ON condition in
arm_smmu_clear_cd().
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Tested-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Nirmoy Das <nirmoyd@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 1 +
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 75 ++++++++++++++++++---
2 files changed, 68 insertions(+), 8 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index ef42df4753ec4..8c3600f4364c5 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -943,6 +943,7 @@ struct arm_smmu_master {
bool ats_enabled : 1;
bool ste_ats_enabled : 1;
bool stall_enabled;
+ bool ats_always_on;
unsigned int ssid_bits;
unsigned int iopf_refcount;
};
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index e8d7dbe495f03..5c9d4bb542249 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1742,8 +1742,11 @@ void arm_smmu_clear_cd(struct arm_smmu_master *master, ioasid_t ssid)
if (!arm_smmu_cdtab_allocated(&master->cd_table))
return;
cdptr = arm_smmu_get_cd_ptr(master, ssid);
- if (WARN_ON(!cdptr))
+ if (!cdptr) {
+ /* Only ats_always_on allows a NULL CD on default substream */
+ WARN_ON(!master->ats_always_on || ssid);
return;
+ }
arm_smmu_write_cd_entry(master, ssid, cdptr, &target);
}
@@ -1756,6 +1759,22 @@ static int arm_smmu_alloc_cd_tables(struct arm_smmu_master *master)
struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table;
cd_table->s1cdmax = master->ssid_bits;
+
+ /*
+ * When a device doesn't support PASID (non default SSID), ssid_bits is
+ * set to 0. This also sets S1CDMAX to 0, which disables the substreams
+ * and ignores the S1DSS field.
+ *
+ * On the other hand, if a device demands ATS to be always on even when
+ * its default substream is IOMMU bypassed, it has to use EATS that is
+ * only effective with an STE (CFG=S1translate, S1DSS=Bypass). For such
+ * use cases, S1CDMAX has to be !0, in order to make use of S1DSS/EATS.
+ *
+ * Set S1CDMAX no lower than 1. This would add a dummy substream in the
+ * CD table but it should never be used by an actual CD.
+ */
+ if (master->ats_always_on)
+ cd_table->s1cdmax = max_t(u8, cd_table->s1cdmax, 1);
max_contexts = 1 << cd_table->s1cdmax;
if (!(smmu->features & ARM_SMMU_FEAT_2_LVL_CDTAB) ||
@@ -3851,7 +3870,8 @@ static int arm_smmu_blocking_set_dev_pasid(struct iommu_domain *new_domain,
* When the last user of the CD table goes away downgrade the STE back
* to a non-cd_table one, by re-attaching its sid_domain.
*/
- if (!arm_smmu_ssids_in_use(&master->cd_table)) {
+ if (!master->ats_always_on &&
+ !arm_smmu_ssids_in_use(&master->cd_table)) {
struct iommu_domain *sid_domain =
iommu_driver_get_domain_for_dev(master->dev);
@@ -3875,6 +3895,8 @@ static void arm_smmu_attach_dev_ste(struct iommu_domain *domain,
.old_domain = old_domain,
.ssid = IOMMU_NO_PASID,
};
+ bool ats_always_on = master->ats_always_on &&
+ s1dss != STRTAB_STE_1_S1DSS_TERMINATE;
/*
* Do not allow any ASID to be changed while are working on the STE,
@@ -3886,7 +3908,7 @@ static void arm_smmu_attach_dev_ste(struct iommu_domain *domain,
* If the CD table is not in use we can use the provided STE, otherwise
* we use a cdtable STE with the provided S1DSS.
*/
- if (arm_smmu_ssids_in_use(&master->cd_table)) {
+ if (ats_always_on || arm_smmu_ssids_in_use(&master->cd_table)) {
/*
* If a CD table has to be present then we need to run with ATS
* on because we have to assume a PASID is using ATS. For
@@ -4215,6 +4237,42 @@ static void arm_smmu_remove_master(struct arm_smmu_master *master)
kfree(master->build_invs);
}
+static int arm_smmu_master_prepare_ats(struct arm_smmu_master *master)
+{
+ bool s1p = master->smmu->features & ARM_SMMU_FEAT_TRANS_S1;
+ unsigned int stu = __ffs(master->smmu->pgsize_bitmap);
+ struct pci_dev *pdev;
+ int ret;
+
+ if (!arm_smmu_ats_supported(master))
+ return 0;
+
+ pdev = to_pci_dev(master->dev);
+
+ if (!pci_ats_required(pdev))
+ goto out_prepare;
+
+ /*
+ * S1DSS is required for ATS to be always on for identity domain cases.
+ * However, the S1DSS field is ignored if !IDR0_S1P or !IDR1_SSIDSIZE.
+ */
+ if (!s1p || !master->smmu->ssid_bits) {
+ dev_info_once(master->dev,
+ "SMMU doesn't support ATS to be always on\n");
+ goto out_prepare;
+ }
+
+ master->ats_always_on = true;
+
+ ret = arm_smmu_alloc_cd_tables(master);
+ if (ret)
+ return ret;
+
+out_prepare:
+ pci_prepare_ats(pdev, stu);
+ return 0;
+}
+
static struct iommu_device *arm_smmu_probe_device(struct device *dev)
{
int ret;
@@ -4263,14 +4321,15 @@ static struct iommu_device *arm_smmu_probe_device(struct device *dev)
smmu->features & ARM_SMMU_FEAT_STALL_FORCE)
master->stall_enabled = true;
- if (dev_is_pci(dev)) {
- unsigned int stu = __ffs(smmu->pgsize_bitmap);
-
- pci_prepare_ats(to_pci_dev(dev), stu);
- }
+ ret = arm_smmu_master_prepare_ats(master);
+ if (ret)
+ goto err_disable_pasid;
return &smmu->iommu;
+err_disable_pasid:
+ arm_smmu_disable_pasid(master);
+ arm_smmu_remove_master(master);
err_free_master:
kfree(master);
return ERR_PTR(ret);
--
2.43.0
* [PATCH v5 2/3] PCI: Allow ATS to be always on for pre-CXL devices
From: Nicolin Chen @ 2026-05-20 19:46 UTC
To: jgg, will, joro, bhelgaas
Cc: robin.murphy, praan, baolu.lu, kevin.tian, miko.lenczewski,
linux-arm-kernel, iommu, linux-kernel, linux-pci, dan.j.williams,
jonathan.cameron, vsethi, linux-cxl, nirmoyd
In-Reply-To: <cover.1779304390.git.nicolinc@nvidia.com>
Some NVIDIA GPU/NIC devices, though they don't implement CXL config space,
have many CXL-like properties. Call this kind "pre-CXL".
Like CXL.cache capable devices, these pre-CXL devices require ATS even when
their RIDs are IOMMU bypassed, i.e. ATS must stay "always on" rather than
being enabled "on demand" when a non-zero PASID gets enabled in SVA use
cases.
Introduce a pci_dev_specific_ats_required() quirk function to scan a list of
IDs for these devices. Then, include it in pci_ats_required().
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Nirmoy Das <nirmoyd@nvidia.com>
Tested-by: Nirmoy Das <nirmoyd@nvidia.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
drivers/pci/pci.h | 9 +++++++++
drivers/pci/ats.c | 3 ++-
drivers/pci/quirks.c | 42 ++++++++++++++++++++++++++++++++++++++++++
3 files changed, 53 insertions(+), 1 deletion(-)
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 4a14f88e543a2..e8ad27abb1cfe 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -1155,6 +1155,15 @@ static inline int pci_dev_specific_reset(struct pci_dev *dev, bool probe)
}
#endif
+#if defined(CONFIG_PCI_QUIRKS) && defined(CONFIG_PCI_ATS)
+bool pci_dev_specific_ats_required(struct pci_dev *dev);
+#else
+static inline bool pci_dev_specific_ats_required(struct pci_dev *dev)
+{
+ return false;
+}
+#endif
+
#if defined(CONFIG_PCI_QUIRKS) && defined(CONFIG_ARM64)
int acpi_get_rc_resources(struct device *dev, const char *hid, u16 segment,
struct resource *res);
diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
index ebdf761843867..3a04d5b04c883 100644
--- a/drivers/pci/ats.c
+++ b/drivers/pci/ats.c
@@ -247,7 +247,8 @@ bool pci_ats_required(struct pci_dev *pdev)
if (pdev->is_virtfn)
pdev = pci_physfn(pdev);
- return pci_cxl_ats_required(pdev);
+ return pci_cxl_ats_required(pdev) ||
+ pci_dev_specific_ats_required(pdev);
}
EXPORT_SYMBOL_GPL(pci_ats_required);
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index caaed1a01dc02..c0242f3e9f063 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -5715,6 +5715,48 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x1457, quirk_intel_e2000_no_ats);
DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x1459, quirk_intel_e2000_no_ats);
DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x145a, quirk_intel_e2000_no_ats);
DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x145c, quirk_intel_e2000_no_ats);
+
+static bool quirk_nvidia_gpu_ats_required(struct pci_dev *pdev)
+{
+ switch (pdev->device) {
+ case 0x2e00 ... 0x2e3f: /* GB20B */
+ return true;
+ }
+ return false;
+}
+
+static const struct pci_dev_ats_required {
+ u16 vendor;
+ u16 device;
+ bool (*ats_required)(struct pci_dev *dev);
+} pci_dev_ats_required[] = {
+ /* NVIDIA GPUs */
+ { PCI_VENDOR_ID_NVIDIA, PCI_ANY_ID, quirk_nvidia_gpu_ats_required },
+ /* NVIDIA CX10 Family NVlink-C2C */
+ { PCI_VENDOR_ID_MELLANOX, 0x2101, NULL },
+ { 0 }
+};
+
+/*
+ * Some NVIDIA devices do not implement CXL config space, but present as PCIe
+ * devices that can issue CXL-like cache operations like CXL.cache. Thus, they
+ * require ATS to obtain host physical addresses, like pci_cxl_ats_required().
+ */
+bool pci_dev_specific_ats_required(struct pci_dev *pdev)
+{
+ const struct pci_dev_ats_required *i;
+
+ for (i = pci_dev_ats_required; i->vendor; i++) {
+ if (i->vendor != pdev->vendor)
+ continue;
+ if (i->ats_required && i->ats_required(pdev))
+ return true;
+ if (!i->ats_required && i->device == pdev->device)
+ return true;
+ }
+
+ return false;
+}
#endif /* CONFIG_PCI_ATS */
/* Freescale PCIe doesn't support MSI in RC mode */
--
2.43.0
* Re: [PATCH v5 1/3] PCI: Add pci_ats_required() for CXL.cache capable devices
From: Bjorn Helgaas @ 2026-05-20 20:03 UTC
To: Nicolin Chen
Cc: jgg, will, joro, bhelgaas, robin.murphy, praan, baolu.lu,
kevin.tian, miko.lenczewski, linux-arm-kernel, iommu,
linux-kernel, linux-pci, dan.j.williams, jonathan.cameron, vsethi,
linux-cxl, nirmoyd
In-Reply-To: <9e01e6d39deff2bf751da3e1abb43f35a9169194.1779304390.git.nicolinc@nvidia.com>
On Wed, May 20, 2026 at 12:46:08PM -0700, Nicolin Chen wrote:
> Controlled by the IOMMU driver, ATS is usually enabled "on demand" when a
> given PASID on a device is attached to an I/O page table. This is working
> even when a device has no translation on its RID (i.e., the RID is IOMMU
> bypassed).
>
> However, certain PCIe devices require non-PASID ATS on their RID even when
> the RID is IOMMU bypassed. Call this "ATS always on" in IOMMU term.
>
> For example, CXL spec r4.0 notes in sec 3.2.5.13 Memory Type on CXL.cache:
> "To source requests on CXL.cache, devices need to get the Host Physical
> Address (HPA) from the Host by means of an ATS request on CXL.io."
>
> In other words, the CXL.cache capability requires ATS; otherwise, it can't
> access host physical memory.
>
> Introduce a new pci_ats_required() helper for the IOMMU driver to scan a
> PCI device and shift ATS policies between "on demand" and "always on".
>
> Add the support for CXL.cache devices first. Pre-CXL devices will be added
> in quirks.c file.
>
> Note that pci_ats_required() validates against pci_ats_supported(), so we
> ensure that untrusted devices (e.g. external ports) will not be always on.
> This maintains the existing ATS security policy regarding potential side-
> channel attacks via ATS.
>
> Cc: linux-cxl@vger.kernel.org
> Suggested-by: Vikram Sethi <vsethi@nvidia.com>
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Tested-by: Nirmoy Das <nirmoyd@nvidia.com>
> Acked-by: Nirmoy Das <nirmoyd@nvidia.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
One lingering question below that I asked before but I don't think
anybody answered.
> ---
> include/linux/pci-ats.h | 3 +++
> include/uapi/linux/pci_regs.h | 1 +
> drivers/pci/ats.c | 46 +++++++++++++++++++++++++++++++++++
> 3 files changed, 50 insertions(+)
>
> diff --git a/include/linux/pci-ats.h b/include/linux/pci-ats.h
> index 75c6c86cf09dc..f3723b6861294 100644
> --- a/include/linux/pci-ats.h
> +++ b/include/linux/pci-ats.h
> @@ -12,6 +12,7 @@ int pci_prepare_ats(struct pci_dev *dev, int ps);
> void pci_disable_ats(struct pci_dev *dev);
> int pci_ats_queue_depth(struct pci_dev *dev);
> int pci_ats_page_aligned(struct pci_dev *dev);
> +bool pci_ats_required(struct pci_dev *dev);
> #else /* CONFIG_PCI_ATS */
> static inline bool pci_ats_supported(struct pci_dev *d)
> { return false; }
> @@ -24,6 +25,8 @@ static inline int pci_ats_queue_depth(struct pci_dev *d)
> { return -ENODEV; }
> static inline int pci_ats_page_aligned(struct pci_dev *dev)
> { return 0; }
> +static inline bool pci_ats_required(struct pci_dev *dev)
> +{ return false; }
> #endif /* CONFIG_PCI_ATS */
>
> #ifdef CONFIG_PCI_PRI
> diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
> index 14f634ab9350d..6ac45be1008b8 100644
> --- a/include/uapi/linux/pci_regs.h
> +++ b/include/uapi/linux/pci_regs.h
> @@ -1349,6 +1349,7 @@
> /* CXL r4.0, 8.1.3: PCIe DVSEC for CXL Device */
> #define PCI_DVSEC_CXL_DEVICE 0
> #define PCI_DVSEC_CXL_CAP 0xA
> +#define PCI_DVSEC_CXL_CACHE_CAPABLE _BITUL(0)
> #define PCI_DVSEC_CXL_MEM_CAPABLE _BITUL(2)
> #define PCI_DVSEC_CXL_HDM_COUNT __GENMASK(5, 4)
> #define PCI_DVSEC_CXL_CTRL 0xC
> diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> index ec6c8dbdc5e9c..ebdf761843867 100644
> --- a/drivers/pci/ats.c
> +++ b/drivers/pci/ats.c
> @@ -205,6 +205,52 @@ int pci_ats_page_aligned(struct pci_dev *pdev)
> return 0;
> }
>
> +/*
> + * CXL r4.0, sec 3.2.5.13 Memory Type on CXL.cache notes: to source requests on
> + * CXL.cache, devices need to get the Host Physical Address (HPA) from the Host
> + * by means of an ATS request on CXL.io.
> + *
> + * In other words, CXL.cache devices cannot access host physical memory without
> + * ATS.
> + *
> + * Check Cache_Capable instead of Cache_Enable because CXL.cache may be enabled
> + * after the caller uses this to make its ATS decision.
> + */
> +static bool pci_cxl_ats_required(struct pci_dev *pdev)
> +{
> + int offset;
> + u16 cap;
> +
> + offset = pci_find_dvsec_capability(pdev, PCI_VENDOR_ID_CXL,
> + PCI_DVSEC_CXL_DEVICE);
> + if (!offset)
> + return false;
> +
> + if (pci_read_config_word(pdev, offset + PCI_DVSEC_CXL_CAP, &cap))
> + return false;
> +
> + return cap & PCI_DVSEC_CXL_CACHE_CAPABLE;
> +}
> +
> +/**
> + * pci_ats_required - Whether the PCI device requires ATS
> + * @pdev: the PCI device
> + *
> + * Returns true, if the PCI device requires ATS for basic functional operation.
> + */
> +bool pci_ats_required(struct pci_dev *pdev)
> +{
> + if (pci_ats_disabled() || !pci_ats_supported(pdev))
> + return false;
I still have the question about whether it's necessary to test
pci_ats_disabled() here. I think pci_ats_supported() should return
false if pci_ats_disabled() returns true.
> + /* A VF inherits its PF's requirement for ATS function */
> + if (pdev->is_virtfn)
> + pdev = pci_physfn(pdev);
> +
> + return pci_cxl_ats_required(pdev);
> +}
> +EXPORT_SYMBOL_GPL(pci_ats_required);
> +
> #ifdef CONFIG_PCI_PRI
> void pci_pri_init(struct pci_dev *pdev)
> {
> --
> 2.43.0
>
* Re: [PATCH v5 2/3] PCI: Allow ATS to be always on for pre-CXL devices
From: Bjorn Helgaas @ 2026-05-20 20:04 UTC
To: Nicolin Chen
Cc: jgg, will, joro, bhelgaas, robin.murphy, praan, baolu.lu,
kevin.tian, miko.lenczewski, linux-arm-kernel, iommu,
linux-kernel, linux-pci, dan.j.williams, jonathan.cameron, vsethi,
linux-cxl, nirmoyd
In-Reply-To: <35e75bf0abfa48f76bc87d73a772a3faf6271a9f.1779304390.git.nicolinc@nvidia.com>
On Wed, May 20, 2026 at 12:46:09PM -0700, Nicolin Chen wrote:
> Some NVIDIA GPU/NIC devices, though they don't implement CXL config space,
> have many CXL-like properties. Call this kind "pre-CXL".
>
> Similar to CXL.cache capability, these pre-CXL devices also require the ATS
> function even when their RIDs are IOMMU bypassed, i.e. keep ATS "always on"
> v.s. "on demand" when a non-zero PASID line gets enabled in SVA use cases.
>
> Introduce pci_dev_specific_ats_required() quirk function to scan a list of
> IDs for these devices. Then, include it in pci_ats_required().
>
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Reviewed-by: Nirmoy Das <nirmoyd@nvidia.com>
> Tested-by: Nirmoy Das <nirmoyd@nvidia.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
> ---
> drivers/pci/pci.h | 9 +++++++++
> drivers/pci/ats.c | 3 ++-
> drivers/pci/quirks.c | 42 ++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 53 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 4a14f88e543a2..e8ad27abb1cfe 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -1155,6 +1155,15 @@ static inline int pci_dev_specific_reset(struct pci_dev *dev, bool probe)
> }
> #endif
>
> +#if defined(CONFIG_PCI_QUIRKS) && defined(CONFIG_PCI_ATS)
> +bool pci_dev_specific_ats_required(struct pci_dev *dev);
> +#else
> +static inline bool pci_dev_specific_ats_required(struct pci_dev *dev)
> +{
> + return false;
> +}
> +#endif
> +
> #if defined(CONFIG_PCI_QUIRKS) && defined(CONFIG_ARM64)
> int acpi_get_rc_resources(struct device *dev, const char *hid, u16 segment,
> struct resource *res);
> diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> index ebdf761843867..3a04d5b04c883 100644
> --- a/drivers/pci/ats.c
> +++ b/drivers/pci/ats.c
> @@ -247,7 +247,8 @@ bool pci_ats_required(struct pci_dev *pdev)
> if (pdev->is_virtfn)
> pdev = pci_physfn(pdev);
>
> - return pci_cxl_ats_required(pdev);
> + return pci_cxl_ats_required(pdev) ||
> + pci_dev_specific_ats_required(pdev);
> }
> EXPORT_SYMBOL_GPL(pci_ats_required);
>
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index caaed1a01dc02..c0242f3e9f063 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -5715,6 +5715,48 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x1457, quirk_intel_e2000_no_ats);
> DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x1459, quirk_intel_e2000_no_ats);
> DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x145a, quirk_intel_e2000_no_ats);
> DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x145c, quirk_intel_e2000_no_ats);
> +
> +static bool quirk_nvidia_gpu_ats_required(struct pci_dev *pdev)
> +{
> + switch (pdev->device) {
> + case 0x2e00 ... 0x2e3f: /* GB20B */
> + return true;
> + }
> + return false;
> +}
> +
> +static const struct pci_dev_ats_required {
> + u16 vendor;
> + u16 device;
> + bool (*ats_required)(struct pci_dev *dev);
> +} pci_dev_ats_required[] = {
> + /* NVIDIA GPUs */
> + { PCI_VENDOR_ID_NVIDIA, PCI_ANY_ID, quirk_nvidia_gpu_ats_required },
> + /* NVIDIA CX10 Family NVlink-C2C */
> + { PCI_VENDOR_ID_MELLANOX, 0x2101, NULL },
> + { 0 }
> +};
> +
> +/*
> + * Some NVIDIA devices do not implement CXL config space, but present as PCIe
> + * devices that can issue CXL-like cache operations like CXL.cache. Thus, they
> + * require ATS to obtain host physical addresses, like pci_cxl_ats_required().
> + */
> +bool pci_dev_specific_ats_required(struct pci_dev *pdev)
> +{
> + const struct pci_dev_ats_required *i;
> +
> + for (i = pci_dev_ats_required; i->vendor; i++) {
> + if (i->vendor != pdev->vendor)
> + continue;
> + if (i->ats_required && i->ats_required(pdev))
> + return true;
> + if (!i->ats_required && i->device == pdev->device)
> + return true;
> + }
> +
> + return false;
> +}
> #endif /* CONFIG_PCI_ATS */
>
> /* Freescale PCIe doesn't support MSI in RC mode */
> --
> 2.43.0
>
* Re: [PATCH v5 1/3] PCI: Add pci_ats_required() for CXL.cache capable devices
From: Nicolin Chen @ 2026-05-20 20:20 UTC
To: Bjorn Helgaas
Cc: jgg, will, joro, bhelgaas, robin.murphy, praan, baolu.lu,
kevin.tian, miko.lenczewski, linux-arm-kernel, iommu,
linux-kernel, linux-pci, dan.j.williams, jonathan.cameron, vsethi,
linux-cxl, nirmoyd
In-Reply-To: <20260520200327.GA88349@bhelgaas>
On Wed, May 20, 2026 at 03:03:27PM -0500, Bjorn Helgaas wrote:
> On Wed, May 20, 2026 at 12:46:08PM -0700, Nicolin Chen wrote:
> > +bool pci_ats_required(struct pci_dev *pdev)
> > +{
> > + if (pci_ats_disabled() || !pci_ats_supported(pdev))
> > + return false;
>
> I still have the question about whether it's necessary to test
> pci_ats_disabled() here. I think pci_ats_supported() should return
> false if pci_ats_disabled() returns true.
Sorry, it fell through the cracks.
And you are right: pci_ats_disabled() seems redundant.
I can respin a v6.
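A toy model of why the extra test looks redundant, under the assumption that
the ATS capability offset is only recorded at init time when ATS is not
globally disabled, and that the "supported" check keys off that recorded
offset. All names below are hypothetical stand-ins, not the kernel functions.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant struct pci_dev fields. */
struct toy_pdev {
	unsigned int ats_cap;	/* 0 means no ATS capability was recorded */
	bool untrusted;
};

/* Assumption: init skips recording the capability when ATS is disabled. */
static void toy_ats_init(struct toy_pdev *pdev, bool ats_disabled,
			 unsigned int cap_pos)
{
	pdev->ats_cap = ats_disabled ? 0 : cap_pos;
}

static bool toy_ats_supported(const struct toy_pdev *pdev)
{
	return pdev->ats_cap != 0 && !pdev->untrusted;
}

/* Gate as written in v5: tests both conditions. */
static bool toy_gate_v5(bool ats_disabled, const struct toy_pdev *pdev)
{
	return !(ats_disabled || !toy_ats_supported(pdev));
}

/* Gate with the "disabled" test dropped. */
static bool toy_gate_simplified(const struct toy_pdev *pdev)
{
	return toy_ats_supported(pdev);
}
```

Under that assumption both gates agree for every device state, because a
globally disabled ATS already forces toy_ats_supported() to return false.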
Thanks
Nicolin
* Re: [PATCH v5 1/5] PCI: host-common: Add helper to determine host bridge D3cold eligibility
From: Bjorn Helgaas @ 2026-05-20 20:27 UTC
To: Krishna Chaitanya Chundru
Cc: Jingoo Han, Manivannan Sadhasivam, Lorenzo Pieralisi,
Krzysztof Wilczyński, Rob Herring, Bjorn Helgaas,
Will Deacon, linux-pci, linux-kernel, linux-arm-msm,
linux-arm-kernel, jonathanh, bjorn.andersson
In-Reply-To: <20260519223901.GA20376@bhelgaas>
On Tue, May 19, 2026 at 05:39:01PM -0500, Bjorn Helgaas wrote:
> On Wed, Apr 29, 2026 at 12:12:23PM +0530, Krishna Chaitanya Chundru wrote:
> > Add a common helper, pci_host_common_d3cold_possible(), to determine
> > whether PCIe devices under host bridge can safely transition to D3cold.
> ...
> > +static int __pci_host_common_d3cold_possible(struct pci_dev *pdev, void *userdata)
> > +{
> > + u32 *flags = userdata;
> > + int type;
> > +
> > + /* Ignore conventional PCI devices */
> > + if (!pci_is_pcie(pdev))
> > + return 0;
> > +
> > + type = pci_pcie_type(pdev);
> > + if (type != PCI_EXP_TYPE_ENDPOINT &&
> > + type != PCI_EXP_TYPE_LEG_END &&
> > + type != PCI_EXP_TYPE_RC_END)
> > + return 0;
>
> From https://sashiko.dev/#/patchset/20260429-d3cold-v5-0-89e9735b9df6%40oss.qualcomm.com:
>
> If the topology contains an active conventional PCI device or an
> intermediate PCIe switch in PCI_D0, returning 0 here allows
> pci_walk_bus() to continue without clearing the
> PCI_HOST_D3COLD_ALLOWED flag.
>
> Does this create a situation where the host bridge might
> aggressively power off the link, dropping power to these active
> components?
>
> I guess this is intentional, since you have comment about ignoring
> conventional PCI devices. But this does seem like a potential
> problem. Why should we ignore switches here? And I think it's still
> fairly common to have a PCIe-to-PCI bridge leading to a conventional
> PCI device, and I don't know why we should ignore them.
>
> The commit log consistently refers to "PCIe" devices and endpoints, so
> maybe there's some reason that I'm missing.
>
> There are other sashiko comments on this series that I think should
> also be looked at.
This series is all in pci/next, so you and Mani can decide on whether
any sashiko comments need to be addressed.
Even if there's no code change, I think it'd be nice to have a brief
comment here about why conventional PCI and switches are ignored.
* [PATCH 1/2] gfp_types: Introduce a new GFP_ATOMIC_RT gfp flag
From: Waiman Long @ 2026-05-20 20:46 UTC
To: Marc Zyngier, Thomas Gleixner, Sebastian Andrzej Siewior,
Clark Williams, Steven Rostedt, Andrew Morton, David Hildenbrand,
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko
Cc: linux-arm-kernel, linux-kernel, linux-mm, linux-rt-devel,
Waiman Long
The GFP_ATOMIC flag is for atomic contexts where the caller cannot sleep
and needs the allocation to succeed. However, under PREEMPT_RT it does
not support contexts where preemption or interrupts are disabled, such
as under raw_spin_lock_irqsave() or plain preempt_disable().
With the advent of the ALLOC_TRYLOCK allocation flag in the v7.1
kernel, it is possible to allocate memory in such contexts by using
spin_trylock() to acquire the spinlocks in the memory allocation path.
This increases the chance that an allocation fails when there are
concurrent memory allocation requests, so its users must be able to
handle such failures gracefully.
The ALLOC_TRYLOCK flag will only be enabled if none of the
___GFP_DIRECT_RECLAIM and ___GFP_KSWAPD_RECLAIM flags are set.
Introduce a new GFP_ATOMIC_RT gfp flag for those PREEMPT_RT atomic
contexts. This new flag falls back to GFP_ATOMIC in non-PREEMPT_RT
kernels. In PREEMPT_RT kernels, GFP_ATOMIC can continue to be used in
contexts where preemption and interrupts are not disabled, such as
under spin_lock_irqsave().
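The relationship between the flags can be illustrated with a small standalone
model. The bit values and TOY_ names below are made up for illustration; the
composition of GFP_ATOMIC as __GFP_HIGH plus __GFP_KSWAPD_RECLAIM reflects
recent kernels and should be treated as an assumption here.

```c
#include <stdbool.h>

/* Illustrative bit values only; the kernel's real values differ. */
#define TOY_GFP_HIGH			0x1u
#define TOY_GFP_DIRECT_RECLAIM		0x2u
#define TOY_GFP_KSWAPD_RECLAIM		0x4u

/* Assumed composition of GFP_ATOMIC. */
#define TOY_GFP_ATOMIC		(TOY_GFP_HIGH | TOY_GFP_KSWAPD_RECLAIM)

/* GFP_ATOMIC_RT as defined by this patch, per build configuration. */
#define TOY_GFP_ATOMIC_RT_ON_RT		TOY_GFP_HIGH	/* CONFIG_PREEMPT_RT */
#define TOY_GFP_ATOMIC_RT_NON_RT	TOY_GFP_ATOMIC	/* otherwise */

/* The trylock path is only taken when neither reclaim flag is set. */
static bool toy_trylock_eligible(unsigned int gfp)
{
	return !(gfp & (TOY_GFP_DIRECT_RECLAIM | TOY_GFP_KSWAPD_RECLAIM));
}
```

In this model only the PREEMPT_RT definition of GFP_ATOMIC_RT qualifies for
the trylock path, since it carries no reclaim bits.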
Signed-off-by: Waiman Long <longman@redhat.com>
---
include/linux/gfp_types.h | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/include/linux/gfp_types.h b/include/linux/gfp_types.h
index cd4972a7c97c..ac30882b6cd4 100644
--- a/include/linux/gfp_types.h
+++ b/include/linux/gfp_types.h
@@ -316,6 +316,13 @@ enum {
* preempt_disable() - see "Memory allocation" in
* Documentation/core-api/real-time/differences.rst for more info.
*
+ * %GFP_ATOMIC_RT is similar to %GFP_ATOMIC with the addition that it can also
+ * be used in context where preemption and/or interrupt is disabled under
+ * PREEMPT_RT, but not in NMI or hardirq contexts. The allocation is more
+ * likely to fail under PREEMPT_RT due to the spin_trylock() nature of lock
+ * acquisition. So the caller must be ready to handle memory allocation failure
+ * gracefully.
+ *
* %GFP_KERNEL is typical for kernel-internal allocations. The caller requires
* %ZONE_NORMAL or a lower zone for direct access but can direct reclaim.
*
@@ -388,4 +395,10 @@ enum {
__GFP_NOMEMALLOC | __GFP_NOWARN) & ~__GFP_RECLAIM)
#define GFP_TRANSHUGE (GFP_TRANSHUGE_LIGHT | __GFP_DIRECT_RECLAIM)
+#ifdef CONFIG_PREEMPT_RT
+# define GFP_ATOMIC_RT __GFP_HIGH
+#else
+# define GFP_ATOMIC_RT GFP_ATOMIC
+#endif
+
#endif /* __LINUX_GFP_TYPES_H */
--
2.54.0
* [PATCH 2/2] irqchip/gic-v3-its: Use GFP_ATOMIC_RT gfp flag in allocate_vpe_l1_table()
From: Waiman Long @ 2026-05-20 20:46 UTC
To: Marc Zyngier, Thomas Gleixner, Sebastian Andrzej Siewior,
Clark Williams, Steven Rostedt, Andrew Morton, David Hildenbrand,
Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko
Cc: linux-arm-kernel, linux-kernel, linux-mm, linux-rt-devel,
Waiman Long
In-Reply-To: <20260520204628.933654-1-longman@redhat.com>
When running a PREEMPT_RT debug kernel on a 2-socket Grace arm64 system,
the following bug report was produced at bootup time.
BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 0, name: swapper/72
preempt_count: 1, expected: 0
RCU nest depth: 1, expected: 1
:
CPU: 72 UID: 0 PID: 0 Comm: swapper/72 Tainted: G W 6.19.0-rc4-test+ #4 PREEMPT_{RT,(full)}
Tainted: [W]=WARN
Call trace:
:
rt_spin_lock+0xe4/0x408
rmqueue_bulk+0x48/0x1de8
__rmqueue_pcplist+0x410/0x650
rmqueue.constprop.0+0x6a8/0x2b50
get_page_from_freelist+0x3c0/0xe68
__alloc_frozen_pages_noprof+0x1dc/0x348
alloc_pages_mpol+0xe4/0x2f8
alloc_frozen_pages_noprof+0x124/0x190
allocate_slab+0x2f0/0x438
new_slab+0x4c/0x80
___slab_alloc+0x410/0x798
__slab_alloc.constprop.0+0x88/0x1e0
__kmalloc_cache_noprof+0x2dc/0x4b0
allocate_vpe_l1_table+0x114/0x788
its_cpu_init_lpis+0x344/0x790
its_cpu_init+0x60/0x220
gic_starting_cpu+0x64/0xe8
cpuhp_invoke_callback+0x438/0x6d8
__cpuhp_invoke_callback_range+0xd8/0x1f8
notify_cpu_starting+0x11c/0x178
secondary_start_kernel+0xc8/0x188
__secondary_switched+0xc0/0xc8
This happens because allocate_vpe_l1_table() calls kzalloc() to allocate
a cpumask_t when the first CPU of the second node of the 72-CPU Grace
system is brought up. The call comes from the CPUHP_AP_IRQ_GIC_STARTING
state, inside the starting section of the CPU hotplug bringup pipeline,
where interrupts are disabled. This is an atomic context where sleeping
is not allowed, and acquiring a sleeping rt_spin_lock within kzalloc()
may lead to a system hang if the lock is contended.
A possible workaround is to use the new GFP_ATOMIC_RT gfp flag, with
which only spin_trylock() is used to acquire spinlocks in the memory
allocation path, so the allocation cannot sleep. As this memory
allocation is only needed for the first core of a new socket in early
boot, the chance of a memory allocation request collision is low. In
case it happens, direct injection of virtual interrupts from the physical
Interrupt Translation Service (ITS) into a guest Virtual Machine (VM)
will be disabled.
A longer term solution is to defer the allocation to a later stage of the
hotplug pipeline where interrupts aren't disabled.
With this change applied, booting a debug kernel on the same 2-socket
Grace system no longer produces the bug report, and no direct injection
disabled warning is printed.
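The fallback described above can be sketched in userspace as follows. This is
only a model of the control flow: the struct, the allocator callbacks and the
function names are hypothetical, and the 64-byte size is arbitrary.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the redistributor state touched on failure. */
struct toy_rdists {
	bool has_rvpeid;
	bool has_vlpis;
	void *vpe_table_mask;
};

typedef void *(*toy_alloc_fn)(size_t size);

/* Models a trylock-based allocation that lost the race and failed. */
static void *toy_alloc_fail(size_t size)
{
	(void)size;
	return NULL;
}

/* Models a successful zeroed allocation. */
static void *toy_alloc_ok(size_t size)
{
	return calloc(1, size);
}

static int toy_allocate_vpe_table(struct toy_rdists *r, toy_alloc_fn alloc)
{
	r->vpe_table_mask = alloc(64);
	return r->vpe_table_mask ? 0 : -1;
}

/* On failure, direct injection is turned off instead of sleeping or crashing. */
static void toy_cpu_init(struct toy_rdists *r, toy_alloc_fn alloc)
{
	if (toy_allocate_vpe_table(r, alloc)) {
		r->has_rvpeid = false;
		r->has_vlpis = false;
	}
}
```

The design point is that a failed allocation degrades a feature rather than
blocking in a context where sleeping is not allowed.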
Signed-off-by: Waiman Long <longman@redhat.com>
---
drivers/irqchip/irq-gic-v3-its.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 291d7668cc8d..d78057fb40df 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -2927,7 +2927,7 @@ static int allocate_vpe_l1_table(void)
if (val & GICR_VPROPBASER_4_1_VALID)
goto out;
- gic_data_rdist()->vpe_table_mask = kzalloc_obj(cpumask_t, GFP_ATOMIC);
+ gic_data_rdist()->vpe_table_mask = kzalloc_obj(cpumask_t, GFP_ATOMIC_RT);
if (!gic_data_rdist()->vpe_table_mask)
return -ENOMEM;
@@ -3271,6 +3271,8 @@ static void its_cpu_init_lpis(void)
*/
gic_rdists->has_rvpeid = false;
gic_rdists->has_vlpis = false;
+ pr_warn("GICv3: CPU%d: direct injection of virtual interrupt disabled\n",
+ smp_processor_id());
}
/* Make sure the GIC has seen the above */
--
2.54.0
* [PATCH v4 1/5] optee: ffa: Add NULL check in optee_ffa_lend_protmem
From: Mostafa Saleh @ 2026-05-20 20:49 UTC
To: op-tee, linux-kernel, kvmarm, linux-arm-kernel
Cc: maz, oupton, joey.gouly, suzuki.poulose, catalin.marinas,
jens.wiklander, sumit.garg, sebastianene, vdonnefort,
sudeep.holla, Mostafa Saleh
In-Reply-To: <20260520204948.2440882-1-smostafa@google.com>
Sashiko (locally) reports a possible null dereference under memory
pressure due to the lack of validation of the allocated pointer.
Fix that by adding the missing check.
Signed-off-by: Mostafa Saleh <smostafa@google.com>
---
drivers/tee/optee/ffa_abi.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/tee/optee/ffa_abi.c b/drivers/tee/optee/ffa_abi.c
index b4372fa268d0..633715b98625 100644
--- a/drivers/tee/optee/ffa_abi.c
+++ b/drivers/tee/optee/ffa_abi.c
@@ -698,6 +698,9 @@ static int optee_ffa_lend_protmem(struct optee *optee, struct tee_shm *protmem,
int rc;
mem_attr = kzalloc_objs(*mem_attr, ma_count);
+ if (!mem_attr)
+ return -ENOMEM;
+
for (n = 0; n < ma_count; n++) {
mem_attr[n].receiver = mem_attrs[n] & U16_MAX;
mem_attr[n].attrs = mem_attrs[n] >> 16;
--
2.54.0.669.g59709faab0-goog
^ permalink raw reply related
* [PATCH v4 3/5] firmware: arm_ffa: Fix Endpoint Memory Access Descriptor offset calculation
From: Mostafa Saleh @ 2026-05-20 20:49 UTC (permalink / raw)
To: op-tee, linux-kernel, kvmarm, linux-arm-kernel
Cc: maz, oupton, joey.gouly, suzuki.poulose, catalin.marinas,
jens.wiklander, sumit.garg, sebastianene, vdonnefort,
sudeep.holla, Mostafa Saleh
In-Reply-To: <20260520204948.2440882-1-smostafa@google.com>
From: Sebastian Ene <sebastianene@google.com>
Use the descriptor's `ep_mem_offset` to calculate the start of the endpoint
memory access array and to comply with the FF-A spec instead of defaulting
to `sizeof(struct ffa_mem_region)`.
This requires moving `ffa_mem_region_additional_setup()` earlier in the setup
flow.
Also, add sanity checks to ensure the calculated descriptor offsets do not
exceed `max_fragsize`.
Signed-off-by: Sebastian Ene <sebastianene@google.com>
Signed-off-by: Mostafa Saleh <smostafa@google.com>
---
drivers/firmware/arm_ffa/driver.c | 16 +++++++++++-----
include/linux/arm_ffa.h | 2 +-
2 files changed, 12 insertions(+), 6 deletions(-)
diff --git a/drivers/firmware/arm_ffa/driver.c b/drivers/firmware/arm_ffa/driver.c
index b700b2e93e72..8573a7a6556e 100644
--- a/drivers/firmware/arm_ffa/driver.c
+++ b/drivers/firmware/arm_ffa/driver.c
@@ -685,19 +685,26 @@ ffa_setup_and_transmit(u32 func_id, void *buffer, u32 max_fragsize,
struct ffa_composite_mem_region *composite;
struct ffa_mem_region_addr_range *constituents;
struct ffa_mem_region_attributes *ep_mem_access;
- u32 idx, frag_len, length, buf_sz = 0, num_entries = sg_nents(args->sg);
+ u32 idx, frag_len, length, buf_sz = 0, num_entries = sg_nents(args->sg), ep_offset;
+ u32 emad_size = ffa_emad_size_get(drv_info->version);
mem_region->tag = args->tag;
mem_region->flags = args->flags;
mem_region->sender_id = drv_info->vm_id;
mem_region->attributes = ffa_memory_attributes_get(func_id);
+
+ ffa_mem_region_additional_setup(drv_info->version, mem_region);
composite_offset = ffa_mem_desc_offset(buffer, args->nattrs,
drv_info->version);
+ if (composite_offset + sizeof(*composite) > max_fragsize)
+ return -ENXIO;
for (idx = 0; idx < args->nattrs; idx++) {
- ep_mem_access = buffer +
- ffa_mem_desc_offset(buffer, idx, drv_info->version);
- memset(ep_mem_access, 0, ffa_emad_size_get(drv_info->version));
+ ep_offset = ffa_mem_desc_offset(buffer, idx, drv_info->version);
+ if (ep_offset + emad_size > max_fragsize)
+ return -ENXIO;
+ ep_mem_access = buffer + ep_offset;
+ memset(ep_mem_access, 0, emad_size);
ep_mem_access->receiver = args->attrs[idx].receiver;
ep_mem_access->attrs = args->attrs[idx].attrs;
ep_mem_access->composite_off = composite_offset;
@@ -707,7 +714,6 @@ ffa_setup_and_transmit(u32 func_id, void *buffer, u32 max_fragsize,
}
mem_region->handle = 0;
mem_region->ep_count = args->nattrs;
- ffa_mem_region_additional_setup(drv_info->version, mem_region);
composite = buffer + composite_offset;
composite->total_pg_cnt = ffa_get_num_pages_sg(args->sg);
diff --git a/include/linux/arm_ffa.h b/include/linux/arm_ffa.h
index 81e603839c4a..62d67dae8b70 100644
--- a/include/linux/arm_ffa.h
+++ b/include/linux/arm_ffa.h
@@ -445,7 +445,7 @@ ffa_mem_desc_offset(struct ffa_mem_region *buf, int count, u32 ffa_version)
if (!FFA_MEM_REGION_HAS_EP_MEM_OFFSET(ffa_version))
offset += offsetof(struct ffa_mem_region, ep_mem_offset);
else
- offset += sizeof(struct ffa_mem_region);
+ offset += buf->ep_mem_offset;
return offset;
}
--
2.54.0.669.g59709faab0-goog
^ permalink raw reply related
* [PATCH v4 0/5] arm_ffa, KVM: Fix FF-A emad offset calculations
From: Mostafa Saleh @ 2026-05-20 20:49 UTC (permalink / raw)
To: op-tee, linux-kernel, kvmarm, linux-arm-kernel
Cc: maz, oupton, joey.gouly, suzuki.poulose, catalin.marinas,
jens.wiklander, sumit.garg, sebastianene, vdonnefort,
sudeep.holla, Mostafa Saleh
Hi all,
This series fixes the Endpoint Memory Access Descriptor (EMAD) offset
calculations and adds the necessary bounds checks for both the core
FF-A driver and the pKVM hypervisor.
Prior to FF-A version 1.1, the memory region header didn't specify an
explicit offset for the EMADs, leading to the assumption that they
immediately follow the header.
However, from v1.1 onwards, the specification dictates using the
`ep_mem_offset` field to determine the start of the memory access
array.
The patches in this series address this by:
1. Updating the core `arm_ffa` firmware driver to correctly calculate the descriptor
offset using `ep_mem_offset` rather than defaulting to `sizeof(struct ffa_mem_region)`.
It also introduces bounds checking against `max_fragsize`.
2. Enhancing the pKVM hypervisor validation logic to no longer enforce that
the descriptor immediately follows the header, aligning it with the driver behavior
and the FF-A specification, while also ensuring the offset falls within the mailbox
buffer bounds.
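
The offset rule described in the two points above can be sketched in plain C as follows. This is only an illustration under stated assumptions: `struct mem_region_hdr`, the `VER()` macro and `emad_offset()` are hypothetical simplifications written for this example, not the kernel's `struct ffa_mem_region` or `ffa_mem_desc_offset()`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified header; not the kernel's struct ffa_mem_region. */
struct mem_region_hdr {
	uint16_t sender_id;
	uint16_t attributes;
	uint32_t flags;
	uint64_t handle;
	uint64_t tag;
	uint32_t ep_mem_size;
	uint32_t ep_count;
	uint32_t ep_mem_offset;	/* v1.1+: offset of the first EMAD */
	uint32_t reserved[3];
};

/* Hypothetical version encoding used only by this sketch. */
#define VER(major, minor) (((uint32_t)(major) << 16) | (minor))

/*
 * Compute the offset of EMAD number idx and check that the whole entry
 * fits inside a buffer of buf_size bytes.
 * Returns 0 and stores the offset in *out, or -1 if it does not fit.
 */
static int emad_offset(const struct mem_region_hdr *hdr, uint32_t version,
		       uint32_t idx, uint32_t emad_size, uint32_t buf_size,
		       uint32_t *out)
{
	/* 64-bit arithmetic so that a huge offset cannot wrap around. */
	uint64_t off = (uint64_t)idx * emad_size;

	if (version < VER(1, 1))
		/* Before v1.1: entries are assumed to follow the short header. */
		off += offsetof(struct mem_region_hdr, ep_mem_offset);
	else
		/* From v1.1 on: trust the offset recorded in the header. */
		off += hdr->ep_mem_offset;

	if (off + emad_size > buf_size)
		return -1;

	*out = (uint32_t)off;
	return 0;
}
```

The point of the sketch is that the offset is taken from the header for v1.1 and later, and that every computed offset is checked against the buffer size before it is used.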
While addressing these bugs, Sashiko uncovered other issues that were
fixed in the same series.
Changelog
#########
v3 -> v4:
- Address review comments and fix Sashiko bugs
v2 -> v3:
- Fixed typo in nvhe/ffa.c (missing sizeof)
v1 -> v2:
- For pKVM, removed the strict placement enforcement for `ep_mem_offset` as it is not
compliant with the spec, and avoids making assumptions about the driver's memory
layout.
Link to:
########
v3: https://lore.kernel.org/all/20260512124442.1899107-1-sebastianene@google.com/
v2: https://lore.kernel.org/all/20260430160241.1934777-1-sebastianene@google.com/
v1: https://lore.kernel.org/all/ae9KN9nkOgDYJcGP@google.com/T/#t
Mostafa Saleh (3):
optee: ffa: Add NULL check in optee_ffa_lend_protmem
firmware: arm_ffa: Fix out-of-bound writes in ffa_setup_and_transmit()
KVM: arm64: Fix bounds checking in do_ffa_mem_reclaim()
Sebastian Ene (2):
firmware: arm_ffa: Fix Endpoint Memory Access Descriptor offset
calculation
KVM: arm64: Validate the offset to the mem access descriptor
arch/arm64/kvm/hyp/nvhe/ffa.c | 30 +++++++++++++++++++++++-------
drivers/firmware/arm_ffa/driver.c | 21 +++++++++++++--------
drivers/tee/optee/ffa_abi.c | 3 +++
include/linux/arm_ffa.h | 2 +-
4 files changed, 40 insertions(+), 16 deletions(-)
--
2.54.0.669.g59709faab0-goog
^ permalink raw reply
* [PATCH v4 2/5] firmware: arm_ffa: Fix out-of-bound writes in ffa_setup_and_transmit()
From: Mostafa Saleh @ 2026-05-20 20:49 UTC (permalink / raw)
To: op-tee, linux-kernel, kvmarm, linux-arm-kernel
Cc: maz, oupton, joey.gouly, suzuki.poulose, catalin.marinas,
jens.wiklander, sumit.garg, sebastianene, vdonnefort,
sudeep.holla, Mostafa Saleh
In-Reply-To: <20260520204948.2440882-1-smostafa@google.com>
Sashiko (locally) reports multiple out-of-bound issues in
ffa_setup_and_transmit:
1) Writing ep_mem_access->reserved can write out of bounds for FFA
versions < 1.2 as ffa_emad_size_get() returns 16 bytes in that case
while reserved has an offset of 24.
Instead of zeroing fields, memset the struct to zero first based on
the FFA version.
2) Make sure there is enough size to write constituents.
While at it, convert the only sizeof() in the driver that uses a
type instead of a variable.
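
As a rough illustration of issue 1), the following user-space sketch zeroes an entry by its on-wire size instead of writing individual fields. `struct emad`, `emad_size()` and `emad_init()` are hypothetical names for this example; the layout is only modeled on the sizes quoted above (16 bytes for older versions, `reserved` at offset 24).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified descriptor modeled on the sizes quoted in the commit message. */
struct emad {
	uint16_t receiver;
	uint8_t attrs;
	uint8_t flag;
	uint32_t composite_off;
	uint8_t impdef_val[16];
	uint64_t reserved;	/* offset 24: past the end of a 16-byte entry */
};

/* Hypothetical helper: on-wire size of one entry, 16 or 32 bytes. */
static size_t emad_size(int has_impdef)
{
	if (has_impdef)
		return sizeof(struct emad);
	return sizeof(struct emad) - sizeof(((struct emad *)0)->impdef_val);
}

/* Initialise one entry without touching bytes past its on-wire size. */
static void emad_init(struct emad *e, size_t size, uint16_t receiver,
		      uint8_t attrs, uint32_t composite_off)
{
	/* Zero exactly the bytes that belong to this entry. */
	memset(e, 0, size);

	/* These fields all sit inside the first 8 bytes of any layout. */
	e->receiver = receiver;
	e->attrs = attrs;
	e->composite_off = composite_off;
}
```

With a 16-byte entry, a store to `e->reserved` would land 8 bytes past the end of the entry, which is the kind of write the patch removes.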
Signed-off-by: Mostafa Saleh <smostafa@google.com>
---
drivers/firmware/arm_ffa/driver.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/drivers/firmware/arm_ffa/driver.c b/drivers/firmware/arm_ffa/driver.c
index eb2782848283..b700b2e93e72 100644
--- a/drivers/firmware/arm_ffa/driver.c
+++ b/drivers/firmware/arm_ffa/driver.c
@@ -697,11 +697,10 @@ ffa_setup_and_transmit(u32 func_id, void *buffer, u32 max_fragsize,
for (idx = 0; idx < args->nattrs; idx++) {
ep_mem_access = buffer +
ffa_mem_desc_offset(buffer, idx, drv_info->version);
+ memset(ep_mem_access, 0, ffa_emad_size_get(drv_info->version));
ep_mem_access->receiver = args->attrs[idx].receiver;
ep_mem_access->attrs = args->attrs[idx].attrs;
ep_mem_access->composite_off = composite_offset;
- ep_mem_access->flag = 0;
- ep_mem_access->reserved = 0;
ffa_emad_impdef_value_init(drv_info->version,
ep_mem_access->impdef_val,
args->attrs[idx].impdef_val);
@@ -741,7 +740,7 @@ ffa_setup_and_transmit(u32 func_id, void *buffer, u32 max_fragsize,
constituents = buffer;
}
- if ((void *)constituents - buffer > max_fragsize) {
+ if ((void *)constituents + sizeof(*constituents) - buffer > max_fragsize) {
pr_err("Memory Region Fragment > Tx Buffer size\n");
return -EFAULT;
}
@@ -750,7 +749,7 @@ ffa_setup_and_transmit(u32 func_id, void *buffer, u32 max_fragsize,
constituents->pg_cnt = args->sg->length / FFA_PAGE_SIZE;
constituents->reserved = 0;
constituents++;
- frag_len += sizeof(struct ffa_mem_region_addr_range);
+ frag_len += sizeof(*constituents);
} while ((args->sg = sg_next(args->sg)));
return ffa_transmit_fragment(func_id, addr, buf_sz, frag_len,
--
2.54.0.669.g59709faab0-goog
^ permalink raw reply related
* [PATCH v4 5/5] KVM: arm64: Validate the offset to the mem access descriptor
From: Mostafa Saleh @ 2026-05-20 20:49 UTC (permalink / raw)
To: op-tee, linux-kernel, kvmarm, linux-arm-kernel
Cc: maz, oupton, joey.gouly, suzuki.poulose, catalin.marinas,
jens.wiklander, sumit.garg, sebastianene, vdonnefort,
sudeep.holla, Mostafa Saleh
In-Reply-To: <20260520204948.2440882-1-smostafa@google.com>
From: Sebastian Ene <sebastianene@google.com>
Prevent the pKVM hypervisor from making assumptions that the
endpoint memory access descriptor (EMAD) comes right after the
FF-A memory region header.
Prior to FF-A version 1.1 the header of the memory region
didn't contain an offset to the endpoint memory access descriptor.
The layout of a memory transaction looks like this from 1.1 onward:
Type | Field name | Offset
[ Header | ffa_mem_region | 0
EMAD 1 | ffa_mem_region_attributes | ffa_mem_region.ep_mem_offset
]
Verify that the offset to the first endpoint memory access descriptor
is within the mailbox buffer bounds.
[@Mostafa, Add missing call to ffa_rx_release() and use fraglen
as the max buffer size as it is the only initialised part]
Signed-off-by: Sebastian Ene <sebastianene@google.com>
Signed-off-by: Mostafa Saleh <smostafa@google.com>
---
arch/arm64/kvm/hyp/nvhe/ffa.c | 23 +++++++++++++++++------
1 file changed, 17 insertions(+), 6 deletions(-)
diff --git a/arch/arm64/kvm/hyp/nvhe/ffa.c b/arch/arm64/kvm/hyp/nvhe/ffa.c
index e6aa2bfa63b1..38f35887e846 100644
--- a/arch/arm64/kvm/hyp/nvhe/ffa.c
+++ b/arch/arm64/kvm/hyp/nvhe/ffa.c
@@ -479,7 +479,7 @@ static void __do_ffa_mem_xfer(const u64 func_id,
struct ffa_mem_region_attributes *ep_mem_access;
struct ffa_composite_mem_region *reg;
struct ffa_mem_region *buf;
- u32 offset, nr_ranges, checked_offset;
+ u32 offset, nr_ranges, checked_offset, em_mem_access_off;
int ret = 0;
if (addr_mbz || npages_mbz || fraglen > len ||
@@ -508,8 +508,13 @@ static void __do_ffa_mem_xfer(const u64 func_id,
buf = hyp_buffers.tx;
memcpy(buf, host_buffers.tx, fraglen);
- ep_mem_access = (void *)buf +
- ffa_mem_desc_offset(buf, 0, hyp_ffa_version);
+ em_mem_access_off = ffa_mem_desc_offset(buf, 0, hyp_ffa_version);
+ if (em_mem_access_off + sizeof(struct ffa_mem_region_attributes) > fraglen) {
+ ret = FFA_RET_INVALID_PARAMETERS;
+ goto out_unlock;
+ }
+
+ ep_mem_access = (void *)buf + em_mem_access_off;
offset = ep_mem_access->composite_off;
if (!offset || buf->ep_count != 1 || buf->sender_id != HOST_FFA_ID) {
ret = FFA_RET_INVALID_PARAMETERS;
@@ -576,7 +581,7 @@ static void do_ffa_mem_reclaim(struct arm_smccc_1_2_regs *res,
DECLARE_REG(u32, flags, ctxt, 3);
struct ffa_mem_region_attributes *ep_mem_access;
struct ffa_composite_mem_region *reg;
- u32 offset, len, fraglen, fragoff;
+ u32 offset, len, fraglen, fragoff, em_mem_access_off;
struct ffa_mem_region *buf;
int ret = 0;
u64 handle;
@@ -599,8 +604,14 @@ static void do_ffa_mem_reclaim(struct arm_smccc_1_2_regs *res,
len = res->a1;
fraglen = res->a2;
- ep_mem_access = (void *)buf +
- ffa_mem_desc_offset(buf, 0, hyp_ffa_version);
+ em_mem_access_off = ffa_mem_desc_offset(buf, 0, hyp_ffa_version);
+ if (em_mem_access_off + sizeof(struct ffa_mem_region_attributes) > fraglen) {
+ ret = FFA_RET_INVALID_PARAMETERS;
+ ffa_rx_release(res);
+ goto out_unlock;
+ }
+
+ ep_mem_access = (void *)buf + em_mem_access_off;
offset = ep_mem_access->composite_off;
/*
* We can trust the SPMD to get this right, but let's at least
--
2.54.0.669.g59709faab0-goog
^ permalink raw reply related
* [PATCH v4 4/5] KVM: arm64: Fix bounds checking in do_ffa_mem_reclaim()
From: Mostafa Saleh @ 2026-05-20 20:49 UTC (permalink / raw)
To: op-tee, linux-kernel, kvmarm, linux-arm-kernel
Cc: maz, oupton, joey.gouly, suzuki.poulose, catalin.marinas,
jens.wiklander, sumit.garg, sebastianene, vdonnefort,
sudeep.holla, Mostafa Saleh
In-Reply-To: <20260520204948.2440882-1-smostafa@google.com>
Sashiko (locally) reports a possible out-of-bounds write if SPMD
returns invalid data.
While SPMD is considered trusted, pKVM does some basic checks,
namely that offset is less than or equal to len.
However, that is insufficient: even if the offset is smaller than
len, pKVM can still access out-of-bounds memory in the subsequent
ffa_host_unshare_ranges() call.
Split this check into 2:
1- Check that the fixed portion of the descriptor fits.
2- After getting reg, check that the variable-size array of
addr_range_cnt entries fits.
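
The two-step check can be sketched in user-space C as below. The structs are simplified stand-ins and `composite_size()` / `check_composite()` are hypothetical helpers written for this example (the patch itself uses `CONSTITUENTS_OFFSET()`); the 64-bit arithmetic is a choice made for the sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins for the composite descriptor; illustrative only. */
struct addr_range {
	uint64_t address;
	uint32_t pg_cnt;
	uint32_t reserved;
};

struct composite {
	uint32_t total_pg_cnt;
	uint32_t addr_range_cnt;
	uint64_t reserved;
	struct addr_range constituents[];
};

/* Bytes needed for a composite descriptor that holds n ranges. */
static uint64_t composite_size(uint32_t n)
{
	return offsetof(struct composite, constituents) +
	       (uint64_t)n * sizeof(struct addr_range);
}

/* Returns 0 if the descriptor at offset lies entirely within len bytes. */
static int check_composite(const unsigned char *buf, uint32_t offset,
			   uint32_t len)
{
	uint32_t cnt;

	/* Step 1: the fixed-size part must fit before any field is read. */
	if ((uint64_t)offset + composite_size(0) > len)
		return -1;

	/* Read the count with memcpy so that alignment does not matter. */
	memcpy(&cnt, buf + offset + offsetof(struct composite, addr_range_cnt),
	       sizeof(cnt));

	/* Step 2: the variable-length array must fit as well. */
	if ((uint64_t)offset + composite_size(cnt) > len)
		return -1;

	return 0;
}
```

A check of the form `offset > len` alone would accept a descriptor whose header starts inside the buffer but whose range array runs past the end of it.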
Signed-off-by: Mostafa Saleh <smostafa@google.com>
---
arch/arm64/kvm/hyp/nvhe/ffa.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/kvm/hyp/nvhe/ffa.c b/arch/arm64/kvm/hyp/nvhe/ffa.c
index 1af722771178..e6aa2bfa63b1 100644
--- a/arch/arm64/kvm/hyp/nvhe/ffa.c
+++ b/arch/arm64/kvm/hyp/nvhe/ffa.c
@@ -607,7 +607,7 @@ static void do_ffa_mem_reclaim(struct arm_smccc_1_2_regs *res,
* check that we end up with something that doesn't look _completely_
* bogus.
*/
- if (WARN_ON(offset > len ||
+ if (WARN_ON(offset + CONSTITUENTS_OFFSET(0) > len ||
fraglen > KVM_FFA_MBOX_NR_PAGES * PAGE_SIZE)) {
ret = FFA_RET_ABORTED;
ffa_rx_release(res);
@@ -641,6 +641,11 @@ static void do_ffa_mem_reclaim(struct arm_smccc_1_2_regs *res,
goto out_unlock;
reg = (void *)buf + offset;
+ if (WARN_ON(offset + CONSTITUENTS_OFFSET(reg->addr_range_cnt) > len)) {
+ ret = FFA_RET_ABORTED;
+ goto out_unlock;
+ }
+
/* If the SPMD was happy, then we should be too. */
WARN_ON(ffa_host_unshare_ranges(reg->constituents,
reg->addr_range_cnt));
--
2.54.0.669.g59709faab0-goog
^ permalink raw reply related
* Re: [PATCH v2 4/7] mm/vmalloc: Extend page table walk to support larger page_shift sizes and eliminate page table rewalk
From: Barry Song @ 2026-05-20 20:56 UTC (permalink / raw)
To: Mike Rapoport
Cc: Wen Jiang, linux-mm, linux-arm-kernel, catalin.marinas, will,
akpm, urezki, Xueyuan.chen21, dev.jain, david, ryan.roberts,
anshuman.khandual, ajd, linux-kernel, Wen Jiang
In-Reply-To: <177927799624.3551615.1232032128494296554.b4-review@b4>
On Wed, May 20, 2026 at 7:53 PM Mike Rapoport <rppt@kernel.org> wrote:
>
> On Thu, 14 May 2026 17:41:05 +0800, Wen Jiang <jiangwenxiaomi@gmail.com> wrote:
>
> Hi,
>
> > vmap_pages_range_noflush_walk() (formerly vmap_small_pages_range_noflush())
> > provides a clean interface by taking struct page **pages and mapping them
> > via direct PTE iteration. This avoids the page table rewalk seen when
> > using vmap_range_noflush() for page_shift values other than PAGE_SHIFT.
> >
> > Extend it to support larger page_shift values, and add PMD- and
> > contiguous-PTE mappings as well. Rename it to vmap_pages_range_noflush_walk()
> > since it now handles more than just small pages.
> >
> > For vmalloc() allocations with VM_ALLOW_HUGE_VMAP, we no longer need to
> > iterate over pages one by one via vmap_range_noflush(), which would
> > otherwise lead to page table rewalk. The code is now unified with the
> > PAGE_SHIFT case by simply calling vmap_pages_range_noflush_walk().
>
> After this patch we have two very similar page table walkers:
> vmap_pages_range_noflush_walk() and vmap_range_noflush().
>
> They subtly differ in the levels at which they try huge mappings, in how they
> account for page table modifications and, lastly, vmap_range_noflush() is
> left without support for contiguous mappings.
>
> Is there a fundamental reason to have two page walkers?
> Is there a reason not to support contiguous mappings in
> vmap_range_noflush()?
Hi Mike,
They are two completely different cases. In one case, you have a contiguous
physical address range for ioremap(). Since the range is fully contiguous,
many things can be simplified. We already support large mappings there, but
this patchset still improves performance through larger batching.
In the other case, you have a pages[] array whose pages are not physically
contiguous as a whole. You iterate over the pages and attempt to create large
mappings whenever a subset of adjacent pages happens to be physically
contiguous.
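
To make the second case concrete, here is a small user-space sketch. It assumes an array of page frame numbers as a stand-in for `struct page *pages[]`; `contig_run()` and `count_mappings()` are hypothetical helpers for this example and ignore virtual-address alignment, which a real walker would also have to check.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Length of the run of consecutive frame numbers starting at pfns[start],
 * capped at max_run. pfns[] stands in for a struct page *pages[] array.
 */
static size_t contig_run(const uint64_t *pfns, size_t n, size_t start,
			 size_t max_run)
{
	size_t len = 1;

	while (start + len < n && len < max_run &&
	       pfns[start + len] == pfns[start] + len)
		len++;
	return len;
}

/*
 * Count the mappings needed when every aligned, fully contiguous run of
 * run_pages frames gets one large mapping and everything else is mapped
 * page by page. run_pages must be non-zero.
 */
static size_t count_mappings(const uint64_t *pfns, size_t n, size_t run_pages)
{
	size_t i = 0, mappings = 0;

	while (i < n) {
		/* A large mapping needs a full run that starts on an aligned frame. */
		if (pfns[i] % run_pages == 0 &&
		    contig_run(pfns, n, i, run_pages) == run_pages)
			i += run_pages;
		else
			i++;
		mappings++;
	}
	return mappings;
}
```

In the ioremap() case there is nothing to scan, because the whole range is one contiguous physical span.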
Best Regards
Barry
^ permalink raw reply
* Re: [PATCH] arm64/entry: Don't disable preemption in debug_exception_enter() with RT kernel
From: Waiman Long @ 2026-05-20 21:01 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: Catalin Marinas, Will Deacon, Mark Rutland, Clark Williams,
Steven Rostedt, linux-arm-kernel, linux-kernel, linux-rt-devel
In-Reply-To: <20260520061920.XCqBHe0x@linutronix.de>
On 5/20/26 2:19 AM, Sebastian Andrzej Siewior wrote:
> On 2026-05-19 18:25:24 [-0400], Waiman Long wrote:
>> Commit d8bb6718c4db ("arm64: Make debug exception handlers visible from
>> RCU") introduces debug_exception_enter() and debug_exception_exit()
>> where preemption is explicitly disabled. With a PREEMPT_RT debug kernel,
>> the following bug report can happen.
> …
>
> What kernel is this? I have backport (which is being tested) for v6.6
> and v6.12, the patches are from v6.17-rc1.
The kernel backtrace was produced using the latest v7.1-rc4 kernel. There
have been a number of changes to the debug_exception_enter() and
debug_exception_exit() functions over the years, but the preemption
disable code has remained since its introduction in v5.3.
BTW, what v6.17-rc1 patches are you talking about?
-Longman
^ permalink raw reply
* Re: [PATCH 5/6] firmware: samsung: acpm: Add TMU protocol support
From: Alexey Klimov @ 2026-05-20 21:01 UTC (permalink / raw)
To: Tudor Ambarus
Cc: Krzysztof Kozlowski, Michael Turquette, Stephen Boyd, Lee Jones,
Alim Akhtar, Sylwester Nawrocki, Chanwoo Choi, André Draszik,
linux-kernel, linux-samsung-soc, linux-arm-kernel, linux-clk,
peter.griffin, jyescas, kernel-team, Krzysztof Kozlowski
In-Reply-To: <73f18a2c-e729-45d7-9376-7c0e60ed35c7@linaro.org>
Hi Tudor,
On Tue May 19, 2026 at 4:46 PM BST, Tudor Ambarus wrote:
> Hi, Alexey,
>
> On 5/18/26 2:24 PM, Alexey Klimov wrote:
>> Thinking further about this I'd humbly suggest that even
>>
>> if (fw_err >= 0)
>> return 0;
>>
>> pr_debug_ratelimited("ACPM tmu call returned: %x\n", fw_err);
>> or pr_debug(...);
>>
>> if (fw_err == -1)
>> return -EACCES;
>>
>> some debug message would do.
>> Perhaps we need some conversion, for instance as it is done in scmi
>> code (scmi_to_linux_errno(), scmi_linux_errmap[]). But I don't have any
>> data for mapping acpm errors to some human meanings.
>
> I did that for the pmic helpers. I don't need any debug prints for
> gs101 TMU as I have clear instructions from firmware: 0 for success,
> -1 for error.
This doesn't look like the right approach for upstreaming an ACPM TMU
framework.
You are trying to submit a gs101-specific implementation masquerading
as a generic ACPM TMU framework, while explicitly pushing the
refactoring work onto the next developer to add support for other
SoCs in this generic ACPM code.
The ACPM TMU protocol implementation on Exynos850 is different: it uses
different error codes, and half of the calls in this 'generic' driver
are not even implemented in the Exynos850 firmware. Relying on a
hardcoded if (fw_err == -1) in a driver named generic ACPM is broken
by design and may silently swallow critical firmware errors on other
SoCs.
What about the options below?
- rename the driver to reflect reality: rename this specifically to
gs101-acpm-tmu-something to reflect that it is tailored for gs101-s;
or
- abstract the firmware error handling paths through driver_data or
a dedicated ops structure now, so that other SoCs can cleanly hook into
it without having to rewrite the logic later.
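
A minimal sketch of the second option, written as stand-alone C. All names here (`struct tmu_fw_ops`, `gs101_to_errno()`, `other_soc_to_errno()`, `tmu_check()`) are hypothetical, and the table for the second SoC uses placeholder codes rather than real firmware values; only the gs101 convention (0 for success, -1 for error) comes from this thread.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-SoC hook for translating firmware status codes. */
struct tmu_fw_ops {
	int (*to_errno)(int fw_err);
};

/* gs101 as described in the thread: 0 is success, -1 is an error. */
static int gs101_to_errno(int fw_err)
{
	if (fw_err >= 0)
		return 0;
	/* Mapping anything other than -1 to -EIO is an assumption. */
	return fw_err == -1 ? -EACCES : -EIO;
}

/* Placeholder table for another SoC; these are not real firmware codes. */
static const int other_soc_errmap[] = {
	0,		/* fw  0: success */
	-EINVAL,	/* fw -1: placeholder */
	-EOPNOTSUPP,	/* fw -2: placeholder */
};

static int other_soc_to_errno(int fw_err)
{
	unsigned long long idx;

	if (fw_err >= 0)
		return 0;

	/* Negate in a wider type so that INT_MIN cannot overflow. */
	idx = (unsigned long long)(-(long long)fw_err);
	if (idx >= sizeof(other_soc_errmap) / sizeof(other_soc_errmap[0]))
		return -EIO;
	return other_soc_errmap[idx];
}

static const struct tmu_fw_ops gs101_ops = { .to_errno = gs101_to_errno };
static const struct tmu_fw_ops other_soc_ops = { .to_errno = other_soc_to_errno };

/* Generic code calls through the ops table instead of hard-coding -1. */
static int tmu_check(const struct tmu_fw_ops *ops, int fw_err)
{
	return ops->to_errno(fw_err);
}
```

The table-driven variant follows the same idea as the scmi error map mentioned earlier in this thread: unknown codes fall back to a generic error instead of being silently treated as success.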
Maybe we can use the build info date from the SRAM initdata base to
distinguish between different versions of ACPM firmware implementations.
Sadly I was told that the specs don't provide any version info.
Or we could probe for implemented calls, but that may potentially break the
ACPM firmware.
You also mentioned that there are capabilities returned by TMU init
call. Maybe that one can be used somehow.
Thanks,
Alexey
^ permalink raw reply
* Re: [PATCH v2 0/5] mm: reduce mmap_lock contention and improve page fault performance
From: Matthew Wilcox @ 2026-05-20 21:04 UTC (permalink / raw)
To: Barry Song
Cc: Liam R. Howlett, Suren Baghdasaryan, Lorenzo Stoakes, akpm,
linux-mm, david, vbabka, rppt, mhocko, jack, pfalcato, wanglian,
chentao, lianux.mm, kunwu.chan, liyangouwen1, chrisl, kasong,
shikemeng, nphamcs, bhe, youngjun.park, linux-arm-kernel,
linux-kernel, loongarch, linuxppc-dev, linux-riscv, linux-s390,
Nanzhe Zhao
In-Reply-To: <CAGsJ_4yKxg1QugcsJi3WD0KVGJKe-zXycgm5D6cRi9vWtNcpDQ@mail.gmail.com>
On Wed, May 20, 2026 at 06:01:56AM +0800, Barry Song wrote:
> > implied is that the per-vma locking may stall mmap_lock writes for
> > longer than if the mmap_lock was taken in read mode? Barry, is that
> > correct?
>
> Not the case — the actual situation is (if we modify the
> current kernel to perform I/O without releasing VMA read locks):
>
> thread 1 PF: lock vma1 read ---- IO ----- ;
> thread 2 PF: lock vma2 read ----- IO ----- ;
> thread 3 PF: lock vma3 read ---- IO ----- ;
> thread 4 fork: mmap_lock_write ---- lock vma1, vma2, vma3 write ;
> thread 5 : take mmap_lock for any read/write reason
>
> Now you can see that thread 4 has to wait for the I/O of
> VMA1, VMA2, and VMA3 to complete, and thread 5 then has to
> wait for thread 4 to release mmap_lock. Both thread 4 and
> thread 5 can become extremely slow, because I/O may be stuck
> anywhere in the bio/request queue or filesystem GC.
>
> So now we have two choices:
>
> 1. Change fork() to avoid taking the vma write lock for vma1/2/3 where possible;
> 2. Keep the current kernel behavior and drop the VMA lock before I/O:
Option 3: Say that this is a very silly thing to optimise for. I have a
hard time believing that any application will care about the latency of
fork(), or the latency of page faults while it's in the middle of fork().
Multithreaded applications just don't fork that often!
^ permalink raw reply
* Re: [PATCH v4 1/2] media: nxp: imx8-isi: Add virtual channel support
From: Laurent Pinchart @ 2026-05-20 21:09 UTC (permalink / raw)
To: Guoniu Zhou
Cc: Mauro Carvalho Chehab, Frank Li, Sascha Hauer,
Pengutronix Kernel Team, Fabio Estevam, Aisheng Dong, linux-media,
imx, linux-arm-kernel, linux-kernel, Guoniu Zhou
In-Reply-To: <20260508-isi_vc-v4-1-feee39c63939@oss.nxp.com>
Hello Guoniu,
Thank you for the patch.
On Fri, May 08, 2026 at 11:05:40AM +0800, Guoniu Zhou wrote:
> From: Guoniu Zhou <guoniu.zhou@nxp.com>
>
> The ISI supports different numbers of virtual channels depending on the
> platform. i.MX95 supports 8 virtual channels, and i.MX8QXP/QM support 4
> virtual channels. They are used in multiple camera use cases, such as
> surround view. Other platforms (such as i.MX8/MN/MP/ULP/91/93) don't
> support virtual channels, and the VC_ID bits are marked as read-only.
>
> Reviewed-by: Frank Li <Frank.Li@nxp.com>
> Signed-off-by: Guoniu Zhou <guoniu.zhou@nxp.com>
> ---
> Changes in v4:
> - Fix VC boundary check: use num_vc (virtual channels count) instead of
> num_channels (ISI pipelines count)
> - Set VC to 0 when frame descriptor has no entries
> - Move platform-specific comments to block style to fix line length warnings
>
> Changes in v3:
> - Add num_vc field to platform data to indicate VC support
> - Clear VC_ID_1 bit after reading CHNL_CTRL for proper VC switching
> - Set VC_ID_1 only on platforms with num_vc > 4
> - Improve mxc_isi_get_vc() error handling
> - Add back CHNL_CTRL_BLANK_PXL and document platform-specific register fields
> ---
> .../media/platform/nxp/imx8-isi/imx8-isi-core.c | 3 ++
> .../media/platform/nxp/imx8-isi/imx8-isi-core.h | 4 ++
> drivers/media/platform/nxp/imx8-isi/imx8-isi-hw.c | 14 ++++-
> .../media/platform/nxp/imx8-isi/imx8-isi-pipe.c | 59 ++++++++++++++++++++++
> .../media/platform/nxp/imx8-isi/imx8-isi-regs.h | 12 +++--
> 5 files changed, 88 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/media/platform/nxp/imx8-isi/imx8-isi-core.c b/drivers/media/platform/nxp/imx8-isi/imx8-isi-core.c
> index 4bf8570e1b9e..837ac7046cf2 100644
> --- a/drivers/media/platform/nxp/imx8-isi/imx8-isi-core.c
> +++ b/drivers/media/platform/nxp/imx8-isi/imx8-isi-core.c
> @@ -318,6 +318,7 @@ static const struct mxc_isi_plat_data mxc_imx95_data = {
> .model = MXC_ISI_IMX95,
> .num_ports = 4,
> .num_channels = 8,
> + .num_vc = 8,
> .reg_offset = 0x10000,
> .ier_reg = &mxc_imx8_isi_ier_v2,
> .set_thd = &mxc_imx8_isi_thd_v1,
> @@ -329,6 +330,7 @@ static const struct mxc_isi_plat_data mxc_imx8qm_data = {
> .model = MXC_ISI_IMX8QM,
> .num_ports = 5,
> .num_channels = 8,
> + .num_vc = 4,
> .reg_offset = 0x10000,
> .ier_reg = &mxc_imx8_isi_ier_qm,
> .set_thd = &mxc_imx8_isi_thd_v1,
> @@ -340,6 +342,7 @@ static const struct mxc_isi_plat_data mxc_imx8qxp_data = {
> .model = MXC_ISI_IMX8QXP,
> .num_ports = 5,
> .num_channels = 6,
> + .num_vc = 4,
> .reg_offset = 0x10000,
> .ier_reg = &mxc_imx8_isi_ier_v2,
> .set_thd = &mxc_imx8_isi_thd_v1,
> diff --git a/drivers/media/platform/nxp/imx8-isi/imx8-isi-core.h b/drivers/media/platform/nxp/imx8-isi/imx8-isi-core.h
> index 14d63ec36416..195c28dbd151 100644
> --- a/drivers/media/platform/nxp/imx8-isi/imx8-isi-core.h
> +++ b/drivers/media/platform/nxp/imx8-isi/imx8-isi-core.h
> @@ -169,6 +169,7 @@ struct mxc_isi_plat_data {
> enum model model;
> unsigned int num_ports;
> unsigned int num_channels;
> + unsigned int num_vc; /* Number of VCs, 0 = no VC support */
> unsigned int reg_offset;
> const struct mxc_isi_ier_reg *ier_reg;
> const struct mxc_isi_set_thd *set_thd;
> @@ -257,6 +258,9 @@ struct mxc_isi_pipe {
> u8 acquired_res;
> u8 chained_res;
> bool chained;
> +
> + /* Virtual channel ID for the ISI channel */
> + u8 vc;
I try not to store such values in global structures, when the purpose is
to pass them between functions in a direct call stack. You can instead
return the vc value from mxc_isi_get_vc(), pass it to
mxc_isi_channel_config() and from there to
mxc_isi_channel_set_control().
> };
>
> struct mxc_isi_m2m {
> diff --git a/drivers/media/platform/nxp/imx8-isi/imx8-isi-hw.c b/drivers/media/platform/nxp/imx8-isi/imx8-isi-hw.c
> index 0187d4ab97e8..ecd0c2ef28b6 100644
> --- a/drivers/media/platform/nxp/imx8-isi/imx8-isi-hw.c
> +++ b/drivers/media/platform/nxp/imx8-isi/imx8-isi-hw.c
> @@ -308,6 +308,11 @@ static void mxc_isi_channel_set_control(struct mxc_isi_pipe *pipe,
> mutex_lock(&pipe->lock);
>
> val = mxc_isi_read(pipe, CHNL_CTRL);
> +
> + /* Clear the VC_ID_1 bit on platforms supporting more than 4 VCs. */
> + if (pipe->isi->pdata->num_vc > 4)
> + val &= ~CHNL_CTRL_VC_ID_1_MASK;
> +
Please move this just after the next statement, we usually start with
generic statements followed by conditional ones.
> val &= ~(CHNL_CTRL_CHNL_BYPASS | CHNL_CTRL_CHAIN_BUF_MASK |
> CHNL_CTRL_SRC_TYPE_MASK | CHNL_CTRL_MIPI_VC_ID_MASK |
> CHNL_CTRL_SRC_INPUT_MASK);
> @@ -338,7 +343,14 @@ static void mxc_isi_channel_set_control(struct mxc_isi_pipe *pipe,
> } else {
> val |= CHNL_CTRL_SRC_TYPE(CHNL_CTRL_SRC_TYPE_DEVICE);
> val |= CHNL_CTRL_SRC_INPUT(input);
> - val |= CHNL_CTRL_MIPI_VC_ID(0); /* FIXME: For CSI-2 only */
> + val |= CHNL_CTRL_MIPI_VC_ID(pipe->vc); /* FIXME: For CSI-2 only */
> +
> + /*
> + * On platforms with more than 4 VCs (i.MX95), the VC ID is
> + * split across VC_ID_0 (bits 7:6) and VC_ID_1 (bit 16).
> + */
> + if (pipe->isi->pdata->num_vc > 4)
> + val |= CHNL_CTRL_VC_ID_1(pipe->vc >> 2);
> }
>
> mxc_isi_write(pipe, CHNL_CTRL, val);
> diff --git a/drivers/media/platform/nxp/imx8-isi/imx8-isi-pipe.c b/drivers/media/platform/nxp/imx8-isi/imx8-isi-pipe.c
> index a41c51dd9ce0..e6da254a9ef0 100644
> --- a/drivers/media/platform/nxp/imx8-isi/imx8-isi-pipe.c
> +++ b/drivers/media/platform/nxp/imx8-isi/imx8-isi-pipe.c
> @@ -232,6 +232,61 @@ static inline struct mxc_isi_pipe *to_isi_pipe(struct v4l2_subdev *sd)
> return container_of(sd, struct mxc_isi_pipe, sd);
> }
>
> +static int mxc_isi_get_vc(struct mxc_isi_pipe *pipe)
> +{
> + struct mxc_isi_crossbar *xbar = &pipe->isi->crossbar;
> + struct device *dev = pipe->isi->dev;
> + struct v4l2_mbus_frame_desc fd = { };
> + unsigned int source_pad = xbar->num_sinks + pipe->id;
> + unsigned int max_vc;
> + unsigned int i;
> + int ret;
> +
> + ret = v4l2_subdev_call(&xbar->sd, pad, get_frame_desc,
> + source_pad, &fd);
> + if (ret == -ENOIOCTLCMD) {
Is this needed ? If we swap patches 1/2 and 2/2, the get_frame_desc
operation should always be available on the source.
> + /*
> + * If remote subdev doesn't implement get_frame_desc.
> + * Assume virtual channel 0.
> + */
> + pipe->vc = 0;
> + return 0;
> + }
> + if (ret < 0) {
> + dev_err(dev, "Failed to get source frame desc from pad %u\n",
> + source_pad);
> + return ret;
> + }
> +
> + if (!fd.num_entries) {
> + pipe->vc = 0;
> + return 0;
> + }
Similarly, can this happen ?
> +
> + /* Find stream 0 in the frame descriptor */
> + for (i = 0; i < fd.num_entries; i++) {
> + if (fd.entry[i].stream == 0)
> + break;
> + }
> +
> + if (i == fd.num_entries) {
> + dev_err(dev, "Failed to find stream from source frame desc\n");
> + return -EINVAL;
I think -EPIPE would be more appropriate, this indicates the pipeline
isn't correctly configured.
> + }
> +
> + max_vc = pipe->isi->pdata->num_vc ? : 1;
> +
> + /* Check virtual channel range */
> + if (fd.entry[i].bus.csi2.vc >= max_vc) {
> + dev_err(dev, "Virtual channel %u exceeds maximum %u\n",
> + fd.entry[i].bus.csi2.vc, max_vc - 1);
> + return -EINVAL;
Same here.
> + }
> +
> + pipe->vc = fd.entry[i].bus.csi2.vc;
> + return 0;
> +}
> +
> int mxc_isi_pipe_enable(struct mxc_isi_pipe *pipe)
> {
> struct mxc_isi_crossbar *xbar = &pipe->isi->crossbar;
> @@ -280,6 +335,10 @@ int mxc_isi_pipe_enable(struct mxc_isi_pipe *pipe)
>
> v4l2_subdev_unlock_state(state);
>
> + ret = mxc_isi_get_vc(pipe);
> + if (ret)
> + return ret;
> +
> /* Configure the ISI channel. */
> mxc_isi_channel_config(pipe, input, &in_size, &scale, &crop,
> sink_info->encoding, src_info->encoding);
> diff --git a/drivers/media/platform/nxp/imx8-isi/imx8-isi-regs.h b/drivers/media/platform/nxp/imx8-isi/imx8-isi-regs.h
> index 1b65eccdf0da..e795f4daf3ff 100644
> --- a/drivers/media/platform/nxp/imx8-isi/imx8-isi-regs.h
> +++ b/drivers/media/platform/nxp/imx8-isi/imx8-isi-regs.h
> @@ -6,6 +6,7 @@
> #ifndef __IMX8_ISI_REGS_H__
> #define __IMX8_ISI_REGS_H__
>
> +#include <linux/bitfield.h>
> #include <linux/bits.h>
>
> /* ISI Registers Define */
> @@ -19,9 +20,14 @@
> #define CHNL_CTRL_CHAIN_BUF_NO_CHAIN 0
> #define CHNL_CTRL_CHAIN_BUF_2_CHAIN 1
> #define CHNL_CTRL_SW_RST BIT(24)
> -#define CHNL_CTRL_BLANK_PXL(n) ((n) << 16)
> -#define CHNL_CTRL_BLANK_PXL_MASK GENMASK(23, 16)
> -#define CHNL_CTRL_MIPI_VC_ID(n) ((n) << 6)
> +/*
> + * CHNL_CTRL_BLANK_PXL: i.MX8{QM,QXP} only
> + * CHNL_CTRL_VC_ID_1, CHNL_CTRL_VC_ID_1_MASK: i.MX95 only
> + */
> +#define CHNL_CTRL_BLANK_PXL(n) FIELD_PREP(GENMASK(23, 16), (n))
> +#define CHNL_CTRL_VC_ID_1(n) FIELD_PREP(BIT(16), (n))
> +#define CHNL_CTRL_VC_ID_1_MASK BIT(16)
> +#define CHNL_CTRL_MIPI_VC_ID(n) FIELD_PREP(GENMASK(7, 6), (n))
> #define CHNL_CTRL_MIPI_VC_ID_MASK GENMASK(7, 6)
> #define CHNL_CTRL_SRC_TYPE(n) ((n) << 4)
> #define CHNL_CTRL_SRC_TYPE_MASK BIT(4)
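For readers less familiar with FIELD_PREP(), a small userspace sketch of what the conversion changes for values computed at run time. The SKETCH_* macros and sketch_* functions are stand-ins written for this example, not the kernel implementations; the real FIELD_PREP() additionally requires a constant mask and rejects oversized constant values at build time.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's GENMASK() and BIT(), 32-bit only. */
#define SKETCH_GENMASK(h, l) \
	((UINT32_C(0xffffffff) >> (31 - (h))) & (UINT32_C(0xffffffff) << (l)))
#define SKETCH_BIT(n)	(UINT32_C(1) << (n))

/* Shift val up to the lowest set bit of mask, then confine it to the mask. */
static uint32_t sketch_field_prep(uint32_t mask, uint32_t val)
{
	unsigned int shift = 0;

	assert(mask != 0);
	while (!((mask >> shift) & 1))
		shift++;

	return (val << shift) & mask;
}

/* Old-style open-coded shift: nothing stops val from spilling upwards. */
static uint32_t sketch_blank_pxl_old(uint32_t val)
{
	return val << 16;
}

/* FIELD_PREP-style: the result is confined to bits 23:16. */
static uint32_t sketch_blank_pxl_new(uint32_t val)
{
	return sketch_field_prep(SKETCH_GENMASK(23, 16), val);
}
```

With an out-of-range value such as 0x1ff, the open-coded shift also sets bit 24, which is CHNL_CTRL_SW_RST in this register, while the masked variant stays within the field.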
--
Regards,
Laurent Pinchart
^ permalink raw reply
* [GIT PULL] soc: fixes for 7.1
From: Arnd Bergmann @ 2026-05-20 21:13 UTC (permalink / raw)
To: Linus Torvalds; +Cc: soc, linux-arm-kernel, linux-kernel
The following changes since commit 7fd2df204f342fc17d1a0bfcd474b24232fb0f32:
Linux 7.1-rc2 (2026-05-03 14:21:25 -0700)
are available in the Git repository at:
https://git.kernel.org/pub/scm/linux/kernel/git/soc/soc.git tags/soc-fixes-7.1
for you to fetch changes up to 1fcf4149418e7a8f8253dd74059d56340795503f:
Merge tag 'riscv-dt-fixes-for-v7.1-rc3' of https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux into arm/fixes (2026-05-08 15:33:36 +0200)
----------------------------------------------------------------
soc: fixes for 7.1
The FF-A firmware driver gets 11 individual bugfixes for a number
of issues with robustness to buggy firmware or client implementations.
Another firmware fix addresses suspend to RAM via PSCI firmware.
The final code change is for the old Arm Integrator reference
platform that recently started exposing an old NULL pointer
dereference bug.
The MAINTAINERS file gets two updates, notably James Tai and Yu-Chun
Lin are stepping up as co-maintainers for the Realtek platform.
The remaining patches are all for devicetree files. Two of these
are for riscv boards, the rest are all for Renesas Arm platforms,
addressing build time checking issues as well as minor configuration
problems.
----------------------------------------------------------------
Arnd Bergmann (3):
Merge tag 'renesas-fixes-for-v7.1-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-devel into arm/fixes
Merge tag 'ffa-fixes-7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux into arm/fixes
Merge tag 'riscv-dt-fixes-for-v7.1-rc3' of https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux into arm/fixes
Conor Dooley (1):
riscv: dts: microchip: fix icicle i2c pinctrl configuration
Geert Uytterhoeven (1):
arm64: dts: renesas: r8a78000: Fix SCIF brg_int clocks
Guenter Roeck (1):
ARM: integrator: Fix early initialization
Jai Luthra (1):
riscv: dts: starfive: jh7110: Drop CAMSS node
Konrad Dybcio (1):
firmware: psci: Set pm_set_resume/suspend_via_firmware() for SYSTEM_SUSPEND
Krzysztof Kozlowski (1):
ARM: realtek: MAINTAINERS: Include pin controller drivers
Marek Vasut (10):
arm64: dts: renesas: draak/ebisu-panel: Fix missing cells and reg in DTO
arm64: dts: renesas: salvator-panel: Fix missing cells and reg in DTO
arm64: dts: renesas: rz-smarc-cru-csi-ov5645: Fix missing cells and reg in CSI2 subnode
arm64: dts: renesas: rz-smarc-du-adv7513-smarc: Fix missing cells and reg in DU subnode
ARM: dts: renesas: r8a7778: Add missing unit address to bus node
ARM: dts: renesas: r8a7779: Add missing unit address to bus node
ARM: dts: renesas: r8a7792: Add missing unit address to bus node
ARM: dts: renesas: r7s72100: Add missing unit address to bus node
ARM: dts: renesas: genmai: Drop superfluous cells
ARM: dts: renesas: rskrza1: Drop superfluous cells
Sudeep Holla (11):
firmware: arm_ffa: Check for NULL FF-A ID table while driver registration
firmware: arm_ffa: Skip free_pages on RX buffer alloc failure
firmware: arm_ffa: Avoid collapsing NPI work from different CPUs
firmware: arm_ffa: Fix per-vcpu self notifications handling in workqueue
firmware: arm_ffa: Unregister bus notifier on teardown for FF-A v1.0
firmware: arm_ffa: Bound PARTITION_INFO_GET_REGS copies
firmware: arm_ffa: Keep framework RX release under lock
firmware: arm_ffa: Validate framework notification message layout
firmware: arm_ffa: Align RxTx buffer size before mapping
firmware: arm_ffa: Snapshot notifier callbacks under lock
firmware: arm_ffa: Fix sched-recv callback partition lookup
Tommaso Merciai (2):
arm64: dts: renesas: r9a09g057: Add #mux-state-cells to usb2{0,1}phyrst
arm64: dts: renesas: r9a09g056: Add #mux-state-cells to usb20phyrst
Yu-Chun Lin (1):
MAINTAINERS: Add maintainers for ARM/REALTEK ARCHITECTURE
MAINTAINERS | 5 +-
arch/arm/boot/dts/renesas/r7s72100-genmai.dts | 3 -
arch/arm/boot/dts/renesas/r7s72100-rskrza1.dts | 2 -
arch/arm/boot/dts/renesas/r7s72100.dtsi | 2 +-
arch/arm/boot/dts/renesas/r8a7778.dtsi | 2 +-
arch/arm/boot/dts/renesas/r8a7779.dtsi | 2 +-
arch/arm/boot/dts/renesas/r8a7792.dtsi | 2 +-
arch/arm/mach-versatile/integrator_cp.c | 13 +-
.../dts/renesas/draak-ebisu-panel-aa104xd12.dtso | 5 +
arch/arm64/boot/dts/renesas/r8a78000.dtsi | 8 +-
arch/arm64/boot/dts/renesas/r9a09g056.dtsi | 1 +
arch/arm64/boot/dts/renesas/r9a09g057.dtsi | 2 +
.../boot/dts/renesas/rz-smarc-cru-csi-ov5645.dtsi | 5 +
.../boot/dts/renesas/rz-smarc-du-adv7513.dtsi | 5 +
.../boot/dts/renesas/salvator-panel-aa104xd12.dtso | 5 +
.../boot/dts/microchip/mpfs-icicle-kit-fabric.dtsi | 10 --
.../boot/dts/microchip/mpfs-icicle-kit-prod.dts | 10 ++
arch/riscv/boot/dts/microchip/mpfs-icicle-kit.dts | 19 +++
arch/riscv/boot/dts/starfive/jh7110-common.dtsi | 27 +---
arch/riscv/boot/dts/starfive/jh7110.dtsi | 28 ----
drivers/firmware/arm_ffa/bus.c | 4 +-
drivers/firmware/arm_ffa/driver.c | 144 +++++++++++++++------
drivers/firmware/psci/psci.c | 10 ++
23 files changed, 183 insertions(+), 131 deletions(-)
^ permalink raw reply
* Re: [PATCH v2 0/5] mm: reduce mmap_lock contention and improve page fault performance
From: Barry Song @ 2026-05-20 21:14 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Liam R. Howlett, Suren Baghdasaryan, Lorenzo Stoakes, akpm,
linux-mm, david, vbabka, rppt, mhocko, jack, pfalcato, wanglian,
chentao, lianux.mm, kunwu.chan, liyangouwen1, chrisl, kasong,
shikemeng, nphamcs, bhe, youngjun.park, linux-arm-kernel,
linux-kernel, loongarch, linuxppc-dev, linux-riscv, linux-s390,
Nanzhe Zhao
In-Reply-To: <ag4h87CBd-gph9zX@casper.infradead.org>
On Thu, May 21, 2026 at 5:05 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Wed, May 20, 2026 at 06:01:56AM +0800, Barry Song wrote:
> > > implied is that the per-vma locking may stall mmap_lock writes for
> > > longer than if the mmap_lock was taken in read mode? Barry, is that
> > > correct?
> >
> > Not the case — the actual situation is (if we modify the
> > current kernel to perform I/O without releasing VMA read locks):
> >
> > thread 1 PF: lock vma1 read ---- IO ----- ;
> > thread 2 PF: lock vma2 read ----- IO ----- ;
> > thread 3 PF: lock vma3 read ---- IO ----- ;
> > thread 4 fork: mmap_lock_write ---- lock vma1, vma2, vma3 write ;
> > thread 5 : take mmap_lock for any read/write reason
> >
> > Now you can see that thread 4 has to wait for the I/O of
> > VMA1, VMA2, and VMA3 to complete, and thread 5 then has to
> > wait for thread 4 to release mmap_lock. Both thread 4 and
> > thread 5 can become extremely slow, because I/O may be stuck
> > anywhere in the bio/request queue or filesystem GC.
> >
> > So now we have two choices:
> >
> > 1. Change fork() to avoid taking the vma write lock for vma1/2/3 where possible;
> > 2. Keep the current kernel behavior and drop the VMA lock before I/O:
>
> Option 3: Say that this is a very silly thing to optimise for. I have a
> hard time believing that any application will care about the latency of
> fork(), or the latency of page faults while it's in the middle of fork().
> Multithreaded applications just don't fork that often!
My understanding is that we should not blame applications here. This is 2026:
there are basically only two kinds of applications — single-threaded and
multi-threaded — and single-threaded applications are nearly extinct.
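The five-thread scenario quoted above can be put into numbers with a toy model. This is only a sketch under simplifying assumptions: the forking thread takes mmap_lock for write at t = 0, needs every VMA write lock before it can do its work, and all names and time units are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Time at which the forking thread can release mmap_lock.
 * io_end[i] is when the page fault on VMA i finishes its I/O;
 * fork_work is how long fork needs once it holds all VMA write locks.
 */
static unsigned int fork_done_at(const unsigned int *io_end, size_t n,
				 int vma_lock_held_across_io,
				 unsigned int fork_work)
{
	unsigned int start = 0;
	size_t i;

	assert(io_end || n == 0);

	/* If faults keep the VMA read lock during I/O, fork waits for the slowest one. */
	if (vma_lock_held_across_io) {
		for (i = 0; i < n; i++) {
			if (io_end[i] > start)
				start = io_end[i];
		}
	}

	return start + fork_work;
}
```

Any later mmap_lock user (thread 5 in the example) cannot proceed before the returned time, so in this model a single slow I/O delays both the fork and everything queued behind it.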
^ permalink raw reply
* Re: [PATCH v2 0/5] mm: reduce mmap_lock contention and improve page fault performance
From: Matthew Wilcox @ 2026-05-20 21:15 UTC (permalink / raw)
To: Barry Song
Cc: Liam R. Howlett, Suren Baghdasaryan, Lorenzo Stoakes, akpm,
linux-mm, david, vbabka, rppt, mhocko, jack, pfalcato, wanglian,
chentao, lianux.mm, kunwu.chan, liyangouwen1, chrisl, kasong,
shikemeng, nphamcs, bhe, youngjun.park, linux-arm-kernel,
linux-kernel, loongarch, linuxppc-dev, linux-riscv, linux-s390,
Nanzhe Zhao
In-Reply-To: <CAGsJ_4zA8afu0xXy0WS+tMe-eesDX1W6UBmfAsuouUpcAgK8JQ@mail.gmail.com>
On Thu, May 21, 2026 at 05:14:20AM +0800, Barry Song wrote:
> My understanding is that we should not blame applications here. This is 2026:
> there are basically only two kinds of applications — single-threaded and
> multi-threaded — and single-threaded applications are nearly extinct.
all of the applications i run are either single threaded or don't fork.
what multithreaded applications call fork?
^ permalink raw reply
* Re: [PATCH v4 2/2] media: nxp: imx8-isi: Implement get_frame_desc for crossbar subdev
From: Laurent Pinchart @ 2026-05-20 21:22 UTC (permalink / raw)
To: Guoniu Zhou
Cc: Mauro Carvalho Chehab, Frank Li, Sascha Hauer,
Pengutronix Kernel Team, Fabio Estevam, Aisheng Dong, linux-media,
imx, linux-arm-kernel, linux-kernel, Guoniu Zhou
In-Reply-To: <20260508-isi_vc-v4-2-feee39c63939@oss.nxp.com>
Hello Guoniu,
Thank you for the patch.
On Fri, May 08, 2026 at 11:05:41AM +0800, Guoniu Zhou wrote:
> From: "Guoniu.zhou" <guoniu.zhou@nxp.com>
>
> Implement the get_frame_desc pad operation for the crossbar subdevice
> to propagate frame descriptor information from the source subdevice to
> downstream ISI channels.
>
> This allows the ISI driver to retrieve virtual channel information and
> other stream parameters from the connected upstream, which is required
> for proper virtual channel routing on platforms supporting multiple VCs.
Have you looked at v4l2_subdev_get_frame_desc_passthrough() ? Could it be
used instead of a manual implementation ? This could be either direct
usage of v4l2_subdev_get_frame_desc_passthrough(), or usage with minor
additional customization (first calling the unlocked helper
__v4l2_subdev_get_frame_desc_passthrough() and then updating the
descriptors).
> Signed-off-by: Guoniu.zhou <guoniu.zhou@nxp.com>
> ---
> Changes in v4:
> - Use %d instead of %u for ret variable in error messages
> - Fix potential -ENOIOCTLCMD leak by resetting ret to 0 on continue
>
> Changes in v3:
> - New patch added based on feedback from Laurent Pinchart
> ---
> .../platform/nxp/imx8-isi/imx8-isi-crossbar.c | 98 ++++++++++++++++++++++
> 1 file changed, 98 insertions(+)
>
> diff --git a/drivers/media/platform/nxp/imx8-isi/imx8-isi-crossbar.c b/drivers/media/platform/nxp/imx8-isi/imx8-isi-crossbar.c
> index 605a45124103..b5eff191b2d5 100644
> --- a/drivers/media/platform/nxp/imx8-isi/imx8-isi-crossbar.c
> +++ b/drivers/media/platform/nxp/imx8-isi/imx8-isi-crossbar.c
> @@ -306,6 +306,103 @@ static int mxc_isi_crossbar_set_fmt(struct v4l2_subdev *sd,
> return 0;
> }
>
> +static int mxc_isi_get_frame_desc(struct v4l2_subdev *sd, unsigned int pad,
> + struct v4l2_mbus_frame_desc *fd)
> +{
> + struct mxc_isi_crossbar *xbar = to_isi_crossbar(sd);
> + struct device *dev = xbar->isi->dev;
> + struct v4l2_subdev_route *route;
> + struct v4l2_subdev_state *state;
> + int ret = 0;
> +
> + if (pad < xbar->num_sinks)
> + return -EINVAL;
> +
> + memset(fd, 0, sizeof(*fd));
> +
> + state = v4l2_subdev_lock_and_get_active_state(sd);
> +
> + /*
> + * Iterate over all active routes. For each route going through the
> + * requested source pad, get the frame descriptor from the connected
> + * source subdev, find the corresponding stream entry, and add it to
> + * the output frame descriptor with the routed stream ID.
> + */
> + for_each_active_route(&state->routing, route) {
> + struct v4l2_mbus_frame_desc source_fd;
> + struct v4l2_subdev *remote_sd;
> + struct media_pad *remote_pad;
> + unsigned int i;
> +
> + if (route->source_pad != pad)
> + continue;
> +
> + /* Find the remote subdev connected to this sink pad */
> + remote_pad = media_pad_remote_pad_first(&xbar->pads[route->sink_pad]);
> + if (!remote_pad) {
> + dev_dbg(dev, "no remote pad connected to crossbar input %u\n",
> + route->sink_pad);
> + continue;
> + }
> +
> + remote_sd = media_entity_to_v4l2_subdev(remote_pad->entity);
> + if (!remote_sd) {
> + dev_err(dev, "no subdev connected to crossbar input %u\n",
> + route->sink_pad);
> + ret = -EPIPE;
> + goto out_unlock;
> + }
> +
> + /* Get frame descriptor from the remote subdev */
> + ret = v4l2_subdev_call(remote_sd, pad, get_frame_desc,
> + remote_pad->index, &source_fd);
> + if (ret == -ENOIOCTLCMD) {
> + dev_dbg(dev, "%s:%u does not support frame descriptors\n",
> + remote_sd->entity.name, remote_pad->index);
> + ret = 0;
> + continue;
> + }
> + if (ret < 0) {
> + dev_err(dev, "failed to get frame desc from %s:%u: %d\n",
> + remote_sd->entity.name, remote_pad->index, ret);
> + goto out_unlock;
> + }
> +
> + if (fd->num_entries == 0)
> + fd->type = source_fd.type;
> +
> + /* Find the source frame descriptor entry matching the sink stream */
> + for (i = 0; i < source_fd.num_entries; i++) {
> + if (source_fd.entry[i].stream == route->sink_stream)
> + break;
> + }
> +
> + if (i == source_fd.num_entries) {
> + dev_err(dev, "stream %u not found in frame desc from %s:%u\n",
> + route->sink_stream, remote_sd->entity.name,
> + remote_pad->index);
> + ret = -EPIPE;
> + goto out_unlock;
> + }
> +
> + if (fd->num_entries >= ARRAY_SIZE(fd->entry)) {
> + dev_err(dev, "frame descriptor is full\n");
> + ret = -ENOSPC;
> + goto out_unlock;
> + }
> +
> + /* Copy the entry and update the stream ID */
> + fd->entry[fd->num_entries] = source_fd.entry[i];
> + fd->entry[fd->num_entries].stream = route->source_stream;
> + fd->num_entries++;
> + }
> +
> +out_unlock:
> + v4l2_subdev_unlock_state(state);
> +
> + return ret;
> +}
> +
> static int mxc_isi_crossbar_set_routing(struct v4l2_subdev *sd,
> struct v4l2_subdev_state *state,
> enum v4l2_subdev_format_whence which,
> @@ -404,6 +501,7 @@ static const struct v4l2_subdev_pad_ops mxc_isi_crossbar_subdev_pad_ops = {
> .enum_mbus_code = mxc_isi_crossbar_enum_mbus_code,
> .get_fmt = v4l2_subdev_get_fmt,
> .set_fmt = mxc_isi_crossbar_set_fmt,
> + .get_frame_desc = mxc_isi_get_frame_desc,
> .set_routing = mxc_isi_crossbar_set_routing,
> .enable_streams = mxc_isi_crossbar_enable_streams,
> .disable_streams = mxc_isi_crossbar_disable_streams,
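For reference, the core of what the function above does (and what a passthrough-style helper would have to cover) is a per-route copy with a stream ID rewrite. A minimal userspace sketch follows, with hypothetical xbar_* types and the simplifying assumption of a single upstream descriptor; it is not the V4L2 helper.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define XBAR_MAX_ENTRIES 8

struct xbar_entry {
	unsigned int stream;
	unsigned int vc;
};

struct xbar_frame_desc {
	size_t num_entries;
	struct xbar_entry entry[XBAR_MAX_ENTRIES];
};

struct xbar_route {
	unsigned int sink_stream;
	unsigned int source_pad;
	unsigned int source_stream;
};

/* Build the descriptor for one source pad from the upstream descriptor. */
static int xbar_remap(const struct xbar_frame_desc *in,
		      const struct xbar_route *routes, size_t num_routes,
		      unsigned int pad, struct xbar_frame_desc *out)
{
	size_t r, i;

	assert(in && routes && out);
	out->num_entries = 0;

	for (r = 0; r < num_routes; r++) {
		if (routes[r].source_pad != pad)
			continue;

		/* Find the upstream entry matching the routed sink stream. */
		for (i = 0; i < in->num_entries; i++) {
			if (in->entry[i].stream == routes[r].sink_stream)
				break;
		}
		if (i == in->num_entries)
			return -EPIPE;

		if (out->num_entries >= XBAR_MAX_ENTRIES)
			return -ENOSPC;

		/* Copy the entry and rewrite the stream ID for the source side. */
		out->entry[out->num_entries] = in->entry[i];
		out->entry[out->num_entries].stream = routes[r].source_stream;
		out->num_entries++;
	}

	return 0;
}
```

Everything else in the patch (locking, remote pad lookup, the -ENOIOCTLCMD handling) is plumbing around this loop, which is why a shared helper looks like a good fit.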
--
Regards,
Laurent Pinchart
^ permalink raw reply
* Re: [PATCH 1/8] mm: Add ptep_try_install() for lockless empty-slot installs
From: David Hildenbrand (Arm) @ 2026-05-20 21:22 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Tejun Heo, David Vernet, Andrea Righi, Changwoo Min,
Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Kumar Kartikeya Dwivedi, Catalin Marinas,
Will Deacon, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, Andrew Morton, Mike Rapoport, Emil Tsalapatis,
sched-ext, bpf, X86 ML, linux-arm-kernel, linux-mm, LKML
In-Reply-To: <CAADnVQ+j4o0Xy+cykSdx9txSqCouy0yHFaA09qTJp3OVF3VaRA@mail.gmail.com>
>>
>> At least in apply_range_clear_cb() one could similarly switch to
>> ptep_try_install() to at least have both these paths handle races in a
>> reasonable way. (having to handle when ptep_try_install() is not really implemented)
>
> You mean to use ptep_get_and_clear() ?
> Makes sense to me.
Yes, also using an atomic operation to replace it.
I recall that ptep_get_and_clear() might not be atomic, but I guess it is on the
architectures where ptep_try_install() is currently implemented.
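To make the pairing concrete, a userspace sketch with C11 atomics. The sketch_* names and the "0 means empty" encoding are assumptions for the example; real PTE helpers are architecture specific and look different.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a page table slot; 0 means "empty". */
typedef _Atomic uint64_t sketch_pte_t;

/* Install new_val only if the slot is still empty. */
static bool sketch_try_set(sketch_pte_t *slot, uint64_t new_val)
{
	uint64_t expected = 0;

	assert(slot && new_val != 0);
	return atomic_compare_exchange_strong(slot, &expected, new_val);
}

/* Atomically fetch the old value and leave the slot empty. */
static uint64_t sketch_get_and_clear(sketch_pte_t *slot)
{
	assert(slot);
	return atomic_exchange(slot, 0);
}
```

If the clear side used a plain load followed by a plain store instead, a concurrent sketch_try_set() could land between the two and be silently overwritten, which is the kind of race both paths would need to handle.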
>
>> Anyhow, the documentation of ptep_try_install() must clearly spell out that this
>> must be used very carefully, and only in special kernel page tables, never user
>> page tables. There are likely other scenarios we should document (caller must
>> prevent concurrent page table teardown somehow, and must be prepared to handle
>> races if other code is not using atomics).
>>
>> To highlight that, we should likely consider adding a "kernel" in the name, like
>> "ptep_try_install_kernel()".
>>
>> I am also not sure if "install" is the right terminology and whether it should
>> instead be "ptep_try_set()". (set_pte_at is the non-atomic interface right now)
>
> I suggested using the ptep_try_set() name too :)
>
>> Further note that last time I talked to Linus about arch helpers, he preferred
>>
>> #define ptep_try_install ptep_try_install
>>
>> over __HAVE_ARCH_PTEP_TRY_INSTALL
>
> ok.
> I guess __HAVE_ARCH_PTEP_GET_AND_CLEAR is legacy ?
Yeah ... and we have plenty of legacy around :)
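For completeness, the self-referential define pattern looks roughly like this. It is a sketch with a hypothetical demo_try_set() in a single file; in the kernel the two halves would live in an arch header and a generic header, and the arch body would be a real atomic operation.

```c
#include <assert.h>
#include <stdbool.h>

/* --- what an architecture header might provide (hypothetical) --- */
static inline bool demo_try_set(unsigned long *slot, unsigned long val)
{
	assert(slot);
	if (*slot)
		return false;
	*slot = val;	/* placeholder for the arch's atomic compare-and-exchange */
	return true;
}
#define demo_try_set demo_try_set

/* --- what the generic header would then contain --- */
#ifndef demo_try_set
static inline bool demo_try_set(unsigned long *slot, unsigned long val)
{
	(void)slot;
	(void)val;
	return false;	/* no arch support: callers must fall back */
}
#endif
```

The define names the function itself, so the generic header only needs an #ifndef on the function name and no separate __HAVE_ARCH_* symbol has to be kept in sync.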
--
Cheers,
David
^ permalink raw reply
* [PATCH v4 0/5] arm_mpam: resctrl: Counter Assignment (ABMC)
From: Ben Horgan @ 2026-05-20 21:24 UTC (permalink / raw)
To: ben.horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jic23, kobak, lcherian,
linux-arm-kernel, linux-kernel, peternewman, punit.agrawal,
quic_jiles, reinette.chatre, rohit.mathew, scott, sdonthineni,
tan.shaopeng, xhao, zengheng4, x86
Version 4 of this series addresses a few review comments and some concerns
of the sashiko bot.
From the cover letter of v3:
Removing the rfc tag as the resctrl precursors [1] have been queued in tip
x86/cache. Due to that dependency, it would be good for this to also go through
x86/cache.
This series adds support for memory bandwidth monitoring.
Please review and test.
Changelogs in patches.
[1] https://lore.kernel.org/all/20260506082855.3694761-1-ben.horgan@arm.com/
Description from the initial cover letter:
The MPAM counter assignment (ABMC emulation) changes that were dropped from
the resctrl glue series due to some missing precursors in resctrl. Counter
assignment enables bandwidth monitoring in systems that have fewer
monitors than resctrl monitor groups.
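As a rough illustration of the idea only, and not of the code in this series: with fewer counters than groups, a counter is handed to a group on request and returned when no longer needed. The pool_* names and the pool size are made up for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define POOL_NUM_COUNTERS 2
#define POOL_UNASSIGNED (-1)

struct counter_pool {
	int owner[POOL_NUM_COUNTERS];	/* group ID, or POOL_UNASSIGNED */
};

static void pool_init(struct counter_pool *p)
{
	size_t i;

	assert(p);
	for (i = 0; i < POOL_NUM_COUNTERS; i++)
		p->owner[i] = POOL_UNASSIGNED;
}

/* Returns the counter index for the group, or -ENOSPC if none is free. */
static int pool_assign(struct counter_pool *p, int group)
{
	int free_idx = -1;
	size_t i;

	assert(p && group >= 0);
	for (i = 0; i < POOL_NUM_COUNTERS; i++) {
		if (p->owner[i] == group)
			return (int)i;	/* already assigned */
		if (p->owner[i] == POOL_UNASSIGNED && free_idx < 0)
			free_idx = (int)i;
	}

	if (free_idx < 0)
		return -ENOSPC;

	p->owner[free_idx] = group;
	return free_idx;
}

/* Give the group's counter, if any, back to the pool. */
static void pool_unassign(struct counter_pool *p, int group)
{
	size_t i;

	assert(p);
	for (i = 0; i < POOL_NUM_COUNTERS; i++) {
		if (p->owner[i] == group)
			p->owner[i] = POOL_UNASSIGNED;
	}
}
```

With two counters and three groups, the third assignment fails until one of the first two groups gives its counter back.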
rfc v1: https://lore.kernel.org/lkml/20260225205436.3571756-1-ben.horgan@arm.com/
rfc v2: https://lore.kernel.org/lkml/20260319165540.381410-1-ben.horgan@arm.com/
v3: https://lore.kernel.org/linux-arm-kernel/20260511154147.557481-1-ben.horgan@arm.com/
The code can be found at:
https://gitlab.arm.com/linux-arm/linux-bh.git mpam_abmc_v4
Ben Horgan (2):
arm_mpam: resctrl: Pre-allocate assignable monitors
arm64: mpam: Add memory bandwidth usage (MBWU) documentation
James Morse (3):
arm_mpam: resctrl: Pick classes for use as MBM counters
arm_mpam: resctrl: Add resctrl_arch_config_cntr() for ABMC use
arm_mpam: resctrl: Add resctrl_arch_cntr_read() &
resctrl_arch_reset_cntr()
Documentation/arch/arm64/mpam.rst | 17 ++
drivers/resctrl/mpam_internal.h | 6 +-
drivers/resctrl/mpam_resctrl.c | 308 +++++++++++++++++++++++++++---
3 files changed, 308 insertions(+), 23 deletions(-)
--
2.43.0
^ permalink raw reply