Linux-ARM-Kernel Archive on lore.kernel.org

Linux-ARM-Kernel Archive on lore.kernel.org
 help / color / mirror / Atom feed

* Re: [PATCH v4 1/3] PCI: Allow ATS to be always on for CXL.cache capable devices
From: Bjorn Helgaas @ 2026-05-19 19:36 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: jgg, will, robin.murphy, bhelgaas, joro, praan, baolu.lu,
	kevin.tian, miko.lenczewski, linux-arm-kernel, iommu,
	linux-kernel, linux-pci, dan.j.williams, jonathan.cameron, vsethi,
	linux-cxl, nirmoyd
In-Reply-To: <f6734b9dad0050138676f11ecd14e9db1cf6b697.1777269009.git.nicolinc@nvidia.com>

On Sun, Apr 26, 2026 at 10:54:00PM -0700, Nicolin Chen wrote:
> Controlled by the IOMMU driver, ATS is usually enabled "on demand" when a
> given PASID on a device is attached to an I/O page table. This is working
> even when a device has no translation on its RID (i.e., the RID is IOMMU
> bypassed).
> 
> However, certain PCIe devices require non-PASID ATS on their RID even when
> the RID is IOMMU bypassed. Call this "always on".
> 
> For example, CXL spec r4.0 notes in sec 3.2.5.13 Memory Type on CXL.cache:
>  "To source requests on CXL.cache, devices need to get the Host Physical
>   Address (HPA) from the Host by means of an ATS request on CXL.io."
> 
> In other words, the CXL.cache capability requires ATS; otherwise, it can't
> access host physical memory.
> 
> Introduce a new pci_ats_always_on() helper for the IOMMU driver to scan a
> PCI device and shift ATS policies between "on demand" and "always on".
> 
> Add the support for CXL.cache devices first. Pre-CXL devices will be added
> in quirks.c file.
> 
> Note that pci_ats_always_on() validates against pci_ats_supported(), so we
> ensure that untrusted devices (e.g. external ports) will not be always on.
> This maintains the existing ATS security policy regarding potential side-
> channel attacks via ATS.

IMO this doesn't really fit in the PCI core.  ats.c encapsulates
discovery and provides interfaces to access the ATS Capability, but
the users of those interfaces are all outside the PCI core.

The decision to enable enable ATS for CXL.cache devices is fine but
it's really an IOMMU usage policy, and I think it should be
implemented in the IOMMU core.  All the pieces needed
(pci_ats_disabled(), pci_ats_supported(), pci_find_dvsec_capability())
are already exported.

One motivation for putting this in the PCI core was to use the quirk
infrastructure, but this series doesn't use any of that.  It doesn't
declare any fixups, e.g., DECLARE_PCI_FIXUP_FINAL, and it doesn't
update any state cached by the PCI core.

> +++ b/include/uapi/linux/pci_regs.h
> @@ -1349,6 +1349,7 @@
>  /* CXL r4.0, 8.1.3: PCIe DVSEC for CXL Device */
>  #define PCI_DVSEC_CXL_DEVICE				0
>  #define  PCI_DVSEC_CXL_CAP				0xA
> +#define   PCI_DVSEC_CXL_CACHE_CAPABLE			_BITUL(0)

This makes good sense, I'm fine with adding
PCI_DVSEC_CXL_CACHE_CAPABLE.

> +bool pci_ats_always_on(struct pci_dev *pdev)
> +{
> +	if (pci_ats_disabled() || !pci_ats_supported(pdev))
> +		return false;

Isn't this the same as:

  if (!pci_ats_supported(pdev))
    return false;

If pci_ats_disabled(), dev->ats_cap should be zero, so
pci_ats_supported() should always return false.

> +
> +	/* A VF inherits its PF's requirement for ATS function */
> +	if (pdev->is_virtfn)
> +		pdev = pci_physfn(pdev);
> +
> +	return pci_cxl_ats_always_on(pdev);
> +}
> +EXPORT_SYMBOL_GPL(pci_ats_always_on);


^ permalink raw reply

* Re: [PATCH v4 11/24] iommu: Add iommu_report_device_broken() to quarantine a broken device
From: Jason Gunthorpe @ 2026-05-19 19:16 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Will Deacon, Robin Murphy, Joerg Roedel, Bjorn Helgaas,
	Rafael J . Wysocki, Len Brown, Pranjal Shrivastava, Mostafa Saleh,
	Lu Baolu, Kevin Tian, linux-arm-kernel, iommu, linux-kernel,
	linux-acpi, linux-pci, vsethi, Shuai Xue
In-Reply-To: <agysAyreCY4huf0I@Asurada-Nvidia>

On Tue, May 19, 2026 at 11:29:23AM -0700, Nicolin Chen wrote:
> On Tue, May 19, 2026 at 09:07:37AM -0300, Jason Gunthorpe wrote:
> > On Mon, May 18, 2026 at 08:38:54PM -0700, Nicolin Chen wrote:
> > > +void iommu_report_device_broken(struct device *dev)
> > > +{
> > > +	struct group_device *gdev;
> > > +
> > > +	/*
> > > +	 * We cannot hold group->mutex here. Rely on iommu_group_broken_worker()
> > > +	 * to validate dev_has_iommu(). The iommu_group memory is RCU-protected
> > > +	 * via kfree_rcu() in iommu_group_release(), and group->devices is an
> > > +	 * RCU-protected list, so the lookup runs entirely under rcu_read_lock.
> > > +	 *
> > > +	 * Note the device might have been concurrently removed from the group
> > > +	 * (list_del_rcu) before iommu_deinit_device() cleared the dev->iommu.
> > > +	 */
> > > +	rcu_read_lock();
> > > +	gdev = __dev_to_gdev_rcu(dev);
> > > +	if (gdev) {
> > 
> > If this is why the RCU is being added it seems like overkill.
> > 
> > Just add the worker to struct dev_iommu and push it there so it can
> > use a mutex but I'm confused why are we even adding this function?
> > 
> > The entire design of this series was supposed to have the IOMMU driver
> > itself adjust it's "STE" to inhibit translated TLPs synchronosly
> > within its fully locked invalidation loop.
> 
> Yes. Surgical STE is done in the driver. But, core-level attaching
> state doesn't reflect correctly. So the driver calls this function
> to notify the core (this is in an invalidation context -- not able
> to use mutex).
> 
> > Whats the async worker for?
> 
> Then, the core needs to block the device using the similar routine
> to the reset prepare(). And that needs to hold group->mutex, so it
> needs an async worker.
> 
> Do you see a much simpler way?

Put the work on the dev_iommu and forget about rcu.

But this is all probably better as some later series if at all. The
driver can block the ATS and the expectation is something will FLR the
device. The FLR will set the blocking and then restore the
domain. None of this async work seems functionally necessary, though
it would be a nice to have. Lets focus on the bare minimum here it, it
is already a difficult enough problem without tacking on these
extras..

Jason


^ permalink raw reply

* Re: [PATCH 1/3] arm64: dts: freescale: imx{91,93}-phycore-som: Set BUCK5 in FPWM mode
From: Frank.Li @ 2026-05-19 19:12 UTC (permalink / raw)
  To: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Sascha Hauer,
	Pengutronix Kernel Team, Fabio Estevam, Primoz Fiser
  Cc: Frank Li, devicetree, imx, linux-arm-kernel, linux-kernel,
	upstream
In-Reply-To: <20260507062058.1711292-1-primoz.fiser@norik.com>

From: Frank Li <Frank.Li@nxp.com>


On Thu, 07 May 2026 08:20:56 +0200, Primoz Fiser wrote:
> Set PMIC BUCK5 mode to forced PWM (Pulse Width Modulation) mode instead
> of the default automatic PFM and PWM transition mode. FPWM mode produces
> less ripple on the output voltage rail under light load conditions. And
> since BUCK5 supplies SoC internal ADC reference voltage we need to keep
> voltage ripple to a minimum. This solves issues with the occasional ADC
> calibration procedure failures on phyCORE-i.MX91/93 SoM based boards.
> 
> [...]

Applied, thanks!

[1/3] arm64: dts: freescale: imx{91,93}-phycore-som: Set BUCK5 in FPWM mode
      commit: 5601ee1b64da17d2df2e657a7f1582ad468ed4d2
[2/3] arm64: dts: freescale: imx{91,93}-phycore-som: Adjust PHY RST drive-strength
      commit: e1b256ccc5cc3fc59a67a261d711bbbb989d7512
[3/3] arm64: dts: freescale: imx{91,93}-phycore-som: Improve USDHC signals
      commit: a6254c90e14b5b9d0b2222b2121598fe12f55ca5

Best regards,
-- 
Frank Li <Frank.Li@nxp.com>


^ permalink raw reply

* Re: [PATCH 1/8] mm: Add ptep_try_install() for lockless empty-slot installs
From: Alexei Starovoitov @ 2026-05-19 19:04 UTC (permalink / raw)
  To: Tejun Heo
  Cc: David Hildenbrand (Arm), David Vernet, Andrea Righi, Changwoo Min,
	Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Kumar Kartikeya Dwivedi, Catalin Marinas,
	Will Deacon, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, Andrew Morton, Mike Rapoport, Emil Tsalapatis,
	sched-ext, bpf, X86 ML, linux-arm-kernel, linux-mm, LKML
In-Reply-To: <agyZxmHoueGwBAQg@slm.duckdns.org>

On Tue, May 19, 2026 at 10:11 AM Tejun Heo <tj@kernel.org> wrote:
>
> On Tue, May 19, 2026 at 11:40:48AM +0200, David Hildenbrand (Arm) wrote:
> > On 5/19/26 11:05, David Hildenbrand (Arm) wrote:
> ...
> > >> The only requirements are that the kernel doesn't oops and the
> > >> violation gets caught. Beyond that, behavior at the address is
> > >> unspecified, and which installer wins the race doesn't matter as
> > >> long as kernel integrity holds.
> > >
> > > You'll have inconsistent TLB state.
>
> Wouldn't it still be either or with both cases being okay?
>
> > > I really don't like that approach.
> > >
> > > We should really try to just take the lock, and remove any code under the lock
> > > that could trigger such unpleasant deadlocks.
> > >
> > > Is that feasible?
> >
> > ... or can we run into similar problems with kprobes? (I am obviously no bpf
> > expert ...)
>
> Yeah, I mean, that was just the first TP I found scanning the code. Any
> kprobes or other TPs in the path would behave the same.
>
> When this fault triggers, the BPF program has already malfunctioned, so it's
> not going to be a high frequency path and performance isn't a primary
> consideration. So, anything that can ensure that the kernel doesn't crash or
> lock up would be fine. Any better ideas?

As you guys already figured out the trylock is not an option.
The fault handler has to install _some_ page and let kernel continue.
Scratch page or arena page doesn't matter. Potentially different CPUs
will see different page. It's not a concern at all.
bpf prog is buggy, but the kernel will continue to work without a glitch.
bpf runtime will disable and unload misbehaving prog.


^ permalink raw reply

* Re: [PATCH v4 06/24] iommu: Defer iommu_group free via kfree_rcu()
From: Nicolin Chen @ 2026-05-19 18:54 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Will Deacon, Robin Murphy, Joerg Roedel, Bjorn Helgaas,
	Rafael J . Wysocki, Len Brown, Pranjal Shrivastava, Mostafa Saleh,
	Lu Baolu, Kevin Tian, linux-arm-kernel, iommu, linux-kernel,
	linux-acpi, linux-pci, vsethi, Shuai Xue
In-Reply-To: <20260519113916.GO787748@nvidia.com>

On Tue, May 19, 2026 at 08:39:16AM -0300, Jason Gunthorpe wrote:
> On Mon, May 18, 2026 at 08:38:49PM -0700, Nicolin Chen wrote:
> > dev->iommu_group will be read in an ISR-context to look up a group_device
> > for fault reporting, in which case mutex cannot be used. For that read to
> > be safe, two things are needed:
> 
> What driver does this? iommu_report_device_fault() has to be called in
> a sleepable context - usually a threaded IRQ handler. So mutex is no
> problem.

It's used in an invalidation context, where mutex is a problem.

> This seems like Sashiko slop - iommu_group does not change while a
> driver is attached and a driver is not permitted to do any "fault
> handling" after it has detached, it must flush and synchronize its
> IRQ if it is using this from a hard IRQ for some bad reason.

Oh, I probably picked a wrong word. "fault" here means "ATS broken".

IOMMU driver sees ATC timeout during invalidation, and reports "ATS
broken" in a lockless context where device could be detached. No?

> You need to be alot more critical about the noise that Sashiko
> generates, alot is useful, alot is not. It is a not a tool you want to
> have 0 reports, complaining about hallucinations is expected.

OK. I will be more aware of that in the future.

Thanks
Nicolin


^ permalink raw reply

* Re: (subset) [PATCH v8 1/4] dt-bindings: backlight: Add max25014 support
From: Frank Li @ 2026-05-19 18:51 UTC (permalink / raw)
  To: Lee Jones
  Cc: Daniel Thompson, Jingoo Han, Pavel Machek, Rob Herring,
	Krzysztof Kozlowski, Conor Dooley, Helge Deller, Shawn Guo,
	Sascha Hauer, Pengutronix Kernel Team, Fabio Estevam,
	Liam Girdwood, Mark Brown, Maud Spierings, dri-devel, linux-leds,
	devicetree, linux-kernel, linux-fbdev, imx, linux-arm-kernel
In-Reply-To: <177755722019.2606736.10749503716773482329.b4-ty@b4>

On Thu, Apr 30, 2026 at 02:53:40PM +0100, Lee Jones wrote:
> On Tue, 07 Apr 2026 16:41:42 +0200, Maud Spierings wrote:
> > The Maxim MAX25014 is a 4-channel automotive grade backlight driver IC
> > with integrated boost controller.
>
> Applied, thanks!
>
> [1/4] dt-bindings: backlight: Add max25014 support
>       commit: 5fcbbedec9dfce78044eee922bf2030e1bd03faa

Lee Jones:

	I have not seen it in linux-next. Anything wrong?

Frank

>
> --
> Lee Jones [李琼斯]
>


^ permalink raw reply

* Re: [PATCH v2 0/5] mm: reduce mmap_lock contention and improve page fault performance
From: Yang Shi @ 2026-05-19 18:50 UTC (permalink / raw)
  To: Barry Song
  Cc: Matthew Wilcox, surenb, akpm, linux-mm, david, ljs, liam, vbabka,
	rppt, mhocko, jack, pfalcato, wanglian, chentao, lianux.mm,
	kunwu.chan, liyangouwen1, chrisl, kasong, shikemeng, nphamcs, bhe,
	youngjun.park, linux-arm-kernel, linux-kernel, loongarch,
	linuxppc-dev, linux-riscv, linux-s390, Nanzhe Zhao
In-Reply-To: <CAGsJ_4w_-Y8qNLDeLX9OWpLpK01YG2bF-N6_mGypgsauvfCvkA@mail.gmail.com>

On Tue, May 19, 2026 at 4:07 AM Barry Song <baohua@kernel.org> wrote:
>
> On Tue, May 19, 2026 at 5:21 AM Yang Shi <shy828301@gmail.com> wrote:
> >
> > On Sun, May 17, 2026 at 1:45 AM Barry Song <baohua@kernel.org> wrote:
> > >
> > > On Sat, May 2, 2026 at 1:58 AM Matthew Wilcox <willy@infradead.org> wrote:
> > > >
> > > > On Sat, May 02, 2026 at 01:44:34AM +0800, Barry Song wrote:
> > > > > On Fri, May 1, 2026 at 10:57 PM Matthew Wilcox <willy@infradead.org> wrote:
> > > > > >
> > > > > > On Fri, May 01, 2026 at 06:49:58AM +0800, Barry Song wrote:
> > > > > > > 1. There is no deterministic latency for I/O completion. It depends on
> > > > > > > both the hardware and the software stack (bio/request queues and the
> > > > > > > block scheduler). Sometimes the latency is short; at other times it can
> > > > > > > be quite long. In such cases, a high-priority thread performing operations
> > > > > > > such as mprotect, unmap, prctl_set_vma, or madvise may be forced to wait
> > > > > > > for an unpredictable amount of time.
> > > > > >
> > > > > > But does that actually happen?  I find it hard to believe that thread A
> > > > > > unmaps a VMA while thread B is in the middle of taking a page fault in
> > > > > > that same VMA.  mprotect() and madvise() are more likely to happen, but
> > > > > > it still seems really unlikely to me.
> > > > >
> > > > > It doesn’t have to involve unmapping or applying mprotect to
> > > > > the entire VMA—just a portion of it is sufficient.
> > > >
> > > > Yes, but that still fails to answer "does this actually happen".  How much
> > > > performance is all this complexity in the page fault handler buying us?
> > > > If you don't answer this question, I'm just going to go in and rip it
> > > > all out.
> > > >
> > >
> > > Hi Matthew (and Lorenzo, Jan, and anyone else who may be
> > > waiting for answers),
> > >
> > > As promised during LSF/MM/BPF, we conducted thorough
> > > testing on Android phones to determine whether performing
> > > I/O in `filemap_fault()` can block `vma_start_write()`.
> > > I wanted to give a quick update on this question.
> > >
> > > Nanzhe at Xiaomi created tracing scripts and ran various
> > > applications on Android devices with I/O performed under
> > > the VMA lock in `filemap_fault()`. We found that:
> > >
> > > 1. There are very few cases where unmap() is blocked by
> > >    page faults. I assume this is due to buggy user code
> > >    or poor synchronization between reads and unmap().
> > > So I assume it is not a problem.
> > >
> > > 2. We observed many cases where `vma_start_write()`
> > >    is blocked by page-fault I/O in some applications.
> > >    The blocking occurs in the `dup_mmap()` path during
> > >    fork().
> > >
> > > With Suren's commit fb49c455323ff ("fork: lock VMAs of
> > > the parent process when forking"), we now always hold
> > > `vma_write_lock()` for each VMA. Note that the
> > > `mmap_lock` write lock is also held, which could lead to
> > > chained waiting if page-fault I/O is performed without
> > > releasing the VMA lock.
> > >
> > > My gut feeling is that Suren's commit may be overshooting,
> > > so my rough idea is that we might want to do something like
> > > the following (we haven't tested it yet and it might be
> > > wrong):
> > >
> > > diff --git a/mm/mmap.c b/mm/mmap.c
> > > index 2311ae7c2ff4..5ddaf297f31a 100644
> > > --- a/mm/mmap.c
> > > +++ b/mm/mmap.c
> > > @@ -1762,7 +1762,13 @@ __latent_entropy int dup_mmap(struct mm_struct
> > > *mm, struct mm_struct *oldmm)
> > >         for_each_vma(vmi, mpnt) {
> > >                 struct file *file;
> > >
> > > -               retval = vma_start_write_killable(mpnt);
> > > +               /*
> > > +                * For anonymous or writable private VMAs, prevent
> > > +                * concurrent CoW faults.
> > > +                */
> > > +               if (!mpnt->vm_file || (!(mpnt->vm_flags & VM_SHARED) &&
> > > +                                       (mpnt->vm_flags & VM_WRITE)))
> > > +                       retval = vma_start_write_killable(mpnt);
> > >                 if (retval < 0)
> > >                         goto loop_out;
> > >                 if (mpnt->vm_flags & VM_DONTCOPY) {
> >
> > Maybe a little bit off topic. This is an interesting idea. It seems
> > possible we don't have to take vma write lock unconditionally. IIUC
> > the write lock is mainly used to serialize against page fault and
> > madvise, right? I got a crazy idea off the top of my head. We may be
> > able to just take vma write lock iff vma->anon_vma is not NULL.
> >
> > First of all, write mmap_lock is held, so the vma can't go or be
> > changed under us.
> >
> > Secondly, if vma->anon_vma is NULL, it basically means either no page
> > fault happened or no cow happened, so there is no page table to copy,
> > this is also what copy_page_range() does currently. So we can shrink
> > the critical section to:
> >
> > if (vma->anon_vma) {
> >     vma_start_write_killable(src_vma);
> >     anon_vma_fork(dst_vma, src_vma);
> >     copy_page_range(dst_vma, src_vma);
> > }
> >
> > But page fault can happen before write mmap_lock is taken, when we
> > check vma->anon_vma, it is possible it has not been set up yet. But it
> > seems to be equivalent to page fault after fork and won't break the
> > semantic.
>
> Re-reading Suren's commit log for fb49c455323ff8
> ("fork: lock VMAs of the parent process when forking"),
> it seems that vm_start_write() is used to protect
> against a race where anon_vma changes from NULL to
> non-NULL during fork. In that scenario, we hold the
> mmap_lock write lock, but not vma_start_write(), so a
> concurrent anon_vma_prepare() could still install an
> anon_vma.
>
> "    A concurrent page fault on a page newly marked read-only by the page
>     copy might trigger wp_page_copy() and a anon_vma_prepare(vma) on the
>     source vma, defeating the anon_vma_clone() that wasn't done because the
>     parent vma originally didn't have an anon_vma, but we now might end up
>     copying a pte entry for a page that has one.
> "
>
> If that is the case, then your change does not work.
>
> Nowadays, nobody calls anon_vma_prepare(vma) directly.
> Instead, vmf_anon_prepare() is used, and we always
> require the mmap_lock read lock before calling
> __anon_vma_prepare(). As a result, anon_vma cannot
> transition from NULL to non-NULL during fork.
>
> So the original race condition has effectively
> disappeared.

anon_vma_prepare() has some usecases too, but it seems like it
requires taking read mmap_lock too if I read the code correctly.

>
> You also mentioned the madvise() case. If I understand
> correctly, madvise() should take mmap_lock before
> modifying anon_vma. Only some parts of madvise() can
> support per-VMA locking. Therefore, we probably do not
> need:
>
> if (vma->anon_vma) {
> vma_start_write_killable(src_vma);
> ...
> }

I think we still need write vma lock to serialize anon_vma fork
otherwise we may see:

        CPU 0                                                 CPU 1
fork                                                       page fault
   src vma has no anon_vma
       skip vma fork

allocate anon_vma for src vma
vma_needs_copy() sees anon_vma
copy page

Then we may end up being no anon_vma for dst vma, but with pages mapped in it.

Thanks,
Yang

>
> >
> > Anyway, just a crazy idea, I may miss some corner cases.
>
> To me, it seems that we could remove vma_start_write()
> entirely now. Or is that an even crazier idea?


>
> Thanks
> Barry


^ permalink raw reply

* Re: [PATCH 2/2] arm64: dts: freescale: add initial device tree for TQMa8MPQS with i.MX8MP
From: Frank Li @ 2026-05-19 18:44 UTC (permalink / raw)
  To: Alexander Stein
  Cc: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Sascha Hauer,
	Pengutronix Kernel Team, Fabio Estevam, Geert Uytterhoeven,
	Magnus Damm, Shawn Guo, Paul Gerber, devicetree, linux-kernel,
	imx, linux-arm-kernel, linux, linux-renesas-soc
In-Reply-To: <20260505063346.1799500-2-alexander.stein@ew.tq-group.com>

On Tue, May 05, 2026 at 08:33:44AM +0200, Alexander Stein wrote:
> From: Paul Gerber <paul.gerber@tq-group.com>
>
> This adds support for TQMa8MPQS module on MB-SMARC-2 board.
>
> Signed-off-by: Paul Gerber <paul.gerber@tq-group.com>
> Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
> ---
...
> +
> +&usb3_0 {
> +	pinctrl-names = "default";
> +	pinctrl-0 = <&pinctrl_usb0>;
> +	fsl,over-current-active-low;
> +	maximum-speed = "high-speed";

arch/arm64/boot/dts/freescale/imx8mp-tqma8mpqs-mb-smarc-2.dtb: usb@32f10100 (fsl,imx8mp-dwc3): 'maximum-speed' does not match any of the regexes: '^pinctrl-[0-9]+$', '^usb@[0-9a-f]+$'
	from schema $id: http://devicetree.org/schemas/usb/fsl,imx8mp-dwc3.yaml

It will reduce review time if run CHECK_DTBS locally before post

Frank


^ permalink raw reply

* Re: [PATCH v2 0/5] mm: reduce mmap_lock contention and improve page fault performance
From: Yang Shi @ 2026-05-19 18:41 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Barry Song, Matthew Wilcox, surenb, akpm, linux-mm, david, liam,
	vbabka, rppt, mhocko, jack, pfalcato, wanglian, chentao,
	lianux.mm, kunwu.chan, liyangouwen1, chrisl, kasong, shikemeng,
	nphamcs, bhe, youngjun.park, linux-arm-kernel, linux-kernel,
	loongarch, linuxppc-dev, linux-riscv, linux-s390, Nanzhe Zhao
In-Reply-To: <agxnJ8R-G3CRjeTR@lucifer>

On Tue, May 19, 2026 at 6:39 AM Lorenzo Stoakes <ljs@kernel.org> wrote:
>
> On Tue, May 19, 2026 at 02:12:10PM +0100, Lorenzo Stoakes wrote:
> > On Mon, May 18, 2026 at 02:21:14PM -0700, Yang Shi wrote:
> > > Maybe a little bit off topic. This is an interesting idea. It seems
> > > possible we don't have to take vma write lock unconditionally. IIUC
> > > the write lock is mainly used to serialize against page fault and
> > > madvise, right? I got a crazy idea off the top of my head. We may be
> >
> > Err no, it serialises against literally any modification or read of any
> > characteristic of VMAs.

If I remember correctly, you are not supposed to change VMA
flags/size/mm pointer/vm_file/pgoff/prot, etc, under read vma lock or
read mmap_lock.

> >
> > > able to just take vma write lock iff vma->anon_vma is not NULL.
> >
> > Except if we don't take it and vma->anon_vma is NULL, then somebody can
> > anon_vma_prepare() and change vma->anon_vma midway through a fork and completely
> > screw up the anon_vma fork hierarchy.
>
> correction: this won't happen as per Barry (see - I managed to confuse myself
> here :), since for vma->anon_vma install we take the mmap read lock.
>
> BUT we also have to consider other cases.
>
> >
> > So no.
> >
> > >
> > > First of all, write mmap_lock is held, so the vma can't go or be
> > > changed under us.
> >
> > vma->anon_vma can be changed.
>
> Correction: no it can't :)

Yes, vma->anon_vma change should require taking read mmap_lock.

>
> >
> > >
> > > Secondly, if vma->anon_vma is NULL, it basically means either no page
> > > fault happened or no cow happened, so there is no page table to copy,
> > > this is also what copy_page_range() does currently. So we can shrink
> > > the critical section to:
> >
> > Firstly, with no VMA write lock, !vma->anon_vma means a fault can race and
> > secondly copy_page_range() checks vma_needs_copy(), there are other cases - PFN
> > maps, mixed maps, UFFD W/P (ugh), guard regions.
> >
> > So yeah this isn't sufficient.
>
> However this is true...

Yes, fault can race with fork. Basically this is actually the purpose
of this idea. We can have improved page fault scalability. In my
proposal (take write vma lock if vma->anon_vma is not NULL), the race
just happens on the VMAs which page fault has not happened on before.
vma_needs_copy() also skips the VMAs which don't have vma->anon_vma.
So there is basically no difference in semantics other than more page
fault races IIUC. It should be safe as long as we can guarantee there
is no writable PTE point to a shared page after fork.

For guard regions, it can be serialized by vma write lock if
vma->anon_vma exists. If vma->anon_vma is NULL, it will prepare
anon_vma, which will take read mmap_lock if I read the code correctly.

I have not investigated UFFD yet.

>
> >
> > >
> > > if (vma->anon_vma) {
> > >     vma_start_write_killable(src_vma);
> > >     anon_vma_fork(dst_vma, src_vma);
> > >     copy_page_range(dst_vma, src_vma);
> > > }
> >
> > Yeah that's totally broken fo reasons above as I said :)
> >
> > >
> > > But page fault can happen before write mmap_lock is taken, when we
> > > check vma->anon_vma, it is possible it has not been set up yet. But it
> > > seems to be equivalent to page fault after fork and won't break the
> > > semantic.
> >
> > It will totally break how the anon_vma hierarchy works :) See the links at the
> > top of https://ljs.io/talks for a link to various slides on anon_vma behaviour
> > (it's really a pain to think about because it's a super broken abstraction).
> >
> > You could end up with a CoW mapping that's unreachable from rmap and you could
> > get some nasty issues with page table entries pointing at freed folios :)
>
> Correction: actually we should be safe given mmap read lock on anon_vma install.
>
> >
> > >
> > > Anyway, just a crazy idea, I may miss some corner cases.
> >
> > Yeah sorry to push back here but this is just not a viable approach.

No worries. Thanks for all the feedback. Just tried to explore whether
such an idea is feasible or not.

> >
> > And this is forgetting that we have relied on page faults being blocked by fork
> > _forever_, who knows what else has baked in assumptions about that
> > serialisation.
> >
> > Forking is one of the nastiest parts of mm and has had multiple, subtle, corner
> > case breakages that have been a nightmare to deal with.

Yes, this might be the biggest concern. The page fault can race with
fork. If some applications rely on such subtle behavior, it may break,
but such applications are fragile too.

> >
> > So I'm very much against changing this behaviour to try to fix something in the
> > fault path.
> >
> > We should address the fault path issues in the fault path :)

Yeah, this idea was inspired by Barry's "not take vma read lock
unconditionally" idea. Maybe irrelevant to Barry's priority inversion
problem, just an idea for further optimization on page fault
scalability. This probably should be a separate topic.

Thanks,
Yang

>
> Above still all true though.
>
> >
> > >
> > > Thanks,
> > > Yang
> > >
> > > }
> > >
> > > >
> > > > Based on the above, we may want to re-check whether fork()
> > > > can be blocked by page faults. At the same time, if Suren,
> > > > you, or anyone else has any comments, please feel free to
> > > > share them.
> > > >
> > > > Best Regards
> > > > Barry
> > > >
> >
> > Cheers, Lorenzo
>
> So still a nope :)
>
> Cheers, Lorenzo


^ permalink raw reply

* Re: [PATCH v2 1/2] i2c: imx: Don't recover bus when arbitration lost
From: Dan Scally @ 2026-05-19 18:32 UTC (permalink / raw)
  To: Carlos Song (OSS)
  Cc: linux-i2c@vger.kernel.org, imx@lists.linux.dev,
	linux-arm-kernel@lists.infradead.org, Andi Shyti, Frank Li,
	Sascha Hauer, Fabio Estevam, Gao Pan, Fugang Duan, Wolfram Sang,
	Oleksij Rempel, Pengutronix Kernel Team
In-Reply-To: <AM0PR04MB680225F01902AD7990E4B6E5E8002@AM0PR04MB6802.eurprd04.prod.outlook.com>

Hi Carlos

On 19/05/2026 11:29, Carlos Song (OSS) wrote:
> 
> 
>> -----Original Message-----
>> From: Dan Scally <dan.scally@ideasonboard.com>
>> Sent: Tuesday, May 19, 2026 4:42 PM
>> To: Oleksij Rempel <o.rempel@pengutronix.de>; Pengutronix Kernel Team
>> <kernel@pengutronix.de>
>> Cc: linux-i2c@vger.kernel.org; imx@lists.linux.dev;
>> linux-arm-kernel@lists.infradead.org; Andi Shyti <andi.shyti@kernel.org>; Frank
>> Li <frank.li@nxp.com>; Sascha Hauer <s.hauer@pengutronix.de>; Fabio
>> Estevam <festevam@gmail.com>; Gao Pan <b54642@freescale.com>; Fugang
>> Duan <B38611@freescale.com>; Wolfram Sang <wsa@kernel.org>
>> Subject: Re: [PATCH v2 1/2] i2c: imx: Don't recover bus when arbitration lost
>>
>> [You don't often get email from dan.scally@ideasonboard.com. Learn why this is
>> important at https://aka.ms/LearnAboutSenderIdentification ]
>>
>> Hello Oleksij / all
>>
>> On 24/04/2026 13:36, Daniel Scally wrote:
>>> In i2c_imx_xfer_common(), the driver attempts bus recovery whenever
>>> i2c_imx_start() fails. One of the failure modes for i2c_imx_start() is
>>> an arbitration-lost signal which results when a second I2C master on
>>> the bus tries to control the bus simultaneously, which is a normal and
>>> expected behaviour.
>>>
>>> Bus recovery is not the right response for this case. Add a check for
>>> the -EAGAIN return code to avoid running the bus recovery.
>>>
>>> Fixes: 1c4b6c3bcf30d ("i2c: imx: implement bus recovery")
>>> Signed-off-by: Daniel Scally <dan.scally@ideasonboard.com>
>>> ---
>>
>> I raised this patch after we had issues with one of the i2c controllers on imx8mp.
>> In that case, the bus had multiple masters that were causing the SoC's i2c
>> controller to lose arbitration. The result was that the framework attempted to
>> run i2c_generic_scl_recovery() and regularly hit the "SCL is stuck low, exit
>> recovery" message [1] because the bus was busy rather than stuck.
>>
>> I'm now experiencing a different issue with the imx8mp in which a different
>> controller - which isn't on a multiple-masters bus - starts transacting fine early in
>> boot, but then seems to get stuck - any attempt to start a transaction by either
>> a driver or i2ctransfer results in the IAL bit in I2C_I2SR being set and so the
>> driver reports that it's lost arbitration [2]. In this case, the bus recovery is
>> needed to fix the problem, and so this commit hurts things rather than helps
>> them. This problem isn't consistent - I get it on maybe 10% of boots.
>>
> Hi Dan,
> 
> This is the RM shows:
> 
> Arbitration lost. Set by hardware in the following circumstances (IAL must be cleared by software by
> writing a "0" to it at the start of the interrupt service routine):
> * I2Cn_SDA input samples low when the master drives high during an address or data-transmit cycle.
> * I2Cn_SDA input samples low when the master drives high during the acknowledge bit of a datareceive
> cycle.
> For the above two cases, the bit is set at the falling edge of the ninth I2Cn_SCL clock during the ACK
> cycle.
> * A Start cycle is attempted when the bus is busy.
> * A Repeated Start cycle is requested in Slave mode.
> * A Stop condition is detected when the master did not request it.
> NOTE: Software cannot set the bit.
> 0 No arbitration lost.
> 1 Arbitration is lost.
> 
>  From my understanding:
> The IAL (Arbitration Lost) bit is set not only when true arbitration is lost, but also in several other conditions:
> 
> - SDA is sampled low when the master drives it high (during address/data or ACK phase)
> - A START is attempted while the bus is busy
> - A STOP condition is detected unexpectedly
> - A repeated START occurs in slave mode
> 
> So in practice, IAL can be asserted not only by real arbitration loss, but also when the controller detects abnormal bus conditions.
> 
> Since your system is single-master, this is unlikely to be a true arbitration scenario. Instead, it is more likely caused by signal integrity or timing-related issues, such as:
> - weak pull-up / slow rising edges
> - noise or glitches on SDA
> - timing violations from the slave device
> - others
> 
> As a workaround, you can enable the 'single-master' property to disable arbitration checks in single-master systems, for example:
> 
> &i2c1 {
>      clock-frequency = <400000>;
>      pinctrl-names = "default", "gpio";
>      pinctrl-0 = <&pinctrl_i2c1>;
>      pinctrl-1 = <&pinctrl_i2c1_gpio>;
>      scl-gpios = <&gpio5 14 (GPIO_ACTIVE_HIGH | GPIO_OPEN_DRAIN)>;
>      sda-gpios = <&gpio5 15 (GPIO_ACTIVE_HIGH | GPIO_OPEN_DRAIN)>;
>      single-master;
>      status = "okay";
> };

Thanks! I'm away from the hardware at the moment but I'll give it a try next week and see if that 
fixes the issue.

Dan

> 
> Hope it will help some.
> 
> Carlos



^ permalink raw reply

* Re: [PATCH v4 11/24] iommu: Add iommu_report_device_broken() to quarantine a broken device
From: Nicolin Chen @ 2026-05-19 18:29 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Will Deacon, Robin Murphy, Joerg Roedel, Bjorn Helgaas,
	Rafael J . Wysocki, Len Brown, Pranjal Shrivastava, Mostafa Saleh,
	Lu Baolu, Kevin Tian, linux-arm-kernel, iommu, linux-kernel,
	linux-acpi, linux-pci, vsethi, Shuai Xue
In-Reply-To: <20260519120737.GQ787748@nvidia.com>

On Tue, May 19, 2026 at 09:07:37AM -0300, Jason Gunthorpe wrote:
> On Mon, May 18, 2026 at 08:38:54PM -0700, Nicolin Chen wrote:
> > +void iommu_report_device_broken(struct device *dev)
> > +{
> > +	struct group_device *gdev;
> > +
> > +	/*
> > +	 * We cannot hold group->mutex here. Rely on iommu_group_broken_worker()
> > +	 * to validate dev_has_iommu(). The iommu_group memory is RCU-protected
> > +	 * via kfree_rcu() in iommu_group_release(), and group->devices is an
> > +	 * RCU-protected list, so the lookup runs entirely under rcu_read_lock.
> > +	 *
> > +	 * Note the device might have been concurrently removed from the group
> > +	 * (list_del_rcu) before iommu_deinit_device() cleared the dev->iommu.
> > +	 */
> > +	rcu_read_lock();
> > +	gdev = __dev_to_gdev_rcu(dev);
> > +	if (gdev) {
> 
> If this is why the RCU is being added it seems like overkill.
> 
> Just add the worker to struct dev_iommu and push it there so it can
> use a mutex but I'm confused why are we even adding this function?
> 
> The entire design of this series was supposed to have the IOMMU driver
> itself adjust it's "STE" to inhibit translated TLPs synchronosly
> within its fully locked invalidation loop.

Yes. Surgical STE is done in the driver. But, core-level attaching
state doesn't reflect correctly. So the driver calls this function
to notify the core (this is in an invalidation context -- not able
to use mutex).

> Whats the async worker for?

Then, the core needs to block the device using the similar routine
to the reset prepare(). And that needs to hold group->mutex, so it
needs an async worker.

Do you see a much simpler way?

Thanks
Nicolin


^ permalink raw reply

* Re: [PATCH 3/3] arm64: dts: imx95: Add iommus property and enable SMMU
From: Frank Li @ 2026-05-19 18:16 UTC (permalink / raw)
  To: Peng Fan (OSS)
  Cc: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Sascha Hauer,
	Pengutronix Kernel Team, Fabio Estevam, devicetree, imx,
	linux-arm-kernel, linux-kernel, Peng Fan
In-Reply-To: <20260409-imx95-s-dts-v1-3-858e83ae1a37@nxp.com>

On Thu, Apr 09, 2026 at 08:00:03PM +0800, Peng Fan (OSS) wrote:
> From: Peng Fan <peng.fan@nxp.com>
>
> Add iommus property for SDHC and EDMA
> Enable SMMU by default.
>
> Signed-off-by: Peng Fan <peng.fan@nxp.com>
> ---

Peng:
	I have to drop this patch because it cause below CHECK_DTB warnings
arch/arm64/boot/dts/freescale/imx95-verdin-wifi-dev.dtb: dma-controller@42210000 (fsl,imx95-edma5): Unevaluated properties are not allowed ('iommus' was unexpected)
	from schema $id: http://devicetree.org/schemas/dma/fsl,edma.yaml


Frank

>  arch/arm64/boot/dts/freescale/imx95.dtsi | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/boot/dts/freescale/imx95.dtsi b/arch/arm64/boot/dts/freescale/imx95.dtsi
> index 3e35c956a4d7af88310b3dfaef7e3d064f530e07..adcc0e1d3696b93250ab97fcac7c181b187d3d10 100644
> --- a/arch/arm64/boot/dts/freescale/imx95.dtsi
> +++ b/arch/arm64/boot/dts/freescale/imx95.dtsi
> @@ -777,6 +777,7 @@ edma3: dma-controller@42210000 {
>  					     <GIC_SPI 287 IRQ_TYPE_LEVEL_HIGH>;
>  				clocks = <&scmi_clk IMX95_CLK_BUSWAKEUP>;
>  				clock-names = "dma";
> +				iommus = <&smmu 0x0>;
>  			};
>
>  			mu7: mailbox@42430000 {
> @@ -1242,6 +1243,7 @@ usdhc1: mmc@42850000 {
>  				bus-width = <8>;
>  				fsl,tuning-start-tap = <1>;
>  				fsl,tuning-step = <2>;
> +				iommus = <&smmu 0x1>;
>  				status = "disabled";
>  			};
>
> @@ -1259,6 +1261,7 @@ usdhc2: mmc@42860000 {
>  				bus-width = <4>;
>  				fsl,tuning-start-tap = <1>;
>  				fsl,tuning-step = <2>;
> +				iommus = <&smmu 0x2>;
>  				status = "disabled";
>  			};
>
> @@ -1276,6 +1279,7 @@ usdhc3: mmc@428b0000 {
>  				bus-width = <4>;
>  				fsl,tuning-start-tap = <1>;
>  				fsl,tuning-step = <2>;
> +				iommus = <&smmu 0x3>;
>  				status = "disabled";
>  			};
>  		};
> @@ -1768,7 +1772,6 @@ smmu: iommu@490d0000 {
>  					     <GIC_SPI 326 IRQ_TYPE_EDGE_RISING>;
>  				interrupt-names = "eventq", "gerror", "priq", "cmdq-sync";
>  				#iommu-cells = <1>;
> -				status = "disabled";
>  			};
>
>  			pmu@490d2000 {
>
> --
> 2.37.1
>


^ permalink raw reply

* Re: [PATCH v5 1/6] iommu/arm-smmu-v3: Add arm_smmu_kdump_adopt_strtab() for kdump
From: Nicolin Chen @ 2026-05-19 18:11 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: will, robin.murphy, kevin.tian, joro, praan, kees, baolu.lu,
	miko.lenczewski, smostafa, linux-arm-kernel, iommu, linux-kernel,
	stable, jamien
In-Reply-To: <20260519171003.GD3602937@nvidia.com>

On Tue, May 19, 2026 at 02:10:03PM -0300, Jason Gunthorpe wrote:
> On Sun, May 10, 2026 at 02:23:00PM -0700, Nicolin Chen wrote:
> 
> > +#include <linux/dma-direct.h>
> 
> Nope, never do this, it is an internal header.

Hmm, I have included it for a wrong reason, yet it does mention
"IOMMU drivers".

/*
 * Internals of the DMA direct mapping implementation.  Only for use by the
 * DMA mapping code and IOMMU drivers.
 */

> > +/*
> > + * Adopting the crashed kernel's stream table has risks: the physical addresses
> > + * read from ARM_SMMU_STRTAB_BASE / L1 descriptors may be corrupted. Reject any
> > + * range that overlaps the kdump kernel's critical regions.
> > + */
> > +static bool arm_smmu_kdump_phys_is_corrupted(phys_addr_t base, size_t size)
[..]
> Something like this should not be in the smmu driver, this is some
> core kdump code. I'd drop it, I don't see other drivers doing this?

OK.

> > +static int arm_smmu_kdump_adopt_l2_strtab(struct arm_smmu_device *smmu, u32 sid,
> > +					  u32 l1_idx, u64 l2_dma, u32 span,
> > +					  struct arm_smmu_strtab_l2 **l2table)
> > +{
> > +	phys_addr_t base = dma_to_phys(smmu->dev, l2_dma);
> 
> The thing stored in the L2PTR is a *phys*, the HW doesn't support any
> kind of translation. When using dma_alloc_coherent we never get a phys
> so it uses the dma_addr_t and assumes it is == phys.
> 
> But on this flow this is *phys* and should remain phys. Never touch
> dma_addr_t.

Fixing that and other places too.
 
> > +static void arm_smmu_kdump_adopt_cleanup(struct arm_smmu_device *smmu, u32 fmt)
> > +{
> > +	struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
> > +
> > +	if (fmt == STRTAB_BASE_CFG_FMT_2LVL) {
> > +		if (cfg->l2.l2ptrs)
> > +			devm_kfree(smmu->dev, cfg->l2.l2ptrs);
> > +		if (!IS_ERR_OR_NULL(cfg->l2.l1tab))
> > +			devm_memunmap(smmu->dev, cfg->l2.l1tab);
> > +	} else if (fmt == STRTAB_BASE_CFG_FMT_LINEAR) {
> > +		if (!IS_ERR_OR_NULL(cfg->linear.table))
> > +			devm_memunmap(smmu->dev, cfg->linear.table);
> > +	}
> > +}
> 
> If we have a cleanup function why is it using devm? Call the cleanup
> function during remove too?

Dropping "devm_"s.

Thanks
Nicolin


^ permalink raw reply

* Re: [PATCH v5 6/6] iommu/arm-smmu-v3: Detect ARM_SMMU_OPT_KDUMP_ADOPT in probe()
From: Jason Gunthorpe @ 2026-05-19 17:58 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: will, robin.murphy, kevin.tian, joro, praan, kees, baolu.lu,
	miko.lenczewski, smostafa, linux-arm-kernel, iommu, linux-kernel,
	stable, jamien
In-Reply-To: <69abcccc388952b2ba0ab4b50c31fcbdac59184a.1778416609.git.nicolinc@nvidia.com>

On Sun, May 10, 2026 at 02:23:05PM -0700, Nicolin Chen wrote:
> arm_smmu_device_hw_probe() runs before arm_smmu_init_structures(), so it's
> natural to decide whether the kdump kernel must adopt the crashed kernel's
> stream table.
> 
> Given that memremap is used to adopt the old stream table, set this option
> only on a coherent SMMU.
> 
> And make sure SMMU isn't in Service Failure Mode.
> 
> Fixes: b63b3439b856 ("iommu/arm-smmu-v3: Abort all transactions if SMMU is enabled in kdump kernel")
> Cc: stable@vger.kernel.org # v6.12+
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 31 +++++++++++++++++++++
>  1 file changed, 31 insertions(+)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason


^ permalink raw reply

* Re: [PATCH v5 4/6] iommu/arm-smmu-v3: Skip EVTQ/PRIQ setup in kdump kernel
From: Jason Gunthorpe @ 2026-05-19 17:45 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: will, robin.murphy, kevin.tian, joro, praan, kees, baolu.lu,
	miko.lenczewski, smostafa, linux-arm-kernel, iommu, linux-kernel,
	stable, jamien
In-Reply-To: <8de5639630e5723d6f371093cef93733f0ca534d.1778416609.git.nicolinc@nvidia.com>

On Sun, May 10, 2026 at 02:23:03PM -0700, Nicolin Chen wrote:
> In kdump cases, the crashed kernel's CDs and page tables can be corrupted,
> which could trigger event spamming. Also, we cannot serve page requests.
> 
> Skip the EVTQ/PRIQ setup entirely rather than enabling then disabling them.
> 
> Also add some inline comments explaining that.
> 
> Fixes: b63b3439b856 ("iommu/arm-smmu-v3: Abort all transactions if SMMU is enabled in kdump kernel")
> Cc: stable@vger.kernel.org # v6.12+
> Suggested-by: Kevin Tian <kevin.tian@intel.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 43 +++++++++++++--------
>  1 file changed, 27 insertions(+), 16 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason


^ permalink raw reply

* Re: [PATCH v5 3/6] iommu/arm-smmu-v3: Suppress EVTQ/PRIQ events in kdump kernel
From: Jason Gunthorpe @ 2026-05-19 17:44 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: will, robin.murphy, kevin.tian, joro, praan, kees, baolu.lu,
	miko.lenczewski, smostafa, linux-arm-kernel, iommu, linux-kernel,
	stable, jamien
In-Reply-To: <6e5828f3288aed6f9e9f4e0ca54e7fbd9f439274.1778416609.git.nicolinc@nvidia.com>

On Sun, May 10, 2026 at 02:23:02PM -0700, Nicolin Chen wrote:
> In kdump cases, the crashed kernel's CDs and page tables can be corrupted,
> which could trigger event spamming. Also, we cannot serve page requests.
> 
> Skip the IRQ setup for EVTQ/PRIQ in arm_smmu_setup_irqs(), and guard the
> thread functions against being entered via a combined-IRQ delivery while
> the queue is disabled.
> 
> Fixes: b63b3439b856 ("iommu/arm-smmu-v3: Abort all transactions if SMMU is enabled in kdump kernel")
> Cc: stable@vger.kernel.org # v6.12+
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 23 +++++++++++++++++++--
>  1 file changed, 21 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 579c8af82d6b6..ebb0826d74541 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -2364,6 +2364,14 @@ static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
>  	static DEFINE_RATELIMIT_STATE(rs, DEFAULT_RATELIMIT_INTERVAL,
>  				      DEFAULT_RATELIMIT_BURST);
>  
> +	/*
> +	 * A combined IRQ might call into this function with the queue disabled.
> +	 * E.g. kdump, where stale HW PROD vs SW CONS would drive a bogus drain
> +	 * and a CONS write to a disabled queue.
> +	 */
> +	if (!(readl_relaxed(smmu->base + ARM_SMMU_CR0) & CR0_EVTQEN))
> +		return IRQ_NONE;

I don't think we should be doing register reads on these paths. 

Why not load a different irq function instead?

Jason


^ permalink raw reply

* Re: [PATCH v5 2/6] iommu/arm-smmu-v3: Implement is_attach_deferred() for kdump
From: Jason Gunthorpe @ 2026-05-19 17:43 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: will, robin.murphy, kevin.tian, joro, praan, kees, baolu.lu,
	miko.lenczewski, smostafa, linux-arm-kernel, iommu, linux-kernel,
	stable, jamien
In-Reply-To: <43fd9986b085cf5bfba2c9bc06c0411693a361e5.1778416609.git.nicolinc@nvidia.com>

On Sun, May 10, 2026 at 02:23:01PM -0700, Nicolin Chen wrote:
> Though the kdump kernel adopts the crashed kernel's stream table, the iommu
> core will still try to attach each probed device to a default domain, which
> overwrites the adopted STE and breaks in-flight DMA from that device.
> 
> Implement an is_attach_deferred() callback to prevent this. For each device
> that has STE.V=1 and STE.Cfg!=Abort in the adopted table, defer the default
> domain attachment, until the device driver explicitly requests it.
> 
> Fixes: b63b3439b856 ("iommu/arm-smmu-v3: Abort all transactions if SMMU is enabled in kdump kernel")
> Cc: stable@vger.kernel.org # v6.12+
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 24 +++++++++++++++++++++
>  1 file changed, 24 insertions(+)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason


^ permalink raw reply

* Re: [PATCH 4/8] drm/panthor: Add support for protected memory allocation in panthor
From: Boris Brezillon @ 2026-05-19 17:29 UTC (permalink / raw)
  To: Chia-I Wu
  Cc: Ketil Johnsen, Liviu Dudau, Marcin Ślusarz, David Airlie,
	Simona Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Jonathan Corbet, Shuah Khan, Sumit Semwal,
	Benjamin Gaignard, Brian Starkey, John Stultz, T.J. Mercier,
	Christian König, Steven Price, Daniel Almeida, Alice Ryhl,
	Matthias Brugger, AngeloGioacchino Del Regno, dri-devel,
	linux-doc, linux-kernel, linux-media, linaro-mm-sig,
	linux-arm-kernel, linux-mediatek, Florent Tomasin, nd
In-Reply-To: <CAPaKu7T7JZRmsS+D_3zFZtyhJk9mNXjL=xpAQ-UNGbm0vztyRg@mail.gmail.com>

On Tue, 19 May 2026 10:07:02 -0700
Chia-I Wu <olvaffe@gmail.com> wrote:

> On Tue, May 19, 2026 at 1:49 AM Ketil Johnsen <ketil.johnsen@arm.com> wrote:
> >
> > On 19/05/2026 09:39, Boris Brezillon wrote:  
> > > On Mon, 18 May 2026 17:36:40 -0700
> > > Chia-I Wu <olvaffe@gmail.com> wrote:
> > >  
> > >> On Mon, May 18, 2026 at 12:16 AM Boris Brezillon
> > >> <boris.brezillon@collabora.com> wrote:  
> > >>>
> > >>> On Wed, 13 May 2026 12:31:32 -0700
> > >>> Chia-I Wu <olvaffe@gmail.com> wrote:
> > >>>  
> > >>>> On Tue, May 12, 2026 at 8:39 AM Liviu Dudau <liviu.dudau@arm.com> wrote:  
> > >>>>>
> > >>>>> On Tue, May 12, 2026 at 04:11:11PM +0200, Boris Brezillon wrote:  
> > >>>>>> On Tue, 12 May 2026 14:47:27 +0100
> > >>>>>> Liviu Dudau <liviu.dudau@arm.com> wrote:
> > >>>>>>  
> > >>>>>>> On Thu, May 07, 2026 at 01:53:56PM +0200, Boris Brezillon wrote:  
> > >>>>>>>> On Thu, 7 May 2026 11:02:26 +0200
> > >>>>>>>> Marcin Ślusarz <marcin.slusarz@arm.com> wrote:
> > >>>>>>>>  
> > >>>>>>>>> On Tue, May 05, 2026 at 06:15:23PM +0200, Boris Brezillon wrote:  
> > >>>>>>>>>>> @@ -277,9 +286,21 @@ int panthor_device_init(struct panthor_device *ptdev)
> > >>>>>>>>>>>                      return ret;
> > >>>>>>>>>>>      }
> > >>>>>>>>>>>
> > >>>>>>>>>>> +   /* If a protected heap name is specified but not found, defer the probe until created */
> > >>>>>>>>>>> +   if (protected_heap_name && strlen(protected_heap_name)) {  
> > >>>>>>>>>>
> > >>>>>>>>>> Do we really need this strlen() > 0? Won't dma_heap_find() fail is the
> > >>>>>>>>>> name is "" already?  
> > >>>>>>>>>
> > >>>>>>>>> If dma_heap_find() will fail, then the whole probe with fail too.
> > >>>>>>>>> This check prevents that.  
> > >>>>>>>>
> > >>>>>>>> Yeah, that's also a questionable design choice. I mean, we can
> > >>>>>>>> currently probe and boot the FW even though we never setup the
> > >>>>>>>> protected FW sections, so why should we defer the probe here? Can't we
> > >>>>>>>> just retry the next time a group with the protected bit is created and
> > >>>>>>>> fail if we can find a protected heap?  
> > >>>>>>>
> > >>>>>>> The problem we have with the current firmware is that it does a number of setup steps at "boot"
> > >>>>>>> time only. One of the steps is preparing its internal structures for when it enters protected
> > >>>>>>> mode and it stores them in the buffer passed in at firmware loading. We cannot later run the
> > >>>>>>> process when we have a group with protected mode set.  
> > >>>>>>
> > >>>>>> No, but we can force a full/slow reset and have that thing
> > >>>>>> re-initialized, can't we? I mean, that's basically what we do when a
> > >>>>>> fast reset fails: we re-initialize all the sections and reset again, at
> > >>>>>> which point the FW should start from a fresh state, and be able to
> > >>>>>> properly initialize the protected-related stuff if protected sections
> > >>>>>> are populated. Am I missing something?  
> > >>>>>
> > >>>>> Right, we can do that. For some reason I keep associating the reset with the
> > >>>>> error handling and not with "normal" operations.  
> > >>>> I kind of hope we end up with either
> > >>>>
> > >>>>   - panthor knows the exact heap to use and fails with EPROBE_DEFER if
> > >>>> the heap is missing, or
> > >>>>   - panthor gets a dma-buf from userspace and does the full reset
> > >>>>     - userspace also needs to provide a dma-buf for each protected
> > >>>> group for the suspend buffer
> > >>>>
> > >>>> than something in-between. The latter is more ad-hoc and basically
> > >>>> kicks the issue to the userspace.  
> > >>>
> > >>> Indeed, the second option is more ad-hoc, but when you think about it,
> > >>> userspace has to have this knowledge, because it needs to know the
> > >>> dma-heap to use for buffer allocation that cross a device boundary
> > >>> anyway. Think about frames produced by a video decoder, and composited
> > >>> by the GPU into a protected scanout buffer that's passed to the KMS
> > >>> device. Why would the GPU driver be source of truth when it comes to
> > >>> choosing the heap to use to allocate protected buffers for the video
> > >>> decoder or those used for the display?  
> > >> I don't think the GPU driver is ever the source of truth. If the
> > >> system integrator wants to specify the source of truth (SoT) from
> > >> kernel space, they should use the device tree (or module params /
> > >> config options). If they want to specify the SoT in userspace, then we
> > >> don't really care how it is done other than providing an ioctl.
> > >> Panthor is always on the receiving end.  
> > >
> > > Okay, we're on the same page then.
> > >  
> > >>
> > >> If we don't want to delay this functionality, but it takes time to
> > >> converge on SoT, maybe a solution that is not a long-term promise can
> > >> work? Of the options on the table (dt, module params, kconfig options,
> > >> ioctls), a kconfig option, potentially marked as experimental, seems
> > >> like a good candidate.  
> > >
> > > If Panthor is only a consumer, I actually think it'd be easier to just
> > > let userspace pass the protected FW section as an imported buffer
> > > through an ioctl for now. It means we don't need any of the
> > > modifications to the dma_heap API in this series, and userspace is free
> > > to choose its SoT (efuse, DT, ...) and pass the info back to mesa/GBM
> > > somehow (envvar, driconf, ...). The only thing we need to ensure is if
> > > lazy protected FW section allocation is going to work, but given the
> > > current code purely and simply ignores those sections, and the FW is
> > > still able to boot and act properly (at least on v10-v13), I'm pretty
> > > confident this is okay, unless there's some trick the MCU can do to
> > > detect that the protected section isn't mapped (which I doubt, because
> > > the MCU doesn't know it lives behind an MMU).  
> I set up MMU to map non-protected memory to the protected section the
> other day. The FW still booted fine. I didn't get access violation
> until the FW executed PROT_REGION and panthor requested
> GLB_PROTM_ENTER in response.

Ah, thanks for testing! We still don't have a setup with proper
protected heap, but that was on my list of things to test.

> 
> This was on v13, but I also doubt it will become an issue. Can ARM help clarify?
> 
> > >
> > > Of course, once we have a consensus on how to describe this in the DT,
> > > we can switch Panthor over to "protected dma_heap selection through DT",
> > > and reflect that through the ioctl that exposes whether protected
> > > support is ready or not (would be a DEV_QUERY), such that userspace can
> > > skip this "PROTM initialization" step.
> > >
> > > We're talking about an extra ioctl to set those buffers, and a
> > > DEV_QUERY to query the state (ready or not), the size of the global
> > > protected buffer (protected FW section) and the size of the protected
> > > suspend buffer. The protected suspend buffer would be allocated and
> > > passed at group creation time (extra arg passed to the existing
> > > GROUP_CREATE ioctl). So, overall, I don't consider it a huge liability
> > > in term of maintenance cost.  
> >
> > If we can avoid the dma-heap changes, then that would surely help!
> > I can try to implement this in the next version unless someone finds a
> > reason why it is a bad idea.  
> Yeah, that sounds good to me too.
> 
> Will the extra ioctl require root?

The PROTM_INIT ioctl will certainly require high privilege
CAP_SYS_<something>, dunno yet what that <something> would be though.

> On a system with true protected
> memory, the FW cannot write to non-protected memory. It seems ok to
> allow any client to make the ioctl call. But on systems without true
> protected memory, it can be problematic.

Yep, I agree we shouldn't let random users pretend they initialized
protected mode if the system as a whole doesn't have proper the proper
bit hooked up to set that up.


^ permalink raw reply

* Re: [PATCH v7 0/4] PCI: Add support for resetting the Root Ports in a platform specific way
From: Niklas Cassel @ 2026-05-19 17:19 UTC (permalink / raw)
  To: Manivannan Sadhasivam
  Cc: manivannan.sadhasivam, Bjorn Helgaas, Mahesh J Salgaonkar,
	Oliver O'Halloran, Will Deacon, Lorenzo Pieralisi,
	Krzysztof Wilczyński, Rob Herring, Heiko Stuebner,
	Philipp Zabel, linux-pci, linux-kernel, linuxppc-dev,
	linux-arm-kernel, linux-arm-msm, linux-rockchip, Wilfred Mallawa,
	Krishna Chaitanya Chundru, Lukas Wunner, Richard Zhu,
	Brian Norris, Wilson Ding, Frank Li
In-Reply-To: <tgsh3cum6qxrqjzbdeqjsp6bf7cqedj7il77hww3oxecadndin@idjnwib7cz4z>

Hello Mani,

On Mon, May 18, 2026 at 11:51:56AM +0530, Manivannan Sadhasivam wrote:
> > 
> > With the patch above. There is zero difference before/after reset, and all
> > the BAR tests pass. However, MSI/MSI-X tests still fail with:
> > 
> > # pci_endpoint_test.c:143:MSI_TEST:Expected 0 (0) == ret (-110) 
> > # pci_endpoint_test.c:143:MSI_TEST:Test failed for MSI1
> > 
> > ETIMEDOUT.
> > 
> > This suggests that pci_endpoint_test on the host side did not receive an
> > interrupt.
> > 
> > I don't know why, but considering that lspci output is now (with the
> > save+restore) identical, I assume that the problem is not related to
> > the host. Unless somehow the host will use a new/different MSI address
> > after the root port has been reset, and we restore the old MSI address,
> > but looking at the code, dw_pcie_msi_init() is called by
> > dw_pcie_setup_rc(), so I would expect the MSI address to be the same.
> > 
> 
> Hi Niklas,
> 
> When I rebased this series on top of v7.1-rc1, I ended up seeing the issue what
> you described here (not sure why I didn't see it earlier). So after the Root
> Port reset, MSI tests fail, but BAR tests succeed. Also, I got IOMMU faults on
> the host after endpoint triggers MSI.
> 
> I investigated it and found that the MSI iATU mapping gets cleared in hw after
> LDn happens. But the host continues to use the same address/size for the
> endpoint MSI even after reset. Due to this, the existing checks in
> dw_pcie_ep_raise_msi_irq() don't pass and the stale MSI iATU mapping gets
> reused.
> 
> The fix would be to clear the mapping in dw_pcie_ep_cleanup(), which gets called
> as part of the PERST# assert/deassert sequence post LDn and also set
> msi_iatu_mapped flag to 'false'. This will force dw_pcie_ep_raise_msi_irq() to
> use fresh iATU mapping when it gets called for the first time:
> 
> diff --git a/drivers/pci/controller/dwc/pcie-designware-ep.c b/drivers/pci/controller/dwc/pcie-designware-ep.c
> index d4dc3b24da60..4ae0e1b55f39 100644
> --- a/drivers/pci/controller/dwc/pcie-designware-ep.c
> +++ b/drivers/pci/controller/dwc/pcie-designware-ep.c
> @@ -1035,6 +1035,11 @@ void dw_pcie_ep_cleanup(struct dw_pcie_ep *ep)
>  {
>         struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
>  
> +       if (ep->msi_iatu_mapped) {
> +               dw_pcie_ep_unmap_addr(ep->epc, 0, 0, ep->msi_mem_phys);
> +               ep->msi_iatu_mapped = false;
> +       }
> +
>         dwc_pcie_debugfs_deinit(pci);
>         dw_pcie_edma_remove(pci);
>  }
> 
> With this change, MSI works after Root Port reset without any issues on our Qcom
> endpoint/host setup.
> 
> Please test this change on your rockchip setup as well. You have to make sure
> that dw_pcie_ep_cleanup() is called during PERST# assert/deassert.
> 
> I'm going to respin the series with this fix. If you confirm it works for you,
> then we can merge your Rockchip Root Port change.

I am happy to hear that you managed to find the root cause!

Hopefully your series can finally move forward :)

While e.g. RK3588 does have a PERST# input GPIO, so it could theoretically
add a perst_deassert()/assert() function. However, when the EPC support was
added, you did not want that, since I remember that you said that you only
wanted that for drivers that required an external refclock.

Thus, for drivers that do not require an external refclock, should we
perhaps add your suggested code in dw_pcie_ep_linkdown()?

E.g. pcie-tegra194.c does not call dw_pcie_ep_linkdown(), so I'm not
sure if we can simply move it from dw_pcie_ep_cleanup() to
dw_pcie_ep_linkdown() either...

Perhaps we need the code in both functions?

(pcie-qcom-ep.c seems to be the only function that will call both
dw_pcie_ep_linkdown() and dw_pcie_ep_cleanup().)


Kind regards,
Niklas


^ permalink raw reply

* Re: [PATCH] spi: aspeed: Replace VLA parameter with flat pointer in calibration helper
From: David Laight @ 2026-05-19 17:13 UTC (permalink / raw)
  To: Mark Brown
  Cc: Chin-Ting Kuo, clg, joel, andrew, linux-aspeed, openbmc,
	linux-spi, linux-arm-kernel, linux-kernel, BMC-SW,
	kernel test robot
In-Reply-To: <659a6593-0223-4a26-830b-1390326b84e5@sirena.org.uk>

On Tue, 19 May 2026 12:03:51 +0100
Mark Brown <broonie@kernel.org> wrote:

> On Mon, May 18, 2026 at 05:57:08PM +0800, Chin-Ting Kuo wrote:
> 
> > -			while (k < cols && buf[i][k])
> > +			while (k < cols && buf[i * cols + k])  
> 
> This really needs () to make it clear what's going on; the precedence is
> well defined but not everyone is going to know that off the top of their
> head.

Come on, it's multiply and add - everyone is going to get that right.

-- David


^ permalink raw reply

* Re: [PATCH 1/8] mm: Add ptep_try_install() for lockless empty-slot installs
From: Tejun Heo @ 2026-05-19 17:11 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: David Vernet, Andrea Righi, Changwoo Min, Alexei Starovoitov,
	Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
	Kumar Kartikeya Dwivedi, Catalin Marinas, Will Deacon,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	Andrew Morton, Mike Rapoport, Emil Tsalapatis, sched-ext, bpf,
	x86, linux-arm-kernel, linux-mm, linux-kernel
In-Reply-To: <5590fd3d-dae1-4070-b52f-bc40982ac678@kernel.org>

On Tue, May 19, 2026 at 11:40:48AM +0200, David Hildenbrand (Arm) wrote:
> On 5/19/26 11:05, David Hildenbrand (Arm) wrote:
...
> >> The only requirements are that the kernel doesn't oops and the
> >> violation gets caught. Beyond that, behavior at the address is
> >> unspecified, and which installer wins the race doesn't matter as
> >> long as kernel integrity holds.
> > 
> > You'll have inconsistent TLB state.

Wouldn't it still be either or with both cases being okay?

> > I really don't like that approach.
> > 
> > We should really try to just take the lock, and remove any code under the lock
> > that could trigger such unpleasant deadlocks.
> > 
> > Is that feasible?
> 
> ... or can we run into similar problems with kprobes? (I am obviously no bpf
> expert ...)

Yeah, I mean, that was just the first TP I found scanning the code. Any
kprobes or other TPs in the path would behave the same.

When this fault triggers, the BPF program has already malfunctioned, so it's
not going to be a high frequency path and performance isn't a primary
consideration. So, anything that can ensure that the kernel doesn't crash or
lock up would be fine. Any better ideas?

Thanks.

-- 
tejun


^ permalink raw reply

* Re: [PATCH v5 1/6] iommu/arm-smmu-v3: Add arm_smmu_kdump_adopt_strtab() for kdump
From: Jason Gunthorpe @ 2026-05-19 17:10 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: will, robin.murphy, kevin.tian, joro, praan, kees, baolu.lu,
	miko.lenczewski, smostafa, linux-arm-kernel, iommu, linux-kernel,
	stable, jamien
In-Reply-To: <0582326eeadd4ae2b16fd4914e9bd46da5a251d3.1778416609.git.nicolinc@nvidia.com>

On Sun, May 10, 2026 at 02:23:00PM -0700, Nicolin Chen wrote:

> +#include <linux/dma-direct.h>

Nope, never do this, it is an internal header.

> +/*
> + * Adopting the crashed kernel's stream table has risks: the physical addresses
> + * read from ARM_SMMU_STRTAB_BASE / L1 descriptors may be corrupted. Reject any
> + * range that overlaps the kdump kernel's critical regions.
> + */
> +static bool arm_smmu_kdump_phys_is_corrupted(phys_addr_t base, size_t size)
> +{
> +	/*
> +	 * On arm64 kdump, iomem_resource entries are typically:
> +	 * ------------------------------------------------------------
> +	 * | Entry           | IORESOURCE_ Flags   | IORES_DESC_ Desc |
> +	 * ------------------------------------------------------------
> +	 * | System RAM      | MEM + BUSY + SYSRAM | NONE             |
> +	 * | MMIO regions    | MEM + BUSY          | NONE             |
> +	 * | Reserved memory | MEM                 | NONE             |
> +	 * ------------------------------------------------------------
> +	 *
> +	 * Test and reject any overlap with MEM + BUSY, covering/excluding:
> +	 *  + System RAM: silent corruption of kdump kernel's own memory
> +	 *  + MMIO regions: fatal SError on cacheable speculative access
> +	 *  - Reserved memory: crashed kernel's stream table might reside
> +	 */
> +	if (region_intersects(base, size, IORESOURCE_MEM | IORESOURCE_BUSY,
> +			      IORES_DESC_NONE) != REGION_DISJOINT)
> +		return true;
> +
> +	/*
> +	 * Note: physical holes are absent from iomem_resource, so a corrupted
> +	 * address pointing into one will not be caught here. Closing that gap
> +	 * requires a firmware memory map and is left as a future improvement.
> +	 */
> +	return false;
> +}

Something like this should not be in the smmu driver, this is some
core kdump code. I'd drop it, I don't see other drivers doing this?


> +static int arm_smmu_kdump_adopt_l2_strtab(struct arm_smmu_device *smmu, u32 sid,
> +					  u32 l1_idx, u64 l2_dma, u32 span,
> +					  struct arm_smmu_strtab_l2 **l2table)
> +{
> +	phys_addr_t base = dma_to_phys(smmu->dev, l2_dma);

The thing stored in the L2PTR is a *phys*, the HW doesn't support any
kind of translation. When using dma_alloc_coherent we never get a phys
so it uses the dma_addr_t and assumes it is == phys.

But on this flow this is *phys* and should remain phys. Never touch
dma_addr_t.


> +	struct arm_smmu_strtab_l2 *table;
> +	size_t size;
> +
> +	/*
> +	 * Only a coherent SMMU is supported at this moment. For a non-coherent
> +	 * SMMU that wants to support ARM_SMMU_OPT_KDUMP_ADOPT, try MEMREMAP_WC.
> +	 */
> +	if (WARN_ON(!(smmu->features & ARM_SMMU_FEAT_COHERENCY)))
> +		return -EOPNOTSUPP;
> +
> +	/*
> +	 * Retest the memremap inputs in case the L1 descriptor was overwritten
> +	 * since adopt. Reject this master's insert; panic or SMMU-disable would
> +	 * either lose the vmcore or cascade aborts. Do not try to fix it, as it
> +	 * would break all other SIDs in the same bus (PCI case). The corruption
> +	 * blast radius is already bounded to that bus range.
> +	 */
> +	if (span != STRTAB_SPLIT + 1) {
> +		dev_err(smmu->dev,
> +			"kdump: L1[%u] span %u changed since adopt (was %u)\n",
> +			l1_idx, span, STRTAB_SPLIT + 1);
> +		return -EINVAL;
> +	}

>  static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
>  {
>  	dma_addr_t l2ptr_dma;
>  	struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
>  	struct arm_smmu_strtab_l2 **l2table;
> +	u32 l1_idx = arm_smmu_strtab_l1_idx(sid);
>  
> -	l2table = &cfg->l2.l2ptrs[arm_smmu_strtab_l1_idx(sid)];
> +	l2table = &cfg->l2.l2ptrs[l1_idx];
>  	if (*l2table)
>  		return 0;
>  
> +	/* Deferred adoption of the crashed kernel's L2 table */
> +	if (smmu->options & ARM_SMMU_OPT_KDUMP_ADOPT) {
> +		u64 l2ptr = le64_to_cpu(cfg->l2.l1tab[l1_idx].l2ptr);
> +		dma_addr_t l2_dma = l2ptr & STRTAB_L1_DESC_L2PTR_MASK;

Like here, this should by phys_addr_t

> +static int arm_smmu_kdump_adopt_strtab_2lvl(struct arm_smmu_device *smmu,
> +					    u32 cfg_reg, dma_addr_t dma)

Same issues with dma_addr_t

> +static int arm_smmu_kdump_adopt_strtab_linear(struct arm_smmu_device *smmu,
> +					      u32 cfg_reg, dma_addr_t dma)
> +{

Same issues with dma_addr_t

> +static void arm_smmu_kdump_adopt_cleanup(struct arm_smmu_device *smmu, u32 fmt)
> +{
> +	struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
> +
> +	if (fmt == STRTAB_BASE_CFG_FMT_2LVL) {
> +		if (cfg->l2.l2ptrs)
> +			devm_kfree(smmu->dev, cfg->l2.l2ptrs);
> +		if (!IS_ERR_OR_NULL(cfg->l2.l1tab))
> +			devm_memunmap(smmu->dev, cfg->l2.l1tab);
> +	} else if (fmt == STRTAB_BASE_CFG_FMT_LINEAR) {
> +		if (!IS_ERR_OR_NULL(cfg->linear.table))
> +			devm_memunmap(smmu->dev, cfg->linear.table);
> +	}
> +}

If we have a cleanup function why is it using devm? Call the cleanup
function during remove too?

Jason


^ permalink raw reply

* Re: [PATCH 4/8] drm/panthor: Add support for protected memory allocation in panthor
From: Chia-I Wu @ 2026-05-19 17:07 UTC (permalink / raw)
  To: Ketil Johnsen
  Cc: Boris Brezillon, Liviu Dudau, Marcin Ślusarz, David Airlie,
	Simona Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Jonathan Corbet, Shuah Khan, Sumit Semwal,
	Benjamin Gaignard, Brian Starkey, John Stultz, T.J. Mercier,
	Christian König, Steven Price, Daniel Almeida, Alice Ryhl,
	Matthias Brugger, AngeloGioacchino Del Regno, dri-devel,
	linux-doc, linux-kernel, linux-media, linaro-mm-sig,
	linux-arm-kernel, linux-mediatek, Florent Tomasin, nd
In-Reply-To: <8f0b1750-a853-4895-9672-73a75f6dbd84@arm.com>

On Tue, May 19, 2026 at 1:49 AM Ketil Johnsen <ketil.johnsen@arm.com> wrote:
>
> On 19/05/2026 09:39, Boris Brezillon wrote:
> > On Mon, 18 May 2026 17:36:40 -0700
> > Chia-I Wu <olvaffe@gmail.com> wrote:
> >
> >> On Mon, May 18, 2026 at 12:16 AM Boris Brezillon
> >> <boris.brezillon@collabora.com> wrote:
> >>>
> >>> On Wed, 13 May 2026 12:31:32 -0700
> >>> Chia-I Wu <olvaffe@gmail.com> wrote:
> >>>
> >>>> On Tue, May 12, 2026 at 8:39 AM Liviu Dudau <liviu.dudau@arm.com> wrote:
> >>>>>
> >>>>> On Tue, May 12, 2026 at 04:11:11PM +0200, Boris Brezillon wrote:
> >>>>>> On Tue, 12 May 2026 14:47:27 +0100
> >>>>>> Liviu Dudau <liviu.dudau@arm.com> wrote:
> >>>>>>
> >>>>>>> On Thu, May 07, 2026 at 01:53:56PM +0200, Boris Brezillon wrote:
> >>>>>>>> On Thu, 7 May 2026 11:02:26 +0200
> >>>>>>>> Marcin Ślusarz <marcin.slusarz@arm.com> wrote:
> >>>>>>>>
> >>>>>>>>> On Tue, May 05, 2026 at 06:15:23PM +0200, Boris Brezillon wrote:
> >>>>>>>>>>> @@ -277,9 +286,21 @@ int panthor_device_init(struct panthor_device *ptdev)
> >>>>>>>>>>>                      return ret;
> >>>>>>>>>>>      }
> >>>>>>>>>>>
> >>>>>>>>>>> +   /* If a protected heap name is specified but not found, defer the probe until created */
> >>>>>>>>>>> +   if (protected_heap_name && strlen(protected_heap_name)) {
> >>>>>>>>>>
> >>>>>>>>>> Do we really need this strlen() > 0? Won't dma_heap_find() fail is the
> >>>>>>>>>> name is "" already?
> >>>>>>>>>
> >>>>>>>>> If dma_heap_find() will fail, then the whole probe with fail too.
> >>>>>>>>> This check prevents that.
> >>>>>>>>
> >>>>>>>> Yeah, that's also a questionable design choice. I mean, we can
> >>>>>>>> currently probe and boot the FW even though we never setup the
> >>>>>>>> protected FW sections, so why should we defer the probe here? Can't we
> >>>>>>>> just retry the next time a group with the protected bit is created and
> >>>>>>>> fail if we can find a protected heap?
> >>>>>>>
> >>>>>>> The problem we have with the current firmware is that it does a number of setup steps at "boot"
> >>>>>>> time only. One of the steps is preparing its internal structures for when it enters protected
> >>>>>>> mode and it stores them in the buffer passed in at firmware loading. We cannot later run the
> >>>>>>> process when we have a group with protected mode set.
> >>>>>>
> >>>>>> No, but we can force a full/slow reset and have that thing
> >>>>>> re-initialized, can't we? I mean, that's basically what we do when a
> >>>>>> fast reset fails: we re-initialize all the sections and reset again, at
> >>>>>> which point the FW should start from a fresh state, and be able to
> >>>>>> properly initialize the protected-related stuff if protected sections
> >>>>>> are populated. Am I missing something?
> >>>>>
> >>>>> Right, we can do that. For some reason I keep associating the reset with the
> >>>>> error handling and not with "normal" operations.
> >>>> I kind of hope we end up with either
> >>>>
> >>>>   - panthor knows the exact heap to use and fails with EPROBE_DEFER if
> >>>> the heap is missing, or
> >>>>   - panthor gets a dma-buf from userspace and does the full reset
> >>>>     - userspace also needs to provide a dma-buf for each protected
> >>>> group for the suspend buffer
> >>>>
> >>>> than something in-between. The latter is more ad-hoc and basically
> >>>> kicks the issue to the userspace.
> >>>
> >>> Indeed, the second option is more ad-hoc, but when you think about it,
> >>> userspace has to have this knowledge, because it needs to know the
> >>> dma-heap to use for buffer allocation that cross a device boundary
> >>> anyway. Think about frames produced by a video decoder, and composited
> >>> by the GPU into a protected scanout buffer that's passed to the KMS
> >>> device. Why would the GPU driver be source of truth when it comes to
> >>> choosing the heap to use to allocate protected buffers for the video
> >>> decoder or those used for the display?
> >> I don't think the GPU driver is ever the source of truth. If the
> >> system integrator wants to specify the source of truth (SoT) from
> >> kernel space, they should use the device tree (or module params /
> >> config options). If they want to specify the SoT in userspace, then we
> >> don't really care how it is done other than providing an ioctl.
> >> Panthor is always on the receiving end.
> >
> > Okay, we're on the same page then.
> >
> >>
> >> If we don't want to delay this functionality, but it takes time to
> >> converge on SoT, maybe a solution that is not a long-term promise can
> >> work? Of the options on the table (dt, module params, kconfig options,
> >> ioctls), a kconfig option, potentially marked as experimental, seems
> >> like a good candidate.
> >
> > If Panthor is only a consumer, I actually think it'd be easier to just
> > let userspace pass the protected FW section as an imported buffer
> > through an ioctl for now. It means we don't need any of the
> > modifications to the dma_heap API in this series, and userspace is free
> > to choose its SoT (efuse, DT, ...) and pass the info back to mesa/GBM
> > somehow (envvar, driconf, ...). The only thing we need to ensure is if
> > lazy protected FW section allocation is going to work, but given the
> > current code purely and simply ignores those sections, and the FW is
> > still able to boot and act properly (at least on v10-v13), I'm pretty
> > confident this is okay, unless there's some trick the MCU can do to
> > detect that the protected section isn't mapped (which I doubt, because
> > the MCU doesn't know it lives behind an MMU).
I set up MMU to map non-protected memory to the protected section the
other day. The FW still booted fine. I didn't get access violation
until the FW executed PROT_REGION and panthor requested
GLB_PROTM_ENTER in response.

This was on v13, but I also doubt it will become an issue. Can ARM help clarify?

> >
> > Of course, once we have a consensus on how to describe this in the DT,
> > we can switch Panthor over to "protected dma_heap selection through DT",
> > and reflect that through the ioctl that exposes whether protected
> > support is ready or not (would be a DEV_QUERY), such that userspace can
> > skip this "PROTM initialization" step.
> >
> > We're talking about an extra ioctl to set those buffers, and a
> > DEV_QUERY to query the state (ready or not), the size of the global
> > protected buffer (protected FW section) and the size of the protected
> > suspend buffer. The protected suspend buffer would be allocated and
> > passed at group creation time (extra arg passed to the existing
> > GROUP_CREATE ioctl). So, overall, I don't consider it a huge liability
> > in term of maintenance cost.
>
> If we can avoid the dma-heap changes, then that would surely help!
> I can try to implement this in the next version unless someone finds a
> reason why it is a bad idea.
Yeah, that sounds good to me too.

Will the extra ioctl require root? On a system with true protected
memory, the FW cannot write to non-protected memory. It seems ok to
allow any client to make the ioctl call. But on systems without true
protected memory, it can be problematic.

>
> >>>> For the former, expressing the relation in DT seems to be the best,
> >>>> but only if possible :-). Otherwise, a kconfig option (instead of
> >>>> module param) should be easier to work with.
> >>>>
> >>>> Looking at the userspace implementation, can we also have an panthor
> >>>> ioctl to return the heap to userspace?
> >>>
> >>> Yes, it's something we can add, but again, I'm questioning the
> >>> usefulness of this: how can we ensure the heap used by panthor to
> >>> allocate its protected FW buffers is suitable for scanout buffers
> >>> (buffers that can be used by display drivers). There needs to be a glue
> >>> leaving in usersland and taking the decision, and I'm not too sure
> >>> trusting any of the component in the chain (vdec, gpu, display) is the
> >>> right thing to do.
> >> The heap returned by panthor is only for panfrost/panvk. It says
> >> nothing about compatibility with other components on the system.
> >
> > Okay, if it's used only for internal buffers, I guess that's fine.
>
> --
> Ketil


^ permalink raw reply

* Re: [PATCH v6 00/19] dmaengine: ti: Add support for BCDMA v2 and PKTDMA v2
From: Vinod Koul @ 2026-05-19 17:06 UTC (permalink / raw)
  To: Sai Sree Kartheek Adivi
  Cc: peter.ujfalusi, robh, krzk+dt, conor+dt, nm, ssantosh, dmaengine,
	devicetree, linux-kernel, linux-arm-kernel, vigneshr, Frank.li,
	r-sharma3, gehariprasath
In-Reply-To: <20260428085202.1724548-1-s-adivi@ti.com>

On 28-04-26, 14:21, Sai Sree Kartheek Adivi wrote:
> This series adds support for the BCDMA_V2 and PKTDMA_V2 which is
> introduced in AM62L.
> 
> The key differences between the existing DMA and DMA V2 are:
> - Absence of TISCI: Instead of configuring via TISCI calls, direct
>   register writes are required.
> - Autopair: There is no longer a need for PSIL pair and instead AUTOPAIR
>   bit needs to set in the RT_CTL register.
> - Static channel mapping: Each channel is mapped to a single peripheral.
> - Direct IRQs: There is no INT-A and interrupt lines from DMA are
>   directly connected to GIC.
> - Remote side configuration handled by DMA. So no need to write to PEER
>   registers to START / STOP / PAUSE / TEARDOWN.
> - Unified Channel Space: Tx and Rx channels share a single register
>   space. Each channel index is specifically fixed in hardware as either
>   Tx or Rx in an interleaved manner.

Please check the commments from Sashiko https://sashiko.dev/#/patchset/20260428085202.1724548-1-s-adivi%40ti.com

-- 
~Vinod


^ permalink raw reply

* Re: [PATCH] arm64: probes: Handle probes on hinted conditional branch instructions
From: Catalin Marinas @ 2026-05-19 17:05 UTC (permalink / raw)
  To: linux-arm-kernel, Vladimir Murzin; +Cc: Will Deacon
In-Reply-To: <20260515133729.112196-1-vladimir.murzin@arm.com>

On Fri, 15 May 2026 14:37:29 +0100, Vladimir Murzin wrote:
> BC.cond instructions introduced by FEAT_HBC cannot be executed
> out-of-line, like other branch instructions. However, they can be
> simulated in the same way as B.cond instructions.
> 
> Extend the B.cond decoder mask to match BC.cond instructions as well,
> and handle them using the existing B.cond simulation path.
> 
> [...]

Applied to arm64 (for-next/fixes), thanks!

[1/1] arm64: probes: Handle probes on hinted conditional branch instructions
      https://git.kernel.org/arm64/c/2ccd8ff980b5


^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox