* [PATCH v1 00/12] Introduce nova-core mm prerequisites
From: Joel Fernandes @ 2026-05-18 18:03 UTC
To: linux-kernel
Cc: Miguel Ojeda, Boqun Feng, Gary Guo, Bjorn Roy Baron, Benno Lossin,
Andreas Hindborg, Alice Ryhl, Trevor Gross, Danilo Krummrich,
Dave Airlie, Daniel Almeida, dri-devel, rust-for-linux, nova-gpu,
Nikola Djukic, David Airlie, Boqun Feng, John Hubbard,
Alistair Popple, Timur Tabi, Edwin Peer, Alexandre Courbot,
Andrea Righi, Andy Ritger, Zhi Wang, Balbir Singh,
Philipp Stanner, alexeyi, Eliot Courtney, joel, linux-doc,
Joel Fernandes
This series introduces the prerequisite memory-management infrastructure for
the nova-core driver: a centralized GpuMm manager, types for addressing VRAM
(Pfn, VramAddress), the PRAMIN aperture for indirect VRAM access from the CPU,
and the GSP plumbing that surfaces the usable FB region and total VRAM extent
at boot. It also picks up two small Rust enablers (pci::Bar::resource_flags()
and a cast+shift accessor form of bitfield!) that the rest of the nova-core
mm code relies on.
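The split between frame numbers and byte addresses, and the idea of reaching VRAM through a sliding PRAMIN window, can be illustrated with a small C model. This is a hedged sketch, not the nova-core Rust code. It assumes a 1 MiB aperture at BAR0 offset 0x700000, a window base programmed in 64 KiB units, and 4 KiB frames, based on how earlier NVIDIA GPUs expose PRAMIN. All type and function names (vram_addr, pfn_num, pramin_locate()) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative assumptions, not values taken from the nova-core patches:
 * a 1 MiB PRAMIN aperture at BAR0 offset 0x700000 whose VRAM base is
 * programmed in 64 KiB units, and 4 KiB physical frames.
 */
#define FRAME_SHIFT          12
#define PRAMIN_BAR0_OFFSET   0x700000u
#define PRAMIN_SIZE          (1u << 20)
#define PRAMIN_BASE_SHIFT    16

/* Distinct wrapper types so a frame number cannot be passed as an address. */
typedef struct { uint64_t raw; } vram_addr;  /* byte address in VRAM */
typedef struct { uint64_t raw; } pfn_num;    /* 4 KiB frame number */

static pfn_num vram_to_pfn(vram_addr a)
{
	return (pfn_num){ a.raw >> FRAME_SHIFT };
}

static vram_addr pfn_to_vram(pfn_num p)
{
	return (vram_addr){ p.raw << FRAME_SHIFT };
}

/* Where a VRAM access lands once the window has been positioned. */
struct pramin_access {
	uint64_t window_base;  /* value for the (hypothetical) window-base register */
	uint32_t bar0_offset;  /* CPU-visible offset inside BAR0 */
};

/*
 * Slide the window so that [addr, addr + len) is reachable through
 * BAR0. Returns false if the access would run past the aperture.
 */
static bool pramin_locate(vram_addr addr, uint32_t len,
			  struct pramin_access *out)
{
	uint64_t base = addr.raw & ~((1ull << PRAMIN_BASE_SHIFT) - 1);
	uint64_t off = addr.raw - base;

	assert(len > 0);
	if (off + len > PRAMIN_SIZE)
		return false;

	out->window_base = base >> PRAMIN_BASE_SHIFT;
	out->bar0_offset = PRAMIN_BAR0_OFFSET + (uint32_t)off;
	return true;
}
```

For example, VRAM address 0x12345678 maps to frame 0x12345, and a 4-byte access there would use window base 0x1234 and BAR0 offset 0x705678 under these assumptions. The Rust series presumably gets the same "can't mix them up" guarantee from its Pfn and VramAddress newtypes. The C structs only mimic that idea.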
This series is based on drm-rust-next.
Dependencies (not yet merged):
- Alex Courbot's bitfield series. Tested on v2:
https://lore.kernel.org/all/20260409-bitfield-v2-0-23ac400071cb@nvidia.com/
A newer v3 of the bitfield series is available and should also work (untested):
https://lore.kernel.org/all/20260501-bitfield-v3-0-aa1076c3337d@nvidia.com/
- rust: maple_tree: implement Send and Sync for MapleTree (v3):
https://lore.kernel.org/all/20260511143604.3848176-1-joelagnelf@nvidia.com/
The git tree (containing the dependencies above, this series, and the
follow-on page-table/VMM/BAR1 series) can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git (tag: nova-mm-v1-20260518)
Change log:
Changes from v12 to v1 (split-out):
- Part 1 of 2; the v12 series was split for easier review. Page-table/VMM/BAR1 patches in companion series.
- Broke v12's "Add common memory management types" into atomic patches: Pfn, VramAddress, VramAddress arithmetic.
- New prereq: "rust: pci: add resource_flags accessor".
- New prereq: "rust: bitfield: support cast+shift accessor syntax".
- "Add GpuMm centralized memory manager" scoped to scaffolding only; buddy/TLB wiring deferred to companion series.
- Squashed v12's "pramin: drop useless as_ref()" cleanup into "Add PRAMIN aperture self-tests".
- Moved "rust: maple_tree: Send and Sync" out as a standalone dependency.
- Smaller code touch-ups across most carried-over patches.
Link to v12: https://lore.kernel.org/all/20260425211454.174696-1-joelagnelf@nvidia.com/
Joel Fernandes (12):
rust: pci: add resource_flags accessor
rust: bitfield: support cast+shift accessor syntax
gpu: nova-core: gsp: Return GspStaticInfo from boot()
gpu: nova-core: gsp: Extract usable FB region from GSP
gpu: nova-core: gsp: Expose total physical VRAM end from FB region info
gpu: nova-core: mm: Add Pfn (Physical Frame Number) type
gpu: nova-core: mm: Add VramAddress type and conversion traits
gpu: nova-core: mm: Add VramAddress arithmetic and ordering
gpu: nova-core: mm: Add support to use PRAMIN windows to write to VRAM
docs: gpu: nova-core: Document the PRAMIN aperture mechanism
gpu: nova-core: mm: Add GpuMm centralized memory manager
gpu: nova-core: mm: Add PRAMIN aperture self-tests
Documentation/gpu/nova/core/pramin.rst | 123 ++++++
Documentation/gpu/nova/index.rst | 1 +
drivers/gpu/nova-core/Kconfig | 10 +
drivers/gpu/nova-core/driver.rs | 2 +
drivers/gpu/nova-core/gpu.rs | 48 ++-
drivers/gpu/nova-core/gsp/boot.rs | 12 +-
drivers/gpu/nova-core/gsp/commands.rs | 16 +-
drivers/gpu/nova-core/gsp/fw/commands.rs | 49 ++-
drivers/gpu/nova-core/mm.rs | 247 +++++++++++
drivers/gpu/nova-core/mm/pramin.rs | 512 +++++++++++++++++++++++
drivers/gpu/nova-core/nova_core.rs | 1 +
drivers/gpu/nova-core/regs.rs | 122 ++++++
rust/helpers/pci.c | 6 +
rust/kernel/bitfield.rs | 67 +++
rust/kernel/io/resource.rs | 8 +
rust/kernel/pci.rs | 14 +
16 files changed, 1228 insertions(+), 10 deletions(-)
create mode 100644 Documentation/gpu/nova/core/pramin.rst
create mode 100644 drivers/gpu/nova-core/mm.rs
create mode 100644 drivers/gpu/nova-core/mm/pramin.rs
base-commit: 9bd99adf7cee4b8ed4adecd53269010250a0d2ec
--
2.34.1
* Re: [PATCH v2 0/3] mm/hmm: Add mmap lock-drop support for userfaultfd-backed mappings
From: Andrew Morton @ 2026-05-18 17:48 UTC
To: Stanislav Kinsburskii
Cc: kys, Liam.Howlett, david, jgg, corbet, leon, ljs, mhocko, rppt,
shuah, skhan, surenb, vbabka, skinsburskii, linux-doc,
linux-kernel, linux-kselftest, linux-mm
In-Reply-To: <177863991557.82528.15288076059759579141.stgit@skinsburskii-cloud-desktop.internal.cloudapp.net>
On Wed, 13 May 2026 02:40:11 +0000 Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> wrote:
> This series extends the HMM framework to support userfaultfd-backed memory
> by allowing the mmap read lock to be dropped during hmm_range_fault().
>
> Some page fault handlers — most notably userfaultfd — require the mmap lock
> to be released so that userspace can resolve the fault. The current HMM
> interface never sets FAULT_FLAG_ALLOW_RETRY, making it impossible to fault
> in pages from userfaultfd-registered regions.
>
> This series follows the established int *locked pattern from
> get_user_pages_remote() in mm/gup.c. A new entry point,
> hmm_range_fault_unlockable(), accepts an int *locked parameter. When the
> mmap lock is dropped during fault resolution (VM_FAULT_RETRY or
> VM_FAULT_COMPLETED), the function returns 0 with *locked = 0, signalling
> the caller to restart its walk. The existing hmm_range_fault() is
> refactored into a thin wrapper that passes NULL, preserving current
> behavior for all existing callers.
>
> Faulting hugetlb pages on the unlockable path is not supported because
> walk_hugetlb_range() unconditionally holds and releases
> hugetlb_vma_lock_read across the callback; if the mmap lock is dropped
> inside the callback, the VMA may be freed before the walk framework's
> unlock. Hugetlb pages already present in page tables are handled normally.
> Possible approaches to lift this limitation are documented in
> Documentation/mm/hmm.rst.
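The int *locked convention described above can be modelled in plain userspace C. This is a hedged sketch, not kernel code. mock_mm, mock_range_fault() and fault_range_with_retry() are hypothetical stand-ins for the mmap lock, hmm_range_fault_unlockable() and a driver's retry loop. The mmu_interval_notifier sequence checks that a real HMM caller also needs are omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-ins; every name here is hypothetical. */
struct mock_mm {
	bool locked;       /* models the mmap read lock */
	int retries_left;  /* faults that drop the lock before succeeding */
	int faults_done;
};

static void mock_lock(struct mock_mm *mm)
{
	assert(!mm->locked);
	mm->locked = true;
}

static void mock_unlock(struct mock_mm *mm)
{
	assert(mm->locked);
	mm->locked = false;
}

/*
 * Models hmm_range_fault_unlockable(): when resolving a fault drops the
 * lock (VM_FAULT_RETRY/COMPLETED in the kernel), return 0 with
 * *locked = 0. With locked == NULL the lock is never dropped, matching
 * the behaviour preserved by the existing hmm_range_fault() wrapper.
 */
static int mock_range_fault(struct mock_mm *mm, int *locked)
{
	assert(mm->locked);
	if (locked && mm->retries_left > 0) {
		mm->retries_left--;
		mock_unlock(mm);  /* e.g. userfaultfd resolves the fault here */
		*locked = 0;
		return 0;
	}
	mm->faults_done++;
	return 0;
}

/* Caller side: retake the lock and restart the walk after a drop. */
static int fault_range_with_retry(struct mock_mm *mm, int *restarts)
{
	int locked = 1;
	int ret;

	*restarts = 0;
	mock_lock(mm);
	for (;;) {
		ret = mock_range_fault(mm, &locked);
		if (ret || locked)
			break;
		(*restarts)++;
		locked = 1;
		mock_lock(mm);
	}
	if (locked)
		mock_unlock(mm);
	return ret;
}
```

This follows the same shape as get_user_pages_remote() callers. Returning 0 with the lock dropped lets the caller restart from a clean state rather than trusting VMA pointers that may now be stale.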
Thanks. AI review asked some questions:
https://sashiko.dev/#/patchset/177863991557.82528.15288076059759579141.stgit@skinsburskii-cloud-desktop.internal.cloudapp.net
I'd ignore the first one: don't write buggy fault handlers!
* Re: [PATCH v3 2/2] cpufreq: CPPC: add autonomous mode boot parameter support
From: Sumit Gupta @ 2026-05-18 17:22 UTC
To: Mario Limonciello, rafael, viresh.kumar, pierre.gondois,
ionela.voinescu, zhenglifeng1, zhanjie9, corbet, skhan, rdunlap,
linux-pm, linux-doc, linux-kernel
Cc: linux-tegra, treding, jonathanh, vsethi, ksitaraman, sanjayc,
mochs, bbasu
In-Reply-To: <7d7a6ab6-b1ea-484c-a275-19acca50c483@amd.com>
On 18/05/26 19:51, Mario Limonciello wrote:
> On 5/18/26 09:15, Sumit Gupta wrote:
>>
>> On 18/05/26 19:20, Mario Limonciello wrote:
>>> On 5/18/26 08:44, Sumit Gupta wrote:
>>>> Hi Mario,
>>>>
>>>>
>>>> On 16/05/26 02:43, Mario Limonciello wrote:
>>>>> On 5/15/26 07:26, Sumit Gupta wrote:
>>>>>> Add a kernel boot parameter 'cppc_cpufreq.auto_sel_mode' to enable
>>>>>> CPPC autonomous performance selection on all CPUs at system startup.
>>>>>> When autonomous mode is enabled, the hardware automatically adjusts
>>>>>> CPU performance based on workload demands using Energy Performance
>>>>>> Preference (EPP) hints.
>>>>>>
>>>>>> When the parameter is set:
>>>>>> - Configure all CPUs for autonomous operation on first init
>>>>>> - Use HW min/max_perf when available; otherwise initialize from caps
>>>>>> - Initialize desired_perf to max_perf as a starting hint
>>>>>> - Hardware controls frequency instead of the OS governor
>>>>>> - EPP behavior depends on parameter value:
>>>>>>   - performance (or 1): override EPP to performance preference (0x0)
>>>>>>   - default_epp (or 2): preserve EPP value programmed by BIOS/firmware
>>>>>>
>>>>>> The boot parameter is applied only during first policy initialization.
>>>>>> Skip applying it on CPU hotplug to preserve runtime sysfs configuration.
>>>>>>
>>>>>> This patch depends on patch series [1] ("cpufreq: Set policy->min and
>>>>>> max as real QoS constraints") so that the policy->min/max set in
>>>>>> cppc_cpufreq_cpu_init() are not overridden by cpufreq_set_policy()
>>>>>> during init.
>>>>>>
>>>>>> Signed-off-by: Sumit Gupta <sumitg@nvidia.com>
>>>>>> ---
>>>>>> [1] https://lore.kernel.org/lkml/20260511135538.522653-1-pierre.gondois@arm.com/
>>>>>> ---
>>>>>> .../admin-guide/kernel-parameters.txt | 16 +++
>>>>>> drivers/cpufreq/cppc_cpufreq.c | 122 +++++++++++++++++-
>>>>>> 2 files changed, 133 insertions(+), 5 deletions(-)
>>>>>>
>>>>>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>>>>>> index 0eb64aab3685..7e4b3a8fd76f 100644
>>>>>> --- a/Documentation/admin-guide/kernel-parameters.txt
>>>>>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>>>>>> @@ -1048,6 +1048,22 @@ Kernel parameters
>>>>>> policy to use. This governor must be registered in the
>>>>>> kernel before the cpufreq driver probes.
>>>>>>
>>>>>> + cppc_cpufreq.auto_sel_mode=
>>>>>> + [CPU_FREQ] Enable ACPI CPPC autonomous performance
>>>>>> + selection. When enabled, hardware automatically adjusts
>>>>>> + CPU frequency on all CPUs based on workload demands.
>>>>>> + In Autonomous mode, Energy Performance Preference (EPP)
>>>>>> + hints guide hardware toward performance (0x0) or energy
>>>>>> + efficiency (0xff).
>>>>>> + Requires ACPI CPPC autonomous selection register
>>>>>> + support.
>>>>>> + Accepts:
>>>>>> + performance, 1: enable auto_sel + set EPP to
>>>>>> + performance (0x0)
>>>>>> + default_epp, 2: enable auto_sel, preserve EPP value
>>>>>> + programmed by BIOS/firmware
>>>>>> + Unset: cpufreq governors are used (auto_sel disabled).
>>>>>
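The parameter semantics in the quoted patch can be summarised as a small C model. This is a hedged sketch, not the cppc_cpufreq code. The enum, EPP_KEEP_FIRMWARE, the helpers, and the treatment of 0 as "hardware limit not available" are all hypothetical. The "caps" used as fallbacks are assumed to be the lowest and highest performance capabilities.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of the cppc_cpufreq.auto_sel_mode= values. */
enum auto_sel_mode {
	AUTO_SEL_UNSET = 0,       /* governors in charge, auto_sel disabled */
	AUTO_SEL_PERFORMANCE = 1, /* auto_sel on, EPP forced to 0x0 */
	AUTO_SEL_DEFAULT_EPP = 2, /* auto_sel on, firmware EPP preserved */
};

#define EPP_KEEP_FIRMWARE (-1)

/* Map a command-line value to a mode; returns -1 for unknown values. */
static int parse_auto_sel_mode(const char *val)
{
	if (val == NULL || *val == '\0')
		return AUTO_SEL_UNSET;
	if (!strcmp(val, "performance") || !strcmp(val, "1"))
		return AUTO_SEL_PERFORMANCE;
	if (!strcmp(val, "default_epp") || !strcmp(val, "2"))
		return AUTO_SEL_DEFAULT_EPP;
	return -1;
}

/* EPP value to program for a mode, or EPP_KEEP_FIRMWARE. */
static int epp_for_mode(enum auto_sel_mode mode)
{
	assert(mode <= AUTO_SEL_DEFAULT_EPP);
	return mode == AUTO_SEL_PERFORMANCE ? 0x0 : EPP_KEEP_FIRMWARE;
}

/* Apply the boot mode only on first policy init, never on hotplug. */
static bool should_apply_boot_mode(enum auto_sel_mode mode, bool first_init)
{
	return mode != AUTO_SEL_UNSET && first_init;
}

struct perf_limits {
	unsigned int min_perf;
	unsigned int max_perf;
	unsigned int desired_perf;
};

/*
 * Prefer limits already programmed in hardware (0 is treated as "not
 * available", an assumption). Otherwise fall back to the capabilities.
 * desired_perf starts at max_perf as a hint.
 */
static struct perf_limits init_limits(unsigned int hw_min, unsigned int hw_max,
				      unsigned int cap_lowest,
				      unsigned int cap_highest)
{
	struct perf_limits l;

	l.min_perf = hw_min ? hw_min : cap_lowest;
	l.max_perf = hw_max ? hw_max : cap_highest;
	l.desired_perf = l.max_perf;
	return l;
}
```

A balance_performance mode with EPP 128, as proposed in the thread for v4, would fit this shape as one more enum value and one more branch in epp_for_mode().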
>>>>> Rather than unset doing nothing, have you considered having it take a
>>>>> midpoint like 128? That's what we do in amd-pstate (default to
>>>>> balance_performance). I think it turns into a reasonable balance.
>>>>
>>>> Thanks for the suggestion.
>>>> I can add balance_performance that enables auto_sel with EPP=128 in
>>>> v4.
>>>>
>>>> On changing the driver default (no param behavior) to auto enable
>>>> balance_performance, it would be good to keep the current behavior for
>>>> now since cppc_cpufreq is generic across ARM64/RISC-V platforms where
>>>> EPP and Autonomous Selection registers are optional.
>>>> A default change would affect existing users relying on governors.
>>>>
>>>> Thank you,
>>>> Sumit Gupta
>>>
>>> But couldn't you make the "no module parameter set" follow the behavior
>>> to only set the registers if they're available?
>>>
>>> So the systems that support it start using it, the ones that don't it's
>>> a NOP.
>>>
>>
>> Would it work to add balance_performance as a new mode in v4,
>> and discuss changing the default separately as a follow-up?
>>
>
> Sure.
>
>> Runtime detection helps for unsupported platforms. But platforms which
>> support the registers use OS governors today, and silently switching
>> them to autonomous mode on a kernel update is a behavior change for
>> existing users. They would also have no way to boot into sw governor.
>>
>
> But hopefully it should be better battery life/responsiveness for those
> scenarios too, right?
>
Yes, in many cases. But workloads that rely on specific OS governor
configurations would be impacted.
I will send a separate change later to seek broader consensus on
enabling auto_sel as default without any param.
Thank you,
Sumit Gupta
....
* Re: [PATCH mm-unstable v17 06/14] mm/khugepaged: generalize collapse_huge_page for mTHP collapse
From: Usama Arif @ 2026-05-18 17:00 UTC
To: Nico Pache
Cc: Usama Arif, linux-doc, linux-kernel, linux-mm, linux-trace-kernel,
akpm, anshuman.khandual, apopple, baohua, baolin.wang, byungchul,
catalin.marinas, cl, corbet, dave.hansen, david, dev.jain, gourry,
hannes, hughd, jack, jackmanb, jannh, jglisse, joshua.hahnjy, kas,
lance.yang, liam, ljs, mathieu.desnoyers, matthew.brost, mhiramat,
mhocko, peterx, pfalcato, rakie.kim, raquini, rdunlap,
richard.weiyang, rientjes, rostedt, rppt, ryan.roberts, shivankg,
sunnanyong, surenb, thomas.hellstrom, tiwai, usamaarif642, vbabka,
vishal.moola, wangkefeng.wang, will, willy, yang, ying.huang, ziy,
zokeefe
In-Reply-To: <20260511185817.686831-7-npache@redhat.com>
On Mon, 11 May 2026 12:58:06 -0600 Nico Pache <npache@redhat.com> wrote:
> Pass an order and offset to collapse_huge_page to support collapsing anon
> memory to arbitrary orders within a PMD. order indicates what mTHP size we
> are attempting to collapse to, and offset indicates where in the PMD to
> start the collapse attempt.
>
> For non-PMD collapse we must leave the anon VMA write locked until after
> we collapse the mTHP-- in the PMD case all the pages are isolated, but in
> the mTHP case this is not true, and we must keep the lock to prevent
> access/changes to the page tables. This can happen if the rmap walkers hit
> a pmd_none while the PMD entry is currently unavailable due to being
> temporarily removed during the collapse phase.
>
> Signed-off-by: Nico Pache <npache@redhat.com>
> ---
> mm/khugepaged.c | 93 +++++++++++++++++++++++++++++--------------------
> 1 file changed, 55 insertions(+), 38 deletions(-)
>
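The order/offset pair described above determines which slice of the PMD a collapse covers. The sketch below is hedged and is not khugepaged code. It assumes 4 KiB base pages (so a PMD spans 512 PTEs) and treats offset as a PTE index within the PMD that must be naturally aligned to the order; the patch text does not state the unit. Both helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT  12  /* assumed 4 KiB base pages */
#define SKETCH_PMD_ORDER   9   /* 512 PTEs per PMD under that assumption */

/*
 * Compute the range an mTHP collapse of `order` covers when it starts
 * `offset` PTEs into the PMD mapped at pmd_addr. Rejects orders above
 * the PMD order and offsets that are misaligned or run past the PMD.
 */
static bool mthp_collapse_range(uint64_t pmd_addr, unsigned int order,
				unsigned int offset, uint64_t *start,
				uint64_t *end)
{
	unsigned int nr_ptes;

	assert((pmd_addr &
		((1ull << (SKETCH_PAGE_SHIFT + SKETCH_PMD_ORDER)) - 1)) == 0);
	if (order > SKETCH_PMD_ORDER)
		return false;

	nr_ptes = 1u << order;
	if (offset % nr_ptes || offset + nr_ptes > (1u << SKETCH_PMD_ORDER))
		return false;

	*start = pmd_addr + ((uint64_t)offset << SKETCH_PAGE_SHIFT);
	*end = *start + ((uint64_t)nr_ptes << SKETCH_PAGE_SHIFT);
	return true;
}

/*
 * Only a full-PMD collapse isolates every page under the PMD. Smaller
 * orders keep the anon_vma write lock until the new mapping is installed.
 */
static bool keep_anon_vma_locked_until_install(unsigned int order)
{
	return order != SKETCH_PMD_ORDER;
}
```

For example, an order-4 collapse at offset 16 in the PMD at 0x200000 covers 0x210000 to 0x220000. Order 9 at offset 0 is the classic whole-PMD case.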
The patch does two things:
- Makes collapse_huge_page() work with any order, not just PMD order.
- Keeps the anon_vma write lock held across the copy and install for
  non-PMD orders, because mTHP collapse leaves the out-of-range PTEs
  mapped while the PMD is temporarily none. rmap walkers cannot reach
  them until the PMD is installed.
Acked-by: Usama Arif <usama.arif@linux.dev>
* Re: [PATCH v4 05/16] vfio: Enforce preserved devices are retrieved via LIVEUPDATE_SESSION_RETRIEVE_FD
From: Vipin Sharma @ 2026-05-18 16:47 UTC
To: Zhu Yanjun
Cc: kvm, linux-doc, linux-kernel, linux-kselftest, linux-pci,
ajayachandra, alex, amastro, ankita, apopple, chrisl, corbet,
dmatlack, graf, jacob.pan, jgg, jgg, jrhilke, julianr, kevin.tian,
leon, leonro, lukas, michal.winiarski, parav, pasha.tatashin,
praan, pratyush, rananta, rientjes, rodrigo.vivi, rppt, saeedm,
skhan, skhawaja, vivek.kasireddy, witu, yi.l.liu
In-Reply-To: <65228806-6ed3-4577-9037-13fd5eb8f9b6@linux.dev>
On Sun, May 17, 2026 at 12:04:04PM -0700, Zhu Yanjun wrote:
>
> On 2026/5/11 16:47, Vipin Sharma wrote:
> > From: David Matlack <dmatlack@google.com>
> >
> > Enforce that files for incoming (preserved by previous kernel) VFIO
> > devices are retrieved via LIVEUPDATE_SESSION_RETRIEVE_FD rather than by
> > opening the corresponding VFIO character device or via
> > VFIO_GROUP_GET_DEVICE_FD.
> >
> > Both of these methods would result in VFIO initializing the device
> > without access to the preserved state of the device passed by the
> > previous kernel.
> >
> > Reviewed-by: Pranjal Shrivastava <praan@google.com>
> > Signed-off-by: David Matlack <dmatlack@google.com>
> > Co-developed-by: Vipin Sharma <vipinsh@google.com>
> > Signed-off-by: Vipin Sharma <vipinsh@google.com>
> > ---
> > drivers/vfio/device_cdev.c | 8 ++++++++
> > drivers/vfio/group.c | 9 +++++++++
> > drivers/vfio/pci/vfio_pci_liveupdate.c | 6 ++++++
> > drivers/vfio/vfio.h | 18 ++++++++++++++++++
> > 4 files changed, 41 insertions(+)
> >
> > diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
> > index 1ab07ccaf3ab..4df0495941c6 100644
> > --- a/drivers/vfio/device_cdev.c
> > +++ b/drivers/vfio/device_cdev.c
> > @@ -49,6 +49,14 @@ static int vfio_device_cdev_open(struct vfio_device *device, struct file **filep
> > }
> > *filep = file;
> > + } else if (vfio_liveupdate_incoming_is_preserved(device)) {
> > + /*
> > + * Since it is live update preserved device, it must be
> > + * retrieved via LIVEUPDATE_SESSION_RETRIEVE_FD instead of
> > + * opening /dev/vfio/devices/vfioX.
> > + */
> > + ret = -EBUSY;
> > + goto err_free_device_file;
>
> When vfio_liveupdate_incoming_is_preserved(device) returns true,
> vfio_device_put_registration(device) is not called in this path.
>
> Is vfio_device_put_registration(device) instead invoked from the
> err_free_device_file error handling path?
Yes. At the end of vfio_device_cdev_open(), the err_free_device_file label
first frees the device file object and then calls
vfio_device_put_registration(). This is the same error-handling flow used
by the if (!file) {} branch earlier in the function.
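The unwind order described above is the usual kernel goto-label pattern. Each label undoes one step and falls through to the label for the previous step. The following is a simplified userspace model. Its structure is an assumption loosely patterned on the quoted hunk, and mock_device, mock_file and mock_cdev_open() are hypothetical, not the VFIO code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the VFIO objects involved. */
struct mock_device {
	int registrations;        /* models the registration reference */
	bool incoming_preserved;  /* preserved by the previous kernel */
};

struct mock_file {
	struct mock_device *dev;
};

static int mock_cdev_open(struct mock_device *dev, struct mock_file **filep)
{
	struct mock_file *file;
	int ret;

	dev->registrations++;  /* reference taken on entry */

	file = malloc(sizeof(*file));
	if (!file) {
		ret = -ENOMEM;
		goto err_put_registration;
	}
	file->dev = dev;

	if (dev->incoming_preserved) {
		/* Must be retrieved via LIVEUPDATE_SESSION_RETRIEVE_FD. */
		ret = -EBUSY;
		goto err_free_device_file;
	}

	*filep = file;
	return 0;

err_free_device_file:
	free(file);                  /* first free the device file object */
err_put_registration:
	assert(dev->registrations > 0);
	dev->registrations--;        /* then drop the registration */
	return ret;
}
```

Because the labels are stacked in reverse order of acquisition, the -EBUSY path cannot leak the registration reference even though it jumps past the allocation-failure label.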
* Re: [PATCH v4 02/16] vfio/pci: Preserve vfio-pci device files across Live Update
From: Vipin Sharma @ 2026-05-18 16:37 UTC
To: Pratyush Yadav
Cc: Samiullah Khawaja, David Matlack, kvm, linux-doc, linux-kernel,
linux-kselftest, linux-pci, ajayachandra, alex, amastro, ankita,
apopple, chrisl, corbet, graf, jacob.pan, jgg, jgg, jrhilke,
julianr, kevin.tian, leon, leonro, lukas, michal.winiarski, parav,
pasha.tatashin, praan, rananta, rientjes, rodrigo.vivi, rppt,
saeedm, skhan, vivek.kasireddy, witu, yanjun.zhu, yi.l.liu
In-Reply-To: <2vxzcxyy9fpd.fsf@kernel.org>
On Thu, May 14, 2026 at 05:24:46PM +0200, Pratyush Yadav wrote:
> On Wed, May 13 2026, Samiullah Khawaja wrote:
>
> > On Tue, May 12, 2026 at 02:29:19PM -0700, Vipin Sharma wrote:
> >>On Tue, May 12, 2026 at 01:59:51PM -0700, David Matlack wrote:
> >>> On Mon, May 11, 2026 at 4:48 PM Vipin Sharma <vipinsh@google.com> wrote:
> >>>
> >>> > diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
> >>> > index c12d614fc6c4..019de053f116 100644
> >>> > --- a/drivers/vfio/pci/Kconfig
> >>> > +++ b/drivers/vfio/pci/Kconfig
> >>> > @@ -45,13 +45,15 @@ config VFIO_PCI_IGD
> >>> >
> >>> > config VFIO_PCI_LIVEUPDATE
> >>> > bool "VFIO PCI support for Live Update (EXPERIMENTAL)"
> >>> > - depends on PCI_LIVEUPDATE
> >>> > + depends on PCI_LIVEUPDATE && VFIO_DEVICE_CDEV
> >>> > help
> >>> > Support for preserving devices bound to vfio-pci across a Live
> >>> > Update. This option should only be enabled by developers working on
> >>> > implementing this support. Once enough support has landed in the
> >>> > kernel, this option will no longer be marked EXPERIMENTAL.
> >>> >
> >>> > + Enabling this will disable support for VFIO PCI DMA buffer.
> >>> > +
> >>> > If you don't know what to do here, say N.
> >>> >
> >>> > endif
> >>> > @@ -68,7 +70,7 @@ config VFIO_PCI_ZDEV_KVM
> >>> > To enable s390x KVM vfio-pci extensions, say Y.
> >>> >
> >>> > config VFIO_PCI_DMABUF
> >>> > - def_bool y if VFIO_PCI_CORE && PCI_P2PDMA && DMA_SHARED_BUFFER
> >>> > + def_bool y if VFIO_PCI_CORE && PCI_P2PDMA && DMA_SHARED_BUFFER && !VFIO_PCI_LIVEUPDATE
> >>>
> >>> Why does enabling VFIO_PCI_LIVEUPDATE require disabling
> >>> VFIO_PCI_DMABUF? I saw the cover letter says "to keep things simple",
> >>> but what specific problem does this solve or simplify?
> >>
> >>I should have provided more details there.
> >>
> >>When the device is reset in vfio_pci_liveupdate_freeze(), we zap the
> >>userspace-mapped BARs, and we would also need to revoke DMA buffer
> >>access with vfio_pci_dma_buf_move() or a vfio_pci_dma_buf_cleanup()
> >>combination. Cleanup takes the memory lock, which freeze already takes,
> >>and some refcounts are managed in both of these APIs. This complicated
> >>the code flow based on the result of pci_load_saved_state(), and all of
> >>it required more refactoring than I wanted in the series.
> >
> > Maybe we can return -EOPNOTSUPP if any dmabufs for this vfio cdev are
> > exported during preserve?
Currently there is no API to query whether any dmabufs have been exported.
I will add a patch to this series that returns -EOPNOTSUPP and removes the
condition from the Kconfig.
>
> Whichever way you go with, a TODO/comment would be nice to have so
> someone (including future you) looking at this code knows why this
> restriction exists.
>
I will add a comment in the next version.
* Re: [PATCH] docs: submitting-patches: Clarify that in English "reviewer" is a person
From: Randy Dunlap @ 2026-05-18 16:25 UTC
To: Vlastimil Babka (SUSE), Krzysztof Kozlowski, Jonathan Corbet,
Shuah Khan, workflows, linux-doc, linux-kernel
Cc: Greg Kroah-Hartman, Andrew Morton, David Hildenbrand,
Linus Torvalds, Guenter Roeck
In-Reply-To: <ce1e5e9b-83d0-4971-aee3-dc5a8f85ce22@kernel.org>
On 5/16/26 7:39 AM, Vlastimil Babka (SUSE) wrote:
> On 5/16/26 14:38, Krzysztof Kozlowski wrote:
>> The common understanding of the word "Reviewer" is a person performing
>> review work [1]. Tools are not persons and thus cannot be reviewers in
>> this sense. Also, tools cannot make statements ("A Reviewed-by tag is a
>> statement of opinion"), since making a statement needs some sort of
>> conscious mind.
>>
>> Our docs already make clear that "Reviewed-by" must come from a
>> person:
>>
>> - "By offering my Reviewed-by: tag, I state that:"
>>
>> Note the use of the first-person "I" and the word "state".
>>
>> - "A Reviewed-by tag is *a statement of opinion* that the patch is an
>> appropriate modification of the kernel without any remaining serious"
>>
>> Only a person can make a statement of opinion.
>>
>> - "Any interested reviewer (who has done the work) can offer a
>> Reviewed-by"
>>
>> Only a person can offer a tag, so the above does not grant a tool
>> permission to offer one.
>>
>> However, this is apparently not enough, as English is not that precise,
>> so let's clarify that only a person can state the "Reviewer's statement
>> of oversight".
>>
>> Link: https://en.wiktionary.org/wiki/reviewer [1]
>> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>> Cc: Vlastimil Babka <vbabka@kernel.org>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: David Hildenbrand <david@kernel.org>
>> Cc: Linus Torvalds <torvalds@linux-foundation.org>
>> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
>
> I agree with the intent that the tag is for people (whether they use a tool
> or not to help them). We also don't put "Tested-by: kernel test robot" or
> syzkaller on every commit that they test and find no bugs. Review is also
> not just about absence of bugs, but agreeing with the larger design and
> whether the change makes sense to do in the first place.
Ack that also.
> So whether that's achieved with this particular wording or differently,
>
> Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Thanks.
>
>>
>> ---
>>
>> I find it silly to need to describe English, but it seems it is needed.
>>
>> https://lore.kernel.org/all/fd3b2ca7-4d64-4c4b-98a3-7d3285fa6826@roeck-us.net/
>> ---
>> Documentation/process/submitting-patches.rst | 8 ++++----
>> 1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/Documentation/process/submitting-patches.rst b/Documentation/process/submitting-patches.rst
>> index d7290e208e72..a989de43f3db 100644
>> --- a/Documentation/process/submitting-patches.rst
>> +++ b/Documentation/process/submitting-patches.rst
>> @@ -581,10 +581,10 @@ By offering my Reviewed-by: tag, I state that:
>>
>> A Reviewed-by tag is a statement of opinion that the patch is an
>> appropriate modification of the kernel without any remaining serious
>> -technical issues. Any interested reviewer (who has done the work) can
>> -offer a Reviewed-by tag for a patch. This tag serves to give credit to
>> -reviewers and to inform maintainers of the degree of review which has been
>> -done on the patch. Reviewed-by: tags, when supplied by reviewers known to
>> +technical issues. Any interested reviewer (who has done the work and is a
>> +person) can offer a Reviewed-by tag for a patch. This tag serves to give
>> +credit to reviewers and to inform maintainers of the degree of review which has
>> +been done on the patch. Reviewed-by: tags, when supplied by reviewers known to
>> understand the subject area and to perform thorough reviews, will normally
>> increase the likelihood of your patch getting into the kernel.
>>
>
>
--
~Randy
* Re: [PATCH v17 02/11] cxl/ras: Unify Endpoint and Port AER trace events
From: Jonathan Cameron @ 2026-05-18 16:09 UTC
To: Dan Williams (nvidia)
Cc: Bowman, Terry, dave, dave.jiang, alison.schofield, bhelgaas,
shiju.jose, ming.li, Smita.KoralahalliChannabasappa, rrichter,
dan.carpenter, PradeepVineshReddy.Kodamati, lukas,
Benjamin.Cheatham, sathyanarayanan.kuppuswamy, vishal.l.verma,
alucerop, ira.weiny, corbet, rafael, xueshuai, linux-cxl,
linux-kernel, linux-pci, linux-acpi, linux-doc,
Mauro Carvalho Chehab
In-Reply-To: <69feaebd471c3_1b86a100b@djbw-dev.notmuch>
On Fri, 08 May 2026 20:49:17 -0700
"Dan Williams (nvidia)" <djbw@kernel.org> wrote:
> Jonathan Cameron wrote:
> > On Thu, 7 May 2026 13:33:45 -0500
> > "Bowman, Terry" <terry.bowman@amd.com> wrote:
> [..]
> > > > This concerns me (sorry I wasn't paying attention to the v16 thread).
> > > > It is a userspace regression against code that is out in the wild and typically
> > > > not updated in sync with the kernel.
> > > >
> > > > If you are suggesting breaking ras-daemon at the very least +CC the maintainer.
>
> Sorry, that was not the intent, see below.
Sorry for the slow reply; I'm getting a bit buried in other kernel work, so I
haven't been checking CXL stuff as often as normal.
Anyhow, the direction looks good to me.
Jonathan
* Re: (subset) [PATCH v3 00/28] vfs/nfsd: add support for CB_NOTIFY callbacks in directory delegations
From: Chuck Lever @ 2026-05-18 16:05 UTC
To: Christian Brauner, Jeff Layton, Chuck Lever
Cc: Alexander Viro, Jan Kara, Alexander Aring, Steven Rostedt,
Masami Hiramatsu, Mathieu Desnoyers, Jonathan Corbet, Shuah Khan,
NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
Trond Myklebust, Anna Schumaker, Amir Goldstein, Calum Mackay,
linux-fsdevel, linux-kernel, linux-trace-kernel, linux-doc,
linux-nfs
In-Reply-To: <20260515-weltschmerz-folgen-68ca0db1ef84@brauner>
On Fri, May 15, 2026, at 1:26 PM, Christian Brauner wrote:
> On Tue, 28 Apr 2026 08:09:44 +0100, Jeff Layton wrote:
>> Re-posting the set per Christian's request. The only difference in this
>> version is a small error handling fix in alloc_init_dir_deleg(). The old
>> version could crash since release_pages() can't handle an array with
>> NULL pointers in it.
>>
>> ---------------------------------8<------------------------------------
>>
>> [...]
>
> @Chuck, @Jeff, I've only merged the vfs-specific changes into a stable
> branch. You can pull it; I won't touch it again. You can pull in the nfsd
> work in whatever form you like. Same procedure I use with io_uring et al.
>
> Let me know if that works for you.
>
> ---
>
> Applied to the vfs-7.2.directory.delegations branch of the vfs/vfs.git
> tree.
> Patches in the vfs-7.2.directory.delegations branch should appear in
> linux-next soon.
>
> Please report any outstanding bugs that were missed during review in a
> new review to the original patch series allowing us to drop it.
>
> It's encouraged to provide Acked-bys and Reviewed-bys even though the
> patch has now been applied. If possible patch trailers will be updated.
>
> Note that commit hashes shown below are subject to change due to rebase,
> trailer updates or similar. If in doubt, please check the listed branch.
>
> tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
> branch: vfs-7.2.directory.delegations
>
> [01/28] filelock: pass current blocking lease to
> trace_break_lease_block() rather than "new_fl"
> https://git.kernel.org/vfs/vfs/c/89330d3a60f7
> [02/28] filelock: add support for ignoring deleg breaks for dir change
> events
> https://git.kernel.org/vfs/vfs/c/24cbf43337f4
> [03/28] filelock: add a tracepoint to start of break_lease()
> https://git.kernel.org/vfs/vfs/c/e39026a86b48
> [04/28] filelock: add an inode_lease_ignore_mask helper
> https://git.kernel.org/vfs/vfs/c/95825fdcc0b0
> [05/28] fsnotify: new tracepoint in fsnotify()
> https://git.kernel.org/vfs/vfs/c/ad4489dcd08d
> [06/28] fsnotify: add fsnotify_modify_mark_mask()
> https://git.kernel.org/vfs/vfs/c/12ffbb117b64
> [07/28] fsnotify: add FSNOTIFY_EVENT_RENAME data type
> https://git.kernel.org/vfs/vfs/c/010043003c0c
Looks good.
To make the NFSD pieces apply, I need v7.1-rc4 and
vfs-7.2.directory.delegations merged into vfs.all. Given your
regular merge cadence over the past few weeks, I expect that
will happen at the end of this week, or early next week?
--
Chuck Lever
* Re: [PATCH] MAINTAINERS: nvdimm: Include maintainer profile
From: Dave Jiang @ 2026-05-18 15:43 UTC
To: Krzysztof Kozlowski, Dan Williams, Vishal Verma, Ira Weiny,
Jonathan Corbet, Shuah Khan, nvdimm, linux-doc, linux-kernel
In-Reply-To: <20260518104306.39289-2-krzysztof.kozlowski@oss.qualcomm.com>
On 5/18/26 3:43 AM, Krzysztof Kozlowski wrote:
> No dedicated NVDIMM maintainers are returned by get_maintainer.pl for
> the subsystem maintainer profile, so patches changing that file miss
> the actual owners of the file.
>
> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Acked-by: Dave Jiang <dave.jiang@intel.com>
> ---
> MAINTAINERS | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 7a65b220d93f..294909f6d488 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -14751,6 +14751,7 @@ S: Supported
> Q: https://patchwork.kernel.org/project/linux-nvdimm/list/
> P: Documentation/nvdimm/maintainer-entry-profile.rst
> T: git git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm.git
> +F: Documentation/nvdimm/maintainer-entry-profile.rst
> F: drivers/acpi/nfit/*
> F: drivers/nvdimm/*
> F: include/linux/libnvdimm.h
* [PATCH v2 6/6] selftests: iou-zcrx: add notification and stats test for zcrx
From: Clément Léger @ 2026-05-18 15:35 UTC
To: io-uring, Pavel Begunkov, Jens Axboe
Cc: Clément Léger, linux-doc, linux-kernel, linux-kselftest,
netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan,
Vishwanath Seshagiri
In-Reply-To: <20260518153532.2835502-1-cleger@meta.com>
Add a selftest to verify that ZCRX notifications are properly delivered
to userspace and that the shared-memory notification stats (copy_count,
copy_bytes) are correctly incremented when zero-copy RX falls back to
copying or when it runs out of buffers.
The test registers a notification descriptor during
IORING_REGISTER_ZCRX_IFQ with a stats region placed after the refill
queue entries. A new -n flag verifies that the copy fallback is
triggered, and -b/-a flags allow checking for the out-of-buffers
notification.
To reliably trigger copy fallback, the Python test uses a new
single_no_flow() setup variant that configures tcp-data-split and RSS
but without an ethtool flow rule. Without flow steering, traffic arrives
on non-zcrx queues as regular pages, forcing the kernel copy-fallback
path in io_zcrx_copy_frag().
Out-of-buffer notification is verified by using a smaller receive area
and by not recycling the buffers, so that the kernel runs out of
buffers quickly.
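The region layout described above places the stats block immediately after the refill-queue entries and rounds it up to a page. The quoted setup_zcrx() hunk shows the same arithmetic. The following is a hedged, standalone model of that arithmetic, plus one plausible post-run check. SKETCH_ALIGN_UP, place_stats_region() and copy_fallback_observed() are hypothetical, and the check is an assumption rather than the selftest's actual logic. A real reader of kernel-updated shared memory would also want READ_ONCE-style loads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local helper; the selftest's own ALIGN_UP may be defined differently. */
#define SKETCH_ALIGN_UP(x, a) ((((x) + (a) - 1) / (a)) * (a))

/* Same field layout as the selftest's t_io_uring_zcrx_notif_stats. */
struct zcrx_notif_stats {
	uint64_t copy_count;
	uint64_t copy_bytes;
};

/*
 * Place the stats block right after the refill ring and grow the
 * region by a page-aligned chunk. Returns the new region size.
 */
static size_t place_stats_region(size_t ring_size, size_t page_size,
				 size_t *stats_offset)
{
	assert(page_size != 0);
	*stats_offset = ring_size;
	return ring_size +
	       SKETCH_ALIGN_UP(sizeof(struct zcrx_notif_stats), page_size);
}

/*
 * One plausible post-run check (an assumption): the copy fallback
 * happened and did not account for more bytes than were sent.
 */
static int copy_fallback_observed(const void *region, size_t stats_offset,
				  uint64_t bytes_sent)
{
	const struct zcrx_notif_stats *st =
		(const void *)((const char *)region + stats_offset);

	return st->copy_count > 0 && st->copy_bytes > 0 &&
	       st->copy_bytes <= bytes_sent;
}
```

With an 8 KiB refill ring and 4 KiB pages, the stats would sit at offset 8192, and the region would grow to 12 KiB.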
Signed-off-by: Clément Léger <cleger@meta.com>
---
.../selftests/drivers/net/hw/iou-zcrx.c | 114 ++++++++++++++++--
.../selftests/drivers/net/hw/iou-zcrx.py | 49 +++++++-
2 files changed, 151 insertions(+), 12 deletions(-)
diff --git a/tools/testing/selftests/drivers/net/hw/iou-zcrx.c b/tools/testing/selftests/drivers/net/hw/iou-zcrx.c
index 240d13dbc54e..78a43ede77ed 100644
--- a/tools/testing/selftests/drivers/net/hw/iou-zcrx.c
+++ b/tools/testing/selftests/drivers/net/hw/iou-zcrx.c
@@ -52,7 +52,27 @@ struct t_io_uring_zcrx_ifq_reg {
struct io_uring_zcrx_offsets offsets;
__u32 zcrx_id;
__u32 rx_buf_len;
- __u64 __resv[3];
+ __u64 notif_desc;
+ __u64 __resv[2];
+};
+
+#define ZCRX_NOTIF_NO_BUFFERS 0
+#define ZCRX_NOTIF_COPY 1
+#define ZCRX_NOTIF_DESC_FLAG_STATS (1 << 0)
+
+#define NOTIF_USER_DATA 3
+
+struct t_zcrx_notification_desc {
+ __u64 user_data;
+ __u32 type_mask;
+ __u32 flags;
+ __u64 stats_offset;
+ __u64 __resv2[9];
+};
+
+struct t_io_uring_zcrx_notif_stats {
+ __u64 copy_count;
+ __u64 copy_bytes;
};
static long page_size;
@@ -84,7 +104,10 @@ static int cfg_oneshot_recvs;
static int cfg_send_size = SEND_SIZE;
static struct sockaddr_in6 cfg_addr;
static unsigned int cfg_rx_buf_len;
+static size_t cfg_area_size;
static bool cfg_dry_run;
+static bool cfg_copy_fallback;
+static bool cfg_no_buffers;
static char *payload;
static void *area_ptr;
@@ -95,6 +118,9 @@ static unsigned long area_token;
static int connfd;
static bool stop;
static size_t received;
+static unsigned int received_notif_type;
+static bool received_notif;
+static size_t notif_stats_offset;
static unsigned long gettimeofday_ms(void)
{
@@ -142,6 +168,7 @@ static void setup_zcrx(struct io_uring *ring)
{
unsigned int ifindex;
unsigned int rq_entries = 4096;
+ size_t area_size = cfg_area_size ? cfg_area_size : AREA_SIZE;
int ret;
ifindex = if_nametoindex(cfg_ifname);
@@ -150,7 +177,7 @@ static void setup_zcrx(struct io_uring *ring)
if (cfg_rx_buf_len && cfg_rx_buf_len != page_size) {
area_ptr = mmap(NULL,
- AREA_SIZE,
+ area_size,
PROT_READ | PROT_WRITE,
MAP_ANONYMOUS | MAP_PRIVATE |
MAP_HUGETLB | MAP_HUGE_2MB,
@@ -162,7 +189,7 @@ static void setup_zcrx(struct io_uring *ring)
}
} else {
area_ptr = mmap(NULL,
- AREA_SIZE,
+ area_size,
PROT_READ | PROT_WRITE,
MAP_ANONYMOUS | MAP_PRIVATE,
0,
@@ -172,6 +199,12 @@ static void setup_zcrx(struct io_uring *ring)
}
ring_size = get_refill_ring_size(rq_entries);
+
+ if (cfg_copy_fallback) {
+ notif_stats_offset = ring_size;
+ ring_size += ALIGN_UP(sizeof(struct t_io_uring_zcrx_notif_stats), page_size);
+ }
+
ring_ptr = mmap(NULL,
ring_size,
PROT_READ | PROT_WRITE,
@@ -187,10 +220,11 @@ static void setup_zcrx(struct io_uring *ring)
struct io_uring_zcrx_area_reg area_reg = {
.addr = (__u64)(unsigned long)area_ptr,
- .len = AREA_SIZE,
+ .len = area_size,
.flags = 0,
};
+ struct t_zcrx_notification_desc notif_desc;
struct t_io_uring_zcrx_ifq_reg reg = {
.if_idx = ifindex,
.if_rxq = cfg_queue_id,
@@ -200,11 +234,32 @@ static void setup_zcrx(struct io_uring *ring)
.rx_buf_len = cfg_rx_buf_len,
};
+ if (cfg_copy_fallback || cfg_no_buffers) {
+ __u32 type_mask = 0;
+
+ if (cfg_copy_fallback)
+ type_mask = 1 << ZCRX_NOTIF_COPY;
+ if (cfg_no_buffers)
+ type_mask = 1 << ZCRX_NOTIF_NO_BUFFERS;
+
+ memset(&notif_desc, 0, sizeof(notif_desc));
+ notif_desc.user_data = NOTIF_USER_DATA;
+ notif_desc.type_mask = type_mask;
+ if (cfg_copy_fallback) {
+ notif_desc.flags = ZCRX_NOTIF_DESC_FLAG_STATS;
+ notif_desc.stats_offset = notif_stats_offset;
+ }
+ reg.notif_desc = (__u64)(unsigned long)&notif_desc;
+ }
+
ret = io_uring_register_ifq(ring, (void *)&reg);
if (cfg_rx_buf_len && (ret == -EINVAL || ret == -EOPNOTSUPP ||
ret == -ERANGE)) {
printf("Large chunks are not supported %i\n", ret);
exit(SKIP_CODE);
+ } else if ((cfg_copy_fallback || cfg_no_buffers) && ret == -EINVAL) {
+ printf("Notifications not supported %i\n", ret);
+ exit(SKIP_CODE);
} else if (ret) {
error(1, 0, "io_uring_register_ifq(): %d", ret);
}
@@ -304,10 +359,13 @@ static void process_recvzc(struct io_uring *ring, struct io_uring_cqe *cqe)
}
received += n;
- rqe = &rq_ring.rqes[(rq_ring.rq_tail & rq_mask)];
- rqe->off = (rcqe->off & ~IORING_ZCRX_AREA_MASK) | area_token;
- rqe->len = cqe->res;
- io_uring_smp_store_release(rq_ring.ktail, ++rq_ring.rq_tail);
+ /* Skip ring refill so that we run out of buffers quickly */
+ if (!cfg_no_buffers) {
+ rqe = &rq_ring.rqes[(rq_ring.rq_tail & rq_mask)];
+ rqe->off = (rcqe->off & ~IORING_ZCRX_AREA_MASK) | area_token;
+ rqe->len = cqe->res;
+ io_uring_smp_store_release(rq_ring.ktail, ++rq_ring.rq_tail);
+ }
}
static void server_loop(struct io_uring *ring)
@@ -324,8 +382,16 @@ static void server_loop(struct io_uring *ring)
process_accept(ring, cqe);
else if (cqe->user_data == 2)
process_recvzc(ring, cqe);
- else
+ else if ((cfg_copy_fallback || cfg_no_buffers) &&
+ cqe->user_data == NOTIF_USER_DATA) {
+ received_notif_type |= cqe->res;
+ received_notif = true;
+ if (cfg_no_buffers &&
+ (cqe->res == ZCRX_NOTIF_NO_BUFFERS))
+ stop = true;
+ } else {
error(1, 0, "unknown cqe");
+ }
count++;
}
io_uring_cq_advance(ring, count);
@@ -374,6 +440,23 @@ static void run_server(void)
if (!stop)
error(1, 0, "test failed\n");
+
+ if (cfg_copy_fallback) {
+ struct t_io_uring_zcrx_notif_stats *stats =
+ (void *)((char *)ring_ptr + notif_stats_offset);
+
+ if (!received_notif || received_notif_type != ZCRX_NOTIF_COPY)
+ error(1, 0, "expected copy fallback notification");
+ if (!IO_URING_READ_ONCE(stats->copy_count))
+ error(1, 0, "expected copy_count > 0");
+ if (!IO_URING_READ_ONCE(stats->copy_bytes))
+ error(1, 0, "expected copy_bytes > 0");
+ }
+
+ if (cfg_no_buffers) {
+ if (!received_notif || received_notif_type != ZCRX_NOTIF_NO_BUFFERS)
+ error(1, 0, "expected no-buffers notification");
+ }
}
static void run_client(void)
@@ -425,7 +508,7 @@ static void parse_opts(int argc, char **argv)
usage(argv[0]);
cfg_payload_len = max_payload_len;
- while ((c = getopt(argc, argv, "sch:p:l:i:q:o:z:x:d")) != -1) {
+ while ((c = getopt(argc, argv, "sch:p:l:i:q:o:z:x:a:dnb")) != -1) {
switch (c) {
case 's':
if (cfg_client)
@@ -466,8 +549,19 @@ static void parse_opts(int argc, char **argv)
case 'd':
cfg_dry_run = true;
break;
+ case 'n':
+ cfg_copy_fallback = true;
+ break;
+ case 'b':
+ cfg_no_buffers = true;
+ break;
+ case 'a':
+ cfg_area_size = strtoul(optarg, NULL, 0) * page_size;
+ break;
}
}
+ if (cfg_copy_fallback && cfg_no_buffers)
+ error(1, 0, "Pass one of -n or -b");
if (cfg_server && addr)
error(1, 0, "Receiver cannot have -h specified");
diff --git a/tools/testing/selftests/drivers/net/hw/iou-zcrx.py b/tools/testing/selftests/drivers/net/hw/iou-zcrx.py
index e81724cb5542..82b4f4777182 100755
--- a/tools/testing/selftests/drivers/net/hw/iou-zcrx.py
+++ b/tools/testing/selftests/drivers/net/hw/iou-zcrx.py
@@ -41,7 +41,9 @@ def set_flow_rule_rss(cfg, rss_ctx_id):
return int(values)
-def single(cfg):
+def single_no_flow(cfg):
+ """Like single() but without a flow rule."""
+
channels = cfg.ethnl.channels_get({'header': {'dev-index': cfg.ifindex}})
channels = channels['combined-count']
if channels < 2:
@@ -65,6 +67,9 @@ def single(cfg):
ethtool(f"-X {cfg.ifname} equal {cfg.target}")
defer(ethtool, f"-X {cfg.ifname} default")
+def single(cfg):
+ single_no_flow(cfg)
+
flow_rule_id = set_flow_rule(cfg)
defer(ethtool, f"-N {cfg.ifname} delete {flow_rule_id}")
@@ -130,6 +135,26 @@ def test_zcrx_oneshot(cfg, setup) -> None:
cmd(tx_cmd, host=cfg.remote)
+@ksft_variants([
+ KsftNamedVariant("single", single_no_flow),
+])
+def test_zcrx_notif_copy_fallback(cfg, setup) -> None:
+ """Test zcrx copy fallback notification.
+
+ Omits the flow rule so traffic arrives on non-zcrx queues as regular
+ pages, forcing the kernel copy-fallback path. Asserts that the
+ ZCRX_NOTIF_COPY notification CQE is delivered."""
+
+ cfg.require_ipver('6')
+
+ setup(cfg)
+ rx_cmd = f"{cfg.bin_local} -s -p {cfg.port} -i {cfg.ifname} -q {cfg.target} -n"
+ tx_cmd = f"{cfg.bin_remote} -c -h {cfg.addr_v['6']} -p {cfg.port} -l 12840"
+ with bkg(rx_cmd, exit_wait=True):
+ wait_port_listen(cfg.port, proto="tcp")
+ cmd(tx_cmd, host=cfg.remote)
+
+
def test_zcrx_large_chunks(cfg) -> None:
"""Test zcrx with large buffer chunks."""
@@ -157,6 +182,25 @@ def test_zcrx_large_chunks(cfg) -> None:
cmd(tx_cmd, host=cfg.remote)
+@ksft_variants([
+ KsftNamedVariant("single", single),
+])
+def test_zcrx_notif_no_buffers(cfg, setup) -> None:
+ """Test zcrx out-of-buffer notification.
+
+ Skips buffer refill so the pool is quickly exhausted, triggering
+ a ZCRX_NOTIF_NO_BUFFERS notification CQE."""
+
+ cfg.require_ipver('6')
+
+ setup(cfg)
+ rx_cmd = f"{cfg.bin_local} -s -p {cfg.port} -i {cfg.ifname} -q {cfg.target} -b -a 64"
+ tx_cmd = f"{cfg.bin_remote} -c -h {cfg.addr_v['6']} -p {cfg.port} -l 12840"
+ with bkg(rx_cmd, exit_wait=True):
+ wait_port_listen(cfg.port, proto="tcp")
+ cmd(tx_cmd, host=cfg.remote, fail=False)
+
+
def main() -> None:
with NetDrvEpEnv(__file__) as cfg:
cfg.bin_local = path.abspath(path.dirname(__file__) + "/../../../drivers/net/hw/iou-zcrx")
@@ -166,7 +210,8 @@ def main() -> None:
cfg.netnl = NetdevFamily()
cfg.port = rand_port()
ksft_run(globs=globals(), cases=[test_zcrx, test_zcrx_oneshot,
- test_zcrx_large_chunks], args=(cfg, ))
+ test_zcrx_large_chunks, test_zcrx_notif_copy_fallback,
+ test_zcrx_notif_no_buffers], args=(cfg, ))
ksft_exit()
--
2.53.0-Meta
* [PATCH v2 5/6] Documentation: networking: document zcrx notifications and statistics
From: Clément Léger @ 2026-05-18 15:35 UTC (permalink / raw)
To: io-uring, Pavel Begunkov, Jens Axboe
Cc: Clément Léger, linux-doc, linux-kernel, linux-kselftest,
netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan,
Vishwanath Seshagiri
In-Reply-To: <20260518153532.2835502-1-cleger@meta.com>
Document the zcrx notification system and shared-memory statistics
that were introduced to let userspace monitor zero-copy receive health.
The notification section covers the two notification types
(ZCRX_NOTIF_NO_BUFFERS, ZCRX_NOTIF_COPY), registration via
zcrx_notification_desc, and the fire-once / re-arm mechanism via
ZCRX_CTRL_ARM_NOTIFICATION. The statistics section covers the optional
shared-memory io_uring_zcrx_notif_stats structure placed in the refill
ring region, including how to query its layout via
IO_URING_QUERY_ZCRX_NOTIF.
Signed-off-by: Clément Léger <cleger@meta.com>
---
Documentation/networking/iou-zcrx.rst | 121 ++++++++++++++++++++++++++
1 file changed, 121 insertions(+)
diff --git a/Documentation/networking/iou-zcrx.rst b/Documentation/networking/iou-zcrx.rst
index 7f3f4b2e6cf2..442760a1ca03 100644
--- a/Documentation/networking/iou-zcrx.rst
+++ b/Documentation/networking/iou-zcrx.rst
@@ -196,6 +196,127 @@ Return buffers back to the kernel to be used again::
rqe->len = cqe->res;
IO_URING_WRITE_ONCE(*refill_ring.ktail, ++refill_ring.rq_tail);
+Notifications
+-------------
+
+When zero-copy receive encounters conditions that impact performance or
+functionality, the kernel can notify userspace via dedicated CQE notifications.
+The application must register a notification descriptor during
+``IORING_REGISTER_ZCRX_IFQ`` to receive them. Notifications are sent
+individually and are not batched with other CQEs. Each notification CQE reports
+a single notification in ``cqe->res``.
+
+Notification support can be detected by checking for ``ZCRX_FEATURE_NOTIFICATION``
+in the features bitmask returned by ``IO_URING_QUERY_ZCRX``.
+
+**Notification types**
+
+``ZCRX_NOTIF_NO_BUFFERS``
+ Fired when the page pool fails to allocate because the zcrx buffer area is
+ exhausted.
+
+``ZCRX_NOTIF_COPY``
+ Fired when a received fragment could not be delivered zero-copy and was
+ instead copied into a buffer.
+
+**Registering notifications**
+
+Allocate and fill a ``struct zcrx_notification_desc``::
+
+ struct zcrx_notification_desc notif = {
+ .user_data = MY_NOTIF_USER_DATA,
+ .type_mask = (1 << ZCRX_NOTIF_NO_BUFFERS) | (1 << ZCRX_NOTIF_COPY),
+ };
+
+ reg.notif_desc = (__u64)(unsigned long)&notif;
+
+``user_data`` is the value that will appear in the notification CQE's
+``user_data`` field. ``type_mask`` is a bitmask with bit ``1 << type`` set for
+each ``ZCRX_NOTIF_*`` type the application wants to receive.
+
+When a registered event occurs, the kernel posts a CQE with the specified
+``user_data`` and ``cqe->res`` set to the triggered notification type (one of
+the ``ZCRX_NOTIF_*`` values).
+
+**Rate limiting**
+
+Each notification type fires once until the application explicitly re-arms it.
+To re-arm, issue ``IORING_REGISTER_ZCRX_CTRL`` with
+``ZCRX_CTRL_ARM_NOTIFICATION``::
+
+ struct zcrx_ctrl ctrl = {
+ .zcrx_id = zcrx_id,
+ .op = ZCRX_CTRL_ARM_NOTIFICATION,
+ .zc_arm_notif = {
+ .notif_type = ZCRX_NOTIF_NO_BUFFERS,
+ },
+ };
+
+ io_uring_register(ring_fd, IORING_REGISTER_ZCRX_CTRL, &ctrl, 0);
+
+Only notification types that have previously fired can be re-armed.
+
+Notification statistics
+-----------------------
+
+In addition to CQE-based notifications, the kernel can maintain a shared-memory
+statistics structure that is updated on every relevant event. The stats are
+updated regardless of which notification types are set in ``type_mask``.
+
+The size of the statistics structure and the required alignment of its offset
+can be queried via ``IO_URING_QUERY_ZCRX_NOTIF``. The application must use them
+to allocate a region large enough to hold both the refill ring and the stats
+structure::
+
+ struct io_uring_query_zcrx_notif notif_query = {};
+ struct io_uring_query_hdr hdr = {
+ .query_op = IO_URING_QUERY_ZCRX_NOTIF,
+ .size = sizeof(notif_query),
+ .query_data = (__u64)(unsigned long)&notif_query,
+ };
+
+ io_uring_register(ring_fd, IORING_REGISTER_QUERY, &hdr, 1);
+
+ __u32 notif_stats_size = notif_query.notif_stats_size;
+ __u32 notif_stats_off_alignment = notif_query.notif_stats_off_alignment;
+
+To enable statistics, place the stats structure after the refill ring entries
+within the same mapped region, and set the ``ZCRX_NOTIF_DESC_FLAG_STATS`` flag
+in the notification descriptor::
+
+ /* Compute offset for the stats struct (after refill ring entries) */
+ size_t stats_offset = ALIGN_UP(ring_size, notif_stats_off_alignment);
+ ring_size = stats_offset + notif_stats_size;
+ ring_size = ALIGN_UP(ring_size, PAGE_SIZE);
+
+ /* Map the region with the extra space */
+ ring_ptr = mmap(NULL, ring_size, PROT_READ | PROT_WRITE,
+ MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
+
+ struct zcrx_notification_desc notif = {
+ .user_data = MY_NOTIF_USER_DATA,
+ .type_mask = 1 << ZCRX_NOTIF_COPY,
+ .flags = ZCRX_NOTIF_DESC_FLAG_STATS,
+ .stats_offset = stats_offset,
+ };
+
+The ``stats_offset`` must satisfy the alignment reported by
+``notif_stats_off_alignment`` and must point to a location within the mapped
+region that does not overlap with the refill ring header or entries.
+
+The application can read the counters at any time::
+
+ volatile struct io_uring_zcrx_notif_stats *stats =
+ (void *)((char *)ring_ptr + stats_offset);
+
+ printf("copy fallbacks: %llu (%llu bytes)\n",
+ IO_URING_READ_ONCE(stats->copy_count),
+ IO_URING_READ_ONCE(stats->copy_bytes));
+
+``copy_count`` is incremented each time a fragment is copied instead of being
+delivered via zero-copy. ``copy_bytes`` accumulates the total number of bytes
+copied.
+
Area chunking
-------------
--
2.53.0-Meta
* [PATCH v2 4/6] io_uring/zcrx: add shared-memory notification statistics
From: Clément Léger @ 2026-05-18 15:35 UTC (permalink / raw)
To: io-uring, Pavel Begunkov, Jens Axboe
Cc: Clément Léger, linux-doc, linux-kernel, linux-kselftest,
netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan,
Vishwanath Seshagiri
In-Reply-To: <20260518153532.2835502-1-cleger@meta.com>
Add support for an optional stats struct embedded in the refill queue
region, allowing userspace to monitor copy fallback in real time.
Userspace queries the stats struct size and offset alignment via
IO_URING_QUERY_ZCRX_NOTIF (notif_stats_size / notif_stats_off_alignment),
then sets ZCRX_NOTIF_DESC_FLAG_STATS and provides a stats_offset in
zcrx_notification_desc pointing to a location within the refill queue region.
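To make the constraints on stats_offset concrete, here is a userspace
restatement of the checks zcrx_validate_notif_stats() performs in this patch.
It is a sketch, not the kernel code: check_stats_offset() and
notif_stats_mirror are hypothetical names, the refill entry size is passed in
rather than taken from struct io_uring_zcrx_rqe, and the GCC/Clang builtin
__builtin_add_overflow() stands in for check_add_overflow().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of struct io_uring_zcrx_notif_stats. */
struct notif_stats_mirror {
	uint64_t copy_count;
	uint64_t copy_bytes;
};

/* Returns 0 if stats_off is acceptable, else -EINVAL or -ERANGE. */
static int check_stats_offset(size_t rqes_off, size_t rq_entries,
			      size_t rqe_size, size_t region_size,
			      size_t stats_off)
{
	size_t used, end;

	assert(rqe_size != 0);
	/* End of the refill ring header plus all refill entries. */
	used = rqes_off + rqe_size * rq_entries;

	if (stats_off % _Alignof(struct notif_stats_mirror))
		return -EINVAL;  /* misaligned counters */
	if (stats_off < used)
		return -ERANGE;  /* would overlap the refill ring */
	if (__builtin_add_overflow(stats_off,
				   sizeof(struct notif_stats_mirror), &end))
		return -ERANGE;  /* offset + size wraps around */
	if (end > region_size)
		return -ERANGE;  /* would run past the mapped region */
	return 0;
}
```

With a 64-byte header and 256 entries of 16 bytes, the first usable offset
is 4160 and the last one that still fits an 8192-byte region is 8176.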
The kernel updates the stats counters in place on every copy-fallback
event.
Signed-off-by: Clément Léger <cleger@meta.com>
---
include/uapi/linux/io_uring/query.h | 12 +++++++
include/uapi/linux/io_uring/zcrx.h | 15 ++++++--
io_uring/query.c | 16 +++++++++
io_uring/zcrx.c | 54 +++++++++++++++++++++++++++--
io_uring/zcrx.h | 1 +
5 files changed, 94 insertions(+), 4 deletions(-)
diff --git a/include/uapi/linux/io_uring/query.h b/include/uapi/linux/io_uring/query.h
index 95500759cc13..1a68eca7c6b4 100644
--- a/include/uapi/linux/io_uring/query.h
+++ b/include/uapi/linux/io_uring/query.h
@@ -23,6 +23,7 @@ enum {
IO_URING_QUERY_OPCODES = 0,
IO_URING_QUERY_ZCRX = 1,
IO_URING_QUERY_SCQ = 2,
+ IO_URING_QUERY_ZCRX_NOTIF = 3,
__IO_URING_QUERY_MAX,
};
@@ -62,6 +63,17 @@ struct io_uring_query_zcrx {
__u64 __resv2;
};
+struct io_uring_query_zcrx_notif {
+ /* Bitmask of supported ZCRX_NOTIF_* flags */
+ __u32 notif_flags;
+ /* Size of io_uring_zcrx_notif_stats */
+ __u32 notif_stats_size;
+ /* Required alignment for the stats struct within the region (i.e. stats_offset) */
+ __u32 notif_stats_off_alignment;
+ __u32 __resv1;
+ __u64 __resv2[4];
+};
+
struct io_uring_query_scq {
/* The SQ/CQ rings header size */
__u64 hdr_size;
diff --git a/include/uapi/linux/io_uring/zcrx.h b/include/uapi/linux/io_uring/zcrx.h
index 3f7b72b09878..384e185a180c 100644
--- a/include/uapi/linux/io_uring/zcrx.h
+++ b/include/uapi/linux/io_uring/zcrx.h
@@ -75,11 +75,22 @@ enum zcrx_notification_type {
__ZCRX_NOTIF_TYPE_LAST,
};
+enum zcrx_notification_desc_flags {
+ /* If set, stats_offset holds a valid offset to a notif_stats struct */
+ ZCRX_NOTIF_DESC_FLAG_STATS = 1 << 0,
+};
+
+struct io_uring_zcrx_notif_stats {
+ __u64 copy_count; /* cumulative copy-fallback CQEs */
+ __u64 copy_bytes; /* cumulative bytes copied */
+};
+
struct zcrx_notification_desc {
__u64 user_data;
__u32 type_mask;
- __u32 __resv1;
- __u64 __resv2[10];
+ __u32 flags; /* see enum zcrx_notification_desc_flags */
+ __u64 stats_offset; /* offset from the beginning of refill ring region for stats */
+ __u64 __resv2[9];
};
/*
diff --git a/io_uring/query.c b/io_uring/query.c
index c1704d088374..d17a83645bcd 100644
--- a/io_uring/query.c
+++ b/io_uring/query.c
@@ -9,6 +9,7 @@
union io_query_data {
struct io_uring_query_opcode opcodes;
struct io_uring_query_zcrx zcrx;
+ struct io_uring_query_zcrx_notif zcrx_notif;
struct io_uring_query_scq scq;
};
@@ -44,6 +45,18 @@ static ssize_t io_query_zcrx(union io_query_data *data)
return sizeof(*e);
}
+static ssize_t io_query_zcrx_notif(union io_query_data *data)
+{
+ struct io_uring_query_zcrx_notif *e = &data->zcrx_notif;
+
+ e->notif_flags = ZCRX_NOTIF_TYPE_MASK;
+ e->notif_stats_size = sizeof(struct io_uring_zcrx_notif_stats);
+ e->notif_stats_off_alignment = __alignof__(struct io_uring_zcrx_notif_stats);
+ e->__resv1 = 0;
+ memset(&e->__resv2, 0, sizeof(e->__resv2));
+ return sizeof(*e);
+}
+
static ssize_t io_query_scq(union io_query_data *data)
{
struct io_uring_query_scq *e = &data->scq;
@@ -83,6 +96,9 @@ static int io_handle_query_entry(union io_query_data *data, void __user *uhdr,
case IO_URING_QUERY_ZCRX:
ret = io_query_zcrx(data);
break;
+ case IO_URING_QUERY_ZCRX_NOTIF:
+ ret = io_query_zcrx_notif(data);
+ break;
case IO_URING_QUERY_SCQ:
ret = io_query_scq(data);
break;
diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index f31f2ca0f7ec..2881ad76bacc 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -415,6 +415,7 @@ static void io_free_rbuf_ring(struct io_zcrx_ifq *ifq)
io_free_region(ifq->user, &ifq->rq_region);
ifq->rq.ring = IO_URING_PTR_POISON;
ifq->rq.rqes = IO_URING_PTR_POISON;
+ ifq->notif_stats = IO_URING_PTR_POISON;
}
static void io_zcrx_free_area(struct io_zcrx_ifq *ifq,
@@ -855,6 +856,33 @@ static int zcrx_register_netdev(struct io_zcrx_ifq *ifq,
return ret;
}
+static int zcrx_validate_notif_stats(struct io_zcrx_ifq *ifq,
+ const struct io_uring_zcrx_ifq_reg *reg,
+ const struct zcrx_notification_desc *notif)
+{
+ size_t stats_off = notif->stats_offset;
+ size_t used, end;
+
+ used = reg->offsets.rqes +
+ sizeof(struct io_uring_zcrx_rqe) * reg->rq_entries;
+
+ if (!IS_ALIGNED(stats_off, __alignof__(struct io_uring_zcrx_notif_stats)))
+ return -EINVAL;
+ if (stats_off < used)
+ return -ERANGE;
+ if (check_add_overflow(stats_off,
+ sizeof(struct io_uring_zcrx_notif_stats),
+ &end))
+ return -ERANGE;
+ if (end > io_region_size(&ifq->rq_region))
+ return -ERANGE;
+
+ ifq->notif_stats = io_region_get_ptr(&ifq->rq_region) + stats_off;
+ memset(ifq->notif_stats, 0, sizeof(*ifq->notif_stats));
+
+ return 0;
+}
+
int io_register_zcrx(struct io_ring_ctx *ctx,
struct io_uring_zcrx_ifq_reg __user *arg)
{
@@ -908,7 +936,13 @@ int io_register_zcrx(struct io_ring_ctx *ctx,
return -EFAULT;
if (notif.type_mask & ~ZCRX_NOTIF_TYPE_MASK)
return -EINVAL;
- if (notif.__resv1 || !mem_is_zero(&notif.__resv2, sizeof(notif.__resv2)))
+ if (notif.flags & ~ZCRX_NOTIF_DESC_FLAG_STATS)
+ return -EINVAL;
+ if (!(notif.flags & ZCRX_NOTIF_DESC_FLAG_STATS)) {
+ if (notif.stats_offset)
+ return -EINVAL;
+ }
+ if (!mem_is_zero(&notif.__resv2, sizeof(notif.__resv2)))
return -EINVAL;
ifq = io_zcrx_ifq_alloc(ctx);
@@ -939,6 +973,12 @@ int io_register_zcrx(struct io_ring_ctx *ctx,
if (ret)
goto err;
+ if (notif.flags & ZCRX_NOTIF_DESC_FLAG_STATS) {
+ ret = zcrx_validate_notif_stats(ifq, &reg, &notif);
+ if (ret)
+ goto err;
+ }
+
ifq->kern_readable = !(area.flags & IORING_ZCRX_AREA_DMABUF);
if (!(reg.flags & ZCRX_REG_NODEV)) {
@@ -1154,6 +1194,11 @@ static void zcrx_notif_tw(struct io_tw_req tw_req, io_tw_token_t tw)
kmem_cache_free(req_cachep, req);
}
+static void zcrx_stat_add(__u64 *p, s64 v)
+{
+ WRITE_ONCE(*p, READ_ONCE(*p) + v);
+}
+
static void zcrx_send_notif(struct io_zcrx_ifq *ifq, unsigned type)
{
gfp_t gfp = GFP_ATOMIC | __GFP_NOWARN | __GFP_ZERO;
@@ -1537,8 +1582,13 @@ static int io_zcrx_copy_frag(struct io_kiocb *req, struct io_zcrx_ifq *ifq,
int ret;
ret = io_zcrx_copy_chunk(req, ifq, page, off + skb_frag_off(frag), len);
- if (ret > 0)
+ if (ret > 0) {
+ if (ifq->notif_stats) {
+ zcrx_stat_add(&ifq->notif_stats->copy_count, 1);
+ zcrx_stat_add(&ifq->notif_stats->copy_bytes, ret);
+ }
zcrx_send_notif(ifq, ZCRX_NOTIF_COPY);
+ }
return ret;
}
diff --git a/io_uring/zcrx.h b/io_uring/zcrx.h
index 203b3049e14b..e1aab76c310d 100644
--- a/io_uring/zcrx.h
+++ b/io_uring/zcrx.h
@@ -81,6 +81,7 @@ struct io_zcrx_ifq {
u32 allowed_notif_mask;
u32 fired_notifs;
u64 notif_data;
+ struct io_uring_zcrx_notif_stats *notif_stats;
};
#if defined(CONFIG_IO_URING_ZCRX)
--
2.53.0-Meta
* [PATCH v2 2/6] io_uring/zcrx: notify user when out of buffers
From: Clément Léger @ 2026-05-18 15:35 UTC (permalink / raw)
To: io-uring, Pavel Begunkov, Jens Axboe
Cc: linux-doc, linux-kernel, linux-kselftest, netdev, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
Jonathan Corbet, Shuah Khan, Vishwanath Seshagiri,
Vishwanath Seshagiri
In-Reply-To: <20260518153532.2835502-1-cleger@meta.com>
From: Pavel Begunkov <asml.silence@gmail.com>
There is currently no easy way for the user to know when zcrx is out of
buffers and the page pool fails to allocate. Add uapi for zcrx to
communicate this back.
It's implemented as a separate CQE, which for now is posted to the creator
ctx. To use it, userspace needs to pass an instance of struct
zcrx_notification_desc at registration time, which tells the kernel the
user_data for resulting CQEs and which event types are expected / allowed.
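A sketch of what the descriptor looks like from both sides, using local
mirrors of the uapi added here (field layout copied from the diff;
notif_desc_mirror, fill_desc() and validate_desc() are hypothetical names).
Note that type_mask holds one bit per type, 1 << ZCRX_NOTIF_*, matching how
zcrx_send_notif() tests it.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Local mirrors of the uapi values added by this patch. */
enum { NOTIF_NO_BUFFERS = 0, NOTIF_TYPE_LAST };
#define NOTIF_TYPE_MASK (1U << NOTIF_NO_BUFFERS)

struct notif_desc_mirror {
	uint64_t user_data;
	uint32_t type_mask;   /* bit (1 << type) per wanted notification type */
	uint32_t resv1;
	uint64_t resv2[10];
};

/* Userspace side: request out-of-buffer notifications tagged user_data. */
static void fill_desc(struct notif_desc_mirror *d, uint64_t user_data)
{
	assert(d != NULL);
	memset(d, 0, sizeof(*d));     /* reserved fields must be zero */
	d->user_data = user_data;
	d->type_mask = 1U << NOTIF_NO_BUFFERS;
}

/* Kernel side: the checks io_register_zcrx() applies to the descriptor. */
static int validate_desc(const struct notif_desc_mirror *d)
{
	static const uint64_t zero[10];

	if (d->type_mask & ~NOTIF_TYPE_MASK)
		return -EINVAL;           /* unknown notification type */
	if (d->resv1 || memcmp(d->resv2, zero, sizeof(zero)))
		return -EINVAL;           /* reserved space not zeroed */
	return 0;
}
```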
When an allowed event happens, zcrx will post a CQE containing the
specified user_data, with cqe->res set to the event type. Before the
kernel can post another notification of the given type, the user needs
to acknowledge that it processed the previous one by issuing
IORING_REGISTER_ZCRX_CTRL with ZCRX_CTRL_ARM_NOTIFICATION.
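The fire-once / re-arm rule can be modelled as a small bitmask state
machine. This is a simplified sketch of the logic in zcrx_send_notif() and
zcrx_arm_notif(): struct notif_state and the helper names are hypothetical,
ctx_lock is omitted, and the real code only marks a type as fired once the
notification request has been allocated.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified per-ifq notification state. */
struct notif_state {
	uint32_t allowed_mask; /* type_mask from registration */
	uint32_t fired;        /* types posted but not yet re-armed */
};

/* Returns true if a CQE would be posted for this event. */
static bool notif_fire(struct notif_state *s, unsigned int type)
{
	uint32_t bit;

	assert(type < 32);
	bit = 1U << type;
	if (!(bit & s->allowed_mask))
		return false;      /* user did not ask for this type */
	if (bit & s->fired)
		return false;      /* already posted, waiting for re-arm */
	s->fired |= bit;
	return true;
}

/* Re-arm a type; only a previously fired type can be re-armed. */
static int notif_arm(struct notif_state *s, unsigned int type,
		     unsigned int last)
{
	uint32_t bit;

	if (type >= last)
		return -EINVAL;    /* unknown type */
	bit = 1U << type;
	if (!(bit & s->fired))
		return -EINVAL;    /* nothing pending for this type */
	s->fired &= ~bit;
	return 0;
}
```

Because a fired bit is only cleared by an explicit re-arm, a burst of
allocation failures yields a single CQE rather than one per failure.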
The only notification type the patch implements is
ZCRX_NOTIF_NO_BUFFERS, but we'll need more of them in the future.
Co-developed-by: Vishwanath Seshagiri <vishs@meta.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
include/uapi/linux/io_uring/zcrx.h | 24 ++++++++-
io_uring/io_uring.c | 2 +-
io_uring/io_uring.h | 1 +
io_uring/zcrx.c | 86 +++++++++++++++++++++++++++++-
io_uring/zcrx.h | 7 ++-
5 files changed, 115 insertions(+), 5 deletions(-)
diff --git a/include/uapi/linux/io_uring/zcrx.h b/include/uapi/linux/io_uring/zcrx.h
index 5ce02c7a6096..67185566ad3c 100644
--- a/include/uapi/linux/io_uring/zcrx.h
+++ b/include/uapi/linux/io_uring/zcrx.h
@@ -65,6 +65,20 @@ enum zcrx_features {
* value in struct io_uring_zcrx_ifq_reg::rx_buf_len.
*/
ZCRX_FEATURE_RX_PAGE_SIZE = 1 << 0,
+ ZCRX_FEATURE_NOTIFICATION = 1 << 1,
+};
+
+enum zcrx_notification_type {
+ ZCRX_NOTIF_NO_BUFFERS,
+
+ __ZCRX_NOTIF_TYPE_LAST,
+};
+
+struct zcrx_notification_desc {
+ __u64 user_data;
+ __u32 type_mask;
+ __u32 __resv1;
+ __u64 __resv2[10];
};
/*
@@ -82,12 +96,14 @@ struct io_uring_zcrx_ifq_reg {
struct io_uring_zcrx_offsets offsets;
__u32 zcrx_id;
__u32 rx_buf_len;
- __u64 __resv[3];
+ __u64 notif_desc; /* see struct zcrx_notification_desc */
+ __u64 __resv[2];
};
enum zcrx_ctrl_op {
ZCRX_CTRL_FLUSH_RQ,
ZCRX_CTRL_EXPORT,
+ ZCRX_CTRL_ARM_NOTIFICATION,
__ZCRX_CTRL_LAST,
};
@@ -101,6 +117,11 @@ struct zcrx_ctrl_export {
__u32 __resv1[11];
};
+struct zcrx_ctrl_arm_notif {
+ __u32 notif_type;
+ __u32 __resv[11];
+};
+
struct zcrx_ctrl {
__u32 zcrx_id;
__u32 op; /* see enum zcrx_ctrl_op */
@@ -109,6 +130,7 @@ struct zcrx_ctrl {
union {
struct zcrx_ctrl_export zc_export;
struct zcrx_ctrl_flush_rq zc_flush;
+ struct zcrx_ctrl_arm_notif zc_arm_notif;
};
};
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 2ebb0ba37c4f..c5972274cce1 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -160,7 +160,7 @@ static void io_poison_cached_req(struct io_kiocb *req)
req->apoll = IO_URING_PTR_POISON;
}
-static void io_poison_req(struct io_kiocb *req)
+void io_poison_req(struct io_kiocb *req)
{
io_poison_cached_req(req);
req->async_data = IO_URING_PTR_POISON;
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index e612a66ee80e..de0a3bed58d1 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -213,6 +213,7 @@ bool __io_alloc_req_refill(struct io_ring_ctx *ctx);
void io_activate_pollwq(struct io_ring_ctx *ctx);
void io_restriction_clone(struct io_restriction *dst, struct io_restriction *src);
+void io_poison_req(struct io_kiocb *req);
static inline void io_lockdep_assert_cq_locked(struct io_ring_ctx *ctx)
{
diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index 34faf90423f4..463fbaead35b 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -768,6 +768,8 @@ static int import_zcrx(struct io_ring_ctx *ctx,
return -EINVAL;
if (reg->if_rxq || reg->rq_entries || reg->area_ptr || reg->region_ptr)
return -EINVAL;
+ if (reg->notif_desc)
+ return -EINVAL;
if (reg->flags & ~ZCRX_REG_IMPORT)
return -EINVAL;
@@ -856,6 +858,7 @@ static int zcrx_register_netdev(struct io_zcrx_ifq *ifq,
int io_register_zcrx(struct io_ring_ctx *ctx,
struct io_uring_zcrx_ifq_reg __user *arg)
{
+ struct zcrx_notification_desc notif;
struct io_uring_zcrx_area_reg area;
struct io_uring_zcrx_ifq_reg reg;
struct io_uring_region_desc rd;
@@ -899,10 +902,22 @@ int io_register_zcrx(struct io_ring_ctx *ctx,
if (copy_from_user(&area, u64_to_user_ptr(reg.area_ptr), sizeof(area)))
return -EFAULT;
+ memset(&notif, 0, sizeof(notif));
+ if (reg.notif_desc && copy_from_user(&notif, u64_to_user_ptr(reg.notif_desc),
+ sizeof(notif)))
+ return -EFAULT;
+ if (notif.type_mask & ~ZCRX_NOTIF_TYPE_MASK)
+ return -EINVAL;
+ if (notif.__resv1 || !mem_is_zero(&notif.__resv2, sizeof(notif.__resv2)))
+ return -EINVAL;
+
ifq = io_zcrx_ifq_alloc(ctx);
if (!ifq)
return -ENOMEM;
+ ifq->notif_data = notif.user_data;
+ ifq->allowed_notif_mask = notif.type_mask;
+
if (ctx->user) {
get_uid(ctx->user);
ifq->user = ctx->user;
@@ -954,7 +969,8 @@ int io_register_zcrx(struct io_ring_ctx *ctx,
goto err;
}
- zcrx_set_ring_ctx(ifq, ctx);
+ if (notif.type_mask)
+ zcrx_set_ring_ctx(ifq, ctx);
return 0;
err:
scoped_guard(mutex, &ctx->mmap_lock)
@@ -1127,6 +1143,48 @@ static unsigned io_zcrx_refill_slow(struct page_pool *pp, struct io_zcrx_ifq *if
return allocated;
}
+static void zcrx_notif_tw(struct io_tw_req tw_req, io_tw_token_t tw)
+{
+ struct io_kiocb *req = tw_req.req;
+ struct io_ring_ctx *ctx = req->ctx;
+
+ io_post_aux_cqe(ctx, req->cqe.user_data, req->cqe.res, 0);
+ percpu_ref_put(&ctx->refs);
+ io_poison_req(req);
+ kmem_cache_free(req_cachep, req);
+}
+
+static void zcrx_send_notif(struct io_zcrx_ifq *ifq, unsigned type)
+{
+ gfp_t gfp = GFP_ATOMIC | __GFP_NOWARN | __GFP_ZERO;
+ u32 type_mask = 1 << type;
+ struct io_kiocb *req;
+
+ if (!(type_mask & ifq->allowed_notif_mask))
+ return;
+
+ guard(spinlock_bh)(&ifq->ctx_lock);
+ if (!ifq->master_ctx)
+ return;
+ if (type_mask & ifq->fired_notifs)
+ return;
+
+ req = kmem_cache_alloc(req_cachep, gfp);
+ if (unlikely(!req))
+ return;
+
+ ifq->fired_notifs |= type_mask;
+
+ req->opcode = IORING_OP_NOP;
+ req->cqe.user_data = ifq->notif_data;
+ req->cqe.res = type;
+ req->ctx = ifq->master_ctx;
+ percpu_ref_get(&req->ctx->refs);
+ req->tctx = NULL;
+ req->io_task_work.func = zcrx_notif_tw;
+ io_req_task_work_add(req);
+}
+
static netmem_ref io_pp_zc_alloc_netmems(struct page_pool *pp, gfp_t gfp)
{
struct io_zcrx_ifq *ifq = io_pp_to_ifq(pp);
@@ -1143,8 +1201,10 @@ static netmem_ref io_pp_zc_alloc_netmems(struct page_pool *pp, gfp_t gfp)
goto out_return;
allocated = io_zcrx_refill_slow(pp, ifq, netmems, to_alloc);
- if (!allocated)
+ if (!allocated) {
+ zcrx_send_notif(ifq, ZCRX_NOTIF_NO_BUFFERS);
return 0;
+ }
out_return:
zcrx_sync_for_device(pp, ifq, netmems, allocated);
allocated--;
@@ -1293,12 +1353,32 @@ static int zcrx_flush_rq(struct io_ring_ctx *ctx, struct io_zcrx_ifq *zcrx,
return 0;
}
+static int zcrx_arm_notif(struct io_ring_ctx *ctx, struct io_zcrx_ifq *zcrx,
+ struct zcrx_ctrl *ctrl)
+{
+ const struct zcrx_ctrl_arm_notif *an = &ctrl->zc_arm_notif;
+ unsigned type_mask;
+
+ if (an->notif_type >= __ZCRX_NOTIF_TYPE_LAST)
+ return -EINVAL;
+ if (!mem_is_zero(&an->__resv, sizeof(an->__resv)))
+ return -EINVAL;
+
+ guard(spinlock_bh)(&zcrx->ctx_lock);
+ type_mask = 1U << an->notif_type;
+ if (type_mask & ~zcrx->fired_notifs)
+ return -EINVAL;
+ zcrx->fired_notifs &= ~type_mask;
+ return 0;
+}
+
int io_zcrx_ctrl(struct io_ring_ctx *ctx, void __user *arg, unsigned nr_args)
{
struct zcrx_ctrl ctrl;
struct io_zcrx_ifq *zcrx;
BUILD_BUG_ON(sizeof(ctrl.zc_export) != sizeof(ctrl.zc_flush));
+ BUILD_BUG_ON(sizeof(ctrl.zc_export) != sizeof(ctrl.zc_arm_notif));
if (nr_args)
return -EINVAL;
@@ -1316,6 +1396,8 @@ int io_zcrx_ctrl(struct io_ring_ctx *ctx, void __user *arg, unsigned nr_args)
return zcrx_flush_rq(ctx, zcrx, &ctrl);
case ZCRX_CTRL_EXPORT:
return zcrx_export(ctx, zcrx, &ctrl, arg);
+ case ZCRX_CTRL_ARM_NOTIFICATION:
+ return zcrx_arm_notif(ctx, zcrx, &ctrl);
}
return -EOPNOTSUPP;
diff --git a/io_uring/zcrx.h b/io_uring/zcrx.h
index 6b565d0bf6da..cca10d0d02ac 100644
--- a/io_uring/zcrx.h
+++ b/io_uring/zcrx.h
@@ -9,7 +9,9 @@
#include <net/net_trackers.h>
#define ZCRX_SUPPORTED_REG_FLAGS (ZCRX_REG_IMPORT | ZCRX_REG_NODEV)
-#define ZCRX_FEATURES (ZCRX_FEATURE_RX_PAGE_SIZE)
+#define ZCRX_FEATURES (ZCRX_FEATURE_RX_PAGE_SIZE |\
+ ZCRX_FEATURE_NOTIFICATION)
+#define ZCRX_NOTIF_TYPE_MASK (1U << ZCRX_NOTIF_NO_BUFFERS)
struct io_zcrx_mem {
unsigned long size;
@@ -76,6 +78,9 @@ struct io_zcrx_ifq {
spinlock_t ctx_lock;
struct io_ring_ctx *master_ctx;
+ u32 allowed_notif_mask;
+ u32 fired_notifs;
+ u64 notif_data;
};
#if defined(CONFIG_IO_URING_ZCRX)
--
2.53.0-Meta
* [PATCH v2 3/6] io_uring/zcrx: notify user on frag copy fallback
From: Clément Léger @ 2026-05-18 15:35 UTC (permalink / raw)
To: io-uring, Pavel Begunkov, Jens Axboe
Cc: Clément Léger, linux-doc, linux-kernel, linux-kselftest,
netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan,
Vishwanath Seshagiri
In-Reply-To: <20260518153532.2835502-1-cleger@meta.com>
Add a ZCRX_NOTIF_COPY notification type to signal userspace when a
received fragment could not be delivered using zero-copy and was
instead copied into a buffer.
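For illustration only, a userspace CQE loop might tell the two notification
types apart by cqe->res, which zcrx_send_notif() sets to the type value. The
cqe_mirror struct, the copied enum values, classify() and the suggested
reactions are all hypothetical; a real program would read struct io_uring_cqe.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from enum zcrx_notification_type. */
enum { NOTIF_NO_BUFFERS = 0, NOTIF_COPY = 1 };

/* Stand-in for the two io_uring_cqe fields used here. */
struct cqe_mirror {
	uint64_t user_data;
	int32_t res;
};

enum action {
	ACT_NOT_NOTIF,      /* regular CQE, handle elsewhere */
	ACT_REFILL,         /* out of buffers: return buffers sooner */
	ACT_INSPECT_COPY,   /* copy fallback: e.g. check flow steering */
	ACT_UNKNOWN,        /* type this program does not know about */
};

static enum action classify(const struct cqe_mirror *cqe,
			    uint64_t notif_user_data)
{
	if (cqe->user_data != notif_user_data)
		return ACT_NOT_NOTIF;
	switch (cqe->res) {
	case NOTIF_NO_BUFFERS:
		return ACT_REFILL;
	case NOTIF_COPY:
		return ACT_INSPECT_COPY;
	default:
		return ACT_UNKNOWN;
	}
}
```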
Signed-off-by: Clément Léger <cleger@meta.com>
---
include/uapi/linux/io_uring/zcrx.h | 1 +
io_uring/zcrx.c | 7 ++++++-
io_uring/zcrx.h | 2 +-
3 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/include/uapi/linux/io_uring/zcrx.h b/include/uapi/linux/io_uring/zcrx.h
index 67185566ad3c..3f7b72b09878 100644
--- a/include/uapi/linux/io_uring/zcrx.h
+++ b/include/uapi/linux/io_uring/zcrx.h
@@ -70,6 +70,7 @@ enum zcrx_features {
enum zcrx_notification_type {
ZCRX_NOTIF_NO_BUFFERS,
+ ZCRX_NOTIF_COPY,
__ZCRX_NOTIF_TYPE_LAST,
};
diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index 463fbaead35b..f31f2ca0f7ec 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -1534,8 +1534,13 @@ static int io_zcrx_copy_frag(struct io_kiocb *req, struct io_zcrx_ifq *ifq,
const skb_frag_t *frag, int off, int len)
{
struct page *page = skb_frag_page(frag);
+ int ret;
+
+ ret = io_zcrx_copy_chunk(req, ifq, page, off + skb_frag_off(frag), len);
+ if (ret > 0)
+ zcrx_send_notif(ifq, ZCRX_NOTIF_COPY);
- return io_zcrx_copy_chunk(req, ifq, page, off + skb_frag_off(frag), len);
+ return ret;
}
static int io_zcrx_recv_frag(struct io_kiocb *req, struct io_zcrx_ifq *ifq,
diff --git a/io_uring/zcrx.h b/io_uring/zcrx.h
index cca10d0d02ac..203b3049e14b 100644
--- a/io_uring/zcrx.h
+++ b/io_uring/zcrx.h
@@ -11,7 +11,7 @@
#define ZCRX_SUPPORTED_REG_FLAGS (ZCRX_REG_IMPORT | ZCRX_REG_NODEV)
#define ZCRX_FEATURES (ZCRX_FEATURE_RX_PAGE_SIZE |\
ZCRX_FEATURE_NOTIFICATION)
-#define ZCRX_NOTIF_TYPE_MASK (1U << ZCRX_NOTIF_NO_BUFFERS)
+#define ZCRX_NOTIF_TYPE_MASK ((1U << ZCRX_NOTIF_NO_BUFFERS) | (1U << ZCRX_NOTIF_COPY))
struct io_zcrx_mem {
unsigned long size;
--
2.53.0-Meta
^ permalink raw reply related
* [PATCH v2 1/6] io_uring/zcrx: add ctx pointer to zcrx
From: Clément Léger @ 2026-05-18 15:35 UTC (permalink / raw)
To: io-uring, Pavel Begunkov, Jens Axboe
Cc: linux-doc, linux-kernel, linux-kselftest, netdev, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
Jonathan Corbet, Shuah Khan, Vishwanath Seshagiri,
Vishwanath Seshagiri
In-Reply-To: <20260518153532.2835502-1-cleger@meta.com>
From: Pavel Begunkov <asml.silence@gmail.com>
zcrx will need to have a pointer to an owning ctx to communicate
different events. Reference the ctx while it's attached to zcrx, and
rely on zcrx termination to drop the ctx to avoid circular ref deps.
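A userspace sketch of the set-once owner pattern that zcrx_set_ring_ctx() and the guarded block in zcrx_unregister_user() implement below. The kernel uses a spinlock_bh and a percpu_ref; here a pthread mutex and a plain int stand in for them, and the struct and function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for io_ring_ctx and io_zcrx_ifq. */
struct ring_ctx {
	int refs;			/* stands in for ctx->refs (a percpu_ref) */
};

struct zcrx {
	pthread_mutex_t ctx_lock;	/* stands in for spinlock_t ctx_lock */
	struct ring_ctx *master_ctx;
};

/* Attach ctx only if no owner is set yet, taking a reference while attached. */
static bool zcrx_set_owner(struct zcrx *z, struct ring_ctx *ctx)
{
	bool attached = false;

	pthread_mutex_lock(&z->ctx_lock);
	if (!z->master_ctx) {
		ctx->refs++;
		z->master_ctx = ctx;
		attached = true;
	}
	pthread_mutex_unlock(&z->ctx_lock);
	return attached;
}

/* Detach and drop the reference only if ctx is the current owner; NULL never matches. */
static void zcrx_drop_owner(struct zcrx *z, struct ring_ctx *ctx)
{
	pthread_mutex_lock(&z->ctx_lock);
	if (ctx && z->master_ctx == ctx) {
		z->master_ctx = NULL;
		ctx->refs--;
	}
	pthread_mutex_unlock(&z->ctx_lock);
}
```

Passing NULL, as zcrx_box_release() and the export error path do, leaves the owner and its reference untouched.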
Co-developed-by: Vishwanath Seshagiri <vishs@meta.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
io_uring/zcrx.c | 39 +++++++++++++++++++++++++++++++--------
io_uring/zcrx.h | 3 +++
2 files changed, 34 insertions(+), 8 deletions(-)
diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index 3f9632e7790a..34faf90423f4 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -44,6 +44,17 @@ static inline struct io_zcrx_area *io_zcrx_iov_to_area(const struct net_iov *nio
return container_of(owner, struct io_zcrx_area, nia);
}
+static bool zcrx_set_ring_ctx(struct io_zcrx_ifq *zcrx,
+ struct io_ring_ctx *ctx)
+{
+ guard(spinlock_bh)(&zcrx->ctx_lock);
+ if (zcrx->master_ctx)
+ return false;
+ percpu_ref_get(&ctx->refs);
+ zcrx->master_ctx = ctx;
+ return true;
+}
+
static inline struct page *io_zcrx_iov_page(const struct net_iov *niov)
{
struct io_zcrx_area *area = io_zcrx_iov_to_area(niov);
@@ -531,6 +542,7 @@ static struct io_zcrx_ifq *io_zcrx_ifq_alloc(struct io_ring_ctx *ctx)
return NULL;
ifq->if_rxq = -1;
+ spin_lock_init(&ifq->ctx_lock);
spin_lock_init(&ifq->rq.lock);
mutex_init(&ifq->pp_lock);
refcount_set(&ifq->refs, 1);
@@ -580,6 +592,8 @@ static void io_zcrx_ifq_free(struct io_zcrx_ifq *ifq)
return;
if (WARN_ON_ONCE(ifq->netdev != NULL))
return;
+ if (WARN_ON_ONCE(ifq->master_ctx))
+ return;
if (ifq->area)
io_zcrx_free_area(ifq, ifq->area);
@@ -656,17 +670,24 @@ static void io_zcrx_scrub(struct io_zcrx_ifq *ifq)
}
}
-static void zcrx_unregister_user(struct io_zcrx_ifq *ifq)
+static void zcrx_unregister_user(struct io_zcrx_ifq *ifq, struct io_ring_ctx *ctx)
{
+ scoped_guard(spinlock_bh, &ifq->ctx_lock) {
+ if (ctx && ifq->master_ctx == ctx) {
+ ifq->master_ctx = NULL;
+ percpu_ref_put(&ctx->refs);
+ }
+ }
+
if (refcount_dec_and_test(&ifq->user_refs)) {
io_close_queue(ifq);
io_zcrx_scrub(ifq);
}
}
-static void zcrx_unregister(struct io_zcrx_ifq *ifq)
+static void zcrx_unregister(struct io_zcrx_ifq *ifq, struct io_ring_ctx *ctx)
{
- zcrx_unregister_user(ifq);
+ zcrx_unregister_user(ifq, ctx);
io_put_zcrx_ifq(ifq);
}
@@ -686,7 +707,7 @@ static int zcrx_box_release(struct inode *inode, struct file *file)
if (WARN_ON_ONCE(!ifq))
return -EFAULT;
- zcrx_unregister(ifq);
+ zcrx_unregister(ifq, NULL);
return 0;
}
@@ -711,7 +732,7 @@ static int zcrx_export(struct io_ring_ctx *ctx, struct io_zcrx_ifq *ifq,
file = anon_inode_create_getfile("[zcrx]", &zcrx_box_fops,
ifq, O_CLOEXEC, NULL);
if (IS_ERR(file)) {
- zcrx_unregister(ifq);
+ zcrx_unregister(ifq, NULL);
return PTR_ERR(file);
}
@@ -787,7 +808,7 @@ static int import_zcrx(struct io_ring_ctx *ctx,
scoped_guard(mutex, &ctx->mmap_lock)
xa_erase(&ctx->zcrx_ctxs, id);
err:
- zcrx_unregister(ifq);
+ zcrx_unregister(ifq, ctx);
return ret;
}
@@ -932,12 +953,14 @@ int io_register_zcrx(struct io_ring_ctx *ctx,
ret = -EFAULT;
goto err;
}
+
+ zcrx_set_ring_ctx(ifq, ctx);
return 0;
err:
scoped_guard(mutex, &ctx->mmap_lock)
xa_erase(&ctx->zcrx_ctxs, id);
ifq_free:
- zcrx_unregister(ifq);
+ zcrx_unregister(ifq, ctx);
return ret;
}
@@ -967,7 +990,7 @@ void io_terminate_zcrx(struct io_ring_ctx *ctx)
break;
set_zcrx_entry_mark(ctx, id);
id++;
- zcrx_unregister_user(ifq);
+ zcrx_unregister_user(ifq, ctx);
}
}
diff --git a/io_uring/zcrx.h b/io_uring/zcrx.h
index 9e1a6a1b11e8..6b565d0bf6da 100644
--- a/io_uring/zcrx.h
+++ b/io_uring/zcrx.h
@@ -73,6 +73,9 @@ struct io_zcrx_ifq {
*/
struct mutex pp_lock;
struct io_mapped_region rq_region;
+
+ spinlock_t ctx_lock;
+ struct io_ring_ctx *master_ctx;
};
#if defined(CONFIG_IO_URING_ZCRX)
--
2.53.0-Meta
^ permalink raw reply related
* [PATCH v2 0/6] io_uring/zcrx: add CQE based notifications and stats reporting
From: Clément Léger @ 2026-05-18 15:35 UTC (permalink / raw)
To: io-uring, Pavel Begunkov, Jens Axboe
Cc: Clément Léger, linux-doc, linux-kernel, linux-kselftest,
netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan,
Vishwanath Seshagiri
The zcrx path can encounter various conditions that lead to internal
fallbacks or errors. These can have a large impact on performance
and functionality, but they are not currently reported to the user,
who is therefore unable to take action.
This series addresses this problem by adding a new notification system
paired with a statistics structure. The notification system currently
reports out-of-buffer conditions and packets that fall back to copy. The
statistics structure reports the number and total size of packets that
were copied rather than received via the zero-copy path.
The out-of-buffer notification allows the user to adjust the buffer
sizing when registering zcrx support for the ifq. Future work could
allow the user to add more memory to the pool on the fly so the page
allocator doesn't run out of memory.
This series can be tested using the included kselftest modification and
the liburing series that updates headers and tests/examples to use
notifications and statistics.
Changes in v2:
- Rebase on top of Pavel's branch that now uses a single CQE per notif
- Change notification mask to type (i.e. one CQE per event)
- Use a type rather than a mask for rearm as well
- Update tests to use a single type
- Update documentation to state that notif CQEs are sent for a single
event
- Fix zero init of zcrx_query_notif __resv field
- Rename resv1 to __resv1
- Reduce __resv2 size to match io_uring_query_opcode size
- Verify that stats_offset is 0 if FLAG_STATS is zero
- Add zcrx notif query sequence to documentation
- Add _copy_fallback to test name
---
Clément Léger (4):
io_uring/zcrx: notify user on frag copy fallback
io_uring/zcrx: add shared-memory notification statistics
Documentation: networking: document zcrx notifications and statistics
selftests: iou-zcrx: add notification and stats test for zcrx
Pavel Begunkov (2):
io_uring/zcrx: add ctx pointer to zcrx
io_uring/zcrx: notify user when out of buffers
Documentation/networking/iou-zcrx.rst | 121 ++++++++++++
include/uapi/linux/io_uring/query.h | 12 ++
include/uapi/linux/io_uring/zcrx.h | 36 +++-
io_uring/io_uring.c | 2 +-
io_uring/io_uring.h | 1 +
io_uring/query.c | 16 ++
io_uring/zcrx.c | 180 +++++++++++++++++-
io_uring/zcrx.h | 11 +-
.../selftests/drivers/net/hw/iou-zcrx.c | 114 ++++++++++-
.../selftests/drivers/net/hw/iou-zcrx.py | 49 ++++-
10 files changed, 517 insertions(+), 25 deletions(-)
--
Clément Léger
^ permalink raw reply
* Re: [PATCH 0/2] docs: iio: update dated triggered buffer example
From: Jonathan Cameron @ 2026-05-18 15:33 UTC (permalink / raw)
To: David Lechner
Cc: Jonathan Corbet, Shuah Khan, Nuno Sá, Andy Shevchenko,
linux-doc, linux-kernel, linux-iio
In-Reply-To: <20260517-iio-doc-triggered-buffer-update-helpers-v1-0-7f00d4188f6f@baylibre.com>
On Sun, 17 May 2026 12:00:57 -0500
David Lechner <dlechner@baylibre.com> wrote:
> Noticed this example was out of date while grepping for something else.
> And when I ran get_maintainer.pl on it, it didn't match the IIO
> subsystem, so we get a bonus patch to fix that too.
>
> Signed-off-by: David Lechner <dlechner@baylibre.com>
> ---
> David Lechner (2):
> MAINTAINERS: add match for IIO API docs
Well I suppose we 'should' maintain those :)
> docs: iio: triggered-buffers: use new helpers in example
Series applied.
Thanks,
Jonathan
>
> Documentation/driver-api/iio/triggered-buffers.rst | 8 ++++----
> MAINTAINERS | 1 +
> 2 files changed, 5 insertions(+), 4 deletions(-)
> ---
> base-commit: 8678fb54958893818ddeccd05fea560a4e1fc759
> change-id: 20260517-iio-doc-triggered-buffer-update-helpers-ef7e3895c9f4
>
> Best regards,
> --
> David Lechner <dlechner@baylibre.com>
>
^ permalink raw reply
* Re: [PATCH] nios2: remove the architecture
From: Jonathan Cameron @ 2026-05-18 15:29 UTC (permalink / raw)
To: Ethan Nelson-Moore
Cc: linux-doc, devicetree, workflows, linux-arch, dmaengine,
linux-i2c, linux-iio, netdev, linux-pci, linux-pwm,
linux-hardening, linux-kbuild, linux-csky, Jonathan Corbet,
Shuah Khan, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Daniel Lezcano, Thomas Gleixner, Alex Shi, Yanteng Si,
Dongliang Mu, Hu Haowen, Dinh Nguyen, Kees Cook, Oleg Nesterov,
Will Deacon, Aneesh Kumar K.V, Andrew Morton, Nick Piggin,
Peter Zijlstra, Vinod Koul, Frank Li, Dave Penkler, Andi Shyti,
David Lechner, Nuno Sá, Andy Shevchenko, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Lorenzo Pieralisi, Krzysztof Wilczyński
In-Reply-To: <20260518042833.272221-1-enelsonmoore@gmail.com>
On Sun, 17 May 2026 21:28:33 -0700
Ethan Nelson-Moore <enelsonmoore@gmail.com> wrote:
> The Nios II architecture is a soft-core architecture developed by
> Altera (since acquired by Intel) and intended to run on their FPGAs.
>
> Licenses for the architecture have not been available for purchase
> since 2024 [1], and support for it has been removed from GCC 15 [2],
> Buildroot [3], and QEMU [4].
>
> Given all of these factors, it is time to remove Nios II support from
> the kernel. The maintainer stated in 2024 that they were planning to do
> so soon [5], but this did not come to pass.
>
> Remove Nios II support from the kernel and move the former maintainer
> to CREDITS. Thank you, Dinh Nguyen, for maintaining Nios II support!
>
> References:
> [1] https://docs.altera.com/v/u/docs/781327/is-discontinuing-ip-ordering-codes-listed-in-pdn2312-for-nios-ii-ip
> [2] https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=e876acab6cdd84bb2b32c98fc69fb0ba29c81153
> [3] https://github.com/buildroot/buildroot/commit/6775ccc5a199d574ad70b5f79ec58cce97a07c6f
> [4] https://github.com/qemu/qemu/commit/6c3014858c4c0024dd0560f08a6eda0f92f658d6
> [5] https://sourceware.org/pipermail/newlib/2024/021083.html
>
> Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com>
If it goes for IIO trivial changes.
Acked-by: Jonathan Cameron <jic23@kernel.org>
^ permalink raw reply
* Re: [PATCH RFC v4 09/10] Documentation: ABI: testing: add docs for ad9910 sysfs entries
From: Rodrigo Alencar @ 2026-05-18 15:27 UTC (permalink / raw)
To: Jonathan Cameron, Rodrigo Alencar
Cc: Rodrigo Alencar via B4 Relay, rodrigo.alencar, linux-iio,
devicetree, linux-kernel, linux-doc, linux-hardening,
Lars-Peter Clausen, Michael Hennerich, David Lechner,
Andy Shevchenko, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Philipp Zabel, Jonathan Corbet, Shuah Khan, Kees Cook,
Gustavo A. R. Silva
In-Reply-To: <20260518144537.7c998308@jic23-huawei>
On 26/05/18 02:45PM, Jonathan Cameron wrote:
> On Sun, 17 May 2026 18:30:27 +0100
> Rodrigo Alencar <455.rodrigo.alencar@gmail.com> wrote:
>
> > On 26/05/17 03:58PM, Jonathan Cameron wrote:
> > > On Fri, 08 May 2026 18:00:25 +0100
> > > Rodrigo Alencar via B4 Relay <devnull+rodrigo.alencar.analog.com@kernel.org> wrote:
> > >
> > > > From: Rodrigo Alencar <rodrigo.alencar@analog.com>
> > > >
> > > > Add custom ABI documentation file for the DDS AD9910 with sysfs entries to
> > > > control Parallel Port, Digital Ramp Generator and OSK parameters.
> > > >
> > > > Signed-off-by: Rodrigo Alencar <rodrigo.alencar@analog.com>
> > > I'm fine with phase and frequency as defined, but for the scaling it made me wonder.
> > > For outvoltage0 channels the assumption is that the value is the peak voltage, so if
> > > we know what input is to be modulated by the ramp generator, can we express them
> > > in volts (well, millivolts) rather than as a scaling multiplier?
> >
> > The DAC output is current-based and differential. Voltage conversion would happen
> > outside the device...
>
> Why aren't we representing this as out_altcurrentX-Y_xxxx?
Good point! altcurrent makes more sense than altvoltage if we want to use raw to
control the output level rather than scale, which would be a constant to convert
raw into current units (which unit does the sysfs ABI use? Amperes, mA or uA?)
Not sure about the benefits of setting "differential" in the channel spec; the name would
become out_altcurrentX-altcurrentY_xxxxx...
Is there any modifier for amplitude/peak/envelope? I see IIO_MOD_RMS, which could be used
if adding a 1/sqrt(2) factor to the fixed scale.
Then, I would consider something like out_altcurrent_rms_xxxx as a good alternative.
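For a sinusoid, I_rms = I_peak / sqrt(2), so the RMS option amounts to folding 1/sqrt(2) into the fixed scale. A sketch with hypothetical numbers (a 14-bit amplitude code and a 20 mA full-scale current, neither taken from this thread), expressing current in milliamps as the existing IIO in_currentY_* ABI documents:

```c
/* 1/sqrt(2) written as a literal so no libm is needed. */
#define INV_SQRT2 0.70710678118654752440

/* Peak (envelope) output current in mA: raw * scale. */
static double peak_current_ma(unsigned int raw, double scale_ma)
{
	return raw * scale_ma;
}

/* RMS current of a sinusoid with that peak: raw * (scale / sqrt(2)). */
static double rms_current_ma(unsigned int raw, double scale_ma)
{
	return raw * (scale_ma * INV_SQRT2);
}
```

With scale = 20.0 / 16383, a full-scale code of 16383 gives about 20 mA peak and about 14.14 mA RMS; only the constant scale changes between the two readings of raw.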
"scale" would be a constant in the top-level phy channel
single tone profile channels would have:
- frequency
- phase
- raw
drg ramp up/down channels:
- frequency and frequency_roc
- phase and phase_roc
- raw and raw_roc
parallel port channel(s):
- frequency_scale and frequency_offset (frequency destination)
- phase_offset (polar destination)
- offset (polar destination)
osk channel:
- raw
- raw_roc
raw_roc could be just roc, but that sounds like it carries the scale and refers to
a current value? And maybe that breaks consistency with other destination attributes?
I am fine with just roc if that refers to the raw value, not (raw * scale).
With all the above, using altvoltage is still not incorrect; it is just a matter of how
we want to express the units. Note that using raw instead of scale to control the
amplitude is just another option to tackle the problem. I suppose the
important thing here is being technically correct and consistent in terms of
usage. Maybe out_altcurrent_rms_* is clearer in terms of amplitude level.
>
>
> > using a resistor load or an op-amp transimpedance stage,
> > and I am no expert on that, but that often requires impedance matching so voltage
> > levels may depend on the frequency. Then, I suppose that voltage is not the right
> > unit to use.
>
> Understood that it can get complex!
> >
> > The scale here controls the amplitude of the varying signal. Assuming the peak voltage
> > (amplitude) is constant means we have a constant envelope, but that does not mean
> > we can't control it, nor that the hardware has no other ways to
> > control it. That said, scale behaves as a "gain multiplier".
> Understood. Given it's the envelope then if scale happened to be 1 always it would
> be presented as _processed. So this is consistent with other channel types.
>
> >
> > >
> > > That seems to me like it fits better with the overall ABI.
> > >
> > > > +What: /sys/bus/iio/devices/iio:deviceX/out_altvoltageY_scale_offset
> > > > +KernelVersion:
> > > > +Contact: linux-iio@vger.kernel.org
> > > > +Description:
> > > > + For a channel that allows amplitude control through buffers, this
> > > > + represents the value for a base amplitude scale. The actual output
> > > > + amplitude scale is a result with the sum of this value.
> > > > +
> > >
> > > > +
> > > > +What: /sys/bus/iio/devices/iio:deviceX/out_altvoltageY_scale_roc
> > >
> > > Silly question perhaps, but can we work out how this relates to millivolts/sec?
> > > That might make a more intuitive interface than a scaling multiplier per sec.
> > > Perhaps the combination with offset makes this impossible, though maybe that
> > > could be expressed as a voltage offset? After all, if the amplitude being
> > > scaled is 5V then 5 * (offset + scale) = 5 * offset + 5 * scale
> > >
> > > > +KernelVersion:
> > > > +Contact: linux-iio@vger.kernel.org
> > > > +Description:
> > > > + Amplitude scale rate of change in 1/s for channels that ramp
> > > > + amplitude. This value may be influenced by the channel's
> > > > + sampling_frequency setting.
> > >
> > >
> >
>
--
Kind regards,
Rodrigo Alencar
^ permalink raw reply
* Re: [PATCH v11 3/6] iio: adc: ad4691: add triggered buffer support
From: Jonathan Cameron @ 2026-05-18 15:25 UTC (permalink / raw)
To: David Lechner
Cc: radu.sabau, Lars-Peter Clausen, Michael Hennerich, Nuno Sá,
Andy Shevchenko, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Uwe Kleine-König, Liam Girdwood, Mark Brown, Linus Walleij,
Bartosz Golaszewski, Philipp Zabel, Jonathan Corbet, Shuah Khan,
linux-iio, devicetree, linux-kernel, linux-pwm, linux-gpio,
linux-doc
In-Reply-To: <c9610990-6b40-40a8-948c-fa1209242dbe@baylibre.com>
On Mon, 18 May 2026 09:36:18 -0500
David Lechner <dlechner@baylibre.com> wrote:
> On 5/18/26 9:21 AM, Jonathan Cameron wrote:
> > On Sun, 17 May 2026 14:21:30 -0500
> > David Lechner <dlechner@baylibre.com> wrote:
> >
> >> On 5/17/26 7:25 AM, Jonathan Cameron wrote:
> >>> On Sat, 16 May 2026 12:32:51 -0500
> >>> David Lechner <dlechner@baylibre.com> wrote:
> >>>
> >>>> On 5/15/26 8:31 AM, Radu Sabau via B4 Relay wrote:
> >>>>> From: Radu Sabau <radu.sabau@analog.com>
> >>>>>
> >>>>> Add buffered capture support using the IIO triggered buffer framework.
> >>>>>
> >>>>> CNV Burst Mode: the GP pin identified by interrupt-names in the device
> >>>>> tree is configured as DATA_READY output. The IRQ handler stops
> >>>>> conversions and fires the IIO trigger; the trigger handler executes a
> >>>>> pre-built SPI message that reads all active channels from the AVG_IN
> >>>>> accumulator registers and then resets accumulator state and restarts
> >>>>> conversions for the next cycle.
> >>>>>
> >>>>> Manual Mode: CNV is tied to SPI CS so each transfer simultaneously
> >>>>> reads the previous result and starts the next conversion (pipelined
> >>>>> N+1 scheme). At preenable time a pre-built, optimised SPI message of
> >>>>> N+1 transfers is constructed (N channel reads plus one NOOP to drain
> >>>>> the pipeline). The trigger handler executes the message in a single
> >>>>> spi_sync() call and collects the results. An external trigger (e.g.
> >>>>> iio-trig-hrtimer) is required to drive the trigger at the desired
> >>>>> sample rate.
> >>>>>
> >>>>> Both modes share the same trigger handler and push a complete scan —
> >>>>> one big-endian 16-bit (__be16) slot per active channel, densely packed
> >>>>> in scan_index order, followed by a timestamp.
> >>>>>
> >>>>> The CNV Burst Mode sampling frequency (PWM period) is exposed as a
> >>>>> buffer-level attribute via IIO_DEVICE_ATTR.
> >>>>>
> >>>>> Signed-off-by: Radu Sabau <radu.sabau@analog.com>
> >>>
> >>>>> +
> >>>>> +static int ad4691_manual_buffer_preenable(struct iio_dev *indio_dev)
> >>>>> +{
> >>>>> + struct ad4691_state *st = iio_priv(indio_dev);
> >>>>> + unsigned int k, i;
> >>>>> + int ret;
> >>>>> +
> >>>>> + memset(st->scan_xfers, 0, sizeof(st->scan_xfers));
> >>>>> + memset(st->scan_tx, 0, sizeof(st->scan_tx));
> >>>>> +
> >>>>> + spi_message_init(&st->scan_msg);
> >>>>> +
> >>>>> + k = 0;
> >>>>> + iio_for_each_active_channel(indio_dev, i) {
> >>>>> + if (i >= indio_dev->num_channels - 1)
> >>>>> + break; /* skip soft timestamp */
> >>>>
> >>>> I don't think timestamp gets set in the scan mask. It is handled separately.
> >>>
> >>> FWIW that is a sashiko false positive (I believe anyway!)
> >>> If we do hit this please shout as we have a core bug.
> >>>
> >>> If anyone has time to look at how hard it would be to tweak
> >>> iio_for_each_active_channel to skip a last element timestamp that
> >>> would be great.
> >>>
> >>> I think that iterates one too far which is what sashiko is tripping over.
> >>>
> >>> I'm only keen to fix that if we can make it low cost and hide it entirely
> >>> from drivers.
> >>>
> >>> Jonathan
> >>>
> >> This is what I came up with (totally untested).
> >>
> >> Since timestamp can never be set in scan_mask/active_scan_mask, it should
> >> be safe to exclude it from masklength without breaking existing code.
> > Probably...
> >>
> >> I didn't check all callers of masklength/iio_get_masklength() though.
> >
> > That was the bit that made me nervous. Particularly if there is an off
> > by one that is working by luck today - or someone who understood this
> > oddity and did it deliberately.
> >
> > At one point we also had a few other timestamps - the ones that come from hardware.
> > I can't remember how we handled those wrt the scan mask. I took a quick
> > look and think they are all fine.
> > FWIW a nice precursor would be to make sure all timestamp channels are assigned
> > using the macro. There are a few that are hand crafted. I tested a few, but it obviously
> > needs turning into a proper set and cleaning up.
> >
> > diff --git a/drivers/iio/adc/ad4170-4.c b/drivers/iio/adc/ad4170-4.c
> > index 627cbf5a37b0..890e25294baa 100644
> > --- a/drivers/iio/adc/ad4170-4.c
> > +++ b/drivers/iio/adc/ad4170-4.c
> > @@ -2385,9 +2385,7 @@ static int ad4170_parse_channels(struct iio_dev *indio_dev)
> > }
> >
> > /* Add timestamp channel */
> > - struct iio_chan_spec ts_chan = IIO_CHAN_SOFT_TIMESTAMP(chan_num);
> > -
> > - st->chans[chan_num] = ts_chan;
> > + st->chans[chan_num] = IIO_CHAN_SOFT_TIMESTAMP(chan_num);
> > num_channels = num_channels + 1;
> >
> > indio_dev->num_channels = num_channels;
> > diff --git a/drivers/iio/adc/at91_adc.c b/drivers/iio/adc/at91_adc.c
> > index 6e1930f7c65d..56baca1f5026 100644
> > --- a/drivers/iio/adc/at91_adc.c
> > +++ b/drivers/iio/adc/at91_adc.c
> > @@ -521,13 +521,7 @@ static int at91_adc_channel_init(struct iio_dev *idev)
> > }
> > timestamp = chan_array + idx;
> >
> > - timestamp->type = IIO_TIMESTAMP;
> > - timestamp->channel = -1;
> > - timestamp->scan_index = idx;
> > - timestamp->scan_type.sign = 's';
> > - timestamp->scan_type.realbits = 64;
> > - timestamp->scan_type.storagebits = 64;
> > -
> > + *timestamp = IIO_CHAN_SOFT_TIMESTAMP(idx);
> > idev->channels = chan_array;
> > return idev->num_channels;
> > }
> > diff --git a/drivers/iio/adc/cc10001_adc.c b/drivers/iio/adc/cc10001_adc.c
> > index 2c51b90b7101..d42b747325aa 100644
> > --- a/drivers/iio/adc/cc10001_adc.c
> > +++ b/drivers/iio/adc/cc10001_adc.c
> > @@ -262,7 +262,7 @@ static const struct iio_info cc10001_adc_info = {
> > static int cc10001_adc_channel_init(struct iio_dev *indio_dev,
> > unsigned long channel_map)
> > {
> > - struct iio_chan_spec *chan_array, *timestamp;
> > + struct iio_chan_spec *chan_array;
> > unsigned int bit, idx = 0;
> >
> > indio_dev->num_channels = bitmap_weight(&channel_map,
> > @@ -289,13 +289,7 @@ static int cc10001_adc_channel_init(struct iio_dev *indio_dev,
> > idx++;
> > }
> >
> > - timestamp = &chan_array[idx];
> > - timestamp->type = IIO_TIMESTAMP;
> > - timestamp->channel = -1;
> > - timestamp->scan_index = idx;
> > - timestamp->scan_type.sign = 's';
> > - timestamp->scan_type.realbits = 64;
> > - timestamp->scan_type.storagebits = 64;
> > + chan_array[idx] = IIO_CHAN_SOFT_TIMESTAMP(idx);
> >
> > indio_dev->channels = chan_array;
> >
> > diff --git a/include/linux/iio/iio.h b/include/linux/iio/iio.h
> > index 96b05c86c325..702b2fc66326 100644
> > --- a/include/linux/iio/iio.h
> > +++ b/include/linux/iio/iio.h
> > @@ -353,7 +353,7 @@ static inline bool iio_channel_has_available(const struct iio_chan_spec *chan,
> > (chan->info_mask_shared_by_all_available & BIT(type));
> > }
> >
> > -#define IIO_CHAN_SOFT_TIMESTAMP(_si) { \
> > +#define IIO_CHAN_SOFT_TIMESTAMP(_si) (struct iio_chan_spec) { \
> > .type = IIO_TIMESTAMP, \
> > .channel = -1, \
> > .scan_index = _si, \
> >
> > Doing that will mean we can spot any unusual use of IIO_TIMESTAMP much more
> > easily.
> >
> > Anyhow, basic approach looks good to me.
>
> I guess you haven't yet seen the other series cleaning up IIO_TIMESTAMP
> that I already sent.
>
:( That's what I get for not reading all my email before starting to reply!
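For context on the IIO_CHAN_SOFT_TIMESTAMP() change quoted below: a bare brace list is only valid as an initializer in a declaration, while a compound literal is an expression and can therefore be assigned to an existing array element. A cut-down sketch, where struct chan_spec, CHAN_TYPE_TIMESTAMP and both macros are hypothetical stand-ins for the IIO definitions:

```c
/* Hypothetical, cut-down stand-in for struct iio_chan_spec. */
struct chan_spec {
	int type;
	int channel;
	int scan_index;
};

#define CHAN_TYPE_TIMESTAMP 1	/* hypothetical stand-in for IIO_TIMESTAMP */

/* Brace-only form: usable only as an initializer in a declaration. */
#define TS_INIT(_si) { .type = CHAN_TYPE_TIMESTAMP, .channel = -1, .scan_index = (_si) }

/* Compound-literal form: an expression, so it can also be assigned. */
#define TS_CHAN(_si) ((struct chan_spec) TS_INIT(_si))

static void set_timestamp(struct chan_spec *chans, int idx)
{
	/* chans[idx] = TS_INIT(idx); would not compile: a brace list is not an expression. */
	chans[idx] = TS_CHAN(idx);
}
```

Static channel tables that use the compound-literal form inside their initializers then rely on a GNU C extension, documented by GCC, that allows compound literals to initialize objects with static storage duration.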
> >
> > Jonathan
> >
> >
> >
> >>
> >> ---
> >> diff --git a/drivers/iio/industrialio-buffer.c b/drivers/iio/industrialio-buffer.c
> >> index 9d66510a1d49..17f539fc23e2 100644
> >> --- a/drivers/iio/industrialio-buffer.c
> >> +++ b/drivers/iio/industrialio-buffer.c
> >> @@ -2300,8 +2300,10 @@ int iio_buffers_alloc_sysfs_and_mask(struct iio_dev *indio_dev)
> >> if (channels) {
> >> int ml = 0;
> >>
> >> - for (i = 0; i < indio_dev->num_channels; i++)
> >> - ml = max(ml, channels[i].scan_index + 1);
> >> + for (i = 0; i < indio_dev->num_channels; i++) {
> >> + if (channels[i].type != IIO_TIMESTAMP)
> >> + ml = max(ml, channels[i].scan_index + 1);
> >> + }
> >> ACCESS_PRIVATE(indio_dev, masklength) = ml;
> >> }
> >>
> >>
> >>
> >>
> >
>
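A cut-down sketch of the masklength change in the diff above: timestamp channels no longer contribute their scan_index + 1. struct chan and the type constants are hypothetical stand-ins for struct iio_chan_spec and IIO_TIMESTAMP. Assuming, as the discussion implies, that iio_for_each_active_channel() is bounded by masklength, a driver loop would then stop before the timestamp's scan index without an explicit check.

```c
#define CHAN_TYPE_VOLTAGE   0	/* hypothetical */
#define CHAN_TYPE_TIMESTAMP 1	/* hypothetical stand-in for IIO_TIMESTAMP */

struct chan {
	int type;
	int scan_index;
};

/* Highest scan_index + 1 over all non-timestamp channels. */
static int compute_masklength(const struct chan *chans, int num)
{
	int ml = 0;

	for (int i = 0; i < num; i++) {
		if (chans[i].type == CHAN_TYPE_TIMESTAMP)
			continue;	/* timestamps never appear in scan masks */
		if (chans[i].scan_index + 1 > ml)
			ml = chans[i].scan_index + 1;
	}
	return ml;
}
```

For three data channels at scan indexes 0..2 plus a timestamp at index 3, this yields 3 rather than 4.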
^ permalink raw reply
* Re: [PATCH v11 4/6] iio: adc: ad4691: add SPI offload support
From: David Lechner @ 2026-05-18 15:16 UTC (permalink / raw)
To: Sabau, Radu bogdan, Lars-Peter Clausen, Hennerich, Michael,
Jonathan Cameron, Sa, Nuno, Andy Shevchenko, Rob Herring,
Krzysztof Kozlowski, Conor Dooley, Uwe Kleine-König,
Liam Girdwood, Mark Brown, Linus Walleij, Bartosz Golaszewski,
Philipp Zabel, Jonathan Corbet, Shuah Khan
Cc: linux-iio@vger.kernel.org, devicetree@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-pwm@vger.kernel.org,
linux-gpio@vger.kernel.org, linux-doc@vger.kernel.org
In-Reply-To: <LV9PR03MB841418AEF0059E802F7A69B2F7032@LV9PR03MB8414.namprd03.prod.outlook.com>
On 5/18/26 10:14 AM, Sabau, Radu bogdan wrote:
>> -----Original Message-----
>> From: David Lechner <dlechner@baylibre.com>
>> Sent: Saturday, May 16, 2026 8:53 PM
>
> ...
>
>>> static ssize_t sampling_frequency_show(struct device *dev,
>>> struct device_attribute *attr,
>>> char *buf)
>>> @@ -880,6 +1229,9 @@ static ssize_t sampling_frequency_show(struct device *dev,
>>> struct iio_dev *indio_dev = dev_to_iio_dev(dev);
>>> struct ad4691_state *st = iio_priv(indio_dev);
>>>
>>> + if (st->manual_mode && st->offload)
>>> + return sysfs_emit(buf, "%llu\n", READ_ONCE(st->offload->trigger_hz));
>>
>> Why do we need READ_ONCE?
>>
>
> trigger_hz is u64 and if the target is 32-bit, a 64-bit access compiles to two 32-bit
> instructions, so show() reading it without a lock and store() writing it concurrently
> can produce a torn value at the compiler level. READ_ONCE/WRITE_ONCE suppress
> the compiler transformations that would allow that splitting or caching. We could
> have st->lock in show() instead, but that felt heavier than necessary for a single
> scalar where a transiently stale-but-whole read is fine.
>
I would go with the mutex. It will be easier for people to understand.
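A userspace sketch of the mutex approach, with a pthread mutex standing in for st->lock and a hypothetical offload_state struct (in the driver, guard(mutex)(&st->lock) would keep each accessor short). Every reader and writer takes the same lock, so the u64 is never observed half-updated on any word size. The comment in the kernel's include/asm-generic/rwonce.h also notes that 64-bit READ_ONCE() accesses on 32-bit architectures are generally not atomic (Armv7 with LPAE being an exception).

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant part of the driver state. */
struct offload_state {
	pthread_mutex_t lock;	/* stands in for st->lock */
	uint64_t trigger_hz;
};

static void trigger_hz_store(struct offload_state *st, uint64_t hz)
{
	pthread_mutex_lock(&st->lock);
	st->trigger_hz = hz;
	pthread_mutex_unlock(&st->lock);
}

static uint64_t trigger_hz_show(struct offload_state *st)
{
	uint64_t hz;

	pthread_mutex_lock(&st->lock);
	hz = st->trigger_hz;	/* whole value: the writer cannot run concurrently */
	pthread_mutex_unlock(&st->lock);
	return hz;
}
```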
^ permalink raw reply
* RE: [PATCH v11 4/6] iio: adc: ad4691: add SPI offload support
From: Sabau, Radu bogdan @ 2026-05-18 15:14 UTC (permalink / raw)
To: David Lechner, Lars-Peter Clausen, Hennerich, Michael,
Jonathan Cameron, Sa, Nuno, Andy Shevchenko, Rob Herring,
Krzysztof Kozlowski, Conor Dooley, Uwe Kleine-König,
Liam Girdwood, Mark Brown, Linus Walleij, Bartosz Golaszewski,
Philipp Zabel, Jonathan Corbet, Shuah Khan
Cc: linux-iio@vger.kernel.org, devicetree@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-pwm@vger.kernel.org,
linux-gpio@vger.kernel.org, linux-doc@vger.kernel.org
In-Reply-To: <80f61c0b-1f36-4fee-9f76-b93f63b87abe@baylibre.com>
> -----Original Message-----
> From: David Lechner <dlechner@baylibre.com>
> Sent: Saturday, May 16, 2026 8:53 PM
...
> > static ssize_t sampling_frequency_show(struct device *dev,
> > struct device_attribute *attr,
> > char *buf)
> > @@ -880,6 +1229,9 @@ static ssize_t sampling_frequency_show(struct device *dev,
> > struct iio_dev *indio_dev = dev_to_iio_dev(dev);
> > struct ad4691_state *st = iio_priv(indio_dev);
> >
> > + if (st->manual_mode && st->offload)
> > + return sysfs_emit(buf, "%llu\n", READ_ONCE(st->offload->trigger_hz));
>
> Why do we need READ_ONCE?
>
trigger_hz is u64 and if the target is 32-bit, a 64-bit access compiles to two 32-bit
instructions, so show() reading it without a lock and store() writing it concurrently
can produce a torn value at the compiler level. READ_ONCE/WRITE_ONCE suppress
the compiler transformations that would allow that splitting or caching. We could
have st->lock in show() instead, but that felt heavier than necessary for a single
scalar where a transiently stale-but-whole read is fine.
^ permalink raw reply
* [PATCH v2 2/2] kselftest/arm64: Add 2025 dpISA coverage to hwcaps
From: Mark Brown @ 2026-05-18 15:07 UTC (permalink / raw)
To: Catalin Marinas, Will Deacon, Jonathan Corbet, Shuah Khan
Cc: linux-arm-kernel, linux-kernel, linux-doc, linux-kselftest,
Mark Brown
In-Reply-To: <20260518-arm64-dpisa-2025-v2-0-b3367b73bd00@kernel.org>
Add coverage of the new hwcaps to the test program, with encodings
cross-checked against LLVM 22.
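Each *_sigill() probe added by this patch executes a single instruction encoding that should trap with SIGILL when the feature is absent, so the test can compare whether a fault occurred with what the hwcap and cpuinfo report. A portable sketch of that probing idea using sigsetjmp()/siglongjmp(); the probe functions here are hypothetical stand-ins that call raise(SIGILL) instead of emitting an arm64 .inst, and the selftest's own handler (not shown) may work differently, for example by stepping over the faulting instruction.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>

static sigjmp_buf probe_env;

static void probe_sigill_handler(int sig)
{
	(void)sig;
	siglongjmp(probe_env, 1);	/* unwind out of the faulting probe */
}

/* Returns true if fn raised SIGILL, i.e. the probed instruction is unsupported. */
static bool probe_sigill(void (*fn)(void))
{
	struct sigaction sa, old;
	bool faulted;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = probe_sigill_handler;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGILL, &sa, &old);

	/* savemask = 1 so SIGILL is unblocked again after the jump. */
	if (sigsetjmp(probe_env, 1) == 0) {
		fn();
		faulted = false;
	} else {
		faulted = true;
	}

	sigaction(SIGILL, &old, NULL);
	return faulted;
}

/* Hypothetical stand-ins for an unsupported and a supported instruction. */
static void example_raises_sigill(void) { raise(SIGILL); }
static void example_noop(void) { }
```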
Signed-off-by: Mark Brown <broonie@kernel.org>
---
tools/testing/selftests/arm64/abi/hwcap.c | 116 ++++++++++++++++++++++++++++++
1 file changed, 116 insertions(+)
diff --git a/tools/testing/selftests/arm64/abi/hwcap.c b/tools/testing/selftests/arm64/abi/hwcap.c
index e22703d6b97c..19fca95f7c22 100644
--- a/tools/testing/selftests/arm64/abi/hwcap.c
+++ b/tools/testing/selftests/arm64/abi/hwcap.c
@@ -108,6 +108,24 @@ static void f8mm8_sigill(void)
asm volatile(".inst 0x6e80ec00");
}
+static void f16f32dot_sigill(void)
+{
+ /* FDOT V0.2S, V0.4H, V0.2H[0] */
+ asm volatile(".inst 0xf409000");
+}
+
+static void f16f32mm_sigill(void)
+{
+ /* FMMLA V0.4S, V0.8H, V0.8H */
+ asm volatile(".inst 0x4e40ec00");
+}
+
+static void f16mm_sigill(void)
+{
+ /* FMMLA V0.8H, V0.8H, V0.8H */
+ asm volatile(".inst 0x4ec0ec00");
+}
+
static void faminmax_sigill(void)
{
/* FAMIN V0.4H, V0.4H, V0.4H */
@@ -191,6 +209,12 @@ static void lut_sigill(void)
asm volatile(".inst 0x4e801000");
}
+static void sve_lut6_sigill(void)
+{
+ /* LUTI6 Z0.H, { Z0.H, Z1.H }, Z0[0] */
+ asm volatile(".inst 0x4560ac00");
+}
+
static void mops_sigill(void)
{
char dst[1], src[1];
@@ -282,6 +306,18 @@ static void sme2p2_sigill(void)
asm volatile("msr S0_3_C4_C6_3, xzr" : : : );
}
+static void sme2p3_sigill(void)
+{
+ /* SMSTART SM */
+ asm volatile("msr S0_3_C4_C3_3, xzr" : : : );
+
+ /* ADDQP Z0.B, Z0.B, Z0.B */
+ asm volatile(".inst 0x4207800" : : : "z0");
+
+ /* SMSTOP */
+ asm volatile("msr S0_3_C4_C6_3, xzr" : : : );
+}
+
static void sme_aes_sigill(void)
{
/* SMSTART SM */
@@ -378,6 +414,18 @@ static void smef8f32_sigill(void)
asm volatile("msr S0_3_C4_C6_3, xzr" : : : );
}
+static void smelut6_sigill(void)
+{
+ /* SMSTART */
+ asm volatile("msr S0_3_C4_C7_3, xzr" : : : );
+
+ /* LUTI6 { Z0.B-Z3.B }, ZT0, { Z0-Z2 } */
+ asm volatile(".inst 0xc08a0000" : : : );
+
+ /* SMSTOP */
+ asm volatile("msr S0_3_C4_C6_3, xzr" : : : );
+}
+
static void smelutv2_sigill(void)
{
/* SMSTART */
@@ -486,6 +534,12 @@ static void sve2p2_sigill(void)
asm volatile(".inst 0x4cea000" : : : "z0");
}
+static void sve2p3_sigill(void)
+{
+ /* ADDQP Z0.B, Z0.B, Z0.B */
+ asm volatile(".inst 0x4207800" : : : "z0");
+}
+
static void sveaes_sigill(void)
{
/* AESD z0.b, z0.b, z0.b */
@@ -504,6 +558,12 @@ static void sveb16b16_sigill(void)
asm volatile(".inst 0x65000000" : : : );
}
+static void sveb16mm_sigill(void)
+{
+ /* BFMMLA Z0.H, Z0.H, Z0.H */
+ asm volatile(".inst 0x64e0e000" : : : );
+}
+
static void svebfscale_sigill(void)
{
/* BFSCALE Z0.H, P0/M, Z0.H, Z0.H */
@@ -729,6 +789,27 @@ static const struct hwcap_data {
.cpuinfo = "f8mm4",
.sigill_fn = f8mm4_sigill,
},
+ {
+ .name = "F16MM",
+ .at_hwcap = AT_HWCAP3,
+ .hwcap_bit = HWCAP3_F16MM,
+ .cpuinfo = "f16mm",
+ .sigill_fn = f16mm_sigill,
+ },
+ {
+ .name = "F16F32DOT",
+ .at_hwcap = AT_HWCAP3,
+ .hwcap_bit = HWCAP3_F16F32DOT,
+ .cpuinfo = "f16f32dot",
+ .sigill_fn = f16f32dot_sigill,
+ },
+ {
+ .name = "F16F32MM",
+ .at_hwcap = AT_HWCAP3,
+ .hwcap_bit = HWCAP3_F16F32MM,
+ .cpuinfo = "f16f32mm",
+ .sigill_fn = f16f32mm_sigill,
+ },
{
.name = "FAMINMAX",
.at_hwcap = AT_HWCAP2,
@@ -918,6 +999,13 @@ static const struct hwcap_data {
.cpuinfo = "sme2p2",
.sigill_fn = sme2p2_sigill,
},
+ {
+ .name = "SME 2.3",
+ .at_hwcap = AT_HWCAP3,
+ .hwcap_bit = HWCAP3_SME2P3,
+ .cpuinfo = "sme2p3",
+ .sigill_fn = sme2p3_sigill,
+ },
{
.name = "SME AES",
.at_hwcap = AT_HWCAP,
@@ -967,6 +1055,13 @@ static const struct hwcap_data {
.cpuinfo = "smef8f32",
.sigill_fn = smef8f32_sigill,
},
+ {
+ .name = "SME LUT6",
+ .at_hwcap = AT_HWCAP3,
+ .hwcap_bit = HWCAP3_SME_LUT6,
+ .cpuinfo = "smelut6",
+ .sigill_fn = smelut6_sigill,
+ },
{
.name = "SME LUTV2",
.at_hwcap = AT_HWCAP2,
@@ -1052,6 +1147,13 @@ static const struct hwcap_data {
.cpuinfo = "sve2p2",
.sigill_fn = sve2p2_sigill,
},
+ {
+ .name = "SVE 2.3",
+ .at_hwcap = AT_HWCAP3,
+ .hwcap_bit = HWCAP3_SVE2P3,
+ .cpuinfo = "sve2p3",
+ .sigill_fn = sve2p3_sigill,
+ },
{
.name = "SVE AES",
.at_hwcap = AT_HWCAP2,
@@ -1066,6 +1168,13 @@ static const struct hwcap_data {
.cpuinfo = "sveaes2",
.sigill_fn = sveaes2_sigill,
},
+ {
+ .name = "SVE B16MM",
+ .at_hwcap = AT_HWCAP3,
+ .hwcap_bit = HWCAP3_SVE_B16MM,
+ .cpuinfo = "sveb16mm",
+ .sigill_fn = sveb16mm_sigill,
+ },
{
.name = "SVE BFSCALE",
.at_hwcap = AT_HWCAP,
@@ -1087,6 +1196,13 @@ static const struct hwcap_data {
.cpuinfo = "svef16mm",
.sigill_fn = svef16mm_sigill,
},
+ {
+ .name = "SVE LUT6",
+ .at_hwcap = AT_HWCAP3,
+ .hwcap_bit = HWCAP3_SVE_LUT6,
+ .cpuinfo = "svelut6",
+ .sigill_fn = sve_lut6_sigill,
+ },
{
.name = "SVE2 B16B16",
.at_hwcap = AT_HWCAP2,
--
2.47.3
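Each new `hwcap_data` entry in the selftest pairs an `AT_HWCAP3` bit with a `/proc/cpuinfo` string, so the bit and the string are expected to agree. The sketch below models that consistency check. It is a simplified illustration, not the selftest's own code, and `feature_token_present()`, `struct hwcap_entry` and `entry_consistent()` are hypothetical names. Several of the new names share substrings with existing ones ("f16mm" is contained in "svef16mm"), so the sketch matches whole space-separated tokens rather than using a bare `strstr()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: true if `name` appears as a whole space-separated
 * token in a cpuinfo "Features" line. A plain strstr() would wrongly
 * report "f16mm" as present when only "svef16mm" is listed.
 */
static bool feature_token_present(const char *features, const char *name)
{
	size_t len = strlen(name);
	const char *p = features;

	assert(len > 0); /* an empty name would match everywhere */

	while ((p = strstr(p, name)) != NULL) {
		bool start_ok = (p == features) || (p[-1] == ' ');
		bool end_ok = (p[len] == ' ') || (p[len] == '\0') ||
			      (p[len] == '\n');

		if (start_ok && end_ok)
			return true;
		p++; /* keep searching after this partial match */
	}
	return false;
}

/* Simplified model of one selftest table entry (hypothetical). */
struct hwcap_entry {
	const char *cpuinfo;      /* expected /proc/cpuinfo token */
	unsigned long hwcap_bit;  /* bit within the AT_HWCAP3 value */
};

/* The entry is consistent when the hwcap bit and the cpuinfo token agree. */
static bool entry_consistent(const struct hwcap_entry *e,
			     unsigned long at_hwcap, const char *features)
{
	bool have_bit = (at_hwcap & e->hwcap_bit) != 0;

	return have_bit == feature_token_present(features, e->cpuinfo);
}
```

For example, with the features line `"fp asimd svef16mm"`, an entry for `"f16mm"` is consistent only when its hwcap bit is clear.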
* [PATCH v2 1/2] arm64/cpufeature: Define hwcaps for 2025 dpISA features
From: Mark Brown @ 2026-05-18 15:07 UTC
To: Catalin Marinas, Will Deacon, Jonathan Corbet, Shuah Khan
Cc: linux-arm-kernel, linux-kernel, linux-doc, linux-kselftest,
Mark Brown
In-Reply-To: <20260518-arm64-dpisa-2025-v2-0-b3367b73bd00@kernel.org>
The features added by the 2025 dpISA are all straightforward, instruction-only
features with no state to manage, so we can simply expose hwcaps to let
userspace know they are available.
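Userspace consumes these as bits in the `AT_HWCAP3` auxiliary vector entry. The sketch below decodes such a value using the bit positions this patch adds to `uapi/asm/hwcap.h`, paired with the `/proc/cpuinfo` strings from `cpuinfo.c`. The macros are copied locally only so that the sketch builds on any host; real code should include `<asm/hwcap.h>`. The helper names `hwcap3_name()`, `hwcap3_bit()` and `count_dpisa_2025()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bit values copied from this patch's uapi additions (illustration only). */
#define HWCAP3_SVE_B16MM  (1UL << 4)
#define HWCAP3_SVE2P3     (1UL << 5)
#define HWCAP3_SME_LUT6   (1UL << 6)
#define HWCAP3_SME2P3     (1UL << 7)
#define HWCAP3_F16MM      (1UL << 8)
#define HWCAP3_F16F32DOT  (1UL << 9)
#define HWCAP3_F16F32MM   (1UL << 10)
#define HWCAP3_SVE_LUT6   (1UL << 11)

/* Each new bit paired with its /proc/cpuinfo string from cpuinfo.c. */
static const struct {
	unsigned long bit;
	const char *name;
} dpisa_2025_hwcaps[] = {
	{ HWCAP3_SVE_B16MM, "sveb16mm" },
	{ HWCAP3_SVE2P3,    "sve2p3" },
	{ HWCAP3_SME_LUT6,  "smelut6" },
	{ HWCAP3_SME2P3,    "sme2p3" },
	{ HWCAP3_F16MM,     "f16mm" },
	{ HWCAP3_F16F32DOT, "f16f32dot" },
	{ HWCAP3_F16F32MM,  "f16f32mm" },
	{ HWCAP3_SVE_LUT6,  "svelut6" },
};

#define N_DPISA_2025 (sizeof(dpisa_2025_hwcaps) / sizeof(dpisa_2025_hwcaps[0]))

/* cpuinfo name of a single hwcap3 bit, or NULL if not a 2025 dpISA bit. */
static const char *hwcap3_name(unsigned long bit)
{
	assert(bit != 0 && (bit & (bit - 1)) == 0); /* exactly one bit set */
	for (size_t i = 0; i < N_DPISA_2025; i++)
		if (dpisa_2025_hwcaps[i].bit == bit)
			return dpisa_2025_hwcaps[i].name;
	return NULL;
}

/* Reverse lookup from cpuinfo name to hwcap3 bit; 0 if unknown. */
static unsigned long hwcap3_bit(const char *name)
{
	for (size_t i = 0; i < N_DPISA_2025; i++)
		if (strcmp(dpisa_2025_hwcaps[i].name, name) == 0)
			return dpisa_2025_hwcaps[i].bit;
	return 0;
}

/* Number of 2025 dpISA hwcaps set in an AT_HWCAP3 value. */
static int count_dpisa_2025(unsigned long hwcap3)
{
	int n = 0;

	for (size_t i = 0; i < N_DPISA_2025; i++)
		if (hwcap3 & dpisa_2025_hwcaps[i].bit)
			n++;
	return n;
}
```

On arm64, the value would normally come from `getauxval(AT_HWCAP3)` in `<sys/auxv.h>`, assuming the C library headers are recent enough to define `AT_HWCAP3`. For example, `count_dpisa_2025(HWCAP3_SVE2P3 | HWCAP3_F16MM)` returns 2.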
F16MM is slightly odd in that the architectural feature is FEAT_F16MM but it
is discovered via ID_AA64FPFR0_EL1.F16MM2. The hwcap follows the feature name.
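As with F16MM, each new hwcap is derived from a 4-bit ID register field compared against a value in `arm64_elf_hwcaps[]`. For example, ID_AA64ISAR0_EL1.FHM now carries three levels: IMP (asimdfhm), F16F32DOT and F16F32MM. The sketch below models that lookup, assuming the unsigned minimum-value matching that `HWCAP_CAP()` uses for these fields, so a higher field value also implies the lower hwcaps. The FHM bit position [51:48] is taken from the Arm ARM and should be treated as an assumption here; the kernel itself reads it from `arch/arm64/tools/sysreg`. `id_field()` and `field_at_least()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: FHM occupies ID_AA64ISAR0_EL1 bits [51:48]. */
#define ISAR0_FHM_SHIFT 48

/* FHM levels used by the HWCAP_CAP() entries in this patch. */
enum fhm_level {
	FHM_IMP       = 0x1, /* KERNEL_HWCAP_ASIMDFHM  */
	FHM_F16F32DOT = 0x2, /* KERNEL_HWCAP_F16F32DOT */
	FHM_F16F32MM  = 0x3, /* KERNEL_HWCAP_F16F32MM  */
};

/* Extract an unsigned 4-bit ID register field. */
static unsigned int id_field(uint64_t reg, unsigned int shift)
{
	return (unsigned int)((reg >> shift) & 0xf);
}

/*
 * Simplified model of unsigned minimum-value matching: a hwcap is
 * reported when the field is at least the entry's minimum value.
 */
static bool field_at_least(uint64_t reg, unsigned int shift, unsigned int min)
{
	assert(shift <= 60); /* field must fit inside the 64-bit register */
	return id_field(reg, shift) >= min;
}
```

Under this model, a CPU that reports FHM == 0b0011 gets asimdfhm, f16f32dot and f16f32mm, while one that reports 0b0010 gets only the first two.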
Signed-off-by: Mark Brown <broonie@kernel.org>
---
Documentation/arch/arm64/elf_hwcaps.rst | 24 ++++++++++++++++++++++++
arch/arm64/include/uapi/asm/hwcap.h | 8 ++++++++
arch/arm64/kernel/cpufeature.c | 11 +++++++++++
arch/arm64/kernel/cpuinfo.c | 8 ++++++++
4 files changed, 51 insertions(+)
diff --git a/Documentation/arch/arm64/elf_hwcaps.rst b/Documentation/arch/arm64/elf_hwcaps.rst
index 97315ae6c0da..07ff9ea1d605 100644
--- a/Documentation/arch/arm64/elf_hwcaps.rst
+++ b/Documentation/arch/arm64/elf_hwcaps.rst
@@ -451,6 +451,30 @@ HWCAP3_LS64
of CPU. User should only use ld64b/st64b on supported target (device)
memory location, otherwise fallback to the non-atomic alternatives.
+HWCAP3_SVE_B16MM
+ Functionality implied by ID_AA64ZFR0_EL1.B16B16 == 0b0011
+
+HWCAP3_SVE2P3
+ Functionality implied by ID_AA64ZFR0_EL1.SVEver == 0b0100
+
+HWCAP3_SME_LUT6
+ Functionality implied by ID_AA64SMFR0_EL1.LUT6 == 0b1
+
+HWCAP3_SME2P3
+ Functionality implied by ID_AA64SMFR0_EL1.SMEver == 0b0100
+
+HWCAP3_F16MM
+ Functionality implied by ID_AA64FPFR0_EL1.F16MM2 == 0b1
+
+HWCAP3_F16F32DOT
+ Functionality implied by ID_AA64ISAR0_EL1.FHM == 0b0010
+
+HWCAP3_F16F32MM
+ Functionality implied by ID_AA64ISAR0_EL1.FHM == 0b0011
+
+HWCAP3_SVE_LUT6
+ Functionality implied by ID_AA64ISAR2_EL1.LUT == 0b0010 and
+ ID_AA64PFR0_EL1.SVE == 0b0001.
4. Unused AT_HWCAP bits
-----------------------
diff --git a/arch/arm64/include/uapi/asm/hwcap.h b/arch/arm64/include/uapi/asm/hwcap.h
index 06f83ca8de56..10272ddb4d6f 100644
--- a/arch/arm64/include/uapi/asm/hwcap.h
+++ b/arch/arm64/include/uapi/asm/hwcap.h
@@ -147,5 +147,13 @@
#define HWCAP3_MTE_STORE_ONLY (1UL << 1)
#define HWCAP3_LSFE (1UL << 2)
#define HWCAP3_LS64 (1UL << 3)
+#define HWCAP3_SVE_B16MM (1UL << 4)
+#define HWCAP3_SVE2P3 (1UL << 5)
+#define HWCAP3_SME_LUT6 (1UL << 6)
+#define HWCAP3_SME2P3 (1UL << 7)
+#define HWCAP3_F16MM (1UL << 8)
+#define HWCAP3_F16F32DOT (1UL << 9)
+#define HWCAP3_F16F32MM (1UL << 10)
+#define HWCAP3_SVE_LUT6 (1UL << 11)
#endif /* _UAPI__ASM_HWCAP_H */
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 6d53bb15cf7b..96de16582fca 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -365,6 +365,8 @@ static const struct arm64_ftr_bits ftr_id_aa64zfr0[] = {
static const struct arm64_ftr_bits ftr_id_aa64smfr0[] = {
ARM64_FTR_BITS(FTR_VISIBLE_IF_IS_ENABLED(CONFIG_ARM64_SME),
FTR_STRICT, FTR_EXACT, ID_AA64SMFR0_EL1_FA64_SHIFT, 1, 0),
+ ARM64_FTR_BITS(FTR_VISIBLE_IF_IS_ENABLED(CONFIG_ARM64_SME),
+ FTR_STRICT, FTR_EXACT, ID_AA64SMFR0_EL1_LUT6_SHIFT, 1, 0),
ARM64_FTR_BITS(FTR_VISIBLE_IF_IS_ENABLED(CONFIG_ARM64_SME),
FTR_STRICT, FTR_EXACT, ID_AA64SMFR0_EL1_LUTv2_SHIFT, 1, 0),
ARM64_FTR_BITS(FTR_VISIBLE_IF_IS_ENABLED(CONFIG_ARM64_SME),
@@ -419,6 +421,7 @@ static const struct arm64_ftr_bits ftr_id_aa64fpfr0[] = {
ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_EXACT, ID_AA64FPFR0_EL1_F8DP2_SHIFT, 1, 0),
ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_EXACT, ID_AA64FPFR0_EL1_F8MM8_SHIFT, 1, 0),
ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_EXACT, ID_AA64FPFR0_EL1_F8MM4_SHIFT, 1, 0),
+ ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_EXACT, ID_AA64FPFR0_EL1_F16MM2_SHIFT, 1, 0),
ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_EXACT, ID_AA64FPFR0_EL1_F8E4M3_SHIFT, 1, 0),
ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_EXACT, ID_AA64FPFR0_EL1_F8E5M2_SHIFT, 1, 0),
ARM64_FTR_END,
@@ -3284,6 +3287,8 @@ static const struct arm64_cpu_capabilities arm64_elf_hwcaps[] = {
HWCAP_CAP(ID_AA64ISAR0_EL1, SM4, IMP, CAP_HWCAP, KERNEL_HWCAP_SM4),
HWCAP_CAP(ID_AA64ISAR0_EL1, DP, IMP, CAP_HWCAP, KERNEL_HWCAP_ASIMDDP),
HWCAP_CAP(ID_AA64ISAR0_EL1, FHM, IMP, CAP_HWCAP, KERNEL_HWCAP_ASIMDFHM),
+ HWCAP_CAP(ID_AA64ISAR0_EL1, FHM, F16F32DOT, CAP_HWCAP, KERNEL_HWCAP_F16F32DOT),
+ HWCAP_CAP(ID_AA64ISAR0_EL1, FHM, F16F32MM, CAP_HWCAP, KERNEL_HWCAP_F16F32MM),
HWCAP_CAP(ID_AA64ISAR0_EL1, TS, FLAGM, CAP_HWCAP, KERNEL_HWCAP_FLAGM),
HWCAP_CAP(ID_AA64ISAR0_EL1, TS, FLAGM2, CAP_HWCAP, KERNEL_HWCAP_FLAGM2),
HWCAP_CAP(ID_AA64ISAR0_EL1, RNDR, IMP, CAP_HWCAP, KERNEL_HWCAP_RNG),
@@ -3313,7 +3318,9 @@ static const struct arm64_cpu_capabilities arm64_elf_hwcaps[] = {
HWCAP_CAP(ID_AA64ISAR3_EL1, LSFE, IMP, CAP_HWCAP, KERNEL_HWCAP_LSFE),
HWCAP_CAP(ID_AA64MMFR2_EL1, AT, IMP, CAP_HWCAP, KERNEL_HWCAP_USCAT),
#ifdef CONFIG_ARM64_SVE
+ HWCAP_CAP_MATCH_ID(has_sve_feature, ID_AA64ISAR2_EL1, LUT, LUT6, CAP_HWCAP, KERNEL_HWCAP_SVE_LUT6),
HWCAP_CAP(ID_AA64PFR0_EL1, SVE, IMP, CAP_HWCAP, KERNEL_HWCAP_SVE),
+ HWCAP_CAP_MATCH_ID(has_sve_feature, ID_AA64ZFR0_EL1, SVEver, SVE2p3, CAP_HWCAP, KERNEL_HWCAP_SVE2P3),
HWCAP_CAP_MATCH_ID(has_sve_feature, ID_AA64ZFR0_EL1, SVEver, SVE2p2, CAP_HWCAP, KERNEL_HWCAP_SVE2P2),
HWCAP_CAP_MATCH_ID(has_sve_feature, ID_AA64ZFR0_EL1, SVEver, SVE2p1, CAP_HWCAP, KERNEL_HWCAP_SVE2P1),
HWCAP_CAP_MATCH_ID(has_sve_feature, ID_AA64ZFR0_EL1, SVEver, SVE2, CAP_HWCAP, KERNEL_HWCAP_SVE2),
@@ -3323,6 +3330,7 @@ static const struct arm64_cpu_capabilities arm64_elf_hwcaps[] = {
HWCAP_CAP_MATCH_ID(has_sve_feature, ID_AA64ZFR0_EL1, BitPerm, IMP, CAP_HWCAP, KERNEL_HWCAP_SVEBITPERM),
HWCAP_CAP_MATCH_ID(has_sve_feature, ID_AA64ZFR0_EL1, B16B16, IMP, CAP_HWCAP, KERNEL_HWCAP_SVE_B16B16),
HWCAP_CAP_MATCH_ID(has_sve_feature, ID_AA64ZFR0_EL1, B16B16, BFSCALE, CAP_HWCAP, KERNEL_HWCAP_SVE_BFSCALE),
+ HWCAP_CAP_MATCH_ID(has_sve_feature, ID_AA64ZFR0_EL1, B16B16, B16MM, CAP_HWCAP, KERNEL_HWCAP_SVE_B16MM),
HWCAP_CAP_MATCH_ID(has_sve_feature, ID_AA64ZFR0_EL1, BF16, IMP, CAP_HWCAP, KERNEL_HWCAP_SVEBF16),
HWCAP_CAP_MATCH_ID(has_sve_feature, ID_AA64ZFR0_EL1, BF16, EBF16, CAP_HWCAP, KERNEL_HWCAP_SVE_EBF16),
HWCAP_CAP_MATCH_ID(has_sve_feature, ID_AA64ZFR0_EL1, SHA3, IMP, CAP_HWCAP, KERNEL_HWCAP_SVESHA3),
@@ -3362,7 +3370,9 @@ static const struct arm64_cpu_capabilities arm64_elf_hwcaps[] = {
#ifdef CONFIG_ARM64_SME
HWCAP_CAP(ID_AA64PFR1_EL1, SME, IMP, CAP_HWCAP, KERNEL_HWCAP_SME),
HWCAP_CAP_MATCH_ID(has_sme_feature, ID_AA64SMFR0_EL1, FA64, IMP, CAP_HWCAP, KERNEL_HWCAP_SME_FA64),
+ HWCAP_CAP_MATCH_ID(has_sme_feature, ID_AA64SMFR0_EL1, LUT6, IMP, CAP_HWCAP, KERNEL_HWCAP_SME_LUT6),
HWCAP_CAP_MATCH_ID(has_sme_feature, ID_AA64SMFR0_EL1, LUTv2, IMP, CAP_HWCAP, KERNEL_HWCAP_SME_LUTV2),
+ HWCAP_CAP_MATCH_ID(has_sme_feature, ID_AA64SMFR0_EL1, SMEver, SME2p3, CAP_HWCAP, KERNEL_HWCAP_SME2P3),
HWCAP_CAP_MATCH_ID(has_sme_feature, ID_AA64SMFR0_EL1, SMEver, SME2p2, CAP_HWCAP, KERNEL_HWCAP_SME2P2),
HWCAP_CAP_MATCH_ID(has_sme_feature, ID_AA64SMFR0_EL1, SMEver, SME2p1, CAP_HWCAP, KERNEL_HWCAP_SME2P1),
HWCAP_CAP_MATCH_ID(has_sme_feature, ID_AA64SMFR0_EL1, SMEver, SME2, CAP_HWCAP, KERNEL_HWCAP_SME2),
@@ -3393,6 +3403,7 @@ static const struct arm64_cpu_capabilities arm64_elf_hwcaps[] = {
HWCAP_CAP(ID_AA64FPFR0_EL1, F8DP2, IMP, CAP_HWCAP, KERNEL_HWCAP_F8DP2),
HWCAP_CAP(ID_AA64FPFR0_EL1, F8MM8, IMP, CAP_HWCAP, KERNEL_HWCAP_F8MM8),
HWCAP_CAP(ID_AA64FPFR0_EL1, F8MM4, IMP, CAP_HWCAP, KERNEL_HWCAP_F8MM4),
+ HWCAP_CAP(ID_AA64FPFR0_EL1, F16MM2, IMP, CAP_HWCAP, KERNEL_HWCAP_F16MM),
HWCAP_CAP(ID_AA64FPFR0_EL1, F8E4M3, IMP, CAP_HWCAP, KERNEL_HWCAP_F8E4M3),
HWCAP_CAP(ID_AA64FPFR0_EL1, F8E5M2, IMP, CAP_HWCAP, KERNEL_HWCAP_F8E5M2),
#ifdef CONFIG_ARM64_POE
diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
index 6149bc91251d..d50e2a9b066b 100644
--- a/arch/arm64/kernel/cpuinfo.c
+++ b/arch/arm64/kernel/cpuinfo.c
@@ -164,6 +164,14 @@ static const char *const hwcap_str[] = {
[KERNEL_HWCAP_MTE_FAR] = "mtefar",
[KERNEL_HWCAP_MTE_STORE_ONLY] = "mtestoreonly",
[KERNEL_HWCAP_LSFE] = "lsfe",
+ [KERNEL_HWCAP_SVE_B16MM] = "sveb16mm",
+ [KERNEL_HWCAP_SVE2P3] = "sve2p3",
+ [KERNEL_HWCAP_SME_LUT6] = "smelut6",
+ [KERNEL_HWCAP_SME2P3] = "sme2p3",
+ [KERNEL_HWCAP_F16MM] = "f16mm",
+ [KERNEL_HWCAP_F16F32DOT] = "f16f32dot",
+ [KERNEL_HWCAP_F16F32MM] = "f16f32mm",
+ [KERNEL_HWCAP_SVE_LUT6] = "svelut6",
};
#ifdef CONFIG_COMPAT
--
2.47.3