From: Sasha Levin <sashal@kernel.org>
To: Jonathan Corbet <corbet@lwn.net>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>,
linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
linux-api@vger.kernel.org, workflows@vger.kernel.org,
tools@kernel.org, Kate Stewart <kstewart@linuxfoundation.org>,
Gabriele Paoloni <gpaoloni@redhat.com>,
Chuck Wolber <chuckwolber@gmail.com>
Subject: Re: [RFC v2 01/22] kernel/api: introduce kernel API specification framework
Date: Tue, 1 Jul 2025 18:16:13 -0400
Message-ID: <aGReLaCleEjzu-nt@lappy>
In-Reply-To: <87v7obpoxn.fsf@trenco.lwn.net>
On Tue, Jul 01, 2025 at 03:43:32PM -0600, Jonathan Corbet wrote:
>Sasha Levin <sashal@kernel.org> writes:
>
>> So I have a proof of concept which during the build process creates
>> .apispec.h which are generated from kerneldoc and contain macros
>> identical to the ones in my RFC.
>>
>> Here's an example of sys_mlock() spec:
>
>So I'm getting ahead of the game, but I have to ask some questions...
>
>> /**
>> * sys_mlock - Lock pages in memory
>> * @start: Starting address of memory range to lock
>> * @len: Length of memory range to lock in bytes
>> *
>> * Locks pages in the specified address range into RAM, preventing them from
>> * being paged to swap. Requires CAP_IPC_LOCK capability or RLIMIT_MEMLOCK
>> * resource limit.
>> *
>> * long-desc: Locks pages in the specified address range into RAM, preventing
>> * them from being paged to swap. Requires CAP_IPC_LOCK capability
>> * or RLIMIT_MEMLOCK resource limit.
>
>Why duplicate the long description?
Will fix.
>> * context-flags: KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE
>> * param-type: start, KAPI_TYPE_UINT
>
>This is something I wondered before; rather than a bunch of lengthy
>KAPI_* symbols, why not just say __u64 (or some other familiar type)
>here?
I think it gets tricky when we get to more complex types. For example,
how do we represent an FD or a (struct sockaddr *)?
With macros, KAPI_TYPE_FD or KAPI_TYPE_SOCKADDR make sense, but
__sockaddr would be a bit confusing (I think).
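As a rough illustration of that point, here is a minimal sketch of how a
semantic type tag can carry more than the C type does. Only the KAPI_TYPE_*
names come from this thread; the enum layout, struct kapi_type_info, the
table contents and kapi_type_lookup() are hypothetical and are not taken
from the RFC.

```c
#include <stddef.h>

/* Hypothetical enum: only the KAPI_TYPE_* names appear in this thread. */
enum kapi_param_type {
	KAPI_TYPE_INT,
	KAPI_TYPE_UINT,
	KAPI_TYPE_FD,
	KAPI_TYPE_SOCKADDR,
};

/* Hypothetical descriptor pairing a semantic tag with a C type. */
struct kapi_type_info {
	enum kapi_param_type type;
	const char *c_type;	/* assumed C representation */
	const char *meaning;	/* what a checker would need to know */
};

static const struct kapi_type_info kapi_types[] = {
	{ KAPI_TYPE_INT,      "int",               "plain signed integer" },
	{ KAPI_TYPE_UINT,     "unsigned long",     "plain unsigned integer" },
	{ KAPI_TYPE_FD,       "int",               "index into the caller's fd table" },
	{ KAPI_TYPE_SOCKADDR, "struct sockaddr *", "user pointer to a socket address" },
};

/* Return the descriptor for a tag, or NULL if the tag is unknown. */
static const struct kapi_type_info *kapi_type_lookup(enum kapi_param_type type)
{
	size_t i;

	for (i = 0; i < sizeof(kapi_types) / sizeof(kapi_types[0]); i++)
		if (kapi_types[i].type == type)
			return &kapi_types[i];
	return NULL;
}
```

In this sketch KAPI_TYPE_INT and KAPI_TYPE_FD share the same assumed C
type, so a bare "int" in the spec could not tell a checker that one of
them has to be a valid file descriptor.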
>> * param-flags: start, KAPI_PARAM_IN
>> * param-constraint-type: start, KAPI_CONSTRAINT_NONE
>> * param-constraint: start, Rounded down to page boundary
>> * param-type: len, KAPI_TYPE_UINT
>> * param-flags: len, KAPI_PARAM_IN
>> * param-constraint-type: len, KAPI_CONSTRAINT_RANGE
>> * param-range: len, 0, LONG_MAX
>> * param-constraint: len, Rounded up to page boundary
>> * return-type: KAPI_TYPE_INT
>> * return-check-type: KAPI_RETURN_ERROR_CHECK
>> * return-success: 0
>> * error-code: -ENOMEM, ENOMEM, Address range issue,
>> * Some of the specified range is not mapped, has unmapped gaps,
>> * or the lock would cause the number of mapped regions to exceed the limit.
>> * error-code: -EPERM, EPERM, Insufficient privileges,
>> * The caller is not privileged (no CAP_IPC_LOCK) and RLIMIT_MEMLOCK is 0.
>> * error-code: -EINVAL, EINVAL, Address overflow,
>> * The result of the addition start+len was less than start (arithmetic overflow).
>> * error-code: -EAGAIN, EAGAIN, Some or all memory could not be locked,
>> * Some or all of the specified address range could not be locked.
>> * error-code: -EINTR, EINTR, Interrupted by signal,
>> * The operation was interrupted by a fatal signal before completion.
>> * error-code: -EFAULT, EFAULT, Bad address,
>> * The specified address range contains invalid addresses that cannot be accessed.
>> * since-version: 2.0
>> * lock: mmap_lock, KAPI_LOCK_RWLOCK
>> * lock-acquired: true
>> * lock-released: true
>> * lock-desc: Process memory map write lock
>> * signal: FATAL
>> * signal-direction: KAPI_SIGNAL_RECEIVE
>> * signal-action: KAPI_SIGNAL_ACTION_RETURN
>> * signal-condition: Fatal signal pending
>> * signal-desc: Fatal signals (SIGKILL) can interrupt the operation at two points:
>> * when acquiring mmap_write_lock_killable() and during page population
>> * in __mm_populate(). Returns -EINTR. Non-fatal signals do NOT interrupt
>> * mlock - the operation continues even if SIGINT/SIGTERM are received.
>> * signal-error: -EINTR
>> * signal-timing: KAPI_SIGNAL_TIME_DURING
>> * signal-priority: 0
>> * signal-interruptible: yes
>> * signal-state-req: KAPI_SIGNAL_STATE_RUNNING
>> * examples: mlock(addr, 4096); // Lock one page
>> * mlock(addr, len); // Lock range of pages
>> * notes: Memory locks do not stack - multiple calls on the same range can be
>> * undone by a single munlock. Locks are not inherited by child processes.
>> * Pages are locked on whole page boundaries. Commonly used by real-time
>> * applications to prevent page faults during time-critical operations.
>> * Also used for security to prevent sensitive data (e.g., cryptographic keys)
>> * from being written to swap. Note: locked pages may still be saved to
>> * swap during system suspend/hibernate.
>> *
>> * Tagged addresses are automatically handled via untagged_addr(). The operation
>> * occurs in two phases: first VMAs are marked with VM_LOCKED, then pages are
>> * populated into memory. When checking RLIMIT_MEMLOCK, the kernel optimizes
>> * by recounting locked memory to avoid double-counting overlapping regions.
>> * side-effect: KAPI_EFFECT_MODIFY_STATE | KAPI_EFFECT_ALLOC_MEMORY, process memory, Locks pages into physical memory, preventing swapping, reversible=yes
>
>I hope the really long lines starting here aren't the intended way to go...:)
I guess we have two options for more complex blocks like these.
One is the longer lines you've pointed out. They are indeed long and
difficult to read, but they present relatively static and "not too
interesting" information which users are likely to gloss over.
The other would look something like:
side-effect: KAPI_EFFECT_MODIFY_STATE
side-effect-type: KAPI_EFFECT_MODIFY_STATE
side-effect-target: mm->locked_vm
side-effect-description: Increases process locked memory counter
side-effect-reversible: yes
That avoids the long lines, but it takes up a lot of vertical real estate
while not being too interesting for most readers.
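To make the trade-off concrete, here is a minimal sketch of what a tool
would have to do to turn the single-line form into the same fields the
multi-line form spells out. The field order (type, target, description,
then optional key=value pairs) is read off the examples in this thread;
struct side_effect, next_field() and parse_side_effect() are hypothetical
names, and the sketch assumes the description contains no commas.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory form of one side-effect entry. */
struct side_effect {
	char type[64];
	char target[64];
	char desc[128];
	char condition[128];
	bool reversible;
};

/* Cut the next comma-separated field out of *cursor, skipping leading spaces. */
static char *next_field(char **cursor)
{
	char *start = *cursor;
	char *comma;

	if (!start)
		return NULL;
	while (*start == ' ')
		start++;
	comma = strchr(start, ',');
	if (comma) {
		*comma = '\0';
		*cursor = comma + 1;
	} else {
		*cursor = NULL;
	}
	return start;
}

/* Parse "type, target, description[, key=value...]"; modifies line in place. */
static int parse_side_effect(char *line, struct side_effect *out)
{
	static const char rev[] = "reversible=";
	static const char cond[] = "condition=";
	char *cursor = line;
	char *tok;
	int field = 0;

	memset(out, 0, sizeof(*out));
	while ((tok = next_field(&cursor)) != NULL) {
		if (field == 0)
			snprintf(out->type, sizeof(out->type), "%s", tok);
		else if (field == 1)
			snprintf(out->target, sizeof(out->target), "%s", tok);
		else if (field == 2)
			snprintf(out->desc, sizeof(out->desc), "%s", tok);
		else if (strncmp(tok, rev, sizeof(rev) - 1) == 0)
			out->reversible = strcmp(tok + sizeof(rev) - 1, "yes") == 0;
		else if (strncmp(tok, cond, sizeof(cond) - 1) == 0)
			snprintf(out->condition, sizeof(out->condition), "%s",
				 tok + sizeof(cond) - 1);
		field++;
	}
	/* The three positional fields are mandatory. */
	return field >= 3 ? 0 : -1;
}
```

The positional form relies on the description never containing a comma,
which the multi-line side-effect-* keys would not need to assume.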
>> * side-effect: KAPI_EFFECT_MODIFY_STATE, mm->locked_vm, Increases process locked memory counter, reversible=yes
>> * side-effect: KAPI_EFFECT_ALLOC_MEMORY, physical pages, May allocate and populate page table entries, condition=Pages not already present, reversible=yes
>> * side-effect: KAPI_EFFECT_MODIFY_STATE | KAPI_EFFECT_ALLOC_MEMORY, page faults, Triggers page faults to bring pages into memory, condition=Pages not already resident
>> * side-effect: KAPI_EFFECT_MODIFY_STATE, VMA splitting, May split existing VMAs at lock boundaries, condition=Lock range partially overlaps existing VMA
>> * state-trans: memory pages, swappable, locked in RAM, Pages become non-swappable and pinned in physical memory
>> * state-trans: VMA flags, unlocked, VM_LOCKED set, Virtual memory area marked as locked
>> * capability: CAP_IPC_LOCK, KAPI_CAP_BYPASS_CHECK, CAP_IPC_LOCK capability
>> * capability-allows: Lock unlimited amount of memory (no RLIMIT_MEMLOCK enforcement)
>> * capability-without: Must respect RLIMIT_MEMLOCK resource limit
>> * capability-condition: Checked when RLIMIT_MEMLOCK is 0 or locking would exceed limit
>> * capability-priority: 0
>> * constraint: RLIMIT_MEMLOCK Resource Limit, The RLIMIT_MEMLOCK soft resource limit specifies the maximum bytes of memory that may be locked into RAM. Unprivileged processes are restricted to this limit. CAP_IPC_LOCK capability allows bypassing this limit entirely. The limit is enforced per-process, not per-user.
>> * constraint-expr: RLIMIT_MEMLOCK Resource Limit, locked_memory + request_size <= RLIMIT_MEMLOCK || CAP_IPC_LOCK
>> * constraint: Memory Pressure and OOM, Locking large amounts of memory can cause system-wide memory pressure and potentially trigger the OOM killer. The kernel does not prevent locking memory that would destabilize the system.
>> * constraint: Special Memory Areas, Some memory types cannot be locked or are silently skipped: VM_IO/VM_PFNMAP areas (device mappings) are skipped; Hugetlb pages are inherently pinned and skipped; DAX mappings are always present in memory and skipped; Secret memory (memfd_secret) mappings are skipped; VM_DROPPABLE memory cannot be locked and is skipped; Gate VMA (kernel entry point) is skipped; VM_LOCKED areas are already locked. These special areas are silently excluded without error.
>> *
>> * Context: Process context. May sleep. Takes mmap_lock for write.
>> *
>> * Return: 0 on success, negative error code on failure
>
>Both of these, of course, are much less informative versions of the data
>you have put up above; it would be nice to unify them somehow.
Ack
--
Thanks,
Sasha