From: Sasha Levin <sashal@kernel.org>
To: Jonathan Corbet <corbet@lwn.net>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-api@vger.kernel.org, workflows@vger.kernel.org,
	tools@kernel.org, Kate Stewart <kstewart@linuxfoundation.org>,
	Gabriele Paoloni <gpaoloni@redhat.com>,
	Chuck Wolber <chuckwolber@gmail.com>
Subject: Re: [RFC v2 01/22] kernel/api: introduce kernel API specification framework
Date: Tue, 1 Jul 2025 16:50:42 -0400
Message-ID: <aGRKIuR6hgW0YLc_@lappy>
In-Reply-To: <8734bfspko.fsf@trenco.lwn.net>

On Tue, Jul 01, 2025 at 01:01:27PM -0600, Jonathan Corbet wrote:
>[Adding some of the ELISA folks, who are working in a related area and
>might have thoughts on this.  You can find the patch series under
>discussion at:
>
>  https://lore.kernel.org/all/20250624180742.5795-1-sashal@kernel.org

Yup, we all met at OSS and reached the conclusion that we should lean
towards a machine readable spec, which we thought was closer to my
proposal than the kerneldoc work.

However, with your suggestion, I think it makes more sense to go back to
kerneldoc as that can be made machine readable.

>> In theory, all of that will let us have something like the following in
>> kerneldoc:
>>
>> - @api-type: syscall
>> - @api-version: 1
>> - @context-flags: KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE
>> - @param-type: family, KAPI_TYPE_INT
>> - @param-flags: family, KAPI_PARAM_IN
>> - @param-range: family, 0, 45
>> - @param-mask: type, SOCK_TYPE_MASK | SOCK_CLOEXEC | SOCK_NONBLOCK
>> - @error-code: -EAFNOSUPPORT, "Address family not supported"
>> - @error-condition: -EAFNOSUPPORT, "family < 0 || family >= NPROTO"
>> - @capability: CAP_NET_RAW, KAPI_CAP_GRANT_PERMISSION
>> - @capability-allows: CAP_NET_RAW, "Create SOCK_RAW sockets"
>> - @since: 2.0
>> - @return-type: KAPI_TYPE_FD
>> - @return-check: KAPI_RETURN_ERROR_CHECK
>>
>> How does it sound? I'm pretty excited about the possibility to align this
>> with kerneldoc. Please poke holes in the plan :)
>
>I think we could do it without all the @signs.  We'd also want to see
>how well we could integrate that information with the minimal structure
>we already have: getting the return-value information into the Returns:
>section, for example, and tying the parameter constraints to the
>parameter descriptions we already have.

Right!

So I have a proof of concept which, during the build process, creates
.apispec.h files. They are generated from kerneldoc and contain macros
identical to the ones in my RFC.
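
To make the extraction step concrete, here is a minimal sketch (not the
proof-of-concept code) of how tagged lines could be pulled out of a
kerneldoc comment before anything is turned into macros. It is written in
Python purely for illustration. parse_spec_tags(), both regexes and the
shortened SAMPLE comment are hypothetical, and it assumes that tags are
lowercase "name: value" lines whose continuation lines are indented, as in
the sys_mlock() example in this message.

```python
import re

# Leading comment decoration: "/**", " * ", " */" and similar.
DECORATION_RE = re.compile(r"^\s*/?\*+/?\s?")
# Assumption: spec tags are lowercase words joined by dashes, followed by ':'.
# Standard kerneldoc sections ("Context:", "Return:") and "@param:" lines
# do not match and are left for the regular kerneldoc handling.
TAG_RE = re.compile(r"^([a-z][a-z-]*):\s*(.*)$")


def parse_spec_tags(comment):
    """Return (tag, value) pairs in source order, joining continuation lines."""
    entries = []
    open_entry = False
    for raw in comment.splitlines():
        line = DECORATION_RE.sub("", raw, count=1).rstrip()
        if not line.strip():
            continue  # blank comment line: keep the current entry open
        match = TAG_RE.match(line)
        if match:
            entries.append([match.group(1), match.group(2).strip()])
            open_entry = True
        elif open_entry and line[:1].isspace():
            # Indented line: continuation of the previous tag's value.
            entries[-1][1] += " " + line.strip()
        else:
            open_entry = False  # ordinary kerneldoc text ends the entry
    return [(tag, value) for tag, value in entries]


# Shortened excerpt in the same format as the sys_mlock() comment.
SAMPLE = """\
/**
 * sys_mlock - Lock pages in memory
 * @start: Starting address of memory range to lock
 *
 * context-flags: KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE
 * param-range: len, 0, LONG_MAX
 * error-code: -EPERM, EPERM, Insufficient privileges,
 *   The caller is not privileged (no CAP_IPC_LOCK) and RLIMIT_MEMLOCK is 0.
 *
 * Context: Process context. May sleep. Takes mmap_lock for write.
 */
"""

if __name__ == "__main__":
    for tag, value in parse_spec_tags(SAMPLE):
        print(f"{tag}: {value}")
```

Keeping the pairs as an ordered list rather than a dict is deliberate:
tags such as param-type or error-code repeat, and the order is what ties a
param-range line to the parameter it describes.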

Here's an example of the sys_mlock() spec:

/**
  * sys_mlock - Lock pages in memory
  * @start: Starting address of memory range to lock
  * @len: Length of memory range to lock in bytes
  *
  * Locks pages in the specified address range into RAM, preventing them from
  * being paged to swap. Requires CAP_IPC_LOCK capability or RLIMIT_MEMLOCK
  * resource limit.
  *
  * long-desc: Locks pages in the specified address range into RAM, preventing
  *   them from being paged to swap. Requires CAP_IPC_LOCK capability
  *   or RLIMIT_MEMLOCK resource limit.
  * context-flags: KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE
  * param-type: start, KAPI_TYPE_UINT
  * param-flags: start, KAPI_PARAM_IN
  * param-constraint-type: start, KAPI_CONSTRAINT_NONE
  * param-constraint: start, Rounded down to page boundary
  * param-type: len, KAPI_TYPE_UINT
  * param-flags: len, KAPI_PARAM_IN
  * param-constraint-type: len, KAPI_CONSTRAINT_RANGE
  * param-range: len, 0, LONG_MAX
  * param-constraint: len, Rounded up to page boundary
  * return-type: KAPI_TYPE_INT
  * return-check-type: KAPI_RETURN_ERROR_CHECK
  * return-success: 0
  * error-code: -ENOMEM, ENOMEM, Address range issue,
  *   Some of the specified range is not mapped, has unmapped gaps,
  *   or the lock would cause the number of mapped regions to exceed the limit.
  * error-code: -EPERM, EPERM, Insufficient privileges,
  *   The caller is not privileged (no CAP_IPC_LOCK) and RLIMIT_MEMLOCK is 0.
  * error-code: -EINVAL, EINVAL, Address overflow,
  *   The result of the addition start+len was less than start (arithmetic overflow).
  * error-code: -EAGAIN, EAGAIN, Some or all memory could not be locked,
  *   Some or all of the specified address range could not be locked.
  * error-code: -EINTR, EINTR, Interrupted by signal,
  *   The operation was interrupted by a fatal signal before completion.
  * error-code: -EFAULT, EFAULT, Bad address,
  *   The specified address range contains invalid addresses that cannot be accessed.
  * since-version: 2.0
  * lock: mmap_lock, KAPI_LOCK_RWLOCK
  * lock-acquired: true
  * lock-released: true
  * lock-desc: Process memory map write lock
  * signal: FATAL
  * signal-direction: KAPI_SIGNAL_RECEIVE
  * signal-action: KAPI_SIGNAL_ACTION_RETURN
  * signal-condition: Fatal signal pending
  * signal-desc: Fatal signals (SIGKILL) can interrupt the operation at two points:
  *   when acquiring mmap_write_lock_killable() and during page population
  *   in __mm_populate(). Returns -EINTR. Non-fatal signals do NOT interrupt
  *   mlock - the operation continues even if SIGINT/SIGTERM are received.
  * signal-error: -EINTR
  * signal-timing: KAPI_SIGNAL_TIME_DURING
  * signal-priority: 0
  * signal-interruptible: yes
  * signal-state-req: KAPI_SIGNAL_STATE_RUNNING
  * examples: mlock(addr, 4096);  // Lock one page
  *   mlock(addr, len);   // Lock range of pages
  * notes: Memory locks do not stack - multiple calls on the same range can be
  *   undone by a single munlock. Locks are not inherited by child processes.
  *   Pages are locked on whole page boundaries. Commonly used by real-time
  *   applications to prevent page faults during time-critical operations.
  *   Also used for security to prevent sensitive data (e.g., cryptographic keys)
  *   from being written to swap. Note: locked pages may still be saved to
  *   swap during system suspend/hibernate.
  *
  *   Tagged addresses are automatically handled via untagged_addr(). The operation
  *   occurs in two phases: first VMAs are marked with VM_LOCKED, then pages are
  *   populated into memory. When checking RLIMIT_MEMLOCK, the kernel optimizes
  *   by recounting locked memory to avoid double-counting overlapping regions.
  * side-effect: KAPI_EFFECT_MODIFY_STATE | KAPI_EFFECT_ALLOC_MEMORY, process memory, Locks pages into physical memory, preventing swapping, reversible=yes
  * side-effect: KAPI_EFFECT_MODIFY_STATE, mm->locked_vm, Increases process locked memory counter, reversible=yes
  * side-effect: KAPI_EFFECT_ALLOC_MEMORY, physical pages, May allocate and populate page table entries, condition=Pages not already present, reversible=yes
  * side-effect: KAPI_EFFECT_MODIFY_STATE | KAPI_EFFECT_ALLOC_MEMORY, page faults, Triggers page faults to bring pages into memory, condition=Pages not already resident
  * side-effect: KAPI_EFFECT_MODIFY_STATE, VMA splitting, May split existing VMAs at lock boundaries, condition=Lock range partially overlaps existing VMA
  * state-trans: memory pages, swappable, locked in RAM, Pages become non-swappable and pinned in physical memory
  * state-trans: VMA flags, unlocked, VM_LOCKED set, Virtual memory area marked as locked
  * capability: CAP_IPC_LOCK, KAPI_CAP_BYPASS_CHECK, CAP_IPC_LOCK capability
  * capability-allows: Lock unlimited amount of memory (no RLIMIT_MEMLOCK enforcement)
  * capability-without: Must respect RLIMIT_MEMLOCK resource limit
  * capability-condition: Checked when RLIMIT_MEMLOCK is 0 or locking would exceed limit
  * capability-priority: 0
  * constraint: RLIMIT_MEMLOCK Resource Limit, The RLIMIT_MEMLOCK soft resource limit specifies the maximum bytes of memory that may be locked into RAM. Unprivileged processes are restricted to this limit. CAP_IPC_LOCK capability allows bypassing this limit entirely. The limit is enforced per-process, not per-user.
  * constraint-expr: RLIMIT_MEMLOCK Resource Limit, locked_memory + request_size <= RLIMIT_MEMLOCK || CAP_IPC_LOCK
  * constraint: Memory Pressure and OOM, Locking large amounts of memory can cause system-wide memory pressure and potentially trigger the OOM killer. The kernel does not prevent locking memory that would destabilize the system.
  * constraint: Special Memory Areas, Some memory types cannot be locked or are silently skipped: VM_IO/VM_PFNMAP areas (device mappings) are skipped; Hugetlb pages are inherently pinned and skipped; DAX mappings are always present in memory and skipped; Secret memory (memfd_secret) mappings are skipped; VM_DROPPABLE memory cannot be locked and is skipped; Gate VMA (kernel entry point) is skipped; VM_LOCKED areas are already locked. These special areas are silently excluded without error.
  *
  * Context: Process context. May sleep. Takes mmap_lock for write.
  *
  * Return: 0 on success, negative error code on failure
  */
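
To spell out what a few of these lines are claiming, here is a small Python
model of the param-constraint, -EINVAL and constraint-expr entries above.
It follows the spec text rather than mm/mlock.c. PAGE_SIZE, ULONG_MAX and
both function names are assumptions made for the example.

```python
PAGE_SIZE = 4096        # assumption: 4 KiB pages
ULONG_MAX = 2**64 - 1   # assumption: 64-bit unsigned long


def normalize_range(start, length):
    """Model 'start rounded down' and 'len rounded up' to page boundaries.

    Raises OverflowError where the spec lists -EINVAL (start + len wraps).
    """
    if start + length > ULONG_MAX:
        raise OverflowError("start + len overflows (-EINVAL in the spec)")
    offset = start % PAGE_SIZE
    page_start = start - offset
    # Ceiling division so a partially covered last page is included.
    page_len = -(-(length + offset) // PAGE_SIZE) * PAGE_SIZE
    return page_start, page_len


def may_lock(locked_bytes, request_bytes, rlimit_memlock, has_cap_ipc_lock):
    """Model: locked_memory + request_size <= RLIMIT_MEMLOCK || CAP_IPC_LOCK."""
    return locked_bytes + request_bytes <= rlimit_memlock or has_cap_ipc_lock


if __name__ == "__main__":
    print(normalize_range(4100, 10))
    print(may_lock(4096, 8192, 8192, False))
```

A checker built on the generated spec could evaluate expressions like these
against real arguments, which is where a machine-readable constraint-expr
earns its keep over the free-text constraint lines.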

>The other thing I would really like to see, to the extent we can, is
>that a bunch of patches adding all this data to the source will actually
>be accepted by the relevant maintainers.  It would be a shame to get all
>this infrastructure into place, then have things stall out due to
>maintainer pushback.  Maybe you should start by annotating the
>scheduler-related system calls; if that works the rest should be a piece
>of cake :)

In the RFC I've sent out I've specced out APIs from different subsystems
to solicit some feedback on those, but so far it's been quiet.

I'll resend a "lean" RFC v3 with just the base macro spec infra +
kerneldoc support + "trickier" sched API + "trickier" mm API.

I'm thinking that if it's still quiet in a month or two I'll propose a
talk at LPC around it, or maybe try and get feedback/consensus during
the Maintainers Summit.

But yes, it doesn't make sense to take it in until we have an ack from a
few larger subsystems.

-- 
Thanks,
Sasha
