From: Jonathan Corbet <corbet@lwn.net>
To: Sasha Levin <sashal@kernel.org>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-api@vger.kernel.org, workflows@vger.kernel.org,
	tools@kernel.org, Kate Stewart <kstewart@linuxfoundation.org>,
	Gabriele Paoloni <gpaoloni@redhat.com>,
	Chuck Wolber <chuckwolber@gmail.com>
Subject: Re: [RFC v2 01/22] kernel/api: introduce kernel API specification framework
Date: Tue, 01 Jul 2025 15:43:32 -0600
Message-ID: <87v7obpoxn.fsf@trenco.lwn.net>
In-Reply-To: <aGRKIuR6hgW0YLc_@lappy>

Sasha Levin <sashal@kernel.org> writes:

> So I have a proof of concept which during the build process creates
> .apispec.h which are generated from kerneldoc and contain macros
> identical to the ones in my RFC.
>
> Here's an example of sys_mlock() spec:

So I'm getting ahead of the game, but I have to ask some questions...

> /**
>   * sys_mlock - Lock pages in memory
>   * @start: Starting address of memory range to lock
>   * @len: Length of memory range to lock in bytes
>   *
>   * Locks pages in the specified address range into RAM, preventing them from
>   * being paged to swap. Requires CAP_IPC_LOCK capability or RLIMIT_MEMLOCK
>   * resource limit.
>   *
>   * long-desc: Locks pages in the specified address range into RAM, preventing
>   *   them from being paged to swap. Requires CAP_IPC_LOCK capability
>   *   or RLIMIT_MEMLOCK resource limit.

Why duplicate the long description?

>   * context-flags: KAPI_CTX_PROCESS | KAPI_CTX_SLEEPABLE
>   * param-type: start, KAPI_TYPE_UINT

This is something I wondered before; rather than a bunch of lengthy
KAPI_* symbols, why not just say __u64 (or some other familiar type)
here?
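
To make the question concrete, here is a rough sketch (not part of the RFC) of how a build-time tool could accept familiar C type names in the kerneldoc and translate them into the KAPI_* symbols itself. Only KAPI_TYPE_UINT and KAPI_TYPE_INT appear in the quoted spec; the mapping table and the translate_param_type() helper are hypothetical names made up for illustration.

```python
# Hypothetical mapping from familiar C type names to the RFC's KAPI_TYPE_*
# symbols. Only KAPI_TYPE_UINT and KAPI_TYPE_INT appear in the quoted spec;
# which C types map to which symbol is an assumption made for illustration.
C_TYPE_TO_KAPI = {
    "__u64": "KAPI_TYPE_UINT",
    "unsigned long": "KAPI_TYPE_UINT",
    "size_t": "KAPI_TYPE_UINT",
    "int": "KAPI_TYPE_INT",
    "long": "KAPI_TYPE_INT",
}


def translate_param_type(line: str) -> str:
    """Rewrite 'param-type: name, <C type>' to use a KAPI_TYPE_* symbol.

    Lines that are not param-type tags, or whose type is not in the table,
    are returned unchanged.
    """
    tag, sep, rest = line.partition(":")
    if not sep or tag.strip() != "param-type":
        return line
    name, comma, ctype = rest.partition(",")
    if not comma:
        return line
    kapi = C_TYPE_TO_KAPI.get(ctype.strip())
    if kapi is None:
        return line
    return f"param-type: {name.strip()}, {kapi}"
```

With something like this, the comment could read "param-type: start, unsigned long" while the generated .apispec.h still carries the KAPI_* macro.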

>   * param-flags: start, KAPI_PARAM_IN
>   * param-constraint-type: start, KAPI_CONSTRAINT_NONE
>   * param-constraint: start, Rounded down to page boundary
>   * param-type: len, KAPI_TYPE_UINT
>   * param-flags: len, KAPI_PARAM_IN
>   * param-constraint-type: len, KAPI_CONSTRAINT_RANGE
>   * param-range: len, 0, LONG_MAX
>   * param-constraint: len, Rounded up to page boundary
>   * return-type: KAPI_TYPE_INT
>   * return-check-type: KAPI_RETURN_ERROR_CHECK
>   * return-success: 0
>   * error-code: -ENOMEM, ENOMEM, Address range issue,
>   *   Some of the specified range is not mapped, has unmapped gaps,
>   *   or the lock would cause the number of mapped regions to exceed the limit.
>   * error-code: -EPERM, EPERM, Insufficient privileges,
>   *   The caller is not privileged (no CAP_IPC_LOCK) and RLIMIT_MEMLOCK is 0.
>   * error-code: -EINVAL, EINVAL, Address overflow,
>   *   The result of the addition start+len was less than start (arithmetic overflow).
>   * error-code: -EAGAIN, EAGAIN, Some or all memory could not be locked,
>   *   Some or all of the specified address range could not be locked.
>   * error-code: -EINTR, EINTR, Interrupted by signal,
>   *   The operation was interrupted by a fatal signal before completion.
>   * error-code: -EFAULT, EFAULT, Bad address,
>   *   The specified address range contains invalid addresses that cannot be accessed.
>   * since-version: 2.0
>   * lock: mmap_lock, KAPI_LOCK_RWLOCK
>   * lock-acquired: true
>   * lock-released: true
>   * lock-desc: Process memory map write lock
>   * signal: FATAL
>   * signal-direction: KAPI_SIGNAL_RECEIVE
>   * signal-action: KAPI_SIGNAL_ACTION_RETURN
>   * signal-condition: Fatal signal pending
>   * signal-desc: Fatal signals (SIGKILL) can interrupt the operation at two points:
>   *   when acquiring mmap_write_lock_killable() and during page population
>   *   in __mm_populate(). Returns -EINTR. Non-fatal signals do NOT interrupt
>   *   mlock - the operation continues even if SIGINT/SIGTERM are received.
>   * signal-error: -EINTR
>   * signal-timing: KAPI_SIGNAL_TIME_DURING
>   * signal-priority: 0
>   * signal-interruptible: yes
>   * signal-state-req: KAPI_SIGNAL_STATE_RUNNING
>   * examples: mlock(addr, 4096);  // Lock one page
>   *   mlock(addr, len);   // Lock range of pages
>   * notes: Memory locks do not stack - multiple calls on the same range can be
>   *   undone by a single munlock. Locks are not inherited by child processes.
>   *   Pages are locked on whole page boundaries. Commonly used by real-time
>   *   applications to prevent page faults during time-critical operations.
>   *   Also used for security to prevent sensitive data (e.g., cryptographic keys)
>   *   from being written to swap. Note: locked pages may still be saved to
>   *   swap during system suspend/hibernate.
>   *
>   *   Tagged addresses are automatically handled via untagged_addr(). The operation
>   *   occurs in two phases: first VMAs are marked with VM_LOCKED, then pages are
>   *   populated into memory. When checking RLIMIT_MEMLOCK, the kernel optimizes
>   *   by recounting locked memory to avoid double-counting overlapping regions.
>   * side-effect: KAPI_EFFECT_MODIFY_STATE | KAPI_EFFECT_ALLOC_MEMORY, process memory, Locks pages into physical memory, preventing swapping, reversible=yes

I hope the really long lines starting here aren't the intended way to go...:)
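
For illustration only: the error-code entries above already use indented continuation lines, so one possible approach is to wrap the long tags the same way. The sketch below uses Python's textwrap module. The wrap_tag_line() helper and the 80-column limit are assumptions, and whether the RFC's parser accepts continuations for every tag is not something the quoted text confirms.

```python
import textwrap


def wrap_tag_line(line, width=80):
    """Wrap one ' * tag: value' kerneldoc line using ' *   ' continuations."""
    text = line.strip()
    # Drop the leading comment star so only the tag text gets wrapped.
    if text.startswith("*"):
        text = text[1:].strip()
    return textwrap.wrap(
        text,
        width=width,
        initial_indent=" * ",
        subsequent_indent=" *   ",
        break_long_words=False,   # never split identifiers such as KAPI_* names
        break_on_hyphens=False,   # keep tokens like mm->locked_vm intact
    )
```

Each returned string is one output line; the first carries the tag and the rest are continuations indented like the existing error-code descriptions.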

>   * side-effect: KAPI_EFFECT_MODIFY_STATE, mm->locked_vm, Increases process locked memory counter, reversible=yes
>   * side-effect: KAPI_EFFECT_ALLOC_MEMORY, physical pages, May allocate and populate page table entries, condition=Pages not already present, reversible=yes
>   * side-effect: KAPI_EFFECT_MODIFY_STATE | KAPI_EFFECT_ALLOC_MEMORY, page faults, Triggers page faults to bring pages into memory, condition=Pages not already resident
>   * side-effect: KAPI_EFFECT_MODIFY_STATE, VMA splitting, May split existing VMAs at lock boundaries, condition=Lock range partially overlaps existing VMA
>   * state-trans: memory pages, swappable, locked in RAM, Pages become non-swappable and pinned in physical memory
>   * state-trans: VMA flags, unlocked, VM_LOCKED set, Virtual memory area marked as locked
>   * capability: CAP_IPC_LOCK, KAPI_CAP_BYPASS_CHECK, CAP_IPC_LOCK capability
>   * capability-allows: Lock unlimited amount of memory (no RLIMIT_MEMLOCK enforcement)
>   * capability-without: Must respect RLIMIT_MEMLOCK resource limit
>   * capability-condition: Checked when RLIMIT_MEMLOCK is 0 or locking would exceed limit
>   * capability-priority: 0
>   * constraint: RLIMIT_MEMLOCK Resource Limit, The RLIMIT_MEMLOCK soft resource limit specifies the maximum bytes of memory that may be locked into RAM. Unprivileged processes are restricted to this limit. CAP_IPC_LOCK capability allows bypassing this limit entirely. The limit is enforced per-process, not per-user.
>   * constraint-expr: RLIMIT_MEMLOCK Resource Limit, locked_memory + request_size <= RLIMIT_MEMLOCK || CAP_IPC_LOCK
>   * constraint: Memory Pressure and OOM, Locking large amounts of memory can cause system-wide memory pressure and potentially trigger the OOM killer. The kernel does not prevent locking memory that would destabilize the system.
>   * constraint: Special Memory Areas, Some memory types cannot be locked or are silently skipped: VM_IO/VM_PFNMAP areas (device mappings) are skipped; Hugetlb pages are inherently pinned and skipped; DAX mappings are always present in memory and skipped; Secret memory (memfd_secret) mappings are skipped; VM_DROPPABLE memory cannot be locked and is skipped; Gate VMA (kernel entry point) is skipped; VM_LOCKED areas are already locked. These special areas are silently excluded without error.
>   *
>   * Context: Process context. May sleep. Takes mmap_lock for write.
>   *
>   * Return: 0 on success, negative error code on failure

Both of these, of course, are much less informative versions of the data
you have put up above; it would be nice to unify them somehow.
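
One way the unification could look, as a sketch under stated assumptions rather than a proposal for the actual tooling: generate the "Context:" and "Return:" lines from the structured tags, so they need not be written by hand. The tag names come from the quoted spec. The parser, the phrase table, and the output wording are hypothetical, and the parser is deliberately naive (it ignores continuation lines that lack a colon).

```python
# Hypothetical wording for each context flag; the flag names come from the
# quoted spec, the phrases mirror its existing "Context:" line.
CONTEXT_PHRASES = {
    "KAPI_CTX_PROCESS": "Process context.",
    "KAPI_CTX_SLEEPABLE": "May sleep.",
}


def parse_tags(lines):
    """Collect 'tag: value' lines into a dict of tag -> list of values."""
    tags = {}
    for line in lines:
        tag, sep, value = line.partition(":")
        if sep:
            tags.setdefault(tag.strip(), []).append(value.strip())
    return tags


def context_line(tags):
    """Build a 'Context:' sentence from context-flags and lock tags."""
    parts = []
    for flags in tags.get("context-flags", []):
        for flag in flags.split("|"):
            phrase = CONTEXT_PHRASES.get(flag.strip())
            if phrase:
                parts.append(phrase)
    for lock in tags.get("lock", []):
        # The lock name is the first comma-separated field of the tag.
        parts.append(f"Takes {lock.split(',')[0].strip()}.")
    return "Context: " + " ".join(parts)


def return_line(tags):
    """Build a 'Return:' sentence from return-success and error-code tags."""
    success = tags["return-success"][0]
    # The first field of each error-code tag is the negative errno value.
    errors = [v.split(",")[0].strip() for v in tags.get("error-code", [])]
    return f"Return: {success} on success, or one of {', '.join(errors)} on failure"
```

Fed the context-flags, lock, return-success, and error-code tags from the mlock spec, this yields a Context line naming process context, sleeping, and mmap_lock, and a Return line that lists the individual error codes.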

Thanks,

jon


Thread overview: 33+ messages
2025-06-24 18:07 [RFC v2 00/22] Kernel API specification framework Sasha Levin
2025-06-24 18:07 ` [RFC v2 01/22] kernel/api: introduce kernel " Sasha Levin
2025-06-30 19:53   ` Jonathan Corbet
2025-06-30 22:20     ` Mauro Carvalho Chehab
2025-07-01 14:23       ` Sasha Levin
2025-07-01 15:25         ` Mauro Carvalho Chehab
2025-07-01 19:01         ` Jonathan Corbet
2025-07-01 20:50           ` Sasha Levin
2025-07-01 21:43             ` Jonathan Corbet [this message]
2025-07-01 22:16               ` Sasha Levin
2025-06-24 18:07 ` [RFC v2 02/22] eventpoll: add API specification for epoll_create1 Sasha Levin
2025-06-24 18:07 ` [RFC v2 03/22] eventpoll: add API specification for epoll_create Sasha Levin
2025-06-24 18:07 ` [RFC v2 04/22] eventpoll: add API specification for epoll_ctl Sasha Levin
2025-06-24 18:07 ` [RFC v2 05/22] eventpoll: add API specification for epoll_wait Sasha Levin
2025-06-24 18:07 ` [RFC v2 06/22] eventpoll: add API specification for epoll_pwait Sasha Levin
2025-06-24 18:07 ` [RFC v2 07/22] eventpoll: add API specification for epoll_pwait2 Sasha Levin
2025-06-24 18:07 ` [RFC v2 08/22] exec: add API specification for execve Sasha Levin
2025-06-24 18:07 ` [RFC v2 09/22] exec: add API specification for execveat Sasha Levin
2025-06-24 18:07 ` [RFC v2 10/22] mm/mlock: add API specification for mlock Sasha Levin
2025-06-24 18:07 ` [RFC v2 11/22] mm/mlock: add API specification for mlock2 Sasha Levin
2025-06-24 18:07 ` [RFC v2 12/22] mm/mlock: add API specification for mlockall Sasha Levin
2025-06-24 18:07 ` [RFC v2 13/22] mm/mlock: add API specification for munlock Sasha Levin
2025-06-24 18:07 ` [RFC v2 14/22] mm/mlock: add API specification for munlockall Sasha Levin
2025-06-24 18:07 ` [RFC v2 15/22] kernel/api: add debugfs interface for kernel API specifications Sasha Levin
2025-06-24 18:07 ` [RFC v2 16/22] kernel/api: add IOCTL specification infrastructure Sasha Levin
2025-06-24 18:07 ` [RFC v2 17/22] fwctl: add detailed IOCTL API specifications Sasha Levin
2025-06-24 18:07 ` [RFC v2 18/22] binder: " Sasha Levin
2025-06-24 18:07 ` [RFC v2 19/22] kernel/api: Add sysfs validation support to kernel API specification framework Sasha Levin
2025-06-24 18:07 ` [RFC v2 20/22] block: sysfs API specifications Sasha Levin
2025-06-24 18:07 ` [RFC v2 21/22] net/socket: add API specification for socket() Sasha Levin
2025-06-24 18:07 ` [RFC v2 22/22] tools/kapi: Add kernel API specification extraction tool Sasha Levin
2025-07-01  2:43 ` [RFC v2 00/22] Kernel API specification framework Jake Edge
2025-07-01 14:54   ` Sasha Levin
